You are on page 1of 88

Chapter 7

The Singular Value Decomposition


In an earlier chapter we looked at a procedure for diagonalizing a square matrix by using a change
of basis. At that time we saw that not every square matrix could be diagonalized. In this chapter
we will look at a generalization of that diagonalization procedure that will allow us to diagonalize
any matrix square or not square, invertible or not invertible. This procedure is called the singular
value decomposition.

7.1

Singular Values

Let A be an m n matrix, then we know that AT A will be a symmetric positive semi-definite


n n matrix. We can therefore find an orthonormal basis of Rn consisting of eigenvectors of AT A.
Let this orthonormal basis be {v1 , v2 , . . . , vn } and let i be the eigenvalue of AT A corresponding
to the eigenvector vi . Since AT A is positive semi-definite we must have i 0.
Now notice that
kAvi k2 = (Avi )T Avi = viT AT Avi = i viT vi = i

Therefore the length of Avi is i . In other words, i is the factor by which the length of each
eigenvector of AT A is scaled when multiplied by A.
Furthermore, notice that for i 6= j we have
T

Avi Avj = (Avi ) Avj = viT AT Avj = j vi vj = 0

so {Av1 , Av2 , . . . , Avn } is an orthogonal set ofvectors. If we want to normalize a non-zero vector,
Avi , in this set we just have to scale it by 1/ i . Note also that some of the vectors in this set
could be the zero vector if 0 happens to be an eigenvalue of AT A. In fact one of these vectors will
definitely be the zero vector whenever Nul A 6= {0} (that is, whenever the columns of A are linearly
dependent). The reason is as follows:
Nul A 6= {0} Ax = 0 for some x 6= 0
AT Ax = 0
x is an eigenvector of AT A with eigenvalue 0.
The implication also works in the other direction as follows:
0 is an eigenvalue of AT A

AT Ax = 0 for some x 6= 0
xT AT Ax = 0
kAxk = 0
Ax = 0
The columns of A are linearly dependent.
291

292

7. The Singular Value Decomposition

The above comments lead to the following definition.


Definition 22 Let A be an m n matrix then the singular values of A are defined to be the square
roots of the eigenvalues1 of AT A. The singular values of A will be denoted by 1 , 2 , . . . , n . It is
customary to list the singular values in decreasing order so it will be assumed that
1 2 n 0

Example 7.1.1

1
1 ?
1


3 1
T
The first step is to compute A A which gives
. This matrix has the character1 3
istic polynomial
2 6 + 8 = ( 4)( 2)
1
What are the singular values of A = 1
1

which gives us the two eigenvalues of4 and 2. We take the square roots of these to get
the singular values,
and 2 = 2.
1 = 2 

2/2
2/2

and v2 =
would be orthonormal eigenvectors
The vectors v1 =
2/2
2/2
of AT A. What happens when these two vectors are multiplied by A?

2
Av1 = 2 and this vector has length 1 = 2.
0

Av2 = 0 and this vector has length 2 = 2.


2
So the lengths of v1 and v2 are scaled by the corresponding singular values when these
vectors are multiplied by A. Note also, as mentioned earlier, that Av1 and Av2 are
orthogonal.
Now consider the following problem: Let B = AT . What are the singular values of B?
B T B will be a 3 3 matrix so B has 3 singular values. It was shown earlier that AT A
and AAT will have the same non-zero eigenvalues so the singular values of B will be 2,

2, and 0.

Example 7.1.2



1 2
Let A =
. What are the singular values of A? (Note that in this case the columns
1 2
of A are not linearly independent so, for reasons mentioned earlier in this section, 0 will
turn out to be a singular value.)
1 Some

textbooks prefer to define the singular values of A as the square roots of the non-zero eigenvalues of AT A.

7.1. Singular Values

293

The procedure is straightforward. First we compute




2 4
AT A =
4 8
and this matrix has characteristic
polynomial 2 10, which gives eigenvalues of 10

of A.
and 0, and so we get 1 = 10 and 2 = 0 as the singular
 values

1/5
For = 10 we would have a unit eigenvector of v1 =
.
2/ 5
 

5
Then Av1 = which has length 1 = 10.
5


2/ 5
For = 0 we would have a unit eigenvector of v2 =
.
1/ 5
 
0
Then Av1 =
and this vector has length 2 = 0.
0

The Singular Value Decomposition


Here is the main theorem for this chapter.
Theorem 7.1 (The Singular Value Decomposition) Let A be any m n matrix, then we can
write A = U V T where U is an m m orthogonal matrix, V is an n n orthogonal matrix, and
is an m n matrix whose first r diagonal entries are the nonzero singular values 1 , 2 , . . . , r
of A and all other entries are zero. The columns of V are called the right singular vectors. The
columns of U are called the left singular vectors.
Proof.
Let A be any m n matrix. Let 1 , 2 , . . . , n be the singular values of A (with 1 , 2 , . . . , r
the non-zero singular
 values) and let v1 , v2 , . . . , vn be the corresponding orthonormal eigenvectors
of AT A. Let V = v1 v2 vn . So V is an orthogonal matrix and
 


AV = Av1 Av2 Avn = Av1 Av2 Avr 0 0

We will mention here (the proof is left as an exercise) that r will be the rank of A. So it is possible
that r = n in which case there will not be any columns of zeroes in AV .
1
Now let ui =
Avi for 1 i r. As we saw earlier these vectors will form an orthonormal
i
m
set of r vectors in R . Extend this set to an orthonormal basis of Rm by adding m r appropriate
vectors, ur+1 , . . . , um , and let U = [u1 u2 . . . ur ur+1 . . . um ]. Then U will be an orthogonal matrix
and

1 0
0 0
0 2 0 0

U = [u1 u2 . . . um ] 0
0 3 0

..
..
..
..
..
.
.
.
.
.


= 1 u1 2 u2 r ur 0 0


= Av1 Av2 Avr 0 0

(In case the above reasoning is unclear remember that in the product U the columns of contain
the weights given given to the columns of U and after the rth column all the entries in are zeroes.)

294

7. The Singular Value Decomposition

Therefore AV = U and multiplying on the right by V T gives us the singular value decomposition
A = U V T .
The singular value decomposition (SVD) can also be written as
A = 1 u1 v1T + 2 u2 v2T + + r ur vrT
You should see a similarity between the singular value decomposition and the spectral decomposition.
In fact, if A is symmetric and positive definite they are equivalent.
The singular value decomposition of a matrix is not unique. The right singular vectors are orthonormal eigenvectors of AT A. If an eigenspace of this matrix is 1 dimensional there are two choices
for the corresponding singular vector, these choices are negatives of each other. If an eigenspace has
dimension greater than 1 then there are infinitely many choices for the (orthonormal) eigenvectors,
but any of these choices would be an orthonormal basis of the same eigenspace. Furthermore, as
seen in the above proof, it might be necessary to add columns2 to U to make up an orthonormal
basis for Rm . There will be a certain amount of freedom in choosing these vectors.

Example 7.1.3
To illustrate the proof of Theorem 7.1 we will outline the steps required to find the
SVD of


1 2
A=
1 2

In Example 7.1.2 we found the singular values of A to be 1 = 10 and 2 = 0 so we


know that


10 0
=
0
0
If we take the right singular vectors (in the appropriate order) as columns then we have


1/5 2/ 5
V =
2/ 5 1/ 5
Take a moment to consider the following questions:
i. Are there any other possible answers for in this example?
ii. Are there any other possible answers for V in this example?
The answer is no to the first question, and yes to the second. There are four possible
choices for V . (What are they?)
Now how can we find U ? From the proof of Theorem 7.1 we see that
u1

1
Av1
1

 
1 1 2 1/ 5

=
10 1 2 2/ 5
 
1
5
=
5
10
 
1/2
=
1/ 2

2 Suppose we let W be the span of {u , u , , u }. Then the columns that we add are an orthonormal basis of
r
1
2
W .

7.1. Singular Values

295

This gives us the first column of U , but we cant find u2 the same way since 2 = 0. To
2
find u2 we just
to
 extend u1 to an orthonormal basis of R . It should be clear that
 have
1/ 2
will work. So we now have
letting u2 =
1/ 2
U=


1/2 1/ 2
1/ 2 1/ 2

Again, stop now and ask yourself if there any other possible choices for U at this stage?
(The answer is yes, for any particular choice of V there are 2 choices for U .)
We now have the SVD
 



1/ 5 2/5
1/2 1/ 2
10 0
A = U V T =
0
0 2/ 5 1/ 5
1/ 2 1/ 2
You should recognize U and V as rotation matrices.
This SVD can also be written in the form
 

1/2 
T
T
1 u1 v1 + 2 u2 v2 = 10
1/ 5
1/ 2


2 5

Example 7.1.4

1
1
1 .
Find the SVD of A = 1
1 1
We used this matrix for an earlier example and so we already have most of the important
information. From the earlier results we know that



2 0
2/2
2/2
and
V =
= 0
2
2/2 2/2
0
0
The
last step is to find U . Thefirst column of U will be Av1 normalized, so u1 =

0
2/2
2/2 . Similarly u2 = 0 . What about u3 ? First notice that at this point we
1
0
can write the SVD as follows:

U V T


2/2
= 2/2
0

2/2
= 2/2
0

0
0
1
0
0
1


2 0 
2/2
2/2
0
2
2/2 2/2
0
0

2
2
1 1
0
0

If we now carry out the last matrix multiplication, the entries in the third column of
U all get multiplied by 0. So in a sense it doesnt matter what entries go in that last
column.

296

7. The Singular Value Decomposition


This can also be seen if we write the SVD in the form 1 u1 v1T + 2 u2 v2T . Since there is
no 3 it follows that the value of u3 is not relevant when the SVD is expressed in this
form. In this form the SVD gives

1 u1 v1T + 2 u2 v2T


 0 


2/2 
2 2/2 2/2
2/2 + 2 0 2/2 2/2
1
0

0
0
1/2 1/2

2 1/2 1/2 + 2 0
0
0
0
2/2 2/2

1 1
0 0
1 1 + 0 0
0 0
1 1

1 1
1 1
1 1

But having said all this U should have a third column and if you wanted to find it how
could you do it? The set {u1 , u2 } is an orthonormal basis for a plane in R3 . To extend
these two vectors to an orthonormal basis for all of R3 we want a third vector, u3 , that
is normal to
plane.
One way of doing this would be to let u3 = u1 u2 . This would
this
2/2

give u3 = 2/2 .
0

7.1. Singular Values

297

Exercises
1. Find a singular value decomposition of the following matrices.





2 3
6 3
0
(a)
(b)
(c)
0 2
1 2
0


2
0

2. Find a singular value decomposition of the following matrices.

0 2
1 0
1 2
(a) 0 1
(b) 0 1
(c) 0 1
0 0
1 0
1 0
3. What are the singular values of the matrix

cos()
sin()

sin()
cos()


1 2 2
. Find a SVD for A and AT .
1 2 2


1 0
5. (a) Let A =
. This is a symmetric indefinite matrix. Find a spectral decomposition and
0 2
a singular value decomposition for this matrix.


1 3
(b) Let A =
. This is a symmetric indefinite matrix. Find a spectral decomposition and a
3 1
singular value decomposition for this matrix.
4. Let A =

(c) If A is a symmetric matrix show that the singular values of A are just the absolute value of
the eigenvalues of A.
6. Find a singular value decomposition for the following matrices. Note that these matrices have
different sizes, but they are all of rank 1 so in each case the SVD can be written 1 u1 v1T .

1
1 1
1 1 1
1 1 1 1
(a) 1 (b) 1 1 (c) 1 1 1 (d) 1 1 1 1
1
1 1
1 1 1
1 1 1 1
7. Find a singular value decomposition for the following matrices. Note that these matrices have
different sizes, but they are all of rank 2 so in each case the SVD can be written 1 u1 v1T + 2 u2 v2T .

1 0
1 0 2
1 0 2 0
(a) 1 0 (b) 1 0 2 (c) 1 0 2 0
0 1
0 1 0
0 1 0 2

1
8. Find the SVD of A = 0
0


1 3/2
9. The matrix A =
0 1
matrix?

1
0
0

0
1 .
1

is not diagonalizable. What is the singular value decomposition of this

298

7. The Singular Value Decomposition

0 0
10. Let A = 0 0. Find the singular value decomposition A = U V T . How many choices are
1 1
there for the second and third column of U ?
11. Let A = U V T be the singular value decomposition of A. Express the following in terms of U ,
and V .
(a) AT A
(b) AAT
(c) (AT A)1 AT (assuming A has linearly independent columns)
(d) A(AT A)1 AT (assuming A has linearly independent columns)
12. Suppose A is a square matrix with singular value decomposition A = U V T
(a) What is the SVD of AT ?
(b) If A is invertible, what is the SVD of A1 ?
(c) Show that | det(A)| is the product of the singular values of A.

u
13. Let A = U V T be the singular
 value decomposition of the mn matrix A with U = 1
and V = v1 v2 vn . Show that

u2

1 u1 v1T + 2 u2 v2T + + k uk vkT

has rank k. (Hint: show that {vk+1 , vk+2 , , vn } is a basis for Nul A.)
14. (a) Suppose A is a symmetric matrix with the spectral decomposition A = P DP T , show that
the spectral decomposition of A + I is P (D + I)P T .
(b) Suppose A is a square matrix with the SVD A = U V T . Is the SVD of A + I given by
U ( + I)V T ?
15. Let Q be a matrix with orthonormal columns. What does a SVD of Q look like?

um

7.1. Singular Values

299

Using MAPLE
Example 1
The Maple command for computing the SVD is SingularValues and is illustrated below.
We will find the SVD of

0 1 2
3 0 1

A=
2 3 0
1 2 3
>A:=<<0,3,2,1>|<1,0,3,2>|<2,1,0,3>>;
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]):
>U;
\[
\left[ \begin {array}{cccc} - 0.32302& 0.49999& 0.03034&- 0.80296
\\\noalign{\medskip}- 0.41841&- 0.49999&
0.74952&- 0.11471\\\noalign{\medskip}0.55065&- 0.5000&- 0.65850&0.11471\\\noalign{\medskip}- 0.64604&
0.49999& 0.06068& 0.57354
\end {array} \right] \]
>S;
\[ \left[ \begin {array}{c} 5.35768\\\noalign{\medskip} 2.82843\\\noalign{\medskip} 2.30115
\\\noalign{\medskip} 0.0\end {array} \right] \]
>Vt;
\[
\left[ \begin {array}{ccc} - 0.56043&- 0.60979&- 0.56043\\\noalign{\medskip}0.70711&-{ 2.6895\times 10^{-16}}&
0.70711\\\noalign{\medskip} 0.43119&0.79256& 0.43119\end {array} \right]
\]
The singular values are returned as a vector not in the form of a diagonal matrix. If you want the
singular values in a matrix you can enter
>DiagonalMatrix(S[1..3],4,3);
>U.%.Vt;
This last command returns the following matrix

0.00000000036 1.0
2.0

3.0
0 0.9999999997

2.0
3.0 0.0000000001

1.000000001 2.0 3.000000001

300

7. The Singular Value Decomposition

This is matrix A with some small differences due to the accumulation of rounding errors in the floating
point arithmetic. The precision of our result could be improved by increasing the value of the Digits
variable in Maple .
We could also write the SVD in the form
3
X
i ui viT
i=1

In Maple this sum could be entered as


>simplify(add(S[i]*Column(U,i).Row(Vt,i),i=1..3));
This will again give matrix A with some rounding errors.

Example 2
We will use Maple to find the singular values of

1
1
1

a
0
a

and we will investigate how these singular values relate to the parameter a.
>A:=<<1,1,1>|<a,0,a>>;
>U,S,Vt:=SingularValues(A,output=[U,S,Vt],conjugate=false);
We now have the two singular values of A expressed in terms of the parameter a. We can visualize
the relationship between a and these singular values as follows:
>plot({ [a,S[1],a=-4..4],[a,S[2].a=-4..4]});
We get Figure 7.1.

Figure 7.1: The singular values of A versus a.

The plot seems to indicate that one of the singular values, s[2], approaches a limit as a becomes
large. We can compute this limit in Maple as follows

7.1. Singular Values

301

>limit(s[2], a=infinity);
1
We look at a variation on the same type of problem. Suppose we want to investigate the singular
values of matrices of the form


cos(t) sin(t)
B=
sin(t) cos(t)
We will first define
>f:=t-><<cos(t),sin(t)>|<sin(t),cos(t)>>;
This defines a function in Maple which returns a matrix of the desired form for any specified value
of t. For example the command
>f(1);
will return

cos(1)
sin(1)


sin(1)
cos(1)

cos(k)
sin(k)


sin(k)
cos(k)

and
>f(k);
will return

Next we enter
>g:=t->map( sqrt, eigenvals(transpose(f(t))&*f(t)) );
This will compute the singular values of our matrix for any specified value of t.
For example, the command
>g(.3);
[ .659816, 1.250857 ]
returns the singular values of


cos(.3)
sin(.3)


sin(.3)
cos(.3)

So we can enter
>sv:=g(t):
>plot( [ sv[1], sv[2] ], t=-3..3);
These commands give Figure 7.2 which plots the singular values of our matrix as a funtion of t.

Example 3
We have seen that the SVD of matrix A can be expressed as
A=

r
X
i=1

i ui viT

302

7. The Singular Value Decomposition


1.4

1.2

0.8

0.6

0.4

0.2

Figure 7.2: The singular values of B versus t.


where r is the rank of A. For any integer n with 0 < n r the sum
n
X

i ui viT

i=1

is called the rank n singular value approximation of A.


Before getting
to the main

problem we will look at a simple example to illustrate the basic idea.


1.4 0.0 3.0
Let A = 1.1 0.0 0.0. This is a 3 3 matrix of rank 3. We will find the SVD of A using Maple
2.1 2.1 2.1
.
>A:=<<1.4, 1.1, 2.1>|<0.0,0.0,2.1>|<2.1,2.1,2.1>>;
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]);
>u1:=Column(U,1): ### The left singular vectors
>u2:=Column(U,2):
>u3:=Column(U,3):
>v1:=Row(V,1): ### the right singular vectors as row vectors
>v2:=Row(V,2):
>v3:=Row(V,3):
The rank 1 singular value approximation would be
>A1:=S[1]*u1.v1;
>A1:=U.DiagonalMatrix(<S[1],0,0>).Vt;
### another way to get the same result

1.719 1.023 2.309


A1 = .348 .207 .468
1.953 1.163 2.624
How close is matrix A1 to A? This question makes sense only relative to an inner product. We will use
the inner product hA, Bi = trace AT B.
The distance from A to A1 can now be computed as
>sqrt(Trace((A-A1)^%T.(A-A1)));
1.9046

7.1. Singular Values

303

We will mention without proof that, in fact, matrix A1 is the closest you can get to A by a matrix
of rank 1 relative to this inner product.
The rank 2 approximation would be
>A2:=sv[1]*u1.v1 + sv[2]*u2.v2;
### one way
>A2:=U.DiagonalMatrix(<S[1],S[2],0>).Vt; ### another way

1.365 .0232 3.016


A2 = .433 .447 .299
2.249 1.000 2.033

If you compare the entries in this matrix with those in A you can see that it appears to be close to A
than matrix A1. How far is A2 from A?
>sqrt(Trace((A-A2)^%T.(A-A2)));
.8790
So we see that A2 is a better approximation to A than A1. A2 will be the closest you can get to A by
a rank 2 matrix.
If we were to continue this for one more step and compute the rank 3 singular value approximation
we would get A exactly. The distance from A3 to A would be 0.
We will extend this idea to a larger matrix.
In this example we will choose a random 12 12 matrix and compute the distance between the rank
n singular value approximation of A and A itself for n = 1..12. The distance will be computed relative
to the inner product hA, Bi = trace AT B.
>A:=RandomMatrix(12,12, generator=0.0..9.0):
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]);
>ip:=(A,B)->Trace(A^%T.B); ### our inner product
We will now define our rank n approximations in Maple . Then we compute the distances (i.e., the
errors of our approximations) using the inner product.
>for n to 12 do
B[n]:=eval(add(S[i]*Column(U,i).Row(Vt,i),i=1..n)) od:
>for n to 12 do
err[n]:=sqrt(ip(A-B[n],A-B[n])) od;
We can visualize these errors using a plot.
>plot([seq( [i,err[i]],i=1..12)],style=point);
This gives Figure 7.3.
Of course B12 = A so the final error must be 0. The above pattern is typical for any matrix. As n
increases the approximations become better and better, and becomes exact when n = r.
There is another interesting aspect to this example. We have found the singular values and place
them in the list sv. From this list we will define the following values
q
2
22 + 32 + 42 + + 12
e1 =
q
2
e2 =
32 + 42 + + 12
q
2
e3 =
42 + + 12
..
.

e11

..
.
12

e12

304

7. The Singular Value Decomposition


600

500

400

300

200

100

10

12

Figure 7.3: The errors of the SVD approximations.


and plot them.
>for i to 12 do e[i]:=sqrt( add( sv[j]^2, j=i+1..12)) od:
>plot( [seq( [i, e[i]], i=1..12),style =point)
This plot turns out to be exactly the same as Figure 7.3. This illustrates a fact that is true in general
and whose proof is left as an exercise3 : The error of the rank n singular value approximation is the square
root of the sum of the squares of the unused singular values. That is, if you look at the unused singular
values as a vector, then the error is the length of this vector.

3 The

trickiest part of the proof depends on the fact that if v is a unit vector then the trace of vvT is 1.

7.2. Geometry of the Singular Value Decomposition

7.2

305

Geometry of the Singular Value Decomposition

Let A =

2 1
2
2

. This matrix has the following SVD:

A = U V



3
1/5 2/5
0
2/ 5
1/ 5

0
2



T

2/5 1/5
1/ 5
2/ 5

The matrices U and V T are orthogonal matrices, and in this case they are simple rotation
matrices (i.e., there is no reflection). U corresponds to a counter-clockwise rotation by 63.4 and
V T corresponds to a clockwise rotation of 26.6 . Finally is a diagonal matrix so it corresponds to
a scaling by the factors of 3 and 2 along the two axes. So what happens to the unit circle when it
is multiplied by A? We will look at the effect of multiplying the unit circle by each of the factors of
the SVD in turn. The steps are illustrated in Figures 7.4 - 7.7
3
3
2
2
1
1

3
1

3
1

1
2
2
3
3

Figure 7.4: The unit circle with the right


singular vectors

Figure 7.5: The unit circle is rotated by


V T . The right singular vectors now lie
on the axes.

3
3
2
2
1
1
3

3
3

1
1
2
2
3
3

Figure 7.6: The unit circle is scaled by


resulting in an ellipse.

Figure 7.7: The ellipse is rotated by U .

In Figure 7.4 we see the unit circle with the right singular vectors (the columns of V ) plotted.
In Figure 7.5 the unit circle has been multiplied by V T , which means it has been rotated
clockwise. There is something you should understand about this result. First, recall that the
columns of V form an orthonormal set of vectors - the right singular vectors. When these vectors
(arranged in matrix V ) are multiplied by V T we get the identity matrix. This means that the right
singular vectors have been reoriented (by a rotation and possibly a reflection) to lie along the axes
of the original coordinate system. So in Figure 7.4 we see that the right singular vectors have been

306

7. The Singular Value Decomposition

rotated to lie on the x and y axes. (This happens in every case, multiplying by V T rotates, and
possibly flips, the right singular vectors so that they line up along the original axes.)
In Figure 7.6 the rotated unit circle is multiplied by . Since is a diagonal matrix we see
the expected result. The circle has been scaled by a factor of 3 along the x axis and by a factor of
2 along the y axis. The circle has now been transformed into an ellipse.
Finally in Figure 7.7 we multiply by U . This is a rotation matrix so the ellipse in Figure 7.6
is rotated so that it is no longer oriented along the x and y axes. The axes of the ellipse are now
the left singular vectors. The vectors shown in Figure 7.7 are not the left singular vectors, they
are the vectors Av1 and Av2 . The left singular vectors would be the result of normalizing these two
vectors.
To summarize the above: The unit circle is transformed into an ellipse when it is multiplied by
A. The axes of the ellipse are in the directions of u1 and u2 . The points on the ellipse that are
furthest from the origin are Av1 and its negative. The points on the ellipse that are closest to the
origin are Av2 and its negative.
PROBLEM. Repeat the above example with A =
matrix is
A = U V

1/2
=
1/ 2


1 2
1 2

. Use the fact that the SVD for this

 

1/ 2
10 0 1/5
0
0 2/ 5
1/ 2

T
2/ 5
1/ 5

1
1
1 . We have already computed
Suppose we try a similar analysis with the matrix A = 1
1 1
the SVD of A:



1
0
2/2
0
2/2

2/2
2/2
A = U V T = 2/2 0 2/2 0
2
2/2 2/2
0 0
0
1
0
In this case notice that A is a 3 2 matrix so multiplication by A would correspond to a linear
transformation from R2 to R3 . In the SVD we have A = U V T where U is a 3 3 matrix, V is a
2 2 matrix, and is 3 2. So U corresponds to a transformation from R3 to R3 , V T corresponds
to a transformation from R2 to R2 , and corresponds to a transformation from R2 to R3 .
So suppose we start with the unit circle in R2 . When we multiply by V T the circle looks the
same, it has just been rotated so that the right singular vectors lie along the axes. Next we multiply
by . Notice that for any vector in R2 we have


2x
2 0 

x
= 2y
Ax = 0
2
y
0
0
0

So what happens here? We see that the x value is scaled by 2 and the y value is scaled by 2 so
again the circle is stretched into an ellipse. But something else happens, there is a third coordinate
of 0 that gets added on. In other words we still have an ellipse in the xy plane, but the ellipse
in now located in 3 dimensional space. In this case multiplying by has the effect of scaling and
zero-padding. It is the zero-padding that results in the change of dimension.
Finally we multiply by U which again will be a rotation matrix, but now the rotation is in R3
so the ellipse is rotated out of the xy plane. The unit circle is again transformed into an ellipse,
but the resulting ellipse is located in 3 dimensional space. These transformations are illustrated in
Figures 7.8- 7.11.

7.2. Geometry of the Singular Value Decomposition

307
2

Figure 7.9: The unit circle is multiplied


by V T . The right singular vectors now
lie on the axes.

Figure 7.8: The unit circle with the right


singular vectors

1
2

1.5
1
0.5

2
1
2
0 1
1
2 2

1
1.5 1 0.5

Figure 7.10: The unit circle is scaled


into an ellipse by and inserted into
R3 .

1.5
1
1
1.5

Figure 7.11: The ellipse from is rotated


in R3 by U .

PROBLEM. Do a similar analysis for multiplying the unit sphere by AT . (There are a couple of
major differences with this example. In particular, what exactly do you end up with in this case?)
In summary you should understand that the finding the SVD of a matrix A can be interpreted
as factoring the matrix into a rotation followed by a scaling followed by another rotation. This
last sentence is a bit of an oversimplificationin that there could also be reflections involved in the
orthogonal matrices. Also if A is not a square matrix then multiplying by will involve truncation
(decreasing the dimension) or zero padding (increasing the dimension).

The SVD and Linear Transformations


If A is an m n matrix then T (x) = Ax would be a linear transformation from Rn to Rm .
A
Rn

Rm

Now when we find the singular value decomposition A = U V T the matrices U and V T can be
looked at as change of basis matrices giving the following diagram.
A
Rn
V

?
Rn

Rm
6
U

Rm

308

7. The Singular Value Decomposition

From this point of view you can look at A and as corresponding to the same linear transformation relative to different bases in the domain and codomain. More specifically, if the vectors in
the domain are expressed in terms of the columns of U and vectors in the codomain are expressed
in terms of the columns of V , then the multiplication by A (in the standard basis) corresponds to
multiplication by .
If the domain and codomain have different dimensions then the change in dimension is a result
of the operation of . If the dimension is increased via the transformation, this is accomplished
through zero padding. If the dimension is decreased, this is accomplished through truncation.4

D
I
O or =
=
D where D is a square diagonal matrix (with
O
O
possibly some zeroes on the diagonal). So can be written as the product
of a square matrix which scales the entries

I
in a vector and a truncation matrix, I O or a zero padding matrix
.
O
4 In

fact we can write = D

O =D I

7.2. Geometry of the Singular Value Decomposition

309

Exercises
1. For A =


6 2
we have the SVD
7 6
A = U V



1/ 5 2/5 10 0 2/5 1/ 5
=
2/ 5 1/ 5 0 5 1/ 5 2/ 5


Plot the unit circle with the right singular vectors then show the result of successively multiplying
this circle by V T , , and U
2. Let matrix A be the same as in question (1). Repeat the steps of question 1 for
(a) AT
(b) A1


2 3
3. For A =
we have the SVD
0 2
A = U V T =


2/5 1/ 5 4
0
1/ 5 2/ 5


0
1/ 5
1 2/ 5


2/5
1/ 5

Plot the unit circle with the right singular vectors then show the result of successively multiplying
this circle by V T , , and U
4. Let matrix A be the same as in question (3). Repeat the steps of question 1 for
(a) AT
(b) A1

1
5. Let A = 1
1

1
1. This matrix has the following SVD
1
A = U V T

1/3 1/6
= 1/3 1/ 6
1/ 3 2/ 6


1/ 2
6 0 
2/2
2/2

0 0
1/ 2
2/2
2/2
0 0
0

(a) Describe the effect of multiplying the unit circle by A by looking at the effect of multiplying
successively by each factor of the SVD.
(b) The unit circle gets transformed into a line segment in R3 with what end points?
(c) What is a basis for Col A? How does this relate to the answer for (b)?
(d) What are the furthest points from the origin on the transformed unit circle? How far are these
points from the origin? What does this have to do with the singular values of A?


1 1 1
6. Let A =
. This matrix has the following SVD
1 1 1
A = U V T


 
2/2 2/2
6

=
0
2/2
2/2


 1/ 3 1/ 3

0 0
1/6 1/ 6
0 0
1/ 2 1/ 2


1/ 3
2/ 6
0

310

7. The Singular Value Decomposition


(a) Describe the effect of multiplying the unit sphere by A by looking at the effect of multiplying
successively by each factor of the SVD.
(b) The unit sphere gets transformed into a line segment in R2 with what end points?
(c) What is a basis for Col A? How does this relate to the answer for (b)?

(d) What are the furthest points from the origin on the transformed unit circle? How far are these
points from the origin? What does this have to do with the singular values of A?

0 0 1
7. Let A = 1 0 1 .
1 0 0
(a) Find the SVD of A.
(b) The unit sphere will be transformed into a filled ellipse in R3 . What is the equation of the
plane containing this ellipse.
(c) What are the points on the ellipse that are furthest from the origin? What is the distance of
these points from the origin.
8. Are the following statements TRUE or FALSE?
(a) A 2 2 matrix of rank 1 transforms the unit circle into a line segment in R2 .

(b) A 3 2 matrix of rank 1 transforms the unit circle into a line segment in R3 .
(c) A 2 2 matrix of rank 2 transforms the unit circle into an ellipse in R2 .

(d) A 3 2 matrix of rank 2 transforms the unit circle into an ellipse in R3 .

(e) A 3 3 matrix of rank 3 transforms the unit sphere into an ellipsoid in R3 .

(f) A 3 3 matrix of rank 1 transforms the unit sphere into a line segment in R3 .

7.2. Geometry of the Singular Value Decomposition

311

Using MAPLE
Example 1
In this example we will use Maple to illustrate the

1
A = 0
1

geometry of the SVD in R3 . We will let

0
1
1
1
0 .5

and show the effects of multiplying the unit sphere by this matrix.
We will use the following basic fact: the unit sphere can be plotted using the vector

cos(s) sin(t)
v = sin(s) sin(t)
cos(t)
and letting the parameter s range over the interval [0, 2] and the parameter t range over the interval
[0, ]. We will in fact write a Maple procedure that will plot the top and bottom halves in different
colors.
>showsphere:=proc(matr)
local A,v1,v2,p1,p2;
A:=matr:
v1:=<cos(s)*sin(t), sin(s)*sin(t), cos(t)>:
v2:=A.v2:
p1:=plot3d(v2,s=0..2*Pi,t=0..Pi/2,color=grey):
p2:=plot3d(v2,s=0..2*Pi,t=Pi/2..Pi,color=blue):
plots[display]([p1,p2],scaling=constrained,orientation=[100,70]);
end:
In this procedure, the input matr is assumed to be a 3 3 matrix and the procedure plots the unit
sphere after being multiplied by the matrix.
Next we will enter matrix A and find the SVD.
>A:=<<1,0,1,0,1,1,1,0,-.5>>;
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]):
>I3:=IdentityMatrix(3):
Next we will use the showsphere above and apply the various transformations to a sphere.
Now to plot the results we just have to enter the following:
>showsphere(I3);
#### the original sphere
>showsphere(Vt);
#### apply Vt
>showsphere(DiagonalMatrix(S).Vt);
###
now apply S
>showsphere(U.DiagonalMatrix(S).Vt); ### and finally apply U
This gives Figures 7.12- 7.15.
Note that one of the singular values is .3099 which results in the sphere being flattened a lot in one
direction. To see this it is a good idea to use the mouse to rotate the plots once they have been drawn
in order to see them from different viewing angles.

Example 2

312

7. The Singular Value Decomposition

Figure 7.12: The unit sphere.

Figure 7.13: Multiply by V T .


6

Figure 7.14: Multiply by .


Figure 7.15: Multiply by U .



1.2 1 1
. This corresponds to a transformation from R3 to R2 .
1 1 1
Finding the SV D with Maple we get:
In this example we will let A =

>A:=<<1.2,1>|<1,-1>|<1,1>>;
>U,S,Vt:=SingularValues(A, output=[U,S,Vt]);


2.107
0
0
So we have =
which involves scaling and truncation (dimension reduction). In
0
1.414 0
this case we will have to modify our approach since after multiplying by we will be in R2 . We will still
use the plot3d command by adding a third component of zero, and choosing an appropriate viewing
angle.
>S1:=DiagonalMatrix(S,2,3):
>v:=<cos(s)*sin(t),sin(s)*sin(t),cos(t)>:
>SV:=S1.Vt.v;
>USV:=U.S1.Vt.v;
We now have a slight problem when it comes to plotting. The vectors V T v and U V T v are
vectors in R2 using 2 parameters. Maple doesnt have a command for plotting in two dimensions with
2 parameters, so we will use a trick as shown below.
>showsphere(V);
>plot3d( [SV[1],SV[2],0],s=0..2*Pi,t=0..Pi,
orientation=[90,0],scaling=constrained);
>plot3d( [USV[1],USV[2],0],s=0..2*Pi,t=0..Pi,
orientation=[90,0],scaling=constrained);
This gives Figures 7.16 - 7.19.
Multiplication by V T gives, as expected, a rotation in R3 . Multiplication by truncates the third
coordinate and scales the result. This gives a filled ellipse in R2 . Multiplying by U rotates this ellipse in
R2 . Notice that the plotting method we used makes it clear where the north pole and south pole of
the original sphere have ended up. They are in the interior of the ellipse at the points


 0
 
1.2 1 1
1
0 =
1 1 1
1
1

7.2. Geometry of the Singular Value Decomposition

313

Figure 7.17: Multiply by V T .

Figure 7.16: The unit sphere.

Figure 7.18: Multiply by .


Figure 7.19: Multiply by U .
and



 0
 
1.2 1 1
1
0 =
1 1 1
1
1

314

7.3

7. The Singular Value Decomposition

The Singular Value Decomposition and the Pseudoinverse

1
Consider the matrix A = 1
0
in Chapter 5 would be

1
0. This matrix has no inverse, but the pseudoinverse as defined
1


1 

 

1 0
1/3 2/3 1/3
A = (A A)
=
0 1
1/3 1/3 2/3

Now look at the SVD of A. From AT A we get singular values of 3 and 1. Omitting the rest of
the details we get



0
1/ 3
2/6
3 0 
1/ 2 1/2
A = U V T = 1/6 1/ 2 1/3 0 1
1/ 2 1/ 2
0 0
1/ 6 1/ 2 1/ 3

2 1
A =
1 2
T

1
1

Now suppose we ask ourselves why matrix A cannot be inverted. If we look at the SVD we see
that A can been decomposed into three factors. Of those three both U and V are invertible (since
they are orthogonal, their inverse is just their transpose), so the reason that A is not invertible
must have something
to do with . What is the effect of , the middle factor? It scales
the first

component by 3, and this scaling can be inverted (just divide the first component by 3). It scales
the second component by 1, and again this scaling can be undone. There is a third effect of the
matrix, it takes vectors in R2 and places them in R3 by adding a 0 as a third component (zero
padding). It is this last effect of that lies behind the non-invertibility of A in that it changes the
dimension of the vector. Every vector in R2 gets transformed into a unique vector in R3 by A, but
the reverse is not true. Every vector in R3 does not have a pre-image in R2 since the column space
of A is two dimensional. It is precisely the vectors in R3 that are not in the column space of A that
do not have a pre-image in R2 .
So we have A = U V T and if each factor was invertible the inverse of A would be V 1 U T .
This should be a 2 3 matrix which corresponds to a linear transformation from R3 to R2 that will
undo the effects of matrix A. The problem is the middle term, the matrix has no inverse. How
close can we come to finding an inverseof ? To undo the effects of matrix A we want to do three
things: scale the first component by 1/ 3, scale the second component by 1, and chop off(truncate)

1/ 3 0 0
3
the third component of an input vector in R . The matrix that would do this is
0
1 0
and, for reasons that will become clear shortly, we will call this matrix . If we evaluate V U T
we get





 2/ 6 1/ 6
1/

6
1/2 1/ 2 1/ 3 0 0
0 1/2 1/ 2
0
1 0
1/ 2 1/ 2
1/ 3 1/

3 1/ 3

 2/ 6 1/ 6

1/

6
1/6 1/ 2 0
=
0 1/2 1/ 2
1/ 6 1/ 2 0

 1/ 3 1/ 3 1/ 3
1/3 2/3 1/3
=
1/3 1/3 2/3

In other words we get A , the pseudoinverse of A.


T
Now, in general, when you
 findthe SVD of an m n matrix A = U V the matrix will be an
D 0
m n matrix of the form
where D stands for a square diagonal matrix with all non-zero
0 0

7.3. The Singular Value Decomposition and the Pseudoinverse

315

D1
diagonal entries. We will define the pseudoinverse of to be the n m matrix
0
1
matrix D will undo the scalings of D.



0
. The
0

The principle behind the pseudoinverse is essentially how we deal with . The principle is to
invert all scalings and then to undo any zero padding by a truncation and vice versa..
To clarify the point we are trying to make in this section suppose A is an m n matrix with
linearly independent columns with the singular value decomposition A = U V T . The pseudoinverse
of A as defined in Chapter 5 would be
A

= (AT A)1 AT
= (V T U T U V T )1 V T U T
= (V T V T )1 V T U T
2

22

T 1
= (V
V ) V T U T
.
..

n2

= V

= V

= V

1/12

1/22

..

.
1/n2

1/12

1/22

..

.
1/n2

1/1

= V U

1/2
..

.
1/n

T
V V T U T

2
..

.
n

T
U

...

T
U

In other words the pseudoinverse as defined in this section in terms of the singular value decomposition is consistent with our previous definition. But this new definition is more powerful because
it is always defined. It is not restricted to matrices with linearly independent columns.

Example 7.3.5



1 2
W hat is the pseudoinverse of A =
?
1 2
We have already found the SVD of this matrix in Example 7.1.4
A =
=

U V T

 


1/ 5
10 0
2/2 2/2
0
0 2/ 5
2/2
2/2


2/5
1/ 5

316

7. The Singular Value Decomposition


From the above discussion we have the pseuodinverse
A

=
=
=

V U T





2/2
2/2
1/5 2/ 5 1/ 10 0

0
0 2/2
2/ 5 1/ 5
2/2


1/10 1/10
1/5 1/5

What happens if you multiply A by its pseudoinverse? Do you get the identity?
No. Simple computation gives


 

1 2 1/10 1/10
1/2 1/2

AA =
=
1 2 1/5 1/5
1/2 1/2
and
A A =



1/10 1/10 1
1/5 1/5 1

 

2
1/5 2/5
=
2
2/5 4/5

Suppose we write the SVD of an m n matrix A as


A = 1 u1 v1T + 2 u2 v2T + 3 u3 v3T + + r ur vrT
where r is the number of non-zero singular values of A. Then the above comments mean that the
pseudoinverse of A can be written as
A =

1
1
1
1
v1 uT1 + v2 uT2 + v3 uT3 + + vr uTr
1
2
3
r

Notice what happens when these two expressions are multiplied together. We leave it as a simple
exercise to show that
AAT = u1 uT1 + u2 uT2 + u3 uT3 + + ur uTr
and
AT A = v1 v1T + v2 v2T + v3 v3T + + vr vrT

These are just projectors onto Col U and Col V respectively5 .

Example 7.3.6
C onsider the matrix A =

1 1
1 1

which has the following SVD



2/2 2/2 2

A=
0
2/2
2/2

0
0



2/2 2/2
2/2
2/2

Now it should be obvious that A is not invertible, in fact since the columns of A are not
linearly independent you cant find the pseudoinverse of A from the formula (AT A)1 AT .
But why is A not invertible? What insights can the SVD give us into this question?
We have A = U V T so it might seem that to invert A all we have to do is to invert
each of the factors of the SVD and then reverse the order of multiplication. If we try
this there is certainly no problem with U or V ; since these matrices are orthogonal they
are certainly invertible. But what about . This is just a scaling matrix and so it might
5 We

will soon see that Col U = Col A andCol V = Row A.

7.3. The Singular Value Decomposition and the Pseudoinverse

317

seem that to invert it we just have to undo the scalings. In particular the x coordinate
is scaled by 2 so to undo that scaling we just have to multiply by 1/2. But the y value is
scaled by 0 and that means all y values are mapped to 0, so there is no way to undo this
scaling.That is one way of understanding why A is not invertible one of the singular
values is equal to 0 and a scaling by 0 cannot be inverted.
If we proceed as outlined above, the pseudoinverse of A should be given by V U T which
gives


 



2/2 2/2 1/2 0
2/2 2/2 = 1/4 1/4
0 0 2/2
1/4 1/4
2/2
2/2
2/2
Now suppose you had the following system of equations
x1 + x2
x1 + x2

=
=

1
3

This system is obviously inconsistent. The normal equations would be


2x1 + 2x2

= 4

2x1 + 2x2

= 4

The normal equations have infinitely many solutions so the system we are looking at
doesnt have a unique least squares solution. It has infinitely many least squares solutions.
6

Figure 7.20: The two solid parallel lines represent the inconsistent system. The dotted line represents
the least-squares solutions to the system.
The normal equations imply that all the points on the line x1 + x2 = 2 would be least
squares solutions. This is illustrated in Figure 7.20. If we write this system as Ax = b
suppose we tried to find a least squares solution by multiplying by the pseudoinverse
found above. In this case we get

   
1/4 1/4 1
1
A b =
=
1/4 1/4 3
1
What is so special about this result? First of all it lies on the line x1 + x2 = 2 so it is
a least squares solution. More than that it is the least squares solution of the minimum
length (i.e., it is the least squares solution that is closest to the origin).

318

7. The Singular Value Decomposition


Although we wont prove it, what happended in this example will always happen. If
Ax = b has linearly independent columns then A b will give the unique least squares
solution to the system. If Ax = b has linearly dependent columns then the system will
have many least squares solutions and A b will give the least squares solution to the
system of minimum norm.

You should see the pseudoinverse as a generalization of the idea of a matrix inverse. The following
points should clarify this.
If A is square with independent columns then A is invertible and the pseudoinverse of A would
be the same as the inverse. That is,
A = A1
In this case, a linear system Ax = b would have the unique solution A1 b.
If A is not square but has linearly independent columns then A is not invertible but A does
have a pseudoinverse. The pseudoninverse can be computed as
A = (AT A)1 AT
In this case A b gives the unique least-squares solution to Ax = b.
If A does not have linearly independent columns then the pseudoninverse can be computed
using the SVD. In this case A b gives the least-squares solution of minimum norm to the
system Ax = b.

7.3. The Singular Value Decomposition and the Pseudoinverse

319

Exercises
1. Suppose you are given the following SVD of A

1
0
0
4
A = 0 1/2 1/ 2 0
0
0 1/ 2 1/ 2

0 
2/5
2
1/ 5
0


1/ 5
2/ 5

What is A ?
2. Suppose


1/2 

A = 3 1/ 2 2/3 1/3 2/3
0

What is A ?
3. Suppose

2 
A = 3 1 1 1
1


1

What is A ?
4. Use the SVD to find the pseudoinverse of


1 1
1

(a) 1 1
(b)
1
1 1

1 1
1 1


1 1
(c)
1 1

5. Find the psudoinverse of

1 2
(a) 0 0
0 0

3
0
0

1
(b) 2
3

0 0
0 0
0 0 (c) 0 0
0 0
0 0

1
2
3


3 0 0
6. Let =
. Evaluate and .
0 2 0

6 0
7. Let = 0 4. Evaluate and .
0 0

5 0 0
0 2 0

8. Let =
0 0 0. Evaluate and .
0 0 0
9. Use the pseudoinverse to find a least squares solution to the following system:
x1 + x2 + x3
x1 + x2 + x3

=
=

0
6

320

7. The Singular Value Decomposition

10. The system


x1 + 2x2 + x3

x2 x3

is consistent and has infinitely many solutions.


(a) Find an expression for the general solution of this system.
(b) Find an expression for the magnitude squared of the general solution and use calculus to
determine the smallest possible value of the magnitude squared.
(c) If you write this system as Ax = b, evaluate A b. How does this relate to the answer from
(b)?

a1
a2

11. (a) What is the pseudoninverse of any n 1 matrix A = . ? (Hint: use the fact that this
..
an
matrix has rank 1.)


(b) What is the pseudoninverse of any 1 n matrix A = a1 a2 an ?
12. Let A be an m n matrix of rank r with SVD A = U V T .
(a) What is A ui for 1 i r?

(b) What is A ui for r + 1 i m?


13. If A has orthonormal columns what is A ?
14. Show that if A is an invertible matrix then A = A1 .
15. Show that A A and AA are symmetric.
16. Show that AA A = A and A AA = A . (Note: this result along with the previous problem shows
that AA and A A are projectors.)

7.3. The Singular Value Decomposition and the Pseudoinverse

321

Using MAPLE
Example 1.
In this example we will use Maple to find the least-squares solution to an overdetermined system with
the pseudoinverse. Our system of equations will represent an attempt to write ex as a linear combination
of 1, x, x2 , and x3 . We will convert this into a discrete problem by sampling these functions 41 times
on the interval [2, 2].
>f:=x->exp(x):
>g[1]:=x->1:
>g[2]:=x->x:
>g[3]:=x->x^2:
>g[4]:=x->x^3:
>xvals:=Vector(41,i->-2+.1*(i-1)):
>u:=map(f,xvals):
>for i to 4 do v[i]:=map(g[i],xvals) od:
We will now try to write u as a linear combination of the vi . Now vector u is a discrete approximation
to ex and vectors v1 , v2 , v3 , and v4 are approximations to 1, x, x2 and x3 so our problem is the discrete
version of trying to write ex as a cubic polynomial. Setting up this problem will result in an inconsistent
system of 41 equations in 4 unknowns.
In the following Maple commands we compute the pseudoinverse from the fact that if
A = 1 u1 vT + 2 u2 v22 + + r ur vrT
then
A =

1
1
1
v1 uT + v2 u22 + + vr uTr
1
2
r

>A:=<v[1]|v[2]|v[3]|v[4]>:
>U,S,Vt:=SingularValues(A,output=[U,S,Vt]);
>pinvA:=eval( add( 1/S[i] * Column(V^%T,i) . Column(U^%T,i),i=1..4)):
>soln:=pinvA.u;
soln = [.92685821055486, .9606063839232, .6682692476746, .209303723666]
>p1:=add(soln[i]*x^(i-1),i=1..4);
>p2:=1+x+1/2*x^2+1/6*x^3;
>plot([exp(x),p1,p2],x=-2..2,color=[black,red,blue]);
The resulting plot is shown in Figure 7.21. By looking at the graphs it appears that by using
the weights computed above we get a better approximation to ex than the Taylor polynomial. We can
quantify this a bit more clearly as follows:
>int((exp(x)-p1)^2,x=-2..2);
.015882522833790110294
>int((exp(x)-p2)^2,x=-2..2);
.27651433188568219389
These values show that p1 is closer to ex than p2

Example 2.

322

7. The Singular Value Decomposition

1
x

Figure 7.21:
In this example we will illustrate how to write a Maple procedure that will compute the pseudoinverse
of a matrix. We will call our procedure pinv. We will first give the procedure and make some comments
afterwards. When you are entering the procedure you should end each line (until you are finished) with
SHIFT-ENTER rather than ENTER. This prevents a new prompt from appearing on each line.
>pinv:=proc(A)
local sv1,sv2,U,V,i:
U,S,Vt:=SingularValues(A,output=[U,S,Vt]);
sv2:=select(x->x>10^(-8),S);
eval(add( 1/sv2[i]*Column(Vt^%T,i).Row(U^%T,i),i=1..Dimension(sv2)));
end;

The first line gives the name of the procedure and indicates that the procedure will require one
input parameter. The A in this line is a dummy variable, it stands for whatever matrix is input to
the procedure.
The second line lists the local variables used in the procedure. These are basically all the symbols
used within the procedure.
The third line computes the SVD of the input matrix.
The fourth line is a bit tricky. Some of the singular values from the previous line could be zero.
We just want the non-zero singular values. But, unfortunately, due to rounding errors sometimes
singular values that should be 0 turn out to be small non-zero decimals. This line selects all the
singular values that are greater than 108 . Even if a singular value is not zero but very small then
its reciprocal will be very large and this can result in numerical instability in the computation.
The fifth line computes the pseudoinverse as
X 1
vi uTi
i
for the non-zero singular values, or at least for the singular values greater than our cut-off value.

7.3. The Singular Value Decomposition and the Pseudoinverse

323

The last line indicates that the procedure is finished. You can now use the pinv command to find
the pseudoinverse of any (numerical) matrix.
For example:
>M:=<<1,5>|<2,6>|<3,7>|<4,8>>;
>pinv(M);
This returns the matrix:

0.5500000002

0.2500000001

0.2250000001 0.1250000000

0.1000000000
1.0 1011

0.4250000002 0.1250000001

324

7.4

7. The Singular Value Decomposition

The SVD and the Fundamental Subspaces of a Matrix

Suppose A is an m n matrix of rank r with the following SVD

..





v1
A = u1 . . . ur ur+1 . . . um

.
..

...

vr

vr+1

...

vn

T

which can be written as


A = 1 u1 v1T + 2 u2 v2T + + r ur vrT

Now since A is m n with rank r it follows that Nul A has dimension n r. If we look at the
product Avk where k > r then we have

Avk = 1 u1 v1T + 2 u2 v2T + + r ur vrT vk = 0

since the columns of V are orthogonal. It follows then that {vr+1 , . . . , vn } is an orthonormal basis
of Nul A. Since the row space of A is the orthogonal complement of the null space it then follows
that {v1 , . . . , vr } is an orthonormal basis of Row A.
If we apply the above argument to AT we then get {ur , . . . , um } is an orthonormal basis of
Nul AT , and {u1 , . . . , ur } is an orthonormal basis for Row AT (which is the same as Col A).
Given any matrix A, the four fundamental subspaces of A are: Col A, Nul A, Col AT , and Nul AT .
So the SVD of A gives orthonormal bases for each of these subspaces.
The SVD also gives us projectors onto these four fundamental subpaces.
AA projects onto Col A.
A A projects onto Row A.
I AA projects onto NulAT .
I A A projects onto NulA.
The following may help clarify some of the above comments:
If A is an n n matrix with linearly independent columns then A is invertible and
A1 A = I
AA1 = I
In this case we have Col A = Row A = Rn and Nul A = Nul AT = {0}.
If A is not square but has linearly independent columns then A has a pseudoinverse and
A A = I
AA = the projector onto Col A
If the columns of A are not linearly independent then A has a pseudoinverse and
A A = the projector onto Row A
AA = the projector onto Col A

7.4. The SVD and the Fundamental Subspaces of a Matrix

325

Example 7.4.7
L et

1
1
1
1
A=
1 1
1 1

1
1

1
1


1
It should be perfectly clear that A has rank 1 and that 1 is a basis for the row space
1
T
1
1

of A and
1 is a basis for the column space of A. If we find the SVD we get
1



1/3 1/ 2 1/ 6
V = 1/3
0
2/ 6
1/ 3 1/ 2
1/ 6
The first column is a unit vector that is a basis of Row A. Because the columns are
orthonormal, the second and third columns form a basis for the plane orthogonal to the
row space, and that is precisely Nul A.
We also have

1/ 12
1/2 1/ 2 1/ 6
1/2
0
0 3/12

U =
1/2 1/ 2 1/ 6 1/ 12

1/2
0
2/ 6 1/ 12
Again is should be easy to see that the first column is a unit vector that is a basis for
Col A and so the remaining columns must be an orthonormal basis of Nul AT

Example 7.4.8

0 1 0
L et A = 1 0 2. Find the matrix that projects vectors orthogonally onto Col A.
0 2 0
One way of doing this would be to find an explicit orthonormal basis for the column
space. In this particular case this is easy because it is clear that the first two columns
form an orthogonal basis for the column space. If we normalize these columns then we
can compute the projector as

0
1
0


1/ 5 0
0 1
2/ 5 0

1/ 5
1/5 0 2/5
0 = 0 1 0
2/5 0 4/5
2/ 5

(If you look at this projector it should be clear that it has rank 2. You should remember
that this corresponds to the fact that it projects vectors onto a 2 dimensional subspace.)
Another way of finding the projector is by the SVD. In this case the SVD would be given
by

T

5 0 0 1/ 5 0 2/ 5
0 1/ 5 2/ 5
0
0 0
0
U V T = 1
5 0 0 1
0 2/ 5 1/ 5
0
0 0 2/ 5 0 1/ 5

326

7. The Singular Value Decomposition


The pseudoinverse is then given by
A = V U T

0 1/5 0
= 1/5 0 2/5
0 2/5 0

The projector onto Col A will then be

1/5 0
AA = 0 1
2/5 0

2/5
0
4/5

Note that in this case the first method seems simpler because it was very easy to find an
orthonormal basis for the column space. The second method has an advantage in that
it allows you to define the projector strictly in terms of matrix A regardless of the size
of A.

7.4. The SVD and the Fundamental Subspaces of a Matrix

Exercises

1
1. Let A = 1
0

1 0
1 0.
0 1

(a) Find the SVD of A.


(b) Find a basis for Col A and Row A.
(c) Find a basis for Nul A and Nul AT .
(d) Evaluate A A and AA

1 0
1 0

2. Let A =
1 0.
1 2
(a) Find the SVD of A.
(b) Find a basis for Col A and Row A.
(c) Find a basis for Nul A and Nul AT .
(d) Evaluate A A and AA

327

328

7.5

7. The Singular Value Decomposition

The SVD and Statistics

There are deep connections between linear agebra and statistics. In this section we want to take
a brief look at the relationship bewteen the SVD of a matrix and several statistical concepts.
Suppose a series of measurements results in several lists of related data. For example, in a study
of plant growth biologists might collect data about the temperature, the acidity of the soil, the
height of the plants, and the surface area of the leaves. The data collected can be arranged in the
form of a matrix called the matrix of observations. Each parameter that is measured can be
arranged along one row of the matrix, so an m n matrix of observations consists of n observations
(i.e., measurements) of m different parameters.
Let X = [X1 X2 Xn ] be an m n matrix of observations. The sample mean, M, is given
by
1
M = (X1 + X2 + + Xn )
n

If we define Xj = Xk M then the matrix


i
h
1 X
2 X
n
B= X
is said to represent the data in mean-deviation form.
The covariance matrix, S, is defined to be
S=

1
BB T
n1

As an example suppose we measured the weights and heights of 10 individuals and got the results
shown in the following table.
weight (kg)
height (m)

23.1
1.10

16.2
.92

18.4
.98

24.2
1.24

12.4
.86

20.0
.99

25.2
1.21

11.1
.75

19.3
1.00

25.1
1.35

This would give a 2 10 matrix of observations. Each observation would involve the measurement
of 2 parameters.
The sample mean is the vector whose entries are the average weight and average height. Computing these averages we get


19.5
M=
1.04
If we now subtract this mean from each of the observations we get the data in mean-deviation
form


3.6 3.3 1.1 4.7 7.1
.5
5.7 8.4 .2 5.6
B=
.06 .12 .06 .20 .18 .05 .17 .29 0.4 .31
If we look at each column of the matrix of observations as a point in R2 we can plot these points
in what is called a scatter plot. Figure 7.22 is a scatter plot of our data. In this plot the sample
mean is also plotted as a cross, it is located at the center of the data. For comparison, Figure
7.23 is a plot of the data in mean deviation form. The only difference is that the data has been
shifted so that it is now centered around the origin. In mean deviation form the sample mean will
be the origin. The entries in matrix B indicate how much above or below average each value lies.
The covariance matrix would be


1
25.807 .891
T
S = BB =
.891 .034
9

7.5. The SVD and Statistics

329

0.3
1.3

0.2
1.2

0.1
1.1

0.1
0.9

0.2
0.8

12

14

16

18

20

22

0.3

24

Figure 7.22: Scatter plot of original data.

Figure 7.23: Data in mean-deviation form.

The entries down the diagonal of the covariance matrix represent the variance of the data. In
particular, the diagonal entry sjj of matrix S is the variance of the j th parameter.
So, in the above example, 25.907 is the variance of the weight and .034 is the variance of the
height.
The variance can be interpreted as a measure of the spread of the values of a certain parameter
around the average value. For example, the average of 9 and 11 is 10, but the average of -100 and
220 is also 10. The difference is that the first pair of numbers lie much closer to 10 than the second
pair, i.e., the variance of the first pair is much less than the variance of the second pair.
The total variance of the data is the sum of all the separate variances. That is, the total
variance of the data is the sum of the diagonal entries of S (this is also the trace of S).
Each off-diagonal entry of matrix S, sij for i 6= j, is called the covariance between parameters
xi and xj of the data matrix X. Notice that the covariance matrix is symmetric so sij = sji . If the
covariance is 0 it is said that the corresponding parameters are uncorrelated.

Principal Components
The covariance matrix is symmetric and positive definite so, as weve seen before, it can be diagonalized. To diagonalize S we would find the (positive) eigenvalues and then the corresponding
eigenvectors. The eigenvectors of S determine a set of orthogonal lines. If ui is one of these eigenvectors then the vector B T ui is called a principal components of the data. The principal component
corresponding to the largest eigenvalue is called the first principal component. The second principal
component corresponds to the second larget eigenvalue and so on.
For our earlier example of weights and heights we would get the following eigenvalues and unit
eigenvectors
1 = 25.838, 2 = .0032



u1 = .9994, .0345 , u2 = .0345, .9994


The first principal component would then be



T
B T u1 = 3.600 3.302 1.101 4.704 7.102 .498 5.702 8.405 .214

The second principal component would be



T
B T u2 = .064 .006 .022 .038 .065 .067 .027 .000 .393 .117

330

7. The Singular Value Decomposition

Now this might seem confusing but all that is going on is a change of basis. We have our data in
mean-deviation form and we are converting it to our eigenbasis6 which has been ordered according
to the size of the eigenvalues. The first principal component is a vector that contains all the first
coordinates of our data points relative to the eigenbasis. The second principal component is made
up of the second coordinates of our data points relative to the eigenbasis.
In the above example the entries in the second principal component are fairly small. This means
that most of the data points lie very near the first eigenspace. That is, relative to the eigenbasis our
data is approximately 1-dimensional. This is connected to the relative sizes of the eigenvalues. The
sum of the eigenvalues of S will equal the total variance.
In the following plot we see the data in mean-deviation form and the eigenspace of S corresponding to the first principal component (1 = 25.8374).
1.4
1.2
1
0.8
0.6
0.4
0.2
10

6 4 0

2 4 6 8 10 12 14 16 18 20 22 24 26
x

0.2

Figure 7.24: The data in mean-deviation form and the first principal component
The line through the origin along the principal component has slope .0345. This would have
= .0345w
and w
equation h
where h
are the weight and height in mean deviation form. Is this just
the least-squares line? No7 , the significance of this line and how it relates to the least-squares line
will be explained in the next section.

6 The
7 The

eigenbasis consists of the eigenvectors of mathbf S which are the right singular vectors of X.
You should try deriving this equation for a bit of review.
least-squares line would be w
= .0383h.

7.5. The SVD and Statistics

331

Exercises
1. Given the following data points (in mean-deviation form)
x
y

-2
-3

-1
0

1
2

2
1

(a) Find the least-squares line for this data.


(b) Find the total least-squares line for this data.
(c) Plot the data points and the two lines from (a) and (b) on the same set of axes.
(d) Consider the line y = x. Find the square root of the sum of the squares of the vertical
distances of the data points to this line. Find the square root of the sum of the squares of
the perpendicular distances of the data points to this line.
2. Given the following data points (in mean-deviation form)
x
y

-2
1

-1
1

0
0

1
-2

2
0

(a) Find the least-squares line for this data.


(b) Find the total least-squares line for this data.
(c) Plot the data points and the two lines from (a) and (b) on the same set of axes.
(d) Consider the line y = x. Find the square root of the sum of the squares of the vertical
distances of the data points to this line. Find the square root of the sum of the squares of
the perpendicular distances of the data points to this line.
3. Given the following data points
x
y

1
3

3
1

5
2

(a) Find the least-squares line for this data.


(b) Find the total least-squares line for this data.


3 4 1
4. Let A =
be a data matrix.
1 2 5
(a) Convert A to mean-deviation form.
(b) Find the covariance matrix.
(c) Find the principal components.
(d) What fraction of

1 1 2
5. Let A = 3 5 7
1 1 1

of the total variance is due to the first principal component.

2
1
2
1
2
9 11 13 15 17 be a data matrix.
1 1 1 1 1

(a) Convert A to mean-deviation form.


(b) Find the covariance matrix.
(c) Find the principal components.
(d) What fraction of of the total variance is due to the first principal component.

332

7. The Singular Value Decomposition

Using MAPLE
Example 1.
We will use Maple to illustrate the idea of principal components.
We begin by generating 200 points using one of the random number routines in Maple .
>with(stats[random]):
>xv:=[seq( normald(),i=1..200)]: ### the x coordinates
>yv:=[seq(.9*xv[i]+normald(),i=1..200)]: ### the y coordinates
>mx:=add(xv[i],i=1..200)/200:
### the average x value
>my:=add(yv[i],i=1..200)/200:
### the average y value
>mxv:=[seq(xv[i]-mx,i=1..200)]: ### x in mean deviation form
>myv:=[seq(yv[i]-my,i=1..200)]: ### y in mean deviation form
>data:=[seq( [mxv[i],myv[i]],i=1..200)]:
>p1:=plot(data,style=point,color=black):
>B:=< convert(mxv,Vector), convert(myv,Vector)>;
>M:=1/199*B^%T.B;


1.084 .9079
.9079 1.790
>SingularValues(M,output=[U,S,Vt]);
[ 2.4115, .4631 ]


.5646 .8254
.8254 .5646
The first row of V t gives the first principal component. We will compute the corresponding slope.
>m1:=Vt[1,2]/V[1,1]:
>p2:=plot([m1*x,-1/m1*x],x=-3..3,thickness=2,color=black):
>plots[display]([p1,p2],scaling=constrained);

Figure 7.25: The 200 data points and the principal components.

7.5. The SVD and Statistics

333

This gives Figure 7.25. We have a cloud of data points centered at the origin. These points lie in
a roughly elliptical region. The principal components correspond to the axes of that ellipse.

Example 2.
In this example we will begin by genterating 30 data points and then put the data in mean-deviation
form. The steps are similar to the first example.
>xv:=[seq(normald(),i=1..30)]:yv:=[seq(.5*normald()+.6*xv[i], i=1..30)]:
>mx:=add(xv[i],i=1..30)/30:
>my:=add(yv[i],i=1..30)/30:
>mxv:=convert([seq(xv[i]-mx,i=1..30)],Vector):
>myv:=convert([seq(yv[i]-my,i=1..30)],Vector):
>data:=[seq( [mxv[i],myv[i]],i=1..30)]:
We now have a collection of points centered at the origin. Look at any straight line drawn through
the origin at angle (the slope of this line would be tan ). We will find the sum of the squares of the
orthogonal distances to this line and the sum of the squares
of the vertical diastances to this line.

cos()
A unit vector in the direction of the line would be
. A unit vector normal to the line would
sin()


 
sin()
x
be
. The orthogonal distance from a point i to this line the length of the projection onto
cos()
yi
the normal vector and this would
be
|

x
sin()
+
y
cos()|.
i
i
 
xi
to this line would be |yi xi tan()|.
The vertical distance from
yi
We will use Maple to compute the sum of the squares of these distances and plot the results functions
of . We will call the sum of the squares of the orthogonal distances D1, and the sum of the squares of
the vertical distances will be called D2.
>D1:=expand(add( (-mxv[i]*sin(t)+myv[i]*cos(t))^2,i=1..30));
D1 = 19.42397 cos2 () + 33.01234 sin2 () 40.59837 sin() cos()
>D2:=expand(add( ( myv[i]-mxv[i]*tan(t) )^2,i=1..30));
D2 = 33.01234 tan2 () 40.59837 tan() + 19.42397
>plot( [ D1, D2 ], t=-Pi/2..Pi/2, 0..60, thickness=2);
The plots of D1 and D2 are shown in Figure 7.24. The plot shows that both of these functions
take on a minimum at around = .5. Using Maple we can find where these minima occur. We will
find the derivatives (using the diff command), and find the critical values.
>fsolve(diff(D1,t)=0,t=0..1);
.62391
>fsolve(diff(D2,t)=0,t=0..1);
.55130
So the line which minimizes the sum of the squares of the orthogonal distance would lie at an
angle of .6391 radians. For vertical distances the minimum would be when the line lies at .55310
radians.

334

7. The Singular Value Decomposition


60

50

40

30

20

10

1.5

0.5

0.5

1.5

Figure 7.26: The plot of D1 and D2


Now the line which minimizes the sum of the squares of the vertical distances would be the leastsquares line. If our x coordinates are in x and out y coordinates are in vy then the least-squares
line through the origin fitting these points would have slope
yx
xx
To find the angle at which this line lies we then apply the inverse tangent. In Maple we have
>arctan(DotProduct(mxv,myv)/DotProduct(mxv,mxv));
.55130
>VectorAngle(mxv,myv); ### an easier way
Amazing! This is the same result that we obtained above using calculus to determine the minimum value of D2.
Now what about the minimum of D1. How do we find this using linear algebra. The minimum
line here will be the eigenspace of the covariance matrix corresponding to the largest eigenvalue.
>B:=<mxv|myv>:
>S:=1/29*B^%T.B;
>U,S,Vt:=SingularValues(M,output=[U,S,Vt]);
The line we are looking for is determined by the first column of V. We will find the slope of this
line and then apply the inverse tangent.
>arctan(V[1,2]/V[1,1]);
.62391
This agrees with the previous result obtained using calculus.

7.6. Total Least Squares

7.6

335

Total Least Squares

Suppose you want to find the straight line that gives the best fit to a collection of data. One
approach, as we saw earlier, is to find the least squares line. The assumption of this approach is
that all the error of the data is located in the y values8 . In some cases this assumption is valid, but
in many cases it will turn out that there
in both the x and y values.

 are errors of measurement
Suppose we have data matrix X = x1 x2 xn where each column is a point in R2 and
the data is already in mean-deviation form. Let u be a unit vector, and let x = tu be the line
through the origin in the direction of u. We can find square of the orthogonal distance of each point,
xi , to the line x = tu by using the projector I uuT .
k(I uuT )xi k2

= xTi (I uuT )(I uuT )xi

= xTi (I uuT )xi

The sum of all such distances is therefore


n
X
i=1

xTi (I uuT )xi =

n
X
i=1

kxi k2

n
X

xTi uuT xi

i=1

Look at the expression on the right hand side of the above equation. This represents the value that
we want to minimize as the difference of two sums. If we wish to find the vector u which minimizes
this quantity we must maximize the second sum. This is because the value of the first sum is fixed
by the given data points so we want to subtract as much as possible from this sum. But the second
sum is
X
X
xTi uuT xi =
uT xi xTi u

X
= uT
xi xTi u
uT XX T u

and this can be seen as a quadratic form with the unknown u and the maximum will be taken on
when u is a unit eigenvector corresponding to the largest eigenvalue of the matrix XX T . Finally,
this is just the first principal component of X.
As an example suppose we have the following data values:
x
y

1.1
.2

1.2
.4

1.3
.4

1.8
.6

1.9
.7

2.1
.9

2.3
.8

2.4
1.0

2.5
1.3

3.0
1.1

3.3
2.5

3.8
2.8

We can put this data in mean-deviation form and then find the least squares line and total least
squares line. We will outline the steps.
First find the average x and y values.
P12

xi

P12

yi

i=1

12

i=1

12

= 2.225

= 1.0583

8 When we find the least squares line by solving a system of the form X = y we remove the error from y by
projecting y onto the column space of X. The column space of X is determined by the x coordinates of the data
points. If there are errors in these x values then this method would be flawed.

336

7. The Singular Value Decomposition

We subtract these averages from the x and y values to put the data in mean-deviation form and
create matrix X.

1.125 1.025 0.925 0.425 0.325 0.125
0.075
0.175
0.275 0.775 1.075 1.57
X=
0.8853 0.6853 0.6853 0.4853 0.3853 0.1853 0.2853 0.0853 0.2147 0.0147 1.4147 1.714
This gives
XX T =

7.82250 6.94250
6.94250 7.21791

This matrix has eigenvalues



of 14.4693 and .5711. A basis for the eigenspace corresponding to
.72232
the largest eigenvalue is
. This eigenspace would be a line whose slope is
.69156
.69156
= .95741
.72232
If we find the least squares line through these points in the usual way we would get a line whose
slope is .88750.
If we plot the data and the lines we obtain Figure 7.27.
1.8
1.6
1.4
1.2
1
0.8
0.6
0.4
0.2
1.2 0.8

0
0.2 0.2
0.4
0.6
0.8
1
1.2

TLS
LS

0.6

1 1.2

1.6

Figure 7.27: Comparison of the total least squares line (TLS) and the least squares line (LS).

7.6. Total Least Squares

337

Exercises
1. Find the pseudoinverse of A =
solution of the system


1
2
and use this pseudoinverse to find the least-squares
1 2
 
1
Ax =
0

1
2. Find the pseudoinverse of A = 2
1
solution of the system

3. Find
lines
x
y

2
1 and use this pseudoinverse to find the least-squares
1

1
Ax = 1
1

the least-squares line and the total least-squares line for the following data points. Plot both
on the same set of axes.
0 1 2
0 0 1

4. Find the least-squares line and the total least-squares line for the following data points. Plot both
lines on the same set of axes.
x -1 0 1 2
y
0 0 1 1

338

7. The Singular Value Decomposition

Using MAPLE
Example 1.
We will use Maple to illustrate another use of the SVD - data compression. We will begin by defining
a 30 30 matrix of data.
>with(plots):
>f:=(x,y)-> if abs(x)+abs(y)<1 then 1 else 0 fi:
>A:=Matrix(30,30,(i,j)->f((i-15)/12,(j-15)/12)-f((i-15)/4,(j-15)/4)):
>matrixplot(A);

Figure 7.28: Matrixplot of A.


There is nothing special about this matrix other than the fact that it generates a nice picture.
The matrixplot command in Maple generates a 3 dimensional image where the values in the matrix
correspond to the heights of a surface. This gives us a nice way of visualizing the data in the matrix.
Now we will find the singular value decomposition of A and we will define new matrices
B1
B2

= 1 u1 v1T
= 1 u1 v1T + 2 u2 v2T

B3 = 1 u1 v1T + 2 u2 v2T + 3 u3 v3T


..
..
.
.
and plot them. The second line of Maple code below is a bit tricky but it just corresponds to the formula
Bi =

i
X

j uj vjT

j=1

>U,S,Vt:=SingularValues(A,output=[U,S,Vt]);
>for i to 12 do
B[i]:=value(add(S[j]*Column(U,j).Row(Vt,j)),j=1..i)) od:
>matrixplot(B[1]);
>matrixplot(B[2]);
>matrixplot(B[3]);
>matrixplot(B[10]);

7.6. Total Least Squares

Figure 7.29: Matrixplot of B1 .

Figure 7.31: Matrixplot of B3 .

339

Figure 7.30: Matrixplot of B2

Figure 7.32: Matrixplot of B10

Now matrix A contains 900 entries. Matrix B1 was computed from only 61 numbers - 1 , the 30
entries in u1 and the 30 entries in v1 . With less than 7% of the original amount of information we
were able to reconstruct a poor approximation to A - the plot of A is not recognizable from the plot
of B1 . With matrix B2 we use 122 numbers, about 14% of the original amount of data. The plot of
B2 is beginning to reveal the basic 3d structure of the original data. By adding more and more of the
components of the SVD we can get closer and closer to A, and the plot of B10 is very close to the
plot of A, but if you look at the singular values they become very small after the first 12 so all the
components after 12 should contribute very little to the reconstruction of A. To construct B12 we need
732 numbers which is about 80% of the amount of data in A. The point is that if we wanted to transmit
the information in matrix A to another location we could save time by sending the 732 numbers needed
to reconstruct A12 rather than the 900 numbers of A thereby reducing the amount of data that must be
transferred. It is true that the matrix reconstructed would not be exactly the same as A but, depending
on the context, it might be acceptably close.
How close is matrix B1 to matrix A. If they were the same then A B1 would be the zero matrix.
If B1 is close to A then the entries in A B1 should all be small. The distance from A to B1 could
be measured in a way similar to how we measure the distance from one vector to another. We could
subtract the matrices, square the entries, add them, and then take the square root9 . In Maple we can do
this with the Norm command with the frobenius option. (There are various ways of finding the norm
of a matrix. The method we are using here is called the Frobenius norm.)
>Norm(A-B[1],frobenius);
>Norm(A-B[2],frobenius);
>Norm(A-B[3],frobenius);
>Norm(A-B[12],frobenius);
9 This

is in fact the same as finding the distance relative to the inner product hA, Bi = trace(AT B)

340

7. The Singular Value Decomposition

This gives us the values 6.791, 2.648, 2.278, .013426353, .764e-8. The conclusion is that B1 2 is very
close to A.
We can plot the error of the successive approximations and get the following graph. We also plot
the singular values for comparison. Notice how the decrease in the errors parallels the decrease in the
singular values.
12

6
5

10

10

12

Figure 7.33: Errors of the SVD reconstructions.

10

12

Figure 7.34: The singular values ofA.

There is another way to visualize how the matrices Bj approximate A using an animation in Maple .
>for i to 12 do p[i]:=matrixplot(B[i]) od:
>display( [ seq( p[i],i=1..12) ], insequence=true );

7.6. Total Least Squares

341

Example 2.
We will now look at another example of data compression. We begin by defining a 8 8 matrix:
>M:=Matrix(8,8,[[0,0,0,0,0,0,0,0],
[0,1,1,1,1,1,1,0],
[0,1,0,0,0,0,1,0],
[0,1,0,1,1,0,1,0],
[0,1,0,0,0,0,1,0],
[0,1,0,0,0,0,1,0],
[0,1,1,1,1,1,1,0],
[0,0,0,0,0,0,0,0]]):
This matrix would correspond to the following image where 0=black and 1=white.

Figure 7.35: An 8 by 8 image.


The JPEG method of compressing an image involves converting it to the Discrete Cosine basis that
we mentioned in Chapter 1. We will write a procedure that converts an 8 8 matrix to the Discrete
Cosine basis and the corresponding inverse transformation. First we define a function f that gives cosine
functions at various frequencies. We then generate a basis for R8 by sampling f. We place these basis
vectors in in matrix A and let A1 be the inverse of A (these are the change of basis matrices). Then we
define dct for the Discrete Cosine transform and idct for the inverse transform.
>f:=(k,t)->evalf(cos(Pi*k*(t-1/2)/8)):
>A:=Matrix(8,8, (i,j)->f(i-1,j));
>A1:=A^(-1):
>dct:=proc(mat)
local m1;
m1:=mat:
A1.m1.A1^%T;
end:
>idct:=proc(mat)
local m1;
m1:=mat:
A.M1.A^%T;
end:
We now will apply the dct procedure to M and call the result TM. This matrix contains all the
information from the original image but relative to a different basis. Image compression is performed by
reducing the amount of information in TM by making all small entries equal to 0. The following Maple
code scans through the entries in TM and if an entry is lesss than 0.2 then that entry is made 0.

342

7. The Singular Value Decomposition


>TM:=dct(M);
>for i to 8 do
for j to 8 do if abs(TM[i,j])<.2 then TM[i,j]:=0 fi
od; od;
>print(TM);

This gives the following matrix

0.3437500000 0
0

0
0
0

0
0
0

0
0
0

0.2209708691 0
0

0
0
0

0.3027230267 0 0.3093592167

0 0.2209708692 0 0.3027230266 0
0

0.3093592166

Notice that we are keeping only 7 of the 64 entries in TM. We now transform back to the original basis.
>M2:=idct(TM):
This would correspond to the Figure 7.37

Figure 7.36: The DCT compressed image.

Figure 7.37: The SVD compressed image.

Now we will compare this with using the SVD to compress the image:
>U,S,Vt:=SingularValues(M,oytput=[U,S,Vt]):
>M3:=value(add(S[i]*Column(U,i).Row(Vt,i)),i=1..2));
Here we are reconstructing the image from just two components of the SVD and we get
The idea is not to recreate the original image exactly. The idea is to create a reasonably good
reproduction of the image by using significantly less data than that contained in the original image.
Since some of the original information is lost in this process, this type of compression is called lossy
compression.

7.6. Total Least Squares

343

Example 3.
We will use Maple to compare the total least squares line and the least squares line for a set of data.
We begin by generating two lists called xv and yv which contain the coordinates of our data points.
>with(stats[random],normald):
>f1:=x->.2*x+normald[0,.2]():
>f2:=x->2*sin(.1*x)+normald[0,.2]():
>xv:=[seq( f1(i), i=1..20)]: ## noisy x values
>yv:=[seq( f2(i),i=1..20)]: ## noisy y values
Next we have to put the data in mean-deviation form. We will write a procedure called mdform which
will take any list as an input and return the mean-deviation form of that list.
>mdform:=proc(L)
local n,m;
n:=nops(L):
## n is the number of poiints
m:=add(L[i],i=1..n)/n: ## m is the mean
convert([seq( L[i]-m, i=1..n)],Vector);
end:
>mx:=mdform(xv):
>my:=mdform(yv):
>A:=<mx|my>:
>M:=A^%T.A:
>U,S,Vt:=SingularValues(M,output=[U,S,Vt]);
The direction of the total least squares line is determined by the first column of V computed above.
We will define the slope of this line and then define the plots the TLS line and the data points. The
plots wont be displayed until we find the least squares line as well.
>v1:=Row(Vt,1):
>mtls:=v1[2]/v1[1]: ### the slope of the TLS line
>data:=[seq( [x[i],y[i]],i=1..50)]:
>p1:=plot(data,style=point):
>p2:=plot(mtls*x,x=-2..2): ### plot the TLS line
We can find the least squares line as follows:
>mls:=dotprod(mx,my)/dotprod(mx,mx); ## slope of the LS line
>p3:=plot(mls*x,x=-2..2): ### plot the LS line
>plots[display]([p1,p2,p3]);
This gives the following plot:
Now the least squares line should minimize the sum of the squares of the vertical distances from the
data point to the line. We will use Maple to compute these distances for the TLS line and the LS line
>yls:=mls*mx:
>ytls:=mtls*mx:
>Norm(my-yls,2);
1.29529
>Norm(my-ytls,2);
1.30000
Each time you execute the above commands the numerical values obtained should vary because the data
points wwere generated using random numbers but the first value should always be smaller than the
second.
We leave it as a final Maple exercise for the reader to compare the sum of the squares of the
orthogonal distances from the data points to the lines. (See the discussion in section 7.6).

344

7. The Singular Value Decomposition

0.5

0.5

Figure 7.38: The TLS line, LS line, and the data points.

Chapter 8

Calculus and Linear Algebra


In this chapter we will look at some connections between techniques of calculus (differentiation and
integration) and the methods of linear algebra we have covered in this course.

8.1

Calculus with Discrete Data

Suppose we have the following experimental data which gives the vapor pressure (in torr) of
ethanol at various temperatures (in degrees Centigrade)
Temperature T
Pressure P

20.0
42.43

25.0
55.62

30.0
69.25

35.0
93.02

40.0
116.95

45.0
153.73

50.0
190.06

55.0
241.26

60.0
303.84

We have plotted this data in Figure 8.1 which shows that the pressure is clearly an increasing
function of temperature, P = f (T ), but any precise formulation of this function is unknown. Now
suppose we want to answer the following questions:
What is the value of P when T = 42.7?
What is the value of dP/dT when T = 42.7?
Z T =40.0
What is
f (T ) dT ?
T =20.0

How could we answer these questions? We have seen two major approaches that could be taken.
We could find an interpolating polynomial and use that to answer each of the above questions.
In Figure 8.1 we show the data, the interpolating polynomial, and the derivative of this polynomial.
Looking at these plots should convince you that this approach is unsatisfactory. As the plot of the
derivative clearly shows the interpolating polynomial is not strictly increasing and so it violates our
basic intuition of the physics of the problem. This is typical of what happens when you try to fit a
high degree polynomial to a set of data.
The problem with the above approach is that we tried to find a curve that fit the data values
exactly but experimental data is almost always guaranteed to contain errors of measurement. Another method would be to find a function that gives the best least-squares fit to the data. The
problem here is to determine what type of function to fit. In many cases understanding the physical
theory behind the phenomenon can indicate what type of function should be used. In the current
example we can see that P grows with T but should that growth be exponential, linear, quadratic,
or some other form. If we assume quadratic growth and find the best fitting function of the form
345

346

8. Calculus and Linear Algebra

300

250

200

150

100

50

30

40

50

60

Figure 8.1: Vapor pressure versus temperature (interpolation)

P = c0 + C1 T + c2 T 2 we would get the plots shown in Figure 8.2 . Clearly this is a more satisfactory
result than simple interpolation.
We now want to take a different approach to this type of problem.
You should recall from calculus that if P = f (T ) then the derivative dP/dT is defined as

f (T ) = lim

T 0

f (T + T ) f (T )
T

This means that we have the following approximation

f (T + T ) f (T )
dP

dT
T

This is not saying anything new or complicated. It is stating the obvious fact that the instantaneous
rate of change can be approximated by an average rate of change over an interval (generally the
smaller the interval the better the approximation). There is an interval of T = 5.0 between each
of our data values in the above table.

8.1. Calculus with Discrete Data

347

300

250

200

150

100

50

20

30

40

50

60

Figure 8.2: Vapor pressure versus temperature (least squares fit)

If we store the P values in a vector v then the finite differences can be computed by matrix
multiplication

1
1

42.43
2.64
55.62

1
69.25 2.73

1 1
93.02 4.75

1 1
116.95 = 4.78

1 1
153.73 7.36

1 1
190.06 7.27

10.24
1 1
241.26
1 1
12.52
303.84

Notice that since it takes 2 data values to compute each finite difference our nine data values
gives only eight finite differences.
The matrix in the above example could be called a differentiation matrix since for any vector
generated by sampling a function (with, in this case, an interval of 5 between the samples) multiplication by the matrix results in a vector containing approximations to the derivative of the function.
What is the null space of this matrix?

348

8.2

8. Calculus and Linear Algebra

Differential Equations and Dynamical Systems

In this section we want to look at differential equations of the form


dy
= f (y)
dt
where f (y) is linear. The left hand side of this equation represents the instantaneous rate of change
of y with respect to t. If we evaluate y at a sequence of equidistant values of t then this instantaneous
rate of change at the particular value yk can be approximated by
yk+1 yk
dy

dt
t
where t represents the t interval from yk to yk+1 .
For example, if we had the function y = 3t2 then the value of dy
dt at t = 1 could be approximated
by
y(1.1) (y(1)
3(1.1)2 3(1)2
=
= 6.3
.1
.1
where we are using a value of t = .1. It should be clear that in general we will get a better
approximation by using a smaller value of t. So if we used t = .02 we would have
3(1.02)2 3(1)2
y(1.02) (y(1)
=
= 6.06
.02
.02
Now suppose we have the differential equation
dy
= 5y
dt
At the particular value yk this equation could be approximated by
yk+1 yk
= 5yk
t
which can be rewritten as
yk+1 = (1 + 5t)yk
If we had some initial value (say y0 = 3) and some fixed interval (say t = .1) then we could
approximate subsequent values of y from the order 1 difference equation yk+1 = 1.5yk . This would
give y0 = 3, y1 = (1.5)(3) = 4.5, y2 = (1.5)2 (3) = 6.75, and so on. In general we would have
yn = 3(1.5)n . Remember yn stands for the value of y after n intervals of length .1 .

8.3. An Oscillating Spring

349

Exercises
1. Given the differential equation

dy
= 2y
dt
We will look at several ways of solving this by a discrete approximation. The right hand side tells
us that the rate of change at time k is given by 2yk but how can we approximate this rate of
change by a finite difference? For the following problems use a time interval of t = .25 with the
initial va lues y0 = 1, y1 = 1/2.
(a) The rate of change at time k can be approximated by the forward difference
yk+1 yk
t
Use this approximation to solve the differential equation. (Remember a solution in this context
is a sequence of values y2 , y3 , y4 , . . . . This sequence of values will be generatedby a discrete
dynamical system.)
(b) The rate of change at time k can be approximated by the backward difference
yk yk1
t
Use this approximation to solve the differential equation.
(c) The rate of change at time k can be approximated by the centered difference
yk+1 yk1
2t
Use this approximation to solve the system.
(d) Plot the three approximate solutions along with the exact solution.

2. Repeat the previous problem for

dy
= 2y + 1
dt

Use the same t and initial values.


3. Repeat for

8.3

d2 y
= 2y + 1
dt2

An Oscillating Spring

In this section we will consider a system composed of a mass connected to a spring with the
other end of the spring connected to a fixed support. We will assume that the only relevant force
is the spring force and will ignore gravity and friction. The mass is displaced from its equilibrium
position and released. The result is that the mass will oscillate.
On a conceptual level this is one of the most important examples in this chapter. It illustrates
how a problem can be analyzed in terms of basic laws of physics. These laws of physics can then
be expressed in the form of a differential equation which can be solved in continuous time using
calculus. Finally it shows how it can be converted to discrete time and solved using linear algebra.
This last step might appear to be redundant but in many applications it turns out that the equations

350

8. Calculus and Linear Algebra

involved are too complicated for a calculus solution and they have to be converted to discrete time
approximations.
The use of calculus in such problems results in what is called an analytical solution; the solution
is given as a function. The use of linear algebra as shown here is called a numerical solution; the
solution is a list of numbers. The development of fast computers with large memories have had a
revolutionary impact on applied mathematics. These technological improvements have made quick
and accurate numerical solutions possible where they would have been impossible 30 years ago.

Fixed
Support

Figure 8.3: A mass on a spring

Continuous Time
First we will analyze this as a continuous time system. From physics you know that the position
of the mass is governed by an equation of the form F = ma. Furthermore, in this example, we are
assuming the only relevant force is the spring force which is given by F = Kx where K > 0 is the
spring constant and x is the displacement of the mass from equilibrium. Combining these equations
we get
ma = Kx
m

d2 x
= Kx
dt2

d2 x
K
= x
dt2
m
This last equation has a general solution of the following form1 :
1 In fact, the set of solutions to this differential equation form a vector space (i.e., the sum of any two solutions
is also a solution, andq
a scalar multiple
q of a solution is a solution). This vector space is two dimensional and has a

basis consisting of cos


of these basis vectors.

K
t
m

and sin

K
t.
m

So you can look at the general solution as being any possible combination

8.3. An Oscillating Spring

351

x = C1 cos

K
t + C2 sin
m

K
t
m

Problem. Given that K = 1, m = 1, and at t = 0 you know that x = 0 and


values of C1 and C2 .
Solution. Given the values of K and m the above equation would become

dx
dt

= 1 find the

x = C1 cos t + C2 sin t
Substituting x = 0 and t = 0 into this equation we have
0

= C1 cos(0) + C2 sin(0)
= C1

Hence C1 = 0.
Now find the derivative and substitute t = 0 and
dx
=
dt
1 =
1 =

dx
dt

=1:

sin t + C2 cos t
sin(0) + C2 cos(0)
C2

Therefore the motion of of the oscillating mass is described by


x = sin t

Discrete Time
Weve seen before that a first derivative can be approximated by a finite difference. The second
derivative can be approximated in a similar way. Using the fact that the second derivative is the
derivative of the first derivative we get
xk+2 xk+1
xk
xk+1
d2 x
xk+2 2xk+1 + xk
t
t

=
dt2
t
t2
This finite difference expression would give an approximation to the second derivative at time
k + 1 (the midpoint of the values used). So if we use this discrete approximation in the place of the
second derivative then the equation describing the motion becomes

K
xk+2 2xk+1 + xk
= xk+1
t2
m
Solving this for xt+2 we get
xk+2 =



K
2 t2 xk+1 xk
m

Or in matrix form:

where p =

K 2
t .
m

xk+1
xk+2

0
1
1 2 p



xk
xk+1

352

8. Calculus and Linear Algebra

As an example of this discrete model suppose that K/m = 1 and t = .8. We then have the
dynamical system


0
1
xk+1 =
xk
1 1.36

To actually use this to compute values we would need x0 which would require knowledge of
the position of the mass at times t = 0 and t = .8 (i.e, at k = 0 and k = 1). From previous
results we know that the position of the object is described by x = sin t, soat t = .8 we have
0
x1 = sin(.8) = .71736. The initial state of the dynamical system is then x0 =
. Repeated
.71736
multiplication by A gives the following values:










0
.71736
.97560
.60947
.14673

.71736
.97560
.60947
.14673
.80902








.80902
.95354
.48779
.29014


.95354
.48779
.29014
.88238

If we draw a time plot of the discrete system along with the solution of the continuous model we
get Figure 3.56. The values given by the discrete system also lie on a sine wave but at a slightly
different frequency from that of the continuous time solution.
1
0.8
0.6
0.4
0.2
0
0.2
0.4
0.6
0.8
1

8 10 12 14 16

Figure 8.4: Plots of the discrete time and continuous time solutions with t = .8. The horizontal
axis is indexed by k not by t.
The continuous solution has a period of 2. What is the period of the discrete solution? The
characteristic polynomial would be 2 1.36 + 1 = 0 giving eigenvalues of

1.36 1.362 4
= .68 .7333212i
2
These complex eigenvalues have magnitude 1, and correspond to a rotation of arccos(.68) at each
2
multiplication by A. One complete cycle would therefore require
steps. Since each step
arccos(.68)
is .8 seconds the total period is
.8

1.6
2
=
6.107342014
arccos(.68)
arccos(.68)

This is slightly less than the period of the continuous time model which had a period of 2
6.283185308.
Problem. You should realize that the finite difference approximation that we are using to
generate our linear dynamical system becomes more exact as the time interval becomes smaller.

8.3. An Oscillating Spring

353

Compute the period of the discrete time solution for t = .4, and t = .1 and compare the result
with the period of the continuous time solution.
Here are the plots of these solutions for t = .4, and t = .1.
1
0.8
0.6
0.4
0.2
0
0.2
0.4
0.6
0.8
1

8 10 12 14 16

Figure 8.5: Plots of the discrete time and continuous time solutions with t = .4.
1
0.8
0.6
0.4
0.2
0
0.2
0.4
0.6
0.8
1

8 10 12 14 16

Figure 8.6: Plots of the discrete time and continuous time solutions with t = .1.

354

8. Calculus and Linear Algebra

Exercises
1. In our analysis of the oscillating spring-mass system we ignored gravity. How would the analysis
change if we include gravity.
2. The dynamical system which modelled the spring-mass system ignored friction. We can modify the
system as follows to include the effect of friction:

 


0
1
xk
xk+1
=
xk+1
q1 2pq
xk+2
Here the parameter q is a (small) positive value that models the presence of friction.
(a) What are the magnitudes of the (complex) eigenvalues of this system?


0
(b) Let p = .64 and x0 =
. Use Maple to draw time plots of this system for q =
.717
0.1, 0.2, 0.3, . . . , 0.9, 1.0. At what value of q do the eigenvalues become real? How does the
behavior of the system change at that point?
3. Set up the differential equations for a system of two equal masses connected by three springs to
two fixed supports. Assume the springs all have the same spring constant.

8.4

Differential Equations and Linear Algebra

We have already looked at dynamical systems of the form xk+1 = Axk . Dynamical systems of
this type are sometimes called discrete dynamical systems because the time variable (k) evolves
in steps of some fixed finite size. There is another fundamental way of modeling systems which evolve
over time using continuous dynamical systems which are described by differential equations.
The simplest example of a continuous linear dynamical system would be an equation like
dx
= .06x
dt
The left side of this equation is a derivative which represents the instantaneous rate of change of x
with respect to time, t. The equation says that this rate of change is equal to 6% of the value of x.
The solution to this equation would be
x = Ce.06t
Checking this by taking the derivative we would get
dx
= Ce.06t (.06) = x(.06) = .06x
dt
One key idea here is that the set of all possible solutions of the above differential equation can
be seen as a 1 dimensional vector space2 with basis e.06t .
We now show how the same dynamical system can be modeled as a discrete system.For simplicity
we will choose some specific value for C, say C = 10. We would then have the solution x = 10e.06t .
If we let x0 stand for the value of x at time 0 we have x0 = 10. Choose a time interval, say t = .5
2 As

a generalization of this example the solution of

consisting of all scalar multiples of ekt .

dx
= kx would be x = Cekt a one dimensional vector space
dt

8.4. Differential Equations and Linear Algebra

355

and let xk be the value of x at time t = .5k (that is, k time intervals after t = 0). Then the derivative
can be approximated by a difference quotient
xk+1 xk
= .06xk
.5
Solving this for xk+1 we get
xk+1 = 1.03xk
This would now be a discrete time approximation to the original continuous time system. If we start
off with x0 = 10 and use this difference equation we get the following values
x0
x1
x2
x3
x4
x5
x6

10
10.3
10.609
10.927
11.255
11.595
11.941

Now remember that x6 is the value after 6 time intervals which corresponds to t = 3. In the
continuous time model the value of x at t = 3 would be 10e.06(3) = 11.972. So the continuous and
the discrete model DO NOT AGREE. Here is a plot of the continuous time model and the values of
the discrete time model. The gap between the discrete and continuous models will generally increase

18

16

14

12

10
0

10

Figure 8.7: Comparison of the continuous and discrete models with t = .5.
as time goes on. At t = 10 the difference between the two models would be
10e.06(10) 10(1.03)20 .16008
In general, as the time interval gets smaller the solution given by the discrete model gets closer
to the solution of the continuous model. To illustrate this we show the plots that would result if we
used a larger time interval of t = 1 and a smaller time interval of t = .2.
By choosing a smaller time interval the discrete model becomes a better approximation to the
continuous model. When we use a time interval of t = .2 the difference between the two models

356

8. Calculus and Linear Algebra

18

18

16

16

14

14

12

12

10

10
0

10

10

Figure 8.8: Comparison of the continuous Figure 8.9: Comparison of the continuous
and discrete models with t = 1.
and discrete models with t = .2.
at t = 10 would be .06496 (approximately one third of what it was with an interval of .5). When
the time interval is t = 1 the difference between the two models at t = 10 would be .31271.
There are two drawbacks to using very small intervals. First, due to the finite precision of
computers there is a limit as to how small the interval can be and the smaller the interval the more
serious the rounding errors will be. Second, the smaller the time interval the greater the amount of
data that will be produced. For example, if you chose t to be a millionth of a second, it would
take a million steps to compute just one second of the solution.

Systems of Two Differential Equations


Next suppose we had a system of differential equations like
dx1
dt
dx2
dt

= .01x1
=

.07x2

This system would be easy to solve based on the earlier example. The solution here would be
x1 = C1 e.01t
x2 = C2 e.07t
That was pretty simple, but now look at a more complicated example
dx1
dt
dx2
dt

=
=

x1
x1

+2x2
+4x2

The big difference in this example is that the variables are coupled. That is the formula for the
rate of change of x1 involves the values of both x1 and x2 , and similarly for the rate of change of x2 .
To solve this problem we want to uncouple the equations, and this will just involve diagonalizing
a matrix.
 
x
We begin by letting x = 1 and then we have
x2
 
 

 
dx
1 2 x1
dx1 /dt
x1 + 2x2
= Ax
=
=
=
1 4 x2
x1 + 4x2
dx2 /dt
dt

8.4. Differential Equations and Linear Algebra

357

Thematrix
 A in this case can
 bediagonalized in the usual way. The matrix that diagonalizes A
2 1
2 0
1
is P =
and P AP =
. So we introduce a new variable
1 1
0 3
 
y
y = 1 = P 1 x
y2
We then get
dx
dt
dx
dt
1 dx
P
dt
dy
dt
In this case that leaves us with

Ax

P DP 1 x

P 1 P DP 1 x

Dy

=
=

2y1

dy1
dt
dy2
dt

3y2

The equations have been uncoupled. The solution now is simple:


y1 = C1 e2t
y2 = C2 e3t
But this is the solution in terms of our new variables. What is the solution in terms of the original
variables. For this we just evaluate the following



2 1 C1 e2t
x = Py =
1 1 C2 e3t
So
x1 = 2C1 e2t + C2 e3t
x2 = C1 e2t + C2 e3t
This solution can be written as
x = C1

 3t 

e
2e2t
+ C2 3t
e
e2t

When the solutions


 are
 written
 3tthis
 way it is easier to see that they form a two dimensional vector
2e2t
e
space with basis
and 3t .
e2t
e
Now how can the solutions be visualized. First, to use a specific example lets choose C1 = 1
and C2 = 1. Then we can draw time plots of the solutions where we view x1 and x2 as functions
of time. This would give
We can also draw a phase plot where we plot x1 values against the corresponding x2 values.
This gives In the phase plot we see the origin as a repellor.
Can we model this system as a discrete-time system? First, to simplify the notation 3 , we will
introduce new variables: let a = x1 and b = x2 .Well, if we let t = .1 then using a similar argument
as our earlier example we get
3 If

we keep the variable x1 , it is most common to represent the value of this variable at time t by xt1

358

8. Calculus and Linear Algebra

10
8
6
4
2
2

0
2

1
t

4
6
8
10
Figure 8.10: The time plots of x1 = 2e2t e3t and x2 = e2t e3t

6
4
x2
2

2 x1 4

2
4
6
Figure 8.11: The phase plot of x1 = 2e2t e3t and x2 = e2t e3t .

Appendix A

Linear Algebra with Maple


We will summarize the basics of using the LinearAlgebra package to do linear algebra. For all the commands in this
section it will be assumed that you have entered the following command to load the LinearAlgebra package.
>with(LinearAlgebra):

Defining Vectors in Maple


There are several ways of defining a vector in Maple . Suppose we have the vector
2 3
4
637
7
u=6
405
8

The easiest way of defining this vector in Maple is


>u:=<4, 3, 0, 8>;

Note: You have to enclose the vector entries in angled brackets, < and >.
The Vector command also allows you to define a vector by giving a rule to generate the nth entry in the vector.
This method of defining a vector requires two input parameters. The first parameter is the size of the vector. The
second is the rule to generate entries in the vector.
>u:=Vector(4, n->n^2);
u=[1, 4, 9, 16]
>v:=Vector(5, j->t/(j+1));
v=[t/2, t/3, t/4, t/5, t/6]
A vector is a one-dimensional array in Maple which means individual entries in the vector can be accessed by
specifying the index of that entry as shown below
>u:=<x,y,x*y,-1>:
>u[2];
y
>u[4];
-1
>u[1]-u[3];
x - xy
>v:=Vector(4, n->u[5-n]);
v=[-1, xy, y, x]

359

360

A. Linear Algebra with Maple

Defining Matrices in Maple


There are many ways of defining a matrix in Maple . Suppose we have the matrix
2
3
1 0
A = 43 55
8 2

Either of the first two following commands could be used to define this matrix.
>A:=<<1|0>,<3|5>,<8|2>>;
>A:=<<1,3,8>|<0,5,2>>;
><A|A>;
><A,A>;

#### row by row


#### column by column

A matrix is a two-dimensional array in Maple and each entry in the matrix can be accessed by specifying the
two indices (row and column) of the entry. For example
>A:=<<2,0,8>|<5,1,4>>;
2
2
40
8

>A[2,1];

3
5
15
4

0
>A[3,1]*A[1,2];
40
Matrices can also be generated by giving a rule which generates thye entries. You first specify the size of the
matrix and then give the rule
>B:=Matrix(2,3, (i,j)->i/j);
>C:=Matrix(2,2, (i,j)->t^(i*j));
B=

1
2

C=

1/2
1

t
t2

1/3
2/3

t2
4
t

Patterned Matrices
Some matrices have entries that fall into
matrices:
2
4
60
6
B=4
0
0

a particular pattern. For example, the following matrices are square diagonal

2
1 0
60 1
6
C = 60 0
40 0
0 0
The maple command DiagonalMatrix can be used for this type of matrix.
the entries on the diagonal of the matrix. So we could define B and C as
0
8
0
0

0
0
3
0

3
0
0 7
7
0 5
1

3
0 0 0
0 0 07
7
1 0 07
0 2 05
0 0 5
With this command you just have to input

>B:=Diagonalmatrix(<4,8,3,-1>);
>C:=DiagonalMatrix(<1,1,1,2,5>);
The 10 10 identity matrix could be defined as
>I10:=IdentityMatrix(10);
Note how the $ sign is used to create a sequence of ten 1s.
Another type of patterned matrix is called a band matrix. The following are examples of band matrices:
3
2
b c d 0 0 0
3
2
4
1
0
0
0
6a b c d 0 07
4
1
0
07
7
6
62
7
60 a b c d 07
6
B2 = 6
2
4
1
07
B1 = 6 0
7
60 0 a b c d7
4 0
0
2
4
15
40 0 0 a b c5
0
0
0
2 4
0 0 0 0 a b

In Maple we can enter

361
>B1:=BandMatrix(<-2,4,1>,1, 5);
>B2:=BandMatrix(<0,a,b,c,d>,1,6);
The BandMatrix command requires three inputs. The first input must be a vector containing the entries down the
band. The next entry specifies how many diagonal bands extend below the main diagonal. The third entry (or third
and fourth) specifies the size of the matrix.

Solving Systems of Equations


As usual with Maple there are many ways of solving a system
of equations.
We2 will3 only mention three ways.
2
3
2
1 2 1
Suppose we want to solve the system Ax = b where A = 42 1 25 and b = 4 0 5. In this case A is a square
5
3 4 5
invertible matrix so the solution is given by x = A1 b. The Maple commands for this are as follows:
>A:=<<1,2,3>|<2,1,4>|<1,2,5>>;
>b:=<2,0,-5>:
>sol:=A^(-1).b;
[7/2, 4/3, -25/6]
The method that was just used only works if A is invertible. We can solve any system by setting up the augmented
matrix of the system and then putting it in reduced row echelon form. The last column of the reduced matrix will
contain the solution.
>A:=<<1,2,3>|<2,1,4>|<1,2,5>>;
>b:=<2,0,-5>:
>ReducedRowEchelonForm(A,b):
>col(%, 4);
[7/2, 4/3, -25/6]
>LinearSolve(A,b); ### another option


1 2 2 3
3
Suppose A =
and b =
then Ax = b must have infinitely many solutions. We can find these
2 1 3 4
0
solutions in Maple as follows:
>A:=<<1,2>|<2,1>|<2,3>|<3,4>>:
>b:=[3,0]:
>LinearSolve(A,b);
[-1-4/3*s-5/3*t, 2-1/3*s-2/3*t, s, t]

Matrix and Vector Opertations


The simplest operations on matrices and vectors are scalar multiplication and addition. These two operations
allow you to create linear combinations.
We will use the following matrices and vectors for our examples in this section:
2
2
3
3


2
1
3
3
1
3
4
4
5
A = 1 1 , B = 1 45 , u =
,v=
3
1
0
2
2
1
then we can evaluate 5A, 3u, 2A + 3B, 4u 8v as follows
>A:=<<2|1|1>|<-1|0|2>>:
>B:=<<3|3|-1>|<4|2|1>>:
>u:=<1,3>:
>v:=<3,-1>:
>5*A;
>-3*u;
>2*A+3*B;
>4*u-8*v;
So addition and scalar multiplication are computed using the symbols + and *.
Matrix multiplication is not the same as scalar multiplication and is represented by a different symbol in Maple
. Matrix multiplication is indicated by a dot, that is the . symbol. So if we wanted to compute AB, BA, A2 , Au,
B(u + v) using the same matrices and vectors as above we could enter

362

A. Linear Algebra with Maple


>A.B;
>B.A;
>A.A; ### one way of finding A^2
>A^2;
### an alternate way of finding A^2
>A.u;
>A.(u+v);

Finding the transpose or inverse of a matrix can be found as follows (we show two methods for finding the inverse).
>Transpose(A);
>A^(-1);

### this stands for the inverse of A but does not compute it

For example, using the same matrices as above suppose we want to find a matrix C such that
A(C + B) = B t A
Solving this equation symbolically we would get
C = A1 B t A B
We can then compute this result in Maple
>C:=A^(-1).Transpose(B).A-B;
The dot product can be found in two ways. To find u v we can enter either
>DotProduct(u,v);
>Transpose(u).v);
These two methods result from the equation u v = uT v.
There is a similar command for the cross product. A cross product can be evaluated using
>CrossProduct(<1,2,3>,<4,5,6>);
[-3, 6, -3]
>CrossProduct(<A,B,C>,<X,Y,Z>);
[B*Z-C*Y, C*X-A*Z, A*Y-B*X])

Determinants
A determinant can be computed in Maple with the det command. For example suppose we want to use Cramers
Rule to solve
2x1 + 3x2 + 4x3

3x1 + 2x2 + 3x3

5x1 + 5x2 + 9x3

for x2 .
Cramers Rule says that

In Maple we could do
>a1:=<2,3,5>:
>a2:=<3,2,5>:
>a3:=<4,3,9>:
>y:=<a,b,c>:
>A:=<a1|a2|a3>:
>A2:=<a1|y|a3>:
>Determinant(A2)/Determinant(A);

x2 =

2
3
5

a
b
c

2
3
5

3
2
5

4
3
9
4
3
9

### Cramers Rule for x2

363

Examples
Example 1
We will solve the system
x+y+z

3x 2y + z

4x y + 2z

First we will show how to plot these equations.


>e1:=x+y+z=3:
>e2:=3*x-2*y+z=1:
>e3:=4*x-y+2*z=4:
>plots[implicitplot3d]({e1,e2,e3},x=-4..4,y=-4..4,z=-4..4,axes=boxed,style=patchnogrid,shading=zgrayscale);

4
2
0
4

y
2

0
2

4
2

Figure A.1: The plot of the system.


The plot shows that the three planes making up the system intersect along a line.
We could solve this system by
>solve({e1,e2,e3}, {x,y,z});
{y = 8/5 2/5 z, z = z, x = 7/5 3/5 z}

This result means that z is free and so the solution would correspond to the line
2
2
3
3
8/5
2/5
4 0 5 + t4 1 5
7/5
3/5

We could also solve this system by setting up the augmented matrix and reducing.
>A:=<<1,3,4>|<1,-2,-1>|<1,1,2>|<3,1,4>>;
>ReducedRowEchelonForm(A);
1

3/5

7/5

6
6 0
4

2/5

7
8/5 7
5

It should be clear that this reduced form gives the same solution as the previous method.

364

A. Linear Algebra with Maple

Example 2
Given

2 3
2 3
2 3
2 3
1
2
3
4
627
637
647
657
6
6
7
6
7
6
7
v1 = 4 5 , v2 = 4 5 , v3 = 4 5 , v4 = 4 7
3
4
5
65
4
5
6
7

Find a basis for Span (v1 , v2 , v3 , v4 ) from among hese vectors.


We can solve this in Maple as follows
>v1:=<1,2,3,4>:
>v2:=<2,3,4,5>:
>v3:=<3,4,5,6>:
>v4:=<4,5,6,7>:
>A:=<v1|v2|v3|v4>:
>ReducedRowEchelonForm(A);
1

6
6 0
6
6
6 0
4

3
7
7
7
7
7
5

Maple has done the computation but it is up to us to give the correct interpretation to this result. In this case
we see that two columns of the reduced form contain pivots. The corresponding columns of the original matrix would
be the basis we are looking for. So our basis is {v1 , v2 }.

Example 3
Find all 2 2 matrices satisfying A2 = 0.
We start by defining
A=

a
c

b
d

>A:=<<a,c>|<b,d>>:
>B:=A^2:

B=

"

a2 + bc

ab + bd

ca + dc

bc + d2

Now we want each entry in B to equal 0. The next line shows haw we can refer to these entries in Maple and
have Maple solve the desired equations.
>solve( {B[1,1]=0, B[1,2]=0, B[2,1]=0, B[2,2]=0}, {a,b,c,d} );

{c = 0, d = 0, b = b, a = 0} ,

c = c, d = d, a = d, b =

d2
c

This result means that there are two basic solutions. If c = 0 then there is a solution of the form

0 b
0 0
where b is free.
If c 6= 0 then there is a solution of the form
"
d
c
where c and d are free.

dc
d

365

Example 4
For what values of a and b do the vectors

2 3 2 3 2 3
1
2
1
425 , 415 4a5
2
a
b

from a basis of R3 ?
We will illustrate two methods of answering this question.
>A:=<<1,2,2>|<2,1,a>|<1,a,b>>:
>GaussianElimination(A);
1
6
6 0
4
2

In order for these vectors to be a basis of


We can state this condition as

a2

b + 2/3 + 1/3 a2 2 a

R3

3
7
7
5

we need the entry in the third row, third column to be non-zero.

2
1
b 6= a2 + 2a
3
3

We could also do the following


>Determinant(A);

For these vectors to be a basis of


first method.

R3

3 b a2 + 6 a 2

we want the determinant to be non-zero. This would give the same result as the

366

A. Linear Algebra with Maple

Appendix B

Complex Numbers
Consider the equation x2 + 1 = 0. If you try to solve this equation,
the first step would be to isolate the x2 term giving

2
x = 1. You would then take the square
root and get x = 1. Algebraically this would be the solution (or rather
one of the solutions, the other being 1). However there is no real number which satisfies this condition since
when you square as a real number the result can never be negative. In the 16th century mathematicians introduced
the symbol i to represent this algebraic solution, and referred to this solution as an imaginary number. In general,
an imaginary number is any real multiple of i.
A complex number is a number that is the sum of a real number and an imaginary number. A complex number
is usually represented as a + bi where a and b are real numbers. In this notation a is referred to as the real part, and
b is referred to as the imaginary part of the complex number. There are special symbols that are commonly used to
refer to the real and imaginary parts of a complex number. If z is a complex number then z indicates the real part
of z and z indicates the imaginary part of z.
Complex numbers satisfy the usual rules of addition and multiplication. The one complication is that any
occurrence of i2 can be replaced by 1. Look at the following computations for example :
(2 + 5i) + (7 2i) = 9 + 3i
(2 + 5i)(7 2i) = 14 4i + 35i 10i2 = 14 + 31i 10(1) = 24 + 31i
i3 = i2 i = 1 i = i

Geometry of Complex Numbers


A correpondance can be set up between complex numbers and points in the plane. The real part gives the
horizontal coordinate, and the imaginary part gives the vertical coordinate. So, for example, the complex number
3 + 2i would correspond to the point (3, 2). A purely real number would lie somewhere on the horizontal axis and a
purely complex number would lie on the vertical axis. When plotting complex numnbers in this way it is standard to
call the horizontal axis the real axis and the vertical axis the imaginary axis. If we associate points in the plane with
position vectors (that is, vectors whose starting point is the origin), then adding complex numbers is like adding the
corresponding vectors. Multiplying a complex number by a real number is like multiplying the vector by a scalar.
Given a complex number z = a + bi, the complex conjugate of that number is z = a bi. So the conjugate of
a complex number is formed by changing the sign of the imaginary part. Geometrically, the conjugate of z is the
mirror image of z through the real axis. Notice that z = z if and only if zis purely real. Two basic properties of the
conjugate are:
z1 + z2

z1 + z2

z1 z2

z1 z2

We will give a proof of the second of these properties. Let z1 = a + bi and z2 = c + di, then
z1 z2

(a + bi)(c + di)

ac + adi + bci + bdi2

ac bd + (ad + bc)i

367

368

B. Complex Numbers
IMAGINARY AXIS

a+bi

REAL AXIS

a-bi

Figure B.1: The points a + bi and a bi in the complex plane


and so we have
z1 z2

=
=
=
=

(a bi)(c di)

ac adi bci + bdi2


ac bd (ad + bc)i

z1 z2

The above result can be generalized to matrices and vectors with complex entries. For a complex matrix A and
complex vector x we have:
Ax = Ax
Or, more particularly, if Ax = x then Ax = x. From this it follows that is A has only real entries then Ax = x.
In other words, if A has only real entries and has complex eigenvalues then the eigenvalues and eigenvectors come in
complex pairs. In other words, if is an eigenvalue then so is , and if x is an eigenvector (corresponding to ) then
x is an eigenvector corresponding to .
Another important property of the conjugate is that if z = a + bi then
zz = (a + bi)(a bi) = a2 abi + abi b2 i2 = a2 + b2
which you should recognize as the distance of z from the origin squared (or the length of the corresponding vector
squared).
This distance is called the magnitude (or length, or absolute value) of the complex number and written

|z| = z z. This equation has an important consequence when dealing with complex vectors: recall that if v is a real
2
T v.1
vector then kvk2 = vT v. But if
v is a complex vector then kvk = v
1
For example, suppose v =
then
i

1
vT v = 1 i
= 12 + i2 = 1 1 = 0
i
which would clearly be incorrect for the length. But


1
i
= 12 i2 = 1 + 1 = 2
i

Taking the square root we then get the correct length, kvk = 2.

vT v = 1

1 The conjugate of the transpose of a complex matrix A is usually written A . So if v is a complex vector then
kvk2 = v v. This equation is also valid for real vectors since v = vT if all the entries are real.

369
The conjugate also has some use with division of complex numbers. To rationalize the denominator of a complex
fraction means to eliminate any imaginary terms from the denominator. This can be done by multiplying the numerator
and denominator of the fraction by the conjugate of the denominator. For example:
1+i
(1 + i)(2 i)
3+i
3
1
=
=
= + i
2+i
(2 + i)(2 i)
5
5
5

Polar Representation of Complex Numbers


Any complex number (or, more generally, any point in the plane) can be characterized by the distance of the
point from the origin, $r$, and the angle measured from the positive $x$ axis, $\theta$. So, for example, the complex number
$1 + i$ has $r = \sqrt{2}$ and $\theta = \pi/4$. If we square this complex number we get $(1+i)^2 = 1 + 2i + i^2 = 2i$. In this case the
value of $r$ would be 2 and $\theta$ would be $\pi/2$.
In general, if a complex number lies at a distance $r$ and an angle $\theta$, the real coordinate would be given by $r\cos\theta$
and the imaginary coordinate would be $r\sin\theta$. So this complex number could be written as $r\cos\theta + ir\sin\theta =
r(\cos\theta + i\sin\theta)$.
There is another important notation for complex numbers that is related to the idea of power series. From calculus
you should recall that
$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \cdots$$
Substituting $x = i\theta$ into the above and simplifying we get

$$e^{i\theta} = 1 + i\theta + \frac{(i\theta)^2}{2} + \frac{(i\theta)^3}{6} + \frac{(i\theta)^4}{24} + \cdots
= 1 + i\theta - \frac{\theta^2}{2} - \frac{i\theta^3}{6} + \frac{\theta^4}{24} + \cdots$$
The real part of this last expression is $1 - \frac{\theta^2}{2} + \frac{\theta^4}{24} - \cdots$, which is the power series for $\cos\theta$. The imaginary part
is $\theta - \frac{\theta^3}{6} + \frac{\theta^5}{120} - \cdots$, which is the power series for $\sin\theta$. As a result we get what is called Euler's Formula:
$$e^{i\theta} = \cos\theta + i\sin\theta$$

As a result we have the fact that any complex number can be represented as $re^{i\theta}$. The conjugate of this complex
number would be $re^{-i\theta}$. The value of $r$ is just the magnitude of the complex number, and the angle $\theta$ is
called the argument of the complex number.
This notation makes one important aspect of multiplication of complex numbers easy to see. Suppose we have
a complex number $z_1 = r_1e^{i\theta_1}$. This point is located at a distance $r_1$ from the origin and at an angle $\theta_1$ from the
positive real axis. Now suppose we multiply this complex number by another complex number $z_2 = r_2e^{i\theta_2}$. We get
$z_1z_2 = r_1e^{i\theta_1}r_2e^{i\theta_2} = r_1r_2e^{i(\theta_1+\theta_2)}$. What has happened to the original complex number? Its length has been scaled
by $r_2$ and the angle has been rotated to $\theta_1 + \theta_2$. In other words, multiplication by a complex number can be seen as a
combination of a scaling and a rotation.
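This behaviour is easy to check numerically. In the hypothetical Maple snippet below, $z_1$ has magnitude 2 and argument $\pi/6$ while $z_2$ has magnitude 3 and argument $\pi/3$, so their product should have magnitude 6 and argument $\pi/2$, that is, it should equal $6i$.
>z1:=2*exp(I*Pi/6): z2:=3*exp(I*Pi/3):
>evalc(z1*z2);   # magnitudes multiply, arguments add: 6*exp(I*Pi/2) = 6*I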

Roots of Unity
Suppose you have the equation $z^3 = 1$. One solution is clearly $z = 1$. This is the real cube root of 1, but there are
two other complex solutions. If we write $z = re^{i\theta}$, then we want $r^3e^{i3\theta} = 1 = e^{i2\pi N}$ for any integer $N$. This implies
that $r = 1$ and that $3\theta = 2\pi N$. We then have $\theta = \frac{2\pi N}{3}$, and this gives three different solutions $\theta = 0,\ 2\pi/3,\ -2\pi/3$.
(All the other values of $\theta$ would be coterminal with these angles.) If we plot these points in the complex plane along
with the unit circle we get the following:
In general, if we want the $N$th roots of 1 we can start with $w = e^{i2\pi/N}$. Then $w^N = \left(e^{i2\pi/N}\right)^N = e^{i2\pi} = 1$, so
$w$ is an $N$th root of 1. Then $w^k$ is also an $N$th root of 1 for any integer $k$. Thus $1, w, w^2, w^3, \ldots, w^{N-1}$ are the $N$ roots. By earlier
remarks, these will be evenly spaced points on the unit circle.
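For instance (a hypothetical Maple illustration), the fifth roots of 1 can be generated as powers of $w = e^{i2\pi/5}$ and evaluated numerically; plotting them with complexplot, as is done for $z^{20} = 1$ at the end of this appendix, shows five evenly spaced points on the unit circle.
>w:=exp(2*Pi*I/5):            # a primitive 5th root of 1
>seq(evalf(w^k),k=0..4);      # the five 5th roots of 1 as floating point complex numbers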


Figure B.2: The cube roots of 1.

Exercises
1. Let $z_1 = 2 + i$ and $z_2 = 1 + 2i$. Find
(a) $z_1z_2$
(b) $z_1\bar{z}_1$
(c) $z_2^2$

2. Let $z = 1 + \sqrt{3}\,i$.
(a) Write $z$ in the form $re^{i\theta}$.
(b) Write $\bar{z}$ in the form $re^{i\theta}$.
(c) Write $z^2$ in the form $re^{i\theta}$.
(d) Write $z^6$ in the form $re^{i\theta}$.
3. Find all solutions of $z^3 = 1$. Do this by rewriting the equation as $z^3 - 1 = 0$. Then factor the left hand side:
$(z-1)(z^2 + z + 1) = 0$. You should get 3 solutions. Give your solutions in both the standard form $a + bi$
and in exponential form $re^{i\theta}$.
4. Find all four solutions to $z^4 = 1$.
5. Start with the equation $e^{i\theta} = \cos\theta + i\sin\theta$. Square both sides of this equation. Use this result to find
trigonometric identities for $\cos 2\theta$ and $\sin 2\theta$.
6. Show that $\dfrac{1}{z} = \dfrac{\bar{z}}{|z|^2}$ for any complex number $z \neq 0$.
7.
(a) Find $|e^i|$ and $|i^e|$.
(b) Plot the two points $e^i$ and $i^e$ in the complex plane.


Using MAPLE
We will use Maple to illustrate some of the aspects
of complex numbers discussed in this section.

In Maple the symbol I is used to stand for $\sqrt{-1}$. In the following example we will begin by defining the complex
number $z = 7.4 + 3.2i$.
>z:=7.4+3.2*I:
>abs(z);
8.062257748
>Re(z);
7.4
>Im(z);
3.2
>conjugate(z);
7.4-3.2*I
>conjugate(z)*z;
65.00
>sqrt(conjugate(z)*z);
8.062257748
>convert(z,polar);
polar(8.062257748,.4081491038)
>8.062257748*exp(.4081491038*I);
7.399999999+3.200000000*I
>argument(z);
.4081491038
>convert(z^2,polar);
polar(65.00, .816298)
The command abs(z) computes $|z|$, the magnitude of $z$.
The commands Re and Im return the real and imaginary parts of a complex number.
The conjugate command returns $\bar{z}$. Notice that the product conjugate(z)*z returns the square of abs(z).
The command convert(z,polar) returns the values of $r$ and $\theta$ required to write $z$ in the form $re^{i\theta}$. The following
command computes this exponential form and returns the original $z$ (with some rounding error). Notice that the values
returned by convert(z^2,polar) show that when $z$ is squared the magnitude gets squared and the argument gets doubled.
The Maple command argument(z) will return just the argument of z.
Next we will use Maple to illustrate Eulers Formula.
>f:=exp(I*t);
>plot([Re(f), Im(f)],t=-9..9,linestyle=[1,2],thickness=2);
This gives Figure B.3.
You should understand where these plots came from. Since $e^{it} = \cos t + i\sin t$, plotting the real and imaginary parts
results in plots of a cosine and sine function.
Compare the above with the following:
>w:=.3+.9*I;
>g:=exp(w*t);
>plot([Re(g), Im(g)],t=-9..9,linestyle=[1,2],thickness=2);
This gives Figure B.4.
To understand this result notice that we have
$$e^{(.3+.9i)t} = e^{.3t}e^{.9it} = e^{.3t}(\cos(.9t) + i\sin(.9t)) = e^{.3t}\cos(.9t) + ie^{.3t}\sin(.9t)$$
So plotting the real and imaginary parts returns a cosine and sine function but now they are being scaled by a function
which is increasing exponentially.
For one last example we will use Maple to plot the solutions to $z^{20} = 1$ (that is, to plot the 20 twentieth roots of
1). The first command below uses Maple to compute the roots and place them in a list called sols. The second line uses
the complexplot procedure in Maple, which can be used to plot a list of complex numbers.


Figure B.3: The real and imaginary parts of $e^{it}$.



Figure B.4: The real and imaginary parts of $e^{(.3+.9i)t}$.


>sols:=[solve(z^20=1,z)];
>plots[complexplot](sols,style=point);
This gives Figure B.5.


Figure B.5: The solutions to $z^{20} = 1$.

Appendix C

Linear Transformations
Let U and V be vector spaces and let T be a transformation (or function, or mapping) from U to V . That is,
T is a rule that associates each vector, u, in U with a unique vector, T (u), in V . The space U is called the domain
of the transformation and V is called the co-domain. The vector T (u) is called the image of the vector u under
transformation T .
Definition 23 A transformation is linear if:
1. T (u + v) = T (u) + T (v) for all u and v in the domain of T .
2. T (cu) = cT (u) for all u in the domain of T and all scalars c.

The combination of the two properties of a linear transformation implies that


$$T(c_1v_1 + c_2v_2 + \cdots + c_nv_n) = c_1T(v_1) + c_2T(v_2) + \cdots + c_nT(v_n)$$
for any set of vectors, vi , and scalars, ci .
We will just make a few observations about linear transformations.
Theorem C.1 If T is a linear transformation then T (0) = 0.
Proof Let T : U V be a linear transformation and let u be any vector in U , then
$$T(0_U) = T(u - u) = T(u) - T(u) = 0_V$$
In the above $0_U$ stands for the zero vector in $U$ and $0_V$ is the zero vector in $V$.

Theorem C.2 If $T$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ then $T(u) = Au$ for some $m \times n$ matrix $A$.
Proof Let $u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$ be any vector in $\mathbb{R}^n$, then
$$\begin{aligned}
T(u) &= T(u_1e_1 + u_2e_2 + \cdots + u_ne_n) \\
&= u_1T(e_1) + u_2T(e_2) + \cdots + u_nT(e_n) \\
&= \begin{bmatrix} T(e_1) & T(e_2) & \cdots & T(e_n) \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \\
&= Au
\end{aligned}$$


The matrix $A$ in the above theorem is called the standard matrix of the linear transformation $T$. The
above proof in fact gives a method for finding the matrix $A$. The proof shows that the columns of $A$ will be the images
of the standard basis under the transformation.
For example, suppose $A = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 2 & 1 \end{bmatrix}$ and $u = \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}$. The linear transformation $T(x) = Ax$ would be from $\mathbb{R}^3$
to $\mathbb{R}^2$. This is sometimes written $T : \mathbb{R}^3 \to \mathbb{R}^2$. The image of $u$ under this transformation would be
$$T(u) = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 2 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ 10 \end{bmatrix}$$
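The same computation can be checked in Maple (a hypothetical snippet using the entries above; the dot operator performs the matrix–vector product).
>A:=Matrix([[1,1,2],[2,2,1]]):
>u:=Vector([3,1,2]):
>A.u;    # the image T(u); similarly A.e_i gives the i-th column of A, the image of the i-th standard basis vector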
So any linear transformation from Rn to Rm is equivalent to a matrix multiplication. What happens with other
vector spaces? There are many familiar operations which qualify as linear transformations. For example, in vector
spaces of differentiable functions the operation of finding a derivative is a linear transformation because
$$(f + g)' = f' + g' \qquad\qquad (cf)' = cf'$$
where f and g are functions and c is a scalar.
Or in the vector spaces of matrices, taking the transpose is a linear transformation because
$$(A + B)^T = A^T + B^T \qquad\qquad (cA)^T = cA^T$$
When you take the determinant of a matrix the inputs are square matrices and the outputs are real numbers, so
computing a determinant is a transformation from the vector space of $n \times n$ matrices to $\mathbb{R}^1$, but it is not linear since
$$\det(A + B) \neq \det(A) + \det(B) \qquad\qquad \det(cA) \neq c\det(A)$$
It turns out that we can say something specific about linear transformations between finite-dimensional vector
spaces:
Suppose T is a linear transformation where the domain and co-domain are both finite dimensional vector spaces.
In this case if we represent each vector by coordinates in terms of some basis then the vector spaces will look like Rn
for some value of n (the dimension of the spaces).

For example, suppose we had $T : P_3 \to P_3$ defined by $T(p(x)) = p'(x)$. If we use a basis $1, x, x^2, x^3$ then the
polynomial $c_0 + c_1x + c_2x^2 + c_3x^3$ would be represented by $\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}$ and $T(p) = c_1 + 2c_2x + 3c_3x^2$ would be represented
by $\begin{bmatrix} c_1 \\ 2c_2 \\ 3c_3 \\ 0 \end{bmatrix}$, and this transformation would be equivalent to multiplying by the matrix
$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
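As a quick check (a hypothetical Maple snippet, not part of the original text), multiplying this matrix by the coordinate vector of a general cubic reproduces the coordinates of its derivative.
>Dmat:=Matrix([[0,1,0,0],[0,0,2,0],[0,0,0,3],[0,0,0,0]]):  # matrix of T relative to the basis 1, x, x^2, x^3
>c:=Vector([c0,c1,c2,c3]):                                 # coordinates of c0+c1*x+c2*x^2+c3*x^3
>Dmat.c;                                                   # returns the vector with entries c1, 2*c2, 3*c3, 0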
It is also possible for one or both of the domain and co-domain to be infinite dimensional and in this case the
transformation is usually not represented by a matrix multiplication. But even here it is possible. Suppose for example
we had an infinite dimensional vector space where the transformation is just a shift in the coordinates, i.e.
$$T : \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \end{bmatrix} \mapsto \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \end{bmatrix}$$
This could be seen as multiplication by the matrix
$$\begin{bmatrix} 0 & 1 & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & \cdots \\ 0 & 0 & 0 & 1 & \cdots \\ \vdots & & & & \ddots \end{bmatrix}$$
In this case the matrix would have an infinite number of rows and columns.
Finally we point out why they are called linear transformations.

Theorem C.3 If T : U V is a linear transformation and L is a straight line in U , then T (L) is either a straight
line in V or a single point in V .
Proof Any straight line L in U must have an equation of the form x = u0 + tu1 . This is a line through u0 in the
direction of u1 . If we apply T to this line we get:
$$T(L) = T(u_0 + tu_1) = T(u_0) + T(tu_1) = T(u_0) + tT(u_1)$$
This result can be seen as a line through T (u0 ) in the direction of T (u1 ). If T (u1 ) = 0 then the transformation gives
just a single point.

You have to be careful in interpreting the above. For example, in the vector space of differentiable functions
the expression t sin x would correspond to a straight line through the origin. The points on this line would be
expressions such as 2 sin x, 3 sin x, 3.7 sin x.

It is a straight line because it corresponds to all scalar multiples of a vector. The usual plot of sin x as a
waveform is totally irrelevant in this case.
The origin in this case is not the point (0,0). The origin would be the zero function, f (x) = 0.

As pointed out earlier, taking the derivative of a function is a linear transformation. If we apply this linear transformation to this line (by differentiating with respect to x) we get t cos x, which is another straight line.
Heres another example. The expression t sin x+(1t) cos x gives a straight line in the vector space of differentiable
functions. The points in this space are functions. When you plug in t = 0 you get cos x. When you plug in t = 1
you get sin x. So this is a straight line passing through the points cos x and sin x. This type of abstraction is one of
the basic features of higher mathematics. Here we have taken a simple, intuitive geometric idea from R2 (the idea of
a line through two points) and extended it to an abstract space.


Appendix D

Partitioned Matrices
Suppose you have the $5 \times 5$ matrix $A = [a_{ij}]$. This matrix can be partitioned, for example, as follows:
$$A = \left[\begin{array}{ccc|cc}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ \hline
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\
a_{51} & a_{52} & a_{53} & a_{54} & a_{55}
\end{array}\right]
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$
The entries in $A$ can be divided into a group of submatrices. In this example $A_{11}$ is a $3 \times 3$ matrix, $A_{12}$ is a $3 \times 2$
matrix, $A_{21}$ is a $2 \times 3$ matrix, and $A_{22}$ is a $2 \times 2$ matrix. (This would not be the only way of partitioning $A$. Draw
any collection of horizontal and vertical lines through the matrix and you can create a partition.)
For another example let $I_3$ be the $3 \times 3$ identity matrix. The following are all ways of partitioning $I_3$:
$$\begin{bmatrix} e_1^T \\ e_2^T \\ e_3^T \end{bmatrix} \qquad\qquad \begin{bmatrix} e_1 & e_2 & e_3 \end{bmatrix} \qquad\qquad \begin{bmatrix} 1 & 0 \\ 0 & I_2 \end{bmatrix}$$

The important thing about partitioned matrices is that if the partitions have compatible sizes then the usual rules
for matrix addition and multiplication can be used with the partitions. For example we could write

$$A + B = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} + \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} A_{11}+B_{11} & A_{12}+B_{12} \\ A_{21}+B_{21} & A_{22}+B_{22} \end{bmatrix}$$
if the various submatrices have compatible sizes for the additions to be defined (i.e., A11 and B11 must have the same
size, etc.)
Similarly we could write

$$AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} A_{11}B_{11}+A_{12}B_{21} & A_{11}B_{12}+A_{12}B_{22} \\ A_{21}B_{11}+A_{22}B_{21} & A_{21}B_{12}+A_{22}B_{22} \end{bmatrix}$$
provided that all the subsequent multiplications and additions are defined.
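The block-multiplication rule is easy to spot-check numerically. The illustrative Maple snippet below (the random matrices and the particular 3/2 split are just for demonstration) partitions two $5 \times 5$ matrices after the third row and column and compares the $(1,1)$ block of $AB$ with $A_{11}B_{11} + A_{12}B_{21}$; the comparison should return true.
>with(LinearAlgebra):
>A:=RandomMatrix(5,5): B:=RandomMatrix(5,5):
>A11:=A[1..3,1..3]: A12:=A[1..3,4..5]: B11:=B[1..3,1..3]: B21:=B[4..5,1..3]:
>Equal((A.B)[1..3,1..3], A11.B11+A12.B21);   # the (1,1) block of AB agrees with the block formula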
For example suppose $A$ is an invertible $n \times n$ matrix, $I$ is the $n \times n$ identity matrix, and $O$ is the $n \times n$ zero matrix,
then
$$\begin{bmatrix} O & A \\ I & O \end{bmatrix}\begin{bmatrix} O & I \\ A^{-1} & O \end{bmatrix} = \begin{bmatrix} I & O \\ O & I \end{bmatrix}$$
Or suppose that matrix $B$ is a $3 \times 7$ matrix. If you can find a pivot in each of the first 3 columns of $B$ then the reduced
row echelon form of $B$ would have the form $\begin{bmatrix} I & C \end{bmatrix}$ where $I$ is the $3 \times 3$ identity matrix and $C$ is a $3 \times 4$ matrix. Now
notice that
$$\begin{bmatrix} I & C \end{bmatrix}\begin{bmatrix} -C \\ I \end{bmatrix} = O$$


Ask yourself: what are the dimensions of the matrices in the above equation? The above equation also implies that
the columns of $\begin{bmatrix} -C \\ I \end{bmatrix}$ form a basis for $\operatorname{Nul} B$. (Why?)
Two other familiar examples of multiplying partitioned matrices are when each row or column is a partition. For
example, if we have the matrix product $AB$ and we let $a_i^T$ be the rows of $A$ and $b_i$ be the columns of $B$ then we can
write
$$AB = \begin{bmatrix} a_1^T \\ a_2^T \\ a_3^T \\ \vdots \end{bmatrix}\begin{bmatrix} b_1 & b_2 & b_3 & \cdots \end{bmatrix}
= \begin{bmatrix} a_1^Tb_1 & a_1^Tb_2 & a_1^Tb_3 & \cdots \\ a_2^Tb_1 & a_2^Tb_2 & a_2^Tb_3 & \cdots \\ a_3^Tb_1 & a_3^Tb_2 & a_3^Tb_3 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$
This is just the inner product form for matrix multiplication.


On the other hand if we have the matrix product $CD$ and we partition $C$ into columns and $D$ into rows we have
$$CD = \begin{bmatrix} c_1 & c_2 & c_3 & \cdots \end{bmatrix}\begin{bmatrix} d_1^T \\ d_2^T \\ d_3^T \\ \vdots \end{bmatrix} = c_1d_1^T + c_2d_2^T + c_3d_3^T + \cdots$$
This is the outer product form for matrix multiplication.
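The two forms are easy to compare on a small example (a hypothetical Maple snippet; Column and Row extract the pieces used in the outer-product sum). Both commands should return the same $2 \times 2$ matrix, $\begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$.
>with(LinearAlgebra):
>M:=Matrix([[1,2],[3,4]]): N:=Matrix([[5,6],[7,8]]):
>M.N;                                           # the ordinary matrix product
>Column(M,1).Row(N,1)+Column(M,2).Row(N,2);     # the outer product form: sum of rank-one matrices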

As a last example of using partitioned matrices we will give a proof that a symmetric matrix, A, is orthogonally
diagonalizable by some matrix P .
We will prove this by induction on the size of the matrix. If $A$ is $1 \times 1$ then it is already diagonal and we can let
$P = [1]$.
Now assume the statement is true for matrices of size $(n-1) \times (n-1)$. We have to show that it is true for $n \times n$
matrices. We know that $A$ has only real eigenvalues, so let $\lambda_1$ be some real eigenvalue of $A$ with a corresponding
unit eigenvector $v_1$. We can find an orthonormal basis $\{v_1, v_2, \ldots, v_n\}$ for $\mathbb{R}^n$ (any such basis will do) and let
$P = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$. Now
$$P^TAP = \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix}\begin{bmatrix} Av_1 & Av_2 & \cdots & Av_n \end{bmatrix}
= \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix}\begin{bmatrix} \lambda_1v_1 & Av_2 & \cdots & Av_n \end{bmatrix}
= \begin{bmatrix} \lambda_1 & 0 \\ 0 & B \end{bmatrix}$$
where $B$ is an $(n-1) \times (n-1)$ matrix. Furthermore, $P^TAP$ is symmetric, so $B$ must be symmetric. By the induction
hypothesis we now have
$$Q^TBQ = D$$
for some orthogonal matrix $Q$ and diagonal matrix $D$.
Let $R = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}$. We then have
$$R^T\begin{bmatrix} \lambda_1 & 0 \\ 0 & B \end{bmatrix}R = \begin{bmatrix} 1 & 0 \\ 0 & Q^T \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & B \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}
= \begin{bmatrix} \lambda_1 & 0 \\ 0 & Q^TBQ \end{bmatrix}
= \begin{bmatrix} \lambda_1 & 0 \\ 0 & D \end{bmatrix}$$
Finally, this means that $R^TP^TAPR = (PR)^TA(PR) = \begin{bmatrix} \lambda_1 & 0 \\ 0 & D \end{bmatrix}$. But $PR$ is an orthogonal matrix since the
product of two orthogonal matrices is orthogonal. Let's define $S = PR$. We then get that $S^TAS$ is diagonal and so $A$ is
orthogonally diagonalizable.
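As a small numerical illustration of the result (a hypothetical example, not from the text): for the symmetric matrix below, the columns of $P$ are orthonormal eigenvectors, so $P^TAP$ comes out diagonal with the eigenvalues 3 and 1 on the diagonal, and $P^TP$ is the identity.
>with(LinearAlgebra):
>A:=Matrix([[2,1],[1,2]]):
>P:=(1/sqrt(2))*Matrix([[1,1],[1,-1]]):   # orthonormal eigenvectors of A as columns
>Transpose(P).A.P;                        # diagonal matrix with entries 3 and 1
>Transpose(P).P;                          # the 2 x 2 identity, so P is orthogonal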
