A Practical Approach to
LINEAR ALGEBRA
"This page is Intentionally Left Blank"
A Practical Approach to
LINEAR ALGEBRA
Prabhat Choudhary
'Oxford Book Company
Jaipur. India
ISBN: 9788189473952
First Edition 2009
Oxford Book Company
267, 10BScheme, Opp. Narayan Niwas,
Gopalpura By Pass Road, Jaipur3020 18
Phone: 01412594705, Fax: 01412597527
email: oxfordbook@sity.com
website: www.oxfordbookcompany.com
© Reserved
Typeset by:
Shivangi Computers
267, lOBScheme, Opp. Narayan Niwas,
Gopalpura By Pass Road, Jaipur3020 18
Printed at :
Rajdhani Printers, Delhi
All Rights are Reserved. No part ofthis publication may be reproduced, stored in a retrieval system, or transmitted, in any
form or by any means, electronic. mechanical, photocopying, recording, scanning or otherwise, without the prior written
permission of the copyright owner. Responsibility for the facts stated, opinions expressed, conclusions reached and
plagiarism, if any, in this volume is entirely that of the Author, according to whom the matter encompassed in this book has
been origmally created/edited and resemblance with any such publication may be incidental. The Publisher bears no
responsibility for them, whatsoever.
Preface
Linear Algebra has occupied a very crucial place in Mathematics. Linear Algebra is a
continuation of classical course in the light of the modem development in Science and
Mathematics. We must emphasize that mathematics is not a spectator sport, and that in
order to understand and appreciate mathematics it is necessary to do a great deal of personal
cogitation and problem solving.
Scientific and engineering research is becoming increasingly dependent upon the
development and implementation of efficient parallel algorithms. Linear algebra is an
indispensable tool in such research and this paper attempts to collect and describe a selection
of some of its more important parallel algorithms. The purpose is to review the current
status and to provide an overall perspective of parallel algorithms for solving dense, banded,
or blockstructured problems arising in the major areas of direct solution of linear systems,
least squares computations, eigenvalue and singular value computations, and rapid elliptic
solvers. There is a widespread feeling that the nonlinear world is very different, and it is
usually studied as a sophisticated phenomenon of Interpolation between different
approximatelyLinear Regimes.
Prabhat Choudhary
"This page is Intentionally Left Blank"
Contents
Preface v
1. Basic Notions 1
.2:' Systems of Linear Equations 26
,3.
Matrics', 50
4. Determinants 101
5. Introduction to Spectral Theory 139
6. Inner Product Spaces 162
7. Structure of Operators in Inner Product Spaces 198
8. Bilinear and Quadratic Forms 221
: ..
9. Advanced Spectral Theory 234
10. Linear Transformations 252
"This page is Intentionally Left Blank"
Chapter 1
Basic Notions
VECTOR SPACES
A vector space V is a collection of objects, called vectors, along with two operations,
addition of vectors and multiplication by a number (scalar), such that the following
properties (the socalled axioms of a vector space) hold:
The first four properties deal with the addition of vector:
1. Commutativity: v + w = w + v for all v, W E V.
2. Associativity: (u + v) + W = u + (v + w) for all u, v, W E V.
3. Zero vector: there exists a special vector, denoted by 0 such that v + 0 = v for
all v E V.
4. Additive inverse: For every vector v E V there exists a vector W E V such that
v + W = O. Such additive inverse is usually denoted as v.
The next two properties concern multiplication:
5. Multiplicative identity: 1 v = v for all v E V.
6. Multiplicative associativity: ( a ~ ) v = a ( ~ v ) for all v E Vand all E scalars a, ~ .
And finally, two distributive properties, which connect multiplication and addition:
7. a(u + v) = au + av for all u, v E Vand all sCfllars a.
8. (a + ~ ) v = av + ~ v for all v E Vand all scalars a, ~ .
Remark: The above properties seem hard to memorize, but it is not necessary. They
are simply the familiar rules of algebraic manipulations with numbers.
The only new twist here is that you have to understand what operations you can apply
to what objects. You can add vectors, and you can multiply a vector by a number (scalar).
Of course, you can do with number all possible manipulations that you have learned before.
But, you cannot multiply two vectors, or add a number to a vector.
Remark: It is not hard to show that zero vector 0 is unique. It is also easy to show that
2 Basic Notions
given v E V the inverse vector v is unique. In fact, properties can be deduced from the
properties: they imply that 0 = Ov for any v E V, and that v = (l)v.
If the scalars are the usual real numbers, we call the space Va real vector space. If the
scalars are the complex numbers, i.e., if we can multiply vectors by complex numbers, we
call the space Va complex vector space.
Note, that any complex vector space is a real vector space as well (if we can multiply
by complex numbers, we can multiply by real numbers), but not the other way around.
It is also possible to consider a situation when the scalars are elements of an arbitrary
field IF.
In this case we say that V is a vector space over the field IF. Although many of the
constructions in the book work for general fields, in this text we consider only real and
complex vector spaces, i.e., IF is always either lR or Co
Example: The space lR
n
consists of all columns of size n,
VI
v2
v=
vn
whose entries are real numbers. Addition and multiplication are defined entrywise, i.e.,
a
vn aV
n
vn wn vn +wn
Example: The space en also consists of columns of size n, only the entries now are
complex numbers. Addition and multiplication are defined exactly as in the case of lR
n
,
the only difference is that we can now multiply vectors by complex numbers, i.e., en is a
complex vector space.
Example: The space Mmxn (also denoted as Mm n) ofm x n matrices: the multiplication
and addition are defined entrywise.Ifwe allow only real entries (and so only multiplication
only by reals), then we have a real vector space; if we allow complex entries and
multiplication by complex numbers, we then have a complex vector space.
Example: The space lP' n of polynomials of degree at most n, consists of all polynomials
p of form
pet) = a
o
+ alt + a
2
r + ... + ant,
where t is the independent variable. Note, that some, or even all, coefficients ak can be O.
In the case of real coefficients a
k
we have a real vector space, complex coefficient
give us a complex vector space.
Basic Notions 3
Question: What are zero vectors in each of the above examples?
Matrix notation
An m x n matrix is a rectangular array with m rows and n columns. Elements of the
array are called entries of the matrix.
It is often convenient to denote matrix entries by indexed letters is}, the first index
denotes the number of the row, where the entry is aij' and the second one is the number of
the column. For example
al,1
(
 )m n a2,1
A = a'k' =
J j=l,k=1
is a general way to write an m x n matrix.
Very often for a matrix A the entry in row number) and column number k is denoted
by A ,k or (A) k' and sometimes as in example above the same letter but in lowercase is
J J,
used for the matrix entries.
Given a matrix A, its transpose (or transposed matrix) AT, is defined by transforming
the rows of A into the columns. For example
(
1 2 3)T = (1 4J
4 5 6 2 5 .
3 6
So, the columns of AT are the rows of A and vise versa, the rows of AT are the columns
ofA.
The formal definition is as follows: (AT)j,k = (A)kj meaning that the entry of AT in the
row number) and column number k equals the entry of A in the row number k and row
number}.
The transpose of a matrix has a very nice interpretation in terms of linear
transformations, namely it gives the so called adjoint transformation. We will study this
in detail later, but for now transposition will be just a useful formal operation.
One of the first uses of the transpose is that we can write a column vector x E IR
n
as
x = (XI' x
2
' •.. , xn)T. Ifwe put the column vertically, it will use significantly more space.
LINEAR COMBINATIONS, BASES.
Let Vbe a vector space, and let VI' v
2
"'" vp E Vbe a collection of vectors. A linear
combination of vectors VI' v
2
"'" vp is a sum of form
4
p
aiv
i
+ a
2
v
2
+ ... + apvp = Lakvk.
k=1
Basic Notions
Definition: A system of vectors vI' v
2
, ... vn E Vis called a basis (for the vector space
V) if any vector v E V admits a unique representation as a linear combination
II
V = aiv
i
+ a
2
v
2
+ ... + anvn = Lak
v
k
•
k=1
The coefficients ai' a
2
, ••• , an are called coordinates of the vector v (in the basis, or
with respect to the basis vI' v
2
' ••. , v,J
Another way to say that vI' v
2
'.·., VII is a basis is to say that the equation xlvI + x
2
v
2
+ ... + xmvn = v (with unknowns x
k
) has a unique solution for arbitrary right side v.
Before discussing any properties of bases, let us give few examples, showing that
such objects exist, and it makes sense to study them.
Example: The space V is ]RII. Consider vectors
0 0 0
0 1 0 0
0
, e2 =
0
, e3 = , ... , en = 0 ,
e, =
0 0 0
(the vector e
k
has all entries 0 except the entry number k, which is 1). The system of
vectors e
l
, e
2
, ... , ell is a basis in Rn. Indeed, any vector
V=
Xn
can be represented as the linear combination
n
V = xle
l
+ x
2
e
2
+ ... xnen = LXkek
k=1
and this representation is unique. The system e
l
, e
2
, ... , en E ]Rn is called the standard basis
in ]Rn.
Example: In this example the space is the space Jllln of the polynomials of degree at
most n. Consider vectors (polynomials) eo' e
l
, e
2
, ... , en E Jllln defined by
e
o
_= 1, e
1
= t, e
2
= P, e
3
= ~ , ... , en = ~ .
Basic Notions 5
Clearly, any polynomial p, pet) = a
o
+ alt + a
2
t
2
+ + ain admits a unique
representation
p = aoe
o
+ aiel + ... + anen·
So the system eo' e
l
, e
2
, •.. , en E pn is a basis in pn. We will call it the standard basis
in pn.
Remark: If a vector space V has a basis vI' v
2
, •.. , vn' then any vector v is uniquely
defined by its coeffcients in the decomposition v = I. k=1 Uk vk .
So, if we stack the coefficients uk in a column, we can operate with them as if they
were column vectors, i.e., as with elements oflRn.
Namely, if v = I. k=1 Uk vk and w = I. k=1 vk ' then
n n n
v+w= I.UkVk+
k=1 k=! k=1
i.e., to get the column of coordinates of the sum one just need to add the columns of
coordinates of the summands.
Generating and Linearly Independent Systems. The definition of a basis says that any
vector admits a unique representation as a linear combination. This statement is in fact
two statements, namely that the representation exists and that it is unique. Let us analyse
these two statements separately.
Definition: A system of vectors vI' '.'2' ... ' Vp E Vis called a generating system (also a
spanning system, or a complete system) in V if any vector v E V admits representation as
a linear combination
p
v = Uiv
i
+ U
2
v
2
+ ... + upvp = I.UkVk
k=1
The only difference with the definition of a basis is that we do not assume that the
representation above is unique. The words generating, spanning and complete here are
synonyms. The term complete, because of my operator theory background.
Clearly, any basis is a generating (complete) system. Also, if we have a basis, say vI'
v
2
' ... , vn' and we add to it several vectors, say vn +1' ... , v
p
' then the new system will be a
generating (complete) system. Indeed, we can represent any vector as a linear combination
of the vectors vI' v
2
' ... , vn' and just ignore the new ones (by putting corresponding
coefficients uk = 0).
Now, let us turn our attention to the uniqueness. We do not want to worry about
existence, so let us consider the zero vector 0, which always admits a representation as a
linear combination.
6 Basic Notions
Definition: A linear combination a
l
vI + a
2
v
2
+ ... + apvp is called trivial if a
k
= 0 Vk.
A combination is always (for all choices of vectors vI' v
2
' .•. , v
p
) equal to
0, and that IS probably the reason for the name.
Definition: A system of vectors vI' v
2
' ... , vp E V is called linearly independent if only
the trivial linear combination (:2..:=l akvk with a
k
= 0 Vk) of vectors vI' V
2
, ... , vp equals
O.
In other words, the system vI' v
2
, ••• , vp is linearly independent i the equation xlvI +
x
2
v
2
+ ... + xpvp = 0 (with unknowns x
k
) has only trivial solution xI = x
2
= ... = xp = O.
If a system is not linearly independent, it is called linearly dependent. By negating the
definition of linear independence, we get the following
Definition: A system of vectors vI' v
2
, ••• , vp is called linearly dependent if 0 can be
represented as a nontrivial linear combination, 0 = :2..:=l akvk .
Nontrivial here means that at least one of the coefficient a
k
is nonzero. This can be
(and usually is) written as :2..:=11 ak 1"* o.
So, restating the definition we can say, that a system is linearly dependent if and only
ifthere exist scalars at' a
2
, ... , (J.P' :2..:=11 ak 1"* 0 such that
p
:2..
a
k
v
k = o.
k=1
An alternative definition (in terms of equations) is that a system VI' v
2
' ••• , vp is linearly
dependent i the equation
XlVI +x
2
v
2
+··· +xpvp=O
(with unknowns x
k
) has a nontrivial solution. Nontrivial, once again again means that at
least one ofxk is different from 0, and it can be written as :2..:=11 xk 1"* O.
The following proposition gives an alternative description of linearly dependent
systems.
Proposition: A system of vectors VI' V
2
, ... , vp E V is linearly dependent if and only if
one of the vectors V
k
can be represented as a linear combination of the other vectors,
P
V
k
=
j=1
j*k
Proof Suppose the system VI' V
2
"'" vp is linearly dependent. Then there exist scalars
ak' :2..:=11 ak 1"* 0 such that
Basic Notions 7
aiv
i
+ a
2
v
2
+ ... + apvp = O.
Let k be the index such that ak:f:. O. Then, moving all terms except akv
k
to the right
side we get
p
akvk Iajvj.
j=1
j"#k
Dividing both sides by ak we get with ~ j = a/a
k
•
On the other hand, if holds, 0 can be represented as a nontrivial linear combination
P
v
k
 I ~ j V j =0
j=1
j"#k
Obviously, any basis is a linearly independent system. Indeed, if a system vI' v
2
,···, vn
is a basis, 0 admits a unique representation
n
0= alv
1
+ a
2
v
2
+ ... + anv
n
= Lakvk'
k=l
Since the trivial linear combination always gives 0, the trivial linear combination must
be the only one giving O.
So, as we already discussed, if a system is a basis it is a complete (generating) and
linearly independent system. The following proposition shows that the converse implication
is also true.
Proposition: A system of vectors v I' v
2
' ..• , V n E V is a basis if and only if it is linearly
independent and complete (generating).
Proof: We already know that a basis is always linearly independent and complete, so
in one direction the proposition is already proved.
Let us prove the other direction. Suppose a system vI' v
2
' ... , vn is linearly independent
and complete. Take an arbitrary vector v
2
v. Since the system vI' v
2
, ••. , vn is linearly complete
(generating), v can be represented as
n
V = I'V V + I'V V + + rv V = ~ akvk'
U,I I '""'2 2 •.• '""'n n £...J
k=I
We only need to show that this representation is unique.
Suppose v admits another representation
Then
8 Basic Notions
n n n
L, (ak advk = L, (akvk) L,akVk = VV= 0
k=l k=l k=l
Since the system is linearly independent, Uk  Uk = 0 'r;fk, and thus the representation
v = aIv
I
+ a
2
v
2
+ ... + anv
n
is unique.
Remark: In many textbooks a basis is defined as a complete and linearly independent
system. Although this definition is more common than one presented in this text. It
emphasizes the main property of a basis, namely that any vector admits a unique
representation as a linear combination.
Proposition: Any (finite) generating system contains a basis.
Proof Suppose VI' v
2
"'" Vp E V is a generating (complete) set. If it is linearly
independent, it is a basis, and we are done.
Suppose it is not linearly independent, i.e., it is linearly dependent. Then there exists
a vector V k which can be represented as a linear combination of the vectors vj' j :j; k.
Since v
k
can be represented as a linear combination of vectors vj' j :j; k, any linear
combination of vectors vI' v
2
"'" vp can be represented as a linear combination of the same
vectors without v
k
(i.e., the vectors vj' 1 ~ j ~ p , j = k). So, if we delete the vector v
k
, the
new system will still be a complete one.
If the new system is linearly independent, we are done. 1fnot, we repeat the procedure.
Repeating this procedure finitely many times we arrive to a linearly independent and
complete system, because otherwise we delete all vectors and end up with an empty set.
So, any finite complete (generating) set contains a complete linearly independent subset,
i.e., a basis.
LINEAR TRANSFORMATIONS. MATRIXVECTOR MULTIPLICATION
A transformation T from a set X to a set Y is a rule that for each argument (input) x E
X assigns a value (output) y = T (x) E Y. The set X is called the domain of T, and the set
Y is called the target space or codomain of T. We write T: X ~ Y to say that T is a
transformation with the domain X and the target space Y.
Definition: Let V, W be vector spaces. A transformation T: V ~ W is called linear if
I. T (u + v) = T(u) + T (v) 'r;fu, v E V;
2. T (av) = aT (v) for all v E Vand for all scalars a.
Properties I and 2 together are equivalent to the following one:
T (au + pv) = aT (u) + PT (v) for all u, v E Vand for all scalars a, b.
Examples: You dealt with linear transformation before, may be without even suspecting
it, as the examples below show.
Example: Differentiation: Let V = lfDn (the set of polynomials of degree at most n), W
= lP' nl' and let T: lfD n ~ lfD,._l be the differentiation operator,
Basic Notions 9
T (p):= p'lip E lP'n'
Since if + g) = f + g and (a./)' = af', this is a linear transformation.
Example: Rotation: in this example V = W = jR2 (the usual coordinate plane), and a
transformation Ty: jR2 7 jR2 takes a vector in jR2 and rotates it counterclockwise by r
radians. Since Tyrotates the plane as a whole, it rotates as a whole the parallelogram used
to define a sum of two vectors (parallelogram law). Therefore the property 1 of linear
transformation holds. It is also easy to see that the property 2 is also true.
Example: Reflection: in this example again V = W = jR2, and the trans
formation T: jR2 7 jR2 is the reflection in the first coordinate axis. It can also be shown
geometrically, that this transformation is linear, but we will use another way to show that.
Fig. Rotation
Namely, it is easy to write a formula for T,
T ((::)) ~ V ~ J
and from this formula it is easy to check that the transformation is linear.
Example: Let us investigate linear transformations T: jR ~ lR. Any such transformation
is given by the formula
T (x) = ax where a = T (1).
Indeed,
T(x) = T(x x I) =xT(l) =xa = ax.
So, any linear transformation of jR is just a multiplication by a constant.
Linear transformations J!{' 7 r. Matrixcolumn mUltiplication: It turns out that a
linear transformation T: jRn 7 jRm also can be represented as a multiplication, not by a
number, but by a matrix.
10 Basic Notions
Let us see how. Let T: ]Rn ]Rm be a linear transformation. What information do we
need to compute T (x) for all vectors x E ]Rn? My claim is that it is sufficient how T acts on
the standard basis e" e
2
, ... , en of Rn. Namely, it is sufficient to know n vectors in R
m
(i.e."
the vectors of size m),
Indeed, let
X=
Xn
Then x = xle
l
+ x
2
e
2
+ ... + xnen = and
T(x) = T(ixkek) = iT(Xkek) = iXkT(ek) = iXkak .
k=l k=l k=l k=l
So, if we join the vectors (columns) aI' a
2
, ... , an together in a matrix
A = [aI' a
2
, ... , an]
(ak being the kth column of A, k = 1, 2, ... , n), this matrix contains all the information
about T. Let us show how one should define the product of a matrix and a vector (column)
to represent the transformation T as a product, T (x) = Ax. Let
al,l al,2 al,n
a2,1 a2,2 a2,n
A=
am,l a
m
,2 am,n
Recall, that the column number k of A is the vector a
k
, i.e.,
al,k
a2,k
a
k
=
Then if we want Ax = T (x) we get
am,k
al,l al,2 al,n
n
a2,1 a2,2 a2,n
Ax = LXkak = Xl +X2 +···+X
n
k=l
am,l a
m
,2 am,n
So, the multiplication should be performed by the following column by
Basic Notions 11
coordinate rule: Multiply each column of the matrix by the corresponding coordinate of
the vector.
Example:
The "column by coordinate" rule is very well adapted for parallel computing. It will
be also very important in different theoretical constructions later.
However, when doing computations manually, it is more convenient to compute the
result one entry at a time. This can be expressed as the following row by column rule:
To get the entry number k of the result, one need to multiply row number k of the
matrix by the vector, that is, if Ax = y, then
I
n
= a ·x· = .
yk j=lk,}},k 1,2, ... m,
here Xj and Yk are coordinates ofthe vectors x and y respectively, and aj'k are the entries of
the matrix A.
Example:
(
1 2 = (1.1+2.2+3.3)=(14)
4 5 6 3 4·1 + 5·2 + 6·3 32
Linear transformations and generating sets: As we discussed above, linear
transformation T(acting from is completely defined by its values on the standard
basis in The fact that we consider the standard basis is not essential, one can consider
any basis, even any generating (spanning) set. Namely, a linear transformation T: V ? W
is completely defined by its values on a generating set (in particular by its values on a
basis). In particular, if vI' V
2
, ••. , vn is a generating set (in particular, ifit is a basis) in V,
and T and TI are linear transformations T, V ? W such that
thenT= T
I
.
Conclusions
Tv
k
= TIv", k= 1,2, ... , n
1. To get the matrix of a linear transformation T: Rn ? Rm one needs to join
the vectors a
k
= T e
k
(where e
I
, e
2
, ••• , en is the standard basis in Rn) into a
matrix: kth column of the matrix is ak' k = 1,2, ... , n.
2. If the matrix A of the linear transformation T is known, then T (x) can be
found by the matrixvector multiplication, T(x) = Ax. To perform matrix
vector multiplication one can use either "column by coordinate" or "row by
column" rule.
12 Basic Notions
The latter seems more appropriate for manual computations. The former is well adapted
for parallel computers, and will be used in different theoretical constructions.
For a linear transformation T: JR.n ~ JR:m, its matrix is usually denoted as [T]. However,
very often people do not distinguish between a linear transformation and its matrix, and
use the same symbol for both. When it does not lead to confusion, we will also use the
same symbol for a transformation and its matrix.
Since a linear transformation is essentially a multiplication, the notation Tv is often
used instead of T(v). We will also use this notation. Note that the usual order of algebraic
operations apply, i.e., Tv + u means T(v) + u, not T(v + u).
Remark: In the matrixvector mUltiplication Ax the number of columns of the matrix
A matrix must coincide with the size of the vector x, i.e." a vector in JR.n can only be
multiplied by an m x n matrix. It makes sense, since an m x n matrix defines a linear
transformation JR.n ~ JR.
m
, so vector x must belong to JR.
n
.
The easiest way to remember this is to remember that if performing multiplication
you run out of some elements faster, then the multiplication is not defined. For example, if
using the "row by column" rule you run out of row entries, but still have some unused
entries in the vector, the multiplication is not defined. It is also not defined if you run out
of vector's entries, but still have unused entries in the column.
COMPOSITION OF LINEAR TRANSFORMATIONS
AND lVIATRIX MULTIPLICATION
Definition of the matrix multiplication: Knowing matrixvector multiplication, one
can easily guess what is the natural way to define the product AB of two matrices: Let us
multiply by A each column of B (matrixvector multiplication) and join the resulting
columnvectors into a matrix. Formally, if b
I
, b
2
, ••. , b
r
are the columns of B, then Ab
I
,
Ab
2
, ... , Ab
r
are the columns of the matrix AB. Recalling the row by column rule for the
matrixvector mUltiplication we get the following row by column rule for the matrices the
entry (AB)j,k (the entry in the row j and column k) of the product AB is defined by
(AB)j,k = (row j of A) . (column k of B)
Formally it can be rewritten as
(AB)j,k = Laj"b"k'
,
if aj,k and bj,k are entries of the matrices A and B respectively.
I intentionally did not speak about sizes of the matrices A and B, but if we recall the
row by column rule for the matrixvector multiplication, we can see that in order for the
multiplication to be defined, the size of a row of A should be equal to the size of a column
of B. In other words the product AB is defined i£ and only if A is an m x nand B is n x r
matrix.
Basic Notions 13
Motivation: Composition of linear transformations. Why are we using such a
complicated rule of multiplication? Why don't we just multiply matrices entrywise? And
the answer is, that the multiplication, as it is defined above, arises naturally from the
composition of linear transformations. Suppose we have two linear transformations,
T
1
: ]Rn ~ ]Rm and T
2
: ]Rr ~ ]Rn. Define the composition T = TI T2 of the transformations
T
I
, T2 as
T (x) = TI(Tix)) \Ix ERr.
Note that TI(x) ERn. Since TI:]Rn ~ ]Rm, the expression TI(Tix)) is well defined and
the result belongs to ]Rm. So, T: ]Rr ~ ]Rm.
It is easy to show that T is a linear transformation, so it is defined by an m x r matrix.
How one can find this matrix, knowing the matrices of TI and T2?
Let A be the matrix of TI and B be the matrix of T
2
. As we discussed in the previous
section, the columns of T are vectors T (e
l
), T (e
2
), ... , T(e
r
) , where el' e
2
, ... , e
r
is the
standard basis in Rr. For k = 1, 2, ... , r we have
T (e
k
) = T
I
(T
2
(e
k
)) = TI(Be
k
) = TI(b
k
) = Ab
k
(operators T2 and TI are simply the mUltiplication by B and A respectively).
So, the columns of the matrix of Tare Abl' Ab
2
, ... , Ab
r
, and that is exactly how
the matrix AB was defined!
Let us return to identifying again a linear transformation with its matrix. Since the
matrix multiplication agrees with the composition, we can (and will) write TI T2 instead of
TI T2 and TIT 2x instead of TI(Tix)).
Note that in the composition TI T2 the transformation T2 is applied first! The way to
remember this is to see that in TI T
2
x the transformation T2 meets x first.
Remark: There is another way of checking the dimensions of matrices in a product,
different form the row by column rule: for a composition T
J
T2 to be defined it is necessary
that T 2x belongs to the domain of T
1
• If T2 acts from some space, say ]R'to ]Rn, then TI must
act from Rn to some space, say ]Rm. So, in order for TI T2 to be defined the matrices of TI
and T2 should We will usually identify a linear transformation and its matrix, but in the
next few paragraphs we will distinguish them be of sizes m x nand n x r respectivelythe
same condition as obtained from the row by column rule.
Example: Let T: ]R2 ~ ]R2 be the reflection in the line xI = 3x
2
. It is a linear
transformation, so let us find its matrix. To find the matrix, we need to compute Tel and
T
e2
. However, the direct computation of Te I and Te2 involves significantly more trigonometry
than a sane person is willing to remember.
An easier way to find the matrix of T is to represent it as a composition of simple
linear transformation. Namely, let g be the angle between the xI axis and the line xI = 3x
2
,
and let To be the reflection in the xlaxis. Then to get the reflection T we can first rotate
the plane by the angle g, moving the line xI = 3x
2
to the xlaxis, then reflect everything
in the xIaxis, and then rotate the plane by g, taking everything back. Formally it can be
written as
14 Basic Notions
T= RgTORY
where Rg is the rotation by g. The matrix of To is easy to compute,
To
the rotation matrices are known
(
cosy sin Y)
Ry = sin y cos y ,
(
COS( y) sine y») (COSY sin y)
R_y= sin(y) cos(y) = siny cosy,
To compute sin yand cos ytake a vector in the line x I = 3x
2
, say a vector (3, Il. Then
first coordinate 3 3
cos Y = length  + 12  .JW
and similarly
second coordinate 1
sin y = length  + 12  .J1O
Gathering everything together we get
lo G lo
G
It remains only to perform matrix multiplication here to get the final result.
Properties of Matrix Multiplication.
Matrix multiplication enjoys a lot of properties, familiar to us from high school algebra:
I. Associativity: A(BC) = (AB)C, provided that either left or right side is well defined;
2. Distributivity: A(B + C) = AB + AC, (A + B)C = AC + BC, provided either left or
right side of each equation is well defined;
3. One can take scalar multiplies out: A(aB) = aAB.
This properties are easy to prove. One should prove the corresponding properties for
linear transformations, and they almost trivially follow from the definitions. The properties
of linear transformations then imply the properties for the matrix multiplication.
The new twist here is that the commutativity fails: Matrix multiplication is non
commutative, i.e., generally for matrices AB = BA.
Basic Notions 15
One can see easily it would be unreasonable to expect the commutativity of matrix
multiplication. Indeed, letA and B be matrices of sizes m x nand n x r respectively. Then
the product AB is well defined, but if m = r, BA is not defined.
Even when both products are well defined, for example, when A and Bare nxn (square)
matrices, the multiplication is still noncommutative. If we just pick the matrices A and B
at random, the chances are that AB = BA: we have to be very lucky to get AB = BA.
Transposed Matrices and Multiplication.
Given a matrix A, its transpose (or transposed matrix) AT is defined by transforming
the rows of A into the columns. For example
(
I 2
4 5
T (1 4)
!) ~ ~ ! .
So, the columns of AT are the rows of A and vise versa, the rows of AT are the columns
ofA.
The formal definition is as follows: (AT)j,k = (A)kJ meaning that the entry of AT in the
row number) and column number k equals the entry of A in the row number k and row
number}.
The transpose of a matrix has a very nice interpretation in terms of linear
transformations, namely it gives the socalled adjoint transformation.
We will study this in detail later, but for now transposition will be just a useful formal
operation.
One of the first uses of the transpose is that we can write a column vector x E R
n
as x
= (x \' x
2
, ..• , Xn)T. If we put the column vertically, it will use significantly more space.
A simple analysis of the row by columns rule shows that
(AB)T = BTAT,
i.e." when you take the transpose of the product, you change the order of the terms.
Trace and Matrix Multiplication.
For a square (n x n) matrix A = (aj,k) its trace (denoted by trace A) is the sum of the
diagonal entries
n
trace A = L ak,k
k=l
Theorem: Let A and B be matrices of size m Xn and n Xm respectively (so the both
p )ducts AB and BA are well defined). Then
trace(AB) = trace(BA)
16 Basic Notions
There are essentially two ways of proving this theorem. One is to compute the diagonal.
entries of AB and of BA and compare their sums. This method requires some proficiency
in manipulating sums in notation. If you are not comfortable with algebraic manipulatioos,
there is another way. We can consider two linear transformations, T and Tl' acting from
Mnxm to lR = lRI defined by
T (X) = trace(AX), T} (X) = trace(XA)
To prove the theorem it is sufficient to show that T = T
1
; the equality for X = A gives
the theorem. Since a linear transformation is completely defined by its values on a generating
system, we need just to check the equality on some simple matrices, for example on matrices
J0.k' which has all entries 0 except the entry I in the intersection of jth column and kth
row.
INVERTIBLE TRANSFORMATIONS AND MATRICES. ISOMORPHISMS
IDENTITY TRANSFORMATION AND IDENTITY MATRIX
Among all linear transformations, there is a special one, the identity transformation
(operator) L Ix = x, 'Vx. To be precise, there are infinitely many identity transformations:
for any vector space V, there is the identity transformation I = Iv: V ~ V, Ivx = x, 'Vx E
V. However, when it is does not lead to the confusion we will use the same symbol I for all
identity operators (transformations). We will use the notation IV only we want to emphasize
in what space the transformation is acting. Clearly, if I: lR
n
~ lR
n
is the identity
transformation in Rn, its matrix is an n x n matrix
1=1 =
n
1 0 0
o 1 0
o 0 1
(l on the main diagonal and 0 everywhere else). When we want to emphasize the size
of the matrix, we use the notation In; otherwise we just use 1. Clearly, for an arbitrary
linear transformation A, the equalities
AI=A,IA =A
hold (whenever the product is defined).
INVERTffiLE TRANSFORMATIONS
Definition: Let A: V ~ W be a linear transformation. We say that the transformation
A is left invertible if there exist a transformation B: W ~ V such that
BA = I (I = I v here). The transformation A is called right invertible if there exists a linear
transformation C: W ~ V such that
Basic Notions 17
AC = I (here 1= I
w
)'
The transformations Band C are called left and right inverses of A. Note, that we did
not assume the uniqueness of B or C here, and generally left and right inverses are not
unique.
Definition: A linear transformation A: V ~ W is called invertible if it is both right and
left invertible.
Theorem. If a linear transformation A: V ~ W is invertible, then its left and right
inverses Band C are unique and coincide.
Corollary: A transformation A: V ~ Wis invertible if and only if there erty is used as
the exists a unique linear transformation (denoted AI), AI: W ~ V such definition of an
AIA = IV' AAl = Iw
The transformation AI is called the inverse of A.
Proof Let BA = I and AC = 1. Then
BAC = B(AC) = BI = B.
On the other hand
BAC = (BA)C = IC = C,
and therefore B = C.
Suppose for some transformation BI we have BIA = 1. Repeating the above reasoning
with B I instead of B we get B 1 = C. Therefore the left inverse B is unique. The uniqueness
of C is proved similarly.
Definition: A matrix is called invertible (resp. left invertible, right invertible) if the
corresponding linear transformation is invertible (resp. left invertible, right invertible).
Theorem: asserts that a matrix A is invertible if there exists a unique matrix
AI such that A1A = I, AA
I
= 1. The matrix AI is called (surprise) the inverse of A.
Examples:
1. The identity transformation (matrix) is invertible, 11 = I;
2. The rotation Rg
= ( C ~ S 1 sin 1)
Ry sm 1 cos 1
is invertible, and the inverse is given by (R
y
rl = R_"( This equality is clear
from the geometric description of Rg, and it also can be checked by the matrix
multiplication;
3. The column (l, I)T is left invertible but not right invertible. One of the possible
left inverses in the row (112, 112).
To show that this matrix is not right invertible, we just notice that there are
more than one left inverse. Exercise: describe all left inverses of this matrix.
18 Basic Notions
4. The row (l, 1) is right invertible, but not left invertible. The column (112, 1I2l
is a possible right inverse.
Remark: An invertible matrix must be square (n x n). Moreover, if a square matrix A
has either left of right inverse, it is invertib!e. So, it is sufficient to check only one of the
identities AA
I
= L AIA = 1.
This fact will be proved later. Until we prove this fact, we will not use it. I presented
it here only to stop trying wrong directions.
Properties of the Inverse Transformation
Theorem: (Inverse of the product). If linear transformations A and B are invertible
(and such that the product AB is defined), then the product AB is invertible and
(ABt
l
= .sI AI
(note the change of the order!)
Proof Direct computation shows:
(AB)(.sIAI) = A(B.sI)AI = AIA
I
= AAI = I
and similarly
(.sIAI)(AB) = .sI(AIA)B = .sIIB = .sIB = I
Remark: The invertibility of the product AB does not imply the invertibility of the
factors A and B (can you think of an example?). However, if one of the factors (either A or
B) and the product AB are invertible, then the second factor is also invertible.
Theorem: (Inverse of AT). If a matrix A is invertible, then AT is also invertible and
(ATrl = (At)T
Proof Using (ABl = BT AT we get
(At)T AT = (AAt)T = IT = I, and similarly
AT (AIl = (AtAl = IT = 1.
And finally, if A is invertible, then AI is also invertible, (AIr
l
= A. So, ret us
summarize the main properties of the' inverse:
1. If A is invertible, then AI is also invertible, (AIt
l
= A;
2. If A and B are invertible and the product AB is defined, then AB is invertible and
(AB)I = .sIAI.
3. If A is invertible, then AT is also invertible and (ATt
l
= (AI)T.
ISOMORPHISM ISOMORPHIC SPACES.
An invertible linear transformation A: V ~ W is called an isomorphism. We did not
introduce anything new here, it is just another name for the object we already studied.
Two vector spaces V and Ware called isomorphic (denoted V == W) if there is an
isomorphism A: V ~ W.
Isomorphic spaces can be considered as di erent representation of the same space,
Basic Notions 19
meaning that all properties and constructions involving vector space operations are preserved
under isomorphism.
The theorem below illustrates this statement.
Theorem: LetA: V ~ Wbe an isomorphism, and let vI' V
2
' ... , vn be a basis in V. Then
the system Av
l
, Av
2
, ... , AVn is a basis in W.
Remark: In the above theorem one can replace "basis" by "linearly independent", or
"generating", or "linearly dependent"all these properties are preserved under isomorphisms.
Remark: If A is an isomorphism, then so is AI. Therefore in the above theorem we
can state that vI' v
2
' •.. , vn is a basis if and only if Avl' Av
2
, .•. , AVn is a basis.
The inverse to the Theorem is also true
Theorem: Let A: V ~ W be a linear map, and let VI' v
2
' ••• , vn and WI' w
2
' ... , wn are
bases in Vand W respectively. if AVk = w
k
' k = 1,2, ... , n, then A is an isomorphism.
Proof Define the inverse transformation AI by AIw
k
= v
k
, k= 1,2, ... , n (as we know,
a linear transformation is defined by its values on a basis).
Examples:
1. Let A: ]Rn+1 ~ JP>n (JP>n is the set of polynomials I:=oak
tk
of degree at most n)
is defined by
Ae
l
= 1, Ae
2
= t, ... , Ae
n
= (11, Ae
n
+
1
= (1 By Theorem A is an isomorphism, so
JP>n _ = ]Rn+l.
2. Let V be a (real) vector space with a basis v\' v
2
' ... , v
n
' Define transformation A:
]Rn ~ Vby
Ae
k
= v
k
, k = 1,2, ... , n,
where e
l
, e
2
, ... , en is the standard basis in ]Rn. Again by Theorem A is an isomorphism,
so V== Rn.
3. M
2x3
== JR6;
4. More generally, Mmxn == Rm"n
Invertibility and equations
Theorem: Let A: V ~ W be a linear transformation. Then A is invertible if and only if
for any right side b E W the equation
Ax=b
has a unique solution x E V.
Proof: Suppose A is invertible. Then x = AIb solves the equation Ax = b. To show
that the solution is unique, suppose that for some other vector XI E V
Ax b
I 
Multiplying this identity by AI from the left we get
20 Basic Notions
AlAx =AIb
,
and therefore x I = AI b = x. Note that both identities, AAI = I and AiA = I were used
here. Let us now suppose that the equation Ax = b has a unique solution x for any b E W.
Let us use symbol y instead of b. We know that given yEW the equation
Ax=y
has a unique solution x E V. Let us call this solution B (y).
Let us check that B is a linear transformation. We need to show that
B(aYI + PY2) = ap(YI) + PB(Y2)·
Let x
k
:= B(Yk)' k = 1,2, i.e., AXk = Yk' k = 1,2.
Then
which means
B(aYI + PY2) = aB(Yi) + PB(Y2)·
Corollary: An m x n matrix is invertible ifand only ifits columns form a basis in Rm.
SUBSPACES
A subspace of a vector space V is a subset Vo c V of V which is closed under the
vector addition and multiplication by scalars, i.e.,
1. If v E Vo then av E Vo for all scalars a.
2. For any u, v E Vo the sum u + v E Vo.
Again, the conditions 1 and 2 can be replaced by the following one:
au + bv E Vo for all u, v E Vo' and for all scalars a, p.
Note, that a subspace Vo c V with the operations (vector addition and multiplication
by scalars) inherited from Vis a vector space. Indeed, because all operations are inherited
from the vector space V they must satisfy all eight axioms of the vector space. The only
thing that could possibly go wrong, is that the result of some operation does not belong to
Vo. But the definition of a subspace prohibits this!
Now let us consider some examples:
1. Trivial subspaces of a space V, namely V itself and {O} (the subspace consisting
only of zero vector). Note, that the empty set 0 is not a vector space, since it
does not contain a zero vector, so it is not a subspace. With each linear
transformation A : V t W we can associate the following two subspaces:
2. The null space, or kernel of A, which is denoted as Null A or Ker A and consists
of all vectors v E V such that Ay = o.
3. The range Ran A is defined as the set of all vectors W E W whicb can be
represented as w = Ay for some v E V.
If A is a matrix, i.e., A: R
m
t ]Rn, then recalling column by coordinate rule of the
matrixvector multiplication, we can see that any vector W E Ran A can be represented as
Basic Notions
21
a linear combination of columns of the matrix A. That explains why the term column
space (and notation Col A) is often used for the range of the matrix. So, for a matrix A, the
notation Col A is often used instead of Ran A.
And now the last Example.
4. Given a system of vectors vI' V
2
' ... , Vr E Vits linear span (sometimes called simply
span) £{V
I
, V
2
' ... , v
r
} is the collection of all vectors V E Vthat can be represented
as a linear combination v = alv
I
+ a
2
v
2
+ ... + arv
r
of vectors vI' V
2
' ... , v
r
. The
notation span{v
I
, v
2
' ••• , v
r
} is also used instead of £{vl' v
2
'···, v
r
}
It is easy to check that in all of these examples we indeed have subspaces.
APPLICATION TO COMPUTER GRAPHICS
In this section we give some ideas of how linear algebra is used in computer graphics.
We will not go into the details, but just explain some ideas. In particular we explain why
manipulation with 3 dimensional images are reduced to multiplications of 4 x 4 matrices.
2Dimensional Manipulation
The x  y plane (more precisely, a rectangle there) is a good model of a computer
monitor. Any object on a monitor is represented as a collection of pixels, each pixel is
assigned a specific colour.
Position of each pixel is determined by the column and row, which play role of x and
y coordinates on the plane. So a rectangle on a plane with x  y coordinates is a good
model for a computer screen: and a graphical object is just a collection of points.
Remark: There are two types of graphical objects: bitmap objects, where every pixel
of an object is described, and vector object, where we describe only critical points, and
graphic engine connects them to reconstruct the object. A (digital) photo is a good example
of a bitmap object: every pixel of it is described.
Bitmap object can contain a lot of points, so manipulations with bitmaps require a lot
of computing power. Anybody who has edited digital photos in a bitmap manipulation
programme, like Adobe Photoshop, knows that one needs quite a powerful computer, and
even with modern and powerful computers manipulations can take some time.
That is the reason that most ofthe objects, appearing on a computer screen are vector
ones: the computer only needs to memorize critical points.
For example, to describe a polygon, one needs only to give the coordinates of its
vertices, and which vertex is connected with which. Of course, not all objects on a computer
screen can be represented as polygons, some, like letters, have curved smooth boundaries.
But there are standard methods allowing one to draw smooth curves through a collection
of points. For us a graphical object will be a collection of points (either wireframe model,
or bitmap) and we would like to show how one can perform some manipulations with such
objects. The simplest transformation is a translation (shift), where each point (vector) v is
22 Basic Notions
translated by a, i.e., the vector v is replaced by v + a (notation v 17 v + a is used for this).
A vector addition is very well adapted to the computers, so the translation is easy to
implement.
Note, that the translation is not a linear transformation (if a :f. 0): while it preserves
the straight lines, it does not preserve O. All other transformation used in computer graphics
are linear. The first one that comes to mind is rotation. The rotation by yaround the origin
o is given by the multiplication by the rotation matrix Rr we discussed above,
_ (COSY sin y)
R  .
r Stny cosy
Ifwe want to rotate around a point a, we first need to translate the picture bya, moving
the point a to 0, then rotate around 0 (multiply by R) and then translate everything back
by a. Another very useful transformation is scaling, given by a matrix
( ~ ~ ) ,
a, b :?: O. If a = b it is uniform scaling which enlarges (reduces) an object, preserving its
shape. If a :f. b then x and y coordinates scale di erently; the object becomes "taller" or
"wider". Another often used transformation is reflection: for example the matrix
defines the reflection through xaxis. We will show later in the book, that any linear
transformation in ]R2 can be represented either as a composition of scaling rotations and
reflections. However it is sometimes convenient to consider some di erent transformations,
like the shear transformation, given by the matrix
This transformation makes all objects slanted, the horizontal lines remain horizontal,
but vertical lines go to the slanted lines at the angle j to the horizontal ones.
3Dimensional Graphics
Threedimensional graphics is more complicated. First we need to be able to
manipulate 3dimensional objects, and then we need to represent it on 2dimensional plane
(monitor). The manipulations with 3dimensional objects is pretty straightforward, we have
the same basic transformations:
Translation, reflection through a plane, scaling, rotation. Matrices of these
Basic Notions 23
transformations are very similar to the matrices of their 2 x 2 counterparts. For example
the matrices
J'
o 0 I 0 0 cOO I
represent respectively reflection through x  y plane, scaling, and rotation around zaxis.
Note, that the above rotation is essentially 2dimensional transformation, it does not
change z coordinate.
Similarly, one can write matrices for the other 2 elementary rotations around x and
around y axes. It will be shown later that a rotation around an arbitrary axis can be
represented as a composition of elementary rotations.
So, we know how to manipulate 3dimensional objects. Let us now discuss how to
represent such objects on a 2dimensional plane.
The simplest way is to project it to a plane, say to the x  y plane. To perform such
projection one just needs to replace z coordinate by 0, the matrix of this projection is
%
000
y
:r
Fig. Perspective Projection onto x  y plane: F is the centre (focal point) of the projection
Such method is often used in technical illustrations. Rotating an object and projecting
it is equivalent to looking at it from di erent points. However, this method does not give a
very realistic picture, because it does not take into account the perspective, the fact that
the objects that are further away look smaller.
To get a more realistic picture one needs to use the socalled perspective projection.
To: Qefine a perspective projection one needs to pick a point the centre of projection or the
24 Basic Notions
focal point) and a plane to project onto. Then each point in ]R3 is projected into a point on
the plane such that the point, its image and the centre of the projection lie on the same line.
This is exactly how a camera works, and it is a reasonable first approximation of how our
eyes work.
Let us get a formula for the projection. Assume that the focal point is (0, 0, d)T and
that we are projecting onto xy plane. Consider a point v = {x, y, zl, and let
v* = (x*, y*, ol
be its projection, we get that
x* x
d dz'
so
y
,..... __ ..... "'" (x',y' , O)
x'
z
x
h)
z
Fig. Finding Coordinates x*, y* of the Perspective Projection of the Point (x, y, z) T
xd x
x*=  = 
dz lz/d
and similarly
y* = y .
lz/d
Note, that this formula also works if z > d and if z < 0: you can draw the corresponding
similar triangles to check It. Thus the perspective projection maps a point (x, y, z) to the
(
x y O)T
point 1 z / d ' 1  z / d'
This transformation is definitely not linear (because of z in the denominator). However
it is still possible to represent it as a linear transformation. To do this let us introduce the
socalled homogeneous coordinates.
In the homogeneous coordinates, every point in ]R3 is represented by 4 coordinates,
the last, 4th coordinate playing role of the scaling coe cient. Thus, to get usual3dimensional
coordinates of the vector v = (x, y, zl from its homogeneous coordinates (x l' x
2
, x
3
, x
4
l
Basic Notions 25
one needs to divide all entries by the last coordinate x
4
and take the first 3 coordinates 3 (if
x
4
= 0 this recipe does not work, so we assume that the case x
4
= 0 corresponds to the point
at infinity).
Thus in homogeneous coordinates the vector v* can be represented as
(x, y, 0, I  z/dl, so in homogeneous coordinates the perspective projection.
Ifwe multiply homogeneous coordinates of a point in]R2 by a nonzero scalar, we do
not change the point. In other words, in homogeneous coordinates a point in ]R3 is
represented by a line through 0 in ]R4.
is a linear transformation:
x 0 0 0 x
y 0 I 0 0 y
0
= 0 0 0 0 z
lzld 0 0 lid I I
Note that in the homogeneous coordinates the translation is also a linear transformation:
= 0 0 0 G3
o 0 0 I
x
y
z
I
But what happen if the centre of projection is not a point (0, 0, d) T but some arbitrary
point (d
t
, d
2
, d
3
l. Then we first need to apply the translation by (d
p
d
2
, O)Tto move the
centre to (0, 0, d
3
)T while preserving the xy plane, apply the projection, and then move
everything back translating it by (d
t
, d
2
, ol.
Similarly, if the plane we project to is not xy plane, we move it to the xy plane by
using rotations and translations, and so on.
All these operations are just multiplications by 4 x 4 matrices. That explains why
modern graphic cards have 4 x 4 matrix operations embedded in the processor.
Of course, here we only touched the mathematics behind 3dimensional graphics,
there is much more.
For example, how to determine which parts of the object are visible and which are
hidden, how to make realistic lighting, shades, etc.
Chapter 2
Systems of Linear Equations
Different Faces of Linear Systems
There exist several points of view on what a system of linear equations, or in short a
linear system is. The first one is, that it is simply a collection ofm linear equations with n
unknowns xl' X
2
' ... , X
n
'
{
all X, + a
12
x
2
+ ... + a'nXn =:. b
i
a
2I
x
i
+ a
22
x
2
+ '" + a
2n
X
n
 b
2
amixI + amZx
Z
+ ... + amnXn = b
m
To solve the system is to find all ntuples of numbers xl' X
2
' ... , xn which satisfy all m
equations simultaneously.
Ifwe denote X:= (xl' X
2
' ... , xnl E lR
n
, b = (b
l
,b
2
, ... , bml E lR
m
, and
A = ( ~ ~ : ~ ~ ~ : ~ ::: ~ ~ : : ] ,
am 'I am,Z '" am'n
then the above linear system can be written in the matrix form (as a matrix vector
equation)
Ax = b.
To solve the above equation is to find all vectors X E Rn satisfying Ax = b, and finally,
recalling the "column by coordinate" rule of the matrixvector multiplication, we can write
the system as a vector equation
xla
l
+ x
2
a
2
+ ... + xnan = b,
where a
k
is the kth column of the matrix A, a
k
= (alk' a
2
'k' ... , am,k)T, k = I, 2, ... , n.
Note, these three examples are essentially just different representations of the same
mathematical object.
Systems of Linear Equations 27
Before explaining how to solve a linear system, let us notice that it does not matter
what we call the unknowns, x
k
' Yk or something else. So, all the information necessary to
solve the system is contained in the matrix A, which is called the coefficient matrix of the
system and in the vector (right side) b. Hence, all the information we need is contained in
the following matrix
which is obtained by attaching the column b to the matrix A. This matrix is called the
augmented matrix ofthe system. We will usually put the vertical line separating A and b to
distinguish between the augmented matrix and the coefficient matrix.
Solution of a Linear System. Echelon and Reduced Echelon Forms
Linear system are solved by the GaussJordan elimination (which is sometimes called
row reduction). By performing operations on rows of the augmented matrix of the system
(i.e., on the equations), we reduce it to a simple form, the socalled echelon form. When
the system is in the echelon form, one can easily write the solution.
Row operations. There are three types of row operations we use:
1. Row exchange: interchange two rows of the matrix;
2. Scaling: multiply a row by a nonzero scalar a;
3. Row replacement: replace a row # k by its sum with a constant multiple of a row
# j; all other rows remain intact;
It is clear that the operations 1 and 2 do not change the set of solutions of the system;
they essentially do not change the system. As for the operation 3, one can easily see that it
does not lose solutions.
Namely, let a "new" system be obtained from an "old" one by a row operation of type
3. Then any solution of the "old" system is a solution of the "new" one.
To see that we do not gain anything extra, i.e., that any solution of the "new" system
is also a solution of the "old" one, we just notice that row operation of type 3 are reversible,
i.e., the "old' system also can be obtained from the "new" one by applying a row operation
of type 3.
Row operations and multiplication by elementary matrices. There is another, more
"advanced" explanation why the above row operations are legal.
Namely, every row operation is equivalent to the multiplication of the matrix from
the left by one ofthe special elementary matrices. Namely, the multiplication by the matrix
28
} k
o
1
o ........ .
k
......... 0
o
just interchanges the rows number} and number k.
1
o
1 0
o Q 0
k 0 ]
o o
o 1
Systems of Linear Equations
Multiplication by the matrix multiplies the row number k by Q. Finally, multiplication
by the matrix
}
k
1
o
1
Q
o
1
o
A way to describe (or to remember) these elementary matrices: they are obtained
from I by applying the corresponding row operation to it adds to the row # k row # }
multiplied by a, and leaves all other rows intact. To see, that the multiplication by these
matrices works as advertised, one can just see how the multiplications act on vectors
(columns).
Note that all these matrices are invertible (compare with reversibility of row operations).
The inverse ofthe first matrix is the matrix itself. To get the inverse ofthe second one, one
just replaces a by 1/a. And finally, the inverse of the third matrix is obtained by replacing
a by a. To see that the inverses are indeed obtained this way, one again can simply check
how they act on columns.
So, performing a row operatiQn on the augmented matrix of the system Ax = b is
equivalent to the multiplication of the system (from the left) by a special invertible matrix
E. Left multiplying the equality Ax = b by E we get that any solution of the equation
Ax =b
Systems oj Linear Equations
is also a solution of
EAx = Eb.
29
Multiplying this equation (from the left) by we get that any of its solutions is a
solution of the equation
,
which is the original equation Ax = b. So, a row operation does not change the solution
set of a system.
Row reduction. The main step of row reduction consists of three substeps:
1. Find the leftmost nonzero column of the matrix;
2. Make sure, by applying row operations of type 2, if necessary, that the first (the
upper) entry of this column is nonzero. This entry will be called the pivot entry
or simply the pivot;
3. "Kill" (i.e., make them 0) all nonzero entries below the pivot by adding
(subtracting) an appropriate multiple of the first row from the rows number 2, 3,
... ,m.
We apply the main step to a matrix, then we leave the first row alone and apply the
main step to rows 2, ... , m, then to rows 3, ... , m, etc.
The point to remember is that after we subtract a multiple of a row from all rows
below it (step 3), we leave it alone and do not change it in any way, not even interchange
it with another row.
After applying the main step finitely many times (at most m), we get what is called
the echelon form of the matrix.
An example of row reduction. Let us consider the following linear system:
{
XI + 2x2 + 3x3 = 1
3xI+2x2 +x3 = 7
2x1 + X
2
+ 2x3 = 1
The augmented matrix of the system is
jJ
2 1 2 1
Operate R
2
, 2(3), R
3
, (2), we get:
U n  j j JJ
Operate R2 (  ± ), we get
30 Systems of Linear Equations
J J =l)
Operate, R3 2 (3), we obtain
(
6 (6 l)
o 3 4 1 0 0 2 4
Now we can use the so called back substitution to solve the system. Namely, from the
last row (equation) we getx3 =2. Then from the second equation we get
x
2
= 1 2x3 =  1  2(2) = 3,
and finally, from the first row (equation)
xl = 1  2X2  3x3 = 1  6 + 6 = 1.
So, the solution is
: 13
x3 = 2,
or in vector form
or x= (1, 3,2l. We can check the solution by mUltiplying Ax, where A is the coefficient
matrix.
Instead of using back substitution, we can do row reduction from down to top, killing
all the entries above the main diagonal of the coefficient matrix: we start by multiplying
the last row by 112, and the rest is pretty selfexplanatory:
(
6 J) _ (6 g (6 g
o 0 1 2 0 0 1 2 0 0 1 2
and we just read the solution x = (1, 3,2)T 0 the augmented matrix.
Echelon form. A matrix is in echelon form if it satisfies the following two conditions:
1. All zero rows (i.e." the rows with all entries equal 0), if any, are below all non
zero entries.
For a nonzero row, let us call the leftmost nonzero entry the leading entry. Then the
second property of the echelon form can be formulated as follows:
2. For any nonzero row its leading entry is strictly to the right of the leading entry
in the previous row.
The leading entry in each row in echelon form is also called pivot entry, Pivots: leading
(rightmost nonzero entries) in a row. or simply pivot, because these entries are exactly
the pivots we used in the row reduction.
Systems of Linear Equations 31
A particular case of the echelon form is the socalled triangular form. We got this
form in our example above. In this form the coefficient matrix is square (n x n), all its
entries on the main diagonal are nonzero, and all the entries below the main diagonal are
zero. The right side, i.e., the rightmost column of the augmented matrix can be arbitrary.
After the backward phase of the row reduction, we get what the socalled reduced
echelonform of the matrix: coefficient matrix equal I, as in the above example, is a particular
case of the reduced echelon form.
The general definition is as follows: we say that a matrix is in the reduced echelon
form, if it is in the echelon form and
3. All pivot entries are equal I;
4. All entries above the pivots are O. Note, that all entries below the pivots are also
o because of the echelon form.
To get reduced echelon form from echelon form, we work from the bottom to the top
and from the right to the left, using row replacement to kill all entries above the pivots.
An example of the reduced echelon form is the system with the coefficient matrix
equal!. In this case, one just reads the solution from the reduced echelon form. In general
case, one can also easily read the solution from the reduced echelon form. For example, let
the reduced echelon form of the system (augmented matrix) be
ooills 02;
(
ill 2 0 0 0 IJ
0000ill3
here we boxed the pivots. The idea is to move the variables, corresponding to the
columns without pivot (the socalled free variables) to the right side.
Then we can just write the solution.
Xl = 12x2
x
2
is free
x3 = 2 Sx4
x
4
is free
x5 = 3
or in the vector form
X=
12x2
x2
1Sx4
One can also find the solution from the echelon form by using back substitution: the
idea is to work from bottom to top, moving all free variables to the right side.
32 Systems of Linear Equations
Analyzing the Pivots
All questions about existence of a solution and it uniqueness can be answered by
analyzing pivots in the echelon (reduced echelon) form of the augmented matrix of the
system. First of all, let us investigate the question of when when is the equation Ax = b
inconsistent, i.e., when it does not have a solution. The answer follows immediately, if
one just thinks about it:
a system is inconsistent (does not have a solution) if and only ifthere is a pivot in the
last row of an echelon form of the augmented matrix, i.e., I an echelon form of the augmented
matrix has a row 00 ... ° b, b :t ° in it.
Indeed, such a row correspond to the equation ox
I
+ Ox
2
+ . .. + xn = b :t ° that does not
have a solution. Ifwe don't have such a row, we just make the reduced echelon form and
then read the solution off.
Now, three more statements. Note, they all deal with the coefficient matrix, and not
with the augmented matrix of the system.
1. A solution (if it exists) is unique i there are no free variables, that is if and only
if the echelon form of the coefficient matrix has a pivot in every column;
2: Equation Ax = b is consistent for all right sides b if and only if the echelon form
of the coefficient matrix has a pivot in every row.
3. Equation Ax = b has a unique solution for any right side b if and only if echelon
form of the coefficient matrix A has a pivot in every column and every row.
The first statement is trivial, because free variables are responsible for all non
uniqueness. I should only emphasize that this statement does not say anything about the
existence.
The second statement is a tiny bit more complicated. If we have a pivot in every row
of the coefficient matrix, we cannot have the pivot in the last column of the augmented
matrix, so the system is always consistent, no matter what the right side b is.
Let us show that if we have a zero row in the echelon form ofthe coefficient matrix A,
then we can pick a right side b such that the system Ax = b is not consistent. LetAe echelon
form of the coefficient matrix A. Then
Ae=EA,
where E is the product of elementary matrices, corresponding to the row operations, E
= EN, ... , E
2
, E
I
. If Ae has a zero row, then the last row is also zero. Therefore, if we put be
= (0, ... ,0, Il (all entries are 0, except the last one), then the equation
Ac = be
does not have a solution. Multiplying this equation by n
I
from the left, an recalling
that nIAe = A, we get that the equation
Ax = nIb
e
does not have a solution.
Finally, statement 3 immediately follows from statements 1 and 2.
Systems of Linear Equations 33
From the above analysis of pivots we get several very important corollaries. The main
observation. In echelon form, any row and any column have no more than 1 pivot in it (it
can have 0 pivots)
Corollaries about Linear Independence and Bases
Questions as to when a system of vectors in ~ n is a basis, a linearly independent or
a span'1ing system, can be easily answered by the row reduction.
Proposition. Let us have a system of vectors vI' v
2
• ...• Vm E ~ n . and let A = [vI' v
2
•
...• v
m
] be an n x m matrix with columns vI' v
2
• ...• v
m
. Then
1. The system vI' v
2
, ... , vm is linearly independent i echelonform of A has a pivot
in every column;
2. The system VI' V
2
' ••. , vrn is complete in ~ n (spanning. generating) i echelon
form of A has a pivot in every row;
3. The system vi' v
2
' •.. , vm is a basis in ~ n i echelon form of A has a pivot in
every column and in every row.
Proof The system VI' v
2
' •.. , vm E ~ m is linearly independent ifand only if the equation
XlvI +x
2
v
2
+··· +xmvm=O
has the unique (trivial) solution XI = x
2
= ... = xm = 0, or equivalently, the equation Ax
= 0 has unique solution x = O. By statement 1 above, it happens if and only if there is a
pivot in every column of the matrix.
Similarly, the system VI' v
2
' •.. , vm E ~ m is complete in ~ n ifand only if the equation
XlvI +x
2
v
2
+··· +xmvm=b
has a solution for any right side b E ~ n . By statement 2 above, it happens if and only
if there is a pivot in every column in echelon form of the matrix.
And finally, the system VI' v
2
' ••. , vm E ~ m is a basis in ~ n ifand only if the equation
XlVI +x
2
v
2
+··· +xmvm=b
has unique solution for any right side b E ~ n . By statement 3 this happens if and
only ifthere is a pivot in every column and in every row of echelon form of A.
Proposition. Any linearly independent system of vectors in ~ n cannot have more
than n vectors in it.
Proof Let a system vi' v
2
' ... , vm E ~ n be linearly independent, and letA = [VI' v
2
' ••• ,
v m] be the n x m matrix with columns v I' v
2
' ... , v m. By Proposition echelon form of A must
have a pivot in every column, which is impossible if m > n (number of pivots cannot be
more than number of rows).
Proposition. Any two bases in a vector space V have the same number of vectors in
them.
34 Systems of Linear Equations
Proof Let vI' V
2
' ... , vn and w"w
2
' ..• ,w
m
be two different bases in V. Without loss of
generality we can assume that n ~ m. Consider an isomorphism A : IR.
n
~ V defined by
Ae
k
= v
k
' k = 1,2, ... n,
where e
l
, e
2
, ... , en is the standard basis in IRn.
Since AI is also an isomorphism, the system AI
wl
, AI w
2
' ••. , AI wm is a basis. So
it is linearly independent, m ~ n. Together with the assumption n ~ m this implies that m
=n.
The statement below is a particular case of the above proposition.
Proposition. Any basis in IR
n
must have exactly n vectors in it.
Proof This fact foIlows immediately from the previous proposition, but there is also
a direct proof. Let v I' v
2
' .•• , v m be a basis in IR n and let A be the n x m matrix with
columns VI' v
2
' ... , v
m
. The fact that the system is a basis, means that the equation
Ax = b
has a unique solution for any (all possible) right side b. The existence means that
there is a pivot in every row (of a reduced echelon form of the matrix), hence the number
of pivots is exactly n. The uniqueness mean that there is pivot in every column of the
coefficient matrix (its echelon form), so
m = number of columns = number of pivots = n
Proposition. Any spanning (generating) set in IR n must have at least n vectors.
Proof Let VI' v
2
' ... , vm be a complete system in IR
n
, and letA be n x m matrix with
columns VI' v
2
' ... , v
m
. Statement 2 of Proposition implies that echelon form of A has a
pivot in every row. Since number of pivots cannot exceed the number of rows, n ~ m.
Corollaries About Invertible Matrices
Proposition. A matrix A is invertible if and only if its echelon form has pivot in every
column and every row.
Proof As it was discussed in the beginning of the section, the equation Ax = b has a
unique solution for any right side b if and only if the echelon form of A has pivot in every
row and every column. But, we know, that the matrix (linear transformation) A is invertible
if and only if the equation Ax = b has a unique solution for any possible right side b.
There is also an alternative proof. We know that a matrix is invertible if and only if its
columns form a basis in. Proposition above states that it happens if and only if there is a
pivot in every row and every column.
The above propo·sition immediately implies the foIlowing
Corollary. An invertible matrix must be square (n x n).
Proposition.lj a square (n x n) matrix is left invertible, or ifit is right right invertible,
then it is invertible. In other words, to check the invertibility of a square matrix A it is
sucient to check only one of the conditions AAI = 1, AIA = 1.
Systems of Linear Equations 35
Note, that this proposition applies only to square matrices!
Proof We know that matrix A is invertible if and only if the equation Ax = b has a
unique solution for any right side b. This happens if and only if the echelon form of the
matrix A has pivots in every row and in every column.
If a matrix A is left invertible, the equation Ax = 0 has unique solution x = O. Indeed,
if b is a left inverse of A (i.e., BA = l), and x satisfies
Ax = 0,
then multiplying this identity by B from the left we get x = 0, so the solution is unique.
Therefore, the echelon form of A has pivots in every row. If the matrix A is square (n x n),
the echelon form also has pivots in every column, so the matrix is invertible.
If a matrix A is right invertible, and C is its right inverse (AC = l), then for x = Cb, b
E IR
n
Ax = ACb = Ib = b.
Therefore, for any right side b the equation Ax = b has a solution x = Cb. Thus, echelon
form of A has pivots in every row. If A is square, it also has a pivot in every column, so A
is invertible.
FINDING AI BY ROW REDUCTION
As it was discussed above, an invertible matrix must be square, and its echelon form
must have pivots in every row and every column. Therefore reduced echelon form of an
invertible matrix is the identity matrix 1. Therefore,
Any invertible matrix is row equivalent (Le. can be reduced by row operations)
to the identity matrix.
Now let us state a simple algorithm of finding the inverse of an n x n matrix:
1. Form an augmented n x 2n matrix (All) by writing the n x n identity matrix
right of A;
2. Performing row operations on the augmented matrix transform A to the identity
matrix I;
3. The matrix I that we added will be automatically to AI;
4. If it is impossible to transform A to the identity by row operations, A is not
invertible
There are several possible explanations of the above algorithm. The first, a na·"yve
one, is as follows: we know that (for an invertible A) the vector AIb is the solution of the
equation Ax = b. So to find the column number k of AI we need to find the solution of Ax
= ek, where e
l
, e
2
, ... , en is the standard basis in Rn. The above algorithm just solves the
equations
Ax = e
k
, k = 1,2, ... , n
simultaneously!
36 Systems of Linear Equations
Let us also present another, more "advanced" explanation. As we discussed above,
every row operation can be realized as a left multiplication by an elementary matrix. Let
E
I
, E
2
, ... , EN be the elementary matrices corresponding to the row operation we performed,
and let E = EN ... E2EI be their product. 1 We know that the row operations transform A to
identity, i.e., EA = J, so E = AI. But the same row operations transform the augmented
matrix (AI 1) to (EA IE) = (I I AI).
This "advanced" explanation using elementary matrices implies an important
proposition that will be often used later.
Theorem. Any invertible matrix can be represented as a product of elementary matrices.
Proof As we discussed in the previous paragraph,
AI =EN··· E E
2 I'
so
A = (AIt
l
= EjlE";l ... E;/.
Suppose we want to find the inverse of the matrix
(
1 4 2J
2 7 7 .
3 11 6
Augmenting the identity matrix to it and performing row reduction we get
(
1 4 2 1 0 OJ ( 1 4 2 1
2 7 7 0 1 0 2R  0 1 3 2
3 11 6 0 0 1 ~ 3 R : 0 1 03
o OJ
1 0
o 1 +R2
+2R2
(
1 4 2 1 0 0JX3 (3 12 6 3 0 0 ~ J  R 3
013210 01321
o 0 3 1 1 1 0 0 3 1 1
Here in the last row operation we multiplied the first row by 3 to avoid fractions in the
backward phase of row reduction. Continuing with the row reduction we get
(
3 12 0 1 2 2J
12R
2 (3 0 0 35 2 14J
o 1 0 3 0 1  0 1 0 3 0 1
o 0 3 1 1 1 0 0 3 1 1 1
Dividing the first and the last row by 3 we get the inverse matrix
(
35/3 2/3 14/3J
3 0 1
113 1/3 113
DIMENSION, FINITEDIMENSIONAL SPACES
Definition. The dimension dim V of a vector space V is the number of vectors in a
basis.
Systems of Linear Equations 37
For a vector space consisting only of zero vector 0 we put dim V = o. If V does not
have a (finite) basis, we put dim V = 00. If dim V is finite, we call the space V finite
dimensional; otherwise we call it infinitedimensional.
Proposition asserts that the dimension is well defined, i.e., that it does not"depend on
the choice of a basis.
This immediately implies the following
Proposition. A vector space Vis finitedimensional if and only if it has a finite spanning
system.
Suppose, that we have a system of vectors in a finitedimensional vector space, and
we want to check if it is a basis (or if it is linearly independent, or if it is complete)?
Probably the simplest way is to use an isomorphism A : V t IR
n
, n = dimE to move the
problem to IR
n
, where all such questions can be answered by row reduction (studying
pivots).
Note, that if dim V = n, then there always exists an isomorphism A : V t IRn. Indeed,
if dim V = n then there exists a basis
VI' V
2
' ••• , Vn E V,
and one can define an isomorphism
A : V t IR
n
by
AVk = e
k
, k = 1,2, ... , n.
Proposition. Any linearly independent system in a finitedimensional vector space V
cannot have more than dim V vectors in it.
Proof Let vI' v
2
' ... , vm E Vbe a linearly independent system, and letA: V t IR
n
be
an isomorphism. Then Av
I
, Av
2
, ... , AVm is a linearly independent system in IR
n
, and by
Proposition m ::; n.
Proposition. Any generating system in afinitedimensional vector space V must have
at least dim V vectors in it.
Proof Let vI' v
2
, ••. , vm E Vbe a complete system in V, and let
A : V t IR
n
be an isomorphism. Then Av
I
, Av
2
, ... , AVm is a complete system in lR
n
, and by
Proposition
m ~ n.
Proposition. Any linearly independent system 0/ vectors in a finitedimensional space
can be extended to a basis, i.e., ijvI' v
2
, ... , vr are linec:rly independent vectors in afinite
dimensional vector space V then one can find vectors v
r
+l, v
r
+ 2 """' vn such that the
system o/vectors vI' v
2
, """' vn is a basis in V.
38 Systems of Linear Equations
Proof Let n = dim Vand let r < n (if r = n then the system v I' V
2
' .•• , v r is already a basis,
and the case r> n is impossible). Take any vector not belonging to span{vl' v
2
' ... , v
r
} and
call it vr + I (one can always do that because the system vI' V
2
' ... , vr is not generating).
The system vI' v
2
' ... , v
r
' vr + I is linearly independent. Repeat the procedure with the new
system to get vector v r + 2, and so on.
We will stop the process when we get a generating system. Note, that the process
cannot continue infinitely, because a linearly independent system of vectors in V cannot
have more than n = dim V vectors.
General Solution of a Linear System
In this short section we discuss the structure ofthe general solution (i.e., ofthe solution
set) of a linear system. We call a system Ax = b homogeneous, if the right side, b = 0, i.e.,
a homogeneous system is a system of form Ax = O. With each system
Ax = b
we can associate a homogeneous system just by putting b = O.
Theorem. (General solution of a linear equation). Let a vector xlsatisfy the equation
Ax = b, and let H be the set of all solutions of the associated homogeneous system
Ax = O.
Then the set
{x = X I + x
h
: xh E H}
is the set of all solutions of the equation Ax = b.
In other words, this theorem can be stated as
General solution A particular solution General solution
= +
of Ax = b of Ax = b of Ax = 0
Proof Fix a vector xI satisfying
AXI =b.
Let a vector xh satisfy
Axh = O.
Then for
we have
Ax = A(x
I
+ xh) = AXI + AXh = b + 0 = b,
so any x of form
x = xI + xh' xII E H
is a solution of
Ax = b.
Now let x be satisfy Ax = b. Then for
x
h
:=xx
I
we get
Systems of Linear Equations 39
AXh = A(x  xl) = Ax  AXI = b  b = 0,
so
X
h
E H.
Therefore any solution x of Ax = b can be represented as x = xl + x
h
with some x
h
E
H.
The power of this theorem is in its generality. It applies to all linear equations, we do
not have to assume here that vector spaces are finitedimensional. You will meet this theorem
in differential equations, integral equations, partial differential equations, etc. Besides
showing the structure of the solution set, this theorem allows one to separate investigation
of uniqueness from the study of existence. Namely, to study uniqueness, we only need to
analyse uniqueness of the homogeneous equation Ax = 0, which always has a solution.
There is an immediate application in this course: this theorem allows us to check a
solution of a system Ax = b. For example, consider a system
(
1 1 1 1 = ~ ) x = (li).
2 2 2 2 8 14
Performing row reduction one can find the solution of this system
The parameters x
3
, x5 can be denoted here by any other letters, t and s, for example;
we keeping notation x3 and Xs here only to remind us that they came from the corresponding
free variables.
Now, let us suppose, that we are just given this solution, and we want to check whether
or not it is correct. Of course, we can repeat the row operations, but this is too time
consuming. Moreover, if the solution was obtained by some nonstandard method, it can
look differently from what we get from the row reduction. For example the formula
gives the same set as (can you say why?); here we just replaced the last vector by its
sum with the second one. So, this formula is different from the solution we got from the
row reduction, but it is nevertheless correct.
The simplest way to check that give us correct solutions, is to check that the first
vector (3, 1, 0, 2, ol satisfies the equation Ax = b, and that the other two (the ones with
40 Systems of Linear Equations
the parameters x3 and Xs or sand t in front of them) should satisfy the associated
homogeneous equation Ax = O.
If this checks out, we will be assured that any vector x defined is indeed a solution.
Note, that this method of checking the solution does not guarantee that gives us all the
solutions. For example, if we just somehow miss the term with x
2
' the above method of
checking will still work fine. What comes to mind, is to count the pivots again. In this
example, if one does row operations, the number of pivots is 3. So indeed, there should be
2 free variables, and it looks like we did not miss anything.
To be able to prove this, we will need new notions of fundamental subspaces and of
rank of a matrix. Systems of linear equations example, one does not have to perform all
row operations to check that there are only 2 free variables, and that formulas both give
correct general solution.
FUNDAMENTAL SUBSPACES OF A MATRIX
A: V ~ Wwe can associate two subspaces, namely, its kernel, or null space
KerA = Null A := {v E V: Av = O} C V, and its range
Ran A = {w E W: w = Av for some v E V} C W.
In other words, the kernel Ker A is the solution set of the homogeneous equation Ax =
0, and the range Ran A is exactly the set of all right sides b E W for which the equation Ax
= b has a solution.
If A is an m x n matrix, i.e., a mapping from ]Rn to ]Rm, then it follows from the
"column by coordinate" rule ofthe matrix mUltiplication that any vector W E Ran A can be
represented as a linear combination of columns of A. This explains the name column space
(notation Col A), which is often used instead of Ran A.
If A is a matrix, then in addition to Ran A and Ker A one can also consider the range
and kernel for the transposed matrix AT . Often the term row space is used for Ran AT and
the term left null space is used for KerAT (but usually no special notation is introduced).
The four subspaces RanA, Ker A, Ran AT, Ker AT are called the fundamental subspaces
of the matrix A. In this section we will study important relations between the dimensions
of the four fundamental subspaces. We will need the following definition, which is one of
the fundamental notions of Linear Algebra.
Definition. Given a linear transformation (matrix) a its rank, rankA, is the dimension
of the range of A
rankA := dim Ran A.
Computing Fundamental Subspaces and Rank
To compute the fundamental subspaces and rank ofa matrix, one needs to do echelon
reduction. Namely, let a be the matrix, and Ae be its echelon form
Systems of Linear Equations 41
1. The pivot columns of the original matrix a (i.e., the columns where after row
operations we will have pivots in the echelon form) give us a basis (one of many
possible) in Ran A.
2. The pivot rows of the echelon from Ae give us a basis in the row space. Of
course, it is possible just to transpose the matrix, and then do row operations.
But if we already have the echelon form of A, say by computing Ran A, then we
get Ran AT for free.
3. To find a basis in the null space Ker A one needs to solve the homogeneous
equation Ax = 0: the details will be seen from the example below.
Example. Consider a matrix
(
i i ~ ~ ~ J
3 3 3 3 2
0
'
1 1 1 1
Performing row operations we get the echelon form
(
~ ~ ~ ; !J
o 0 000
o 0 000
(the pivots are boxed here). So, the columns 1 and 3 of the original matrix,
i.e., the columns
give us a basis in Ran A. We also get a basis for the row space RanA T for free: the first
and second row of the echelon form of A, i.e., the vectors
(we put the vectors vertically here. The question of whether to put vectors here vertically
as columns, or horizontally as rows is is really a matter of convention. Our reason for
putting them vertically is that although we call RanAT the row space we define it as a
column space of AT)
To compute the basis in the null space Ker A we need to solve the equation Ax = O.
Compute the reduced echelon form of A, which in this example is
42 Systems of Linear Equations
(
ill 1 0 0 113)
o 0 ill 1 113.
o 0 0 0 0
o 0 0 0 0
Note, that when solving the homogeneous equation Ax = 0, it is not necessary to write
the whole augmented matrix, it is sucient to work with the coefficient matrix. Indeed, in
this case the last column of the augmented matrix is the column of zeroes, which does not
change under row operations. So, we can just keep this column in mind, without actually
writing it. Keeping this last zero column in mind, we can read the solution 0 the reduced
echelon form above:
1
xI = x2 3"x
s
,
x2is free.
x
4
is free,
Xs is free
or, in the vector form
x= 1
x4 xs
3
X4
Xs
1
1
=x2
0
0
0
0
113
0 0
+x4
1
+xs
113
1 0
0 1
The vectors at each free variable, i.e., in our case the vectors
[
 ~ ] [g] [  1 1 ~ ]
O. 1, 113
o . 1 0
001
form a basis in KerA.
Unfortunately, there is no shortcut for finding a basis in KerAT, one must solve the
equation AT x = O. Unfortunately, the knowledge of the echelon form of a does not help
here.
Explanation of the Computing Bases in the Fundamental Subspaces
So, why do the above methods indeed give us bases in the fundamental subspaces?
Systems of Linear Equations 43
The null space KerA. The case of the null space KerA is probably the simplest one:
since we solved the equation Ax = 0, i.e., found all the solutions, then any vector in Ker A
is a linear combination of the vectors we obtained. Thus, the vectors we obtained form a
spanning system in Ker A. To see that the system is linearly independent, let us multiply
each vector by the corresponding free variable and add everything. Then for each free
variable x
k
, the entry number k of the resulting vector is exactly x
k
' so the only way this
vector (the linear combination) can be 0 is when all free variables are O.
The column space Ran A. Let us now explain why the method for finding a basis in
the column space Ran A works. First of all, notice that the pivot columns of the reduced
echelon form are of a form a basis in Ran Are. Since row operations are just left
multiplications by invertible atrices, they do not change linear independence. Therefore,
the pivot columns of the original matrix A are linearly independent.
Let us now show that the pivot columns of a span the column space of A. Let VI ' v
2
,
... , vr be the pivot columns of A, and let V be an arbitrary column of A. We want to show
that v can be represented as a linear combination of the pivot columns vI' v
2
, ... , v
r
'
v = a IVI + a
2
v
2
+ ... + arv
r
.
the reduced echelon form Are is obtained from A by the left multiplication
Are = EA,
where E is a product of elementary matrices, so E is an invertible matrix. The vectors
Ev
l
, Ev
2
, ... , EVr are the pivot columns of Are' and the column v ofa is transformed to the
column Ev of Are. Since the pivot columns of Are form a basis in RanA
re
, vector Ev can be
represented as a linear combination
Ev = alEv
I
+ a
2
Ev
2
+ ... + a,Ev
r
.
Multiplying this equality by gI from the left we get the representation
v = aIv
I
+ a
2
v
2
+ ... + arv
r
,
so indeed the pivot columns of A span Ran A.
The row space Ran A T. It is easy to see that the pivot rows of the echelon form Ae of
a are linearly independent. Indeed, let w
I
'w
2
, ... ,w
r
be the transposed (since we agreed
always to put vectors vertically) pivot rows of Ae. Suppose
alw
l
+ a
2
w
2
+ ... + arw
r
= O.
Consider the first nonzero entry of WI. Since for all other vectors w
2
,w
3
' ... , wr the
corresponding entries equal 0 (by the definition of echelon form), we can conclude that a
l
= O. So we can just ignore the first term in the sum.
Consider now the first nonzero entry ofw
2
. The corresponding entries of the vectors
w
3
, ... , wr are 0, so a
2
= O. Repeating this procedure, we get that a
k
= 0 Vk = 1, 2, ... , r.
To see that vectors w
I
,w
2
' ... , wr span the row space, one can notice that row operations
do not change the row space. This can be obtained directly from analyzing row operations,
but we present here a more formal way to demonstrate this fact.
44 Systems of Linear Equations
For a transformation A and a set X let us denote by A(X) the set of all elements y
which can represented as y = A(x), x E X,
A(X) : = {y = A(x) : x EX}.
If a is an m x n matrix, and Ae is its echelon form, Ae is obtained from A be left
multiplication
Ae = EA,
where E is an m x m invertible matrix (the product of the corresponding elementary
matrices). Then
Ran A: = Ran(AT ET) =AT (Ran ET) =AT ( ~ m ) = Ran AT ,
so indeed RanA T = Ran A:
THE RANK THEOREM, DIMENSIONS
OF FUNDAMENTAL SUBSPACES
There are many applications in which one needs to find a basis in column space or in
the null space of a matrix. For example, as it was shown above, solving a homogeneous
equation Ax = 0 amounts to finding a basis in the null space KerA. Finding a basis in the
column space means simply extracting a basis from a spanning set, by removing unnecessary
vectors (columns).
However, the most important application of the above methods of computing bases of
fundamental subspaces is the relations between their dimensions. Theorem
rankA = rankA T .
This theorem is often stated as follows:
IThe column rank of a matrix coincides with its row rank.l
The proof of this theorem is trivial, since dimensions of both Ran A and RanA Tare
equal to the number of pivots in the echelon form of A. The following theorem is gives us
important relations between dimensions of the fundamental spaces. It is often also called
the Rank Theorem
Theorem. Let A be an m Xn matrix, i.e., a linear transformationfrom ]Rn to ]Rm. Then
1. dim Ker A + dim Ran A = dim Ker A + rank A = n (dimension of the domain of
A).
2. dim Ker AT + dimRan AT = dim Ker AT + rank AT = dimKer AT + rankA
T
= m
(dimension of the target space of A).
Proof The proof, modulo the above algorithms of finding bases in the fundamental
subspaces, is almost trivial. The first statement is simply the fact that the number of free
variables (dimKer A) plus the number of basic variables (i.e., the number of pivots, i.e.,
rank A) adds up to the number of columns (i.e., to n). The second statement, if one takes
into account that rank A = rank AT is simply the first statement applied to AT.
Systems of Linear Equations
As an application of the above theorem, there we considered a system
(
2 3
1 1
1 1
2 2
1 4 9J (17J 1 1 3 6
1 2 5 x = 8 .
2 3 8 14
and we claimed that its general solution given by
or by
45
A vector x given by either formula is indeed a solution of the equation. But, how can
we guarantee that any of the formulas describe all solutions?
First of all, we know that in either formula, the last 2 vectors (the ones multiplied by
the parameters) belong to Ker A.1t is easy to see that in either case both vectors are linearly
independent (two vectors are linearly dependent if and only if one is a mUltiple of the
other).
Now, let us count dimensions: interchanging the first and the second rows and
performing first round of row operations

2R
J(i ~ ~ l = ~ J (J ~  ~ i = ~ J
R
J
1 1 1 2 5 0 0 0 1 2
2R
J
2 2 2 3 8 0 0 0 1 2
we see that there are three pivots already, S0 rank A ~ 3. (Actually, we already can
see that the rank is 3, but it is enough just to have the estimate here). By Theorem, rankA
+ dim Ker A = 5, hence dim Ker A ~ 2, and therefore there cannot be more than 2 linearly
independent vectors in KerA. Therefore, last 2 vectors in either formula form a basis in
KerA, so either formula give all solutions of the equation.
An important corollary of the rank theorem, is the following theorem connecting
existence and uniqueness for linear equations.
Theorem. Let A be an an m x n matrix. Then the equation
Ax = b
has a solution for every b E lR m if and only if the dual equation
ATx=O
46 Systems of Linear Equations
has a unique (only the trivial) solution. (Note, that in the second equation we have AT,
not A).
Proof The proof follows immediately from Theorem by counting the dimensions.
There is a very nice geometric interpretation of the second rank theorem. Namely,
statement 1 of the theorem says, that if a transformation a: IR
n
~ IRtn has trivial kernel
(KerA = {O}), then the dimensions of the domain Rn and of the range Ran A coincide. If
the kernel is nontrivial, then the transformation "kills" dimKerA dimensions, so dimRanA
= n  dim Ker A.
Representation of a Linear Transformation in
Arbitrary Bases, Change of Coordinates Formula
The material we have learned about linear transformations and their matrices can be
easily extended to transformations in abstract vector spaces with finite bases. In this section
we will distinguish between a linear transformation T and its matrix, the reason being that
we consider different bases, so a linear transformation can have different matrix
representation.
Coordinate vector. Let V be a vector space with a basis
B := {bI' b
2
, ... , b
n
}·
Any vector v E V admits a unique representation as a linear combination
n
V = xlb
I
+ x
2
b
2
+ ... + xnbn = LXkbk'
k=I
The numbers xl' X
2
, ... , Xn are called the coordinates of the vector v in the basis B. It is
convenient to join these coordinates into the socalled coordinate vector of v relative to
the basis B, which is the column vector
Note that the mapping
v ~ [v]B
is an isomorphism between Vand IRn. It transforms the basis v!' v
2
, ... , vn to the
standard basis e
I
, e
2
, ... , en in IRn.
Matrix of a linear transformation. Let T: V ~ W be a I inear transformation, and let
a = {a!' a
2
, ... , an}, B := {bl' b
2
, "'j b
m
}
be bases in Vand W respectively.
A matrix of the transformation T in (or with respect to) the bases a and b is an m x n
matrix, denoted by [11
BA
. which relates the coordinate vectors [Tv]B and [v]A'
[Tv]B= [1]BA [v]A;
Systems of Linear Equations 47
notice the balance of symbols A and B here: this is the reason we put the first basis A
into the second position.
The matrix [1]BA is easy to find: its kth column is just the coordinate vector [Tak]B
(compare this with finding the matrix of a linear transformation from ~ n to ~ m ) .
As in the case of standard bases, composition of linear transformations is equivalent
to multiplication oftheir matrices: one only has to be a bit more careful about bases. Namely,
let T) : x ~ Yand T2 : Y ~ Z be linear transformation, and let A, Band C be bases in X,
Yand Z respectively.
The for the composition T= T
2
T
I
,
T: x ~ Z, Tx:= T
2
(T
I
(x))
we have
[1]CA = [T2
T
dcA= [T
2
]CB [TdBA
(notice again the balance of indices here).
The proof here goes exactly as in the case of ~ n spaces with standard bases, so we do
not repeat it here. Another possibility is to transfer everything to the spaces ~ n via the
coordinate isomorphisms v ~ [v]B' Then one does not need any proof, everything follows
from the results about matrix multiplication.
Change of Coordinate Matrix. Let us have two bases
A = {aI' a
2
, ... , an}
and
b = {bI' b
2
, ... , b
n
}
in a vector space V. Consider the identity transformation I = I v and its matrix [1]BA in
these bases. By the definition
[v]B = [1]BA [v] A , \Iv E V.
i.e., for any vector v E Vthe matrix [1]BA transforms its coordinates in the basis a into
coordinates in the basis B. The matrix [1]BA is often called the change of coordinates (from
the basis A to the basis B) matrix.
The matrix [1]BA is easy to compute: according to the general rule of finding the matrix
of a linear transformation, its kth column is the coordinate representation [aklB of kth
element of the basis A Note that
[1]AB = ([1] BAtl,
(follows immediately from the mUltiplication of matrices rule), so any change of
coordinate matrix is always invertible.
An example: change of coordinates from the standard basis. Let our space Vbe ~ n ,
and let us have a basis B = {bI' b
2
, .. " b
n
} there. We also have the standard basis
S = {el' e
2
, "., en}
there. The change of coordinates matrix [1]SB is easy to compute:
[1]SB = [bl' b
2
, .'" b
n
] =: B,
48 Systems of Linear Equations
i.e., it is just the matrix B whose kth column is the vector (column) v
k
. And in the
other direction
[l]BS = ([1] SB )1 = 15
1
.
For example, consider a basis
B={(1), un
in }R2 , and let S denote the standard basis there. Then
[1]SB = U i) =: B
and
[
1]1 B1 1 (I 2)
S[1]BS= SB = ="3 2 I
(we know how to compute inverses, and it is also easy to check that the above matrix
is indeed the inverse of B)
An example: going through the standard basis. In the space of polynomials of degree
at most 1 we have bases .
A = {I, 1 + x}, and B = {I + 2x, 1  2x},
and we want to find the change of coordinate matrix [1]BA.
Of course, we can always take vectors from the basis A and try to decompose them in
the basis B; it involves solving linear systems, and we know how to do that.
However, I think the following way is simpler. In PI we also have the standard basis
S = {l, x}, and for this basis
[I]SA = (6 n =: A, [llsA = (1 ~ 2 ) =: B,
and taking the inverses
_I (1 1) 1 1 (2 1)
[1]As=A = 0 1 ' [1]BS= 15 = 4" 2 1·
Then
and
Notice the balance of indices here. [1]
Matrix of a transformation and change of coordinates. Let T: V ~ W be a linear
transformation, and let A, A be two bases in V and let B, B be two bases in W. Suppose
we know the matrix [1] BA' and we would like to find the matrix representation with respect
to new bases A, B, i.e., the matrix [1] BA. The rule is very simple:
Systems of Linear Equations
to get the matrix in the "new" bases one has to surround the matrix in the
"old" bases by change of coordinates matrices.
49
We did not mention here what change of coordinate matrix should go where, because
we don't have any choice if we follow the balance of indices rule. Namely, matrix
representation of a linear transformation changes according to the formula Notice the balance
of indices.
I[Tls
A
= [I]BB[T]BA[I]AA I
The proof can be done just by analyzing what each of the matrices does.
Case of one basis: similar matrices. Let V be a vector space and let A = {aI' a
2
, ... ,
an} be a basis in V. Consider a linear transformation T: V ~ V and let [11
AA
be its matrix
in this basis (we use the same basis for "inputs" and "outputs")
The case when we use the same basis for "inputs" and "outputs" is very important
(because in this case we can multiply a matrix by itself), so let us study this case a bit more
carefully. Notice, that very often in this [11
A
is often used instead of [11
AA
. It is shorter, but
two index notation is better adapted to the balance of indices rule. case the shorter notation
[11
A
is used instead of [T]AA' However, the two index notation [T] is better adapted to the
balance of indices rule, so I recommend using it (or at least always keep it in mind) when
doing change of coordinates.
Let B = {b
I
, b
2
, ... , bn} be another basis in V. By the change of coordinate rule above
[T]BB = [I]BA[T]AA[I]AB
Recalling that
[I]BA = [flAk
and denoting Q := [I]AB ' we can rewrite the above formula as
[11
BB
= Q 1 [11
AA
Q.
This gives a motivation for the following definition
Definition. We say that a matrix A is similar to a matrix b if there exists an invertible
matrix Q such that A = gI BQ.
Since an invertible matrix must be square, it follows from counting dimensions, that
similar matrices A and B have to be square and of the same size. If a is similar to B, i.e., if
A = gIBQ, then
B = QAgI = (gItIA(gI)
(since gl is invertible), therefore B is similar to A. So, we can just say that A and B
are similar.
The above reasoning shows, that it does not matter where to put Q and where gl:
one can use the formula A = QBgI in the definition of similarity.
The above discussion shows, that one can treat similar matrices as different matrix
representation of the same linear operator (transformation).
Chapter 3
Matrics
Introduction
A rectangular array of numbers of the form
[
a ~ i a;n 1
ami a
mn
is called an m x n matrix, with m rows and n columns. We count rows from the top
and columns from the left. Hence
(ai 1 ... ain) and
a
mj
represent respectively the ith row and the jth column of the matrix (1), and aij
represents the entry in the matrix (1) on the ith row andjth column.
Example. Consider the 3 x 4 matrix
[J
1
4 3
~ 1
I 5
0 7
Here
and m
represent respectively the 2nd row and the 3rd column ofthe matrix, and 5 represents
the entry in the matrix on the 2nd row and 3rd column.
We now consider the question of arithmetic involving matrices. First of all, let us
Matrics 51
study the problem of addition. A reasonable theory can be derived from the following
definition.
Definition. Suppose that the two matrices
[
1 .. . 1 [ I .. . 1
A=: : and B = :
amI a
mn
b
m1
b
mn
both have m rows and n columns. Then we write
A+B=[ all al
n
1
amI + b
ml
a
mn
· + b
mn
and call this the sum of the two matrices A and B.
Example. Suppose that
A=[ ;
1 0 7
and B =[
6 2 1 3 3
Then
A+B = and
11 0+1 7+3 6+3 1 1 10
Example. We do not have a definition for adding" the matrices
[
2 4 3 I] and ;
1 0 7 6
1 0 7
Proposition. (Matrix Addition) Suppose that A, B, Care m x n matrices. Suppose
further that 0 represents the m x n matrix with all entries zero. Then
(a) A + B = B + A;
(b) A + (B + C) = (A + B) + C;
(c) A + 0 = A; and
(d) there is an m x n matrix A' such that A + A' = O.
Proof Parts (a)  (c) are easy consequences of ordinary addition, as matrix addition is
simply entrywise addition. For part (d), we can consider the matrix A' obtained from A by
multiplying each entry of A by 1.
The theory of multiplication is rather more complicated, and includes multiplication
of a matrix by a scalar as well as mUltiplication of two matrices.
We first study the simpler case of multiplication by scalars.
52 Matrics
Definition. Suppose that the matrix
A
 .
 .
ami
has m rows and n columns, and that C E JR. Then we write
cA = 1
cam I ca
mn
. and call this the product of the matrix A by the scalar c.
Example. Suppose that
[
2 4 3 1]
A= 3 1 5 2 .
1 0 7 6
Then
2A = [:
2 0 14 12
Proposition. (Multiplication By Scalar) Suppose that A, Bare m x n matrices, and that
c, dE JR. Suppose further that 0 represents the m x n matrix with all entries zero. Then
(a) c(A + B) = cA + cB;
(b) (c + d)A = cA + dA;
(c) OA = 0; and
(d) c(dA) = (cd)A.
Proof These are all easy consequences of ordinary multiplication, as multiplication
by scalar c is simply entrywise multiplication by the number c.
The question of multiplication of two matrices is rather more complicated. To motivate
this, let us consider the representation of a system of linear equations
a
l1
x
I +···+a1nx
n
in the form Ax = b, where
[
I . . . 1 1
A=: : andb = :
ami a
mn
b
m
represent the coeffcients and
Matrics
represents the variables. This can be written in full matrix notation by
Can you work out the meaning of this representation?
Now let us define matrix multiplication more formally.
Definition. Suppose that
A= and B 
53
are respectively an m x n matrix and an n x p matrix. Then the matrix product AB is
given by the m x p matrix
AB 
qml qmp
where for every i = 1, ... , m and j = 1, ... , p, we have
n
qij = ~ a i k b k j = ailb
1j
+ ... +ainb,y·
k=l
Remark. Note first of all that tbe number of columns of the first matrix must be equal
to the numberof rows of the second matrix. On the other hand, for a simple way to work
out q jj , the entry in the ithrow and jth column of AB, we observe that the ith row of A
and the jth column of B are respectively
b
1j
b
nj
We now multiply the corresponding entries  from ail with bll' and so on, until a
in
with b nj  and then add these products to obtain q ij'
Example. Consider the matrices
54
A=[ ~ ;
1 0
4
3
2
1
Matrics
Note that A is a 3 x 4 matrix and B is a 4 x 2 matrix, so that the product AB is a 3 x 2
matrix. Let us calculate the product
Consider first of all q II. To calculate this, we need the Ist row of A and the Ist
column of B, so let us cover up all unnecessary information, so that
[
2 4 3 I] ~ : [ qtl qt2]
xxx x = xx.
o x
x x x x x x
3 x
From the definition, we have
qll = 2.1 + 4.2 + 3.0 + (1) .3 = 2 + 8 + 0 3 = 7.
Consider next q 12. To calculate this, we need the Ist row of A and the 2nd column of
B, so let us cover up all unnecessary information, so that
[
2 4 3 1]: ; [X ql21
x x x x =X x.
x 2
x x x x x xJ
x 1
From the definition, we have
qI2 = 2.4 + 4.3 + 3 ( 2) + ( 1) .1 = 8 + 12  6 1 = 13.
Consider next q2I. To calculate this, we need the 2nd row of A and the Ist column of
B, so let us cover up all unnecessary information, so that
[ ~ ~ ~ ~ H : + ~ l :]
3 x
From the definition, we have
q2I = 3.1 + 1.2 + 5. 0 + 2.3 = 3 + 2 + 0 + 6 = 11
Matries 55
Consider next q22. To calculate this, we need the 2nd row of A and the 2nd column
of B, so let us cover up all unnecessary information, so that
x x x x
3 5 2
x x x x
x 4
x 3
x 2
3
From the definition, we have
x x j
= X q22 .
X X
q22 = 3.4 + 1.3 + 5. ( 2) + 2.1 = 12 + 3 10 + 2 = 7.
Consider next q31. To calculate this, we need the 3rd row of A and the Ist column of
B, so let us cover up all unnecessary information, so that
x x x v
x x x x
1 0 7 6
From the definition, we have
1 x
2 x
o x
3 x
q31 =(1).1 +0.2+7.0+6.3 =1 +0+0+ 18= 17.
Consider finally q32. To calculate this, we need the 3rd row of A and the 2nd column
of B, so let us cover up all unnecessary information, so that
: : :j: : =[:
x 2
1 0 7 6 X q32
X
x x
x x
From the definition, we have
q32 = (1) .4 + 0.3 + 7 ( 2) + 6.1 =  4 + 0 + ( 14) + 6 =  12.
We therefore conclude that
1 4
AB=[ ~
4 3 Ij
+ ~
13 1
2 3
152 7 .
0 2
1 o 7 6 17 12
3
Example. Consider again the matrices
1 4
A ~ P
4 3
Ij
2 3
1 5
~ and B=
0 2
1 0 7
3
56
Matries
Note that B is a 4 x 2 matrix and A is a 3 x 4 matrix, so that we do not have a definition
for the "product" BA. We leave the proofs of the following results as exercises for the
interested reader.
Proposition. (Associative Law) Suppose that A is an mn matrix, B is an np matrix and
C is an p x r matrix. Then A(BC) = (AB)C.
Proposition (Distributive Laws)
(a) Suppose that A is an m x n matrix and Band Care n x p matrices. Then
A(B + C) = AB + AC.
(b) Suppose that A and Bare m x n matrices and C is an n x p matrix. Then
(A + B)C = AC +BC.
Proposition. Suppose that A is an m x n matrix, B is an n x p matrix, and that c E JR.
Then c(AB) = (cA)B = A(cB).
Systems of Linear Equations
Note that the system of linear equations can be written in matrix form as
Ax = b,
where the matrices A, x and b are given. We shall establish the following important
result.
Proposition. Every system of linear equations of the form,has either no solution, one
solution or infinitely many solutions. .
Proof Clearly the system (2) has either no solution, exactly one solution, or more
than one solution. It remains to show that if the system (2) has two distinct solutions, then
it must have infinitely many solutions. Suppose that x = u and x = v represent two distinct
solutions. Then
Au = band Av = b,
so that
A(u  v) = Au Av = b  b = 0,
where 0 is the zero m x 1 matrix. It now follows that for every c E lR, we have
A(u + c(u  v» = Au + A(c(u  v» = Au + c(A(u  v» = b + c) = b;
so that x = u + c(u  v) is a solution for every c E lR. Clearly we have infinitely many
solutions.
Inversion of Matrices
We shall deal with square matrices, those where the number ofrows equals the number
of columns.
Definition. The n x n matrix
Matrics 57
where
{
I if i = j,
a··
1)= 0 ifi::;!:j,
is called the identity matrix of order n.
Remark. Note that
1 a
a
II = (1) and 14 =
a a
a a
a a
a a
1 a
a
The following result is relatively easy to check. It shows that the identity matrix In
acts as the identity for multiplication of n x n matrices.
Proposition. For every n x n matrix A, we have AIn = I,.,A = A.
This raises the following question: Given an n x n matrix A, is it possible to find
another n x n matrix B such that AB = BA = In?
However, we shall be content with nding such a matrix B if it exists. We shall relate
the existence of such a matrix B to some properties of the matrix A.
Definition. An n x n matrix A is said to be invertible if there exists an n x n matrix B
such that AB = BA = In. In this case, we say that B is the inverse of A and write B = AI.
Proposition. Suppose that A is an invertible n x n matrix. Then its inverse AI is
unique.
Proof Suppose that B satises the requirements for being the inverse of A. Then AB =
BA = In. It follows that
A I =A lIn = A I(AB) = (A IA)B = InB = B.
Hence the inverse A I is unique.
Proposition. Suppose that A and B are invertible n x n matrices. Then (AB) B1 = A
I
Proof In view of the uniqueness of inverse, it is sucient to show that BIA 1 satises
the requirements for being the inverse of AB. Note that
(AB)(BIA I) =A(B(B IA I» = A «BB I)A I)
= A(I,.,A I) = AA I = In
and
(BIA I)(AB) = B I(A I(AB» = BI« A IA)B) =BI(InB) B IB = In as required.
Proposition. Suppose that A is an invertible n x n matrix. Then
(A I) I =A.
Proof Note that both (A 1) 1 and A satisfy the requirements for being the inverse of
A I. Equality follows from the uniqueness of inverse.
58
Matrics
Application to Matrix Multiplication
In this section, we shall discuss an application of invertible matrices. Detailed discussion
of the technique involved will be covered.
Definition. An n x n matrix
[
I ... al
n
A= :
ani ann
where aij = 0 whenever i ::j: j, is called a diagonal matrix of order n.
Example. The 3 x 3 matrices
1 and
are both diagonal.
Given an n x n matrix A, it is usually rather complicated to calculate
Ak =1 d:.:A:
k
However, the calculation is rather simple when A is a diagonal matrix, as we shall see
in the following example.
Example. Consider the 3 x 3 matrix
17 10 5]
A= 45 28 15.
30 20 12
Suppose that we wish to calculate A98. It can be checked that if we take
P =[
2 3 0
then
pI
3 5/3
Furthermore, if we write
n
].
1
then it can be checked that A = PDP 1, so that
Matrics 59
3
98
0 0
A
98
= = PD
98
p
I
= P 0 2
98
0
pI.
.
98
0 0 2
98
This is much simpler than calculating A98 directly. Note that this example is only an
illustration. We have not discussed here how the matrices P and D are found.
Finding Inverses by Elementary Row Operations
In this section, we shall discuss a technique by which we can nd the inverse of a
square matrix, if the inverse exists. Before we discuss this technique, let us recall the three
elementary row operations we discussed in the previous chapter. These are: (1) interchanging
two rows; (2) adding a multiple of one row to another row; and (3) multiplying one row by
a nonzero constant.
Example. Consider the matrices
(
all a12 a13 (1 0 0
A = a
21
a22 a23 and h = 0 0 O.
0 0 1
•
Let us interchange rows 1 and 2 of A and do likewise for 1
3
, We obtain respectively
and
001
Note that
[
= ::: ::: :::].
a31 a
3
2 a33 0 0 1 a31 a32 a33
• Let us interchange rows 2 and 3 of A and do likewise for 1
3
, We obtain respectively
(
land 0
a21 a
22
a23 0 0
Note that
[all
a12 a13
1 0
Til
a12
0
13
]
a
31
a32 a
3
3
and 0 0
1 a21 a22
a
23
.
a21 a22 a23
0 1
o a31 a32
a
33
•
Let us add 3 times row 1 to row 2 of A and do likewise for 1
3
, We obtain
respectively
60 Matrics
[ all
a
l2
a" ]
1 0
3a
ll
+a21 3al2 +a
22
3a
l
3 + a23 and 3 1
a31 a32 a33
0 0
Note that
[
a
21
a22 a
23
] = ::: ::: :::].
a31 a
32
a
33
0 0 1 a31 a32 a33
• Let us add 2 times row 3 to row 1 of A and do likewise for 1
3
. We obtain
respectively
2a31 +all 
2a
32 +a
12
2a
33
+a,,]
0
2]
a
21
a22 a23
and o .
a31
a
32
a
33
0
Note that
[
2a31 + all 
2a
32 + a
l2
2a
33
+ al3] 1 0 2][ all a12 al3]
a21 a22 a23 = 0 1 0 a
21
a22 a23
a
31
a32 a3
3
0 0 1 a31 a32 a
3
3
• Let us multiply row 2 of A by S and do likewise for 13" We obtain respectively
[
S::2 S::3] and
a31 a32 a33 0 0 I
Note that
[
all al2 al3] [I 0 01[a
11
a
12
al3]
Sa
21
Sa22 Sa
23
= 0 S 0 J a21 a
22
a23 .
a31 a
3
2 a33 0 0 1 a
31
a32 a33
•
Let us multiply row 3 of A by  1 and do likewise for 1
3
. We obtain respectively
[
::: ::: :::] and[ 1.
0 0 1
Note that
[
].
a31 a32 a
3
3 0 0 1 a
31
a32 a
33
Let us now consider the problem in general.
Matrics 61
Definition. By an elementary n x n matrix, we mean an n x n matrix obtained from In
by an elementary row operation.
We state without proof the following important result. The interested reader may wish
to construct a proof, taking into account the different types of elementary row operations.
Proposition. Suppose that A is an n x n matrix, and suppose that B is obtained from A
by an elementary row operation. Suppose further that E is an elementary matrix obtained
from In by the same elementary row operation. Then B = EA.
We now adopt the following strategy. Consider an n x n matrix A. Suppose that it is
possible to reduce the matrix A by a sequence <X
J
<X
2
, ... , <Xk of elementary row operations
to the identity matrix In. If EI' E
2
, ... , Ek are respectively the elem'entary n x n matrices
obtained from In by the same elementary row operations <XI' <X
2
, ••• , <Xk then
In = Ek ... E2EIA
We therefore must have
AI = Ek ... E2EI = Ek ... E
2
E
l
I
n
·
It follows that the inverse AI can be obtained from In by performing the same
elementary row operations <Xl' <X2 •••• <X
k
• Since we are performing the same elementary
row operations on A and In it makes sense to put them side by side. The process can then
be described pictorially by'
(A I In) C'i) l(EJA I EJl
n
)
°2 l(E2EIA I E2ElIn)
03 l."
<Yk l(Ek ···E
2
E
I
A I E
k
.. ·E
2
E
l
I
n
) = (In,1 A
J
).
In other words, we consider an array with the matrix A on the left and the matrix In on
the right. We now perform elementary row operations on the array and try to reduce the
left hand half to the matrix In. If we succeed in doing so, then the right hand half of the
array gives the inverse AI.
Example, Consider the matrix
A=[ ~ ~ ~ .
2 3 0
To find AI, we consider the array
121
030
300
o 0]
1 0
o 1
We now perform elementary row operations on this array and try to reduce the left
hand half to the matrix 1
3
. Note that if we succeed, then the final array is clearly in reduced
62
Matrics
row echelon form. We therefore follow the same procedure as reducing an array to reduced
row echelon form. Adding  3 times row 1 to row 2, we obtain
[
2 3 0 0 0 1
Adding 2 times row 1 to row 3, we obtain
o 5 4 2 0 1
MUltiplying row by 3 by 3. we obtain
o 15 12 6 0 3
Adding 5 times row 2 to row 3, we obtain
o 0 3 9 5 3
Multiplying row 1 by 3, we obtain
3
3
o 0 3 9 5 3
Adding 2 times row 3 to row 1, we obtain
[
3 3 0 15 10 6]
o 3 3 3 1 O.
o 0 3 9 5 3
Adding 1 times row 3 to row 2, we obtain
[
3 3 0 15 10 6
o 3 0 6 4., 3 .
o 0 3 9 5 3
Adding 1 times row 2 to row 1, we obtain
o 0 3 9 5 3
MUltiplying row 1 by 113, we obtain
Matrics 63
0 0 3 2
~ 3 1
0 3 0 6 4
0 0 3 9 5
Multiplying row 2 by 113, we obtain
1 0 0 3 2 1
0 1 0 2 4/3 1 .
0 0 3 9 5 3
Multiplying row 3 by 113, we obtain
[
1 0 0 3 2 1 1
o 1 0 2 4/3 1.
o 0 1 3 5/3 1
Note now that the array is in reduced row echelon form, and that the left hand half is
the identity matrix 1
3
. It follows that the right hand half of the array represents the inverse
AI. Hence
3 2
: l
AI =
2 4/3
3 5/3 1
Example. Consider the matrix
1 1 2 3
2 2 4 5
A=
0 3 0 0
0 0 0 1
To find AI, we consider the array
1 1 2 3 0 0 0
2 2 4 5 0 1 0 0
(AI
1
4)=
0 3 0 0 0 0 1 0
0 0 0 0 0 0
We now perform elementary row operations on this array and try to reduce the left
hand half to the matrix 1
4
. Adding 2 times row 1 to row 2, we obtain
11231000
0 0 0 1 2 1 0 0
0 3 0 0 0 0 1 0
0 0 0 1 0 0 0 1
Adding 1 times row 2 to row 4, we obtain
64
1 1 2 3
o 0 0 1
o 3 0 0
o 0 0 0
100 0
2 1 0 0
o 0 1 0
2 1 0 1
Interchanging rows 2 and 3, we obtain
1 2 3 000
03000010
o 0 0 1 2 1 0 0
o 0 0 0 2 1 0 1
Matrics
At this point, we observe that it is impossible to reduce the left hand half of the array
to 1
4
. For those who remain unconvinced, let us continue. Adding 3 times row 3 to row 1,
we obtain
1 2 0 5 3 0 0
03000010
o 0 0 1 2 0 0
o 0 0 0 2 0 1
Adding 1 times row 4 to row 3, we obtain
1 1 2 0 5 3 0 0
o 3 0 0 o
o 0 0 1 2
o 1 0
o 0
o 0 0 0 2 1 0
Multiplying row 1 by 6 (here we want to avoid fractions in the next two steps), we
obtain
6 6 12 0 30 18 0 0
o 3 0 0 0 o 1 0
o 0 0 1 0 o 0 1
o 0 0 0 2 1 0
Adding  15 times row 4 to row 1, we obtain
6 6 12 0 0 3 0 15
03000010
o 0 0 1
o 0 0 0
o 0 0
2 1 0
1
1
Adding 2 times row 2 to row 1, we obtain
Matries 65
6 0 12 0 0 3 2 15
0 3 0 0 0 0 1 0
0 0 0 1 0 0 0 1
0 0 0 0 2 1 0 1
Multiplying row 1 by 116, multiplying row 2 by 113, multiplying row 3 by 1 and
multiplying row 4 by 112, we obtain
1 0 2 0 0 1/12
1/3 5/2
0 1 0 0 0 0 113 0
0 0 0 1 0 0 0 1
0 0 0 0 1 112 0 1/2
Note now that the array is in reduced row echelon form, and that the left hand half is
not the identity matrix 1
4
, Our technique has failed. In fact, the matrix A is not invertible.
Criteria for Invertibility
Examples. In this section, we shall obtain some partial answers to this question. Our
first step here is the following simple observation.
Proposition. Every elementary matrix is invertible.
Proof Let us consider elementary row operations. Recall that these are: (1)
interchanging two rows; (2) adding a multiple of one row to another row; and (3) mUltiplying
one row by a nonzero constant.
These elementary row operations can clearly be reversed by elementary row operations.
For (1), we interchange the two rows again. For (2), if we have originally added e times
row i to row j, then we can reverse this by adding  e times row i to row j. For (3), if we
have multiplied any row by a nonzero constant e, we can reverse this by multiplying the
same row by the constant lie. Note now that each elementary matrix is obtained from In
by an elementary row operation. The inverse of this elementary matrix is clearly the
elementary matrix obtained from In by the elementary row operation that reverses the
original elementary row operation.
Suppose that an n x n matrix B can be obtained from an n x n matrix A by a finite
sequence of elementary row operations. Then since these elementary row operations can
be reversed, the matrix A can be obtained from the matrix B by a finite sequence of
elementary row operations.
Definition. An n x n matrix A is said to be row equivalent to an n x n matrix B ifthere
exist a fin,ite number of elementary n x n matrices EI ... Ek such that B = Ek ... E1A.
Remark. Note that B = Ek ..... EIA implies that A = E11 ... E;IB. It follows that if A is
row equivalent to B, then B is row equivalent to A. We usually say that A and B are row
equivalent. The following result gives conditions equivalent to the invertibility of an n x n
matrixA.
66 Matrics
Proposition. Suppose that
[
a ~ I ... a ~ n 1
A
=· .
. .,
ani ann
and that
are n x 1 matrices, where xl' ... , Xn are variables.
(a) Suppose that the matrix A is invertible. Then the system Ax = 0 of linear
equations has only the trivial solution.
(b) Suppose that the system Ax = 0 of linear equations has only the trivial solution.
Then the matrices A and In are row equivalent.
(c) Suppose that the matrices A and In are row equivalent. Then A is invertible.
Proof (a) Suppose that Xo is a solution of the system Ax = O. Then since A is invertible,
we have
Xo = Irfo = (AIA)x
o
= Al(Ax
o
) = A:! ° = o.
It follows that the trivial solution is the only solution.
(b) Note that if the system Ax = 0 of linear equations has only the trivial solution,
then it can be reduced by elementary row operations to the system
: x1=O, ... ,xn=O.
This is eqQ.ivalent to saying that the array
[
a ~ 1 aln 0
ani ann 0
can be reduced by elementary row operations to the reduced row echelon form
l ... r! 1
Hence the matrices A and In are row equivalent.
(c) Suppose that the matrices A and In are row equivalent. Then there exist
elementary nn matrices E
l
, ... , Ek such that
In = E
k
, ... , EIA.
By Proposition, the matrices E
l
, ... , E ~ are all invertible, so that
Matrics
A = E
I
···E
I
l = E
I
E
I
I k n I ... k
is a product of invertible matrices, and is therefore itself invertible.
Consequences of Invertibility
Suppose that the matrix
A = all··· a;n 1
anI ann
is invertible. Consider the system Ax = b, where
[]:] and
are n x 1 matrices, where Xl' ... , Xn are variables and
b
l
, ... , b
n
E lR
are arbitrary. Since A is invertible let us consider X = Alb. Clearly
Ax = A(A_I b) = (AI b) = (AA1)b = Inb = b,
is invertible, Consider
67
so that x = AI b is a solution of the system. On the other hand, let Xo be any solution
of the system. Then Axo = b, so that
Xo = l,fo= (AIA)x
o
= AI(Ax
o
) = Alb.
It follows that the system has unique solution. We have proved the following important
result.
Proposition. Suppose that
[
I ... 1
A=: :,
anI ann
and that
x = and b
xn b
n
are n x I matrices, where x I' ... , xn are variables and
bl' ... , b
n
E lR
are arbitrary. Suppose further that the matrix A is invertible. Then the system Ax = b
of linear equations has the unique solution x = AI b.
We next attempt to study the question in the opposite direction.
Proposition. Suppose that
68 Matrics
and that
are n x 1 matrices, where xl' ... , Xn are variables. Suppose further that for every b
I
, ... ,
b
n
E lR., the system Ax = b of linear equations is soluble. Then the matrix A is invertible.
Proof Suppose that
1 0 0
0 0
b 
, ,b
2
= o , ... , b
n
=
0 0
0 0 1
In other words, for every j = 1, ... , n, b
j
is an n x 1 matrix with entry 1 on row and
entry 0 elsewhere.
Now let
XI = [X;I] ... ,x
n
=[x;n]
xnl xnn
denote respectively solutions of the systems of linear equations
Ax = b
l
, ... , Ax = b
n
It is easy to check that
A( xl' ... , xn ) = ( b
l
, ..• , b
n
);
in other words,
A[a;1
ani
so that A is invertible.
We can now summarize Propositions as follows.
Proposition. In the notation of Proposition, the following four statements are equivalent:
(a) The matrix A is invertible.
Matrics
(b) The system Ax = 0 of linear equations has only the trivial solution.
(c) The matrices A and In are row equivalent.
Cd) The system Ax = b of linear equations is soluble for every n x 1 matrix b.
Application to Economics
69
In this section, we describe briey the Leontief inputoutput model, where an economy
is divided into n sectors.
For every i = 1, ... , n, let xi denote the monetary value of the total output of sector i
over a fixed period, and let d; denote the output of sector i needed to satisfy outside demand
over the same xed period. Collecting together xi and d
i
for i = 1, .... n, we obtain the vectors
andd=
Xn d
n
known respectively as the production vector and demand vector of the economy.
On the other hand, each of the n sectors requires material from some or all of the
sectors to produce its output. For i, j = 1, ... n, let cij denote the monetary value of the
output of sector i needed by sector j to produce one unit of monetary value of output. For
every j = 1, ... n, the vector
Cl
i
C =
J
C
nj
is known as the unit consumption vector of sector j. Note that the column sum
ClO + ... + C 0 < 1
J nJ 
in order to ensure that sector j does not make a loss. Collecting together the unit
consumption vectors, we obtain the matrix
[
cll
C=(cl·· ·c
n
) = :
cnl
is known as the consumption matrix of the economy.
Consider the product
For every i = 1, ... , n, the entry cilx\ + ... + c;nXn represents the monetary value of the
output of sector
i needed by all the sectors to produce their output. This leads to the production equation
x= Cx+ d
Here Cx represents the part ofthe total output that is required by the various sectors of
70 Matrics
the economy to produce the output in the first place, and d represents the part of the total
output that is available to satisfy outside demand.
, Clearly (/  C)x = d. If the matrix (/  C)  d is invertible, then represents the perfect
production level. We state without proof the following fundamental result.
Proposition. Suppose that the entries of the consumption matrix C and the demand
vector d are nonnegative. Suppose further that the inequality (5) holds for each column of
C. Then the inverse matrix (/  C)I exists, ,and the production vector x = (I  C)I d has
nonnegative entries and is the unique solution of the production equation (6).
Let us indulge in some heuristics. Initially, we have demand d. To produce d, we need
Cd as input. To produce this extra Cd, we need C(Cd) = c'2d as input. To produce this
extra C
2
d, we need C(C
2
d) = C
3
d as input. And so on. Hence we need to produce
d+ Cd+ C
2
d+ C
3
d+ ... = (/+ C+ C2 + C3 + ... )d
in total. Now it is not dicuIt to check that for every positive integer k, we have
,{f C)(/ + C + C
2
+ C3 + ... + C") = I  Ck + I,
If the entries of Ck + I are all very small, then
(I  C)(I + C + C
2
+ C3 + ... + C") "" I,
so that
(I  C)I"" I + C + C2 + C
3
+ ... + Ck.
This gives a practical way of approximating (I  C)I, and lalso suggests that
(/  C)I = / + C + C2 + C3 + ...
Example. An economy consists of three sectors. Their dependence on each other is
summarized in the table below:
monetary value of output required from sector 1
monetary value of output required from sector 2
To produce one unit of monetary
value of output in sector
0:3
0:4
2
0:2
0:5
3
0:1
0:2
monetary value of output required from sector 3 0: 1 0: 1 0:3
Suppose that the fit;lal demand from sectors 1,2 and 3 are respectively 30, 50 and 20. Then
the production vector and demand vector are respectively
while the consumption matrix is given by
Matrics
[
0.3 0.2 0.1] [0.7 0.2
C = 0.4 0.5 0.2, so that I  C = 0.4 0.5
0.1 0.1 0.3 0.1 0.1
0.1]
0.2 .
0.7
. The production equation (I  C) x = d has augmented matrix
0.7 0.2 0.1 30 [ 7
0.4 0.5 0.2 50, . Itt 4
2 1 300
5 2 500
eqmva en 0
0.1 0.1 0.7 20 1 1 7 200
and which can be converted to reduced row echelon form
(
0 0 0 3200/27
o 0 0 6100/27
o 0 1 70019
This gives xl ~ 119, x
2
~ 226 and x3 ~ 78, to the nearest integers.
Matrix Transformation on the Plane
71
Let A be a 2 x 2 matrix with real entries. A matrix transformation T:]R
2
+ ]R
2
can
be dened as follows: For every
x = (XI' x
2
) E]R,
we write
T(x) = y,
where
satifies
Such a transformation is linear, in the sense that T(x' + x1 = T(x1 + T(x1 for every x'
x' E]R2 and T(ex) = eT(x) for every X E]R2 and every C E ]R . To see this, simply observe
that
(
X; + x;'] ( x ~ ] (x;'] (ex
l
) (Xl]
A = = A + A and A J = eA .
x ~ + x; x ~ x; eX2 x2
Here we conne ourselves to looking at a few simple matrix transformations on the
plane.
Example. The matrix
72 Matrics
for every (x I' X
2
) E.JR
2
, and so represents reflection across the x Iaxis, whereas the
matrix
A=( ~ l ~ l satisfies A[;:H ~ l ~ J [ : : H ~ : l l
for every (x I' x
2
) E.JR
2
, and so represents reflection across the x
2
axis. On the other
hand, the matrix
satisfies
for every (x I' x
2
) E.JR
2
, and so represents reection across the origin, whereas the
matrix
for every (xI' x
2
) E.JR
2
, and so represents reection across the line xI = x
2
. We give a
summary in the table below:
Transformation Equations matrix
Reflection acrossx
1
 axis
{Yl =x1
[ ~
~ l l
Y2 = x
2
Reflection acrossx2  axis
{Yl =x1
[ ~ 1
~ l Y2 =x2
Reflection across origin
{Yl =x1
[ ~ 1
~ l l
Y2 = x2
Reflection acrossxI = x2
{Yl =x
1
[ ~
~ l Y2 =x2
Example. Let k be a fixed positive real number. The matrix
for every (x I' x
2
) E.JR
2
, and so represents dilation if k > 1 and contraction if 0 < 1. k
< 1. On the other hand, the matrix
•
Matrics 73
for every (XI' X
2
) E IR.
2
, and so represents an expansionn in the xI direction if k >
1 and compression in the x Idirection if 0 < k < 1. whereas the matrix
for every (xI' x
2
) E IR. 2, and so represents expansion in the x
2
direction if k > 1 and
compression in the x
2
direction if 0 < k < 1. We give summary in the table below:
Transformation Equations matrix
Dilation or contract ion by factor k > 0
{YI = Axl
Y2 = kx2
[ ~
~ l
Expansion or compression in xI direction by factor k > 0
{YI ~ Axl
[ ~
~ l Y2 =x2
Expansion or compression in x2 direction by factor k > 0
{YI ~ x l
Y2 = kx2
[ ~
~ l
Example. Let k be a fixed real number. The matrix
for every (xI' x
2
) E IR.
2
, and so represents a shear in the xIdirection. For the case k =
1, we have the following:
T ~
(k= I)
74 Matries
For the case k =  1, we have the following:
T
(k=J) •
Similarly, the matrix
satisfies
for every (xl' x
2
) E IR?, and so represents a shear in the x
2
direction. We give a
summary in the table below:
Transformation Equations Matrix
Shear in XI  direction
{Y2 ='1 +kx2
Y2 =X
2
Shear in xl  direction
{Yl =xl
Y2 =kxl +x
2
Example. For anticlockwise rotation by an angle e, we have T(x], x
2
) = (Y]'Y2)' where
Y] + iY2 = (x] + ix
2
)(cos e + i sin e ),
and so
[
YI ] = Sine][ YI ].
Y2 sme cose Y2
It follows that the matrix in question is given by
A = [ cos e  sin e 1
sine cose .
We give a summary in the table below:
Transformation Equations Matrix
Anticlockwise rotation bylangle e
{Yl = x, cose  x
2
sin e
YI = x] sin e + x
2
cos e
[ cose
sine
Sine]
cose
We conclude this section by establishing the following result which reinforces the
linearity of matrix transformations on the plane.
Matries 75
Proposition. Suppose that a matrix transformation T: JR2 ? JR2 is given by an
invertible matrix A. Then
(a) The image under T of a straight line is a straight line;
(b) The image under T of a straight line through the origin is a straight line
through the origin; and
(c) The images under T of parallel straight lines are parallel straight lines.
Proof Suppose that T(x
l
, x
2
) = (Yl'Y2)' Since A is invertible, we have x = AIy, where
x = [ and y = [ ;;J
The equation of a straight line is given by ax
l
+ = yor, in matrix form, by
= [:: )<1)., .
Hence
(a;'W) * AI.
In other words, the image under T of the straight line
=yis aYI + WY2 =y,
clearly another straight line. This proves (a). To prove (b), note that straight lines
through the origin correspond to y = O. To prove (c), note that parallel straight lines
correspond to different values of y for the same values of a and
Application to Computer Graphics
Example. Consider the letter M in the diagram below:
Following the boundary in the anticlockwise direction starting at the origin, the 12
76
vertices can be represented by the coordinates
(:],
(:], (:], (!],
Let us apply a matrix transformation to these vertices, using the matrix
A=[: H
representing a shear in the xldirection with factor 0:5, so that
(
Xl] (Xl +..!.. X2]
A x2 = X: for every (x
l
,x
2
)EIR
2
.
Then the images of the 12 vertices are respectively
noting that
(:],[:],
[I:], (I:) (!], [:],
(:
1 1 4 7 7 8 8 7 4 1 0]
060 6 008 8 288
= [0 I 4 4 10 7 8 12 11 5 5 4].
006060088288
Matrics
In view of Proposition, the image of any line segment that joins two vertices is a line
segment that joins the images of the two vertices. Hence the image of the letter M under
the shear looks like the following:
Matrics 77
Next, we may wish to translate this image. However, translation is transformation by
vector
h=(h
l
,h
2
) Ene
is of the form
(;: H:: H 1 for every (x" x,)E R'.
and this cannot be described by a matrix transformation on the plane. To overcome
this deficiency, we introduce homogeneous coordinates. For every point
to
(xl' X
2
) (Xl' x
2
)E]R2
we identify it with the point (Xl' x
2
' 1) E]R3 . Now we wish to translate a point (Xl' X
2
)
(Xl' X
2
) + (hI' h
2
) = (Xl + hI' x
2
+ h
2
),
so we attempt to find a 3 x 3 matrix A * such that
[ ] = A * [ 1 for every (x l' x,)
It is easy to check that
E]R2.
;:] for every (x
l
,x
2
) E]R1.
1 0 0 1 1
It fo Hows that using homogeneous coordinates, translation by vector h = (h l' h
2
) E ]R
2
.
can be described by the matrix
o 0 1
Remark. Consider a matrix transformation T :]R2 on the plane given by a matrix
A=[a
l1
a
12
].
a2l a22
Suppose that T(x
l
, x
2
) = (YI' Y2). Then
[;:]A = [::]=[::: ::][::].
Under homogeneous coordinates, the image of the point (Xl' x
2
' 1) is now (Yl'Y2' 1).
Note that
78 Matrics
[
YI] [all a
l
2 0) XI]
= 7 .
It follows that homogeneous coordinates can also be used to study all the matrix
transformations we have discussed. By moving over to homogeneous coordinates, we simply
replace the 2 x 2 matrix A by the 3 x 3 matrix
Example. The letter M, the 12 vertices are now represented by homogeneous
coordinates, put in an array in the form
[
0 1 1 4
o 0 6 0
111 1
7 7 8 8 7 4 0]
6 0 0 8 8 2 8 8
1
,
1 I 1 1
Then the 2 x 2 matrix
, [1
A= 2
o 1
is now replaced by the 3 x 3 matrix
Note that
o
2
A* 0 0 .
o 0
[
0 ,t 1 4 7 7 8 8 7 4 11
A* 0" 0 6 0 6 0 0 8 8 2 8
1 1 1 1 1 1
1 1
2
 0 1
o 0
o [0 1 I 4 7 7 8 8 7 4
00060600882
11111111111
1 0]
8 8
1 1
Matrics
o 1 4 4 1 0 7 8 12 11 5 5 4]
=006060088288.
111111111111
79
Next, let us consider a translation by the vector (2; 3). The matrix under homogeneous
coordinates for this translation is given by
B* ~ ~ ~ ] .
o 0 1
Note that
0 1 4 4 7 7 8 8 7 4 0
B*A* 0 0 6 0 6 0 0 8 8 2 8 8
1 1 1
=( ~
0
m
1 4 4 10 7 8 12 11 5
1 0 6 0 6 0 0 8 8 2
0 111 1 1 1 1 1 1
= [ ~
3 6 6 12 9 10 14 13 7 7
l ~ l J.
3 9 3 939 11 11 5 11
1 1 1 1
1 '1
1 1 1 1
giving rise to coordinates in IR
2
, displayed as an array
[
2 3 6 6 12 9 10 14 13 7 7 6]'
3 3 9 3 9 3 3 11 11 5 11 11
5
~ l
8
1
Hence the image of the letter M under the shear followed by translation looks like the
following:
80
Matrics
Example. Under homogeneous coordinates, the transformation representing a reflection
across the xIaxis, followed by a shear by factor 2 in the xIdirection, followed by
anticlockwise rotation by 90°, and followed by translation by vector (2, I), has matrix
[
~ ~ ~ I l [ ~ ~ I ~ [ ~ ~ ~ l [ ~ ~ I ~ l = [: ~ 2 ~ I l '
00 100100100 100 I
Complexity of a NonHomogeneous System
Consider the problem of solving a system of linear equations of the form Ax = b,
where A is an n x n invertible matrix. We are interested in the number of operations required
to solve such a system. By an operation, we mean interchanging, adding or mUltiplying
two real numbers.
One way of solving the system Ax = b is to write down the augmented matrix
[
a;1 ... a;n ~ I l '
anI ann b
n
and then convert it to reduced row echelon form by elementary row operations.
The first step is to reduce it to row echelon form: .
(I) First of all, we may need to interchange two rows in order to ensure that the top
left entry in the array is nonzero. This requires n + 1 operations.
(II) Next, we need to multiply the new first row by a constant in order to make the
top left pivot entry equal to I. This requires n + I operations, and the array now
looks like
anI a
n
2 ann b
n
Note that we are abusing notation somewhat, as the entry a
l2
here, for example, may
well be different from the entry a
l2
in the augmented matrix.
(III) For each row i = 2, ... , n, we now multiply the first row by  ail and then add to
row i. This requires 2(n  I)(n + 1) operations, and the array now looks like
Matrics 81
(IV) In sumJary, to proceed from the form II to the form III, the number of operations
at most 2(n + 1) + 2(11  1 )(n + 1) = 2n(n + 1) .
. (V) Our nelt task is to convert the smaller array
t ...
a
n
2 ann b
n
to an array that looks like
These have one row and one column fewer than the arrays (II) and (III), and the number
of operations required is at most 2m(m + 1), where m = n  1. We continue in this way
systematically to reach row echelon form, and conclude that the number of operations
required to convert the augmented matrix (II) to row echelon form is at most
n2 2
L2m(m+ 1) _n
3
•
m=1 3
The next step is to convert the row echelon form to reduced row echelon form. This is
simpler, as many entries are now zero. It can be shown that the number of operations
required is bounded by something like 2n
2
indeed, by something like n
2
if one analyzes
the problem more carefully. In any case, these estimates are insignicant compared to the
. 2 3 .
estlmate"3
n
earlier.
We therefore conclude that the number of operations required to solve the system
Ax = b by reducing the augmented matrix to reduced row echelon form is bounded by
2
something like 3" n
3
when n is large.
Another way of solving the system Ax = b is to first nd the inverse matrix AI. This
may involve converting the array
ani ann 1
to reduced row echelon form by elementary row operations. It can be shown that the
number of operations required is something like 2n
3
, so this is less ecient than our first
method.
82 Matrics
Matrix Factorization
In some situations, we may need to solve systems of linear equations of the form
Ax = b, with the same coefficient matrix A but for many different vectors b. If A is an
invertible square matrix, then we can and its inverse AI and then compute A1b for each
vector b. However, the matrix A may not be a square matrix, and we may have to convert
the augmented matrix to reduced row echelon form.
In this section, we describe a way for solving this problem in a more efficient way. To
describe this, we first need a deffinition.
Definition. A rectangular array of numbers is said to be in quasi row echelon form if
the following conditions are satised:
(1) The leftmost nonzero entry of any nonzero row is called a pivot entry. It is
not necessary for its value to be equal to 1.
(2) All zero rows are grouped together at the bottom of the array.
(3) The pivot entry of a nonzero row occurring lower in the array is to the right
of the pivot entry of a nonzero row occurring higher in the array.
In other words, the array looks like row echelon form in shape, except that the pivot
entries do not have to be equal to 1.
We consider first of all a special case.
Proposition. Suppose that an m x n matrix A can be converted to quasi row echelon
form by elementary row operations but without interchanging any two rows. Then A = LU,
where L is an m x m lower triangular matrix with diagonal entries all equal to 1 and U is
a quasi row echelon form of A.
Proof Recall that applying an elementary row operation to an m x n matrix corresponds
to mUltiplying the matrix on the left by an elementary m x m matrix. On the other hand, if
we are aiming for quasi row echelon form and not row echelon form, then there is no need
to multiply any row of the array by a nonzero constant. Hence the only elementary row
operation we need to perform is to add a mUltiple of one row to another row. In fact, it is
sucient even to restrict this to adding a mUltiple of a row higher in the array to another row
lower in the array, and it is easy to see that the corresponding elementary matrix is lower
triangular, with diagonal entries all equal to 1. Let us call such elementary matrices unit
lower triangular. If an m x n matrix A can be reduced in this way to quasi row echelon
form U, then
U = E
k
, ... , E
2
E
1
A,
where the elementary matrices E
1
,E
2
, ... , Ek are all unit lower triangular. Let
L = (Ek' ... , E
2
E
1
)I.
Then
A=LU.
It can be shown that products and inverses of unit lower triangular matrices are also
unit lower triangular. Hence L is a unit lower triangular matrix as required.
Matrics
If Ax = b and A = L U, then
L(Ux) = b.
Writing
y= Ux,
we have
Ly = band Ux = y.
"
83
It follows that the problem of solving the system Ax = b corresponds to first solving
the system Ly = b and then solving the system Ux = y. Both of these systems are easy to
solve since both Land U have many zero entries. It remains to and Land U.
Ifwe reduce the matrix A to quasi row echelon form by only performing the elementary
row operation of adding a multiple of a row higher in the array to another row lower in the
array, then U can be taken as the quasi row echelon form resulting from this. It remains to
nd L. However, note that L = (E
k
, ••• , E2Eltl, where U = E
k
, ••• , E
2
E
1
A, and so
1= E
k
, ••• , E2EIL
This means that the very elementary row operations that convert A to U will convert
L to /. We therefore wish to create a matrix L such that this is satised. It is simplest to
illustrate the technique by an example.
Example. Consider the Matrix
2 1 2 2 3
4 6 5 8
A=
2 10 4 8 5
2 13 6 16 5
The entry 2 in row 1 and column 1 is a pivot entry, and column 1 is a pivot column.
Adding 2 times row 1 to row 2, adding 1 times row 1 to row 3, and adding 1 times row
1 to row 4, we obtain
2 1 2 2 3
0 3 2 1 2
0 9 6 10 8
0 12 8 18 8
Note that the same three elementary row operations convert
1 0 0 0 0 0 0
2 1 0 0 0 0 0
to
1
*
1 0 0
*
1 0
* *
0
* *
1
Next, the entry 3 in row 2 and column 2 is a pivot entry, and column 2 is a pivot
84 Matrics
column. Adding 3 times row 2 to row 3, and adding 4 times row 2 to row 4, we obtain
2 1 2 2 3
0 3 2 1 2
0 0 0 7 2
0 0 0 14 0
Note that the same two elementary row operations convert
0 0 0 1 0 0 0
0 1 0 0 0 1 0 0
to
0 3 1 0 0 0 1 0
0 4
*
0 0
*
Next, the entry 7 in row 3 and column 4 is a pivot entry, and column 4 is a pivot
column. Adding 2 times row 3 to row 4, we obtain the quasi row echelon form
2 1 2 2 3
0 3 2 1 2
u=
2'
0 0 0 7
0 0 0 0 4
where the entry 4 in row 4 and column 5 is a pivot entry, and column 5 is a pivot
column. Note that the same elementary row operation converts
1 0 0 0 0 0 0
0 0 0 0 1 0 0
to
0 0 1 0 0 0 1 0
0 0 2 1 0 0 0
Now observe that if we take
0 0 0
2 1 0 0
L=
o '
1 3
1 4 2 1
then L can be converted to 14 by the same elementary operations that convert A to U.
The strategy is now clear. Every time we nd a new pivot, we note its value and the
entries below it. The lower triangular entries of L are formed by these columns with each
column divided by the value of the pivot entry in that column.
Example. Let us examine our last example again. The pivot columns at the time of
establishing the pivot entries are respectively
Matrics 85
2
* *
*
4 3
* *
2 '
9 7
,
* '
2 12 14 4
Dividing them respectively by the pivot entries 2, 3, 7 and 4, we obtain respectively
the columns
* *
2 1
* *
l' 3' l' *'
4 2
Note that the lower triangular entries of the matrix
1 0 0 0
2 1 0 0
L=
3 1 0
4 2
correspond precisely to the entries in these columns.
LU FACTORIZATION ALGORITHM.
(1) Reduce the matrix A to quasi row echelon form by only performing the
elementary row operation of adding a mUltiple of a row higher in the array
to another row lower in the array. Let V be the quasi row echelon form
obtained.
(2) Record any new pivot column at the time of its first recognition, and modify
it by replacing any entry above the pivot entry by zero and dividing every
other entry by the value of the pivot entry.
(3) Let L denote the square matrix obtained by letting the columns be the pivot
columns as modied in step (2).
Example. We wish to solve the system of linear equations Ax = b, where
3 1 2 4 1 1
3 3 5 5 2 2
A= andb=
6 4 11 10 6 9
6 8 21 13 9 15
Let us first apply LV factorization to the matrix A. The first pivot column is column 1,
with modied version
86
1
2
2
Matries
Adding row 1 to row 2, adding  2 times row 1 to row 3, and adding 2 times row 1 to
row 4, we obtain
3 1 2 4
0 2 3 1
0 2 7 2 4
0 6 17 5 7
The second pivot column is column 2, with modied version
o
1
1
3
Adding row 2 to row 3, and adding 3 times row 2 to row 4, we obtain
3 1 2 4 1
0 2 3 1
0 0 4 1 3
0 0 8 2 4
The third pivot column is column 3, with modied version
o
o
2
Adding 2 times row 3 to row 4, we obtain the quasi row echelon form
3 1 2 4
0 2 3 1
0 0 4 1 3
0 0 0 0 2
The last pivot column is column 5, with modied version
Matrics 87
0
0
0
1
It follows that
0 0 0 3 1 2 4
1 0 0 0 2 3 1 1
L= and u=
2 1 1 0 0 0 4 1 3
2 3 2 0 0 0 0 2
We now consider the system Ly = b, with augmented matrix
1 0 0 0 1
1 o 02
2 1 1 0 9
2 3 2 1 15
Using row 1, we obtainYl = 1. Using row 2, we obtainY2  Yl = 2, so thatY2 =1.
Using row 3, we obtain Y3 + 2Yl  Y2 = 9, so that Y3 = 6. Using row 4, we obtain
Y4  2Yl + 3Y2  2Y3 = 15,
so that Y4 = 2.
Hence
Y=
1
1
6
2
We next consider the system Ux = Y, with augmented matrix
3 1 2 4 1 1
o 2 3 1 11
o 0 4 1 3 6
o 0 0 0 2 2
Here the free variable is x
4
. Let x
4
= t. Using row 4, we obtain 2x5 = 2, so that x5 = 1.
.. 3 1
Usmg row 3, we obtam 4x3 = 6 + x
4
3x5 = 3 + t, so that x3 = 4 + 4
t
. Using row 2, we
obtain
9 1
2x2 = 1 + 3x3 x4 + x5 Xs = t,
4 4
88 Matrics
9 1 27 3
so that x
2
= g  g/. Using row 1, we obtain 3x
I
= l+x2  2x3+4x4  x5 = Sl  g' so
9 1
that XI = gt  g' Hence
_ (9/1 91 t
3
+t 11)
(xl' x
2
' X3' X
4
' Xs)  8' 8 ' 4 " ,where t E R.
Remarks. (1) In practical situations, interchanging rows is usually necessary to convert
a matrix A to quasi row echelon form. The technique here can be modied to produce a
matrix L which is not unit lower triangular, but which can be made unit lower triangular
by interchanging rows.
2
(2) Computing an LU factorization of an n x n matrix takes approximately 3
n3
operations. Solving the systems Ly = band Ux = y requires approximately 2n
2
operations.
(3) LU factorization is particularly ecient when the matrix A has many zero entries,
in which case the matrices Land U may also have many zero entries.
Application to Games of Strategy
Consider a game with two players. Player R, usually known as the row player, has m
possible moves, denoted by i = 1, 2, 3, ... , m, while player C, usually known as the column
player, has n possible moves, denoted by j = 1,2,3, ... , n, For every i = 1,2,3, ... , m, and
j = 1, 2, 3, ... , n, let aij denote the payo that player C has to make to player R if player R
makes move i and player C makes move j. These numbers give rise to the payo matrix
[
all
A
 :
 .
ami
The entries can be positive, negative or zero.
Suppose that for every i = 1, 2, 3, ... , m, player R makes move i with probability Pi'
and that for every j = 1,2,3, ... , n, player C makes movej with probability qj" Then
PI + ... + Pm = 1 and qI + ... + qn = 1.
Assume that the players make moves independently of each other. Then for every
i = 1,2,3, ... , m, andj = 1,2,3, ... , n, the number Pjqj represents the probability that player
R makes move i and player C makes move j. Then the double sum
m n
EA(p,q) = LLaijp;qj
;=1 j=1
represents the expected payo that player C has to make to player R.
The matrices
Matrics 89
P (PI··· Pm ) and q = [ ::J
are known as the strategies of player R and player C respectively. Clearly the expected
payo
m n [ all .. . ]_
EA(p,q) = ..  p.Aq.
amI'" a
mn
qn
Here we have slightly abused notation. The right hand side is a I x 1 matrix!
We now consider the following problem: Suppose that A is xed. Is it possible for
player R to choose a strategy p to try to maximize the expected payo Eip, q)? Is it possible
for player C to choose a strategy q to try to minimize the expected payo EA(P, q)?
Fundemental Theorem of Zero Sum Games. There exist strategies p* and q* such
that
EA(p*, q) > EA(p*, q*) > EA(p, q*)
for every strategy p* of player R, and every strategy q* of player C. Remark. The
strategy p is known as an optimal strategy for player R, and the strategy q is known as an
optimal strategy for player C. The quantity EA(p*, q*) is known as the value of the game.
Optimal strategies are not necessarily unique. However, if p** and q** are another pair of
optimal strategies, then EA(P*, q*) = EA(p**, q**).
Zero sum games which are strictly determined are very easy to analyse. Here the payo
matrix A contains saddle points. An entry aij in the payo matrix A is called a saddle point
if it is a least entry in its row and a greatest entry in its column. In this case, the strategies
o

o
p* = (0 ... 0 I 0 ... 0) and q = I ,
o
o
where the I 's occur in position i in p* and positionj in q*, are optimal strategies, so
that the value of the game is aij .
Remark. It is very easy to show that different saddle points in the payo matrix have
the same value.
Example. In some sports mad school, the teachers require 100 students to each choose
between rowing (R) and cricket (C). However, the students cannot make up their mind,
90 Matrics
and will only decide when the identities of the rowing coach and cricket coach are known.
There are 3 possible rowing coaches and 4 possible cricket coaches the school can hire.
The number of students who will choose rowing ahead of cricket in each scenario is as
follows, where RI, R2 and R3 denote the 3 possible rowing coaches, and CI, C2, C3 and
C4 denote the 4 possible cricket coaches:
CI C2 C3 C5
RI 75 50 45 60
R2 20 60 30 55
R3 45 70 35 30
[For example, if coaches R2 and CI are hired, then 20 students will choose rowing,
and so 80 students will choose cricket.] We first reset the problem by subtracting 50 from
each entry and create a payo matrix
A ].
5 20 15 20
[For example, the top left entry denotes that if each sport starts with 50 students, then
25 is the number cricket concedes to rowing.] Here the entry 5 in row I and column 3 is
a saddle point, so the optimal strategy for rowing is to use coach RI and the optimal strategy
for cricket is to use coach C3.
In general, saddle points may not exist, so that the problem is not strictly determined.
Then the solution for these optimal problems are solved by linear programming techniques
which we do not discuss here. However, in the case of 2 x 2 payo matrices
which do not contain saddle points, we can write P2 = I  PI and q2 = I  ql. Then
Eip, q) = allPlql + a
I
2PI(1 ql) + a
21
(1 PI)ql + a
22
(1  PI)(1  ql)
= ((all  a
12
 a
21
+ a
22
)Pl  (a
22
 a
21
))ql + (a
12
 a
22
)Pl + a
22
·
Let
Then
E (
* ) (a
12
a2
2
)(a2
2
a
21
) a
lI
a
22
a
12
a
2
1
A P ,q = + a22 =
all  al2  a
21
+ a2
2
all  a
l2
 a
21
+ a22
which is independent of q. Similarly, if
Matrics 91
then
E (
*)  alla22  a12
a
2l
A p,q  ,
al1  a12  a
2l
+ a
22
which is independent of p. Hence
Eip* q) = Eip*, q*) = Eip, q*) for all strategiesp and q:
Note that
* [ a22 a2l all a
12
1
P = all a12 a
2l
+a22 alla12 a
2l
+a
2
2
and
with value
ORTHOGONAL MATRICES
Definition. A square matrix A with real entries and satisfying the condition AI = At is
called an orthogonal matrix.
Example. Consider the euclidean space ~ 2 with the euclidean inner product. The
vectors u
l
= (1, 0) and u
2
= (0, 1) form an orthonormal basis B = {up u2}. Let us now
rotate u
l
and u
2
anti clockwise by an angle to obtain vI = (cose sin e) and v
2
= (sine,
cose ). Then C = {vI' v
2
} is also an orthonormal basis.
92
Matries
The transition matrix from the basis C to the basis B is given by
[
COS sinS 1
p = ([VdB[V
2
ln = . .
smS cosS
Clearly
I t [cos sin S 1
p =p = .
sinS cosS
In fact, our example is a special case of the following general result.
Proposition. Suppoiie that B = {up'" un} and C = {vI' ... , v
n
} are two orthonormal
bases of a real inner product space V. Then the transition matrix P from the basis C to the
basis B is an orthogonal matrix.
Example. The matrix
(
113 2/3
A = 2/3 113
2/3 2/3
is orthogonal, since
2/3 )
2/3
1/3
(
113 2/3 2/3) (113
At A = 2/3 113 2/3 2/3
2/3 2/3 113 2/3
2/3 2/3) (1 0 0 )
1/3 2/3 = 0 1 0 .
2/3 1/3 0 0 1
Note also that the row vectors of A, namely (113,2/3,2/3), (2/3,113,2/3) and (2/3, 2/3,
113) are orthonormal. So are the column vectors of A.
In fact, our last observation is not a coincidence.
Proposition. Suppose that A is an n x n matrix with real entries. Then
(a) A is orthogonal if and only if the row vectors of A form an orthonormal basis
of 1R n under the euclidean inner product; and
(b) A is orthogonal if and only if the column vectors of A form an orthonormal
basis of 1R n under the euclidean inner product.
Proof We shall only prove (a), since the proof of (b) is almost identical. Let r., .... , rn
denote the row vectors of A. Then
(
Ii. Ii .. . Ii .:rn )
AA
t _ :
. ..
rn .r. rn .rn
It follows that AAt = I if and only if for every i, j = 1, ..... , n, we have
Matrics 93
{
I if i == j,
r· r·
/. } 0 ifi::/:.j,
if and only if r I' ... , r n are orthonormal.
Proposition. Suppose that A is an n x n matrix with real entries. Suppose further that
the inner product in lR n is the euclidean inner product. Then the following are equivalent:
(a) A is orthogonal.
(b) For every x E lRn, we have II Ax II = II x II.
(c) For every u, v E lRn, we have Au . Av = u . v.
Proof «a)) :::} (b)) Suppose that A is orthogonal, so that AlA = 1. It follows that for
every x E lR n, we have
II Ax 112 = Ax . Ax = xlAIAx = xlIx = XIX = X . X = II X 112.
«b)) :::} (c)) Suppose that II Ax II = II x II for every x E lR
n
• Then for every u, v E lRn,
we have
1 21 21 21 2
Au.Av=IIAu+Avll IIAuAvll =IIA(u+v)1I IIA(uvll
4 4 4 4
1 2 1 2
=lIu+vll lluvll =u.v.
4 4
Suppose that Au. Av = u. v for every u, v E lR n . Then
Iu. v=u. v=Au.Av=v'AIAu=AIAu. v,
so that
(AlA l)u . v = o.
In particular, this holds when v = (AlA l)u, so that
(AlA I)u . (AlA I) u = 0,
whence
(AlA I)u = 0, ... (i)
But then (1) is a system ofn homogeneous linear equations in n unknowns satised by"
every u E lR
n
• Hence the coefficient matrix AI A I must be the zero matrix, and so AlA =
1.
Proof For every UE V, we can write
u = + ... + = Ylvi + ... + ynv
n
'
where
YI'····· YnE lR,
and where B = {u
l
' ..... , un} and C= {vi' ..... , v
n
} are two orthonormal bases of V. Then
II u 112 = (u, u) = + ... + + .... +
94 Matrics
Similarly,
IIu /1
2
= (U, U) = ('YIVI + ... + 'Y
n
V
n
, 'YIVI + ... + 'Y
n
V
)
n n n
= LL'Yi'Yj(V;Vj ) = L ' Y ~
1=1 j=1 1=1
It follows that in lR n with the euclidean norm, we have /I [u]B /I = /I [u]c /I, and so
/I P[u]c /I = II [u]c /I
for every u E V. Hence /I Px II = II x II holds for every x E R
n
. It now follows from
Proposition that P is orthogonal.
Eigenvalues and Eigenvectors
We give a brief review on eigenvalues and eigenvectors:
Suppose that
(
a ~ 1 ••• a ~ n J
A=: :
ani ann
is an n x n matrix with real entries. Suppose further that there exist a number A E 1R
and a nonzero vector vERn such that Av = v. Then we say that A is an eigenvalue of the
matrix A, and that v is an eigenvector corresponding to the eigenvalue A . In this case, we
have Av = AV = ')Jv, where I is the n x n identity matrix, so that (A  AI)v = O. Since
vERn is nonzero, it follows that we must have
det(A  AI) = O. . .. (ii)
In other words, we must have
det =0
ani a
n
2 ann  A
Note that (ii) is a polynomial equation. The polynomial det(A  AI) is called the
characteristic polynomial of the matrix A. Solving this equation (2) gives the eigenvalues
of the matrix A.
On the other hand, for any eigenvalue of the matrix A, the set
{ v E lR n : (A  AI)v = O}
... (iii)
is the nullspace of the matrix A  ')J, and forms a subspace of R n • This space (iii) is
Matrics 95
called the eigenspace corresponding to the eigenvalue A. Suppose now that A has eigenvalues
AI' .... An E lR, not necessarily distinct, with corresponding eigenvectors vI' .... , vn E lRn,
and that vi' .... vn are linearly independent. Then it can be shown that
pIAP=D
,
where
J
In fact, we say that A is diagonalizable if there exists an invertible matrix P with real
entries such that PIAP is a diagonal matrix with real entries. It follows that A is
diagonalizable if its eigenvectors form a basis of lR n • In the opposite direction, one can
show that if A is diagonalizable, then it has n linearly independent eigenvectors in lR n • It
therefore follows that the question of diagonalizing a matrix A with real entries is reduced
to one of linear independence of its eigenvectors.
We now summarize our discussion so far.
Diagonalization Process. Suppose that A is an n x n matrix with real entries.
(1) Determine whether the n roots of the characteristic polynomial det(A  IJ) are
real.
(2) If not, then A is not diagonalizable. If so, then nd the eigenvectors corresponding
to these eigenvalues. Determine whether we can nd n linearly independent
eigenvectors.
(3) If not, then A is not diagonalizable. If so, then write
p ~ (vI .... v
n
) and D ~ (A.I '. A.J
where Ai' ... , An E lR are the eigenvalues of A and where vl' ..... v
n
E lR
n
are respectively
their corresponding eigenvectors. Then PIAP = D.
In particular, it can be shown that if A has distinct eigenvalues AI'·· .. An E lR, with
corresponding eigenvectors vi' ...... , vn E lR
n
, then vi' ..... vn are linearly independent. It
follows that all such matrices A are diagonalizable.
Orthonormal Diagonalization
We now consider the euclidean space ffi. n an as inner product space with the euclidean
inner product. Given any n x n matrix A with real entries, we wish to nd out whether there
96
Matrics
exists an orthonormal basis of ~ n consisting of eigenvectors of A. Recall that in the
Diagonalization process discussed in the last section, the columns of the matrix Pare
eigenvectors of A, and these vectors form a basis of ~ n . It follows from Proposition basis
is orthonormal if and only if the matrix P is orthogonal.
Definition. An nn matrix A with real entries is said to be orthogonally dagonalizable if
there exists an orthogonal matrix P with real entries such that PIAP = plAP is a diagonal
matrix with real entries. First of all, we would like to determine which matrices are
orthogonally diagonalizable. For those that are, we then need to discuss how we may find
an orthogonal matrix P to carry out the diagonalization. To study the first question, we
have the following result which gives a restriction on those matrices that are orthogonally
diagonalizable.
Proposition. Suppose that A is a orthogonally diagonalizable matrix with real entries.
Then A is symmetric.
Proof Suppose that A is orthogonally diagonalizable. Then there exists an orthogonal
matrix P and a diagonal matrix D, both with real entries and such that P'AP = D. Since PP'
= pIp = I and Dt = D, we have
A = PDP' = PD'P',
so that
A' = (PD'P0
t
= (P0
t
(D0'P' = PDP' = A,
whence A is symmetric. Our first question is in fact answered by the following result
which we state without proof.
Proposition. Suppose that A is an n x n matrix with real entries. Then it is orthogonally
diagonalizable if and only if it is symmetric. The remainder of this section is devoted to
nding a way to orthogonally diagonalize a symmetric matrix with real entries. We begin
by stating without proof the following result. The proof requires results from the theory of
complex vector spaces.
Proposition. Suppose that A is a symmetric matrix with real entries. Then all the
eigenvalues of A are real.
Our idea here is to follow the Diagonalization process discussed in the last section,
knowing that since A is diagonalizable, we shall find a basis of ~ n consisting of
eigenvectors of A. We may then wish to orthogonalize this basis by the GramSchmidt
process. This last step is considerably simplied in view of the following result.
Proposition. Suppose that u
1
and u
2
are eigenvectors of a symmetric matrix A with
real entries, corresponding to distinct eigenvalues 1 and 2 respectively. Then u
1
• u
2
= O. In
other words, eigenvectors of a symmetric real matrix corresponding to distinct eigenvalues
are orthogonal.
Proof Note that if we write u
1
and u
2
as column matrices, then since A is symmetric,
we have
Matrics
It follows that
A)U) u
2
= Au) . u
2
= u) AU
2
= u) .2u
2
,
so that (AI  A2)(u
l
. u
2
) = o. Since A) 7: A
2
, we must have u
l
. u
2
= o.
We can now follow the procedure below.
97
Orthogonal Diagonalization Process. Suppose that A is a symmetric n x n matrix
with real entries.
(1) Determine the n real roots AI, .... An of the characteristic polynomial det (A AJ),
and find n linearly independent eigenvectors u
l
' •••• , un of A corresponding to these
eigenvalues as in the Diagonalization process.
(2) Aply the orthogonalization process to the eigenvectors u
I
' ••• , un
to obtain orthogonal eigenvectors vIm, ... , vn of A, noting that eigenvectors
corresponding to distinct eigenvalues are already orthogonal.
(3) Normalize the orthogonal eigenvectors v .. ... , vn to obtain orthonormal eigenvectors
wI' ... , wn of A. These form an orthonormal basis of lRn. Furthermore, write
(w
J
... w.) and D=(AJ ". AJ
where A., ... , An E lR are the eigenvalues of A and where wI' ... , wn E lR
n
are
respectively their orthogonalized and normalized eigenvectors. Then PtAP = D.
Remark. Note that if we apply the GramSchmidt orthogonalization process to
eigenvectors corresponding to the same eigenvalue, then the new vectors that result from
this process are also eigenvectors corresponding to this eigenvalue. Why?
Example. Consider the matrix
A=(
122
To find the eigenvalues of A, we need to nd the roots of
(
2A 2 1)
det 2 5  A 2 = 0;
2 2A
in other words, (A7)(A 1)2 = O. The eigenvalues are therefore Al = 7 and (double
root)
An eigenvector corresponding to A) = 7 is a solution of the system
98 Matrics
(
5
(A71)u =
Eigenvectors corresponding to 1..2 =1..3 = 1 are solutions of the system
(A7I)U=U 0 wiili root ., =UJ Md ", =Ul J
which are linearly independent. Next, we apply the GramSchmidt orthogonalization
process to u
2
and u
3
, and obtain
which are now orthogonal to each other. Note that we do not have to do anything to
ul at this stage, in view of Proposition. We now conclude that
form an orthogonal basis of Normalizing each of these, we obtain respectively
[
1116] (1IJi] [1IJ3]
wI = 2116 ,w2 = 0 , w3 = 11 J3 .
1116 1IJi 11J3
We now take
Then
[
1116 2/16
p= pI = 1IJi 0
1IJ3 1IJ3
and
1IJ3 0 0
Example. Consider the matrix
Matries
o 9 20
To find the eigenvalues of A, we need to nd the roots of
_13
6
_A Jo;
o 9 20A
in other words, (A + 1)( A 2)( A  5) = O. The eigenvalues are therefore
Al = 1, A2 = 2 and A3 = 5.
An eigenvector corresponding Al = I is a solution of the system
with root ul
o 9 21 0
An eigenvector corresponding to = 2 is a solution of the system
with root u3
o 9 15 3
An eigenvector corresponding to A3 = 5 is a solution of the system
(A+5/)u = Ju = 0, with root u3 = J.
o 9 15 3
99
Note that while u
I
' u
2
' u
3
correspond to distinct eigenvalues of A, they are not
orthogonal. The matrix A is not symmetric, and so Proposition 100 does not apply in this
case.
Example. Consider the matrix
A=( n
To find the eigenvalues of A, we need to nd the roots of
(
5A 2 0 J
det 2 6  A 2 = 0;
o 2 7A
in other words, ( A  3)( A 6)( A  9) = O. The eigenvalues are therefore
A\ = 3, A2 = 6
100
Matrics
and 1..3 = 9. An eigenvector corresponding 1..1 = 3 is a solution of the system
(A3J)u = 0, with root Ut =( ).
o 2 4 1
An eigenvector corresponding to = 6 is a solution of the system
(A6nu =( = 0, wiili root =( )
An eigenvector corresponding to 3 = 9 is a solution of the system
(
4 2 0) [1)
(A9J)u= 2 3 2 u=O, with root u3= 2 .
o 2 2 2
Note now that the eigenvalues are distinct, so it follows from Proposition that up u
2
,
u
3
are orthogonal, so we do not have to apply Step (2) of the Orthogonal diagonalization
process. Normalizing each of these vectors, we obtain respectively
f
2/3) [2/3) (1/3)
wt 2/3 ,w
2
= 113, w3 2/3 .
113 2/3 2/3
We now take
Then
(
2/3 2/3 1/3)
P=(wt w2 w3)= 2/3 113 2/3.
113 2/3 . 2/3
(
2/3
pt = pi = 2/3
113
2/3
1/3
2/3
113) (3
2/3 and pIAP= 0
2/3 0
Chapter 4
Determinants
Introduction
The reader probably already met determinants in calculus or algebra, at least the
determinants of 2 x 2 and 3 x 3 matrices. For a 2 x 2 matrix
the determinant is simply ad  be; the determinant of a 3 x 3 matrix can be found by
the "Star of David" rule.
In this chapter we would like to introduce determinants for n x n matrices. I don't
want just to give a formal definition. First I want to give some motivation, and then derive
some properties the determinant should have. Then if we want to have these properties,
we do not have any'choice, and arrive to several equivalent definitions of the determinant.
It is more convenient to start not with the determinant of a matrix, but with determinant
of a system of vectors. There is no real difference here, since we always can join vectors
together (say as columns) to form a matrix.
Let us have n vectors VI' V
2
, ... , vn in R,n (notice that the number of vectors coincide
with and we want to find the ndimensional volume of the parallelepiped
determined by these vectors.
The parallelepiped determined by the vectors VI' V
2
, ... , Vn can be defined as the
collection of all vectors V E ]Rn that can be represented as
v=tIvI +t
2
v
2
+ .. ·+t
n
v
n
,O S tk S 1 "ik = 1,2, ... , n.
It can be easily visualized when n = 2 (parallelogram) and n = 3
So, what is the ndimensional volume?
If n = 2 it is area; if n = 3 it is indeed the volume. In dimension 1 is it just the length.
Finally, let us introduce some notation. For a system of vectors (columns) VI' V
2
, ... , Vn
we will denote its determinant (that we are going to construct) as D(v
I
, V
2
, ... , vn). If we
102 Determinants
join these vectors in a matrix A (column number k of a is v
k
), then we will use the notation
detA,
detA = D(v\, v
2
, ... , vn)
Also, for a matrix
an,} an'2 an,n
its determinant is often is denoted by
al.l aI,2 aI,n
a2) a2'f a2,!1 .
WHAT PROPERTIES DETERMINANT SHOULD HAVE
We know, that for dimensions 2 and 3 "volume" of a parallelepiped is determined by
the base times height rule: if we pick one vector, then height is the distance from this
vector to the subspace spanned by the remaining vectors, and the base is the (n  1) 
dimensional,Volume of the parallelepiped determined by the remaining vectors.
Now let us generalize this idea to higher dimensions. For a moment we do not care
about how exactly to determine height and base. We will show, that if we assume that the
base and the height satisfy some natural properties, then we do not have any choice, and
the volume (determinant) is uniquely defined.
Linearity in Each Argument
First of all, if we multiply vector vI by a positive number a, then the height (i.e., the
distance to the linear span £ (v2' ... , v n)) is multiplied by a. If we admit negative heights
(and negative volumes), then this property holds for all scalars a, and so the determinant
D(vl'v
2
, ... , v
n
) of the system vI' V
2
, ... , vn should satisfy
D(av
I
, v
2
, ... , v
n
) = aD(v
I
, v
2
, ... , vn).
Of course, there is nothing special about vector vI' so for any index k
D(VI, .. ·,o.Vk'·"'V
n
) = aD(v
I
, ... , v ~ ... , v
n
)
k k
To get the next property, let us notice that if we add 2 vectors, then the "height" of the
result should be equal the sum of the "heights" of summands, i.e., that
D(vI"",uk +vk,''''vn) D( ) D( )
, , ' = VI''''Ub' .. , vn + VI'"'' Vb"', vn
k k k
In other words, the above two properties say that the determinant of n vectors is linear
in each argument (vector), meaning that if we fix n  1 vectors and interpret the remaining
vector as a variable (argument), we get a linear function.
Determinants 103
Remark. We already know that linearity is a very nice property, that helps in many
situations. So, admitting negative heights (and therefore negative volumes) is a very small
price to pay to get linearity, since we can always put on the absolute value afterwards.
In fact, by admitting negative heights, we did not sacrifice anything! To the contrary,
we even gained something, because the sign of the determinant contains some information
about the system of vectors (orientation).
Preservation Under "Column Replacement"
The next property also seems natural. Namely, if we take a vector, say vp and add to
it a multiple of another vector v
k
, the "height" does not change, so
D(v\,. "Vj + o.Vk':'" vb'''' Vn) = D[V\, ... , v
j
, ... , vk,'''' V
n
]
, 'k k
j j
In other words, if we apply the column operation of the third type, the determinant
does not change.
Remark. Although it is not essential here, let us notice that the second part of linearity
is not independent: it can be deduced from properties.
Antisymmetry
The next property the determinant should have, is Functions of several variables that
change sign
D[V\>'''' ... , Vj, ... , Vn 1 = D(v\, ... , Vj""Vk,""V
n
),
} k J j k
At first sight this property does not look natural, but it can be deduced from the previous
ones. Namely, applying property three times, and then using we get
D(v\, ... , Vj, ... , Vb'''' V
n
) =
j k
= D(V1"",Vj"",:k Vj,'''''Vn)
j k
= +(Vj ... 'Vn 1
= Vj.' ... ,Vn]
) k
104
=n['1, ... ,Vy.,V
t
··,V
n
1
= +, ... ,vr .. ,Vt .. ,vn J
Normalization
The last property is the easiest one. For the D(e!, e
2
, ... , en) = 1.
In matrix notation this can be written as
det(l) = 1
Constructing the Determinant
Determinants
The plan of the game is now as follows: using the properties that the determinant
should have, we derive other properties of the determinant, some of them highly non
trivial. We will show how to use these properties to compute the determinant using our old
friendrow reduction.
We will show that the determinant, i.e., a fu'nction with the desired properties exists
and unique. After all we have to be sure that the object we are computing and studying
exists. .
Basic Properties. We will use the following basic properties of the determinant:
1. is linear in each column, i.e., in vector notation for every index k
D(v!, ... :f3
v
k;···'V
n
) =
k
o.D(v!, ... ,uk , ... , v
n
) + f3D(v!, ... vk,··· v
n
)
k k
for all scalars a,
2. Determinant is antisymmetric, i.e., if one interchanges two columns, the
determinant changes sign.
3. Normalization property: det 1= 1.
The first propertyis just the combined. The second one and the last one is the
normalization property. Note, that we did not use property: it can be deduced from the
above three. These three properties completely define determinant.
Properties of Determinant Deduced from the Basic Properties
Proposition. For a square matrix a the following statements hold:
1. If a has a zero column, then detA = o.
2. If a has two equal columns, then det A = 0;
3. If one column of a is a multiple of another, then detA = 0;
4. If columns of a are linearly dependent, i.e., if the matrix is not invertible, then
detA = o.
Determinants 105
Proof Statement 1 follows immediately from linearity. Ifwe multiply the zero column
by zero, we do not change the matrix and its determinant. But by the property 1 above, we
should get O. The fact that determinant is antisymmetric, implies statement 2. Indeed, if
we interchange two equal columns, we change nothing, so the determinant remains the
same. On the other hand, interchanging two columns changes sign of determinant, so
det4 =  det A,
which is possible only if det A = O. Statement 3 is immediate corollary of statement 2
and linearity.
To prove the last statement, let us first suppose that the first vector VI is a linear
combination of the other vectors,
n
VI = <x
2
v
2
+ <X3
v
3 + ... + <Xnvn = Lakvk'
k=2
Then by linearity we have (in vector notation)
D(v,. v" ... , v.l = D [[t, "'tVt]. v,. v3'"'' v. 1
n
= v2, v3"'" vn)
k=2
and each determinant in the sum is zero because of two equal columns.
Let us now consider general case, i.e., let us assume that the system vI' v
2
, ... , vn is
linearly dependent. Then one of the vectors, say v
k
can be represented as a linear combination
ofthe others. Interchanging this vector with VI we arrive to the situation we just treated, so
D(vl""'Vk"", v
n
) = D(vk""'V)'''''vn) = 0 = 0,
k k
so the determinant in this case is also O.
The next proposition generalizes property. As we already have said above, this property
can be deduced from the three "basic" properties of the determinant, we are using in this
section.
Proposition. The determinant does not change if we add to a column a linear
combination of the other columns (leaving the other columns intact). In particular, the
determinant is preserved under "column replacement" (column operation of third type).
Proof Fix a vector v
k
, and let u be a linear combination of the other vectors,
u =
j"",k
Then by linearity
D(vl'''' vk + U, ... , v
n
) = D(vl"'" vk,'''' v
n
) + D(vI," .,u, ... , v
n
),
'k k k
and by Proposition the last term is zero.
106 Determinants
Determinants of Diagonal and Triangular Matrices
Now we are ready to compute determinant for some important special classes of
matrices.
The first class is the socalled diagonal matrices. Let us recall that a square matrix
A = {aj, k} is called diagonal if all entries off the main diagonal are zero, i.e., if a
j
•
k
= ° for all} ::j:: k. We will often use the notation diag{a\, a
2
, ... , an} for the diagonal matrix
[
? ... g].
° ° an
Since a diagonal matrix diag{a\, a
2
, ... , an} can be obtained from the identity matrix
I by multiplying column number k by a
k
,
Determinant of a diagonal matrix equal the product of the
diagonal entries,det(diag{a\, a2' ... , an}) = a\a2 .... an'
The next important class is the class of socalled triangular matrices. A square matrix
A = {a j 'k } is called upper triangular if all entries below the main diagonal are 0,
i.e" if a"k = ° for all k <i. A square matrix is called lower triangular if all entries above
the are 0, i.e if aj'k = ° for all} < k. We call a matrix triangular, if it is either lower
or upper triangular matrix.
It is easy to see that
Determinant of a triangular matrix equals to the product
of the diagonal entries,
detA = a\,\a
2
'2 ... an' n.
Indeed, if a triangular matrix has zero on the main diagonal, it is not invertible (this
can easily be checked by column operations) and therefore both sides equal zero. If all
diagonal entries are nonzero, then using column replacement (column operations of third
type) one can transform the matrix into a diagonal one with the same diagonal entries: For
upper triangular matrix one should first subtract appropriate multiples of the first column
from the columns number 2, 3, ... , n, "killing" all entries in the first row, then subtract
appropriate mUltiples of the second column from columns number 3, ... , n, and so on.
To treat the case of lower triangular matrices one has to do "column reduction" from
the left to the right, i.e., first subtract appropriate multiples of the last column from columns
number n  1, ... , 2, 1, and so on.
Computing the Determinant
Now we know how to compute determinants, using their properties: one just need to
do column reduction (i.e., row reduction for AT) keeping track of column operations
Determinants 107
changing the determinant. Fortunately, the most often used operation  row replacement,
i.e., operation of third type does not change the determinant. So we only need to keep
track of interchanging of columns and of multiplication of column by a scalar.
Ifan echelon form of AT does not have pivots in every column (and row), then a is not
invertible, so det A = O. If a is invertible, we arrive at a triangular matrix, and det A is the
product of diagonal entries times the correction from column interchanges and
multiplications.
The above algorithm implies that detA can be zero only if a matrix A is not invertible.
Combining this with the last statement of Proposition we get Proposition. det A = 0 if and
only if a is not invertible. An equivalent statement: det A * 0 if and only if A is invertible.
Note, that although we now know how to compute determinants, the determinant is
still not defined. One can ask: why don't we define it as the result we get from the above
algorithm? The problem is that formally this result is not well defined: that means we did
not prove that different sequences of column operations yield the same answer.
Determinants of Transpose and Product
In this section we prove two important theorems
Theorem. (Determinant of a transpose). For a square matrix A,
det A = det(AT).
This theorem implies that for all statement about columns the corresponding statements
about rows are also true. In particular, determinants behave under row operations the same
way they behave under column operations. So, we can use row operations to compute
determinants.
Theorem. (Determinant of a product). For n x n matrices A and B
det(AB) = (detA)( detB)
In other words
I Determinant of a product equals product of deteramamtsl
To prove both theorem we need the following lemma
Lemma. For a square matrix A and an elementary matrix E (of the same size)
det(AE) = (detA) (detE)
Proof The proof can be done just by direct checking: determinants of special matrices
are easy to compute; right multiplication by an elementary matrix is a column operation,
and eect of column operations on the determinant is well known.
This can look like a lucky coincidence, that the determinants of elementary matrices
agree with the corresponding column operations, but it is not a coincidence at all.
Namely, for a column operation the corresponding elementary matrix can be obtained
from the identity matrix 1 by this column operation. So, its determinant is 1 (determinant
of 1) times the eect of the column operation. And that is all! It may be hard to realise at
first, but the above paragraph is a complete and rigorous proof of the lemma!
Applying N times Lemma we get the following corollary.
108 Determinants
Corollary. For any matrix A and any sequence of elementary matrices E
I
, E
2
, ... , EN
(all matrices are n x n)
det(AE
I
E
2
, ... , EN) = (det A)(det EI)(det E
2
), ... , (det.EN)
Lemma. Any invertible matrix is a product of elementary matrices.
Proof We know that any invertible matrix is row equivalent to the identity matrix,
which is its reduced echelon form. So
/= ENElV
l
, ... , E
2
E
I
A,
and therefore any invertible matrix can be represented as a product of elementary
matrices,
I I I I I I I I
A = EI E2 ... EN_IE
N
/ = EI E2 ... EN_IE
N
(inverse of an elementary matrix is an elementary matrix).
Proof First of all, it can be easily checked, that for an elementary matrix E we have
detE = det(ET). Notice, that it is sucient to prove the theorem only for invertible matrices
A, since if A is not invertible then AT is also not invertible, and both determinants are zero.
By Lemma matrix A can be represented as a product of elementary matrices,
A = E
I
E
2
, ... , EN'
and by Corollary the determinant of A is the product of determinants of the elementary
matrices. Since taking the transpose just transposes each elementary matrix and reverses
their order, Corollary implies that
detA = detA T.
Proof Let us first suppose that the matrix B is invertible. Then Lemma implies that B
can be represented as a product of elementary matrices
B = E
I
E
2
, ... , EN'
and so by Corollary
det (AB) = (det A)[(det EI)(det E
2
), ... , (det EN)] = (det A)(det B).
If B is not invertible, then the product AB is also not invertible, and the theorem just
says that 0 = O.
To check that the product AB = C is not invertible, let us assume that it is invertible.
Then multiplying the identity AB = C by C
I
from the left, we get C1AB = /, so CIA is a
left inverse of B. So B is left invertible, and since it is square, it is invertible. We got a
contradiction.
Properties of Determinant
First of all, let us say once more, that determinant is defined only for square matrices
Since we now know that det A = det(AT), the statements that we knew about columns are
true for rows too.
1. Determinant is linear in each row (column) when the other rows (columns)
are fixed.
2. If one interchanges two rows (columns) of a matrix A, the determinant changes
sign.
Determinants 109
3. For a triangular (in particular, for a diagonal) matrix its determinant is the
product of the diagonal entries. In particular, det 1= 1.
4. If a matrix A has a zero row (or column), det A = O.
5. If a matrix A has two equal rows (columns), det A = O.
6. If one of the rows (columns) of A is a linear combination of the other rows
(columns), i.e., if the matrix is not invertible, then det A = 0; More generally,
7. det A = 0 if and only if A is not invertible, or equivalently
8. det A*,O if and only if A is invertible.
9. det A does not change if we add to a row (column) a linear combination of
the other rows (columns). In particular, the determinant is preserved under
the row (column) replacement, i.e., under the row (column) operation of the
third kind.
10. det AT = det A.
11. det(AB) = (det A)(det B). And finally,
12. If a is an n x n matrix, then det(aA) = an det A.
The last property follows from the linearity of the determinant, if we recall that to
multiply a matrix A by a we have to multiply each row by a, and that each multiplication
multiplies the determinant by a.
Existence and Uniqueness oftbe Determinant
In this section we arrive to the formal definition of the determinant. We show that a
function, satisfying the basic properties 1, 2, 3 from Section 3 exists, and moreover, such
function is unique, i.e., we do not have any choice in constructing the determinant.
Consider an n x n matrix A = {aj,k }nj,k = 1, and let vI' v
2
, ... , vn be its columns, i.e.,
vk = = al'k el +a2'k e2 + ... +an'k en = taj'k ej .
Using linearity of the determinant we expand it in the first column
n n
vI: D(v., v
2
, ... , v
n
) = D(2:
a
j,l
e
j' v2 ... ,e
n
= 2:
a
j'l D(e
j
, v2'''' v
n
)·
j=1 j=1
Then we expand it in the second column, then in the third, and so on. We get
n n n
D(v., v
2
, .. " v
n
) = 2: 2:, .. .. ,ajn,nD(ejl·eh" .. ej)'
j)=lh=1 jn=1
110 Determinants
Notice, that we have to use a different index of summation for each column:we call
themjl,h, ... ,jn; the indexh here is the same as the index}.
It is a huge sum, it contains nn terms. Fortunately, some of the terms are zero. Namely,
if any 2 of the indices j l' h, ... , jn coincide, the determinant D(ej I . eh, ... ef) is zero,
because there are two equal rows here.
So, let us rewrite the sum, omitting all zero terms. The most convenient way to do
that is using the notion of a permutation. a permutation of an ordered set {l, 2, ... , n} is a
rearrangement of its elements. a convenient way to represent a permutation is by using a
function
a: {I, 2, ... , n} ~ {l, 2, ... , n},
where a(I), a(2), ... , (n) gives the new order of the set 1,2, ... , n. In other words, the
permutation rearranges the ordered set 1,2, ... , n into a(1), a(2), ... , (n).
Such function a has to be onetoone (different values for different arguments) and
onto (assumes all possible values from the target space). Such functions (onetoone and
onto) are called bijections, and they give onetoone correspondence between two sets.
Although it is not directly relevant here, let us notice, that it is wellknown in
combinatorics, that the number of different perturbations of the set {I, 2, ... , n} is exactly
n!. The set of all permutations of the set {l, 2, ... , n} will be denoted Perm(n).
Using the notion of a permutation, we can rewrite the determinant as
D(v
l
, v
2
, ... , v
n
) = 2:: acr(I),lacr(2),2···acr(n),nD(ecr(I),ecr(2)'····,ecr(n»·
crEPenn(n)
The matrix with columns ecr(l)' e
cr
(2)' ... , ecr(n) can be obtained from the identity matrix
by finitely many column interchanges, so the determinant
D(ecr(I)' e
cr
(2), ... , ecr(n»
is I or 1 depending on the number of column interchanges.
To formalize that, we define sign (denoted sign a) of a permutation to be 1 if even
number of interchanges is necessary to rearrange the ntuple 1, 2, ... , n into a(1), a(2), ... ,
a(n), and sign( a) = 1 if the number of interchanges is odd.
It is a wellknown fact from the combinatorics, that the sign of permutation is well
defined, i.e., that although there are infinitely many ways to get the ntuple (1), (2), ... , (n)
from 1, 2, ... , n, the number of interchanges is either always odd or always even.
One of the ways to show that is to count the number K of pairs j, k, j < k such that
aU) > a(k), and see if the number is even or odd. We call the permutation odd if K is odd
and even if K is even. Then define signum of to be (I)K. We want to show that signum
and sign coincide, so sign is well defined.
If (k) = k V k, then the number of such pairs is 0, so signum of such identity permutation
is 1. Note also, that any elementary transpose, which interchange two neighbors, changes
the signum of a permutation, because it changes (increases or decreases) the number of the
pairs exactly by I. So, to get from a permutation to another one always needs an even
Determinants 111
number of elementary transposes if the permutation have the same signum, and an
oddnumber if the signums are different.
Finally, any interchange of two entries can be achieved by an odd number of elementary
transposes. This implies that signum changes under an interchange of two entries. So, to
get from 1,2, ... , n to an even permutation (positive signum) one always need even number
of interchanges, and odd number of interchanges is needed to get an odd permutation
(negative signum). That means signum and sign coincide, and so sign is well defined.
So, if we want determinant to satisfy basic properties 13 from Section 3, we must
define it as
detA = 2: aa(l),l,aa(2),2···,aa(n),n
si
gn(a),
aEPenn(n)
where the sum is taken over all permutations of the set {I, 2, ... , n}.
If we define the determinant this way, it is easy to check that it satisfies the basic
properties. Indeed, it is linear in each column, because for each column every term (product)
in the sum contains exactly one entry from this column. Interchanging two columns of a
just adds an extra interchange to the perturbation, so right side in changes sign. Finally, for
the identitymatrix I, the right side is 1 (it has one nonzero term).
COFACTOR EXPANSION
For an n x n matrix A = letAj'kdenotes the (nl) x (nl) matrix obtained
from A by crossing out row number j and column number k.
Theorem. (Cofactor expansion of determinant). Let A be an n x n matrix. For each j,
j n, determinant of A can be expanded in the row number j as
_ '+1
detA  aj,l (I)J detAj,1 + a
j
,2(I)j+2 detA
j
,2 + ...
+ a'
l
(_I)J+n detA.
J, J,n
n j+k
= 2:aj,k(I) det Aj,k'
k=1
Similarly, for each k, 1 k n, the determinant can be expanded in the
column number k,
n j+k
detA = 2:aj,k(I) det Aj,k'
k=1
Proof Let us first prove the formula for the expansion in row number 1. The formula
for expansion in row number k then can be obtained from it by interchanging rows number
1 and k. Since det A = det AT, column expansion follows automatically.
Let us first consider a special case, when the first row has one non zero term a I I'
Performing column operations on columns 2, 3, ... , n we transform a to the lower triangular
form. The determinant of A then can be computed as
112
I the product of diagonal entries of the triangular matrix I
the product of diagonal
. correcting factor from
entries of the triangular x
the column operations
matrix
Determinants
But the product of all diagonal entries except the first one (i.e., without at I) times the
correcting factor is exactly detAI,l' so in this particular case detA = al,I detAI,I'
Let us now consider the case when all entries in the first row except a
l
,2 are zeroes.'
This case can be reduced to the previous one by interchanging columns number 1 and 2,
and/therefore in this case detA = (1) detA
I
'2'
The case when a1'3 is the only nonzero entry in the first row, can be reduced to the
previous one by interchanging rows 2 and 3, so in this case detA = a1'3 detA1'3'
Repeating this procedure we get that in the case when al,k is the only nonzero entry
in the first row det A = (_1)1+ k aI,k det Al k'
In the general case, linearity of the determinant implies that
n
det A = det A(I) + det A(2) + ... + det A(n) = L:deti
k
)
'. k=1
where the matrix A(k) is obtained from A by replacing all entries in the first row except
aI'k by O. As we just discussed above
detA(k) = (_I)1+k al,k detAl'k,
so
n
det A = L (I) l+k a det A ok.
k=1 I'k I
To get the cofactor expansion in the second row, we can interchange the first and
second rows and apply the above formula. The row exchange changes the sign, so we get
n I+k n 2+k
det A =  l:) 1) a2,k det A
2
,k = 2: (1) a2,k det A
2
,k'
k=I k=I
Exchanging rows 3 and 2 and expanding in the second row we get formula
n 3+k
detA = 2:(1) a3,k detA
3
,k,
k=1
and so on.
To expand the determinant det A in a column one need to apply the row expansion
formula for AT.
Definition. The numbers
'+k
C·
k
=(I)l detA'
k 1, j,
are called co/actors.
Using this notation, the formula for expansion of the determinant in the row number
j can be rewritten as
Determinants 113
n
detA = a"1 C"I + a'
2
c.
2
+ ... + a. C. = Laj,kCj,k'
J J J, J, J.n J.n k=1
Similarly, expansion in the row number k can be written as
n
det A = al'k CI,k + a
2
,k C
2
,k + ... + an'k C
n
.
k
= Laj,kCj,k'
'1
Remark. often the cofactor expansion formula is Jused as the definition of
determinant. It is not dicult to show that the quantity given by this formula satisfies the
basic properties of the determinant: the normalization property is trivial, the proof of
anti symmetry is easy. However, the proof of linearity is a bit tedious (although not too
dicult).
Remark. Although it looks very nice, the cofactor expansion formula is not suitable
for computing determinant of matrices bigger than 3 x 3.
As one can count it requires n! multiplications, and n! grows very rapidly. For example,
cofactor expansion of a 20 x 20 matrix require 20! 2.4 . 10
18
multiplications: it would
take a computer performing a billion multiplications per second over 77 years to perform
the multiplications.
On the other hand, computing the determinant of an n x n matrix using row reduction
requires (n
3
+ 2n  3)/3 multiplications (and about the same number of additions). It would
take a computer performing a million operations per second (very slow, by today's standards)
a fraction of a second to compute the determinant of a 100 x 100 matrix by row reduction.
It can only be practical to apply the cofactor expansion formula in higher dimensions
if a row (or a column) has a lot of zero entries. However, the cofactor expansion formula
is of great theoretical importance, as the next section shows.
Cofactor Formula for Inverse Matrix
The matrix C = {Cj,k whose' entries are c<!factors of A given matrix A is called
the cofactor matrix of A.
Theorem. Let a be an invertible matrix and let C be its cofactor matrix.
Then
AI =_l_C
T
.
detA
Proof Let us find the product ACT. The diagonal entry number j is obtained by
mUltiplyingjth row of a by jth column of a (i.e., jth row of C), so
(ACT) .. = a·'IC.'1 + a·
2
C.
2
+ ... + a· C. = detA,
JJ J J J, J, J.n J,n
by the cofactor expansion formula.
To get the off diagonal terms we need to mUltiply jth row of A by kth column of CT,j
:t= k, to get
a,IC
k
1+ a,
2
C
k2
+ ... + a· C
k
.
j, , j" j,n ,n
114 Determinants
It follows from the cofactor expansions formula (expanding in kth row) that this is the
determinant of the matrix obtained from a by replacing row number k by the row number
} (and leaving all other rows as they were). But the rows} and k of this matrix coincide, so
the determinant is O. So, all offdiagonal entries of ACT are zeroes (and all diagonal ones
equal det A), thus
ACT = (det A) 1.
1
That means that the matrix det A CT is a right inverse of A, and since a is square, it is
the inverse. Recalling that for an invertible matrix A the equation Ax = b has a unique
solution
1
X =A
1
b =  CTb.
detA
We get the following corollary of the above theorem.
Corollary. (Cramer's rule). For an invertible matrix a the entry number k of the solution
of the equation Ax = b is given by the formula
I
detB
k
xk='
detA
where the matrix Bk is obtained from a by replacing column number k of A by the
vector b.
Some applications of the cofactor formula for the inverse. Example (Inverting 2 x 2
matrices). The cofactor formula really shines when one needs to invert a 2 x 2 matrix
The cofactors are.just entries (1 x ] matrices), the cofactor matrix is
(
d b)
c a'
so the inverse matrix AI is given by the formula
AI = (!c
While the cofactor formula for the inverse does not look practical for dimensions higher
than 3, it has a great theoretical value, as the examples below illustrate.
Example. (Matrix with integer inverse). Suppose that we want to construct a matrix a
with integer entries, such that its inverse also has integer entries (inverting such matrix
would make a nice homework problem: no messing with fractions). If det A = 1 and its
entries are integer, the cofactor formula for inverses implies that AI also have integer
entries.
Note, that it is easy to construct an integer matrix A with det A = 1: one should start
with a triangular matrix with 1 on the main diagonal, and then apply several row or column
replacements (operations of the third type) to make the matrix look generic.
Determinants 115
Examp:e (Inverse of a polynomial matrix). Another example is to consider a polynomial
matrix A(x), i.e., a matrix whose entries are not numbers but polynomials a· ix) of the
variable x. If det A(x) 1, then the inverse matrix AI(x) is also a polynomial ri;atrix.
If det A (x) = p(x) == 0, it follows from the cofactor expansion that p(x) is a polynomial,
so AI (x) is a has rational entries: moreover, p(x) is a multiple of each denominator.
Minors and Rank
For a matrix A let us consider its k x k submatrix, obtained by taking k rows and k
columns. The determinant of this matrix is called a minor of order k. Note, that an m x n
matrix has ( ~ l ) . ( ~ ) different k x k submatrices, and so it has (;).( ~ ) minors of order k.
The'Jrem. For a nonzero matrix a its rank equals to the maximal integer k such that
there exists a nonzero minor of order k.
Proof Let us first show, that if k > rank A then all minors of order k are 0. Indeed,
since the dimension of the column space Ran A is rank A < k, any k columns of A are
linearly dependent. Therefore, for any k x k submatrix of A its column are linearly dependent,
and so all minors of order k are 0.
To complete the proof we need to show that there exists a nonzero minor of order k
= rankA. There can be many such minors, but probably the easiest way to get such a minor
is to take pivot rows and pivot column (i.e., rows and columns of the original matrix,
containing a pivot). This k x k submatrix has the same pivots as t:.e original matrix, so it
is invertible (pivot in every column and every row) and its determinant is nonzero.
This theorem does not look very useful, because it is much easier to perform row
reduction than to compute all minors. However, it is of greattheoretical importance, as the
following corollary shows.
Corollary. Let A = A(x) be an m x n polynomial matrix (i.e., a matrix whose entries
are polynomials of x). Then rank A (x) is constant everywhere, except maybe finitely many
points.
Proof Let r be the largest integer such that rankA(x) = r for some x. To show that
such r exists, we first try r = min {m, n}. If there exists x such that rank A (x) = r, we found
r. If not, we replace r by r 1 and try again. After finitely many steps we either stop or hit
0. So, r exists.
Let Xo be a point such that rankA(xo) = r, and let M be a minor of order k such that
M(xo) '* 0. Since M(x) is the determinant of a k x k polynomial matrix, M(x) is a polynomial.
Since M(xo) '* 0, it is not identically zero, so it can be zero onl; at finitely many points.
So, everywhere except maybe finitely many points rankA(x) ~ r. But by the definition ofr,
rankA(x) ~ r for all x.
DETERMINANTS
We have related the question of the invertibility of a square matrix to a question of
116 Determinants
solutions of systems of linear equations. In some sense, this is unsatisfactory, since it is
not simple to nd an answer to either of these questions without a lot of work. We shall
relate these two uestions to the question of the determinant of the matrix in question. The
task is reduced to checking whether this determinant is zero or nonzero. So what is the
determinant?
Let us start with 1 x 1 matrices, of the form
A = (a) •
Note here that 1\ = (1). If a 6 ::;c 0, then clearly the matrix A is invertible, with inverse
matrix
AI = (a  1)
On the other hand, if a = 0, then clearly no matrix B can satisfy AB = BA = 1\, so that
the matrix A is not invertible. We therefore conclude that the value a is a good "determinant"
to determine whether the 1 x 1 matrix A is invertible, since the matrix A is invertible if and
only if a ::;c O. ,
Let us then agree on the following definition.
Definition. Suppose that
A = (a).
is a 1 x 1 matrix. We write
det (A) = a,
and call this the determinant of the matrix A.
Next, let us turn to 202 0 x 2 matrices, of the form
A = ( ~ ~ ) .
We shall use elementary row operations to nd out when the matrix A is invertible. So
we consider the array
(AII
2
) = ( ~ ~ 6 ~ ) ,
and try to use elementary row operations to reduce the left hand half of the array to 1
2
,
Suppose first of all that a = e = O. Then the array becomes
(
0 b I 0)
o dOl'
and so it is impossible to reduce the left hand half of the array by elementary row
operations to the matrix 1
2
, Consider next the case a 6:f. O. Multiplying row 2 of the array
(1) by a, we obtain
Adding e times row 1 to row 2, we obtain
(
a b 1 0)
o adbe e a·
If D = ad  be = 0, then this becomes
(
a b 1 0)
o 0 c a'
Determinants 117
and so it is impossible to reduce the left hand half of the array by elementary row
operations to the matrix 1
2
, On the other hand, if D = ad  be :f. 0, then the array (2) can be
reduced by elementary row operations to
so that
(
1 0 d / D b / D)
o 1 c/ D a/ D '
AI = 1 (d b).
adbe e a
Consider nally the case e:f. O. Interchanging rows I and 2 of the array (I), we obtain
(
e dOl)
a b 1 O·
Multiplying row 2 of the array bye, we obtain
(
e dOl)
ae be eO'
Adding a times row I to row 2, we obtain
(
e dOl)
o bead e a'
Multiplying row 2 by 1, we obtain
(
e dOl)
o ad be e a'
Again, if D = ad  be = 0, then this becomes
(
e dOl)
a 0 e a'
and so it is impossible to reduce the left hand half of the array by elementary row
operations to the matrix 1
2
, On the other hand, if D = ad  be = 0, then the array (3) can be
reduced by elementary row operations to
(
I 0 d / D b / D)
o 1 e/ D af D '
so that
AI = 1 ( d b).
adbe e a
Finally, note that a = e = 0 is a special case of ad  be = O. We therefore conclude that
the value ad  be is a good determinant" to determine whether the 2 x 2 matrix A is
invertible, since the matrix A is invertible if and only if ad  be :f. O.
Let us then agree on the following definition.
Definition. Suppose that
A = ( ~ ~ )
118
is a 2 x 2 matrix. We write
det(A) = ad  be,
and call this the determinant of the matrix A.
Determinants
Determinants for Square Matrices of Higher Order
If we attempt to repeat the argument for 2 x 2 matrices to 3 x 3 matrices, then it is
very likely that we shall end up in a mess with possibly no firm conclusion. Try the argument
on 4 x 4 matrices if you must. Those who have their feet firmly on the ground will try a
different approach. Our approach is inductive in nature. In other words, we shall dene the
determinant of 2 x 2 matrices in terms of determinants of 1 x 1 matrices, dene the
determinant of 3 x 3 matrices in terms of determinants of 2 x 2 matrices, dene the
determinant of 4 x 4 matrices in terms of determinants of 3 x 3 matrices,
and so on. Suppose now that we have dened the determinant of (n  1) x (n  1)
matrices. Let
(
all... aln ]
A
=· . . .
anI··· ann
be an n matrix. For every i,) = 1, ... , n, let us delete row i and column} of A to obtain
the (n  1) (n  1) matrix
all
aI(II) • aI(I+I)
aln
•
a(iI)I a(il)(jI) ·a(iI)(j+I)
a(i_I)"
Aij =
• •
a(i,!"I)I a(i+l}(jI) : a(j+I)(j+I)
anI an(jI)
•
an(j+I) ann
Here denotes that the entry has been deleted.
Definition. The number Cij = (_l)i+j det(Aij) is called the cofactor of the entry aij of A.
In other words, the cofactor of the entry a ij is obtained from A by first deleting the row and
the column containing the entry aij' then calculating the of the resulting
(n  1) x (n  1) matrix, and nally multiplying by a sign (ly+J.
Note that the entries of A in row i are given by
(ail' ... , ain)·
Definition. By the cofactor expansion of A by row i, we mean the expression
n
L aijCij = aijC
il
+ ... + ainC
in
.
j=I
Note that the entries of A in column} are given by
Determinants
(
a
F
]
an)
Definition. By the cofactor expansion of A by column j, we mean the expression
n
IaijC
y
= aljC
lj
+ ... anjCnj"
i=l
119
Proposition. Suppose that A is an n x n matrix. Then the expressions are all equal
and independent of the row or column chosen.
Definition. Suppose that A is an n x n matrix. We call the common value in the
determinant of the matrix A, denoted by det(A).
Let us check whether this agrees with our earlier definition of the determinant of a
2 x 2 matrix. Writil1g
we have
A = (a
l1
a
12
),
a21 a22
CII = a
22
, C
l2
=a
21
, C
21
=a
I2
, C
22
= all:
It follows that
by row 1 : allC
ll
+ a
l2
C
l2
= a
ll
a
22
 a
12
a
21
,
by row 2 : a
21
C
21
+ a
22
C
22
= a
2l
a
l2
+ a
22
a
ll
,
by column 1 : allC
ll
+ a
21
C
21
= a
ll
a
22
 a
21
a
12
,
by column 2 : a
l2
C
l2
+ a
22
C
22
= a
l2
a
21
+ a
22
a
ll
:
The four values are clearly equal, and of the form ad  bc as before.
Example. Consider the matrix
(
2 3 5)
A= 1 4 2.
215
Let us use cofactor expansion by row 1. Then
Cll = (_1)1+1 det (i ;) = (I? (202)= 18,
C
l2
= (_1)1+2 detl U ;) = (_1)3 (5  4) = 1,
C
I3
= (_1)1+3 detl U i) = (_1)4 (1  8) = 7,
so that
det(A) = allC
ll
+ a
l2
C
l2
+ a
l3
C
l3
= 36  3  35 = 2:
Alternatively, let us use cofactor expansion by column 2. Then
120 Determinants
so that
C
l2
= (_1)1+2 det U ;) = (1)3(5  4) = 1,
C
22
= (_1)2+2 det ( ~ ~ ) = (_1)4(10  10) = 0,
C
32
= (_1)3+2 det (f ~ ) = (1)5(4  5) = 1,
det(A) = a
12
C
12
+ a
22
C
22
+ a
32
C
32
= 3 + 0 + 1 = 2.
When using cofactor expansion, we should choose a row or column with as few non
zero entries as possible in order to minimize the calculations.
Example. Consider the matrix
(
2 3 0 5J
1 402
A= 5 4 8 5 .
2 1 0 5
Here it is convenient to use cofactor expansion by column 3, since then
(
2 3
det(A) = a
13
C
13
+ a
23
C
23
+ a
33
C
33
+ a
43
C
43
= 8C
33
= 8El)3+3 det ~ i
in view of Example.
Some Simple Observations
D=16,
In this section, we shall describe two simple observations which follow immediately
from the definition of the determinant by cofactor expansion.
Proposition. Suppose that a square matrix A has a zero row or has a zero column.
Then det (A) = O.
det(A) = O.
Proof We simply use cofactor expansion by the zero row or zero column.
Definition. Consider an n x n matrix
A = (ap ... al
n
J.
an, ... ann
If aij = 0 whenever i > j, then A is called an upper triangular matrix. If aij = 0 whenever
i <j, then A is called a lower triangular matrix. We also say that A is a triangular matrix if
it is upper triangular or lower triangular.
Example. The matrix
(
~ ~ ~ J
006
is upper triangular.
Determinants
Example. A diagonal matrix is both upper triangular and lower triangular.
Proposition. Suppose that the n x n matrix is triangular.
(
all ... aln'J
A
= . .
. .
ani ... ann
Then det(A) = all a
22
, ... , ann' the product of the diagonal entries.
121
Proof Let us assume that A is upper triangular for the case when A is lower triangular,
change the term leftmost column to the term top row in the proof. Using cofactor expansion
by the leftmost column at each step, we see that
det (a
f3
'" a ~ n J = '" all a
22
• .. ann
a
n
3 '" ann
as required.
Elementary Row Operations
We now study the eect of elementary row operations on determinants. Recall that the
elementary row operations that we consider are: (l) interchanging two rows, (2) adding a
multiple of one row to another row, and (3) multiplying one row by a nonzero constant.
Proposition. (ELEMENTARY ROW OPERATIONS) Suppose that A is an n x n
matrix.
(a) Suppose that the matrix B is obtained from the matrix A by interchanging
two rows of A. Then det(B} = det(A}.
(b) Suppose that the matrix B is obtained from the matrix A by adding a multiple
of one row of A to another row. Then det(B} = det(A}.
(c) Suppose that the matrix B is obtained from the matrix A by multiplying one
row of A by a nonzero constant c. Then det(B} = c det(A}.
Proof (a) The proof is by induction on n. It is easily checked that the result holds
when n = 2. When n> 2, we use cofactor expansion by a third row, say row i. Then
n
'"" i+ j
det(B) = .LJaij(l) det(Bij)'
j=1
Note that the (n  1) x (n  I) matrices Bij are obtained from the matrices Ai" by
interchanging two rows of Aij , so that det(Bij) = det(Aij)' It follows that !)
n
det(B) =  Laij(li+
j
det(Ay) = det(A)
j=1
as required.
,
122 Determinants
(b) Again, the proof is by induction on n. It is easily checked that the result holds
when n = 2. When n> 2, we use cofactor expansion by a third row, say row i. Then
n
det(B) = L,aij(li+
j
det(Bij)
j=1
Note that the (n 1) x (n  1) matrices Bij are obtained from the matricesAij by adding
a multiple of one row of Aij to another row, so that det(Bij) = det(Aij)' It follows that
n
det(B) = L,uij(l)i+
j
det(Aij) = det(A)
1=1
as required.
(c) This is simpler. Suppose that the matrix B is obtained from the matrix A by
multiplying row i of A by a nonzero constant c. Then
n
det{ B) = L caij ( 1 )i+
1
det{ Bij )
j=1
Note now that
Bij = Aij'
since row i has been removed respectively from Band A. It follows that
n
det(B) = Lcaij(I)i+
1
det(Aij) = cdet(A)
j=1
as required.
In fact, the above operations can also be carried out on the columns of A. More precisely,
we have the following result.
Proposition. Suppose that A is an n x n matrix.
(a) Suppose that the matrix B is obtained from the matrix A by interchanging
two columns of A. Then det(B} =  det(A).
(b) Suppose that the matrix B is obtained from the matrix A by adding a multiple
of one column of A to another column. Then det(B} = det(A}.
(c) Suppose that the matrix B is obtained from the matrix A by multiplying one
column of A by a nonzero constant c. Then det(B} = c det(A}.
Elementary row and column operations can be combined with cofactor expansion to
calculate the determinant of a given matrix. We shall illustrate this point by the following
examples.
Example. Consider the matrix
A =(1 ! ! ~ J
2 2 0 4
Adding I times column 3 to column 1, we have
Determinants
det(A) = det[? : !
2 2 0 4
Adding 1/2 times row 4 to row 3, we have
det(A) = det [H !
2 2 0 4
Using cofactor expansion by column 1, we have
det(A) = 2(_1)4+1 det i = 2
3 4 3 3
Adding 1 times row 1 to row 3, we have
det(A) = 2 det i J.
o 2 2
Adding 1 times column 2 to column 3, we have
det ! n
Using cofactor expansion by row 3, we have
det(A) = 2.2(1)3+2 det (! = det (!
Using the formula for the determinant of2 x 2 matrices, we conclude that
det(A) = 4(9  28) = 76.
Let us start again and try a different way. Dividing row 4 by 2, we have
det(A) = 2 det (! : !
1 1 0 2
Adding 1 times row 4 to row 2, we have
det(A) = 2 det = ! !
1 1 0 2
Adding 3 times column 3 to column 2, we have
det(A) = ! !
1 1 0 2
123
124
Using cofactor expansion by row 2, we have
det(A) = 2 1(1)2+3 det (; = 2det(;
. 1 I 2 1
Adding 2 times row 3 to row 1, we have
det(A) = 2 det 1)
1 1 2
Adding 5 times row 3 to row 2, we have
det(A) 2 de{ H
Using cofactor expansion by column 1, we have
det(A) = 2. 1(_1)3+1 det(=t3 = 2det(=1
5
3
Using the formula for the determinant of 22 matrices, we conclude that
det(A) = 2(25 + 13) = 76.
Example. Consider the matrix
H]
1 0 1 1 3 .
2 102 0
Determinants
Here we have the least number of nonzero entries in column 3, so let us work to get
more zeros into this column. Adding 1 times row 4 to row 2, we have
det(A) = ? H].
1 0 1 1 3
2 102 0
Adding 2 times row 4 to row 3, we have
[
2 1 0 1 3]
1 3 0 1 2
det(A) = det 2 7 0 1 1.
1 0 1 1 3
2 1 0 2 0
Using cofactor expansion by column 3, we have
(
2 1 1 3J (2 1 1
4+3 1 3 1 2 1 3 1
det(A) = 1(1) det 2 7 1 1 = det 2 7 1
2120 212
Determinants 125
Adding 1 times column 3 to column 1, we have
(
1 1 1 3J
o 3 1 2
det( A) =  det 1 7 1 1·
o 1 2 0
Adding 1 times row 1 to row 3, we have
(
1 1 1 3 J
o 3 1 2
det(A) = det 0 6 0 2·
o 1 2 0
Using cofactor expansion by column 1, we have
32'J.
1 2 0 1 2 0
Adding 1 times row 1 to row 2, we have
(
3 1 2)
det(A) = det 9 1 O.
120
Using cofactor expansion by column 3, we have
det(A) = _2(_1)1+3 det(i = 2det(i
Using the formula for the determinant of 2 x 2 matrices, we conclude that det(A) =
2(18  1) = 34.
Example. Consider the matrix
1 024 1 0
2 4 5 7 6 2
A= 4 6 1 9 2 1
350 125
245 362
1 0 2 5 1 0
Here note that rows 1 and 6 are almost identical. Adding 1 times row 1 to row 6, we
have
1 0 2 4 1 0
2 4 5 7 6 2
det(A) = det
4 6 1 9 2 1
3 5 0 1 2 5
2 4 5 3 6 2
0 0 0 1 0 0
Adding 1 times row 5 to row 2, we have
126
det(A) = det
102 4 1 0
o 0 0 4 0 0
4 6 1 9 2 1
350 125
245 362
000 100
Adding 4 times row 6 to row 2, we have
1 0 2 4 1 0
o 0 0 0 0 0
4 6 1 9 2 1
det(A) = det 3 5 0 1 2 5
245 362
000 100
It follows from Proposition 3B that det(A) = O.
Further Properties of Determinants
Definition. Consider the n x n matrix
A = (a11 ... a1n ).
anI'" ann
Determinants
By the transpose At of A, we mean the matrix obtained from A by transposing rows
and columns.
I (all ... a(ll)
A
=' . . ..
al
n
... ann
Example. Consider the matrix
(
1 2 3)
A= 4 5 6.
789
Then
AI =(1 ~ ~ J .
369
Recall that determinants of 2 x 2 matrices depend on determinants of 1 x 1 matrices,
in turn, determinants of 3 x 3 matrices depend on determinants of 2 x 2 matrices, and so
on.
It follows that determinants of n x n matrices ultimately depend on determinants of 1
x 1 matrices. Note now that transposing a 1 x 1 matrix does not aect its determinant (why?).
The result below follows in view of Proposition. For every n x n matrix A, we have det(A
t
)
= det(A).
Example. We have
Determinants 127
det=[n H i i
12312 10113
35730 2 102 0
Next, we shall study the determinant of a product. We shall sketch a proof of the
following important result
Proposition. For every n x n matrices A and B, we have det(AB) = det(A) det(B).
Proposition. Suppose that the n x n matrix A is invertible. Then
det(A
1
) = 1
det(A)
Proof In view of Propositions 3G and 3C, we have det(A) det(A
1
) = det(In) = 1. The
result follows immediately. Finally, the main reason for studying determinants, as outlined
in the introduction, is summarized by the following result.
Proposition. Suppose that A is an n x n matrix. Then A is invertible if and only if
det(A) * o.
Proof Suppose that A is invertible. Then det(A) * 0 follows immediately from
Proposition. Suppose now that det(A) ::f:. O. Let us now reduce A by elementary row operations
to reduced row echelon form B. Then there exist a finite sequence E
1
, ••• , Ek of elementary
n x n matrices such that
B = E
k
, ••• , EIA
It foIrows from Proposition that
det(B) = det(E
k
), ••• , det(E
1
) det(A)
Recall that all elementary matrices are invertible and so have nonzero determinants.
It follows that det(B) ::f:. 0, so that B has no zero rows by Proposition. Since B is an n x n
matrix in reduced row echelon form, it must be In. We therefore conclude that A is row
equivalent to In. It now follows from Proposition that A is invertible. Combining
Propositions, we have the following result.
Proposition. In the notation of Proposition, the following statements are equivalent:
(a) The matrix A is invertible.
(b) The system Ax = 0 of linear equations has only the trivial solution.
(c) The matrices A and In are row equivalent.
(d) The system Ax = b of linear equations is soluble for every n 1 matrix b.
(e) The determinant det(A) ::f:. o.
Application to Curves and Surfaces
A special case of Proposition states that a homogeneous system ofn linear equations in
n variables has a nontrivial solution if and only if the determinant if the coefficient matrix
128 Determinants
is equal to zero. In this section, we shall use this to solve some problems in geometry. We
illustrate our ideas by a few simple examples.
Example. Suppose that we wish to determine the equation of the unique line on the
xyplane that passes through two distinct given points (xI' Yl) and (x
2
' Y2)' The equation of
a line on the xyplane is of the form ax + by + c = O. Since the two points lie on the line, we
must have aX
I
+ bYI + c = 0 and ax
2
+ bY2 + c = O. Hence xa + yb + c = 0,
xla + Ylb + c = 0,
x
2
a + Y2b + c = O.
Written in matrix notation, we have
(
~ ~ ~ J ( ~ J = ( ~ J .
x2 Y2 1 c 0
Clearly there is a nontrivial solution (a, b, c) to this system of linear equations, and
so we must have
d e t ( ~ ~ ~ J = O.
x2 Y2 1
the equation of the line required.
Example. Suppose that we wish to determine the equation of the unique circle on the
xyplane that passes through three distinct given points (xl'Yl)' (x
2
'Y2) and (x
3
'Y3)' not all
lying on a straight line. The equation of a circle on the xyplane is of the form
a(x
2
+Y2) + bx + cy + d = O.
Since the three points lie on the circle, we must have
and
Hence
(
2 2) (2 2)
a al + YI + bX
1
+ cYI + d = 0, a x2 + Y2 + bX
2
+ cY2 + d = 0
(x
2
+ Y2)a + xb + yc + d = 0,
(x
1
2
+ yf}a + xlb + ylc + d = 0
(xi + yi)a + x
2
b + Y2c + d, 0
(x; + y;)a + x3b + Y3c + d = 0
Written in matrix notation, we have
Determinants 129
Clearly there is a nontrivial solution (a, b, c, d) to this system of linear equations, and
so we must have
x
2
+i
x
Y
x
2
+i xI
YI
det
I I
2 2
= 0,
x
2
+ Y2 x2
Y2
2 2
x3 + Y3 x3 Y3
the equation of the circle required.
Example. Suppose that we wish to Qetermine the equation of the unique plane in 3
space that passes through three distinct given points (xl' YI' ZI)' (X
2
'Y2' z2) and (x
3
' Y3' z3)'
not all lying on a straight line. The equation of a plane in 3space is of the form ax + by +
cz + d = o. Since the three points lie on the plane, we must have ax I + bYI + cz I + d = 0, aX
2
+ bY2 + CZ
2
+ d= 0, and ax
3
+ bY3 + cZ
3
+ d= O. Hence
xa + yb + zc + d = 0,
xla+Ylb+zlc+d=O,
x
2
a" + Y2b + z2c + d = 0,
x
3
a + Y3b + z3c + d = 0:
Written in matrix notation, we have
x
2
+ i x Y a 0
b
xi + yi x
2
Y2
c
2 2
x3 + Y3 x3 Y3
d
=
o
o
o
Clearly there is a nontrivial solution (a, b, c, d) to this system of linear equations, and so
we must have the equation of the plane required.
d e t ( ~ ~ ~ ~ J = 0
z2 Y2 z2 1 '
x3 Y3 z3 1
Example. Suppose that we wish to determine the equation of the unique sphere in 3
space that passes through four distinct given points (xl' YI' ZI)' (x
2
' Y2' Z2)' (x
3
' Y3' z3) and
(x4' Y 4' Z4)' not all lying on a plane. The equation of a sphere in 3space is of the form
a(x
2
+ Y2 + z2) + bx + cy + dz + e = o.
Since the four points lie on the sphere, we must have
a(xf + yf + zf)+ bX
I
+cYI +dz
l
+e = 0,
a (xi + yi + zi) + bX2 + CY2 + dz
2
+ e = 0,
a(x; + Y; + z;) + bX3 +CY3 +dz
3
+e = 0,
130
Hence
(x
2
+ y2 + z2)a+ xb+ yc+ zd +e = 0,
(xf + yf +zf)a+xlb+ Ylc+zl
d
+e = 0,
(x; + Y; + zi)a+ x2
b
+ Y2
C
+ z2d + e = 0,
(xi + yi + zi)a+x3
b
+ Y3
C
+ z3
d
+e = 0,
(xi + yi + zi)a+x4
b
+ Y4
C
+ Z4
d
+e = o.
Written in matrix notation, we have
x
2
+ i+z2 x
Y
Z 1
a 0
x
2
+ i +z2
I I I
xI
YI
zi b 0
2 2 2
x2 + Y2 +z2 x2
Y2
z2
1
C
=
0
x
2
+ i+z2
x3
Y3
z3
d 0
3 3 3
2 2 2 1 e
x4 + Y4 + z4 x4 Y4 z4
o
Determinants
Clearly there is a nontrivial solution (a, b, c, d, e) to this system of linear equations,
and so we must have
X2 + Y2 +z2
x
Y
Z 1
x
2
+ i +z2
I I I
xI
YI
zi
1
det
xi + Y; +z; X2
Y2
Z2
=0,
xf + yf +zf
X3
Y3
z3
1
x ~ + y ~ + z ~ x4
Y4
Z4
the equation of the sphere required.
Some Useful Formulas
In this section, we shall discuss two very useful formulas which involve determinants
only. The rst one enables us to nd the inverse of a matrix, while the second one enables us
to solve a system of linear equations. The interested reader is referred to Section 3.8 for
proofs. Recall rst of all that for any n x n matrix
(
all ... aln )
A =· . . .,
ani ... ann
the number Cij = (ly+j det(Aij) is called the cofactor of the entry aij' and the (n  1)
(n  1) matrix
Determinants
a(iI)I
•
al(j._I) •
a(i_l)(jI) • a(iI)(j+I)
• • •
a(iI)(i_l) • a(i+I){i+I)
a.ln
aU l)n
aU + l)n
aU + I)n
ani an(j_I)· an(j+I) ann
131
is obtained from A by deleting row i and column}, here denotes that the entry has
been deleted.
Definition. The n x n matrix
ad} (A) = (C?I ... Cf/lJ
C
11I
•• , C
IIII
is called the adjoint of the matrix A.
Remark. Note that adj(A) is obtained from the matrix A rst by replacing each entry of
A by its cofactor and then by transposing the resulting matrix.
Proposition. Suppose that the n x n matrix A is invertible. Then
AI = 1 ad·(A).
det(A) lj
Example. Consider the matrix
[
1 1 0]
A= 0 1 2.
203
Then
det[ det[ I
i2
det[O I] det[1 1] detr1 I]
2 0 2 0 ,0 1
ad}(A) =
On the other hand, adding 1 times column 1 to column 2 and then using cofactor
expansion on row 1, we have
det(A)
0] [1 0 0]
2 0 I 2
3 2 2 3
It follows that
132 Determinants
[
3 3 2]
AI = 4 3 2.
2 2 1
Next, we turn our attention to systems of n linear equations in n unknowns, of the
form
anix
i
+ ... anrfn = b
n
,
represented in matrix notation in the form
Ax = b,
where
C
Jn
) (b)
: and b = :1
C
nn
b
n
represent the coefficients and
represents the variables. For every j = 1, ... , k, write in other words, we replace column
j of the matrix A by the column b.
al(Jr
l
) ... a\nJ.
. . ,
b
n
an(J+I) . . . ann
Proposition. (Cramer's Rule) Suppose that the matrix A is invertible. Then the unique
solution of the system Ax = b, where A, x and b are given by equation, is given by
det(A, (b»
x  _.....:.......e,,,,
1  det(A) ,
det(An(b»
x = ...:......:.''':..:...
n det(A) '
where the matrices AI(b), ... , AI(b).
Example. Consider the system Ax = b, where
[
1 10] [1]
A = ~ ~ ~ and b = ~ .
Recall that det(A) = 1. By Cramer's rule, we have
d e t [ ~ 11 ~ ] d e t [ ~ ~ ~ ]
3 0 3 2 3 3
XI = 'det(A)'= 3, x2 = det(A) = 4,
Determinants 133
(
1 1 1]
det 0 1 2
203
x = =3
3 det{A) ,
Let us check our calculations. Recall from Example that
[
3 3 2]
AI = 4 3 2.
2 2 1
We therefore have
[
: ~ l = [ = ~ = ~ ~ ] [ ~ ] = [ = ~ ] .
x3 2 2 1 3 3
Further Discussion
In this section, we shall first discuss a definition of the determinant in terms of
permutations. In order to do so, we need to make a digression and discuss first the rudiments
of permutations on nonempty finite sets.
Definition. LetXbe a nonempty finite set. A permutation $ onXis a function: X ~
X which is onetoone and onto. If x E X, we den()te by x the image of x under the
permutation. It is not dicult to see that if: $ X ~ X and: X ~ X are both permutations on
X, then: X ~ X, dened by x<l>'I' = {x$)'I' for every x E X so that is followed by, is also a
permutation on X.
Remark. Note that we use the notation x instead of our usual notation (x) to denote the
image of x under. Note also that we write to denote the composition. We shall do this
only for permutations. The reasons will become a little clearer later in the discussion.
Since the set X is nonempty and finite, we may assume, without loss of generality,
that it is {I, 2, ... , n}, where n EN. We now let Sn denote the set of all permutations on
the set {l, 2, ... , n}. In other words, Sn denotes the collection of all functions from (1, 2,
... , n) to {I, 2, ... , n} that are both onetoone and onto.
Proposition. For every n EN, the set Sn has n! elements.
Proof There are n choices for 1$. For each such choice, there are (n  1) choices left
for 2$. And so on.
To represent particular elements of Sn' there are various notations. For example, we
can use the notation
(
1 2 ... n)
1<1> 2<1> ... n<1>
to denote the permutation $.
Example. In S4'
(
1 2 3 4)
2 4 1 3
134 Determinants
denotes the permutation, where 1 = 2, 2 = 4, 3 = 1 and 4 = 3. On the other hand, the
reader can easily check that
(
1 2 3 4)(1 2 3 4) (I 2 3 4)
241332412134'
A more convenient way is to use the cycle notation. The permutations
U ~ ~ j) and ( ~ ~ l:i)
can be represented respectively by the cycles (I 243) and (I 34). Here the cycle (1
243) gives the information 1<1> = 2, 2<1> = 4, 4<1> = 3 and 3<1> = 1. Note also that in the latter
case, since the image of2 is 2, it is not necessary to include this in the cycle. Furthermore,
the information
(
1 2 3 4)(1 2 3 4) (I 2 3 4)
241332412134'
can be represented in cycle notation by (1 243)(1 34) = (1 2). We also say that the
cycles (1 2 4 3), (1 3 4) and (1 2) have lengths 4, 3 and 2 respectively.
Example. In 8
6
, the permutation
(
I 2 3 4 5 6)
2 4 136 5
can be represented in cycle notation as (1 243)(5 6).
Example. In 8
4
or 8
6
, we have (1 2 4 3) = (1 2)(1 4)(1 3).
The last example motivates the following important idea.
Definition. Suppose that n EN. A pet:mutation in 8
n
that interchanges two numbers
among the elements of {I, 2, ... , n} and leaves all the others unchanged is called a
transposition. Remark. It is obvious that a transposition can be represented by a 2cycle,
and is its own inverse. Two cycles (XI' x
2
' ... , x
k
) and (YI' Y2' ... , YI) in 8
n
are said to be
disjoint if the elements XI' ... , x
k
' YI' ... , YI are all different. The interested reader may try
to prove the following result.
Proposition. Suppose that n EN.
(a) Every permutation in 8
n
can be written as a product of disjoint cycles.
(b) For every subset (XI' x
2
, ... , x
k
) of the set {I, 2, ... , n}, where the elements Xl'
X
2
' ... , X
k
are distinct, the cycle (XI' X
2
' ... , X
k
) satises
(XI X
2
, .. ·, Xk) = (XI' X
2
)(X
I
, x
3
), ... , (XI' xk),
in other words, every cycle can be written as a product of transpositions.
(c) Consequently, every permutation in 8
n
can be written as a product of transpositions.
Example. In 8
9
, the permutation
(
I 2 3 4 5 6 7 8 9)
32517 849 6
can be written in cycle notation as (1 3 5 74)(6 8 9). By Theorem 3P(b), we have
Determinants 13S
(1 3 5 7 4) = (1 3)(1 5)(1 7)(1 4) and (6 8 9) = (6 8)(6 9).
Hence the permutation can be represented by (1 3)(1 5)(1 7)(1 4)(68)(69).
Definition. Suppose that n EN. Then a permutation in Sn is said to be even if it is
representable as the product of an even number oftransp0sitions and odd ifit is representable
as the product of an odd number of transpositions. Furthermore, we write
(
,!..) _ {+ 1 if <l> is even
E 'f'  1 iff is odd.
Remark. It can be shown that no permutation can be simultaneously odd and even.
We are now in a position to dene the determinant of a matrix. Suppose that
A = [ a I ~ ... a
In
:]
anI ... ann
is an n x n matrix.
Definition. By an elementary product from the matrix A, we mean the product of n
entries of A, no two of which are from the same row or same column.
sum
It follows that any such elementary product must be of the form
a
I
(1<1»ai2<1» ... an(nj),
where <I> is a permutation in Sn'
Definition. By the determinant of an n x n matrix A of the form (11), we mean the
<Pe:S"
where the summation is over all the n! permutations in Sn'
It is be shown that the determinant de ned in this way is the same as that dened earlier
by row or column expansions. Indeed, one can use (12) to establish Proposition. The very
interested reader may wish to make an attempt. Here we conne our study to the special
cases when n = 2 and n = 3.
In the two examples below, we use e to denote the identity permutation.
Example. Suppose that n = 2. We have the following:
elementary product permutation sign
a
ll
a
22
e +1
a
12
a
2I
(1 2) 1
Hence det (A) = all a
22
 a
I2
a
2I
as shown before.
Example. Suppose that n = 3. We have the following:
elementary product permutation sign
alla22a33
a12a21a33
a13a21a32
a13a22a3I
e +1
(123)
(1 32)
(1 3)
+1
+1
1
contribution
contribution
+ aIla22a33
+ a12a21a31
+ a13a21a32
 a13a22a31
136 Determinants
alla23a32 (23) 1  alla23a32
a12a21a33 (1 2) 1  a12a21a33
Hence det(A) = alla22a33 + a12a23a31 + a13a21a32  a13a22a31 alla23a32  a12a2Ia33'
We have the picture lielow:
+
Next, we discuss briey how one may prove Proposition concerning the determinant of
the, product of two matrices. The idea is to use elementary matrices. Corresponding to
Proposition, we can easily establish the following result.
Proposition. Suppose that E is an elementary matrix.
(a) If E arises from interchanging two rows of In, then det(E) = I.
(b) If E arises from adding one row of In to another row, then det(E) = 1.
(c) If E arises from mUltiplying one row of In by a nonzero constant c, then
det(E) = c.
Combining Propositions 3D and 3Q, we can establish the following intermediate result.
Proposition. Suppose that E is an n x n elementary matrix. Thenfor any n x n matrix
B, we have det(EB) = det(E) det(B).
Proof of Proposition. Let us reduce A by elementary row operations to reduced row
echelon form A'. Then there exist a finite sequence G
1
, ... , Gk,ofelementary matrices such
that A' = G
k
, ... , G1A.
Since elementary matrices are invertible with elementary inverse matrices, it follows
that there exist a nite sequence E
1
, ••• , Ek of elementary matrices such that
A = EI ... EJI1'
Suppose first of all that det(A) = O. Then it follows from (13) that the matrix Ao must
have a zero row. Hence A' B must have a zero row, and so det(A' B) = O. ButAB = E
1
, ... ,
E/A 'B), so it follows from Proposition that det(AB) = O. Suppose next that det(A) ::t: O.
Then A' = In' and so it follows from Equation that AB = E
1
, ••• , E ~ .
Determinants
Proof It sucess to show that
Aad}(A) = det(A)I
n
,
as this clearly implies
A[ 1 ad}(A)] = In'
det(A)
giving the result. To show, note that
[
all ... al
n
1 [CJl
Aad}(A) =: ::
anI'" ann Ctn
Suppose that the right hand side of is equal to
... b1n ]
(B)=: : .
b
n1
... b
nn
Then for every i,} = 1, ... , n, we have
bij = ajJC
jI
+ ... + ainC
jn
.
It follows that when i = }, we have
b
ii
= ailC
il
+ ... + ainC;n = det(A):
137
On the other hand, if i :;; }, then equation is equal to the determinant of the matrix
obtained from A by replacing row} by row i. This matrix has therefore two identical rows,
and so the determinant is 0 (why?).
Proof Since A is invertible, we get
AI = 1 ad'(A)
det(A) lj
By Proposition, the unique solution of the system Ax = b is given by
I 1 •
x = A = adj(A)b.
det(A)
Written in full, this becomes
[::J [
Hence, for every} = 1, ... , n, we have
htClj + ... +bnC,y
x· = ="
J det(A) .
To complete the proof, it remains to show that
bIClj + ... + bnC
nj
= det(Aib)):
Note, on using cofactor expansion by column}, that
138
Determinants
~ Hj a(iI)I
det(A .(b» = L...,.bi ( 1) det •
} i=l a(i+}) I
a(il)(jI) • a(iI)(jI)
• • •
a(i+I)(JI)
a(iI)n
•
an(jI) •
Chapter 5
Introduction to Spectral Theory
EIGENVALUES AND EIGENVECTORS
Spectral theory is the main tool that helps us to understand the structure of a linear
operator. In this chapter we consider only operators acting from a vector space to itself (or,
equivalently, n x n matrices). Ifwe have such a linear transformation A : V ~ V, we can
mUltiply it by itself, take any power of it, or any polynomial.
The main idea of spectral theory is to split the operator into simple blocks and analyse
each block separately. To explain the main idea, let us consider difference equations. Many
processes can be described by the equations of the following type
x
n
+
1
= Ax
n
, n = 0, 1,2, ... ,
where a : V ~ V is a linear transformation, and xn is the state of the system at the time
n. Given the initial state Xo we would like to know the state xn at the time n, analyse the
long time behaviour of x
n
' etc. 1
At the first glance the problem looks trivial: the solution xn is given by the formula xn
= An
xo
. But what if n is huge: thousands, millions? Or what if we want to analyse the
behaviour of xn as n ~ oo?
Here the idea of eigenvalues and eigenvectors comes in. Suppose that Axo = A x
o
'
where').., is some scalar. ThenA
2
ax
o
= A 2
xO
' ')..,2x
O
= ')..,2xO' ... , Anxo = ')..,n
xo
' so the behaviour
of the solution is very well understood
In this section we will consider only operators in finitedimensional spaces. Spectral
theory in infinitely many dimensions if significantly more complicated, and most of the
results presented here fail in infinitedimensional setting.
MAIN DEFINITIONS
Eigenvalues, Eigenvectors, Spectrum
A ·scalar A. is called an eigenvalue of an operator A : V ~ V if there exists a nonzero
vector v E V such that
140 Introduction to Spectral Theory
Av = Av.
The vector v is called the eigenvector ofa (corresponding to the eigenvalue A).
Ifwe know that ').. is an eigenvalue, the eigenvectors are easy to find: one just has to
solve the equation Ax = Ax, or, equivalently
(A  Ai)x = o.
So, finding all eigenvectors, corresponding to an eigenvalue is simply finding the
nUllspace of A  AI. The nullspace Ker(A  AI), i.e., the set of all eigenvectors and 0 vector,
is called the eigenspace.
The set of all eigenvalues of an operator A is called spectrum of A, and is usually
denoted cr(A).
Finding Eigenvalues: Characteristic Polynomials
A scalar A is an eigenvalue if and only if the nullspace Ker(A  AI) is nontrivial (so
the equation (A  Ai) x = 0 has a nontrivial solution).
Let a act on lR
n
(i.e., a: lR
n
~ lRn). Since the matrix of A is square, A Ihas a non
trivial nullspace if and only if it is not invertible. We know that a square matrix is not
invertible if and only if its determinant is O. Therefore
II E a(A),i.e.A is an eigenvalue of A {:} det (A  AI) = 01
If A is an n x n matrix, the determinant det(A  AI) is a polynomial of degree n of the
variable A. This polynomial is called the characteristic polynomial of A. So, to find all
eigenvalues of A one just needs to compute the characteristic polynomial and find all its
roots.
This method of finding the spectrum of an operator is not very practical in higher
dimensions. Finding roots of a polynomial of high degree can be a very dicult problem,
and it is impossible to solve the equation of degree higher than 4 in radicals. So, in higher
dimensions different numerical methods of finding eigenvalues and eigenvectors are used.
Characteristic Polynomial of an Operator
So we know how to find the spectrum of a matrix. But how do we find eigenvalues of
an operator acting in an abstract vector space? The recipe is simple:
Take an arbitrary basis, and compute eigenvalues of the matrix of
the operator in this basis.
But how do we know that the result does not depend on a choice of the basis?
There can be several possible explanations. One is based on the notion of similar
matrices. Let us recall that square matrices A and B are called similar if there exist an
invertible matrix S such that
A =SBSl.
Note, that determinants of similar matrices coincide. Indeed
det A = det(SBSl) = det S det B det Sl = det B
because det Sl = 1/ det S. Note that if A = SBSI then
Introduction to Spectral Theory
A  U = SBS
I
 ASISI = S(BSI  USI) = S(B _IJ.)SI,
so the matrices A lJ and B lJ are similar. Therefore
det(A 'JJ) = det(B  'JJ),
i.e.,
Icharacteristic polynomials of similar matrices coincide. I
If T: V 7 V is a linear transformation, and A and B are two bases in V, then
[T]AA = [I]AB[TbB[I]BA
and since [l]BA = ([l]AB)1 the matrices [11
AA
and [11
BB
are similar.
141
In other words, matrices of a linear transformation in different bases are similar.
Therefore, we can define the characteristic polynomial of an operator as the
characteristic polynomial of its matrix in some basis. As we have discussed above, the
result does not depend on the choice of the basis, so characteristic polynomial of an operator
is well defined.
Multiplicities of Eigenvalues
Let us remind the reader, that ifp is a polynomial, and A is its root (i.e., peA) = 0) then
Z  A divides p(z), i.e., p can be represented as p(z) = (z  A)q(Z), where q is some polynomial.
If q(A) = 0, then q also can be divided by z , so (z  )2 divides p and so on.
The largest. positive integer k such that (z  A i divides p(z) is called the multiplicity.
of the root A.
If A i ~ an eigenvalue of an operator (matrix) A, then it is a root of the characteristic
polynomial p(z) = det(A  zl). The mUltiplicity of this root is called the (algebraic)
multiplicity of the eigenvalue A.
Any polynomial p(z) = L ~ = o akz k of degree n has exactly n complex roots, counting
multiplicity. The words counting multiplicities mean that if a root has multiplicity d we
have to count it d times. In other words, p can be represented as
p(z) = an(z  AI)(Z  A
2
) ... (z  An).
where Ai' ~ , ... , An are its complex roots, counting multiplicities. There is another
notion of multiplicity of an eigenvalue: the dimension of the eigenspace Ker(A1) is called
geometric multiplicity of the eigenvalue A.
Geometric multiplicity is not as widely used as algebraic mUltiplicity. So, when people
say simply "multiplicity" they usually mean algebraic multiplicity.
Let us mention, that algebraic and geometric multiplicities of an eigenvalue can differ.
Proposition. Geometric multiplicity of an eigenvalue cannot exceed its algebraic
multiplicity.
Trace and Determinant
Theorem. Let A be n x n matrix, and let AI' A
2
, ... , An be its eigenvalues (counting
multiplicities). Then
1. traceA = AI + A2 + ... + An.
142 Introduction to Spectral Theory
2. det A = A)A2 ... A
n
.
Eigenvalues of a Triangular Matrix
Computing eigenvalues is equivalent to finding roots of a characteristic polynomial
of a matrix (or using some numerical method), which can be quite time consuming.
However, there is one particular case, when we can just read eigenvalues off the matrix.
Namely
eigenvalues ofa triangular matrix (counting mUltiplicities)
are exactly the diagonal entries a),), a2,2, ... , an,n
By triangular here we mean either upper or lower triangular matrix. Since a diagonal
matrix is a particular case of a triangular matrix (it is both upper and lower triangular the
eigenvalues of a diagonal matrix are its diagonal entries
The proof of the statement about triangular matrices is trivial: we need to subtract A
from the diagonal entries of A, and use the fact that determinant of a triangular matrix is
the product of its diagonal entries. We get the characteristic polynomial
det(A 'M) = (a),)  A)(a
2
'2  A) ... (an'n  A)
and its roots are exactly al')' a
2
'2' ... , an'n'
DIAGONALIZATION
Suppose an operator (matrix) a has a basis b = vI' v
2
' ... vn of eigenvectors, and let A.),
A.
2
, ... , n be the corresponding eigenvalues. Then the matrix of A in this basis is the diagonal
matrix with 1, 2, ... , n on the diagonal
[A] BB = diag{A), A
2
, ... , An} = [AI A2 ... 0 ].
o An
Therefore, it is easy to find an Nth power of the operator A. Namely, its matrix in the
basis B is
o
N • {" N "N "N _
[A ]BB = dlag = A) ,A2 , ... ,An 
o
Moreover, functions of the operator are also very easy to compute: for example the
t2A2 t3
A
3
operator (matrix) exponent e
t4
is defined as e
t4
= I + t A + ~ + 3!
and its matrix in the basis B is
00 l Ak
=L
kl
,
k=O •
Introduction to Spectral Theory 143
o
o
To find the matrices in the standard basis S, we need to recall that the change of
coordinate matrix [l]SB is the matrix with columns vI' v
2
, ... , v
n
.
Let us call this matrix S, then
A II
[
AI 0 1
A = [A]ss = s 2. . . S = SDS ,
o An
where we use D for the diagonal matrix in the middle.
Similarly
Af
o
AN = SD
N
SI = S A ~
o
and similarly for etA.
Another way of thinking about powers (or other functions) of diagonalizable operators
is to see that if operator A can be represented as A = SDS
I
, then
AN = (SDSI)(SDS
I
) ... = SD
N
SI
, .. '
NTimes
and it is easy to compute the Nth power of a diagonal matrix. The following theorem
is almost trivial.
Theorem. A matrix a admits a representation A = SDSI, where D is a diagonal matrix
if and only if there exists a basis of eigenvectors of A.
Proof We already discussed above that if there is a basis of eigenvectors, then the
matrix admits the representation A = SDSl, where S = [l]SB is the change of coordinate
matrix from coordinates in the basis B to the standard coordinates.
On the other hand if the matrix admits the representation a = SDSI with a diagonal
matrix D, then columns of S are eigenvectors of A (column number k corresponds to the
kth diagonal entry of D). Since S is invertible, its columns form a basis.
Theorem. Let A.
I
, A.z, ... , A.
r
be distinct eigenvalues of A, and let vI' v
2
, ... , vr be the
corresponding eigenvectors. Then vectors vI' v
2
, ... , vr are linearly independent.
Proof We will use induction on r. The case r = 1 is trivial, because by the definition
an eigenvector is nonzero, and a system consisting of one nonzero vector is linearly
independent.
144 Introduction to Spectral Theory
Suppose that the statement of the theorem is true for r  1. Suppose there exists a
nontrivial linear combination
r
cIv
I
+ c2v2 + ... + crv
r
= L::CkVk = 0
k=I
Applying A  Ar I and using the fact that (A  A/)Vr = 0 we get
rI
L::Ck (Ak  Ar )Vk = O.
k=I
By the induction hypothesis vectors vI' v
2
, ... , vr
I
are linearly independent, so
ciAkr) = 0
for k = 1, 2, ... , r  1. Since Ak"* Ar we can conclude that c
k
= 0 for k < r. Then it
follows from that c
r
= 0, i.e., we have the trivial linear combination.
Corollary. If an operator A " V ~ V has exactly n = dim V distinct then it is
diagonalizable.
Proof For each eigenvalue k let vk be a corresponding eigenvector (just pick one
eigenvector for each eigenvalue). By Theorem the system vI' v
2
, ... , vn is linearly
independent, and since it consists of exactly n = dim V vectors it is a basis.
Bases of Subspaces (AKA Direct Sums of Subspaces)
Let VI' V
2
, ... , Vp be subspaces ofa vector space V. We saythatthe system of subs paces
is a basis in V if any vector v E V admits a unique representation as a sum
p
v = v
J
+ v
2
+ ... + vp = L::Vk,Vk E Vk·
k=1
We also say, that a system of subspaces VI' V
2
, ... , Vp is linearly independent if the
equation
VI + v
2
+ ... + vp = 0, vk E V
k
has only trivial solution (vk = 0 Vk = 1,2, ... , p).
Another way to phrase that is to say that a system of subspaces VI' V
2
, ... , Vp is linearly
independent if and only if any system of nonzero vectors vk, where v
k
E V
k
, is linearly
independent. ,
We say that the system of subspaces VI' V
2
, ... , Vp is generating (or complete, or
spanning) if any vector v E V admits representation.
Remark. From the above definition one can immediately see that Theorem states in
fact that the system of eigenspaces E k of an operator A
Ek := Ker(A  AkI), Ak E cr(A), '
is linearly independent.
Remark. It is easy to see that similarly to the bases of vectors, a system of subspaces
VI' V
2
, ... , Vp is a basis if and only if it is generating and linearly independent.
Introduction to Spectral Theory 145
There is a simple example of a basis of subspaces. Let V be a vector space with a basis
vI' v
2
, ... , v
n
' Split the set of indices I, 2, ... , n into p subsets AI' A
2
, ... , A
p
' and define
subspaces V
k
:= span {Vj :} E A
k
}. Clearly the subspaces V
k
form a basis of V.
The following theorem shows that in the finitedimensional case it is essentially the
only possible example of a basis of subspaces.
Theorem. Let VI' V
2
' ... , Vp be a basis of subs paces, and let us have in each subspace
V
k
a basis (of vectors) B;. Then the union [kBk ofthese bases is a basis in V. To prove the
theorem we need the following lemma.
Lemma. Let VI' V
2
' ... , Vp be a linearly independent family of subs paces, and let us
have in each subspace Vk a linearly independent system Bk of vectors 3 Then the union B
" = U k is a linearly independent system.
Proof The proof of the lemma is almost trivial, if one thinks a bit about it. The main
diculty in writing the proof is a choice of a appropriate notation. Instead of using two
indices (one for the number k and the other for the number of a vector in B
k
, let us use
"flat" notation.
Namely, let n be the number of vectors in B := Let us order the set B, for
example as follows: first list all vectors from B
I
, then all vectors in B
2
, etc, listing all
vectors from Bp last.
This way, we index all vectors in B by integers 1,2, ... , n, and the set of indices {I, 2,
... , n} splits into the sets 1,2, ... , P such that the set Bk consists of vectors b
j
:} E A
k
. Suppose
we have a nontrivial linear combination
n
b + b + + b
 '\" c ·b· = 0
c
il
c
22
...
J=I
Denote
Then can be rewritten as
VI + v
2
+ ... + vp = O.
Since v
k
E v
k
and the system of subs paces V
k
is linearly independent, v
k
= 0 Vk. Than
means that for every k
L: Cij =0,
JEAk
and since the system of vectors b
j
:} E A k (i.e., the system B
k
) are linearly independent,
we have c
j
= 0 for all} E A k' Since it is true for all A k' we can conclude that c
j
= 0 for all
}.
Proof To prove the theorem we will use the same notation as in the proof of Lemma,
i.e., the system Bk consists of vectors bi'} E A k'
146 Introduction to Spectral Theory
Lemma asserts that the system of vectors b"j = 12, "" n is linearly independent, so it
only remains to show that the system is
Since the system of subspaces VI' V
2
, "., Vp is a basis, any vector v E V can be
represented as
p
v = vIPI + v
2
P2 + ".+ v = P = I:'>k, v
k
E Vk'
k=1
Since the vectors bpj E A k form a basis in Vk' the vectors v
k
can be represented as
vk 2:: cjb
j
,
jEA
k
and therefore v =
Criterion of Diagonalizability. First of all let us mention a simple necessary condition.
Since for a diagonal matrix D = diag{A
1
, A
2
, "., An}
det(D  'Ai) = (AI  A)(A2  A) ".(A
n
 A),
we see that if an operator A is diagonalizable, its characteristic polynomial splits into
the product of monomials. Note, that any polynomial can be decomposed into the product
of monomials, if we allow complex coefficients (i.e., complex eigenvalues).
In what follows we always assume that the characteristic polynomial splits into the
product of monomials, either by working in a complex vector space, or simply assuming
that a has exactly n = dim Veigenvalues(counting multiplicity).
Theorem. An operator A : V V is diagonalizable ifand only iffor each eigenvalue
Il the dimension of the eigenspace Ker(A  Ai) (i.e., the geometric multiplicity) coincides
with the algebraic multiplicity of Il.
Proof First of all let us note, that for a diagonal matrix, the algebraic and geometric
multiplicities of eigenvalues coincide, and therefore the same holds for the diagonalizable
operators.
Let us now prove the other implication. Let AI' A
2
, "" Ap be eigenvalues of A, and let
Ek := Ker(A  At/) be the corresponding eigenspaces. The subspaces E
k
, k = 1,2, '''' pare
linearly independent.
Let Bk be a basis in E
k
. By Lemma the system B = is a linearly independent
system of vectors.
We know that each Bk consists of dim Ei= mUltiplicity ofA
k
) vectors. So the number
of vectors in b equal to the sum of multiplicities of eigenvectors k, which is exactly n =
dim V. So, we have a linearly independent system of dim V eigenvectors, which means it
is a basis.
Some Examples
Real eigenvalues. Consider the matrix
A = r).
Introduction to Spectral Theory 147
Its characteristic polynomial is equal to
118> 1 >1 = (1 >)2 16
and its roots (eigenvalues) are t.. = 5 and t.. = 3. For the eigenvalue t.. = 5
(
15 2) (4 2)
A51 = 8 1 5 = 8 4
A basis in its nullspace consists of one vector (1, 2)T, so this is the corresponding
eigenvector. Similarly, for t.. = 3
AU=A+31=(:
and the eigenspace Ker(A + 31) is spanned by the vector (1, _2)T . The matrix A can be
diagonalized as
A = i) =
1 )1
2
Complex eigenvalues. Consider the matrix
A=(l2 i)·
Its characteristic polynomial is
1
1
=2>
and the eigenvalues (roots of the characteristic polynomial are t.. = 1 ± 2i. For
t.. = 1 + 2i
A  V = (= _
This matrix has rank 1, so the eigenspace Ker(A  AI) is spanned by one vector, for
example by (1, OT.
Since the matrix A is real, we do not need to compute an eigenvector for t.. = 12i: we
can get it for free by taking the complex conjugate of the above eigenvector. So, for
t..= 12i
a corresponding eigenvector is (1 ,if , and so the matrix A can be diagonalized as
A=(1 1 )(1+2i 0 )(1 1 )1
i i 0 1 2i i i
A nondiagonalizable matrix. Consider the matrix
A=(b O·
Its characteristic polynomial is
110>
148 Introduction to Spectral Theory
so A has an eigenvalue 1 of mUltiplicity 2. But, it is easy to see that
dimKer(A  1) = 1
(1 pivot, so 2  1 = 1 free variable). Therefore, the geometric multiplicity of the
eigenvalue 1 is different from its algebraic multiplicity, so a is not diagonalizable.
There is also an explanation which does not use Theorem. Namely, we got that the
eigenspace Ker(AI1) is one dimensional (spanned by the vector (1, Of). If A were
diagonalizable, it would have a diagonal form (6 in some basis, and so the dimension
of the eigenspace wold be 2. Therefore A cannot be diagonalized.
Example. Consider a function f: JR2 , dened for every (x, y) E JR2 by
j(x, y) = (s, t),
where
(;)=G
Note that
On the other hand, note that
form a basis for JR
l
. It follows that every U E JR2 can be written uniquely in the form
u=c1v
1
+c
2
v
2
'
where c
I
, c
2
E lR, so that
Au =A(clv
l
+ c
2
v
2
) = clAvI + c0v2 = 2c
l
v
I
+ 6c
2
v
2
.
Note that in this case, the function f: JR2 JR2 can be described easily in terms of
the two special vectors vI and v
2
and the two special numbers 2 and 6. Let us now examine
how these special vectors and numbers arise. We hope to find numbers A E JR and non
zero vectors v E JR2 such that
G
Since
we must have
Introduction to Spectral Theory 149
(G o.
In other words, we must have
(
3 A. 3)
1 5A.
v=O.
In order to have nonzero v E]R2 , we must therefore ensure that
(
3 A. 3 J
det 1 5 _ A. = O.
Hence (3  A.)(5  A.)  3 = 0, with roots A.
t
= 2 and A.2 = 6. Substituting A = 2 into (1),
we obtain
G !} = 0, willi root v, =
Substituting A = 6 into (1), we obtain
0, wiiliroot v, = CJ.
Definition. Suppose that
...
A=: :
ani ann
is an n x n matrix with entries in lR. Suppose further that there exist a number E R
and a nonzero vector v E]Rn such that Av = AV. Then we say that A is an eigenvalue of the
matrix A, and that v is an eigenvector corresponding to the eigenvalue A.
Suppose that A is an eigenvalue of the n x n matrix A, and that v is an eigenvector
corresponding to the eigenvalue A. Then Av = AV = 'Alv, where I is the n x n identity
matrix, so that (A  AJ)v = O. Since vERn is nonzero, it follows that we must have
det (A  A1) = O.
In other words, we must have
al1 A a12
det
a21 a22
A
=0.
ani a
n2
ann  A
that is a polynomial equation. Solving this equation gives the eigenvalues of the matrix
A. On the other hand, for any eigenvalue A of the matrix A, the set
{v elR
n
: (AA1)v=O}
ISO Introduction to Spectral Theory
is the nullspace of the matrix A  IJ., a subspace of n •
Definition. The polynomial is called the characteristic polynomial of the matrix A.
For any root A of equation, the space is called the eigenspace corresponding to the eigenvalue
A.
Example. The matrix
G
has characteristic polynomial (3  1..)(5  A)  3 = 0, in other words, 1..
2
 81.. + 12 = O.
Hence the eigenvalues are Al = 2 and = 6, with corresponding eigenvectors
respectively. The eigenspace corresponding to the eigenvalue 2 is
The eigenspace corresponding to the eigenvalue 6 is
Example. Consider the matrix
(
1 6 12J
A= 0 1,3 30 .
o 9 20 •
To find the eigenvalues of A, we need to nd the roots of
(
11.. 6 12 J
det 0 13  A 30 = 0;
o 9 201..
in other words, (A + 1 )(1..  2)(1..  5) = O. The eigenvalues are therefore
Al = 1, 1..2 = 2
and
1..3 = 5.
An eigenvector corresponding to the eigenvalue 1 is a solution of the system
(A + f)v
6
12
9
12J (IJ
v = 0, with root vI = .
Introduction to Spectral Theory 151
An eigenvector corresponding to the eigenvalue 2 is a solution of the system
(
3 6 12) (0)
 2J)v = 0 15 30 v = 0, with root v
2
= 2.
o 9 18 1
An eigenvector corresponding to the eigenvalue 5 is a solution of the system
(
6 6 12) ( 1 )
(A5J)v= 0 18 30 v=O, withrootv
3
= 5 .
o 9 15 3
Note that the three eigenspaces are all lines through the origin. Note also that the
eigenvectors vI' v
2
and v3 are linearly independent, and so form a basis for ]R3.
Example. Consider the matrix
A =
30 20 12
To find the eigenvalues of A, we need to nd the roots of
(
17A to 5)
det 45 28A 15 = 0;
30 20 12A
in other words, (A + 3)(A  2)2 = O. The eigenvalues are therefore Al = 3 and Az = 2.
An eigenvector corresponding to the eigenvalue 3 is a solution of the system
(
20 to 5) ( 1 )
(A+31)v= 45 25 15 V=O, with root vI = 3.
30 20 15 2
An eigenvector corresponding to the eigenvalue 2 is a solution of the system
(
15 to 5) (1) (2)
(A  2I)v = 45 30 15 v = 0, with roots v
2
= 0 and v3 = 3 .
30 20 10 3 0
Note that the eigenspace corresponding to the eigenvalue 3 is a line through the origin,
while the eigenspace corresponding to the eigenvalue 2 is a plane through the origin. Note
also that the eigenvectors VI' v
2
and v3 are linearly independent, and so form a basis for
]R3.
152 Introduction to Spectral Theory
Example. Consider the matrix
(
2 1 OJ
A= 1 ° 0.
003
To find the eigenvalues of A, we need to nd the roots of
(
2A 1 ° J
det 1 ° A ° = 0;
° ° 3A
in other words, (A  3)(A  1)2 = 0. The eigenvalues are therefore Al = 3 and A2 = 1.
An eigenvector corresponding to the eigenvalue 3 is a solution of the system
(
1 1 OJ (OJ
(A  3I)v = ~ ~ ~ v = 0, with root vI = ~ .
An eigenvector corresponding to the eigenvalue 1 is a solution of the system
(A J)v= (i ~ : ~ } = 0, with root v2 = (i}
Note that the eigenspace corresponding to the eigenvalue 3 is a line through the origin.
On the other hand, the matrix
(i ~ : ~ J
has rank 2, and so the eigenspace corresponding to the eigenvalue 1 is of dimension 1
and so is also a line through the origin. We can therefore only nd two linearly independent
eigenvectors, so that ]R3 does not have a basis consisting of linearly independent
eigenvectors of the matrix A.
Example. Consider the matrix
A = ( ~ = ~ ~ J .
1 3 4
To find the eigenvalues of A, we need to find the roots of
Introduction to Spectral Theory 153
(
3A 3 2 J
det 1 1 A 2 = 0;
1 3 4A
in other words, (A  2)3 = O. The eigenvalue is therefore A = 2. An eigenvector
corresponding to the eigenvalue 2 is a solution of the system
= ~ ~ J v = 0, with roots vI = ( ~ J and v
2
= ( ~ J .
3 2 1 0
Note now that the matrix
(i = ~ ~ J
has rank 1, and so the eigenspace corresponding to the eigenvalue 2 is of dimension 2
and so is a plane through the origin. We can therefore only nd two linearly independent
eigenvectors, so that IR
3
does not have a basis consisting of linearly independent
eigenvectors of the matrix A.
Example. Suppose that A is an eigenvalue of a matrix A, with corresponding eigenvector
v. Then
A
2
v = A(Av) = A(AV) = A(Av) = A(AV) = A
2
V.
Hence A2 is an eigenvalue of the matrixA2, with corresponding eigenvector v. In fact,
it can be proved by induction that for every natural number kEN, A k is an eigenvalue of
the matrix Ak, with corresponding eigenvector v.
Example. Consider the matrix
(
~ ~ :J.
003
To find the eigenvalues of A, we need to find the roots of
det(l ~ A 2 ~ A : J = 0;
o 0 3A
in other words, (A  1 )(A  2)(A  3) = O. It follows that the eigenvalues of the matrix
A are given by the entries on the diagonal. In fact, this is true for all triangular matrices.
The Diagonalization Problem
Example. Let us return to Examples are consider again the matrix
154
Introduction to Spectral Theory
!}
We have already shown that the matrix A has eigenvalues Al = 2 and 11.2 = 6, with
corresponding eigenvectors
respectively. Since the eigenvectors form a basis for ne, every u E R2 can be written
uniquely in the form
u = civ
i
+ c
2
v
2
' where cl' c
2
E R,
and
Write
c= (:J,u =(;}AU = (;}
Then both can be rewritten as
(;) :)(;:)
and
respectively. Ifwe write
p :) and D
then matrix become u = Pc and Au = P Dc respectively, so that APc = P Dc. Note that
c E R2 is arbitrary. This implies that (AP  PD)c = 0 for every c E R2. Hence we must
have AP = PD. Since P is invertible, we conclude that
PIAP=D.
Note here that
(
AI
P = (vI v
2
) and D = 0
Note also the crucial point that the eigenvectors of A form a basis for 1R
2
•
We now consider the problem in general.
Proposition. Suppose that A is an nn matrix, with entries in R. Suppose further that
Introduction to Spectral Theory 155
A has eigenvalues AI' ... , An E R, not necessarily distinct, with corresponding eigenvectors
vI' ... , vn ERn, and that vI' ... , vn are linearly independent. Then
pIAP=D,
where
Proof Since VI' ... , Vn are linearly independent, they form a basis for ]Rn, so that
every u E]Rn can be written uniquely in the form
u = civ
i
+ ... + cnv
n
' where c
I
' ... , c
n
E]Rn ,
and
Writing
we see that both equations can be rewritten as
U = Pc and Au = P = : = P Dc
. (AICI J
AnCn
respectively, so that
APc = PDc.
Note that C E]Rn is arbitrary. This implies that (AP  PD)c = 0 for every cERn.
Hence we must have AP = PD. Since the columns of P are linearly independent, it follows
that P is invertible. Hence PIAP = D as required.
Example. Consider the matrix
A =
o 9 20
as in Example. We have PIAP = D, where
and n
156 Introduction to Spectral Theory
Example. Consider the matrix
(
17 10 5)
A= 45 28 15 ,
30 20 12
as in Example. We have PIAP = D, where
(
1 1 2) (3 0
P = 3 0 3 and D = 0 2
2 3 0 0 0
Definition. Suppose that A is an n x n matrix, with entries in lR. We say that A is
diagonalizable ifthere exists an invertible matrix P, with entries in lR, such that PIAP is
a diagonal matrix, with entries in lR. It follows from Proposition that an n x n matrix A
with entries in lR is diagonalizable if its eigenvectors form a basis for lR n • In the opposite
direction, we establish the following result.
Proposition. Suppose that A is an nn matrix, with entries in lR. Suppose further that
A is diagonalizable. Then A has n linearly independent eigenvectors in lRn.
Proof Suppose that A is diagonalizable. Then there exists an invertible matrix P, with
entries in lR, such that D = PIAP is a diagonal matrix, with entries in lR.
Denote by vI' ... , vn the columns of P; in other words, write
P=(vl· .. v
n
)·
Also write
Clearly we have AP = PD. It follows that
(
AI
(Avi ... Av
n
) = A(vi ... v
n
) = ( vI'" v
n
)
= (AIV
I
... Alnv
n
)·
Equating columns, we obtain
AVI = Alvl' ... , AVn = AnVn'
It follows that A has eigenvalues AI' ... , An E lR, with corresponding eigenvectors vI'
... , Vn E lRn. Since P is invertible and vI' ... , vn are the columns of P, it follows that the
eigenvectors vI' ... , vn are l!nearly independent.
IntroductIOn to Spectral Theory 157
In view of Propositions, the question of diagonalizing a matrix A with entries in IR is
reduced to one of linearindependence of its eigenvectors.
Proposition. Suppose that A is an n x n matrix, with entries in IR. Suppose further
that A has distinct eigenvalues AI' ... , An E IR, with corresponding eigenvectors vI' ... , Vn
E IR
n
. Then vI' ... , vn are linearly independent.
Proof Suppose that vI' ... , vn are linearly dependent. Then there exist c
I
' ... , c
n
E IR,
not all zero, such that
clvl+···+cnvn=O.
Then
A(CIV
I
+ ... + cnv
n
) = clAvI + ... + c ~ v n = Atclv
I
+ ... + Ancnvn = O.
Since vI' ... , vn are all eigenvectors and hence nonzero, it follows that at least two
numbers among c
I
' ... , c
n
are nonzero, so that c
I
' ••• , c
n
_
1
are not all zero. Multiplying by
An and subtracting, we obtain
(AI  n)c 1 VI + ... + (AnI  An)C
n

I
V nI = O.
Note that since AI' ... , An are distinct, the numbers Al  An' ... , AnI  An are all non
zero. It follows that VI' ... , V
n

l
are linearly dependent. To summarize, we can eliminate
one eigenvector and the remaining ones are still linearly dependent. Repeating this argument
a finite number of times, we arrive at a linearly dependent set of one eigenvector, clearly
an absurdity.
We now summarize our discussion in this section.
Diagonalization Process. Suppose that A is an n x n matrix with entries in lR.
(l) Determine whether the n roots of the characteristic polynomial det(A  IJ) are
real.
(2) If not, then A is not diagonalizable. If so, then nd the eigenvectors corresponding
to these eigenvalues. Determine whether we can find n linearly independent
eigenvectors.
(3) If not, then A is not diagonalizable. If so, then write
... J,
where AI' ... , An E IR are the eigenvalues of A and where vI' ... , vn E lR
n
are respectively
their corresponding eigenvectors. Then PIAP = D.
Some Remarks
In all the examples we have discussed, we have chosen matrices A such that the
characteristic polynomial det(A IJ) has only real roots. However, there are matrices A
where the characteristic polynomial has nonreal roots. Ifwe permit AI' ... , An to take values
158 Introduction to Spectral Theory
in C and permit "eigenvectors" to have entries in <C , then we may be able to "diagonalize"
the matrix A, using matrices P and D with entries in <c. The details are similar.
Example. Consider the matrix
A = (1 5).
1 1
To find the eigenvalues of A, we need to find the roots of
det= = 0;
(
IA 5 J
1 IA
in other words, A2 + 4 = O. Clearly there are no real roots, so the matrix A has no
eigenvalues in 1R. Try to show, however, that the matrix A can be "diagonalized" to the
matrix
(
2i 0)
D= .
o 2i
We also state without proof the following useful result which will guarantee many
examples where the characteristic polynomial has only real roots.
Proposition. Suppose that A is an n x n matrix, with entries in 1R. Suppose further
thatA is symmetric. Then the characteristic polynomial det(A IJ) has only real roots. We
conclude this section by discussing an application of diagonalization. We illustrate this by
an example.
Example. Consider the matrix
A = ( ~ ~ = ~ ~ ~ 1 5 5 J '
30 20 12
as in Example. Suppose that we wish to calculate A98. Note that PIAP = D, where
(
1 1 2J (3
P = 3 0 3 and D = 0
2 3 0 0
o OJ
2 O.
o 2
It follows that A = PDpi, so that
A
98
= (PDPI) ... (PDP
1
) = PD
98
p
1
= P 0
, v I
98
o o
This is much simpler than calculating A
98
directly.
Introduction to Spectral Theory 159
An Application to Genetics
In this section, we discuss very briey the problem of autosomal inheritance. Here we
consider a set oftwo genes designated by G and g. Each member of the population inherits
one from each parent, resulting in possible genotypes GG, Gg and gg. Furthermore, the
gene G dominates the gene g, so that in the case of human eye colours, for example, people
with genotype GG or Gg have brown eyes while people with genotype gg have blue eyes.
It is also believed that each member of the population has equal probability of inheriting
one or the other gene from each parent. The table below gives these peobabilities in detail.
Here the genotypes of the parents are listed on top, and the genotypes of the ospring are
listed on the left.
GGGG GGGg GGgg GgGg Gggg gggg
GG 1
1
0
1
0 0  
2 4
Gg 0
1
1
1 1
0
  
2 2 2
0 0 0
1 1
1 gg  
4 2
Example. Suppose that a plant breeder has a large population consisting of all three
genotypes. At regular intervals, each plant he owns is fertilized with a plant known to have
genotype GG, and is then disposed of and replaced by one of its osprings. We would like
to study the distribution of the three genotypes after n rounds of fertilization and
replacements, where n is an arbitrary positive integer. Suppose that GG(n), Gg(n) and
gg(n) denote the proportion of each genotype after n rounds offertilization and replacements,
and that GG(O), Gg(O) and gg(O) denote the initial proportions. Then clearly we have
GG(n) + Gg(n) + gg(n) = 1 for every n = 0, 1,2, ...
On the other hand, the left hand half of the table above shows that for every n = 1, 2,
3, ... , we have
and
so that
1
GG(n) = GG(n  1) + "2 Gg(n  1),
1
Gg(n) = "2 Gg(n  1) + gg(n  1),
gg(n) = 0,
(
GG(n)J (1
Gg(n) = 0
gg(n) 0
112
112
o
0J(GG(n 1)J
1 Gg(nl).
o gg(nl)
160 Introduction to Spectral Theory
It follows that
= An for every n = 1, 2, 3, ... ,
gg(n) gg(O)
where the matrix
000
has eigenvalues Al = 1, A2 = 0, A3 = 112, with respective eigenvectors
We therefore write
and
0
OJ
2
0 o ,
0 112
(I
1
:J
I
0
with P =
1 2
Then pI AP = D, so that A = PDpI, and so
An =PDnp
1
0 : J
o 1 0 0 0 1I2
n
0 1 2
11I2n
Il/2n1
=
0
I/2n
l/2n1
0 0 0
It follows that
(GG(n))
II/2
n
I1I2
n

1
(GG(O))
Gg(n) = 0 1I2
n
I/2n1
Gg(O)
gg(n) 0 0 0 gg(O)
Introduction to Spectral Theory
161
n n\
GG(O) + Gg(O) + gg(O)  Gg(O)/2  gg(O)/2
=
n n\
Gg(O)/2 + gg(O)/2
o
n n\
IGg(O)/2 gg(O)/2
n n\
= Gg(O)/2 + gg(O)/2
o
This means that nearly the whole crop will have genotype GG.
Chapter 6
Inner Product Spaces
Inner Product in IR
n
and en
Inner product and norm in IRn. In dimensions 2 and 3, we defined the length of a
vector x (i.e., the distance from its endpoint to the origin) by the Pythagorean rule, for
example in lR
3
the length of the vector is defined as
as
I 2 2 2
II x 11= V Xl + x2 + x3 .
It is natural to generalize this formula for all n, to define the norm ofthe vector x ERn
I 2 2 2
IIxll=vxI +X2 + ... +xn
The word norm is used as a fancy replacement to the word length.
The dot product in IR3 was defined as x . Y = x
I
Y2 + xV'2 + xJY3' where
x = (xl' X
2
' x3)T andy = (YI' Y2' Y3l.
Similarly, in IR
n
one can define the inner product (x, y) oftwo vectors
x = (xl' X
2
, ... , xnl, Y = (Yl' Y2' ... , ynl
by
(X, y):= XIYI + XV'2 + ... + X,ln = yT X,
so II X 11= ~ ( x , x ) .
Note, that yT X = xT y, and we use the notation yT X only to be consistent.
Inner product and norm in en. Let us now define norm and inner product for en.
The complex space en is the most natural space from the point of view of spectral theory:
even if one starts from a matrix with real coefficients (or operator on a real vectors space),
the eigenvalues can be complex, and one needs to work in a complex space.
For a complex number z = X + iy, we have 1 z 12 = x2 + y2 = z Z . If Z E en is given by
Inner Product Spaces 163
z = ~ ~  [ : ~ ! & ~
z n  xn ;iYn '
it is natural to define its norm II z II by
2 n 2 2 n 2
II z II = L (Xk + Yk) = 2:) zk I .
k=l" k=l
Let us try to define an inner product on en such that II z 112 = (z, z). One of the choices
is to define (z, w) by
n
(z,w)=z,WI +z2W2+ ... +znwn= I:>kWk>
k=1
and that will be our definition of the inner product in en .
To simplify the notation, let us introduce a new notion. For a matrix A let us define its
Hermitian adjoint, or simply adjoint A* by A* = A  ~ meaning that we take the transpose
of the matrix, and then take the complex conjugate of each entry. Note, that for a real
matrix A, A* = AT .
Using the notion of A *, one can write the inner product in en as
(z, w) = w*z.
Remark. It is easy to see that one can define a different inner product in en such that
II z Ib = (z, z), namely the inner product given by
(z, w)1 = Z I WI + Z 2 w
2
+ ... + Z n Wn = z*w.
We did not specify what properties we want the inner product to satisfy, but z*w and
w*z are the only reasonable choices giving II z 112 = (z, z).
Note, that the above two choices of the inner product are essentially equivalent: the
only difference between them is notatioool, because (z, w)l = (w, z).
While the second choice of the inner product looks more natural, the first one, (z, w)
= w*z is more widely used, so we will use it as well.
Inner Product Spaces. The inner product we defined for ]Rn and en satisfies the
following properties:
1. (Conjugate) symmetry: (x, y) = (x, y); note, that for a real space, this property
is just symmetry, (x, y) = (y, x);
2. Linearity: (ax + ay, z) = (x, z) + (y, z) for all vector x, y, z and all scalars a, p;
3. Nonnegativity: (x, x) ~ 0 '\Ix;
4. Nondegeneracy: (x, x) = 0 if and only if x = o.
164 Inner Product Spaces
Let V be a (complex or real) vector space. An inner product on V is a function, that
assign to each pair of vectors x, Y a scalar, denoted by (x, y) such that the properties 14
from the previous section are satisfied.
Note that for a real space V we assume that (x, y) is always real, and for a complex
space the inner product (x, y) can be complex.
A space V together with an inner product on it is called an inner product space. Given
an inner product space, one defines the norm on it by
II x II = ~ ( x , x ) .
Example. Let V be ]Rn or en . We already have an inner product
(x, y) = y*x = y * x = 2:;=1 xkYk
defined above.
This inner product is called the standard inner product in ]Rn or en We will use
symbol F to denote both e and lR. When we have somt! statement about the space F
n
, it
means the statement is true for both lR
n
and en.
Example. Let V be the space Pn of polynomials of degree at most n. Define the inner
product by
(f,g) = JI f(t)g(t)dt.
1
It is easy to check, that the above properties 14 are satisfied.
This definition works both for complex and real cases. In the real case we only allow
polynomials with real coefficients, and we do not need the complex conjugate here.
Let us recall, that for a square matrix A, its trace is defined as the sum of the diagonal
entries,
n
trace A = 2:ak,k'
k=l
Example. For the space Mm x n of m x n matrices let us define the socalled Frobenius
inner product by
(A, B) = trace (B*A).
Again, it is easy to check that the properties, i.e., that we indeed defined an inner
product.
Note, that
trace (B* A) = 2: Aj,kBj,k ,
j,k
so this inner product coincides with the standard inner product in e
mn
•
Inner Product Spaces 165
Properties of Inner Product. The statements we get in this section are true for any
inner product space, not only for Fn' To prove them we use only properties 14 of the
inner product.
First of all let us notice, that properties 1 and 2 imply that
2 '. (x, ay + ~ z ) = ~ (x, y) + i3 (x, z).
Indeed,
(x,ay +0z) = (ayt 0z,x) = a(y,x) + 0(z,x)
 
= a(y,x) + 0(z,x) = a(x,y) + 0(x,z)
Note also that property 2 implies that for all vectors x (0, x) = (x, 0) = O.
Lemma. Let x be a vector in an inner product space V. Then x = 0 if and only if
(x, y) = 0 \ly E V.
Proof Since (0, y) = 0 we only need to show that implies x = O. Putting y = x in) we
get(x, x) = 0, so x = O.
Applying the above lemma to the difference x  y we get the following
Corollary. Let x, y be vectors in an inner product space V. The equality x = y holds
if and only if
(x, z) = (y, z) \lz E V .
The following corollary is very simple, but will be used a lot
Corollary. Suppose two operators A, B : x ~ Y satisfo
(Ax, y) = (Bx, y) \Ix E X, \ly E V.
Then A = B
Proof By the previous corollary (fix x and take all possible y's) we get Ax = Bx' Since
this is true for all x E X, the transformations A and B coincide.
The following property relates the norm and the inner product.
Theorem. (CauchySchwarz inequality).
I (x, y) I ~ II x II . II y II·
Proof The proofwe are going to present, is not the shortest one, but it gives a lot for
the understanding.
Let us consider the real case first. If y = 0, the statement is trivial, so we can assume
that y "* O. By the properties of an inner product, for all scalar t
o ::; II x  ty 112 = (x  ty, x  ty) = II x 112  2t(x, y) + t
2
11 Y 112.
In particular, this inequality should hold for t = ( x , y ~ 1, and for this point the inequality
becomes II y II
2
(x,y)
2
lIyll ,
166 Inner Product Spaces
which is exactly the inequality we need to prove.
There are several possible ways to treat the complex case. One is to replace x by ax,
where a is a complex constant, I a I = 1 such that (ax, y) is real, and then repeat the proof
for the real case.
The other possibility is again to consider
o :::; II x  ty 112 = (x  ty, x  ty) = (x, x  ty)  t(y, x  ty)
= II x 112  t(y, x)  t (x, y) + I t 1211 Y 112.
Substituting t = (x,y) + (x,y) into this inequality, we get
II y 112 II Y 112 2
2 I (x,y) I
o :::; II x II  II Y 112
which is the inequality we need.
Note, that the above paragraph is in fact a complete formal proof of the theorem. The
reasoning before that was only to explain why do we need to pick this particular value of
t.
An immediate Corollary of the CauchySchwarz Inequality is the following lemma.
Lemma. (Triangle inequality). For any vectors x, y in an inner product space
II x + y II :::; II x II + II y II·
Proof II x + y 112 = (x + y, x + y) = II x 112 + II y 112 + (x, y) + (y, x)
:::; II x 112 + II y 112 + 211 x II . II y II = (II x II + II y 11)2.
The following polarization identities allow one to reconstruct the inner product from
the norm:
Lemma (Polarization identities). For x, y E V
1 2 2
(x,y)=4"(llx+ yll llxyll )
if V is a real inner product space, and
(x,y)=.!. L::: a l ~ + a Y I l 2
4 a=±l,+i
if Vis a complex space.
The lemma is proved by direct computation.
Another important property of the norm in an inner product space can be also checked
by direct calculation.
Lemma. (Parallelogram Identity). For any vectors u, v
II u + v 112 + II u  v 112 = 2(11 u 112 + II v 11
2
).
In 2dimensional space this lemma relates sides of a parallelogram with its diagonals,
which explains the name. It is a wellknown fact from planar geometry.
Inner Product Spaces 167
NORMS
Normed spaces
We have proved before that the norm II v II satisfies the following properties:
1. Homogeneity: II v II = II . II v II for all vectors v and all scalars.
2. Triangle inequality: II u + v II ~ II u II + II v II·
3. Nonnegativity: II v II ~ 0 for all vectors v.
4. Nondegeneracy: II v II = 0 if and only if v = o.
Suppose in a vector space V we assigned to each vector v a number II v II such that
above properties 14 are satisfied. Then we say that the function v ~ II v II is a norm. A
vector space V equipped with a norm is called a normed space.
Any inner product space is a normed space, because the norm II v II = J(v, v) satisfies
the above properties 14. However, there are many other normed spaces. For example,
given p, 1 < p < 00 one can define the norm II . lip on lR.
n
or en by
II x lip ~ (I XI f" + I x
2
f" + ... + II xn 1V'lllp ~ [ ~ I xk I
P
r p
One can also define the norm 11.11"" (p = 00) by
II x 1100 = max{1 xkl: k= 1,2, ... , n}.
The norm 1I·ll
p
for P = 2 coincides with the regular norm obtained from the inner
product.
To check that 1I·lI
p
is indeed a norm one has to check that it satisfies all the above
properties 14. Properties 1,3 and 4 are very easy to check. The triangle inequality (property
2) is easy to check for P = 1 and p = 1 (and we proved it for p = 2).
For all other p the triangle inequality is true, but the proof is not so simple, and we
will not present it here. The triangle inequality for k . kp even has special name: its called
Minkowski inequality, after the German mathematician H. Minkowski.
Note, that the norm 1I.lI
p
for p * 2 cannot be obtained from an inner product. It is
easy to see that this norm is not obtained from the standard inner product in Rn (en). But
we claim more! We claim that it is impossible to introduce an inner product which gives
rise to the norm 1I.lIp'p ~ 2.
This statement is actually quite easy to prove. It is easy to see that the Parallelogram
Identity fails for the norm 1I.lIp'p ~ 2. and one can easily find a counter example in 1R
2
,
which then gives rise to a counter example in all other spaces.
In fact, the Parallelogram Identity, as the theorem below asserts completely
characterizes norms obtained from an inner product.
168 Inner Product Spaces
Theorem. A norm in a normed space is obtained from some inner product if and only
if it satisfies the Parallelogram Identity
II u + v 112 + II u  v 112 = 2(11 u 112 + II v 112) "i/u, v E V.
The inverse implication is more complicated. If we are given a norm, and this norm
came from an inner product, then we do not have any choice; this inner product must be
given by the polarization identities.
But, we need to show that (x, y) we got from the polarization identities is indeed an
inner product, i.e., that it satisfies alt the properties. It is indeed possible to check if the
norm satisfies the parallelogram identity, but the proof is a bit too involved, so we do not
present it here.
ORTHOGONALITY
Orthogonal and Orthonormal Bases
Definition. Two vectors u and v are called orthogonal (also perpendicular) if (u, v) = O.
We will write u 1. v to say that the vectors are orthogonal. Note, that for orthogonal vectors
u and v we have the following Pythagorean identity:
II u + v 112 = II U 112 + II v 112 if u 1. v.
The proof is straightforward computation,
II u + v 112 = (u + v, u + v) = (u, u) + (v, v) + (u, v) + (v, u) = II U 112 + II v 112
((u, v) = (v, u) = 0 because of orthogonality).
Definition. We say that a vector v is orthogonal to a subspace E if v is orthogonal to
all vectors w in E.
We say that subspaces E and F are orthogonal if all vectors in E are orthogonal to F,
i.e., all vectors in E are orthogonal to all vectors in F
The following lemma shows how to check that a vector is orthogonal to a subspace.
Lemma. Let E be spanned by vectors VI' v
2
, ... , Yr' Then v ? E if and only if
v J.. v
k
, "i/k = 1, 2, ... , r.
Proof By the definition, if v 1. E then v is orthogonal to all vectors in E. In particular,
v J.. v
k
, k = 1,2, ... , r.
On the other hand, let v J.. v
k
' k = 1, 2, ... , r. Since the vectors v
k
span E, any vector
WEE can be represented as a linear combination Z=:=lakVk' Then so v 1. w.
Definition. A system of vectors v I' v
2
, ... , v n is called orthogonal if any two vectors
are orthogonal to each other (i.e., if(v
j
, v
k
) = 0 forj * k).
If, in addition Itvk 11= 1 for all k, we call the system orthonormal.
Lemma. (Generalized Pythagorean identity). Let VI' v
2
, ... , vn be an orthogonal system.
Then
Inner Product Spaces
n 2 n 2 2
I:CXkVk = I:lcxk Illvk II
k=1 k=1
This formula looks particularly simple for orthonormal systems, where II v
k
II = 1.
Proof of the Lemma.
tCXkVk 2 = [tCXkVk,tCXjVj] = ttakaj(Vk>Vj)'
k=1 k=1 j=1 k=1 j=1
169
Because of orthogonality v
k
' v) = 0 if} ::j:. k. Therefore we only need to sum the terms
with} = k, which gives exactly
n 2 n 2 2
I:lcxk I (Vk,Vk) = I:lak IlIvk II .
k=1 k=1
Corollary. Any orthogonal system Vi' v
2
, ... , Vn of nonzero vectors is linearly
independent.
Proof Suppose for some ai' ... , ~ 2 ' ... , an we have I::=I CXkVk = O. Then by the
Generalized Pythagorean identity
2 ~ 2 2
o =11 0 II = L,...I CXk I II Vk II .
k=1
Since II v
k
II ::j:. 0 (vk ::j:. 0) we conclude that
a
k
= 0 Vk,
so only the trivial linear combination gives O.
Remark. In what follows we will usually mean by an orthogonal system an orthogonal
system of nonzero vectors. Since the zero vector 0 is orthogonal to everything, it always
can be added to any orthogonal system, but it is really not interesting to consider orthogonal
systems with zero vectors.
Orthogonal and Orthonormal Bases
Definition. An orthogonal (orthonormal) system VI' v
2
, ... , vn which is also a basis is
called an orthogonal (orthonormal) basis.
It is clear that in dim V = n then any orthogonal system of n nonzero vectors is an
orthogonal basis. As we studied before, to find coordinates of a vector in a basis one needs
to solve a linear system. However, for an orthogonal basis finding coordinates of a vector
is much easier. Namely, suppose VI' v
2
, ... , vn is an orthogonal basis, and let
n
X = aiv
i
+ 2v2 + ... + anv
n
= I:cx jVj'
j=1
Taking inner product of both sides of the equation with VI we get
170
n
(x, VI) = I : > ~ j (V
j
, VI) = al(v
l
, VI) = alii VI 112
j=1
(all inner products (V
j
, VI) = 0 if} :t= 1), so
(x, vI)
(XI =2·
II vIII
Similarly, mUltiplying both sides by v
k
we get
so
Therefore,
n 2
(x, Vk) = L(X j(Vj' Vk) = (Xk(Vk' Vk) = ak II Vk II
j=1
Inner Product Spaces
to find coordinates of a vector in an orthogonal basis one does not
need to solve a linear system, the coordinates are determined by the
formula.
This formula is especially simple for orthonormal bases, when II V
k
II = 1.
Orthogonal Projection and GramSchmidt Orthogonalization
Recalling the definition of orthogonal projection from the classical planar (2
dimensional) geometry, one can introduce the following definition. Let E be a subspace of
an inner product space V.
Definition. For a vector V its orthogonal projection P EV onto the subspace E is a vector
w such that
1. wEE;
2. v W 1. E.
We will use notation w = P EV for the orthogonal projection.
After introducing an object, it is natural to ask:
1. Does the object exist?
2. Is the object unique?
3. How does one find it?
We will show first that the projection is unique. Then we present a method of finding
the projection, proving its existence.
The following theorem shows why the orthogonal projection is important and also
proves that it is unique.
Inner Product Spaces 171
Theorem. The orthogonal projection w = P EV minimizes the distance from v to E, i.e.,
for all x E E
II v  w II II v  x II·
Moreover, if for some x E E
II v  w II = II v  x II,
then x = v.
Proof Let y = w  x. Then
Since v  w ..L E we have y ..L v  wand so by Pythagorean Theorem
II v  x 112 = II v  w 112 + II y 112 II v  w 112.
Note that equality happens only ify = 0 i.e., if x = w.
The following proposition shows how to find an orthogonal projection if we know an
orthogonal basis in E.
Proposition. Let VI' v
2
, ... , vr be an orthogonal basis in E. Then the orthogonal
projection P EV of a vector v is given by the formula
r
PEv= LQ.kVk' where Q.k =
k=I II vk II
In other words
P
(v,vk)
EV= L...J2vk ·
k=III vk II
Note that the formula for k coincides with, i.e., this formula for an orthogonal system
(not a basis) gives us a projection onto its span.
Remark. It is easy to see now from formula that the orthogonal projection P E is a
linear transformation.
One can also see linearity of P E directly, from the definition and uniqueness of the orthogonal
projection. Indeed, it is easy to check that for any x and y the vector
ax +  (aPeX  sY)
is orthogonal to any vector in E, so by the definition
PE(ax + = a.PE! +
Remark. Recalling the definition of inner product in en and lR. n one can get from the
above formula the matrix of the orthogonal projection P E onto E in en (lR. n ) is given by
r 1 *
P
E
= L 2 VkVk
k=ll1 V
k
II
where columns VI' V
2
' ... , vr form an orthogonal basis in E.
Proof of Proposition. Let
172
Inner Product Spaces
r
where Olk=
k=I II vk II
We want to show that v  W ..1 E. By Lemma it is sucient to show that v  W ..L v
k
' k
= 1, 2, ... , n. Computing the inner product we get for k = 1, 2, ... , r
r
(v  w, vk) = (v, v
k
)  (w, v
k
) = (v, v
k
)  L Ol /Vj' vk)
j=I
(V,Vk) 2
=(v,vk)ak(vk,vk)=2
I1vk
II =0.
IIvk II
SO, if we know an orthogonal basis in E we can find the orthogonal projection onto E.
In particular, since any system consisting of one vector is an orthogonal system, we know
how to perform orthogonal projection onto onedimensional spaces.
But how do we find an orthogonal projection if we are only given a basis in E?
Fortunately, there exists a simple algorithm allowing one to get an orthogonal basis from
a basis.
GramSchmidt Orthogonalization Algorithm. Suppose we have a linearly independent
system x I' X
2
, ... , x
n
. The GramSchmidt method constructs from this system an orthogonal
system vI' v
2
' ... , vn such that span{xl' x
2
, .. " xn} = span {vI' V
2
, ... , v
n
}·
Moreover, for all r $ n we get
span {xl' x
2
, ... , x
r
} = span {vI' V
2
, ... , v
r
}
Now let us describe the algorithm.
Step 1. Put VI :=x
I
' Denote by EI := span{x
I
} = span{v
I
}.
Step 2. Define v
2
by
(x2' VI)
V2 =X2 P
E
,X2 =X2  2 VI'
II VI II
Define E2 = span {vI' v
2
}· Note that span {xl'x
2
} = E
2
.
Step 3. Define v3 by
(x3' VI) (X3' V2)
V3 := X3  P
E2
X
3 = X3  2 VI  2 V2
II vIII IIv211
Put E3 := span {VI' V
2
, v
3
}· Note that span {xl' X
2
, X
3
} = E
3
. Note also thatx
3
E2 so
v3 =1= O.
Step r + 1. Suppose that we already made r steps of the process, constructing an
orthogonal system (consisting of nonzero vectors) VI' v
2
, ... , vr such that Er := span{vl'
v
2
' ... , v
r
} = span{x
I
, x
2
, ... , x
r
}. Define
._ _ (Xr+I,Vk)
vr+I . xr+I  PEr Xr+I  Xr+I  6 I 2 Vk
k=I ,I Vk II
Inner Product Spaces
Note, that xr+I e Er so vr+I '* 0.
Continuing this algorithm we get an orthogonal system vI' v
2
, ... , v
n
.
An example. Suppose we are given vectors
xI = (1, I,ll, X
2
= (0, 1, 2l, X3 = (1,0, 2l,
and we want to orthogonalize it by GramSchmidt. On the first step define
vI =x
l
= (1,1, ll·
On the second step we get
(x2' vI)
v2 = x2 PE
1
X
2 = x2  2 VI·
II vI II
Computing
(x2' v,) [[ 3,11 112 = 3,
we get
Finally, define
(x3' vI) (x3' v2)
v3 = x3  P
E2
X
3 = x3  2 vI  2 v2·
II vIII II v211
Computing
(II VI 112 was already computed before) we get
1
vJ 1]= I
2
173
Remark. Since the multiplication by a scalar does not change the orthogonality, one
can multiply vectors v
k
obtained by GramSchmidt by any nonzero numbers.
In particular, in many theoretical constructions one normalizes vectors v
k
by dividing
them by their respective norms II v
k
II. Then the resulting system will be orthonormal, and
the formulas will look simpler.
On the other hand, when performing the computations one may want to avoid fractional
entries by multiplying a vector by the least common denominator of its entries. Thus one
may want to replace the vector v 3 from the above example by (1, 2, ll.
..
174 Inner Product Spaces
Orthogonal Complement. Decomposition V = E $ E..L.
Definition. For a subspace E its orthogonal complement E? is the set of all vectors
orthogonal to E,
E1.. := {x:x ..L E}.
If x, Y ..L E then for any linear combination ax + ~ y ..L E. Therefore E1.. is a subspace.
By the definition of orthogonal projection any vector in an inner product space V admits
a unique representation
v = VI + V
2
' VI E E, V
2
..L E (eqv. V
2
E E1..)
, (where clearly VI = P EV)'
This statement is often symbolically written as V = E $ E1.., which mean exactly
that any vector admits the unique decomposition above.
The following proposition gives an important property ofthe orthogonal complement.
Proposition. For a subspace E
Least Square Solution
The equation
Ax =b
has a solution if and only if b E R an A. But what do we do to solve an equation that
does not have a solution?
This seems to be a silly question, because if there is no solution, then there is no
solution. But, situations when we want to solve an equation that does not have a solution
can appear naturally, for example, if we obtained the equation from an experiment. Ifwe
do not have any errors, the right side b belongs to the column space Ran A, and equation
is consistent. But, in real life it is impossible to avoid errors in measurements, so it is
possible that an equation that in theory should be consistent, does not have a solution. So,
what one can do in this situation?
Least square solution. The simplest idea is to write down the error
IIAxb II
and try to find x minimizing it. Ifwe can find x such that the error is 0, the system is
consistent and we have exact solution. Otherwise, we get the socalled least square solution.
The term least square arises from the fact that minimizing II Ax  b II is equivalent to
minimizing
2
2 m 2 m n
II Axb II = 2) (AX)k b
k
I = L: L:Ak,jx
j
b
k
k=I k=I j=I
i.e., to minimizing the sum of squares of linear functions.
Inner Product Spaces 175
There are several ways to find the least square solution. Ifwe are in ~ n ,and everything
is real, we can forget about absolute values. Then we can just take partial derivatives with
respect to Xj and find the where all of them are 0, which gives us minimum.
Geometric approach. However, there is a simpler way offinding the minimum. Namely,
if we take all possible vectors x, then Ax gives us all possible vectors in Ran A, so minimum
of II Ax  b II is exactly the distance from b to Ran A. Therefore the value of II Ax  b II is
minimal if and only if Ax = P Ran Ab, where P RanA stands for the orthogonal projection onto
the column space Ran A.
So, to find the least square solution we simply need to solve the equation
Ax = PRanAb.
Ifwe know an orthogonal basis vI' v
2
, ..• , vn in Ran A, we can find vector P R a ~ b by
the formula
~ (b,vk)
PRanAb= w
11
11
2vk
•
k=I vk
Ifwe only know a basis in Ran A, we need to use the GramSchmidt orthogonalization
to obtain an orthogonal basis from it.
So, theoretically, the problem is solved, but the solution is not very simple: it involves
GramSchmidt orthogonalization, which can be computationally intensive. Fortunately,
there exists a simpler solution.
Normal equation. Namely, Ax is the orthogonal projection P R a ~ b if and only if b 
Ax 1. Ran A (Ax E Ran A for all x).
If aI' a
2
, ... , an are columns of A, then the condition Ax 1. Ran A can be rewritten as
b Ax 1. ak' '1k = 1,2, ... , n.
That means
0= (b Ax, ak) = a;(b Ax) Vk = 1,2, ... , n.
Joining rows a; together we get that these equations are equivalent to
A*(b Ax) = 0,
which in tum is equivalent to the socalled normal equation
A * Ax =A * b.
A solution of this equation gives us the least square solution of Ax = b.
Note, that the least square solution is unique if and only if A * A is invertible.
Formula for the orthogonal projection. As we already discussed above, if x is a solution
of the normal equation A * Ax = A * b (i.e., a least square solution of Ax = b), then Ax =
PRanAb. So, to find the orthogonal projection of b onto the column space Ran A we need to
solve the normal equation A *Ax = A *b, and then mUltiply the solution by A.
If the operator A * A is invertible, the solution of the normal equation A * Ax = A *b is
given by x = (A*ArIA*b, so the orthogonal projection P ~ b can be computed as
176
Inner Product Spaces
Since this is true for all b,
PRa"A =A(A*ArIA*
is the formula for the matrix of the orthogonal projection onto Ran A.
The following theorem implies that for an m x n matrix A the matrix A *A is invertible
if and only if rank A = n.
Theorem. For an m x n matrix A
KerA = Ker(A*A).
Indeed, according to the rank theorem KerA = {O} if and only rank A is n. Therefore
Ker(A * A) = {O} if and only if rank A = n. Since the matrix A * A is square, it is invertible if
and only if rank A = n.
To prove the equality Ker A = Ker (A *A) one needs to prove two inclusions Ker(A *A)
KerA and KerA Ker(A *A). One of the inclusion is trivial, for the other one use the fact that
1/ Ax 1/
2
= (Ax, Ax) = (A *Ax, x).
Example. line fitting. Let us introduce a few examples where the least square solution
appears naturally. Suppose that we know that two quantities x and yare related by the law
Y = a + bx. The coefficients a and b are unknown, and we would like to find them from
experimental data.
Suppose we run the experiment n times, and we get n pairs (xk' Yk)' k = 1, 2, ... , n.
Ideally, all the points (x
k
' Yk) should be on a straight line, but because of errors in
measurements, it usually does not happen: the point are usually close to some line, but not
exactly on it. That is where the least square solution helps!
Ideally, the coefficients a and b should satisfy the equations
a + bx k = Y k' k = 1, 2, ... , n
(note that here, x
k
andYk are some fixed numbers, and the unknowns are a and b). If
it is possible to find such a and b we are lucky. If not, the standard thing to do, is to
minimize the total quadratic error
n 2
2:1 a+bxk  Yk 1 •
k=l
But, minimizing this error is exactly finding the least square solution of the system
1
'1 :;j [bl ~ J ~ l
1 xn Y
n
(recall, that xkYk are some given numbers, and the unknowns are a and b).
Example. Suppose our data (x
k
' Yk) consist of pairs
(2,4), (1,2), (0, 1), (2, 1), (3, 1).
Then we need to find the least square solution of
Inner Product Spaces 177
4
1
=
2
0 1
2 I
3 1
Then
1 2
A*A=(_i
1 1 1
1) 1
1
1 0 2
3 1
0
1 2
1 3
and
4
A*b=(_i
1 1 1
j)
2
1 0 2
I
1
I
so the normal equation A *Ax = A *b is rewritten as
=
The solution of this equation is
a = 2, b = 112,
so the best fitting straight line is
y = 2  1I2x.
Examples. Curves and Planes. The least square method is not limited to the line
fitting. It can also be applied to more general curves, as well as to surfaces in higher
dimensions.
The only constraint here is that the parameters we want to find be involved linearly.
The general algorithm is as follows:
1. Find the equations that your data should satisfy if there is exact fit;
2. Write these equations as a linear system, where unknowns are the parameters
you want to find. Note, that the system need not to be consistent (and usually is
not);
3. Find the least square solution of the system.
An example: curve fitting. For example, suppose we know that the relation between x
and y is given by the quadratic law y = a + bx + cx
2
' so we want to fit a parabola y = a +
bx + cx
2
to the data. Then our unknowns a, b, c should satisfy the equations
a + bX
k
+ ex; = yk, ... , k = 1,2, ... , n
or, in matrix form
178 Inner Product Spaces
: :;
1 X X Y
n
n n
For example, for the data from the previous example we need to find the least square
solution of
1 2 4 4
1 1
2
1 0 1
1 2 1
1 3 9 1
Then
1 2 4
<A*A=H
1 1 1
r
1 1
=p
2
1 0 2 3 1 o 0 18
1 0 4 9 1 2 4 18 26
1 3 9
and
4
A*b=H
1 1 1
i]=
2
=[3H
1 0 2 1
1 0 4 1
1
Therefore the normal equation A *Ax = A *b is
[
=
18 26 114 C 31
which has the unique solution
a = 86/77, b = 62/77, C = 43/154.
Therefore,
Y = 86/77  62 x/77 + 43 x
2
/154
is the best fitting parabola.
IS]
26
114
Plane fitting. As another example, let us fit a plane z = a + b
x
+ c
y
to the data
(xk'Yk' zk) E E]R3, k = 1,2, ... n.
The equations we should have in the case of exact fit are
a + bX
k
+ cYk = zk' k = 1,2, .'" n,
or, in the matrix form
Inner Product Spaces 179
So, to find the best fitting plane, we need to find the best square solution ofthis system
(the unknowns are a, b, c).
Fundamental Subspaces Revisited
Adjoint matrices and adjoint operators. Let as recall that for an m x n matrix A its
Hermitian adjoint (or simply adjoint) A* is defined by A * : = AT . In other words, the
matrix A * is obtained from the transposed matrix AT by taking complex conjugate of each
entry. The following identity is the main property of adjoint matrix:
I(Ax,y) = (x,A * y)\ix,y E V.I
Before proving this identity, let us introduce some useful formulas. Let us recall that
for transposed matrices we have the identity (AB)T = BT AT. Since for complex numbers z
and W we have zw = ;;, the identity
(AB)* = B*A*
holds for the adjoint.
Also, since (AT l = A and z = ~ , z ,
(A*) =A* = A.
Now, we are ready to prove the main identity:
(Ax, y) = y*Ax = (A*y)*x = (x, A*y);
the first and the last equalities here follow from the definition of inner product in F
n
,
and the middle one follows from the fact that
(Ax) = x(A) = xA.
Uniqueness of the adjoint. The above main identity (Ax, y) = (x, A *y) is often used
as the definition of the adjoint operator. Let us first notice that the adjoint operator is
unique: if a matrix B satisfies
(Ax, y) = (x, By) 'd x, y,
then B = A *. Indeed, by the definition of A*
(x,A *y) = (x, By) 'd x
and therefore by Corollary A*y = By. Since it is true for all y, the linear transformations,
and therefore the matrices A and B coincide.
Adjoint transformation in abstract setting. The above main identity (Ax, y) = (x, A *y)
can be used to define the adjoint operator in abstract setting, where A : V ~ W is an operator
acting from one inner product space to another. Namely, we define A* : W ~ Vto be the
operator satisfying
(Ax, y) = (x, A *y) 'dx E V, 'dy E W.
Why does such an operator exists? We can simply construct it: consider orthonormal
bases A = VI' v
2
, ... , vn in Vand B = w
l
,w
2
' ... ,wm in W. If [AlBA is the matrix of A with
respect to these bases, we define the operator A * by defining its matrix [A *lAB as
180
Inner Product Spaces
[A*]AB = ([A]BA)*'
Useful form ulas. Below we present the properties of the adjoint operators (matrices)
we will use a lot.
l. (A + B) = A* + B*;
2. (aA)* = (iA*;
3. (AB) = B*A *;
4. (A*)* =A;
5. (y, Ax) = (A *y, x).
Relation Between Fundamental Subspaces
Theorem. Let A : V ~ W be an operator acting from one inner product space to
another. Then
l. Ker A = (Ran A)I ;
2. Ker A = (Ran A)L;
3. Ran A = (Ker A)L;
4. Ran A = (Ker A).l.
Proof First of all, let us notice, that since for a subspace E we have (E L ) L = E, the
statements 1 and 3 are equivalent. Similarly, for the same reason, the statements 2 and 4
are equivalent as well. Finally, statement 2 is exactly statement 1 applied to the operator
A * (here we use the fact that
(A*)* =A)
So, to prove the theorem we only need to prove statement 1. We will present 2 proofs
of this statement: a "matrix" proof, and an "invariant", or "coordinatefree" one.
In the "matrix" proof, we assume that A is an m x n matrix, i.e., thatA : F" ~ P. The
general case can be always reduced to this one by picking orthonormal bases in Vand W,
and considering the matrix of A in this bases.
Let aI' a
2
, ... , an be the columns of A. Note, that x E (Ran A)I if and only if x ..L ak
(i.e., (x, a
k
) = 0) \;/k = 1,2, ... , n. By the definition of the inner product in F
n
, that means
o = (x, a
k
) = A; = 1, 2, ... , n.
Since A; is the row number k of A*, the above n equalities are equivalent to the
equation
A*x = o.
So, we proved that x E (Ran A) L if and only if A *x = 0, and that is exactly the statement 1.
Inner Product ::''paces 181
Now, let us present the "coordinatefree" proof. The inclusion x E (Ran A)..L means
that x is orthogonal to all vectors of the form AY' i.e., that
(x, Ay) = 0 V y .
Since (x, Ay) = (A *x, y), the last identity is equivalent to
(A *x, y) = 0 Vy,
and by Lemma this happens if and only if Ax = O. So we proved that x E (Ran A)..L if
and only if A *x = 0, which is exactly the statement 1 of the theorem.
The above theorem makes the structure of the operator A and the geometry of
fundamental subspaces much more transparent. It follows from this theorem that the operator
A can be represented as a composition of orthogonal projection onto Ran A * and an
isomorphism from Ran A * to Ran A.
Isometrices and Unitary Operators. Unitary and Orthogonal Matrices
Main definitions
Definition. An operator U: x 7 Y is called an isometry, if it preserves the norm,
II Ux II = II x II Vx E X, ...
The following theorem shows that an isometry preserves the inner product.
Theorem. An operator U : x 7 Y is an isometry if and only if it preserves the inner
product, i.e., tland only if
(x, y) = (Ux, Uy) Vx,y E X.
Proof The proof uses the polarization identities. For example, if Xis a complex space
1 ~ 2
(Ux,U
y
) =  D ex II Ux+exUy II
4 a=±l,±i
=! L ex IIU(x+exUy) 112
4 a=±I.±i
=! L exllx+
exyI1
2=(x,y).
4 a=±l,±i
Similarly, for a real space x
1 2 2
(V"U
y
) = "4(11 Ux+Uy II II UxUy II )
1 2 2
= "4 (II U (x + y) II II Ux  Uy II )
I 2 2
="4(1l
x
+YII llxyll )=(x,y).
Lemma. An operator U : X 7 Y is an isometry if and only if U*U = I.
182 Inner Product Spaces
Proof By the definitions of the isometry and of the adjoint operator
(x. y) = (Ux, Vy) = (V*Vx, y) V x, Y E X.
Therefore, if V*V::I: I, we have (x, y) = (V
x
' V
y
) and therefore Vis an isometry. On the
other hand, if V is an isometry, then for all x E X
(V*Vx, y) = (x, y) V Y E X,
and therefore by Corollary V*Vx = x. Since it is true for all x E X, we have V*V = I.
The above lemma implies that an isometry is always left invertible (V* being a left
inverse).
Definition. An isometry V: X ~ Y is called a unitary operator if it is invertible.
Proposition. An isometry V: X ~ Y is a unitary operator if and only if dirnX = dim Y.
Proof Since V is an isometry, it is left invertible, and since dim X = dim Y , it is
invertible (a left invertible square matrix is invertible). On the other hand, if V: x ~ Y is
invertible, dim X = dim Y (only quare matrices are invertible, isomorphic spaces have
equal dimensions).
A square matrix V is called unitary if V*V = I, i.e., a unitary matrix is a matrix of a
unitary operator acting in Fn' A unitary matrix with real entries is called an orthogonal
matrix. In other words, an orthogonal matrix is a matrix of a unitary operator acting in the
real space R n •
Few properties of unitary operators:
I. For a unitary transformation V, U
I
= V*;
2. If V is unitary, V* = U
I
is also unitary;
3. If V is a isometry, and vi' v
2
, ... , vn is an orthonormal basis, then Uv
l
,Vv
2
, ... , Vvn
is an orthonormal system. Moreover, if V is unitary, VVi'Vv
2
, ""Vv
n
is an
orthonormal basis.
4. A product of unitary operators is a unitary operator as well.
Examples. First of all, let us notice, that
a matrix V is an isometry if and only if its columns form
an orthonormal system.
This statement can be checked directly by computing the product V*V. It is easy to
check that the columns of the rotation matrix
(
c?sa Sino.)
sma coso.
are orthogonal to each other, and that each column has norm 1. Therefore, the rotation
matrix is an isometry, and since it is square, it is unitary. Since all entries of the rotation
matrix are real, it is an orthogonal matrix.
The next example is more abstract. Let x and Y be inner products paces, dim X = dim
Inner Product Spaces 183
Y= n, and let xl' x
2
, ... , Xn and yp Y2' ... , Y
n
be orthonormal bases inXand Yrespectively.
Define an operator U: X Y by
U
xk
= Yk' k = 1, 2, ... , n.
Since for a vector x = C
l
xl + C(2 + ... + C,ff
II X 112 = I cl12 + I c21 + ... + I C
n
I
and
II Ux 112 = 112= 1
2
,
one can conclude that II Ux II = II X II for all x E X, so U is a unitary operator.
Properties of Unitary Operators
Proposition. Let U be a unitary matrix. Then
I. I det U I = 1. In particular, for an orthogonal matrix det U = ±I;
2. If 'A is an eigenvalue of U, then I 'A I = 1
Remark. Note, that for an orthogonal matrix, an eigenvalue (unlike the determinant)
does not have to be real. Our old friend, the rotation matrix gives an example.
Proof Let det U = z. Since det (U) = det(U) , we have
I z 12 = z z = det (U* U) = det I = 1,
so I det U I = I z I = 1. Statement 1 is proved.
To prove statement 2 let us notice that if Ux = Ax then II Ux II = II Ax II = I 'A I . II x II, so
I A I = 1.
Unitary Equivalent Operators
Definition. Operators (matrices) a and b are called unitarily equivalent if there exists
a unitary operator U such that A = UBU)*. Since for a unitary U we have U
l
= if, any
fWO unitary equivalent matrices are similar as well.
The converse is not true, it is easy to construct a pair of similar matrices, which are
not unitarily equivalent.
The following proposition gives a way to construct a counterexample.
Proposition. A matrix a is unitarily equivalent to a diagonal one if and only if it has
an orthogonal (orthonormal) basis of eigenvectors.
Proof LetA = UBU* and let Ax = Ax. Then BUx = UAU*Ux = UAx = U(Ax) =
i.e., Ux is an eigeri'vector of B.
So, letA be unitarily equivalent to a diagonal matrix D, i.e., letD = UAU*. The vectors
e k of the standard basis are eigenvectors of D, so the vectors Ue k are eigenvectors of A.
Since U is unitary, the system Ue
l
,Ue
2
, ... , Ue
n
is an orthonormal basis.
Let now a has an orthogonal basis up u
2
, ... , un of eigenvectors. Dividing each vector
Uk by its norm if necessary, we can always assume that the system u
1l
u
2
' ... , un is an
184 Inner Product Spaces
orthonormal basis. Let D be the matrix of A in the basis B = u
I
' u
2
, ... , un' Clearly, D is a
diagonal matrix.
Denote by U the matrix with columns u
I
' u2' ... , un' Since the columns form an
orthonormal basis, U is unitary. The standard change of coordinate formula implies
A = [A]ss = [1]SB [A]BB [1]BS = UDU
I
and since U is unitary, A = UDU*.
COMPLEX INNER PRODUCTS
Our task in this section is to dene a suitable complex inner product. We begin by
giving a reminder of the basics of complex vector spaces or vector spaces over e .
Definition. A complex vector space V is a set of objects, known as vectors, together
with vector addition + and mUltiplication of vectors by elements of e , and satisfying the
following properties:
(VAl) For every u, v E V, we have u + v E V.
(VA2) For every u, v, WE V, we have u + (v + w) = (u + v) + w.
(VA3) There exists an element 0 E V such that for every u E V, we have u + 0 = 0 +
u= u.
(VA4) For every u E V, there exists u E V such that u + (u) = O.
(VAS) For every u, V E V, we have u + V= V + u.
(SMl) For every C E e and u E V, we have cu E V.
(SM2) For every C E e and u, V E V, we have c(u + v) = cu + CV.
(SM3) For every a, b E C and u E V, we have (a + b)u = au + bu.
(SM4) For every a, b E C and u E V, we have (ab)u = a(bu).
(SMS) For every u E V, we have lu = u.
Remark. Subspaces of complex vector spaces can be dened in a similar way as for real
vector spaces. An example of a complex vector space is the euclidean space C
n
consisting
of all vectors of the form u = (u
I
' ••• , un)' where u
I
' ... , un E e. We shall first generalize
the concept of dot product, norm and distance, rst developed for ~ n in Chapter 9.
Definition. Suppose that u = (up ... , un) and v = (vI' ... , v
n
) are vectors in e. The
complex euclidean inner product of u and v is dened by
u. v = U
I
VI + ... + Un Vn ;
the complex euclidean norm of u is dened by
"u" = (u u)1I2 = (I u
I
I
2
+ ... + 1 un 12)112;
and the complex euclidean distance between u and v is dened by
d(u, v) = " u  v " = (I u
l
 vI 12 + .,. + 1 un  vn 12)112.
Corresponding to Proposition, we have the following result.
Proposition. Suppose that u, v, WEen and C E C. Then
Inner Product Spaces
(a) u. v= v.u;
(b) u (v + w) = (u v) + (u w);
(c) c(u v) = (c
u
) v, and
(d) u. u ~ 0, and u . u = 0 if and only if u = O.
The following definition is motivated by Proposition.
185
Definition. Suppose that V is a complex vector space. By a complex inner product on
V, we mean a function (, ) : V x V ~ C which satises the following conditions:
(lPl) For every u, V E V, we have hu, vi = hv, ui.
(lP2)Foreveryu, v, WE V,wehave (u,v+w)=(u,v)+(u,w).
(lP3) For every u, v E V and c E C, we have chu, c(u, v) = (cu.v).
(IP4) For every u E V, we have (u,u) ~ 0, and (u,u) = 0 if and only if u = O.
Definition. A complex vector space with an inner product is called a complex inner
product space or a unitary space.
Definition. Suppose that u and v are vectors in a complex inner product space V .
Then the norm of u is defined by
II u 1I=(u,u)1I2 ,
and the distance between u and v is defined by
d(u, v) = IIu  vii.
Using this inner product, we can discuss orthogonality, orthogonal and orthonormal
bases, the Gram Schmidt orthogonalization process, as well as orthogonal projections, in
a similar way as for real inner product spaces. In particular, the results in Sections can be
generalized to the case of complex inner product spaces.
Unitary Matrices
For matrices with real entries, orthogonal matrices and symmetric matrices play an
important role in the orthogonal diagonalization problem. For matrices with complex entries,
the analogous roles are played by unitary matrices and hermitian matrices respectively.
Definition. Suppose that A is a matrix with complex entries. Suppose further that the
matrix A is obtained from the matrix A by replacing each entry of A by its complex conjugate.
Then the matrix
I
A=A
is called the conjugate transpose of the matrix A.
Proposition. Suppose that A and B are matrices with complex entries, and that
c E C. Then
(a) (A *)* = A;
(b) (A + B)* =A* + B*;
186
(c) (cA*) = cA*; and
(d) (AB)* = B*A*.
Inner Product Spaces
Definition. A square matrix A with complex entries and satisfying the condition
AI = A * is said to be a unitary matrix.
Corresponding to Proposition, we have the following result.
Proposition. Suppose that A is an n x n matrix with complex entries. Then
(a) A is unitary if and only if the row vectors of A form an orthonormal basis of en
under the complex euclidean inner product.
(b) A is unitary if and only if the column vectors of A form an orthonormal basis of
en under the complex euclidean inner product.
Unitary Diagonalization
Corresponding to the orthogonal disgonalization problem in Section 10.3, we now
discuss the following unitary diagonalization problem.
Definition. A square matrix A with complex entries is said to be unitarily diagonalizable
if there exists a unitary matrix P with comp lex entries such that pI AP = P * AP is a diagonal
matrix with complex entries.
First of all, we would like to determine which matrices are unitarily diagonalizable.
For those" that are, we then need to discuss how we may nd a unitary matrix P to carry out
the diagonalization. As before, we study the question of eigenvalues and eigenvectors of a
given matrix; these are dened as for the real case without any change. We have indicated
that a square matrix with real entries is orthogonally diagonalizable if and only if it is
symmetric. The most natural extension to the complex case is the following.
Definition. A square matrix A with complex entries is said to be hermitian if A = A.
Unfortunately, it is not true that a square matrix with complex entries is unitarily
diagonalizable if and only if it is hermitian. While it is true that every hermitian matrix is
unitarily diagonalizable, there are unitarily diagonalizable matrices that are not hermitian.
The explanation is provided by the following.
Definition. A square matrix A with complex entries is said to be normal if AA * = A * A.
Remark. Note that every hermitian matrix is normal and every unitary matrix is normal.
Corresponding to Propositions, we have the following results.
Proposition. Suppose that A is an n n matrix with complex entries. Then it is unitarily
diagonalizable if and only if it is normal.
Proposition. Suppose that u
l
and u
2
are eigenvectors of a normal matrix A with complex
entries, corresponding to distinct eigenvalues Al and ~ respectively. Then u
l
u
2
= O. In
other words, eigenvectors of a normal matrix corresponding to distinct eigenvalues are
orthogonal.
We can now follow the procedure below.
Inner Product Spaces 187
Unitary Diagonalization Process. Suppose that A is a normal n n matrix with complex
entries.
(1) Determine the n complex roots 11"'" In of the characteristic polynomial det(A
II), and n linearly independent eigenvectors u
I
' ... , un of A corresponding to these
eigenvalues as in the Diagonalization process.
(2) Apply the GramSchmidt orthogonalization process to the eigenvectors u
l
, ... ,
un to obtain orthogonal eigenvectors vI' ... , Vn of A, noting that eigenvectors
corresponding to distinct eigenvalues are already orthogonal.
(3) Normalize the orthogonal eigenvectors vI' ... , vn to obtain orthonormal
eigenvectors wI' ... , Wn of A. These form an orthonormal basis of en.
Furthermore, write
(
AI J
P = (w 1 ... W n ) and D = ,
An
where AI' ... , An E e are the eigenvalues of A and where wI' ... 'W
n
E en are
respectively their orthogonalized and normalized eigenvectors. Then P*AP = D.
We conclude this chapter by discussing the following important result which implies
Proposition, that all the eigenvalues of a symmetric real matrix are real.
Proposition. Suppose that A is a hermitian matrix. Then all the eigenvalues of A are
real.
Proof Suppose that A is a hermitian matrix. Suppose further that is an eigenvalue of
A, with corresponding eigenvector v. Then
Av = AV.
Multiplying on the left by the conjugate transpose v* of v, we obtain
v*Av = V*AV = AV* v.
To show that A is real, it suces to show that the 1 x 1 matrices v*Av and v*v both have
real entries. Now
(v*Av)* = v*A*(v*)* = v*Av
and
(v*v)* = v*(v*)* = v*v.
It follows that both v*Av and v*v are hermitian. It is easy to prove that hermitian
matrices must have real entries on the main diagonal. Since v*Av and v*v are 1 xl, it
follows that they are real.
APPLICATIONS OF REAL INNER PRODUCT SPACES
Least Squares Approximation
Given a continuous functionf: [a; b] ~ lR, we wish to approximatefby a polynomial
188 Inner Product Spaces
g: [a; b] ~ lR
of degree at most k, such that the error
f
b 2
a I f(x)  g(x) dx
is minimized. The purpose of this section is to study this problem using the theory of
real inner product spaces. Our argument is underpinned by the following simple result in
the theory.
Proposition. Suppose that V is a real inner product space, and that W is a finite
dimensional subspace of V. Given any u E V, the inequality
II u projwU II ~ II u w II
holds for every w E W.
In other words, the distance from u to any W E W is minimized by the choice w =
projwU, the orthogonal projection of u on the subspace W. Alternatively, projwU can be
thought of as the vector in W closest to u.
Proof Note that
u projwU E W.L and proj Wu WE W.
It follows from Pythagoras's theorem that
II u  W 112 = II(u  projwu  W) + (projwu  w)1I
2
= II u  projwU 112 + II projwU  W 112; so that
II u  W 112 II u  proj wU 112 = II proj wU  W 112 ~ 0:
The result follows immediately.
Let V denote the vector space qa, b] of all continuous real valued functions on the
closed interval [a, b], with inner product
(f,g) = f: f(x)g(x)dx.
Then
f
b 2 2
a I f(x)g(x) dx =(f  g,f  g) =11 f  gil·
It follows that the least squares approximation problem is reduced to one of nding a
suitable polynomial g to minimize the norm IIf  g Ii.
Now let W = P k [a, b] be the collection of all polynomials g : [a, b] ~ lR with real
coecients and of degree at most k. Note that W is essentially P k' although the variable is
restricted to the closed interval [a, b]. It is easy to show that W is a subspace of V. In view
of Proposition IIA, we conclude that
g= projwf
gives the best least squares approximation among polynomials in W = P
k
[a, bJ. This
subspace is of dimension k + 1. Suppose that {l'o' vI' ... , v
k
} is an orthogonal basis of W=
P k [a, b]. Then by Proposition, we have
Inner Product Spaces 189
_ (f,v
o
) (f,vI) (f,Vk)
g 2 V
O
+
2
VI + ... + 2 V
k
•
II Vo II II VI II II V
k
II
Example. Consider the functionj{x) = x
2
in the interval [0,2]. Suppose that we wish
to nd a least squares approximation by a polynomial of degree at most 1. In this case, we
can take V = C[O, 2], with inner product
(f,g) = f: f(x)g(x)dx,
and W = PI [0,2], with basisfI' xg. We now apply the GramSchmidt orthogonalization
process to this basis to obtain an orthogonal basis {I, xI} of W, and take
li,I) li,xI)
g= \ 1+ \ (xI).
111112 IIxIII
2
(i,I)= and 111112= f:dx=2,
while
(x
2
,XI) = f: x
2
(xI)dx = and II x_II1
2
= (xI,xI)
1
2 2 2
= (xI) dx=.
o 3
It follows that
4 2
g=+2(xI)=2x.
3 3
Example. Consider the functionf(x) = eX in the interval [0, 1]. Suppose that we wish
to find a least squares approximation by a polynomial of degree at most 1. In this case, we
can take V = qo, 1], with inner product
(f,g) = I x,
and W = PI [0, 1], with basis {I, x}. We now apply the GramSchmidt orthogonalization
process to this basis to obtain an orthogonal basis {I, x  1I2} of W, and take
(ex,I) (e
X
,X1I2)( 1)
g = IIlIT
1
+ II X _11211
2
X "2 .
Now
so that
190
Inner Product Spaces
Also
11111'= (1,IH;dx= 1 and +f,xf)= J;( XHdx=
It follows that
g = (el)+(l86e)(x.!.) = (l86e)x+(4e1O).
2,
Remark. From the proof of Proposition, it is clear that II u  w II is minimized by the
unique choice w = proj wU. It follows that the least squares approximation problem posed
here has a unique solution.
Quadratic Form
A real quadratic form in n variables xl' ... , xn is an expression of the form
n n
L L Cijx;Xj'
;=1 j=1
iSj.
where cij E lR for every i,j = 1; ... ; n satisfying i < j.
Example. The expression +6xlx2 +7x; is a quadratic form in two variables xl
and x
2
. It can be written in the form
+6x1x,
Example. The expression + 5x; + 3x; + 2x1x2 + 4xlx3 + 6x2x3 is a quadratic form
in three variables xl' x
2
and x
3
• It can be written in the form
(
4 1 2J (XIJ
+ 5x; + 3x; + 2xlx2 + 4xlx3 + 6x2x3 = (xlx2
x
3) 1 5 3 x2'
2 3 3 x3
Note that in both examples, the quadratic form can be described in terms of a real
symmetric matrix. In fact, this is always possible. To see this, note that given any quadratic
form (1), we can write, for every i,j = 1, ... , n,
Cij
ifi = j,
1
ifi> j,
aij =
coo
2 l)
1
ifi> j. Coo
2 )1
Inner Product Spaces 191
Then
We are interested in the case when xl"'" xn take real values. In this case, we can
write
It follows that a quadratic form can be written as
x/Ax,
where A is an n x n real symmetric matrix and x takes values in ]Rn.
Many problems in mathematics can be studied using quadratic forms. Here we shall
restrict our attention to two fundamental problems which are in fact related. The rst is the
question of what conditions the matrix A must satisfy in order that the inequality
x/Ax> 0
holds for every nonzero x E ]Rn. The second is the question of whether it is possible
to have a change of variables of the type x = P
Y
' where P is an invertible matrix, such that
the quadratic form x(1x can be represented in the alternative form y Dy, where D is a diagonal
matrix with real entries.
Definition. A quadratic form x/Ax is said to be positive denite if x/Ax> 0 for every
nonzero x E lR n • In this case, we say that the symmetric matrix A is a positive denite
matrix. To answer our rst question, we shall prove the following result.
Proposition. A quadratic form xtAx is positive denite if and only if all the eigenvalues
of the symmetric matrix A are positive.
Our strategy here is to prove Proposition by first studying our second question. Since
the matrix A is real and symmetric, it follows from Proposition lOE that it is orthogonally
diagonalizable. In other words, there exists an orthogonal matrix P and a diagonal matrix
D such that PtAP = D, and so A = P DPt. It follows that
xtAx = xtPDpt
x
,
and so, writing
y= ptx;
192
Inner Product Spaces
we have
x'Ax = YDy.
Also, since P is an orthogonal matrix, we also have x = Py. This answers our second
question. Furthermore, in view of the Orthogonal diagonalization process, the diagonal
entries in the matrix D can be taken to be the eigenvalues of A, so that
D=(AI '. J
where AI' ... , n E lR are the eigenvalues of A. Writing
y=(n
we have
_1.t 2 2
x:Ax = y Dy = AlYI + ... + AnYn.
Note now that x = 0 if and only if y = 0, since P is an invertible matrix.
Example. Consider the quadratic form 2xJ + 5x; + 2x; +4xlx2 + 2Xlx3 + 4x2x3. This
can bewritten in the form xlAx, where
The matrix A has eigenvalues I = 7 and (double root) 2 = 3 = 1, see Example.
Furthermore, we have plAP = D, where
and
1116 1I.J2 1/.J3 0 0 1
Writingy = pIX, the quadratic form becomes 7xJ + y; + y; which is clearly positive
defnite.
Example. Consider the quadratic from 5xJ + + yi 4xlx2 + 4x2x3. This cn be
written in the form Xl Ax, where
Inner Product Spaces 193
The matrix A has eigenvalues Al = A
3
, A2 = 6 and A3 = 9, See Example, furthermore,
we have PIAP = D, where
(
2/3 2/3 113) (3 0 0)
P= 2/3 113 2/3 and D= 0 6 o.
113 2/3 2/3 0 0 9
Writingy= pIX, the quadratic form becomes 3 y ~ + 6y; +9y; which is clearly positive
definite.
Example. Consider the quadratic form x ~ + xi + 2xIX2. Clearly this is equal to
(x 1 + x
2
)2 and is therefore not positive defnite. The quadratic form can be written in the
form xlAx, where
A = G :) and x = [ ::j.
It follows from Proposition that the eigenvalues of A are not all positive. Indeed, the
matrix A has eigenvalues Al = 2 and A2 = 0, with corresponding eigenvectors
(:) and (_:).
Hence we may take
p=(1IJi 1IJi J and D=(2 00).
1IJi 1IJi 0
Writing y = pIX, the quadratic form becomes 2 y ~ which is not positive denite.
Real Fourier Series
Let E denote the collection of all functions /: [n, n] 7 lR which are piecewise
continuous on the interval [n, n]. This means that any / E E has at most a finite number
of po ints of discontinuity, ateach of which / need not be dened but must have one sided
limits which are finite. We further adopt the convention that any two functionsf, gEE are
considered equal, denoted by /= g, if/ex) = g(x) for every x E [n, n] with at most a finite
number of exceptions.
It is easy to check that E forms a real vector space. More precisely, let E E denote the
function A: [n, n] E lR, where (x) = 0 for every x E [n, n]. Then the following conditions
hold:
For every f, gEE, we have f, gEE.
For every f, g, h E E, we have / + (g + h) = if + g) + h.
For every /E E, we have/ + A = A + /= f.
For every / E E, we have/ + (j) = A.
For every f, gEE, we have / + g = g + f.
194
For every e E IR andfE E, we have efE E.
For every e E IR andJ, gEE, we have elf + g) = ef+ eg.
For every a, e E IR andf E E, we have (a + b)f= af + bf.
For every a, e E IR andfE E, we have (a + b)f= a(bf).
For every fEE, we have If= f.
Inner Product Spaces
We now give this vector space E more structure by introducing an inner product. For
every J, gEE,
(f,g) = !f1t f(x)g(x)dx.
1t 1t
The integral exists since the function j{x)g(x) is clearly piecewise continuous on
[1t', 1t]. It is easy to check that the following conditions hold:
For every J, gEE, we have (f,g) = (g,f).
For every f, g, h E E, we have (f,g+ h) = (f,g)+(f,h).
For every J, gEE and e E IR, we have e (f, g) = (cf, g).
ForeveryfE E,wehave ,(f,f)=O if and only iff=A. HenceEis
a real inner product space.
The diculty here is that the inner product space E is not finitedimensional. It is not
straightforward to show that the set
{ ,k sin x,cosx,sin 2x,cos 2x, sin 3x,COS3X, •• }
in E forms an orthonormal "basis" for E. The diculty is to show that the set spans E.
Remark. It is easy to check that the elements in (4) form an orthonormal "system". For
every k, mEN,
we have
as well as
(
. kx' ) 1 f1t . kx' dx 1
SIll ,smmx =  sm smmx =
1t 1t 1t
f
1t 1 {I
(cos(k  m)x  cos (k + m)x)dx =
1t 2 0
ifk=m
if k:f; m
Inner Product Spaces 195
(coskx,cosmx) = lJ1t coskxcosmxdx = l
X 1t X
J1t = {I ? k = m
1t 2 r 0 if k '* m
(sinkx,cosmx) = lJ1t sinkxcosmxdx = lJ1t .!.(sin(km)xsin(k+ m)x)dx
x 1t X 1t 2
Let us assume that we have established that the set (4) forms an orthonormal basis for
E. Then a natural extension of Proposition 9H gives rise to the following: Every function
J E E can be written uniquely in the form
ao + t(a
n
cosnx+b
n
sinnx)
2 n=l
known usually as the (trigonometric) Fourier series of the function f, with Fourier
coecients
Ji (J, :n) = ! i:1t
J
(x)dx,
and, for every n E N,
an = (J,cosnx) = lJ1t J(x)cosnxdx and
x 1t
b
n
= (J,sinnx) = lJ1t J(x)sinnxdx
x 1t
Note that the constant term in the Fourier series (5) is given by
(f,:n)=
Example. Consider the functionJ: [x, x] 1R, given by j{x) = x for every x E [x, x].
For every n E N u {O}, we have
a = lJ1t xcosnxdx = 0
n x1t
since the integrand is an odd function. On the other hand, for every n E N, we have
b
n
= lJ1t xsin nxdx = l r
1t
xsin nxdx,
x 1t x
Jo
since the integrand is an even function. On integrating by parts, we have
b
n
=3..(_[xcosnx]1t + i1tcosnx dxJ=3..(_[xcosnx]1t + [sin;x]1tJ= 2(1)n+1
x noon X non 0 n
We therefore have the (trigonometric) Fourier series
00 2( _l)n+1 .
L smnx.
n=1 n
196 Inner Product Spaces
Note that the functionfis odd, and this plays a crucial role in eschewing the Fourier
coefficients an corresponding to the even part of the Fourier series.
Example. Consider the functionf: [n, n] 1R, given by f(x) = I x I for every x E [
n, n]. For every n E N u {O}, we have
an = I x I cosnxdx = r
1t
xcoxnxdx,
n 1t n Jo
since the integrand is an even function. Clearly
ao = xdx = n.
7t 0
Furthermore, for every n EN, on integrating by parts, we have
an lJ1t _ J1tSinnxdxJ
1t n 0 0 n
(
J
{
o if niseven,
[XSi:nx I +[ cO:,nx I =  ifnisodd.
On the other hand, for every n EN, we have
b
n
= I x I sinnxdx = 0
n 1t
since the integrand is an odd function. We therefore have the (trigonometric) Fourier
series
nf4 4
 cosnx= 2 cos(2kl)x.
2 nl ttn 2 k=l 7t(2k 1)
II odd
Note that the functionfis even, and this plays a crucial role in eschewing the Fourier
coefficients b
n
corresponding to the odd part of the Fourier series.
Example. Consider the function f: [n, n] 1R, given for every x E [n, n] by
{
+ 1 if 0 < x n,
f(x) = sgn(x) = 0 if x = 0
1 if n x < 0,
For every n E N u {O}we have
an = sgn(x) cos nxdx = 0,
n 1t
since the integrand is an odd function. On the other hand, for every n EN, we have
1 f1t ')' dx 2 i1t . dx
an =  smnx  smn,x ,
n 1t 1t 0
since the integrand is an even function. It is easy to see that
Inner Product Spaces 197
b
n
= = {
11: m 0 
nn
ifniseven
ifn is odd.
We therefore have the (trigonometric) Fourier series
00 4 00 4
Lsinnx = L sin(2kl)x.
n=1 nn k=1 n(2k 1)
II odd
Example. Consider the function/: [n, n]t lR, given by j(x) = x
2
for every x E [n, n]
For every n E N u {O} • we have
1 flt 2 2i
lt
2
an =  x cosnxdx =  x cosnxdx,
n It n 0
since the integrand is an even function. Clearly
2
ao = r
lt
idx = 3n .
n J
o
3
Furthermore, for every n EN, on integrating by parts, we have
an = !([ x' I  dx J
= !([ x' I +[ I  dx J
= I I 4(:;)'
On the other hand, for every n EN, we have
1 flt 2
b
n
=  x sin nxdx = 0
n It
since the integrand is an odd function. We therefore have the (trigonometric) Fourier
series
n
2
00 4( _1)n
+ L, 2 cosnx.
3 11=1 n
Chapter 7
Structure of Operators in Inner Product Spaces
In this chapter we again assuming that all spaces are finitedimensional.
1. Upper triangular (Schur) representation of an operator.
Theorem. Let A : X ~ X be an operator acting in a complex inner product space.
There exists an orthonormal basis u
l
, u
2
, ... , un in X such that the matrix of A in this basi$
is upper triangular.
In other words, any n x n matrix A can be represented as T = UTU*, where U is a
unitary, and T is an upper triangular matrix.
Proof We prove the theorem using the induction in dirnX. If dirnX = 1 the theorem is
trivial, since any 1 x 1 matrix is upper triangular.
Suppose we proved that the theorem is true if dirnX = n  1, and we want to prove it
for dim X = n. Let A.) be an eigenvalue of A, and let up II u
l
II = 1 be a corresponding
eigenvector, AU
I
= 1..1 u
I
· Denote E = ut ,and let v
2
' ... , v n be some orthonormal basis in
E (clearly, dimE = dim XI = n
I
), so up v
2
, ... , vn is an orthonormal basis in X. In this
basis the matrix of A has the form
•
o
o
here all entries below 1..1 are zeroes, and * means that we do not care what entries are
in the first row right of 1..
1
'
We do care enough about the lower right (n  1) x (n  1) block, to give it name: we
denote it as A I'
. Note, that A I defines a linear transformation in E, and since dimE = n  1, the induction
hypothesis implies that there exists an orthonormal basis (let us denote is as u
2
, ... , un) in
which the matrix of A I is upper triangular.
,
Structure of Operators in Inner Product Spaces 199
So, matrix of A in the orthonormal basis u I' u
2
, ... , un has the form), where matrix A I
is upper triangular. Therefore, the matrix of a in this basis is upper triangular as well.
Remark. Note, that the subspace E = ut introduced in the proof is not invariant under
A, i.e. the inclusion AE c E does not necessarily holds. That means that A I is not a part
of A, it is some operator constructed from A.
Note also, that AE c E if and only if all entries denoted by * (i.e. all entries in the
first row, except AI) are zero.
Remark. Note, that even if we start from a real matrix A, the matrices U and T can
have complex entries. The rotation matrix
(
c?so. Sino.) a:;1!: k7\ k E Z
sma coso.' ,
is not unitarily equivalent (not even similar) to a real upper triangular matrix. Indeed,
eigenvalues of this matrix are complex, and the eigenvalues of an upper triangular matrix
are its diagonal entries.
Remark. An analogue of Theorem can be stated and proved for an arbitrary vector
space, without requiring it to have an inner product. In this case the theorem claims that
any operator have an upper triangUlar form in some basis. a proof can be modeled after the
proof of Theorem. An alternative way is to equip V with an inner product by fixing a basis
in V and declaring it to be an orthonormal one.
Note, that the version for inner product spaces is stronger than the one for the vector
spaces, because it says that we always can find an orthonormal basis, not just a basis.
The following theorem is a realvalued version of Theorem
Theorem. Let A : X ~ X be an operator acting in a real inner product space. Suppose
that all eigenvalues of A are real. Then there exists an brthonormal basis u
l
, u
2
, ... , un in
X such that the matrix of A in this basis is upper triangular.
In other words, any real n x n matrix A can be represented as T = UTU* = UTUT,
where U is an orthogonal, and T is a real upper triangular matrices.
Proof To prove the theorem we just need to analyse the proof of Theorem. Let us
assume (we can always do that without loss of generality, that the operator (matrix) A acts
in lRn.
Suppose, the theorem is true for (n  1) x (n  1) matrices. As in the proof of Theorem
let 1 be a real eigenvalue of A, u
l
E lR
n
, II ulll = 1 be a corresponding eigenvector, and let
v
2
, ... , v n be on orthonormal system (in ~ n) such that up v
2
, ... , v n is an orthonormal basis
in lR n • The matrix of a in this basis has form equation, where A I is some real matrix.
If we can prove that matrix Al has only real eigenvalues, then we are done. Indeed,
then by the induction hypothesis there exists an orthonormal basis u
2
, ... , un in E = ut
200 Structure of Operators in Inner Product Spaces
such that the matrix of A I in this basis is upper triangular, so the matrix of a in the basis U
I
'
U
2
, ... , un is also upper triangular.
To show that A I has only real eigenvalues, let us notice that
det(A 'JJ) = (AI  A) det(A I  A)
(take the cofactor expansion in the first, row, for example), and so any eigenvalue of
A I is also an eigenvalue of A. But a has only real eigenvalues!
Spectral Theorem for selfadjoint and normal operators.
In this section we deal with matrices (operators) which are unitarily equivalent to
diagonal matrices.
Let us recall that an operator is called selfadjoint if A = A *. A matrix about of a self
adjoint operator (in some orthonormal basis), i.e. a matrix satisfying A = a is called a
Hermitian matrix. Since we usually do not distinguish between operators and their matrices,
we will use both terms.
Theorem. Let A = A * be a selfadjoint operator in an inner product space X (the
space can be complex or real). Then all eigenvalues of A are real, and there exists and
orthonormal basis of eigenvectors of A in X
This theorem can be restated in matrix form as follows
Theorem. Let A = A be a selfadjoint (and therefore square) matrix. Then A can be
represented as
A= UDU,
where U is a unitary matrix and D is a diagonal matrix with real entries.
Moreover, if the matrix A is real, matrix U can be chosen to be real (i.e. orthogonal).
Proof To prove Theorems let us first apply Theorem if X is a real space) to find an
orthonormal basis in X such that the matrix of a in this basis is upper triangular. Now let us
ask ourself a question: What upper triangular matrices are selfadjoint?
The answer is immediate: an upper triangular matrix is selfadjoint if and only if it is
a diagonal matrix with real entries. Theorem is proved.
Lei us give an independent proof to the fact that eigenvalues of a selfadjoint operators
are real. Let A = A* and Ax = Ax, x '* O. Then
(Ax, x) = (x, x) = (x, x) = II x 112.
On the other hand,
(Ax, x) = (x, A *x) = (x, Ax) = (x, Ax) = ~ (x, x) = ~ II x 11
2
,
so All x 112 = ~ II x 112. Since Ilx116= 0 (x '* 0),
we can conclude A = I, so is real.
It also follows from Theorem that eigenspaces of a selfadjoint operator are orthogonal.
Let us give an alternative proof of this result.
Proposition. Let A = A * be a selfadjoint operator, and let u, v be its eigenvectors,
Au = AU, Av = Av. Then, if A '* j.l, the eigenvectors u and v are orthogonal.
Structure of Operators in Inner Product Spaces 201
Proof This proposition follows from the spectral theorem, but here we are giving a
direct proof. Namely,
(Au, v) = (lu, v) = leu, v).
On the other hand
(Au, v) = (u, A *v) = (u, Av) = (u, /lv) = iI (u, v) = /leU, v)
(the last equality holds because eigenvalues of a selfadjoint operator are real), so
'A,(u, v) = /leU, v). If 'A, *" /l it is possible only if (u, v) = o.
Now let us try to find what matrices are unitarily equivalent to a diagonal one. It is
easy to check that for a diagonal matrix D
D*D=DD*.
Therefore AA = AA if the matrix of a in some orthonormal basis is diagonal.
Definition. An operator (matrix) N is called normal if N* N = NN.
Clearly, any selfadjoint operator (AA * = AA *) is normal. Also, any unitary operator
U: X ~ X is normal since U*U = UU* = 1.
Note, that a normal operator is an operator acting in one space, not from one space to
another. So, if U is a unitary operator acting from one space to another, we cannot say that
U is normal.
Theorem. Any normal operator N in a complex vector space has an orthonormal basis
of eigenvectors.
In other words, any matrix N satisfying N*N = NN* can be represented as
N= UDU*,
where U is a unitary matrix, and D is a diagonal one.
Remark. Note, that in the above theorem even if N is a real matrix, we did not claim
that matrices U and D are real. Moreover, it can be easily shown, that if D is real, N must
be selfadjoint.
Proof To prove Theorem we apply Theorem to get an orthonormal basis, such that
the matrix of N in this basis is upper triangular. To complete the proof of the theorem we
only need to show that an upper triangular normal matrix must be diagonal.
We will prove this using induction in the dimension of matrix. The case of 1 x 1
matrix is trivial, since any 1 x 1 matrix is diagonal.
Suppose we have proved that any (n  1) x (n  1) upper triangular normal matrix is
diagonal, and we want to prove it for n x n matrices. Let N be n x n upper triangular
normal matrix. We can write it as
aJ'J aJ'2
...
aJ'n
0
N=
N
J
0
where Nt is an upper triangular (n  1) x (n  1) matrix.
202 Structure of Operators in Inner Product Spaces
Let us compare upper left entries (first row first column) of N*N and NN*. Direct
computation shows that that
and
(NN»),) = I a),) 12 + I a)'21
2
+ ... + I al'n 12.
So, (N*N»),) = (NN»),) if and only if a),2 = ... = a),n = O. Therefore, the matrix N has
the form
a),) 0 o
0
N=
N)
0
It follows from the above representation that
2
I a),Ii 0
o o
N*N=
o
, NN*=
o
o o
so N; Nl = Nl N; . That means the matrix N) is also normal, and by the induction
hypothesis it is diagonal. So the matrix N is also diagonal.
The following proposition gives a very useful characterization of normal operators.
Proposition. An operator N : X ~ X is normal if and only if
II Nx II = II N*x II \;fx E X.
Proof Let N be normal, N*N = NN*. Then
II Nx 112 = (Nx,Nx) = (N*Nx, x) = (NN*x, x) = (N*x,N*x) = II Nx 112
so II Nx II = II N*x II·
Now let
II Nx II = II N*x II \;fx E X ..
The Polarization Identities imply that for all x, y E x
(N*N x, y) = (Nx, Ny) = L a II Nx+aNy 112
a=±),±i
a=±l,±i a=±l,±i
= L a II N*(x+aN* y) 112 = (N*x, N*y) = (NN* x, y)
a=±l,±i
and therefore N*N = NN*.
Structure of Operators in Inner Product Spaces
Polar and Singular Value Decompositions
Definition. a self adjoint operator A : X X is called positive definite if
(Ax, x) > 0 Vx:;t: 0"
and it is called positive semidefinite if
(Ax, x) 0 "Ix EX.
203
We will use the notation A> 0 for positive definite operators, and A 0 for positive
semidefinite.
The following theorem describes positive definite and semidefinite operators.
Theorem. Let a = A *. Then
1. A > 0 if and only if all eigenvalues of A are positive.
2. A A 0 if and only if all eigenvalues of A are nonnegative.
Proof Pick an orthonormal basis such that matrix of a in this basis is diagonal. To
finish the proof it remains to notice that a diagonal matrix is positive definite (positive
semidefinite) if and only if all its diagonal entries are positive (nonnegative).
Corollary. Let A = A * 0 be a positive semidefinite operator. There exists a unique
positive semidefinite operator B such that B2 = A
Such B is called (positive) square root of A and is denoted asJA or il2.
Proof Let us prove that JA exists. Let VI' v
2
, ... , vn be an orthonormal basis of
eigenvectors of A, and let AI' ... , An be the corresponding eigenvalues. Note, that since
A 0, all Ak o.
In the basis VI' v
2
, ... , vn the matrix of a is a diagonal matrix diag{l, 2, ... , n} with
entries A], ... , An on the diagonal. Detine the matrix of B in the same basis as diag
{,J>:;,Fz ... Fn}.Clearly,B=B andB2=A.
To prove that such B is unique, let us suppose that there exists an operator C = C* 0
such that e
2
= A. Let u
l
' u
2
, ... , un be an orthonormal basis of eigenvalues of e, and let
J.ll' J.l2' ... , J.l
n
be the corresponding eigenvalues (note that Uk OV k). The matrix of C in
the basis ul' u
2
, ... , un is a diagonal one diag ... , and therefore the matrix of
A ::: C2 in the same basis is diag ... This implies that any eigenvalue A of A is
of form and, moreover, if Ax = Ax, then ex = ,J';.x.
Therefore in the basis vI' v
2
' ... , vn above, the matrix of C has the form
diag{A,A, ... ,Fn}, i.e. B = C.
Modulus of an operator. Singular values. Consider an operator A : X Y. Its Hermitian
square A * A is a positive semidefinite operator acting in X. Indeed,
(A*A*) =A*A** =A*A
and
204 Structure of Operators in Inner Product Spaces
(A *Ax, x) = (Ax, Ax) = II Ax 112 ~ 0 Vx EX.
Therefore, there exists a (unique) positivesemidefinite square root R = .J A * A . This
operator R is called the modulus of the operator A, and is often denoted as I A I.
The modulus of A shows how "big" the operator A is: Proposition. For a linear operator
A:X7Y
III A I x II = II Ax II Vx EX·
Proof For any x E X
III A I x 112 = CI A I x, I A I x) = CI A I * I A lx, x) = (I A 1
2
x, x)
= (A*Ax, x) = (Ax, Ax) = II Ax 112
Corollary.
Ker A = Ker I A I = (Ran I A I) 1 •
Proof The first equality follows immediately from Proposition, the second one follows
from the identity Ker T = (Ran T*).l (j A I is selfadjoint).
Theorem. (Polar decomposition of an operator). Let A : X ~ X be an operator (square
matrix). Then A can be represented as
A=VIAI,
where V is a unitary operator.
Remark. The unitary operator V is generally not unique. As one will see from the
proof of the theorem, V is unique if and only if a is invertible.
Remark. The polar decomposition A = VI A I also holds for operators A : X 7 Yacting
from one space to another. But in this case we can only guarantee that V is an isometry
from Ran i A I = {KerA) 1 to Y.
If dim X :S; dim Y this isometry can be extended to the isometry from the whole X to
Y (if dim X = dim Y this will be a unitary operator).
Proof Consider a vector x E Ran I A I. Then vector x can be represented as x = IAlv for
some vector v E X.
Define Vo x := Av. By Proposition
II Vox II = II A v II = III A I v II = II x II
so it looks like V is an isometry from Ran I A I to X.
But first we need to prove that Vo is well defined. Let VI be another vector such that
x = I A Iv
I
·
But x = I A I v = I A I VI means that
v  v I E Ker I A I = Ker A
so
meaning that Vo x is well defined.
By the construction
A = VolA I.
Structure of Operators in Inner Product Spaces
To extend Vo to a unitary operator V, let us find some unitary transformation
VI: Ker A (RanA)l. = Ker A*.
205
It is always possible to do this, since for square matrices dim KerA = dim Ker A * (the
Rank Theorem). It is easy to check that V = Vo + VI is a unitary operator, and that
Singular Values
Eigenvalues ofl A 1 are called the singular values of A. In other words, if AI' A
2
, ... , An
are eigenvalues of A*A, then
are singular values of A.
Consider an operator A : X Y, and let aI' a
2
, ... , an be the singular values of a
counting mUltiplicities. Assume also that a l' a
2
, ... , a
r
are the nonzero singular values of
A, counting multiplicities. That means a
k
= 0 for k> r.
By the definition of singular values ... are eigenvalues of A*A, and let
vI' V
2
, ... , V n be an orthonormal basisof eigenvectors of
A *A, A *Av
k
= ai v
k
Proposition. The system
1 .
wk := ak AVk' k = 1,2 ... , r
is an orthonormal system.
{
0 i*k
Proof (Av
j
, Av
k
) = (A *Av
j
, v
k
) = vk) = ; J. = k
a j'
since vI' v
2
, ... , vr is an orthonormal system.
In the notation of the above proposition, the operator a can be represented as
or, equivalently
r
Ax = L ak(x, vk)wk'
k=I
Indeed, we know that vI' V
2
, ... , vn is an orthonormal basis in X. Then
r * *
"akwkvk
v
. = a ·w·vJv· = a ·w· = Av· ·f· 1 2
L..J J J J J J J .J I J = , , ... , r,
k=I
206 Structure of Operators in Inner Product Spaces
and
r *
LOk WkVkVj = 0= AVj forj> r.
k=1
So the operators in the left and right sides of equation coincide on the basis v I' V
2
, ... , v n'
so they are equal.
Definition
Remark. Singular value decomposition is not unique. Why?
Lemma. Let a can be represented as
r *
A = LOkWkVk
k=1
where ok> ° and v\, V
2
, .•. , V
r
' W
I
'W
2
' ... , wr are some orthonormal systems. Then this
representation gives a singular value decomposition of A.
Proof We only need to show that vk are eigenvalues of
A* A, A*A
vk
= O ~ V k .
Since v\, v
2
, ... , vr is an orthonormal system,
* {a, j ~ k
WkWj=(Wj,wk)=Bkj := ·k
1, ]  ,
and therefore
r 2 *
A * A = LOkVkVk.
k=1
Since vI' v
2
, ... , vr is an orthonormal system
r 2 * 2
A* AVj = LOkVkVkVj =OjVj
k=1
thus v
k
are eigenvectors of A *A.
Corollary. Let
r *
A= LOkWkVk
k=1
be a singular value decomposition of A. Then
r *
A= LOkVkWk
k=1
is a singular value decomposition of A *
Matrix representation of the singular value decomposition. The singular value
decomposition can be written in a nice matrix form. It is especially easy to do if the operator
Structure of Operators in InnerrrOduct Spaces 207
A is invertible. In this case di X = dim Y = n, and the operator A has n nonzero singular
values (counting multiplicitie ), so the singular value decomposition has the form
n *
A= Lakwkvk
k=1
where vI' v
2
, ... , vn and wl'w
2
, ... , wn are orthonormal bases in x and Y respectively. It
can be rewritten as
A = W L V*,
where L = diag{a
l
, a
2
, ... , an} and V and Ware unitary matrices with columns
v\' v
2
' ... , vn and w
p
w
2
, ... ,wn respectively.
Such representation can be written even if A is not invertible. Let us first consider the
case dim X = dim Y = n, and let a I' a
2
, ... , a,., r < n be nonzero singular values of A. Let
n *
A = Lakwkvk
k=1
be a singular value decomposition of A. To represent A as WL V let us complete the
systems { V d ~ = I ' {wd:=1 to orthonormal bases. Namely, let v
r
+ 1, ... , vn and wr+I' ... , wn
be an orthonormal bases in Ker A = Ker I A I and (Ran A).L respectively. Then v\, v
2
' ... , v n
and w
l
,w
2
' ... ,wn are orthonormal bases inXand Yrespectively and A can be represented
as
A= WLV*,
where L is n x n diagonal matrix diag {aI' ... , ar' 0, ... , O}, and V, Ware n x n
unitary matrices with columns vI' v
2
, ..• , vn and w
I
,w
2
' ... ,wn respectively.
Remark. Another way to interpret the singular value decomposition A = W LV * is
to say that L is the matrix of A in the (orthonormal) bases A = vI' v
2
, ... , vn and B:= w
l
,w
2
'
... , w
n
' i.e, that = [AlB A. We will use this interpretation later.
From singular value decomposition to the polar decomposition. Note, that if we know
the singular value decomposition A = W LV * of a square matrix A, we can write a polar
decomposition of A:
A = WLV* = (WV)(VLV *) = VIAl
so I A I = V LV * and U = WV.
General matrix form of the singular value decomposition. In the general case when
dim X = n, dim Y = m (i.e. A is an m x n matrix), the above representation A = V L V * is
also possible. Namely, if
208
r *
A = LO"kWkVk
k=1
Structure of Operators in Inner Product Spaces
is a singular value decomposition of A, we need to complete the systems vI' v
2
, ... , vr
and w
I
,w
2
' ... , wr to orthonormal bases in X and Y respectively. Then a can be represented
as
A = W LV*,
where V E Mn x nand WE Mmxm are unitary matrices with columns VI' V
2
, ... , vn and
w
I
'w
2
, ""w
m
respectively, and L is a "diagonal" m x n matrix
{
O"k j = k ~ r:
L j,k = 0 otherwise.
In other words, to get the matrix one has to take the diagonal matrix diag {O" I' 0"2' ... , r}
and make it to an m x n matrix by adding extra zeroes "south and east".
SINGULAR VALUES
As we discussed above, the singular value decomposition is simply diagonalization
with respect to two dierent orthonormal bases. Since we have two dierent bases here, we
cannot say much about spectral properties of an operator from its singular value
decomposition. For example, the diagonal entries of L in the singular value decomposition
are not the eigenvalues of A. Note, that for a = WL V* as in we generally have
An :f:. W L n V * , so this diagonalization does not help us in computing functions 'of a matrix.
However, as the examples below show, singular values tell us a lot about socalled
metric properties of a linear transformation.
Final Remark: performing singular value decomposition requires finding eigenvalues
and eigenvectors ofthe Hermitian (selfadjoint) matrix A * A. To find eigenvalues we usually
computed characteristic polynomial, found its roots, and so on ... This looks like quite a
complicated process, especially if one takes into account that there is no formula for finding
roots of polynomials of degree 5 and higher.
However, there are very eective numerical methods of find eigenvalues and
eigenvectors of a hermitian matrix up to any given precision. These methods do not involve
computing the characteristic polynomial and finding its roots. They compute approximate
eigenvalues and eigenvectors directly by an iterative procedure. Because a Hermitian matrix
has an orthogonal basis of eigenvectors, these methods work extremely well.
We will not discuss these methods here, it goes beyond the scope of this book. However,
you should believe me that there are very eective numerical methods for computing
eigenvalues and eigenvectors of a Hermitian matrix and for finding the singular value
decomposition. These methods are extremely eective, and just a little more computationally
intensive than solving a linear system.
Structure of Operators in Inner Product Spaces 209
Image of the unit ball. Consider for example the following problem: let A : IR
n
7
lR
m
be a linear transformation, and let B = {x E lR
n
: II x II ::; I} be the closed unit ball in
lRn. We want to describe A(B), i.e. we want to find out how the unit ball is transformed
under the linear transformation.
Let us first consider the simplest case when A is a diagonal matrix A =
diag{a
l
, a2, ... , an}, a
k
> 0, k= 1,2, ... , n. Then forv= (x
l
,x
2
' ... , xnl and (Y1'Y2' ... ,
ynl = Y = Ax we have yk = akXk (equivalently, x
k
= ykla
k
) for k = 1,2, ... , n, so
y=(YI'Y2' ···,ynl=Axforllxll::; 1,
if and only if the coordinates YI' Y2' ... , Y
n
satisfy the inequality
2 2 2 n 2
2l+
Y2
+
2 2··· 2
a1 a2 an k=1 ak
(this is simply the inequality II x 112 = L k I xk 12 ::; 1 ).
The set of points in lR n satisfying the above inequalities is called an ellipsoid. If n =
2 it is an ellipse with halfaxes a I and a
2
, for an = 3 it is an ellipsoid with halfaxes a I' a
2
and a
2
. In lR
n
the geometry of this set is also easy to visualize, and we call that set an
ellipsoid with half axes aI' a
2
, ... , an. The vectors e
l
, e
2
, ... , en or, more precisely the
corresponding lines are called the principal axes of the ellipsoid.
The singular value decomposition essentially says that any operator in an inner product
space is diagonal with respect to a pair of orthonormal bases, see Remark 3.9. Namely,
consider the orthogonal bases A = VI' v
2
, ... , vn and B = w
l
,w
2
' ... ,w
n
from the singular
value decomposition. Then the matrix of A in these bases is diagonal
[AhA = diag{sn: n = 1,2, ... , n}.
Assuming that all a
k
> 0 and essentially repeating the above reasoning, it is easy to
show that any point Y = Ax E A(B) if and only if it satisfies the inequality
2 2 2 II 2
2l+
Y2
+
2 2 ... 2 2·
a1 a2 an k=1 ak
where YI' Y2' ... , Y
n
are coordinates ofy in the orthonormal basis B = w
l
'w
2
, ... , w
n
' not
in the standard one. Similarly, (XI' x
2
, ... , xnl = [x]A.
But that is essentially the same ellipsoid as before, only "rotated" (with dierent but
still orthogonal principal axes)!
There is also an alternative explanation which is presented below.
Consider the general case, when the matrix A is not necessarily square, and (or) not
all singular values are nonzero. Consider first the case of a "diagonal" matrix I of
form. It is easy to see that the image L B of the unit ball B is the ellipsoid (not in the whole
space but in the Ran L) with half axes a I' a
2
, ... , ar.
210 Structure of Operators in Inner Product Spaces
Consider now the general case, A = wI. V*, where V, Ware unitary operators. Unitary
transformations do not change the unit ball (because they preserve norm), so V* (B) = B.
We know that I. (B) is an ellipsoid in Ran I. with halfaxes ai' a
2
, ... , a
r
. Unitary
transformations do not change geometry of objects, so W( I. (B)) is also an ellipsoid with
the same halfaxes. It is not hard to see from the decomposition A = wI. V* (using the
fact that both Wand V are invertible) that W transforms RanI. to Ran A, so we can
conclude:
the image A(B) of the closed unit ball B is an ellipsoid in Ran A with
half axes ai' 0"2' ... , a
r
. Here r is the number of nonzero singular
values, i.e. the rank of A.
Operator norm of a linear transformation. Given a linear transformation A : x Y let
us consider the following optimization problem: find the maximum ofkAxk on the closed
unit ball B = {x EX: II x I}.
Again, singular value decomposition allows us to solve the problem. For a diagonal
matrix A with nonnegative entries the maximum is exactly maximal diagonal entry. Indeed,
let s!' s2' ... , sr be nonzero diagonal entries of A and let sl be the maximal one. Since
r
Ax= LXkek'
k=l
we can conclude that
r r
IXk 12 1
2
=sJ.ll
x
Il
2
,
k=l k=1
so II Ax II S) II x II· On the other hand, II Aelll = II slelll = sIll ell, so indeed sl is the
maximum of II Ax lion the closed unit ball B. Note, that in the above reasoning we did not
assume that the matrix A is square; we only assumed that all entries outside the "main
diagonal" are 0, so formula holds.
To treat the general case let us consider the singular value decomposition, A = WI. V
, where W, Vare unitary operators, and I. is the diagonal matrix with nonnegative entries.
Since unitary transformations do not change the norm, one can conclude that the maximum
of II Ax II on the unit ball B is the maximal diagonal entry of I. i.e. that
the maximum of I IAxl I on the unit ball B is the maximal singular
value of A.
Definition. The quantity max {II Ax II : x E X, II x II I} is called the operator norm of
a and denoted II A II.
lt is an easy exercise to see that II A II satisfies all properties of the norm:
Structure of Operators in Inner Product Spaces 211
1. II aA II = I ex I . II A II·
2. II A + B II :::; II A II + II B II·
3. IIA II ~ o for aliA.
4. II A II = 0 if and only if A = o.
So it is indeed a norm on a space of linear transformations from from x to Y. One of
the main properties of the operator norm is the inequality
II Ax II ~ II A II . II x II,
which follows easily from the homogeneity of the norm II x II.
In fact, it can be shown that the operator norm II A II is the best (smallest) number C ~
o such that
II Ax II :::; C II x II '\Ix E X.
This is often used as a definition of the operator norm.
On the space of linear transformations we already have one norm, the Frobenius, or
HilbertSchmidt norm II Alb,
2
II A 112= trace(A * A).
So, let us investigate how these two norms compare.
Let sl' s2' ... , sr be nonzero singular values of A (counting multiplicities), and let sl
be the largest eigenvalues. Then s ~ ,s; , ... ,s; are nonzero eigenvalues of A*A (again
counting multiplicities). Recalling that the trace equals the sum of the eigenvalues we
conclude that
r
II A 112 = trace(A * A) = ~ > i .
k=1
On the other hand we know t h ~ t the operator norm of a equals its largest singular
value, i.e. II A II = s I. SO we can conclude that II A II :::; II A Ib, i.e. that the operator norm of
a matrix cannot be more than its.
This statement also admits a direct proof using the CauchySchwarz inequality, and
such a proof is presented in some textbooks. The beauty of the proof we presented here is
that it does not require any computations and illuminates the reasons behind the inequality.
Condition number of a matrix. Suppose we have an invertible matrix A and we want
to solve the equation Ax = b. The solution, of course, is given by x = AI b, but we want to
investigate what happens if we know the data only approximately.
That happens in the real life, when the data is obtained, for example by some
experiments. But even if we have exact data, roundo errors during computations by a
computer may have the same effect of distorting the data.
Let us consider the simplest model, suppose there is a small error in the right side of
the equation. That means, instead of the equation Ax = b we are solving
212 Structure of Operators in Inner Product Spaces
Ax = b +
where is a small perturbation of the right side b.
So, instead of the exact solution x of Ax = b we get the approximate solution x + Llx
of A(x+ Llx) = b + We are assuming that A is invertible, so x = AI We want to
know how big is the relative error in the solution II 11111 x II in comparison with the
relative error in the right side II 11111 b II. It is easy to see that
II = II II = II 1111 b II = II 1111 Axil
II x II II x II II b II II x II II b II II x II
Since II AI II ::;; II AI II . II b II and II A x II ::;; II A II . II x II we can conclude that
II AI 11.11 AII.II
II xII IIbll
The quantity II A 11'11 AIII is called the condition number of the matrixA. It estimates
how the relative error in the solution x depends on the relative error in the right side b.
Let us see how this quantity is related to singular values. Let sl' s2' ... , sn be the singular
values of A, and let us assume that sl is the largest singular value and sn is the smallest. We
know that the (operator) norm of an operator equals its largest singular value, so
I 1
II A 11= sl,1I A 11=,
sn
so
IIAII.IIA
I
sn
In other words, the condition number of a matrix equals to the ratio of the largest and
the smallest singular values.
I
We deduced above that W A II· II A II ·w· It is not hard to see that this
estimate is sharp, i.e. that it is possible to pick the right side b and the error such that
we have equality
II =11 AI 11.11 AII.II
II xII IIbil
We just put b = VI and = aWn' where VI is the first column of the matrix V, and wn
is the nth column of the matrix Win the singular value decomposition A = w'L V*. Here a
can be any scalar.
A matrix is called well conditioned ifits condition number is not too big. If the condition
number is big, the matrix is called ill conditioned. What is "big" here depends on the
problem: with what precision you can find your right side, what precision is required for
the solution, etc. Effective rank of a matrix. Theoretically, the rank of a matrix is easy to
compute: one JUSt needs to row reduce matrix and count pivots.However, in practical
Structure of Operators in Inner Product Spaces 213
applications not everything is so easy. The main reason is that very often we do not know
the exact matrix, we only know its approximation up to some precision.
Moreover, even if we know the exact matrix, most computer programs introduce round
o errors in the computations, so effectively we cannot distinguish between a zero pivot and
a very small pivot.
A simple naive idea of working with roundoff errors is as follows. When computing
the rank (and other objects related to it, like column space, kernel, etc) one simply sets up
a tolerance (some small number) and if the pivot is smaller than the tolerance, count it as
zero. The advantage of this approach is its simplicity, since it is very easy to programme.
However, the main disadvantage is that is is impossible to see what the tolerance is
responsible for. For example, what do we lose is we set the tolerance equal to 106? How
much better will 10
8
be?
While the above approach works well for well conditioned matrices, it is not very
reliable in the general case.
A better approach is to use singular values. It requires more computations, but gives
much better results, which are easier to interpret. In this approach we also set up some
small number as a tolerance, and then perform singular value decomposition. Then we
simply treat singular values smaller than the tolerance as zero. The advantage of this
approach is that we can see what we are doing. The singular values are the halfaxes of the
ellipsoid A(B) (B is the closed unit ball), so by setting up the tolerance we just deciding
how "thin" the ellipsoid sholJld be to be considered "flat".
Structure of Orthogonal Matrices
An orthogonal matrix U with det U = 1 is often called a rotation. The theorem below
explains this name.
Theorem. Let U be an orthogonal operator in IRI1. Suppose that detU = 1. Then there
exists an orthonormal basis VI' v
2
, ... , vn such that the matrix of U in this basis has the
block diagonal form
Rq>1
Rq>2
0
where Rjk are 2dimensional rotations,
= (COS<Pk sin<Pk)
Rq,k sin<Pk cos<Pk
0
Rq>k
I
n

2k
and I
n

2k
stands for the identity matrix of size (n  2k) x (n  2k).
214 Structure of Operators in Inner Product Spaces
Proof We know that if p is a polynomial with real coefficient and A is its complex
root, pCA) = 0, then I is a root of p as well, p(I) = 0 (this can easily be checked by
plugging I into p(z) = L ~ = o ak
z
k ), Therefore, all complex eigenvalues of a real matrix
A can be split into pairs A
k
, 5: k'
We know, that eigenvalues of a unitary matrix have absolute value 1, so all complex
eigenvalues of A can be written as Ak = cos ak + i sin ak, I k = cos a
k
+ i sin a
k
,
Fix a pair of complex eigenvalues A and 5:, and let u E en be the eigenvector of
U, Uu = Au' Then Uu = I 17 . Now, split U into real and maginary parts, i.e. define
x
k
:= Re u = (u + 17)/2, Y = 1m u = (u  u)/(2i),
so u = x + iy (note, that x, yare real vectors, i.e. vectors with real entries).
Then
1 l
ux = U (u + iJ) = (AU + AU) = Re(Au)
2 2
Similarly,
1 1
Uy = 2i U(uiJ) = 2/
AU

AU
) = 1m (AU).
Since = cos a + i sin a, we have
AU = (cos a + i sin a) (x + iy) = ((cos a)x(sin a)y) + i((cos a)y + (sin a)x).
so
Ux = Re(Au) = (cos a) x  (sin a)y, Uy = Im(u) = (cos a)y + (sin a)x.
In other word, U leaves the 2dimensional subspace EA. spanned by the vectors x, y
invariant and the matrix of the restriction of U onto this subspace is the rotation matrix
(
cosa sina)
Ra= .
 sina cosa
Note, that the vectors u and u (eigenvectors of a unitary matrix, corresponding to
dierent eigenvalues) are orthogonal, so by the Pythagorean Theorem
J2
IIx 11= Ilyll =2'lluli.
It is easy to check that x 1 y, so x, y is an orthogonal basis in EA.' Ifwe multiply each
vector in the basis x, y by the same nonzero number, we do not change matrices of linear
transformations, so without loss of generality we can assume that II x II = II y II = 1 i.e. that
x, y is an orthogonal basis in EA.'
Let us complete the orthonormal system VI = x, v
2
= Y to an orthonormal basis in ~ n •
Since UEA. c E'J... E"A.' i.e. E is an invariant subspace of U, the matrix of U in this basis has
the block triangular form
Structure of Operators in Inner Product Spaces
where 0 stands for the (n  2) x 2 block of zeroes.
Since the rotation matrix R_ uis invertible, we have VE/... = E/.... Therefore
V*E = UIE = E
I I I
so the matrix of V in the basis we constructed is in fact block diagonal,
(*)
Since V is unitary
( ~ }
so, since VI is square, it is also unitary.
215
If VI has complex eigenvalues we can apply the same procedure to decrease its size
by 2 until we are left with a block that has only real eigenvalues. Real eigenvalues can be
only + 1 or 1, so in some orthonormal basis the matrix of V has the form
R
u1
0
R_
U2
o
here Ir and I, are identity matrices of size r x r and I x I respectively. Since det U = 1,
the multiplicity of the eigenvalue 1 (i.e. r) must be even.
Note, that the 2 x 2 matrix 12 can be interpreted as the rotation through the angle n'.
Therefore, the above matrix has the form given in the conclusion of the theorem with '<Pk
= uk or '<Pk = n
Let us give a dierent interpretation of Theorem. Define ~ . to be a rotation thorough <Pj
in the plane spanned by the vectors vi , vi + 1. Then Theorem simply says that V is the
composition of the rotations ~ , i = 1, 2, ... , k. Note, that because the rotations T. act in
mutually orthogonal planes, they commute, i.e. it does not matter in what order ..!ve take
the composition. So, the theorem can be interpreted as follows:
Any rotation in IR n can be represented as a composition of at most nl2 commuting
planar rotations.
Ifan orthogonal matrix has determinant 1, its structure is described by the following
theorem.
216 Structure of Operators in Inner Product Spaces
Theorem. Let U be an orthogonal operator in and let detU = 1. Then there
exists an orthonormal basis vI' v
2
, ... , vn such that the matrix of U in this basis has block
diagonal form
°
R<pk
° I
n

2k
where r = n  2k  1 and R<pk are 2dimensional rotations,
_(COS'Pk sin'Pk]
.
sm'Pk cos'Pk
and I
n

2k
stands for the identity matrix of size (n  2k) x (n  2k).
The modification that one should make to the proof of Theorem are pretty obvious.
Note, that it follows from the above theorem that an orthogonal 2 x 2 matrix U with
determinant I is always a reflection.
Let us now fix an orthonormal basis, say the standard basis in We call an
elementary rotation
2
a rotation in the Xj  x
k
plane, i.e. a linear transformation which changes
only the coordinates Xj and x
k
, and it acts on these two coordinates as a plane rotation.
Theorem. Any rotation U (i.e. an orthogonal transformation U with detU = I) can be
represented as a product at most n(n  1)/2 elementary rotations.
To prove the theorem we will need the following simple lemmas.
Lemma. Let x = (xI' x
2
f E There exists a rotation R of R2 which moves the
vector x to the vector (a, of, where a = J x: +
One can just draw a picture orland write a formula for Ra'
Lemma. Let x = (xI' x
2
, ... , xn)T E There exist n
1
elementary rotations R
I
, R
2
, ... ,
R
n
_
1
such thdt R
n

1
... , R
2
R
j
x = (a, 0, 0, ... , Of, where a = + ...
Proof The idea ofthe proof of the lemma is very simple. We use an elementary rotation
R
t
in the xn  I xn plane to "kill" the last coordinate ofx. Then use an elementary rotation
R2 in x
n

2
x
n

t
plane to "kill" the coordinate number n  I of Rtx (the rotation R2 does not
change the last coordinate, so the last coordinate of R
2
R
t
x remains zero), and so on.
For a formal proof we will use induction in n. The case n = 1 is trivial, since any
vector in 1 has the desired form. The case n = 2 is treated.
Assuming now that Lemma is true for n  1, let us prove it for n. There exists a 2 x 2
rotation matrix Ra such that
Structure of Operators in Inner Product Spaces
where anI = J X;_I + x;, So if we define the n x n elementary rotation R I by
R, ;J
(1n2 is (n  2) x (n  2) identity matrix), then
R
lx
= (xI' x
2
, .. " x
n

2
' anI' of,
217
We assumed that the conclusion of the lemma holds for n  I, so there exist n  2
elementary rotations (let us call them R
2
, R
3
, .. " R
n
_
l
) in JR.
1l

1
which transform the vector
(XI' x
2
, .. " Xn_I' an_If E JR.
1l

1
to the vector (a, 0, .. " O)T E JR.
1l

I
, In other words
R
n
_
l
, .. " R
3
Ri
x
l' x
2
, .. " XnI' an_If = (a, 0, .. " of,
We can always assume that the elementary rotations R
2
, R
3
, .. " R
n
_
J
act in JR.1l , simply
by assuming that they do not change the last coordinate,
Then
R
n
_
l
, .. ,' R3R2R x = (a, 0, .. " of E JR.
Il
,
Let us now show that a = J + xi +'" + It can be easily checked directly, but we
apply the following indirect reasoning, We know that orthogonal transformations preserve
the norm, and we know that a 0,
But, then we do not have any choice, the only possibility for a is a =
I 2 2 2
\jXJ +X2 +"'+X
n
'
Lemma. Let A be an n x n matrix with real entries, There exist elementary rotations
R
I
,R
2
, .. " R
n
, n n(n  1)/2 such that the matrix B = RN .. " R2RIA is upper triangular,
and, moreover, all its diagonal entries except the last one B
n
.
n
are nonnegative,
Proof We will use induction in n, The case n = I is trivial, since we can say that any
I x I matrix is of desired form,
Let us consider the case n = 2, Let al be the first column of A, There exists a rotation
R which "kills" the second coordinate of ai' making the first coordinate nonnegative,
Then the matrix B = RA is of desired form,
Let us now assume that lemma holds for (n  I) x (n  I) matrices, and we want to
prove it for n x n matrices, For the n x n matrix A let al be its first column, We can find n
 I elementary rotations (say R
I
, R
2
, , .. , R
n
_
1
which transform a
l
into (a, 0, .. " Of, So, the
matrix R
n
_
l
, .. " R2R)A has the following block triangular form
R
n
_
l
.. ,R
2
R
J
A =
218 Structure of Operators in Inner Product Spaces
where Al is an (n  1) x (n  1) block.
We assumed that lemma holds for n  1, so AI can be transformed by at most (n  1)
(n  2)/2 rotations into the desired upper triangular form. Note, that these rotations act in
JR(n1 (only on the coordinates x
2
' x
3
' ... , xn)' but we can always assume that they act on the
whole JR(1l simply by assuming that they do not change the first coordinate. Then, these
rotations do not change the vector (a, 0, ... , ol (the first column of R
n
_
l
, .•• , R2RIA), so the
matrix A can be transformed into the desired upper triangular form by at most
n  1 + (n  1 )(n  2)/2 = n(n  1 )/2
elementary rotations.
Proof There exist elementary rotations R
1
, R
2
, .... RN such that the matrix
VI = R
N
, ... , R
2
R
2
V
is upper triangular, and all diagonal entries, except may be the last one, are non
negative.
Note, that the matrix VI is orthogonal. Any orthogonal matrix is normal, and we know
that an upper triangular matrix can be normal only if it is diagonal. Therefore, VI is a
diagonal matrix.
We know that an eigenvalue of an orthogonal matrix can either be lorI, so we can
have only 1 or Ion the diagonal of VI' But, we know that all diagonal entries of VI'
except may be the last one, are nonnegative, so all the diagonal entries of VI' except may
be the last one, are 1. The last diagonal entry can be ±1.
Since elementary rotations have determinant 1, we can conclude that
det V\ = det V= 1,
so the last diagonal entry also must be 1. So VI = J, and therefore V can be represented
as a product of elementary rotations
\ \ I
V=R
1
R2 ... R
N
·
Here we use the fact that the inverse of an elementary rotation is an elementary rotation
as well.
Orientation
Motivation. In Figures 3 orthonormal bases in JR(2 and JR(3 respectively. In each figure,
the basis b) can be obtained from the standard basis a) by a rotation, while it is impossible
to rotate the standard basis to get the basis c) (so that e
k
goes to vk Vk).
You have probably heard the word "orientation" before, and you probably know that
bases (a) and (b) have positive orientation, and orientation of the bases (c) is negative.
You also probably know some rules to determ;ne the orientation, like the right hand rule
from physics. So, if you can see a basis, say in JR(3, you probably can say what orientation
it has. But what if you only given coordinates of the vectors VI' V
2
' V3? Of course, you can
try to draw a picture to visualize the vectors, and then to see what the orientation is.
Structure of Operators in Inner Product Spaces 219
But this is not always easy. Moreover, how do you "explain" this to a computer?
It turns out that there is an easier way. Let us explain it. We need to check whether it
is possible to get a basis VI' V
2
' v3 in IR3 by rotating the standard basis e
l
, e
2
, en'
U
ek
= vk, k = 1, 2, 3;
L
e)
v
2
(a) (b) (c)
Fig. Standard Bases lR
2
h
e
2
e)
V
2
(a) (b)
(c)
Fig. Orientation in IR
3
There is unique linear transformation U such that its matrix (in the standard basis) is
the matrix with columns VI' v
2
, v
3
• It is an orthogonal matrix (because it transforms an
orthonormal basis to an orthonormal basis), so we need to see when it is rotation. Theorems
give us the answer: the matrix U is a rotation if and only if det U = 1. Note, that (for 3 x 3
matrices) if det U = 1, then U is the composition of a rotation about some axis and a
reflection in the plane of rotation, i.e. in the plane orthogonal to this axis. This gives us a
motivation for the formal definition below.
Definition. Let a and b be two bases in a real vector space X. We say that the bases a
and b have the same orientation, if the change of coordinates matrix [J]B,A has positive
determinant, and say that they have dierent orientations if the determinant of [J]B A is
negative. Note, that since '
I
[J]A,B = [I]B,A'
one can use the matrix [J]A,B in the definition ..
We usually assume that the standard basis e
l
, e
2
, ... , en in IR
n
has positive orientation.
In an abstract space one just needs to fix a basis and declare that its orientation is positive.
If an orthonormal basis VI' v
2
, ... , vn in IR
n
has positive orientation (i.e. the same
orientation as the standard basis) This equation show that the basis V l' v
2
, .•. , V n is obtained
from the standard basis by a rotation.
220 Structure of Operators in Inner Product Spaces
Continuous Transformations of Bases and Orientation
Definition. We say that a basis a = {aI' a
2
, ... , an} can be continuously transformed to
A basis
b = {bI' b
2
, ... , b
n
}
if there exists a continuous family of bases
Vet) = {vIet), vit), ... , vnc!)}, t E [a, b]
such that
via) = a
k
, vib) = b
k
, k = 1,2 ... , n.
"Continuous family of bases" mean that the vectorfunctions vit) are continuous (their
coordinates in some bases are continuous functions) and, which is essential, the system
vI (t), v
2
(t), ... , v n(t) is a basis for all t E [a, b]. Note, that performing a change of variables,
we can always assume, if necessary that
[a, b] = [0,1].
Theorem. Two bases A = {aI' a
2
, ... , an} and B = {bI' b
2
, ... , b
n
} have the same
orientation, if and only if one of the bases can be continuously transformed to the other.
Proof Suppose the basis A can be continuously transformed to the basis B, and let
Vet), t E [a, b] be a continuous family of bases, performing this transformation. Consider
a matrixfunction V (t) whose columns are the coordinate vectors [vit)]A of vi!) in the
basis A.
Clearly, the entries of V(t) are continuous functions and V(a) = I, V(b) = [1]A B' Note,
that because Vet) is always a basis, det V (t) is never zero. Then, the Intermediate Value
Theorem asserts that det V (a) and det V (b) has the same sign. Since det V (a) = detl = 1,
we can conclude that
det[1]A,B = det V (b) > 0,
so the bases A and B have the same orientation.
To prove the oppqsite implication, i.e. the "only if' part of the theorem, one needs to
show that the identity matrix J can be continuously transformed through invertible matrices
to any matrix B satisfying det B > O. In other words, that there exists a continuous matrix
function V (t) on an interval [a, b] such that for all t E [a, b] the matrix V (t) is invertible
and such that
V (a) = J, V (b) = B.
Chapter 8
Bilinear and Quadratic Forms
Main Definition
Bilinear forms on Rn. A bilinear form on Rn is a function L = L(x, y) oftwo arguments
x, Y E jRn which is linear in each argument, i.e. such that
1. L(ax! t = aL(xl,y) +
2. L(x, ay! + = alex, y!) + Y2)'
One can consider bilinear form whose values belong to an arbitrary vector space, but
in this book we only consider forms that take real values.
If x = (x!' x
2
, ... , xn)T and y = (Y!, Y2' ... , Ynf, a bilinear form can be written as
n
L(x,y) = L aj,kxkYj'
j,k=!
or in matrix form
L(x, y) = (Ax, y)
where
A=
The matrix A is determined uniquely by the bilinear form L.
Quadratic forms on jR n • There are several equivalent definition of a quadratic form.
One can say that a quadratic form on Rn is the "diagonal" of a bilinear form L, i.e. that any
quadratic form Q is defined by Q[x] = L(x, x) = (Ax, x).
222 Bilinear and Quadratic Forms
Another, more algebraic way, is to say that a quadratic form is a homogeneous
polynomial of degree 2, i.e. that Q[x] is a of n variables xl' x
2
, ... , xn having
only terms of degree 2. That means that only terms axi and cX/'k are allowed.
There many ways (in fact, infinitely many) to write a quadratic form Q[x] as Q[x] =
2 2
(Ax, x). For example, the quadratic form Q[x] =xl + x2  4xlx2 on can be represented
as (Ax, x) where A can be any of the matrices
(
1 4) (1 0) (1 1).
° 1 ' 4 1 ' 2 1
In fact, any matrix A of form
will work.
But if we require the matrix a to be symmetric, then such a matrix is unique:
Any quadratic form Q[x] on admits unique representation Q[x] = (Ax, x) where a
is A (real) symmetric matrix.
For example, for the quadratic form
2 2 2
Q[x]=xl +3x2 +5x3 +4xlx216x2x3+7xlX3
on , the corresponding symmetric matrix A is
[
8 3.5 5
Quadratic forms on en. One can also define a quadratic form on en (or any complex
inner product space) by taking a selfadjoint transformation A = A * and defining Q by
Q[x] = (Ax, x). While our main examples will be in Rn, all the theorems are true in the
settipg of Cn as well. Bearing this in mind, we will always use a instead of AT
Diagonalization of Quadratic Forms
You have probably met quadratic forms before, when you studied second order curves
in the plane. May be you even studied the second order surfaces in •
We want to present a unified approach to classification of such objects. Suppose we
are given a set in
3
defined by the equation Q[x] = I, where Q is some quadratic form.
If Q has some simple form, for example if the corresponding matrix is diagonal, i.e. if
Q[x] = + + ... + we can easily visualize this set, especially if n = 2, 3. In
Bilinear and Quadratic Forms 223
higher dimensions, it is also possible, if not to visualize, then to understand the structure
of the set very well.
So, if we are given a general, complicated quadratic form, we want to simplify it as
much as possible, for example to make it diagonal. The standard way of doing that is the
change of variables. Orthogonal diagonalization. Let us have a quadratic form
Q[x] = (Ax, x) in IRn. Introduce new variables y = (Y!, Y2' ... , ynf E IR
n
, with y = SI x,
where S is some invertible n x n matrix, so x = Sy.
Then,
Q[x] = Q[Sy] = (ASy, 'Sy) = (S*ASy, y),
so in the new variables y the quadratic form has matrix S*AS.
So, we want to find an invertible matrix S such that the matrix S*AS is diagonal. Note,
that it is dierent from the diagonalization of matrices we had before: we tried to represent
a matrix a as A = SDSI, so the matrix D = SlAS is diagonal. However, for orthogonal
matrices U, we have U = U!, and we can orthogonally diagonalize symmetric matrices.
So we can apply orthogonal diagonalization we studied before to the quadratic forms.
Namely, we can represent the matrix a as
A = UDU* = UDU!.
Recall, that D is a diagonal matrix with eigenvalues of A on the diagonal, and U is the
matrix of eigenvectors (we need to pick an orthogoml basis of eigenvectors). Then
D = U*AU,
so in the variables
y= U! x
the quadratic form has diagonal matrix.
Let us analyse the geometric meaning of the orthogonal diagonalization. The columns
up u
2
, ... , un of the orthogonal matrix Uform an orthonormal basis in IR
n
, let us call this
basis S. The change of coordinate matrix [I]s B from this basis to the standard one is exactly
U. We know that '
y = (Y!, Y2' ... , ynf = Ax,
so the coordinates Y!'Y2' ... , Yn can be interpreted as coordinates ofthe vector x in the
new basis u!' u
2
, ... , un' So, orthogonal diagonalization allows us to visualize very well the
set Q[x] = I, or a similar one, as long as we can visualize it for diagonal matrices.
Example. Consider the quadratic form of two variables (i.e. quadratic form on IR
2
),
Q(x, y) = 2.x
2
+2y2 + 2xy.
Let us describe the set of points (x, yf E IR
2
satisfying Q(x, y) = 1.
The matrix A of Q is
A = ( ~ ~ }
224 Bilinear and Quadratic Forms
Orthogonally diagonalizing this matrix we can represent it as
A = where U =
or, equivalently
U' AU D
The set {y : (Dy, y) = I} is the ellipse with halfaxes 1 1.J3 and 1. Therefore the set
{x E IR
2
: (Ax. x) I}, is the same ellipse only in the basis (Jz, Jz r .( Jz, Jz r '
or, equivalently, the same ellipse, rotated 1t/4.
Nonorthogonal diagonalization. Orthogonal diagonalization involves computing
eigenvalues and eigenvectors, so it may be dicult to do without computers for n > 2. There
is another way of diagonalization based on completion of squares, which is easier to do
without computers.
Let us again consider the quadratic form of two variables, Q[ x] = 2x{ + 2xlx2 + 2xi
(it is the same quadratic form as in the above example, only here we call variables not x, y
but x I' x
2
). Since
(
1)2 (2 1 1 2 ) 2 1 2
2 xJ+"2X2 =2 xl +2XI"2X2+4"X2 =2xJ +2XIX2+"2X2
(note, that the first two terms coincide with the first two terms of Q), we get
2 2 ( 1)2 3 2 2 3 2
2xJ +2XIX2 +2xI =2 xl +X2 +X2 =2YI +Y2'
2 2 2
1
whereYI =x
I
+ xl +"2X2 andY2 =x
2
.
The same method can be applied to quadratic form of more than 2 variables. Let us
consider, for example, a form Q[x] in ,
222
Q[ X] = Xl  6XIX2 + 4XIX3  6X2X3 + 8X2  3X3 .
Considering all terms involving the first variable xl (the first 3 terms in this case), let
us pick a full square or a multiple of a full square which has exactly these terms (plus
some other terms).
Since
2 2 ' 2 2
(Xl  3X2 + 2X3) = xI  6XIX2 + 4XI
X
3 12x2X3 + 9X2 + 4X3
we can rewrite the quadratic form as
2 2 2
(XI
3x
2+
2x
3) =X2 6X2
X
3+
7x
3·
Bilinear and Quadratic Forms 225
Note, that the expression xi + 6x2x3 7xi involves only variables x
2
and x
3
. Since
2 2 222
(X2 
3X
3) = (X2 
6X
2
X
3 +
9X
3) = x2 +6X2X3  9X3
we have
2 2 2 2
X2 + 6X2X3  7X3 = (X2  3X3) + 2x3·
Thus we can write the quadratic fonn Q as
,2 22222
Q[x](x)3X2+ 2x3) (x2 3x3) +2X3 =y) +2+23
where
Y) = x)  3x
2
+ 2x
3
'Y2 = x
2
 3x
3
'Y3 = x
3
·
There is another way of perfonning nonorthogonal diagonalization of a quadratic
fonn. The idea is to perfonn row operations on the matrix a of the quadratic form. The
difference with the row reduction (GaussJordan elimination) is that after each row operation
we need to perform the same column operation, the reason for that being that we want to
make the matrix S* AS diagonal.
Let us explain how everything works on an example. Suppose we want to diagonalize
a quadratic form with matrix
(
1 1 3)
A = 1 2 1.
311
We augment the matrix A by the identity matrix, and perform on the augmented matrix
(All) row/column operations. After each row operation we have to perfonn on the matrix
a the same column operation. We get
(
  + R) : !
311001 31100
(
0
3
1
! : !
4 0 0 1 ,0 4 8 3 0
(
001 0 1 1
4 8 3 0 1 4R2 0 0 24 7 4
(
00
1
0 0
o 24 7 4
226 Bilinear and Quadratic Forms
Note, that we perform column operations only on the left side of the augmented matrix.
We get the diagonal D matrix on the left, and the matrix S* on the right, so D = S*AS,
=:J.
° ° 24 7 4 1 3 1 ° ° 1
Let us explain why the method works. A row operation is a left multiplication by an
elementary matrix. The corresponding column operation is the right multiplication by the
transposed elementary matrix. Therefore, performing row operations E
l
,E
2
, "., EN and the
same column operations we transform the matrix A to
* * *
EN ... E2EIAEIE2 ... EN =EAE*.
As for the identity matrix in the right side, we performed only row operations on it, so
the identity matrix is transformed to
EN'" E2El1 = E1 = E.
If we now denote E* = S we get that S* AS is a diagonal matrix, and the matrix E = S
is the right half of the transformed augmented matrix.
In the above example we got lucky, because we did not need to interchange two rows.
This operation is a bit tricker to perform. It is quite simple if you know what to do, but it
may be hard to guess the correct row operations. Let us consider, for example, a quadratic
form with the matrix
A =
Ifwe want to diagonalize it by row and column operations, the simplest idea would be
to interchange rows I and 2. But we also must to perform the same column operation, i.e.
interchange columns 1 and 2, so we will end up with the same matrix.
So, we need something more nontrivial. The identity
122
2XIX2 = i[(XI +X2) (Xl X2) ]
gives us the idea for the following series of row operations:
(
0 1 11 0) (0 1 I 1 0)
1 ° ° 1 1 1/ 2 1/ 2 1
2
(
01 1 I 1 O)+Rl (1 ° 1112 1)
1 112 1 1 I 112 1
(0
1
° I 112 11)'
1 112
Nonorthogonal diagonalization gives us a simple description of a set Q[x] = 1 in a
nonorthogonal basis. It is harder to visualize, then the representation given by the
Bilinear and Quadratic Forms 227
orthogonal diagonalization. However, if we are not interested in the details, for example if
it is sucient for us to know that the set is ellipsoid (or hyperboloid, etc), then the non
orthogonal diagonalization is an easier way to get the answer.
Silvester's Law of Inertia
As we discussed above, there many ways to diagonalize a quadratic form. Note, that
a resulting diagonal matrix is not unique. For example, if we got a diagonal matrix
D = diag{Al' A
2
, ... , An},
we can take a diagonal matrix
S=diag{sl,s2' ... , sn},sk
E
]R.,sk *' 0
and transform D to
S * DS = ...
This transformation changes the diagonal entries of D. However, it does not change
the signs of the diagonal entries. And this is always the case
Namely, the famous Silvester's Law ofInertia states that:
For a Hermitian matrix A (i.e. for a quadratic form Q[x] :::;; (Ax, x» and any its
diagonalization D = S*AS, the number of positive (negative, zero) diagonal entries of D
depends only on A, but not on a particular choice of diagonalization. Here we of course
assume that S is an invertible matrix, and D is a diagonal one. The idea of the proof
Silvester's Law of Inertia is to express the number of positive (negative, zero) diagonal
entries of a diagonalization D = S*AS in terms of A, not involving S or D.
We will need the following definition.
Definition. Given an n x n symmetric matrix A = A * (a quadratic form Q[x] = (Ax, x)
on ]R.n) we call a subspace E c lR
n
positive (resp. negative, resp. neutral) if
(Ax, x) > 0 (resp. (Ax, x) < 0, resp. (Ax, x) = 0) for all x E E, x *' O.
Sometimes, to emphasize the role of a we will say Apositive (A negative, Aneutral).
Theorem. Let A be an n x n symmetric matrix, and let D = S*AS be its diagonalization
by an invertible matrix S. Then the number of positive (resp. negative, resp. zero) diagonal
entries of D coincides with the maximal dimension of an Apositive (resp. A  negative,
resp. A  neutral) subspace.
The above theorem says that ifr + is the number of positive diagonal entries of D, then
there exists an A  positive subspace E of dimension 1 +, but it is impossible to find a
positive subspace E with dim E > r +.
We will need the following lemma, which can be considered a particular case of the
above theorem.
Lemma. Let D be a diagonal matrix D = diag{Al' A
2
, ... , An}' Then the number of
positive (resp. negative, resp. zero) diagonal entries of D coincides with the maximal
dimension of an D  positive (resp. D  negative, resp. D  neutral) subspace.
Proof By rearranging the standard basis in ]R.n (changing the numeration) we can
228 Bilinear and Quadratic Forms
always assume without loss of generality that the positive diagonal entries of D are the
first r + diagonal entries.
Consider the subspace E+ spanned by the first rf coordinate vectors e
l
, e
2
, ... , e
r
+.
Clearly E+ is a Dpositive subspace, and dimE+ = r +.
Let us now show that for any other Dpositive subspace E we have dim E ~ r +.
Consider the orthogonal projection P = P E+'
Px = (xl' x
2
, ... , x
r
+, 0 ... , Of, x = (Xl' x
2
, ... , xnf.
For a D  positive subspace E define an operator T: E ~ E+ by
Tx = Px, V X E E.
In other words, T is the restriction of the projection P : P is defined on the whole
space, but we restricted its domain to E and target space to E+. We got an operator acting
from E to E+, and we use a dierent letter to distinguish it from P.
Note, that Ker T = {O}. Indeed, let for X = (xl' x
2
, ... , xnf E E we have Tx = Px = O.
Then, by the definition of P
Xl = x
2
= ... = xr + = 0,
. and therefore
n 2
(Dx,x) = L Akxk ~ 0 (Ak ~ Ojork>r+).
k=r++l
But X belongs to aD  positive subspace E, so the inequality (Dx, x) ~ 0 holds only
for x = O.
Let us now apply the Rank Theorem. First of all, rank T = dimRan T ~ dimE+ = r +
because Ran T c E+. By the Rank Theorem, dimKer T + rank T = dim E. But we just
proved that Ker T= {O}, i.e. that dim Ker T= 0, so
dim E = rank T ~ dim E+ = r +.
To prove the statement about negative entries, we just apply the above reasoning to
the matrix D. The case of zero entries is treated similarly, but even simpler.
Proof Let D = S*AS be a diagonalization of A. Since
(Dx, x) = (S* ASx, x) = (ASx, Sx)
it follows that for any D  positive subspace E, the subspace SE is an Apositive
subspace. The same identity implies that for any A  positive subspace F the subspace
!)IF is D  positive.
Since Sand !)l are invertible transformations, dimE = dim SE and dimF = dim !)l F.
Therefore, for any D positive subspace E we can find an A  positive subspace (namely
SE) of the same dimension, and vice versa: for any A  positive subspace F we can find a
D  positive subspace (namely !)l F) of the same dimension. Therefore the maximal possible
dimensions of a A  positive and a D  positive subspace coincide, and the theorem is
proved. The case of negative and zero diagonal entries treated similarly.
Bilinear and Quadratic Forms 229
Positive Definite Forms
Minimax characterization of eigenvalues and the Silvester's criterion of positivity
Definition. A quadratic form Q is called
• Positive definite if Q[x] > 0 for all x * O.
• Positive semidefinite if Q[x] ;;:: 0 for all x.
• Negative definite if Q[x] < 0 for all x '* o.
• Negative semidefinite if Q[x] ~ 0 for all x.
• Indefinite if it take both positive and negative values, i.e. if there exist vectors
xl and x
2
such that Q[xd > 0 and Q[x
2
] < O.
Definition. A symmetric matrix A = A * is called positive definite (negative definite,
etc.) ifthe corresponding quadratic form Q[x] = (Ax, x) is positive definite (negative definite,
etc.).
Theorem. Let A = A *. Then
1. A is positive definite i all eigenvalues of a are positive.
2. A is positive semidefinite i all eigenvalues of a are nonnegative.
3. A is negative definite i all eigenvalues of a are negative.
4. A is negative semidefinite i all eigenvalues of a are nonpositive.
S. A is indefinite i it has both positive and negative eigenvalues.
Proof The proof follows trivially from the orthogonal diagonalization. Indeed, there
is an orthonormal basis in which matrix of a is diagonal, and for diagonal matrices the
theorem is trivial.
Remark. Note, that to find whether a matrix (a quadratic form) is positive definite
(negative definite, etc) one does not have to compute eigenvalues. By Silvester's Law of
Inertia it is sucient to perform av arbitrary, not necessarily orthogonal diagonalization
D = SAS and look at the diagonal entries of D. Silvester's criterion of positivity. It is an
easy exercise to see that a 2 x 2 matrix
is positive definite if and only if
a > 0 and det A = ac  b
2
> 0
Indeed, if a > 0 and det A = acb
2
> 0, then c > 0, so trace A = a + c > O. So we know
that if AI' A2 are eigenvalues of a then AI "'2 > 0 (det A > 0) and 1 + 2 = traceA > O. But that
only possible if both eigenvalues are positive. So we have proved that conditions imply
that A is positive definite. The opposite implication is quite simple. This result can be
generalized to the case of n x n matrices. Namely, for a matrix A
230 Bilinear and Quadratic Forms
A=
an,1 a
n
,2 an,n
A let us consider its all upper left submatrices
A
I
=(a
l1
),A
2
= 1,1 1,2 ,A3= a21 a22 a23, ... ,A
n
=A
, a a '"
(
a a) (al,1 al,2 al,3 J
2,1 2,2
Theorem. (Silvester's Criterion of Positivit f?.JJ. * is positive definite if
and only if
detA
k
> Ofor all k = 1,2, ... , n.
First of all let us notice that if A> 0 then Ak > 0 also (can you explain why?). Therefore,
since all eigenvalues of a positive definite matrix are positive, det Ak > 0 for all k.
One can show that if det Ak > 0 \;j k then all eigenvalues of A are positive by analyzing
diagonalization of a quadratic form using row and column operations. The key here is the
observation that if we perform row/column operations in natural order (i.e. first subtracting
the first row/column from all other rows/columns, then subtracting the second row/columns
from the rows/columns 3, 4, ... , n, and so on ... ), and if we are not doing any row interchanges,
then we automatically diagonalize quadratic forms Ak as well. Namely, after we subtract
first and second rows and columns, we get diagonalization of A
2
; after we subtract the
third row/column we get the diagonalization of A
2
, and so on.
Since we are performing only row we do not change the determinant.
Moreover, since we are not performing row exchanges and performing the operations in
the correct order, we preserve determinants of A
k
• Therefore, the condition det Ak > 0
guarantees that each new entry in the diagonal is positive.
Of course, one has to be sure that we can use only row replacements, and perform the
operations in the correct order, i.e. that we do not encounter any pathological situation. If
one analyzes the algorithm, one can see that the only bad situation that can happen is the
situation where at some step we have zero in the pivot place. In other words, if after we
subtracted the first k rows and columns and obtained a diagonalization of A
k
, the entry in
the k + 1st row and k + 1st column is O.
We will need the following characterizatipn of eigenvalues of a hermitian matrix.
Minimax characterization of eigenvalues. Let us recall that the codimension of a subspace
E c X is by the definition the dimension of its orthogonal complement, codim E = dim(E.l ).
Since for a subspace E c X, dim X = n we have dim E + dim E.l = n, we can see that
codim E = dirnX  dim E. Recall that the trivial subspace {O} has dimension zero, so the
whole space X has codimension O.
Bilinear and Quadratic Forms 231
Theorem. (Minimax characterization of eigenvalues). Let A = A * be an n x n matrix.
and let AI' 1.,2 ... An be its eigenvalues taken in the decreasing order. Then
Ak = max min(Ax,x) = min max(Ax,x).
& F:
dimE=k 11.<11=1 11'11=1
Let us explain in more details what the expressions like max min and minmax mean.
To compute the first one, we need to consider all subspaces E of dimension k. For each
such subspace E we consider the set of all x E E of norm 1, and find the minimum of (Ax,
x) over all such x. Thus for each subspace we obtain a number, and we need to pick a
subspace E such that the number is maximal. That is the maxmin. The min max is defined
similarly.
Remark. A sophisticated reader may notice a problem here: why do the maxima and
minima exist? It is well known, that maximum and minimum have a nasty habit of not
existing: for example, the functionJ(x) = x has neither maximum nor minimum on the
open interval (0, 1).
However, in this case maximum and minimum do exist. There are two possible
explanations of the fact that (Ax, x) attains maximum and minimum. The first one requires
some familiarity with basic notions of analysis: one should just say that the unit sphere in
E, i.e. the set {x E E: II x II = I} is compact, and that a continuous function (Q[x] = (Ax, x)
in our case) on a compact set attains its maximum and minimum.
Another explanation will be to notice that the function Q[x] = (Ax, x), x E E is a
quadratic form on E. It is not dicult to compute the matrix of this form in some orthonormal
basis in E, but let us only note that this matrix is not A: it has to be a k x k matrix, where
k= dimE.
It is easy to see that for a quadratic form the maximum and minimum over a unit
sphere is the maximal and minimal eigenvalues of its matrix. As for optimizing over all
subspaces, we will prove below that the maximum and minimum do exist.
Proof First of all, by picking an appropriate orthonormal basis, we can assume without
loss of generality that the matrix A is diagonal, A = diag{A\, 1.,2' ... , An}'
Pick subspaces E and F, dimE= k, codim F= kl, i.e. dimE = n k+ 1. Since dimE
+ dim F> n, there exists a nonzero vector Xo E En F. By normalizing it we can assume
without loss of generality that II Xo II = 1. We can always arrange the eigenvalues in decreasing
order, so let us assume that AI 1.,2 ... An. Since x belongs to the both subspaces E
and F
min(Ax,x):::; (Axo,xo):::; max(Ax,x).
xeE xeF
IIxll=1 Ilxll=1
We did not assume anything except dimensions about the subs paces E and F, so the
above inequality
min(Ax,x) :::; max(Ax,x).
xeE xeF
IIxll=1 Ilxll=1
232 Bilinear and Quadratic Forms
holds for all pairs of E and F of appropriate dimensions. Define
Eo := span{el' e
2
, ... , e
k
}, Fo := span{e
k
, e
k
+
l
, e
k
+
2
, ... , en}'
Since for a selfadjoint matrix B, the maximum and minimum of (Bx, x) over the unit
sphere {x : " x " = I} are the maximal and the minimal eigenvalue respectively (easy to
check on diagonal matrices), we get that .
min (Ax, x) = max(Ax,x) = A.k'
xeEo xeFo
IIxll=1 IIxll=1
It follows from equation that for any subspace E, dimE = k
min(Ax,x) ~ max(Ax,x) = A.k'
xeE xeFo
IIxll=1 IIxll=1
and similarly, for any subspace F of codimension k  1,
max(Ax,x) ~ min (Ax, x) = A.k'
xeF xeEo
IIxll=1 IIxll=1
But on subspaces Eo and Fo both maximum and minimum are A.
k
, so
min max = max min = A.
k
•
Corollary (Intertwining of Eigenvalues). Let A = A * = {aj,k } ~ , k = I be a selfadjoint
matrix, and let A = {aj,k } ~ ; I = I be its submatrix of size (n  1) x (n  1). Let 1...
1
' 1...
2
' ... , A.
n
and Ill' 112' ... , Il
n
I be the eigenvalues of A and A respectively, taken in decreasing order.
Then
I.e.
A.k ~ Ilk ~ A.k+l,k = 1,2, ... ,n1
Proof. Let X c F
n
be the subspace spanned by the first n  1 basis vectors, eX =
span{e
I
, e
2
, ... , enI}' Since (Ax, x) = (Ax, x) for all x EX,
Ilk = ma{( min(Ax,x).
xcX xeE
dimE=k IIxll=1
To get A.k we need to get maximum over the set of all subspaces E of P', dimE = k, i.e.
take maximum over a bigger set (any subspace of X is a subspace of P'). Therefore
Ilk ~ lk'
(the maximum can only increase, if we increase the set).
On the other hand, any subspace E c X of codimension k  1 (here we mean
codimension in X) has dimension n  1  (k  1) = n  k, so its codimension in Fn is k.
Therefore
Ilk = mil} max(Ax,x) ~ min max(Ax, x) = A.k+I
EcX xeE EcF
n
xeE
dimE=nk IIxll=1 dimE=nk IIxll=1
(minimum over a bigger set can only be smaller).
Bilinear and Quadratic Forms 233
Proof If A> 0, thenA
k
> 0 for k= 1,2, ... , n as well (can you explain why?). Since all
eigenvalues of a positive definite matrix are positive, det Ak > 0 for all k = 1, 2, ... , n.
'Let us now prove the other implication. Let det Ak > 0 for all k. We will show, using
induction in k, that all Ak (and so A = An) are positive definite.
Clearly A I is positive definite (it is 1 x 1 matrix, so A I = det A I)' Assuming that A
k
_
1
> 0 (and det Ak > 0) let us show that Ak is positive definite. Let AI' A
2
, ... , Ak and ~ I ' ~ 2 ' ... ,
~ k  I be eigenvalues of Ak and A
k

I
respectively. By Corollary
'k
j
;?:Ilj >0 forj= 1,2, ... , kl
Since detA
k
= AI/"'2 ... AkI/"'k > 0, the last eigenvalue Ak must also be positive. Therefore,
since all its eigenvalues are positive, the matrix Ak is positive definite.
Chapter 9
Advanced Spectral Theory
CayleyHamilton Theorem
Theorem (CayleyHamilton). Let a be a square matrix, and let pCA) = det(A I) be its
characteristic polynomial. Then peA) = o.
The proof looks ridiculously simple: plugging A instead of in the definition of the
characteristic polynomial, we get
peA) = det(A  AI) = det 0 = o.
But this is a wrong proofl To see why, let us analyse what the theorem states. It states,
that if we compute the characteristic polynomial
n
det(A  Ai) = peA) = LCk
Ak
k=O
and then plug matrix A instead of A to get
n k n
peA) = L ck
A
= col + cIA + .. . cnA
k=O
then the result will be zero matrix.
It is not clear why we get the same result if we just plug A instead of A in the determinant
det(A  AI). Moreover, it is easy to see that with the exception of trivial case of 1 x 1
matrices we will get a dierent object.
Namely, A  AI is zero matrix, and its determinant is just the number O.
But peA) is a matrix, and the theorem claims that this matrix is the zero matrix. Thus
we are comparing apples and oranges. Even though in both cases we got zero, these are
dierent zeroes: he number zero and the zero matrix!
Let us present another proof, which is based on some ideas from analysis. A
"continuous" proof. The proof is based o,n several observations. First of all, the theorem is
trivial for diagonal matrices, and so for matrices similar to diagonal (Le. for diagonalizable
matrices). The second observation is that any matrix can be approximated (as close as we
Advanced Spectral Theory 235
want) by diagonalizable matrices. Since any operator has an upper triangular matrix in
some orthonormal basis, we can assume without loss of generality that a is an upper
triangular matrix.
We can perturb diagonal entries of A (as little as we want), to make them all dierent,
so the perturbed matrix A is diagonalizable (eigenvalues of a a triangular matrix are its
diagonal entries, and by Corollary an n x n matrix with n distinct eigenvalues is
diagonalizable). We can perturb the diagonal entries of A as little as we want, so Frobenius
norm II A  A 112 is as small as we want. Therefore one can find a sequence of diagonalizable
matrices Ak such that Ak 7 A as k 7 00 for example such that Ak  Ak 7 A as k 7 00).
It can be shown that the characteristic polynomials pl)..) = det(Ak IJ) converge to the
characteristic polynomial pC)..) = det(A  A1) of A. Therefore
peA) = lim Pk(A
k
)·
k7
00
But as we just discussed above the CayleyHamilton Theorem is trivial for
diagonalizable matrices, so piAk) = O. Therefore peA) = lim
k
7oo 0 = O.
This proof is intended for a reader who is comfortable with such ideas from analysis
as continuity and convergence. Such a reader should be able to fill in all the details, and
for him/her this proof should look extremely easy and natural.
However, for others, who are not comfortable yet with these ideas, the proof definitely
may look strange. It may even look like some kind of cheating, although, let me repeat that
it is an absolutely correct and rigorous proof (modulo some standard facts in analysis). So,
let us resent another, proof of the theorem which is one of the "standard" proofs from
linear algebra textbooks.
A "standard" proof We know, see Theorem, that any square matrix is unitary
equivalent to an upper triangular one. Since for any polynomial p we have p(UA[jI) =
Up(A)[jI, and the characteristic polynomials of unitarily equivalent matrices coincide, it
is sucient to prove the theorem only for upper triangular matrices. So, let A be an upper
triangular matrix. We know that diagonal entries of a triangular matrix coincide with it
eigenvalues, so let AI' A
2
, ... , An be eigenvalues of A ordered as they appear on the diagonal,
so
*
A=
o An
The characteristic polynomial p(z) = det(A  z1) of A can be represented as
p(z) = (A\  Z)(A2  z) ... (An  z) = (_l)n (z  AI)(Z  A
2
) ... (z  An)'
so
236 Advanced Spectral Theory
Define subspaces
Ek := span{e
l
, e
2
, ... , e
k
},
where e
l
, e
2
, ... , An is the standard basis in en. Since the matrix of A is upper triangular,
the subspaces Ek are socalled invariant subspaces of the operator A, i.e. AEk c Ek (meaning
that Av E Ek for all v E E
k
). Moreover, since for any v E Ek and any A
(A  Al)v =Av  AV E E
k
,
because both Av and AV are in E
k
. Thus (A  Al)Ek c E
k
, i.e. Ek is an invariant subspace
of A  Al. We can say even more about the the subspace (A  A ~ E k . Namely,
(A  A ~ e k E span{e
l
, e
2
, ... , e
k
_
I
},
because only the first k  1 entries of the kth column of the matrix of A  At! can be
nonzero. On the other hanc!, for j < k we have
(A  Ak)e
j
E E
j
C Ek
(because E
j
is an invariant subspace of A  A ~ .
Take any vector v E E
k
. By the definition of Ele it can be represented as a linear
combination of the vectors e
1
, e
2
, ... , e
k
. Since all vectors e
l
, e
2
, ... , e
le
are transformed by
A  At! to some vectors in E
k
_1' we can conclude that
get
(A  A ~ V E E
k
_
1
\/v E E
k
•
Take an arbitrary vector x E en = En" Applying inductively with k = n, n  1, ... , 1 we
XI := (A  Anl)x E EnI'
X
2
:= (A In_ll)x
I
= (A In_Il)(A Inl) X E En2'
Xn := (A 1
2
l)x
n
_
1
= (A 1
2
l) ... (A  I n_Il)(A Inl)x EEl·
The last inclusion mean that xn = ae
l
• But (A  AIl)e
l
= 0, so
0= (A  AIl)x
n
= (A  AIl)(A  A
2
l) ... (A  Anl)X.
Therefor.:: p(A)x = 0 for all X E en , which means exactly that peA) = O.
Spectral Mapping Theorem
Polynomials of operators. Let us also recall that for a square matrix (an operator) A
and for a polynomial p(z) = the operator peA) is defined by substituting a instead of the
independent variable,
N Ie 2 N
p(A):= LaleA =aOI+a1A+a2A + ... +aNA ;
k=1
here we agree that AO = I
We know that generally matrix multiplication is not commutative, i.e. generally AB
::j::. BA so the order is essential. However
Aylj = AiAk = Ak+j,
and from here it is easy to show that for arbitrary polynomials p and q
Advanced Spectral Theory
p(A)q(A) = q(A)p(A) = R(A)
where R(z) = p(z)q(z).
237
That means that when dealing only with polynomials of an operator A, one does not
need to worry about noncommutativity, and act like a is simply an independent (scalar)
variable. In particular, if a polynomial p(z) can be represented as a product of monomials
p(z) = a(z  z\)(z  z2) ... (z  zN)'
where z \' z2' ... , z N are the roots of p, then peA) can be represented as
peA) = a(A  n\l)(A  zNl)
Spectral Mapping Theorem. Let us recall that the spectrum seA) of as quare matrix (an
operator) A is the set of all eigenvalues of A (not counting multiplicities.
Theorem (Spectral Mapping Theorem). For a square matrix a and an arbitrary polynomial
p
cr(p(A)) = p(cr(A)).
In other words, ~ is an eigenvalue of peA) if and only if ~ = peA) for some eigenvalue
of A. Note, that as stated, this theorem does not say anything about multiplicities of the
eigenvalues.
Remark. Note, that one inclusion is trivial. Namely, if is an eigenvalue of A, Ax = x
for some x =1= 0, then Akx = Akx, and p(A)x = p(A)x, so peA) is an eigenvalue of peA). That
means that the inclusion p(cr(A)) c cr(p(A)) is trivial. Ifwe consider a particular case ~ =
o of the above theorem, we get the following corollary.
Corollary. Let a be a square matrix with eigenvalues 1, 2, ... , n and let p be a
polynomial. Then peA) is invertible if and only if
peAk) =1= 'v'k = 1,2, ... n.
Proof As it was discussed above, the inclusion p( cr(A)) c (P(A)) is trivial.
To prove the opposite inclusion cr(p(A)) cp(cr(A)) take a point ~ E cr(p(A)). Denote
q(z) = p(z)  ~ , so q(A) = peA)  ~ 1 .
Since ~ E (P(A)) the operator q(A) = peA)  ~ I is not invertible. Let us represent the
polynomial q(z) as a product of monomials,
q(z) = a(z  z\)(z  z2) ... (z  zN).
Then, as it was discussed above in Section 2.1, we can represent
q(A) = a(A  ztl)(A  z2l) ... (A  zNl).
The operator q(A) is not invertible, so one of the terms A  z/ must be not invertible
(because a product of invertible transformations is always invertible). That means zk E
(A). On the other hand zk is a root of q, so
o = q(zk) = p(zk)  ~
and therefore ~ = p(zk). So we have proved the inclusion s(P(A)) c p((A)).
Invariant Subspaces
Definition. Let A : V ~ V be an operator (linear transformation) in a vector space V.
a subspace E of the vector space V is called an invariant subspace of the operator A (or,
shortly, A  invariant) if AE c E, i.e. if
238
Av E E for all vEE.
If E is Ainvariant, then
A2E = A(AE) cAE c E,
i.e. E is A
2
 invariant.
Advanced Spectral Theory
Similarly one can show (using induction, for example), that if AE c E. then
Att:
cE
\ t k ~ l ·
This implies that P(A)E c E for any polynomial p, i.e. that any Ainvariant subspace
E is an invariant subspace of peA). If E is an Ainvariant subspace, then for all vEE the
result Av also belongs to E. Therefore we can treat A as an operator acting on E, not on the
whole space V.
Formally, for an Ainvariant subspace E we define the socalled restriction AlE: E 7 E
of A onto E by
(AIE)v = Av \tv E E.
Here we changed domain and target space of the operator, but the rule assigning value
to the argument remains the same.
We will need the following simple lemma
Lemma. Let p be a polynomial, and let E be an Ainvariant subspace.
Then
p(AI
E
) = p(A)I
E
·
Proof The proof is trivial. If E1 ,E
2
, ... , Er a basis of Ainvariant subspaces, and AEk
:= AIEk are the corresponding restrictions, then, since AEk = Att:k c Ek the operators Ak
act independently of each other (do not interact), and to analyse action of A we can analyse
operators Ak separately.
In particular, if we pick a basis in each subspace Ek and join them to get a basis in V
then the operator a will have in this basis the following blockdiagonal form
o
A=
o
(of course, here we have the correct ordering of the basis in V, first we take a basis in
E1,then in E2 and so on). Our goal now is to pick a basis of invariant subspaces E
1
, E
2
, ... ,
Er such that the restrictions Ak have a simple structure. In this case we will get basis in
which the matrix of A has a simple structure.
The eigenspaces Ker(A  AI) would be good candidates, because the restriction of a
to the eigenspace Ker(A  AI) is simply At!. Unfortunately, as we know eigenspaces do
not always form a basis (they form a basis if and only if A can be diagonalized. However,
the socalled generalized eigenspaces will work.
Advanced Spectral Theory 239
Generalized Eigenspaces
Definition. A vector v is called a generalized eigenvector (corresponding to an
eigenvalue) if (A  ,)J/v = 0 for some k ~ 1.
The collection EA, of all generalized eigenvectors, together with 0 is called the
generalized eigenspace (corresponding to the eigenvalue A.
In other words one can represent the generalized eigenspace EA, as
EA, = UKer(AAI)k.
k ~ 1
The sequence Ker(A  Ali, k = 1,2, 3, ... is an increasing sequence of subspaces, i.e.
Ker(A  AI)k c Ker(A  AI)k+ 1 V k ~ k" .
The representation does not look very simple, for it involves an infinite union. However,
the sequence of the subspaces Ker(A  A.I)k stabilizes, i.e.
Ker(A  Hi c Ker(A  'M)k+
1
V k ~ 1. ,
so, in fact one can take the finite union.
To show that the sequence of kernels stabilizes, let us notice that if for finite
dimensional subspaces E and F we have E <; F (symbol E <; F means that E c F but
E:;:. F), then dim E < dim F. Since dimKer(A  I)k ~ dim V < 00 , it cannot grow to infinity,
so at some point
Ker(A  I)k = Ker(A  II)k+ I.
The rest follows from the lemma below.
Lemma. Let for some k
Ker(A  I)k = Ker(A  AI)k+ I.
Then
Ker(A _l)k+r = Ker(A  'Al)k+r+l Vr ~ O.
Proof Let v E Ker(A  I)k+r+l, i.e. (A  Ai)k+r+1v = O. Then
w := (A  ll)r E Ker(A  ll)k+
1
.
But we know that Ker(A _'M)k = Ker(A _'M)k+l so WE Ker(A _'M)k,
which means (A _l)kw = O. Recalling the definition of w we get that
(A  A.I)k+r v = (A  A.I)k w = 0
so V E Ker(A  Al)k+r. We proved that Ker(A _l)k+r+l c Ker(A _'M)k+r.
The opposite inclusion is trivial.
Definition. The number d = d(A) on which the sequence Ker(A  A.I)k stabilizes, i.e.
the number d such that
Ker(A _'M)dI C Ker(A _'M)d = Ker(A  Al)d+l
oct
is called the depth of the eigenvalue A.
It follows from the definition of the depth, that for the generalized eigenspace EA,
240 Advanced Spectral Theory
(A  'AJ)d(')..)v = Vv E E/...
Now let us summarize, what we know about generalized eigenspaces.
1. E is an invariant subspace of A, AE, E.
2. If d(A) is the depth of the eigenvalue, then
«A  Al)IEJ
d
(')..) = (AlE')..  A1n)d(')..) = 0.
3. cr(AIE')..) = {A}, because the operator AlE')..  AI£)." is nilpotent, see 2, and the
spectrum of nilpotent operator consists of one point 0,
Now we are ready to state the main result of this section. Let A : V ~ V.
Theorem. Let (A) consists of r points AI' A
2
, ... , A,. and let Ek : = E k be the corresponding
generalized eigenspaces. Then the system of subspace E
J
,E
2
, ... , Er is a basis of subs paces
in V
Remark. Ifwe join the bases in all generalized eigenspaces E
k
, then by Theorem, we
will get a basis in the whole space. In this basis the matrix of the operator a has the block
diagonal form A = diag{Al'A
2
, ... , Ar}, where Ak := AI
Ek
, Ek = E')..e' It is also easy to see,
that the operators Nk := Ak  AkIEk are nilpotent, Ntk = 0.
Proof Let m
k
be the multiplicity of the eigenvalue A
k
, so p(z) = II::1 (z  Ak )mk is
the characteristic polynomial of A. Define
Lemma.
II
A m;
pk(z) = p(Z)/(ZAkt'k = . (z j) .
hk
(A  AkI)mk I Ek = 0,
Proof There are 2 possible simple proofs. The first one is to notice that m
k
;::: d
k
,
where d
k
is the depth of the eigenvalue Ak and use the fact that
'I d
k
'I mk
(A  I\,k I ) I Ek = (Ak  I\,k I E
k
) = 0,
where Ak := A I Ek (property 2 of the generalized eigenspaces).
The second possibility is to notice that according to the Spectral Mapping Theorem,
the operator PiA) I Ek = piAk) is invertible. By the CayleyHamilton Theorem.
0= peA) = (A  AkI)/II
k
(A),
and restriction all operators to Ek we get
0= p(A
k
) = (Ak  AkI E. )mk Pk (A
k
),
K
so
I11k 1 1
(AkAkIE
k
) =p(Ak)Pk(A
k
) =OPkCA
k
) =0.
To prove the theorem define
Advanced Spectral Theory 241
r
q(z) = LPk(z)
k=1
Since Pk( j) = 0 for j ;j:. k and Pk (k) ;j:. 0, we can conclude that q(k) if:. 0 for all k.
Therefore, by the Spectral Mapping Theorem, the operator
B = q(A)
is invertible.
Note that BEk c Ek (any A  invariant subspace is also peA)  invariant).
Since B is an invertible operator, dim(BE
k
) = dim E
k
, which together with BEk c Ek
implies BEk = E
k
. MUltiplying the last identity by SI we get that SIEk = E
k
, i.e. that Ek is
an invariant subspace of S1.
Note also, that it follows from that
piA) I
Ej
= 0 Vj;j:. k ,
because piA) I E
j
= Pk(A) and piA) contains the factor (Aj 'A/
Ej
)mj = O.
Define the operators P k by
P
k
= SlpiA).
Lemma. For the operators P
k
defined above
1. PI + P 2 + ... + P r = I;
2. PklEj = 0 for} ;j:. k;
3. RanP
k
c E
k
;
4. Moreover, P kV = vVv E E
k
, so, in fact Ran P k = E
k
•
Proof Property 1 is trivial:
r r
'" 1", 1
L.,.Pk =B L.,.PkPPk(A)=B B=l.
k=1 k=1
Indeed,Pk(A) contains the factor (A  'A)m
j
, restriction of which to E
j
is zero. Therefore
piA) I E
j
= 0 and thus
P
k
I E
j
= SI piA) I E
j
= O.
To prove property 3, recall that according to CayleyHamilton Theorem peA) = O.
Since
we have for w = piA)v
'\ /Ilk /Ilk
(A I'vkl) w = (A  Akl) pk(A)v = p(A)v = O.
That means, any vector w in Ran piA) is annihilated by some power of (A  kJ),
which by definition means that Ran Pk(A) c E
k
.
To prove the last property, let us notice that it follows from that for v E Ek
242
r
Pk(A)v = LPj(A)v = Bv,
j=l
Advanced Spectral Theory
which implies Pkv= glBv= v. Now we are ready to complete the proofofthe theorem.
Take v E Vand define
v
k
= Pkv.
Then according to Statement 3 of the above lemma, v
k
E E
k
, and by statement,
r
V= LVk'
k=l
so v admits a representation as a linear combination.
To show that this representation is unique, we can just note, that if v is represented as
v = "" r vk' E E
k
, then it follows from the Statements 2 and 4 of the lemma that
L.Jk=l
Pkv = Pivl + v
2
+ ... + v
r
) = Pkv
k
= v
k
·
Geometric Meaning of Algebraic Multiplicity
Proposition. Algebraic multiplicity of an eigenvalue equals to the dimension of the
corresponding generalized eigenspace.
Proof Ifwe joint bases in generalized eigenspaces Ek = EM to get a basis in the whole
space, the matrix of a in any such basis has a blockdiagonal form diag{AI.A2' ... ,A
r
},
where
Operators
are nilpotent, so
a(N
k
) = {O}.
Therefore, the spectrum of the operator Ak (recall that
Ak =N
k
 Ai)
consists of one eigenvalue k of (algebraic) multiplicity
n
k
= dimE
k
·
The multiplicity equals n
k
because an operator in a finitedimensional space V has
exactly dim V eigenvalues counting multiplicities, and Ak has only one eigenvalue.
Note that we are free to pick bases in E
k
, so let us pick them in such a way that the
corresponding blocks Ak are upper triangular. Then
r r
det(A  AI) = IT det(Ak  AI Ek = IT (I"k  A)nk .
k=1 k=1
But this means that the algebraic multiplicity of the eigenvalue Ak is
n
k
= dimEj,i
An important application. The following corollary is very important for dierential
equations.
Advanced Spectral Theory 243
Corollary. Any operator A in V can be represented as A = D + N, where D is
diagonalizable (i.e. diagonal in some basis) and N is nilpotent (NI1l = 0 for some m), and
DN=ND.
Proof As we discussed above, if we join the bases in Ek to get a basis in V, then in
this basis A has the block diagonal form A = diag{A\,A2' ... ,A
r
}, where
Ak :=A I E
k
, Ek = E
k
·
The operators Nk := Ac AJ Ek are nilpotent, and the operator in the equation.
D = diag{Al lEI' A21 E2 ... , AJ Er }
Notice also that
AkIEkNk = NkAkIEk
(identity operator commutes with any operator), so the block diagonal operator
N = diag{N\,N
2
, ... ,N
r
}
commutes with
D,DN=ND.
Therefore, defining N as the block diagonal operator
N = diag{N
1
, N
2
, ... , N
r
}
we get the desired decomposition.
This corollary allows us to compute functions of operators. Let us recall that if p is a
polynomial of degree d, then p(a + x) can be computed with the help of Taylor's formula
d (k)
p(a+x) = L p (a) i
k=O k!
This formula is an algebraic identity, meaning that for each polynomial p we can check
that the formula is true using formal algebraic manipulations with a and x and not caring
about their nature. Since operators D and N commute,
DN=ND,
the same rules as for usual (scalar) variables apply to them, and we can write (by
plugging D instead of a and N instead of x
d p(k)(D) k
p(A) = p(D + N) =.to k! N
Here, to compute the derivative p(k)(D) we first compute the kth derivative of the
polynomial p(x) (using the usual rules from calculus), and then plug D instead of x. But
since N is nilpotent,
N"' = 0
for some m, only first m terms can be nonzero, so
ml/k)(D) k
p(A)=p(D+N)= ~ k! N.
244 Advanced Spectral Theory
In m i., much smaller than d, this formula makes computation of peA) much easier.
The same approach works if p is not a polynomial, but an infinite power series. For
general power series we have to be careful about convergence of all the series involved, so
we cannot say that the formula is true for an arbitrary power series p(x). However, if the
radius of convergence of the power series is 1, then everything works fine. In particular, if
p(x) = eX, then, using the fact that (eXy = eX we get.
ml D ml
e
A
= L =Nk = e
D
= L ..!..N
k
k=O k! k=O k!
This formula has important applications in dierential equation. Note, that the fact that
ND=DN
is essential here!
Structure of Nilpotent Operators
Recall, that an operator a in a vector space V is called nilpotent if Ak = 0 for some
exponent k.
In the previous section we have proved, that if we join the bases in all generalized
eigenspaces
Ek = Ek
to get a basis in the whole space, then the operator a has in this basis A block diagonal
form diag{AI' A
2
, ... , Ar} and operators Ak ca be represented as
Ak = Ai +N
k
,
where Nk are nilpotent operators.
In each generalized eigenspace Ek we want to pick up a basis such that the matrix of
Ak in this basis has the simplest possible form. Since matrix (in any basis) of the identity
operator is the identity matrix, we need to find a basis in which the nilpotent operator Nk
has a simple form. Since we can deal with each Nk separately, we will need to consider the
following problem:
For a nilpotent operator A find a basis such that the matrix of a in this basis is simple.
Let see, what does it mean for a matrix to have a simple form. It is easy to see that the
matrix
is nilpotent.
o 1 0
o
o
o
1
o
These matrices (together with 1 x 1 zero matrices) will be our "building blocks".
Namely, we will show that for any nilpotent operator one can find a basis such that the
Advanced Spectral Theory 245
matrix of the operator in this basis has the block diagonal form diag{A l' A
2
, ... , Ar}, where
each A k is either a block of form or a 1 x I zero block.
Let us see what we should be looking for. Suppose the matrix of an operator A has in
a basis vI' v'), ... , V the form (4.1). Then
 p
and
Av  °
1
AV
k
+
I
= v
k
, k = 1,2, ... , p  1.
Thus we have to be looking for the chains of vectors vI' v
2
, ... , vp satisfying the above
relations.
Cycles of Generalized Eigenvectors
Definition. Let a be a nilpotent operator. a chain of nonzero vectors vI' v
2
, ... , vp
satisfying relations is called a cycle of generalized eigenvecton of A. The vector vI is
called the initial vector of the cycle, the vector V p is called the end vector of the cycle, and
the number p is called the length of the cycle.
Remark. A similar definition can be made for an arbitrary operator. Then all vectors
vk must belong to the same generalized eigenspace E, and they must satisfy the identities
(A lJ)vI = 0, (A  'Al)v
k
+
1
= v
k
, k = 1,2, ... , p  I,
Theorem. Let a be a nilpotent operator, and let C
I
, C
2
, ... , C
r
be cycles of its generalized
eigenvectors, C
k
= v;, V;k' V;k' Pk being the length of the cycle C
k
. Assume that the initial
vectors VII, , ... , v; are linearly independent. Then no vector belongs to two cycles, and
the union of all the vectors from all the cycles is a linearly independent.
Proof Let
n = PI + P2 + ... + Pr
be the total number of vectors in all the cycles. We will use induction in n. Ifn = 1 the
theorem is trivial.
Let us now assume, that the theorem is true for all operators and for all collection of
cycles, as long as the total number of vectors in all the cycles is strictly less than n. Without
loss of generality we can assume that the vectors span the whole space V, because,
otherwise we can consider instead of the operator A its restriction onto the invariant subspace
: k= 1,2, ... ,1', I j Pk}'
Consider the subspace RanA. It follows from the relations that vectors
:k=I,2, ... ,r,1
span Ran A. Note that if Pk > 1 then the system v;, v;, ... , V;kl is a cycle, and that A
annihilates any cycle of length 1. Therefore, we have finitely many cycles, and initial vectors
ofthese cycles are linearly independent, so the induction hypothesis applies, and the vectors
246 Advanced Spectral Theory
k·k=12 1<· 1
Vj . " ... , r,  J ::; Pk
are linearly independent. Since these vectors also span Ran A, we have a basis there.
Therefore,
rank A = dim Ran A = n  r
k
(we had n vectors, and we removed one vector v Pk from each cycle C
k
, k = 1, 2, ... , r,
so we have nr vectors in the basis v ~ : k = 1, 2, ... , r, 1 ::; j ::; Pk 1 ). On the other hand
Ai =0
I
for k = 1, 2, ... , r, and since these vectors are linearly independent dim Ker A ;;::: r. By
the Rank Theorem.
dimV= rankA + dimKerA = (n  r) + dimKer A ;;::: (n  r) + r = n
so dim V;;::: n.
On the other hand V is spanned by n vectors, therefore the vectors
v ~ : k= 1,2, ... , r, 1 ::; j ::; Pk'
form a basis, so they are linearly independent
Jordan Canonical form of a Nilpotent operator.
Theorem. Let A : V ~ V be a nilpotent operator. Then V has a basis consisting of
union of cycles of generalized eigenvectors of the operator A.
Proof We will use induction in n where n = dim V. For n = 1 the theorem is trivial.
Assume that the theorem is true for any operator acting in a space of dimension strictly
less than n. Consider the subspace X= Ran A. Xis an invariant subspace of the operator A,
so we can consider the restriction Alx.
Since A is not invertible, dim Ran A < dim V, so by the induction hypothesis there
exist cycles C I' C
2
, ... , C
r
of generalized eigenvectors such that their union is a basis in X.
Let
k k k
C k = vI ,v2'···' v pk
where v: is the initial vector of the cycle.
Since the end vector V;k belong to Ran A, one can find a vector V;k+1 such that
k
AV
pk
+
1
= vpk·
So we can extend each cycle C
k
to a bigger cycle C
k
= v: ,v; , ... V;k' V;k+1. Since the
initial vectors v: of cycles C k' vk = 1,2, ... , v:, v; , ... V;k' V;k+1 are linearly independent,
Theorem implies that the union of these cycles is a linearly independent system. By the
definition of the cycle we have v: E Ker A, and we assumed that the initial vectors v: ' k =
1, 2, ... , r are linearly independent. Let us complete this system to a basis in KerA, i.e. let
Advanced Spectral Theory 247
find vectors U
I
, Ul! ... , u
q
such that the system, u
I
' tl2' ... , u
q
is a basis in Ker A (it may
happen that the system v;,k = 1,2, .. . ,r is already a basis in Ker A, in which case we put
q = 0 and add nothing).
The vector u
j
can be treated as a cycle of length !, so we have a collection of cycles
C
I
,C
2
, ... ,C
r
, u
I
' u
2
, ... , u
q
, whose initial vectors are linearly independent. So, we can
apply Theorem to get that the union of all these cycles is a linearly independent system. To
show that it is a basis, let us count the dimensions. We know that the cycles C
p
C
2
, ... , C
r
have
dim Ran A = rank A
vectors total. Each cycle C
k
was obtained from C
k
by adding 1 vector to it, so the
total number of vectors in all the cycles C
k
is rank A + r.
We know that
dim Ker A = r + q
12 r ..  
(because VI' vI , ... VI 'Ul' U2" .. U
q
IS a baSIS there). We added to the cycles C
l
, C
2
, . .. , C
r
additional q vectors, so we got
rank A + r + q = rank A + dimKer A = dim V
linearly independent vectors. But dim V linearly independent vectors is a basis.
Definition. A basis consisting of a union of cycles of generalized eigenvectors of a
nilpotent operator a (existence of which is guaranteed by the Theorem) is called a Jordan
canonical basis for A. Note, that such basis is not unique.
Corollary. Let A be a nilpotent operator. There exists a basis (a Jordan canonical
basis) such that the matrix of A in this basis is a block diagonal diag {A].A
2
, ... ,A
r
}, where
all Ak (except may be one) are blocks ofform, and one of the blocks Ak can be zero.
The matrix of a in a Jordan canonical basis is called the Jordan canonicalform of the
operator A. We will see later that the Jordan canonical formis unique, if we agree on how
to order the blocks (i.e. on how to order the vectors in the basis).
Proof According to Theorem one can find a basis consisting of a union of cycles of
generalized eigenvectors. a cycle of size p gives rise to a p x p diagonal block, and a cycle
of length 1 correspond to a 1 x 1 zero block. We can join these 1 x 1 zero blocks in one large
zero block (because odiagonal entries are 0).
Dot diagrams. Uniqueness of the Jordan canonical form. There is a good way of
visualizing Theorem and Corollary, the socalled dot diagrams. This methods also allows
us to answer many natural questions, like "is the block diagonal representation given by
Corollary unique?"
Of course, if we treat this question literally, the answer is "no", for we always can
change the order of the blocks. But, if we exclude such trivial possibilities, for example by
agreeing on some order of blocks (say, if we put all nonzero blocks in decreasing order,
and then put the zero block), is the representation unique, or not?
248
0 1
• •••••
0 1
• ••
0
• ••
0 1
•
0
•
0
0 1
0
Advanced Spectral Theory
o 0
o 1
o
o 0
o 0
o 0
o
Fig. Dot Diagram and Corresponding Jordan Canonical form of a Nilpotent Operator
To better understand the structure of nilpotent operators, let us draw the socalled dot
diagram. Namely, suppose we have a basis, which is a union of cycles of generalized
eigenvalues. Let us represent the basis by an array of dots, so that each column represents
a cycle. The first row consists of initial vectors of cycles, and we arrange the columns
(cycles) by their length, putting the longest one to the left.
On the figure 1 we have the dot diagram of a nilpotent operator, as well as its Jordan
canonical form. This dot diagram shows, that the basis has 1 cycle oflength 5, two cycles
of length 3, and 3 cycles of length 1. The cycle of length 5 corresponds to the 5 x 5 block
of the matrix, the cycles of length 3 correspond to two 3 nonzero blocks. Three cycles of
length 1 correspond to three zero entries on the diagonal, which we join in the 3 x 3 zero
block. Here we only giving the main diagonal of the matrix and the diagonal above it; all
other entries of the matrix are zero.
Ifwe agree on the ordering of the blocks, there is a onetoone correspondence between
dot diagrams and Jordan canonical forms (for nilpotent operators). So, the question about
uniqueness of the Jordan canonical form is equivalent to the question about uniqueness of
the dot diagram. To answer this question, let us analyse, how the operator A transforms the
dot diagram. Since the operator A annihilates initial vectors of the cycles, and moves vector
v
H1
of a cycle to the vector v
k
, we can see that the operator a acts on its dot diagram by
deleting the first (top) row of the diagram.
The new dot diagram corresponds to a Jordan canonical basis in Ran A, and allows us
to write down the Jordan canonical form for the restriction A IRan A.
Advanced Spectral Theory 249
Similarly, it is not hard to see that the operator Ak removes the first k rows of the dot
diagram. Therefore, if for all k we know the dimensions dimKer(A"), we know the dot
diagram of the operator A. Namely, the number of dots in the first row is dimKerA, the
number of dots in the second row is
dimKer(A2)  dimKer A,
and the number of dots in the kth row is
dimKer(A")  dimKer(A
k
+
1
).
But this means that the dot diagram, which was initially defined using a Jordan
canonical basis, does not depend on a particular choice of such a basis. Therefore, the dot
diagram, is unique.
This implies that if we agree on the order of the blocks, then the Jordan canonical
form is unique. Computing a Jordan canonical basis. Let us say few words about computing
a Jordan canonical basis for a nilpotent operator. Let p} be the largest integer such that
ApI "* 0 (so APl+} = 0). That PI is the length of the longest cycle.
Computing operators
Ak, k= 1,2, ... , PI'
and counting dimKer(A") we can construct the dot diagram of A. Now we want to put
vectors instead of dots and find a basis which is a union of cycles.
We start by finding the longest cycles (because we know the dot diagram, we know
how many cycles should be there, and what is the length of each cycle). Consider a basis
in the column space Ran(Apl). Name the vectors in this basis v:' , ... , V(, these will be
the initial vectors of the cycles.Then we find the end vectors of the cycles V!I"'"
by solving the equations
Pi k k
A v
pI
=1' k = 1,2, ... ,rI·
Applying consecutively the operator a to the end vector ' we get all the vectors
in the cycle. Thus, we have constructed all cycles of maximal length.
Let P2 be the length of a maximal cycle among those that are left to find. Consider the
subspace Ran(AP2), and let dim Ran(A
P2
) = r
2
. Since Ran(A
PI
) c Ran(AP2), we can
complete the basis v:, , ... , V( to a basis v:, ... , v( , V(+l , ... , V( in Ran (A
P2
). Then
we find end vectors of the cycles C'1+
I
, ... ,CI'.2 by solving (for the equations
PI k k
A v
p2
=vI, k=r} + l,r
I
+2, ... ,r
2
,
thus constructing the cycles of length P2'
Let P3 denote the length of a maximal cycle among ones left. Then, completing the
basis v:, , ... , V( in Ker( A
P2
) to a basis in Ker( A
P3
we construct the cycles of length P3'
and so on.
250 Advanced Spectral Theory
One final remark: as we discussed above, if we know the dot diagram, we know the
canonical form, so after we have found a Jordan canonical basis, we do not need to compute
the matrix of a in this basis: we already know it.
Jordan Decomposition Theorem
Theorem. Given an operator A there exist a basis (Jordan canonical basis) such that
the matrix of a in this basis has a block diagonal form with blocks of form
Iv 1 0
Iv 1
Iv
1
Iv
where Iv is an eigenvalue of A. Here we assume that the block of size 1 is just 'A.
The block diagonal form from Theorem is called the Jordan canonical form of the
operator A. The corresponding basis is called a Jordan canonical basis for an operator A.
Proof Ifwe join bases in the generalized eigenspaces
Ek = EM
to get a basis in the whole space, the matrix of a in this basis has a block diagonal
form diag {A
I
,A2' ... ,A
r
}, where
The operators
Nk =A
k
 'A/
Ek
are nilpotent, so by Theorem one can find a basis in Ek such that the matrix of Nk in
this basis is the Jordan canonical form of N
k
. To get the matrix of Ak in this basis one just
puts k instead of 0 on the main diagonal.
First of all let us recall that the computing of eigenvalues is the hardest part, but here
we do not discuss this part, and assume that eigenvalues are already computed. For each
eigenvalue we compute subspaces
Ker(A  ,J'l, k = 1,2, ...
until the sequence of the subspaces stabilizes. In fact, since we have an increasing
sequence of subs paces (Ker(A _'JJ)k c Ker(A  1v1)k+I), then it is sucient only to keep track
oftheir dimension (or ranks of the operators (A _'JJ)k. For an eigenvalue let m = m be the
number where the sequence Ker(A _'JJ)k stabilizes, i.e. m satisfies
dimKer(A _'JJ)ml < dimKer(A  'Al)m = dim Ker(A _'JJ)m+l.
Then
E" = Ker(A  'Al)m
is the generalized eigenspace corresponding to the eigenvalue.
After we computed all the generalized eigenspaces there are two possible ways of
Advanced Spectral Theory 251
action. The first way is to find a basis in each generalized eigenspace, so the matrix of the
operator a in this basis has the blockdiagonal form diag{A1.A2, ... , Ar}, where
Ak =AIE
Ak
·
Then we can deal with each matrix Ak separately. The operators
Nk = A
k
 ')..i
are nilpotent, so applying the algorithm described in Section 4.4 we get the Jordan
canonical representation for N
k
, and putting k instead of 0 on the main diagonal, we get the
Jordan canonical representation for the block A
k
• The advantage of this approach is that
we are working with smaller blocks. But we need to find the matrix of the operator in a
new basis, which involves inverting a matrix and matrix multiplication.
Another way is to find a Jordan canonical basis in each of the generalized eigenspaces
EAk by working directly with the operator A, without splitting it first into the blocks.
Again, the algorithm works with a slight modification. Namely, when computing a Jordan
canonical basis for a generalized eigenspace EAk ' instead of considering subspaces Ran(Ak
 ')..,/)1, which we would need to consider when working with the block Ak separately, we
consider the subspaces (A  ')..,/)1 EAk •
Chapter 10
Linear Transformations
Euclidean Linear Transformations
By a transformation from IR n into IR m , we mean a function of the type T: IR n + IR m ,
with domain IR
n
and codomain IR
m
, For every vector x E IRII, the vector T(x) E IR
m
is
called the image of x under the transformation T, and the set
R(I) = {T(x) : x E IRII},
of all images under T, is called the range of the transformation T,
Remark. For our convenience later, we have chosen to use R(I) instead of the usual
T( IR II) to denote the range of the transformation T,
For every x = (xl' .. " Xn) E IR
n
, we can write
T(x) = T(x
l
, .. " xn) = (YI' .. " Y
m
):
Here, for every i = 1, .. ,' m, we have
Yi = Tixl' .. " xn)' (1)
where Ti : IR II 7 IR is a real valued function,
Definition, A transformation T: IR II + IR m is called a I inear transformation if there
exists a real matrix
amI a
mn
such that for every
we have
where such that for every
Linear Transformations
we have
where
or, in matrix notation,
The matrix A is called the standard matrix for the linear transformation T.
Remarks. (1) In other words, a transformation
T: lR.
n
~ lR./11
is linear if the equation (l) for every i = 1, .. , ,m is linear.
253
(2) Ifwe write x E]Rn and y E JRm as column matrices, then (2) can be written in the
form
y=Ax,
and so the linear transformation T can be interpreted as multiplication of x E]Rn by
the standard matrix A.
Definition. A linear transformation T:]R11 7]R/11 is said to be a linear operator if n =
m. In this case, we say that T is a linear operator on JR
I1
•
Example. The linear transformation T:]Rs 7]R3 , defined by the equations
y) = 2x) + 3x
2
+ 5x3 + 7x4  9xs;
Y2 = 3x
2
+ 4x3 + 2xs;
Y3 =x) + 3x3 2x
4
;
can be expressed in matrix form as
(
; ~ J = ( ~ ~ ~ ~ :]
Y3 1 0 3 2 0
254
Linear Transformations
1
3 5 7
n
°
3 4 0 1
°
3 2
°
so that
T(1, 0, 1,0, 1) = (2,6,4).
Example. Suppose that A is the zero m x n matrix. The linear transformation T: lR n
lR
m
, where T(x) = Ix for every x E lR
n
, is the zero transformation from Rn into lRn.
Clearly T(x) = 0 for every x E lR
n
•
Example. Suppose that I is the identity n x n matrix. The linear operator T: lR n t lR m ,
where T(x) = Ix for every x E lR
n
, is the identity operator on Rn. Clearly T(x) = x for every
xE lRn.
PROPOSITION. Suppose that T: lR
n
t lR
m
is a linear transformation, andthatfel'
... , eng is the standard basis for lR n. Then the standard matrix for T is given by
A = (T(e
l
) ... T(enY;
Linear Operators on ]R
2
In this section, we consider the special case when n = T(xl,x 3)' and study linear operators
on ]R2. For every x E ]R2 , we shall write x = (xl' x
2
).
Example. Consider reection across the x
2
axis, so that T(x
l
, x
2
) = (xl' x
2
). Clearly we
have
and so it follows from Proposition that the standard matrix is given by
It is not dicult to see that the standard matrices for reection across the xlaxis and
across the line Xl = x
2
are given respectively by
and
Also, the standard matrix for reection across the origin is given by
Linear Transformations
A
We give a summary in the table below:
Linear operator Equations Standard matrix
Reection across x
2
axis nYI
Reection across X Iaxis
Reectio:1 across xl = x
2
Reection across origin
{
YI = xI
Y2 =x2
Y\ =X\
Y2 = x2
y\ =X2
Y2 =XI
{
YI = Xl (1 0)
Y2 = X2 01
255
Example. For orthogonal projection onto the xIaxis, we have T(x
l
, x
2
) = (XI' 0), with
standard matrix
Similarly, the standard matrix for orthogonal projection onto the x
2
axis is given by
We give a summary in the table below:
Linear operator Equations Standard matrix
Orthogonal projection onto X Iaxis
{
YI = xl (1 0)
Y2 = ° ° °
{
Y = ° (0 0)
Orthogonal projection onto x
2
axis 1_ ° 1
Y2  x2
Example. For anticlockwise rotation by an angle e, we have T(x I' x
2
) = (}II' Y2)' where
Y\ + iY2 = (Xl + ix
2
)(cose + i sine );
and so
(
Y\) = Sine) (XI)
Y2 SIn e cose x2
It follows that the standard matrix is given by
A = (cose sin e)
sine cose
256 Linear Transformations
We give a summary in the table below:
Linear operator Equations Standard matrix
{
Yl=XlCOSex
2
sine (cose sine)
Anticlockwise rotation by angle e {Y2 = xl cos e  x2 sin e sin e cos e
Example. For contraction or dilation by a nonnegative scalar k, we have
T(xI' x
2
) = (kxl' kx
2
),
with standard matrix
The operator is called a contraction if ° < k < 1 and a dilation if k > 1, and can be
extended to negative values of k by noting that for k < 0, we have
This describes contraction or dilation by nonnegative scalar k followed by reection
across the origin. We give a summary in the table below:
Linear operator Equations Standard matrix
Contraction or dilation by factor k
{
YI = kxl (ko k
O
)
Y2 = kx2
Example. For expansion or compression in the xldirection by positive factor k, we
have T(xl' x
2
) (kxl' x
2
), with standard matrix
This can be extended to negative values of by noting that for k < 0, we have
This describes expansion or compression in the xldirection by positive factor k
followed by reection across the x
2
axis. Similarly, for expansion or compression in the x
2

direction by nonzero factor k, we have the standard matrix
We give summary in the below:
Linear operator Equations Standard matrix
Expansion or compression in xldirection
{
Yl = kxl
Y2 =x2
Linear Transformations
{
Yl =xl
Expansion or compression in x
2
direction Y2 = kx2
Example. For shears in the xldirection with factor k, we have
T(x
l
, x
2
) = (xI + kx
2
, x
2
),
with standard matrix
A = ( ~ ~ ) .
For the case k = 1, we have the following.
T •
(k= 1)
For the case k = 1, we have the following.
T
(k= I)
)
Similarly, for shears in the x
2
direction with factor k, we have standard matrix
A = ( ~ ~ ) .
We give a summary in the table below:
Linear operator Equations
Shear in x ldirection
Shear in x
2
direction
{
Yl = xI +kx2
Y2 = x2
{
Yl = xl +kx2
Y2 =x2
Standard matrix
257
258 Linear Transformations
Example. Consider a linear operator T: ]R Z 7 ]Rz which consists of a reection across
the xzaxis, followed by a shear in the x1<iirection with factor 3 and then reection across
the xlaxis. To end the standard matrix, consider the eect of Ton a standard basis {e
1
, e
z
}
of ]R2 . Note that
el  ) M ( M ( M (
1
) = T( el) ,
ez
so it follows from Proposition that the standard matrix for Tis
A = (I 3).
o 1
Let us summarize the above and consider a few special cases. We have the following
table of invertible linear operators with k '* O. Clearly, if A is the standard matrix for an
invertible linear operator T, then the inverse matrix AI is the standard matrix for the inverse
linear operator '11.
Linnear operator T Standard matrix A Inverse matrix AI Linear operator r I
Reflection across
b) b)
ReflectIOn across
line XI =X2 line "1 =X2
Expansion or compressIOn
rl
0
Expansion or compression
in XI direction
0 1
Expansion or compressIOn
(b
1
J
ExpanSIOn or compression
in X2 direction
0
in X2 direction
Shear in XI direction
1 k 1 k
Shear in XI direction
0 1 0 1
Shear in x2 direction
1 0 1 0
Shear in x2 direction
k 1 k 1
Next, let us consider the question of elementary row operations on 2 x 2 matrices. It
is not dicult to see that an elementary row operation performed on a 2 x 2 matrix A has the
eect of multiplying the matrix A by some elementary matrix E to give the product EA. We
have the following table.
Elementary row operation
Interchanging the two rows
Multiplying row 1 by nonzero factor k
Multiplying row 2 by nonzero factor k
Elementary matrix E
Linear Transformations 259
Adding k times row 2 to row 1
Adding k times row 1 to row 2
Now, we know that any invertible matrix A can be reduced to the identity matrix by a
finite number of elementary row operations. In other words, there exist a finite number of
elementary matrices E
1
, ••• , Es of the types above with various nonzero values of k such
that
so that
1 1
A = El ... E
s
.
We have proved the following result.
Proposition. Suppose that the linear operator T: jR2 jR2 has standard matrix A,
where A is invertible. Then T is the product of a succession of nitely many reections,
expansions, compressions and shears.
In fact, we can prove the following result concerning images of straight lines.
Proposition. Suppose that the linear operator T: 1R.2 1R2 has standard matrix A,
where A is invertible. Then
(a) The image under T of a straight line is a straight line;
(b) The image under T of a straight line through the origin is a straight line
through the origin, and .
(c) The images under T of parallel straight lines are parallel straight lines.
Proof Suppose that T(x
l
• x
2
) = (Y1'Y2)' Since A is invertible, we have x =Aly, where
and
The equation of a straight line is given by (XXI + = 'Yor, in matrix form, by
(IX (1),
Hence
Let
(a' W) = (a
Then
260 Linear Transformations
(a
'
W) = (;J = (y).
In other words, the image under T of the straight line xI + x
2
== y is "(' YI + WY2 == ,,(,
clearly another straight line. This proves (a). To prove (b), note that straight lines through
the origin correspond to "( == 0, To prove (c), note that parallel straight lines correspond to
dierent values of for the same values of a and
Elementary Properties of Euclidean Linear Transformations
In this section, we establish a number of simple properties of euclidean linear
transformations.
Proposition. Suppose that Ii: IR
n
7 IRIll and T
2
: IRIll 7 IRk are linear
tramformations. Then T == T
2
. Ii: IR
n
7 IRk is also a linear transformation.
Proof Since TI and T2 are linear transformations, they have standard matrices A I and
A2 respectively. In other words, we have TI(x) ==Alx for every x E IR
n
and T
2
(y) ==A7,Y for
every Y E IRIll . It follows that T(x) == TiTI(x)) == Ay4lx for every Y E IR
n
, so that T has
standard matrix AzAI'
Example. Suppose that Ii: 7 IR2 is anti clockwise rotation by nl2 and
2 2
T2 : IR 7 IR is orthogonal projection onto the xIaxis. Then the respective standard
matrices are
AI ( and A, ( )
It follows that the standard matrices for T2 . TI and TI . T2 are respectively
Hence T2 . TI and TI . T2 are not equal.
Example. Suppose that Ii : IR2 7 IR2 is anticlockwise rotation by and T2 : IR2 7 IR2
is anticlockwise rotation by <j>. Then the respective standard matrices are
Al  . and A = .
_ (COS8 sin8) (cos<j> sin<j»
smS cos8 2 sin<j> cos<j>
It follows that the standard matrix for T2 . TI is
(
cOS<j>COS8  sin <j>sin 8 cos<j>sin 8  sin <j>COS8)
AA
2 1 sin<j>cos8+cos<j>sin8 cos<j>cos8sin<j>sin8
= (COS(<j>+8) Sin(<j>+8)).
sine <j> + 8) cos( <j> + 8)
Linear Transformations 261
Hence T2 0 TI is anticlockwise rotation by <p + e.
Example. The reader should check that in lR 2 , reaction across the x Iaxis followed
by reection across the x)axis gives reection across the origin. Linear transformations that
map distinct vectors to distinct vectors are of special importance.
Definition. A linear transformation T:][{n t][{m is said to be oneloone iffor every
x', x" E lR,n , we have x' = x" whenever T(x) = T(x").
Example. Ifwe consider linear operators T: ][{
2
) ][{
2
, then T is onetoone precisely
when the standard matrix A is invertible. To see this, suppose rst of all that A is invertible.
If T(x') = T(x"), then Ax' = Axil. Multiplying on the left by AI, we obtain x' = x". Suppose
next that A is not invertible. Then there exists x E lR,
2
such that x ::f. 0 and Ax = O. On the
other hand, we clearly have
Ao = O.
It follows that
T(x) = T(O),
so that T is not onetoone.
Proposition. Suppose that the linear operator T:][{n t][{11 has standard matrix A.
Then the following statements are equivalent:
(a) The matrix A is invertible.
(b) The linear operator T is onetoone.
(c) The range of T is lR n , in other words, R(T) = lR IJ •
Proof ((a» =:} (b» Suppose that T(x') = T(x"). Then Ax' = Axil. Multiplying on the left
by AI gives x' = x",
((b» =:} (a» Suppose that T is onetoone. Then the system Ax = 0 has unique solution
x = 0 in lR IJ • It follows that A can be reduced by elementary row operations to the identity
matrix J, and is therefore invertible.
((a» =:} (c» For any y E lR,n , clearly x = AIy satises Ax = y, so that T(x) = y.
((c» =:} (a) Suppose that {el' ... , en} is the standard basis for ]Rn. Let xI' ... , xn
E]Rn be chosen to satisfy T(x) = e
J
' so that AXj = e
j
, for every j = 1, ... , n. Write
C = (xI'" x
n
)·
Then AC = J, so that A is invertible.
Definition. Suppose that the linear operator T : ][{ n t lR, n has standard matrix A, where
A is invertible. Then the linear operator rI : lR,n t lR,n , dened by rIcx) = AIx for every
x E lR, n , is called the inverse of the linear operator T.
Remark. Clearly r1(T(x» = x and T(rI(x» = x for every x E lR,n .
262 Linear Transformations
Example. Consider the linear operator T:]R2 ~ ] R 2 , dened by T(x) = Ax for every
2
x E]R , where
Clearly A
AI =( 2 1).
1 1
Hence the inverse linear operator is 11 : ]R2 ~ ] R 2 , dened by 11 (x) = AIx for every
2
x E]R .
Example. Suppose that T: 1R2 ~ 1R2 is anticlockwise rotation by angle. The reader
should check that 11 : ]R2 ~ ] R 2 is anticlockwise rotation by angle 21t9.
Next, we study the linearity properties of euclidean linear transformations which we
shall use later to discuss linear transformations in arbitrary real vector spaces.
Proposition. A transformation T : IR
n
~ ] R m is linear if and only if the following two
conditions are satised:
(a) For every u, v E IR
n
, we have T(u + v) = T(u) + T(v).
(b) For every u E IR
n
and c E IR, we have T(cu) = cT (u).
Proof Suppose rst of all that T: IR
n
~ IR
m
is a linear transformation. Let A be the
standard matrix for T. Then for every u, v E IR nand c E IR , we have
T(u + v) = A(u + v) = Au + Av = T(u) + T(v)
and
T(cu) = A(cu) = c(Au) = cT (u):
Suppose now that (a) and (b) hold. To show that Tis linear, we need to nd a matrix A
such that T(x) = Ax for every X€]Rn. Suppose that {e
1
, ••• , en} is the standard basis for
IR n . As suggested by Proposition 8A, we write
A = ( T(e
l
) ..• T(e
n
»;
where T(e) is a column matrix for every j = 1, ... , n. For any vector
in ]Rn, we have
Linear Transformations
Ax ~ ( T(e I) ... T(e
n
)) (]J ~ xIT(el) + ... + xnT(e
n
)·
Using (b) on each summand and then using (a) inductively, we obtain
Ax = T(x)e)) + ... + T(xne
n
) = T(x)e) + ... + xne
n
) = T(x)
as required.
263
To conclude our study of euclidean linear transformations, we briey mention the
problem of eigenvalues and eigenvectors of euclidean linear operators.
Definition. Suppose that T: l l ~ n ~ IR
n
is a linear operator. Then any real number
f... E lR is called an eigenvalue of T if there exists a nonzero vector x E IR
n
such that T(x)
= x. This nonzero vector x E IR
n
is calIed an eigenvector of T corresponding to the
eigenvalue f....
Remark. Note that the equation T(x) = x is equivalent to the equation Ax = Ax. It follows
that there is no distinction between eigenvalues and eigenvectors of T and those of the
standard matrix A. We therefore do not need to discuss this problem any further.
General Linear Transformations
Suppose that Vand Ware real vector spaces. To define a linear transformation from V
into W, we are motivated by Proposition which describes the linearity properties of euclidean
linear transformations.
By a transformation from V into W, we mean a function of the type T: V ~ W, with
domain V and codomain W. For every vector U E V, the vector T(u) E W is called the
image of u under the transformation T.
Definition. A transformation T: V E W from a real vector space V into a real vector
space W is called a linear transformation if the following two conditions are satised:
(LT1) For every u, v E V, we have T(u + v) = T(u) + T(v).
(LT2) For every u E Vand C E lR, we have T(cu) = cT (u).
Definition. A linear transformation T: V ~ V from a real vector space V into itself is
called a linear operator on V.
Example. Suppose that V and Ware two real vector spaces. The transformation
T: V ~ W, where T(u) = 0 for every u E V, is clearly linear, and is called the zero
transformation from V to W.
Example. Suppose that V is a real vector space. The transformation l: V ~ V, where
leu) = u for every u E V, is clearly linear, and is called the identity operator on V.
Example. Suppose that V is a real vector space, and that k E lR is fixed. The
transformation T: V ~ V, where T(u) = ku for every u E V, is clearly linear. This operator
is called a dilation if k > I and a contraction if 0 < k < 1.
264 Linear Transformations
Example. Suppose that V is a finite dimensional vector space, with basis {WI' ... ,w n}.
Dene a transformation T: V IR
n
as follows. For every U E V, there exists a unique
vector ... , E ffi.n such that u = + ... + We let T(u) = ... ,
In other words, the transformation T gives the coordinates of any vector u E V with
respect to the given basis {wI' ... , W
n
}. Suppose now that
V=AIwI+···+AnWn
is another vector in V. Then
u + v = + AI)w
I
+ ... + + An)w
n
,
so that
T(u + v) = + AI' ... , + = ... , + (AI' ... , An) = T(u) + T(v).
Also, if C E IR, then cu = wI + ... + so that
T(cu) = ... , = ... , = cT (u).
Hence T is a linear transformation. We shall return to this in greater detail in the next
section.
Example. Suppose that P n denotes the vector space of all polynomials with real
coefficients and degree at most n. Dene a transformation T: P n P n as follows. For every
polynomial
P=PO+PI
X
+ ... +P,f1
in P
n
, we let
T(P) = P
n
+ PnI
x
+ ... +
Suppose now that
q=qo+q)x+ ... +q,t>(l
is another polynomial in P
n
. Then
P + q = (Po + qo) + (p) + ql)x + ... + (Pn + qn)xn;
so that
T(P + q) = (pn + qn) + (PnI + qn_l)x + ... + (Po + qo)x
n
= (Pn + PnIx + ... + + (qn + qn_I
x
+ ... + = T(P) + T(q).
Also, for any c E IR, we have cp = cPo + cP1x + ... + cp/' so that
T(cp) = cPn + cPn_Ix + ... + cPoX
n
= c(Pn + Pn_)x + ... + = cT (p).
Hence T is a linear transformation.
Example. Let V denote the vector space of all real valued functions dierentiable
everywhere in IR, and let W denote the vector space of all real valued functions dened on
ffi. . Consider the transformation T: V W, where T(f) = f' for every f E V. It is easy to
check from properties of derivatives that T is a linear transformation.
Example. Let V denote the vector space of all real valued functions that are Riemann
integrable over the interval [0, 1]. Consider the transformation T: V ffi. , where
T(f) =
Linear Transformations 265
for every f E V. It is easy to check from properties of the Riemann integral that T is a
linear transformation.
Consider a linear transformation T: V W from a finite dimensional real vector
space V into a real vector space W. Suppose that {vI' ... , v
n
} is a basis of V. Then every u
E V can be written uniqudy in the form u = + ... + where ... E IR. It
follows that
T(u) = + ... + = + ... + = + ... +
We have therefore proved the following generalization of proposition.
Proposition. Suppose that T : V W is a linear transformation from a finite
dimensional real vector space V into a real vector space W Suppose further that {v I' ... ,
v
n
} is a basis of V Then T is completely determined by T(v
l
), ... , T(v
n
).
Example. Consider a linear transformation T: P 2 IR , where T(1) = 1, T(x) = E and
T(x
2
) = 3. Since {l, x, x
2
} is a basis of P
2
, this linear transformation is completely
determined. In particular, we have, for example,
T(5  3x + 2x2) = 5T(1)  3T(x) + 2T(x2) = 5.
Example. Consider a linear transformation T: IR
4
IR , where T(1, 0, 0, 0) = 1, T(1,
1,0,0) = 2, T(l, 1, 1,0) = 3 and T(1, 1, 1, 1) = 4. Since {(1, 0, 0, 0), (1, 1,0,0), (1,1, 1,
0), (1, 1, 1, I)} is a basis of IR
4
, this linear transformation is completely determined. In
particular, we have, for example,
T(6,4,3, 1)=T(2(1 0,0,0)+(1,1,0,0)+2(1,1,1,0)+(1,1,1,1))
= 2T(I, 0, 0, 0) + T(1, 1,0,0) + 2T(1, 1, 1,0) + T(1, 1, 1, 1) = 14.
We also have the following generalization of PROPOSITION.
Proposition. Suppose that V, W, U are real vector spaces. Suppose further that TI : V
Wand T2 : U are linear transformations. Then T= T2 T
J
: V U is also a linear
transformation.
Proof Suppose that u, v E V. Then
T(u + v) = TzCTI(u + v)) = T
2
(T
I
(u) + TJ(v)) = T
2
(T
J
(u)) + TzCTJ(v)) = T(u) + T(v).
Also, if c E IR, then
T(cu) = T
2
(T
J
(cu)) = TzCcTJ(u)) = cT
2
(T
I
(u)) = cT (u).
Hence T is a linear transformation.
Change of Basis
Suppose that V is a real vector space, with basis B = {u
I
' ... , un}' Then every vector u
E V can be written uniquely as a linear combination
u = + ... + where ... , E IR·
It follows that the vector u can be identied with the vector ... , E IRn.
Definition. Suppose that u E V and (3) holds. Then the matrix
266 Linear Transformations
[ul.
is called the coordinate matrix of II relative to the basis B = {Ut' ... , un}.
Example. The vectors
u
l
= (1, 2, 1,0), u
2
= (3,3,3,0), u
3
= (2, 10,0,0), u
4
= (2, 1, 6, 2)
are linearly independent in JR4 , and so B = {u
l
' u
2
' u
3
, u
4
} is a basis of JR4 . It follows
that for any U = (x, y, z, w) E JR4, we can write
U = + + +
In matrix notation, this becomes
x 3 2 2
Y
2 3 10 I
=
,
z 1 3 0 6
w 0 0 0 2
so that
3 2 2 x
2 3 10 1 y
[u]B=
=
1 3 0 6 z
0 0 0 2 w
Remark. Consider a function <j> : V JR
n
, where (u) = [u]B for every U E V. It is not
dicult to see that this function gives rise to a onetoone correspondence between the
elements of V and the elements of n • Furthermore, note that
[u + v]B = [u]B + [v]B and [cu]B = c[u]B'
so that <j>(u + v) = <j>(u) + <j>(v) and <j>(cu) = c<j>(u) for every u, v E Vand C E JR. Thus
is a ljnear transformation, and preserves much of the structure of V. We also say that V is
isomorphic to JR n • In practice, once we have made this identication between vectors and
their coordinate matrices, then we can basically forget about the basis B and imagine that
we are working in JR n with the standard basis.
Clearly, if we change from one basis B = {u
I
' ••• , un} to another basis C = {VI' ... , v
n
}
of V, then we also need to nd a way of calculating [u]C in terms of [u]B for every vector u
E V. To do this, note that each of the vectors vI' ... , vn can be written uniquely as a linear
combination of the vectors ul' ... , un. Suppose that for i = 1, ... , n, we have
vi = aliu
l
+ ... + aniu
n
, where ali' ... , ani E R,
so that
Linear Transformations
(
aliJ
[vilB = :.'
am
for every u E V, we can write
u = + ... + = Ylvl + ... + Ynv
n
, where ... , YI' ... , Y
n
E R;
so that
Clearly
u = Ylvi + ... + ynv
n
= YI(allu
l
+ ... + anI un) + ... + yn(alnu
l
+ ... + ann un)
= (Ylall + ... + ')'naln)u
l
+ ... + (Ylanl + ... + 'Ynann)u
n
= + ... +
Hence
= + ... +
Written in matrix notation, we have
[all alnJ(YIJ
= ( ... ) Y:n'
We have proved the following result.
267
Proposition. Suppose thatB = {u
I
' ... , un} and C = {vI' ... , v
n
} are two bases ofa real
vector space V. Then for every U E V, we have
[ul
B
= P[ul
c
'
where the columns of the matrix
P = ([vtlB ... [vnl
B
)
are precisely the coordinate matrices of the elements of C relative to the basis B.
Remark. Strictly speaking, Proposition gives [ul
B
in terms of [ulc However, note that
the matrix P is invertible (why?), so that [ul
e
= P1[ul
B
•
Definition. The matrix P in Proposition is sometimes called the transition matrix from
the basis C to the basis B.
Example. We know that with
u
l
= (1, 2,1,0), u
2
= (3,3,3,0), u
3
= (2, 10,0,0), u
4
= (2,1, 6,2),
and with
vI = (1,2, 1,0), v
2
= (1, 1, 1,0), v3 = (1, 0, 1,0), v
4
= (0, 0, 0, 2),
268 Linear Transformations
both B = {u
I
' u
2
' u
3
' u
4
} and C = {vI' V
2
, V
3
, V
4
} are bases of ~ 4 . It is easy to check
that
so that
VI = ul'
v
2
= 2u
I
+ u
2
'
v3 = 11u
I
4u
2
+ u
3
v
4
= 27u
I
+ l1u
2
 2u
3
+ u
4
'
1 2 11 27
o 1 4 11
P = ([vdB [v
2
]B [v
3
]B [v
4
]B) = 0 0 1 2
o 0 0 1
Hence [u]B = P[u]c for every u E ~ 4 . It is also easy to check that
u
l
= vI'
u
2
= 2vI + v
2
,
u
3
= 3v
I
+ 4v2 + v
3
'
u
4
= vI  3v
2
+ 2v3 + v
4
'
so that
1 2 3 1
o 1 4 3
o 0 2
o 0 0 1
Hence [u]c = Q[u]B for every u E ~ 4 . Note that PQ = 1. Now let u = (6, 1,2,2). We
can check that
u = VI + 3v
2
+ 2v3 + v
4
' so that
1
3
[u1c =
2
1
Thea
1 2 11
271 1
10
0 1 4 11 3 6
[u]B =
0 0 1
2 j 2
0
0 0 0 1 1 1
Check that u = IOu
I
+ 6u
2
+ u
4
.
Example. Consider the vector space P 2. It is not too dicult to check that
Linear Transformations
U =l+x U =1+x2 U =X+X
2
I ' 2 ' 3
form a basis of P 2' Let
U = 1 + 4x x
2
,
Then
where
269
1 + 4x x
2
= + x) + + x
2
) + + x
2
) = + + + + +
so that
and
P2 + P
3
=1.
Hence = (3, 2, 1). Ifwe write
B = {u
I
' u
2
' u
3
},
then
[Uln=(+J
On the other hand, it is also not too dicult to check that
vI = 1, v
2
= 1 + x, v3 = 1 + x + x
2
form a basis of P 2' Also
where
so that
and
Hence
Ifwe write
then
U = Ylvi + Y2
v
2 + Y3v3'
1 + 4x  x
2
= YI + Y2(1 + x) + Yi1 + x + x
2
)
= (Yl + Y2 + Y3) + (Y2 + Y3)x + Y3
x2
,
(YI' Y2' Y3) = (3,5, 1).
Next, note that
270
Hence
P = ([vdB [v
2
]B [v3]B) =
112 0 112
To verify that [u]B = P[u]c' note that
=
1 112 0 112 1
Linear Transformations
Kernel and Range
Consider rst of all a euclidean linear transformation IR n IR m . Suppose that A is the
standard matrix for T. Then the range of the transformation T is given by
R(I) = {T(x) : x E IRn} = {Ax: x E IR
n
}.
It follows that R(I) is the set of all linear combinations of the columns of the matrix
A, and is therefore the column space of A. On the other hand, the set
{x E IR
n
: Ax = O}
is the nullspace of A. .
Recall that the sum ofthe dimension of the nullspace of A and dimension of the column
space of A is equal to the number of columns of A. This is known as the Ranknullity
theorem. The purpose of this section is to extend this result to the setting of linear
transformations. To do this, we need the following generalization ofthe idea of the nullspace
and the column space.
Definition. Suppose that T: V W is a linear transformation from a real vector space
V into a real vector space W. Then the set
ker(I) = {u E V : T(u) = 0
is called the kernel of T, and the set
R(I) = {T(u) : U E V}
is called the range of T.
Example. For a euclidean linear transformation T with standard matrix A, we have
shown that ker(I) is the nullspace of A, while R(1) is the column space of A.
Example. Suppose that T: V W is the zero transformation. Clearly we have
ker(I) = Vand R(I) = {O}.
Linear Transformations 271
Example. Suppose that T: V ~ V is the identity operator on V. Clearly we have
ker(1) = {O} and R(1) = V. '
Example. Suppose that T: JR2 ~ JR2 is orthogonal projection onto the xlaxis. Then
ker(1) is the x
2
axis, while R(1) is the xlaxis.
Example. Suppose that T: ~ n ~ ~ n is onetoone. Then
ker(1) = {O} and R(1) = JR
n
,
Example. Consider the linear transformation T: V ~ W, where V denotes the vector
space of all real valued functions dierentiable everywhere in JR, where W denotes the
space of all real valued functions dened in JR, and where T(f) = f' for every f E V. Then
ker(1) is the set of all dierentiable functions with derivative 0, and so is the set of all
constant functions in JR.
Example. Consider the linear transformation T: V ~ JR, where V denotes the vector
space of all real valued functions Riemann integrable over the interval [0, 1], and where
T(f) = f ~ f ( x ) d x
for every f E V. Then ker(1) is the set of all Riemann integrable functions in [0, 1]
with zero mean, while R(1) = JR.
Proposition. Suppose that T: V ~ W is a linear transformation from a real vector
space V into a real vector space W. Then ker(1) is a subspace of V, while R(1) is a subspace
of W.
Proof Since T(O) = 0, it follows that 0 E ker(1) Vand 0 E R(1) W. For any
u, v E ker(1),
we have
T(u + v) = T(u) + T(v) = 0 + 0 = 0,
so that u + v E ker(1). Suppose further that c E JR. Then
T(cu) = cT (u) = Co = 0,
so that C
u
E ker(1). Hence ker(1) is a subspace of V. Suppose next that w, Z E R(1).
Then there exist u, v E V such that T(u) = wand T(v) = z. Hence
T(u + v) = T(u) + T(v) = w + z,
so that w + Z E R(1). Suppose further that c E JR. Then
T(cu) = cT (u) = cw,
so that cw E R(1). Hence R(1) is a subspace of W.
To complete this section, we prove the following generalization of the Ranknullity
theorem.
Proposition. Suppose that T,' V ~ W is a linear transformation from an ndimensional
real vector space V into a real vector space W. Then
dim ker(1) + dim R(1) = n.
..
272 < Linear Transformations
Proof Suppose first of all that dim ker(1) = n. Then ker(I) = V, and so R(I) = {O},
and the result follows immediately. Suppose next that dim ker(I) = 0, so that ker(I) = {O}.
If {vI' ... , v
n
} is a basis of V, then it follows that T(v
l
), ... , T(v
n
) are linearly independent
in W, for otherwise there exist c
l
' ... , c
n
E lR, not all zero, such that
cIT(v
l
) + .. , + cnT(v
n
) = 0,
so that
T(c
i
vI + ... + cnv
n
) = 0,
a contradiction since
civ
i
+ ... +cnVn;f:. O.
On the other hand, elements of R(I) are linear combinations of T(v
l
), ... , T(v
n
).
Hence dim R(I) = n,
and the result again follows immediately. We may therefore assume that
dim ker(I) = r,
where 1 ::::; r < n.
Let {vI' ... , v
r
} be a basis ofker(I). This basis can be extended to a basis {vI' ... , v
r
'
vr + 1, ... , v
n
} of V. It suces toshow that
{T(vr+I)' ... , T(v
n
)}
is a basis of R(I). Suppose that U E V. Then there exist ... , E lR such that
U = + '" + + v
r
+
1
+ ... +
so that
T(u) = + ... + + + ... +
= + ... +
It follows that spans R(I). It remains to prove that its elements are linearly independent.
Suppose that cr+I' ... , c
n
E lR and
cr+IT(vr+l) + ... + cnT(v
n
) = O.
We need to show that
cr+1 = ... = c
n
= O.
By linearity, it follows from equation that T(cr+Ivr+I + ... + cnv
n
) = 0, so that
cr+lvr+1 + ... + cnv
n
E ker(I).
Hence there exist c
I
' ... , c
r
E lR such that
cr+lvr+1 + ... + cnv
n
= civ
i
+ ... + crv
r
;
so that
C
I
vI + ... + CrV
r
 Cr+l Vr+l  ...  CnV
n
= O.
Since {VI' ... , v
n
} is a basis of V, it follows that
c
i
= ... = c
r
= cr+I = ... = c
n
= O.
Remark. We sometimes say that dim R(I) and dim ker(I) are respectively the rank
and the nullity of the linear transformation T.
Linear Transformations 273
Inverse Linear Transformations
In this section, we generalize some of the ideas first discussed in Section.
Definition. A linear transformation T: V 7 W from a real vector space V into a real
vector space Wis said to be onetoone iffor every u', u" E V , we have u' = u" whenever
T(u') = T(u").
Proposition. Suppose that T: V 7 W is a linear transformation from a real vector
space V into a real vector space WF. Then T is onetoone if and only if ker(1) = {O}.
Proof (=» Clearly 0 E ker(1). Suppose that ker(1) ::j: {O}. Then there exists a non
zero v E ker(1). It follows that T(v) = T(O), and so T is not onetoone.
(( ¢:::) Suppose that ker(1) = {O}. Given any u', u" E V, we have
T(u')  T(u') = T(u'  u'') = 0
if and only if u'  u" = 0, in other words, if and only if u' = u". We have the following
generalization of PROPOSITION.
Proposition. Suppose that T: V 7 V is a linear operator on a finitedimensional real
vector space V. Then the following statements are equivalent.
(a) The linear operator T is onetoone.
(b) We have ker(1) = {g}.
(c) The range of T is V, in other words, R(1) = V.
Proof The equivalence of (a) and (b) is established by PROPOSITION. The
equivalence of (b) and (c) follows from PROPOSITION.
Suppose that
T:V7W
is a onetoone linear transformation from a real vector space V into a real vector
space W. Then for every W E R(1), there exists exactly one u E V such that
T(u) = W.I
We can therefore dene a transformation
rI : R(1) 7 V
by writing rI(w) = u, where u E Vis the unique vector satisfying T(u) = w.
Proposition. Suppose that T: V 7 W is a onetoone linear transformation from a
real vector space V into a real vector space W. Then rI. R(T) 7 V is a linear
transformation.
Proof Suppose that w, Z E R(1). Then there exist u, v E V such that
rI(w) = u
and rI(z) = v.
It follows that
T(u) = wand T(v) = z,
so that T(u + v) = T(u) + T(v) = w + z,
274 Linear Transformations
•
whence jl(w + z) = U + v = jl(w) + jl(z).
Suppose further that C E IR .
Then T(cu) = cw,
so that jl(cw) = cu = cjl(w).
We also have the following result concerning compositions of linear transformations
and which requires no further proof, in view of our knowledge concerning inverse functions.
Proposition. Suppose that V, W, U are real vector spaces. Suppose further that T\ : V
~ Wand T2 : W ~ U are onetoone linear transformations. Then
(a) The linear transformation T2 . TI : V ~ U is onetoone, and
(b) (T
2
. Titl = 1}I.
T2
1.
Matrices of General Linear Transformations
Suppose that T: V ~ W is a linear transformation from a real vector space V to a real
vector space W. Suppose further that the vector spaces V"and Ware finite dimensional,
with dim V = n and dim W = m. We shall show that if we make use of a basis B of V and
a basis C of W, then it is possible to describe T indirectly in terms of some matrix A. The
main idea is to make use of coordinate inatrices relative to the bases Band C.
Let us recall some discussion in Section. Suppose that
B = {VI' ... , v
n
}
;s a basis of V. Then every vector v E V can be written uniquely as a linear combination
v = PI VI + ... + Pnv
n
,
where PI' ... , P
n
E lR
The matrix
is the coordinate matrix of v relative to the basis B.
Consider now a transformation <I> : V ~ lR.
n
, where <I>(v) = [v]B for every v E V. The
proof of the following result is straightforward.
Proposition. Suppose that the real vector space V has basis B = {VI' ... , v
n
}. Then the
tramformation <I> : V ~ W, where <I>(v) = [v]Bsatisesforevery v E V, is a onetoone linear
transformation, with range R( <1» = lR n • Furthermore, the inverse linear transformation <1>
I : lR n ~ V is also onetoone, with range R( lR n I) = V.
Suppose next that C = {wI' ... ,w m} is a basis of W. Then we can dene a linear
transformation \jJ : W ~ lR
m
, where \jJ (w) = [w]c for every w E W, in a similar way. We
now have the following diagram of linear transformations.
Linear Transformations 275
Suppose nextthat {w l' ... , W m} is basis of W. Then we can define linear transformation
\1' : W ]Rm, where \If (w) = [w]c for every WE W , in similar way. We now have the
following diagram of linear transformations.
V ___ ___ W
I
0/ 0/
JR"
Clearly the composition
I n III
S = \jf.T 0 T 0 <I> : IR. IR.
is a euclidean linear transformation, and can therefore be described in term of a
standard matrix A. Our task is to determine this matrix A in terms of rand the :. nd
C.
We know from Proposition that
A = (S(e
l
) ... Seen))'
where {e
l
, ••• , en} is the standard basis for ]Rn. For every j = I, ... , n, we have
I 1
See) = (\jf 0 T 0 <I> ) (e) = \If (T(<I> (e))) = \If (T(v)) = [T(v) )]c.
It follows that
A = ([T(vl)]c ... [T(vn)]c)'
Definition. The matrix A given by equatioin is called the matrix for the linear
transformation Twith respect to the bases Band C. We now have the following diagram of
linear transformations.
T
W
I
0/ 0/
IIt ___ "'S ___ ]Rnl
Hence wt: can write T as the composition
f
276 Linear Transformations
I
T='V
For every v E V, we have the following:
\
v A[ v]B 'II ) 'V I (A[ v]B )
More precisely, if v = + ... + then
say, and so
T(v) = \IfI(A[v]B) = 'Ylwi + ... + 'Ymwm.
We have proved the following result.
Proposition. Suppose that T: V W is a linear transformation from a real vector
space V into a real vector space W. Suppose further that Vand Ware finite dimensional,
with bases Band C respectively, and that A is the matrix for the linear transformation T
with respect to the bases Band C. Then for every v E V, we have T(v) = w, where W E W
is the unique vector satisfying [w]c = A [v]B.
Remark. In the special case when V = W, the linear transformation T: V W is a
linear operator on T. Of course, we may choose a basis B for the domain V of T and a basis
C for the codomain V of T. In the case when T is the identity linear operator, we often
choose B ;:f:. C since this represents a change of basis. In the case when T is not the identity
operator, we often choose B = C for the sake of convenience, we then say that A is the
matrix for the linear operator T with respect to the basis B.
Example. Consider an operator T : P 3 P 3 on the real vector space P 3 of all
polynomials with real coefficients and degree at most 3, where for every polynomial p(x)
in P 3' we have T(P(x» = xp' (x), the product of x with the formal derivative p'(x) of p(x).
The reader is invited to check that T is a linear operator. Now consider the basis B = { 1, x,
x
2
' x
3
} of P
3
• The matrix for Twith respect to B is given by
A = ([T(I)]B [T(x)]B [T(x
2
)]B [T(x
3
)]B) = ([O]B [x]B [2x
2
]B [3x
3
]B)
o 0 0 0
o 1 0 0
= 0 0 2 0
o 0 0 3
Suppose that p(x) = 1 + 2x + 4x
2
+ 3x
3
. Then
Linear Transformations
1
0 0 0 0 1 0
2
0 0 0 2 2
fp(x)]B =
4
and Afp(x)]B =
0 0 2 0 4 8
3
0 0 0 3 3 9
so that T(P(x» =2x + 8x
2
+ 9x
3
. This can be easily verified by noting that
T(P(x» = xp' (x) = x(2 + 8x + 9x
2
) = 2x + 8x
2
+ 9x
3
.
In general, if p(x) = Po + Plx+ P2x2 + P3x3, then
Po
PI
[P(x)]B
P2
o 0 0 0
o 1 0 0
and A[P(x)]B = 0 0 2 0
Po
PI
P2
o
P3 0 0 0 3 P3 3P3
so that T(P(x» = PIx + 2P2x2 + 3P
3
x3 . Observe that
T(P(x» = xp' (x) = x(Plx + 2P2x + 3P3x) = PIx + 2pzX2 + 3P3
x3
.
verifying our result.
277
Example. Consider the linear operator T: given by T(x\, x
2
) 2x1 x
2
, xI 3x
2
)
for every (x I' x
2
) E .lR
2
. Consider also the basis B = (1, 0), 1, 1) of .lR
2
. Then the matrix
for with respect to is given by
1\1, O)].[T(1, 1 )1.) ([(2, 1 )1. [(3, 4)1. C
Suppose that (xI' x
2
) 3, 2). Then
[(3, 2)]. G) and A[(3,2)]. C (
so that T(3, 2) (1,0) +9(1, 1) (8,9). This can be easily veried directly. In general, we
have
[(XI' x,)lB ( XI :,x,) and A[ XI ,x,1. C :, x} (:: :
so that T(x
l
, x
2
) = (xI  2x
2
) (1 , 0) + (xI + 3x
2
)(1, 1) = (2x
1
+ x
2
, xI + 3x
2
).
Example. Suppose that T: )Rn is a linear transformation. Suppose further that
Band C are the standard bases for .lR nand .lR m respectively. Then the matrix for T with
respect to Band C is given by
A = T(el)]c ... T(en)]c = (T(e
l
) ... T(e
n
),
so it follows from Proposition that is simply the standard matrix for T.
278
Suppose now that
T
J
7 W
and
T2 W7 U
Linear TransformatIOns
are linear transformations, where the real vector spaces V, W, U are finite dimensional,
with respective bases
B = {vI' ... , v
n
}, {wI' ... , w
m
}
and
D = {u
I
' ... , uk}.
We then have the following diagram of linear transformations.
T,
W "+) U
",3 '"
11' 11
S, S, k
___ ........
Here 11: U 7lR
k
, where 11 (u) = [u]D for every U E U, is a linear transformation, and
\ n m d 7' I TlJ)m TlJ)k
SI = 'If 0 1J 0 <I> : lR 7 lR an S2 = 11 0 12 0 'If : m. 7 m.
are euclidean linear transformations. Suppose that Al and A2 for SI and S2' so that
they are respectively the matrix for TI with respect to Band C and the matrix for T2 with
respect to C and D. Clearly
. I n k
S2 0 Sl = 11 0 12 o1J 0 <I> : lR 7 lR ..
It follows thatA01 is the standard matrix for S2 0 SI' and so is the matrix for T2 0 TI
with respect to the bases Band D. To summarize, we have the following result.
Proposition. Suppose that TI : V 7 Wand T2 : W 7 U are linear transformations,
where the real vector spaces V, W, U are finite dimensional, with bases B, C, D respectively.
Suppose further that Al is the matrix for the linear transformation TI with respect to the
bases Band C, and that A2 is the matrix for the linear transformation T2 with respect to the
bases C and D. Then A2 A I is the matrix for the linear transformation T2 x TI with respect
to the bases Band D.
Example. Consider the linear operator TI : P 3 7 P 3' where for every polynomial p(x)
in we have TI (P(x)) = xp'(x). We have already shown that the matrix for TI with respect to
the basis B = {I, x, x
2
, x
3
} of P
3
is given by
Linear Transformations 279
0 0 0 0
0 0 0
Al =
0 0 2 0
0 0 0 3
Consider next the linear operator T2 : P3 ~ P
3
, where for every polynomial q(x) = qo
2 3' P h
+q
l
x+q2x +q3x In 3,we ave
Tiq(x» = q(l + x) = qo + qt(1 + x) + qi1 + x)2 + q3(1 + x)3.
We have Til) = 1, T
2
(x) = 1 + x, T2(x
2
) = 1 + 2x + x
2
and Tix3) = 1 + 3x + 3x
2
+ x
3
,
so that the matrix for T2 with respect to B is given by
A2 = ([T
2
(1)]B [T
2
(x)]B [T
2
(x
2
)]B [T
2
(x
3
)]B)
023
=
003
000
Consider now the composition T= T2 0 TI : P
3
~ P
3
. LetA denote the matrix for T
with respect to B. By Proposition 8T, we have
0 0 0
0 1 2 3 0 1 0
A = A2AI =
0 0 1 3 0 0 2
0 0 0 0 0 0
Suppose thatp(x) = Po + PIx + P2x2 + P3x3. Then
0 0
0 0
=
0 0
3 0
2 3
1 4 9
0 2 9
0 0 3
PI +2Pn +3P3
PI +4P2 +9P3
Po 0 1 2 3 Po
pOl 4 9 P
[p(x>S = I and A[p(x)]B = 0 0 2 9 I =
~ ~ 2 ~ + ~
P3 0 0 0 3 P3 3 P3
so that T(P(x» = (PI+ 2P2 + 3P3) + (PI + 4P2 + 9P3) x + (2P2 + 9P3)x
2
+ 3P3x3. We can
check this directly by noting that
T(P(x» = TiTI(P(x») = T
2
(Plx + 2P2x2 + 3P3x3)
= PI(1 + x) + 2pil + x)2 + 3P3(1 + x)3
= (PI + 2P2 + 3P3) + (PI + 4P2 + 9P3)x + (2P2 + 9P3)x
2
+ 3P3
x3
.
Example. Consider the linear operator T:]R2 ~ ] R 2 , given by
•
280 Linear Transformations
T(x
l
, X
2
) = (2x
l
+ X
2
' xl + 3x
2
)
for every (xl' x
2
) E]R2 . We have already shown that the matrix for Twith respect to
the basis B = {(l, 0), (l, I)} of ]R
2
is given by
A=G
Consider the linear operator T:]R2 7]R2 . By Proposition 8T, the matrix for T2 with
respect to B is given by
A' =(:
Suppose that (xl' x
2
) E]R2 . Then
[(V,)]. = ( XI X') and A,[ V,]. = XI X2)
so that T(xof' x
2
) = 5xil, 0) + (5x] + IQx
2
)(l, I) = (5x
l
+ 5x
2
, 5x
l
+ IOx
2
). The reader
is invited to check this directly. A simple consequence of Propositions 8N and 8T is the
following result concerning inverse linear transformations.
Proposition. Suppose that T: V 7 V is a linear operator on a finite dimensional real
vector space V with basis B. Suppose further that A is the matrix for the linear operator T
with respect to the basis B. Then T is onetoone if and only if A is invertible. Furthermore,
if T is onetoone, then AI is the matrix for the inverse linear operator 11 : V 7 V with
respect to the basis B.
Proof Simply note that T is onetoone if and only if the system
Ax=O
has only the trivial solution X = O. The last assertion follows easily from Proposition
8T, since if A I denotes the matrix for the inverse linear operator 11 with respect to B, then
we must have
A'A = I,
the matrix for the identity operator 110 Twith respect to B.
Example. Consider the linear operator T: P
3
7 P
3
, where for every q(x) = qo + qlx +
2 3' P h
q2
x
+ q3
x
In 3' we ave
T(q(x)) = q(1 + x) = qo + ql(l + x) + qil + x)2 + q3(l + x)3.
We have already shown that the matrix for Twith respect to the basis
. B= {I +x,x
2
,x
3
}
is given by
Linear Transformations 281
1 1
0 1 2 3
A=
0 0 1 3
0 0 0 1
This matrix is invertible, so it follows that T is onetoone. Furthermore, it can be
checked that
1 1 1 1
I
0 1 2 3
A =
0 0 1 3
0 0 0 1
Suppose that p(x) = Po + PIX + P2
x2
+ P3x3. Then
Po
1 1
[p(xh =
PI I
0 1 2 3
and A [P(x)]8 =
0 0 1 3 '
P2
P3
0 0 0 1
Po Po  PI + P2  P3
PI
PI 2P2 +3P2
=
P2
P2 3P3
P3 P3
so that
rI(p(x» = (Po  PI + P2  P3) + (Pt  2P2 + 3PJ)x + (P2  3P3)x2 + P3
x3
= Po + PI(x  1) + P2(X2  2x + 1) + P3(r  3x
2
+ 3x  1)
= Po + PI (x  1) + pix  1)2 + P3(x  1)3 = p(x  1).
Change of Basis
Suppose that V is a finite dimensional real vector space, with one basis B = {v!, ... ,
v
n
} and another basis B' = {u
1
' ... , un}' Suppose that T: V 7 V is a linear operator on V.
Let A denote the matrix for T with respect to the basis B, and let A' denote the matrix for
Twith respect to the basis B'. Ifv E Vand T(v) = w, then
[w]B =A[v]B
and
[W]B' = A,[v]B'
We wish to nd the relationship between A' and A. Recall Proposition J, that if
P = ([ut]B ... [un]B)
282
Linear Transformations
denotes the transition matrix from the basis B' to the basis B, then
[vJs = P[v]B' and [wJs = P[w]B"
Note that the matrix P can also be interpreted as the matrix for the identity operator
I:V7V
with respect to the bases B' and B. It is easy to see that the matrix P is invertible, and
= ([v dB' ... [vn]B' )
denotes the transition matrix from the basis B to the basis B', and can also be interpreted
as the matrix for the identity operator
I:V7V
with respect to the bases Band B'. We conclude that,
[w]B' = w]B = =
Comparing this with (1 ]), we conclude that
This implies that
A =PA'pl.
Remark. We can use the notation
A = [1]B and A' [1]B'
to denote that A and A' are the matrices for T with respect to the basis B and with
respect to the basis B' respectively. We can also write
P = [1]B' B.
to denote that P is the transition matrix from the basis B' to the basis B, so that
= [1]B" B.
Then (13) and (14) become respectively
[I ]B', B[1]B[1 ]B,B' = [1]B' and [1 ]B, B' [1]B' [1]B',B = [1]B'
We have proved the following result.
Proposition. Suppose that T: V 7 V is a linear operator on a finite dimensional real
vector space V, with bases B = {vI' ... , v
n
} and B' = {u
l
' ... , un}' Suppose further that A
and A I are the matrices for T with respect to the basis B and with respect to the basis B'
respectively. Then
=A' andA'=
,
where
P = ([utlB'" [un]B
denotes the transition matrix from the basis B' to the basis B.
Remarks. (1) We have the following picture.
(2) The idea can be extended to the case of linear transformations T: V 7 W ffom a
finite dimensional real vector space into another, with a change of basis in Vand a change
of basis in W.
Linear Transformatiom
~ v ) W ~
V ~ ________________________________ +)W
T
T
A'
[v]B':..:.+) [w]B'
~ /
[V]B ) [W]B
A
283
Example. Consider the vector space P3 of all polynomials with real coefficients and
degree at most 3, with bases B = {1, x, x
2
,x
3
} and B'= {l, 1 + x, 1 + x + x
2
,1 + x + x
2
+ x
3
}.
Consider also the linear operator T: P 3 ~ P 3' where for every polynomial p(x) = Po + PIx
+ pzX2 + P3x3, we have' T(P(x» = (Po + PI) + (PI + P2)x + (P2 +P3)x
2
+ (Po +P3)x
3
. Let A
denote the matrix for T with respect to the basis B. Then
T(l) = 1 + x
3
, T(x) = 1 + x, T(x
2
) = x + x
2
and T(x
3
) = x
2
+ x
3
, and so
o 0
o 0
A = ([T(l)]B [T(x)]B [T(x
2
)]B [T(x
3
)]B) = 0 0
o 0
Next, note that the transition matrix from the basis B' to the basis B is given by
It can be checked that
1
\
0 1
P =
0 0
0 0
0
1
1
0
0
0
1
1
1 1 1 1
o 1 1 1
o 0
000
284 Linear Transformations
and so
1 0 0 1 0 0 1 1
A'=P1AP=
0 1 1 0 0 1 1 0 0 1 1 1 0
=
0 0 1 1 0 0 1 1 0 0 1 1 1
0 0 0 1 1 0 0 1 0 0 0 1
is the matrix for T with respect to the basis Bo. It follows that
T(l) = 1  (1 + x +.x2) + (l + x + x
2
+ x
3
) = 1 + x
3
,
1 0
1 1
1 0
1 1
T(l + x) = 1 + (1 + x)  (1 + x + x
2
) + (1 + x + x
2
+ x
3
) = 2 + x + x
3
,
T(l + x + .x2) = (l + x) + (1 + x + .x2 + x3) = 2 + 2x + .x2 + x3,
T(l + x + x
2
+ x
3
) = 2(1 + x + x
2
+ x
3
) = 2 + 2x + 2x2 + 2x
3
.
These can be veried directly.
0
0
0
2
Eigenvalues and Eigenvectors
Definition. Suppose that T: V ~ V is a linear operator on a finite dimensional real
vector space V. Then any real number A E 1R is called an eigenvalue of T if there exists a
nonzero vector v E V such that T(v) = Av. This nonzero vector v E V is called an
eigenvector of T corresponding to the eigenvalue A.
The purpose of this section is to show that the problem of eigenvalues and eigenvectors
of the linear operator T can be reduced to the problem of eigenvalues and eigenvectors of
the matrix for T with respect to any basis B of V. The starting point of our argument is the
following theorem, the proof of which is left as an exercise.
Proposition. Suppose that T: V ~ V is a linear operator on a finite dimensional real
vector space V, with bases Band B'. Suppose further that A and A' are the matrices for T
with respect to the basis B and with respect to the basis B' respectively. Then
(a) det A = det A',
(b) A and A' have the same rank,
(c) A and A' have the same characteristic polynomial,
(d) A and A' have the same eigenvalues, and
(e) The dimension of the eigenspace of A corresponding to an eigenvalue A is
equal to the dimension of the eigenspace of A' corresponding to A.
We also state without proof the following result.
Proposition. Suppose that T: V ~ V is a linear operator on a finite dimensional real
vector space V. Suppose further that A is the matrix for T with respect to a basis B of V.
Then
(a) The eigenvalues of T are precisely the eigenvalues of A, and
(b) A vector U E V is an eigenvector of T corresponding to an eigenvalue A if
and only if the coordinate matrix [ul
B
is an eigenvector of A corresponding
to the eigenvalue A.
Linear Transformations
Suppose now that A is the matrix for a linear operator
T: V
on a finite dimensional real vector space V with respect to a basis
B = {vI' ... , v
n
}·
If A can be diagonalized, then there exists an invertible matrix P such that
jTIAP=D
285
is a diagonal matrix. Furthermore, the columns of P are eigenvectors of A, and so are
the coordinate matrices of eigenvectors of T with respect to the basis B. In other words,
P = ([uI]B ... [un]B)'
where B' = {u
I
' ... , un} is a basis of V consiting of eigenvectors of T. Furthermore, P
is the transition matrix from the basis B' to the basis B. It follows that the matrix for Twith
respect to the basis B' is given by
D{I .. J
where AI' ... , An are the eigenvalues of T.
Example. Consider the vector space P 2 of all polynomials with real coefficients and
degree at most 2, with basis B = {I, x, x
2
}. Consider also the linear operator T : P 2 P 2'
where for every polynomial p(x) = Po +PIx+Pr2, we have
T(P(x» = (5po 2PI) + (6PI + 2P2  2Po)x + (2p} + 7P2)x2·
Then
T(I) = 5  2x, T(x) = 2 + 6x + 2x
2
and
T(x
2
) = 2x + 7x
2
,
so that the matrix for T with respect to the basis B is given by
(
5 2 OJ
A = ([T(I)]B [T(x)]B [T(x
2
)]B) = 2 6 2.
° 2 7
It is a simple exercise to show that the matrix A has eigenvalues 3, 6, 9, with
corresponding eigenvectors
XI =(
so that writing •
I
2 2
286 Linear Transformations
we have
Now let Bo = {PI (x), Pz{x), P3(x)}, where
[A (xl]B PJ [p, (xl]s ( ( n
Then P is the transition matrix from the basis B' to the basis B, and D is the matrix for
T with respect to the basis B'. Clearly
PI(x) = 2 + 2x x2,pix) = 2 x + 2x
2
ad
P3(X) = 1 + 2x + 2x
2
.
Note now that
T(p} (x)) = T(2 + 2x  xx
2
) = 6 + 6x  3x
2
= 3PI (x),
T(P2(x)) = T(2 x + 2x
2
) = 12  6x + 12x2 = 6P2(x),
T(pix)) = T(1 + 2x + 2x
2
) = 9 + 18x + 18x
2
= 9P3(x).
'.
A Practical Approach to
LINEAR ALGEBRA
"This page is Intentionally Left Blank"
A Practical Approach to
LINEAR ALGEBRA
Prabhat Choudhary
'Oxford Book Company
Jaipur. India
ISBN: 9788189473952
First Edition 2009
Oxford Book Company 267, 10BScheme, Opp. Narayan Niwas, Gopalpura By Pass Road, Jaipur3020 18 Phone: 01412594705, Fax: 01412597527 email: oxfordbook@sity.com website: www.oxfordbookcompany.com
© Reserved
Typeset by:
Shivangi Computers 267, lOBScheme, Opp. Narayan Niwas, Gopalpura By Pass Road, Jaipur3020 18
Printed at :
Rajdhani Printers, Delhi
All Rights are Reserved. No part ofthis publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic. mechanical, photocopying, recording, scanning or otherwise, without the prior written permission of the copyright owner. Responsibility for the facts stated, opinions expressed, conclusions reached and plagiarism, if any, in this volume is entirely that of the Author, according to whom the matter encompassed in this book has been origmally created/edited and resemblance with any such publication may be incidental. The Publisher bears no responsibility for them, whatsoever.
and it is usually studied as a sophisticated phenomenon of Interpolation between different approximatelyLinear Regimes. We must emphasize that mathematics is not a spectator sport. or blockstructured problems arising in the major areas of direct solution of linear systems. and rapid elliptic solvers. Prabhat Choudhary . The purpose is to review the current status and to provide an overall perspective of parallel algorithms for solving dense. There is a widespread feeling that the nonlinear world is very different. banded. and that in order to understand and appreciate mathematics it is necessary to do a great deal of personal cogitation and problem solving. Linear algebra is an indispensable tool in such research and this paper attempts to collect and describe a selection of some of its more important parallel algorithms.Preface Linear Algebra has occupied a very crucial place in Mathematics. eigenvalue and singular value computations. least squares computations. Linear Algebra is a continuation of classical course in the light of the modem development in Science and Mathematics. Scientific and engineering research is becoming increasingly dependent upon the development and implementation of efficient parallel algorithms.
"This page is Intentionally Left Blank" .
.2:' Systems of Linear Equations .3. Bilinear and Quadratic Forms : 9. Advanced Spectral Theory 10. Matrics'. Structure of Operators in Inner Product Spaces 8. Introduction to Spectral Theory 6. Determinants 5. Basic Notions 1 26 50 101 139 162 198 221 234 252 . Inner Product Spaces 7..Contents Preface v 1. Linear Transformations . 4.
"This page is Intentionally Left Blank" .
But. Such additive inverse is usually denoted as v. W E V. along with two operations. 2. 8. Multiplicative associativity: (a~)v = a(~v) for all v E Vand all E scalars a. W E V such that The next two properties concern multiplication: 5. which connect multiplication and addition: 7. two distributive properties. you cannot multiply two vectors. v. Multiplicative identity: 1v = v for all v E V. Additive inverse: For every vector v E V there exists a vector v + W = O. v E Vand all sCfllars a. such that the following properties (the socalled axioms of a vector space) hold: The first four properties deal with the addition of vector: 1. And finally. denoted by 0 such that v + 0 = v for all v E V. They are simply the familiar rules of algebraic manipulations with numbers. W E V. It is also easy to show that . 4. a(u + v) = au + av for all u.Chapter 1 Basic Notions VECTOR SPACES A vector space V is a collection of objects. and you can multiply a vector by a number (scalar). Zero vector: there exists a special vector. Of course. Remark: It is not hard to show that zero vector 0 is unique. The only new twist here is that you have to understand what operations you can apply to what objects. Associativity: (u + v) + W = u + (v + w) for all u. 3. (a + ~)v = av + ~v for all v E Vand all scalars a. ~. Remark: The above properties seem hard to memorize. Commutativity: v + w = w + v for all v. called vectors. 6. or add a number to a vector. addition of vectors and multiplication by a number (scalar). You can add vectors. but it is not necessary. you can do with number all possible manipulations that you have learned before. ~.
i.2 Basic Notions given v E V the inverse vector v is unique. Note. is a complex vector space. complex coefficient give us a complex vector space. + ant. i. Addition and multiplication are defined exactly as in the case of lRn . we call the space Va complex vector space. only the entries now are complex numbers.. In the case of real coefficients ak we have a real vector space. Although many of the constructions in the book work for general fields. en aVn vn wn vn +wn en r . In fact. we call the space Va real vector space.e.. where t is the independent variable. Addition and multiplication are defined entrywise. If the scalars are the complex numbers. Note.e. Example: The space Mmxn (also denoted as Mm n) ofm x n matrices: the multiplication and addition are defined entrywise. but not the other way around. we then have a complex vector space. In this case we say that V is a vector space over the field IF. if we allow complex entries and multiplication by complex numbers. Example: The space lP'n of polynomials of degree at most n...e. VI v2 v= vn whose entries are real numbers. It is also possible to consider a situation when the scalars are elements of an arbitrary field IF.e.Ifwe allow only real entries (and so only multiplication only by reals).. in this text we consider only real and complex vector spaces. we can multiply by real numbers).. if we can multiply vectors by complex numbers. a vn Example: The space also consists of columns of size n. consists of all polynomials p of form pet) = ao + alt + a2 +. that any complex vector space is a real vector space as well (if we can multiply by complex numbers. properties can be deduced from the properties: they imply that 0 = Ov for any v E V. coefficients ak can be O. i. the only difference is that we can now multiply vectors by complex numbers. then we have a real vector space. i. or even all. that some. IF is always either lR or Co Example: The space lRn consists of all columns of size n. and that v = (l)v. If the scalars are the usual real numbers.
Let Vbe a vector space. Elements of the array are called entries of the matrix. LINEAR COMBINATIONS. We will study this in detail later. and let VI' v2"'" vp E Vbe a collection of vectors.k or (A)J.k' and sometimes as in example above the same letter but in lowercase is used for the matrix entries.)m n = a2. xn)T. The transpose of a matrix has a very nice interpretation in terms of linear transformations. It is often convenient to denote matrix entries by indexed letters is}.Basic Notions 3 Question: What are zero vectors in each of the above examples? Matrix notation An m x n matrix is a rectangular array with m rows and n columns. but for now transposition will be just a useful formal operation. the columns of AT are the rows of A and vise versa.k = (A)kj meaning that the entry of AT in the row number) and column number k equals the entry of A in the row number k and row number}. The formal definition is as follows: (AT)j.1 is a general way to write an m x n matrix. One of the first uses of the transpose is that we can write a column vector x E IRn as x = (XI' x 2' •. where the entry is aij' and the second one is the number of the column. For example al. . Very often for a matrix A the entry in row number) and column number k is denoted by AJ. Ifwe put the column vertically.. 5 3 6 So. For example 1 5 3)T (4 2 6 = (12 4J .k=1 . the rows of AT are the columns ofA. its transpose (or transposed matrix) AT. BASES. namely it gives the so called adjoint transformation. A linear combination of vectors VI' v 2"'" vp is a sum of form . it will use significantly more space.1 A = (a ' k ' J j=l. the first index denotes the number of the row. is defined by transforming the rows of A into the columns. Given a matrix A.
.. or with respect to the basis vI' v2' ••. en E ]Rn is called the standard basis in ]Rn... en =~.·.. . Before discussing any properties of bases. en = 0 . + xmvn = v (with unknowns xk ) has a unique solution for arbitrary right side v. _= . VII is a basis is to say that the equation xlvI + x 2v2 +. ••• . . which is 1). ell is a basis in Rn. e2 = P. v. .. .. . an are called coordinates of the vector v (in the basis.. = 0 .. Example: In this example the space is the space Jllln of the polynomials of degree at most n.4 p Basic Notions aiv i + a 2v2 +..... Consider vectors (polynomials) eo' e l . ... and it makes sense to study them. showing that such objects exist. . The system of vectors e l .. (the vector ek has all entries 0 except the entry number k. xnen = LXkek k=1 and this representation is unique... e3 0 = 0 . e 2 . vn E Vis called a basis (for the vector space V) if any vector v E V admits a unique representation as a linear combination II V = aiv i + a 2v2 +. . + anvn = Lak vk • k=1 The coefficients ai' a 2. e2 0 = 0 . Consider vectors 0 0 0 0 0 0 1 e. e2. let us give few examples.. any vector V= can be represented as the linear combination V Xn n = xle l + x 2e2 +. e 1 = t. e3 =~.. en E Jllln defined by eo 1. .J Another way to say that vI' v2'. Example: The space V is ]RII. . . The system e l . Indeed. k=1 Definition: A system of vectors vI' v2. + apvp = Lakvk. e2...
spanning and complete here are synonyms.. We will call it the standard basis in pn.~kVk= L(Uk+~k)Vk> k=1 i. + upvp = I.. .e. k=1 Uk vk . we can operate with them as if they were column vectors. or a complete system) in V if any vector v E V admits representation as a linear combination p v = Uiv i + U 2v 2 +.. . to get the column of coordinates of the sum one just need to add the columns of coordinates of the summands.. The definition of a basis says that any vector admits a unique representation as a linear combination. Clearly. any basis is a generating (complete) system.. say vn +1' . en E pn is a basis in pn. vn' then any vector v is uniquely defined by its coeffcients in the decomposition v = I. namely that the representation exists and that it is unique. if we have a basis. This statement is in fact two statements... i. . Definition: A system of vectors vI' '. •. vn' and just ignore the new ones (by putting corresponding coefficients uk = 0).. k=1 ~k vk ' then n v+w= I. Also. if we stack the coefficients uk in a column. vp ' then the new system will be a generating (complete) system.UkVk+ k=1 I. if v = I. k=1 Uk vk n k=! and w = n I. vn' and we add to it several vectors. because of my operator theory background.. So. . . We do not want to worry about existence. Let us analyse these two statements separately. we can represent any vector as a linear combination of the vectors vI' v2' .. Now. which always admits a representation as a linear combination. pet) representation = ao + alt + a 2t 2 + + ain admits a unique p = aoe o + aiel +. Indeed... + anen· So the system eo' e l . •.Basic Notions 5 Clearly... Namely.'2' . The term complete. say vI' v2' .. e 2. . ' Vp E Vis called a generating system (also a spanning system. The words generating.UkVk k=1 The only difference with the definition of a basis is that we do not assume that the representation above is unique. Remark: If a vector space V has a basis vI' v2.e. so let us consider the zero vector 0.. Generating and Linearly Independent Systems. let us turn our attention to the uniqueness. as with elements oflRn. any polynomial p.
that a system is linearly dependent if and only ifthere exist scalars at' a 2. vp is called linearly dependent if 0 can be represented as a nontrivial linear combination.... If a system is not linearly independent. ..:=11 ak 1"* 0 such that . So.:=l akvk with a k = 0 Vk) of vectors vI' V2. . ... . restating the definition we can say. = xp = O.:=1 1xk 1"* O.. vp is linearly independent i the equation xlvI + x 2v2 + . .. Nontrivial here means that at least one of the coefficient a k is nonzero.. and it can be written as :2.. ••• .. This can be (and usually is) written as :2. 0 = :2. Nontrivial. Definition: A system of vectors vI' v2' . The following proposition gives an alternative description of linearly dependent systems. Then there exist scalars ak' :2. A trivi~llinear combination is always (for all choices of vectors vI' v2' .P' :2. ••• . ..~iVj' VI' P j=1 j*k Proof Suppose the system V2"'" vp is linearly dependent.•. By negating the definition of linear independence. vp equals O.... vp is linearly dependent i the equation XlVI +x2v2 +··· +xpvp=O (with unknowns x k ) has a nontrivial solution... . (J. a k vk = o. it is called linearly dependent. the system vI' v2. k=1 An alternative definition (in terms of equations) is that a system VI' v 2' ••• .6 Basic Notions Definition: A linear combination a l vI + a 2v2 +. In other words.. vp ) equal to 0. Vk = :2. + apvp is called trivial if a k = 0 Vk. Proposition: A system of vectors VI' V2.. once again again means that at least one ofxk is different from 0. and that IS probably the reason for the name. vp E V is called linearly independent if only the trivial linear combination (:2..:=l akvk . vp E V is linearly dependent if and only if one of the vectors Vk can be represented as a linear combination of the other vectors.:=1 1ak 1"* o.. . + xpvp = 0 (with unknowns x k) has only trivial solution xI = x 2 =.:=11 ak 1"* 0 such that p :2. we get the following Definition: A system of vectors vI' v2 ..
if a system vI' v 2. v can be represented as n V = I'V V I U. Suppose a system vI' v2' .. The following proposition shows that the converse implication is also true. the trivial linear combination must be the only one giving O... ••. Take an arbitrary vector v2 v.• + '""'n Vn = £. Indeed. Suppose v admits another representation Then . vn is linearly complete (generating).. if a system is a basis it is a complete (generating) and linearly independent system. if holds.J akvk' rv ~ I'V k=I We only need to show that this representation is unique. 0 can be represented as a nontrivial linear combination a/a vk  I~jVj j=1 P =0 j"#k Obviously. O. vn is a basis. Vn E V is a basis if and only if it is linearly independent and complete (generating).. .. vn is linearly independent and complete. so in one direction the proposition is already proved.• . . + apvp = O. Then. So. Proof: We already know that a basis is always linearly independent and complete. + anvn = Lakvk' k=l Since the trivial linear combination always gives 0. any basis is a linearly independent system. moving all terms except akvk to the right side we get p akvk Iajvj. Let us prove the other direction. Let k be the index such that ak:f:..I + '""'2V 2 + •..···. Since the system vI' v2. j=1 j"#k Dividing both sides by ak we get with ~j = k• On the other hand. Proposition: A system of vectors v I' v2' . 0 admits a unique representation n 0= alv 1 + a 2v2 +. as we already discussed..Basic Notions 7 aiv i + a 2v 2 + .
So. . Although this definition is more common than one presented in this text.e.L. as the examples below show. i. Then there exists a vector V k which can be represented as a linear combination of the vectors vj' j :j. Suppose it is not linearly independent. and the set Y is called the target space or codomain of T. we are done. it is linearly dependent. If the new system is linearly independent. Examples: You dealt with linear transformation before. namely that any vector admits a unique representation as a linear combination. and we are done. Proof Suppose VI' v2"'" Vp E V is a generating (complete) set._l be the differentiation operator.. So. any linear combination of vectors vI' v2"'" vp can be represented as a linear combination of the same vectors without vk (i. because otherwise we delete all vectors and end up with an empty set.e. if we delete the vector vk.e. + anvn is unique. 1fnot. Repeating this procedure finitely many times we arrive to a linearly independent and complete system..8 n n n Basic Notions L. 2.j = k).. v E V. any finite complete (generating) set contains a complete linearly independent subset. b. T (av) = aT (v) for all v E Vand for all scalars a. the vectors vj' 1 ~j ~p. If it is linearly independent. W be vector spaces. (ak advk = L. v E Vand for all scalars a.. may be without even suspecting it.fk.fu. The set X is called the domain of T. We write T: X ~ Y to say that T is a transformation with the domain X and the target space Y. Uk . Remark: In many textbooks a basis is defined as a complete and linearly independent system.Uk = 0 'r. i. Example: Differentiation: Let V = lfDn (the set of polynomials of degree at most n). Properties I and 2 together are equivalent to the following one: T (au + pv) = aT (u) + PT (v) for all u. and thus the representation v = aIv I + a 2v 2 +. the new system will still be a complete one. it is a basis. MATRIXVECTOR MULTIPLICATION A transformation T from a set X to a set Y is a rule that for each argument (input) x E X assigns a value (output) y = T (x) E Y. A transformation T: V ~ W is called linear if I. k.. W = lP'nl' and let T: lfDn ~ lfD. (akvk). a basis. It emphasizes the main property of a basis. LINEAR TRANSFORMATIONS. Since vk can be represented as a linear combination of vectors vj' j :j.akVk = VV= 0 k=l k=l k=l Since the system is linearly independent. Definition: Let V.. k. we repeat the procedure. Proposition: Any (finite) generating system contains a basis. T (u + v) = T(u) + T (v) 'r.
Any such transformation is given by the formula T (x) = ax where a = T (1)./)' = af'. any linear transformation of jR is just a multiplication by a constant. T(x) = T(x x I) =xT(l) =xa = ax. it rotates as a whole the parallelogram used to define a sum of two vectors (parallelogram law). and a transformation Ty: jR2 7 jR2 takes a vector in jR2 and rotates it counterclockwise by r radians. Indeed. Example: Reflection: in this example again V = W = jR2. It is also easy to see that the property 2 is also true. and the transformation T: jR2 7 jR2 is the reflection in the first coordinate axis. it is easy to write a formula for T. but we will use another way to show that. . this is a linear transformation. but by a matrix. Example: Let us investigate linear transformations T: jR ~ lR. Linear transformations J!{' 7 Matrixcolumn mUltiplication: It turns out that a linear transformation T: jRn 7 jRm also can be represented as a multiplication. Example: Rotation: in this example V = W = jR2 (the usual coordinate plane). Therefore the property 1 of linear transformation holds. Since Tyrotates the plane as a whole. It can also be shown geometrically.Basic Notions 9 T (p):= p'lip E lP'n' Since if + g) = f + g and (a. not by a number. Fig. Rotation Namely. r. So. that this transformation is linear. T ((::)) ~ V~J and from this formula it is easy to check that the transformation is linear.
.2 al.n a2.e..2 +X2 a2..k ak = Then if we want Ax = T (x) we get al.n am.k al. this matrix contains all the information about T. Namely.. Let us show how one should define the product of a matrix and a vector (column) to represent the transformation T as a product.. Indeed.e. Let al. What information do we need to compute T (x) for all vectors x E ]Rn? My claim is that it is sufficient how T acts on the standard basis e" e 2. 2.. .n a2..1 A= am.l So. the matrixvect~r multiplication should be performed by the following column by . Let T: ]Rn ~ ]Rm be a linear transformation..2 am.2 am.2 +···+Xn al. it is sufficient to know n vectors in Rm (i. . .1 am.. that the column number k of A is the vector ak . k=l k=l k=l k=l So. .n = Xl a2.l Ax = n LXkak k=l a2.l am. an] (ak being the kth column of A.10 Basic Notions Let us see how." the vectors of size m).n al.k am. if we join the vectors (columns) aI' a2. . T (x) = Ax.. i. n). k = 1.. . . an together in a matrix A = [aI' a2.n Recall.. en of Rn. let X= Xn Then x = xle l + x 2e2 + .2 a2. al. + xnen = L:~=lxkek and T(x) = T(ixkek) = iT(Xkek) = iXkT(ek) = iXkak .l a2.
. one need to multiply row number k of the matrix by the vector. T(x) = Ax. .. if Ax = y. ifit is a basis) in V.2. . e2. it is more convenient to compute the result one entry at a time. .2.2+3. ••• .Basic Notions 11 coordinate rule: Multiply each column of the matrix by the corresponding coordinate of the vector. n.. . m. Example: The "column by coordinate" rule is very well adapted for parallel computing. if vI' V 2. Example: 1 (4 2 6 5 3)(~J = (1. ••.}}.. . even any generating (spanning) set. a linear transformation T: V ? W is completely defined by its values on a generating set (in particular by its values on a basis). vn is a generating set (in particular. and T and TI are linear transformations T. Namely. when doing computations manually. that is. j=lk.1+2. then T (x) can be found by the matrixvector multiplication. T~: V ? W such that Tvk = TIv". If the matrix A of the linear transformation T is known. . then yk = I n a ·x· . This can be expressed as the following row by column rule: To get the entry number k of the result. The fact that we consider the standard basis is not essential. 2. en is the standard basis in Rn) into a matrix: kth column of the matrix is ak' k = 1. one can consider any basis.3)=(14) 3 4·1 + 5·2 + 6·3 32 Linear transformations and generating sets: As we discussed above. n thenT= TI. linear transformation T(acting from ~nto ~m) is completely defined by its values on the standard basis in ~n. To perform matrixvector multiplication one can use either "column by coordinate" or "row by column" rule. However.. and aj'k are the entries of the matrix A. To get the matrix of a linear transformation T: Rn ? Rm one needs to join the vectors ak = T ek (where e I . Conclusions 1. In particular.2. . It will be also very important in different theoretical constructions later. here Xj and Yk are coordinates ofthe vectors x and y respectively..k = 1. k= 1.
very often people do not distinguish between a linear transformation and its matrix. Ab2 . . the size of a row of A should be equal to the size of a column of B. We will also use this notation. then Ab I . but if we recall the row by column rule for the matrixvector multiplication. For example. In other words the product AB is defined i£ and only if A is an m x nand B is n x r matrix. then the multiplication is not defined. if using the "row by column" rule you run out of row entries..n can only be multiplied by an m x n matrix. we can see that in order for the multiplication to be defined. m. if aj.e. since an m x n matrix defines a linear transformation JR. not T(v + u). . i.n ~ JR. but still have unused entries in the column. ••.k are entries of the matrices A and B respectively. I intentionally did not speak about sizes of the matrices A and B.. However.k = Laj"b"k' . When it does not lead to confusion. Remark: In the matrixvector mUltiplication Ax the number of columns of the matrix A matrix must coincide with the size of the vector x. (column k of B) Formally it can be rewritten as (AB)j. . the notation Tv is often used instead of T(v). Formally. Abr are the columns of the matrix AB. the multiplication is not defined. its matrix is usually denoted as [T].n. Since a linear transformation is essentially a multiplication. .n ~ JR:m.k = (row j of A) . but still have some unused entries in the vector. It is also not defined if you run out of vector's entries. and use the same symbol for both.. Recalling the row by column rule for the matrixvector mUltiplication we get the following row by column rule for the matrices the entry (AB)j. COMPOSITION OF LINEAR TRANSFORMATIONS AND lVIATRIX MULTIPLICATION Definition of the matrix multiplication: Knowing matrixvector multiplication. It makes sense. one can easily guess what is the natural way to define the product AB of two matrices: Let us multiply by A each column of B (matrixvector multiplication) and join the resulting columnvectors into a matrix. br are the columns of B. if b I . Tv + u means T(v) + u. b2 .e. The former is well adapted for parallel computers.k (the entry in the row j and column k) of the product AB is defined by (AB)j. we will also use the same symbol for a transformation and its matrix. i. Note that the usual order of algebraic operations apply.k and bj.12 Basic Notions The latter seems more appropriate for manual computations. and will be used in different theoretical constructions." a vector in JR. so vector x must belong to JR. For a linear transformation T: JR. The easiest way to remember this is to remember that if performing multiplication you run out of some elements faster.
. the expression TI(Tix)) is well defined and the result belongs to ]Rm.. let g be the angle between the xI axis and the line xI = 3x2.. er is the standard basis in Rr. . as it is defined above. . knowing the matrices of TI and T2? Let A be the matrix of TI and B be the matrix of T2. we can (and will) write TI T2 instead of TI T2 and TIT2x instead of TI(Tix )). . the columns of T are vectors T (e l ). then TI must act from Rn to some space. Why are we using such a complicated rule of multiplication? Why don't we just multiply matrices entrywise? And the answer is. However. T2 as . so it is defined by an m x r matrix. T1: ]Rn ~ ]Rm and T2: ]Rr ~ ]Rn. and then rotate the plane by g. For k = 1. Suppose we have two linear transformations. . 2. arises naturally from the composition of linear transformations.. the direct computation of Te I and Te2 involves significantly more trigonometry than a sane person is willing to remember. . Then to get the reflection T we can first rotate the plane by the angle g. How one can find this matrix. T: ]Rr ~ ]Rm. To find the matrix. As we discussed in the previous section. and let To be the reflection in the xlaxis. Namely. and that is exactly how the matrix AB was defined! Let us return to identifying again a linear transformation with its matrix. T(e r). Note that in the composition TI T2 the transformation T2 is applied first! The way to remember this is to see that in TI T2x the transformation T2 meets x first. say ]Rm. So. . It is a linear transformation.. but in the next few paragraphs we will distinguish them be of sizes m x nand n x r respectivelythe same condition as obtained from the row by column rule.. we need to compute Tel and Te2 . T (e 2). So. that the multiplication.. Formally it can be written as Motivation: Composition of linear transformations. taking everything back.. the columns of the matrix of Tare Abl' Ab2.. . say ]R'to ]Rn. Define the composition T = TI T2 of the transformations T I . where el' e2. Note that TI(x) ERn. moving the line xI = 3x2 to the xlaxis. Abr.Basic Notions 13 T (x) = TI(Tix)) \Ix ERr. different form the row by column rule: for a composition T JT2 to be defined it is necessary that T2x belongs to the domain of T1• If T2 acts from some space. An easier way to find the matrix of T is to represent it as a composition of simple linear transformation. Since the matrix multiplication agrees with the composition. Remark: There is another way of checking the dimensions of matrices in a product. so let us find its matrix. in order for TI T2 to be defined the matrices of TI and T2 should We will usually identify a linear transformation and its matrix. So. It is easy to show that T is a linear transformation. Example: Let T: ]R2 ~ ]R2 be the reflection in the line xI = 3x2. then reflect everything in the xIaxis. r we have T (e k) = TI (T2(e k )) = TI(Be k) = TI(b k) = Abk (operators T2 and TI are simply the mUltiplication by B and A respectively). Since TI:]Rn ~ ]Rm.
The properties of linear transformations then imply the properties for the matrix multiplication. 3.. Distributivity: A(B + C) = AB + AC.JW 1 and similarly sin y = second coordinate length . first coordinate 3 3 cos Y= length . This properties are easy to prove. (A + B)C = AC + BC. One can take scalar multiplies out: A(aB) = aAB. Properties of Matrix Multiplication. The new twist here is that the commutativity fails: Matrix multiplication is noncommutative. 2. provided that either left or right side is well defined. One should prove the corresponding properties for linear transformations.~32 + 12  Il. cosy sin Y) Ry = ( sin y cos y . The matrix of To is easy to compute. To compute sin yand cos ytake a vector in the line x I = 3x2. generally for matrices AB = BA. provided either left or right side of each equation is well defined. familiar to us from high school algebra: I.14 Basic Notions T= RgTORY where Rg is the rotation by g.J1O Gathering everything together we get T~ VoRy~ lo G~I)(~ ~I) lo (~I ~) ~ I~ G~I)(~ ~I)( ~I ~) It remains only to perform matrix multiplication here to get the final result. Then . . Associativity: A(BC) = (AB)C.e. and they almost trivially follow from the definitions. i.~32 + 12  . COS(y) R_y= ( sin(y) siney») (COSY cos(y) = siny sin y) cosy. say a vector (3. To the rotation matrices are known ~ (~ _~). Matrix multiplication enjoys a lot of properties.
.e. Then the product AB is well defined.. namely it gives the socalled adjoint transformation.k = (A)kJ meaning that the entry of AT in the row number) and column number k equals the entry of A in the row number k and row number}. Even when both products are well defined. T (1 4) So. the multiplication is still noncommutative. For example I 2 (4 5 !) ~ ~ ! . Trace and Matrix Multiplication. Given a matrix A." when you take the transpose of the product. letA and B be matrices of sizes m x nand n x r respectively. A simple analysis of the row by columns rule shows that (AB)T = BTAT. If we put the column vertically. Then trace(AB) = trace(BA) . you change the order of the terms. If we just pick the matrices A and B at random. The formal definition is as follows: (AT)j. the rows of AT are the columns ofA. BA is not defined. it will use significantly more space. Xn)T. the columns of AT are the rows of A and vise versa. for example. but if m = r. its transpose (or transposed matrix) AT is defined by transforming the rows of A into the columns. when A and Bare nxn (square) matrices. the chances are that AB = BA: we have to be very lucky to get AB = BA. Transposed Matrices and Multiplication. We will study this in detail later.k n Theorem: Let A and B be matrices of size m Xn and n Xm respectively (so the both p )ducts AB and BA are well defined).k) its trace (denoted by trace A) is the sum of the trace A = k=l L ak. i.• . The transpose of a matrix has a very nice interpretation in terms of linear transformations. One of the first uses of the transpose is that we can write a column vector x E Rn as x = (x \' x 2. For a square (n x n) matrix A diagonal entries = (aj. but for now transposition will be just a useful formal operation.Basic Notions 15 One can see easily it would be unreasonable to expect the commutativity of matrix multiplication. Indeed.
The transformation A is called right invertible if there exists a linear transformation C: W ~ V such that .k' There are essentially two ways of proving this theorem. if I: lR n ~ lR n is the identity transformation in Rn. T} (X) = trace(XA) To prove the theorem it is sufficient to show that T = T 1. We can consider two linear transformations. ISOMORPHISMS IDENTITY TRANSFORMATION AND IDENTITY MATRIX Among all linear transformations. However. If you are not comfortable with algebraic manipulatioos. we use the notation In. we need just to check the equality on some simple matrices. INVERTIBLE TRANSFORMATIONS AND MATRICES. Since a linear transformation is completely defined by its values on a generating system. To be precise. Clearly. there is the identity transformation I = Iv: V ~ V. there are infinitely many identity transformations: for any vector space V. otherwise we just use 1. Ivx = x. the equalities AI=A. T and Tl' acting from Mnxm to lR = lRI defined by T (X) = trace(AX).16 Basic Notions J0. there is a special one. the equality for X = A gives the theorem. when it is does not lead to the confusion we will use the same symbol I for all identity operators (transformations). 'Vx E V.IA =A hold (whenever the product is defined). for an arbitrary linear transformation A. One is to compute the diagonal. entries of AB and of BA and compare their sums. We will use the notation IV only we want to emphasize in what space the transformation is acting. there is another way. the identity transformation (operator) L Ix = x. 'Vx. its matrix is an n x n matrix 1 0 o 1 1=1n = 0 0 1 o 0 (l on the main diagonal and 0 everywhere else). We say that the transformation A is left invertible if there exist a transformation B: W ~ V such that BA = I (I = I v here). When we want to emphasize the size of the matrix. for example on matrices which has all entries 0 except the entry I in the intersection of jth column and kth row. Clearly. This method requires some proficiency in manipulating sums in notation. INVERTffiLE TRANSFORMATIONS Definition: Let A: V ~ W be a linear transformation.
On the other hand BAC = (BA)C = IC = C. Examples: 1. The uniqueness of C is proved similarly. and generally left and right inverses are not unique. AA. The matrix AI is called (surprise) the inverse of A. and it also can be checked by the matrix multiplication. Ry = (C~S1 sm 1 sin cos 1 1) rl is invertible. Theorem. To show that this matrix is not right invertible. Corollary: A transformation A: V ~ Wis invertible if and only if there erty is used as the exists a unique linear transformation (denoted AI). Exercise: describe all left inverses of this matrix. right invertible) if the corresponding linear transformation is invertible (resp. If a linear transformation A: V ~ W is invertible. 112). One of the possible left inverses in the row (112. right invertible).Basic Notions 17 AC = I (here 1= I w)' The transformations Band C are called left and right inverses of A.I = 1. Note. 11 The rotation Rg = I. Then BAC = B(AC) = BI = B. and the inverse is given by (R y = R_"( This equality is clear from the geometric description of Rg. Repeating the above reasoning with B I instead of B we get B 1 = C. I)T is left invertible but not right invertible. we just notice that there are more than one left inverse. then its left and right inverses Band C are unique and coincide. The column (l. AI: W ~ V such definition of an AIA = IV' AAl = Iw The transformation AI is called the inverse of A. that we did not assume the uniqueness of B or C here. . Definition: A matrix is called invertible (resp. The identity transformation (matrix) is invertible. left invertible. 2. Therefore the left inverse B is unique. Suppose for some transformation BI we have BIA = 1. left invertible. Proof Let BA = I and AC = 1. Theorem: asserts that a matrix A is invertible if there exists a unique matrix AI such that A1A = I. 3. and therefore B = C. Definition: A linear transformation A: V ~ W is called invertible if it is both right and left invertible.
I = L AIA = 1. (AIr l = A. If A is invertible. I presented it here only to stop trying wrong directions. The row (l. and similarly AT (AIl = (AtAl = IT = 1. then AI is also invertible.sIB = I Remark: The invertibility of the product AB does not imply the invertibility of the factors A and B (can you think of an example?). . Theorem: (Inverse of AT).sIAI) = A(B. (AIt l = A. And finally. However. it is just another name for the object we already studied. If A and B are invertible and the product AB is defined.18 Basic Notions 4. if a square matrix A has either left of right inverse. 1) is right invertible. 1I2l is a possible right inverse. 2. we will not use it. This fact will be proved later. If linear transformations A and B are invertible (and such that the product AB is defined).sIAI. Isomorphic spaces can be considered as di erent representation of the same space. then AT is also invertible and (ATrl = (At)T Proof Using (ABl = BT AT we get (At)T AT = (AAt)T = IT = I. if one of the factors (either A or B) and the product AB are invertible. We did not introduce anything new here. ret us summarize the main properties of the' inverse: 1. but not left invertible. Until we prove this fact.I = AAI = I and similarly (. then the second factor is also invertible. then AI is also invertible. it is sufficient to check only one of the identities AA. Remark: An invertible matrix must be square (n x n). So. ISOMORPHISM ISOMORPHIC SPACES. then AT is also invertible and (ATt l = (AI)T. Properties of the Inverse Transformation Theorem: (Inverse of the product). then the product AB is invertible and (ABt l = . 3. The column (112. If A is invertible.sI(AIA)B = . it is invertib!e.sI AI (note the change of the order!) Proof Direct computation shows: (AB)(.sI)AI = AIA. Two vector spaces V and Ware called isomorphic (denoted V == W) if there is an isomorphism A: V ~ W. So. then AB is invertible and (AB)I = . Moreover. An invertible linear transformation A: V ~ W is called an isomorphism.sIAI)(AB) = . If a matrix A is invertible. if A is invertible.sIIB = .
en is the standard basis in ]Rn. vn and WI' w2' . ... AVn is a basis..•. . . k = 1. .. Let A: ]Rn+1 ~ JP>n (JP>n is the set of polynomials I:=oak tk of degree at most n) is defined by Ae l = 1. . .. k= 1. Remark: If A is an isomorphism. vn be a basis in V. Aen+ 1 = (1 By Theorem A is an isomorphism. then A is an isomorphism. Remark: In the above theorem one can replace "basis" by "linearly independent". Then x = AIb solves the equation Ax = b.2. .. . The inverse to the Theorem is also true Theorem: Let A: V ~ W be a linear map. Examples: 1. . .. 2.2. . or "generating". Ae n = (11. ... .2. e2.. suppose that for some other vector XI E V Ax I b Multiplying this identity by AI from the left we get . then so is AI. 3. .. vn is a basis if and only if Avl' Av2 . Ae2 = t... so JP>n _ = ]Rn+l. 4.. vn ' Define transformation A: Aek = vk . Theorem: LetA: V ~ Wbe an isomorphism. Proof: Suppose A is invertible.. a linear transformation is defined by its values on a basis). . n. Av2.Basic Notions 19 meaning that all properties and constructions involving vector space operations are preserved under isomorphism. Mmxn == Rm"n Invertibility and equations Theorem: Let A: V ~ W be a linear transformation. Let V be a (real) vector space with a basis ]Rn ~ Vby v\' v2' . Then A is invertible if and only if for any right side b E W the equation Ax=b has a unique solution x E V. The theorem below illustrates this statement... so V== Rn. . and let VI' v2' ••• . Therefore in the above theorem we can state that vI' v2' •. where e l . Then the system Av l . and let vI' V2' . if AVk = w k' k = 1. or "linearly dependent"all these properties are preserved under isomorphisms.. Proof Define the inverse transformation AI by AIwk = vk . To show that the solution is unique. M 2x3 == JR6. n (as we know.. n. . More generally. . Again by Theorem A is an isomorphism. AVn is a basis in W. wn are bases in Vand W respectively.
is that the result of some operation does not belong to Vo. Indeed. so it is not a subspace. namely V itself and {O} (the subspace consisting only of zero vector). Note. 3. Note that both identities. or kernel of A. we can see that any vector W E Ran A can be represented as . With each linear transformation A : V t W we can associate the following two subspaces: 2. i. Let us use symbol y instead of b. We know that given yEW the equation Ax=y has a unique solution x E V. v E Vo the sum u + v E Vo. v E Vo' and for all scalars a. 1. We need to show that B(aYI + PY2) = ap(YI) + PB(Y2)· Let xk := B(Yk)' k = 1. that the empty set 0 is not a vector space. i. If v E Vo then av E Vo for all scalars a. AXk =Yk' k = 1. p.e. But the definition of a subspace prohibits this! Now let us consider some examples: 1. Again. then recalling column by coordinate rule of the matrixvector multiplication. For any u. which is denoted as Null A or Ker A and consists of all vectors v E V such that Ay = o. because all operations are inherited from the vector space V they must satisfy all eight axioms of the vector space.e. Then which means B(aYI + PY2) = aB(Yi) + PB(Y2)· Corollary: An m x n matrix is invertible ifand only ifits columns form a basis in Rm. SUBSPACES A subspace of a vector space V is a subset Vo c V of V which is closed under the vector addition and multiplication by scalars. The range Ran A is defined as the set of all vectors represented as w = Ay for some v E V. since it does not contain a zero vector.20 Basic Notions AlAx =AIb .2. 2. that a subspace Vo c V with the operations (vector addition and multiplication by scalars) inherited from Vis a vector space. Note. i. W E W whicb can be If A is a matrix. A: R m t ]Rn. The null space. Trivial subspaces of a space V. and therefore x I = AI b = x...2. AAI = I and AiA = I were used here.. Let us now suppose that the equation Ax = b has a unique solution x for any b E W. the conditions 1 and 2 can be replaced by the following one: au + bv E Vo for all u.e. Let us call this solution B (y). The only thing that could possibly go wrong. Let us check that B is a linear transformation.
and even with modern and powerful computers manipulations can take some time. some.y plane (more precisely. which play role of x and y coordinates on the plane. The notation span{v I. So a rectangle on a plane with x . like letters. like Adobe Photoshop. so manipulations with bitmaps require a lot of computing power. a rectangle there) is a good model of a computer monitor. For example. + arvr of vectors vI' V2' . where we describe only critical points.. but just explain some ideas. vr } It is easy to check that in all of these examples we indeed have subspaces. and graphic engine connects them to reconstruct the object. for a matrix A. We will not go into the details. APPLICATION TO COMPUTER GRAPHICS In this section we give some ideas of how linear algebra is used in computer graphics. And now the last Example.Basic Notions 21 a linear combination of columns of the matrix A. vr } is also used instead of £{vl' v 2'···.. vr } is the collection of all vectors V E Vthat can be represented as a linear combination v = alv I + a 2v2 +. where each point (vector) v is . . to describe a polygon. Of course. where every pixel of an object is described. not all objects on a computer screen can be represented as polygons. 4.y coordinates is a good model for a computer screen: and a graphical object is just a collection of points. V 2' . That explains why the term column space (and notation Col A) is often used for the range of the matrix. 2Dimensional Manipulation The x . A (digital) photo is a good example of a bitmap object: every pixel of it is described. knows that one needs quite a powerful computer. But there are standard methods allowing one to draw smooth curves through a collection of points. vr . Vr E Vits linear span (sometimes called simply span) £{V I... v 2' ••• . . appearing on a computer screen are vector ones: the computer only needs to memorize critical points. and vector object... Given a system of vectors vI' V 2 ' .. Anybody who has edited digital photos in a bitmap manipulation programme. and which vertex is connected with which. the notation Col A is often used instead of Ran A. or bitmap) and we would like to show how one can perform some manipulations with such objects. . one needs only to give the coordinates of its vertices. each pixel is assigned a specific colour. Remark: There are two types of graphical objects: bitmap objects. Any object on a monitor is represented as a collection of pixels. Position of each pixel is determined by the column and row. The simplest transformation is a translation (shift). That is the reason that most ofthe objects. Bitmap object can contain a lot of points. In particular we explain why manipulation with 3 dimensional images are reduced to multiplications of 4 x 4 matrices. So.. For us a graphical object will be a collection of points (either wireframe model. have curved smooth boundaries.
the horizontal lines remain horizontal. First we need to be able to manipulate 3dimensional objects. the object becomes "taller" or "wider". it does not preserve O. then rotate around 0 (multiply by R) and then translate everything back by a. The manipulations with 3dimensional objects is pretty straightforward. sin y) R . so the translation is easy to implement. i. b :?: O. The rotation by yaround the origin o is given by the multiplication by the rotation matrix Rr we discussed above. 0): while it preserves the straight lines. If a :f.. a. rotation. and then we need to represent it on 2dimensional plane (monitor). given by a matrix _ (COSY (~ ~). but vertical lines go to the slanted lines at the angle j to the horizontal ones. that any linear transformation in ]R2 can be represented either as a composition of scaling rotations and reflections. we first need to translate the picture bya. we have the same basic transformations: Translation. given by the matrix This transformation makes all objects slanted.e. reflection through a plane. All other transformation used in computer graphics are linear. Note. r Stny cosy Ifwe want to rotate around a point a. the vector v is replaced by v + a (notation v 17 v + a is used for this). b then x and y coordinates scale di erently. preserving its shape.22 Basic Notions translated by a. scaling. 3Dimensional Graphics Threedimensional graphics is more complicated. Matrices of these . However it is sometimes convenient to consider some di erent transformations. If a = b it is uniform scaling which enlarges (reduces) an object. Another often used transformation is reflection: for example the matrix defines the reflection through xaxis. like the shear transformation. moving the point a to 0. We will show later in the book. The first one that comes to mind is rotation. that the translation is not a linear transformation (if a :f. Another very useful transformation is scaling. A vector addition is very well adapted to the computers.
It will be shown later that a rotation around an arbitrary axis can be represented as a composition of elementary rotations. For example ~ ( o ~ ~ J' (~ ~ ~J' (:~:~ ~:i. To perform such projection one just needs to replace z coordinate by 0. say to the x . scaling. However. Similarly. because it does not take into account the perspective. and rotation around zaxis. Note. this method does not give a very realistic picture.Basic Notions 23 x transformations are very similar to the matrices of their 2 the matrices 2 counterparts. Perspective Projection onto x . one can write matrices for the other 2 elementary rotations around x and around y axes. the fact that the objects that are further away look smaller. The simplest way is to project it to a plane. the matrix of this projection is [ ~ ~ ~J.y plane. it does not change z coordinate.y plane. To get a more realistic picture one needs to use the socalled perspective projection. that the above rotation is essentially 2dimensional transformation. Rotating an object and projecting it is equivalent to looking at it from di erent points.y plane: F is the centre (focal point) of the projection Such method is often used in technical illustrations. Let us now discuss how to represent such objects on a 2dimensional plane. To: Qefine a perspective projection one needs to pick a point the centre of projection or the .yY ~J' 0 I 0 0 cOO I represent respectively reflection through x . 000 y :r % Fig. we know how to manipulate 3dimensional objects. So.
lz/d Note. and it is a reasonable first approximation of how our eyes work. Assume that the focal point is (0.z / d ' 1.z / d' y* = O)T This transformation is definitely not linear (because of z in the denominator). 0. y. y. its image and the centre of the projection lie on the same line.. 4th coordinate playing role of the scaling coe cient.dz lz/d and similarly y . Let us get a formula for the projection.... To do this let us introduce the socalled homogeneous coordinates. zl.24 Basic Notions focal point) and a plane to project onto."'" (x'. y. Thus. Thus the perspective projection maps a point (x. Finding Coordinates x*.. However it is still possible to represent it as a linear transformation. to get usual3dimensional coordinates of the vector v = (x. O) x' z x h) z Fig. In the homogeneous coordinates.. y* of the Perspective Projection of the Point (x. This is exactly how a camera works.= . Then each point in ]R3 is projected into a point on the plane such that the point. we get that x* x d dz' so y ._ _.. and let v* = (x*. z) to the x y point ( 1. y*.. ol be its projection. y.. the last. that this formula also works if z > d and if z < 0: you can draw the corresponding similar triangles to check It. z) T xd x x*= . every point in ]R3 is represented by 4 coordinates. x 3 .y' .. Consider a point v = {x. d)T and that we are projecting onto xy plane. x 4l . zl from its homogeneous coordinates (x l' x 2 ..
if the plane we project to is not xy plane. in homogeneous coordinates a point in ]R3 is represented by a line through 0 in ]R4. shades. and then move everything back translating it by (d t . I . Similarly. Ifwe multiply homogeneous coordinates of a point in]R2 by a nonzero scalar. Of course. here we only touched the mathematics behind 3dimensional graphics. Thus in homogeneous coordinates the vector v* can be represented as (x. ol. . O)Tto move the centre to (0. 0. apply the projection. we do not change the point. y. d) T but some arbitrary point (d t . d3l.z/dl. d2 . That explains why modern graphic cards have 4 x 4 matrix operations embedded in the processor. so in homogeneous coordinates the perspective projection. and so on. 0. Then we first need to apply the translation by (dp d2. All these operations are just multiplications by 4 x 4 matrices. how to determine which parts of the object are visible and which are hidden. In other words. we move it to the xy plane by using rotations and translations. there is much more. how to make realistic lighting. d2 . is a linear transformation: x y 0 = 0 0 0 I 0 0 0 0 0 0 0 I x y z I lzld 0 0 lid Note that in the homogeneous coordinates the translation is also a linear transformation: x y = 0 0 0 o 0 0 G3 z I I But what happen if the centre of projection is not a point (0. For example. d3)T while preserving the xy plane. etc. 0. so we assume that the case x 4 = 0 corresponds to the point at infinity).Basic Notions 25 one needs to divide all entries by the last coordinate x4 and take the first 3 coordinates 3 (if x 4 = 0 this recipe does not work.
and finally.. . + a'nXn =:.k)T....Chapter 2 Systems of Linear Equations Different Faces of Linear Systems There exist several points of view on what a system of linear equations. a 2I x i { + + + a 12 x 2 a 22 x 2 amZxZ + .... . The first one is. k = I. Note. or in short a linear system is. n. . . . recalling the "column by coordinate" rule of the matrixvector multiplication..b2. To solve the above equation is to find all vectors X E Rn satisfying Ax = b. xnl E lR n . we can write the system as a vector equation where a k is the kth column of the matrix A. b = (b l . and A= (~~:~ ~~:~ am 'I am.Z ::: '" ~~::].... xla l + x 2a 2 + . am.b2 amnXn amixI = bm To solve the system is to find all ntuples of numbers xl' X2' . .. am'n then the above linear system can be written in the matrix form (as a matrix vector equation) Ax = b. . bml E lR m . that it is simply a collection ofm linear equations with n unknowns xl' X 2' . . these three examples are essentially just different representations of the same mathematical object.. xn which satisfy all m equations simultaneously... X n' all X. + xnan = b. bi a 2n Xn . Ifwe denote X:= (xl' X2' . . + + '" + + . 2.... a k = (alk' a 2'k' .
all the information necessary to solve the system is contained in the matrix A. that any solution of the "new" system is also a solution of the "old" one. This matrix is called the augmented matrix ofthe system. 3. When the system is in the echelon form. Row replacement: replace a row # k by its sum with a constant multiple of a row # j.. Row operations. we reduce it to a simple form.Systems of Linear Equations 27 Before explaining how to solve a linear system. It is clear that the operations 1 and 2 do not change the set of solutions of the system. x k' Yk or something else.. i. on the equations). let a "new" system be obtained from an "old" one by a row operation of type 3. let us notice that it does not matter what we call the unknowns. we just notice that row operation of type 3 are reversible. more "advanced" explanation why the above row operations are legal. There are three types of row operations we use: 1. one can easily write the solution. To see that we do not gain anything extra. i. As for the operation 3. which is called the coefficient matrix of the system and in the vector (right side) b. Namely.. one can easily see that it does not lose solutions. Echelon and Reduced Echelon Forms Linear system are solved by the GaussJordan elimination (which is sometimes called row reduction). Row exchange: interchange two rows of the matrix. Namely. Hence. all the information we need is contained in the following matrix which is obtained by attaching the column b to the matrix A. Solution of a Linear System. Then any solution of the "old" system is a solution of the "new" one. Namely.e.e. every row operation is equivalent to the multiplication of the matrix from the left by one ofthe special elementary matrices. they essentially do not change the system. So. all other rows remain intact. 2. By performing operations on rows of the augmented matrix of the system (i.e. We will usually put the vertical line separating A and b to distinguish between the augmented matrix and the coefficient matrix. the multiplication by the matrix . the "old' system also can be obtained from the "new" one by applying a row operation of type 3. Scaling: multiply a row by a nonzero scalar a. Row operations and multiplication by elementary matrices. There is another. the socalled echelon form.
. And finally. 1 o o 1 0 Q k o 0 0 ] o o 1 Multiplication by the matrix multiplies the row number k by Q. that the multiplication by these matrices works as advertised..28 } Systems of Linear Equations k o 1 o . So. one just replaces a by 1/a. . one can just see how the multiplications act on vectors (columns). Left multiplying the equality Ax = b by E we get that any solution of the equation Ax =b ... .. To see that the inverses are indeed obtained this way... To get the inverse ofthe second one. performing a row operatiQn on the augmented matrix of the system Ax = b is equivalent to the multiplication of the system (from the left) by a special invertible matrix E. To see.. The inverse ofthe first matrix is the matrix itself... Finally. one again can simply check how they act on columns. the inverse of the third matrix is obtained by replacing a by a. multiplication by the matrix 1 } 1 o 1 o k o Q A way to describe (or to remember) these elementary matrices: they are obtained from I by applying the corresponding row operation to it adds to the row # k row # } multiplied by a... and leaves all other rows intact.. Note that all these matrices are invertible (compare with reversibility of row operations).. 0 k o just interchanges the rows number} and number k..
m. Row reduction. . . . . not even interchange it with another row. Multiplying this equation (from the left) by ~l we get that any of its solutions is a solution of the equation ~IEAx =~IEb . which is the original equation Ax = b. we get: U~ ~ n . 2. by applying row operations of type 2.Systems oj Linear Equations 29 is also a solution of EAx = Eb. Find the leftmost nonzero column of the matrix. The main step of row reduction consists of three substeps: 1. that the first (the upper) entry of this column is nonzero. 2(3).. R 3 .~ Operate R2 ( ± ). An example of row reduction. This entry will be called the pivot entry or simply the pivot. So. We apply the main step to a matrix... etc. make them 0) all nonzero entries below the pivot by adding (subtracting) an appropriate multiple of the first row from the rows number 2. After applying the main step finitely many times (at most m). The point to remember is that after we subtract a multiple of a row from all rows below it (step 3). m.(~ j j JJ :. we get what is called the echelon form of the matrix.. a row operation does not change the solution set of a system. m. then to rows 3. we get . "Kill" (i. . 3. ..e... 3. Let us consider the following linear system: XI + 2x2 + 3x3 = 1 3xI+2x2 +x3 { =7 2x1 + X 2 + 2x3 = 1 The augmented matrix of the system is (~ 2 ~ ~ 1 2 jJ 1 Operate R 2 . Make sure. if necessary. (2). then we leave the first row alone and apply the main step to rows 2. we leave it alone and do not change it in any way.
from the first row (equation) xl = 1 . 3. or in vector form { :~ : 13 'x~UJ or x= (1. or simply pivot.(6 ~ g ~) . where A is the coefficient matrix. Then from the second equation we get x 2 = 1. ~ (6 0 o ~ J) =~~ _(6 ~ g ~)2R2 . Then the second property of the echelon form can be formulated as follows: 2.2)T 0 the augmented matrix.3x3 = 1 ." the rows with all entries equal 0). the solution is (6 ~ ~ ~)3R2 o x3 = 2. are below all nonzero entries. The leading entry in each row in echelon form is also called pivot entry.2X2 .6 + 6 = 1. Instead of using back substitution. All zero rows (i. if any. we can do row reduction from down to top. A matrix is in echelon form if it satisfies the following two conditions: 1. because these entries are exactly the pivots we used in the row reduction. For a nonzero row. Echelon form. R3 2 (3). Namely.2l.30 Systems of Linear Equations (~ Operate. For any nonzero row its leading entry is strictly to the right of the leading entry in the previous row. So. and the rest is pretty selfexplanatory: 1 2 0 0 1 2 0 0 1 2 and we just read the solution x = (1. we obtain JJ =l) (6 ~ ~ l) 3 4 1 0 0 2 4 Now we can use the so called back substitution to solve the system. killing all the entries above the main diagonal of the coefficient matrix: we start by multiplying the last row by 112. Pivots: leading (rightmost nonzero entries) in a row.1 . from the last row (equation) we getx3 =2.e. and finally.2x3 = . 3. let us call the leftmost nonzero entry the leading entry. We can check the solution by mUltiplying Ax.2(2) = 3.
we work from the bottom to the top and from the right to the left. . Then we can just write the solution. 4. we get what the socalled reduced echelonform of the matrix: coefficient matrix equal I. After the backward phase of the row reduction. All entries above the pivots are O. one just reads the solution from the reduced echelon form. moving all free variables to the right side. that all entries below the pivots are also o because of the echelon form.. All pivot entries are equal I. In this form the coefficient matrix is square (n x n).Systems of Linear Equations 31 A particular case of the echelon form is the socalled triangular form. one can also easily read the solution from the reduced echelon form. The idea is to move the variables. all its entries on the main diagonal are nonzero. Note. ( 0000ill3 here we boxed the pivots. Xl = 12x2 = 2 Sx4 x 2 is free x3 x 4 is free x5 =3 or in the vector form 12x2 x2 1Sx4 X= One can also find the solution from the echelon form by using back substitution: the idea is to work from bottom to top. is a particular case of the reduced echelon form. using row replacement to kill all entries above the pivots. and all the entries below the main diagonal are zero. the rightmost column of the augmented matrix can be arbitrary. In general case. We got this form in our example above. i. let the reduced echelon form of the system (augmented matrix) be ill 2 0 0 0 IJ ooills 02. as in the above example.e. if it is in the echelon form and 3. corresponding to the columns without pivot (the socalled free variables) to the right side. The general definition is as follows: we say that a matrix is in the reduced echelon form. The right side. For example. To get reduced echelon form from echelon form. An example of the reduced echelon form is the system with the coefficient matrix equal!. In this case.
e. A solution (if it exists) is unique i there are no free variables. we just make the reduced echelon form and then read the solution off. Note. . Therefore. let us investigate the question of when when is the equation Ax = b inconsistent. . no matter what the right side b is. statement 3 immediately follows from statements 1 and 2. . Multiplying this equation by n I from the left.e. that is if and only if the echelon form of the coefficient matrix has a pivot in every column. except the last one). then we can pick a right side b such that the system Ax = b is not consistent. b :t in it. + xn = b :t that does not have a solution.32 Systems of Linear Equations Analyzing the Pivots All questions about existence of a solution and it uniqueness can be answered by analyzing pivots in the echelon (reduced echelon) form of the augmented matrix of the system. .. Equation Ax = b has a unique solution for any right side b if and only if echelon form of the coefficient matrix A has a pivot in every column and every row. then the equation Ac = be does not have a solution. If we have a pivot in every row of the coefficient matrix. if one just thinks about it: a system is inconsistent (does not have a solution) if and only ifthere is a pivot in the last row of an echelon form of the augmented matrix. such a row correspond to the equation ox I + Ox2+. corresponding to the row operations. i. b. First of all.. The answer follows immediately. I an echelon form of the augmented matrix has a row 00 . Il (all entries are 0.0. E 2 .. i. If Ae has a zero row. Indeed. we get that the equation Ax = nIbe does not have a solution. The first statement is trivial. Then Ae=EA. we cannot have the pivot in the last column of the augmented matrix. ° ° ° 2: 3. LetAe echelon form of the coefficient matrix A. Ifwe don't have such a row. they all deal with the coefficient matrix. E I.. 1... an recalling that nIAe = A. where E is the product of elementary matrices. so the system is always consistent. Equation Ax = b is consistent for all right sides b if and only if the echelon form of the coefficient matrix has a pivot in every row. when it does not have a solution... Now.. Let us show that if we have a zero row in the echelon form ofthe coefficient matrix A. and not with the augmented matrix of the system. then the last row is also zero. three more statements. E = EN. . . I should only emphasize that this statement does not say anything about the existence. Finally. if we put be = (0. The second statement is a tiny bit more complicated. because free variables are responsible for all nonuniqueness.
Proposition. the system VI' v2' XlVI ••. any row and any column have no more than 1 pivot in it (it can have 0 pivots) Corollaries about Linear Independence and Bases Questions as to when a system of vectors in ~n is a basis. vm] be the n x m matrix with columns v I' v 2' .. Proposition. Similarly. . . Proposition. Proof Let a system vi' v2' .. Proof The system VI' v2' •. generating) i echelon 3. a linearly independent or a span'1ing system.. . In echelon form. And finally... By statement 1 above. By Proposition echelon form of A must have a pivot in every column.. it happens if and only if there is a pivot in every column of the matrix. Any linearly independent system of vectors in ~ n cannot have more than n vectors in it. vm E ~n be linearly independent.. vm is linearly independent i echelonform of A has a pivot in every column.. The system vI' v2. .• Vm E ~n. and let A = [vI' v2• . the equation Ax = 0 has unique solution x = O. .• vm] be an n x m matrix with columns vI' v2• . vrn is complete in form ofA has a pivot in every row. . Let us have a system of vectors vI' v2• . = xm = 0. +x2v2 +··· +xmvm=b has unique solution for any right side b ~n. ~n (spanning.. By statement 2 above. can be easily answered by the row reduction. vm. The system vi' v 2' •.. . 2. . . or equivalently.• vm. the system VI' v 2' •. The main observation... Then 1.Systems of Linear Equations 33 From the above analysis of pivots we get several very important corollaries.. vm E ~m is linearly independent ifand only if the equation XlvI +x2v2 +··· +xmvm=O has the unique (trivial) solution XI = x 2 = . it happens if and only XlvI if there is a pivot in every column in echelon form of the matrix. which is impossible if m > n (number of pivots cannot be more than number of rows)... and letA = [VI' v2' ••• .. vm is a basis in ~n i echelon form of A has a pivot in every column and in every row. vm E ~m is a basis in ~n ifand only if the equation E By statement 3 this happens if and only ifthere is a pivot in every column and in every row of echelon form of A. Any two bases in a vector space V have the same number of vectors in them. The system VI' V2' ••. . vm E ~m is complete in ~n ifand only if the equation +x2v2 +··· +xmvm=b has a solution for any right side b E ~ n ..
. Proposition above states that it happens if and only if there is a pivot in every row and every column. . . Let v I' v 2' . the system AI wl . or ifit is right right invertible. Any spanning (generating) set in IR n must have at least n vectors.wm be two different bases in V. en is the standard basis in IRn. Together with the assumption n ~ m this implies that m =n. we know. the equation Ax = b has a unique solution for any right side b if and only if the echelon form of A has pivot in every row and every column. Since number of pivots cannot exceed the number of rows. to check the invertibility of a square matrix A it is sucient to check only one of the conditions AAI = 1. The uniqueness mean that there is pivot in every column of the coefficient matrix (its echelon form). then it is invertible. n. So it is linearly independent. The statement below is a particular case of the above proposition.. but there is also a direct proof. Proposition.. A matrix A is invertible if and only if its echelon form has pivot in every column and every row.. AIA = 1. m ~ n.. The existence means that there is a pivot in every row (of a reduced echelon form of the matrix). Without loss of generality we can assume that n ~ m. Any basis in IR n must have exactly n vectors in it. Proof As it was discussed in the beginning of the section. .. vm be a basis in IR n and let A be the n x m matrix with columns VI' v2' . The above propo·sition immediately implies the foIlowing Corollary. . AI w2' ••..•• . vn and w"w2' . n ~ V defined by Ae k = vk' k = 1. An invertible matrix must be square (n x n). Since AI is also an isomorphism. . Proposition. Proof This fact foIlows immediately from the previous proposition.. . means that the equation Ax = b has a unique solution for any (all possible) right side b. Proof Let VI' v2' . We know that a matrix is invertible if and only if its columns form a basis in. Corollaries About Invertible Matrices Proposition. vm be a complete system in IR n . But. AI wm is a basis.. where e l .• . .. hence the number of pivots is exactly n. There is also an alternative proof. Consider an isomorphism A : IR. vm . e 2 . Statement 2 of Proposition implies that echelon form of A has a pivot in every row.. The fact that the system is a basis.. .34 Systems of Linear Equations Proof Let vI' V 2' . and letA be n x m matrix with columns VI' v2' .. n ~ m.lj a square (n x n) matrix is left invertible.2. so m = number of columns = number of pivots = n Proposition. that the matrix (linear transformation) A is invertible if and only if the equation Ax = b has a unique solution for any possible right side b. In other words. . vm .
.e. Indeed. The first. So to find the column number k of AI we need to find the solution of Ax = ek. Therefore reduced echelon form of an invertible matrix is the identity matrix 1. If A is square. The matrix I that we added will be automatically transfo~med to AI. A is not invertible There are several possible explanations of the above algorithm.. if b is a left inverse of A (i. and x satisfies Ax = 0. If a matrix A is right invertible. . b E IR n Ax = ACb = Ib = b. Now let us state a simple algorithm of finding the inverse of an n x n matrix: 1. BA = l). FINDING AI BY ROW REDUCTION As it was discussed above. . the echelon form of A has pivots in every row. that this proposition applies only to square matrices! Proof We know that matrix A is invertible if and only if the equation Ax = b has a unique solution for any right side b. If the matrix A is square (n x n). a na·"yve one. so the solution is unique. so the matrix is invertible. Performing row operations on the augmented matrix transform A to the identity matrix I. k = 1. If it is impossible to transform A to the identity by row operations. en is the standard basis in Rn. so A is invertible. for any right side b the equation Ax = b has a solution x = Cb. Any invertible matrix is row equivalent (Le. . The above algorithm just solves the equations Ax = e k . 4. echelon form of A has pivots in every row. then multiplying this identity by B from the left we get x = 0. Form an augmented n x 2n matrix (All) by writing the n x n identity matrix right of A. 2. an invertible matrix must be square. Thus. can be reduced by row operations) to the identity matrix. it also has a pivot in every column. e2. then for x = Cb. the echelon form also has pivots in every column. and its echelon form must have pivots in every row and every column. .. Therefore.2. If a matrix A is left invertible. the equation Ax = 0 has unique solution x = O.. n simultaneously! . Therefore.. Therefore.Systems of Linear Equations 35 Note. 3. and C is its right inverse (AC = l). is as follows: we know that (for an invertible A) the vector AIb is the solution of the equation Ax = b. where e l . This happens if and only if the echelon form of the matrix A has pivots in every row and in every column.
/. EN be the elementary matrices corresponding to the row operation we performed.. As we discussed above.. . and let E = EN .. i. FINITEDIMENSIONAL SPACES Definition. E. Theorem.0 1 3 2 o 0 ( 3 11 6 0 0 1 ~3R: 0 1 03 o 1 +R2 1 4 2 1 0 0JX3 (3 12 6 3 0 013210 01321 (o 0 3 1 1 1 0 0 3 1 1 0~JR3 +2R2 Here in the last row operation we multiplied the first row by 3 to avoid fractions in the backward phase of row reduction. more "advanced" explanation. Proof As we discussed in the previous paragraph.. This "advanced" explanation using elementary matrices implies an important proposition that will be often used later. Let E I . AI =EN··· E E 2 I' so A = (AIt l = EjlE".36 Systems of Linear Equations Let us also present another. so E = AI.0 1 0 3 0 1 0 0 3 1 1 1 Dividing the first and the last row by 3 we get the inverse matrix DIMENSION. . EA = J.. Any invertible matrix can be represented as a product ofelementary matrices. But the same row operations transform the augmented matrix (AI 1) to (EA IE) = (I I AI). . E2EI be their product. ( 3 11 6 Augmenting the identity matrix to it and performing row reduction we get 1 4 2 1 0 OJ ( 1 4 2 1 1 OJ 2 7 7 0 1 0 2R .. The dimension dim V of a vector space V is the number of vectors in a basis. 1 We know that the row operations transform A to identity.e. every row operation can be realized as a left multiplication by an elementary matrix.. Suppose we want to find the inverse of the matrix 1 4 2J 2 7 7 .l . E 2. Continuing with the row reduction we get 3 12 0 1 2 2Jo 1 0 3 0 1 ( o 0 3 1 1 1 35/3 2/3 14/3J 3 0 1 ( 113 1/3 113 12R 2 (3 0 0 35 2 14J .
and letA: V t IR n be 2 an isomorphism.. Vn E V. .. n. Then Av I. that we have a system of vectors in a finitedimensional vector space. Suppose. . otherwise we call it infinitedimensional. and one can define an isomorphism A : V t IR n by AVk = ek . and by Proposition m ~ n. Proof Let vI' v2. ..e.. . Indeed.. Any linearly independent system 0/ vectors in a finitedimensional space can be extended to a basis. . that it does not"depend on the choice of a basis. we call the space V finitedimensional. . n = dimE to move the problem to IR n . v r+ 2 """' vn such that the system o/vectors vI' v2. Proposition. . Av2 . If dim V is finite. Then Av I. Proposition asserts that the dimension is well defined.. Any generating system in afinitedimensional vector space V must have at least dim V vectors in it.. """' vn is a basis in V. . . if dim V = n then there exists a basis VI' V 2 ' ••• .. Av2.. vm E Vbe a complete system in V. Proposition. Proposition. i. Note. Any linearly independent system in a finitedimensional vector space V cannot have more than dim V vectors in it. . and let A : V t IR n be an isomorphism. ijvI' v2.2.Systems of Linear Equations 37 For a vector space consisting only of zero vector 0 we put dim V = o. vr are linec:rly independent vectors in afinitedimensional vector space V then one can find vectors vr+l. n.e. . that if dim V = n. we put dim V = 00. ••. vm E Vbe a linearly independent system. AVm is a complete system in lR n . then there always exists an isomorphism A : V t IRn. If V does not have a (finite) basis. AVm is a linearly independent system in IR n .. where all such questions can be answered by row reduction (studying pivots). k = 1. i. or if it is complete)? Probably the simplest way is to use an isomorphism A : V t IR n .. Proof Let vI' v ' . and we want to check if it is a basis (or if it is linearly independent. and by Proposition m ::.. This immediately implies the following Proposition. A vector space Vis finitedimensional if and only if it has a finite spanning system.
if the right side. vr is already a basis.38 Systems of Linear Equations Proof Let n = dim Vand let r < n (if r = n then the system v I' V 2' .e. b = 0. Then for we have Ax = A(x I + xh) = AXI + AXh = b + 0 = b. Then for x h :=xx I we get . so any x of form x = xI + xh' xII E H is a solution of Ax = b. vr is not generating). a homogeneous system is a system of form Ax = O.. Note. Let a vector xh satisfy Axh = O.. and the case r> n is impossible). and let H be the set of all solutions of the associated homogeneous system Ax = O.. and so on.. In other words.. Take any vector not belonging to span{vl' v2' . ofthe solution set) of a linear system. Theorem. Now let x be satisfy Ax = b. We call a system Ax = b homogeneous. i.. (General solution of a linear equation).. . Then the set {x = XI + x h : xh E H} is the set of all solutions of the equation Ax = b. Repeat the procedure with the new system to get vector vr + 2.e. v r } and call it vr + I (one can always do that because the system vI' V 2' . The system vI' v2 ' . General Solution of a Linear System In this short section we discuss the structure ofthe general solution (i. that the process cannot continue infinitely. We will stop the process when we get a generating system. With each system Ax = b we can associate a homogeneous system just by putting b = O..•• . Let a vector xlsatisfy the equation Ax = b. this theorem can be stated as General solution of Ax =b = of Ax = b A particular solution + General solution of Ax = 0 Proof Fix a vector xI satisfying AXI =b. because a linearly independent system of vectors in V cannot have more than n = dim V vectors. . . v r' vr + I is linearly independent.
we do not have to assume here that vector spaces are finitedimensional. so H. we can repeat the row operations. consider a system 2 2 2 2 8 14 Performing row reduction one can find the solution of this system (11 1 1=~)x (li). The power of this theorem is in its generality. You will meet this theorem in differential equations. we keeping notation x3 and Xs here only to remind us that they came from the corresponding free variables. but this is too time consuming. For example the formula gives the same set as (can you say why?). is to check that the first vector (3. partial differential equations.b = 0. that we are just given this solution. It applies to all linear equations.xl) = Ax  AXI = b . 0. for example. x5 can be denoted here by any other letters. if the solution was obtained by some nonstandard method. So.Systems of Linear Equations 39 AXh = A(x . Namely. There is an immediate application in this course: this theorem allows us to check a solution of a system Ax = b. integral equations. t and s. ol satisfies the equation Ax = b. here we just replaced the last vector by its sum with the second one. to study uniqueness. 2. 1. Moreover. Of course. Besides showing the structure of the solution set. we only need to analyse uniqueness of the homogeneous equation Ax = 0. this formula is different from the solution we got from the row reduction. The simplest way to check that give us correct solutions. let us suppose. but it is nevertheless correct. Therefore any solution x of Ax X h E = b can be represented as x = xl + xh with some xh E H. etc. this theorem allows one to separate investigation of uniqueness from the study of existence. and we want to check whether or not it is correct. which always has a solution. Now. = The parameters x 3. For example. it can look differently from what we get from the row reduction. and that the other two (the ones with .
there should be 2 free variables. we will be assured that any vector x defined is indeed a solution. Note. or null space KerA = Null A := {v E V: Av = O} C V. and Ae be its echelon form . Ker AT are called the fundamental subspaces of the matrix A. Systems of linear equations example. is the dimension of the range of A rankA := dim Ran A. Given a linear transformation (matrix) a its rank. if we just somehow miss the term with x 2' the above method of checking will still work fine. which is often used instead of Ran A. and that formulas both give correct general solution. one does not have to perform all row operations to check that there are only 2 free variables. In this section we will study important relations between the dimensions of the four fundamental subspaces. This explains the name column space (notation Col A). Computing Fundamental Subspaces and Rank To compute the fundamental subspaces and rank ofa matrix. In other words.e. then it follows from the "column by coordinate" rule ofthe matrix mUltiplication that any vector W E Ran A can be represented as a linear combination of columns of A. Ker A. namely. FUNDAMENTAL SUBSPACES OF A MATRIX A: V ~ Wwe can associate two subspaces. then in addition to Ran A and Ker A one can also consider the range and kernel for the transposed matrix AT . We will need the following definition. its kernel. the kernel Ker A is the solution set of the homogeneous equation Ax = 0. Definition. is to count the pivots again. the number of pivots is 3. If A is an m x n matrix. a mapping from ]Rn to ]Rm. one needs to do echelon reduction. If A is a matrix. rankA. For example. The four subspaces RanA. if one does row operations. and its range Ran A = {w E W: w = Av for some v E V} C W. So indeed. In this example. let a be the matrix. Namely. that this method of checking the solution does not guarantee that gives us all the solutions. and it looks like we did not miss anything. Often the term row space is used for Ran AT and the term left null space is used for KerAT (but usually no special notation is introduced). and the range Ran A is exactly the set of all right sides b E W for which the equation Ax = b has a solution. What comes to mind..40 Systems of Linear Equations the parameters x3 and Xs or sand t in front of them) should satisfy the associated homogeneous equation Ax = O. To be able to prove this. If this checks out. Ran AT. which is one of the fundamental notions of Linear Algebra. we will need new notions of fundamental subspaces and of rank of a matrix. i.
i. But if we already have the echelon form of A. Of course. !J o 0 0 000 000 (the pivots are boxed here). the vectors (we put the vectors vertically here. the columns give us a basis in Ran A.. 3. Example. the columns 1 and 3 of the original matrix. The question of whether to put vectors here vertically as columns. Consider a matrix 2. i.e.. the columns where after row operations we will have pivots in the echelon form) give us a basis (one of many possible) in Ran A.. then we get Ran AT for free. Compute the reduced echelon form of A. We also get a basis for the row space RanA T for free: the first and second row of the echelon form of A. So. say by computing Ran A.e. To find a basis in the null space Ker A one needs to solve the homogeneous equation Ax = 0: the details will be seen from the example below. The pivot columns of the original matrix a (i. i3 i3 ~3 ~3 2~J ' (1 1 1 1 0 Performing row operations we get the echelon form (~o ~ ~ .Systems of Linear Equations 41 1. which in this example is . and then do row operations.e. The pivot rows of the echelon from Ae give us a basis in the row space. Our reason for putting them vertically is that although we call RanAT the row space we define it as a column space of AT) To compute the basis in the null space Ker A we need to solve the equation Ax = O. or horizontally as rows is is really a matter of convention. it is possible just to transpose the matrix.
it is not necessary to write the whole augmented matrix. it is sucient to work with the coefficient matrix. Unfortunately. 1. one must solve the equation AT x = O. that when solving the homogeneous equation Ax = 0.. 1 ~] . Explanation of the Computing Bases in the Fundamental Subspaces So. Xs is free or. Keeping this last zero column in mind. i.e.[g] [11~] 113 o 0 form a basis in KerA. ill o (o o 0 0 0 1 ill 0 113. the knowledge of the echelon form of a does not help here. So. 0 1 113) x2is free. which does not change under row operations. x4 is free. Unfortunately. there is no shortcut for finding a basis in KerAT. in the vector form 1 1 0 0 x= 1 =x2 x4 xs 3 X4 0 +x4 1 +xs 0 1 113 0 113 0 1 0 0 Xs The vectors at each free variable. we can just keep this column in mind. why do the above methods indeed give us bases in the fundamental subspaces? .42 Systems of Linear Equations 0 0 0 0 0 0 Note. in this case the last column of the augmented matrix is the column of zeroes. in our case the vectors [ 001 O. without actually writing it. Indeed. we can read the solution 0 the reduced echelon form above: 1 xI = x2 3"xs .
Let VI ' v 2. vr' v = a IVI + a2v2 + . . the entry number k of the resulting vector is exactly x k' so the only way this vector (the linear combination) can be 0 is when all free variables are O. and let V be an arbitrary column of A. Suppose alw l + a 2w 2 + . The row space Ran A T. 2. Repeating this procedure.. First of all. The column space Ran A. the reduced echelon form Are is obtained from A by the left multiplication Are = EA.. where E is a product of elementary matrices.. . they do not change linear independence.. let wI 'w2.. + arvr.. Ev2. but we present here a more formal way to demonstrate this fact. . . Indeed.. It is easy to see that the pivot rows of the echelon form Ae of a are linearly independent. .wr be the transposed (since we agreed always to put vectors vertically) pivot rows of Ae... . . To see that the system is linearly independent. the vectors we obtained form a spanning system in Ker A. Let us now show that the pivot columns of a span the column space of A.. ... . notice that the pivot columns of the reduced echelon form are of a form a basis in Ran Are.. vr be the pivot columns of A.. so E is an invertible matrix. This can be obtained directly from analyzing row operations. r. so a 2 = O. EVr are the pivot columns of Are' and the column v ofa is transformed to the column Ev of Are. .w3' . Thus.+ a..Systems of Linear Equations 43 The null space KerA. Consider now the first nonzero entry ofw2.. The corresponding entries of the vectors w3. . To see that vectors w I .. we can conclude that a l = O.e. vector Ev can be represented as a linear combination Ev = alEv I + a 2Ev2 + .. one can notice that row operations do not change the row space. . wr span the row space. We want to show that v can be represented as a linear combination of the pivot columns vI' v2. So we can just ignore the first term in the sum.. . .... then any vector in Ker A is a linear combination of the vectors we obtained. wr the corresponding entries equal 0 (by the definition of echelon form).+ arvr. The vectors Ev l . wr are 0.Evr. so indeed the pivot columns of A span Ran A. Then for each free variable x k . The case of the null space KerA is probably the simplest one: since we solved the equation Ax = 0.. Since the pivot columns of Are form a basis in RanA re . Since for all other vectors w2. Since row operations are just left multiplications by invertible atrices. Multiplying this equality by gI from the left we get the representation v = aIv I + a 2v2 + . found all the solutions. let us multiply each vector by the corresponding free variable and add everything. ... Let us now explain why the method for finding a basis in the column space Ran A works. Consider the first nonzero entry of WI. i. the pivot columns of the original matrix A are linearly independent. Therefore.w2' . + arwr = O. we get that a k = 0 Vk = 1.
The first statement is simply the fact that the number of free variables (dimKer A) plus the number of basic variables (i.e. Then 1. to n).. Ae is obtained from A be left multiplication Ae = EA. by removing unnecessary vectors (columns). If a is an m x n matrix. A(X) : = {y = A(x) : x EX}. Then Ran A: = Ran(AT ET) =AT (Ran ET) =AT (~m) = Ran AT . since dimensions of both Ran A and RanA Tare equal to the number of pivots in the echelon form of A. and Ae is its echelon form. where E is an m x m invertible matrix (the product of the corresponding elementary matrices). the number of pivots. i. solving a homogeneous equation Ax = 0 amounts to finding a basis in the null space KerA. The following theorem is gives us important relations between dimensions of the fundamental spaces. modulo the above algorithms of finding bases in the fundamental subspaces. Theorem rankA = rankA T . 2. Let A be an m Xn matrix. is almost trivial.. if one takes into account that rank A = rank AT is simply the first statement applied to AT. a linear transformationfrom ]Rn to ]Rm. It is often also called the Rank Theorem Theorem.44 Systems of Linear Equations For a transformation A and a set X let us denote by A(X) the set of all elements y which can represented as y = A(x). DIMENSIONS OF FUNDAMENTAL SUBSPACES There are many applications in which one needs to find a basis in column space or in the null space of a matrix. For example. dim Ker A + dim Ran A = dim Ker A + rank A = n (dimension of the domain of A). i. . rank A) adds up to the number of columns (i. dim Ker AT + dimRan AT = dim Ker AT + rank AT = dimKer AT + rankA T = m (dimension of the target space ofA). This theorem is often stated as follows: IThe column rank of a matrix coincides with its row rank..e. The second statement. However. so indeed RanA T = Ran A: THE RANK THEOREM.l The proof of this theorem is trivial. x E X.. the most important application of the above methods of computing bases of fundamental subspaces is the relations between their dimensions. Finding a basis in the column space means simply extracting a basis from a spanning set. as it was shown above.e.e. Proof The proof.
S0 rank A ~ 3. how can we guarantee that any of the formulas describe all solutions? First of all. we know that in either formula. and therefore there cannot be more than 2 linearly independent vectors in KerA. let us count dimensions: interchanging the first and the second rows and performing first round of row operations 2R R 1 1 1 2 5 0 0 0 1 2 J 2RJ 2 2 2 3 8 0 0 0 1 2 we see that there are three pivots already. last 2 vectors in either formula form a basis in KerA. An important corollary of the rank theorem. But. there we considered a system 2 3 1 1 ( 1 1 1 1 3 6 1 4 9J x =(17J . we already can see that the rank is 3. By Theorem. Now. Theorem. is the following theorem connecting existence and uniqueness for linear equations. but it is enough just to have the estimate here). the last 2 vectors (the ones multiplied by the parameters) belong to Ker A. Therefore. rankA + dim Ker A = 5. hence dim Ker A ~ 2. (Actually. Let A be an an m x J(i ~ ~ l =~J (J ~ ~ i =~J n matrix.1t is easy to see that in either case both vectors are linearly independent (two vectors are linearly dependent if and only if one is a mUltiple of the other).Systems of Linear Equations 45 As an application of the above theorem. Then the equation Ax = b has a solution for every b E lR m if and only if the dual equation ATx=O . so either formula give all solutions of the equation. 1 2 5 8 2 2 2 3 8 14 and we claimed that its general solution given by or by A vector x given by either formula is indeed a solution of the equation.
. e 2.. Change of Coordinates Formula The material we have learned about linear transformations and their matrices can be easily extended to transformations in abstract vector spaces with finite bases. (Note. Let T: V ~ W be a I inear transformation.. the reason being that we consider different bases. bn}· Any vector v E V admits a unique representation as a linear combination n V = xlb I + x 2b2 + . In this section we will distinguish between a linear transformation T and its matrix.. Representation of a Linear Transformation in Arbitrary Bases. denoted by [11 BA .. Namely.. B := {bl' b2. so dimRanA =n . "'j . vn to the standard basis e I.. . . and let a = {a!' a2. Proof The proof follows immediately from Theorem by counting the dimensions. + xnbn = LXkbk' k=I The numbers xl' X2. an}. Xn are called the coordinates of the vector v in the basis B. . so a linear transformation can have different matrix representation... . which relates the coordinate vectors [Tv]B and [v]A' [Tv]B= [1]BA [v]A. en in IRn. A matrix of the transformation T in (or with respect to) the bases a and b is an m x n matrix. not A). then the dimensions of the domain Rn and of the range Ran A coincide. . . . ... that in the second equation we have AT. If the kernel is nontrivial.dim Ker A. It transforms the basis v!' v2. bm } be bases in Vand W respectively.. It is convenient to join these coordinates into the socalled coordinate vector of v relative to the basis B.. Coordinate vector. statement 1 of the theorem says. Let V be a vector space with a basis B := {bI' b2.46 Systems of Linear Equations has a unique (only the trivial) solution. . then the transformation "kills" dimKerA dimensions. that if a transformation a: IR n ~ IRtn has trivial kernel (KerA = {O}). which is the column vector Note that the mapping v ~ [v]B is an isomorphism between Vand IRn. There is a very nice geometric interpretation of the second rank theorem. Matrix of a linear transformation.
. (follows immediately from the mUltiplication of matrices rule). Consider the identity transformation I = I v and its matrix [1]BA in these bases. Let us have two bases A = {aI' a2. As in the case of standard bases. composition of linear transformations is equivalent to multiplication oftheir matrices: one only has to be a bit more careful about bases. and let A. Another possibility is to transfer everything to the spaces ~n via the coordinate isomorphisms v ~ [v]B' Then one does not need any proof. its kth column is the coordinate representation [aklB of kth element of the basis A Note that [1]AB = ([1] BAtl. Namely. for any vector v E Vthe matrix [1]BA transforms its coordinates in the basis a into coordinates in the basis B. \Iv E V. Tx:= T2(T I (x)) we have [1]CA = [T2 T dcA= [T2]CB [TdBA (notice again the balance of indices here). bn} in a vector space V. en} there. . The for the composition T= T2T I.. T: x ~ Z. Yand Z respectively. everything follows from the results about matrix multiplication.. . The matrix [1]BA is easy to compute: according to the general rule of finding the matrix of a linear transformation. ". Let our space Vbe and let us have a basis B = {bI' b2. an} and b = {bI' b2. let T) : x ~ Yand T2 : Y ~ Z be linear transformation. Band C be bases in X.. The matrix [1]BA is often called the change ofcoordinates (from the basis A to the basis B) matrix. The change of coordinates matrix [1]SB is easy to compute: [1]SB = [bl' b2. " bn} there.e. We also have the standard basis S = {el' e2... . By the definition [v]B = [1]BA [v] A . .. .'" bn] =: B. The proof here goes exactly as in the case of ~n spaces with standard bases. so any change of coordinate matrix is always invertible. Change of Coordinate Matrix. so we do not repeat it here. ~n. . i. An example: change of coordinates from the standard basis.. The matrix [1]BA is easy to find: its kth column is just the coordinate vector [Tak]B (compare this with finding the matrix of a linear transformation from ~n to ~m).Systems of Linear Equations 47 notice the balance of symbols A and B here: this is the reason we put the first basis A into the second position.
it involves solving linear systems. For example. The rule is very simple: . Let T: V ~ W be a linear transformation. and it is also easy to check that the above matrix is indeed the inverse of B) An example: going through the standard basis.. 1 + x}.e. and let S denote the standard basis there. A = {I. and we want to find the change of coordinate matrix [1]BA. consider a basis B={(1). 1 . B be two bases in W. Suppose we know the matrix [1] BA' and we would like to find the matrix representation with respect to new bases A. x}. B. [1] Matrix of a transformation and change of coordinates. it is just the matrix B whose kth column is the vector (column) vk . and we know how to do that. and let A. I think the following way is simpler. i. However. 1 and taking the inverses [1]As=A Then (1 1) ' [1]BS= 15 = 4" (2 1· 12 1) 1 and Notice the balance of indices here. and B = {I + 2x.. we can always take vectors from the basis A and try to decompose them in the basis B. Of course. the matrix [1] BA. [llsA = (1 ~2) =: B. and for this basis [I]SA = (6 _I n = 0 =: A. A be two bases in V and let B. In PI we also have the standard basis S = {l.2x}.48 Systems of Linear Equations i.e. [1]SB = and 1]1 S[1]BS= [ SB un in }R2 . Then Ui) =: B = B1 ="3 12 I (I 2) (we know how to compute inverses. In the space of polynomials of degree at most 1 we have bases . And in the other direction [l]BS = ([1] SB )1 = 15 1.
we can just say that A and B are similar.Systems of Linear Equations 49 to get the matrix in the "new" bases one has to surround the matrix in the "old" bases by change of coordinates matrices. that very often in this [11 A is often used instead of [11 AA . Let B = {b I . that it does not matter where to put Q and where gl: one can use the formula A = QBgI in the definition of similarity. b2.. that similar matrices A and B have to be square and of the same size. We say that a matrix A is similar to a matrix b if there exists an invertible matrix Q such that A = gI BQ. but two index notation is better adapted to the balance of indices rule. it follows from counting dimensions.. therefore B is similar to A. Consider a linear transformation T: V ~ V and let [11 AA be its matrix in this basis (we use the same basis for "inputs" and "outputs") The case when we use the same basis for "inputs" and "outputs" is very important (because in this case we can multiply a matrix by itself). that one can treat similar matrices as different matrix representation of the same linear operator (transformation). . . The above reasoning shows. We did not mention here what change of coordinate matrix should go where. matrix representation of a linear transformation changes according to the formula Notice the balance of indices. If a is similar to B. Namely. i.. . Notice. the two index notation [T] is better adapted to the balance of indices rule. then B = QAgI = (gItIA(gI) (since gl is invertible). an} be a basis in V.. . .. so I recommend using it (or at least always keep it in mind) when doing change of coordinates. So. It is shorter. bn} be another basis in V. I[Tls A = [I]BB[T]BA[I]AA I The proof can be done just by analyzing what each of the matrices does. By the change of coordinate rule above [T]BB = [I]BA[T]AA[I]AB Recalling that [I]BA = [flAk and denoting Q := [I]AB ' we can rewrite the above formula as This gives a motivation for the following definition Definition. case the shorter notation [11 A is used instead of [T]AA' However. if A = gIBQ.e. because we don't have any choice if we follow the balance of indices rule. Case of one basis: similar matrices. Let V be a vector space and let A = {aI' a 2. so let us study this case a bit more carefully. Since an invertible matrix must be square. [11 BB = Q 1 [11 AA Q. The above discussion shows.
with m rows and n columns. Consider the 3 x 4 matrix 4 3 I [J 1 Here and 5 7 0 ~1 m represent respectively the 2nd row and the 3rd column ofthe matrix.. and represents the entry in the matrix (1) on the ith row andjth column. First of all. We now consider the question of arithmetic involving matrices. Hence (ai 1 .Chapter 3 Matrics Introduction A rectangular array of numbers of the form [ a~ i a. ain) and amj aij represent respectively the ith row and the jth column of the matrix (1).n amn 1 ami is called an m x n matrix. We count rows from the top and columns from the left. let us .. and 5 represents the entry in the matrix on the 2nd row and 3rd column. Example.
1 0 7 ~ ~ ~11 6 and B =[ ~ 2 1 ~ ~2 ~Il' 3 3 and A+B = [~:~ . A reasonable theory can be derived from the following definition. . b~:n 1 bmn ~ql + bml al n ~bln 1 amI amn ·+ bmn and call this the sum of the two matrices A and B. We first study the simpler case of multiplication by scalars. Suppose that the two matrices A=: : and B = : [ amI amn bm1 both have m rows and n columns. Example.:~ . Suppose that A=[ Then .. 1 0 7 Proposition. (Matrix Addition) Suppose that A. Care m x n matrices. Then we write A+B=[ all a~ 1 . Suppose further that 0 represents the m x n matrix with all entries zero. we can consider the matrix A' obtained from A by multiplying each entry of A by 1. . as matrix addition is simply entrywise addition. We do not have a definition for adding" the matrices 2 [ 1 4 3 0 7 I] [~ ~l 6 and . Proof Parts (a) . and x (d) there is an m n matrix A' such that A + A' = O. a~n 1 [b~ I .~! 21~171 [~ ~ ~ 9~1' 11 0+1 7+3 6+3 1 1 10 Example.. . (b) A + (B + C) = (A + B) + C. and includes multiplication of a matrix by a scalar as well as mUltiplication of two matrices. Definition. The theory of multiplication is rather more complicated. For part (d). B.(c) are easy consequences of ordinary addition. (c) A + 0 = A.Matrics 51 study the problem of addition. Then (a) A + B = B + A.
(Multiplication By Scalar) Suppose that A.. in the form Ax = b. and that c. Suppose that 2 4 3 1] A= 3 1 5 1 0 7 [ 2 . . Then we write cA = [ a~1 . Suppose that the matrix A ami has m rows and n columns. Bare m x n matrices. Example. (c) OA = 0. where amn represent the coeffcients and A=: [ ami a~ I . Then (a) c(A + B) = cA + cB. dE JR. The question of multiplication of two matrices is rather more complicated. and (d) c(dA) = (cd)A. Proof These are all easy consequences of ordinary multiplication. let us consider the representation of a system of linear equations al1 xI +···+a1nxn =~. To motivate this. and that C E JR.52 Matrics Definition. Suppose further that 0 represents the m x n matrix with all entries zero. as multiplication by scalar c is simply entrywise multiplication by the number c. (b) (c + d)A = cA + dA. a~n 1andb = [~I 1 : : bm . and call this the product of the matrix A by the scalar c. 12 2 0 14 Proposition. [C~l1 C~ln 1 cam I camn . 6 Then 2A = [: ~ 1~ ~21.
. until a in with bnj .and then add these products to obtain q ij' Example. Suppose that A= and B  are respectively an m x n matrix and an n x p matrix. for a simple way to work out qjj . we have ~aikbkj k=l n = ailb1j + . m and j qij = = 1. Definition.from ail with bll' and so on. we observe that the ith row of A and the jth column of B are respectively b1j bnj We now multiply the corresponding entries . ..Matrics 53 represents the variables. This can be written in full matrix notation by Can you work out the meaning of this representation? Now let us define matrix multiplication more formally. Then the matrix product AB is given by the m x p matrix AB qml qmp where for every i = 1. +ainb. . . On the other hand.. Consider the matrices ... Note first of all that tbe number of columns of the first matrix must be equal to the numberof rows of the second matrix. p.y· Remark. the entry in the ithrow and jth column of AB. ..
To calculate this.1 + 1.3 [~ ~ ~ ~H :+~l :] = 3 + 2 + 0 + 6 = 11 .2 + 5.1 = 8 + 12 . 0 + 2. qt2] x x o x [ x x x xxx x x x = 2 4 31]: .3 = 2 + 8 + 0 3 = 7. so let us cover up all unnecessary information.3 + 3 (. so that 3 x From the definition. so that the product AB is a 3 x 2 matrix. we need the 2nd row of A and the Ist column of B. so let us cover up all unnecessary information. we need the Ist row of A and the 2nd column of B. so let us cover up all unnecessary information. Let us calculate the product Consider first of all q II. we need the Ist row of A and the Ist column of B. so that 3 x From the definition. To calculate this.2 + 3. x [ x x x x x x x x x 1 2 =X [X x ql21 x. 0 3 2 1 Note that A is a 3 x 4 matrix and B is a 4 x 2 matrix.1 + 4.1) . Consider next q2I.54 Matrics A=[ 1 ~ 4 .4 + 4.2) + (. we have qll = 2. we have qI2 = 2. Consider next q 12.0 + (1) .6 1 = 13. we have q2I = 3. To calculate this. xJ From the definition. so that 2 4 3 I] ~: [qtl x .
(.Matries 55 Consider next q22. Consider again the matrices A~P 4 1 0 3 5 7 Ij and B= ~ 1 0 2 3 .2) + 6.12.2+7.1 +0. We therefore conclude that AB=[ ~ 4 52 3 Ij 1 o 7 6 1 2 0 3 4 3 2 1 +~ 1 4 3 2 13 1 7 .3 + 5. so let us cover up all unnecessary information. so let us cover up all unnecessary information.1 = 12 + 3 10 + 2 = 7.4 + 0 + (.14) + 6 = . To calculate this. we have q22 = 3.4 + 0. 12 17 Example. we have q31 =(1). we need the 3rd row of A and the 2nd column of B.2) + 2. so that 1 x x x x v 2 x x x x x x 2 = x 3 x X x q22 j . we need the 2nd row of A and the 2nd column of B. so that x x 1 : : :j: : =[: 0 7 6 x x 2 X x q32 X From the definition.1 = .0+6. To calculate this. so that x 4 x x x x 3 5 2 3 From the definition. X X x x x x 0 7 6 1 o x 3 x From the definition. we have q32 = (1) . To calculate this.4 + 1.3 + 7 (. Consider finally q32. we need the 3rd row of A and the Ist column of B. so let us cover up all unnecessary information.3 =1 +0+0+ 18= 17. Consider next q31.
Proposition (Distributive Laws) (a) Suppose that A is an m x n matrix and Band Care n x p matrices. those where the number ofrows equals the number of columns. Then c(AB) = (cA)B = A(cB). where 0 is the zero m x 1 matrix. where the matrices A. Proof Clearly the system (2) has either no solution. Systems of Linear Equations Note that the system of linear equations can be written in matrix form as Ax = b.v) = Au Av = b . so that we do not have a definition for the "product" BA.v» = Au + c(A(u . Inversion of Matrices We shall deal with square matrices. we have A(u + c(u . It remains to show that if the system (2) has two distinct solutions. Every system of linear equations of the form. B is an np matrix and C is an p x r matrix. (b) Suppose that A and Bare m x n matrices and C is an n x p matrix. Suppose that A is an m x n matrix. x and b are given.v) is a solution for every c E lR. B is an n x p matrix. so that x = u + c(u . . then it must have infinitely many solutions. Then A(BC) = (AB)C.v» = Au + A(c(u .has either no solution. Then Au = band Av = b. Proposition. Suppose that x = u and x = v represent two distinct solutions. and that c E JR. Definition. We shall establish the following important result. Proposition. Then (A + B)C = AC +BC. Proposition. (Associative Law) Suppose that A is an mn matrix. exactly one solution. Clearly we have infinitely many solutions. Then A(B + C) = AB + AC. one solution or infinitely many solutions.56 Matries Note that B is a 4 x 2 matrix and A is a 3 x 4 matrix. or more than one solution. so that A(u . The n x n matrix .v» = b + c) = b.b = 0. It now follows that for every c E lR. We leave the proofs of the following results as exercises for the interested reader.
It shows that the identity matrix In acts as the identity for multiplication of n x n matrices. Suppose that A is an invertible n x n matrix. Proposition. For every n x n matrix A.A = A. we shall be content with nding such a matrix B if it exists. Hence the inverse A I is unique. . Note that II = (1) and 14 = a a a a a a a a 1 a a a a 1 The following result is relatively easy to check. a·· { 1)= 0 ifi::. An n x n matrix A is said to be invertible if there exists an n x n matrix B such that AB = BA = In.A I) = AA I = In and (BIA I)(AB) = B I(A I(AB» = BI« A IA)B) =BI(InB) B IB = In as required. is it possible to find another n x n matrix B such that AB = BA = In? However. This raises the following question: Given an n x n matrix A. Proposition. it is sucient to show that BIA 1 satises the requirements for being the inverse of AB. Suppose that A and B are invertible n I x n matrices. Proof Suppose that B satises the requirements for being the inverse of A. Remark. Definition. Then AB = BA = In. We shall relate the existence of such a matrix B to some properties of the matrix A. Proposition. Then (A I) I =A.. Equality follows from the uniqueness of inverse. Proposition.Matrics 57 where I if i = j.. Note that (AB)(BIA I) =A(B(B IA I» = A «BB I)A I) = A(I. Then its inverse AI is unique.. we say that B is the inverse of A and write B = AI. In this case.!:j. It follows that A I =A lIn = A I(AB) = (A IA)B = InB = B. Then (AB) B1 =A Proof In view of the uniqueness of inverse.. Proof Note that both (A 1) 1 and A satisfy the requirements for being the inverse of A I. is called the identity matrix of order n. we have AIn = I. Suppose that A is an invertible n x n matrix.
if we write =[=~ 4~3 ~ ]. Given an n x n matrix A. Consider the 3 x 3 matrix 45 28 15.58 Matrics Application to Matrix Multiplication In this section. we shall discuss an application of invertible matrices. Detailed discussion of the technique involved will be covered.:A: k However. Example. is called a diagonal matrix of order n. the calculation is rather simple when A is a diagonal matrix. it is usually rather complicated to calculate Ak =1 d:. al n A= [ : ani ann where aij = 0 whenever i ::j: j.. as we shall see in the following example. Example.. so that . Definition. 30 20 12 Suppose that we wish to calculate A98. D=[~3 ~ n then it can be checked that A = PDP 1. It can be checked that if we take A= 17 10 5] P then =[ 2 ~ ~ ~l' 3 0 pI 3 5/3 1 Furthermore. An n x n matrix a~ I . The 3 x 3 matrices [~ ~ ~ 1 [~ ~ ~] and are both diagonal.
we shall discuss a technique by which we can nd the inverse of a square matrix. = ~PDI)"'(PDPI~ = PD98 p. Consider the matrices all a12 a22 a13 a23 A = ( a 21 and h= (1 0 0 0 0 O. We obtain respectively and Note that ~ ~ ~]. Example. Finding Inverses by Elementary Row Operations In this section. 001 • 0 0 1 a31 a32 a33 Let us interchange rows 2 and 3 of A and do likewise for 13 . We have not discussed here how the matrices P and D are found. Note that this example is only an illustration. Before we discuss this technique. These are: (1) interchanging two rows. • ~I ~2 ~3 0 0 1 Let us interchange rows 1 and 2 of A and do likewise for 13.I = P .Matrics 59 398 A 98 0 298 0 0 0 298 pI. We obtain respectively a21 Til 1 a12 a22 13 0 a 23 ] . if the inverse exists. 98 0 0 This is much simpler than calculating A98 directly. let us recall the three elementary row operations we discussed in the previous chapter. (2) adding a multiple of one row to another row. a21 . ( Note that land ~~ 0 1 0 0 0 [all • a 31 and 0 0 0 1 o a31 a32 a 33 Let us add 3 times row 1 to row 2 of A and do likewise for 13. We obtain respectively a31 a 32 a33 [ :~: :~: :~:] = [~ ~ ~][ ::: :~: :~: :~: a21 a 22 a23 a12 a32 a22 a13 a 33 a23 ::: :::]. and (3) multiplying one row by a nonzero constant.
0 0 I • al3] 0 01[a11 a12 al3] Sa23 = 0 S 0 J a21 a22 a23 . a32 a33 .2a32 + a l2 a22 0 2][ all a12 al3] 1 0 a 21 a22 a23 0 1 a31 a32 a33 for 13" We obtain respectively [S:~I a31 Note that all Sa21 [ a31 Let us multiply S::2 a32 al2 Sa22 S::3] and a33 ~ ~ ~]. [:~: :~: :~:]= [~ ~ ~][ :~: :~: :~: ].1 and do likewise for 13 . o • 2a33 + al3] 1 a23 = 0 [ a 31 a32 a3 3 0 Let us multiply row 2 of A by S and do likewise .~ 3al:~ a21 a22 a23 ] = [~ ~ ~][ ::: and a31 Note that 2a31 + all a21 .2a32 +a12 a22 a32 2a +a. 0 0 • a31 a32 a33 0 0 1 a31 a32 a33 Let us add 2 times row 3 to row 1 of A and do likewise for 13 . a32 a33 0 0 1 a31 a32 a33 row 3 of A by .. a 31 a31 a32 a33 0 0 1 Let us now consider the problem in general. We obtain respectively ::: ~2 [I [ Note that ::: ~I :::] and[ ~3 0 0 1 ~ ~ ~ 1.] 33 a23 a33 [~ 2] . We obtain respectively 2a31 +all a 21 [ 3alla~ 3al.60 [ all 3all +a21 a31 Note that a l2 3al2 +a22 Matrics a" ] 3al3 + a23 and a33 1 0 3 1 a32 0 0 ~] ::: :::].
Consider the matrix <Yk A=[ 2 3 0 ~ ~ ~. We now adopt the following strategy. we consider the array 030 300 o o 1 0 0] 1 We now perform elementary row operations on this array and try to reduce the left hand half to the matrix 13 ." l(Ek ···E2EIA I Ek . . . If EI' E2.. then the final array is clearly in reduced . <Xk of elementary row operations to the identity matrix In. Then B = EA. ••• .. Example. If we succeed in doing so.1 A. The interested reader may wish to construct a proof. In other words. and suppose that B is obtained from A by an elementary row operation.. then the right hand half of the array gives the inverse AI.... we mean an n x n matrix obtained from In by an elementary row operation. Suppose that A is an n x n matrix.J).. 121 To find AI. <Xk then In = Ek . Proposition. Consider an n x n matrix A. The process can then be described pictorially by' (A I In) °2 03 C'i) l(EJA I EJln ) l(E2EIA I E2ElIn) l. ·E2ElI n) = (In. we consider an array with the matrix A on the left and the matrix In on the right. We state without proof the following important result. E 2E lI n· It follows that the inverse AI can be obtained from In by performing the same elementary row operations <Xl' <X2 •••• <Xk• Since we are performing the same elementary row operations on A and In it makes sense to put them side by side.. Suppose further that E is an elementary matrix obtained from In by the same elementary row operation. E2EIA We therefore must have AI = Ek . Suppose that it is possible to reduce the matrix A by a sequence <X J <X2 .. . . taking into account the different types of elementary row operations.. E2EI = Ek . Ek are respectively the elem'entary n x n matrices obtained from In by the same elementary row operations <XI' <X2.. We now perform elementary row operations on the array and try to reduce the left hand half to the matrix In. By an elementary n x n matrix. Note that if we succeed.Matrics 61 Definition.
we obtain [ ~ ~3 ~3 ~3 ~ ~l' 5 4 2 0 1 MUltiplying row by 3 by 3.62 Matrics row echelon form.. we obtain ~ [ o ~3 ~3 ~3 ~ ~l' 15 12 6 0 3 Adding 5 times row 2 to row 3. We therefore follow the same procedure as reducing an array to reduced row echelon form. we obtain [ [ o o 0 3 9 5 3 MUltiplying row 1 by 113. we obtain ~ [ o ~3 ~ ~9 ~4 ~3]. we obtain 3 3 0 15 10 6 3 0 6 4. 3 . 0 3 9 5 3 Adding 1 times row 3 to row 2. we obtain [~o ~ o o 3 ~3 ~ ~ ~l. . 0 15 10 6] 3 3 3 3 1 O. Adding . we obtain 3 3 0 3 9 5 3 Adding 2 times row 3 to row 1.3 times row 1 to row 2. 0 3 9 5 3 Adding 1 times row 2 to row 1. we obtain [ ~o ~3 ~3 ~3 ~ ~l' ~ [o ~3 ~3 ~3 ~ ~l' 0 3 9 5 3 Multiplying row 1 by 3. we obtain 2 3 0 0 0 1 Adding 2 times row 1 to row 3.
we obtain 11231000 0 0 0 1 2 1 0 0 0 0 1 0 0 3 0 0 0 0 0 1 0 0 0 1 Adding 1 times row 2 to row 4. we obtain 0 0 0 1 To find AI. Consider the matrix 1 1 2 3 A= : l 2 2 4 5 0 3 0 0 0 0 0 2 2 4 5 0 1 0 0 (AI 14)= 0 3 0 0 0 0 1 0 0 0 0 0 0 0 We now perform elementary row operations on this array and try to reduce the left hand half to the matrix 14. we obtain 1 0 0 3 2 1 0 1 0 2 4/3 1 . It follows that the right hand half of the array represents the inverse AI.Matrics 63 0 0 3 0 0 3 6 2 4 0 0 3 9 5 Multiplying row 2 by 113. and that the left hand half is the identity matrix 13 . Hence 3 2 AI = 2 4/3 3 5/3 1 Example. we consider the array 1 1 2 3 . 0 0 3 9 5 3 Multiplying row 3 by 113. Adding 2 times row 1 to row 2. o 0 1 3 5/3 1 Note now that the array is in reduced row echelon form. we obtain ~31 1 o [ 1 0 0 3 1 0 2 2 4/3 1 1.
we obtain 1 1 2 0 5 3 0 0 o 3 0 0 o o 1 0 o 0 0 1 2 o 0 0 0 0 2 1 0 Multiplying row 1 by 6 (here we want obtain 6 6 12 0 30 18 o 3 0 0 0 o o 0 0 1 0 o o o to avoid fractions in the next two steps). we observe that it is impossible to reduce the left hand half of the array to 14 . we obtain o o o 0 0 1 2 1 0 0 3 0 0 o 0 1 0 1 2 3 000 03000010 o 0 0 1 2 1 0 0 o 0 0 0 2 1 0 1 At this point. Adding 3 times row 3 to row 1. we 0 1 0 0 0 1 0 0 0 2 1 0 Adding . For those who remain unconvinced. we obtain . we obtain 6 6 12 0 0 3 0 15 03000010 o 0 0 1 o 0 0 1 o o 0 0 0 2 1 0 1 Adding 2 times row 2 to row 1. let us continue.15 times row 4 to row 1. we obtain 1 2 0 5 3 0 0 03000010 o 0 0 1 2 0 0 0 0 0 2 0 1 Adding 1 times row 4 to row 3.64 1 1 2 3 Matrics 100 0 0 0 0 2 1 0 1 Interchanging rows 2 and 3.
Definition. Proof Let us consider elementary row operations.. E1A. EIA implies that A = E11 . It follows that if A is row equivalent to B..IB. 1 0 2 0 1 0 0 0 0 1 1 0 0 0 1 1 0 2 1 0 116. and that the left hand half is not the identity matrix 14 . then we can reverse this by adding . Note now that each elementary matrix is obtained from In by an elementary row operation. Ek such that B = Ek .e times row i to row j. multiplying row 3 by 1 and we obtain 0 0 1/12 1/3 5/2 0 113 0 0 0 0 0 0 1 0 0 0 1 0 1/2 0 0 0 0 1 112 Note now that the array is in reduced row echelon form.. An n x n matrix A is said to be row equivalent to an n x n matrix B ifthere exist a fin. The following result gives conditions equivalent to the invertibility of an n x n matrixA. Suppose that an n x n matrix B can be obtained from an n x n matrix A by a finite sequence of elementary row operations. E.. Criteria for Invertibility Examples. Proposition... Recall that these are: (1) interchanging two rows.. In this section. We usually say that A and B are row equivalent. multiplying row 2 by 113. we shall obtain some partial answers to this question. then B is row equivalent to A. Note that B = Ek . For (1). the matrix A can be obtained from the matrix B by a finite sequence of elementary row operations. if we have multiplied any row by a nonzero constant e. Remark. These elementary row operations can clearly be reversed by elementary row operations. Our first step here is the following simple observation... Every elementary matrix is invertible. we can reverse this by multiplying the same row by the constant lie. (2) adding a multiple of one row to another row. For (3).. Our technique has failed.Matries 65 6 0 12 0 0 0 3 2 15 0 3 0 0 0 0 0 0 0 Multiplying row 1 by multiplying row 4 by 112. .ite number of elementary n x n matrices EI . if we have originally added e times row i to row j. and (3) mUltiplying one row by a nonzero constant. The inverse of this elementary matrix is clearly the elementary matrix obtained from In by the elementary row operation that reverses the original elementary row operation. For (2). we interchange the two rows again. Then since these elementary row operations can be reversed. In fact. the matrix A is not invertible.
. . Then since A is invertible.ivalent to saying that the array ° a~1 [ al n 0 0 can be reduced by elementary row operations to the reduced row echelon form ani ann l . and that are n x 1 matrices.. By Proposition.. Suppose that A = ·. . .66 Proposition. (a) Suppose that the matrix A is invertible. (c) Suppose that the matrices A and In are row equivalent. = 0 of linear (b) Suppose that the system Ax = 0 of linear equations has only the trivial solution. Xo = Irfo = (AIA)x o = Al(Axo) = A:! = o. (b) Note that if the system Ax = 0 of linear equations has only the trivial solution.. Proof (a) Suppose that we have Xo is a solution of the system Ax = O. EIA. (c) Suppose that the matrices A and In are row equivalent. Then the matrices A and In are row equivalent. It follows that the trivial solution is the only solution. Xn are variables. This is eqQ. r!1 Hence the matrices A and In are row equivalent. . .... . . [ ani ann Matrics a~ I . . . Ek such that In = E k. E~ are all invertible.. so that .xn=O...... a~n 1 . Then A is invertible. . Then the system Ax equations has only the trivial solution. Then there exist elementary nn matrices E l . then it can be reduced by elementary row operations to the system : x1=O.. where xl' . the matrices E l .
Consider so that x = AI b is a solution of the system. . Then the system Ax = b of linear equations has the unique solution x = AI b. where Xl' is invertible. ann and that x= [~Il xn and b =[~I bn . bn E lR are arbitrary.. and is therefore itself invertible. We have proved the following important result. Proposition.. Suppose that Xo = l. Since A is invertible let us consider X = Alb. On the other hand.. where x I' xn are variables and bl' . Proposition. E. b n E lR are arbitrary...fo= (AIA)x o = AI(Axo) = Alb. . a~ I A=: [ anI . . so that It follows that the system has unique solution.. are n x I matrices. Clearly Ax = A(A_I b) = (AI b) = (AA1)b = Inb = b.. Suppose that . We next attempt to study the question in the opposite direction.. let Xo be any solution of the system...I .n ann 1 = b.I I k I k is a product of invertible matrices.Matrics 67 A = E.I ···E. where is invertible. Consequences of Invertibility Suppose that the matrix A = all··· anI a. Xn are variables and b l .. 1 matrices.. Consider the system Ax x~ are n x []:] and b~[ll .I l n = E. Suppose further that the matrix A is invertible. . . a~n 1 :. Then Axo = b.
... bj is an n x 1 matrix with entry 1 on row and XI = [X. Ax = bn It is easy to check that A( xl' . 0 o .I] . n.b2 = 0 0 b  . xn ) = ( b l . Proof Suppose that 1 0 .. bn E lR. We can now summarize Propositions as follows.. for every j entry 0 elsewhere. .. . . . . . Proposition. bn).. where xl' .xn =[x..1 ani so that A is invertible.. the system Ax = b of linear equations is soluble.• ... bn = 0 0 1 0 0 In other words. .68 Matrics and that are n x 1 matrices. A[a.. the following four statements are equivalent: (a) The matrix A is invertible. . Xn are variables. . . In the notation of Proposition. .. Now let = 1..n] xnl xnn denote respectively solutions of the systems of linear equations Ax = b l . Suppose further that for every b I .. Then the matrix A is invertible. . in other words...
On the other hand.. . Collecting together the unit consumption vectors. j = 1.. is soluble for every n x 1 matrix b. .. n. . n... and let d.nXn represents the monetary value of the output of sector i needed by all the sectors to produce their output. For every i = 1. This leads to the production equation x= Cx+ d Here Cx represents the part ofthe total output that is required by the various sectors of .. For every j = 1. n. we describe briey the Leontief inputoutput model. + c. (c) The matrices A and In are row equivalent. Consider the mat~ix product For every i = 1.Matrics (b) The system Ax Cd) The system Ax 69 = 0 of linear equations has = b of linear equations only the trivial solution... For i. each of the n sectors requires material from some or all of the sectors to produce its output... let xi denote the monetary value of the total output of sector i over a fixed period. the entry cilx\ + .. . Collecting together xi and di for i = 1.. let cij denote the monetary value of the output of sector i needed by sector j to produce one unit of monetary value of output. we obtain the vectors x=[~ll E~n Xn andd= ~llE~' dn known respectively as the production vector and demand vector of the economy. denote the output of sector i needed to satisfy outside demand over the same xed period.. + C <1 J nJ in order to ensure that sector j does not make a loss. the vector Cl i C J = C nj E~n is known as the unit consumption vector of sector j. .. Application to Economics In this section. n. n.. . where an economy is divided into n sectors. Note that the column sum ClO + . we obtain the matrix 0 cll C=(cl·· ·cn ) = : [ cnl is known as the consumption matrix of the economy. .
we need C(Cd) = c'2d as input. Example... And so on. = (/+ C+ C2 + C3 + .and the production vector x = (I .C)I exists. Then the production vector and demand vector are respectively while the consumption matrix is given by . Proposition. This gives a practical way of approximating (I . .2 and 3 are respectively 30.C)I = / + C + C2 + C3 + .. then represents the perfect production level.70 Matrics the economy to produce the output in the first place. If the entries of Ck + I are all very small.C)I d has nonnegative entries and is the unique solution of the production equation (6). we have . we need C(C2d) = C3d as input.C)I"" I + C + C2 + C3 + . we have demand d. )d in total.Ck + I.{f. To produce d. we need Cd as input.lal demand from sectors 1.d is invertible. Suppose further that the inequality (5) holds for each column of C.. and lalso suggests that (/ . Now it is not dicuIt to check that for every positive integer k. If the matrix (/ . Initially. + C") = I . and d represents the part of the total output that is available to satisfy outside demand. + Ck. .. To produce this extra Cd. An economy consists of three sectors. so that (I . Hence we need to produce d+ Cd+ C2d+ C3d+ . Let us indulge in some heuristics.C)I.C)(/ + C + C2 + C3 + .. Then the inverse matrix (/ ..... Their dependence on each other is summarized in the table below: To produce one unit of monetary value of output in sector 2 monetary value of output required from sector 1 monetary value of output required from sector 2 monetary value of output required from sector 3 0:3 3 0:1 0:2 0:5 0: 1 0:4 0: 1 0:2 0:3 Suppose that the fit.C) . + C") "" I. We state without proof the following fundamental result. Suppose that the entries of the consumption matrix C and the demand vector d are nonnegative. To produce this extra C2d.C)(I + C + C2 + C3 + .. Clearly (/ . 50 and 20. then (I ..C)x = d.
C = 0.4 0.7 . in the sense that T(x' + x1 = T(x1 + T(x1 for every x' C E ]R .'] = A (x~] + A (x. x 2 ~ 226 and x3 78.5 0.5 0.2 0.1] 0.5 [ 0. I t t 4 5 2 500 eqmva en 0 0.1 30 [ 7 2 1 300 0. A matrix transformation T:]R 2 + ]R 2 can be dened as follows: For every x = (XI' x 2 ) E]R.1 0. we write T(x) = y.3 0.2 . so that I .1 0.2 C = 0.2 0. to the nearest integers.3 0. Example. 0. Matrix Transformation on the Plane Let A be a 2 x 2 matrix with real entries.'] x~ + x. The production equation (I .7 0.2. simply observe A = ( X. The matrix .Matrics 71 0. and A (ex J= eA (Xl] .2 50. where satifies Such a transformation is linear. + x. x~ x.1 0.1 0.4 0.7 0.C) x = d has augmented matrix 0. eX2 x2 l ) Here we conne ourselves to looking at a few simple matrix transformations on the plane.4 0. To x' E]R2 that and T(ex) = eT(x) for every X E]R2 and every see this.1 0.1 0.7 20 1 1 7 200 and which can be converted to reduced row echelon form 0 0 0 3200/27 o 0 0 6100/27 ( o ~ 0 1 70019 ~ This gives xl 119. .1] [0.
JR 2 . k < 1. and so represents reection across the line xI summary in the table below: Transformation Reflection acrossx1 . and so represents dilation if k > 1 and contraction if 0 < 1. whereas the matrix for every (xI' x 2) E. whereas the A=( ~l ~l satisfies A[.axis Reflection acrossx2 . We give a {Yl =x1 Y2 = x2 [~ ~ll [ {Yl =x1 Y2 =x2 ~1 Reflection acrossxI = x2 {Yl =x1 [ ~1 Y2 = x2 {Yl =x1 Y2 =x2 ~l ~ll [~ ~l • Example. and so represents reflection across the x Iaxis. and so represents reection across the origin. the matrix 2 .axis Reflection across origin Equations matrix = x 2.JR 2 .JR 2 .JR 2 . and so represents reflection across the x2axis.72 Matrics for every (x I' matrix X2) E.:H ~l ~J[::H~:ll for every (x I' x 2) E.JR . On the other hand. On the other hand. the matrix satisfies for every (x I' x 2) E. The matrix for every (x I' x 2 ) E. Let k be a fixed positive real number.
The matrix for every (xI' x 2) E IR. and so represents expansion in the x 2direction if k > 1 and compression in the x 2direction if 0 < k < 1. whereas the matrix for every (xI' x 2) E IR.Matrics 73 for every (XI' X2) E IR. For the case k = 1. Let k be a fixed real number. 2. 2 . we have the following: (k= I) T ~ . and so represents an expansionn in the xI direction if k > 1and compression in the x Idirection if 0 < k < 1. 2 . We give summary in the table below: Transformation Dilation or contract ion by factor k > 0 Expansion or compression in xI direction by factor k > 0 Expansion or compression in x2 direction by factor k > 0 Equations matrix {YI = Axl = kx2 {YI ~ Axl Y2 Y2 =x2 {YI ~xl Y2 = kx2 [~ ~l [~ ~l [~ ~l Example. and so represents a shear in the xIdirection.
we have T(x].direction Equations Matrix {Y2 ='1 +kx2 Y2 =X2 {Yl =xl Y2 =kxl +x2 [~ ~] [~ ~] Example. sme cose Y2 It follows that the matrix in question is given by YI ] = [ Y2 [c~se A = [ cos e .direction Shear in xl . and so Sine][ YI ].x YI [ cose Sine] sine cose = x] sin e + x2 cos e We conclude this section by establishing the following result which reinforces the linearity of matrix transformations on the plane.1. . For anticlockwise rotation by an angle e. We give a summary in the table below: Transformation Shear in XI . cose .74 For the case k = .sin e1 cose . the matrix A=[~ ~] satisfies for every (xl' x 2) E IR?. x 2) = (Y]'Y2)' where Y] + iY2 = (x] + ix2)(cos e + i sin e ). and so represents a shear in the x2direction. we have the following: Matries T (k=J) • Similarly. Equations Matrix 2 sin e sine Transformation We give a summary in the table below: Anticlockwise rotation bylangle e {Yl = x.
(a. ? 75 JR2 is given by an (b) The image under T of a straight line through the origin is a straight line through the origin. Then (a) The image under T of a straight line is a straight line. Application to Computer Graphics Example. note that straight lines through the origin correspond to y = O.Matries Proposition. and (c) The images under T of parallel straight lines are parallel straight lines. In other words. by (a~) = [:: )<1).. note that parallel straight lines correspond to different values of y for the same values of a and ~. x 2) = (Yl'Y2)' Since A is invertible. we have x = AI y .. Proof Suppose that T(x l .'W) * (a~) AI.J The equation of a straight line is given by ax l + ~x2 = yor. To prove (b). Hence . Consider the letter M in the diagram below: ~ Following the boundary in the anticlockwise direction starting at the origin. in matrix form. where x = [ ~) and y =[ . the image under T of the straight line a~l +~~ =yis aYI + WY2 =y. the 12 . Suppose that a matrix transformation T: JR2 invertible matrix A. This proves (a). To prove (c). clearly another straight line.
[:]. (I:) (~]. (:].[:]. (:]. (:]. representing a shear in the xldirection with factor 0:5. Let us apply a matrix transformation to these vertices. [~l' [~]. Hence the image of the letter M under the shear looks like the following: . (~]. (~].!. (~]. X2] X: for every (x l .. [I~]. 006060088288 In view of Proposition. (~].x2)EIR 2. noting that (: t][~ = 1 1 4 7 7 8 8 7 4 1 0] 060 6 008 8 288 [0 I 4 4 10 7 8 12 11 5 5 4]. (!]. [I:]. the image of any line segment that joins two vertices is a line segment that joins the images of the two vertices. (~]. using the matrix A=[: A ( x2 H (Xl +. (!]. (~]. (~]. (:]..76 vertices can be represented by the coordinates Matrics (~]. (~]. so that Xl] = Then the images of the 12 vertices are respectively (~].
a22 Suppose that T(x l . translation by vector h = (h l' h2) E ]R . can be described by the matrix A*=[~o ~ ~l.)E R'. the image of the point (Xl' x 2' 1) is now (Yl'Y2' 1). Consider a matrix transformation T :]R2 ~]R2 on the plane given by a matrix A=[a a l1 a2l 12 ].:] 1 1 for every (x l . x 2) = = (YI' Y2). Note that . we introduce homogeneous coordinates. (. we may wish to translate this image.) E]R2. To overcome this deficiency. Then [.Matrics 77 Next.h2) is of the form Ene for every (x" x. Under homogeneous coordinates. 1 Remark. so we attempt to find a 3 x 3 matrix A * such that [ ] = A *[ ~1 for every (x l' x.x2) E]R1.: H:: H~ 1 ~ ~~ 1 and this cannot be described by a matrix transformation on the plane. However. For every point (xl' X2) (Xl' x 2)E]R2 we identify it with the point (Xl' x 2' 1) E]R3 . 2 It fo Hows that using homogeneous coordinates. translation is transformation by vector h=(h l .:]A [::]=[::: ::][::]. Now we wish to translate a point (Xl' X 2) to (Xl' X 2) + (hI' h2) = (Xl + hI' x 2 + h2). It is easy to check that [ :~:~l= ~ 0 0 0 ~ ~] .
[1 ~l 2 o x 1 is now replaced by the 3 2 3 matrix o 0 . the 12 vertices are now represented by homogeneous coordinates. A* 0 o Note that 0 0 . By moving over to homogeneous coordinates. It follows that homogeneous coordinates can also be used to study all the matrix transformations we have discussed. The letter M. 111 1 Then the 2 x 2 matrix A= . we simply replace the 2 x 2 matrix A by the 3 x 3 matrix A*=[~ ~J.78 Matrics YI] [all al2 0) XI] [~2 = a~1 a~2 ~J 7. Example. put in an array in the form 0 [ 1 1 4 o 7 7 8 8 7 4 1 I 1 0] 1 1 0 6 0 6 0 0 8 8 2 8 8 .t 1 4 7 7 8 8 7 4 11 A* 0" 0 6 0 6 0 0 8 8 2 8 [1 1 1 1 1 0 O~] 1 1 1 2 o [0 1 I 4 7 7 8 8 7 4 1 00060600882 8 8 0 1 0] o 11111111111 1 1 .
The matrix under homogeneous coordinates for this translation is given by B* Note that o ~ ~ ~]. 6]' 11 ~l giving rise to coordinates in IR 2 . displayed as an array 3 6 6 12 9 10 14 13 7 7 3 9 3 9 3 3 11 11 5 11 Hence the image of the letter M under the shear followed by translation looks like the following: .Matrics 1 4 4 10 7 8 12 11 5 5 4] =006060088288. let us consider a translation by the vector (2. 0 0 1 1 4 1 4 1 1 4 4 10 7 6 1 11 1 1 11 1 0 7 7 8 8 7 1 8 12 11 8 1 7 1 5 11 1 8 1 1 5 5 2 8 1 1 4 0 B*A* 0 0 6 0 6 0 0 8 8 2 8 8 =( ~ 0 1 0 =[~ 2 [3 3 6 m 3 1 6 0 0 0 111 939 1 '1 6 12 9 10 14 13 7 3 9 1 1 1 l~l J. 79 o 111111111111 Next. 3).
The first step is to reduce it to row echelon form: (I) First of all. followed by anticlockwise rotation by 90°. and followed by translation by vector (2.I)(n + 1) operations. as the entry a l2 here. adding or mUltiplying two real numbers. By an operation. This requires n + 1 operations.. may well be different from the entry a l2 in the augmented matrix. This requires 2(n . where A is an n x n invertible matrix.. we may need to interchange two rows in order to ensure that the top left entry in the array is nonzero. we need to multiply the new first row by a constant in order to make the top left pivot entry equal to I. I).. the transformation representing a reflection across the xIaxis. followed by a shear by factor 2 in the xIdirection. . we mean interchanging.80 Matrics Example. ann a. and the array now looks like .ail and then add to row i. has matrix [~ ~ ~Il[~ ~I ~ [~ ~ ~l[~ ~I ~l ~2 ~Il' = [: 00 100100100 100 I Complexity of a NonHomogeneous System Consider the problem of solving a system of linear equations of the form Ax = b. [anI a. . This requires n + I operations. we now multiply the first row by . Under homogeneous coordinates. (III) For each row i = 2.. . One way of solving the system Ax = b is to write down the augmented matrix bn and then convert it to reduced row echelon form by elementary row operations. We are interested in the number of operations required to solve such a system. for example.n ~Il' (II) Next. and the array now looks like anI a n2 ann bn Note that we are abusing notation somewhat.1 . n.
ts at most 2(n + 1) + 2(11 . to proceed from the form II to the form III.. so this is less ecient than our first method. This 2 Another way of solving the system Ax may involve converting the array 1 to reduced row echelon form by elementary row operations. the number of operations require~. . and the number of operations required is at most 2m(m + 1). We continue in this way systematically to reach row echelon form. Ax =b We therefore conclude that the number of operations required to solve the system by reducing the augmented matrix to reduced row echelon form is bounded by something like 3" n3 when n is large. This is simpler. estlmate"3 n earlier. In any case. 2 3 .1. ani ann . where m = n . (V) Our nelt task is to convert the smaller array t [a~2 . It can be shown that the number of operations required is bounded by something like 2n 2 indeed. by something like n2 if one analyzes the problem more carefully. a n2 a~n ~21' ann bn to an array that looks like These have one row and one column fewer than the arrays (II) and (III). = b is to first nd the inverse matrix AI. and conclude that the number of operations required to convert the augmented matrix (II) to row echelon form is at most n2 2 L2m(m+ 1) ~ _n 3 • m=1 3 The next step is to convert the row echelon form to reduced row echelon form.. as many entries are now zero.1)(n + 1) = 2n(n + 1) . It can be shown that the number of operations required is something like 2n 3 . these estimates are insignicant compared to the .Matrics 81 (IV) In sumJary.
where L is an m x m lower triangular matrix with diagonal entries all equal to 1 and U is a quasi row echelon form of A. it is sucient even to restrict this to adding a mUltiple of a row higher in the array to another row lower in the array. then there is no need to multiply any row of the array by a nonzero constant. if we are aiming for quasi row echelon form and not row echelon form. . Definition. It can be shown that products and inverses of unit lower triangular matrices are also unit lower triangular. A rectangular array of numbers is said to be in quasi row echelon form if the following conditions are satised: (1) The leftmost nonzero entry of any nonzero row is called a pivot entry.. Hence L is a unit lower triangular matrix as required. . (3) The pivot entry of a nonzero row occurring lower in the array is to the right of the pivot entry of a nonzero row occurring higher in the array.82 Matrics Matrix Factorization In some situations. Hence the only elementary row operation we need to perform is to add a mUltiple of one row to another row. then we can and its inverse AI and then compute A1b for each vector b. E 2E 1)I. the matrix A may not be a square matrix. Then A=LU. We consider first of all a special case. Suppose that an m x n matrix A can be converted to quasi row echelon form by elementary row operations but without interchanging any two rows. In other words.. Then A = LU. In fact. L U = Ek.. Let = (Ek' . . It is not necessary for its value to be equal to 1.. In this section. Ek are all unit lower triangular. If A is an invertible square matrix.E2. If an m x n matrix A can be reduced in this way to quasi row echelon form U.. Let us call such elementary matrices unit lower triangular. To describe this.. (2) All zero rows are grouped together at the bottom of the array. On the other hand. we first need a deffinition. Proof Recall that applying an elementary row operation to an m x n matrix corresponds to mUltiplying the matrix on the left by an elementary m x m matrix. However. . and we may have to convert the augmented matrix to reduced row echelon form. and it is easy to see that the corresponding elementary matrix is lower triangular. Proposition. we describe a way for solving this problem in a more efficient way.. then where the elementary matrices E 1. with the same coefficient matrix A but for many different vectors b. with diagonal entries all equal to 1. we may need to solve systems of linear equations of the form Ax = b. E2E1A. except that the pivot entries do not have to be equal to 1. the array looks like row echelon form in shape. .
" Matrics 83 If Ax = b and A = L U. and column 1 is a pivot column. we obtain 2 0 0 1 3 2 2 2 1 3 2 8 8 9 6 8 10 18 0 12 Note that the same three elementary row operations convert 1 0 0 0 2 1 0 0 0 to 0 0 0 0 0 1 0 0 0 * 1 * * * 1 * * 0 1 Next. We therefore wish to create a matrix L such that this is satised. ••• . Both of these systems are easy to solve since both Land U have many zero entries. then L(Ux) = b. and column 2 is a pivot . note that L = (Ek . and adding 1 times row 1 to row 4. It remains to nd L. we have Ly = band Ux = y. where U = Ek . ••• . and so 1= Ek. E2E 1A. However. It follows that the problem of solving the system Ax = b corresponds to first solving the system Ly = b and then solving the system Ux = y. E2EIL This means that the very elementary row operations that convert A to U will convert L to /. ••• . adding 1 times row 1 to row 3. Writing y= Ux. Consider the Matrix 2 1 2 2 3 A= 4 6 5 8 2 10 4 8 5 2 13 6 16 5 The entry 2 in row 1 and column 1 is a pivot entry. Example. It is simplest to illustrate the technique by an example. Ifwe reduce the matrix A to quasi row echelon form by only performing the elementary row operation of adding a multiple of a row higher in the array to another row lower in the array. then U can be taken as the quasi row echelon form resulting from this. E2Eltl. It remains to and Land U. Adding 2 times row 1 to row 2. the entry 3 in row 2 and column 2 is a pivot entry.
and column 5 is a pivot column.84 Matrics column. Let us examine our last example again. we obtain the quasi row echelon form u= 2 1 2 2 0 3 2 1 0 0 0 0 0 0 7 0 3 2 2' 4 where the entry 4 in row 4 and column 5 is a pivot entry. and column 4 is a pivot column. The lower triangular entries of L are formed by these columns with each column divided by the value of the pivot entry in that column. The strategy is now clear. Example. Note that the same elementary row operation converts 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 to 0 1 0 0 0 0 1 0 2 1 0 0 0 0 0 0 Now observe that if we take L= 2 1 0 0 1 3 1 4 o' 2 1 then L can be converted to 14 by the same elementary operations that convert A to U. Adding 3 times row 2 to row 3. Every time we nd a new pivot. we obtain 2 1 0 3 0 0 0 0 0 0 1 0 3 0 4 2 2 2 1 0 0 7 14 3 2 2 0 1 0 Note that the same two elementary row operations convert 0 0 0 0 0 0 1 0 1 0 to 0 1 0 0 0 0 0 0 * * Next. The pivot columns at the time of establishing the pivot entries are respectively . Adding 2 times row 3 to row 4. and adding 4 times row 2 to row 4. we note its value and the entries below it. the entry 7 in row 3 and column 4 is a pivot entry.
7 and 4. (1) Reduce the matrix A to quasi row echelon form by only performing the elementary row operation of adding a mUltiple of a row higher in the array to another row lower in the array. we obtain respectively the columns * 2 1 * * l' l ' 3' *' * 4 1 0 1 2 0 0 0 0 Note that the lower triangular entries of the matrix L= 2 3 1 0 4 2 correspond precisely to the entries in these columns. with modied version A= 3 3 4 5 . * * *' 4 Dividing them respectively by the pivot entries 2. We wish to solve the system of linear equations Ax = b.Matrics 85 2 * 4 3 2 ' 9 12 2 * * 7 14 . where 3 1 2 4 1 1 5 2 2 andb= 11 10 6 9 6 6 8 21 13 9 15 Let us first apply LV factorization to the matrix A. Let V be the quasi row echelon form obtained. Example. and modify it by replacing any entry above the pivot entry by zero and dividing every other entry by the value of the pivot entry. LU FACTORIZATION ALGORITHM. The first pivot column is column 1. (3) Let L denote the square matrix obtained by letting the columns be the pivot columns as modied in step (2). 3. (2) Record any new pivot column at the time of its first recognition.
with modied version . with modied version 4 1 3 8 2 4 o o 2 Adding 2 times row 3 to row 4.2 times row 1 to row 3. adding . we obtain 3 1 0 0 2 6 2 4 1 2 3 7 17 0 2 4 7 5 The second pivot column is column 2.86 Matries 1 2 2 Adding row 1 to row 2. and adding 3 times row 2 to row 4. with modied version o 1 1 3 Adding row 2 to row 3. we obtain the quasi row echelon form 3 1 0 0 0 2 0 0 2 3 4 1 1 4 0 3 2 0 The last pivot column is column 5. we obtain 3 1 0 0 2 0 2 3 4 1 1 0 0 The third pivot column is column 3. and adding 2 times row 1 to row 4.
so that Y3 = 6.Yl = 2. we o 4 obtain 2x2 = 1 + 3x3 x4 + x5 Xs = t. with augmented matrix 3 1 2 4 1 1 o 2 3 1 11 o 0 4 1 3 6 0 0 0 2 2 Here the free variable is x 4. Using row 2. Let x 4 = t. we obtain 2x5 = 2.2Y3 = 15. we obtainYl = 1. so that x3 = + 4t . so that Y4 = 2. . 3 1 Usmg row 3. 1 Hence Y= 1 6 2 We next consider the system Ux =Y. Using row 3. Using row 4.Y2 = 9. 9 1 4 4 .Matrics 87 0 0 0 1 It follows that 3 1 2 4 1 0 0 0 2 3 1 1 and u= L= 4 1 3 0 0 2 1 1 0 0 0 2 2 3 2 0 0 0 0 0 We now consider the system Ly = b. we obtainY2 . we obtain Y3 + 2Yl .2Yl + 3Y2 . with augmented matrix 1 1 0 0 0 0 1 9 o 1 3 1 02 2 2 2 1 15 Using row 1. we obtam 4x3 = 6 + x 4 3x5 = 3 + t. . we obtain Y4 . so thatY2 =1. Using row 2. Using row 4. so that x5 = 1.
. Remarks. 3. For every i = 1.3..88 Matrics so that x 2 that XI = g.g' 27 3 so = gt . andj = 1. .3.2. . n. interchanging rows is usually necessary to convert a matrix A to quasi row echelon form.. m... . m.. the number Pjqj represents the probability that player R makes move i and player C makes move j. . . . we obtain 3x I = l+x2 .2.2. . player C makes movej with probability qj" Then PI + .where t E R. . 2.. 2. usually known as the column player. ami The entries can be positive. m.. has n possible moves. ... Application to Games of Strategy Consider a game with two players.g' 9 _ (9/1 91 t 3 +t 11) (xl' x 2' X3' X4' Xs) 8 ' 8 ' 4 " . + Pm = 1 and qI + . denoted by j = 1. in which case the matrices Land U may also have many zero entries. . usually known as the row player. Solving the systems Ly = band Ux = y requires approximately 2n2 operations. while player C.=1 j=1 represents the expected payo that player C has to make to player R. m.2x3+4x4 ..qj ..2. 2.. . Assume that the players make moves independently of each other.g/. . negative or zero. 1 Hence 9 1 Using row 1. Suppose that for every i = 1.. . n. and j = 1. + qn = 1. . Then the double sum m n EA(p... (3) LU factorization is particularly ecient when the matrix A has many zero entries.x5 = Sl .q) = LLaijp. These numbers give rise to the payo matrix all A[ : .3.2. (1) In practical situations. . let aij denote the payo that player C has to make to player R if player R makes move i and player C makes move j. n.. has m possible moves. Then for every i = 1.3. The technique here can be modied to produce a matrix L which is not unit lower triangular. The matrices . . n. 3. Player R... 2 (2) Computing an LU factorization of an n x n matrix takes approximately 3 n3 operations. 3..3. but which can be made unit lower triangular by interchanging rows. denoted by i = 1. player R makes move i with probability Pi' and that for every j = 1.
.. It is very easy to show that different saddle points in the payo matrix have the same value. Remark. Zero sum games which are strictly determined are very easy to analyse. The strategy p is known as an optimal strategy for player R. then EA(P*. An entry aij in the payo matrix A is called a saddle point if it is a least entry in its row and a greatest entry in its column. q)? Fundemental Theorem of Zero Sum Games.q) = B~aijPiq/PI'''Pm): a~n][. The quantity EA(p*. the teachers require 100 students to each choose between rowing (R) and cricket (C). In this case.Matrics 89 P ~ (PI··· Pm ) and q = [ ::J . so that the value of the game is aij . q)? Is it possible for player C to choose a strategy q to try to minimize the expected payo EA(P.. are known as the strategies of player R and player C respectively. However. q*) = EA(p**. Is it possible for player R to choose a strategy p to try to maximize the expected payo Eip. There exist strategies p* and q* such that EA(p*. and the strategy q is known as an optimal strategy for player C. 0 I 0 . amI'" amn qn Here we have slightly abused notation. q*) for every strategy p* of player R. are optimal strategies. q*) > EA(p. . Here the payo matrix A contains saddle points. Clearly the expected payo m n [ all EA(p. ]_ . the strategies o  o p* = (0 .. 0) and q = I .. q**). the students cannot make up their mind. However. if p** and q** are another pair of optimal strategies. In some sports mad school.Aq. Example. o o where the I 's occur in position i in p* and positionj in q*. Optimal strategies are not necessarily unique.p. q*) is known as the value of the game. . The right hand side is a I x 1 matrix! We now consider the following problem: Suppose that A is xed. Remark. ~I . and every strategy q* of player C. q) > EA(p*.
ql) + a 21 (1. 20 15 20 [For example. and so 80 students will choose cricket.a 22 )Pl + Let . where RI.q * ) = (a12 a22)(a22 a21 ) + a22 = !.90 Matrics and will only decide when the identities of the rowing coach and cricket coach are known. the top left entry denotes that if each sport starts with 50 students. Then the solution for these optimal problems are solved by linear programming techniques which we do not discuss here..a21 + a22 all . then 25 is the number cricket concedes to rowing.~a lI a22 a12 a21 all .. then 20 students will choose rowing.. we can write P2 = I .ql... The number of students who will choose rowing ahead of cricket in each scenario is as follows.a 21 + a22 which is independent of q. There are 3 possible rowing coaches and 4 possible cricket coaches the school can hire. saddle points may not exist. if coaches R2 and CI are hired. However.a 12 . q) = allPlql + a I 2PI(1.PI)(1 = ((all .(a 22 . so the optimal strategy for rowing is to use coach RI and the optimal strategy for cricket is to use coach C3..a l2 .!.PI and q2 = I Eip.!.] Here the entry 5 in row I and column 3 is a saddle point.:.al2 . C3 and C4 denote the 4 possible cricket coaches: CI C2 C3 RI 75 50 45 R2 20 60 30 R3 45 70 35 C5 60 55 30 [For example.==.a 21 ))ql + (a 12 . R2 and R3 denote the 3 possible rowing coaches. in the case of 2 x 2 payo matrices which do not contain saddle points. and CI.~ I~ 5 ]. Then ql) a 22 · Then E A(P . In general.PI)ql + a 22(1 . Similarly.] We first reset the problem by subtracting 50 from each entry and create a payo matrix A =[~:o I~ .a 21 + a 22 )Pl . so that the problem is not strictly determined. C2. if .
A square matrix A with real entries and satisfying the condition AI = At is called an orthogonal matrix.a2l +a22 alla12 . Consider the euclidean space ~2 with the euclidean inner product. Then C = {vI' v2 } is also an orthonormal basis. q*) = Eip. cose ).a 2l + a 22 . which is independent of p.Matrics 91 then EA ( p. q*) for all strategiesp and q: Note that * [ P = and a22 a2l all .q *) alla22 .a12 1 all a12 . Hence Eip* q) = Eip*. . Let us now rotate u l and u2 anti clockwise by an angle to obtain vI = (cose sin e) and v2 = (sine.a12 a 2l al1 . Example. The vectors u l = (1.a2l +a22 with value ORTHOGONAL MATRICES Definition. 1) form an orthonormal basis B = {up u2}.a12 . 0) and u2 = (0.
Then t _ AA . namely (113. rn . our last observation is not a coincidence. j = 1. . 113) are orthonormal. (2/3.2/3).. n. The matrix 113 A= 2/3 113 2/3 2/3 ( 2/3 2/3 ) 2/3 1/3 is orthogonal.. So are the column vectors of A. 2/3 113 2/3 2/3 1/3 0 0 1 Note also that the row vectors of A.. ( Ii. Proposition. Ii . . we have . In fact.113.:rn ) rn . Proof We shall only prove (a).. . sinS cosS p I =p t = [cos 1 In fact... vn } are two orthonormal bases of a real inner product space V.2/3) and (2/3. . Suppose that A is an n x n matrix with real entries. smS cosS sin S . . Then (a) A is orthogonal if and only if the row vectors of A form an orthonormal basis of 1R n under the euclidean inner product. Example. and (b) A is orthogonal if and only if the column vectors of A form an orthonormal basis of 1R n under the euclidean inner product. Let r. Proposition... Ii : . Suppoiie that B = {up'" un} and C = {vI' .r.rn . rn denote the row vectors of A. Then the transition matrix P from the basis C to the basis B is an orthogonal matrix. since the proof of (b) is almost identical. . It follows that AAt = I if and only if for every i. our example is a special case of the following general result. ..... since At A = 113 2/3 ( 2/3 2/3 2/3) (113 2/3 2/3) (1 0 0 ) 113 2/3 2/3 1/3 2/3 = 0 1 0 .92 The transition matrix from the basis C to the basis B is given by Matries p Clearly = ([VdB[V2 ln = [ COS sinS 1 .2/3. 2/3.
... (AlA I) u = 0. Then II u 112 = (u.Matrics r· r· 93 if i == j. v = o. II = II x II for every x E lR n • Then for every u.. v. v. .. v=u. and where B = {u l ' .. + ~nun' ~Iul + .Av=IIAu+Avll IIAuAvll =IIA(u+v)1I IIA(uvll 4 4 4 4 1 21 =lIu+vll lluvll =u. } { 0 ifi::/:. 4 4 Suppose that Au.. /. Proposition. v=Au. X = II X 112. Ax = xlAIAx = xlIx = XIX «b)) :::} (c)) Suppose that II Ax we have = X .(i) But then (1) is a system ofn homogeneous linear equations in n unknowns satised by" n every u E lR • Hence the coefficient matrix AI A I must be the zero matrix. In particular.v. so that (AlA l)u . Av = u. It follows that for every x E lR n. . we can write u = ~Iul + ..Av=v'AIAu=AIAu. 21 21 2 Au. and so AlA = 1. so that AlA = 1. so that (AlA I)u .. we have (c) For every II Ax II = II x II. v E lRn. v E lRn.... Proof For every UE V. v for every u. Proof «a)) :::} (b)) Suppose that A is orthogonal. Then the following are equivalent: (a) A is orthogonal.. vn} are two orthonormal bases of V. u.. Suppose that A is an n x n matrix with real entries. + ~nun) . + ~nun = Ylvi +. (b) For every x E lRn.. v E lR n .. + ynvn' where ~I'····'~n' YI'····· YnE lR.... we have II Ax 112 = Ax . this holds when v = (AlA l)u.. . whence (AlA I)u = 0. I if and only if r I' . rn are orthonormal.. we have Au . Then 1 2 1 2 Iu..j. Suppose further that the inner product in lR n is the euclidean inner product. u) = (~Iul + . Av = u . un} and C= {vi' .
(ii) det ani a n2 ann  =0 A Note that (ii) is a polynomial equation.AI)v = O. Solving this equation (2) gives the eigenvalues of the matrix A. so that (A .. On the other hand. Since vERn is nonzero.Vj ) = L'Y~ 1=1 j=1 It follows that in lR n with the euclidean norm. Eigenvalues and Eigenvectors We give a brief review on eigenvalues and eigenvectors: Suppose that a~1 A=: ( ani ••• a~n J : ann is an n x n matrix with real entries. In other words. and forms a subspace of R n • This space (iii) is . Then we say that A is an eigenvalue of the matrix A.')J. we have /I [u]B /I = /I [u]c /I..AI)v = O} . 'YIVI + . In this case. + 'YnV ..... It now follows from Proposition that P is orthogonal.AI) is called the characteristic polynomial of the matrix A. The polynomial det(A . Hence /I Px II = II x II holds for every x E R . for any eigenvalue of the matrix A. + 'YnV) n n 1=1 = LL'Yi'Yj(V.AI) = O. (iii) is the nullspace of the matrix A . and so /I P[u]c /I = II [u]c /I n for every u E V. Suppose further that there exist a number A E 1R and a nonzero vector vERn such that Av = v. n n U) = ('YIVI +. we have Av = AV = ')Jv.94 Matrics Similarly.. it follows that we must have det(A .. IIu /1 2 = (U. and that v is an eigenvector corresponding to the eigenvalue A . we must have . where I is the n x n identity matrix. the set { v E lR n : (A .
Matrics
95
called the eigenspace corresponding to the eigenvalue A. Suppose now that A has eigenvalues
AI' .... An E lR, not necessarily distinct, with corresponding eigenvectors vI' .... , vn E lRn,
and that vi' .... vn are linearly independent. Then it can be shown that pIAP=D, where
In fact, we say that A is diagonalizable if there exists an invertible matrix P with real entries such that PIAP is a diagonal matrix with real entries. It follows that A is diagonalizable if its eigenvectors form a basis of lR n • In the opposite direction, one can show that if A is diagonalizable, then it has n linearly independent eigenvectors in lR n • It therefore follows that the question of diagonalizing a matrix A with real entries is reduced to one of linear independence of its eigenvectors. We now summarize our discussion so far. Diagonalization Process. Suppose that A is an n x n matrix with real entries. (1) Determine whether the n roots of the characteristic polynomial det(A  IJ) are real. (2) If not, then A is not diagonalizable. If so, then nd the eigenvectors corresponding to these eigenvalues. Determine whether we can nd n linearly independent eigenvectors. If not, then A is not diagonalizable. If so, then write
J
(3)
p
~ (vI .... v
n)
and D
~ (A.I '. A.J
where Ai' ... , An E lR are the eigenvalues ofA and where vl' .....vn E lRn are respectively their corresponding eigenvectors. Then PIAP = D. In particular, it can be shown that if A has distinct eigenvalues AI'·· .. An E lR, with corresponding eigenvectors vi' ...... , vn E lR n , then vi' ..... vn are linearly independent. It follows that all such matrices A are diagonalizable.
Orthonormal Diagonalization
We now consider the euclidean space ffi. n an as inner product space with the euclidean inner product. Given any n x n matrix A with real entries, we wish to nd out whether there
96
Matrics
exists an orthonormal basis of ~n consisting of eigenvectors of A. Recall that in the Diagonalization process discussed in the last section, the columns of the matrix Pare eigenvectors of A, and these vectors form a basis of ~ n . It follows from Proposition basis is orthonormal if and only if the matrix P is orthogonal. Definition. An nn matrix A with real entries is said to be orthogonally dagonalizable if there exists an orthogonal matrix P with real entries such that PIAP = plAP is a diagonal matrix with real entries. First of all, we would like to determine which matrices are orthogonally diagonalizable. For those that are, we then need to discuss how we may find an orthogonal matrix P to carry out the diagonalization. To study the first question, we have the following result which gives a restriction on those matrices that are orthogonally diagonalizable. Proposition. Suppose that A is a orthogonally diagonalizable matrix with real entries. Then A is symmetric. Proof Suppose that A is orthogonally diagonalizable. Then there exists an orthogonal matrix P and a diagonal matrix D, both with real entries and such that P'AP = D. Since PP' = pIp = I and Dt = D, we have A = PDP' = PD'P', so that A' = (PD'P0 t = (P0 t(D0'P' = PDP' = A, whence A is symmetric. Our first question is in fact answered by the following result which we state without proof. Proposition. Suppose that A is an n x n matrix with real entries. Then it is orthogonally diagonalizable if and only if it is symmetric. The remainder of this section is devoted to nding a way to orthogonally diagonalize a symmetric matrix with real entries. We begin by stating without proof the following result. The proof requires results from the theory of complex vector spaces. Proposition. Suppose that A is a symmetric matrix with real entries. Then all the eigenvalues of A are real. Our idea here is to follow the Diagonalization process discussed in the last section, knowing that since A is diagonalizable, we shall find a basis of ~n consisting of eigenvectors of A. We may then wish to orthogonalize this basis by the GramSchmidt process. This last step is considerably simplied in view of the following result. Proposition. Suppose that u 1 and u2 are eigenvectors of a symmetric matrix A with real entries, corresponding to distinct eigenvalues 1 and 2 respectively. Then u 1 • u 2 = O. In other words, eigenvectors of a symmetric real matrix corresponding to distinct eigenvalues are orthogonal. Proof Note that if we write u 1 and u2 as column matrices, then since A is symmetric, we have
Matrics
97
It follows that
A)U) u 2 = Au) . u 2 = u) AU2 = u) .2u2, so that (AI  A2)(u l . u2) = o. Since A) 7: A , we must have ul . u2 = o. 2 We can now follow the procedure below. Orthogonal Diagonalization Process. Suppose that A is a symmetric n x n matrix with real entries. (1) Determine the n real roots AI, .... An of the characteristic polynomial det (A AJ), and find n linearly independent eigenvectors u l ' •••• , un of A corresponding to these eigenvalues as in the Diagonalization process.
(2)
Aply the Gram~Schmidt orthogonalization process to the eigenvectors u I ' ••• , un to obtain orthogonal eigenvectors vIm, ... , vn of A, noting that eigenvectors corresponding to distinct eigenvalues are already orthogonal. Normalize the orthogonal eigenvectors v .. ..., vn to obtain orthonormal eigenvectors
wI' ... ,
(3)
wn of A. These form an orthonormal basis of lRn. Furthermore, write
p~ J... and D=(AJ ". AJ
(w
w.)
where A., ... , An E lR are the eigenvalues of A and where wI' ... , wn E lR n are respectively their orthogonalized and normalized eigenvectors. Then PtAP = D. Remark. Note that if we apply the GramSchmidt orthogonalization process to eigenvectors corresponding to the same eigenvalue, then the new vectors that result from this process are also eigenvectors corresponding to this eigenvalue. Why? Example. Consider the matrix
A=( ~
122
~ ~).
1) =
0;
To find the eigenvalues of A, we need to nd the roots of
2A 2 det 2 5 A 2 ( 2 2A
in other words, (A7)(A 1)2 = O. The eigenvalues are therefore Al = 7 and (double root)
~=A3=1.
An eigenvector corresponding to A)
= 7 is a solution of the system
98
Matrics
5
(A71)u
=(
~
Eigenvectors corresponding to 1..2 =1..3 = 1 are solutions of the system
(A7I)U=U
~ ~}=
0 wiili root .,
=UJ
Md ",
=Ul J
which are linearly independent. Next, we apply the GramSchmidt orthogonalization process to u2 and u3, and obtain
which are now orthogonal to each other. Note that we do not have to do anything to ul at this stage, in view of Proposition. We now conclude that
form an orthogonal basis of ~3. Normalizing each of these, we obtain respectively
wI
=
1116] 2116 ,w2 = [ 1116
(1IJi] , = [1IJ3] . 0 11 J3 1IJi 11J3
w3
We now take
Then
p= pI
=
1IJi [
1IJ3
1116 2/16
0
1IJ3
~:::n] /AP~(~ ~
and
1IJ3
0 0
Example. Consider the matrix
Matries
99
o 9 20 To find the eigenvalues of A, we need to nd the roots of 6 _13 _A
A=[~1 ~3 ~~2J.
det(I~A
~102 Jo;
o 9 20A in other words, (A + 1)( A 2)( A  5) = O. The eigenvalues are therefore Al = 1, A2 = 2 and A3 = 5. An eigenvector corresponding Al = I is a solution of the system
(A+/)U=(~ ~2 ~~2Ju=0'
o
9
21
An eigenvector corresponding to ~
with root
ul
0 = 2 is a solution of the system
with root
=(~J.
(A+5/)U=[~6 ~8 ~~2Ju=0'
o
9 15
An eigenvector corresponding to A3
u3
=(~5J.
3
= 5 is a solution of the system
0, with root
(A+5/)u =
(~6 ~8 ~~2 Ju =
o
9 15
u3 =
(~5 J.
3
Note that while uI ' u2' u3 correspond to distinct eigenvalues of A, they are not orthogonal. The matrix A is not symmetric, and so Proposition 100 does not apply in this case. Example. Consider the matrix
A=( ~2
det
~2
2
n
0 2
To find the eigenvalues of A, we need to nd the roots of
(
5A 2 2 6  A
J= 0;
o
7A
in other words, ( A  3)( A 6)( A  9) = O. The eigenvalues are therefore A\ = 3, A2 = 6
1 = 3 is a solution of the system Ut (A3J)u =(~2 ~2 ~)u = o 2 4 0. with root =( ~ ). ( 113 2/3 . with root u3= ( o 2 2 2 0) [1) 2 . so we do not have to apply Step (2) of the Orthogonal diagonalization process.. we obtain respectively 2/3) wt 2/3 . 1 An eigenvector corresponding to ~ = 6 is a solution of the system (A6nu =( ~~ ~2 ~} = 0.w2 = [2/3) 113. f113 w3 2/3 (1/3) . 2 Note now that the eigenvalues are distinct. wiili root ~ =( ~1 ) An eigenvector corresponding to 3 = 9 is a solution of the system (A9J)u= 4 2 3 2 u=O..3 = 9. Normalizing each of these vectors. 2/3 2/3 2/3 = pi = 2/3 1/3 ( 113 2/3 113) 2/3 2/3 and pIAP= 0 (3 0 .100 Matrics and 1. An eigenvector corresponding 1. 2/3 2/3 We now take P=(wt Then pt w2 2/3 2/3 1/3) w3)= 2/3 113 2/3. so it follows from Proposition that up u2. u3 are orthogonal.
.. . Finally. The parallelepiped determined by the vectors VI' V2. since we always can join vectors together (say as columns) to form a matrix. what is the ndimensional volume? If n = 2 it is area. vn).O S tk S 1 "ik = 1. let us introduce some notation.. . I don't want just to give a formal definition. and then derive some properties the determinant should have.Chapter 4 Determinants Introduction The reader probably already met determinants in calculus or algebra. . Vn can be defined as the collection of all vectors V E ]Rn that can be represented as v=tIvI +t2v2 + . . In this chapter we would like to introduce determinants for n x n matrices. . There is no real difference here. but with determinant of a system of vectors.. V2. n. ·+tnvn . . we do not have any'choice. So. vn in R. the determinant of a 3 x 3 matrix can be found by the "Star of David" rule.. .. Vn we will denote its determinant (that we are going to construct) as D(v I ..be. It can be easily visualized when n = 2 (parallelogram) and n = 3 (parallel~piped). . . Then if we want to have these properties. In dimension 1 is it just the length. and arrive to several equivalent definitions of the determinant. For a system of vectors (columns) VI' V2..2. If we . .n (notice that the number of vectors coincide with ~imension). if n = 3 it is indeed the volume... First I want to give some motivation. at least the determinants of 2 x 2 and 3 x 3 matrices. For a 2 x 2 matrix the determinant is simply ad . and we want to find the ndimensional volume of the parallelepiped determined by these vectors. It is more convenient to start not with the determinant of a matrix. Let us have n vectors VI' V2..
.. We will show.. Now let us generalize this idea to higher dimensions. meaning that if we fix n . vn ) k k To get the next property. .1) dimensional. . ..n aI. Linearity in Each Argument First of all. . . . the above two properties say that the determinant of n vectors is linear in each argument (vector). vn)..uk +vk. if we multiply vector vI by a positive number a. vn ) of the system vI' V2. For a moment we do not care about how exactly to determine height and base. then the "height" of the result should be equal the sum of the "heights" of summands. and the volume (determinant) is uniquely defined.. vn should satisfy D(av I.. then we do not have any choice.. and so the determinant D(vl'v2. v2. that for dimensions 2 and 3 "volume" of a parallelepiped is determined by the base times height rule: if we pick one vector.2 an.} al..o... detA = D(v\. v2..!1 . In other words. let us notice that if we add 2 vectors.102 Determinants join these vectors in a matrix A (column number k of a is vk ). then the height (i.n a2..e. ·.. ' = D( VI''''Ub' . . Of course. for a matrix an.. v2. . vn) Also. . . If we admit negative heights (and negative volumes). i. ..''''vn) . .. then we will use the notation detA.v~ . then this property holds for all scalars a. . vn ) k ...e. there is nothing special about vector vI' so for any index k D(VI. that if we assume that the base and the height satisfy some natural properties. .l a2) an'2 aI. vn)) is multiplied by a. we get a linear function. that D(vI"". .1 vectors and interpret the remaining vector as a variable (argument). the distance to the linear span £ (v2' . and the base is the (n . its determinant is often is denoted by a2'f WHAT PROPERTIES DETERMINANT SHOULD HAVE We know.Volume of the parallelepiped determined by the remaining vectors. vn) = aD(v I.. vn k ) + D( VI'"'' Vb"'.. k . then height is the distance from this vector to the subspace spanned by the remaining vectors.Vk'·"'Vn ) = aD(v I .
Remark. + o."'~J +(Vj Vj):···'~k ~VJ. Namely.. if we apply the column operation of the third type. . v j .. } k J j k At first sight this property does not look natural. .. admitting negative heights (and therefore negative volumes) is a very small price to pay to get linearity. if we take a vector. In fact. by admitting negative heights.Vj. the determinant does not change. and then using we get D(v\.' .Vj""Vk. .. Antisymmetry The next property the determinant should have. We already know that linearity is a very nice property. .Vj"". Vj. . .. say vp and add to it a multiple of another vector vk . because the sign of the determinant contains some information about the system of vectors (orientation). Namely. .Vn 1= D(v\. ..'''''Vn ) j k = +". is Functions of several variables that change sign D[V\>'''' v~. applying property three times. .. . we even gained something.. that helps in many situations.'''' Vn ] j k In other words..Vn ] = .. .. let us notice that the second part of linearity is not independent: it can be deduced from properties. . since we can always put on the absolute value afterwards. vk.. . ...:k Vj.Vk':'"kVn ) vb'''' ' j = D[V\.'Vn1 D[V\""'V~""':k k ) Vj. . Vb'''' Vn ) = j k = D(V1"".Determinants 103 Remark... So. "Vj . but it can be deduced from the previous ones. we did not sacrifice anything! To the contrary.""Vn ). Although it is not essential here. so D(v\. the "height" does not change.. Preservation Under "Column Replacement" The next property also seems natural... .
then . . 3. . e2..Vt. 2. The second one and the last one is the normalization property. .e. .. Note.. then detA = 0.. vn ) + f3D(v!. vk.···'Vn ) = k o. . n n Normalization The last property is the easiest one. If one column of a is a multiple of another. Proposition. These three properties completely define determinant. We will show that the determinant... i. . i. we derive other properties of the determinant. ..uk . Constructing the Determinant The plan of the game is now as follows: using the properties that the determinant should have.. en) In matrix notation this can be written as det(l) = 1 = 1.vr.V 1 ..~Uk :f3vk. Normalization property: det 1= 1. i. . For a square matrix a the following statements hold: If a has a zero column.D(v!. . Determina~t is linear in each column..v J +. i.e.. some of them highly nontrivial. then detA = o. detA = o. the determinant changes sign. If a has two equal columns. then det A = 0. Determinant is antisymmetric. For the D(e!. The first propertyis just the combined. ~.. If columns of a are linearly dependent. ..··· vn ) k k for all scalars a. in vector notation for every index k D(v!. . . if one interchanges two columns. . a fu'nction with the desired properties exists and unique. that we did not use property: it can be deduced from the above three. Basic Properties. We will use the following basic properties of the determinant: 1. After all we have to be sure that the object we are computing and studying exists.. We will show how to use these properties to compute the determinant using our old friendrow reduction. 2..e. Properties of Determinant Deduced from the Basic Properties 1. 4.Vy. 3.Vt ··. if the matrix is not invertible.e...104 Determinants = =n['1. .
+ <Xnvn = Lakvk' k=2 Then by linearity we have (in vector notation) D(v. which is possible only if det A = O.. vn ) v. But by the property 1 above. so det4 = . v3'"'' v.. this property can be deduced from the three "basic" properties of the determinant.k Then by linearity D(vl'''' vk + U... u = L>~"jVj' j"".e. On the other hand. so D(vl""'Vk "".. vn ). interchanging two columns changes sign of determinant. v" . Ifwe multiply the zero column by zero. Statement 3 is immediate corollary of statement 2 and linearity. Proof Fix a vector vk . vn ) = D(vl"'" 'k vk.l = D [[t. 1 = L:>~'kD(vk' v2.. Proposition. we are using in this section. say vk can be represented as a linear combination ofthe others. we change nothing. To prove the last statement. vn ) = D(vk""'V)'''''vn ) = 0 = 0. we do not change the matrix and its determinant. k k and by Proposition the last term is zero.. and let u be a linear combination of the other vectors. . "'tVt].. let us first suppose that the first vector VI is a linear combination of the other vectors. if we interchange two equal columns. .. . Let us now consider general case. implies statement 2.u. i.'''' vn ) + D(vI.. n v.det A. The fact that determinant is antisymmetric. . Then one of the vectors. the determinant is preserved under "column replacement" (column operation of third type).. The next proposition generalizes property. The determinant does not change if we add to a column a linear combination of the other columns (leaving the other columns intact). .... As we already have said above.Determinants 105 Proof Statement 1 follows immediately from linearity. n VI = <x2v2 + <X3 v3 + . . let us assume that the system vI' v2. Interchanging this vector with VI we arrive to the situation we just treated. v3"'" k=2 and each determinant in the sum is zero because of two equal columns. In particular." . we should get O. k k so the determinant in this case is also O. . vn is linearly dependent. . so the determinant remains the same. Indeed.
if it is either lower or upper triangular matrix. then subtract appropriate mUltiples of the second column from columns number 3.106 Determinants Determinants of Diagonal and Triangular Matrices Now we are ready to compute determinant for some important special classes of matrices. an' The next important class is the class of socalled triangular matrices. row reduction for AT) keeping track of column operations . an' n..e if aj'k = for all} < k.. and so on. . n. first subtract appropriate multiples of the last column from columns number n .. A square matrix is called lower triangular if all entries above the mai~ are 0. 1.. it is not invertible (this can easily be checked by column operations) and therefore both sides equal zero. 3. a 2. Let us recall that a square matrix A = {aj. k} ~... .. The first class is the socalled diagonal matrices. i... detA = a\. a 2. n.. . using their properties: one just need to do column reduction (i. .j=l i. ? [° ~° . ° ~~ Determinant of a triangular matrix equals to the product of the diagonal entries.\a2'2 . an} for the diagonal matrix an Since a diagonal matrix diag{a\. Indeed. if aj •k = for all} ° ::j:: k.e.. . g].. then using column replacement (column operations of third type) one can transform the matrix into a diagonal one with the same diagonal entries: For upper triangular matrix one should first subtract appropriate multiples of the first column from the columns number 2. 2. an} can be obtained from the identity matrix I by multiplying column number k by ak. an}) = a\a2 .. We call a matrix triangular.e. .j=l is called diagonal if all entries off the main diagonal are zero.e.det(diag{a\. Determinant of a diagonal matrix equal the product of the diagonal entries.... . i. i. To treat the case of lower triangular matrices one has to do "column reduction" from the left to the right.e" if a"k = for all k <i. It is easy to see that ° is called upper triangular if all entries below the main diagonal are 0. A square matrix A = {a j 'k } ~. . . If all diagonal entries are nonzero. . a2' ..1. .. and so on. Computing the Determinant Now we know how to compute determinants. We will often use the notation diag{a\. .. if a triangular matrix has zero on the main diagonal.. "killing" all entries in the first row.
for a column operation the corresponding elementary matrix can be obtained from the identity matrix 1 by this column operation. And that is all! It may be hard to realise at first. that although we now know how to compute determinants. the most often used operation .e. (Determinant of a product). we can use row operations to compute determinants. Determinants of Transpose and Product * In this section we prove two important theorems Theorem. For a square matrix A and an elementary matrix E (of the same size) det(AE) = (detA) (detE) Proof The proof can be done just by direct checking: determinants of special matrices are easy to compute. This can look like a lucky coincidence. (Determinant of a transpose). det A = det(AT). Namely. Ifan echelon form of AT does not have pivots in every column (and row). we arrive at a triangular matrix. For a square matrix A. operation of third type does not change the determinant. So. . Fortunately. Combining this with the last statement of Proposition we get Proposition.. In particular. right multiplication by an elementary matrix is a column operation. but it is not a coincidence at all. only if a is not invertible. and det A is the product of diagonal entries times the correction from column interchanges and multiplications. and eect of column operations on the determinant is well known. The above algorithm implies that detA can be zero only if a matrix A is not invertible. i. One can ask: why don't we define it as the result we get from the above algorithm? The problem is that formally this result is not well defined: that means we did not prove that different sequences of column operations yield the same answer. but the above paragraph is a complete and rigorous proof of the lemma! Applying N times Lemma we get the following corollary. det A = 0 if and 0 if and only if A is invertible. Theorem. For n x n matrices A and B det(AB) = (detA)(detB) In other words IDeterminant of a product equals product of deteramamtsl To prove both theorem we need the following lemma Lemma. the determinant is still not defined. then a is not invertible. that the determinants of elementary matrices agree with the corresponding column operations. So.Determinants 107 changing the determinant. so det A = O.row replacement. determinants behave under row operations the same way they behave under column operations. its determinant is 1 (determinant of 1) times the eect of the column operation. If a is invertible. This theorem implies that for all statement about columns the corresponding statements about rows are also true. An equivalent statement: det A Note. So we only need to keep track of interchanging of columns and of multiplication of column by a scalar.
I I I I I I I I Properties of Determinant First of all. and the theorem just says that 0 = O. .. and both determinants are zero.. and since it is square. . Then multiplying the identity AB = C by CI from the left. For any matrix A and any sequence of elementary matrices E I.. . . Since taking the transpose just transposes each elementary matrix and reverses their order. EN) = (det A)(det EI)(det E 2). A = EI E2 .. 2.. If B is not invertible. E 2.. By Lemma matrix A can be represented as a product of elementary matrices. the determinant changes sign. Notice.. . EN' and by Corollary the determinant of A is the product of determinants of the elementary matrices. Proof Let us first suppose that the matrix B is invertible. Corollary implies that detA = detA T. . .. 1.. E 2E IA.. we get C1AB = /. then the product AB is also not invertible. that determinant is defined only for square matrices Since we now know that det A = det(AT).EN) Lemma. EN_IEN (inverse of an elementary matrix is an elementary matrix). .. So B is left invertible. Then Lemma implies that B can be represented as a product of elementary matrices B = E IE 2. the statements that we knew about columns are true for rows too. let us say once more...108 Determinants Corollary. To check that the product AB = C is not invertible. . .. So /= ENElV l . and therefore any invertible matrix can be represented as a product of elementary matrices. EN (all matrices are n x n) det(AE IE 2. We got a contradiction. since if A is not invertible then AT is also not invertible. which is its reduced echelon form.. . A = E IE 2. that for an elementary matrix E we have detE = det(ET).. Any invertible matrix is a product of elementary matrices. so CIA is a left inverse of B. that it is sucient to prove the theorem only for invertible matrices A. it is invertible... Proof First of all. EN' and so by Corollary det (AB) = (det A)[(det EI)(det E 2). it can be easily checked... Determinant is linear in each row (column) when the other rows (columns) are fixed.. EN_IEN / = EI E2 . Proof We know that any invertible matrix is row equivalent to the identity matrix. If one interchanges two rows (columns) of a matrix A. (det. let us assume that it is invertible.. (det EN)] = (det A)(det B).
.. In particular. v2'''' vn )· j=1 j=1 Then we expand it in the second column. the determinant is preserved under the row (column) replacement..nD(ejl·eh" ... 5. v 2. ej)' j)=lh=1 jn=1 n n n ..O if and only if A is invertible.Determinants 109 3. 3 from Section 3 exists. and moreover. if the matrix is not invertible. 11. More generally. i. Existence and Uniqueness oftbe Determinant In this section we arrive to the formal definition of the determinant. We get D(v. i. . then det A = 0.2 . . 10. v 2. under the row (column) operation of the third kind.k = 1. For a triangular (in particular.e. then in the third. In particular.l e j' v2 .. det A = O. det A = 0 if and only if A is not invertible. . det A does not change if we add to a row (column) a linear combination of the other rows (columns)... J~ Using linearity of the determinant we expand it in the first column n n vI: D(v.... i. . det(AB) The last property follows from the linearity of the determinant. If a matrix A has two equal rows (columns). vn) = D(2: a j. 4. ~ait. det A = O. . or equivalently det A*. = (det A)(det B). and so on. Consider an n x n matrix A = {aj.. and that each multiplication multiplies the determinant by a.en = 2: a j'l D(ej . vn be its columns. and let vI' v 2.. If a is an n x n matrix. for a diagonal) matrix its determinant is the product of the diagonal entries. we do not have any choice in constructing the determinant. And finally. i.e. det 1= 1. 9. +an'k en = taj'k ej . then det(aA) = an det A. 2.ajn. such function is unique. satisfying the basic properties 1.e. If one of the rows (columns) of A is a linear combination of the other rows (columns).. If a matrix A has a zero row (or column). det AT = det A. vk = [~~:~l = ~~ al'k el +a2'k e2 + .. We show that a function.lah. . 6. 12.k }nj. " vn) = 2: 2:. 8. if we recall that to multiply a matrix A by a we have to multiply each row by a. 7...e. ..
the permutation rearranges the ordered set 1... .. 2.. ef) is zero. Using the notion of a permutation. n} ~ {l... So. It is a wellknown fact from the combinatorics. (n)... . then the number of such pairs is 0.. . . (2). n.jn. . ecr(n) can be obtained from the identity matrix by finitely many column interchanges. We want to show that signum and sign coincide.h. that the number of different perturbations of the set {I. i.110 Determinants Notice. 2. jn coincide. Then define signum of to be (I)K. . . that although there are infinitely many ways to get the ntuple (1).. let us notice. . . . a permutation of an ordered set {l. eh. ...nD(ecr(I). . . omitting all zero terms. n into a(1). .. . . . If (k) = k V k. 2.. . so the determinant D(ecr(I)' ecr(2). 2... so signum of such identity permutation is 1.ecr(n»· The matrix with columns ecr(l)' e cr (2)' . some of the terms are zero. To formalize that. the determinant D(ej I . n}. and sign(a) = 1 if the number of interchanges is odd.. changes the signum of a permutation.. ... 2. so sign is well defined..lacr(2). where a(I).ecr(2)'····. a(n). The most convenient way to do that is using the notion of a permutation. The set of all permutations of the set {l. 2. k. because it changes (increases or decreases) the number of the pairs exactly by I. We call the permutation odd if K is odd and even if K is even. . 2. Such function a has to be onetoone (different values for different arguments) and onto (assumes all possible values from the target space). let us rewrite the sum. j < k such that aU) > a(k). if any 2 of the indices j l' h. . vn) = 2:: crEPenn(n) acr(I). (n) gives the new order of the set 1.2. that the sign of permutation is well defined. a(2).. .. . In other words. v2.. the indexh here is the same as the index}. to get from a permutation to another one always needs an even . a(2). we can rewrite the determinant as D(v l . Namely. because there are two equal rows here. .. n} is a rearrangement of its elements. and they give onetoone correspondence between two sets.. a(2)..... (n) from 1. One of the ways to show that is to count the number K of pairs j.2···acr(n).. n into a(1). which interchange two neighbors. .e.. . n. .. Fortunately. ..2. it contains nn terms. . . n} will be denoted Perm(n). . So.. . Although it is not directly relevant here. a convenient way to represent a permutation is by using a function a: {I.. n} is exactly n!. and see if the number is even or odd. that any elementary transpose. Such functions (onetoone and onto) are called bijections. ecr(n» is I or 1 depending on the number of column interchanges.. It is a huge sum. that it is wellknown in combinatorics.. we define sign (denoted sign a) of a permutation to be 1 if even number of interchanges is necessary to rearrange the ntuple 1. . the number of interchanges is either always odd or always even. Note also. .. that we have to use a different index of summation for each column:we call themjl. . .
k(I) k=1 _ '+1 n j+k det Aj.aa(n). . This implies that signum changes under an interchange of two entries. because for each column every term (product) in the sum contains exactly one entry from this column. That means signum and sign coincide. For each j. and an oddnumber if the signums are different. and so sign is well defined. it is easy to check that it satisfies the basic properties. Finally. and odd number of interchanges is needed to get an odd permutation (negative signum). . any interchange of two entries can be achieved by an odd number of elementary transposes.l.2···.n = 2:aj. COFACTOR EXPANSION For an n x n matrix A = {aj. J. the right side is 1 (it has one nonzero term). so right side in changes sign.. . when the first row has one non zero term a I I' Performing column operations on columns 2.aa(2). The determinant of A then can be computed as .1 + aj . Let A be an n x n matrix. If we define the determinant this way.k' Proof Let us first prove the formula for the expansion in row number 1. So.k' n. Theorem.2 + . The formula for expansion in row number k then can be obtained from it by interchanging rows number 1 and k. ~ j ~ n. J. + a' l (_I)J+n detA.Determinants 111 number of elementary transposes if the permutation have the same signum.n sign(a).2..l (I)J detAj. 3. n to an even permutation (positive signum) one always need even number of interchanges. we must define it as aa(l).. 2. . column expansion follows automatically.k=1 letAj'kdenotes the (nl) x (nl) matrix obtained from A by crossing out row number j and column number k.d~. Since det A = det AT. n ~ k ~ detA = 2:aj. if we want determinant to satisfy basic properties 13 from Section 3. 1 column number k.aj. it is linear in each column. Finally. . .2(I)j+2 detAj . Interchanging two columns of a just adds an extra interchange to the perturbation. determinant ofA can be expanded in the row number j as detA . for the identitymatrix I. the determinant can be expanded in the Similarly.. detA = 2: aEPenn(n) where the sum is taken over all permutations of the set {I... So.k(I) k=1 j+k det Aj.. to get from 1. (Cofactor expansion of determinant). n}. Let us first consider a special case. for each k.. Indeed. n we transform a to the lower triangular form.
k=1 k ) aI'k by where the matrix A(k) is obtained from A by replacing all entries in the first row except O. As we just discussed above detA(k) = (_I)1+k al.2 are zeroes. can be reduced to the previous one by interchanging rows 2 and 3.k = 2: (1)2+k a2. correcting factor from entries of the triangular x the column operations matrix But the product of all diagonal entries except the first one (i.I' Let us now consider the case when all entries in the first row except a l . + det A(n) = L:deti '. To expand the determinant det A in a column one need to apply the row expansion formula for AT.l' so in this particular case detA = al.k.' This case can be reduced to the previous one by interchanging columns number 1 and 2. and/therefore in this case detA = (1) detA I'2' The case when a1'3 is the only nonzero entry in the first row. k=1 n To get the cofactor expansion in the second row.k is the only nonzero entry in the first row det A = (_1)1+ k aI.I detAI.k det Al k' In the general case.. Using this notation.k det A2. are called co/actors. the formula for expansion of the determinant in the row number j can be rewritten as . Definition.e. so det A = L (I) l+k a I'k det A I ok.k detA3. The numbers C· k =(I)l'+k detA' k 1. Exchanging rows 3 and 2 and expanding in the second row we get formula detA = 2:(1) k=1 n and so on.. so in this case detA = a1'3 detA1'3' Repeating this procedure we get that in the case when al. we can interchange the first and second rows and apply the above formula. linearity of the determinant implies that n det A = det A(I) + det A(2) + .112 Determinants Ithe product of diagonal entries of the triangular matrix I the product of diagonal .k' n k=I 3+k a3. without at I) times the correcting factor is exactly detAI. so we get det A = . j.l:)1) k=I n I+k a2.. The row exchange changes the sign.k det A2.k detAl'k.
n J.j :t= k. Then AI =_l_C T . the cofactor expansion formula is not suitable for computing determinant of matrices bigger than 3 x 3. It would take a computer performing a million operations per second (very slow. J.n by the cofactor expansion formula. On the other hand. To get the off diagonal terms we need to mUltiply jth row of A by kth column of CT.ICk.. jth row of C).n J.k' k=1 n Similarly. The diagonal entry number j is obtained by mUltiplyingjth row of a by jth column of a (i. 10 18 multiplications: it would take a computer performing a billion multiplications per second over 77 years to perform the multiplications. expansion in the row number k can be written as det A = al'k CI. Although it looks very nice.. However. + a. 2Ck2 + .k + a2.kCj. .'1 + a· 2C. J.n = Laj.. It is not dicult to show that the quantity given by this formula satisfies the basic properties of the determinant: the normalization property is trivial. so (ACT)JJ = a·'IC. detA Proof Let us find the product ACT..Determinants n 113 detA = a"1 C"I J J + a' 2 c.+ a· C k. the proof of linearity is a bit tedious (although not too dicult).4 .. Ver~ often the cofactor expansion formula is Jused as the definition of determinant. 1+ a. 2 + . j" j.. by today's standards) a fraction of a second to compute the determinant of a 100 x 100 matrix by row reduction.k=1 whose' entries are c<!factors of A given matrix A is called the cofactor matrix of A.e.. However. J.kCj. + an'k Cn. cofactor expansion of a 20 x 20 matrix require 20! ~ 2. and n! grows very rapidly. J.k C2. J J J. As one can count it requires n! multiplications. + a· C. It can only be practical to apply the cofactor expansion formula in higher dimensions if a row (or a column) has a lot of zero entries. the cofactor expansion formula is of great theoretical importance.k + . j. to get a. For example.k' '1 Remark... as the next section shows. Cofactor Formula for Inverse Matrix The matrix C = {Cj. computing the determinant of an n x n matrix using row reduction requires (n 3 + 2n . C.n .n .k }~. 2 + . J. Let a be an invertible matrix and let C be its cofactor matrix..k = Laj. the proof of anti symmetry is easy. = detA. Theorem. Remark.3)/3 multiplications (and about the same number of additions).
While the cofactor formula for the inverse does not look practical for dimensions higher than 3. all offdiagonal entries of ACT are zeroes (and all diagonal ones equal det A). But the rows} and k of this matrix coincide. (Cramer's rule).114 Determinants It follows from the cofactor expansions formula (expanding in kth row) that this is the determinant of the matrix obtained from a by replacing row number k by the row number } (and leaving all other rows as they were). Note. . Corollary. such that its inverse also has integer entries (inverting such matrix would make a nice homework problem: no messing with fractions). as the examples below illustrate. Recalling that for an invertible matrix A the equation Ax solution X 1 = b has a unique =A. Suppose that we want to construct a matrix a with integer entries. (Matrix with integer inverse). That means that the matrix det A CT is a right inverse of A. it is the inverse. the cofactor matrix is d a' (c b) so the inverse matrix AI is given by the formula AI = de~A (!c ~b).1 b = . that it is easy to construct an integer matrix A with det A = 1: one should start with a triangular matrix with 1 on the main diagonal.CTb. so the determinant is O. If det A = 1 and its entries are integer. it has a great theoretical value. The cofactors are. the cofactor formula for inverses implies that AI also have integer entries. Some applications of the cofactor formula for the inverse.just entries (1 x ] matrices).. Example (Inverting 2 x 2 matrices). detA We get the following corollary of the above theorem. Example. and then apply several row or column replacements (operations of the third type) to make the matrix look generic. and since a is square. thus ACT = (det A) 1. For an invertible matrix a the entry number k ofthe solution of the equation Ax = b is given by the formula detBk xk=' detA where the matrix Bk is obtained from a by replacing column number k of A by the vector b. So. The cofactor formula really shines when one needs to invert a 2 x 2 matrix I 1 A=(~ ~).
e. The determinant of this matrix is called a minor of order k. Proof Let us first show. since the dimension of the column space Ran A is rank A < k. everywhere except maybe finitely many points rankA(x) ~ r. r exists. To complete the proof we need to show that there exists a nonzero minor of order k = rankA. rankA(x) ~ r for all x. we replace r by r 1 and try again. The'Jrem. So. a matrix whose entries are polynomials ofx). After finitely many steps we either stop or hit 0. To show that such r exists. and so it has (.. as the following corollary shows.e. we found r. If there exists x such that rank A (x) = r.e.). so AI (x) is a has rational entries: moreover. Let Xo be a point such that rankA(xo) = r. then the inverse matrix AI(x) is also a polynomial ri. If not. and let M be a minor of order k such that M(xo) 0. For a nonzero matrix a its rank equals to the maximal integer k such that there exists a nonzero minor of order k. except maybe finitely many points. p(x) is a multiple of each denominator. So. obtained by taking k rows and k columns.e original matrix. Proof Let r be the largest integer such that rankA(x) = r for some x. This theorem does not look very useful. i. However. at finitely many points. containing a pivot). so it can be zero onl. so it is invertible (pivot in every column and every row) and its determinant is nonzero. If det A(x) 1. Indeed. it follows from the cofactor expansion that p(x) is a polynomial. But by the definition ofr.. Since M(xo) 0. n}. If det A (x) =p(x) == 0. it is of greattheoretical importance. a matrix whose entries are not numbers but polynomials a· ix) of the variable x. it is not identically zero. for any k x k submatrix of A its column are linearly dependent. Let A = A(x) be an m x n polynomial matrix (i. There can be many such minors. Therefore. Then rank A (x) is constant everywhere. that an m x n matrix has (~l).Determinants 115 Examp:e (Inverse of a polynomial matrix). because it is much easier to perform row reduction than to compute all minors. that if k > rank A then all minors of order k are 0. Since M(x) is the determinant of a k x k polynomial matrix. but probably the easiest way to get such a minor is to take pivot rows and pivot column (i. Note.atrix. M(x) is a polynomial. Another example is to consider a polynomial matrix A(x)..( ~) minors of order k. rows and columns of the original matrix.( ~) different k x k submatrices. any k columns of A are linearly dependent. Minors and Rank For a matrix A let us consider its k x k submatrix. Corollary. This k x k submatrix has the same pivots as t:. we first try r = min {m. and so all minors of order k are 0. '* '* DETERMINANTS We have related the question of the invertibility of a square matrix to a question of .
Consider next the case a 6:f. of the form A = (a) • Note here that 1\ = (1). If a 6 ::. Suppose first of all that a = e = O. with inverse matrix AI = (a . we obtain I 0) b a ( o adbe e a· If D = ad . of the form A=(~ ~). this is unsatisfactory.c 0. Definition. since it is not simple to nd an answer to either of these questions without a lot of work. and call this the determinant of the matrix A. Let us then agree on the following definition. O. We write det (A) = a. let us turn to 202 0 x 2 matrices. then clearly no matrix B can satisfy AB = BA = 1\. and try to use elementary row operations to reduce the left hand half of the array to 12.be = 0. Multiplying row 2 of the array (1) by a. is a 1 x 1 matrix. So what is the determinant? Let us start with 1 x 1 matrices. In some sense. Next.c O. We shall use elementary row operations to nd out when the matrix A is invertible. since the matrix A is invertible if and . if a = 0. We shall relate these two uestions to the question of the determinant of the matrix in question. The task is reduced to checking whether this determinant is zero or nonzero. Suppose that A = (a). then clearly the matrix A is invertible. So we consider the array (AII2 ) = (~ ~ 6 ~).116 Determinants solutions of systems of linear equations. then this becomes 1 0) a b ( o 0 c 1 a' 0) . so that the matrix A is not invertible. we obtain Adding e times row 1 to row 2. Then the array becomes 0 b (o d O l ' and so it is impossible to reduce the left hand half of the array by elementary row operations to the matrix 12 . only if a ::.1) On the other hand. We therefore conclude that the value a is a good "determinant" to determine whether the 1 x 1 matrix A is invertible.
if D = ad . then this becomes e dOl) (a 0 e a' and so it is impossible to reduce the left hand half of the array by elementary row operations to the matrix 12. then the array (3) can be reduced by elementary row operations to I 0 d / D b / D) ( o 1 e/ D af D ' so that AI adbe e a Finally. Interchanging rows I and 2 of the array (I).Determinants 117 and so it is impossible to reduce the left hand half of the array by elementary row operations to the matrix 12. We therefore conclude that the value ad . Let us then agree on the following definition.be = 0. note that a = e = 0 is a special case of ad .be :f. On the other hand. On the other hand.be is a good determinant" to determine whether the 2 x 2 matrix A is invertible. since the matrix A is invertible if and only if ad . we obtain e (a e dOl) (ae be eO' Adding a times row I to row 2. a Consider nally the case e:f.be = O. 0. O. O. Definition. Suppose that A= = 1 ( d b). then the array (2) can be reduced by elementary row operations to ( so that o 1 0 1 c/ D d / D b / D) a/ D ' AI = 1 adbe e (d b). (~ ~) . we obtain dOl) b 1 O· Multiplying row 2 of the array bye. we obtain e dOl) ( o bead e a' Multiplying row 2 by 1. if D = ad .be = 0.be :f. we obtain dOl) e ( o ad be e a' Again. if D = ad .
. . and nally multiplying by a sign (ly+J.1) x (n . dene the determinant of 4 x 4 matrices in terms of determinants of 3 x 3 matrices. we mean the expression L aijCij =aijC j=I n il + .118 Determinants 2 matrix.1) matrices.. we shall dene the determinant of 2 x 2 matrices in terms of determinants of 1 x 1 matrices.. + ainCin .1) (n .erminant of the resulting (n . Our approach is inductive in nature. By the cofactor expansion of A by row i. The number Cij = (_l)i+j det(Aij) is called the cofactor of the entry aij of A. Suppose now that we have dened the determinant of (n .) = 1.!"I)I anI • a(i+l}(jI) : a(j+I)(j+I) a(i~l)" ann an(jI) • an(j+I) Here denotes that the entry has been deleted. ( all. For every i.be. and call this the determinant of the matrix A. the cofactor of the entry a ij is obtained from A by first deleting the row and the column containing the entry aij' then calculating the d~t. In other words.. We write det(A) = ad . Note that the entries of A in row i are given by (ail' . Those who have their feet firmly on the ground will try a different approach. . dene the determinant of 3 x 3 matrices in terms of determinants of 2 x 2 matrices. Let A =·. x is a 2 Determinants for Square Matrices of Higher Order If we attempt to repeat the argument for 2 x 2 matrices to 3 x 3 matrices. Try the argument on 4 x 4 matrices if you must.1) matrix. In other words. a ln ] .. anI··· ann be an n matrix. then it is very likely that we shall end up in a mess with possibly no firm conclusion. Note that the entries of A in column} are given by . and so on.. let us delete row i and column} of A to obtain the (n . Definition. ... n.1) x (n .1) matrix all a(iI)I Aij aI(II) • aI(I+I) • • aln a(i_I)" a(il)(jI) ·a(iI)(j+I) = a(i. ain)· Definition.
by row 2 : a 21 C21 + a 22 C22 = a2l a l2 + a 22 a ll .a 21 a 12 . Example. det(A) = allC ll + a l2 C l2 + a l3 C l3 = 36 .8) = 7. C22 = all: It follows that by row 1 : allC ll + a l2 C l2 = a ll a22 . a22 = a22 . U i) = (_1)4 (1 .a 12a21 . By the cofactor expansion of A by column j. Let us check whether this agrees with our earlier definition of the determinant of a 2 x 2 matrix. anjCnj" Proposition. Then . Consider the matrix A= 1 4 2. by column 1 : allC ll + a 21 C 21 = a ll a 22 . we mean the expression n IaijCy i=l 119 = aljClj + . C 21 =a I2 . C l2 =a21 . by column 2 : a l2 C l2 + a22 C22 = a l2 a21 + a22a ll : The four values are clearly equal. Suppose that A is an n x n matrix.Determinants a F] ( an) Definition. C l2 = (_1)1+2 detl C I3 = (_1)1+3 detl so that U..bc as before. Suppose that A is an n x n matrix. and of the form ad . Definition.) = (I? (202)= 18.35 = 2: Alternatively.. denoted by det(A). We call the common value in the determinant of the matrix A.4) = 1. Then the expressions are all equal and independent of the row or column chosen.3 . Then Cll 2 3 5) ( = (_1)1+1 det (i . Writil1g A= we have CII (a a l1 a21 12 ).) = (_1)3 (5 . 215 Let us use cofactor expansion by row 1. let us use cofactor expansion by column 2.
Definition. = 0.) = (f ~) (1)3(5 ... 2 1 0 5 Here it is convenient to use cofactor expansion by column 3.. If aij = 0 whenever i <j. (~ ~) = (_1)4(10  10) C32 = (_1)3+2 det so that = (1)5(4 . then A is called an upper triangular matrix. When using cofactor expansion. ann If aij = 0 whenever i > j. Suppose that a square matrix A has a zero row or has a zero column. Consider the matrix 2 402 1 3 0 5J A= (5 4 8 5 . i D=16.. Consider an n x n matrix A= (ap . . Proof We simply use cofactor expansion by the zero row or zero column. Then det (A) = O. Example. We also say that A is a triangular matrix if it is upper triangular or lower triangular. Example. Some Simple Observations In this section. The matrix ( ~ ~ ~J 006 is upper triangular.4) = 1.5) = 1. we shall describe two simple observations which follow immediately from the definition of the determinant by cofactor expansion.120 Determinants C l2 = (_1)1+2 det C22 = (_1)2+2 det U. we should choose a row or column with as few nonzero entries as possible in order to minimize the calculations. al n J. det(A) = O. det(A) = a 12 C 12 + a22 C22 + a 32 C32 = 3 + 0 + 1 = 2. an. since then 2 3 det(A) = a 13 C 13 + a23 C23 + a 33 C33 + a43 C43 = 8C33 = 8El)3+3 det ( ~ in view of Example. Proposition. . then A is called a lower triangular matrix.
Then det(B} = det(A}. (b) Suppose that the matrix B is obtained from the matrix A by adding a multiple of one row of A to another row. Then det(B} = det(A}. we see that ( ani all .Determinants Example. When n> 2. ann' the product of the diagonal entries.. ..1) x (n . we use cofactor expansion by a third row. change the term leftmost column to the term top row in the proof... . so that det(Bij) = det(Aij)' It follows that !) n det(B) as required. Proof Let us assume that A is upper triangular for the case when A is lower triangular.. Proposition. A= 121 ..LJaij(l)i+ j '"" det(Bij)' j=1 Note that the (n . ann Then det(A) = all a22 . (a) Suppose that the matrix B is obtained from the matrix A by interchanging two rows of A. Suppose that the n x n matrix is triangular. It is easily checked that the result holds when n = 2. aln'J .. Then det(B} = c det(A}. f3 '" '" ann a~nJ = '" all a22 • . Then n det(B) = . . say row i. . (c) Suppose that the matrix B is obtained from the matrix A by multiplying one row of A by a nonzero constant c. Proof (a) The proof is by induction on n. (ELEMENTARY ROW OPERATIONS) Suppose that A is an n x n matrix. (2) adding a multiple of one row to another row. Using cofactor expansion by the leftmost column at each step. Recall that the elementary row operations that we consider are: (l) interchanging two rows. Proposition. det (a an3 as required. ann Elementary Row Operations We now study the eect of elementary row operations on determinants. A diagonal matrix is both upper triangular and lower triangular. = Laij(li+ det(Ay) = det(A) j=1 j . and (3) multiplying one row by a nonzero constant.I) matrices Bij are obtained from the matrices Ai" by interchanging two rows of Aij . .
the proof is by induction on n. Then det{ B) = Note now that Bij L caij ( 1 )i+1 det{Bij ) j=1 n = Aij' n since row i has been removed respectively from Band A. Consider the matrix 2 2 0 4 Adding I times column 3 to column 1. say row i. Then det(B} = c det(A}. Proposition. It is easily checked that the result holds when n = 2. Elementary row and column operations can be combined with cofactor expansion to calculate the determinant of a given matrix.. 122 Determinants (b) Again. we have the following result. we use cofactor expansion by a third row. Example. We shall illustrate this point by the following examples. we have A =(1 ! ! ~J . Suppose that A is an n x n matrix. In fact. Suppose that the matrix B is obtained from the matrix A by multiplying row i of A by a nonzero constant c. (b) Suppose that the matrix B is obtained from the matrix A by adding a multiple of one column of A to another column.det(A). More precisely.1) matrices Bij are obtained from the matricesAij by adding a multiple of one row of Aij to another row. Then n det(B) = L. so that det(Bij) = det(Aij)' It follows that n det(B) = L. Then det(B} = det(A}. It follows that det(B) = Lcaij(I)i+1 det(Aij) = cdet(A) j=1 as required.uij(l)i+ j 1=1 det(Aij) = det(A) as required. Then det(B} = . (c) This is simpler.aij(li+ j j=1 det(Bij) Note that the (n 1) x (n . (c) Suppose that the matrix B is obtained from the matrix A by multiplying one column of A by a nonzero constant c. (a) Suppose that the matrix B is obtained from the matrix A by interchanging two columns of A. the above operations can also be carried out on the columns of A. When n> 2.
2(1)3+2 det = det (! ~). det(A) = det [H ! ~J. det(A) = 2 det = 1 1 0 2 Adding 3 times column 3 to column 2. Using the formula for the determinant of2 x 2 matrices. we have (! : ! ~J. 2 2 0 4 Using cofactor expansion by column 1. we conclude that = 4(9 . we have det(A) = 2(_1)4+1 det (~3 i ~J = 2 det(~3 4 3 Adding 1 times row 1 to row 3.Determinants 123 det(A) = det[? : 2 2 0 4 Adding 1/2 times row 4 to row 3. 1 0 2 . we have n (! ~) = 2. we have ! ~J.28) = 76. we have det(A) = 2 det 1 1 0 2 Adding 1 times row 4 to row 2. we have det(A) = 2 det (~ i 2J. Let us start again and try a different way. we have de~ ~ ~2 det (~ det(A) det(A) ! Using cofactor expansion by row 3. o 2 ~ Adding 1 times column 2 to column 3. Dividing row 4 by 2. we have det(A) (~ ! ! ~J = 2det(~ 1 ! ! ~J.
1(_1)3+1 det(=t3 Using the formula for the determinant of 22 matrices. we have (. Consider the matrix 2 102 0 Here we have the least number of nonzero entries in column 3. we have A=[~ ~ ~ H] 1 0 1 1 3 . we have det(A) = 2. det(A) = det[~ ? ~ H]. so let us work to get more zeros into this column.1 det(A) = 2 det (~1 =~ 1) 1 2 Adding 5 times row 3 to row 2. Example. we conclude that det(A) = 2(25 + 13) = 76.124 Determinants Using cofactor expansion by row 2. we have 2 31 0 1 23] 01 1 det(A) = det 2 7 0 1 1. 1 0 1 1 3 2 102 0 Adding 2 times row 4 to row 3. we have det(A) ~ 2 de{~ ~53 H ~5) =2det(=15 ~5)' 3 Using cofactor expansion by column 1. =~ ~) =2det(. 1 I 2 Adding 2 times row 3 to row 1. 3 [21 01 01 21 0 Using cofactor expansion by column 3. we have det(A) = 2 1(1)2+3 det . we have det(A) = 1(1) 4+3 1 3 1 2 1 3 1 det 2 7 1 1 = det 2 7 1 2120 212 2 1 1 3J ( (2 1 1 . Adding 1 times row 4 to row 2.
we have det(A)=l(I)I+ldet(~1 2 32)=detdet(A)=det(~1 2 3'J. we have 1 0 2 4 1 0 2 4 5 7 6 2 4 6 1 9 2 1 det(A) = det 3 5 0 1 2 5 2 4 5 3 6 2 0 0 0 1 0 0 Adding 1 times row 5 to row 2. ~ 0 ~ 02 Adding 1 times row 1 to row 2.det 1 7 1 1· Adding 1 times row 1 to row 3. Using the formula for the determinant of 2 x 2 matrices. we have o 3 1 2 det( A) = . we have 3 1 2) ( det(A) = _2(_1)1+3 det(i ~) = 2det(i ~). Example.1) = 34. we conclude that det(A) =2(18 . Adding 1 times row 1 to row 6. 120 Using cofactor expansion by column 3. we have det(A) = det 9 1 O. Consider the matrix 1 024 1 0 2 4 5 7 6 2 A= 4 6 1 9 2 1 350 125 245 362 1 0 2 5 1 0 Here note that rows 1 and 6 are almost identical. we have .Determinants 125 Adding 1 times column 3 to column 1. we have 1 1 1 3 o 3 1 2 det(A) = det 0 6 0 2· ( 1 1 1 3J (o 1 2 0 J o 1 2 0 Using cofactor expansion by column 1.
Example. determinants of 3 x 3 matrices depend on determinants of 2 x 2 matrices. we have det(A t) = det(A).. we have 1 0 2 4 1 0 o 0 0 0 0 0 4 6 1 9 2 1 det(A) = det 3 5 0 1 2 5 245 362 000 100 It follows from Proposition 3B that det(A) = O. in turn. we mean the matrix obtained from A by transposing rows and columns. a(ll) A= 4 5 6. . 1n anI'" ann By the transpose At of A. We have . ann Example.. A =' al n . The result below follows in view of Proposition. and so on. Further Properties of Determinants Definition. Note now that transposing a 1x 1 matrix does not aect its determinant (why?)...126 Determinants det(A) = det 4 6 o 102 4 1 0 0 0 4 0 0 1 9 2 1 350 125 245 362 000 100 Adding 4 times row 6 to row 2. Consider the n A x n matrix = 11 . Then 1 2 3) (789 369 AI =(1 ~ ~J.. . Consider the matrix I (all .. (a a ). It follows that determinants of n x n matrices ultimately depend on determinants of 1 x 1 matrices. For every n x n matrix A. Recall that determinants of 2 x 2 matrices depend on determinants of 1 x 1 matrices. ..
Proposition. Let us now reduce A by elementary row operations to reduced row echelon form B.1) = det(In) = 1. The result follows immediately. Combining Propositions.1) = * * (b) The system Ax = 0 of linear equations has only the trivial solution. we have the following result. Finally. Ek of elementary n x n matrices such that B = Ek . o. 10113 2 102 0 Next. In the notation of Proposition. It follows that det(B) ::f:. we have det(A) det(A. (d) The system Ax = b of linear equations is soluble for every n 1 matrix b. (e) The determinant det(A) ::f:. 0. Suppose now that det(A) ::f:. so that B has no zero rows by Proposition. the following statements are equivalent: (a) The matrix A is invertible. Then det(A) 0 follows immediately from Proposition. is summarized by the following result. We therefore conclude that A is row equivalent to In. ••• . Suppose that the n x n matrix A is invertible. we have det(AB) = det(A) det(B). For every n x n matrices A and B. we shall study the determinant of a product. det(A. (c) The matrices A and In are row equivalent. Application to Curves and Surfaces A special case of Proposition states that a homogeneous system ofn linear equations in n variables has a nontrivial solution if and only if the determinant if the coefficient matrix . Suppose that A is an n x n matrix. O. It now follows from Proposition that A is invertible. Then 1 det(A) Proof In view of Propositions 3G and 3C. ••• . Proposition.Determinants 127 det=[n 12312 35730 H~l=det[~ i ~ i ~l=34. det(E 1) det(A) Recall that all elementary matrices are invertible and so have nonzero determinants. Since B is an n x n matrix in reduced row echelon form. Proposition. Then A is invertible if and only if det(A) o. the main reason for studying determinants. We shall sketch a proof of the following important result Proposition. ••• . it must be In. Then there exist a finite sequence E 1. as outlined in the introduction. Proof Suppose that A is invertible. EIA It foIrows from Proposition that det(B) = det(Ek).
b. we must have 2 2) (2 2) a (al + YI + bX 1+ cYI + d = 0. xla + Ylb + c = 0. Since the two points lie on the line. Example. ~ ~J x2 the equation of the line required. In this section. 2 (x1 + yf}a + xlb + ylc + d = 0 (xi + yi)a + x 2b + Y2c + d. we shall use this to solve some problems in geometry. Since the three points lie on the circle. we have . we must have aX I + bYI + c = 0 and ax2 + bY2 + c = O. det(~ Y2 1 = O. c) to this system of linear equations. + y. Example. We illustrate our ideas by a few simple examples. Suppose that we wish to determine the equation of the unique line on the xyplane that passes through two distinct given points (xI' Yl) and (x2' Y2)' The equation of a line on the xyplane is of the form ax + by + c = O. and so we must have ( ~ ~ ~J (~J =(~J. x 2a + Y2b + c = O.)a + x3b + Y3c + d = 0 Written in matrix notation. a x2 + Y2 + bX2 + cY2 + d = 0 and Hence (x2 + Y2)a + xb + yc + d = 0. we have x2 Y2 1 c 0 Clearly there is a nontrivial solution (a.128 Determinants is equal to zero. Suppose that we wish to determine the equation of the unique circle on the xyplane that passes through three distinct given points (xl'Yl)' (x2'Y2) and (x3'Y3)' not all lying on a straight line. The equation of a circle on the xyplane is of the form a(x2 +Y2) + bx + cy + d = O. Hence xa + yb + c = 0. Written in matrix notation. 0 (x.
aX2 + bY2 + CZ2 + d= 0. we must have a(xf + yf + zf)+ bXI +cYI +dzl +e = 0. + z. d) to this system of linear equations. The equation of a plane in 3space is of the form ax + by + cz + d = o. Suppose that we wish to determine the equation of the unique sphere in 3space that passes through four distinct given points (xl' YI' ZI)' (x 2' Y2' Z2)' (x 3' Y3' z3) and (x4' Y4' Z4)' not all lying on a plane. and ax3 + bY3 + cZ3 + d= O. we must have ax I + bYI + cz I + d = 0. x 3a + Y3b + z3c + d = 0: i 2 x Y a 0 b xi + yi x3 2 x 2 Y2 x3 Y3 + Y3 c d =o o o Clearly there is a nontrivial solution (a. a (xi + yi + zi) + bX2 + CY2 + dz 2 + e = 0. a(x. The equation of a sphere in 3space is of the form det(~ ~ ~ ~J = z2 x3 Y2 Y3 z2 z3 a(x2 + Y2 + z2) + bx + cy + dz + e = o.) + bX3 +CY3 +dz3 +e = 0. Hence xa + yb + zc + d = 0. we have x + 2 xla+Ylb+zlc+d=O. d) to this system of linear equations. b.Determinants 129 Clearly there is a nontrivial solution (a. Written in matrix notation. Suppose that we wish to Qetermine the equation of the unique plane in 3space that passes through three distinct given points (xl' YI' ZI)' (X2'Y2' z2) and (x 3' Y3' z3)' not all lying on a straight line. and so we must have the equation of the plane required. + Y. 1 0' 1 Example. c. x2a" + Y2b + z2c + d = 0. and so we must have det 2 x +i 2 xI +i I x2 + Y2 x3 + Y3 2 2 2 2 x xI x2 x3 Y YI Y2 Y3 = 0. . b. the equation of the circle required. Since the four points lie on the sphere. c. Since the three points lie on the plane. Example.
ani . =0. In this section.130 Determinants Hence (x 2 + y2 + z2)a+ xb+ yc+ zd +e = 0. + Y. (x. we have x 2 + i+z2 x 2 + i +z2 I I I x xI x3 Y YI Y3 Z 1 zi z2 z3 z4 1 1 a b C 0 0 x2 + Y2 +z2 x 2 + i+z2 x4 + Y4 + z4 3 2 3 2 2 2 2 x2 Y2 x4 Y4 =0 0 3 2 d e o Clearly there is a nontrivial solution (a.. xf + yf +zf x~ + y~ +z~ Some Useful Formulas x4 Y4 the equation of the sphere required.1) (n . Written in matrix notation. (xi + yi + zi)a+x3 b + Y3 C + z3 d +e = 0. (xf + yf +zf)a+xlb+ Ylc+zl d +e = 0.. while the second one enables us to solve a system of linear equations. we shall discuss two very useful formulas which involve determinants only.8 for proofs. The rst one enables us to nd the inverse of a matrix.1) matrix . b. (xi + yi + zi)a+x4 b + Y4 C + Z4 d +e = o. +z. c. and so we must have X2 + Y2 +z2 x 2 + i +z2 I I I x xI X2 X3 Y YI Y2 Y3 Z 1 zi Z2 z3 Z4 1 1 det xi + Y. ( .. + zi)a+ x2 b + Y2 C + z2d + e = 0. .. Recall rst of all that for any n x n matrix all A =· . d. e) to this system of linear equations. The interested reader is referred to Section 3.. ann the number Cij = (ly+j det(Aij) is called the cofactor of the entry aij' and the (n . a ln ) .
_I) a(iI)I a(i_l)(jI) • • a(iI)(j+I) 131 a. det[O I] det[1 1] detr1 I] 2 0 2 0 . Definition. Consider the matrix A= 0 1 2.. Proposition.. we have det(A) ~ det[~ ~l.Determinants al(j. Suppose that the n x n matrix A is invertible. •• . 203 Then 1 1 0] [ det[~ ~] det[ ~l ~] det[ ~ I ~] ad}(A) = det(~ ~] det(~~] det(~~] ~[~2 i2 ~~l ~]~l. Remark. Note that adj(A) is obtained from the matrix A rst by replacing each entry of A by its cofactor and then by transposing the resulting matrix. Then AI = 1 det(A) ad·(A).0 1 On the other hand. The n x n matrix ad} (A) = (C?I C11I . 0] ~det[10 0 0] ~detG 2 I 2 3 It follows that 2 2 3 .ln • a(i+I){i+I) an(j+I) • • a(iI)(i_l) • • aU l)n aU + l)n aU + I)n ann ani an(j_I)· is obtained from A by deleting row i and column}. here denotes that the entry has been deleted. Cf/lJ CIIII is called the adjoint of the matrix A. lj Example. adding 1 times column 1 to column 2 and then using cofactor expansion on row 1.
e. is given by det(A... we turn our attention to systems of n linear equations in n unknowns. k.. = 1. anrfn = bn. For every j j of the matrix A by the column b.... of the form anix i + .. represented in matrix notation in the form Ax = b. ..:. ann Proposition.. .._.. x n det(An(b» = .. a\nJ. (b» x . Consider the system Ax = b.1det(A) .. .... where A= ~ ~ ~ ~.. Then the unique solution of the system Ax = b. we have XI = 'det(A)'= 3.''':... . . Example. where Cnn C : Jn ) and b = (b) :1 bn represent the coefficients and represents the variables.132 Determinants AI = 4 3 3 2] [ 3 2 2 1 2. .. where A. By Cramer's rule.. [1 10] and = [1] b Recall that det(A) = 1. x and b are given by equation.. . Next.:... write in other words. .. x2 = det[~ 11 ~] 3 0 3 det[~ ~ ~] 2 3 3 det(A) = 4. ..:. we replace column al(Jr l ) . AI(b). (Cramer's Rule) Suppose that the matrix A is invertible. det(A) ' where the matrices AI(b). bn an(J+I) .:... ....
without loss of generality. Since the set X is nonempty and finite. If x E X. Sn denotes the collection of all functions from (1. dened by x<l>'I' = {x$)'I' for every x E X so that is followed by. We now let Sn denote the set of all permutations on the set {l. 2. we can use the notation 1 2 .. Proposition. Definition. we may assume.. we shall first discuss a definition of the determinant in terms of permutations. where n EN. . Example. (1<1> 2<1> . . For each such choice. . n} that are both onetoone and onto. A permutation $ onXis a function: X ~ X which is onetoone and onto. . 2 2 x3 = = Further Discussion In this section.. Proof There are n choices for 1$. that it is {I.. LetXbe a nonempty finite set. 2 2 1 We therefore have [3 3 2] ~ 3 3 [:~l [=~ =~ 1][~] [=~]. To represent particular elements of Sn' there are various notations. . In other words. For example.. Let us check our calculations. we den()te by x the image of x under the permutation.. We shall do this only for permutations. Remark. The reasons will become a little clearer later in the discussion.. n}. 2.. is also a permutation on X.. n}. For every n EN. Recall from Example that (1 1 1] AI = 4 3 2. the set Sn has n! elements. Note also that we write to denote the composition. n) to {I. then: X ~ X. . And so on.. It is not dicult to see that if: $ X ~ X and: X ~ X are both permutations on X. In order to do so. Note that we use the notation x instead of our usual notation (x) to denote the image of x under. n<1> to denote the permutation $.. . we need to make a digression and discuss first the rudiments of permutations on nonempty finite sets.1) choices left for 2$. .. 2.Determinants 133 det 0 1 2 203 x = =3 3 det{A) . 2. In S4' 1 2 3 4) (2 4 1 3 n) . there are (n .
in other words.. the reader can easily check that 1 2 3 4)(1 2 3 4) (I 2 3 4) (2 4 1 3 3 2 4 1 .. n} and leaves all the others unchanged is called a transposition.. . Suppose that n EN. .. the permutation I 2 3 4 5 6 7 8 9) (3 2 5 1 7 8 4 9 6 can be written in cycle notation as (1 3 5 74)(6 8 9). By Theorem 3P(b). . Here the cycle (1 243) gives the information 1<1> = 2. we have (1 2 4 3) = (1 2)(1 4)(1 3). and is its own inverse. (b) X 2' . In 8 9. . Two cycles (XI' x 2' . . 3 = 1 and 4 = 3. . n}..134 Determinants denotes the permutation. Xk) X2 )(X I . YI are all different.2 1 3 4 ' can be represented in cycle notation by (1 243)(1 34) = (1 2). the permutation I 2 3 4 5 6) (2 4 1 3 6 5 can be represented in cycle notation as (1 243)(5 6). .. We also say that the cycles (1 2 4 3). (XI X2 ... we have . A pet:mutation in 8n that interchanges two numbers among the elements of {I. it is not necessary to include this in the cycle.. where 1 = 2. xk) of the set {I. Example. Proposition.. Example. Xk For every subset (XI' x 2. . . Example.. since the image of2 is 2.. .. every permutation in 8n can be written as a product of transpositions. 2<1> = 4. On the other hand. In 84 or 86 . Suppose that n EN.. .. Xk) satises x 3). . YI) in 8n are said to be disjoint if the elements XI' .. Furthermore. In 8 6. . The interested reader may try to prove the following result.. ·. (XI' xk). xk' YI' . It is obvious that a transposition can be represented by a 2cycle.. Remark. 4<1> = 3 and 3<1> = 1.. the cycle (XI' X2' . The permutations U~ ~ j) and (~ ~ l:i) can be represented respectively by the cycles (I 243) and (I 34). . 2. (1 3 4) and (1 2) have lengths 4. x k) and (YI' Y2' . 2 = 4. where the elements are distinct. the information 1 2 3 4)(1 2 3 4) (I 2 3 4) (2 4 1 3 3 2 4 1 . 3 and 2 respectively.. (a) Every permutation in 8n can be written as a product of disjoint cycles.2 1 3 4 ' A more convenient way is to use the cycle notation. Note also that in the latter case. every cycle can be written as a product of transpositions.. The last example motivates the following important idea. Xl' = (XI' (c) Consequently. Definition. 2. .
Example. ann is an n x n matrix. we mean the product of n entries of A. Furthermore.a13a22a31 . Definition. we mean the sum <Pe:S" where the summation is over all the n! permutations in Sn' It is be shown that the determinant de ned in this way is the same as that dened earlier by row or column expansions. By the determinant of an n x n matrix A of the form (11). Definition.. It can be shown that no permutation can be simultaneously odd and even. a :] . In the two examples below. We have the following: elementary product permutation sign e +1 alla22a33 (123) +1 a12a21a33 (1 32) +1 a13a21a32 1 (1 3) a13a22a3I a ll a22 a 12a2I e +1 contribution + aIla22a33 + a12a21a31 + a13a21a32 . The very interested reader may wish to make an attempt. Suppose that A=[aI~ anI In . Example.. It follows that any such elementary product must be of the form a I(1<1»ai 2<1» .. Suppose that n = 3.Determinants 13S (1 3 5 7 4) = (1 3)(1 5)(1 7)(1 4) and (6 8 9) = (6 8)(6 9).a I2 a 2I as shown before. We have the following: elementary product permutation sign contribution (1 2) 1 Hence det (A) = all a 22 .!. Here we conne our study to the special cases when n = 2 and n = 3.. we write E 'f'  (. we use e to denote the identity permutation. We are now in a position to dene the determinant of a matrix. By an elementary product from the matrix A.) _ {+ 1 if is even 1 iff is odd. no two of which are from the same row or same column. one can use (12) to establish Proposition. Hence the permutation can be represented by (1 3)(1 5)(1 7)(1 4)(68)(69). Then a permutation in Sn is said to be even if it is representable as the product of an even number oftransp0sitions and odd ifit is representable as the product of an odd number of transpositions. Suppose that n = 2... where <I> is a permutation in Sn' Definition. an(nj). <l> Remark. Suppose that n EN. Indeed..
ofelementary matrices such that A' = Gk . we discuss briey how one may prove Proposition concerning the determinant of the. Proof of Proposition. . Then there exist a finite sequence G 1.. . it follows that there exist a nite sequence E 1.a12a21a33 Hence det(A) = alla22a33 + a12a23a31 + a13a21a32 .. ButAB = E 1. Ek of elementary matrices such that A = EI . Then A' = In' and so it follows from Equation that AB = E 1. (b) (c) If E arises from adding one row of In to another row. det(E) = c. Corresponding to Proposition. Since elementary matrices are invertible with elementary inverse matrices. so it follows from Proposition that det(AB) = O. Hence A' B must have a zero row. (a) If E arises from interchanging two rows of In. EJI1' Suppose first of all that det(A) = O. . Suppose that E is an n x n elementary matrix. .. we can establish the following intermediate result.alla23a32 . Suppose that E is an elementary matrix.. ••• . The idea is to use elementary matrices. we have det(EB) = det(E) det(B). Proposition. ..a12a2Ia33' We have the picture lielow: + Next.alla23a32 alla23a32 a12a21a33 (1 2) 1 . Suppose next that det(A) ::t: O. Let us reduce A by elementary row operations to reduced row echelon form A'. Then it follows from (13) that the matrix Ao must have a zero row. E~. . then det(E) = I. . we can easily establish the following result..a13a22a31. Gk. Proposition. G1A. Thenfor any n x n matrix B.. E/A 'B).. If E arises from mUltiplying one row of In by a nonzero constant c. then Combining Propositions 3D and 3Q. product of two matrices. ••• . and so det(A' B) = O. then det(E) = 1.136 Determinants (23) 1 .
. bnn Then for every i. }.. it remains to show that bIClj + . To complete the proof.Determinants Proof It sucess to show that Aad}(A) = det(A)In. if i :.. 1 ad'(A) det(A) lj By Proposition. on using cofactor expansion by column}. the unique solution of the system Ax = b is given by AI = det(A) Written in full.. we have [::J.. that x· = =" htClj + ... and so the determinant is 0 (why?).. .. . n. n. we get [ b~l . as this clearly implies 1 ad}(A)] = In' det(A) giving the result.y . ..n = det(A): On the other hand. +bnC.. + bnCnj = det(Aib)): Note. To show. • Hence. This matrix has therefore two identical rows. It follows that when i = }.det~A) ~~ [ J det(A) . we have bii = ailCil + .} = 1.. .. + ainCjn .. we have bij = ajJCjI + . this becomes x=A I = 1 adj(A)b. Proof Since A is invertible. b1n ] bn1 : . note that 137 A[ Aad}(A) =: :: anI'" ann Ctn Suppose that the right hand side of is equal to all .. + ainC.. [ al n 1[CJl (B)=: . for every} = 1. then equation is equal to the determinant of the matrix obtained from A by replacing row} by row i....
bi ( 1) i=l det • a(i+}) I • a(i+I)(JI) • • • • an(jI) ..138 Determinants ~ Hj a(iI)I a(il)(jI) • a(iI)(jI) a(iI)n det(A .(b» } = L....
n = 0. But what if n is huge: thousands. . analyse the long time behaviour of x n' etc..2xO= '). 1 At the first glance the problem looks trivial: the solution xn is given by the formula xn =Anxo ...2xO' .2.Chapter 5 Introduction to Spectral Theory EIGENVALUES AND EIGENVECTORS Spectral theory is the main tool that helps us to understand the structure of a linear operator. we can mUltiply it by itself... ThenA2axo = A 2xO' '). 1. Suppose that Axo = A xo' where'). or any polynomial. and most of the results presented here fail in infinitedimensional setting. MAIN DEFINITIONS Eigenvalues. equivalently. In this chapter we consider only operators acting from a vector space to itself (or.. let us consider difference equations. where a : V ~ V is a linear transformation. ..nxo' so the behaviour of the solution is very well understood In this section we will consider only operators in finitedimensional spaces. To explain the main idea. Many processes can be described by the equations of the following type xn +1 = Axn.. Given the initial state Xo we would like to know the state xn at the time n... Anxo = '). n x n matrices). millions? Or what if we want to analyse the behaviour of xn as n ~ oo? Here the idea of eigenvalues and eigenvectors comes in. is called an eigenvalue of an operator A : V ~ V if there exists a nonzero vector v E V such that . and xn is the state of the system at the time n. Spectral theory in infinitely many dimensions if significantly more complicated... The main idea of spectral theory is to split the operator into simple blocks and analyse each block separately. Ifwe have such a linear transformation A : V ~ V. Spectrum A ·scalar A. . take any power of it. is some scalar. Eigenvectors.
.. But how do we find eigenvalues of an operator acting in an abstract vector space? The recipe is simple: Take an arbitrary basis.e. a: lR n ~ lRn). finding all eigenvectors. in higher dimensions different numerical methods of finding eigenvalues and eigenvectors are used.i. Note. that determinants of similar matrices coincide. Finding roots of a polynomial of high degree can be a very dicult problem. is an eigenvalue. But how do we know that the result does not depend on a choice of the basis? There can be several possible explanations.Ai)x = o.AI. Finding Eigenvalues: Characteristic Polynomials A scalar A is an eigenvalue if and only if the nullspace Ker(A .AI). i. Therefore II E a(A). to find all eigenvalues of A one just needs to compute the characteristic polynomial and find all its roots. Let a act on lR n (i. The vector v is called the eigenvector ofa (corresponding to the eigenvalue A). The nullspace Ker(A . is called the eigenspace. and is usually denoted cr(A).A is an eigenvalue of A {:} det (A . the determinant det(A . The set of all eigenvalues of an operator A is called spectrum of A. So. We know that a square matrix is not invertible if and only if its determinant is O. the set of all eigenvectors and 0 vector. Let us recall that square matrices A and B are called similar if there exist an invertible matrix S such that A =SBSl.AI) = 01 If A is an n x n matrix. This method of finding the spectrum of an operator is not very practical in higher dimensions. or. and compute eigenvalues of the matrix of the operator in this basis.AI) is nontrivial (so the equation (A . One is based on the notion of similar matrices.e. So. equivalently (A .Ai) x = 0 has a nontrivial solution). Ifwe know that '). Characteristic Polynomial of an Operator So we know how to find the spectrum of a matrix. A Ihas a nontrivial nullspace if and only if it is not invertible. So.. corresponding to an eigenvalue is simply finding the nUllspace of A . Note that if A = SBSI then . Since the matrix of A is square.AI) is a polynomial of degree n of the variable A. This polynomial is called the characteristic polynomial of A.140 Introduction to Spectral Theory Av = Av.e. Indeed det A = det(SBSl) = det S det B det Sl = det B because det Sl = 1/ det S. and it is impossible to solve the equation of degree higher than 4 in radicals. the eigenvectors are easy to find: one just has to solve the equation Ax = Ax.
i.A i divides p(z) is called the multiplicity. Therefore det(A 'JJ) = det(B . then it is a root of the characteristic polynomial p(z) = det(A . positive integer k such that (z . Then 1. There is another notion of multiplicity of an eigenvalue: the dimension of the eigenspace Ker(A1) is called geometric multiplicity of the eigenvalue A..e.. Proposition.An). Let us mention. so characteristic polynomial of an operator is well defined.I  = S(BSI  = S(B _IJ. . Trace and Determinant Theorem. If T: V 7 A . . The largest. Geometric multiplicity is not as widely used as algebraic mUltiplicity.)SI.. As we have discussed above. and A and B are two bases in V.)2 divides p and so on.. In other words. matrices of a linear transformation in different bases are similar. and let AI' A .e. (z . we can define the characteristic polynomial of an operator as the characteristic polynomial of its matrix in some basis. counting multiplicities.A)q(Z). where q is some polynomial.Introduction to Spectral Theory 141 ASISI USI) so the matrices A lJ and B lJ are similar. . that algebraic and geometric multiplicities of an eigenvalue can differ. So. traceA = AI + A2 + . peA) = 0) then Z . when people say simply "multiplicity" they usually mean algebraic multiplicity. The mUltiplicity of this root is called the (algebraic) multiplicity of the eigenvalue A.. the result does not depend on the choice of the basis.. . and Ais its root (i.'JJ).. then [T]AA = [I]AB[TbB[I]BA and since [l]BA = ([l]AB)1 the matrices [11 AA and [11 BB are similar. If A i~ an eigenvalue of an operator (matrix) A. then q also can be divided by z . In other words. If q(A) = 0.zl). Let A be n x n matrix..AI)(Z . p can be represented as p(z) = (z . Therefore.A2) . Geometric multiplicity of an eigenvalue cannot exceed its algebraic multiplicity. An be its eigenvalues (counting 2 multiplicities). counting multiplicity.U = SBS. . I V is a linear transformation. that ifp is a polynomial. p can be represented as p(z) = an(z . of the root A... so (z . Any polynomial p(z) = L~=o akz k of degree n has exactly n complex roots. where Ai' ~. + An.A divides p(z). Multiplicities of Eigenvalues Let us remind the reader.. i. The words counting multiplicities mean that if a root has multiplicity d we have to count it d times.e. Icharacteristic polynomials of similar matrices coincide. An are its complex roots.
..142 Introduction to Spectral Theory det A = A)A2 . .). .) . (an'n .. 2. A2.)... ... Since a diagonal matrix is a particular case of a triangular matrix (it is both upper and lower triangular the eigenvalues of a diagonal matrix are its diagonal entries The proof of the statement about triangular matrices is trivial: we need to subtract A from the diagonal entries of A.A2 .. . Eigenvalues of a Triangular Matrix Computing eigenvalues is equivalent to finding roots of a characteristic polynomial of a matrix (or using some numerical method). there is one particular case.. an.. its matrix in the basis B is o • "N "N [A N ]BB = dlag = {" N . However. n on the diagonal [A] BB = diag{A).. . A. kl• k=O 00 l Ak .An _ A)  o Moreover.A)(a2'2 .A) .. . . Namely eigenvalues ofa triangular matrix (counting mUltiplicities) are exactly the diagonal entries a).. Then the matrix of A in this basis is the diagonal matrix with 1. 0]. it is easy to find an Nth power of the operator A...n By triangular here we mean either upper or lower triangular matrix. n be the corresponding eigenvalues.. o An Therefore.. . which can be quite time consuming. An . a2. when we can just read eigenvalues off the matrix. functions of the operator are also very easy to compute: for example the operator (matrix) exponent et4 is defined as et4 and its matrix in the basis B is = I +tA+~+ t2A2 t3 A3 3! =L.. an'n' DIAGONALIZATION Suppose an operator (matrix) a has a basis b = vI' v2' .. and let A. and use the fact that determinant of a triangular matrix is the product of its diagonal entries. We get the characteristic polynomial det(A 'M) = (a). vn of eigenvectors. 2. .2. . Namely.. An} = [AI A2 . ..2.A) and its roots are exactly al')' a 2'2' .
vn . . we need to recall that the change of coordinate matrix [l]SB is the matrix with columns vI' v2.I ) .. Another way of thinking about powers (or other functions) of diagonalizable operators is to see that if operator A can be represented as A = SDS.. .r be distinct eigenvalues of A. Proof We will use induction on r. . its columns form a basis. . Theorem. Let A. vr be the corresponding eigenvectors. On the other hand if the matrix admits the representation a = SDSI with a diagonal matrix D. The case r = 1 is trivial. = SD N SI .Introduction to Spectral Theory 143 o o To find the matrices in the standard basis S. where S = [l]SB is the change of coordinate matrix from coordinates in the basis B to the standard coordinates. A. then columns of S are eigenvectors of A (column number k corresponds to the kth diagonal entry of D)... 0 1 =ISDSI . A matrix a admits a representation A = SDSI.. .. ' NTimes and it is easy to compute the Nth power of a diagonal matrix.. then the matrix admits the representation A = SDSl. A. Let us call this matrix S. . because by the definition an eigenvector is nonzero. I .. Since S is invertible. vr are linearly independent. S o An where we use D for the diagonal matrix in the middle. . Theorem.I . . then AN = (SDSI)(SDS..z. where D is a diagonal matrix if and only if there exists a basis of eigenvectors of A. Proof We already discussed above that if there is a basis of eigenvectors. . Then vectors vI' v2. . and let vI' v 2. . The following theorem is almost trivial. and a system consisting of one nonzero vector is linearly independent. Similarly Af AN = SD N SI =S o A~ o and similarly for etA. then A = [A]ss = s AI [ A 2... .
Bases of Subspaces (AKA Direct Sums of Subspaces) Let VI' V2.1..+ vp = L::Vk. . vr. By Theorem the system vI' v 2.Vk E Vk· k=1 We also say...e.. .. Vp is a basis if and only if it is generating and linearly independent. Suppose there exists a nontrivial linear combination r cIv I + c2v2 + . Since Ak"* Ar we can conclude that ck = 0 for k < r. From the above definition one can immediately see that Theorem states in fact that the system of eigenspaces Ek of an operator A Ek := Ker(A . We saythatthe system of subspaces is a basis in V if any vector v E V admits a unique representation as a sum p v = v J + v2 + . so ciAkr) = 0 for k = 1. and since it consists of exactly n = dim V vectors it is a basis. Proof For each eigenvalue k let vk be a corresponding eigenvector (just pick one eigenvector for each eigenvalue). If an operator A " V ~ V has exactly n = dim V distinct then it is diagonalizable. Vp be subspaces ofa vector space V. .1. . We say that the system of subspaces VI' V2. . Vp is linearly independent if and only if any system of nonzero vectors vk.. k=I By the induction hypothesis vectors vI' v2. . .. . p). ..I are linearly independent. ' Remark. vn is linearly independent. Remark. Corollary. Vp is linearly independent if the equation VI + v2 + .144 Introduction to Spectral Theory Suppose that the statement of the theorem is true for r . E cr(A).2.. vk E Vk has only trivial solution (vk = 0 Vk = 1.AkI). Another way to phrase that is to say that a system of subspaces VI' V2. . r .. . or spanning) if any vector v E V admits representation. . .Ar I and using the fact that (A ...+ vp = 0.. Ak is linearly independent. 2. Then it follows from that cr = 0. .. a system of subspaces VI' V2. It is easy to see that similarly to the bases of vectors.Ar )Vk = O. i..+ crvr = L::CkVk = 0 k=I Applying A .... . where vk E Vk.A/)Vr = 0 we get rI L::Ck (Ak ... is linearly independent... . ... . .. we have the trivial linear combination. . that a system of subspaces VI' V2. Vp is generating (or complete.
Then the union [kBk ofthese bases is a basis in V.e. Vp be a basis of subspaces. if one thinks a bit about it. vn ' Split the set of indices I.. .. .. the system Bk consists of vectors bi'} E A k' ... ... Instead of using two indices (one for the number k and the other for the number of a vector in Bk.. . for example as follows: first list all vectors from B I . Namely. and the set of indices {I..2.+ vp = O.. the system Bk) are linearly independent.. 2. The following theorem shows that in the finitedimensional case it is essentially the only possible example of a basis of subspaces. . .. i. The main diculty in writing the proof is a choice of a appropriate notation. . we index all vectors in B by integers 1. . Let VI' V2' . Ap ' and define subspaces Vk := span {Vj :} E A k }. Theorem. To prove the theorem we need the following lemma. Since vk E vk and the system of subspaces Vk is linearly independent. and let us have in each subspace Vk a linearly independent system Bk of vectors 3 Then the union B "= U~k is a linearly independent system. let n be the number of vectors in B := [U~k' Let us order the set B. we have cj = 0 for all} E A k' Since it is true for all A k' we can conclude that cj = 0 for all }.. . Proof The proof of the lemma is almost trivial. n} splits into the sets 1.Introduction to Spectral Theory 145 There is a simple example of a basis of subspaces. .e. Suppose we have a nontrivial linear combination n b . let us use "flat" notation. 2. then all vectors in B2. n.. . Lemma. and since the system of vectors bj :} E A k (i. + Cnn~JJ = 0 l J=I Denote Then can be rewritten as VI + v 2 + . .. Vp be a linearly independent family of subspaces.. .'\" c ·b· b c ib + c22 + .. Proof To prove the theorem we will use the same notation as in the proof of Lemma. This way. P such that the set Bk consists of vectors bj :} E A k . vk = 0 Vk..2. Than means that for every k L: JEAk Cij =0. Let V be a vector space with a basis vI' v2. Clearly the subspaces Vk form a basis of V. .. and let us have in each subspace Vk a basis (of vectors) B.. Let VI' V2' .. etc. n into p subsets AI' A2.. listing all vectors from Bp last..
which is exactly n = dim V.At/) be the corresponding eigenspaces. So the number of vectors in b equal to the sum of multiplicities of eigenvectors k. or simply assuming that a has exactly n = dim Veigenvalues(counting multiplicity). we see that if an operator A is diagonalizable.A) ". ". either by working in a complex vector space. .. Theorem. and let Ek := Ker(A .A). So. Since for a diagonal matrix D = diag{A 1. Consider the matrix A= (~ r). that for a diagonal matrix. vk E Vk' k=1 E Since the vectors bpj A k form a basis in Vk' the vectors vk can be represented as vk 2:: cjbj .e.146 Introduction to Spectral Theory Lemma asserts that the system of vectors b"j = 12. Vp is a basis. any vector v E V can be represented as p v = vIPI + v 2P2 + ". "" Ap be eigenvalues of A. By Lemma the system B = [U~k is a linearly independent system of vectors. complex eigenvalues).. Since the system of subspaces VI' V2. Let us now prove the other implication. jEA k and therefore v = 2::~={Jbj' Criterion of Diagonalizability. Some Examples Real eigenvalues. Note.Ai) (i. In what follows we always assume that the characteristic polynomial splits into the product of monomials. we have a linearly independent system of dim V eigenvectors. which means it is a basis..2. An operator A : V ~ V is diagonalizable ifand only iffor each eigenvalue Il the dimension of the eigenspace Ker(A . if we allow complex coefficients (i. Proof First of all let us note.'Ai) = (AI . that any polynomial can be decomposed into the product of monomials. A2. and therefore the same holds for the diagonalizable operators. The subspaces Ek. the algebraic and geometric multiplicities of eigenvalues coincide. the geometric multiplicity) coincides with the algebraic multiplicity of Il.. k = 1. ". Let AI' A2. "" n is linearly independent.A)(A2 .(An .+ v = P = I:'>k. We know that each Bk consists of dim Ei= mUltiplicity ofAk) vectors. An} det(D . its characteristic polynomial splits into the product of monomials.e. Let Bk be a basis in Ek. so it only remains to show that the system is compl~te. First of all let us mention a simple necessary condition. '''' pare linearly independent.
= 3. = 5 and t. For = (=~ _~i) and the eigenvalues (roots of the characteristic polynomial are t. we do not need to compute an eigenvector for t. = 1 + 2i A. Consider the matrix A=(b O· Its characteristic polynomial is 110>.1~ >1 = (1. OT. for t.if .. = 12i: we can get it for free by taking the complex conjugate of the above eigenvector. _2)T .Introduction to Spectral Theory 147 Its characteristic polynomial is equal to 118>.>)2 16 and its roots (eigenvalues) are t..V This matrix has rank 1.. For the eigenvalue t. for example by (1.5 = 8 4 A basis in its nullspace consists of one vector (1.1~>I=(l_>)2. so the eigenspace Ker(A . Consider the matrix A=(l2 i)· Its characteristic polynomial is 1 1=2>. = 5 A51 = 8 1. for t.. So.AI) is spanned by one vector. Since the matrix A is real... 2)T.1~>1=(1>)2+22 t. = 1 ± 2i. so this is the corresponding eigenvector. and so the matrix A can be diagonalized as A=(1i i )(1+2i 10 )(1i i)1 1 0 1 2i A nondiagonalizable matrix.. .= 12i a corresponding eigenvector is (1 . = 3 (15 2) (4 2) ~) 1 AU=A+31=(: and the eigenspace Ker(A + 31) is spanned by the vector (1. Similarly.. The matrix A can be diagonalized as A= (~ i) = (~ l2)(~ ~3)(~ 2 )1 Complex eigenvalues.
y) E JR2 by where (. note that It follows that every U E JR2 can be written uniquely in the form u=c1v 1 +c2v2' where c I . we got that the eigenspace Ker(AI1) is one dimensional (spanned by the vector (1. we must have . On the other hand. so 2 . dened for every (x. Let us now examine how these special vectors and numbers arise.). Il~? ~ JR2 . If A were diagonalizable. the function f: JR2 ~ JR2 can be described easily in terms of .1) = 1 (1 pivot. Example. But.1 = 1 free variable). There is also an explanation which does not use Theorem. Of). and so the dimension of the eigenspace wold be 2. t). Therefore. Consider a function f: j(x. Since AV=A(~ ~}=(~ ~}. We hope to find numbers A E JR and nonzero vectors v E JR2 such that G~)V=AV.148 Introduction to Spectral Theory so A has an eigenvalue 1 of mUltiplicity 2. so a is not diagonalizable. the geometric multiplicity of the eigenvalue 1 is different from its algebraic multiplicity. Therefore A cannot be diagonalized.)=G Note that ~)(. form a basis for JR l the two special vectors vI and v 2 and the two special numbers 2 and 6. it would have a diagonal form (6 ~) in some basis. y) = (s. c2 E lR. it is easy to see that dimKer(A . Namely. so that Au =A(clv l + c2v2) = clAvI + c0 v2 = 2c l v I + 6c2v2 . Note that in this case.
AJ)v = O. In other words.A. Solving this equation gives the eigenvalues of the matrix A. Then Av = AV = 'Alv.. we must therefore ensure that det ( 3. we must have al1 A a12 det a21 a22. 3 J= O. On the other hand. it follows that we must have det (A . Then we say that A is an eigenvalue of the matrix A. where I is the n x n identity matrix. for any eigenvalue A of the matrix A. and that v is an eigenvector corresponding to the eigenvalue A. Suppose that A=: ( a~1 . so that (A . = CJ.. Hence (3 . willi root v. Substituting A = 2 into (1).A. t we obtain G!} = 0. the set n {v elR : (AA1)v=O} . we obtain (~3 ~l}= 0. 3) v=O. ani that is a polynomial equation. Suppose further that there exist a number E R and a nonzero vector v E]Rn such that Av = AV. Suppose that A is an eigenvalue of the n x n matrix A. ( 1 5A. and that v is an eigenvector corresponding to the eigenvalue A. = (~J Substituting A = 6 into (1).A. a~nJ : ani ann is an n x n matrix with entries in lR. Definition.3 = 0. In order to have nonzero v E]R2 . we must have 3.) . 1 5 _ A. = 2 and A. with roots A.A an2 ann .)(5 .2 = 6.A1) = O. In other words. wiiliroot v.A. Since vERn is nonzero.A =0.Introduction to Spectral Theory 149 (G ~)(~ ~)}= o.
1. in other words.2 ..2)(1. a subspace of ~ n • Definition.3 = 0... An eigenvector corresponding to the eigenvalue 1 is a solution of the system (A + f)v =(~ 6 12 12J v = 0. 1.3 = 5.)(5 .. The polynomial is called the characteristic polynomial of the matrix A.2 = 2 det 6 11. the space is called the eigenspace corresponding to the eigenvalue A. with root vI ~~ 9 = (IJ .. The eigenvalues are therefore Al = 1. 1. Hence the eigenvalues are Al = 2 and ~ = 6. we need to nd the roots of 12 J 30 = 0.A and 1. ( o 9 20 • To find the eigenvalues of A. with corresponding eigenvectors G~) respectively. The eigenspace corresponding to the eigenvalue 2 is {VE~2:G :)V=+H~JCE~}.ISO Introduction to Spectral Theory is the nullspace of the matrix A .81. 0 13 . (A + 1)(1. ~ . For any root Aof equation. .. . in other words. The matrix has characteristic polynomial (3 .. + 12 = O.. The eigenspace corresponding to the eigenvalue 6 is {VE~2 {~3 ~l)V=+H:}CE~}.IJ. ( o 9 201. Example.5) = O. Consider the matrix 1 6 12J A= 0 1.. Example..A) .3 30 .
(A+31)v= 20 to 5) 45 25 15 V=O.2I)v = 45 ( 30 15 v = 0. with roots v2 = 0 and v3 = 3 .Introduction to Spectral Theory 151 15 9 12) 30 v = 0. 2 An eigenvector corresponding to the eigenvalue 2 is a solution of the system (A . The eigenvalues are therefore Al = 3 and An eigenvector corresponding to the eigenvalue 3 is a solution of the system det 5) Az = 2. while the eigenspace corresponding to the eigenvalue 2 is a plane through the origin. and so form a basis for ]R3. Note also that the eigenvectors VI' v2 and v3 are linearly independent. ( 9 15 3 Note that the three eigenspaces are all lines through the origin. 30 20 10 3 0 15 to 5) (1) (2) Note that the eigenspace corresponding to the eigenvalue 3 is a line through the origin. we need to nd the roots of 17A to 45 28A 15 = 0. withrootv3 = 5 . 30 20 12 To find the eigenvalues of A. . Consider the matrix A= (~~ =~~ ~155). with root vI ( 30 20 15 = ( 1) 3. ( 30 20 12A in other words.2)2 = O. and so form a basis for ]R3.2J)v = 0 3 6 (o o (0) An eigenvector corresponding to the eigenvalue 5 is a solution of the system (A5J)v= 6 6 12) ( 1) 0 18 30 v=O. (A + 3)(A . Note also that the eigenvectors vI' v2 and v3 are linearly independent. 18 1 An eigenvector corresponding to the eigenvalue 2 is a solution of the system ~A . with root v2 = 2. Example.
We can therefore only nd two linearly independent eigenvectors.1)2 = 2A 1 1 0. ° OJ (0 0 3 To find the eigenvalues of A. Consider the matrix A= 1 2 1 0.J)v= (i ~: ~}= =} (i Note that the eigenspace corresponding to the eigenvalue 3 is a line through the origin.3)(A . we need to nd the roots of det ( in other words. we need to find the roots of . with root v2 An eigenvector corresponding to the eigenvalue 1 is a solution of the system (A. the matrix (i ~: ~J has rank 2. 1 3 4 To find the eigenvalues of A. Example. 0. An eigenvector corresponding to the eigenvalue 3 is a solution of the system °A ° J= ° ° ° 3A (A . Consider the matrix A= (~ =~ ~J. ( 0.3I)v = 1 ~ OJ v = 0. with root vI = (OJ 1 ~ ~ ~. The eigenvalues are therefore Al = 3 and A2 = 1.152 Introduction to Spectral Theory Example. and so the eigenspace corresponding to the eigenvalue 1 is of dimension 1 and so is also a line through the origin. (A . so that ]R3 does not have a basis consisting of linearly independent eigenvectors of the matrix A. On the other hand.
with roots has rank 1. it can be proved by induction that for every natural number kEN. (A .2)3 = O. Example.2)(A . In fact. with corresponding eigenvector v.1)(A . We can therefore only nd two linearly independent eigenvectors. 2 1 0 3 2 Note now that the matrix =~ ~Jv = 0. The eigenvalue is therefore A = 2. Consider the matrix 3 (i =~ ~J ( ~ ~ :J. 003 To find the eigenvalues of A. It follows that the eigenvalues of the matrix A are given by the entries on the diagonal. (A . with corresponding eigenvector v. In fact. Suppose that A is an eigenvalue of a matrix A. Hence A2 is an eigenvalue of the matrixA2. this is true for all triangular matrices. Then A 2v = A(Av) = A(AV) = A(Av) = A(AV) = A2V. and so the eigenspace corresponding to the eigenvalue 2 is of dimension 2 and so is a plane through the origin. so that IR does not have a basis consisting of linearly independent eigenvectors of the matrix A. Example. 3A in other words.Introduction to Spectral Theory 153 3A det 3 1. An eigenvector corresponding to the eigenvalue 2 is a solution of the system ( 1 J= 0. with corresponding eigenvector v.A 2 2 1 3 4A in other words. vI = ( ~ J and v = (~J. The Diagonalization Problem Example. we need to find the roots of det(l ~A 2 ~ A o 0 : J= 0. Ak is an eigenvalue of the matrix Ak.3) = O. Let us return to Examples are consider again the matrix .
:) respectively. = P Dc respectively.PD)c = 0 for every c E R2. Since the eigenvectors form a basis for uniquely in the form u = civ i + c2v2' where cl' c2 E R. Hence we must have AP = PD.}AU =(. Ifwe write p ~ (~l :) and D ~ (~ ~). Note that c E R2 is arbitrary.2 = 6. so that APc = P Dc. and Write c= (:J. with corresponding eigenvectors respectively.154 Introduction to Spectral Theory A~G !} ne. we conclude that PIAP=D. Suppose further that . This implies that (AP .:) and (. with entries in R. Suppose that A is an nn matrix.u =(.) ~ (~l :)(. Proposition.} Then both can be rewritten as (.J=el :)(~ ~)(.)~(~l :)(~. every u E R2 can be written We have already shown that the matrix A has eigenvalues Al = 2 and 11. Since P is invertible. Note here that P = (vI v2) and D = ( 0 then matrix become u = Pc and Au AI Note also the crucial point that the eigenvectors of A form a basis for 1R 2 • We now consider the problem in general.
. Vn are linearly independent.. where p~(~ ~ ~~) D~(~l ~ n .Introduction to Spectral Theory A has eigenvalues AI' 155 .. it follows that P is invertible.PD)c = 0 for every cERn. Hence PIAP = D as required. Then pIAP=D. Hence we must have AP = PD. + cnvn' where c I ' .. Example. = Pc and Au = P = (AICI = PDc : AnCn J respectively. and that vI' . cn E]Rn ... not necessarily distinct... . so that every u E]Rn can be written uniquely in the form u= and Writing civ i + . vn ERn. we see that both equations can be rewritten as U . with corresponding eigenvectors vI' . Since the columns of P are linearly independent. We have PIAP = D. . Note that C E]Rn is arbitrary.. Consider the matrix A = (~1 ~3 ~~2). they form a basis for ]Rn.. . where Proof Since VI' . .. . o and 9 20 as in Example. An E R. This implies that (AP . vn are linearly independent.. so that APc = PDc.
156
Introduction to Spectral Theory
Example. Consider the matrix
45 28 15 , 30 20 12 as in Example. We have PIAP = D, where
A= P=
(
17 10 5)
(
3 2
1 1 2)
0 3 and D = 3 0
(3 0
0 0 2 0
Definition. Suppose that A is an n x n matrix, with entries in lR. We say that A is diagonalizable ifthere exists an invertible matrix P, with entries in lR, such that PIAP is a diagonal matrix, with entries in lR. It follows from Proposition that an n x n matrix A with entries in lR is diagonalizable if its eigenvectors form a basis for lR n • In the opposite direction, we establish the following result.
Proposition. Suppose that A is an nn matrix, with entries in lR. Suppose further that
A is diagonalizable. Then A has n linearly independent eigenvectors in lRn.
Proof Suppose that A is diagonalizable. Then there exists an invertible matrix P, with
entries in lR, such that D = PIAP is a diagonal matrix, with entries in lR. Denote by vI' ... , vn the columns of P; in other words, write P=(vl· .. vn)· Also write
Clearly we have AP
= PD. It follows that
AI
(Avi ... Avn) = A(vi ... vn) = ( vI'" vn) (
= (AIV I ... Alnvn)· Equating columns, we obtain AVI = Alvl' ... , AVn = AnVn' It follows that A has eigenvalues AI' ... , An E lR, with corresponding eigenvectors vI'
... , Vn E lRn. Since P is invertible and vI' ... , vn are the columns of P, it follows that the eigenvectors vI' ... , vn are l!nearly independent.
IntroductIOn to Spectral Theory
157
In view of Propositions, the question of diagonalizing a matrix A with entries in IR is reduced to one of linearindependence of its eigenvectors.
Proposition. Suppose that A is an n x n matrix, with entries in IR. Suppose further that A has distinct eigenvalues AI' ... , An E IR, with corresponding eigenvectors vI' ... , Vn
E IR n . Then vI' ... , vn are linearly independent. Proof Suppose that vI' ... , vn are linearly dependent. Then there exist c I' ... , cn E IR, not all zero, such that clvl+···+cnvn=O. Then A(CIV I + ... + cnvn) = clAvI + ... + c~vn = Atclv I + ... + Ancnvn = O. Since vI' ... , vn are all eigenvectors and hence nonzero, it follows that at least two numbers among c I ' ... , cn are nonzero, so that c I ' ••• , cn_1 are not all zero. Multiplying by An and subtracting, we obtain (AI  n)c 1VI + ... + (AnI  An)Cn IVnI = O. Note that since AI' ... , An are distinct, the numbers Al  An' ... , AnI  An are all nonzero. It follows that VI' ... , Vn l are linearly dependent. To summarize, we can eliminate one eigenvector and the remaining ones are still linearly dependent. Repeating this argument a finite number of times, we arrive at a linearly dependent set of one eigenvector, clearly an absurdity. We now summarize our discussion in this section.
(l)
Diagonalization Process. Suppose that A is an n x n matrix with entries in lR. Determine whether the n roots of the characteristic polynomial det(A  IJ) are
real. If not, then A is not diagonalizable. If so, then nd the eigenvectors corresponding to these eigenvalues. Determine whether we can find n linearly independent eigenvectors. If not, then A is not diagonalizable. If so, then write
(2)
(3)
. . J,
where AI' ... , An E IR are the eigenvalues ofA and where vI' ... , vn E lR n are respectively their corresponding eigenvectors. Then PIAP = D.
Some Remarks
In all the examples we have discussed, we have chosen matrices A such that the characteristic polynomial det(A IJ) has only real roots. However, there are matrices A where the characteristic polynomial has nonreal roots. Ifwe permit AI' ... , An to take values
158
Introduction to Spectral Theory
in C and permit "eigenvectors" to have entries in <C , then we may be able to "diagonalize" the matrix A, using matrices P and D with entries in <c. The details are similar. Example. Consider the matrix
A
1 1 To find the eigenvalues of A, we need to find the roots of
=
(1 5).
IA 5 = 0; 1 IA in other words, A2 + 4 = O. Clearly there are no real roots, so the matrix A has no
det= ( eigenvalues in matrix
J
1R. Try to show, however, that the matrix A can be "diagonalized" to the
D= (
. 2i We also state without proof the following useful result which will guarantee many examples where the characteristic polynomial has only real roots. Proposition. Suppose that A is an n x n matrix, with entries in 1R. Suppose further thatA is symmetric. Then the characteristic polynomial det(A IJ) has only real roots. We conclude this section by discussing an application of diagonalization. We illustrate this by an example. Example. Consider the matrix
2i
o
0)
A
=
(~~ =~~ ~155J'
30 20 12
as in Example. Suppose that we wish to calculate A98. Note that PIAP
= D, where
P=
1 1 2J (3 3 0 (2 03 30 and D = 0
, v
I
o O. 2 OJ o 2
It follows that A = PDpi, so that
98
A
= (PDPI) ...(PDP 1) = PD 98 p 1 = P
98
0
o
o
This is much simpler than calculating A 98 directly.
Introduction to Spectral Theory
159
An Application to Genetics
In this section, we discuss very briey the problem of autosomal inheritance. Here we consider a set oftwo genes designated by G and g. Each member of the population inherits one from each parent, resulting in possible genotypes GG, Gg and gg. Furthermore, the gene G dominates the gene g, so that in the case of human eye colours, for example, people with genotype GG or Gg have brown eyes while people with genotype gg have blue eyes. It is also believed that each member of the population has equal probability of inheriting one or the other gene from each parent. The table below gives these peobabilities in detail. Here the genotypes of the parents are listed on top, and the genotypes of the ospring are listed on the left.
GGGG GG Gg gg
1
GGGg
GGgg
GgGg
Gggg
gggg

1
0 0
2 1 2
0
1
4 1
1
0
0 0
1 2 Example. Suppose that a plant breeder has a large population consisting of all three genotypes. At regular intervals, each plant he owns is fertilized with a plant known to have genotype GG, and is then disposed of and replaced by one of its osprings. We would like to study the distribution of the three genotypes after n rounds of fertilization and replacements, where n is an arbitrary positive integer. Suppose that GG(n), Gg(n) and gg(n) denote the proportion of each genotype after n rounds offertilization and replacements, and that GG(O), Gg(O) and gg(O) denote the initial proportions. Then clearly we have GG(n) + Gg(n) + gg(n) = 1 for every n = 0, 1,2, ... On the other hand, the left hand half of the table above shows that for every n = 1, 2, 3, ... , we have 1 GG(n) = GG(n  1) + Gg(n  1),
0
0

2 1

1
4

2 1
"2
Gg(n)
= "2 Gg(n  1) + gg(n  1),
1
and
gg(n) = 0,
so that GG(n)J Gg(n) ( gg(n)
112 (1 1 Gg(nl). = 0 112 0J(GG(n 1)J 0 o o gg(nl)
160
Introduction to Spectral Theory
It follows that
( ~~~;J = (~~~;J
An
gg(n) gg(O)
for every n = 1, 2, 3, ... ,
where the matrix
000 has eigenvalues Al = 1, A2 = 0, A3
A=(~ ~;~ ~J
=
112, with respective eigenvectors
p=(~ ~2 ~IJ
We therefore write
p=(~
I
2
~IJ and D=(~
1
~
0 0 o 0 112
OJ ,
with P
=
(I
0
1
2
:J
1 0 0 0 1I2n 0 1 2
Then pI AP = D, so that A = PDpI, and so
An =PDnp
1
=(~o ~2 ~IJ(~ ~ ~ J(~ 0 :J
Il/2n1
11I2n
=
It follows that
0 0
I/2n
l/2n1
0
0
(GG(n))
gg(n)
II/2 n 1I2 Gg(n) = 0
n I1I2 n 1 I/2n1
(GG(O))
Gg(O) gg(O)
0
0
0
.Introduction to Spectral Theory 161 n n\ GG(O) + Gg(O) + gg(O) .gg(O)/2 = Gg(O)/2 n + gg(O)/2 n\ o IGg(O)/2 gg(O)/2 n n\ = Gg(O)/2 n + gg(O)/2 n\ o This means that nearly the whole crop will have genotype GG.Gg(O)/2 .
.e. y) oftwo vectors x = (xl' X2. so II X 11= ~(x..Chapter 6 Inner Product Spaces Inner Product in IR and n en Inner product and norm in IRn. If Z E I 2 2 2 n en is given by .. The dot product in IR3 was defined as x .. we have 1z 12 = x2 + y2 = z Z .x). and we use the notation yT X only to be consistent.. y):= XIYI + XV'2 + . the eigenvalues can be complex.ln =yT X. Y = (Yl' Y2' . . for example in lR the length of the vector is defined as 3 II x 11= V Xl + x2 + x3 . ynl by (X. xnl. that yT X = xT y.. Similarly. Inner product and norm in en. For a complex number z = X + iy. . The complex space en is the most natural space from the point of view of spectral theory: even if one starts from a matrix with real coefficients (or operator on a real vectors space).. In dimensions 2 and 3. Let us now define norm and inner product for en. in IR one can define the inner product (x. to define the norm ofthe vector x ERn I 2 2 2 as IIxll=vxI +X2 + .. and one needs to work in a complex space. + X. It is natural to generalize this formula for all n.. Note. the distance from its endpoint to the origin) by the Pythagorean rule. we defined the length of a vector x (i. . Y = x IY2 + xV'2 + xJY3' where x = (xl' X2' x3)T andy = (YI' Y2' Y3l. +xnThe word norm is used as a fancy replacement to the word length.
namely the inner product given by (z.w)=z. w) by n (z. p. y. but z*w and w*z are the only reasonable choices giving II z 112 = (z. z and all scalars a. x). Let us try to define an inner product on en such that II z 112 = (z. = (x.[:~ ! &~ 2 n 2 2 n 2 k=l" k=l z n . To simplify the notation. x) = (x. so we will use it as well. One of the choices is to define (z. + Z n Wn = z*w. one can write the inner product in en as (z.iYn ' it is natural to define its norm II z II by II z II = L (Xk + Yk) = 2:) zk I . (z. y). w) = w*z. note. z). that for a real matrix A. and then take the complex conjugate of each entry. (x. 3. y) Linearity: (ax + ay. 4. that the above two choices of the inner product are essentially equivalent: the only difference between them is notatioool. While the second choice of the inner product looks more natural.+znwn= I:>kWk> k=1 and that will be our definition of the inner product in en . w)1 = Z I WI + Z 2 w 2 + .WI +z2W2+ . Using the notion of A *. because (z. ~ z) + (y. Remark.. Note. A* = AT . the first one.. w) = w*z is more widely used. The inner product we defined for ]Rn and en satisfies the following properties: 1. 2. Note. Nondegeneracy: (x. z) Nonnegativity: (x. that for a real space.. y) is just symmetry. 0 '\Ix. z). We did not specify what properties we want the inner product to satisfy. let us introduce a new notion. It is easy to see that one can define a different inner product in en such that II z Ib = (z.xn . z). (Conjugate) symmetry: (x. this property = (y. Inner Product Spaces.. z) for all vector x. x) = 0 if and only if x = o.Inner Product Spaces 163 z= ~~ . or simply adjoint A* by A* = A~ meaning that we take the transpose of the matrix. z). For a matrix A let us define its Hermitian adjoint. w)l = (w. .
i.k' k=l Example. A space V together with an inner product on it is called an inner product space. For the space Mm x n of m x n matrices let us define the socalled Frobenius inner product by (A. that trace (B* A) = 2: Aj. = 2:. its trace is defined as the sum of the diagonal entries. JI f(t)g(t)dt. 1 It is easy to check.g) = en. and we do not need the complex conjugate here. y) such that the properties 14 from the previous section are satisfied. Note. Let V be ]Rn or en . that assign to each pair of vectors x. j. Example.k so this inner product coincides with the standard inner product in e mn • .kBj. y) is always real. one defines the norm on it by II x II = ~(x. Given an inner product space. In the real case we only allow polynomials with real coefficients. Define the inner product by (f. Again. y) = y*x = y * x defined above. that we indeed defined an inner product.. it n means the statement is true for both lR and Example. Let us recall. and for a complex space the inner product (x. We already have an inner product (x. it is easy to check that the properties. Note that for a real space V we assume that (x. n trace A = 2:ak.164 Inner Product Spaces Let V be a (complex or real) vector space. B) = trace (B*A). that the above properties 14 are satisfied. Let V be the space Pn of polynomials of degree at most n. When we have somt! statement about the space F n . This definition works both for complex and real cases. denoted by (x. that for a square matrix A. y) can be complex.e.=1 xkYk This inner product is called the standard inner product in ]Rn or en We will use symbol F to denote both e and lR. Y a scalar.k . An inner product on V is a function.x).
so x = O. (x. 2 . y) = 0 we only need to show that implies x = O. this inequality should hold for t = (x. Applying the above lemma to the difference x .y we get the following Corollary. z) \lz E V . y be vectors in an inner product space V. In particular. Theorem.ty) = II x 112  2t(x. II y II· Proof The proofwe are going to present. but will be used a lot Corollary. x) = 0. Suppose two operators A. (CauchySchwarz inequality).y~ 1. = ~ (x. I (x. y) \Ix E X. so we can assume that y O. The equality x = y holds if and only if (x. 0) = O.x) + 0(z.Inner Product Spaces 165 Properties of Inner Product. Let us consider the real case first.y) 2 lIyll . for all scalar t "* o ::. but it gives a lot for the understanding. The following corollary is very simple.x) + 0(z. \ly E V. B : x ~ Y satisfo (Ax. z). The following property relates the norm and the inner product. not only for Fn' To prove them we use only properties 14 of the inner product. is not the shortest one.x) =  a(x.z)  Note also that property 2 implies that for all vectors x (0. z) = (y. the statement is trivial. y) = (Bx. ay + ~z) Indeed.x) = a(y. Let x. that properties 1 and 2 imply that 2 '.ay +0z) = (ayt. First of all let us notice. The statements we get in this section are true for any inner product space.x) = a(y. Putting y = x in) we get(x. Proof Since (0. y) = 0 \ly E V. x . II x becomes ty 112 = (x  ty. Then A = B Proof By the previous corollary (fix x and take all possible y's) we get Ax = Bx' Since this is true for all x E X. y) + t 211 Y 112. x) = (x.0z. Then x = 0 if and only if (x. Lemma. the transformations A and B coincide. y) + i3 (x.y) + 0(x. Let x be a vector in an inner product space V. and for this point the inequality II y II (x. (x. By the properties of an inner product. If y = 0. y) I ~ II x II .
II x II + II y II· Proof II x + y 112 = (x + y.+i if Vis a complex space. y) + (y. al~+aYIl2 4 a=±l.166 Inner Product Spaces which is exactly the inequality we need to prove.y) + (x. L::: .ty) = (x. that the above paragraph is in fact a complete formal proof of the theorem. The other possibility is again to consider o : :. x . x) :::. y) is real. (Triangle inequality). Lemma. In 2dimensional space this lemma relates sides of a parallelogram with its diagonals. where a is a complex constant. Lemma. The following polarization identities allow one to reconstruct the inner product from the norm: Lemma (Polarization identities).II Y 112 which is the inequality we need. It is a wellknown fact from planar geometry.!. For any vectors u. For x. For any vectors x. (Parallelogram Identity). The reasoning before that was only to explain why do we need to pick this particular value of t.ty) .y) II y 112 II Y 112 2 2 I(x.v 112 = 2(11 u 112 + II v 11 2). II x II . we get = (x.y)=4"(llx+ yll 1 2 llxyll ) 2 if V is a real inner product space. An immediate Corollary of the CauchySchwarz Inequality is the following lemma. into this inequality. which explains the name.t(y. and (x.ty) = II x 112 . II y II = (II x II + II y 11)2. v II u + v 112 + II u . Another important property of the norm in an inner product space can be also checked by direct calculation.ty. y in an inner product space II x + y II :::. y E V (x. The lemma is proved by direct computation. x .t(y. and then repeat the proof for the real case. Note. x) Substituting t t (x. I a I = 1 such that (ax.ty 112 = (x . x . x + y) = II x 112 + II y 112 + (x. II x 112 + II y 112 + 211 x II .y)=. One is to replace x by ax. y) + I t 1211 Y 112.y) I o :::. There are several possible ways to treat the complex case. II x .
lI p for p 1I·ll p for P = 2 coincides with the regular norm obtained from the inner * 2 cannot be obtained from an inner product. that the norm 1I. . .lIp'p ~ 2. n}. and we will not present it here. But we claim more! We claim that it is impossible to introduce an inner product which gives rise to the norm 1I. but the proof is not so simple.3 and 4 are very easy to check. because the norm II v II = J(v. II v II for all vectors v and all scalars. It is easy to see that the Parallelogram Identity fails for the norm 1I. However. Triangle inequality: Suppose in a vector space V we assigned to each vector v a number II v II such that above properties 14 are satisfied.Inner Product Spaces 167 NORMS Normed spaces We have proved before that the norm II v II satisfies the following properties: 1. 2 .. For example. Any inner product space is a normed space. and one can easily find a counter example in 1R . The triangle inequality for k . Note. The norm product. given p.11"" (p = 00) by II x 1100 = max{1 xkl: k= 1.+ II xn 1V'lllp ~ [~I r p One can also define the norm 11. nor en by P xk I II x lip ~ (I XI f" + Ix 2 f" + . Nondegeneracy: II v II = 0 if and only if v = o. lip on lR. 1 < p < 00 one can define the norm II . Then we say that the function v ~ II v II is a norm. To check that 1I·lI p is indeed a norm one has to check that it satisfies all the above properties 14.lIp'p ~ 2.. Minkowski. The triangle inequality (property 2) is easy to check for P = 1 and p = 1 (and we proved it for p = 2). which then gives rise to a counter example in all other spaces.2. v) satisfies the above properties 14.. as the theorem below asserts completely characterizes norms obtained from an inner product. the Parallelogram Identity. A vector space V equipped with a norm is called a normed space. In fact. For all other p the triangle inequality is true. It is easy to see that this norm is not obtained from the standard inner product in Rn (en). 3. there are many other normed spaces. 4.. Properties 1. This statement is actually quite easy to prove. 2. after the German mathematician H. II u + v II ~ II u II + II v II· Nonnegativity: II v II ~ 0 for all vectors v. Homogeneity: II v II = II . kp even has special name: its called Minkowski inequality.
vk' k WEE = 1..168 Inner Product Spaces Theorem. if v 1. Lemma.... We say that a vector v is orthogonal to a subspace E if v is orthogonal to all vectors w in E. r. Proof By the definition. Lemma. u + v) = (u. Let VI' v2. Two vectors u and v are called orthogonal (also perpendicular) if (u. . Let E be spanned by vectors VI' v 2. u) + (v.. then we do not have any choice. v) + (u.. Since the vectors vk span E. .. . u) = 0 because of orthogonality). v. that it satisfies alt the properties. r. Yr' Then v ? E if and only if v J. We will write u 1. Definition.. 2.e. We say that subspaces E and F are orthogonal if all vectors in E are orthogonal to F. i.. v) = O. On the other hand.. v) = (v. 2. A system of vectors v I' v2. Note. if it satisfies the Parallelogram Identity II u + v 112 + II u . y) we got from the polarization identities is indeed an inner product. .. vk) = 0 forj k). k = 1. "i/k = 1. The inverse implication is more complicated. . .. all vectors in E are orthogonal to all vectors in F The following lemma shows how to check that a vector is orthogonal to a subspace. II u + v 112 = (u + v. . The proof is straightforward computation.v 112 = 2(11 u 112 + II v 112) ORTHOGONALITY Orthogonal and Orthonormal Bases Definition.. vk. If. . so we do not present it here. this inner product must be given by the polarization identities. we need to show that (x. Definition. u) = II U 112 + II v 112 ((u. vn is called orthogonal if any two vectors are orthogonal to each other (i. (Generalized Pythagorean identity). let v J.2. E then v is orthogonal to all vectors in E. we call the system orthonormal. v to say that the vectors are orthogonal. .e. but the proof is a bit too involved. . that for orthogonal vectors u and v we have the following Pythagorean identity: II u + v 112 = II U 112 + II v 112 if u 1. if(vj . . v) + (v.. any vector can be represented as a linear combination Z=:=lakVk' Then so v 1. in addition Itvk 11= 1 for all k. i. If we are given a norm. In particular. vn be an orthogonal system.e... and this norm came from an inner product. v k . w. Then * . . v E V... A norm in a normed space is obtained from some inner product if and only "i/u. v J. r. But. It is indeed possible to check if the norm satisfies the parallelogram identity.
which gives exactly (Vk. . ..Vk) = I:lak IlIvk II . Then by the ~ o=11 0 II 2 = L. .. As we studied before. so only the trivial linear combination gives O. it always can be added to any orthogonal system.Inner Product Spaces n 2 n 2 2 169 I:CXkVk = I:lcxk Illvk II k=1 k=1 This formula looks particularly simple for orthonormal systems. and let n X = j=1 Taking inner product of both sides of the equation with VI we get aiv i + 2v2 + . Therefore we only need to sum the terms with} = k. Remark. k=1 k=1 Corollary.. to find coordinates of a vector in a basis one needs to solve a linear system. Proof Suppose for some ai' .tCXjVj] = ttakaj(Vk>Vj)' k=1 k=1 j=1 k=1 j=1 Because of orthogonality vk' v) = 0 if} ::j:. tCXkVk 2 = [tCXkVk. In what follows we will usually mean by an orthogonal system an orthogonal system of nonzero vectors. Orthogonal and Orthonormal Bases Definition. Since the zero vector 0 is orthogonal to everything.. I:lcxk I n 2 n 2 2 an we have I::=I CXkVk = O. k=1 0) we conclude that a k = 0 Vk... vn which is also a basis is called an orthogonal (orthonormal) basis. However. . vn is an orthogonal basis. . Vn of nonzero vectors is linearly independent. It is clear that in dim V = n then any orthogonal system of n nonzero vectors is an orthogonal basis. Proof of the Lemma. An orthogonal (orthonormal) system VI' v2... ... suppose VI' v2. where II vk II = 1. for an orthogonal basis finding coordinates of a vector is much easier. ~2' Generalized Pythagorean identity . k. Since II vk II ::j:. ..+ anvn = I:cx jVj' .... Namely.I CXk I2 II Vk II 2 . 0 (vk ::j:. .. Any orthogonal system Vi' v2. but it is really not interesting to consider orthogonal systems with zero vectors.
We will use notation w = PEV for the orthogonal projection. vI) = L(X j(Vj' Vk) = (Xk(Vk' Vk) = ak II Vk II j=1 n 2 so Therefore. 2. v. E.2 · II vIII Similarly. VI) = 0 if} :t= 1). After introducing an object.170 Inner Product Spaces (x. Then we present a method of finding the projection. Vk) (x. it is natural to ask: 1. proving its existence. mUltiplying both sides by v k we get (x. to find coordinates of a vector in an orthogonal basis one does not need to solve a linear system. wEE. The following theorem shows why the orthogonal projection is important and also proves that it is unique. For a vector V its orthogonal projection P EV onto the subspace E is a vector w such that 1.. VI) = I:>~ j j=1 n (Vj . 3. one can introduce the following definition. Orthogonal Projection and GramSchmidt Orthogonalization Recalling the definition of orthogonal projection from the classical planar (2dimensional) geometry. This formula is especially simple for orthonormal bases. .W 1. Is the object unique? How does one find it? We will show first that the projection is unique. so (XI = . Let E be a subspace of an inner product space V. VI) = al(v l . VI) = alii VI 112 (all inner products (Vj . the coordinates are determined by the formula. when II V k II = 1. Definition. Does the object exist? 2.
w II ~ II v .. It is easy to see now from formula that the orthogonal projection P E is a linear transformation.Inner Product Spaces 171 Theorem. Then =w vx=vw+wx=vw+~ Since v . Proof of Proposition. if x = w. vr form an orthogonal basis in E. Proposition. x.w 112 + II y 112 ~ II v . none can get from the ) above formula the matrix of the orthogonal projection PE onto E in en (lR.L E we have y . i.x II· Moreover. Remark.e.. II vk II ~ (v.. Indeed. i. Let . n is given by Remark.vk) PEV= L.J2vk · k=III vk II Note that the formula for k coincides with. Proof Let y w II = II v .. .(aPeX .~P sY) is orthogonal to any vector in E..PE! + ~PsY. so by the definition PE(ax + ~y) = a. One can also see linearity of P E directly.e. . from the definition and uniqueness of the orthogonal projection. .x 112 = II v .... en and lR. iffor some x E E = PEV minimizes the distance from v to E. The orthogonal projection w for all x E E II v .wand so by Pythagorean Theorem II v .x II.k = In other words k=I (V'Vk~.L v .. Let VI' v2. Then the orthogonal projection P EV of a vector v is given by the formula r PEv= LQ. Note that equality happens only ify = 0 i. this formula for an orthogonal system (not a basis) gives us a projection onto its span. II v then x = v..w 112.w . Recalling the definition of inner product in r 1 2 PE = L k=ll1 Vk II * VkVk where columns VI' V 2' .kVk' where Q.e. it is easy to check that for any x and y the vector ax + ~y .. vr be an orthogonal basis in E. The following proposition shows how to find an orthogonal projection if we know an orthogonal basis in E.
" xn} = span {vI' V2. In particular. v n }· Moreover.. r (v .. . ._ _ ~ (Xr+I.X2 =X2 (x2' VI) 2 VI' II VI II (x3' VI) (X3' V2) Define E2 = span {vI' v 2}· Note that span {xl'x 2} = E 2.I Vk II .. Suppose that we already made r steps of the process.172 Inner Product Spaces W:L:>~kVk' k=I r where Olk= (V'Vk~. x n . v k)  L Ol /Vj' vk) r j=I (V. Define v3 by V3 := X3 .. . Step 2. .. . vk) . GramSchmidt Orthogonalization Algorithm..vk)=2 I1vk II =0. x r }.PEr Xr+I .W . v r } Now let us describe the algorithm. .. Step 1. since any system consisting of one vector is an orthogonal system. v3 }· Note that span {xl' X2.PE2 X3 = X3  2 VI  2 V2 II vIII IIv211 Put E3 := span {VI' V2. x r } = span {vI' V2. By Lemma it is sucient to show that v = 1. .. . Put VI :=x I ' Denote by EI := span{x I } = span{v I}. .xr+I . for all r $ n we get span {xl' x 2. . we know how to perform orthogonal projection onto onedimensional spaces. Define v2 by V2 =X2 PE. there exists a simple algorithm allowing one to get an orthogonal basis from a basis. . IIvk II SO... . constructing an orthogonal system (consisting of nonzero vectors) VI' v 2. X3 } v3 =1= = E 3. II vk II W We want to show that v . 2. But how do we find an orthogonal projection if we are only given a basis in E? Fortunately. vr such that Er := span{vl' v2' .. vr } = span{x I. x 2... if we know an orthogonal basis in E we can find the orthogonal projection onto E. . n..w. 2.. Computing the inner product we get for k = 1.6 I 2 Vk k=I .Xr+I .Vk) vr+I ..... .L vk' k = (v.. vk) = (v. Note also thatx3 ~ E2 so O. . .. Step r + 1.. .vk)ak(vk.. vn such that span{xl' x 2. The GramSchmidt method constructs from this system an orthogonal system vI' v 2' . v k)  (w. . .Vk) 2 =(v.. Suppose we have a linearly independent system x I' X2. Define . Step 3.1 E.
vn . 2l.) ~ [[ ~Hl]l ~ 3. . Suppose we are given vectors xI = (1. and we want to orthogonalize it by GramSchmidt. X3 = (1. Continuing this algorithm we get an orthogonal system vI' v2.. In particular. and the formulas will look simpler. ll.ll. we get Finally. Since the multiplication by a scalar does not change the orthogonality. ll· On the second step we get (x2' vI) v2 = x2 PE1X 2 = x2  2 VI· II vI II Computing (x2' v. . that xr+I e Er so vr+I '* 0.11 ~ 112 = 3. I. . On the other hand.Inner Product Spaces 173 Note. Thus one may want to replace the vector v 3 from the above example by (1. when performing the computations one may want to avoid fractional entries by multiplying a vector by the least common denominator of its entries. define v3 = x3  PE2 X3 = x3  (x3' vI) II vIII 2 vI  (x3' v2) II v211 2 v2· Computing (II VI 112 was already computed before) we get vJ =[g]~[l]M 1]= I 1 2 Remark.0. one can multiply vectors vk obtained by GramSchmidt by any nonzero numbers. 2l.1. 1. X2 = (0. in many theoretical constructions one normalizes vectors vk by dividing them by their respective norms II vk II. On the first step define vI =x l = (1. 2.. An example. Then the resulting system will be orthonormal.
. By the definition of orthogonal projection any vector in an inner product space V admits a unique representation v = VI + V 2' VI E E. Definition. So. If x. V2 E E1.b II is equivalent to minimizing 2 Ax =b R an A..L. then there is no solution. the right side b belongs to the column space Ran A.. is a subspace.L E.L E}. V2 . ....e.jxj bk k=I j=I m n i. Decomposition V = E $ E. := {x:x .L E (eqv. But. E1. the system is consistent and we have exact solution. Otherwise. The simplest idea is to write down the error IIAxb II and try to find x minimizing it.. and equation is consistent. if we obtained the equation from an experiment. because if there is no solution. The term least square arises from the fact that minimizing II Ax . so it is possible that an equation that in theory should be consistent. does not have a solution. what one can do in this situation? Least square solution.L E then for any linear combination ax + ~y .. Proposition. The following proposition gives an important property ofthe orthogonal complement. situations when we want to solve an equation that does not have a solution can appear naturally. which mean exactly that any vector admits the unique decomposition above.. But. But what do we do to solve an equation that II Axb II = 2) (AX)k bk I = k=I 2 m 2 L: L:Ak. Y .... For a subspace E its orthogonal complement E? is the set of all vectors orthogonal to E. for example. For a subspace E Least Square Solution The equation has a solution if and only if b E does not have a solution? This seems to be a silly question. Ifwe can find x such that the error is 0.) . 174 Inner Product Spaces Orthogonal Complement.. Ifwe do not have any errors. to minimizing the sum of squares of linear functions. (where clearly VI = PEV)' This statement is often symbolically written as V = E $ E1. Therefore E1. we get the socalled least square solution. in real life it is impossible to avoid errors in measurements.
Geometric approach.• . ifx is a solution of the normal equation A * Ax = A * b (i. Then we can just take partial derivatives with respect to Xj and find the where all of them are 0. Namely. A solution of this equation gives us the least square solution of Ax = b. So. As we already discussed above. Formula for the orthogonal projection. to find the least square solution we simply need to solve the equation Ax = PRanAb. which can be computationally intensive. the problem is solved. which gives us minimum. then the condition Ax 1. . . n. If the operator A *A is invertible. . which in tum is equivalent to the socalled normal equation A * Ax =A * b. . then Ax = PRanAb. Ifwe are in ~n .(b. vn in Ran A.b II is exactly the distance from b to Ran A. Fortunately. .• 11 2vk k=I 11 vk Ifwe only know a basis in Ran A. together we get that these equations are equivalent to A*(b Ax) = 0. where PRanA stands for the orthogonal projection onto the column space Ran A. Ran A can be rewritten as b Ax 1. that the least square solution is unique if and only if A * A is invertible. . so minimum of II Ax . Therefore the value of II Ax .. we need to use the GramSchmidt orthogonalization to obtain an orthogonal basis from it. Ax is the orthogonal projection P Ra~b if and only if b Ax 1.and everything is real. Ran A (Ax E Ran A for all x). but the solution is not very simple: it involves GramSchmidt orthogonalization. and then mUltiply the solution by A.b II is minimal if and only if Ax = PRan Ab. then Ax gives us all possible vectors in Ran A. So.. a least square solution of Ax = b)... an are columns of A. there is a simpler way offinding the minimum. the solution of the normal equation A *Ax = A *b is given by x = (A*ArIA*b. there exists a simpler solution. so the orthogonal projection P~b can be computed as . So.2. theoretically. Note. n. Joining rows a..Ax) Vk = 1... = a.2. ak) '1k = 1.. Namely. If aI' a2. . we can find vector P Ra~b by the formula ~ (b.e. we can forget about absolute values. Ifwe know an orthogonal basis vI' v2. Normal equation.vk) PRanAb= w.Inner Product Spaces 175 There are several ways to find the least square solution. ak' That means 0= (b Ax. if we take all possible vectors x. However. to find the orthogonal projection of b onto the column space Ran A we need to solve the normal equation A *Ax = A *b.
Suppose that we know that two quantities x and yare related by the law Y = a + bx. all the points (xk' Yk) should be on a straight line.176 Inner Product Spaces Since this is true for all b. xk andYk are some fixed numbers. it usually does not happen: the point are usually close to some line. according to the rank theorem KerA = {O} if and only rank A is n. x). and the unknowns are a and b). . For an m x n matrix A KerA = Ker(A*A). One of the inclusion is trivial..2). Therefore Ker(A *A) = {O} if and only if rank A = n. minimizing this error is exactly finding the least square solution of the system :. 1). . Let us introduce a few examples where the least square solution appears naturally. . The following theorem implies that for an m x n matrix A the matrix A *A is invertible if and only if rank A = n. (2. Ax) = (A *Ax. line fitting. 1). If it is possible to find such a and b we are lucky. . Theorem. it is invertible if and only if rank A = n. for the other one use the fact that 1/ Ax 1/ 2 = (Ax.4). the coefficients a and b should satisfy the equations a + bxk = Yk' k = 1. Example.j 1 '1 1 xn [bl ~ J~l Yn (recall. To prove the equality Ker A = Ker (A *A) one needs to prove two inclusions Ker(A *A) KerA and KerA Ker(A *A). Suppose we run the experiment n times. (3. and the unknowns are a and b). n (note that here. Ideally. and we would like to find them from experimental data. (1. and we get n pairs (xk' Yk)' k = 1. the standard thing to do.. Since the matrix A *A is square. Then we need to find the least square solution of .Yk k=l n 2 1 • But. PRa"A =A(A*ArIA* is the formula for the matrix of the orthogonal projection onto Ran A. but not exactly on it. 2. Indeed. (0. Example.. that xkYk are some given numbers. is to minimize the total quadratic error 2:1 a+bxk . Suppose our data (xk' Yk) consist of pairs (2. n.. 1). 2. The coefficients a and b are unknown. If not. That is where the least square solution helps! Ideally. but because of errors in measurements.
as well as to surfaces in higher dimensions. . It can also be applied to more general curves.. where unknowns are the parameters you want to find..1I2x. Curves and Planes. c should satisfy the equations a + bXk + ex.. suppose we know that the relation between x and y is given by the quadratic law y = a + bx + cx2' so we want to fit a parabola y = a + bx + cx 2 to the data. b = 112. Write these equations as a linear system. The least square method is not limited to the line fitting. . in matrix form . For example.2. k = 1. = yk. b. An example: curve fitting. Then our unknowns a. y = 2 . . 2. Note. that the system need not to be consistent (and usually is not). Find the least square solution of the system. The general algorithm is as follows: 1. . Find the equations that your data should satisfy if there is exact fit. The solution of this equation is a so the best fitting straight line is = 2. The only constraint here is that the parameters we want to find be involved linearly.. 3. n or. Examples.Inner Product Spaces 177 4 1 0 2 3 Then 1 2 1 1 1 1 1) 1 0 1 0 2 3 1 1 2 1 3 [~] = 2 1 I 1 A*A=(_i and =(~ 1~) 4 1 1 1 A*b=(_i 1 0 2 so the normal equation A *Ax j) =(~) 2 I 1 I = A *b is rewritten as (~ ?8)(~) =(~).
or.2. The equations we should have in the case of exact fit are a + bXk + cYk = zk' k = 1. Plane fitting.. C = 43/154.. Y = 86/77 . b = 62/77. As another example. for the data from the previous example we need to find the least square solution of 1 2 4 4 1 1 2 1 0 1 1 2 1 1 3 9 1 Then 1 2 4 2 1 1 1 1 1 1 0 2 3 1 o 0 18 26 18 26 114 1 0 4 9 1 2 4 1 3 9 and 4 1 1 1 2 1 0 2 1 1 0 4 1 1 Therefore the normal equation A *Ax = A*b is H~]= <A*A=H r =p IS] A*b=H i]= = =[3H [ 18 ~ 26 114 [~l 1~ i~][~l 31 C which has the unique solution a = 86/77.178 Inner Product Spaces : :. Therefore. in the matrix form . n.'" n. let us fit a plane z = a + bx + cy to the data (xk'Yk' zk) E E]R3.?l 1 X n X n Yn For example. .2.62 x/77 + 43 x2/154 is the best fitting parabola. k = 1. . :~ [~]=[.
y. then B = A *.. . vn in Vand B = w l . 'dy E W. The above main identity (Ax. y) = (x. Since it is true for all y. Namely. Uniqueness of the adjoint. we define the operator A * by defining its matrix [A *lAB as . The above main identity (Ax. let us introduce some useful formulas.I Before proving this identity. y) = y*Ax = (A*y)*x = (x. Why does such an operator exists? We can simply construct it: consider orthonormal bases A = VI' v2.y) = (x..wm in W. the linear transformations. y) = (x. . A *y) 'dx E V.A *y) = (x. Let us first notice that the adjoint operator is unique: if a matrix B satisfies (Ax. y) = (x. Adjoint transformation in abstract setting. by the definition of A* (x.. the identity (AB)* = B*A* holds for the adjoint. and the middle one follows from the fact that (Ax) = x(A) = xA. Fundamental Subspaces Revisited Adjoint matrices and adjoint operators.. b. Let as recall that for an m x n matrix A its Hermitian adjoint (or simply adjoint) A* is defined by A * : = AT . Also. Indeed.w2' .. The following identity is the main property of adjoint matrix: I(Ax. where A : V ~ W is an operator acting from one inner product space to another. to find the best fitting plane. By) 'd x and therefore by Corollary A*y = By. since (AT l = A and z = ~. we are ready to prove the main identity: (Ax. and therefore the matrices A and B coincide. Let us recall that for transposed matrices we have the identity (AB)T = BT AT. A *y) can be used to define the adjoint operator in abstract setting. If [AlBA is the matrix of A with respect to these bases. Now.Inner Product Spaces 179 So. c). the first and the last equalities here follow from the definition of inner product in Fn . the matrix A * is obtained from the transposed matrix AT by taking complex conjugate of each entry. A *y) is often used as the definition of the adjoint operator.y E V. . By) 'd x. we need to find the best square solution ofthis system (the unknowns are a. In other words. (A*) =A* = A. Since for complex numbers z and W we have zw = .z.. y) = (x. we define A* : W ~ Vto be the operator satisfying (Ax. A*y).A * y)\ix.
.if and only if x . So.180 Inner Product Spaces [A*]AB = ([A]BA)*' Useful form ulas. ... Ker A = (Ran A)I. By the definition of the inner product in F n . 5. ak) E (Ran A)I.e. the above n equalities are equivalent to the equation A*x = o.. l. Finally. Ran A 2. 4. an be the columns of A. 2. (AB) = B*A *.. is the row number k of A*. 4. 2. n. (x. Ran A = (Ker A). Note. Ax) = (A *y. = (Ker A)L. The general case can be always reduced to this one by picking orthonormal bases in Vand W. that since for a subspace E we have (E L ) L = E.e.2. (aA)* = (iA*. Similarly. x). ak) = A. In the "matrix" proof. Proof First of all. (y. Let aI' a 2. (A*)* =A. . . Below we present the properties of the adjoint operators (matrices) we will use a lot.. that x (i. let us notice. we proved that x E (Ran A) L if and only ifA *x = 0. Since A. thatA : F" ~ P. .. for the same reason.. . Relation Between Fundamental Subspaces Theorem./k = 1. or "coordinatefree" one.L ak = 0) \. . i. 3. Ker A = (Ran A)L. Then ~ W be an operator acting from one inner product space to l. we assume that A is an m x n matrix. We will present 2 proofs of this statement: a "matrix" proof. (A + B) = A* + B*. to prove the theorem we only need to prove statement 1. and that is exactly the statement 1. 3. the statements 1 and 3 are equivalent.. the statements 2 and 4 are equivalent as well. that means o = (x.l. n. and considering the matrix of A in this bases. Let A : V another. .. = 1. statement 2 is exactly statement 1 applied to the operator A * (here we use the fact that (A*)* =A) So. and an "invariant".
y).y).e. Theorem.y E X. Since (x. i.L if and only if A *x = 0.D 1 ~ ex II Ux+exUy II 2 4 a=±l. that (x. So we proved that x E (Ran A).. . if Xis a complex space (Ux. An operator U: x 7 Y is called an isometry. the last identity is equivalent to (A *x. Proof The proof uses the polarization identities.. For example. which is exactly the statement 1 of the theorem.Inner Product ::''paces 181 E Now. Ux II = II x II Vx E X.. The above theorem makes the structure of the operator A and the geometry of fundamental subspaces much more transparent. An operator U : x 7 Y is an isometry if and only if it preserves the inner product. tland only if (x. if it preserves the norm.. for a real space x 1 2 (V"U y ) = "4(11 Ux+Uy II II UxUy II ) Uy II 2 2 = "4 (II U (x + y) II II Ux  1 2 ) I 2 2 ="4(1l x +YII llxyll )=(x. Ay) (Ran A). Unitary and Orthogonal Matrices Main definitions Definition.. Uy) Vx. The following theorem shows that an isometry preserves the inner product. Isometrices and Unitary Operators. y) = 0 Vy.±i Similarly. let us present the "coordinatefree" proof.±i exllx+ exyI1 2=(x.±i =! =! L L ex IIU(x+exUy) 112 4 a=±I.e. y). Ay) = (A *x. 4 a=±l.L means = 0 Vy . The inclusion x that x is orthogonal to all vectors of the form AY' i. y) = (Ux.U y ) II =. It follows from this theorem that the operator A can be represented as a composition of orthogonal projection onto Ran A * and an isomorphism from Ran A * to Ran A.. An operator U : X 7 Y is an isometry if and only if U*U = I. and by Lemma this happens if and only if Ax = O. . Lemma.
a unitary matrix is a matrix of a unitary operator acting in Fn' A unitary matrix with real entries is called an orthogonal matrix.e. vn is an orthonormal basis. This statement can be checked directly by computing the product V*V. then Uv l . Since it is true for all x E X. A product of unitary operators is a unitary operator as well. On the other hand.. i. VVi'Vv 2.) coso. The above lemma implies that an isometry is always left invertible (V* being a left inverse). if V*V::I: I. ""Vvn is an orthonormal basis. 3. y) = (Ux. An isometry V: X ~ Y is a unitary operator if and only if dirnX = dim Y. are orthogonal to each other. It is easy to check that the columns of the rotation matrix Sino. First of all. For a unitary transformation V.I = V*. Since all entries of the rotation matrix are real. Proof Since V is an isometry. In other words. the rotation matrix is an isometry. On the other hand. it is left invertible. and that each column has norm 1... Definition. If V is unitary. then for all x E X (V*Vx. y) = (Vx' Vy ) and therefore Vis an isometry. y) V x. and vi' v2. that a matrix V is an isometry if and only if its columns form 4. If V is a isometry. Vy) = (V*Vx. U.182 Inner Product Spaces Proof By the definitions of the isometry and of the adjoint operator (x. V* = UI is also unitary. y) = (x. it is invertible (a left invertible square matrix is invertible). dim X = dim c?sa ( sma . Examples.. . isomorphic spaces have equal dimensions).. Proposition. Therefore. we have (x. Therefore. let us notice. An isometry V: X ~ Y is called a unitary operator if it is invertible.Vv2. we have V*V = I. and since it is square. A square matrix V is called unitary if V*V = I. and therefore by Corollary V*Vx = x. Moreover. an orthonormal system. . Let x and Y be inner products paces. if V: x ~ Y is invertible. if V is unitary. . if V is an isometry. it is unitary. The next example is more abstract. an orthogonal matrix is a matrix of a unitary operator acting in the real space R n • Few properties of unitary operators: I. Vvn is an orthonormal system. y) VY E X. 2. it is an orthogonal matrix. Y E X. . and since dim X = dim Y . dim X = dim Y (only quare matrices are invertible.
any fWO unitary equivalent matrices are similar as well.. . then I 'A I = 1 Remark.l = if. that for an orthogonal matrix..e.. an eigenvalue (unlike the determinant) does not have to be real. so the vectors Ue k are eigenvectors of A.ff II X 112 = I cl12 + I c21 + . Then BUx = UAU*Ux = UAx = U(Ax) = 'AUx~ i. The following proposition gives a way to construct a counterexample. I det U I = 1. Since for a unitary U we have U.Inner Product Spaces 183 Y= n. 2. Ue n is an orthonormal basis. un is an . Let now a has an orthogonal basis up u2. . the system Ue l . which are not unitarily equivalent. un of eigenvectors.. Statement 1 is proved. If 'A is an eigenvalue of U. we can always assume that the system u 1l u2' .... Our old friend. it is easy to construct a pair of similar matrices..x. II Ux II = II X II for all x E X. Xn and yp Y2' . i.. Proof LetA = UBU* and let Ax = Ax. . Yn be orthonormal bases inXand Yrespectively... A matrix a is unitarily equivalent to a diagonal one if and only if it has an orthogonal (orthonormal) basis of eigenvectors. letD = UAU*. 11l2=II~Ckxk 112= ~ICk one can conclude that 2 1 . Dividing each vector Uk by its norm if necessary. . Proof Let det U = z. . Ux is an eigeri'vector of B. .Ue2. and let xl' x 2. II = II Ax II = I 'A I . Then I. + C. 2. Define an operator U: X ~ Y by Uxk = Yk' k = 1. for an orthogonal matrix det U = ±I. . Note. Let U be a unitary matrix..e. Proposition. Since for a vector x = Cl xl + C(2 + . II x II. + I Cn I and II Ux 112 = u[~c. so Unitary Equivalent Operators Definition. so I det U I = I z I = 1. . In particular. Since U is unitary. To prove statement 2 let us notice that if Ux = Ax then II Ux I A I = 1. Since det (U) = det(U) .. The converse is not true... The vectors ek of the standard basis are eigenvectors of D.. . Properties of Unitary Operators Proposition. . So. n. letA be unitarily equivalent to a diagonal matrix D... the rotation matrix gives an example. so U is a unitary operator. Operators (matrices) a and b are called unitarily equivalent if there exists a unitary operator U such that A = UBU)*. we have I z 12 = z z = det (U* U) = det I = 1.
The complex euclidean inner product of u and v is dened by u. We begin by giving a reminder of the basics of complex vector spaces or vector spaces over Definition. V E V.vI 12 + . + 1 12)112. we have (a + b)u = au + bu... known as vectors.. v.184 Inner Product Spaces orthonormal basis...v " = (I u l . we have c(u + v) = cu + CV.. Remark. together with vector addition + and mUltiplication of vectors by elements of and satisfying the following properties: (VAl) For every u.I and since U is unitary. we have u + V= V + u. We shall first generalize e. . A = UDU*.. (VA2) For every u. D is a diagonal matrix. un)' where u I ' . (SM3) For every a. (SM2) For every C E and u.. there exists u E V such that u + (u) = O. norm and distance. . The standard change of coordinate formula implies A = [A]ss = [1]SB [A]BB [1]BS = UDU. un and the complex euclidean distance between u and v is dened by d(u. . . . Proposition.. COMPLEX INNER PRODUCTS Our task in this section is to dene a suitable complex inner product. . Definition. Suppose that u = (up . we have u + (v + w) = (u + v) + w. rst developed for ~n in Chapter 9. un) and v = (vI' . v = U I VI + . we have (ab)u = a(bu). we have u + v E V. (SMS) For every u E V.vn 12)112. Corresponding to Proposition. (SM4) For every a. V E V. WEen and CE C. we have u + 0 = 0 + u= u. (VAS) For every u. v) = " u . v E V.. un E e. we have lu = u.. U is unitary. un' Since the columns form an orthonormal basis.. we have the following result. the complex euclidean norm of u is dened by "u" = (u u)1I2 = (I u I I2 + . e.. Then . Let D be the matrix of A in the basis B = u I ' u2. Suppose that u. b E C and u E V. un' Clearly. WE V. + Un Vn . vn) are vectors in e. (VA4) For every u E V. Subspaces of complex vector spaces can be dened in a similar way as for real vector spaces.. (SMl) For every C E and u E V. A complex vector space V is a set of objects. e e the concept of dot product. we have cu E V. (VA3) There exists an element 0 E V such that for every u E V.. v. Denote by U the matrix with columns u I ' u2' .. An example of a complex vector space is the euclidean space Cn consisting of all vectors of the form u = (u I ' ••• . + 1un . b E C and u E V.
v) = IIu . Definition. V E V. vi = hv. In particular. v= v. The following definition is motivated by Proposition. we have hu.u)1I2 .vii. Then the matrix I A=A is called the conjugate transpose of the matrix A.w). By a complex inner product on V. Using this inner product. v E V and c E C. (IP4) For every u E V. Unitary Matrices For matrices with real entries. (lP3) For every u.v+w)=(u. u = 0 if and only if u = O. u ~ = (c u) v. and that c E C. the analogous roles are played by unitary matrices and hermitian matrices respectively. the Gram Schmidt orthogonalization process.Inner Product Spaces (a) u. WE V. Then the norm of u is defined by II u 1I=(u. v) =(cu. Suppose that u and v are vectors in a complex inner product space V . c(u. A complex vector space with an inner product is called a complex inner product space or a unitary space. Definition. we can discuss orthogonality. Then (a) (A *)* = A. orthogonal matrices and symmetric matrices play an important role in the orthogonal diagonalization problem. (lP2)Foreveryu. in a similar way as for real inner product spaces. as well as orthogonal projections. and the distance between u and v is defined by d(u. orthogonal and orthonormal bases. . For matrices with complex entries. Proposition.u) = 0 if and only if u = O. Definition. we mean a function (.v)+(u. (c) c(u v) (d) u. ui. and u . Definition.u) ~ 0. v.u. we have (u. we have chu. the results in Sections can be generalized to the case of complex inner product spaces. Suppose that A and B are matrices with complex entries.wehave (u. (b) (A + B)* =A* + B*. Suppose that A is a matrix with complex entries.v). ) : V x V ~ C which satises the following conditions: (lPl) For every u. and 0. (b) u (v + w) 185 = (u v) + (u w). and (u. Suppose further that the matrix A is obtained from the matrix A by replacing each entry ofA by its complex conjugate. Suppose that V is a complex vector space.
we would like to determine which matrices are unitarily diagonalizable. The most natural extension to the complex case is the following. we have the following results. Unitary Diagonalization Corresponding to the orthogonal disgonalization problem in Section 10. corresponding to distinct eigenvalues Al and ~ respectively. we study the question of eigenvalues and eigenvectors of a given matrix. Remark. we have the following result. Definition. eigenvectors of a normal matrix corresponding to distinct eigenvalues are orthogonal. Suppose that u l and u2 are eigenvectors ofa normal matrix A with complex entries.3. Corresponding to Propositions. there are unitarily diagonalizable matrices that are not hermitian. As before. Suppose that A is an n x n matrix with complex entries. We have indicated that a square matrix with real entries is orthogonally diagonalizable if and only if it is symmetric. First of all. Then (a) (b) A is unitary if and only if the row vectors of A form an orthonormal basis of en under the complex euclidean inner product. and (d) (AB)* = B*A*. Proposition. The explanation is provided by the following. Then u l u2 = O. A square matrix A with complex entries is said to be hermitian if A = A. Suppose that A is an n n matrix with complex entries. Corresponding to Proposition. Definition. A square matrix A with complex entries and satisfying the condition AI = A * is said to be a unitary matrix. A is unitary if and only if the column vectors of A form an orthonormal basis of en under the complex euclidean inner product. Definition. these are dened as for the real case without any change. While it is true that every hermitian matrix is unitarily diagonalizable. (c) (cA*) Definition. we now discuss the following unitary diagonalization problem. Note that every hermitian matrix is normal and every unitary matrix is normal. A square matrix A with complex entries is said to be unitarily diagonalizable if there exists a unitary matrix P with comp lex entries such that pIAP = P *AP is a diagonal matrix with complex entries. . Then it is unitarily diagonalizable if and only if it is normal. Unfortunately. Proposition. A square matrix A with complex entries is said to be normal if AA * = A *A. For those" that are.186 Inner Product Spaces = cA*. We can now follow the procedure below. In other words. Proposition. it is not true that a square matrix with complex entries is unitarily diagonalizable if and only if it is hermitian. we then need to discuss how we may nd a unitary matrix P to carry out the diagonalization.
Since v*Av and v*v are 1 xl.. write and D = AI (3) P = (w 1 . To show that A is real. An E e are the eigenvalues of A and where wI' .. Normalize the orthogonal eigenvectors vI' . .. 'Wn E en are respectively their orthogonalized and normalized eigenvectors. (2) Apply the GramSchmidt orthogonalization process to the eigenvectors u l . Proposition. . Suppose that A is a normal n n matrix with complex entries. it suces to show that the 1 x 1 matrices v*Av and v*v both have real entries.. .. un of A corresponding to these eigenvalues as in the Diagonalization process. These form an orthonormal basis of en. Then all the eigenvalues of A are real.. It follows that both v*Av and v*v are hermitian. un to obtain orthogonal eigenvectors vI' . Vn of A. Furthermore.. b] ~ lR. APPLICATIONS OF REAL INNER PRODUCT SPACES Least Squares Approximation Given a continuous functionf: [a.. Suppose that A is a hermitian matrix. with corresponding eigenvector v. . Suppose further that is an eigenvalue of A. . We conclude this chapter by discussing the following important result which implies Proposition. .. Then P*AP = D. vn to obtain orthonormal eigenvectors wI' . Multiplying on the left by the conjugate transpose v* of v. and n linearly independent eigenvectors uI ' . An where AI' . Now (v*Av)* = v*A*(v*)* = v*Av and (v*v)* = v*(v*)* = v*v. we wish to approximatefby a polynomial . Proof Suppose that A is a hermitian matrix.Inner Product Spaces 187 Unitary Diagonalization Process.. that all the eigenvalues of a symmetric real matrix are real. Then Av = AV.. it follows that they are real.. Wn ) ( J... noting that eigenvectors corresponding to distinct eigenvalues are already orthogonal. (1) Determine the n complex roots 11"'" In of the characteristic polynomial det(AII).. . Wn of A. we obtain v*Av = V*AV = AV* v.. It is easy to prove that hermitian matrices must have real entries on the main diagonal.
g) = f: f(x)g(x)dx. In other words. so that II u . b] be the collection of all polynomials g : [a. The purpose of this section is to study this problem using the theory of real inner product spaces.. Given any u E V. 2 2 Then f b a I f(x)g(x) dx =(f .W) + (projwu . Suppose that {l'o' vI' ..W 112 ~ 0: The result follows immediately.g Ii.projwU 112 + II projwU . b] of all continuous real valued functions on the closed interval [a.L and proj Wu WE W. the distance from u to any W E W is minimized by the choice w = projwU.W 112. b].W 112 = II(u .188 Inner Product Spaces g: [a. vk } is an orthogonal basis of W= P k [a. we have . Let V denote the vector space qa. Now let W = Pk [a. the inequality II u projwU II ~ II u w II holds for every w E W. Proof Note that u projwU E W. b] ~ lR with real coecients and of degree at most k. b].proj wU 112 = II proj wU .f . It is easy to show that W is a subspace of V. . and that W is a finitedimensional subspace of V.projwu .w)1I 2 = II u .g.gil· It follows that the least squares approximation problem is reduced to one of nding a suitable polynomial g to minimize the norm IIf. we conclude that g= projwf gives the best least squares approximation among polynomials in W = P k [a. Note that W is essentially Pk' although the variable is restricted to the closed interval [a. b] ~ lR of degree at most k. Our argument is underpinned by the following simple result in the theory. with inner product (f. such that the error f b a I f(x)  g(x) dx 2 is minimized.g) =11 f . Proposition. b]. bJ. Then by Proposition. projwU can be thought of as the vector in W closest to u.W 112 II u . the orthogonal projection of u on the subspace W. It follows from Pythagoras's theorem that II u . Suppose that V is a real inner product space. This subspace is of dimension k + 1. Alternatively. In view of Proposition IIA.
and take X g = IIlIT + II 1 (ex. 1) Now so that .2]. 1]. 1].X1I2)( X _11211 2 X "2 . + 2 (f. 2].g) = f: f(x)g(x)dx.. We now apply the GramSchmidt orthogonalization process to this basis to obtain an orthogonal basis {I. Suppose that we wish to nd a least squares approximation by a polynomial of degree at most 1.vI) (f. and W = PI [0.xI) = 1 o 2 (xI) dx=.I)= while f:idx=~ and 111112= f:dx=2.Inner Product Spaces 189 2 VO g _ (f. 1]. we can take V = C[O. and W = PI [0.. we can take V = qo. 3 3 Example. In this case. 111112 IIxIII (i. x . Suppose that we wish to find a least squares approximation by a polynomial of degree at most 1. 2 2 2 (x . with basisfI' xg. with inner product Example. 2 2 3 It follows that 4 2 g=+2(xI)=2x.I) li.vo) II Vo II + .XI) = f: x (xI)dx = ~ and II x_II1 = (xI. x}.. In this case.VI + . We now apply the GramSchmidt orthogonalization process to this basis to obtain an orthogonal basis {I.g) = f~f(x)g(x)d I x.1I2} of W. Consider the functionj{x) = x 2 in the II Vk II 2 Vk • (f. and take li. with inner product (f. xI} of W.Vk) II VI II interval [0.xI) g= \ 1+ \ 2 (xI).2]. Consider the functionf(x) = eX in the interval [0. with basis {I.I) (e .
!. g = (el)+(l86e)(x.. It follows that the least squares approximation problem posed here has a unique solution. .) =(l86e)x+(4e1O).( XHdx= I~ 2. + 3x. It can be written in the form 5x~ +6x1x. ifi> j. ifi> j. +7xi(XIX'{~ ~tJ Example. .dx= 1 and It follows that ~xMI' +f.j = 1. Quadratic Form A real quadratic form in n variables xl' . note that given any quadratic form (1).IH.. we can write. .. To see this. Cij aij ifi = j. The expression 5x~ +6xlx2 +7x. it is clear that II u . = coo 2 l) Coo 2 )1 1 1 . Example.. this is always possible..190 Inner Product Spaces Also 11111'= (1.Xj' j=1 where cij E lR for every i. Remark. the quadratic form can be described in terms of a real symmetric matrix. From the proof of Proposition.=1 iSj.w II is minimized by the unique choice w = proj wU.xf)= J. + 3x. + 2x1x2 + 4xlx3 + 6x2x3 is a quadratic form in three variables xl' x 2 and x 3• It can be written in the form 4x~ + 5x. . The expression 4x~ + 5x.j = 1. for every i. In fact. + 2xlx2 + 4xlx3 + 6x2x3 = (xlx2 x3) 1 5 3 ( 4 1 2J (XIJ x2' 2 3 3 x3 Note that in both examples. . n satisfying i < j.. is a quadratic form in two variables xl and x 2 . n n Cijx. xn is an expression of the form LL . n.
where D is a diagonal matrix with real entries. It follows that xtAx = xtPDptx . it follows from Proposition lOE that it is orthogonally diagonalizable. Definition. To answer our rst question. A quadratic form xtAx is positive denite if and only if all the eigenvalues of the symmetric matrix A are positive. and so. The rst is the question of what conditions the matrix A must satisfy in order that the inequality x/Ax> 0 holds for every nonzero x E ]Rn. Proposition. we shall prove the following result. such that Y the quadratic form x(1x can be represented in the alternative form y Dy. we can write It follows that a quadratic form can be written as x/Ax. . Here we shall restrict our attention to two fundamental problems which are in fact related. Our strategy here is to prove Proposition by first studying our second question. Since the matrix A is real and symmetric.Inner Product Spaces 191 Then We are interested in the case when xl"'" xn take real values. In this case. we say that the symmetric matrix A is a positive denite matrix. and so A = P DPt. there exists an orthogonal matrix P and a diagonal matrix D such that PtAP = D. In other words. A quadratic form x/Ax is said to be positive denite if x/Ax> 0 for every nonzero x E lR n • In this case. The second is the question of whether it is possible to have a change of variables of the type x = P ' where P is an invertible matrix. Many problems in mathematics can be studied using quadratic forms. where A is an n x n real symmetric matrix and x takes values in ]Rn. writing y= ptx.
. Also. n E lR are the eigenvalues of A. This cn be . in view of the Orthogonal diagonalization process. Consider the quadratic form 2xJ + 5x. +4xlx2 + 2Xlx3 + 4x2x3. 1116 and 1I. + y. + 2x. we have plAP = D. the quadratic form becomes 7xJ + y. . since P is an orthogonal matrix. Consider the quadratic from 5xJ + 6x~ + written in the form XlAx.~l D=(~ ~ ~). + AnYn. This answers our second question. see Example.J3 0 0 1 Writingy = pIX. where A=(~ ~ ~) ~d x=(~} The matrix A has eigenvalues I = 7 and (double root) 2 Furthermore. since P is an invertible matrix..~ 1I~ :. where yi . where = 3 = 1. the diagonal entries in the matrix D can be taken to be the eigenvalues of A.J2 1/. J 2 where AI' .4xlx2 + 4x2x3.192 Inner Product Spaces we have x'Ax =YDy. we also have x = Py. _1.. Note now that x = 0 if and only if y = 0. Writing y=(n we have x:Ax = y Dy = AlYI + .t 2 Example.. Example. Furthermore. so that D=(AI '. which is clearly positive defnite. P=[~. This can bewritten in the form xlAx.
h E E. For every f. n] with at most a finite number of exceptions. gEE are considered equal. For every / E E. furthermore. the quadratic form becomes 3y~ + 6y. +9y. gEE. See Example. where x~ + xi + 2xIX2. we have f. More precisely. where 0 0 9 Writingy= pIX. For every /E E. n]. Consider the quadratic form P= ( 2/3 2/3 113) 2/3 113 2/3 and 113 2/3 2/3 D= (3 0 0) 0 6 o. Example. the matrix A has eigenvalues Al = 2 and A2 = 0. Hence we may take p=(1IJi 1IJi J and D=(2 00). Indeed. we have / + (g + h) = if + g) + h. Then the following conditions hold: For every f. ateach of which / need not be dened but must have one sided limits which are finite. 1IJi 1IJi 0 Writing y = pIX. gEE. We further adopt the convention that any two functionsf. Real Fourier Series Let E denote the collection of all functions /: [n.Inner Product Spaces 193 The matrix A has eigenvalues Al = A . n] 7 lR which are piecewise continuous on the interval [n. n]. n] E lR. It is easy to check that E forms a real vector space. gEE. g. . we have/ + A= A+/= f. 3 we have PIAP = D. we have/ + (j) = A. denoted by /= g. with corresponding eigenvectors (:) and (_:). A2 = 6 and A3 = 9. This means that any / E E has at most a finite number of po ints of discontinuity. The quadratic form can be written in the form xlAx. Clearly this is equal to (x 1 + x 2)2 and is therefore not positive defnite. the quadratic form becomes 2y~ which is not positive denite. if/ex) = g(x) for every x E [n. which is clearly positive definite. we have / + g = g +f. It follows from Proposition that the eigenvalues of A are not all positive. let E E denote the function A: [n. A= G :) and x = [::j. For every f. where (x) = 0 for every x E [n.
COS3X.m)x . gEE and e E IR.cos 2x.g)+(f. The diculty here is that the inner product space E is not finitedimensional. gEE.194 Inner Product Spaces For every e E IR andfE E. g).cos (k + m)x)dx = {I0 2 1t 1t 1t 1t ifk=m if k:f. Remark.cosx. . we have e (f. For every a. we have elf + g) = ef+ eg.g) = !f1t f(x)g(x)dx. we have efE E. we have (a + b)f= af + bf.wehave (f. sin every k. (f. The diculty is to show that the set spans E. gEE. For every J. we have (f. gEE. For every a. HenceEis a real inner product space.g+ h) = (f. m . e E IR andf E E.f).g) = (g. We now give this vector space E more structure by introducing an inner product. For every e E IR andJ. It is easy to check that the following conditions hold: For every J.f)~O.h).sin 2x.smmx) = 1 f1t sm kx' (SIll kx' 1 f1t (cos(k . It is easy to check that the elements in (4) form an orthonormal "system". ForeveryfE E. g) = (cf. 1t 1t The integral exists since the function j{x)g(x) is clearly piecewise continuous on [1t'. It is not straightforward to show that the set {.f)=O if and only iff=A.k sin x. For every fEE. we have 3x. For every f. we have (f. For every J. e E IR andfE E. smmxdx =1 . h E E. mEN.(f. g. 1t]. For as well as . •• } in E forms an orthonormal "basis" for E. we have (a + b)f= a(bf). we have If= f.and .
Consider the functionJ: [x. an = (J. given by j{x) = x for every x E [x.x]1tJ= 2(1)n+1 x noon X non 0 n We therefore have the (trigonometric) Fourier series 2( _l)n+1 .cosnx) = lJ1t J(x)cosnxdx and x 1t x 1t Note that the constant term in the Fourier series (5) is given by bn = (J.Inner Product Spaces 195 (coskx.!. x] For every n E N u {O}. Example.sinnx) = lJ1t J(x)sinnxdx (f.(sin(km)xsin(k+ m)x)dx x 1t X 1t 2 Let us assume that we have established that the set (4) forms an orthonormal basis for E. we have bn =3. = J (x)dx. n x1t since the integrand is an odd function.(cos(km)xcos(k+m)I~)dx = {I r 0 ?kk '=m m if * (sinkx. On the other hand.(_[xcosnx]1t + [sin. and. x 1t x Jo since the integrand is an even function.. L 00 n=1 n . with Fourier coecients Ji :n) !i:1t (J.:n)= ~=~.!. On integrating by parts. for every n E N.(_[xcosnx]1t + i1tcosnx dxJ=3..cosmx) =lJ1t sinkxcosmxdx =lJ1t . x]. for every n E N. we have a = lJ1t xcosnxdx = 0 ~ 1R. we have 1t bn = lJ1t xsin nxdx = l r xsin nxdx.cosmx) = lJ1t coskxcosmxdx = l X 1t X J1t 1t 2 . smnx. Then a natural extension of Proposition 9H gives rise to the following: Every function J E E can be written uniquely in the form ao + t(ancosnx+bnsinnx) 2 n=l known usually as the (trigonometric) Fourier series of the function f.
We therefore have the (trigonometric) Fourier series . 2 nl II odd ttn k=l Note that the functionfis even. an = 1 f1t sgn~x smnxdx 2 i1t smn. For every n E N u {O}we have an = ~f1t n 1t sgn(x) cos nxdx = 0. we have = ~f1t xdx = n. for every n EN. n 1t 1t 0 since the integrand is an even function. and this plays a crucial role in eschewing the Fourier coefficients bn corresponding to the odd part of the Fourier series. Example. =n~ 2 o if niseven.nxIJ { ~. On the other hand. an =~([xsinnx lJ1t _ J1tSinnxdxJ 1t n 0 0 n ~ ~ ([XSi:nx I +[cO:. n].196 Inner Product Spaces Note that the functionfis odd. given for every x E [n. since the integrand is an odd function. we have an = ~f1t I x Icosnxdx = ~ r xcoxnxdx. for every n EN. Consider the functionf: [n. ifnisodd. o n 1t nJ ao 1t since the integrand is an even function. n] ~ 1R. on integrating by parts. f(x) = sgn(x) = 0 if x = 0 { 1 if n ~ x < 0. On the other hand. Consider the function f: [n. n] ~ 1R. Clearly 7t 0 Furthermore. It is easy to see that . Example.~2 cosnx=. we have ')' . given by f(x) = Ix I for every x E [n. we have bn = ~f1t I x I sinnxdx = 0 n 1t since the integrand is an odd function.. For every n E N u {O}.~ nf4 4 7t(2k 1) 2 cos(2kl)x.xdx . for every n EN. and this plays a crucial role in eschewing the Fourier coefficients an corresponding to the even part of the Fourier series. n] by + 1 if 0 < x ~ n.
we have 1 bn = flt x 2 sin nxdx =0 It since the integrand is an odd function.)' n I I On the other hand. n]t lR. given by j(x) = x 2 for every x E [n. n . we have = !([ x' s~nnxI. We therefore have the (trigonometric) Fourier series + L. n=1 nn k=1 n(2k 1) 11: 0  bn = _~[cosnx]lt ={ ~ m L 00 ifniseven L00 II odd Example.Inner Product Spaces 197 ifn is odd. Clearly lt ao = ~ r idx = 3n n Jo an 3 . n] For every n E N u {O} • we have an = 2 It 1 x cosnxdx n lt x cosnxdx. nn We therefore have the (trigonometric) Fourier series 4 4 sinnx = sin(2kl)x. on integrating by parts. for every n EN. Furthermore. Consider the function/: [n.J f:2xs~nnx !([ x' s~nnxI+[ 2X~SnxI cc:~nx = dx dx J = ~([ x2s~nnx _[2X~~nx _[2X~~nx]J 4(:. flt 2i n =2 0 2 since the integrand is an even function. for every n EN. 3 11=1 n2 00 4( _1)n 2 cosnx.
... Theorem. un in X such that the matrix ofA in this basi$ is upper triangular..Chapter 7 Structure of Operators in Inner Product Spaces In this chapter we again assuming that all spaces are finitedimensional.. and * means that we do not care what entries are in the first row right of 1. 1.. any n x n matrix A can be represented as T = UTU*... dimE = dim XI = n.1' We do care enough about the lower right (n . the induction hypothesis implies that there exists an orthonormal basis (let us denote is as u2. . . If dirnX = 1 the theorem is trivial. vn be some orthonormal basis in E (clearly.. that A I defines a linear transformation in E.1. and since dimE = n . and T is an upper triangular matrix. . Proof We prove the theorem using the induction in dirnX. . Suppose we proved that the theorem is true if dirnX = n .1 are zeroes. . u2. where U is a unitary.1. un) in which the matrix of A I is upper triangular. since any 1 x 1 matrix is upper triangular. . AU I basis the matrix of A has the form ut v • o o here all entries below 1.I ). Let A : X ~ X be an operator acting in a complex inner product space.) be an eigenvalue of A. to give it name: we denote it as A I' . Upper triangular (Schur) representation of an operator.. . and we want to prove it for dim X = n.. and let up II u l II = 1 be a corresponding = 1. In other words. vn is an orthonormal basis in X.and let 2' .1) block. so up v2. Note. Let A.1) x (n .1 u I· Denote E = . There exists an orthonormal basis u l .. In this eigenvector.
Suppose. . Remark. where A I is some real matrix. An alternative way is to equip V with an inner product by fixing a basis in V and declaring it to be an orthonormal one. eigenvalues of this matrix are complex. Suppose that all eigenvalues ofA are real.e. that even if we start from a real matrix A.1) matrices.e. Remark. where U is an orthogonal.. .Structure of Operators in Inner Product Spaces 199 So..1) x (n . Proof To prove the theorem we just need to analyse the proof of Theorem. it is some operator constructed from A.. a proof can be modeled after the proof of Theorem.1!: k7\ k E Z ( sma coso... matrix of A in the orthonormal basis u I' u2. An analogue of Theorem can be stated and proved for an arbitrary vector space. . any real n x n matrix A can be represented as T = UTU* = UTUT.. the theorem is true for (n . without requiring it to have an inner product. Sino.. . u2. Then there exists an brthonormal basis u l . where matrix A I is upper triangular. If we can prove that matrix Al has only real eigenvalues. all entries in the first row. un in X such that the matrix of A in this basis is upper triangular. Remark. Note also. is not unitarily equivalent (not even similar) to a real upper triangular matrix. Indeed. Note.) a:. un has the form). the inclusion AE c E does not necessarily holds. Therefore. that the subspace E = ut introduced in the proof is not invariant under A. the matrix of a in this basis is upper triangular as well.. un in E = ut . Let A : X ~ X be an operator acting in a real inner product space. the matrices U and T can have complex entries. The following theorem is a realvalued version of Theorem Theorem.. As in the proof of Theorem let 1 be a real eigenvalue of A. except AI) are zero. Let us assume (we can always do that without loss of generality. then we are done. u l E lR n . Note. . II ulll = 1 be a corresponding eigenvector. and let v 2. c?so. In this case the theorem claims that any operator have an upper triangUlar form in some basis. that the version for inner product spaces is stronger than the one for the vector spaces.' . and T is a real upper triangular matrices. Indeed. vn is an orthonormal basis in lR n • The matrix of a in this basis has form equation. . Note. The rotation matrix . because it says that we always can find an orthonormal basis. . that the operator (matrix) A acts in lRn. not just a basis. and the eigenvalues of an upper triangular matrix are its diagonal entries. That means that AI is not a part of A. In other words. that AE c E if and only if all entries denoted by * (i. vn be on orthonormal system (in ~ n) such that up v2.. . then by the induction hypothesis there exists an orthonormal basis u2. i. . .
i. so is real..e. Now let us ask ourself a question: What upper triangular matrices are selfadjoint? The answer is immediate: an upper triangular matrix is selfadjoint if and only if it is a diagonal matrix with real entries. To show that A I has only real eigenvalues. Proposition. (Ax.e. Let A = A * be a selfadjoint operator in an inner product space X (the space can be complex or real). . Then A can be represented as A= UDU. Let A = A * be a selfadjoint operator. Av = Av. A matrix about of a selfadjoint operator (in some orthonormal basis). for example). if A '* j.A) det(A I . In this section we deal with matrices (operators) which are unitarily equivalent to diagonal matrices. and there exists and orthonormal basis of eigenvectors ofA in X This theorem can be restated in matrix form as follows Theorem. On the other hand. x) = II x 112. Then (Ax. Let A = A* and Ax = Ax.. a matrix satisfying A = a is called a Hermitian matrix.l. and let u. Let us recall that an operator is called selfadjoint if A = A *. Since Ilx116= 0 (x '* 0). Let us give an alternative proof of this result. It also follows from Theorem that eigenspaces of a selfadjoint operator are orthogonal. so the matrix of a in the basis UI ' un is also upper triangular. v be its eigenvectors. and so any eigenvalue of A I is also an eigenvalue of A. Moreover. x '* O. x) = (x. x) = (x. we will use both terms. . let us notice that det(A 'JJ) = (AI . Ax) = ~ (x. if the matrix A is real. Theorem is proved. matrix U can be chosen to be real (i. Then all eigenvalues of A are real. A *x) = (x. Then. Let A = A be a selfadjoint (and therefore square) matrix.200 Structure of Operators in Inner Product Spaces such that the matrix of A I in this basis is upper triangular. Proof To prove Theorems let us first apply Theorem if X is a real space) to find an orthonormal basis in X such that the matrix of a in this basis is upper triangular. But a has only real eigenvalues! U2. Au = AU. Lei us give an independent proof to the fact that eigenvalues of a selfadjoint operators are real. orthogonal). where U is a unitary matrix and D is a diagonal matrix with real entries. x) = (x. x) = ~ II x 11 2 . Theorem. we can conclude A = I. row. Spectral Theorem for selfadjoint and normal operators.A) (take the cofactor expansion in the first. Since we usually do not distinguish between operators and their matrices. the eigenvectors u and v are orthogonal. . so All x 112 = ~ II x 112. Ax) = (x.
v). If 'A. Also.Structure of Operators in Inner Product Spaces 201 Proof This proposition follows from the spectral theorem. any unitary operator U: X ~ X is normal since U*U = UU* = 1. but here we are giving a direct proof. Now let us try to find what matrices are unitarily equivalent to a diagonal one.(u. any selfadjoint operator (AA * = AA *) is normal. it can be easily shown. A *v) = (u. . Note. and we want to prove it for n x n matrices. v) = (u. Any normal operator N in a complex vector space has an orthonormal basis of eigenvectors. So. if U is a unitary operator acting from one space to another. since any 1 x 1 matrix is diagonal. we did not claim that matrices U and D are real. v) = leu. Let N be n x n upper triangular normal matrix.. /lv) = iI (u. Remark. We will prove this using induction in the dimension of matrix. On the other hand (Au. that if D is real.1) x (n . v) = (lu. (Au. so 'A. /l it is possible only if (u. that a normal operator is an operator acting in one space. It is easy to check that for a diagonal matrix D D*D=DD*.1) x (n  1) matrix. v) = /leU. The case of 1 x 1 matrix is trivial. An operator (matrix) N is called normal if N* N = NN.. and D is a diagonal one. any matrix N satisfying N*N = NN* can be represented as N= UDU*. Moreover. v) = o. Proof To prove Theorem we apply Theorem to get an orthonormal basis. In other words. We can write it as *" N= aJ'J 0 aJ'2 . v). Theorem. v) = /leU. Clearly. that in the above theorem even if N is a real matrix. Therefore AA = AA if the matrix of a in some orthonormal basis is diagonal. Av) = (u. Namely. N must be selfadjoint. Definition. Suppose we have proved that any (n . NJ aJ'n 0 where Nt is an upper triangular (n . Note. To complete the proof of the theorem we only need to show that an upper triangular normal matrix must be diagonal. where U is a unitary matrix. such that the matrix of N in this basis is upper triangular. v) (the last equality holds because eigenvalues of a selfadjoint operator are real).1) upper triangular normal matrix is diagonal. we cannot say that U is normal. not from one space to another.
±i = L a=±l. . The Polarization Identities imply that for all x. x) = (NN*x. the matrix N has the form o a). y E x (N*N x. Direct computation shows that that and (NN»).Ii N*N= 2 0 o . and by the induction hypothesis it is diagonal..) = (NN»). Proof Let N be normal.) if and only if a).N*x) = II Nx 112 so II Nx II = II N*x II· Now let II Nx II = II N*x II \..2 = .) 12 + I a)'21 2 + . = a). .fx E X. Then II Nx 112 = (Nx. So. Proposition. That means the matrix N) is also normal. y) = (Nx.±i a II N*(x+aN* y) 112 = (N*x. N*y) = (NN* x. The following proposition gives a very useful characterization of normal operators. So the matrix N is also diagonal. x) = (N*x..fx E X .±i a=±l. Nl = Nl N.Nx) = (N*Nx. (N*N»).±i a=±l.n = O. Ny) = L a II Nx+aNy 112 a=±).) 0 N= 0 N) 0 It follows from the above representation that Ia). y) and therefore N*N = NN*.. N*N = NN*.) = I a). + I al'n 12. NN*= o o o o o so N.. An operator N : X ~ X is normal if and only if II Nx II = II N*x II \.202 Structure of Operators in Inner Product Spaces Let us compare upper left entries (first row first column) of N*N and NN*. Therefore.
.Structure of Operators in Inner Product Spaces 203 Polar and Singular Value Decompositions Definition.. vn be an orthonormal basis of eigenvectors of A. .. A A 0 if and only if all eigenvalues ofA are nonnegative. a self adjoint operator A : X ~ X is called positive definite if (Ax. Let VI' v2.Fn}. .. then Therefore in the basis vI' v2' . andB2=A.. Its Hermitian . .. J.e.. .l2' . Note. un be an orthonormal basis of eigenvalues of e... that since A ~ 0. ..ll' J.. To finish the proof it remains to notice that a diagonal matrix is positive definite (positive semidefinite) if and only if all its diagonal entries are positive (nonnegative). ~. There exists a unique positive semidefinite operator B such that B2 = A Such B is called (positive) square root ofA and is denoted asJA or Proof Let us prove that JA exists. . i. x) > 0 Vx:. .. the matrix of C has the ~ form diag{A.. Detine the matrix of B in the same basis as diag il2. and.. .. (A*A*) =A*A** =A*A and Y... Let a = A *.A. .. all Ak ~ o.B=B of form J.. moreover. . . Theorem. We will use the notation A> 0 for positive definite operators. and let J. Consider an operator A : X square A *A is a positive semidefinite operator acting in X. . ~~} This implies that any eigenvalue A of A is {. .l~. Corollary. .. A > 0 if and only if all eigenvalues of A are positive. Let u l ' u2.ln be the corresponding eigenvalues (note that Uk ~ OVk). To prove that such B is unique. d~agonal vn above. if Ax = Ax... The matrix of C in the basis ul' u2.. ~n} and therefore the matrix of A ::: C2 in the same basis is diag {~~ . 2.J>:. x) ~ 0 "Ix EX. Then 1. In the basis VI' v 2.Clearly.x. . Let A = A * ~ 0 be a positive semidefinite operator. Indeed. .Fz .~~ . B = C.J'. Fn}. and let AI' ~.. vn the matrix of a is a diagonal matrix diag{l. Singular values. .. .. let us suppose that there exists an operator C = C* ~ 0 such that e2 = A. un is a diagonal one diag {~]' ~. An be the corresponding eigenvalues. .. . Proof Pick an orthonormal basis such that matrix of a in this basis is diagonal. ex = . n} with entries A]..t: 0" and it is called positive semidefinite if (Ax. 2. An on the diagonal.. The following theorem describes positive definite and semidefinite operators. and A ~ 0 for positive semidefinite. Modulus ofan operator.
• Proof The first equality follows immediately from Proposition. Theorem. But in this case we can only guarantee that V is an isometry from Ran i A I = {KerA) 1 to Y. This operator R is called the modulus of the operator A. . Proof Consider a vector x E Ran IA I. If dim X :S.l (j A I is selfadjoint). The modulus ofA shows how "big" the operator A is: Proposition. For a linear operator A:X7Y . x) III A I x 112 = CI A I x. Then vector x can be represented as x = IAlv for some vector v E X. Remark. where.v I E Ker I A I = Ker A so meaning that Vo x is well defined. But first we need to prove that Vo is well defined. and is often denoted as I A I.J III A I x II = II Ax II Vx EX· Proof For any x E X lx. x) Ker A = Ker I A I = (Ran I A I) 1. Ax) = II Ax 112 Corollary. The polar decomposition A = VI A I also holds for operators A : X 7 Yacting from one space to another. Let A : X ~ X be an operator (square matrix). Define Vo x := Av. there exists a (unique) positivesemidefinite square root R = A * A . Therefore.204 Structure of Operators in Inner Product Spaces (A *Ax. x) = (Ax. Let VI be another vector such that x = I A Iv I · But x = I A I v = I A I VI means that v . V is unique if and only if a is invertible. the second one follows from the identity Ker T = (Ran T*). x) = (Ax. = (I A 1 2x. Remark. IA I x) = CI A I * I A = (A*Ax. (Polar decomposition of an operator). By the construction A = VolA I. Ax) = II Ax 112 ~ 0 Vx EX. Then A can be represented as A=VIAI. dim Y this isometry can be extended to the isometry from the whole X to Y (if dim X = dim Y this will be a unitary operator).V is a unitary operator. By Proposition II Vox II = II A v II = III A I v II = II x II so it looks like V is an isometry from Ran I A I to X. As one will see from the proof of the theorem. The unitary operator V is generally not unique.
. I J . Proof (Avj . equivalently Ax = L ak(x.a~. . The system wk := ak AVk' k = 1. r is an orthonormal system.. . . k=I . . vk)wk' k=I r Indeed. . . A *Avk = Proposition.. since for square matrices dim KerA = dim Ker A * (the Rank Theorem). then are singular values of A.Structure of Operators in Inner Product Spaces 205 To extend Vo to a unitary operator V.. ... . That means ak = 0 for k> r. In the notation of the above proposition. . . Avk) = (A *Avj . the operator a can be represented as or. vr is an orthonormal system. 0 i*k J.. It is easy to check that V = Vo + VI is a unitary operator.. ai vk a~(vj' vk) = { ... let us find some unitary transformation VI: Ker A ~ (RanA)l. counting multiplicities. In other words. Vn be an orthonormal basisof eigenvectors of A *A. . we know that vI' V 2.2 . r. . a j' =k since vI' v 2. 2. and let vI' V 2. an be the singular values of a counting mUltiplicities.. Consider an operator A : X ~ Y. vn is an orthonormal basis in X.J ·f· = 1. By the definition of singular values a~.J J J = a J·w· = Av· J . .a~ are eigenvalues of A*A.. Assume also that a l' a 2. . An 2 are eigenvalues of A*A. = a J·w·vJv· L. and that A=~AI· Singular Values Eigenvalues ofl A 1 are called the singular values of A.. = Ker A*. Then r * * "akwkvk vJ. vk) = 1 .. . a r are the nonzero singular values of A. .. It is always possible to do this. if AI' A ... and let aI' a 2...
Proof We only need to show that vk are eigenvalues of A* A. Since v\. k=1 r 2 * Since vI' v2. Let a can be represented as A = LOkWkVk r * where ok> and v\. . V2. Then this representation gives a singular value decomposition ofA. .. j ~k ·k ] .. The singular value decomposition can be written in a nice matrix form. ° * k=1 vr is an orthonormal system. .. WkWj=(Wj. Then r * A= LOkVkWk k=1 is a singular value decomposition ofA * Matrix representation of the singular value decomposition. A*A vk = O~Vk.. . ... vr is an orthonormal system A* AVj = LOkVkVkVj =OjVj k=1 r 2 * 2 thus vk are eigenvectors of A *A.•.206 Structure of Operators in Inner Product Spaces and r LOk WkVkVj = 0= AVj forj> r. Why? Lemma. .. Definition Remark. . .. . vn' so they are equal.. A * A = LOkVkVk. 1. v2. wr are some orthonormal systems. Singular value decomposition is not unique. Let r * A= LOkWkVk k=1 be a singular value decomposition ofA. Corollary.wk)=B kj := and therefore {a. It is especially easy to do if the operator . Vr' W I 'W2' . k=1 * So the operators in the left and right sides of equation coincide on the basis v I' V2.
.. . . dim Y = m (i. LV * is .. we can write a polar decomposition of A: A = WLV* so I A I = V LV * and General matrix form of the singular value decomposition... . and V. In this case di X = dim Y = n.. Then v\.. . . We will use this interpretation later.e. From singular value decomposition to the polar decomposition.... .w2' as where . so the singular value decomposition has the form n k=1 * A= Lakwkvk where vI' v2..L respectively. A is an m x n matrix)... and the operator A has n nonzero singular values (counting multiplicitie ).. In the general case when dim X = n. . if = (WV)(VLV *) = VIAl U = WV. Namely. O}.. that if we know the singular value decomposition A = W LV * of a square matrix A... v2' .. let vr+ 1. and let a I' a 2. .wn respectively. L is n x n diagonal matrix diag {aI' . Remark. a. .Structure of Operators in InnerrrOduct Spaces 207 A is invertible. Another way to interpret the singular value decomposition A = W LV * is to say that L is the matrix of A in the (orthonormal) bases A = vI' v2.. Such representation can be written even if A is not invertible. . vn and wr+I' .wn respectively.w2' . To represent A as WL V let us complete the systems {Vd~=I' {wd:=1 to orthonormal bases.. a 2. .. L = diag{a l . .e.• . r < n be nonzero singular values of A.. Let A = Lakwkvk k=1 n * be a singular value decomposition of A. . ..w2' .. vn and w l .. Namely. . vn and wp w2. . an} and V and Ware unitary matrices with columns v\' v2' .. Note..wn are orthonormal bases inXand Yrespectively and A can be represented be an orthonormal bases in Ker A = Ker IA I and (Ran A).. that = [AlB A. vn and wI. . the above representation A = V also possible.. wn are orthonormal bases in x and Y respectively. ar' 0.. wn' i. vn and B:= w l . Let us first consider the case dim X = dim Y = n.. . unitary matrices with columns vI' v2.. vn and wl'w2. It can be rewritten as A where = W L V*. Ware n x n .wn A= WLV*. . . ....
However. To find eigenvalues we usually computed characteristic polynomial. However. . the singular value decomposition is simply diagonalization with respect to two dierent orthonormal bases.. singular values tell us a lot about socalled metric properties of a linear transformation. In other words. and so on . This looks like quite a complicated process. these methods work extremely well. where V E Mn x nand WE Mmxm are unitary matrices with columns VI' V 2. They compute approximate eigenvalues and eigenvectors directly by an iterative procedure. vr and w I . there are very eective numerical methods of find eigenvalues and eigenvectors of a hermitian matrix up to any given precision. r} and make it to an m x n matrix by adding extra zeroes "south and east". to get the matrix one has to take the diagonal matrix diag {O" I' 0"2' . the diagonal entries of L in the singular value decomposition are not the eigenvalues of A.k = { 0 j = k ~ r: otherwise.. These methods are extremely eective. We will not discuss these methods here.. found its roots.. Final Remark: performing singular value decomposition requires finding eigenvalues and eigenvectors ofthe Hermitian (selfadjoint) matrix A * A. you should believe me that there are very eective numerical methods for computing eigenvalues and eigenvectors of a Hermitian matrix and for finding the singular value decomposition. Then a can be represented as A = W LV*. . However. it goes beyond the scope of this book. that for a = WL V* as in we generally have An :f:.w2' . Since we have two dierent bases here.. and just a little more computationally intensive than solving a linear system. especially if one takes into account that there is no formula for finding roots of polynomials of degree 5 and higher.. . Because a Hermitian matrix has an orthogonal basis of eigenvectors. . .208 A Structure of Operators in Inner Product Spaces = LO"kWkVk k=1 r * is a singular value decomposition of A.. ""wm respectively.. wr to orthonormal bases in X and Y respectively. vn and w I 'w 2. and O"k L is a "diagonal" m x n matrix L j. . Note. SINGULAR VALUES As we discussed above. we need to complete the systems vI' v 2.. so this diagonalization does not help us in computing functions 'of a matrix. W L n V * . For example. These methods do not involve computing the characteristic polynomial and finding its roots. . as the examples below show. we cannot say much about spectral properties of an operator from its singular value decomposition..
.. ... .. . an}. . In lR n the geometry of this set is also easy to visualize... . an..wn from the singular value decomposition. Similarly... see Remark 3... Y n are coordinates ofy in the orthonormal basis B = w l 'w 2.. k= 1. . We want to describe A(B). . i. xnl and (Y1'Y2' . ynl = Y = Ax we have yk = akXk (equivalently.w2' . If n = 2 it is an ellipse with halfaxes a I and a 2. if and only if the coordinates YI' Y2' . Consider first the case of a "diagonal" matrix form.. a2.. ... more precisely the corresponding lines are called the principal axes of the ellipsoid.. . I} be the closed unit ball in lRn.. . and we call that set an ellipsoid with half axes aI' a 2. Assuming that all a k > 0 and essentially repeating the above reasoning... and let B = {x E IR n 7 lR n : II x II ::.9. x k = yklak) for k = 1. . 1. we want to find out how the unit ball is transformed under the linear transformation.ynl=Axforllxll::. ar. n. . . xnl = [x]A.. and (or) not 2 2 2 II 2 all singular values are nonzero. wn' not in the standard one.. . .2... . . . . Y n satisfy the inequality 2 2 2l+ Y2 + a1 +Yn=~Yk<1 2··· 2 ~2a2 an k=1 ak 2 2 n 2 (this is simply the inequality II x 112 = L k I xk 12 ::. en or.. Consider the general case. . when the matrix A is not necessarily square. only "rotated" (with dierent but still orthogonal principal axes)! There is also an alternative explanation which is presented below.2.. Let us first consider the simplest case when A is a diagonal matrix A = diag{a l .. . The set of points in lR n satisfying the above inequalities is called an ellipsoid. for an = 3 it is an ellipsoid with halfaxes a I' a 2 and a 2 . +Yn=~Yk<1· 2 2 2 ~ 2a1 a2 an k=1 ak where YI' Y2' ..2. . Then forv= (x l . e 2. n. so y=(YI'Y2' ···.. Then the matrix of A in these bases is diagonal [AhA = diag{sn: n = 1. . But that is essentially the same ellipsoid as before. It is easy to see that the image space but in the Ran I of L B of the unit ball B is the ellipsoid (not in the whole L) with half axes a I' a 2. a k > 0.. it is easy to show that any point Y = Ax E A(B) if and only if it satisfies the inequality 2l+ Y2 + . . Namely. . The vectors e l . The singular value decomposition essentially says that any operator in an inner product space is diagonal with respect to a pair of orthonormal bases. (XI' x 2. 1). Consider for example the following problem: let A : lR m be a linear transformation...e. n}.Structure of Operators in Inner Product Spaces 209 Image of the unit ball. consider the orthogonal bases A = VI' v2.x2' . vn and B = w l .
The quantity max {II Ax II : x E X. It is not hard to see from the decomposition A = wI. Vare unitary operators. We know that I. is the diagonal matrix with nonnegative entries..e.. ar . so we can conclude: the image A(B) of the closed unit ball B is an ellipsoid in Ran A with half axes ai' 0"2' .. to Ran A. V* (using the fact that both Wand V are invertible) that W transforms RanI. so V* (B) = B. where V. Unitary transformations do not change geometry of objects. that the maximum of IIAxl I on the unit ball B is the maximal singular value of A. and I. so formula holds. one can conclude that the maximum of II Ax II on the unit ball B is the maximal diagonal entry of I. Operator norm of a linear transformation. let s!' s2' . Since r Ax= LXkek' k=l we can conclude that 2 2 IIAxll~~>i IXk 12 ~S122:lxk 1 =sJ. so W( I. V*. V . .ll x Il . so indeed sl is the maximum of II Ax lion the closed unit ball B. Given a linear transformation A : x ~ Y let us consider the following optimization problem: find the maximum ofkAxk on the closed unit ball B = {x EX: II x ~ I}...210 Structure of Operators in Inner Product Spaces Consider now the general case. with halfaxes ai' a 2. . i. Ware unitary operators. Since unitary transformations do not change the norm. Note. sr be nonzero diagonal entries of A and let sl be the maximal one.e. i. singular value decomposition allows us to solve the problem. II Aelll = II slelll = sIll ell. a r . Again. .. (B)) is also an ellipsoid with the same halfaxes. . For a diagonal matrix A with nonnegative entries the maximum is exactly maximal diagonal entry. A = WI. k=l k=1 r r so II Ax II ~ S) II x II· On the other hand. A = wI. Unitary transformations do not change the unit ball (because they preserve norm). we only assumed that all entries outside the "main diagonal" are 0. To treat the general case let us consider the singular value decomposition. Indeed. (B) is an ellipsoid in Ran I. that in the above reasoning we did not assume that the matrix A is square. Here r is the number of nonzero singular values. Definition. II x II ~ I} is called the operator norm of a and denoted II A II. where W. lt is an easy exercise to see that II A II satisfies all properties of the norm: . the rank of A.
Recalling that the trace equals the sum of the eigenvalues we conclude that II A 112 = trace(A * A) = On the other hand we know th~t the operator norm of a equals its largest singular value. 3. So. are nonzero eigenvalues of A*A (again ~>i. let us investigate how these two norms compare. This is often used as a definition of the operator norm. for example by some experiments. Suppose we have an invertible matrix A and we want to solve the equation Ax = b. II aA II = I ex I . suppose there is a small error in the right side of the equation..e. II Ax II :::. that the operator norm of a matrix cannot be more than its. of course. The solution. That happens in the real life. i. Condition number of a matrix. That means. i. On the space of linear transformations we already have one norm. sr be nonzero singular values of A (counting multiplicities). which follows easily from the homogeneity of the norm II x II. II x II. 4.. k=1 r counting multiplicities). SO we can conclude that II A II :::. II A II· II A + B IIA II :::. The beauty of the proof we presented here is that it does not require any computations and illuminates the reasons behind the inequality. One of the main properties of the operator norm is the inequality II Ax II ~ II A II . instead of the equation Ax = b we are solving . This statement also admits a direct proof using the CauchySchwarz inequality. and let sl be the largest eigenvalues.Structure of Operators in Inner Product Spaces 211 1. and such a proof is presented in some textbooks. the Frobenius..s.. or HilbertSchmidt norm II Alb. . Let us consider the simplest model. II A Ib. II A II + II B II· II ~ o for aliA. is given by x = AI b.s. o such that In fact. . when the data is obtained.e. Then s~ 2 . Let sl' s2' .. roundo errors during computations by a computer may have the same effect of distorting the data. C II x II II A 112= trace(A * A). . but we want to investigate what happens if we know the data only approximately. But even if we have exact data. it can be shown that the operator norm II A II is the best (smallest) number C ~ '\Ix E X. II A II = 0 if and only if A = o. II A II = s I. 2. So it is indeed a norm on a space of linear transformations from from x to Y.
Here a can be any scalar. so x = AI ~b. II AI II . Theoretically. Let us see how this quantity is related to singular values.However. and wn is the nth column of the matrix Win the singular value decomposition A = w'L V*. What is "big" here depends on the problem: with what precision you can find your right side.IIAI II=~. II xII IIbll The quantity II A 11'11 AIII is called the condition number of the matrixA. the matrix is called ill conditioned. II A II . i. We want to know how big is the relative error in the solution II ~ 11111 x II in comparison with the relative error in the right side II ~b 11111 b II. So. that it is possible to pick the right side b and the error ~b such that we have equality II ~II =11 AI 11. etc. We deduced above that II~II lI~bll W ~II A I II· II A II ·w· It is not hard to see that this estimate is sharp. Effective rank of a matrix.. It is easy to see that II ~II = II AI~b II = II AI~b 1111 b II = II AI~b 1111 Axil II x II II x II II b II II x II II b II II x II Since II AI ~b II ::. and let us assume that sl is the largest singular value and sn is the smallest. the rank of a matrix is easy to compute: one JUSt needs to row reduce matrix and count pivots.. instead of the exact solution x of Ax = b we get the approximate solution x + Llx of A(x+ Llx) = b + ~b. II ~ b II and II A x II ::.1I A 11=.11 AII. sn be the singular values of A. We know that the (operator) norm of an operator equals its largest singular value. A matrix is called well conditioned ifits condition number is not too big.II ~bll.. Let sl' s2' . so I 1 II A 11= sl. sn so IIAII.. It estimates how the relative error in the solution x depends on the relative error in the right side b.e.212 Structure of Operators in Inner Product Spaces Ax = b + ~b. where ~b is a small perturbation of the right side b.II AI 11. If the condition number is big. II xII IIbil We just put b = VI and ~b = aWn' where VI is the first column of the matrix V. the condition number of a matrix equals to the ratio of the largest and the smallest singular values. sn In other words. II x II we can conclude that II ~II::. what precision is required for the solution.II ~bll. in practical . . We are assuming that A is invertible..11 AII.
Then there exists an orthonormal basis VI' v2.2k) =(COS<Pk Rq. It requires more computations.. most computer programs introduce roundo errors in the computations. even if we know the exact matrix. so by setting up the tolerance we just deciding how "thin" the ellipsoid sholJld be to be considered "flat". which are easier to interpret. we only know its approximation up to some precision. it is not very reliable in the general case. The main reason is that very often we do not know the exact matrix. The theorem below explains this name. what do we lose is we set the tolerance equal to 106? How much better will 108 be? While the above approach works well for well conditioned matrices. vn such that the matrix of U in this basis has the block diagonal form Rq>1 Rq>2 0 Rq>k 0 I n . so effectively we cannot distinguish between a zero pivot and a very small pivot. Theorem. When computing the rank (and other objects related to it. kernel.k x (n .2k where Rjk are 2dimensional rotations. since it is very easy to programme. The advantage of this approach is that we can see what we are doing. Suppose that detU = 1. the main disadvantage is that is is impossible to see what the tolerance is responsible for. . A simple naive idea of working with roundoff errors is as follows.2k).Structure of Operators in Inner Product Spaces 213 applications not everything is so easy. Then we simply treat singular values smaller than the tolerance as zero. For example. However. . but gives much better results. The singular values are the halfaxes of the ellipsoid A(B) (B is the closed unit ball). and then perform singular value decomposition. In this approach we also set up some small number as a tolerance. The advantage of this approach is its simplicity. Let U be an orthogonal operator in IRI1. count it as zero. Moreover. like column space. etc) one simply sets up a tolerance (some small number) and if the pivot is smaller than the tolerance.. sin<Pk) sin<Pk cos<Pk and In. Structure of Orthogonal Matrices An orthogonal matrix U with det U = 1 is often called a rotation. .2k stands for the identity matrix of size (n . A better approach is to use singular values.
Uy = Im(u) = (cos a)y + (sin a)x. that the vectors u and u (eigenvectors of a unitary matrix. all complex eigenvalues of a real matrix A can be split into pairs Ak. v2 =Y to an orthonormal basis in ~n • Since UEA. E is an invariant subspace of U. corresponding to dierent eigenvalues) are orthogonal. then I is a root of p as well. we have AU = (cos a + i sin a) (x + iy) = ((cos a)x(sin a)y) + i((cos a)y + (sin a)x).AU ) = 1m (AU). y by the same nonzero number. spanned by the vectors x.' Ifwe multiply each vector in the basis x.' i.(sin a)y. so Ux = Re(Au) = (cos a) x . so by the Pythagorean Theorem IIx 11= Ilyll =2'lluli. Y = 1m u = (u . define so u = x + iy (note. and let u en be the eigenvector of = Au' Then Uu =I 17 . Therefore.u)/(2i). Then 1 lux = U (u + iJ) = (AU + AU) = Re(Au) 2 2 Similarly. the matrix of U in this basis has .e. U leaves the 2dimensional subspace EA. so without loss of generality we can assume that II x II = II y II = 1 i..' Let us complete the orthonormal system VI the block triangular form J2 = x. Now. so all complex eigenvalues of A can be written as Ak = cos ak + i sin ak. we do not change matrices of linear transformations. i. It is easy to check that x 1 y. y invariant and the matrix of the restriction of U onto this subspace is the rotation matrix cosa sina) Ra= ( . c E'J.e. Uu 5:.. x k := Re u = (u + 17)/2. yare real vectors. 5: k' I k = cos a k + i sin a k .e. pCA) plugging = 0. that eigenvalues of a unitary matrix have absolute value 1. Fix a pair of complex eigenvalues A and U. that x. y is an orthogonal basis in EA. In other word. E"A. p(I) = 0 (this can easily be checked by into p(z) I = L ~=o ak z k ). y is an orthogonal basis in EA. split U into real and maginary parts. 11 Uy = 2i U(uiJ) = 2/ AU . E We know. i.214 Structure of Operators in Inner Product Spaces Proof We know that if p is a polynomial with real coefficient and A is its complex root. so x.e. Since = cos a + i sin a. sina cosa Note. vectors with real entries). that x.
vi + 1. . Define ~. Then Theorem simply says that V is the composition of the rotations ~ . Ifan orthogonal matrix has determinant 1.. are identity matrices of size r x r and I x I respectively. the multiplicity of the eigenvalue 1 (i.e.2) x 2 block of zeroes. so in some orthonormal basis the matrix of V has the form R. . r) must be even. Therefore V*EI = UIEI = EI so the matrix of V in the basis we constructed is in fact block diagonal. the theorem can be interpreted as follows: Any rotation in IR n can be represented as a composition of at most nl2 commuting planar rotations. Therefore. to be a rotation thorough <Pj in the plane spanned by the vectors vi .Structure of Operators in Inner Product Spaces 215 where 0 stands for the (n .. . 2. Since V is unitary (*) (~} so. they commute.. it is also unitary. that the 2 x 2 matrix 12 can be interpreted as the rotation through the angle n'.u1 R_ U2 0 o here Ir and I. Since the rotation matrix R_ uis invertible. If VI has complex eigenvalues we can apply the same procedure to decrease its size by 2 until we are left with a block that has only real eigenvalues. its structure is described by the following theorem. Real eigenvalues can be only + 1 or 1. it does not matter in what order . = E/.!ve take the composition. Since det U = 1. the above matrix has the form given in the conclusion of the theorem with '<Pk = uk or '<Pk = n Let us give a dierent interpretation of Theorem. k. that because the rotations T. act in mutually orthogonal planes.. i... i = 1.e. So. since VI is square. Note... Note. we have VE/.
2k stands for the identity matrix of size (n . let us prove it for n.1 . so the last coordinate of R 2R tx remains zero). Let U be an orthogonal operator in ~n. Assuming now that Lemma is true for n . where a = Jx~ +x~ + ... Note. .. where a = J + x~. has the desired form.I of Rtx (the rotation R2 does not change the last coordinate. The modification that one should make to the proof of Theorem are pretty obvious. The case n = 2 is treated. Then there exists an orthonormal basis vI' v2. Then use an elementary rotation R2 in x n.2xn. ~k ° _(COS'Pk ..1.e. . .2k . and it acts on these two coordinates as a plane rotation.1 elementary rotations R I.I xn plane to "kill" the last coordinate ofx. Let x = (xI' x 2f E ~2.x k plane. . .1)/2 elementary rotations. There exist n. Let x = (xI' x 2. an orthogonal transformation U with detU = I) can be represented as a product at most n(n . Theorem. and so on. For a formal proof we will use induction in n. 0. R 2R j x = (a. Let us now fix an orthonormal basis... . xn)T E ~n. of. and let detU = 1. We call an elementary rotation 2 a rotation in the Xj . The case n = 1 is trivial. . There exists a rotation R of R2 which moves the vector x to the vector (a.2k where r = n . i. Proof The idea ofthe proof of the lemma is very simple. a linear transformation which changes only the coordinates Xj and x k ..216 Structure of Operators in Inner Product Spaces Theorem.. We use an elementary rotation R t in the xn . x: One can just draw a picture orland write a formula for Ra' R n_1 such thdt R n. R 2. +x~. Of. Lemma. vn such that the matrix of U in this basis has block diagonal form ° R<pk I n . Any rotation U (i. since any vector in ~1 Lemma.t plane to "kill" the coordinate number n . ..e. sm'Pk sin'Pk] cos'Pk and I n.2k) x (n . There exists a 2 x 2 rotation matrix Ra such that .2k).... say the standard basis in ~n. 0.1 and R<pk are 2dimensional rotations. To prove the theorem we will need the following simple lemmas. that it follows from the above theorem that an orthogonal 2 x 2 matrix U with determinant I is always a reflection. .
.. 0. . There exists a rotation R which "kills" the second coordinate of ai' making the first coordinate nonnegative. . and.2' anI' of. " R n. R 2. .I) matrices. . and we want to prove it for n x n matrices.2 elementary rotations (let us call them R 2.2) identity matrix). " So.2) x (n . . We can always assume that the elementary rotations R 2. " x n. moreover. X.. n ~ n(n . but we apply the following indirect reasoning. Il ..R2. . We assumed that the conclusion of the lemma holds for n . There exist elementary rotations R I. R 3.. simply by assuming that they do not change the last coordinate... But. the matrix R n_ l . R n_1 which transform a l into (a. all its diagonal entries except the last one B n. " of.Structure of Operators in Inner Product Spaces 217 where anI = J R. 0. Then the matrix B = RA is of desired form. 1l. . So if we define the n x n elementary rotation R I by ~(1'~2 . .. I 2 2 2 Of. " R 3Rix l' x 2. Then R n_ l . Proof We will use induction in n.. Let us consider the case n = 2.. Let A be an n x n matrix with real entries. .I. . We know that orthogonal transformations preserve the norm.. " of E JR. The case n = I is trivial.1l . Let us now assume that lemma holds for (n . We can find n . 0. the only possibility for a is a = \jXJ +X2 +"'+Xn ' Lemma.I) x (n . " R2RIA is upper triangular._I + x. In other words R n_ l . . Rn _ l . For the n x n matrix A let al be its first column.n are nonnegative. .I elementary rotations (say R I. " R2R)A has the following block triangular form Let us now show that a = Jx~ + xi +'" + x~.. R 3. 0. . . 1l .J (1n2 is (n .. . " O)T E JR... .R2R JA = (~ ~J... and we know that a ~ 0. Let al be the first column of A. then we do not have any choice.1 to the vector (a.' R3R2R x It can be easily checked directly. .1 which transform the vector 1l I (XI' x 2. " Xn_I' an_If E JR. . since we can say that any I x I matrix is of desired form. " R n_l ) in JR.. so there exist n . " R n_ J act in JR. then R lx = (xI' x 2. = (a. " XnI' an_If = (a. .1)/2 such that the matrix B = RN ..
We assumed that lemma holds for n . and all diagonal entries.. say in JR(3. In each figure.1) x (n . R2R2V is upper triangular. We know that an eigenvalue of an orthogonal matrix can either be lorI. So VI = J.. RN such that the matrix VI = RN.. you can try to draw a picture to visualize the vectors. you probably can say what orientation it has. \ \ I Orientation Motivation. . .. Then. and therefore V can be represented as a product of elementary rotations V=R1 R2 ... VI is a diagonal matrix. So. Therefore. xn)' but we can always assume that they act on the whole JR(1l simply by assuming that they do not change the first coordinate. 0. You also probably know some rules to determ. The last diagonal entry can be ±1. R2.2)/2 rotations into the desired upper triangular form. .1) (n . . except may be the last one. the basis b) can be obtained from the standard basis a) by a rotation.218 Structure of Operators in Inner Product Spaces where Al is an (n ..1)(n .••. if you can see a basis.. that these rotations act in JR(n1 (only on the coordinates x 2' x 3' . Proof There exist elementary rotations R 1. Note. and you probably know that bases (a) and (b) have positive orientation. .1)/2 elementary rotations. that the matrix VI is orthogonal. In Figures 3 orthonormal bases in JR(2 and JR(3 respectively. are nonnegative. and orientation of the bases (c) is negative. are 1. Note.. so the matrix A can be transformed into the desired upper triangular form by at most n . RN · Here we use the fact that the inverse of an elementary rotation is an elementary rotation as well. so all the diagonal entries of VI' except may be the last one. and we know that an upper triangular matrix can be normal only if it is diagonal. so we can have only 1 or Ion the diagonal of VI' But. . Any orthogonal matrix is normal..ne the orientation. Since elementary rotations have determinant 1. R2RIA).. while it is impossible to rotate the standard basis to get the basis c) (so that ek goes to vk Vk).2)/2 = n(n . these rotations do not change the vector (a. .1 + (n . like the right hand rule from physics. so AI can be transformed by at most (n . so the last diagonal entry also must be 1. ol (the first column of Rn_ l . But what if you only given coordinates of the vectors VI' V2' V3? Of course. we can conclude that det V\ = det V= 1.. and then to see what the orientation is. You have probably heard the word "orientation" before. are nonnegative.1. we know that all diagonal entries of VI' except may be the last one.1) block.
en' Uek = vk. . We need to check whether it is possible to get a basis VI' V2' v3 in IR3 by rotating the standard basis e l . en in IR n has positive orientation. Standard Bases (c) lR 2 h e) e2 V2 (a) (b) Fig.A' one can use the matrix [J]A. Definition.. Moreover.B in the definition . Note. vn in IR n has positive orientation (i. the same orientation as the standard basis) This equation show that the basis Vl' v 2. that (for 3 x 3 matrices) if det U = 1.B = [I]B. L (a) e) v2 (b) Fig. v 3 • It is an orthogonal matrix (because it transforms an orthonormal basis to an orthonormal basis).. Note. that since ' [J]A. and say that they have dierent orientations if the determinant of [J]B A is negative. Orientation in IR 3 (c) There is unique linear transformation U such that its matrix (in the standard basis) is the matrix with columns VI' v2 . 2. I . Vn is obtained from the standard basis by a rotation. . i. . . how do you "explain" this to a computer? It turns out that there is an easier way.•. Theorems give us the answer: the matrix U is a rotation if and only if det U = 1. if the change of coordinates matrix [J]B. If an orthonormal basis VI' v2. Let us explain it. Let a and b be two bases in a real vector space X. We say that the bases a and b have the same orientation. so we need to see when it is rotation.A has positive determinant. In an abstract space one just needs to fix a basis and declare that its orientation is positive..Structure of Operators in Inner Product Spaces 219 But this is not always easy. This gives us a motivation for the formal definition below.e. We usually assume that the standard basis e l . then U is the composition of a rotation about some axis and a reflection in the plane of rotation.e... k = 1. in the plane orthogonal to this axis. 3. . e2. . e2.
vn(t) is a basis for all t E [a. t E [a.. b] be a continuous family of bases. bn} if there exists a continuous family of bases Vet) = {vIet). v 2(t). one needs to show that the identity matrix J can be continuously transformed through invertible matrices to any matrix B satisfying det B > O. V (b) = B. . an} and B = {bI' b2. Clearly. and let Vet). Proof Suppose the basis A can be continuously transformed to the basis B. b] such that for all t E [a. . .B = det V (b) > 0. vnc!)}. det V (t) is never zero. .. t E [a. . Then. i. which is essential. . if and only if one of the bases can be continuously transformed to the other. that there exists a continuous matrixfunction V (t) on an interval [a. we can conclude that det[1]A... To prove the oppqsite implication. Consider a matrixfunction V (t) whose columns are the coordinate vectors [vit)]A of vi!) in the basis A. b].220 Structure of Operators in Inner Product Spaces Continuous Transformations of Bases and Orientation Definition.1]. b] such that via) = ak . . that performing a change of variables. we can always assume.. ... so the bases A and B have the same orientation. In other words. bn } have the same orientation. . b] the matrix V (t) is invertible and such that V (a) = J. "Continuous family of bases" mean that the vectorfunctions vit) are continuous (their coordinates in some bases are continuous functions) and. Since det V (a) = detl = 1..e.. Theorem.. vit). .. V(b) = [1]A B' Note. .. Note.. the system vI (t). b] = [0. n. the "only if' part of the theorem. that because Vet) is always a basis. if necessary that [a. vib) = bk . Two bases A = {aI' a 2. . .. an} can be continuously transformed to A basis b = {bI' b 2.2 . We say that a basis a = {aI' a2. the Intermediate Value Theorem asserts that det V (a) and det V (b) has the same sign. the entries of V(t) are continuous functions and V(a) = I. k = 1. performing this transformation. .
Chapter 8
Bilinear and Quadratic Forms
Main Definition
Bilinear forms on Rn. A bilinear form on Rn is a function L = L(x, y) oftwo arguments
x, Y
1. 2.
E
jRn
which is linear in each argument, i.e. such that
= aL(xl,y) + ~L(x2'Y); L(x, ay! + ~Y2) = alex, y!) + ~L(x, Y2)'
L(ax!
t
~x2'Y)
One can consider bilinear form whose values belong to an arbitrary vector space, but in this book we only consider forms that take real values. If x = (x!' x 2, ... , xn)T and y = (Y!, Y2' ... , Ynf, a bilinear form can be written as
L(x,y)
=L
j,k=!
n
aj,kxkYj'
or in matrix form
L(x, y)
= (Ax, y)
where
A=
The matrix A is determined uniquely by the bilinear form L. Quadratic forms on jR n • There are several equivalent definition of a quadratic form. One can say that a quadratic form on Rn is the "diagonal" of a bilinear form L, i.e. that any quadratic form Q is defined by Q[x] = L(x, x) = (Ax, x).
222
Bilinear and Quadratic Forms
Another, more algebraic way, is to say that a quadratic form is a homogeneous polynomial of degree 2, i.e. that Q[x] is a polyn~mial of n variables xl' x 2, ... , xn having only terms of degree 2. That means that only terms axi and cX/'k are allowed. There many ways (in fact, infinitely many) to write a quadratic form Q[x] as Q[x] =
2 2
(Ax, x). For example, the quadratic form Q[x] =xl + x2  4xlx2 on ~2 can be represented as (Ax, x) where A can be any of the matrices
1 ' 4 1 ' In fact, any matrix A of form
1 (° 4) (1 0) (1 1).
2 1
(~a a~4)
will work. But if we require the matrix a to be symmetric, then such a matrix is unique: Any quadratic form Q[x] on ~n admits unique representation Q[x] = (Ax, x) where a is A (real) symmetric matrix. For example, for the quadratic form
Q[x]=xl + 3x2 + 5x3 +4xlx216x2x3+7xlX3
on ~3 , the corresponding symmetric matrix A is
2
2
2
[~
8 3.5
~ ;,~J.
5
Quadratic forms on One can also define a quadratic form on (or any complex inner product space) by taking a selfadjoint transformation A = A * and defining Q by Q[x] = (Ax, x). While our main examples will be in Rn, all the theorems are true in the settipg of Cn as well. Bearing this in mind, we will always use a instead of AT
en.
en
Diagonalization of Quadratic Forms
You have probably met quadratic forms before, when you studied second order curves in the plane. May be you even studied the second order surfaces in ~3 • We want to present a unified approach to classification of such objects. Suppose we are given a set in ~ 3 defined by the equation Q[x] = I, where Q is some quadratic form. If Q has some simple form, for example if the corresponding matrix is diagonal, i.e. if
Q[x] = alx~ + a2x~ + ... + anx~, we can easily visualize this set, especially if n = 2, 3. In
Bilinear and Quadratic Forms
223
higher dimensions, it is also possible, if not to visualize, then to understand the structure of the set very well. So, if we are given a general, complicated quadratic form, we want to simplify it as much as possible, for example to make it diagonal. The standard way of doing that is the change of variables. Orthogonal diagonalization. Let us have a quadratic form
= (Ax, x) in IRn. Introduce new variables y = (Y!, Y2' where S is some invertible n x n matrix, so x = Sy.
Q[x]
... ,
ynf E
IR n , with y = SI x,
Then,
Q[x]
= Q[Sy] = (ASy, 'Sy) = (S*ASy,
y),
so in the new variables y the quadratic form has matrix S*AS. So, we want to find an invertible matrix S such that the matrix S*AS is diagonal. Note, that it is dierent from the diagonalization of matrices we had before: we tried to represent a matrix a as A = SDSI, so the matrix D = SlAS is diagonal. However, for orthogonal matrices U, we have U = U!, and we can orthogonally diagonalize symmetric matrices. So we can apply orthogonal diagonalization we studied before to the quadratic forms. Namely, we can represent the matrix a as A = UDU* = UDU!. Recall, that D is a diagonal matrix with eigenvalues of A on the diagonal, and U is the matrix of eigenvectors (we need to pick an orthogoml basis of eigenvectors). Then
D
= U*AU,
so in the variables
y= U! x
the quadratic form has diagonal matrix. Let us analyse the geometric meaning of the orthogonal diagonalization. The columns of the orthogonal matrix Uform an orthonormal basis in IR n , let us call this basis S. The change of coordinate matrix [I]s B from this basis to the standard one is exactly U. We know that ' y = (Y!, Y2' ... , ynf = Ax, so the coordinates Y!'Y2' ... , Yn can be interpreted as coordinates ofthe vector x in the new basis u!' u2, ... , un' So, orthogonal diagonalization allows us to visualize very well the set Q[x] = I, or a similar one, as long as we can visualize it for diagonal matrices.
up u 2, ... , un
Example. Consider the quadratic form of two variables (i.e. quadratic form on IR 2 ), Q(x, y) = 2.x2 +2y2 + 2xy.
Let us describe the set of points (x, The matrix A of Q is
A
yf E
IR 2 satisfying Q(x, y) = 1.
= (~
~}
224
Bilinear and Quadratic Forms
Orthogonally diagonalizing this matrix we can represent it as
A
= (~
~)u*, where U = ~C
~(~ ~)~: D
or, equivalently
U' AU
The set {y : (Dy, y) = I} is the ellipse with halfaxes 11.J3 and 1. Therefore the set
{x
E
IR 2
:
(Ax. x)
~ I}, is the same ellipse only in the basis (Jz, Jz
or, equivalently, the same ellipse, rotated 1t/4. Nonorthogonal diagonalization. Orthogonal diagonalization involves computing eigenvalues and eigenvectors, so it may be dicult to do without computers for n > 2. There is another way of diagonalization based on completion of squares, which is easier to do without computers. Let us again consider the quadratic form of two variables, Q[ x] = 2x{ + 2xlx2 + 2xi (it is the same quadratic form as in the above example, only here we call variables not x, y but x I' x 2 ). Since
2 ( xJ+"2X2
r.(Jz, Jzr'
1 2
1)2
=2 xl +2XI"2X2+4"X2 =2xJ +2XIX2+"2X2
(2
1
1 2)
2
(note, that the first two terms coincide with the first two terms of Q), we get
2xJ +2XIX2 +2xI =2 xl +X2
2
2
(
1)2
2
+X2 =2YI +Y2'
3 2 2
2
3 2 2
whereYI =x I
+ xl +"2 X2 andY2 =x2.
1
The same method can be applied to quadratic form of more than 2 variables. Let us consider, for example, a form Q[x] in ~3 ,
2 2 2
Q[ X] = Xl  6XIX2 + 4XIX3  6X2X3 + 8X2  3X3 . Considering all terms involving the first variable xl (the first 3 terms in this case), let us pick a full square or a multiple of a full square which has exactly these terms (plus some other terms). Since
(Xl  3X2
+ 2X3)
2
= xI
2 '
 6XIX2
+ 4XI X3 12x2 X3 + 9X2 + 4X3
2
2
2
we can rewrite the quadratic form as
(XI 3x2+ 2x3) =X2 6X2 X3+ 7x3·
2 2
Bilinear and Quadratic Forms
225
Note, that the expression xi + 6x2x3 7xi involves only variables x 2 and x 3. Since
(X2  3X3)
2 2 =(X2 2
6X2X3 + 9X3) = x2 + 6X2X3  9X3
2 2
222
we have
X2 + 6X2X3  7X3 =(X2  3X3) + 2x3· Thus we can write the quadratic fonn Q as Q[x](x)3X2+ 2x3) (x2 3x3) + 2X3 =y) +2+23
,2 22222 2
where
Y) = x)  3x2 + 2x3'Y2 = x 2  3x3'Y3 = x 3· There is another way of perfonning nonorthogonal diagonalization of a quadratic fonn. The idea is to perfonn row operations on the matrix a of the quadratic form. The difference with the row reduction (GaussJordan elimination) is that after each row operation we need to perform the same column operation, the reason for that being that we want to make the matrix S*AS diagonal. Let us explain how everything works on an example. Suppose we want to diagonalize a quadratic form with matrix
1 2 1. 311 We augment the matrix A by the identity matrix, and perform on the augmented matrix (All) row/column operations. After each row operation we have to perfonn on the matrix a the same column operation. We get
A
=
1 1 3) (
(

~  ~ ~ ~ ~ ~) + ~ (~
R)
:
! ~ ~ ~1) ~
~
311001
31100
1 0 (3 4
001 ( 1 ( 00 0
~ ! : ~~) ~ [~ ~ ! ~ ~ ~1) ~
0 0 1 3~ ,0 4 8 3 0 1 0
~
1
~~) ~ (~~ ~
1
4R2
4 8 3 0
0 0 24 7 4
~)
o
0 24 7 4
~
~IJ.
. E2EIAEIE2 . So. It is harder to visualize. we performed only row operations on it. The identity 2XIX2 = i[(XI +X2) 122 (Xl X2) ] gives us the idea for the following series of row operations: 1 (0 °11 0)~Rl (01 1/ 21I 1/ 21 0) 1 °1 1 2 (01 11I 1121 O)+Rl (11 1 1 112 (0 1 I 112 11)' I °1112 1) 112 1 ° Nonorthogonal diagonalization gives us a simple description of a set Q[x] = 1 in a nonorthogonal basis. It is quite simple if you know what to do. because we did not need to interchange two rows.. then the representation given by the . We get the diagonal D matrix on the left. so D = S*AS...226 Bilinear and Quadratic Forms Note. As for the identity matrix in the right side. we need something more nontrivial. The corresponding column operation is the right multiplication by the transposed elementary matrix. the simplest idea would be to interchange rows I and 2. = E1 = E.E2. i. If we now denote E* = S we get that S* AS is a diagonal matrix. EN =EAE*. and the matrix S* on the right. so the identity matrix is transformed to EN'" E2El1 ~~ ~J(~1 ~J(~ ~ 1 (° ° ~ J=(~ ~ 1 ~l 1 ° ° =:J. a quadratic form with the matrix A = (~ ~) Ifwe want to diagonalize it by row and column operations. and the matrix E = S is the right half of the transformed augmented matrix. Therefore. but it may be hard to guess the correct row operations.. A row operation is a left multiplication by an elementary matrix. performing row operations E l . 24 7 4 3 Let us explain why the method works. Let us consider. This operation is a bit tricker to perform. that we perform column operations only on the left side of the augmented matrix. for example. In the above example we got lucky. But we also must to perform the same column operation.e. ". interchange columns 1 and 2. EN and the same column operations we transform the matrix A to * * * EN . so we will end up with the same matrix.
. resp.. resp. the famous Silvester's Law ofInertia states that: For a Hermitian matrix A (i. negative. if we got a diagonal matrix D = diag{Al' A2. negative. x) = 0) for all x E E. We will need the following lemma.positive subspace E of dimension 1 +.Bilinear and Quadratic Forms 227 orthogonal diagonalization. The above theorem says that ifr+ is the number of positive diagonal entries of D. resp. we can take a diagonal matrix S=diag{sl. . Let D be a diagonal matrix D = diag{Al' A . etc). zero) diagonal entries of a diagonalization D = S*AS in terms of A. resp.sk 0 and transform D to *' S * DS = diag{s~Al.neutral) subspace. if we are not interested in the details. it does not change the signs of the diagonal entries. for example if it is sucient for us to know that the set is ellipsoid (or hyperboloid. Here we of course assume that S is an invertible matrix. The idea of the proof ofth~ Silvester's Law of Inertia is to express the number of positive (negative. (Ax. there many ways to diagonalize a quadratic form. neutral) if (Ax. .siA2. . A . to emphasize the role of a we will say Apositive (A negative. but it is impossible to find a positive subspace E with dim E > r +.e. x) n on ]R. zero) diagonal entries of D coincides with the maximal dimension of an D . And this is always the case Namely. for a quadratic form Q[x] :::. D .n (changing the numeration) we can *' . Then the number ofpositive (resp.n) we call a subspace E c lR positive (resp. Proof By rearranging the standard basis in ]R. x) > 0 (resp. S~An} This transformation changes the diagonal entries of D.negative. Note. For example. D . zero) diagonal entries of D coincides with the maximal dimension of an Apositive (resp.s2' .sk E ]R. and let D = S*AS be its diagonalization by an invertible matrix S. Given an n x n symmetric matrix A = A * (a quadratic form Q[x] = (Ax.. We will need the following definition.. zero) diagonal entries of D depends only on A. Lemma.. Sometimes. Theorem.. resp.negative.. Aneutral).. then the nonorthogonal diagonalization is an easier way to get the answer. Silvester's Law of Inertia As we discussed above. (Ax. not involving S or D. However. negative. . Let A be an n x n symmetric matrix. which can be considered a particular case of the above theorem. An}. . resp.. x O.neutral) subspace. Definition. then there exists an A . that a resulting diagonal matrix is not unique. x» and any its diagonalization D = S*AS. . the number of positive (negative. but not on a particular choice of diagonalization. sn}. (Ax. A .positive (resp. An}' Then the number of 2 positive (resp. x) < 0. and D is a diagonal one.. However.
. x) = (S* ASx.. we just apply the above reasoning to the matrix D.positive subspace E. i. Therefore. xnf. (Dx. by the definition of P Xl = x 2 = . e2. Since (Dx. Clearly E+ is a Dpositive subspace. But we just proved that Ker T= {O}. . .. x) = (ASx. The case of negative and zero diagonal entries treated similarly.positive subspace (namely SE) of the same dimension. that dim Ker T= 0. and the theorem is proved. . Of.. T is the restriction of the projection P : P is defined on the whole space. er+. and therefore Px = (xl' x2. In other words.positive subspace coincide.. V X E E. x = (Xl' x2. Consider the orthogonal projection P = PE+' For a D . . and vice versa: for any A . 2 for x But X belongs to aD . Indeed.positive subspace E define an operator T: E ~ E+ by Tx = Px. To prove the statement about negative entries.. but we restricted its domain to E and target space to E+.positive subspace F we can find a D . so dim E = rank T ~ dim E+ = r+. .. that Ker T = {O}.positive subspace (namely !)l F) of the same dimension. x) ~ 0 holds only = O.e. dimE = dim SE and dimF = dim !)l F.. Sx) it follows that for any D . xnf E E we have Tx = Px = O.positive and a D . The same identity implies that for any A . 0 .positive subspace E. for any D positive subspace E we can find an A .positive. By the Rank Theorem. Proof Let D = S*AS be a diagonalization of A. . = xr + = 0.228 Bilinear and Quadratic Forms always assume without loss of generality that the positive diagonal entries of D are the first r + diagonal entries. Since Sand !)l are invertible transformations. x r+.. but even simpler. and we use a dierent letter to distinguish it from P. the subspace SE is an Apositive subspace. rank T = dimRan T ~ dimE+ = r+ because Ran T c E+. Note. let for X = (xl' x 2.. The case of zero entries is treated similarly. Let us now apply the Rank Theorem. . and dimE+ = r+. First of all. dimKer T + rank T = dim E. . Let us now show that for any other Dpositive subspace E we have dim E ~ r +.positive subspace F the subspace !)IF is D . . Consider the subspace E+ spanned by the first rf coordinate vectors e l ... Therefore the maximal possible dimensions of a A . . We got an operator acting from E to E+. .x) = L k=r++l n Akxk ~ 0 (Ak ~ Ojork>r+). so the inequality (Dx. Then.
Theorem. So we have proved that conditions imply that A is positive definite.:: 0 for all x. Indefinite if it take both positive and negative values. A symmetric matrix A = A * is called positive definite (negative definite. etc) one does not have to compute eigenvalues. 2.Bilinear and Quadratic Forms 229 Positive Definite Forms Minimax characterization of eigenvalues and the Silvester's criterion of positivity Definition. if there exist vectors xl and x