Peter J. Cameron
Preface
Linear algebra has two aspects. Abstractly, it is the study of vector spaces over
fields, and their linear maps and bilinear forms. Concretely, it is matrix theory:
matrices occur in all parts of mathematics and its applications, and everyone working
in the mathematical sciences and related areas needs to be able to diagonalise
a real symmetric matrix. So in a course of this kind, it is necessary to touch on
both the abstract and the concrete aspects, though applications are not treated in
detail.

On the theoretical side, we deal with vector spaces, linear maps, and bilinear
forms. Vector spaces over a field K are particularly attractive algebraic objects,
since each vector space is completely determined by a single number, its
dimension (unlike groups, for example, whose structure is much more complicated).
Linear maps are the structure-preserving maps or homomorphisms of vector spaces.
On the practical side, the subject is really about one thing: matrices. If we need
to do some calculation with a linear map or a bilinear form, we must represent it
by a matrix. As this suggests, matrices represent several different kinds of things.
In each case, the representation is not unique, since we have the freedom to change
bases in our vector spaces; so many different matrices represent the same object.
This gives rise to several equivalence relations on the set of matrices, summarised
in the following table:
Equivalence        Similarity         Congruence         Orthogonal similarity
Same linear map    Same linear map    Same bilinear      Same self-adjoint
α : V → W          α : V → V          form b on V        α : V → V w.r.t.
                                                         orthonormal basis
A′ = Q^{-1}AP      A′ = P^{-1}AP      A′ = P⊤AP          A′ = P^{-1}AP
P, Q invertible    P invertible       P invertible       P orthogonal
The power of linear algebra in practice stems from the fact that we can choose
bases so as to simplify the form of the matrix representing the object in question.
We will see several such “canonical form theorems” in the notes.
These lecture notes correspond to the course Linear Algebra II, as given at
Queen Mary, University of London, in the first semester 2005–6.
The course description reads as follows:
This module is a mixture of abstract theory, with rigorous proofs, and
concrete calculations with matrices. The abstract component builds
on the notions of subspaces and linear maps to construct the theory
of bilinear forms, i.e. functions of two variables which are linear in
each variable, dual spaces (which consist of linear mappings from the
original space to the underlying field) and determinants. The concrete
applications involve ways to reduce a matrix of some specific type
(such as symmetric or skew-symmetric) to as near diagonal form as
possible.
In other words, students on this course have met the basic concepts of linear
algebra before. Of course, some revision is necessary, and I have tried to make the
notes reasonably self-contained. If you are reading them without the benefit of a
previous course on linear algebra, you will almost certainly have to do some work
filling in the details of arguments which are outlined or skipped over here.
The notes for the prerequisite course, Linear Algebra I, by Dr Francis Wright,
are currently available from
http://centaur.maths.qmul.ac.uk/Lin Alg I/
I have by and large kept to the notation of these notes. For example, a general
field is called K, vectors are represented as column vectors, and linear maps (apart
from zero and the identity) are represented by Greek letters.
I have included in the appendices some extracurricular applications of linear
algebra, including some special determinants, the method for solving a cubic
equation, the proof of the “Friendship Theorem” and the problem of deciding the
winner of a football league, as well as some worked examples.
Peter J. Cameron
September 5, 2008
Contents
1 Vector spaces
  1.1 Definitions
  1.2 Bases
  1.3 Row and column vectors
  1.4 Change of basis
  1.5 Subspaces and direct sums

2 Matrices and determinants
  2.1 Matrix algebra
  2.2 Row and column operations
  2.3 Rank
  2.4 Determinants
  2.5 Calculating determinants
  2.6 The Cayley–Hamilton Theorem

3 Linear maps between vector spaces
  3.1 Definition and basic properties
  3.2 Representation by matrices
  3.3 Change of basis
  3.4 Canonical form revisited

4 Linear maps on a vector space
  4.1 Projections and direct sums
  4.2 Linear maps and matrices
  4.3 Eigenvalues and eigenvectors
  4.4 Diagonalisability
  4.5 Characteristic and minimal polynomials
  4.6 Jordan form
  4.7 Trace

5 Linear and quadratic forms
  5.1 Linear forms and dual space
    5.1.1 Adjoints
    5.1.2 Change of basis
  5.2 Quadratic forms
    5.2.1 Quadratic forms
    5.2.2 Reduction of quadratic forms
    5.2.3 Quadratic and bilinear forms
    5.2.4 Canonical forms for complex and real forms

6 Inner product spaces
  6.1 Inner products and orthonormal bases
  6.2 Adjoints and orthogonal linear maps

7 Symmetric and Hermitian matrices
  7.1 Orthogonal projections and orthogonal decompositions
  7.2 The Spectral Theorem
  7.3 Quadratic forms revisited
  7.4 Simultaneous diagonalisation

8 The complex case
  8.1 Complex inner products
  8.2 The complex Spectral Theorem
  8.3 Normal matrices

9 Skew-symmetric matrices
  9.1 Alternating bilinear forms
  9.2 Skew-symmetric and alternating matrices
  9.3 Complex skew-Hermitian matrices

A Fields and vector spaces
B Vandermonde and circulant matrices
C The Friendship Theorem
D Who is top of the league?
E Other canonical forms
F Worked examples
Chapter 1
Vector spaces
These notes are about linear maps and bilinear forms on vector spaces, how we
represent them by matrices, how we manipulate them, and what we use this for.
1.1 Deﬁnitions
Definition 1.1 A field is an algebraic system consisting of a nonempty set K
equipped with two binary operations + (addition) and · (multiplication) satisfying
the conditions:

(A) (K, +) is an abelian group with identity element 0 (called zero);

(M) (K \ {0}, ·) is an abelian group with identity element 1;

(D) the distributive law

a(b + c) = ab + ac

holds for all a, b, c ∈ K.
If you don’t know what an abelian group is, then you can find it spelled out in
detail in Appendix A. In fact, the only fields that I will use in these notes are

• Q, the field of rational numbers;
• R, the field of real numbers;
• C, the field of complex numbers;
• F_p, the field of integers mod p, where p is a prime number.

I will not stop to prove that these structures really are fields. You may have seen
F_p referred to as Z_p.
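If F_p is unfamiliar, its arithmetic is easy to experiment with. The following Python sketch (my own illustration, not part of the notes) implements addition, multiplication and inverses in F_7; inverses are computed via Fermat's little theorem, which gives a^{p−1} ≡ 1 (mod p) for a ≢ 0, so a^{p−2} is the inverse of a:

```python
p = 7  # any prime number

def add(a, b):
    """Addition in F_p."""
    return (a + b) % p

def mul(a, b):
    """Multiplication in F_p."""
    return (a * b) % p

def inv(a):
    """Multiplicative inverse in F_p via Fermat's little theorem:
    a^(p-1) = 1 (mod p) for a not divisible by p, so a^(p-2) works."""
    assert a % p != 0, "0 has no multiplicative inverse"
    return pow(a, p - 2, p)

# Every nonzero element has an inverse, as condition (M) requires:
assert all(mul(a, inv(a)) == 1 for a in range(1, p))
```

The final assertion checks condition (M) for F_7: every nonzero element has a multiplicative inverse.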
Definition 1.2 A vector space V over a field K is an algebraic system consisting
of a nonempty set V equipped with a binary operation + (vector addition), and
an operation of scalar multiplication

(a, v) ∈ K × V → av ∈ V

such that the following rules hold:

(VA) (V, +) is an abelian group, with identity element 0 (the zero vector).

(VM) Rules for scalar multiplication:

(VM0) For any a ∈ K, v ∈ V, there is a unique element av ∈ V.
(VM1) For any a ∈ K, u, v ∈ V, we have a(u + v) = au + av.
(VM2) For any a, b ∈ K, v ∈ V, we have (a + b)v = av + bv.
(VM3) For any a, b ∈ K, v ∈ V, we have (ab)v = a(bv).
(VM4) For any v ∈ V, we have 1v = v (where 1 is the identity element of K).
Since we have two kinds of elements, namely elements of K and elements of
V, we distinguish them by calling the elements of K scalars and the elements of
V vectors.
A vector space over the ﬁeld R is often called a real vector space, and one
over C is a complex vector space.
Example 1.1 The first example of a vector space that we meet is the Euclidean
plane R^2. This is a real vector space. This means that we can add two vectors, and
multiply a vector by a scalar (a real number). There are two ways we can make
these definitions.

• The geometric definition. Think of a vector as an arrow starting at the origin
and ending at a point of the plane. Then addition of two vectors is done by
the parallelogram law (see Figure 1.1). The scalar multiple av is the vector
whose length is |a| times the length of v, in the same direction if a > 0 and
in the opposite direction if a < 0.
• The algebraic definition. We represent the points of the plane by Cartesian
coordinates (x, y). Thus, a vector v is just a pair (x, y) of real numbers. Now
we define addition and scalar multiplication by

(x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2),
a(x, y) = (ax, ay).
Figure 1.1: The parallelogram law
Not only is this deﬁnition much simpler, but it is much easier to check that
the rules for a vector space are really satisﬁed! For example, we check the
law a(v +w) = av +aw. Let v = (x
1
, y
1
) and w = (x
2
, y
2
). Then we have
a(v +w) = a((x
1
, y
1
) +(x
2
, y
2
)
= a(x
1
+x
2
, y
1
+y
2
)
= (ax
1
+ax
2
, ay
1
+ay
2
)
= (ax
1
, ay
1
) +(ax
2
, ay
2
)
= av +aw.
In the algebraic deﬁnition, we say that the operations of addition and scalar
multiplication are coordinatewise: this means that we add two vectors coordinate
by coordinate, and similarly for scalar multiplication.
Using coordinates, this example can be generalised.
Example 1.2 Let n be any positive integer and K any field. Let V = K^n, the set
of all n-tuples of elements of K. Then V is a vector space over K, where the
operations are defined coordinatewise:

(a_1, a_2, ..., a_n) + (b_1, b_2, ..., b_n) = (a_1 + b_1, a_2 + b_2, ..., a_n + b_n),
c(a_1, a_2, ..., a_n) = (ca_1, ca_2, ..., ca_n).
1.2 Bases
This example is much more general than it appears: Every finite-dimensional
vector space looks like Example 1.2. Here’s why.
Definition 1.3 Let V be a vector space over the field K, and let v_1, ..., v_n be
vectors in V.

(a) The vectors v_1, v_2, ..., v_n are linearly independent if, whenever we have
scalars c_1, c_2, ..., c_n satisfying

c_1 v_1 + c_2 v_2 + ··· + c_n v_n = 0,

then necessarily c_1 = c_2 = ··· = c_n = 0.

(b) The vectors v_1, v_2, ..., v_n are spanning if, for every vector v ∈ V, we can find
scalars c_1, c_2, ..., c_n ∈ K such that

v = c_1 v_1 + c_2 v_2 + ··· + c_n v_n.

In this case, we write V = ⟨v_1, v_2, ..., v_n⟩.

(c) The vectors v_1, v_2, ..., v_n form a basis for V if they are linearly independent
and spanning.
Remark Linear independence is a property of a list of vectors. A list containing
the zero vector is never linearly independent. Also, a list in which the same vector
occurs more than once is never linearly independent.
I will say “Let B = (v_1, ..., v_n) be a basis for V” to mean that the list of vectors
v_1, ..., v_n is a basis, and to refer to this list as B.
Definition 1.4 Let V be a vector space over the field K. We say that V is
finite-dimensional if we can find vectors v_1, v_2, ..., v_n ∈ V which form a basis for V.
Remark In these notes we are only concerned with finite-dimensional vector
spaces. If you study Functional Analysis, Quantum Mechanics, or various other
subjects, you will meet vector spaces which are not finite-dimensional.
Proposition 1.1 The following three conditions are equivalent for the vectors
v_1, ..., v_n of the vector space V over K:

(a) v_1, ..., v_n is a basis;

(b) v_1, ..., v_n is a maximal linearly independent set (that is, if we add any vector
to the list, then the result is no longer linearly independent);

(c) v_1, ..., v_n is a minimal spanning set (that is, if we remove any vector from
the list, then the result is no longer spanning).
The next theorem helps us to understand the properties of linear independence.
Theorem 1.2 (The Exchange Lemma) Let V be a vector space over K. Suppose
that the vectors v_1, ..., v_n are linearly independent, and that the vectors w_1, ..., w_m
are linearly independent, where m > n. Then we can find a number i with 1 ≤ i ≤ m
such that the vectors v_1, ..., v_n, w_i are linearly independent.
In order to prove this, we need a lemma about systems of equations.
Lemma 1.3 Given a system (∗)

a_{11} x_1 + a_{12} x_2 + ··· + a_{1m} x_m = 0,
a_{21} x_1 + a_{22} x_2 + ··· + a_{2m} x_m = 0,
  ⋮
a_{n1} x_1 + a_{n2} x_2 + ··· + a_{nm} x_m = 0

of homogeneous linear equations, where the number n of equations is strictly less
than the number m of variables, there exists a nonzero solution (x_1, ..., x_m) (that
is, x_1, ..., x_m are not all zero).
Proof This is proved by induction on the number of variables. If the coefficients
a_{11}, a_{21}, ..., a_{n1} of x_1 are all zero, then putting x_1 = 1 and the other variables zero
gives a solution. If one of these coefficients is nonzero, then we can use the
corresponding equation to express x_1 in terms of the other variables, obtaining
n − 1 equations in m − 1 variables. By hypothesis, n − 1 < m − 1. So by the
induction hypothesis, these new equations have a nonzero solution. Computing
the value of x_1 gives a solution to the original equations.
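The induction in this proof is effectively an algorithm: eliminate x_1 using an equation in which it appears, solve the smaller system, then back-substitute. The following Python sketch (my own illustration, using exact arithmetic over Q; the function name is mine) follows that recipe:

```python
from fractions import Fraction

def nonzero_solution(A, m):
    """Return a nonzero solution (x_1, ..., x_m) of the homogeneous system
    whose n x m coefficient matrix is A, where n < m, following the
    induction in Lemma 1.3: eliminate x_1, solve, back-substitute."""
    n = len(A)
    if n == 0 or all(row[0] == 0 for row in A):
        # x_1 has coefficient 0 in every equation: take x_1 = 1, rest 0
        return [Fraction(1)] + [Fraction(0)] * (m - 1)
    # pick an equation in which x_1 appears, and use it as the pivot
    k = next(i for i, row in enumerate(A) if row[0] != 0)
    pivot = A[k]
    # eliminate x_1 from the other equations: n-1 equations, m-1 variables
    reduced = [[row[j] - Fraction(row[0]) / pivot[0] * pivot[j]
                for j in range(1, m)]
               for i, row in enumerate(A) if i != k]
    tail = nonzero_solution(reduced, m - 1)
    # back-substitute into the pivot equation to recover x_1
    x1 = -sum(pivot[j] * tail[j - 1] for j in range(1, m)) / Fraction(pivot[0])
    return [x1] + tail

# 2 equations, 3 unknowns: x + 2y + 3z = 0 and 4x + 5y + 6z = 0
assert nonzero_solution([[1, 2, 3], [4, 5, 6]], 3) == [1, -2, 1]
```

The returned solution is nonzero by the same argument as in the proof: the base cases return a vector with first coordinate 1, and the recursive case extends a nonzero solution of the smaller system.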
Now we turn to the proof of the Exchange Lemma. Let us argue for a contradiction,
by assuming that the result is false: that is, assume that none of the vectors
w_i can be added to the list (v_1, ..., v_n) to produce a larger linearly independent list.
This means that, for each i, the list (v_1, ..., v_n, w_i) is linearly dependent. So there
are coefficients c_1, ..., c_n, d, not all zero, such that

c_1 v_1 + ··· + c_n v_n + d w_i = 0.

We cannot have d = 0; for this would mean that we had a linear combination of
v_1, ..., v_n, with coefficients not all zero, equal to zero, contrary to the hypothesis
that these vectors are linearly independent. So we can divide the equation through
by d, and take w_i to the other side, to obtain (changing notation slightly)

w_i = a_{1i} v_1 + a_{2i} v_2 + ··· + a_{ni} v_n = ∑_{j=1}^{n} a_{ji} v_j.
We do this for each value of i = 1, ..., m.

Now take a nonzero solution to the set of equations (∗) above (one exists by
Lemma 1.3, since n < m): that is,

∑_{i=1}^{m} a_{ji} x_i = 0  for j = 1, ..., n.

Multiplying the formula for w_i by x_i and adding, we obtain

x_1 w_1 + ··· + x_m w_m = ∑_{j=1}^{n} ( ∑_{i=1}^{m} a_{ji} x_i ) v_j = 0.

But the coefficients x_1, ..., x_m are not all zero, so this means that the vectors
w_1, ..., w_m are linearly dependent, contrary to the hypothesis that they are
linearly independent.

So the assumption that no w_i can be added to (v_1, ..., v_n) to get a linearly
independent list must be wrong, and the proof is complete.
The Exchange Lemma has some important consequences:
Corollary 1.4 Let V be a finite-dimensional vector space over a field K. Then

(a) any two bases of V have the same number of elements;

(b) any linearly independent set can be extended to a basis.

The number of elements in a basis is called the dimension of the vector space
V. We will say “an n-dimensional vector space” instead of “a finite-dimensional
vector space whose dimension is n”. We denote the dimension of V by dim(V).
Proof Let us see how the corollary follows from the Exchange Lemma.

(a) Let (v_1, ..., v_n) and (w_1, ..., w_m) be two bases for V. Suppose, for a
contradiction, that they have different numbers of elements; say that n < m, without
loss of generality. Both lists of vectors are linearly independent; so, according to
the Exchange Lemma, we can add some vector w_i to the first list to get a larger
linearly independent list. This means that v_1, ..., v_n was not a maximal linearly
independent set, and so (by Proposition 1.1) not a basis, contradicting our
assumption. We conclude that m = n, as required.

(b) Let (v_1, ..., v_n) be linearly independent and let (w_1, ..., w_m) be a basis.
Necessarily n ≤ m, since otherwise we could add one of the v's to (w_1, ..., w_m) to
get a larger linearly independent set, contradicting maximality. But now we can
add some w's to (v_1, ..., v_n) until we obtain a basis.
Remark We allow the possibility that a vector space has dimension zero. Such
a vector space contains just one vector, the zero vector 0; a basis for this vector
space consists of the empty set.
Now let V be an n-dimensional vector space over K. This means that there is a
basis v_1, v_2, ..., v_n for V. Since this list of vectors is spanning, every vector v ∈ V
can be expressed as

v = c_1 v_1 + c_2 v_2 + ··· + c_n v_n

for some scalars c_1, c_2, ..., c_n ∈ K. The scalars c_1, ..., c_n are the coordinates of
v (with respect to the given basis), and the coordinate representation of v is the
n-tuple

(c_1, c_2, ..., c_n) ∈ K^n.
Now the coordinate representation is unique. For suppose that we also had

v = c′_1 v_1 + c′_2 v_2 + ··· + c′_n v_n

for scalars c′_1, c′_2, ..., c′_n. Subtracting these two expressions, we obtain

0 = (c_1 − c′_1)v_1 + (c_2 − c′_2)v_2 + ··· + (c_n − c′_n)v_n.

Now the vectors v_1, v_2, ..., v_n are linearly independent; so this equation implies
that c_1 − c′_1 = 0, c_2 − c′_2 = 0, ..., c_n − c′_n = 0; that is,

c_1 = c′_1, c_2 = c′_2, ..., c_n = c′_n.
Now it is easy to check that, when we add two vectors in V, we add their
coordinate representations in K^n (using coordinatewise addition); and when we
multiply a vector v ∈ V by a scalar c, we multiply its coordinate representation
by c. In other words, addition and scalar multiplication in V translate to the same
operations on their coordinate representations. This is why we only need to
consider vector spaces of the form K^n, as in Example 1.2.

Here is how the result would be stated in the language of abstract algebra:

Theorem 1.5 Any n-dimensional vector space over a field K is isomorphic to the
vector space K^n.
1.3 Row and column vectors
The elements of the vector space K^n are all the n-tuples of scalars from the field
K. There are two different ways that we can represent an n-tuple: as a row, or as
a column. Thus, the vector with components 1, 2 and −3 can be represented as a
row vector

[ 1 2 −3 ]

or as a column vector

[  1 ]
[  2 ]
[ −3 ].
(Note that we use square brackets, rather than round brackets or parentheses. But
you will see the notation (1, 2, −3) and the equivalent for columns in other books!)

Both systems are in common use, and you should be familiar with both. The
choice of row or column vectors makes some technical differences in the
statements of the theorems, so care is needed.
There are arguments for and against both systems. Those who prefer row
vectors would argue that we already use (x, y) or (x, y, z) for the coordinates of
a point in 2- or 3-dimensional Euclidean space, so we should use the same for
vectors. The most powerful argument will appear when we consider representing
linear maps by matrices.
Those who prefer column vectors point to the convenience of representing,
say, the linear equations

2x + 3y = 5,
4x + 5y = 9

in matrix form

[ 2 3 ] [ x ]   [ 5 ]
[ 4 5 ] [ y ] = [ 9 ].
Statisticians also prefer column vectors: to a statistician, a vector often represents
data from an experiment, and data are usually recorded in columns on a datasheet.
I will use column vectors in these notes. So we make a formal deﬁnition:
Definition 1.5 Let V be a vector space with a basis B = (v_1, v_2, ..., v_n). If
v = c_1 v_1 + c_2 v_2 + ··· + c_n v_n, then the coordinate representation of v relative to the
basis B is

[v]_B = [ c_1 ]
        [ c_2 ]
        [  ⋮  ]
        [ c_n ].

In order to save space on the paper, we often write this as

[v]_B = [ c_1 c_2 ... c_n ]⊤.

The symbol ⊤ is read “transpose”.
1.4 Change of basis
The coordinate representation of a vector is always relative to a basis. We now
have to look at how the representation changes when we use a different basis.
Definition 1.6 Let B = (v_1, ..., v_n) and B′ = (v′_1, ..., v′_n) be bases for the
n-dimensional vector space V over the field K. The transition matrix P from B to B′
is the n × n matrix whose jth column is the coordinate representation [v′_j]_B of the
jth vector of B′ relative to B. If we need to specify the bases, we write P_{B,B′}.
Proposition 1.6 Let B and B′ be bases for the n-dimensional vector space V over
the field K. Then, for any vector v ∈ V, the coordinate representations of v with
respect to B and B′ are related by

[v]_B = P [v]_{B′}.
Proof Let p_{ij} be the (i, j) entry of the matrix P. By definition, we have

v′_j = ∑_{i=1}^{n} p_{ij} v_i.

Take an arbitrary vector v ∈ V, and let

[v]_B = [c_1, ..., c_n]⊤,  [v]_{B′} = [d_1, ..., d_n]⊤.

This means, by definition, that

v = ∑_{i=1}^{n} c_i v_i = ∑_{j=1}^{n} d_j v′_j.

Substituting the formula for v′_j into the second equation, we have

v = ∑_{j=1}^{n} d_j ∑_{i=1}^{n} p_{ij} v_i.

Reversing the order of summation, we get

v = ∑_{i=1}^{n} ( ∑_{j=1}^{n} p_{ij} d_j ) v_i.

Now we have two expressions for v as a linear combination of the vectors v_i. By
the uniqueness of the coordinate representation, they are the same: that is,

c_i = ∑_{j=1}^{n} p_{ij} d_j.
In matrix form, this says

[ c_1 ]     [ d_1 ]
[  ⋮  ] = P [  ⋮  ],
[ c_n ]     [ d_n ]

or in other words

[v]_B = P [v]_{B′},

as required.
In this course, we will see four ways in which matrices arise in linear algebra.
Here is the first occurrence: matrices arise as transition matrices between bases
of a vector space.

The next corollary summarises how transition matrices behave. Here I denotes
the identity matrix, the matrix having 1s on the main diagonal and 0s everywhere
else. Given a matrix P, we denote by P^{-1} the inverse of P, the matrix Q satisfying
PQ = QP = I. Not every matrix has an inverse: we say that P is invertible or
non-singular if it has an inverse.
Corollary 1.7 Let B, B′, B″ be bases of the vector space V.

(a) P_{B,B} = I.

(b) P_{B′,B} = (P_{B,B′})^{-1}.

(c) P_{B,B″} = P_{B,B′} P_{B′,B″}.
This follows from the preceding Proposition. For example, for (b) we have

[v]_B = P_{B,B′} [v]_{B′},  [v]_{B′} = P_{B′,B} [v]_B,

so

[v]_B = P_{B,B′} P_{B′,B} [v]_B.

By the uniqueness of the coordinate representation, we have P_{B,B′} P_{B′,B} = I.
Corollary 1.8 The transition matrix between any two bases of a vector space is
invertible.
This follows immediately from (b) of the preceding Corollary.
Remark We see that, to express the coordinate representation w.r.t. the new
basis in terms of that w.r.t. the old one, we need the inverse of the transition matrix:

[v]_{B′} = (P_{B,B′})^{-1} [v]_B.
Example Consider the vector space R^2, with the two bases

B = ( [1, 0]⊤, [0, 1]⊤ ),  B′ = ( [1, 1]⊤, [2, 3]⊤ ).

The transition matrix is

P_{B,B′} = [ 1 2 ]
           [ 1 3 ],

whose inverse is calculated to be

P_{B′,B} = [  3 −2 ]
           [ −1  1 ].

So the theorem tells us that, for any x, y ∈ R, we have

[x, y]⊤ = x [1, 0]⊤ + y [0, 1]⊤ = (3x − 2y) [1, 1]⊤ + (−x + y) [2, 3]⊤,

as is easily checked.
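The check can also be automated. This Python sketch (my own illustration; the helper names are mine, and it assumes integer entries) inverts the 2 × 2 transition matrix by the adjugate formula and reconstructs a sample vector from its B′-coordinates:

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of an integer 2x2 matrix [[a, b], [c, d]] via the adjugate."""
    (a, b), (c, d) = M
    det = a * d - b * c
    assert det != 0, "matrix is not invertible"
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def mat_vec(M, v):
    """Matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# Transition matrix from B to B' in the example, and its inverse
P = [[1, 2], [1, 3]]
Q = inverse_2x2(P)          # should equal [[3, -2], [-1, 1]]

# Coordinates of (x, y) w.r.t. B' are Q [x, y]^T = (3x - 2y, -x + y)
x, y = 5, 7
d = mat_vec(Q, [x, y])
# Reconstruct the vector from the B' basis vectors (1, 1) and (2, 3):
v = [d[0] * 1 + d[1] * 2, d[0] * 1 + d[1] * 3]
assert v == [x, y]
```

This is exactly the Remark in action: passing to the new basis uses the inverse of the transition matrix.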
1.5 Subspaces and direct sums
Deﬁnition 1.7 A nonempty subset of a vector space is called a subspace if it
contains the sum of any two of its elements and any scalar multiple of any of its
elements. We write U ≤V to mean “U is a subspace of V”.
A subspace of a vector space is a vector space in its own right.
Subspaces can be constructed in various ways:

(a) Let v_1, ..., v_n ∈ V. The span of (v_1, ..., v_n) is the set

{ c_1 v_1 + c_2 v_2 + ··· + c_n v_n : c_1, ..., c_n ∈ K }.

This is a subspace of V. Moreover, (v_1, ..., v_n) is a spanning set in this
subspace. We denote the span of v_1, ..., v_n by ⟨v_1, ..., v_n⟩.

(b) Let U_1 and U_2 be subspaces of V. Then

– the intersection U_1 ∩ U_2 is the set of all vectors belonging to both U_1
and U_2;

– the sum U_1 + U_2 is the set { u_1 + u_2 : u_1 ∈ U_1, u_2 ∈ U_2 } of all sums of
vectors from the two subspaces.

Both U_1 ∩ U_2 and U_1 + U_2 are subspaces of V.
The next result summarises some properties of these subspaces. Proofs are left
to the reader.
Proposition 1.9 Let V be a vector space over K.

(a) For any v_1, ..., v_n ∈ V, the dimension of ⟨v_1, ..., v_n⟩ is at most n, with
equality if and only if v_1, ..., v_n are linearly independent.

(b) For any two subspaces U_1 and U_2 of V, we have

dim(U_1 ∩ U_2) + dim(U_1 + U_2) = dim(U_1) + dim(U_2).
An important special case occurs when U_1 ∩ U_2 is the zero subspace {0}. In
this case, the sum U_1 + U_2 has the property that each of its elements has a unique
expression in the form u_1 + u_2, for u_1 ∈ U_1 and u_2 ∈ U_2. For suppose that we had
two different expressions for a vector v, say

v = u_1 + u_2 = u′_1 + u′_2,  u_1, u′_1 ∈ U_1,  u_2, u′_2 ∈ U_2.

Then

u_1 − u′_1 = u′_2 − u_2.

But u_1 − u′_1 ∈ U_1, and u′_2 − u_2 ∈ U_2; so this vector is in U_1 ∩ U_2, and by hypothesis
it is equal to 0, so that u_1 = u′_1 and u_2 = u′_2; that is, the two expressions are
not different after all! In this case we say that U_1 + U_2 is the direct sum of the
subspaces U_1 and U_2, and write it as U_1 ⊕ U_2. Note that

dim(U_1 ⊕ U_2) = dim(U_1) + dim(U_2).
The notion of direct sum extends to more than two summands, but is a little
complicated to describe. We state a form which is sufficient for our purposes.

Definition 1.8 Let U_1, ..., U_r be subspaces of the vector space V. We say that V
is the direct sum of U_1, ..., U_r, and write

V = U_1 ⊕ ··· ⊕ U_r,

if every vector v ∈ V can be written uniquely in the form v = u_1 + ··· + u_r with
u_i ∈ U_i for i = 1, ..., r.

Proposition 1.10 If V = U_1 ⊕ ··· ⊕ U_r, then

(a) dim(V) = dim(U_1) + ··· + dim(U_r);

(b) if B_i is a basis for U_i for i = 1, ..., r, then B_1 ∪ ··· ∪ B_r is a basis for V.
Chapter 2
Matrices and determinants
You have certainly seen matrices before; indeed, we met some in the ﬁrst chapter
of the notes. Here we revise matrix algebra, consider row and column operations
on matrices, and deﬁne the rank of a matrix. Then we deﬁne the determinant of
a square matrix axiomatically and prove that it exists (that is, there is a unique
“determinant” function satisfying the rules we lay down), and give some methods
of calculating it and some of its properties. Finally we prove the Cayley–Hamilton
Theorem: every matrix satisﬁes its own characteristic equation.
2.1 Matrix algebra
Definition 2.1 A matrix of size m × n over a field K, where m and n are positive
integers, is an array with m rows and n columns, where each entry is an element
of K. For 1 ≤ i ≤ m and 1 ≤ j ≤ n, the entry in row i and column j of A is denoted
by A_{ij}, and referred to as the (i, j) entry of A.

Example 2.1 A column vector in K^n can be thought of as an n × 1 matrix, while a
row vector is a 1 × n matrix.

Definition 2.2 We define addition and multiplication of matrices as follows.

(a) Let A and B be matrices of the same size m × n over K. Then the sum A + B
is defined by adding corresponding entries:

(A + B)_{ij} = A_{ij} + B_{ij}.

(b) Let A be an m × n matrix and B an n × p matrix over K. Then the product
AB is the m × p matrix whose (i, j) entry is obtained by multiplying each
element in the ith row of A by the corresponding element in the jth column
of B and summing:

(AB)_{ij} = ∑_{k=1}^{n} A_{ik} B_{kj}.
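The sum-over-k formula translates directly into code. As a short Python sketch (my own illustration, not from the notes):

```python
def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B:
    (AB)_ij = sum over k of A_ik * B_kj."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "sizes must be compatible"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

# A 2x3 matrix times a 3x2 matrix gives a 2x2 matrix:
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [1, 1]]
C = mat_mul(A, B)
assert C == [[4, 5], [10, 11]]
```

Note that the size condition in (b) is exactly what the assertion enforces: the number of columns of A must match the number of rows of B.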
Remark Note that we can only add or multiply matrices if their sizes satisfy
appropriate conditions. In particular, for a fixed value of n, we can add and
multiply n × n matrices. It turns out that the set M_n(K) of n × n matrices over K is
a ring with identity: this means that it satisfies conditions (A0)–(A4), (M0)–(M2)
and (D) of Appendix 1. The zero matrix, which we denote by O, is the matrix
with every entry zero, while the identity matrix, which we denote by I, is the
matrix with entries 1 on the main diagonal and 0 everywhere else. Note that matrix
multiplication is not commutative: BA is usually not equal to AB.

We already met matrix multiplication in Section 1 of the notes: recall that if
P_{B,B′} denotes the transition matrix between two bases of a vector space, then

P_{B,B′} P_{B′,B″} = P_{B,B″}.
2.2 Row and column operations
Given an m × n matrix A over a field K, we define certain operations on A called
row and column operations.

Definition 2.3 Elementary row operations There are three types:

Type 1 Add a multiple of the jth row to the ith, where j ≠ i.

Type 2 Multiply the ith row by a nonzero scalar.

Type 3 Interchange the ith and jth rows, where j ≠ i.

Elementary column operations There are three types:

Type 1 Add a multiple of the jth column to the ith, where j ≠ i.

Type 2 Multiply the ith column by a nonzero scalar.

Type 3 Interchange the ith and jth columns, where j ≠ i.

By applying these operations, we can reduce any matrix to a particularly simple
form:
Theorem 2.1 Let A be an m × n matrix over the field K. Then it is possible to
change A into B by elementary row and column operations, where B is a matrix
of the same size satisfying B_{ii} = 1 for 1 ≤ i ≤ r, where r ≤ min{m, n}, and all other
entries of B are zero.

If A can be reduced to two matrices B and B′, both of the above form, where
the numbers of nonzero elements are r and r′ respectively, by different sequences
of elementary operations, then r = r′, and so B = B′.
Definition 2.4 The number r in the above theorem is called the rank of A; while
a matrix of the form described for B is said to be in the canonical form for
equivalence. We can write the canonical form matrix in “block form” as

B = [ I_r O ]
    [ O  O ],

where I_r is an r × r identity matrix and O denotes a zero matrix of the appropriate
size (that is, r × (n − r), (m − r) × r, and (m − r) × (n − r) respectively for the three
Os). Note that some or all of these Os may be missing: for example, if r = m, we
just have [ I_m O ].
Proof We outline the proof that the reduction is possible. To prove that we al
ways get the same value of r, we need a different argument.
The proof is by induction on the size of the matrix A: in other words, we
assume as inductive hypothesis that any smaller matrix can be reduced as in the
theorem. Let the matrix A be given. We proceed in steps as follows:
• If A = O (the allzero matrix), then the conclusion of the theorem holds,
with r = 0; no reduction is required. So assume that A = O.
• If A
11
= 0, then skip this step. If A
11
= 0, then there is a nonzero element
A
i j
somewhere in A; by swapping the ﬁrst and ith rows, and the ﬁrst and jth
columns, if necessary (Type 3 operations), we can bring this entry into the
(1, 1) position.
• Now we can assume that A_11 ≠ 0. Multiplying the first row by A_11^{-1} (row operation Type 2), we obtain a matrix with A_11 = 1.
• Now by row and column operations of Type 1, we can assume that all the other elements in the first row and column are zero. For if A_1j ≠ 0, then subtracting A_1j times the first column from the jth gives a matrix with A_1j = 0. Repeat this until all nonzero elements have been removed.
• Now let B be the matrix obtained by deleting the first row and column of A. Then B is smaller than A and so, by the inductive hypothesis, we can reduce B to canonical form by elementary row and column operations. The same sequence of operations applied to A now finishes the job.
Example 2.2 Here is a small example. Let

    A = [ 1 2 3 ]
        [ 4 5 6 ] .

We have A_11 = 1, so we can skip the first three steps. Subtracting twice the first column from the second, and three times the first column from the third, gives the matrix

    [ 1  0  0 ]
    [ 4 −3 −6 ] .

Now subtracting four times the first row from the second gives

    [ 1  0  0 ]
    [ 0 −3 −6 ] .

From now on, we have to operate on the smaller matrix [ −3 −6 ], but we continue to apply the operations to the large matrix.

Multiply the second row by −1/3 to get

    [ 1 0 0 ]
    [ 0 1 2 ] .

Now subtract twice the second column from the third to obtain

    [ 1 0 0 ]
    [ 0 1 0 ] .

We have finished the reduction, and we conclude that the rank of the original matrix A is equal to 2.
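The reduction algorithm in the proof can be sketched in code. This is an illustrative implementation (the name `canonical_form` is ours), working over the rationals so the arithmetic is exact:

```python
from fractions import Fraction

def canonical_form(A):
    """Reduce A to the canonical form for equivalence by elementary
    row and column operations; return (B, rank)."""
    B = [[Fraction(x) for x in row] for row in A]
    m, n = len(B), len(B[0])
    r = 0                                   # rows/columns already processed
    while r < min(m, n):
        # Find a nonzero entry in the untouched lower-right block.
        pivot = next(((i, j) for i in range(r, m) for j in range(r, n)
                      if B[i][j] != 0), None)
        if pivot is None:
            break                           # the rest of the matrix is zero
        i, j = pivot
        # Type 3: bring the pivot to position (r, r).
        B[r], B[i] = B[i], B[r]
        for row in B:
            row[r], row[j] = row[j], row[r]
        # Type 2: scale the pivot row so the pivot becomes 1.
        B[r] = [x / B[r][r] for x in B[r]]
        # Type 1: clear the rest of the pivot column, then the pivot row.
        for i in range(m):
            if i != r and B[i][r] != 0:
                c = B[i][r]
                B[i] = [x - c * y for x, y in zip(B[i], B[r])]
        for j in range(r + 1, n):
            c = B[r][j]
            for i in range(m):
                B[i][j] -= c * B[i][r]
        r += 1
    return B, r

B, rank = canonical_form([[1, 2, 3], [4, 5, 6]])
print(rank)                                  # 2
print([[int(x) for x in row] for row in B])  # [[1, 0, 0], [0, 1, 0]]
```

The example matrix above indeed reduces to [ I_2  O ] with rank 2.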
We finish this section by describing the elementary row and column operations in a different way.

For each elementary row operation on an m × n matrix A, we define the corresponding elementary matrix by applying the same operation to the m × m identity matrix I. Similarly, we represent elementary column operations by elementary matrices obtained by applying the same operations to the n × n identity matrix.
We don't have to distinguish between rows and columns for our elementary matrices. For example, the matrix

    [ 1 2 0 ]
    [ 0 1 0 ]
    [ 0 0 1 ]

corresponds to the elementary column operation of adding twice the first column to the second, or to the elementary row operation of adding twice the second row to the first. For the other types, the matrices for row operations and column operations are identical.
Lemma 2.2 The effect of an elementary row operation on a matrix is the same as
that of multiplying on the left by the corresponding elementary matrix. Similarly,
the effect of an elementary column operation is the same as that of multiplying on
the right by the corresponding elementary matrix.
The proof of this lemma is a somewhat tedious calculation.
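The lemma is easy to check on the running example (a small numpy sketch; the matrices are our own choices):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

# Elementary matrix for the row operation "add twice row 2 to row 1"
# (applied on the left, so it is 2 x 2):
E_row = np.array([[1, 2],
                  [0, 1]])
print(E_row @ A)        # row 1 becomes [1+8, 2+10, 3+12] = [9, 12, 15]

# Elementary matrix for the column operation "add twice column 1 to
# column 2" (applied on the right, so it is 3 x 3):
E_col = np.array([[1, 2, 0],
                  [0, 1, 0],
                  [0, 0, 1]])
print(A @ E_col)        # column 2 becomes [2+2, 4+8+1? no: [1+... ] -> [4, 13]
```

Note that the same array `E_col` would act as a row operation if multiplied on the left, exactly as the text observes.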
Example 2.3 We continue our previous example. In order, here is the list of elementary matrices corresponding to the operations we applied to A. (Here 2 × 2 matrices represent row operations, while 3 × 3 matrices represent column operations.)

    [ 1 −2 0 ]   [ 1 0 −3 ]
    [ 0  1 0 ] , [ 0 1  0 ] ,
    [ 0  0 1 ]   [ 0 0  1 ]

    [  1 0 ]   [ 1  0   ]   [ 1 0  0 ]
    [ −4 1 ] , [ 0 −1/3 ] , [ 0 1 −2 ] .
                            [ 0 0  1 ]
So the whole process can be written as a matrix equation:

    [ 1  0   ] [  1 0 ]     [ 1 −2 0 ] [ 1 0 −3 ] [ 1 0  0 ]
    [ 0 −1/3 ] [ −4 1 ]  A  [ 0  1 0 ] [ 0 1  0 ] [ 0 1 −2 ]  =  B,
                            [ 0  0 1 ] [ 0 0  1 ] [ 0 0  1 ]

or more simply

    [ 1    0   ]     [ 1 −2  1 ]
    [ 4/3 −1/3 ]  A  [ 0  1 −2 ]  =  B,
                     [ 0  0  1 ]

where, as before,

    A = [ 1 2 3 ]       B = [ 1 0 0 ]
        [ 4 5 6 ] ,         [ 0 1 0 ] .
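The simplified equation is easy to verify numerically (a quick numpy check):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
# Product of the two row-operation matrices:
P = np.array([[1, 0], [4/3, -1/3]])
# Product of the three column-operation matrices:
Q = np.array([[1, -2, 1], [0, 1, -2], [0, 0, 1]])

print(P @ A @ Q)   # [[1. 0. 0.]
                   #  [0. 1. 0.]]
```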
An important observation about the elementary operations is that each of them can have its effect undone by another elementary operation of the same kind, and hence every elementary matrix is invertible, with its inverse being another elementary matrix of the same kind. For example, the effect of adding twice the first row to the second is undone by adding −2 times the first row to the second, so that

    [ 1 2 ]^{-1}   [ 1 −2 ]
    [ 0 1 ]      = [ 0  1 ] .
Since the product of invertible matrices is invertible, we can state the above theorem in a more concise form. First, one more definition:
Definition 2.5 The m × n matrices A and B are said to be equivalent if B = PAQ, where P and Q are invertible matrices of sizes m × m and n × n respectively.
Theorem 2.3 Given any m × n matrix A, there exist invertible matrices P and Q of sizes m × m and n × n respectively, such that PAQ is in the canonical form for equivalence.
Remark The relation "equivalence" defined above is an equivalence relation on the set of all m × n matrices; that is, it is reflexive, symmetric and transitive.
When mathematicians talk about a "canonical form" for an equivalence relation, they mean a set of objects which are representatives of the equivalence classes: that is, every object is equivalent to a unique object in the canonical form. We have shown this for the relation of equivalence defined earlier, except for the uniqueness of the canonical form. This is our job for the next section.
2.3 Rank
We have the unfinished business of showing that the rank of a matrix is well-defined; that is, no matter how we do the row and column reduction, we end up with the same canonical form. We do this by defining two further kinds of rank, and proving that all three are the same.
Definition 2.6 Let A be an m × n matrix over a field K. We say that the column rank of A is the maximum number of linearly independent columns of A, while the row rank of A is the maximum number of linearly independent rows of A. (We regard columns or rows as vectors in K^m and K^n respectively.)
Now we need a lemma with four parts.
Lemma 2.4 (a) Elementary column operations don’t change the column rank
of a matrix.
(b) Elementary row operations don’t change the column rank of a matrix.
(c) Elementary column operations don’t change the row rank of a matrix.
(d) Elementary row operations don’t change the row rank of a matrix.
Proof (a) This is clear for Type 3 operations, which just rearrange the vectors.
For Types 1 and 2, we have to show that such an operation cannot take a linearly
independent set to a linearly dependent set; the vice versa statement holds because
the inverse of an elementary operation is another operation of the same kind.
So suppose that v_1, . . . , v_n are linearly independent. Consider a Type 1 operation involving adding c times the jth column to the ith; the new columns are v′_1, . . . , v′_n, where v′_k = v_k for k ≠ i, while v′_i = v_i + c v_j. Suppose that the new vectors are linearly dependent. Then there are scalars a_1, . . . , a_n, not all zero, such that

    0 = a_1 v′_1 + ⋯ + a_n v′_n
      = a_1 v_1 + ⋯ + a_i (v_i + c v_j) + ⋯ + a_j v_j + ⋯ + a_n v_n
      = a_1 v_1 + ⋯ + a_i v_i + ⋯ + (a_j + c a_i) v_j + ⋯ + a_n v_n.

Since v_1, . . . , v_n are linearly independent, we conclude that

    a_1 = 0, . . . , a_i = 0, . . . , a_j + c a_i = 0, . . . , a_n = 0,

from which we see that all the a_k are zero, contrary to assumption. So the new columns are linearly independent.
The argument for Type 2 operations is similar but easier.
(b) It is easily checked that, if an elementary row operation is applied, then the
new vectors satisfy exactly the same linear relations as the old ones (that is, the
same linear combinations are zero). So the linearly independent sets of vectors
don’t change at all.
(c) Same as (b), but applied to rows.
(d) Same as (a), but applied to rows.
Theorem 2.5 For any matrix A, the row rank, the column rank, and the rank are all equal. In particular, the rank is independent of the row and column operations used to compute it.

Proof Suppose that we reduce A to canonical form B by elementary operations, where B has rank r. These elementary operations don't change the row or column rank, by our lemma; so the row ranks of A and B are equal, and their column ranks are equal. But it is trivial to see that, if

    B = [ I_r  O ]
        [ O    O ] ,

then the row and column ranks of B are both equal to r. So the theorem is proved.
We can get an extra piece of information from our deliberations. Let A be an invertible n × n matrix. Then the canonical form of A is just I_n: its rank is equal to n. This means that there are matrices P and Q, each a product of elementary matrices, such that

    P A Q = I_n.
From this we deduce that

    A = P^{-1} I_n Q^{-1} = P^{-1} Q^{-1};

in other words,
Corollary 2.6 Every invertible square matrix is a product of elementary matrices.
In fact, we learn a little bit more. We observed, when we defined elementary matrices, that they can represent either elementary column operations or elementary row operations. So, when we have written A as a product of elementary matrices, we can choose to regard them as representing column operations, and we see that A can be obtained from the identity by applying elementary column operations. If we now apply the inverse operations in the other order, they will turn A into the identity (which is its canonical form). In other words, the following is true:
Corollary 2.7 If A is an invertible n × n matrix, then A can be transformed into the identity matrix by elementary column operations alone (or by elementary row operations alone).
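The corollary can be demonstrated with a Gauss–Jordan sketch using row operations alone (the function name and the encoding of the operations are our own):

```python
from fractions import Fraction

def reduce_to_identity(A):
    """Reduce an invertible square matrix to I by elementary row
    operations only, returning the list of operations performed."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    ops = []
    for j in range(n):
        # Type 3: find a row with a nonzero entry in column j, swap it up.
        i = next(i for i in range(j, n) if M[i][j] != 0)
        if i != j:
            M[i], M[j] = M[j], M[i]
            ops.append(("swap", i, j))
        # Type 2: scale the pivot row to make the pivot 1.
        if M[j][j] != 1:
            ops.append(("scale", j, M[j][j]))
            M[j] = [x / M[j][j] for x in M[j]]
        # Type 1: clear the rest of column j.
        for i in range(n):
            if i != j and M[i][j] != 0:
                c = M[i][j]
                ops.append(("add", i, j, -c))
                M[i] = [x - c * y for x, y in zip(M[i], M[j])]
    return M, ops

M, ops = reduce_to_identity([[0, 1], [-2, 3]])
print([[int(x) for x in row] for row in M])   # [[1, 0], [0, 1]]
```

Inverting each recorded operation, in reverse order, expresses the original matrix as a product of elementary matrices, as in Corollary 2.6.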
2.4 Determinants
The determinant is a function deﬁned on square matrices; its value is a scalar.
It has some very important properties: perhaps most important is the fact that a
matrix is invertible if and only if its determinant is not equal to zero.
We denote the determinant function by det, so that det(A) is the determinant of A. For a matrix written out as an array, the determinant is denoted by replacing the square brackets by vertical bars:

    det [ 1 2 ]   | 1 2 |
        [ 3 4 ] = | 3 4 | .
You have met determinants in earlier courses, and you know the formula for the determinant of a 2 × 2 or 3 × 3 matrix:

    | a b |
    | c d | = ad − bc,

    | a b c |
    | d e f | = aei + bfg + cdh − afh − bdi − ceg.
    | g h i |
Our first job is to define the determinant for square matrices of any size. We do this in an "axiomatic" manner:
Definition 2.7 A function D defined on n × n matrices is a determinant if it satisfies the following three conditions:

(D1) For 1 ≤ i ≤ n, D is a linear function of the ith column: this means that, if A and A′ are two matrices which agree everywhere except the ith column, and if A″ is the matrix whose ith column is c times the ith column of A plus c′ times the ith column of A′, but agreeing with A and A′ everywhere else, then

    D(A″) = cD(A) + c′D(A′).

(D2) If A has two equal columns, then D(A) = 0.

(D3) D(I_n) = 1, where I_n is the n × n identity matrix.
We show the following result:
Theorem 2.8 There is a unique determinant function on n × n matrices, for any n.
Proof First, we show that applying elementary column operations to A has a well-defined effect on D(A).

(a) If B is obtained from A by adding c times the jth column to the ith, then D(B) = D(A).

(b) If B is obtained from A by multiplying the ith column by a nonzero scalar c, then D(B) = cD(A).

(c) If B is obtained from A by interchanging two columns, then D(B) = −D(A).
For (a), let A′ be the matrix which agrees with A in all columns except the ith, whose ith column is equal to the jth column of A. By rule (D2), D(A′) = 0. By rule (D1),

    D(B) = D(A) + cD(A′) = D(A).
Part (b) follows immediately from rule (D1).
To prove part (c), we observe that we can interchange the ith and jth columns
by the following sequence of operations:
• add the ith column to the jth;
• multiply the ith column by −1;
• add the jth column to the ith;
• subtract the ith column from the jth.
In symbols,

    (c_i, c_j) → (c_i, c_j + c_i) → (−c_i, c_j + c_i) → (c_j, c_j + c_i) → (c_j, c_i).

The first, third and fourth steps don't change the value of D, while the second multiplies it by −1.
Now we take the matrix A and apply elementary column operations to it, keeping track of the factors by which D gets multiplied according to rules (a)–(c). The overall effect is to multiply D(A) by a certain nonzero scalar c, depending on the operations.

• If A is invertible, then we can reduce A to the identity, so that cD(A) = D(I) = 1, whence D(A) = c^{-1}.

• If A is not invertible, then its column rank is less than n. So the columns of A are linearly dependent, and one column can be written as a linear combination of the others. Applying axiom (D1), we see that D(A) is a linear combination of values D(A′), where A′ are matrices with two equal columns; so D(A′) = 0 for all such A′, whence D(A) = 0.
This proves that the determinant function, if it exists, is unique. We show its existence in the next section, by giving a couple of formulae for it.

Given the uniqueness of the determinant function, we now denote it by det(A) instead of D(A). The proof of the theorem shows an important corollary:
Corollary 2.9 A square matrix A is invertible if and only if det(A) ≠ 0.
Proof See the case division at the end of the proof of the theorem.
One of the most important properties of the determinant is the following.
Theorem 2.10 If A and B are n × n matrices over K, then det(AB) = det(A)det(B).
Proof Suppose first that B is not invertible. Then det(B) = 0. Also, AB is not invertible. (For, suppose that (AB)^{-1} = X, so that XAB = I. Then XA is the inverse of B.) So det(AB) = 0, and the theorem is true.

In the other case, B is invertible, so we can apply a sequence of elementary column operations to B to get to the identity. The effect of these operations is to multiply the determinant by a nonzero factor c (depending on the operations), so that c det(B) = det(I) = 1, or c = (det(B))^{-1}. Now these operations are represented by elementary matrices; so we see that BQ = I, where Q is a product of elementary matrices.
If we apply the same sequence of elementary operations to AB, we end up with the matrix (AB)Q = A(BQ) = AI = A. The determinant is multiplied by the same factor, so we find that c det(AB) = det(A). Since c = (det(B))^{-1}, this implies that det(AB) = det(A)det(B), as required.
Finally, we have defined determinants using columns, but we could have used rows instead:

Proposition 2.11 The determinant is the unique function D of n × n matrices which satisfies the conditions

(D1′) for 1 ≤ i ≤ n, D is a linear function of the ith row;

(D2′) if two rows of A are equal, then D(A) = 0;

(D3′) D(I_n) = 1.
The proof of uniqueness is almost identical to that for columns. To see that D(A) = det(A): if A is not invertible, then D(A) = det(A) = 0; but if A is invertible, then it is a product of elementary matrices (which can represent either row or column operations), and the determinant is the product of the factors associated with these operations.
Corollary 2.12 If A^T denotes the transpose of A, then det(A^T) = det(A).

For, if D denotes the "determinant" computed by row operations, then det(A) = D(A) = det(A^T), since row operations on A correspond to column operations on A^T.
2.5 Calculating determinants
We now give a couple of formulae for the determinant. This finishes the job we left open in the proof of the last theorem, namely, showing that a determinant function actually exists!

The first formula involves some background notation.
Definition 2.8 A permutation of {1, . . . , n} is a bijection from the set {1, . . . , n} to itself. The symmetric group S_n consists of all permutations of the set {1, . . . , n}. (There are n! such permutations.) For any permutation π ∈ S_n, there is a number sign(π) = ±1, computed as follows: write π as a product of disjoint cycles; if there are k cycles (including cycles of length 1), then sign(π) = (−1)^{n−k}. A transposition is a permutation which interchanges two symbols and leaves all the others fixed. Thus, if τ is a transposition, then sign(τ) = −1.
The last fact holds because a transposition has one cycle of size 2 and n − 2 cycles of size 1, so n − 1 cycles altogether; so sign(τ) = (−1)^{n−(n−1)} = −1.

We need one more fact about signs: if π is any permutation and τ is a transposition, then sign(πτ) = −sign(π), where πτ denotes the composition of π and τ (apply first τ, then π).
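The cycle-count rule for the sign is easy to implement (a sketch; the function name is our own, and we index the symbols from 0):

```python
def sign(perm):
    """Sign of a permutation of {0, ..., n-1}, given as a list mapping
    i to perm[i]: (-1)**(n - k), where k is the number of cycles
    (including fixed points)."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:       # walk around the cycle
                seen[i] = True
                i = perm[i]
    return (-1) ** (n - cycles)

identity = [0, 1, 2, 3]
transposition = [1, 0, 2, 3]          # swaps the first two symbols
print(sign(identity))        # 1  (four 1-cycles, so n - k = 0)
print(sign(transposition))   # -1 (one 2-cycle, two 1-cycles, so n - k = 1)
```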
Definition 2.9 Let A be an n × n matrix over K. The determinant of A is defined by the formula

    det(A) = ∑_{π ∈ S_n} sign(π) A_{1π(1)} A_{2π(2)} ⋯ A_{nπ(n)}.
Proof In order to show that this is a good definition, we need to verify that it satisfies our three rules (D1)–(D3).
(D1) According to the definition, det(A) is a sum of n! terms. Each term, apart from a sign, is the product of n elements, one from each row and column. If we look at a particular column, say the ith, it is clear that each product is a linear function of that column; so the same is true for the determinant.
(D2) Suppose that the ith and jth columns of A are equal. Let τ be the transposition which interchanges i and j and leaves the other symbols fixed. Then π(τ(i)) = π(j) and π(τ(j)) = π(i), whereas π(τ(k)) = π(k) for k ≠ i, j. Because the elements in the ith and jth columns of A are the same, we see that the products A_{1π(1)} A_{2π(2)} ⋯ A_{nπ(n)} and A_{1πτ(1)} A_{2πτ(2)} ⋯ A_{nπτ(n)} are equal. But sign(πτ) = −sign(π). So the corresponding terms in the formula for the determinant cancel one another. The elements of S_n can be divided up into n!/2 pairs of the form {π, πτ}. As we have seen, each pair of terms in the formula cancel out. We conclude that det(A) = 0. Thus (D2) holds.
(D3) If A = I_n, then the only permutation π which contributes to the sum is the identity permutation ι: any other permutation π satisfies π(i) ≠ i for some i, so that A_{iπ(i)} = 0. The sign of ι is +1, and all the terms A_{iι(i)} = A_{ii} are equal to 1; so det(A) = 1, as required.
This gives us a nice mathematical formula for the determinant of a matrix. Unfortunately, it is a terrible formula in practice, since it involves working out n! terms, each a product of matrix entries, and adding them up with + and − signs. For n of moderate size, this will take a very long time! (For example, 10! = 3628800.)
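Despite its inefficiency for large n, the formula is easy to implement directly (a sketch; `sign` uses the cycle-count rule above, with symbols indexed from 0):

```python
import math
from itertools import permutations

def sign(perm):
    """(-1)**(n - k), where k is the number of cycles (incl. fixed points)."""
    n, seen, k = len(perm), set(), 0
    for i in range(n):
        if i not in seen:
            k += 1
            while i not in seen:
                seen.add(i)
                i = perm[i]
    return (-1) ** (n - k)

def det(A):
    """Sum over all pi in S_n of sign(pi) * A[0][pi(0)] * ... * A[n-1][pi(n-1)]."""
    n = len(A)
    return sum(sign(p) * math.prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(det([[1, 2], [3, 4]]))                      # 1*4 - 2*3 = -2
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))    # -3
```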
Here is a second formula, which is also theoretically important but very inefficient in practice.
Definition 2.10 Let A be an n × n matrix. For 1 ≤ i, j ≤ n, we define the (i, j) minor of A to be the (n−1) × (n−1) matrix obtained by deleting the ith row and jth column of A. Now we define the (i, j) cofactor of A to be (−1)^{i+j} times the determinant of the (i, j) minor. (These signs have a chessboard pattern, starting with sign + in the top left corner.) We denote the (i, j) cofactor of A by K_{ij}(A). Finally, the adjugate of A is the n × n matrix Adj(A) whose (i, j) entry is the (j, i) cofactor K_{ji}(A) of A. (Note the transposition!)
Theorem 2.13 (a) For 1 ≤ j ≤ n, we have

    det(A) = ∑_{i=1}^{n} A_{ij} K_{ij}(A).

(b) For 1 ≤ i ≤ n, we have

    det(A) = ∑_{j=1}^{n} A_{ij} K_{ij}(A).
This theorem says that, if we take any column or row of A, multiply each
element by the corresponding cofactor, and add the results, we get the determinant
of A.
Example 2.4 Using a cofactor expansion along the first column, we see that

    | 1 2 3  |       | 5 6  |       | 2 3  |       | 2 3 |
    | 4 5 6  |  =  1 | 8 10 |  − 4  | 8 10 |  + 7  | 5 6 |
    | 7 8 10 |

                =  (5 · 10 − 6 · 8) − 4(2 · 10 − 3 · 8) + 7(2 · 6 − 3 · 5)
                =  2 + 16 − 21
                =  −3,

using the standard formula for a 2 × 2 determinant.
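The cofactor expansion translates directly into a recursive determinant function (a sketch, always expanding along the first column as in the example; helper names are ours):

```python
def minor(A, i, j):
    """Delete row i and column j of A (0-indexed)."""
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first column."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # (-1)**i is the chessboard sign (-1)**(i+j) with j = 0.
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(n))

print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3, as in the example
```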
Proof We prove (a); the proof for (b) is a simple modification, using rows instead of columns. Let D(A) be the function defined by the right-hand side of (a) in the theorem, using the jth column of A. We verify rules (D1)–(D3).

(D1) It is clear that D(A) is a linear function of the jth column. For k ≠ j, the cofactors are linear functions of the kth column (since they are determinants), and so D(A) is linear.
(D2) If the kth and lth columns of A are equal (with k, l ≠ j), then each cofactor is the determinant of a matrix with two equal columns, and so is zero. The harder case is when the jth column is equal to another, say the kth. Using induction, each cofactor can be expressed as a sum of elements of the kth column times (n−2) × (n−2) determinants. In the resulting sum, it is easy to see that each such determinant occurs twice with opposite signs and multiplied by the same factor. So the terms all cancel.
(D3) Suppose that A = I. The only nonzero cofactor in the jth column is K_{jj}(I), which is equal to (−1)^{j+j} det(I_{n−1}) = 1. So D(I) = 1.

By the main theorem, the expression D(A) is equal to det(A).
At first sight, this looks like a simple formula for the determinant, since it is just the sum of n terms, rather than n! as in the first case. But each term is an (n−1) × (n−1) determinant. Working down the chain, we find that this method is just as labour-intensive as the other one.
But the cofactor expansion has further nice properties:
Theorem 2.14 For any n × n matrix A, we have

    A · Adj(A) = Adj(A) · A = det(A) · I.

Proof We calculate the matrix product. Recall that the (i, j) entry of Adj(A) is K_{ji}(A).
Now the (i, i) entry of the product A · Adj(A) is

    ∑_{k=1}^{n} A_{ik} (Adj(A))_{ki} = ∑_{k=1}^{n} A_{ik} K_{ik}(A) = det(A),

by the cofactor expansion. On the other hand, if i ≠ j, then the (i, j) entry of the product is

    ∑_{k=1}^{n} A_{ik} (Adj(A))_{kj} = ∑_{k=1}^{n} A_{ik} K_{jk}(A).

This last expression is the cofactor expansion, along the jth row, of the matrix A′ which is the same as A except for the jth row, which has been replaced by the ith row of A. (Note that changing the jth row of a matrix has no effect on the cofactors of elements in this row.) So the sum is det(A′). But A′ has two equal rows, so its determinant is zero.

Thus A · Adj(A) has entries det(A) on the diagonal and 0 everywhere else; so it is equal to det(A) · I.

The proof for the product the other way around is the same, using columns instead of rows.
Corollary 2.15 If the n × n matrix A is invertible, then its inverse is equal to

    (det(A))^{-1} Adj(A).
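Corollary 2.15 gives a theoretically tidy (though practically slow) inversion method; here is an exact sketch over the rationals (helper names are our own):

```python
from fractions import Fraction

def minor(A, i, j):
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first column."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))

def adjugate(A):
    """Adj(A): the (i, j) entry is the (j, i) cofactor of A."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def inverse(A):
    """A^-1 = det(A)^-1 * Adj(A), valid when det(A) != 0."""
    d = Fraction(det(A))
    return [[Fraction(x) / d for x in row] for row in adjugate(A)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
Ainv = inverse(A)
# Check that A * Ainv is the identity, exactly:
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
print([[int(x) for x in row] for row in prod])   # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```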
So how can you work out a determinant efficiently? The best method in practice is to use elementary operations.

Apply elementary operations to the matrix, keeping track of the factor by which the determinant is multiplied by each operation. If you want, you can reduce all the way to the identity, and then use the fact that det(I) = 1. Often it is simpler to stop at an earlier stage when you can recognise what the determinant is. For example, if the matrix A has diagonal entries a_1, . . . , a_n, and all off-diagonal entries are zero, then det(A) is just the product a_1 ⋯ a_n.
Example 2.5 Let

    A = [ 1 2 3  ]
        [ 4 5 6  ]
        [ 7 8 10 ] .

Subtracting twice the first column from the second, and three times the first column from the third (these operations don't change the determinant), gives

    [ 1  0   0 ]
    [ 4 −3  −6 ]
    [ 7 −6 −11 ] .

Now the cofactor expansion along the first row gives

    det(A) = | −3  −6 |
             | −6 −11 | = 33 − 36 = −3.

(At the last step, it is easiest to use the formula for the determinant of a 2 × 2 matrix rather than do any further reduction.)
2.6 The Cayley–Hamilton Theorem
Since we can add and multiply matrices, we can substitute them into a polynomial. For example, if

    A = [  0 1 ]
        [ −2 3 ] ,

then the result of substituting A into the polynomial x² − 3x + 2 is

    A² − 3A + 2I = [ −2 3 ] + [ 0 −3 ] + [ 2 0 ] = [ 0 0 ]
                   [ −6 7 ]   [ 6 −9 ]   [ 0 2 ]   [ 0 0 ] .
We say that the matrix A satisfies the equation x² − 3x + 2 = 0. (Notice that for the constant term 2 we substituted 2I.)

It turns out that, for every n × n matrix A, we can calculate a polynomial equation of degree n satisfied by A.
Definition 2.11 Let A be an n × n matrix. The characteristic polynomial of A is the polynomial

    c_A(x) = det(xI − A).

This is a polynomial in x of degree n.

For example, if

    A = [  0 1 ]
        [ −2 3 ] ,

then

    c_A(x) = | x    −1   |
             | 2   x − 3 | = x(x − 3) + 2 = x² − 3x + 2.
Indeed, it turns out that this is the polynomial we want in general:

Theorem 2.16 (Cayley–Hamilton Theorem) Let A be an n × n matrix with characteristic polynomial c_A(x). Then c_A(A) = O.
Example 2.6 Let us just check the theorem for 2 × 2 matrices. If

    A = [ a b ]
        [ c d ] ,

then

    c_A(x) = | x − a   −b    |
             |  −c    x − d  | = x² − (a + d)x + (ad − bc),

and so

    c_A(A) = [ a² + bc   ab + bd ] − (a + d) [ a b ] + (ad − bc) [ 1 0 ] = O,
             [ ac + cd   bc + d² ]           [ c d ]             [ 0 1 ]

after a small amount of calculation.
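A quick numerical check of the 2 × 2 example above (a numpy sketch; `np.poly` applied to a square matrix returns the coefficients of its characteristic polynomial):

```python
import numpy as np

A = np.array([[0, 1], [-2, 3]])
print(np.poly(A))        # [ 1. -3.  2.]  i.e. x^2 - 3x + 2

# Cayley-Hamilton says substituting A into x^2 - 3x + 2 gives O:
result = A @ A - 3 * A + 2 * np.eye(2, dtype=int)
print(result)            # [[0 0]
                         #  [0 0]]
```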
Proof We use the theorem

    A · Adj(A) = det(A) · I.

In place of A, we put the matrix xI − A into this formula:

    (xI − A) Adj(xI − A) = det(xI − A) I = c_A(x) I.
Now it is very tempting just to substitute x = A into this formula: on the right we have c_A(A)I = c_A(A), while on the left there is a factor AI − A = O. Unfortunately this is not valid; it is important to see why. The matrix Adj(xI − A) is an n × n matrix whose entries are determinants of (n−1) × (n−1) matrices with entries involving x. So the entries of Adj(xI − A) are polynomials in x, and if we try to substitute A for x the size of the matrix will be changed!
Instead, we argue as follows. As we have said, Adj(xI − A) is a matrix whose entries are polynomials, so we can write it as a sum of powers of x times matrices, that is, as a polynomial whose coefficients are matrices. For example,

    [ x² + 1   2x    ]      [ 1 0 ]     [ 0 2 ]   [  1 0 ]
    [ 3x − 4   x + 2 ] = x² [ 0 0 ] + x [ 3 1 ] + [ −4 2 ] .
The entries in Adj(xI − A) are (n−1) × (n−1) determinants, so the highest power of x that can arise is x^{n−1}. So we can write

    Adj(xI − A) = x^{n−1} B_{n−1} + x^{n−2} B_{n−2} + ⋯ + x B_1 + B_0,

for suitable n × n matrices B_0, . . . , B_{n−1}. Hence

    c_A(x) I = (xI − A) Adj(xI − A)
             = (xI − A)(x^{n−1} B_{n−1} + x^{n−2} B_{n−2} + ⋯ + x B_1 + B_0)
             = x^n B_{n−1} + x^{n−1}(−A B_{n−1} + B_{n−2}) + ⋯ + x(−A B_1 + B_0) − A B_0.
So, if we let

    c_A(x) = x^n + c_{n−1} x^{n−1} + ⋯ + c_1 x + c_0,

then we read off that
    B_{n−1} = I,
    −A B_{n−1} + B_{n−2} = c_{n−1} I,
          ⋮
    −A B_1 + B_0 = c_1 I,
    −A B_0 = c_0 I.
We take this system of equations, and multiply the first by A^n, the second by A^{n−1}, . . . , and the last by A^0 = I. What happens? On the left, all the terms cancel in pairs: we have

    A^n B_{n−1} + A^{n−1}(−A B_{n−1} + B_{n−2}) + ⋯ + A(−A B_1 + B_0) + I(−A B_0) = O.
On the right, we have

    A^n + c_{n−1} A^{n−1} + ⋯ + c_1 A + c_0 I = c_A(A).

So c_A(A) = O, as claimed.
Chapter 3
Linear maps between vector spaces
We return to the setting of vector spaces in order to define linear maps between them. We will see that these maps can be represented by matrices, decide when two matrices represent the same linear map, and give another proof of the canonical form for equivalence.
3.1 Deﬁnition and basic properties
Definition 3.1 Let V and W be vector spaces over a field K. A function α from V to W is a linear map if it preserves addition and scalar multiplication, that is, if

• α(v_1 + v_2) = α(v_1) + α(v_2) for all v_1, v_2 ∈ V;

• α(cv) = cα(v) for all v ∈ V and c ∈ K.
Remarks 1. We can combine the two conditions into one as follows:

    α(c_1 v_1 + c_2 v_2) = c_1 α(v_1) + c_2 α(v_2).

2. In other literature the term "linear transformation" is often used instead of "linear map".
Definition 3.2 Let α : V → W be a linear map. The image of α is the set

    Im(α) = {w ∈ W : w = α(v) for some v ∈ V},

and the kernel of α is

    Ker(α) = {v ∈ V : α(v) = 0}.
Proposition 3.1 Let α : V → W be a linear map. Then the image of α is a subspace of W and the kernel is a subspace of V.
Proof We have to show that each is closed under addition and scalar multiplication.

For the image, if w_1 = α(v_1) and w_2 = α(v_2), then

    w_1 + w_2 = α(v_1) + α(v_2) = α(v_1 + v_2),

and if w = α(v) then

    cw = cα(v) = α(cv).

For the kernel, if α(v_1) = α(v_2) = 0 then

    α(v_1 + v_2) = α(v_1) + α(v_2) = 0 + 0 = 0,

and if α(v) = 0 then

    α(cv) = cα(v) = c0 = 0.
Definition 3.3 We define the rank of α to be ρ(α) = dim(Im(α)) and the nullity of α to be ν(α) = dim(Ker(α)). (We use the Greek letters 'rho' and 'nu' here to avoid confusing the rank of a linear map with the rank of a matrix, though they will turn out to be closely related!)
Theorem 3.2 (Rank–Nullity Theorem) Let α : V → W be a linear map. Then

    ρ(α) + ν(α) = dim(V).
Proof Choose a basis u_1, u_2, . . . , u_q for Ker(α), where q = dim(Ker(α)) = ν(α). The vectors u_1, . . . , u_q are linearly independent vectors of V, so we can add further vectors to get a basis for V, say u_1, . . . , u_q, v_1, . . . , v_s, where q + s = dim(V).
We claim that the vectors α(v_1), . . . , α(v_s) form a basis for Im(α). We have to show that they are linearly independent and spanning.
Linearly independent: Suppose that c_1 α(v_1) + ⋯ + c_s α(v_s) = 0. Then α(c_1 v_1 + ⋯ + c_s v_s) = 0, so that c_1 v_1 + ⋯ + c_s v_s ∈ Ker(α). But then this vector can be expressed in terms of the basis for Ker(α):

    c_1 v_1 + ⋯ + c_s v_s = a_1 u_1 + ⋯ + a_q u_q,

whence

    −a_1 u_1 − ⋯ − a_q u_q + c_1 v_1 + ⋯ + c_s v_s = 0.

But the us and vs form a basis for V, so they are linearly independent. So this equation implies that all the as and cs are zero. The fact that c_1 = ⋯ = c_s = 0 shows that the vectors α(v_1), . . . , α(v_s) are linearly independent.
Spanning: Take any vector in Im(α), say w. Then w = α(v) for some v ∈ V. Write v in terms of the basis for V:

    v = a_1 u_1 + ⋯ + a_q u_q + c_1 v_1 + ⋯ + c_s v_s

for some a_1, . . . , a_q, c_1, . . . , c_s. Applying α, and writing w_i = α(v_i), we get

    w = α(v) = a_1 α(u_1) + ⋯ + a_q α(u_q) + c_1 α(v_1) + ⋯ + c_s α(v_s)
             = c_1 w_1 + ⋯ + c_s w_s,

since α(u_i) = 0 (as u_i ∈ Ker(α)) and α(v_i) = w_i. So the vectors w_1, . . . , w_s span Im(α).

Thus ρ(α) = dim(Im(α)) = s. Since ν(α) = q and q + s = dim(V), the theorem is proved.
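As an illustration (a sketch, with our own `rank` helper working over the rationals), for a matrix viewed as a linear map K^n → K^m the theorem lets us read the nullity off from the rank:

```python
from fractions import Fraction

def rank(A):
    """Rank via row reduction (row rank = column rank = rank)."""
    M = [[Fraction(x) for x in row] for row in A]
    m, n = len(M), len(M[0])
    r = 0
    for j in range(n):
        pivot = next((i for i in range(r, m) if M[i][j] != 0), None)
        if pivot is None:
            continue                       # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [x / M[r][j] for x in M[r]]
        for i in range(m):
            if i != r and M[i][j] != 0:
                c = M[i][j]
                M[i] = [x - c * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

# The map alpha: K^3 -> K^2 given by this matrix has rank 2, so the
# Rank-Nullity Theorem forces nullity = dim(V) - rank = 3 - 2 = 1.
A = [[1, 2, 3],
     [4, 5, 6]]
rho = rank(A)
nu = 3 - rho
print(rho, nu)    # 2 1
```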
3.2 Representation by matrices
We come now to the second role of matrices in linear algebra: they represent linear maps between vector spaces.
Let α : V → W be a linear map, where dim(V) = m and dim(W) = n. As we saw in the first section, we can take V and W in their coordinate representation: V = K^m and W = K^n (the elements of these vector spaces being represented as column vectors). Let e_1, . . . , e_m be the standard basis for V (so that e_i is the vector with ith coordinate 1 and all other coordinates zero), and f_1, . . . , f_n the standard basis for W. Then for i = 1, . . . , m, the vector α(e_i) belongs to W, so we can write it as a linear combination of f_1, . . . , f_n.
Definition 3.4 The matrix representing the linear map α : V → W relative to the bases B = (e_1, . . . , e_m) for V and C = (f_1, . . . , f_n) for W is the n × m matrix whose (j, i) entry is a_{ji}, where

    α(e_i) = ∑_{j=1}^{n} a_{ji} f_j

for i = 1, . . . , m.
In practice this means the following. Take α(e_i) and write it as a column vector [ a_{1i} a_{2i} ⋯ a_{ni} ]^T. This vector is the ith column of the matrix representing α.
So, for example, if m = 3, n = 2, and

    α(e_1) = f_1 + f_2,    α(e_2) = 2f_1 + 5f_2,    α(e_3) = 3f_1 − f_2,

then the vectors α(e_i) as column vectors are

    α(e_1) = [ 1 ]    α(e_2) = [ 2 ]    α(e_3) = [  3 ]
             [ 1 ] ,           [ 5 ] ,           [ −1 ] ,

and so the matrix representing α is

    [ 1 2  3 ]
    [ 1 5 −1 ] .
Now the most important thing about this representation is that the action of α is now easily described:

Proposition 3.3 Let α : V → W be a linear map. Choose bases for V and W, and let A be the matrix representing α. Then, if we represent vectors of V and W as column vectors relative to these bases, we have

    α(v) = Av.
Proof Let e_1, . . . , e_m be the basis for V, and f_1, . . . , f_n for W. Take v = ∑_{i=1}^{m} c_i e_i ∈ V, so that in coordinates

    v = [ c_1 ]
        [  ⋮  ]
        [ c_m ] .

Then

    α(v) = ∑_{i=1}^{m} c_i α(e_i) = ∑_{i=1}^{m} ∑_{j=1}^{n} c_i a_{ji} f_j,

so the jth coordinate of α(v) is ∑_{i=1}^{m} a_{ji} c_i, which is precisely the jth coordinate in the matrix product Av.
In our example, if v = 2e_1 + 3e_2 + 4e_3 = [ 2 3 4 ]^T, then

                          [ 2 ]
    α(v) = Av = [ 1 2  3 ] [ 3 ] = [ 20 ]
                [ 1 5 −1 ] [ 4 ]   [ 13 ] .
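This recipe translates directly into code; a sketch (the helper `matrix_of` is our own) applied to the running example:

```python
import numpy as np

def matrix_of(alpha, m, n):
    """Build the n x m matrix of a linear map alpha: K^m -> K^n by
    applying alpha to each standard basis vector e_i and using the
    result as the ith column."""
    cols = [alpha(np.eye(m, dtype=int)[:, i]) for i in range(m)]
    return np.column_stack(cols)

# The map of the running example: alpha(e1) = f1 + f2,
# alpha(e2) = 2f1 + 5f2, alpha(e3) = 3f1 - f2, i.e.
# alpha(x, y, z) = (x + 2y + 3z, x + 5y - z).
def alpha(v):
    x, y, z = v
    return np.array([x + 2 * y + 3 * z, x + 5 * y - z])

A = matrix_of(alpha, 3, 2)
print(A)                          # [[ 1  2  3]
                                  #  [ 1  5 -1]]
print(A @ np.array([2, 3, 4]))    # [20 13]
```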
Addition and multiplication of linear maps correspond to addition and multiplication of the matrices representing them.
Deﬁnition 3.5 Let α and β be linear maps from V to W. Deﬁne their sum α +β
by the rule
(α +β)(v) = α(v) +β(v)
for all v ∈V. It is easy to check that α +β is a linear map.
Proposition 3.4 If α and β are linear maps represented by matrices A and B
respectively, then α +β is represented by the matrix A+B.
The proof of this is not difﬁcult: just use the deﬁnitions.
Deﬁnition 3.6 Let U,V,W be vector spaces over K, and let α : U →V and β :
V →W be linear maps. The product βα is the function U →W deﬁned by the
rule
(βα)(u) = β(α(u))
for all u ∈ U. Again it is easily checked that βα is a linear map. Note that the
order is important: we take a vector u ∈U, apply α to it to get a vector in V, and
then apply β to get a vector in W. So βα means “apply α, then β”.
Proposition 3.5 If α : U →V and β : V →W are linear maps represented by
matrices A and B respectively, then βα is represented by the matrix BA.
Again the proof is tedious but not difﬁcult. Of course it follows that a linear
map is invertible (as a map; that is, there is an inverse map) if and only if it is
represented by an invertible matrix.
Remark Let l = dim(U), m = dim(V) and n = dim(W); then A is m×l, and B is n×m; so the product BA is defined, and is n×l, which is the right size for a matrix representing a map from an l-dimensional to an n-dimensional space.
The significance of all this is that the strange rule for multiplying matrices is chosen so as to make Proposition 3.5 hold. The definition of multiplication of linear maps is the natural one (composition), and we could then ask: what definition of matrix multiplication should we choose to make the Proposition valid? We would find that the usual definition was forced upon us.
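One can watch this "forced" definition at work: composing the two maps on coordinates gives the same answer as multiplying their matrices first. A plain-Python sketch (the matrices A, B and the vector v are arbitrary illustrative choices, not from the notes):

```python
def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mat_mul(B, A):
    """Matrix product BA."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

A = [[1, 2], [3, 4]]   # matrix of alpha (illustrative)
B = [[0, 1], [1, 1]]   # matrix of beta (illustrative)
v = [5, 7]
# "Apply alpha, then beta" agrees with multiplying by the single matrix BA.
assert mat_vec(B, mat_vec(A, v)) == mat_vec(mat_mul(B, A), v)
```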
3.3 Change of basis
The matrix representing a linear map depends on the choice of bases we used to
represent it. Now we have to discuss what happens if we change the basis.
Remember the notion of transition matrix from Chapter 1. If B = (v_1, . . . , v_m) and B′ = (v′_1, . . . , v′_m) are two bases for a vector space V, the transition matrix P_{B,B′} is the matrix whose jth column is the coordinate representation of v′_j in the basis B. Then we have

$$[v]_B = P\,[v]_{B'},$$
where [v]_B is the coordinate representation of an arbitrary vector v in the basis B, and similarly for B′. The inverse of P_{B,B′} is P_{B′,B}. Let p_{ij} be the (i, j) entry of P = P_{B,B′}.
Now let C = (w_1, . . . , w_n) and C′ = (w′_1, . . . , w′_n) be two bases for a space W, with transition matrix Q_{C,C′} and inverse Q_{C′,C}. Let Q = Q_{C,C′} and let R = Q_{C′,C} be its inverse, with (i, j) entry r_{ij}.
Let α be a linear map from V to W. Then α is represented by a matrix A using the bases B and C, and by a matrix A′ using the bases B′ and C′. What is the relation between A and A′?
We just do it and see. To get A′, we have to represent the vectors α(v′_j) in the basis C′. We have

$$v'_j = \sum_{i=1}^m p_{ij} v_i,$$

so

$$\alpha(v'_j) = \sum_{i=1}^m p_{ij}\,\alpha(v_i) = \sum_{i=1}^m \sum_{k=1}^n p_{ij} A_{ki} w_k = \sum_{i=1}^m \sum_{k=1}^n \sum_{l=1}^n p_{ij} A_{ki} r_{lk} w'_l.$$
This means, on turning things around, that

$$(A')_{lj} = \sum_{k=1}^n \sum_{i=1}^m r_{lk} A_{ki} p_{ij},$$

so, according to the rules of matrix multiplication,

$$A' = RAP = Q^{-1}AP.$$
Proposition 3.6 Let α : V → W be a linear map represented by matrix A relative to the bases B for V and C for W, and by the matrix A′ relative to the bases B′ for V and C′ for W. If P = P_{B,B′} and Q = P_{C,C′} are the transition matrices from the unprimed to the primed bases, then

$$A' = Q^{-1}AP.$$
This is rather technical; you need it for explicit calculations, but for theoretical purposes the importance lies in the following corollary. Recall that two matrices A and B are equivalent if B is obtained from A by multiplying on the left and right by invertible matrices. (It makes no difference that we said B = PAQ before and B = Q^{-1}AP here, of course.)
Proposition 3.7 Two matrices represent the same linear map with respect to different bases if and only if they are equivalent.
This holds because

• transition matrices are always invertible (the inverse of P_{B,B′} is the matrix P_{B′,B} for the transition in the other direction); and

• any invertible matrix can be regarded as a transition matrix: for, if the n×n matrix P is invertible, then its rank is n, so its columns are linearly independent, and form a basis B′ for K^n; and then P = P_{B,B′}, where B is the "standard basis".
3.4 Canonical form revisited
Now we can give a simpler proof of Theorem 2.3 about canonical form for equivalence. First, we make the following observation.
Theorem 3.8 Let α : V → W be a linear map of rank r = ρ(α). Then there are bases for V and W such that the matrix representing α is, in block form,

$$\begin{bmatrix}I_r & O\\O & O\end{bmatrix}.$$
Proof As in the proof of Theorem 3.2, choose a basis u_1, . . . , u_s for Ker(α), and extend to a basis u_1, . . . , u_s, v_1, . . . , v_r for V. Then α(v_1), . . . , α(v_r) is a basis for Im(α), and so can be extended to a basis α(v_1), . . . , α(v_r), x_1, . . . , x_t for W. Now we will use the bases

v_1, . . . , v_r, v_{r+1} = u_1, . . . , v_{r+s} = u_s for V,

w_1 = α(v_1), . . . , w_r = α(v_r), w_{r+1} = x_1, . . . , w_{r+t} = x_t for W.

We have

$$\alpha(v_i) = \begin{cases} w_i & \text{if } 1 \le i \le r,\\ 0 & \text{otherwise;}\end{cases}$$

so the matrix of α relative to these bases is

$$\begin{bmatrix}I_r & O\\O & O\end{bmatrix}$$

as claimed.
We recognise the matrix in the theorem as the canonical form for equivalence.
Combining Theorem 3.8 with Proposition 3.7, we see:
Theorem 3.9 A matrix of rank r is equivalent to the matrix

$$\begin{bmatrix}I_r & O\\O & O\end{bmatrix}.$$

We also see, by the way, that the rank of a linear map (that is, the dimension of its image) is equal to the rank of any matrix which represents it. So all our definitions of rank agree!

The conclusion is that

two matrices are equivalent if and only if they have the same rank.

So how many equivalence classes of m×n matrices are there, for given m and n? The rank of such a matrix can take any value from 0 up to the minimum of m and n; so the number of equivalence classes is min{m, n} + 1.
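Since equivalence is decided by rank alone, two matrices can be compared by computing their ranks. A minimal sketch of Gaussian elimination over the rationals (my own implementation, not from the notes; exact arithmetic via the standard library's `fractions`):

```python
from fractions import Fraction

def rank(A):
    """Rank of a matrix (list of rows) by row reduction over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    r, rows, cols = 0, len(M), len(M[0]) if M else 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]   # move pivot row up
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# The 2x3 matrix from the running example has rank 2, so it is equivalent
# to the canonical form with I_2 in the top-left corner.
print(rank([[1, 2, 3], [1, 5, -1]]))   # 2
```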
Chapter 4
Linear maps on a vector space
In this chapter we consider a linear map α from a vector space V to itself. If dim(V) = n then, as in the last chapter, we can represent α by an n×n matrix relative to any basis for V. However, this time we have less freedom: instead of having two bases to choose, there is only one. This makes the theory much more interesting!
4.1 Projections and direct sums
We begin by looking at a particular type of linear map whose importance will be
clear later on.
Definition 4.1 The linear map π : V → V is a projection if π² = π (where, as usual, π² is defined by π²(v) = π(π(v))).

Proposition 4.1 If π : V → V is a projection, then V = Im(π) ⊕ Ker(π).
Proof We have two things to do:

Im(π) + Ker(π) = V: Take any vector v ∈ V, and let w = π(v) ∈ Im(π). We claim that v − w ∈ Ker(π). This holds because

$$\pi(v-w) = \pi(v) - \pi(w) = \pi(v) - \pi(\pi(v)) = \pi(v) - \pi^2(v) = 0,$$

since π² = π. Now v = w + (v − w) is the sum of a vector in Im(π) and one in Ker(π).

Im(π) ∩ Ker(π) = {0}: Take v ∈ Im(π) ∩ Ker(π). Then v = π(w) for some vector w; and

$$0 = \pi(v) = \pi(\pi(w)) = \pi^2(w) = \pi(w) = v,$$

as required (the first equality holding because v ∈ Ker(π)).
It goes the other way too: if V =U ⊕W, then there is a projection π : V →V
with Im(π) =U and Ker(π) =W. For every vector v ∈V can be uniquely written
as v = u+w, where u ∈U and w ∈W; we deﬁne π by the rule that π(v) = u. Now
the assertions are clear.
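The decomposition v = w + (v − w) in the proof can be watched on a concrete projection. A hypothetical example (projection of the plane onto the x-axis along the line y = x; the matrix is my own illustrative choice):

```python
def mat_vec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

# Projection onto Im(P) = span{(1,0)} along Ker(P) = span{(1,1)}.
P = [[1, -1],
     [0,  0]]
v = [3, 5]
w = mat_vec(P, v)                  # component of v in Im(P)
k = [a - b for a, b in zip(v, w)]  # v - w, which should lie in Ker(P)
assert mat_vec(P, w) == w          # w is fixed by P (it is in the image)
assert mat_vec(P, k) == [0, 0]     # v - w is killed by P
assert [a + b for a, b in zip(w, k)] == v   # v = w + (v - w)
```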
The diagram in Figure 4.1 shows geometrically what a projection is. It moves
any vector v in a direction parallel to Ker(π) to a vector lying in Im(π).
[Figure 4.1: A projection]
We can extend this to direct sums with more than two terms. First, notice that if π is a projection and π′ = I − π (where I is the identity map, satisfying I(v) = v for all vectors v), then π′ is also a projection, since

$$(\pi')^2 = (I-\pi)^2 = I - 2\pi + \pi^2 = I - 2\pi + \pi = I - \pi = \pi';$$

and π + π′ = I; also ππ′ = π(I − π) = π − π² = O. Finally, we see that Ker(π) = Im(π′); so V = Im(π) ⊕ Im(π′). In this form the result extends:
Proposition 4.2 Suppose that π_1, π_2, . . . , π_r are projections on V satisfying

(a) π_1 + π_2 + ··· + π_r = I, where I is the identity transformation;

(b) π_i π_j = O for i ≠ j.

Then V = U_1 ⊕ U_2 ⊕ ··· ⊕ U_r, where U_i = Im(π_i).
Proof We have to show that any vector v can be uniquely written in the form v = u_1 + u_2 + ··· + u_r, where u_i ∈ U_i for i = 1, . . . , r. We have

$$v = I(v) = \pi_1(v) + \pi_2(v) + \cdots + \pi_r(v) = u_1 + u_2 + \cdots + u_r,$$

where u_i = π_i(v) ∈ Im(π_i) for i = 1, . . . , r. So any vector can be written in this form. Now suppose that we have any expression

$$v = u'_1 + u'_2 + \cdots + u'_r,$$

with u′_i ∈ U_i for i = 1, . . . , r. Since u′_i ∈ U_i = Im(π_i), we have u′_i = π_i(v_i) for some v_i; then

$$\pi_i(u'_i) = \pi_i^2(v_i) = \pi_i(v_i) = u'_i.$$

On the other hand, for j ≠ i, we have

$$\pi_i(u'_j) = \pi_i\pi_j(v_j) = 0,$$

since π_i π_j = O. So applying π_i to the expression for v, we obtain

$$\pi_i(v) = \pi_i(u'_1) + \pi_i(u'_2) + \cdots + \pi_i(u'_r) = \pi_i(u'_i) = u'_i,$$

since all terms in the sum except the ith are zero. So the only possible expression is given by u_i = π_i(v), and the proof is complete.
Conversely, if V = U_1 ⊕ U_2 ⊕ ··· ⊕ U_r, then we can find projections π_1, π_2, . . . , π_r satisfying the conditions of the above Proposition. For any vector v ∈ V has a unique expression as

$$v = u_1 + u_2 + \cdots + u_r$$

with u_i ∈ U_i for i = 1, . . . , r; then we define π_i(v) = u_i.
The point of this is that projections give us another way to recognise and describe direct sums.
4.2 Linear maps and matrices
Let α : V → V be a linear map. If we choose a basis v_1, . . . , v_n for V, then V can be written in coordinates as K^n, and α is represented by a matrix A, say, where

$$\alpha(v_i) = \sum_{j=1}^n a_{ji} v_j.$$

Then just as in the last section, the action of α on V is represented by the action of A on K^n: α(v) is represented by the product Av. Also, as in the last chapter, sums and products (and hence arbitrary polynomials) of linear maps are represented by sums and products of the representing matrices: that is, for any polynomial f(x), the map f(α) is represented by the matrix f(A).
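The claim that f(α) is represented by f(A) can be tested directly on a matrix. A small sketch (`poly_of_matrix` is my own helper name; coefficients are listed from the constant term upwards):

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def poly_of_matrix(coeffs, A):
    """Evaluate f(A), where coeffs = [c0, c1, c2, ...] gives f(x) = c0 + c1*x + ..."""
    n = len(A)
    I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    result = [[0] * n for _ in range(n)]
    power = I                                   # runs through A^0, A^1, A^2, ...
    for c in coeffs:
        result = [[result[i][j] + c * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = mat_mul(power, A)
    return result

A = [[1, 2], [0, 1]]
# f(x) = (x - 1)^2 = 1 - 2x + x^2 sends this A to the zero matrix.
assert poly_of_matrix([1, -2, 1], A) == [[0, 0], [0, 0]]
```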
What happens if we change the basis? This also follows from the formula we
worked out in the last chapter. However, there is only one basis to change.
Proposition 4.3 Let α be a linear map on V which is represented by the matrix A relative to a basis B, and by the matrix A′ relative to a basis B′. Let P = P_{B,B′} be the transition matrix between the two bases. Then

$$A' = P^{-1}AP.$$

Proof This is just Proposition 3.6, since P and Q are the same here.
Definition 4.2 Two n×n matrices A and B are said to be similar if B = P^{-1}AP for some invertible matrix P.
Thus similarity is an equivalence relation, and
two matrices are similar if and only if they represent the same linear
map with respect to different bases.
There is no simple canonical form for similarity like the one for equivalence that we met earlier. For the rest of this section we look at a special class of matrices or linear maps, the "diagonalisable" ones, where we do have a nice simple representative of the similarity class. In the final section we give without proof a general result for the complex numbers.
4.3 Eigenvalues and eigenvectors
Definition 4.3 Let α be a linear map on V. A vector v ∈ V is said to be an eigenvector of α, with eigenvalue λ ∈ K, if v ≠ 0 and α(v) = λv. The set {v : α(v) = λv} consisting of the zero vector and the eigenvectors with eigenvalue λ is called the λ-eigenspace of α.

Note that we require that v ≠ 0; otherwise the zero vector would be an eigenvector for any value of λ. With this requirement, each eigenvector has a unique eigenvalue: for if α(v) = λv = µv, then (λ − µ)v = 0, and so (since v ≠ 0) we have λ = µ.
The name eigenvalue is a mixture of German and English; it means "characteristic value" or "proper value" (here "proper" is used in the sense of "property").
Another term used in older books is “latent root”. Here “latent” means “hidden”:
the idea is that the eigenvalue is somehow hidden in a matrix representing α, and
we have to extract it by some procedure. We’ll see how to do this soon.
Example Let

$$A = \begin{bmatrix}-6&6\\-12&11\end{bmatrix}.$$

The vector v = [3, 4]^T satisfies

$$\begin{bmatrix}-6&6\\-12&11\end{bmatrix}\begin{bmatrix}3\\4\end{bmatrix} = 2\begin{bmatrix}3\\4\end{bmatrix},$$

so is an eigenvector with eigenvalue 2. Similarly, the vector w = [2, 3]^T is an eigenvector with eigenvalue 3.

If we knew that, for example, 2 is an eigenvalue of A, then we could find a corresponding eigenvector [x, y]^T by solving the linear equations

$$\begin{bmatrix}-6&6\\-12&11\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = 2\begin{bmatrix}x\\y\end{bmatrix}.$$

In the next-but-one section, we will see how to find the eigenvalues, and the fact that there cannot be more than n of them for an n×n matrix.
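Both eigenvector claims, and the "solve (A − λI)x = 0" recipe, can be checked in a few lines of plain Python (helper names are mine):

```python
def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[-6, 6], [-12, 11]]
# Verify the two eigenvectors given in the text.
assert mat_vec(A, [3, 4]) == [2 * 3, 2 * 4]   # eigenvalue 2
assert mat_vec(A, [2, 3]) == [3 * 2, 3 * 3]   # eigenvalue 3

# For a 2x2 matrix, a nonzero first row (a11 - lam, a12) of A - lam*I has
# kernel vector (-a12, a11 - lam), which is then an eigenvector for lam.
lam = 2
v = [-A[0][1], A[0][0] - lam]   # [-6, -8], proportional to [3, 4]
assert mat_vec(A, v) == [lam * x for x in v]
```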
4.4 Diagonalisability
Some linear maps have a particularly simple representation by matrices.
Deﬁnition 4.4 The linear map α on V is diagonalisable if there is a basis of V
relative to which the matrix representing α is a diagonal matrix.
Suppose that v_1, . . . , v_n is such a basis showing that α is diagonalisable. Then α(v_i) = a_{ii} v_i for i = 1, . . . , n, where a_{ii} is the ith diagonal entry of the diagonal matrix A. Thus, the basis vectors are eigenvectors. Conversely, if we have a basis of eigenvectors, then the matrix representing α is diagonal. So:

Proposition 4.4 The linear map α on V is diagonalisable if and only if there is a basis of V consisting of eigenvectors of α.
Example The matrix $\begin{bmatrix}1&2\\0&1\end{bmatrix}$ is not diagonalisable. It is easy to see that its only eigenvalue is 1, and the only eigenvectors are scalar multiples of [1, 0]^T. So we cannot find a basis of eigenvectors.
Theorem 4.5 Let α : V → V be a linear map. Then the following are equivalent:

(a) α is diagonalisable;

(b) V is the direct sum of eigenspaces of α;

(c) α = λ_1π_1 + ··· + λ_rπ_r, where λ_1, . . . , λ_r are the distinct eigenvalues of α, and π_1, . . . , π_r are projections satisfying π_1 + ··· + π_r = I and π_iπ_j = O for i ≠ j.
Proof Let λ_1, . . . , λ_r be the distinct eigenvalues of α, and let v_{i1}, . . . , v_{im_i} be a basis for the λ_i-eigenspace of α. Then α is diagonalisable if and only if the union of these bases is a basis for V. So (a) and (b) are equivalent.

Now suppose that (b) holds. Proposition 4.2 and its converse show that there are projections π_1, . . . , π_r satisfying the conditions of (c), where Im(π_i) is the λ_i-eigenspace. Now in this case it is easily checked that α and ∑λ_iπ_i agree on every vector in V, so they are equal. So (b) implies (c).

Finally, if α = ∑λ_iπ_i, where the π_i satisfy the conditions of (c), then V is the direct sum of the spaces Im(π_i), and Im(π_i) is the λ_i-eigenspace. So (c) implies (b), and we are done.
Example Our matrix A = $\begin{bmatrix}-6&6\\-12&11\end{bmatrix}$ is diagonalisable, since the eigenvectors [3, 4]^T and [2, 3]^T are linearly independent, and so form a basis for R². Indeed, we see that

$$\begin{bmatrix}-6&6\\-12&11\end{bmatrix}\begin{bmatrix}3&2\\4&3\end{bmatrix} = \begin{bmatrix}3&2\\4&3\end{bmatrix}\begin{bmatrix}2&0\\0&3\end{bmatrix},$$

so that P^{-1}AP is diagonal, where P is the matrix whose columns are the eigenvectors of A.
Furthermore, one can find two projection matrices whose column spaces are the eigenspaces, namely

$$P_1 = \begin{bmatrix}9&-6\\12&-8\end{bmatrix},\qquad P_2 = \begin{bmatrix}-8&6\\-12&9\end{bmatrix}.$$
Check directly that P_1² = P_1, P_2² = P_2, P_1P_2 = P_2P_1 = O, P_1 + P_2 = I, and 2P_1 + 3P_2 = A.
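These identities are quickly verified by machine; a plain-Python check of each claim (helper names are my own):

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add_scaled(c1, M1, c2, M2):
    """Compute c1*M1 + c2*M2 entrywise."""
    return [[c1 * a + c2 * b for a, b in zip(r1, r2)] for r1, r2 in zip(M1, M2)]

A  = [[-6, 6], [-12, 11]]
P1 = [[9, -6], [12, -8]]
P2 = [[-8, 6], [-12, 9]]
assert mat_mul(P1, P1) == P1 and mat_mul(P2, P2) == P2   # both are projections
assert mat_mul(P1, P2) == [[0, 0], [0, 0]]               # P1 P2 = O
assert mat_mul(P2, P1) == [[0, 0], [0, 0]]               # P2 P1 = O
assert add_scaled(1, P1, 1, P2) == [[1, 0], [0, 1]]      # P1 + P2 = I
assert add_scaled(2, P1, 3, P2) == A                     # 2 P1 + 3 P2 = A
```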
This expression for a diagonalisable matrix A in terms of projections is useful
in calculating powers of A, or polynomials in A.
Proposition 4.6 Let

$$A = \sum_{i=1}^r \lambda_i P_i$$

be the expression for the diagonalisable matrix A in terms of projections P_i satisfying the conditions of Theorem 4.5, that is, ∑_{i=1}^r P_i = I and P_iP_j = O for i ≠ j. Then

(a) for any positive integer m, we have

$$A^m = \sum_{i=1}^r \lambda_i^m P_i;$$

(b) for any polynomial f(x), we have

$$f(A) = \sum_{i=1}^r f(\lambda_i) P_i.$$
Proof (a) The proof is by induction on m, the case m = 1 being the given expression. Suppose that the result holds for m = k − 1. Then

$$A^k = A^{k-1}A = \left(\sum_{i=1}^r \lambda_i^{k-1} P_i\right)\left(\sum_{i=1}^r \lambda_i P_i\right).$$

When we multiply out this product, all the terms P_iP_j are zero for i ≠ j, and we obtain simply ∑_{i=1}^r λ_i^{k−1}λ_i P_i = ∑_{i=1}^r λ_i^k P_i, as required. So the induction goes through.
(b) If f(x) = ∑ a_m x^m, we obtain the result by multiplying the equation of part (a) by a_m and summing over m. (Note that, for m = 0, we use the fact that

$$A^0 = I = \sum_{i=1}^r P_i = \sum_{i=1}^r \lambda_i^0 P_i,$$

that is, part (a) holds also for m = 0.)
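For the worked 2×2 example this gives a cheap way to compute A^m without repeated matrix multiplication. A sketch comparing the two routes (assuming the projections P_1, P_2 found above):

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A  = [[-6, 6], [-12, 11]]
P1 = [[9, -6], [12, -8]]    # projection onto the 2-eigenspace
P2 = [[-8, 6], [-12, 9]]    # projection onto the 3-eigenspace

def power_direct(A, m):
    """A^m by repeated multiplication."""
    R = [[1, 0], [0, 1]]
    for _ in range(m):
        R = mat_mul(R, A)
    return R

def power_spectral(m):
    """A^m = 2^m P1 + 3^m P2, using Proposition 4.6(a)."""
    return [[2**m * P1[i][j] + 3**m * P2[i][j] for j in range(2)]
            for i in range(2)]

for m in range(1, 6):
    assert power_direct(A, m) == power_spectral(m)
```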
4.5 Characteristic and minimal polynomials
We defined the determinant of a square matrix A. Now we want to define the determinant of a linear map α. The obvious way to do this is to take the determinant of any matrix representing α. For this to be a good definition, we need to show that it doesn't matter which matrix we take; in other words, that det(A′) = det(A) if A and A′ are similar. But, if A′ = P^{-1}AP, then

$$\det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \det(A),$$

since det(P^{-1})det(P) = 1. So our plan will succeed:
Definition 4.5 (a) The determinant det(α) of a linear map α : V → V is the determinant of any matrix representing α.

(b) The characteristic polynomial c_α(x) of a linear map α : V → V is the characteristic polynomial of any matrix representing α.

(c) The minimal polynomial m_α(x) of a linear map α : V → V is the monic polynomial of smallest degree which is satisfied by α.
The second part of the definition is OK, by the same reasoning as the first (since c_A(x) is just a determinant). But the third part also creates a bit of a problem: how do we know that α satisfies any polynomial? The Cayley–Hamilton Theorem tells us that c_A(A) = O for any matrix A representing α. Now c_A(A) represents c_A(α), and c_A = c_α by definition; so c_α(α) = O. Indeed, the Cayley–Hamilton Theorem can be stated in the following form:
Proposition 4.7 For any linear map α on V, its minimal polynomial m_α(x) divides its characteristic polynomial c_α(x) (as polynomials).

Proof Suppose not; then we can divide c_α(x) by m_α(x), getting a quotient q(x) and non-zero remainder r(x); that is,

$$c_\alpha(x) = m_\alpha(x)q(x) + r(x).$$

Substituting α for x, and using the fact that c_α(α) = m_α(α) = O, we find that r(α) = O. But the degree of r is less than the degree of m_α, so this contradicts the definition of m_α as the polynomial of least degree satisfied by α.
Theorem 4.8 Let α be a linear map on V. Then the following conditions are
equivalent for an element λ of K:
(a) λ is an eigenvalue of α;
(b) λ is a root of the characteristic polynomial of α;
(c) λ is a root of the minimal polynomial of α.
Remark: This gives us a recipe to find the eigenvalues of α: take a matrix A representing α; write down its characteristic polynomial c_A(x) = det(xI − A); and find the roots of this polynomial. In our earlier example,

$$\begin{vmatrix}x-0.9&-0.3\\-0.1&x-0.7\end{vmatrix} = (x-0.9)(x-0.7) - 0.03 = x^2 - 1.6x + 0.6 = (x-1)(x-0.6),$$

so the eigenvalues are 1 and 0.6, as we found.
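For a 2×2 matrix the recipe boils down to the quadratic x² − Tr(A)x + det(A). A sketch (assumes the eigenvalues are real; helper names are mine):

```python
import math

def char_poly_2x2(A):
    """Coefficients (tr, det) of det(xI - A) = x^2 - tr*x + det."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return tr, det

def eigenvalues_2x2(A):
    tr, det = char_poly_2x2(A)
    disc = tr * tr - 4 * det
    s = math.sqrt(disc)            # assumption: real eigenvalues (disc >= 0)
    return sorted(((tr - s) / 2, (tr + s) / 2))

A = [[-6, 6], [-12, 11]]
print(eigenvalues_2x2(A))          # [2.0, 3.0]
```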
Proof (b) implies (a): Suppose that c_α(λ) = 0, that is, det(λI − α) = 0. Then λI − α is not invertible, so its kernel is non-zero. Pick a non-zero vector v in Ker(λI − α). Then (λI − α)v = 0, so that α(v) = λv; that is, λ is an eigenvalue of α.

(c) implies (b): Suppose that λ is a root of m_α(x). Then (x − λ) divides m_α(x). But m_α(x) divides c_α(x), by the Cayley–Hamilton Theorem; so (x − λ) divides c_α(x), whence λ is a root of c_α(x).

(a) implies (c): Let λ be an eigenvalue of α with eigenvector v. We have α(v) = λv. By induction, α^k(v) = λ^k v for any k, and so f(α)(v) = f(λ)v for any polynomial f. Choosing f = m_α, we have m_α(α) = O by definition, so m_α(λ)v = 0; since v ≠ 0, we have m_α(λ) = 0, as required.
Using this result, we can give a necessary and sufficient condition for α to be diagonalisable. First, a lemma.

Lemma 4.9 Let v_1, . . . , v_r be eigenvectors of α with distinct eigenvalues λ_1, . . . , λ_r. Then v_1, . . . , v_r are linearly independent.
Proof Suppose that v_1, . . . , v_r are linearly dependent, so that there exists a linear relation

$$c_1v_1 + \cdots + c_rv_r = 0,$$

with coefficients c_i not all zero. Some of these coefficients may be zero; choose a relation with the smallest number of non-zero coefficients. Suppose that c_1 ≠ 0. (If c_1 = 0, just renumber.) Now acting on the given relation with α, using the fact that α(v_i) = λ_iv_i, we get

$$c_1\lambda_1v_1 + \cdots + c_r\lambda_rv_r = 0.$$

Subtracting λ_1 times the first equation from the second, we get

$$c_2(\lambda_2-\lambda_1)v_2 + \cdots + c_r(\lambda_r-\lambda_1)v_r = 0.$$

Now this equation has fewer non-zero coefficients than the one we started with, which was assumed to have the smallest possible number. So the coefficients in this equation must all be zero. That is, c_i(λ_i − λ_1) = 0, so c_i = 0 (since λ_i ≠ λ_1), for i = 2, . . . , r. This doesn't leave much of the original equation, only c_1v_1 = 0, from which we conclude that c_1 = 0, contrary to our assumption. So the vectors must have been linearly independent.
Theorem 4.10 The linear map α on V is diagonalisable if and only if its minimal polynomial is the product of distinct linear factors, that is, its roots all have multiplicity 1.

Proof Suppose first that α is diagonalisable, with eigenvalues λ_1, . . . , λ_r. Then there is a basis such that α is represented by a diagonal matrix D whose diagonal entries are the eigenvalues. Now for any polynomial f, f(α) is represented by f(D), a diagonal matrix whose diagonal entries are f(λ_i) for i = 1, . . . , r. Choose

$$f(x) = (x-\lambda_1)\cdots(x-\lambda_r).$$

Then all the diagonal entries of f(D) are zero; so f(D) = O. We claim that f is the minimal polynomial of α; clearly it has no repeated roots, so we will be done. We know that each λ_i is a root of m_α(x), so that f(x) divides m_α(x); and we also know that f(α) = O, so that the degree of f cannot be smaller than that of m_α. So the claim follows.
Conversely, we have to show that if m_α is a product of distinct linear factors then α is diagonalisable. This is a little argument with polynomials. Let f(x) = ∏(x − λ_i) be the minimal polynomial of α, with the roots λ_i all distinct. Let h_i(x) = f(x)/(x − λ_i). Then the polynomials h_1, . . . , h_r have no common factor except 1; for the only possible factors are (x − λ_i), but (x − λ_i) fails to divide h_i. Now the Euclidean algorithm shows that we can write the h.c.f. as a linear combination:

$$1 = \sum_{i=1}^r h_i(x)k_i(x).$$

Let U_i = Im(h_i(α)). The vectors in U_i are eigenvectors of α with eigenvalue λ_i; for if u ∈ U_i, say u = h_i(α)v, then

$$(\alpha - \lambda_i I)u = (\alpha - \lambda_i I)h_i(\alpha)v = f(\alpha)v = 0,$$

so that α(u) = λ_iu. Moreover every vector can be written as a sum of vectors from the subspaces U_i. For, given v ∈ V, we have

$$v = Iv = \sum_{i=1}^r h_i(\alpha)\bigl(k_i(\alpha)v\bigr),$$

with h_i(α)(k_i(α)v) ∈ Im(h_i(α)). The fact that the expression is unique follows from the lemma, since the eigenvectors are linearly independent.
So how, in practice, do we "diagonalise" a matrix A, that is, find an invertible matrix P such that P^{-1}AP = D is diagonal? We saw an example of this earlier. The matrix equation can be rewritten as AP = PD, from which we see that the columns of P are the eigenvectors of A. So the procedure is: find the eigenvalues of A, and find a basis of eigenvectors; then let P be the matrix which has the eigenvectors as columns, and D the diagonal matrix whose diagonal entries are the eigenvalues. Then P^{-1}AP = D.
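Because P^{-1}AP = D is equivalent to AP = PD, the procedure can be checked without inverting P. A sketch using the running example (matrix names as in the text):

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[-6, 6], [-12, 11]]
P = [[3, 2], [4, 3]]      # columns are the eigenvectors [3,4] and [2,3]
D = [[2, 0], [0, 3]]      # eigenvalues, in the matching order
# AP = PD confirms that P^{-1} A P = D without computing P^{-1}.
assert mat_mul(A, P) == mat_mul(P, D)
```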
How do we find the minimal polynomial of a matrix? We know that it divides the characteristic polynomial, and that every root of the characteristic polynomial is a root of the minimal polynomial; then it's trial and error. For example, if the characteristic polynomial is (x−1)²(x−2)³, then the minimal polynomial must be one of (x−1)(x−2) (this would correspond to the matrix being diagonalisable), (x−1)²(x−2), (x−1)(x−2)², (x−1)²(x−2)², (x−1)(x−2)³ or (x−1)²(x−2)³. If we try them in this order, the first one to be satisfied by the matrix is the minimal polynomial.
For example, the characteristic polynomial of A = $\begin{bmatrix}1&2\\0&1\end{bmatrix}$ is (x−1)²; its minimal polynomial is not (x−1) (since A ≠ I); so it is (x−1)².
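The trial-and-error step is mechanical: substitute the matrix into each candidate, smallest degree first, and stop at the first one that gives the zero matrix. A sketch for this example (helper names are mine):

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_sub(A, B):
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(A, B)]

I = [[1, 0], [0, 1]]
A = [[1, 2], [0, 1]]
A_minus_I = mat_sub(A, I)
# Candidate (x - 1): A - I is not zero, so it is not satisfied...
assert A_minus_I != [[0, 0], [0, 0]]
# ...but candidate (x - 1)^2 is, so the minimal polynomial is (x - 1)^2.
assert mat_mul(A_minus_I, A_minus_I) == [[0, 0], [0, 0]]
```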
4.6 Jordan form
We ﬁnish this chapter by stating without proof a canonical form for matrices over
the complex numbers under similarity.
Definition 4.6 (a) A Jordan block J(n, λ) is a matrix of the form

$$J(n,\lambda) = \begin{bmatrix}\lambda&1&&&\\&\lambda&1&&\\&&\ddots&\ddots&\\&&&\lambda&1\\&&&&\lambda\end{bmatrix},$$

that is, it is an n×n matrix with λ on the main diagonal, 1 in positions immediately above the main diagonal, and 0 elsewhere. (We take J(1, λ) to be the 1×1 matrix [λ].)
(b) A matrix is in Jordan form if it can be written in block form with Jordan
blocks on the diagonal and zeros elsewhere.
Theorem 4.11 Over C, any matrix is similar to a matrix in Jordan form; that
is, any linear map can be represented by a matrix in Jordan form relative to a
suitable basis. Moreover, the Jordan form of a matrix or linear map is unique
apart from putting the Jordan blocks in a different order on the diagonal.
Remark A matrix over C is diagonalisable if and only if all the Jordan blocks
in its Jordan form have size 1.
Example Any 3×3 matrix over C is similar to one of

$$\begin{bmatrix}\lambda&0&0\\0&\mu&0\\0&0&\nu\end{bmatrix},\qquad \begin{bmatrix}\lambda&1&0\\0&\lambda&0\\0&0&\mu\end{bmatrix},\qquad \begin{bmatrix}\lambda&1&0\\0&\lambda&1\\0&0&\lambda\end{bmatrix},$$

for some λ, µ, ν ∈ C (not necessarily distinct).
Example Consider the matrix A = $\begin{bmatrix}a&b\\-b&a\end{bmatrix}$, with b ≠ 0. Its characteristic polynomial is x² − 2ax + (a² + b²), so that the eigenvalues over C are a + bi and a − bi. Thus A is diagonalisable, if we regard it as a matrix over the complex numbers. But over the real numbers, A has no eigenvalues and no eigenvectors; it is not diagonalisable, and cannot be put into Jordan form either.
We see that there are two different "obstructions" to a matrix being diagonalisable:
(a) The roots of the characteristic polynomial don’t lie in the ﬁeld K. We can
always get around this by working in a larger ﬁeld (as above, enlarge the
ﬁeld from R to C).
(b) Even though the characteristic polynomial factorises, there may be Jordan
blocks of size bigger than 1, so that the minimal polynomial has repeated
roots. This problem cannot be transformed away by enlarging the ﬁeld; we
are stuck with what we have.
Though it is beyond the scope of this course, it can be shown that if all the roots
of the characteristic polynomial lie in the ﬁeld K, then the matrix is similar to one
in Jordan form.
4.7 Trace
Here we meet another function of a linear map, and consider its relation to the
eigenvalues and the characteristic polynomial.
Deﬁnition 4.7 The trace Tr(A) of a square matrix A is the sum of its diagonal
entries.
Proposition 4.12 (a) For any two n×n matrices A and B, we have Tr(AB) = Tr(BA).

(b) Similar matrices have the same trace.

Proof (a)

$$\operatorname{Tr}(AB) = \sum_{i=1}^n (AB)_{ii} = \sum_{i=1}^n \sum_{j=1}^n A_{ij}B_{ji},$$

by the rules for matrix multiplication. Now obviously Tr(BA) is the same thing.

(b) Tr(P^{-1}AP) = Tr(APP^{-1}) = Tr(AI) = Tr(A).
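Part (a) is easy to spot-check; a quick plain-Python sketch (the matrices are arbitrary illustrative choices):

```python
def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 5], [6, 7]]
# Tr(AB) = Tr(BA), even though AB and BA are different matrices.
assert trace(mat_mul(A, B)) == trace(mat_mul(B, A))
assert mat_mul(A, B) != mat_mul(B, A)
```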
The second part of this proposition shows that, if α : V → V is a linear map, then any two matrices representing α have the same trace; so, as we did for the determinant, we can define the trace Tr(α) of α to be the trace of any matrix representing α.

The trace and determinant of α are coefficients in the characteristic polynomial of α.
Proposition 4.13 Let α : V → V be a linear map, where dim(V) = n, and let c_α be the characteristic polynomial of α, a polynomial of degree n with leading term x^n.

(a) The coefficient of x^{n−1} is −Tr(α), and the constant term is (−1)^n det(α).

(b) If α is diagonalisable, then the sum of its eigenvalues is Tr(α) and their product is det(α).
Proof Let A be a matrix representing α. We have

$$c_\alpha(x) = \det(xI - A) = \begin{vmatrix}x-a_{11}&-a_{12}&\dots&-a_{1n}\\-a_{21}&x-a_{22}&\dots&-a_{2n}\\\vdots&&\ddots&\vdots\\-a_{n1}&-a_{n2}&\dots&x-a_{nn}\end{vmatrix}.$$

The only way to obtain a term in x^{n−1} in the determinant is from the product (x − a_{11})(x − a_{22})···(x − a_{nn}) of diagonal entries, taking −a_{ii} from the ith factor and x from each of the others. (If we take one off-diagonal term, we would have to have at least two, so that the highest possible power of x would be x^{n−2}.) So the coefficient of x^{n−1} is minus the sum of the diagonal terms.

Putting x = 0, we find that the constant term is c_α(0) = det(−A) = (−1)^n det(A).

If α is diagonalisable then the eigenvalues are the roots of c_α(x):

$$c_\alpha(x) = (x-\lambda_1)(x-\lambda_2)\cdots(x-\lambda_n).$$

Now the coefficient of x^{n−1} is minus the sum of the roots, and the constant term is (−1)^n times the product of the roots.
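For the diagonalisable 2×2 example from earlier in the chapter, part (b) can be confirmed at a glance:

```python
A = [[-6, 6], [-12, 11]]
eigenvalues = [2, 3]     # found earlier for this matrix

trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
assert trace == sum(eigenvalues)                     # 5 = 2 + 3
assert det == eigenvalues[0] * eigenvalues[1]        # 6 = 2 * 3
```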
Chapter 5
Linear and quadratic forms
In this chapter we examine “forms”, that is, functions from a vector space V to
its ﬁeld, which are either linear or quadratic. The linear forms comprise the dual
space of V; we look at this and deﬁne dual bases and the adjoint of a linear map
(corresponding to the transpose of a matrix).
Quadratic forms make up the bulk of the chapter. We show that we can change the basis to put any quadratic form into "diagonal form" (with squared terms only), by a process generalising "completing the square" in elementary algebra, and that further reductions are possible over the real and complex numbers.
5.1 Linear forms and dual space
The deﬁnition is simple:
Definition 5.1 Let V be a vector space over K. A linear form on V is a linear map from V to K, where K is regarded as a 1-dimensional vector space over K: that is, it is a function from V to K satisfying

$$f(v_1+v_2) = f(v_1) + f(v_2),\qquad f(cv) = c\,f(v)$$

for all v_1, v_2, v ∈ V and c ∈ K.
If dim(V) = n, then a linear form is represented by a 1×n matrix over K, that is, a row vector of length n over K. If f = [a_1 a_2 . . . a_n], then for v = [x_1 x_2 . . . x_n]^T we have

$$f(v) = \begin{bmatrix}a_1&a_2&\dots&a_n\end{bmatrix}\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix} = a_1x_1 + a_2x_2 + \cdots + a_nx_n.$$
Conversely, any row vector of length n represents a linear form on K^n.

Definition 5.2 Linear forms can be added and multiplied by scalars in the obvious way:

$$(f_1+f_2)(v) = f_1(v) + f_2(v),\qquad (cf)(v) = c\,f(v).$$

So they form a vector space, which is called the dual space of V and is denoted by V*.
Not surprisingly, we have:

Proposition 5.1 If V is finite-dimensional, then so is V*, and dim(V*) = dim(V).

Proof We begin by observing that, if (v_1, . . . , v_n) is a basis for V, and a_1, . . . , a_n are any scalars whatsoever, then there is a unique linear map f with the property that f(v_i) = a_i for i = 1, . . . , n. It is given by

$$f(c_1v_1 + \cdots + c_nv_n) = a_1c_1 + \cdots + a_nc_n;$$

in other words, it is represented by the row vector [a_1 a_2 . . . a_n], and its action on K^n is by matrix multiplication as we saw earlier.
Now let f_i be the linear map defined by the rule that

$$f_i(v_j) = \begin{cases}1 & \text{if } i = j,\\ 0 & \text{if } i \ne j.\end{cases}$$

Then (f_1, . . . , f_n) form a basis for V*; indeed, the linear form f defined in the preceding paragraph is a_1f_1 + ··· + a_nf_n. This basis is called the dual basis of V* corresponding to the given basis for V. Since it has n elements, we see that dim(V*) = n = dim(V).
We can describe the basis in the preceding proof as follows.

Definition 5.3 The Kronecker delta δ_{ij} for i, j ∈ {1, . . . , n} is defined by the rule that

$$\delta_{ij} = \begin{cases}1 & \text{if } i = j,\\ 0 & \text{if } i \ne j.\end{cases}$$

Note that δ_{ij} is the (i, j) entry of the identity matrix. Now, if (v_1, . . . , v_n) is a basis for V, then the dual basis for the dual space V* is the basis (f_1, . . . , f_n) satisfying f_i(v_j) = δ_{ij}.

There are some simple properties of the Kronecker delta with respect to summation. For example,

$$\sum_{i=1}^n \delta_{ij}a_i = a_j$$

for fixed j ∈ {1, . . . , n}. This is because all terms of the sum except the term i = j are zero.
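The sifting property takes one line to confirm; a plain-Python sketch:

```python
def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

n = 4
a = [10, 20, 30, 40]
# Summing delta(i, j) * a_i over i picks out the single term a_j.
for j in range(n):
    assert sum(delta(i, j) * a[i] for i in range(n)) == a[j]
```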
5.1.1 Adjoints
Definition 5.4 Let α : V → W be a linear map. There is a linear map α* : W* → V* (note the reversal!) defined by

$$(\alpha^*(f))(v) = f(\alpha(v)).$$

The map α* is called the adjoint of α.
This definition takes a bit of unpicking. We are given α : V → W and asked to define α* : W* → V*. This means that, to any element f ∈ W* (any linear form on W) we must associate a linear form g = α*(f) ∈ V*. This linear form must act on vectors v ∈ V to produce scalars. Our definition says that α*(f) maps the vector v to the scalar f(α(v)): this makes sense because α(v) is a vector in W, and hence the linear form f ∈ W* can act on it to produce a scalar.
Now α*, being a linear map, is represented by a matrix when we choose bases for W* and V*. The obvious bases to choose are the dual bases corresponding to some given bases of W and V respectively. What is the matrix? Some calculation shows the following, which will not be proved in detail here.

Proposition 5.2 Let α : V → W be a linear map. Choose bases B for V, and C for W, and let A be the matrix representing α relative to these bases. Let B* and C* denote the dual bases of V* and W* corresponding to B and C. Then the matrix representing α* relative to the bases C* and B* is the transpose of A, that is, A^T.
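In coordinates this is easy to check numerically. Identifying a form f ∈ W* with its coordinate row relative to the dual basis, f(α(v)) is just f applied to Av, and the adjoint acts by the transpose; a sketch in Python with NumPy (the names A, f, v are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # matrix of alpha : V -> W  (dim V = 4, dim W = 3)
f = rng.standard_normal(3)        # coordinates of a form f in W* w.r.t. the dual basis
v = rng.standard_normal(4)        # coordinates of a vector v in V

lhs = f @ (A @ v)                 # f(alpha(v))
rhs = (A.T @ f) @ v               # (alpha*(f))(v), with alpha* represented by A^T
assert np.isclose(lhs, rhs)
```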
5.1.2 Change of basis

Suppose that we change bases in V from B = (v_1, ..., v_n) to B' = (v'_1, ..., v'_n), with change of basis matrix P = P_{B,B'}. How do the dual bases change? In other words, if B* = (f_1, ..., f_n) is the dual basis of B, and (B')* = (f'_1, ..., f'_n) the dual basis of B', then what is the transition matrix P_{B*,(B')*}? The next result answers the question.
Proposition 5.3 Let B and B' be bases for V, and B* and (B')* the dual bases of the dual space. Then

P_{B*,(B')*} = (P_{B,B'}^T)^{-1}.
Proof Use the notation from just before the Proposition. If P = P_{B,B'} has (i, j) entry p_{ij}, and Q = P_{B*,(B')*} has (i, j) entry q_{ij}, we have

v'_i = ∑_{k=1}^n p_{ki} v_k,    f'_j = ∑_{l=1}^n q_{lj} f_l,
and so

δ_{ij} = f'_j(v'_i) = (∑_{l=1}^n q_{lj} f_l)(∑_{k=1}^n p_{ki} v_k) = ∑_{l=1}^n ∑_{k=1}^n q_{lj} p_{ki} δ_{lk} = ∑_{k=1}^n q_{kj} p_{ki}.
Now q_{kj} is the (j, k) entry of Q^T, and so we have

I = Q^T P,

whence Q^T = P^{-1}, so that Q = (P^{-1})^T = (P^T)^{-1}, as required.
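Proposition 5.3 can be checked numerically. If we take the vectors of B as the columns of a matrix M, the coordinate rows of the dual basis B* are the rows of M^{-1}; a sketch in Python with NumPy (the names M, F, Q are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))   # columns: a basis B of R^n
P = rng.standard_normal((n, n))   # transition matrix P_{B,B'}
M2 = M @ P                        # columns: the new basis B'

F = np.linalg.inv(M)              # rows: dual basis B*  (so f_i(v_j) = delta_ij)
F2 = np.linalg.inv(M2)            # rows: dual basis (B')*

# f'_j = sum_l q_{lj} f_l means F2 = Q^T F; Proposition 5.3 says Q = (P^T)^{-1}
Q = np.linalg.inv(P.T)
assert np.allclose(F @ M, np.eye(n))   # F really is the dual basis of M
assert np.allclose(F2, Q.T @ F)
```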
5.2 Quadratic forms

A lot of applications of mathematics involve dealing with quadratic forms: you meet them in statistics (analysis of variance) and mechanics (energy of rotating bodies), among other places. In this section we begin the study of quadratic forms.

5.2.1 Quadratic forms

For almost everything in the remainder of this chapter, we assume that the characteristic of the field K is not equal to 2. This means that 2 ≠ 0 in K, so that the element 1/2 exists in K. Of our list of "standard" fields, this only excludes F_2, the integers mod 2. (For example, in F_5, we have 1/2 = 3.)
A quadratic form is a function which, when written out in coordinates, is a polynomial in which every term has total degree 2 in the variables. For example,

q(x, y, z) = x^2 + 4xy + 2xz − 3y^2 − 2yz − z^2

is a quadratic form in three variables.

We will meet a formal definition of a quadratic form later in the chapter, but for the moment we take the following.
Definition 5.5 A quadratic form in n variables x_1, ..., x_n over a field K is a polynomial

∑_{i=1}^n ∑_{j=1}^n a_{ij} x_i x_j

in the variables in which every term has degree two (that is, is a multiple of x_i x_j for some i, j).
In the above representation of a quadratic form, we see that if i ≠ j, then the term in x_i x_j comes twice, so that the coefficient of x_i x_j is a_{ij} + a_{ji}. We are free to choose any two values for a_{ij} and a_{ji} as long as they have the right sum; but we will always make the choice so that the two values are equal. That is, to obtain a term c x_i x_j, we take a_{ij} = a_{ji} = c/2. (This is why we require that the characteristic of the field is not 2.)

Any quadratic form is thus represented by a symmetric matrix A with (i, j) entry a_{ij} (that is, a matrix satisfying A = A^T). This is the third job of matrices in linear algebra: symmetric matrices represent quadratic forms.
We think of a quadratic form as defined above as being a function from the vector space K^n to the field K. It is clear from the definition that

q(x_1, ..., x_n) = v^T A v, where v = [x_1 ... x_n]^T.
Now if we change the basis for V, we obtain a different representation for the same function q. The effect of a change of basis is a linear substitution v = Pv' on the variables, where P is the transition matrix between the bases. Thus we have

v^T A v = (Pv')^T A (Pv') = (v')^T (P^T A P) v',

so we have the following:

Proposition 5.4 A basis change with transition matrix P replaces the symmetric matrix A representing a quadratic form by the matrix P^T A P.
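In Python with NumPy, taking the form q(x, y, z) = x^2 + 4xy + 2xz − 3y^2 − 2yz − z^2 from earlier (with the symmetric choice a_{ij} = a_{ji}), both identities can be checked directly:

```python
import numpy as np

# symmetric matrix of q(x, y, z) = x^2 + 4xy + 2xz - 3y^2 - 2yz - z^2
A = np.array([[ 1.0,  2.0,  1.0],
              [ 2.0, -3.0, -1.0],
              [ 1.0, -1.0, -1.0]])

def q(v):
    return v @ A @ v                     # q(v) = v^T A v

v = np.array([1.0, 2.0, -1.0])
x, y, z = v
# direct evaluation of the polynomial agrees with v^T A v
assert np.isclose(q(v), x**2 + 4*x*y + 2*x*z - 3*y**2 - 2*y*z - z**2)

# after the substitution v = P v', the same form is represented by P^T A P
rng = np.random.default_rng(2)
P = rng.standard_normal((3, 3))
v2 = rng.standard_normal(3)
assert np.isclose(v2 @ (P.T @ A @ P) @ v2, q(P @ v2))
```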
As for other situations where matrices represented objects on vector spaces, we make a definition:

Definition 5.6 Two symmetric matrices A, A' over a field K are congruent if A' = P^T A P for some invertible matrix P.

Proposition 5.5 Two symmetric matrices are congruent if and only if they represent the same quadratic form with respect to different bases.
Our next job, as you may expect, is to find a canonical form for symmetric matrices under congruence; that is, a choice of basis so that a quadratic form has a particularly simple shape. We will see that the answer to this question depends on the field over which we work. We will solve this problem for the fields of real and complex numbers.

5.2.2 Reduction of quadratic forms

Even if we cannot find a canonical form for quadratic forms, we can simplify them very greatly.
Theorem 5.6 Let q be a quadratic form in n variables x_1, ..., x_n, over a field K whose characteristic is not 2. Then by a suitable linear substitution to new variables y_1, ..., y_n, we can obtain

q = c_1 y_1^2 + c_2 y_2^2 + ··· + c_n y_n^2

for some c_1, ..., c_n ∈ K.
Proof Our proof is by induction on n. We call a quadratic form which is written as in the conclusion of the theorem diagonal. A form in one variable is certainly diagonal, so the induction starts. Now assume that the theorem is true for forms in n−1 variables. Take

q(x_1, ..., x_n) = ∑_{i=1}^n ∑_{j=1}^n a_{ij} x_i x_j,

where a_{ij} = a_{ji} for i ≠ j.
Case 1: Assume that a_{ii} ≠ 0 for some i. By a permutation of the variables (which is certainly a linear substitution), we can assume that a_{11} ≠ 0. Let

y_1 = x_1 + ∑_{i=2}^n (a_{1i}/a_{11}) x_i.
Then we have

a_{11} y_1^2 = a_{11} x_1^2 + 2 ∑_{i=2}^n a_{1i} x_1 x_i + q'(x_2, ..., x_n),

where q' is a quadratic form in x_2, ..., x_n. That is, all the terms involving x_1 in q have been incorporated into a_{11} y_1^2. So we have

q(x_1, ..., x_n) = a_{11} y_1^2 + q''(x_2, ..., x_n),
where q'' is the part of q not containing x_1, minus q'.

By induction, there is a change of variable so that

q''(x_2, ..., x_n) = ∑_{i=2}^n c_i y_i^2,

and so we are done (taking c_1 = a_{11}).
Case 2: All a_{ii} are zero, but a_{ij} ≠ 0 for some i ≠ j. Now

x_i x_j = (1/4)((x_i + x_j)^2 − (x_i − x_j)^2),

so taking x'_i = (1/2)(x_i + x_j) and x'_j = (1/2)(x_i − x_j), we obtain a new form for q which does contain a non-zero diagonal term. Now we apply the method of Case 1.
Case 3: All a_{ij} are zero. Now q is the zero form, and there is nothing to prove: take c_1 = ··· = c_n = 0.
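The three cases of the proof translate directly into an algorithm, performing simultaneous row and column operations. Below is a sketch in Python with NumPy (the function name, tolerance handling, and pivot search are my own choices, not from the notes); it returns an invertible P with P^T A P diagonal:

```python
import numpy as np

def diagonalise_by_congruence(A, tol=1e-12):
    # Reduce a real symmetric matrix A to diagonal form by congruence:
    # returns (P, D) with P invertible and P.T @ A @ P == D diagonal.
    D = np.array(A, dtype=float)
    n = D.shape[0]
    P = np.eye(n)

    def apply(E):                  # perform the substitution v = E v'
        nonlocal D, P
        D = E.T @ D @ E
        P = P @ E

    for k in range(n):
        if abs(D[k, k]) < tol:
            # Case 1 fails at position k: permute a nonzero diagonal entry
            # into place, or manufacture one as in Case 2.
            rows = [i for i in range(k + 1, n) if abs(D[i, i]) > tol]
            if rows:
                E = np.eye(n); E[:, [k, rows[0]]] = E[:, [rows[0], k]]
                apply(E)
            else:
                pairs = [(i, j) for i in range(k, n) for j in range(i + 1, n)
                         if abs(D[i, j]) > tol]
                if not pairs:
                    continue       # Case 3: the remaining block is zero
                i, j = pairs[0]
                E = np.eye(n); E[j, i] = 1.0   # x_i := x_i + x_j gives 2*a_ij on diagonal
                apply(E)
                if i != k:
                    E = np.eye(n); E[:, [k, i]] = E[:, [i, k]]
                    apply(E)
        # Case 1: clear row and column k outside the diagonal.
        for i in range(k + 1, n):
            E = np.eye(n); E[k, i] = -D[k, i] / D[k, k]
            apply(E)
    return P, D
```

For instance, `diagonalise_by_congruence([[0, 1], [1, 0]])` turns the form 2xy into a difference of squares via Case 2.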
Example 5.1 Consider the quadratic form q(x, y, z) = x^2 + 2xy + 4xz + y^2 + 4z^2. We have

(x + y + 2z)^2 = x^2 + 2xy + 4xz + y^2 + 4z^2 + 4yz,

and so

q = (x + y + 2z)^2 − 4yz
  = (x + y + 2z)^2 − (y + z)^2 + (y − z)^2
  = u^2 + v^2 − w^2,

where u = x + y + 2z, v = y − z, w = y + z. Otherwise said, the matrix representing the quadratic form, namely

A = [ 1 1 2 ]
    [ 1 1 0 ]
    [ 2 0 4 ]

is congruent to the matrix

A' = [ 1 0  0 ]
     [ 0 1  0 ]
     [ 0 0 −1 ].

Can you find an invertible matrix P such that P^T A P = A'?
Thus any quadratic form can be reduced to the diagonal shape

α_1 x_1^2 + ··· + α_n x_n^2

by a linear substitution. But this is still not a "canonical form for congruence". For example, if y_1 = x_1/c, then α_1 x_1^2 = (α_1 c^2) y_1^2. In other words, we can multiply any α_i by any factor which is a perfect square in K.
Over the complex numbers C, every element has a square root. Suppose that α_1, ..., α_r ≠ 0, and α_{r+1} = ··· = α_n = 0. Putting

y_i = { (√α_i) x_i   for 1 ≤ i ≤ r,
        x_i          for r + 1 ≤ i ≤ n,

we have

q = y_1^2 + ··· + y_r^2.

We will see later that r is an "invariant" of q: however we do the reduction, we arrive at the same value of r.
Over the real numbers R, things are not much worse. Since any positive real number has a square root, we may suppose that α_1, ..., α_s > 0, α_{s+1}, ..., α_{s+t} < 0, and α_{s+t+1} = ··· = α_n = 0. Now putting

y_i = { (√α_i) x_i    for 1 ≤ i ≤ s,
        (√−α_i) x_i   for s + 1 ≤ i ≤ s + t,
        x_i           for s + t + 1 ≤ i ≤ n,

we get

q = y_1^2 + ··· + y_s^2 − y_{s+1}^2 − ··· − y_{s+t}^2.

Again, we will see later that s and t don't depend on how we do the reduction. [This is the theorem known as Sylvester's Law of Inertia.]
5.2.3 Quadratic and bilinear forms

The formal definition of a quadratic form looks a bit different from the version we gave earlier, though it amounts to the same thing. First we define a bilinear form.

Definition 5.7 (a) Let b : V × V → K be a function of two variables from V with values in K. We say that b is a bilinear form if it is a linear function of each variable when the other is kept constant: that is,

b(v, w_1 + w_2) = b(v, w_1) + b(v, w_2),    b(v, cw) = c b(v, w),

with two similar equations involving the first variable. A bilinear form b is symmetric if b(v, w) = b(w, v) for all v, w ∈ V.
(b) Let q : V → K be a function. We say that q is a quadratic form if

   – q(cv) = c^2 q(v) for all c ∈ K, v ∈ V;

   – the function b defined by

     b(v, w) = (1/2)(q(v + w) − q(v) − q(w))

     is a bilinear form on V.

Remarks The bilinear form in the second part is symmetric; and the division by 2 in the definition is permissible because of our assumption that the characteristic of K is not 2.
If we think of the prototype of a quadratic form as being the function x^2, then the first equation says (cx)^2 = c^2 x^2, while the second has the form

(1/2)((x + y)^2 − x^2 − y^2) = xy,

and xy is the prototype of a bilinear form: it is a linear function of x when y is constant, and vice versa.
Note that the formula b(x, y) = (1/2)(q(x + y) − q(x) − q(y)) (which is known as the polarisation formula) says that the bilinear form is determined by the quadratic form. Conversely, if we know the symmetric bilinear form b, then we have

2q(v) = 4q(v) − 2q(v) = q(v + v) − q(v) − q(v) = 2b(v, v),

so that q(v) = b(v, v), and we see that the quadratic form is determined by the symmetric bilinear form. So these are equivalent objects.
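In coordinates, with q represented by a symmetric matrix A, polarisation recovers b(v, w) = v^T A w; a quick check in Python with NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2                      # a symmetric matrix over R

q = lambda v: v @ A @ v                          # the quadratic form
b = lambda v, w: (q(v + w) - q(v) - q(w)) / 2    # polarisation

v, w = rng.standard_normal(4), rng.standard_normal(4)
assert np.isclose(b(v, w), v @ A @ w)  # b is the symmetric bilinear form of A
assert np.isclose(b(v, v), q(v))       # and q is recovered as q(v) = b(v, v)
```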
If b is a symmetric bilinear form on V and B = (v_1, ..., v_n) is a basis for V, then we can represent b by the n × n matrix A whose (i, j) entry is a_{ij} = b(v_i, v_j). Note that A is a symmetric matrix. It is easy to see that this is the same as the matrix representing the quadratic form.
Here is a third way of thinking about a quadratic form. Let V* be the dual space of V, and let α : V → V* be a linear map. Then for v ∈ V, we have α(v) ∈ V*, and so α(v)(w) is an element of K. The function

b(v, w) = α(v)(w)

is a bilinear form on V. If α(v)(w) = α(w)(v) for all v, w ∈ V, then this bilinear form is symmetric. Conversely, a symmetric bilinear form b gives rise to a linear map α : V → V* satisfying α(v)(w) = α(w)(v), by the rule that α(v) is the linear map w ↦ b(v, w).
Now given α : V → V*, choose a basis B for V, and let B* be the dual basis for V*. Then α is represented by a matrix A relative to the bases B and B*.

Summarising:
Proposition 5.7 The following objects are equivalent on a vector space V over a field whose characteristic is not 2:

(a) a quadratic form on V;

(b) a symmetric bilinear form on V;

(c) a linear map α : V → V* satisfying α(v)(w) = α(w)(v) for all v, w ∈ V.

Moreover, if corresponding objects of these three types are represented by matrices as described above, then we get the same matrix A in each case. Also, a change of basis in V with transition matrix P replaces A by P^T A P.
Proof Only the last part needs proof. We have seen it for a quadratic form, and the argument for a bilinear form is the same. So suppose that α : V → V*, and we change from B to B' in V with transition matrix P. We saw that the transition matrix between the dual bases in V* is (P^T)^{-1}. Now go back to the discussion of linear maps between different vector spaces in Chapter 4. If α : V → W and we change bases in V and W with transition matrices P and Q, then the matrix A representing α is changed to Q^{-1} A P. Apply this with Q = (P^T)^{-1}, so that Q^{-1} = P^T, and we see that the new matrix is P^T A P, as required.
5.2.4 Canonical forms for complex and real forms

Finally, in this section, we return to quadratic forms (or symmetric matrices) over the real and complex numbers, and find canonical forms under congruence. Recall that two symmetric matrices A and A' are congruent if A' = P^T A P for some invertible matrix P; as we have seen, this is the same as saying that they represent the same quadratic form relative to different bases.
Theorem 5.8 Any n × n complex symmetric matrix A is congruent to a matrix of the form

[ I_r O ]
[ O   O ]

for some r. Moreover, r = rank(A); so if A is congruent to two matrices of this form, then they both have the same value of r.

Proof We already saw that A is congruent to a matrix of this form. Moreover, if P is invertible, then so is P^T, and so

r = rank(P^T A P) = rank(A),

as claimed.
The next result is Sylvester's Law of Inertia.

Theorem 5.9 Any n × n real symmetric matrix A is congruent to a matrix of the form

[ I_s  O    O ]
[ O   −I_t  O ]
[ O    O    O ]

for some s, t. Moreover, if A is congruent to two matrices of this form, then they have the same values of s and of t.

Proof Again we have seen that A is congruent to a matrix of this form. Arguing as in the complex case, we see that s + t = rank(A), and so any two matrices of this form congruent to A have the same value of s + t.
Suppose that two different reductions give the values s, t and s', t' respectively, with s + t = s' + t' = n. Suppose for a contradiction that s < s'. Now let q be the quadratic form represented by A. Then we are told that there are linear functions y_1, ..., y_n and z_1, ..., z_n of the original variables x_1, ..., x_n of q such that

q = y_1^2 + ··· + y_s^2 − y_{s+1}^2 − ··· − y_{s+t}^2
  = z_1^2 + ··· + z_{s'}^2 − z_{s'+1}^2 − ··· − z_{s'+t'}^2.
Now consider the equations

y_1 = 0, ..., y_s = 0, z_{s'+1} = 0, ..., z_n = 0,

regarded as linear equations in the original variables x_1, ..., x_n. The number of equations is s + (n − s') = n − (s' − s) < n. According to a lemma from much earlier in the course (we used it in the proof of the Exchange Lemma!), the equations have a non-zero solution. That is, there are values of x_1, ..., x_n, not all zero, such that the variables y_1, ..., y_s and z_{s'+1}, ..., z_n are all zero.
Since y_1 = ··· = y_s = 0, we have for these values

q = −y_{s+1}^2 − ··· − y_n^2 ≤ 0.

But since z_{s'+1} = ··· = z_n = 0, we also have

q = z_1^2 + ··· + z_{s'}^2 > 0.

But this is a contradiction. So we cannot have s < s'. Similarly we cannot have s' < s either. So we must have s = s', as required to be proved.
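Sylvester's Law can be observed numerically. By the Spectral Theorem of Chapter 7, the numbers s and t for a real symmetric matrix can be read off as the numbers of positive and negative eigenvalues; the sketch below (helper name and tolerance are my own) checks that a random congruence P^T A P changes the eigenvalues but not these counts:

```python
import numpy as np

def inertia(A, tol=1e-9):
    # (s, t) = numbers of positive and negative eigenvalues of symmetric A
    eigs = np.linalg.eigvalsh(A)
    return int((eigs > tol).sum()), int((eigs < -tol).sum())

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2                  # a real symmetric matrix

P = rng.standard_normal((5, 5))    # invertible with probability 1
assert inertia(P.T @ A @ P) == inertia(A)
```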
We saw that s + t is the rank of A. The number s − t is known as the signature of A. Of course, both the rank and the signature are independent of how we reduce the matrix (or quadratic form); and if we know the rank and signature, we can easily recover s and t.

You will meet some further terminology in association with Sylvester's Law of Inertia. Let q be a quadratic form in n variables represented by the real symmetric matrix A. Let q (or A) have rank s + t and signature s − t, that is, have s positive and t negative terms in its diagonal form. We say that q (or A) is

• positive definite if s = n (and t = 0), that is, if q(v) ≥ 0 for all v, with equality only if v = 0;

• positive semi-definite if t = 0, that is, if q(v) ≥ 0 for all v;

• negative definite if t = n (and s = 0), that is, if q(v) ≤ 0 for all v, with equality only if v = 0;

• negative semi-definite if s = 0, that is, if q(v) ≤ 0 for all v;

• indefinite if s > 0 and t > 0, that is, if q(v) takes both positive and negative values.
Chapter 6

Inner product spaces

Ordinary Euclidean space is a 3-dimensional vector space over R, but it is more than that: the extra geometric structure (lengths, angles, etc.) can all be derived from a special kind of bilinear form on the space known as an inner product. We examine inner product spaces and their linear maps in this chapter.

One can also define inner products for complex vector spaces, but things are a bit different: we have to use a form which is not quite bilinear. We defer this to Chapter 8.
6.1 Inner products and orthonormal bases

Definition 6.1 An inner product on a real vector space V is a function b : V × V → R satisfying

• b is bilinear (that is, b is linear in the first variable when the second is kept constant, and vice versa);

• b is positive definite, that is, b(v, v) ≥ 0 for all v ∈ V, and b(v, v) = 0 if and only if v = 0.

We usually write b(v, w) as v · w. An inner product is sometimes called a dot product (because of this notation).
Geometrically, in a real vector space, we define v · w = |v| |w| cos θ, where |v| and |w| are the lengths of v and w, and θ is the angle between v and w. Of course this definition doesn't work if either v or w is zero, but in this case v · w = 0. But it is much easier to reverse the process. Given an inner product on V, we define

|v| = √(v · v)
for any vector v ∈ V; and, if v, w ≠ 0, then we define the angle between them to be θ, where

cos θ = (v · w) / (|v| |w|).

For this definition to make sense, we need to know that

−|v| |w| ≤ v · w ≤ |v| |w|

for any vectors v, w (since cos θ lies between −1 and 1). This is the content of the Cauchy–Schwarz inequality:
Theorem 6.1 If v, w are vectors in an inner product space, then

(v · w)^2 ≤ (v · v)(w · w).

Proof By definition, we have (v + xw) · (v + xw) ≥ 0 for any real number x. Expanding, we obtain

x^2 (w · w) + 2x (v · w) + (v · v) ≥ 0.

This is a quadratic function in x. Since it is non-negative for all real x, either it has no real roots, or it has two equal real roots; thus its discriminant is non-positive, that is,

(v · w)^2 − (v · v)(w · w) ≤ 0,

as required.
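A quick numerical sanity check of the inequality for the standard inner product on R^3, over random vectors (Python with NumPy):

```python
import numpy as np

rng = np.random.default_rng(5)
for _ in range(100):
    v, w = rng.standard_normal(3), rng.standard_normal(3)
    # Cauchy-Schwarz: (v . w)^2 <= (v . v)(w . w)
    assert (v @ w) ** 2 <= (v @ v) * (w @ w) + 1e-12
```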
There is essentially only one kind of inner product on a real vector space.

Definition 6.2 A basis (v_1, ..., v_n) for an inner product space is called orthonormal if v_i · v_j = δ_{ij} (the Kronecker delta) for 1 ≤ i, j ≤ n.
Remark If vectors v_1, ..., v_n satisfy v_i · v_j = δ_{ij}, then they are necessarily linearly independent. For suppose that c_1 v_1 + ··· + c_n v_n = 0. Taking the inner product of this equation with v_i, we find that c_i = 0, for all i.
Theorem 6.2 Let · be an inner product on a real vector space V. Then there is an orthonormal basis (v_1, ..., v_n) for V. If we represent vectors in coordinates with respect to this basis, say v = [x_1 x_2 ... x_n]^T and w = [y_1 y_2 ... y_n]^T, then

v · w = x_1 y_1 + x_2 y_2 + ··· + x_n y_n.
Proof This follows from our reduction of quadratic forms in the last chapter. Since the inner product is bilinear, the function q(v) = v · v = |v|^2 is a quadratic form, and so it can be reduced to the form

q = x_1^2 + ··· + x_s^2 − x_{s+1}^2 − ··· − x_{s+t}^2.

Now we must have s = n and t = 0. For, if t > 0, then the (s+1)st basis vector v_{s+1} satisfies v_{s+1} · v_{s+1} = −1; while if s + t < n, then the nth basis vector v_n satisfies v_n · v_n = 0. Either of these would contradict the positive definiteness of the inner product. Now we have

q(x_1, ..., x_n) = x_1^2 + ··· + x_n^2,

and by polarisation we find that

b((x_1, ..., x_n), (y_1, ..., y_n)) = x_1 y_1 + ··· + x_n y_n,

as required.
However, it is possible to give a more direct proof of the theorem; this is important because it involves a constructive method for finding an orthonormal basis, known as the Gram–Schmidt process.

Let (w_1, ..., w_n) be any basis for V. The Gram–Schmidt process works as follows.
• Since w_1 ≠ 0, we have w_1 · w_1 > 0, that is, |w_1| > 0. Put v_1 = w_1/|w_1|; then |v_1| = 1, that is, v_1 · v_1 = 1.

• For i = 2, ..., n, let w'_i = w_i − (v_1 · w_i) v_1. Then

  v_1 · w'_i = v_1 · w_i − (v_1 · w_i)(v_1 · v_1) = 0

  for i = 2, ..., n.

• Now apply the Gram–Schmidt process recursively to (w'_2, ..., w'_n).

Since we replace these vectors by linear combinations of themselves, their inner products with v_1 remain zero throughout the process. So if we end up with vectors v_2, ..., v_n, then v_1 · v_i = 0 for i = 2, ..., n. By induction, we can assume that v_i · v_j = δ_{ij} for i, j = 2, ..., n; by what we have said, this holds if i or j is 1 as well.
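The recursive description above turns into code directly; here is a sketch in Python with NumPy (the function name is my own), run on the basis that appears in Example 6.1 below:

```python
import numpy as np

def gram_schmidt(ws):
    """Turn a list of linearly independent vectors into an orthonormal list,
    following the recursive description above."""
    ws = [np.asarray(w, dtype=float) for w in ws]
    basis = []
    while ws:
        w1, rest = ws[0], ws[1:]
        v1 = w1 / np.sqrt(w1 @ w1)              # normalise the first vector
        basis.append(v1)
        ws = [w - (v1 @ w) * v1 for w in rest]  # remove the v1-component, recurse
    return basis

vs = gram_schmidt([[1, 2, 2], [1, 1, 0], [1, 0, 0]])
V = np.column_stack(vs)
assert np.allclose(V.T @ V, np.eye(3))          # the result is orthonormal
assert np.allclose(vs[0], [1/3, 2/3, 2/3])
```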
Definition 6.3 The inner product on R^n for which the standard basis is orthonormal (that is, the one given in the theorem) is called the standard inner product on R^n.
Example 6.1 In R^3 (with the standard inner product), apply the Gram–Schmidt process to the vectors w_1 = [1 2 2]^T, w_2 = [1 1 0]^T, w_3 = [1 0 0]^T. To simplify things, I will write (a_1, a_2, a_3) instead of [a_1 a_2 a_3]^T.
We have w_1 · w_1 = 9, so in the first step we put

v_1 = (1/3) w_1 = (1/3, 2/3, 2/3).

Now v_1 · w_2 = 1 and v_1 · w_3 = 1/3, so in the second step we find

w'_2 = w_2 − v_1 = (2/3, 1/3, −2/3),
w'_3 = w_3 − (1/3) v_1 = (8/9, −2/9, −2/9).

Now we apply Gram–Schmidt recursively to w'_2 and w'_3. We have w'_2 · w'_2 = 1, so v_2 = w'_2 = (2/3, 1/3, −2/3). Then v_2 · w'_3 = 2/3, so

w''_3 = w'_3 − (2/3) v_2 = (4/9, −4/9, 2/9).

Finally, w''_3 · w''_3 = 4/9, so v_3 = (3/2) w''_3 = (2/3, −2/3, 1/3).

Check that the three vectors we have found really do form an orthonormal basis.
6.2 Adjoints and orthogonal linear maps

We saw in the last chapter that a bilinear form on V is the same thing as a linear map from V to its dual space. The importance of an inner product is that the corresponding linear map is a bijection which maps an orthonormal basis of V to its dual basis in V*.
Recall that the linear map α : V → V* corresponding to a bilinear form b on V satisfies α(v)(w) = b(v, w); in our case, α(v)(w) = v · w. Now suppose that (v_1, ..., v_n) is an orthonormal basis for V, so that v_i · v_j = δ_{ij}. Then, if α(v_i) = f_i, we have f_i(v_j) = δ_{ij}; but this is exactly the statement that (f_1, ..., f_n) is the dual basis to (v_1, ..., v_n).

So, on an inner product space V, we have a natural way of matching up V with V*.
Recall too that we defined the adjoint of α : V → V to be the map α* : V* → V* defined by α*(f)(v) = f(α(v)), and we showed that the matrix representing α* relative to the dual basis is the transpose of the matrix representing α relative to the original basis.

Translating all this to an inner product space, we have the following definition and result:
Definition 6.4 Let V be an inner product space, and α : V → V a linear map. Then the adjoint of α is the linear map α* : V → V defined by

v · α*(w) = α(v) · w.

Proposition 6.3 If α is represented by the matrix A relative to an orthonormal basis of V, then α* is represented by the transposed matrix A^T.
Now we define two important classes of linear maps on V.

Definition 6.5 Let α be a linear map on an inner product space V.

(a) α is self-adjoint if α* = α.

(b) α is orthogonal if it is invertible and α* = α^{-1}.

Proposition 6.4 If α is represented by a matrix A (relative to an orthonormal basis), then

(a) α is self-adjoint if and only if A is symmetric;

(b) α is orthogonal if and only if A^T A = I.
Part (a) of this result shows that we have yet another equivalence relation on real symmetric matrices:

Definition 6.6 Two real symmetric matrices are called orthogonally similar if they represent the same self-adjoint map with respect to different orthonormal bases.

Then, from part (b), we see:

Proposition 6.5 Two real symmetric matrices A and A' are orthogonally similar if and only if there is an orthogonal matrix P such that A' = P^{-1} A P = P^T A P.

Here P^{-1} = P^T because P is orthogonal. We see that orthogonal similarity is a refinement of both similarity and congruence. We will examine self-adjoint maps (or symmetric matrices) further in the next section.
Next we look at orthogonal maps.

Theorem 6.6 The following are equivalent for a linear map α on an inner product space V:

(a) α is orthogonal;

(b) α preserves the inner product, that is, α(v) · α(w) = v · w;

(c) α maps an orthonormal basis of V to an orthonormal basis.

Proof We have

α(v) · α(w) = v · α*(α(w)),

by the definition of adjoint; so (a) and (b) are equivalent.

Suppose that (v_1, ..., v_n) is an orthonormal basis, that is, v_i · v_j = δ_{ij}. If (b) holds, then α(v_i) · α(v_j) = δ_{ij}, so that (α(v_1), ..., α(v_n)) is an orthonormal basis, and (c) holds. Conversely, suppose that (c) holds, and let v = ∑ x_i v_i and w = ∑ y_i v_i for some orthonormal basis (v_1, ..., v_n), so that v · w = ∑ x_i y_i. We have

α(v) · α(w) = (∑ x_i α(v_i)) · (∑ y_i α(v_i)) = ∑ x_i y_i,

since α(v_i) · α(v_j) = δ_{ij} by assumption; so (b) holds.
Corollary 6.7 α is orthogonal if and only if the columns of the matrix representing α relative to an orthonormal basis themselves form an orthonormal basis.

Proof The columns of the matrix representing α are just the vectors α(v_1), ..., α(v_n), written in coordinates relative to v_1, ..., v_n. So this follows from the equivalence of (a) and (c) in the theorem. Alternatively, the condition on columns shows that A^T A = I, where A is the matrix representing α; so α* α = I, and α is orthogonal.
Example Our earlier example of the Gram–Schmidt process produces the orthogonal matrix

[ 1/3   2/3   2/3 ]
[ 2/3   1/3  −2/3 ]
[ 2/3  −2/3   1/3 ]

whose columns are precisely the orthonormal basis we constructed in the example.
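For the matrix above, the condition of the Corollary is easy to verify numerically (a quick check in Python with NumPy):

```python
import numpy as np

P = np.array([[1,  2,  2],
              [2,  1, -2],
              [2, -2,  1]]) / 3.0
assert np.allclose(P.T @ P, np.eye(3))      # columns orthonormal <=> P^T P = I
assert np.allclose(np.linalg.inv(P), P.T)   # hence P^{-1} = P^T
```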
Chapter 7

Symmetric and Hermitian matrices

We come to one of the most important topics of the course. In simple terms, any real symmetric matrix is diagonalisable. But there is more to be said!

7.1 Orthogonal projections and orthogonal decompositions
We say that two vectors v, w in an inner product space are orthogonal if v · w = 0.

Definition 7.1 Let V be a real inner product space, and U a subspace of V. The orthogonal complement of U is the set of all vectors which are orthogonal to everything in U:

U^⊥ = {w ∈ V : w · u = 0 for all u ∈ U}.

Proposition 7.1 If V is an inner product space and U a subspace of V, with dim(V) = n and dim(U) = r, then U^⊥ is a subspace of V, and dim(U^⊥) = n − r. Moreover, V = U ⊕ U^⊥.
Proof Proving that U^⊥ is a subspace is straightforward from the properties of the inner product. If w_1, w_2 ∈ U^⊥, then w_1 · u = w_2 · u = 0 for all u ∈ U, so (w_1 + w_2) · u = 0 for all u ∈ U, whence w_1 + w_2 ∈ U^⊥. The argument for scalar multiples is similar.

Now choose a basis for U and extend it to a basis for V. Then apply the Gram–Schmidt process to this basis (starting with the elements of the basis for U), to obtain an orthonormal basis (v_1, ..., v_n). Since the process only modifies vectors by adding multiples of earlier vectors, the first r vectors in the resulting basis will form an orthonormal basis for U. The last n − r vectors will be orthogonal to
U, and so lie in U^⊥; and they are clearly linearly independent. Now suppose that w ∈ U^⊥ and w = ∑ c_i v_i, where (v_1, ..., v_n) is the orthonormal basis we constructed. Then c_i = w · v_i = 0 for i = 1, ..., r; so w is a linear combination of the last n − r basis vectors, which thus form a basis of U^⊥. Hence dim(U^⊥) = n − r, as required.

Now the last statement of the proposition follows from the proof, since we have a basis for V which is a disjoint union of bases for U and U^⊥.
Recall the connection between direct sum decompositions and projections. If we have projections P_1, ..., P_r whose sum is the identity and which satisfy P_i P_j = O for i ≠ j, then the space V is the direct sum of their images. This can be refined in an inner product space as follows.
Definition 7.2 Let V be an inner product space. A linear map π : V → V is an orthogonal projection if

(a) π is a projection, that is, π^2 = π;

(b) π is self-adjoint, that is, π* = π (where π*(v) · w = v · π(w) for all v, w ∈ V).
Proposition 7.2 If π is an orthogonal projection, then Ker(π) = Im(π)^⊥.

Proof We know that V = Ker(π) ⊕ Im(π); we only have to show that these two subspaces are orthogonal. So take v ∈ Ker(π), so that π(v) = 0, and w ∈ Im(π), so that w = π(u) for some u ∈ V. Then

v · w = v · π(u) = π*(v) · u = π(v) · u = 0,

as required.
Proposition 7.3 Let π_1, ..., π_r be orthogonal projections on an inner product space V satisfying π_1 + ··· + π_r = I and π_i π_j = O for i ≠ j. Let U_i = Im(π_i) for i = 1, ..., r. Then

V = U_1 ⊕ U_2 ⊕ ··· ⊕ U_r,

and if u_i ∈ U_i and u_j ∈ U_j with i ≠ j, then u_i and u_j are orthogonal.

Proof The fact that V is the direct sum of the images of the π_i follows from Proposition 5.2. We only have to prove the last part. So take u_i and u_j as in the Proposition, say u_i = π_i(v) and u_j = π_j(w). Then

u_i · u_j = π_i(v) · π_j(w) = π_i*(v) · π_j(w) = v · π_i(π_j(w)) = 0,

where the second equality holds since π_i is self-adjoint and the third is the definition of the adjoint.
A direct sum decomposition satisfying the conditions of the proposition is called an orthogonal decomposition of V.

Conversely, if we are given an orthogonal decomposition of V, then we can find orthogonal projections satisfying the hypotheses of the proposition.
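As a concrete sketch in Python with NumPy (the matrix B is an arbitrary illustrative choice): if the columns of Q are an orthonormal basis for a subspace U, then π = QQ^T is the orthogonal projection onto U, and the defining properties can be checked directly:

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(B)        # columns of Q: orthonormal basis for U = col(B)
pi = Q @ Q.T                  # matrix of the orthogonal projection onto U

assert np.allclose(pi @ pi, pi)    # pi is a projection: pi^2 = pi
assert np.allclose(pi, pi.T)       # and self-adjoint: pi^T = pi

# Ker(pi) = Im(pi)^perp: whatever pi kills is orthogonal to whatever pi produces
v = np.array([1.0, -1.0, 1.0])
w = np.array([2.0, 0.0, 5.0])
assert np.isclose((v - pi @ v) @ (pi @ w), 0.0)
```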
7.2 The Spectral Theorem

The main theorem can be stated in two different ways. I emphasise that these two theorems are the same! Either of them can be referred to as the Spectral Theorem.

Theorem 7.4 If α is a self-adjoint linear map on a real inner product space V, then the eigenspaces of α form an orthogonal decomposition of V. Hence there is an orthonormal basis of V consisting of eigenvectors of α. Moreover, there exist orthogonal projections π_1, ..., π_r satisfying π_1 + ··· + π_r = I and π_i π_j = O for i ≠ j, such that

α = λ_1 π_1 + ··· + λ_r π_r,

where λ_1, ..., λ_r are the distinct eigenvalues of α.
Theorem 7.5 Let A be a real symmetric matrix. Then there exists an orthogonal matrix P such that P^{-1} A P is diagonal. In other words, any real symmetric matrix is orthogonally similar to a diagonal matrix.

Proof The second theorem follows from the first, since the transition matrix from one orthonormal basis to another is an orthogonal matrix. So we concentrate on the first theorem. It suffices to find an orthonormal basis of eigenvectors, since all the rest follows from our remarks about projections, together with what we already know about diagonalisable maps.

The proof will be by induction on n = dim(V). There is nothing to do if n = 1. So we assume that the theorem holds for (n−1)-dimensional spaces.
The first job is to show that α has an eigenvector.

Choose an orthonormal basis; then α is represented by a real symmetric matrix A. Its characteristic polynomial has a root λ over the complex numbers. (The so-called "Fundamental Theorem of Algebra" asserts that any polynomial over C has a root.) We temporarily enlarge the field from R to C. Now we can find a column vector v ∈ C^n such that Av = λv. Taking the complex conjugate, remembering that A is real, we have Av̄ = λ̄v̄.

If v = [z_1 z_2 ... z_n]^T, then we have

λ̄(|z_1|^2 + |z_2|^2 + ··· + |z_n|^2) = λ̄ v̄^T v = (Av̄)^T v = v̄^T A v = v̄^T (λv) = λ(|z_1|^2 + |z_2|^2 + ··· + |z_n|^2),

so (λ − λ̄)(|z_1|^2 + |z_2|^2 + ··· + |z_n|^2) = 0. Since v is not the zero vector, the second factor is positive, so we must have λ = λ̄, that is, λ is real.
Now since α has a real eigenvalue, we can choose a real eigenvector v, and (multiplying by a scalar if necessary) we can assume that |v| = 1.

Let U be the subspace v^⊥ = {u ∈ V : v · u = 0}. This is a subspace of V of dimension n − 1. We claim that α : U → U. For take u ∈ U. Then

v · α(u) = α*(v) · u = α(v) · u = λ v · u = 0,

where we use the fact that α is self-adjoint. So α(u) ∈ U.

So α is a self-adjoint linear map on the (n−1)-dimensional inner product space U. By the inductive hypothesis, U has an orthonormal basis consisting of eigenvectors of α. They are all orthogonal to the unit vector v; so, adding v to the basis, we get an orthonormal basis for V, and we are done.
Remark The theorem is almost a canonical form for real symmetric matrices under the relation of orthogonal similarity. If we require that the eigenvalues occur in decreasing order down the diagonal, then the result is a true canonical form: each matrix is orthogonally similar to a unique diagonal matrix with this property.
Corollary 7.6 If α is self-adjoint, then eigenvectors of α corresponding to distinct eigenvalues are orthogonal.

Proof This follows from the theorem, but is easily proved directly. If α(v) = λv and α(w) = µw, then

    λv · w = α(v) · w = α*(v) · w = v · α(w) = µv · w,

so, if λ ≠ µ, then v · w = 0.
Example 7.1 Let

    A = [ 10  2  2 ]
        [  2 13  4 ]
        [  2  4 13 ].

The characteristic polynomial of A is

    | x−10   −2    −2  |
    |  −2   x−13   −4  | = (x − 9)²(x − 18),
    |  −2    −4   x−13 |

so the eigenvalues are 9 and 18.
For the eigenvalue 18, the eigenvectors satisfy

    [ 10  2  2 ] [ x ]   [ 18x ]
    [  2 13  4 ] [ y ] = [ 18y ]
    [  2  4 13 ] [ z ]   [ 18z ],
so the eigenvectors are multiples of [1 2 2]^T. Normalising, we can choose a unit eigenvector [1/3 2/3 2/3]^T.
For the eigenvalue 9, the eigenvectors satisfy

    [ 10  2  2 ] [ x ]   [ 9x ]
    [  2 13  4 ] [ y ] = [ 9y ]
    [  2  4 13 ] [ z ]   [ 9z ],

that is, x + 2y + 2z = 0. (This condition says precisely that the eigenvectors are orthogonal to the eigenvector for λ = 18, as we know.) Thus the eigenspace is 2-dimensional. We need to choose an orthonormal basis for it. This can be done in many different ways: for example, we could choose [0 1/√2 −1/√2]^T and [−4/(3√2) 1/(3√2) 1/(3√2)]^T. Then we have an orthonormal basis of eigenvectors. We conclude that, if

    P = [ 1/3     0     −4/(3√2) ]
        [ 2/3   1/√2    1/(3√2)  ]
        [ 2/3  −1/√2    1/(3√2)  ],

then P is orthogonal, and

    P^T AP = [ 18 0 0 ]
             [  0 9 0 ]
             [  0 0 9 ].
You might like to check that the orthogonal matrix in the example in the last
chapter of the notes also diagonalises A.
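The diagonalisation in Example 7.1 is easy to verify by machine; here is a quick check (a sketch using Python/numpy, which is not part of the notes):

```python
import numpy as np

# The matrix of Example 7.1 and the orthonormal eigenvector basis found there.
A = np.array([[10.0,  2.0,  2.0],
              [ 2.0, 13.0,  4.0],
              [ 2.0,  4.0, 13.0]])

r = 3 * np.sqrt(2)
P = np.array([[1/3,  0.0,           -4/r],
              [2/3,  1/np.sqrt(2),   1/r],
              [2/3, -1/np.sqrt(2),   1/r]])

print(np.allclose(P.T @ P, np.eye(3)))                      # P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag([18.0, 9.0, 9.0])))  # P^T A P = diag(18, 9, 9)
```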
7.3 Quadratic forms revisited
Any real quadratic form is represented by a real symmetric matrix; and, as we have seen, orthogonal similarity is a refinement of congruence. This gives us a new look at the reduction of real quadratic forms. Recall that any real symmetric matrix is congruent to one of the form

    [ I_s   O   O ]
    [  O  −I_t  O ]
    [  O    O   O ],

where the numbers s and t are uniquely determined: s + t is the rank, and s − t the signature, of the matrix (Sylvester's Law of Inertia).
Proposition 7.7 The rank of a real symmetric matrix is equal to the number of non-zero eigenvalues, and the signature is the number of positive eigenvalues minus the number of negative eigenvalues (counted according to multiplicity).

Proof Given a real symmetric matrix A, there is an orthogonal matrix P such that P^T AP is diagonal, with diagonal entries λ_1, ..., λ_n, say. Suppose that λ_1, ..., λ_s are positive, λ_{s+1}, ..., λ_{s+t} are negative, and the remainder are zero. Let D be the diagonal matrix with diagonal entries

    1/√λ_1, ..., 1/√λ_s, 1/√(−λ_{s+1}), ..., 1/√(−λ_{s+t}), 1, ..., 1.

Then

    (PD)^T A(PD) = D^T P^T APD = [ I_s   O   O ]
                                 [  O  −I_t  O ]
                                 [  O    O   O ].
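Proposition 7.7 gives a practical way to read off rank and signature from a numerical diagonalisation. A sketch in Python/numpy (the function name and the tolerance are our own choices, not from the notes):

```python
import numpy as np

def rank_and_signature(A, tol=1e-9):
    # Eigenvalues of a real symmetric matrix are real; count their signs.
    eig = np.linalg.eigvalsh(A)
    pos = int(np.sum(eig > tol))
    neg = int(np.sum(eig < -tol))
    return pos + neg, pos - neg          # (rank, signature) = (s + t, s - t)

# The canonical form diag(I_s, -I_t, O) with s = 2, t = 1:
A = np.diag([1.0, 1.0, -1.0, 0.0])
print(rank_and_signature(A))             # (3, 1)

# Congruence preserves both numbers (Sylvester's Law of Inertia):
M = np.array([[2.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 3.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 1.0]])     # an invertible matrix
print(rank_and_signature(M.T @ A @ M))   # (3, 1) again
```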
7.4 Simultaneous diagonalisation
There are two important theorems which allow us to diagonalise more than one matrix at the same time. We will consider the first of these in matrix form only.
Theorem 7.8 Let A and B be real symmetric matrices, and suppose that A is positive definite. Then there exists an invertible matrix P such that P^T AP = I and P^T BP is diagonal. Moreover, the diagonal entries of P^T BP are the roots of the polynomial equation det(xA − B) = 0.
Proof A is a real symmetric matrix, so there exists an invertible matrix P_1 such that P_1^T AP_1 is in the canonical form for congruence (as in Sylvester's Law of Inertia). Since A is positive definite, this canonical form must be I; that is, P_1^T AP_1 = I.
Now consider P_1^T BP_1 = C. This is a real symmetric matrix; so, according to the spectral theorem (in matrix form), we can find an orthogonal matrix P_2 such that P_2^T CP_2 = D is diagonal. Moreover, P_2 is orthogonal, so P_2^T P_2 = I.
Let P = P_1 P_2. Then

    P^T AP = P_2^T (P_1^T AP_1) P_2 = P_2^T I P_2 = I,

and

    P^T BP = P_2^T (P_1^T BP_1) P_2 = P_2^T CP_2 = D,

as required.
The diagonal entries of D are the eigenvalues of C, that is, the roots of the equation det(xI − C) = 0. Now we have

    det(P_1^T) det(xA − B) det(P_1) = det(P_1^T (xA − B) P_1) = det(xP_1^T AP_1 − P_1^T BP_1) = det(xI − C),

and det(P_1^T) = det(P_1) is non-zero; so the polynomials det(xA − B) and det(xI − C) are non-zero multiples of each other, and so have the same roots.
You might meet this theorem in mechanics. If a mechanical system has n coordinates x_1, ..., x_n, then the kinetic energy is a quadratic form in the velocities ẋ_1, ..., ẋ_n, and (from general physical principles) is positive definite (zero velocities correspond to minimum energy); near equilibrium, the potential energy is approximated by a quadratic function of the coordinates x_1, ..., x_n. If we simultaneously diagonalise the matrices of the two quadratic forms, then we can solve n separate differential equations rather than a complicated system in n variables!
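The construction in the proof of Theorem 7.8 can be carried out numerically. The sketch below is our own code, following the proof: P_1 comes from a Cholesky factorisation A = LL^T, so that P_1 = (L^T)^{−1} satisfies P_1^T AP_1 = I.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)              # positive definite
N = rng.standard_normal((4, 4))
B = N + N.T                              # symmetric

L = np.linalg.cholesky(A)                # A = L L^T
P1 = np.linalg.inv(L.T)                  # so P1^T A P1 = I
C = P1.T @ B @ P1                        # real symmetric
eigvals, P2 = np.linalg.eigh(C)          # orthogonal P2 with P2^T C P2 diagonal
P = P1 @ P2

print(np.allclose(P.T @ A @ P, np.eye(4)))          # True
print(np.allclose(P.T @ B @ P, np.diag(eigvals)))   # True
# The diagonal entries are the roots of det(xA - B) = 0, i.e. the
# eigenvalues of A^{-1} B:
print(np.allclose(np.sort(np.linalg.eigvals(np.linalg.inv(A) @ B).real),
                  eigvals))                         # True
```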
The second theorem can be stated either for linear maps or for matrices.
Theorem 7.9 (a) Let α and β be self-adjoint maps on an inner product space V, and suppose that αβ = βα. Then there is an orthonormal basis for V which consists of simultaneous eigenvectors of α and β.

(b) Let A and B be real symmetric matrices satisfying AB = BA. Then there is an orthogonal matrix P such that both P^T AP and P^T BP are diagonal.
Proof Statement (b) is just a translation of (a) into matrix terms; so we prove (a).
Let λ_1, ..., λ_r be the distinct eigenvalues of α. By the Spectral Theorem, we have an orthogonal decomposition

    V = U_1 ⊕ ··· ⊕ U_r,

where U_i is the λ_i-eigenspace of α.
We claim that β maps U_i to U_i. For take u ∈ U_i, so that α(u) = λ_i u. Then

    α(β(u)) = β(α(u)) = β(λ_i u) = λ_i β(u),

so β(u) is also an eigenvector of α with eigenvalue λ_i. Hence β(u) ∈ U_i, as required.
Now β is a self-adjoint linear map on the inner product space U_i, and so by the spectral theorem again, U_i has an orthonormal basis consisting of eigenvectors of β. But these vectors are also eigenvectors of α, since they belong to U_i.
Finally, since we have an orthogonal decomposition, putting together all these bases gives us an orthonormal basis of V consisting of simultaneous eigenvectors of α and β.
Remark This theorem easily extends to an arbitrary set of real symmetric matrices such that any two commute. For a finite set, the proof is by induction on the number of matrices in the set, based on the proof just given. For an infinite set, we use the fact that they span a finite-dimensional subspace of the space of all real symmetric matrices; to diagonalise all the matrices in our set, it suffices to diagonalise the matrices in a basis.
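For commuting symmetric matrices the conclusion of Theorem 7.9 is easy to observe numerically. In the sketch below (our own example) we manufacture a commuting pair as polynomials in one random symmetric matrix; since a random symmetric matrix has distinct eigenvalues almost surely, the eigenvector basis of A alone already diagonalises B:

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((4, 4))
M = S + S.T                              # random real symmetric matrix
A = M
B = M @ M - 3 * M                        # a polynomial in M, so AB = BA

print(np.allclose(A @ B, B @ A))         # True

# An orthonormal eigenvector basis of A (distinct eigenvalues, almost surely)
# is forced on B as well:
w, P = np.linalg.eigh(A)
D = P.T @ B @ P
print(np.allclose(D, np.diag(np.diag(D))))   # True: D is diagonal
```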
Chapter 8
The complex case
The theory of real inner product spaces and self-adjoint linear maps has a close parallel in the complex case. However, some changes are required. In this chapter we outline the complex case. Usually, the proofs are similar to those in the real case.
8.1 Complex inner products
There are no positive definite bilinear forms over the complex numbers; for we always have (iv) · (iv) = −v · v.
But it is possible to modify the definitions so that everything works in the same way over C.

Definition 8.1 An inner product on a complex vector space V is a map b : V × V → C satisfying

(a) b is a linear function of its second variable, keeping the first variable constant;

(b) b(w, v) is the complex conjugate of b(v, w). [It follows that b(v, v) ∈ R for all v ∈ V.]

(c) b(v, v) ≥ 0 for all v ∈ V, and b(v, v) = 0 if and only if v = 0.
As before, we write b(v, w) as v · w. This time, b is not linear as a function of its first variable; in fact we have

    b(v_1 + v_2, w) = b(v_1, w) + b(v_2, w),    b(cv, w) = c̄ b(v, w)

for v_1, v_2, v, w ∈ V and c ∈ C, where c̄ denotes the complex conjugate of c. (Sometimes we say that b is semilinear (that is, "½-linear") as a function of its first variable, and describe it as a sesquilinear form (that is, "1½-linear"). A form satisfying (b) is called Hermitian, and one satisfying (c) is positive definite. Thus an inner product is a positive definite Hermitian sesquilinear form.)
The definition of an orthonormal basis is exactly as in the real case, and the Gram–Schmidt process allows us to find one with only trivial modifications. The standard inner product (with respect to an orthonormal basis) is given by

    v · w = x̄_1 y_1 + ··· + x̄_n y_n,

where v = [x_1 ... x_n]^T, w = [y_1 ... y_n]^T.
The adjoint of α : V → V is defined as before by the formula

    α*(v) · w = v · α(w),

but this time there is a small difference in the matrix representation: if α is represented by A (relative to an orthonormal basis), then its adjoint α* is represented by (Ā)^T. (Take the complex conjugates of all the entries in A, and then transpose.) So

• a self-adjoint linear map is represented by a matrix A satisfying A = (Ā)^T: such a matrix is called Hermitian.

• a map which preserves the inner product (that is, which satisfies α(v) · α(w) = v · w, or α* = α^{−1}) is represented by a matrix A satisfying (Ā)^T = A^{−1}: such a matrix is called unitary.
8.2 The complex Spectral Theorem
The spectral theorem for self-adjoint linear maps on complex inner product spaces is almost identical to the real version. The proof goes through virtually unchanged. The definition of an orthogonal projection is the same: a projection which is self-adjoint.

Theorem 8.1 If α is a self-adjoint linear map on a complex inner product space V, then the eigenspaces of α form an orthogonal decomposition of V. Hence there is an orthonormal basis of V consisting of eigenvectors of α. Moreover, there exist orthogonal projections π_1, ..., π_r satisfying π_1 + ··· + π_r = I and π_i π_j = O for i ≠ j, such that

    α = λ_1 π_1 + ··· + λ_r π_r,

where λ_1, ..., λ_r are the distinct eigenvalues of α.
Theorem 8.2 Let A be a complex Hermitian matrix. Then there exists a unitary matrix P such that P^{−1}AP is diagonal.

There is one special feature of the complex case:

Proposition 8.3 Any eigenvalue of a self-adjoint linear map on a complex inner product space (or of a complex Hermitian matrix) is real.

Proof Suppose that α is self-adjoint and α(v) = λv. Then

    λ(v · v) = v · α(v) = α*(v) · v = α(v) · v = λ̄(v · v),

where in the last step we use the fact that (cv) · w = c̄(v · w) for a complex inner product. So (λ − λ̄)(v · v) = 0. Since v ≠ 0, we have v · v ≠ 0, and so λ = λ̄; that is, λ is real.
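Both Theorem 8.2 and Proposition 8.3 are visible numerically in a single call to numpy's Hermitian eigensolver (a sketch; the matrix is our own example, not from the notes):

```python
import numpy as np

A = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
assert np.allclose(A, A.conj().T)                    # A is Hermitian

w, P = np.linalg.eigh(A)                             # solver for Hermitian matrices
print(w)                                             # approximately [1. 4.]; both real
print(np.allclose(P.conj().T @ P, np.eye(2)))        # True: P is unitary
print(np.allclose(P.conj().T @ A @ P, np.diag(w)))   # True: unitarily diagonalised
```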
We also have a theorem on simultaneous diagonalisation:

Proposition 8.4 Let α and β be self-adjoint linear maps on a complex inner product space V, and suppose that αβ = βα. Then there is an orthonormal basis for V consisting of eigenvectors of both α and β.

The proof is as in the real case. You are invited to formulate the theorem in terms of commuting Hermitian matrices.
8.3 Normal matrices
The fact that the eigenvalues of a complex Hermitian matrix are real leaves open the possibility of proving a more general version of the spectral theorem. We saw that a real symmetric matrix is orthogonally similar to a diagonal matrix. In fact, the converse is also true. For if A is a real n×n matrix and P is an orthogonal matrix such that P^T AP = D is diagonal, then A = PDP^T, and so

    A^T = PD^T P^T = PDP^T = A.

In other words, a real matrix is orthogonally similar to a diagonal matrix if and only if it is symmetric.
This is not true for complex Hermitian matrices, since such matrices have real eigenvalues and so cannot be similar to non-real diagonal matrices.
What really happens is the following.
Definition 8.2 (a) Let α be a linear map on a complex inner product space V. We say that α is normal if it commutes with its adjoint: αα* = α*α.

(b) Let A be an n×n matrix over C. We say that A is normal if it commutes with its conjugate transpose: A(Ā)^T = (Ā)^T A.
Theorem 8.5 (a) Let α be a linear map on a complex inner product space V. Then V has an orthonormal basis consisting of eigenvectors of α if and only if α is normal.

(b) Let A be an n×n matrix over C. Then there is a unitary matrix P such that P^{−1}AP is diagonal if and only if A is normal.

Proof As usual, the two forms of the theorem are equivalent. We prove it in the first form.
If α has an orthonormal basis (v_1, ..., v_n) consisting of eigenvectors, then α(v_i) = λ_i v_i for i = 1, ..., n, where the λ_i are the eigenvalues. We see that α*(v_i) = λ̄_i v_i, and so

    αα*(v_i) = α*α(v_i) = λ_i λ̄_i v_i.

Since αα* and α*α agree on the vectors of a basis, they are equal; so α is normal.
Conversely, suppose that α is normal. Let

    β = (1/2)(α + α*),    γ = (1/(2i))(α − α*).

(You should compare these with the formulae x = (1/2)(z + z̄), y = (1/(2i))(z − z̄) for the real and imaginary parts of a complex number z. The analogy is even closer, since clearly we have α = β + iγ.) Now we claim:

• β and γ are Hermitian. For

    β* = (1/2)(α* + α) = β,    γ* = (1/(−2i))(α* − α) = γ,

where we use the fact that (cα)* = c̄α*.

• βγ = γβ. For

    βγ = (1/(4i))(α² − αα* + α*α − (α*)²) = (1/(4i))(α² − (α*)²),
    γβ = (1/(4i))(α² + αα* − α*α − (α*)²) = (1/(4i))(α² − (α*)²).

(Here we use the fact that αα* = α*α.)

Hence, by the Proposition at the end of the last section, there is an orthonormal basis B whose vectors are eigenvectors of β and γ, and hence are eigenvectors of α = β + iγ.

Note that the eigenvalues of β and γ in this proof are the real and imaginary parts of the eigenvalues of α.
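The decomposition α = β + iγ in this proof is easy to check on a concrete normal matrix (a Python/numpy sketch, not from the notes; the example matrix is our own):

```python
import numpy as np

# A normal matrix that is not Hermitian (it is beta + i*gamma for the
# commuting Hermitian pair recovered below):
A = np.array([[2 + 1j, 1 + 2j],
              [1 + 2j, 2 + 1j]])
Astar = A.conj().T
print(np.allclose(A @ Astar, Astar @ A))        # True: A is normal

B = (A + Astar) / 2                             # beta, the "real part"
G = (A - Astar) / 2j                            # gamma, the "imaginary part"
print(np.allclose(B, B.conj().T), np.allclose(G, G.conj().T))  # both Hermitian
print(np.allclose(B @ G, G @ B))                # True: beta and gamma commute
print(np.allclose(A, B + 1j * G))               # True: A = beta + i*gamma

print(np.linalg.eigvals(A))                     # eigenvalues 3+3i and 1-i: not real
```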
Chapter 9
Skewsymmetric matrices
We spent the last three chapters looking at symmetric matrices; even then we could only find canonical forms over the real and complex numbers. It turns out that life is much simpler for skew-symmetric matrices. We find a canonical form for these matrices under congruence which works over any field whatever. (More precisely, as we will see, this statement applies to "alternating matrices", but these are precisely the same as skew-symmetric matrices unless the characteristic of the field is 2.)
9.1 Alternating bilinear forms
Alternating forms are as far from positive definite as they can be:

Definition 9.1 Let V be a vector space over K. A bilinear form b on V is alternating if b(v, v) = 0 for all v ∈ V.

Proposition 9.1 An alternating bilinear form b satisfies b(w, v) = −b(v, w) for all v, w ∈ V.

Proof

    0 = b(v + w, v + w) = b(v, v) + b(v, w) + b(w, v) + b(w, w) = b(v, w) + b(w, v)

for any v, w ∈ V, using the definition of an alternating bilinear form.
Now here is the analogue of the Gram–Schmidt process for alternating bilinear
forms.
Theorem 9.2 Let b be an alternating bilinear form on a vector space V. Then there is a basis (u_1, ..., u_s, w_1, ..., w_s, z_1, ..., z_t) for V such that b(u_i, w_i) = 1 and b(w_i, u_i) = −1 for i = 1, ..., s, and b(x, y) = 0 for any other choice of basis vectors x and y.
Proof If b is identically zero, then simply choose a basis (z_1, ..., z_n) and take s = 0, t = n. So suppose not.
Choose a pair of vectors u and w such that c = b(u, w) ≠ 0. Replacing w by w/c, we have b(u, w) = 1.
We claim that u and w are linearly independent. For suppose that cu + dw = 0. Then

    0 = b(u, cu + dw) = cb(u, u) + db(u, w) = d,
    0 = b(w, cu + dw) = cb(w, u) + db(w, w) = −c,

so c = d = 0. We take u_1 = u and w_1 = w as our first two basis vectors.
Now let U = ⟨u, w⟩ and W = {x ∈ V : b(u, x) = b(w, x) = 0}. We claim that V = U ⊕ W. The argument just above already shows that U ∩ W = 0, so we have to show that V = U + W. So take a vector v ∈ V, and let x = −b(w, v)u + b(u, v)w. Then

    b(u, x) = −b(w, v)b(u, u) + b(u, v)b(u, w) = b(u, v),
    b(w, x) = −b(w, v)b(w, u) + b(u, v)b(w, w) = b(w, v),

so b(u, v − x) = b(w, v − x) = 0. Thus v − x ∈ W. But clearly x ∈ U, and so our assertion is proved.
Now b is an alternating bilinear form on W, and so by induction there is a basis of the required form for W, say (u_2, ..., u_s, w_2, ..., w_s, z_1, ..., z_t). Putting in u_1 and w_1 gives the required basis for V.
9.2 Skew-symmetric and alternating matrices
A matrix A is skew-symmetric if A^T = −A.
A matrix A is alternating if A is skew-symmetric and has zero diagonal. If the characteristic of the field K is not equal to 2, then any skew-symmetric matrix is alternating; but if the characteristic is 2, then the extra condition is needed.
Recall the matrix representing a bilinear form b relative to a basis (v_1, ..., v_n): its (i, j) entry is b(v_i, v_j).

Proposition 9.3 An alternating bilinear form b on a vector space over K is represented by an alternating matrix; and any alternating matrix represents an alternating bilinear form. If the characteristic of K is not 2, we can replace "alternating matrix" by "skew-symmetric matrix".
Proof This is obvious, since if b is alternating then a_{ji} = b(v_j, v_i) = −b(v_i, v_j) = −a_{ij} and a_{ii} = b(v_i, v_i) = 0.
So we can write our theorem in matrix form as follows:

Theorem 9.4 Let A be an alternating matrix (or a skew-symmetric matrix over a field whose characteristic is not equal to 2). Then there is an invertible matrix P such that P^T AP is the matrix with s blocks

    [  0 1 ]
    [ −1 0 ]

on the diagonal and all other entries zero. Moreover, the number s is half the rank of A, and so is independent of the choice of P.

Proof We know that the effect of a change of basis with transition matrix P is to replace the matrix A representing a bilinear form by P^T AP. Also, the matrix in the statement of the theorem is just the matrix representing b relative to the special basis that we found in the preceding theorem.
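The proof of Theorem 9.2 is constructive, and translates into a short algorithm. The sketch below is our own code (over R, with a floating-point tolerance); it orders the basis u_1, w_1, u_2, w_2, ..., so that the 2×2 blocks appear down the diagonal as in Theorem 9.4.

```python
import numpy as np

def alternating_canonical(A, tol=1e-9):
    # Follows the proof of Theorem 9.2: find a hyperbolic pair (u, w),
    # project the remaining vectors into W = {x : b(u,x) = b(w,x) = 0},
    # and repeat; the leftover vectors span the radical.
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    b = lambda x, y: x @ A @ y                        # the bilinear form
    remaining = list(np.eye(n))
    pairs, radical = [], []
    while remaining:
        hit = next(((i, j) for i in range(len(remaining))
                           for j in range(i + 1, len(remaining))
                           if abs(b(remaining[i], remaining[j])) > tol), None)
        if hit is None:                               # form vanishes on the rest
            radical = remaining
            break
        i, j = hit
        u = remaining[i]
        w = remaining[j] / b(remaining[i], remaining[j])   # now b(u, w) = 1
        pairs.append((u, w))
        # replace each other vector v by v - x, where x = -b(w,v)u + b(u,v)w
        remaining = [v - (-b(w, v) * u + b(u, v) * w)
                     for k, v in enumerate(remaining) if k not in (i, j)]
    cols = [c for pair in pairs for c in pair] + radical
    return np.column_stack(cols)

A = np.array([[ 0,  1,  2,  3],
              [-1,  0,  4,  5],
              [-2, -4,  0,  6],
              [-3, -5, -6,  0]], dtype=float)
P = alternating_canonical(A)
print(np.round(P.T @ A @ P, 10))   # two blocks [[0, 1], [-1, 0]] on the diagonal
```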
This has a corollary which is a bit surprising at first sight:

Corollary 9.5 (a) The rank of a skew-symmetric matrix (over a field of characteristic not equal to 2) is even.

(b) The determinant of a skew-symmetric matrix (over a field of characteristic not equal to 2) is a square, and is zero if the size of the matrix is odd.

Proof (a) The canonical form in the theorem clearly has rank 2s.
(b) If the skew-symmetric matrix A is singular, then its determinant is zero, which is a square. So suppose that it is invertible. Then its canonical form has s = n/2 blocks

    [  0 1 ]
    [ −1 0 ]

on the diagonal. Each of these blocks has determinant 1, and hence so does the whole matrix. So det(P^T AP) = det(P)² det(A) = 1, whence det(A) = 1/(det(P)²), which is a square.
If the size n of A is odd, then the rank cannot be n (by (a)), and so det(A) = 0.
Remark There is a function defined on skew-symmetric matrices called the Pfaffian, which like the determinant is a polynomial in the matrix entries, and has the property that det(A) is the square of the Pfaffian of A: that is, det(A) = (Pf(A))².
For example,

    Pf [  0 a ] = a,    Pf [  0  a  b  c ]
       [ −a 0 ]            [ −a  0  d  e ] = af − be + cd.
                           [ −b −d  0  f ]
                           [ −c −e −f  0 ]

(Check that the determinant of the second matrix is (af − be + cd)².)
9.3 Complex skew-Hermitian matrices
What if we play the same variation that led us from real symmetric to complex Hermitian matrices? That is, we are working in a complex inner product space, and if α is represented by the matrix A, then its adjoint is represented by (Ā)^T, the conjugate transpose of A.
The matrix A is Hermitian if it is equal to its conjugate transpose, that is, if (Ā)^T = A. So we make the following definition:

Definition 9.2 The complex n×n matrix A is skew-Hermitian if (Ā)^T = −A.

Actually, things are very much simpler here, because of the following observation:

Proposition 9.6 The matrix A is skew-Hermitian if and only if iA is Hermitian.

Proof Try it and see!

Corollary 9.7 Any skew-Hermitian matrix can be diagonalised by a unitary matrix.

Proof This follows immediately from the preceding Proposition.
Alternatively, a skew-Hermitian matrix is obviously normal, and the Corollary follows from our result about normal matrices (Theorem 8.5).

Since the eigenvalues of a Hermitian matrix are real, we see that the eigenvalues of a skew-Hermitian matrix are purely imaginary.
Appendix A
Fields and vector spaces
Fields
A field is an algebraic structure K in which we can add and multiply elements, such that the following laws hold:

Addition laws

(FA0) For any a, b ∈ K, there is a unique element a + b ∈ K.
(FA1) For all a, b, c ∈ K, we have a + (b + c) = (a + b) + c.
(FA2) There is an element 0 ∈ K such that a + 0 = 0 + a = a for all a ∈ K.
(FA3) For any a ∈ K, there exists −a ∈ K such that a + (−a) = (−a) + a = 0.
(FA4) For any a, b ∈ K, we have a + b = b + a.

Multiplication laws

(FM0) For any a, b ∈ K, there is a unique element ab ∈ K.
(FM1) For all a, b, c ∈ K, we have a(bc) = (ab)c.
(FM2) There is an element 1 ∈ K, not equal to the element 0 from (FA2), such that a1 = 1a = a for all a ∈ K.
(FM3) For any a ∈ K with a ≠ 0, there exists a^{−1} ∈ K such that aa^{−1} = a^{−1}a = 1.
(FM4) For any a, b ∈ K, we have ab = ba.

Distributive law

(D) For all a, b, c ∈ K, we have a(b + c) = ab + ac.
Note the similarity of the addition and multiplication laws. We say that (K, +) is an abelian group if (FA0)–(FA4) hold. Then (FM0)–(FM4) say that (K \ {0}, ·) is also an abelian group. (We have to leave out 0 because, as (FM3) says, 0 does not have a multiplicative inverse.)
Examples of fields include Q (the rational numbers), R (the real numbers), C (the complex numbers), and F_p (the integers mod p, for p a prime number).
Associated with any field K there is a non-negative integer called its characteristic, defined as follows. If there is a positive integer n such that 1 + 1 + ··· + 1 = 0, where there are n ones in the sum, then the smallest such n is prime. (For if n = rs, with r, s > 1, and we denote the sum of n ones by n·1, then

    0 = n·1 = (r·1)(s·1);

by minimality of n, neither of the factors r·1 and s·1 is zero. But in a field, the product of two non-zero elements is non-zero.) If so, then this prime number is the characteristic of K. If no such n exists, we say that the characteristic of K is zero.
For our important examples, Q, R and C all have characteristic zero, while F_p has characteristic p.
Vector spaces
Let K be a field. A vector space V over K is an algebraic structure in which we can add two elements of V, and multiply an element of V by an element of K (this is called scalar multiplication), such that the following rules hold:

Addition laws

(VA0) For any u, v ∈ V, there is a unique element u + v ∈ V.
(VA1) For all u, v, w ∈ V, we have u + (v + w) = (u + v) + w.
(VA2) There is an element 0 ∈ V such that v + 0 = 0 + v = v for all v ∈ V.
(VA3) For any v ∈ V, there exists −v ∈ V such that v + (−v) = (−v) + v = 0.
(VA4) For any u, v ∈ V, we have u + v = v + u.

Scalar multiplication laws

(VM0) For any a ∈ K, v ∈ V, there is a unique element av ∈ V.
(VM1) For any a ∈ K, u, v ∈ V, we have a(u + v) = au + av.
(VM2) For any a, b ∈ K, v ∈ V, we have (a + b)v = av + bv.
(VM3) For any a, b ∈ K, v ∈ V, we have (ab)v = a(bv).
(VM4) For any v ∈ V, we have 1v = v (where 1 is the element given by (FM2)).
Again, we can summarise (VA0)–(VA4) by saying that (V, +) is an abelian group.
The most important example of a vector space over a field K is the set K^n of all n-tuples of elements of K; the addition and scalar multiplication are defined by the rules

    (u_1, u_2, ..., u_n) + (v_1, v_2, ..., v_n) = (u_1 + v_1, u_2 + v_2, ..., u_n + v_n),
    a(v_1, v_2, ..., v_n) = (av_1, av_2, ..., av_n).

The fact that K^n is a vector space will be assumed here. Proofs are straightforward but somewhat tedious. Here is a particularly easy one, the proof of (VM4), as an example.
If v = (v_1, ..., v_n), then

    1v = 1(v_1, ..., v_n) = (1v_1, ..., 1v_n) = (v_1, ..., v_n) = v.

The second step uses the definition of scalar multiplication in K^n, and the third step uses the field axiom (FM2).
Appendix B
Vandermonde and circulant
matrices
The Vandermonde matrix V(a_1, a_2, ..., a_n) is the n×n matrix

    [ 1          1          ...  1          ]
    [ a_1        a_2        ...  a_n        ]
    [ a_1^2      a_2^2      ...  a_n^2      ]
    [ ...                                   ]
    [ a_1^{n−1}  a_2^{n−1}  ...  a_n^{n−1}  ].
This is a particularly important type of matrix. We can write down its determinant explicitly:

Theorem B.1

    det(V(a_1, a_2, ..., a_n)) = ∏_{i<j} (a_j − a_i).

That is, the determinant is the product of the differences between all pairs of parameters a_i. From this theorem, we draw the following conclusion:

Corollary B.2 The matrix V(a_1, a_2, ..., a_n) is invertible if and only if the parameters a_1, a_2, ..., a_n are all distinct.

For the determinant can be zero only if one of the factors vanishes.
Proof To prove the theorem, we first regard a_n as a variable x, so that the determinant ∆ is a polynomial f(x) of degree n − 1 in x. We see that f(a_i) = 0 for 1 ≤ i ≤ n − 1, since the result is the determinant of a matrix with two equal columns. By the Factor Theorem,

    ∆ = K(x − a_1)(x − a_2) ··· (x − a_{n−1}),

where K is independent of x. In other words, the original determinant is K(a_n − a_1) ··· (a_n − a_{n−1}). In the same way, all differences (a_j − a_i) for i < j are factors, so that the determinant is K_0 times the product of all these differences, where K_0 does not contain any of a_1, ..., a_n; that is, K_0 is a constant.
To find K_0, we observe that the leading diagonal of the matrix gives us a term a_2 a_3^2 ··· a_n^{n−1} in the determinant with sign +1; but this product is also obtained by taking the term with the larger index from each factor in the product of differences, again with sign +1. So K_0 = 1 and the theorem is proved.
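Theorem B.1 is easy to spot-check numerically (a Python/numpy sketch with our own sample parameters):

```python
import numpy as np
from itertools import combinations

a = [2.0, 3.0, 5.0, 7.0]                 # sample parameters, all distinct
n = len(a)
V = np.array([[ai ** i for ai in a] for i in range(n)])   # row i holds a_1^i, ..., a_n^i
formula = np.prod([a[j] - a[i] for i, j in combinations(range(n), 2)])
print(np.isclose(np.linalg.det(V), formula))              # True (both equal 240)
```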
Another general type of matrix whose determinant can be calculated explicitly is the circulant matrix, whose general form is as follows:

    C(a_0, ..., a_{n−1}) = [ a_0      a_1      a_2   ...  a_{n−1} ]
                           [ a_{n−1}  a_0      a_1   ...  a_{n−2} ]
                           [ a_{n−2}  a_{n−1}  a_0   ...  a_{n−3} ]
                           [ ...                                  ]
                           [ a_1      a_2      a_3   ...  a_0     ].
Theorem B.3 Let C = C(a_0, ..., a_{n−1}) be a circulant matrix over the field C. Let ω = e^{2πi/n} be a primitive nth root of unity. Then

(a) C is diagonalisable;

(b) the eigenvalues of C are ∑_{j=0}^{n−1} a_j ω^{jk}, for k = 0, 1, ..., n−1;

(c) det(C) is the product of the eigenvalues listed in (b).

Proof We can write down the eigenvectors. For k = 0, 1, ..., n−1, let v_k = [1 ω^k ... ω^{(n−1)k}]^T. The jth entry in Cv_k is

    a_{n−j} + a_{n−j+1} ω^k + ··· + a_{n−j−1} ω^{(n−1)k}
        = a_0 ω^{jk} + ··· + a_{n−j−1} ω^{(n−1)k} + a_{n−j} ω^{nk} + ··· + a_{n−1} ω^{(n+j−1)k}
        = ω^{jk} (a_0 + a_1 ω^k + ··· + a_{n−1} ω^{(n−1)k}),

using the fact that ω^n = 1. This is a_0 + a_1 ω^k + ··· + a_{n−1} ω^{(n−1)k} times the jth entry in v_k. So

    Cv_k = (a_0 + a_1 ω^k + ··· + a_{n−1} ω^{(n−1)k}) v_k,

as required.
Now the vectors v_0, ..., v_{n−1} are linearly independent. (Why? They are the columns of a Vandermonde matrix V(1, ω, ..., ω^{n−1}), and the powers of ω are all distinct; so the first part of this appendix shows that the determinant of this matrix is non-zero, so that the columns are linearly independent.) Hence we have diagonalised C, and its eigenvalues are as claimed.
Finally, for part (c), the determinant of a diagonalisable matrix is the product of its eigenvalues.
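Here is a numerical check of Theorem B.3 on a small example (a sketch; the rounding in the sort key just guards against floating-point ties when matching the two eigenvalue lists):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
n = len(a)
# Circulant with first row a_0, ..., a_{n-1}; each row is the previous one shifted.
C = np.array([[a[(j - i) % n] for j in range(n)] for i in range(n)])

w = np.exp(2j * np.pi / n)                     # primitive nth root of unity
formula = np.array([sum(a[j] * w ** (j * k) for j in range(n)) for k in range(n)])

key = lambda z: (round(z.real, 6), round(z.imag, 6))
eigs = np.linalg.eigvals(C)
print(np.allclose(sorted(eigs, key=key), sorted(formula, key=key)))  # True
print(np.isclose(np.linalg.det(C), np.prod(formula)))                # True
```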
Example B.1 We have the identity

    a³ + b³ + c³ − 3abc = | a b c |
                          | c a b | = (a + b + c)(a + ωb + ω²c)(a + ω²b + ωc),
                          | b c a |

where ω = e^{2πi/3}.
This formula has an application to solving cubic equations. Consider the equation

    x³ + ax² + bx + c = 0.

By "completing the cube", putting y = x + a/3, we get rid of the square term:

    y³ + dy + e = 0

for some d, e. Now, as above, we have

    y³ − 3uvy + u³ + v³ = (y + u + v)(y + ωu + ω²v)(y + ω²u + ωv),

so if we can find u and v satisfying −3uv = d and u³ + v³ = e, then the solutions of the equation are y = −u − v, y = −ωu − ω²v, and y = −ω²u − ωv.
Let U = u³ and V = v³. Then U + V = e and UV = −d³/27. Thus we can find U and V by solving the quadratic equation z² − ez − d³/27 = 0. Now u is a cube root of U, and then v = −d/(3u), and we are done.
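The recipe just described can be turned into a few lines of code (a sketch, assuming the depressed form y³ + dy + e = 0; the function name is ours):

```python
import numpy as np

def depressed_cubic_roots(d, e):
    # Solve y^3 + d*y + e = 0 by the method in the text: U, V are the roots
    # of z^2 - e*z - d^3/27 = 0, u is any cube root of U, and v = -d/(3u).
    disc = np.sqrt(complex(e * e + 4 * d ** 3 / 27))
    U, V = (e + disc) / 2, (e - disc) / 2
    u = (U if abs(U) >= abs(V) else V) ** (1 / 3)   # principal complex cube root
    v = -d / (3 * u) if u != 0 else 0.0             # then v^3 = V automatically
    w = np.exp(2j * np.pi / 3)
    return [-u - v, -w * u - w ** 2 * v, -w ** 2 * u - w * v]

# y^3 - 7y + 6 = (y - 1)(y - 2)(y + 3):
roots = depressed_cubic_roots(-7.0, 6.0)
print(sorted(r.real for r in roots))                # approximately [-3.0, 1.0, 2.0]
```

Note that if u is chosen as any cube root of U, then v = −d/(3u) automatically satisfies v³ = UV/U = V, so no separate choice of cube root for V is needed.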
Remark The formula for the determinant of a circulant matrix works over any
ﬁeld K which contains a primitive nth root of unity.
Appendix C
The Friendship Theorem
The Friendship Theorem states:
Given a ﬁnite set of people with the property that any two have a
unique common friend, there must be someone who is everyone else’s
friend.
The theorem asserts that the configuration must look like this, where we represent people by dots and friendship by edges:

    [Figure: a "windmill" of triangles, all sharing a single common vertex]
The proof of the theorem is in two parts. The first part is "graph theory", the second uses linear algebra. We argue by contradiction, and so we assume that we have a counterexample to the theorem.

Step 1: Graph theory We show that there is a number m such that everyone has exactly m friends. [In the terminology of graph theory, this says that we have a regular graph of valency m.]
To prove this, we notice first that if P_1 and P_2 are not friends, then they have the same number of friends. For they have one common friend P_3; any further friend Q of P_1 has a common friend R with P_2, and conversely, so we can match up the common friends as in the next picture.
    [Figure: matching up the further friends of P_1 with the further friends of P_2]
Now let us suppose that there are two people P and Q who have different
numbers of friends. By the preceding argument, P and Q must be friends. They
have a common friend R. Any other person S must have a different number of
friends from either P or Q, and so must be the friend of either P or Q (but not
both). Now if S is the friend of P but not Q, and T is the friend of Q but not P,
then any possible choice of the common friend of S and T leads to a contradiction.
So this is not possible; that is, either everyone else is P’s friend, or everyone else
is Q’s friend. But this means that we don’t have a counterexample after all.
So we conclude this step knowing that the number of friends of each person is
the same, say m, as claimed.
Step 2: Linear algebra We prove that m = 2.
Suppose that there are n people P_1, ..., P_n. Let A be the n×n matrix whose (i, j) entry is 1 if P_i and P_j are friends, and is 0 otherwise. Then by assumption, A is an n×n symmetric matrix. Let J be the n×n matrix with every entry equal to 1; then J is also symmetric.
Consider the product AJ. Since every entry of J is equal to 1, the (i, j) entry of AJ is just the number of ones in the ith row of A, which is the number of friends of P_i; this is m, by Step 1. So every entry of AJ is m, whence AJ = mJ. Similarly, JA = mJ. Thus, A and J are commuting symmetric matrices, and so by Theorem 7.9 they can be simultaneously diagonalised. We will calculate their eigenvalues.
First let us consider J. If j is the column vector with all entries 1, then clearly Jj = nj, so j is an eigenvector of J with eigenvalue n. The remaining eigenvectors of J are orthogonal to j. Now v · j = 0 means that the sum of the components of v is zero; this implies that Jv = 0. So any vector orthogonal to j is an eigenvector of J with eigenvalue 0.
Now we turn to A, and observe that

    A² = (m − 1)I + J.
For the (i, j) entry of A
2
is equal to the number of people P
k
who are friends of
both P
i
and P
j
. If i = j, this number is m, while if i = j then (by assumption)
it is 1. So A
2
has diagonal entries m and offdiagonal entries 1, so it is equal to
(m−1)I +J, as claimed.
The allone vector j satisﬁes Aj = mj, so is an eigenvector of A with eigen
value m. This shows, in particular, that
m
2
j = A
2
j = ((m−1)I +J) j = (m−1+n) j,
so that n = m
2
−m+1. (Exercise: Prove this by a counting argument in the
graph.)
As before, the remaining eigenvectors of A are orthogonal to j, and so are
eigenvectors of J with eigenvalue 0. Thus, if v is an eigenvector of A with eigen
value λ, not a multiple of j, then
λ
2
v = A
2
v = ((m−1)I +J)v = (m−1)v,
so λ
2
= m−1, and λ =±
√
m−1.
The diagonal entries of A are all zero, so its trace is zero. So if we let f and g
be the multiplicities of
√
m−1 and −
√
m−1 as eigenvalues of A, we have
0 = Tr(A) = m+ f
√
m−1+g(−
√
m−1) = m+( f −g)
√
m−1.
This shows that m−1 must be a perfect square, say m−1 = u
2
, from which we
see that m is congruent to 1 mod u. But the trace equation is 0 =m+( f −g)u; this
says that 0 ≡1 mod u. This is only possible if u = 1. But then m = 2, n = 3, and
we have the Three Musketeers (three individuals, any two being friends). This
conﬁguration does indeed satisfy the hypotheses of the Friendship Theorem; but
it is after all not a counterexample, since each person is everyone else’s friend. So
the theorem is proved.
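Both matrix identities used in the proof can be checked numerically in the one surviving case, the triangle with m = 2 and n = 3. The following plain-Python sketch (not part of the original notes) verifies AJ = mJ and A^2 = (m − 1)I + J for the triangle's adjacency matrix:

```python
# Check of AJ = mJ and A^2 = (m-1)I + J for the "Three Musketeers" case
# m = 2, n = 3: the adjacency matrix of a triangle, where everyone is a
# friend of everyone else.  Plain Python, no external libraries assumed.

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n, m = 3, 2
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # triangle adjacency matrix
J = [[1] * n for _ in range(n)]
I = [[int(i == j) for j in range(n)] for i in range(n)]

AJ = matmul(A, J)
A2 = matmul(A, A)
target = [[(m - 1) * I[i][j] + J[i][j] for j in range(n)] for i in range(n)]

assert AJ == [[m] * n for _ in range(n)]   # AJ = mJ
assert A2 == target                        # A^2 = (m-1)I + J
```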
Appendix D
Who is top of the league?
In most league competitions, teams are awarded a ﬁxed number of points for a
win or a draw. It may happen that two teams win the same number of matches and
so are equal on points, but the opponents beaten by one team are clearly “better”
than those beaten by the other. How can we take this into account?
You might think of giving each team a “score” to indicate how strong it is, and
then adding the scores of all the teams beaten by team T to see how well T has
performed. Of course this is self-referential, since the score of T depends on the
scores of the teams that T beats. So suppose we ask simply that the score of T
should be proportional to the sum of the scores of all the teams beaten by T.
Now we can translate the problem into linear algebra. Let T_1, ..., T_n be the
teams in the league. Let A be the n×n matrix whose (i, j) entry is equal to 1 if T_i
beats T_j, and 0 otherwise. Now for any vector [x_1 x_2 ... x_n]^T of scores, the
ith entry of Ax is equal to the sum of the scores x_j for all teams T_j beaten by T_i.
So our requirement is simply that
x should be an eigenvector of A with all entries positive.
Here is an example. There are six teams A, B, C, D, E, and F. Suppose that
A beats B, C, D, E;
B beats C, D, E, F;
C beats D, E, F;
D beats E, F;
E beats F;
F beats A.
The matrix A is

A =
[ 0 1 1 1 1 0 ]
[ 0 0 1 1 1 1 ]
[ 0 0 0 1 1 1 ]
[ 0 0 0 0 1 1 ]
[ 0 0 0 0 0 1 ]
[ 1 0 0 0 0 0 ].
We see that A and B each have four wins, but that A has generally beaten the
stronger teams; there was one upset when F beat A. Also, E and F have the fewest
wins, but F took A’s scalp and should clearly be better.
Calculation with Maple shows that the vector

[ 0.7744 0.6452 0.4307 0.2875 0.1920 0.3856 ]^T
is an eigenvector of A with eigenvalue 2.0085. This conﬁrms our view that A is
top of the league and that F is ahead of E; it even puts F ahead of D.
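The same eigenvector can be reproduced without Maple. The sketch below (plain Python, not part of the original notes) uses power iteration, which for a matrix of this kind converges to the Perron–Frobenius eigenvector; since the printed values depend on the normalisation used, the code only checks the eigenvalue and the resulting ranking:

```python
# Power iteration on the league matrix: repeatedly apply A and rescale.
# The iterates converge to the positive (Perron-Frobenius) eigenvector,
# and the scale factor converges to the largest eigenvalue.

A = [
    [0, 1, 1, 1, 1, 0],   # A beats B, C, D, E
    [0, 0, 1, 1, 1, 1],   # B beats C, D, E, F
    [0, 0, 0, 1, 1, 1],   # C beats D, E, F
    [0, 0, 0, 0, 1, 1],   # D beats E, F
    [0, 0, 0, 0, 0, 1],   # E beats F
    [1, 0, 0, 0, 0, 0],   # F beats A
]

x = [1.0] * 6
lam = 0.0
for _ in range(1000):
    y = [sum(A[i][j] * x[j] for j in range(6)) for i in range(6)]
    lam = max(y)              # converges to the largest eigenvalue
    x = [v / lam for v in y]  # rescale so the largest entry is 1

scores = dict(zip("ABCDEF", x))
assert abs(lam - 2.0085) < 1e-3                 # eigenvalue found above
assert max(scores, key=scores.get) == "A"       # A tops the league
assert scores["F"] > scores["D"] > scores["E"]  # F ahead of D and E
```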
But perhaps there is a different eigenvalue and/or eigenvector which would
give us a different result?
In fact, there is a general theorem called the Perron–Frobenius theorem which
gives us conditions for this method to give a unique answer. Before we state it,
we need a deﬁnition.
Definition D.1 Let A be an n×n real matrix with all its entries nonnegative. We
say that A is indecomposable if, for any i, j with 1 ≤ i, j ≤ n, there is a number m
such that the (i, j) entry of A^m is strictly positive.
This odd-looking condition means, in our football league situation, that for
any two teams T_i and T_j, there is a chain T_{k_0}, ..., T_{k_m} with T_{k_0} = T_i and T_{k_m} = T_j,
such that each team in the chain beats the next one. Now it can be shown that
the only way that this can fail is if there is a collection C of teams such that each
team in C beats each team not in C. In this case, obviously the teams in C occupy
the top places in the league, and we have reduced the problem to ordering these
teams. So we can assume that the matrix of results is indecomposable.
In our example, we see that B beats F beats A, so the (2, 1) entry in A^2 is
nonzero. Similarly for all other pairs. So A is indecomposable in this case.
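Checking all pairs by hand is tedious, but indecomposability is exactly the statement that every team can reach every team (itself included) along a positive-length chain of wins, which Warshall's transitive-closure algorithm tests directly. A plain-Python sketch (not from the original notes):

```python
def indecomposable(A):
    """Check Definition D.1 for a nonnegative square matrix A: for every
    pair (i, j) some power A^m (m >= 1) must have a strictly positive
    (i, j) entry.  Equivalently, every vertex of the digraph whose arcs
    are the positive entries reaches every vertex along a path of
    positive length (Warshall's transitive-closure algorithm)."""
    n = len(A)
    reach = [[A[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return all(all(row) for row in reach)

# The six-team league above is indecomposable ...
A6 = [[0, 1, 1, 1, 1, 0], [0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1],
      [0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0]]
assert indecomposable(A6)

# ... but a two-team league where B beats A and nothing else is not.
assert not indecomposable([[0, 1], [0, 0]])
```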
Theorem D.1 (Perron–Frobenius Theorem) Let A be an n×n real matrix with
all its entries nonnegative, and suppose that A is indecomposable. Then, up to
scalar multiplication, there is a unique eigenvector v = [x_1 ... x_n]^T for A with
the property that x_i > 0 for all i. The corresponding eigenvalue is the largest
eigenvalue of A.
So the Perron–Frobenius eigenvector solves the problem of ordering the teams
in the league.
Remark Sometimes even this extra level of sophistication doesn’t guarantee a
result. Suppose, for example, that there are ﬁve teams A, B, C, D, E; and suppose
that A beats B and C, B beats C and D, C beats D and E, D beats E and A, and E
beats A and B. Each team wins two games, so the simple rule gives them all the
same score. The matrix A is

A =
[ 0 1 1 0 0 ]
[ 0 0 1 1 0 ]
[ 0 0 0 1 1 ]
[ 1 0 0 0 1 ]
[ 1 1 0 0 0 ],
which is easily seen to be indecomposable; and if v is the all-1 vector, then Av =
2v, so that v is the Perron–Frobenius eigenvector. So even with this method, all
teams get the same score. In this case, it is clear that there is so much symmetry
between the teams that none can be put above the others by any possible rule.
Remark Further refinements are clearly possible. For example, instead of just
putting the (i, j) entry equal to 1 if T_i beats T_j, we could take it to be the number
of goals by which T_i won the game.
Remark This procedure has wider application. How does an Internet search
engine like Google ﬁnd the most important web pages that match a given query?
An important web page is one to which a lot of other web pages link; this can be
described by a matrix, and we can use the Perron–Frobenius eigenvector to do the
ranking.
Appendix E
Other canonical forms
One of the unfortunate things about linear algebra is that there are many types
of equivalence relation on matrices! In this appendix I say a few brief words
about some that we have not seen elsewhere in the course. Some of these will be
familiar to you from earlier linear algebra courses, while others arise in courses
on different parts of mathematics (coding theory, group theory, etc.)
Row-equivalence
Two matrices A and B of the same size over K are said to be row-equivalent if
there is an invertible matrix P such that B = PA. Equivalently, A and B are
row-equivalent if we can transform A into B by the use of elementary row operations
only. (This is true because any invertible matrix can be written as a product of
elementary matrices; see Corollary 2.6.)
A matrix A is said to be in echelon form if the following conditions hold:
• The ﬁrst nonzero entry in any row (if it exists) is equal to 1 (these entries
are called the leading ones);
• The leading ones in rows lower in the matrix occur further to the right.
We say that A is in reduced echelon form if, in addition to these two conditions,
also
• All the other entries in the column containing a leading one are zero.
For example, the matrix

[ 0 1 a b 0 c ]
[ 0 0 0 0 1 d ]
[ 0 0 0 0 0 0 ]

is in reduced echelon form, whatever the values of a, b, c, d.
Theorem E.1 Any matrix is row-equivalent to a unique matrix in reduced echelon
form.
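The reduction in Theorem E.1 is entirely mechanical. As an illustration (a plain-Python sketch, not part of the original notes), here is reduced echelon form computed with exact rational arithmetic, so no rounding can disturb the leading ones:

```python
from fractions import Fraction

def rref(rows):
    """Reduce a matrix (given as a list of rows) to reduced echelon form
    by elementary row operations, using exact rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0                                    # next pivot row
    for c in range(n):
        pivot = next((i for i in range(r, m) if A[i][c] != 0), None)
        if pivot is None:
            continue                         # no leading one in this column
        A[r], A[pivot] = A[pivot], A[r]      # Type 1: swap rows
        A[r] = [x / A[r][c] for x in A[r]]   # Type 2: scale to a leading one
        for i in range(m):                   # Type 3: clear the column
            if i != r and A[i][c] != 0:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == m:
            break
    return A

# The matrix A of Worked Example 1 below reduces to:
assert rref([[1, 2, 4, -1, 5], [1, 2, 3, -1, 3], [-1, -2, 0, 1, 3]]) == \
    [[1, 2, 0, -1, -3], [0, 0, 1, 0, 2], [0, 0, 0, 0, 0]]
```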
Coding equivalence
In the theory of error-correcting codes, we meet a notion of equivalence which lies
somewhere between row-equivalence and equivalence. As far as I know it does
not have a standard name.
Two matrices A and B of the same size are said to be coding-equivalent if B
can be obtained from A by a combination of arbitrary row operations and column
operations of Types 2 and 3 only. (See page 16.)
Using these operations, any matrix can be put into the block form

[ I_r  A ]
[ O    O ]

for some matrix A. To see this, use row operations to put the matrix into reduced
echelon form, then column permutations to move the columns containing the leading
ones to the front of the matrix.
Unfortunately this is not a canonical form; a matrix can be coding-equivalent
to several different matrices of this special form.
It would take us too far aﬁeld to explain why this equivalence relation is im
portant in coding theory.
Congruence over other ﬁelds
Recall that two symmetric matrices A and B, over a field K whose characteristic is
not 2, are congruent if B = P^T AP for some invertible matrix P. This is the natural
relation arising from representing a quadratic form relative to different bases.
We saw in Chapter 5 the canonical form for this relation in the cases when K
is the real or complex numbers.
In other cases, it is usually much harder to come up with a canonical form.
Here is one of the few cases where this is possible. I state the result for quadratic
forms.
Theorem E.2 Let F_p be the field of integers mod p, where p is an odd prime. Let
c be a fixed element of F_p which is not a square. A quadratic form q in n variables
over F_p can be put into one of the forms

x_1^2 + ··· + x_r^2,    x_1^2 + ··· + x_{r−1}^2 + c x_r^2

by an invertible linear change of variables. Any quadratic form is congruent to
just one form of one of these types.
Appendix F
Worked examples
1. Let

A =
[ 1 2 4 −1 5 ]
[ 1 2 3 −1 3 ]
[ −1 −2 0 1 3 ].
(a) Find a basis for the row space of A.
(b) What is the rank of A?
(c) Find a basis for the column space of A.
(d) Find invertible matrices P and Q such that PAQ is in the canonical
form for equivalence.
(a) Subtract the ﬁrst row from the second, add the ﬁrst row to the third, then
multiply the new second row by −1 and subtract four times this row from the
third, to get the matrix
B =
[ 1 2 4 −1 5 ]
[ 0 0 1 0 2 ]
[ 0 0 0 0 0 ].
The ﬁrst two rows clearly form a basis for the row space.
(b) The rank is 2, since there is a basis with two elements.
(c) The column rank is equal to the row rank and so is also equal to 2. By
inspection, the ﬁrst and third columns of A are linearly independent, so they form
a basis. The ﬁrst and second columns are not linearly independent, so we cannot
use these! (Note that we have to go back to the original A here; row operations
change the column space, so selecting two independent columns of B would not
be correct.)
(d) By step (a), we have PA = B, where P is obtained by performing the same
elementary row operations on the 3×3 identity matrix I_3:

P =
[ 1 0 0 ]
[ 1 −1 0 ]
[ −3 4 1 ].
Now B can be brought to the canonical form

C =
[ 1 0 0 0 0 ]
[ 0 1 0 0 0 ]
[ 0 0 0 0 0 ]

by subtracting 2, 4, −1 and 5 times the first column from the second, third, fourth
and fifth columns, and twice the third column from the fifth, and then swapping
the second and third columns; so C = BQ (whence C = PAQ), where Q is obtained
by performing the same column operations on I_5:

Q =
[ 1 −4 −2 1 3 ]
[ 0 0 1 0 0 ]
[ 0 1 0 0 −2 ]
[ 0 0 0 1 0 ]
[ 0 0 0 0 1 ].
Remark: P and Q can also be found by multiplying elementary matrices, if
desired; but the above method is simpler. You may ﬁnd it easier to write an identity
matrix after A and perform the row operations on the extended matrix to ﬁnd P,
and to put an identity matrix underneath B and perform the column operations on
the extended matrix to ﬁnd Q.
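As a quick arithmetic check (a plain-Python sketch, not part of the notes), we can multiply out the matrices found above and confirm that PAQ really is the canonical form C:

```python
# Verify PAQ = C for Worked Example 1, using only the matrices computed
# in the solution above.  Integer arithmetic, so the check is exact.

def matmul(X, Y):
    """Multiply matrices given as lists of rows (sizes must conform)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2, 4, -1, 5], [1, 2, 3, -1, 3], [-1, -2, 0, 1, 3]]
P = [[1, 0, 0], [1, -1, 0], [-3, 4, 1]]
Q = [[1, -4, -2, 1, 3],
     [0, 0, 1, 0, 0],
     [0, 1, 0, 0, -2],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1]]
C = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 0, 0]]

assert matmul(matmul(P, A), Q) == C
```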
2. A certain country has n political parties P_1, ..., P_n. At the
beginning of the year, the percentage of voters who supported the
party P_i was x_i. During the year, some voters change their minds; a
proportion a_ij of former supporters of P_j will support P_i at the end
of the year.
Let v be the vector [x_1 x_2 ... x_n]^T recording support for the parties
at the beginning of the year, and A the matrix whose (i, j) entry is
a_ij.
(a) Prove that the vector giving the support for the parties at the end
of the year is Av.
(b) In subsequent years, exactly the same thing happens, with the
same proportions. Show that the vector giving the support for
the parties at the end of m years is A^m v.
(c) Suppose that n = 2 and that

A =
[ 0.9 0.3 ]
[ 0.1 0.7 ].

Show that, after a long time, the support for the parties will be
approximately 0.75 for P_1 and 0.25 for P_2.
(a) Let y_i be the proportion of the population who support P_i at the end of
the year. From what we are given, the proportion supporting P_j at the beginning
of the year was x_j, and a fraction a_ij of these changed their support to P_i. So
the proportion of the whole population who supported P_j at the beginning of the
year and P_i at the end is a_ij x_j. The total support for P_i is found by adding these
up for all j: that is,

y_i = Σ_{j=1}^{n} a_ij x_j,

or v′ = Av, where v′ is the vector [y_1 ... y_n]^T giving support for the parties at
the end of the year.
(b) Let v_k be the column vector whose ith component is the proportion of the
population supporting party P_i after the end of k years. In part (a), we showed
that v_1 = Av_0, where v_0 = v. An exactly similar argument shows that v_k = Av_{k−1}
for any k. So by induction, v_m = A^m v_0 = A^m v, as required. (The result of (a) starts
the induction with m = 1. If we assume that v_{k−1} = A^{k−1} v, then

v_k = Av_{k−1} = A(A^{k−1} v) = A^k v,

and the induction step is proved.)
(c) The matrix A has characteristic polynomial

det(xI − A) = (x − 0.9)(x − 0.7) − (−0.3)(−0.1) = x^2 − 1.6x + 0.6 = (x − 1)(x − 0.6).
So the eigenvalues of A are 1 and 0.6. We find by solving linear equations that
eigenvectors for the two eigenvalues are [3 1]^T and [1 −1]^T respectively. As in the
text, we compute that the corresponding projections are

P_1 =
[ 0.75 0.75 ]
[ 0.25 0.25 ],

P_2 =
[ 0.25 −0.75 ]
[ −0.25 0.75 ].
(Once we have found P_1, we can find P_2 as I − P_1.) Then A is diagonalisable:

A = P_1 + 0.6 P_2.
From this and Proposition 4.6 we see that

A^m = P_1 + (0.6)^m P_2.

As m → ∞, we have (0.6)^m → 0, and so A^m → P_1. So in the limit, if v_0 = [x y]^T is
the vector giving the initial support for the parties, with x + y = 1, then the vector
giving the final support is approximately

[ 0.75 0.75 ] [ x ]   [ 0.75(x + y) ]   [ 0.75 ]
[ 0.25 0.25 ] [ y ] = [ 0.25(x + y) ] = [ 0.25 ].
As a check, use the computer with Maple to work out A^m for some large value
of m. For example, I find that

A^10 =
[ 0.7515116544 0.7454650368 ]
[ 0.2484883456 0.2545349632 ].
3. The vectors v_1, v_2, v_3 form a basis for V = R^3; the dual basis of V*
is f_1, f_2, f_3. A second basis for V is given by w_1 = v_1 + v_2 + v_3, w_2 =
2v_1 + v_2 + v_3, w_3 = 2v_2 + v_3. Find the basis of V* dual to w_1, w_2, w_3.
The first dual basis vector g_1 satisfies g_1(w_1) = 1, g_1(w_2) = g_1(w_3) = 0. If
g_1 = x f_1 + y f_2 + z f_3, we find

x + y + z = 1,
2x + y + z = 0,
2y + z = 0,

giving x = −1, y = −2, z = 4. So g_1 = −f_1 − 2f_2 + 4f_3. Solving two similar sets
of equations gives g_2 = f_1 + f_2 − 2f_3 and g_3 = f_2 − f_3.
Alternatively, the transition matrix P from the v's to the w's is

P =
[ 1 2 0 ]
[ 1 1 2 ]
[ 1 1 1 ],

and we showed in Section 5.1.2 that the transition matrix between the dual bases
is

(P^{−1})^T =
[ −1 1 0 ]
[ −2 1 1 ]
[ 4 −2 −1 ].

The coordinates of the g's in the basis of f's are the columns of this matrix.
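Both answers can be checked at once by evaluating each g_i on each w_j, which should give the Kronecker delta. A plain-Python sketch (not part of the notes):

```python
# Coordinates of w_1, w_2, w_3 in the basis v_1, v_2, v_3, and of
# g_1, g_2, g_3 in the dual basis f_1, f_2, f_3, as found above.

w = [[1, 1, 1], [2, 1, 1], [0, 2, 1]]      # w_1, w_2, w_3
g = [[-1, -2, 4], [1, 1, -2], [0, 1, -1]]  # g_1, g_2, g_3

# Since f_k(v_l) is the Kronecker delta, g_i(w_j) is the dot product of
# the coefficient vectors of g_i and w_j.
values = [[sum(g[i][k] * w[j][k] for k in range(3)) for j in range(3)]
          for i in range(3)]

assert values == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # g_i(w_j) = delta_ij
```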
4. The Fibonacci numbers F_n are defined by the recurrence relation

F_0 = 0,  F_1 = 1,  F_{n+2} = F_n + F_{n+1} for n ≥ 0.

Let A be the matrix
[ 0 1 ]
[ 1 1 ].
Prove that

A^n =
[ F_{n−1} F_n ]
[ F_n F_{n+1} ],

and hence find a formula for F_n.
The equation for F_n is proved by induction on n. It is clearly true for n = 1.
Suppose that it holds for n; then

A^{n+1} = A^n A
        = [ F_{n−1}  F_n     ] [ 0 1 ]
          [ F_n      F_{n+1} ] [ 1 1 ]
        = [ F_n      F_{n−1} + F_n ]
          [ F_{n+1}  F_n + F_{n+1} ]
        = [ F_n      F_{n+1} ]
          [ F_{n+1}  F_{n+2} ].

So the induction step is proved.
To find a formula for F_n, we show that A is diagonalisable, and then write
A = λ_1 P_1 + λ_2 P_2, where P_1 and P_2 are projection matrices with sum I satisfying
P_1 P_2 = P_2 P_1 = 0. Then we get A^n = λ_1^n P_1 + λ_2^n P_2, and taking the (1, 2) entry we
find that

F_n = c_1 λ_1^n + c_2 λ_2^n,

where c_1 and c_2 are the (1, 2) entries of P_1 and P_2 respectively.
From here it is just calculation. The eigenvalues of A are the roots of 0 =
det(xI − A) = x^2 − x − 1; that is, λ_1, λ_2 = (1 ± √5)/2. (Since the eigenvalues are
distinct, we know that A is diagonalisable, so the method will work.) Now because
P_1 + P_2 = I, the (1, 2) entries of these matrices are the negatives of each other; so
we have F_n = c(λ_1^n − λ_2^n). Rather than find P_1 explicitly, we can now argue as
follows: 1 = F_1 = c(λ_1 − λ_2) = c√5, so that c = 1/√5 and
F_n = (1/√5) [ ((1 + √5)/2)^n − ((1 − √5)/2)^n ].
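Both the matrix-power identity and the closed formula are easy to check numerically. A plain-Python sketch (not part of the notes):

```python
# Check A^n = [[F_{n-1}, F_n], [F_n, F_{n+1}]] and Binet's formula.
from math import sqrt

def fib(n):
    """Fibonacci numbers from the recurrence F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

A = [[0, 1], [1, 1]]
M = [[1, 0], [0, 1]]          # identity; M becomes A^n as we iterate
for n in range(1, 20):
    M = [[sum(M[i][k] * A[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    assert M == [[fib(n - 1), fib(n)], [fib(n), fib(n + 1)]]

# The closed formula, evaluated in floating point and rounded.
for n in range(20):
    binet = (((1 + sqrt(5)) / 2) ** n - ((1 - sqrt(5)) / 2) ** n) / sqrt(5)
    assert round(binet) == fib(n)
```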
5. Let V_n be the vector space of real polynomials of degree at most n.
(a) Show that the function

f · g = ∫_0^1 f(x)g(x) dx

is an inner product on V_n.
(b) In the case n = 3, write down the matrix representing the bilinear
form relative to the basis 1, x, x^2, x^3 for V_3.
(c) Apply the Gram–Schmidt process to the basis (1, x, x^2) to find
an orthonormal basis for V_2.
(d) Let W_n be the subspace of V_n consisting of all polynomials f(x)
of degree at most n which satisfy f(0) = f(1) = 0. Let D: W_n →
W_n be the linear map given by differentiation: (Df)(x) = f′(x).
Prove that the adjoint of D is −D.
(a) Put b(f, g) = ∫_0^1 f(x)g(x) dx. The function b is obviously symmetric. So
we have to show that it is linear in the first variable, that is, that

∫_0^1 (f_1(x) + f_2(x))g(x) dx = ∫_0^1 f_1(x)g(x) dx + ∫_0^1 f_2(x)g(x) dx,
∫_0^1 (c f(x))g(x) dx = c ∫_0^1 f(x)g(x) dx,

which are clear from elementary calculus.
We also have to show that the inner product is positive definite, that is, that
b(f, f) ≥ 0, with equality if and only if f = 0. This is clear from properties of
integration.
(b) If the basis is f_1 = 1, f_2 = x, f_3 = x^2, f_4 = x^3, then the (i, j) entry of the
matrix representing b is

∫_0^1 x^{i−1} x^{j−1} dx = 1/(i + j − 1),

so the matrix is

[ 1    1/2  1/3  1/4 ]
[ 1/2  1/3  1/4  1/5 ]
[ 1/3  1/4  1/5  1/6 ]
[ 1/4  1/5  1/6  1/7 ].
(c) The first basis vector is clearly 1. To make x orthogonal to 1 we must
replace it by x + a for some a; doing the integral we find that a = −1/2. To make x^2
orthogonal to the two preceding is the same as making it orthogonal to 1 and x, so
we replace it by x^2 + bx + c; we find that

1/3 + (1/2)b + c = 0,
1/4 + (1/3)b + (1/2)c = 0,

so that b = −1 and c = 1/6.
Now 1 · 1 = 1, (x − 1/2) · (x − 1/2) = 1/12, and (x^2 − x + 1/6) · (x^2 − x + 1/6) = 1/180; so
the required basis is

1,   2√3 (x − 1/2),   6√5 (x^2 − x + 1/6).
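These inner products can be checked with exact arithmetic: since x^i · x^j = 1/(i + j + 1), polynomials can be handled as coefficient lists. A plain-Python sketch (not part of the notes):

```python
# Exact check of parts (b) and (c): inner products of polynomials on
# [0, 1], computed from the monomial products x^i . x^j = 1/(i+j+1).
from fractions import Fraction as Fr

def ip(p, q):
    """Inner product of polynomials given as coefficient lists
    (p[i] is the coefficient of x^i)."""
    return sum(Fr(p[i]) * Fr(q[j]) / (i + j + 1)
               for i in range(len(p)) for j in range(len(q)))

one = [1]
p1 = [Fr(-1, 2), 1]        # x - 1/2
p2 = [Fr(1, 6), -1, 1]     # x^2 - x + 1/6

# the Gram-Schmidt output really is pairwise orthogonal ...
assert ip(one, p1) == 0 and ip(one, p2) == 0 and ip(p1, p2) == 0
# ... with the squared norms computed above ...
assert ip(one, one) == 1 and ip(p1, p1) == Fr(1, 12) and ip(p2, p2) == Fr(1, 180)
# ... and the (i, j) entry of the 4x4 matrix in (b) is 1/(i + j - 1).
assert all(ip([0] * i + [1], [0] * j + [1]) == Fr(1, i + j + 1)
           for i in range(4) for j in range(4))
```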
(d) Integration by parts shows that

f · D(g) = ∫_0^1 f(x)g′(x) dx
         = [f(x)g(x)]_0^1 − ∫_0^1 f′(x)g(x) dx
         = −(Df) · g,

where the first term vanishes because of the condition on polynomials in W_n. Thus,
by definition, the adjoint of D is −D.
6. Let A and B be real symmetric matrices. Is each of the following
statements true or false? Give brief reasons.
(a) If A and B are orthogonally similar then they are congruent.
(b) If A and B are orthogonally similar then they are similar.
(c) If A and B are congruent then they are orthogonally similar.
(d) If A and B are similar then they are orthogonally similar.
Recall that A and B are similar if B = P^{−1}AP for some invertible matrix P; they
are congruent if B = P^T AP for some invertible matrix P; and they are orthogonally
similar if B = P^{−1}AP for some orthogonal matrix P (an invertible matrix satisfying
P^T = P^{−1}). Thus it is clear that both (a) and (b) are true.
The Spectral Theorem says that A is orthogonally similar to a diagonal
matrix whose diagonal entries are the eigenvalues. If A and B are similar, then
they have the same eigenvalues, and so are orthogonally similar to the same
diagonal matrix, and so to each other. So (d) is true.
By Sylvester’s Law of Inertia, any real symmetric matrix is congruent to a
diagonal matrix with diagonal entries 1, −1 and 0. If we choose a symmetric
matrix none of whose eigenvalues is 1, −1 or 0, then it is not orthogonally similar
to the Sylvester form. For example, the matrices I and 2I are congruent but not
orthogonally similar. So (c) is false.
7. Find an orthogonal matrix P such that P^{−1}AP and P^{−1}BP are
diagonal, where

A =
[ 1 1 1 1 ]
[ 1 1 1 1 ]
[ 1 1 1 1 ]
[ 1 1 1 1 ],

B =
[ 0 1 0 1 ]
[ 1 0 1 0 ]
[ 0 1 0 1 ]
[ 1 0 1 0 ].
Remark A and B are commuting symmetric matrices, so we know that the
matrix P exists.
First solution We have to find an orthonormal basis which consists of eigenvectors
for both matrices.
Some eigenvectors can be found by inspection. If v_1 = (1, 1, 1, 1) then Av_1 =
4v_1 and Bv_1 = 2v_1. If v_2 = (1, −1, 1, −1) then Av_2 = 0 and Bv_2 = −2v_2. Any
further eigenvector v = (x, y, z, w) should be orthogonal to both of these, that is,
x + y + z + w = 0 = x − y + z − w. So x + z = 0 and y + w = 0. Conversely, any such
vector satisfies Av = 0 and Bv = 0. So choose two orthogonal vectors satisfying
these conditions, say (1, 0, −1, 0) and (0, 1, 0, −1). Normalising, we obtain the
required basis: (1, 1, 1, 1)/2, (1, −1, 1, −1)/2, (1, 0, −1, 0)/√2, (0, 1, 0, −1)/√2.
So

P =
[ 1/2   1/2   1/√2    0   ]
[ 1/2  −1/2    0    1/√2  ]
[ 1/2   1/2  −1/√2    0   ]
[ 1/2  −1/2    0   −1/√2  ].
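Since P is orthogonal, P^{−1}AP = P^T AP, so the claim can be checked directly by multiplying out. A plain-Python sketch (not part of the notes) confirming that both products are diagonal, with the eigenvalues found above:

```python
# Check that P^T A P = diag(4, 0, 0, 0) and P^T B P = diag(2, -2, 0, 0)
# for the orthogonal matrix P built from the four eigenvectors above.
from math import sqrt

s = 1 / sqrt(2)
P = [[0.5,  0.5,  s,  0],
     [0.5, -0.5,  0,  s],
     [0.5,  0.5, -s,  0],
     [0.5, -0.5,  0, -s]]
A = [[1] * 4 for _ in range(4)]
B = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

Pt = [list(col) for col in zip(*P)]   # transpose of P
for M, diag in [(A, [4, 0, 0, 0]), (B, [2, -2, 0, 0])]:
    D = matmul(matmul(Pt, M), P)
    for i in range(4):
        for j in range(4):
            want = diag[i] if i == j else 0
            assert abs(D[i][j] - want) < 1e-12
```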
Second solution Observe that both A and B are circulant matrices. So we know
from Appendix B that the columns of the Vandermonde matrix

[ 1  1  1  1 ]
[ 1  i  −1  −i ]
[ 1  −1  1  −1 ]
[ 1  −i  −1  i ]

are eigenvectors of both matrices. The second and fourth columns have
corresponding eigenvalues 0 for both matrices, and hence so do any linear
combinations of them; in particular, we can replace these two columns by their real and
imaginary parts, giving (after a slight rearrangement) the matrix

[ 1  1  1  0 ]
[ 1  −1  0  1 ]
[ 1  1  −1  0 ]
[ 1  −1  0  −1 ].
After normalising the columns, this gives the same solution as the ﬁrst.
The results of Appendix B also allow us to write down the eigenvalues of A
and B without any calculation. For example, the eigenvalues of B are
1+1 = 2, i −i = 0, −1−1 =−2, −i +i = 0.
Remark A more elegant solution is the matrix

(1/2)
[ 1  1  1  1 ]
[ 1  −1  1  −1 ]
[ 1  1  −1  −1 ]
[ 1  −1  −1  1 ].
This matrix (without the factor 1/2) is known as a Hadamard matrix. It is an n×n
matrix H with all entries ±1 satisfying H^T H = nI. It is known that an n×n
Hadamard matrix cannot exist unless n is 1, 2, or a multiple of 4; however, nobody
has succeeded in proving that a Hadamard matrix of every size n divisible by 4
exists.
The smallest order for which the existence of a Hadamard matrix is still in
doubt is (at the time of writing) n = 668. The previous smallest, n = 428, was
resolved only in 2004 by Hadi Kharaghani and Behruz Tayfeh-Rezaie in Tehran,
by constructing an example.
As a further exercise, show that, if H is a Hadamard matrix of size n, then

[ H  H ]
[ H  −H ]

is a Hadamard matrix of size 2n. (The Hadamard matrix of size 4
constructed above is of this form.)
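The doubling exercise is easy to explore by machine. A plain-Python sketch (not part of the notes) that builds Hadamard matrices of sizes 1, 2, 4, 8 by repeated doubling and checks H^T H = nI each time:

```python
# The doubling construction for Hadamard matrices, with a direct check
# of the defining condition H^T H = nI.

def is_hadamard(H):
    """Entries are all +-1 and distinct columns are orthogonal."""
    n = len(H)
    if any(abs(x) != 1 for row in H for x in row):
        return False
    for i in range(n):
        for j in range(n):
            dot = sum(H[k][i] * H[k][j] for k in range(n))
            if dot != (n if i == j else 0):
                return False
    return True

def double(H):
    """Build [[H, H], [H, -H]] from H."""
    n = len(H)
    return ([H[i] + H[i] for i in range(n)] +
            [H[i] + [-x for x in H[i]] for i in range(n)])

H1 = [[1]]
H2 = double(H1)   # [[1, 1], [1, -1]]
H4 = double(H2)   # the 4x4 Hadamard matrix written out above
assert H4 == [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
assert is_hadamard(H2) and is_hadamard(H4) and is_hadamard(double(H4))
```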
8. Let

A =
[ 1 1 ]
[ 1 2 ]

and B =
[ 1 1 ]
[ 1 0 ].

Find an invertible matrix P and a diagonal matrix D such that P^T AP =
I and P^T BP = D, where I is the identity matrix.
First we take the quadratic form corresponding to A, and reduce it to a sum of
squares. The form is x^2 + 2xy + 2y^2, which is (x + y)^2 + y^2. (Note: This is the sum
of two squares, in agreement with the fact that A is positive definite.)
Now the matrix that transforms (x, y) to (x + y, y) is

Q =
[ 1 1 ]
[ 0 1 ],

since
[ 1 1 ] [ x ]   [ x + y ]
[ 0 1 ] [ y ] = [ y ].
Hence

[x y] Q^T Q [x y]^T = x^2 + 2xy + 2y^2 = [x y] A [x y]^T,

so that Q^T Q = A.
Now, if we put

P = Q^{−1} =
[ 1 −1 ]
[ 0 1 ],

we see that P^T AP = P^T (Q^T Q) P = I.
What about P^T BP? We find that

P^T BP = [ 1 0  ] [ 1 1 ] [ 1 −1 ]   [ 1 0  ]
         [ −1 1 ] [ 1 0 ] [ 0 1  ] = [ 0 −1 ] = D,

the required diagonal matrix. So we are done.
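Both products are integer computations, so they are easy to confirm by machine. A plain-Python sketch (not part of the notes) using only the matrices found above:

```python
# Verify P^T A P = I and P^T B P = D for Worked Example 8.

def matmul(X, Y):
    """Multiply two 2x2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [1, 2]]
B = [[1, 1], [1, 0]]
P = [[1, -1], [0, 1]]
Pt = [[1, 0], [-1, 1]]   # transpose of P

assert matmul(matmul(Pt, A), P) == [[1, 0], [0, 1]]    # P^T A P = I
assert matmul(matmul(Pt, B), P) == [[1, 0], [0, -1]]   # P^T B P = D
```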
Remark 1: In general it is not so easy. The reduction of the quadratic form will
give a matrix P_1 such that P_1^T A P_1 = I, but in general P_1^T B P_1 won't be diagonal;
all we can say is that it is symmetric. So by the Spectral Theorem, we can find an
orthogonal matrix P_2 such that P_2^T (P_1^T B P_1) P_2 is diagonal. (P_2 is the matrix whose
columns are orthonormal eigenvectors of P_1^T B P_1.) Then because P_2 is orthogonal,
we have

P_2^T (P_1^T A P_1) P_2 = P_2^T I P_2 = I,

so that P = P_1 P_2 is the required matrix.
Remark 2: If you are only asked for the diagonal matrix D, and not the matrix
P, you can do an easier calculation. We saw in the lectures that the diagonal
entries of D are the roots of the polynomial det(xA − B) = 0. In our case, we have

det [ x − 1  x − 1 ]
    [ x − 1  2x    ] = x^2 − 1 = (x − 1)(x + 1),

so the diagonal entries of D are +1 and −1 (as we found).
Index
abelian group, 3, 90, 91
addition
of linear maps, 36
of matrices, 15
of scalars, 3
of vectors, 4
adjoint, 57, 71
adjugate, 27
alternating bilinear form, 85
alternating matrix, 86
axioms
for ﬁeld, 3, 89
for vector space, 3, 90
basis, 6
orthonormal, 68
bilinear form, 62
alternating, 85
symmetric, 62
canonical form, 20
for congruence, 64, 65
for equivalence, 17, 20, 39
for orthogonal similarity, 76
for similarity, 51
Cauchy–Schwarz inequality, 68
Cayley–Hamilton Theorem, 30, 48
characteristic of ﬁeld, 58, 90
characteristic polynomial
of linear map, 48
circulant matrix, 94, 114
coding equivalence, 106
cofactor, 27
cofactor expansion, 27
column operations, 16
column rank, 20
column vector, 10, 15
complex numbers, 3, 90
congruence, 59, 106
coordinate representation, 10
coordinatewise, 5
cubic equation, 95
data, 10
determinant, 23, 87
of linear map, 48
diagonalisable, 45
dimension, 8
direct sum, 14, 41
distributive law, 3
dot product, 67
dual basis, 56
dual space, 56
echelon form, 105
eigenspace, 44
eigenvalue, 44
eigenvector, 44
elementary column operations, 16
elementary matrix, 18
elementary row operations, 16
equivalence, 20, 39
error-correcting codes, 106
Euclidean plane, 4
Exchange Lemma, 7
Fibonacci numbers, 111
ﬁeld, 3
finite-dimensional, 6
Friendship Theorem, 97
Fundamental Theorem of Algebra, 75
Google (Internet search engine), 103
Gram–Schmidt process, 69, 85
graph theory, 97
Hadamard matrix, 115
Hermitian, 82, 83
identity linear map, 42
identity matrix, 12, 16
image, 33
indecomposable matrix, 102
indeﬁnite, 66
inner product, 67, 81
standard, 69
integers mod p, 90
intersection, 13
inverse matrix, 12, 29
invertible
linear map, 37
matrix, 12, 24, 29, 37
Jordan block, 51
Jordan form, 51
kernel, 33
Kronecker delta, 56
linear form, 55
linear map, 33
linear transformation, 33
linearly independent, 6
Maple (computer algebra system), 102,
110
matrix, 15
alternating, 86
circulant, 94, 114
Hadamard, 115
Hermitian, 82
identity, 16
indecomposable, 102
invertible, 19, 37
normal, 84
orthogonal, 71
skew-Hermitian, 88
skew-symmetric, 86
symmetric, 59, 71
unitary, 82
Vandermonde, 93, 114
minimal polynomial, 48
minor, 27
multiplication
of linear maps, 36
of matrices, 15
of scalars, 3
negative deﬁnite, 66
negative semideﬁnite, 66
nonsingular, 12
normal linear map, 84
normal matrix, 84
nullity, 34
orthogonal complement, 73
orthogonal decomposition, 74
orthogonal linear map, 71
orthogonal projection, 74
orthogonal similarity, 71
orthogonal vectors, 73
orthonormal basis, 68
parallelogram law, 4
permutation, 25
Perron–Frobenius Theorem, 102
Pfafﬁan, 87
polarisation, 63
positive deﬁnite, 66, 67, 82
positive semideﬁnite, 66
projection, 41
orthogonal, 74
quadratic form, 59, 63
rank
of linear map, 34, 40
of matrix, 17, 40
of quadratic form, 66, 78
Rank–Nullity Theorem, 34
rational numbers, 3, 90
real numbers, 3, 90
ring, 16
row operations, 16
row rank, 20
row vector, 10, 15, 55
row-equivalence, 105
scalar, 4
scalar multiplication, 4
self-adjoint, 71
semilinear, 82
sesquilinear, 82
sign, 25
signature, 66, 78
similarity, 44
orthogonal, 71
skew-Hermitian, 88
skew-symmetric matrix, 86
spanning, 6
Spectral Theorem, 75, 82, 83
standard inner product, 69, 82
subspace, 13
sum, 13
Sylvester’s Law of Inertia, 62, 65, 77
symmetric group, 25
trace, 52, 99
transition matrix, 11
transposition, 25
unitary, 82, 83
Vandermonde matrix, 93, 114
vector, 4
vector space, 4
complex, 4
real, 4
zero matrix, 16
zero vector, 4
These lecture notes correspond to the course Linear Algebra II, as given at Queen Mary, University of London, in the first semester 2005–6. The course description reads as follows:

This module is a mixture of abstract theory, with rigorous proofs, and concrete calculations with matrices. The abstract component builds on the notions of subspaces and linear maps to construct the theory of bilinear forms, i.e. functions of two variables which are linear in each variable, dual spaces (which consist of linear mappings from the original space to the underlying field) and determinants. The concrete applications involve ways to reduce a matrix of some specific type (such as symmetric or skew-symmetric) to as near diagonal form as possible.

In other words, students on this course have met the basic concepts of linear algebra before. Of course, some revision is necessary, and I have tried to make the notes reasonably self-contained. If you are reading them without the benefit of a previous course on linear algebra, you will almost certainly have to do some work filling in the details of arguments which are outlined or skipped over here.

The notes for the prerequisite course, Linear Algebra I, by Dr Francis Wright, are currently available from

http://centaur.maths.qmul.ac.uk/Lin Alg I/

I have by and large kept to the notation of these notes. For example, a general field is called K, vectors are represented as column vectors, and linear maps (apart from zero and the identity) are represented by Greek letters.

I have included in the appendices some extra-curricular applications of linear algebra, including some special determinants, the method for solving a cubic equation, the proof of the "Friendship Theorem" and the problem of deciding the winner of a football league, as well as some worked examples.

Peter J. Cameron
September 5, 2008
Chapter 1

Vector spaces

These notes are about linear maps and bilinear forms on vector spaces, how we represent them by matrices, how we manipulate them, and what we use this for.

1.1 Definitions

Definition 1.1 A field is an algebraic system consisting of a non-empty set K equipped with two binary operations + (addition) and · (multiplication) satisfying the conditions:

(A) (K, +) is an abelian group with identity element 0 (called zero);
(M) (K \ {0}, ·) is an abelian group with identity element 1;
(D) the distributive law a(b + c) = ab + ac holds for all a, b, c ∈ K.

If you don't know what an abelian group is, then you can find it spelled out in detail in Appendix A. In fact, the only fields that I will use in these notes are

• Q, the field of rational numbers;
• R, the field of real numbers;
• C, the field of complex numbers;
• Fp, the field of integers mod p, where p is a prime number.

I will not stop to prove that these structures really are fields. You may have seen Fp referred to as Zp.
Definition 1.2 A vector space V over a field K is an algebraic system consisting of a non-empty set V equipped with a binary operation + (vector addition), and an operation of scalar multiplication

(a, v) ∈ K × V → av ∈ V

such that the following rules hold:

(VA) (V, +) is an abelian group, with identity element 0 (the zero vector).

(VM) Rules for scalar multiplication:
(VM0) For any a ∈ K, v ∈ V, there is a unique element av ∈ V.
(VM1) For any a ∈ K, u, v ∈ V, we have a(u + v) = au + av.
(VM2) For any a, b ∈ K, v ∈ V, we have (a + b)v = av + bv.
(VM3) For any a, b ∈ K, v ∈ V, we have (ab)v = a(bv).
(VM4) For any v ∈ V, we have 1v = v (where 1 is the identity element of K).

Since we have two kinds of elements, namely elements of K and elements of V, we distinguish them by calling the elements of K scalars and the elements of V vectors.

A vector space over the field R is often called a real vector space, and one over C is a complex vector space.

Example 1.1 The first example of a vector space that we meet is the Euclidean plane R2. This is a real vector space. This means that we can add two vectors, and multiply a vector by a scalar (a real number). There are two ways we can make these definitions.

• The geometric definition. Think of a vector as an arrow starting at the origin and ending at a point of the plane. Then addition of two vectors is done by the parallelogram law (see Figure 1.1). The scalar multiple av is the vector whose length is a times the length of v, in the same direction if a > 0 and in the opposite direction if a < 0.

• The algebraic definition. We represent the points of the plane by Cartesian coordinates (x, y). Thus, a vector v is just a pair (x, y) of real numbers. Now we define addition and scalar multiplication by

(x1, y1) + (x2, y2) = (x1 + x2, y1 + y2),
a(x, y) = (ax, ay).
Figure 1.1: The parallelogram law

Not only is this definition much simpler, but it is much easier to check that the rules for a vector space are really satisfied! For example, we check the law a(v + w) = av + aw. Let v = (x1, y1) and w = (x2, y2). Then we have

a(v + w) = a((x1, y1) + (x2, y2))
         = a(x1 + x2, y1 + y2)
         = (ax1 + ax2, ay1 + ay2)
         = (ax1, ay1) + (ax2, ay2)
         = av + aw.

Example 1.2 This example can be generalised. Let n be any positive integer and K any field. Let V = Kn, the set of all n-tuples of elements of K, where the operations are defined coordinatewise:

(a1, a2, . . . , an) + (b1, b2, . . . , bn) = (a1 + b1, a2 + b2, . . . , an + bn),
c(a1, a2, . . . , an) = (ca1, ca2, . . . , can).

Then V is a vector space over K. We say that the operations of addition and scalar multiplication are coordinatewise: this means that we add two vectors coordinate by coordinate, and similarly for scalar multiplication.

1.2 Bases

This example is much more general than it appears: every finite-dimensional vector space looks like Example 1.2. Here's why.

Definition 1.3 Let V be a vector space over the field K, and let v1, . . . , vn be vectors in V.
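The coordinatewise operations in Example 1.2 are easy to spot-check by machine. The following Python sketch (my own illustration, not part of the notes; the helper names `add`, `scale` and `random_vector` are invented) verifies laws (VM1) and (VM2) for sample vectors over Q, using exact rational arithmetic:

```python
from fractions import Fraction
from random import randint

def add(v, w):
    # coordinatewise vector addition in K^n
    return tuple(x + y for x, y in zip(v, w))

def scale(a, v):
    # coordinatewise scalar multiplication
    return tuple(a * x for x in v)

def random_vector(n):
    # a vector with random rational coordinates
    return tuple(Fraction(randint(-9, 9), randint(1, 9)) for _ in range(n))

v, w = random_vector(4), random_vector(4)
a, b = Fraction(2, 3), Fraction(-5, 7)

# (VM1): a(v + w) = av + aw
assert scale(a, add(v, w)) == add(scale(a, v), scale(a, w))
# (VM2): (a + b)v = av + bv
assert scale(a + b, v) == add(scale(a, v), scale(b, v))
print("laws (VM1) and (VM2) hold for these samples")
```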
(a) The vectors v1, . . . , vn are spanning if, for every vector v ∈ V, we can find scalars c1, c2, . . . , cn ∈ K such that

v = c1v1 + c2v2 + · · · + cnvn.

In this case, we write V = ⟨v1, . . . , vn⟩.

(b) The vectors v1, . . . , vn are linearly independent if, whenever we have scalars c1, c2, . . . , cn satisfying

c1v1 + c2v2 + · · · + cnvn = 0,

then necessarily c1 = c2 = · · · = cn = 0.

(c) The vectors v1, . . . , vn form a basis for V if they are linearly independent and spanning.

Remark Linear independence is a property of a list of vectors. A list containing the zero vector is never linearly independent. Also, a list in which the same vector occurs more than once is never linearly independent.

Remark In these notes we are only concerned with finite-dimensional vector spaces. If you study Functional Analysis, Quantum Mechanics, or various other subjects, you will meet vector spaces which are not finite-dimensional.

Definition 1.4 Let V be a vector space over the field K. We say that V is finite-dimensional if we can find vectors v1, . . . , vn ∈ V which form a basis for V.

I will say "Let B = (v1, . . . , vn) be a basis for V" to mean that the list of vectors v1, . . . , vn is a basis, and to refer to this list as B.

The next theorem helps us to understand the properties of linear independence.

Proposition 1.1 The following three conditions are equivalent for the vectors v1, . . . , vn of the vector space V over K:

(a) v1, . . . , vn is a basis;
(b) v1, . . . , vn is a maximal linearly independent set (that is, if we add any vector to the list, then the result is no longer linearly independent);
(c) v1, . . . , vn is a minimal spanning set (that is, if we remove any vector from the list, then the result is no longer spanning).

Theorem 1.2 (The Exchange Lemma) Let V be a vector space over K. Suppose that the vectors v1, . . . , vn are linearly independent, and that the vectors w1, . . . , wm are linearly independent, where m > n. Then we can find a number i with 1 ≤ i ≤ m such that the vectors v1, . . . , vn, wi are linearly independent.

In order to prove this, we need a lemma about systems of equations.

Lemma 1.3 Given a system

(∗)  a11x1 + a12x2 + · · · + a1mxm = 0,
     a21x1 + a22x2 + · · · + a2mxm = 0,
     · · ·
     an1x1 + an2x2 + · · · + anmxm = 0

of homogeneous linear equations, where the number n of equations is strictly less than the number m of variables, there exists a non-zero solution (x1, . . . , xm) (that is, x1, . . . , xm are not all zero).

Proof This is proved by induction on the number of variables. If the coefficients a11, a21, . . . , an1 of x1 are all zero, then putting x1 = 1 and the other variables zero gives a solution. If one of these coefficients is non-zero, then we can use the corresponding equation to express x1 in terms of the other variables, obtaining n − 1 equations in m − 1 variables. By hypothesis, n − 1 < m − 1. So by the induction hypothesis, these new equations have a non-zero solution. Computing the value of x1 gives a solution to the original equations.

Now we turn to the proof of the Exchange Lemma. Let us argue for a contradiction, by assuming that the result is false: that is, assume that none of the vectors wi can be added to the list (v1, . . . , vn) to produce a larger linearly independent list. This means that, for each i, the list (v1, . . . , vn, wi) is linearly dependent. So there are coefficients c1, . . . , cn, d, not all zero, such that

c1v1 + · · · + cnvn + dwi = 0.

We cannot have d = 0, for this would mean that we had a linear combination of v1, . . . , vn equal to zero, contrary to the hypothesis that these vectors are linearly independent. So we can divide the equation through by d, and take wi to the other side, to obtain (changing notation slightly)

wi = a1iv1 + a2iv2 + · · · + anivn = ∑_{j=1}^{n} aji vj.
We do this for each value of i = 1, . . . , m.

Now take a non-zero solution to the set of equations (∗) above: that is,

∑_{i=1}^{m} aji xi = 0  for j = 1, . . . , n.

Multiplying the formula for wi by xi and adding, we obtain

x1w1 + · · · + xmwm = ∑_{j=1}^{n} ( ∑_{i=1}^{m} aji xi ) vj = 0.

But the coefficients x1, . . . , xm are not all zero, so this means that the vectors w1, . . . , wm are linearly dependent, contrary to hypothesis.

So the assumption that no wi can be added to (v1, . . . , vn) to get a larger linearly independent set must be wrong, and the proof is complete.

The Exchange Lemma has some important consequences:

Corollary 1.4 Let V be a finite-dimensional vector space over a field K. Then

(a) any two bases of V have the same number of elements;
(b) any linearly independent set can be extended to a basis.

The number of elements in a basis is called the dimension of the vector space V. We denote the dimension of V by dim(V). We will say "an n-dimensional vector space" instead of "a finite-dimensional vector space whose dimension is n".

Proof Let us see how the corollary follows from the Exchange Lemma.

(a) Let (v1, . . . , vn) and (w1, . . . , wm) be two bases for V. Suppose, for a contradiction, that they have different numbers of elements; say, without loss of generality, that n < m. Both lists of vectors are linearly independent, so, according to the Exchange Lemma, we can add some vector wi to the first list to get a larger linearly independent list. This means that v1, . . . , vn was not a maximal linearly independent set, and so (by Proposition 1.1) not a basis, contradicting our assumption. We conclude that m = n, as required.

(b) Let (v1, . . . , vn) be linearly independent and let (w1, . . . , wm) be a basis. Necessarily n ≤ m, since otherwise we could add one of the vs to (w1, . . . , wm) to get a larger linearly independent set, contradicting maximality. But now we can add some ws to (v1, . . . , vn) until we obtain a basis.
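The inductive proof of Lemma 1.3 is effectively an algorithm: eliminate x1 and recurse on a smaller system. The following Python sketch (my own, with invented names, assuming exact rational coefficients and fewer equations than variables) traces that induction and produces a non-zero solution:

```python
from fractions import Fraction

def nonzero_solution(rows, m):
    """Follow Lemma 1.3: if no equation involves x1, set x1 = 1;
    otherwise use one equation to eliminate x1, recurse on the
    remaining equations in m - 1 variables, then back-substitute."""
    if not rows:
        return [Fraction(1)] + [Fraction(0)] * (m - 1)
    pivot = next((r for r in rows if r[0] != 0), None)
    if pivot is None:
        # the coefficients of x1 are all zero: take x1 = 1, rest 0
        return [Fraction(1)] + [Fraction(0)] * (m - 1)
    # eliminate x1 from the other equations
    others = [[r[j] - r[0] * pivot[j] / pivot[0] for j in range(1, m)]
              for r in rows if r is not pivot]
    tail = nonzero_solution(others, m - 1)
    # back-substitute into the pivot equation to recover x1
    x1 = -sum(pivot[j] * tail[j - 1] for j in range(1, m)) / pivot[0]
    return [x1] + tail

# two equations in three variables, as the lemma requires (n < m)
eqs = [[Fraction(x) for x in row] for row in [[1, 2, 3], [4, 5, 6]]]
sol = nonzero_solution(eqs, 3)
assert any(x != 0 for x in sol)
assert all(sum(r[j] * sol[j] for j in range(3)) == 0 for r in eqs)
print("nonzero solution:", sol)
```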
Remark We allow the possibility that a vector space has dimension zero. Such a vector space contains just one vector, the zero vector 0; a basis for this vector space consists of the empty set.

Now let V be an n-dimensional vector space over K. This means that there is a basis v1, . . . , vn for V. Since this list of vectors is spanning, every vector v ∈ V can be expressed as

v = c1v1 + c2v2 + · · · + cnvn

for some scalars c1, c2, . . . , cn ∈ K. The scalars c1, . . . , cn are the coordinates of v (with respect to the given basis), and the coordinate representation of v is the n-tuple

(c1, c2, . . . , cn) ∈ Kn.

Now the coordinate representation is unique. For suppose that we also had

v = c′1v1 + c′2v2 + · · · + c′nvn

for scalars c′1, c′2, . . . , c′n. Subtracting these two expressions, we obtain

0 = (c1 − c′1)v1 + (c2 − c′2)v2 + · · · + (cn − c′n)vn.

Now the vectors v1, . . . , vn are linearly independent; so this equation implies that c1 − c′1 = 0, c2 − c′2 = 0, . . . , cn − c′n = 0; that is, c1 = c′1, c2 = c′2, . . . , cn = c′n.

Now it is easy to check that, when we add two vectors in V, we add their coordinate representations in Kn (using coordinatewise addition), and when we multiply a vector v ∈ V by a scalar c, we multiply its coordinate representation by c. In other words, addition and scalar multiplication in V translate to the same operations on their coordinate representations. This is why we only need to consider vector spaces of the form Kn, as in Example 1.2. Here is how the result would be stated in the language of abstract algebra:

Theorem 1.5 Any n-dimensional vector space over a field K is isomorphic to the vector space Kn.

1.3 Row and column vectors

The elements of the vector space Kn are all the n-tuples of scalars from the field K. There are two different ways that we can represent an n-tuple: as a row, or as a column.
Thus, the vector with components 1, 2 and −3 can be represented as a row vector

[ 1 2 −3 ]

or as a column vector

[  1 ]
[  2 ]
[ −3 ].

(Note that we use square brackets, rather than round brackets or parentheses. But you will see the notation (1, 2, −3), and the equivalent for columns, in other books!)

Both systems are in common use, and you should be familiar with both. The choice of row or column vectors makes some technical differences in the statements of the theorems, so care is needed.

There are arguments for and against both systems. Those who prefer row vectors would argue that we already use (x, y) or (x, y, z) for the coordinates of a point in 2- or 3-dimensional Euclidean space, so we should use the same for vectors. Those who prefer column vectors point to the convenience of representing, say, the linear equations

2x + 3y = 5,
4x + 5y = 9

in matrix form

[ 2 3 ] [ x ]   [ 5 ]
[ 4 5 ] [ y ] = [ 9 ].

Statisticians also prefer column vectors: to a statistician, a vector often represents data from an experiment, and data are usually recorded in columns on a datasheet.

I will use column vectors in these notes. The most powerful argument will appear when we consider representing linear maps by matrices.

So we make a formal definition:

Definition 1.5 Let V be a vector space with a basis B = (v1, v2, . . . , vn). If v = c1v1 + c2v2 + · · · + cnvn, then the coordinate representation of v relative to the basis B is

        [ c1 ]
[v]B =  [ c2 ]
        [ ⋮  ]
        [ cn ].

In order to save space on the paper, we often write this as

[v]B = [ c1 c2 . . . cn ]⊤.

The symbol ⊤ is read "transpose".
1.4 Change of basis

The coordinate representation of a vector is always relative to a basis. We now have to look at how the representation changes when we use a different basis.

Definition 1.6 Let B = (v1, . . . , vn) and B′ = (v′1, . . . , v′n) be bases for the n-dimensional vector space V over the field K. The transition matrix P from B to B′ is the n × n matrix whose jth column is the coordinate representation [v′j]B of the jth vector of B′ relative to B. If we need to specify the bases, we write P_{B,B′}.

Proposition 1.6 Let B and B′ be bases for the n-dimensional vector space V over the field K. Then, for any vector v ∈ V, the coordinate representations of v with respect to B and B′ are related by

[v]B = P [v]B′.

Proof Let pij be the (i, j) entry of the matrix P. By definition, we have

v′j = ∑_{i=1}^{n} pij vi.

Take an arbitrary vector v ∈ V, and let

[v]B = [c1, . . . , cn]⊤,  [v]B′ = [d1, . . . , dn]⊤.

This means, by definition, that

v = ∑_{i=1}^{n} ci vi = ∑_{j=1}^{n} dj v′j.

Substituting the formula for v′j into the second expression, we have

v = ∑_{j=1}^{n} dj ( ∑_{i=1}^{n} pij vi ).

Reversing the order of summation, we get

v = ∑_{i=1}^{n} ( ∑_{j=1}^{n} pij dj ) vi.

Now we have two expressions for v as a linear combination of the vectors vi. By the uniqueness of the coordinate representation, they are the same: that is,

ci = ∑_{j=1}^{n} pij dj.
In matrix form, this says

[ c1 ]       [ d1 ]
[ ⋮  ]  = P [ ⋮  ],
[ cn ]       [ dn ]

or in other words,

[v]B = P [v]B′,

as required.

In this course, we will see four ways in which matrices arise in linear algebra. Here is the first occurrence: matrices arise as transition matrices between bases of a vector space.

Remark We see that, to express the coordinate representation w.r.t. the new basis in terms of that w.r.t. the old one, we need the inverse of the transition matrix:

[v]B′ = (P_{B,B′})−1 [v]B.

Given a matrix P, we denote by P−1 the inverse of P, that is, the matrix Q satisfying PQ = QP = I. Here I denotes the identity matrix, the matrix having 1s on the main diagonal and 0s everywhere else. Not every matrix has an inverse: we say that P is invertible or non-singular if it has an inverse.

The next corollary summarises how transition matrices behave.

Corollary 1.7 Let B, B′, B″ be bases of the vector space V.

(a) P_{B,B} = I.
(b) P_{B′,B} = (P_{B,B′})−1.
(c) P_{B,B″} = P_{B,B′} P_{B′,B″}.

Proof This follows from the preceding Proposition. For example, for (b) we have

[v]B = P_{B,B′} [v]B′,  [v]B′ = P_{B′,B} [v]B,

so

[v]B = P_{B,B′} P_{B′,B} [v]B.

By the uniqueness of the coordinate representation, we have P_{B,B′} P_{B′,B} = I.

Corollary 1.8 The transition matrix between any two bases of a vector space is invertible.

This follows immediately from (b) of the preceding Corollary.
Example Consider the vector space R2, with the two bases

B = ( [ 1 ], [ 0 ] ),    B′ = ( [ 1 ], [ 2 ] ).
      [ 0 ]  [ 1 ]              [ 1 ]  [ 3 ]

The transition matrix is

P_{B,B′} = [ 1 2 ]
           [ 1 3 ],

whose inverse is calculated to be

P_{B′,B} = [  3 −2 ]
           [ −1  1 ].

So the theorem tells us that, for any x, y ∈ R, we have

[ x ]     [ 1 ]     [ 0 ]              [ 1 ]             [ 2 ]
[ y ] = x [ 0 ] + y [ 1 ] = (3x − 2y) [ 1 ] + (−x + y)  [ 3 ],

as is easily checked.

1.5 Subspaces and direct sums

Definition 1.7 A non-empty subset of a vector space is called a subspace if it contains the sum of any two of its elements and any scalar multiple of any of its elements. We write U ≤ V to mean "U is a subspace of V".

A subspace of a vector space is a vector space in its own right.

Subspaces can be constructed in various ways:

(a) Let v1, . . . , vn ∈ V. The span of (v1, . . . , vn) is the set

{c1v1 + c2v2 + · · · + cnvn : c1, . . . , cn ∈ K}.

This is a subspace of V. Moreover, (v1, . . . , vn) is a spanning set in this subspace. We denote the span of v1, . . . , vn by ⟨v1, . . . , vn⟩.

(b) Let U1 and U2 be subspaces of V. Then

– the intersection U1 ∩ U2 is the set of all vectors belonging to both U1 and U2;
– the sum U1 + U2 is the set {u1 + u2 : u1 ∈ U1, u2 ∈ U2} of all sums of vectors from the two subspaces.

Both U1 ∩ U2 and U1 + U2 are subspaces of V.
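The example above can be checked mechanically. This Python fragment (my own check, not code from the notes; `matmul` is an invented helper) verifies that the transition matrix and its inverse multiply to the identity, and that the displayed identity really recovers (x, y):

```python
from fractions import Fraction

P = [[1, 2], [1, 3]]        # the transition matrix P_{B,B'}
Pinv = [[3, -2], [-1, 1]]   # the claimed inverse P_{B',B}

def matmul(X, Y):
    # product of two matrices, row-by-column
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

assert matmul(P, Pinv) == [[1, 0], [0, 1]]

# (x, y) = (3x - 2y)(1, 1) + (-x + y)(2, 3) for several sample points
for x, y in [(5, 7), (-2, 3), (Fraction(1, 2), Fraction(1, 3))]:
    c1, c2 = 3 * x - 2 * y, -x + y          # coordinates w.r.t. B'
    v = (c1 * 1 + c2 * 2, c1 * 1 + c2 * 3)  # c1*(1,1) + c2*(2,3)
    assert v == (x, y)
print("the change-of-basis example checks out")
```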
The next result summarises some properties of these subspaces. Proofs are left to the reader.

Proposition 1.9 Let V be a vector space over K.

(a) For any v1, . . . , vn ∈ V, the dimension of ⟨v1, . . . , vn⟩ is at most n, with equality if and only if v1, . . . , vn are linearly independent.

(b) For any two subspaces U1 and U2 of V, we have

dim(U1 ∩ U2) + dim(U1 + U2) = dim(U1) + dim(U2).

An important special case occurs when U1 ∩ U2 is the zero subspace {0}. In this case, the sum U1 + U2 has the property that each of its elements has a unique expression in the form u1 + u2, for u1 ∈ U1 and u2 ∈ U2. For suppose that we had two different expressions for a vector v, say

v = u1 + u2 = u′1 + u′2,  u1, u′1 ∈ U1,  u2, u′2 ∈ U2.

Then

u1 − u′1 = u′2 − u2.

But u1 − u′1 ∈ U1, and u′2 − u2 ∈ U2; so this vector is in U1 ∩ U2, and by hypothesis it is equal to 0, so that u1 = u′1 and u2 = u′2; that is, the two expressions are not different after all! In this case we say that U1 + U2 is the direct sum of the subspaces U1 and U2, and write it as U1 ⊕ U2. Note that

dim(U1 ⊕ U2) = dim(U1) + dim(U2).

The notion of direct sum extends to more than two summands, but is a little complicated to describe. We state a form which is sufficient for our purposes.

Definition 1.8 Let U1, . . . , Ur be subspaces of the vector space V. We say that V is the direct sum of U1, . . . , Ur, and write V = U1 ⊕ · · · ⊕ Ur, if every vector v ∈ V can be written uniquely in the form v = u1 + · · · + ur with ui ∈ Ui for i = 1, . . . , r.

Proposition 1.10 If V = U1 ⊕ · · · ⊕ Ur, then

(a) dim(V) = dim(U1) + · · · + dim(Ur);

(b) if Bi is a basis for Ui for i = 1, . . . , r, then B1 ∪ · · · ∪ Br is a basis for V.
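Proposition 1.9(a) can be illustrated numerically: the dimension of a span is the number of pivots left after Gaussian elimination. The sketch below is my own illustration (the function names are invented, and elimination over Q is one standard way to compute the dimension; it is not code from the notes):

```python
from fractions import Fraction

def dim_of_span(vectors):
    # dimension of <v1, ..., vn>: Gaussian elimination over Q,
    # counting the pivot rows that survive
    M = [[Fraction(x) for x in v] for v in vectors]
    r, ncols = 0, len(M[0]) if M else 0
    for col in range(ncols):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

independent = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
dependent = [(1, 2, 3), (4, 5, 6), (5, 7, 9)]  # third = first + second
assert dim_of_span(independent) == 3  # equality: the list is independent
assert dim_of_span(dependent) == 2    # strictly less than n = 3
print("dim of the span is at most n, with equality iff independent")
```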
Chapter 2

Matrices and determinants

You have certainly seen matrices before; indeed, we met some in the first chapter of the notes. Here we revise matrix algebra, consider row and column operations on matrices, and define the rank of a matrix. Then we define the determinant of a square matrix axiomatically, and prove that it exists (that is, there is a unique "determinant" function satisfying the rules we lay down); we also give some methods of calculating it, and some of its properties. Finally we prove the Cayley–Hamilton Theorem: every matrix satisfies its own characteristic equation.

2.1 Matrix algebra

Definition 2.1 A matrix of size m × n over a field K, where m and n are positive integers, is an array with m rows and n columns, where each entry is an element of K. For 1 ≤ i ≤ m and 1 ≤ j ≤ n, the entry in row i and column j of A is denoted by Aij, and referred to as the (i, j) entry of A.

Example 2.1 A column vector in Kn can be thought of as an n × 1 matrix, while a row vector is a 1 × n matrix.

Definition 2.2 We define addition and multiplication of matrices as follows.

(a) Let A and B be matrices of the same size m × n over K. Then the sum A + B is defined by adding corresponding entries:

(A + B)ij = Aij + Bij.

(b) Let A be an m × n matrix and B an n × p matrix over K. Then the product AB is the m × p matrix whose (i, j) entry is obtained by multiplying each
element in the ith row of A by the corresponding element in the jth column of B and summing:

(AB)ij = ∑_{k=1}^{n} Aik Bkj.
Remark Note that we can only add or multiply matrices if their sizes satisfy appropriate conditions. In particular, for a fixed value of n, we can add and multiply n × n matrices. It turns out that the set Mn(K) of n × n matrices over K is a ring with identity: this means that it satisfies conditions (A0)–(A4), (M0)–(M2) and (D) of Appendix 1. The zero matrix, which we denote by O, is the matrix with every entry zero, while the identity matrix, which we denote by I, is the matrix with entries 1 on the main diagonal and 0 everywhere else. Note that matrix multiplication is not commutative: BA is usually not equal to AB.

We already met matrix multiplication in Section 1 of the notes: recall that if P_{B,B′} denotes the transition matrix between two bases of a vector space, then

P_{B,B′} P_{B′,B″} = P_{B,B″}.
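Definition 2.2(b) translates directly into code. The short Python sketch below (my own illustration; the name `mat_mult` is invented) computes the product straight from the formula, and also exhibits the failure of commutativity noted in the Remark:

```python
def mat_mult(A, B):
    """Product of an m x n matrix A and an n x p matrix B,
    straight from the definition (AB)_ij = sum_k A_ik * B_kj."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "sizes must be compatible"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(mat_mult(A, B))  # [[2, 1], [4, 3]]
print(mat_mult(B, A))  # [[3, 4], [1, 2]]: not equal, so AB != BA here
```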
2.2 Row and column operations

Given an m × n matrix A over a field K, we define certain operations on A called row and column operations.

Definition 2.3 Elementary row operations There are three types:

Type 1 Add a multiple of the jth row to the ith, where j ≠ i.
Type 2 Multiply the ith row by a non-zero scalar.
Type 3 Interchange the ith and jth rows, where j ≠ i.

Elementary column operations There are three types:

Type 1 Add a multiple of the jth column to the ith, where j ≠ i.
Type 2 Multiply the ith column by a non-zero scalar.
Type 3 Interchange the ith and jth columns, where j ≠ i.

By applying these operations, we can reduce any matrix to a particularly simple form:
Theorem 2.1 Let A be an m × n matrix over the field K. Then it is possible to change A into B by elementary row and column operations, where B is a matrix of the same size satisfying Bii = 1 for 1 ≤ i ≤ r, for some r ≤ min{m, n}, and all other entries of B zero.

If A can be reduced to two matrices B and B′, both of the above form, where the numbers of non-zero elements are r and r′ respectively, by different sequences of elementary operations, then r = r′, and so B = B′.

Definition 2.4 The number r in the above theorem is called the rank of A; while a matrix of the form described for B is said to be in the canonical form for equivalence. We can write the canonical form matrix in "block form" as

B = [ Ir O ]
    [ O  O ],
where Ir is an r × r identity matrix and O denotes a zero matrix of the appropriate size (that is, r × (n − r), (m − r) × r, and (m − r) × (n − r) respectively for the three Os). Note that some or all of these Os may be missing: for example, if r = m, we just have [ Im O ].

Proof We outline the proof that the reduction is possible. To prove that we always get the same value of r, we need a different argument. The proof is by induction on the size of the matrix A: in other words, we assume as inductive hypothesis that any smaller matrix can be reduced as in the theorem. Let the matrix A be given. We proceed in steps as follows:

• If A = O (the all-zero matrix), then the conclusion of the theorem holds, with r = 0; no reduction is required. So assume that A ≠ O.

• If A11 ≠ 0, then skip this step. If A11 = 0, then there is a non-zero element Aij somewhere in A; by swapping the first and ith rows, and the first and jth columns, if necessary (Type 3 operations), we can bring this entry into the (1, 1) position.

• Now we can assume that A11 ≠ 0. Multiplying the first row by A11−1 (row operation Type 2), we obtain a matrix with A11 = 1.

• Now by row and column operations of Type 1, we can assume that all the other elements in the first row and column are zero. For if A1j ≠ 0, then subtracting A1j times the first column from the jth gives a matrix with A1j = 0. Repeat this until all non-zero elements have been removed.
Example 2.2 Here is a small example. Let A= 1 2 3 . 4 5 6
We have A11 = 1, so we can skip the first three steps. Subtracting twice the first column from the second, and three times the first column from the third, gives the matrix

[ 1  0  0 ]
[ 4 −3 −6 ].

Now subtracting four times the first row from the second gives

[ 1  0  0 ]
[ 0 −3 −6 ].

From now on, we have to operate on the smaller matrix [ −3 −6 ], but we continue to apply the operations to the large matrix. Multiply the second row by −1/3 to get

[ 1 0 0 ]
[ 0 1 2 ].

Now subtract twice the second column from the third to obtain

[ 1 0 0 ]
[ 0 1 0 ].
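The reduction procedure of Theorem 2.1 can be turned into a rank computation. The following Python sketch is my own (it follows the steps of the proof but is not code from the notes); it works over exact rationals and reproduces the answer of Example 2.2:

```python
from fractions import Fraction

def rank(A):
    """Rank via the reduction of Theorem 2.1: bring a nonzero entry
    to position (1,1), clear the first row and column, then recurse
    on the smaller matrix, counting one pivot per round."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    while M and M[0]:
        # find a nonzero entry and bring it to position (1,1) by swaps
        pivot = next(((i, j) for i, row in enumerate(M)
                      for j, x in enumerate(row) if x != 0), None)
        if pivot is None:
            break  # the remaining matrix is zero
        i, j = pivot
        M[0], M[i] = M[i], M[0]
        for row in M:
            row[0], row[j] = row[j], row[0]
        # scale so the (1,1) entry is 1, then clear first row and column
        M[0] = [x / M[0][0] for x in M[0]]
        M = [[row[k] - row[0] * M[0][k] for k in range(1, len(row))]
             for row in M[1:]]
        r += 1
    return r

print(rank([[1, 2, 3], [4, 5, 6]]))  # 2, as in Example 2.2
```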
We have finished the reduction, and we conclude that the rank of the original matrix A is equal to 2.

We finish this section by describing the elementary row and column operations in a different way. For each elementary row operation on an n-rowed matrix A, we define the corresponding elementary matrix by applying the same operation to the n × n identity matrix I. Similarly, we represent elementary column operations by elementary matrices obtained by applying the same operations to the m × m identity matrix. We don't have to distinguish between rows and columns for our elementary matrices. For example, the matrix

[ 1 2 0 ]
[ 0 1 0 ]
[ 0 0 1 ]
corresponds to the elementary column operation of adding twice the first column to the second, or to the elementary row operation of adding twice the second row to the first. For the other types, the matrices for row operations and column operations are identical.

Lemma 2.2 The effect of an elementary row operation on a matrix is the same as that of multiplying on the left by the corresponding elementary matrix. Similarly, the effect of an elementary column operation is the same as that of multiplying on the right by the corresponding elementary matrix.

The proof of this lemma is a somewhat tedious calculation.

Example 2.3 We continue our previous example. In order, here is the list of elementary matrices corresponding to the operations we applied to A. (Here 2 × 2 matrices are row operations while 3 × 3 matrices are column operations.)

[ 1 −2 0 ]   [ 1 0 −3 ]   [  1 0 ]   [ 1   0  ]   [ 1 0  0 ]
[ 0  1 0 ] , [ 0 1  0 ] , [ −4 1 ] , [ 0 −1/3 ] , [ 0 1 −2 ]
[ 0  0 1 ]   [ 0 0  1 ]                            [ 0 0  1 ]

So the whole process can be written as a matrix equation:

[ 1   0  ] [  1 0 ]     [ 1 −2 0 ] [ 1 0 −3 ] [ 1 0  0 ]
[ 0 −1/3 ] [ −4 1 ]  A  [ 0  1 0 ] [ 0 1  0 ] [ 0 1 −2 ]  =  B,
                        [ 0  0 1 ] [ 0 0  1 ] [ 0 0  1 ]

or more simply

[ 1    0   ]     [ 1 −2  1 ]
[ 4/3 −1/3 ]  A  [ 0  1 −2 ]  =  B,
                 [ 0  0  1 ]

where, as before,

A = [ 1 2 3 ]      B = [ 1 0 0 ]
    [ 4 5 6 ],         [ 0 1 0 ].

An important observation about the elementary operations is that each of them can have its effect undone by another elementary operation of the same kind, and hence every elementary matrix is invertible, with its inverse being another elementary matrix of the same kind. For example, the effect of adding twice the first column to the second is undone by subtracting twice the first column from the second, so that the inverse of

[ 1 2 ]
[ 0 1 ]

is

[ 1 −2 ]
[ 0  1 ].

Since the product of invertible matrices is invertible, we can state the above theorem in a more concise form. First, one more definition:
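The "more simply" equation can be verified numerically. In this Python check (my own; `matmul` is an invented helper), R and C stand for the products of the row and column elementary matrices above:

```python
from fractions import Fraction

def matmul(X, Y):
    # product of two matrices, row-by-column
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0, 0], [0, 1, 0]]

# combined row operations (applied on the left) and
# combined column operations (applied on the right)
R = [[Fraction(1), Fraction(0)], [Fraction(4, 3), Fraction(-1, 3)]]
C = [[1, -2, 1], [0, 1, -2], [0, 0, 1]]

assert matmul(matmul(R, A), C) == B
print("R * A * C = B verified")
```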
Definition 2.5 The m × n matrices A and B are said to be equivalent if B = PAQ, where P and Q are invertible matrices of sizes m × m and n × n respectively.

Theorem 2.3 Given any m × n matrix A, there exist invertible matrices P and Q of sizes m × m and n × n respectively, such that PAQ is in the canonical form for equivalence.

Remark The relation "equivalence" defined above is an equivalence relation on the set of all m × n matrices; that is, it is reflexive, symmetric and transitive. When mathematicians talk about a "canonical form" for an equivalence relation, they mean a set of objects which are representatives of the equivalence classes: that is, every object is equivalent to a unique object in the canonical form. We have shown this for the relation of equivalence defined earlier, except for the uniqueness of the canonical form. This is our job for the next section.

2.3 Rank

We have the unfinished business of showing that the rank of a matrix is well defined; that is, no matter how we do the row and column reduction, we end up with the same canonical form. We do this by defining two further kinds of rank, and proving that all three are the same.

Definition 2.6 Let A be an m × n matrix over a field K. We say that the column rank of A is the maximum number of linearly independent columns of A, while the row rank of A is the maximum number of linearly independent rows of A. (We regard columns or rows as vectors in Km and Kn respectively.)

Now we need a sequence of four lemmas.

Lemma 2.4
(a) Elementary column operations don't change the column rank of a matrix.
(b) Elementary row operations don't change the column rank of a matrix.
(c) Elementary column operations don't change the row rank of a matrix.
(d) Elementary row operations don't change the row rank of a matrix.

Proof (a) This is clear for Type 3 operations, which just rearrange the vectors. For Types 1 and 2, we have to show that such an operation cannot take a linearly independent set to a linearly dependent set; the vice versa statement holds because the inverse of an elementary operation is another operation of the same kind.
So suppose that v1, . . . , vn are linearly independent; consider a Type 1 operation involving adding c times the jth column to the ith. The new columns are v′1, . . . , v′n, where v′k = vk for k ≠ i, while v′i = vi + cvj. Suppose that the new vectors are linearly dependent. Then there are scalars a1, . . . , an, not all zero, such that

0 = a1v′1 + · · · + anv′n
  = a1v1 + · · · + ai(vi + cvj) + · · · + ajvj + · · · + anvn
  = a1v1 + · · · + aivi + · · · + (aj + cai)vj + · · · + anvn.

Since v1, . . . , vn are linearly independent, we conclude that

a1 = 0, . . . , ai = 0, . . . , aj + cai = 0, . . . , an = 0,

from which we see that all the ak are zero, contrary to assumption. So the new columns are linearly independent. The argument for Type 2 operations is similar but easier.

(b) It is easily checked that, if an elementary row operation is applied, then the new columns satisfy exactly the same linear relations as the old ones (that is, the same linear combinations are zero). So the linearly independent sets of vectors don't change at all.

(c) Same as (b), but applied to rows.

(d) Same as (a), but applied to rows.

Theorem 2.5 For any matrix A, the row rank, the column rank, and the rank are all equal. In particular, the rank is independent of the row and column operations used to compute it.

Proof Suppose that we reduce A to canonical form B by elementary operations, where B has rank r. These elementary operations don't change the row or column rank, by our lemma. But it is trivial to see that, if

B = [ Ir O ]
    [ O  O ],

then the row and column ranks of B are both equal to r. So the theorem is proved.

We can get an extra piece of information from our deliberations. Let A be an invertible n × n matrix. Then the canonical form of A is just In: its rank is equal to n. This means that there are matrices P and Q, each a product of elementary matrices, such that PAQ = In.
From this we deduce that

A = P^{-1} In Q^{-1} = P^{-1} Q^{-1};

in other words:

Corollary 2.6 Every invertible square matrix is a product of elementary matrices.

In fact, we learn a little bit more. We observed, when we defined elementary matrices, that they can represent either elementary column operations or elementary row operations. So, when we have written A as a product of elementary matrices, we can choose to regard them as representing column operations, and we see that A can be obtained from the identity by applying elementary column operations. If we now apply the inverse operations in the other order, they will turn A into the identity (which is its canonical form). In other words, the following is true:

Corollary 2.7 If A is an invertible n × n matrix, then A can be transformed into the identity matrix by elementary column operations alone (or by elementary row operations alone).

2.4 Determinants

The determinant is a function defined on square matrices; its value is a scalar. It has some very important properties: perhaps most important is the fact that a matrix is invertible if and only if its determinant is not equal to zero. We denote the determinant function by det, so that det(A) is the determinant of A. For a matrix written out as an array, the determinant is denoted by replacing the square brackets by vertical bars:

det [ 1 2 ]   | 1 2 |
    [ 3 4 ] = | 3 4 |.

You have met determinants in earlier courses, and you know the formula for the determinant of a 2 × 2 or 3 × 3 matrix:

| a b |
| c d | = ad - bc,

| a b c |
| d e f | = aei + bfg + cdh - afh - bdi - ceg.
| g h i |

Our first job is to define the determinant for square matrices of any size. We do this in an "axiomatic" manner:
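The two closed formulas above are easy to transcribe directly. A minimal sketch (numpy assumed available; the test matrix is a hypothetical example) checks them against numpy's own determinant:

```python
import numpy as np

def det2(a, b, c, d):
    # | a b ; c d | = ad - bc
    return a * d - b * c

def det3(a, b, c, d, e, f, g, h, i):
    # | a b c ; d e f ; g h i | = aei + bfg + cdh - afh - bdi - ceg
    return a*e*i + b*f*g + c*d*h - a*f*h - b*d*i - c*e*g

assert det2(1, 2, 3, 4) == -2

M = np.array([[2., 0., 1.],
              [3., 5., -1.],
              [4., 1., 6.]])
assert np.isclose(det3(*M.flatten()), np.linalg.det(M))
```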
Definition 2.7 A function D defined on n × n matrices is a determinant if it satisfies the following three conditions:

(D1) For 1 ≤ i ≤ n, D is a linear function of the ith column: this means that, if A and A' are two matrices which agree everywhere except the ith column, and if A'' is the matrix whose ith column is c times the ith column of A plus c' times the ith column of A', but agreeing with A and A' everywhere else, then D(A'') = c D(A) + c' D(A').

(D2) If A has two equal columns, then D(A) = 0.

(D3) D(In) = 1, where In is the n × n identity matrix.

We show the following result:

Theorem 2.8 There is a unique determinant function on n × n matrices, for any n.

Proof First, we show that applying elementary column operations to A has a well-defined effect on D(A).

(a) If B is obtained from A by adding c times the jth column to the ith, then D(B) = D(A).
(b) If B is obtained from A by multiplying the ith column by a non-zero scalar c, then D(B) = cD(A).
(c) If B is obtained from A by interchanging two columns, then D(B) = -D(A).

For (a), let A' be the matrix which agrees with A in all columns except the ith, which is equal to the jth column of A. By rule (D2), D(A') = 0. By rule (D1), D(B) = D(A) + cD(A') = D(A).

Part (b) follows immediately from rule (D1).

To prove part (c), we observe that we can interchange the ith and jth columns by the following sequence of operations:
• add the ith column to the jth;
• multiply the ith column by -1;
• add the jth column to the ith;
• subtract the ith column from the jth.
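Rules (a)–(c) can be observed numerically on any concrete matrix. The sketch below (numpy assumed available; the matrix is a hypothetical example) applies each kind of column operation and compares determinants:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])
d = np.linalg.det(A)

# (a) adding c times column j to column i leaves the determinant unchanged
B = A.copy()
B[:, 0] += 5 * B[:, 2]
assert np.isclose(np.linalg.det(B), d)

# (b) multiplying a column by c multiplies the determinant by c
B = A.copy()
B[:, 1] *= 7
assert np.isclose(np.linalg.det(B), 7 * d)

# (c) interchanging two columns changes the sign
B = A[:, [1, 0, 2]]
assert np.isclose(np.linalg.det(B), -d)
```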
In symbols,

(ci, cj) → (ci, cj + ci) → (-ci, cj + ci) → (cj, cj + ci) → (cj, ci).

The first, third and fourth steps don't change the value of D, while the second multiplies it by -1.

Now we take the matrix A and apply elementary column operations to it, keeping track of the factors by which D gets multiplied according to rules (a)–(c). The overall effect is to multiply D(A) by a certain non-zero scalar c, depending on the operations.

• If A is invertible, then we can reduce A to the identity, so that cD(A) = D(I) = 1, whence D(A) = c^{-1}.
• If A is not invertible, then its column rank is less than n. So the columns of A are linearly dependent, and one column can be written as a linear combination of the others. Applying axiom (D1), we see that D(A) is a linear combination of values D(A'), where A' are matrices with two equal columns; so D(A') = 0 for all such A', whence D(A) = 0.

This proves that the determinant function, if it exists, is unique. We show its existence in the next section, by giving a couple of formulae for it; this will finish the job left open here. Given the uniqueness of the determinant function, we now denote it by det(A) instead of D(A).

The proof of the theorem shows an important corollary:

Corollary 2.9 A square matrix A is invertible if and only if det(A) ≠ 0.

Proof See the case division at the end of the proof of the theorem.

One of the most important properties of the determinant is the following.

Theorem 2.10 If A and B are n × n matrices over K, then det(AB) = det(A) det(B).

Proof Suppose first that B is not invertible. Then det(B) = 0. Also, AB is not invertible. (For, suppose that (AB)^{-1} = X, so that XAB = I. Then XA is the inverse of B.) So det(AB) = 0, and the theorem is true.

In the other case, B is invertible, so we can apply a sequence of elementary column operations to B to get to the identity. The effect of these operations is to multiply the determinant by a non-zero factor c (depending on the operations), so that c det(B) = det(I) = 1, or c = (det(B))^{-1}. Now these operations are represented by elementary matrices, so we see that BQ = I, where Q is a product of elementary matrices. If we apply the same sequence of elementary operations to AB, we end up with the matrix (AB)Q = A(BQ) = AI = A. The determinant is multiplied by the same factor, so we find that c det(AB) = det(A). Since c = (det(B))^{-1}, this implies that det(AB) = det(A) det(B), as required.

Finally, we have defined determinants using columns, but we could have used rows instead:

Proposition 2.11 The determinant is the unique function D of n × n matrices which satisfies the conditions

(D1') for 1 ≤ i ≤ n, D is a linear function of the ith row;
(D2') if two rows of A are equal, then D(A) = 0;
(D3') D(In) = 1.

The proof of uniqueness is almost identical to that for columns.

Corollary 2.12 If A' denotes the transpose of A, then det(A') = det(A).

Proof If D denotes the "determinant" computed by row operations, then D(A') = det(A), since row operations on A' correspond to column operations on A; but by the proposition, D(A') = det(A'), so det(A') = det(A).

2.5 Calculating determinants

We now give a couple of formulae for the determinant, showing that a determinant function actually exists! The first formula involves some background notation.

Definition 2.8 A permutation of {1, ..., n} is a bijection from the set {1, ..., n} to itself. The symmetric group Sn consists of all permutations of the set {1, ..., n}. (There are n! such permutations.) For any permutation π ∈ Sn, there is a number sign(π) = ±1, computed as follows: write π as a product of disjoint cycles; if there are k cycles (including cycles of length 1), then sign(π) = (-1)^{n-k}. A transposition is a permutation which interchanges two symbols and leaves all the others fixed. Thus, if τ is a transposition, then sign(τ) = -1.
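Theorem 2.10 and Corollary 2.12 are easy to spot-check on random matrices. A minimal sketch (numpy assumed available; the matrices are randomly generated, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Theorem 2.10: det(AB) = det(A) det(B)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# Corollary 2.12: the transpose has the same determinant
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))
```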
The last fact holds because a transposition has one cycle of size 2 and n - 2 cycles of size 1, so n - 1 cycles altogether; thus sign(τ) = (-1)^{n-(n-1)} = -1.

We need one more fact about signs: if π is any permutation and τ is a transposition, then sign(πτ) = -sign(π), where πτ denotes the composition of π and τ (apply first τ, then π).

Definition 2.9 Let A be an n × n matrix over K. The determinant of A is defined by the formula

det(A) = Σ_{π ∈ Sn} sign(π) A_{1π(1)} A_{2π(2)} ··· A_{nπ(n)}.

Proof In order to show that this is a good definition, we need to verify that it satisfies our three rules (D1)–(D3).

(D1) According to the definition, det(A) is a sum of n! terms. Each term, apart from a sign, is the product of n elements, one from each row and column. If we look at a particular column, say the ith, it is clear that each product is a linear function of that column, so the same is true for the determinant.

(D2) Suppose that the ith and jth columns of A are equal. Let τ be the transposition which interchanges i and j and leaves the other symbols fixed. Then π(τ(i)) = π(j) and π(τ(j)) = π(i), whereas π(τ(k)) = π(k) for k ≠ i, j. Because the elements in the ith and jth columns of A are the same, we see that the products A_{1π(1)} A_{2π(2)} ··· A_{nπ(n)} and A_{1πτ(1)} A_{2πτ(2)} ··· A_{nπτ(n)} are equal. But sign(πτ) = -sign(π). So the corresponding terms in the formula for the determinant cancel one another. The elements of Sn can be divided up into n!/2 pairs of the form {π, πτ}, and each pair of terms in the formula cancel out. We conclude that det(A) = 0, so (D2) holds.

(D3) If A = In, then the only permutation π which contributes to the sum is the identity permutation ι: for any other permutation π satisfies π(i) ≠ i for some i, so that A_{iπ(i)} = 0. The sign of ι is +1, and all the terms A_{iι(i)} = A_{ii} are equal to 1, so det(A) = 1.

This gives us a nice mathematical formula for the determinant of a matrix. Unfortunately, it is a terrible formula in practice, since it involves working out n! terms, each a product of matrix entries, and adding them up with + and - signs. For n of moderate size, this will take a very long time! (For example, 10! = 3628800.)

Here is a second formula, which is also theoretically important but very inefficient in practice.
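The definition above can be transcribed directly. The sketch below (numpy assumed available) computes the sign by counting inversions, which gives the same parity as the cycle count used in the text, and then sums over all n! permutations — fine for tiny n, hopeless for large n, exactly as the text warns.

```python
from itertools import permutations
import numpy as np

def sign(perm):
    # parity via inversion count: equivalent to (-1)^(n - #cycles)
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def det_leibniz(A):
    # det(A) = sum over permutations pi of sign(pi) * prod_i A[i, pi(i)]
    n = len(A)
    return sum(sign(p) * np.prod([A[i][p[i]] for i in range(n)])
               for p in permutations(range(n)))

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 10.]])
assert np.isclose(det_leibniz(A), np.linalg.det(A))
```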
Definition 2.10 Let A be an n × n matrix. For 1 ≤ i, j ≤ n, we define the (i, j) minor of A to be the (n - 1) × (n - 1) matrix obtained by deleting the ith row and jth column of A. Now we define the (i, j) cofactor of A to be (-1)^{i+j} times the determinant of the (i, j) minor. (These signs have a chessboard pattern, starting with sign + in the top left corner.) We denote the (i, j) cofactor of A by Kij(A). Finally, the adjugate of A is the n × n matrix Adj(A) whose (i, j) entry is the (j, i) cofactor Kji(A) of A. (Note the transposition!)

Theorem 2.13
(a) For 1 ≤ j ≤ n, we have det(A) = Σ_{i=1}^{n} Aij Kij(A).
(b) For 1 ≤ i ≤ n, we have det(A) = Σ_{j=1}^{n} Aij Kij(A).

This theorem says that, if we take any column or row of A, multiply each element by the corresponding cofactor, and add the results, we get the determinant of A.

Example 2.4 Using a cofactor expansion along the first column, we see that

| 1 2 3  |       | 5 6  |       | 2 3  |       | 2 3 |
| 4 5 6  |  = 1 ·| 8 10 | - 4 · | 8 10 | + 7 · | 5 6 |
| 7 8 10 |

= (5 · 10 - 6 · 8) - 4(2 · 10 - 3 · 8) + 7(2 · 6 - 3 · 5)
= 2 + 16 - 21
= -3,

using the standard formula for a 2 × 2 determinant.

Proof We prove (a); the proof for (b) is a simple modification, using rows instead of columns. Let D(A) be the function defined by the right-hand side of (a) in the theorem, using the jth column of A. We verify rules (D1)–(D3).

(D1) It is clear that D(A) is a linear function of the jth column. For k ≠ j, the cofactors are linear functions of the kth column (since they are determinants), and so D(A) is linear.
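Theorem 2.13(a) translates into a short recursive function. The sketch below (numpy used only for the final comparison) always expands along the first column, so the cofactor sign (-1)^{i+j} reduces to (-1)^i:

```python
import numpy as np

def det_cofactor(A):
    # Expansion along the first column: det(A) = sum_i A[i][0] * K_{i,0}(A)
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0.0
    for i in range(n):
        # (i, 0) minor: delete row i and column 0
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** i * A[i][0] * det_cofactor(minor)
    return total

A = [[1., 2., 3.],
     [4., 5., 6.],
     [7., 8., 10.]]
assert np.isclose(det_cofactor(A), -3.0)   # matches Example 2.4
```

As the text notes below, this is no faster than the permutation formula: the recursion still does on the order of n! work.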
(D2) If the kth and lth columns of A are equal, where k, l ≠ j, then each cofactor is the determinant of a matrix with two equal columns, and so is zero. The harder case is when the jth column is equal to another, say the kth. Using induction, each cofactor can be expressed as a sum of elements of the kth column times (n - 2) × (n - 2) determinants. In the resulting sum, it is easy to see that each such determinant occurs twice, with opposite signs and multiplied by the same factor. So the terms all cancel.

(D3) Suppose that A = I. The only non-zero cofactor in the jth column is Kjj(I), which is equal to (-1)^{j+j} det(I_{n-1}) = 1. So D(I) = 1.

By the main theorem, the expression D(A) is equal to det(A).

At first sight, this looks like a simple formula for the determinant, since it is just the sum of n terms, rather than n! as in the first case. But each term is an (n - 1) × (n - 1) determinant. Working down the chain, we find that this method is just as labour-intensive as the other one.

But the cofactor expansion has further nice properties:

Theorem 2.14 For any n × n matrix A, we have A · Adj(A) = Adj(A) · A = det(A) · I.

Proof We calculate the matrix product. Recall that the (i, j) entry of Adj(A) is Kji(A). Now the (i, i) entry of the product A · Adj(A) is

Σ_{k=1}^{n} Aik (Adj(A))ki = Σ_{k=1}^{n} Aik Kik(A) = det(A),

by the cofactor expansion. On the other hand, if i ≠ j, then the (i, j) entry of the product is

Σ_{k=1}^{n} Aik (Adj(A))kj = Σ_{k=1}^{n} Aik Kjk(A).
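Theorem 2.14 is easy to verify numerically. A sketch (numpy assumed available; the matrix is the one from Example 2.4) builds the adjugate entry by entry, with the (j, i) entry of Adj(A) being the (i, j) cofactor:

```python
import numpy as np

def adjugate(A):
    n = A.shape[0]
    adj = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            # (j, i) entry of Adj(A) is the (i, j) cofactor -- note the transpose
            adj[j, i] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 10.]])
d = np.linalg.det(A)
assert np.allclose(A @ adjugate(A), d * np.eye(3))
assert np.allclose(adjugate(A) @ A, d * np.eye(3))
```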
Corollary 2.15 If the n × n matrix A is invertible, then its inverse is equal to (det(A))^{-1} Adj(A).
So how can you work out a determinant efficiently? The best method in practice is to use elementary operations. Apply elementary operations to the matrix, keeping track of the factor by which the determinant is multiplied by each operation. If you want, you can reduce all the way to the identity, and then use the fact that det(I) = 1. Often it is simpler to stop at an earlier stage when you can recognise what the determinant is. For example, if the matrix A has diagonal entries a1, ..., an, and all off-diagonal entries are zero, then det(A) is just the product a1 ··· an.

Example 2.5 Let

A = [ 1 2 3  ]
    [ 4 5 6  ]
    [ 7 8 10 ].

Subtracting twice the first column from the second, and three times the first column from the third (these operations don't change the determinant), gives

[ 1  0   0  ]
[ 4 -3  -6  ]
[ 7 -6 -11 ].

Now the cofactor expansion along the first row gives

det(A) = | -3  -6  |
         | -6  -11 | = 33 - 36 = -3.

(At the last step, it is easiest to use the formula for the determinant of a 2 × 2 matrix rather than do any further reduction.)
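The steps of Example 2.5 can be replayed directly (numpy assumed available):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 10.]])

B = A.copy()
B[:, 1] -= 2 * B[:, 0]   # subtract twice col 1 from col 2: det unchanged
B[:, 2] -= 3 * B[:, 0]   # subtract 3 times col 1 from col 3: det unchanged
assert np.allclose(B, [[1, 0, 0], [4, -3, -6], [7, -6, -11]])

# Cofactor expansion along the first row reduces to one 2x2 determinant:
det = B[1, 1] * B[2, 2] - B[1, 2] * B[2, 1]   # (-3)(-11) - (-6)(-6) = -3
assert np.isclose(det, np.linalg.det(A))
```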
2.6 The Cayley–Hamilton Theorem
Since we can add and multiply matrices, we can substitute them into a polynomial. For example, if

A = [ 0  1 ]
    [ -2 3 ],

then the result of substituting A into the polynomial x² - 3x + 2 is

A² - 3A + 2I = [ -2 3 ] + [ 0 -3 ] + [ 2 0 ] = [ 0 0 ]
               [ -6 7 ]   [ 6 -9 ]   [ 0 2 ]   [ 0 0 ].
We say that the matrix A satisfies the equation x² - 3x + 2 = 0. (Notice that for the constant term 2 we substituted 2I.) It turns out that, for every n × n matrix A, we can calculate a polynomial equation of degree n satisfied by A.

Definition 2.11 Let A be an n × n matrix. The characteristic polynomial of A is the polynomial cA(x) = det(xI - A). This is a polynomial in x of degree n.

For example, if

A = [ 0  1 ]
    [ -2 3 ],

then

cA(x) = | x  -1    |
        | 2  x - 3 | = x(x - 3) + 2 = x² - 3x + 2.

Indeed, it turns out that this is the polynomial we want in general:

Theorem 2.16 (Cayley–Hamilton Theorem) Let A be an n × n matrix with characteristic polynomial cA(x). Then cA(A) = O.

Example 2.6 Let us just check the theorem for 2 × 2 matrices. If

A = [ a b ]
    [ c d ],

then

cA(x) = | x - a  -b    |
        | -c     x - d | = x² - (a + d)x + (ad - bc),

and so

cA(A) = [ a² + bc  ab + bd ] - (a + d) [ a b ] + (ad - bc) [ 1 0 ] = O,
        [ ac + cd  bc + d² ]           [ c d ]             [ 0 1 ]

after a small amount of calculation.

Proof We use the theorem A · Adj(A) = det(A) · I. In place of A, we put the matrix xI - A into this formula:

(xI - A) Adj(xI - A) = det(xI - A) I = cA(x) I.
Now it is very tempting just to substitute x = A into this formula: on the right we have cA(A)I = cA(A), while on the left there is a factor AI - A = O, so it seems that cA(A) = O. Unfortunately this is not valid: Adj(xI - A) is a matrix whose entries are polynomials, and if we try to substitute the matrix A for x, the size of the matrix would be changed! Instead, we argue as follows; it is important to see why the argument works.

The entries in Adj(xI - A) are (n - 1) × (n - 1) determinants of matrices with entries involving x, so they are polynomials in x, and the highest power of x that can arise is x^{n-1}. So we can write Adj(xI - A) as a polynomial whose coefficients are matrices, that is, as a sum of powers of x times matrices:

Adj(xI - A) = x^{n-1} B_{n-1} + x^{n-2} B_{n-2} + ··· + x B_1 + B_0,

for suitable n × n matrices B_0, ..., B_{n-1}. For example,

[ x² + 1  3x - 4 ]      [ 1 0 ]     [ 0 3 ]   [ 1 -4 ]
[ 2x      x + 2  ] = x² [ 0 0 ] + x [ 2 1 ] + [ 0  2 ].

Hence

cA(x) I = (xI - A) Adj(xI - A)
        = (xI - A)(x^{n-1} B_{n-1} + x^{n-2} B_{n-2} + ··· + x B_1 + B_0)
        = x^n B_{n-1} + x^{n-1}(-A B_{n-1} + B_{n-2}) + ··· + x(-A B_1 + B_0) - A B_0.

So, if we let cA(x) = x^n + c_{n-1} x^{n-1} + ··· + c_1 x + c_0, then we read off that

B_{n-1} = I,
-A B_{n-1} + B_{n-2} = c_{n-1} I,
  ···
-A B_1 + B_0 = c_1 I,
-A B_0 = c_0 I.

We take this system of equations, and multiply the first by A^n, the second by A^{n-1}, ..., and the last by A^0 = I. What happens? On the left, all the terms cancel in pairs:

A^n B_{n-1} + A^{n-1}(-A B_{n-1} + B_{n-2}) + ··· + A(-A B_1 + B_0) + I(-A B_0) = O.

On the right, we have

A^n + c_{n-1} A^{n-1} + ··· + c_1 A + c_0 I = cA(A).

So cA(A) = O, as claimed.
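The theorem is easy to check numerically for the running example. A sketch (numpy assumed available; `np.poly` returns the coefficients of the characteristic polynomial of a square matrix):

```python
import numpy as np

A = np.array([[0., 1.],
              [-2., 3.]])

# characteristic polynomial c_A(x) = det(xI - A) = x^2 - 3x + 2
coeffs = np.poly(A)
assert np.allclose(coeffs, [1, -3, 2])

# Cayley-Hamilton: substituting A (with c_0 replaced by c_0 * I) gives O
cA = A @ A - 3 * A + 2 * np.eye(2)
assert np.allclose(cA, np.zeros((2, 2)))
```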
Chapter 3

Linear maps between vector spaces

We return to the setting of vector spaces in order to define linear maps between them. We will see that these maps can be represented by matrices, decide when two matrices represent the same linear map, and give another proof of the canonical form for equivalence.

3.1 Definition and basic properties

Definition 3.1 Let V and W be vector spaces over a field K. A function α from V to W is a linear map if it preserves addition and scalar multiplication, that is, if
• α(v1 + v2) = α(v1) + α(v2) for all v1, v2 ∈ V;
• α(cv) = cα(v) for all v ∈ V and c ∈ K.

Remarks 1. We can combine the two conditions into one as follows: α(c1 v1 + c2 v2) = c1 α(v1) + c2 α(v2).
2. In other literature the term "linear transformation" is often used instead of "linear map".

Definition 3.2 Let α : V → W be a linear map. The image of α is the set

Im(α) = {w ∈ W : w = α(v) for some v ∈ V},

and the kernel of α is

Ker(α) = {v ∈ V : α(v) = 0}.
Proposition 3.1 Let α : V → W be a linear map. Then the image of α is a subspace of W and the kernel is a subspace of V.

Proof We have to show that each is closed under addition and scalar multiplication. For the image, if w1 = α(v1) and w2 = α(v2), then w1 + w2 = α(v1) + α(v2) = α(v1 + v2), and if w = α(v) then cw = cα(v) = α(cv). For the kernel, if α(v1) = α(v2) = 0 then α(v1 + v2) = α(v1) + α(v2) = 0 + 0 = 0, and if α(v) = 0 then α(cv) = cα(v) = c0 = 0.

Definition 3.3 We define the rank of α to be ρ(α) = dim(Im(α)) and the nullity of α to be ν(α) = dim(Ker(α)). (We use the Greek letters 'rho' and 'nu' here to avoid confusing the rank of a linear map with the rank of a matrix, though they will turn out to be closely related!)

Theorem 3.2 (Rank–Nullity Theorem) Let α : V → W be a linear map. Then ρ(α) + ν(α) = dim(V).

Proof Choose a basis u1, ..., uq for Ker(α), where q = dim(Ker(α)) = ν(α). The vectors u1, ..., uq are linearly independent vectors of V, so we can add further vectors to get a basis for V, say u1, ..., uq, v1, ..., vs, where q + s = dim(V). We claim that the vectors α(v1), ..., α(vs) form a basis for Im(α). We have to show that they are linearly independent and spanning.

Linearly independent: Suppose that c1 α(v1) + ··· + cs α(vs) = 0. Then α(c1 v1 + ··· + cs vs) = 0, so that c1 v1 + ··· + cs vs ∈ Ker(α). But then this vector can be expressed in terms of the basis for Ker(α):

c1 v1 + ··· + cs vs = a1 u1 + ··· + aq uq,

whence

-a1 u1 - ··· - aq uq + c1 v1 + ··· + cs vs = 0.

But the us and vs form a basis for V, so they are linearly independent. So this equation implies that all the as and cs are zero. The fact that c1 = ··· = cs = 0 shows that the vectors α(v1), ..., α(vs) are linearly independent.

Spanning: Take any vector in Im(α), say w. Then w = α(v) for some v ∈ V. Write v in terms of the basis for V:

v = a1 u1 + ··· + aq uq + c1 v1 + ··· + cs vs

for some a1, ..., aq, c1, ..., cs. Applying α, we get

w = α(v) = a1 α(u1) + ··· + aq α(uq) + c1 α(v1) + ··· + cs α(vs) = c1 w1 + ··· + cs ws,

since α(ui) = 0 (as ui ∈ Ker(α)) and α(vi) = wi. So the vectors w1, ..., ws span Im(α).

Thus, ρ(α) = dim(Im(α)) = s. Since ν(α) = q and q + s = dim(V), the theorem is proved.

3.2 Representation by matrices

We come now to the second role of matrices in linear algebra: they represent linear maps between vector spaces. Let α : V → W be a linear map, where dim(V) = m and dim(W) = n. As we saw in the first section, we can take V and W in their coordinate representation: V = K^m and W = K^n (the elements of these vector spaces being represented as column vectors). Let e1, ..., em be the standard basis for V (so that ei is the vector with ith coordinate 1 and all other coordinates zero), and f1, ..., fn the standard basis for W. Then for i = 1, ..., m, the vector α(ei) belongs to W, so we can write it as a linear combination of f1, ..., fn.

Definition 3.4 The matrix representing the linear map α : V → W relative to the bases B = (e1, ..., em) for V and C = (f1, ..., fn) for W is the n × m matrix whose (i, j) entry is aij, where

α(ei) = Σ_{j=1}^{n} aji fj

for i = 1, ..., m.

In practice this means the following. Take α(ei) and write it as a column vector [ a1i a2i ··· ani ]'. This vector is the ith column of the matrix representing α. Thus, for example, if m = 3, n = 2, and α(e1) = f1 + f2, α(e2) = 2f1 + 5f2, α(e3) = 3f1 - f2,
then the vectors α(ei) as column vectors are

α(e1) = [ 1 ],  α(e2) = [ 2 ],  α(e3) = [ 3  ],
        [ 1 ]           [ 5 ]           [ -1 ]

and so the matrix representing α is

[ 1 2  3 ]
[ 1 5 -1 ].

Now the most important thing about this representation is that the action of α is now easily described:

Proposition 3.3 Let α : V → W be a linear map. Choose bases for V and W and let A be the matrix representing α. Then, if we represent vectors of V and W as column vectors relative to these bases, we have α(v) = Av.

Proof Let e1, ..., em be the basis for V, and f1, ..., fn for W. Take v = Σ_{i=1}^{m} ci ei ∈ V, so that in coordinates v = [ c1 ··· cm ]'. Then

α(v) = Σ_{i=1}^{m} ci α(ei) = Σ_{i=1}^{m} Σ_{j=1}^{n} ci aji fj,

so the jth coordinate of α(v) is Σ_{i} aji ci, which is precisely the jth coordinate in the matrix product Av.

In our example, if v = 2e1 + 3e2 + 4e3 = [ 2 3 4 ]', then

α(v) = Av = [ 1 2  3 ] [ 2 ]   [ 20 ]
            [ 1 5 -1 ] [ 3 ] = [ 13 ].
                       [ 4 ]

Addition and multiplication of linear maps correspond to addition and multiplication of the matrices representing them.

Definition 3.5 Let α and β be linear maps from V to W. Define their sum α + β by the rule (α + β)(v) = α(v) + β(v) for all v ∈ V. It is easy to check that α + β is a linear map.
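Both Proposition 3.3 and the Rank–Nullity Theorem of §3.1 can be checked on the example map above. A minimal sketch (numpy assumed available; the nullity is computed by counting near-zero singular values, an implementation detail not in the text):

```python
import numpy as np

# Matrix representing alpha : K^3 -> K^2 from the text; columns are alpha(e_i).
A = np.array([[1., 2., 3.],
              [1., 5., -1.]])

# Proposition 3.3: alpha(v) is the matrix product Av
v = np.array([2., 3., 4.])               # v = 2e1 + 3e2 + 4e3
assert np.allclose(A @ v, [20., 13.])

# Theorem 3.2: rho(alpha) + nu(alpha) = dim(V) = 3
rank = np.linalg.matrix_rank(A)          # dim Im(alpha)
s = np.linalg.svd(A, compute_uv=False)
nullity = A.shape[1] - np.sum(s > 1e-10)  # dim Ker(alpha)
assert rank + nullity == A.shape[1]
```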
3. Then we have [v]B = P[v]B . then β ”.6 Let U. Note that the order is important: we take a vector u ∈ U. The product β α is the function U → W deﬁned by the rule (β α)(u) = β (α(u)) for all u ∈ U. that is. Deﬁnition 3.3. CHANGE OF BASIS 37 Proposition 3. Again it is easily checked that β α is a linear map. 3.5 hold. Remember the notion of transition matrix from Chapter 1. . Remark Let l = dim(U). and let α : U → V and β : V → W be linear maps. Again the proof is tedious but not difﬁcult. .5 If α : U → V and β : V → W are linear maps represented by matrices A and B respectively. . The proof of this is not difﬁcult: just use the deﬁnitions.B is the matrix whose jth column is the coordinate representation of v j in the basis B. then β α is represented by the matrix BA. then α + β is represented by the matrix A + B. apply α to it to get a vector in V . . Proposition 3. . so the product BA is deﬁned. m = dim(V ) and n = dim(W ). then A is m × l. and then apply β to get a vector in W . . there is an inverse map) if and only if it is represented by an invertible matrix. If B = (v1 .3 Change of basis The matrix representing a linear map depends on the choice of bases we used to represent it.V. Of course it follows that a linear map is invertible (as a map.4 If α and β are linear maps represented by matrices A and B respectively. which is the right size for a matrix representing a map from an ldimensional to an ndimensional space. . and is n × l.W be vector spaces over K. The deﬁnition of multiplication of linear maps is the natural one (composition). . the transition matrix PB. and B is n × m. vm ) and B = (v1 . and we could then say: what deﬁnition of matrix multiplication should we choose to make the Proposition valid? We would ﬁnd that the usual deﬁnition was forced upon us. Now we have to discuss what happens if we change the basis. . So β α means “apply α. vm ) are two bases for a vector space V . 
The signiﬁcance of all this is that the strange rule for multiplying matrices is chosen so as to make Proposition 3.
where [v]B is the coordinate representation of an arbitrary vector v in the basis B, and similarly for B'. The inverse of P_{B,B'} is P_{B',B}. This is rather technical; you need it for explicit calculations, but for theoretical purposes the importance is the following corollary.

Let α be a linear map from V to W, represented by a matrix A using the bases B and C, and by a matrix A' using the bases B' and C'. What is the relation between A and A'? We just do it and see. Let P = P_{B,B'}, with (i, j) entry pij, and let Q = Q_{C,C'}, with inverse R = Q_{C',C}, with (i, j) entry rij. We have

v'_j = Σ_{i=1}^{m} pij vi,

so

α(v'_j) = Σ_{i=1}^{m} pij α(vi)
        = Σ_{i=1}^{m} Σ_{k=1}^{n} pij Aki wk
        = Σ_{i=1}^{m} Σ_{k=1}^{n} Σ_{l=1}^{n} pij Aki rlk w'_l.

This means, according to the rules of matrix multiplication, that

(A')_{lj} = Σ_{k=1}^{n} Σ_{i=1}^{m} rlk Aki pij,

so, on turning things around,

A' = RAP = Q^{-1} A P.

Proposition 3.6 Let α : V → W be a linear map represented by matrix A relative to the bases B for V and C for W, and by the matrix A' relative to the bases B' for V and C' for W. If P = P_{B,B'} and Q = P_{C,C'} are the transition matrices from the unprimed to the primed bases, then A' = Q^{-1} A P.

Recall that two matrices A and B are equivalent if B is obtained from A by multiplying on the left and right by invertible matrices. (It makes no difference that we said B = PAQ before and B = Q^{-1} A P here.)
where B is the “standard basis”. . . . . . . . . xt for W . .3. and so can be extended to a basis α(v1 ). and form a basis B for Kn . . . . . . .4. and • any invertible matrix can be regarded as a transition matrix: for. . . . . First. . . . and extend to a basis u1 . Ir O . . . if the n × n matrix P is invertible. x1 . .8 Let α : V → W be a linear map of rank r = ρ(α). in block form. . . 3. . otherwise. . .7 Two matrices represent the same linear map with respect to different bases if and only if they are equivalent. This holds because • transition matrices are always invertible (the inverse of PB. and then P = PB. wr = α(vr ).B for the transition in the other direction). . v1 . vr+s = ws for V. so the matrix of α relative to these bases is Ir O O O as claimed. we make the following observation. α(vr ) is a basis for Im(α).B is the matrix PB . CANONICAL FORM REVISITED 39 Proposition 3.3 about canonical form for equivalence. . w1 = α(v1 ). . . Now we will use the bases v1 . us . Then α(v1 ).2.B . then its rank is n. . . . Then there are bases for V and W such that the matrix representing α is. so its columns are linearly independent. Theorem 3. .4 Canonical form revisited Now we can give a simpler proof of Theorem 2. We have α(vi ) = wi 0 if 1 ≤ i ≤ r. α(vr ). . us for Ker(α). vr+1 = u1 . . O O Proof As in the proof of Theorem 3. choose a basis u1 . wr+s = xs for W. wr+1 = x1 . . vr for V . vr .
7. LINEAR MAPS BETWEEN VECTOR SPACES We recognise the matrix in the theorem as the canonical form for equivalence. n} + 1. So how many equivalence classes of m × n matrices are there. O O We also see. Combining Theorem 3. the dimension of its image) is equal to the rank of any matrix which represents it. we see: Theorem 3. for given m and n? The rank of such a matrix can take any value from 0 up to the minimum of m and n. by the way. . that the rank of a linear map (that is. So all our deﬁnitions of rank agree! The conclusion is that two matrices are equivalent if and only if they have the same rank.9 A matrix of rank r is equivalent to the matrix Ir O .40 CHAPTER 3. so the number of equivalence classes is min{m.8 with Proposition 3.
Now v = w + (v − w) is the sum of a vector in Im(π) and one in Ker(π).1 Projections and direct sums We begin by looking at a particular type of linear map whose importance will be clear later on. there is only one.1 If π : V → V is a projection. Proof We have two things to do: Im(π) + Ker(π) = V : Take any vector v ∈ V .Chapter 4 Linear maps on a vector space In this chapter we consider a linear map α from a vector space V to itself. We claim that v − w ∈ Ker(π). as required (the ﬁrst equality holding because v ∈ Ker(π)). This holds because π(v − w) = π(v) − π(w) = π(v) − π(π(v)) = π(v) − π 2 (v) = 0. as in the last chapter. since π 2 = π. 41 . π 2 is deﬁned by π 2 (v) = π(π(v))). Then v = π(w) for some vector w. Im(π)∩Ker(π) = {0}: Take v ∈ Im(π)∩Ker(π). as usual. If dim(V ) = n then. Proposition 4. and let w = π(v) ∈ Im(π).1 The linear map π : V → V is a projection if π 2 = π (where. This makes the theory much more interesting! 4. Deﬁnition 4. then V = Im(π) ⊕ Ker(π). we can represent α by an n × n matrix relative to any basis for V . However. and 0 = π(v) = π(π(w)) = π 2 (w) = π(w) = v. this time we have less freedom: instead of having two bases to choose.
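Proposition 4.1 is easy to check on a concrete projection. The sketch below (numpy assumed available) uses a hypothetical projection of the plane onto the x-axis along the line y = x, and verifies π² = π and the decomposition v = π(v) + (v - π(v)) into Im(π) + Ker(π):

```python
import numpy as np

# A hypothetical projection on R^2: onto the x-axis, along the line y = x.
P = np.array([[1., -1.],
              [0.,  0.]])

assert np.allclose(P @ P, P)               # pi^2 = pi, so P is a projection

v = np.array([5., 2.])
w = P @ v                                   # w = pi(v) lies in Im(pi)
assert np.allclose(P @ (v - w), 0)          # v - w lies in Ker(pi)
# so v = w + (v - w) realises V = Im(pi) + Ker(pi)

# The complementary map I - P is also a projection (see below)
Q = np.eye(2) - P
assert np.allclose(Q @ Q, Q)
assert np.allclose(P @ Q, np.zeros((2, 2)))
```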
It goes the other way too: if V = U ⊕ W, then there is a projection π : V → V with Im(π) = U and Ker(π) = W. For every vector v ∈ V can be uniquely written as v = u + w, where u ∈ U and w ∈ W; we define π by the rule that π(v) = u. Now the assertions are clear.

The diagram in Figure 4.1 shows geometrically what a projection is. It moves any vector v in a direction parallel to Ker(π) to a vector lying in Im(π).

[Figure 4.1: A projection — v is moved parallel to Ker(π) onto the vector π(v) in Im(π).]

Finally, notice that if π is a projection and π' = I - π (where I is the identity map, satisfying I(v) = v for all vectors v), then π' is also a projection, since

(π')² = (I - π)² = I - 2π + π² = I - 2π + π = I - π = π'.

Also, ππ' = π(I - π) = π - π² = O, and π + π' = I. Moreover, we see that Ker(π) = Im(π'), so V = Im(π) ⊕ Im(π').

We can extend this to direct sums with more than two terms. In this form the result extends:

Proposition 4.2 Suppose that π1, π2, ..., πr are projections on V satisfying

(a) π1 + π2 + ··· + πr = I, where I is the identity transformation;
(b) πi πj = O for i ≠ j.

Then V = U1 ⊕ U2 ⊕ ··· ⊕ Ur, where Ui = Im(πi).

Proof We have to show that any vector v can be uniquely written in the form v = u1 + u2 + ··· + ur, where ui ∈ Ui for i = 1, ..., r. We have

v = I(v) = π1(v) + π2(v) + ··· + πr(v) = u1 + u2 + ··· + ur,
sums and products (and hence arbitrary polynomials) of linear maps are represented by sums and products of the representing matrices: that is. for j = i.2. . . What happens if we change the basis? This also follows from the formula we worked out in the last chapter. On the other hand. for any polynomial f (x). The point of this is that projections give us another way to recognise and describe direct sums. then we can ﬁnd projections πi . . then we deﬁne πi (v) = ui . 4. . we have πi (u j ) = πi π j (v j ) = 0. Also. there is only one basis to change. πr satisfying the conditions of the above Proposition. r. if V = U1 ⊕U2 ⊕· · ·⊕Ur . and α is represented by a matrix A. . n Then just as in the last section. Conversely. say. . the action of α on V is represented by the action of A on Kn : α(v) is represented by the product Av. since πi π j = O. .2 Linear maps and matrices Let α : V → V be a linear map.4. . since all terms in the sum except the ith are zero. . π2 . . So the only possible expression is given by ui = πi (v). However. . So any vector can be written in this form. So applying πi to the expression for v. vn for V . . . r. Since ui ∈ Ui = Im(πi ). LINEAR MAPS AND MATRICES 43 where ui = πi (v) ∈ Im(πi ) for i = 1. . . we have ui = π(vi ) for some vi . and the proof is complete. then V can be written in coordinates as Kn . . For any vector v ∈ V has a unique expression as v = u1 + u2 + · · · + ur with ui ∈ Ui for i = 1. . If we choose a basis v1 . . . with ui ∈ Ui for i = 1. we obtain πi (v) = πi (u1 ) + πi (u2 ) + · · · + πi (ur ) = πi (ui ) = ui . as in the last chapter. Now suppose that we have any expression v = u1 + u2 + · · · + ur . . . the map f (α) is represented by the matrix f (A). r. then πi (ui ) = πi2 (vi ) = πi (vi ) = ui . where α(vi ) = j=1 ∑ a jiv j .
Proposition 4.3 Let α be a linear map on V which is represented by the matrix A relative to a basis B, and by the matrix A′ relative to a basis B′. Let P = P_{B,B′} be the transition matrix between the two bases. Then A′ = P^{-1} A P.

Proof This is just the change-of-basis formula we worked out in the last chapter: since the same basis is used for the domain and the codomain, P and Q are the same here.

Definition 4.2 Two n × n matrices A and B are said to be similar if B = P^{-1} A P for some invertible matrix P.

Thus similarity is an equivalence relation, and two matrices are similar if and only if they represent the same linear map with respect to different bases.

There is no simple canonical form for similarity like the one for equivalence that we met earlier. For the rest of this section we look at a special class of matrices or linear maps, the "diagonalisable" ones, where we do have a nice simple representative of the similarity class. In the final section we give without proof a general result for the complex numbers.

4.3 Eigenvalues and eigenvectors

Definition 4.3 Let α be a linear map on V. A vector v ∈ V is said to be an eigenvector of α, with eigenvalue λ ∈ K, if v ≠ 0 and α(v) = λv. The set {v : α(v) = λv}, consisting of the zero vector and the eigenvectors with eigenvalue λ, is called the λ-eigenspace of α.

Note that we require that v ≠ 0; otherwise the zero vector would be an eigenvector for any value of λ. With this requirement, each eigenvector has a unique eigenvalue: for if α(v) = λv = µv, then (λ − µ)v = 0, and so (since v ≠ 0) we have λ = µ.

The name eigenvalue is a mixture of German and English; it means "characteristic value" or "proper value" (here "proper" is used in the sense of "property"). Another term used in older books is "latent root". Here "latent" means "hidden": the idea is that the eigenvalue is somehow hidden in a matrix representing α, and we have to extract it by some procedure. We'll see how to do this soon.
Example Let

A = [−6 6; −12 11].

The vector v = [3; 4] satisfies Av = 2v, so is an eigenvector with eigenvalue 2. Similarly, the vector w = [2; 3] is an eigenvector with eigenvalue 3.

In the next-but-one section, we will see how to find the eigenvalues, and the fact that there cannot be more than n of them for an n × n matrix. If we knew, for example, that 2 is an eigenvalue of A, then we could find a corresponding eigenvector [x; y] by solving the linear equations

[−6 6; −12 11][x; y] = 2[x; y].

4.4 Diagonalisability

Some linear maps have a particularly simple representation by matrices.

Definition 4.4 The linear map α on V is diagonalisable if there is a basis of V relative to which the matrix representing α is a diagonal matrix.

Suppose that v1, ..., vn is such a basis showing that α is diagonalisable, so that the matrix A representing α is diagonal. Then α(vi) = a_{ii} vi for i = 1, ..., n, where a_{ii} is the ith diagonal entry of the diagonal matrix A. Thus, the basis vectors are eigenvectors. Conversely, if we have a basis of eigenvectors, then the matrix representing α is diagonal. So:

Proposition 4.4 The linear map α on V is diagonalisable if and only if there is a basis of V consisting of eigenvectors of α.
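The worked example can be checked by hand or by machine. Below is a minimal sketch (the helper function is my own, not from the notes) verifying that (3, 4) and (2, 3) really are eigenvectors of A for the eigenvalues 2 and 3, and that (3, 4) solves the linear system (A − 2I)v = 0 described above.

```python
# Checking the eigenvector example numerically (a sketch): v = (3, 4)
# satisfies Av = 2v, i.e. it solves (A - 2I)v = 0, and w = (2, 3)
# satisfies Aw = 3w.

A = [[-6, 6], [-12, 11]]

def apply(M, v):
    # matrix-vector product Mv
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

v = [3, 4]
assert apply(A, v) == [2 * x for x in v]     # eigenvalue 2
# the same vector solves (A - 2I)v = 0:
assert [(A[i][i] - 2) * v[i] + A[i][1 - i] * v[1 - i] for i in range(2)] == [0, 0]

w = [2, 3]
assert apply(A, w) == [3 * x for x in w]     # eigenvalue 3
print("eigenvector checks pass")
```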
Example The matrix [1 2; 0 1] is not diagonalisable. It is easy to see that its only eigenvalue is 1, and the only eigenvectors are scalar multiples of [1; 0]. So we cannot find a basis of eigenvectors.

Theorem 4.5 Let α : V → V be a linear map. Then the following are equivalent:

(a) α is diagonalisable;

(b) V is the direct sum of eigenspaces of α;

(c) α = λ1 π1 + ··· + λr πr, where λ1, ..., λr are the distinct eigenvalues of α, and π1, ..., πr are projections satisfying π1 + ··· + πr = I and πi πj = O for i ≠ j; furthermore, Im(πi) is the λi-eigenspace.

Proof Let λ1, ..., λr be the distinct eigenvalues of α, and let v_{i1}, ..., v_{im_i} be a basis for the λi-eigenspace of α. Then α is diagonalisable if and only if the union of these bases is a basis for V (by Proposition 4.4, since such a union is precisely a basis of eigenvectors); and the union is a basis for V exactly when V is the direct sum of the eigenspaces. So (a) and (b) are equivalent.

Now suppose that (b) holds. Then Proposition 4.2 and its converse show that there are projections π1, ..., πr satisfying the conditions of (c), where Im(πi) is the λi-eigenspace. For any vector v ∈ V has a unique expression as v = u1 + u2 + ··· + ur with ui ∈ Ui for i = 1, ..., r, where Ui is the λi-eigenspace. Now in this case it is easily checked that α and Σ λi πi agree on every vector in V, so they are equal. So (b) implies (c).

Finally, if α = Σ λi πi, where the πi satisfy the conditions of (c), then V is the direct sum of the spaces Im(πi), and Im(πi) is the λi-eigenspace. So (c) implies (b), and we are done.

Example Our matrix A = [−6 6; −12 11] is diagonalisable, since the eigenvectors [3; 4] and [2; 3] are linearly independent, and so form a basis for R^2. Indeed, we see that

[−6 6; −12 11][3 2; 4 3] = [3 2; 4 3][2 0; 0 3],

so that P^{-1} A P is diagonal, where P = [3 2; 4 3] is the matrix whose columns are the eigenvectors of A. Furthermore, one can find two projection matrices whose column spaces are the eigenspaces, namely

P1 = [9 −6; 12 −8],  P2 = [−8 6; −12 9].
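The matrix identity in the example can be verified directly. The sketch below (with an ad hoc multiplication helper) checks AP = PD column by column, and that conjugating by P diagonalises A; since det P = 1, the inverse of P is an integer matrix, which keeps the arithmetic exact.

```python
# A numeric companion to the example: P has the eigenvectors as columns,
# and P^{-1} A P is the diagonal matrix of eigenvalues.

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A    = [[-6, 6], [-12, 11]]
P    = [[3, 2], [4, 3]]
Pinv = [[3, -2], [-4, 3]]        # det P = 1, so the inverse is integral
D    = [[2, 0], [0, 3]]

assert mul(A, P) == mul(P, D)            # AP = PD, column by column
assert mul(Pinv, mul(A, P)) == D         # P^{-1} A P = D
print("P^{-1} A P = D")
```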
Check directly that P1^2 = P1, P2^2 = P2, P1 P2 = P2 P1 = O, P1 + P2 = I, and 2 P1 + 3 P2 = A.

This expression for a diagonalisable matrix A in terms of projections is useful in calculating powers of A, or polynomials in A.

Proposition 4.6 Let

A = Σ_{i=1}^r λi Pi

be the expression for the diagonalisable matrix A in terms of projections Pi satisfying the conditions of Theorem 4.5, that is,

Σ_{i=1}^r Pi = I and Pi Pj = O for i ≠ j.

Then

(a) for any positive integer m, we have A^m = Σ_{i=1}^r λi^m Pi;

(b) for any polynomial f(x), we have f(A) = Σ_{i=1}^r f(λi) Pi.

Proof (a) The proof is by induction on m, the case m = 1 being the given expression. Suppose that the result holds for m = k − 1. Then

A^k = A^{k−1} A = (Σ λi^{k−1} Pi)(Σ λi Pi).

When we multiply out this product, all the terms Pi Pj are zero for i ≠ j, and we obtain simply Σ λi^{k−1} λi Pi = Σ λi^k Pi, as required. So the induction goes through.

(b) If f(x) = Σ am x^m, we obtain the result by multiplying the equation of part (a) by am and summing over m. (Note that, since A^0 = I = Σ Pi = Σ λi^0 Pi, part (a) holds also for m = 0.)
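The "check directly" suggestion, and Proposition 4.6(a), can both be carried out numerically for the running example. A sketch:

```python
# Checking the projection identities and A^m = 2^m P1 + 3^m P2 for the
# running example A = 2 P1 + 3 P2.

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A  = [[-6, 6], [-12, 11]]
P1 = [[9, -6], [12, -8]]
P2 = [[-8, 6], [-12, 9]]

assert mul(P1, P1) == P1 and mul(P2, P2) == P2       # projections
assert mul(P1, P2) == [[0, 0], [0, 0]]               # P1 P2 = O
assert [[P1[i][j] + P2[i][j] for j in range(2)]
        for i in range(2)] == [[1, 0], [0, 1]]        # P1 + P2 = I

power = [[1, 0], [0, 1]]
for m in range(1, 6):
    power = mul(power, A)                            # A^m, computed directly
    assert power == [[2**m * P1[i][j] + 3**m * P2[i][j] for j in range(2)]
                     for i in range(2)]
print("A^m = 2^m P1 + 3^m P2 for m = 1..5")
```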
4.5 Characteristic and minimal polynomials

We defined the determinant of a square matrix A. Now we want to define the determinant of a linear map α. The obvious way to do this is to take the determinant of any matrix representing α. For this to be a good definition, we need to show that it doesn't matter which matrix we take; in other words, that det(A′) = det(A) if A and A′ are similar. But, if A′ = P^{-1} A P, then

det(P^{-1} A P) = det(P^{-1}) det(A) det(P) = det(A),

since det(P^{-1}) det(P) = 1. So our plan will succeed:

Definition 4.5 (a) The determinant det(α) of a linear map α : V → V is the determinant of any matrix representing α.

(b) The characteristic polynomial cα(x) of a linear map α : V → V is the characteristic polynomial of any matrix representing α.

(c) The minimal polynomial mα(x) of a linear map α : V → V is the monic polynomial of smallest degree which is satisfied by α.

The second part of the definition is OK, by the same reasoning as the first (since cA(x) is just a determinant). But the third part also creates a bit of a problem: how do we know that α satisfies any polynomial? The Cayley–Hamilton Theorem tells us that cA(A) = O for any matrix A representing α. Now cA(A) represents cA(α), and cA = cα by definition; so cα(α) = O, that is, α does satisfy a polynomial. Indeed, the Cayley–Hamilton Theorem can be stated in the following form:

Proposition 4.7 For any linear map α on V, its minimal polynomial mα(x) divides its characteristic polynomial cα(x) (as polynomials).

Proof Suppose not; then we can divide cα(x) by mα(x), getting a quotient q(x) and non-zero remainder r(x); that is,

cα(x) = mα(x) q(x) + r(x).

Substituting α for x, and using the fact that cα(α) = mα(α) = O, we find that r(α) = O. But the degree of r is less than the degree of mα, so this contradicts the definition of mα as the polynomial of least degree satisfied by α.

Theorem 4.8 Let α be a linear map on V. Then the following conditions are equivalent for an element λ of K:

(a) λ is an eigenvalue of α;

(b) λ is a root of the characteristic polynomial of α;

(c) λ is a root of the minimal polynomial of α.
Proof (b) implies (a): Suppose that cα(λ) = 0, that is, det(λI − α) = 0. Then λI − α is not invertible, so its kernel is non-zero. Pick a non-zero vector v in Ker(λI − α). Then (λI − α)v = 0, so that α(v) = λv; that is, λ is an eigenvalue of α.

(a) implies (c): Let λ be an eigenvalue of α with eigenvector v, so that α(v) = λv. By induction, α^k(v) = λ^k v for any k, and so f(α)(v) = f(λ)v for any polynomial f. Choosing f = mα, we have mα(α) = O by definition, so mα(λ)v = 0; since v ≠ 0, we have mα(λ) = 0.

(c) implies (b): Suppose that λ is a root of mα(x). Then (x − λ) divides mα(x). But mα(x) divides cα(x), by the Cayley–Hamilton Theorem (in the form of Proposition 4.7): so (x − λ) divides cα(x), whence λ is a root of cα(x).

Remark This gives us a recipe to find the eigenvalues of α: take a matrix A representing α, write down its characteristic polynomial cA(x) = det(xI − A), and find the roots of this polynomial. In our earlier example,

cA(x) = det [x − 0.9, −0.3; −0.1, x − 0.7] = (x − 0.9)(x − 0.7) − 0.03 = x^2 − 1.6x + 0.6 = (x − 1)(x − 0.6),

so the eigenvalues are 1 and 0.6.

Using this result, we can give a necessary and sufficient condition for α to be diagonalisable. First, a lemma.

Lemma 4.9 Let v1, ..., vr be eigenvectors of α with distinct eigenvalues λ1, ..., λr. Then v1, ..., vr are linearly independent.

Proof Suppose that v1, ..., vr are linearly dependent, so that there exists a linear relation

c1 v1 + ··· + cr vr = 0,

with coefficients ci not all zero. Some of these coefficients may be zero; choose a relation with the smallest number of non-zero coefficients. Suppose that c1 ≠ 0. (If c1 = 0, just renumber.) Now acting on the given relation with α, using the fact that α(vi) = λi vi, we get

c1 λ1 v1 + ··· + cr λr vr = 0.

Subtracting λ1 times the first equation from the second, we get

c2 (λ2 − λ1) v2 + ··· + cr (λr − λ1) vr = 0.

Now this equation has fewer non-zero coefficients than the one we started with, which was assumed to have the smallest possible number. So the coefficients in this equation must all be zero. That is, ci (λi − λ1) = 0, so ci = 0 (since λi ≠ λ1), for i = 2, ..., r. This doesn't leave much of the original equation, only c1 v1 = 0, from which we conclude that c1 = 0, contrary to our assumption. So the vectors must have been linearly independent.

Theorem 4.10 The linear map α on V is diagonalisable if and only if its minimal polynomial is the product of distinct linear factors, that is, its roots all have multiplicity 1.

Proof Suppose first that α is diagonalisable. Then there is a basis such that α is represented by a diagonal matrix D whose diagonal entries are the eigenvalues; let λ1, ..., λr be the distinct eigenvalues. Now for any polynomial f, f(α) is represented by f(D), a diagonal matrix whose diagonal entries are f applied to the diagonal entries of D. Choose

f(x) = (x − λ1) ··· (x − λr).

Then all the diagonal entries of f(D) are zero, so f(D) = O, and hence f(α) = 0. We claim that f is the minimal polynomial of α; clearly it has no repeated roots, so we will be done. We know that each λi is a root of mα(x) (by Theorem 4.8), so that the degree of f cannot be smaller than that of mα; and we also know that f(α) = 0. So the claim follows.

Conversely, we have to show that if mα is a product of distinct linear factors, then α is diagonalisable. Let

f(x) = ∏ (x − λi)

be the minimal polynomial of α, with the roots λi all distinct. Let hi(x) = f(x)/(x − λi). Then the polynomials h1, ..., hr have no common factor except 1. This is a little argument with polynomials: the only possible factors are (x − λi), but this fails to divide hi. Now the Euclidean algorithm shows that we can write the h.c.f. as a linear combination:

1 = Σ_{i=1}^r hi(x) ki(x).

Let Ui = Im(hi(α)). The vectors in Ui are eigenvectors of α with eigenvalue λi: for if u ∈ Ui, say u = hi(α)(v), then

(α − λi I)(u) = (α − λi I) hi(α)(v) = f(α)(v) = 0,

so that α(u) = λi u. Moreover, every vector can be written as a sum of vectors from the subspaces Ui: for, given v ∈ V, we have

v = I(v) = Σ_{i=1}^r hi(α)(ki(α)v),

with hi(α)(ki(α)v) ∈ Im(hi(α)) = Ui. The fact that the expression is unique follows from the lemma, since the eigenvectors are linearly independent. So V is the direct sum of the eigenspaces Ui, and α is diagonalisable by Theorem 4.5.
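Proposition 4.7 rested on the Cayley–Hamilton Theorem. For a 2 × 2 matrix the theorem can be checked by direct computation, since cA(x) = x^2 − (a + d)x + (ad − bc); the sketch below does this for the running example (helper code is mine, not from the notes).

```python
# The Cayley-Hamilton theorem checked directly for a 2x2 matrix:
# substituting A into c_A(x) = x^2 - Tr(A) x + det(A) gives the zero matrix.

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[-6, 6], [-12, 11]]
tr  = A[0][0] + A[1][1]                     # 5
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]     # 6

A2 = mul(A, A)
cayley = [[A2[i][j] - tr * A[i][j] + det * (1 if i == j else 0)
           for j in range(2)] for i in range(2)]
assert cayley == [[0, 0], [0, 0]]
print("c_A(A) = O")
```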
So how, in practice, do we "diagonalise" a matrix A, that is, find an invertible matrix P such that P^{-1} A P = D is diagonal? We saw an example of this earlier. The matrix equation can be rewritten as AP = PD, from which we see that the columns of P are the eigenvectors of A. So the procedure is: find the eigenvalues of A, and find a basis of eigenvectors; then let P be the matrix which has the eigenvectors as columns, and D the diagonal matrix whose diagonal entries are the eigenvalues. Then P^{-1} A P = D.

How do we find the minimal polynomial of a matrix? We know that it divides the characteristic polynomial, and that every root of the characteristic polynomial is a root of the minimal polynomial; beyond that, it's trial and error. For example, if the characteristic polynomial is (x−1)^2 (x−2)^3, then the minimal polynomial must be one of

(x−1)(x−2) (this would correspond to the matrix being diagonalisable), (x−1)^2 (x−2), (x−1)(x−2)^2, (x−1)^2 (x−2)^2, (x−1)(x−2)^3 or (x−1)^2 (x−2)^3.

If we try them in this order, the first one to be satisfied by the matrix is the minimal polynomial. For example, the characteristic polynomial of

A = [1 2; 0 1]

is (x − 1)^2; its minimal polynomial is not (x − 1) (since A ≠ I), so it is (x − 1)^2.

4.6 Jordan form

We finish this chapter by stating without proof a canonical form for matrices over the complex numbers under similarity.

Definition 4.6 (a) A Jordan block J(n, λ) is a matrix of the form

[ λ 1 0 ··· 0 ]
[ 0 λ 1 ··· 0 ]
[     ···     ]
[ 0 0 0 ··· λ ],

that is, an n × n matrix with λ on the main diagonal, 1 in positions immediately above the main diagonal, and 0 elsewhere. (We take J(1, λ) to be the 1 × 1 matrix [λ].)

(b) A matrix is in Jordan form if it can be written in block form with Jordan blocks on the diagonal and zeros elsewhere.

Theorem 4.11 Over C, any matrix is similar to a matrix in Jordan form; that is, any linear map can be represented by a matrix in Jordan form relative to a suitable basis. Moreover, the Jordan form of a matrix or linear map is unique apart from putting the Jordan blocks in a different order on the diagonal.
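Returning to the trial-and-error recipe above: for the 2 × 2 example it amounts to testing the candidate polynomials on the matrix in order of degree. A sketch:

```python
# Trying candidate divisors of the characteristic polynomial in order,
# for A = [[1, 2], [0, 1]]: (x - 1) fails since A != I, and (x - 1)^2
# is satisfied, so the minimal polynomial is (x - 1)^2.

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [0, 1]]
N = [[A[i][j] - (1 if i == j else 0) for j in range(2)]
     for i in range(2)]                     # N = A - I

assert N != [[0, 0], [0, 0]]               # (x - 1) is not satisfied
assert mul(N, N) == [[0, 0], [0, 0]]       # (x - 1)^2 is satisfied
print("minimal polynomial of A is (x - 1)^2")
```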
Remark A matrix over C is diagonalisable if and only if all the Jordan blocks in its Jordan form have size 1.

Example Any 3 × 3 matrix over C is similar to one of

[λ 0 0; 0 µ 0; 0 0 ν],  [λ 1 0; 0 λ 0; 0 0 µ],  [λ 1 0; 0 λ 1; 0 0 λ],

for some λ, µ, ν ∈ C (not necessarily distinct).

We see that there are two different "obstructions" to a matrix being diagonalisable:

(a) The roots of the characteristic polynomial don't lie in the field K. We can always get around this by working in a larger field (as below, enlarge the field from R to C).

(b) Even though the characteristic polynomial factorises, there may be Jordan blocks of size bigger than 1, so that the minimal polynomial has repeated roots. This problem cannot be transformed away by enlarging the field; we are stuck with what we have.

Example Consider the matrix A = [a b; −b a], with b ≠ 0. Its characteristic polynomial is x^2 − 2ax + (a^2 + b^2), so that the eigenvalues over C are a + bi and a − bi. Thus, if we regard A as a matrix over the complex numbers, it is diagonalisable (the two eigenvalues are distinct, since b ≠ 0). But over the real numbers, A has no eigenvalues and no eigenvectors; it is not diagonalisable, and cannot be put into Jordan form either.

Though it is beyond the scope of this course, it can be shown that if all the roots of the characteristic polynomial lie in the field K, then the matrix is similar to one in Jordan form.

4.7 Trace

Here we meet another function of a linear map, and consider its relation to the eigenvalues and the characteristic polynomial.

Definition 4.7 The trace Tr(A) of a square matrix A is the sum of its diagonal entries.
Proposition 4.12 (a) For any two n × n matrices A and B, we have Tr(AB) = Tr(BA).

(b) Similar matrices have the same trace.

Proof (a) By the rules for matrix multiplication,

Tr(AB) = Σ_{i=1}^n (AB)_{ii} = Σ_{i=1}^n Σ_{j=1}^n A_{ij} B_{ji}.

Now obviously Tr(BA) is the same thing.

(b) By part (a), Tr(P^{-1} A P) = Tr(A P P^{-1}) = Tr(AI) = Tr(A).

The second part of this proposition shows that, if α : V → V is a linear map, then any two matrices representing α have the same trace; so, as we did for the determinant, we can define the trace Tr(α) of α to be the trace of any matrix representing α.

The trace and determinant of α are coefficients in the characteristic polynomial of α.

Proposition 4.13 Let α : V → V be a linear map, where dim(V) = n, and let cα be the characteristic polynomial of α, a polynomial of degree n with leading term x^n.

(a) The coefficient of x^{n−1} is −Tr(α), and the constant term is (−1)^n det(α).

(b) If α is diagonalisable, then the sum of its eigenvalues is Tr(α) and their product is det(α).

Proof Let A be a matrix representing α. We have

cα(x) = det(xI − A) = det [x − a11, −a12, ..., −a1n; −a21, x − a22, ..., −a2n; ...; −an1, −an2, ..., x − ann].

The only way to obtain a term in x^{n−1} in the determinant is from the product (x − a11)(x − a22)···(x − ann) of diagonal entries, taking −aii from the ith factor and x from each of the others. (If we take one off-diagonal term, we would have to have at least two, so that the highest possible power of x would be x^{n−2}.) So the coefficient of x^{n−1} is minus the sum of the diagonal terms. Putting x = 0, we find that the constant term is cα(0) = det(−A) = (−1)^n det(A).

If α is diagonalisable, then the eigenvalues are the roots of cα(x):

cα(x) = (x − λ1)(x − λ2)···(x − λn).

Now the coefficient of x^{n−1} is minus the sum of the roots, and the constant term is (−1)^n times the product of the roots; so part (b) follows from part (a).
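Both propositions are easy to test numerically on small matrices. The sketch below (ad hoc helpers, arbitrary integer matrices) checks Tr(AB) = Tr(BA), and checks the n = 2 case of Proposition 4.13(a), where cA(x) = x^2 − Tr(A)x + det(A), by evaluating both sides at a point.

```python
# A quick numeric check of Tr(AB) = Tr(BA) and of the 2x2 case of
# c_A(x) = x^2 - Tr(A) x + det(A).

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def tr(M):
    return M[0][0] + M[1][1]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert tr(mul(A, B)) == tr(mul(B, A))        # Tr(AB) = Tr(BA)

det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
x = 10                                       # evaluate both sides at a point
lhs = (x - A[0][0]) * (x - A[1][1]) - A[0][1] * A[1][0]   # det(xI - A)
assert lhs == x**2 - tr(A) * x + det
print("trace identities verified")
```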
Chapter 5 Linear and quadratic forms

In this chapter we examine "forms", that is, functions from a vector space V to its field, which are either linear or quadratic. The linear forms comprise the dual space of V; we look at this and define dual bases and the adjoint of a linear map (corresponding to the transpose of a matrix). Quadratic forms make up the bulk of the chapter. We show that we can change the basis to put any quadratic form into "diagonal form" (with squared terms only), by a process generalising "completing the square" in elementary algebra, and that further reductions are possible over the real and complex numbers.

5.1 Linear forms and dual space

The definition is simple:

Definition 5.1 Let V be a vector space over K. A linear form on V is a linear map from V to K, where K is regarded as a 1-dimensional vector space over K: that is, it is a function f from V to K satisfying

f(v1 + v2) = f(v1) + f(v2),  f(cv) = c f(v),

for all v1, v2, v ∈ V and c ∈ K.

If dim(V) = n, then a linear form is represented by a 1 × n matrix over K, that is, a row vector of length n over K. If f = [a1 a2 ... an], then for v = [x1 x2 ... xn]^T we have

f(v) = [a1 a2 ... an][x1; x2; ...; xn] = a1 x1 + a2 x2 + ··· + an xn.
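The row-vector picture can be made concrete with a small sketch (the numbers are arbitrary, and the helper is mine): a linear form on K^3 is a row vector, applied by the matrix product above, and linearity is visible in the arithmetic.

```python
# A linear form on K^3 as a row vector: f(v) = a1 x1 + a2 x2 + a3 x3.

f = [2, -1, 3]                   # the row vector [a1 a2 a3]

def apply_form(f, v):
    return sum(a * x for a, x in zip(f, v))

v = [1, 4, 2]
w = [0, 1, -1]
assert apply_form(f, v) == 2*1 + (-1)*4 + 3*2          # = 4
# linearity: f(v + w) = f(v) + f(w)
assert apply_form(f, [v[i] + w[i] for i in range(3)]) == \
       apply_form(f, v) + apply_form(f, w)
print(apply_form(f, v))
```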
Conversely, any row vector of length n represents a linear form on K^n, and its action is by matrix multiplication as we saw earlier.

Definition 5.2 Linear forms can be added and multiplied by scalars in the obvious way:

(f1 + f2)(v) = f1(v) + f2(v),  (cf)(v) = c f(v).

So they form a vector space, which is called the dual space of V and is denoted by V*.

Not surprisingly, we have:

Proposition 5.1 If V is finite-dimensional, then so is V*, and dim(V*) = dim(V).

Proof We begin by observing that, if (v1, ..., vn) is a basis for V, and a1, ..., an are any scalars whatsoever, then there is a unique linear form f with the property that f(vi) = ai for i = 1, ..., n. It is given by

f(c1 v1 + ··· + cn vn) = a1 c1 + ··· + an cn.

Now let fi be the linear map defined by the rule that

fi(vj) = 1 if i = j, 0 if i ≠ j.

Then (f1, ..., fn) form a basis for V*: indeed, the linear form f defined in the preceding paragraph is a1 f1 + ··· + an fn. Since this basis has n elements, we see that dim(V*) = n = dim(V).

This basis is called the dual basis of V* corresponding to the given basis for V. We can describe the basis in the preceding proof as follows.

Definition 5.3 The Kronecker delta δij for i, j ∈ {1, ..., n} is defined by the rule that

δij = 1 if i = j, 0 if i ≠ j.

Note that δij is the (i, j) entry of the identity matrix. There are some simple properties of the Kronecker delta with respect to summation; for example,

Σ_{i=1}^n δij ai = aj

for fixed j ∈ {1, ..., n}. This is because all terms of the sum except the term i = j are zero.

Now, if (v1, ..., vn) is a basis for V, then the dual basis for the dual space V* is the basis (f1, ..., fn) satisfying fi(vj) = δij.
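For V = K^n the defining property fi(vj) = δij can be checked directly. The sketch below is not from the notes: it takes a basis of R^2, together with row vectors computed by hand (they happen to be the rows of the inverse of the matrix whose columns are the basis vectors, which is what the condition fi(vj) = δij amounts to), and verifies the Kronecker-delta property.

```python
# The dual basis of the basis v1 = (1,1), v2 = (1,2) of R^2: the forms
# f1 = [2, -1] and f2 = [-1, 1] satisfy f_i(v_j) = delta_ij.

v1, v2 = [1, 1], [1, 2]          # a basis of R^2
f1, f2 = [2, -1], [-1, 1]        # its dual basis, computed by hand

def ev(f, v):                    # a linear form acting on a vector
    return sum(fi * vi for fi, vi in zip(f, v))

assert ev(f1, v1) == 1 and ev(f1, v2) == 0   # f1(v_j) = delta_1j
assert ev(f2, v1) == 0 and ev(f2, v2) == 1   # f2(v_j) = delta_2j
print("dual basis verified")
```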
5.1.1 Adjoints

Definition 5.4 Let α : V → W be a linear map. There is a linear map α* : W* → V* (note the reversal!) defined by

(α*(f))(v) = f(α(v)).

The map α* is called the adjoint of α.

This definition takes a bit of unpicking. We are given α : V → W and asked to define α* : W* → V*. This means that, to any element f ∈ W* (any linear form on W), we must associate a linear form g = α*(f) ∈ V*. This linear form must act on vectors v ∈ V to produce scalars. Our definition says that α*(f) maps the vector v to the scalar f(α(v)): this makes sense because α(v) is a vector in W, and hence the linear form f ∈ W* can act on it to produce a scalar.

Now α*, being a linear map, is represented by a matrix when we choose bases for W* and V*. The obvious bases to choose are the dual bases corresponding to some given bases of W and V respectively. What is the matrix? Some calculation shows the following, which will not be proved in detail here.

Proposition 5.2 Let α : V → W be a linear map. Choose bases B for V, and C for W, and let A be the matrix representing α relative to these bases. Let B* and C* denote the dual bases of V* and W* corresponding to B and C. Then the matrix representing α* relative to the bases C* and B* is the transpose A^T of A.

5.1.2 Change of basis

Suppose that we change bases in V from B = (v1, ..., vn) to B′ = (v1′, ..., vn′), with change of basis matrix P = P_{B,B′}. How do the dual bases change? In other words, if B* = (f1, ..., fn) is the dual basis of B, and (B′)* = (f1′, ..., fn′) the dual basis of B′, then what is the transition matrix P_{B*,(B′)*}? The next result answers the question.

Proposition 5.3 Let B and B′ be bases for V, and B* and (B′)* the dual bases of the dual space. Then

P_{B*,(B′)*} = (P_{B,B′}^T)^{-1}.

Proof Use the notation from just before the Proposition. If P = P_{B,B′} has (i, j) entry pij, and Q = P_{B*,(B′)*} has (i, j) entry qij, then we have

vi′ = Σ_{k=1}^n pki vk,  fj′ = Σ_{l=1}^n qlj fl,
and so

δij = fj′(vi′) = Σ_{l=1}^n Σ_{k=1}^n qlj pki fl(vk) = Σ_{k=1}^n qkj pki,

since fl(vk) = δlk. Now qkj is the (j, k) entry of Q^T, so this says that I = Q^T P, whence Q^T = P^{-1}, and Q = (P^{-1})^T = (P^T)^{-1}, as required.

5.2 Quadratic forms

A lot of applications of mathematics involve dealing with quadratic forms: you meet them in statistics (analysis of variance) and mechanics (energy of rotating bodies), among other places. In this section we begin the study of quadratic forms.

5.2.1 Quadratic forms

For almost everything in the remainder of this chapter, we assume that the characteristic of the field K is not equal to 2. This means that 2 ≠ 0 in K, so that the element 1/2 exists in K. Of our list of "standard" fields, this only excludes F2, the integers mod 2. (For example, in F5, we have 1/2 = 3.)

We will meet a formal definition of a quadratic form later in the chapter; but for the moment, we take a quadratic form to be a function which, when written out in coordinates, is a polynomial in which every term has total degree 2 in the variables. For example,

q(x, y, z) = x^2 + 4xy + 2xz − 3y^2 − 2yz − z^2

is a quadratic form in three variables.
Definition 5.5 A quadratic form in n variables x1, ..., xn over a field K is a polynomial

Σ_{i=1}^n Σ_{j=1}^n aij xi xj

in the variables in which every term has degree two (that is, is a multiple of xi xj for some i, j).

In the above representation of a quadratic form, we see that if i ≠ j, then the term in xi xj comes twice, so that the coefficient of xi xj is aij + aji. We are free to choose any two values for aij and aji as long as they have the right sum; but we will always make the choice so that the two values are equal: to obtain a term c xi xj, we take aij = aji = c/2. (This is why we require that the characteristic of the field is not 2.) Any quadratic form is thus represented by a symmetric matrix A with (i, j) entry aij (that is, a matrix satisfying A^T = A). This is the third job of matrices in linear algebra: symmetric matrices represent quadratic forms.

We think of a quadratic form as defined above as being a function from the vector space K^n to the field K. It is clear from the definition that

q(x1, ..., xn) = v^T A v,  where v = [x1; ...; xn].

Now if we change the basis for V, we obtain a different representation for the same function q. The effect of a change of basis is a linear substitution v = Pv′ on the variables, where P is the transition matrix between the bases. Thus we have

v^T A v = (Pv′)^T A (Pv′) = (v′)^T (P^T A P) v′,

so we have the following:

Proposition 5.4 A basis change with transition matrix P replaces the symmetric matrix A representing a quadratic form by the matrix P^T A P.

As for other situations where matrices represent objects on vector spaces, we make a definition:

Definition 5.6 Two symmetric matrices A, A′ over a field K are congruent if A′ = P^T A P for some invertible matrix P.

Proposition 5.5 Two symmetric matrices are congruent if and only if they represent the same quadratic form with respect to different bases.
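For the example q(x, y, z) = x^2 + 4xy + 2xz − 3y^2 − 2yz − z^2 given earlier, halving the cross terms gives the symmetric matrix below; the sketch checks that v^T A v reproduces the polynomial at several points.

```python
# The symmetric matrix of q(x,y,z) = x^2 + 4xy + 2xz - 3y^2 - 2yz - z^2:
# each off-diagonal coefficient c is split as a_ij = a_ji = c/2.

A = [[1,  2,  1],
     [2, -3, -1],
     [1, -1, -1]]

def q_matrix(v):                  # v^T A v
    return sum(v[i] * A[i][j] * v[j] for i in range(3) for j in range(3))

def q_poly(v):
    x, y, z = v
    return x*x + 4*x*y + 2*x*z - 3*y*y - 2*y*z - z*z

for v in [(1, 0, 0), (1, 1, 1), (2, -1, 3), (0, 5, -2)]:
    assert q_matrix(list(v)) == q_poly(v)
print("v^T A v agrees with the polynomial")
```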
Our next job, as you may expect, is to find a canonical form for symmetric matrices under congruence; that is, a choice of basis so that a quadratic form has a particularly simple shape. We will see that the answer to this question depends on the field over which we work; we will solve this problem for the fields of real and complex numbers.

5.2.2 Reduction of quadratic forms

Even if we cannot find a canonical form for quadratic forms, we can simplify them very greatly.

Theorem 5.6 Let q be a quadratic form in n variables x1, ..., xn, over a field K whose characteristic is not 2. Then by a suitable linear substitution to new variables y1, ..., yn, we can obtain

q = c1 y1^2 + c2 y2^2 + ··· + cn yn^2

for some c1, ..., cn ∈ K.

We call a quadratic form which is written as in the conclusion of the theorem diagonal.

Proof Our proof is by induction on n. A form in one variable is certainly diagonal, so the induction starts. Now assume that the theorem is true for forms in n − 1 variables. Take

q(x1, ..., xn) = Σ_{i=1}^n Σ_{j=1}^n aij xi xj,

where aij = aji for i ≠ j.

Case 1: Assume that aii ≠ 0 for some i. By a permutation of the variables (which is certainly a linear substitution), we can assume that a11 ≠ 0. Let

y1 = x1 + Σ_{i=2}^n (a1i/a11) xi.

Then we have

a11 y1^2 = a11 x1^2 + 2 Σ_{i=2}^n a1i x1 xi + q″(x2, ..., xn),

where q″ is a quadratic form in x2, ..., xn. That is, all the terms involving x1 in q have been incorporated into a11 y1^2. So we have

q(x1, ..., xn) = a11 y1^2 + q′(x2, ..., xn),

where q′ is the part of q not containing x1, minus q″. By induction, there is a change of variable so that

q′(x2, ..., xn) = Σ_{i=2}^n ci yi^2,
and so we are done (taking c1 = a11).

Case 2: All aii are zero, but aij ≠ 0 for some i ≠ j. Now

xi xj = (1/4)((xi + xj)^2 − (xi − xj)^2),

so taking xi′ = (xi + xj)/2 and xj′ = (xi − xj)/2, we obtain a new form for q which does contain a non-zero diagonal term. Now we apply the method of Case 1.

Case 3: All aij are zero. Now q is the zero form, and there is nothing to prove: take c1 = ··· = cn = 0.

Example 5.1 Consider the quadratic form q(x, y, z) = x^2 + 2xy + 4xz + y^2 + 4z^2. We have

(x + y + 2z)^2 = x^2 + 2xy + 4xz + y^2 + 4z^2 + 4yz,

and so

q = (x + y + 2z)^2 − 4yz = (x + y + 2z)^2 − (y + z)^2 + (y − z)^2 = u^2 + v^2 − w^2,

where u = x + y + 2z, v = y − z, w = y + z. Otherwise said, the matrix representing the quadratic form, namely

A = [1 1 2; 1 1 0; 2 0 4],

is congruent to the matrix

A′ = [1 0 0; 0 1 0; 0 0 −1].

Can you find an invertible matrix P such that P^T A P = A′?
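One answer to that question (my own computation, not given in the notes): invert the substitution (u, v, w) = (x + y + 2z, y − z, y + z), which gives P = M^{-1} where M is the matrix of the substitution. The sketch below checks it with exact rational arithmetic.

```python
# One P answering the exercise: P is the inverse of the substitution
# matrix M with (u, v, w) = M (x, y, z), and then P^T A P = diag(1, 1, -1).

from fractions import Fraction as F

A = [[1, 1, 2], [1, 1, 0], [2, 0, 4]]
half = F(1, 2)
P = [[1,  half, -3 * half],
     [0,  half,  half],
     [0, -half,  half]]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

Pt = [[P[j][i] for j in range(3)] for i in range(3)]
result = mul(Pt, mul(A, P))
assert result == [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
print("P^T A P = diag(1, 1, -1)")
```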
. . yi = xi for r + 1 ≤ i ≤ n. . . we may suppose that α1 . Over the real numbers R. we arrive at the same value of r.7 (a) Let b : V × V → K be a function of two variables from V with values in K. αr = 0. . . First we deﬁne a bilinear form. . Suppose that α1 . w1 + w2 ) = b(v. . w ∈ V . Now putting √ for 1 ≤ i ≤ s. w2 ). . every element has a square root. b(v. and αr+1 = · · · = αn = 0. though it amounts to the same thing.3 Quadratic and bilinear forms The formal deﬁnition of a quadratic form looks a bit different from the version we gave earlier. Deﬁnition 5.] 5. Again. αs > 0. αs+1 . αs+t < 0. We say that b is a bilinear form if it is a linear function of each variable when the other is kept constant: that is.62 CHAPTER 5. Since any positive real number has a square root. then α1 x1 = (α1 c2 )y2 . w1 ) + b(v. w) = b(w. xi for s + t + 1 ≤ i ≤ n.2. [This is the theorem known as Sylvester’s Law of Inertia. . . . we have q = y2 + · · · + y2 . (√αi )xi yi = ( −αi )xi for s + 1 ≤ i ≤ s + t. we will see later that s and t don’t depend on how we do the reduction. with two similar equations involving the ﬁrst variable. 2 For example. Putting √ ( αi )xi for 1 ≤ i ≤ r. . things are not much worse. . w). r 1 We will see later that r is an “invariant” of q: however we do the reduction. Over the complex numbers C. we get 2 2 2 q = x1 + · · · + x + s2 − xs+1 − · · · − xs+t . and αs+t+1 . But this is still not a “canonical form for congruence”. A bilinear form b is symmetric if b(v. b(v. cw) = cb(v. . In other words. if y1 = x1 /c. . αn = 0. v) for all v. we can multiply 1 any αi by any factor which is a perfect square in K. . LINEAR AND QUADRATIC FORMS Thus any quadratic form can be reduced to the diagonal shape 2 2 α1 x1 + · · · + αn xn by a linear substitution.
(b) Let q : V → K be a function. We say that q is a quadratic form if

– q(cv) = c^2 q(v) for all c ∈ K, v ∈ V;

– the function b defined by

b(v, w) = (1/2)(q(v + w) − q(v) − q(w))

is a bilinear form on V.

Remarks The bilinear form in the second part of the definition is symmetric; and the division by 2 is permissible because of our assumption that the characteristic of K is not 2. If we think of the prototype of a quadratic form as being the function x^2, then the first equation says (cx)^2 = c^2 x^2, while the second has the form

(1/2)((x + y)^2 − x^2 − y^2) = xy;

and xy is the prototype of a bilinear form: it is a linear function of x when y is constant, and vice versa.

Note that the formula b(v, w) = (1/2)(q(v + w) − q(v) − q(w)) (which is known as the polarisation formula) says that the bilinear form is determined by the quadratic form. Conversely, if we know the symmetric bilinear form b, then we have

2q(v) = 4q(v) − 2q(v) = q(v + v) − q(v) − q(v) = 2b(v, v),

so that q(v) = b(v, v), and we see that the quadratic form is determined by the symmetric bilinear form. So these are equivalent objects.

If b is a symmetric bilinear form on V and B = (v1, ..., vn) is a basis for V, then we can represent b by the n × n matrix A whose (i, j) entry is aij = b(vi, vj). Note that A is a symmetric matrix. It is easy to see that this is the same as the matrix representing the corresponding quadratic form.

Here is a third way of thinking about a quadratic form. Let V* be the dual space of V, and let α : V → V* be a linear map. Then for v ∈ V, we have α(v) ∈ V*, and so α(v)(w) is an element of K. The function b(v, w) = α(v)(w) is a bilinear form on V; and if α(v)(w) = α(w)(v) for all v, w ∈ V, then this bilinear form is symmetric. Conversely, a symmetric bilinear form b gives rise to a linear map α : V → V* satisfying α(v)(w) = α(w)(v), by the rule that α(v) is the linear map w → b(v, w).

Now choose a basis B for V, and let B* be the dual basis for V*. Then α is represented by a matrix A relative to the bases B and B*, and again this is the same matrix as before. Summarising:
Proposition 5.7 The following objects are equivalent on a vector space V over a field whose characteristic is not 2:

(a) a quadratic form on V;

(b) a symmetric bilinear form on V;

(c) a linear map α : V → V* satisfying α(v)(w) = α(w)(v) for all v, w ∈ V.

Moreover, if corresponding objects of these three types are represented by matrices as described above, then we get the same matrix A in each case.

Proof Only the last part needs proof: we must check that, in each case, a change of basis in V with transition matrix P replaces A by P^T A P. We have seen it for a quadratic form, and the argument for a bilinear form is the same.

So suppose that α : V → V*, and we change from B to B′ in V with transition matrix P. We saw that the transition matrix between the dual bases in V* is (P^T)^{-1}. Now go back to the discussion of linear maps between different vector spaces in Chapter 4: if α : V → W and we change bases in V and W with transition matrices P and Q, then the matrix A representing α is changed to Q^{-1}AP. Apply this with Q = (P^T)^{-1}, so that Q^{-1} = P^T, and we see that the new matrix is P^T A P, as required.

Recall that two symmetric matrices A and A′ are congruent if A′ = P^T A P for some invertible matrix P. As we have seen, this is the same as saying that they represent the same quadratic form relative to different bases.

5.2.4 Canonical forms for complex and real forms

Finally, in this section, we return to quadratic forms (or symmetric matrices) over the real and complex numbers, and find canonical forms under congruence.

Theorem 5.8 Any n × n complex symmetric matrix A is congruent to a matrix of the form

    [ Ir  O ]
    [ O   O ]

for some r. Moreover, r = rank(A), and so if A is congruent to two matrices of this form, then they both have the same value of r.

Proof We already saw that A is congruent to a matrix of this form. Moreover, if P is invertible, then so is P^T, and so r = rank(P^T A P) = rank(A), as claimed.
The next result is Sylvester's Law of Inertia.

Theorem 5.9 Any n × n real symmetric matrix A is congruent to a matrix of the form

    [ Is   O    O ]
    [ O   −It   O ]
    [ O    O    O ]

for some s, t. Moreover, if A is congruent to two matrices of this form, then they have the same values of s and of t.

Proof Again we have seen that A is congruent to a matrix of this form. Arguing as in the complex case, we see that s + t = rank(A), and so any two matrices of this form congruent to A have the same values of s + t.

Suppose that two different reductions give the values s, t and s′, t′ respectively, with s + t = s′ + t′. Let q be the quadratic form represented by A. Then we are told that there are linear functions y1, ..., yn and z1, ..., zn of the original variables x1, ..., xn such that

    q = y1^2 + · · · + ys^2 − y_{s+1}^2 − · · · − y_{s+t}^2 = z1^2 + · · · + z_{s′}^2 − z_{s′+1}^2 − · · · − z_{s′+t′}^2.

Suppose for a contradiction that s < s′. Consider the equations

    y1 = 0, ..., ys = 0, z_{s′+1} = 0, ..., zn = 0,

regarded as linear equations in the original variables x1, ..., xn. The number of equations is s + (n − s′) = n − (s′ − s) < n. According to a lemma from much earlier in the course (we used it in the proof of the Exchange Lemma!), the equations have a nonzero solution. That is, there are values of x1, ..., xn, not all zero, such that the variables y1, ..., ys and z_{s′+1}, ..., zn are all zero.

Since y1 = · · · = ys = 0, we have for these values q = −y_{s+1}^2 − · · · − y_{s+t}^2 ≤ 0. But since z_{s′+1} = · · · = zn = 0, we also have q = z1^2 + · · · + z_{s′}^2 > 0. (The inequality is strict: the zi are n independent linear functions of the xj, so they cannot all vanish at a point where the xj are not all zero; as z_{s′+1}, ..., zn are already zero there, some zi with i ≤ s′ is nonzero.) But this is a contradiction. So we cannot have s < s′. Similarly we cannot have s′ < s either. So we must have s = s′, as required; and then t = t′, since s + t = s′ + t′.
We saw that s + t is the rank of A. The number s − t is known as the signature of A. Of course, both the rank and the signature are independent of how we reduce the matrix (or quadratic form); and if we know the rank and the signature, we can easily recover s and t.

You will meet some further terminology in association with Sylvester's Law of Inertia. Let q be a quadratic form in n variables represented by the real symmetric matrix A, and let q (or A) have rank s + t and signature s − t; that is, q has s positive and t negative terms in its diagonal form. We say that q (or A) is

• positive definite if s = n (and t = 0), that is, if q(v) ≥ 0 for all v, with equality only if v = 0;

• positive semidefinite if t = 0, that is, if q(v) ≥ 0 for all v;

• negative definite if t = n (and s = 0), that is, if q(v) ≤ 0 for all v, with equality only if v = 0;

• negative semidefinite if s = 0, that is, if q(v) ≤ 0 for all v;

• indefinite if s > 0 and t > 0, that is, if q(v) takes both positive and negative values.
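The reduction to diagonal form, and the invariants s and t, can be illustrated in code. The sketch below (Python/NumPy, not part of the original notes; the function name and the numerical tolerance are my own) diagonalises a real symmetric matrix by matching row and column operations — a congruence — and reads off the numbers of positive, negative and zero diagonal entries:

```python
import numpy as np

def inertia(A, tol=1e-9):
    """Diagonalise the real symmetric matrix A by a congruence
    (matching row and column operations) and return (s, t, z):
    the numbers of positive, negative and zero diagonal entries."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for i in range(n):
        if abs(A[i, i]) <= tol:
            # try to bring a nonzero entry onto the diagonal
            for j in range(i + 1, n):
                if abs(A[j, j]) > tol:          # swap rows and columns i, j
                    A[[i, j]] = A[[j, i]]
                    A[:, [i, j]] = A[:, [j, i]]
                    break
            else:
                for j in range(i + 1, n):
                    if abs(A[i, j]) > tol:      # add row/column j to row/column i:
                        A[i] += A[j]            # this puts 2*A[i, j] on the diagonal
                        A[:, i] += A[:, j]
                        break
        if abs(A[i, i]) <= tol:
            continue                            # row and column i are (numerically) zero
        for j in range(i + 1, n):               # clear row/column i beyond the pivot
            f = A[j, i] / A[i, i]
            A[j] -= f * A[i]
            A[:, j] -= f * A[:, i]
    d = np.diag(A)
    s, t = int(np.sum(d > tol)), int(np.sum(d < -tol))
    return s, t, n - s - t

assert inertia([[0, 1], [1, 0]]) == (1, 1, 0)   # q = 2xy: rank 2, signature 0
assert inertia(np.eye(3)) == (3, 0, 0)          # positive definite
```

By Sylvester's Law of Inertia, the triple returned does not depend on the order of the operations, which is why the assertions above are safe.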
Chapter 6

Inner product spaces

Ordinary Euclidean space is a 3-dimensional vector space over R, but it is more than that: the extra geometric structure (lengths, angles, etc.) can all be derived from a special kind of bilinear form on the space known as an inner product. We examine inner product spaces and their linear maps in this chapter.

One can also define inner products for complex vector spaces, but things are a bit different: we have to use a form which is not quite bilinear. We defer this to Chapter 8.

6.1 Inner products and orthonormal bases

Definition 6.1 An inner product on a real vector space V is a function b : V × V → R satisfying

• b is bilinear (that is, b is linear in the first variable when the second is kept constant, and vice versa);

• b is positive definite, that is, b(v, v) ≥ 0 for all v ∈ V, and b(v, v) = 0 if and only if v = 0.

We usually write b(v, w) as v · w. An inner product is sometimes called a dot product (because of this notation).

Geometrically, in a real vector space, we define v · w = |v| |w| cos θ, where |v| and |w| are the lengths of v and w, and θ is the angle between v and w. Of course this definition doesn't work if either v or w is zero, but in this case v · w = 0. But it is much easier to reverse the process. Given an inner product on V, we define the length of v by |v| = √(v · v)
for any vector v ∈ V; and, if v and w are nonzero, we define the angle θ between them by

    cos θ = (v · w)/(|v| |w|).

For this definition to make sense, we need to know that

    −|v| |w| ≤ v · w ≤ |v| |w|

(since cos θ lies between −1 and 1), that is, (v · w)^2 ≤ (v · v)(w · w). This is the content of the Cauchy–Schwarz inequality:

Theorem 6.1 If v, w are vectors in an inner product space, then

    (v · w)^2 ≤ (v · v)(w · w).

Proof By definition, we have (v + xw) · (v + xw) ≥ 0 for any real number x. Expanding, we obtain

    x^2 (w · w) + 2x (v · w) + (v · v) ≥ 0.

This is a quadratic function in x which is nonnegative for all real x; so either it has no real roots, or it has two equal real roots. Thus its discriminant is nonpositive, that is,

    (v · w)^2 − (v · v)(w · w) ≤ 0,

as required.

Definition 6.2 A basis (v1, ..., vn) for an inner product space is called orthonormal if vi · vj = δij (the Kronecker delta) for 1 ≤ i, j ≤ n.

Remark If vectors v1, ..., vn satisfy vi · vj = δij, then they are necessarily linearly independent. For suppose that c1 v1 + · · · + cn vn = 0. Taking the inner product of this equation with vi, we find that ci = 0, for all i.

If we represent vectors in coordinates with respect to an orthonormal basis, say v = [x1 x2 ... xn]^T and w = [y1 y2 ... yn]^T, then

    v · w = x1 y1 + x2 y2 + · · · + xn yn.

There is essentially only one kind of inner product on a real vector space.

Theorem 6.2 Let · be an inner product on a real vector space V. Then there is an orthonormal basis (v1, ..., vn) for V.
Proof This follows from our reduction of quadratic forms in the last chapter. Since the inner product is bilinear, the function q(v) = v · v = |v|^2 is a quadratic form, and so it can be reduced to the form

    q = x1^2 + · · · + xs^2 − x_{s+1}^2 − · · · − x_{s+t}^2.

Now we must have s = n and t = 0. For, if t > 0, then the (s+1)st basis vector v_{s+1} satisfies v_{s+1} · v_{s+1} = −1, while if s + t < n, then the nth basis vector vn satisfies vn · vn = 0; either of these would contradict the positive definiteness of the inner product. So q(x1, ..., xn) = x1^2 + · · · + xn^2, that is, the basis vectors satisfy vi · vj = δij; and by polarisation we find that

    b((x1, ..., xn), (y1, ..., yn)) = x1 y1 + · · · + xn yn.

Definition 6.3 The inner product on R^n for which the standard basis is orthonormal (that is, the one given in the theorem) is called the standard inner product on R^n.

However, it is possible to give a more direct proof of the theorem; this is important because it involves a constructive method for finding an orthonormal basis, known as the Gram–Schmidt process. Let w1, ..., wn be any basis for V. The Gram–Schmidt process works as follows.

• Since w1 ≠ 0, we have w1 · w1 > 0, that is, |w1| > 0. Put v1 = w1/|w1|; then |v1| = 1, that is, v1 · v1 = 1.

• For i = 2, ..., n, let wi′ = wi − (v1 · wi)v1. Then

    v1 · wi′ = v1 · wi − (v1 · wi)(v1 · v1) = 0  for i = 2, ..., n.

• Now apply the Gram–Schmidt process recursively to (w2′, ..., wn′), obtaining vectors v2, ..., vn with vi · vj = δij for i, j = 2, ..., n. Since this step only modifies the vectors by adding multiples of one another, their inner products with v1 remain zero throughout; so, by what we have said, vi · vj = δij holds if i or j is 1 as well, and (v1, v2, ..., vn) is an orthonormal basis for V.
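The steps above can be sketched in code (a NumPy illustration, not part of the notes; the helper name is my own, and the input vectors are assumed linearly independent). The assertions check it against the worked example in these notes:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent vectors, in order."""
    basis = []
    for w in vectors:
        for v in basis:                  # subtract the components along the
            w = w - (v @ w) * v          # orthonormal vectors found so far
        basis.append(w / np.linalg.norm(w))
    return basis

v1, v2, v3 = gram_schmidt([np.array([1.0, 2.0, 2.0]),
                           np.array([1.0, 1.0, 0.0]),
                           np.array([1.0, 0.0, 0.0])])
assert np.allclose(v1, [1/3, 2/3, 2/3])
assert np.allclose(v2, [2/3, 1/3, -2/3])
assert np.allclose(v3, [2/3, -2/3, 1/3])
```

The loop structure (process one vector at a time, subtracting its components along all earlier vectors) produces the same result as the recursive description, just organised differently.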
Example 6.1 In R^3 (with the standard inner product), apply the Gram–Schmidt process to the vectors w1 = [1 2 2]^T, w2 = [1 1 0]^T, w3 = [1 0 0]^T. (To simplify things, I will write (a1, a2, a3) instead of [a1 a2 a3]^T.)

We have w1 · w1 = 9, so in the first step we put v1 = (1/3)w1 = (1/3, 2/3, 2/3). Now v1 · w2 = 1 and v1 · w3 = 1/3, so in the second step we find

    w2′ = w2 − v1 = (2/3, 1/3, −2/3),    w3′ = w3 − (1/3)v1 = (8/9, −2/9, −2/9).

Now we apply Gram–Schmidt recursively to w2′ and w3′. We have w2′ · w2′ = 1, so v2 = w2′ = (2/3, 1/3, −2/3). Then v2 · w3′ = 2/3, so

    w3″ = w3′ − (2/3)v2 = (4/9, −4/9, 2/9).

Finally, w3″ · w3″ = 4/9, so v3 = (3/2)w3″ = (2/3, −2/3, 1/3). Check that the three vectors we have found really do form an orthonormal basis.

6.2 Adjoints and orthogonal linear maps

We saw in the last chapter that a bilinear form on V is the same thing as a linear map from V to its dual space. Recall that the linear map α : V → V* corresponding to a bilinear form b on V satisfies α(v)(w) = b(v, w). The importance of an inner product is that the corresponding linear map is a bijection which maps an orthonormal basis of V to its dual basis in V*. For, if α(v)(w) = v · w, and α(vi) = fi, where (v1, ..., vn) is an orthonormal basis (so that vi · vj = δij), then we have fi(vj) = δij; but this is exactly the statement that (f1, ..., fn) is the dual basis to (v1, ..., vn).

Recall too that we defined the adjoint of α : V → V to be the map α* : V* → V* defined by α*(f)(v) = f(α(v)), and we showed that the matrix representing α* relative to the dual basis is the transpose of the matrix representing α relative to the original basis. Translating all this to an inner product space, we have the following definition and result:
Definition 6.4 Let V be an inner product space, and α : V → V a linear map. The adjoint of α is the linear map α* : V → V defined by

    v · α*(w) = α(v) · w  for all v, w ∈ V.

Proposition 6.3 If α is represented by the matrix A relative to an orthonormal basis of V, then α* is represented by the transposed matrix A^T.

Now we define two important classes of linear maps on V.

Definition 6.5 Let α be a linear map on an inner product space V.

(a) α is self-adjoint if α* = α.

(b) α is orthogonal if it is invertible and α* = α^{-1}.

Then, from Proposition 6.3, we see:

Proposition 6.4 If α is represented by a matrix A (relative to an orthonormal basis), then

(a) α is self-adjoint if and only if A is symmetric;

(b) α is orthogonal if and only if A^T A = I.

We will examine self-adjoint maps (or symmetric matrices) further in the next section. Part (a) of this result shows that we have yet another equivalence relation on real symmetric matrices:

Definition 6.6 Two real symmetric matrices are called orthogonally similar if they represent the same self-adjoint map with respect to different orthonormal bases.

Proposition 6.5 Two real symmetric matrices A and A′ are orthogonally similar if and only if there is an orthogonal matrix P such that

    A′ = P^{-1} A P = P^T A P.

(Here P^{-1} = P^T because P is orthogonal.) We see that orthogonal similarity is a refinement of both similarity and congruence.
Next we look at orthogonal maps.

Theorem 6.6 The following are equivalent for a linear map α on an inner product space V:

(a) α is orthogonal;

(b) α preserves the inner product, that is, α(v) · α(w) = v · w;

(c) α maps an orthonormal basis of V to an orthonormal basis.

Proof We have α(v) · α(w) = v · α*(α(w)), by the definition of adjoint. So α preserves the inner product if and only if α*α = I; thus (a) and (b) are equivalent.

Suppose that (v1, ..., vn) is an orthonormal basis, that is, vi · vj = δij. If (b) holds, then α(vi) · α(vj) = δij, so that (α(v1), ..., α(vn)) is an orthonormal basis, and (c) holds.

Conversely, suppose that (c) holds, and let v = ∑ xi vi and w = ∑ yi vi for some orthonormal basis (v1, ..., vn), so that v · w = ∑ xi yi. We have

    α(v) · α(w) = (∑ xi α(vi)) · (∑ yj α(vj)) = ∑ xi yi,

since α(vi) · α(vj) = δij by assumption. So (b) holds.

Corollary 6.7 α is orthogonal if and only if the columns of the matrix representing α relative to an orthonormal basis themselves form an orthonormal basis.

Proof The columns of the matrix representing α are just the vectors α(v1), ..., α(vn), written in coordinates relative to v1, ..., vn. So this follows from the equivalence of (a) and (c) in the theorem. Alternatively, the condition on columns shows that A^T A = I, where A is the matrix representing α, and so α is orthogonal.

Example Our earlier example of the Gram–Schmidt process produces the orthogonal matrix

    [ 1/3   2/3   2/3 ]
    [ 2/3   1/3  −2/3 ]
    [ 2/3  −2/3   1/3 ]

whose columns are precisely the orthonormal basis we constructed in the example.
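The corollary is easy to check numerically. A small sketch (NumPy, illustrative only) for the matrix in the example:

```python
import numpy as np

P = np.array([[1, 2, 2],
              [2, 1, -2],
              [2, -2, 1]]) / 3.0
assert np.allclose(P.T @ P, np.eye(3))   # columns orthonormal, so P^T P = I
# an orthogonal map preserves the inner product:
v, w = np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, -1.0])
assert np.isclose((P @ v) @ (P @ w), v @ w)
```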
Chapter 7

Symmetric and Hermitian matrices

We come to one of the most important topics of the course. In simple terms, any real symmetric matrix is diagonalisable. But there is more to be said!

7.1 Orthogonal projections and orthogonal decompositions

We say that two vectors v, w in an inner product space are orthogonal if v · w = 0.

Definition 7.1 Let V be a real inner product space, and U a subspace of V. The orthogonal complement of U is the set of all vectors which are orthogonal to everything in U:

    U⊥ = {w ∈ V : w · u = 0 for all u ∈ U}.

Proposition 7.1 If V is an inner product space and U a subspace of V, with dim(V) = n and dim(U) = r, then U⊥ is a subspace of V, and dim(U⊥) = n − r. Moreover, V = U ⊕ U⊥.

Proof Proving that U⊥ is a subspace is straightforward from the properties of the inner product. If w1, w2 ∈ U⊥, then w1 · u = w2 · u = 0 for all u ∈ U, so (w1 + w2) · u = 0 for all u ∈ U, whence w1 + w2 ∈ U⊥. The argument for scalar multiples is similar.

Now choose a basis for U and extend it to a basis for V. Then apply the Gram–Schmidt process to this basis (starting with the elements of the basis for U), to obtain an orthonormal basis (v1, ..., vn). Since the process only modifies vectors by adding multiples of earlier vectors, the first r vectors in the resulting basis will form an orthonormal basis for U. The last n − r vectors will be orthogonal to U, and so lie in U⊥; and they are clearly linearly independent. Now suppose that w ∈ U⊥ and w = ∑ ci vi. Then ci = w · vi = 0 for i = 1, ..., r, so w is a linear combination of the last n − r basis vectors, which thus form a basis of U⊥. Hence dim(U⊥) = n − r. Finally, V = U ⊕ U⊥, since we have a basis for V which is a disjoint union of bases for U and U⊥.

Recall the connection between direct sum decompositions and projections: if we have projections P1, ..., Pr whose sum is the identity and which satisfy Pi Pj = O for i ≠ j, then the space V is the direct sum of their images. This can be refined in an inner product space as follows.

Definition 7.2 Let V be an inner product space. A linear map π : V → V is an orthogonal projection if

(a) π is a projection, that is, π^2 = π;

(b) π is self-adjoint, that is, π* = π (where π*(v) · w = v · π(w) for all v, w ∈ V).

Proposition 7.2 If π is an orthogonal projection, then Ker(π) = Im(π)⊥.

Proof We know that V = Ker(π) ⊕ Im(π); we only have to show that these two subspaces are orthogonal. So take v ∈ Ker(π), so that π(v) = 0, and w ∈ Im(π), so that w = π(u) for some u ∈ V. Then

    v · w = v · π(u) = π*(v) · u = π(v) · u = 0,

as required.

Proposition 7.3 Let π1, ..., πr be orthogonal projections on an inner product space V satisfying π1 + · · · + πr = I and πi πj = O for i ≠ j. Let Ui = Im(πi) for i = 1, ..., r. Then V = U1 ⊕ U2 ⊕ · · · ⊕ Ur. Moreover, if ui ∈ Ui and uj ∈ Uj with i ≠ j, then ui and uj are orthogonal.

Proof The fact that V is the direct sum of the images of the πi follows from the connection between projections and direct sums recalled above. We only have to prove the last part. So take ui ∈ Ui and uj ∈ Uj, say ui = πi(v) and uj = πj(w). Then

    ui · uj = πi(v) · πj(w) = πi*(v) · πj(w) = v · πi(πj(w)) = 0,

where the second equality holds since πi is self-adjoint and the third is the definition of the adjoint.

A direct sum decomposition satisfying the conditions of the proposition is called an orthogonal decomposition of V. Conversely, if we are given an orthogonal decomposition of V, then we can find orthogonal projections satisfying the hypotheses of the proposition.
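An orthogonal projection is easy to construct explicitly: if the columns of a matrix U form an orthonormal basis of a subspace, then UU^T is the matrix of the projection onto that subspace. A NumPy sketch (the 5 × 2 example matrix is arbitrary, not from the notes):

```python
import numpy as np

# an orthonormal basis (columns of U) for a 2-dimensional subspace of R^5
U = np.linalg.qr(np.random.default_rng(1).standard_normal((5, 2)))[0]
Pi = U @ U.T                     # the orthogonal projection onto that subspace

assert np.allclose(Pi @ Pi, Pi)  # pi^2 = pi : a projection
assert np.allclose(Pi.T, Pi)     # pi* = pi  : self-adjoint
Q = np.eye(5) - Pi               # the complementary projection, onto U-perp
assert np.allclose(Pi @ Q, 0)    # the two images are orthogonal
```

Here Pi and Q play the roles of π1 and π2 in Proposition 7.3: they sum to the identity and their product is zero.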
7.2 The Spectral Theorem

The main theorem can be stated in two different ways. I emphasise that these two theorems are the same! Either of them can be referred to as the Spectral Theorem.

Theorem 7.4 If α is a self-adjoint linear map on a real inner product space V, then the eigenspaces of α form an orthogonal decomposition of V. Hence there is an orthonormal basis of V consisting of eigenvectors of α. Moreover, there exist orthogonal projections π1, ..., πr satisfying π1 + · · · + πr = I and πi πj = O for i ≠ j, such that

    α = λ1 π1 + · · · + λr πr,

where λ1, ..., λr are the distinct eigenvalues of α.

Theorem 7.5 Let A be a real symmetric matrix. Then there exists an orthogonal matrix P such that P^{-1}AP is diagonal. In other words, any real symmetric matrix is orthogonally similar to a diagonal matrix.

Proof The second theorem follows from the first, together with what we already know about diagonalisable maps, since the transition matrix from one orthonormal basis to another is an orthogonal matrix. So we concentrate on the first theorem. Choose an orthonormal basis; then α is represented by a real symmetric matrix A. It suffices to find an orthonormal basis of eigenvectors, since all the rest follows from our remarks about projections.

The proof will be by induction on n = dim(V). There is nothing to do if n = 1. So we assume that the theorem holds for (n−1)-dimensional spaces.

The first job is to show that α has an eigenvector. Its characteristic polynomial has a root λ over the complex numbers. (The so-called "Fundamental Theorem of Algebra" asserts that any polynomial over C has a root.) We temporarily enlarge the field from R to C. Now we can find a column vector v ∈ C^n such that Av = λv. Taking the complex conjugate, remembering that A is real, we have A\overline{v} = \overline{λ}\overline{v}. If v = [z1 z2 · · · zn]^T, then we have

    λ(|z1|^2 + |z2|^2 + · · · + |zn|^2) = (λv)^T \overline{v} = (Av)^T \overline{v} = v^T A \overline{v} = v^T (\overline{λ}\overline{v}) = \overline{λ}(|z1|^2 + |z2|^2 + · · · + |zn|^2),

so

    (λ − \overline{λ})(|z1|^2 + |z2|^2 + · · · + |zn|^2) = 0.

Since v is not the zero vector, the second factor is positive, so we must have λ = \overline{λ}; that is, λ is real.
Now since α has a real eigenvalue λ, we can choose a real eigenvector v, and (multiplying by a scalar if necessary) we can assume that |v| = 1. Let U be the subspace

    v⊥ = {u ∈ V : v · u = 0}.

This is a subspace of V of dimension n − 1. We claim that α maps U to U. For take u ∈ U. Then

    v · α(u) = α*(v) · u = α(v) · u = λ(v · u) = 0,

where we use the fact that α is self-adjoint. So α(u) ∈ U.

So α is a self-adjoint linear map on the (n−1)-dimensional inner product space U. By the inductive hypothesis, U has an orthonormal basis consisting of eigenvectors of α. Adding v to the basis, we get an orthonormal basis for V, and we are done.

Corollary 7.6 If α is self-adjoint, then eigenvectors of α corresponding to distinct eigenvalues are orthogonal.

Proof This follows from the theorem, but is easily proved directly. If α(v) = λv and α(w) = µw, then

    λ(v · w) = α(v) · w = α*(v) · w = v · α(w) = µ(v · w),

so, if λ ≠ µ, then v · w = 0.

Remark The theorem is almost a canonical form for real symmetric matrices under the relation of orthogonal congruence. If we require that the eigenvalues occur in decreasing order down the diagonal, then the result is a true canonical form: each matrix is orthogonally similar to a unique diagonal matrix with this property.

Example 7.1 Let

    A = [ 10   2   2 ]
        [  2  13   4 ]
        [  2   4  13 ].

The characteristic polynomial of A is

    | x−10   −2    −2  |
    |  −2   x−13   −4  |  =  (x − 9)^2 (x − 18),
    |  −2    −4   x−13 |

so the eigenvalues are 9 and 18. For eigenvalue 18 the eigenvectors satisfy

    [ 10   2   2 ] [x]   [18x]
    [  2  13   4 ] [y] = [18y]
    [  2   4  13 ] [z]   [18z],
so the eigenvectors are multiples of [1 2 2]^T. Normalising, we can choose a unit eigenvector [1/3 2/3 2/3]^T.

For the eigenvalue 9, the eigenvectors satisfy

    [ 10   2   2 ] [x]   [9x]
    [  2  13   4 ] [y] = [9y]
    [  2   4  13 ] [z]   [9z],

that is, x + 2y + 2z = 0. (This condition says precisely that the eigenvectors are orthogonal to the eigenvector for λ = 18, as we know.) Thus the eigenspace is 2-dimensional. We need to choose an orthonormal basis for it. This can be done in many different ways: for example, we could choose

    [0  1/√2  −1/√2]^T  and  [−4/(3√2)  1/(3√2)  1/(3√2)]^T.

Then we have an orthonormal basis of eigenvectors. We conclude that, if

    P = [ 1/3     0      −4/(3√2) ]
        [ 2/3    1/√2     1/(3√2) ]
        [ 2/3   −1/√2     1/(3√2) ],

then P is orthogonal, and

    P^T A P = [ 18  0  0 ]
              [  0  9  0 ]
              [  0  0  9 ].

You might like to check that the orthogonal matrix in the example in the last chapter also diagonalises A.
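The example can be checked numerically. In the sketch below (NumPy, not part of the notes; `eigh` returns eigenvalues in ascending order with orthonormal eigenvectors, so its P may differ from the one above by a different choice of basis in the 9-eigenspace):

```python
import numpy as np

A = np.array([[10.0, 2.0, 2.0],
              [2.0, 13.0, 4.0],
              [2.0, 4.0, 13.0]])
evals, P = np.linalg.eigh(A)              # eigh: symmetric eigenproblem
assert np.allclose(evals, [9.0, 9.0, 18.0])
assert np.allclose(P.T @ P, np.eye(3))    # P is orthogonal
assert np.allclose(P.T @ A @ P, np.diag(evals))
```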
7.3 Quadratic forms revisited

Any real quadratic form is represented by a real symmetric matrix; and, as we have seen, orthogonal similarity is a refinement of congruence. This gives us a new look at the reduction of real quadratic forms. Recall that any real symmetric matrix is congruent to one of the form

    [ Is   O    O ]
    [ O   −It   O ]
    [ O    O    O ],

where the numbers s and t are uniquely determined: s + t is the rank, and s − t the signature, of the matrix (Sylvester's Law of Inertia).
Proposition 7.7 The rank of a real symmetric matrix is equal to the number of nonzero eigenvalues, and the signature is the number of positive eigenvalues minus the number of negative eigenvalues (counted according to multiplicity).
Proof Given a real symmetric matrix A, there is an orthogonal matrix P such that P^T A P is diagonal, with diagonal entries λ1, ..., λn, say. Suppose that λ1, ..., λs are positive, λ_{s+1}, ..., λ_{s+t} are negative, and the remainder are zero. Let D be the diagonal matrix with diagonal entries

    1/√λ1, ..., 1/√λs, 1/√(−λ_{s+1}), ..., 1/√(−λ_{s+t}), 1, ..., 1.

Then

    (PD)^T A (PD) = D^T P^T A P D = [ Is   O    O ]
                                    [ O   −It   O ]
                                    [ O    O    O ],

as required.
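Proposition 7.7 translates directly into code. A NumPy sketch (the function name and tolerance are my own, not from the notes):

```python
import numpy as np

def rank_and_signature(A, tol=1e-9):
    """Rank and signature of a real symmetric matrix via its
    eigenvalues, counted with multiplicity (Proposition 7.7)."""
    evals = np.linalg.eigvalsh(A)
    pos = int(np.sum(evals > tol))
    neg = int(np.sum(evals < -tol))
    return pos + neg, pos - neg

assert rank_and_signature(np.array([[0.0, 1.0], [1.0, 0.0]])) == (2, 0)
assert rank_and_signature(np.diag([5.0, 2.0, 0.0])) == (2, 2)
```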
7.4 Simultaneous diagonalisation

There are two important theorems which allow us to diagonalise more than one matrix at the same time. The first theorem we will consider just in the matrix form.

Theorem 7.8 Let A and B be real symmetric matrices, and suppose that A is positive definite. Then there exists an invertible matrix P such that P^T A P = I and P^T B P is diagonal. Moreover, the diagonal entries of P^T B P are the roots of the polynomial det(xA − B) = 0.

Proof A is a real symmetric matrix, so there exists an invertible matrix P1 such that P1^T A P1 is in the canonical form for congruence (as in Sylvester's Law of Inertia). Since A is positive definite, this canonical form must be I; that is, P1^T A P1 = I.

Now consider P1^T B P1 = C. This is a real symmetric matrix; so, according to the Spectral Theorem (in matrix form), we can find an orthogonal matrix P2 such that P2^T C P2 = D is diagonal. Moreover, P2 is orthogonal, so P2^T P2 = I.

Let P = P1 P2. Then

    P^T A P = P2^T (P1^T A P1) P2 = P2^T I P2 = I,

and

    P^T B P = P2^T (P1^T B P1) P2 = P2^T C P2 = D,

as required.
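The two-step construction in the proof can be followed literally in code. A NumPy sketch (the helper name and the test matrices are my own):

```python
import numpy as np

def simultaneous_diagonalise(A, B):
    """A symmetric positive definite, B symmetric.
    Return P with P^T A P = I and P^T B P diagonal (Theorem 7.8)."""
    mu, Q = np.linalg.eigh(A)           # A = Q diag(mu) Q^T with all mu > 0
    P1 = Q @ np.diag(1 / np.sqrt(mu))   # so P1^T A P1 = I
    C = P1.T @ B @ P1                   # symmetric; diagonalise it orthogonally
    _, P2 = np.linalg.eigh(C)
    return P1 @ P2

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # positive definite
B = np.array([[1.0, 0.0], [0.0, -1.0]])
P = simultaneous_diagonalise(A, B)
assert np.allclose(P.T @ A @ P, np.eye(2))
D = P.T @ B @ P
assert np.allclose(D, np.diag(np.diag(D)))
# the diagonal entries are the roots of det(xA - B) = 3x^2 - 1 = 0:
assert np.allclose(np.sort(np.diag(D)), [-1/np.sqrt(3), 1/np.sqrt(3)])
```

Here the congruence P1 taking A to I is built from A's spectral decomposition rather than from the Gaussian reduction used earlier; either works, since only the congruence relation matters.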
The diagonal entries of D are the eigenvalues of C, that is, the roots of the equation det(xI − C) = 0. Now we have

    det(P1^T) det(xA − B) det(P1) = det(P1^T (xA − B) P1) = det(x P1^T A P1 − P1^T B P1) = det(xI − C),

and det(P1^T) = det(P1) is nonzero; so the polynomials det(xA − B) and det(xI − C) are nonzero multiples of each other, and so have the same roots.

You might meet this result in mechanics. If a mechanical system has n coordinates x1, ..., xn, then the kinetic energy is a quadratic form in the velocities \dot{x}_1, ..., \dot{x}_n, and (from general physical principles) is positive definite (zero velocities correspond to minimum energy); near equilibrium, the potential energy is approximated by a quadratic function of the coordinates x1, ..., xn. If we simultaneously diagonalise the matrices of the two quadratic forms, then we can solve n separate differential equations rather than a complicated system with n variables!

The second theorem can be stated either for linear maps or for matrices.

Theorem 7.9 (a) Let α and β be self-adjoint maps on an inner product space V, and suppose that αβ = βα. Then there is an orthonormal basis for V which consists of simultaneous eigenvectors for α and β.

(b) Let A and B be real symmetric matrices satisfying AB = BA. Then there is an orthogonal matrix P such that both P^T A P and P^T B P are diagonal.

Proof Statement (b) is just a translation of (a) into matrix terms; so we prove (a).

Let λ1, ..., λr be the distinct eigenvalues of α. By the Spectral Theorem, we have an orthogonal decomposition V = U1 ⊕ · · · ⊕ Ur, where Ui is the λi-eigenspace of α. We claim that β maps Ui to Ui. For take u ∈ Ui, so that α(u) = λi u. Then

    α(β(u)) = β(α(u)) = β(λi u) = λi β(u),

so β(u) is also an eigenvector of α with eigenvalue λi. Hence β(u) ∈ Ui, as required.

Now β is a self-adjoint linear map on the inner product space Ui, and so by the Spectral Theorem again, Ui has an orthonormal basis consisting of eigenvectors of β. But these vectors are also eigenvectors of α, since they belong to Ui. Finally, since we have an orthogonal decomposition, putting together all these bases gives us an orthonormal basis of V consisting of simultaneous eigenvectors of α and β.
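This proof is likewise constructive: diagonalise α, then diagonalise β inside each eigenspace of α. A NumPy sketch (helper name, tolerance and example matrices are mine; eigenvalues of A are grouped numerically):

```python
import numpy as np

def jointly_diagonalise(A, B, tol=1e-8):
    """A, B real symmetric with AB = BA: return an orthogonal P with
    P^T A P and P^T B P both diagonal (Theorem 7.9(b))."""
    w, Q = np.linalg.eigh(A)            # eigenvalues ascending, so equal
    cols, i, n = [], 0, len(w)          # eigenvalues are adjacent
    while i < n:
        j = i
        while j < n and abs(w[j] - w[i]) < tol:
            j += 1                      # w[i:j] is one (numerical) eigenvalue
        U = Q[:, i:j]                   # orthonormal basis of its eigenspace
        _, R = np.linalg.eigh(U.T @ B @ U)   # beta restricted to the eigenspace
        cols.append(U @ R)
        i = j
    return np.hstack(cols)

A = np.array([[1.0, 1.0], [1.0, 1.0]])
B = np.array([[2.0, 3.0], [3.0, 2.0]])
assert np.allclose(A @ B, B @ A)        # they commute
P = jointly_diagonalise(A, B)
assert np.allclose(P.T @ P, np.eye(2))
for M in (A, B):
    D = P.T @ M @ P
    assert np.allclose(D, np.diag(np.diag(D)))
```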
Remark This theorem easily extends to an arbitrary set of real symmetric matrices such that any two commute. For a finite set, the proof is by induction on the number of matrices in the set, based on the proof just given. For an infinite set, we use the fact that the matrices span a finite-dimensional subspace of the space of all real symmetric matrices; to diagonalise all the matrices in our set, it suffices to diagonalise the matrices in a basis.
Chapter 8

The complex case

The theory of real inner product spaces and self-adjoint linear maps has a close parallel in the complex case. This time, some changes are required; but it is possible to modify the definitions so that everything works in the same way over C. In this chapter we outline the complex case. Usually, the proofs are similar to those in the real case.

8.1 Complex inner products

There are no positive definite bilinear forms over the complex numbers, for we always have (iv) · (iv) = −v · v. But the definition can be modified as follows.

Definition 8.1 An inner product on a complex vector space V is a map b : V × V → C satisfying

(a) b is a linear function of its second variable, keeping the first variable constant;

(b) b(w, v) = \overline{b(v, w)} for v, w ∈ V, where \overline{\phantom{x}} denotes complex conjugation [it follows that b(v, v) ∈ R for all v ∈ V];

(c) b is positive definite: b(v, v) ≥ 0 for all v ∈ V, and b(v, v) = 0 if and only if v = 0.

As before, we write b(v, w) as v · w. Note that b is not linear as a function of its first variable: in fact we have

    b(v1 + v2, w) = b(v1, w) + b(v2, w),    b(cv, w) = \overline{c} b(v, w)

for v1, v2, v, w ∈ V and c ∈ C. (Sometimes we say that b is semilinear, that is, "½-linear", as a function of its first variable, and describe it as a sesquilinear form, that is, "1½-linear".) A form satisfying (b) is called Hermitian, and one satisfying (c) is positive definite. Thus an inner product is a positive definite Hermitian sesquilinear form.

The standard inner product (with respect to an orthonormal basis) is given by

    v · w = \overline{x1} y1 + · · · + \overline{xn} yn,

where v = [x1 ... xn]^T and w = [y1 ... yn]^T. The definition of an orthonormal basis is exactly as in the real case, and the Gram–Schmidt process allows us to find one with only trivial modifications.

The adjoint of α : V → V is defined as before by the formula α*(v) · w = v · α(w), but this time there is a small difference in the matrix representation: if α is represented by A (relative to an orthonormal basis), then its adjoint α* is represented by \overline{A}^T. (Take the complex conjugates of all the entries in A, and then transpose.) So

• a self-adjoint linear map (α* = α) is represented by a matrix A satisfying A = \overline{A}^T: such a matrix is called Hermitian;

• a map which preserves the inner product (that is, which satisfies α(v) · α(w) = v · w, or α* = α^{-1}) is represented by a matrix A satisfying \overline{A}^T = A^{-1}: such a matrix is called unitary.

The definition of an orthogonal projection is the same: a projection which is self-adjoint.

8.2 The complex Spectral Theorem

The spectral theorem for self-adjoint linear maps on complex inner product spaces is almost identical to the real version. The proof goes through virtually unchanged.

Theorem 8.1 If α is a self-adjoint linear map on a complex inner product space V, then the eigenspaces of α form an orthogonal decomposition of V. Hence there is an orthonormal basis of V consisting of eigenvectors of α. Moreover, there exist orthogonal projections π1, ..., πr satisfying π1 + · · · + πr = I and πi πj = O for i ≠ j, such that α = λ1 π1 + · · · + λr πr, where λ1, ..., λr are the distinct eigenvalues of α.
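Over C the same computations go through with conjugate transposes in place of transposes. A NumPy sketch for a small Hermitian matrix (the matrix is my own example; `numpy.linalg.eigh` accepts Hermitian input):

```python
import numpy as np

H = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
assert np.allclose(H, H.conj().T)          # H is Hermitian
evals, P = np.linalg.eigh(H)               # eigh handles Hermitian input
assert np.allclose(evals, [1.0, 4.0])      # the eigenvalues are real
assert np.allclose(P.conj().T @ P, np.eye(2))   # P is unitary
assert np.allclose(P.conj().T @ H @ P, np.diag(evals))
```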
Theorem 8.2 Let A be a complex Hermitian matrix. Then there exists a unitary matrix P such that P^{-1}AP is diagonal.

There is one special feature of the complex case:

Proposition 8.3 Any eigenvalue of a self-adjoint linear map on a complex inner product space (or of a complex Hermitian matrix) is real.

Proof Suppose that α is self-adjoint and α(v) = λv. Then

    λ(v · v) = v · α(v) = α*(v) · v = α(v) · v = \overline{λ}(v · v),

where in the last step we use the fact that (cv) · w = \overline{c}(v · w) for a complex inner product. So (λ − \overline{λ})(v · v) = 0. Since v ≠ 0, we have v · v ≠ 0, and so λ = \overline{λ}; that is, λ is real.

We also have a theorem on simultaneous diagonalisation:

Proposition 8.4 Let α and β be self-adjoint linear maps of a complex inner product space V, and suppose that αβ = βα. Then there is an orthonormal basis for V consisting of eigenvectors of both α and β.

The proof is as in the real case. You are invited to formulate the theorem in terms of commuting Hermitian matrices.

8.3 Normal matrices

The fact that the eigenvalues of a complex Hermitian matrix are real leaves open the possibility of proving a more general version of the spectral theorem. We saw that a real symmetric matrix is orthogonally similar to a diagonal matrix. In fact, the converse is also true: a real matrix is orthogonally similar to a diagonal matrix if and only if it is symmetric. For if A is a real n × n matrix and P is an orthogonal matrix such that P^T A P = D is diagonal, then A = P D P^T, and so A^T = P D^T P^T = P D P^T = A.

This is not true for complex Hermitian matrices, since such matrices have real eigenvalues and so cannot be similar to non-real diagonal matrices. What really happens is the following.

Definition 8.2 (a) Let α be a linear map on a complex inner product space V. We say that α is normal if it commutes with its adjoint: αα* = α*α.
(b) Let A be an n × n matrix over C. We say that A is normal if it commutes with its conjugate transpose: A(Ā)ᵀ = (Ā)ᵀA.

Theorem 8.5 (a) Let α be a linear map on a complex inner product space V. Then V has an orthonormal basis consisting of eigenvectors of α if and only if α is normal.

(b) Let A be an n × n matrix over C. Then there is a unitary matrix P such that P⁻¹AP is diagonal if and only if A is normal.

Proof As usual, the two forms of the theorem are equivalent. We prove it in the first form.

If α has an orthonormal basis (v1, . . . , vn) consisting of eigenvectors, then α(vi) = λi vi for i = 1, . . . , n, where the λi are the eigenvalues. We see that α∗(vi) = λ̄i vi, and so

    αα∗(vi) = α∗α(vi) = λi λ̄i vi.

Since αα∗ and α∗α agree on the vectors of a basis, they are equal; so α is normal.

Conversely, suppose that α is normal. Let

    β = ½(α + α∗),    γ = (1/2i)(α − α∗).

(You should compare these with the formulae x = ½(z + z̄), y = (1/2i)(z − z̄) for the real and imaginary parts of a complex number z.) Now we claim:

• β and γ are Hermitian. For

    β∗ = ½(α∗ + α) = β,    γ∗ = −(1/2i)(α∗ − α) = γ,

where we use the fact that (cα)∗ = c̄ α∗.

• βγ = γβ. For

    βγ = (1/4i)(α² − αα∗ + α∗α − (α∗)²) = (1/4i)(α² − (α∗)²),
    γβ = (1/4i)(α² + αα∗ − α∗α − (α∗)²) = (1/4i)(α² − (α∗)²).

(Here we use the fact that αα∗ = α∗α.)

So, by the Proposition at the end of the last section, there is an orthonormal basis B whose vectors are eigenvectors of β and γ, and hence are eigenvectors of α, since clearly we have α = β + iγ.

Note that the eigenvalues of β and γ in this proof are the real and imaginary parts of the eigenvalues of α, so the analogy with complex numbers is even closer.
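The decomposition α = β + iγ used in the proof can be checked directly on a small example. The sketch below (Python, illustrative only; the 90-degree rotation matrix is my own choice of a normal, non-Hermitian example) verifies that β and γ are Hermitian, that they commute, and that β + iγ recovers A.

```python
# Small helpers for 2x2 complex matrices, written just to illustrate the
# proof of Theorem 8.5; nothing here comes from the notes themselves.

def conj_transpose(M):
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def approx_eq(M, N, eps=1e-12):
    return all(abs(M[i][j] - N[i][j]) < eps for i in range(2) for j in range(2))

# A normal matrix that is not Hermitian: rotation through 90 degrees.
A = [[0j, -1 + 0j], [1 + 0j, 0j]]
Astar = conj_transpose(A)
assert approx_eq(matmul(A, Astar), matmul(Astar, A))   # A is normal

# The decomposition in the proof: beta = (A + A*)/2, gamma = (A - A*)/(2i).
beta  = [[(A[i][j] + Astar[i][j]) / 2        for j in range(2)] for i in range(2)]
gamma = [[(A[i][j] - Astar[i][j]) / (2 * 1j) for j in range(2)] for i in range(2)]

assert approx_eq(beta, conj_transpose(beta))            # beta is Hermitian
assert approx_eq(gamma, conj_transpose(gamma))          # gamma is Hermitian
assert approx_eq(matmul(beta, gamma), matmul(gamma, beta))  # they commute

# And A = beta + i*gamma.
B = [[beta[i][j] + 1j * gamma[i][j] for j in range(2)] for i in range(2)]
assert approx_eq(B, A)
```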
v) + b(v. as we will see. this statement applies to “alternating matrices”. (More precisely. v + w) = b(v. It turns out that life is much simpler for skewsymmetric matrices. Proposition 9. w) + b(w. w) = b(v. 85 . w) for all v.) 9. v) for any v. even then we could only ﬁnd canonical forms for the real and complex numbers.Chapter 9 Skewsymmetric matrices We spent the last three chapters looking at symmetric matrices.1 Alternating bilinear forms Alternating forms are as far from positive deﬁnite as they can be: Deﬁnition 9. using the deﬁnition of an alternating bilinear form. but these are precisely the same as skewsymmetric matrices unless the characteristic of the ﬁeld is 2. Now here is the analogue of the Gram–Schmidt process for alternating bilinear forms. w) + b(w. A bilinear form b on V is alternating if b(v.1 Let V be a vector space over K. v) = 0 for all v ∈ V . v) = −b(v. w ∈ V .1 An alternating bilinear form b satisﬁes b(w. w ∈ V . Proof 0 = b(v + w. v) + b(w. We ﬁnd a canonical form for these matrices under congruence which works for any ﬁeld whatever.
. w1 . v)b(u. . If the characteristic of K is not 2.2 Let b be an alternating bilinear form on a vector space V . . and so by induction there is a basis of the required form for W . us . Proof If b is identically zero. v). and any alternating matrix represents an alternating bilinear form. u) + b(u. then the extra condition is needed. ws . v)b(w. zt ) for V such that b(ui . v)u + b(u. . x) = b(w. . Putting in u1 and w1 gives the required basis for V . ws . . A matrix A is alternating if A is skewsymmetric and has zero diagonal. s and b(x. we can replace “alternating matrix” by “skewsymmetric matrix”. Now let U = u. Then there is a basis (u1 . vn ): its (i. then any skewsymmetric matrix is alternating. v)b(w. 0 = b(w. Choose a pair of vectors u and w such that c = b(u. . So suppose not.3 An alternating bilinear form b on a vector space over K is represented by an alternating matrix. Then b(u. j) entry is b(vi . . . t = n. Recall the matrix representing a bilinear form b relative to a basis (v1 . v)b(u. . . . . If the characteristic of the ﬁeld K is not equal to 2. . but if the characteristic is 2. . say (u2 . . us . ui ) = −1 for i = 1. . . then simply choose a basis (z1 . . Now b is an alternating bilinear form on W . and so our assertion is proved. z1 . w) = b(w. . . wi ) = 1 and b(wi . Proposition 9. w and W = {x ∈ V : b(u. b(w. v − x) = 0. w) = d. u) + db(u. . . zn ) and take s = 0. zt ). . x) = −b(w. . . . . We take u1 = u and w1 = v as our ﬁrst two basis vectors. x) = −b(w. w) = −c. v)w. . v − x) = b(w. 9. . . The argument just above already shows that U ∩W = 0. . y) = 0 for any other choices of basis vectors x and y. z1 . v j ). cu + dw) = cb(u. . u) + b(u. v) so b(u. so we have to show that V = U +W . Then 0 = b(u. and let x = −b(w. . . cu + dw) = cb(w. SKEWSYMMETRIC MATRICES Theorem 9. w) = 1. so c = d = 0.86 CHAPTER 9. x) = 0}. Replacing w by w/c.2 Skewsymmetric and alternating matrices A matrix A is skewsymmetric if A = −A. u) + db(w. . . 
We claim that V = U ⊕W . Thus v − x ∈ W . For suppose that cu + dw = 0. we have b(u. w) = b(u. w) = 0. w2 . We claim that u and w are linearly independent. So take a vector v ∈ V . But clearly x ∈ U.
Proof This is obvious, since if b is alternating then aji = b(vj, vi) = −b(vi, vj) = −aij, and aii = b(vi, vi) = 0.

The matrix representing b relative to the special basis that we found in the preceding theorem has the shape appearing in the next statement; so we can write our theorem in matrix form as follows:

Theorem 9.4 Let A be an alternating matrix (or a skewsymmetric matrix over a field whose characteristic is not equal to 2). Then there is an invertible matrix P such that PᵀAP is the matrix with s blocks

    [  0 1 ]
    [ −1 0 ]

on the diagonal and all other entries zero. Moreover the number s is half the rank of A, and so is independent of the choice of P.

Proof We know that the effect of a change of basis with transition matrix P is to replace the matrix A representing a bilinear form by PᵀAP, so this is just the preceding theorem in matrix form.

This has a corollary which is a bit surprising at first sight:

Corollary 9.5 (a) The rank of a skewsymmetric matrix (over a field of characteristic not equal to 2) is even.

(b) The determinant of a skewsymmetric matrix (over a field of characteristic not equal to 2) is a square.

Proof (a) The canonical form in the theorem clearly has rank 2s.

(b) If the skewsymmetric matrix A is singular, then its determinant is zero, which is a square. So suppose that it is invertible. If the size n of A is odd, then the rank cannot be n (by (a)), and so det(A) = 0; so suppose that n is even. Then the canonical form has s = n/2 blocks

    [  0 1 ]
    [ −1 0 ]

on the diagonal. Each of these blocks has determinant 1, and hence so does the whole canonical matrix. So det(PᵀAP) = det(P)² det(A) = 1, whence det(A) = 1/(det(P)²), which is a square.

Remark There is a function defined on skewsymmetric matrices called the Pfaffian, which like the determinant is a polynomial in the matrix entries, is zero if the size of the matrix is odd, and has the property that det(A) is the square of the Pfaffian of A: that is,

    det(A) = (Pf(A))².

For example,

    Pf [  0 a ] = a,
       [ −a 0 ]

    Pf [  0  a  b  c ]
       [ −a  0  d  e ] = af − be + cd.
       [ −b −d  0  f ]
       [ −c −e −f  0 ]

(Check that the determinant of the second matrix is (af − be + cd)².)
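The remark about the Pfaffian can be verified numerically. This check (Python, not from the notes; the entry values 2, 3, 5, 7, 11, 13 are arbitrary) confirms det(A) = (af − be + cd)² for the 4 × 4 alternating matrix above, and that a skewsymmetric matrix of odd size has determinant zero.

```python
from itertools import permutations

# Determinant by the Leibniz expansion; fine for matrices this small.
def det(M):
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if perm[i] > perm[j])
        sign = -1 if inv % 2 else 1
        prod = 1
        for i in range(n):
            prod *= M[i][perm[i]]
        total += sign * prod
    return total

# The 4x4 alternating matrix from the example, with arbitrary entries.
a, b, c, d, e, f = 2, 3, 5, 7, 11, 13
A = [[ 0,  a,  b,  c],
     [-a,  0,  d,  e],
     [-b, -d,  0,  f],
     [-c, -e, -f,  0]]
assert det(A) == (a * f - b * e + c * d) ** 2   # det(A) = Pf(A)^2

# A skewsymmetric matrix of odd size has determinant zero.
B = [[0, 1, 2], [-1, 0, 3], [-2, -3, 0]]
assert det(B) == 0
```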
9.3 Complex skewHermitian matrices

What if we play the same variation that led us from real symmetric to complex Hermitian matrices? That is, we are working in a complex inner product space, and if α is represented by the matrix A, then its adjoint is represented by (Ā)ᵀ, the conjugate transpose of A. The matrix A is Hermitian if it is equal to its adjoint, that is, if A = (Ā)ᵀ. So we make the following definition:

Definition 9.2 The complex n × n matrix A is skewHermitian if (Ā)ᵀ = −A.

Actually, things are very much simpler here, because of the following observation:

Proposition 9.6 The matrix A is skewHermitian if and only if iA is Hermitian.

Proof Try it and see!

Corollary 9.7 Any skewHermitian matrix can be diagonalised by a unitary matrix.

Proof This follows immediately from the preceding Proposition. Alternatively, a skewHermitian matrix is obviously normal, and the Corollary follows from our result about normal matrices (Theorem 8.5).

Since the eigenvalues of a Hermitian matrix are real, we see that the eigenvalues of a skewHermitian matrix are purely imaginary.
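Both Proposition 9.6 and the closing observation can be checked on a small example. The following sketch (Python, added for illustration; the particular matrix is my own choice) verifies that a 2 × 2 skewHermitian matrix satisfies the definition, that iA is Hermitian, and that its eigenvalues are purely imaginary.

```python
import cmath

# A 2x2 skewHermitian matrix: purely imaginary diagonal, and
# off-diagonal entries whose conjugates are negatives of each other.
A = [[2j, 1 + 1j],
     [-1 + 1j, -3j]]
assert all(A[j][i].conjugate() == -A[i][j] for i in range(2) for j in range(2))

# Proposition 9.6: iA is Hermitian.
B = [[1j * A[i][j] for j in range(2)] for i in range(2)]
assert all(B[j][i].conjugate() == B[i][j] for i in range(2) for j in range(2))

# Eigenvalues from x^2 - tr(A)x + det(A): both purely imaginary.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = cmath.sqrt(tr * tr - 4 * det)
eigs = [(tr + disc) / 2, (tr - disc) / 2]
assert all(abs(l.real) < 1e-12 for l in eigs)
```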
Appendix A

Fields and vector spaces

Fields

A field is an algebraic structure K in which we can add and multiply elements, such that the following laws hold:

Addition laws

(FA0) For any a, b ∈ K, there is a unique element a + b ∈ K.

(FA1) For all a, b, c ∈ K, we have a + (b + c) = (a + b) + c.

(FA2) There is an element 0 ∈ K such that a + 0 = 0 + a = a for all a ∈ K.

(FA3) For any a ∈ K, there exists −a ∈ K such that a + (−a) = (−a) + a = 0.

(FA4) For all a, b ∈ K, we have a + b = b + a.

Multiplication laws

(FM0) For any a, b ∈ K, there is a unique element ab ∈ K.

(FM1) For all a, b, c ∈ K, we have a(bc) = (ab)c.

(FM2) There is an element 1 ∈ K, not equal to the element 0 from (FA2), such that a1 = 1a = a for all a ∈ K.

(FM3) For any a ∈ K with a ≠ 0, there exists a⁻¹ ∈ K such that aa⁻¹ = a⁻¹a = 1.

(FM4) For all a, b ∈ K, we have ab = ba.

Distributive law

(D) For all a, b, c ∈ K, we have a(b + c) = ab + ac.
Note the similarity of the addition and multiplication laws. We say that (K, +) is an abelian group if (FA0)–(FA4) hold. Then (FM0)–(FM4) say that (K \ {0}, ·) is also an abelian group. (We have to leave out 0 because, as (FM3) says, 0 does not have a multiplicative inverse.)

Examples of fields include Q (the rational numbers), R (the real numbers), C (the complex numbers), and F_p (the integers mod p, for p a prime number).

Associated with any field K there is a non-negative integer called its characteristic, defined as follows. Denote the sum of n ones by n · 1. If there is a positive integer n such that 1 + 1 + · · · + 1 = 0, where there are n ones in the sum, then the smallest such n is prime, and this prime number is the characteristic of K. (For if n = rs, with r, s > 1, then 0 = n · 1 = (r · 1)(s · 1); by minimality of n, neither of the factors r · 1 and s · 1 is zero. But in a field, the product of two nonzero elements is nonzero.) If no such n exists, we say that the characteristic of K is zero. For our important examples, Q, R and C all have characteristic zero, while F_p has characteristic p.

Vector spaces

Let K be a field. A vector space V over K is an algebraic structure in which we can add two elements of V, and multiply an element of V by an element of K (this is called scalar multiplication), such that the following rules hold:

Addition laws

(VA0) For any u, v ∈ V, there is a unique element u + v ∈ V.

(VA1) For all u, v, w ∈ V, we have u + (v + w) = (u + v) + w.

(VA2) There is an element 0 ∈ V such that v + 0 = 0 + v = v for all v ∈ V.

(VA3) For any v ∈ V, there exists −v ∈ V such that v + (−v) = (−v) + v = 0.

(VA4) For all u, v ∈ V, we have u + v = v + u.

Scalar multiplication laws

(VM0) For any a ∈ K, v ∈ V, there is a unique element av ∈ V.

(VM1) For any a ∈ K, u, v ∈ V, we have a(u + v) = au + av.

(VM2) For any a, b ∈ K, v ∈ V, we have (a + b)v = av + bv.

(VM3) For any a, b ∈ K, v ∈ V, we have (ab)v = a(bv).
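The delicate axiom for F_p is (FM3), the existence of multiplicative inverses: it holds exactly when the modulus is prime. A quick brute-force check (Python, added here for illustration and not part of the notes):

```python
# (FM3) for the integers mod n: every nonzero element has an inverse.
def has_all_inverses(n):
    return all(any(a * b % n == 1 for b in range(1, n)) for a in range(1, n))

assert has_all_inverses(5)        # F_5 is a field
assert has_all_inverses(7)        # F_7 is a field
assert not has_all_inverses(6)    # Z_6 is not: 2, 3 and 4 have no inverse

# The characteristic of F_p is p: the smallest n with n * 1 = 0 (mod p).
p = 7
char = next(n for n in range(1, p + 1) if n % p == 0)
assert char == p
```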
(VM4) For any v ∈ V, we have 1v = v (where 1 is the element given by (FM2)).

Again, we can summarise (VA0)–(VA4) by saying that (V, +) is an abelian group.

The most important example of a vector space over a field K is the set Kⁿ of all n-tuples of elements of K; the addition and scalar multiplication are defined by the rules

    (u1, u2, . . . , un) + (v1, v2, . . . , vn) = (u1 + v1, u2 + v2, . . . , un + vn),
    a(v1, v2, . . . , vn) = (av1, av2, . . . , avn).

The fact that Kⁿ is a vector space will be assumed here. Proofs are straightforward but somewhat tedious. Here, as an example, is a particularly easy one, the proof of (VM4). If v = (v1, v2, . . . , vn), then

    1v = 1(v1, v2, . . . , vn) = (1v1, 1v2, . . . , 1vn) = (v1, v2, . . . , vn) = v.

The second step uses the definition of scalar multiplication in Kⁿ, and the third step uses the field axiom (FM2).
Appendix B

Vandermonde and circulant matrices

The Vandermonde matrix V(a1, a2, . . . , an) is the n × n matrix

    [ 1      1      . . .  1      ]
    [ a1     a2     . . .  an     ]
    [ a1²    a2²    . . .  an²    ]
    [ . . .                       ]
    [ a1ⁿ⁻¹  a2ⁿ⁻¹  . . .  anⁿ⁻¹  ]

This is a particularly important type of matrix, and we can write down its determinant explicitly:

Theorem B.1 det(V(a1, a2, . . . , an)) = ∏_{i<j} (aj − ai).

That is, the determinant is the product of the differences between all pairs of parameters ai.

From this theorem, we draw the following conclusion:

Corollary B.2 The matrix V(a1, a2, . . . , an) is invertible if and only if the parameters a1, a2, . . . , an are all distinct. For the determinant can be zero only if one of the factors vanishes.

Proof To prove the theorem, we first regard an as a variable x, so that the determinant Δ is a polynomial f(x) of degree n − 1 in x. We see that f(ai) = 0 for 1 ≤ i ≤ n − 1, since the result is the determinant of a matrix with two equal columns. By the Factor Theorem,

    Δ = K(x − a1)(x − a2) · · · (x − a_{n−1}),

where K is independent of x; that is, the original determinant is K(an − a1) · · · (an − a_{n−1}). In the same way, all the differences (aj − ai) for i < j are factors, so the determinant is K0 times the product of all these differences, where K0 is a constant. To find K0, we observe that the leading diagonal of the matrix gives us a term a2 a3² · · · anⁿ⁻¹ in the determinant with sign +1; but this same product is obtained by taking the term with larger index from each factor in the product, also giving sign +1. So K0 = 1 and the theorem is proved.

Another general type of matrix whose determinant can be calculated explicitly is the circulant matrix, whose general form is as follows:

    C(a0, a1, . . . , a_{n−1}) =

    [ a0      a1      a2      . . .  a_{n−1} ]
    [ a_{n−1} a0      a1      . . .  a_{n−2} ]
    [ a_{n−2} a_{n−1} a0      . . .  a_{n−3} ]
    [ . . .                                  ]
    [ a1      a2      a3      . . .  a0      ]

Theorem B.3 Let C = C(a0, . . . , a_{n−1}) be a circulant matrix over the field C. Then

(a) C is diagonalisable;

(b) the eigenvalues of C are ∑_{j=0}^{n−1} aj ω^{jk}, for k = 0, 1, . . . , n − 1, where ω = e^{2πi/n};

(c) det(C) is the product of the eigenvalues listed in (b).

Proof We can write down the eigenvectors. Let ω = e^{2πi/n} be a primitive nth root of unity. For k = 0, 1, . . . , n − 1, let

    vk = [ 1  ω^k  ω^{2k}  . . .  ω^{(n−1)k} ]ᵀ.

The jth entry in C vk is

    a_{n−j} + a_{n−j+1} ω^k + · · · + a_{n−j−1} ω^{(n−1)k}
      = a0 ω^{jk} + · · · + a_{n−j−1} ω^{(n−1)k} + a_{n−j} ω^{nk} + · · · + a_{n−1} ω^{(n+j−1)k}
      = ω^{jk} (a0 + a1 ω^k + · · · + a_{n−1} ω^{(n−1)k}),

using the fact that ωⁿ = 1. So

    C vk = (a0 + a1 ω^k + · · · + a_{n−1} ω^{(n−1)k}) vk.

Now the vectors v0, . . . , v_{n−1} are linearly independent. (Why? They are the columns of a Vandermonde matrix V(1, ω, . . . , ω^{n−1}), and the powers of ω are all distinct, so the first part of this appendix shows that the determinant of this matrix is non-zero; hence the columns are linearly independent.) So we have diagonalised C, and its eigenvalues are as claimed. Finally, for part (c), the determinant of a diagonalisable matrix is the product of its eigenvalues.
Example B.1 We have the identity

    a³ + b³ + c³ − 3abc = det [ a b c ] = (a + b + c)(a + ωb + ω²c)(a + ω²b + ωc),
                              [ c a b ]
                              [ b c a ]

where ω = e^{2πi/3}.

This formula has an application to solving cubic equations. Consider the equation

    x³ + ax² + bx + c = 0.

By "completing the cube", putting y = x + a/3, we get rid of the square term:

    y³ + dy + e = 0 for some d, e.

Now, as above, we have the identity

    y³ − 3uvy + u³ + v³ = (y + u + v)(y + ωu + ω²v)(y + ω²u + ωv),

so if we can find u and v satisfying −3uv = d and u³ + v³ = e, then the solutions of the equation are y = −u − v, y = −ωu − ω²v, and y = −ω²u − ωv. Let U = u³ and V = v³. Then U + V = e and UV = −d³/27. Thus we can find U and V by solving the quadratic equation

    z² − ez − d³/27 = 0.

Now u is a cube root of U, and then v = −d/(3u), and we are done.

Remark The formula for the determinant of a circulant matrix works over any field K which contains a primitive nth root of unity.
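Theorem B.3 can be tested numerically. The sketch below (Python, not part of the notes; the entries 1, 4, 2, 5 are arbitrary) builds a 4 × 4 circulant matrix and checks that each vk is an eigenvector with eigenvalue ∑ aj ω^{jk}.

```python
import cmath

a = [1, 4, 2, 5]                 # a_0, ..., a_{n-1}, chosen arbitrarily
n = len(a)
w = cmath.exp(2j * cmath.pi / n)  # primitive nth root of unity

# Build C(a_0, ..., a_{n-1}): each row is a cyclic shift of the first.
C = [[a[(j - i) % n] for j in range(n)] for i in range(n)]

for k in range(n):
    vk = [w ** (j * k) for j in range(n)]                # claimed eigenvector
    lam = sum(a[j] * w ** (j * k) for j in range(n))     # claimed eigenvalue
    Cv = [sum(C[i][j] * vk[j] for j in range(n)) for i in range(n)]
    assert all(abs(Cv[i] - lam * vk[i]) < 1e-9 for i in range(n))
```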
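The recipe for solving a depressed cubic can be checked directly as well. The sketch below (Python; the test cubic y³ − 7y + 6 = (y − 1)(y − 2)(y + 3) is my own choice) finds u and v as described and confirms that the three claimed solutions really are roots.

```python
import cmath

# Solve y^3 + d*y + e = 0: find u, v with -3uv = d and u^3 + v^3 = e.
d, e = -7.0, 6.0

# U = u^3 and V = v^3 are the roots of z^2 - e*z - d^3/27 = 0.
disc = cmath.sqrt(e * e + 4 * d ** 3 / 27)
U = (e + disc) / 2
u = U ** (1 / 3)          # any cube root of U will do
v = -d / (3 * u)          # forced by -3uv = d

w = cmath.exp(2j * cmath.pi / 3)
roots = [-u - v, -w * u - w ** 2 * v, -w ** 2 * u - w * v]

# Each claimed solution satisfies the cubic.
for y in roots:
    assert abs(y ** 3 + d * y + e) < 1e-9

# For this cubic the roots are 1, 2 and -3 (in some order).
assert sorted(round(y.real) for y in roots) == [-3, 1, 2]
```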
Appendix C

The Friendship Theorem

The Friendship Theorem states:

    Given a finite set of people with the property that any two have a unique common friend, there must be someone who is everyone else's friend.

The theorem asserts that the configuration must look like this, where we represent people by dots and friendship by edges:

[Figure: a "windmill" of triangles, all sharing one common central vertex.]

The proof of the theorem is in two parts. The first part is "graph theory"; the second uses linear algebra. We argue by contradiction, and so we assume that we have a counterexample to the theorem.

Step 1: Graph theory  We show that there is a number m such that everyone has exactly m friends. [In the terminology of graph theory, this says that we have a regular graph of valency m.]

To prove this, we notice first that if P1 and P2 are not friends, then they have the same number of friends. For they have one common friend P3; any further friend Q of P1 has a common friend R with P2, so we can match up the common friends as in the next picture.

[Figure: the friends of P1 matched one-to-one with the friends of P2.]

Now let us suppose that there are two people P and Q who have different numbers of friends. By the preceding argument, P and Q must be friends. They have a common friend R. Any other person S must have a different number of friends from either P or Q, and so must be the friend of either P or Q (but not both). Now if S is the friend of P but not Q, and T is the friend of Q but not P, then any possible choice of the common friend of S and T leads to a contradiction. So this is not possible: either everyone else is P's friend, or everyone else is Q's friend. But this means that we don't have a counterexample after all.

So we conclude this step knowing that the number of friends of each person is the same, say m.

Step 2: Linear algebra  We prove that m = 2.

Suppose that there are n people P1, . . . , Pn. Let A be the n × n matrix whose (i, j) entry is 1 if Pi and Pj are friends, and 0 otherwise. Thus, A is an n × n symmetric matrix. Let J be the n × n matrix with every entry equal to 1; then J is also symmetric. We will calculate their eigenvalues.

First let us consider J. If j is the column vector with all entries 1, then clearly Jj = nj, so j is an eigenvector of J with eigenvalue n. The other eigenvectors of J are orthogonal to j. Now v · j = 0 means that the sum of the components of v is zero; this implies that Jv = 0, and conversely. So any vector orthogonal to j is an eigenvector of J with eigenvalue 0.

Now we turn to A. Consider the product AJ. Since every entry of J is equal to 1, the (i, j) entry of AJ is just the number of ones in the ith row of A, which is the number of friends of Pi; by Step 1, this is m. So every entry of AJ is m, whence AJ = mJ. Similarly, JA = mJ. Thus, A and J are commuting symmetric matrices, and so by Theorem 7.9 they can be simultaneously diagonalised.

Next we observe that A² = (m − 1)I + J. For the (i, j) entry of A² is equal to the number of people Pk who are friends of both Pi and Pj. If i = j, this number is m; while if i ≠ j, then (by assumption) it is 1. So A² has diagonal entries m and off-diagonal entries 1, and hence is equal to (m − 1)I + J.

The all-one vector j satisfies Aj = mj, so it is an eigenvector of A with eigenvalue m. This shows, in particular, that

    m² j = A² j = ((m − 1)I + J) j = (m − 1 + n) j,

so that n = m² − m + 1. (Exercise: Prove this by a counting argument in the graph.)

As before, the remaining eigenvectors of A are orthogonal to j, and so are eigenvectors of J with eigenvalue 0. Thus, if v is an eigenvector of A with eigenvalue λ, not a multiple of j, then

    λ² v = A² v = ((m − 1)I + J) v = (m − 1) v,

so λ² = m − 1, and λ = ±√(m − 1).

The diagonal entries of A are all zero, so its trace is zero. So if we let f and g be the multiplicities of √(m − 1) and −√(m − 1) as eigenvalues of A, we have

    0 = Tr(A) = m + f √(m − 1) + g(−√(m − 1)) = m + (f − g)√(m − 1).

This shows that m − 1 must be a perfect square, say m − 1 = u². But then the trace equation is 0 = m + (f − g)u, from which we see that m is congruent to 0 mod u; since m = u² + 1 is congruent to 1 mod u, this says that 0 ≡ 1 mod u. This is only possible if u = 1. But then m = 2, n = 3, and we have the Three Musketeers (three individuals, any two being friends). This configuration does indeed satisfy the hypotheses of the Friendship Theorem, since each person is everyone else's friend; but it is after all not a counterexample. So the theorem is proved.
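The hypothesis and conclusion of the theorem can be illustrated on a windmill configuration. The sketch below (Python, not from the notes) builds the five-person windmill, person 0 friends with everyone plus the friendships 1-2 and 3-4, and checks that any two people have exactly one common friend, using the fact that the (i, j) entry of A² counts common friends.

```python
from itertools import combinations

# Windmill with two "blades": vertices are people, edges are friendships.
n = 5
edges = {(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4)}
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

# The (i, j) entry of A^2 counts the common friends of persons i and j.
A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]

# Hypothesis of the Friendship Theorem: exactly one common friend per pair.
for i, j in combinations(range(n), 2):
    assert A2[i][j] == 1

# Conclusion of the theorem: someone (person 0) is everyone else's friend.
assert all(A[0][j] == 1 for j in range(1, n))
```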
Appendix D

Who is top of the league?

In most league competitions, teams are awarded a fixed number of points for a win or a draw. It may happen that two teams win the same number of matches and so are equal on points, but the opponents beaten by one team are clearly "better" than those beaten by the other. How can we take this into account?

You might think of giving each team a "score" to indicate how strong it is, and then adding the scores of all the teams beaten by team T to see how well T has performed. Of course this is self-referential, since the score of T depends on the scores of the teams that T beats. So suppose we ask simply that the score of T should be proportional to the sum of the scores of all the teams beaten by T.

Now we can translate the problem into linear algebra. Let T1, . . . , Tn be the teams in the league. Let A be the n × n matrix whose (i, j) entry is equal to 1 if Ti beats Tj, and 0 otherwise. Now for any vector [ x1 x2 . . . xn ]ᵀ of scores, the ith entry of Ax is equal to the sum of the scores xj for all teams Tj beaten by Ti. So our requirement is simply that x should be an eigenvector of A with all entries positive.

Here is an example. There are six teams A, B, C, D, E, F. Suppose that A beats B, C, D, E; B beats C, D, E, F; C beats D, E, F; D beats E, F; E beats F; and F beats A. The matrix A is

    [ 0 1 1 1 1 0 ]
    [ 0 0 1 1 1 1 ]
    [ 0 0 0 1 1 1 ]
    [ 0 0 0 0 1 1 ]
    [ 0 0 0 0 0 1 ]
    [ 1 0 0 0 0 0 ]

We see that A and B each have four wins, but that A has generally beaten the stronger teams; E and F have the fewest wins, but F took A's scalp and should clearly be better. Calculation with Maple shows that the vector

    [ 0.7744 0.6452 0.4307 0.2875 0.1920 0.3856 ]ᵀ

is an eigenvector of A with eigenvalue 2.0085. This confirms our view that A is top of the league and that F is ahead of E; it even puts F ahead of D.

But perhaps there is a different eigenvalue and/or eigenvector which would give us a different result? In fact, there is a general theorem called the Perron–Frobenius theorem which gives us conditions for this method to give a unique answer. Before we state it, we need a definition.

Definition D.1 Let A be an n × n real matrix with all its entries non-negative. We say that A is indecomposable if, for any i, j with 1 ≤ i, j ≤ n, there is a number m such that the (i, j) entry of Aᵐ is strictly positive.

This odd-looking condition means, in our football league situation, that for any two teams Ti and Tj, there is a chain Tk0, . . . , Tkm with Tk0 = Ti and Tkm = Tj, such that each team in the chain beats the next one. In our example, we see that B beats F and F beats A, so the (2, 1) entry in A² is non-zero. Similarly for all other pairs; so A is indecomposable in this case.

Theorem D.1 (Perron–Frobenius Theorem) Let A be an n × n real matrix with all its entries non-negative, and suppose that A is indecomposable. Then, up to scalar multiplication, there is a unique eigenvector v = [ x1 . . . xn ]ᵀ for A with the property that xi > 0 for all i. The corresponding eigenvalue is the largest eigenvalue of A.

So the Perron–Frobenius eigenvector solves the problem of ordering the teams in the league.

Now it can be shown that the only way that indecomposability can fail is if there is a collection C of teams such that each team in C beats each team not in C. In this case, obviously the teams in C occupy the top places in the league, and we have reduced the problem to ordering the teams within these smaller groups. So we can assume that the matrix of results is indecomposable.

Remark Sometimes even this extra level of sophistication doesn't guarantee a result. Suppose, for example, that there are five teams A, B, C, D, E, and suppose that A beats B and C; B beats C and D; C beats D and E; D beats E and A; and E beats A and B. The matrix A is

    [ 0 1 1 0 0 ]
    [ 0 0 1 1 0 ]
    [ 0 0 0 1 1 ]
    [ 1 0 0 0 1 ]
    [ 1 1 0 0 0 ]

which is easily seen to be indecomposable. Each team wins two games, and it is clear that there is so much symmetry between the teams that none can be put above the others by any possible rule. Indeed, if v is the all-1 vector, then Av = 2v, so that v is the Perron–Frobenius eigenvector: all teams get the same score.

Remark Further refinements are clearly possible. For example, instead of just putting the (i, j) entry equal to 1 if Ti beats Tj, we could take it to be the number of goals by which Ti won the game.

Remark This procedure has wider application. How does an Internet search engine like Google find the most important web pages that match a given query? An important web page is one to which a lot of other web pages link. As before, this can be described by a matrix, and we can use the Perron–Frobenius eigenvector to do the ranking.
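The six-team example can be reproduced without Maple: since the results matrix is indecomposable, power iteration converges to the Perron–Frobenius eigenvector. A sketch in Python (illustrative only; the iteration count is an arbitrary choice):

```python
# Results matrix: entry (i, j) is 1 if team i beat team j.
# Teams in order A, B, C, D, E, F.
A = [
    [0, 1, 1, 1, 1, 0],   # A beats B, C, D, E
    [0, 0, 1, 1, 1, 1],   # B beats C, D, E, F
    [0, 0, 0, 1, 1, 1],   # C beats D, E, F
    [0, 0, 0, 0, 1, 1],   # D beats E, F
    [0, 0, 0, 0, 0, 1],   # E beats F
    [1, 0, 0, 0, 0, 0],   # F beats A
]

# Power iteration, renormalising by the largest entry at each step.
v = [1.0] * 6
for _ in range(1000):
    w = [sum(A[i][j] * v[j] for j in range(6)) for i in range(6)]
    norm = max(w)
    v = [x / norm for x in w]

# Estimate the dominant eigenvalue from one more step.
w = [sum(A[i][j] * v[j] for j in range(6)) for i in range(6)]
lam = w[0] / v[0]
assert abs(lam - 2.0085) < 1e-3

# Ranking by Perron-Frobenius score: A > B > C > F > D > E.
order = sorted(range(6), key=lambda i: -v[i])
assert order == [0, 1, 2, 5, 3, 4]
```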
Appendix E

Other canonical forms

One of the unfortunate things about linear algebra is that there are many types of equivalence relation on matrices! In this appendix I say a few brief words about some that we have not seen elsewhere in the course. Some of these will be familiar to you from earlier linear algebra courses, while others arise in courses on different parts of mathematics (coding theory, group theory, etc.).

Row-equivalence

Two matrices A and B of the same size over K are said to be row-equivalent if there is an invertible matrix P such that B = PA. Equivalently, A and B are row-equivalent if we can transform A into B by the use of elementary row operations only. (This is true because any invertible matrix can be written as a product of elementary matrices; see Corollary 2.6.)

A matrix A is said to be in echelon form if the following conditions hold:

• The first non-zero entry in any row (if it exists) is equal to 1 (these entries are called the leading ones).

• The leading ones in rows lower in the matrix occur further to the right.

We say that A is in reduced echelon form if, in addition to these two conditions, also

• All the other entries in the column containing a leading one are zero.

For example, whatever the values of a, b, c, d, the matrix

    [ 0 1 a b 0 c ]
    [ 0 0 0 0 1 d ]
    [ 0 0 0 0 0 0 ]

is in reduced echelon form.
Theorem E.1 Any matrix is row-equivalent to a unique matrix in reduced echelon form.

Coding equivalence

In the theory of error-correcting codes, we meet a notion of equivalence which lies somewhere between row-equivalence and equivalence. Two matrices A and B of the same size are said to be coding-equivalent if B can be obtained from A by a combination of arbitrary row operations and column operations of Types 2 and 3 only (see page 16). As far as I know this relation does not have a standard name, and it would take us too far afield to explain why it is important in coding theory.

Using these operations, any matrix can be put into the block form

    [ I A ]
    [ O O ]

for some matrix A. Unfortunately this is not a canonical form: a matrix can be coding-equivalent to several different matrices of this special form.

Congruence over other fields

Recall that two symmetric matrices A and B, over a field K whose characteristic is not 2, are congruent if B = PᵀAP for some invertible matrix P. This is the natural relation arising from representing a quadratic form relative to different bases. We saw in Chapter 5 the canonical form for this relation in the cases when K is the real or complex numbers. In other cases, it is usually much harder to come up with a canonical form. Here is one of the few cases where this is possible; I state the result for quadratic forms.

Theorem E.2 Let F_p be the field of integers mod p, where p is an odd prime. Let c be a fixed element of F_p which is not a square. Then a quadratic form q in n variables over F_p can be put into one of the forms

    x1² + · · · + xr²,
    x1² + · · · + x_{r−1}² + c xr²

by an invertible linear change of variables. Any quadratic form is congruent to just one form of one of these types.
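Reduction to the reduced echelon form of Theorem E.1 is easy to implement. The sketch below (Python with exact rational arithmetic; the sample matrix is my own choice, not from the notes) follows the usual Gaussian elimination recipe:

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced echelon form of the given matrix."""
    M = [[Fraction(x) for x in row] for row in rows]
    nrows, ncols = len(M), len(M[0])
    r = 0
    for c in range(ncols):
        # Find a pivot in column c, at or below row r.
        pivot = next((i for i in range(r, nrows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [x / M[r][c] for x in M[r]]           # make a leading one
        for i in range(nrows):
            if i != r and M[i][c] != 0:              # clear the column
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
    return M

M = rref([[2, 4, 1, 3],
          [1, 2, 0, 1],
          [3, 6, 2, 5]])
assert M == [[1, 2, 0, 1],
             [0, 0, 1, 1],
             [0, 0, 0, 0]]
```

By Theorem E.1, the output is the same whichever sequence of row operations is used to reach it.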
Appendix F

Worked examples

1. Let

    A = [  1  2  4 −1  5 ]
        [  1  2  3 −1  3 ]
        [ −1 −2  0  1  3 ]

(a) Find a basis for the row space of A.

(b) What is the rank of A?

(c) Find a basis for the column space of A.

(d) Find invertible matrices P and Q such that PAQ is in the canonical form for equivalence.

(a) Subtract the first row from the second, add the first row to the third, then multiply the new second row by −1 and subtract four times this row from the third, to get the matrix

    B = [ 1 2 4 −1 5 ]
        [ 0 0 1  0 2 ]
        [ 0 0 0  0 0 ]

The first two rows clearly form a basis for the row space.

(b) The rank is 2, since there is a basis with two elements.

(c) The column rank is equal to the row rank, and so is also equal to 2. By inspection, the first and third columns of A are linearly independent, so they form a basis. The first and second columns are not linearly independent, so we cannot use these! (Note that we have to go back to the original A here: row operations change the column space, so selecting two independent columns of B would not be correct.)
(d) By step (a), we have PA = B, where P is obtained by performing the same elementary row operations on the 3 × 3 identity matrix I3:

    P = [  1  0 0 ]
        [  1 −1 0 ]
        [ −3  4 1 ]

Now B can be brought to the canonical form

    C = [ 1 0 0 0 0 ]
        [ 0 1 0 0 0 ]
        [ 0 0 0 0 0 ]

by subtracting 2, 4, −1 and 5 times the first column from the second, third, fourth and fifth columns, subtracting twice the third column from the fifth, and then swapping the second and third columns. So C = BQ (whence C = PAQ), where Q is obtained by performing the same column operations on I5:

    Q = [ 1 −4 −2 1  3 ]
        [ 0  0  1 0  0 ]
        [ 0  1  0 0 −2 ]
        [ 0  0  0 1  0 ]
        [ 0  0  0 0  1 ]

Remark: P and Q can also be found by multiplying elementary matrices, but the above method is simpler. You may find it easier to write an identity matrix after A and perform the row operations on the extended matrix to find P, and to put an identity matrix underneath B and perform the column operations on the extended matrix to find Q.

2. A certain country has n political parties P1, P2, . . . , Pn. At the beginning of the year, the percentage of voters who supported the party Pi was xi. During the year, some voters change their minds: a proportion aij of former supporters of Pj will support Pi at the end of the year. Let v be the vector [ x1 x2 · · · xn ]ᵀ recording support for the parties at the beginning of the year, and A the matrix whose (i, j) entry is aij.

(a) Prove that the vector giving the support for the parties at the end of the year is Av.
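Part (d) can be verified by multiplying out P·A·Q. The following check (Python, added here for illustration) confirms that the product is the canonical form of rank 2:

```python
def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

A = [[ 1,  2, 4, -1, 5],
     [ 1,  2, 3, -1, 3],
     [-1, -2, 0,  1, 3]]
P = [[ 1,  0, 0],
     [ 1, -1, 0],
     [-3,  4, 1]]
Q = [[1, -4, -2, 1,  3],
     [0,  0,  1, 0,  0],
     [0,  1,  0, 0, -2],
     [0,  0,  0, 1,  0],
     [0,  0,  0, 0,  1]]

C = matmul(matmul(P, A), Q)
assert C == [[1, 0, 0, 0, 0],
             [0, 1, 0, 0, 0],
             [0, 0, 0, 0, 0]]
```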
(b) In subsequent years, exactly the same thing happens, with the same proportions. Show that the vector giving the support for the parties at the end of m years is Aᵐv.

(c) Suppose that n = 2 and that

    A = [ 0.9 0.3 ]
        [ 0.1 0.7 ]

Show that, after a long time, the support for the parties will be approximately 0.75 for P1 and 0.25 for P2.

(a) Let yi be the proportion of the population who support Pi at the end of the year. From what we are given, the proportion supporting Pj at the beginning of the year was xj, and a fraction aij of these changed their support to Pi. So the proportion of the whole population who supported Pj at the beginning of the year and Pi at the end is aij xj. The total support for Pi is found by adding these up for all j: that is,

    yi = ∑_{j=1}^{n} aij xj,

or v′ = Av, where v′ is the vector [ y1 . . . yn ]ᵀ giving support for the parties at the end of the year.

(b) Let vk be the column vector whose ith component is the proportion of the population supporting party Pi after the end of k years, where v0 = v. In part (a), we showed that v1 = Av0. An exactly similar argument shows that vk = Avk−1 for any k. If we assume that vk−1 = Aᵏ⁻¹v, then vk = Avk−1 = A(Aᵏ⁻¹v) = Aᵏv, and the induction step is proved. (The result of (a) starts the induction with m = 1.) So by induction, vm = Aᵐv.

(c) The matrix A has characteristic polynomial

    (x − 0.9)(x − 0.7) − 0.03 = x² − 1.6x + 0.6 = (x − 1)(x − 0.6),

so the eigenvalues of A are 1 and 0.6. We find by solving linear equations that eigenvectors for the two eigenvalues are [ 3 1 ]ᵀ and [ 1 −1 ]ᵀ respectively. As in the text, we compute that the corresponding projections are

    P1 = [ 0.75 0.75 ]    P2 = [  0.25 −0.75 ]
         [ 0.25 0.25 ]         [ −0.25  0.75 ]
6)m P2 . If g1 = x f1 + y f2 + z f3 . giving x = −1.25 0.) Then P is diagonalisable: A = P1 + 0.6)m → 0. then the matrix giving the ﬁnal support is approximately As m → ∞. For example.110 APPENDIX F.25 As a check. f3 . The vectors v1 . From this and Proposition 4. use the computer with Maple to work out Pm for some large value of m.2 that the transition matrix between the dual bases is −1 1 0 (P−1 ) = −2 1 1 . we have (0. the transition matrix P from the vs to the ws is 1 2 0 P = 1 1 2. the dual basis of V ∗ is f1 .6 we see that Am = P1 + (0.75(x + y) 0.75 = = . and so A → P1 . w2 .7454650368 . 1 1 1 and we showed in Section 5.7515116544 0. y = −2. Alternatively. 2x + y + z = 0.25(x + y) 0. w3 = 2v2 + v3 . v3 form a basis for V = R3 . So g1 = − f1 − 2 f2 + 4 f3 . w3 . with x + y = 1. WORKED EXAMPLES (Once we have found P1 . The ﬁrst dual basis vector g1 satisﬁes g1 (w1 ) = 1. w2 = 2v1 + v2 + v3 .25 x 0.1. 2y + z = 0. z = 4. x is y the matrix giving the initial support for the parties. f2 . we ﬁnd x + y + z = 1. y 0.2545349632 3. 4 −2 −1 The coordinates of the gs in the basis of f s are the columns of this matrix. if v0 = 0.75 0. Find the basis of V ∗ dual to w1 . 0. we can ﬁnd P2 as I − P1 . v2 . . Solving two similar sets of equations gives g2 = f1 + f2 − 2 f3 and g3 = f2 − f3 . g1 (w2 ) = g1 (w3 ) = 0.2484883456 0.6P2 . I ﬁnd that P10 = 0. So in the limit.75 0. A second basis for V is given by w1 = v1 + v2 + v3 .
4. The Fibonacci numbers F_n are defined by the recurrence relation F_0 = 0, F_1 = 1, F_{n+2} = F_n + F_{n+1} for n ≥ 0. Let A be the matrix

A = [ 0  1 ]
    [ 1  1 ]

Prove that

A^n = [ F_{n−1}   F_n     ]
      [ F_n       F_{n+1} ]

and hence find a formula for F_n.

The equation for A^n is proved by induction on n. It is clearly true for n = 1. Suppose that it holds for n. Then

A^{n+1} = A^n · A = [ F_{n−1}  F_n     ] [ 0  1 ]  =  [ F_n      F_{n−1} + F_n ]  =  [ F_n      F_{n+1} ]
                    [ F_n      F_{n+1} ] [ 1  1 ]     [ F_{n+1}  F_n + F_{n+1} ]     [ F_{n+1}  F_{n+2} ]

So the induction step is proved.

To find a formula for F_n, we show that A is diagonalisable, and then write A = λ1 P1 + λ2 P2, where P1 and P2 are projection matrices with sum I satisfying P1 P2 = P2 P1 = O. Then we get A^n = λ1^n P1 + λ2^n P2, and taking the (1, 2) entry we find that F_n = c1 λ1^n + c2 λ2^n, where c1 and c2 are the (1, 2) entries of P1 and P2 respectively. The eigenvalues of A are the roots of 0 = det(xI − A) = x^2 − x − 1, that is, λ1, λ2 = (1 ± √5)/2. (Since the eigenvalues are distinct, we know that A is diagonalisable, so the method will work.)

From here it is just calculation. Rather than find P1 explicitly, we can argue as follows. Because P1 + P2 = I, the (1, 2) entries of these matrices are the negatives of each other, so we have F_n = c(λ1^n − λ2^n). Now 1 = F_1 = c(λ1 − λ2) = c√5, so that c = 1/√5 and

F_n = (1/√5) [ ((1 + √5)/2)^n − ((1 − √5)/2)^n ].

5. Let V_n be the vector space of real polynomials of degree at most n.

(a) Show that the function

f · g = ∫_0^1 f(x)g(x) dx

is an inner product on V_n.
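As an aside to Question 4, both the matrix identity and Binet's formula are easy to check by machine; a short Python sketch:

```python
from math import sqrt

def matmul(X, Y):
    # product of two 2x2 matrices
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[0, 1], [1, 1]]

# Fibonacci numbers straight from the recurrence
fib = [0, 1]
while len(fib) < 25:
    fib.append(fib[-1] + fib[-2])

# check A^n = [[F_{n-1}, F_n], [F_n, F_{n+1}]] for n = 1, ..., 20
An = A
for n in range(1, 21):
    assert An == [[fib[n-1], fib[n]], [fib[n], fib[n+1]]]
    An = matmul(An, A)

# check Binet's formula, rounding away floating-point error
c = 1 / sqrt(5)
l1, l2 = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2
assert all(round(c * (l1**n - l2**n)) == fib[n] for n in range(25))
print(fib[10])  # F_10 = 55
```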
(b) In the case n = 3, write down the matrix representing the bilinear form relative to the basis 1, x, x^2, x^3.

(c) Apply the Gram–Schmidt process to the basis (1, x, x^2) to find an orthonormal basis for V_2.

(d) Let W_n be the subspace of V_n consisting of all polynomials f(x) of degree at most n which satisfy f(0) = f(1) = 0, and let D : W_n → W_n be the linear map given by differentiation: (D f)(x) = f'(x). Prove that the adjoint of D is −D.

(a) Put b(f, g) = ∫_0^1 f(x)g(x) dx. The function b is obviously symmetric, so we have to show that it is linear in the first variable: that is, that

∫_0^1 (f1(x) + f2(x))g(x) dx = ∫_0^1 f1(x)g(x) dx + ∫_0^1 f2(x)g(x) dx,
∫_0^1 (c f(x))g(x) dx = c ∫_0^1 f(x)g(x) dx,

which are clear from properties of integration. We also have to show that the inner product is positive definite: that is, that b(f, f) ≥ 0, with equality if and only if f = 0. These facts are clear from elementary calculus.

(b) If the basis is f1 = 1, f2 = x, f3 = x^2, f4 = x^3, then the (i, j) entry of the matrix representing b is

∫_0^1 x^{i−1} x^{j−1} dx = 1/(i + j − 1),

so the matrix is

[ 1    1/2  1/3  1/4 ]
[ 1/2  1/3  1/4  1/5 ]
[ 1/3  1/4  1/5  1/6 ]
[ 1/4  1/5  1/6  1/7 ]

(c) The first basis vector is clearly 1. To make x orthogonal to 1 we must replace it by x + a for some a; doing the integral, we find that a = −1/2. To make x^2 orthogonal to the two preceding vectors is the same as making it orthogonal to 1 and x, so we replace it by x^2 + bx + c. We find that

1/3 + b/2 + c = 0,   1/4 + b/3 + c/2 = 0,

so that b = −1 and c = 1/6.
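Parts (b) and (c) can be reproduced in exact rational arithmetic. In the sketch below (an illustration, not part of the original solution), a polynomial is a list of coefficients [a0, a1, ...] and the inner product is computed term by term from ∫_0^1 x^k dx = 1/(k + 1):

```python
from fractions import Fraction as F

def polymul(p, q):
    # product of two coefficient lists
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    # integrate the product term by term over [0, 1]
    return sum(c / (k + 1) for k, c in enumerate(polymul(p, q)))

# the matrix of part (b): (i, j) entry is 1/(i + j - 1)
basis4 = [[F(1)], [F(0), F(1)], [F(0), F(0), F(1)], [F(0), F(0), F(0), F(1)]]
gram = [[inner(p, q) for q in basis4] for p in basis4]

# Gram-Schmidt on (1, x, x^2) as in part (c), before the final normalisation
ortho = []
for p in basis4[:3]:
    for q in ortho:
        coef = inner(p, q) / inner(q, q)
        p = [a - coef * b
             for a, b in zip(p + [F(0)] * (len(q) - len(p)),
                             q + [F(0)] * (len(p) - len(q)))]
    ortho.append(p)

print(ortho[1])   # coefficients of x - 1/2
print(ortho[2])   # coefficients of x^2 - x + 1/6
```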
Now 1 · 1 = 1, (x − 1/2) · (x − 1/2) = 1/12, and (x^2 − x + 1/6) · (x^2 − x + 1/6) = 1/180, so the required basis is

1,   2√3 (x − 1/2),   6√5 (x^2 − x + 1/6).

(d) Integration by parts shows that

f · D(g) = ∫_0^1 f(x)g'(x) dx = [f(x)g(x)]_0^1 − ∫_0^1 f'(x)g(x) dx = −(D f) · g,

where the first term vanishes because of the condition on polynomials in W_n. Thus, by definition, the adjoint of D is −D.

6. Let A and B be real symmetric matrices. Is each of the following statements true or false? Give brief reasons.

(a) If A and B are orthogonally similar then they are similar.
(b) If A and B are orthogonally similar then they are congruent.
(c) If A and B are congruent then they are orthogonally similar.
(d) If A and B are similar then they are orthogonally similar.

Recall that A and B are similar if B = P^{−1}AP for some invertible matrix P; they are congruent if B = P^T AP for some invertible matrix P; and they are orthogonally similar if B = P^{−1}AP for some orthogonal matrix P (an invertible matrix satisfying P^T = P^{−1}). Thus it is clear that both (a) and (b) are true.

By Sylvester's Law of Inertia, any real symmetric matrix is congruent to a diagonal matrix with diagonal entries 1, −1 and 0. If we choose a symmetric matrix none of whose eigenvalues is 1, −1 or 0, then it is not orthogonally similar to the Sylvester form. For example, the matrices I and 2I are congruent but not orthogonally similar. So (c) is false.

If A and B are similar, then they have the same eigenvalues. The Spectral Theorem says that a real symmetric matrix is orthogonally similar to a diagonal matrix whose diagonal entries are its eigenvalues. So A and B are orthogonally similar to the same diagonal matrix, and so to each other. So (d) is true.

7. Find an orthogonal matrix P such that P^{−1}AP and P^{−1}BP are both diagonal, where

A = [ 1  1  1  1 ]      B = [ 0  1  0  1 ]
    [ 1  1  1  1 ]          [ 1  0  1  0 ]
    [ 1  1  1  1 ]          [ 0  1  0  1 ]
    [ 1  1  1  1 ]          [ 1  0  1  0 ]
Remark: A and B are commuting symmetric matrices, so we know that the matrix P exists.

First solution. We have to find an orthonormal basis which consists of eigenvectors for both matrices. Some eigenvectors can be found by inspection. If v1 = (1, 1, 1, 1) then Av1 = 4v1 and Bv1 = 2v1. If v2 = (1, −1, 1, −1) then Av2 = 0 and Bv2 = −2v2. Any further eigenvector v = (x, y, z, w) should be orthogonal to both of these: in particular, x + y + z + w = 0 = x − y + z − w, so x + z = 0 and y + w = 0. Conversely, any such vector satisfies Av = 0 and Bv = 0, and hence so do any linear combinations of them. So choose two orthogonal vectors satisfying these conditions, say (1, 0, −1, 0) and (0, 1, 0, −1). Normalising, we obtain the required basis: (1, 1, 1, 1)/2, (1, −1, 1, −1)/2, (1, 0, −1, 0)/√2, (0, 1, 0, −1)/√2. So

P = [ 1/2   1/2   1/√2    0    ]
    [ 1/2  −1/2    0     1/√2  ]
    [ 1/2   1/2  −1/√2    0    ]
    [ 1/2  −1/2    0    −1/√2  ]

Second solution. Observe that both A and B are circulant matrices. So we know from Appendix B that the columns of the Vandermonde matrix

[ 1   1   1   1 ]
[ 1   i  −1  −i ]
[ 1  −1   1  −1 ]
[ 1  −i  −1   i ]

are eigenvectors of both matrices. The second and fourth columns have corresponding eigenvalues 0 for both matrices, so we can replace these two columns by their real and imaginary parts, giving (after a slight rearrangement) the matrix

[ 1   1   1   0 ]
[ 1  −1   0   1 ]
[ 1   1  −1   0 ]
[ 1  −1   0  −1 ]

After normalising the columns, this gives the same solution as the first.

The results of Appendix B also allow us to write down the eigenvalues of A and B without any calculation. For example, the eigenvalues of B are 1 + 1 = 2, −1 − 1 = −2, i − i = 0, and −i + i = 0.
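Either solution is easy to verify numerically. The sketch below checks that the P found above is orthogonal (up to rounding) and diagonalises both matrices:

```python
from math import sqrt

A = [[1, 1, 1, 1] for _ in range(4)]
B = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
r = 1 / sqrt(2)
P = [[0.5,  0.5,  r,   0.0],
     [0.5, -0.5,  0.0, r  ],
     [0.5,  0.5, -r,   0.0],
     [0.5, -0.5,  0.0, -r ]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(X):
    return [list(row) for row in zip(*X)]

# P is orthogonal, so P^{-1} = P^T
assert all(abs(x - (1 if i == j else 0)) < 1e-12
           for i, row in enumerate(matmul(transpose(P), P))
           for j, x in enumerate(row))

diag = {}
for name, M in (("A", A), ("B", B)):
    D = matmul(transpose(P), matmul(M, P))
    # off-diagonal entries should vanish
    assert all(abs(D[i][j]) < 1e-12 for i in range(4) for j in range(4) if i != j)
    diag[name] = [round(D[i][i], 6) for i in range(4)]
print(diag)   # eigenvalues 4, 0, 0, 0 for A and 2, -2, 0, 0 for B
```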
Remark: A more elegant solution is the matrix

1/2 [ 1   1   1   1 ]
    [ 1  −1   1  −1 ]
    [ 1   1  −1  −1 ]
    [ 1  −1  −1   1 ]

This matrix (without the factor 1/2) is known as a Hadamard matrix: an n × n matrix H with all entries ±1 satisfying H^T H = nI, where I is the identity matrix. It is known that an n × n Hadamard matrix cannot exist unless n is 1, 2, or a multiple of 4; however, nobody has succeeded in proving that a Hadamard matrix of every size n divisible by 4 exists. The smallest order for which the existence of a Hadamard matrix is still in doubt is (at the time of writing) n = 668. The previous smallest, n = 428, was resolved only in 2004 by Hadi Kharaghani and Behruz Tayfeh-Rezaie in Tehran.

As a further exercise, show that, if H is a Hadamard matrix of size n, then

[ H   H ]
[ H  −H ]

is a Hadamard matrix of size 2n. (The Hadamard matrix of size 4 constructed above is of this form.)

8. Let

A = [ 1  1 ]      B = [ 1  1 ]
    [ 1  2 ]          [ 1  0 ]

Find an invertible matrix P and a diagonal matrix D such that P^T AP = I and P^T BP = D.

First we take the quadratic form corresponding to A and reduce it to a sum of squares. The form is x^2 + 2xy + 2y^2, which is (x + y)^2 + y^2. (Note that this is a sum of two squares, in agreement with the fact that A is positive definite.) Now the matrix that transforms (x, y) to (x + y, y) is

Q = [ 1  1 ]      since   [ 1  1 ] [ x ]  =  [ x + y ]
    [ 0  1 ]              [ 0  1 ] [ y ]     [   y   ]

Hence [x y] Q^T Q [x y]^T = (x + y)^2 + y^2 = x^2 + 2xy + 2y^2 = [x y] A [x y]^T, so that Q^T Q = A. Now, if we put

P = Q^{−1} = [ 1  −1 ]
             [ 0   1 ]

then we see that P^T AP = P^T (Q^T Q)P = I.
What about P^T BP? We find that

P^T BP = [ 1  0 ] [ 1  1 ] [ 1  −1 ]  =  [ 1   0 ]  =  D,
         [−1  1 ] [ 1  0 ] [ 0   1 ]     [ 0  −1 ]

the required diagonal matrix. So we are done.

Remark 1: In general it is not so easy. The reduction of the quadratic form will give a matrix P1 such that P1^T AP1 = I, but in general P1^T BP1 won't be diagonal; all we can say is that it is symmetric. So by the Spectral Theorem, we can find an orthogonal matrix P2 such that P2^T (P1^T BP1)P2 is diagonal. (P2 is the matrix whose columns are orthonormal eigenvectors of P1^T BP1.) Then, because P2 is orthogonal, we have P2^T (P1^T AP1)P2 = P2^T I P2 = I, so that P = P1 P2 is the required matrix.

Remark 2: If you are only asked for the diagonal matrix D, and not the matrix P, you can do an easier calculation. We saw in the lectures that the diagonal entries of D are the roots of the polynomial det(xA − B) = 0. In our case, we have

det(xA − B) = det [ x − 1   x − 1 ]  =  2x(x − 1) − (x − 1)^2  =  x^2 − 1  =  (x − 1)(x + 1),
                  [ x − 1    2x   ]

so the diagonal entries of D are +1 and −1 (as we found).
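Both the computation and Remark 2 can be checked mechanically; a sketch in exact integer arithmetic:

```python
A = [[1, 1], [1, 2]]
B = [[1, 1], [1, 0]]
P = [[1, -1], [0, 1]]        # P = Q^{-1} from the reduction above

def matmul(X, Y):
    # product of two 2x2 matrices
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Pt = [[P[0][0], P[1][0]],
      [P[0][1], P[1][1]]]    # transpose of P

assert matmul(Pt, matmul(A, P)) == [[1, 0], [0, 1]]   # P^T A P = I
D = matmul(Pt, matmul(B, P))
assert D == [[1, 0], [0, -1]]                         # P^T B P = D

# Remark 2: the diagonal entries of D are the roots of det(xA - B) = x^2 - 1
for x in (1, -1):
    M = [[x * A[i][j] - B[i][j] for j in range(2)] for i in range(2)]
    assert M[0][0] * M[1][1] - M[0][1] * M[1][0] == 0
print(D)
```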
Index

abelian group · addition (of linear maps; of matrices; of scalars; of vectors) · adjoint · adjugate · alternating bilinear form · alternating matrix · axioms for field · basis · bilinear form · canonical form (for congruence; for equivalence; for orthogonal similarity; for similarity) · Cauchy–Schwarz inequality · Cayley–Hamilton Theorem · characteristic of field · characteristic polynomial (of linear map; of matrix) · circulant matrix · cofactor · cofactor expansion · column operations · column rank · column vector · complex numbers · congruence · coordinate representation · cubic equation · determinant · diagonalisable · dimension · direct sum · distributive law · dot product · dual basis · dual space · echelon form · eigenspace · eigenvalue · eigenvector · elementary column operations · elementary matrix · elementary row operations · equivalence · error-correcting codes · Euclidean plane · Exchange Lemma · Fibonacci numbers · field · finite-dimensional · Friendship Theorem · Fundamental Theorem of Algebra · Google (Internet search engine) · Gram–Schmidt process · graph theory · Hadamard matrix · Hermitian matrix · identity linear map · identity matrix · image · indecomposable matrix · indefinite · inner product · integers mod p · intersection · inverse matrix · invertible linear map · Jordan block · Jordan form · kernel · Kronecker delta · linear form · linear map · linear transformation · linearly independent · Maple (computer algebra system) · matrix · minimal polynomial · minor · multiplication (of linear maps; of matrices; of scalars) · negative definite · negative semidefinite · nonsingular · normal linear map · normal matrix · nullity · orthogonal complement · orthogonal decomposition · orthogonal linear map · orthogonal projection · orthogonal similarity · orthogonal vectors · orthonormal basis · parallelogram law · permutation · Perron–Frobenius Theorem · Pfaffian · polarisation · positive definite · positive semidefinite · projection · quadratic form · rank (of linear map; of matrix; of quadratic form) · Rank–Nullity Theorem · rational numbers · real numbers · ring · row operations · row rank · row vector · row-equivalence · scalar · scalar multiplication · self-adjoint · semilinear · sesquilinear · sign · signature · similarity · skew-Hermitian matrix · skew-symmetric matrix · spanning · Spectral Theorem · standard inner product · subspace · sum · Sylvester's Law of Inertia · symmetric group · symmetric matrix · trace · transition matrix · transposition · unitary matrix · Vandermonde matrix · vector · vector space · zero matrix · zero vector