
MA251 Algebra I: Advanced Linear Algebra

David Loeffler
Autumn Term, 2011-12
Contents
0 Blurb  2
1 Review of some MA106 material  3
  1.1 Fields  3
  1.2 Vector spaces  3
  1.3 Linear maps  4
  1.4 The matrix of a linear map with respect to a choice of bases  4
  1.5 Change of basis  5
2 The Jordan Canonical Form  8
  2.1 Introduction  8
  2.2 Eigenvalues and eigenvectors  8
  2.3 The minimal polynomial  9
  2.4 The Cayley-Hamilton theorem  11
  2.5 Calculating the minimal polynomial  13
  2.6 Jordan chains and Jordan blocks  14
  2.7 Jordan bases and the Jordan canonical form  16
  2.8 The JCF when n = 2 and 3  18
  2.9 The general case  21
  2.10 Examples  22
  2.11 Proof of Theorem 2.7.2  24
3 Functions of a matrix  26
  3.1 Motivation: Systems of differential equations  26
  3.2 The exponential of a matrix  26
  3.3 Difference equations  31
4 Bilinear Maps and Quadratic Forms  34
  4.1 Bilinear maps: definitions  34
  4.2 Bilinear maps: change of basis  35
  4.3 Quadratic forms  37
  4.4 Nice bases for quadratic forms  38
  4.5 Euclidean spaces, orthonormal bases and the Gram-Schmidt process  44
  4.6 Orthogonal transformations  45
  4.7 Nice orthonormal bases  48
  4.8 Applications of quadratic forms to geometry  53
    4.8.1 Reduction of the general second degree equation  53
    4.8.2 The case n = 2  54
    4.8.3 The case n = 3  54
  4.9 The complex story: sesquilinear forms  61
5 Linear Algebra over Z  65
  5.1 Review of some MA132 material  65
    5.1.1 Groups  65
    5.1.2 Examples of groups  66
    5.1.3 The order of an element  66
    5.1.4 Isomorphisms between groups  67
    5.1.5 Subgroups and cosets  67
  5.2 Cyclic groups  67
  5.3 Homomorphisms, kernels, and quotients  68
  5.4 The first isomorphism theorem  70
  5.5 The group Z^n  71
  5.6 Subgroups of Z^n and Smith Normal Form  73
  5.7 The direct sum of groups  77
  5.8 A preview of abstract algebra: Modules over PIDs  81
6 Corrections  83
0 Blurb
Lecture 1
As its title suggests, this module is a continuation of last year's MA106 Linear Algebra module; we'll be studying vector spaces, linear maps, and their properties in a bit more detail. Later in the module we'll think a bit about matrices whose entries lie not in a field but in the integers Z, and we'll see what our methods have to tell us in that case.
1 Review of some MA106 material
In this section, we'll recall some ideas from the first year MA106 Linear Algebra module. This will just be a brief reminder; for detailed statements and proofs, go back to your MA106 notes! (You didn't throw them away, did you? I hope not.)
1.1 Fields
Recall that a field is a number system where we know how to do all of the basic arithmetic operations: we can add, subtract, multiply and divide (as long as we're not trying to divide by zero).
Examples.
• A non-example is Z, the integers. Here we can add, subtract, and multiply, but we can't always divide without jumping out of Z into some bigger world.
• The real numbers R and the complexes C are fields, and these are perhaps the most familiar ones.
• The rational numbers Q are also a field.
• A more subtle example: if p is a prime number, the integers mod p are a field, written as Z/pZ or F_p.
There are lots of fields out there, and the reason we take the axiomatic approach is that we know that everything we prove will be applicable to any field we like, as long as we've only used the field axioms in our proofs (rather than any specific properties of the fields we happen to most like). We don't have to know all the fields in existence and check that our proofs are valid for each one separately.
1.2 Vector spaces
Let K be a field.¹ A vector space over K is a set V together with two extra pieces of structure. Firstly, it has to have a notion of addition: we need to know what v + w means if v and w are in V. Secondly, it has to have a notion of scalar multiplication: we need to know what λv means if v is in V and λ is in K. These have to satisfy some axioms, for which I'm going to refer you again to your MA106 notes.
A basis of a vector space is a subset B ⊆ V such that every v ∈ V can be written as a finite linear combination of elements of B,
    v = λ_1 b_1 + · · · + λ_n b_n,
for some n ∈ N and some λ_1, . . . , λ_n ∈ K; and for each v ∈ V, we can do this in one and only one way. Another way of saying this is that B is a linearly independent set which spans V, which is the definition you had in MA106. We say V is finite-dimensional if there is a finite basis of V. You saw last year that if V has one basis which is finite, then every basis of V is finite, and they all have the same size; and we define the dimension of V to be this magic number which is the size of any basis of V.
¹ It's conventional to use K as the letter to denote a field; the K stands for the German word Körper, as one of you pointed out to me in the first lecture.
Examples. Let K = R.
• The space of polynomials in x with coefficients in R is certainly a vector space over R; but it's not finite-dimensional (rather obviously).
• For any d ∈ N, the set R^d of column vectors with d real entries is a vector space over R (which, not surprisingly, has dimension d).
• The set
    { (x_1, x_2, x_3) ∈ R^3 : x_1 + x_2 + x_3 = 0 }
is a vector space over R.
The third example above is an interesting one because there's no natural choice of basis. It certainly has bases, e.g. the set
    { (1, -2, 1), (1, 0, -1) },
but there's no reason why that's better than any other one. This is one of the reasons why we need to worry about the choice of basis: if you want to tell someone else all the wonderful things you've found out about this vector space, you might get into a total muddle if you insisted on using one particular basis and they preferred another different one.
1.3 Linear maps
If V and W are vector spaces (over the same field K), then a linear map from V to W is a map T : V → W which respects the vector space structure. That is, we know two things that we can do with vectors in a vector space (add them, and multiply them by scalars); and a linear map is a map where adding or scalar-multiplying on the V side, then applying the map T, is the same as applying the map T, then adding or multiplying on the W side. Formally, for T to be a linear map means that we must have
    T(v_1 + v_2) = T(v_1) + T(v_2)   for all v_1, v_2 ∈ V
and
    T(λ v_1) = λ T(v_1)   for all λ ∈ K, v_1 ∈ V.
1.4 The matrix of a linear map with respect to a choice of bases
Let V and W be vector spaces over a field K. Let T : V → W be a linear map, where dim(V) = n, dim(W) = m. Choose a basis e_1, . . . , e_n of V and a basis f_1, . . . , f_m of W.
Now, for 1 ≤ j ≤ n, T(e_j) ∈ W, so T(e_j) can be written uniquely as a linear combination of f_1, . . . , f_m. Let
    T(e_1) = α_11 f_1 + α_21 f_2 + · · · + α_m1 f_m
    T(e_2) = α_12 f_1 + α_22 f_2 + · · · + α_m2 f_m
      · · ·
    T(e_n) = α_1n f_1 + α_2n f_2 + · · · + α_mn f_m
where the coefficients α_ij ∈ K (for 1 ≤ i ≤ m, 1 ≤ j ≤ n) are uniquely determined.
The coefficients α_ij form an m × n matrix
    A = [ α_11  α_12  . . .  α_1n ]
        [ α_21  α_22  . . .  α_2n ]
        [  .     .            .   ]
        [ α_m1  α_m2  . . .  α_mn ]
over K. Then A is called the matrix of the linear map T with respect to the chosen bases of V and W. Note that the columns of A are the images T(e_1), . . . , T(e_n) of the basis vectors of V represented as column vectors with respect to the basis f_1, . . . , f_m of W.
It was shown in MA106 that T is uniquely determined by A, and so there is a one-one correspondence between linear maps T : V → W and m × n matrices over K, which depends on the choice of bases of V and W.
For v ∈ V, we can write v uniquely as a linear combination of the basis vectors e_i; that is, v = x_1 e_1 + · · · + x_n e_n, where the x_i are uniquely determined by v and the basis e_i. We shall call x_i the coordinates of v with respect to the basis e_1, . . . , e_n. We associate to v the column vector of its coordinates,
    v = [ x_1 ]
        [ x_2 ]
        [  .  ]
        [ x_n ]   ∈ K^{n,1},
where K^{n,1} denotes the space of n × 1 column vectors with entries in K.
It was proved in MA106 that if A is the matrix of the linear map T, then for v ∈ V, we have T(v) = w if and only if Av = w, where w ∈ K^{m,1} is the column vector associated with w ∈ W.
1.5 Change of basis
Lecture 2
Let V be a vector space of dimension n over a field K, and let e_1, . . . , e_n and e′_1, . . . , e′_n be two bases of V. Then there is an invertible n × n matrix P = (p_ij) such that
    e′_j = Σ_{i=1}^{n} p_ij e_i   for 1 ≤ j ≤ n.   (∗)
P is called the basis change matrix or transition matrix for the original basis e_1, . . . , e_n and the new basis e′_1, . . . , e′_n. Note that the columns of P are the new basis vectors e′_i written as column vectors in the old basis vectors e_i. (Recall also that P is the matrix of the identity map V → V using basis e′_1, . . . , e′_n in the domain and basis e_1, . . . , e_n in the codomain.)
Usually the original basis e_1, . . . , e_n will be the standard basis of K^n.
Example. Let V = R^3, e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1) (the standard basis) and e′_1 = (0, 1, 2), e′_2 = (1, 2, 0), e′_3 = (-1, 0, 0). Then
    P = [ 0  1  -1 ]
        [ 1  2   0 ]
        [ 2  0   0 ].
The following result was proved in MA106.
Proposition 1.5.1. With the above notation, let v ∈ V, and let v and v′ denote the column vectors associated with v when we use the bases e_1, . . . , e_n and e′_1, . . . , e′_n, respectively. Then Pv′ = v.
So, in the example above, if we take v = (1, -2, 4), then we have v = e_1 - 2e_2 + 4e_3 (obviously); so the coordinates of v in the basis e_1, e_2, e_3 are v = (1, -2, 4), written as a column vector.
On the other hand, we also have v = 2e′_1 - 2e′_2 - 3e′_3, so the coordinates of v in the basis e′_1, e′_2, e′_3 are
    v′ = [  2 ]
         [ -2 ]
         [ -3 ],
and you can check that
    Pv′ = [ 0  1  -1 ] [  2 ]   [  1 ]
          [ 1  2   0 ] [ -2 ] = [ -2 ] = v,
          [ 2  0   0 ] [ -3 ]   [  4 ]
just as Proposition 1.5.1 says.
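If you want to check this sort of thing on a computer, here is a minimal Python sketch (using NumPy, and assuming the signs of P and v as I have reconstructed them above); it recovers v′ by solving Pv′ = v rather than typing v′ in by hand.

    import numpy as np

    P = np.array([[0, 1, -1],
                  [1, 2,  0],
                  [2, 0,  0]], dtype=float)   # columns = new basis vectors in old coordinates
    v = np.array([1, -2, 4], dtype=float)      # coordinates of v in the old (standard) basis

    v_new = np.linalg.solve(P, v)              # solve P v' = v for v'
    print(v_new)                               # expect [ 2. -2. -3.]
    print(P @ v_new)                           # expect [ 1. -2.  4.], i.e. P v' = v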
This equation Pv′ = v describes the change of coordinates associated with our basis change. We'll be using this relationship over and over again, so make sure you're happy with it!
Now let T : V → W, e_i, f_i and A be as in Subsection 1.4 above, and choose new bases e′_1, . . . , e′_n of V and f′_1, . . . , f′_m of W. Then
    T(e′_j) = Σ_{i=1}^{m} β_ij f′_i   for 1 ≤ j ≤ n,
where B = (β_ij) is the m × n matrix of T with respect to the bases e′_i and f′_i of V and W. Let the n × n matrix P = (p_ij) be the basis change matrix for original basis e_i and new basis e′_i, and let the m × m matrix Q = (q_ij) be the basis change matrix for original basis f_i and new basis f′_i. The following theorem was proved in MA106:
Theorem 1.5.2. With the above notation, we have AP = QB, or equivalently B = Q^{-1}AP.
In most of the applications in this module we will have V = W (= K^n), e_i = f_i, and e′_i = f′_i. So P = Q, and hence B = P^{-1}AP.
2 The Jordan Canonical Form
2.1 Introduction
Throughout this section V will be a vector space of dimension n over a field K, T : V → V will be a linear map, and A will be the matrix of T with respect to a fixed basis e_1, . . . , e_n of V. Our aim is to find a new basis e′_1, . . . , e′_n for V, such that the matrix of T with respect to the new basis is as simple as possible. Equivalently (by Theorem 1.5.2), we want to find an invertible matrix P (the associated basis change matrix) such that P^{-1}AP is as simple as possible.
(Recall that if B is a matrix which can be written in the form B = P^{-1}AP, we say B is similar to A. So a third way of saying the above is that we want to find a matrix that's similar to A, but which is as nice as possible.)
Our preferred form of matrix is a diagonal matrix. So we'd really rather like it if every matrix was similar to a diagonal matrix. But this won't work: we saw in MA106 that the matrix
    [ 1  1 ]
    [ 0  1 ],
for example, is not similar to a diagonal matrix. (We say this matrix is not diagonalizable.)
The point of this section of the module is to show that although we can't always get to a diagonal matrix, we can get pretty close (at least if K is C). Under this assumption, it can be proved that A is always similar to a matrix B of a certain type (called the Jordan canonical form or sometimes Jordan normal form of the matrix), which is not far off being diagonal: its only nonzero entries are on the diagonal or just above it.
2.2 Eigenvalues and eigenvectors
We start by summarising some of what we know from MA106 that's relevant here.
If we can find some 0 ≠ v ∈ V and λ ∈ K such that Tv = λv, or equivalently Av = λv, then λ is an eigenvalue, and v a corresponding eigenvector, of T (or of A).
From MA106, you have a theorem that tells you when a matrix is diagonalizable:
Proposition 2.2.1. Let T : V → V be a linear map. Then the matrix of T is diagonal with respect to some basis of V if and only if V has a basis consisting of eigenvectors of T.
This is a nice theorem, but it doesn't tell you how you might find such a basis! But there's one case where it's easy, as another theorem from MA106 tells us:
Proposition 2.2.2. Let λ_1, . . . , λ_r be distinct eigenvalues of T : V → V, and let v_1, . . . , v_r be corresponding eigenvectors. (So T(v_i) = λ_i v_i for 1 ≤ i ≤ r.) Then v_1, . . . , v_r are linearly independent.
Corollary 2.2.3. If the linear map T : V → V (or equivalently the n × n matrix A) has n distinct eigenvalues, where n = dim(V), then T (or A) is diagonalizable.
2.3 The minimal polynomial
We start this section with what might look like a bit of a digression; but I promise it'll be really relevant later! We'll spend a little while thinking about polynomials in a single variable x with coefficients in our field K; e.g. if K is C (or Q) we could take p = p(x) = 2x^2 - (3/2)x + 11. The set of all such polynomials is denoted by K[x].
If A ∈ K^{n,n} is a square n × n matrix over K, and p ∈ K[x] is a polynomial, then we know what p(A) means: we just calculate the powers of A in the usual way, and then plug them into the formula defining p, interpreting the constant term as a multiple of I_n.
Lecture 3
For instance, if K = Q, p is the polynomial I just wrote down, and A = [ 2  3 ; 0  1 ], then A^2 = [ 4  9 ; 0  1 ], and
    p(A) = 2 [ 4  9 ; 0  1 ] - (3/2) [ 2  3 ; 0  1 ] + 11 I_2 = [ 16  27/2 ; 0  23/2 ].
Warning. Notice that this is not the same as the matrix [ p(2)  p(3) ; p(0)  p(1) ].
Theorem 2.3.1. Let A ∈ K^{n,n}. Then there is some nonzero polynomial p ∈ K[x] of degree at most n^2 such that p(A) is the n × n zero matrix 0_n.
Proof. This is easy, but requires a slightly odd mental contortion. The key thing to observe is that K^{n,n}, the space of n × n matrices over K, is itself a vector space over K. Its dimension (rather obviously) is n^2.
Let's consider the set { I_n, A, A^2, . . . , A^{n^2} } ⊆ K^{n,n}. Since this is a set of n^2 + 1 vectors in an n^2-dimensional vector space, there had better be a linear relation between them. That is, we can find constants λ_0, λ_1, . . . , λ_{n^2}, not all zero, such that
    λ_0 I_n + · · · + λ_{n^2} A^{n^2} = 0_n.
Now we define the polynomial p = λ_0 + λ_1 x + · · · + λ_{n^2} x^{n^2}. This isn't zero, and its degree is at most n^2. (It might be less, since λ_{n^2} might be 0.) Then that's it! We've just shown that p(A) = 0_n, so we're done! We've plucked a polynomial that A satisfies out of thin air.
Now, the problem with this argument is that there's no reason why the polynomial it produces should be unique. In general, there might be many linear relations you could write down between those n^2 + 1 vectors, so there could be lots of polynomials satisfied by A. Is there a way of finding a particularly nice polynomial that A satisfies? To answer that question, we'll have to think a little bit about arithmetic in K[x].
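The argument in the proof is completely constructive, so you can get a computer to carry it out. Here is a small Python sketch (using SymPy) that flattens I_n, A, A^2, . . . , A^{n^2} into column vectors and finds a linear relation between them; the 2 × 2 matrix below is just a sample, not one of the examples from these notes.

    from sympy import Matrix

    A = Matrix([[2, 3], [0, 1]])
    n = A.shape[0]
    powers = [A**k for k in range(n**2 + 1)]                 # I, A, ..., A^{n^2}
    M = Matrix.hstack(*[p.reshape(n*n, 1) for p in powers])  # each power flattened to a column
    relation = M.nullspace()[0]                              # coefficients lambda_0, ..., lambda_{n^2}
    print(relation.T)
    # check: the corresponding polynomial really does kill A
    print(sum((c * p for c, p in zip(relation, powers)), Matrix.zeros(n, n)))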
Note that we can do division with polynomials, a bit like with integers. We can divide one polynomial p (with p ≠ 0) into another polynomial q and get a remainder with degree less than that of p. For example, if q = x^5 - 3 and p = x^2 + x + 1, then we find q = sp + r with s = x^3 - x^2 + 1 and r = -x - 4.
If the remainder is 0, so q = sp for some s, we say p divides q and write this relation as p | q.
Finally, a polynomial with coefficients in a field K is called monic if the coefficient of the highest power of x is 1. So, for example, x^3 - 2x^2 + x + 11 is monic, but 2x^2 - x - 1 is not.
Theorem 2.3.2. Let A be an n × n matrix over K representing the linear map T : V → V. Then
(i) There is a unique monic non-zero polynomial p(x) with minimal degree and coefficients in K such that p(A) = 0.
(ii) If q(x) is any polynomial with q(A) = 0, then p | q.
Proof. (i) If we have any polynomial p(x) with p(A) = 0, then we can make p monic by multiplying it by a constant. By Theorem 2.3.1, there exists such a p(x), so there exists one of minimal degree. If we had two distinct monic polynomials p_1(x), p_2(x) of the same minimal degree with p_1(A) = p_2(A) = 0, then p = p_1 - p_2 would be a non-zero polynomial of smaller degree with p(A) = 0, contradicting the minimality of the degree, so p is unique.
(ii) Let p(x) be the minimal monic polynomial in (i) and suppose that q(A) = 0. As we saw above, we can write q = sp + r where r has smaller degree than p. If r is non-zero, then r(A) = q(A) - s(A)p(A) = 0, contradicting the minimality of p, so r = 0 and p | q.
Definition 2.3.3. The unique monic polynomial μ_A(x) of minimal degree with μ_A(A) = 0 is called the minimal polynomial of A.
We know that for p ∈ K[x], p(T) = 0_V if and only if p(A) = 0_n; so μ_A is also the unique monic polynomial of minimal degree such that μ_A(T) = 0 (the minimal polynomial of T). In particular, since similar matrices A and B represent the same linear map T, their minimal polynomials are both equal to that of T. Hence we have
Proposition 2.3.4. Similar matrices have the same minimal polynomial.
By Theorem 2.3.1 and Theorem 2.3.2 (ii), we have:
Corollary 2.3.5. The minimal polynomial of an n × n matrix A has degree at most n^2.
(In the next section, we'll see that we can do much better than this.)
Example. If D is a diagonal matrix with diagonal entries d_11, . . . , d_nn, then for any polynomial p, p(D) is the diagonal matrix with diagonal entries p(d_11), . . . , p(d_nn). Hence p(D) = 0 if and only if p(d_ii) = 0 for all i. So for instance if
    D = [ 3  0  0 ]
        [ 0  3  0 ]
        [ 0  0  2 ],
the minimal polynomial of D is the smallest-degree polynomial which has 2 and 3 as roots, which is clearly μ_D(x) = (x - 2)(x - 3) = x^2 - 5x + 6.
We can generalize this example as follows:
Lecture 4
Proposition 2.3.6. Let D be any diagonal matrix and let λ_1, . . . , λ_r be the set of diagonal entries of D (i.e. without any repetitions, so the values λ_1, . . . , λ_r are all different). Then we have
    μ_D(x) = (x - λ_1)(x - λ_2) . . . (x - λ_r).
Proof. As in the example, we have p(D) = 0 if and only if p(λ_i) = 0 for all i ∈ {1, . . . , r}. The smallest-degree monic polynomial vanishing at these points is clearly the polynomial above.
Corollary 2.3.7. If A is any diagonalizable matrix, then μ_A(x) is a product of distinct linear factors.
Proof. Clear from Proposition 2.3.6 and Proposition 2.3.4.
Remark. We'll see later in the course that this is a necessary and sufficient condition: A is diagonalizable if and only if μ_A(x) is a product of distinct linear factors. But we don't have enough tools to prove this theorem yet; be patient!
2.4 The Cayley-Hamilton theorem
In this section we'll prove one of the main big theorems of the course, which shows that the minimal polynomial of an n × n matrix has degree at most n, and tells us a lot more besides:
Theorem 2.4.1 (Cayley-Hamilton). Let c_A(x) be the characteristic polynomial of the n × n matrix A over an arbitrary field K. Then c_A(A) = 0.
Proof. (Non-examinable) Let's agree to drop the various subscripts and bold zeroes; it'll be obvious from context when we mean a zero matrix, zero vector, zero linear map, etc.
Recall from MA106 that, if B is any n × n matrix, the adjugate matrix of B is another matrix adj(B) which was constructed along the way to constructing the inverse of B. The entries of adj(B) are the cofactors of B: the (i, j) entry of adj(B) is (-1)^{i+j} c_ji (note the transposition of indices here!), where c_ji = det(B_ji), B_ji being the (n - 1) × (n - 1) matrix obtained by deleting the j-th row and the i-th column of B. The key property of adj(B) is that it satisfies
    B adj(B) = adj(B) B = (det B) I_n.
(Notice that if B is invertible, this just says that adj(B) = (det B) B^{-1}, but the adjugate matrix still makes sense even if B is not invertible.)
Let's apply this to the matrix B = A - xI_n. By definition, det(B) is the characteristic polynomial c_A(x), so
    (A - xI_n) adj(A - xI_n) = c_A(x) I_n.   (1)
The key to this proof is that the entries of the adjugate adj(B) are made up from determinants of (n - 1) × (n - 1) submatrices of B. Since the entries of B are linear or constant polynomials in x, the entries of adj(B) are polynomials of degree at most n - 1. We can make a new matrix by taking the jth coefficient of each entry of adj(B), and thus write
    adj(A - xI_n) = B_0 + B_1 x + · · · + B_{n-1} x^{n-1},
where each B_i is an n × n matrix with entries in K. We'll also give names to the coefficients of c_A(x): let's say c_A(x) = a_0 + a_1 x + · · · + a_n x^n. So we can write equation (1) as
    (A - xI_n)(B_0 + B_1 x + · · · + B_{n-1} x^{n-1}) = (a_0 + a_1 x + · · · + a_n x^n) I_n.
Since this is a polynomial identity, we can equate coefficients of the powers of x on the left and right hand sides. In the list of equations below, the equations on the left are the result of equating coefficients of x^i for 0 ≤ i ≤ n, and those on the right are obtained by multiplying the corresponding left hand equation by A^i.
    AB_0 = a_0 I_n,                      AB_0 = a_0 I_n,
    AB_1 - B_0 = a_1 I_n,                A^2 B_1 - AB_0 = a_1 A,
    AB_2 - B_1 = a_2 I_n,                A^3 B_2 - A^2 B_1 = a_2 A^2,
        . . .                                . . .
    AB_{n-1} - B_{n-2} = a_{n-1} I_n,    A^n B_{n-1} - A^{n-1} B_{n-2} = a_{n-1} A^{n-1},
    -B_{n-1} = a_n I_n,                  -A^n B_{n-1} = a_n A^n.
Now summing all of the equations in the right hand column gives
    0 = a_0 I_n + a_1 A + . . . + a_{n-1} A^{n-1} + a_n A^n,
which says exactly that c_A(A) = 0.
Corollary 2.4.2. For any A ∈ K^{n,n}, we have μ_A | c_A, and in particular deg(μ_A) ≤ n.
Example. Let D be my diagonal matrix
    [ 3  0  0 ]
    [ 0  3  0 ]
    [ 0  0  2 ]
from last lecture. We saw above that μ_D(x) = (x - 2)(x - 3). However, it's easy to see that
    c_D(x) = det(D - xI_3) = (3 - x)^2 (2 - x) = -(x - 2)(x - 3)^2.
How NOT to prove the Cayley-Hamilton theorem
Lecture 5
It is very tempting to try and prove the Cayley-Hamilton theorem as follows: we know that
    c_A(x) = det(A - xI_n),
so shouldn't we have
    c_A(A) = det(A - AI_n) = det(A - A) = det(0) = 0?
This is wrong: the argument is not valid. We know that c_A(x) = det(A - xI_n) holds for all x ∈ K, but it doesn't hold in general when we replace x by a matrix. Indeed it doesn't even make sense: if B is some other matrix, then det(A - BI_n) = det(A - B) is an element of K, but c_A(B) is a matrix, so it's meaningless to expect that
    c_A(B) = det(A - BI_n)
holds for all B. This bogus argument is a good way to remember what the Cayley-Hamilton theorem says, but it's not a proof.
2.5 Calculating the minimal polynomial
The Cayley-Hamilton theorem gives us a rather powerful tool for calculating the minimal polynomial of a matrix. To make it even easier, here's a little lemma:
Lemma 2.5.1. Let λ be any eigenvalue of A. Then μ_A(λ) = 0.
Proof. Let v ∈ K^{n,1} be an eigenvector corresponding to λ. Then A^n v = λ^n v, and hence for any polynomial p ∈ K[x], we have
    p(A)v = p(λ)v.
We know that μ_A(A)v = 0, since μ_A(A) is the zero matrix. Hence μ_A(λ)v = 0, and since v ≠ 0 and μ_A(λ) is an element of K (not a matrix!), this can only happen if μ_A(λ) = 0.
This lemma, together with Cayley-Hamilton, gives us very, very few possibilities for μ_A. Let's look at an example.
Example. Take K = C and let
    A = [ 4  0  1  1 ]
        [ 1  2  0  0 ]
        [ 2  2  2  2 ]
        [ 1  1  0  3 ].
This is rather large, but it has a fair few zeros, so you can calculate its characteristic polynomial fairly quickly by hand and find out that
    c_A(x) = x^4 - 11x^3 + 45x^2 - 81x + 54.
Some trial and error shows that 2 is a root of this, and we find that
    c_A(x) = (x - 2)(x^3 - 9x^2 + 27x - 27) = (x - 2)(x - 3)^3.
So μ_A(x) divides (x - 2)(x - 3)^3. On the other hand, the eigenvalues of A are the roots of c_A(x), namely 2 and 3; and we know that μ_A must have each of these as roots. So the only possibilities for μ_A(x) are:
    μ_A(x) ∈ { (x - 2)(x - 3), (x - 2)(x - 3)^2, (x - 2)(x - 3)^3 }.
Some slightly tedious calculation shows that (A - 2I_4)(A - 3I_4) isn't zero, and nor is (A - 2I_4)(A - 3I_4)^2, but (A - 2I_4)(A - 3I_4)^3 is zero, so we're done: that's the minimal polynomial of A.
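The "slightly tedious calculation" is exactly the sort of thing a computer is for. Here is a Python sketch of the divisor-testing method, using SymPy with exact arithmetic. I have used a block-diagonal stand-in matrix with the same eigenvalue pattern (2, 3, 3, 3) rather than vouching for the exact entries of the example above; the method, not the matrix, is the point.

    from sympy import Matrix, eye

    A = Matrix([[2, 0, 0, 0],
                [0, 3, 1, 0],
                [0, 0, 3, 1],
                [0, 0, 0, 3]])          # stand-in: eigenvalues 2, 3, 3, 3
    I4 = eye(4)
    for k in (1, 2, 3):
        candidate = (A - 2*I4) * (A - 3*I4)**k
        print(k, candidate == Matrix.zeros(4, 4))   # False, False, True, so mu_A = (x-2)(x-3)^3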
Remark. In practice, one rarely has to go so far as this: you have to be pretty unlucky to get a minimal polynomial with a repeated factor, and unluckier still to get a factor that comes in with multiplicity 3.
2.6 Jordan chains and Jordan blocks
We'll now consider some special vectors attached to our matrix, which satisfy a condition a little like eigenvectors (but weaker). These will be the stepping-stones towards Jordan canonical form.
Definition 2.6.1. A non-zero vector v ∈ K^{n,1} such that (A - λI_n)^i v = 0, for some i > 0, is called a generalized eigenvector of A with respect to the eigenvalue λ.
Note that, for fixed i > 0, { v ∈ V | (A - λI_n)^i v = 0 } is the nullspace of (A - λI_n)^i, and is called the generalized eigenspace of index i of A with respect to λ.
The generalized eigenspace of index 1 is just called the eigenspace of A w.r.t. λ; it consists of the eigenvectors of A w.r.t. λ, together with the zero vector. We sometimes also consider the full generalized eigenspace of A w.r.t. λ, which is the set of all generalized eigenvectors together with the zero vector; this is the union of the generalized eigenspaces of index i over all i ∈ N.
We can arrange generalized eigenvectors into chains:
Definition 2.6.2. A Jordan chain of length k is a sequence of non-zero vectors v_1, . . . , v_k ∈ K^{n,1} that satisfies
    Av_1 = λv_1,   Av_i = λv_i + v_{i-1} for 2 ≤ i ≤ k,
for some eigenvalue λ of A.
Equivalently, (A - λI_n)v_1 = 0 and (A - λI_n)v_i = v_{i-1} for 2 ≤ i ≤ k, so (A - λI_n)^i v_i = 0 for 1 ≤ i ≤ k. Thus all of the vectors in a Jordan chain are generalized eigenvectors, and v_i lies in the generalized eigenspace of index i.
[XXX Schematic picture of generalized eigenspaces XXX]
For example, take K = C and consider the matrix
Lecture 6
    A = [ 3  1  0 ]
        [ 0  3  1 ]
        [ 0  0  3 ].
We see that, for b_1, b_2, b_3 the standard basis of C^{3,1}, we have Ab_1 = 3b_1, Ab_2 = 3b_2 + b_1, Ab_3 = 3b_3 + b_2, so b_1, b_2, b_3 is a Jordan chain of length 3 for the eigenvalue 3 of A. The generalized eigenspaces of index 1, 2, and 3 are respectively ⟨b_1⟩, ⟨b_1, b_2⟩, and ⟨b_1, b_2, b_3⟩.
Note that this isn't the only possible Jordan chain. Obviously, 17b_1, 17b_2, 17b_3 would be a Jordan chain; but there are more devious possibilities: you can check that b_1, b_1 + b_2, b_2 + b_3 is a Jordan chain, so there can be several Jordan chains with the same first vector. On the other hand, two Jordan chains with the same last vector are obviously the same (and in particular have the same length).
What are the generalized eigenspaces here? The only eigenvalue is 3 (obviously). For this eigenvalue, the generalized eigenspace of index 1 is ⟨b_1⟩ (the linear span of b_1); the generalized eigenspace of index 2 is ⟨b_1, b_2⟩; and the generalized eigenspace of index 3 is the whole space ⟨b_1, b_2, b_3⟩. So the dimensions are (1, 2, 3).
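These dimensions are easy to compute numerically as nullities of powers of A - 3I. Here is a minimal Python sketch (using NumPy) for the matrix above; it recovers exactly the dimensions (1, 2, 3).

    import numpy as np

    A = np.array([[3, 1, 0],
                  [0, 3, 1],
                  [0, 0, 3]], dtype=float)
    N = A - 3 * np.eye(3)
    for i in (1, 2, 3):
        nullity = 3 - np.linalg.matrix_rank(np.linalg.matrix_power(N, i))
        print(i, nullity)        # prints 1, 2, 3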
In general, you can think of the dimensions of the generalized eigenspaces of a matrix as a sort of "fingerprint". (Since we have these numbers for each λ, perhaps one should think of the eigenvalues as the "fingers".) We'll see later that these dimensions are very important in the theory of Jordan canonical form. For now, we'll stick to proving one proposition about them:
Proposition 2.6.3. The dimensions of corresponding generalized eigenspaces of similar matrices are the same.
Proof. Notice that the dimension of a generalized eigenspace of A is the nullity of (T - λI_V)^i, which depends only on the linear map T associated with A. Therefore it's independent of the choice of basis. Since similar matrices represent the same linear map in different bases, the proposition follows.
In the example we had earlier, the standard basis of K^{n,1} was a Jordan chain, and this meant that the matrix A had a rather special form. We'll give a name to matrices of this type:
Definition 2.6.4. We define the Jordan block of degree k with eigenvalue λ to be the k × k matrix J_{λ,k} whose (i, j) entry is λ if j = i, 1 if j = i + 1, and 0 otherwise.
So, for example,
    J_{1,2} = [ 1  1 ]      J_{λ,3} = [ 3i/2  1     0    ]      J_{0,4} = [ 0  1  0  0 ]
              [ 0  1 ],               [ 0     3i/2  1    ],               [ 0  0  1  0 ]
                                      [ 0     0     3i/2 ]               [ 0  0  0  1 ]
                                                                          [ 0  0  0  0 ]
are Jordan blocks, where λ = 3i/2 in the second example.
It should be clear that the matrix of T with respect to the basis v_1, . . . , v_n of K^n is a Jordan block of degree n if and only if v_1, . . . , v_n is a Jordan chain for A.
Note that the minimal polynomial of J_{λ,k} is equal to (x - λ)^k, and the characteristic polynomial is (λ - x)^k.
Warning. Some authors put the 1s below rather than above the main diagonal in a Jordan block. This corresponds to writing the Jordan chain in the reversed order.
2.7 Jordan bases and the Jordan canonical form
Definition 2.7.1. A Jordan basis for A is a basis of K^n consisting of one or more Jordan chains strung together.
Such a basis will look like
    w_11, . . . , w_{1k_1}, w_21, . . . , w_{2k_2}, . . . , w_s1, . . . , w_{sk_s},
where, for 1 ≤ i ≤ s, w_i1, . . . , w_{ik_i} is a Jordan chain (for some eigenvalue λ_i).
Lecture 7
We denote the m × n matrix in which all entries are 0 by 0_{m,n}. If A is an m × m matrix and B an n × n matrix, then we denote the (m + n) × (m + n) matrix with block form
    [ A        0_{m,n} ]
    [ 0_{n,m}  B       ]
by A ⊕ B, the direct sum of A and B. For example
    [ 1  2 ]     [ 1  1  1 ]   [ 1  2  0  0  0 ]
    [ 0  1 ]  ⊕  [ 1  0  1 ] = [ 0  1  0  0  0 ]
                 [ 2  0  2 ]   [ 0  0  1  1  1 ]
                               [ 0  0  1  0  1 ]
                               [ 0  0  2  0  2 ].
It's clear that the matrix of T with respect to a Jordan basis is the direct sum J_{λ_1,k_1} ⊕ J_{λ_2,k_2} ⊕ · · · ⊕ J_{λ_s,k_s} of the corresponding Jordan blocks.
We can now state the main theorem of this section, which says that if K is the complex numbers C, then Jordan bases exist.
Theorem 2.7.2. Let A be an n × n matrix over C. Then there exists a Jordan basis for A, and hence A is similar to a matrix J which is a direct sum of Jordan blocks. The Jordan blocks occurring in J are uniquely determined by A.
The matrix J in the theorem is said to be the Jordan canonical form (JCF) or sometimes Jordan normal form of A. It is uniquely determined by A up to the order of the blocks.
Remark. The only reason we need K = C in this theorem is to ensure that A has at least one eigenvalue. If K = R (or Q), we'd run into trouble with
    [ 0  1 ]
    [ -1 0 ];
this matrix has no eigenvalues, since c_A(x) = x^2 + 1 has no roots in K. So it certainly has no Jordan chains.
We will prove the theorem later, in Section 2.11. First we derive some consequences and study methods for calculating the JCF of a matrix. Note that, by Theorem 1.5.2, if P is the matrix having the Jordan basis as columns, then P^{-1}AP = J.
Theorem 2.7.3 (Consequences of the JCF). Let A ∈ C^{n,n}, and let λ_1, . . . , λ_r be the set of eigenvalues of A.
• The characteristic polynomial of A is
    (-1)^n ∏_{i=1}^{r} (x - λ_i)^{a_i},
where a_i is the sum of the degrees of the Jordan blocks of A of eigenvalue λ_i.
• The minimal polynomial of A is
    ∏_{i=1}^{r} (x - λ_i)^{b_i},
where b_i is the largest among the degrees of the Jordan blocks of A of eigenvalue λ_i.
• A is diagonalizable if and only if μ_A(x) has no repeated factors.
Proof. This follows from what we know about the minimal and characteristic polynomials of Jordan blocks, together with the fact that if C = A ⊕ B, then c_C(x) is the product of c_A(x) and c_B(x), and μ_C(x) is the LCM of μ_A(x) and μ_B(x). For the last part, notice that if A is diagonalizable, the JCF of A is just the diagonal form of A; since the JCF is unique, it follows that A is diagonalizable if and only if every Jordan block for A has size 1, so all of the numbers b_i are 1.
2.8 The JCF when n = 2 and 3
When n = 2 and n = 3, the JCF can be deduced just from the minimal and characteristic polynomials. Let us consider these cases.
When n = 2, we have either two distinct eigenvalues λ_1, λ_2, or a single repeated eigenvalue λ_1. If the eigenvalues are distinct, then by Corollary 2.2.3 A is diagonalizable and the JCF is the diagonal matrix J_{λ_1,1} ⊕ J_{λ_2,1}.
Example 1. A = [ 1  4 ; 1  1 ]. We calculate c_A(x) = x^2 - 2x - 3 = (x - 3)(x + 1), so there are two distinct eigenvalues, 3 and -1. Associated eigenvectors are (2, 1) and (2, -1), so we put
    P = [ 2   2 ]
        [ 1  -1 ]
and then P^{-1}AP = [ 3  0 ; 0  -1 ].
If the eigenvalues are equal, then there are two possible JCFs, J_{λ_1,1} ⊕ J_{λ_1,1}, which is a scalar matrix, and J_{λ_1,2}. The minimal polynomial is respectively (x - λ_1) and (x - λ_1)^2 in these two cases. In fact, these cases can be distinguished without any calculation whatsoever, because in the first case A is a scalar multiple of the identity, and in particular A is already in JCF.
In the second case, a Jordan basis consists of a single Jordan chain of length 2. To find such a chain, let v_2 be any vector for which (A - λ_1 I_2)v_2 ≠ 0 and let v_1 = (A - λ_1 I_2)v_2. (Note that, in practice, it is often easier to find the vectors in a Jordan chain in reverse order.)
Example 2. A = [ 1  4 ; -1  -3 ]. We have c_A(x) = x^2 + 2x + 1 = (x + 1)^2, so there is a single eigenvalue -1 with multiplicity 2. Since the first column of A + I_2 is non-zero, we can choose v_2 = (1, 0) and v_1 = (A + I_2)v_2 = (2, -1), so P = [ 2  1 ; -1  0 ] and P^{-1}AP = [ -1  1 ; 0  -1 ].
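This recipe is easy to automate. Here is a short Python sketch (using NumPy) that carries out the Example 2 calculation, with the signs of A as I have reconstructed them: pick any v_2 with (A - λI)v_2 ≠ 0, set v_1 = (A - λI)v_2, and check that the resulting P really conjugates A into a Jordan block.

    import numpy as np

    A = np.array([[1, 4],
                  [-1, -3]], dtype=float)
    lam = -1.0                                  # the repeated eigenvalue
    N = A - lam * np.eye(2)
    v2 = np.array([1.0, 0.0])                   # first column of N is non-zero, so this works
    v1 = N @ v2
    P = np.column_stack([v1, v2])
    print(np.linalg.inv(P) @ A @ P)             # expect the Jordan block [[-1, 1], [0, -1]]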
Now let n = 3. If there are three distinct eigenvalues, then A is diagonalizable.
Lecture 8
Suppose that there are two distinct eigenvalues, so one has multiplicity 2, and the other has multiplicity 1. Let the eigenvalues be λ_1, λ_1, λ_2, with λ_1 ≠ λ_2. Then there are two possible JCFs for A, J_{λ_1,1} ⊕ J_{λ_1,1} ⊕ J_{λ_2,1} and J_{λ_1,2} ⊕ J_{λ_2,1}, and the minimal polynomial is (x - λ_1)(x - λ_2) in the first case and (x - λ_1)^2(x - λ_2) in the second.
In the first case, a Jordan basis is a union of three Jordan chains of length 1, each of which consists of an eigenvector of A.
Example 3. A = [ 2  0  0 ; 1  5  2 ; -2  -6  -2 ]. Then
    c_A(x) = (2 - x)[(5 - x)(-2 - x) + 12] = (2 - x)(x^2 - 3x + 2) = (2 - x)^2(1 - x).
We know from the theory above that the minimal polynomial must be (x - 2)(x - 1) or (x - 2)^2(x - 1). We can decide which simply by calculating (A - 2I_3)(A - I_3) to test whether or not it is 0. We have
    A - 2I_3 = [ 0  0  0 ; 1  3  2 ; -2  -6  -4 ],   A - I_3 = [ 1  0  0 ; 1  4  2 ; -2  -6  -3 ],
and the product of these two matrices is 0, so μ_A = (x - 2)(x - 1).
The eigenvectors v for λ_1 = 2 satisfy (A - 2I_3)v = 0, and we must find two linearly independent solutions; for example we can take v_1 = (0, 2, -3), v_2 = (1, -1, 1). An eigenvector for the eigenvalue 1 is v_3 = (0, 1, -2), so we can choose
    P = [ 0   1   0 ]
        [ 2  -1   1 ]
        [ -3  1  -2 ]
and then P^{-1}AP is diagonal with entries 2, 2, 1.
In the second case, there are two Jordan chains, one for λ_1 of length 2, and one for λ_2 of length 1. For the first chain, we need to find a vector v_2 with (A - λ_1 I_3)^2 v_2 = 0 but (A - λ_1 I_3)v_2 ≠ 0, and then the chain is v_1 = (A - λ_1 I_3)v_2, v_2. For the second chain, we simply need an eigenvector for λ_2.
Example 4. A = [ 3  2  1 ; 0  3  1 ; -1  -4  -1 ]. Then
    c_A(x) = (3 - x)[(3 - x)(-1 - x) + 4] - 2 + (3 - x) = -x^3 + 5x^2 - 8x + 4 = (2 - x)^2(1 - x),
as in Example 3. We have
    A - 2I_3 = [ 1  2  1 ; 0  1  1 ; -1  -4  -3 ],   (A - 2I_3)^2 = [ 0  0  0 ; -1  -3  -2 ; 2  6  4 ],   A - I_3 = [ 2  2  1 ; 0  2  1 ; -1  -4  -2 ],
and we can check that (A - 2I_3)(A - I_3) is non-zero, so we must have μ_A = (x - 2)^2(x - 1).
For the Jordan chain of length 2, we need a vector v_2 with (A - 2I_3)^2 v_2 = 0 but (A - 2I_3)v_2 ≠ 0, and we can choose v_2 = (-2, 0, 1). Then v_1 = (A - 2I_3)v_2 = (-1, 1, -1). An eigenvector for the eigenvalue 1 is v_3 = (0, 1, -2), so we can choose
    P = [ -1  -2   0 ]
        [  1   0   1 ]
        [ -1   1  -2 ]
and then
    P^{-1}AP = [ 2  1  0 ]
               [ 0  2  0 ]
               [ 0  0  1 ].
Finally, suppose that there is a single eigenvalue, λ_1, so c_A = (λ_1 - x)^3. There are three possible JCFs for A, J_{λ_1,1} ⊕ J_{λ_1,1} ⊕ J_{λ_1,1}, J_{λ_1,2} ⊕ J_{λ_1,1}, and J_{λ_1,3}, and the minimal polynomials in the three cases are (x - λ_1), (x - λ_1)^2, and (x - λ_1)^3, respectively.
In the first case, J is a scalar matrix, and A = PJP^{-1} = J, so this is recognisable immediately.
In the second case, there are two Jordan chains, one of length 2 and one of length 1. For the first, we choose v_2 with (A - λ_1 I_3)v_2 ≠ 0, and let v_1 = (A - λ_1 I_3)v_2. (This case is easier than the case illustrated in Example 4, because we have (A - λ_1 I_3)^2 v = 0 for all v ∈ C^{3,1}.) For the second Jordan chain, we choose v_3 to be an eigenvector for λ_1 such that v_1 and v_3 are linearly independent.
Example 5. A = [ 0  2  1 ; -1  -3  -1 ; 1  2  0 ]. Then
    c_A(x) = -x[(3 + x)x + 2] - 2(x + 1) - 2 + (3 + x) = -x^3 - 3x^2 - 3x - 1 = -(1 + x)^3.
We have
    A + I_3 = [ 1  2  1 ; -1  -2  -1 ; 1  2  1 ],
and we can check that (A + I_3)^2 = 0. The first column of A + I_3 is non-zero, so (A + I_3)(1, 0, 0) ≠ 0, and we can choose v_2 = (1, 0, 0) and v_1 = (A + I_3)v_2 = (1, -1, 1). For v_3 we need to choose a vector which is not a multiple of v_1 such that (A + I_3)v_3 = 0, and we can choose v_3 = (0, 1, -2). So we have
    P = [ 1   1   0 ]
        [ -1  0   1 ]
        [ 1   0  -2 ]
and then
    P^{-1}AP = [ -1  1   0 ]
               [ 0  -1   0 ]
               [ 0   0  -1 ].
In the third case, there is a single Jordan chain, and we choose v_3 such that (A - λ_1 I_3)^2 v_3 ≠ 0, and set v_2 = (A - λ_1 I_3)v_3 and v_1 = (A - λ_1 I_3)^2 v_3.
Example 6. A = [ 0  1  0 ; -1  -1  1 ; 1  0  -2 ]. Then
    c_A(x) = -x[(2 + x)(1 + x)] - (2 + x) + 1 = -(1 + x)^3.
We have
    A + I_3 = [ 1  1  0 ; -1  0  1 ; 1  0  -1 ],   (A + I_3)^2 = [ 0  1  1 ; 0  -1  -1 ; 0  1  1 ],
so (A + I_3)^2 ≠ 0 and μ_A = (x + 1)^3. For v_3, we need a vector that is not in the nullspace of (A + I_3)^2. Since the second column, which is the image of (0, 1, 0), is non-zero, we can choose v_3 = (0, 1, 0), and then v_2 = (A + I_3)v_3 = (1, 0, 0) and v_1 = (A + I_3)v_2 = (1, -1, 1). So we have
    P = [ 1   1  0 ]
        [ -1  0  1 ]
        [ 1   0  0 ]
and then
    P^{-1}AP = [ -1  1   0 ]
               [ 0  -1   1 ]
               [ 0   0  -1 ].
2.9 The general case
Lecture 9
In the examples above, we could tell what the sizes of the Jordan blocks were for each eigenvalue from the dimensions of the eigenspaces, since the dimension of the eigenspace for each eigenvalue is the number of blocks for that eigenvalue. This doesn't work for n = 4: for instance, the matrices
    A_1 = J_{λ,2} ⊕ J_{λ,2}   and   A_2 = J_{λ,3} ⊕ J_{λ,1}
both have only one eigenvalue (λ) with the eigenspace being of dimension 2.
(Knowing the minimal polynomial helps, but it's a bit of a pain to calculate: generally the easiest way to find the minimal polynomial is to calculate the JCF first! Worse still, it still doesn't uniquely determine the JCF in large dimensions, since
    A_3 = J_{λ,3} ⊕ J_{λ,3} ⊕ J_{λ,1}   and   A_4 = J_{λ,3} ⊕ J_{λ,2} ⊕ J_{λ,2}
have the same minimal polynomial, the same characteristic polynomial, and the same number of blocks.)
In general, we can compute the JCF from the dimensions of the generalized eigenspaces (what I called the "fingerprint" of the matrix). Notice that the matrices A_1 and A_2 have different fingerprints: the generalized eigenspace for λ of index 2 has dimension 4 for A_1 (it's the whole space) but dimension only 3 for A_2.
Theorem 2.9.1. Let λ be an eigenvalue of a matrix A over C, and let J be the JCF of A. Then
(i) The number of Jordan blocks of J with eigenvalue λ is equal to nullity(A - λI_n).
(ii) More generally, for i > 0, the number of Jordan blocks of J with eigenvalue λ and degree at least i is equal to nullity((A - λI_n)^i) - nullity((A - λI_n)^{i-1}).
Note that this proves the uniqueness part of Theorem 2.7.2: the theorem says that the block sizes of the Jordan form of A are determined by the fingerprint of A, so any two Jordan canonical forms for A must have the same blocks (possibly ordered differently).
Proof. By Proposition 2.6.3, the corresponding generalized eigenspaces of A and J have the same dimensions, so we may assume WLOG that A = J. So A is a direct sum of several Jordan blocks J_{λ_1,k_1} ⊕ · · · ⊕ J_{λ_s,k_s}.
However, it's easy to see that the dimension of the generalized λ-eigenspace of index i of a direct sum A ⊕ B is the sum of the dimensions of the generalized eigenspaces of index i of A and of B. Hence it suffices to prove the theorem for a single Jordan block J_{λ,k}.
But we know that (J_{λ,k} - λI_k)^i has a single diagonal line of 1s i places above the diagonal, for i < k, and is 0 for i ≥ k. Hence the dimension of its kernel is i for 0 ≤ i ≤ k and k for i ≥ k. This clearly implies the theorem when A is a single Jordan block, and hence for any A.
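Here is a Python sketch of Theorem 2.9.1 in action (a numerical illustration, with λ = 0 for simplicity): the nullities of (A - λI)^i tell A_1 = J_{0,2} ⊕ J_{0,2} and A_2 = J_{0,3} ⊕ J_{0,1} apart, even though both have a single eigenvalue and a 2-dimensional eigenspace. The differences of consecutive nullities give the number of blocks of each size.

    import numpy as np

    def jordan_block(lam, k):
        return lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

    def block_diag(*blocks):
        n = sum(b.shape[0] for b in blocks)
        out = np.zeros((n, n))
        i = 0
        for b in blocks:
            k = b.shape[0]
            out[i:i+k, i:i+k] = b
            i += k
        return out

    A1 = block_diag(jordan_block(0, 2), jordan_block(0, 2))
    A2 = block_diag(jordan_block(0, 3), jordan_block(0, 1))
    for name, A in (("A1", A1), ("A2", A2)):
        nullities = [4 - np.linalg.matrix_rank(np.linalg.matrix_power(A, i)) for i in (1, 2, 3)]
        print(name, nullities)    # A1: [2, 4, 4]   A2: [2, 3, 4]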
2.10 Examples
Example 7. A = [ -1  3  1  0 ; 0  2  1  0 ; 0  0  2  0 ; 0  3  1  -1 ]. Then c_A(x) = (-1 - x)^2(2 - x)^2, so there are two eigenvalues, -1 and 2, both with multiplicity 2. There are four possibilities for the JCF (one or two blocks for each of the two eigenvalues). We could determine the JCF by computing the minimal polynomial μ_A, but it is probably easier to compute the nullities of the eigenspaces and use Theorem 2.9.1. We have
    A + I_4 = [ 0  3  1  0 ; 0  3  1  0 ; 0  0  3  0 ; 0  3  1  0 ],   A - 2I_4 = [ -3  3  1  0 ; 0  0  1  0 ; 0  0  0  0 ; 0  3  1  -3 ],
    (A - 2I_4)^2 = [ 9  -9  0  0 ; 0  0  0  0 ; 0  0  0  0 ; 0  -9  0  9 ].
The rank of A + I_4 is clearly 2, so its nullity is also 2, and hence there are two Jordan blocks with eigenvalue -1. The three non-zero rows of A - 2I_4 are linearly independent, so its rank is 3, hence its nullity 1, so there is just one Jordan block with eigenvalue 2, and the JCF of A is J_{-1,1} ⊕ J_{-1,1} ⊕ J_{2,2}.
For the two Jordan chains of length 1 for eigenvalue -1, we just need two linearly independent eigenvectors, and the obvious choice is v_1 = (1, 0, 0, 0), v_2 = (0, 0, 0, 1). For the Jordan chain v_3, v_4 for eigenvalue 2, we need to choose v_4 in the nullspace of (A - 2I_4)^2 but not in the nullspace of A - 2I_4. (This is why we calculated (A - 2I_4)^2.) An obvious choice here is v_4 = (0, 0, 1, 0), and then v_3 = (1, 1, 0, 1), and to transform A to JCF, we put
    P = [ 1  0  1  0 ]        P^{-1} = [ 1  -1  0  0 ]        P^{-1}AP = [ -1  0  0  0 ]
        [ 0  0  1  0 ]                 [ 0  -1  0  1 ]                   [ 0  -1  0  0 ]
        [ 0  0  0  1 ]                 [ 0   1  0  0 ]                   [ 0   0  2  1 ]
        [ 0  1  1  0 ],                [ 0   0  1  0 ],                  [ 0   0  0  2 ].
Example 8. A = [ -2  0  0  0 ; 0  -2  1  0 ; 0  0  -2  0 ; 1  0  -2  -2 ]. Then c_A(x) = (x + 2)^4, so there is a single eigenvalue -2 with multiplicity 4. We find
    A + 2I_4 = [ 0  0  0  0 ; 0  0  1  0 ; 0  0  0  0 ; 1  0  -2  0 ],
and (A + 2I_4)^2 = 0, so μ_A = (x + 2)^2, and the JCF of A could be J_{-2,2} ⊕ J_{-2,2} or J_{-2,2} ⊕ J_{-2,1} ⊕ J_{-2,1}.
To decide which case holds, we calculate the nullity of A + 2I_4 which, by Theorem 2.9.1, is equal to the number of Jordan blocks with eigenvalue -2. Since A + 2I_4 has just two non-zero rows, which are distinct, its rank is clearly 2, so its nullity is 4 - 2 = 2, and hence the JCF of A is J_{-2,2} ⊕ J_{-2,2}.
A Jordan basis consists of a union of two Jordan chains, which we will call v_1, v_2 and v_3, v_4, where v_1 and v_3 are eigenvectors and v_2 and v_4 are generalized eigenvectors of index 2. To find such chains, it is probably easiest to find v_2 and v_4 first and then to calculate v_1 = (A + 2I_4)v_2 and v_3 = (A + 2I_4)v_4.
Although it is not hard to find v_2 and v_4 in practice, we have to be careful, because they need to be chosen so that no linear combination of them lies in the nullspace of (A + 2I_4). In fact, since this nullspace is spanned by the second and fourth standard basis vectors, the obvious choice is v_2 = (1, 0, 0, 0), v_4 = (0, 0, 1, 0), and then v_1 = (A + 2I_4)v_2 = (0, 0, 0, 1), v_3 = (A + 2I_4)v_4 = (0, 1, 0, -2), so to transform A to JCF, we put
    P = [ 0  1  0  0 ]        P^{-1} = [ 0  2  0  1 ]        P^{-1}AP = [ -2  1   0  0 ]
        [ 0  0  1  0 ]                 [ 1  0  0  0 ]                   [ 0  -2   0  0 ]
        [ 0  0  0  1 ]                 [ 0  1  0  0 ]                   [ 0   0  -2  1 ]
        [ 1  0  -2  0 ],               [ 0  0  1  0 ],                  [ 0   0   0  -2 ].
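If you want to double-check a JCF computation like this one, a computer algebra system will do the whole thing for you. Here is a small Python sketch using SymPy, with the entries of A as I have reconstructed them above (the off-diagonal signs are my reading of the printed notes); SymPy may order the blocks or scale the chains differently, but the block structure agrees.

    from sympy import Matrix

    A = Matrix([[-2, 0,  0,  0],
                [ 0, -2, 1,  0],
                [ 0, 0, -2,  0],
                [ 1, 0, -2, -2]])
    P, J = A.jordan_form()       # returns P and J with A = P*J*P**-1
    print(J)                     # two 2x2 Jordan blocks for the eigenvalue -2
    print(P**-1 * A * P == J)    # True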
2.11 Proof of Theorem 2.7.2
This proof is rather difficult, and it will not be examined.
We proceed by induction on n = dim(V). The case n = 1 is clear.
Let λ be an eigenvalue of T and let U = im(T - λI_V) and m = dim(U). Then m = rank(T - λI_V) = n - nullity(T - λI_V) < n, because the eigenvectors for λ lie in the nullspace of T - λI_V.
For u ∈ U, we have u = (T - λI_V)(v) for some v ∈ V, and hence T(u) = T(T - λI_V)(v) = (T - λI_V)T(v) ∈ U. So T restricts to T_U : U → U, and we can apply our inductive hypothesis to T_U to deduce that U has a basis e_1, . . . , e_m which is a disjoint union of Jordan chains for T_U.
We now show how to extend the Jordan basis of U to one of V. We do this in two stages. For the first stage, suppose that l of the Jordan chains of T_U are for the eigenvalue λ (possibly l = 0). For each such chain v_1, . . . , v_k with T(v_1) = λv_1, T(v_i) = λv_i + v_{i-1} for 2 ≤ i ≤ k, since v_k ∈ U = im(T - λI_V), we can find v_{k+1} ∈ V with T(v_{k+1}) = λv_{k+1} + v_k, thereby extending the chain by an extra vector. So far we have adjoined l new vectors to the basis, by extending each of these l Jordan chains by 1. Let us call these new vectors w_1, . . . , w_l.
For the second stage, observe that the first vector in each of the l chains lies in the eigenspace of T_U for λ. We know that the eigenspace of T for λ is the nullspace of (T - λI_V), which has dimension n - m. So we can adjoin (n - m) - l further eigenvectors of T to the l that we have already, to complete a basis of the nullspace of (T - λI_V). Let us call these (n - m) - l new vectors w_{l+1}, . . . , w_{n-m}. They are adjoined to our basis of V in the second stage. They each form a Jordan chain of length 1, so we now have a collection of n vectors which form a disjoint union of Jordan chains.
To complete the proof, we need to show that these n vectors form a basis of V, for which it is enough to show that they are linearly independent.
Partly because of notational difficulties, we provide only a sketch proof of this, and leave the details to the student. Suppose that α_1 w_1 + · · · + α_{n-m} w_{n-m} + x = 0, where x is a linear combination of the basis vectors of U. Applying T - λI_n gives
    α_1 (T - λI_n)(w_1) + · · · + α_l (T - λI_n)(w_l) + (T - λI_n)(x) = 0.
Each of (T - λI_n)(w_i) for 1 ≤ i ≤ l is the last member of one of the l Jordan chains for T_U. When we apply (T - λI_n) to one of the basis vectors of U, we get a linear combination of the basis vectors of U other than the vectors (T - λI_n)(w_i) for 1 ≤ i ≤ l. Hence, by the linear independence of the basis of U, we deduce that α_i = 0 for 1 ≤ i ≤ l. This implies that (T - λI_n)(x) = 0, so x is in the eigenspace of T_U for the eigenvalue λ. But, by construction, w_{l+1}, . . . , w_{n-m} extend a basis of this eigenspace of T_U to a basis of the eigenspace of T on V, so we also get α_i = 0 for l + 1 ≤ i ≤ n - m, which completes the proof.
3 Functions of a matrix
In this short section, we're going to use what we know about Jordan normal forms of matrices to do a bit of analysis with matrices.
3.1 Motivation: Systems of differential equations
Suppose I want to solve a system of first-order simultaneous differential equations, something like
    da/dt = 3a - 4b + 8c,   db/dt = a - c,   dc/dt = a + b + c.
You'll have seen things like this in your Differential Equations course last year. Now, I want to write this in a bit of a different form. If we write v(t) = (a(t), b(t), c(t)), a vector-valued function of time, we can write the above system as
    dv/dt = Av,
where A is the matrix
    [ 3  -4   8 ]
    [ 1   0  -1 ]
    [ 1   1   1 ].
"Aha!" we say. "I know what the solution is: it's v(t) = e^{At} v(0)!" But then we pause, and say "Hang on, what does e^{At} actually mean?" In this section, we'll use what we now know about special forms of matrices to work out how to define e^{At}, and other functions of a matrix, in a sensible way that will make this actually work; and having got our definition, we'll work out how to calculate with it. It turns out that Jordan canonical form will play a key role in making the latter possible.
3.2 The exponential of a matrix
In this section, we'll define e^A where A is a matrix. We know that for z ∈ C, the exponential e^z can be defined by its Taylor series. So we're going to use the same series definition to define e^A for A a matrix:
Definition 3.2.1. If A ∈ C^{n,n}, we define e^A to be the infinite series
    e^A = I_n + A + A^2/2 + A^3/6 + · · · = Σ_{k=0}^{∞} A^k / k!.
We will, for the purposes of this course, assume that this series converges for any A (in the sense that for any fixed (i, j), the (i, j) entry of the Nth partial sum tends to a limit in C).
Remark. This is not especially hard to prove, but to explain clearly what's going on would need some terminology (norms and normed spaces) that only comes towards the end of Analysis III. Hence you will not need to know how to rigorously justify these limit statements for your exam, but you will need to be able to work with the matrix exponential in examples, like the ones later on in this section.
Warning. It is not true that e^{A+B} = e^A e^B for general matrices A and B; we will see an example of this on the first problem sheet.
Lemma 3.2.2.
Lecture 10
1. Let A and B ∈ C^{n,n} be similar, so B = P^{-1}AP for some invertible matrix P. Then e^B = P^{-1} e^A P.
2. We have e^{A+B} = e^A e^B if A, B ∈ C^{n,n} satisfy AB = BA.
3. If we consider the matrix-valued function t ↦ e^{At}, for t ∈ R, then d/dt (e^{At}) = A e^{At}.
Sketch proof. For part (1): it's clear that B^k = P^{-1} A^k P for any k ∈ N (by induction on k, if you like). So if we consider the sum of the first n terms of the series, we have
    Σ_{k=0}^{n} B^k/k! = Σ_{k=0}^{n} P^{-1} A^k P / k! = P^{-1} ( Σ_{k=0}^{n} A^k/k! ) P.
It can be shown that it is valid to let n → ∞, and we deduce that
    Σ_{k=0}^{∞} B^k/k! = P^{-1} ( Σ_{k=0}^{∞} A^k/k! ) P,
as required.
For part (2), it's easily checked by induction on k that if AB = BA, then the binomial expansion of (A + B)^k works: we have
    (A + B)^k = Σ_{j=0}^{k} C(k, j) A^{k-j} B^j.
Substituting this into the power series definition of e^{A+B}, and grouping terms appropriately, gives the result.
For part (3), we need to show that
    lim_{h→0} (e^{A(t+h)} - e^{At}) / h = A e^{At}.
Using part (2), we know that e^{A(t+h)} = e^{At} e^{Ah}, so we just need to show that
    lim_{h→0} (e^{Ah} - I_n) / h = A.
Substituting in the series expansion of e^{Ah}, we get
    (e^{Ah} - I_n)/h = A + (A^2/2) h + (A^3/3!) h^2 + . . .
and I'm sure you'll believe me that the limit of the right-hand side is A.
Let's now consider an easy case: e^D where D is a diagonal matrix.
Proposition 3.2.3. If D is diagonal with diagonal entries d_1, . . . , d_n, then e^D is diagonal with diagonal entries e^{d_1}, . . . , e^{d_n}.
Proof. Clear from the definition, since D^k/k! is diagonal with diagonal entries (d_1)^k/k!, . . . , (d_n)^k/k!.
Combining the two propositions above, we can easily calculate e^A for any diagonalizable A: just find P such that P^{-1}AP = D is diagonal, so e^A = P e^D P^{-1}, and calculate e^D as above. Similarly, we can get a nice formula for the matrix-valued function t ↦ e^{At}: it's just
    P diag( e^{d_1 t}, . . . , e^{d_n t} ) P^{-1}.
In particular, for any diagonalizable A, each entry of e^{At} is a linear combination of the functions e^{λ_1 t}, . . . , e^{λ_n t} with constant coefficients.
Example. Consider A = [ 1  4 ; 1  1 ]. This was Example 1 from Section 2.8 above, and we saw that P^{-1}AP = D where
    P = [ 2   2 ]      D = [ 3   0 ]
        [ 1  -1 ],         [ 0  -1 ].
Hence
    e^{At} = P e^{Dt} P^{-1} = [ 2   2 ] [ e^{3t}  0      ] [ 2   2 ]^{-1}
                               [ 1  -1 ] [ 0       e^{-t} ] [ 1  -1 ]
           = [ (1/2)e^{3t} + (1/2)e^{-t}    e^{3t} - e^{-t}           ]
             [ (1/4)e^{3t} - (1/4)e^{-t}    (1/2)e^{3t} + (1/2)e^{-t} ].
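Here is a quick Python sketch that checks this closed form at a sample value of t against SciPy's numerical matrix exponential (with the signs of A, P and D as worked out above).

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[1.0, 4.0],
                  [1.0, 1.0]])
    t = 0.7
    closed_form = np.array([
        [0.5*np.exp(3*t) + 0.5*np.exp(-t),    np.exp(3*t) - np.exp(-t)],
        [0.25*np.exp(3*t) - 0.25*np.exp(-t),  0.5*np.exp(3*t) + 0.5*np.exp(-t)],
    ])
    print(np.allclose(expm(A * t), closed_form))   # True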
Now, what happens if we look at a matrix which isn't diagonalizable? We may as well start by using a similarity to make it as nice as possible: let's put it in Jordan canonical form. It's clear that if A = A_1 ⊕ · · · ⊕ A_s is a block sum of smaller matrices, then
    e^A = e^{A_1} ⊕ · · · ⊕ e^{A_s}.
So we need to understand what the exponential of a single Jordan block looks like.
Theorem 3.2.4. If J = J_{λ,s} is a Jordan block, then e^{Jt} is the matrix whose (i, j) entry is given by
    t^{j-i} e^{λt} / (j - i)!   if j ≥ i,   and 0 if j < i.
Sketch proof. Let's calculate J^k where J = J_{λ,s}. We know that J_{λ,s} = λI_s + J_{0,s}, and these terms obviously commute; so we can apply the binomial theorem to find that
    J_{λ,s}^k = Σ_{t=0}^{k} C(k, t) λ^{k-t} J_{0,s}^t.
But J_{0,s} has minimal polynomial x^s, so J_{0,s}^t is 0 for t ≥ s; that is, we have
    J_{λ,s}^k = Σ_{t=0}^{min(k,s)} C(k, t) λ^{k-t} J_{0,s}^t.
Since J_{0,s}^t has a diagonal strip of 1s at t places above the main diagonal, and no other entries, we've shown that the (i, j) entry of J_{λ,s}^k is
    λ^{k-j+i} C(k, j - i)   if i ≤ j ≤ i + k,   and 0 if j < i or j > i + k.
Now, it follows that the (i, j) entry of e^{Jt} is 0 for j < i, and for j ≥ i it is
    Σ_{k=j-i}^{∞} (t^k / k!) λ^{k-j+i} C(k, j - i),
with the lower bound on the sum corresponding to the fact that the (i, j) entry of J^k is 0 if j > i + k.
We substitute in C(k, j - i) = k! / ((j - i)!(k - j + i)!) and this magically simplifies:
    Σ_{k=j-i}^{∞} t^k λ^{k-j+i} / ((j - i)!(k - j + i)!) = (t^{j-i}/(j - i)!) Σ_{k=j-i}^{∞} (λt)^{k-j+i} / (k - j + i)!.
The last sum is just crying out for us to substitute a new summation variable m = k - j + i, and it just becomes
    (t^{j-i}/(j - i)!) Σ_{m=0}^{∞} (λt)^m / m! = (t^{j-i}/(j - i)!) e^{λt},
as required.
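As a sanity check, here is a Python sketch that builds e^{Jt} directly from the formula in Theorem 3.2.4 and compares it with a numerical matrix exponential; the values of λ, s and t below are purely illustrative.

    import numpy as np
    from math import factorial
    from scipy.linalg import expm

    lam, s, t = 2.0, 4, 0.3
    J = lam * np.eye(s) + np.diag(np.ones(s - 1), 1)      # the Jordan block J_{lambda,s}

    formula = np.zeros((s, s))
    for i in range(s):
        for j in range(i, s):
            formula[i, j] = t**(j - i) * np.exp(lam * t) / factorial(j - i)

    print(np.allclose(expm(J * t), formula))   # True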
Lecture 11
Example. Let
    A = [ 1   0  -3 ]
        [ -1  -1  6 ]
        [ -1  -2  5 ].
Using the methods of the last chapter we can check that its JCF is
    J = [ 2  1  0 ]
        [ 0  2  0 ]
        [ 0  0  1 ]
and the basis change matrix P such that J = P^{-1}AP is given by
    P = [ -3  0  -2 ]
        [ 3   1   1 ]
        [ 1   1   0 ].
Applying the argument above, we see that e^{At} = P e^{Jt} P^{-1} where
    e^{Jt} = [ e^{2t}  t e^{2t}  0   ]
             [ 0       e^{2t}   0   ]
             [ 0       0        e^t ].
We can now calculate e^{At} explicitly by doing the matrix multiplication to get the entries of P e^{Jt} P^{-1}, as we did in the 2 × 2 example above; but I think you'll agree it's better if I don't write this down here!
Now let's apply this to solve a differential equation.
Example. Let x and y be functions of time satisfying
    dx/dt = 3x - 4y,   dy/dt = x - y.
We can write this as dv/dt = Av, where v(t) = (x(t), y(t)) and A = [ 3  -4 ; 1  -1 ]. Suppose we choose any vector v_0 ∈ C^{2,1}; then it's clear from the lemma that
    d/dt (e^{At} v_0) = A e^{At} v_0.
Hence if we set v(t) = e^{At} v_0, we know that this is a solution to the differential equations, with v(0) = v_0 being the vector we started with. And we can use what we know about Jordan form to calculate e^{At}: we have
    e^{At} = [ (1 + 2t)e^t   -4t e^t     ]
             [ t e^t         (1 - 2t)e^t ].
So the general solution of the equations above is
    x(t) = (1 + 2t)e^t x(0) - 4t e^t y(0),
    y(t) = t e^t x(0) + (1 - 2t)e^t y(0).
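It is worth verifying once that this really does solve the system. Here is a Python sketch (using SymPy) that differentiates the closed-form solution symbolically and checks that dv/dt - Av vanishes, with the signs of A as reconstructed above.

    from sympy import Matrix, symbols, exp, simplify

    t, x0, y0 = symbols('t x0 y0')
    A = Matrix([[3, -4],
                [1, -1]])
    x = (1 + 2*t)*exp(t)*x0 - 4*t*exp(t)*y0
    y = t*exp(t)*x0 + (1 - 2*t)*exp(t)*y0
    v = Matrix([x, y])
    print(simplify(v.diff(t) - A*v))   # the zero vector, so the equations are satisfied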
3.3 Dierence equations
In the course of the proof of Theorem 3.2.4, we gave a formula for the entries of the powers of a
matrix in Jordan normal form. As well as being a step in nding the exponential, this has other
uses too, since knowing powers of a matrix is useful for solving difference equations.
For example, the Fibonacci numbers $F_n$ satisfy the difference equation
\[ F_{n+1} = F_n + F_{n-1}, \]
of degree 2, with $F_0 = 0$ and $F_1 = 1$. We can reduce this to a degree 1 difference equation in two variables by considering the sequences $x_n$ and $y_n$ defined by $x_n = F_n$ and $y_n = F_{n+1}$; these satisfy
\[ \begin{cases} x_{n+1} = y_n, \\ y_{n+1} = x_n + y_n. \end{cases} \]
Just as we did for differential equations, we can write this as one equation for the vector-valued sequence $v_n = \begin{pmatrix} x_n \\ y_n \end{pmatrix}$: we have $v_{n+1} = A v_n$ where $A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$. Hence $v_n = A^n v_0$.
This matrix $A$ has distinct eigenvalues $\alpha = \dfrac{1+\sqrt{5}}{2}$ and $\beta = \dfrac{1-\sqrt{5}}{2}$, so we can write $A = PDP^{-1}$ where $P$ is the matrix of eigenvectors of $A$ and
\[ D = \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}. \]
Therefore $A^n = P D^n P^{-1}$. But $D^n$ is easy to calculate: it's just $\begin{pmatrix} \alpha^n & 0 \\ 0 & \beta^n \end{pmatrix}$. If we do a quick calculation of what $P$ is, we obtain Binet's formula for the Fibonacci numbers:
\[ F_n = \frac{\alpha^n - \beta^n}{\alpha - \beta} = \frac{\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n}{\sqrt{5}}. \]
If we try and do this for more complicated difference equations, we might hit matrices which aren't diagonalizable, so we will need to use Jordan canonical form. Here's an example (taken from the book by Kaye and Wilson, 14.11):
Example. Let $x_n$, $y_n$, $z_n$ be sequences of complex numbers satisfying
\[ \begin{cases} x_{n+1} = 3x_n + z_n, \\ y_{n+1} = -x_n + y_n - z_n, \\ z_{n+1} = y_n + 2z_n, \end{cases} \]
with $x_0 = y_0 = z_0 = 1$.
We can write this as
\[ v_{n+1} = \begin{pmatrix} 3 & 0 & 1 \\ -1 & 1 & -1 \\ 0 & 1 & 2 \end{pmatrix} v_n. \]
So we have
\[ v_n = A^n v_0 = A^n \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \]
where $A$ is this $3 \times 3$ matrix.
We find that the JCF of $A$ is $J = P^{-1}AP$ where
\[ J = J_{2,3} = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & 1 & 1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{pmatrix}. \]
The formula for the entries of $J^k$ for $J$ a Jordan block tells us that
\[ J^n = \begin{pmatrix} 2^n & n\,2^{n-1} & \binom{n}{2} 2^{n-2} \\ 0 & 2^n & n\,2^{n-1} \\ 0 & 0 & 2^n \end{pmatrix} = 2^n \begin{pmatrix} 1 & \tfrac{1}{2}n & \tfrac{1}{4}\binom{n}{2} \\ 0 & 1 & \tfrac{1}{2}n \\ 0 & 0 & 1 \end{pmatrix}. \]
We therefore have
\[ A^n = P J^n P^{-1} = 2^n \begin{pmatrix} 1 & 1 & 1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & \tfrac{1}{2}n & \tfrac{1}{4}\binom{n}{2} \\ 0 & 1 & \tfrac{1}{2}n \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ 1 & 1 & 1 \end{pmatrix} \]
\[ = 2^n \begin{pmatrix} 1 & 1+\tfrac{1}{2}n & 1+\tfrac{1}{2}n+\tfrac{1}{4}\binom{n}{2} \\ 0 & -1 & -\tfrac{1}{2}n \\ -1 & -\tfrac{1}{2}n & -\tfrac{1}{4}\binom{n}{2} \end{pmatrix} \begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ 1 & 1 & 1 \end{pmatrix} \]
\[ = 2^n \begin{pmatrix} 1+\tfrac{1}{2}n+\tfrac{1}{4}\binom{n}{2} & \tfrac{1}{4}\binom{n}{2} & \tfrac{1}{2}n+\tfrac{1}{4}\binom{n}{2} \\ -\tfrac{1}{2}n & 1-\tfrac{1}{2}n & -\tfrac{1}{2}n \\ -\tfrac{1}{4}\binom{n}{2} & \tfrac{1}{2}n-\tfrac{1}{4}\binom{n}{2} & 1-\tfrac{1}{4}\binom{n}{2} \end{pmatrix}. \]
Finally, we obtain
\[ A^n \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = 2^n \begin{pmatrix} 1 + n + \tfrac{3}{4}\binom{n}{2} \\ 1 - \tfrac{3}{2}n \\ 1 + \tfrac{1}{2}n - \tfrac{3}{4}\binom{n}{2} \end{pmatrix}, \]
or equivalently, using the fact that $\binom{n}{2} = \frac{n(n-1)}{2}$,
\[ \begin{cases} x_n = 2^n\bigl(\tfrac{3}{8}n^2 + \tfrac{5}{8}n + 1\bigr) \\ y_n = 2^n\bigl(1 - \tfrac{3}{2}n\bigr) \\ z_n = 2^n\bigl(-\tfrac{3}{8}n^2 + \tfrac{7}{8}n + 1\bigr). \end{cases} \]
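These closed forms can be verified directly by iterating the recurrence. The snippet below is not from the notes; it assumes NumPy and uses the matrix $A$ exactly as written above.

```python
# Not in the notes: iterate the recurrence and compare with the closed forms.
import numpy as np

A = np.array([[3, 0, 1], [-1, 1, -1], [0, 1, 2]], dtype=np.int64)
v = np.array([1, 1, 1], dtype=np.int64)

ok = True
for n in range(20):
    x, y, z = v
    ok &= (x == 2 ** n * (3 * n ** 2 + 5 * n + 8) // 8)   # x_n
    ok &= (y == 2 ** n * (2 - 3 * n) // 2)                 # y_n
    ok &= (z == 2 ** n * (-3 * n ** 2 + 7 * n + 8) // 8)   # z_n
    v = A @ v                                              # advance the recurrence
print(ok)  # True
```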
4 Bilinear Maps and Quadratic Forms
Lecture 12
We'll now introduce another, rather different kind of object you can define for vector spaces: a bilinear map. These are a bit different from linear maps: rather than being machines that take a vector and spit out another vector, they take two vectors as input and spit out a number.
4.1 Bilinear maps: definitions
Let $V$ and $W$ be vector spaces over a field $K$.
Definition 4.1.1. A bilinear map on $V$ and $W$ is a map $\tau : V \times W \to K$ such that
(i) $\tau(\alpha_1 v_1 + \alpha_2 v_2, w) = \alpha_1 \tau(v_1, w) + \alpha_2 \tau(v_2, w)$; and
(ii) $\tau(v, \alpha_1 w_1 + \alpha_2 w_2) = \alpha_1 \tau(v, w_1) + \alpha_2 \tau(v, w_2)$
for all $v, v_1, v_2 \in V$, $w, w_1, w_2 \in W$, and $\alpha_1, \alpha_2 \in K$.
So $\tau(v, w)$ is linear in $v$ for each $w$, and linear in $w$ for each $v$: linear in two different ways, hence the term bilinear.
Clearly if we fix bases of $V$ and $W$, a bilinear map will be determined by what it does to the basis vectors. Choose a basis $e_1, \dots, e_n$ of $V$ and a basis $f_1, \dots, f_m$ of $W$.
Let $\tau : V \times W \to K$ be a bilinear map, and let $\alpha_{ij} = \tau(e_i, f_j)$, for $1 \le i \le n$, $1 \le j \le m$. Then the $n \times m$ matrix $A = (\alpha_{ij})$ is defined to be the matrix of $\tau$ with respect to the bases $e_1, \dots, e_n$ and $f_1, \dots, f_m$ of $V$ and $W$.
For $v \in V$, $w \in W$, let $v = x_1 e_1 + \dots + x_n e_n$ and $w = y_1 f_1 + \dots + y_m f_m$, so the coordinates of $v$ and $w$ with respect to our bases are
\[ v = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in K^{n,1}, \qquad \text{and} \qquad w = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} \in K^{m,1}. \]
Then, by using the equations (i) and (ii) above, we get
\[ \tau(v, w) = \sum_{i=1}^{n} \sum_{j=1}^{m} x_i\, \tau(e_i, f_j)\, y_j = \sum_{i=1}^{n} \sum_{j=1}^{m} x_i\, \alpha_{ij}\, y_j = v^T A w. \tag{2.1} \]
So once we've fixed bases of $V$ and $W$, every bilinear map on $V$ and $W$ corresponds to an $n \times m$ matrix, and conversely every matrix determines a bilinear map.
For example, let $V = W = \mathbb{R}^2$ and use the natural basis of $V$. Suppose that $A = \begin{pmatrix} 1 & -1 \\ 2 & 0 \end{pmatrix}$. Then
\[ \tau\bigl((x_1, x_2), (y_1, y_2)\bigr) = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 2 & 0 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = x_1 y_1 - x_1 y_2 + 2 x_2 y_1. \]
4.2 Bilinear maps: change of basis
We retain the notation of the previous section, so $\tau$ is a bilinear map on $V$ and $W$, and $A$ is the matrix of $\tau$ with respect to some bases $e_1, \dots, e_n$ of $V$ and $f_1, \dots, f_m$ of $W$.
As in 1.5 of the course, suppose that we choose new bases $e'_1, \dots, e'_n$ of $V$ and $f'_1, \dots, f'_m$ of $W$, and let $P$ and $Q$ be the associated basis change matrices. Let $B$ be the matrix of $\tau$ with respect to these new bases.
Let $v$ be any vector in $V$. Then we know (from Proposition 1.5.1) that if $v \in K^{n,1}$ is the column vector of coordinates of $v$ with respect to the old basis $e_1, \dots, e_n$, and $v'$ the coordinates of $v$ in the new basis $e'_1, \dots, e'_n$, then we have $P v' = v$. Similarly, for any $w \in W$, the coordinates $w$ and $w'$ of $w$ with respect to the old and new bases of $W$ are related by $Q w' = w$.
We know that we have
\[ v^T A w = \tau(v, w) = (v')^T B w'. \]
Substituting in the formulae from Proposition 1.5.1, we have
\[ (v')^T B w' = (P v')^T A (Q w') = (v')^T P^T A Q\, w'. \]
Since this relation must hold for all $v' \in K^{n,1}$ and $w' \in K^{m,1}$, the two matrices in the middle must be equal (exercise!): that is, we have $B = P^T A Q$. So we've proven:
Theorem 4.2.1. Let $A$ be the matrix of the bilinear map $\tau : V \times W \to K$ with respect to the bases $e_1, \dots, e_n$ and $f_1, \dots, f_m$ of $V$ and $W$, and let $B$ be its matrix with respect to the bases $e'_1, \dots, e'_n$ and $f'_1, \dots, f'_m$ of $V$ and $W$. Let $P$ and $Q$ be the basis change matrices, as defined above. Then $B = P^T A Q$.
Compare this result with Theorem 1.5.2.
We shall be concerned from now on only with the case where $V = W$. A bilinear map $\tau : V \times V \to K$ is called a bilinear form on $V$. Theorem 4.2.1 then becomes:
Theorem 4.2.2. Let $A$ be the matrix of the bilinear form $\tau$ on $V$ with respect to the basis $e_1, \dots, e_n$ of $V$, and let $B$ be its matrix with respect to the basis $e'_1, \dots, e'_n$ of $V$. Let $P$ be the basis change matrix with original basis $e_i$ and new basis $e'_i$. Then $B = P^T A P$.
Let's give a name to this relation between matrices:
Definition 4.2.3. Symmetric matrices $A$ and $B$ are called congruent if there exists an invertible matrix $P$ with $B = P^T A P$.
So congruent matrices represent the same bilinear form in different bases. Notice that congruence is very different from similarity; if $\tau$ is a quadratic form on $V$ and $T$ is a linear operator on $V$, it might be the case that $\tau$ and $T$ have the same matrix $A$ in some specific basis of $V$, but that doesn't mean that they have the same matrix in any other basis: they inhabit different worlds.
So, in the example at the end of Subsection 4.1, if we choose the new basis $e'_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$, $e'_2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, then
\[ P = \begin{pmatrix} 1 & 1 \\ -1 & 0 \end{pmatrix}, \qquad P^T A P = \begin{pmatrix} 0 & -1 \\ 2 & 1 \end{pmatrix}, \]
and
\[ \tau\bigl(y'_1 e'_1 + y'_2 e'_2,\ x'_1 e'_1 + x'_2 e'_2\bigr) = -y'_1 x'_2 + 2 y'_2 x'_1 + y'_2 x'_2. \]
Lecture 13
Since $P$ is an invertible matrix, $P^T$ is also invertible (its inverse is $(P^{-1})^T$), and so the matrices $P^T A P$ and $A$ are equivalent matrices in the sense of MA106, and hence have the same rank. The rank of the bilinear form $\tau$ is defined to be the rank of its matrix $A$. So we have just shown that the rank of $\tau$ is a well-defined property of $\tau$, not depending on the choice of basis we've used.
In fact we can say a little more. It's clear that a vector $v \in K^{n,1}$ is zero if and only if $v^T w = 0$ for all vectors $w \in K^{n,1}$. Since
\[ \tau(v, w) = v^T A w, \]
the kernel of $A$ is equal to the space
\[ \{ v \in V : \tau(w, v) = 0 \ \text{for all } w \in V \} \]
(the right radical of $\tau$) and the kernel of $A^T$ is equal to the space
\[ \{ v \in V : \tau(v, w) = 0 \ \text{for all } w \in V \} \]
(the left radical). Since $A^T$ and $A$ have the same rank, the left and right radicals both have dimension $n - r$, where $r$ is the rank of $\tau$. In particular, the rank of $\tau$ is $n$ if and only if the left and right radicals are zero. If this occurs, we'll say $\tau$ is nondegenerate; so $\tau$ is nondegenerate if and only if its matrix (in any basis) is nonsingular.
You could be forgiven for expecting that we were about to launch into a long study of how to choose, given a bilinear form $\tau$ on $V$, the best basis for $V$ which makes the matrix of $\tau$ as nice as possible. We are not going to do this, because although it's a very natural question to ask, it's extremely hard! Instead, we'll restrict ourselves to a special kind of bilinear form where life is much easier, which covers most of the bilinear forms that come up in real life.
Definition 4.2.4. We say the bilinear form $\tau$ on $V$ is symmetric if $\tau(w, v) = \tau(v, w)$ for all $v, w \in V$. We say $\tau$ is antisymmetric (or sometimes alternating) if $\tau(w, v) = -\tau(v, w)$.
An $n \times n$ matrix $A$ is called symmetric if $A^T = A$, and anti-symmetric if $A^T = -A$. We then clearly have:
Proposition 4.2.5. The bilinear form $\tau$ is symmetric or anti-symmetric if and only if its matrix (with respect to any basis) is symmetric or anti-symmetric.
The best known example of a symmetric form is when $V = \mathbb{R}^n$, and $\tau$ is defined by
\[ \tau\bigl((x_1, x_2, \dots, x_n), (y_1, y_2, \dots, y_n)\bigr) = x_1 y_1 + x_2 y_2 + \dots + x_n y_n. \]
This form has matrix equal to the identity matrix $I_n$ with respect to the standard basis of $\mathbb{R}^n$. Geometrically, it is equal to the normal scalar product $\tau(v, w) = |v|\,|w| \cos\theta$, where $\theta$ is the angle between the vectors $v$ and $w$.
On the other hand, the form $\tau$ on $\mathbb{R}^2$ defined by $\tau\bigl((x_1, x_2), (y_1, y_2)\bigr) = x_1 y_2 - x_2 y_1$ is anti-symmetric. This has matrix $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$.
Technical note: For the theory of symmetric and anti-symmetric bilinear forms to work properly, we need to be able to divide by 2 in the field $K$. This means that we must assume that $1 + 1 \ne 0$ in $K$. So, for example, $K$ is not allowed to be the field $\mathbb{F}_2$ of order 2. If you prefer to avoid worrying about technicalities like this, then you can safely assume that $K$ is either $\mathbb{Q}$, $\mathbb{R}$ or $\mathbb{C}$.
Having made this assumption, we can easily show that
Proposition 4.2.6. Any bilinear form $\tau$ can be written uniquely as $\tau_1 + \tau_2$ where $\tau_1$ is symmetric and $\tau_2$ is antisymmetric.
Proof. We just put $\tau_1(v, w) = \tfrac{1}{2}\bigl(\tau(v, w) + \tau(w, v)\bigr)$ and $\tau_2(v, w) = \tfrac{1}{2}\bigl(\tau(v, w) - \tau(w, v)\bigr)$. It's clear that $\tau_1$ is symmetric and $\tau_2$ is antisymmetric.
Moreover, given any other such expression $\tau = \tau'_1 + \tau'_2$, we have
\[ \tau_1(v, w) = \frac{\tau'_1(v, w) + \tau'_1(w, v) + \tau'_2(v, w) + \tau'_2(w, v)}{2} = \frac{\tau'_1(v, w) + \tau'_1(v, w) + \tau'_2(v, w) - \tau'_2(v, w)}{2} \]
from the symmetry and antisymmetry of $\tau'_1$ and $\tau'_2$. The last two terms cancel each other and we just have
\[ \tau_1(v, w) = \frac{2\tau'_1(v, w)}{2} = \tau'_1(v, w). \]
So $\tau_1 = \tau'_1$, and so $\tau_2 = \tau - \tau_1 = \tau - \tau'_1 = \tau'_2$, so the decomposition is unique.
(Notice that $\tfrac{1}{2}$ has to exist in $K$ for all this to make sense!)
4.3 Quadratic forms
If $\tau$ is a bilinear form on a vector space $V$, then we can consider the function $q(v) = \tau(v, v)$. If we fix a basis of $V$, then we can represent $v$ by its coordinates, and $q$ will be a polynomial function in the coordinates of $v$ of which every term is of degree 2, like $3x^2 + 2xz + z^2 - 4yz + xy$. This is what we call a quadratic form.
These quadratic forms are important objects. For instance, one sees them often in geometry: an equation like
\[ 3x^2 + 2xz + z^2 - 4yz + xy = 17 \]
describes a surface in $\mathbb{R}^3$, and we'll see later that we can use linear algebra to understand the shape of this surface.
Definition 4.3.1. Let $V$ be a vector space over the field $K$. Then a quadratic form on $V$ is a function $q : V \to K$ that is defined by $q(v) = \tau(v, v)$, where $\tau : V \times V \to K$ is a bilinear form.
If we write $\tau = \tau_1 + \tau_2$ with $\tau_1$ symmetric and $\tau_2$ antisymmetric, as in 4.2.6 above, then $\tau(v, v) = \tau_1(v, v)$, so any quadratic form can be written in the form $\tau(v, v)$ for $\tau$ symmetric.
Lecture 14
On the other hand, if $\tau$ is symmetric then for $u, v \in V$,
\[ q(u + v) = \tau(u + v, u + v) = \tau(u, u) + \tau(v, v) + \tau(u, v) + \tau(v, u) = q(u) + q(v) + 2\tau(u, v), \]
and hence $\tau(u, v) = \bigl(q(u + v) - q(u) - q(v)\bigr)/2$, and $\tau$ is completely determined by $q$. Hence there is a one-one correspondence between symmetric bilinear forms on $V$ and quadratic forms on $V$.
Let $e_1, \dots, e_n$ be a basis of $V$. Recall that the coordinates of $v$ with respect to this basis are defined to be the field elements $x_i$ such that $v = \sum_{i=1}^{n} x_i e_i$.
Let $A = (\alpha_{ij})$ be the matrix of $\tau$ with respect to this basis. We will also call $A$ the matrix of $q$ with respect to this basis. Then $A$ is symmetric because $\tau$ is, and by Equation (2.1) of Subsection 4.1, we have
\[ q(v) = v^T A v = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i \alpha_{ij} x_j = \sum_{i=1}^{n} \alpha_{ii} x_i^2 + 2 \sum_{i=1}^{n} \sum_{j=1}^{i-1} \alpha_{ij} x_i x_j. \tag{3.1} \]
Conversely, if we are given a quadratic form as in the right hand side of Equation (3.1), then it is easy to write down its matrix $A$. For example, if $n = 3$ and $q(v) = 3x^2 + y^2 - 2z^2 + 4xy - xz$, then
\[ A = \begin{pmatrix} 3 & 2 & -1/2 \\ 2 & 1 & 0 \\ -1/2 & 0 & -2 \end{pmatrix}. \]
4.4 Nice bases for quadratic forms
We'll now show how to choose a basis for $V$ which makes a given symmetric bilinear form (or, equivalently, quadratic form) as nice as possible. This will turn out to be much easier than the corresponding problem for linear operators.
Theorem 4.4.1. Let $V$ be a vector space of dimension $n$ equipped with a symmetric bilinear form $\tau$ (or, equivalently, a quadratic form $q$).
Then there is a basis $b_1, \dots, b_n$ of $V$, and constants $\lambda_1, \dots, \lambda_n$, such that
\[ \tau(b_i, b_j) = \begin{cases} \lambda_i & \text{if } j = i, \\ 0 & \text{if } j \ne i. \end{cases} \]
Equivalently,
• given any symmetric matrix $A$, there is an invertible matrix $P$ such that $P^T A P$ is a diagonal matrix (i.e. $A$ is congruent to a diagonal matrix);
• given any quadratic form $q$ on a vector space $V$, there is a basis $b_1, \dots, b_n$ of $V$ and constants $\lambda_1, \dots, \lambda_n$ such that
\[ q(x_1 b_1 + \dots + x_n b_n) = \lambda_1 x_1^2 + \dots + \lambda_n x_n^2. \]
Proof. We shall prove this by induction on $n = \dim V$. If $n = 0$ then there is nothing to prove, so let's assume that $n \ge 1$.
If $\tau$ is zero, then again there is nothing to prove, so we may assume that $\tau \ne 0$. Then the associated quadratic form $q$ is not zero either, so there is a vector $v \in V$ such that $q(v) \ne 0$. Let $b_1 = v$ and let $\lambda_1 = q(v)$.
Consider the linear map $V \to K$ given by $w \mapsto \tau(w, v)$. This is not the zero map, so its image has dimension 1; so its kernel $W$ has dimension $n - 1$. Moreover, this $(n-1)$-dimensional subspace doesn't contain $b_1 = v$.
By the induction hypothesis, we can find a basis $b_2, \dots, b_n$ for the kernel such that $\tau(b_i, b_j) = 0$ for all $2 \le i < j \le n$; and all of these vectors lie in the space $W$, so we also have $\tau(b_1, b_j) = 0$ for all $2 \le j \le n$. Since $b_1 \notin W$, it follows that $b_1, \dots, b_n$ is a basis of $V$, so we're done.
Lecture 15
Finding the good basis: The above proof is quite short and slick, and gives us very little help if we explicitly want to find the diagonalizing basis. So let's unravel what's going on a bit more explicitly. We'll see in a moment that what's going on is very closely related to 'completing the square' in school algebra.
So let's say we have a quadratic form $q$. As usual, let $B = (\beta_{ij})$ be the matrix of $q$ with respect to some random basis $b_1, \dots, b_n$. We'll modify the basis $b_i$ step-by-step in order to eventually get it into the nice form the theorem predicts. Throughout, $\beta_{ij}$ denotes the entries of the matrix of $q$ with respect to the current basis $b_1, \dots, b_n$.
Step 1: Arrange that $q(b_1) \ne 0$. Here there are various cases to consider.
• If $\beta_{11} \ne 0$, then we're done: this means that $q(b_1) \ne 0$, so we don't need to do anything.
• If $\beta_{11} = 0$, but $\beta_{ii} \ne 0$ for some $i > 1$, then we just interchange $b_1$ and $b_i$ in our basis.
• If $\beta_{ii} = 0$ for all $i$, but there is some $i$ and $j$ such that $\beta_{ij} \ne 0$, then we replace $b_i$ with $b_i + b_j$; since
\[ q(b_i + b_j) = q(b_i) + q(b_j) + 2\tau(b_i, b_j) = 2\beta_{ij}, \]
after making this change we have $q(b_i) \ne 0$, so we're reduced to one of the two previous cases.
• If $\beta_{ij} = 0$ for all $i$ and $j$, we can stop: the matrix of $q$ is zero, so it's certainly diagonal.
Step 2: Modify $b_2, \dots, b_n$ to make them orthogonal to $b_1$. Suppose we've done Step 1, but we haven't stopped, so $\beta_{11}$ is now nonzero. We want to arrange that $\tau(b_1, b_i)$ is 0 for all $i > 1$. To do this, we just replace $b_i$ with
\[ b_i - \frac{\beta_{1i}}{\beta_{11}}\, b_1. \]
This works because
\[ \tau\Bigl(b_1,\ b_i - \frac{\beta_{1i}}{\beta_{11}} b_1\Bigr) = \tau(b_1, b_i) - \frac{\beta_{1i}}{\beta_{11}}\tau(b_1, b_1) = \beta_{1i} - \frac{\beta_{1i}}{\beta_{11}}\beta_{11} = 0. \]
This is where the relation to completing the square comes in. We've changed our basis by the matrix
\[ P = \begin{pmatrix} 1 & -\frac{\beta_{12}}{\beta_{11}} & \dots & -\frac{\beta_{1n}}{\beta_{11}} \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix} \]
so the coordinates of a vector $v \in V$ change by the inverse of this, which is just
\[ P^{-1} = \begin{pmatrix} 1 & \frac{\beta_{12}}{\beta_{11}} & \dots & \frac{\beta_{1n}}{\beta_{11}} \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}. \]
This corresponds to writing
\[ q(x_1 b_1 + \dots + x_n b_n) = \beta_{11} x_1^2 + 2\beta_{12} x_1 x_2 + \dots + 2\beta_{1n} x_1 x_n + C, \]
where $C$ doesn't involve $x_1$ at all, and writing this as
\[ \beta_{11}\Bigl( x_1 + \frac{\beta_{12}}{\beta_{11}} x_2 + \dots + \frac{\beta_{1n}}{\beta_{11}} x_n \Bigr)^2 + C' \]
where $C'$ also doesn't involve $x_1$. Then our change of basis changes the coordinates so the whole bracketed term becomes the first coordinate of $v$; we've eliminated 'cross terms' involving $x_1$ and one of the other variables.
Step 3: Induct on $n$. Now we've managed to engineer a basis $b_1, \dots, b_n$ such that the matrix $B = (\beta_{ij})$ of $q$ looks like
\[ \begin{pmatrix} \beta_{11} & 0 & \dots & 0 \\ 0 & ? & \dots & ? \\ \vdots & \vdots & & \vdots \\ 0 & ? & \dots & ? \end{pmatrix}. \]
So we can now repeat the process with $V$ replaced by the $(n-1)$-dimensional vector space $W$ spanned by $b_2, \dots, b_n$. We can mess around as much as we like with the vectors $b_2, \dots, b_n$ without breaking the fact that they pair to zero with $b_1$, since this is true of any vector in $W$. So we go back to Step 1 but with a smaller $n$, and keep going until we either have a 0-dimensional space or a zero form, in which case we can safely stop.
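The three steps above translate directly into an algorithm. The sketch below is not from the notes; it assumes NumPy, uses floating-point arithmetic, and the function name diagonalize_by_congruence is hypothetical. It returns an invertible $P$ and the diagonal matrix $P^T A P$.

```python
# Not in the notes: a NumPy sketch of Steps 1-3 (congruence diagonalization).
import numpy as np

def diagonalize_by_congruence(A, tol=1e-12):
    B = np.array(A, dtype=float)          # running matrix, kept equal to P^T A P
    n = B.shape[0]
    P = np.eye(n)                         # columns = current basis in old coordinates
    for k in range(n):
        # Step 1: arrange B[k, k] != 0 (if the remaining block is nonzero).
        if abs(B[k, k]) < tol:
            diag = [i for i in range(k, n) if abs(B[i, i]) > tol]
            if diag:                      # swap basis vectors b_k and b_i
                i = diag[0]
            else:
                pairs = [(i, j) for i in range(k, n) for j in range(i + 1, n)
                         if abs(B[i, j]) > tol]
                if not pairs:
                    break                 # remaining block is zero: already diagonal
                i, j = pairs[0]           # replace b_i by b_i + b_j first
                P[:, i] += P[:, j]
                B[i, :] += B[j, :]
                B[:, i] += B[:, j]
            P[:, [k, i]] = P[:, [i, k]]   # bring the nonzero diagonal entry to slot k
            B[[k, i], :] = B[[i, k], :]
            B[:, [k, i]] = B[:, [i, k]]
        # Step 2: clear row/column k using b_i -> b_i - (B[k,i]/B[k,k]) b_k.
        for i in range(k + 1, n):
            c = B[k, i] / B[k, k]
            P[:, i] -= c * P[:, k]
            B[i, :] -= c * B[k, :]
            B[:, i] -= c * B[:, k]
    return P, B

A = np.array([[0, 0.5, -2.5], [0.5, 0, 1.5], [-2.5, 1.5, 0]])   # the example below
P, D = diagonalize_by_congruence(A)
print(np.allclose(P.T @ A @ P, D), np.allclose(D, np.diag(np.diag(D))))
```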
Example. Let $V = \mathbb{R}^3$ (or $\mathbb{Q}^3$) and $q((x, y, z)) = xy + 3yz - 5xz$, so the matrix of $q$ with respect to the standard basis $e_1, e_2, e_3$ is
\[ A = \begin{pmatrix} 0 & 1/2 & -5/2 \\ 1/2 & 0 & 3/2 \\ -5/2 & 3/2 & 0 \end{pmatrix}. \]
(Since we've only got 3 variables, it's much less work to call them $x, y, z$ than $x_1, x_2, x_3$.)
All the diagonal entries of $A$ are zero, so we're in Case 3 of the proof above. But the $(1,2)$ entry is $1/2$, which isn't zero; so we replace $e_1$ with $e_1 + e_2$. That is, we work in the basis $b_1, b_2, b_3 = e_1 + e_2,\ e_2,\ e_3$.
Thus the basis change matrix from $e_1, e_2, e_3$ to $b_1, b_2, b_3$ is
\[ P = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
And we have
\[ q(xb_1 + yb_2 + zb_3) = q((x, x + y, z)) = x(x + y) + 3(x + y)z - 5xz = x^2 + xy - 2xz + 3yz, \]
so the matrix of $q$ in the basis $b_1, b_2, b_3$ is
\[ B = \begin{pmatrix} 1 & 1/2 & -1 \\ 1/2 & 0 & 3/2 \\ -1 & 3/2 & 0 \end{pmatrix}. \]
Now we go to Step 2, and try and clear the first row and column by modifying $b_2$ and $b_3$. That is, we introduce a new basis $b'_1, b'_2, b'_3$ with $b'_1 = b_1 = (1, 1, 0)$, but $b'_2 = b_2 - \tfrac{1}{2} b_1 = (0, 1, 0) - \tfrac{1}{2}(1, 1, 0) = (-1/2, 1/2, 0)$, and $b'_3 = b_3 - (-1) b_1 = (0, 0, 1) + (1, 1, 0) = (1, 1, 1)$.
So the basis change matrix from $e_1, e_2, e_3$ to $b'_1, b'_2, b'_3$ is
\[ P' = \begin{pmatrix} 1 & -1/2 & 1 \\ 1 & 1/2 & 1 \\ 0 & 0 & 1 \end{pmatrix}. \]
This corresponds to writing
\[ x^2 + xy - 2xz + 3yz = \bigl[ (x + \tfrac{1}{2}y - z)^2 - \tfrac{1}{4}y^2 - z^2 + yz \bigr] + 3yz = (x + \tfrac{1}{2}y - z)^2 - \tfrac{1}{4}y^2 + 4yz - z^2. \]
We have $xb'_1 + yb'_2 + zb'_3 = (x - \tfrac{1}{2}y + z)b_1 + yb_2 + zb_3$, so this tells us that
\[ q(xb'_1 + yb'_2 + zb'_3) = x^2 - \tfrac{1}{4}y^2 + 4yz - z^2, \]
so the matrix of $q$ with respect to the $b'$ basis is
\[ B' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1/4 & 2 \\ 0 & 2 & -1 \end{pmatrix}. \]
Now we concentrate on making the bottom right $2 \times 2$ block look nice by manipulating the second and third basis vectors. Any subsequent changes of basis we make will keep the first basis vector unchanged. We have $q(yb'_2 + zb'_3) = -\tfrac{1}{4}y^2 + 4yz - z^2$, the leftover terms from above, so we are just trying to make this 2-variable quadratic form nicer.
Since $q(b'_2) = -1/4 \ne 0$, we don't need to do anything for Step 1. In Step 2 we replace $b'_1, b'_2, b'_3$ by another new basis $b''_1, b''_2, b''_3$, where $b''_1 = b'_1$ (as promised), $b''_2 = b'_2$, and
\[ b''_3 = b'_3 - \frac{2}{-1/4}\, b'_2 = (1, 1, 1) + 8(-1/2, 1/2, 0) = (-3, 5, 1). \]
So we now have
\[ P'' = \begin{pmatrix} 1 & -1/2 & -3 \\ 1 & 1/2 & 5 \\ 0 & 0 & 1 \end{pmatrix}. \]
This corresponds, of course, to the completing-the-square operation
\[ -\tfrac{1}{4}y^2 + 4yz - z^2 = -\tfrac{1}{4}(y - 8z)^2 + 15z^2. \]
So the matrix of $q$ is now
\[ B'' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1/4 & 0 \\ 0 & 0 & 15 \end{pmatrix}. \]
This is diagonal, so we're done: the matrix of $q$ in the basis $b''_1, b''_2, b''_3$ is the diagonal matrix $B''$.
Lecture 16
Notice that the choice of good basis, and the resulting good matrix, are extremely far from unique. For instance, in the example above we could have replaced $b''_2$ with $2b''_2$ to get the (perhaps nicer) matrix
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 15 \end{pmatrix}. \]
In the case $K = \mathbb{C}$, we can do even better. After reducing $q$ to the form $q(v) = \sum_{i=1}^{n} \alpha_{ii} x_i^2$, we can permute the coordinates if necessary to get $\alpha_{ii} \ne 0$ for $1 \le i \le r$ and $\alpha_{ii} = 0$ for $r+1 \le i \le n$, where $r = \operatorname{rank}(q)$. We can then make a further coordinate change $x'_i = \sqrt{\alpha_{ii}}\, x_i$ ($1 \le i \le r$), giving $q(v) = \sum_{i=1}^{r} (x'_i)^2$. Hence we have proved:
Proposition 4.4.2. A quadratic form $q$ over $\mathbb{C}$ has the form $q(v) = \sum_{i=1}^{r} x_i^2$ with respect to a suitable basis, where $r = \operatorname{rank}(q)$.
Equivalently, given a symmetric matrix $A \in \mathbb{C}^{n,n}$, there is an invertible matrix $P \in \mathbb{C}^{n,n}$ such that $P^T A P = B$, where $B = (\beta_{ij})$ is a diagonal matrix with $\beta_{ii} = 1$ for $1 \le i \le r$, $\beta_{ii} = 0$ for $r+1 \le i \le n$, and $r = \operatorname{rank}(A)$.
In particular, up to changes of basis, a quadratic form on $\mathbb{C}^n$ is uniquely determined by its rank. We say the rank is the only invariant of a quadratic form over $\mathbb{C}$.
When $K = \mathbb{R}$, we cannot take square roots of negative numbers, but we can replace each positive $\alpha_{ii}$ by $1$ and each negative $\alpha_{ii}$ by $-1$ to get:
Proposition 4.4.3 (Sylvester's Theorem). A quadratic form $q$ over $\mathbb{R}$ has the form
\[ q(v) = \sum_{i=1}^{t} x_i^2 - \sum_{i=1}^{u} x_{t+i}^2 \]
with respect to a suitable basis, where $t + u = \operatorname{rank}(q)$.
Equivalently, given a symmetric matrix $A \in \mathbb{R}^{n,n}$, there is an invertible matrix $P \in \mathbb{R}^{n,n}$ such that $P^T A P = B$, where $B = (\beta_{ij})$ is a diagonal matrix with $\beta_{ii} = 1$ for $1 \le i \le t$, $\beta_{ii} = -1$ for $t+1 \le i \le t+u$, and $\beta_{ii} = 0$ for $t+u+1 \le i \le n$, and $t + u = \operatorname{rank}(A)$.
We shall now prove that the numbers t and u of positive and negative terms are invariants of q.
The pair of integers (t, u) is called the signature of q.
Theorem 4.4.4 (Sylvester's Law of Inertia). Suppose that $q$ is a quadratic form on the vector space $V$ over $\mathbb{R}$, and that $e_1, \dots, e_n$ and $e'_1, \dots, e'_n$ are two bases of $V$ such that
\[ q(x_1 e_1 + \dots + x_n e_n) = \sum_{i=1}^{t} x_i^2 - \sum_{i=1}^{u} x_{t+i}^2 \]
and
\[ q(x_1 e'_1 + \dots + x_n e'_n) = \sum_{i=1}^{t'} x_i^2 - \sum_{i=1}^{u'} x_{t'+i}^2. \]
Then $t = t'$ and $u = u'$.
Proof. We know that $t + u = t' + u' = \operatorname{rank}(q)$, so it is enough to prove that $t = t'$. Suppose not; by symmetry we may suppose that $t > t'$.
Let $V_1$ be the span of $e_1, \dots, e_t$, and let $V_2$ be the span of $e'_{t'+1}, \dots, e'_n$. Then for any non-zero $v \in V_1$ we have $q(v) > 0$; while for any $v \in V_2$ we have $q(v) \le 0$. So there cannot be any nonzero $v \in V_1 \cap V_2$.
On the other hand, we have $\dim(V_1) = t$ and $\dim(V_2) = n - t'$. It was proved in MA106 that
\[ \dim(V_1 + V_2) = \dim(V_1) + \dim(V_2) - \dim(V_1 \cap V_2), \]
so $\dim(V_1 \cap V_2) \ge t - t' > 0$. Since we have shown that $V_1 \cap V_2 = \{0\}$, this is a contradiction, which completes the proof.
Remark. Notice that any nonzero $x \in \mathbb{R}$ is either equal to a square, or $-1$ times a square, but not both. This property is shared by the finite field $\mathbb{F}_7$ of integers mod 7, so any quadratic form over $\mathbb{F}_7$ can be written as a diagonal matrix with only 0s, 1s and $-1$s down the diagonal (i.e. Sylvester's Theorem holds over $\mathbb{F}_7$). But Sylvester's law of inertia isn't valid in $\mathbb{F}_7$: in fact, we have
\[ \begin{pmatrix} 2 & 3 \\ 4 & 2 \end{pmatrix}^{T} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ 4 & 2 \end{pmatrix} = \begin{pmatrix} 20 & 14 \\ 14 & 13 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \ \text{in } \mathbb{F}_7, \]
so the same form has signature $(2, 0)$ and $(0, 2)$! The proof breaks down because there's no good notion of a 'positive' element of $\mathbb{F}_7$, so a sum of nonzero squares can be zero (the easiest example is $1^2 + 2^2 + 3^2 = 0$).
So Sylvester's law of inertia is really using something quite special about $\mathbb{R}$.
4.5 Euclidean spaces, orthonormal bases and the Gram-Schmidt process
In this section, we're going to suppose $K = \mathbb{R}$. As usual, we let $V$ be an $n$-dimensional vector space over $K$, and we let $q$ be a quadratic form on $V$, with associated bilinear form $\tau$.
Definition 4.5.1. The quadratic form $q$ is said to be positive definite if $q(v) > 0$ for all $0 \ne v \in V$.
It is clear that this is the case if and only if $t = n$ and $u = 0$ in Proposition 4.4.3; that is, if $q$ has signature $(n, 0)$.
The associated symmetric bilinear form $\tau$ is also called positive definite when $q$ is.
Definition 4.5.2. A vector space $V$ over $\mathbb{R}$ together with a positive definite symmetric bilinear form $\tau$ is called a Euclidean space.
In this case, Proposition 4.4.3 says that there is a basis $e_i$ of $V$ with respect to which $\tau(e_i, e_j) = \delta_{ij}$, where
\[ \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j \end{cases} \]
(so the matrix $A$ of $q$ is the identity matrix $I_n$). We call a basis of a Euclidean space $V$ with this property an orthonormal basis of $V$.
Lecture 17
(More generally, any set $v_1, \dots, v_r$ of vectors in $V$, not necessarily a basis, will be said to be orthonormal if $v_i \cdot v_j = \delta_{ij}$ for $1 \le i, j \le r$.)
We shall assume from now on that $V$ is a Euclidean space, and that we have chosen an orthonormal basis $e_1, \dots, e_n$. Then $\tau$ corresponds to the standard scalar product, and we shall write $v \cdot w$ instead of $\tau(v, w)$.
Note that $v \cdot w = v^T w$ where, as usual, $v$ and $w$ are the column vectors associated with $v$ and $w$.
For $v \in V$, define $|v| = \sqrt{v \cdot v}$. Then $|v|$ is the length of $v$. Hence the length, and also the cosine $v \cdot w / (|v|\,|w|)$ of the angle between two vectors, can be defined in terms of the scalar product. Thus a set of vectors is orthonormal if the vectors all have length 1 and are at right angles to each other.
The following theorem tells us that we can always complete a set of orthonormal vectors to an orthonormal basis. (Recall that $\delta_{ij}$ is 1 if $i = j$ and 0 if $i \ne j$, so an orthonormal basis is exactly a basis $f_1, \dots, f_n$ satisfying $f_i \cdot f_j = \delta_{ij}$.)
Theorem 4.5.3 (Gram-Schmidt process). Let $V$ be a Euclidean space of dimension $n$, and suppose that, for some $r$ with $0 \le r \le n$, $f_1, \dots, f_r$ are vectors in $V$ such that
\[ f_i \cdot f_j = \delta_{ij} \quad \text{for } 1 \le i, j \le r. \tag{$*$} \]
Then $f_1, \dots, f_r$ can be extended to an orthonormal basis $f_1, \dots, f_n$ of $V$.
Proof. We prove first that $f_1, \dots, f_r$ are linearly independent. Suppose that $\sum_{i=1}^{r} x_i f_i = 0$ for some $x_1, \dots, x_r \in \mathbb{R}$. Then, for each $j$ with $1 \le j \le r$, the scalar product of the left hand side of this equation with $f_j$ is $\sum_{i=1}^{r} x_i\, f_j \cdot f_i = x_j$, by ($*$). Since $f_j \cdot 0 = 0$, this implies that $x_j = 0$ for all $j$, so the $f_i$ are linearly independent.
The proof of the theorem will be by induction on $n - r$. We can start the induction with the case $n - r = 0$, when $r = n$, and there is nothing to prove. So assume that $n - r > 0$; i.e. that $r < n$.
By a result from MA106, we can extend any linearly independent set of vectors to a basis of $V$, so there is a basis $f_1, \dots, f_r, g_{r+1}, \dots, g_n$ of $V$ containing the $f_i$. The trick is to define
\[ f'_{r+1} = g_{r+1} - \sum_{i=1}^{r} (f_i \cdot g_{r+1}) f_i. \]
If we take the scalar product of this equation by $f_j$ for some $1 \le j \le r$, then we get
\[ f_j \cdot f'_{r+1} = f_j \cdot g_{r+1} - \sum_{i=1}^{r} (f_i \cdot g_{r+1})(f_j \cdot f_i) \]
and then, by ($*$), $f_j \cdot f_i$ is non-zero only when $j = i$, so the sum on the right hand side simplifies to $f_j \cdot g_{r+1}$, and the whole equation simplifies to $f_j \cdot f'_{r+1} = f_j \cdot g_{r+1} - f_j \cdot g_{r+1} = 0$.
The vector $f'_{r+1}$ is non-zero by linear independence of the basis, and if we define $f_{r+1} = f'_{r+1} / |f'_{r+1}|$, then we still have $f_j \cdot f_{r+1} = 0$ for $1 \le j \le r$, and we also have $f_{r+1} \cdot f_{r+1} = 1$. Hence $f_1, \dots, f_{r+1}$ satisfy the equations ($*$), and the result follows by inductive hypothesis.
4.6 Orthogonal transformations
If we're working with a Euclidean space $V$, we know what the length of a vector in $V$ means, and what the angle between vectors is; so we might want to consider transformations from $V$ to itself that preserve lengths and angles: they 'play nicely' with the geometry of the space.
Definition 4.6.1. A linear map $T : V \to V$ is said to be orthogonal if it preserves the scalar product on $V$. That is, if $T(v) \cdot T(w) = v \cdot w$ for all $v, w \in V$.
Since length and angle can be defined in terms of the scalar product, an orthogonal linear map preserves distance and angle, so geometrically it is a 'rigid' map. In $\mathbb{R}^2$, for example, an orthogonal map is either a rotation about the origin, or a reflection about a line through the origin.
If $A$ is the matrix of $T$, then $T(v) = Av$, so $T(v) \cdot T(w) = v^T A^T A w$, and hence $T$ is orthogonal if and only if $A^T A = I_n$, or equivalently if $A^T = A^{-1}$.
Definition 4.6.2. An $n \times n$ matrix is called orthogonal if $A^T A = I_n$.
So we have proved:
Proposition 4.6.3. A linear map $T : V \to V$ is orthogonal if and only if its matrix $A$ (with respect to an orthonormal basis of $V$) is orthogonal.
Incidentally, the fact that $A^T A = I_n$ tells us that $A$ (and hence $T$) is invertible, so $\det(A)$ is nonzero. In fact we can do a little better than that:
Proposition 4.6.4. An orthogonal matrix has determinant $\pm 1$.
Proof. We have $A^T A = I_n$, so $\det(A^T A) = \det(I_n) = 1$.
On the other hand, $\det(A^T A) = \det(A^T)\det(A) = (\det A)^2$. So $(\det A)^2 = 1$, implying that $\det A = \pm 1$.
Example. For any $\theta \in \mathbb{R}$, let $A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$. (This represents a counter-clockwise rotation through an angle $\theta$.) Then it is easily checked that $A^T A = A A^T = I_2$.
One can check that every orthogonal $2 \times 2$ matrix with determinant $+1$ is a rotation by some angle $\theta$, and similarly that any orthogonal $2 \times 2$ matrix of det $-1$ is a reflection in some line through the origin. In higher dimensions the taxonomy of orthogonal matrices is a bit more complicated; we'll revisit this in a later section of the course.
Notice that the columns of $A$ are mutually orthogonal vectors of length 1, and the same applies to the rows of $A$. Let $c_1, c_2, \dots, c_n$ be the columns of the matrix $A$. As we observed in Section 1, $c_i$ is equal to the column vector representing $T(e_i)$. In other words, if $T(e_i) = f_i$, say, then $f_i = c_i$. Since the $(i, j)$-th entry of $A^T A$ is $c_i^T c_j = f_i \cdot f_j$, we see that $T$ and $A$ are orthogonal if and only if
\[ f_i \cdot f_i = 1 \ \text{ and } \ f_i \cdot f_j = 0 \ (i \ne j), \quad 1 \le i, j \le n. \tag{$*$} \]
By Proposition 4.6.4, an orthogonal linear map is invertible, so $T(e_i)$ ($1 \le i \le n$) form a basis of $V$, and we have:
Proposition 4.6.5. A linear map $T$ is orthogonal if and only if $T(e_1), \dots, T(e_n)$ is an orthonormal basis of $V$.
Lecture 18
Here's a pretty application of the Gram-Schmidt process and orthogonal matrices. Notice that our proof of Gram-Schmidt actually proved a little more: we showed that if $f_1, \dots, f_r$ is an orthonormal set, and $g_{r+1}, \dots, g_n$ is any way of completing the $f$'s to a basis of $V$, then we can find $f_{r+1}, \dots, f_n$ such that
• $f_1, \dots, f_n$ is an orthonormal basis,
• for each $r+1 \le i \le n$, $f_i$ is in the linear span of $f_1, \dots, f_r, g_{r+1}, \dots, g_i$.
That is, we've arranged that the basis change matrix from $f_1, \dots, f_r, g_{r+1}, \dots, g_n$ to $f_1, \dots, f_n$ looks like
\[ \begin{pmatrix} I_r & A \\ 0 & B \end{pmatrix} \]
with $B$ upper-triangular. This is most useful when $r = 0$, when it says that any basis may be modified by an upper-triangular matrix to make it orthonormal. The following proposition strengthens this slightly, and gives this modification of Gram-Schmidt a very nice interpretation in terms of matrices:
Proposition 4.6.6 (QR decomposition). Let $A$ be any $n \times n$ real matrix. Then we can write $A = QR$ where $Q$ is orthogonal and $R$ is upper-triangular.
Proof. We'll make a simplifying assumption: let's suppose $A$ is invertible.
Let $g_1, \dots, g_n$ be the columns of $A$, regarded as vectors in $\mathbb{R}^n$. Then since $A$ is invertible, $g_1, \dots, g_n$ is a basis of $\mathbb{R}^n$. We apply Gram-Schmidt orthonormalization to construct an orthonormal basis $f_1, \dots, f_n$ such that $f_i$ is in the linear span of $g_1, \dots, g_i$ for each $i$.
Let $Q$ be the matrix whose columns are $f_1, \dots, f_n$. Then $Q$ is an orthogonal matrix, since its columns are orthonormal vectors; and there are real numbers $r_{ij}$ such that
\begin{align*} g_1 &= r_{11} f_1 \\ g_2 &= r_{12} f_1 + r_{22} f_2 \\ g_3 &= r_{13} f_1 + r_{23} f_2 + r_{33} f_3 \\ &\ \ \vdots \end{align*}
In other words we have
\[ A = QR \]
where $R$ is the upper-triangular matrix with entries $r_{ij}$.
(When $A$ isn't invertible, then the columns of $A$ won't be a basis. But it's not hard to show that any matrix $A$ can be written as $A = BR'$ where $B$ is invertible and $R'$ is upper triangular; then writing $B = QR$ we have $A = QRR'$, and $RR'$ is also upper-triangular.)
Example. Consider the matrix
\[ A = \begin{pmatrix} 1 & 0 & 2 \\ 2 & 0 & -1 \\ 0 & 2 & 2 \end{pmatrix}. \]
We have $\det(A) = 10$, so $A$ is non-singular. Let $g_1, g_2, g_3$ be the columns of $A$, as above.
Then $|g_1| = \sqrt{5}$, so
\[ f_1 = \frac{g_1}{\sqrt{5}} = \begin{pmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \\ 0 \end{pmatrix}. \]
For the next step, we take $f'_2 = g_2 - (f_1 \cdot g_2) f_1 = g_2$, since $f_1 \cdot g_2 = 0$. So
\[ f_2 = \frac{g_2}{|g_2|} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \]
For the final step, we take the vector
\[ f'_3 = g_3 - (f_1 \cdot g_3) f_1 - (f_2 \cdot g_3) f_2. \]
We have
\[ f_1 \cdot g_3 = \begin{pmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \\ 0 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ -1 \\ 2 \end{pmatrix} = 0, \qquad f_2 \cdot g_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ -1 \\ 2 \end{pmatrix} = 2. \]
So $f'_3 = g_3 - 2 f_2 = \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}$. We have $|f'_3| = \sqrt{5}$ again, so
\[ f_3 = \frac{f'_3}{\sqrt{5}} = \begin{pmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \\ 0 \end{pmatrix}. \]
Thus $Q$ is the matrix whose columns are $f_1, f_2, f_3$, that is
\[ Q = \begin{pmatrix} 1/\sqrt{5} & 0 & 2/\sqrt{5} \\ 2/\sqrt{5} & 0 & -1/\sqrt{5} \\ 0 & 1 & 0 \end{pmatrix}, \]
and we have
\[ g_1 = \sqrt{5}\, f_1, \qquad g_2 = 2 f_2, \qquad g_3 = 2 f_2 + \sqrt{5}\, f_3, \]
so $A = QR$ where
\[ R = \begin{pmatrix} \sqrt{5} & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & \sqrt{5} \end{pmatrix}. \]
The QR decomposition theorem is a very important technique in numerical calculations with matrices. For instance, QR decomposition gives a quick way of inverting matrices. If $A = QR$, then $A^{-1} = R^{-1} Q^{-1}$. Inverting orthogonal matrices is trivial, as the inverse is just the transpose; inverting upper-triangular matrices is also pretty easy, so we can compute the inverse of $A$ this way, without having to compute the determinant.
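Here is a small illustration of that idea, not from the notes; it assumes NumPy and SciPy, uses the example matrix above, and notes that a library QR routine may differ from the hand computation by the signs of some columns.

```python
# Not in the notes: QR factorization and inversion via the triangular factor.
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 0.0, 2.0], [2.0, 0.0, -1.0], [0.0, 2.0, 2.0]])
Q, R = np.linalg.qr(A)                          # columns of Q orthonormal, R upper-triangular

Ainv = solve_triangular(R, Q.T, lower=False)    # solves R X = Q^T, i.e. X = R^{-1} Q^T
print(np.allclose(Q @ R, A), np.allclose(Ainv @ A, np.eye(3)))
```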
4.7 Nice orthonormal bases
Now what if we have a Euclidean space $V$, and a linear operator $T : V \to V$, or a quadratic form $q$ on $V$ (not necessarily the same as the native 'length' form on $V$)? Can we always find an orthonormal basis of $V$ making the matrix of $q$ look reasonably nice? Notice that we're juggling two quadratic forms here: we're trying to make the matrix of $q$ look nice while simultaneously keeping the matrix of the length form equal to the identity.
Oddly enough, this is also a question about linear operators. Given any bilinear form $\tau$ on $V$ (not necessarily symmetric), there's a uniquely determined linear operator $T$ on $V$ such that
\[ \tau(v, w) = v \cdot (Tw). \]
Conversely, any $T$ determines a $\tau$ by the same formula. So once we've fixed a starting bilinear form (the inner product), we can get any other bilinear form from this via a linear operator, and this gives us a bijection between bilinear forms and linear operators once we've fixed our starting point. Moreover, the matrix of $T$, in an orthonormal basis of $V$, is clearly just the matrix of $\tau$. When we change basis by an orthogonal matrix (to get a new orthonormal basis), the matrix of $T$ changes by $A \mapsto P^{-1}AP$, and the matrix of $\tau$ changes by $A \mapsto P^T A P$, but this is OK since $P^T = P^{-1}$ for orthogonal matrices!
Lecture 19
In particular, if $T$ is any linear operator, then $(v, w) \mapsto (Tv) \cdot w$ is certainly a bilinear form; so there must be some linear operator $S$ such that
\[ (Tv) \cdot w = v \cdot (Sw) \tag{$*$} \]
for all $v$ and $w$.
Definition 4.7.1. If $T : V \to V$ is a linear operator on a Euclidean space $V$, then the unique linear map $S$ such that ($*$) holds is called the adjoint of $T$. We write this as $T^*$.
It's easy to see that in an orthonormal basis, the matrix of $T^*$ is just the transpose of the matrix of $T$. It follows from this that a linear operator is orthogonal if and only if $T^* = T^{-1}$; one can also prove this directly from the definition.
We say $T$ is selfadjoint if $T^* = T$, or equivalently if the bilinear form $\tau(v, w) = v \cdot (Tw)$ is symmetric. Notice that selfadjointness, like orthogonal-ness, is something that only makes sense for linear operators on Euclidean spaces; it doesn't make sense to ask if a linear operator on a general vector space is selfadjoint. It should be clear that $T$ is selfadjoint if and only if its matrix in an orthonormal basis of $V$ is a symmetric matrix.
So if $V$ is a Euclidean space of dimension $n$, the following problems are all actually the same:
• given a quadratic form $q$ on $V$, find an orthonormal basis of $V$ making the matrix of $q$ as nice as possible;
• given a selfadjoint linear operator $T$ on $V$, find an orthonormal basis of $V$ making the matrix of $T$ as nice as possible;
• given an $n \times n$ symmetric real matrix $A$, find an orthogonal matrix $P$ such that $P^T A P$ is as nice as possible.
First, we'll warm up by proving a proposition which we'll need in the main proof:
Proposition 4.7.2. Let $A$ be a real symmetric matrix. Then $A$ has an eigenvalue in $\mathbb{R}$, and all complex eigenvalues of $A$ lie in $\mathbb{R}$.
Proof. (To simplify the notation, we will write just $v$ for a column vector in this proof.)
The characteristic equation $\det(A - xI_n) = 0$ is a polynomial equation of degree $n$ in $x$, and since $\mathbb{C}$ is an algebraically closed field, it certainly has a root $\lambda \in \mathbb{C}$, which is an eigenvalue for $A$ if we regard $A$ as a matrix over $\mathbb{C}$. We shall prove that any such $\lambda$ lies in $\mathbb{R}$, which will prove the proposition.
For a column vector $v$ or matrix $B$ over $\mathbb{C}$, we denote by $\overline{v}$ or $\overline{B}$ the result of replacing all entries of $v$ or $B$ by their complex conjugates. Since the entries of $A$ lie in $\mathbb{R}$, we have $\overline{A} = A$.
Let $v$ be a complex eigenvector associated with $\lambda$. Then
\[ Av = \lambda v \tag{1} \]
so, taking complex conjugates and using $\overline{A} = A$, we get
\[ A\overline{v} = \overline{\lambda}\,\overline{v}. \tag{2} \]
Transposing (1) and using $A^T = A$ gives
\[ v^T A = \lambda v^T, \tag{3} \]
so by (2) and (3) we have
\[ \lambda\, v^T \overline{v} = v^T A \overline{v} = \overline{\lambda}\, v^T \overline{v}. \]
But if $v = (\alpha_1, \alpha_2, \dots, \alpha_n)^T$, then $v^T \overline{v} = \alpha_1 \overline{\alpha_1} + \dots + \alpha_n \overline{\alpha_n}$, which is a non-zero real number (eigenvectors are non-zero by definition). Thus $\lambda = \overline{\lambda}$, so $\lambda \in \mathbb{R}$.
Now let's prove the main theorem.
Theorem 4.7.3. Let $V$ be a Euclidean space of dimension $n$. Then:
• Given any quadratic form $q$ on $V$, there is an orthonormal basis $f_1, \dots, f_n$ of $V$ and constants $\lambda_1, \dots, \lambda_n$, uniquely determined up to reordering, such that
\[ q(x_1 f_1 + \dots + x_n f_n) = \sum_{i=1}^{n} \lambda_i x_i^2 \]
for all $x_1, \dots, x_n \in \mathbb{R}$.
• Given any linear operator $T : V \to V$ which is selfadjoint, there is an orthonormal basis $f_1, \dots, f_n$ of $V$ consisting of eigenvectors of $T$.
• Given any symmetric $n \times n$ matrix $A$, there is an orthogonal matrix $P$ such that $P^T A P = P^{-1} A P$ is a diagonal matrix.
Proof. We've already seen that these three statements are equivalent to each other, so we can prove whichever one of them we like. Notice that in the second and third forms of the statement, it's clear that the diagonal matrix we obtain is similar to the original one; that tells us that in the first statement the constants $\lambda_1, \dots, \lambda_n$ are uniquely determined (possibly up to re-ordering).
We'll go for proving the second statement. We'll prove this by induction on $n = \dim V$. If $n = 0$ there is nothing to prove, so let's assume the proposition holds for $n - 1$.
Let $T$ be our linear operator. By Proposition 4.7.2, $T$ has an eigenvalue in $\mathbb{R}$. Let $v$ be a corresponding eigenvector in $V$. Then $f_1 = v/|v|$ is also an eigenvector, and $|f_1| = 1$. Let $\lambda_1$ be the corresponding eigenvalue.
We consider the space $W = \{ w \in V : w \cdot f_1 = 0 \}$. This is clearly a subspace of $V$ of dimension $n - 1$. I claim that $T$ maps $W$ into itself. So suppose $w \in W$; we want to show that $T(w) \in W$ also.
We have
\[ T(w) \cdot f_1 = w \cdot T(f_1) \]
since $T$ is selfadjoint. But we know that $T(f_1) = \lambda_1 f_1$, so it follows that
\[ T(w) \cdot f_1 = \lambda_1 (w \cdot f_1) = 0, \]
since $w \in W$ so $w \cdot f_1 = 0$ by hypothesis.
So $T$ maps $W$ into itself. Moreover, $W$ is a Euclidean space of dimension $n - 1$, so we may apply the induction hypothesis to the restriction of $T$ to $W$. This gives us an orthonormal basis $f_2, \dots, f_n$ of $W$ consisting of eigenvectors of $T$. Then the set $f_1, \dots, f_n$ is an orthonormal basis of $V$, which also consists of eigenvectors of $T$.
Lecture 20
Although it is not used in the proof of the theorem above, the following proposition is useful when calculating examples. It helps us to write down more vectors in the final orthonormal basis immediately, without having to use Theorem 4.5.3 repeatedly.
Proposition 4.7.4. Let $A$ be a real symmetric matrix, and let $\lambda_1, \lambda_2$ be two distinct eigenvalues of $A$, with corresponding eigenvectors $v_1, v_2$. Then $v_1 \cdot v_2 = 0$.
Proof. (As in Proposition 4.7.2, we will write just $v$ for a column vector in this proof. So $v_1 \cdot v_2$ is the same as $v_1^T v_2$.) We have
\[ A v_1 = \lambda_1 v_1, \tag{1} \]
\[ A v_2 = \lambda_2 v_2. \tag{2} \]
The trick is now to look at the expression $v_1^T A v_2$. On the one hand, by (2) we have
\[ v_1^T A v_2 = v_1 \cdot (A v_2) = v_1^T (\lambda_2 v_2) = \lambda_2 (v_1 \cdot v_2). \tag{3} \]
On the other hand, $A^T = A$, so $v_1^T A = v_1^T A^T = (A v_1)^T$, so using (1) we have
\[ v_1^T A v_2 = (A v_1)^T v_2 = (\lambda_1 v_1^T) v_2 = \lambda_1 (v_1 \cdot v_2). \tag{4} \]
Comparing (3) and (4), we have $(\lambda_2 - \lambda_1)(v_1 \cdot v_2) = 0$. Since $\lambda_2 - \lambda_1 \ne 0$ by assumption, we have $v_1^T v_2 = 0$.
Example 9. Let $n = 2$ and let $A$ be the symmetric matrix $A = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}$. Then
\[ \det(A - xI_2) = (1-x)^2 - 9 = x^2 - 2x - 8 = (x-4)(x+2), \]
so the eigenvalues of $A$ are 4 and $-2$. Solving $Av = \lambda v$ for $\lambda = 4$ and $-2$, we find corresponding eigenvectors $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$. Proposition 4.7.4 tells us that these vectors are orthogonal to each other (which we can of course check directly!), so if we divide them by their lengths to give vectors of length 1, giving $\begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$ and $\begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}$, then we get an orthonormal basis consisting of eigenvectors of $A$, which is what we want. The corresponding basis change matrix $P$ has these vectors as columns, so $P = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}$, and we can check that $P^T P = I_2$ (i.e. $P$ is orthogonal) and that $P^T A P = \begin{pmatrix} 4 & 0 \\ 0 & -2 \end{pmatrix}$.
Example 10. Let's do an example of the quadratic form version of the above theorem. Let $n = 3$ and
\[ q(v) = 3x^2 + 6y^2 + 3z^2 - 4xy - 4yz + 2xz, \]
so $A = \begin{pmatrix} 3 & -2 & 1 \\ -2 & 6 & -2 \\ 1 & -2 & 3 \end{pmatrix}$.
Then, expanding by the first row,
\[ \det(A - xI_3) = (3-x)(6-x)(3-x) - 4(3-x) - 4(3-x) + 4 + 4 - (6-x) = -x^3 + 12x^2 - 36x + 32 = (2-x)(x-8)(x-2), \]
so the eigenvalues are 2 (repeated) and 8. For the eigenvalue 8, if we solve $Av = 8v$ then we find a solution $v = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}$. Since 2 is a repeated eigenvalue, we need two corresponding eigenvectors, which must be orthogonal to each other. The equations $Av = 2v$ all reduce to $x - 2y + z = 0$, and so any vector $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ satisfying this equation is an eigenvector for $\lambda = 2$. By Proposition 4.7.4 these eigenvectors will all be orthogonal to the eigenvector for $\lambda = 8$, but we will have to choose them orthogonal to each other. We can choose the first one arbitrarily, so let's choose $\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}$. We now need another solution that is orthogonal to this. In other words, we want $x$, $y$ and $z$ not all zero satisfying $x - 2y + z = 0$ and $x - z = 0$, and $x = y = z = 1$ is a solution. So we now have a basis $\begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ of three mutually orthogonal eigenvectors. To get an orthonormal basis, we just need to divide by their lengths, which are, respectively, $\sqrt{6}$, $\sqrt{2}$, and $\sqrt{3}$, and then the basis change matrix $P$ has these vectors as columns, so
\[ P = \begin{pmatrix} 1/\sqrt{6} & 1/\sqrt{2} & 1/\sqrt{3} \\ -2/\sqrt{6} & 0 & 1/\sqrt{3} \\ 1/\sqrt{6} & -1/\sqrt{2} & 1/\sqrt{3} \end{pmatrix}. \]
It can then be checked that $P^T P = I_3$ and that $P^T A P$ is the diagonal matrix with entries 8, 2, 2. So if $f_1, f_2, f_3$ is this basis, we have
\[ q(x_1 f_1 + x_2 f_2 + x_3 f_3) = 8x_1^2 + 2x_2^2 + 2x_3^2. \]
4.8 Applications of quadratic forms to geometry
4.8.1 Reduction of the general second degree equation
The general equation of the second degree in $n$ variables $x_1, \dots, x_n$ is
\[ \sum_{i=1}^{n} \alpha_i x_i^2 + \sum_{i=1}^{n} \sum_{j=1}^{i-1} \alpha_{ij} x_i x_j + \sum_{i=1}^{n} \beta_i x_i + \gamma = 0. \tag{$\dagger$} \]
For fixed values of the $\alpha$'s, $\beta$'s and $\gamma$, this defines a quadric curve or surface in $n$-dimensional Euclidean space. To study the possible shapes of the curves and surfaces thus defined, we first simplify this equation by applying coordinate changes resulting from isometries (rigid motions) of $\mathbb{R}^n$; that is, transformations that preserve distance and angle.
By Theorem 4.7.3, we can apply an orthogonal basis change (that is, an isometry of $\mathbb{R}^n$ that fixes the origin) which has the effect of eliminating the terms $\alpha_{ij} x_i x_j$ in the above sum.
Now, whenever $\alpha_i \ne 0$, we can replace $x_i$ by $x_i - \beta_i/(2\alpha_i)$, and thereby eliminate the term $\beta_i x_i$ from the equation. This transformation is just a translation, which is also an isometry.
If $\alpha_i = 0$, then we cannot eliminate the term $\beta_i x_i$. Let us permute the coordinates such that $\alpha_i \ne 0$ for $1 \le i \le r$, and $\beta_i \ne 0$ for $r+1 \le i \le r+s$. Then if $s > 1$, by using Theorem 4.5.3, we can find an orthogonal transformation that leaves $x_i$ unchanged for $1 \le i \le r$ and replaces $\sum_{j=1}^{s} \beta_{r+j} x_{r+j}$ by $\beta x_{r+1}$ (where $\beta$ is the length of the vector $(\beta_{r+1}, \dots, \beta_{r+s})$), and then we have at most one non-zero linear term; either there are no linear terms at all, or there is just $\beta_{r+1} x_{r+1}$.
Finally, if there is a non-zero $\beta_{r+1}$, then we can perform the translation that replaces $x_{r+1}$ by $x_{r+1} - \gamma/\beta_{r+1}$, and thereby eliminate $\gamma$. By dividing the equation through by a constant, we can assume that the constant term or linear term (whichever is present) has coefficient 1.
Lecture 21
We have proved the following theorem:
Theorem 4.8.1. By rigid motions of Euclidean space, we can transform the set defined by the general second degree equation ($\dagger$) into the set defined by an equation having one of the following three forms:
\[ \sum_{i=1}^{r} \alpha_i x_i^2 = 0, \qquad \sum_{i=1}^{r} \alpha_i x_i^2 + 1 = 0, \qquad \sum_{i=1}^{r} \alpha_i x_i^2 + x_{r+1} = 0. \]
Here $0 \le r \le n$ and $\alpha_1, \dots, \alpha_r$ are nonzero constants, and in the third case $r < n$.
In both cases, we shall assume that $r \ne 0$, because otherwise we have a linear equation.
The sets defined by the first two types of equation are called central quadrics because they have central symmetry; i.e. if a vector $v$ satisfies the equation, then so does $-v$.
We shall now consider the types of curves and surfaces that can arise in the familiar cases $n = 2$ and $n = 3$. These different types correspond to whether the $\alpha_i$ are positive, negative or zero, and whether the constant term is 0 or 1.
We shall use $x, y, z$ instead of $x_1, x_2, x_3$, and $\alpha, \beta, \gamma$ instead of $\alpha_1, \alpha_2, \alpha_3$. We shall assume also that $\alpha, \beta, \gamma$ are all strictly positive, and write $-\alpha$, etc., for the negative case.
4.8.2 The case n = 2
When n = 2 we have the following possibilities.
(i) $\alpha x^2 = 0$. This just defines the line $x = 0$ (the $y$-axis).
(ii) $\alpha x^2 = 1$. This defines the two parallel lines $x = \pm 1/\sqrt{\alpha}$.
(iii) $-\alpha x^2 = 1$. This is the empty set!
(iv) $\alpha x^2 + \beta y^2 = 0$. The single point $(0, 0)$.
(v) $\alpha x^2 - \beta y^2 = 0$. This defines two straight lines $y = \pm\sqrt{\alpha/\beta}\, x$, which intersect at $(0, 0)$.
(vi) $\alpha x^2 + \beta y^2 = 1$. An ellipse.
(vii) $\alpha x^2 - \beta y^2 = 1$. A hyperbola.
(viii) $-\alpha x^2 - \beta y^2 = 1$. The empty set again.
(ix) $\alpha x^2 - y = 0$. A parabola.
4.8.3 The case n = 3
When $n = 3$, we still get the nine possibilities (i)-(ix) that we had in the case $n = 2$, but now they must be regarded as equations in the three variables $x, y, z$ that happen not to involve $z$. So, in Case (i), we now get the plane $x = 0$, in Case (ii) we get two parallel planes $x = \pm 1/\sqrt{\alpha}$, in Case (iv) we get the line $x = y = 0$ (the $z$-axis), in Case (v) two intersecting planes $y = \pm\sqrt{\alpha/\beta}\, x$, and in Cases (vi), (vii) and (ix), we get, respectively, elliptical, hyperbolic and parabolic cylinders.
The remaining cases involve all of $x$, $y$ and $z$. We omit $-\alpha x^2 - \beta y^2 - \gamma z^2 = 1$, which is empty.
(x) $\alpha x^2 + \beta y^2 + \gamma z^2 = 0$. The single point $(0, 0, 0)$.
(xi) $\alpha x^2 + \beta y^2 - \gamma z^2 = 0$. See Fig. 1.
Figure 1: $\tfrac{1}{2}x^2 + y^2 - z^2 = 0$
This is an elliptical cone. The cross sections parallel to the $xy$-plane are ellipses of the form $\alpha x^2 + \beta y^2 = c$, whereas the cross sections parallel to the other coordinate planes are generally hyperbolas. Notice also that if a particular point $(a, b, c)$ is on the surface, then so is $t(a, b, c)$ for any $t \in \mathbb{R}$. In other words, the surface contains the straight line through the origin and any of its points. Such lines are called generators. When each point of a 3-dimensional surface lies on one or more generators, it is possible to make a model of the surface with straight lengths of wire or string.
(xii) $\alpha x^2 + \beta y^2 + \gamma z^2 = 1$. An ellipsoid. See Fig. 2.
Figure 2: $2x^2 + y^2 + \tfrac{1}{2}z^2 = 1$
This is a squashed sphere. It is bounded, and hence clearly has no generators. Notice that if $\alpha$, $\beta$ and $\gamma$ are distinct, it has only the finite group of symmetries given by reflections in $x$, $y$ and $z$, but if some two of the coefficients coincide, it picks up an infinite group of rotation symmetries.
(xiii) $\alpha x^2 + \beta y^2 - \gamma z^2 = 1$. A hyperboloid. See Fig. 3.
Figure 3: $3x^2 + 8y^2 - 8z^2 = 1$
There are two types of 3-dimensional hyperboloids. This one is connected, and is known as a hyperboloid of one sheet. Any cross-section in the $xy$ direction will be an ellipse, and these get larger as $z$ grows (notice the hole in the middle in the picture). Although it is not immediately obvious, each point of this surface lies on exactly two generators; that is, lines that lie entirely on the surface. For each $\lambda \in \mathbb{R}$, the line defined by the pair of equations
\[ \sqrt{\alpha}\,x - \sqrt{\gamma}\,z = \lambda\bigl(1 - \sqrt{\beta}\,y\bigr); \qquad \lambda\bigl(\sqrt{\alpha}\,x + \sqrt{\gamma}\,z\bigr) = 1 + \sqrt{\beta}\,y \]
lies entirely on the surface; to see this, just multiply the two equations together. The same applies to the lines defined by the pairs of equations
\[ \sqrt{\beta}\,y - \sqrt{\gamma}\,z = \lambda\bigl(1 - \sqrt{\alpha}\,x\bigr); \qquad \lambda\bigl(\sqrt{\beta}\,y + \sqrt{\gamma}\,z\bigr) = 1 + \sqrt{\alpha}\,x. \]
It can be shown that each point on the surface lies on exactly one of the lines in each of these two families.
There is a photo at http://home.cc.umanitoba.ca/~gunderso/model_photos/misc/hyperboloid_of_one_sheet.jpg depicting a rather nice wooden model of a hyperboloid of one sheet, which gives a good idea how these lines sit inside the surface.
(xiv) $\alpha x^2 - \beta y^2 - \gamma z^2 = 1$. Another kind of hyperboloid. See Fig. 4.
Lecture 22
Figure 4: $8x^2 - 12y^2 - 20z^2 = 1$
This one has two connected components and is called a hyperboloid of two sheets. It does not have
generators.
(xv) $\alpha x^2 + \beta y^2 - z = 0$. An elliptical paraboloid. See Fig. 5.
Figure 5: $2x^2 + 3y^2 - z = 0$
Cross-sections of this surface parallel to the xy plane are ellipses, while cross-sections in the yz
and xz directions are parabolas. It can be regarded as the limit of a family of hyperboloids of
two sheets, where one cap remains at the origin and the other recedes to infinity.
(xvi) $\alpha x^2 - \beta y^2 - z = 0$. A hyperbolic paraboloid (a rather elegant saddle shape). See Fig. 6.
Figure 6: $x^2 - 4y^2 - z = 0$
As in the case of the hyperboloid of one sheet, there are two generators passing through each point of this surface, one from each of the following two families of lines:
\[ \lambda\bigl(\sqrt{\alpha}\,x - \sqrt{\beta}\,y\bigr) = z; \qquad \sqrt{\alpha}\,x + \sqrt{\beta}\,y = \lambda, \]
\[ \lambda\bigl(\sqrt{\alpha}\,x + \sqrt{\beta}\,y\bigr) = z; \qquad \sqrt{\alpha}\,x - \sqrt{\beta}\,y = \lambda. \]
Just as the elliptical paraboloid was a limiting case of a hyperboloid of two sheets, so the hyperbolic paraboloid is a limiting case of a hyperboloid of one sheet: you can imagine gradually deforming the hyperboloid of one sheet so the elliptical hole in the middle becomes bigger and bigger, and the result is the hyperbolic paraboloid.
4.9 The complex story: sesquilinear forms
The results in Subsection 4.7 applied only to vector spaces over the real numbers $\mathbb{R}$. There are corresponding results for spaces over the complex numbers $\mathbb{C}$, which we shall summarize here without proofs, although the proofs are similar and analogous to those for spaces over $\mathbb{R}$.
The key thing that made everything work over $\mathbb{R}$ was the fact that if $x_1, \dots, x_n$ are real numbers, and $x_1^2 + \dots + x_n^2 = 0$, then all the $x_i$ are zero. This doesn't work over $\mathbb{C}$, obviously (take $x_1 = 1$ and $x_2 = i$). But we do have something similar if we bring complex conjugation into play. As usual, for $z \in \mathbb{C}$, $\overline{z}$ will denote the complex conjugate of $z$. Then if $z_1 \overline{z_1} + \dots + z_n \overline{z_n} = 0$, the $z$'s are all zero. So we need to put 'bars' on half of our formulae. Notice that there was a hint of this in the proof of Proposition 4.7.2.
We'll do this as follows.
Definition 4.9.1. A sesquilinear form on a complex vector space $V$ is a function $\tau : V \times V \to \mathbb{C}$ such that
\[ \tau(v, x_1 w_1 + x_2 w_2) = x_1 \tau(v, w_1) + x_2 \tau(v, w_2) \]
(as before), but
\[ \tau(x_1 v_1 + x_2 v_2, w) = \overline{x_1}\, \tau(v_1, w) + \overline{x_2}\, \tau(v_2, w). \]
We say such a form is Hermitian symmetric if
\[ \tau(w, v) = \overline{\tau(v, w)}. \]
The word sesquilinear literally means 'one-and-a-half-times-linear': it's linear in the second argument, but only halfway there in the first argument! We'll often abbreviate 'Hermitian-symmetric sesquilinear form' to just 'Hermitian form'.
We can represent these by matrices in a similar way to bilinear forms. If $\tau$ is a sesquilinear form, and $e_1, \dots, e_n$ is a basis of $V$, we define the matrix of $\tau$ to be the matrix $A$ whose $(i, j)$ entry is $\tau(e_i, e_j)$. Then we have
\[ \tau(v, w) = \overline{v}^{\,T} A w \]
where $v$ and $w$ are the coordinates of $v$ and $w$ as usual. We'll shorten this to $v^* A w$, where the $*$ denotes 'conjugate transpose'. The Hermitian symmetry condition translates to the relation $a_{ji} = \overline{a_{ij}}$, so $\tau$ is Hermitian if and only if $A$ satisfies $A^* = A$.
We have a version here of Sylvester's two theorems (Proposition 4.4.3 and Theorem 4.4.4):
Theorem 4.9.2. If $\tau$ is a Hermitian form on a complex vector space $V$, there is a basis of $V$ in which the matrix of $\tau$ is given by
\[ \begin{pmatrix} I_t & & \\ & -I_u & \\ & & 0 \end{pmatrix} \]
for some uniquely determined integers $t$ and $u$.
As in the real case, we call the pair $(t, u)$ the signature of $\tau$, and we say $\tau$ is positive definite if its signature is $(n, 0)$. In this case, the theorem tells us that there is a basis of $V$ in which the matrix of $\tau$ is the identity, and in such a basis we have
\[ \tau(v, v) = \sum_{i=1}^{n} |v_i|^2 \]
where $v_1, \dots, v_n$ are the coordinates of $v$. Hence $\tau(v, v) > 0$ for all non-zero $v \in V$.
Just as we defined a Euclidean space to be a real vector space with a choice of positive definite bilinear form, we have a similar definition here:
Definition 4.9.3. A Hilbert space is a finite-dimensional complex vector space endowed with a choice of positive-definite Hermitian-symmetric sesquilinear form.
These are the complex analogues of Euclidean spaces. If $V$ is a Hilbert space, we write $v \cdot w$ for the sesquilinear form on $V$, and we refer to it as an inner product. For any Hilbert space, we can always find a basis $e_1, \dots, e_n$ of $V$ such that $e_i \cdot e_j = \delta_{ij}$ (an orthonormal basis). Then we can write the inner product matrix-wise as
\[ v \cdot w = v^* w, \]
where $v$ and $w$ are the coordinates of $v$ and $w$ and $v^* = \overline{v}^{\,T}$ as before.
The canonical example of a Hilbert space is $\mathbb{C}^n$, with the standard inner product given by
\[ v \cdot w = \sum_{i=1}^{n} \overline{v_i}\, w_i, \]
for which the standard basis is obviously orthonormal.
Remark. Technically, I should say finite-dimensional Hilbert space. There are lots of interesting infinite-dimensional Hilbert spaces, but we won't say anything about them in this course. (Curiously, one never seems to come across infinite-dimensional Euclidean spaces.)
In our study of linear operators on Euclidean spaces, the idea of the adjoint of an operator was important. There's an analogue of it here:
Definition 4.9.4. Let $T : V \to V$ be a linear operator on a Hilbert space $V$. Then there is a unique linear operator $T^* : V \to V$ (the Hermitian adjoint of $T$) such that
\[ T(v) \cdot w = v \cdot T^*(w). \]
It's clear that if $A$ is the matrix of $T$ in an orthonormal basis, then the matrix of $T^*$ is $A^*$.
Definition 4.9.5. We say $T$ is
• selfadjoint if $T^* = T$,
• unitary if $T^* = T^{-1}$,
• normal if $T^* T = T T^*$.
Exercise. If $T$ is unitary, then
\[ T(u) \cdot T(v) = u \cdot v \]
for all $u, v$ in $V$.
Lecture 23
If $A$ is the matrix of $T$ in an orthonormal basis, then it's clear that $T$ is selfadjoint if and only if $A^* = A$ (a Hermitian-symmetric matrix), unitary if and only if $A^* = A^{-1}$ (a unitary matrix), and normal if and only if $A^* A = A A^*$ (a normal matrix).
Notice that if $A$ is unitary and the entries of $A$ are real, then $A$ must be orthogonal, but the definition also includes things like $\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$.
Similarly, a matrix with real entries is Hermitian-symmetric if and only if it's symmetric, but $\begin{pmatrix} 2 & i \\ -i & 3 \end{pmatrix}$ is a Hermitian-symmetric matrix that's not symmetric.
Notice that both selfadjoint and unitary operators are normal. The generalisation of Theorem 4.7.3 applies to all normal operators.
Theorem 4.9.6. Let $T : V \to V$ be a linear operator on a Hilbert space. Then there exists an orthonormal basis of $V$ consisting of eigenvectors of $T$ if and only if $T$ is a normal operator. Furthermore,
• if $T$ is selfadjoint, the eigenvalues of $T$ are real;
• if $T$ is unitary, the eigenvalues of $T$ have absolute value 1.
For unitary or selfadjoint operators, one can prove this using the same argument as in Theorem 4.7.3. For more general normal operators it's a bit harder: it's not quite obvious that if $v$ is an eigenvector for $T$, then the space of things orthogonal to $v$ is preserved by $T$. The argument we had before shows that it's preserved by $T^*$, not by $T$. However, as you'll see in Q9 on problem sheet 4, any eigenvector for $T$ is actually an eigenvector for $T^*$, so we can swap $T$ and $T^*$ around to see that the space of things orthogonal to $v$ is preserved by $T$, and the proof now goes as before.
As in the real situation, one checks that any two eigenvectors of a normal operator with distinct
eigenvalues are necessarily orthogonal, which makes it relatively easy to find the orthonormal
basis of eigenvectors.
Example. Let

    A = (  6     2+2i )
        ( 2-2i    4   ).

Then

    c_A(x) = (6 - x)(4 - x) - (2 + 2i)(2 - 2i) = x^2 - 10x + 16 = (x - 2)(x - 8),

so the eigenvalues are 2 and 8. Corresponding eigenvectors are v_1 = (1+i, -2)^T and v_2 = (1+i, 1)^T. We find that |v_1|^2 = v_1* v_1 = 6 and |v_2|^2 = 3, so we divide by their lengths to get an orthonormal basis v_1/|v_1|, v_2/|v_2| of C^2. Then the matrix

    P = ( (1+i)/√6   (1+i)/√3 )
        (   -2/√6      1/√3   )

having this basis as columns is unitary and satisfies

    P*AP = ( 2  0 )
           ( 0  8 ).
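The same kind of numerical check works for this worked example; the short sketch below (an added illustration, not part of the computation above) confirms that P is unitary and that P*AP = diag(2, 8).

```python
# Illustration: verify the diagonalisation above with NumPy.
import numpy as np

A = np.array([[6, 2 + 2j], [2 - 2j, 4]])
v1 = np.array([1 + 1j, -2])                 # eigenvector for eigenvalue 2
v2 = np.array([1 + 1j, 1])                  # eigenvector for eigenvalue 8

P = np.column_stack([v1 / np.linalg.norm(v1), v2 / np.linalg.norm(v2)])

print(np.allclose(P.conj().T @ P, np.eye(2)))   # True: P is unitary
print(np.round(P.conj().T @ A @ P, 10).real)    # [[2. 0.] [0. 8.]]
```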
5 Linear Algebra over Z
In the first four sections of the course, we've always been thinking about vector spaces over fields. The idea of this section is to show that some of the same ideas work with the field K replaced by the integers Z, even though Z isn't a field; and that this is strongly related to the group theory which you saw in Foundations last year.
5.1 Review of some MA132 material

5.1.1 Groups

Groups were introduced in the first year in Foundations (MA132), and will be studied in detail next term in Algebra II: Groups and Rings (MA249). You might want to dig up your notes for MA132 and read through section 9 of the course again, since we'll need to re-use lots of the ideas here.

Remember that a group is a set G together with a binary operation satisfying some axioms. These are the same axioms as in Definition 9.4 of Foundations, but we're going to write the binary operation as addition, so g_1 + g_2 will mean the binary operation applied to g_1 and g_2.
The axioms are:

(G1) Closure: for all g, h ∈ G, we have g + h ∈ G;
(G2) Associativity: for all g, h, k ∈ G, we have (g + h) + k = g + (h + k);
(G3) Neutral Element: there is an element 0_G in G such that for all g ∈ G, g + 0_G = 0_G + g = g;
(G4) Inverses: if 0_G is a neutral element, then for all g ∈ G, there is an element -g ∈ G such that g + (-g) = (-g) + g = 0_G.

It's easy to see that every group in fact has a unique neutral element 0_G, and we'll refer to this as the identity element of G. In particular, I could have written "if 0_G is the neutral element" in axiom G4.
In this course, we are mainly interested in a special kind of groups, abelian (= commutative) groups, which are defined as follows:

Definition 5.1.1. An abelian group is a group G satisfying the following additional axiom:

(Ab) Commutativity: for all g, h ∈ G, we have g + h = h + g.

It might seem more sensible to call these "commutative groups", but it's become traditional to use the word "abelian" to commemorate Niels Henrik Abel, a famous Norwegian mathematician of the 19th century. (This is the source of various terrible jokes.²)

² What's purple and commutes? An abelian grape.

In particular, abelian groups are groups, so everything you know about groups from MA132 applies to abelian groups, although we're using slightly different notation.
5.1.2 Examples of groups

Here are some examples of abelian groups:

1. The integers Z, with the usual notion of addition.

2. More generally, for any n ≥ 1, the set Z^n of row vectors of length n with entries in Z, with component-wise addition.

3. Fix a positive integer n > 0 and let

    Z_n = {0, 1, 2, . . . , n-1} = {x ∈ Z | 0 ≤ x < n},

where addition is computed modulo n. So, for example, when n = 9, we have 2 + 5 = 7, 3 + 8 = 2, 6 + 7 = 4, etc. Note that the inverse -x of x ∈ Z_n is equal to n - x in this example.

4. Examples from linear algebra. Let K be a field. Then the following are abelian groups:

    (i) The elements of K, under addition.
    (ii) The vectors in any vector space over K, under addition.
    (iii) The non-zero elements of K, under multiplication. (We often write this as K*.)

An example of a group that isn't abelian is the group of 2 × 2 invertible matrices over a field K, with the usual operation of matrix multiplication. This group, which is called GL_2(K), does not satisfy the Commutativity axiom, so it is not an abelian group.
5.1.3 The order of an element

For any group G, g ∈ G, and integer n > 0, we define ng by

    ng = g + · · · + g   (n times).

So, for example, 1g = g, 2g = g + g, 3g = g + g + g, etc. We extend this notation to all n ∈ Z by defining 0g = 0 and (-n)g = -(ng) for n < 0. It's a fun exercise to check that this satisfies the same axioms as the scalar multiplication on a vector space (even though Z is not a field).

If there's an integer n > 0 such that ng = 0, then we say g has finite order, and we define the order |g| of g to be the least positive n that works. Otherwise, we define |g| = ∞. You saw in Foundations (Corollary 9.26) that if G is finite, then |g| is finite for all g ∈ G, and |g| divides |G| (the order of G).
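As a small computational illustration (my own sketch, not part of the notes): in Z_n the order of g can be found by brute force, and it always equals n / gcd(g, n), consistent with the fact that |g| divides |Z_n| = n.

```python
# Sketch: compute the order of g in Z_n by repeated addition.
from math import gcd

def order_in_Zn(g, n):
    k, total = 1, g % n
    while total != 0:
        total = (total + g) % n
        k += 1
    return k

for g in (2, 3, 6):                      # orders in Z_9: 9, 3, 3
    assert order_in_Zn(g, 9) == 9 // gcd(g, 9)
    print(g, order_in_Zn(g, 9))
```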
5.1.4 Isomorphisms between groups

Another thing we'll recall from Foundations is the idea of an isomorphism between two groups.

Definition 5.1.2. A map φ : G → H between two groups is called an isomorphism if

(i) φ is a bijection,
(ii) φ(g + h) = φ(g) + φ(h) for all g, h ∈ G.

The groups G and H are called isomorphic if there is an isomorphism between them.

The notation G ≅ H means that G is isomorphic to H; isomorphic groups are often thought of as being essentially "the same group, but with elements having different names". As far as group theory is concerned they're identical.

Exercise. If φ : G → H is an isomorphism then |g| = |φ(g)| for all g ∈ G.
5.1.5 Subgroups and cosets

A subgroup of a group G is a subset H of G which is a group under the same binary operation as G (Foundations definition 9.10).

If G is abelian, and H is a subgroup of G, then for any g ∈ G we define the coset H + g of H by g as the set

    {h + g : h ∈ H}.

In Foundations you distinguished between right and left cosets, but if G is abelian these are the same thing.³ You saw that any two cosets of H are either equal or disjoint, so the cosets form a partition of G (every element is in one and only one coset).

³ This is fortunate, because I can never remember whether H + g is a left or right coset.
5.2 Cyclic groups

Definition 5.2.1. A group G is said to be cyclic if there is an element g ∈ G such that every element h of G can be written as ng for some n ∈ Z.

If G is cyclic, and g is an element of G satisfying the condition, we say g generates G. Note that Z and Z_n are both cyclic, generated by the element 1 in each case.

Proposition 5.2.2. Any cyclic group is isomorphic either to Z, or to Z_n for some n ≥ 1.

Proof. Let G be cyclic with generator x. So G = {mx | m ∈ Z}. Suppose first that the elements mx for m ∈ Z are all distinct. Then the map φ : Z → G defined by φ(m) = mx is a bijection, and it is clearly an isomorphism.

Otherwise, we have lx = mx for some l < m, and so (m - l)x = 0 with m - l > 0. Let n be the least integer with n > 0 and nx = 0. Then the elements 0x = 0, 1x, 2x, . . . , (n-1)x of G are all distinct, because otherwise we could find a smaller n. Furthermore, for any mx ∈ G, we can write m = qn + r for some q, r ∈ Z with 0 ≤ r < n. Then mx = (qn + r)x = rx, so G = {0, 1x, 2x, . . . , (n-1)x}, and the map φ : Z_n → G defined by φ(m) = mx for 0 ≤ m < n is a bijection, which is easily seen to be an isomorphism.

Note that this implies, in particular, that any cyclic group is abelian, and Z_n is the only cyclic group of order n up to isomorphism (that is, any cyclic group of order n is isomorphic to Z_n).

The main theorem of chapter 5 of this module will tell us, roughly speaking, that all abelian groups that aren't "ridiculously big" (in particular, all finite ones) can be built up by sticking together the basic pieces Z and Z_n. This is a generalization of Proposition 5.2.2. To do this, we'll need to develop quite a bit of theory first.
5.3 Homomorphisms, kernels, and quotients

Recall that an isomorphism was defined as a bijective map φ between groups satisfying

    φ(g + h) = φ(g) + φ(h).

A map between groups which satisfies this relation, but isn't necessarily injective or surjective, is called a homomorphism. Note (exercise) that any homomorphism φ : G → H must satisfy φ(0_G) = 0_H, and more generally

    φ(x_1 g_1 + x_2 g_2) = x_1 φ(g_1) + x_2 φ(g_2)

for any g_1, g_2 ∈ G and x_1, x_2 ∈ Z; so homomorphisms of groups are a bit like linear maps of vector spaces.

Exercise. If G is any abelian group, and g any element of G, then the map φ : Z → G defined by φ(n) = ng is a homomorphism.
We can define the kernel of a homomorphism in an obvious way, and it will turn out that kernels are always subgroups:

Proposition 5.3.1. Let φ : G → J be a group homomorphism. Then the set

    ker(φ) = {g ∈ G : φ(g) = 0}

is a subgroup of G.

Proof. Exercise.

We can now ask two questions. Firstly, kernels are always subgroups; is every subgroup a kernel? More precisely, given a subgroup H of G, can we find some homomorphism φ : G → J such that H = ker(φ)? This question is quite difficult for general groups, but for abelian groups it's easier; we'll see shortly that every subgroup is the kernel of some homomorphism.
Secondly, given H, how much freedom do we have in choosing J and φ : G → J so that H = ker(φ)? The answer to this is "far too much", because J could be really big: there could be lots of elements in J that have nothing to do with G and φ. But if we assume that φ is surjective, then it turns out that we have essentially no choice at all (as we'll see in the next section).

Theorem 5.3.2. Let G be an abelian group, and H a subgroup of G. Then there exists a group J and a surjective homomorphism φ : G → J such that ker(φ) = H.

The group J we'll construct is called the quotient of G by H, and written as G/H. As a set, J will just be the set of cosets of H in G. This comes parcelled up with a map from G to J, sending g ∈ G to the coset H + g; this is obviously surjective (by the definition of a coset). The hard part is equipping this set J with a sensible binary operation so that φ becomes a homomorphism.

Definition 5.3.3. If A and B are any subsets of a group G, then we define their sum by

    A + B = {a + b : a ∈ A, b ∈ B}.

Proposition 5.3.4. Let H be a subgroup of an abelian group G. Then the set G/H of cosets of H in G forms an abelian group under addition of subsets.

Proof. We need to check that the group axioms are satisfied for this binary operation. Firstly, the closure axiom: we have

    (H + g) + (H + h) = H + (g + h),

by the commutativity and associativity axioms, so the sum of any two cosets is a coset and the closure axiom is satisfied.

Associativity follows easily from associativity of G. Since (H + 0) + (H + g) = H + g for all g ∈ G, H = H + 0_G is the zero element, and since (H - g) + (H + g) = H - g + g = H, H - g is an inverse to H + g for all cosets H + g. Moreover, (H + h) + (H + g) = H + h + g = H + g + h, so this group is abelian, and we're done.

Definition 5.3.5. The group G/H is called the quotient group (or the factor group) of G by H.

As mentioned above, there is a surjective map, called the quotient map,

    φ : G → G/H

defined by φ(g) = H + g. This is a group homomorphism, since φ(g + h) = H + (g + h) = (H + g) + (H + h), so we've proved Theorem 5.3.2.
Example. Let G = Z and let H = 5Z, the subgroup of multiples of 5. Then the cosets H + j and H + k are the same if and only if j - k ∈ 5Z, in other words if j ≡ k (mod 5). Hence there are only 5 possible cosets: H = H + 0 = {5n | n ∈ Z}, H + 1 = {5n + 1 | n ∈ Z}, H + 2, H + 3, H + 4. This gives a map

    Z_5 → Z/5Z,

sending j ∈ Z_5 = {0, . . . , 4} to the coset H + j, which is clearly an isomorphism. (More generally, we have Z/nZ ≅ Z_n for any n ≥ 1.)
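In code, the isomorphism Z/nZ ≅ Z_n amounts to representing each coset H + g by its canonical representative g mod n; the tiny sketch below (an added illustration, not from the notes) makes this concrete for n = 5.

```python
# Sketch: cosets of H = nZ in Z, represented by canonical representatives.
def coset_rep(g, n):
    """Canonical representative (in {0, ..., n-1}) of the coset nZ + g."""
    return g % n

n = 5
print(coset_rep(3, n) == coset_rep(8, n))   # True: 3 - 8 is a multiple of 5
print(coset_rep(3 + 4, n))                  # 2: (H + 3) + (H + 4) = H + 2
```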
Note that since G/H is defined as the set of cosets of H in G, its size is the number of cosets of H in G. This is called the index of H in G, written as [G : H]; so

    |G/H| = [G : H].
5.4 The first isomorphism theorem

Recall that we constructed G/H in order to show that any subgroup is the kernel of a surjective homomorphism. In fact G/H is more or less the only possibility, which will follow from the main result of this section. Firstly, a couple of warm-up results:

Proposition 5.4.1. If G and H are any groups and φ : G → H is a homomorphism, then the image im(φ) is a subgroup of H.

Proof. Exercise.

Proposition 5.4.2. Let φ : G → H be a homomorphism. Then φ is injective if and only if ker(φ) = {0_G}.

Proof. Since 0_G ∈ ker(φ), if φ is injective, then we must have ker(φ) = {0_G}.

Conversely, suppose that ker(φ) = {0_G}, and let g_1, g_2 ∈ G with φ(g_1) = φ(g_2). Then 0_H = φ(g_1) - φ(g_2) = φ(g_1 - g_2), so g_1 - g_2 ∈ ker(φ). Hence we must have g_1 - g_2 = 0_G, and g_1 = g_2. So φ is injective.
Theorem 5.4.3 (First Isomorphism Theorem). Let φ : G → H be a group homomorphism with kernel K. Then im(φ) ≅ G/K. More specifically, there is a unique isomorphism

    ψ : G/K → im(φ)

such that

    ψ(K + g) = φ(g)   for all g ∈ G.   (∗)

Proof. The trickiest point to understand in this proof is that we have to show that there really is a map ψ : G/K → im(φ) such that (∗) holds. We can't just define it by the formula (∗), since the coset K + g will be equal to K + h for any h such that g - h ∈ K. However, once you've realized what needs to be done, it's not hard to do it: in the above situation, φ(g) will be equal to φ(h) + φ(g - h) = φ(h), since g - h ∈ ker(φ). So we can define ψ(X), for X a coset, to be φ(g) for any g ∈ X, and then this won't depend on the choice of g.

This is called checking that ψ is "well-defined". It's a bit like how we define the dimension of a vector space to be the size of a basis: vector spaces have many bases, but any two bases have the same size, so the dimension is well-defined.

Once we've worked out how to define ψ, we're more or less done. Clearly im(ψ) = im(φ), and it is straightforward to check that ψ is a homomorphism. Finally,

    ψ(K + g) = 0_H ⟺ φ(g) = 0_H ⟺ g ∈ K ⟺ K + g = K + 0 = 0_{G/K}.

Thus ker(ψ) = {0_{G/K}}, so ψ is injective by Proposition 5.4.2. Thus ψ : G/K → im(φ) is an isomorphism, which completes the proof.

Notice in particular that if φ : G → H is surjective, then im(φ) = H, so H ≅ G/K; thus if we want to find a group H and a surjective homomorphism G → H with kernel K, the only possibility is that H ≅ G/K. So what we did in Theorem 5.3.2 was essentially "the only thing we could have done".
5.5 The group Z^n

We'll now spend a couple of lectures studying some of the most important examples of abelian groups. Let n ≥ 1 and consider the group Z^n of row vectors x = (x_1, . . . , x_n) with x_i ∈ Z. It's clear that this is a group, with the obvious addition law

    (x_1, . . . , x_n) + (y_1, . . . , y_n) = (x_1 + y_1, . . . , x_n + y_n).

Why is this important? In the last two sections, we've seen the idea of a quotient of a group. We'll see later in this course that every finite abelian group is a quotient of Z^n for some n. So if we understand the group Z^n, and its subgroups, we'll be able to understand all finite abelian groups too.
Definition 5.5.1. A basis of Z^n is a subset B of Z^n such that:

• Every x ∈ Z^n can be written in the form Σ_{i=1}^k m_i b_i for some m_i ∈ Z and b_i ∈ B (B spans, or generates, Z^n);

• if m_1, . . . , m_k ∈ Z and b_1, . . . , b_k ∈ B are such that Σ_{i=1}^k m_i b_i = 0, then all the m_i are zero (B is linearly independent).

Remark. The idea of a generating set makes sense in any abelian group. We'll see this a lot later in the chapter. More generally, if G is an abelian group and X is a subset of G, we can consider the subgroup generated by X, which is just the set of finite sums of multiples of elements of X. The definition of linearly independent sets also makes sense in any abelian group, but it's not so useful.
It's clear that the set e_1 = (1, 0, . . . , 0), e_2 = (0, 1, . . . , 0), . . . , e_n = (0, 0, . . . , 1) is a basis of Z^n (the standard basis), but there are others; for instance, {(1, 0), (1, 1)} is a basis of Z^2. It's easy to see that any basis of Z^n is also a basis of Q^n as a vector space over Q, so in particular it must have size n.

It's important to notice, though, that a subset of Z^n which is a basis of Q^n need not be a basis of Z^n in the above sense. For instance, {(2, 0), (0, 2)} isn't a basis of Z^2, since we can't write all elements of Z^2 as linear combinations of these vectors with integer coefficients; we would need to divide by 2 at some point. This also shows that a set of n linearly independent elements of Z^n needn't be a basis.
Let x_1, x_2, . . . , x_n be the standard basis of Z^n, and let y_1, . . . , y_n be another basis. As in Linear Algebra, we can define the associated change of basis matrix P (with "original basis" x_i and "new basis" y_i), where the columns of P are y_i^T; that is, they express y_i in terms of x_i.

We can obviously form such a matrix P with any set of n vectors. For instance, if n = 2 we could take y_1 = (2, 7), y_2 = (1, 4), so

    P = ( 2  1 )
        ( 7  4 ).

How do we tell if they form a basis? In linear algebra it was enough that their determinant was nonzero; here we have a slightly different criterion:
Theorem 5.5.2. Let y_1, . . . , y_n ∈ Z^n, and let P = (p_ij) be the matrix with the y's as columns, so y_j = Σ_{i=1}^n p_ij x_i for 1 ≤ j ≤ n. Then the following are equivalent:

(i) y_1, . . . , y_n is a basis of Z^n;
(ii) P is an invertible matrix such that P^{-1} has entries in Z;
(iii) det(P) = ±1.

A matrix P ∈ Z^{n,n} with det(P) = ±1 is called unimodular, so (iii) says that P is a unimodular matrix.
Proof. (i) ⇒ (ii): If y_1, . . . , y_n is a basis of Z^n, then it spans Z^n, so we can write the x's in terms of the y's; that is, there is an n × n integer matrix Q = (q_jk) with x_k = Σ_{j=1}^n q_jk y_j for 1 ≤ k ≤ n. Hence

    x_k = Σ_{j=1}^n q_jk y_j = Σ_{j=1}^n q_jk Σ_{i=1}^n p_ij x_i = Σ_{i=1}^n (Σ_{j=1}^n p_ij q_jk) x_i,

and, by the linear independence of x_1, . . . , x_n, this implies that Σ_{j=1}^n p_ij q_jk = δ_ik. In other words, PQ = I_n. So P^{-1} = Q, and in particular P^{-1} has entries in Z.

(ii) ⇒ (i): If P is invertible, then the columns of P are linearly independent as elements of Q^n, so they are certainly linearly independent as elements of Z^n. Let's show that they span. Since the x's are a basis, it suffices to show that each of x_1, . . . , x_n is a linear combination of the y's with integer coefficients.

Let Q = P^{-1}, which has entries in Z by assumption. Then for 1 ≤ k ≤ n, we have

    Σ_{j=1}^n q_jk y_j = Σ_{j=1}^n q_jk Σ_{i=1}^n p_ij x_i = Σ_{i=1}^n (Σ_{j=1}^n p_ij q_jk) x_i = x_k,

so x_k is a linear combination of the y's as required.

(ii) ⟺ (iii): If P is a matrix such that both P and Q = P^{-1} have integer entries, then det(P) det(Q) = det(I_n) = 1. Hence det(P) must be ±1. Conversely, for any invertible matrix we have P^{-1} = (1/det(P)) adj(P) by MA106; clearly adj(P) has integer entries if P does, so if det(P) = ±1, then P^{-1} has integer entries.
Examples: For our example above with n = 2, P was

    ( 2  1 )
    ( 7  4 ),

so det(P) = 8 - 7 = 1. Hence y_1, y_2 is a basis of Z^2.

But, if y_1 = (2, 0), y_2 = (0, 2), then det(P) = 4, so y_1, y_2 is not a basis of Z^2, as we saw above.

As in Linear Algebra, one can check that for v ∈ Z^n, if x (= v^T) and y are the column vectors representing v using bases x_1, . . . , x_n and y_1, . . . , y_n respectively, then we have x = Py, so y = P^{-1}x.
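Here is a short computational sketch (an added illustration; the function name is ad hoc) of the unimodularity test and the change of coordinates y = P^{-1}x for the two examples above.

```python
# Sketch: test det(P) = +-1 and change coordinates between bases of Z^n.
import numpy as np

def is_unimodular(P):
    # For small integer matrices, rounding the floating-point determinant
    # is good enough for this illustration.
    return round(np.linalg.det(P)) in (1, -1)

P_good = np.array([[2, 1], [7, 4]])      # columns y1 = (2, 7), y2 = (1, 4)
P_bad = np.array([[2, 0], [0, 2]])       # columns (2, 0), (0, 2)
print(is_unimodular(P_good), is_unimodular(P_bad))   # True False

# Coordinates of v = (3, 5) in the basis y1, y2: solve x = P y for y.
x = np.array([3, 5])
y = np.linalg.solve(P_good, x)
print(np.round(y).astype(int))            # [  7 -11], i.e. v = 7*y1 - 11*y2
```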
5.6 Subgroups of Z^n and Smith Normal Form

We'll now classify the subgroups of the group Z^n.

Theorem 5.6.1. Let G = Z^n and let H be a subgroup of G. Then H is isomorphic to Z^r for some r ≤ n; and there is a basis b_1, . . . , b_n of G, and positive integers d_1, . . . , d_r such that

• d_1 b_1, . . . , d_r b_r is a basis of H,
• d_i | d_{i+1} for 1 ≤ i < r.

The integers d_1, . . . , d_r are uniquely determined by H.
We'll give a partial proof of this. We'll make the following assumption:

Assumption. Any subgroup of Z^n is finitely generated.

This is not hard to prove by induction on n, but we won't do that here. With this in hand, let H be a subgroup of Z^n. So there is a finite set of vectors which generate H. Let's write these vectors as the columns of an n × m matrix A (for some m). What can we do to make A simpler?

Remember from linear algebra that for matrices over a field, elementary column operations don't change the subspace spanned by the columns of a matrix. We have something similar here, but we need to restrict to unimodular column operations:

(UC1) Replace some column c_i of A by c_i + t c_j, where j ≠ i and t ∈ Z;
(UC2) Interchange two columns of A;
(UC3) Replace some column c_i of A by -c_i.

It's easy to see that none of these change the subgroup of Z^n spanned by the columns of A.

Similarly, we have unimodular row operations (UR1)-(UR3). These correspond to writing the same vectors in terms of their coordinates in a different basis. So if we perform both unimodular row and column operations on A, we get the same subgroup but possibly expressed in a different basis.
We'll assume the following lemma:

Lemma 5.6.2. Let A be an integer matrix and let u be the GCD of the entries of A. Then we may apply a finite sequence of operations (URi) and (UCi) to A to obtain a matrix whose top left-hand entry is u.

The proof of the lemma is in the (non-examinable) appendix below.

Theorem 5.6.3 (Smith Normal Form over Z). Let A be any n × m integer matrix. Then there is a finite sequence of unimodular row and column operations which transform A into a matrix of the form

    D = diag(d_1, . . . , d_r, 0, . . . , 0)

(a diagonal matrix whose first r diagonal entries are d_1, . . . , d_r and whose remaining entries are all zero), where r = rank(A) and the d_i are positive integers such that d_i | d_{i+1} for 1 ≤ i < r. The integers d_i (the elementary divisors of A) are uniquely determined by A.

The matrix D is called the Smith normal form of A. Since the operations (UCi) and (URi) correspond to multiplication on the right and on the left by unimodular matrices, we must have D = UAV for unimodular matrices U and V. Although D is unique, U and V generally aren't.

We won't prove the uniqueness of D here, and our proof of the existence will rely on the non-examinable proof of Lemma 5.6.2.
Proof of Theorem 5.6.3 (assuming Lemma 5.6.2). Induction on min(m, n). If either m or n is zero, or if A is the zero matrix, there is nothing to prove. So let A be any n × m matrix. By Lemma 5.6.2, we may assume that the top left entry u of A is nonzero and divides every other entry. Then we may apply operations (UC1) and (UR1) to replace every other entry of the first row and column with 0. Now A looks like

    ( u  0 )
    ( 0  B )

for some (n-1) × (m-1) matrix B, whose rank is r - 1 and all of whose entries are divisible by u. By the induction hypothesis, we can put B into Smith Normal Form, with elementary divisors e_1, . . . , e_{r-1} say, by applying row and column operations. Since every entry of B is divisible by u, we have u | e_1. Applying the corresponding row and column operations on A, we get a diagonal matrix with entries u, e_1, e_2, . . . , which is the Smith normal form of A.
Example 1. A = ( 42  21 )
               ( 35  14 ).

We stare at this matrix and see that the GCD of its entries is 7, so we want to do elementary operations to get a 7 in the top left corner.

    ( 42  21 )
    ( 35  14 )
        ↓ c_1 ← c_1 - 2c_2
    (  0  21 )
    (  7  14 )
        ↓ r_1 ↔ r_2
    (  7  14 )
    (  0  21 )
        ↓ c_2 ← c_2 - 2c_1
    (  7   0 )
    (  0  21 )

Done: the Smith normal form is diag(7, 21).
Example 2. A = ( 30   42 )
               ( 70  105 ).

This is a hard one, because the GCD of the entries is actually 1, but any proper subset of the entries has GCD > 1. So we just try to reduce the size of the off-diagonal entries and hope for the best:

    ( 30   42 )
    ( 70  105 )
        ↓ r_2 ← r_2 - 2r_1
    ( 30   42 )
    ( 10   21 )
        ↓ c_2 ← c_2 - 2c_1
    ( 30  -18 )
    ( 10    1 )
        ↓ c_1 ↔ c_2, r_1 ↔ r_2
    (   1   10 )
    ( -18   30 )
        ↓ r_2 ← r_2 + 18r_1, c_2 ← c_2 - 10c_1
    ( 1    0 )
    ( 0  210 )

Done: the Smith normal form is diag(1, 210).
Example 3. A = ( -18  -18  -18   90 )
               (  54   12   45   48 )
               (   9   -6    6   63 )
               (  18    6   15   12 ).

        ↓ c_1 ← c_1 - c_3
    (  0  -18  -18   90 )
    (  9   12   45   48 )
    (  3   -6    6   63 )
    (  3    6   15   12 )
        ↓ r_1 ↔ r_4
    (  3    6   15   12 )
    (  9   12   45   48 )
    (  3   -6    6   63 )
    (  0  -18  -18   90 )
        ↓ r_2 ← r_2 - 3r_1, r_3 ← r_3 - r_1
    (  3    6   15   12 )
    (  0   -6    0   12 )
    (  0  -12   -9   51 )
    (  0  -18  -18   90 )
        ↓ c_2 ← c_2 - 2c_1, c_3 ← c_3 - 5c_1, c_4 ← c_4 - 4c_1
    (  3    0    0    0 )
    (  0   -6    0   12 )
    (  0  -12   -9   51 )
    (  0  -18  -18   90 )
        ↓ c_2 ← -c_2, then c_2 ← c_2 + c_3
    (  3    0    0    0 )
    (  0    6    0   12 )
    (  0    3   -9   51 )
    (  0    0  -18   90 )
        ↓ r_2 ↔ r_3
    (  3    0    0    0 )
    (  0    3   -9   51 )
    (  0    6    0   12 )
    (  0    0  -18   90 )
        ↓ r_3 ← r_3 - 2r_2
    (  3    0    0    0 )
    (  0    3   -9   51 )
    (  0    0   18  -90 )
    (  0    0  -18   90 )
        ↓ c_3 ← c_3 + 3c_2, c_4 ← c_4 - 17c_2
    (  3    0    0    0 )
    (  0    3    0    0 )
    (  0    0   18  -90 )
    (  0    0  -18   90 )
        ↓ c_4 ← c_4 + 5c_3, r_4 ← r_4 + r_3
    (  3    0    0    0 )
    (  0    3    0    0 )
    (  0    0   18    0 )
    (  0    0    0    0 )

which is in Smith normal form, with elementary divisors 3, 3, 18.
Appendix: Proof of Lemma 5.6.2

(Not lectured.) This is non-examinable, and for information only.

Step 1: Arrange that a_11 > 0. If A is the zero matrix, there is nothing to prove. So we can assume that there is at least one (i, j) such that a_ij ≠ 0. Then we do a row swap and a column swap to move this nonzero entry to the top left corner. If it's positive, good; otherwise, we use (UR3) to make it positive.

Step 2: Reduce the first column modulo a_11. For each i > 1, we can write a_i1 = q a_11 + r, for some q ∈ Z and 0 ≤ r < a_11. By adding -q times row 1 to row i, the entry a_i1 gets replaced with a_i1 - q a_11 = r. Similarly, we can use elementary column operations to reduce the first row mod a_11.

Step 3: Is the first column zero? Suppose that some entry of the first column below the top one is non-zero. Then swap its row with row 1, and go back to step 2. Since we arranged in step 2 that the entries of the first column are reduced mod a_11, we'll be replacing a_11 with a smaller positive integer, so this step only happens finitely many times.

Step 4: Clear the first row similarly. Now we've arranged that the first column is zero below a_11. Now let's do the same to the first row, using elementary column operations instead of row operations. We can reduce the entries of the first row modulo a_11 without changing the first column at all. If we can make them all zero, good. Otherwise, we can permute columns to make a_11 smaller; this will ruin all of our careful work clearing the first column, but we've made a_11 smaller, so who cares?

Step 5: Does a_11 divide everything in sight? Suppose there's some (i, j) such that a_ij isn't divisible by a_11. Then we can add column j to column 1, to get an entry in position (i, 1) which isn't a multiple of a_11. By a row operation, we can reduce this entry modulo a_11, and we won't get zero. So we swap rows i and 1, and go back to Step 2 with a smaller value of a_11. Again, this messes up our work clearing the first row, but who cares?

We'll eventually end up with a matrix such that a_11 > 0, the rest of the first row and column consist of zeroes, and every entry is divisible by a_11. This proves the lemma.
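The whole procedure is easy to turn into a computer program. The following Python sketch is one possible implementation, written only as an illustration (it is not optimised and not part of the course); it broadly follows the strategy of Lemma 5.6.2 (repeatedly shrink the top-left entry until it divides everything) together with the induction in the proof of Theorem 5.6.3, and it reproduces the hand computations of Examples 1 and 3.

```python
# An illustrative implementation of Smith normal form over Z (a sketch).
def smith_normal_form(A):
    A = [row[:] for row in A]                       # work on a copy
    n, m = len(A), len(A[0]) if A else 0

    def swap_rows(i, j):
        A[i], A[j] = A[j], A[i]

    def swap_cols(i, j):
        for row in A:
            row[i], row[j] = row[j], row[i]

    def add_row(i, j, t):                           # r_i <- r_i + t * r_j
        A[i] = [a + t * b for a, b in zip(A[i], A[j])]

    def add_col(i, j, t):                           # c_i <- c_i + t * c_j
        for row in A:
            row[i] += t * row[j]

    for k in range(min(n, m)):
        while True:
            # Find a nonzero entry of smallest absolute value in the block of
            # rows and columns >= k, and move it to position (k, k).
            block = [(i, j) for i in range(k, n) for j in range(k, m) if A[i][j]]
            if not block:
                break                               # the block is zero: done
            i, j = min(block, key=lambda p: abs(A[p[0]][p[1]]))
            swap_rows(k, i); swap_cols(k, j)
            if A[k][k] < 0:                         # (UR3): make the pivot positive
                A[k] = [-a for a in A[k]]
            # Reduce column k and row k modulo the pivot (Steps 2-4).
            cleared = True
            for i in range(k + 1, n):
                add_row(i, k, -(A[i][k] // A[k][k]))
                cleared = cleared and A[i][k] == 0
            for j in range(k + 1, m):
                add_col(j, k, -(A[k][j] // A[k][k]))
                cleared = cleared and A[k][j] == 0
            if not cleared:
                continue                            # pivot got smaller; repeat
            # Step 5: does the pivot divide every remaining entry?
            bad = next(((i, j) for i in range(k + 1, n) for j in range(k + 1, m)
                        if A[i][j] % A[k][k]), None)
            if bad is None:
                break
            add_col(k, bad[1], 1)                   # drag a bad entry into column k
    return A

print(smith_normal_form([[42, 21], [35, 14]]))      # [[7, 0], [0, 21]]
print(smith_normal_form([[-18, -18, -18, 90],
                         [54, 12, 45, 48],
                         [9, -6, 6, 63],
                         [18, 6, 15, 12]]))         # diagonal entries 3, 3, 18, 0
```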
5.7 The direct sum of groups

In this section, we'll use what we know about free groups to understand all abelian groups which are finitely generated (i.e. have a finite generating set). We'll show that they're "stuck together" from pieces that look like either Z, or Z_n for some n.

Firstly, I'll explain what's meant by "sticking together" groups.

Definition 5.7.1. Let G and H be groups. Then the direct sum G ⊕ H is the set of pairs

    {(g, h) : g ∈ G, h ∈ H}

with the group law given by

    (g_1, h_1) + (g_2, h_2) = (g_1 + g_2, h_1 + h_2).

It's easy to see that G ⊕ H is a group, with the identity element being (0_G, 0_H) and inverses given by -(g, h) = (-g, -h).

Examples.

(i) The group Z_2 ⊕ Z_2, the Klein four-group, is a group of order 4 in which every element has order 1 or 2. (So, in particular, it is not cyclic.) In fact Z_2 ⊕ Z_2 and Z_4 are the only groups of order 4.

(ii) The group Z_2 ⊕ Z_3, perhaps surprisingly, is isomorphic to Z_6. This comes from the fact that the multiples of the element g = (1, 1) are

    0g = (0, 0), 1g = (1, 1), 2g = (0, 2), 3g = (1, 0), 4g = (0, 1), 5g = (1, 2).

So the order of this element is 6; in other words, g generates Z_2 ⊕ Z_3, so the group is cyclic. Similarly, one can check that if m and n are any coprime positive integers, then Z_m ⊕ Z_n is cyclic with generator (1, 1).
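A quick brute-force check of the last claim (an added illustration only): the order of (1, 1) in Z_m ⊕ Z_n is the least common multiple of m and n, so (1, 1) generates the whole group exactly when gcd(m, n) = 1.

```python
# Sketch: the order of (1, 1) in Z_m ⊕ Z_n by repeated addition.
from math import gcd

def order_of_one_one(m, n):
    x, k = (1 % m, 1 % n), 1
    while x != (0, 0):
        x = ((x[0] + 1) % m, (x[1] + 1) % n)
        k += 1
    return k

print(order_of_one_one(2, 3))                           # 6 = 2 * 3: cyclic
print(order_of_one_one(2, 4))                           # 4 < 8: (1, 1) is not a generator
print(order_of_one_one(6, 10) == 6 * 10 // gcd(6, 10))  # True: the order is lcm(m, n)
```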
The theorem we proved last week on the subgroups of Z^n can be used immediately to understand the quotients of Z^n as well:

Proposition 5.7.2. Let G = Z^n and let H be a subgroup of Z^n of rank r, with elementary divisors d_1, . . . , d_r. Then we have

    G/H ≅ Z_{d_1} ⊕ Z_{d_2} ⊕ · · · ⊕ Z_{d_r} ⊕ Z^{n-r}.

Proof. Let b_1, . . . , b_n be a basis of G such that d_1 b_1, . . . , d_r b_r are a basis for H, as in Theorem 5.6.1. We'll write J for the group Z_{d_1} ⊕ Z_{d_2} ⊕ · · · ⊕ Z_{d_r} ⊕ Z^{n-r}.

Let v ∈ G and let v_1, . . . , v_n be the coordinates of v in the basis b_1, . . . , b_n (so (v_1, . . . , v_n)^T = P^{-1} v^T, where P is the change-of-basis matrix). Then we may define a map φ : G → J by mapping v to the element

    (v_1 mod d_1, v_2 mod d_2, . . . , v_r mod d_r, v_{r+1}, . . . , v_n) ∈ J.

I claim this map is a surjective homomorphism with kernel H.

Firstly, we shall show that φ is a group homomorphism. It is clear that the map Z^n → Z^{n,1} sending an element to its coordinates in the basis b_i is a homomorphism. Since reduction modulo d_i is a homomorphism for each i, the composite map φ is also a homomorphism as required.

Now we show φ is surjective. It is clear that b_i gets sent to (0, . . . , 0, 1, 0, . . . , 0) ∈ J, with a 1 in the i-th position. These elements obviously generate J, so since im(φ) is a subgroup, it must be the whole of J.

Finally, we show that the kernel of φ is H. It is clear that for x ∈ G, we have φ(x) = 0 if and only if the coordinates v_1, . . . , v_n of x satisfy d_i | v_i for 1 ≤ i ≤ r and v_i = 0 for i > r; so the kernel of φ is precisely the group generated by d_1 b_1, . . . , d_r b_r, which is H.

By the first isomorphism theorem, we see that there is an isomorphism G/H ≅ J as required.
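Combined with Smith normal form, this gives a completely mechanical recipe: write generators of H as the columns of an integer matrix, compute its Smith normal form, and read off the d_i. The sketch below (an added illustration, reusing the smith_normal_form function from the Section 5.6 sketch) does this for the subgroup of Z^2 generated by (42, 35) and (21, 14).

```python
# Sketch: the decomposition of Z^n / H from generators of H (columns of A),
# reusing the smith_normal_form function from the earlier sketch.
def quotient_decomposition(A):
    n = len(A)
    D = smith_normal_form(A)
    diag = [D[i][i] for i in range(min(n, len(A[0])))]
    ds = [d for d in diag if d != 0]              # elementary divisors d_1 | d_2 | ...
    cyclic = [d for d in ds if d > 1]             # drop trivial Z_1 factors
    return cyclic, n - len(ds)                    # cyclic factors, number of free Z's

# H <= Z^2 generated by (42, 35) and (21, 14):
print(quotient_decomposition([[42, 21], [35, 14]]))   # ([7, 21], 0), i.e. Z_7 ⊕ Z_21
```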
So now we've shown that any quotient of Z^n is isomorphic to a direct sum of some copies of Z and a string of cyclic groups, of which the order of each one divides the next. Notice that some of the d_i might be 1, so we can omit them from the direct sum if we like (since Z_1 is the trivial group, and thus Z_1 ⊕ G ≅ G for any abelian group G).

We've shown that quotients of Z^n are rather special: they can all be written in this nice form. Let's now show, on the other hand, that quotients of Z^n are rather general:

Proposition 5.7.3. Let G be any finitely generated abelian group. Then there is an n ≥ 0 and a subgroup H of Z^n such that G ≅ Z^n / H.

Proof. By assumption, we can find a finite set of elements g_1, . . . , g_n ∈ G which generate G. We take this value of n, and construct a homomorphism φ : Z^n → G by mapping (x_1, . . . , x_n) ∈ Z^n to x_1 g_1 + · · · + x_n g_n ∈ G. This is obviously a homomorphism (since G is abelian), and the statement that φ is surjective is precisely the same statement as saying that the g's generate G. So by the First Isomorphism Theorem, we have G ≅ Z^n / ker(φ).
Putting all of these results together, we get (part of) the main theorem of this section:

Theorem 5.7.4 (Fundamental Theorem of Finitely Generated Abelian Groups, or FTFAG). If G is a finitely generated abelian group, then G is isomorphic to a direct sum of cyclic groups. More precisely, there exist integers r ≥ 0 and s ≥ 0, and d_1, . . . , d_r > 0 with d_1 > 1 and d_i | d_{i+1} for each i, such that

    G ≅ Z_{d_1} ⊕ Z_{d_2} ⊕ · · · ⊕ Z_{d_r} ⊕ Z^s,   (†)

and the integers r, s and d_1, . . . , d_r are uniquely determined.

Proof. By Proposition 5.7.3, G is isomorphic to a quotient of Z^n for some n. By Proposition 5.7.2, G is isomorphic to a group of the form (†); some of the d_i might be 1, but we can just discard those, and this gives the existence part of the theorem.
So we must show that s and d_1, . . . , d_r are uniquely determined by G. The first part isn't too hard. Define the subgroup

    G_tors = {g ∈ G : g has finite order}.

It is clear that this is a subgroup, and if G is isomorphic to Z_{d_1} ⊕ Z_{d_2} ⊕ · · · ⊕ Z_{d_r} ⊕ Z^s, then we have G/G_tors ≅ Z^s. Since the groups Z^s and Z^t are not isomorphic to each other for s ≠ t, this shows that s is uniquely determined.

The harder part is to show that the sequence of integers d_1, . . . , d_r is uniquely determined. We'll do this in a very similar way to the proof of uniqueness of Jordan canonical form. If you rewind your brain back to Chapter 2 for a moment, recall that we showed (Theorem 2.9.1) that for T a linear operator on a complex vector space V, and λ an eigenvalue, we had

    (dimension of index i generalized eigenspace for λ)
        - (dimension of index i-1 generalized eigenspace)
        = (# of Jordan blocks of size ≥ i for λ),   (∗)

from which it followed that the block sizes were uniquely determined.

Here we adopt a very similar argument, but with the following differences:

• eigenvalues λ replaced by prime numbers p dividing |G|;
• block sizes replaced by powers of p dividing the d_i's;
• generalized eigenspaces replaced by the subgroups G[p^n] = {g ∈ G : p^n g = 0} for n ≥ 0;
• dimensions of spaces replaced by orders of (finite) groups.

We'll show the following fact, corresponding to (∗): for each p and each n ≥ 1,

    |G[p^n]| / |G[p^{n-1}]| = p^s, where s = (# of d_i that are divisible by p^n).   (∗∗)

It's easy to see that (∗∗) determines all the d_i uniquely (since we're assuming that the d_i are all > 1, so every d_i is divisible by some prime p).

To prove (∗∗), we note that if G and H are any two abelian groups, we have (G ⊕ H)[p^n] = G[p^n] ⊕ H[p^n], so it suffices to look at each cyclic factor. It's not too hard to see that if p^t is the highest power of p dividing d_i, then

    Z_{d_i}[p^n] ≅ Z_{p^n} if n ≤ t,   and   Z_{d_i}[p^n] ≅ Z_{p^t} if n > t,

and thus

    |Z_{d_i}[p^n]| / |Z_{d_i}[p^{n-1}]| = p if p^n | d_i, and 1 otherwise.

Taking the product over i = 1, . . . , r gives us (∗∗).
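To see the counting relation (∗∗) in action, here is a brute-force check (an added illustration only) for the group G = Z_2 ⊕ Z_4 ⊕ Z_12, whose invariant factors are d = (2, 4, 12), taking p = 2.

```python
# Sketch: verify (∗∗) for G = Z_2 ⊕ Z_4 ⊕ Z_12 and p = 2 by brute force.
from itertools import product

d, p = (2, 4, 12), 2

def size_G_pn(n):
    """Order of G[p^n] = {g in G : p^n * g = 0}."""
    return sum(1 for g in product(*(range(di) for di in d))
               if all((p ** n * gi) % di == 0 for gi, di in zip(g, d)))

for n in (1, 2, 3):
    ratio = size_G_pn(n) // size_G_pn(n - 1)
    s = sum(1 for di in d if di % p ** n == 0)
    print(n, ratio, p ** s)          # the last two numbers agree, as (∗∗) predicts
```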
That's the last examinable theorem of the course. (Phew!)

Let's finish off by thinking about what this means for finite abelian groups. It's clear that G is finite if and only if there are no Z factors in the decomposition coming from FTFAG, so G is a direct sum of cyclic groups:

    G ≅ Z_{d_1} ⊕ Z_{d_2} ⊕ · · · ⊕ Z_{d_r},

where d_i | d_{i+1} for 1 ≤ i < r, and |G| = d_1 d_2 · · · d_r.

From the uniqueness part of Theorem 5.7.4, it follows that different sets of integers satisfying these conditions correspond to non-isomorphic groups. So the isomorphism classes of finite abelian groups of order n > 0 are in one-one correspondence with expressions n = d_1 d_2 · · · d_r for which d_i | d_{i+1} for 1 ≤ i < r. This gives us a classification of isomorphism classes of finite abelian groups.
Examples:

1. n = 4. The decompositions are 4 and 2 × 2, so G ≅ Z_4 or Z_2 ⊕ Z_2.

2. n = 15. The only decomposition is 15, so G ≅ Z_15 is necessarily cyclic.

3. n = 36. Decompositions are 36, 2 × 18, 3 × 12 and 6 × 6, so G ≅ Z_36, Z_2 ⊕ Z_18, Z_3 ⊕ Z_12 and Z_6 ⊕ Z_6.
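The classification can be enumerated mechanically: the sketch below (an added illustration) lists all chains d_1 | d_2 | · · · | d_r with each d_i > 1 and product n, reproducing the three examples above.

```python
# Sketch: enumerate the expressions n = d_1 d_2 ... d_r with d_1 > 1 and
# d_i | d_(i+1), i.e. the isomorphism classes of abelian groups of order n.
def chains(n, min_d=2):
    if n == 1:
        yield ()
    for d in range(min_d, n + 1):
        if n % d == 0:
            for rest in chains(n // d, d):
                if all(x % d == 0 for x in rest):   # later factors are multiples of d
                    yield (d,) + rest

for n in (4, 15, 36):
    print(n, list(chains(n)))
# 4  [(2, 2), (4,)]
# 15 [(15,)]
# 36 [(2, 18), (3, 12), (6, 6), (36,)]
```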
5.8 A preview of abstract algebra: Modules over PIDs

In this last, non-examinable lecture, I'll tell you why two of the main theorems of this course were really different versions of the same theorem.

Definition 5.8.1. A ring is a set R with two binary operations + and ·, such that

• (R, +) is an abelian group,
• (R, ·) satisfies group axioms G1 (closure), G2 (associativity), and G3 (identity),
• (r + s) · t = r · t + s · t and r · (s + t) = r · s + r · t (distributive law).

If r · s = s · r for all r, s ∈ R, then we say R is a commutative ring.

Note that R needn't be a group under multiplication, and the additive and multiplicative identities 0_R and 1_R won't necessarily be the same. (In fact the only ring which is a group under multiplication is the zero ring, and this is also the only ring with 0_R = 1_R.)

Any field is an example of a commutative ring; but there are many more, e.g. Z is a commutative ring, as is Z_n for any integer n, and so is K[X] for any field K. An example of a ring that's not commutative is the ring K^{2,2} of 2 × 2 matrices over a field K.

If R is a commutative ring, an R-module is an abelian group M with a binary operation R × M → M (written as r · m, or just rm) such that

• rs · m = r · (s · m),
• (r + s) · m = r · m + s · m,
• r · (m + n) = r · m + r · n.

If R is a field, then an R-module is the same thing as a vector space over R; but the definition makes sense for any commutative ring, and we have two very important examples:

• Any abelian group is a Z-module (in a unique way), with the module structure given as in Section 5.1.3.

• If V is a vector space over K, and T : V → V is a linear operator, then V becomes a K[X]-module via the rule f · v = f(T)(v) for f ∈ K[X] and v ∈ V.
Now, Z and K[X] are both rather nice rings. Firstly, their multiplication is well-behaved: if R is either of these rings, and r, s ∈ R satisfy rs = 0, then r = 0 or s = 0. (Notice that this isn't automatic from the axioms: the integers mod 4 are a ring, but 2 · 2 = 0 in this ring.) Rings with this nice property are called integral domains.

Finally, they satisfy the following nice property: any set (not necessarily finite) of elements of R has a greatest common divisor; that is, if X is any subset of R, then there is an element x ∈ R such that

• each element of X is of the form ux for some u ∈ R,
• x can be written as u_1 x_1 + · · · + u_n x_n for some elements x_1, . . . , x_n of X and u_1, . . . , u_n of R.

It would be reasonable to call rings like this "GCD rings", but they're conventionally known as principal ideal rings, for a reason you'll see in Algebra II next term. A principal ideal ring which is also an integral domain is called a principal ideal domain, or just a PID.
Theorem 5.8.2. Let R be a principal ideal domain, and M a finitely-generated R-module. Then there is an integer s ≥ 0 and elements d_1, . . . , d_r of R, uniquely determined up to units in R and satisfying d_i | d_{i+1} for each i, such that

    M ≅ R^s ⊕ R/d_1R ⊕ R/d_2R ⊕ · · · ⊕ R/d_rR.
The proof is almost exactly the same as that of Smith normal form, with integer matrices replaced by matrices over R. Now let's see what this gives us for our two examples:

If R = Z, then a finitely-generated abelian group is certainly a finitely-generated R-module, and hence this gives us the FTFAG.

If R = K[X] and M is a vector space with a linear operator on it, then this theorem will give us (after a bit of massaging) the JCF theorem! Out of the box, the theorem tells us that V is a direct sum of sub-R-modules, each of which looks like R/f_i(X)R as a K[X]-module. But if K = C, then each f_i will look like a product (X - λ_1)^{a_1} · · · (X - λ_t)^{a_t}, and it's easy to see that

    R / f_iR ≅ R / (X - λ_1)^{a_1}R ⊕ · · · ⊕ R / (X - λ_t)^{a_t}R.

So V has a decomposition as a direct sum of T-stable subspaces, with each subspace isomorphic as an R-module to R / (X - λ)^k R for some k and λ. But the image of the k vectors (X - λ)^{k-1}, (X - λ)^{k-2}, . . . , (X - λ), 1 in R / (X - λ)^k R is then a Jordan chain forming a basis of the subspace! So without too much work, we've deduced both Theorem 5.7.4 and Theorem 2.7.2 from the same piece of abstract algebraic machinery.
6 Corrections

This is a list of the changes that have been made between the version of these notes originally uploaded to the web, and the current version.

• Proof of Theorem 5.4.3 on page 70: changed "g - h ∈ H" to "g - h ∈ K".