
LINEAR ALGEBRA (ALGEBRA 1)

CMI COURSE NOTES 2021

T.R. RAMADAS

1. Introduction

1.1. Preliminaries. You are supposed to know about the set Q of rational
numbers, the set R of real numbers and the set C of complex numbers. These
are all fields (the only ones we will be concerned with for the most part).
Note the inclusions
Q⊂R⊂C

I write A ⊂ B to signify that A is a subset of B. If A is a proper subset and I want to emphasise this, I write A ⊊ B. I never write A ⊆ B.
Given sets A and B, we mean by A ∖ B the set of elements of A that are
not in B. If A ⊂ C and B ⊂ C, then
A ∖ B = A ∩ Bc
where B c is the complement of B in C.
We use “function” and “map” interchangeably. Given sets X and Y , we
often write “consider a function X → Y, x ↦ y = f (x)” meaning thereby
“let f ∶ X → Y be a function, let x be a “typical” element of X, and let
y = f (x).” The symbol ↦ is read as “maps to” or “goes to” (as opposed to
the symbol “→”, which is read as “to”.)
I use:
● “injective” instead of one-one,
● “surjective” instead of “onto”, and
● “bijective” instead of “one-one and onto”.
Given A ⊂ B the obvious map ι ∶ A → B:
ι(x) = x, x ∈ A
where on the right x ∈ A is regarded as an element of B, is clearly injective,
and is often called an inclusion.

Date: September 30, 2021.



Given maps f ∶ X → Y and g ∶ Y → Z, the function g ○ f ∶ X → Z is defined (by “composition of f and g”) as follows:
g ○ f (x) = g(f (x)), x ∈ X

To reduce notational overload, I often omit qualifiers such as “λ ∈ R, v ∈ V ” if the set memberships are clear and no confusion is likely.
A statement in boldface is a mathematical fact you are supposed to know
how to prove; it is likely to crop up in a quiz, for example.

1.2. Design of the course. The main reference is Artin, M.: Algebra1 , Prentice-Hall (1991). We wish to cover as much of Chapters 1, 3, and 4 as possible. Also recommended, for a more abstract approach, is the old classic Halmos, P.: Finite Dimensional Vector Spaces. For a very interesting account of the subject, see the book (if you can locate a copy) by Abel Prize laureate Peter Lax: Linear Algebra, or its successor Linear Algebra and Its Applications.
Matrices are at the heart of linear algebra, and the subtlest questions, from
the point of view of research and applications, have to do with them. They
are relatively concrete objects, already familiar from school/college, and
facility with their manipulation is easy to acquire and an important skill. It
is not surprising that many introductions to linear algebra, including in the
rather theoretical book by Artin, begin with matrices.
Other books, for example Halmos’s, put off dealing with matrices and take a more abstract approach. This has the following drawbacks:
(1) Much of the material is routine, example-free and boring, but some
crucial concepts – linear dependence, bases and dimension – have
subtle definitions and the proofs call for cleverness, a resource to be
used frugally.
(2) One does not acquire matrix skills.
I have chosen to take the following route.
(1) We begin with an introduction to vector spaces, and make a number of
definitions and prove a number of basic results which are entirely routine.
This will give you time to get used to abstract definitions, notations and arguments in a relatively easy context.
(2) In a separate section we will introduce the notions of linear indepen-
dence, dimension and basis and prove the basic results of the theory. At a
crucial point we will quote a result that will be proved later, using matrices.
In a separate subsection, we will outline a proof that avoids matrices.
1Warning: You may have a different edition.

(3) We move on to the concrete vector spaces Rn and maps between them
which are represented by matrices. We develop the basic theory as in Artin,
concluding with the definition and basic properties of determinants.
From this point on we deal with finite dimensional vector spaces and maps
between them, freely moving back and forth between abstract (=“basis-free”)
and matrix arguments.
(4) Given a linear map from a finite-dimensional vector space to itself, we
will define the notions of determinant, eigenvalue, eigenvector, (generalised)
eigenspace, and (generalised) eigenspace decomposition. We define the char-
acteristic and minimal polynomials.
(5) We define real/complex inner product spaces and symmetric/hermitian linear maps. We prove the spectral theorem in both “operator” and “matrix” versions.
(6) Rotations/Reflections in three dimensions
(7) Quotients, duals, tensors.

1.3. Please note the following conventions. Until further notice, our
definitions, arguments, and theorems will refer to the real numbers. In fact
everything will continue to hold if we replace the real numbers with the ratio-
nals or with the complex numbers. Some authors deal with such a situation
by working with a field k which is declared to be any one of the above. I
choose to take the more informal route.
In the context of real vector spaces, we will use “scalar” and “real number” interchangeably. In general, if we replace R by an arbitrary field k (in particular by Q or C), then by “scalar” we will mean an element of k. Similarly, when it is important to specify the field, we will talk of “a k-vector space V ” or a vector space “over k”.

2. Vector Spaces and Linear Maps

2.1. Definition of a vector space.


Definition 2.1. A real vector space is a set V , whose elements are called
vectors, together with maps
V ×V →V
(v, w) ↦ v + w “addition of vectors”

R×V →V
(λ, v) ↦ λ v “scalar multiplication”
such that
(1) Addition of vectors is associative and commutative. There is a
unique vector (“the zero vector”) 0V (or simply 0 when confusion
is unlikely) such that v + 0V = 0V + v = v for every vector v. Ev-
ery vector v has its unique “additive inverse”, denoted −v such that
v + (−v) = (−v) + v = 0V .
(2) Obvious identities hold, which relate to the interplay between addi-
tion and scalar multiplication:
λ (v + w) = λ v + λ w
(λ + µ) v = λ v + µ v          (A)
1 v = v
λ (µ v) = (λµ) v               (B)

Remarks: Note the following.


(1) Part (1) of the above definition amounts to saying that V is an abelian
group – the group law is addition of vectors, the identity element is 0V , and
the inverse of v is −v.
(2) In part (2), in equation (A) we are adding scalars on the left and vectors on the right. In (B) we have
– on the left, two multiplications of vectors by scalars, and
– on the right, one multiplication of two scalars followed by one multiplication of a vector by a real number.
(3) Notational convention: I try to avoid giving names to functions unless
necessary. For example addition of vectors is a map V × V → V . We do not
need a name for this function, but will need a notation for the sum of two
vectors, so I introduce this implicitly above, writing
(v, w) ↦ v + w

With this notation associativity becomes u + (v + w) = (u + v) + w, and commutativity v + w = w + v.
(4) Notational convention: Eventually we will simply write λv when we
mean λ v and u − v when we mean u + (−v).
(5) Suppose that there exists a (not necessarily unique) zero vector 0V
and every vector u has a (not necessarily unique) inverse −u. Then the
Cancellation Lemma holds: if u, v, w are vectors such that u + v = u + w, then
v = w. Proof: u+v = u+w Ô⇒ −u+(u+v) = −u+(u+w) Ô⇒ (−u+u)+v =
(−u + u) + w Ô⇒ 0V + v = 0V + w Ô⇒ v = w.
So we can drop the word unique (both occurrences) in the definition of vector
space above.
(6) Because addition is commutative, I could have just written:
– “...such that v + 0V = v for every vector v.”
– “...“additive inverse”, denoted −v such that v + (−v) = 0V .”
(7) Notwithstanding Remarks (5) and (6), I recommend you just remember
Definition 2.1 as it is.
(8) The following follow from the Cancellation Lemma:
Exercises: 0 v = 0V , λ 0V = 0V , (−1) v = −v.

3. More definitions: linear map, subspace, kernel, image, sum, direct sum

Suppose V, Ṽ are vector spaces.


Definition 3.1. A linear map T ∶ V → Ṽ is a map compatible with addition
and scalar multiplication. In other words,
T (v + w) = T (v) + T (w) and T (λ v) = λ T (v)

It follows that T (0V ) = 0Ṽ and T (−v) = −T (v). A map T ∶ V → Ṽ is an isomorphism if T is a bijection, and both T and its inverse are linear maps. (See Exercise 1(b).)
Definition 3.2. A subspace of a vector space is a nonempty subset closed
under addition and scalar multiplication.

Exercises:
(1) (a) The composition of two linear maps is linear. (b) The inverse of a
bijective linear map is linear, so a bijective linear map is automatically an
isomorphism.
(2) Suppose given a vector space V and a subspace W (i.e., a nonempty
subset closed under addition and scalar multiplication). Then 0V ∈ W and
for every w ∈ W , −w ∈ W . Thus W is a vector space and the inclusion
W ↪ V is a linear, injective map.
(3) Given a linear map T ∶ V → Ṽ , its kernel is the subset of V defined as:
ker(T ) ≡ {v ∈ V ∣T (v) = 0Ṽ }
Prove (a) ker(T ) is a subspace (b) T is injective iff ker(T ) = {0V } (i.e., the
kernel is the singleton set containing (only) the zero vector of V .)
(4) The image of a linear map T ∶ V → Ṽ ,
T (V ) ≡ image(T ) ≡ {ṽ ∈ Ṽ ∣∃v ∈ V such that ṽ = T (v)} = {T (v)∣v ∈ V }
is a subspace of Ṽ . If ker(T ) = {0V }, we have an isomorphism V →
image(T ).
In terms that may be more familiar to you, V is the domain of T , Ṽ is the
codomain, and T (V ) = image(T ) is the range of T .
Definition 3.3. Given a vector space V and two subspaces W1 and W2 , the
sum W1 + W2 is the subset {w1 + w2 ∣w1 ∈ W1 , w2 ∈ W2 }.

Exercise: (a) Given any family {Vα } of subspaces of a vector space V the
intersection ⋂ Vα is a subspace. (b) The sum W1 + W2 is a subspace. In fact,
it is the intersection
W1 + W2 = ⋂_{Wi ⊂ V ′ ⊂ V} V ′

(The notation on the right refers to the intersection of all subspaces V ′ ⊂ V which contain both W1 and W2 .)
Definition 3.4. A vector space V is the direct sum of two subspaces W1 and W2 if V = W1 + W2 and W1 ∩ W2 = {0V }, in which case we write
V = W1 ⊕ W2

Exercise: (Direct sums)


(1) Prove that a vector space V is the direct sum of two subspaces W1 and
W2 iff every vector v can be written uniquely as a sum
v = w1 + w2
with wi ∈ Wi , i = 1, 2.
(2) Let W̃1 , W̃2 be two vector spaces2. Define addition and scalar multipli-
cation of elements of the Cartesian product W̃1 × W̃2 as follows:
(w̃1 , w̃2 ) + (w̃1′ , w̃2′ ) ≡ (w̃1 + w̃1′ , w̃2 + w̃2′ )
λ (w̃1 , w̃2 ) ≡ (λ w̃1 , λ w̃2 )

Show that this makes W̃1 ×W̃2 into a vector space, with zero vector (0W̃1 , 0W̃2 )
and −(w̃1 , w̃2 ) = (−w̃1 , −w̃2 ).
(3) Suppose V = W1 ⊕W2 . Consider the cartesian product W1 ×W2 , endowed
with the structure of a vector space as in (2). Prove that the map W1 ×W2 →
V given by
(w1 , w2 ) ↦ w1 + w2
is a bijective linear map and therefore an isomorphism.
(4) Returning to (2), let ι1 ∶ W̃1 → W̃1 × W̃2 and ι2 ∶ W̃2 → W̃1 × W̃2 denote
the injective linear maps defined below:
ι1 (w̃1 ) = (w̃1 , 0W̃2 )
ι2 (w̃2 ) = (0W̃1 , w̃2 )

Let W1 , W2 denote the respective images. Show that W̃1 × W̃2 = W1 ⊕ W2 .


We will usually identify W̃i with Wi (via the maps ιi ) and simply write
W̃1 × W̃2 = W̃1 ⊕ W̃2 .
Finally we will see how a linear map can be understood better in terms of an adapted direct sum decomposition.
(5) Let V be a vector space and P ∶ V → V a linear map such that
P² ≡ P ○ P = P
2Note that these are not presented as subspaces.

Prove that V = ker(P ) ⊕ image(P ). Verify that the direct sum decomposi-
tion gives a simple description of P :
P (w1 + w2 ) = w2
if w1 ∈ ker(P ) and w2 ∈ image(P ).
(6) Let V be a vector space and I ∶ V → V a linear map such that
I² ≡ I ○ I = IV
where IV ∶ V → V is the identity map. (That is, IV (v) = v ∀ v ∈ V .) By
considering the map
P = (1/2) (IV − I)
prove that V = V1 ⊕ V−1 , where
V±1 = {v ∈ V ∣ Iv = ±v}.
Remark 3.5. The point of Exercises (1)-(4) is that if V is the direct sum
of two subspaces W1 and W2 , one can identify it with the Cartesian product
W1 × W2 . In this context V is sometimes called the “internal” direct sum of
the subspaces W1 and W2 . Conversely, if we start with two vector spaces W̃1
and W̃2 their Cartesian product is the direct sum of two subspaces which
can be identified with the two “factors” of the Cartesian product. In this
context W̃1 × W̃2 is sometimes called the “external” direct sum of W̃1 and
W̃2 .
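To build some intuition for Exercises (5) and (6) above, here is a small numerical sketch in Python (the matrices are my own choices for illustration, not part of the exercises): an idempotent map P acts as the identity on its image and as zero on its kernel, and an involution I yields an idempotent (1/2)(IV − I).

    import numpy as np

    # A projection on R^2: P o P = P. Its kernel and image give a direct sum decomposition.
    P = np.array([[1.0, 1.0],
                  [0.0, 0.0]])
    assert np.allclose(P @ P, P)

    w1 = np.array([1.0, -1.0])                 # lies in ker(P): P w1 = 0
    w2 = np.array([1.0, 0.0])                  # lies in image(P): P w2 = w2
    assert np.allclose(P @ w1, 0) and np.allclose(P @ w2, w2)
    assert np.allclose(P @ (w1 + w2), w2)      # P(w1 + w2) = w2, as in Exercise (5)

    # An involution on R^2: I o I = identity. Then Q = (1/2)(identity - I) is idempotent.
    I = np.array([[0.0, 1.0],
                  [1.0, 0.0]])                 # swaps the two coordinates
    Q = 0.5 * (np.eye(2) - I)
    assert np.allclose(Q @ Q, Q)               # the key observation behind Exercise (6)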

4. Examples of vector spaces and maps

(1) A singleton set, denoted R0 . The sole element, denoted 0, is the zero
vector. Addition, scalar multiplication, and additive inverse all defined in
the obvious way: 0 + 0 = 0, λ 0 = 0, −0 = 0.
(2) The set R1 . This is just the set of real numbers R itself, regarded as
a real vector space. Addition and scalar multiplication are defined in the
obvious way. The zero vector is 0 and the additive inverse of λ is −λ.
(3) For n ≥ 2, define Rn to be the set of ordered n-tuples (x1 , . . . , xn ) of real
numbers. Addition and scalar multiplication are defined coordinate-wise:
(x1 , . . . , xn ) + (x′1 , . . . , x′n ) = (x1 + x′1 , . . . , xn + x′n )
λ (x1 , . . . , xn ) = (λx1 , . . . , λxn )
Clearly (0, . . . , 0) is the zero vector and −(x1 , . . . , xn ) = (−x1 , . . . , −xn ).
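As a concrete aside, here is a sketch of example (3) in Python (using the numpy library; the particular vectors are chosen arbitrarily): the operations on Rn are exactly coordinate-wise addition and scalar multiplication.

    import numpy as np

    x = np.array([1.0, -2.0, 0.5])             # a vector in R^3
    y = np.array([3.0, 0.0, 4.0])

    print(x + y)                               # coordinate-wise addition: [ 4.  -2.   4.5]
    print(2.0 * x)                             # scalar multiplication:    [ 2.  -4.   1. ]

    zero = np.zeros(3)                         # the zero vector
    assert np.allclose(x + zero, x)
    assert np.allclose(x + (-x), zero)         # additive inverse
    assert np.allclose((2.0 + 3.0) * x, 2.0 * x + 3.0 * x)   # one of the identities (A)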
(4) Let S be an arbitrary set. Then the set of functions S → R is a vector
space. (This set is often denoted RS , but we will use this notation sparingly.)
Addition and scalar multiplication are defined as follows (here f and g are
functions S → R, and x ∈ S):
(f + g)(x) = f (x) + g(x)
λ f (x) = λf (x)
What is the zero vector? How is −f defined?
(5) Let S = {1, 2, . . . , n}. There is an obvious isomorphism RS → Rn :
f ↦ (f (1), . . . , f (n))
More generally, let S be a finite set with n elements, and suppose a bijection {1, . . . , n} → S is chosen, yielding an ordering (x1 , . . . , xn ) of the elements of S. Then again we get an isomorphism RS → Rn :
f ↦ (f (x1 ), . . . , f (xn ))

(6) If V, W are vector spaces, let L(V, W ) denote the space of linear maps
from V to W . (In a more algebraic context this would be denoted Hom(A, B).)
This is itself a vector space, with operations defined as follows:
(S + T )(v) = S(v) + T (v) (sum of linear maps S, T )
(λ T )(v) = λ T (v) (multiplying a linear map T by a scalar λ)
(−T )(v) = −T (v)
The zero linear map is the one that sends all vectors in V to 0W .

Exercises:
(7) Let a < b be real numbers. Then the set C 0 [(a, b)] of continuous real-
valued functions from (a, b) is a subspace of the vector space of all real-
valued functions from (a, b). The set C 1 [(a, b)] of continuously differen-
tiable3 real-valued functions from (a, b) is a subspace of C 0 [(a, b)]. The
map f ↦ f ′ is a linear map C 1 [(a, b)] → C 0 [(a, b)]. What is its kernel?
What is the image?
(8) Is there a linear map I ∶ C 0 [(a, b)] → C 1 [(a, b)] such that if I(g) = f ,
then f ′ = g?
(9) Exhibit a surjective linear map C 0 [(a, b)] → Rn , for n any natural num-
ber.
(10) Consider the map Sum ∶ R2 → R defined as follows:
Sum((x1 , x2 )) = x1 + x2
Check that this is a linear surjective map. Determine the kernel, and draw
a sketch as it would appear on a sheet of graph paper. Determine all linear
maps4 RInv ∶ R → R2 such that
Sum(RInv(t)) = t (S)
Hint: a linear map from R to any vector space is characterised by what
it does to 1. Let RInv(1) = (a, b), and characterise the choices of vector
(a, b) ∈ R2 that are compatible with the equation (S) above.
For each possible RInv (i.e., admissible choice of (a, b)) sketch the image image(RInv). Do you see a pattern? (A numerical sketch follows Exercise (11) below.)
(11) Let l, m be natural numbers. Exhibit a linear isomorphism Rl × Rm →
Rl+m .
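Here is the numerical sketch promised in Exercise (10) (Python; the helper name make_rinv is mine, not standard): Sum is represented by the 1 × 2 matrix [1 1], and RInv(1) = (a, b) gives a right inverse exactly when a + b = 1.

    import numpy as np

    Sum = np.array([[1.0, 1.0]])               # the map R^2 -> R, (x1, x2) |-> x1 + x2

    def make_rinv(a):
        # The linear map R -> R^2 determined by RInv(1) = (a, 1 - a).
        return np.array([[a], [1.0 - a]])      # a 2 x 1 matrix

    for a in [0.0, 0.5, 2.0]:
        RInv = make_rinv(a)
        assert np.allclose(Sum @ RInv, np.eye(1))   # Sum(RInv(t)) = t, i.e. equation (S)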

3f is continuously differentiable if it is differentiable and the derivative f ′ is continuous.


4RInv is short for right inverse, in case you were wondering.

5. Linear independence, bases and dimension

Warning regarding notation: From now on we will usually drop the “·” signifying scalar multiplication; λv will mean λ · v.

5.1. Linear dependence and bases. Let V be a vector space, and let S
be a nonempty subset of vectors.
Definition 5.1. The span of the set S is the set (denoted < S >) of vectors
v that can be written as a sum5
v = λ1 v1 + ⋅ ⋅ ⋅ + λm vm
where λi are scalars and vi ∈ S. (This representation is not necessarily
unique; in particular, m could depend on v.)

The span < S > is a subspace and the intersection of all subspaces that
contain S.
Definition 5.2. We will say that S generates6 V if V = < S >, i.e., if every vector v can be written as a sum
v = λ1 v1 + ⋅ ⋅ ⋅ + λm vm
where λi are scalars and the vi belong to the set S.
Definition 5.3. Let V be a vector space, and let S be a nonempty subset of
nonzero vectors. We will say that S is a linearly independent set of vectors
if the following holds: For any m ≥ 1 and m distinct vectors v1 , . . . , vm in S,
λ1 v1 + ⋅ ⋅ ⋅ + λm vm
is a nonzero vector unless all the λi vanish.

The usual definition of a linearly independent set does not assume that
the vectors are each nonzero, but this is an immediate consequence of the
definition.
Note that any singleton set {v1 }, with v1 a nonzero vector, is linearly in-
dependent. Clearly any nonempty subset of a linearly independent set is
linearly independent. Sometimes the empty set of vectors is declared to be
linearly independent.
Exercise: Let V be a vector space, and let S be a nonempty finite subset
of nonzero vectors. Let φ ∶ RS → V be the linear map
φ(f ) = ∑v∈S f (v) v
Then (a) S generates V iff φ is surjective, and (b) S is linearly independent
iff φ is injective.
5We say that “v is a linear combination of the vectors v1 , . . . , vm ”.
6In fact, in the vector space context, it is more usual to say S “spans” V .
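As an aside, for a finite set S of vectors in Rn the two conditions in the exercise above can be tested numerically: the map φ is represented by the matrix whose columns are the elements of S, and injectivity/surjectivity are detected by the rank of that matrix. A sketch in Python (the vectors are an arbitrary illustration):

    import numpy as np

    v1 = np.array([1.0, 0.0, 0.0])
    v2 = np.array([0.0, 1.0, 0.0])
    v3 = np.array([1.0, 1.0, 0.0])             # v3 = v1 + v2, so S is linearly dependent

    M = np.column_stack([v1, v2, v3])          # the matrix representing phi: R^S -> R^3
    r = np.linalg.matrix_rank(M)

    print("S is linearly independent:", r == M.shape[1])   # False: phi is not injective
    print("S generates R^3:", r == M.shape[0])             # False: phi is not surjective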

We start with the key


Lemma 5.4. Let S be a linearly independent set and v0 ∉< S >. Then the
set S ∪ {v0 } is linearly independent.

Proof. Clearly v0 ≠ 0V and v0 ∉ S. Let v1 , . . . , vm ∈ S be distinct vectors, and let λ0 , λ1 , . . . , λm be scalars such that
λ0 v0 + λ1 v1 + ⋅ ⋅ ⋅ + λm vm = 0
If λ0 ≠ 0, then v0 = −(λ1 /λ0 ) v1 − ⋅ ⋅ ⋅ − (λm /λ0 ) vm ∈ < S >, which contradicts the hypothesis of the Lemma. If λ0 = 0 the other λi must vanish by the linear independence of S. □
Theorem 5.5. Let V be a vector space, and let S be a nonempty finite
subset of nonzero vectors that generates V . Then V is generated by a linearly
independent subset.

We can do better; to recover the above result from the next one start with
S̃ = {v1 }, where v1 is any one of the elements of S.
Theorem 5.6. Let V be a vector space, and let S be a nonempty finite subset
of nonzero vectors that generates V . Let S̃ ⊂ S be a linearly independent
subset. Then V is generated by a linearly independent subset of S that
contains S̃.

Proof. If all the vectors in S ∖ S̃ belong to < S̃ > then clearly < S̃ >=< S >= V
and we are done.
Else, there exists a vector ṽ1 ∈ S∖ < S̃ >. In this case, S̃1 ≡ S̃ ∪ {ṽ1 } is a
linearly independent set. To see this, apply the Lemma 5.4. If < S̃1 >= V we
are done, if not we iterate the process with a ṽ2 ∈ S∖ < S̃1 >, and so on.
When this process terminates, which it must since S is a finite set, we obtain
a linearly independent subset of S that generates V and contains S̃. 
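The proof of Theorem 5.6 is effectively an algorithm. Here is a sketch of it in Python (the function name and the example vectors are mine; the numpy rank is used as the test for membership in a span):

    import numpy as np

    def extract_basis(S_tilde, S):
        # Greedy version of the proof: start from the independent set S_tilde and
        # adjoin those vectors of S that lie outside the current span (Lemma 5.4).
        chosen = list(S_tilde)
        for v in S:
            if not chosen:
                if np.linalg.norm(v) > 0:      # a single nonzero vector is independent
                    chosen.append(v)
                continue
            A = np.column_stack(chosen)
            if np.linalg.matrix_rank(np.column_stack([A, v])) > np.linalg.matrix_rank(A):
                chosen.append(v)               # v is not in < chosen >, so adjoin it
        return chosen

    S = [np.array([1.0, 1.0, 0.0]),
         np.array([2.0, 2.0, 0.0]),            # a multiple of the first vector: skipped
         np.array([0.0, 1.0, 1.0])]
    print(len(extract_basis([], S)))           # 2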
Definition 5.7. Let V be a vector space. A set B of vectors is called a basis
of V if B is linearly independent and generates V .

5.2. Dimension. This theory can be developed in two ways. One way (advocated, for example, in Axler, S.: Linear Algebra Done Right) is developed in §5.3. I prefer, as Artin does, to invoke a basic result from matrix theory, which will be proved in the next section. This is a fact that you are probably familiar with:
Given n2 linear homogeneous equations in n1 unknowns, with n1 > n2 (i.e.,
if “there are more unknowns than equations”), there is a nontrivial solution.
More formally:
Theorem 5.8. Suppose n1 > n2 . Then any linear map T ∶ Rn1 → Rn2 has a
nonzero kernel.

This will be proved in §6.
Let us provisionally7 call a vector space V finitely generated if it is (a) a singleton set, i.e., V = {0V }, or (b) if there is a finite set of nonzero vectors that generates V . In the latter case, we have seen that
(a) V has a basis; in fact any finite set of nonzero vectors that generates V
has a subset that is a basis, and
(b) given any finite linearly independent set, it can be extended to a basis.
The most important property of linearly independent sets in a finitely gen-
erated vector space is the following
Theorem 5.9. If V is finitely generated and {0V } ⊊ V , any linearly inde-
pendent subset has at most as many elements as any basis.

Proof. Suppose B is a basis and S ′ a linearly independent set with cardinality strictly bigger than B; choose a finite S ⊂ S ′ with n1 ≡ ∣S∣ > ∣B∣ ≡ n2 . Choosing orderings of S and B, we get a linear map Rn1 → Rn2 . By Theorem 5.8 this must have a nonzero kernel, which contradicts the linear independence of S.
Here is an expanded version of the above proof. Let B = (e⃗1 , . . . , e⃗n2 ) be a basis, and S = (v⃗1 , . . . , v⃗n1 ) be a linearly independent set8. Using S we have a linear map φS ∶ Rn1 → V :
φS (x1 , . . . , xn1 ) = x1 v⃗1 + ⋅ ⋅ ⋅ + xn1 v⃗n1
If you unpack the definition of linear independence you see this map is injective. Similarly we have a map φB ∶ Rn2 → V :
φB (y1 , . . . , yn2 ) = y1 e⃗1 + ⋅ ⋅ ⋅ + yn2 e⃗n2
7This is not standard terminology, although it agrees with usage in the context of rings
and modules.
8We have enumerated the elements of B and S, which imposes an ordering of the two
sets. This is the reason for the (round brackets). See §5.4 below.

By the definition of basis, this map is bijective. So the composite map φB−1 ○ φS ∶ Rn1 → Rn2 is injective. This is impossible if n1 > n2 , appealing to Theorem 5.8. □
Corollary 5.10. Any two bases have the same number of elements.
Definition 5.11. A vector space V is said to be finite-dimensional if it is finitely generated, in which case the cardinality of any basis is called its dimension and is denoted dim V . If V = {0V }, then dim V = 0.

Here are some important consequences of Theorem 5.9.


Theorem 5.12. Let V be a finite-dimensional vector space. Then
(1) Any linearly independent subset with dim V elements is a basis.
(2) Let W be a subspace. Then dim W ≤ dim V , and equality holds iff
W =V.
(3) Let T ∶ V → V̂ be a surjective linear map with V finite-dimensional.
Then V̂ is finite-dimensional and dim V ≥ dim V̂ , with equality iff T is an
isomorphism.
(4) Let T ∶ V → V̂ be a linear map with V finite-dimensional. Then
dim V = dim ker(T ) + dim image(T )

Proof. We leave the proofs of (1) and (2) as exercises. As for (3), note that if S is any generating set of V , its image under T ,
T (S) ≡ {T (v) ∣ v ∈ S}
generates V̂ . (This follows from the surjectivity of T .) So V̂ is finitely generated. Let {e⃗′1 , . . . , e⃗′m } be a basis of V̂ . Choose for each e⃗′i a vector v⃗i ∈ V such that T (v⃗i ) = e⃗′i . (For later use, we call this construction “lifting a basis to a linearly independent set”.) I claim that the set {v⃗1 , . . . , v⃗m } is linearly independent. For,
λ1 v⃗1 + ⋅ ⋅ ⋅ + λm v⃗m = 0 Ô⇒ T (λ1 v⃗1 + ⋅ ⋅ ⋅ + λm v⃗m ) = 0 Ô⇒ λ1 e⃗′1 + ⋅ ⋅ ⋅ + λm e⃗′m = 0 Ô⇒ λi = 0 ∀i
This shows that dim V ≥ m = dim V̂ . In case this is an equality, the set {v⃗1 , . . . , v⃗m } is a basis of V , and T is clearly an isomorphism.
As for (4), choose a basis {v⃗1 , . . . , v⃗l } for ker(T ), and extend it to a basis {v⃗1 , . . . , v⃗l , v⃗l+1 , . . . , v⃗n } of V , where n = dim V . Let V̂ ′ ⊂ V be the span of the linearly independent set {v⃗l+1 , . . . , v⃗n }. Clearly V̂ ′ has dimension n − l. On the other hand T ∣V̂ ′ ∶ V̂ ′ → image(T ) is an isomorphism9 (why?) so
n − l = dim V̂ ′ = dim image(T ). □


9This is a recurrent construction and argument, so please make sure you understand
it.

To understand the above proof, keep the following example in mind: consider
the map T ∶ R3 → R3 , given by
T (x, y, z) = (x − y, y − z, z − x)
Then
ker(T ) = {(t, t, t)∣t ∈ R}
image(T ) = {(u, v, w)∣u + v + w = 0}
Then a possible basis of ker(T ) consists of the single vector v⃗1 = (1, 1, 1),
and one can extend this to a basis of R3 in many ways. Let us choose the
two vectors
v⃗2 = (1, 0, 0), v⃗3 = (0, 1, 0)
Then V̂ ′ (the span of v⃗2 and v⃗3 ) is the xy-plane. The map T ∣V̂ ′ is
V̂ ′ ∋ (x, y, 0) ↦ (x − y, y, −x) ∈ image(T ) = {(u, v, w)∣u + v + w = 0}
This map is clearly bijective. You can check that (1, 0, −1) (the image of v⃗2 )
and (−1, 1, 0) (the image of v⃗3 ) together form a basis for image(T ).
Remark 5.13. Given a linear map T ∶ V → W , with V finite-dimensional,
the nullity of T is the dimension of its kernel, and rank of T the dimension
of its image:
nullity of T = dim ker(T ) ,   rank of T = dim image(T )
Theorem 5.12(4) is often referred to as the rank-nullity theorem.
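As a quick numerical check of the rank-nullity theorem for the example above (a sketch; numpy computes the rank, and the nullity is then read off):

    import numpy as np

    # Matrix of T(x, y, z) = (x - y, y - z, z - x) in the standard basis.
    T = np.array([[ 1.0, -1.0,  0.0],
                  [ 0.0,  1.0, -1.0],
                  [-1.0,  0.0,  1.0]])

    rank = np.linalg.matrix_rank(T)            # dim image(T) = 2
    nullity = T.shape[1] - rank                # dim ker(T) = 1
    assert rank + nullity == T.shape[1]        # dim V = 3, as Theorem 5.12(4) predicts
    print(rank, nullity)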

5.3. A matrix-free proof of Theorem 5.9. Here is a slight variation10 of


Theorem 5.9 followed by a proof (taken from P. Lax: Linear Algebra) that
avoids appeal to matrices:
Theorem 5.14. If V is finitely generated and {0V } ⊊ V , any linearly inde-
pendent subset has at most as many elements as any finite generating set.

Proof. We need to prove: if {x1 , . . . , xn } generate V and {y1 , . . . , ym } are


linearly independent, then m ≤ n.
Let y1 = λ1 x1 + ⋅ ⋅ ⋅ + λn xn . Since y1 ≠ 0, at least one of the λj ≠ 0. By
relabelling the xi if necessary, we can suppose this is λ1 . So {y1 , x2 , . . . xn }
is a generating set. Now
y2 = µ1 y1 + λ′2 x2 + ⋅ ⋅ ⋅ + λ′n xn
If all the λ′j vanish, this would contradict the linear independence of {y1 , . . . , ym }.
By relabelling if necessary, we can assume that λ′2 ≠ 0. So {y1 , y2 , x3 , . . . , xn }
is a generating set. How can this process terminate? (And terminate it must
since we have only finitely many yi .):
10The earlier version clearly implies this; how about the other way?

(1) (m = n) We run out of yi and xj simultaneously and the final generating set is {y1 , . . . , yn }.
(2) (m < n) We run out of yi first and the final generating set is {y1 , . . . , ym , xm+1 , . . . , xn } (after relabelling).
(3) (m > n) We run out of xj first and the final generating set is {y1 , . . . , yn }, so that yn+1 is a linear combination of y1 , . . . , yn . Since {y1 , . . . , ym } is linearly independent, this is a contradiction.
In cases (1) and (2) we have m ≤ n, and case (3) cannot occur; so m ≤ n. □


Exercise: Once we have the above proof, we can define dimension without appealing to Theorem 5.8. Your task: use the rank-nullity Theorem to give a proof of Theorem 5.8.
Exercise (Right inverse): Suppose given a linear map  ∶ V → W . A
right inverse B̂ ∶ W → V is a linear map such that  ○ B̂ = IW , where IW is
the identity map of W : IW (w) = w, w ∈ W . Prove that if a right inverse B̂
exists, then (B̂ is injective and) Â is surjective. Suppose from now on that Â is indeed surjective. Prove that
(1) if V is finite-dimensional, so is W ,
(2) if W is finite-dimensional, a right inverse exists, and if V is finite-
dimensional, then dim W ≤ dim V .
In particular, if V, W are finite-dimensional, a linear map  ∶ V → W is
surjective iff a right inverse exists.
Exercise (Left inverse): Suppose given a linear map  ∶ V → W . A left
inverse Ĉ ∶ W → V is a linear map such that Ĉ ○ Â = IV . Prove that if a
left inverse Ĉ exists, then (Ĉ is surjective and) Â is injective. Suppose from
now on that  is indeed injective. Prove that
(1) if W is finite-dimensional, so is V ,
(2) if W is finite-dimensional, a left inverse exists, and dim W ≥ dim V .
In particular, if V, W are finite-dimensional, a linear map  ∶ V → W is
injective iff a left inverse exists.
Exercise (Inverse): Let V be a finite-dimensional vector space. Show that a linear map Â ∶ V → V is surjective iff it is injective iff it is bijective. (Hint: use the rank-nullity theorem.)
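A small numerical illustration of the right-inverse exercise (a sketch; the particular matrices are my own choice): for the surjective map given by the matrix A below, one right inverse B is obtained by lifting the standard basis of R2.

    import numpy as np

    A = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 3.0]])            # a surjective linear map R^3 -> R^2

    B = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])                 # a right inverse R^2 -> R^3 (lifts e1', e2')

    assert np.allclose(A @ B, np.eye(2))       # A o B = identity on R^2
    print(B @ A)                               # B o A is NOT the identity on R^3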

5.4. Sets, ordered sets, and all that. Note that in the above exposition,
we have spoken of the following properties of a set S of vectors in a vector
space:

(1) the span < S > of S,


(2) linear independence of S, and
(3) whether the set S is a basis.
It is more usual to consider ordered sets, and Artin does discuss this matter.
You may find the following remarks helpful. For simplicity let us assume
that S is a finite set, with n elements.
(1) An ordering is a bijective map from the set of integers {1, 2, . . . , n} to the set S.
(2) As a matter of notation, to specify a set one lists its elements within curly brackets, for example: {a, l, p} is a set of three letters. Endowed with their natural order, this would be the ordered set (a, l, p), with the rounded brackets signaling that this is an ordered set, with the ordering:
1 ↦ a, 2 ↦ l, 3 ↦ p.

(3) The normal practice is to define span, linear independence, and basis as applied to ordered sets. Immediately one proves that these notions are unaffected by a change of ordering.
(4) (This might require some thought.) Specifying a set of vectors S is equivalent to specifying a linear map RS → V . Specifying an ordered set of n vectors is equivalent to specifying a linear map Rn → V . In both cases, the span is the image, linear independence is equivalent to the map being injective, and the set is a basis iff the map is an isomorphism.
(5) Particularly annoying for the pedantic is the fact that the normal way
of naming elements in a set with n elements is by listing them: x1 , x2 , . . . , xn .
So the set gets denoted {x1 , x2 , . . . , xn }. The list privileges the order (x1 , x2 , . . . ).
How does one obtain other orderings? Each permutation σ of {1, 2, . . . , n}
specifies an order: (xσ(1) , xσ(2) , . . . , xσ(n) ).
(6) In practical situations, especially in numerical computations, sets are
usually presented as lists, and come with a natural order. Algorithms that
work with sets presented as lists will result in outputs that depend on the
order.

6. Matrices

6.1. Basics. An m × n (real) matrix A consists of m × n (real) numbers


aij , where i (the first, or row index) runs over 1, . . . , m and j (the second,
or column index) runs over 1, . . . , n. It is best to visualise the matrix as a
rectangular array:
A = [ a11  .  .  a1n ]
    [  .   .  .   .  ]
    [ am1  .  .  amn ]
The number aij at the intersection of the ith row and the j th column is called
the “ij th entry”.
(Strictly speaking, we should introduce a comma between the row and col-
umn indices: aij should be ai,j . We refrain from doing this to keep notation
light. To avoid running into problems we make sure that when i, j are
replaced by actual integers, they are in the single-digit range.)
Remark: On occasion, one might introduce a matrix M without having set
up a notation for its entries, in which case it is the usual practice to denote
by Mij its ij th entry. Thus, in the above paragraph, Aij = aij , and if
M = [  5  4  3 ]
    [ −1 −9  2 ]
then M22 = −9.
There does not seem to be a standard notation for the space of m × n real
matrices. We will use Mm×n (R) or simply Mm×n if it is clear that we are
dealing with real matrices, as will be the case now.
The set Mm×n is a vector space, addition being defined by
[ a11 . . a1n ]   [ a′11 . . a′1n ]   [ a11 + a′11 . . a1n + a′1n ]
[  .  . .  .  ] + [  .   . .  .   ] = [     .       . .      .     ]
[ am1 . . amn ]   [ a′m1 . . a′mn ]   [ am1 + a′m1 . . amn + a′mn ]
and scalar multiplication as follows:
  [ a11 . . a1n ]   [ λa11 . . λa1n ]
λ [  .  . .  .  ] = [   .   . .   .  ]
  [ am1 . . amn ]   [ λam1 . . λamn ]
The zero matrix is the matrix with all entries 0, and the negative of a matrix
is got by changing the sign of each entry.
“Square” matrices are matrices where the number of columns is equal to the
number of rows. These are very important and we will deal with them later.
Even more basic are “row vectors” (1 × n matrices):
[a11 . . . . . . a1n ]

In part (3) of the Exercise below, we define the notion of the transpose of a matrix. The transpose of a row vector is a “column vector”. There are obvious isomorphisms Rn → M1×n → Mn×1 , given by
                                                           [ v1 ]
(v1 , . . . , vn )  z→  [v1 . . . . . . vn ]  z→            [ .  ]
      ∈ Rn                    ∈ M1×n         (transpose)   [ .  ]
                                                           [ vn ]
                                                            ∈ Mn×1

(Since only one index changes, we can drop the other when we write row/column
vectors.) The natural number n is the length of the row/column vector.
The following problems are mostly very routine, but introduce very impor-
tant concepts and definitions.
Exercises:
(1) Check that Mm×n is indeed a vector space.
(2) Let eij denote the matrix with all entries zero except the ij th , which is
1. For example, if m = 2, n = 2, there are four such matrices:
e11 = [ 1 0 ] ,  e12 = [ 0 1 ] ,  e21 = [ 0 0 ] ,  e22 = [ 0 0 ]
      [ 0 0 ]          [ 0 0 ]          [ 1 0 ]          [ 0 1 ]
Artin calls these “matrix units”; the terminology is not standard. Show
that the set of matrix units forms a basis for Mm×n , which therefore has
dimension mn.
(3) Given a m × n matrix
A = [ a11 . . a1n ]
    [  .  . .  .  ]
    [ am1 . . amn ]
its transpose, denoted Atr , is the n × m matrix defined as follows:
Atr = [ a11 . . am1 ]
      [  .  . .  .  ]
      [ a1n . . amn ]
In other words the ji th entry of Atr is the ij th entry of A.
Check that A ↦ Atr is an isomorphism Mm×n → Mn×m .

6.2. Matrix products; definition. Let B be a l × m matrix and A a m × n matrix:
B = [ b11 . . . b1m ]        A = [ a11 . . a1n ]
    [  .  . . .  .  ]            [  .  . .  .  ]
    [ bl1 . . . blm ]            [ am1 . . amn ]

Note that the number of columns of the “first” matrix B is the number of
rows of the “second” matrix A, both being equal to m. Then the matrix
product (or simply “product”) BA is defined to be the l × n matrix C,
C = BA = [ c11 . . c1n ]
         [  .  . .  .  ]
         [ cl1 . . cln ]
with entries given by:
cij = ∑_{k=1}^{m} bik akj ,   i = 1, . . . , l,  j = 1, . . . , n .

In other words,
[ b11 . . . b1m ] [ a11 . . a1n ]   [ ∑_{k=1}^{m} b1k ak1 . . ∑_{k=1}^{m} b1k akn ]
[  .  . . .  .  ] [  .  . .  .  ] = [          .           . .          .          ]
[ bl1 . . . blm ] [ am1 . . amn ]   [ ∑_{k=1}^{m} blk ak1 . . ∑_{k=1}^{m} blk akn ]
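The defining formula cij = ∑k bik akj translates directly into code. Here is a sketch in Python (a naive triple loop, written for clarity rather than speed, and checked against numpy's built-in product):

    import numpy as np

    def matmul(B, A):
        # Product of an l x m matrix B and an m x n matrix A, straight from the definition.
        l, m = len(B), len(B[0])
        m2, n = len(A), len(A[0])
        assert m == m2, "columns of B must equal rows of A"
        C = [[0.0] * n for _ in range(l)]
        for i in range(l):
            for j in range(n):
                C[i][j] = sum(B[i][k] * A[k][j] for k in range(m))   # c_ij = sum_k b_ik a_kj
        return C

    B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 x 2
    A = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]    # 2 x 3
    assert np.allclose(matmul(B, A), np.array(B) @ np.array(A))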

6.3. Matrices as linear maps. Consider a linear map Â ∶ Mn×1 → Mm×1 (from the vector space of column vectors of length n to the vector space of column vectors of length m).
Let e⃗j , j = 1, . . . , n denote the matrix unit:
[ 0 ]
[ . ]
[ 1 ]  ← j th place
[ . ]
[ 0 ]
Clearly (e⃗1 , . . . , e⃗n ) is an ordered basis of Mn×1 . Define similarly the ordered basis (e⃗′1 , . . . , e⃗′m ) of Mm×1 .
Define scalars aij , i = 1, . . . , m and j = 1, . . . , n by
Â(e⃗j ) = ∑_{i=1}^{m} aij e⃗′i

Note that these scalars are determined by Â and in turn determine Â. Consider the column vector (of length n)
v⃗ = [ v1 ]
     [ .  ]
     [ .  ]
     [ vn ]

Set v⃗′ = Â(v⃗). This is a column vector of length m and
       [ v1′ ]         [ v1 ]
v⃗′ ≡  [  .  ]  = Â(   [ .  ]  )
       [ vm′ ]         [ vn ]
and we have
[ v1′ ]                                                           [ ∑j a1j vj ]
[  .  ]  =  Â(∑j vj e⃗j )  =  ∑j vj Â(e⃗j )  =  ∑j ∑i aij vj e⃗′i  =  [     .     ]
[ vm′ ]                                                           [ ∑j amj vj ]
(For purposes of visualisation, I have supposed that m < n, which is why the v⃗′ column looks shorter than the v⃗ column.) Comparing entries of the vectors on the extreme left and right we get
vi′ = ∑j aij vj ,   i = 1, . . . , m

Remark 6.1. These are equalities of real numbers, as opposed to the equations
Â(e⃗j ) = ∑_{i=1}^{m} aij e⃗′i ,   j = 1, . . . , n
which are equalities between vectors, and serve to define (see below) the matrix A associated to the linear map Â. Note that in the first equation the summation is over j (the column index of A) and in the second over i (the row index of A).

Define the matrix of the linear map Â to be:
A = [ a11 . . a1n ]
    [  .  . .  .  ]
    [ am1 . . amn ]
We see that v⃗′ = Â(v⃗) is equivalent to:
[ v1′ ]   [ a11 . . a1n ] [ v1 ]
[  .  ] = [  .  . .  .  ] [ .  ]
[ vm′ ]   [ am1 . . amn ] [ vn ]
which can be written more concisely:
v⃗′ = A v⃗
This is an equality of column vectors of length m, where the column vector on the right is the product of a m × n matrix and a n × 1 matrix (i.e., a column vector of length n). Compare this with the defining equation of v⃗′ :
v⃗′ = Â(v⃗)

which is also an equality of column vectors of length m, where on the right we have a linear map Â acting on a column vector of length n.
To summarise, we can go back and forth:
v⃗ ↦ Â(v⃗)   ←→   v⃗ ↦ A v⃗
where
— Â(v⃗) is the linear map Â acting on a column vector v⃗, and
— A v⃗ is the matrix A multiplying a column vector v⃗.
Remark 6.2. It is important to get used to the mechanics of matrix multiplication. Here is a start on “block multiplication”: Write the matrix A as one row of column vectors, each of length m:
A = [ a⃗1 . . a⃗n ]
where
a⃗p = [ a1p ]
     [  .  ]     p = 1, . . . , n
     [ amp ]
Then
BA ≡ C = [ c⃗1 . . c⃗n ] = [ B a⃗1 . . B a⃗n ]
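A quick numerical sanity check of Remark 6.2 (a sketch with randomly generated matrices): the pth column of BA is B applied to the pth column of A.

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 3))            # an l x m matrix
    A = rng.standard_normal((3, 5))            # an m x n matrix

    C = B @ A
    for p in range(A.shape[1]):
        assert np.allclose(C[:, p], B @ A[:, p])   # column p of BA is B times column p of A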

6.4. Matrix products: basics. We make a number of important remarks. Check the computations as you read.
(1) Suppose given linear maps
          B̂            Â
Ml×1 ←−−− Mm×1 ←−−− Mn×1
and B, A are the corresponding matrices. Then C = BA is the matrix corresponding to the composite (linear) map Ĉ ≡ B̂ ○ Â.
Proof. Let e⃗′′1 , . . . , e⃗′′l , e⃗′1 , . . . , e⃗′m , and e⃗1 , . . . , e⃗n be the respective bases (all matrix units) of Ml×1 , Mm×1 and Mn×1 . We have
Â(e⃗j ) = ∑i aij e⃗′i
from which it follows that
Ĉ(e⃗j ) = B̂(Â(e⃗j )) = ∑i aij B̂(e⃗′i ) = ∑i aij ∑k bki e⃗′′k = ∑k {∑i bki aij } e⃗′′k = ∑k ckj e⃗′′k
from which the claim follows. □
(2) As a consequence, we see that matrix multiplication is associative. That is, given matrices A′′ , A′ , A of sizes a × b, b × c and c × d respectively,
A′′ (A′ A) = (A′′ A′ )A

Proof. Let Â′′ , Â′ , Â denote the corresponding operators:
          Â′′            Â′            Â
Ma×1 ←−−− Mb×1 ←−−− Mc×1 ←−−− Md×1
Clearly we have the associativity of the composition of maps:
Â′′ ○ (Â′ ○ Â) = (Â′′ ○ Â′ ) ○ Â

(This is true of the composition of any three maps, linear or otherwise:


if h, g, f are functions, h ○ (g ○ f ) = (h ○ g) ○ f . For, applied to any x,
h ○ (g ○ f )(x) = h(g ○ f (x)) = h(g(f (x))) = (h ○ g)(f (x)) = (h ○ g) ○ f (x).)
By 1, composition of linear maps corresponds to multiplication of matrices.
So we get
A′′ (A′ A) = (A′′ A′ )A
as claimed.
(3) Check that {BA}tr = Atr B tr .
(4) Let Îl ∶ Ml×1 → Ml×1 be the identity map:
Îl (v⃗) = v⃗,   v⃗ ∈ Ml×1
check that the corresponding l × l matrix is the identity matrix:
Il = [ 1 0 . 0 0 ]
     [ 0 1 . 0 0 ]
     [ . . . . . ]
     [ 0 0 . 1 0 ]
     [ 0 0 . 0 1 ]
with all entries zero except those along the diagonal, which are equal to 1. Given any linear map Â ∶ Mn×1 → Mm×1 , clearly Îm ○ Â = Â ○ În = Â, and this implies the matrix equalities Im A = A In = A. Check this directly.
(5) Direct verification of associativity of matrix multiplication. Here is the computation:
{A′′ (A′ A)}ij = ∑l A′′il {A′ A}lj = ∑l A′′il (∑m A′lm Amj )
              = ∑l ∑m A′′il A′lm Amj = ∑m ∑l A′′il A′lm Amj
              = ∑m (∑l A′′il A′lm ) Amj = ∑m {A′′ A′ }im Amj
              = {(A′′ A′ )A}ij

(6) Fix n ≥ 1, and consider the set Mn×n of n × n matrices. As already


noted, this is a vector space. Furthermore, this is a ring since the product
of two n × n matrices is another such, and given any B, A, A′ , we have
B(A′ + A) = BA′ + BA
(A′ + A)B = A′ B + AB

(In other words, “multiplication is distributive over addition”.) This is a


ring with identity, since AIn = In A = A. Associativity of multiplication has
been checked earlier. Note that multiplication is not commutative unless
n = 1. (Give a counterexample in the case n = 2.)
Note that the bijection R → M1×1 given by
λ ↦ [λ]
preserves addition and multiplication, and takes 0/1 to the zero/identity
matrix. So R and M1×1 are isomorphic as rings.
(7) Exercise Suppose given a n × n matrix A. Prove that the following are
equivalent
(a) there exists a n × n matrix R such that AR = In .
(b) there exists a n × n matrix L such that LA = In .
(c) the linear map  is bijective.
(8) Suppose there exist R, L as above. Then R = In R = LAR = LIn = L. If R, R′ exist as above, then R = L = R′ , so R is unique, as is L.
(9) If Â is bijective, we say that A is an invertible matrix; the matrix L = R above is denoted A−1 . Prove that if A, B are invertible, so is their product, and (AB)−1 = B −1 A−1 .
(10) The set of real invertible n × n matrices is denoted GLn (R). In the
present context, we will denote it as simply GLn . If n = 1,
GL1 = {[λ]∣λ ≠ 0}
since if λ ≠ 0, [λ][1/λ] = [1] = I1 . (We are being pedantic and taking the
trouble to distinguish a 1 × 1 matrix [λ] from the real number λ.) So every
nonzero n × n matrix is invertible if n = 1. This is no longer true if n > 1.
Give counterexamples.

6.5. GL2 and a sneak preview of determinants. This is a long
Exercise. For any 2 × 2 matrix
A = [ a11 a12 ]
    [ a21 a22 ]
define its determinant, denoted det A by
det A = a11 a22 − a21 a12
Check that if B is a second 2 × 2 matrix, then
det AB = det A × det B .
Check that det I2 = 1. Conclude that a necessary condition for A to be
invertible is that det A ≠ 0.

Warning: The map det ∶ M2×2 → R is not linear.
For every matrix A as before, check that
[ a11 a12 ] [  a22 −a12 ]
[ a21 a22 ] [ −a21  a11 ]  = (det A) I2
so that if det A ≠ 0, A is invertible and
A−1 = (det A)−1 [  a22 −a12 ]
                [ −a21  a11 ]
In particular, GL2 is the complement, in M2×2 , of the subset D defined by the vanishing of the determinant. Explicitly, GL2 = M2×2 ∖ D, where
D = { [ a11 a12 ]
      [ a21 a22 ]  ∣  a11 a22 − a12 a21 = 0 }

The definition of the determinant when n > 2 is subtle, as we will see later.
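A numerical sketch of the exercise above (Python; the helper names det2 and inv2 are mine): the 2 × 2 determinant formula, its multiplicativity, and the explicit inverse.

    import numpy as np

    def det2(A):
        # Determinant of a 2 x 2 matrix: a11 a22 - a21 a12.
        return A[0, 0] * A[1, 1] - A[1, 0] * A[0, 1]

    def inv2(A):
        # Inverse of a 2 x 2 matrix with nonzero determinant, via the formula above.
        adj = np.array([[ A[1, 1], -A[0, 1]],
                        [-A[1, 0],  A[0, 0]]])
        return adj / det2(A)

    rng = np.random.default_rng(1)
    A = rng.standard_normal((2, 2))
    B = rng.standard_normal((2, 2))

    assert np.isclose(det2(A @ B), det2(A) * det2(B))   # det(AB) = det A x det B
    assert np.allclose(A @ inv2(A), np.eye(2))          # A A^{-1} = I_2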

7. Row reduction

7.1. Linear equations and matrices. Consider a system of two linear equations in two variables:
a11 v1 + a12 v2 = v̂1
a21 v1 + a22 v2 = v̂2          (1)

When we first encounter such equations, we try to solve them by “eliminating variables”. If a11 ≠ 0, we can multiply the first equation by a21 /a11 and subtract the resulting equation from the second to get:
a11 v1 + a12 v2 = v̂1
((a11 a22 − a12 a21 )/a11 ) v2 = v̂2 − (a21 /a11 ) v̂1
If a11 a22 − a12 a21 ≠ 0, we can divide the second equation by (a11 a22 − a12 a21 )/a11 to get:
a11 v1 + a12 v2 = v̂1
v2 = (a11 v̂2 − a21 v̂1 )/(a11 a22 − a12 a21 )
Multiplying the second equation by a12 and subtracting the resulting equation from the first, we get:
a11 v1 = v̂1 − (a11 a12 v̂2 − a21 a12 v̂1 )/(a11 a22 − a12 a21 ) = (a11 a22 v̂1 − a11 a12 v̂2 )/(a11 a22 − a12 a21 )
v2 = (a11 v̂2 − a21 v̂1 )/(a11 a22 − a12 a21 )
And finally, dividing the first equation by a11 we get:
v1 = (a22 v̂1 − a12 v̂2 )/(a11 a22 − a12 a21 )
v2 = (−a21 v̂1 + a11 v̂2 )/(a11 a22 − a12 a21 )          (2)
Note that the solution is valid even if a11 = 0.
Let us approach the problem of solving the above system in terms of matri-
ces.
Write the equations (1) in matrix terms:
[ a11 a12 ] [ v1 ]   [ v̂1 ]
[ a21 a22 ] [ v2 ] = [ v̂2 ]
Now that we know about determinants and matrix inverses of 2 × 2 matrices, we can multiply on the left by A−1 , which exists precisely when det A = a11 a22 − a12 a21 ≠ 0, and get
[ v1 ]               [  a22 −a12 ] [ v̂1 ]
[ v2 ] = (1/det A)   [ −a21  a11 ] [ v̂2 ]          (3)

which, when written out, gives back the solution (2).
This method works in general when the number of unknowns equals the number of equations, and the determinant is nonzero. What if this is not the case? Let us go back to our earlier manipulations and set up a dictionary between matrix operations and operations on systems of equations:
(1) The system of equations and the corresponding matrix equation:
a11 v1 + a12 v2 = v̂1          [ a11 a12 ] [ v1 ]   [ v̂1 ]
a21 v1 + a22 v2 = v̂2          [ a21 a22 ] [ v2 ] = [ v̂2 ]

(2) Multiply the first equation by a21 /a11 and subtract the resulting equation from the second. On the “matrix side” this is implemented by left-multiplying (i.e., multiplying from the left) both sides in this order:
[     1       0 ]       { [ a11 a12 ] [ v1 ]   [ v̂1 ] }
[ −a21 /a11   1 ]   ↷   { [ a21 a22 ] [ v2 ] = [ v̂2 ] }

This gets us to:
a11 v1 + a12 v2 = v̂1                       [ a11    a12        ] [ v1 ]   [ v̂1                   ]
(det A/a11 ) v2 = v̂2 − (a21 /a11 ) v̂1       [ 0      det A/a11  ] [ v2 ] = [ v̂2 − (a21 /a11 ) v̂1  ]
a11 v̂1

(3) Proceeding as above, we mimic the actions on the two equations by matrix multiplication from the left. The whole sequence is as follows (please check):
[ 1/a11  0 ]      [ 1  −a12 ]      [ 1      0        ]      [     1        0 ]
[   0    1 ]  ↷  [ 0    1  ]  ↷  [ 0   a11 /det A  ]  ↷  [ −a21 /a11    1 ]  ↷
{ [ a11 a12 ] [ v1 ]   [ 1 0 ] [ v̂1 ] }
{ [ a21 a22 ] [ v2 ] = [ 0 1 ] [ v̂2 ] }
We have introduced the identity matrix on the right to make a point.
Namely, the two column vectors are spectators in the computations, which
are all being done with 2×2 matrices. Carrying out the matrix computations
on both sides, we get

[ 1 0 ] [ v1 ]               [  a22 −a12 ] [ v̂1 ]
[ 0 1 ] [ v2 ] = (1/det A)   [ −a21  a11 ] [ v̂2 ]

which we recognise as essentially the equation (3).

(4) Observe that we started with an equation of the form
A v⃗ = v⃗′
and found a series of square matrices Ek , Ek−1 , . . . , E1 of some specific types such that
● multiplication on the left by each of the Ek ’s has an effect that is easy to describe, and
● at the end the product Ek Ek−1 . . . E1 A is a matrix of a simple form.
This is what is systematically implemented by row reduction.

7.2. Basic definitions and results: elementary matrices, row-echelon form. Recall that the “matrix unit” eij ∈ Mm×n is the matrix with all entries zero except the ij th , which is 1. There are mn matrix units and the set of matrix units is clearly a basis for the vector space Mm×n . That is, every m × n matrix A can be written uniquely as a sum
A = ∑_{i=1,...,m, j=1,...,n} aij eij
(For example, if m = n = 2 and A = [ a11 a12 ] we have
                                   [ a21 a22 ]
a11 e11 + a12 e12 + a21 e21 + a22 e22 = a11 [ 1 0 ] + a12 [ 0 1 ] + a21 [ 0 0 ] + a22 [ 0 0 ] = A .)
                                            [ 0 0 ]       [ 0 0 ]       [ 1 0 ]       [ 0 1 ]

We next introduce certain simple invertible square matrices, which will play
an important part in the subsequent discussion. These elementary matrices
come in three types, and can be characterised by their shapes as well as their
distinct actions on column/row vectors. (We deal with m×m matrices, with
m fixed, and we will consider only actions on column vectors.)

(1) Elementary matrices of the first kind: Im + a eij for some 1 ≤ i ≠ j ≤ m and a a scalar. (So all diagonal entries are equal to one, and there is only one nonzero “off-diagonal” entry.) Such a matrix has the shape:

[ 1 0 . . . 0 0 ]        [ 1 0 . . . 0 0 ]
[ 0 1 . . . 0 0 ]        [ 0 1 . . . 0 0 ]
[ . . . . . . . ]        [ . . . a . . . ]
[ . . . . . . . ]   or   [ . . . . . . . ]
[ . . a . . . . ]        [ . . . . . . . ]
[ 0 0 . . . 1 0 ]        [ 0 0 . . . 1 0 ]
[ 0 0 . . . 0 1 ]        [ 0 0 . . . 0 1 ]

Exercise: Acting11 on a column vector such an elementary matrix adds a


times the j th entry of the vector to the ith entry. It is useful to visualise
what happens; we do this separately for the cases i > j and i < j:

[ 1 0 . . . 0 0 ] [ v1 ]   [ v1       ]
[ 0 1 . . . 0 0 ] [ .  ]   [ .        ]
[ . . . . . . . ] [ vj ]   [ vj       ]
[ . . . . . . . ] [ .  ] = [ .        ]
[ . . a . . . . ] [ vi ]   [ vi + avj ]
[ 0 0 . . . 1 0 ] [ .  ]   [ .        ]
[ 0 0 . . . 0 1 ] [ vm ]   [ vm       ]
and
[ 1 0 . . . 0 0 ] [ v1 ]   [ v1       ]
[ 0 1 . . . 0 0 ] [ .  ]   [ .        ]
[ . . . . a . . ] [ vi ]   [ vi + avj ]
[ . . . . . . . ] [ .  ] = [ .        ]
[ . . . . . . . ] [ vj ]   [ vj       ]
[ 0 0 . . . 1 0 ] [ .  ]   [ .        ]
[ 0 0 . . . 0 1 ] [ vm ]   [ vm       ]

(2) Elementary matrices of the second kind: Im + eij + eji − eii − ejj , where i ≠ j. Such a matrix has the shape:
[ 1             ]
[   .           ]
[     0     1   ]
[       .       ]
[     1     0   ]
[           .   ]
[             1 ]

11This is informal language for “multiplying a m × 1 matrix on the left”.



Exercise: Acting on a column vector this interchanges (“swaps”) the entries in the ith and j th places:
[ 1             ] [ v1 ]   [ v1 ]
[   .           ] [ .  ]   [ .  ]
[     0     1   ] [ vi ]   [ vj ]
[       .       ] [ .  ] = [ .  ]
[     1     0   ] [ vj ]   [ vi ]
[           .   ] [ .  ]   [ .  ]
[             1 ] [ vm ]   [ vm ]

(3) Elementary matrices of the third kind: Im + (c − 1) eii , for some i and a scalar c ≠ 0. Such a matrix has the shape:
[ 1           ]
[   .         ]
[     c       ]
[       .     ]
[           1 ]

Exercise: Acting on a column vector this multiplies the ith entry by c:
[ 1           ] [ v1 ]   [ v1  ]
[   .         ] [ .  ]   [ .   ]
[     c       ] [ vi ] = [ cvi ]
[       .     ] [ .  ]   [ .   ]
[           1 ] [ vm ]   [ vm  ]

Given a m × n matrix A, we can think of it as one row (of length n) of


column vectors, each of length m, and apply the reasoning of Remark 6.2
to conclude:
Proposition 7.1. Given a m × m elementary matrix E and an m × n matrix A, the map A ↦ EA can be described as follows:
(1) if E is of the first kind, E = Im + a eij , then EA is got from A by adding a times the j th row to the ith row.
(2) if E is of the second kind, E = Im + eij + eji − eii − ejj , then EA is got from A by interchanging the j th row and the ith row.
(3) if E is of the third kind, E = Im + (c − 1) eii , then EA is got from A by multiplying the ith row by c.

Exercise: Prove that the inverse of an elementary matrix is elementary.


Describe explicitly the inverse in each case.
The operations on a m × n matrix A described in the above Proposition are
called (elementary) row operations.
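Here is a sketch of the three kinds of elementary matrices in Python (the function names are mine, not standard, and indices start at 0 in the code), together with a spot check of Proposition 7.1 on a small matrix:

    import numpy as np

    def elem_first(m, i, j, a):
        # I_m + a e_ij (i != j): adds a times row j to row i when multiplied on the left.
        E = np.eye(m); E[i, j] = a; return E

    def elem_second(m, i, j):
        # Swaps rows i and j when multiplied on the left.
        E = np.eye(m); E[[i, j]] = E[[j, i]]; return E

    def elem_third(m, i, c):
        # Multiplies row i by c (c != 0) when multiplied on the left.
        E = np.eye(m); E[i, i] = c; return E

    A = np.arange(12.0).reshape(3, 4)
    assert np.allclose((elem_first(3, 2, 0, 5.0) @ A)[2], A[2] + 5.0 * A[0])
    assert np.allclose((elem_second(3, 0, 1) @ A)[0], A[1])
    assert np.allclose((elem_third(3, 1, -2.0) @ A)[1], -2.0 * A[1])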

7.3. Matrices in row-echelon form. A m × n matrix A is said to be in row-echelon form if it has the following shape
[ 0 0 1 ∗ 0 0 ∗ ∗ 0 0 . . . ∗ ]
[ 0 0 0 0 1 0 ∗ ∗ 0 0 . . . ∗ ]
[ 0 0 0 0 0 1 ∗ ∗ 0 0 . . . ∗ ]
[ 0 0 0 0 0 0 0 0 1 0 . . . ∗ ]
[ 0 0 0 0 0 0 0 0 0 1 . . . ∗ ]
[ 0 0 0 0 0 0 0 0 0 0 . . . 0 ]
[ 0 0 0 0 0 0 0 0 0 0 . . . 0 ]
[ 0 0 0 0 0 0 0 0 0 0 . . . 0 ]

Here are the rules for a matrix in row-echelon form:


(1) The first nonzero entry (reading from the left) in each row is 1. This
is called a pivot.
(2) The entries above a pivot are zero.

(3) If the ith row has a pivot, then either
● all the entries in the rows below the ith row are zero, or
● there is a pivot in the (i + 1)th row, to the right of the pivot in the ith row.
In the above example, the pivots are the displayed 1s, and the entries above each pivot are zero. The ∗ entries can be arbitrary real numbers. Another way of visualising a matrix in row-echelon form: start with a m′ × n matrix (with m′ ≤ m) of the shape:
[ 0 0 1 ∗ 0 0 ∗ ∗ 0 0 . . . ∗ ]
[ 0 0 0 0 1 0 ∗ ∗ 0 0 . . . ∗ ]
[ 0 0 0 0 0 1 ∗ ∗ 0 0 . . . ∗ ]
[ 0 0 0 0 0 0 0 0 1 0 . . . ∗ ]
[ 0 0 0 0 0 0 0 0 0 1 . . . ∗ ]
and add m − m′ rows at the bottom with all entries zero.
Exercise: If m = n, and A is invertible and in row-echelon form, show that
A = In .

7.4. Row reduction to row-echelon form. The main result is the fol-
lowing:
Theorem 7.2. Any m × n matrix A can be brought into a matrix A′ in row-
echelon form by a sequence of elementary row operations. In other words,
there exists a sequence (not unique) of elementary matrices Ek , . . . , E1 such
that
A′ ≡ Ek Ek−1 . . . E1 A
is in row-echelon form.

For the proof see Artin. As Artin says, the row-echelon form A′ is uniquely
determined by A, but he does not prove this, and we will not use the fact.
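Row reduction is easy to implement. The sketch below (Python; written for clarity, with a crude tolerance for floating-point zeros, and not intended for serious numerical work) reduces a matrix to row-echelon form in the sense of §7.3 using only the three elementary row operations:

    import numpy as np

    def row_echelon(A, tol=1e-12):
        A = A.astype(float)                    # work on a copy
        m, n = A.shape
        pivot_row = 0
        for j in range(n):
            rows = [i for i in range(pivot_row, m) if abs(A[i, j]) > tol]
            if not rows:
                continue                       # no pivot in this column
            i = rows[0]
            A[[pivot_row, i]] = A[[i, pivot_row]]      # second kind: swap rows
            A[pivot_row] /= A[pivot_row, j]            # third kind: make the pivot 1
            for r in range(m):                         # first kind: clear the rest of column j
                if r != pivot_row and abs(A[r, j]) > tol:
                    A[r] -= A[r, j] * A[pivot_row]
            pivot_row += 1
            if pivot_row == m:
                break
        return A

    A = np.array([[0., 2., 4., 2.],
                  [1., 1., 1., 1.],
                  [1., 3., 5., 3.]])
    print(row_echelon(A))                      # rows: [1 0 -1 0], [0 1 2 1], [0 0 0 0]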
Corollary 7.3. A n × n matrix is invertible iff it is a product of elementary
matrices.

Proof. Elementary matrices are invertible by definition, and hence so is any


product of elementary matrices. Suppose now that A is invertible, and
A′ = Ek Ek−1 . . . E1 A
is a reduction to row-echelon form. Since A is invertible as are the Ei ’s, the
matrix A′ is invertible. This forces A′ to be the identity. Then
A = E1−1 E2−1 . . . Ek−1
But the inverse of an elementary matrix is elementary, so we are done. 

7.5. The linear map corresponding to a matrix in row-echelon


form. It is very easy to describe the kernel and image of the linear map
 given by a matrix A in row-echelon form.
Consider a m × n matrix A in row-echelon form. Let Â ∶ Mn×1 → Mm×1 be the corresponding linear map. Let e⃗1 , . . . , e⃗n and e⃗′1 , . . . , e⃗′m be the standard bases of Mn×1 and Mm×1 respectively.
If the pivots are in the first m′ rows of A, at the intersection of the ith row and the ji th column, i = 1, . . . , m′ , we have
Â(e⃗ji ) = e⃗′i ,   i = 1, . . . , m′
and
Â(e⃗j ) ∈ < e⃗′1 , . . . , e⃗′i >   if ji < j < ji+1
This immediately shows that
image(Â) = < e⃗′1 , . . . , e⃗′m′ >

The kernel of  can also be described explicitly, but this is subtler. We leave
the proof as an exercise.

Proposition 7.4. The kernel of Â is the subspace of Mn×1 consisting of vectors
v⃗ = [ v1 ]
     [ .  ]
     [ .  ]
     [ vn ]
where the vj , j ≠ j1 , . . . , jm′ are arbitrary and
vjp = − ∑_{jp < l ≤ n, l ≠ jp+1 ,...,jm′} apl vl

In particular, dim ker(Â) = n − m′ , so that we get another proof of Theorem 5.12 (4).
Suppose we exhibit a subspace V ⊂ Mn×1 such that V ⊕ ker(Â) = Mn×1 . Then the map Â∣V ∶ V → Mm×1 will map V bijectively onto image(Â). (Why?) Now that we have an explicit description of ker(Â), a natural V suggests itself:

V = { v⃗ ∈ Mn×1  ∣  vj = 0 for j ≠ j1 , . . . , jm′ , while vj1 , . . . , vjm′ ∈ R are arbitrary }
The map Â∣V sends the vector of V with entries vj1 , . . . , vjm′ in the pivot positions to the column vector of length m with entries vj1 , vj2 , . . . , vjm′ followed by zeros:
Â∣V (v⃗) = (vj1 , vj2 , . . . , vjm′ , 0, . . . , 0)tr
From this it is easy to read off a solution for each v⃗′ in the image of Â.

7.6. An example. Consider the map Â ∶ R5 → R3 given by the 3 × 5 matrix:
A = [ 0 1 a13 0 a15 ]
    [ 0 0  0  1 a25 ]
    [ 0 0  0  0  0  ]
from which it is easy to read off that
image(Â) = { [ v1′ ]
             [ v2′ ]  ∣  v1′ , v2′ arbitrary }
             [ 0   ]
Writing out the equation A v⃗ = v⃗′ , we get
v1′ = v2 + a13 v3 + a15 v5
v2′ = v4 + a25 v5
v3′ = 0
From this we read off:
ker(Â) = { [ v1               ]
           [ −a13 v3 − a15 v5 ]
           [ v3               ]  ∣  v1 , v3 , v5 ∈ R }
           [ −a25 v5          ]
           [ v5               ]
Note that the “free” variables describing the kernel are v1 , v3 , v5 , while v2 , v4 are determined in terms of these. Note also that 2, 4 correspond to the columns with the pivots.
The subspace V is:
V = { [ 0  ]
      [ v2 ]
      [ 0  ]  ∣  v2 , v4 ∈ R }
      [ v4 ]
      [ 0  ]
and the map Â∣V is
    ( [ 0  ] )     [ v2 ]
    ( [ v2 ] )     [ v4 ]
Â ( [ 0  ] )  =  [ 0  ]
    ( [ v4 ] )
    ( [ 0  ] )
We conclude: the equation Â(v⃗) = v⃗′ has a solution iff v3′ = 0, in which case the vector
v⃗ = [ 0   ]
     [ v1′ ]
     [ 0   ]
     [ v2′ ]
     [ 0   ]

is a solution.

8. Determinants

We have already encountered the definition of the determinant of a 2 × 2 matrix:
det [ a11 a12 ]  = a11 a22 − a21 a12
    [ a21 a22 ]

We saw that a necessary and sufficient condition for the existence of an


inverse is that the determinant be nonzero.

Let us define the determinant of a 1 × 1 matrix in the obvious way:

det [λ] = λ

Trivially, the matrix is invertible iff its determinant is nonzero.

We will now extend the definition of the determinant to square matrices of


arbitrary size. For fixed n ≥ 1, we will regard the determinant as a function
det ∶ Mn×n → R.

To paraphrase Artin, all possible definitions are rather clumsy. We will


follow his strategy, namely choose one (“at random”) and show that the
corresponding function satisfies three important properties.

We make a preliminary definition. Let an n × n matrix
A = [ a11 . . a1n ]
    [  .  . .  .  ]
    [ an1 . . ann ]
be given. For 1 ≤ i ≤ n and 1 ≤ j ≤ n, we define A−ij as the (n − 1) × (n − 1) matrix12 obtained by deleting the ith row and the j th column of A.
(The matrix A−ij , and others like it, are referred to as submatrices of A.)

12Notation alert! aij is the matrix entry Aij , but A−ij is a matrix. Here we diverge from Artin.

Definition 8.1. Now we can define the determinant of an n × n matrix by induction:
det [ a11 . . a1n ]
    [  .  . .  .  ]  = a11 det A−11 − a21 det A−21 + ⋅ ⋅ ⋅ ± an1 det A−n1
    [ an1 . . ann ]

(A better way of remembering the signs than what was done in the previous
draft of these notes is to note – as Artin does – that the signs in the sum
alternate.) Our definition is “by expansion by minors of the first column”.
We do not define the closely related notions of a minor/cofactor, because
we have no occasion to use them.
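For concreteness, here is a direct transcription of this inductive definition into Python (a sketch of mine, not from Artin); it is hopelessly inefficient for large n, but it mirrors the recursion exactly:

    def det(A):
        """Determinant by expansion along the first column (Definition 8.1).

        A is a list of n rows, each a list of length n.  Exponential running
        time; meant only to mirror the inductive definition."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total, sign = 0, 1
        for i in range(n):
            # The submatrix A^-_{i1}: delete the i-th row and the first column.
            minor = [row[1:] for k, row in enumerate(A) if k != i]
            total += sign * A[i][0] * det(minor)
            sign = -sign
        return total

    # Sanity check against the 2 x 2 formula a11*a22 - a21*a12:
    assert det([[1, 2], [3, 4]]) == 1*4 - 3*2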
We will find it notationally and conceptually convenient to view an n × n matrix as one column of n row vectors, each of length n:

\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_n \end{bmatrix}

where

\vec{R}_p = [a_{p1} \;\; \dots \;\; a_{pn}], \qquad p = 1, \dots, n
Theorem 8.2. The determinant function det ∶ Mn×n → R satisfies the
following properties:
(1) det In = 1.
(2) det is separately linear in the rows. That is, for any 1 ≤ j ≤ n,
\det \begin{bmatrix} \vec{R}_1 \\ \vdots \\ \lambda\vec{R}_j + \lambda'\vec{R}'_j \\ \vdots \\ \vec{R}_n \end{bmatrix} = \lambda\,\det \begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_j \\ \vdots \\ \vec{R}_n \end{bmatrix} + \lambda'\,\det \begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}'_j \\ \vdots \\ \vec{R}_n \end{bmatrix}

(3) if n > 1 and two adjacent rows are equal, then the determinant is zero.
That is,
\det \begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R} \\ \vec{R} \\ \vdots \\ \vec{R}_n \end{bmatrix} = 0 \qquad \text{(the two equal rows } \vec{R} \text{ being adjacent).}

Proof. This is clear for n = 1, 2. (We could start the induction with n = 1,
but we have to face the headache of interpreting (3) for 1 × 1 matrices.) We
proceed by induction. For example, consider the part (3) of the Theorem.
If two adjacent rows are equal R ⃗j = R
⃗ j+1 = R,
⃗ and the theorem has been
verified for matrices of size up to (n − 1) × (n − 1), the determinant equals
a_{11}\det A^-_{11} - a_{21}\det A^-_{21} + \cdots \pm \{a_{j1}\det A^-_{j1} - a_{(j+1)1}\det A^-_{(j+1)1}\} \pm \cdots \pm a_{n1}\det A^-_{n1}
Now the submatrices A−k1 are of size (n − 1) × (n − 1) and have two equal adjacent rows as long as k ≠ j and k ≠ j + 1. By induction their determinants vanish. This
leaves us with two terms
±{aj1 det A−j1 − a(j+1)1 det A−(j+1)1 } = ±a{det A−j1 − det A−(j+1)1 }

where a ≡ aj1 = a(j+1)1 . You can check that the matrices A−j1 and A−(j+1)1
are equal. This finishes the inductive step. We leave the proofs of parts (1)
and (2) of the Theorem as exercises for the reader. 

We prove next:
Proposition 8.3. If any function d ∶ Mn×n → R satisfies the properties
(1)-(3) of the theorem, it also satisfies the following additional properties:
(1) If a multiple of one row is added to an adjacent row, the value of d is
unchanged.
(2) If two adjacent rows are interchanged d changes sign.
(3) If any two rows are equal, d vanishes.
(4) If a multiple of one row is added to another row, the value of d is
unchanged.
(5) If two rows are interchanged d changes sign.
(6) If any row is zero13, d vanishes.
(7) On elementary n×n matrices14 E, the function d takes values as follows:
(a) d(E) = 1 if E is of the first kind.
(b) d(E) = −1 if E of the second kind.
(c) d(E) = c if E is of the third kind: E = In + (c − 1)eii .

13That is, the row vector with all entries 0.


14Notation warning: we considered earlier m × m elementary matrices E multiplying
m × n matrices A on the left. We are now considering n × n matrices A being multiplied
by n × n elementary matrices E.

(8) d(EA) = d(E)d(A) if E is an n×n elementary matrix and A is arbitrary.


(9) If A is in row-echelon form, then either
(a) A is invertible and d(A) = 1, or
(b) A is not invertible and d(A) = 0

Proof. We take up the statements one by one.


(1) The equality below holds by part (2) of the above Theorem and the
second term on the right vanishes by part (3):
d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_j + \lambda'\vec{R}_{j+1} \\ \vec{R}_{j+1} \\ \vdots \\ \vec{R}_n \end{bmatrix} = d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_j \\ \vec{R}_{j+1} \\ \vdots \\ \vec{R}_n \end{bmatrix} + \lambda'\, d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_{j+1} \\ \vec{R}_{j+1} \\ \vdots \\ \vec{R}_n \end{bmatrix}

(2) Again, the equalities below hold by part (2) of the above Theorem, and the final expression vanishes by part (3):

d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_j \\ \vec{R}_{j+1} \\ \vdots \\ \vec{R}_n \end{bmatrix} + d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_{j+1} \\ \vec{R}_j \\ \vdots \\ \vec{R}_n \end{bmatrix} = d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_j + \vec{R}_{j+1} \\ \vec{R}_{j+1} \\ \vdots \\ \vec{R}_n \end{bmatrix} + d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_{j+1} + \vec{R}_j \\ \vec{R}_j \\ \vdots \\ \vec{R}_n \end{bmatrix} = d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_{j+1} + \vec{R}_j \\ \vec{R}_{j+1} + \vec{R}_j \\ \vdots \\ \vec{R}_n \end{bmatrix} = 0

(3) We can move the identical rows together by interchanging one of them
repeatedly with an adjacent one, each time changing the sign of d. Here is
the first step:
d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R} \\ \vec{R}_{j+1} \\ \vdots \\ \vec{R} \\ \vdots \\ \vec{R}_n \end{bmatrix} = -\,d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_{j+1} \\ \vec{R} \\ \vdots \\ \vec{R} \\ \vdots \\ \vec{R}_n \end{bmatrix} \qquad (\text{the first } \vec{R} \text{ sits in the } j\text{-th place})
Eventually we have identical adjacent rows and part (3) of the Theorem applies.
(4) The equality below holds by part (2) of the above Theorem and the
second term on the right vanishes by part (3) of the Proposition, already
proved:

d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_j + \lambda\vec{R}_k \\ \vdots \\ \vec{R}_k \\ \vdots \\ \vec{R}_n \end{bmatrix} = d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_j \\ \vdots \\ \vec{R}_k \\ \vdots \\ \vec{R}_n \end{bmatrix} + \lambda\, d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_k \\ \vdots \\ \vec{R}_k \\ \vdots \\ \vec{R}_n \end{bmatrix} = d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_j \\ \vdots \\ \vec{R}_k \\ \vdots \\ \vec{R}_n \end{bmatrix}

(5) Use part (2) of the proposition. Suppose row j and row k are inter-
changed, with k > j. Move the vector in the k th row up successively past
each row till it is just below the j th row, and then move the vector in the
j th row down to the k th place. Check that an odd number of exchanges are
involved. Then use part (2) of the Proposition.
(6) We have
d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{0} \\ \vec{R}_j \\ \vdots \\ \vec{R}_n \end{bmatrix} = d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{0} + \vec{R}_j \\ \vec{R}_j \\ \vdots \\ \vec{R}_n \end{bmatrix} = d\begin{bmatrix} \vec{R}_1 \\ \vdots \\ \vec{R}_j \\ \vec{R}_j \\ \vdots \\ \vec{R}_n \end{bmatrix} = 0

(7) If E is of the first kind, E = In + a eij , then E is got from In by adding


a times the j th row to the ith row. So by part (4) of the Proposition, already
proved, d(E) = d(In ) = 1.
If E is of the second kind, E = In +eij +eji −eii −ejj , then E is got from I by
interchanging the j th row and the ith row. So by part (5) of the Proposition,
already proved, d(E) = −d(In ) = −1.
If E is of the third kind, E = In + (c − 1)eii , then E is got from In by
multiplying the ith row by c, so by part (2) of the Theorem d(E) = c.
(8) Using part (7) (already proved) we have to verify that
(a) d(EA) = d(A) if E is of the first kind.
(b) d(EA) = −d(A) if E of the second kind.
(c) d(EA) = cd(A) if E is of the third kind: E = In + (c − 1)eii .
Using Proposition 7.1, statement (a) is equivalent to part (4), and statement (b) to part (5). The last statement (c) follows from part (2) of the Theorem.

(9) This follows from part (6) of the Proposition and the fact (already stated
in an Exercise) that if a n×n matrix A is invertible and in row-echelon form,
then A = In .

Corollary 8.4. The determinant is characterised by the properties (1)-(3)
listed in Theorem 8.2.

Proof. Given any A, there are elementary matrices Ek , . . . , E1 and a matrix


A′ in row-echelon form such that
A′ = Ek . . . E1 A
Repeatedly using part (8) of the Proposition, we see that
det A′ = det Ek × det Ek−1 × ⋅ ⋅ ⋅ × det E1 × det A
On the other hand, the previous proposition determines the value of det on
elementary matrices and on matrices in row-echelon form. 

The same reasoning, together with part (9) of the Proposition, gives:
Corollary 8.5. A square matrix A is invertible iff det A ≠ 0.

The following is immediate


Corollary 8.6. The determinant of a square matrix satisfies the following
properties:
(1) It is unchanged if a multiple of one row is added to another row.
(2) It vanishes if any two rows are equal.
(3) It changes sign if two rows are interchanged.
(4) It vanishes if any row is zero.

In addition, we have the important


Theorem 8.7. The determinant is multiplicative. That is,
det AB = det A × det B

Proof. In contrast to the proof in Artin, ours will mix “abstract” and “matrix-
theoretic arguments”. We consider two cases:
(1) Either det A = 0 or det B = 0. Suppose det A = 0. Then A is not
invertible so nor is AB. This forces det AB = 0.

(2) If det A ≠ 0 and det B ≠ 0, both A and B can be written as products of


elementary matrices:
A = E1′ E2′ . . . Ek′
B = E1′′ E2′′ . . . El′′
so
det AB = det (E1′ E2′ . . . Ek′ E1′′ E2′′ . . . El′′ )
= det E1′ × det E2′ × ⋅ ⋅ ⋅ × det Ek′ × det E1′′ × det E2′′ × ⋅ ⋅ ⋅ × det El′′
= (det E1′ × det E2′ × ⋅ ⋅ ⋅ × det Ek′ ) × (det E1′′ × det E2′′ × ⋅ ⋅ ⋅ × det El′′ )
= det A × det B



9. Linear maps from a vector space to itself

Let V be a finite-dimensional k-vector space, and set dim V = n. We will


be considering the space L(V, V ) of linear maps from V to itself. In alge-
braic parlance, such maps are called endomorphisms of V . In more analytic
contexts these are called (linear) operators.
Given a linear map  ∶ V → V , we will define the notions of eigenvector and
eigenvalue as well as characteristic polynomial p .
In this section, we break with our habit so far, and denote by k the field of
scalars. If you are not familiar with fields other than Q, R and C, you can
just keep these in mind and remember that C is algebraically closed15 but
Q, R are not. If you do want to keep the general case in mind, here are
caveats:
(1) Be warned that the definition of polynomial that we use will not
work with the finite fields.
(2) We will prove that the set of eigenvalues of  can be identified16
with the set of roots of p that lie in k. In case k is not algebraically
closed, what do the roots of a characteristic polynomial in an alge-
braic closure k̄ correspond to? We will address this question for the
case k = R, k̄ = C.

9.1. Determinant of a linear map. We have seen that the space L(V, V )
of linear maps T ∶ V → V is a vector space. It is also a ring with identity,
with multiplication being given by composition, and the identity map IdV
being the identity of the ring. In other words, S ○ IdV = IdV ○ S = S for any
S ∈ L(V, V ). Multiplication is associative since S ○ (T ○ U ) = (S ○ T ) ○ U but
is not commutative unless dim V = 1.
If all this sounds familiar, it is because we have seen another version in matrix
language. Namely, the space of n × n matrices Mn×n is a ring with iden-
tity In , with matrix multiplication defining the product. (In this section,
we consider matrices with entries from k so that Mn×n is short-hand for
Mn×n (k).) The two rings are isomorphic, but not “canonically”, because
the isomorphism depends on a choice. We will now make this explicit be-
cause it is important to be clear about this, and for the immediate purpose
of defining the determinant of any  ∈ L(V, V ).
The choice we have to make is that of an ordered basis B = (⃗ e1 , . . . , e⃗n ) for
V . Once we do this, we can associate to each  ∈ L(V, V ) a matrix A as we

15In other words, a non-constant polynomial function (the term will be defined later)
p ∶ C → C necessarily has a root, i.e., must vanish somewhere.
16This is not by definition, but a fact to be proved.

did in §6.3: define scalars aij , i = 1, . . . , n and j = 1, . . . , n by


\hat{A}(\vec{e}_j) = \sum_{i=1}^{n} a_{ij}\,\vec{e}_i

These scalars are determined by  and in turn determine Â, just as any
linear map from column vectors to column vectors is given by a unique
matrix.
Let us make the connection between A and  precise as follows.
The basis B defines an isomorphism φB ∶ Mn×1 → V :
\varphi_B\!\left( \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \right) = \sum_{i=1}^{n} v_i\,\vec{e}_i
Note that φB is an isomorphism of vector spaces, and not the isomorphism
of algebras L(V, V ) → Mn×n that we seek. We have
\varphi_B\!\left( A \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \right) = \varphi_B\!\left( \begin{bmatrix} \sum_j a_{1j}v_j \\ \sum_j a_{2j}v_j \\ \vdots \\ \sum_j a_{nj}v_j \end{bmatrix} \right) = \sum_i \sum_j a_{ij}v_j\,\vec{e}_i = \sum_j v_j \sum_i a_{ij}\,\vec{e}_i = \sum_j v_j\,\hat{A}(\vec{e}_j) = \hat{A}\Big(\sum_j v_j\,\vec{e}_j\Big) = \hat{A}\!\left( \varphi_B\!\left( \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \right) \right)
If we let v denote a column vector, we can write this in more compact form:
φB (Av) = Â(φB (v))
In other words, A is the unique matrix such that

A v = \varphi_B^{-1}\circ\hat{A}\circ\varphi_B(v), \qquad v \in M_{n\times 1}

Exercise Check that the map  ↦ A is indeed an isomorphism of algebras.


In other words,
B̂ ○ Â ↦ BA
B̂ + Â ↦ B + A

If we choose a different basis B′ = (e⃗′1 , . . . , e⃗′n ), we have again

φB′ (A′ v) = Â(φB′ (v))

From which we conclude

\varphi_B^{-1}(\varphi_{B'}(A'v)) = \varphi_B^{-1}\big(\hat{A}(\varphi_{B'}(v))\big) = \varphi_B^{-1}\big(\hat{A}(\varphi_B \circ \varphi_B^{-1} \circ \varphi_{B'}(v))\big) = A\,\varphi_B^{-1}\circ\varphi_{B'}(v)

Now φB−1 ○ φB′ ∶ Mn×1 → Mn×1 is given by a matrix TB,B′ , so we have

T_{B,B'}\,A' = A\,T_{B,B'}

where the product is matrix multiplication. Since φB−1 ○ φB′ is an isomorphism, the matrix TB,B′ is invertible (in fact the inverse is TB′,B ), so we can write the above equation in the form:

A' = T_{B,B'}^{-1}\,A\,T_{B,B'}

from which we see that


det A′ = det A
so the following definition makes sense.
Definition 9.1. We define the determinant det Â of a linear map Â ∶ V → V
to be the determinant det A of the matrix A associated to any choice of
ordered basis as above.

Exercise: det  ○ B̂ = det Â × det B̂.


Exercise: (Similarity of matrices) Two n × n matrices A, B are said to be
similar if there is an invertible matrix T such that B = T −1 AT . Prove that
similarity is an equivalence relation on the set of n × n matrices. Note that
above we used the fact that if A and B are similar, det A = det B.
Exercise: (Determining TB,B′ explicitly.) Let B = (e⃗1 , . . . , e⃗n ) and B′ = (e⃗′1 , . . . , e⃗′n ) be two ordered bases of an n-dimensional vector space V . We

defined TB,B′ to be the matrix such that for any v⃗ ∈ Mn×1 , we have
\varphi_B^{-1}\circ\varphi_{B'}(\vec{v}) = T_{B,B'}\,\vec{v}
Equivalently,
φB′ (⃗v ) = φB (TB,B′ v⃗)
If we take v⃗ = v⃗j , the column vector with 1 in the j th place, this gives:
\vec{e}'_j = \varphi_{B'}(\vec{v}_j) = \varphi_B\!\left( \begin{bmatrix} t_{1j} \\ \vdots \\ t_{ij} \\ \vdots \\ t_{nj} \end{bmatrix} \right) = \sum_i t_{ij}\,\varphi_B(\vec{v}_i) = \sum_i t_{ij}\,\vec{e}_i
where tij is the (ij)th entry of TB,B′ . Thus TB,B′ is the matrix that implements
the “change of basis from B to B ′ ”.

Since the determinant is such an important concept, we close with a


Summary:
(1) The determinant det A of a n × n matrix A satisfies
● det AB = det A × det B,
● det A ≠ 0 iff A is invertible.
● det In = 1
(2) If V is a vector space with dim V = n, the determinant det  of a linear
map  ∶ V → V is defined by
det Â = det A
where A is the matrix representation of  corresponding to any basis of V .
(3) The determinant det  of a linear map  ∶ V → V satisfies
● det  ○ B̂ = det Â × det B̂,
● det  ≠ 0 iff  is invertible.
● det IV = 1
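As a small numerical illustration of this summary (mine, not from the text; it assumes Python with numpy), one can check multiplicativity, det In = 1, and the fact — used in Definition 9.1 — that similar matrices have equal determinant:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 4))
    B = rng.normal(size=(4, 4))
    T = rng.normal(size=(4, 4))           # generically invertible

    assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
    assert np.isclose(np.linalg.det(np.eye(4)), 1.0)

    # Similar matrices T^{-1} A T have the same determinant, which is why
    # det(A-hat) does not depend on the choice of ordered basis.
    A_prime = np.linalg.inv(T) @ A @ T
    assert np.isclose(np.linalg.det(A_prime), np.linalg.det(A))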

9.2. Polynomials. A polynomial “over” k of degree n ≥ 0 is a function


P ∶ k → k of the form:
P (x) = a0 + a1 x + ⋅ ⋅ ⋅ + an xn
where the ai are scalars (“constants”) and an ≠ 0. (Warning: if k is a finite
field, we need an alternate definition17, which we will not go into.) If a
nonzero function P is a polynomial, its degree and the coefficients ai , i =
1, . . . , degree(P ) are uniquely defined. See the Wikipedia article for the
definition of degree for the zero polynomial, which we will not need. A root
of a polynomial is an element µ ∈ k such that p(µ) = 0. A polynomial of
degree 1 has exactly one root. A polynomial of higher degree may or may
not have a root. For example a quadratic polynomial over R
R ∋ x ↦ ax2 + bx + c
where the coefficients a, b, c are real numbers, has a root in R iff b2 ≥ 4ac. If a nonzero polynomial does have a root µ, it factorises uniquely
P(x) = (x - \mu)^{\mathrm{mult}(P,\mu)}\, Q(x)

17This is because of examples such as P (x) = x^q − x, where q is a prime power. If the field is the finite field Fq , the corresponding function P ∶ Fq → Fq vanishes identically. For the same reason, when we are dealing with finite fields we cannot use the terms “nonconstant polynomial” and “polynomial with degree ≥ 1” interchangeably.

where Q is polynomial with degree Q = degree P −mult(P, µ) and Q(µ) ≠ 0.


Here mult(P, µ) is the multiplicity of P at µ. As a consequence, a polynomial
of degree n has at most n roots.
A field is algebraically closed iff every polynomial of positive degree has a
root.
A polynomial with real coefficients may or may not have a real root. The
field of complex numbers, on the other hand, is algebraically closed:
Theorem 9.2. A non-constant polynomial function p ∶ C → C has a root.
In other words, if P has degree ≥ 1, there exists µ ∈ C such that P (µ) = 0.

It is easy to see that a polynomial of degree n over C can be factorised


uniquely into linear factors:
P(x) = a_n \prod_{i=1}^{n} (x - \mu_i)
where {µ1 , . . . , µn } is the set of roots (not necessarily distinct) of P . A
polynomial over R, when regarded as polynomial over C, has a factorisation
P(x) = a_n \prod_{i=1}^{n_1} (x - \mu_i) \prod_{i=1}^{n_2} (x - \nu_i)(x - \bar{\nu}_i)
where
(1) µi , i = 1, . . . , n1 are the real roots (not necessarily distinct) of P ,
(2) νi , i = 1, . . . , n2 are the complex roots (not necessarily distinct) in
the upper half-plane (i.e., with positive imaginary parts),
(3) z̄ denotes the complex conjugate of a complex number z,
(4) n1 + 2n2 = n.

9.3. The characteristic polynomial of a n × n matrix; the character-


istic polynomial of a linear map  ∶ V → V .
Let n ≥ 1. We define the characteristic polynomial pA of a n × n matrix by
pA (t) = det (tIn − A)

Clearly
— if A = [a] is a 1 × 1 matrix we set pA (t) = t − a. Clearly pA has degree 1,
and one root a.
— If A is a 2 × 2 matrix,
a11 a12
A=[ ]
a21 a22
then pA (t) = t2 − (a11 + a22 )t + a11 a22 − a21 a12 . This has degree 2. If the field of
scalars is the reals this is a polynomial with real coefficients and may or may

not have a (real) root. If the field of scalars is complex pA (t) = (t−λ1 )(t−λ2 )
where λi are the roots. Note that pA can be written
pA (t) = t2 − tr(A)t + det A
where tr(A) is the trace of A, defined to be sum of the diagonal entries:
tr(A) = a11 + a22 .
— if n > 2, first prove (by induction on n) that given any n × n matrix B
(with (ij)th entry bij ), we have the “complete expansion of the determinant”
det B = ∑ sign(σ) × bσ(1)1 × ⋅ ⋅ ⋅ × bσ(n)n
σ

where the sum runs over all permutations σ of the set {1, . . . , n} and sign(σ) =
±1 is the signature of σ. If you are not familiar with permutations and the
notion of a signature, note that permutation of a set (by definition) is a
bijection of the set with itself, and prove the statement:
det B = ∑ ±bσ(1)1 × ⋅ ⋅ ⋅ × bσ(n)n .
σ

Then substitute B = tIn − A to conclude that pA (t) is a polynomial over k


(of degree n)
This justifies the name “characteristic polynomial”. One can prove that
pA (t) = tn − tr(A)tn−1 + ⋅ ⋅ ⋅ + (−1)n det A
where tr(A) is the sum of the diagonal entries of A.
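If I recall correctly, numpy's poly returns the coefficients of the characteristic polynomial of a square matrix, so the identities above can be checked numerically (an illustration of mine, not part of the notes):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(3, 3))
    n = A.shape[0]

    # Coefficients of p_A(t) = det(t I_n - A), highest power of t first.
    coeffs = np.poly(A)

    assert np.isclose(coeffs[0], 1.0)                            # leading coefficient
    assert np.isclose(coeffs[1], -np.trace(A))                   # coefficient of t^{n-1}
    assert np.isclose(coeffs[-1], (-1) ** n * np.linalg.det(A))  # constant term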
Let V be an n-dimensional vector space, with n ≥ 1. We define the charac-
teristic polynomial pÂ of a linear map Â ∶ V → V by

p_{\hat{A}}(t) \;\overset{\text{def}}{=}\; \det\big(t\,I_V - \hat{A}\big).

Note that if a basis for V is chosen, and the matrix corresponding to Â is A, then

p_{\hat{A}}(t) = \det(t\,I_n - A) = p_A(t).

Correction: In an earlier version of these notes, I wrote:


“— if n > 2, expand det(tIn − A) by minors of the first column:
det (tIn − A) = (t − a11 )det (tIn−1 − A−11 ) + a21 det ((tIn−1 − A−21 ) + . . .
⋅ ⋅ ⋅ − (−1)n an1 det (tIn−1 − A−n1 )
= (t − a11 )pA−11 (t) + a22 pA−21 (t) + ⋅ ⋅ ⋅ − (−1)n an1 pA−n1 (t)
to see inductively that pA (t) is indeed a polynomial over k (of degree n).”
The RHS is incorrect, the error being in the replacements (as p runs over
1, . . . , n):
(tIn − A)−p1 ↝ (tIn−1 − A−p1 )

9.4. Eigenvalues and eigenvectors. Let  ∶ V → V be a linear map,


with V finite-dimensional. The significance of a root of the characteristic
polynomial is this:

Theorem 9.3. Let λ0 ∈ k. The following are equivalent:


(1) p (λ0 ) = 0.

(2) ker(λ0 IV − Â) ≠ {0V }.

(3) There exists a nonzero vector v0 ∈ V such that Âv0 = λ0 v0 .

The proof is immediate. We have p (λ0 ) = det (λ0 IV − Â) = 0 iff the map
λ0 IV − Â ∶ V → V is not invertible. This holds18 iff ker(λ0 IV − Â) ≠ {0V }.
This in turn means precisely that there exists a nonzero vector v0 ∈ V such
that Âv0 = λ0 v0 .
This brings us to the important

Definition 9.4. A scalar λ0 is an eigenvalue of a linear map  ∶ V → V (with


V finite-dimensional) if there exists a nonzero vector v0 such that Âv0 = λ0 v0 .
Such a nonzero vector is called an eigenvector of  corresponding to λ0 .
Given an eigenvalue λ0 , the subspace

V_{\lambda_0} \;\overset{\text{def}}{=}\; \{\vec{v} \in V \mid \hat{A}\vec{v} = \lambda_0\vec{v}\} = \ker(\lambda_0 I_V - \hat{A})

is called the eigenspace corresponding to λ0 .

We can now rephrase the previous theorem: a scalar λ0 ∈ k is an eigenvalue


of a linear map  iff pA (λ0 ) = 0.
Important Remarks:

(1) The definition of Vλ0 makes sense for any λ0 . The point is that when λ0
is an eigenvalue the subspace is positive-dimensional (contains nonzero vec-
tors) and in that case, any nonzero vector belonging to Vλ0 is an eigenvector
corresponding to λ0 .
(2) Note that Vλ0 consists of eigenvectors together with the zero vector 0V .
The set of eigenvectors by itself is not a subspace.
(3) It is the eigenvector that is required to be nonzero. The eigenvalue
might well be zero.

18We are using the fact that a linear map from a finite-dimensional vector space to
itself is bijective iff it is injective.
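For a concrete (and entirely routine) illustration of the definition, here is a small Python/numpy computation of mine; numpy's eig returns the eigenvalues together with a matrix whose columns are corresponding eigenvectors:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])       # p_A(t) = (t - 2)(t - 3)

    eigvals, eigvecs = np.linalg.eig(A)

    for lam, v in zip(eigvals, eigvecs.T):
        # v is a nonzero vector with A v = lambda v, i.e. an eigenvector.
        assert np.allclose(A @ v, lam * v)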

9.5. Important generalities. Before we come to examples, we record a


number of important facts.
Proposition 9.5. Let  ∶ V → V be a linear map. If v⃗1 , . . . , v⃗l are eigenvec-
v1 , . . . , v⃗l }
tors corresponding to pairwise distinct eigenvalues λ1 . . . , λl , then {⃗
is linearly independent.

Proof. If l = 1, the statement is true since the eigenvector v⃗1 is nonzero by


definition. We will now do an induction on l.
Let l > 1 and suppose c1 , . . . , cl are scalars such that
c1 v⃗1 + ⋅ ⋅ ⋅ + cl v⃗l = 0
E1

Applying  to the above equation, we get


c1 λ1 v⃗1 + ⋅ ⋅ ⋅ + cl λl v⃗l = 0
E2

If λl = 0, the scalars c1 λ1 , . . . , cl−1 λl−1 vanish by induction. Since λ1 , . . . , λl−1


are distinct from λl = 0, they are nonzero, and so c1 , . . . , cl−1 are all zero.
Then (E1) forces cl = 0.
If λl ≠ 0, divide equation (E2) by λl and subtract from (E1) to get

c_1\Big(1 - \frac{\lambda_1}{\lambda_l}\Big)\vec{v}_1 + \cdots + c_{l-1}\Big(1 - \frac{\lambda_{l-1}}{\lambda_l}\Big)\vec{v}_{l-1} = 0

The scalars c_1(1 - \lambda_1/\lambda_l), \dots, c_{l-1}(1 - \lambda_{l-1}/\lambda_l) vanish by induction, and as before this forces c1 = ⋅ ⋅ ⋅ = cl = 0.

Here is an immediate
Corollary 9.6. Let  ∶ V → V be a linear map. If dim V = n and v⃗1 , . . . , v⃗n
are eigenvectors corresponding to pairwise distinct eigenvalues λ1 . . . , λn ,
v1 , . . . , v⃗n ) is an ordered basis for V consisting of eigenvectors. The
then (⃗
matrix A with respect to this ordered basis is
A = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{bmatrix}

Here is another important consequence of Proposition 9.5.


Corollary 9.7. Let  ∶ V → V be a linear map. Let λ1 . . . , λl be pairwise
distinct eigenvalues (not necessarily all) of Â. Let V ′ ⊂ V be the subspace
generated by the corresponding eigenvectors. Then
V ′ = ⊕li=1 Vλi

where Vλ1 , . . . , Vλl are eigenspaces corresponding to λ1 . . . , λl . In other words,


every vector in V that can be written as a linear combination of eigenvectors
with eigenvalues in the set {λ1 . . . , λl } can be so written in a unique way.

Note that in the above Corollary, the subspace V ′ is invariant under the
linear map Â. That is, Â(⃗ v ′ ) ∈ V ′ if v⃗′ ∈ V ′ . We will see below that even if
the set {λ1 . . . , λl } contains all eigenvalues of Â, it can happen that V ′ is a
proper subspace of V .
Definition 9.8. We say that a linear map Â ∶ V → V is diagonalisable if
V = ⊕ri=1 Vλi
where {λ1 . . . , λr } is the set of all eigenvalues of Â.

The terminology will be explained later.



10. Examples

We consider a number of examples, but first, to reiterate: the character-


istic polynomial is a polynomial with coefficients belonging to k, and the
eigenvalues are the roots of p in k. Two questions to keep in mind:

(1) what does the multiplicity of a root signify?

(2) in the case k = R what is the significance of a complex number z


with nonzero imaginary part such that p (z) = 0?

(1) If V is one-dimensional any linear map V → V is given by multiplication


by a scalar c, which is the only eigenvalue. Then V = Vc and any nonzero
vector is an eigenvector. Thus V has a basis consisting of any eigenvector.

(2) Let λ1 , . . . , λr be distinct scalars, m1 . . . , mr positive integers, and let


m1 + ⋅ ⋅ ⋅ + mr = n. For i = 1, . . . , r, let Ai be the mi × mi “diagonal matrix”
(i.e., a matrix with “off-diagonal” entries zero):

⎡ λi . . 0 ⎤
⎢ ⎥
⎢ ⎥
⎢ . λi . . ⎥
⎢ ⎥
Ai = ⎢ . . . . ⎥
⎢ ⎥
⎢ . . . . ⎥
⎢ ⎥
⎢ 0 . . λi ⎥
⎣ ⎦

and A the n × n matrix with the block structure:

⎡ A1 . . 0 ⎤⎥

⎢ . . ⎥⎥
⎢ . A2
⎢ ⎥
A=⎢ . . . . ⎥
⎢ ⎥
⎢ . . . . ⎥⎥

⎢ 0 . . Ar ⎥⎦

It is easy to check that pA (t) = ∏i (t − λi )mi , and that the standard basis of
the space of column vectors is a set of eigenvectors. Specifically,

A\,\vec{e}_j = \lambda_{i(j)}\,\vec{e}_j \qquad \text{(here } \vec{e}_j \text{ is the standard column vector with 1 in the } j\text{-th place)}

provided \sum_{i=1}^{i(j)-1} m_i < j \le \sum_{i=1}^{i(j)} m_i . You can check that
V_{\lambda_i} = \left\{ \vec{v} \in M_{n\times 1} \;\middle|\; \text{the only (possibly) nonzero entries of } \vec{v} \text{ are those } v_j \text{ with } \textstyle\sum_{k=1}^{i-1} m_k < j \le \sum_{k=1}^{i} m_k \right\}
This shows that the hypothesis of distinct eigenvalues in the statement of
Corollary 9.6 is sufficient but not necessary.
What if dim V = n > 1 and  ∶ V → V is such that p (t) = (t − λ)n ?

(3) Let us consider the simplest case n = 2. Since det (tIV − Â) = 0 when
t = λ, the map λIV − Â ∶ V → V has a nontrivial kernel. We have two
possibilities:
– (a) dim ker(λIV − Â) = 2, in which case Â = λIV , and
– (b) dim ker(λIV − Â) = 1 which case we analyse in detail now. There is
a nonzero vector v⃗1 (unique up to multiplication by a nonzero scalar) such
v1 = λ⃗
that Â⃗ v1 , v⃗2 } of V . We have
v1 . Complete to a basis {⃗
\hat{A}(\vec{v}_2) = a\,\vec{v}_2 + b\,\vec{v}_1
for some scalars a, b. The matrix of Â w.r.to the ordered basis (v⃗1 , v⃗2 ) is

A = \begin{bmatrix} \lambda & b \\ 0 & a \end{bmatrix}
The characteristic polynomial is

p_{\hat{A}}(t) = p_A(t) = \det \begin{bmatrix} t-\lambda & -b \\ 0 & t-a \end{bmatrix} = (t-\lambda)(t-a)

By assumption p (t) = (t − λ)2 and this forces a = λ. Now b = 0 would


contradict dim ker(λIV − Â) = 1. What if b ≠ 0?

\hat{A}(v_1\vec{v}_1 + v_2\vec{v}_2) = (\lambda v_1 + b v_2)\,\vec{v}_1 + \lambda v_2\,\vec{v}_2
so that v1 v⃗1 + v2 v⃗2 ∈ ker(λIV − Â) ⇐⇒ bv2 = 0 ⇐⇒ v2 = 0. This shows
that b ≠ 0 is necessary and sufficient for dim ker(λIV − Â) = 1.
Note that if we replace the vector v⃗1 by u v⃗1 , where u is a nonzero scalar, this has the effect of replacing b by b/u, since

\hat{A}(\vec{v}_2) = \lambda\,\vec{v}_2 + b\,\vec{v}_1 = \lambda\,\vec{v}_2 + \frac{b}{u}\,(u\vec{v}_1)

So in effect we only have to distinguish the cases when b is zero or nonzero.


If b ≠ 0 we can choose v⃗1 in such a way that b = 1, so that w.r.to this basis
λ 1
A=[ ]
0 λ
This is an example of the Jordan canonical form, which we will study later.
For future use, note that in case (a) λIV − Â = 0 and in case (b) λIV − Â ≠ 0
but (λIV − Â)2 = 0. (Here (λIV − Â)2 = (λIV − Â) ○ (λIV − Â).)
def

(4) Exercise: What happens if n = 3? Prove that exactly one of the following
holds:
(a) dim ker(λIV − Â) = 3. W.r.to any basis {v⃗1 , v⃗2 , v⃗3 } the matrix of Â is
⎡ λ 0 0 ⎤⎥

⎢ ⎥
A=⎢ 0 λ 0 ⎥
⎢ ⎥
⎢ 0 0 λ ⎥⎦

and λIV − Â = 0.
(b) dim ker(λIV − Â) = 2. There exists a basis {v⃗1 , v⃗2 , v⃗3 } w.r.to which the matrix of Â is
A = \begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix}
and (λIV − Â)2 = 0, but λIV − Â ≠ 0.
(c) dim ker(λIV − Â) = 1. There exists a basis {v⃗1 , v⃗2 , v⃗3 } w.r.to which the matrix of Â is
A = \begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix}
and (λIV − Â)3 = 0, but (λIV − Â)2 ≠ 0.
(5) For 0 ≤ θ < 2π let
cos θ − sin θ
A=[ ]
sin θ cos θ
The corresponding linear map Â ∶ R2 → R2 “rotates a vector by angle θ in the anti-clockwise direction”. From this description it is clear that Â has no eigenvalues and eigenvectors unless θ = 0 or θ = π, i.e., A = ±I2 . This is
consistent with the fact that the corresponding characteristic polynomial

p_{\hat{A}}(t) = t^2 - 2\cos\theta\, t + 1

has no (real) roots, since 4 cos2 θ − 4 < 0 unless cos θ = ±1.

(6) Let A be as above and consider this time the corresponding map  ∶
C2 → C2 . This has the same characteristic polynomial, but if θ ≠ 0, π we
can still consider the complex roots cos θ ± i sin θ = e±iθ . Since these are
distinct corresponding eigenvectors will be a basis for C2 . Let us find the
eigenvectors in turn. Writing out Â(v⃗) = e^{iθ} v⃗, we get

\begin{bmatrix} \cos\theta\, v_1 - \sin\theta\, v_2 \\ \sin\theta\, v_1 + \cos\theta\, v_2 \end{bmatrix} = e^{i\theta} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}
Equating components we get
cos θ v1 − sin θ v2 = eiθ v1
sin θ v1 + cos θ v2 = eiθ v2
both of which can be solved by
\vec{v} = \begin{bmatrix} 1 \\ -i \end{bmatrix}
We can similarly find an eigenvector with eigenvalue e−iθ . In conclusion, we
have
A \begin{bmatrix} 1 \\ \mp i \end{bmatrix} = e^{\pm i\theta} \begin{bmatrix} 1 \\ \mp i \end{bmatrix}

(7) Nilpotent linear maps: Let V be a k-vector space, of dimension n > 0.


Let N̂ ∶ V → V be a linear map such that N̂ l = 0 for some l ≥ 1, where

\hat{N}^l \;\overset{\text{def}}{=}\; \underbrace{\hat{N} \circ \cdots \circ \hat{N}}_{l \text{ times}}

Such a linear map is called nilpotent. If v⃗ ≠ 0 and N̂ v⃗ = λ⃗


v , we have
by induction, N̂ l v⃗ = λl v⃗, and we conclude: the only eigenvalue of a
nilpotent map is 0.
First consider a linear (but not necessarily nilpotent) map  ∶ V → V .
Consider the decreasing family of subspaces of V :
V ⊃ Â(V ) ⊃ Â2 (V ) ⋅ ⋅ ⋅ ⊃ Âk (V ) . . .
where Â(V ) ≡ image(Â). If  is surjective it is a bijection, and we have
V = Â(V ) = Â2 (V ) ⋅ ⋅ ⋅ = Âk (V ) . . .

Let us now assume that  is not surjective. Since V is finite-dimensional,


it is still true that Âk+1 (V ) = Âk (V ) for some k, and
Âk+2 (V ) = Â(Âk+1 (V )) = Â(Âk (V )) = Âk+1 (V )
and so on. So we conclude that there exists a k0 (depending on Â) such that
V ⊋ Â(V ) ⊋ Â2 (V ) ⋅ ⋅ ⋅ ⊋ Âk0 −1 (V ) ⊋ Âk0 (V ) = Âk0 +1 (V ) = . . .

In other words, the nested subspaces Â(Âk (V )) are strictly decreasing in


dimension up to k = k0 and are equal from then on.

Now if  = N̂ , with N̂ nilpotent, it is clear that

V ⊋ N̂ (V ) ⊋ N̂ 2 (V ) ⋅ ⋅ ⋅ ⊋ N̂ k0 −1 (V ) ⊋ {0V } = N̂ k0 (V ) = N̂ k0 +1 (V ) = . . .

Let dk = dim N̂ k (V ); we have

n > d1 > ⋅ ⋅ ⋅ > dk0 −1 > 0 = dk0

which yields n ≥ d1 + 1 ≥ d2 + 2 ≥ ⋅ ⋅ ⋅ ≥ dk0 −1 + k0 − 1 ≥ k0 . In other words, if N̂ is nilpotent and nonzero, there exists 1 < k0 ≤ n = dim V such that
N̂ k0 −1 (V ) ≠ 0 and N̂ k0 (V ) = 0.

Exercise: (Suppose N̂ is nonzero.) Let k0 be defined as above. Show that N̂ k0 −1 (V ) is contained in the eigenspace V0 corresponding to the (only) eigenvalue 0, and show by example that in general

\hat{N}^{k_0-1}(V) \subsetneq V_0

In the class I asserted without thinking and wrongly that N̂ k0 −1 (V ) = V0 .


Choose an ordered basis for V as follows: first choose a basis e⃗1 , . . . , e⃗dk0 −1 for
N̂ k0 −1 (V ), then extend this to a basis for N̂ k0 −2 (V ) ⊃ N̂ k0 −1 (V ) by adding
vectors e⃗dk0 −1 +1 , . . . , e⃗dk0 −2 , and so on. With this choice, we have

\hat{N}(\vec{e}_i) = 0, \quad i = 1, \dots, d_{k_0-1};
\hat{N}(\vec{e}_i) \in \langle \vec{e}_1, \dots, \vec{e}_{d_{k_0-1}} \rangle, \quad i = d_{k_0-1}+1, \dots, d_{k_0-2};
\text{etc.}

so that the matrix N of N̂ has the form


⎡ 0 0 0 x x x x x x x . . . . ⎤
⎢ ⎥
⎢ ⎥
⎢ 0 0 0 x x x x x x x . . . . ⎥
⎢ ⎥
⎢ 0 0 0 x x x x x x x . . . . ⎥
⎢ ⎥
⎢ 0 0 0 0 0 x x x x x . . . . ⎥
⎢ ⎥
⎢ 0 0 0 0 0 x x x x x . . . . ⎥
⎢ ⎥
⎢ ⎥
⎢ 0 0 0 0 0 0 0 0 0 0 . . . . ⎥
⎢ ⎥
⎢ 0 0 0 0 0 0 0 0 0 0 . . . . ⎥
⎢ ⎥
⎢ 0 0 0 0 0 0 0 0 0 0 . . . . ⎥
⎢ ⎥
⎢ 0 0 0 0 0 0 0 0 0 0 . . . . ⎥
⎢ ⎥
⎢ ⎥
⎢ 0 0 0 0 0 0 0 0 0 0 . . . . ⎥
⎢ ⎥
⎢ 0 0 0 0 0 0 0 0 0 0 . . . . ⎥
⎢ ⎥
⎢ 0 0 0 0 0 0 0 0 0 0 . . . . ⎥
⎢ ⎥
⎢ 0 0 0 0 0 0 0 0 0 0 . . . . ⎥
⎢ ⎥
⎢ ⎥
⎣ 0 0 0 0 0 0 0 0 0 0 . . . . ⎦

In other words, N is “strictly upper triangular”. i.e., has zeros along the
diagonal and below the diagonal. Then it is an easy matter to check that
p_{\hat{N}}(t) \;\overset{\text{def}}{=}\; p_N(t) = t^n
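A quick numerical illustration (mine; it assumes Python with numpy, and uses np.poly for the characteristic polynomial coefficients):

    import numpy as np

    # A strictly upper triangular matrix: zeros along and below the diagonal.
    N = np.array([[0.0, 1.0, 2.0],
                  [0.0, 0.0, 3.0],
                  [0.0, 0.0, 0.0]])

    # N is nilpotent: here N^3 = 0 (in general N^n = 0 for an n x n such matrix).
    assert np.allclose(np.linalg.matrix_power(N, 3), 0)

    # Its characteristic polynomial is t^3, i.e. coefficients [1, 0, 0, 0]
    # (up to rounding).
    print(np.poly(N))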

We will prove, under the hypothesis that k is algebraically closed, that


the converse also holds. (The restriction on k is not necessary, but the
assumption simplifies the proof.) That is, if Â ∶ V → V is such that

p_{\hat{A}}(t) = t^n
then  is nilpotent. To see this consider the nested sequence as above:
V ⊋ Â(V ) ⊋ Â2 (V ) ⋅ ⋅ ⋅ ⊋ Âk0 −1 (V ) ⊋ Âk0 (V ) = Âk0 +1 (V ) = . . .
and suppose V ′ ≡ Âk0 (V ) ≠ {0V }. Since V ′ is invariant under  and we are
working over an algebraically closed field, there is an eigenvector v⃗ ∈ Âk0 (V )
for the restricted map Â′ ≡ Â∣Âk0 (V ) . But this is an eigenvector for  on V ,
so the eigenvalue must be zero. This means that Â′ is not injective, so it
must follow (since dim V ′ is finite) that
Âk0 +1 (V ) = Â′ (V ′ ) ⊊ V ′ = Âk0 (V )
which is a contradiction.

11. Desperately seeking diagonalisation...and not quite


managing

Let V be a k-vector space, of dimension n. Recall that a linear map Â ∶ V → V is called diagonalisable if V is the direct sum of eigenspaces:

V = \bigoplus_{i=1}^{r} V_{\lambda_i}

Here {λ1 , . . . , λr } is the set of (distinct) eigenvalues of Â. If dim Vλi = mi


(so that n = ∑i mi ), it is clear that
p_{\hat{A}}(t) = \prod_{i=1}^{r} (t - \lambda_i)^{m_i}

so that λ1 , . . . , λr are all the eigenvalues of Â, and mi = mult(pÂ , λi ).

11.1. Diagonalisation of a square matrix. A n×n matrix A is said to be


diagonalisable if there exists an invertible matrix T such that T −1 AT = ∆
where ∆ is a diagonal matrix with diagonal entries µ1 , . . . , µi , . . . , µn and
other entries zero.. In this section we will see that this is equivalent to the
corresponding linear map  ∶ Mn×1 → Mn×1 being diagonalisable.
First, if A is diagonalisable, we can write
T = [t⃗1 . . . , t⃗i . . . , t⃗n ]

where t⃗1 , . . . , t⃗i , . . . , t⃗n are column vectors of length n. Then


AT = [A t⃗1 . . . A t⃗i . . . A t⃗n ] = T ∆ = [µ1 t⃗1 . . . µi t⃗i . . . µn t⃗n ]

This gives A t⃗i = µi t⃗i . In other words t⃗i is an eigenvector of Â with eigenvalue
µi .
Conversely, if  is diagonalisable, there exists an ordered basis (t⃗1 . . . , t⃗i . . . , t⃗n )
consisting of eigenvectors; let µ1 , . . . , µi , . . . , µn be the corresponding eigen-
values, and let ∆ be the diagonal matrix formed with the µi running down
the diagonal19. Let T = [t⃗1 . . . , t⃗i . . . , t⃗n ]. We have
AT = T ∆
Since T is invertible (why?) we have
T −1 AT = ∆
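Here is the same computation done numerically (my own sketch, assuming Python with numpy); the columns of T are the eigenvectors returned by np.linalg.eig:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])          # eigenvalues 1 and 3

    eigvals, T = np.linalg.eig(A)       # columns of T are eigenvectors of A
    Delta = np.linalg.inv(T) @ A @ T    # T^{-1} A T

    assert np.allclose(Delta, np.diag(eigvals))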

11.2. Generalised Eigenspace Decomposition (over algebraically closed


fields). What are the possible “obstructions” to diagonalisability of a linear
map Â?
Looking back at the examples, we see two kinds, of which the extremes are:
(1) k is not algebraically closed and the characteristic polynomial pÂ
has no roots in k,
(2) n > 1 and p (t) = (t − λ)n , but there is only one eigenvector vλ (upto
scaling by a nonzero scalar). In other words, dim Vλ = 1, so that V
is “as far away as possible from being a direct sum of eigenspaces”.
In general the situation is a mix of various intermediate possibilities.
In the rest of this section, we will assume that k is algebraically closed and state a general structure theorem for a linear map Â ∶ V → V , with dim V = n < ∞. The main part of the proof is in the next section.
Definition 11.1. For each eigenvalue λ of  define the corresponding gen-
eralised eigenspace Ṽλ by
\tilde{V}_\lambda \;\overset{\text{def}}{=}\; \{\vec{v} \in V \mid (\lambda\,\mathrm{Id}_V - \hat{A})^l\,\vec{v} = 0 \text{ for some } l > 0\}

(Note that the l in the definition could a priori depend on v⃗.)


Exercise: Check that
(1) Ṽλ is a linear subspace of V invariant under Â; it contains the
eigenspace Vλ .
(2) the map (λIdV − Â)∣Ṽλ ∶ Ṽλ → Ṽλ is nilpotent; as a consequence, if
v⃗ ∈ Ṽλ , then (λIdV − Â)l v⃗ = 0 provided l ≥ dim Ṽλ .
19Sometimes we write ∆ = diag(µ1 , . . . , µi , . . . , µn ).

Here is the promised structure theorem.


Theorem 11.2. Let V be a vector space over an algebraically closed field20
k, with dim V = n < ∞. Let Â ∶ V → V be a linear map. Then

V = \bigoplus_{i=1}^{r} \tilde{V}_{\lambda_i}

where λ1 , . . . , λr are the distinct eigenvalues of Â and Ṽλi is the gen-
eralised eigenspace corresponding to λi . (In other words, every v⃗ ∈ V can be
written uniquely as a sum ∑ri=1 v⃗i where v⃗i ∈ Ṽλi .)

We defer the proof to subsequent subsections. We draw two important


consequences, the first immediately below, and the next in the following
subsection.
As noted above in an exercise, the maps (λi IdV − Â)∣Ṽi are nilpotent on Ṽi .
Choose an ordered basis for each Ṽi such that the corresponding matrices are
upper triangular. Combine these to an ordered basis for V . With respect
to this basis, A is an n × n matrix with the block structure:
⎡ A1 . . . 0 ⎤⎥

⎢ . . ⎥⎥
⎢ . A2 .
⎢ ⎥
A=⎢ . . . . . ⎥
⎢ ⎥
⎢ . . . . . ⎥⎥

⎢ 0 . . . Ar ⎥⎦

where the mi × mi matrices Ai are upper triangular with λi along the diag-
onal.
⎡ λi x . . x ⎤
⎢ ⎥
⎢ ⎥
⎢ . λi . . x ⎥
⎢ ⎥
Ai = ⎢ . . . . . ⎥
⎢ ⎥
⎢ . . . λi x ⎥
⎢ ⎥
⎢ 0 . . 0 λi ⎥
⎣ ⎦
Corollary 11.3. We have
p_{\hat{A}}(t) = \prod_{i=1}^{r} (t - \lambda_i)^{m_i}

where mi ≡ dim Ṽλi .

11.3. Substituting a variable with a linear map; Cayley-Hamilton


Theorem. In this section k = R or k = C (Indeed, all we need is that you
have a definition of polynomial to hand.) Let P be a polynomial:
P (x) = am x^m + am−1 x^{m−1} + ⋅ ⋅ ⋅ + a0

20You can think C.



where the coefficients are elements of k. Let V be a finite-dimensional vector


space over k. We define a map L(V, V ) → L(V, V ) by

\hat{A} \mapsto P(\hat{A}) \;\overset{\text{def}}{=}\; a_m \hat{A}^m + a_{m-1}\hat{A}^{m-1} + \cdots + a_0 I_V

where Â2 = Â ○ Â, etc. Note that in general P (Â + B̂) ≠ P (Â) + P (B̂), and P (Â)P (B̂) (which by definition means P (Â) ○ P (B̂)) ≠ P (Â ○ B̂). But if P = QR, it is true that

P(\hat{A}) = Q(\hat{A})R(\hat{A}) = R(\hat{A})Q(\hat{A}).
We can now prove
Theorem 11.4. (Cayley-Hamilton Theorem) A linear map  ∶ V → V “sat-
isfies its own characteristic polynomial”. That is,
p_{\hat{A}}(\hat{A}) = 0

Proof. We need to prove that pÂ (Â)(v⃗) = 0V for any v⃗ ∈ V . Write v⃗ = v⃗1 + ⋅ ⋅ ⋅ + v⃗r with v⃗j ∈ Ṽj . We have

p_{\hat{A}}(\hat{A}) = \prod_i Q_i(\hat{A})

where Qi (t) = (t − λi )^{mi} . Acting on v⃗j , we get
p_{\hat{A}}(\hat{A})\,\vec{v}_j = \Big\{\prod_{i\neq j} Q_i(\hat{A})\Big\}\,Q_j(\hat{A})\,\vec{v}_j

On the other hand Â − λj IV is nilpotent on the mj -dimensional vector space Ṽj , so Qj (Â) v⃗j = (Â − λj IV )^{mj} v⃗j = 0.
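The theorem is easy to check numerically (a sketch of mine, not from the notes; it assumes numpy and uses np.poly for the coefficients of pA ):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(4, 4))
    n = A.shape[0]

    c = np.poly(A)    # [1, c_1, ..., c_n]:  p_A(t) = t^n + c_1 t^{n-1} + ... + c_n

    # Evaluate p_A(A) = A^n + c_1 A^{n-1} + ... + c_n I; it should be the zero matrix.
    P = np.zeros_like(A)
    for k, ck in enumerate(c):
        P += ck * np.linalg.matrix_power(A, n - k)

    assert np.allclose(P, np.zeros((n, n)))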

11.4. Proof of the linear independence of generalised eigenspaces.


Let us prove that “generalised eigenvectors” corresponding to distinct eigen-
values are linearly independent. More precisely
Proposition 11.5. Let l ≥ 1, and λ1 , . . . , λl be distinct eigenvalues (possibly
not all) of a linear map  ∶ V → V with V finite-dimensional. Let Ṽi
denote the corresponding generalised eigenspaces, and set mi = dim Ṽi . Let
v⃗i ∈ Ṽi , i = 1, . . . , l, and suppose
v⃗1 + ⋅ ⋅ ⋅ + v⃗l = 0
Then v⃗i = 0 ∀i.

Proof. Consider the linear maps


T_j : V \to V, \qquad T_j \;\overset{\text{def}}{=}\; \prod_{i=1,\,i\neq j}^{l} (\lambda_i I_V - \hat{A})^{m_i}

It is clear that Tj is zero on Ṽi , i ≠ j. I claim that Tj is invertible on Ṽj . Apply


Tj to the equation v⃗1 + ⋅ ⋅ ⋅ + v⃗l = 0 to get
Tj v⃗j = 0

which yields v⃗j = 0.


To see that Tj is invertible on Ṽj it is sufficient to see that for any i ≠ j the
linear map λi IV − Â∣Ṽj is invertible. But

λi IV − Â = (λi − λj )IdV + λj IdV − Â


= (λi − λj ){IdV + (λi − λj )−1 (λj IdV − Â)}

Restricted to Ṽj this is of the form


(nonzero scalar) × (IdṼj + nilpotent)
which is invertible.


11.5. Proof of the existence of generalised eigenspace decompo-


sition. As requested in class, I do this as a series of exercises. This is
essentially the proof in Halmos. Searching for a more modern account, I
came across this reference which you can consult if you get stuck:
https://people.eecs.berkeley.edu/~wkahan/MathH110/DownDets.pdf
(Although I am not a fan of Axler’s book, Linear Algebra Done Right, this
short account is pretty nice and you might enjoy reading it. The proof of
linear independence of generalised eigenvectors given there is exactly the
one we saw in the class!)
Exercise: Let V be finite-dimensional, and B̂ ∶ V → V a linear map.
(1) Consider the nested sequences of subspaces
kernel(B̂) ⊂ kernel(B̂ 2 ) ⊂ . . .
image(B̂) ⊃ image(B̂ 2 ) ⊃ . . .

where (as usual) B̂ 2 = B̂ ○ B̂, etc. Prove that both sequences stabilise even-
tually. That is, there exists i0 such that for i ≥ i0 , we have
\mathrm{kernel}(\hat{B}) \subset \mathrm{kernel}(\hat{B}^2) \subset \dots \subset \mathrm{kernel}(\hat{B}^i) = \mathrm{kernel}(\hat{B}^{i+1}) = \dots \;\overset{\text{def}}{=}\; K

\mathrm{image}(\hat{B}) \supset \mathrm{image}(\hat{B}^2) \supset \dots \supset \mathrm{image}(\hat{B}^i) = \mathrm{image}(\hat{B}^{i+1}) = \dots \;\overset{\text{def}}{=}\; I

(2) Prove that


V =K⊕I

(3) Both K and I are invariant under B̂.


(4) The map B̂∣K is nilpotent and the map B̂∣I is invertible.

Exercise: Let V be a finite-dimensional vector space over an algebraically


closed field, and  ∶ V → V a linear map. Let λ1 be an eigenvalue of Â.
Apply the previous exercise to the linear map B̂ = λ1 IdV − Â to conclude
that
V = Ṽ1 ⊕ V1′
where V1′ is invariant under Â, and B̂ is invertible on V1′ . Now do an induc-
tion on dim V to prove Theorem 11.2.

12. Real inner product spaces

In this section V will denote an n-dimensional vector space over R.

12.1. Real inner product. Preliminaries.


Definition 12.1. An inner product on V is a map
V × V → R,    (v⃗, w⃗) ↦ ⟨v⃗, w⃗⟩

which is

(1) bilinear,

⟨a1 v⃗1 + a2 v⃗2 , w⃗⟩ = a1 ⟨v⃗1 , w⃗⟩ + a2 ⟨v⃗2 , w⃗⟩, and
⟨v⃗, b1 w⃗1 + b2 w⃗2 ⟩ = b1 ⟨v⃗, w⃗1 ⟩ + b2 ⟨v⃗, w⃗2 ⟩,

(2) symmetric, i.e., ⟨v⃗, w⃗⟩ = ⟨w⃗, v⃗⟩,

(3) positive-definite, i.e., ⟨v⃗, v⃗⟩ > 0 unless v⃗ = 0V .
This is modelled on the dot-product of vectors in Rn :
\vec{v}\,\vec{w} \equiv \sum_{i=1}^{n} v_i w_i

where v⃗ = (v1 , . . . , vn ) and w⃗ = (w1 , . . . , wn ).
A vector space with an inner product is sometimes called an inner-product space.
Given an inner product, the length ∣∣v⃗∣∣ of a vector v⃗ is the (positive) square root: ∣∣v⃗∣∣ = √⟨v⃗, v⃗⟩. The function v⃗ ↦ ∣∣v⃗∣∣ from V → R≥0 obeys:
(1) ∣∣v⃗∣∣ > 0 unless v⃗ = 0V ,
(2) ∣∣a v⃗∣∣ = ∣a∣ ∣∣v⃗∣∣ for a ∈ R and v⃗ ∈ V ,
(3) ∣∣v⃗ + w⃗∣∣ ≤ ∣∣v⃗∣∣ + ∣∣w⃗∣∣ (the triangle inequality).
Such a function on a real vector space is called a norm. Property (3) follows
from the Cauchy-Schwarz inequality:
∣⟨v⃗, w⃗⟩∣ ≤ ∣∣v⃗∣∣ × ∣∣w⃗∣∣ ,

with equality iff v⃗, w⃗ are “co-linear”, that is, one is a scalar times the other.
To prove the Cauchy-Schwarz inequality, consider the function
t ↦ ⟨v⃗ + t w⃗, v⃗ + t w⃗⟩ = t2 ∣∣w⃗∣∣2 + 2t ⟨v⃗, w⃗⟩ + ∣∣v⃗∣∣2

This is a quadratic polynomial in t which has at most one real root since it never takes negative values. This forces

4 ∣⟨v⃗, w⃗⟩∣2 ≤ 4 ∣∣v⃗∣∣2 ∣∣w⃗∣∣2

which yields the desired inequality. It is also clear that in case of equality,
there is one root t0 at which we have v⃗ + t0 w
⃗ = 0.
Exercise: Given a norm ∣∣.∣∣ on a vector space V , define a function f ∶ V ×V →
R by
f (v⃗, w⃗) = (1/2) {∣∣v⃗ + w⃗∣∣2 − ∣∣v⃗∣∣2 − ∣∣w⃗∣∣2}

Check that this is an inner product iff the parallelogram law holds:

∣∣v⃗ + w⃗∣∣2 + ∣∣v⃗ − w⃗∣∣2 = 2 (∣∣v⃗∣∣2 + ∣∣w⃗∣∣2)

12.2. Orthonormal bases.


Theorem 12.2. Let V be an n-dimensional inner product space. Then it
has an ordered basis (u⃗1 , . . . , u⃗n ) such that ∣∣u⃗i ∣∣ = 1 for all i and ⟨u⃗i , u⃗j ⟩ = 0 if i ≠ j.
(Such a basis is called an orthonormal basis.)

Proof. We will construct an orthonormal basis starting from any given basis (v⃗1 , . . . , v⃗n ) by the Gram-Schmidt orthogonalisation process.
For i = 1, . . . , n, let Vi ⊂ V be the span of {v⃗1 , . . . , v⃗i }. We have

{0V } ⊊ V1 ⊊ ⋅ ⋅ ⋅ ⊊ Vi−1 ⊊ Vi ⊊ ⋅ ⋅ ⋅ ⊊ Vn ≡ V
Set
\vec{u}'_1 = \vec{v}_1; \qquad \vec{u}_1 = \frac{1}{\lVert \vec{u}'_1 \rVert}\,\vec{u}'_1
This is an orthonormal basis for the one-dimensional subspace V1 . Suppose
by induction, that we have constructed an orthonormal basis (u⃗1 , . . . , u⃗i−1 ) for Vi−1 . I claim that
(1) v⃗i − ∑_{j=1}^{i−1} ⟨v⃗i , u⃗j ⟩ u⃗j is a nonzero vector, and

(2) ⟨ v⃗i − ∑_{j=1}^{i−1} ⟨v⃗i , u⃗j ⟩ u⃗j , u⃗k ⟩ = 0, k = 1, . . . , i − 1.
The first claim is clear since v⃗i is not in the span of {u⃗1 , . . . , u⃗i−1 }. The second is a simple computation.
Now set
\vec{u}'_i \equiv \vec{v}_i - \sum_{j=1}^{i-1} \langle \vec{v}_i, \vec{u}_j\rangle\, \vec{u}_j; \qquad \vec{u}_i = \frac{1}{\lVert \vec{u}'_i\rVert}\,\vec{u}'_i
to continue the inductive construction. 
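The construction in the proof is easy to implement; here is a short Python/numpy sketch of mine (for the standard dot product on R^n, and assuming the input vectors are linearly independent):

    import numpy as np

    def gram_schmidt(vectors):
        """Orthonormalise a list of linearly independent numpy vectors,
        following the inductive construction in the proof of Theorem 12.2."""
        ortho = []
        for v in vectors:
            # Subtract the components along the already constructed u_1, ..., u_{i-1}.
            u = v - sum(np.dot(v, q) * q for q in ortho)
            ortho.append(u / np.linalg.norm(u))
        return ortho

    basis = [np.array([1.0, 1.0, 0.0]),
             np.array([1.0, 0.0, 1.0]),
             np.array([0.0, 1.0, 1.0])]
    us = gram_schmidt(basis)

    # Check orthonormality: <u_i, u_j> = delta_{ij}.
    G = np.array([[np.dot(ui, uj) for uj in us] for ui in us])
    assert np.allclose(G, np.eye(3))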

Notation: Let S be a set, usually {1, . . . , n} for some natural number n. We


introduce a function S × S → {0, 1}, called the Kronecker delta symbol by:


(i, j) \mapsto \delta_{i,j} \;\overset{\text{def}}{=}\; \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}

With this notation, a sequence (u⃗1 , . . . , u⃗l ) of vectors is called orthonormal if ⟨u⃗i , u⃗j ⟩ = δi,j .

12.3. Orthogonal Complement of a Subspace. Given an n-dimensional


vector space V and a m-dimensional subspace W such that {0V } ⊊ W ⊊ V , a
subspace W ′ of dimension n−m such that V = W ⊕W ′ is called a supplement
to W . We have noted that there is no natural choice of a supplement. If V
has an inner product, however, there is a natural choice, called the orthogonal
complement. This is often denoted W ⊥ and the definition is
W ⊥ = {v⃗′ ∈ V ∣ ⟨v⃗, v⃗′ ⟩ = 0 for all v⃗ ∈ W }

It is easy to check that W ∩ W ⊥ = {0V }. Define the map (“orthogonal projection to W ”) P ∶ V → V by
P(\vec{v}) = \sum_{i=1}^{m} \langle \vec{v}, \vec{u}_i\rangle\, \vec{u}_i

where (u⃗1 , . . . , u⃗m ) is an orthonormal basis for W . Check that:
(1) im(P ) = W
(2) P ○ P = P
(3) kernel(P ) = W ⊥
and conclude that indeed V = W ⊕ W ⊥ .
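A tiny numerical illustration of the orthogonal projection (mine, in Python/numpy; here W is the xy-plane in R^3, so the answer can be read off by eye):

    import numpy as np

    def project(v, ortho_basis):
        """Orthogonal projection onto W = span(ortho_basis), the u_i orthonormal."""
        return sum(np.dot(v, u) * u for u in ortho_basis)

    u1 = np.array([1.0, 0.0, 0.0])
    u2 = np.array([0.0, 1.0, 0.0])          # W = the xy-plane

    v = np.array([3.0, -2.0, 5.0])
    Pv = project(v, [u1, u2])

    assert np.allclose(Pv, [3.0, -2.0, 0.0])     # P(v) lies in W
    assert np.isclose(np.dot(v - Pv, u1), 0.0)   # v - P(v) is orthogonal to W,
    assert np.isclose(np.dot(v - Pv, u2), 0.0)   # so v = P(v) + (v - P(v)) realises W ⊕ W⊥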

12.4. The standard inner product on Rn . We deal with this mostly as


a series of exercises. Let V = Rn . Check that the dot-product is an inner
product. If v⃗, w⃗ are column vectors, check that v⃗tr w⃗ is the 1 × 1 matrix [v⃗ w⃗]. (We usually abuse notation and write simply v⃗tr w⃗ = v⃗ w⃗.) Note that the standard basis (e⃗1 , . . . , e⃗n ) of Rn is an orthonormal basis.

If A is an n × n matrix written as a row of column vectors

A = [a⃗1 . . . a⃗n ]

then

{Atr A}ij = (ij)th entry of Atr A = a⃗i a⃗j

Conclude that an ordered set (u⃗1 , . . . , u⃗n ) of column vectors is an orthonormal basis iff the n × n matrix

O = [u⃗1 . . . u⃗n ]

satisfies Otr O = In . Since for a square matrix right and left inverses coincide, we get OOtr = In , which can be interpreted to mean that the rows of such a matrix form an orthonormal basis for M1×n . We have proved: if O is an n × n matrix, the columns form an orthonormal basis for Mn×1 iff the rows form an orthonormal basis for M1×n .
An n × n matrix O such that Otr O = OOtr = In is called an orthogonal matrix.

12.5. The QR decomposition. Let A be a real invertible n × n matrix.


We will apply the Gram-Schmidt process to write
A = QR
where Q is orthogonal and R is upper-triangular. (Such a decomposition
holds even if A is not invertible, but uniqueness (see Exercise below) cannot
be asserted.)
To see the existence of the decomposition, write A as a row of column
vectors:
A = [v⃗1 . . . v⃗n ]

Since A is invertible, (v⃗1 , . . . , v⃗n ) is an ordered basis. Let us apply the Gram-Schmidt process to this basis to get an orthonormal basis (u⃗1 , . . . , u⃗n ). Recall that
\vec{u}'_i \equiv \vec{v}_i - \sum_{j=1}^{i-1} (\vec{v}_i\cdot \vec{u}_j)\,\vec{u}_j; \qquad \vec{u}_i = \frac{1}{\lVert \vec{u}'_i\rVert}\,\vec{u}'_i
where we have replaced ⟨v⃗i , u⃗j ⟩ by v⃗i ⋅ u⃗j since we are using the standard inner product. Note next that since u⃗′i is orthogonal to u⃗1 , . . . , u⃗i−1 ,

\lVert \vec{u}'_i\rVert^2 = \vec{u}'_i\cdot \vec{u}'_i = \vec{v}_i\cdot \vec{u}'_i = \lVert \vec{u}'_i\rVert\,(\vec{v}_i\cdot \vec{u}_i)

so that ∣∣u⃗′i ∣∣ = (u⃗i ⋅ v⃗i ). Assembling all this, we get
\vec{v}_i = (\vec{v}_i\cdot \vec{u}_i)\,\vec{u}_i + \sum_{j=1}^{i-1} (\vec{v}_i\cdot \vec{u}_j)\,\vec{u}_j

which, in matrix form becomes:


⎡ v⃗1 u⃗1 v⃗2 u⃗1 . . v⃗n u⃗1 ⎤
⎢ ⎥
⎢ v⃗2 u⃗2 . . v⃗n u⃗2 ⎥
⎢ . ⎥
⎢ ⎥
v1 . . . v⃗n ] = [⃗
A = [⃗ ⃗n ] ⎢
u1 . . . u . . . . . ⎥
⎢ ⎥
⎢ . . . . . ⎥
⎢ ⎥
⎢ 0 . . 0 v⃗n u⃗n ⎥
⎣ ⎦
which is a (the21) QR decomposition of A
Exercise Prove that if QR = Q′ R′ is invertible (with Q, Q′ orthogonal and
R, R′ upper-triangular), then Q = Q′ ∆ and R = ∆R′ where ∆ is a diagonal
matrix with diagonal entries ±1.
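numpy provides a QR decomposition directly (np.linalg.qr); here is a quick check of mine that its output has the properties derived above. (Its R may have negative diagonal entries, consistent with the ±1 ambiguity in the Exercise.)

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.normal(size=(4, 4))          # generically invertible

    Q, R = np.linalg.qr(A)

    assert np.allclose(Q.T @ Q, np.eye(4))   # Q is orthogonal
    assert np.allclose(np.triu(R), R)        # R is upper-triangular
    assert np.allclose(Q @ R, A)             # A = QR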

12.6. All inner products on Rn . What are all possible inner products on
Rn ?
To answer this, it is convenient to identify Rn with Mn×1 . Let e⃗1 , . . . , e⃗n be
the standard basis. Given any inner product ⟨· , ·⟩, let P denote the matrix with entries

Pij = ⟨e⃗i , e⃗j ⟩
21see the exercise immediately below.

Clearly,

⟨v⃗, w⃗⟩ = v⃗tr P w⃗

Conversely, given any symmetric matrix P the map

(v⃗, w⃗) ↦ v⃗tr P w⃗

is bilinear and symmetric. For this to define an inner product, we require that

(4)   v⃗tr P v⃗ ≥ 0 for all v⃗, and v⃗tr P v⃗ = 0 only if v⃗ = 0 .

A symmetric matrix that satisfies the conditions (4) is said to be positive-


definite 22
Here is a characterisation of positive-definite matrices:
Proposition 12.3. Let P be a n × n matrix with real entries. The following
are equivalent.
(1) P is positive-definite.
(2) There is an invertible matrix n × n matrix B such that P = B tr B.

Proof. Suppose P = B tr B. Then v⃗tr P v⃗ = v⃗tr B tr B v⃗ = w⃗ tr w⃗ ≥ 0 where w⃗ ≡ B v⃗. Furthermore v⃗tr P v⃗ = 0 Ô⇒ w⃗ tr w⃗ = 0 Ô⇒ w⃗ = 0 Ô⇒ v⃗ = 0 since B is invertible.
Suppose now that P is positive definite. Then it defines an inner product ⟨· , ·⟩ on Rn . Let (u⃗1 , . . . , u⃗n ) be an orthonormal basis for this inner product. Write the standard basis in terms of the orthonormal basis:

\vec{e}_i = \sum_l b_{li}\,\vec{u}_l

Then
\vec{v}^{\,tr} P \vec{w} = \Big\langle \sum_i v_i \vec{e}_i,\; \sum_j w_j \vec{e}_j \Big\rangle = \sum_i \sum_j v_i w_j \sum_l \sum_m b_{li} b_{mj} \langle \vec{u}_l, \vec{u}_m\rangle = \sum_i \sum_j \sum_l v_i\, b_{li} b_{lj}\, w_j = \vec{v}^{\,tr} B^{tr} B\, \vec{w}

from which we can conclude that P = B tr B.

In a later section we will prove:

22Not to be confused with positive or totally positive matrices.



Theorem 12.4. If P is a real symmetric n × n matrix, there exists an


orthonormal basis of eigenvectors. That is, there is a set {u⃗1 , . . . , u⃗n } (of column vectors of length n) and real numbers µi such that

\vec{u}_i^{\,tr}\vec{u}_j = \delta_{i,j} \quad\text{and}\quad P\,\vec{u}_i = \mu_i\,\vec{u}_i

Exercise: Assuming the above Theorem, prove that a real symmetric P is


positive-definite iff all its eigenvalues µi are positive.

12.7. Yet another criterion for positive-definiteness, without proof.


This is Theorem 1.25 (on page 242) of Artin. I have changed the notation
A ↝ P.
Theorem 12.5. Let P be a symmetric n × n matrix and let C1 , C2 , . . . , Cn
denote the square matrices (C for “top-left corner”) defined as follows:
C_1 = [a_{11}], \quad C_2 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad \dots, \quad C_n = P

Then P is positive-definite iff det Ci > 0 for all i.
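Here is a sketch of mine (Python/numpy) implementing this leading-corner test, cross-checked against the eigenvalue criterion from the Exercise above:

    import numpy as np

    def is_positive_definite(P):
        """Theorem 12.5: a symmetric P is positive-definite iff every
        top-left i x i corner C_i has positive determinant."""
        n = P.shape[0]
        return all(np.linalg.det(P[:i, :i]) > 0 for i in range(1, n + 1))

    P_good = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3
    P_bad  = np.array([[1.0,  2.0], [ 2.0, 1.0]])   # eigenvalues 3 and -1

    assert is_positive_definite(P_good) and not is_positive_definite(P_bad)

    # Cross-check: positive-definite iff all eigenvalues are positive.
    assert np.all(np.linalg.eigvalsh(P_good) > 0)
    assert not np.all(np.linalg.eigvalsh(P_bad) > 0)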

13. Self-adjoint linear maps on a real inner product space;


spectral theorem

13.1. The adjoint of a linear map. Let V be a real n-dimensional vector


space with an inner product. We define, for every linear map Â ∶ V → V , its adjoint as follows: it is the unique map Â† ∶ V → V such that

⟨Â(v⃗), w⃗⟩ = ⟨v⃗, Â† (w⃗)⟩   for all v⃗, w⃗
Of course, the definition makes sense only if such a unique Â† exists. To see that it does, choose an orthonormal basis (u⃗1 , . . . , u⃗n ) for V . Writing v⃗ = ∑i vi u⃗i and w⃗ = ∑i wi u⃗i , the defining equation above becomes

\Big\langle \sum_i \sum_l v_i A_{li}\,\vec{u}_l,\; \sum_j w_j\,\vec{u}_j \Big\rangle = \Big\langle \sum_i v_i\,\vec{u}_i,\; \sum_{m,j} A^\dagger_{mj} w_j\,\vec{u}_m \Big\rangle \;\Longrightarrow\; \sum_{ij} v_i A_{ji} w_j = \sum_{ij} v_i A^\dagger_{ij} w_j

We conclude: There exists a unique Â† , which is given by the expression:

\hat{A}^\dagger(\vec{u}_i) = \sum_j A_{ij}\,\vec{u}_j \qquad \text{where } \hat{A} \text{ acts as } \hat{A}(\vec{u}_i) = \sum_j A_{ji}\,\vec{u}_j

In other words, the matrix representing the adjoint Â† (w.r.to any orthonormal basis) is the transpose of the matrix representing Â.
In particular, if V = Rn with the standard inner product, and we go back and
forth between matrices and linear maps using the standard basis, we have
that Atr corresponds to Â† . (To be explicit: the linear map corresponding to the transpose of
a matrix is the adjoint of the linear map corresponding to the matrix.)
Definition 13.1. A linear map Â ∶ V → V (where V is a finite-dimensional real inner product space) is called self-adjoint if

Â† = Â

By the above remarks, Â is self-adjoint iff its matrix A w.r.to any orthonormal basis is symmetric.
Warning about terminology: Usually adjoints and self-adjoint linear maps
are introduced in the context of complex inner product spaces (which we will
deal with briefly later.) In the real context, the adjoint is sometimes called
the “transpose”, and self-adjoint maps called “symmetric”. We prefer to
save these latter terms for matrices. To add to the possibility of confusion,
“transpose” and “adjoint” are also applied to maps V → V ∗ where V ∗ is the
dual vector space.

13.2. Orthogonal transformations.


Definition 13.2. A linear map Ô ∶ V → V (where V is a finite-dimensional
real inner product space) is called orthogonal if it is invertible with inverse
Ô† , in other words, if Ô† ○ Ô = Ô ○ Ô† = IdV .

Clearly Ô is orthogonal iff the corresponding matrix O (w.r.to any orthonor-


mal basis) is orthogonal: O−1 = Otr .
An orthogonal transformation preserves the lengths of vectors: for any v⃗ ∈ V ,
∣∣Ô(v⃗)∣∣2 = ⟨Ô(v⃗), Ô(v⃗)⟩ = ⟨v⃗, Ô† (Ô(v⃗))⟩ = ⟨v⃗, (Ô† ○ Ô)(v⃗)⟩ = ∣∣v⃗∣∣2
The converse is also true, as we will prove shortly.

13.3. Distinct eigenspaces of a self-adjoint map are mutually or-


thogonal. There are three ingredients in the proof of the spectral theorem
for self-adjoint maps. The first is
Proposition 13.3. Let  ∶ V → V be a self-adjoint linear map and λ1 , λ2
distinct eigenvalues. Then Vλ1 , Vλ2 are orthogonal subspaces of V . That is,
⟨v⃗1 , v⃗2 ⟩ = 0 if v⃗1 ∈ Vλ1 and v⃗2 ∈ Vλ2 .

Proof. We have

⟨Â(v⃗1 ), v⃗2 ⟩ = ⟨v⃗1 , Â† (v⃗2 )⟩   (definition of adjoint)
            = ⟨v⃗1 , Â(v⃗2 )⟩    (since Â is self-adjoint)

Since the v⃗i are eigenvectors, we obtain λ1 ⟨v⃗1 , v⃗2 ⟩ = λ2 ⟨v⃗1 , v⃗2 ⟩. But λ1 ≠ λ2 , so we can conclude that ⟨v⃗1 , v⃗2 ⟩ = 0.

Warning on terminology: When applied to a linear map or matrix, “orthog-


onal” refers to a property of a single object. When applied to vectors or
subspaces, it is a relationship between two or more objects, and sometimes
we say “mutually orthogonal” to emphasise this. The vectors belonging to
an “orthonormal basis” are mutually orthogonal, and are each of unit norm.

13.4. A real self-adjoint linear map has n real eigenvalues. This is


the second ingredient.
Before we go to formal statement and proof, a remark: later we will introduce
the notion of a (Hermitian) inner product on a complex vector space, and
in this context define the notions of adjoint, self-adjoint map, and prove
the spectral theorem in this case. It will turn out that the eigenvalues of
a self-adjoint linear map on a complex vector space are real, and that a
self-adjoint map on a real vector space defines a self-adjoint linear map on
its “complexification”. So a natural strategy is to first treat the complex
case. This would avoid an artifice of the proof of the next proposition, the
introduction of an orthonormal basis.
Proposition 13.4. Let  ∶ V → V be a self-adjoint linear map. Then it has
n (real) eigenvalues, counting multiplicity.

Proof. Choose an orthonormal basis for V so  is represented by a symmetric


n × n matrix A (with real entries). Let λ be any (possibly complex) root of
pA . Clearly there exists a nonzero column vector v⃗ (possibly with complex
entries) such that A⃗v = λ⃗v . We assemble the rest of the proof in (simple)
steps:
(1) Rewriting the matrix equation, we get
∑ Aij vj = λvi
j

(2) Taking complex conjugates of both sides, we get

\sum_j A_{ij}\,\bar{v}_j = \bar{\lambda}\,\bar{v}_i

(3) Multiply the equation in Step (1) by \bar{v}_i and sum over i:

\sum_{i,j} \bar{v}_i A_{ij} v_j = \lambda \sum_i \bar{v}_i v_i = \lambda \sum_i |v_i|^2

(4) The left-hand side of the equation in Step (3) can be rewritten:

\sum_{i,j} \bar{v}_i A_{ij} v_j = \sum_j v_j \Big\{\sum_i A_{ij}\bar{v}_i\Big\}
 = \sum_i v_i \Big\{\sum_j A_{ji}\bar{v}_j\Big\} \quad \text{(we interchanged summation indices)}
 = \sum_i v_i \Big\{\sum_j A_{ij}\bar{v}_j\Big\} \quad \text{(used symmetry of } A\text{)}
 = \bar{\lambda} \sum_i v_i \bar{v}_i \quad \text{(used Step (2))}
 = \bar{\lambda} \sum_i |v_i|^2

(5) We conclude that

\bar{\lambda} \sum_i |v_i|^2 = \lambda \sum_i |v_i|^2

and since \sum_i |v_i|^2 \neq 0, \bar{\lambda} = \lambda.

Thus any root of pA , a priori complex, is actually real. 

13.5. A lemma about self-adjoint maps. The third ingredient is


Lemma 13.5. Let V be an n-dimensional real inner product space, and
 ∶ V → V be a self-adjoint linear map. If a subspace V1 ⊂ V is invariant
under Â, then so is the orthogonal complement V1⊥ . (In other words, if
Â(⃗v1 ) ∈ V1 for every v⃗1 in a subspace V1 , then v⃗2 ⊥ V1 Ô⇒ Â(⃗
v2 ) ⊥ V1 .)

Proof. Suppose v⃗2 ⊥ V1 . Then, for any v⃗1 ∈ V1 ,

⟨Â(v⃗2 ), v⃗1 ⟩ = ⟨v⃗2 , Â(v⃗1 )⟩ = 0

since Â(v⃗1 ) ∈ V1 for any v⃗1 ∈ V1 .

13.6. Spectral Theorem for self-adjoint maps. We are now ready to


combine the last two Propositions and the Lemma to conclude the Spectral
Theorem in abstract form.
Theorem 13.6. Let V be an n-dimensional real inner product space, and
 ∶ V → V be a self-adjoint linear map. Let λ1 , . . . , λr be the eigenvalues23
of Â. Then the corresponding eigenspaces Vλ1 , . . . , Vλr are pairwise mutually
orthogonal, and
V = ⊕ri=1 Vλi
(“V is the orthogonal direct sum of the eigenspaces.”)

Proof. Let V1 = ⊕ri=1 Vλi . By Proposition 13.3 we know that this is an orthogonal direct sum. It suffices to show that V1 = V . Note that V1 is invariant under Â, so by the above Lemma, so is V1⊥ . Now Â∣V1⊥ is self-adjoint. So by Proposition 13.4 it must have an eigenvector v⃗′ , with a corresponding eigenvalue λ′ , as long as V1⊥ ≠ {0V }. But λ′ must be one of the λ1 , . . . , λr , which means v⃗′ ∈ V1 , and this is a contradiction.

If we choose an orthonormal basis for each Vλi and combine these, we obtain an orthonormal basis for V, since the eigenspaces are mutually orthogonal. This proves a restatement of the above Theorem:
Theorem 13.7. Let V be an n-dimensional real inner product space, and Â ∶ V → V be a self-adjoint linear map. The characteristic polynomial factors into linear factors:

    p_Â(t) = ∏_{i=1}^{n} (t − µi)

where µ1, . . . , µn are real (but not necessarily distinct). One can choose an orthonormal basis (u⃗1, . . . , u⃗n) such that

    Â(u⃗i) = µi u⃗i
Remark 13.8. It follows that

    p_Â(t) = ∏_{i=1}^{r} (t − λi)^{mi} = ∏_{i=1}^{n} (t − µi)

where mi = dim Vλi. It also follows that w.r.to the orthonormal basis (u⃗1, . . . , u⃗n) the matrix A of Â has the block-diagonal structure

    A = diag(µ1, . . . , µn) = ⎡ A1           0  ⎤
                              ⎢     A2          ⎥
                              ⎢         ⋱       ⎥
                              ⎣ 0            Ar ⎦

where

    Ai = ⎡ λi           0  ⎤
         ⎢     λi          ⎥
         ⎢         ⋱       ⎥
         ⎣ 0            λi ⎦

is the mi × mi diagonal matrix with every diagonal entry equal to λi, provided the basis vectors are numbered suitably, grouping together vectors that correspond to equal eigenvalues.

Exercise: To what extent is the choice of the basis (u⃗1, . . . , u⃗n) unique? Consider the following cases: (a) the eigenvalues µ1, . . . , µn are all distinct, and (b) µ1 = µ2 but the other eigenvalues are distinct from each other and from µ1 and µ2.

13.7. Spectral Theorem for symmetric matrices. Let A be a real symmetric n × n matrix. We know that the corresponding linear map Â ∶ M_{n×1} → M_{n×1} is self-adjoint w.r.t. the standard inner product. By Theorem 13.7, there exists an orthonormal basis (u⃗1, . . . , u⃗n) of column vectors such that

    A u⃗i = µi u⃗i,  i = 1, . . . , n

As we have done many times before, we rewrite this:

    A [u⃗1, . . . , u⃗n] = [u⃗1, . . . , u⃗n] ∆

where ∆ = diag(µ1, . . . , µn) is a diagonal matrix. Letting O denote the orthogonal matrix [u⃗1, . . . , u⃗n], we get AO = O∆. We have proved:
Theorem 13.9. Let A be a real symmetric n × n matrix. The characteristic polynomial factors into linear factors:

    pA(t) = ∏_{i=1}^{n} (t − µi)

where µ1, . . . , µn are real (but not necessarily distinct). There is an orthogonal matrix O such that

    O^tr A O = diag(µ1, . . . , µn)
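The matrix form of the theorem can be checked directly with a library routine designed for symmetric input; np.linalg.eigh returns the eigenvalues together with an orthonormal set of eigenvectors (a sketch, with NumPy again assumed):

    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.standard_normal((6, 6))
    A = (M + M.T) / 2                                # real symmetric

    mu, O = np.linalg.eigh(A)                        # columns of O: orthonormal eigenvectors

    print(np.allclose(O.T @ O, np.eye(6)))           # True: O is orthogonal
    print(np.allclose(O.T @ A @ O, np.diag(mu)))     # True: O^tr A O = diag(mu_1, ..., mu_n)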
13.8. A “geodesic” path to proof of diagonalisability of symmetric matrices. In this section, we outline a shortest-distance proof:

Let A be a symmetric n × n matrix. Then Â ∶ V → V is self-adjoint w.r.to the standard inner product on V ≡ Rⁿ. By Proposition 13.4 it has a real eigenvalue µ1; choose a corresponding eigenvector u⃗1 of unit length. Let V1 ⊂ Rⁿ be the orthogonal complement of Span(u⃗1). By Lemma 13.5 the map Â takes V1 to itself. The restricted map Â1 ≡ Â∣V1 ∶ V1 → V1 is self-adjoint w.r.to the inner product that V1 inherits from V. So Â1 has a real eigenvalue µ2; let u⃗2 ∈ V1 be a corresponding eigenvector of unit length. Continue in this way to obtain the orthonormal basis (u⃗1, . . . , u⃗n).

Note that although we start with a matrix, we are forced to transition to operator language, because the subspace V1 (and the ones that come after it) do not come with a natural choice of basis.
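The induction above can be written out as a (deliberately inefficient, purely illustrative) recursive procedure. In the sketch below the function name symmetric_eigenbasis is made up for this note, and the “find one real eigenpair” step of Proposition 13.4 is simply delegated to np.linalg.eigh, which could of course already do the whole job by itself; the point is only to mirror the structure of the argument.

    import numpy as np

    def symmetric_eigenbasis(A):
        """Orthonormal eigenbasis of a real symmetric A, built the way the
        'geodesic' argument builds it: one unit eigenvector, then recurse on
        the orthogonal complement."""
        n = A.shape[0]
        if n == 1:
            return np.array([A[0, 0]]), np.eye(1)

        # Proposition 13.4: a real eigenpair exists (taken here from eigh).
        w, V = np.linalg.eigh(A)
        mu1, u1 = w[0], V[:, 0]

        # Columns of W: an orthonormal basis of the orthogonal complement of
        # span(u1). QR of a matrix whose first column is u1 does the job: the
        # remaining columns of Q are orthonormal and orthogonal to u1.
        Q, _ = np.linalg.qr(np.column_stack([u1, np.eye(n)[:, : n - 1]]))
        W = Q[:, 1:]

        # Lemma 13.5: A preserves the complement, and the restriction, written
        # in the basis W, is again symmetric.
        B = W.T @ A @ W
        mus, U = symmetric_eigenbasis(B)

        # Transport the eigenvectors of the restriction back to R^n.
        return np.concatenate([[mu1], mus]), np.column_stack([u1, W @ U])

    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5))
    A = (M + M.T) / 2
    mu, O = symmetric_eigenbasis(A)
    print(np.allclose(O.T @ O, np.eye(5)), np.allclose(A @ O, O @ np.diag(mu)))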
14. Orthogonal matrices

Let V be an n-dimensional real vector space with an inner product

    (v⃗, w⃗) ↦ ⟨v⃗, w⃗⟩

Exercise: Given a linear map Ô ∶ V → V, check that the following are equivalent:
(1) Ô is invertible, and Ô† = Ô⁻¹,
(2) Ô preserves the inner product between vectors, i.e., ⟨Ô(v⃗), Ô(w⃗)⟩ = ⟨v⃗, w⃗⟩ for any pair of vectors v⃗, w⃗,
(3) Ô preserves lengths of vectors: ∣∣v⃗∣∣ = ∣∣Ô(v⃗)∣∣, for any v⃗.
We will work on Rⁿ, with the standard basis and inner product, and freely identify a matrix A with the associated linear map Â. Recall that an orthogonal matrix O is one such that O^tr O = In. By the above Exercise,

    O⁻¹ = O^tr ⟺ (Ov⃗)^tr Ow⃗ = v⃗^tr w⃗ ⟺ ∣∣v⃗∣∣ = ∣∣O(v⃗)∣∣,  ∀ v⃗, w⃗ ∈ Rⁿ

Let O(n, R) denote the set of n × n orthogonal matrices.

First, note that the product of two orthogonal matrices O1, O2 is orthogonal, since

    (O1 O2)^tr = O2^tr O1^tr = O2⁻¹ O1⁻¹ = (O1 O2)⁻¹

Second, note that O^tr O = In ⟹ (det O)² = 1 ⟹ det O = ±1. So it is convenient to write O(n, R) = SO(n, R) ⊔ SO⁻(n, R), where

    SO(n, R) = set of orthogonal matrices with determinant +1, and
    SO⁻(n, R) = set of orthogonal matrices with determinant −1.

Note that SO(n, R) is closed under products, while the product of two elements of SO⁻(n, R) lands you in SO(n, R).
(The notation SO(n, R) is standard, but not SO− (n, R).)
We will not use the following facts, and you can ignore all references to the
group structure if you find them unhelpful or confusing. The set O(n, R) is a
group, det ∶ O(n, R) → {±1} is a homomorphism, SO(n, R) is the kernel and
therefore a normal subgroup, and SO− (n, R) is the nontrivial coset. In fact
O(n, R) is a topological group, and SO(n, R) is the connected component
containing the identity element In .

14.1. The orthogonal group in two dimensions. An element of O(2, R) can be written

    O = [ u⃗1  u⃗2 ]
where (u⃗1, u⃗2) is an (ordered) orthonormal basis. Note that given u⃗1, there are only two choices for u⃗2, and that

    det[ u⃗1  −u⃗2 ] = −det[ u⃗1  u⃗2 ]

so that exactly one of the two choices gives you an element of SO(2, R). So the map

    SO(2, R) ∋ [ u⃗1  u⃗2 ] ↦ u⃗1 ∈ S¹ =_def unit vectors in R²

is a bijection. We make this explicit.


(1) Parametrise S¹ by

    [0, 2π) ∋ θ ↦ [ cos θ ] ∈ S¹
                  [ sin θ ]

(2) This gives the bijection S¹ → SO(2, R)

    S¹ ∋ [ cos θ ] ↦ rot(θ) =_def [ cos θ  −sin θ ] ∈ SO(2, R) ...
         [ sin θ ]                [ sin θ   cos θ ]

(3) ...and another bijection

    S¹ ∋ [ cos θ ] ↦ ref(θ) =_def [ cos θ   sin θ ] ∈ SO⁻(2, R).
         [ sin θ ]                [ sin θ  −cos θ ]

(4) The matrix rot(θ) implements rotation in the anticlockwise direction by angle θ, so it has no eigenvector unless θ = 0 or θ = π. In fact the characteristic polynomial is t² − 2t cos θ + 1, with discriminant 4 cos²θ − 4, which is negative unless θ = 0 or θ = π.
(5) As for the matrix ref(θ), the characteristic polynomial is t² − 1, which has distinct eigenvalues ±1. What are corresponding eigenvectors? One can do this by computation easily enough, but let us try geometry. We have

    ref(0) = [ 1   0 ]
             [ 0  −1 ]

which clearly implements the reflection on the x-axis, and eigenvectors are the standard basis vectors:

    ref(0)e⃗1 = e⃗1,   ref(0)e⃗2 = −e⃗2

Now an easy computation shows:

    ref(θ) = rot(θ) ref(0)

In words, ref(θ) implements reflection on the x-axis followed by an anticlockwise rotation by θ. Clearly the vector

    u⃗+ =_def [ cos θ/2 ]
             [ sin θ/2 ]

is left unchanged by this (it makes angle θ/2 with the x-axis; the reflection takes it to the vector at angle −θ/2, and the rotation by θ brings it back), so

    ref(θ) u⃗+ = u⃗+

which can be checked by the computation:

    [ cos θ   sin θ ] [ cos θ/2 ]   [ cos θ cos θ/2 + sin θ sin θ/2 ]   [ cos θ/2 ]
    [ sin θ  −cos θ ] [ sin θ/2 ] = [ sin θ cos θ/2 − cos θ sin θ/2 ] = [ sin θ/2 ]

The other eigenspace must be orthogonal to Span(u⃗+) and we find

    ref(θ) [ −sin θ/2 ]  =  − [ −sin θ/2 ]
           [  cos θ/2 ]       [  cos θ/2 ]

We conclude that ref(θ) implements reflection on the line spanned by the vector (cos θ/2, sin θ/2)^tr.
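These identities are easy to verify numerically. A small sketch (NumPy assumed, with rot and ref entered exactly as in (2) and (3) above):

    import numpy as np

    def rot(t):
        return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

    def ref(t):
        return np.array([[np.cos(t), np.sin(t)], [np.sin(t), -np.cos(t)]])

    theta = 0.7
    print(np.allclose(ref(theta), rot(theta) @ ref(0)))          # ref(θ) = rot(θ) ref(0)

    u_plus = np.array([np.cos(theta / 2), np.sin(theta / 2)])    # on the reflecting line
    u_minus = np.array([-np.sin(theta / 2), np.cos(theta / 2)])  # orthogonal to it
    print(np.allclose(ref(theta) @ u_plus, u_plus))              # eigenvalue +1
    print(np.allclose(ref(theta) @ u_minus, -u_minus))           # eigenvalue -1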

14.2. The orthogonal group in 3 dimensions. We treat this in a number of steps, with many verifications left as exercises.

(1) Consider an element T ∈ O(3, R). Its characteristic polynomial is a real polynomial of degree 3, and has either three real roots, or one real root and two complex roots ω, ω̄ (with nonzero imaginary part) which are conjugates of each other. Since T preserves lengths, any real root must be ±1. Exercise: prove both claims.
(2) Exercise: Prove that if all three roots are real, then exactly one of the following holds:
(a) pT = (t − 1)³ and T = I3. In this case T ∈ SO(3, R).
(b) pT = (t − 1)(t + 1)² and T̂ is “reflection on a line”, i.e., there is an orthogonal direct sum R³ = V−1 ⊕ V+1, with V−1 two-dimensional, such that T̂(v⃗− + v⃗+) = −v⃗− + v⃗+. In this case T ∈ SO(3, R).
(c) pT = (t − 1)²(t + 1) and T̂ is “reflection on a plane,” i.e., there is an orthogonal direct sum R³ = V−1 ⊕ V+1, with V−1 one-dimensional, such that T̂(v⃗− + v⃗+) = −v⃗− + v⃗+. In this case T ∈ SO⁻(3, R).
(d) pT = (t + 1)³ and T = −I3. In this case T ∈ SO⁻(3, R).
In parts (b) and (c), note that an orthogonal matrix T is in general not symmetric, so you cannot just quote the result for self-adjoint maps to prove that V+1 and V−1 are mutually orthogonal.
(3) If only one root is real, we have det T = (real root)·∣ω∣², so the real root is +1 if T ∈ SO(3, R) and −1 if T ∈ SO⁻(3, R). Exercise: In both cases prove that V = V±1 ⊕ V±1⊥ with V±1 one-dimensional, and T̂(v⃗± + v⃗⊥) = ±v⃗± + T̂⊥(v⃗⊥), where T̂⊥ ≡ T̂∣V±1⊥ is an orthogonal linear transformation acting on V±1⊥, and det T̂⊥ = 1.
(4) Let us briefly summarise the case when T ∈ SO(3, R). Then either
(a) T = I3, or
(b) V is the orthogonal direct sum V+1 ⊕ V−1, with V+1 one-dimensional, or
(c) the characteristic polynomial has roots 1, e^{iθ}, e^{−iθ}, with 0 < θ < π. In this case, the eigenspace V+1 is one-dimensional, and “T̂ is rotation in the plane V+1⊥ by an angle θ according to the right-hand corkscrew rule with respect to u⃗”, where u⃗ is an appropriate choice from among the two eigenvectors of unit length. To make this more precise we would need to consider orientations, which we choose not to do.
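One consequence worth recording: since the trace is the sum of the roots of the characteristic polynomial, in case (c) we have tr T = 1 + e^{iθ} + e^{−iθ} = 1 + 2 cos θ, so the angle of rotation can be read off from the trace. A hedged numerical sketch (NumPy assumed; the QR-based recipe for producing a random element of SO(3, R) is part of this aside, not of the notes):

    import numpy as np

    rng = np.random.default_rng(3)

    # A random element of SO(3, R): the Q of a QR decomposition is orthogonal,
    # and if det Q = -1 then -Q is orthogonal with det(-Q) = +1.
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    T = Q if np.linalg.det(Q) > 0 else -Q

    # tr T = 1 + 2 cos(theta); clip guards against round-off outside [-1, 1].
    theta = np.arccos(np.clip((np.trace(T) - 1) / 2, -1.0, 1.0))

    print(theta)
    print(np.angle(np.linalg.eigvals(T)))   # contains 0, +theta, -theta in some order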
15. Complex inner product spaces

In this section V will denote an n-dimensional vector space over C.

15.1. Hermitian inner product. Preliminaries.


Definition 15.1. A Hermitian inner product on V is a map

    V × V → C
    (v⃗, w⃗) ↦ ⟨v⃗, w⃗⟩

which is
(1) linear in the first variable and “antilinear” in the second variable, that is,

    ⟨a1v⃗1 + a2v⃗2, w⃗⟩ = a1⟨v⃗1, w⃗⟩ + a2⟨v⃗2, w⃗⟩, and
    ⟨v⃗, b1w⃗1 + b2w⃗2⟩ = b̄1⟨v⃗, w⃗1⟩ + b̄2⟨v⃗, w⃗2⟩,

where z̄ denotes the complex conjugate of a complex number z,
(2) “conjugate-symmetric”, i.e., ⟨v⃗, w⃗⟩ is the complex conjugate of ⟨w⃗, v⃗⟩ (in particular ⟨v⃗, v⃗⟩ is real for any v⃗),
(3) positive-definite, i.e., ⟨v⃗, v⃗⟩ > 0 unless v⃗ = 0V.
This is modelled on the Hermitian inner product of vectors in Cⁿ:

    v⃗ ⋅ w⃗ ≡ ∑_{i=1}^{n} vi w̄i

where v⃗ = (v1, . . . , vn) and w⃗ = (w1, . . . , wn).
⃗ = (w1 , . . . , wn ).
Warning: The convention in Physics is to require linearity in the second variable and antilinearity in the first variable.
Exercise: Let V be an n-dimensional complex vector space and let VR be the real vector space obtained by considering the same set of vectors and allowing only multiplication by real numbers.
(1) Show that v⃗ ↦ iv⃗ is a linear map of VR to itself. Let us denote this map by J. To be explicit, J ∶ VR → VR is the map J(v⃗) = iv⃗. Show that J² = −IdVR.
(2) Show that if (e⃗1, . . . , e⃗n) is a basis for V, then (e⃗1, . . . , e⃗n, J(e⃗1), . . . , J(e⃗n)) is a basis for VR. Thus dim VR = 2 × dim V.
(3) Show that if ⟨., .⟩ is a Hermitian inner product on V, then ⟨., .⟩R defined by

    ⟨v⃗, w⃗⟩R = Re⟨v⃗, w⃗⟩ (= real part of ⟨v⃗, w⃗⟩)

is an inner product on VR.
(4) The linear map J satisfies

    ⟨J(v⃗), J(w⃗)⟩R = ⟨v⃗, w⃗⟩R

(5) On the other hand,

    ⟨v⃗, J(w⃗)⟩R = −⟨J(v⃗), w⃗⟩R

Most of the statements we made about real inner product spaces hold for
Hermitian inner product spaces, and the proofs go through with minor mod-
ifications.
The Cauchy-Schwarz inequality holds:

    ∣⟨v⃗, w⃗⟩∣ ≤ ∣∣v⃗∣∣ × ∣∣w⃗∣∣.

To see this, first note that given v⃗, w⃗, we can find a complex number α of modulus 1 such that ⟨αv⃗, w⃗⟩ is real. Now, as seen in the last Exercise, (v⃗, w⃗) ↦ Re⟨v⃗, w⃗⟩ is an inner product on VR, so

    ∣Re⟨αv⃗, w⃗⟩∣ ≤ ∣∣αv⃗∣∣ × ∣∣w⃗∣∣ = ∣∣v⃗∣∣ × ∣∣w⃗∣∣

(since ∣α∣ = 1). But

    ∣Re⟨αv⃗, w⃗⟩∣ = ∣α⟨v⃗, w⃗⟩∣ = ∣α∣ ∣⟨v⃗, w⃗⟩∣ = ∣⟨v⃗, w⃗⟩∣,

the first equality holding because ⟨αv⃗, w⃗⟩ = α⟨v⃗, w⃗⟩ is real. Combining the two displays gives the inequality.

The triangle inequality for norms follows: ∣∣v⃗ + w⃗∣∣ ≤ ∣∣v⃗∣∣ + ∣∣w⃗∣∣.

Orthonormal bases exist; and in fact the Gram-Schmidt process works as in the real case.
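For concreteness, here is a hedged sketch of the complex Gram-Schmidt step in Cⁿ (NumPy assumed; the helper names hermitian_ip and gram_schmidt are made up for this note). The only change from the real case is the conjugate on the second argument of the inner product, matching the convention above.

    import numpy as np

    def hermitian_ip(v, w):
        # <v, w> = sum_i v_i * conj(w_i): linear in v, antilinear in w.
        return np.sum(v * np.conj(w))

    def gram_schmidt(vectors):
        """Orthonormalise a list of linearly independent vectors in C^n."""
        basis = []
        for v in vectors:
            u = v - sum(hermitian_ip(v, e) * e for e in basis)   # subtract projections
            basis.append(u / np.sqrt(hermitian_ip(u, u).real))
        return basis

    rng = np.random.default_rng(4)
    vecs = [rng.standard_normal(3) + 1j * rng.standard_normal(3) for _ in range(3)]
    E = gram_schmidt(vecs)
    print(np.allclose([[hermitian_ip(a, b) for b in E] for a in E], np.eye(3)))  # True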
The hermitian analogue of an orthogonal matrix is a unitary matrix, which
we will define in the next subsection.

15.2. Important definitions: hermitian matrix; unitary matrix.

Given an m × n matrix A with complex entries, its conjugate Ā is the m × n matrix whose (i, j) entry is the complex conjugate of Aij. In other words each entry of Ā is the complex conjugate of the corresponding entry of A. Clearly, A is a real matrix iff Ā = A.

In the context of a complex vector space with inner product, the “correct” complex analogue of a symmetric matrix is a hermitian matrix: an n × n matrix A with complex entries is hermitian if (Ā)^tr = A, that is to say, Aji is the complex conjugate of Aij for all i, j.

We will have more to say later about Hermitian matrices. For the moment, we define the analogue of an orthogonal matrix, a unitary matrix: an n × n matrix U with complex entries is unitary if

    (Ū)^tr U = U (Ū)^tr = In
Exercise: The following are analogues of results we have seen in the real case.
(1) Let U = [u⃗1, . . . , u⃗n] be an n × n matrix, where (u⃗1, . . . , u⃗n) is a sequence of column vectors with complex entries. Then (u⃗1, . . . , u⃗n) is an orthonormal basis for Cⁿ (with the standard Hermitian inner product) iff U is unitary.
(2) Given an invertible n × n matrix A with complex entries, we have a unique decomposition A = QR where Q is unitary and R is upper triangular with positive diagonal entries. (Without the condition on the diagonal of R, the decomposition is only unique up to a diagonal unitary factor.)
(3) First a definition: a hermitian n × n matrix P is said to be positive-definite if (v̄)^tr P v⃗ > 0 for all nonzero v⃗ ∈ Cⁿ. Check that

    ⟨v⃗, w⃗⟩ = (w̄)^tr P v⃗  (= ∑_{i,j} w̄i Pij vj)

is a Hermitian inner product on Cⁿ, and that every Hermitian inner product on Cⁿ is given by such a (unique) P. Check that every positive-definite P is given by an invertible B by: P = (B̄)^tr B.
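A small numerical illustration of item (3) (same NumPy assumption; the inner product is written out exactly as in the display above):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 4
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # invertible with probability 1
    P = np.conj(B).T @ B                                                # P = (B-bar)^tr B

    print(np.allclose(P, np.conj(P).T))                 # P is hermitian

    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    w = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    ip = lambda x, y: np.conj(y) @ P @ x                # <x, y> = (y-bar)^tr P x
    print(ip(v, v).real > 0, abs(ip(v, v).imag) < 1e-9) # positive on this nonzero v
    print(np.isclose(ip(v, w), np.conj(ip(w, v))))      # conjugate-symmetric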

15.3. Self-adjoint linear maps. Let V be a complex n-dimensional vector space with a hermitian inner product. Given a linear map Â ∶ V → V, its adjoint Â† is defined by

    ⟨Â(v⃗), w⃗⟩ = ⟨v⃗, Â†(w⃗)⟩  for all v⃗, w⃗

A linear map Â is self-adjoint if Â† = Â.

Exercise: If A is an n × n matrix and Â ∶ Cⁿ → Cⁿ the corresponding linear map, with Cⁿ endowed with its standard hermitian inner product, then Â† corresponds to the matrix (Ā)^tr. Thus Â is self-adjoint iff A is hermitian.
Warning: We have reserved the terms “symmetric” and “hermitian” for real and complex matrices and “self-adjoint” for maps. Be aware that the terms are often used interchangeably.

15.4. Spectral theorem; complex case. We now state the spectral theorem.

Theorem 15.2. Let Â ∶ V → V be self-adjoint. Then all its eigenvalues are real. There exists an orthonormal basis u⃗1, . . . , u⃗n of eigenvectors.

In matrix form:

Theorem 15.3. Let A be an n × n hermitian matrix. There exists a unitary matrix U such that

    (Ū)^tr A U = diag(µ1, . . . , µn)

where the µi are real.
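This parallels the real symmetric case, and can again be checked with np.linalg.eigh, which accepts hermitian input (a sketch, NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(6)
    M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
    A = (M + np.conj(M).T) / 2                                # a random hermitian matrix

    mu, U = np.linalg.eigh(A)

    print(mu.dtype)                                           # float64: the eigenvalues are real
    print(np.allclose(np.conj(U).T @ U, np.eye(5)))           # U is unitary
    print(np.allclose(np.conj(U).T @ A @ U, np.diag(mu)))     # (U-bar)^tr A U = diag(mu)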
15.5. Unitary operators. Let V be a complex n-dimensional vector space with a hermitian inner product. A linear map Û ∶ V → V is said to be unitary if one of the following equivalent conditions holds:
(1) Û is invertible, and Û† = Û⁻¹,
(2) Û preserves the inner product between vectors, i.e., ⟨Û(v⃗), Û(w⃗)⟩ = ⟨v⃗, w⃗⟩ for any pair of vectors v⃗, w⃗,
(3) Û preserves lengths of vectors: ∣∣v⃗∣∣ = ∣∣Û(v⃗)∣∣, for any v⃗.
Exercise: To prove that (3) implies (2) you have to show that norms deter-
mine inner products. This is slightly trickier than in the real case.
Exercise: If U is an n × n unitary matrix and Û ∶ Cn → Cn the corresponding
linear map, with Cn endowed with its standard hermitian inner product, Û †
is clearly unitary.
The structure of unitary maps is very simple:

Exercise: If Û is unitary, its eigenvalues are complex numbers of modulus one. There exists an orthonormal basis of eigenvectors. If U is a unitary matrix, there exists another unitary matrix U1 such that

    (Ū1)^tr U U1 = diag(µ1, . . . , µn)

with ∣µi∣ = 1 ∀ i.
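A quick check of the first claim (a sketch; the Q factor of an invertible complex matrix is unitary, which gives a cheap source of examples):

    import numpy as np

    rng = np.random.default_rng(7)
    M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    U, _ = np.linalg.qr(M)                        # unitary

    eigenvalues = np.linalg.eigvals(U)
    print(np.allclose(np.abs(eigenvalues), 1.0))  # True: eigenvalues lie on the unit circle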
