A course for second year students by
Robert Howlett
typesetting by TeX
Copyright University of Sydney School of Mathematics and Statistics
Contents
Chapter 1: Preliminaries 1
§1a Logic and common sense 1
§1b Sets and functions 3
§1c Relations 7
§1d Fields 10
Chapter 2: Matrices, row vectors and column vectors 18
§2a Matrix operations 18
§2b Simultaneous equations 24
§2c Partial pivoting 29
§2d Elementary matrices 32
§2e Determinants 35
§2f Introduction to eigenvalues 38
Chapter 3: Introduction to vector spaces 49
§3a Linearity 49
§3b Vector axioms 52
§3c Trivial consequences of the axioms 61
§3d Subspaces 63
§3e Linear combinations 71
Chapter 4: The structure of abstract vector spaces 81
§4a Preliminary lemmas 81
§4b Basis theorems 85
§4c The Replacement Lemma 86
§4d Two properties of linear transformations 91
§4e Coordinates relative to a basis 93
Chapter 5: Inner Product Spaces 99
§5a The inner product axioms 99
§5b Orthogonal projection 106
§5c Orthogonal and unitary transformations 116
§5d Quadratic forms 121
Chapter 6: Relationships between spaces 129
§6a Isomorphism 129
§6b Direct sums 134
§6c Quotient spaces 139
§6d The dual space 142
Chapter 7: Matrices and Linear Transformations 148
§7a The matrix of a linear transformation 148
§7b Multiplication of transformations and matrices 153
§7c The Main Theorem on Linear Transformations 157
§7d Rank and nullity of matrices 161
Chapter 8: Permutations and determinants 171
§8a Permutations 171
§8b Determinants 179
§8c Expansion along a row 188
Chapter 9: Classification of linear operators 194
§9a Similarity of matrices 194
§9b Invariant subspaces 200
§9c Algebraically closed fields 204
§9d Generalized eigenspaces 205
§9e Nilpotent operators 210
§9f The Jordan canonical form 216
§9g Polynomials 221
Index of notation 228
Index of examples 229
1
Preliminaries
The topics dealt with in this introductory chapter are of a general mathematical nature, being just as relevant to other parts of mathematics as they are to vector space theory. In this course you will be expected to learn several things about vector spaces (of course!), but, perhaps even more importantly, you will be expected to acquire the ability to think clearly and express yourself clearly, for this is what mathematics is really all about. Accordingly, you are urged to read (or reread) Chapter 1 of “Proofs and Problems in Calculus” by G. P. Monro; some of the points made there are reiterated below.
§1a Logic and common sense
When reading or writing mathematics you should always remember that the
mathematical symbols which are used are simply abbreviations for words.
Mechanically replacing the symbols by the words they represent should result
in grammatically correct and complete sentences. The meanings of a few
commonly used symbols are given in the following table.
Symbols          To be read as
{ ... | ... }    the set of all ... such that ...
=                is
∈                in or is in
>                greater than or is greater than
Thus, for example, the following sequence of symbols
{ x ∈ X | x > a } ≠ ∅
is an abbreviated way of writing the sentence
The set of all x in X such that x is greater than a is not the empty set.
2 Chapter One: Preliminaries
When reading mathematics you should mentally translate all symbols in this
fashion, and when writing mathematics you should make sure that what you
write translates into meaningful sentences.
Next, you must learn how to write out proofs. The ability to construct proofs is probably the most important skill for the student of pure mathematics to acquire. And it must be realized that this ability is nothing more than an extension of clear thinking. A proof is nothing more nor less than an explanation of why something is so. When asked to prove something, your first task is to be quite clear as to what is being asserted; then you must decide why it is true; then write the reason down, in plain language. There is never anything wrong with stating the obvious in a proof; indeed, people who leave out the “obvious” steps often make incorrect deductions.
When trying to prove something, the logical structure of what you are trying to prove determines the logical structure of the proof; this observation may seem rather trite, but nonetheless it is often ignored. For instance, it frequently happens that students who are supposed to be proving that a statement p is a consequence of a statement q actually write out a proof that q is a consequence of p. To help you avoid such mistakes we list a few simple rules to aid in the construction of logically correct proofs, and you are urged, whenever you think you have successfully proved something, to always check that your proof has the correct logical structure.
• To prove a statement of the form
If p then q
your ﬁrst line should be
Assume that p is true
and your last line
Therefore q is true.
• The statement
p if and only if q
is logically equivalent to
If p then q and if q then p.
To prove it you must do two proofs of the kind described in the preceding paragraph.
• Suppose that P(x) is some statement about x. Then to prove
P(x) is true for all x in the set S
your ﬁrst line should be
Let x be an arbitrary element of the set S
and your last line
Therefore P(x) holds.
• Statements of the form
There exists an x such that P(x) is true
are proved by producing an example of such an x.
Some of the above points are illustrated in the examples #1, #2, #3
and #4 at the end of the next section.
§1b Sets and functions
It has become traditional to base all mathematics on set theory, and we will
assume that the reader has an intuitive familiarity with the basic concepts.
For instance, we write S ⊆ A (S is a subset of A) if every element of S is an
element of A. If S and T are two subsets of A then the union of S and T is
the set
S ∪ T = { x ∈ A | x ∈ S or x ∈ T }
and the intersection of S and T is the set
S ∩ T = { x ∈ A | x ∈ S and x ∈ T }.
(Note that the ‘or’ above is the inclusive ‘or’—that which is sometimes
written as ‘and/or’. In this book ‘or’ will always be used in this sense.)
Given any two sets S and T the Cartesian product S × T of S and T is the set of all ordered pairs (s, t) with s ∈ S and t ∈ T; that is,
S × T = { (s, t) | s ∈ S, t ∈ T }.
The Cartesian product of S and T always exists, for any two sets S and T.
This is a fact which we ask the reader to take on trust. This course is
not concerned with the foundations of mathematics, and to delve into formal
treatments of such matters would sidetrack us too far from our main purpose.
Similarly, we will not attempt to give formal deﬁnitions of the concepts of
‘function’ and ‘relation’.
Let A and B be sets. A function f from A to B is to be thought of as a
rule which assigns to every element a of the set A an element f(a) of the set
B. The set A is called the domain of f and B the codomain (or target) of f.
We use the notation ‘f: A → B’ (read ‘f, from A to B’) to mean that f is a
function with domain A and codomain B.
A map is the same thing as a function. The terms mapping and transformation are also used.
A function f: A → B is said to be injective (or one-to-one) if and only if no two distinct elements of A yield the same element of B. In other words, f is injective if and only if for all a₁, a₂ ∈ A, if f(a₁) = f(a₂) then a₁ = a₂.
A function f: A → B is said to be surjective (or onto) if and only if for
every element b of B there is an a in A such that f(a) = b.
If a function is both injective and surjective we say that it is bijective (or a one-to-one correspondence).
The image (or range) of a function f: A → B is the subset of B consisting of all elements obtained by applying f to elements of A. That is,
im f = { f(a) | a ∈ A }.
An alternative notation is ‘f(A)’ instead of ‘im f’. Clearly, f is surjective if and only if im f = B. The word ‘image’ is also used in a slightly different sense: if a ∈ A then the element f(a) ∈ B is sometimes called the image of a under the function f.
The notation ‘a ↦ b’ means ‘a maps to b’; in other words, the function involved assigns the element b to the element a. Thus, ‘a ↦ b’ (under the function f) means exactly the same as ‘f(a) = b’.
If f: A → B is a function and C a subset of B then the inverse image or preimage of C is the subset of A
f⁻¹(C) = { a ∈ A | f(a) ∈ C }.
(The above sentence reads ‘f inverse of C is the set of all a in A such that f of a is in C.’ Alternatively, one could say ‘the inverse image of C under f’ instead of ‘f inverse of C’.)
Let f: B → C and g: A → B be functions such that the domain of f is the codomain of g. The composite of f and g is the function fg: A → C given
by (fg)(a) = f(g(a)) for all a in A. It is easily checked that if f and g are
as above and h: D → A is another function then the composites (fg)h and
f(gh) are equal.
Given any set A the identity function on A is the function i: A → A
deﬁned by i(a) = a for all a ∈ A. It is clear that if f is any function with
domain A then fi = f, and likewise if g is any function with codomain A
then ig = g.
If f: B → A and g: A → B are functions such that the composite fg is the identity on A then we say that f is a left inverse of g and g is a right inverse of f. If in addition we have that gf is the identity on B then we say that f and g are inverse to each other, and we write f = g⁻¹ and g = f⁻¹. It is easily seen that f: B → A has an inverse if and only if it is bijective, in which case f⁻¹: A → B satisfies f⁻¹(a) = b if and only if f(b) = a (for all a ∈ A and b ∈ B).
Suppose that g: A → B and f: B → C are bijective functions, so that there exist inverse functions g⁻¹: B → A and f⁻¹: C → B. By the properties stated above we find that
(fg)(g⁻¹f⁻¹) = ((fg)g⁻¹)f⁻¹ = (f(gg⁻¹))f⁻¹ = (f i_B)f⁻¹ = ff⁻¹ = i_C
(where i_B and i_C are the identity functions on B and C), and an exactly similar calculation shows that (g⁻¹f⁻¹)(fg) is the identity on A. Thus fg has an inverse, and we have proved that the composite of two bijective functions is necessarily bijective.
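The calculation above can be checked concretely on small finite sets. The following Python sketch (not part of the original notes; the particular sets and functions are invented for illustration) represents bijections as dictionaries and verifies that the inverse of a composite is the composite of the inverses, taken in the reverse order:

```python
# Composites of bijections: (fg)^(-1) = g^(-1) f^(-1), on small finite sets
A, B, C = [0, 1, 2], ['a', 'b', 'c'], ['x', 'y', 'z']

g = {0: 'a', 1: 'b', 2: 'c'}          # a bijection g: A -> B
f = {'a': 'y', 'b': 'z', 'c': 'x'}    # a bijection f: B -> C

fg = {a: f[g[a]] for a in A}          # the composite fg: A -> C

g_inv = {v: k for k, v in g.items()}
f_inv = {v: k for k, v in f.items()}
fg_inv = {v: k for k, v in fg.items()}

# the inverse of the composite equals g^(-1) applied after f^(-1)
assert fg_inv == {c: g_inv[f_inv[c]] for c in C}
# and (fg)(g^(-1) f^(-1)) is the identity on C
assert all(fg[g_inv[f_inv[c]]] == c for c in C)
print("(fg)^(-1) = g^(-1) f^(-1) verified")
```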
Examples
#1 Suppose that you wish to prove that a function λ: X → Y is injective. Consult the definition of injective. You are trying to prove the following statement:
For all x₁, x₂ ∈ X, if λ(x₁) = λ(x₂) then x₁ = x₂.
So the first two lines of your proof should be as follows:
Let x₁, x₂ ∈ X.
Assume that λ(x₁) = λ(x₂).
Then you will presumably consult the definition of the function λ to derive consequences of λ(x₁) = λ(x₂), and eventually you will reach the final line
Therefore x₁ = x₂.
#2 Suppose you wish to prove that λ: X → Y is surjective. That is, you
wish to prove
For every y ∈ Y there exists x ∈ X with λ(x) = y.
Your ﬁrst line must be
Let y be an arbitrary element of Y .
Somewhere in the middle of the proof you will have to somehow deﬁne an
element x of the set X (the deﬁnition of x is bound to involve y in some
way), and the last line of your proof has to be
Therefore λ(x) = y.
#3 Suppose that A and B are sets, and you wish to prove that A ⊆ B.
By deﬁnition the statement ‘A ⊆ B’ is logically equivalent to
All elements of A are elements of B.
So your ﬁrst line should be
Let x ∈ A
and your last line should be
Therefore x ∈ B.
#4 Suppose that you wish to prove that A = B, where A and B are sets.
The following statements are all logically equivalent to ‘A = B’:
(i) For all x, x ∈ A if and only if x ∈ B.
(ii) (For all x)
(if x ∈ A then x ∈ B) and (if x ∈ B then x ∈ A)
.
(iii) All elements of A are elements of B and all elements of B are elements
of A.
(iv) A ⊆ B and B ⊆ A.
You must do two proofs of the general form given in #3 above.
#5 Let A = { n ∈ Z | 0 ≤ n ≤ 3 } and B = { n ∈ Z | 0 ≤ n ≤ 2 }. Prove that if C = { n ∈ Z | 0 ≤ n ≤ 11 } then there is a bijective map f: A × B → C given by f(a, b) = 3a + b for all a ∈ A and b ∈ B.
−− Observe first that by the definition of the Cartesian product of two sets, A × B consists of all ordered pairs (a, b), with a ∈ A and b ∈ B. Our
first task is to show that for every such pair (a, b), and with f as defined above, f(a, b) ∈ C.
Let (a, b) ∈ A × B. Then a and b are integers, and so 3a + b is also an integer. Since 0 ≤ a ≤ 3 we have that 0 ≤ 3a ≤ 9, and since 0 ≤ b ≤ 2 we deduce that 0 ≤ 3a + b ≤ 11. So 3a + b ∈ { n ∈ Z | 0 ≤ n ≤ 11 } = C, as required. We have now shown that f, as defined above, is indeed a function from A × B to C.
We now show that f is injective. Let (a, b), (a′, b′) ∈ A × B, and assume that f(a, b) = f(a′, b′). Then, by the definition of f, we have 3a + b = 3a′ + b′, and hence b′ − b = 3(a − a′). Since a − a′ is an integer, this shows that b′ − b is a multiple of 3. But since b, b′ ∈ B we have that 0 ≤ b′ ≤ 2 and −2 ≤ −b ≤ 0, and adding these inequalities gives −2 ≤ b′ − b ≤ 2. The only multiple of 3 in this range is 0; hence b′ = b, and the equation 3a + b = 3a′ + b′ becomes 3a = 3a′, giving a = a′. Therefore (a, b) = (a′, b′).
Finally, we must show that f is surjective. Let c be an arbitrary element of the set C, and let m be the largest multiple of 3 which is not greater than c. Then m = 3a for some integer a, and 3a + 3 (which is a multiple of 3 larger than m) must exceed c; so 3a ≤ c ≤ 3a + 2. Defining b = c − 3a, it follows that b is an integer satisfying 0 ≤ b ≤ 2; that is, b ∈ B. Note also that since 0 ≤ c ≤ 11, the largest multiple of 3 not exceeding c is at least 0, and less than 12. That is, 0 ≤ 3a < 12, whence 0 ≤ a < 4, and, since a is an integer, 0 ≤ a ≤ 3. Hence a ∈ A, and so (a, b) ∈ A × B. Now f(a, b) = 3a + b = c, which is what was to be proved. −−<
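Since the sets in #5 are finite, the whole claim can also be confirmed by brute force. The following Python sketch (an illustration, not part of the original notes) lists every value f(a, b) and checks that f hits each element of C exactly once:

```python
from itertools import product

A = range(0, 4)          # A = { n in Z | 0 <= n <= 3 }
B = range(0, 3)          # B = { n in Z | 0 <= n <= 2 }
C = set(range(0, 12))    # C = { n in Z | 0 <= n <= 11 }

def f(a, b):
    return 3 * a + b

images = [f(a, b) for a, b in product(A, B)]

assert set(images) <= C                   # f really maps into C
assert set(images) == C                   # surjective: every c is hit
assert len(images) == len(set(images))    # injective: no value repeats
print("f is a bijection from A x B onto C")
```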
#6 Let f be as defined in #5 above. Find the preimages of the following subsets of C:
S₁ = {5, 11},   S₂ = {0, 1, 2},   S₃ = {0, 3, 6, 9}.
−− They are f⁻¹(S₁) = {(1, 2), (3, 2)}, f⁻¹(S₂) = {(0, 0), (0, 1), (0, 2)} and f⁻¹(S₃) = {(0, 0), (1, 0), (2, 0), (3, 0)} respectively. −−<
§1c Relations
We move now to the concept of a relation on a set X. For example, ‘<’ is a
relation on the set of natural numbers, in the sense that if m and n are any
two natural numbers then m < n is a well-defined statement which is either true or false. Likewise, ‘having the same birthday as’ is a relation on the set of all people. If we were to use the symbol ‘∼’ for this relation then a ∼ b would mean ‘a has the same birthday as b’ and a ≁ b would mean ‘a does not have the same birthday as b’.
If ∼ is a relation on X then { (x, y) | x ∼ y } is a subset of X × X, and, conversely, any subset of X × X defines a relation on X. Formal treatments of this topic usually define a relation on X to be a subset of X × X.
1.1 Definition Let ∼ be a relation on a set X.
(i) The relation ∼ is said to be reﬂexive if x ∼ x for all x ∈ X.
(ii) The relation ∼ is said to be symmetric if x ∼ y whenever y ∼ x (for all
x and y in X).
(iii) The relation ∼ is said to be transitive if x ∼ z whenever x ∼ y and
y ∼ z (for all x, y, z ∈ X).
A relation which is reﬂexive, symmetric and transitive is called an equivalence
relation.
For example, the “birthday” relation above is an equivalence relation.
Another example would be to deﬁne sets X and Y to be equivalent if they
have the same number of elements; more formally, deﬁne X ∼ Y if there exists
a bijection from X to Y . The reﬂexive property for this relation follows from
the fact that identity functions are bijective, the symmetric property from
the fact that the inverse of a bijection is also a bijection, and transitivity
from the fact that the composite of two bijections is a bijection.
It is clear that an equivalence relation ∼ on a set X partitions X into nonoverlapping subsets, two elements x, y ∈ X being in the same subset if and only if x ∼ y. (See #7 below.) These subsets are called equivalence classes. The set of all equivalence classes is then called the quotient of X by the relation ∼.
When dealing with equivalence relations it frequently simplifies matters to pretend that equivalent things are equal—ignoring irrelevant aspects, as it were. The concept of ‘quotient’ defined above provides a mathematical mechanism for doing this. If X̄ is the quotient of X by the equivalence relation ∼ then X̄ can be thought of as the set obtained from X by identifying equivalent elements of X. The idea is that an equivalence class is a single thing which embodies things we wish to identify. See #10 and #11 below, for example.
Examples
#7 Suppose that ∼ is an equivalence relation on the set X, and for each x ∈ X define C(x) = { y ∈ X | x ∼ y }. Prove that for all y, z ∈ X, the sets C(y) and C(z) are equal if y ∼ z, and have no elements in common if y ≁ z. Prove furthermore that C(x) ≠ ∅ for all x ∈ X, and that for every y ∈ X there is an x ∈ X such that y ∈ C(x).
−− Let y, z ∈ X with y ∼ z. Let x be an arbitrary element of C(y). Then
by deﬁnition of C(y) we have that y ∼ x. But z ∼ y (by symmetry of ∼,
since we are given that y ∼ z), and by transitivity it follows from z ∼ y and
y ∼ x that z ∼ x. This further implies, by deﬁnition of C(z), that x ∈ C(z).
Thus every element of C(y) is in C(z); so we have shown that C(y) ⊆ C(z).
On the other hand, if we let x be an arbitrary element of C(z) then we have
z ∼ x, which combined with y ∼ z yields y ∼ x (by transitivity of ∼), and
hence x ∈ C(y). So C(z) ⊆ C(y), as well as C(y) ⊆ C(z). Thus C(y) = C(z)
whenever y ∼ z.
Now let y, z ∈ X with y ≁ z, and let x ∈ C(y) ∩ C(z). Then x ∈ C(y) and x ∈ C(z), and so y ∼ x and z ∼ x. By symmetry we deduce that x ∼ z, and now y ∼ x and x ∼ z yield y ∼ z by transitivity. This contradicts our assumption that y ≁ z. It follows that there can be no element x in C(y) ∩ C(z); in other words, C(y) and C(z) have no elements in common if y ≁ z.
Finally, observe that if x ∈ X is arbitrary then x ∈ C(x), since by reflexivity of ∼ we know that x ∼ x. Hence C(x) ≠ ∅. Furthermore, for every y ∈ X there is an x ∈ X with y ∈ C(x), since x = y has the required property. −−<
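The partition property proved in #7 can be observed directly on a concrete equivalence relation. The following Python sketch (not from the notes; the relation "x ≡ y mod 3" on {0, …, 11} is chosen for illustration) builds the classes C(x) and checks that the distinct ones are nonempty, pairwise disjoint, and cover X:

```python
# Equivalence classes C(x) = { y | x ~ y } for x ~ y iff x = y (mod 3)
X = range(12)

def related(x, y):
    return x % 3 == y % 3

def C(x):
    return frozenset(y for y in X if related(x, y))

classes = {C(x) for x in X}     # the quotient of X by ~

covered = set()
for c in classes:
    assert c                     # each class is nonempty
    assert covered.isdisjoint(c) # classes are pairwise disjoint
    covered |= c
assert covered == set(X)         # the classes cover X
print(sorted(sorted(c) for c in classes))
```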
#8 Let f: X → S be an arbitrary function, and define a relation ∼ on X by the following rule: x ∼ y if and only if f(x) = f(y). Prove that ∼ is an equivalence relation. Prove furthermore that if X̄ is the quotient of X by ∼ then there is a one-to-one correspondence between the sets X̄ and im f.
−− Let x ∈ X. Since f(x) = f(x) it is certainly true that x ∼ x. So ∼ is
a reﬂexive relation.
Let x, y ∈ X with x ∼ y. Then f(x) = f(y) (by the deﬁnition of ∼);
thus f(y) = f(x), which shows that y ∼ x. So y ∼ x whenever x ∼ y, and
we have shown that ∼ is symmetric.
Now let x, y, z ∈ X with x ∼ y and y ∼ z. Then f(x) = f(y) and
f(y) = f(z), whence f(x) = f(z), and x ∼ z. So ∼ is also transitive, whence
it is an equivalence relation.
Using the notation of #7 above, X̄ = { C(x) | x ∈ X } (since C(x) is the equivalence class of the element x ∈ X). We show that there is a bijective function f̄: X̄ → im f such that if C ∈ X̄ is any equivalence class, then f̄(C) = f(x) for all x ∈ X such that C = C(x). In other words, we wish to define f̄: X̄ → im f by f̄(C(x)) = f(x) for all x ∈ X. To show that this does give a well-defined function from X̄ to im f, we must show that
(i) every C ∈ X̄ has the form C = C(x) for some x ∈ X (so that the given formula defines f̄(C) for all C ∈ X̄),
(ii) if x, y ∈ X with C(x) = C(y) then f(x) = f(y) (so that the given formula defines f̄(C) uniquely in each case), and
(iii) f(x) ∈ im f (so that the given formula does define f̄(C) to be an element of im f, the set which is meant to be the codomain of f̄).
The first and third of these points are trivial: since X̄ is defined to be the set of all equivalence classes its elements are certainly all of the form C(x), and it is immediate from the definition of the image of f that f(x) ∈ im f for all x ∈ X. As for the second point, suppose that x, y ∈ X with C(x) = C(y). Then by one of the results proved in #7 above, y ∈ C(y) = C(x), whence x ∼ y by the definition of C(x). And by the definition of ∼, this says that f(x) = f(y), as required. Hence the function f̄ is well-defined.
We must prove that f̄ is bijective. Suppose that C, C′ ∈ X̄ with f̄(C) = f̄(C′). Choose x, y ∈ X such that C = C(x) and C′ = C(y). Then
f(x) = f̄(C(x)) = f̄(C) = f̄(C′) = f̄(C(y)) = f(y),
and so x ∼ y, by definition of ∼. By #7 above, it follows that C(x) = C(y); that is, C = C′. Hence f̄ is injective. But if s ∈ im f is arbitrary then by definition of im f there exists an x ∈ X with s = f(x), and now the definition of f̄ gives f̄(C(x)) = f(x) = s. Since C(x) ∈ X̄, we have shown that for all s ∈ im f there exists C ∈ X̄ with f̄(C) = s. Thus f̄: X̄ → im f is surjective, as required. −−<
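The correspondence of #8 is easy to see on a small example. The following Python sketch (illustrative only; the function x ↦ x mod 4 on {0, …, 9} is invented for the purpose) forms the classes of the relation "f(x) = f(y)" and checks that they match up one-to-one with the values in im f:

```python
# x ~ y iff f(x) = f(y); the classes of ~ correspond bijectively to im f
X = range(10)

def f(x):
    return x % 4           # any function from X would do

classes = {frozenset(y for y in X if f(y) == f(x)) for x in X}
image = {f(x) for x in X}

# fbar sends the class C(x) to f(x); any representative gives the same value
fbar = {c: f(next(iter(c))) for c in classes}
assert set(fbar.values()) == image    # fbar is onto im f
assert len(classes) == len(image)     # and one-to-one
print(len(classes), "classes,", len(image), "values in im f")
```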
§1d Fields
Vector space theory is concerned with two different kinds of mathematical objects, called vectors and scalars. The theory has many different applications, and the vectors and scalars for one application will generally be different from
the vectors and scalars for another application. Thus the theory does not say what vectors and scalars are; instead, it gives a list of defining properties, or axioms, which the vectors and scalars have to satisfy for the theory to be applicable. The axioms that vectors have to satisfy are given in Chapter Three; the axioms that scalars have to satisfy are given below.
In fact, the scalars must form what mathematicians call a ‘ﬁeld’. This
means, roughly speaking, that you have to be able to add scalars and multiply
scalars, and these operations of addition and multiplication have to satisfy
most of the familiar properties of addition and multiplication of real numbers.
Indeed, in almost all the important applications the scalars are just the real
numbers. So, when you see the word ‘scalar’, you may as well think ‘real
number’. But there are other sets equipped with operations of addition and
multiplication which satisfy the relevant properties; two notable examples
are the set of all complex numbers and the set of all rational numbers.†
1.2 Definition A set F which is equipped with operations of addition and multiplication is called a field if the following properties are satisfied.
(i) a + (b + c) = (a + b) + c for all a, b, c ∈ F.
(That is, addition is associative.)
(ii) There exists an element 0 ∈ F such that a + 0 = a = 0 + a for all a ∈ F. (There is a zero element in F.)
(iii) For each a ∈ F there is a b ∈ F such that a + b = 0 = b + a.
(Each element has a negative.)
(iv) a + b = b + a for all a, b ∈ F. (Addition is commutative.)
(v) a(bc) = (ab)c for all a, b, c ∈ F. (Multiplication is associative.)
(vi) There exists an element 1 ∈ F, which is not equal to the zero element 0, such that 1a = a = a1 for all a ∈ F.
(There is a unity element in F.)
(vii) For each a ∈ F with a ≠ 0 there exists b ∈ F with ab = 1 = ba.
(Nonzero elements have inverses.)
(viii) ab = ba for all a, b ∈ F. (Multiplication is commutative.)
(ix) a(b + c) = ab + ac and (a + b)c = ac + bc for all a, b, c ∈ F.
(Both distributive laws hold.)
Comments
1.2.1 Although the deﬁnition is quite long the idea is simple: for a set
† A number is rational if it has the form n/m where n and m are integers and m ≠ 0.
to be a ﬁeld you have to be able to add and multiply elements, and all the
obvious desirable properties have to be satisﬁed.
1.2.2 It is possible to use set theory to prove the existence of the integers and the real numbers (assuming the correctness of set theory!) and to prove that the set of all real numbers is a field. We will not, however, delve into these matters in this book; we will simply assume that the reader is familiar with these basic number systems and their properties. In particular, we will not prove that the real numbers form a field.
1.2.3 The definition is not quite complete since we have not said what is meant by the term operation. In general an operation on a set S is just a function from S × S to S. In other words, an operation is a rule which assigns an element of S to each ordered pair of elements of S. Thus, for instance, the rule
(a, b) ↦ √((a + b)/3)
defines an operation on the set R of all real numbers: we could use the notation a ◦ b = √((a + b)/3). Of course, such unusual operations are not likely to be of any interest to anyone, since we want operations to satisfy nice properties like those listed above. In particular, it is rare to consider operations which are not associative, and the symbol ‘+’ is always reserved for operations which are both associative and commutative.
The set of all real numbers is by far the most important example of a
ﬁeld. Nevertheless, there are many other ﬁelds which occur in mathematics,
and so we list some examples. We omit the proofs, however, so as not to be
diverted from our main purpose for too long.
Examples
#9 As mentioned earlier, the set C of all complex numbers is a ﬁeld, and
so is the set Q of all rational numbers.
#10 Any set with exactly two elements can be made into a field by defining addition and multiplication appropriately. Let S be such a set, and (for reasons that will become apparent) let us give the elements of S the names ‘odd’ and ‘even’. We now define addition by the rules
even + even = even        even + odd = odd
odd + even = odd          odd + odd = even
and multiplication by the rules
even × even = even        even × odd = even
odd × even = even         odd × odd = odd.
These definitions are motivated by the fact that the sum of any two even integers is even, and so on. Thus it is natural to associate ‘odd’ with the set of all odd integers and ‘even’ with the set of all even integers. It is straightforward to check that the field axioms are now satisfied, with even as the zero element and odd as the unity.
Henceforth this field will be denoted by ‘Z₂’ and its elements will simply be called ‘0’ and ‘1’.
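The odd/even arithmetic above is exactly arithmetic modulo 2. A two-line Python sketch (an illustration, not part of the notes) confirms that the tables come out as stated, with even written as 0 and odd as 1:

```python
# The two-element field Z2: "even" -> 0, "odd" -> 1, arithmetic mod 2
add = lambda a, b: (a + b) % 2
mul = lambda a, b: (a * b) % 2

# even+even=even, even+odd=odd, odd+odd=even; odd*odd=odd, even*odd=even
assert add(0, 0) == 0 and add(0, 1) == 1 and add(1, 1) == 0
assert mul(0, 1) == 0 and mul(1, 1) == 1
print("Z2 tables match the odd/even rules")
```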
#11 A similar process can be used to construct a field with exactly three elements; intuitively, we wish to identify integers which differ by a multiple of three. Accordingly, let ‘div’ be the set of all integers divisible by three, ‘div+1’ the set of all integers which are one greater than integers divisible by three, and ‘div−1’ the set of all integers which are one less than integers divisible by three. The appropriate definitions for addition and multiplication of these objects are determined by corresponding properties of addition and multiplication of integers. Thus, since
(3k + 1)(3h − 1) = 3(3kh + h − k) − 1
it follows that the product of an integer in ‘div+1’ and an integer in ‘div−1’ is always in ‘div−1’, and so we should define
(div+1)(div−1) = div−1.
Once these definitions have been made it is fairly easy to check that the field axioms are satisfied.
This field will henceforth be denoted by ‘Z₃’ and its elements by ‘0’, ‘1’ and ‘−1’. Since 1 + 1 = −1 in Z₃ the element −1 is alternatively denoted by ‘2’ (or by ‘5’ or ‘−4’ or, indeed, ‘n’ for any integer n which is one less than a multiple of 3).
#12 The above construction can be generalized to give a field with n elements for any prime number n. In fact, the construction of Zₙ works for any n, but the field axioms are only satisfied if n is prime. Thus Z₄ = {0, 1, 2, 3}
with the operations of “addition and multiplication modulo 4” does not form a field, but Z₅ = {0, 1, 2, 3, 4} does form a field under addition and multiplication modulo 5. Indeed, the addition and multiplication tables for Z₅ are as follows:

 + | 0 1 2 3 4        × | 0 1 2 3 4
---+----------       ---+----------
 0 | 0 1 2 3 4        0 | 0 0 0 0 0
 1 | 1 2 3 4 0        1 | 0 1 2 3 4
 2 | 2 3 4 0 1        2 | 0 2 4 1 3
 3 | 3 4 0 1 2        3 | 0 3 1 4 2
 4 | 4 0 1 2 3        4 | 0 4 3 2 1

It is easily seen that Z₄ cannot possibly be a field, because multiplication modulo 4 gives 2² = 0, whereas it is a consequence of the field axioms that the product of two nonzero elements of any field must be nonzero. (This is Exercise 4 at the end of this chapter.)
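Axiom (vii) is the one that singles out the primes: Zₙ is a field precisely when every nonzero element has a multiplicative inverse modulo n. A short Python sketch (illustrative, not from the notes) tests this directly and recovers the primes:

```python
def is_field_Zn(n):
    """Check axiom (vii) for Z_n: every nonzero a has some b with ab = 1 mod n."""
    return all(any(a * b % n == 1 for b in range(1, n)) for a in range(1, n))

assert is_field_Zn(2) and is_field_Zn(3) and is_field_Zn(5)
assert not is_field_Zn(4)      # 2 * 2 = 0 mod 4, so 2 has no inverse
print([n for n in range(2, 13) if is_field_Zn(n)])   # the primes up to 12
```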
#13 Define
F = { ( a b ; 2b a ) | a, b ∈ Q }
(writing ( a b ; c d ) for the 2 × 2 matrix with rows (a, b) and (c, d)), with addition and multiplication of matrices defined in the usual way. (See Chapter Two for the relevant definitions.) It can be shown that F is a field.
Note that if we define I = ( 1 0 ; 0 1 ) and J = ( 0 1 ; 2 0 ) then F consists of all matrices of the form aI + bJ, where a, b ∈ Q. Now since J² = 2I (check this!) we see that the rule for the product of two elements of F is
(aI + bJ)(cI + dJ) = (ac + 2bd)I + (ad + bc)J    for all a, b, c, d ∈ Q.
If we define F′ to be the set of all real numbers of the form a + b√2, where a and b are rational, then we have a similar formula for the product of two elements of F′, namely
(a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2    for all a, b, c, d ∈ Q.
The rules for addition in F and in F′ are obviously similar as well; indeed,
(aI + bJ) + (cI + dJ) = (a + c)I + (b + d)J
and
(a + b√2) + (c + d√2) = (a + c) + (b + d)√2.
Thus, as far as addition and multiplication are concerned, F and F′ are essentially the same as one another. This is an example of what is known as isomorphism of two algebraic systems.
Note that F′ is obtained by “adjoining” √2 to the field Q, in exactly the same way as √−1 is “adjoined” to the real field R to construct the complex field C.
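The product rule for F in #13 can be verified by actually multiplying the matrices. The following Python sketch (an illustration, not part of the notes) builds aI + bJ with exact rational entries and checks that the matrix product agrees with the formula (ac + 2bd)I + (ad + bc)J:

```python
from fractions import Fraction as Q

def mat(a, b):
    """The matrix aI + bJ = ( a b ; 2b a ) with rational entries."""
    return [[a, b], [2 * b, a]]

def matmul(X, Y):
    """Ordinary 2x2 matrix multiplication."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a, b, c, d = Q(1), Q(2), Q(3), Q(-1)
# matrix product agrees with the rule (ac + 2bd)I + (ad + bc)J
assert matmul(mat(a, b), mat(c, d)) == mat(a * c + 2 * b * d, a * d + b * c)
print("product rule verified:", matmul(mat(a, b), mat(c, d)))
```

The same coefficient rule governs products (a + b√2)(c + d√2) in F′, which is exactly the isomorphism described in the text.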
#14 Although Z_4 is not a field, it is possible to construct a field with
four elements. We consider matrices whose entries come from the field Z_2.
Addition and multiplication of matrices is defined in the usual way, but
addition and multiplication of the matrix entries must be performed modulo 2,
since these entries come from Z_2. It turns out that
$$K = \left\{ \begin{pmatrix}0&0\\0&0\end{pmatrix},\ \begin{pmatrix}1&0\\0&1\end{pmatrix},\ \begin{pmatrix}1&1\\1&0\end{pmatrix},\ \begin{pmatrix}0&1\\1&1\end{pmatrix} \right\}$$
is a field. The zero element of this field is the zero matrix
$\begin{pmatrix}0&0\\0&0\end{pmatrix}$, and the unity element is the identity
matrix $\begin{pmatrix}1&0\\0&1\end{pmatrix}$. To simplify notation, let us
temporarily use ‘0’ to denote the zero matrix and ‘1’ to denote the identity
matrix, and let us also use ‘ω’ to denote one of the two remaining elements
of K (it does not matter which). Remembering that addition and multiplication
are to be performed modulo 2 in this example, we see that
$$\begin{pmatrix}1&0\\0&1\end{pmatrix} + \begin{pmatrix}1&1\\1&0\end{pmatrix} = \begin{pmatrix}0&1\\1&1\end{pmatrix} \qquad\text{and also}\qquad \begin{pmatrix}1&0\\0&1\end{pmatrix} + \begin{pmatrix}0&1\\1&1\end{pmatrix} = \begin{pmatrix}1&1\\1&0\end{pmatrix},$$
so that the fourth element of K equals 1 + ω. A short calculation yields the
following addition and multiplication tables for K:

    +    | 0     1     ω     1+ω
    -----+-----------------------
    0    | 0     1     ω     1+ω
    1    | 1     0     1+ω   ω
    ω    | ω     1+ω   0     1
    1+ω  | 1+ω   ω     1     0

    ×    | 0     1     ω     1+ω
    -----+-----------------------
    0    | 0     0     0     0
    1    | 0     1     ω     1+ω
    ω    | 0     ω     1+ω   1
    1+ω  | 0     1+ω   1     ω
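The tables can be verified mechanically. A small sketch (plain Python; the tuple encoding of the four matrices is our own device, not the book's notation) rebuilds K from the matrix representation above and checks a few table entries:

```python
# The four elements of K as 2x2 matrices over Z_2, as in the example above.
ZERO = ((0, 0), (0, 0))
ONE = ((1, 0), (0, 1))
OMEGA = ((1, 1), (1, 0))
OMEGA1 = ((0, 1), (1, 1))  # this one turns out to equal 1 + omega
K = [ZERO, ONE, OMEGA, OMEGA1]

def add(a, b):
    # entrywise addition, reduced modulo 2
    return tuple(tuple((a[i][j] + b[i][j]) % 2 for j in range(2))
                 for i in range(2))

def mul(a, b):
    # matrix multiplication, entries reduced modulo 2
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) % 2
                       for j in range(2)) for i in range(2))

# K is closed under both operations, as the tables assert:
assert all(add(a, b) in K and mul(a, b) in K for a in K for b in K)
# Spot-check entries of the tables: 1 + omega is the fourth element,
# omega * omega = 1 + omega, and omega * (1 + omega) = 1.
assert add(ONE, OMEGA) == OMEGA1
assert mul(OMEGA, OMEGA) == OMEGA1
assert mul(OMEGA, OMEGA1) == ONE
```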
#15 The set of all expressions of the form
$$\frac{a_0 + a_1 X + \dots + a_n X^n}{b_0 + b_1 X + \dots + b_m X^m}$$
where the coefficients a_i and b_i are real numbers and the b_i are not all zero,
can be regarded as a field.
The last three examples in the above list are rather outré, and will not
be used in this book. They were included merely to emphasize that lots of
examples of fields do exist.
Exercises
1. Let A and B be nonempty sets and f: A → B a function.
(i ) Prove that f has a left inverse if and only if it is injective.
(ii ) Prove that f has a right inverse if and only if it is surjective.
(iii ) Prove that if f has both a right inverse and a left inverse then they
are equal.
2. Let S be a set and ∼ a relation on S which is both symmetric and
transitive. If x and y are elements of S and x ∼ y then by symmetry
we must have y ∼ x. Now by transitivity x ∼ y and y ∼ x yield x ∼ x,
and so it follows that x ∼ x for all x ∈ S. That is, the reflexive law is a
consequence of the symmetric and transitive laws. What is the error in
this “proof”?
3. Suppose that ∼ is an equivalence relation on a set X. For each x ∈ X
let E(x) = { z ∈ X | x ∼ z }. Prove that if x, y ∈ X then E(x) = E(y)
if x ∼ y, and E(x) ∩ E(y) = ∅ if x ≁ y.
The subsets E(x) of X are called equivalence classes. Prove that each
element x ∈ X lies in a unique equivalence class, although there may be
many different y such that x ∈ E(y).
4. Prove that if F is a ﬁeld and x, y ∈ F are nonzero then xy is also
nonzero. (Hint: Use Axiom (vii).)
5. Prove that Z_n is not a field if n is not prime.
6. It is straightforward to check that Z_n satisfies all the field axioms except
for part (vii) of 1.2. Prove that this axiom is satisfied if n is prime.
(Hint: Let k be a nonzero element of Z_n. Use the fact that if k(i − j)
is divisible by the prime n then i − j must be divisible by n, to
prove that k·0, k·1, ..., k·(n − 1) are all distinct elements of Z_n,
and deduce that one of them must equal 1.)
7. Prove that the example in #14 above is a field. You may assume that
the only solution in integers of the equation a² − 2b² = 0 is a = b = 0,
as well as the properties of matrix addition and multiplication given in
Chapter Two.
Chapter Two: Matrices, row vectors and column vectors
Linear algebra is one of the most basic of all branches of mathematics. The
first and most obvious (and most important) application is the solution of
simultaneous linear equations: in many practical problems quantities which
need to be calculated are related to measurable quantities by linear
equations, and consequently can be evaluated by standard row operation
techniques. Further development of the theory leads to methods of solving
linear differential equations, and even equations which are intrinsically
nonlinear are usually tackled by repeatedly solving suitable linear equations
to find a convergent sequence of approximate solutions. Moreover, the theory
which was originally developed for solving linear equations is generalized
and built on in many other branches of mathematics.
§2a Matrix operations
A matrix is a rectangular array of numbers. For instance,
$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 0 & 7 & 2 \\ 5 & 1 & 1 & 8 \end{pmatrix}$$
is a 3 × 4 matrix. (That is, it has three rows and four columns.) The numbers
are called the matrix entries (or components), and they are easily specified
by row number and column number. Thus in the matrix A above the (2,3)-entry
is 7 and the (3,4)-entry is 8. We will usually use the notation ‘X_{ij}’ for
the (i,j)-entry of a matrix X. Thus, in our example, we have A_{23} = 7 and
A_{34} = 8.
A matrix with only one row is usually called a row vector, and a matrix
with only one column is usually called a column vector. We will usually call
them just rows and columns, since (as we will see in the next chapter) the
term vector can also properly be applied to things which are not rows or
columns. Rows or columns with n components are often called n-tuples.
The set of all m × n matrices over R will be denoted by ‘Mat(m × n, R)’,
and we refer to m × n as the shape of these matrices. We define R^n to be the
set of all column n-tuples of real numbers, and ^tR^n (which we use less
frequently) to be the set of all row n-tuples. (The ‘t’ stands for ‘transpose’.
The transpose of a matrix A ∈ Mat(m × n, R) is the matrix ^tA ∈ Mat(n × m, R)
defined by the formula (^tA)_{ij} = A_{ji}.)
The definition of matrix that we have given is intuitively reasonable,
and corresponds to the best way to think about matrices. Notice, however,
that the essential feature of a matrix is just that there is a well-determined
matrix entry associated with each pair (i, j). For the purest mathematicians,
then, a matrix is simply a function, and consequently a formal definition is
as follows:

2.1 Definition  Let n and m be nonnegative integers. An m × n matrix
over the real numbers is a function
$$A\colon (i, j) \mapsto A_{ij}$$
from the Cartesian product {1, 2, ..., m} × {1, 2, ..., n} to R.
Two matrices of the same shape can be added simply by adding
the corresponding entries. Thus if A and B are both m × n matrices then
A + B is the m × n matrix defined by the formula
$$(A + B)_{ij} = A_{ij} + B_{ij}$$
for all i ∈ {1, 2, ..., m} and j ∈ {1, 2, ..., n}. Addition of matrices satisfies
properties similar to those satisfied by addition of numbers—in particular,
(A + B) + C = A + (B + C) and A + B = B + A for all m × n matrices.
The m × n zero matrix, denoted by ‘0_{m×n}’, or simply ‘0’, is the m × n
matrix all of whose entries are zero. It satisfies A + 0 = A = 0 + A for all
A ∈ Mat(m × n, R).
If λ is any real number and A any matrix over R then we define λA by
$$(\lambda A)_{ij} = \lambda(A_{ij}) \quad\text{for all } i \text{ and } j.$$
Note that λA is a matrix of the same shape as A.
Multiplication of matrices can also be defined, but it is more complicated
than addition and not as well behaved. The product AB of matrices
A and B is defined if and only if the number of columns of A equals the
number of rows of B. Then the (i,j)-entry of AB is obtained by multiplying
the entries of the i-th row of A by the corresponding entries of the j-th column
of B and summing these products. That is, if A has shape m × n and B
shape n × p then AB is the m × p matrix defined by
$$(AB)_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j} + \dots + A_{in}B_{nj} = \sum_{k=1}^{n} A_{ik}B_{kj}$$
for all i ∈ {1, 2, ..., m} and j ∈ {1, 2, ..., p}.
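The formula translates directly into code. A minimal sketch (Python; `mat_mul` is our own name for the definition-level product, not a library routine):

```python
def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B, straight from
    the definition (AB)_ij = sum over k of A_ik * B_kj."""
    m, n, p = len(A), len(B), len(B[0])
    # the product is defined only when columns of A = rows of B
    assert all(len(row) == n for row in A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

# A 2 x 3 matrix times a 3 x 2 matrix gives a 2 x 2 matrix:
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
print(mat_mul(A, B))  # [[58, 64], [139, 154]]
```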
This definition may appear a little strange at first sight, but the
following considerations should make it seem more reasonable. Suppose that
we have two sets of variables, x_1, x_2, ..., x_m and y_1, y_2, ..., y_n, which are
related by the equations
$$\begin{aligned} x_1 &= a_{11}y_1 + a_{12}y_2 + \dots + a_{1n}y_n \\ x_2 &= a_{21}y_1 + a_{22}y_2 + \dots + a_{2n}y_n \\ &\vdots \\ x_m &= a_{m1}y_1 + a_{m2}y_2 + \dots + a_{mn}y_n. \end{aligned}$$
Let A be the m × n matrix whose (i,j)-entry is the coefficient of y_j in the
expression for x_i; that is, A_{ij} = a_{ij}. Suppose now that the variables y_j can
be similarly expressed in terms of a third set of variables z_k, with coefficient
matrix B:
$$\begin{aligned} y_1 &= b_{11}z_1 + b_{12}z_2 + \dots + b_{1p}z_p \\ y_2 &= b_{21}z_1 + b_{22}z_2 + \dots + b_{2p}z_p \\ &\vdots \\ y_n &= b_{n1}z_1 + b_{n2}z_2 + \dots + b_{np}z_p \end{aligned}$$
where b_{ij} is the (i,j)-entry of B. Clearly one can obtain expressions for the
x_i in terms of the z_k by substituting this second set of equations into the first.
It is easily checked that the total coefficient of z_k in the expression for x_i is
a_{i1}b_{1k} + a_{i2}b_{2k} + ... + a_{in}b_{nk}, which is exactly the formula for the (i,k)-entry
of AB. We have shown that if A is the coefficient matrix for expressing the
x_i in terms of the y_j, and B the coefficient matrix for expressing the y_j in
terms of the z_k, then AB is the coefficient matrix for expressing the x_i in
terms of the z_k. If one thinks of matrix multiplication in this way, none of
the facts mentioned below will seem particularly surprising.
Note that if the matrix product AB is defined there is no guarantee that
the product BA is defined also, and even if it is defined it need not equal
AB. Furthermore, it is possible for the product of two nonzero matrices to
be zero. Thus the familiar properties of multiplication of numbers do not all
carry over to matrices. However, the following properties are satisfied:
(i) For each positive integer n there is an n × n identity matrix, commonly
denoted by ‘I_{n×n}’, or simply ‘I’, having the properties that AI = A for
matrices A with n columns and IB = B for all matrices B with n rows.
The (i,j)-entry of I is the Kronecker delta, δ_{ij}, which is 1 if i and j are
equal and 0 otherwise:
$$I_{ij} = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases}$$
(ii) The distributive law A(B + C) = AB + AC is satisfied whenever
A is an m × n matrix and B and C are n × p matrices. Similarly,
(A + B)C = AC + BC whenever A and B are m × n and C is n × p.
(iii) If A is an m × n matrix, B an n × p matrix and C a p × q matrix then
(AB)C = A(BC).
(iv) If A is an m × n matrix, B an n × p matrix and λ any number, then
(λA)B = λ(AB) = A(λB).
These properties are all easily proved. For instance, for (iii) we have
$$\bigl((AB)C\bigr)_{ij} = \sum_{k=1}^{p} (AB)_{ik} C_{kj} = \sum_{k=1}^{p} \Bigl( \sum_{l=1}^{n} A_{il} B_{lk} \Bigr) C_{kj} = \sum_{k=1}^{p} \sum_{l=1}^{n} A_{il} B_{lk} C_{kj}$$
and similarly
$$\bigl(A(BC)\bigr)_{ij} = \sum_{l=1}^{n} A_{il} (BC)_{lj} = \sum_{l=1}^{n} A_{il} \Bigl( \sum_{k=1}^{p} B_{lk} C_{kj} \Bigr) = \sum_{l=1}^{n} \sum_{k=1}^{p} A_{il} B_{lk} C_{kj},$$
and interchanging the order of summation we see that these are equal.
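Property (iii) is also easy to spot-check numerically. A small sketch (Python; the matrices are chosen arbitrarily, and `mat_mul` is just the definition of the product written out):

```python
def mat_mul(A, B):
    # (AB)_ij = sum over k of A_ik * B_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
B = [[1, 0, 2], [0, 1, 3]]     # 2 x 3
C = [[1, 1], [2, 0], [0, 5]]   # 3 x 2

# (AB)C and A(BC) are both 3 x 2 matrices, and the proof says they agree:
assert mat_mul(mat_mul(A, B), C) == mat_mul(A, mat_mul(B, C))
```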
The following facts should also be noted.
2.2 Proposition  The j-th column of a product AB is obtained by multiplying
the matrix A by the j-th column of B, and the i-th row of AB is the i-th
row of A multiplied by B.
Proof.  Suppose that the number of columns of A, necessarily equal to the
number of rows of B, is n. Let b_j be the j-th column of B; thus the k-th entry
of b_j is B_{kj} for each k. Now by definition the i-th entry of the j-th column of
AB is $(AB)_{ij} = \sum_{k=1}^{n} A_{ik}B_{kj}$. But this is exactly the same as the formula
for the i-th entry of Ab_j.
The proof of the other part is similar.
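Both parts of the proposition can be checked directly on a concrete pair of matrices; a sketch (Python, arbitrary 3 × 3 and 3 × 4 matrices, columns and rows extracted by hand):

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 4, 1], [1, 3, 3], [5, 0, 1]]
B = [[1, 1, 1, 1], [0, 1, 2, 3], [4, 1, 4, 1]]
AB = mat_mul(A, B)

# j-th column of AB equals A times the j-th column of B:
for j in range(4):
    b_j = [[B[k][j]] for k in range(3)]   # j-th column of B, as a 3 x 1 matrix
    Ab_j = mat_mul(A, b_j)
    assert [row[j] for row in AB] == [row[0] for row in Ab_j]

# i-th row of AB equals the i-th row of A times B:
for i in range(3):
    a_i = [A[i]]                          # i-th row of A, as a 1 x 3 matrix
    assert AB[i] == mat_mul(a_i, B)[0]
```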
More generally, we have the following rule concerning multiplication of
“partitioned matrices”, which is fairly easy to see although a little messy to
state and prove. Suppose that the m × n matrix A is subdivided into blocks
A^{(rs)} as shown, where A^{(rs)} has shape m_r × n_s:
$$A = \begin{pmatrix} A^{(11)} & A^{(12)} & \dots & A^{(1q)} \\ A^{(21)} & A^{(22)} & \dots & A^{(2q)} \\ \vdots & \vdots & & \vdots \\ A^{(p1)} & A^{(p2)} & \dots & A^{(pq)} \end{pmatrix}.$$
Thus we have that $m = \sum_{r=1}^{p} m_r$ and $n = \sum_{s=1}^{q} n_s$. Suppose also that B is
an n × l matrix which is similarly partitioned into submatrices B^{(st)} of shape
n_s × l_t. Then the product AB can be partitioned into blocks of shape m_r × l_t,
where the (r,t)-block is given by the formula $\sum_{s=1}^{q} A^{(rs)}B^{(st)}$. That is,
$$\begin{pmatrix} A^{(11)} & A^{(12)} & \dots & A^{(1q)} \\ A^{(21)} & A^{(22)} & \dots & A^{(2q)} \\ \vdots & \vdots & & \vdots \\ A^{(p1)} & A^{(p2)} & \dots & A^{(pq)} \end{pmatrix} \begin{pmatrix} B^{(11)} & B^{(12)} & \dots & B^{(1u)} \\ B^{(21)} & B^{(22)} & \dots & B^{(2u)} \\ \vdots & \vdots & & \vdots \\ B^{(q1)} & B^{(q2)} & \dots & B^{(qu)} \end{pmatrix} = \begin{pmatrix} \sum_{s=1}^{q} A^{(1s)}B^{(s1)} & \dots & \sum_{s=1}^{q} A^{(1s)}B^{(su)} \\ \vdots & & \vdots \\ \sum_{s=1}^{q} A^{(ps)}B^{(s1)} & \dots & \sum_{s=1}^{q} A^{(ps)}B^{(su)} \end{pmatrix}. \qquad(\$)$$
The proof consists of calculating the (i,j)-entry of each side, for arbitrary
i and j. Define
$$M_0 = 0,\quad M_1 = m_1,\quad M_2 = m_1 + m_2,\quad \dots,\quad M_p = m_1 + m_2 + \dots + m_p$$
and similarly
$$N_0 = 0,\quad N_1 = n_1,\quad N_2 = n_1 + n_2,\quad \dots,\quad N_q = n_1 + n_2 + \dots + n_q$$
$$L_0 = 0,\quad L_1 = l_1,\quad L_2 = l_1 + l_2,\quad \dots,\quad L_u = l_1 + l_2 + \dots + l_u.$$
Given that i lies between M_0 + 1 = 1 and M_p = m, there exists an r such
that i lies between M_{r−1} + 1 and M_r. Write i′ = i − M_{r−1}. We see that
the i-th row of A is partitioned into the i′-th rows of A^{(r1)}, A^{(r2)}, ..., A^{(rq)}.
Similarly we may locate the j-th column of B by choosing t such that j lies
between L_{t−1} + 1 and L_t, and we write j′ = j − L_{t−1}. Now we have
$$(AB)_{ij} = \sum_{k=1}^{n} A_{ik}B_{kj} = \sum_{s=1}^{q} \sum_{k=N_{s-1}+1}^{N_s} A_{ik}B_{kj} = \sum_{s=1}^{q} \sum_{k=1}^{n_s} (A^{(rs)})_{i'k}(B^{(st)})_{kj'} = \sum_{s=1}^{q} (A^{(rs)}B^{(st)})_{i'j'} = \Bigl( \sum_{s=1}^{q} A^{(rs)}B^{(st)} \Bigr)_{i'j'},$$
which is the (i,j)-entry of the partitioned matrix ($) above.
Example
#1 Verify the above rule for multiplication of partitioned matrices by
computing the matrix product
$$\begin{pmatrix} 2 & 4 & 1 \\ 1 & 3 & 3 \\ 5 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 4 & 1 & 4 & 1 \end{pmatrix}$$
using the following partitioning:
$$\left(\begin{array}{cc|c} 2 & 4 & 1 \\ 1 & 3 & 3 \\ \hline 5 & 0 & 1 \end{array}\right) \left(\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ \hline 4 & 1 & 4 & 1 \end{array}\right).$$
We find that
$$\begin{pmatrix} 2 & 4 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{pmatrix} + \begin{pmatrix} 1 \\ 3 \end{pmatrix} \begin{pmatrix} 4 & 1 & 4 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 6 & 10 & 14 \\ 1 & 4 & 7 & 10 \end{pmatrix} + \begin{pmatrix} 4 & 1 & 4 & 1 \\ 12 & 3 & 12 & 3 \end{pmatrix} = \begin{pmatrix} 6 & 7 & 14 & 15 \\ 13 & 7 & 19 & 13 \end{pmatrix}$$
and similarly
$$\begin{pmatrix} 5 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{pmatrix} + \begin{pmatrix} 1 \end{pmatrix} \begin{pmatrix} 4 & 1 & 4 & 1 \end{pmatrix} = \begin{pmatrix} 5 & 5 & 5 & 5 \end{pmatrix} + \begin{pmatrix} 4 & 1 & 4 & 1 \end{pmatrix} = \begin{pmatrix} 9 & 6 & 9 & 6 \end{pmatrix},$$
so that the answer is
$$\begin{pmatrix} 6 & 7 & 14 & 15 \\ 13 & 7 & 19 & 13 \\ 9 & 6 & 9 & 6 \end{pmatrix},$$
as can easily be checked directly.
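The same block computation can be replayed in code. A sketch (Python, with the block sizes of this particular example hard-coded):

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

A = [[2, 4, 1], [1, 3, 3], [5, 0, 1]]
B = [[1, 1, 1, 1], [0, 1, 2, 3], [4, 1, 4, 1]]

# Blocks of A (2+1 rows, 2+1 columns) and of B (2+1 rows):
A11 = [[2, 4], [1, 3]]; A12 = [[1], [3]]
A21 = [[5, 0]];         A22 = [[1]]
B1 = [[1, 1, 1, 1], [0, 1, 2, 3]]
B2 = [[4, 1, 4, 1]]

top = mat_add(mat_mul(A11, B1), mat_mul(A12, B2))
bottom = mat_add(mat_mul(A21, B1), mat_mul(A22, B2))

# Stacking the block results reproduces the ordinary product:
assert top + bottom == mat_mul(A, B)
assert mat_mul(A, B) == [[6, 7, 14, 15], [13, 7, 19, 13], [9, 6, 9, 6]]
```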
Comment
2.2.1 Looking back on the proofs in this section one quickly sees that
the only facts about real numbers which are made use of are the associative,
commutative and distributive laws for addition and multiplication, existence
of 1 and 0, and the like; in short, these results are based simply on the ﬁeld
axioms. Everything in this section is true for matrices over any ﬁeld.
§2b Simultaneous equations
In this section we review the procedure for solving simultaneous linear
equations. Given the system of equations
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2 \\ &\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n &= b_m \end{aligned} \tag{2.2.2}$$
the ﬁrst step is to replace it by a reduced echelon system which is equivalent
to the original. Solving reduced echelon systems is trivial. An algorithm is
described below for obtaining the reduced echelon system corresponding to
a given system of equations. The algorithm makes use of elementary row
operations, of which there are three kinds:
1. Replace an equation by itself plus a multiple of another.
2. Replace an equation by a nonzero multiple of itself.
3. Write the equations down in a diﬀerent order.
The most important thing about row operations is that the new equations
should be consequences of the old, and, conversely, the old equations should
be consequences of the new, so that the operations do not change the solution
set. It is clear that the three kinds of elementary row operations do satisfy
this requirement; for instance, if the ﬁfth equation is replaced by itself minus
twice the second equation, then the old equations can be recovered from the
new by replacing the new ﬁfth equation by itself plus twice the second.
It is usual when solving simultaneous equations to save ink by simply
writing down the coefficients, omitting the variables, the plus signs and the
equality signs. In this shorthand notation the system 2.2.2 is written as the
augmented matrix
$$\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} & b_1 \\ a_{21} & a_{22} & \dots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} & b_m \end{pmatrix}. \tag{2.2.3}$$
It should be noted that the system of equations 2.2.2 can also be written as
the single matrix equation
$$\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}$$
where the matrix product on the left hand side is as defined in the previous
section. For the time being, however, this fact is not particularly relevant,
and 2.2.3 should simply be regarded as an abbreviated notation for 2.2.2.
The leading entry of a nonzero row of a matrix is the ﬁrst (that is,
leftmost) nonzero entry. An echelon matrix is a matrix with the following
properties:
(i) All nonzero rows must precede all zero rows.
(ii) For all i, if rows i and i + 1 are nonzero then the leading entry of
row i + 1 must be further to the right—that is, in a higher numbered
column—than the leading entry of row i.
A reduced echelon matrix has two further properties:
(iii) All leading entries must be 1.
(iv) A column which contains the leading entry of any row must contain no
other nonzero entry.
Once a system of equations is obtained for which the augmented matrix is
reduced echelon, proceed as follows to find the general solution. If the last
leading entry occurs in the final column (corresponding to the right hand side
of the equations) then the equations have no solution (since we have derived
the equation 0 = 1), and we say that the system is inconsistent. Otherwise
each nonzero equation determines the variable corresponding to its leading
entry uniquely in terms of the other variables, and those variables which do not
correspond to the leading entry of any row can be given arbitrary values. To
state this precisely, suppose that rows 1, 2, ..., k are the nonzero rows, and
suppose that their leading entries occur in columns i_1, i_2, ..., i_k respectively.
We may call x_{i_1}, x_{i_2}, ..., x_{i_k} the pivot or basic variables and the remaining
n − k variables the free variables. The most general solution is obtained by
assigning arbitrary values to the free variables, and solving equation 1 for
x_{i_1}, equation 2 for x_{i_2}, ..., and equation k for x_{i_k}, to obtain the values of
the basic variables. Thus the general solution of a consistent system has n − k
degrees of freedom, in the sense that it involves n − k arbitrary parameters. In
particular a consistent system has a unique solution if and only if n − k = 0.
Example
#2 Suppose that after performing row operations on a system of five
equations in the nine variables x_1, x_2, ..., x_9, the following reduced echelon
augmented matrix is obtained:
$$\begin{pmatrix} 0 & 0 & 1 & 2 & 0 & 0 & -2 & -1 & 0 & 8 \\ 0 & 0 & 0 & 0 & 1 & 0 & -4 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 3 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}. \tag{2.2.4}$$
The leading entries occur in columns 3, 5, 6 and 9, and so the basic variables
are x_3, x_5, x_6 and x_9. Let x_1 = α, x_2 = β, x_4 = γ, x_7 = δ and x_8 = ε, where
α, β, γ, δ and ε are arbitrary parameters. Then the equations give
$$\begin{aligned} x_3 &= 8 - 2\gamma + 2\delta + \varepsilon \\ x_5 &= 2 + 4\delta \\ x_6 &= -1 - \delta - 3\varepsilon \\ x_9 &= 5 \end{aligned}$$
and so the most general solution of the system is
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \\ x_9 \end{pmatrix} = \begin{pmatrix} \alpha \\ \beta \\ 8 - 2\gamma + 2\delta + \varepsilon \\ \gamma \\ 2 + 4\delta \\ -1 - \delta - 3\varepsilon \\ \delta \\ \varepsilon \\ 5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 8 \\ 0 \\ 2 \\ -1 \\ 0 \\ 0 \\ 5 \end{pmatrix} + \alpha \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + \gamma \begin{pmatrix} 0 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + \delta \begin{pmatrix} 0 \\ 0 \\ 2 \\ 0 \\ 4 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + \varepsilon \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ -3 \\ 0 \\ 1 \\ 0 \end{pmatrix}.$$
It is not wise to attempt to write down this general solution directly from the
reduced echelon system 2.2.4. Although the actual numbers which appear
in the solution are the same as the numbers in the augmented matrix, apart
from some changes in sign, some care is required to get them in the right
places!
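One way to guard against exactly this kind of slip is to substitute the general solution back into the equations for a few parameter values. A sketch (Python; the variable names are ours):

```python
# Augmented matrix 2.2.4 (zero row omitted): coefficients | right-hand side.
rows = [
    [0, 0, 1, 2, 0, 0, -2, -1, 0, 8],
    [0, 0, 0, 0, 1, 0, -4, 0, 0, 2],
    [0, 0, 0, 0, 0, 1, 1, 3, 0, -1],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 5],
]

def solution(a, b, g, d, e):
    # the general solution, with alpha, beta, gamma, delta, epsilon free
    return [a, b, 8 - 2*g + 2*d + e, g, 2 + 4*d, -1 - d - 3*e, d, e, 5]

# Every choice of the five parameters must satisfy every equation:
for params in [(0, 0, 0, 0, 0), (1, 2, 3, 4, 5), (-1, 0, 7, -2, 3)]:
    x = solution(*params)
    for row in rows:
        assert sum(c * v for c, v in zip(row[:9], x)) == row[9]
```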
Comments
2.2.5 In general there are many diﬀerent choices for the row operations
to use when putting the equations into reduced echelon form. It can be
shown, however, that the reduced echelon form itself is uniquely determined
by the original system: it is impossible to change one reduced echelon system
into another by row operations.
2.2.6 We have implicitly assumed that the coefficients a_{ij} and b_i which
appear in the system of equations 2.2.2 are real numbers, and that the
unknowns x_j are also meant to be real numbers. However, it is easily seen
that the process used for solving the equations works equally well for any
field.
There is a straightforward algorithm for obtaining the reduced echelon
system corresponding to a given system of linear equations. The idea is to
use the first equation to eliminate x_1 from all the other equations, then use
the second equation to eliminate x_2 from all subsequent equations, and so
on. For the first step it is obviously necessary for the coefficient of x_1 in the
first equation to be nonzero, but if it is zero we can simply choose to call a
different equation the “first”. Similar reorderings of the equations may be
necessary at each step.
More exactly, and in the terminology of row operations, the process is
as follows. Given an augmented matrix with m rows, find a nonzero entry in
the first column. This entry is called the first pivot. Swap rows to make the
row containing this first pivot the first row. (Strictly speaking, it is possible
that the first column is entirely zero; in that case use the next column.) Then
subtract multiples of the first row from all the others so that the new rows
have zeros in the first column. Thus, all the entries below the first pivot will
be zero. Now repeat the process, using the (m − 1)-rowed matrix obtained
by ignoring the first row. That is, looking only at the second and subsequent
rows, find a nonzero entry in the second column. If there is none (so that
elimination of x_1 has accidentally eliminated x_2 as well) then move on to the
third column, and keep going until a nonzero entry is found. This will be
the second pivot. Swap rows to bring the second pivot into the second row.
Now subtract multiples of the second row from all subsequent rows to make
the entries below the second pivot zero. Continue in this way (using next the
(m − 2)-rowed matrix obtained by ignoring the first two rows) until no more
pivots can be found.
When this has been done the resulting matrix will be in echelon form,
but (probably) not reduced echelon form. The reduced echelon form is readily
obtained, as follows. Start by dividing all entries in the last nonzero row by
the leading entry—the pivot—in the row. Then subtract multiples of this
row from all the preceding rows so that the entries above the pivot become
zero. Repeat this for the second to last nonzero row, then the third to last,
and so on. It is important to note the following:
2.2.7 The algorithm involves dividing by each of the pivots.
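The whole procedure can be sketched compactly in code. The version below is a Gauss–Jordan variant that clears entries above and below each pivot as it goes, rather than in a separate backward pass as described above, and it uses exact rational arithmetic so that the round-off issues of the next section do not arise (a sketch, not the book's algorithm verbatim):

```python
from fractions import Fraction

def rref(M):
    """Reduce a matrix to reduced echelon form by row operations."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0])
    pivot_row = 0
    for col in range(n):
        # find a nonzero entry in this column, at or below pivot_row
        r = next((i for i in range(pivot_row, m) if A[i][col] != 0), None)
        if r is None:
            continue  # column already zero from here down; try the next one
        A[pivot_row], A[r] = A[r], A[pivot_row]        # swap rows
        p = A[pivot_row][col]
        A[pivot_row] = [x / p for x in A[pivot_row]]   # make the leading entry 1
        for i in range(m):                             # clear the rest of the column
            if i != pivot_row and A[i][col] != 0:
                f = A[i][col]
                A[i] = [x - f * y for x, y in zip(A[i], A[pivot_row])]
        pivot_row += 1
    return A

# x + 2y = 5 and 3x + 4y = 6 have the unique solution x = -4, y = 9/2:
print(rref([[1, 2, 5], [3, 4, 6]]))
```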
§2c Partial pivoting
As commented above, the procedure described in the previous section applies
equally well for solving simultaneous equations over any field. Differences
between different fields manifest themselves only in the different algorithms
required for performing the operations of addition, subtraction, multiplication
and division in the various fields. For this section, however, we will
restrict our attention exclusively to the field of real numbers (which is, after
all, the most important case).
Practical problems often present systems with so many equations and
so many unknowns that it is necessary to use computers to solve them.
Usually, when a computer performs an arithmetic operation, only a fixed number
of significant figures in the answer are retained, so that each arithmetic
operation performed will introduce a minuscule round-off error. Solving a system
involving hundreds of equations and unknowns will involve millions of
arithmetic operations, and there is a definite danger that the cumulative effect of
the minuscule approximations may be such that the final answer is not even
close to the true solution. This raises questions which are extremely
important, and even more difficult. We will give only a very brief and superficial
discussion of them.
We start by observing that it is sometimes possible that a minuscule
change in the coefficients will produce an enormous change in the solution,
or change a consistent system into an inconsistent one. Under these
circumstances the unavoidable round-off errors in computation will produce large
errors in the solution. In this case the matrix is ill-conditioned, and there is
nothing which can be done to remedy the situation.
Suppose, for example, that our computer retains only three significant
figures in its calculations, and suppose that it encounters the following system
of equations:
$$\begin{aligned} x + 5z &= 0 \\ 12y + 12z &= 36 \\ .01x + 12y + 12z &= 35.9 \end{aligned}$$
Using the algorithm described above, the ﬁrst step is to subtract .01 times
the ﬁrst equation from the third. This should give 12y + 11.95z = 35.9,
but the best our inaccurate computer can get is either 12y + 11.9z = 35.9
or 12y + 12z = 35.9. Subtracting the second equation from this will either
give 0.1z = 0.1 or 0z = 0.1, instead of the correct 0.05z = 0.1. So the
computer will either get a solution which is badly wrong, or no solution at all.
The problem with this system of equations—at least, for such an inaccurate
computer—is that a small change to the coeﬃcients can dramatically change
the solution. As the equations stand the solution is x = −10, y = 1, z = 2,
but if the coeﬃcient of z in the third equation is changed from 12 to 12.1 the
solution becomes x = 10, y = 5, z = −2.
Solving an ill-conditioned system will necessarily involve dividing by a
number that is close to zero, and a small error in such a number will produce
a large error in the answer. There are, however, occasions when we may be
tempted to divide by a number that is close to zero when there is in fact no
need to. For instance, consider the equations
$$\begin{aligned} .01x + 100y &= 100 \\ x + y &= 2. \end{aligned} \tag{2.2.8}$$
If we select the entry in the first row and first column of the augmented
matrix as the first pivot, then we will subtract 100 times the first row from
the second, and obtain
$$\begin{pmatrix} .01 & 100 & 100 \\ 0 & -9999 & -9998 \end{pmatrix}.$$
Because our machine only works to three figures, −9999 and −9998 will both
be rounded off to −10000. Now we divide the second row by −10000, then
subtract 100 times the second row from the first, and finally divide the first
row by .01, to obtain the reduced echelon matrix
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}.$$
That is, we have obtained x = 0, y = 1 as the solution. Our value of y is
accurate enough; our mistake was to substitute this value of y into the first
equation to obtain a value for x. Solving for x involved dividing by .01, and
the small error in the value of y resulted in a large error in the value of x.
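The arithmetic above can be simulated directly. A sketch (Python; `round3` imitates a machine that keeps three significant figures, and `solve2` does elimination followed by back-substitution on a 2 × 2 system, rounding after every step — both names are ours):

```python
def round3(x):
    # keep only three significant figures, like the hypothetical machine
    return float(f"{x:.3g}")

def solve2(a11, a12, b1, a21, a22, b2):
    """Eliminate x from the second equation using the first, solve for y,
    then back-substitute for x, rounding after each operation."""
    m = round3(a21 / a11)
    a22p = round3(a22 - m * a12)   # -9999 rounds to -10000 in the bad case
    b2p = round3(b2 - m * b1)
    y = round3(b2p / a22p)
    x = round3(round3(b1 - round3(a12 * y)) / a11)
    return x, y

# Pivoting on .01 (first row first) loses x entirely:
assert solve2(.01, 100, 100, 1, 1, 2) == (0.0, 1.0)
# Pivoting on 1 (rows swapped, i.e. partial pivoting) is correct to 3 figures:
assert solve2(1, 1, 2, .01, 100, 100) == (1.0, 1.0)
```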
A much more accurate answer is obtained by selecting the entry in the
second row and first column as the first pivot. After swapping the two rows,
the row operation procedure continues as follows:
$$\begin{pmatrix} 1 & 1 & 2 \\ .01 & 100 & 100 \end{pmatrix} \xrightarrow{R_2 := R_2 - .01R_1} \begin{pmatrix} 1 & 1 & 2 \\ 0 & 100 & 100 \end{pmatrix} \xrightarrow[R_1 := R_1 - R_2]{R_2 := \frac{1}{100}R_2} \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}.$$
We have avoided the numerical instability that resulted from dividing by .01,
and our answer is now correct to three significant figures.
In view of these remarks and the comment 2.2.7 above, it is clear that
when deciding which of the entries in a given column should be the next
pivot, we should avoid those that are too close to zero.

    Of all possible pivots in the column, always
    choose that which has the largest absolute value.

Choosing the pivots in this way is called partial pivoting. It is this procedure
which is used in most practical problems.†
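A sketch of elimination with this pivoting rule (Python; ordinary double-precision floats, square nonsingular systems only, and the function name is ours):

```python
def solve_partial_pivot(A, b):
    """Gaussian elimination with partial pivoting: at each step the
    remaining entry of largest absolute value in the current column
    becomes the pivot."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        # choose the pivot row: largest absolute value in this column
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    # back-substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# System 2.2.8: exact solution is x = 100/99.99, y = 99.98/99.99, both near 1.
x, y = solve_partial_pivot([[.01, 100], [1, 1]], [100, 2])
assert abs(x - 1) < 1e-3 and abs(y - 1) < 1e-3
```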
Some refinements to the partial pivoting algorithm may sometimes be
necessary. For instance, if the system 2.2.8 above were modified by multiplying
the whole of the first equation by 200 then partial pivoting as described
above would select the entry in the first row and first column as the first
pivot, and we would encounter the same problem as before. The situation
can be remedied by scaling the rows before commencing pivoting, by dividing
each row through by its entry of largest absolute value. This makes the
rows commensurate, and reduces the likelihood of inaccuracies resulting from
adding numbers of greatly differing magnitudes.
Finally, it is worth noting that although the claims made above seem
reasonable, it is hard to justify them rigorously. This is because the algorithm
that is best in general is not necessarily the best for every particular system.
There will always be cases where a less good algorithm happens to work well
for a particular system. It is a daunting theoretical task to define the
“goodness” of an algorithm in any reasonable way, and then use the concept
to compare algorithms.
† In complete pivoting all the entries of all the remaining columns are compared
when choosing the pivots. The resulting algorithm is slightly safer, but much slower.
§2d Elementary matrices
In this section we return to a discussion of general properties of matrices:
properties that are true for matrices over any ﬁeld. Accordingly, we will
not specify any particular ﬁeld; instead, we will use the letter ‘F ’ to denote
some ﬁxed but unspeciﬁed ﬁeld, and the elements of F will be referred to
as ‘scalars’. This convention will remain in force for most of the rest of
this book, although from time to time we will restrict ourselves to the case
F = R, or impose some other restrictions on the choice of F. It is conceded,
however, that the extra generality that we achieve by this approach is not of
great consequence for the most common applications of linear algebra, and
the reader is encouraged to think of scalars as ordinary numbers (since they
usually are).
2.3 Definition  Let m and n be positive integers. An elementary row
operation is a function ρ: Mat(m × n, F) → Mat(m × n, F) such that either
ρ = ρ_{ij} for some i, j ∈ {1, 2, ..., m}, or ρ = ρ_i^{(λ)} for some i ∈ {1, 2, ..., m} and
some nonzero scalar λ, or ρ = ρ_{ij}^{(λ)} for some i, j ∈ {1, 2, ..., m} and some scalar λ,
where ρ_{ij}, ρ_{ij}^{(λ)} and ρ_i^{(λ)} are defined as follows:
(i) ρ_{ij}(A) is the matrix obtained from A by swapping the i-th and j-th rows;
(ii) ρ_i^{(λ)}(A) is the matrix obtained from A by multiplying the i-th row by λ;
(iii) ρ_{ij}^{(λ)}(A) is the matrix obtained from A by adding λ times the i-th row to
the j-th row.
Comment
2.3.1 Elementary column operations are deﬁned similarly.
2.4 Definition  An elementary matrix is any matrix obtainable by applying
an elementary row operation to an identity matrix. We use the following
notation:
$$E_{ij} = \rho_{ij}(I), \qquad E_i^{(\lambda)} = \rho_i^{(\lambda)}(I), \qquad E_{ij}^{(\lambda)} = \rho_{ij}^{(\lambda)}(I),$$
where I denotes an identity matrix.
It can be checked that all elementary row operations have inverses. So
it follows that they are bijective functions. The inverses are in fact also
elementary row operations:
2.5 Proposition  We have ρ_{ij}^{−1} = ρ_{ij} and (ρ_{ij}^{(λ)})^{−1} = ρ_{ij}^{(−λ)} for all i, j and
λ, and if λ ≠ 0 then (ρ_i^{(λ)})^{−1} = ρ_i^{(λ^{−1})}.
The principal theorem about elementary row operations is that the effect of each is the same as premultiplication by the corresponding elementary matrix:
2.6 Theorem  If A is a matrix of m rows and I is as in 2.3 then for all i, j ∈ I and λ ∈ F we have that ρ_{ij}(A) = E_{ij}A, ρ_i^{(λ)}(A) = E_i^{(λ)}A and ρ_{ij}^{(λ)}(A) = E_{ij}^{(λ)}A.
Once again there must be corresponding results for columns. However, we do not have to define "column elementary matrices", since it turns out that the result of applying an elementary column operation to an identity matrix I is the same as the result of applying an appropriate elementary row operation to I. Let us use the following notation for elementary column operations:
(i) γ_{ij} swaps columns i and j,
(ii) γ_i^{(λ)} multiplies column i by λ,
(iii) γ_{ij}^{(λ)} adds λ times column i to column j.
We find that
2.7 Theorem  If γ is an elementary column operation then γ(A) = Aγ(I) for all A and the appropriate I. Furthermore, γ_{ij}(I) = E_{ij}, γ_i^{(λ)}(I) = E_i^{(λ)} and γ_{ij}^{(λ)}(I) = E_{ji}^{(λ)}.
(Note that i and j occur in diﬀerent orders on the two sides of the last
equation in this theorem.)
Example
#3  Performing the elementary row operation ρ_{21}^{(2)} (adding twice the second row to the first) on the 3 × 3 identity matrix yields the matrix

    E = [ 1 2 0 ; 0 1 0 ; 0 0 1 ].
So if A is any matrix with three rows then EA should be the result of performing ρ_{21}^{(2)} on A. Thus, for example

    [ 1 2 0 ; 0 1 0 ; 0 0 1 ] [ a b c d ; e f g h ; i j k l ] = [ a+2e b+2f c+2g d+2h ; e f g h ; i j k l ],

which is indeed the result of adding twice the second row to the first. Observe that E is also obtainable by performing the elementary column operation γ_{12}^{(2)} (adding twice the first column to the second) on the identity. So if B is any matrix with three columns, BE is the result of performing γ_{12}^{(2)} on B. For example,

    [ p q r ; s t u ] [ 1 2 0 ; 0 1 0 ; 0 0 1 ] = [ p 2p+q r ; s 2s+t u ].
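For readers who wish to experiment, Theorem 2.6 can be checked numerically. The following Python sketch (the helper names `matmul` and `rho_add` are ours, using 0-based row indices rather than the 1-based indices of the text) builds E by applying ρ_{21}^{(2)} to the identity and confirms that premultiplying by E performs the same row operation:

```python
def matmul(A, B):
    # naive product of an m x n matrix with an n x p matrix
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rho_add(A, i, j, lam):
    # the operation rho_{ij}^{(lam)}: add lam times row i to row j
    # (0-based indices here, unlike the 1-based indices of the text)
    R = [row[:] for row in A]
    R[j] = [R[j][c] + lam * R[i][c] for c in range(len(R[0]))]
    return R

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
E = rho_add(I3, 1, 0, 2)                     # the matrix E of the example
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
assert matmul(E, A) == rho_add(A, 1, 0, 2)   # Theorem 2.6 in this instance
```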
The pivoting algorithm described in the previous section gives the following fact.
2.8 Proposition  For each A ∈ Mat(m × n, F) there exists a sequence of m × m elementary matrices E_1, E_2, . . . , E_k such that E_k E_{k−1} · · · E_1 A is a reduced echelon matrix.
If A is an n × n matrix then an inverse of A is a matrix B satisfying AB = BA = I. Matrices which have inverses are said to be invertible. It is easily proved that a matrix can have at most one inverse; so there can be no ambiguity in using the notation 'A^{−1}' for the inverse of an invertible A. Elementary matrices are invertible, and, as one would expect in view of 2.5, E_{ij}^{−1} = E_{ij}, (E_i^{(λ)})^{−1} = E_i^{(λ^{−1})} and (E_{ij}^{(λ)})^{−1} = E_{ij}^{(−λ)}. A matrix which is a product of invertible matrices is also invertible, and we have (AB)^{−1} = B^{−1}A^{−1}.
The proofs of all the above results are easy and are omitted. In contrast,
the following important fact is deﬁnitely nontrivial.
2.9 Theorem  If A, B ∈ Mat(n × n, F) and if AB = I then it is also true that BA = I. Thus A and B are inverses of each other.
Proof.  By 2.8 there exist n × n matrices T and R such that R is reduced echelon, T is a product of elementary matrices, and TA = R. Let r be the
number of nonzero rows of R, and let the leading entry of the i-th nonzero row occur in column j_i.
Since the leading entry of each nonzero row of R is further to the right than that of the preceding row, we must have 1 ≤ j_1 < j_2 < · · · < j_r ≤ n, and if all the rows of R are nonzero (so that r = n) this forces j_i = i for all i. So either R has at least one zero row, or else the leading entries of the rows occur in the diagonal positions.
Since the leading entries in a reduced echelon matrix are all 1, and since all the other entries in a column containing a leading entry must be zero, it follows that if all the diagonal entries of R are leading entries then R must be just the identity matrix. So in this case we have TA = I. This gives T = TI = T(AB) = (TA)B = IB = B, and we deduce that BA = I, as required.
It remains to prove that it is impossible for R to have a zero row. Note that T is invertible, since it is a product of elementary matrices, and since AB = I we have R(BT^{−1}) = TABT^{−1} = TT^{−1} = I. But the i-th row of R(BT^{−1}) is obtained by multiplying the i-th row of R by BT^{−1}, and will therefore be zero if the i-th row of R is zero. Since R(BT^{−1}) = I certainly does not have a zero row, it follows that R does not have a zero row.
Note that Theorem 2.9 applies to square matrices only. If A and B
are not square it is in fact impossible for both AB and BA to be identity
matrices, as the next result shows.
2.10 Theorem If the matrix A has more rows than columns then it is
impossible to ﬁnd a matrix B such that AB = I.
The proof, which we omit, is very similar to the last part of the proof
of 2.9, and hinges on the fact that the reduced echelon matrix obtained from
A by pivoting must have a zero row.
§2e  Determinants
We will have more to say about determinants in Chapter Eight; for the present, we confine ourselves to describing a method for calculating determinants and stating some properties which we will prove in Chapter Eight.
Associated with every square matrix is a scalar called the determinant of the matrix. A 1 × 1 matrix is just a scalar, and its determinant is equal
to itself. For 2 × 2 matrices the rule is

    det [ a b ; c d ] = ad − bc.
We now proceed recursively. Let A be an n × n matrix, and assume that we know how to calculate the determinants of (n − 1) × (n − 1) matrices. We define the (i, j)-th cofactor of A, denoted 'cof_{ij}(A)', to be (−1)^{i+j} times the determinant of the matrix obtained by deleting the i-th row and j-th column of A. So, for example, if n is 3 we have

    A = [ A_{11} A_{12} A_{13} ; A_{21} A_{22} A_{23} ; A_{31} A_{32} A_{33} ]

and we find that

    cof_{21}(A) = (−1)³ det [ A_{12} A_{13} ; A_{32} A_{33} ] = −A_{12}A_{33} + A_{13}A_{32}.
The determinant of A (for any n) is given by the so-called first row expansion:

    det A = A_{11} cof_{11}(A) + A_{12} cof_{12}(A) + · · · + A_{1n} cof_{1n}(A)
          = Σ_{i=1}^{n} A_{1i} cof_{1i}(A).

(A better practical method for calculating determinants will be presented later.)
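The recursive definition translates directly into code. A minimal Python sketch of the first row expansion (it does exponentially much work as n grows, which is one reason the later method is preferable):

```python
def det(A):
    # determinant by expansion along the first row; with the 0-based
    # column index j used here the cofactor sign (-1)^(1+j) becomes (-1)^j
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(n))

assert det([[1, 2], [3, 4]]) == -2       # agrees with ad - bc
```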
It turns out that a matrix A is invertible if and only if det A ≠ 0, and if det A ≠ 0 then the inverse of A is given by the formula

    A^{−1} = (1/det A) adj A

where adj A (the adjoint matrix†) is the transposed matrix of cofactors; that is,

    (adj A)_{ij} = cof_{ji}(A).

† called the adjugate matrix in an earlier edition of this book, since some misguided mathematicians use the term 'adjoint matrix' for the conjugate transpose of a matrix.
In particular, in the 2 × 2 case this says that

    [ a b ; c d ]^{−1} = (1/(ad − bc)) [ d −b ; −c a ]

provided that ad − bc ≠ 0, a formula that can easily be verified directly. We remark that it is foolhardy to attempt to use the cofactor formula to calculate inverses of matrices, except in the 2 × 2 case and (occasionally) the 3 × 3 case; a much quicker method exists using row operations.
The most important properties of determinants are as follows.
(i) A square matrix A has an inverse if and only if det A ≠ 0.
(ii) Let A and B be n × n matrices. Then det(AB) = det A det B.
(iii) Let A be an n × n matrix. Then det A = 0 if and only if there exists a nonzero n × 1 matrix x (that is, a column) satisfying the equation Ax = 0.
This last property is particularly important for the next section.
Example
#4  Calculate adj A and (adj A)A for the following matrix A:

    A = [ 3 −1 4 ; −2 −2 3 ; 7 3 −2 ].
−−  The cofactors are as follows:

    (−1)² det [ −2 3 ; 3 −2 ] = −5      (−1)³ det [ −2 3 ; 7 −2 ] = 17      (−1)⁴ det [ −2 −2 ; 7 3 ] = 8
    (−1)³ det [ −1 4 ; 3 −2 ] = 10      (−1)⁴ det [ 3 4 ; 7 −2 ] = −34      (−1)⁵ det [ 3 −1 ; 7 3 ] = −16
    (−1)⁴ det [ −1 4 ; −2 3 ] = 5       (−1)⁵ det [ 3 4 ; −2 3 ] = −17      (−1)⁶ det [ 3 −1 ; −2 −2 ] = −8.

So the transposed matrix of cofactors is

    adj A = [ −5 10 5 ; 17 −34 −17 ; 8 −16 −8 ].
Hence we find that

    (adj A)A = [ −5 10 5 ; 17 −34 −17 ; 8 −16 −8 ] [ 3 −1 4 ; −2 −2 3 ; 7 3 −2 ]
             = [ −15−20+35  5−20+15  −20+30−10 ; 51+68−119  −17+68−51  68−102+34 ; 24+32−56  −8+32−24  32−48+16 ]
             = [ 0 0 0 ; 0 0 0 ; 0 0 0 ].

Since (adj A)A should equal (det A)I, we conclude that det A = 0.  −−<
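The arithmetic of this example can be reproduced mechanically. A Python sketch (the helpers `det`, `cof` and `adj` are ours, not part of the text) recomputes adj A and confirms that det A = 0:

```python
def det(A):
    # first row cofactor expansion, as in the text
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cof(A, i, j):
    # the (i, j)-th cofactor (0-based indices)
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]
    return (-1) ** (i + j) * det(minor)

def adj(A):
    # transposed matrix of cofactors: (adj A)_{ij} = cof_{ji}(A)
    n = len(A)
    return [[cof(A, j, i) for j in range(n)] for i in range(n)]

A = [[3, -1, 4], [-2, -2, 3], [7, 3, -2]]
assert adj(A) == [[-5, 10, 5], [17, -34, -17], [8, -16, -8]]
assert det(A) == 0       # consistent with (adj A)A being the zero matrix
```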
§2f  Introduction to eigenvalues
2.11 Definition  Let A ∈ Mat(n × n, F). A scalar λ is called an eigenvalue of A if there exists a nonzero v ∈ Fⁿ such that Av = λv. Any such v is called an eigenvector.
Comments
2.11.1  Eigenvalues are sometimes called characteristic roots or characteristic values, and the words 'proper' and 'latent' are occasionally used instead of 'characteristic'.
2.11.2  Since the field Q of all rational numbers is contained in the field R of all real numbers, a matrix over Q can also be regarded as a matrix over R. Similarly, a matrix over R can be regarded as a matrix over C, the field of all complex numbers. It is a perhaps unfortunate fact that there are some rational matrices that have eigenvalues which are real or complex but not rational, and some real matrices which have nonreal complex eigenvalues.
2.12 Theorem  Let A ∈ Mat(n × n, F). A scalar λ is an eigenvalue of A if and only if det(A − λI) = 0.
Proof.  For all v ∈ Fⁿ we have that Av = λv if and only if

    (A − λI)v = Av − λv = 0.
By the last property of determinants mentioned in the previous section, a
nonzero such v exists if and only if det(A−λI) = 0.
Example
#5  Find the eigenvalues and corresponding eigenvectors of A = [ 3 1 ; 1 3 ].
−−  We must find the values of λ for which det(A − λI) = 0. Now

    det(A − λI) = det [ 3−λ  1 ; 1  3−λ ] = (3 − λ)² − 1.

So det(A − λI) = (4 − λ)(2 − λ), and the eigenvalues are 4 and 2.
To find eigenvectors corresponding to the eigenvalue 2 we must find nonzero columns v satisfying (A − 2I)v = 0; that is, we must solve

    [ 1 1 ; 1 1 ] [ x ; y ] = [ 0 ; 0 ].

Trivially we find that all solutions are of the form [ ξ ; −ξ ], and hence that any nonzero ξ gives an eigenvector. Similarly, solving

    [ −1 1 ; 1 −1 ] [ x ; y ] = [ 0 ; 0 ]

we find that the columns of the form [ ξ ; ξ ] with ξ ≠ 0 are the eigenvectors corresponding to 4.  −−<
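For a 2 × 2 matrix the characteristic equation is the quadratic x² − (a + d)x + (ad − bc) = 0 (see Exercise 8), so the example can be checked in a few lines of Python (the helper `eig2` is ours and assumes the eigenvalues are real):

```python
import math

def eig2(a, b, c, d):
    # roots of x^2 - (a + d)x + (ad - bc) = 0, assumed real
    tr, dt = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * dt)
    return (tr - disc) / 2, (tr + disc) / 2

assert eig2(3, 1, 1, 3) == (2.0, 4.0)

# and (1, -1) is indeed an eigenvector for the eigenvalue 2: Av = 2v
v = (1, -1)
Av = (3 * v[0] + 1 * v[1], 1 * v[0] + 3 * v[1])
assert Av == (2 * v[0], 2 * v[1])
```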
As we discovered in the above example the equation det(A − λI) = 0, which we must solve in order to find the eigenvalues of A, is a polynomial equation for λ. It is called the characteristic equation of A. The polynomial involved is of considerable theoretical importance.
2.13 Definition  Let A ∈ Mat(n × n, F) and let x be an indeterminate. The polynomial f(x) = det(A − xI) is called the characteristic polynomial of A.
Some authors define the characteristic polynomial to be det(xI − A), which is the same as ours if n is even, the negative of ours if n is odd. Observe
that the characteristic roots (eigenvalues) are the roots of the characteristic polynomial.
A square matrix A is said to be diagonal if A_{ij} = 0 whenever i ≠ j. As we shall see in examples below and in the exercises, the following problem arises naturally in the solution of systems of simultaneous differential equations:

2.13.1  Given a square matrix A, find an invertible matrix P such that P^{−1}AP is diagonal.

Solving this problem is sometimes called "diagonalizing A". Eigenvalue theory is used to do it.
Examples
#6  Diagonalize the matrix A = [ 3 1 ; 1 3 ].
−−  We have calculated the eigenvalues and eigenvectors of A in #5 above, and so we know that

    [ 3 1 ; 1 3 ] [ 1 ; −1 ] = [ 2 ; −2 ]    and    [ 3 1 ; 1 3 ] [ 1 ; 1 ] = [ 4 ; 4 ]

and using Proposition 2.2 to combine these into a single matrix equation gives

2.13.2    [ 3 1 ; 1 3 ] [ 1 1 ; −1 1 ] = [ 2 4 ; −2 4 ] = [ 1 1 ; −1 1 ] [ 2 0 ; 0 4 ].

Defining now

    P = [ 1 1 ; −1 1 ]

we see that det P = 2 ≠ 0, and hence P^{−1} exists. It follows immediately from 2.13.2 that P^{−1}AP = diag(2, 4).  −−<
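The claim P^{−1}AP = diag(2, 4) is easy to verify directly, for instance in Python (using the explicit 2 × 2 inverse formula from §2e; `matmul` is our own helper):

```python
A = [[3, 1], [1, 3]]
P = [[1, 1], [-1, 1]]

detP = P[0][0] * P[1][1] - P[0][1] * P[1][0]      # = 2, nonzero
Pinv = [[P[1][1] / detP, -P[0][1] / detP],
        [-P[1][0] / detP, P[0][0] / detP]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

D = matmul(matmul(Pinv, A), P)
assert D == [[2.0, 0.0], [0.0, 4.0]]              # diag(2, 4)
```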
#7  Solve the simultaneous differential equations

    x′(t) = 3x(t) + y(t)
    y′(t) = x(t) + 3y(t).
−−  We may write the equations as

    (d/dt) [ x ; y ] = A [ x ; y ]

where A is the matrix from our previous example. The method is to find a change of variables which simplifies the equations. Choose new variables u and v related to x and y by

    [ x ; y ] = P [ u ; v ] = [ p_{11}u + p_{12}v ; p_{21}u + p_{22}v ]

where p_{ij} is the (i, j)-th entry of P. We require P to be invertible, so that it is possible to express u and v in terms of x and y, and the entries of P must be constants, but there are no other restrictions. Differentiating, we obtain
    (d/dt) [ x ; y ] = [ x′ ; y′ ] = [ (d/dt)(p_{11}u + p_{12}v) ; (d/dt)(p_{21}u + p_{22}v) ]
                     = [ p_{11}u′ + p_{12}v′ ; p_{21}u′ + p_{22}v′ ] = P [ u′ ; v′ ]

since the p_{ij} are constants. In terms of u and v the equations now become
    P [ u′ ; v′ ] = AP [ u ; v ]

or, equivalently,

    [ u′ ; v′ ] = P^{−1}AP [ u ; v ].

That is, in terms of the new variables the equations are of exactly the same form as before, but the coefficient matrix A has been replaced by P^{−1}AP. We naturally wish to choose the matrix P in such a way that P^{−1}AP is as simple as possible, and as we have seen in the previous example it is possible (in this case) to arrange that P^{−1}AP is diagonal. To do so we choose P to be any invertible matrix whose columns are eigenvectors of A. Hence we define

    P = [ 1 1 ; −1 1 ]
(which means that x = u + v and y = −u + v).
Our new equations now are

    [ u′ ; v′ ] = [ 2 0 ; 0 4 ] [ u ; v ]

or, on expanding, u′ = 2u and v′ = 4v. The solution to this is u = He^{2t} and v = Ke^{4t} where H and K are arbitrary constants, and this gives

    [ x ; y ] = [ 1 1 ; −1 1 ] [ He^{2t} ; Ke^{4t} ] = [ He^{2t} + Ke^{4t} ; −He^{2t} + Ke^{4t} ]

so that the most general solution to the original problem is x = He^{2t} + Ke^{4t} and y = −He^{2t} + Ke^{4t}, where H and K are arbitrary constants.  −−<
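The solution can be sanity-checked numerically: for one particular choice of H and K, the exact derivatives of x and y satisfy the original equations at sample times (a Python sketch):

```python
import math

H, K = 1.0, 2.0                                  # an arbitrary choice
x = lambda t: H * math.exp(2 * t) + K * math.exp(4 * t)
y = lambda t: -H * math.exp(2 * t) + K * math.exp(4 * t)

# derivatives of x and y, differentiated by hand
dx = lambda t: 2 * H * math.exp(2 * t) + 4 * K * math.exp(4 * t)
dy = lambda t: -2 * H * math.exp(2 * t) + 4 * K * math.exp(4 * t)

# check x' = 3x + y and y' = x + 3y at a few sample times
for t in (0.0, 0.3, 1.0):
    assert math.isclose(dx(t), 3 * x(t) + y(t))
    assert math.isclose(dy(t), x(t) + 3 * y(t))
```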
#8  Another typical application involving eigenvalues is the Leslie population model. We are interested in how the size of a population varies as time passes. At regular intervals the number of individuals in various age groups is counted. At time t let

    x_1(t) = the number of individuals aged between 0 and 1
    x_2(t) = the number of individuals aged between 1 and 2
        ⋮
    x_n(t) = the number of individuals aged between n − 1 and n

where n is the maximum age ever attained. Censuses are taken at times t = 0, 1, 2, . . . and it is observed that
    x_2(i + 1) = b_1 x_1(i)
    x_3(i + 1) = b_2 x_2(i)
        ⋮
    x_n(i + 1) = b_{n−1} x_{n−1}(i).
Thus, for example, b_1 = 0.95 would mean that 95% of those aged between 0 and 1 at one census survive to be aged between 1 and 2 at the next. We also find that

    x_1(i + 1) = a_1 x_1(i) + a_2 x_2(i) + · · · + a_n x_n(i)
for some constants a_1, a_2, . . . , a_n. That is, the a_i measure the contributions to the birth rate from the different age groups. In matrix notation

    [ x_1(i+1) ]   [ a_1  a_2  a_3  ...  a_{n−1}  a_n ] [ x_1(i) ]
    [ x_2(i+1) ]   [ b_1   0    0   ...     0      0  ] [ x_2(i) ]
    [ x_3(i+1) ] = [  0   b_2   0   ...     0      0  ] [ x_3(i) ]
    [    ⋮     ]   [  ⋮    ⋮    ⋮           ⋮      ⋮  ] [   ⋮    ]
    [ x_n(i+1) ]   [  0    0    0   ...  b_{n−1}   0  ] [ x_n(i) ]

or x̃(i + 1) = Lx̃(i), where x̃(t) is the column with x_j(t) as its j-th entry and

    L = [ a_1  a_2  a_3  ...  a_{n−1}  a_n ]
        [ b_1   0    0   ...     0      0  ]
        [  0   b_2   0   ...     0      0  ]
        [  ⋮    ⋮    ⋮           ⋮      ⋮  ]
        [  0    0    0   ...  b_{n−1}   0  ]

is the Leslie matrix.
Assuming this relationship between x̃(i + 1) and x̃(i) persists indefinitely we see that x̃(k) = L^k x̃(0). We are interested in the behaviour of L^k x̃(0) as k → ∞. It turns out that in fact this behaviour depends on the largest eigenvalue of L, as we can see in an oversimplified example.
Suppose that n = 2 and L = [ 1 1 ; 1 0 ]. That is, there are just two age groups (young and old), and
(i) all young individuals survive to become old in the next era (b_1 = 1),
(ii) on average each young individual gives rise to one offspring in the next era, and so does each old individual (a_1 = a_2 = 1).
The eigenvalues of L are (1 + √5)/2 and (1 − √5)/2, and the corresponding eigenvectors are [ (1 + √5)/2 ; 1 ] and [ (1 − √5)/2 ; 1 ] respectively.
Suppose that there are initially one billion young and one billion old individuals. This tells us x̃(0). By solving equations we can express x̃(0) in terms of the eigenvectors:

    x̃(0) = [ 1 ; 1 ] = ((1 + √5)/(2√5)) [ (1 + √5)/2 ; 1 ] − ((1 − √5)/(2√5)) [ (1 − √5)/2 ; 1 ].
We deduce that

    x̃(k) = L^k x̃(0)
          = ((1 + √5)/(2√5)) L^k [ (1 + √5)/2 ; 1 ] − ((1 − √5)/(2√5)) L^k [ (1 − √5)/2 ; 1 ]
          = ((1 + √5)/(2√5)) ((1 + √5)/2)^k [ (1 + √5)/2 ; 1 ]
                − ((1 − √5)/(2√5)) ((1 − √5)/2)^k [ (1 − √5)/2 ; 1 ].

But ((1 − √5)/2)^k becomes insignificant in comparison with ((1 + √5)/2)^k as k → ∞. We see that for k large

    x̃(k) ≈ (1/√5) [ ((1 + √5)/2)^{k+2} ; ((1 + √5)/2)^{k+1} ].

So in the long term the ratio of young to old approaches (1 + √5)/2, and the size of the population is multiplied by (1 + √5)/2 (the largest eigenvalue) per unit time.
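This limiting behaviour is easy to observe by direct iteration. The following Python sketch applies L repeatedly and watches the ratio of young to old converge to (1 + √5)/2:

```python
# iterate x(k+1) = L x(k) for the 2 x 2 Leslie matrix L = [ 1 1 ; 1 0 ]
young, old = 1.0, 1.0            # initial populations (in billions)
for _ in range(40):
    young, old = young + old, young

phi = (1 + 5 ** 0.5) / 2
assert abs(young / old - phi) < 1e-9   # ratio of young to old -> phi
```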
#9  Let f(x, y) be a smooth real valued function of two real variables, and suppose that we are interested in the behaviour of f near some fixed point. Choose that point to be the origin of our coordinate system. Under appropriate conditions f(x, y) can be approximated by a Taylor series

    f(x, y) ≈ p + qx + ry + sx² + 2txy + uy² + · · ·

where higher order terms become less and less significant as (x, y) is made closer and closer to (0, 0). For the first few terms we can conveniently use matrix notation, and write

2.13.3    f(x̃) ≈ p + Lx̃ + ᵗx̃Qx̃

where L = ( q  r ) and Q = [ s t ; t u ], and x̃ is the two-component column with entries x and y. Note that the entries of L are given by the first order partial derivatives of f at 0̃, and the entries of Q are given by the second
order partial derivatives of f at 0̃. Furthermore, 2.13.3 is equally valid when x̃ has more than two components.
In the case that 0̃ is a critical point of f the terms of degree 1 disappear, and the behaviour of f near 0̃ is determined by the quadratic form ᵗx̃Qx̃. To understand ᵗx̃Qx̃ it is important to be able to rotate the axes so as to eliminate product terms like xy and be left only with the square terms. We will return to this problem in Chapter Five, and show that it can always be done. That the eigenvalues of the matrix Q are of crucial importance can be seen easily in the two variable case, as follows.
Rotating the axes through an angle θ introduces new variables x′ and y′ related to the old variables by

    [ x ; y ] = [ cos θ  −sin θ ; sin θ  cos θ ] [ x′ ; y′ ]

or, equivalently,

    ( x  y ) = ( x′  y′ ) [ cos θ  sin θ ; −sin θ  cos θ ].

That is, x̃ = Tx̃′ and ᵗx̃ = ᵗ(x̃′) ᵗT. Substituting this into the expression for our quadratic form gives

    ᵗ(x̃′) [ cos θ  sin θ ; −sin θ  cos θ ] [ s t ; t u ] [ cos θ  −sin θ ; sin θ  cos θ ] x̃′

or ᵗ(x̃′) ᵗTQT x̃′. Our aim is to choose θ so that Q′ = ᵗTQT is diagonal. Since it happens that the transpose of the rotation matrix T coincides with its inverse, this is the same problem introduced in 2.13.1 above. It is because Q is its own transpose that a diagonalizing matrix T can be found satisfying T^{−1} = ᵗT.
The diagonal entries of the diagonal matrix Q′ are just the eigenvalues of Q. It can be seen that if the eigenvalues are both positive then the critical point 0̃ is a local minimum. The corresponding fact is true in the n-variable case: if all eigenvalues of Q are positive then we have a local minimum. Likewise, if all are negative then we have a local maximum. If some eigenvalues are negative and some are positive it is neither a maximum nor a minimum. Since the determinant of Q′ is the product of the eigenvalues,
and since det Q = det Q′, it follows in the two variable case that we have a local maximum or minimum if and only if det Q > 0. For n variables the test is necessarily more complicated.
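For instance, with s = u = 3 and t = 1 (the matrix Q = [ 3 1 ; 1 3 ] whose eigenvalues we found in #5) a rotation through θ = π/4 already diagonalizes the form, as a short Python check confirms (`matmul` is our own helper):

```python
import math

theta = math.pi / 4
c, s = math.cos(theta), math.sin(theta)
T = [[c, -s], [s, c]]            # rotation matrix
Tt = [[c, s], [-s, c]]           # its transpose, which is also its inverse
Q = [[3, 1], [1, 3]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Qp = matmul(Tt, matmul(Q, T))    # Q' = tT Q T
# the off-diagonal entries vanish and the diagonal carries the eigenvalues
assert abs(Qp[0][1]) < 1e-12 and abs(Qp[1][0]) < 1e-12
assert abs(Qp[0][0] - 4) < 1e-12 and abs(Qp[1][1] - 2) < 1e-12
```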
#10 We remark also that eigenvalues are of crucial importance in questions
of numerical stability in the solving of equations. If x and b are related by
Ax = b where A is a square matrix, and if it happens that A has an eigenvalue
very close to zero and another eigenvalue with a large absolute value, then it
is likely that small changes in either x or b will produce large changes in the
other. For example, if
    A = [ 100 0 ; 0 0.01 ]
then changing the ﬁrst component of x by 0.1 changes the ﬁrst component of
b by 10, while changing the second component of b by 0.1 changes the second
component of x by 10. If, as in this example, the matrix A is equal to its own
transpose, the condition number of A is deﬁned as the ratio of the largest
eigenvalue of A to the smallest eigenvalue of A, and A is ill-conditioned (see §2c) if the condition number is too large.
Whilst on numerical matters, we remark that numerical calculation of
eigenvalues is an important practical problem. One method is to make use of
the fact that (usually) if A is a square matrix and x a column then as k → ∞ the columns A^k x tend to approach scalar multiples of an eigenvector. (We saw this in #8 above.) In practice, especially for large matrices, the best numerical methods for calculating eigenvalues do not depend upon direct solution of the characteristic equation.
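For the diagonal matrix above, the condition number and the instability it signals can be spelled out in a few lines (a sketch):

```python
a11, a22 = 100.0, 0.01           # the two eigenvalues of the diagonal A
cond = a11 / a22                 # condition number = 10000
assert abs(cond - 10000) < 1e-6

# Ax = b componentwise: b1 = 100 x1 and b2 = 0.01 x2, so a change of
# 0.1 in x1 moves b1 by 10, and a change of 0.1 in b2 moves x2 by 10
assert abs(a11 * 0.1 - 10.0) < 1e-9
assert abs(0.1 / a22 - 10.0) < 1e-9
```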
Exercises
1. Prove that matrix addition is associative (A + (B + C) = (A + B) + C
for all A, B and C) and commutative (A+B = B+A for all A and B).
2. Prove that if A is an m × n matrix and I_m and I_n are (respectively) the m × m and n × n identity matrices then AI_n = A = I_m A.
3. Prove that if A, B and C are matrices of appropriate shapes and λ and
µ are arbitrary numbers then A(λB +µC) = λ(AB) +µ(AC).
4. Use the properties of matrix addition and multiplication from the previous exercises together with associativity of matrix multiplication to prove that a square matrix can have at most one inverse.
5. Check, by tedious expansion, that the formula A(adj A) = (det A)I is valid for any 3 × 3 matrix A.
6. Prove that ᵗ(AB) = (ᵗB)(ᵗA) for all matrices A and B such that AB is defined. Prove that if the entries of A and B are complex numbers then (AB)‾ = A‾ B‾. (If X is a complex matrix then X‾ is the matrix whose (i, j)-entry is the complex conjugate of X_{ij}.)
7. A particle moves in the plane, its coordinates at time t being (x(t), y(t)) where x and y are differentiable functions on [0, ∞). Suppose that

    (∗)    x′(t) = −2y(t)
           y′(t) = x(t) + 3y(t)

    Show (by direct calculation) that the equations (∗) can be solved by putting x(t) = 2z(t) − w(t) and y(t) = −z(t) + w(t), and obtaining differential equations for z and w.
8. (i) Show that if ξ is an eigenvalue of the 2 × 2 matrix [ a b ; c d ] then ξ is a solution of the quadratic equation

        x² − (a + d)x + (ad − bc) = 0.

    Hence find the eigenvalues of the matrix [ 0 −2 ; 1 3 ].
    (ii) Find an eigenvector for each of these eigenvalues.
9. (i) Suppose that A is an n × n matrix and that ξ_1, ξ_2, . . . , ξ_n are eigenvalues for A. For each i let v_i be an eigenvector corresponding to the eigenvalue ξ_i, and let P be the n × n matrix whose n columns are v_1, v_2, . . . , v_n (in that order). Show that AP = PD, where D is the diagonal matrix with diagonal entries ξ_1, ξ_2, . . . , ξ_n. (That is, D_{ij} = ξ_i δ_{ij}, where δ_{ij} is the Kronecker delta.)
    (ii) Use the previous exercise to find a matrix P such that

        [ 0 −2 ; 1 3 ] P = P [ ξ 0 ; 0 ν ]

    where ξ and ν are the eigenvalues of the matrix on the left. (Note the connection with Exercise 7.)
10. Calculate the eigenvalues of the following matrices, and for each eigenvalue find an eigenvector.

    (a) [ 4 2 ; −1 1 ]    (b) [ 2 −2 7 ; 0 −1 4 ; 0 0 5 ]    (c) [ −7 −2 6 ; −2 1 2 ; −10 −2 9 ]
11. Solve the system of differential equations

    x′ = −7x − 2y + 6z
    y′ = −2x + y + 2z
    z′ = −10x − 2y + 9z.
12. Let A be an n × n matrix. Prove that the characteristic polynomial c_A(x) of A has degree n. Prove also that the leading coefficient of c_A(x) is (−1)ⁿ, the coefficient of x^{n−1} is (−1)^{n−1} Σ_{i=1}^{n} A_{ii}, and the constant term is det A. (Hint: Use induction on n.)
3
Introduction to vector spaces
Pure Mathematicians love to generalize ideas. If they manage, by means of a
new trick, to prove some conjecture, they always endeavour to get maximum
mileage out of the idea by searching for other situations in which it can be
used. To do this successfully one must discard unnecessary details and focus
attention only on what is really needed to make the idea work; furthermore,
this should lead both to simpliﬁcations and deeper understanding. It also
leads naturally to an axiomatic approach to mathematics, in which one lists
initially as axioms all the things which have to hold before the theory will be
applicable, and then attempts to derive consequences of these axioms. Poten
tially this kills many birds with one stone, since good theories are applicable
in many diﬀerent situations.
We have already used the axiomatic approach in Chapter 1 in the definition of 'field', and in this chapter we proceed to the definition of 'vector
space’. We start with a discussion of linearity, since one of the major reasons
for introducing vector spaces is to provide a suitable context for discussion
of this concept.
§3a  Linearity
A common mathematical problem is to solve a system of equations of the form

3.0.1    T(x) = 0.

Depending on the context the unknown x could be a number, or something more complicated, such as an n-tuple, a function, or even an n-tuple of functions. For instance the simultaneous linear equations

3.0.2    [ 3 1 1 1 ; −1 1 −1 2 ; 4 4 0 6 ] [ x_1 ; x_2 ; x_3 ; x_4 ] = [ 0 ; 0 ; 0 ]
50 Chapter Three: Introduction to vector spaces
have the form 3.0.1 with x a 4-tuple, and the differential equation

3.0.3    t² d²x/dt² + t dx/dt + (t² − 4)x = 0

is an example in which x is a function. Similarly, the pair of simultaneous differential equations

    df/dt = 2f + 3g
    dg/dt = 2f + 7g

could be rewritten as

3.0.4    (d/dt) [ f ; g ] − [ 2 3 ; 2 7 ] [ f ; g ] = [ 0 ; 0 ]

which also has the form 3.0.1, with x being an ordered pair of functions.
Whether or not one can solve the system 3.0.1 obviously depends on
the nature of the expression T(x) on the left hand side. In this course we
will be interested in cases in which T(x) is linear in x.
3.1 Definition  An expression T(x) is said to be linear in the variable x if

    T(x_1 + x_2) = T(x_1) + T(x_2)

for all values x_1 and x_2 of the variable, and

    T(λx) = λT(x)

for all values of x and all scalars λ.
If we let T(x) denote the expression on the left hand side of the equation 3.0.3 above we find that

    T(x_1 + x_2) = t² (d²/dt²)(x_1 + x_2) + t (d/dt)(x_1 + x_2) + (t² − 4)(x_1 + x_2)
                 = t² (d²x_1/dt² + d²x_2/dt²) + t (dx_1/dt + dx_2/dt) + (t² − 4)x_1 + (t² − 4)x_2
                 = (t² d²x_1/dt² + t dx_1/dt + (t² − 4)x_1) + (t² d²x_2/dt² + t dx_2/dt + (t² − 4)x_2)
                 = T(x_1) + T(x_2)

and

    T(λx) = t² (d²/dt²)(λx) + t (d/dt)(λx) + (t² − 4)λx
          = λ(t² d²x/dt² + t dx/dt + (t² − 4)x)
          = λT(x).
Thus T(x) is a linear expression, and for this reason 3.0.3 is called a linear
diﬀerential equation. It is left for the reader to check that 3.0.2 and 3.0.4
above are also linear equations.
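Checking that 3.0.2 is linear amounts to verifying the two conditions of Definition 3.1 for T(x) = Mx, where M is its coefficient matrix. A small Python sketch does this at sample vectors and scalars (exhaustive verification, of course, requires the algebraic argument):

```python
M = [[3, 1, 1, 1], [-1, 1, -1, 2], [4, 4, 0, 6]]   # coefficient matrix of 3.0.2

def T(x):
    # T(x) = Mx for a 4-component column x
    return [sum(M[i][j] * x[j] for j in range(4)) for i in range(3)]

x1, x2, lam = [1, 0, 2, -1], [3, 5, -2, 4], 7

added = [a + b for a, b in zip(x1, x2)]
assert T(added) == [a + b for a, b in zip(T(x1), T(x2))]     # additivity
assert T([lam * a for a in x1]) == [lam * c for c in T(x1)]  # homogeneity
```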
Example
#1  Suppose that T is linear in x, and let x = x_0 be a particular solution of the equation T(x) = a. Show that the general solution of T(x) = a is x = x_0 + y for y ∈ S, where S is the set of all solutions of T(y) = 0.
−−  We are given that T(x_0) = a. Let y ∈ S, and put x = x_0 + y. Then T(y) = 0, and linearity of T yields

    T(x) = T(x_0 + y) = T(x_0) + T(y) = a + 0 = a,

and we have shown that x = x_0 + y is a solution of T(x) = a whenever y ∈ S.
Now suppose that x is any solution of T(x) = a, and define y = x − x_0. Then we have that x = x_0 + y, and by linearity of T we find that

    T(y) = T(x − x_0) = T(x) − T(x_0) = a − a = 0,

so that y ∈ S. Hence all solutions of T(x) = a have the given form.  −−<
Our deﬁnition of linearity may seem reasonable at ﬁrst sight, but a
closer inspection reveals some deﬁciencies. The term ‘expression in x’ is a
little imprecise; in fact T is simply a function, and we should really have said
something like this:
3.1.1  A function T: V → W is linear if T(x_1 + x_2) = T(x_1) + T(x_2) and T(λx) = λT(x) for all x_1, x_2, x ∈ V and all scalars λ.
For this to be meaningful V and W have to both be equipped with operations
of addition, and one must also be able to multiply elements of V and W by
scalars.
The set of all m × n matrices (where m and n are fixed) provides an example of a set equipped with addition and scalar multiplication: with the definitions as given in Chapter 1, the sum of two m × n matrices is another m × n matrix, and a scalar multiple of an m × n matrix is an m × n matrix. We have also seen that addition and scalar multiplication of rows and columns arises naturally in the solution of simultaneous equations; for
instance, we expressed the general solution of 2.2.4 in terms of addition and
scalar multiplication of columns. Finally, we can deﬁne addition and scalar
multiplication for real valued diﬀerentiable functions. If x and y are such
functions then x + y is the real valued diﬀerentiable function deﬁned by
(x + y)(t) = x(t) + y(t), and if λ ∈ R then λx is the function deﬁned by
(λx)(t) = λx(t); these deﬁnitions were implicit in our discussion of 3.0.3
above.
Roughly speaking, a vector space is a set for which addition and scalar
multiplication are deﬁned in some sensible fashion. By “sensible” I mean
that a certain list of obviously desirable properties must be satisﬁed. It will
then make sense to ask whether a function from one vector space to another
is linear in the sense of 3.1.1.
§3b  Vector axioms
In accordance with the above discussion, a vector space should be a set
(whose elements will be called vectors) which is equipped with an operation
of addition, and a scalar multiplication function which determines an element
λx of the vector space whenever x is an element of the vector space and λ is
a scalar. That is, if x and y are two vectors then their sum x + y must exist
and be another vector, and if x is a vector and λ a scalar then λx must exist
and be a vector.
3.2 Definition Let F be a ﬁeld. A set V is called a vector space over F
if there is an operation of addition
(x, y) −→ x + y
on V , and a scalar multiplication function
(λ, x) −→ λx
from F × V to V , such that the following properties are satisﬁed.
(i) (u +v) +w = u + (v +w) for all u, v, w ∈ V .
(ii) u +v = v +u for all u, v ∈ V .
(iii) There exists an element 0 ∈ V such that 0 +v = v for all v ∈ V .
(iv) For each v ∈ V there exists a u ∈ V such that u +v = 0.
(v) 1v = v for all v ∈ V .
(vi) λ(µv) = (λµ)v for all λ, µ ∈ F and all v ∈ V .
(vii) (λ +µ)v = λv +µv for all λ, µ ∈ F and all v ∈ V .
(viii) λ(u +v) = λu +λv for all λ ∈ F and all u, v ∈ V .
Comments
3.2.1 The ﬁeld of scalars F and the vector space V both have zero
elements which are both commonly denoted by the symbol ‘0’. With a little
care one can always tell from the context whether 0 means the zero scalar or
the zero vector. Sometimes we will distinguish notationally between vectors
and scalars by writing a tilde underneath vectors; thus 0~ would denote the
zero of the vector space under discussion.
3.2.2 The 1 which appears in axiom (v) is the scalar 1; there is no
vector 1.
Examples
#2 Let E be the Euclidean plane and choose a ﬁxed point O ∈ E. The
set of all line segments OP, as P varies over all points of the plane, can be
made into a vector space over R, called the space of position vectors relative
to O. The sum of two line segments is deﬁned by the parallelogram rule; that
is, for P, Q ∈ E ﬁnd the point R such that OPRQ is a parallelogram, and
deﬁne OP + OQ = OR. If λ is a positive real number then λOP is the line
segment OS such that S lies on OP or OP produced and the length of OS
is λ times the length of OP. For negative λ the product λOP is deﬁned
similarly (giving a point on PO produced).
The proofs that the vector space axioms are satisﬁed are nontrivial
exercises in Euclidean geometry. The same construction is also applicable in
three dimensional Euclidean space.
#3 The set R³, consisting of all ordered triples of real numbers, is an
example of a vector space over R. Addition and scalar multiplication are
deﬁned in the usual way:

    ( x₁ )   ( x₂ )   ( x₁ + x₂ )            ( x )   ( λx )
    ( y₁ ) + ( y₂ ) = ( y₁ + y₂ )          λ ( y ) = ( λy )
    ( z₁ )   ( z₂ )   ( z₁ + z₂ )            ( z )   ( λz )

for all real numbers λ, x, x₁, . . . etc. It is trivial to check that the axioms
listed in the deﬁnition above are satisﬁed. Of course, the use of Cartesian
coordinates shows that this vector space is essentially the same as position
vectors in Euclidean space; you should convince yourself that componentwise
addition really does correspond to the parallelogram law.
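Componentwise addition and scalar multiplication of triples can be spot-checked numerically. A minimal sketch (the helper names `vadd` and `vscale` are illustrative):

```python
def vadd(u, v):
    """Componentwise sum of two triples."""
    return tuple(a + b for a, b in zip(u, v))

def vscale(lam, u):
    """Componentwise scalar multiple of a triple."""
    return tuple(lam * a for a in u)

u, v, w = (1.0, 2.0, 3.0), (4.0, -5.0, 6.0), (0.5, 0.5, -1.0)

# Spot-check a few of the axioms from Definition 3.2.
assert vadd(vadd(u, v), w) == vadd(u, vadd(v, w))   # (i) associativity
assert vadd(u, v) == vadd(v, u)                     # (ii) commutativity
assert vadd((0.0, 0.0, 0.0), u) == u                # (iii) zero element
assert vscale(1.0, u) == u                          # (v) identity scalar
```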
#4 The set Rⁿ of all n-tuples of real numbers is a vector space over R for
all positive integers n. Everything is completely analogous to R³. (Recall
that we have deﬁned Rⁿ to be the set of all columns

    ( x₁ )
    ( x₂ )
    (  ⋮ )          with xᵢ ∈ R for all i,
    ( xₙ )

and

    ᵗRⁿ = { ( x₁ x₂ . . . xₙ ) | xᵢ ∈ R for all i }.

It is clear that ᵗRⁿ is also a vector space.)
#5 If F is any ﬁeld then Fⁿ and ᵗFⁿ are both vector spaces over F. These
spaces of n-tuples are the most important examples of vector spaces, and the
ﬁrst examples to think of when trying to understand theoretical results.
#6 Let S be any set and let ℱ be the set of all functions from S to F
(where F is a ﬁeld). If f, g ∈ ℱ and λ ∈ F deﬁne f + g, λf ∈ ℱ as follows:

    (f + g)(a) = f(a) + g(a)
    (λf)(a) = λ(f(a))

for all a ∈ S. (Note that addition and multiplication on the right hand side
of these equations take place in F; you do not have to be able to add and
multiply elements of S for the deﬁnitions to make sense.) It can now be
shown that these deﬁnitions of addition and scalar multiplication make ℱ
into a vector space over F; to do this we must verify that all the axioms are
satisﬁed. In each case the proof is routine, based on the fact that F satisﬁes
the ﬁeld axioms.
(i) Let f, g, h ∈ ℱ. Then for all a ∈ S,

    ((f + g) + h)(a) = (f + g)(a) + h(a)      (by deﬁnition of addition on ℱ)
                     = (f(a) + g(a)) + h(a)   (same reason)
                     = f(a) + (g(a) + h(a))   (addition in F is associative)
                     = f(a) + (g + h)(a)      (deﬁnition of addition on ℱ)
                     = (f + (g + h))(a)       (same reason)
and so (f +g) +h = f + (g +h).
(ii) Let f, g ∈ ℱ. Then for all a ∈ S,
(f +g)(a) = f(a) +g(a) (by deﬁnition)
= g(a) +f(a) (since addition in F is commutative)
= (g +f)(a) (by deﬁnition),
so that f +g = g +f.
(iii) Deﬁne z: S → F by z(a) = 0 (the zero element of F) for all a ∈ S. We
must show that this zero function z satisﬁes f + z = f for all f ∈ ℱ.
For all a ∈ S we have

    (f + z)(a) = f(a) + z(a) = f(a) + 0 = f(a)

by the deﬁnition of addition in ℱ and the Zero Axiom for ﬁelds, whence
the result.
(iv) Suppose that f ∈ ℱ. Deﬁne g ∈ ℱ by g(a) = −f(a) for all a ∈ S. Then
for all a ∈ S,

    (g + f)(a) = g(a) + f(a) = 0 = z(a),

so that g + f = z. Thus each element of ℱ has a negative.
(v) Suppose that f ∈ ℱ. By deﬁnition of scalar multiplication for ℱ and
the Identity Axiom for ﬁelds, we have, for all a ∈ S,

    (1f)(a) = 1(f(a)) = f(a)

and therefore 1f = f.
(vi) Let λ, µ ∈ F and f ∈ ℱ. Then for all a ∈ S,

    (λ(µf))(a) = λ((µf)(a)) = λ(µ(f(a))) = (λµ)(f(a)) = ((λµ)f)(a)

by the deﬁnition of scalar multiplication for ℱ and associativity of
multiplication in the ﬁeld F. Thus λ(µf) = (λµ)f.
(vii) Let λ, µ ∈ F and f ∈ ℱ. Then for all a ∈ S,

    ((λ + µ)f)(a) = (λ + µ)(f(a)) = λ(f(a)) + µ(f(a))
                  = (λf)(a) + (µf)(a) = (λf + µf)(a)

by the deﬁnition of addition and scalar multiplication for ℱ and the
distributive law in F. Thus (λ + µ)f = λf + µf.
(viii) Let λ ∈ F and f, g ∈ ℱ. Then for all a ∈ S,

    (λ(f + g))(a) = λ((f + g)(a)) = λ(f(a) + g(a))
                  = λ(f(a)) + λ(g(a)) = (λf)(a) + (λg)(a) = (λf + λg)(a),

whence λ(f + g) = λf + λg.

Observe that if S = {1, 2, . . . , n} then the set ℱ is essentially the same
as Fⁿ, since an n-tuple

    ( a₁ )
    ( a₂ )
    (  ⋮ )
    ( aₙ )

in Fⁿ may be identiﬁed with the function f: {1, 2, . . . , n} → F given by
f(i) = aᵢ for i = 1, 2, . . . , n; it is readily checked that the deﬁnitions of
addition and scalar multiplication for n-tuples are consistent with the
deﬁnitions of addition and scalar multiplication for functions. (Indeed, since
we have deﬁned a column to be an n × 1 matrix, and a matrix to be a
function, Fⁿ is in fact the set of all functions from S × {1} to F. There is an
obvious bijective correspondence between the sets S and S × {1}.)
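The identification of n-tuples with functions on {1, 2, . . . , n} can be sketched in code; `tuple_to_func` and `func_to_tuple` are hypothetical names for the two directions of the correspondence:

```python
def tuple_to_func(a):
    """Identify the n-tuple (a_1, ..., a_n) with the function f(i) = a_i."""
    return lambda i: a[i - 1]

def func_to_tuple(f, n):
    """Recover the n-tuple from the function."""
    return tuple(f(i) for i in range(1, n + 1))

a = (10, 20, 30)
f = tuple_to_func(a)
assert f(1) == 10 and f(3) == 30
assert func_to_tuple(f, 3) == a

# Pointwise addition of the functions matches componentwise
# addition of the tuples.
b = (1, 2, 3)
g = tuple_to_func(b)
s = lambda i: f(i) + g(i)
assert func_to_tuple(s, 3) == (11, 22, 33)
```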
#7 The set 𝒞 of all continuous functions f: R → R is a vector space over R,
addition and scalar multiplication being deﬁned as in the previous example.
It is necessary to show that these deﬁnitions do give well deﬁned functions
𝒞 × 𝒞 → 𝒞 (for addition) and R × 𝒞 → 𝒞 (for scalar multiplication). In other
words, one must show that if f and g are continuous and λ ∈ R then f + g
and λf are continuous. This follows from elementary calculus. It is necessary
also to show that the vector space axioms are satisﬁed; this is routine to do,
and similar to the previous example.
#8 The set of all continuously diﬀerentiable functions from R to R, the
set of all three times diﬀerentiable functions from (−1, 1) to R, and the set of
all integrable functions from [0, 10] to R are further examples of vector spaces
over R, and there are many similar examples. It is also straightforward to
show that the set of all polynomial functions from R to R is a vector space
over R.
#9 The solution set of a linear equation T(x) = 0 is always a vector space.
Thus, for instance, the subset of R³ consisting of all triples

    ( x )
    ( y )
    ( z )

such that

    ( 1 0 5 ) ( x )   ( 0 )
    ( 2 1 2 ) ( y ) = ( 0 )
    ( 4 4 8 ) ( z )   ( 0 )

is a vector space.
Stating the result more precisely, if V and W are vector spaces over a
ﬁeld F and T: V → W is a linear function then the set

    U = { x ∈ V | T(x) = 0 }

is a vector space over F, relative to addition and scalar multiplication
functions “inherited” from V . To prove this one must ﬁrst show that if x, y ∈ U
and λ ∈ F then x + y, λx ∈ U (so that we do have well deﬁned addition
and scalar multiplication functions for U), and then check the axioms. If
T(x) = T(y) = 0 and λ ∈ F then linearity of T gives

    T(x + y) = T(x) + T(y) = 0 + 0 = 0
    T(λx) = λT(x) = λ0 = 0

so that x + y, λx ∈ U. The proofs that each of the vector space axioms is
satisﬁed in U are relatively straightforward, based in each case on the fact
that the same axiom is satisﬁed in V . A complete proof will be given below
(see 3.13).
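The closure argument above can be illustrated numerically. The sketch below uses a 2 × 3 matrix B chosen here for the illustration (not the matrix of the example above) because its solution set contains nonzero columns:

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a column (tuple)."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)

# A matrix chosen, for this illustration, to have nonzero solutions.
B = [[1, 1, 1],
     [1, -2, -1]]

x = (-1, -2, 3)          # B x = 0, found by hand
y = (-2, -4, 6)          # another solution
zero = (0, 0)
assert matvec(B, x) == zero and matvec(B, y) == zero

# Closure under addition and scalar multiplication:
s = tuple(a + b for a, b in zip(x, y))
assert matvec(B, s) == zero
assert matvec(B, tuple(5 * a for a in x)) == zero
```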
To conclude this section we list some examples of functions which are
linear. For some unknown reason it is common in vector space theory to speak
of ‘linear transformations’ rather than ‘linear functions’, although the words
‘transformation’ and ‘function’ are actually synonymous in mathematics. As
explained above, the domain and codomain of a linear transformation should
both be vector spaces (such as the vector spaces given in the examples above).
The deﬁnition of ‘linear transformation’ was essentially given in 3.1.1 above,
but there is no harm in writing it down again:
3.3 Definition Let V and W be vector spaces over the ﬁeld F. A function
T: V → W is called a linear transformation if T(u + v) = T(u) + T(v) and
T(λu) = λT(u) for all u, v ∈ V and all λ ∈ F.
Comment
3.3.1 A function T which satisﬁes T(u +v) = T(u) +T(v) for all u and
v is sometimes said to preserve addition. Likewise, a function T satisfying
T(λv) = λT(v) for all λ and v is said to preserve scalar multiplication.
Examples
#10 Let T: R³ → R² be deﬁned by

      ( x )
    T ( y ) = ( x + y + z )
      ( z )   (  2x − y   ) .

Let u, v ∈ R³ and λ ∈ R. Then

        ( x₁ )        ( x₂ )
    u = ( y₁ )    v = ( y₂ )
        ( z₁ )        ( z₂ )

for some xᵢ, yᵢ, zᵢ ∈ R, and we have

                 ( x₁ + x₂ )
    T(u + v) = T ( y₁ + y₂ ) = ( (x₁ + x₂) + (y₁ + y₂) + (z₁ + z₂) )
                 ( z₁ + z₂ )   (      2(x₁ + x₂) − (y₁ + y₂)       )

             = ( x₁ + y₁ + z₁ ) + ( x₂ + y₂ + z₂ ) = T(u) + T(v),
               (   2x₁ − y₁   )   (   2x₂ − y₂   )

                ( λx₁ )
    T(λu) = T   ( λy₁ ) = ( λx₁ + λy₁ + λz₁ ) = ( λ(x₁ + y₁ + z₁) )
                ( λz₁ )   (    2λx₁ − λy₁   )   (   λ(2x₁ − y₁)   )

          = λ ( x₁ + y₁ + z₁ ) = λT(u).
              (   2x₁ − y₁   )

Since these equations hold for all u, v and λ it follows that T is a linear
transformation.
Note that the deﬁnition of this function T could be rewritten as

      ( x )   ( 1  1  1 ) ( x )
    T ( y ) = ( 2 −1  0 ) ( y )
      ( z )               ( z ) .

Writing

    A = ( 1  1  1 )
        ( 2 −1  0 ) ,

the calculations above amount to showing that A(u + v) = Au + Av and
A(λu) = λ(Au) for all u, v ∈ R³ and all λ ∈ R, facts which we have already
noted in Chapter One.
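A quick numerical check of the two linearity equations for this particular T (a sketch, not a proof; it tests only one choice of u, v and λ):

```python
def T(v):
    """T(x, y, z) = (x + y + z, 2x - y), the map from #10."""
    x, y, z = v
    return (x + y + z, 2 * x - y)

u = (1.0, 2.0, 3.0)
v = (-4.0, 0.5, 2.0)
lam = 3.0

u_plus_v = tuple(a + b for a, b in zip(u, v))
lam_u = tuple(lam * a for a in u)

# T preserves addition and scalar multiplication on these inputs.
assert T(u_plus_v) == tuple(a + b for a, b in zip(T(u), T(v)))
assert T(lam_u) == tuple(lam * a for a in T(u))
```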
#11 Let F be any ﬁeld and A any matrix of m rows and n columns with
entries from F. If the function f: Fⁿ → Fᵐ is deﬁned by f(x~) = Ax~ for
all x~ ∈ Fⁿ, then f is a linear transformation. The proof of this is
straightforward, using Exercise 3 from Chapter 2. The calculations are reproduced
below, in case you have not done that exercise. We use the notation (v~)ᵢ for
the iᵗʰ entry of a column v~.

Let u~, v~ ∈ Fⁿ and λ ∈ F. Then

    (f(u~ + v~))ᵢ = (A(u~ + v~))ᵢ = ∑ⱼ₌₁ⁿ Aᵢⱼ(u~ + v~)ⱼ = ∑ⱼ₌₁ⁿ Aᵢⱼ((u~)ⱼ + (v~)ⱼ)
                  = ∑ⱼ₌₁ⁿ (Aᵢⱼ(u~)ⱼ + Aᵢⱼ(v~)ⱼ) = ∑ⱼ₌₁ⁿ Aᵢⱼ(u~)ⱼ + ∑ⱼ₌₁ⁿ Aᵢⱼ(v~)ⱼ
                  = (Au~)ᵢ + (Av~)ᵢ = (f(u~))ᵢ + (f(v~))ᵢ = (f(u~) + f(v~))ᵢ.

Similarly,

    (f(λu~))ᵢ = (A(λu~))ᵢ = ∑ⱼ₌₁ⁿ Aᵢⱼ(λu~)ⱼ = ∑ⱼ₌₁ⁿ Aᵢⱼ(λ(u~)ⱼ)
              = λ ∑ⱼ₌₁ⁿ Aᵢⱼ(u~)ⱼ = λ(Au~)ᵢ = λ(f(u~))ᵢ = (λf(u~))ᵢ.

This holds for all i; so f(u~ + v~) = f(u~) + f(v~) and f(λu~) = λf(u~), and
therefore f is linear.
#12 Let ℱ be the set of all functions with domain Z (the set of all integers)
and codomain R. By #6 above we know that ℱ is a vector space over R.
Let G: ℱ → R² be deﬁned by

    G(f) = ( f(−1) )
           ( f(2)  )

for all f ∈ ℱ. Then if f, g ∈ ℱ and λ ∈ R we have

    G(f + g) = ( (f + g)(−1) )       (by deﬁnition of G)
               ( (f + g)(2)  )

             = ( f(−1) + g(−1) )     (by the deﬁnition of addition of functions)
               ( f(2) + g(2)   )

             = ( f(−1) ) + ( g(−1) ) (by the deﬁnition of addition of columns)
               ( f(2)  )   ( g(2)  )

             = G(f) + G(g),

and similarly

    G(λf) = ( (λf)(−1) )             (deﬁnition of G)
            ( (λf)(2)  )

          = ( λ(f(−1)) )             (deﬁnition of scalar multiplication for functions)
            ( λ(f(2))  )

          = λ ( f(−1) )              (deﬁnition of scalar multiplication for columns)
              ( f(2)  )

          = λG(f),

and it follows that G is linear.
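The evaluation map G is easy to test numerically; a minimal sketch:

```python
def G(f):
    """G(f) = (f(-1), f(2)), the evaluation map from #12."""
    return (f(-1), f(2))

f = lambda n: n * n        # a function from Z to R
g = lambda n: 2 * n + 1
lam = 4

f_plus_g = lambda n: f(n) + g(n)
lam_f = lambda n: lam * f(n)

# G preserves addition and scalar multiplication.
assert G(f_plus_g) == (G(f)[0] + G(g)[0], G(f)[1] + G(g)[1])
assert G(lam_f) == (lam * G(f)[0], lam * G(f)[1])
```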
#13 Let 𝒞 be the vector space consisting of all continuous functions from
the open interval (−1, 7) to R, and deﬁne I: 𝒞 → R by I(f) = ∫₀⁴ f(x) dx. Let
f, g ∈ 𝒞 and λ ∈ R. Then

    I(f + g) = ∫₀⁴ (f + g)(x) dx
             = ∫₀⁴ (f(x) + g(x)) dx
             = ∫₀⁴ f(x) dx + ∫₀⁴ g(x) dx     (by basic integral calculus)
             = I(f) + I(g),

and similarly

    I(λf) = ∫₀⁴ (λf)(x) dx
          = ∫₀⁴ λ(f(x)) dx
          = λ ∫₀⁴ f(x) dx
          = λI(f),

whence I is linear.
#14 It is as well to ﬁnish with an example of a function which is not linear.
If ℱ is the set of all functions from R to R, then the function T: ℱ → R
deﬁned by T(f) = 1 + f(4) is not linear. For instance, if f ∈ ℱ is deﬁned by
f(x) = x for all x ∈ R then

    T(2f) = 1 + (2f)(4) = 1 + 2(f(4)) ≠ 2 + 2(f(4)) = 2(1 + f(4)) = 2T(f),

so that T does not preserve scalar multiplication. Hence T is not linear. (In
fact, T does not preserve addition either.)
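The failure is concrete enough to compute. With f(x) = x we get T(2f) = 9 but 2T(f) = 10:

```python
def T(f):
    """T(f) = 1 + f(4): not linear, as the checks below show."""
    return 1 + f(4)

f = lambda x: x            # f(4) = 4
two_f = lambda x: 2 * f(x)

assert T(two_f) == 9          # 1 + 2*4
assert 2 * T(f) == 10         # 2*(1 + 4)
assert T(two_f) != 2 * T(f)   # scalar multiplication is not preserved

# Addition fails too:
g = lambda x: x + 1        # g(4) = 5
f_plus_g = lambda x: f(x) + g(x)
assert T(f_plus_g) == 10 and T(f) + T(g) == 11
```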
§3c Trivial consequences of the axioms
Having deﬁned vector spaces in the previous section, our next objective is
to prove theorems about vector spaces. In doing this we must be careful
to make sure that our proofs use only the axioms and nothing else, for only
that way can we be sure that every system which satisﬁes the axioms will also
satisfy all the theorems that we prove. An unfortunate consequence of this
is that we must start by proving trivialities, or, rather, things which seem to
be trivialities because they are familiar to us in slightly diﬀerent contexts. It
is necessary to prove that these things are indeed consequences of the vector
space axioms. It is also useful to learn the art of constructing proofs by doing
proofs of trivial facts before trying to prove diﬃcult theorems.
Throughout this section F will be a ﬁeld and V a vector space over F.
3.4 Proposition For all u, v, w ∈ V , if v +u = w +u then v = w.
Proof. Assume that v + u = w + u. By Axiom (iv) in Deﬁnition 3.2 there
exists t ∈ V such that u + t = 0~. Now adding t to both sides of the equation
gives

    (v + u) + t = (w + u) + t
    v + (u + t) = w + (u + t)     (by Axiom (i))
    v + 0~ = w + 0~               (by the choice of t)
    v = w                         (by Axiom (iii)).
3.5 Proposition For each v ∈ V there is a unique t ∈ V which satisﬁes
t + v = 0~.

Proof. The existence of such a t for each v is immediate from Axioms (iv)
and (ii). Uniqueness follows from 3.4 above, since if t + v = 0~ and t′ + v = 0~
then, by 3.4, t = t′.

By Proposition 3.5 there is no ambiguity in using the customary notation
‘−v’ for the negative of a vector v, and we will do this henceforward.
3.6 Proposition If u, v ∈ V and u + v = v then u = 0~.

Proof. Assume that u + v = v. Then by Axiom (iii) we have u + v = 0~ + v,
and by 3.4, u = 0~.
We comment that 3.6 shows that V cannot have more than one zero
element.
3.7 Proposition Let λ ∈ F and v ∈ V . Then λ0~ = 0~ = 0v, and,
conversely, if λv = 0~ then either λ = 0 or v = 0~. We also have (−1)v = −v.

Proof. By deﬁnition of 0~ we have 0~ + 0~ = 0~, and therefore

    λ(0~ + 0~) = λ0~
    λ0~ + λ0~ = λ0~       (by Axiom (viii))
    λ0~ = 0~              (by 3.6).

By the ﬁeld axioms (see (ii) of Deﬁnition 1.2) we have 0 + 0 = 0, and so by
Axiom (vii) (in Deﬁnition 3.2)

    0v + 0v = (0 + 0)v = 0v,

whence 0v = 0~ by 3.6. For the converse, suppose that λv = 0~ and λ ≠ 0.
Field axioms guarantee that λ⁻¹ exists, and we deduce that

    v = 1v = (λ⁻¹λ)v = λ⁻¹(λv) = λ⁻¹0~ = 0~.

For the last part, observe that we have (−1) + 1 = 0 by ﬁeld axioms,
and therefore

    (−1)v + v = (−1)v + 1v = ((−1) + 1)v = 0v = 0~

by Axioms (v) and (vii) and the ﬁrst part. By 3.5 above we deduce that
(−1)v = −v.
§3d Subspaces
If V is a vector space over a ﬁeld F and U is a subset of V we may ask
whether the operation of addition on V gives rise to an operation of addition
on U. Since an operation on U is a function U × U → U, it does so if and
only if adding two elements of U always gives another element of U. Likewise,
the scalar multiplication function for V gives rise to a scalar multiplication
function for U if and only if multiplying an element of U by a scalar always
gives another element of U.
3.8 Definition A subset U of a vector space V is said to be closed under
addition and scalar multiplication if
(i) u₁ + u₂ ∈ U for all u₁, u₂ ∈ U,
(ii) λu ∈ U for all u ∈ U and all scalars λ.
If a subset U of V is closed in this sense, it is natural to ask whether U
is a vector space relative to the addition and scalar multiplication inherited
from V ; if it is we say that U is a subspace of V .
3.9 Definition A subset U of a vector space V is called a subspace of
V if U is itself a vector space relative to addition and scalar multiplication
inherited from V .
It turns out that a subset which is closed under addition and scalar
multiplication is always a subspace, provided only that it is nonempty.
3.10 Theorem If V is a vector space and U a subset of V which is
nonempty and closed under addition and scalar multiplication, then U is a
subspace of V .
Proof. It is necessary only to verify that the inherited operations satisfy
the vector space axioms. In most cases the fact that a given axiom is satisﬁed
in V trivially implies that the same axiom is satisﬁed in U.
Let x, y, z ∈ U. Then x, y, z ∈ V , and so by Axiom (i) for V it follows
that (x +y) +z = x + (y +z). Thus Axiom (i) holds in U.
Let x, y ∈ U. Then x, y ∈ V , and so x + y = y + x. Thus Axiom (ii)
holds.
The next task is to prove that U has a zero element. Since V is a vector
space we know that V has a zero element, which we will denote by ‘0~’, but at
ﬁrst sight it seems possible that 0~ may fail to be in the subset U. However,
since U is nonempty there certainly exists at least one element in U. Let
x be such an element. By closure under scalar multiplication we have that
0x ∈ U. But Proposition 3.7 gives 0x = 0~ (since x is an element of V ), and
so, after all, it is necessarily true that 0~ ∈ U. It is now trivial that 0~ is also
a zero element for U, since if y ∈ U is arbitrary then y ∈ V and Axiom (iii)
for V gives 0~ + y = y.
For Axiom (iv) we must prove that each x ∈ U has a negative in U.
Since Axiom (iv) for V guarantees that x has a negative in V and since the
zero of U is the same as the zero of V , it suﬃces to show that −x ∈ U.
But x ∈ U gives (−1)x ∈ U (by closure under scalar multiplication), and
−x = (−1)x (by 3.7); so the result follows.
The remaining axioms are trivially proved by arguments similar to those
used for axioms (i) and (ii).
Comments
3.10.1 It is easily seen that if V is a vector space then the set V itself is
a subspace of V , and the set {0} (consisting of just the zero element of V ) is
also a subspace of V .
3.10.2 In #9 above we claimed that if V and W are vector spaces over F
and T: V → W a linear transformation then the set U = { v ∈ V | T(v) = 0 }
is a subspace of V . In view of 3.10 it is no longer necessary to check all eight
vector space axioms in order to prove this; it suﬃces, instead, to prove merely
that U is nonempty and closed under addition and scalar multiplication.
Examples
#15 Let F = R, V = R³ and

    U = { (   x   )            }
        { (   y   )  x, y ∈ R } .
        { ( x + y )            }

Prove that U is a subspace of V .

−− In view of Theorem 3.10, we must prove that U is nonempty and
closed under addition and scalar multiplication.

It is clear that U is nonempty:

    ( 0 )
    ( 0 ) ∈ U.
    ( 0 )

Let u, v be arbitrary elements of U. Then

        (   x   )        (    x′   )
    u = (   y   ) ,  v = (    y′   )
        ( x + y )        ( x′ + y′ )

for some x, y, x′, y′ ∈ R, and we see that

            (    x″   )
    u + v = (    y″   )       (where x″ = x + x′, y″ = y + y′),
            ( x″ + y″ )

and this is an element of U. Hence U is closed under addition.

Let u be an arbitrary element of U and λ an arbitrary scalar. Then

        (   x   )
    u = (   y   )
        ( x + y )

for some x, y ∈ R, and

         (    λx    )   (    λx   )
    λu = (    λy    ) = (    λy   ) ∈ U.
         ( λ(x + y) )   ( λx + λy )

Thus U is closed under scalar multiplication. −−<
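The closure checks in this solution can be mirrored numerically; `in_U` is an illustrative membership test for U:

```python
def in_U(v, tol=1e-12):
    """Membership test for U = {(x, y, x + y)}: third entry is the sum of the first two."""
    return abs(v[2] - (v[0] + v[1])) <= tol

u = (1.0, 2.0, 3.0)       # x = 1, y = 2
v = (-0.5, 4.0, 3.5)      # x = -0.5, y = 4
assert in_U(u) and in_U(v)

s = tuple(a + b for a, b in zip(u, v))
m = tuple(7.0 * a for a in u)
assert in_U(s)            # closed under addition
assert in_U(m)            # closed under scalar multiplication
assert not in_U((1.0, 1.0, 5.0))
```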
#16 Use the result proved in #6 and elementary calculus to prove that the
set 𝒞 of all real valued continuous functions on the closed interval [0, 1] is a
vector space over R.

−− By #6 the set ℱ of all real valued functions on [0, 1] is a vector space
over R; so it will suﬃce to prove that 𝒞 is a subspace of ℱ. By Theorem 3.10
it then suﬃces to prove that 𝒞 is nonempty and closed under addition and
scalar multiplication.

The zero function is clearly continuous; so 𝒞 is nonempty.

Let f, g ∈ 𝒞 and t ∈ R. For all a ∈ [0, 1] we have

    lim_{x→a} (f + g)(x) = lim_{x→a} (f(x) + g(x))            (deﬁnition of f + g)
                         = lim_{x→a} f(x) + lim_{x→a} g(x)    (basic calculus)
                         = f(a) + g(a)                        (since f, g are continuous)
                         = (f + g)(a)

and similarly

    lim_{x→a} (tf)(x) = lim_{x→a} t(f(x))      (deﬁnition of tf)
                      = t lim_{x→a} f(x)       (basic calculus)
                      = t(f(a))                (since f is continuous)
                      = (tf)(a),

so that f + g and tf are continuous. Hence 𝒞 is closed under addition and
scalar multiplication. −−<
Since one of the major reasons for introducing vector spaces was to
elucidate the concept of ‘linear transformation’, much of our time will be devoted
to the study of linear transformations. Whenever T is a linear transformation,
it is always of interest to ﬁnd those vectors x such that T(x) is zero.
3.11 Definition Let V and W be vector spaces over a ﬁeld F and let
T: V → W be a linear transformation. The subset of V

    ker T = { x ∈ V | T(x) = 0_W }

is called the kernel of T.

In this deﬁnition ‘0_W’ denotes the zero element of the space W. Since
the domain V and codomain W of the transformation T may very well be
diﬀerent vector spaces, there are two diﬀerent kinds of vectors simultaneously
under discussion. Thus, for instance, it is important not to confuse the zero
element of V with the zero element of W, although it is normal to use the
same symbol ‘0’ for each. Occasionally, as here, we will append a subscript
to indicate which zero vector we are talking about.
By deﬁnition the kernel of T consists of those vectors in V which are
mapped to the zero vector of W. It is natural to ask whether the zero of V is
one of these. In fact, it always is.
3.12 Proposition Let V and W be vector spaces over F and let 0_V and
0_W be the zero elements of V and W respectively. If T: V → W is a linear
transformation then T(0_V) = 0_W.

Proof. Certainly T must map 0_V to some element of W; let w = T(0_V) be
this element. Since 0_V + 0_V = 0_V we have

    w = T(0_V) = T(0_V + 0_V) = T(0_V) + T(0_V) = w + w

since T preserves addition. By 3.6 it follows that w = 0_W.
We now prove the theorem which was foreshadowed in #9:

3.13 Theorem If T: V → W is a linear transformation then ker T is a
subspace of V .

Proof. By 3.12 we know that 0_V ∈ ker T, and therefore ker T is nonempty.
By 3.10 it remains to prove that ker T is closed under addition and scalar
multiplication.

Suppose that u, v ∈ ker T and λ is a scalar. Then by linearity of T,

    T(u + v) = T(u) + T(v) = 0_W + 0_W = 0_W

and

    T(λu) = λT(u) = λ0_W = 0_W

(by 3.7 above). Thus u + v and λu are in ker T, and it follows that ker T is
closed under addition and scalar multiplication, as required.
Examples
#17 The solution set of the simultaneous linear equations

    x + y + z = 0
    x − 2y − z = 0

may be viewed as the kernel of the linear transformation T: R³ → R² given
by

      ( x )   ( 1  1  1 ) ( x )
    T ( y ) = ( 1 −2 −1 ) ( y )
      ( z )               ( z ) ,

and is therefore a subspace of R³.
#18 Let 𝒟 be the vector space of all diﬀerentiable real valued functions on
R, and let ℱ be the vector space of all real valued functions on R. For each
f ∈ 𝒟 deﬁne Tf ∈ ℱ by (Tf)(x) = f′(x) + (cos x)f(x). Show that f ↦ Tf
is a linear map from 𝒟 to ℱ, and calculate its kernel.

−− Let f, g ∈ 𝒟 and λ, µ ∈ R. Then for all x ∈ R we have

    (T(λf + µg))(x) = (λf + µg)′(x) + (cos x)(λf + µg)(x)
                    = d/dx (λf(x) + µg(x)) + (cos x)(λf(x) + µg(x))
                    = λf′(x) + λ(cos x)f(x) + µg′(x) + µ(cos x)g(x)
                    = λ((Tf)(x)) + µ((Tg)(x))
                    = (λ(Tf) + µ(Tg))(x),

and it follows that T(λf + µg) = λ(Tf) + µ(Tg). Hence T is linear.

By deﬁnition, the kernel of T is the set of all functions f ∈ 𝒟 such that
Tf = 0. Hence, ﬁnding the kernel of T means solving the diﬀerential equation
f′(x) + (cos x)f(x) = 0. Techniques for solving diﬀerential equations are dealt
with in other courses. In this case the method is to multiply through by the
“integrating factor” e^(sin x), and then integrate. This gives f(x)e^(sin x) = C,
where C is any constant. Hence the kernel consists of all functions f such
that f(x) = Ce^(−sin x) for some constant C. Note that the sum of two functions
of this form (corresponding to two constants C₁ and C₂) is again of the same
form. Similarly, any scalar multiple of a function of this form has the same
form. Thus the kernel of T is indeed a subspace of 𝒟, as we knew (by
Theorem 3.13) that it would be. −−<
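The claimed kernel can be sanity-checked numerically: for f(x) = Ce^(−sin x) the quantity f′(x) + (cos x)f(x) should vanish. A sketch using a central-difference approximation to f′ (so the result is only zero up to discretisation error):

```python
import math

C = 2.5
f = lambda x: C * math.exp(-math.sin(x))   # claimed kernel element

def Tf(x, h=1e-6):
    """(Tf)(x) = f'(x) + cos(x) f(x), with f' approximated by a central difference."""
    fprime = (f(x + h) - f(x - h)) / (2 * h)
    return fprime + math.cos(x) * f(x)

# Tf should vanish (up to finite-difference error) everywhere.
for x in (-2.0, 0.0, 1.0, 3.0):
    assert abs(Tf(x)) < 1e-4
```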
Theorem 3.13 says that the kernel of a linear transformation T is a
subspace of the domain of T; our next result is, in a sense, dual to 3.13, and
says that the image of a linear transformation is a subspace of the codomain.

3.14 Theorem If T: V → W is a linear transformation then the image of
T is a subspace of W.

Proof. Recall (see §0c) that the image of T is the set

    im T = { T(v) | v ∈ V }.

We have by 3.12 that T(0_V) = 0_W, and hence 0_W ∈ im T. So im T ≠ ∅. Let
x, y ∈ im T and let λ be a scalar. By deﬁnition of im T there exist u, v ∈ V
with T(u) = x and T(v) = y, and now linearity of T gives

    x + y = T(u) + T(v) = T(u + v)

and

    λx = λT(u) = T(λu),

so that x + y, λx ∈ im T. Hence im T is closed under addition and scalar
multiplication.
Example
#19 Let T: R² → R³ be deﬁned by

      ( x )   (  1  −3 )
    T ( y ) = (  4 −10 ) ( x )
              ( −2   9 ) ( y ) .

By #11 we know that T is a linear transformation, and so its image is a
subspace of R³. The image in fact consists of all triples

    ( a )
    ( b )
    ( c )

such that there exist x, y ∈ R such that

    (∗)      x −  3y = a
            4x − 10y = b
           −2x +  9y = c.
It turns out that this is the same as the set

    { ( a )                      }
    { ( b )  16a − 3b + 2c = 0 } .
    { ( c )                      }

One way to prove this is to form the augmented matrix corresponding to the
system (∗) and apply row operations to obtain an echelon matrix. The result
is

    ( 1 −3        a       )
    ( 0  2     b − 4a     )
    ( 0  0  16a − 3b + 2c ) .

Thus the equations are consistent if and only if 16a − 3b + 2c = 0, as claimed.
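The consistency condition can be verified mechanically: every triple of the form T(x, y) satisfies 16a − 3b + 2c = 0. A sketch:

```python
def T(x, y):
    """The map from #19: (x, y) -> (x - 3y, 4x - 10y, -2x + 9y)."""
    return (x - 3 * y, 4 * x - 10 * y, -2 * x + 9 * y)

# Every image triple satisfies 16a - 3b + 2c = 0 ...
for x, y in [(1, 0), (0, 1), (3, -2), (-7, 5)]:
    a, b, c = T(x, y)
    assert 16 * a - 3 * b + 2 * c == 0

# ... while a triple violating the condition, e.g. (1, 0, 0), is not hit:
assert 16 * 1 - 3 * 0 + 2 * 0 != 0
```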
Our ﬁnal result for this section gives a useful criterion for determining
whether a linear transformation is injective.
3.15 Proposition A linear transformation is injective if and only if its
kernel is the zero subspace, {0~}.

Proof. Let θ: V → W be a linear transformation. Assume ﬁrst that θ is
injective.

We have proved in 3.13 that the kernel of θ is a subspace of V ; hence it
contains the zero of V . That is, θ(0~) = 0~. (This was proved explicitly in the
proof of 3.13. Here we will use the same notation, 0~, for the zero elements
of V and W; this should cause no confusion.) Now let v be an arbitrary
element of ker θ. Then we have

    θ(v) = 0~ = θ(0~),

and since θ is injective it follows that v = 0~. Hence 0~ is the only element of
ker θ, and so ker θ = {0~}, as required.

For the converse, assume that ker θ = {0~}. We seek to prove that θ is
injective; so assume that u, v ∈ V with θ(u) = θ(v). By linearity of θ we
obtain

    θ(u − v) = θ(u) + θ(−v) = θ(u) − θ(v) = 0~,

so that u − v ∈ ker θ. Hence u − v = 0~, and u = v. Hence θ is injective.
For completeness we state the analogous result for surjective linear
transformations, although it is immediate from the deﬁnitions concerned.
3.16 Proposition A linear transformation is surjective if and only if its
image is the whole codomain.
§3e Linear combinations

Let v₁, v₂, . . . , vₙ and w be vectors in some space V . We say that w is
a linear combination of v₁, v₂, . . . , vₙ if w = λ₁v₁ + λ₂v₂ + · · · + λₙvₙ for
some scalars λ₁, λ₂, . . . , λₙ. This concept is, in a sense, the fundamental
concept of vector space theory, since it is made up of addition and scalar
multiplication.

3.17 Definition The vectors v₁, v₂, . . . , vₙ are said to span V if every
element w ∈ V can be expressed as a linear combination of the vᵢ.
For example, the set S of all triples

    ( x )
    ( y )
    ( z )

such that x + y + z = 0 is a subspace of R³, and it is easily checked that the
triples

        (  0 )        ( −1 )            (  1 )
    u = (  1 ) ,  v = (  0 )   and  w = ( −1 )
        ( −1 )        (  1 )            (  0 )

span S. For example, if x + y + z = 0 then

    ( x )   y − z       z − x       x − y
    ( y ) = ----- u  +  ----- v  +  ----- w.
    ( z )     3           3           3
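The displayed identity can be checked mechanically for a few triples with x + y + z = 0; a sketch (the helper `comb` forms the linear combination c₁u + c₂v + c₃w):

```python
u = (0, 1, -1)
v = (-1, 0, 1)
w = (1, -1, 0)

def comb(c1, c2, c3):
    """c1*u + c2*v + c3*w, componentwise."""
    return tuple(c1 * a + c2 * b + c3 * c for a, b, c in zip(u, v, w))

# For triples with x + y + z = 0 the stated coefficients recover (x, y, z).
for x, y in [(3, -1), (0, 5), (-2, -2)]:
    z = -x - y
    got = comb((y - z) / 3, (z - x) / 3, (x - y) / 3)
    assert all(abs(g - e) < 1e-12 for g, e in zip(got, (x, y, z)))
```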
In this example it can be seen that there are inﬁnitely many ways of
expressing each element of the set S in terms of u, v and w. Indeed, since
u + v + w = 0, if any number α is added to each of the coeﬃcients of u, v
and w, then the answer will not be altered. Thus one can arrange for the
coeﬃcient of u to be zero, and it follows that in fact S is spanned by v and w.
(It is equally true that u and v span S, and that u and w span S.) It is usual
to try to ﬁnd spanning sets which are minimal, so that no element of the
spanning set is expressible in terms of the others. The elements of a minimal
spanning set are then linearly independent, in the following sense.
3.18 Definition We say that vectors v₁, v₂, . . . , vᵣ are linearly independent
if the only solution of

    λ₁v₁ + λ₂v₂ + · · · + λᵣvᵣ = 0

is given by

    λ₁ = λ₂ = · · · = λᵣ = 0.

For example, the triples u and v above are linearly independent, since
if

      (  0 )     ( −1 )   ( 0 )
    λ (  1 ) + µ (  0 ) = ( 0 )
      ( −1 )     (  1 )   ( 0 )

then inspection of the ﬁrst two components immediately gives λ = µ = 0.
Vectors v_1, v_2, . . . , v_r are linearly dependent if they are not linearly
independent; that is, if there is a solution of λ_1v_1 + λ_2v_2 + · · · + λ_rv_r = 0 for
which the scalars λ_i are not all zero.
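As a quick computational check of such calculations (a NumPy sketch, not part of the original text): vectors are linearly independent exactly when the matrix having them as columns has rank equal to the number of columns.

```python
import numpy as np

# Columns are the triples u and v from the example above.
A = np.column_stack([[0, 1, -1], [-1, 0, 1]])
# u and v are independent: the rank equals the number of columns.
print(np.linalg.matrix_rank(A) == A.shape[1])  # True

# Adding w = (1, -1, 0) gives a dependent triple, since u + v + w = 0.
B = np.column_stack([[0, 1, -1], [-1, 0, 1], [1, -1, 0]])
print(np.linalg.matrix_rank(B) == B.shape[1])  # False
```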
Example

#20 Show that the functions f, g and h defined by

f(x) = sin x
g(x) = sin(x + π/4)
h(x) = sin(x + π/2)

for all x ∈ R

are linearly dependent over R.
−− We attempt to find α, β, γ ∈ R which are not all zero, and which
satisfy αf + βg + γh = 0. We require

3.18.1    αf(x) + βg(x) + γh(x) = 0 for all x ∈ R,

in other words

α sin x + β sin(x + π/4) + γ sin(x + π/2) = 0 for all x ∈ R.

By some well known trigonometric formulae we find that

sin(x + π/4) = sin x cos(π/4) + cos x sin(π/4) = (1/√2) sin x + (1/√2) cos x
and similarly

sin(x + π/2) = sin x cos(π/2) + cos x sin(π/2) = cos x,

and so our equation becomes

(α + β/√2) sin x + (β/√2 + γ) cos x = 0 for all x ∈ R.

Clearly α = 1, β = −√2 and γ = 1 solves this, since the coefficients of sin x
and cos x both become zero. Hence 3.18.1 has a nontrivial solution, and f, g
and h are linearly dependent, as claimed. −−<
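The conclusion is easy to confirm numerically. The following NumPy sketch (an aid of this edition, not part of the text) evaluates the combination 1·f − √2·g + 1·h on a grid and checks that it vanishes to rounding error.

```python
import numpy as np

# f(x) = sin x, g(x) = sin(x + pi/4), h(x) = sin(x + pi/2)
alpha, beta, gamma = 1.0, -np.sqrt(2.0), 1.0
x = np.linspace(0.0, 2.0 * np.pi, 1000)
combo = (alpha * np.sin(x)
         + beta * np.sin(x + np.pi / 4)
         + gamma * np.sin(x + np.pi / 2))
print(np.max(np.abs(combo)))  # of the order of 1e-16: zero up to rounding
```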
3.19 Definition If v_1, v_2, . . . , v_n are linearly independent and span V we
say that they form a basis of V . The number n is called the dimension.

If a vector space V has a basis consisting of n vectors then a general
element of V can be completely described by specifying n scalar parameters
(the coefficients of the basis elements). The dimension can thus be thought
of as the number of “degrees of freedom” in the space. For example, the
subspace S of R^3 described above has two degrees of freedom, since it has
a basis consisting of the two elements u and v. Choosing two scalars λ and
µ determines an element λu + µv of S, and each element of S is uniquely
expressible in this form. Geometrically, S represents a plane through the
origin, and so it certainly ought to be a two-dimensional space.

One of our major tasks is to show that the dimension is an invariant
of a vector space. That is, any two bases must have the same number of
elements. We will do this in the next chapter; see also Exercise 5 below.
We have seen in #11 above that if A ∈ Mat(m × n, F) then T(x) = Ax
defines a linear transformation T: F^n → F^m. The set S = { Ax | x ∈ F^n }
is the image of T; so (by 3.14) it is a subspace of F^m. Let a_1, a_2, . . . , a_n
be the columns of A. Since A is an m × n matrix these columns all have m
components; that is, a_i ∈ F^m for each i. By multiplication of partitioned
matrices we find that

A\begin{pmatrix} λ_1 \\ λ_2 \\ \vdots \\ λ_n \end{pmatrix} = ( a_1  a_2  · · ·  a_n )\begin{pmatrix} λ_1 \\ λ_2 \\ \vdots \\ λ_n \end{pmatrix} = λ_1a_1 + λ_2a_2 + · · · + λ_na_n,

and we deduce that S is precisely the set of all linear combinations of the
columns of A.
3.20 Definition If A ∈ Mat(m × n, F) the column space, CS(A), of the
matrix A is the subspace of F^m consisting of all linear combinations of the
columns of A. The row space, RS(A), is the subspace of ^tF^n consisting of all
linear combinations of the rows of A.

Comment

3.20.1 Observe that the system of equations Ax = b is consistent if and
only if b is in the column space of A.
3.21 Proposition Let A ∈ Mat(m × n, F) and B ∈ Mat(n × p, F). Then
(i) RS(AB) ⊆ RS(B), and
(ii) CS(AB) ⊆ CS(A).

Proof. Let the (i, j)-entry of A be α_ij and let the i-th row of B be b̃_i. It is
a general fact that

the i-th row of AB = (the i-th row of A)B,

and since the i-th row of A is ( α_i1  α_i2  . . .  α_in ) we have

i-th row of AB = ( α_i1  α_i2  . . .  α_in )\begin{pmatrix} b̃_1 \\ b̃_2 \\ \vdots \\ b̃_n \end{pmatrix} = α_i1 b̃_1 + α_i2 b̃_2 + · · · + α_in b̃_n ∈ RS(B).

In words, we have shown that the i-th row of AB is a linear combination of
the rows of B, the scalar coefficients being the entries in the i-th row of A.
So all rows of AB are in the row space of B, and since the row space of B
is closed under addition and scalar multiplication it follows that all linear
combinations of the rows of AB are also in the row space of B. That is,
RS(AB) ⊆ RS(B).

The proof of (ii) is similar and is omitted.
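Part (i) can be illustrated numerically; the following NumPy sketch (random integer matrices are an assumption of this note, not from the text) uses the fact that if every row of AB lies in RS(B), then stacking AB under B cannot raise the rank beyond rank(B).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 4))
B = rng.integers(-3, 4, size=(4, 5))

# RS(AB) is contained in RS(B): appending the rows of AB to B
# leaves the rank unchanged.
stacked = np.vstack([B, A @ B])
print(np.linalg.matrix_rank(stacked) == np.linalg.matrix_rank(B))  # True
```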
Example

#21 Prove that if A is an m × n matrix and B an n × p matrix then the
i-th row of AB is ãB, where ã is the i-th row of A.

−− Note that since ã is a 1 × n row vector and B is an n × p matrix,
ãB is a 1 × p row vector. The j-th entry of ãB is ∑_{k=1}^n a_k B_kj, where a_k
is the k-th entry of ã. But since ã is the i-th row of A, the k-th entry of ã is
in fact the (i, k)-entry of A. So we have shown that the j-th entry of ãB is
∑_{k=1}^n A_ik B_kj, which is equal to (AB)_ij, the j-th entry of the i-th row of AB.
So we have shown that for all j, the i-th row of AB has the same j-th entry as
ãB; hence ãB equals the i-th row of AB, as required. −−<
3.22 Corollary Suppose that A ∈ Mat(n × n, F) is invertible, and let
B ∈ Mat(n × p, F) be arbitrary. Then RS(AB) = RS(B).

Proof. By 3.21 we have RS(AB) ⊆ RS(B). Moreover, applying 3.21 with
A^{−1}, AB in place of A, B gives RS(A^{−1}(AB)) ⊆ RS(AB). Thus we have
shown that RS(B) ⊆ RS(AB) and RS(AB) ⊆ RS(B), as required.

From 3.22 it follows that if A is invertible then the function “premultiplication by A”, which takes B to AB, does not change the row space. (It
is trivial to prove also the column space analogue: postmultiplication by
invertible matrices leaves the column space unchanged.) Since performing row
operations on a matrix is equivalent to premultiplying by elementary matrices,
and elementary matrices are invertible, it follows that row operations do
not change the row space of a matrix. Hence the reduced echelon matrix E
which is the end result of the pivoting algorithm described in Chapter 2 has
the same row space as the original matrix A. The nonzero rows of E therefore
span the row space of A. We leave it as an exercise for the reader to prove
that the nonzero rows of E are also linearly independent, and therefore form
a basis for the row space of A.
Examples

#22 Find a basis for the row space of the matrix

\begin{pmatrix} 1 & -3 & 5 & 5 \\ -2 & 2 & 1 & -3 \\ -3 & 1 & 7 & -1 \end{pmatrix}.
−−

\begin{pmatrix} 1 & -3 & 5 & 5 \\ -2 & 2 & 1 & -3 \\ -3 & 1 & 7 & -1 \end{pmatrix}  →[R_2 := R_2 + 2R_1; R_3 := R_3 + 3R_1]  \begin{pmatrix} 1 & -3 & 5 & 5 \\ 0 & -4 & 11 & 7 \\ 0 & -8 & 22 & 14 \end{pmatrix}  →[R_3 := R_3 − 2R_2]  \begin{pmatrix} 1 & -3 & 5 & 5 \\ 0 & -4 & 11 & 7 \\ 0 & 0 & 0 & 0 \end{pmatrix}.

Since row operations do not change the row space, the echelon matrix we
have obtained has the same row space as the original matrix A. Hence the
row space consists of all linear combinations of the rows ( 1 −3 5 5 ) and
( 0 −4 11 7 ), since the zero row can clearly be ignored when computing
linear combinations of the rows. Furthermore, because the matrix is echelon,
it follows that its nonzero rows are linearly independent. Indeed, if

λ( 1 −3 5 5 ) + µ( 0 −4 11 7 ) = ( 0 0 0 0 )

then looking at the first entry we see immediately that λ = 0, whence (from
the second entry) µ must be zero also. Hence the two nonzero rows of the
echelon matrix form a basis of RS(A). −−<
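Row reduction of this kind is easy to automate. A sketch using SymPy (a tool assumed by this note, not used in the text): rref() produces the reduced echelon form, whose nonzero rows likewise form a basis of RS(A), though with leading entries scaled to 1.

```python
from sympy import Matrix

A = Matrix([[1, -3, 5, 5],
            [-2, 2, 1, -3],
            [-3, 1, 7, -1]])

# rref() returns the reduced echelon form and the pivot column indices.
R, pivots = A.rref()
basis = [list(R.row(i)) for i in range(len(pivots))]
print(len(basis))  # 2: the row space is two-dimensional
print(basis)
```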
#23 Is the column vector v = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} in the column space of the matrix

A = \begin{pmatrix} 2 & -1 & 4 \\ -1 & 3 & -1 \\ 4 & 3 & 2 \end{pmatrix}?
−− The question can be rephrased as follows: can v be expressed as a
linear combination of the columns of A? That is, do the equations

x\begin{pmatrix} 2 \\ -1 \\ 4 \end{pmatrix} + y\begin{pmatrix} -1 \\ 3 \\ 3 \end{pmatrix} + z\begin{pmatrix} 4 \\ -1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}

have a solution? Since these equations can also be written as

\begin{pmatrix} 2 & -1 & 4 \\ -1 & 3 & -1 \\ 4 & 3 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},
our task is to determine whether or not these equations are consistent. (This
is exactly the point made in 3.20.1 above.)

We simply apply standard row operation techniques:

\begin{pmatrix} 2 & -1 & 4 & 1 \\ -1 & 3 & -1 & 1 \\ 4 & 3 & 2 & 1 \end{pmatrix}  →[R_1 ↔ R_2; R_2 := R_2 + 2R_1; R_3 := R_3 + 4R_1]  \begin{pmatrix} -1 & 3 & -1 & 1 \\ 0 & 5 & 2 & 3 \\ 0 & 15 & -2 & 5 \end{pmatrix}  →[R_3 := R_3 − 3R_2]  \begin{pmatrix} -1 & 3 & -1 & 1 \\ 0 & 5 & 2 & 3 \\ 0 & 0 & -8 & -4 \end{pmatrix}

Now the coefficient matrix has been reduced to echelon form, and there is no
row for which the left hand side of the equation is zero and the right hand side
nonzero. Hence the equations must be consistent. Thus v is in the column
space of A. Although the question did not ask for the coefficients x, y and z
to be calculated, let us nevertheless do so, as a check. The last equation of
the echelon system gives z = 1/2, substituting this into the second equation
gives y = 2/5, and then the first equation gives x = −3/10. It is easily
checked that

(−3/10)\begin{pmatrix} 2 \\ -1 \\ 4 \end{pmatrix} + (2/5)\begin{pmatrix} -1 \\ 3 \\ 3 \end{pmatrix} + (1/2)\begin{pmatrix} 4 \\ -1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.  −−<
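The same consistency test can be phrased in NumPy (a sketch, not from the text): b lies in CS(A) exactly when appending b as an extra column does not raise the rank. Here A happens to be invertible, so one can also simply solve for the coefficients.

```python
import numpy as np

A = np.array([[2.0, -1.0, 4.0],
              [-1.0, 3.0, -1.0],
              [4.0, 3.0, 2.0]])
v = np.array([1.0, 1.0, 1.0])

# General test: v is in CS(A) iff rank([A | v]) == rank(A).
aug = np.column_stack([A, v])
print(np.linalg.matrix_rank(aug) == np.linalg.matrix_rank(A))  # True

# This A is invertible, so the coefficients are unique.
x = np.linalg.solve(A, v)
print(x)  # approximately [-0.3, 0.4, 0.5]
```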
Exercises

1. Prove that the set T of all solutions of the simultaneous equations

\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 1 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}

is a subspace of R^4.

2. Prove that if A is an m × n matrix over R then the solution set of Av = 0
is a subspace of R^n.
3. Let A be an n × n matrix over R, and λ any eigenvalue of A. Prove
that if S is the subset of R^n which consists of the zero column and all
eigenvectors associated to the eigenvalue λ then S is a subspace of R^n.

4. Prove that the nonzero rows of an echelon matrix are necessarily linearly
independent.

5. Let v_1, v_2, . . . , v_m and w_1, w_2, . . . , w_n be two bases of a vector space V ,
let A be the m × n coefficient matrix obtained when the v_i are expressed
as linear combinations of the w_j, and let B be the n × m coefficient
matrix obtained when the w_j are expressed as linear combinations of
the v_i. Combine these equations to express the v_i as linear combinations
of themselves, and then use linear independence of the v_i to deduce that
AB = I. Similarly, show that BA = I, and use 2.10 to deduce that
m = n.

6. In each case decide whether or not the set S is a vector space over the field
F, relative to obvious operations of addition and scalar multiplication.
(i) S = C (complex numbers), F = R.
(ii) S = C, F = C.
(iii) S = R, F = Q (rational numbers).
(iv) S = R[X] (polynomials over R in the variable X, that is, expressions
of the form a_0 + a_1X + · · · + a_nX^n with a_i ∈ R), F = R.
(v) S = Mat(n, C) (n × n matrices over C), F = R.

7. Let V be a vector space and S any set. Show that the set of all functions
from S to V can be made into a vector space in a natural way.
8. Which of the following functions are linear transformations?
(i) T: R^2 → R^2 defined by T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},
(ii) S: R^2 → R^2 defined by S\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},
(iii) g: R^2 → R^3 defined by g\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2x + y \\ y \\ x - y \end{pmatrix},
(iv) f: R → R^2 defined by f(x) = \begin{pmatrix} x \\ x + 1 \end{pmatrix}.
9. Let V be a vector space and let S and T be subspaces of V .
(i) Prove that S ∩ T is a subspace of V .
(ii) Let S + T = { x + y | x ∈ S and y ∈ T }. Prove that S + T is a
subspace of V .

10. (i) Let U, V , W be vector spaces over a field F, and let f: V → W,
g: U → V be linear transformations. Prove that fg: U → W defined by

(fg)(u) = f(g(u)) for all u ∈ U

is a linear transformation.
(ii) Let U = R^2, V = R^3, W = R^5 and suppose that f, g are defined
by

f\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 2 & 1 & 0 \\ -1 & -1 & 1 \\ 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix},   g\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix}.

Give a formula for (fg)\begin{pmatrix} a \\ b \end{pmatrix}.
11. Let V , W be vector spaces over F.
(i) Prove that if φ and ψ are linear transformations from V to W then
φ + ψ: V → W defined by

(φ + ψ)(v) = φ(v) + ψ(v) for all v ∈ V

is also a linear transformation.
(ii) Prove that if φ: V → W is a linear transformation and α ∈ F then
αφ: V → W defined by

(αφ)(v) = α(φ(v)) for all v ∈ V

is also a linear transformation.
12. Prove that the set U of all functions f: R → R which satisfy the
differential equation f′(t) − f(t) = 0 is a vector subspace of the vector space
of all real-valued functions on R.

13. Let V be a vector space and let S and T be subspaces of V .
(i) Prove that if addition and scalar multiplication are defined in the
obvious way then

V^2 = { \begin{pmatrix} u \\ v \end{pmatrix} | u, v ∈ V }

becomes a vector space. Prove that

S +̇ T = { \begin{pmatrix} u \\ v \end{pmatrix} | u ∈ S, v ∈ T }

is a subspace of V^2.
(ii) Prove that f: S +̇ T → V defined by

f\begin{pmatrix} u \\ v \end{pmatrix} = u + v

is a linear transformation. Calculate ker f and im f.
14. (i) Let F be any field. Prove that for any a ∈ F the function f: F → F
given by f(x) = ax for all x ∈ F is a linear transformation. Prove
also that any linear transformation from F to F has this form.
(Note that this implicitly uses the fact that F is a vector space
over itself.)
(ii) Prove that if f is any linear transformation from F^2 to F^2 then
there exists a matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} (where a, b, c, d ∈ F) such
that f(x) = Ax for all x ∈ F^2.
(Hint: the formulae f\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a \\ c \end{pmatrix} and f\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} b \\ d \end{pmatrix} define
the coefficients of A.)
(iii) Can these results be generalized to deal with linear transformations
from F^n to F^m for arbitrary n and m?
4
The structure of abstract vector spaces

This chapter probably constitutes the hardest theoretical part of the course;
in it we investigate bases in arbitrary vector spaces, and use them to relate
abstract vector spaces to spaces of n-tuples. Throughout the chapter, unless
otherwise stated, V will be a vector space over the field F.
§4a Preliminary lemmas

If n is a nonnegative integer and for each positive integer i ≤ n we are given
a vector v_i ∈ V then we will say that (v_1, v_2, . . . , v_n) is a sequence of vectors.
It is convenient to formulate most of the results in this chapter in terms of
sequences of vectors, and we start by restating the definitions from §3e in
terms of sequences.
4.1 Definition Let (v_1, v_2, . . . , v_n) be a sequence of vectors in V .
(i) An element w ∈ V is a linear combination of (v_1, v_2, . . . , v_n) if there
exist scalars λ_i such that w = ∑_{i=1}^n λ_i v_i.
(ii) The subset of V consisting of all linear combinations of (v_1, v_2, . . . , v_n)
is called the span of (v_1, v_2, . . . , v_n):

Span(v_1, v_2, . . . , v_n) = { ∑_{i=1}^n λ_i v_i | λ_i ∈ F }.

If Span(v_1, v_2, . . . , v_n) = V then (v_1, v_2, . . . , v_n) is said to span V .
(iii) The sequence (v_1, v_2, . . . , v_n) is said to be linearly independent if the
only solution of

λ_1v_1 + λ_2v_2 + · · · + λ_nv_n = 0̃    (λ_1, λ_2, . . . , λ_n ∈ F)

is given by

λ_1 = λ_2 = · · · = λ_n = 0.
(iv) The sequence (v_1, v_2, . . . , v_n) is said to be a basis of V if it is linearly
independent and spans V .
Comments

4.1.1 The sequence of vectors (v_1, v_2, . . . , v_n) is said to be linearly
dependent if it is not linearly independent; that is, if there is a solution of
∑_{i=1}^n λ_i v_i = 0̃ for which at least one of the λ_i is nonzero.

4.1.2 It is not difficult to prove that Span(v_1, v_2, . . . , v_n) is always a
subspace of V ; it is commonly called the subspace generated by v_1, v_2, . . . , v_n.
(The word ‘generate’ is often used as a synonym for ‘span’.) We say that
the vector space V is finitely generated if there exists a finite sequence
(v_1, v_2, . . . , v_n) of vectors in V such that V = Span(v_1, v_2, . . . , v_n).

4.1.3 Let V = F^n and define e_i ∈ V to be the column with 1 as its i-th
entry and all other entries zero. It is easily proved that (e_1, e_2, . . . , e_n) is a
basis of F^n. We will call this basis the standard basis of F^n.

4.1.4 The notation ‘(v_1, v_2, . . . , v_n)’ is not meant to imply that n ≥ 2,
and indeed we want all our statements about sequences of vectors to be valid
for one-term sequences, and even for the sequence which has no terms. Each
v ∈ V gives rise to a one-term sequence (v), and a linear combination of (v)
is just a scalar multiple of v; thus Span(v) = { λv | λ ∈ F }. The sequence
(0̃) is certainly not linearly independent, since λ = 1 is a nonzero solution
of λ0̃ = 0̃. However, if v ≠ 0̃ then by 3.7 we know that the only solution of
λv = 0̃ is λ = 0. Thus the sequence (v) is linearly independent if and only if
v ≠ 0̃.

4.1.5 Empty sums are always given the value zero; so it seems reasonable
that an empty linear combination should give the zero vector. Accordingly
we adopt the convention that the empty sequence generates the subspace {0̃}.
Furthermore, the empty sequence is considered to be linearly independent;
so it is a basis for {0̃}.
4.1.6 (This is a rather technical point.) It may seem a little strange
that the above definitions have been phrased in terms of a sequence of vectors
(v_1, v_2, . . . , v_n) rather than a set of vectors {v_1, v_2, . . . , v_n}. The reason for
our choice can be illustrated with a simple example, in which n = 2. Suppose
that v is a nonzero vector, and consider the two-term sequence (v, v). The
equation λ_1v + λ_2v = 0̃ has a nonzero solution, namely λ_1 = 1, λ_2 = −1.
Thus, by our definitions, (v, v) is a linearly dependent sequence of vectors.
A definition of linear independence for sets of vectors would encounter the
problem that {v, v} = {v}: we would be forced to say that {v, v} is linearly
independent. The point is that sets A and B are equal if and only if each
element of A is an element of B and vice versa, and writing an element down
again does not change the set; on the other hand, sequences (v_1, v_2, . . . , v_n)
and (u_1, u_2, . . . , u_m) are equal if and only if m = n and u_i = v_i for each i.
4.1.7 Despite the remarks just made in 4.1.6, the concept of ‘linear
combination’ could have been defined perfectly satisfactorily for sets; indeed,
if (v_1, v_2, . . . , v_n) and (u_1, u_2, . . . , u_m) are sequences of vectors such that
{v_1, v_2, . . . , v_n} = {u_1, u_2, . . . , u_m}, then a vector v is a linear combination of
(v_1, v_2, . . . , v_n) if and only if it is a linear combination of (u_1, u_2, . . . , u_m).
4.1.8 One consequence of our terminology is that the order in which
elements of a basis are listed is important. If v_1 ≠ v_2 then the sequences
(v_1, v_2, v_3) and (v_2, v_1, v_3) are not equal. It is true that if a sequence of vectors
is linearly independent then so is any rearrangement of that sequence, and if
a sequence spans V then so does any rearrangement. Thus a rearrangement
of a basis is also a basis, but a different basis. Our terminology is a little
at odds with the mathematical world at large in this regard: what we are
calling a basis most authors call an ordered basis. However, for our discussion
of matrices and linear transformations in Chapter 6, it is the concept of an
ordered basis which is appropriate.

Our principal goal in this chapter is to prove that every finitely generated
vector space has a basis and that any two bases have the same number
of elements. The next two lemmas will be used in the proofs of these facts.
4.2 Lemma Suppose that v_1, v_2, . . . , v_n ∈ V , and let 1 ≤ j ≤ n. Write
S = Span(v_1, v_2, . . . , v_n) and S′ = Span(v_1, v_2, . . . , v_{j−1}, v_{j+1}, . . . , v_n). Then
(i) S′ ⊆ S,
(ii) if v_j ∈ S′ then S′ = S,
(iii) if the sequence (v_1, v_2, . . . , v_n) is linearly independent, then so also
is (v_1, v_2, . . . , v_{j−1}, v_{j+1}, . . . , v_n).
Comment

4.2.1 By ‘(v_1, v_2, . . . , v_{j−1}, v_{j+1}, . . . , v_n)’ we mean the sequence which
is obtained from (v_1, v_2, . . . , v_n) by deleting v_j; the notation is not meant to
imply that 2 < j < n. However, n must be at least 1, so that there is a term
to delete.
Proof of 4.2. (i) If v ∈ S′ then for some scalars λ_i,

v = λ_1v_1 + λ_2v_2 + · · · + λ_{j−1}v_{j−1} + λ_{j+1}v_{j+1} + · · · + λ_nv_n
  = λ_1v_1 + λ_2v_2 + · · · + λ_{j−1}v_{j−1} + 0v_j + λ_{j+1}v_{j+1} + · · · + λ_nv_n ∈ S.

Every element of S′ is in S; that is, S′ ⊆ S.

(ii) Assume that v_j ∈ S′. This gives

v_j = α_1v_1 + α_2v_2 + · · · + α_{j−1}v_{j−1} + α_{j+1}v_{j+1} + · · · + α_nv_n

for some scalars α_i. Now let v be an arbitrary element of S. Then for some
λ_i we have

v = λ_1v_1 + λ_2v_2 + · · · + λ_nv_n
  = λ_1v_1 + λ_2v_2 + · · · + λ_{j−1}v_{j−1}
    + λ_j(α_1v_1 + α_2v_2 + · · · + α_{j−1}v_{j−1} + α_{j+1}v_{j+1} + · · · + α_nv_n)
    + λ_{j+1}v_{j+1} + · · · + λ_nv_n
  = (λ_1 + λ_jα_1)v_1 + (λ_2 + λ_jα_2)v_2 + · · · + (λ_{j−1} + λ_jα_{j−1})v_{j−1}
    + (λ_{j+1} + λ_jα_{j+1})v_{j+1} + · · · + (λ_n + λ_jα_n)v_n ∈ S′.

This shows that S ⊆ S′, and, since the reverse inclusion was proved in (i), it
follows that S′ = S.

(iii) Assume that (v_1, v_2, . . . , v_n) is linearly independent, and suppose
that

($)    λ_1v_1 + λ_2v_2 + · · · + λ_{j−1}v_{j−1} + λ_{j+1}v_{j+1} + · · · + λ_nv_n = 0̃

where the coefficients λ_i are elements of F. Then

λ_1v_1 + λ_2v_2 + · · · + λ_{j−1}v_{j−1} + 0v_j + λ_{j+1}v_{j+1} + · · · + λ_nv_n = 0̃

and, by linear independence of (v_1, v_2, . . . , v_n), all the coefficients in this
equation must be zero. Hence

λ_1 = λ_2 = · · · = λ_{j−1} = λ_{j+1} = · · · = λ_n = 0.

Since we have shown that this is the only solution of ($), we have shown that
(v_1, . . . , v_{j−1}, v_{j+1}, . . . , v_n) is linearly independent.
A subsequence of a sequence is a sequence obtained by deleting terms
from the original. (This is meant to include the possibility that the number
of terms deleted is zero; thus a sequence is considered to be a subsequence of
itself.) Repeated application of 4.2 (iii) gives
4.3 Corollary Any subsequence of a linearly independent sequence is
linearly independent.
In a similar vein to 4.2, and only slightly harder, we have:
4.4 Lemma Suppose that v_1, v_2, . . . , v_r ∈ V are such that the sequence
(v_1, v_2, . . . , v_{r−1}) is linearly independent, but (v_1, v_2, . . . , v_r) is linearly
dependent. Then v_r ∈ Span(v_1, v_2, . . . , v_{r−1}).
Proof. Since (v_1, v_2, . . . , v_r) is linearly dependent there exist coefficients
λ_1, λ_2, . . . , λ_r which are not all zero and which satisfy

(∗∗)    λ_1v_1 + λ_2v_2 + · · · + λ_rv_r = 0̃.

If λ_r = 0 then λ_1, λ_2, . . . , λ_{r−1} are not all zero, and, furthermore, (∗∗) gives

λ_1v_1 + λ_2v_2 + · · · + λ_{r−1}v_{r−1} = 0̃.

But this contradicts the assumption that (v_1, v_2, . . . , v_{r−1}) is linearly
independent. Hence λ_r ≠ 0, and rearranging (∗∗) gives

v_r = −λ_r^{−1}(λ_1v_1 + λ_2v_2 + · · · + λ_{r−1}v_{r−1}) ∈ Span(v_1, v_2, . . . , v_{r−1}).
§4b Basis theorems
This section consists of a list of the main facts about bases which every
student should know, collected together for ease of reference. The proofs will
be given in the next section.
4.5 Theorem Every finitely generated vector space has a basis.
4.6 Proposition If (w_1, w_2, . . . , w_m) and (v_1, v_2, . . . , v_n) are both bases of
V then m = n.
As we have already remarked, if V has a basis then the number of
elements in any basis of V is called the dimension of V , denoted ‘dimV ’.
4.7 Proposition Suppose that (v_1, v_2, . . . , v_n) spans V , and no proper
subsequence of (v_1, v_2, . . . , v_n) spans V . Then (v_1, v_2, . . . , v_n) is a basis of V .
Dual to 4.7 we have
4.8 Proposition Suppose that (v_1, v_2, . . . , v_n) is a linearly independent
sequence in V which is not a proper subsequence of any other linearly
independent sequence. Then (v_1, v_2, . . . , v_n) is a basis of V .
4.9 Proposition If a sequence (v_1, v_2, . . . , v_n) spans V then some
subsequence of (v_1, v_2, . . . , v_n) is a basis.
Again there is a dual result: any linearly independent sequence extends
to a basis. However, to avoid the complications of infinite dimensional spaces,
we insert the proviso that the space is finitely generated.

4.10 Proposition If (v_1, v_2, . . . , v_n) is a linearly independent sequence in a
finitely generated vector space V then there exist v_{n+1}, v_{n+2}, . . . , v_d in V
such that (v_1, v_2, . . . , v_n, v_{n+1}, . . . , v_d) is a basis of V .
4.11 Proposition If V is a finitely generated vector space and U is a
subspace of V then U is also finitely generated, and dim U ≤ dim V .
Furthermore, if U ≠ V then dim U < dim V .

4.12 Proposition Let V be a finitely generated vector space of dimension
d and s = (v_1, v_2, . . . , v_n) a sequence of elements of V .
(i) If n < d then s does not span V .
(ii) If n > d then s is not linearly independent.
(iii) If n = d then s spans V if and only if s is linearly independent.
§4c The Replacement Lemma
We turn now to the proofs of the results stated in the previous section. In
the interests of rigour, the proofs are detailed. This makes them look more
complicated than they are, but for the most part the ideas are simple enough.
The key fact is that the number of terms in a linearly independent sequence
of vectors cannot exceed the number of terms in a spanning sequence,
and the strategy of the proof is to replace terms of the spanning sequence by
terms of the linearly independent sequence and keep track of what happens.
More exactly, if a sequence s of vectors is linearly independent and another
sequence t spans V then it is possible to use the vectors in s to replace an
equal number of vectors in t, always preserving the spanning property. The
statement of the lemma is rather technical, since it is designed specifically
for use in the proof of the next theorem.

4.13 The Replacement Lemma Suppose that r, s are nonnegative
integers, and v_1, v_2, . . . , v_r, v_{r+1} and x_1, x_2, . . . , x_s elements of V such that
(v_1, v_2, . . . , v_r, v_{r+1}) is linearly independent and (v_1, v_2, . . . , v_r, x_1, x_2, . . . , x_s)
spans V . Then another spanning sequence can be obtained from this by
inserting v_{r+1} and deleting a suitable x_j. That is, there exists a j with 1 ≤ j ≤ s
such that the sequence (v_1, . . . , v_r, v_{r+1}, x_1, . . . , x_{j−1}, x_{j+1}, . . . , x_s) spans V .
Comments

4.13.1 If r = 0 the assumption that (v_1, v_2, . . . , v_{r+1}) is linearly
independent reduces to v_1 ≠ 0̃. The other assumption is that (x_1, . . . , x_s) spans V ,
and the conclusion is that some x_j can be replaced by v_1 without losing the
spanning property.

4.13.2 One conclusion of the Lemma is that there is an x_j to replace; so
the hypotheses of the Lemma cannot be satisfied if s = 0. In other words, in
the case s = 0 the Lemma says that it is impossible for (v_1, . . . , v_{r+1}) to be
linearly independent if (v_1, . . . , v_r) spans V .
Proof of 4.13. Since v_{r+1} ∈ V = Span(v_1, v_2, . . . , v_r, x_1, x_2, . . . , x_s) there
exist scalars λ_i and µ_k with

v_{r+1} = ∑_{i=1}^r λ_i v_i + ∑_{k=1}^s µ_k x_k.

Writing λ_{r+1} = −1 we have

λ_1v_1 + · · · + λ_rv_r + λ_{r+1}v_{r+1} + µ_1x_1 + · · · + µ_sx_s = 0̃,

which shows that (v_1, . . . , v_r, v_{r+1}, x_1, . . . , x_s) is linearly dependent, since the
coefficient λ_{r+1} of v_{r+1} is nonzero. (Observe that this gives a contradiction if
s = 0, justifying the remarks in 4.13.2 above.) Now consider the sequences

(v_1, v_2, . . . , v_{r+1})
(v_1, v_2, . . . , v_{r+1}, x_1)
(v_1, v_2, . . . , v_{r+1}, x_1, x_2)
  ⋮
(v_1, v_2, . . . , v_{r+1}, x_1, x_2, . . . , x_s).

The first of these is linearly independent, and the last is linearly dependent.
We can therefore find the first linearly dependent sequence in the list: let j be
the least positive integer for which (v_1, v_2, . . . , v_{r+1}, x_1, x_2, . . . , x_j) is linearly
dependent. Then (v_1, v_2, . . . , v_{r+1}, x_1, x_2, . . . , x_{j−1}) is linearly independent,
and by 4.4,

x_j ∈ Span(v_1, v_2, . . . , v_{r+1}, x_1, . . . , x_{j−1})
    ⊆ Span(v_1, v_2, . . . , v_{r+1}, x_1, . . . , x_{j−1}, x_{j+1}, . . . , x_s)    (by 4.2 (i)).

Hence, by 4.2 (ii),

Span(v_1, . . . , v_{r+1}, x_1, . . . , x_{j−1}, x_{j+1}, . . . , x_s)
  = Span(v_1, . . . , v_{r+1}, x_1, . . . , x_{j−1}, x_j, x_{j+1}, . . . , x_s).

Let us call this last space S. Obviously then S ⊆ V . But (by 4.2 (i)) we know
that Span(v_1, . . . , v_r, x_1, . . . , x_s) ⊆ S, while Span(v_1, . . . , v_r, x_1, . . . , x_s) = V
by hypothesis. So

Span(v_1, . . . , v_{r+1}, x_1, . . . , x_{j−1}, x_{j+1}, . . . , x_s) = V,

as required.
We can now prove the key theorem, by repeated application of the
Replacement Lemma.
4.14 Theorem Let (w_1, w_2, . . . , w_m) be a sequence of elements of V which
spans V , and let (v_1, v_2, . . . , v_n) be a linearly independent sequence in V .
Then n ≤ m.

Proof. Suppose that n > m. We will use induction on i to prove the
following statement for all i ∈ {0, 1, 2, . . . , m}:

(∗) There exists a subsequence (w^{(i)}_1, w^{(i)}_2, . . . , w^{(i)}_{m−i})
of (w_1, w_2, . . . , w_m) such that
(v_1, v_2, . . . , v_i, w^{(i)}_1, w^{(i)}_2, . . . , w^{(i)}_{m−i}) spans V .
In other words, what this says is that i of the terms of (w_1, w_2, . . . , w_m) can
be replaced by v_1, v_2, . . . , v_i without losing the spanning property.

The fact that (w_1, w_2, . . . , w_m) spans V proves (∗) in the case i = 0:
in this case the subsequence involved is the whole sequence (w_1, w_2, . . . , w_m)
and there is no replacement of terms at all. Now assume that 1 ≤ k ≤ m
and that (∗) holds when i = k − 1. We will prove that (∗) holds for i = k,
and this will complete our induction.

By our hypothesis there is a subsequence (w^{(k−1)}_1, w^{(k−1)}_2, . . . , w^{(k−1)}_{m−k+1})
of (w_1, w_2, . . . , w_m) such that (v_1, v_2, . . . , v_{k−1}, w^{(k−1)}_1, w^{(k−1)}_2, . . . , w^{(k−1)}_{m−k+1})
spans V . Repeated application of 4.2 (iii) shows that any subsequence of a
linearly independent sequence is linearly independent; so since (v_1, v_2, . . . , v_n)
is linearly independent and n > m ≥ k it follows that (v_1, v_2, . . . , v_k) is
linearly independent. Now we can apply the Replacement Lemma to conclude
that (v_1, . . . , v_{k−1}, v_k, w^{(k)}_1, w^{(k)}_2, . . . , w^{(k)}_{m−k}) spans V , (w^{(k)}_1, w^{(k)}_2, . . . , w^{(k)}_{m−k})
being obtained by deleting one term of (w^{(k−1)}_1, w^{(k−1)}_2, . . . , w^{(k−1)}_{m−k+1}). Hence
(∗) holds for i = k, as required.

Since (∗) has been proved for all i with 0 ≤ i ≤ m, it holds in particular
for i = m, and in this case it says that (v_1, v_2, . . . , v_m) spans V . Since
m + 1 ≤ n we have that v_{m+1} exists; so v_{m+1} ∈ Span(v_1, v_2, . . . , v_m), giving
a solution of λ_1v_1 + λ_2v_2 + · · · + λ_{m+1}v_{m+1} = 0̃ with λ_{m+1} = −1 ≠ 0. This
contradicts the fact that the sequence of v_j is linearly independent, showing
that the assumption that n > m is false, and completing the proof.
It is now straightforward to deal with the theorems and propositions
stated in the previous section.
Proof of 4.6. Since (w_1, w_2, . . . , w_m) spans and (v_1, v_2, . . . , v_n) is linearly
independent, it follows from 4.14 that n ≤ m. Dually, since (v_1, v_2, . . . , v_n)
spans and (w_1, w_2, . . . , w_m) is linearly independent, it follows that m ≤ n.
Proof of 4.7. Suppose that the sequence (v_1, v_2, ..., v_n) is linearly dependent.
Then there exist λ_i ∈ F with

    λ_1 v_1 + λ_2 v_2 + ··· + λ_n v_n = 0

and λ_r ≠ 0 for some r. For such an r we have

    v_r = −λ_r^{−1}(λ_1 v_1 + ··· + λ_{r−1} v_{r−1} + λ_{r+1} v_{r+1} + ··· + λ_n v_n)
        ∈ Span(v_1, ..., v_{r−1}, v_{r+1}, ..., v_n),
90 Chapter Four: The structure of abstract vector spaces
and, by 4.2 (ii),

    Span(v_1, ..., v_{r−1}, v_{r+1}, ..., v_n) = Span(v_1, v_2, ..., v_n).

This contradicts the assumption that (v_1, v_2, ..., v_n) spans V whilst proper
subsequences of it do not. Hence (v_1, v_2, ..., v_n) is linearly independent, and,
since it also spans, it is therefore a basis.
Proof of 4.8. Let v ∈ V. By our hypotheses the sequence (v_1, v_2, ..., v_n, v)
is not linearly independent, but the sequence (v_1, v_2, ..., v_n) is. Hence, by
4.4, v ∈ Span(v_1, v_2, ..., v_n). Since this holds for all v ∈ V it follows that
V ⊆ Span(v_1, v_2, ..., v_n). Since the reverse inclusion is obvious we deduce
that (v_1, v_2, ..., v_n) spans V, and, since it is also linearly independent, is
therefore a basis.
Proof of 4.9. Given that V = Span(v_1, v_2, ..., v_n), let S be the set of all
subsequences of (v_1, v_2, ..., v_n) which span V. Observe that the set S has at
least one element, namely (v_1, ..., v_n) itself. Start with this sequence, and if
it is possible to delete a term and still be left with a sequence which spans
V, do so. Repeat this for as long as possible, and eventually (in at most
n steps) we will obtain a sequence (u_1, u_2, ..., u_s) in S with the property
that no proper subsequence of (u_1, u_2, ..., u_s) is in S. By 4.7 above, such a
sequence is necessarily a basis.
Observe that Theorem 4.5 follows immediately from 4.9.
Proof of 4.10. Let S = Span(v_1, v_2, ..., v_n). If S is not equal to V then it
must be a proper subset of V, and we may choose v_{n+1} ∈ V with v_{n+1} ∉ S.
If the sequence (v_1, v_2, ..., v_n, v_{n+1}) were linearly dependent 4.4 would give
v_{n+1} ∈ Span(v_1, v_2, ..., v_n), a contradiction. So we have obtained a longer
linearly independent sequence, and if this also fails to span V we may choose
v_{n+2} ∈ V which is not in Span(v_1, v_2, ..., v_n, v_{n+1}) and increase the length
again. Repeat this process for as long as possible.
Since V is finitely generated there is a finite sequence (w_1, w_2, ..., w_m)
which spans V, and by 4.14 it follows that a linearly independent sequence
in V can have at most m elements. Hence the process described in the above
paragraph cannot continue indefinitely. Thus at some stage we obtain a linearly
independent sequence (v_1, v_2, ..., v_n, ..., v_d) which cannot be extended,
and which therefore spans V.
The proofs of 4.11 and 4.12 are left as exercises. Our ﬁnal result in this
section is little more than a rephrasing of the deﬁnition of a basis; however,
it is useful from time to time:
4.15 Proposition  Let V be a vector space over the field F. A sequence
(v_1, v_2, ..., v_n) is a basis for V if and only if for each v ∈ V there exist unique
λ_1, λ_2, ..., λ_n ∈ F with

    (♥)    v = λ_1 v_1 + λ_2 v_2 + ··· + λ_n v_n.
Proof. Clearly (v_1, v_2, ..., v_n) spans V if and only if for each v ∈ V the
equation (♥) has a solution for the scalars λ_i. Hence it will be sufficient to
prove that (v_1, v_2, ..., v_n) is linearly independent if and only if (♥) has at
most one solution for each v ∈ V.
Suppose that (v_1, v_2, ..., v_n) is linearly independent, and suppose that
λ_i, λ′_i ∈ F (i = 1, 2, ..., n) are such that

    Σ_{i=1}^n λ_i v_i = Σ_{i=1}^n λ′_i v_i = v

for some v ∈ V. Collecting terms gives Σ_{i=1}^n (λ_i − λ′_i) v_i = 0, and linear
independence of the v_i gives λ_i − λ′_i = 0 for each i. Thus λ_i = λ′_i, and it
follows that (♥) cannot have more than one solution.
Conversely, suppose that (♥) has at most one solution for each v ∈ V.
Then in particular Σ_{i=1}^n λ_i v_i = 0 has at most one solution, and so λ_i = 0 is
the unique solution to this. That is, (v_1, v_2, ..., v_n) is linearly independent.
§4d Two properties of linear transformations
Since linear transformations are of central importance in this subject, it is
natural to investigate their relationships with bases. This section contains
two theorems in this direction.
4.16 Theorem  Let V and W be vector spaces over the same field F. Let
(v_1, v_2, ..., v_n) be a basis for V and let w_1, w_2, ..., w_n be arbitrary elements
of W. Then there is a unique linear transformation θ: V → W such that
θ(v_i) = w_i for i = 1, 2, ..., n.
Proof. We prove first that there is at most one such linear transformation.
To do this, assume that θ and ϕ are both linear transformations from V to
W and that θ(v_i) = ϕ(v_i) = w_i for all i. Let v ∈ V. Since (v_1, v_2, ..., v_n)
spans V there exist λ_i ∈ F with v = Σ_{i=1}^n λ_i v_i, and we find

    θ(v) = θ(Σ_{i=1}^n λ_i v_i)
         = Σ_{i=1}^n λ_i θ(v_i)    (by linearity of θ)
         = Σ_{i=1}^n λ_i ϕ(v_i)    (since θ(v_i) = ϕ(v_i))
         = ϕ(Σ_{i=1}^n λ_i v_i)    (by linearity of ϕ)
         = ϕ(v).

Thus θ = ϕ, as required.
We must now prove the existence of a linear transformation with the
required properties. Let v ∈ V, and write v = Σ_{i=1}^n λ_i v_i as above. By
4.15 the scalars λ_i are uniquely determined by v, and therefore Σ_{i=1}^n λ_i w_i is
a uniquely determined element of W. This gives us a well defined rule for
obtaining an element of W for each element of V; that is, a function from V
to W. Thus there is a function θ: V → W satisfying

    θ(Σ_{i=1}^n λ_i v_i) = Σ_{i=1}^n λ_i w_i

for all λ_i ∈ F, and it remains for us to prove that it is linear.
Let u, v ∈ V and α, β ∈ F. Let u = Σ_{i=1}^n λ_i v_i and v = Σ_{i=1}^n µ_i v_i.
Then

    αu + βv = Σ_{i=1}^n (αλ_i + βµ_i) v_i,
and the definition of θ gives

    θ(αu + βv) = Σ_{i=1}^n (αλ_i + βµ_i) w_i
               = α Σ_{i=1}^n λ_i w_i + β Σ_{i=1}^n µ_i w_i
               = αθ(u) + βθ(v).

Hence θ is linear.
Comment
4.16.1 Theorem 4.16 says that a linear transformation can be defined in
an arbitrary fashion on a basis; moreover, once its values on the basis elements
have been specified its values everywhere else are uniquely determined.
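For finite dimensional spaces this can be illustrated concretely with matrices. In the sketch below (our own illustration, not from the notes; the names B, W, M and theta are hypothetical) the basis vectors of R^3 are the columns of B, their prescribed images in R^2 are the columns of W, and the matrix W B^{−1} realizes the unique θ of Theorem 4.16:

```python
import numpy as np

# Columns of B form a (hypothetical) basis of R^3; columns of W are the
# arbitrarily chosen target vectors w_i in R^2.
B = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])
W = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0]])

# theta(v) = W @ (coordinates of v relative to the basis) = W @ B^{-1} @ v.
M = W @ np.linalg.inv(B)

def theta(v):
    return M @ v

# theta sends each basis vector v_i to the prescribed w_i ...
for i in range(3):
    assert np.allclose(theta(B[:, i]), W[:, i])
# ... and linearity then determines theta everywhere else.
v = 2 * B[:, 0] - 5 * B[:, 2]
assert np.allclose(theta(v), 2 * W[:, 0] - 5 * W[:, 2])
```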
Our second theorem of this section, the proof of which is left as an
exercise, examines what injective and surjective linear transformations do to
a basis of a space.
4.17 Theorem  Let (v_1, v_2, ..., v_n) be a basis of a vector space V and let
θ: V → W be a linear transformation. Then
(i) θ is injective if and only if (θ(v_1), θ(v_2), ..., θ(v_n)) is linearly independent,
(ii) θ is surjective if and only if (θ(v_1), θ(v_2), ..., θ(v_n)) spans W.
Comment
4.17.1 If θ: V → W is bijective and (v_1, v_2, ..., v_n) is a basis of V then
it follows from 4.17 that (θ(v_1), θ(v_2), ..., θ(v_n)) is a basis of W. Hence the
existence of a bijective linear transformation from one space to another
guarantees that the spaces must have the same dimension.
§4e Coordinates relative to a basis
In this final section of this chapter we show that choosing a basis in a vector
space enables one to coordinatize the space, in the sense that each element
of the space is associated with a uniquely determined n-tuple, where n is the
dimension of the space. Thus it turns out that an arbitrary n-dimensional
vector space over the field F is, after all, essentially the same as F^n.
4.18 Theorem  Let V be a vector space over F and let v_1, v_2, ..., v_n be
arbitrary elements of V. Then the function T: F^n → V defined by

    T(t(α_1 α_2 ··· α_n)) = α_1 v_1 + α_2 v_2 + ··· + α_n v_n

is a linear transformation. Moreover, T is surjective if and only if the sequence
(v_1, v_2, ..., v_n) spans V and injective if and only if (v_1, v_2, ..., v_n) is linearly
independent.
Proof. Let α, β ∈ F^n and λ, µ ∈ F. Let α_i, β_i be the i-th entries of α, β
respectively. Then we have

    T(λα + µβ) = T(t(λα_1 + µβ_1  λα_2 + µβ_2  ···  λα_n + µβ_n))
               = Σ_{i=1}^n (λα_i + µβ_i) v_i
               = λ Σ_{i=1}^n α_i v_i + µ Σ_{i=1}^n β_i v_i
               = λT(α) + µT(β).

Hence T is linear.
By the definition of the image of a function, we have

    im T = { T(t(λ_1 λ_2 ··· λ_n)) | λ_i ∈ F }
         = { λ_1 v_1 + λ_2 v_2 + ··· + λ_n v_n | λ_i ∈ F }
         = Span(v_1, v_2, ..., v_n).

By definition, T is surjective if and only if im T = V; hence T is surjective
if and only if (v_1, v_2, ..., v_n) spans V.
By 3.15 we know that T is injective if and only if the unique solution
of T(α) = 0 is α = t(0 ··· 0). Since T(α) = Σ_{i=1}^n α_i v_i (where α_i is the i-th entry
of α) this can be restated as follows: T is injective if and only if α_i = 0 (for
all i) is the unique solution of Σ_{i=1}^n α_i v_i = 0. That is, T is injective if and
only if (v_1, v_2, ..., v_n) is linearly independent.
Comment
4.18.1 The above proof can be shortened by using 4.16, 4.17 and the
standard basis of F^n.

Assume now that b = (v_1, v_2, ..., v_n) is a basis of V and let T: F^n → V
be as defined in 4.18 above. By 4.18 we have that T is bijective, and so it
follows that there is an inverse function T^{−1}: V → F^n which associates to
every v ∈ V a column T^{−1}(v); we call this column the coordinate vector of v
relative to the basis b, and we denote it by ‘cv_b(v)’. That is,
4.19 Definition  The coordinate vector of v ∈ V relative to the basis
b = (v_1, v_2, ..., v_n) is the unique column vector

    cv_b(v) = t(λ_1 λ_2 ··· λ_n) ∈ F^n

such that v = λ_1 v_1 + λ_2 v_2 + ··· + λ_n v_n.
Comment
4.19.1 Observe that this bijective correspondence between elements of
V and n-tuples also follows immediately from 4.15. We have, however, also
proved that the corresponding mapping F^n → V, satisfying cv_b(v) ↦ v for
all v ∈ V, is linear.
Examples
#1 Let s = (e_1, e_2, ..., e_n), the standard basis of F^n. If v = t(λ_1 ··· λ_n) ∈ F^n
then clearly v = λ_1 e_1 + ··· + λ_n e_n, and so the coordinate vector of v relative
to s is the column with λ_i as its i-th entry; that is, the coordinate vector of v
is just the same as v:

    (4.19.2)    If s is the standard basis of F^n then cv_s(v) = v for all v ∈ F^n.
#2 Prove that b = ( t(1 0 1), t(2 1 1), t(1 1 1) ) is a basis for R^3 and calculate
the coordinate vectors of t(1 0 0), t(0 1 0) and t(0 0 1) relative to b.
−− By 4.15 we know that b is a basis of R^3 if and only if each element of
R^3 is uniquely expressible as a linear combination of the elements of b. That
is, b is a basis if and only if the equations

    x t(1 0 1) + y t(2 1 1) + z t(1 1 1) = t(a b c)

have a unique solution for all a, b, c ∈ R. We may rewrite these equations as

    [ 1 2 1 ] [ x ]   [ a ]
    [ 0 1 1 ] [ y ] = [ b ]
    [ 1 1 1 ] [ z ]   [ c ],

and we know that there is a unique solution for all a, b, c if and only if the
coefficient matrix has an inverse. It has an inverse if and only if the reduced
echelon matrix obtained from it by row operations is the identity matrix.
We are also asked to calculate x, y and z in three particular cases, and
again this involves finding the reduced echelon matrices for the corresponding
augmented matrices. We can do all these things at once by applying row
operations to the augmented matrix

    [ 1 2 1 | 1 0 0 ]
    [ 0 1 1 | 0 1 0 ]
    [ 1 1 1 | 0 0 1 ].

If the left hand half reduces to the identity it means that b is a basis, and
the three columns in the right hand half will be the coordinate vectors we
seek.

    [ 1 2 1 | 1 0 0 ]                    [ 1  2 1 |  1 0 0 ]
    [ 0 1 1 | 0 1 0 ]   R3 := R3 − R1    [ 0  1 1 |  0 1 0 ]
    [ 1 1 1 | 0 0 1 ]   ────────────→    [ 0 −1 0 | −1 0 1 ]
    R1 := R1 − 2R2    [ 1 0 −1 |  1 −2 0 ]    R1 := R1 + R3    [ 1 0 0 |  0 −1  1 ]
    R3 := R3 + R2     [ 0 1  1 |  0  1 0 ]    R2 := R2 − R3    [ 0 1 0 |  1  0 −1 ]
    ────────────→     [ 0 0  1 | −1  1 1 ]    ────────────→    [ 0 0 1 | −1  1  1 ].

Hence we have shown that b is indeed a basis, and, furthermore,

    cv_b(t(1 0 0)) = t(0 1 −1),  cv_b(t(0 1 0)) = t(−1 0 1),  cv_b(t(0 0 1)) = t(1 −1 1).
−−<
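The row reduction in #2 is easy to check mechanically. A minimal numpy sketch (our own check, not part of the notes): the matrix A below has the vectors of b as its columns, and inverting A yields the three coordinate vectors at once, since the columns of A^{−1} are cv_b(e_1), cv_b(e_2), cv_b(e_3).

```python
import numpy as np

# Columns are the proposed basis vectors from #2.
A = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])

# b is a basis iff A is invertible; inverting A does the whole row
# reduction of the example in one step.
Ainv = np.linalg.inv(A)

assert np.allclose(Ainv[:, 0], [0, 1, -1])   # cv_b(e_1)
assert np.allclose(Ainv[:, 1], [-1, 0, 1])   # cv_b(e_2)
assert np.allclose(Ainv[:, 2], [1, -1, 1])   # cv_b(e_3)
```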
#3 Let V be the set of all polynomials over R of degree at most three.
Let p_0, p_1, p_2, p_3 ∈ V be defined by p_i(x) = x^i (for i = 0, 1, 2, 3). For each
f ∈ V there exist unique coefficients a_i ∈ R such that

    f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3
         = a_0 p_0(x) + a_1 p_1(x) + a_2 p_2(x) + a_3 p_3(x).

Hence we see that b = (p_0, p_1, p_2, p_3) is a basis of V, and if f(x) = Σ a_i x^i as
above then cv_b(f) = t( a_0 a_1 a_2 a_3 ).
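Since cv_b here simply reads off the coefficients, a tiny sketch (ours, with an illustrative polynomial of our own choosing) makes the correspondence in #3 explicit:

```python
# Basis b = (p0, p1, p2, p3) with p_i(x) = x**i; a coordinate vector
# (a0, a1, a2, a3) rebuilds the polynomial it represents.
basis = [lambda x, i=i: x**i for i in range(4)]

def from_coords(coords):
    """Return the polynomial f with cv_b(f) = coords."""
    return lambda x: sum(a * p(x) for a, p in zip(coords, basis))

f = from_coords([2, -1, 0, 4])   # f(x) = 2 - x + 4x^3
assert f(2) == 2 - 2 + 4 * 8     # = 32
```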
Exercises
1. In each of the following examples the set S has a natural vector space
structure over the field F. In each case decide whether S is finitely
generated, and, if it is, find its dimension.
(i) S = C (complex numbers), F = R.
(ii) S = C, F = C.
(iii) S = R, F = Q (rational numbers).
(iv) S = R[X] (polynomials over R in the variable X), F = R.
(v) S = Mat(n, C) (n × n matrices over C), F = R.
2. Determine whether or not the following two subspaces of R^3 are the
same:

    Span( t(1 2 −1), t(2 4 1) )  and  Span( t(1 2 4), t(2 4 −5) ).
3. Suppose that (v_1, v_2, v_3) is a basis for a vector space V, and define
elements w_1, w_2, w_3 ∈ V by w_1 = v_1 − 2v_2 + 3v_3, w_2 = −v_1 + v_3,
w_3 = v_2 − v_3.
(i) Express v_1, v_2, v_3 in terms of w_1, w_2, w_3.
(ii) Prove that w_1, w_2, w_3 are linearly independent.
(iii) Prove that w_1, w_2, w_3 span V.
4. Let V be a vector space and (v_1, v_2, ..., v_n) a sequence of vectors in V.
Prove that Span(v_1, v_2, ..., v_n) is a subspace of V.
5. Prove Proposition 4.11.
6. Prove Proposition 4.12.
7. Prove Theorem 4.17.
8. Let (v_1, v_2, ..., v_n) be a basis of a vector space V and let

    w_1 = α_{11} v_1 + α_{21} v_2 + ··· + α_{n1} v_n
    w_2 = α_{12} v_1 + α_{22} v_2 + ··· + α_{n2} v_n
      ⋮
    w_n = α_{1n} v_1 + α_{2n} v_2 + ··· + α_{nn} v_n

where the α_{ij} are scalars. Let A be the matrix with (i, j)-entry α_{ij}.
Prove that (w_1, w_2, ..., w_n) is a basis for V if and only if A is invertible.
5
Inner Product Spaces
To regard R^2 and R^3 merely as vector spaces is to ignore two basic concepts
of Euclidean geometry: the length of a line segment and the angle between
two lines. Thus it is desirable to give these vector spaces some additional
structure which will enable us to talk of the length of a vector and the angle
between two vectors. To do so is the aim of this chapter.
§5a The inner product axioms
If v and w are n-tuples over R we define the dot product of v and w to be the
scalar v·w = Σ_{i=1}^n v_i w_i where v_i is the i-th entry of v and w_i the i-th entry
of w. That is, v·w = v(tw) if v and w are rows, v·w = (tv)w if v and w are
columns.
Suppose that a Cartesian coordinate system is chosen for 3-dimensional
Euclidean space in the usual manner. If O is the origin and P and Q are
points with coordinates v = (x_1, y_1, z_1) and w = (x_2, y_2, z_2) respectively, then
by Pythagoras' Theorem the lengths of OP, OQ and PQ are as follows:

    OP = √(x_1^2 + y_1^2 + z_1^2)
    OQ = √(x_2^2 + y_2^2 + z_2^2)
    PQ = √((x_1 − x_2)^2 + (y_1 − y_2)^2 + (z_1 − z_2)^2).
By the cosine rule the cosine of the angle POQ is

    ((x_1^2 + y_1^2 + z_1^2) + (x_2^2 + y_2^2 + z_2^2) − ((x_1 − x_2)^2 + (y_1 − y_2)^2 + (z_1 − z_2)^2))
    ───────────────────────────────────────────────────────────────────────────
              2 √(x_1^2 + y_1^2 + z_1^2) √(x_2^2 + y_2^2 + z_2^2)

      = (x_1 x_2 + y_1 y_2 + z_1 z_2) / (√(x_1^2 + y_1^2 + z_1^2) √(x_2^2 + y_2^2 + z_2^2))

      = (v·w) / (√(v·v) √(w·w)).
100 Chapter Five: Inner Product Spaces
Thus we see that the dot product is closely allied with the concepts of ‘length’
and ‘angle’. Note also the following properties:
(a) The dot product is bilinear. That is,
    (λu + µv)·w = λ(u·w) + µ(v·w)
and
    u·(λv + µw) = λ(u·v) + µ(u·w)
for all n-tuples u, v, w, and all λ, µ ∈ R.
(b) The dot product is symmetric. That is,
    v·w = w·v for all n-tuples v, w.
(c) The dot product is positive definite. That is,
    v·v ≥ 0 for every n-tuple v
and
    v·v = 0 if and only if v = 0.
The proofs of these properties are straightforward and are omitted. It turns
out that all important properties of the dot product are consequences of
these three basic properties. Furthermore, these same properties arise in
other, slightly diﬀerent, contexts. Accordingly, we use them as the basis of a
new axiomatic system.
5.1 Definition  A real inner product space is a vector space over R which
is equipped with a scalar valued product which is bilinear, symmetric and
positive definite. That is, writing ⟨v, w⟩ for the scalar product of v and w,
we must have ⟨v, w⟩ ∈ R and
(i) ⟨λu + µv, w⟩ = λ⟨u, w⟩ + µ⟨v, w⟩,
(ii) ⟨v, w⟩ = ⟨w, v⟩,
(iii) ⟨v, v⟩ > 0 if v ≠ 0,
for all vectors u, v, w and scalars λ, µ.
Comment
5.1.1 Since the inner product is assumed to be symmetric, linearity in
the second variable is a consequence of linearity in the first. It also follows
from bilinearity that ⟨v, w⟩ = 0 if either v or w is zero. To see this, suppose
that w is fixed and define a function f_w from the vector space to R by
f_w(v) = ⟨v, w⟩ for all vectors v. By (i) above we have that f_w is linear, and
hence f_w(0) = 0 (by 3.12).
Apart from R^n with the dot product, the most important example of
an inner product space is the set C[a, b] of all continuous real valued functions
on a closed interval [a, b], with the inner product of functions f and g defined
by the formula

    ⟨f, g⟩ = ∫_a^b f(x)g(x) dx.

We leave it as an exercise to check that this gives a symmetric bilinear scalar
product. It is a theorem of calculus that if f is continuous and ∫_a^b f(x)^2 dx = 0
then f(x) = 0 for all x ∈ [a, b], from which it follows that the scalar product
as defined is positive definite.
Examples
#1 For all u, v ∈ R^3, let ⟨u, v⟩ ∈ R be defined by the formula

              [ 1 1 2 ]
    ⟨u, v⟩ = (tu) [ 1 3 2 ] v.
              [ 2 2 7 ]

Prove that ⟨ , ⟩ is an inner product on R^3.
−− Let A be the 3 × 3 matrix appearing in the definition of ⟨ , ⟩ above.
Let u, v, w ∈ R^3 and λ, µ ∈ R. Then using the distributive property of
matrix multiplication and linearity of the “transpose” map from R^3 to t(R^3),
we have

    ⟨λu + µv, w⟩ = t(λu + µv)Aw
                 = (λ(tu) + µ(tv))Aw
                 = λ(tu)Aw + µ(tv)Aw
                 = λ⟨u, w⟩ + µ⟨v, w⟩,

and it follows that (i) of the definition is satisfied. Part (ii) follows since A
is symmetric: for all v, w ∈ R^3

    ⟨v, w⟩ = (tv)Aw = (tv)(tA)w = t((tw)Av) = t(⟨w, v⟩) = ⟨w, v⟩

(the second of these equalities follows since A = tA, the third since transposing
reverses the order of factors in a product, and the last because ⟨w, v⟩ is
a 1 × 1 matrix, that is, a scalar, and hence symmetric).
Now let v = t( x y z ) ∈ R^3 be arbitrary. We find that

              [ 1 1 2 ] [ x ]
    ⟨v, v⟩ = ( x y z ) [ 1 3 2 ] [ y ]
              [ 2 2 7 ] [ z ]
           = (x^2 + xy + 2xz) + (yx + 3y^2 + 2yz) + (2zx + 2zy + 7z^2).

Applying completion of the square techniques we deduce that

    ⟨v, v⟩ = (x + y + 2z)^2 + 2y^2 + 3z^2,

which is nonnegative, and can only be zero if z, y and x + y + 2z are all
zero. This only occurs if x = y = z = 0. So we have shown that ⟨v, v⟩ > 0
whenever v ≠ 0, which is the third and last requirement in Definition 5.1.
−−<
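The positive definiteness just verified by completing the square can also be checked numerically: a real symmetric matrix defines an inner product in this way exactly when all its eigenvalues are positive. A sketch of that check (our own, using numpy; not part of the notes):

```python
import numpy as np

A = np.array([[1.0, 1.0, 2.0],
              [1.0, 3.0, 2.0],
              [2.0, 2.0, 7.0]])

def ip(u, v):
    return u @ A @ v   # <u, v> = (t u) A v

# Axiom (ii) comes from symmetry of A; positive definiteness is
# equivalent to all eigenvalues of A being positive.
assert np.allclose(A, A.T)
assert np.all(np.linalg.eigvalsh(A) > 0)

# Spot-check positivity on a vector of our choosing:
v = np.array([1.0, -1.0, 0.0])
assert ip(v, v) > 0
```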
#2 Show that if the matrix A in #1 above is replaced by either

    [ 1 2 3 ]      [ 1 1 2 ]
    [ 0 3 1 ]  or  [ 1 3 2 ]
    [ 1 3 7 ]      [ 2 2 1 ]

then the resulting scalar valued product on R^3 does not satisfy the inner
product axioms.
−− In the first case the fact that the matrix is not symmetric means that
the resulting product would not satisfy (ii) of Definition 5.1. For example,
we would have

                              [ 1 2 3 ] [ 0 ]
    ⟨t(1 0 0), t(0 1 0)⟩ = ( 1 0 0 ) [ 0 3 1 ] [ 1 ] = 2
                              [ 1 3 7 ] [ 0 ]

while on the other hand

                              [ 1 2 3 ] [ 1 ]
    ⟨t(0 1 0), t(1 0 0)⟩ = ( 0 1 0 ) [ 0 3 1 ] [ 0 ] = 0.
                              [ 1 3 7 ] [ 0 ]

In the second case Part (iii) of Definition 5.1 would fail. For example, we
would have

                                  [ 1 1 2 ] [ −1 ]
    ⟨t(−1 −1 1), t(−1 −1 1)⟩ = ( −1 −1 1 ) [ 1 3 2 ] [ −1 ] = −1,
                                  [ 2 2 1 ] [  1 ]

contrary to the requirement that ⟨v, v⟩ ≥ 0 for all v. −−<
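Both failures in #2 can be reproduced numerically (a sketch of ours; the matrices and test vectors are exactly those of the example):

```python
import numpy as np

A1 = np.array([[1.0, 2.0, 3.0],
               [0.0, 3.0, 1.0],
               [1.0, 3.0, 7.0]])   # not symmetric: axiom (ii) fails
A2 = np.array([[1.0, 1.0, 2.0],
               [1.0, 3.0, 2.0],
               [2.0, 2.0, 1.0]])   # symmetric but not positive definite

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
assert np.isclose(e1 @ A1 @ e2, 2.0)
assert np.isclose(e2 @ A1 @ e1, 0.0)    # <e1, e2> != <e2, e1>

v = np.array([-1.0, -1.0, 1.0])
assert np.isclose(v @ A2 @ v, -1.0)     # <v, v> < 0: axiom (iii) fails
```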
#3 The trace Tr(A) of an n × n matrix A is defined to be the sum of the
diagonal entries of A; that is, Tr(A) = Σ_{i=1}^n A_{ii}. Show that

    ⟨M, N⟩ = Tr((tM)N)

defines an inner product on the space of all m × n matrices over R.
−− The (i, i)-entry of (tM)N is

    ((tM)N)_{ii} = Σ_{j=1}^m (tM)_{ij} N_{ji} = Σ_{j=1}^m M_{ji} N_{ji}

since the (i, j)-entry of tM is the (j, i)-entry of M. Thus the trace of (tM)N
is Σ_{i=1}^n ((tM)N)_{ii} = Σ_{i=1}^n Σ_{j=1}^m M_{ji} N_{ji}. Thus, if we think of an m × n
matrix as an mn-tuple which is simply written as a rectangular array instead
of as a row or column, then the given formula for ⟨M, N⟩ is just the standard
dot product formula: the sum of the products of each entry of M by the
corresponding entry of N. So the verification that ⟨ , ⟩ as defined is an inner
product involves almost exactly the same calculations as involved in proving
bilinearity, symmetry and positive definiteness of the dot product on R^n.
Thus, let M, N, P be m × n matrices, and let λ, µ ∈ R. Then

    ⟨λM + µN, P⟩ = Σ_{i,j} (λM + µN)_{ji} P_{ji} = Σ_{i,j} (λM_{ji} + µN_{ji}) P_{ji}
                 = Σ_{i,j} (λM_{ji} P_{ji} + µN_{ji} P_{ji}) = λ Σ_{i,j} M_{ji} P_{ji} + µ Σ_{i,j} N_{ji} P_{ji}
                 = λ⟨M, P⟩ + µ⟨N, P⟩,

verifying Property (i) of 5.1. Furthermore,

    ⟨M, N⟩ = Σ_{i,j} M_{ji} N_{ji} = Σ_{i,j} N_{ji} M_{ji} = ⟨N, M⟩,

verifying Property (ii). Finally, ⟨M, M⟩ = Σ_{i,j} (M_{ji})^2 is clearly nonnegative
in all cases, and can only be zero if M_{ji} = 0 for all i and j. That is,
⟨M, M⟩ ≥ 0, with equality only if M = 0. Hence Property (iii) holds as
well. −−<
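The observation that Tr((tM)N) is just the entrywise dot product can be spot-checked numerically. A sketch (ours, with randomly chosen 2 × 3 matrices; not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((2, 3))   # arbitrary 2x3 matrices over R
N = rng.standard_normal((2, 3))

def ip(M, N):
    return np.trace(M.T @ N)      # <M, N> = Tr((t M) N)

# Tr((t M) N) equals the sum of products of corresponding entries ...
assert np.isclose(ip(M, N), np.sum(M * N))
# ... so it is symmetric and positive definite, as #3 argues:
assert np.isclose(ip(M, N), ip(N, M))
assert ip(M, M) > 0
```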
It is also possible to define a dot product on C^n. In this case the
definition is

    v·w = (t \overline{v})w = Σ_{i=1}^n \overline{v_i} w_i

where the overline indicates complex conjugation. This apparent complication
is introduced to preserve positive definiteness. If z is an arbitrary
complex number then \overline{z}z is real and nonnegative, and zero only if z is zero.
It follows easily that if v is an arbitrary complex column vector then v·v as
defined above is real and nonnegative, and is zero only if v = 0.
The price to be paid for making the complex dot product positive definite
is that it is no longer bilinear. It is still linear in the second variable,
but semilinear or conjugate linear in the first:

    (λu + µv)·w = \overline{λ}(u·w) + \overline{µ}(v·w)
and
    u·(λv + µw) = λ(u·v) + µ(u·w)

for all u, v, w ∈ C^n and λ, µ ∈ C. Likewise, the complex dot product is not
quite symmetric; instead, it satisfies

    v·w = \overline{w·v} for all v, w ∈ C^n.
Analogy with the real case leads to the following definition.

5.2 Definition  A complex inner product space is a complex vector space
V which is equipped with a scalar product ⟨ , ⟩: V × V → C satisfying
(i) ⟨u, λv + µw⟩ = λ⟨u, v⟩ + µ⟨u, w⟩,
(ii) ⟨v, w⟩ = \overline{⟨w, v⟩},
(iii) ⟨v, v⟩ ∈ R and ⟨v, v⟩ > 0 if v ≠ 0,
for all u, v, w ∈ V and λ, µ ∈ C.
Comments
5.2.1 Note that ⟨v, v⟩ ∈ R is in fact a consequence of ⟨v, w⟩ = \overline{⟨w, v⟩}.
5.2.2 Complex inner product spaces are often called unitary spaces, and
real inner product spaces are often called Euclidean spaces.

If V is any inner product space (real or complex) we define the length
or norm of a vector v ∈ V to be the nonnegative real number ‖v‖ = √⟨v, v⟩.
(This definition is suggested by the fact, noted above, that the distance from
the origin of the point with coordinates v = ( x y z ) is √(v·v).) It is an easy
exercise to prove that if v ∈ V and λ is a scalar then ‖λv‖ = |λ| ‖v‖. (Recall
that the absolute value of a complex number λ is given by |λ| = √(λ\overline{λ}).) In
particular, if λ = 1/‖v‖ then ‖λv‖ = 1.
Having defined the length of a vector it is natural now to say that the
distance between two vectors x and y is the length of x − y; thus we define

    d(x, y) = ‖x − y‖.

It is customary in mathematics to reserve the term ‘distance’ for a function
d satisfying the following properties:
(i) d(x, y) = d(y, x),
(ii) d(x, y) ≥ 0, and d(x, y) = 0 only if x = y,
(iii) d(x, z) ≤ d(x, y) + d(y, z) (the triangle inequality),
for all x, y and z. It is easily seen that our definition meets the first two of
these requirements; the proof that it also meets the third is deferred for a
while.
Analogy with the dot product suggests also that for real inner product
spaces the angle θ between two nonzero vectors v and w should be defined
by the formula

    cos θ = ⟨v, w⟩ / (‖v‖ ‖w‖).

Obviously we want −1 ≤ cos θ ≤ 1; so to justify the definition we must prove
that |⟨v, w⟩| ≤ ‖v‖ ‖w‖ (the Cauchy-Schwarz inequality). We will do this
later.
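For R^n with the dot product, the norm, distance and angle just introduced translate into a few one-line functions. A sketch (our own illustration; the example vectors are ours, and the Cauchy-Schwarz inequality is what guarantees the acos argument lies in [−1, 1]):

```python
import math

def ip(v, w):                        # ordinary dot product on R^n
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    return math.sqrt(ip(v, v))       # ||v|| = sqrt(<v, v>)

def dist(x, y):
    return norm([a - b for a, b in zip(x, y)])   # d(x, y) = ||x - y||

def angle(v, w):
    return math.acos(ip(v, w) / (norm(v) * norm(w)))

# e.g. the angle between (1, 0) and (1, 1) is pi/4:
assert math.isclose(angle([1, 0], [1, 1]), math.pi / 4)
assert math.isclose(dist([1, 0], [1, 1]), 1.0)
```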
5.3 Definition  Let V be an inner product space. Vectors v, w ∈ V are
said to be orthogonal if ⟨v, w⟩ = 0. A set X of vectors is said to be
orthonormal if ⟨v, v⟩ = 1 for all v in X and ⟨v, w⟩ = 0 for all v, w ∈ X such
that v ≠ w.
Comments
5.3.1 In view of our intended definition of the angle between two vectors,
this definition will say that two nonzero vectors in a real inner product space
are orthogonal if and only if they are at right angles to each other.
5.3.2 Observe that orthogonality is a symmetric relation, since if ⟨v, w⟩
is zero then ⟨w, v⟩ = \overline{⟨v, w⟩} is too. Observe also that if v is orthogonal
to w then all scalar multiples of v are orthogonal to w. In particular, if
v_1, v_2, ..., v_n are nonzero vectors satisfying ⟨v_i, v_j⟩ = 0 whenever i ≠ j, then
an orthonormal set of vectors can be obtained by replacing v_i by ‖v_i‖^{−1} v_i for
each i.
§5b Orthogonal projection
Note that the standard bases of R^n and C^n are orthonormal bases. As we
shall see below, such bases are of particular importance. It is easily shown
that orthonormal vectors are always linearly independent.
5.4 Proposition  Let v_1, v_2, ..., v_n be nonzero elements of V (an inner
product space) and suppose that ⟨v_i, v_j⟩ = 0 whenever i ≠ j. Then the v_i
are linearly independent.

Proof. Suppose that Σ_{i=1}^n λ_i v_i = 0, where the λ_i are scalars. By 5.1.1
above we see that for all j,

    0 = ⟨v_j, Σ_{i=1}^n λ_i v_i⟩
      = Σ_{i=1}^n λ_i ⟨v_j, v_i⟩    (by linearity in the second variable)
      = λ_j ⟨v_j, v_j⟩             (since the other terms are zero)

and since ⟨v_j, v_j⟩ ≠ 0 it follows that λ_j = 0.

Under the hypotheses of 5.4 the v_i form a basis of the subspace they
span. Such a basis is called an orthogonal basis of the subspace.
Example
#4 The space C[−1, 1] has a subspace of dimension 3 consisting of polynomial
functions on [−1, 1] of degree at most two. It can be checked that f_0, f_1
and f_2 defined by f_0(x) = 1, f_1(x) = x and f_2(x) = 3x^2 − 1 form an orthogonal
basis of this subspace. Indeed, since f_0(x)f_1(x) and f_1(x)f_2(x) are both
odd functions it is immediate that ∫_{−1}^{1} f_0(x)f_1(x) dx and ∫_{−1}^{1} f_1(x)f_2(x) dx
are both zero, while

    ∫_{−1}^{1} f_0(x)f_2(x) dx = ∫_{−1}^{1} (3x^2 − 1) dx = [x^3 − x]_{−1}^{1} = 0.
5.5 Lemma  Let (u_1, u_2, ..., u_n) be an orthogonal basis of a subspace U
of V, and let v ∈ V. There is a unique element u ∈ U such that ⟨x, u⟩ = ⟨x, v⟩
for all x ∈ U, and it is given by the formula

    u = Σ_{i=1}^n (⟨u_i, v⟩ / ⟨u_i, u_i⟩) u_i.
Proof. The elements of a basis must be nonzero; so we have that ⟨u_i, u_i⟩ ≠ 0
for all i. Write λ_i = ⟨u_i, v⟩ / ⟨u_i, u_i⟩ and u = Σ_{i=1}^n λ_i u_i. Then for each j from
1 to n we have

    ⟨u_j, u⟩ = ⟨u_j, Σ_{i=1}^n λ_i u_i⟩
             = Σ_{i=1}^n λ_i ⟨u_j, u_i⟩
             = λ_j ⟨u_j, u_j⟩    (since the other terms are zero)
             = ⟨u_j, v⟩.

If x ∈ U is arbitrary then x = Σ_{j=1}^n µ_j u_j for some scalars µ_j, and we have

    ⟨x, u⟩ = Σ_j µ_j ⟨u_j, u⟩ = Σ_j µ_j ⟨u_j, v⟩ = ⟨x, v⟩

showing that u has the required property. If u′ ∈ U also has this property
then for all x ∈ U,

    ⟨x, u − u′⟩ = ⟨x, u⟩ − ⟨x, u′⟩ = ⟨x, v⟩ − ⟨x, v⟩ = 0.

Since u − u′ ∈ U this gives, in particular, that ⟨u − u′, u − u′⟩ = 0, and hence
u − u′ = 0. So u is uniquely determined.
Our first consequence of 5.5 is the Gram-Schmidt orthogonalization
process, by which an arbitrary finite dimensional subspace of an inner product
space is shown to have an orthogonal basis.
5.6 Theorem  Let (v_1, v_2, v_3, ...) be a sequence (finite or infinite) of vectors
in an inner product space, such that b_r = (v_1, v_2, ..., v_r) is linearly
independent for all r. Let V_r be the subspace spanned by b_r. Then there exist
vectors u_1, u_2, u_3, ... in V such that c_r = (u_1, u_2, ..., u_r) is an orthogonal
basis of V_r for each r.
Proof. This is proved by induction. The case r = 1 is easily settled by
defining u_1 = v_1; the statement that c_1 is orthogonal is vacuously true.
Assume now that r > 1 and that u_1 to u_{r−1} have been found with the
required properties. By 5.5 there exists u ∈ V_{r−1} such that ⟨u_i, u⟩ = ⟨u_i, v_r⟩
for all i from 1 to r − 1. We define u_r = v_r − u, noting that this is nonzero
since, by linear independence of b_r, the vector v_r is not in the subspace V_{r−1}.
To complete the proof it remains to check that ⟨u_i, u_j⟩ = 0 for all i, j ≤ r
with i ≠ j. If i, j ≤ r − 1 this is immediate from our inductive hypothesis,
and the remaining case is clear too, since for all i < r,

    ⟨u_i, u_r⟩ = ⟨u_i, v_r − u⟩ = ⟨u_i, v_r⟩ − ⟨u_i, u⟩ = 0

by the definition of u.
In view of this we see from 5.5 that if U is any finite dimensional
subspace of an inner product space V then there is a unique mapping P: V → U
such that ⟨u, P(v)⟩ = ⟨u, v⟩ for all v ∈ V and u ∈ U. Furthermore, if
(u_1, u_2, ..., u_n) is any orthogonal basis of U then we have the formula

    5.6.1    P(v) = Σ_{i=1}^n (⟨u_i, v⟩ / ⟨u_i, u_i⟩) u_i.
5.7 Definition The transformation P: V → U deﬁned above is called the
orthogonal projection of V onto U.
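Formula 5.6.1 gives P(v) directly once an orthogonal basis of U is known. A sketch (our own, in numpy, with an orthogonal basis of a plane in R^3; not part of the notes):

```python
import numpy as np

def project(v, basis):
    """Orthogonal projection of v onto Span(basis), where `basis` is an
    orthogonal (not necessarily orthonormal) list of vectors: formula 5.6.1."""
    return sum((u @ v) / (u @ u) * u for u in basis)

# U = Span(u1, u2), the xy-plane in R^3, with u1, u2 orthogonal:
u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, -1.0, 0.0])
v = np.array([3.0, 5.0, 7.0])

p = project(v, [u1, u2])
assert np.allclose(p, [3, 5, 0])      # projection onto the xy-plane
assert np.isclose((v - p) @ u1, 0)    # v - P(v) is orthogonal to U
assert np.isclose((v - p) @ u2, 0)
```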
Comments
5.7.1 It follows easily from the formula 5.6.1 that orthogonal projections
are linear transformations; this is left as an exercise for the reader.
5.7.2 The Gram-Schmidt process, given in the proof of Theorem 5.6
above, is an algorithm for which the input is a linearly independent sequence
of vectors v_1, v_2, v_3, ... and the output an orthogonal sequence of vectors
u_1, u_2, u_3, ... which are linear combinations of the v_i. The proof gives the
following formulae for the u_i:
$$\begin{aligned}
u_1 &= v_1\\
u_2 &= v_2 - \frac{\langle u_1, v_2\rangle}{\langle u_1, u_1\rangle}\,u_1\\
u_3 &= v_3 - \frac{\langle u_1, v_3\rangle}{\langle u_1, u_1\rangle}\,u_1 - \frac{\langle u_2, v_3\rangle}{\langle u_2, u_2\rangle}\,u_2\\
&\;\;\vdots\\
u_r &= v_r - \frac{\langle u_1, v_r\rangle}{\langle u_1, u_1\rangle}\,u_1 - \cdots - \frac{\langle u_{r-1}, v_r\rangle}{\langle u_{r-1}, u_{r-1}\rangle}\,u_{r-1}.
\end{aligned}$$
In practice it is not necessary to remember the formulae for the coefficients of the $u_i$ on the right hand side. Instead, if one remembers merely that
$$u_r = v_r + \lambda_1 u_1 + \lambda_2 u_2 + \cdots + \lambda_{r-1} u_{r-1}$$
then it is easy to find $\lambda_i$ for each $i < r$ by taking inner products with $u_i$. Indeed, since $\langle u_i, u_j\rangle = 0$ for $i \ne j$, we get
$$0 = \langle u_i, u_r\rangle = \langle u_i, v_r\rangle + \lambda_i \langle u_i, u_i\rangle$$
since the terms $\lambda_j\langle u_i, u_j\rangle$ are zero for $i \ne j$, and this yields the stated formula for $\lambda_i$.
Examples

#5 Let $V = \mathbb{R}^4$ considered as an inner product space via the dot product, and let $U$ be the subspace of $V$ spanned by the vectors
$$v_1 = \begin{pmatrix}1\\1\\1\\1\end{pmatrix},\qquad v_2 = \begin{pmatrix}2\\3\\2\\-4\end{pmatrix}\qquad\text{and}\qquad v_3 = \begin{pmatrix}-1\\5\\-2\\-1\end{pmatrix}.$$
Use the Gram-Schmidt process to find an orthonormal basis of $U$.
−− Using the formulae given in 5.7.2 above, if we define
$$u_1 = v_1 = \begin{pmatrix}1\\1\\1\\1\end{pmatrix},\qquad
u_2 = v_2 - \frac{u_1 \cdot v_2}{u_1 \cdot u_1}\,u_1 = \begin{pmatrix}2\\3\\2\\-4\end{pmatrix} - \frac{3}{4}\begin{pmatrix}1\\1\\1\\1\end{pmatrix} = \begin{pmatrix}5/4\\9/4\\5/4\\-19/4\end{pmatrix}$$
and
$$u_3 = v_3 - \frac{u_1 \cdot v_3}{u_1 \cdot u_1}\,u_1 - \frac{u_2 \cdot v_3}{u_2 \cdot u_2}\,u_2
= \begin{pmatrix}-1\\5\\-2\\-1\end{pmatrix} - \frac{1}{4}\begin{pmatrix}1\\1\\1\\1\end{pmatrix} - \frac{49/4}{492/16}\begin{pmatrix}5/4\\9/4\\5/4\\-19/4\end{pmatrix}
= \begin{pmatrix}-1\\5\\-2\\-1\end{pmatrix} - \frac{123}{492}\begin{pmatrix}1\\1\\1\\1\end{pmatrix} - \frac{49}{492}\begin{pmatrix}5\\9\\5\\-19\end{pmatrix}
= \begin{pmatrix}-860/492\\1896/492\\-1352/492\\316/492\end{pmatrix}$$
then $u_1$, $u_2$ and $u_3$ are mutually orthogonal and span $U$. It remains to "normalize" the basis vectors; that is, replace each $u_i$ by a vector of length 1 which is a scalar multiple of $u_i$. This is achieved by replacing each $u_i$ by $\dfrac{1}{\sqrt{u_i \cdot u_i}}\,u_i$. We obtain
$$u_1 = \begin{pmatrix}1/2\\1/2\\1/2\\1/2\end{pmatrix},\qquad
u_2 = \begin{pmatrix}5/\sqrt{492}\\9/\sqrt{492}\\5/\sqrt{492}\\-19/\sqrt{492}\end{pmatrix},\qquad
u_3 = \begin{pmatrix}-215/\sqrt{391386}\\474/\sqrt{391386}\\-338/\sqrt{391386}\\79/\sqrt{391386}\end{pmatrix}$$
as a suitable orthonormal basis for the given subspace. −−<
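The computation in #5 can be checked by machine. The following Python sketch (the function names `dot` and `gram_schmidt` are mine, not the text's) implements the recurrence of 5.7.2 for the dot product on $\mathbb{R}^n$:

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Return an orthogonal sequence spanning the same subspace, via
    u_r = v_r - sum_i (<u_i, v_r>/<u_i, u_i>) u_i  (the formulae of 5.7.2)."""
    us = []
    for v in vectors:
        u = list(v)
        for w in us:
            coeff = dot(w, v) / dot(w, w)
            u = [ui - coeff * wi for ui, wi in zip(u, w)]
        us.append(u)
    return us

# The vectors of Example #5:
u1, u2, u3 = gram_schmidt([[1, 1, 1, 1], [2, 3, 2, -4], [-1, 5, -2, -1]])
print(u2)  # [1.25, 2.25, 1.25, -4.75], i.e. (5/4, 9/4, 5/4, -19/4)
```

Normalizing each output vector by its length then gives the orthonormal basis found above.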
#6 With $U$ and $V$ as in #5 above, let $P\colon V \to U$ be the orthogonal projection, and let $v = \begin{pmatrix}-20\\8\\19\\-1\end{pmatrix}$. Calculate $P(v)$.
−− Using the formula 5.6.1 and the orthonormal basis $(u_1, u_2, u_3)$ found in #5 gives
$$P\begin{pmatrix}-20\\8\\19\\-1\end{pmatrix} = (u_1 \cdot v)u_1 + (u_2 \cdot v)u_2 + (u_3 \cdot v)u_3
= 3\begin{pmatrix}1/2\\1/2\\1/2\\1/2\end{pmatrix} + \frac{86}{\sqrt{492}}\begin{pmatrix}5/\sqrt{492}\\9/\sqrt{492}\\5/\sqrt{492}\\-19/\sqrt{492}\end{pmatrix} + \frac{1591}{\sqrt{391386}}\begin{pmatrix}-215/\sqrt{391386}\\474/\sqrt{391386}\\-338/\sqrt{391386}\\79/\sqrt{391386}\end{pmatrix}$$
$$= \frac{3}{2}\begin{pmatrix}1\\1\\1\\1\end{pmatrix} + \frac{43}{246}\begin{pmatrix}5\\9\\5\\-19\end{pmatrix} + \frac{1}{246}\begin{pmatrix}-215\\474\\-338\\79\end{pmatrix} = \begin{pmatrix}3/2\\5\\1\\-3/2\end{pmatrix}$$
−−<
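Formula 5.6.1 only requires an orthogonal (not necessarily orthonormal) basis, which lets a computer check of #6 avoid square roots entirely. A Python sketch (helper names mine):

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(orthogonal_basis, v):
    """Orthogonal projection onto the span of the basis, by formula 5.6.1:
    P(v) = sum_i (<u_i, v>/<u_i, u_i>) u_i.  The basis need only be orthogonal."""
    p = [0.0] * len(v)
    for u in orthogonal_basis:
        c = dot(u, v) / dot(u, u)
        p = [pi + c * ui for pi, ui in zip(p, u)]
    return p

# Unnormalized orthogonal basis of U from #5:
u1 = [1, 1, 1, 1]
u2 = [5/4, 9/4, 5/4, -19/4]
u3 = [-215/123, 474/123, -338/123, 79/123]
print(project([u1, u2, u3], [-20, 8, 19, -1]))
# approximately [1.5, 5.0, 1.0, -1.5], agreeing with the answer (3/2, 5, 1, -3/2)
```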
Suppose that $V = \mathbb{R}^3$, with the dot product, and let $U$ be a subspace of $V$ of dimension 2. Geometrically, $U$ is a plane through the origin. The geometrical process corresponding to the orthogonal projection is "dropping
a perpendicular" to $U$ from a given point $v$. Then $P(v)$ is the point of $U$ which is the foot of the perpendicular. It is geometrically clear that $u = P(v)$ is the unique $u \in U$ such that $v - u$ is perpendicular to everything in $U$; this corresponds to the algebraic statement that $\langle x, v - P(v)\rangle = 0$ for all $x \in U$. It is also geometrically clear that $P(v)$ is the point of $U$ which is closest to $v$. We ought to prove this algebraically.
5.8 Theorem Let $U$ be a finite dimensional subspace of an inner product space $V$, and let $P$ be the orthogonal projection of $V$ onto $U$. If $v$ is any element of $V$ then $\|v - u\| \ge \|v - P(v)\|$ for all $u \in U$, with equality only if $u = P(v)$.
Proof. Given $u \in U$ we have $v - u = (v - P(v)) + x$ where $x = P(v) - u$ is an element of $U$. Since $v - P(v)$ is orthogonal to all elements of $U$ we see that
$$\begin{aligned}
\|v - u\|^2 &= \langle v - u, v - u\rangle\\
&= \langle v - P(v) + x,\ v - P(v) + x\rangle\\
&= \langle v - P(v), v - P(v)\rangle + \langle v - P(v), x\rangle + \langle x, v - P(v)\rangle + \langle x, x\rangle\\
&= \langle v - P(v), v - P(v)\rangle + \langle x, x\rangle\\
&\ge \langle v - P(v), v - P(v)\rangle
\end{aligned}$$
with equality if and only if $x = 0$.
An extremely important application of this method of finding the point of a subspace which is closest to a given point is the approximate solution of inconsistent systems of equations. The equations $Ax = b$ have a solution if and only if $b$ is contained in the column space of the matrix $A$ (see 3.20.1). If $b$ is not in the column space then it is reasonable to find that point $b_0$ of the column space which is closest to $b$, and solve the equations $Ax = b_0$ instead. Inconsistent systems commonly arise in practice in cases where we can obtain as many (approximate) equations as we like by simply taking more measurements.
Examples
#7 Three measurable variables A, B and C are known to be related by a
formula of the form xA + yB = C, and an experiment is performed to ﬁnd
the values x and y. In four cases measurement yields the results tabulated
below, in which, no doubt, there are some experimental errors. What are the values of $x$ and $y$ which best fit this data?

              A      B      C
1st case:    1.0    2.0    3.8
2nd case:    1.0    3.0    5.1
3rd case:    1.0    4.0    5.9
4th case:    1.0    5.0    6.8
−− We have a system of four equations in two unknowns:
$$\begin{pmatrix}A_1 & B_1\\ A_2 & B_2\\ A_3 & B_3\\ A_4 & B_4\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = \begin{pmatrix}C_1\\ C_2\\ C_3\\ C_4\end{pmatrix}$$
which is bound to be inconsistent. Viewing these equations as $x\tilde{A} + y\tilde{B} = \tilde{C}$ (where $\tilde{A} \in \mathbb{R}^4$ has $i$th entry $A_i$, and so on) we see that a reasonable choice for $x$ and $y$ comes from the point in $U = \mathrm{Span}(\tilde{A}, \tilde{B})$ which is closest to $\tilde{C}$; that is, the point $P(\tilde{C})$ where $P$ is the projection of $\mathbb{R}^4$ onto the subspace $U$. To compute the projection we first need to find an orthogonal basis for $U$, which we do by means of the Gram-Schmidt process applied to $\tilde{A}$ and $\tilde{B}$. This gives the orthogonal basis $(\tilde{A}', \tilde{B}')$ where $\tilde{A}' = \tilde{A}$ and
$$\tilde{B}' = \tilde{B} - \big(\langle \tilde{A}, \tilde{B}\rangle/\langle \tilde{A}, \tilde{A}\rangle\big)\tilde{A} = \tilde{B} - (7/2)\tilde{A} = {}^t(-1.5\ \ {-0.5}\ \ 0.5\ \ 1.5).$$
Now $P(\tilde{C})$ is given by the formula
$$P(\tilde{C}) = \big(\langle \tilde{A}', \tilde{C}\rangle/\langle \tilde{A}', \tilde{A}'\rangle\big)\tilde{A}' + \big(\langle \tilde{B}', \tilde{C}\rangle/\langle \tilde{B}', \tilde{B}'\rangle\big)\tilde{B}'$$
and calculation gives the coefficients of $\tilde{A}'$ and $\tilde{B}'$ as 5.4 and 0.98. Expressing this combination of $\tilde{A}'$ and $\tilde{B}'$ back in terms of $\tilde{A}$ and $\tilde{B}$ gives the coefficient of $\tilde{B}$ as $y = 0.98$ and the coefficient of $\tilde{A}$ as $x = 5.4 - 3.5 \times 0.98 = 1.97$. (If these are the correct values of $x$ and $y$ then the values of $C$ for the given values of $A$ and $B$ should be, in the four cases, 3.93, 4.91, 5.89 and 6.87.)
−−<
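The arithmetic of #7 is easy to mechanize; a Python sketch (variable names mine) that orthogonalizes B against A and then reads off the coefficients exactly as in the solution above:

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

A = [1.0, 1.0, 1.0, 1.0]
B = [2.0, 3.0, 4.0, 5.0]
C = [3.8, 5.1, 5.9, 6.8]

# Gram-Schmidt: B' = B - (<A,B>/<A,A>) A is orthogonal to A.
t = dot(A, B) / dot(A, A)                   # 7/2
Bp = [b - t * a for a, b in zip(A, B)]      # (-1.5, -0.5, 0.5, 1.5)

# Coefficients of the projection of C onto Span(A, B'):
alpha = dot(A, C) / dot(A, A)               # coefficient of A': 5.4
y = dot(Bp, C) / dot(Bp, Bp)                # coefficient of B': 0.98
x = alpha - t * y                           # 5.4 - 3.5 * 0.98 = 1.97
print(x, y)  # approximately 1.97 0.98
```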
#8 Find the parabola $y = ax^2 + bx + c$ which most closely fits the graph of $y = e^x$ on the interval $[-1, 1]$, in the sense that $\int_{-1}^{1} (f(x) - e^x)^2\,dx$ is minimized.
−− Let $P$ be the orthogonal projection of $C[a, b]$ onto the subspace of polynomial functions of degree at most 2. We seek to calculate $P(\exp)$, where $\exp$ is the exponential function. Using the orthogonal basis $(f_0, f_1, f_2)$ from #4 above we find that $P(\exp) = \lambda f_0 + \mu f_1 + \nu f_2$ where
$$\lambda = \frac{\int_{-1}^{1} e^x\,dx}{\int_{-1}^{1} 1\,dx},\qquad
\mu = \frac{\int_{-1}^{1} x e^x\,dx}{\int_{-1}^{1} x^2\,dx},\qquad
\nu = \frac{\int_{-1}^{1} (3x^2 - 1)e^x\,dx}{\int_{-1}^{1} (3x^2 - 1)^2\,dx}.$$
That is, the sought after function is $f(x) = \lambda + \mu x + \nu(3x^2 - 1)$ for the above values of $\lambda$, $\mu$ and $\nu$. −−<
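The three integrals above have closed forms, but they can also be approximated numerically; a Python sketch using composite Simpson's rule (the helper `simpson` is mine, not part of the text):

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

lam = simpson(math.exp, -1, 1) / simpson(lambda x: 1.0, -1, 1)
mu  = simpson(lambda x: x * math.exp(x), -1, 1) / simpson(lambda x: x * x, -1, 1)
nu  = (simpson(lambda x: (3*x*x - 1) * math.exp(x), -1, 1)
       / simpson(lambda x: (3*x*x - 1)**2, -1, 1))
print(round(lam, 4), round(mu, 4), round(nu, 4))  # 1.1752 1.1036 0.1788
```

These agree with the exact values $\lambda = \sinh 1$, $\mu = 3/e$ and $\nu = \frac{5}{4}(e - 7/e)$.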
#9 Define functions $c_0, c_1, s_1, c_2, s_2, \dots$ on the interval $[-\pi, \pi]$ by
$$c_n(x) = \cos(nx) \quad (\text{for } n = 0, 1, 2, \dots)$$
$$s_n(x) = \sin(nx) \quad (\text{for } n = 1, 2, 3, \dots).$$
Show that the $c_n$ for $0 \le n \le k$ and the $s_m$ for $1 \le m \le k$ together form an orthogonal basis of a subspace of $C[-\pi, \pi]$, and determine the formula for the element of this subspace closest to a given $f$.
−− Recall the trigonometric formulae
$$\begin{aligned}
2\sin(nx)\cos(mx) &= \sin((n+m)x) - \sin((n-m)x)\\
2\cos(nx)\cos(mx) &= \cos((n+m)x) + \cos((n-m)x)\\
2\sin(nx)\sin(mx) &= \cos((n-m)x) - \cos((n+m)x).
\end{aligned}$$
Since
$$\int_{-\pi}^{\pi} \sin(kx)\,dx = 0 \text{ for all } k \qquad\text{and}\qquad \int_{-\pi}^{\pi} \cos(kx)\,dx = \begin{cases}0 & \text{if } k \ne 0\\ 2\pi & \text{if } k = 0\end{cases}$$
we deduce that
(i) $\langle s_n, c_m\rangle = 0$ for all $n$ and $m$,
(ii) $\langle s_n, s_m\rangle = \langle c_n, c_m\rangle = 0$ if $n \ne m$,
(iii) $\langle s_n, s_n\rangle = \langle c_n, c_n\rangle = \pi$ if $n \ne 0$,
(iv) $\langle c_0, c_0\rangle = 2\pi$.
Hence if $P$ is the orthogonal projection onto the subspace spanned by the $c_n$ and $s_m$ then for an arbitrary $f \in C[-\pi, \pi]$,
$$\big(P(f)\big)(x) = (1/2\pi)a_0 + (1/\pi)\sum_{n=1}^{k}\big(a_n\cos(nx) + b_n\sin(nx)\big)$$
where
$$a_n = \langle c_n, f\rangle = \int_{-\pi}^{\pi} f(x)\cos(nx)\,dx
\qquad\text{and}\qquad
b_n = \langle s_n, f\rangle = \int_{-\pi}^{\pi} f(x)\sin(nx)\,dx.$$
We know from 5.8 that $g = P(f)$ is the element of the subspace for which $\int_{-\pi}^{\pi} (f - g)^2$ is minimized. −−<
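For a concrete $f$ the coefficients $a_n$ and $b_n$ can be computed numerically. A Python sketch for $f(x) = x$ (helper names mine; the exact values $b_n = 2\pi(-1)^{n+1}/n$ and $a_n = 0$ follow by integration by parts, since $f$ is odd):

```python
import math

def simpson(f, a, b, n=2000):
    # composite Simpson's rule on [a, b]
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

f = lambda x: x  # an odd function, so all a_n vanish

def a(n):  # <c_n, f>
    return simpson(lambda x: f(x) * math.cos(n * x), -math.pi, math.pi)

def b(n):  # <s_n, f>
    return simpson(lambda x: f(x) * math.sin(n * x), -math.pi, math.pi)

# Projection: sum over n of (b_n / pi) sin(nx) = sum of 2*(-1)**(n+1)/n * sin(nx)
print(round(b(1) / math.pi, 6))  # about 2.0
```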
We have been happily talking about orthogonal projections onto subspaces and ignoring the fact that the word 'onto' should only be applied to functions that are surjective. Fortunately, it is easy to see that the image of the projection of $V$ onto $U$ is indeed the whole of $U$. In fact, if $P$ is the projection then $P(u) = u$ for all $u \in U$. (A consequence of this is that $P$ is an idempotent transformation: it satisfies $P^2 = P$.) It is natural at this stage to ask about the kernel of $P$.
5.9 Definition If $U$ is a finite dimensional subspace of an inner product space $V$ and $P$ the projection of $V$ onto $U$ then the subspace $U^{\perp} = \ker P$ is called the orthogonal complement of $U$.
Comment

5.9.1 If $\langle x, v\rangle = 0$ for all $x \in U$ then $u = 0$ is clearly an element of $U$ satisfying $\langle x, u\rangle = \langle x, v\rangle$ for all $x \in U$. Hence $P(v) = 0$. Conversely, if $P(v) = 0$ we must have $\langle x, v\rangle = \langle x, P(v)\rangle = 0$ for all $x \in U$. Hence the orthogonal complement of $U$ consists of all $v \in V$ which are orthogonal to all elements of $U$.
Our final task in this section is to prove the triangle and Cauchy-Schwarz inequalities, and some related matters.
5.10 Proposition Assume that $(u_1, u_2, \dots, u_n)$ is an orthogonal basis of a subspace $U$ of $V$.
(i) Let $v \in V$ and for all $i$ let $\alpha_i = \langle u_i, v\rangle$. Then $\|v\|^2 \ge \sum_i |\alpha_i|^2/\|u_i\|^2$.
(ii) If $u \in U$ then $u = \sum_{i=1}^{n} \dfrac{\langle u_i, u\rangle}{\langle u_i, u_i\rangle}\,u_i$.
(iii) If $x, y \in U$ then $\langle x, y\rangle = \sum_i \dfrac{\langle x, u_i\rangle\langle u_i, y\rangle}{\langle u_i, u_i\rangle}$.
Proof. (i) If $P\colon V \to U$ is the orthogonal projection then 5.6.1 above gives $P(v) = \sum_i \lambda_i u_i$, where
$$\lambda_i = \frac{\langle u_i, v\rangle}{\langle u_i, u_i\rangle} = \frac{\alpha_i}{\|u_i\|^2}.$$
Since $\langle u_i, u_j\rangle = 0$ for $i \ne j$ we have
$$\langle P(v), P(v)\rangle = \sum_{i,j} \overline{\lambda_i}\lambda_j \langle u_i, u_j\rangle = \sum_i |\lambda_i|^2 \langle u_i, u_i\rangle = \sum_i \frac{|\alpha_i|^2}{\|u_i\|^2},$$
so that our task is to prove that $\langle v, v\rangle \ge \langle P(v), P(v)\rangle$.

Writing $x = v - P(v)$ we have
$$\begin{aligned}
\langle v, v\rangle &= \langle x + P(v), x + P(v)\rangle\\
&= \langle x, x\rangle + \langle x, P(v)\rangle + \langle P(v), x\rangle + \langle P(v), P(v)\rangle\\
&= \langle x, x\rangle + \langle P(v), P(v)\rangle \quad (\text{since } P(v) \in U \text{ and } x \in U^{\perp})\\
&\ge \langle P(v), P(v)\rangle,
\end{aligned}$$
as required.
(ii) This amounts to the statement that $P(u) = u$ for $u \in U$, and it is immediate from 5.8 above, since the element of $U$ which is closest to $u$ is obviously $u$ itself. A direct proof is also trivial: we may write $u = \sum_i \lambda_i u_i$ for some scalars $\lambda_i$ (since the $u_i$ span $U$), and now for all $j$,
$$\langle u_j, u\rangle = \sum_i \lambda_i \langle u_j, u_i\rangle = \lambda_j \langle u_j, u_j\rangle$$
whence the result.
(iii) This follows easily from (ii) and is left as an exercise.
5.11 Theorem If $V$ is an inner product space (complex or real) then
(i) $|\langle v, w\rangle| \le \|v\|\,\|w\|$, and
(ii) $\|v + w\| \le \|v\| + \|w\|$
for all $v, w \in V$.

Proof. (i) If $v = 0$ then both sides are zero. If $v \ne 0$ then $v$ by itself forms an orthogonal basis for a 1-dimensional subspace of $V$, and by part (i) of 5.10 we have for all $w$,
$$\|w\|^2 \ge \frac{|\langle v, w\rangle|^2}{\|v\|^2}$$
so that the result follows.
(ii) If $z = x + iy$ is any complex number then
$$z + \bar{z} = 2x \le 2\sqrt{x^2 + y^2} = 2|z|.$$
Hence for all $v, w \in V$,
$$\begin{aligned}
(\|v\| + \|w\|)^2 - \|v + w\|^2 &= \big(\langle v, v\rangle + \langle w, w\rangle + 2\|v\|\,\|w\|\big) - \langle v + w, v + w\rangle\\
&= \big(\langle v, v\rangle + \langle w, w\rangle + 2\|v\|\,\|w\|\big) - \big(\langle v, v\rangle + \langle v, w\rangle + \langle w, v\rangle + \langle w, w\rangle\big)\\
&= 2\|v\|\,\|w\| - \big(\langle v, w\rangle + \overline{\langle v, w\rangle}\big)\\
&\ge 2\|v\|\,\|w\| - 2|\langle v, w\rangle|
\end{aligned}$$
which is nonnegative by the first part. Hence $(\|v\| + \|w\|)^2 \ge \|v + w\|^2$, and the result follows.
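Both inequalities of 5.11 are easy to spot-check for the dot product on $\mathbb{R}^4$; a tiny Python sketch with arbitrarily chosen sample vectors (names mine):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

v = [1.0, -2.0, 3.0, 0.5]
w = [4.0, 0.0, -1.0, 2.0]

cauchy_schwarz = abs(dot(v, w)) <= norm(v) * norm(w)                  # part (i)
triangle = norm([a + b for a, b in zip(v, w)]) <= norm(v) + norm(w)   # part (ii)
print(cauchy_schwarz, triangle)  # True True
```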
§5c Orthogonal and unitary transformations
As always in mathematics, we are particularly interested in functions which
preserve the mathematical structure. Hence if V and W are inner product
spaces it is of interest to investigate functions T: V → W which are linear
and preserve the inner product, in the sense that 'T(u), T(v)` = 'u, v` for
all u, v ∈ V . Such a transformation is called an orthogonal transformation
(for real inner product spaces) or a unitary transformation (in the complex
case). It turns out that preservation of length is an equivalent condition;
hence these transformations are sometimes called isometries.
5.12 Proposition If $V$ and $W$ are inner product spaces then a linear transformation $T\colon V \to W$ preserves inner products if and only if it preserves lengths.

Proof. Since by definition the length of $v$ is $\sqrt{\langle v, v\rangle}$ it is trivial that a transformation which preserves inner products preserves lengths. For the converse, we assume that $\|T(v)\| = \|v\|$ for all $v \in V$; we must prove that $\langle T(u), T(v)\rangle = \langle u, v\rangle$ for all $u, v \in V$.
Let $u, v \in V$. Note that (as in the proof of 5.11 above)
$$\|u + v\|^2 - \|u\|^2 - \|v\|^2 = \langle u, v\rangle + \overline{\langle u, v\rangle},$$
and similarly
$$\|T(u) + T(v)\|^2 - \|T(u)\|^2 - \|T(v)\|^2 = \langle T(u), T(v)\rangle + \overline{\langle T(u), T(v)\rangle},$$
so that the real parts of $\langle u, v\rangle$ and $\langle T(u), T(v)\rangle$ are equal. By exactly the same argument the real parts of $\langle iu, v\rangle$ and $\langle T(iu), T(v)\rangle$ are equal too. But writing $\langle u, v\rangle = x + iy$ with $x, y \in \mathbb{R}$ we see that
$$\langle iu, v\rangle = \bar{i}\,\langle u, v\rangle = (-i)\langle u, v\rangle = (-i)(x + iy) = y - ix$$
so that the real part of $\langle iu, v\rangle$ is the imaginary part of $\langle u, v\rangle$. Likewise, the real part of $\langle T(iu), T(v)\rangle = \langle iT(u), T(v)\rangle$ equals the imaginary part of $\langle T(u), T(v)\rangle$, and we conclude, as required, that $\langle u, v\rangle$ and $\langle T(u), T(v)\rangle$ have the same imaginary part as well as the same real part.
We know from §3b#11 that if $T \in \mathrm{Mat}(n \times n, \mathbb{C})$ then the function $\phi\colon \mathbb{C}^n \to \mathbb{C}^n$ defined by $\phi(x) = Tx$ is linear, and, furthermore, it is easily shown (see Exercise 14 of Chapter Three) that every linear transformation from $\mathbb{C}^n$ to $\mathbb{C}^n$ has this form for some matrix $T$. Our next task is to describe those matrices $T$ for which the corresponding linear transformation is an isometry. We need the following two definitions.
5.13 Definition If $A$ is a matrix with complex entries we define the conjugate of $A$ to be the matrix $\bar{A}$ whose $(i, j)$-entry is the conjugate of the $(i, j)$-entry of $A$. The transpose of the conjugate of $A$ will be denoted by '$A^*$'.†

By Exercise 6 of Chapter Two we deduce that $(AB)^* = B^*A^*$ whenever $AB$ is defined.

† Some people call $A^*$ the adjoint of $A$, in conflict with the definition of "adjoint" given in Chapter 1.
5.14 Definition An $n \times n$ complex matrix $T$ is said to be unitary if $T^* = T^{-1}$. If $T$ has real entries this becomes ${}^tT = T^{-1}$, and $T$ is said to be orthogonal.
5.15 Proposition Let $T \in \mathrm{Mat}(n \times n, \mathbb{C})$. The linear transformation $\phi\colon \mathbb{C}^n \to \mathbb{C}^n$ defined by $\phi(x) = Tx$ is an isometry if and only if $T$ is unitary.

Proof. Suppose first that $\phi$ is an isometry, and let $(e_1, e_2, \dots, e_n)$ be the standard basis of $\mathbb{C}^n$. Note that $e_i \cdot e_j = \delta_{ij}$. Since $Te_i$ is the $i$th column of $T$ we see that $(Te_i)^*$ is the $i$th row of $T^*$, and $(Te_i)^*(Te_j)$ is the $(i, j)$-entry of $T^*T$. However,
$$(Te_i)^*(Te_j) = Te_i \cdot Te_j = \phi(e_i) \cdot \phi(e_j) = e_i \cdot e_j = \delta_{ij}$$
since $\phi$ preserves the dot product, whence $T^*T$ is the identity matrix. By 2.9 it follows that $T$ is unitary.

Conversely, if $T$ is unitary then $T^*T = I$, and for all $u, v \in \mathbb{C}^n$ we have
$$\phi(u) \cdot \phi(v) = Tu \cdot Tv = (Tu)^*(Tv) = u^*T^*Tv = u^*v = u \cdot v.$$
Thus $\phi$ preserves the dot product, and is therefore an isometry.
Comment

5.15.1 Since the $(i, j)$-entry of $T^*T$ is the dot product of the $i$th and $j$th columns of $T$, we see that $T$ is unitary if and only if the columns of $T$ form an orthonormal basis of $\mathbb{C}^n$. Furthermore, the $(i, j)$-entry of $TT^*$ is the conjugate of the dot product of the $i$th and $j$th rows of $T$; so it is also true that $T$ is unitary if and only if its rows form an orthonormal basis of ${}^t\mathbb{C}^n$.

5.15.2 Of course, corresponding things are true in the real case, where the complication due to complex conjugation is absent. Premultiplication by a real $n \times n$ matrix is an isometry of $\mathbb{R}^n$ if and only if the matrix is orthogonal; a real $n \times n$ matrix is orthogonal if and only if its columns form an orthonormal basis of $\mathbb{R}^n$; a real $n \times n$ matrix is orthogonal if and only if its rows form an orthonormal basis of ${}^t\mathbb{R}^n$.
5.15.3 It is easy to prove, and we leave it as an exercise, that the product of two unitary matrices is unitary and the inverse of a unitary matrix is unitary. The same is true in the real case, of course, with the word 'orthogonal' replacing the word 'unitary'.
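Comment 5.15.1 gives a practical test for unitarity: compute $T^*T$ and compare it with the identity. A Python sketch for one standard family of examples (the $n$-point discrete-Fourier matrix divided by $\sqrt{n}$, which is unitary; the choice of example is mine):

```python
import cmath

n = 2
# T has (p, q)-entry exp(2*pi*i*p*q/n)/sqrt(n).
T = [[cmath.exp(2j * cmath.pi * p * q / n) / n**0.5 for q in range(n)]
     for p in range(n)]

def star(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

TstarT = matmul(star(T), T)
ok = all(abs(TstarT[i][j] - (1 if i == j else 0)) < 1e-12
         for i in range(n) for j in range(n))
print(ok)  # True: the columns of T are orthonormal, so T is unitary
```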
Examples

#10 Find an orthogonal matrix $P$ and an upper triangular matrix $U$ such that
$$PU = \begin{pmatrix}1 & 4 & 5\\ 2 & 5 & 1\\ 2 & 2 & 1\end{pmatrix}.$$
−− Let the columns of $P$ be $v_1$, $v_2$ and $v_3$, and let $U = \begin{pmatrix}a & b & c\\ 0 & d & e\\ 0 & 0 & f\end{pmatrix}$. Then the columns of $PU$ are $av_1$, $bv_1 + dv_2$ and $cv_1 + ev_2 + fv_3$. Thus we wish to find $a$, $b$, $c$, $d$, $e$ and $f$ such that
$$av_1 = \begin{pmatrix}1\\2\\2\end{pmatrix},\qquad bv_1 + dv_2 = \begin{pmatrix}4\\5\\2\end{pmatrix},\qquad cv_1 + ev_2 + fv_3 = \begin{pmatrix}5\\1\\1\end{pmatrix},$$
and $(v_1, v_2, v_3)$ is an orthonormal basis of $\mathbb{R}^3$. Now since we require $\|v_1\| = 1$, the first equation gives $a = 3$ and $v_1 = {}^t(1/3\ \ 2/3\ \ 2/3)$. Taking the dot product of both sides of the second equation with $v_1$ now gives $b = 6$ (since we require $v_2 \cdot v_1 = 0$), and similarly taking the dot product of both sides of the third equation with $v_1$ gives $c = 3$. Substituting these values into the equations gives
$$dv_2 = \begin{pmatrix}2\\1\\-2\end{pmatrix},\qquad ev_2 + fv_3 = \begin{pmatrix}4\\-1\\-1\end{pmatrix}.$$
The first of these equations gives $d = 3$ and $v_2 = {}^t(2/3\ \ 1/3\ \ {-2/3})$, and then taking the dot product of $v_2$ with the other equation gives $e = 3$. After substituting these values the remaining equation becomes
$$fv_3 = \begin{pmatrix}2\\-2\\1\end{pmatrix},$$
and we see that $f = 3$ and $v_3 = {}^t(2/3\ \ {-2/3}\ \ 1/3)$. Thus the only possible solution to the given problem is
$$P = \begin{pmatrix}1/3 & 2/3 & 2/3\\ 2/3 & 1/3 & -2/3\\ 2/3 & -2/3 & 1/3\end{pmatrix},\qquad U = \begin{pmatrix}3 & 6 & 3\\ 0 & 3 & 3\\ 0 & 0 & 3\end{pmatrix}.$$
It is easily checked that ${}^tPP = I$ and that $PU$ has the correct value. (Note that the method we used to calculate the columns of $P$ was essentially just the Gram-Schmidt process.) −−<
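As the closing remark observes, the method is just the Gram-Schmidt process applied to the columns; mechanized, it is the familiar QR factorization. A Python sketch (function names mine) reproducing the $P$ and $U$ found above:

```python
import math

def qr_columns(cols):
    """Gram-Schmidt on the given columns: returns (q, r) where the q_i are
    orthonormal, r is upper triangular, and column j equals sum_i r[i][j]*q_i."""
    n = len(cols)
    q, r = [], [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        u = list(v)
        for i, w in enumerate(q):
            r[i][j] = sum(a * b for a, b in zip(w, v))
            u = [a - r[i][j] * b for a, b in zip(u, w)]
        r[j][j] = math.sqrt(sum(a * a for a in u))
        q.append([a / r[j][j] for a in u])
    return q, r

cols = [[1, 2, 2], [4, 5, 2], [5, 1, 1]]   # columns of the given matrix
q, r = qr_columns(cols)
print(r)     # the matrix U: approximately [[3, 6, 3], [0, 3, 3], [0, 0, 3]]
print(q[0])  # first column of P: [1/3, 2/3, 2/3]
```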
#11 Let $u_1$, $u_2$ and $u_3$ be real numbers satisfying $u_1^2 + u_2^2 + u_3^2 = 1$, and let $u$ be the $3 \times 1$ column with $i$th entry $u_i$. Define
$$X = \begin{pmatrix}u_1^2 & u_1u_2 & u_1u_3\\ u_2u_1 & u_2^2 & u_2u_3\\ u_3u_1 & u_3u_2 & u_3^2\end{pmatrix},\qquad
Y = \begin{pmatrix}0 & u_3 & -u_2\\ -u_3 & 0 & u_1\\ u_2 & -u_1 & 0\end{pmatrix}.$$
Show that $X^2 = X$ and $Y^2 = X - I$, and $XY = YX = 0$. Furthermore, show that for all $\theta$, the matrix $R(u, \theta) = \cos\theta\, I + (1 - \cos\theta)X + \sin\theta\, Y$ is orthogonal.

−− Observe that $X = u({}^tu)$, and so
$$X^2 = \big(u({}^tu)\big)\big(u({}^tu)\big) = u\big(({}^tu)u\big)({}^tu) = u(u \cdot u)({}^tu) = X$$
since $u \cdot u = 1$. By direct calculation we find that
$$Yu = \begin{pmatrix}0 & u_3 & -u_2\\ -u_3 & 0 & u_1\\ u_2 & -u_1 & 0\end{pmatrix}\begin{pmatrix}u_1\\ u_2\\ u_3\end{pmatrix} = \begin{pmatrix}0\\ 0\\ 0\end{pmatrix},$$
and hence $YX = Yu({}^tu) = 0$. Since ${}^tX = X$ and ${}^tY = -Y$ it follows that
$$XY = -({}^tX)({}^tY) = -{}^t(YX) = -{}^t0 = 0.$$
And an easy calculation yields
$$Y^2 = \begin{pmatrix}-(u_2^2 + u_3^2) & u_1u_2 & u_1u_3\\ u_2u_1 & -(u_1^2 + u_3^2) & u_2u_3\\ u_3u_1 & u_3u_2 & -(u_1^2 + u_2^2)\end{pmatrix}
= \begin{pmatrix}u_1^2 - 1 & u_1u_2 & u_1u_3\\ u_2u_1 & u_2^2 - 1 & u_2u_3\\ u_3u_1 & u_3u_2 & u_3^2 - 1\end{pmatrix} = X - I.$$
To check that $R = R(u, \theta)$ is orthogonal we show that $({}^tR)R = I$. Now
$${}^tR = \cos\theta\, I + (1 - \cos\theta)X - \sin\theta\, Y,$$
since ${}^tX = X$ and ${}^tY = -Y$. So
$$\begin{aligned}
({}^tR)R &= \cos^2\theta\, I + (1 - \cos\theta)^2X^2 - \sin^2\theta\, Y^2 + 2\cos\theta(1 - \cos\theta)X + (1 - \cos\theta)\sin\theta(XY - YX)\\
&= \cos^2\theta\, I + (1 - \cos\theta)^2X + \sin^2\theta(I - X) + 2\cos\theta(1 - \cos\theta)X\\
&= (\cos^2\theta + \sin^2\theta)I + \big((1 - \cos\theta)(1 + \cos\theta) - \sin^2\theta\big)X\\
&= I
\end{aligned}$$
as required.
−−<
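The identities of #11 and the orthogonality of $R(u, \theta)$ can be spot-checked numerically; a Python sketch with one arbitrarily chosen unit vector $u$ (helper names mine):

```python
import math

u = [2/3, 1/3, 2/3]                      # a unit vector: 4/9 + 1/9 + 4/9 = 1
X = [[u[i] * u[j] for j in range(3)] for i in range(3)]
Y = [[0, u[2], -u[1]],
     [-u[2], 0, u[0]],
     [u[1], -u[0], 0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def close(A, B, eps=1e-12):
    return all(abs(A[i][j] - B[i][j]) < eps for i in range(3) for j in range(3))

zero = [[0.0] * 3 for _ in range(3)]
print(close(matmul(X, X), X), close(matmul(X, Y), zero))  # True True

theta = 0.7
c, s = math.cos(theta), math.sin(theta)
R = [[c * (i == j) + (1 - c) * X[i][j] + s * Y[i][j] for j in range(3)]
     for i in range(3)]
Rt = [[R[j][i] for j in range(3)] for i in range(3)]
I3 = [[float(i == j) for j in range(3)] for i in range(3)]
print(close(matmul(Rt, R), I3))  # True: R(u, theta) is orthogonal
```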
It can be shown that the matrix $R(u, \theta)$ defined in #11 above corresponds geometrically to a rotation through the angle $\theta$ about an axis in the direction of the vector $u$.
§5d Quadratic forms
When using Cartesian coordinates to investigate geometrical problems, it invariably simplifies matters if one can choose coordinate axes which are in some sense natural for the problem concerned. It therefore becomes important to be able to keep track of what happens when a new coordinate system is chosen.
If $T \in \mathrm{Mat}(n \times n, \mathbb{R})$ is an orthogonal matrix then, as we have seen, the transformation $\phi$ defined by $\phi(x) = Tx$ preserves lengths and angles. Hence if $S$ is any set of points in $\mathbb{R}^n$ then the set $\phi(S) = \{\, Tx \mid x \in S \,\}$ is congruent to $S$ (in the sense of Euclidean geometry). Clearly such transformations will be important in geometrical situations. Observe now that if
$$Tx = \mu_1 e_1 + \mu_2 e_2 + \cdots + \mu_n e_n$$
where $(e_1, e_2, \dots, e_n)$ is the standard basis of $\mathbb{R}^n$, then
$$x = \mu_1(T^{-1}e_1) + \mu_2(T^{-1}e_2) + \cdots + \mu_n(T^{-1}e_n),$$
whence the coordinates of $Tx$ relative to the standard basis are the same as the coordinates of $x$ relative to the basis $(T^{-1}e_1, T^{-1}e_2, \dots, T^{-1}e_n)$. Note that these vectors are the columns of $T^{-1}$, and they form an orthonormal basis of $\mathbb{R}^n$ since $T^{-1}$ is an orthogonal matrix. So choosing a new orthonormal basis is effectively the same as applying an orthogonal transformation, and doing so leaves all lengths and angles unchanged. We comment that in the three-dimensional case the only length preserving linear transformations are rotations and reflections.
A quadratic form over $\mathbb{R}$ in variables $x_1, x_2, \dots, x_n$ is an expression of the form
$$\sum_i a_{ii}x_i^2 + 2\sum_{i<j} a_{ij}x_ix_j$$
where the coefficients are real numbers. If $A$ is the matrix with diagonal entries $A_{ii} = a_{ii}$ and off-diagonal entries $A_{ij} = A_{ji} = a_{ij}$ (for $i < j$) then the quadratic form can be written as $Q(x) = {}^tx\,Ax$, where $x$ is the column with $x_i$ as its $i$th entry. For example, multiplying out
$$\begin{pmatrix}x & y & z\end{pmatrix}\begin{pmatrix}a & p & q\\ p & b & r\\ q & r & c\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix}$$
gives $ax^2 + by^2 + cz^2 + 2pxy + 2qxz + 2ryz$.
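The identity just displayed can be confirmed at sample values; a minimal Python sketch (names and sample coefficients mine):

```python
def quad_form(A, x):
    """Q(x) = txAx for a square matrix A and column x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

a, b, c, p, q, r = 1.0, 2.0, 3.0, 0.5, -1.0, 4.0
A = [[a, p, q],
     [p, b, r],
     [q, r, c]]
x, y, z = 2.0, -1.0, 3.0
lhs = quad_form(A, [x, y, z])
rhs = a*x*x + b*y*y + c*z*z + 2*p*x*y + 2*q*x*z + 2*r*y*z
print(lhs == rhs)  # True
```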
In the above situation we call $A$ the matrix of the quadratic form $Q$. Since $A_{ij} = A_{ji}$ for all $i$ and $j$ we have that $A$ is a symmetric matrix, in the sense of the following definition.
5.16 Definition A real $n \times n$ matrix $A$ is said to be symmetric if ${}^tA = A$, and a complex $n \times n$ matrix $A$ is said to be Hermitian if $A^* = A$.
The following remarkable facts concerning eigenvalues and eigenvectors of Hermitian matrices are very important, and not hard to prove.

5.17 Theorem Let $A$ be an Hermitian matrix. Then
(i) all eigenvalues of $A$ are real, and
(ii) if $\lambda$ and $\mu$ are two distinct eigenvalues of $A$, and if $u$ and $v$ are corresponding eigenvectors, then $u \cdot v = 0$.

The proof is left as an exercise (although a hint is given). Of course the theorem applies to Hermitian matrices which happen to be real; that is, the same results hold for real symmetric matrices.
If $f$ is an arbitrary smooth real valued function on $\mathbb{R}^n$, and if we choose a critical point of $f$ as the origin of our coordinate system, then the terms of degree one in the Taylor series for $f$ vanish. Ignoring the terms of degree three and higher then gives an approximation to $f$ of the form $c + Q(x)$, where $c = f(0)$ is a constant and $Q$ is a quadratic form, the coefficients of the corresponding symmetric matrix being the second order partial derivatives of $f$ at the critical point. We would now like to be able to rotate the axes so as to simplify the expression for $Q(x)$ as much as possible, and hence determine the behaviour of $f$ near the critical point.
From our discussion above we know that introducing a new coordinate system corresponding to an orthonormal basis of $\mathbb{R}^n$ amounts to introducing new variables $x'$ which are related to the old variables $x$ by $x = Tx'$, where $T$ is the orthogonal matrix whose columns are the new basis vectors.† In terms of the new variables the quadratic form $Q(x) = {}^tx\,Ax$ becomes
$$Q'(x') = {}^t(Tx')A(Tx') = {}^tx'({}^tTAT)x' = {}^tx'A'x'$$
where $A' = {}^tTAT$.

† The $T$ in this paragraph corresponds to the $T^{-1}$ above.
5.18 Definition Two $n \times n$ real matrices $A$ and $A'$ are said to be orthogonally similar if there exists an orthogonal matrix $T$ such that $A' = {}^tTAT$. Two $n \times n$ complex matrices $A$ and $A'$ are said to be unitarily similar if there exists a unitary matrix $T$ such that $A' = T^*AT$.

Comment

5.18.1 In both parts of the above definition we could alternatively have written $A' = T^{-1}AT$, since $T^{-1} = {}^tT$ in the real case and $T^{-1} = T^*$ in the complex case.
We have shown that the change of coordinates corresponding to choosing a new orthonormal basis for $\mathbb{R}^n$ induces an orthogonal similarity transformation on the matrix of a quadratic form. The main theorem of this section says that it is always possible to diagonalize a symmetric matrix by an orthogonal similarity transformation. Unfortunately, we have to make use of the following fact, whose proof is beyond the scope of this book:

5.18.2 Every square complex matrix has at least one complex eigenvalue.

(In fact 5.18.2 is a trivial consequence of the "Fundamental Theorem of Algebra", which asserts that the field $\mathbb{C}$ is algebraically closed. See also §9c.)
5.19 Theorem (i) If $A$ is an $n \times n$ real symmetric matrix then there exists an orthogonal matrix $T$ such that ${}^tTAT$ is a diagonal matrix.
(ii) If $A$ is an $n \times n$ Hermitian matrix then there exists a unitary matrix $U$ such that $U^*AU$ is a real diagonal matrix.

Proof. We prove only part (i) since the proof of part (ii) is virtually identical. The proof is by induction on $n$. The case $n = 1$ is vacuously true, since every $1 \times 1$ matrix is diagonal.

Assume then that all $k \times k$ symmetric matrices over $\mathbb{R}$ are orthogonally similar to diagonal matrices, and let $A$ be a $(k+1) \times (k+1)$ real symmetric matrix. By 5.18.2 we know that $A$ has at least one complex eigenvalue $\lambda$, and by 5.17 we know that $\lambda \in \mathbb{R}$. Let $v \in \mathbb{R}^n$ be an eigenvector corresponding to the eigenvalue $\lambda$.
Since $v$ constitutes a basis for a one-dimensional subspace of $\mathbb{R}^n$, we can apply the Gram-Schmidt process to find an orthogonal basis $(v_1, v_2, \dots, v_n)$ of $\mathbb{R}^n$ with $v_1 = v$. Replacing each $v_i$ by $\|v_i\|^{-1}v_i$ makes this into an orthonormal basis, with $v_1$ still a $\lambda$-eigenvector for $A$. Now define $P$ to be the
(real) matrix which has $v_i$ as its $i$th column (for each $i$). By the comments in 5.15.1 above we see that $P$ is orthogonal.

The first column of $P^{-1}AP$ is $P^{-1}Av$, since $v$ is the first column of $P$, and since $v$ is a $\lambda$-eigenvector of $A$ this simplifies to $P^{-1}(\lambda v) = \lambda P^{-1}v$. But the same reasoning shows that $P^{-1}v$ is the first column of $P^{-1}P = I$, and so we deduce that the first entry of the first column of $P^{-1}AP$ is $\lambda$, and all the other entries in the first column are zero. That is,
$$P^{-1}AP = \begin{pmatrix}\lambda & z\\ 0 & B\end{pmatrix}$$
for some $k$-component row $z$ and some $k \times k$ matrix $B$, with $0$ being the $k$-component zero column.
Since $P$ is orthogonal and $A$ is symmetric we have that
$$P^{-1}AP = {}^tP\,{}^tA\,P = {}^t({}^tPAP) = {}^t(P^{-1}AP),$$
so that $P^{-1}AP$ is symmetric. It follows that $z = 0$ and $B$ is symmetric: we have
$$P^{-1}AP = \begin{pmatrix}\lambda & {}^t0\\ 0 & B\end{pmatrix}.$$
By the inductive hypothesis there exists a real orthogonal matrix $Q$ such that $Q^{-1}BQ$ is diagonal. By multiplication of partitioned matrices we find that
$$\begin{pmatrix}1 & {}^t0\\ 0 & Q^{-1}\end{pmatrix}\begin{pmatrix}\lambda & {}^t0\\ 0 & B\end{pmatrix}\begin{pmatrix}1 & {}^t0\\ 0 & Q\end{pmatrix}
= \begin{pmatrix}1 & {}^t0\\ 0 & Q^{-1}\end{pmatrix}\begin{pmatrix}\lambda & {}^t0\\ 0 & BQ\end{pmatrix}
= \begin{pmatrix}\lambda & {}^t0\\ 0 & Q^{-1}BQ\end{pmatrix}$$
which is diagonal since $Q^{-1}BQ$ is. Thus $(PQ')^{-1}A(PQ')$ is diagonal, where
$$Q' = \begin{pmatrix}1 & {}^t0\\ 0 & Q\end{pmatrix}.$$
It remains to prove that $PQ'$ is orthogonal. It is clear by multiplication of block matrices and the fact that $({}^tQ)Q = I$ that $({}^tQ')Q' = I$. Hence $Q'$ is orthogonal, and since the product of two orthogonal matrices is orthogonal (see 5.15.3) it follows that $PQ'$ is orthogonal too.
Comments

5.19.1 Given an $n \times n$ symmetric (or Hermitian) matrix the problem of finding an orthogonal (or unitary) matrix which diagonalizes it amounts to finding an orthonormal basis of $\mathbb{R}^n$ (or $\mathbb{C}^n$) consisting of eigenvectors of the matrix. To do this, proceed in the same way as for any matrix: find the characteristic equation and solve it. If there are $n$ distinct eigenvalues then the eigenvectors are uniquely determined up to a scalar factor, and the theory above guarantees that they will be at right angles to each other. So if the eigenvectors you find for two different eigenvalues do not have the property that their dot product is zero, it means that you have made a mistake. If the characteristic equation has a repeated root $\lambda$ then you will have to find more than one eigenvector corresponding to $\lambda$. In this case, when you solve the linear equations $(A - \lambda I)x = 0$ you will find that the number of arbitrary parameters in the general solution is equal to the multiplicity of $\lambda$ as a root of the characteristic polynomial. You must find an orthonormal basis for this solution space (by using the Gram-Schmidt process, for instance).
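For a $2 \times 2$ real symmetric matrix the recipe just described can be carried out in closed form; a Python sketch (the example matrix is my own choice) which finds the eigenvalues from the characteristic equation and checks that ${}^tTAT$ is diagonal:

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]

# Characteristic equation: lam^2 - (trace)lam + det = 0.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
d = math.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr - d) / 2, (tr + d) / 2      # eigenvalues 1.0 and 3.0 here

def unit_eigvec(lam):
    # Solve (A - lam I)v = 0 using the first row (valid when A[0][1] != 0).
    v = [A[0][1], lam - A[0][0]]
    n = math.hypot(*v)
    return [v[0] / n, v[1] / n]

t1, t2 = unit_eigvec(lam1), unit_eigvec(lam2)
# Eigenvectors for distinct eigenvalues are orthogonal (Theorem 5.17):
print(abs(t1[0] * t2[0] + t1[1] * t2[1]) < 1e-12)  # True

# tTAT, with T having columns t1, t2, should be diag(lam1, lam2).
T = [[t1[0], t2[0]], [t1[1], t2[1]]]
D = [[sum(T[k][i] * A[k][l] * T[l][j] for k in range(2) for l in range(2))
      for j in range(2)] for i in range(2)]
print(abs(D[0][1]) < 1e-12 and abs(D[0][0] - lam1) < 1e-12)  # True
```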
5.19.2 Let $A$ be a Hermitian matrix and let $U$ be a unitary matrix which diagonalizes $A$. It is clear from the above discussion that the diagonal entries of $U^{-1}AU$ are exactly the eigenvalues of $A$. Note also that $\det U^{-1}AU = \det U^{-1}\det A\det U = \det A$, and since the determinant of the diagonal matrix $U^{-1}AU$ is just the product of the diagonal entries we conclude that the determinant of $A$ is just the product of its eigenvalues. (This can also be proved by observing that the determinant and the product of the eigenvalues are both equal to the constant term of the characteristic polynomial.)
The quadratic form $Q(x) = {}^tx\,Ax$ and the symmetric matrix $A$ are both said to be positive definite if $Q(x) > 0$ for all nonzero $x \in \mathbb{R}^n$. If $A$ is positive definite and $T$ is an invertible matrix then certainly ${}^t(Tx)A(Tx) > 0$ for all nonzero $x \in \mathbb{R}^n$, and it follows that ${}^tTAT$ is positive definite also. Note that ${}^tTAT$ is also symmetric. The matrices $A$ and ${}^tTAT$ are said to be congruent. Obviously, symmetric matrices which are orthogonally similar are congruent; the reverse is not true. It is easily checked that a diagonal matrix is positive definite if and only if the diagonal entries are all positive. It follows that an arbitrary real symmetric matrix is positive definite if and only if all its eigenvalues are positive. The following proposition can be used as a test for positive definiteness.
5.20 Proposition A real symmetric $n \times n$ matrix $A$ is positive definite if
and only if for all $k$ from 1 to $n$ the $k \times k$ matrix $A_k$ with $(i, j)$-entry $A_{ij}$ (obtained by deleting the last $n - k$ rows and columns of $A$) has positive determinant.
Proof. Assume that $A$ is positive definite. Then the eigenvalues of $A$ are all positive, and so the determinant, being the product of the eigenvalues, is positive also. Now if $x$ is any nonzero $k$-component column and $y$ the $n$-component column obtained from $x$ by appending $n - k$ zeros then, since $A$ is positive definite,
$$0 < {}^ty\,Ay = \begin{pmatrix}{}^tx & 0\end{pmatrix}\begin{pmatrix}A_k & *\\ * & *\end{pmatrix}\begin{pmatrix}x\\ 0\end{pmatrix} = {}^tx\,A_k x$$
and it follows that $A_k$ is positive definite. Thus the determinant of each $A_k$ is positive too.
We must prove, conversely, that if all the A_k have positive determinants then A is positive definite. We use induction on n, the case n = 1 being trivial. For the case n > 1 the inductive hypothesis gives that A_{n−1} is positive definite, and it immediately follows that the n × n matrix A′ defined by

A′ = \begin{pmatrix} A_{n−1} & 0 \\ 0 & λ \end{pmatrix}

is positive definite whenever λ is a positive number. Note that since the determinant of A_{n−1} is positive, λ is positive if and only if det A′ is positive. We now prove that A is congruent to A′ for an appropriate choice of λ. We have

A = \begin{pmatrix} A_{n−1} & v \\ ᵗv & µ \end{pmatrix}

for some (n − 1)-component column v and some real number µ. Defining λ = µ − ᵗvA_{n−1}⁻¹v and

X = \begin{pmatrix} I_{n−1} & A_{n−1}⁻¹v \\ 0 & 1 \end{pmatrix}

an easy calculation gives ᵗXA′X = A. Furthermore, we have that det X = 1, and hence det A′ = det A > 0. So A′ is positive definite, and hence A is too.
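Proposition 5.20 (often called Sylvester's criterion) translates directly into a computational test. The sketch below is mine, not the text's (plain Python with exact rational arithmetic; `det`, `positive_definite` and the sample matrices are illustrative names). It checks the leading principal minors of ᵗTT, which is positive definite for any invertible T by the congruence remark above, and of a matrix that is clearly not positive definite:

```python
from fractions import Fraction

def det(M):
    # determinant by Gaussian elimination over the rationals (exact arithmetic)
    n = len(M)
    A = [[Fraction(x) for x in row] for row in M]
    d = Fraction(1)
    for j in range(n):
        pivot = next((i for i in range(j, n) if A[i][j] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != j:
            A[j], A[pivot] = A[pivot], A[j]
            d = -d
        d *= A[j][j]
        for i in range(j + 1, n):
            factor = A[i][j] / A[j][j]
            A[i] = [A[i][k] - factor * A[j][k] for k in range(n)]
    return d

def positive_definite(A):
    # 5.20: positive definite iff every leading principal minor is positive
    n = len(A)
    return all(det([row[:k] for row in A[:k]]) > 0 for k in range(1, n + 1))

# tTT is positive definite for any invertible T (it is congruent to I).
T = [[1, 2, 0], [0, 1, 3], [1, 0, 1]]
tTT = [[sum(T[k][i] * T[k][j] for k in range(3)) for j in range(3)]
       for i in range(3)]
```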
Suppose that a surface in three-dimensional Euclidean space is defined by an equation of the form Q(x, y, z) = constant, where Q is a quadratic form. We can find an orthogonal matrix to diagonalize the symmetric matrix associated with Q. It is easily seen that an orthogonal matrix necessarily has determinant equal to either 1 or −1, and by multiplying a column by −1 if necessary we can make sure that our diagonalizing matrix has determinant 1. Orthogonal transition matrices of determinant 1 correspond simply to rotations of the coordinate axes, and so we deduce that a suitable rotation transforms the equation of the surface to

λ(x′)² + µ(y′)² + ν(z′)² = constant

where the coefficients λ, µ and ν are the eigenvalues of the matrix we started with. The nature of these surfaces depends on the signs of the coefficients, and the separate cases are easily enumerated.
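As an illustration (plain Python; the quadratic form and the rotation below are my own example, not the text's), take Q(x, y, z) = 5x² + 5y² + 3z² + 8xy. The associated symmetric matrix has eigenvalues 9, 1 and 3, and the matrix R below, whose columns are orthonormal eigenvectors chosen so that det R = 1, is a rotation; ᵗRAR is then the diagonal matrix of eigenvalues:

```python
import math

# symmetric matrix of Q(x, y, z) = 5x^2 + 5y^2 + 3z^2 + 8xy
A = [[5.0, 4.0, 0.0],
     [4.0, 5.0, 0.0],
     [0.0, 0.0, 3.0]]

s = 1 / math.sqrt(2)
# columns: orthonormal eigenvectors for eigenvalues 9, 1, 3; det R = 1
R = [[s, -s, 0.0],
     [s,  s, 0.0],
     [0.0, 0.0, 1.0]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

D = matmul(matmul(transpose(R), A), R)   # tRAR, the rotated quadratic form
# D is (up to rounding) diag(9, 1, 3): the coefficients lambda, mu, nu
```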
Exercises

1. Prove that the dot product is an inner product on Rⁿ.

2. Prove that ⟨f, g⟩ = \int_a^b f(x)g(x) dx defines an inner product on C[a, b]. Assume any theorems of calculus you need.

3. Define the distance between two elements v and w of an inner product space by d(v, w) = ‖v − w‖. Prove that
(i) d(v, w) = d(w, v),
(ii) d(v, w) ≥ 0, with d(v, w) = 0 only if v = w,
(iii) d(u, w) ≤ d(u, v) + d(v, w),
(iv) d(u + v, u + w) = d(v, w),
for all u, v, w ∈ V.

4. Prove that orthogonal projections are linear transformations.

5. Prove part (iii) of Proposition 5.10.

6. (i) Let V be a real inner product space and v, w ∈ V. Use calculus to prove that the minimum value of ⟨v − λw, v − λw⟩ occurs at λ = ⟨v, w⟩/⟨w, w⟩.
(ii) Put λ = ⟨v, w⟩/⟨w, w⟩ and use ⟨v − λw, v − λw⟩ ≥ 0 to prove the Cauchy–Schwarz inequality in any inner product space.
7. Prove that (AB)∗ = (B∗)(A∗) for all complex matrices A and B such that AB is defined. Hence prove that the product of two unitary matrices is unitary. Prove also that the inverse of a unitary matrix is unitary.

8. (i) Let A be a Hermitian matrix and λ an eigenvalue of A. Prove that λ is real.
(Hint: Let v be a nonzero column satisfying Av = λv. Prove that v∗A∗ = λ̄v∗, and then calculate v∗Av in two different ways.)
(ii) Let λ and µ be two distinct eigenvalues of the Hermitian matrix A, and let u and v be corresponding eigenvectors. Prove that u and v are orthogonal to each other.
(Hint: Consider u∗Av.)
6
Relationships between spaces
In this chapter we return to the general theory of vector spaces over an
arbitrary ﬁeld. Our ﬁrst task is to investigate isomorphism, the “sameness”
of vector spaces. We then consider various ways of constructing spaces from
others.
§6a Isomorphism

Let V = Mat(2 × 2, R), the set of all 2 × 2 matrices over R, and let W = R⁴, the set of all 4-component columns over R. Then V and W are both vector spaces over R, relative to the usual addition and scalar multiplication for matrices and columns. Furthermore, there is an obvious one-to-one correspondence between V and W: the function f from V to W defined by

f \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}
is bijective. It is clear also that f interacts in the best possible way with the
addition and scalar multiplication functions corresponding to V and W: the
column which corresponds to the sum of two given matrices A and B is the
sum of the column corresponding to A and the column corresponding to B,
and the column corresponding to a scalar multiple of A is the same scalar
times the column corresponding to A. One might even like to say that V
and W are really the same space; after all, whether one chooses to write four
real numbers in a column or a rectangular array is surely a notational matter
of no great substance. In fact we say that V and W are isomorphic vector
spaces, and the function f is called an isomorphism. Intuitively, to say that
V and W are isomorphic, is to say that, as vector spaces, they are the same.
The property of the one-to-one correspondence f which was written out in words in the last paragraph becomes, when written symbolically,
f(A+B) = f(A) +f(B)
f(λA) = λf(A)
for all A, B ∈ V and λ ∈ R. That is, f is a linear transformation.
6.1 Definition Let V and W be vector spaces over the same ﬁeld F. A
function f: V → W which is bijective and linear is called an isomorphism of
vector spaces. If there is an isomorphism from V to W then V and W are
said to be isomorphic, and we write V ≅ W.
Comments
6.1.1  Rephrasing the definition, an isomorphism is a one-to-one correspondence which preserves addition and scalar multiplication. Alternatively, an isomorphism is a linear transformation which is one-to-one and onto.
6.1.2 Every vector space is obviously isomorphic to itself: the identity
function i (deﬁned by i(x) = x for all x) is an isomorphism. This says that
isomorphism is a reﬂexive relation.
6.1.3 Since isomorphisms are bijective functions they have inverses. We
leave it as an exercise to prove that the inverse of an isomorphism is also an
isomorphism. Thus if V is isomorphic to W then W is isomorphic to V ; in
other words, isomorphism is a symmetric relation.
6.1.4 If f: U → W and g: V → U are isomorphisms then the composite
function fg: V → W is also an isomorphism. Thus if V is isomorphic to U
and U is isomorphic to W then V is isomorphic to W—isomorphism is a
transitive relation. This proof is also left as an exercise.
The above comments show that isomorphism is an equivalence relation
(see §1e), and it follows that vector spaces can be separated into mutually
nonoverlapping classes—which are known as isomorphism classes—such that
spaces in the same class are isomorphic to one another. Restricting attention
to ﬁnitely generated vector spaces, the next proposition shows that (for a
given ﬁeld F) there is exactly one equivalence class for each nonnegative
integer: there is, essentially, only one vector space of each dimension.
6.2 Proposition If U and V are ﬁnitely generated vector spaces over F
of the same dimension d then they are isomorphic.
Proof.  Let b be a basis of V. We have seen (see 4.19.1 and 6.1.3) that v ↦ c_b(v) defines a bijective linear transformation from V to F^d; that is, V and F^d are isomorphic. By the same argument, so are U and F^d, and hence U and V are isomorphic to each other.
Examples

#1 Prove that the function f from Mat(2, R) to R⁴ defined at the start of this section is an isomorphism.

−− Let A, B ∈ Mat(2, R) and λ ∈ R. Then

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad B = \begin{pmatrix} e & k \\ g & h \end{pmatrix}

for some a, b, …, h ∈ R, and we find

f(A + B) = f \begin{pmatrix} a+e & b+k \\ c+g & d+h \end{pmatrix} = \begin{pmatrix} a+e \\ b+k \\ c+g \\ d+h \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} + \begin{pmatrix} e \\ k \\ g \\ h \end{pmatrix} = f \begin{pmatrix} a & b \\ c & d \end{pmatrix} + f \begin{pmatrix} e & k \\ g & h \end{pmatrix} = f(A) + f(B)

and

f(λA) = f \begin{pmatrix} λa & λb \\ λc & λd \end{pmatrix} = \begin{pmatrix} λa \\ λb \\ λc \\ λd \end{pmatrix} = λ \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = λf(A).

Hence f is a linear transformation.

Let ṽ ∈ R⁴ be arbitrary. Then ṽ = \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} for some x, y, z, w ∈ R, and

f \begin{pmatrix} x & y \\ z & w \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = ṽ.
Hence f is surjective.

Suppose that A, B ∈ Mat(2, R) are such that f(A) = f(B). Writing

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad and \quad B = \begin{pmatrix} e & k \\ g & h \end{pmatrix}

we have

\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = f(A) = f(B) = \begin{pmatrix} e \\ k \\ g \\ h \end{pmatrix},

whence a = e, b = k, c = g and d = h, showing that A = B. Thus f is injective. Since it is also surjective and linear, f is an isomorphism. −−<
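The verification in #1 can be mirrored mechanically in a few lines of code. A sketch (plain Python, not from the text; a 2 × 2 matrix is stored as a pair of rows, a column as a 4-tuple, and the helper names are mine):

```python
def f(A):
    # flatten the 2x2 matrix ((a, b), (c, d)) into the column (a, b, c, d)
    (a, b), (c, d) = A
    return (a, b, c, d)

def f_inv(v):
    a, b, c, d = v
    return ((a, b), (c, d))

def add(A, B):
    return tuple(tuple(x + y for x, y in zip(rA, rB)) for rA, rB in zip(A, B))

def smul(lam, A):
    return tuple(tuple(lam * x for x in row) for row in A)

A = ((1, 2), (3, 4))
B = ((5, 6), (7, 8))
```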
#2 Let 𝒫 be the set of all polynomial functions from R to R of degree two or less, and let 𝒯 be the set of all functions from the set {1, 2, 3} to R. Show that 𝒫 and 𝒯 are isomorphic vector spaces over R.

−− Observe that 𝒫 is a subset of the set of all functions from R to R, which we know to be a vector space over R (by §3b#6). A function f: R → R is in the subset 𝒫 if and only if there exist a, b, c ∈ R such that f(x) = ax² + bx + c for all x ∈ R. Now let f, g ∈ 𝒫 and λ ∈ R be arbitrary. We have

f(x) = ax² + bx + c, \qquad g(x) = a′x² + b′x + c′

for some a, a′, b, b′, c, c′ ∈ R. Now by the definitions of addition and scalar multiplication for functions we have for all x ∈ R

(f + g)(x) = f(x) + g(x) = (ax² + bx + c) + (a′x² + b′x + c′) = (a + a′)x² + (b + b′)x + (c + c′)

and similarly

(λf)(x) = λ(f(x)) = λ(ax² + bx + c) = (λa)x² + (λb)x + (λc),

so that f + g and λf are both in 𝒫. Hence 𝒫 is closed under addition and scalar multiplication, and is therefore a subspace.

Since §3b#6 also gives that 𝒯 is a vector space over R, it remains to find an isomorphism φ: 𝒫 → 𝒯. There are many ways to do this. One is as
follows: if f ∈ 𝒫 let φ(f) be the restriction of f to {1, 2, 3}. That is, if f ∈ 𝒫 then φ(f) ∈ 𝒯 is defined by

(φ(f))(1) = f(1), (φ(f))(2) = f(2), (φ(f))(3) = f(3).

We prove first that φ as defined above is linear. Let f, g ∈ 𝒫 and λ, µ ∈ R. Then for each i ∈ {1, 2, 3} we have

φ(λf + µg)(i) = (λf + µg)(i) = (λf)(i) + (µg)(i) = λf(i) + µg(i) = λ(φ(f)(i)) + µ(φ(g)(i)) = (λφ(f))(i) + (µφ(g))(i) = (λφ(f) + µφ(g))(i),

and so φ(λf + µg) = λφ(f) + µφ(g). Thus φ preserves addition and scalar multiplication.

To prove that φ is injective we must prove that if f and g are polynomials of degree at most two such that φ(f) = φ(g) then f = g. The assumption that φ(f) = φ(g) gives f(i) = g(i), and hence (f − g)(i) = 0, for i = 1, 2, 3. Thus x − 1, x − 2 and x − 3 are all factors of (f − g)(x), and so

(f − g)(x) = q(x)(x − 1)(x − 2)(x − 3)

for some polynomial q. Now since f − g cannot have degree greater than two we see that the only possibility is q = 0, and it follows that f = g, as required.

Finally, we must prove that φ is surjective, and this involves showing that for every α: {1, 2, 3} → R there is a polynomial f of degree at most two with f(i) = α(i) for i = 1, 2, 3. In fact one can immediately write down a suitable f:

f(x) = ½(x − 2)(x − 3)α(1) − (x − 1)(x − 3)α(2) + ½(x − 1)(x − 2)α(3).

(This formula is an instance of Lagrange's interpolation formula.) −−<

There is an alternative and possibly shorter proof using (instead of φ) an isomorphism which maps the polynomial ax² + bx + c to α ∈ 𝒯 given by α(1) = a, α(2) = b and α(3) = c.
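The interpolation formula at the end of #2 can be checked exactly with rational arithmetic. A sketch (plain Python, an illustration of mine; `alpha` stands for an arbitrary function on {1, 2, 3}):

```python
from fractions import Fraction

def interpolate(alpha):
    # the quadratic of the text:
    # f(x) = 1/2 (x-2)(x-3) alpha(1) - (x-1)(x-3) alpha(2) + 1/2 (x-1)(x-2) alpha(3)
    half = Fraction(1, 2)
    def F(x):
        return (half * (x - 2) * (x - 3) * alpha(1)
                - (x - 1) * (x - 3) * alpha(2)
                + half * (x - 1) * (x - 2) * alpha(3))
    return F

# an arbitrary alpha: {1, 2, 3} -> Q
alpha = {1: Fraction(4), 2: Fraction(-7), 3: Fraction(1, 3)}.get
F = interpolate(alpha)
# F agrees with alpha at 1, 2 and 3, so phi is surjective as claimed
```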
§6b Direct sums

Let F be a field, and consider the space F⁶, consisting of all 6-component columns over F. The subset

S_1 = \left\{ \begin{pmatrix} α \\ β \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \;\middle|\; α, β ∈ F \right\}

is a subspace of F⁶ isomorphic to F²; the map which appends four zero components to the bottom of a 2-component column is an isomorphism from F² to S_1. Likewise,

S_2 = \left\{ \begin{pmatrix} 0 \\ 0 \\ γ \\ δ \\ 0 \\ 0 \end{pmatrix} \;\middle|\; γ, δ ∈ F \right\} \quad and \quad S_3 = \left\{ \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ ε \\ ζ \end{pmatrix} \;\middle|\; ε, ζ ∈ F \right\}

are also subspaces of F⁶. It is easy to see that for an arbitrary v ∈ F⁶ there exist uniquely determined v_1, v_2 and v_3 such that v = v_1 + v_2 + v_3 and v_i ∈ S_i; specifically, we have

\begin{pmatrix} α \\ β \\ γ \\ δ \\ ε \\ ζ \end{pmatrix} = \begin{pmatrix} α \\ β \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ γ \\ δ \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ ε \\ ζ \end{pmatrix}.

We say that F⁶ is the "direct sum" of its subspaces S_1, S_2 and S_3.
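The decomposition above can be written out directly: each v_i keeps one pair of components of v and zeroes the rest. A sketch (plain Python, my own illustration with integer entries standing in for elements of F):

```python
def split(v):
    # decompose a 6-component column into its S1, S2, S3 parts
    v1 = (v[0], v[1], 0, 0, 0, 0)
    v2 = (0, 0, v[2], v[3], 0, 0)
    v3 = (0, 0, 0, 0, v[4], v[5])
    return v1, v2, v3

v = (3, 1, 4, 1, 5, 9)
v1, v2, v3 = split(v)
# the three parts sum back to v, and each part is read off from v itself,
# so the decomposition is unique: two decompositions with the same sum
# must agree component by component.
```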
More generally, suppose that b = (v_1, v_2, …, v_n) is a basis for the vector space V, and suppose that we split b into k parts:

b_1 = (v_1, v_2, …, v_{r_1})
b_2 = (v_{r_1+1}, v_{r_1+2}, …, v_{r_2})
b_3 = (v_{r_2+1}, v_{r_2+2}, …, v_{r_3})
⋮
b_k = (v_{r_{k−1}+1}, v_{r_{k−1}+2}, …, v_n)

where the r_j are integers with 0 = r_0 < r_1 < r_2 < ⋯ < r_{k−1} < r_k = n. For each j = 1, 2, …, k let U_j be the subspace of V spanned by b_j. It is not hard to see that b_j is a basis of U_j, for each j. Furthermore,
6.3 Proposition  In the situation described above, for each v ∈ V there exist uniquely determined u_1, u_2, …, u_k such that v = u_1 + u_2 + ⋯ + u_k and u_j ∈ U_j for each j.

Proof.  Let v ∈ V. Since b spans V there exist scalars λ_i with

v = \sum_{i=1}^{n} λ_i v_i = \sum_{i=1}^{r_1} λ_i v_i + \sum_{i=r_1+1}^{r_2} λ_i v_i + ⋯ + \sum_{i=r_{k−1}+1}^{n} λ_i v_i.

Defining for each j

u_j = \sum_{i=r_{j−1}+1}^{r_j} λ_i v_i

we see that u_j ∈ Span b_j = U_j and v = \sum_{j=1}^{k} u_j, so that each v can be expressed in the required form.

To prove that the expression is unique we must show that if u_j, u′_j ∈ U_j and \sum_{j=1}^{k} u_j = \sum_{j=1}^{k} u′_j then u_j = u′_j for each j. Since U_j = Span b_j we may write (for each j)

u_j = \sum_{i=r_{j−1}+1}^{r_j} λ_i v_i \quad and \quad u′_j = \sum_{i=r_{j−1}+1}^{r_j} λ′_i v_i

for some scalars λ_i and λ′_i. Now we have

\sum_{i=1}^{n} λ_i v_i = \sum_{j=1}^{k} \sum_{i=r_{j−1}+1}^{r_j} λ_i v_i = \sum_{j=1}^{k} u_j = \sum_{j=1}^{k} u′_j = \sum_{j=1}^{k} \sum_{i=r_{j−1}+1}^{r_j} λ′_i v_i = \sum_{i=1}^{n} λ′_i v_i

and since b is a basis for V it follows from 4.15 that λ_i = λ′_i for each i. Hence for each j,

u_j = \sum_{i=r_{j−1}+1}^{r_j} λ_i v_i = \sum_{i=r_{j−1}+1}^{r_j} λ′_i v_i = u′_j

as required.
6.4 Definition  A vector space V is said to be the direct sum of subspaces U_1, U_2, …, U_k if every element of V is uniquely expressible in the form u_1 + u_2 + ⋯ + u_k with u_j ∈ U_j for each j. To signify that V is the direct sum of the U_j we write

V = U_1 ⊕ U_2 ⊕ ⋯ ⊕ U_k.
Example

#3 Let b = (v_1, v_2, …, v_n) be a basis of V and for each i let U_i be the one-dimensional space spanned by v_i. Prove that V = U_1 ⊕ U_2 ⊕ ⋯ ⊕ U_n.

−− Let v be an arbitrary element of V. Since b spans V there exist λ_i ∈ F such that v = \sum_{i=1}^{n} λ_i v_i. Now since u_i = λ_i v_i is an element of U_i this shows that v can be expressed in the required form \sum_{i=1}^{n} u_i. It remains to prove that the expression is unique, and this follows directly from 4.15. For suppose that we also have v = \sum_{i=1}^{n} u′_i with u′_i ∈ U_i. Letting u′_i = λ′_i v_i we find that v = \sum_{i=1}^{n} λ′_i v_i, whence 4.15 gives λ_i = λ′_i, and it follows that u_i = u′_i, as required. −−<

Observe that this corresponds to the particular case of 6.3 for which the parts into which the basis is split have one element each.
The case of two direct summands is the most important:

6.5 Definition  If V = U_1 ⊕ U_2 then U_1 and U_2 are called complementary subspaces.

6.6 Theorem  If U_1 and U_2 are subspaces of the vector space V then V = U_1 ⊕ U_2 if and only if V = U_1 + U_2 and U_1 ∩ U_2 = {0}.

Proof.  Recall that, by definition, U_1 + U_2 consists of all elements of V of the form u_1 + u_2 with u_1 ∈ U_1, u_2 ∈ U_2. (See Exercise 9 of Chapter Three.) Thus V = U_1 + U_2 if and only if each element of V is expressible as u_1 + u_2.

Assume that V = U_1 + U_2 and U_1 ∩ U_2 = {0}. Then each element of V can be expressed in the form u_1 + u_2, and to show that V = U_1 ⊕ U_2 it suffices to prove that these expressions are unique. So, assume that u_1 + u_2 = u′_1 + u′_2
with u_i, u′_i ∈ U_i for i = 1, 2. Then u_1 − u′_1 = u′_2 − u_2; let us call this element v. By closure properties for the subspace U_1 we have

v = u_1 − u′_1 ∈ U_1 (since u_1, u′_1 ∈ U_1),

and similarly closure of U_2 gives

v = u′_2 − u_2 ∈ U_2 (since u_2, u′_2 ∈ U_2).

Hence v ∈ U_1 ∩ U_2 = {0}, and we have shown that

u_1 − u′_1 = 0 = u′_2 − u_2.

Thus u_i = u′_i for each i, and uniqueness is proved.

Conversely, assume that V = U_1 ⊕ U_2. Then certainly each element of V can be expressed as u_1 + u_2 with u_i ∈ U_i; so V = U_1 + U_2. It remains to prove that U_1 ∩ U_2 = {0}. Subspaces always contain the zero vector; so {0} ⊆ U_1 ∩ U_2. Now let v ∈ U_1 ∩ U_2 be arbitrary. If we define

u_1 = 0, u_2 = v \quad and \quad u′_1 = v, u′_2 = 0

then we see that u_1, u′_1 ∈ U_1 and u_2, u′_2 ∈ U_2, and also u_1 + u_2 = v = u′_1 + u′_2. Since each element of V is uniquely expressible as the sum of an element of U_1 and an element of U_2 this equation implies that u_1 = u′_1 and u_2 = u′_2; that is, v = 0.
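Theorem 6.6 gives a practical test. In R², for instance, the lines U_1 = Span{(1, 0)} and U_2 = Span{(1, 1)} satisfy both conditions, so R² = U_1 ⊕ U_2, and the unique summands come from solving a small linear system. A sketch (plain Python with exact rational arithmetic; the example subspaces are mine):

```python
from fractions import Fraction

# R^2 = U1 + U2 with U1 = Span{(1, 0)} and U2 = Span{(1, 1)}:
# v = s(1, 0) + t(1, 1) forces t = v_y and s = v_x - v_y.
def decompose(v):
    t = Fraction(v[1])
    s = Fraction(v[0]) - t
    return (s, Fraction(0)), (t, t)   # the U1 part and the U2 part

u1, u2 = decompose((5, 2))
# U1 and U2 intersect only in 0: a vector (s, 0) that is also (t, t)
# forces s = t = 0, which is exactly why s and t are uniquely determined.
```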
The above theorem can be generalized to deal with the case of more than two direct summands. If U_1, U_2, …, U_k are subspaces of V it is natural to define their sum to be

U_1 + U_2 + ⋯ + U_k = { u_1 + u_2 + ⋯ + u_k | u_i ∈ U_i }

and it is easy to prove that this is always a subspace of V. One might hope that V is the direct sum of the U_i if the U_i generate V, in the sense that V is the sum of the U_i, and we also have that U_i ∩ U_j = {0} whenever i ≠ j. However, consideration of #3 above shows that this condition will not be sufficient even in the case of one-dimensional summands, since to prove that (v_1, v_2, …, v_n) is a basis it is not sufficient to prove that v_i is not a scalar multiple of v_j for i ≠ j. What we really need, then, is a generalization of the notion of linear independence.
6.7 Definition  The subspaces U_1, U_2, …, U_k of V are said to be independent if the only solution of

u_1 + u_2 + ⋯ + u_k = 0, \quad u_i ∈ U_i for each i

is given by u_1 = u_2 = ⋯ = u_k = 0.

We can now state the correct generalization of 6.6:

6.8 Theorem  Let U_1, …, U_k be subspaces of the vector space V. Then V = U_1 ⊕ U_2 ⊕ ⋯ ⊕ U_k if and only if the U_i are independent and generate V.

We omit the proof of this since it is a straightforward adaptation of the proof of 6.6. Note in particular that the uniqueness part of the definition corresponds to the independence part of 6.8.
We have proved in 6.3 that a direct sum decomposition of a space is obtained by splitting a basis into parts, the summands being the subspaces spanned by the various parts. Conversely, given a direct sum decomposition of V one can obtain a basis of V by combining bases of the summands.

6.9 Theorem  Let V = U_1 ⊕ U_2 ⊕ ⋯ ⊕ U_k, and for each j = 1, 2, …, k let b_j = (v^{(j)}_1, v^{(j)}_2, …, v^{(j)}_{d_j}) be a basis of the subspace U_j. Then

b = (v^{(1)}_1, …, v^{(1)}_{d_1}, v^{(2)}_1, …, v^{(2)}_{d_2}, …, v^{(k)}_1, …, v^{(k)}_{d_k})

is a basis of V and dim V = \sum_{j=1}^{k} dim U_j.

Proof.  Let v ∈ V. Then v = \sum_{j=1}^{k} u_j for some u_j ∈ U_j, and since b_j spans U_j we have u_j = \sum_{i=1}^{d_j} λ_{ij} v^{(j)}_i for some scalars λ_{ij}. Thus

v = \sum_{i=1}^{d_1} λ_{i1} v^{(1)}_i + \sum_{i=1}^{d_2} λ_{i2} v^{(2)}_i + ⋯ + \sum_{i=1}^{d_k} λ_{ik} v^{(k)}_i,

a linear combination of the elements of b. Hence b spans V.

Suppose we have an expression for 0 as a linear combination of the elements of b. That is, assume that

0 = \sum_{i=1}^{d_1} λ_{i1} v^{(1)}_i + \sum_{i=1}^{d_2} λ_{i2} v^{(2)}_i + ⋯ + \sum_{i=1}^{d_k} λ_{ik} v^{(k)}_i
for some scalars λ_{ij}. If we define u_j = \sum_{i=1}^{d_j} λ_{ij} v^{(j)}_i for each j then we have u_j ∈ U_j and

u_1 + u_2 + ⋯ + u_k = 0.

Since V is the direct sum of the U_j we know from 6.8 that the U_i are independent, and it follows that u_j = 0 for each j. So \sum_{i=1}^{d_j} λ_{ij} v^{(j)}_i = u_j = 0, and since b_j = (v^{(j)}_1, …, v^{(j)}_{d_j}) is linearly independent it follows that λ_{ij} = 0 for all i and j. We have shown that the only way to write 0 as a linear combination of the elements of b is to make all the coefficients zero; thus b is linearly independent.

Since b is linearly independent and spans, it is a basis for V. The number of elements in b is \sum_{j=1}^{k} d_j = \sum_{j=1}^{k} dim U_j; hence the statement about dimensions follows.
As a corollary of the above we obtain:

6.10 Proposition  Any subspace of a finite dimensional space has a complement.

Proof.  Let U be a subspace of V and let d be the dimension of V. By 4.11 we know that U is finitely generated and has dimension at most d. Assume then that (u_1, u_2, …, u_r) is a basis for U. By 4.10 we may choose w_1, w_2, …, w_{d−r} ∈ V such that (u_1, …, u_r, w_1, …, w_{d−r}) is a basis of V. Now by 6.3 we have

V = Span(u_1, …, u_r) ⊕ Span(w_1, …, w_{d−r}).

That is, W = Span(w_1, …, w_{d−r}) is a complement to U.
§6c Quotient spaces

Let U be a subspace of the vector space V. If v is an arbitrary element of V we define the coset of U containing v to be the subset v + U of V defined by

v + U = { v + u | u ∈ U }.

We have seen that subspaces arise naturally as solution sets of linear equations of the form T(x) = 0. Cosets arise naturally as solution sets of linear equations when the right hand side is nonzero.
6.11 Proposition  Suppose that T: V → W is a linear transformation. Let w ∈ W and let S be the set of solutions of the equation T(x) = w; that is, S = { x ∈ V | T(x) = w }. If w ∉ im T then S is empty. If w ∈ im T then S is a coset of the kernel of T:

S = x_0 + K

where x_0 is a particular solution of T(x) = w and K is the set of all solutions of T(x) = 0.

Proof.  By definition im T is the set of all w ∈ W such that w = T(x) for some x ∈ V; so if w ∉ im T then S is certainly empty.

Suppose that w ∈ im T and let x_0 be any element of V such that T(x_0) = w. If x ∈ S then x − x_0 ∈ K, since linearity of T gives

T(x − x_0) = T(x) − T(x_0) = w − w = 0,

and it follows that

x = x_0 + (x − x_0) ∈ x_0 + K.

Thus S ⊆ x_0 + K. Conversely, if x is an element of x_0 + K then we have x = x_0 + v with v ∈ K, and now

T(x) = T(x_0 + v) = T(x_0) + T(v) = w + 0 = w

shows that x ∈ S. Hence x_0 + K ⊆ S, and therefore x_0 + K = S.
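Proposition 6.11 is the abstract version of a familiar fact about linear systems: the general solution of T(x) = w is one particular solution plus the general solution of T(x) = 0. A sketch (plain Python; the map T: R³ → R² given by the matrix M, the particular solution x0 and the kernel vector k are a hypothetical example of mine):

```python
from fractions import Fraction

# T: R^3 -> R^2 given by T(x) = Mx
M = [[1, 2, 1],
     [0, 1, 1]]

def T(x):
    return tuple(sum(Fraction(M[i][j]) * x[j] for j in range(3))
                 for i in range(2))

x0 = (Fraction(1), Fraction(1), Fraction(0))  # a particular solution
w = T(x0)                                      # so w = (3, 1)
k = (Fraction(1), Fraction(-1), Fraction(1))   # spans ker T: T(k) = 0

# every element of the coset x0 + K solves T(x) = w
solutions = [tuple(a + t * b for a, b in zip(x0, k)) for t in range(-3, 4)]
```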
Cosets can also be described as equivalence classes, in the following
way. Given that U is a subspace of V , deﬁne a relation ≡ on V by
(∗) x ≡ y if and only if x − y ∈ U.
It is straightforward to check that ≡ is reﬂexive, symmetric and transitive,
and that the resulting equivalence classes are precisely the cosets of U.
The set of all cosets of U in V can be made into a vector space in a
manner totally analogous to that used in Chapter Two to construct a ﬁeld
with three elements. It is easily checked that if x + U and y + U are cosets
of U then the sum of any element of x +U and any element of y +U will lie
in the coset (x + y) + U. Thus it is natural to deﬁne addition of cosets by
the rule
6.11.1 (x +U) + (y +U) = (x +y) +U.
Similarly, if λ is a scalar and x +U a coset then multiplying any element of
x +U by λ gives an element of (λx) +U, and so we deﬁne
6.11.2 λ(x +U) = (λx) +U.
It is routine to check that addition and scalar multiplication of cosets as
deﬁned above satisfy the vector space axioms. (Note in particular that the
coset 0 +U, which is just U itself, plays the role of zero.)
6.12 Theorem Let U be a subspace of the vector space V and let V/U be
the set of all cosets of U in V . Then V/U is a vector space over F relative
to the addition and scalar multiplication deﬁned in 6.11.1 and 6.11.2 above.
Comments
6.12.1 The vector space deﬁned above is called the quotient of V by U.
Note that it is precisely the quotient, as defined in §1e, of V by the equivalence
relation ≡ (see (∗) above).
6.12.2  Note that it is possible to have x + U = x′ + U and y + U = y′ + U without having x = x′ or y = y′. However, it is necessarily true in these circumstances that (x + y) + U = (x′ + y′) + U, so that addition of cosets is well-defined. A similar remark is valid for scalar multiplication.
The quotient space V/U can be thought of as the space obtained from
V by identifying things which diﬀer by an element of U, or, equivalently,
pretending that all the elements of U are zero. Thus it is reasonable to
regard V as being in some sense made up of U and V/U. The next proposition
reinforces this idea.
6.13 Proposition Let U be a subspace of V and let W be any complement
to U. Then W is isomorphic to V/U.
Proof. Deﬁne T: W → V/U by T(w) = w + U. We will prove that T is
linear and bijective.
By the deﬁnitions of addition and scalar multiplication of cosets we
have
T(λx +µy) = (λx +µy) +U = λ(x +U) +µ(y +U) = λT(x) +µT(y)
for all x, y ∈ W and all λ, µ ∈ F. Hence T is linear.
Let w ∈ ker T. Then T(w) = w+U is the zero element of V/U; that is,
w +U = U. Since w = w +0 ∈ w +U it follows that w ∈ U, and hence that
w ∈ U ∩ W. But W and U are complementary, and so it follows from 6.6
that U ∩ W = {0}, and we deduce that w = 0. So ker T = {0}, and therefore
(by 3.15) T is injective.
Finally, an arbitrary element of V/U has the form v + U for some
v ∈ V , and since V is the sum of U and W there exist u ∈ U and w ∈ W
with v = w +u. We see from this that v ≡ w in the sense of (∗) above, and
hence
v +U = w +U = T(w) ∈ imT.
Since v +U was an arbitrary element of V/U this shows that T is surjective.
Comment
6.13.1  In the situation of 6.13 suppose that (u_1, …, u_r) is a basis for U and (w_1, …, w_s) a basis for W. Since w ↦ w + U is an isomorphism W → V/U we must have that the elements w_1 + U, …, w_s + U form a basis for V/U. This can also be seen directly. For instance, to see that they span, observe that since each element v of V = W ⊕ U is expressible in the form

v = \sum_{j=1}^{s} µ_j w_j + \sum_{i=1}^{r} λ_i u_i,

it follows that

v + U = \Bigl( \sum_{j=1}^{s} µ_j w_j \Bigr) + U = \sum_{j=1}^{s} µ_j (w_j + U).
§6d The dual space

Let V and W be vector spaces over F. It is easily checked, by arguments almost identical to those used in §3b#6, that the set of all functions from V to
W becomes a vector space if addition and scalar multiplication are deﬁned in
the natural way. We have also seen in Exercise 11 of Chapter Three that the
sum of two linear transformations from V to W is also a linear transformation
from V to W, and the product of a linear transformation by a scalar also
gives a linear transformation. The set L(V, W) of all linear transformations
from V to W is nonempty (the zero function is linear) and so it follows from
3.10 that L(V, W) is a vector space. The special case W = F is of particular
importance.
6.14 Definition  Let V be a vector space over F. The space of all linear transformations from V to F is called the dual space of V, and it is commonly denoted ‘V∗’.
Comment
6.14.1  In another strange tradition, elements of the dual space are commonly called linear functionals.
6.15 Theorem  Let b = (v_1, v_2, …, v_n) be a basis for the vector space V. For each i = 1, 2, …, n there exists a unique linear functional f_i such that f_i(v_j) = δ_{ij} (Kronecker delta) for all j. Furthermore, (f_1, f_2, …, f_n) is a basis of V∗.

Proof.  Existence and uniqueness of the f_i is immediate from 4.16. If \sum_{i=1}^{n} λ_i f_i is the zero function then for all j we have

0 = \Bigl( \sum_{i=1}^{n} λ_i f_i \Bigr)(v_j) = \sum_{i=1}^{n} λ_i f_i(v_j) = \sum_{i=1}^{n} λ_i δ_{ij} = λ_j

and so it follows that the f_i are linearly independent. It remains to show that they span.

Let f ∈ V∗ be arbitrary, and for each i define λ_i = f(v_i). We will show that f = \sum_{i=1}^{n} λ_i f_i. By 4.16 it suffices to show that f and \sum_{i=1}^{n} λ_i f_i take the same value on all elements of the basis b. This is indeed satisfied, since

\Bigl( \sum_{i=1}^{n} λ_i f_i \Bigr)(v_j) = \sum_{i=1}^{n} λ_i f_i(v_j) = \sum_{i=1}^{n} λ_i δ_{ij} = λ_j = f(v_j).
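For V = F³ Theorem 6.15 has a concrete matrix form: if the basis vectors v_j are the columns of an invertible matrix B, then the dual basis functionals f_i are given by the rows of B⁻¹, since (B⁻¹B)_{ij} = δ_{ij}. A sketch (plain Python; the basis B of Q³ below is a hypothetical example of mine, chosen unit upper triangular so that its inverse can be written down directly):

```python
from fractions import Fraction

# basis b = (v1, v2, v3) of Q^3, stored as the columns of B
B = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
# B is unit upper triangular; its inverse is easy to verify by hand
B_inv = [[1, -1, 1],
         [0, 1, -1],
         [0, 0, 1]]

def f(i, v):
    # the i-th dual basis functional: the i-th row of B_inv applied to v
    return sum(Fraction(B_inv[i][j]) * v[j] for j in range(3))

cols = [[B[r][j] for r in range(3)] for j in range(3)]   # v1, v2, v3
values = [[f(i, cols[j]) for j in range(3)] for i in range(3)]
# values is the identity matrix: f_i(v_j) is the Kronecker delta
```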
Comment

6.15.1  The basis of V∗ described in the theorem is called the dual basis of V∗ corresponding to the basis b of V.

To accord with the normal use of the word ‘dual’ it ought to be true that the dual of the dual space is the space itself. This cannot strictly be satisfied: elements of (V∗)∗ are linear functionals on the space V∗, whereas elements of V need not be functions at all. Our final result in this chapter shows that, nevertheless, there is a natural isomorphism between V and (V∗)∗.
6.16 Theorem  Let V be a finite dimensional vector space over the field F. For each v ∈ V define e_v: V∗ → F by

e_v(f) = f(v) for all f ∈ V∗.

Then e_v is an element of (V∗)∗, and furthermore the mapping V → (V∗)∗ defined by v ↦ e_v is an isomorphism.
We omit the proof. All parts are in fact easy, except for the proof
that v ↦ e_v is surjective, which requires the use of a dimension argument.
This will also be easy once we have proved the Main Theorem on Linear
Transformations, which we will do in the next chapter. It is important to
note, however, that Theorem 6.16 becomes false if the assumption of ﬁnite
dimensionality is omitted.
Exercises
1. Prove that isomorphic spaces have the same dimension.
2. Let U and V be ﬁnite dimensional vector spaces over the ﬁeld F and let
f: U → V be a linear transformation which is injective. Prove that U is
isomorphic to a subspace of V , and hence prove that dimU ≤ dimV .
3. Prove that the inverse of a vector space isomorphism is a vector space
isomorphism, and that the composite of two vector space isomorphisms
is a vector space isomorphism.
4. In each case determine (giving reasons) whether the given linear transformation is a vector space isomorphism. If it is, give a formula for its inverse.

(i) ρ: R² → R³ defined by ρ \begin{pmatrix} α \\ β \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 0 & −1 \end{pmatrix} \begin{pmatrix} α \\ β \end{pmatrix}

(ii) σ: R³ → R³ defined by σ \begin{pmatrix} α \\ β \\ γ \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} α \\ β \\ γ \end{pmatrix}
5. Let φ: V → V be a linear transformation such that φ² = φ. (Note that multiplication of linear transformations is always defined to be composition; thus φ²(v) = φ(φ(v)).) Let i denote the identity transformation V → V; that is, i(v) = v for all v ∈ V.
(i) Prove that i − φ: V → V (defined by (i − φ)(v) = v − φ(v)) satisfies (i − φ)² = i − φ.
(ii) Prove that im φ = ker(i − φ) and ker φ = im(i − φ).
(iii) Prove that for each element v ∈ V there exist unique x ∈ im φ and y ∈ ker φ with v = x + y.
6. Let W be the space of all twice differentiable functions f: R → R satisfying d²f/dt² = 0. For all \begin{pmatrix} α \\ β \end{pmatrix} ∈ R² let φ \begin{pmatrix} α \\ β \end{pmatrix} be the function from R to R given by

φ \begin{pmatrix} α \\ β \end{pmatrix}(t) = αt + (β − α).

Prove that φ(v) ∈ W for all v ∈ R², and that φ: v ↦ φ(v) is an isomorphism from R² to W. Give a formula for the inverse isomorphism φ⁻¹.
7. Let V be a vector space and S and T subspaces of V such that V = S ⊕ T. Prove or disprove the following assertion:
If U is any subspace of V then U = (U ∩ S) ⊕ (U ∩ T).
8. Express R² as the sum of two one-dimensional subspaces in two different ways.
9. Is it possible to find subspaces U, V and W of R⁴ such that
R⁴ = U ⊕ V = V ⊕ W = W ⊕ U?
10. Let U_1, U_2, . . . , U_k be subspaces of V. Prove that the U_i are independent if and only if each element of U_1 + U_2 + · · · + U_k (the subspace generated by the U_i) is uniquely expressible in the form ∑_{i=1}^{k} u_i with u_i ∈ U_i for each i.
11. Let U_1, U_2, . . . , U_k be subspaces of V. Prove that the U_i are independent if and only if
U_j ∩ (U_1 + · · · + U_{j−1} + U_{j+1} + · · · + U_k) = {0}
for all j = 1, 2, . . . , k.
12. (i) Let V be a vector space over F and let v ∈ V. Prove that the function e_v: V* → F defined by e_v(f) = f(v) (for all f ∈ V*) is linear.
(ii) With e_v as defined above, prove that the function e: V → (V*)* defined by
e(v) = e_v for all v in V
is a linear transformation.
(iii) Prove that the function e defined above is injective.
13. (i) Let V and W be vector spaces over F. Show that the Cartesian product V × W becomes a vector space if addition and scalar multiplication are defined in the natural way. (This space is called the external direct sum of V and W.)
(ii) Show that V′ = { (v, 0) | v ∈ V } and W′ = { (0, w) | w ∈ W } are subspaces of V × W with V′ ≅ V and W′ ≅ W, and that V × W = V′ ⊕ W′.
(iii) Prove that dim(V × W) = dim V + dim W.
14. Let S and T be subspaces of a vector space V. Prove that (s, t) → s + t defines a linear transformation from S × T to V which has image S + T.
15. Suppose that V is the direct sum of subspaces V_i (for i = 1, 2, . . . , n) and each V_i is the direct sum of subspaces V_ij (for j = 1, 2, . . . , m_i). Prove that V is the direct sum of all the subspaces V_ij.
16. Let V be a real inner product space, and for all w ∈ V define f_w: V → R by f_w(v) = ⟨v, w⟩ for all v ∈ V.
(i) Prove that for each element w ∈ V the function f_w: V → R is linear.
(ii) By (i) each f_w is an element of the dual space V* of V. Prove that f: V → V* defined by f(w) = f_w is a linear transformation.
(iii) By considering f_w(w) prove that f_w = 0 only if w = 0. Hence prove that the kernel of the linear transformation f in (ii) is {0}.
(iv) Assume that V is finite dimensional. Use Exercise 2 above and the fact that V and V* have the same dimension to prove that the linear transformation f in (ii) is an isomorphism.
7
Matrices and Linear Transformations
We have seen in Chapter Four that an arbitrary finite dimensional vector space is isomorphic to a space of n-tuples. Since an n-tuple of numbers is (in the context of pure mathematics!) a fairly concrete sort of thing, it is useful to use these isomorphisms to translate questions about abstract vector spaces into questions about n-tuples. And since we introduced vector spaces to provide a context in which to discuss linear transformations, our first task is, surely, to investigate how this passage from abstract spaces to n-tuples interacts with linear transformations. For instance, is there some concrete thing, like an n-tuple, which can be associated with a linear transformation?
§7a The matrix of a linear transformation
We have shown in 4.16 that the action of a linear transformation on a basis of a space determines it uniquely. Suppose then that T: V → W is a linear transformation, and let b = (v_1, . . . , v_n) be a basis for the space V and c = (w_1, . . . , w_m) a basis for W. If we know expressions for each Tv_i in terms of the w_j then for any scalars λ_i we can calculate T(∑_{i=1}^{n} λ_i v_i) in terms of the w_j. Thus there should be a formula expressing cv_c(Tv) (for any v ∈ V) in terms of the cv_c(Tv_i) and cv_b(v).
7.1 Theorem  Let V and W be vector spaces over the field F with bases b = (v_1, v_2, . . . , v_n) and c = (w_1, w_2, . . . , w_m), and let T: V → W be a linear transformation. Define M_cb(T) to be the m × n matrix whose i-th column is cv_c(Tv_i). Then for all v ∈ V,
cv_c(Tv) = M_cb(T) cv_b(v).
Proof.  Let v ∈ V and let v = ∑_{i=1}^{n} λ_i v_i, so that cv_b(v) is the column with λ_i as its i-th entry. For each v_j in the basis b let T(v_j) = ∑_{i=1}^{m} α_ij w_i. Then
α_ij is the i-th entry of the column cv_c(Tv_j), and by the definition of M_cb(T) this gives
M_cb(T) = [α_11 α_12 . . . α_1n; α_21 α_22 . . . α_2n; . . . ; α_m1 α_m2 . . . α_mn].
We obtain
Tv = T(λ_1 v_1 + λ_2 v_2 + · · · + λ_n v_n)
   = λ_1 Tv_1 + λ_2 Tv_2 + · · · + λ_n Tv_n
   = λ_1 ∑_{i=1}^{m} α_i1 w_i + λ_2 ∑_{i=1}^{m} α_i2 w_i + · · · + λ_n ∑_{i=1}^{m} α_in w_i
   = ∑_{i=1}^{m} (α_i1 λ_1 + α_i2 λ_2 + · · · + α_in λ_n) w_i,
and since the coefficient of w_i in this expression is the i-th entry of cv_c(Tv) it follows that
cv_c(Tv) = [α_11 λ_1 + α_12 λ_2 + · · · + α_1n λ_n; α_21 λ_1 + α_22 λ_2 + · · · + α_2n λ_n; . . . ; α_m1 λ_1 + α_m2 λ_2 + · · · + α_mn λ_n]
         = [α_11 α_12 . . . α_1n; α_21 α_22 . . . α_2n; . . . ; α_m1 α_m2 . . . α_mn][λ_1; λ_2; . . . ; λ_n]
         = M_cb(T) cv_b(v).
Comments
7.1.1 The matrix M_cb(T) is called the matrix of T relative to the bases c and b. The theorem says that in terms of coordinates relative to bases c and b the effect of a linear transformation T is multiplication by the matrix of T relative to these bases.
7.1.2 The matrix of a transformation from an m-dimensional space to an n-dimensional space is an n × m matrix. (One column for each element of the basis of the domain.)
Examples
#1 Let V be the space considered in §4e#3, consisting of all polynomials over R of degree at most three, and define a transformation D: V → V by (Df)(x) = (d/dx)f(x). That is, Df is the derivative of f. Since the derivative of a polynomial of degree at most three is a polynomial of degree at most two, it is certainly true that Df ∈ V if f ∈ V, and hence D is a well-defined transformation from V to V. Moreover, elementary calculus gives
D(λf + µg) = λ(Df) + µ(Dg)
for all f, g ∈ V and all λ, µ ∈ R; hence D is a linear transformation. Now, if b = (p_0, p_1, p_2, p_3) is the basis defined in §4e#3, we have
Dp_0 = 0,  Dp_1 = p_0,  Dp_2 = 2p_1,  Dp_3 = 3p_2,
and therefore
cv_b(Dp_0) = [0; 0; 0; 0],  cv_b(Dp_1) = [1; 0; 0; 0],  cv_b(Dp_2) = [0; 2; 0; 0],  cv_b(Dp_3) = [0; 0; 3; 0].
Thus the matrix of D relative to b and b is
M_bb(D) = [0 1 0 0; 0 0 2 0; 0 0 0 3; 0 0 0 0].
(Note that the domain and codomain are equal in this example; so we may
use the same basis for each. We could have, if we had wished, used a basis
for V considered as the codomain of D diﬀerent from the basis used for V
considered as the domain of D. Of course, this would result in a diﬀerent
matrix.)
By linearity of D we have, for all a_i ∈ R,
D(a_0 p_0 + a_1 p_1 + a_2 p_2 + a_3 p_3) = a_0 Dp_0 + a_1 Dp_1 + a_2 Dp_2 + a_3 Dp_3 = a_1 p_0 + 2a_2 p_1 + 3a_3 p_2,
so that if cv_b(p) = [a_0; a_1; a_2; a_3] then cv_b(Dp) = [a_1; 2a_2; 3a_3; 0]. To verify the assertion of the theorem in this case it remains to observe that
[a_1; 2a_2; 3a_3; 0] = [0 1 0 0; 0 0 2 0; 0 0 0 3; 0 0 0 0][a_0; a_1; a_2; a_3].
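The verification in #1 can also be checked numerically. The following is an illustrative sketch in plain Python (the helper names mat_vec and derivative_coeffs are ours, not from the text): a polynomial a_0 + a_1 x + a_2 x² + a_3 x³ is represented by its coordinate vector relative to b, and multiplying by M_bb(D) does indeed produce the coordinates of the derivative.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

# M_bb(D) as computed in Example #1.
M_bb_D = [
    [0, 1, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
]

def derivative_coeffs(a):
    """Coordinates of D(a0 p0 + a1 p1 + a2 p2 + a3 p3) relative to b."""
    return [a[1], 2 * a[2], 3 * a[3], 0]

a = [5, 7, -2, 4]  # an arbitrary test polynomial 5 + 7x - 2x^2 + 4x^3
assert mat_vec(M_bb_D, a) == derivative_coeffs(a)
```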
#2 Let b = ([1; 0; 1], [2; 1; 1], [1; 1; 1]), a basis of R³, and let s be the standard basis of R³. Let i: R³ → R³ be the identity transformation, defined by i(v) = v for all v ∈ R³. Calculate M_sb(i) and M_bs(i). Calculate also the coordinate vector of [−2; −1; 2] relative to b.
−−  We proved in §4e#2 that b is a basis. The i-th column of M_sb(i) is the coordinate vector cv_s(i(v_i)), where b = (v_1, v_2, v_3). But, by 4.19.2 and the definition of i,
cv_s(i(v)) = cv_s(v) = v   (for all v ∈ R³)
and it follows that
M_sb(i) = [1 2 1; 0 1 1; 1 1 1].
The i-th column of M_bs(i) is cv_b(i(e_i)) = cv_b(e_i), where e_1, e_2 and e_3 are the vectors of s. Hence we must solve the equations
(∗)
[1; 0; 0] = x_11[1; 0; 1] + x_21[2; 1; 1] + x_31[1; 1; 1]
[0; 1; 0] = x_12[1; 0; 1] + x_22[2; 1; 1] + x_32[1; 1; 1]
[0; 0; 1] = x_13[1; 0; 1] + x_23[2; 1; 1] + x_33[1; 1; 1].
The solution [x_11; x_21; x_31] of the first of these equations gives the first column of M_bs(i), the solution of the second equation gives the second column, and likewise for the third. In matrix notation the equations can be rewritten as
[1; 0; 0] = [1 2 1; 0 1 1; 1 1 1][x_11; x_21; x_31],
[0; 1; 0] = [1 2 1; 0 1 1; 1 1 1][x_12; x_22; x_32],
[0; 0; 1] = [1 2 1; 0 1 1; 1 1 1][x_13; x_23; x_33],
or, more succinctly still,
[1 0 0; 0 1 0; 0 0 1] = [1 2 1; 0 1 1; 1 1 1][x_11 x_12 x_13; x_21 x_22 x_23; x_31 x_32 x_33]
in which the matrix (x_ij) is M_bs(i). Thus
M_bs(i) = [1 2 1; 0 1 1; 1 1 1]⁻¹ = [0 −1 1; 1 0 −1; −1 1 1]
as calculated in §4e#2. (This is an instance of the general fact that if T: V → W is a vector space isomorphism and b, c are bases for V, W then M_bc(T⁻¹) = M_cb(T)⁻¹. The present example is slightly degenerate since i⁻¹ = i. Note also that if we had not already proved that b is a basis, the fact that the equations (∗) have a solution proves it. For, once the e_i have been expressed as linear combinations of b it is immediate that b spans R³, and since R³ has dimension three it then follows from 4.12 that b is a basis.)
Finally, with v = [−2; −1; 2], 7.1 yields that
cv_b(v) = cv_b(i(v)) = M_bs(i) cv_s(i(v)) = M_bs(i) v
= [0 −1 1; 1 0 −1; −1 1 1][−2; −1; 2] = [3; −4; 3].
−−<
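As a sketch in plain Python (helper names ours), one can confirm that the two transition matrices found in #2 are mutually inverse and recompute the coordinate vector of v relative to b:

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

M_sb = [[1, 2, 1], [0, 1, 1], [1, 1, 1]]      # columns are the vectors of b
M_bs = [[0, -1, 1], [1, 0, -1], [-1, 1, 1]]   # its inverse, from §4e#2

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert mat_mul(M_sb, M_bs) == identity  # M_sb and M_bs are inverse to each other
assert mat_mul(M_bs, M_sb) == identity

v = [-2, -1, 2]
assert mat_vec(M_bs, v) == [3, -4, 3]   # cv_b(v), as computed in the text
```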
Let T: Fⁿ → Fᵐ be an arbitrary linear transformation, and let b and c be the standard bases of Fⁿ and Fᵐ respectively. If M is the matrix of T relative to c and b then by 4.19.2 we have, for all v ∈ Fⁿ,
T(v) = cv_c(T(v)) = M_cb(T) cv_b(v) = Mv.
Thus we have proved
7.2 Proposition  Every linear transformation from Fⁿ to Fᵐ is given by multiplication by an m × n matrix; furthermore, the matrix involved is the matrix of the transformation relative to the standard bases.
7.3 Definition  Let b and c be bases for the vector space V. Let i be the identity transformation from V to V and let M_cb = M_cb(i), the matrix of i relative to c and b. We call M_cb the transition matrix from b-coordinates to c-coordinates.
The reason for this terminology is clear: if the coordinate vector of v relative to b is multiplied by M_cb the result is the coordinate vector of v relative to c.
7.4 Proposition  In the notation of 7.3 we have
cv_c(v) = M_cb cv_b(v)   for all v ∈ V.
The proof of this is trivial. (See #2 above.)
§7b Multiplication of transformations and matrices
We saw in Exercise 10 of Chapter Three that if φ: U → V and ψ: V → W are linear transformations then their product is also a linear transformation. (Remember that multiplication of linear transformations is composition; ψφ: U → W is φ followed by ψ.) In view of the correspondence between matrices and linear transformations given in the previous section it is natural to seek a formula for the matrix of ψφ in terms of the matrices of ψ and φ. The answer is natural too: the matrix of a product is the product of the matrices. Of course this is no fluke: matrix multiplication is defined the way it is just so that it corresponds like this to composition of linear transformations.
7.5 Theorem  Let U, V, W be vector spaces with bases b, c, d respectively, and let φ: U → V and ψ: V → W be linear transformations. Then
M_db(ψφ) = M_dc(ψ) M_cb(φ).
Proof.  Let b = (u_1, . . . , u_m), c = (v_1, . . . , v_n) and d = (w_1, . . . , w_p). Let the (i, j)-th entries of M_dc(ψ), M_cb(φ) and M_db(ψφ) be (respectively) α_ij, β_ij and γ_ij. Thus, by the definition of the matrix of a transformation,
ψ(v_k) = ∑_{i=1}^{p} α_ik w_i   (for k = 1, 2, . . . , n),   (a)
φ(u_j) = ∑_{k=1}^{n} β_kj v_k   (for j = 1, 2, . . . , m),   (b)
and
(ψφ)(u_j) = ∑_{i=1}^{p} γ_ij w_i   (for j = 1, 2, . . . , m).   (c)
But, by the definition of the product of linear transformations,
(ψφ)(u_j) = ψ(φ(u_j))
          = ψ(∑_{k=1}^{n} β_kj v_k)   (by (b))
          = ∑_{k=1}^{n} β_kj ψ(v_k)   (by linearity of ψ)
          = ∑_{k=1}^{n} β_kj ∑_{i=1}^{p} α_ik w_i   (by (a))
          = ∑_{i=1}^{p} (∑_{k=1}^{n} α_ik β_kj) w_i,
and comparing with (c) we have (by 4.15 and the fact that the w_i form a basis) that
γ_ij = ∑_{k=1}^{n} α_ik β_kj
for all i = 1, 2, . . . , p and j = 1, 2, . . . , m. Since the right hand side of this equation is precisely the formula for the (i, j)-th entry of the product of the matrices M_dc(ψ) = (α_ij) and M_cb(φ) = (β_ij) it follows that M_db(ψφ) = M_dc(ψ) M_cb(φ).
Chapter Seven: Matrices and Linear Transformations 155
C
o
p
y
r
i
g
h
t
U
n
i
v
e
r
s
i
t
y
o
f
S
y
d
n
e
y
S
c
h
o
o
l
o
f
M
a
t
h
e
m
a
t
i
c
s
a
n
d
S
t
a
t
i
s
t
i
c
s
Comments
7.5.1 Here is a shorter proof of 7.5. The j-th column of M_dc(ψ) M_cb(φ) is, by the definition of matrix multiplication, the product of M_dc(ψ) and the j-th column of M_cb(φ). Hence, by the definition of M_cb(φ), it equals
M_dc(ψ) cv_c(φ(u_j)),
which in turn equals cv_d(ψ(φ(u_j))) (by 7.1). But this is just cv_d((ψφ)(u_j)), the j-th column of M_db(ψφ).
7.5.2 Observe that the shapes of the matrices are right. Since the dimension of V is n and the dimension of W is p the matrix M_dc(ψ) has n columns and p rows; that is, it is a p × n matrix. Similarly M_cb(φ) is an n × m matrix. Hence the product M_dc(ψ) M_cb(φ) exists and is a p × m matrix, the right shape to be the matrix of a transformation from an m-dimensional space to a p-dimensional space.
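For transformations given by matrix multiplication on Fⁿ (see 7.2), Theorem 7.5 reduces to the statement that A(Bv) = (AB)v. A minimal sketch in plain Python (matrices and helper names are ours, chosen for illustration):

```python
def mat_mul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# psi: R^3 -> R^2 is multiplication by A (2 x 3),
# phi: R^2 -> R^3 is multiplication by B (3 x 2).
A = [[1, 0, 2], [0, 1, -1]]
B = [[1, 1], [2, 0], [0, 3]]
AB = mat_mul(A, B)  # matrix of the composite psi phi: R^2 -> R^2

# Applying the composite agrees with applying phi then psi:
for v in ([1, 0], [0, 1], [4, -5]):
    assert mat_vec(AB, v) == mat_vec(A, mat_vec(B, v))
```

Note the shapes behave as in 7.5.2: A is 2 × 3, B is 3 × 2, and AB is 2 × 2.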
It is trivial that if b is a basis of V and i: V → V the identity then M_bb(i) is the identity matrix (of the appropriate size). Hence,
7.6 Corollary  If f: V → W is an isomorphism then
M_bc(f⁻¹) = (M_cb(f))⁻¹
for any bases b, c of V, W.
Proof.  Let I be the d × d identity matrix, where d = dim V = dim W (see Exercise 1 of Chapter Six). Then by 7.5 we have
M_bc(f⁻¹) M_cb(f) = M_bb(f⁻¹f) = M_bb(i) = I
and
M_cb(f) M_bc(f⁻¹) = M_cc(ff⁻¹) = M_cc(i) = I
whence the result.
As another corollary we discover how changing the bases alters the
matrix of a transformation:
7.7 Corollary  Let T: V → W be a linear transformation. Let b_1, b_2 be bases for V and let c_1, c_2 be bases for W. Then
M_{c_2 b_2}(T) = M_{c_2 c_1} M_{c_1 b_1}(T) M_{b_1 b_2}.
Proof.  By their definitions
M_{c_2 c_1} = M_{c_2 c_1}(i_W)   and   M_{b_1 b_2} = M_{b_1 b_2}(i_V)
(where i_V and i_W are the identity transformations); so 7.5 gives
M_{c_2 c_1} M_{c_1 b_1}(T) M_{b_1 b_2} = M_{c_2 b_2}(i_W T i_V),
and since i_W T i_V = T the result follows.
Our next corollary is a special case of the last; it will be of particular
importance later.
7.8 Corollary  Let b and c be bases of V and let T: V → V be linear. Let X = M_bc be the transition matrix from c-coordinates to b-coordinates. Then
M_cc(T) = X⁻¹ M_bb(T) X.
Proof.  By 7.6 we see that X⁻¹ = M_cb, so that the result follows immediately from 7.7.
Example
#3 Let f: R³ → R³ be defined by
f[x; y; z] = [8 85 −9; 0 −2 0; 6 49 −7][x; y; z]
and let b = ([1; 0; 1], [3; 0; 2], [−4; 1; 5]). Prove that b is a basis for R³ and calculate M_bb(f).
−− As in previous examples, b will be a basis if and only if the matrix
with the elements of b as its columns is invertible. If it is invertible it is the
transition matrix M_sb, and its inverse is M_bs. Now
[1 3 −4 | 1 0 0; 0 0 1 | 0 1 0; 1 2 5 | 0 0 1]
   → (R_3 := R_3 − R_1)
[1 3 −4 | 1 0 0; 0 0 1 | 0 1 0; 0 −1 9 | −1 0 1]
   → (R_2 ↔ R_3, R_2 := −R_2)
[1 3 −4 | 1 0 0; 0 1 −9 | 1 0 −1; 0 0 1 | 0 1 0]
   → (R_2 := R_2 + 9R_3, R_1 := R_1 + 4R_3, R_1 := R_1 − 3R_2)
[1 0 0 | −2 −23 3; 0 1 0 | 1 9 −1; 0 0 1 | 0 1 0],
so that b is a basis. Moreover, by 7.8,
M_bb(f) = [−2 −23 3; 1 9 −1; 0 1 0][8 85 −9; 0 −2 0; 6 49 −7][1 3 −4; 0 0 1; 1 2 5]
        = [−2 −23 3; 1 9 −1; 0 1 0][−1 6 8; 0 0 −2; −1 4 −10]
        = [−1 0 0; 0 2 0; 0 0 −2].
−−<
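The arithmetic in #3 can be checked mechanically. A plain-Python sketch (function and variable names are ours) that verifies both the inverse and the change-of-basis product X⁻¹ M_ss(f) X from 7.8:

```python
def mat_mul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[8, 85, -9], [0, -2, 0], [6, 49, -7]]     # matrix of f relative to the standard basis
X = [[1, 3, -4], [0, 0, 1], [1, 2, 5]]         # M_sb: columns are the vectors of b
X_inv = [[-2, -23, 3], [1, 9, -1], [0, 1, 0]]  # M_bs, found by row reduction above

assert mat_mul(X, X_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

M_bb_f = mat_mul(X_inv, mat_mul(A, X))         # 7.8: M_bb(f) = X^-1 M_bb-standard X
assert M_bb_f == [[-1, 0, 0], [0, 2, 0], [0, 0, -2]]
```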
§7c The Main Theorem on Linear Transformations
If A and B are finite sets with the same number of elements then a function from A to B which is one-to-one is necessarily onto as well, and, likewise, a function from A to B which is onto is necessarily one-to-one. There is an analogous result for linear transformations between finite dimensional vector spaces: if V and U are spaces of the same finite dimension then a linear transformation from V to U is one-to-one if and only if it is onto. More generally, if V and U are arbitrary finite dimensional vector spaces and φ: V → U a linear transformation then the dimension of the image of φ equals the dimension of V minus the dimension of the kernel of φ. This is one of the assertions of the principal result of this section:
7.9 The Main Theorem on Linear Transformations Let V and U
be vector spaces over the ﬁeld F and φ: V → U a linear transformation. Let
W be a subspace of V complementing ker φ. Then
(i) W is isomorphic to imφ,
(ii) if V is ﬁnite dimensional then
dimV = dim(imφ) + dim(ker φ).
Comments
7.9.1 Recall that ‘W complements ker φ’ means that V = W ⊕ ker φ.
We have proved in 6.10 that if V is ﬁnite dimensional then there always exists
such a W. In fact it is also true in the inﬁnite dimensional case, but the proof
(using Zorn’s Lemma) is beyond the scope of this course.
7.9.2 In physical examples the dimension of the space V can be thought
of intuitively as the number of degrees of freedom of the system. Then the
dimension of ker φ is the number of degrees of freedom which are killed by φ
and the dimension of imφ the number which survive φ.
Proof. (i) Deﬁne ψ: W → imφ by
ψ(v) = φ(v) for all v ∈ W.
Thus ψ is simply φ with the domain restricted to W and the codomain
restricted to imφ. Note that it is legitimate to restrict the codomain like this
since φ(v) ∈ imφ for all v ∈ W. We prove that ψ is an isomorphism.
Since φ is linear, ψ is certainly linear also:
ψ(λx +µy) = φ(λx +µy) = λφ(x) +µφ(y) = λψ(x) +µψ(y)
for all x, y ∈ W and λ, µ ∈ F. We have
ker ψ = { x ∈ W | ψ(x) = 0 } = { x ∈ W | φ(x) = 0 } = W ∩ ker φ = {0}
by 6.6, since the sum of W and ker φ is direct. Hence ψ is injective, by 3.15,
and it remains to prove that ψ is surjective.
Let u ∈ imφ. Then u = φ(v) for some v ∈ V , and since V = W +ker φ
we may write v = w +z with w ∈ W and z ∈ ker φ. Then we have
u = φ(v) = φ(w +z) = φ(w) +φ(z) = φ(w) + 0
since z ∈ ker φ, and so u = φ(w) = ψ(w). This holds for all u ∈ imφ; so
we have shown that ψ: W → imφ is surjective, as required. Hence W is
isomorphic to imφ.
(ii) Let (x_1, x_2, . . . , x_r) be a basis for W. Since ψ is an isomorphism it follows from 4.17 that (ψ(x_1), ψ(x_2), . . . , ψ(x_r)) is a basis for im φ, and so dim(im φ) = dim W. Hence 6.9 yields
dim V = dim W + dim(ker φ) = dim(im φ) + dim(ker φ).
Example
#4 Let φ: R³ → R³ be defined by
φ[x; y; z] = [1 1 1; 1 1 1; 2 2 2][x; y; z].
It is easily seen that the image of φ consists of all scalar multiples of the column [1; 1; 2], and is therefore a space of dimension 1. By the Main Theorem the kernel of φ must have dimension 2, and, indeed, it is easily checked that ([1; −1; 0], [0; 1; −1]) is a basis for ker φ.
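A small numerical sanity check of #4 in plain Python (names ours): the two claimed kernel vectors are killed by the matrix, and every column is a multiple of (1, 1, 2), so the image has dimension 1 and the dimensions add up as the Main Theorem predicts (3 = 1 + 2).

```python
def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[1, 1, 1], [1, 1, 1], [2, 2, 2]]

# The claimed basis vectors of ker(phi) are mapped to zero:
assert mat_vec(A, [1, -1, 0]) == [0, 0, 0]
assert mat_vec(A, [0, 1, -1]) == [0, 0, 0]

# Every column of A is a multiple of (1, 1, 2), so dim(im phi) = 1:
for j in range(3):
    col = [A[i][j] for i in range(3)]
    assert col == [col[0], col[0], 2 * col[0]]
```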
In our treatment of the Main Theorem we have avoided mentioning quotient spaces, so that §6c would not be a prerequisite. This does, however, distort the picture somewhat; a more natural version of the theorem is as follows:
7.10 Theorem  Let V and U be vector spaces and φ: V → U a linear transformation. Then the image of φ is isomorphic to the quotient of V by the kernel of φ. That is, symbolically, im φ ≅ V/ker φ.
Proof. Let I = imφ and K = ker φ. Elements of V/K are cosets, of the
form v +K where v ∈ V . Now if x ∈ K then
φ(v +x) = φ(v) +φ(x) = φ(v) + 0 = φ(v)
and so it follows that φ maps all elements of the coset v + K to the same element u = φ(v) in the subspace I of U. Hence there is a well-defined mapping ψ: (V/K) → I satisfying
ψ(v + K) = φ(v)   for all v ∈ V.
Linearity of ψ follows easily from linearity of φ and the deﬁnitions of
addition and scalar multiplication for cosets:
ψ(λ(x +K) +µ(y +K)) = ψ((λx +µy) +K) = φ(λx +µy)
= λφ(x) +µφ(y) = λψ(x +K) +µψ(y +K)
for all x, y ∈ V and λ, µ ∈ F.
Since every element of I has the form φ(v) = ψ(v +K) it is immediate
that ψ is surjective. Moreover, if v+K ∈ ker ψ then since φ(v) = ψ(v+K) = 0
it follows that v ∈ K, whence v + K = K, the zero element of V/K. Hence
the kernel of ψ consists of the zero element alone, and therefore ψ is injective.
We leave to the exercises the proof of the following consequence of the
Main Theorem.
7.11 Proposition If S and T are subspaces of V then
dim(S +T) + dim(S ∩ T) = dimS + dimT.
Another consequence of the Main Theorem is that it is possible to
choose bases of the domain and codomain so that the matrix of a linear
transformation has a particularly simple form.
7.12 Theorem  Let V, U be vector spaces of dimensions n, m respectively, and φ: V → U a linear transformation. Then there exist bases b, c of V, U and a positive integer r such that
M_cb(φ) = [I_{r×r}  0_{r×(n−r)}; 0_{(m−r)×r}  0_{(m−r)×(n−r)}]
where I_{r×r} is an r × r identity matrix, and the other blocks are zero matrices of the sizes indicated. Thus, the (i, j)-entry of M_cb(φ) is zero unless i = j ≤ r, in which case it is one.
Proof.  By 6.10 we may write V = W ⊕ ker φ, and, as in the proof of the Main Theorem, let (x_1, x_2, . . . , x_r) be a basis of W. Choose any basis of ker φ; by 6.9 it will have n − r elements, and combining it with the basis of W will give a basis of V. Let (x_{r+1}, . . . , x_n) be the chosen basis of ker φ, and let b = (x_1, x_2, . . . , x_n) be the resulting basis of V.
As in the proof of the Main Theorem, (φ(x_1), . . . , φ(x_r)) is a basis for the subspace im φ of U. By 4.10 we may choose u_{r+1}, u_{r+2}, . . . , u_m ∈ U so that
c = (φ(x_1), φ(x_2), . . . , φ(x_r), u_{r+1}, u_{r+2}, . . . , u_m)
is a basis for U.
The bases b and c having been defined, it remains to calculate the matrix of φ and check that it is as claimed. By definition the i-th column of M_cb(φ) is the coordinate vector of φ(x_i) relative to c. If r + 1 ≤ i ≤ n then x_i ∈ ker φ and so φ(x_i) = 0. So the last n − r columns of the matrix are zero. If 1 ≤ i ≤ r then φ(x_i) is actually in the basis c; in this case the solution of
φ(x_i) = λ_1 φ(x_1) + · · · + λ_r φ(x_r) + λ_{r+1} u_{r+1} + · · · + λ_m u_m
is λ_i = 1 and λ_j = 0 for j ≠ i. Thus for 1 ≤ i ≤ r the i-th column of the matrix has a 1 in the i-th position and zeros elsewhere.
Comment
7.12.1 The number r which appears in 7.12 is called the rank of φ. Note
that r is an invariant of φ, in the sense that it depends only on φ and not on
the bases b and c. Indeed, r is just the dimension of the image of φ.
§7d Rank and nullity of matrices
Because of the connection between matrices and linear transformations,
the Main Theorem on Linear Transformations ought to have an analogue for
matrices. This section is devoted to the relevant concepts. Our ﬁrst aim is
to prove the following important theorem:
7.13 Theorem Let A be any rectangular matrix over the ﬁeld F. The
dimension of the column space of A is equal to the dimension of the row
space of A.
7.14 Definition The rank of A, rank(A), is deﬁned to be the dimension
of the row space of A.
In view of 7.13 we could equally well deﬁne the rank to be the dimension
of the column space. However, we are not yet ready to prove 7.13; there are
some preliminary results we need ﬁrst. The following is trivial:
7.15 Proposition The dimension of RS(A) does not exceed the number
of rows of A, and the dimension of CS(A) does not exceed the number of
columns.
Proof. Immediate from 4.12 (i), since the rows span the row space and the
columns span the column space.
We proved in Chapter Three (see 3.21) that if A and B are matrices
such that AB exists then the row space of AB is contained in the row space
of B, and the two are equal if A is invertible. It is natural to ask whether
there is any relationship between the column space of AB and the column
space of B.
7.16 Proposition  If A ∈ Mat(m × n, F) and B ∈ Mat(n × p, F) then φ(x) = Ax defines a surjective linear transformation from CS(B) to CS(AB).
Proof.  Let the columns of B be x_1, x_2, . . . , x_p. Then the columns of AB are Ax_1, Ax_2, . . . , Ax_p. Now if v ∈ CS(B) then v = ∑_{i=1}^{p} λ_i x_i for some λ_i ∈ F, and so
φ(v) = Av = A(∑_{i=1}^{p} λ_i x_i) = ∑_{i=1}^{p} λ_i Ax_i ∈ CS(AB).
Thus φ is a function from CS(B) to CS(AB). It is trivial that φ is linear:
φ(λx +µy) = A(λx +µy) = λ(Ax) +µ(Ay) = λφ(x) +µφ(y).
It remains to prove that φ is surjective. If y ∈ CS(AB) is arbitrary then y must be a linear combination of the Ax_i (the columns of AB); so for some scalars λ_i,
y = ∑_{i=1}^{p} λ_i Ax_i = A(∑_{i=1}^{p} λ_i x_i).
Defining x = ∑_{i=1}^{p} λ_i x_i we have that x ∈ CS(B), and y = Ax = φ(x) ∈ im φ, as required.
7.17 Corollary The dimension of CS(AB) is less than or equal to the
dimension of CS(B).
Proof. This follows from the Main Theorem on Linear Transformations:
since φ is surjective we have
dimCS(B) = dim(CS(AB)) + dim(ker φ),
and since dim(ker φ) is certainly nonnegative the result follows.
If the matrix A is invertible then we may apply 7.17 with B replaced by AB and A replaced by A⁻¹ to deduce that dim(CS(A⁻¹AB)) ≤ dim(CS(AB)), and combining this with 7.17 itself gives
7.18 Corollary  If A is invertible then CS(B) and CS(AB) have the same dimension.
We have already seen that elementary row operations do not change the
row space at all; so they certainly do not change the dimension of the row
space. In 7.18 we have shown that they do not change the dimension of the
column space either. Note, however, that the column space itself is changed
by row operations, it is only the dimension of the column space which is
unchanged.
The next three results are completely analogous to 7.16, 7.17 and 7.18,
and so we omit the proofs.
7.19 Proposition If A and B are matrices such that AB exists then
x → xB is a surjective linear transformation from RS(A) to RS(AB).
7.20 Corollary The dimension of RS(AB) is less than or equal to the
dimension of RS(A).
7.21 Corollary If B is invertible then dim(RS(AB)) = dim(RS(A)).
By 7.18 and 7.21 we see that premultiplying and postmultiplying by
invertible matrices both leave unchanged the dimensions of the row space
and column space. Given a matrix A, then, it is natural to look for invertible
matrices P and Q for which PAQ is as simple as possible. Then, with any
luck, it may be obvious that dim(RS(PAQ)) = dim(CS(PAQ)). If so we will
have achieved our aim of proving that dim(RS(A)) = dim(CS(A)). Theorem
7.12 readily provides the result we seek:
7.22 Theorem  Let A ∈ Mat(m × n, F). Then there exist invertible matrices P ∈ Mat(m × m, F) and Q ∈ Mat(n × n, F) such that
PAQ = [I_{r×r}  0_{r×(n−r)}; 0_{(m−r)×r}  0_{(m−r)×(n−r)}]
where the integer r equals both the dimension of the row space of A and the dimension of the column space of A.
Proof.  Let φ: Fⁿ → Fᵐ be given by φ(x) = Ax. Then φ is a linear transformation, and A = M_{s_2 s_1}(φ), where s_1, s_2 are the standard bases of Fⁿ, Fᵐ. (See 7.2 above.) By 7.12 there exist bases b, c of Fⁿ, Fᵐ such that M_cb(φ) = [I 0; 0 0], and if we let P and Q be the transition matrices M_{c s_2} and M_{s_1 b} (respectively) then 7.7 gives
PAQ = M_{c s_2} M_{s_2 s_1}(φ) M_{s_1 b} = M_cb(φ) = [I 0; 0 0].
Note that since P and Q are transition matrices they are invertible (by 7.6); hence it remains to prove that if the identity matrix I in the above equation is of shape r × r, then the row space of A and the column space of A both have dimension r.
By 7.21 and 3.22 we know that RS(PAQ) and RS(A) have the same dimension. But the row space of [I 0; 0 0] consists of all n-component rows of the form (α_1, . . . , α_r, 0, . . . , 0), and is clearly an r-dimensional space isomorphic to ᵗFʳ. Indeed, the nonzero rows of [I 0; 0 0], that is, the first r rows, are linearly independent, and therefore form a basis for the row space. Hence r = dim(RS(PAQ)) = dim(RS(A)), and by totally analogous reasoning we also obtain that r = dim(CS(PAQ)) = dim(CS(A)).
Comments
7.22.1 We have now proved 7.13.
7.22.2 The proof of 7.22 also shows that the rank of a linear transformation (see 7.12.1) is the same as the rank of the matrix of that linear transformation, independent of which bases are used.
7.22.3 There is a more direct computational proof of 7.22, as follows.
Apply row operations to A to obtain a reduced echelon matrix. As we have
seen in Chapter One, row operations are performed by premultiplying by
invertible matrices; so the echelon matrix obtained has the form PA with P
invertible. Now apply column operations to PA; this amounts to postmultiplying
by an invertible matrix. It can be seen that a suitable choice of column
operations will transform an echelon matrix to the desired form.
As an easy consequence of the above results we can deduce the following
important fact:
7.23 Theorem  An n × n matrix has an inverse if and only if its rank is n.
Proof.  Let A ∈ Mat(n × n, F) and suppose that rank(A) = n. By 7.22 there exist n × n invertible P and Q with PAQ = I. Multiplying this equation on the left by P⁻¹ and on the right by P gives P⁻¹PAQP = P⁻¹P, and hence A(QP) = I. We conclude that A is invertible (QP being the inverse).
Conversely, suppose that A is invertible, and let B = A⁻¹. By 7.20 we know that rank(AB) ≤ rank(A), and by 7.15 the number of rows of A is an upper bound on the rank of A. So n ≥ rank(A) ≥ rank(AB) = rank(I) = n since (reasoning as in the proof of 7.22) it is easily shown that the n × n identity has rank n. Hence rank(A) = n.
Let A be an m × n matrix over F and let φ: F^n → F^m be the linear transformation given by multiplication by A. Since
$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}
= \sum_{i=1}^{n}
\begin{pmatrix} a_{1i}\\ a_{2i}\\ \vdots\\ a_{mi} \end{pmatrix} x_i
$$
we see that for every v ∈ F^n the column φ(v) is a linear combination of the columns of A, and, conversely, every linear combination of the columns of A is φ(v) for some v ∈ F^n. In other words, the image of φ is just the column space of A. The kernel of φ involves us with a new concept:
7.24 Definition The right null space of a matrix A ∈ Mat(m × n, F) is the set RN(A) consisting of all v ∈ F^n such that Av = 0. The dimension of RN(A) is called the right nullity, rn(A), of A. Similarly, the left null space LN(A) is the set of all w ∈ ^tF^m such that wA = 0, and its dimension is called the left nullity, ln(A), of A.
It is obvious that if A and φ are as in the above discussion, the kernel of φ is the right null space of A. The domain of φ is F^n, which has dimension equal to n, the number of columns of A. Hence the Main Theorem on Linear Transformations applied to φ gives:
7.25 Theorem For any rectangular matrix A over the field F, the rank of A plus the right nullity of A equals the number of columns of A. That is,
$$\operatorname{rank}(A) + \operatorname{rn}(A) = n$$
for all A ∈ Mat(m × n, F).
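Theorem 7.25 can be observed on concrete matrices. The sketch below is our own illustration, not part of the text: the reduced echelon form yields the rank (the number of pivot columns) and a basis of the right null space (one basis vector per free column), so rank(A) + rn(A) = n falls out by construction.

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form over Q; returns (matrix, pivot_columns)."""
    M = [[Fraction(x) for x in row] for row in A]
    pivots = []
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def right_null_space(A):
    """Basis of RN(A) = { v : A v = 0 }, read off from the rref of A."""
    n = len(A[0])
    M, pivots = rref(A)
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for r, c in enumerate(pivots):
            v[c] = -M[r][f]  # solve each pivot variable in terms of the free one
        basis.append(v)
    return basis
```

For A = [[1, 2, 3], [2, 4, 6]] there is one pivot column and two free columns, so rank(A) = 1 and rn(A) = 2, in agreement with the theorem.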
Comment
7.25.1 Similarly, the linear transformation x → xA has kernel equal to the left null space of A and image equal to the row space of A, and it follows that rank(A) + ln(A) = m, the number of rows of A. Combining this equation with the one in 7.25 gives ln(A) − rn(A) = m − n.
Exercises
1. Let U be the vector space consisting of all functions f: R → R which satisfy the differential equation f″(t) − f(t) = 0, and assume that b_1 = (u_1, u_2) and b_2 = (v_1, v_2) are two different bases of U, where u_1, u_2, v_1, v_2 are defined as follows:
$$u_1(t) = e^t \qquad u_2(t) = e^{-t} \qquad v_1(t) = \cosh(t) \qquad v_2(t) = \sinh(t)$$
for all t ∈ R. Find the (b_1, b_2)-transition matrix.
2. Let f, g, h be the functions from R to R defined by
$$f(t) = \sin(t), \qquad g(t) = \sin\Bigl(t + \frac{\pi}{4}\Bigr), \qquad h(t) = \sin\Bigl(t + \frac{\pi}{2}\Bigr).$$
(i) What is the dimension of the vector space V (over R) spanned by (f, g, h)? Determine two different bases b_1 and b_2 of V.
(ii) Show that differentiation yields a linear transformation from V to V, and calculate the matrix of this transformation relative to the bases b_1 and b_2 found in (i). (That is, use b_1 as the basis of V considered as the domain, b_2 as the basis of V considered as the codomain.)
3. Regard C as a vector space over R. Show that the function θ: C → C defined by θ(z) = z̄ (the complex conjugate of z) is a linear transformation, and calculate the matrix of θ relative to the basis (1, i) (for both the domain and codomain).
4. Let b_1 = (v_1, v_2, . . . , v_n) and b_2 = (w_1, w_2, . . . , w_n) be two bases of a vector space V. Prove that
$$M_{b_1 b_2} = \bigl(M_{b_2 b_1}\bigr)^{-1}.$$
5. Let θ: R^3 → R^2 be defined by
$$\theta\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}1&1&1\\2&0&4\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}.$$
Calculate the matrix of θ relative to the bases
$$\left(\begin{pmatrix}1\\-1\\1\end{pmatrix},\ \begin{pmatrix}3\\-2\\2\end{pmatrix},\ \begin{pmatrix}-2\\4\\-3\end{pmatrix}\right) \quad\text{and}\quad \left(\begin{pmatrix}0\\1\end{pmatrix},\ \begin{pmatrix}1\\1\end{pmatrix}\right)$$
of R^3 and R^2.
6. (i) Let φ: R^2 → R be defined by $\varphi\begin{pmatrix}x\\y\end{pmatrix} = (\,1\ \ 2\,)\begin{pmatrix}x\\y\end{pmatrix}$. Calculate the matrix of φ relative to the bases $\left(\begin{pmatrix}0\\1\end{pmatrix},\ \begin{pmatrix}1\\1\end{pmatrix}\right)$ and (−1) of R^2 and R.
(ii) With φ as in (i) and θ as in Exercise 5, calculate φθ and its matrix relative to the two given bases. Hence verify Theorem 7.5 in this case.
7. Let S and T be subspaces of a vector space V and let U be a subspace of T such that T = (S ∩ T) ⊕ U. Prove that S + T = S ⊕ U, and hence deduce that dim(S + T) = dim S + dim T − dim(S ∩ T).
8. Give an alternative proof of the result of the previous exercise by use of
the Main Theorem and Exercise 14 of Chapter Six.
9. Prove that the rank of a linear transformation is an invariant. (See
7.12.1.)
10. Let S = span(v_1, v_2, . . . , v_s) and T = span(w_1, w_2, . . . , w_t), where the v_i and w_j are n-component rows over F. Form the matrix
$$\begin{pmatrix}
v_1 & v_1\\
v_2 & v_2\\
\vdots & \vdots\\
v_s & v_s\\
w_1 & 0\\
w_2 & 0\\
\vdots & \vdots\\
w_t & 0
\end{pmatrix}$$
and use row operations to reduce it to echelon form. If the result is
$$\begin{pmatrix}
x_1 & *\\
\vdots & \vdots\\
x_l & *\\
0 & y_1\\
\vdots & \vdots\\
0 & y_k\\
0 & 0\\
\vdots & \vdots\\
0 & 0
\end{pmatrix}$$
then (x_1, x_2, . . . , x_l) is a basis for S + T and (y_1, y_2, . . . , y_k) is a basis for S ∩ T.
Do this for the following example:
v_1 = ( 1  1  1  1 )        w_1 = ( 0  −2  −4  1 )
v_2 = ( 2  0  −2  3 )       w_2 = ( 1  5  9  −1 )
v_3 = ( 1  3  5  0 )        w_3 = ( 3  −3  −9  6 )
v_4 = ( 7  −3  8  −1 )      w_4 = ( 1  1  0  0 ).
Think about why it works.
11. Let V, W be vector spaces over the field F and let b = (v_1, v_2, . . . , v_n), c = (w_1, w_2, . . . , w_m) be bases of V, W respectively. Let L(V, W) be the set of all linear transformations from V to W, and let Mat(m × n, F) be the set of all m × n matrices over F.
Prove that the function Ω: L(V, W) → Mat(m × n, F) defined by Ω(θ) = M_{bc}(θ) is a vector space isomorphism. Find a basis for L(V, W).
(Hint: Elements of the basis must be linear transformations from V to W, and to describe a linear transformation θ from V to W it suffices to specify θ(v_i) for each i. First find a basis for Mat(m × n, F) and use the isomorphism above to get the required basis of L(V, W).)
12. For each of the following linear transformations calculate the dimensions of the kernel and image, and check that your answers are in agreement with the Main Theorem on Linear Transformations.
(i) θ: R^4 → R^2 given by $\theta\begin{pmatrix}x\\y\\z\\w\end{pmatrix} = \begin{pmatrix}2&-1&3&5\\1&-1&1&0\end{pmatrix}\begin{pmatrix}x\\y\\z\\w\end{pmatrix}$.
(ii) θ: R^2 → R^3 given by $\theta\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}4&-2\\-2&1\\-6&3\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}$.
(iii) θ: V → V given by θ(p(x)) = p′(x), where V is the space of all polynomials over R of degree less than or equal to 3.
13. Suppose that θ: R^6 → R^4 is a linear transformation with kernel of dimension 2. Is θ surjective?
14. Let θ: R^3 → R^3 be defined by $\theta\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}1&-4&2\\-1&-2&2\\1&-5&3\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}$, and let b be the basis of R^3 given by $b = \left(\begin{pmatrix}1\\1\\2\end{pmatrix},\ \begin{pmatrix}1\\1\\1\end{pmatrix},\ \begin{pmatrix}2\\1\\3\end{pmatrix}\right)$. Calculate M_{bb}(θ) and find a matrix X such that
$$X^{-1}\begin{pmatrix}1&-4&2\\-1&-2&2\\1&-5&3\end{pmatrix}X = M_{bb}(\theta).$$
15. Let θ: R^3 → R^4 be defined by
$$\theta\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}1&-1&3\\2&-1&-2\\1&-1&1\\4&-2&-4\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}.$$
Find bases b = (u_1, u_2, u_3) of R^3 and c = (v_1, v_2, v_3, v_4) of R^4 such that the matrix of θ relative to b and c is
$$\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\\0&0&0\end{pmatrix}.$$
16. Let V and W be ﬁnitely generated vector spaces of the same dimension,
and let θ: V → W be a linear transformation. Prove that θ is injective
if and only if it is surjective.
17. Give an example, or prove that no example can exist, for each of the following.
(i) A surjective linear transformation θ: R^8 → R^5 with kernel of dimension 4.
(ii) A ∈ Mat(3 × 2, R), B ∈ Mat(2 × 2, R), C ∈ Mat(2 × 3, R) with
$$ABC = \begin{pmatrix}2&2&2\\3&3&0\\4&0&0\end{pmatrix}.$$
18. Let A be the 50 × 100 matrix with (i, j)-th entry 100(i − 1) + j. (Thus the first row consists of the numbers from 1 to 100, the second consists of the numbers from 101 to 200, and so on.) Let V be the space of all 50-component rows v over R such that vA = 0 and W the space of 100-component columns w over R such that Aw = 0. (That is, V and W are the left null space and right null space of A.) Calculate dim W − dim V.
(Hint: You do not really have to do any calculation!)
19. Let U and V be vector spaces over F and φ: U → V a linear transformation. Let b, b′ be bases for U and d, d′ bases for V. Prove that the matrices M_{bd}(φ) and M_{b′d′}(φ) have the same rank.
8
Permutations and determinants
This chapter is devoted to studying determinants, and in particular to proving the properties which were stated without proof in Chapter Two. Since a proper understanding of determinants necessitates some knowledge of permutations, we investigate these first.
§8a Permutations
Our discussion of this topic will be brief since all we really need is the definition of the parity of a permutation (see Definition 8.3 below), and some of its basic properties.
8.1 Definition A permutation of the set {1, 2, . . . , n} is a bijective function {1, 2, . . . , n} → {1, 2, . . . , n}. The set of all permutations of {1, 2, . . . , n} will be denoted by ‘S_n’.
For example, the function σ: {1, 2, 3} → {1, 2, 3} defined by
$$\sigma(1) = 2, \qquad \sigma(2) = 3, \qquad \sigma(3) = 1$$
is a permutation. In a notation which is commonly used one would write
$$\sigma = \begin{pmatrix}1&2&3\\2&3&1\end{pmatrix}$$
for this permutation. In general, the notation
$$\tau = \begin{pmatrix}1&2&3&\cdots&n\\r_1&r_2&r_3&\cdots&r_n\end{pmatrix}$$
means
$$\tau(1) = r_1, \quad \tau(2) = r_2, \quad \dots, \quad \tau(n) = r_n.$$
Comments
8.1.1 Altogether there are six permutations of {1, 2, 3}, namely
$$\begin{pmatrix}1&2&3\\1&2&3\end{pmatrix}\quad \begin{pmatrix}1&2&3\\1&3&2\end{pmatrix}\quad \begin{pmatrix}1&2&3\\2&1&3\end{pmatrix}\quad \begin{pmatrix}1&2&3\\2&3&1\end{pmatrix}\quad \begin{pmatrix}1&2&3\\3&1&2\end{pmatrix}\quad \begin{pmatrix}1&2&3\\3&2&1\end{pmatrix}.$$
8.1.2 It is usual to give a more general definition than 8.1 above: a permutation of any set S is a bijective function from S to itself. However, we will only need permutations of {1, 2, . . . , n}.
Multiplication of permutations is deﬁned to be composition of functions.
That is,
8.2 Definition For σ, τ ∈ S_n define στ: {1, 2, . . . , n} → {1, 2, . . . , n} by
$$(\sigma\tau)(i) = \sigma(\tau(i)).$$
Comments
8.2.1 We proved in Chapter One that the composite of two bijections is a bijection; hence the product of two permutations is a permutation. Furthermore, for each permutation σ there is an inverse permutation σ^{-1} such that σσ^{-1} = σ^{-1}σ = i (the identity permutation).
In fact, permutations form an algebraic system known as a group. We leave the study of groups to another course.
8.2.2 Unfortunately, there are two conflicting conventions within the mathematical community, both widespread, concerning multiplication of permutations. Some people, perhaps most people, write permutations as right operators, rather than left operators (which is the convention that we will follow). If σ ∈ S_n and i ∈ {1, 2, . . . , n} our convention is to write ‘σ(i)’ for the image of i under the action of σ; the permutation is written to the left of the thing on which it acts. The right operator convention is to write either ‘i^σ’ or ‘iσ’ instead of σ(i). This seems at first to be merely a notational matter of no mathematical consequence. But, naturally enough, right operator people define the product στ of two permutations σ and τ by the rule i^{στ} = (i^σ)^τ, instead of by the rule given in 8.2 above. Thus for right operators, ‘στ’ means ‘apply σ first, then apply τ’, while for left operators it means ‘apply τ first, then apply σ’ (since ‘(στ)(i)’ means ‘σ applied to (τ applied to i)’). The upshot is that we multiply permutations from right to left, while others multiply from left to right.
Example
#1 Compute the following product of permutations:
$$\begin{pmatrix}1&2&3&4\\2&3&4&1\end{pmatrix}\begin{pmatrix}1&2&3&4\\3&2&4&1\end{pmatrix}.$$
−− Let $\sigma = \begin{pmatrix}1&2&3&4\\2&3&4&1\end{pmatrix}$ and $\tau = \begin{pmatrix}1&2&3&4\\3&2&4&1\end{pmatrix}$. Then
$$(\sigma\tau)(1) = \sigma(\tau(1)) = \sigma(3) = 4.$$
The procedure is immediately clear: to find what the product does to 1, first find the number underneath 1 in the right hand factor (it is 3 in this case), then look under this number in the next factor to get the answer. Repeat this process to find what the product does to 2, 3 and 4. We find that
$$\begin{pmatrix}1&2&3&4\\2&3&4&1\end{pmatrix}\begin{pmatrix}1&2&3&4\\3&2&4&1\end{pmatrix} = \begin{pmatrix}1&2&3&4\\4&3&1&2\end{pmatrix}.$$
−−<
Note that the right operator convention leads to exactly the same method for computing products, but starting with the left hand factor and proceeding to the right. This would give
$$\begin{pmatrix}1&2&3&4\\2&4&1&3\end{pmatrix}$$
as the answer.
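The rule of Example #1 can be sketched in a few lines of Python (a hypothetical helper of our own, not notation from the text), with a permutation of {1, . . . , n} stored as its bottom row:

```python
def compose(sigma, tau):
    """Left-operator product: (sigma tau)(i) = sigma(tau(i)).

    A permutation is stored as its bottom row, so sigma[i - 1] is sigma(i).
    """
    return [sigma[t - 1] for t in tau]

sigma = [2, 3, 4, 1]   # the left hand factor from Example #1
tau   = [3, 2, 4, 1]   # the right hand factor
left_product  = compose(sigma, tau)   # our convention: apply tau first
right_product = compose(tau, sigma)   # what the right-operator convention yields
```

With these inputs, compose(sigma, tau) reproduces the bottom row 4, 3, 1, 2 found above, and compose(tau, sigma) the right-operator answer 2, 4, 1, 3.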
#2 It is equally easy to compute a product of more than two factors. Thus, for instance
$$\begin{pmatrix}1&2&3\\3&1&2\end{pmatrix}\begin{pmatrix}1&2&3\\2&1&3\end{pmatrix}\begin{pmatrix}1&2&3\\1&3&2\end{pmatrix}\begin{pmatrix}1&2&3\\3&1&2\end{pmatrix}\begin{pmatrix}1&2&3\\3&2&1\end{pmatrix} = \begin{pmatrix}1&2&3\\2&1&3\end{pmatrix}.$$
Starting from the right one finds, for example, that 3 goes to 1, then to 3, then 2, then 1, then 3.
The notation adopted above is good, but can be somewhat cumbersome. The trouble is that each number in {1, 2, . . . , n} must be written down twice. There is a shorter notation which writes each number down at most once. In this notation the permutation
$$\sigma = \begin{pmatrix}1&2&3&4&5&6&7&8&9&10&11\\8&6&3&2&1&9&7&5&4&11&10\end{pmatrix}$$
would be written as σ = (1, 8, 5)(2, 6, 9, 4)(10, 11). The idea is that the
permutation is made up of disjoint “cycles”. Here the cycle (2, 6, 9, 4) means
that 6 is the image of 2, 9 the image of 6, 4 the image of 9 and 2 the
image of 4 under the action of σ. Similarly (1, 8, 5) indicates that σ maps 1
to 8, 8 to 5 and 5 to 1. Cycles of length one (like (3) and (7) in the above
example) are usually omitted, and the cycles can be written in any order.
The computation of products is equally quick, or quicker, in the disjoint cycle
notation, and (fortunately) the cycles of length one have no eﬀect in such
computations. The reader can check, for example, that if σ = (1)(2, 3)(4)
and τ = (1, 4)(2)(3) then στ and τσ are both equal to (1, 4)(2, 3).
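The passage from the two-row notation to the disjoint cycle notation is purely mechanical; here is an illustrative Python sketch (the name cycles is ours, not notation from the text):

```python
def cycles(perm):
    """Disjoint cycles of a permutation given by its bottom row (1-based)."""
    n = len(perm)
    seen = set()
    out = []
    for start in range(1, n + 1):
        if start in seen:
            continue
        cyc = [start]
        seen.add(start)
        j = perm[start - 1]
        while j != start:          # follow images until the cycle closes
            cyc.append(j)
            seen.add(j)
            j = perm[j - 1]
        if len(cyc) > 1:           # omit cycles of length one, as in the text
            out.append(tuple(cyc))
    return out
```

Applied to the eleven-point permutation above it yields (1, 8, 5)(2, 6, 9, 4)(10, 11), and applied to στ from the last sentence it yields (1, 4)(2, 3).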
8.3 Definition For each σ ∈ S_n define l(σ) to be the number of ordered pairs (i, j) such that 1 ≤ i < j ≤ n and σ(i) > σ(j). If l(σ) is an odd number we say that σ is an odd permutation; if l(σ) is even we say that σ is an even permutation. We define ε: S_n → {+1, −1} by
$$\varepsilon(\sigma) = (-1)^{l(\sigma)} = \begin{cases} +1 & \text{if } \sigma \text{ is even}\\ -1 & \text{if } \sigma \text{ is odd} \end{cases}$$
for each σ ∈ S_n, and call ε(σ) the parity or sign of σ.
The reason for this definition becomes apparent (after a while!) when one considers the polynomial E in n variables x_1, x_2, . . . , x_n defined by
$$E = \prod_{i<j} (x_i - x_j).$$
That is, E is the product of all factors (x_i − x_j) with i < j. In the case n = 4 we would have
$$E = (x_1-x_2)(x_1-x_3)(x_1-x_4)(x_2-x_3)(x_2-x_4)(x_3-x_4).$$
Now take a permutation σ and in the expression for E replace each subscript i by σ(i). For instance, if σ = (4, 2, 1, 3) then applying σ to E in this way gives
$$\sigma(E) = (x_3-x_1)(x_3-x_4)(x_3-x_2)(x_1-x_4)(x_1-x_2)(x_4-x_2).$$
Note that one gets exactly the same factors, except that some of the signs are changed. With some thought it can be seen that this will always be the case. Indeed, the factor (x_i − x_j) in E is transformed into the factor (x_{σ(i)} − x_{σ(j)}) in σ(E), and (x_{σ(i)} − x_{σ(j)}) appears explicitly in the expression for E if σ(i) < σ(j), while (x_{σ(j)} − x_{σ(i)}) appears there if σ(i) > σ(j). Thus the number of sign changes is exactly the number of pairs (i, j) with i < j such that σ(i) > σ(j); that is, the number of sign changes is l(σ). It follows that σ(E) equals E if σ is even and −E if σ is odd; that is, σE = ε(σ)E.
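The quantities l(σ) and ε(σ) are straightforward to compute from Definition 8.3. In the sketch below (our own illustration, with a permutation again given by its bottom row) the cycle σ = (4, 2, 1, 3) used above has bottom row 3, 1, 4, 2, and the code confirms l(σ) = 3, matching the three sign changes just counted in σ(E).

```python
def l(perm):
    """Number of pairs i < j with sigma(i) > sigma(j) (Definition 8.3)."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if perm[i] > perm[j])

def parity(perm):
    """epsilon(sigma) = (-1) ** l(sigma)."""
    return (-1) ** l(perm)
```

This brute-force count takes on the order of n^2 comparisons, which is perfectly adequate for hand-sized examples.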
The following deﬁnition was implicitly used in the above argument:
8.4 Definition Let σ ∈ S_n and let f be a real-valued function of n real variables x_1, x_2, . . . , x_n. Then σf is the function defined by
$$(\sigma f)(x_1, x_2, \dots, x_n) = f(x_{\sigma(1)}, x_{\sigma(2)}, \dots, x_{\sigma(n)}).$$
If f and g are two real-valued functions of n variables it is usual to define their product fg by
$$(fg)(x_1, x_2, \dots, x_n) = f(x_1, x_2, \dots, x_n)\,g(x_1, x_2, \dots, x_n).$$
(Note that this is the pointwise product, not to be confused with the composite, given by f(g(x)), which is the only kind of product of functions previously encountered in this book. In fact, since f is a function of n variables, f(g(x)) cannot make sense unless n = 1. Thus confusion is unlikely.)
We leave it for the reader to check that, for the pointwise product, σ(fg) = (σf)(σg) for all σ ∈ S_n and all functions f and g. In particular if c is a constant this gives σ(cf) = c(σf). Furthermore
8.5 Proposition For all σ, τ ∈ S_n and any function f of n variables we have (στ)f = σ(τf).
Proof. Let σ, τ ∈ S_n. Let x_1, x_2, . . . , x_n be variables and define y_i = x_{σ(i)} for each i. Then by definition
$$(\sigma(\tau f))(x_1, x_2, \dots, x_n) = (\tau f)(y_1, y_2, \dots, y_n)$$
and
$$(\tau f)(y_1, y_2, \dots, y_n) = f(y_{\tau(1)}, y_{\tau(2)}, \dots, y_{\tau(n)}) = f(x_{\sigma(\tau(1))}, x_{\sigma(\tau(2))}, \dots, x_{\sigma(\tau(n))}).$$
Thus we have shown that
$$(\sigma(\tau f))(x_1, x_2, \dots, x_n) = f(x_{(\sigma\tau)(1)}, x_{(\sigma\tau)(2)}, \dots, x_{(\sigma\tau)(n)})$$
for all values of the x_i, and so σ(τf) = (στ)f, as claimed.
It can now be seen that for all σ, τ ∈ S_n, with E as above,
$$\varepsilon(\sigma\tau)E = (\sigma\tau)E = \sigma(\tau E) = \sigma(\varepsilon(\tau)E) = \varepsilon(\tau)\,\sigma E = \varepsilon(\tau)\varepsilon(\sigma)E,$$
and we deduce that ε(στ) = ε(σ)ε(τ). Since the proof of this fact is the chief reason for including a section on permutations we give another proof, based on properties of cycles of length two.
8.6 Definition A permutation in S_n which is of the form (i, j) for some i, j ∈ I = {1, 2, . . . , n} is called a transposition. That is, if τ is a transposition then there exist i, j ∈ I such that τ(i) = j and τ(j) = i, and τ(k) = k for all other k ∈ I. A transposition of the form (i, i + 1) is called simple.
8.7 Lemma Suppose that τ = (i, i + 1) is a simple transposition in S_n. If σ ∈ S_n and σ′ = στ then
$$l(\sigma') = \begin{cases} l(\sigma) + 1 & \text{if } \sigma(i) < \sigma(i+1)\\ l(\sigma) - 1 & \text{if } \sigma(i) > \sigma(i+1). \end{cases}$$
Proof. Observe that σ′(i) = σ(i + 1) and σ′(i + 1) = σ(i), while for other values of j we have σ′(j) = σ(j). In our original notation for permutations, σ′ is obtained from σ by swapping the numbers in the i-th and (i + 1)-th positions in the second row. The point is now that if r > s and r occurs to the left of s in the second row of σ then the same is true in the second row of σ′, except in the case of the two numbers we have swapped. Hence the number of such pairs is one more for σ′ than it is for σ if σ(i) < σ(i + 1), one less if σ(i) > σ(i + 1), as required.
At the risk of making an easy argument seem complicated, let us spell this out in more detail. Define
$$N = \{\, (j,k) \mid j < k \text{ and } \sigma(j) > \sigma(k) \,\}$$
and
$$N' = \{\, (j,k) \mid j < k \text{ and } \sigma'(j) > \sigma'(k) \,\}$$
so that by definition the number of elements of N is l(σ) and the number of elements of N′ is l(σ′). We split N into four pieces as follows:
$$\begin{aligned}
N_1 &= \{\, (j,k) \in N \mid j \notin \{i, i+1\} \text{ and } k \notin \{i, i+1\} \,\}\\
N_2 &= \{\, (j,k) \in N \mid j \in \{i, i+1\} \text{ and } k \notin \{i, i+1\} \,\}\\
N_3 &= \{\, (j,k) \in N \mid j \notin \{i, i+1\} \text{ and } k \in \{i, i+1\} \,\}\\
N_4 &= \{\, (j,k) \in N \mid j \in \{i, i+1\} \text{ and } k \in \{i, i+1\} \,\}.
\end{aligned}$$
Define also N′_1, N′_2, N′_3 and N′_4 by exactly the same formulae with N replaced by N′. Every element of N lies in exactly one of the N_i, and every element of N′ lies in exactly one of the N′_i.
If neither j nor k is in {i, i + 1} then σ′(j) and σ′(k) are the same as σ(j) and σ(k), and so (j, k) is in N′ if and only if it is in N. Thus N′_1 = N_1.
Consider now pairs (i, k) and (i + 1, k), where k ∉ {i, i + 1}. We show that (i, k) ∈ N if and only if (i + 1, k) ∈ N′. Obviously i < k if and only if i + 1 < k, and since σ(i) = σ′(i + 1) and σ(k) = σ′(k) we see that σ(i) > σ(k) if and only if σ′(i + 1) > σ′(k); hence i < k and σ(i) > σ(k) if and only if i + 1 < k and σ′(i + 1) > σ′(k), as required. Furthermore, since σ(i + 1) = σ′(i), exactly the same argument gives that i + 1 < k and σ(i + 1) > σ(k) if and only if i < k and σ′(i) > σ′(k). Thus (i, k) → (i + 1, k) and (i + 1, k) → (i, k) gives a one to one correspondence between N_2 and N′_2.
Moving on now to pairs of the form (j, i) and (j, i + 1) where j ∉ {i, i + 1}, we have that σ(j) > σ(i) if and only if σ′(j) > σ′(i + 1), and σ(j) > σ(i + 1) if and only if σ′(j) > σ′(i). Since also j < i if and only if j < i + 1 we see that (j, i) ∈ N if and only if (j, i + 1) ∈ N′ and (j, i + 1) ∈ N if and only if (j, i) ∈ N′. Thus (j, i) → (j, i + 1) and (j, i + 1) → (j, i) gives a one to one correspondence between N_3 and N′_3.
We have now shown that the number of elements in N and not in N_4 equals the number of elements in N′ and not in N′_4. But N_4 and N′_4 each have at most one element, namely (i, i + 1), which is in N_4 and not in N′_4 if σ(i) > σ(i + 1), and in N′_4 and not N_4 if σ(i) < σ(i + 1). Hence N has one more element than N′ in the former case, one less in the latter.
8.8 Proposition (i) If σ = τ_1 τ_2 · · · τ_r where the τ_i are simple transpositions, then l(σ) is odd if r is odd and even if r is even.
(ii) Every permutation σ ∈ S_n can be expressed as a product of simple transpositions.
Proof. The first part is proved by an easy induction on r, using 8.7. The point is that
$$l\bigl((\tau_1 \tau_2 \cdots \tau_{r-1})\tau_r\bigr) = l(\tau_1 \tau_2 \cdots \tau_{r-1}) \pm 1$$
and in either case the parity of τ_1 τ_2 · · · τ_r is opposite that of τ_1 τ_2 · · · τ_{r-1}.
The details are left as an exercise.
The second part is proved by induction on l(σ). If l(σ) = 0 then σ(i) < σ(j) whenever i < j. Hence
$$1 \le \sigma(1) < \sigma(2) < \cdots < \sigma(n) \le n$$
and it follows that σ(i) = i for all i. Hence the only permutation σ with l(σ) = 0 is the identity permutation. This is trivially a product of simple transpositions: an empty product. (Just as empty sums are always defined to be 0, empty products are always defined to be 1. But if you don’t like empty products, observe that τ^2 = i for any simple transposition τ.)
Suppose now that l(σ) = l ≥ 1 and that all permutations σ′ with l(σ′) < l can be expressed as products of simple transpositions. There must exist some i such that σ(i) > σ(i + 1), or else the same argument as used above would prove that σ = i. Choosing such an i, let τ = (i, i + 1) and σ′ = στ.
By 8.7 we have l(σ′) = l − 1, and by the inductive hypothesis there exist simple transpositions τ_i with σ′ = τ_1 τ_2 · · · τ_r. This gives σ = τ_1 τ_2 · · · τ_r τ, a product of simple transpositions.
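The proof of 8.8(ii) is constructive and can be mirrored by a bubble sort: each swap of adjacent out-of-order entries in the bottom row records one simple transposition. The Python sketch below is our own illustration (names ours). One subtlety: if στ_1 · · · τ_m = i then σ = τ_m · · · τ_1, each simple transposition being its own inverse, so the recorded factors must be multiplied back in reverse order.

```python
def simple_transpositions(perm):
    """Simple transpositions whose product (in reverse order) is perm.

    Mirrors the proof of 8.8(ii): repeatedly find i with sigma(i) > sigma(i+1)
    and multiply on the right by (i, i+1), lowering l(sigma) by one each time.
    """
    work = list(perm)
    ops = []
    changed = True
    while changed:                      # plain bubble sort down to the identity
        changed = False
        for i in range(len(work) - 1):
            if work[i] > work[i + 1]:
                work[i], work[i + 1] = work[i + 1], work[i]
                ops.append((i + 1, i + 2))   # the simple transposition (i, i+1)
                changed = True
    return ops

def product_reversed(ops, n):
    """One-line form of ops[-1] * ... * ops[0] under the left-operator rule."""
    work = list(range(1, n + 1))
    for i, j in reversed(ops):
        work[i - 1], work[j - 1] = work[j - 1], work[i - 1]
    return work
```

The number of swaps performed equals l(σ), so the length of the returned list has the parity promised by 8.8(i).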
8.9 Corollary If σ, τ ∈ S_n then ε(στ) = ε(σ)ε(τ).
Proof. Write σ and τ as products of simple transpositions, and suppose that there are s factors in the expression for σ and t in the expression for τ. Multiplying these expressions expresses στ as a product of s + t simple transpositions. By 8.8 (i),
$$\varepsilon(\sigma\tau) = (-1)^{s+t} = (-1)^s (-1)^t = \varepsilon(\sigma)\varepsilon(\tau).$$
Our next proposition shows that if the disjoint cycle notation is employed then calculation of the parity of a permutation is a triviality.
8.10 Proposition Transpositions are odd permutations. More generally, any cycle with an even number of terms is an odd permutation, and any cycle with an odd number of terms is an even permutation.
Proof. It is easily seen (by 8.7 with σ = i, or directly) that if τ = (m, m + 1) is a simple transposition then l(τ) = 1, and so τ is odd: ε(τ) = (−1)^1 = −1.
Consider next an arbitrary transposition σ = (i, j) ∈ S_n, where i < j. A short calculation yields (i+1, i+2, . . . , j)(i, j) = (i, i+1)(i+1, i+2, . . . , j), with both sides equal to (i, i+1, . . . , j). That is, ρσ = τρ, where τ = (i, i+1) and ρ = (i + 1, i + 2, . . . , j). By 8.9 we obtain
$$\varepsilon(\rho)\varepsilon(\sigma) = \varepsilon(\rho\sigma) = \varepsilon(\tau\rho) = \varepsilon(\tau)\varepsilon(\rho),$$
and cancelling ε(ρ) gives ε(σ) = ε(τ) = −1 by the first case. (It is also easy to calculate l(σ) directly in this case; in fact, l(σ) = 2(j − i) − 1.)
Finally, let ϕ = (r_1, r_2, . . . , r_k) be an arbitrary cycle of k terms. Another short calculation shows that ϕ = (r_1, r_2)(r_2, r_3) · · · (r_{k−1}, r_k), a product of k − 1 transpositions. So 8.9 gives ε(ϕ) = (−1)^{k−1}, which is −1 if k is even and +1 if k is odd.
Comment
8.10.1 Proposition 8.10 shows that a permutation is even if and only if, when it is expressed as a product of cycles, the number of even-term cycles is even.
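Comment 8.10.1 makes the parity immediate from the cycle type. The Python sketch below (our own illustration, names ours) computes ε two ways, once from the cycle lengths as in 8.10 and once from the inversion count of Definition 8.3, so the two can be compared:

```python
def to_one_line(cycs, n):
    """One-line form (bottom row) of a product of disjoint cycles on {1,...,n}."""
    perm = list(range(1, n + 1))
    for c in cycs:
        for a, b in zip(c, c[1:] + c[:1]):   # each entry maps to the next one
            perm[a - 1] = b
    return perm

def parity_from_cycles(cycs):
    """A k-term cycle contributes (-1) ** (k - 1) (Proposition 8.10)."""
    return (-1) ** sum(len(c) - 1 for c in cycs)
```

For the eleven-point permutation (1, 8, 5)(2, 6, 9, 4)(10, 11) met earlier, the contributions are (+1)(−1)(−1), so the permutation is even; counting its inversions directly gives the same sign.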
The proof of the ﬁnal result of this section is left as an exercise.
8.11 Proposition (i) The number of elements of S_n is n!.
(ii) The identity i ∈ S_n, defined by i(i) = i for all i, is an even permutation.
(iii) The parity of σ^{-1} equals that of σ (for any σ ∈ S_n).
(iv) If σ, τ, τ′ ∈ S_n are such that στ = στ′ then τ = τ′. Thus if σ is fixed then as τ varies over all elements of S_n so too does στ.
§8b Determinants
8.12 Definition Let F be a field and A ∈ Mat(n × n, F). Let the entry in the i-th row and j-th column of A be a_{ij}. The determinant of A is the element of F given by
$$\det A = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)} = \sum_{\sigma \in S_n} \varepsilon(\sigma) \prod_{i=1}^{n} a_{i\sigma(i)}.$$
Comments
8.12.1 Determinants are only defined for square matrices.
8.12.2 Let us examine the definition first in the case n = 3. That is, we seek to calculate the determinant of a matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
The definition involves a sum over all permutations in S_3. There are six, listed in 8.1.1 above. Using the disjoint cycle notation we find that there are two 3-cycles ((1, 2, 3) and (1, 3, 2)), three transpositions ((1, 2), (1, 3) and (2, 3)) and the identity i = (1). The 3-cycles and the identity are even and the transpositions are odd. So, for example, if σ = (1, 2, 3) then ε(σ) = +1, σ(1) = 2, σ(2) = 3 and σ(3) = 1, so that ε(σ)a_{1σ(1)}a_{2σ(2)}a_{3σ(3)} = (+1)a_{12}a_{23}a_{31}. The results of similar calculations for all the permutations in S_3 are listed in the following table, in which the matrix entries a_{iσ(i)} (boxed within the full matrix in the original typesetting) are also indicated in each case.

Permutation    Matrix entries a_{iσ(i)}    ε(σ)a_{1σ(1)}a_{2σ(2)}a_{3σ(3)}
(1)            a_{11}, a_{22}, a_{33}      (+1)a_{11}a_{22}a_{33}
(2, 3)         a_{11}, a_{23}, a_{32}      (−1)a_{11}a_{23}a_{32}
(1, 2)         a_{12}, a_{21}, a_{33}      (−1)a_{12}a_{21}a_{33}
(1, 2, 3)      a_{12}, a_{23}, a_{31}      (+1)a_{12}a_{23}a_{31}
(1, 3, 2)      a_{13}, a_{21}, a_{32}      (+1)a_{13}a_{21}a_{32}
(1, 3)         a_{13}, a_{22}, a_{31}      (−1)a_{13}a_{22}a_{31}

The determinant is the sum of the six terms in the third column. So, the determinant of the 3 × 3 matrix A = (a_{ij}) is given by
$$a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}.$$
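Definition 8.12 can be transcribed into code almost verbatim. The following Python sketch is our own illustration (names ours), practical only for small n since it sums all n! terms:

```python
from itertools import permutations

def sign(p):
    """(-1) ** l(p), where l counts pairs i < j with p[i] > p[j]."""
    n = len(p)
    return (-1) ** sum(1 for i in range(n) for j in range(i + 1, n)
                       if p[i] > p[j])

def det(A):
    """Determinant straight from Definition 8.12: a sum of n! signed products."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):   # p[i] plays the role of sigma(i+1) - 1
        term = sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total
```

For a 3 × 3 matrix this produces exactly the six-term expression displayed above.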
8.12.3 Notice that in each row of the above table three matrix entries are selected, each of the three rows of A contains exactly one selected entry, and each of the three columns contains exactly one selected
entry. Furthermore, the above six selections are the only ones possible. So the determinant of a 3 × 3 matrix is a sum, with appropriate signs, of all possible terms obtained by multiplying three factors with the property that every row contains exactly one of the factors and every column contains exactly one of the factors.
Of course, there is nothing special about the 3 × 3 case. The determinant of a general n × n matrix is obtained by the natural generalization of the above method, as follows. Choose n matrix positions in such a way that no row contains more than one of the chosen positions and no column contains more than one of the chosen positions. Then it will necessarily be true that each row contains exactly one of the chosen positions and each column contains exactly one of the chosen positions. There are exactly n! ways of doing this. In each case multiply the entries in the n chosen positions, and then multiply by the sign of the corresponding permutation. The determinant is the sum of the n! terms so obtained.
We give an example to illustrate the rule for finding the permutation corresponding to a choice of matrix positions of the above kind. Suppose that n = 4. If one chooses the 4th entry in row 1, the 2nd in row 2, the 1st in row 3 and the 3rd in row 4, then the permutation is
$$\begin{pmatrix}1&2&3&4\\4&2&1&3\end{pmatrix} = (1, 4, 3).$$
That is, if the j-th entry in row i is chosen then the permutation maps i to j. (In our example the term would be multiplied by +1 since (1, 4, 3) is an even permutation.)
8.12.4 So far in this section we have merely been describing what the determinant is; we have not addressed the question of how best in practice to calculate the determinant of a given matrix. We will come to this question later; for now, suffice it to say that the definition does not itself provide a good method, except in the case n = 3. It is too tiresome to compute and add up the 24 terms arising in the definition of a 4 × 4 determinant, or the 120 for a 5 × 5, and things rapidly get worse still.
We start now to investigate properties of determinants. Several are listed in the next theorem.
8.13 Theorem Let F be a field and A ∈ Mat(n × n, F), and let the rows of A be ã_1, ã_2, . . . , ã_n ∈ ^tF^n.
(i) Suppose that 1 ≤ k ≤ n and that ã_k = λã′_k + µã″_k. Then
$$\det A = \lambda \det A' + \mu \det A''$$
where A′ is the matrix with ã′_k as its k-th row and all other rows the same as A, and, likewise, A″ has ã″_k as its k-th row and other rows the same as A.
(ii) Suppose that B is the matrix obtained from A by permuting the rows by a permutation τ ∈ S_n. That is, suppose that the first row of B is ã_{τ(1)}, the second ã_{τ(2)}, and so on. Then det B = ε(τ) det A.
(iii) The determinant of the transpose of A is the same as the determinant of A.
(iv) If two rows of A are equal then det A = 0.
(v) If A has a zero row then det A = 0.
(vi) If A = I, the identity matrix, then det A = 1.
Proof. (i) Let the (i, j)-th entries of A, A′ and A″ be (respectively) a_{ij}, a′_{ij} and a″_{ij}. Then we have a_{ij} = a′_{ij} = a″_{ij} if i ≠ k, and
$$a_{kj} = \lambda a'_{kj} + \mu a''_{kj}$$
for all j. Now by definition det A is the sum over all σ ∈ S_n of the terms ε(σ)a_{1σ(1)}a_{2σ(2)} · · · a_{nσ(n)}. In each of these terms we may replace a_{kσ(k)} by λa′_{kσ(k)} + µa″_{kσ(k)}, so that the term then becomes the sum of the two terms
$$\lambda \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{(k-1)\sigma(k-1)}\, a'_{k\sigma(k)}\, a_{(k+1)\sigma(k+1)} \cdots a_{n\sigma(n)}$$
and
$$\mu \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{(k-1)\sigma(k-1)}\, a''_{k\sigma(k)}\, a_{(k+1)\sigma(k+1)} \cdots a_{n\sigma(n)}.$$
In the first of these we replace a_{iσ(i)} by a′_{iσ(i)} (for all i ≠ k) and in the second we likewise replace a_{iσ(i)} by a″_{iσ(i)}. This shows that det A is equal to
$$\lambda \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a'_{1\sigma(1)} a'_{2\sigma(2)} \cdots a'_{n\sigma(n)} + \mu \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a''_{1\sigma(1)} a''_{2\sigma(2)} \cdots a''_{n\sigma(n)},$$
which is λ det A′ + µ det A″.
(ii) Let a_{ij} and b_{ij} be the (i, j)-th entries of A and B respectively. We are given that if b̃_i is the i-th row of B then b̃_i = ã_{τ(i)}; so b_{ij} = a_{τ(i)j} for all i and j. Now
$$\det B = \sum_{\sigma \in S_n} \varepsilon(\sigma) \prod_{i=1}^n b_{i\sigma(i)} = \sum_{\sigma \in S_n} \varepsilon(\sigma\tau) \prod_{i=1}^n b_{i(\sigma\tau)(i)}$$
(since στ runs through all of S_n as σ does; see 8.11)
$$\begin{aligned}
&= \sum_{\sigma \in S_n} \varepsilon(\tau)\varepsilon(\sigma) \prod_{i=1}^n b_{i\sigma(\tau(i))} && \text{(by 8.9)}\\
&= \varepsilon(\tau) \sum_{\sigma \in S_n} \varepsilon(\sigma) \prod_{i=1}^n a_{\tau(i)\sigma(\tau(i))}\\
&= \varepsilon(\tau) \sum_{\sigma \in S_n} \varepsilon(\sigma) \prod_{j=1}^n a_{j\sigma(j)}
\end{aligned}$$
since j = τ(i) runs through all of the set {1, 2, . . . , n} as i does. Thus det B = ε(τ) det A.
(iii) Let C be the transpose of A and let the (i, j)-th entry of C be c_{ij}. Then c_{ij} = a_{ji}, and

    det C = Σ_{σ∈S_n} ε(σ) Π_{i=1}^{n} c_{iσ(i)} = Σ_{σ∈S_n} ε(σ) Π_{i=1}^{n} a_{σ(i)i}.

For a fixed σ, if j = σ(i) then i = σ^{−1}(j), and j runs through all of {1, 2, …, n} as i does. Hence

    det C = Σ_{σ∈S_n} ε(σ) Π_{j=1}^{n} a_{jσ^{−1}(j)}.

But if we let ρ = σ^{−1} then ρ runs through all of S_n as σ does, and since ε(σ^{−1}) = ε(σ) our expression for det C becomes

    Σ_{ρ∈S_n} ε(ρ) Π_{j=1}^{n} a_{jρ(j)},

which equals det A.
(iv) Suppose that ã_i = ã_j where i ≠ j. Let τ be the transposition (i, j), and let B be the matrix obtained from A by permuting the rows by τ, as in part (ii). Since we have just swapped two equal rows we in fact have that B = A, but by (ii) we know that det B = ε(τ) det A = −det A since transpositions are odd permutations. We have proved that det A = −det A; so det A = 0.
(v) Suppose that ã_k = 0̃. Then ã_k = ã_k + ã_k, and applying part (i) with ã′_k = ã′′_k = ã_k and λ = µ = 1 we obtain det A = det A + det A. Hence det A = 0.
(vi) We know that det A is the sum of terms ε(σ)a_{1σ(1)}a_{2σ(2)} ⋯ a_{nσ(n)}, and obviously such a term is zero if a_{iσ(i)} = 0 for any value of i. Now if A = I the
only nonzero entries are the diagonal entries; a_{ij} = 0 if i ≠ j. So the term
in the expression for det A corresponding to the permutation σ is zero unless
σ(i) = i for all i. Thus all n! terms are zero except the one corresponding to
the identity permutation. And that term is 1 since all the diagonal entries
are 1 and the identity permutation is even.
Comment
8.13.1 The first part of 8.13 says that if all but one of the rows of A are kept fixed, then det A is a linear function of the remaining row. In other words, given ã_1, …, ã_{k−1} and ã_{k+1}, …, ã_n ∈ ᵗF^n, the function φ: ᵗF^n → F defined by

    φ(x̃) = det of the matrix with rows ã_1, …, ã_{k−1}, x̃, ã_{k+1}, …, ã_n (in that order)

is a linear transformation. Moreover, since det A = det ᵗA it follows that the determinant function is also linear in each of the columns.
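For readers who like to experiment, the permutation-sum definition and the properties in 8.13 are easy to check numerically. The following Python sketch (our illustration, not part of the text) computes ε by counting inversions and implements the sum over S_n directly; with n! terms it is fine for small n only.

```python
# Brute-force determinant straight from the definition:
# det A = sum over all permutations sigma of eps(sigma) * a_{1,sigma(1)} ... a_{n,sigma(n)}.
# Indices are 0-based here; the mathematics is unchanged.
from itertools import permutations
from math import prod

def sign(p):
    # eps(p): +1 for an even permutation, -1 for an odd one (count inversions)
    inv = sum(1 for i in range(len(p))
                for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(A):
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

# Checks of 8.13: det I = 1; equal rows give 0; a row swap changes the sign.
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
A_swapped = [A[1], A[0], A[2]]
# det(I3) == 1, det of a matrix with two equal rows == 0,
# det(A) == -3 and det(A_swapped) == 3
```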
8.14 Proposition Suppose that E ∈ Mat(n × n, F) is any elementary matrix. Then det(EA) = det E det A for all A ∈ Mat(n × n, F).
Proof. We consider the three kinds of elementary matrices separately. First of all, suppose that E = E_{ij}, the matrix obtained by swapping rows i and j of I. By 2.6, B = EA is the matrix obtained from A by swapping rows i and j. By 8.13 (ii) we know that det B = ε(τ) det A, where τ = (i, j). Thus, in this case, det(EA) = −det A for all A.

Next, consider the case E = E^{(λ)}_{ij}, the matrix obtained by adding λ times row i to row j. In this case 2.6 says that row j of EA is λã_i + ã_j (where ã_1, …, ã_n are the rows of A), and all other rows of EA are the same as the corresponding rows of A. So by 8.13 (i) we see that det(EA) is λ det A′ + det A, where A′ has the same rows as A, except that row j of A′ is ã_i. In particular, A′ has row j equal to row i. By 8.13 (iv) this gives det A′ = 0, and so in this case we have det(EA) = det A.
The third possibility is E = E^{(λ)}_i. In this case the rows of EA are the same as the rows of A except for the i-th, which gets multiplied by λ. Writing λã_i as λã_i + 0ã_i we may again apply 8.13 (i), and conclude that in this case det(EA) = λ det A + 0 det A = λ det A for all matrices A.

In every case we have that det(EA) = c det A for all A, where c is independent of A; explicitly, c = −1 in the first case, 1 in the second, and λ in the third. Putting A = I this gives

    det E = det(EI) = c det I = c

since det I = 1. So we may replace c by det E in the formula for det(EA), giving the desired result.
As a corollary of the above proof we have
8.15 Corollary The determinants of the three kinds of elementary matrices are as follows: det E_{ij} = −1 for all i and j; det E^{(λ)}_{ij} = 1 for all i, j and λ; det E^{(λ)}_i = λ for all i and λ.
As an easy consequence of 8.14 we deduce
8.16 Proposition If B, A ∈ Mat(n × n, F) and B can be written as a product of elementary matrices, then det(BA) = det B det A.

Proof. Let B = E_1 E_2 ⋯ E_r where the E_i are elementary. We proceed by induction on r; the case r = 1 is 8.14 itself. Now let r = k + 1, and assume the result for k. By 8.14 we have

    det(BA) = det(E_1(B′A)) = det E_1 det(B′A)

where B′ = E_2 ⋯ E_{k+1} is a product of k elementary matrices, so that the inductive hypothesis gives

    det E_1 det(B′A) = det E_1 det B′ det A = det(E_1 B′) det A

by 8.14 again. Since E_1 B′ = B these equations combine to give the desired result.
In fact the formula det(BA) = det B det A for square matrices A and B is valid for all A and B without restriction, and we will prove this shortly. As we observed in Chapter 2 (see 2.8) it is possible to obtain a reduced echelon matrix from an arbitrary matrix by premultiplying by a suitable sequence of elementary matrices. Hence

8.17 Lemma For any A ∈ Mat(n × n, F) there exist B, R ∈ Mat(n × n, F) such that B is a product of elementary matrices and R is a reduced echelon matrix, and BA = R.
Furthermore, as in the proof of 2.9, it is easily shown that the reduced
echelon matrix R obtained in this way from a square matrix A is either equal
to the identity or else has a zero row. Since the rank of A is just the number
of nonzero rows in R (since the rows of R are linearly independent and span
RS(R) = RS(A)) this can be reformulated as follows:
8.18 Lemma In the situation of 8.17, if the rank of A is n then R = I, and
if the rank of A is not n then R has at least one zero row.
An important consequence of this is
8.19 Theorem The determinant of any n × n matrix of rank less than n is zero.

Proof. Let A ∈ Mat(n × n, F) have rank less than n. By 8.17 there exist elementary matrices E_1, E_2, …, E_k and a reduced echelon matrix R with E_1 E_2 ⋯ E_k A = R. By 8.18, R has a zero row. By 2.5 elementary matrices have inverses which are themselves elementary matrices, and we deduce that A = E_k^{−1} ⋯ E_2^{−1} E_1^{−1} R. Now 8.16 gives det A = det(E_k^{−1} ⋯ E_1^{−1}) det R = 0 since det R = 0 (by 8.13 (v)).
8.20 Theorem If A, B ∈ Mat(n × n, F) then det(AB) = det A det B.

Proof. Suppose first that the rank of A is n. As in the proof of 8.19 we have det A = det(E_k^{−1} ⋯ E_1^{−1}) det R where the E_i^{−1} are elementary and R is reduced echelon. In this case, however, 8.18 yields that R = I, whence A is a product of elementary matrices and our conclusion comes at once from 8.16.

The other possibility is that the rank of A is less than n. Then by 3.21 (ii) and 7.13 it follows that the rank of AB is also less than n, so that det(AB) = det A = 0 (by 8.19). So det(AB) = det A det B, both sides being zero.
Comment
8.20.1 Let A be an n × n matrix over the field F. From the theory above and in the preceding chapters it can be seen that the following conditions are equivalent:
(i) A is expressible as a product of elementary matrices.
(ii) The rank of A is n.
(iii) The nullity of A is zero.
(iv) det A ≠ 0.
(v) A has an inverse.
(vi) The rows of A are linearly independent.
(vii) The rows of A span ᵗF^n.
(viii) The columns of A are linearly independent.
(ix) The columns of A span F^n.
(x) The right null space of A is zero. (That is, the only solution of Av = 0 is v = 0.)
(xi) The left null space of A is zero.
8.21 Definition A matrix A ∈ Mat(n × n, F) possessing the properties listed in 8.20.1 is said to be nonsingular. Other matrices in Mat(n × n, F) are said to be singular.
We promised above to present a good practical method for calculating determinants, and we now address this matter. Let A ∈ Mat(n × n, F). As we have seen, there exist elementary matrices E_i such that (E_1 ⋯ E_k)A = R, with R reduced echelon. Furthermore, if R is not the identity matrix then det A = 0, and if R is the identity matrix then det A is the product of the determinants of the inverses of E_1, …, E_k. We also know the determinants of all elementary matrices. This gives a good method. Apply elementary row operations to A and at each step write down the determinant of the inverse of the corresponding elementary matrix. Stop when the identity matrix, or a matrix with a zero row, is reached. In the zero row case the determinant is 0; otherwise it is the product of the numbers written down. Effectively, all one is doing is simplifying the matrix by row operations and keeping track of how the determinant is changed. There is a refinement worth noting: column operations are as good as row operations for calculating determinants; so one can use either, or a combination of both.
Example
#3 Calculate the determinant of

    [ 2  6 14 −4 ]
    [ 2  5 10  1 ]
    [ 0 −1  2  1 ]
    [ 2  7 11 −3 ] .

−− Apply row operations, keeping a record. The operations R_1 := ½R_1, R_2 := R_2 − 2R_1 and R_4 := R_4 − 2R_1 (recorded numbers 2, 1, 1) give

    [ 1  3  7 −2 ]
    [ 0 −1 −4  5 ]
    [ 0 −1  2  1 ]
    [ 0  1 −3  1 ]

and then R_2 := −R_2, followed by R_3 := R_3 + R_2 and R_4 := R_4 − R_2 using the new R_2 (recorded numbers −1, 1, 1), give

    [ 1  3  7 −2 ]
    [ 0  1  4 −5 ]
    [ 0  0  6 −4 ]
    [ 0  0 −7  6 ]

and finally

    [ 1  0  0  0 ]
    [ 0  1  0  0 ]
    [ 0  0  6 −4 ]
    [ 0  0 −7  6 ]

where in the last step we applied column operations, first adding suitable multiples of the first column to all the others, then suitable multiples of column 2 to columns 3 and 4. For our first row operation the determinant of the inverse of the elementary matrix was 2, and later when we multiplied a row by −1 the number we recorded was −1. The determinant of the original matrix is therefore equal to 2 times −1 times the determinant of the simplified matrix we have obtained. We could easily go on with row and column operations and obtain the identity, but there is no need since the determinant of the last matrix above is obviously 8 (all but two of the terms in its expansion are zero). Hence the original matrix had determinant −16.
−−<
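The record-keeping in this example is exactly what Gaussian elimination does. A short Python sketch (our own illustration, not from the text) that computes the determinant by row operations with exact rational arithmetic:

```python
# Determinant by row reduction: a row swap flips the sign, adding a multiple
# of one row to another changes nothing, and each pivot is peeled off as a factor.
from fractions import Fraction

def det_gauss(M):
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    result = Fraction(1)
    for k in range(n):
        # find a nonzero pivot in column k
        piv = next((i for i in range(k, n) if A[i][k] != 0), None)
        if piv is None:
            return Fraction(0)            # a zero column: the matrix is singular
        if piv != k:
            A[k], A[piv] = A[piv], A[k]   # row swap multiplies det by -1
            result = -result
        result *= A[k][k]
        for i in range(k + 1, n):         # clear below the pivot; det unchanged
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    return result

M = [[2, 6, 14, -4],
     [2, 5, 10, 1],
     [0, -1, 2, 1],
     [2, 7, 11, -3]]
# det_gauss(M) == -16, agreeing with the hand calculation above
```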
'8c Expansion along a row
We should show that the deﬁnition of the determinant that we have given is
consistent with the formula we gave in Chapter Two. We start with some
elementary propositions.
8.22 Proposition Let A be an r × r matrix. Then for any n > r,

    det [ A 0 ] = det A
        [ 0 I ]

where I is the (n − r) × (n − r) identity.

Proof. If A is an elementary matrix then it is clear that [ A 0 ; 0 I ] is an elementary matrix of the same type, and the result is immediate from 8.15. If A = E_1 E_2 ⋯ E_k is a product of elementary matrices then, by multiplication of partitioned matrices,

    [ A 0 ]   [ E_1 0 ] [ E_2 0 ]     [ E_k 0 ]
    [ 0 I ] = [ 0   I ] [ 0   I ] ⋯ [ 0   I ]

and by 8.20,

    det [ A 0 ] = Π_{i=1}^{k} det [ E_i 0 ] = Π_{i=1}^{k} det E_i = det A.
        [ 0 I ]                   [ 0   I ]

Finally, if A is not a product of elementary matrices then it is singular. We leave it as an exercise to prove that [ A 0 ; 0 I ] is also singular and conclude that both determinants are zero.
Comments
8.22.1 This result is also easy to derive directly from the deﬁnition.
8.22.2 It is clear that exactly the same proof applies for matrices of the form [ I 0 ; 0 A ].
8.23 Proposition If C is any s × r matrix then the (r + s) × (r + s) matrix T = [ I_r 0 ; C I_s ] has determinant equal to 1.

Proof. Observe that T is obtainable from the (r + s) × (r + s) identity by adding appropriate multiples of the first r rows to the last s rows. To be precise, T is the product (in any order) of the elementary matrices E^{(C_{ij})}_{i,r+j} (for i = 1, 2, …, r and j = 1, 2, …, s). Since these elementary matrices all have determinant 1 the result follows.
8.24 Proposition For any square matrices A and B and any C of appropriate shape,

    det [ A 0 ] = det A det B.
        [ C B ]

The proof of this is left as an exercise.

Using the above propositions we can now give a straightforward proof of the first row expansion formula. If A has (i, j)-entry a_{ij} then the first row of A can be expressed as a linear combination of standard basis rows as follows:

    ( a_{11} a_{12} … a_{1n} ) = a_{11} ( 1 0 … 0 ) + a_{12} ( 0 1 … 0 ) + ⋯ + a_{1n} ( 0 0 … 1 ).

Since the determinant is a linear function of the first row,

($)    det A = a_{11} det [ 1 0 … 0 ] + a_{12} det [ 0 1 … 0 ] + ⋯ + a_{1n} det [ 0 0 … 1 ]
                          [    A′   ]              [    A′   ]                  [    A′   ]

where A′ is the matrix obtained from A by deleting the first row. We apply column operations to the matrices on the right hand side to bring the 1's in the first row to the (1, 1) position. If e_i is the i-th row of the standard basis (having j-th entry δ_{ij}) then

    [ e_i ] E_{i,i−1} E_{i−1,i−2} ⋯ E_{21} = [ 1 0̃  ]
    [ A′  ]                                  [ ∗ A_i ]

where A_i is the matrix obtained from A by deleting the 1st row and i-th column. (The entries designated by the asterisk are irrelevant, but in fact come from the i-th column of A.) Taking determinants now gives

    det [ e_i ] = (−1)^{i−1} det A_i = cof_{1i}(A)
        [ A′  ]

by definition of the cofactor. Substituting back in ($) gives

    det A = Σ_{i=1}^{n} a_{1i} cof_{1i}(A)
as required.
We remark that it follows trivially by permuting the rows that one can expand along any row: det A = Σ_{k=1}^{n} a_{jk} cof_{jk}(A) is valid for all j. Furthermore, since the determinant of a matrix equals the determinant of its transpose, one can also expand along columns.
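As a computational sanity check (ours, not the book's), the recursive first-row expansion can be compared against the permutation-sum definition; the two must agree on every matrix.

```python
from itertools import permutations
from math import prod

def det_perm(A):
    # determinant via the sum over all permutations
    n = len(A)
    def sign(p):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        return -1 if inv % 2 else 1
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def det_expand(A):
    # determinant via expansion along the first row: sum_i a_{1i} cof_{1i}(A)
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]  # delete row 1, column j
        # (-1)**j is the cofactor sign (-1)**(1+i) with 1-based column i = j + 1
        total += (-1) ** j * A[0][j] * det_expand(minor)
    return total

A = [[2, 0, 1], [3, 4, 5], [1, 1, 2]]
# det_perm(A) == det_expand(A) == 5
```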
Our final result for the chapter deals with the adjoint matrix, and immediately yields the formula for the inverse mentioned in Chapter Two.

8.25 Theorem If A is an n × n matrix then A(adj A) = (det A)I.
Proof. Let a_{ij} be the (i, j)-entry of A. The (i, j)-entry of A(adj A) is

    Σ_{k=1}^{n} a_{ik} cof_{jk}(A)

(since adj A is the transpose of the matrix of cofactors). If i = j this is just the determinant of A, by the i-th row expansion. So all diagonal entries of A(adj A) are equal to the determinant.

If the j-th row of A is changed to be made equal to the i-th row (i ≠ j) then the cofactors cof_{jk}(A) are unchanged for all k, but the determinant of the new matrix is zero since it has two equal rows. The j-th row expansion for the determinant of this new matrix gives

    0 = Σ_{k=1}^{n} a_{ik} cof_{jk}(A)

(since the (j, k)-entry of the new matrix is a_{ik}), and we conclude that the off-diagonal entries of A(adj A) are zero.
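A small numerical illustration of Theorem 8.25 (our sketch, with an arbitrarily chosen matrix): build adj A from cofactors and check that A(adj A) = (det A)I.

```python
from itertools import permutations
from math import prod

def det(A):
    # permutation-sum determinant
    n = len(A)
    def sign(p):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        return -1 if inv % 2 else 1
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def adj(A):
    # adj A is the transpose of the matrix of cofactors:
    # entry (r, c) of adj A is cof_{c r}(A) = (-1)**(c + r) * det(minor(c, r))
    n = len(A)
    def minor(i, j):
        return [[A[r][c] for c in range(n) if c != j] for r in range(n) if r != i]
    return [[(-1) ** (r + c) * det(minor(c, r)) for c in range(n)] for r in range(n)]

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [0, 4, 5], [1, 0, 6]]
d = det(A)                     # 22 for this matrix
product = matmul(A, adj(A))    # equals (det A) I, i.e. 22 on the diagonal, 0 elsewhere
```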
Exercises
1. Compute the given products of permutations.

(i)
    ( 1 2 3 4 ) ( 1 2 3 4 )
    ( 2 1 4 3 ) ( 2 3 4 1 )

(ii)
    ( 1 2 3 4 5 ) ( 1 2 3 4 5 )
    ( 4 1 5 2 3 ) ( 5 4 3 2 1 )
(iii)
    ( 1 2 3 4 ) ( 1 2 3 4 ) ( 1 2 3 4 )
    ( 2 1 4 3 ) ( 2 3 4 1 ) ( 4 3 1 2 )

(iv)
    ( 1 2 3 4 ) ( 1 2 3 4 ) ( 1 2 3 4 )
    ( 2 1 4 3 ) ( 2 3 4 1 ) ( 4 3 1 2 )
2. Calculate the parity of each permutation appearing in Exercise 1.

3. Let σ and τ be permutations of {1, 2, …, n}. Let S be the matrix with (i, j)-th entry equal to 1 if i = σ(j) and 0 if i ≠ σ(j), and similarly let T be the matrix with (i, j)-th entry 1 if i = τ(j) and 0 otherwise. Prove that the (i, j)-th entry of ST is 1 if i = στ(j) and 0 otherwise.
4. Prove Proposition 8.11.
5. Use row and column operations to calculate the determinant of

    [  1   5   11   2 ]
    [  2  11   −6   8 ]
    [ −3   0 −452   6 ]
    [ −3 −16   −4  13 ]
6. Consider the determinant

    det [ 1 x_1 x_1² … x_1^{n−1} ]
        [ 1 x_2 x_2² … x_2^{n−1} ]
        [ ⋮  ⋮   ⋮        ⋮     ]
        [ 1 x_n x_n² … x_n^{n−1} ] .

Use row and column operations to evaluate this in the case n = 3. Then do the case n = 4. Then do the general case. (The answer is Π_{i>j} (x_i − x_j).)
7. Let p(x) = a_0 + a_1 x + a_2 x², q(x) = b_0 + b_1 x + b_2 x², r(x) = c_0 + c_1 x + c_2 x². Prove that

    det [ p(x_1) q(x_1) r(x_1) ]                                         [ a_0 b_0 c_0 ]
        [ p(x_2) q(x_2) r(x_2) ] = (x_2 − x_1)(x_3 − x_1)(x_3 − x_2) det [ a_1 b_1 c_1 ]
        [ p(x_3) q(x_3) r(x_3) ]                                         [ a_2 b_2 c_2 ] .
8. Prove that if A is singular then so is [ A 0 ; 0 I ].
9. Use 8.22 and 8.23 to prove that

    det [ A 0 ] = det A det B.
        [ C B ]

(Hint: Factorize the matrix on the left hand side.)
10. (i) Write out the details of the proof of the first part of Proposition 8.8.
(ii) Modify the proof of the second part of 8.8 to show that any σ ∈ S_n can be written as a product of l(σ) simple transpositions.
(iii) Use 8.7 to prove that σ ∈ S_n cannot be written as a product of fewer than l(σ) simple transpositions.
(Because of these last two properties, l(σ) may be termed the length of σ.)
9
Classiﬁcation of linear operators
A linear operator on a vector space V is a linear transformation from V to V. We have already seen that if T: V → W is linear then there exist bases of V and W for which the matrix of T has a simple form. However, for an operator T the domain V and codomain W are the same, and it is natural to insist on using the same basis for each when calculating the matrix. Accordingly, if T: V → V is a linear operator on V and b is a basis of V, we call M_bb(T) the matrix of T relative to b; the problem is to find a basis of V for which the matrix of T is as simple as possible.
Tackling this problem naturally involves us with more eigenvalue theory.
'9a Similarity of matrices
9.1 Definition Square matrices A and B are said to be similar if there exists a nonsingular matrix X with A = X^{−1}BX.

It is an easy exercise to show that similarity is an equivalence relation on Mat(n × n, F).
We proved in 7.8 that if b and c are two bases of a space V and T is a linear operator on V then the matrix of T relative to b and the matrix of T relative to c are similar. Specifically, if A = M_bb(T) and B = M_cc(T) then B = X^{−1}AX where X = M_bc, the c to b transition matrix. Moreover, since any invertible matrix can be a transition matrix (see Exercise 8 of Chapter Four), finding a simple matrix for a given linear operator corresponds exactly to finding a simple matrix similar to a given one.

In this context, 'simple' means 'as near to diagonal as possible'. We investigated diagonalizing real matrices in Chapter One, but failed to mention the fact that not all matrices are diagonalizable. It is true, however, that many matrices (over some fields, most) are diagonalizable; for
instance, we will show that if an n × n matrix has n distinct eigenvalues then it is diagonalizable.

The following definition provides a natural extension of concepts previously introduced.

9.2 Definition Let A ∈ Mat(n × n, F) and λ ∈ F. The set of all v ∈ F^n such that Av = λv is called the λ-eigenspace of A.
Comments
9.2.1 Observe that λ is an eigenvalue if and only if the λ-eigenspace is nonzero.
9.2.2 The λ-eigenspace of A contains zero and is closed under addition and scalar multiplication. Hence it is a subspace of F^n.
9.2.3 Since Av = λv if and only if (A − λI)v = 0 we see that the λ-eigenspace of A is exactly the right null space of A − λI. This is nonzero if and only if A − λI is singular.
9.2.4 We can equally well use rows instead of columns, defining the left eigenspace to be the set of all v ∈ ᵗF^n satisfying vA = λv. For square matrices the left and right null spaces have the same dimension; hence the eigenvalues are the same whether one uses rows or columns. However, there is no easy way to calculate the left (row) eigenvectors from the right (column) eigenvectors, or vice versa.

The same terminology is also applied to linear operators. The λ-eigenspace of T: V → V is the set of all v ∈ V such that T(v) = λv. That is, it is the kernel of the linear operator T − λi, where i is the identity operator on V. And λ is an eigenvalue of T if and only if the λ-eigenspace of T is nonzero. If b is any basis of V then the eigenvalues of T are the same as the eigenvalues of M_bb(T), and v is in the λ-eigenspace of T if and only if cv_b(v) is in the λ-eigenspace of M_bb(T).

Since choosing a different basis for V replaces the matrix of T by a similar matrix, the above remarks indicate that similar matrices must have the same eigenvalues. In fact, more is true: similar matrices have the same characteristic polynomial. This enables us to define the characteristic polynomial c_T(x) of a linear operator T to be the characteristic polynomial of M_bb(T), for arbitrarily chosen b. That is, c_T(x) = det(M_bb(T) − xI).
9.3 Proposition Similar matrices have the same characteristic polynomial.

Proof. Suppose that A and B are similar matrices. Then there exists a nonsingular P with A = P^{−1}BP. Now

    det(A − xI) = det(P^{−1}BP − xP^{−1}P)
                = det((P^{−1}B − xP^{−1})P)
                = det(P^{−1}(B − xI)P)
                = det P^{−1} det(B − xI) det P,

where we have used the fact that the scalar x commutes with the matrix P^{−1}, and the multiplicative property of determinants. In the last line above we have the product of three scalars, and since multiplication of scalars is commutative this last line equals

    det P^{−1} det P det(B − xI) = det(P^{−1}P) det(B − xI) = det(B − xI)

since det I = 1. Hence det(A − xI) = det(B − xI), as required.
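Proposition 9.3 is easy to test numerically (our illustration, with matrices chosen arbitrarily): conjugate A by an invertible X and compare det(A − tI) with det(B − tI) at enough sample points to pin down both polynomials.

```python
def det2(M):
    # determinant of a 2 x 2 matrix
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def char_poly_at(M, t):
    # evaluates det(M - tI)
    return det2([[M[0][0] - t, M[0][1]], [M[1][0], M[1][1] - t]])

A = [[2, 1], [1, 3]]
X = [[1, 1], [0, 1]]              # invertible; its inverse is written down by hand
Xinv = [[1, -1], [0, 1]]
B = matmul(matmul(Xinv, A), X)    # B is similar to A

# A degree-2 polynomial is determined by 3 values; we compare at 5 points.
values_agree = all(char_poly_at(A, t) == char_poly_at(B, t) for t in range(-2, 3))
```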
The next proposition gives the precise criterion for diagonalizability of a matrix.

9.4 Proposition (i) If A ∈ Mat(n × n, F) then A is diagonalizable if and only if there is a basis of F^n consisting entirely of eigenvectors for A.
(ii) If A, X ∈ Mat(n × n, F) and X is nonsingular then X^{−1}AX is diagonal if and only if the columns of X are all eigenvectors of A.
Proof. Since a matrix X ∈ Mat(n × n, F) is nonsingular if and only if its columns are a basis for F^n, the first part is a consequence of the second. If the columns of X are t̃_1, t̃_2, …, t̃_n then the i-th column of AX is At̃_i, while the i-th column of X diag(λ_1, λ_2, …, λ_n) is λ_i t̃_i. So AX = X diag(λ_1, λ_2, …, λ_n) if and only if for each i the column t̃_i is in the λ_i-eigenspace of A, and the result follows.
It is perhaps more natural to state the above criterion for operators rather than matrices. A linear operator T: V → V is said to be diagonalizable if there is a basis b of V such that M_bb(T) is a diagonal matrix. Now if b = (v_1, v_2, …, v_n) then

    M_bb(T) = diag(λ_1, λ_2, …, λ_n)

if and only if T(v_i) = λ_i v_i for all i. Hence we have proved the following:

9.5 Proposition A linear operator T on a vector space V is diagonalizable if and only if there is a basis of V consisting entirely of eigenvectors of T. If T is diagonalizable and b is such a basis of eigenvectors, then M_bb(T), the matrix of T relative to b, is diag(λ_1, …, λ_n), where λ_i is the eigenvalue corresponding to the i-th basis vector.

Of course, if c is an arbitrary basis of V then T is diagonalizable if and only if M_cc(T) is.
When one wishes to diagonalize a matrix A ∈ Mat(n × n, F) the procedure is to first solve the characteristic equation to find the eigenvalues, and then for each eigenvalue λ find a basis for the λ-eigenspace by solving the simultaneous linear equations (A − λI)x̃ = 0̃. Then one combines all the elements from the bases for all the different eigenspaces into a single sequence, hoping thereby to obtain a basis for F^n. Two questions naturally arise: does this combined sequence necessarily span F^n, and is it necessarily linearly independent? The answer to the first of these questions is no, as we will soon see by examples, but the answer to the second is yes. This follows from our next theorem.
9.6 Theorem Let T: V → V be a linear operator, and let the distinct eigenvalues of T be λ_1, λ_2, …, λ_k. For each i let V_i be the λ_i-eigenspace. Then the spaces V_i are independent. That is, if U is the subspace defined by

    U = V_1 + V_2 + ⋯ + V_k = { v_1 + v_2 + ⋯ + v_k | v_i ∈ V_i for all i }

then U = V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_k.

Proof. We use induction on r to prove that for all r ∈ {1, 2, …, k} the spaces V_1, V_2, …, V_r are independent. This is trivially true in the case r = 1.
Suppose then that r > 1 and that V_1, V_2, …, V_{r−1} are independent; we seek to prove that V_1, V_2, …, V_r are independent. According to Definition 6.7, our task is to prove the following statement:

($) If u_i ∈ V_i for i = 1, 2, …, r and Σ_{i=1}^{r} u_i = 0 then each u_i = 0.

Assume that we have elements u_i satisfying the hypotheses of ($). Since T is linear we have

    0 = T(0) = T(Σ_{i=1}^{r} u_i) = Σ_{i=1}^{r} T(u_i) = Σ_{i=1}^{r} λ_i u_i

since u_i is in the λ_i-eigenspace of T. Multiplying the equation Σ_{i=1}^{r} u_i = 0 by λ_r and combining with the above allows us to eliminate u_r, and we obtain

    Σ_{i=1}^{r−1} (λ_i − λ_r) u_i = 0.

Since the i-th term in this sum is in the space V_i and our inductive hypothesis is that V_1, V_2, …, V_{r−1} are independent, it follows that (λ_i − λ_r)u_i = 0 for i = 1, 2, …, r − 1. Since the eigenvalues λ_1, λ_2, …, λ_k were assumed to be all distinct we have that each λ_i − λ_r above is nonzero, and multiplying through by (λ_i − λ_r)^{−1} gives u_i = 0 for i = 1, 2, …, r − 1. Since Σ_{i=1}^{r} u_i = 0 it follows that u_r = 0 also. This proves ($) and completes our induction.
As an immediate consequence of this we may rephrase our diagonalizability criterion:

9.7 Corollary A linear operator T: V → V is diagonalizable if and only if V is the direct sum of eigenspaces of T.

Proof. Let λ_1, …, λ_k be the distinct eigenvalues of T and let V_1, …, V_k be the corresponding eigenspaces. If V is the direct sum of the V_i then by 6.9 a basis of V can be obtained by concatenating bases of V_1, …, V_k, which results in a basis of V consisting of eigenvectors of T and shows that T is diagonalizable. Conversely, assume that T is diagonalizable, and let b = (x_1, x_2, …, x_n) be a basis of V consisting of eigenvectors of T. Let I = {1, 2, …, n}, and for each j = 1, 2, …, k let I_j = { i ∈ I | x_i ∈ V_j }.
Each x_i lies in some eigenspace; so I is the union of the I_j. Since b spans V, for each v ∈ V there exist scalars λ_i such that

    v = Σ_{i∈I} λ_i x_i = Σ_{i∈I_1} λ_i x_i + Σ_{i∈I_2} λ_i x_i + ⋯ + Σ_{i∈I_k} λ_i x_i.

Writing u_j = Σ_{i∈I_j} λ_i x_i we have that u_j ∈ V_j, and

    v = u_1 + u_2 + ⋯ + u_k ∈ V_1 + V_2 + ⋯ + V_k.

Since the sum Σ_{j=1}^{k} V_j is direct (by 9.6) and since we have shown that every element of V lies in this subspace, we have shown that V is the direct sum of eigenspaces of T, as required.

The implication of the above for the practical problem of diagonalizing an n × n matrix is that if the sum of the dimensions of the eigenspaces is n then the matrix is diagonalizable; otherwise the sum of the dimensions is less than n and it is not diagonalizable.
Example
#1 Is the following matrix A diagonalizable?

    A = [ −2  1 −1 ]
        [  2 −1  2 ]
        [  4 −1  3 ]

−− Expanding the determinant

    det [ −2−x    1     −1  ]
        [   2   −1−x     2  ]
        [   4    −1    3−x  ]

we find that

    (−2−x)(−1−x)(3−x) − (−2−x)(−1)(2) − (1)(2)(3−x) + (1)(2)(4) + (−1)(2)(−1) − (−1)(−1−x)(4)

is the characteristic polynomial of A. (Note that for calculating characteristic polynomials it seems to be easier to work with the first row expansion
technique, rather than attempt to do row operations with polynomials.) Simplifying the above expression and factorizing it gives −(x+1)²(x−2), so that the eigenvalues are −1 and 2. When, as here, the characteristic equation has a repeated root, one must hope that the dimension of the corresponding eigenspace is equal to the multiplicity of the root; it cannot be more, and if it is less then the matrix is not diagonalizable.
Alas, in this case it is less. Solving

    [ −1  1 −1 ] [ α ]   [ 0 ]
    [  2  0  2 ] [ β ] = [ 0 ]
    [  4 −1  4 ] [ γ ]   [ 0 ]

we find that the solution space has dimension one, ᵗ( 1 0 −1 ) being a basis. For the other eigenspace (for the eigenvalue 2) one must solve

    [ −4  1 −1 ] [ α ]   [ 0 ]
    [  2 −3  2 ] [ β ] = [ 0 ]
    [  4 −1  1 ] [ γ ]   [ 0 ] ,

and we obtain a solution space of dimension one with ᵗ( −1 6 10 ) as basis. The sum of the dimensions of the eigenspaces is not sufficient; the matrix is not diagonalizable.
−−<
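The conclusion of the example can be double-checked mechanically (our sketch, not the book's): the dimension of the λ-eigenspace is n minus the rank of A − λI, and both eigenspaces here turn out to be 1-dimensional.

```python
from fractions import Fraction

def rank(M):
    # rank via Gaussian elimination over the rationals
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(A), len(A[0]), 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            for j in range(c, cols):
                A[i][j] -= f * A[r][j]
        r += 1
    return r

A = [[-2, 1, -1], [2, -1, 2], [4, -1, 3]]

def shift(A, lam):
    # A - lam * I
    return [[A[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

dim_eig_minus1 = 3 - rank(shift(A, -1))   # dimension of the (-1)-eigenspace
dim_eig_2 = 3 - rank(shift(A, 2))         # dimension of the 2-eigenspace
# both are 1, and 1 + 1 < 3, so A is not diagonalizable, as found above
```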
Here is a practical point which deserves some emphasis. If when calculating the eigenspace corresponding to some eigenvalue λ you find that the equations have only the trivial solution zero, then you have made a mistake. Either λ was not an eigenvalue at all and you made a mistake when solving the characteristic equation, or else you have made a mistake solving the linear equations for the eigenspace. By definition, an eigenvalue of A is a λ for which the equations (A − λI)v = 0 have a nontrivial solution v.
'9b Invariant subspaces
9.8 Definition If T is an operator on V and U is a subspace of V then U
is said to be invariant under T if T(u) ∈ U for all u ∈ U.
Observe that the 1-dimensional subspace spanned by an eigenvector of T is invariant under T. Conversely, if a 1-dimensional subspace is T-invariant then it must be contained in an eigenspace.
Example
#2 Let A be an n × n matrix. Prove that the column space of A is invariant under left multiplication by A³ + 5A − 2I.

−− We know that the column space of A equals { Av | v ∈ F^n }. Multiplying an arbitrary element of this on the left by A³ + 5A − 2I yields another element of the same space:

    (A³ + 5A − 2I)Av = (A⁴ + 5A² − 2A)v = A(A³ + 5A − 2I)v = Av′

where v′ = (A³ + 5A − 2I)v.
−−<
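The key step in the example is that a polynomial in A commutes with A, so (A³ + 5A − 2I)(Av) = A((A³ + 5A − 2I)v). A quick check with a concrete matrix (our own, chosen arbitrarily):

```python
def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matadd(P, Q):
    return [[P[i][j] + Q[i][j] for j in range(len(P))] for i in range(len(P))]

def scale(c, P):
    return [[c * x for x in row] for row in P]

A = [[1, 2], [3, 4]]
I = [[1, 0], [0, 1]]
A2 = matmul(A, A)
A3 = matmul(A2, A)
p_of_A = matadd(matadd(A3, scale(5, A)), scale(-2, I))   # A^3 + 5A - 2I

# p(A) commutes with A, which is exactly why the column space of A is invariant
commutes = matmul(p_of_A, A) == matmul(A, p_of_A)
```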
Discovering T-invariant subspaces aids one's understanding of the operator T, since it enables T to be described in terms of operators on lower-dimensional subspaces.
9.9 Theorem Let T be an operator on V and suppose that V = U ⊕ W for some T-invariant subspaces U and W. If b and c are bases of U and W then the matrix of T relative to the basis (b, c) of V is

    [ M_bb(T_U)     0     ]
    [     0     M_cc(T_W) ]

where T_U and T_W are the operators on U and W given by restricting T to these subspaces.
Proof. Let b = (v_1, v_2, …, v_r) and c = (v_{r+1}, v_{r+2}, …, v_n), and write B = M_bb(T_U), C = M_cc(T_W). Let A be the matrix of T relative to the combined basis (v_1, v_2, …, v_n). If j ≤ r then v_j ∈ U and so

    Σ_{i=1}^{n} A_{ij} v_i = T(v_j) = T_U(v_j) = Σ_{i=1}^{r} B_{ij} v_i

and by linear independence of the v_i we deduce that A_{ij} = B_{ij} for i ≤ r and A_{ij} = 0 for i > r. This gives

    A = [ B ∗ ]
        [ 0 ∗ ]

and a similar argument for the remaining basis vectors v_j completes the proof.
Comment
9.9.1 In the situation of this theorem it is common to say that T is the direct sum of T_U and T_W, and to write T = T_U ⊕ T_W. The matrix in the theorem statement can be called the diagonal sum of M_bb(T_U) and M_cc(T_W).
Example
#3 Describe geometrically the effect on 3-dimensional Euclidean space of the operator T defined by

    T [x]   [ −2  1 −2 ] [x]
      [y] = [  4  1  2 ] [y]
      [z]   [  2 −3  3 ] [z].
−− Let A be the 3 × 3 matrix displayed above. We look first for eigenvectors of A (corresponding to 1-dimensional T-invariant subspaces). Since

    det [ −2−x   1    −2  ]
        [  4    1−x    2  ]  =  −x^3 + 2x^2 − x + 2  =  −(x^2 + 1)(x − 2)
        [  2    −3    3−x ]

we find that 2 is an eigenvalue, and solving

    [ −4  1 −2 ] [x]   [0]
    [  4 −1  2 ] [y] = [0]
    [  2 −3  1 ] [z]   [0]

we find that v_1 = ᵗ(−1 0 2) spans the 2-eigenspace.
The other eigenvalues of A are complex. However, since ker(T − 2i) has dimension 1, we deduce that im(T − 2i) is a 2-dimensional T-invariant subspace. Indeed, im(T − 2i) is the column space of A − 2I, which is clearly of dimension 2 since the first column is twice the third. We may choose a basis (v_2, v_3) of this subspace by choosing v_2 arbitrarily (for instance, let v_2 be the third column of A − 2I) and letting v_3 = T(v_2). (We make this choice simply so that it is easy to express T(v_2) in terms of v_2 and v_3.) A quick calculation in fact gives that v_3 = ᵗ(4 −4 −7) and T(v_3) = −v_2. It is easily checked that v_1, v_2 and v_3 are linearly independent and hence form a basis of R^3, and the action of T is given by the equations T(v_1) = 2v_1,
T(v_2) = v_3 and T(v_3) = −v_2. In matrix terms, the matrix of T relative to this basis is

    [ 2  0  0 ]
    [ 0  0 −1 ]
    [ 0  1  0 ].
On the one-dimensional subspace spanned by v_1 the action of T is just multiplication by 2. Its action on the two-dimensional subspace spanned by v_2 and v_3 is also easy to visualize. If v_2 and v_3 were orthogonal to each other and of equal lengths then v_2 → v_3 and v_3 → −v_2 would give a rotation through 90°. Although they are not orthogonal and equal in length, the action of T on the plane spanned by v_2 and v_3 is nonetheless rather like a rotation through 90°: it is what the rotation would become if the plane were “squashed” so that the axes are no longer perpendicular. Finally, an arbitrary point is easily expressed (using the parallelogram law) as a sum of a multiple of v_1 and something in the plane of v_2 and v_3. The action of T is to double the v_1 component and “rotate” the (v_2, v_3) part. −−<
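The arithmetic in this example is easy to verify mechanically. The following numerical check (a sketch in Python with numpy, which the text itself does not use) reproduces v_1, v_2, v_3 and the block matrix of T relative to that basis:

```python
import numpy as np

A = np.array([[-2,  1, -2],
              [ 4,  1,  2],
              [ 2, -3,  3]], dtype=float)

v1 = np.array([-1.0, 0.0, 2.0])           # spans the 2-eigenspace
v2 = (A - 2 * np.eye(3))[:, 2].copy()     # third column of A - 2I
v3 = A @ v2                               # v3 = T(v2)

assert np.allclose(A @ v1, 2 * v1)        # T(v1) = 2 v1
assert np.allclose(v3, [4, -4, -7])
assert np.allclose(A @ v3, -v2)           # T(v3) = -v2

# Matrix of T relative to the basis (v1, v2, v3): P^{-1} A P
P = np.column_stack([v1, v2, v3])
M = np.linalg.solve(P, A @ P)
assert np.allclose(M, [[2, 0,  0],
                       [0, 0, -1],
                       [0, 1,  0]])
```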
If U is an invariant subspace which does not have an invariant complement W it is still possible to obtain some simplification. Choose a basis b = (v_1, v_2, . . . , v_r) of U and extend it to a basis (v_1, v_2, . . . , v_n) of V. Exactly as in the proof of 9.9 above we see that the matrix of T relative to this basis has the form

    [ B  D ]
    [ 0  C ]

for B = M_bb(T_U) and some matrices C and D. A little thought shows that

    T_{V/U} : v + U → T(v) + U

defines an operator T_{V/U} on the quotient space V/U, and that C is the matrix of T_{V/U} relative to the basis (v_{r+1} + U, v_{r+2} + U, . . . , v_n + U). It is less easy to describe the significance of the matrix D.
The characteristic polynomial also simplifies when invariant subspaces are found.
9.10 Theorem Suppose that V = U ⊕ W where U and W are T-invariant, and let T_1, T_2 be the operators on U and W given by restricting T. If f_1(x), f_2(x) are the characteristic polynomials of T_1, T_2 then the characteristic polynomial of T is the product f_1(x)f_2(x).
Proof. Choose bases b and c of U and W, and let B and C be the corresponding matrices of T_1 and T_2. Using 9.9 we find that the characteristic polynomial of T is

    c_T(x) = det [ B − xI    0     ] = det(B − xI) det(C − xI) = f_1(x)f_2(x)
                 [   0     C − xI  ]

(where we have used 8.24) as required.
In fact we do not need a direct decomposition for this. If U is a T-invariant subspace then the characteristic polynomial of T is the product of the characteristic polynomials of T_U and T_{V/U}, the operators on U and V/U induced by T. As a corollary it follows that the dimension of the λ-eigenspace of an operator T is at most the multiplicity of x − λ as a factor of the characteristic polynomial c_T(x).
9.11 Proposition Let U be the λ-eigenspace of T and let r = dim U. Then c_T(x) is divisible by (x − λ)^r.
Proof. Relative to a basis (v_1, v_2, . . . , v_n) of V such that (v_1, v_2, . . . , v_r) is a basis of U, the matrix of T has the form

    [ λI_r  D ]
    [  0    C ].

By 8.24 we deduce that c_T(x) = (λ − x)^r det(C − xI).
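A minimal numerical illustration of 9.11 (a sketch in Python with numpy, an assumption of this note rather than part of the text): the matrix below has characteristic polynomial (2 − x)^2, so λ = 2 has multiplicity 2, while its 2-eigenspace has dimension r = 1; and (x − 2)^r does indeed divide the characteristic polynomial.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])

# The 2-eigenspace ker(A - 2I) is only 1-dimensional: r = 1.
r = 2 - np.linalg.matrix_rank(A - 2 * np.eye(2))
assert r == 1

# Yet (A - 2I)^2 = 0, reflecting that (x - 2) has multiplicity 2 in c_A(x);
# r = 1 <= 2 is consistent with Proposition 9.11.
assert np.allclose(np.linalg.matrix_power(A - 2 * np.eye(2), 2), 0)
```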
§9c Algebraically closed fields
Up to now the field F has played only a background role, and which field it is has been irrelevant. But it is not true that all fields are equivalent, and differences between fields are soon encountered when solving polynomial equations. For instance, over the field R of real numbers the polynomial x^2 + 1 has no roots (and so the matrix

    [  0  1 ]
    [ −1  0 ]

has no eigenvalues), but over the field C of complex numbers it has two roots, i and −i. So which field one is working over has a big bearing on questions of diagonalizability. (For example, the above matrix is not diagonalizable over R but is diagonalizable over C.)

9.12 Definition A field F is said to be algebraically closed if every polynomial of degree greater than zero with coefficients in F has a zero in F.
A trivial induction based on the definition shows that over an algebraically closed field every polynomial of degree greater than zero can be expressed as a product of factors of degree one. In particular

9.13 Proposition If F is an algebraically closed field then for every matrix A ∈ Mat(n × n, F) there exist λ_1, λ_2, . . . , λ_n ∈ F (not necessarily distinct) such that

    c_A(x) = (λ_1 − x)(λ_2 − x) . . . (λ_n − x)

(where c_A(x) is the characteristic polynomial of A).
As a consequence of 9.13 the analysis of linear operators is significantly simpler if the ground field is algebraically closed. Because of this it is convenient to make use of complex numbers when studying linear operators on real vector spaces.

9.14 The Fundamental Theorem of Algebra The complex field C is algebraically closed.

The traditional name of this theorem is somewhat inappropriate, since it is proved by methods of complex analysis. We will not prove it here. There is an important theorem, which really is a theorem of algebra, which asserts that for every field F there is an algebraically closed field containing F as a subfield. The proof of this is also beyond the scope of this book, but it proceeds by “adjoining” new elements to F, in much the same way as √−1 is adjoined to R in the construction of C.
§9d Generalized eigenspaces
Let T be a linear operator on an n-dimensional vector space V and suppose that the characteristic polynomial of T can be factorized into factors of degree one:

    c_T(x) = (λ_1 − x)^{m_1}(λ_2 − x)^{m_2} . . . (λ_k − x)^{m_k}

where λ_i ≠ λ_j for i ≠ j. (If the field F is algebraically closed this will always be the case.) Let V_i be the λ_i-eigenspace and let n_i = dim V_i. Certainly n_i ≥ 1 for each i, since λ_i is an eigenvalue of T; so, by 9.11,

9.14.1    1 ≤ n_i ≤ m_i    for all i = 1, 2, . . . , k.
If m_i = 1 for all i then, since c_T has degree n (by Exercise 12 of Chapter Two), it follows that k = n and

    dim(V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_k) = Σ_{i=1}^{n} dim V_i = n = dim V

so that V is the direct sum of the eigenspaces.
If the m_i are not all equal to 1 then, as we have seen by example, V need not be the sum of the eigenspaces V_i = ker(T − λ_i i). A question immediately suggests itself: instead of V_i, should we perhaps use W_i = ker(T − λ_i i)^{m_i}?
9.15 Definition Let λ be an eigenvalue of the operator T and let m be the multiplicity of λ − x as a factor of the characteristic polynomial c_T(x). Then ker(T − λi)^m is called the generalized λ-eigenspace of T.
9.16 Proposition Let T be a linear operator on V. The generalized eigenspaces of T are T-invariant subspaces of V.

Proof. It is immediate from Definition 9.15 and Theorem 3.13 that generalized eigenspaces are subspaces; hence invariance is all that needs to be proved.

Let λ be an eigenvalue of T and m the multiplicity of x − λ in c_T(x), and let W be the generalized λ-eigenspace of T. By definition, v ∈ W if and only if (T − λi)^m(v) = 0; let v be such an element. Since (T − λi)^m T = T(T − λi)^m we see that

    (T − λi)^m(T(v)) = T((T − λi)^m(v)) = T(0) = 0,

and it follows that T(v) ∈ W. Thus T(v) ∈ W for all v ∈ W, as required.
If (T − λi)^i(x) = 0 then clearly (T − λi)^j(x) = 0 whenever j ≥ i. Hence

9.16.1    ker(T − λi) ⊆ ker(T − λi)^2 ⊆ ker(T − λi)^3 ⊆ ⋯

and in particular the λ-eigenspace of T is contained in the generalized λ-eigenspace. Because our space V is finite dimensional the subspaces in the chain 9.16.1 cannot get arbitrarily large. We will show below that in fact the generalized eigenspace is the largest, the subsequent terms in the chain all being equal:

    ker(T − λi)^m = ker(T − λi)^{m+1} = ker(T − λi)^{m+2} = ⋯

where m is the multiplicity of x − λ in c_T(x).
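The growth and stabilization of the chain 9.16.1 can be watched numerically. In this sketch (Python with numpy; the example matrix is an illustrative assumption, not from the text) λ = 2 has multiplicity m = 4, and the kernel dimensions increase until they reach 4 and then stay constant:

```python
import numpy as np

# A is a diagonal sum of the blocks J_3(2) and J_1(2) (1's below the diagonal),
# so lambda = 2 is the only eigenvalue and has multiplicity m = 4.
A = np.array([[2, 0, 0, 0],
              [1, 2, 0, 0],
              [0, 1, 2, 0],
              [0, 0, 0, 2]], dtype=float)

N = A - 2 * np.eye(4)
dims = [4 - np.linalg.matrix_rank(np.linalg.matrix_power(N, p))
        for p in range(1, 7)]

# dim ker(T - 2i)^p for p = 1, 2, ...: the chain grows, then stabilizes.
assert dims == [2, 3, 4, 4, 4, 4]
```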
9.17 Lemma Let S: V → V be a linear operator and x ∈ V. If r is a positive integer such that S^r(x) = 0 but S^{r−1}(x) ≠ 0 then the elements x, S(x), . . . , S^{r−1}(x) are linearly independent.
Proof. Suppose that λ_0 x + λ_1 S(x) + ⋯ + λ_{r−1} S^{r−1}(x) = 0 and suppose that the scalars λ_i are not all zero. Choose k minimal such that λ_k ≠ 0. Then the above equation can be written as

    λ_k S^k(x) + λ_{k+1} S^{k+1}(x) + ⋯ + λ_{r−1} S^{r−1}(x) = 0.

Applying S^{r−k−1} to both sides gives

    λ_k S^{r−1}(x) + λ_{k+1} S^r(x) + ⋯ + λ_{r−1} S^{2r−k−2}(x) = 0

and this reduces to λ_k S^{r−1}(x) = 0 since S^i(x) = 0 for i ≥ r. This contradicts 3.7 since both λ_k and S^{r−1}(x) are nonzero.
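Lemma 9.17 can be illustrated with a nilpotent shift operator (a numpy sketch; not part of the text): taking x with S^3(x) ≠ 0 and S^4(x) = 0 gives r = 4 linearly independent vectors.

```python
import numpy as np

# S is the 4x4 shift with 1's below the diagonal: S^4 = 0 but S^3 != 0.
S = np.diag(np.ones(3), k=-1)
x = np.array([1.0, 0.0, 0.0, 0.0])       # S^3(x) != 0 and S^4(x) = 0, so r = 4

vectors = [x]
for _ in range(3):
    vectors.append(S @ vectors[-1])      # x, S(x), S^2(x), S^3(x)

M = np.column_stack(vectors)
assert np.linalg.matrix_rank(M) == 4     # the r vectors are linearly independent
```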
9.18 Proposition Let T: V → V be a linear operator, and let λ and m be as in Definition 9.15. If r is an integer such that ker(T − λi)^r ≠ ker(T − λi)^{r−1} then r ≤ m.

Proof. By 9.16.1 we have ker(T − λi)^{r−1} ⊆ ker(T − λi)^r; so if they are not equal there must be an x in ker(T − λi)^r and not in ker(T − λi)^{r−1}. Given such an x, define x_i = (T − λi)^{i−1}(x) for i from 1 to r, noting that this gives

9.18.1    T(x_i) = λx_i + x_{i+1}    (for i = 1, 2, . . . , r − 1),
          T(x_i) = λx_i              (for i = r).
Lemma 9.17, applied with S = T − λi, shows that x_1, x_2, . . . , x_r are linearly independent, and by 4.10 there exist x_{r+1}, x_{r+2}, . . . , x_n ∈ V such that b = (x_1, x_2, . . . , x_n) is a basis.

The first r columns of M_bb(T) can be calculated using 9.18.1, and we find that

    M_bb(T) = [ J_r(λ)  C ]
              [   0     D ]

for some D ∈ Mat((n − r) × (n − r), F) and C ∈ Mat(r × (n − r), F), where J_r(λ) ∈ Mat(r × r, F) is the matrix

9.18.2    J_r(λ) = [ λ 0 0 . . . 0 0 ]
                   [ 1 λ 0 . . . 0 0 ]
                   [ 0 1 λ . . . 0 0 ]
                   [ .   .       .   ]
                   [ 0 0 0 . . . λ 0 ]
                   [ 0 0 0 . . . 1 λ ]
(having diagonal entries equal to λ, entries immediately below the diagonal equal to 1, and all other entries zero).

Since J_r(λ) − xI is lower triangular with all diagonal entries equal to λ − x we see that

    det(J_r(λ) − xI) = (λ − x)^r

and hence the characteristic polynomial of M_bb(T) is (λ − x)^r f(x), where f(x) is the characteristic polynomial of D. So c_T(x) is divisible by (x − λ)^r. But by definition the multiplicity of x − λ in c_T(x) is m; so we must have r ≤ m.
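The matrix J_r(λ) of 9.18.2 is easy to build and probe numerically (a sketch in Python with numpy, which is an assumption of this note rather than the book's method):

```python
import numpy as np

def jordan_block(r: int, lam: float) -> np.ndarray:
    """J_r(lam) in this book's convention: lam on the diagonal, 1's just below."""
    return lam * np.eye(r) + np.diag(np.ones(r - 1), k=-1)

J = jordan_block(4, 3.0)
N = J - 3.0 * np.eye(4)

assert np.allclose(np.linalg.matrix_power(N, 4), 0)       # (J - lam I)^r = 0
assert not np.allclose(np.linalg.matrix_power(N, 3), 0)   # (J - lam I)^{r-1} != 0

# Since J is lower triangular, det(J) = lam^r and trace(J) = r*lam,
# consistent with the characteristic polynomial (lam - x)^r.
assert np.isclose(np.linalg.det(J), 3.0 ** 4)
assert np.isclose(np.trace(J), 4 * 3.0)
```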
It follows from Proposition 9.18 that the intersection of ker(T − λi)^m and im(T − λi)^m is {0}. For if x ∈ im(T − λi)^m ∩ ker(T − λi)^m then x = (T − λi)^m(y) for some y, and (T − λi)^m(x) = 0. But from this we deduce that (T − λi)^{2m}(y) = 0, whence

    y ∈ ker(T − λi)^{2m} = ker(T − λi)^m    (by 9.18)

and it follows that x = (T − λi)^m(y) = 0.
9.19 Theorem In the above situation,

    V = ker(T − λi)^m ⊕ im(T − λi)^m.
Proof. We have seen that ker(T − λi)^m and im(T − λi)^m have trivial intersection, and hence their sum is direct. Now by Theorem 6.9 the dimension of ker(T − λi)^m ⊕ im(T − λi)^m equals dim(ker(T − λi)^m) + dim(im(T − λi)^m), which equals the dimension of V by the Main Theorem on Linear Transformations. Since ker(T − λi)^m ⊕ im(T − λi)^m ⊆ V the result follows.
Let U = ker(T − λi)^m (the generalized eigenspace) and let W be the image of (T − λi)^m. We have seen that U is T-invariant, and it is equally easy to see that W is T-invariant. Indeed, if x ∈ W then x = (T − λi)^m(y) for some y, and hence

    T(x) = T((T − λi)^m(y)) = (T − λi)^m(T(y)) ∈ im(T − λi)^m = W.

By 9.10 we have that c_T(x) = f(x)g(x) where f(x) and g(x) are the characteristic polynomials of T_U and T_W, the restrictions of T to U and W.
Let µ be an eigenvalue of T_U. Then T(x) = µx for some nonzero x ∈ U, and this gives (T − λi)(x) = (µ − λ)x. It follows easily by induction that (T − λi)^m(x) = (µ − λ)^m x, and therefore (µ − λ)^m x = 0 since x ∈ U = ker(T − λi)^m. But x ≠ 0; so we must have µ = λ. We have proved that λ is the only eigenvalue of T_U.

On the other hand, λ is not an eigenvalue of T_W, since if it were then we would have (T − λi)(x) = 0 for some nonzero x ∈ W, a contradiction since ker(T − λi) ⊆ U and U ∩ W = {0}.
We are now able to prove the main theorem of this section:

9.20 Theorem Suppose that T: V → V is a linear operator whose characteristic polynomial factorizes into factors of degree 1. Then V is a direct sum of the generalized eigenspaces of T. Furthermore, the dimension of each generalized eigenspace equals the multiplicity of the corresponding factor of the characteristic polynomial.
Proof. Let c_T(x) = (x − λ_1)^{m_1}(x − λ_2)^{m_2} . . . (x − λ_k)^{m_k} and let U_1 be the generalized λ_1-eigenspace, W_1 the image of (T − λ_1 i)^{m_1}.

As noted above we have a factorization c_T(x) = f_1(x)g_1(x), where f_1(x) and g_1(x) are the characteristic polynomials of T_{U_1} and T_{W_1}, and since c_T(x) can be expressed as a product of factors of degree 1 the same is true of f_1(x) and g_1(x). Moreover, since λ_1 is the only eigenvalue of T restricted to U_1 and λ_1 is not an eigenvalue of T restricted to W_1, we must have

    f_1(x) = (x − λ_1)^{m_1}    and    g_1(x) = (x − λ_2)^{m_2} . . . (x − λ_k)^{m_k}.

One consequence of this is that dim U_1 = m_1; similarly for each i the generalized λ_i-eigenspace has dimension m_i.

Applying the same argument to the restriction of T to W_1 and the eigenvalue λ_2 gives W_1 = U_2 ⊕ W_2 where U_2 is contained in ker(T − λ_2 i)^{m_2} and the characteristic polynomial of T restricted to U_2 is (x − λ_2)^{m_2}. Since the dimension of U_2 is m_2 it must be the whole generalized λ_2-eigenspace. Repeating the process we eventually obtain

    V = U_1 ⊕ U_2 ⊕ ⋯ ⊕ U_k

where U_i is the generalized λ_i-eigenspace of T.
Comments
9.20.1 Theorem 9.20 is a good start in our quest to understand an arbitrary linear operator; however, generalized eigenspaces are not as simple as eigenspaces, and we need further investigation to understand the action of T on each of its generalized eigenspaces. Observe that the restriction of (T − λi)^m to the kernel of (T − λi)^m is just the zero operator; so if λ is an eigenvalue of T and N the restriction of T − λi to the generalized λ-eigenspace, then N^m = 0 for some m. We investigate operators satisfying this condition in the next section.
9.20.2 Let A be an n × n matrix over an algebraically closed field F. Theorem 9.20 shows that there exists a nonsingular T ∈ Mat(n × n, F) such that the first m_1 columns of T are in the null space of (A − λ_1 I)^{m_1}, the next m_2 are in the null space of (A − λ_2 I)^{m_2}, and so on, and that T^{−1}AT is a diagonal sum of blocks A_i ∈ Mat(m_i × m_i, F). Furthermore, each block A_i satisfies (A_i − λ_i I)^{m_i} = 0 (since the matrix A_i − λ_i I corresponds to the operator N in 9.20.1 above).
9.20.3 We will say that an operator T is λ-primary if its characteristic polynomial c_T(x) is a power of x − λ. An equivalent condition is that some power of T − λi is zero.
§9e Nilpotent operators
9.21 Definition A linear operator N on a space V is said to be nilpotent if there exists a positive integer q such that N^q(v) = 0 for all v ∈ V. The least such q is called the index of nilpotence of N.

As we commented in 9.20.1, an understanding of nilpotent operators is crucial for an understanding of operators in general. The present section is devoted to a thorough investigation of nilpotent operators. We start by considering a special case.
9.22 Proposition Let b = (v_1, v_2, . . . , v_q) be a basis of the space V, and T: V → V a linear operator satisfying T(v_i) = v_{i+1} for i from 1 to q − 1, and T(v_q) = 0. Then T is nilpotent of index q. Moreover, for each l with 1 ≤ l ≤ q the l elements v_{q−l+1}, . . . , v_{q−1}, v_q form a basis for the kernel of T^l.
The proof of this is left as an exercise. Note that the matrix of T relative to b is the q × q matrix

    J_q(0) = [ 0 0 0 . . . 0 0 ]
             [ 1 0 0 . . . 0 0 ]
             [ 0 1 0 . . . 0 0 ]
             [ .   .       .   ]
             [ 0 0 0 . . . 1 0 ]

(where the entries immediately below the main diagonal are 1 and all other entries are 0).
We consider now an operator which is a direct sum of operators like T in 9.22. Thus we suppose that V = V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_k with dim V_j = q_j, and that T_j: V_j → V_j is an operator with matrix J_{q_j}(0) relative to some basis b_j of V_j. Let N: V → V be the direct sum of the T_j. We may write

    b_j = (x_j, N(x_j), N^2(x_j), . . . , N^{q_j − 1}(x_j))

and combine the b_j to give a basis b = (b_1, b_2, . . . , b_k) of V; the elements of b are all the elements N^i(x_j) for j = 1 to k and i = 0 to q_j − 1. We see that the matrix of N relative to b is the diagonal sum of the J_{q_j}(0) for j = 1 to k.
For each i = 1, 2, 3, . . . let k_i be the number of j for which q_j ≥ i. Thus k_1 is just k, the total number of summands V_j, while k_2 is the number of summands of dimension 2 or more, k_3 the number of dimension 3 or more, and so on.
9.23 Proposition For each l (= 1, 2, 3, . . . ) the dimension of the kernel of N^l is k_1 + k_2 + ⋯ + k_l.
Proof. In the basis b there are k_1 elements N^i(x_j) for which i = q_j − 1, there are k_2 for which i = q_j − 2, and so on. Hence the total number of elements N^i(x_j) in b such that i ≥ q_j − l is Σ_{i=1}^{l} k_i. So to prove the proposition it suffices to show that these elements form a basis for the kernel of N^l.

Let v ∈ V be arbitrary. Expressing v as a linear combination of the elements of the basis b we have

9.23.1    v = Σ_{j=1}^{k} Σ_{i=0}^{q_j − 1} λ_{ij} N^i(x_j)
for some scalars λ_{ij}. Now

    N^l(v) = Σ_{j=1}^{k} Σ_{i=0}^{q_j − 1} λ_{ij} N^{i+l}(x_j)
           = Σ_{j=1}^{k} Σ_{i=l}^{q_j + l − 1} λ_{i−l,j} N^i(x_j)
           = Σ_{j=1}^{k} Σ_{i=l}^{q_j − 1} λ_{i−l,j} N^i(x_j)

since N^i(x_j) = 0 for i > q_j − 1. Since the above expresses N^l(v) in terms of the basis b we see that N^l(v) is zero if and only if the coefficients λ_{i−l,j} are zero for all j = 1, 2, . . . , k and i such that l ≤ i ≤ q_j − 1. That is, v ∈ ker N^l if and only if λ_{ij} = 0 for all i and j satisfying 0 ≤ i < q_j − l. So ker N^l consists of arbitrary linear combinations of the N^i(x_j) for which i ≥ q_j − l. Since these elements are linearly independent they form a basis of ker N^l, as required.
Our main theorem asserts that every nilpotent operator has the above form.

9.24 Theorem Let N be a nilpotent linear operator on the finite dimensional space V. Then V has a direct decomposition V = V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_k in which each V_j has a basis b_j = (x_j, N(x_j), . . . , N^{q_j − 1}(x_j)) with N^{q_j}(x_j) = 0. Furthermore, in any such decomposition the number of summands V_j of dimension l equals 2(dim ker N^l) − dim ker N^{l−1} − dim ker N^{l+1}.
The construction of the subspaces V_j is somewhat tedious, and occupies the rest of this section. Note, however, that once this has been done the second assertion of the theorem follows from Proposition 9.23; if k_i is the number of j with q_j ≥ i then the number of summands V_j of dimension l is k_l − k_{l+1}, while 9.23 gives that 2(dim ker N^l) − dim ker N^{l−1} − dim ker N^{l+1} is equal to 2 Σ_{i=1}^{l} k_i − Σ_{i=1}^{l−1} k_i − Σ_{i=1}^{l+1} k_i, which is k_l − k_{l+1}.
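The block-counting formula of Theorem 9.24 can be applied mechanically. The sketch below (Python with numpy; the helper functions are illustrative assumptions, not from the text) recovers the block sizes 3, 2, 1 of a canonical nilpotent matrix from kernel dimensions alone:

```python
import numpy as np

def nilpotent_block(r: int) -> np.ndarray:
    """J_r(0): 1's immediately below the diagonal, zeros elsewhere."""
    M = np.zeros((r, r))
    if r > 1:
        M += np.diag(np.ones(r - 1), k=-1)
    return M

def diag_sum(blocks):
    """Diagonal sum of square blocks."""
    n = sum(b.shape[0] for b in blocks)
    M = np.zeros((n, n))
    i = 0
    for b in blocks:
        r = b.shape[0]
        M[i:i + r, i:i + r] = b
        i += r
    return M

# N = diag(J_3(0), J_2(0), J_1(0)): one block each of sizes 3, 2, 1.
N = diag_sum([nilpotent_block(3), nilpotent_block(2), nilpotent_block(1)])
n = N.shape[0]

def ker_dim(l: int) -> int:
    return n - np.linalg.matrix_rank(np.linalg.matrix_power(N, l)) if l > 0 else 0

# Number of summands of dimension l, by the formula in Theorem 9.24.
counts = {l: 2 * ker_dim(l) - ker_dim(l - 1) - ker_dim(l + 1) for l in range(1, 4)}
assert counts == {1: 1, 2: 1, 3: 1}
```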
Before beginning the construction of the V_j we state a lemma which can be easily proved using 6.10.
9.25 Lemma Let W be a subspace of the vector space V. If Y is a subspace of V such that Y ∩ W = {0} then there exists a subspace X of V which contains Y and is complementary to W.

The proof proceeds by noting first that the sum Y + W is direct, then choosing Y′ complementing Y ⊕ W, and defining X = Y′ ⊕ Y.
We return now to the proof of 9.24. Assume then that N: V → V is a nilpotent operator on the finite dimensional space V. It is in fact convenient for us to prove slightly more than was stated in 9.24. Specifically, we will prove the following:

9.26 Theorem Let q be the index of nilpotence of N and let z_1, z_2, . . . , z_r be any elements of V which form a basis for a complement to the kernel of N^{q−1}. Then the elements x_1, x_2, . . . , x_k in 9.24 can be chosen so that x_i = z_i for i = 1, 2, . . . , r.
The proof uses induction on q. If q = 1 then N is the zero operator and ker N^{q−1} = ker i = {0}; if (x_1, x_2, . . . , x_k) is any basis of V and we define V_j to be the 1-dimensional space spanned by x_j then it is trivial to check that the assertions of Theorem 9.24 are satisfied.

Assume then that q > 1 and that the assertions of 9.26 hold for nilpotent operators of lesser index. Suppose that (x_1, x_2, . . . , x_r) is a basis of a subspace X of V which is complementary to the kernel of N^{q−1}; observe that 6.10 guarantees the existence of at least one such.
9.27 Lemma The qr vectors N^i(x_j) (for i from 0 to q − 1 and j from 1 to r) are linearly independent.

Proof. Suppose that

9.27.1    Σ_{i=0}^{q−1} Σ_{j=1}^{r} λ_{ij} N^i(x_j) = 0

and suppose that the coefficients λ_{ij} are not all zero. Choose k minimal such that λ_{kj} ≠ 0 for some j, so that the equation 9.27.1 may be written as

    Σ_{j=1}^{r} Σ_{i=k}^{q−1} λ_{ij} N^i(x_j) = 0,
and apply the operator N^{q−1−k} to both sides. This gives

    Σ_{j=1}^{r} Σ_{i=k}^{q−1} λ_{ij} N^{i+q−1−k}(x_j) = N^{q−1−k}(0) = 0

and since N^h(x_j) = 0 for all h ≥ q we obtain

    0 = Σ_{j=1}^{r} Σ_{i=k}^{q−1} λ_{ij} N^{i+q−1−k}(x_j) = Σ_{j=1}^{r} λ_{kj} N^{q−1}(x_j) = N^{q−1}(Σ_{j=1}^{r} λ_{kj} x_j)

giving Σ_{j=1}^{r} λ_{kj} x_j ∈ ker N^{q−1}. But the x_j lie in X, and X ∩ ker N^{q−1} = {0}. Hence Σ_{j=1}^{r} λ_{kj} x_j = 0, and by linear independence of the x_j we deduce that λ_{kj} = 0 for all j, a contradiction. We conclude that the λ_{ij} must all be zero.
For j = 1, 2, . . . , r we set b_j = (x_j, N(x_j), . . . , N^{q−1}(x_j)); by Lemma 9.27 these are bases of independent subspaces V_j, so that V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_r is a subspace of V. We need to find further summands V_{r+1}, V_{r+2}, . . . , V_k to make this into a direct decomposition of V, and the inductive hypothesis (that 9.24 holds for nilpotent operators of index less than q) will enable us to do this.
Observe that V′ = ker N^{q−1} is an N-invariant subspace of V, and if N′ is the restriction of N to V′ then N′ is nilpotent of index q − 1. It is to N′ that the inductive hypothesis will be applied. Before doing so we need a basis of a subspace X′ of V′ complementary to ker(N′)^{q−2}. This basis will consist of the N(x_j) (for j = 1 to r) and (possibly) some extra elements. (Note that N′-invariant subspaces of V′ are exactly the same as N-invariant subspaces of V′, since N′ is simply the restriction of N. Similarly, since ker N^{q−2} ⊆ ker N^{q−1} = V′ it follows that ker N^{q−2} and ker(N′)^{q−2} coincide.)
9.28 Lemma The elements N(x_j) (for j = 1, 2, . . . , r) form a basis of a subspace Y of V′ for which Y ∩ (ker N^{q−2}) = {0}.
Proof. By 9.27 the elements N(x_j) are linearly independent, and they lie in V′ = ker N^{q−1} since N^{q−1}(N(x_j)) = N^q(x_j) = 0. Let Y be the space they span, and suppose that v ∈ Y ∩ ker N^{q−2}. Writing v = Σ_{j=1}^{r} λ_j N(x_j) we see that v = N(x) with x = Σ_{j=1}^{r} λ_j x_j ∈ X. Now

    N^{q−1}(x) = N^{q−2}(N(x)) = N^{q−2}(v) = 0

shows that x ∈ X ∩ ker N^{q−1} = {0}, whence x = 0 and v = N(x) = 0.
By 9.25 we may choose a subspace X′ of V′ containing Y and complementary to ker N^{q−2}, and by 4.10 we may choose a basis (z_1, z_2, . . . , z_s) of X′ such that z_j = N(x_j) for j from 1 to r. Now applying the inductive hypothesis we conclude that V′ contains elements x′_1, x′_2, . . . , x′_k such that:
(i) x′_j = z_j for j = 1, 2, . . . , s; in particular, x′_j = N(x_j) for j = 1, 2, . . . , r.
(ii) b′_j = (x′_j, N(x′_j), . . . , N^{t_j − 1}(x′_j)) is a basis of a subspace V′_j of V′, where t_j is the least integer such that N^{t_j}(x′_j) = 0 (for all j = 1, 2, . . . , k).
(iii) V′ = V′_1 ⊕ V′_2 ⊕ ⋯ ⊕ V′_k.
For j from r + 1 to k we define V_j = V′_j and b_j = b′_j; in particular, x_j = x′_j and q_j = t_j. For j from 1 to r let q_j = q. Then for all j (from 1 to k) we have that b_j = (x_j, N(x_j), . . . , N^{q_j − 1}(x_j)) is a basis for V_j and q_j is the least integer with N^{q_j}(x_j) = 0. For j from 1 to r we see that the basis b′_j of V′_j is obtained from the basis b_j of V_j by deleting x_j. Thus

    V_j = X_j ⊕ V′_j    for j = 1, 2, . . . , r

where X_j is the 1-dimensional space spanned by x_j.
To complete the proofs of 9.24 and 9.26 it remains only to prove that V is the direct sum of the V_j.

Since (x_1, x_2, . . . , x_r) is a basis of X we have

    X = X_1 ⊕ X_2 ⊕ ⋯ ⊕ X_r

and since X complements V′ this gives

    V = X_1 ⊕ X_2 ⊕ ⋯ ⊕ X_r ⊕ V′.

Combining this with the direct decomposition of V′ above and rearranging the order of summands gives

    V = (X_1 ⊕ V′_1) ⊕ (X_2 ⊕ V′_2) ⊕ ⋯ ⊕ (X_r ⊕ V′_r) ⊕ V_{r+1} ⊕ V_{r+2} ⊕ ⋯ ⊕ V_k
      = V_1 ⊕ V_2 ⊕ ⋯ ⊕ V_k

as required.
Comments
9.28.1 We have not formally proved anywhere that it is legitimate to
rearrange the terms in a direct sum as in the above proof. But it is a trivial
task. Note also that we have used Exercise 15 of Chapter Six.
9.28.2 It is clear that we may choose the ordering of the summands V_j in 9.24 so that q_1 ≥ q_2 ≥ ⋯ ≥ q_k. (Indeed, this is the ordering which our proof gives.) The matrix of N relative to b is the diagonal sum of the matrices J_{q_j}(0) (for j from 1 to k); that is, if A = M_bb(N) then

    A = diag(J_{q_1}(0), J_{q_2}(0), . . . , J_{q_k}(0))    with q_1 ≥ q_2 ≥ ⋯ ≥ q_k.

We call such a matrix a canonical nilpotent matrix, and it is a corollary of 9.24 that every nilpotent matrix is similar to a unique canonical nilpotent matrix.
§9f The Jordan canonical form
Let V be a finite dimensional vector space and T: V → V a linear operator. If λ is an eigenvalue of T and U the corresponding generalized eigenspace then T_U is λ-primary, and so there is a basis b of U such that the matrix of T_U − λi is a canonical nilpotent matrix. Now since

    M_bb(T_U − λi) = M_bb(T_U) − λ M_bb(i) = M_bb(T_U) − λI

we see that the matrix of T_U is a canonical nilpotent matrix plus λI. Since J_r(0) + λI = J_r(λ) (defined in 9.18.2 above) it follows that the matrix of T_U is a diagonal sum of blocks of the form J_r(λ).
9.29 Definition Matrices of the form J_r(λ) are called Jordan blocks. A matrix of the form diag(J_{r_1}(λ), J_{r_2}(λ), . . . , J_{r_k}(λ)) with r_1 ≥ r_2 ≥ ⋯ ≥ r_k (a diagonal sum of Jordan blocks in order of nonincreasing size) is called a canonical λ-primary matrix. A diagonal sum of canonical primary matrices for different eigenvalues is called a Jordan matrix, or a matrix in Jordan canonical form.
Comment
9.29.1 Different books use different terminology. In particular, some authors use bases which are essentially the same as ours taken in the reverse order. This has the effect of making Jordan blocks upper triangular rather than lower triangular; in fact, their Jordan blocks are the transposes of ours.
Combining the main theorems of the previous two sections gives our final verdict on linear operators.

9.30 Theorem Let V be a finite dimensional vector space over an algebraically closed field F, and let T be a linear operator on V. Then there exists a basis b of V such that the matrix of T relative to b is a Jordan matrix. Moreover, the Jordan matrix is unique apart from reordering the eigenvalues.

Equivalently, every square matrix over an algebraically closed field is similar to a Jordan matrix, the primary components of which are uniquely determined.
Examples
#4 Find a nonsingular matrix T such that T^{−1}AT is in Jordan canonical form, where

    A = [ 1  1  0   −2 ]
        [ 1  5  3  −10 ]
        [ 0  1  1   −2 ]
        [ 1  2  1   −4 ].
−− Using expansion along the first row we find that the determinant of A − xI is equal to

    (1 − x) det [ 5−x   3   −10  ]       [ 1   3   −10  ]         [ 1  5−x   3  ]
                [  1   1−x   −2  ] − det [ 0  1−x   −2  ] + 2 det [ 0   1   1−x ]
                [  2    1   −4−x ]       [ 1   1   −4−x ]         [ 1   2    1  ]

    = (1 − x)(−x^3 + 2x^2) − (x^2 − 7x + 2) + 2(x^2 − 4x + 1) = x^4 − 3x^3 + 3x^2 − x

so that the eigenvalues are 0 and 1, with the corresponding generalized eigenspaces having dimensions 1 and 3 respectively. We find a basis for the 0-eigenspace by solving Ax = 0 as usual. Solving (A − I)x = 0 we find that the 1-eigenspace has dimension equal to 1, and it follows that there will be only one Jordan block corresponding to this eigenvalue. Thus the Jordan canonical form will be diag(J_3(1), J_1(0)). We could find a basis for the generalized 1-eigenspace by solving (A − I)^3 x = 0, but it is easier to use
Theorem 9.19. The generalized 1-eigenspace is the unique invariant complement
to the generalized 0-eigenspace, and by 9.19 it is simply the column
space of A − 0I. Choose x_1 to be any column of A. Certainly x_1 will lie in
the kernel of (A − I)^3; we will be unlucky if it lies in the kernel of (A − I)^2,
but if it does we will just choose a different column of A instead. Then set
x_2 = Ax_1 − x_1 and x_3 = Ax_2 − x_2 (and hope that x_3 ≠ 0). Finally, let x_4 be
an eigenvector for the eigenvalue 0. The matrix T that we seek has the x_i as
its columns. We find
    T = [ 1  −1  −1  0 ]
        [ 1  −5  −4  2 ]
        [ 0  −1  −1  0 ]
        [ 1  −2  −2  1 ]

and it is easily checked that

    AT = T [ 1  0  0  0 ]
           [ 1  1  0  0 ]
           [ 0  1  1  0 ]
           [ 0  0  0  0 ].
The theory we have developed guarantees that T is nonsingular; there is no
point calculating T^{-1}. −−<
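The computation in #4 can be checked numerically. The following sketch (not part of the original text) uses NumPy to confirm that T is nonsingular and that T^{-1}AT is the Jordan matrix diag(J_3(1), J_1(0)), written in the lower-triangular convention used throughout this chapter.

```python
import numpy as np

# The matrix A of example #4 and the matrix T built from the chain
# x_1, x_2 = Ax_1 - x_1, x_3 = Ax_2 - x_2 and the 0-eigenvector x_4.
A = np.array([[1, 1, 0, -2],
              [1, 5, 3, -10],
              [0, 1, 1, -2],
              [1, 2, 1, -4]], dtype=float)
T = np.array([[1, -1, -1, 0],
              [1, -5, -4, 2],
              [0, -1, -1, 0],
              [1, -2, -2, 1]], dtype=float)
# diag(J_3(1), J_1(0)) in the lower-triangular convention of the text.
J = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]], dtype=float)

assert abs(np.linalg.det(T)) > 1e-9              # T is nonsingular
assert np.allclose(np.linalg.inv(T) @ A @ T, J)  # T^{-1}AT is the Jordan form
```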
#5 Given that the matrix A below is nilpotent, find a nonsingular T such
that T^{-1}AT is a canonical nilpotent matrix.

    A = [  1  1  1   0 ]
        [  1 −1 −1  −1 ]
        [ −2  0  0   1 ]
        [  2  2  2   0 ].
−− We first solve Ax = 0, and we find that the null space has dimension 2.
Hence the canonical form will have two blocks. At this stage there
are two possibilities: either diag(J_2(0), J_2(0)) or diag(J_3(0), J_1(0)). Finding
the dimension of the null space of A^2 will settle the issue. In fact we see
immediately that A^2 = 0, which rules out the latter of our possibilities. Our
task now is simply to find two linearly independent columns x and y which
lie outside the null space of A. Then we can let T be the matrix with columns
x, Ax, y and Ay. The theory guarantees that T will be nonsingular.
Practically any two columns you care to choose for x and y will do. For
instance, a suitable matrix is

    T = [ 1   1  0   1 ]
        [ 0   1  1  −1 ]
        [ 0  −2  0   0 ]
        [ 0   2  0   2 ]

and it is easily checked that AT = T diag(J_2(0), J_2(0)). −−<
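The claims in #5 are equally easy to verify by machine. This sketch (not part of the original text) takes x and y to be the first two standard basis vectors, builds T with columns x, Ax, y, Ay, and checks the stated properties with NumPy.

```python
import numpy as np

A = np.array([[1, 1, 1, 0],
              [1, -1, -1, -1],
              [-2, 0, 0, 1],
              [2, 2, 2, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0, 0.0])   # first standard basis vector
y = np.array([0.0, 1.0, 0.0, 0.0])   # second standard basis vector
T = np.column_stack([x, A @ x, y, A @ y])
# diag(J_2(0), J_2(0)) in the lower-triangular convention of the text.
J = np.array([[0, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)

assert np.allclose(A @ A, 0)          # A is nilpotent with A^2 = 0
assert abs(np.linalg.det(T)) > 1e-9   # T is nonsingular
assert np.allclose(A @ T, T @ J)      # AT = T diag(J_2(0), J_2(0))
```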
We will not investigate the question of how best to compute the Jordan
canonical form in general. Note, however, the following points.
(a) Once the characteristic polynomial has been factorized, it is simply a
matter of solving simultaneous equations to find the dimensions of the
null spaces of the matrices (A − λI)^m for all eigenvalues λ and the
relevant values of m. Knowing these dimensions determines the Jordan
canonical form (by the last sentence of Theorem 9.24).
(b) Knowing what the Jordan canonical form for A has to be, it is in principle
a routine task to calculate a matrix T such that AT = TJ (where
J is the Jordan matrix). Let the columns of T be v_1, v_2, ..., v_n. If
the first Jordan block in J is J_r(λ) then the first r columns of T must
satisfy T(v_i) = λv_i + v_{i+1} (for i from 1 to r−1) and T(v_r) = λv_r. The
next batch of columns of T satisfy a similar condition determined by
the second Jordan block in J. Finding suitable columns for T is then a
matter of solving simultaneous equations.
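As a concrete illustration of point (a), the sketch below (not part of the original text; it uses NumPy's numerical rank) recovers the Jordan form diag(J_3(1), J_1(0)) of the matrix of example #4 from null-space dimensions alone.

```python
import numpy as np

def nullity(M):
    """dim ker M, computed as (number of columns) - rank M."""
    return M.shape[1] - np.linalg.matrix_rank(M)

A = np.array([[1, 1, 0, -2],       # the matrix of example #4
              [1, 5, 3, -10],
              [0, 1, 1, -2],
              [1, 2, 1, -4]], dtype=float)
I = np.eye(4)

# Eigenvalue 1: the nullities of (A - I)^m for m = 1, 2, 3 grow by one at
# each step, which forces a single block J_3(1).
dims = [nullity(np.linalg.matrix_power(A - I, m)) for m in (1, 2, 3)]
assert dims == [1, 2, 3]
# Eigenvalue 0: nullity 1 gives a single block J_1(0).
assert nullity(A) == 1
```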
Let T be a linear operator on V, and assume that the field is algebraically
closed. Let U_1, U_2, ..., U_k be the generalized eigenspaces of T,
and for each i let T_i be the restriction of T to U_i. Thus T is the direct
sum of the T_i, in the sense of 9.9.1. For each i let S_i: U_i → U_i be given by
S_i(u) = λ_i u, where λ_i is the eigenvalue of T associated with the space U_i,
and let S be the direct sum of the operators S_i. It can be seen that S is
the unique linear operator on V having the same eigenvalues as T and such
that for each i the generalized λ_i-eigenspace of T is the λ_i-eigenspace of S.
Because S_i is simply λ_i i, it is clear that S_i T_i = T_i S_i for each i, and from
this it follows that ST = TS. Furthermore, if we write N = T − S then
the restriction of N to U_i is equal to the nilpotent operator T − λ_i i, and it
follows that N, being a direct sum of nilpotent operators, is itself nilpotent.
The operator S is called the semisimple part of T, and the operator
N is called the nilpotent part of T. They are uniquely determined by the
properties that S is diagonalizable, N is nilpotent, SN = NS and S+N = T.
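For the matrix A of example #4 these two parts can be computed explicitly: with the T and Jordan form found there, S = T diag(1, 1, 1, 0) T^{-1} and N = A − S. The following sketch (not part of the original text) checks the stated properties numerically.

```python
import numpy as np

A = np.array([[1, 1, 0, -2],
              [1, 5, 3, -10],
              [0, 1, 1, -2],
              [1, 2, 1, -4]], dtype=float)
T = np.array([[1, -1, -1, 0],      # the matrix T found in example #4
              [1, -5, -4, 2],
              [0, -1, -1, 0],
              [1, -2, -2, 1]], dtype=float)
Tinv = np.linalg.inv(T)

S = T @ np.diag([1.0, 1.0, 1.0, 0.0]) @ Tinv   # semisimple (diagonalizable) part
N = A - S                                      # nilpotent part

assert np.allclose(S + N, A)
assert np.allclose(S @ N, N @ S)                     # SN = NS
assert np.allclose(np.linalg.matrix_power(N, 4), 0)  # N is nilpotent
```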
If A ∈ Mat(n × n, C) we define the exponential of A by

    exp(A) = e^A = I + A + (1/2)A^2 + (1/6)A^3 + ··· = Σ_{n=0}^∞ (1/n!) A^n.
It can be proved that this infinite series converges. Furthermore, if T is any
nonsingular matrix then

    exp(T^{-1}AT) = T^{-1} exp(A) T.
For a Jordan matrix the calculation of its exponential is relatively easy; so
for an arbitrary matrix A it is reasonable to proceed by first finding T so that
T^{-1}AT is in Jordan form. If A is diagonalizable this works out particularly
well since

    exp(diag(λ_1, λ_2, ..., λ_n)) = diag(e^{λ_1}, e^{λ_2}, ..., e^{λ_n}).
It is unfortunately not true that exp(A + B) = exp(A) exp(B) for
all A and B; however, this property is valid if AB = BA, and so, in
particular, if S and N are the semisimple and nilpotent parts of A then
exp(A) = exp(S) exp(N). Calculation of the exponential of a nilpotent
matrix is easy, since the infinite series in the definition becomes a finite
series. For example,

    exp [ 0 0 0 ]  =  I  +  [ 0 0 0 ]  +  [ 0    0 0 ]
        [ 1 0 0 ]           [ 1 0 0 ]     [ 0    0 0 ]
        [ 0 1 0 ]           [ 0 1 0 ]     [ 1/2  0 0 ].
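This finite series is easy to check directly; a sketch (not part of the original text) in NumPy:

```python
import numpy as np
from math import factorial

N = np.array([[0, 0, 0],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)

# N^3 = 0, so the exponential series terminates after the N^2 term.
expN = sum(np.linalg.matrix_power(N, k) / factorial(k) for k in range(3))

assert np.allclose(np.linalg.matrix_power(N, 3), 0)
assert np.allclose(expN, [[1, 0, 0],
                          [1, 1, 0],
                          [0.5, 1, 1]])
```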
Matrix exponentials occur naturally in the solution of simultaneous
differential equations of the kind we have already considered. Specifically,
the solution of the system

    d/dt [ x_1(t) ]       [ x_1(t) ]
         [   ...  ]  =  A [   ...  ]
         [ x_n(t) ]       [ x_n(t) ]

subject to a given initial vector

    [ x_1(0) ]
    [   ...  ]  =  x_0
    [ x_n(0) ]

(where A is an n × n matrix) can be written as

    [ x_1(t) ]
    [   ...  ]  =  exp(tA) x_0,
    [ x_n(t) ]

a formula which is completely analogous to the familiar one-variable case.
Example
#6 Solve

    d/dt [ x(t) ]   [ 1 0 0 ] [ x(t) ]
         [ y(t) ] = [ 1 1 0 ] [ y(t) ]
         [ z(t) ]   [ 0 1 1 ] [ z(t) ]

subject to x(0) = 1 and y(0) = z(0) = 0.
−− Since A is (conveniently) already in Jordan form we can immediately
write down the semisimple and nilpotent parts of tA. The semisimple part
is S = tI and the nilpotent part is

    N = [ 0 0 0 ]
        [ t 0 0 ]
        [ 0 t 0 ].

Calculation of the exponential of N is easy since N^3 = 0, and, even easier,
exp(S) = e^t I. Hence the solution is

    [ x(t) ]         [ 1      0  0 ] [ 1 ]   [ e^t           ]
    [ y(t) ]  = e^t  [ t      1  0 ] [ 0 ] = [ t e^t         ]
    [ z(t) ]         [ t^2/2  t  1 ] [ 0 ]   [ (1/2)t^2 e^t  ].
−−<
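The solution of #6 can also be checked numerically. The sketch below (not part of the original text) evaluates exp(tA) = e^t (I + N + N^2/2) at a sample value of t and compares it with the stated formula.

```python
import math
import numpy as np

t = 0.7                                  # an arbitrary sample value of t
A = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1]], dtype=float)

N = t * (A - np.eye(3))                  # nilpotent part of tA, with N^3 = 0
expN = np.eye(3) + N + N @ N / 2         # finite exponential series
x = math.exp(t) * expN @ np.array([1.0, 0.0, 0.0])

assert np.allclose(x, [math.exp(t),
                       t * math.exp(t),
                       t ** 2 * math.exp(t) / 2])
```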
§9g Polynomials
Our behaviour concerning polynomials has, up to now, been somewhat
reprehensible. We have not even defined them, and we have used, without proof,
several important properties. Let us therefore, belatedly, rectify this situation
a little.
9.31 Definition  A polynomial in the indeterminate x over the field F is
an expression of the form a_0 + a_1 x + ··· + a_n x^n, where n is a nonnegative
integer and the a_i are elements of F.
9.32 Definition  Let a(x) be a nonzero polynomial over the field F, and
let a(x) = a_0 + a_1 x + ··· + a_n x^n with a_n ≠ 0. Then n is called the degree
of a(x), and a_n the leading coefficient. If the leading coefficient is 1 then the
polynomial is said to be monic.
Let F[x] be the set of all polynomials over F in the indeterminate x.
Addition and scalar multiplication of polynomials is definable in the obvious
way, and it is easily seen that the polynomials of degree less than or
equal to n form a vector space of dimension n + 1. The set F[x] itself is an
infinite dimensional vector space. However, F[x] has more algebraic
structure than this, since one can also multiply polynomials, by the usual rule:

    (Σ_i a_i x^i)(Σ_j b_j x^j) = Σ_r (Σ_i a_i b_{r−i}) x^r.
The set of all integers and the set F[x] of all polynomials in x over
the field F enjoy many similar properties. Notably, in each case there is a
division algorithm:
If n and m are integers and n ≠ 0 then there exist unique integers q and r
with m = qn + r and 0 ≤ r < n.
For polynomials:
If n(x) and m(x) are polynomials over F and n(x) ≠ 0 then there exist unique
polynomials q(x) and r(x) with m(x) = q(x)n(x) + r(x) and the degree of
r(x) less than that of n(x).
A minor technical point: the degree of the zero polynomial is undeﬁned.
For the purposes of the division algorithm, however, the zero polynomial
should be considered to have degree less than that of any other polynomial.
Note also that for nonzero polynomials the degree of a product is the sum of
the degrees of the factors; thus if the degree of the zero polynomial is to be
deﬁned at all it should be set equal to −∞.
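The division algorithm for polynomials is itself easy to implement. The sketch below (not part of the original text) represents a polynomial a_0 + a_1 x + ··· + a_n x^n by its coefficient list [a_0, a_1, ..., a_n] and uses exact rational arithmetic; the representation is an assumption made for illustration.

```python
from fractions import Fraction

def poly_divmod(m, n):
    """Return (q, r) with m = q*n + r and deg r < deg n (n must be nonzero)."""
    m = [Fraction(c) for c in m]
    n = [Fraction(c) for c in n]
    q = [Fraction(0)] * max(len(m) - len(n) + 1, 1)
    while True:
        while m and m[-1] == 0:          # strip leading zero coefficients
            m.pop()
        if len(m) < len(n):
            break
        k = len(m) - len(n)              # degree difference
        c = m[-1] / n[-1]                # next quotient coefficient
        q[k] = c
        for i, ni in enumerate(n):       # subtract c * x^k * n(x)
            m[i + k] -= c * ni
    return q, m

# (x^2 - 1) divided by (x - 1): quotient x + 1, remainder 0
q, r = poly_divmod([-1, 0, 1], [-1, 1])
assert q == [1, 1] and r == []
```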
If n and m are integers which are not both zero then one can use the
division algorithm to ﬁnd an integer d such that
(i) d is a factor of both n and m, and
(ii) d = rn +sm for some integers r and s.
(Obviously, if n and m are both positive then either r < 0 or s < 0.) The
process by which one ﬁnds d is known as the Euclidean algorithm, and it goes
as follows. Assuming that m > n, replace m by the remainder on dividing
m by n. Now we have a smaller pair of integers, and we repeat the process.
When one of the integers is replaced by 0, the other is the sought after
integer d. For example, starting with m = 1001 and n = 35, divide m by n.
The remainder is 21. Divide 35 by 21; the remainder is 14. Divide 21 by 14;
the remainder is 7. Now 14 is divisible by 7, and we conclude that d = 7. It
is easily checked that 7 is a factor of both 1001 and 35, and furthermore

    7 = 21 − 14 = 21 − (35 − 21) = 2·21 − 35 = 2·(1001 − 28·35) − 35
      = 2·1001 − 57·35.
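The steps above translate directly into code. This sketch (not part of the original text) is the standard recursive form of the extended Euclidean algorithm; applied to 1001 and 35 it recovers 7 = 2·1001 − 57·35.

```python
def extended_gcd(m, n):
    """Return (d, r, s) with d = gcd(m, n) and d = r*m + s*n."""
    if n == 0:
        return m, 1, 0
    d, r, s = extended_gcd(n, m % n)
    # d = r*n + s*(m % n) and m % n = m - (m // n)*n, so regroup:
    return d, s, r - (m // n) * s

d, r, s = extended_gcd(1001, 35)
assert (d, r, s) == (7, 2, -57)
assert 1001 % d == 0 and 35 % d == 0     # d is a common factor
```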
Properties (i) and (ii) above determine d uniquely up to sign. The
positive d satisfying (i) and (ii) is called the greatest common divisor of m
and n, denoted by 'gcd(m, n)'. Using the gcd theorem and induction one can
prove the unique factorization theorem for integers: each nonzero integer
is expressible in the form (±1)p_1 p_2 ... p_k where each p_i is positive and has
no divisors other than ±1 and ±p_i, and the expression is unique except for
reordering the factors.
All of the above works for F[x]. Given two polynomials m(x) and n(x)
which are not both zero there exists a polynomial d(x) which is a divisor
of each and which can be expressed in the form r(x)n(x) + s(x)m(x); the
Euclidean algorithm can be used to find one. The polynomial d(x) is unique
up to multiplication by a scalar, and multiplying by a suitable scalar ensures
that d(x) is monic; the resulting d(x) is the greatest common divisor of m(x)
and n(x). There is a unique factorization theorem which states that every
nonzero polynomial is expressible uniquely (up to order of the factors) in the
form c p_1(x) p_2(x) ... p_k(x) where c is a scalar and the p_i(x) are monic
polynomials all of which have no divisors other than scalars and scalar multiples
of themselves. Note that if the field F is algebraically closed then the p_i(x)
are all necessarily of the form x − λ.
If two integers n and m have no common divisors (other than ±1)
then the Euclidean algorithm shows that there exist integers r and s such
that rn + sm = 1. A double application can be used to show that if three
integers n, m and p have no common divisors then there exist r, s and t with
rn + sm + tp = 1. For example, starting with 6, 10 and 15 we first find that
2 = 2·6 − 10 and then that 1 = 15 − 7·2, and combining these gives
1 = 15 − 14·6 + 7·10. The result clearly generalizes to any number of
integers.
The same is also true for polynomials. For instance, it is possible to
find polynomials r(x), s(x) and t(x) such that

    x(x−1)r(x) + (x−1)(x+1)s(x) + x(x+1)t(x) = 1.
We have a linear algebra application of this:
9.33 Theorem  Let T be a linear operator on V and suppose that the
characteristic polynomial factorizes as Π_{i=1}^k (x − λ_i)^{m_i}. If for each i from 1
to k we define f_i(x) = Π_{j≠i} (x − λ_j)^{m_j}, then the image of the operator f_i(T)
is equal to the generalized λ_i-eigenspace.
Proof.  Since the only polynomials which are divisors of all the f_i(x) are
scalars, it follows from the Euclidean algorithm that there exist polynomials
r_i(x) such that Σ_{i=1}^k r_i(x) f_i(x) = 1. Clearly we may replace x by T in this
equation, obtaining Σ_{j=1}^k r_j(T) f_j(T) = i.
Let v ∈ ker(T − λ_i i)^{m_i}. If j ≠ i then f_j(T)(v) = 0, since f_j(x) is
divisible by (x − λ_i)^{m_i}. Now we have

    v = i(v) = Σ_{j=1}^k r_j(T)(f_j(T)(v)) = f_i(T)(r_i(T)(v)) ∈ im f_i(T).

Hence we have shown that the generalized eigenspace is contained in the
image of f_i(T). To prove the reverse inclusion we must show that if u ∈ im f_i(T)
then u is in the kernel of (T − λ_i i)^{m_i}. Thus we must show that

    ((T − λ_i i)^{m_i} f_i(T))(w) = 0

for all w ∈ V. But since (x − λ_i)^{m_i} f_i(x) = c_T(x), this amounts to showing
that the operator c_T(T) is zero. This fact, that a linear operator satisfies
its own characteristic equation, is the Cayley-Hamilton Theorem, which is
proved below. See also Exercise 10 at the end of this chapter.
9.34 The Cayley-Hamilton Theorem  If A is an n × n matrix over the
field F and c(x) = det(A − xI) the characteristic polynomial of A, then the
matrix c(A) is zero.
Proof.  Let E = A − xI. The diagonal entries of E are polynomials of
degree 1, and the other entries are scalars. We investigate the cofactors of E.
Calculating cof_{ij}(E) involves addition and subtraction of various products
of n − 1 elements of E. Since elements of E have degree at most 1,
a product of n − 1 elements of E is a polynomial of degree at most n − 1.
Hence cof_{ij}(E) is also a polynomial of degree at most n − 1. Let B^{(r)}_{ij} be the
coefficient of x^r in cof_{ij}(E), so that

    cof_{ij}(E) = B^{(0)}_{ij} + x B^{(1)}_{ij} + ··· + x^{n−1} B^{(n−1)}_{ij}.
For each r from 0 to n−1 let B^{(r)} be the matrix with (i, j)-entry B^{(r)}_{ij},
and consider the matrix

    B = B^{(0)} + x B^{(1)} + ··· + x^{n−1} B^{(n−1)}.

The (i, j)-entry of B is just Σ_{r=0}^{n−1} x^r B^{(r)}_{ij}, which is cof_{ij}(E), and so we see
that B = adj E, the adjoint of E. Hence Theorem 8.25 gives

9.34.1    (A − xI)(B^{(0)} + x B^{(1)} + ··· + x^{n−1} B^{(n−1)}) = c(x)I

since c(x) is the determinant of A − xI.
Let the coefficient of x^r in c(x) be c_r, so that the right hand side of
9.34.1 may be written as c_0 I + x c_1 I + ··· + x^n c_n I. Since two polynomials are
equal if and only if they have the same coefficient of x^r for each r, it follows
that we may equate the coefficients of x^r in 9.34.1. This gives

    A B^{(0)}                  = c_0 I        (0)
    −B^{(0)} + A B^{(1)}       = c_1 I        (1)
    −B^{(1)} + A B^{(2)}       = c_2 I        (2)
        ...
    −B^{(n−2)} + A B^{(n−1)}   = c_{n−1} I    (n−1)
    −B^{(n−1)}                 = c_n I.       (n)
Multiply equation (i) by A^i and add the equations. On the left hand side
everything cancels, and we deduce that

    0 = c_0 I + c_1 A + c_2 A^2 + ··· + c_{n−1} A^{n−1} + c_n A^n,

which is what we were to prove.
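The theorem can be checked on the matrix of example #4, whose characteristic polynomial det(A − xI) = x^4 − 3x^3 + 3x^2 − x was computed earlier. A sketch (not part of the original text):

```python
import numpy as np

A = np.array([[1, 1, 0, -2],
              [1, 5, 3, -10],
              [0, 1, 1, -2],
              [1, 2, 1, -4]], dtype=float)

# Substitute A for x in c(x) = x^4 - 3x^3 + 3x^2 - x.
cA = (np.linalg.matrix_power(A, 4)
      - 3 * np.linalg.matrix_power(A, 3)
      + 3 * (A @ A)
      - A)

assert np.allclose(cA, 0)    # A satisfies its own characteristic equation
```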
Exercises
1. Find a matrix T such that T^{-1}AT is in Jordan canonical form, where
A is the matrix in #1 above.
2. Let

       A = [  1  −1  2 ]
           [ −2   1  3 ]
           [ −1  −1  4 ].

   Find T such that T^{-1}AT is in Jordan form.
3. Let b = (v, w, x, y, z) be a basis for a vector space V and let T be a linear
operator on V such that Tv = x + 5z, Tw = 2y, Tx = 4v, Ty = 2w and
Tz = v.
(i) Let A be the matrix of T relative to b. Find A.
(ii) Find B, the matrix of T relative to the basis (w, y, v, x, z).
(iii) Find P such that P^{-1}AP = B.
(iv) Show that T is diagonalizable, and find the dimensions of the
eigenspaces.
4. Let T: V → V be a linear operator. Prove that a subspace of V of
dimension 1 is T-invariant if and only if it is contained in an eigenspace
of T.
5. Prove Proposition 9.22.
6. Prove Lemma 9.25.
7. How many similarity classes are there of 10 × 10 nilpotent matrices?
8. Describe all diagonalizable nilpotent matrices.
9. Find polynomials r(x), s(x) and t(x) such that
   x(x−1)r(x) + (x−1)(x+1)s(x) + x(x+1)t(x) = 1.
10. Use Theorem 9.20 to prove the Cayley-Hamilton Theorem for algebraically
closed fields.
(Hint: Let v ∈ V and use the decomposition of V as the direct
sum of generalized eigenspaces to write v = Σ_i v_i with v_i
in the generalized λ_i-eigenspace. Observe that it is sufficient
to prove that c_T(T)(v_i) = 0 for all i. But since we may write
c_T(T) = f_i(T)(T − λ_i i)^{m_i} this is immediate from the fact that
v_i ∈ ker(T − λ_i i)^{m_i}.)
11. Prove that every complex matrix is similar to its transpose, by proving
it first for Jordan blocks.
12. The minimal polynomial of a matrix A is the monic polynomial m(x) of
least degree such that m(A) = 0. Use the division algorithm for
polynomials and the Cayley-Hamilton Theorem to prove that the minimal
polynomial is a divisor of the characteristic polynomial.
13. Prove that a complex matrix is diagonalizable if and only if its minimal
polynomial has no repeated factors.
14. Let J be a Jordan matrix, with eigenvalues λ_i for i from 1 to k. For
each i let q_i be the size of the largest Jordan block in the λ_i primary
component. Prove that the minimal polynomial of J is Π_i (x − λ_i)^{q_i}.
15. How many similarity classes of complex matrices are there for which
the characteristic polynomial is x^4 (x − 2)^2 (x + 1)^2 and the minimal
polynomial is x^2 (x − 2)(x + 1)^2? (Hint: Use Exercise 14.)
Index of notation
{ ... | ... }  1
∈  1
⊆  3
∪  3
∩  3
S \ T  3
f: A → B  4
im f  4
f(A)  4
a → b  4
f^{-1}(C)  4
i (identity)  5
Z_n  13
Mat(m × n, R)  18
^t R^n  19
^t A  19
δ_{ij}  21
ρ_{ij}  32
ρ_i^{(λ)}(A)  32
ρ_{ij}^{(λ)}  32
E_{ij}  32
E_i^{(λ)}  32
E_{ij}^{(λ)}  32
γ_{ij}  33
γ_i^{(λ)}(A)  33
γ_{ij}^{(λ)}  33
cof_{ij}(A)  36
adj A  36
0̃  53
ker T  66
0_W  66
RS(A)  73
CS(A)  73
Span(v_1, ..., v_n)  81
dim V  85
cv_b(v)  95
Tr(A)  103
‖v‖  104
d(x, y)  105
U^⊥  114
A^* (= ^t A)  117
≅  130
⊕  136
V/U  141
V^* (dual)  143
M_{cb}(T)  148
M_{cb}  153
rank(A)  162
RN(A)  165
LN(A)  165
rn(A)  165
ln(A)  165
S_n  171
l(σ)  174
ε(σ)  174
det A  179
c_T(x)  195
diag(λ_1, ..., λ_n)  197
J_r(λ)  207
exp(A)  220
gcd(m, n)  223
228
Index of examples
Chapter 1
§1a Logic and common sense
§1b Sets and functions
#1 Proofs of injectivity  5
#2 Proofs of surjectivity  6
#3 Proofs of set inclusions  6
#4 Proofs of set equalities  6
#5 A sample proof of bijectivity  6
#6 Examples of preimages  7
§1c Relations
#7 Equivalence classes  9
#8 The equivalence relation determined by a function  9
§1d Fields
#9 C and Q  12
#10 Construction of Z_2  12
#11 Construction of Z_3  13
#12 Construction of Z_n  13
#13 A set of matrices which is a field  14
#14 A field with four elements  15
#15 Field of rational functions  16
Chapter 2
§2a Matrix operations
#1 Multiplication of partitioned matrices  23
§2b Simultaneous equations
#2 Solution of a reduced echelon system  26
§2c Partial pivoting
§2d Elementary matrices
#3 Elementary operations and elementary matrices  33
§2e Determinants
#4 Calculation of an adjoint matrix  37
§2f Introduction to eigenvalues
#5 Finding eigenvalues and eigenvectors  39
#6 Diagonalizing a matrix  40
#7 Solving simultaneous linear differential equations  41
#8 Leslie population model  42
229
#9 Local behaviour of a smooth function  44
#10 Eigenvalues and numerical stability  46
Chapter 3
§3a Linearity
#1 The general solution of an inhomogeneous linear system  51
§3b Vector axioms
#2 The vector space of position vectors relative to an origin  53
#3 The vector space R^3  53
#4 The vector space R^n  54
#5 The vector spaces F^n and ^t F^n  54
#6 The vector space of scalar valued functions on a set  54
#7 The vector space of continuous real valued functions on R  56
#8 Vector spaces of differentiable functions  56
#9 The solution space of a linear system  56
#10 A linear mapping from R^3 to R^2  58
#11 The linear transformation v → Av, where A is a matrix  59
#12 Linearity of evaluation maps  59
#13 Linearity of the integral  60
#14 A function which is not linear  61
§3c Trivial consequences of the axioms
§3d Subspaces
#15 A subspace of R^3  65
#16 A subspace of a space of functions  65
#17 Solution spaces may be viewed as kernels  68
#18 Calculation of a kernel  68
#19 The image of a linear map R^2 → R^3  69
§3e Linear combinations
#20 Three trigonometric functions which are linearly dependent  72
#21 The i-th row of AB  75
#22 Finding a basis for RS(A)  75
#23 On the column space of a matrix  76
230
Chapter 4
§4a Preliminary lemmas
§4b Basis theorems
§4c The Replacement Lemma
§4d Two properties of linear transformations
§4e Coordinates relative to a basis
#1 Coordinate vectors relative to the standard basis of F^n  95
#2 Coordinate vectors relative to a nonstandard basis of R^3  96
#3 Coordinates for a space of polynomials  97
Chapter 5
§5a The inner product axioms
#1 The inner product corresponding to a positive definite matrix  101
#2 Two scalar valued products which are not inner products  102
#3 An inner product on the space of all m × n matrices  103
§5b Orthogonal projection
#4 An orthogonal basis for a space of polynomials  106
#5 Use of the Gram-Schmidt orthogonalization process  109
#6 Calculation of an orthogonal projection in R^4  110
#7 Calculating a linear function of best fit  111
#8 Finding a parabola of best fit  113
#9 Trigonometric (Fourier) approximations to functions  113
§5c Orthogonal and unitary transformations
#10 Factorizing a matrix as orthogonal by upper triangular  119
#11 A family of orthogonal matrices  120
§5d Quadratic forms
Chapter 6
§6a Isomorphism
#1 Isomorphism of Mat(2, R) and R^4  131
#2 Polynomials of degree 2 and scalar functions on {1, 2, 3}  132
§6b Direct sums
#3 One-dimensional direct summands corresponding to a basis  136
§6c Quotient spaces
§6d The dual space
231
Chapter 7
§7a The matrix of a linear transformation
#1 The matrix of differentiation relative to a given basis  150
#2 Calculation of a transition matrix and its inverse  151
§7b Multiplication of transformations and matrices
#3 Proof that b is a basis; finding matrix of f relative to b  156
§7c The Main Theorem on Linear Transformations
#4 Verification of the Main Theorem  159
§7d Rank and nullity of matrices
Chapter 8
§8a Permutations
#1 Multiplication of permutations  173
#2 Calculation of a product of several permutations  173
§8b Determinants
#3 Finding a determinant by row operations  188
§8c Expansion along a row
Chapter 9
§9a Similarity of matrices
#1 Showing that a matrix is not diagonalizable  199
§9b Invariant subspaces
#2 Invariance of CS(A) under p(A), where p is a polynomial  201
#3 Geometrical description of a linear operator via eigenvectors  202
§9c Algebraically closed fields
§9d Generalized eigenspaces
§9e Nilpotent operators
§9f The Jordan canonical form
#4 The Jordan form for a 3 × 3 matrix  217
#5 The canonical form for a 4 × 4 nilpotent matrix  218
#6 Matrix exponentials and linear differential equations  221
§9g Polynomials
232
Contents
Co
Chapter 1:
§1a §1b §1c §1d
Preliminaries
1 1 3 7 10 18 18 24 29 32 35 38 49 49 52 61 63 71 81 81 85 86 91 93 99 99 106 116 121
Logic and common sense Sets and functions Relations Fields
rig py ht rs ive Un
Sc
ho
ol
of
Chapter 2:
§2a §2b §2c §2d §2e §2f
Matrices, row vectors and column vectors
Matrix operations Simultaneous equations Partial pivoting Elementary matrices Determinants Introduction to eigenvalues
M
ath
ity of Sy dn cs
em ati
Chapter 3:
§3a §3b §3c §3d §3e
Introduction to vector spaces
an dS
ey
Linearity Vector axioms Trivial consequences of the axioms Subspaces Linear combinations
tat ist ics
Chapter 4:
§4a §4b §4c §4d §4e
The structure of abstract vector spaces
Preliminary lemmas Basis theorems The Replacement Lemma Two properties of linear transformations Coordinates relative to a basis
Chapter 5:
§5a §5b §5c §5d
Inner Product Spaces
The inner product axioms Orthogonal projection Orthogonal and unitary transformations Quadratic forms
iii
Chapter 6:
§6a §6b §6c §6d
Relationships between spaces
129 129 134 139 142 148 148 153 157 161 171 171 179 188 194 194 200 204 205 210 216 221 228 229
Isomorphism Direct sums Quotient spaces The dual space
Co ri g py
Sc
Chapter 7:
§7a §7b §7c §7d
Matrices and Linear Transformations
The matrix of a linear transformation Multiplication of transformations and matrices The Main Theorem on Linear Transformations Rank and nullity of matrices
ho
ht
ol
rs ive Un
of
M
Chapter 8:
§8a §8b §8c
Permutations and determinants
ath
ity
Permutations Determinants Expansion along a row
em ati
of Sy
cs
Chapter 9:
§9a §9b §9c §9d §9e §9f §9g
Classiﬁcation of linear operators
dn
an dS
Similarity of matrices Invariant subspaces Algebraically closed ﬁelds Generalized eigenspaces Nilpotent operators The Jordan canonical form Polynomials
ey ist ics
tat
Index of notation Index of examples
iv
1
Co
Preliminaries
rig py ht rs ive Un M ath ity em of ati Sy cs dn an ey of
The topics dealt with in this introductory chapter are of a general mathematical nature, being just as relevant to other parts of mathematics as they are to vector space theory. In this course you will be expected to learn several things about vector spaces (of course!), but, perhaps even more importantly, you will be expected to acquire the ability to think clearly and express yourself clearly, for this is what mathematics is really all about. Accordingly, you are urged to read (or reread) Chapter 1 of “Proofs and Problems in Calculus” by G. P. Monro; some of the points made there are reiterated below.
Sc ho ol
§1a
Logic and common sense
When reading or writing mathematics you should always remember that the mathematical symbols which are used are simply abbreviations for words. Mechanically replacing the symbols by the words they represent should result in grammatically correct and complete sentences. The meanings of a few commonly used symbols are given in the following table.
dS tat
ist
ics
Symbols { ...  ... } = ∈ >
To be read as the set of all . . . such that . . . is in or is in greater than or is greater than
Thus, for example, the following sequence of symbols {x ∈ X  x > a} = ∅ is an abbreviated way of writing the sentence The set of all x in X such that x is greater than a is not the empty set. 1
in plain language. people who leave out the “obvious” steps often make incorrect deductions. this observation may seem rather trite. And it must be realized that this ability is nothing more than an extension of clear thinking. you must learn how to write out proofs.2 Chapter One: Preliminaries When reading mathematics you should mentally translate all symbols in this fashion. likewise. The ability to construct proofs is probably the most important skill for the student of pure mathematics to acquire. your ﬁrst task is to be quite clear as to what is being asserted. Next. To prove it you must do two proofs of the kind described in the preceding paragraph. When asked to prove something. • The statement p if and only if q is logically equivalent to If p then q and if q then p. but nonetheless it is often ignored. There is never anything wrong with stating the obvious in a proof. and when writing mathematics you should make sure that what you write translates into meaningful sentences. • To prove a statement of the form If p then q your ﬁrst line should be Assume that p is true and your last line Therefore q is true. to always check that your proof has the correct logical structure. A proof is nothing more nor less than an explanation of why something is so. the logical structure of what you are trying to prove determines the logical structure of the proof. it frequently happens that students who are supposed to proving that a statement p is a consequence of statement q actually write out a proof that q is a consequence of p. Co ri g py When trying to prove something. To help you avoid such mistakes we list a few simple rules to aid in the construction of logically correct proofs. then write the reason down. • Suppose that P (x) is some statement about x. For instance. Then to prove P (x) is true for all x in the set S Sc ho ht em ol rs ive Un ati cs of M ath ity of Sy dn an ey dS tat ist ics . then you must decide why it is true. 
whenever you think you have successfully proved something. and you are urged.
Chapter One: Preliminaries your ﬁrst line should be Let x be an arbitrary element of the set S and your last line Therefore P (x) holds. 3 Co rig py • Statements of the form There exists an x such that P (x) is true are proved producing an example of such an x. S × T = { (s. Some of the above points are illustrated in the examples #1. This course is not concerned with the foundations of mathematics. and we will assume that the reader has an intuitive familiarity with the basic concepts.) Given any two sets S and T the Cartesian product S × T of S and T is the set of all ordered pairs (s. and to delve into formal treatments of such matters would sidetrack us too far from our main purpose. we will not attempt to give formal deﬁnitions of the concepts of ‘function’ and ‘relation’. . t)  s ∈ S. t ∈ T }. #2. For instance. for any two sets S and T . This is a fact which we ask the reader to take on trust. Similarly. (Note that the ‘or’ above is the inclusive ‘or’—that which is sometimes written as ‘and/or’. The Cartesian product of S and T always exists. that is. we write S ⊆ A (S is a subset of A) if every element of S is an element of A. #3 and #4 at the end of the next section. Sc ho ht em ol rs ive Un of M ath ity §1b Sets and functions of Sy dn ey dS tat It has become traditional to base all mathematics on set theory. In this book ‘or’ will always be used in this sense. t) with s ∈ S and t ∈ T . If S and T are two subsets of A then the union of S and T is the set S ∪ T = { x ∈ A  x ∈ S or x ∈ T } ati cs an and the intersection of S and T is the set ist ics S ∩ T = { x ∈ A  x ∈ S and x ∈ T }.
Let A and B be sets. A function f from A to B is to be thought of as a rule which assigns to every element a of the set A an element f(a) of the set B. The set A is called the domain of f and B the codomain (or target) of f. We use the notation 'f: A → B' (read 'f, from A to B') to mean that f is a function with domain A and codomain B.

A map is the same thing as a function; the terms mapping and transformation are also used. The notation 'a → b' means 'a maps to b'; that is, the function involved assigns the element b to the element a. Thus 'a → b' (under the function f) means exactly the same as 'f(a) = b'.

The image (or range) of a function f: A → B is the subset of B consisting of all elements obtained by applying f to elements of A. That is,

    im f = { f(a) | a ∈ A }.

An alternative notation is 'f(A)' instead of 'im f'. The word 'image' is also used in a slightly different sense: if a ∈ A then the element f(a) ∈ B is sometimes called the image of a under the function f.

If f: A → B is a function and C a subset of B then the inverse image or preimage of C is the subset of A

    f⁻¹(C) = { a ∈ A | f(a) ∈ C }.

(The above sentence reads 'f inverse of C is the set of all a in A such that f of a is in C'. Alternatively, one could say 'the inverse image of C under f' instead of 'f inverse of C'.)

A function f: A → B is said to be injective (or one-to-one) if and only if no two distinct elements of A yield the same element of B. In other words, f is injective if and only if for all a1, a2 ∈ A, if f(a1) = f(a2) then a1 = a2.

A function f: A → B is said to be surjective (or onto) if and only if for every element b of B there is an a in A such that f(a) = b. Clearly, f is surjective if and only if im f = B. If a function is both injective and surjective we say that it is bijective (or a one-to-one correspondence).

Let f: B → C and g: A → B be functions such that the domain of f is the codomain of g. The composite of f and g is the function fg: A → C given
by

    (fg)(a) = f(g(a))    for all a in A.

It is easily checked that if f and g are as above and h: D → A is another function then the composites (fg)h and f(gh) are equal.

Given any set A, the identity function on A is the function i: A → A defined by i(a) = a for all a ∈ A. It is clear that if f is any function with domain A then fi = f, and likewise if g is any function with codomain A then ig = g.

If f: B → A and g: A → B are functions such that the composite fg is the identity on A then we say that f is a left inverse of g and g is a right inverse of f. If in addition we have that gf is the identity on B then we say that f and g are inverse to each other, and we write f = g⁻¹ and g = f⁻¹. It is easily seen that f: B → A has an inverse if and only if it is bijective, in which case f⁻¹: A → B satisfies f⁻¹(a) = b if and only if f(b) = a (for all a ∈ A and b ∈ B).

Suppose that g: A → B and f: B → C are bijective functions, so that there exist inverse functions g⁻¹: B → A and f⁻¹: C → B. By the properties stated above we find that

    (fg)(g⁻¹f⁻¹) = ((fg)g⁻¹)f⁻¹ = (f(gg⁻¹))f⁻¹ = (f iB)f⁻¹ = ff⁻¹ = iC

(where iB and iC are the identity functions on B and C), and an exactly similar calculation shows that (g⁻¹f⁻¹)(fg) is the identity on A. Thus fg has an inverse, and we have proved that the composite of two bijective functions is necessarily bijective.

Examples

#1 Suppose that you wish to prove that a function λ: X → Y is injective. Consult the definition of injective. You are trying to prove the following statement:

    For all x1, x2 ∈ X, if λ(x1) = λ(x2) then x1 = x2.

So the first two lines of your proof should be as follows:

    Let x1, x2 ∈ X.
    Assume that λ(x1) = λ(x2).

Then you will presumably consult the definition of the function λ to derive consequences of λ(x1) = λ(x2), and eventually you will reach the final line

    Therefore x1 = x2.
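For finite sets, the quantifier structure of these definitions translates directly into a brute-force check. The sketch below is mine, not the book's; the sets X, Y and the functions lam and mu are made-up examples.

```python
from itertools import product

def is_injective(f, X):
    # For all x1, x2 in X: f(x1) = f(x2) implies x1 = x2.
    return all(x1 == x2 for x1, x2 in product(X, repeat=2) if f[x1] == f[x2])

def is_surjective(f, X, Y):
    # For every y in Y there exists x in X with f(x) = y.
    return all(any(f[x] == y for x in X) for y in Y)

X = {0, 1, 2}
Y = {'a', 'b', 'c'}
lam = {0: 'a', 1: 'b', 2: 'c'}   # a made-up bijective function
mu = {0: 'a', 1: 'a', 2: 'c'}    # not injective: 0 and 1 share a value

print(is_injective(lam, X), is_surjective(lam, X, Y))  # True True
print(is_injective(mu, X), is_surjective(mu, X, Y))    # False False
```

Of course, such a check only works when X and Y are finite; for the general case one must write the kind of proof sketched in #1 and #2.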
#2 Suppose you wish to prove that λ: X → Y is surjective. That is, you wish to prove

    For every y ∈ Y there exists x ∈ X with λ(x) = y.

Your first line must be

    Let y be an arbitrary element of Y.

Somewhere in the middle of the proof you will have to somehow define an element x of the set X (the definition of x is bound to involve y in some way), and the last line of your proof has to be

    Therefore λ(x) = y.

#3 Suppose that A and B are sets, and you wish to prove that A ⊆ B. By definition the statement 'A ⊆ B' is logically equivalent to

    All elements of A are elements of B.

So your first line should be

    Let x ∈ A

and your last line should be

    Therefore x ∈ B.

#4 Suppose that you wish to prove that A = B, where A and B are sets. The following statements are all logically equivalent to 'A = B':
(i) For all x, x ∈ A if and only if x ∈ B.
(ii) (For all x) (if x ∈ A then x ∈ B) and (if x ∈ B then x ∈ A).
(iii) All elements of A are elements of B and all elements of B are elements of A.
(iv) A ⊆ B and B ⊆ A.
You must do two proofs of the general form given in #3 above.

#5 Let A = { n ∈ Z | 0 ≤ n ≤ 3 } and B = { n ∈ Z | 0 ≤ n ≤ 2 }. Prove that if C = { n ∈ Z | 0 ≤ n ≤ 11 } then there is a bijective map f: A × B → C given by f(a, b) = 3a + b for all a ∈ A and b ∈ B.

−− Observe first that by the definition of the Cartesian product of two sets, A × B consists of all ordered pairs (a, b) with a ∈ A and b ∈ B. Our
first task is to show that for every such pair (a, b), f(a, b) ∈ C. Let (a, b) ∈ A × B. Then a and b are integers, and so 3a + b is also an integer. Since 0 ≤ a ≤ 3 we have that 0 ≤ 3a ≤ 9, and since 0 ≤ b ≤ 2 we deduce that 0 ≤ 3a + b ≤ 11. So 3a + b ∈ { n ∈ Z | 0 ≤ n ≤ 11 } = C. We have now shown that f, as defined above, is indeed a function from A × B to C.

We now show that f is injective. Let (a, b), (a′, b′) ∈ A × B, and assume that f(a, b) = f(a′, b′). Then, by the definition of f, we have 3a + b = 3a′ + b′, and hence b′ − b = 3(a − a′). Since a − a′ is an integer, this shows that b′ − b is a multiple of 3. But since b, b′ ∈ B we have that 0 ≤ b′ ≤ 2 and −2 ≤ −b ≤ 0, and adding these inequalities gives −2 ≤ b′ − b ≤ 2. The only multiple of 3 in this range is 0; hence b′ = b, and the equation 3a + b = 3a′ + b′ becomes 3a = 3a′, giving a = a′. Therefore (a, b) = (a′, b′), which is what was to be proved.

Finally, we must show that f is surjective. Let c be an arbitrary element of the set C, and let m be the largest multiple of 3 which is not greater than c. Then m = 3a for some integer a, and 3a + 3 (which is a multiple of 3 larger than m) must exceed c; so 3a ≤ c ≤ 3a + 2. Note also that since 0 ≤ c ≤ 11, the largest multiple of 3 not exceeding c is at least 0, and less than 12. That is, 0 ≤ 3a < 12, whence 0 ≤ a < 4, and, since a is an integer, 0 ≤ a ≤ 3. Hence a ∈ A. Defining b = c − 3a, it follows that b is an integer satisfying 0 ≤ b ≤ 2, so that b ∈ B and (a, b) ∈ A × B. Now f(a, b) = 3a + b = c, as required. −−

#6 Let f be as defined in #5 above. Find the preimages of the following subsets of C: S1 = {5, 9}, S2 = {0, 3, 6, 11} and S3 = {0, 1, 2}.

−− They are f⁻¹(S1) = {(1, 2), (3, 0)}, f⁻¹(S2) = {(0, 0), (1, 0), (2, 0), (3, 2)} and f⁻¹(S3) = {(0, 0), (0, 1), (0, 2)} respectively. −−

§1c Relations

We move now to the concept of a relation on a set X. For example, '<' is a relation on the set of natural numbers, in the sense that if m and n are any
two natural numbers then m < n is a well-defined statement which is either true or false. Likewise, 'having the same birthday as' is a relation on the set of all people. If we were to use the symbol '∼' for this relation then a ∼ b would mean 'a has the same birthday as b' and a ≁ b would mean 'a does not have the same birthday as b'.

If ∼ is a relation on X then { (x, y) | x ∼ y } is a subset of X × X, and, conversely, any subset of X × X defines a relation on X. Formal treatments of this topic usually define a relation on X to be a subset of X × X.

1.1 Definition  Let ∼ be a relation on a set X.
(i) The relation ∼ is said to be reflexive if x ∼ x for all x ∈ X.
(ii) The relation ∼ is said to be symmetric if x ∼ y whenever y ∼ x (for all x and y in X).
(iii) The relation ∼ is said to be transitive if x ∼ z whenever x ∼ y and y ∼ z (for all x, y, z ∈ X).

A relation which is reflexive, symmetric and transitive is called an equivalence relation. The "birthday" relation above, for example, is an equivalence relation. Another example would be to define sets X and Y to be equivalent if they have the same number of elements; more formally, define X ∼ Y if there exists a bijection from X to Y. The reflexive property for this relation follows from the fact that identity functions are bijective, the symmetric property from the fact that the inverse of a bijection is also a bijection, and transitivity from the fact that the composite of two bijections is a bijection.

It is clear that an equivalence relation ∼ on a set X partitions X into nonoverlapping subsets, two elements x, y ∈ X being in the same subset if and only if x ∼ y. (See #7 below.) These subsets are called equivalence classes. The set of all equivalence classes is then called the quotient of X by the relation ∼.

When dealing with equivalence relations it frequently simplifies matters to pretend that equivalent things are equal, ignoring irrelevant aspects. The concept of 'quotient' defined above provides a mathematical mechanism for doing this. The idea is that an equivalence class is a single thing which embodies things we wish to identify. If X̄ is the quotient of X by the equivalence relation ∼ then X̄ can be thought of as the set obtained from X by identifying equivalent elements of X. See #10 and #11 below.
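On a small finite set the three defining properties, and the resulting partition into equivalence classes, can be checked exhaustively. The relation below is my own made-up example: x ∼ y means that x and y leave the same remainder on division by 3.

```python
X = range(9)

def sim(x, y):
    # A made-up equivalence relation: same remainder modulo 3.
    return x % 3 == y % 3

# Reflexive, symmetric and transitive, checked by brute force:
assert all(sim(x, x) for x in X)
assert all(sim(y, x) for x in X for y in X if sim(x, y))
assert all(sim(x, z) for x in X for y in X for z in X if sim(x, y) and sim(y, z))

# The classes C(x) = { y ∈ X | x ∼ y } partition X:
C = {x: frozenset(y for y in X if sim(x, y)) for x in X}
for y in X:
    for z in X:
        if sim(y, z):
            assert C[y] == C[z]        # equal classes when y ~ z
        else:
            assert not (C[y] & C[z])   # disjoint classes otherwise

quotient = set(C.values())   # the quotient of X by ~
print(len(quotient))         # 3 classes: remainders 0, 1 and 2
```

The quotient here has three elements, one for each remainder; "identifying equivalent elements" collapses the nine integers down to these three classes.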
Examples

#7 Suppose that ∼ is an equivalence relation on the set X, and for each x ∈ X define

    C(x) = { y ∈ X | x ∼ y }.

Prove that for all y, z ∈ X, the sets C(y) and C(z) are equal if y ∼ z, and have no elements in common if y ≁ z. Prove furthermore that C(x) ≠ ∅ for all x ∈ X, and that for every y ∈ X there is an x ∈ X such that y ∈ C(x).

−− Let y, z ∈ X with y ∼ z, and let x be an arbitrary element of C(y). Then by definition of C(y) we have that y ∼ x. But z ∼ y (by symmetry of ∼, since we are given that y ∼ z), and by transitivity it follows from z ∼ y and y ∼ x that z ∼ x. This further implies, by definition of C(z), that x ∈ C(z). Thus every element of C(y) is in C(z); so we have shown that C(y) ⊆ C(z). On the other hand, if we let x be an arbitrary element of C(z) then we have z ∼ x, which combined with y ∼ z yields y ∼ x (by transitivity of ∼), and hence x ∈ C(y). So C(z) ⊆ C(y), as well as C(y) ⊆ C(z). Thus C(y) = C(z) whenever y ∼ z.

Now let y, z ∈ X with y ≁ z, and let x ∈ C(y) ∩ C(z). Then x ∈ C(y) and x ∈ C(z), and so y ∼ x and z ∼ x. By symmetry we deduce that x ∼ z, and now y ∼ x and x ∼ z yield y ∼ z by transitivity. This contradicts our assumption that y ≁ z. It follows that there can be no element x in C(y) ∩ C(z); in other words, C(y) and C(z) have no elements in common if y ≁ z.

Finally, observe that if x ∈ X is arbitrary then x ∈ C(x), since by reflexivity of ∼ we know that x ∼ x. Hence C(x) ≠ ∅. Furthermore, for every y ∈ X there is an x ∈ X with y ∈ C(x), since x = y has the required property. −−

#8 Let f: X → S be an arbitrary function, and define a relation ∼ on X by the following rule: x ∼ y if and only if f(x) = f(y). Prove that ∼ is an equivalence relation. Prove furthermore that if X̄ is the quotient of X by ∼ then there is a one-to-one correspondence between the sets X̄ and im f.

−− Let x ∈ X. Since f(x) = f(x) it is certainly true that x ∼ x. So ∼ is a reflexive relation. Now let x, y ∈ X with x ∼ y. Then f(x) = f(y) (by the definition of ∼); thus f(y) = f(x), which shows that y ∼ x. So y ∼ x whenever x ∼ y, and we have shown that ∼ is symmetric. Finally, let x, y, z ∈ X with x ∼ y and y ∼ z. Then f(x) = f(y) and f(y) = f(z), whence f(x) = f(z), and so x ∼ z. So ∼ is also transitive, whence
it is an equivalence relation.

By #7 above, X̄ = { C(x) | x ∈ X } (since C(x) is the equivalence class of the element x ∈ X). We show that there is a bijective function f̄: X̄ → im f such that if C ∈ X̄ is any equivalence class, then f̄(C) = f(x) for all x ∈ X such that C = C(x). Using the notation of #7 above, we wish to define f̄: X̄ → im f by

    f̄(C(x)) = f(x)    for all x ∈ X.

To show that this does give a well defined function from X̄ to im f, we must show that
(i) every C ∈ X̄ has the form C = C(x) for some x ∈ X (so that the given formula defines f̄(C) for all C ∈ X̄),
(ii) if x, y ∈ X with C(x) = C(y) then f(x) = f(y) (so that the given formula defines f̄(C) uniquely in each case), and
(iii) f(x) ∈ im f (so that the given formula does define f̄(C) to be an element of im f, the set which is meant to be the codomain of f̄).
The first and third of these points are trivial: since X̄ is defined to be the set of all equivalence classes, its elements are certainly all of the form C(x), and it is immediate from the definition of the image of f that f(x) ∈ im f for all x ∈ X. As for the second point, suppose that x, y ∈ X with C(x) = C(y). Then y ∈ C(y) = C(x), whence x ∼ y by the definition of C(x). And by the definition of ∼, this says that f(x) = f(y), as required. Hence the function f̄ is well-defined.

We must prove that f̄ is bijective. Suppose that C, C′ ∈ X̄ with f̄(C) = f̄(C′). Choose x, y ∈ X such that C = C(x) and C′ = C(y). Then

    f(x) = f̄(C(x)) = f̄(C) = f̄(C′) = f̄(C(y)) = f(y),

and so x ∼ y, by definition of ∼. Then by one of the results proved in #7 above, it follows that C(x) = C(y); that is, C = C′. Hence f̄ is injective. But if s ∈ im f is arbitrary then by definition of im f there exists an x ∈ X with s = f(x). Since C(x) ∈ X̄, and the definition of f̄ gives f̄(C(x)) = f(x) = s, we have shown that for all s ∈ im f there exists C ∈ X̄ with f̄(C) = s. Thus f̄: X̄ → im f is surjective. −−

§1d Fields

Vector space theory is concerned with two different kinds of mathematical objects, called vectors and scalars. The theory has many different applications, and the vectors and scalars for one application will generally be different from
the vectors and scalars for another application. Thus the theory does not say what vectors and scalars are; instead, it gives a list of defining properties, or axioms, which the vectors and scalars have to satisfy for the theory to be applicable. The axioms that vectors have to satisfy are given in Chapter Three; the axioms that scalars have to satisfy are given below. In fact, the scalars must form what mathematicians call a 'field'. This means, roughly speaking, that you have to be able to add scalars and multiply scalars, and these operations of addition and multiplication have to satisfy most of the familiar properties of addition and multiplication of real numbers. Indeed, in almost all the important applications the scalars are just the real numbers. But there are other sets equipped with operations of addition and multiplication which satisfy the relevant properties; two notable examples are the set of all complex numbers and the set of all rational numbers.† So, when you see the word 'scalar', you may as well think 'real number'.

(† A number is rational if it has the form n/m where n and m are integers.)

1.2 Definition  A set F which is equipped with operations of addition and multiplication is called a field if the following properties are satisfied.
(i) a + (b + c) = (a + b) + c for all a, b, c ∈ F. (That is, addition is associative.)
(ii) There exists an element 0 ∈ F such that a + 0 = a = 0 + a for all a ∈ F. (There is a zero element in F.)
(iii) For each a ∈ F there is a b ∈ F such that a + b = 0 = b + a. (Each element has a negative.)
(iv) a + b = b + a for all a, b ∈ F. (Addition is commutative.)
(v) a(bc) = (ab)c for all a, b, c ∈ F. (Multiplication is associative.)
(vi) There exists an element 1 ∈ F, which is not equal to the zero element 0, such that 1a = a = a1 for all a ∈ F. (There is a unity element in F.)
(vii) For each a ∈ F with a ≠ 0 there exists b ∈ F with ab = 1 = ba. (Nonzero elements have inverses.)
(viii) ab = ba for all a, b ∈ F. (Multiplication is commutative.)
(ix) a(b + c) = ab + ac and (a + b)c = ac + bc for all a, b, c ∈ F. (Both distributive laws hold.)

Comments

1.2.1 Although the definition is quite long the idea is simple: for a set
to be a field you have to be able to add and multiply elements, and all the obvious desirable properties have to be satisfied.

1.2.2 It is possible to use set theory to prove the existence of the integers and the real numbers (assuming the correctness of set theory!) and to prove that the set of all real numbers is a field. We will not, however, delve into these matters in this book, so as not to be diverted from our main purpose for too long. Thus, we will not prove that the real numbers form a field; we will simply assume that the reader is familiar with these basic number systems and their properties.

1.2.3 The definition is not quite complete since we have not said what is meant by the term operation. In general an operation on a set S is just a function from S × S to S. In other words, an operation is a rule which assigns an element of S to each ordered pair of elements of S. For instance, the rule (a, b) → √(a + b³) defines an operation on the set R of all real numbers: we could use the notation a ◦ b = √(a + b³). Of course, since we want operations to satisfy nice properties like those listed above, such unusual operations are not likely to be of any interest to anyone. In particular, it is rare to consider operations which are not associative, and the symbol '+' is always reserved for operations which are both associative and commutative.

Examples

#9 As mentioned earlier, the set of all real numbers is by far the most important example of a field. Nevertheless, there are many other fields which occur in mathematics, and so we list some examples. In particular, the set C of all complex numbers is a field, and so is the set Q of all rational numbers. We omit the proofs.

#10 Any set with exactly two elements can be made into a field by defining addition and multiplication appropriately. Let S be such a set, and (for reasons that will become apparent) let us give the elements of S the names odd and even. We now define addition by the rules

    even + even = even        even + odd = odd
    odd + even = odd          odd + odd = even
and multiplication by the rules

    even even = even          even odd = even
    odd even = even           odd odd = odd.

These definitions are motivated by the fact that the sum of any two even integers is even, and so on. Thus it is natural to associate odd with the set of all odd integers and even with the set of all even integers. It is straightforward to check that the field axioms are now satisfied, with even as the zero element and odd as the unity. Henceforth this field will be denoted by 'Z2' and its elements will simply be called '0' and '1'.

#11 A similar process can be used to construct a field with exactly three elements. Intuitively, we wish to identify integers which differ by a multiple of three. Accordingly, let div be the set of all integers divisible by three, div+1 the set of all integers which are one greater than integers divisible by three, and div−1 the set of all integers which are one less than integers divisible by three. The appropriate definitions for addition and multiplication of these objects are determined by corresponding properties of addition and multiplication of integers. Thus, since

    (3k + 1)(3h − 1) = 3(3kh + h − k) − 1,

it follows that the product of an integer in div+1 and an integer in div−1 is always in div−1, and so we should define div+1 div−1 = div−1. Once these definitions have been made it is fairly easy to check that the field axioms are satisfied. This field will henceforth be denoted by 'Z3' and its elements by '0', '1' and '−1'. Since 1 + 1 = −1 in Z3, the element −1 is alternatively denoted by '2' (or by '5' or '−4' or, indeed, by 'n' for any integer n which is one less than a multiple of 3).

#12 The above construction can be generalized to give a field with n elements for any prime number n. In fact, the construction of Zn works for any n, but the field axioms are only satisfied if n is prime. Thus Z4 = {0, 1, 2, 3} with the operations of "addition and multiplication modulo 4" does not form a field, but Z5 = {0, 1, 2, 3, 4} does form a field under addition and multiplication modulo 5. Indeed, the addition and multiplication tables for Z5 are as follows:

    +  0 1 2 3 4        ×  0 1 2 3 4
    0  0 1 2 3 4        0  0 0 0 0 0
    1  1 2 3 4 0        1  0 1 2 3 4
    2  2 3 4 0 1        2  0 2 4 1 3
    3  3 4 0 1 2        3  0 3 1 4 2
    4  4 0 1 2 3        4  0 4 3 2 1

It is easily seen that Z4 cannot possibly be a field, because multiplication modulo 4 gives 2² = 0, whereas it is a consequence of the field axioms that the product of two nonzero elements of any field must be nonzero. (This is Exercise 4 at the end of this chapter.)

#13 Define

    F = { ( a   b )  |  a, b ∈ Q }
        { ( 2b  a )             }

with addition and multiplication of matrices defined in the usual way. (See Chapter Two for the relevant definitions.) It can be shown that F is a field. If we define F′ to be the set of all real numbers of the form a + b√2, where a and b are rational, then we have a similar formula for the product of two elements of F′, namely

    (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2

for all a, b, c, d ∈ Q. Note that if we define

    I = ( 1 0 )    and    J = ( 0 1 )
        ( 0 1 )               ( 2 0 )

then F consists of all matrices of the form aI + bJ, where a, b ∈ Q. Now since J² = 2I (check this!) we see that the rule for the product of two elements of F is

    (aI + bJ)(cI + dJ) = (ac + 2bd)I + (ad + bc)J

for all a, b, c, d ∈ Q. The rules for addition in F and in F′ are obviously similar as well:

    (aI + bJ) + (cI + dJ) = (a + c)I + (b + d)J
    (a + b√2) + (c + d√2) = (a + c) + (b + d)√2.
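The parallel between F and F′ in #13 can be confirmed numerically. The following is my own sketch: it builds aI + bJ as a 2 × 2 array of exact rationals and checks that J² = 2I and that the product lands on (ac + 2bd)I + (ad + bc)J for some sample values.

```python
from fractions import Fraction as Q

def elem(a, b):
    # The matrix aI + bJ = [[a, b], [2b, a]] with rational entries.
    return [[Q(a), Q(b)], [2 * Q(b), Q(a)]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# J^2 = 2I ("check this!"):
I, J = elem(1, 0), elem(0, 1)
assert matmul(J, J) == elem(2, 0)

# (aI + bJ)(cI + dJ) = (ac + 2bd)I + (ad + bc)J, for sample rationals:
a, b, c, d = Q(1), Q(2), Q(3, 4), Q(-5)
assert matmul(elem(a, b), elem(c, d)) == elem(a * c + 2 * b * d, a * d + b * c)
print("product rule verified")
```

The same coefficients (ac + 2bd, ad + bc) appear in the formula for (a + b√2)(c + d√2), which is exactly the point of the example: multiplying these matrices is "the same computation" as multiplying the corresponding real numbers.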
Thus, as far as addition and multiplication are concerned, F and F′ are essentially the same as one another. This is an example of what is known as isomorphism of two algebraic systems. Note that F′ is obtained by "adjoining" √2 to the field Q, in exactly the same way as √−1 is "adjoined" to the real field R to construct the complex field C.

#14 Although Z4 is not a field, it is possible to construct a field with four elements. We consider matrices whose entries come from the field Z2. Addition and multiplication of matrices is defined in the usual way, but addition and multiplication of the matrix entries must be performed modulo 2, since these entries come from Z2. It turns out that

    K = { ( 0 0 )  ( 1 0 )  ( 0 1 )  ( 1 1 ) }
        { ( 0 0 ), ( 0 1 ), ( 1 1 ), ( 1 0 ) }

is a field. The zero element of this field is the zero matrix, and the unity element is the identity matrix. To simplify notation, let us temporarily use '0' to denote the zero matrix and '1' to denote the identity matrix, and let us also use 'ω' to denote one of the two remaining elements of K (it does not matter which). Remembering that addition and multiplication are to be performed modulo 2 in this example, we see that

    ( 1 0 )   ( 0 1 )   ( 1 1 )              ( 0 1 ) ( 0 1 )   ( 1 1 )
    ( 0 1 ) + ( 1 1 ) = ( 1 0 )   and also   ( 1 1 ) ( 1 1 ) = ( 1 0 ),

so that the fourth element of K equals 1 + ω.
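These mod-2 matrix computations are quick to verify mechanically. The sketch below is mine (the variable names zero, one, w, w1 are not the book's notation): it checks that 1 + ω and ω² both equal the fourth element, and that K is closed under both operations.

```python
from itertools import product

zero = ((0, 0), (0, 0))
one  = ((1, 0), (0, 1))    # the identity matrix, the unity of K
w    = ((0, 1), (1, 1))    # 'ω', one of the two remaining elements
w1   = ((1, 1), (1, 0))    # the fourth element
K = [zero, one, w, w1]

def add(X, Y):
    # Entrywise sum, performed modulo 2.
    return tuple(tuple((X[i][j] + Y[i][j]) % 2 for j in range(2)) for i in range(2))

def mul(X, Y):
    # Matrix product with entries reduced modulo 2.
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2)) % 2
                       for j in range(2)) for i in range(2))

assert add(one, w) == w1    # the fourth element is 1 + ω
assert mul(w, w) == w1      # and ω² = 1 + ω as well
assert all(add(X, Y) in K and mul(X, Y) in K for X, Y in product(K, repeat=2))
print("K is closed under addition and multiplication mod 2")
```

Closure under the two operations is what makes tables like the ones below possible; checking the remaining field axioms is Exercise 7.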
A short calculation yields the following addition and multiplication tables for K:

    +     0     1     ω     1+ω         ×     0     1     ω     1+ω
    0     0     1     ω     1+ω         0     0     0     0     0
    1     1     0     1+ω   ω           1     0     1     ω     1+ω
    ω     ω     1+ω   0     1           ω     0     ω     1+ω   1
    1+ω   1+ω   ω     1     0           1+ω   0     1+ω   1     ω

#15 The set of all expressions of the form

    (a0 + a1X + · · · + anX^n) / (b0 + b1X + · · · + bmX^m),

where the coefficients ai and bi are real numbers and the bi are not all zero, can be regarded as a field.

The last three examples in the above list are rather outré, and will not be used in this book. They were included merely to emphasize that lots of examples of fields do exist.

Exercises

1. Let S be a set and ∼ a relation on S which is both symmetric and transitive. If x and y are elements of S and x ∼ y then by symmetry we must have y ∼ x. Now by transitivity x ∼ y and y ∼ x yields x ∼ x, and so it follows that x ∼ x for all x ∈ S. That is, the reflexive law is a consequence of the symmetric and transitive laws. What is the error in this "proof"?

2. Suppose that ∼ is an equivalence relation on a set X. For each x ∈ X let E(x) = { z ∈ X | x ∼ z }. The subsets E(x) of X are called equivalence classes. Prove that if x, y ∈ X then E(x) = E(y) if x ∼ y and E(x) ∩ E(y) = ∅ if x ≁ y. Prove that each element x ∈ X lies in a unique equivalence class, although there may be many different y such that x ∈ E(y).

3. Let A and B be nonempty sets and f: A → B a function.
(i) Prove that f has a left inverse if and only if it is injective.
(ii) Prove that f has a right inverse if and only if it is surjective.
(iii) Prove that if f has both a right inverse and a left inverse then they are equal.

4. Prove that if F is a field and x, y ∈ F are nonzero then xy is also nonzero. (Hint: Use Axiom (vii).)

5. Prove that Zn is not a field if n is not prime.

6. It is straightforward to check that Zn satisfies all the field axioms except for part (vii) of 1.2. Prove that this axiom is satisfied if n is prime. (Hint: Let k be a nonzero element of Zn. Use the fact that if k(i − j) is divisible by the prime n then i − j must be divisible by n to prove that k0, k1, . . . , k(n − 1) are all distinct elements of Zn, and deduce that one of them must equal 1.)

7. Prove that the example in #13 above is a field. You may assume that the only solution in integers of the equation a² − 2b² = 0 is a = b = 0, as well as the properties of matrix addition and multiplication given in Chapter Two.
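Exercises 5 and 6 can be explored by brute force before writing the proofs. The checker below is my own sketch: it tests axiom (vii) for Zn directly, and illustrates the hint that for prime n the multiples k·0, k·1, ..., k·(n − 1) exhaust Zn.

```python
def nonzero_inverses(n):
    # Axiom (vii) for Zn: every nonzero a has some b with ab = 1 (mod n).
    return all(any(a * b % n == 1 for b in range(n)) for a in range(1, n))

print([n for n in range(2, 12) if nonzero_inverses(n)])  # [2, 3, 5, 7, 11]

# The hint for prime n: the multiples of a nonzero k are all distinct
# mod n, so they exhaust Zn and one of them must equal 1.
n, k = 7, 3
multiples = {k * j % n for j in range(n)}
assert len(multiples) == n and 1 in multiples

# And Z4 fails: 2 is a zero divisor, so it can have no inverse.
assert 2 * 2 % 4 == 0 and not nonzero_inverses(4)
```

Only the primes survive the filter, in agreement with Exercise 5: for composite n some nonzero element is a zero divisor and therefore cannot be invertible.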
2 Matrices, row vectors and column vectors

Linear algebra is one of the most basic of all branches of mathematics. The first and most obvious (and most important) application is the solution of simultaneous linear equations: in many practical problems quantities which need to be calculated are related to measurable quantities by linear equations. Further development of the theory leads to methods of solving linear differential equations, and even equations which are intrinsically nonlinear are usually tackled by repeatedly solving suitable linear equations to find a convergent sequence of approximate solutions. Moreover, the theory which was originally developed for solving linear equations is generalized and built on in many other branches of mathematics.

§2a Matrix operations

A matrix is a rectangular array of numbers. For instance,

        ( 1 2 3 2 )
    A = ( 0 5 7 1 )
        ( 1 4 2 8 )

is a 3 × 4 matrix. (That is, it has three rows and four columns.) The numbers are called the matrix entries (or components), and they are easily specified by row number and column number. Thus in the matrix A above the (2, 3)-entry is 7 and the (3, 4)-entry is 8. We will usually use the notation 'Xij' for the (i, j)-entry of a matrix X. Thus, in our example, we have A23 = 7 and A34 = 8. The set of all m × n matrices over R will be denoted by 'Mat(m × n, R)', and we refer to m × n as the shape of these matrices.

A matrix with only one row is usually called a row vector, and a matrix with only one column is usually called a column vector. Rows or columns with n components are often called n-tuples. We will usually call them just rows and columns, since (as we will see in the next chapter) the term vector can also properly be applied to things which are not rows or columns. We define Rn to be the
set of all column n-tuples of real numbers, and tRn (which we use less frequently) to be the set of all row n-tuples. (The 't' stands for 'transpose'. The transpose of a matrix A ∈ Mat(m × n, R) is the matrix tA ∈ Mat(n × m, R) defined by the formula (tA)ij = Aji.)

The definition of matrix that we have given is intuitively reasonable, and corresponds to the best way to think about matrices. Notice, however, that the essential feature of a matrix is just that there is a well-determined matrix entry associated with each pair (i, j). For the purest mathematicians, then, a matrix is simply a function, and consequently a formal definition is as follows:

2.1 Definition  Let n and m be nonnegative integers. An m × n matrix over the real numbers is a function

    A: (i, j) → Aij
from the Cartesian product {1, 2, . . . , m} × {1, 2, . . . , n} to R.
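Definition 2.1 can be taken quite literally in code: a matrix is just a map from index pairs to numbers. The dictionary sketch below is mine (a made-up 2 × 3 example); it also illustrates the transpose rule (tA)ij = Aji from the previous paragraph.

```python
# An m × n matrix as a function from {1,...,m} × {1,...,n} to the reals,
# stored as a dictionary keyed by (i, j).
m, n = 2, 3
A = {(1, 1): 1.0, (1, 2): 2.0, (1, 3): 3.0,
     (2, 1): 4.0, (2, 2): 5.0, (2, 3): 6.0}

# Every pair (i, j) has a well-determined entry A[i, j]:
assert set(A) == {(i, j) for i in range(1, m + 1) for j in range(1, n + 1)}

# The transpose tA is the n × m matrix with (tA)ij = Aji:
tA = {(j, i): A[i, j] for (i, j) in A}
print(A[2, 3], tA[3, 2])  # 6.0 6.0
```

The more familiar rectangular-array picture is just one convenient way of displaying such a function; the function itself is what the formal definition captures.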
Two matrices of the same shape can be added simply by adding the corresponding entries. Thus if A and B are both m × n matrices then A + B is the m × n matrix defined by the formula

    (A + B)ij = Aij + Bij

for all i ∈ {1, 2, . . . , m} and j ∈ {1, 2, . . . , n}. Addition of matrices satisfies properties similar to those satisfied by addition of numbers; in particular, (A + B) + C = A + (B + C) and A + B = B + A for all m × n matrices. The m × n zero matrix, denoted by '0m×n', or simply '0', is the m × n matrix all of whose entries are zero. It satisfies A + 0 = A = 0 + A for all A ∈ Mat(m × n, R).

If λ is any real number and A any matrix over R then we define λA by

    (λA)ij = λ(Aij)

for all i and j. Note that λA is a matrix of the same shape as A.

Multiplication of matrices can also be defined, but it is more complicated than addition and not as well behaved. The product AB of matrices A and B is defined if and only if the number of columns of A equals the
number of rows of B. Then the (i, j)-entry of AB is obtained by multiplying the entries of the i-th row of A by the corresponding entries of the j-th column of B and summing these products. That is, if A has shape m × n and B shape n × p then AB is the m × p matrix defined by

    (AB)ij = Ai1 B1j + Ai2 B2j + · · · + Ain Bnj = Σ_{k=1}^{n} Aik Bkj

for all i ∈ {1, 2, . . . , m} and j ∈ {1, 2, . . . , p}.

This definition may appear a little strange at first sight, but the following considerations should make it seem more reasonable. Suppose that we have two sets of variables, x1, x2, . . . , xm and y1, y2, . . . , yn, which are related by the equations

    x1 = a11 y1 + a12 y2 + · · · + a1n yn
    x2 = a21 y1 + a22 y2 + · · · + a2n yn
      ⋮
    xm = am1 y1 + am2 y2 + · · · + amn yn.

Let A be the m × n matrix whose (i, j)-entry is the coefficient of yj in the expression for xi; that is, Aij = aij. Suppose now that the variables yj can be similarly expressed in terms of a third set of variables zk with coefficient matrix B:

    y1 = b11 z1 + b12 z2 + · · · + b1p zp
    y2 = b21 z1 + b22 z2 + · · · + b2p zp
      ⋮
    yn = bn1 z1 + bn2 z2 + · · · + bnp zp

where bij is the (i, j)-entry of B. Clearly one can obtain expressions for the xi in terms of the zk by substituting this second set of equations into the first. It is easily checked that the total coefficient of zk in the expression for xi is

    ai1 b1k + ai2 b2k + · · · + ain bnk,

which is exactly the formula for the (i, k)-entry of AB. We have shown that if A is the coefficient matrix for expressing the xi in terms of the yj, and B the coefficient matrix for expressing the yj in terms of the zk, then AB is the coefficient matrix for expressing the xi in
terms of the zk. If one thinks of matrix multiplication in this way, none of the facts mentioned below will seem particularly surprising.

Note that if the matrix product AB is defined there is no guarantee that the product BA is defined also, and even if it is defined it need not equal AB. Furthermore, it is possible for the product of two nonzero matrices to be zero. Thus the familiar properties of multiplication of numbers do not all carry over to matrices. However, the following properties are satisfied:

(i) For each positive integer n there is an n × n identity matrix, commonly denoted by 'In×n', or simply 'I', having the properties that AI = A for matrices A with n columns and IB = B for all matrices B with n rows. The (i, j)-entry of I is the Kronecker delta, δij, which is 1 if i and j are equal and 0 otherwise:

    Iij = δij = { 1 if i = j
                { 0 if i ≠ j.

(ii) The distributive law A(B + C) = AB + AC is satisfied whenever A is an m × n matrix and B and C are n × p matrices. Similarly, (A + B)C = AC + BC whenever A and B are m × n and C is n × p.
(iii) If A is an m × n matrix, B an n × p matrix and C a p × q matrix then (AB)C = A(BC).
(iv) If A is an m × n matrix, B an n × p matrix and λ any number, then (λA)B = λ(AB) = A(λB).

These properties are all easily proved. For instance, for (iii) we have

    ((AB)C)ij = Σ_{k=1}^{p} (AB)ik Ckj = Σ_{k=1}^{p} ( Σ_{l=1}^{n} Ail Blk ) Ckj = Σ_{k=1}^{p} Σ_{l=1}^{n} Ail Blk Ckj

and similarly

    (A(BC))ij = Σ_{l=1}^{n} Ail (BC)lj = Σ_{l=1}^{n} Ail ( Σ_{k=1}^{p} Blk Ckj ) = Σ_{l=1}^{n} Σ_{k=1}^{p} Ail Blk Ckj,

and interchanging the order of summation we see that these are equal. The following facts should also be noted.
2.2 Proposition  The jth column of a product AB is obtained by multiplying the matrix A by the jth column of B, and the ith row of AB is the ith row of A multiplied by B.

Proof.  Suppose that the number of columns of A, necessarily equal to the number of rows of B, is n. Let bj be the jth column of B; thus the kth entry of bj is Bkj for each k. Now by definition the ith entry of the jth column of AB is

    (AB)ij = Σ_{k=1}^{n} Aik Bkj.

But this is exactly the same as the formula for the ith entry of Abj. The proof of the other part is similar.
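Both the defining formula for the product and Proposition 2.2 are easy to check computationally. The following is a minimal pure-Python sketch; the matrices and the helper names `mat_mul` and `column` are my own, invented for illustration:

```python
def mat_mul(A, B):
    """(AB)ij = sum over k of Aik * Bkj, for matrices stored as lists of rows."""
    n, p = len(B), len(B[0])
    assert all(len(row) == n for row in A)  # columns of A must equal rows of B
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

def column(M, j):
    """Column j of M, as a list of entries."""
    return [row[j] for row in M]

A = [[1, 0, 2],
     [2, 1, 3]]
B = [[1, 4],
     [0, 5],
     [2, 6]]
AB = mat_mul(A, B)          # the 2 x 2 product of a 2 x 3 and a 3 x 2 matrix
# Column j of AB is A times the jth column of B ...
for j in range(2):
    bj = [[entry] for entry in column(B, j)]   # column j, written as a 3 x 1 matrix
    assert column(AB, j) == column(mat_mul(A, bj), 0)
# ... and row i of AB is the ith row of A (a 1 x 3 matrix) times B.
for i in range(2):
    assert AB[i] == mat_mul([A[i]], B)[0]
```

Running the sketch exercises exactly the index bookkeeping used in the proof above.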
More generally, we have the following rule concerning multiplication of "partitioned matrices", which is fairly easy to see although a little messy to state and prove. Suppose that the m × n matrix A is subdivided into blocks A(rs) as shown, where A(rs) has shape mr × ns:

    A = ( A(11)  A(12)  · · ·  A(1q) )
        ( A(21)  A(22)  · · ·  A(2q) )
        (   .      .             .   )
        ( A(p1)  A(p2)  · · ·  A(pq) )

Thus we have that m = Σ_{r=1}^{p} mr and n = Σ_{s=1}^{q} ns. Suppose also that B is an n × l matrix which is similarly partitioned into submatrices B(st) of shape ns × lt. Then the product AB can be partitioned into blocks of shape mr × lt, where the (r, t) block is given by the formula Σ_{s=1}^{q} A(rs) B(st). That is,

($)
    ( A(11) · · · A(1q) ) ( B(11) · · · B(1u) )   ( Σ_{s=1}^{q} A(1s)B(s1)  · · ·  Σ_{s=1}^{q} A(1s)B(su) )
    (   .           .   ) (   .           .   ) = (           .                              .            )
    ( A(p1) · · · A(pq) ) ( B(q1) · · · B(qu) )   ( Σ_{s=1}^{q} A(ps)B(s1)  · · ·  Σ_{s=1}^{q} A(ps)B(su) )
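Before turning to the proof, the rule ($) can be checked numerically. Here is a hedged pure-Python sketch; the particular matrices, the partitioning, and the helper names are my own, chosen only for illustration:

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block(M, rows, cols):
    """Extract the submatrix with the given row and column index ranges."""
    return [[M[i][j] for j in cols] for i in rows]

A = [[2, 4, 1, 1],
     [1, 1, 1, 1],
     [5, 0, 1, 4]]      # 3 x 4, row groups of sizes m1, m2 = 2, 1
B = [[1, 1, 3],
     [3, 0, 1],
     [1, 2, 3],
     [4, 1, 4]]          # 4 x 3, row groups of sizes n1, n2 = 3, 1 (matching A's columns)
row_groups = [range(0, 2), range(2, 3)]   # the m_r
col_groups = [range(0, 3), range(3, 4)]   # the n_s, shared by A's columns and B's rows
out_groups = [range(0, 3)]                # a single column block of B
AB = mat_mul(A, B)
for rg in row_groups:
    for tg in out_groups:
        # the (r, t) block of AB should equal the sum over s of A(rs) B(st)
        blk = mat_add(mat_mul(block(A, rg, col_groups[0]), block(B, col_groups[0], tg)),
                      mat_mul(block(A, rg, col_groups[1]), block(B, col_groups[1], tg)))
        assert blk == block(AB, rg, tg)
```

The assertions confirm that assembling the block products reproduces the ordinary product, which is exactly what the proof below establishes in general.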
The proof consists of calculating the (i, j) entry of each side, for arbitrary i and j. Define

    M0 = 0,  M1 = m1,  M2 = m1 + m2,  . . . ,  Mp = m1 + m2 + · · · + mp

and similarly

    N0 = 0,  N1 = n1,  N2 = n1 + n2,  . . . ,  Nq = n1 + n2 + · · · + nq
    L0 = 0,  L1 = l1,  L2 = l1 + l2,  . . . ,  Lu = l1 + l2 + · · · + lu.

Given that i lies between M0 + 1 = 1 and Mp = m, there exists an r such that i lies between Mr−1 + 1 and Mr; write i′ = i − Mr−1. We see that the ith row of A is partitioned into the i′th rows of A(r1), A(r2), . . . , A(rq). Similarly we may locate the jth column of B by choosing t such that j lies between Lt−1 and Lt, and we write j′ = j − Lt−1. Now we have

    (AB)ij = Σ_{k=1}^{n} Aik Bkj = Σ_{s=1}^{q} Σ_{k=Ns−1+1}^{Ns} Aik Bkj = Σ_{s=1}^{q} (A(rs) B(st))i′j′,

which is the (i′, j′) entry of the (r, t) block of the partitioned matrix ($) above.

Example
#1  Verify the above rule for multiplication of partitioned matrices by computing a matrix product in two ways: directly, and block by block via the formula ($), using a chosen partitioning of the two factors. Both computations yield the same matrix, here

    ( 7 14 15 )
    ( 7 19 13 )

as can easily be checked directly.

Comment
2.2.1  Looking back on the proofs in this section one quickly sees that the only facts about real numbers which are made use of are the associative, commutative and distributive laws for addition and multiplication, the existence of 1 and 0, and the like; in short, these results are based simply on the field axioms. Everything in this section is true for matrices over any field.

§2b  Simultaneous equations

In this section we review the procedure for solving simultaneous linear equations. Given the system of equations

    a11 x1 + a12 x2 + · · · + a1n xn = b1
    a21 x1 + a22 x2 + · · · + a2n xn = b2                    (2.2.2)
      .
      .
    am1 x1 + am2 x2 + · · · + amn xn = bm
the first step is to replace it by a reduced echelon system which is equivalent to the original. Solving reduced echelon systems is trivial. An algorithm is described below for obtaining the reduced echelon system corresponding to a given system of equations. The algorithm makes use of elementary row operations, of which there are three kinds:
1.  Write the equations down in a different order.
2.  Replace an equation by a nonzero multiple of itself.
3.  Replace an equation by itself plus a multiple of another.
The most important thing about row operations is that the new equations should be consequences of the old, and, conversely, the old equations should be consequences of the new, so that the operations do not change the solution set. It is clear that the three kinds of elementary row operations do satisfy this requirement. For instance, if the fifth equation is replaced by itself minus twice the second equation, then the old equations can be recovered from the new by replacing the new fifth equation by itself plus twice the second.

It should be noted that the system of equations 2.2.2 can also be written as the single matrix equation

    ( a11 a12 · · · a1n ) ( x1 )   ( b1 )
    ( a21 a22 · · · a2n ) ( x2 )   ( b2 )
    (  .             .  ) (  . ) = (  . )
    ( am1 am2 · · · amn ) ( xn )   ( bm )

where the matrix product on the left hand side is as defined in the previous section. For the time being, however, this fact is not particularly relevant.

It is usual when solving simultaneous equations to save ink by simply writing down the coefficients, omitting the variables, the plus signs and the equality signs. In this shorthand notation the system 2.2.2 is written as the augmented matrix

    ( a11 a12 · · · a1n  b1 )
    ( a21 a22 · · · a2n  b2 )                    (2.2.3)
    (  .             .   .  )
    ( am1 am2 · · · amn  bm )

and 2.2.3 should simply be regarded as an abbreviated notation for 2.2.2.

The leading entry of a nonzero row of a matrix is the first (that is, leftmost) nonzero entry. An echelon matrix is a matrix with the following properties:
(i)  All nonzero rows must precede all zero rows.
(ii)  For all i, if rows i and i + 1 are nonzero then the leading entry of row i + 1 must be further to the right (that is, in a higher numbered column) than the leading entry of row i.

A reduced echelon matrix has two further properties:
(iii)  All leading entries must be 1.
(iv)  A column which contains the leading entry of any row must contain no other nonzero entry.

Once a system of equations is obtained for which the augmented matrix is reduced echelon, proceed as follows to find the general solution. If the last leading entry occurs in the final column (corresponding to the right hand side of the equations) then the equations have no solution, since we have derived the equation 0 = 1, and we say that the system is inconsistent. Otherwise each nonzero equation determines the variable corresponding to its leading entry uniquely in terms of the other variables, and those variables which do not correspond to the leading entry of any row can be given arbitrary values. To state this precisely, suppose that rows 1, 2, . . . , k are the nonzero rows, and suppose that their leading entries occur in columns i1, i2, . . . , ik respectively. We may call xi1, xi2, . . . , xik the pivot or basic variables and the remaining n − k variables the free variables. The most general solution is obtained by assigning arbitrary values to the free variables, and solving equation 1 for xi1, equation 2 for xi2, . . . , and equation k for xik, to obtain the values of the basic variables. Thus the general solution of a consistent system has n − k degrees of freedom, in the sense that it involves n − k arbitrary parameters. In particular a consistent system has a unique solution if and only if n − k = 0.

Example
#2  Suppose that after performing row operations on a system of five equations in the nine variables x1, x2, . . . , x9, the following reduced echelon augmented matrix is obtained:

    ( 0 0 1 2 0 0 −2 −1 0    8 )
    ( 0 0 0 0 1 0 −4  0 0    2 )
    ( 0 0 0 0 0 1  1  3 0   −1 )                    (2.2.4)
    ( 0 0 0 0 0 0  0  0 1    5 )
    ( 0 0 0 0 0 0  0  0 0    0 )

The leading entries occur in columns 3, 5, 6 and 9, and so the basic variables are x3, x5, x6 and x9. Let x1 = α, x2 = β, x4 = γ, x7 = δ and x8 = ε, where α, β, γ, δ and ε are arbitrary parameters. Then the equations give

    x3 = 8 − 2γ + 2δ + ε
    x5 = 2 + 4δ
    x6 = −1 − δ − 3ε
    x9 = 5

and so the most general solution of the system is

    ( x1 )   (  0 )     ( 1 )     ( 0 )     (  0 )     (  0 )     (  0 )
    ( x2 )   (  0 )     ( 0 )     ( 1 )     (  0 )     (  0 )     (  0 )
    ( x3 )   (  8 )     ( 0 )     ( 0 )     ( −2 )     (  2 )     (  1 )
    ( x4 )   (  0 )     ( 0 )     ( 0 )     (  1 )     (  0 )     (  0 )
    ( x5 ) = (  2 ) + α ( 0 ) + β ( 0 ) + γ (  0 ) + δ (  4 ) + ε (  0 )
    ( x6 )   ( −1 )     ( 0 )     ( 0 )     (  0 )     ( −1 )     ( −3 )
    ( x7 )   (  0 )     ( 0 )     ( 0 )     (  0 )     (  1 )     (  0 )
    ( x8 )   (  0 )     ( 0 )     ( 0 )     (  0 )     (  0 )     (  1 )
    ( x9 )   (  5 )     ( 0 )     ( 0 )     (  0 )     (  0 )     (  0 )

It is not wise to attempt to write down this general solution directly from the reduced echelon system 2.2.4. Although the actual numbers which appear in the solution are the same as the numbers in the augmented matrix, apart from some changes in sign, some care is required to get them in the right places!

Comments
2.2.5  In general there are many different choices for the row operations to use when putting the equations into reduced echelon form. It can be
shown, however, that the reduced echelon form itself is uniquely determined by the original system: it is impossible to change one reduced echelon system into another by row operations.
2.2.6  We have implicitly assumed that the coefficients aij and bi which appear in the system of equations 2.2.2 are real numbers, and that the unknowns xj are also meant to be real numbers. However, it is easily seen that the process used for solving the equations works equally well for any field.

There is a straightforward algorithm for obtaining the reduced echelon system corresponding to a given system of linear equations. The idea is to use the first equation to eliminate x1 from all the other equations, then use the second equation to eliminate x2 from all subsequent equations, and so on. For the first step it is obviously necessary for the coefficient of x1 in the first equation to be nonzero, but if it is zero we can simply choose to call a different equation the "first". In the terminology of row operations, the process is as follows. Given an augmented matrix with m rows, find a nonzero entry in the first column. (Strictly speaking, it is possible that the first column is entirely zero; in that case use the next column.) This entry is called the first pivot. Swap rows to make the row containing this first pivot the first row. Then subtract multiples of the first row from all the others so that the new rows have zeros in the first column; that is, all the entries below the first pivot will be zero. Now repeat the process, using the (m − 1)-rowed matrix obtained by ignoring the first row: looking only at the second and subsequent rows, find a nonzero entry in the second column. If there is none (so that elimination of x1 has accidentally eliminated x2 as well) then move on to the third column, and keep going until a nonzero entry is found. This will be the second pivot. Swap rows to bring the second pivot into the second row. Now subtract multiples of the second row from all subsequent rows to make the entries below the second pivot zero. Continue in this way (using next the (m − 2)-rowed matrix obtained by ignoring the first two rows) until no more pivots can be found. When this has been done the resulting matrix will be in echelon form, but (probably) not reduced echelon form. The reduced echelon form is readily obtained, as follows. Start by dividing all entries in the last nonzero row by the leading entry (the pivot) in that row. Then subtract multiples of this row from all the preceding rows so that the entries above the pivot become zero. Repeat this for the second to last nonzero row, then the third to last, and so on.
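The procedure just described translates readily into code. The following pure-Python sketch is my own illustration, not the book's; for brevity it clears the entries above each pivot as it goes, which produces the same reduced echelon form as the separate backward pass described in the text:

```python
def reduced_echelon(M, tol=1e-12):
    """Row reduce a matrix (given as a list of row lists) to reduced echelon form.

    Returns the reduced matrix together with the list of pivot columns.
    """
    M = [row[:] for row in M]            # work on a copy
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    pivot_cols = []
    for j in range(cols):
        # find a nonzero entry in column j, at or below the current pivot row
        k = next((i for i in range(pivot_row, rows) if abs(M[i][j]) > tol), None)
        if k is None:
            continue                     # column has no usable pivot; move right
        M[pivot_row], M[k] = M[k], M[pivot_row]          # swap the pivot into place
        piv = M[pivot_row][j]
        M[pivot_row] = [x / piv for x in M[pivot_row]]   # scale the leading entry to 1
        for i in range(rows):                            # clear the rest of the column
            if i != pivot_row and abs(M[i][j]) > tol:
                factor = M[i][j]
                M[i] = [a - factor * b for a, b in zip(M[i], M[pivot_row])]
        pivot_cols.append(j)
        pivot_row += 1
        if pivot_row == rows:
            break
    return M, pivot_cols

# A small invented system: x + y + z = 6, 2y + 5z = -4, 2x + 5y - z = 27.
aug = [[1, 1, 1, 6],
       [0, 2, 5, -4],
       [2, 5, -1, 27]]
R, pivots = reduced_echelon(aug)
assert pivots == [0, 1, 2]               # every variable is basic, so the solution is unique
assert [row[3] for row in R] == [5.0, 3.0, -2.0]
```

When some columns fail to produce a pivot, the returned `pivot_cols` identifies the basic variables, and the remaining columns correspond to the free variables of the general solution.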
It is important to note the following:
2.2.7  The algorithm involves dividing by each of the pivots.

§2c  Partial pivoting

As commented above, the procedure described in the previous section applies equally well for solving simultaneous equations over any field. Differences between different fields manifest themselves only in the different algorithms required for performing the operations of addition, subtraction, multiplication and division in the various fields. For this section, however, we will restrict our attention exclusively to the field of real numbers (which is, after all, the most important case).

Practical problems often present systems with so many equations and so many unknowns that it is necessary to use computers to solve them. Usually, when a computer performs an arithmetic operation, only a fixed number of significant figures in the answer are retained, so that each arithmetic operation performed will introduce a minuscule roundoff error. Solving a system involving hundreds of equations and unknowns will involve millions of arithmetic operations, and there is a definite danger that the cumulative effect of the minuscule approximations may be such that the final answer is not even close to the true solution. This raises questions which are extremely important, and even more difficult; we will give only a very brief and superficial discussion of them.

We start by observing that it is sometimes possible that a minuscule change in the coefficients will produce an enormous change in the solution, or change a consistent system into an inconsistent one. In this case the matrix is ill-conditioned, and under these circumstances the unavoidable roundoff errors in computation will produce large errors in the solution; there is nothing which can be done to remedy the situation. Suppose, for example, that our computer retains only three significant figures in its calculations, and suppose that it encounters the following system of equations:

       x        +  5z = 0
           12y + 12z = 36
    .01x + 12y + 12z = 35.9

As the equations stand the solution is x = −10, y = 1, z = 2, but if the coefficient of z in the third equation is changed from 12 to 12.1 then the solution becomes x = 10, y = 5, z = −2. The problem with this system of equations, at least for such an inaccurate computer, is that a small change to the coefficients can dramatically change the solution. Using the algorithm described above, the first step is to subtract .01 times the first equation from the third. This should give 12y + 11.95z = 35.9, but the best our inaccurate computer can get is either 12y + 11.9z = 35.9 or 12y + 12z = 35.9. Subtracting this from the second equation will either give 0.1z = 0.1 or 0z = 0.1. So the computer will either get a solution which is badly wrong, or no solution at all. Solving an ill-conditioned system will necessarily involve dividing by a number that is close to zero, and a small error in such a number will produce a large error in the answer.

There are, however, occasions when we may be tempted to divide by a number that is close to zero when there is in fact no need to. For instance, consider the equations

    .01x + 100y = 100
       x +    y = 2                    (2.2.8)

If we select the entry in the first row and first column of the augmented matrix as the first pivot, then we will subtract 100 times the first row from the second, and obtain

    ( .01   100     100 )
    (  0  −9999   −9998 )

Because our machine only works to three figures, −9999 and −9998 will both be rounded off to −10000. Now we divide the second row by −10000, then subtract 100 times the second row from the first, and finally divide the first row by .01, to obtain the reduced echelon matrix

    ( 1 0   0 )
    ( 0 1   1 )

That is, we have obtained x = 0, y = 1 as the solution. Our value of y is accurate enough; our mistake was to substitute this value of y into the first equation to obtain a value for x. Solving for x involved dividing by .01, and the small error in the value of y resulted in a large error in the value of x.

A much more accurate answer is obtained by selecting the entry in the second row and first column as the first pivot. After swapping the two rows, the row operation procedure continues as follows:

    ( 1    1    2 )  R2 := R2 − .01R1  ( 1   1     2 )  R2 := (1/100)R2  ( 1 1  2 )  R1 := R1 − R2  ( 1 0  1 )
    ( .01 100 100 )  −−−−−−−−−−−−−→    ( 0  100  100 )  −−−−−−−−−−−−→    ( 0 1  1 )  −−−−−−−−−−→    ( 0 1  1 )

We have avoided the numerical instability that resulted from dividing by .01, and our answer is now correct to three significant figures.

In view of these remarks and the comment 2.2.7 above, it is clear that when deciding which of the entries in a given column should be the next pivot, we should avoid those that are too close to zero. Of all possible pivots in the column, always choose that which has the largest absolute value. Choosing the pivots in this way is called partial pivoting, and it is this procedure which is used in most practical problems.†

Some refinements to the partial pivoting algorithm may sometimes be necessary. For instance, if the system 2.2.8 above were modified by multiplying the whole of the first equation by 200 then partial pivoting as described above would select the entry in the first row and first column as the first pivot, and we would encounter the same problem as before. The situation can be remedied by scaling the rows before commencing pivoting, by dividing each row through by its entry of largest absolute value. This makes the rows commensurate, and reduces the likelihood of inaccuracies resulting from adding numbers of greatly differing magnitudes.

Finally, it is worth noting that although the claims made above seem reasonable, it is hard to justify them rigorously. It is a daunting theoretical task to define the "goodness" of an algorithm in any reasonable way, and then use the concept to compare algorithms. This is because the algorithm which is best for most systems is not necessarily the best for all systems: there will always be cases where a less good algorithm happens to work well for a particular system.

† In complete pivoting all the entries of all the remaining columns are compared when choosing the pivots. The resulting algorithm is slightly safer, but much slower.
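The effect described above can be reproduced by simulating a three-significant-figure machine. The sketch below is illustrative only: the rounding model and the helper names are my own. It solves the two-equation system above both with the tiny pivot and with the rows swapped first:

```python
from math import floor, log10

def round3(x):
    """Round to three significant figures, mimicking the inaccurate machine."""
    if x == 0:
        return 0.0
    return round(x, 2 - floor(log10(abs(x))))

def solve2(a11, a12, b1, a21, a22, b2):
    """Eliminate x from the second equation, then back-substitute,
    rounding every intermediate result to three significant figures."""
    m = round3(a21 / a11)
    a22r = round3(a22 - round3(m * a12))
    b2r = round3(b2 - round3(m * b1))
    y = round3(b2r / a22r)
    x = round3(round3(b1 - round3(a12 * y)) / a11)
    return x, y

# .01x + 100y = 100 and x + y = 2; the true solution is x = 1.0001..., y = 0.9999...
naive = solve2(0.01, 100, 100, 1, 1, 2)      # pivot on the tiny entry .01
swapped = solve2(1, 1, 2, 0.01, 100, 100)    # partial pivoting: rows exchanged first
assert naive == (0.0, 1.0)                   # y is fine, but x is badly wrong
assert swapped == (1.0, 1.0)                 # correct to three significant figures
```

The naive order divides by .01 when recovering x, so the small rounding error in y is magnified a hundredfold, exactly as the text explains.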
§2d  Elementary matrices

In this section we return to a discussion of general properties of matrices: properties that are true for matrices over any field. Accordingly, we will not specify any particular field; instead, we will use the letter 'F' to denote some fixed but unspecified field, and the elements of F will be referred to as 'scalars'. This convention will remain in force for most of the rest of this book, although from time to time we will restrict ourselves to the case F = R, or impose some other restrictions on the choice of F. The reader is encouraged to think of scalars as ordinary numbers (since they usually are). It is conceded, however, that the extra generality that we achieve by this approach is not of great consequence for the most common applications of linear algebra.

2.3 Definition  Let m and n be positive integers. An elementary row operation is a function ρ: Mat(m × n, F) → Mat(m × n, F) such that either ρ = ρij for some i, j ∈ I = {1, 2, . . . , m}, or ρ = ρi^(λ) for some i ∈ I and some nonzero scalar λ, or ρ = ρij^(λ) for some i, j ∈ I and some scalar λ, where ρij, ρi^(λ) and ρij^(λ) are defined as follows:
(i)  ρij(A) is the matrix obtained from A by swapping the ith and jth rows;
(ii)  ρi^(λ)(A) is the matrix obtained from A by multiplying the ith row by λ;
(iii)  ρij^(λ)(A) is the matrix obtained from A by adding λ times the ith row to the jth row.

Comment
2.3.1  Elementary column operations are defined similarly.

2.4 Definition  An elementary matrix is any matrix obtainable by applying an elementary row operation to an identity matrix. We use the following notation:

    Eij = ρij(I),    Ei^(λ) = ρi^(λ)(I),    Eij^(λ) = ρij^(λ)(I),

where I denotes an identity matrix.

It can be checked that all elementary row operations have inverses, so it follows that they are bijective functions. The inverses are in fact also elementary row operations:

2.5 Proposition  For all i, j and λ we have ρij^{-1} = ρij and (ρij^(λ))^{-1} = ρij^(−λ), and if λ ≠ 0 then (ρi^(λ))^{-1} = ρi^(λ^{-1}).

The principal theorem about elementary row operations is that the effect of each is the same as premultiplication by the corresponding elementary matrix:

2.6 Theorem  If A is a matrix of m rows and I is as in 2.3 then for all i, j ∈ I and λ ∈ F we have ρij(A) = Eij A, ρi^(λ)(A) = Ei^(λ) A and ρij^(λ)(A) = Eij^(λ) A.

Once again there must be corresponding results for columns. However, we do not have to define "column elementary matrices", since it turns out that the result of applying an elementary column operation to an identity matrix I is the same as the result of applying an appropriate elementary row operation to I. Let us use the following notation for elementary column operations:
(i)  γij swaps columns i and j;
(ii)  γi^(λ) multiplies column i by λ;
(iii)  γij^(λ) adds λ times column i to column j.
We find that

    γij(I) = Eij,    γi^(λ)(I) = Ei^(λ)    and    γij^(λ)(I) = Eji^(λ).

(Note that i and j occur in different orders on the two sides of the last of these equations.)

2.7 Theorem  If γ is an elementary column operation then γ(A) = A γ(I) for all A and the appropriate I.

Example
#3  Performing the elementary row operation ρ21^(2) (adding twice the second row to the first) on the 3 × 3 identity matrix yields the matrix

    E = ( 1 2 0 )
        ( 0 1 0 )
        ( 0 0 1 )

So if A is any matrix with three rows then EA should be the result of performing ρ21^(2) on A. For example,

    ( 1 2 0 ) ( a b c d )   ( a + 2e  b + 2f  c + 2g  d + 2h )
    ( 0 1 0 ) ( e f g h ) = (   e       f       g       h   )
    ( 0 0 1 ) ( i j k l )   (   i       j       k       l   )

which is indeed the result of adding twice the second row to the first. Observe that E is also obtainable by performing the elementary column operation γ12^(2) (adding twice the first column to the second) on the identity. So if B is any matrix with three columns, BE is the result of performing γ12^(2) on B:

    ( p q r ) ( 1 2 0 )   ( p  2p + q  r )
    ( s t u ) ( 0 1 0 ) = ( s  2s + t  u )
              ( 0 0 1 )
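Theorem 2.6 is also easy to check numerically. A minimal pure-Python sketch follows (0-indexed rows, and the helper names are my own; the example matrix is invented):

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rho_add(A, i, j, lam):
    """The row operation that adds lam times row i to row j (rows 0-indexed)."""
    A = [row[:] for row in A]
    A[j] = [a + lam * b for a, b in zip(A[j], A[i])]
    return A

A = [[1, 2, 0],
     [3, 1, 1],
     [0, 4, 2]]
# E is the elementary matrix of Example #3: add twice the second row to the first.
E = rho_add(identity(3), 1, 0, 2)
# Premultiplying by E performs the same row operation on A.
assert mat_mul(E, A) == rho_add(A, 1, 0, 2)
```

The same check works for the swap and scale operations; each elementary matrix reproduces its row operation under premultiplication.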
The proofs of all the above results are easy and are omitted. The pivoting algorithm described in the previous section gives the following fact.

2.8 Proposition  For each A ∈ Mat(m × n, F) there exists a sequence of m × m elementary matrices E1, E2, . . . , Ek such that Ek Ek−1 · · · E1 A is a reduced echelon matrix.

If A is an n × n matrix then an inverse of A is a matrix B satisfying AB = BA = I. It is easily proved that a matrix can have at most one inverse, so there can be no ambiguity in using the notation 'A^{-1}' for the inverse of an invertible A. Matrices which have inverses are said to be invertible. Elementary matrices are invertible, and, as one would expect in view of 2.5, Eij^{-1} = Eij, (Ei^(λ))^{-1} = Ei^(λ^{-1}) and (Eij^(λ))^{-1} = Eij^(−λ). A matrix which is a product of invertible matrices is also invertible, and we have (AB)^{-1} = B^{-1} A^{-1}. In contrast, the following important fact is definitely nontrivial.

2.9 Theorem  If A, B ∈ Mat(n × n, F) and if AB = I then it is also true that BA = I. Thus A and B are inverses of each other.

Proof.  By 2.8 there exist n × n matrices T and R such that R is reduced echelon, T is a product of elementary matrices, and T A = R. Let r be the number of nonzero rows of R, and let the leading entry of the ith nonzero row occur in column ji. Since the leading entry of each nonzero row of R is further to the right than that of the preceding row, we must have 1 ≤ j1 < j2 < · · · < jr ≤ n, and if all the rows of R are nonzero (so that r = n) this forces ji = i for all i. Since the leading entries in a reduced echelon matrix are all 1, and since all the other entries in a column containing a leading entry must be zero, it follows that if all the diagonal entries of R are leading entries then R must be just the identity matrix. So either R has at least one zero row, or else the leading entries of the rows occur in the diagonal positions and R = I.

It remains to prove that it is impossible for R to have a zero row. Note that T is invertible, since it is a product of elementary matrices, and since AB = I we have R(BT^{-1}) = T ABT^{-1} = T T^{-1} = I. But the ith row of R(BT^{-1}) is obtained by multiplying the ith row of R by BT^{-1}, and will therefore be zero if the ith row of R is zero. Since R(BT^{-1}) = I certainly does not have a zero row, it follows that R does not have a zero row. So we must have T A = R = I. This gives T = T I = T (AB) = (T A)B = IB = B, and we deduce that BA = T A = I, as required.

Note that Theorem 2.9 applies to square matrices only. If A and B are not square it is in fact impossible for both AB and BA to be identity matrices, as the next result shows.

2.10 Theorem  If the matrix A has more rows than columns then it is impossible to find a matrix B such that AB = I.

The proof, which we omit, is very similar to the last part of the proof of 2.9, and hinges on the fact that the reduced echelon matrix obtained from A by pivoting must have a zero row.

§2e  Determinants

We will have more to say about determinants in Chapter Eight; for the present, we confine ourselves to describing a method for calculating determinants and stating some properties which we will prove in Chapter Eight. Associated with every square matrix is a scalar called the determinant of the matrix. A 1 × 1 matrix is just a scalar, and its determinant is equal to itself.
For 2 × 2 matrices the rule is

    det ( a b ) = ad − bc.
        ( c d )

We now proceed recursively. Let A be an n × n matrix, and assume that we know how to calculate the determinants of (n − 1) × (n − 1) matrices. We define the (i, j)th cofactor of A, denoted 'cof ij(A)', to be (−1)^{i+j} times the determinant of the matrix obtained by deleting the ith row and jth column of A. The determinant of A (for any n) is given by the so-called first row expansion:

    det A = A11 cof 11(A) + A12 cof 12(A) + · · · + A1n cof 1n(A) = Σ_{i=1}^{n} A1i cof 1i(A).

(A better practical method for calculating determinants will be presented later.) It turns out that a matrix A is invertible if and only if det A ≠ 0, and if det A ≠ 0 then the inverse of A is given by the formula

    A^{-1} = (1 / det A) adj A

where adj A (the adjoint matrix†) is the transposed matrix of cofactors; that is, (adj A)ij = cof ji(A). So, for example, if n is 3 we have

    A = ( A11 A12 A13 )
        ( A21 A22 A23 )
        ( A31 A32 A33 )

and we find that

    cof 21(A) = (−1)^3 det ( A12 A13 ) = −A12 A33 + A13 A32.
                           ( A32 A33 )

† Called the adjugate matrix in an earlier edition of this book, since some misguided mathematicians use the term 'adjoint matrix' for the conjugate transpose of a matrix.
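The recursive first row expansion translates directly into a short function. The following pure-Python sketch is my own illustration (and, as the text warns, it is far slower than row operations for large matrices):

```python
def det(A):
    """Determinant by first row expansion: det A = sum over j of A[0][j] * cof(0, j)."""
    n = len(A)
    if n == 1:
        return A[0][0]                   # the determinant of a 1 x 1 matrix is its entry
    total = 0
    for j in range(n):
        # minor: delete row 0 and column j, then recurse
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

assert det([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3 == -2
# The matrix of Example #4 below turns out to be singular:
assert det([[3, -1, 4], [-2, -2, 3], [7, 3, -2]]) == 0
```

The sign (−1)^j here is (−1)^{1+j} written for 0-indexed columns, matching the cofactor definition above.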
In particular, in the 2 × 2 case this says that

    ( a b )^{-1} = (1 / (ad − bc)) (  d −b )
    ( c d )                        ( −c  a )

provided that ad − bc ≠ 0, a formula that can easily be verified directly. We remark that it is foolhardy to attempt to use the cofactor formula to calculate inverses of matrices, except in the 2 × 2 case and (occasionally) the 3 × 3 case; a much quicker method exists using row operations.

The most important properties of determinants are as follows.
(i)  A square matrix A has an inverse if and only if det A ≠ 0.
(ii)  Let A and B be n × n matrices. Then det(AB) = det A det B.
(iii)  Let A be an n × n matrix. Then det A = 0 if and only if there exists a nonzero n × 1 matrix x (that is, a column) satisfying the equation Ax = 0.

This last property is particularly important for the next section.

Example
#4  Calculate adj A and (adj A)A for the following matrix A:

    A = (  3 −1  4 )
        ( −2 −2  3 )
        (  7  3 −2 )

The cofactors are as follows:

    (−1)^2 det ( −2  3 ; 3 −2 ) = −5        (−1)^3 det ( −1 4 ; 3 −2 ) = 10        (−1)^4 det ( −1  4 ; −2 3 ) = 5
    (−1)^3 det ( −2  3 ; 7 −2 ) = 17        (−1)^4 det (  3 4 ; 7 −2 ) = −34       (−1)^5 det (  3  4 ; −2 3 ) = −17
    (−1)^4 det ( −2 −2 ; 7  3 ) = 8         (−1)^5 det (  3 −1 ; 7 3 ) = −16       (−1)^6 det (  3 −1 ; −2 −2 ) = −8

So the transposed matrix of cofactors is

    adj A = ( −5  10    5 )
            ( 17 −34  −17 )
            (  8 −16   −8 )
Hence we find that

    (adj A)A = ( −5  10    5 ) (  3 −1  4 )
               ( 17 −34  −17 ) ( −2 −2  3 )
               (  8 −16   −8 ) (  7  3 −2 )

             = ( −15 − 20 + 35     5 − 20 + 15    −20 + 30 − 10 )
               (  51 + 68 − 119   −17 + 68 − 51    68 − 102 + 34 )
               (  24 + 32 − 56     −8 + 32 − 24    32 − 48 + 16 )

             = ( 0 0 0 )
               ( 0 0 0 )
               ( 0 0 0 )

Since (adj A)A should equal (det A)I, we conclude that det A = 0.

§2f  Introduction to eigenvalues

2.11 Definition  Let A ∈ Mat(n × n, F). A scalar λ is called an eigenvalue of A if there exists a nonzero v ∈ F^n such that Av = λv. Any such v is called an eigenvector.

Comments
2.11.1  Eigenvalues are sometimes called characteristic roots or characteristic values, and the words 'proper' and 'latent' are occasionally used instead of 'characteristic'.
2.11.2  Since the field Q of all rational numbers is contained in the field R of all real numbers, a matrix over Q can also be regarded as a matrix over R. Similarly, a matrix over R can be regarded as a matrix over C, the field of all complex numbers. It is a perhaps unfortunate fact that there are some rational matrices that have eigenvalues which are real or complex but not rational, and some real matrices which have nonreal complex eigenvalues.

2.12 Theorem  Let A ∈ Mat(n × n, F). A scalar λ is an eigenvalue of A if and only if det(A − λI) = 0.

Proof.  For all v ∈ F^n we have that Av = λv if and only if (A − λI)v = Av − λv = 0. By the last property of determinants mentioned in the previous section, a nonzero such v exists if and only if det(A − λI) = 0.

Example
#5  Find the eigenvalues and corresponding eigenvectors of A = ( 3 1 ; 1 3 ).

We must find the values of λ for which det(A − λI) = 0. Now

    det(A − λI) = det ( 3 − λ    1   ) = (3 − λ)^2 − 1.
                      (   1    3 − λ )

So det(A − λI) = (4 − λ)(2 − λ), and the eigenvalues are 4 and 2. To find eigenvectors corresponding to the eigenvalue 2 we must find nonzero columns v satisfying (A − 2I)v = 0; that is, we must solve

    ( 1 1 ) ( x )   ( 0 )
    ( 1 1 ) ( y ) = ( 0 )

Trivially we find that all solutions are of the form ( ξ ; −ξ ), and any nonzero ξ gives an eigenvector. Similarly, solving

    ( −1  1 ) ( x )   ( 0 )
    (  1 −1 ) ( y ) = ( 0 )

we find that the columns of the form ( ξ ; ξ ) with ξ ≠ 0 are the eigenvectors corresponding to 4.

As we discovered in the above example, the equation det(A − λI) = 0, which we must solve in order to find the eigenvalues of A, is a polynomial equation for λ. It is called the characteristic equation of A. The polynomial involved is of considerable theoretical importance.
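For a 2 × 2 matrix the characteristic equation is the quadratic λ^2 − (A11 + A22)λ + det A = 0, so the calculation of Example #5 can be checked mechanically. A small sketch, assuming real roots (the function name `eig2` is my own):

```python
from math import sqrt

def eig2(a, b, c, d):
    """Eigenvalues of the 2 x 2 matrix (a b; c d), from the characteristic
    equation  lambda^2 - (a + d) lambda + (ad - bc) = 0  (real roots assumed)."""
    tr, det = a + d, a * d - b * c
    disc = sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

lam1, lam2 = eig2(3, 1, 1, 3)
assert (lam1, lam2) == (4.0, 2.0)
# Check Av = lambda v for the eigenvectors found in Example #5.
A = [[3, 1], [1, 3]]
for lam, v in [(4.0, [1, 1]), (2.0, [1, -1])]:
    Av = [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]
    assert Av == [lam * x for x in v]
```

For larger matrices the characteristic polynomial has higher degree and its roots generally cannot be written in closed form, which is one reason the polynomial is of theoretical rather than computational importance.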
Observe that the characteristic roots (eigenvalues) are the roots of the characteristic polynomial.

A square matrix A is said to be diagonal if Aij = 0 whenever i ≠ j. As we shall see in examples below and in the exercises, the following problem arises naturally in the solution of systems of simultaneous differential equations:

2.13.1 Given a square matrix A, find an invertible matrix P such that P⁻¹AP is diagonal.

Solving this problem is sometimes called "diagonalizing A". Eigenvalue theory is used to do it.

Examples

#6 Diagonalize the matrix

    A = ( 3  1 )
        ( 1  3 ).

We have calculated the eigenvalues and eigenvectors of A in #5 above, and so we know that

    ( 3  1 ) (  1 ) = (  2 )         ( 3  1 ) ( 1 ) = ( 4 )
    ( 1  3 ) ( −1 )   ( −2 )   and   ( 1  3 ) ( 1 )   ( 4 ).

Combining these into a single matrix equation gives

    ( 3  1 ) (  1  1 ) = (  2  4 ) = (  1  1 ) ( 2  0 )
    ( 1  3 ) ( −1  1 )   ( −2  4 )   ( −1  1 ) ( 0  4 ).

Defining now

    P = (  1  1 )
        ( −1  1 )

we see that det P = 2 ≠ 0, and hence P⁻¹ exists. It follows immediately from 2.13.2 that P⁻¹AP = diag(2, 4).
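The conclusion of Example #6 can be verified directly; the check below is an addition of mine (assuming NumPy), not part of the original text.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])
P = np.array([[1.0, 1.0], [-1.0, 1.0]])   # columns: eigenvectors for 2 and 4

assert abs(np.linalg.det(P) - 2.0) < 1e-12   # det P = 2, so P is invertible
D = np.linalg.inv(P) @ A @ P
assert np.allclose(D, np.diag([2.0, 4.0]))   # P^{-1} A P = diag(2, 4)
```

The order of the diagonal entries matches the order in which the eigenvectors were placed as columns of P.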
Example

#7 Solve the simultaneous differential equations

    x′(t) = 3x(t) + y(t)
    y′(t) = x(t) + 3y(t).

We may write the equations as

    d/dt ( x ) = A ( x )
         ( y )     ( y )

where A is the matrix from our previous example. The method is to find a change of variables which simplifies the equations. Choose new variables u and v related to x and y by

    ( x ) = P ( u )
    ( y )     ( v )

where pij is the (i, j)th entry of P. We require P to be invertible, so that it is possible to express u and v in terms of x and y, and the entries of P must be constants; but there are no other restrictions. Differentiating, we obtain

    d/dt ( x ) = ( d/dt (p11 u + p12 v) ) = ( p11 u′ + p12 v′ ) = P d/dt ( u )
         ( y )   ( d/dt (p21 u + p22 v) )   ( p21 u′ + p22 v′ )          ( v )

since the pij are constants. In terms of u and v the equations now become

    P d/dt ( u ) = AP ( u )
           ( v )      ( v )

or, equivalently,

    d/dt ( u ) = P⁻¹AP ( u ).
         ( v )         ( v )

That is, in terms of the new variables the equations are of exactly the same form as before, but the coefficient matrix A has been replaced by P⁻¹AP. We naturally wish to choose the matrix P in such a way that P⁻¹AP is as simple as possible, and as we have seen in the previous example it is possible (in this case) to arrange that P⁻¹AP is diagonal. To do so we choose P to be any invertible matrix whose columns are eigenvectors of A. Hence we define

    P = (  1  1 )
        ( −1  1 ).
(This means that x = u + v and y = −u + v.) Our new equations now are

    d/dt ( u ) = ( 2  0 ) ( u )
         ( v )   ( 0  4 ) ( v )

or, on expanding, u′ = 2u and v′ = 4v. The solution to this is u = He^{2t} and v = Ke^{4t}, where H and K are arbitrary constants, and this gives

    ( x ) = P ( u ) = (  1  1 ) ( He^{2t} ) = (  He^{2t} + Ke^{4t} )
    ( y )     ( v )   ( −1  1 ) ( Ke^{4t} )   ( −He^{2t} + Ke^{4t} )

so that the most general solution to the original problem is x = He^{2t} + Ke^{4t} and y = −He^{2t} + Ke^{4t}.

#8 Another typical application involving eigenvalues is the Leslie population model. We are interested in how the size of a population varies as time passes. At regular intervals the number of individuals in various age groups is counted; censuses are taken at times t = 0, 1, 2, .... At time t let

    x1(t) = the number of individuals aged between 0 and 1
    x2(t) = the number of individuals aged between 1 and 2
      ⋮
    xn(t) = the number of individuals aged between n − 1 and n

where n is the maximum age ever attained, and suppose it is observed that

    x2(i + 1) = b1 x1(i)
    x3(i + 1) = b2 x2(i)
      ⋮
    xn(i + 1) = bn−1 xn−1(i).

Thus, for example, b1 = 0.95 would mean that 95% of those aged between 0 and 1 at one census survive to be aged between 1 and 2 at the next. We also find that

    x1(i + 1) = a1 x1(i) + a2 x2(i) + ··· + an xn(i)

for some constants a1, a2, ..., an.
The constants ai measure the contributions to the birth rate from the different age groups. In matrix notation

    ( x1(i+1) )   ( a1  a2  a3  ···  an−1  an ) ( x1(i) )
    ( x2(i+1) )   ( b1   0   0  ···    0    0 ) ( x2(i) )
    ( x3(i+1) ) = (  0  b2   0  ···    0    0 ) ( x3(i) )
    (    ⋮    )   (  ⋮   ⋮   ⋱          ⋮   ⋮ ) (   ⋮   )
    ( xn(i+1) )   (  0   0   0  ···  bn−1   0 ) ( xn(i) )

or x(i + 1) = Lx(i), where x(t) is the column with xj(t) as its jth entry and

    L = ( a1  a2  a3  ···  an−1  an )
        ( b1   0   0  ···    0    0 )
        (  0  b2   0  ···    0    0 )
        (  ⋮   ⋮   ⋱          ⋮   ⋮ )
        (  0   0   0  ···  bn−1   0 )

is the Leslie matrix. Assuming this relationship between x(i + 1) and x(i) persists indefinitely, we see that x(k) = L^k x(0). We are interested in the behaviour of L^k x(0) as k → ∞. It turns out that in fact this behaviour depends on the largest eigenvalue of L, as we can see in an oversimplified example.

Suppose that n = 2 and

    L = ( 1  1 )
        ( 1  0 ).

That is, there are just two age groups (young and old), and (i) all young individuals survive to become old in the next era (b1 = 1), and (ii) on average each young individual gives rise to one offspring in the next era, and so does each old individual (a1 = a2 = 1). Suppose that there are initially one billion young and one billion old individuals. This tells us x(0). The eigenvalues of L are (1 + √5)/2 and (1 − √5)/2, and the corresponding eigenvectors are

    ( (1 + √5)/2 )       ( (1 − √5)/2 )
    (      1     )  and  (      1     )

respectively. By solving equations we can express x(0) in terms of the eigenvectors:

    x(0) = ( 1 ) = (1 + √5)/(2√5) ( (1 + √5)/2 ) − (1 − √5)/(2√5) ( (1 − √5)/2 ).
           ( 1 )                  (      1     )                  (      1     )
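Before continuing, the eigenpairs and the decomposition of x(0) just written can be spot-checked numerically. The sketch below is an addition of mine (assuming NumPy), not part of the original notes.

```python
import numpy as np

s5 = np.sqrt(5.0)
phi, psi = (1 + s5) / 2, (1 - s5) / 2     # the two eigenvalues of L
v1 = np.array([phi, 1.0])                 # eigenvector for phi
v2 = np.array([psi, 1.0])                 # eigenvector for psi
c1, c2 = (1 + s5) / (2 * s5), -(1 - s5) / (2 * s5)

L = np.array([[1.0, 1.0], [1.0, 0.0]])
assert np.allclose(L @ v1, phi * v1) and np.allclose(L @ v2, psi * v2)

# x(0) = (1, 1) (in billions) decomposes as c1*v1 + c2*v2.
assert np.allclose(c1 * v1 + c2 * v2, np.array([1.0, 1.0]))
```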
We deduce that

    x(k) = L^k x(0)
         = (1 + √5)/(2√5) · L^k ( (1 + √5)/2 ) − (1 − √5)/(2√5) · L^k ( (1 − √5)/2 )
                                (      1     )                       (      1     )
         = (1 + √5)/(2√5) · ((1 + √5)/2)^k ( (1 + √5)/2 ) − (1 − √5)/(2√5) · ((1 − √5)/2)^k ( (1 − √5)/2 ).
                                           (      1     )                                   (      1     )

But ((1 − √5)/2)^k becomes insignificant in comparison with ((1 + √5)/2)^k as k → ∞. We see that for k large

    x(k) ≈ (1/√5) ( ((1 + √5)/2)^{k+2} ).
                  ( ((1 + √5)/2)^{k+1} )

So in the long term the ratio of young to old approaches (1 + √5)/2, and the size of the population is multiplied by (1 + √5)/2 (the largest eigenvalue) per unit time.

#9 Let f(x, y) be a smooth real valued function of two real variables, and suppose that we are interested in the behaviour of f near some fixed point. Choose that point to be the origin of our coordinate system. Under appropriate conditions f(x, y) can be approximated by a Taylor series

    f(x, y) ≈ p + qx + ry + sx² + 2txy + uy² + ···

where higher order terms become less and less significant as (x, y) is made closer and closer to (0, 0). For the first few terms we can conveniently use matrix notation, and write

2.13.3    f(x) ≈ p + Lx + ᵗx Q x

where L = ( q  r ) and

    Q = ( s  t )
        ( t  u )

and x is the two-component column with entries x and y. Note that the entries of L are given by the first order partial derivatives of f at 0, and the entries of Q are given by the second order partial derivatives of f at 0.
Furthermore, 2.13.3 is equally valid when x has more than two components.

In the case that 0 is a critical point of f, the terms of degree 1 disappear, and the behaviour of f near 0 is determined by the quadratic form ᵗx Q x. To understand ᵗx Q x it is important to be able to rotate the axes so as to eliminate product terms like xy and be left only with the square terms. Rotating the axes through an angle θ introduces new variables x′ and y′ related to the old variables by

    ( x ) = ( cos θ   −sin θ ) ( x′ )
    ( y )   ( sin θ    cos θ ) ( y′ )

or, equivalently,

    ( x  y ) = ( x′  y′ ) (  cos θ   sin θ ).
                          ( −sin θ   cos θ )

That is, x = Tx′ and ᵗx = ᵗ(x′) ᵗT. Substituting this into the expression for our quadratic form gives

    ᵗ(x′) (  cos θ   sin θ ) ( s  t ) ( cos θ   −sin θ ) x′
          ( −sin θ   cos θ ) ( t  u ) ( sin θ    cos θ )

or ᵗ(x′) ᵗT Q T x′. Our aim is to choose θ so that Q′ = ᵗT Q T is diagonal. Since it happens that the transpose of the rotation matrix T coincides with its inverse, this is the same problem as that introduced in 2.13.1 above; it is because Q is its own transpose that a diagonalizing matrix T can be found satisfying T⁻¹ = ᵗT. We will return to this problem in Chapter Five, and show that it can always be done.

The diagonal entries of the diagonal matrix Q′ are just the eigenvalues of Q. That the eigenvalues of the matrix Q are of crucial importance can be seen easily in the two variable case: if the eigenvalues are both positive then the critical point 0 is a local minimum, and if both are negative then it is a local maximum. The corresponding fact is true in the n variable case: if all eigenvalues of Q are positive then we have a local minimum; if all are negative then we have a local maximum; and if some eigenvalues are negative and some are positive it is neither a maximum nor a minimum.
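The claims about ᵗT Q T can be illustrated with a concrete symmetric Q. The numbers below are my own example (not from the text), and NumPy is assumed.

```python
import numpy as np

Q = np.array([[2.0, 1.0], [1.0, 2.0]])    # s = 2, t = 1, u = 2 (assumed values)

theta = np.pi / 4                         # the right rotation angle for this Q
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(T.T, np.linalg.inv(T))  # transpose of T equals its inverse

Qp = T.T @ Q @ T                           # Q' = tT Q T
assert np.allclose(Qp, np.diag([3.0, 1.0]))  # diagonal entries: eigenvalues of Q
```

Both eigenvalues of this Q are positive, so a critical point with this quadratic form would be a local minimum, in line with the classification above.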
Since the determinant of Q is the product of the eigenvalues, and since det Q = det Q′, it follows in the two variable case that we have a local maximum or minimum if and only if det Q > 0. For n variables the test is necessarily more complicated.

#10 We remark also that eigenvalues are of crucial importance in questions of numerical stability in the solving of equations. If x and b are related by Ax = b, where A is a square matrix, and if it happens that A has an eigenvalue very close to zero and another eigenvalue with a large absolute value, then it is likely that small changes in either x or b will produce large changes in the other. For example, if

    A = ( 100    0   )
        (  0    0.01 )

then changing the first component of x by 0.1 changes the first component of b by 10, while changing the second component of b by 0.1 changes the second component of x by 10. If, as in this example, the matrix A is equal to its own transpose, the condition number of A is defined as the ratio of the largest eigenvalue of A to the smallest eigenvalue of A, and A is ill-conditioned (see §2c) if the condition number is too large.

Whilst on numerical matters, we remark that numerical calculation of eigenvalues is an important practical problem, especially for large matrices. In practice, the best numerical methods for calculating eigenvalues do not depend upon direct solution of the characteristic equation. One method is to make use of the fact that (usually) if A is a square matrix and x a column then as k → ∞ the columns A^k x tend to approach scalar multiples of an eigenvector. (We saw this in #8 above.)

Exercises

1. Prove that if A is an m × n matrix and Im and In are (respectively) the m × m and n × n identity matrices, then AIn = A = ImA.

2. Prove that matrix addition is associative (A + (B + C) = (A + B) + C for all A, B and C) and commutative (A + B = B + A for all A and B).

3. Prove that if A, B and C are matrices of appropriate shapes and λ and µ are arbitrary numbers, then A(λB + µC) = λ(AB) + µ(AC).
4. (i) Prove that ᵗ(AB) = (ᵗB)(ᵗA) for all matrices A and B such that AB is defined.
   (ii) Prove that if the entries of A and B are complex numbers then (AB)¯ = A¯B¯. (If X is a complex matrix then X¯ is the matrix whose (i, j)-entry is the complex conjugate of Xij.)

5. Check, by tedious expansion, that the formula A(adj A) = (det A)I is valid for any 3 × 3 matrix A.

6. Use the properties of matrix addition and multiplication from the previous exercises, together with associativity of matrix multiplication, to prove that a square matrix can have at most one inverse.

7. (i) Show that if ξ is an eigenvalue of the 2 × 2 matrix

       ( a  b )
       ( c  d )

   then ξ is a solution of the quadratic equation x² − (a + d)x + (ad − bc) = 0. Hence find the eigenvalues of the matrix

       ( 0  −2 )
       ( 1   3 ).

   (ii) Find an eigenvector for each of these eigenvalues.

8. Suppose that A is an n × n matrix and that ξ1, ξ2, ..., ξn are eigenvalues for A. For each i let vi be an eigenvector corresponding to the eigenvalue ξi, and let P be the n × n matrix whose n columns are v1, v2, ..., vn (in that order). Show that AP = PD, where D is the diagonal matrix with diagonal entries ξ1, ξ2, ..., ξn. (That is, Dij = ξi δij, where δij is the Kronecker delta.)

9. A particle moves in the plane, its coordinates at time t being (x(t), y(t)), where x and y are differentiable functions on [0, ∞). Suppose that

    (∗)    x′(t) = −2y(t)
           y′(t) = x(t) + 3y(t).

   (i) Show (by direct calculation) that the equations (∗) can be solved by putting x(t) = 2z(t) − w(t) and y(t) = −z(t) + w(t), and obtaining differential equations for z and w.
   (ii) Use the previous exercise to find a matrix P such that

       ( 0  −2 ) P = P ( ξ  0 )
       ( 1   3 )       ( 0  ν )

   where ξ and ν are the eigenvalues of the matrix on the left.

10. Solve the system of differential equations

    x′ = −7x − 2y + 6z
    y′ = −2x + y + 2z
    z′ = −10x − 2y + 9z.

11. Calculate the eigenvalues of the following matrices, and for each eigenvalue find an eigenvector.

    (a) ( 4  −1 )      (b) ( 2  −2  7 )      (c) (  −7  −2  6 )
        ( 2   1 )          ( 0  −1  4 )          (  −2   1  2 )
                           ( 0   0  5 )          ( −10  −2  9 )

12. Let A be an n × n matrix. Prove that the characteristic polynomial cA(x) of A has degree n, that the coefficient of x^{n−1} is (−1)^{n−1} Σ_{i=1}^{n} Aii, and that the constant term is det A. Prove also that the leading coefficient of cA(x) is (−1)^n. (Hint: Use induction on n.) (Note the connection with Exercise 7.)
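As an aside to the numerical remarks in #10 above, here is a bare-bones power iteration sketch. The matrix is an arbitrary symmetric example of my own choosing, and the code is illustrative rather than a production method; NumPy is assumed.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 3 and 1 (assumed example)
x = np.array([1.0, 0.0])

for _ in range(60):
    x = A @ x                  # the columns A^k x line up with an eigenvector
    x = x / np.linalg.norm(x)  # renormalize to avoid overflow

rayleigh = x @ A @ x           # estimate of the dominant eigenvalue
assert abs(rayleigh - 3.0) < 1e-9
assert np.allclose(np.abs(x), np.full(2, 1.0) / np.sqrt(2.0))
```

This is the A^k x phenomenon from the text: the component of x along the dominant eigenvector comes to swamp the rest, just as the largest eigenvalue dominated the Leslie model in #8.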
Chapter 3: Introduction to vector spaces

Pure mathematicians love to generalize ideas. If they manage, by means of a new trick, to prove some conjecture, they always endeavour to get maximum mileage out of the idea by searching for other situations in which it can be used. Potentially this kills many birds with one stone, since good theories are applicable in many different situations. To do this successfully one must discard unnecessary details and focus attention only on what is really needed to make the idea work; this should lead both to simplifications and deeper understanding. It also leads naturally to an axiomatic approach to mathematics, in which one lists initially as axioms all the things which have to hold before the theory will be applicable, and then attempts to derive consequences of these axioms.

We have already used the axiomatic approach in Chapter 1 in the definition of 'field', and in this chapter we proceed to the definition of 'vector space'. We start with a discussion of linearity, since one of the major reasons for introducing vector spaces is to provide a suitable context for discussion of this concept.

§3a Linearity

A common mathematical problem is to solve a system of equations of the form

3.0.1    T(x) = 0.

Depending on the context the unknown x could be a number, an n-tuple, a function, or even an n-tuple of functions, or something more complicated. For instance the simultaneous linear equations

3.0.2    (  1  1   1  0 ) ( x1 )   ( 3 )
         ( −1  1  −1  2 ) ( x2 ) = ( 0 )
         (  4  0   6  0 ) ( x3 )   ( 4 )
                          ( x4 )
have the form 3.0.1 with x a 4-tuple, and the differential equation

3.0.3    t² d²x/dt² + t dx/dt + (t² − 4)x = 0

is an example in which x is a function. Similarly, the pair of simultaneous differential equations

    df/dt = 2f + 3g
    dg/dt = 2f + 7g

could be rewritten as

3.0.4    d/dt ( f ) − ( 2  3 ) ( f ) = ( 0 )
              ( g )   ( 2  7 ) ( g )   ( 0 )

which also has the form 3.0.1, with x being an ordered pair of functions.

Whether or not one can solve the system 3.0.1 obviously depends on the nature of the expression T(x) on the left hand side. In this course we will be interested in cases in which T(x) is linear in x.

3.1 Definition  An expression T(x) is said to be linear in the variable x if T(x1 + x2) = T(x1) + T(x2) for all values x1 and x2 of the variable, and T(λx) = λT(x) for all values of x and all scalars λ.

If we let T(x) denote the expression on the left hand side of the equation 3.0.3 above we find that

    T(x1 + x2) = t² d²/dt²(x1 + x2) + t d/dt(x1 + x2) + (t² − 4)(x1 + x2)
               = t²(d²x1/dt² + d²x2/dt²) + t(dx1/dt + dx2/dt) + (t² − 4)x1 + (t² − 4)x2
               = (t² d²x1/dt² + t dx1/dt + (t² − 4)x1) + (t² d²x2/dt² + t dx2/dt + (t² − 4)x2)
               = T(x1) + T(x2)

and

    T(λx) = t² d²/dt²(λx) + t d/dt(λx) + (t² − 4)λx
          = λ(t² d²x/dt² + t dx/dt + (t² − 4)x)
          = λT(x).
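The same calculation can be mirrored in code by restricting the unknown x to polynomials, where differentiation is exact. This is an added sketch of mine, not part of the original notes; it assumes NumPy's polynomial module.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# T(x) = t^2 d2x/dt2 + t dx/dt + (t^2 - 4) x, as in 3.0.3.
t = P([0.0, 1.0])   # the polynomial "t"

def T(x):
    return t**2 * x.deriv(2) + t * x.deriv(1) + (t**2 - 4) * x

x1, x2, lam = P([1.0, 2.0, 0.0, 5.0]), P([3.0, -1.0, 4.0]), 2.5

# T(x1 + x2) = T(x1) + T(x2) and T(lam x1) = lam T(x1), as claimed.
assert np.allclose((T(x1 + x2) - (T(x1) + T(x2))).coef, 0.0)
assert np.allclose((T(lam * x1) - lam * T(x1)).coef, 0.0)
```

The two sample polynomials are arbitrary; linearity holds for any choice, which is exactly what the symbolic computation above establishes.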
Thus T(x) is a linear expression, and for this reason 3.0.3 is called a linear differential equation. It is left for the reader to check that 3.0.2 and 3.0.4 above are also linear equations.

Example

#1 Suppose that T is linear in x, and let x = x0 be a particular solution of the equation T(x) = a. Show that the general solution of T(x) = a is x = x0 + y for y ∈ S, where S is the set of all solutions of T(y) = 0.

We are given that T(x0) = a. Let y ∈ S, and put x = x0 + y. Then T(y) = 0, and linearity of T yields

    T(x) = T(x0 + y) = T(x0) + T(y) = a + 0 = a,

and we have shown that x = x0 + y is a solution of T(x) = a whenever y ∈ S. Now suppose that x is any solution of T(x) = a, and define y = x − x0. Then we have that x = x0 + y, and by linearity of T we find that

    T(y) = T(x − x0) = T(x) − T(x0) = a − a = 0,

so that y ∈ S. Hence all solutions of T(x) = a have the given form.

Our definition of linearity may seem reasonable at first sight, but a closer inspection reveals some deficiencies. The term 'expression in x' is a little imprecise: in fact T is simply a function, and we should really have said something like this:

3.1.1 A function T: V → W is linear if T(x1 + x2) = T(x1) + T(x2) and T(λx) = λT(x) for all x1, x2, x ∈ V and all scalars λ.

For this to be meaningful V and W have to both be equipped with operations of addition, and one must also be able to multiply elements of V and W by scalars.

The set of all m × n matrices (where m and n are fixed) provides an example of a set equipped with addition and scalar multiplication: with the definitions as given in Chapter 1, the sum of two m × n matrices is another m × n matrix, and a scalar multiple of an m × n matrix is an m × n matrix. We have also seen that addition and scalar multiplication of rows and columns arises naturally in the solution of simultaneous equations.
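Example #1 can be illustrated numerically with T(x) = Ax for a singular matrix A, so that the homogeneous equation has nonzero solutions. The matrices below are my own choices (NumPy assumed), not from the text.

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0]])   # singular, so Ay = 0 has y != 0
a = np.array([3.0, 6.0])

x0 = np.array([3.0, 0.0])                # a particular solution: A x0 = a
y = np.array([-2.0, 1.0])                # a homogeneous solution: A y = 0

assert np.allclose(A @ x0, a)
assert np.allclose(A @ y, 0.0)

# Hence x0 + c*y solves the equation for every scalar c, as #1 shows.
for c in [0.0, 1.0, -3.5]:
    assert np.allclose(A @ (x0 + c * y), a)
```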
For instance, we expressed the general solution of such a system in terms of addition and scalar multiplication of columns. Similarly, we can define addition and scalar multiplication for real valued differentiable functions; these definitions were implicit in our discussion of 3.0.3 above. If x and y are such functions then x + y is the real valued differentiable function defined by (x + y)(t) = x(t) + y(t), and if λ ∈ R then λx is the function defined by (λx)(t) = λx(t).

§3b Vector axioms

Roughly speaking, a vector space should be a set (whose elements will be called vectors) which is equipped with an operation of addition, and a scalar multiplication function which determines an element λx of the vector space whenever x is an element of the vector space and λ is a scalar. That is, if x and y are two vectors then their sum x + y must exist and be another vector, and if x is a vector and λ a scalar then λx must exist and be a vector. In brief, a vector space is a set for which addition and scalar multiplication are defined in some sensible fashion. By "sensible" I mean that a certain list of obviously desirable properties must be satisfied. It will then make sense to ask whether a function from one vector space to another is linear in the sense of 3.1.1. In accordance with the above discussion, we make the following definition.

3.2 Definition  Let F be a field. A set V is called a vector space over F if there is an operation of addition (x, y) −→ x + y on V, and a scalar multiplication function (λ, x) −→ λx from F × V to V, such that the following properties are satisfied.
(i) (u + v) + w = u + (v + w) for all u, v, w ∈ V.
(ii) u + v = v + u for all u, v ∈ V.
(iii) There exists an element 0 ∈ V such that 0 + v = v for all v ∈ V.
(iv) For each v ∈ V there exists a u ∈ V such that u + v = 0.
(v) 1v = v for all v ∈ V.
(vi) λ(µv) = (λµ)v for all λ, µ ∈ F and all v ∈ V.
(vii) (λ + µ)v = λv + µv for all λ, µ ∈ F and all v ∈ V.
(viii) λ(u + v) = λu + λv for all λ ∈ F and all u, v ∈ V.

Comments
3.2.1 The field of scalars F and the vector space V both have zero elements, which are both commonly denoted by the symbol '0'. With a little care one can always tell from the context whether 0 means the zero scalar or the zero vector. Sometimes we will distinguish notationally between vectors and scalars by writing a tilde underneath vectors; thus a 0 written with a tilde beneath it would denote the zero of the vector space under discussion.
3.2.2 The 1 which appears in axiom (v) is the scalar 1; of course, there is no vector 1.

Examples

#2 Let P be the Euclidean plane and choose a fixed point O ∈ P. For P, Q ∈ P find the point R such that OPRQ is a parallelogram, and define OP + OQ = OR; that is, the sum of two line segments is defined by the parallelogram rule. If λ is a positive real number then λOP is the line segment OS such that S lies on OP or OP produced and the length of OS is λ times the length of OP. For negative λ the product λOP is defined similarly (giving a point on PO produced). The set of all line segments OP, as P varies over all points of the plane, is an example of a vector space over R, called the space of position vectors relative to O. The proofs that the vector space axioms are satisfied are nontrivial exercises in Euclidean geometry. The same construction is also applicable in three dimensional Euclidean space.

#3 The set R³, consisting of all ordered triples of real numbers, can be made into a vector space over R. Addition and scalar multiplication are defined in the usual way:

    ( x1 )   ( x2 )   ( x1 + x2 )             ( x )   ( λx )
    ( y1 ) + ( y2 ) = ( y1 + y2 )   and   λ ( y ) = ( λy )
    ( z1 )   ( z2 )   ( z1 + z2 )             ( z )   ( λz )

for all real numbers λ, x, x1, and so on. It is trivial to check that the axioms listed in the definition above are satisfied. Of course, the use of Cartesian
coordinates shows that this vector space is essentially the same as the space of position vectors in Euclidean space; you should convince yourself that componentwise addition really does correspond to the parallelogram law.

#4 The set Rⁿ of all n-tuples of real numbers is a vector space over R for all positive integers n. Everything is completely analogous to R³. (Recall that we have defined Rⁿ to be the set of all columns with entries x1, x2, ..., xn ∈ R, and ᵗRⁿ = { ( x1  x2  ···  xn ) | xi ∈ R for all i }.) It is clear that ᵗRⁿ is also a vector space. These spaces of n-tuples are the most important examples of vector spaces, and the first examples to think of when trying to understand theoretical results.

#5 If F is any field then Fⁿ and ᵗFⁿ are both vector spaces over F.

#6 Let S be any set and let 𝒮 be the set of all functions from S to F (where F is a field). If f, g ∈ 𝒮 and λ ∈ F, define f + g, λf ∈ 𝒮 as follows: for all a ∈ S,

    (f + g)(a) = f(a) + g(a)
    (λf)(a) = λ f(a).

(Note that addition and multiplication on the right hand side of these equations take place in F; you do not have to be able to add and multiply elements of S for the definitions to make sense.) It can now be shown that these definitions of addition and scalar multiplication make 𝒮 into a vector space over F. To do this we must verify that all the axioms are satisfied. In each case the proof is routine, based on the fact that F satisfies the field axioms.

(i) Let f, g, h ∈ 𝒮. Then for all a ∈ S,

    ((f + g) + h)(a) = (f + g)(a) + h(a)       (by definition of addition on 𝒮)
                     = (f(a) + g(a)) + h(a)    (same reason)
                     = f(a) + (g(a) + h(a))    (addition in F is associative)
                     = f(a) + (g + h)(a)       (definition of addition on 𝒮)
                     = (f + (g + h))(a)        (same reason)

and so (f + g) + h = f + (g + h).
(ii) Let f, g ∈ 𝒮. Then for all a ∈ S,

    (f + g)(a) = f(a) + g(a)     (by definition)
               = g(a) + f(a)     (since addition in F is commutative)
               = (g + f)(a)      (by definition)

so that f + g = g + f.

(iii) Define z: S → F by z(a) = 0 (the zero element of F) for all a ∈ S. We must show that this zero function z satisfies f + z = f for all f ∈ 𝒮. For all a ∈ S we have

    (f + z)(a) = f(a) + z(a) = f(a) + 0 = f(a)

by the definition of addition in 𝒮 and the Zero Axiom for fields, whence the result.

(iv) Suppose that f ∈ 𝒮. Define g ∈ 𝒮 by g(a) = −f(a) for all a ∈ S. Then for all a ∈ S,

    (g + f)(a) = g(a) + f(a) = 0 = z(a),

so that g + f = z. Thus each element of 𝒮 has a negative.

(v) Suppose that f ∈ 𝒮. By the definition of scalar multiplication for 𝒮 and the Identity Axiom for fields we have, for all a ∈ S,

    (1f)(a) = 1 f(a) = f(a),

and therefore 1f = f.

(vi) Let λ, µ ∈ F and f ∈ 𝒮. Then for all a ∈ S,

    (λ(µf))(a) = λ((µf)(a)) = λ(µ f(a)) = (λµ) f(a) = ((λµ)f)(a)

by the definition of scalar multiplication for 𝒮 and associativity of multiplication in the field F. Thus λ(µf) = (λµ)f.

(vii) Let λ, µ ∈ F and f ∈ 𝒮. Then for all a ∈ S,

    ((λ + µ)f)(a) = (λ + µ) f(a)
                  = λ f(a) + µ f(a)
                  = (λf)(a) + (µf)(a)
                  = (λf + µf)(a)

by the definition of addition and scalar multiplication for 𝒮 and the distributive law in F. Thus (λ + µ)f = λf + µf.
(viii) Let λ ∈ F and f, g ∈ 𝒮. Then for all a ∈ S,

    (λ(f + g))(a) = λ((f + g)(a)) = λ(f(a) + g(a)) = λ f(a) + λ g(a)
                  = (λf)(a) + (λg)(a) = (λf + λg)(a),

whence λ(f + g) = λf + λg.

Observe that if S = {1, 2, ..., n} then the set 𝒮 is essentially the same as Fⁿ, since an n-tuple

    ( a1 )
    ( a2 )
    (  ⋮ )
    ( an )

in Fⁿ may be identified with the function f: {1, 2, ..., n} → F given by f(i) = ai for i = 1, 2, ..., n. In other words, it is readily checked that the definitions of addition and scalar multiplication for n-tuples are consistent with the definitions of addition and scalar multiplication for functions. (Indeed, since we have defined a column to be an n × 1 matrix, and a matrix to be a function, Fⁿ is in fact the set of all functions from S × {1} to F. There is an obvious bijective correspondence between the sets S and S × {1}.)

#7 The set C of all continuous functions f: R → R is a vector space over R, addition and scalar multiplication being defined as in the previous example. It is necessary to show that these definitions do give well defined functions C × C → C (for addition) and R × C → C (for scalar multiplication); that is, one must show that if f and g are continuous and λ ∈ R then f + g and λf are continuous. This follows from elementary calculus. It is necessary also to show that the vector space axioms are satisfied, but this is routine to do, and similar to the previous example.

#8 The set of all continuously differentiable functions from R to R, the set of all three times differentiable functions from (−1, 1) to R, and the set of all integrable functions from [0, 10] to R are further examples of vector spaces over R, and there are many similar examples. It is also straightforward to show that the set of all polynomial functions from R to R is a vector space over R.

#9 The solution set of a linear equation T(x) = 0 is always a vector space.
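The statement in #9 can be spot-checked numerically: for T(x) = Ax, sums and scalar multiples of solutions of T(x) = 0 remain solutions. The following added sketch uses a rank-one A of my own choosing, with NumPy assumed.

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0], [2.0, 4.0, 2.0]])   # rank 1, so many solutions

x = np.array([1.0, 0.0, -1.0])    # T(x) = Ax = 0
y = np.array([2.0, -1.0, 0.0])    # T(y) = Ay = 0
assert np.allclose(A @ x, 0.0) and np.allclose(A @ y, 0.0)

# Closure under the inherited operations, exactly as linearity predicts:
assert np.allclose(A @ (x + y), 0.0)      # x + y is again a solution
assert np.allclose(A @ (-3.0 * x), 0.0)   # and so is any scalar multiple
```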
.56 Chapter Three: Introduction to vector spaces by the deﬁnition of addition and scalar multiplication for S and the distributive law in F . 1) to R. . 2. 10] to R are further examples of vector spaces over R.
The proofs that each of the vector space axioms are satisﬁed in U are relatively straightforward. For some unknown reason it is common in vector space theory to speak of ‘linear transformations’ rather than ‘linear functions’. A complete proof will be given below (see 3. λx ∈ U (so that we do have well deﬁned addition and scalar multiplication functions for U ). To conclude this section we list some examples of functions which are linear. If T (x) = T (y) = 0 and λ ∈ F then linearity of T gives T (x + y) = T (x) + T (y) = 0 + 0 = 0 T (λx) = λT (x) = λ0 = 0 so that x + y. Stating the result more precisely. y ∈ U and λ ∈ F then x + y. To prove this one must ﬁrst show that if x. The deﬁnition of ‘linear transformation’ was essentially given in 3. although the words ‘transformation’ and ‘function’ are actually synonomous in mathematics. the subset of R consisting of all triples z 57 Co 1 0 2 1 4 4 5 x 0 y = 0 2 8 z 0 rig py Sc is a vector space. but there is no harm in writing it down again: ho ht em ol of M rs ive Un ati cs ath ity of Sy dn ey dS tat ist ics an .1.13). and then check the axioms. relative to addition and scalar multiplication functions “inherited” from V . if V and W are vector spaces over a ﬁeld F and T : V → W is a linear function then the set U = { x ∈ V  T (x) = 0 } is a vector space over F . the domain and codomain of a linear transformation should both be vector spaces (such as the vector spaces given in the examples above). As explained above.Chapter Three: Introduction to vector spaces x 3 y such that Thus.1 above. for instance. based in each case on the fact that the same axiom is satisﬁed in V . λx ∈ U .
3 Definition Let V and W be vector spaces over the ﬁeld F . zi ∈ R. Likewise. v ∈ V and all λ ∈ F . A function T : V → W is called a linear transformation if T (u + v) = T (u) + T (v) and T (λu) = λT (u) for all u. Sc ho ht em ol of M Examples rs ive Un . a function T satisfying T (λv) = λT (v) for all λ and v is said to preserve scalar multiplication. v and λ it follows that T is a linear transformation. and we have x1 + x2 T (u + v) = T y1 + y2 = z1 + z2 = x1 + y1 + z1 x2 + y2 + z2 + 2x1 − y1 2x2 − y2 λx1 λx1 + λy1 + λz1 T (λu) = T λy1 = 2λx1 − λy1 λz1 = λ(x1 + y1 + z1 ) λ(2x1 − y1 ) =λ Since these equations hold for all u. Co ri g py #10 Let T : R3 → R2 be deﬁned by x x+y+z T y = 2x − y z Let u. v ∈ R3 and λ ∈ R. Comment 3.1 A function T which satisﬁes T (u + v) = T (u) + T (v) for all u and v is sometimes said to preserve addition.58 Chapter Three: Introduction to vector spaces 3. yi . x2 v = y2 z2 x1 + x2 + y1 + y2 + z1 + z2 2(x1 + x2 ) − (y1 + y2 ) = T (u) + T (v). ath ity of Sy dn ey dS tat ist ics = λT (u). ati cs an x1 + y1 + z1 2x1 − y1 . Then x1 y1 u= z1 for some xi .3.
Note that the definition of this function T could be rewritten as

    T ( x )   ( 1   1  1 ) ( x )
      ( y ) = ( 2  −1  0 ) ( y ).
      ( z )               ( z )

Writing

    A = ( 1   1  1 )
        ( 2  −1  0 )

the calculations above amount to showing that A(u + v) = Au + Av and A(λu) = λ(Au) for all u, v ∈ R³ and all λ ∈ R, facts which we have already noted in Chapter One.

#11 Let F be any field and A any matrix of m rows and n columns with entries from F. If the function f: Fⁿ → Fᵐ is defined by f(x) = Ax for all x ∈ Fⁿ, then f is a linear transformation. The proof of this is straightforward, using Exercise 3 from Chapter 2. The calculations are reproduced below, in case you have not done that exercise. We use the notation (v)i for the ith entry of a column v. Let u, v ∈ Fⁿ and λ ∈ F. Then

    (f(u + v))i = (A(u + v))i = Σ_{j=1}^{n} Aij (u + v)j = Σ_{j=1}^{n} Aij ((u)j + (v)j)
                = Σ_{j=1}^{n} Aij (u)j + Σ_{j=1}^{n} Aij (v)j = (Au)i + (Av)i
                = (f(u))i + (f(v))i = (f(u) + f(v))i.

Similarly,

    (f(λu))i = (A(λu))i = Σ_{j=1}^{n} Aij (λu)j = Σ_{j=1}^{n} Aij λ(u)j = λ Σ_{j=1}^{n} Aij (u)j
             = λ(Au)i = λ(f(u))i = (λf(u))i.

This holds for all i, so f(u + v) = f(u) + f(v) and f(λu) = λf(u), and therefore f is linear.

#12 Let ℱ be the set of all functions with domain Z (the set of all integers) and codomain R. By #6 above we know that ℱ is a vector space over R. Let G: ℱ → R² be defined by

    G(f) = ( f(−1) )
           ( f(2)  )

for all f ∈ ℱ.
for all $f \in \mathcal F$. Then if $f, g \in \mathcal F$ and $\lambda \in \mathbb{R}$ we have
$$\begin{aligned}
G(f + g) &= \begin{pmatrix} (f+g)(-1) \\ (f+g)(2) \end{pmatrix} &&\text{(by definition of $G$)} \\
&= \begin{pmatrix} f(-1) + g(-1) \\ f(2) + g(2) \end{pmatrix} &&\text{(by the definition of addition of functions)} \\
&= \begin{pmatrix} f(-1) \\ f(2) \end{pmatrix} + \begin{pmatrix} g(-1) \\ g(2) \end{pmatrix} &&\text{(by the definition of addition of columns)} \\
&= G(f) + G(g),
\end{aligned}$$
and similarly
$$\begin{aligned}
G(\lambda f) &= \begin{pmatrix} (\lambda f)(-1) \\ (\lambda f)(2) \end{pmatrix} &&\text{(definition of $G$)} \\
&= \begin{pmatrix} \lambda f(-1) \\ \lambda f(2) \end{pmatrix} &&\text{(definition of scalar multiplication for functions)} \\
&= \lambda \begin{pmatrix} f(-1) \\ f(2) \end{pmatrix} = \lambda G(f) &&\text{(definition of scalar multiplication for columns),}
\end{aligned}$$
and it follows that $G$ is linear.

#13  Let $C$ be the vector space consisting of all continuous functions from the open interval $(-1, 7)$ to $\mathbb{R}$, and define $I\colon C \to \mathbb{R}$ by
$$I(f) = \int_0^4 f(x)\,dx.$$
Let $f, g \in C$ and $\lambda \in \mathbb{R}$. Then
$$I(f + g) = \int_0^4 (f+g)(x)\,dx = \int_0^4 \bigl(f(x) + g(x)\bigr)\,dx = \int_0^4 f(x)\,dx + \int_0^4 g(x)\,dx = I(f) + I(g)$$
(by basic integral calculus), and similarly
$$I(\lambda f) = \int_0^4 (\lambda f)(x)\,dx$$
$$= \int_0^4 \lambda f(x)\,dx = \lambda \int_0^4 f(x)\,dx = \lambda I(f),$$
whence $I$ is linear.

#14  It is as well to finish with an example of a function which is not linear. If $S$ is the set of all functions from $\mathbb{R}$ to $\mathbb{R}$, then the function $T\colon S \to \mathbb{R}$ defined by $T(f) = 1 + f(4)$ is not linear. For instance, if $f \in S$ is defined by $f(x) = x$ for all $x \in \mathbb{R}$, then
$$T(2f) = 1 + (2f)(4) = 1 + 2f(4) \ne 2 + 2f(4) = 2\bigl(1 + f(4)\bigr) = 2T(f),$$
so that $T$ does not preserve scalar multiplication. Hence $T$ is not linear. (In fact, $T$ does not preserve addition either.)

§3c  Trivial consequences of the axioms

Having defined vector spaces in the previous section, our next objective is to prove theorems about vector spaces. In doing this we must be careful to make sure that our proofs use only the axioms and nothing else, for only that way can we be sure that every system which satisfies the axioms will also satisfy all the theorems that we prove. An unfortunate consequence of this is that we must start by proving trivialities, or rather things which seem to be trivialities because they are familiar to us in slightly different contexts. It is necessary to prove that these things are indeed consequences of the vector space axioms. It is also useful to learn the art of constructing proofs by doing proofs of trivial facts before trying to prove difficult theorems.

Throughout this section $F$ will be a field and $V$ a vector space over $F$, and we write $\mathbf 0$ for the zero vector of $V$.

3.4 Proposition  For all $u, v, w \in V$, if $v + u = w + u$ then $v = w$.

Proof.  Assume that $v + u = w + u$. By Axiom (iv) in Definition 3.2 there exists $t \in V$ such that $u + t = \mathbf 0$. Now adding $t$ to both sides of the equation
gives
$$\begin{aligned}
(v + u) + t &= (w + u) + t \\
v + (u + t) &= w + (u + t) &&\text{(by Axiom (i))} \\
v + \mathbf 0 &= w + \mathbf 0 &&\text{(by the choice of $t$)} \\
v &= w &&\text{(by Axiom (iii)).}
\end{aligned}$$

3.5 Proposition  For each $v \in V$ there is a unique $t \in V$ which satisfies $t + v = \mathbf 0$.

Proof.  The existence of such a $t$ for each $v$ is immediate from Axioms (iv) and (ii). Uniqueness follows from 3.4 above, since if $t + v = \mathbf 0$ and $t' + v = \mathbf 0$ then $t + v = t' + v$, and, by 3.4, $t = t'$.

We comment that by 3.5 there is no ambiguity in using the customary notation '$-v$' for the negative of a vector $v$, and we will do this henceforward.

3.6 Proposition  If $u, v \in V$ and $u + v = v$ then $u = \mathbf 0$.

Proof.  Assume that $u + v = v$. Then by Axiom (iii) we have $u + v = \mathbf 0 + v$, and so $u = \mathbf 0$ by 3.4.

We comment that 3.6 shows that $V$ cannot have more than one zero element.

3.7 Proposition  Let $\lambda \in F$ and $v \in V$. Then $\lambda\mathbf 0 = \mathbf 0 = 0v$. We also have $(-1)v = -v$, and, conversely, if $\lambda v = \mathbf 0$ then either $\lambda = 0$ or $v = \mathbf 0$.

Proof.  By definition of $\mathbf 0$ we have $\mathbf 0 + \mathbf 0 = \mathbf 0$, and therefore
$$\begin{aligned}
\lambda(\mathbf 0 + \mathbf 0) &= \lambda\mathbf 0 \\
\lambda\mathbf 0 + \lambda\mathbf 0 &= \lambda\mathbf 0 &&\text{(by Axiom (viii))} \\
\lambda\mathbf 0 &= \mathbf 0 &&\text{(by 3.6).}
\end{aligned}$$
By the field axioms (see (ii) of Definition 1.2) we have $0 + 0 = 0$, and so by Axiom (vii) (in Definition 3.2)
$$0v + 0v = (0 + 0)v = 0v,$$
whence $0v = \mathbf 0$ by 3.6.

For the next part, observe that we have $(-1) + 1 = 0$ by the field axioms, and therefore
$$(-1)v + v = (-1)v + 1v = \bigl((-1) + 1\bigr)v = 0v = \mathbf 0$$
by Axioms (v) and (vii) and the first part. By 3.5 above we deduce that $(-1)v = -v$.

For the last part, suppose that $\lambda v = \mathbf 0$ and $\lambda \ne 0$. The field axioms guarantee that $\lambda^{-1}$ exists, and we deduce that
$$v = 1v = (\lambda^{-1}\lambda)v = \lambda^{-1}(\lambda v) = \lambda^{-1}\mathbf 0 = \mathbf 0.$$

§3d  Subspaces

If $V$ is a vector space over a field $F$ and $U$ is a subset of $V$, we may ask whether the operation of addition on $V$ gives rise to an operation of addition on $U$. Since an operation on $U$ is a function $U \times U \to U$, it does so if and only if adding two elements of $U$ always gives another element of $U$. Likewise, the scalar multiplication function for $V$ gives rise to a scalar multiplication function for $U$ if and only if multiplying an element of $U$ by a scalar always gives another element of $U$.

3.8 Definition  A subset $U$ of a vector space $V$ is said to be closed under addition and scalar multiplication if
(i) $u_1 + u_2 \in U$ for all $u_1, u_2 \in U$, and
(ii) $\lambda u \in U$ for all $u \in U$ and all scalars $\lambda$.

If a subset $U$ of $V$ is closed in this sense, it is natural to ask whether $U$ is a vector space relative to the addition and scalar multiplication inherited from $V$; if it is, we say that $U$ is a subspace of $V$.

3.9 Definition  A subset $U$ of a vector space $V$ is called a subspace of $V$ if $U$ is itself a vector space relative to addition and scalar multiplication inherited from $V$.

It turns out that a subset which is closed under addition and scalar multiplication is always a subspace, provided only that it is nonempty.
In view of 3. Let x be such an element. and ˜ so. Co ri g py Comments 3.10.2 In #9 above we claimed that if V and W are vector spaces over F and T : V → W a linear transformation then the set U = { v ∈ V  T (v) = 0 } is a subspace of V . z ∈ V .10 it is no longer necessary to check all eight vector space axioms in order to prove this. but at ﬁrst sight it seems possible that 0 may fail to be in the subset U . By closure under scalar multiplication we have that 0x ∈ U . Then x. The remaining axioms are trivially proved by arguments similar to those used for axioms (i) and (ii). Thus Axiom (i) holds in U .1 It is easily seen that if V is a vector space then the set V itself is a subspace of V . Proof. Since V is a vector space we know that V has a zero element.7 gives 0x = 0 (since x is an element of V ). Thus Axiom (ii) holds. But Proposition 3. which we will denote by ‘0 .10. ˜ since U is nonempty there certainly exists at least one element in U . ˜ For Axiom (iv) we must prove that each x ∈ U has a negative in U . It is necessary only to verify that the inherited operations satisfy the vector space axioms. and the set {0} (consisting of just the zero element of V ) is also a subspace of V . y ∈ U . Sc ho ht em ol rs ive Un ati cs of M ath ity of Sy dn an ey dS tat ist ics . and −x = (−1)x (by 3. But x ∈ U gives (−1)x ∈ U (by closure under scalar multiplication). and so by Axiom (i) for V it follows that (x + y) + z = x + (y + z). Since Axiom (iv) for V guarantees that x has a negative in V and since the zero of U is the same as the zero of V . instead. so the result follows. y. it is necessarily true that 0 ∈ U .64 Chapter Three: Introduction to vector spaces 3. since if y ∈ U is arbitrary then y ∈ V and Axiom (iii) for V gives 0 + y = y. y ∈ V . Then x. Let x. It is now trivial that 0 is also ˜ ˜ a zero element for U . 3. 
In most cases the fact that a given axiom is satisﬁed in V trivially implies that the same axiom is satisﬁed in U .10 Theorem If V is a vector space and U a subset of V which is nonempty and closed under addition and scalar multiplication. z ∈ U . ˜ However. The next task is to prove that U has a zero element. it suﬃces to show that −x ∈ U . to prove merely that U is nonempty and closed under addition and scalar multiplication. Let x. and so x + y = y + x. y. it suﬃces. then U is a subspace of V . after all.7).
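Comment 3.10.2 can be illustrated numerically. For the map $T(x, y, z) = (x + y + z,\ 2x - y)$ of Example #10, solving $T(v) = \mathbf 0$ gives $y = 2x$ and $z = -3x$, so the set $U = \{\, v \mid T(v) = (0, 0) \,\}$ consists of the scalar multiples of $(1, 2, -3)$. The Python sketch below (not part of the text) checks closure on sample elements:

```python
# Closure check (a sketch, not part of the text) for the set
# U = { v in R^3 : T(v) = (0, 0) }, where T(x, y, z) = (x+y+z, 2x-y)
# as in Example #10.  Solving T(v) = 0 gives y = 2x and z = -3x.

def T(v):
    x, y, z = v
    return (x + y + z, 2 * x - y)

def in_U(v):
    return T(v) == (0, 0)

u1 = (1, 2, -3)       # t = 1  in the family (t, 2t, -3t)
u2 = (-4, -8, 12)     # t = -4

assert in_U(u1) and in_U(u2)
assert in_U(tuple(a + b for a, b in zip(u1, u2)))  # closed under addition
assert in_U(tuple(5 * a for a in u1))              # closed under scalar mult.
```

Again, a finite check is only an illustration; the general argument is the one given in the text.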
Examples

#15  Let $F = \mathbb{R}$, $V = \mathbb{R}^3$ and
$$U = \left\{ \begin{pmatrix} x \\ y \\ x + y \end{pmatrix} \,\middle|\, x, y \in \mathbb{R} \right\}.$$
Prove that $U$ is a subspace of $V$.

In view of Theorem 3.10, we must prove that $U$ is nonempty and closed under addition and scalar multiplication. It is clear that $U$ is nonempty; for instance, $\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \in U$. Let $u, v$ be arbitrary elements of $U$. Then
$$u = \begin{pmatrix} x \\ y \\ x + y \end{pmatrix}, \qquad v = \begin{pmatrix} x' \\ y' \\ x' + y' \end{pmatrix}$$
for some $x, y, x', y' \in \mathbb{R}$, and we see that
$$u + v = \begin{pmatrix} x'' \\ y'' \\ x'' + y'' \end{pmatrix} \qquad\text{(where $x'' = x + x'$ and $y'' = y + y'$),}$$
and this is an element of $U$. Hence $U$ is closed under addition. Now let $u$ be an arbitrary element of $U$, as above, and $\lambda$ an arbitrary scalar. Then
$$\lambda u = \begin{pmatrix} \lambda x \\ \lambda y \\ \lambda(x + y) \end{pmatrix} = \begin{pmatrix} \lambda x \\ \lambda y \\ \lambda x + \lambda y \end{pmatrix} \in U.$$
Thus $U$ is closed under scalar multiplication, and so $U$ is a subspace of $V$.

#16  Use the result proved in #6 and elementary calculus to prove that the set $C$ of all real valued continuous functions on the closed interval $[0, 1]$ is a vector space over $\mathbb{R}$.

By #6 the set $S$ of all real valued functions on $[0, 1]$ is a vector space over $\mathbb{R}$, so it will suffice to prove that $C$ is a subspace of $S$. By Theorem 3.10
it suffices to prove that $C$ is nonempty and closed under addition and scalar multiplication.

The zero function is clearly continuous, so $C$ is nonempty. Let $f, g \in C$ and $t \in \mathbb{R}$. For all $a \in [0, 1]$ we have
$$\begin{aligned}
\lim_{x \to a} (f + g)(x) &= \lim_{x \to a} \bigl(f(x) + g(x)\bigr) &&\text{(definition of $f + g$)} \\
&= \lim_{x \to a} f(x) + \lim_{x \to a} g(x) &&\text{(basic calculus)} \\
&= f(a) + g(a) &&\text{(since $f$, $g$ are continuous)} \\
&= (f + g)(a),
\end{aligned}$$
and similarly
$$\begin{aligned}
\lim_{x \to a} (tf)(x) &= \lim_{x \to a} t\,f(x) &&\text{(definition of $tf$)} \\
&= t \lim_{x \to a} f(x) &&\text{(basic calculus)} \\
&= t\,f(a) &&\text{(since $f$ is continuous)} \\
&= (tf)(a),
\end{aligned}$$
so that $f + g$ and $tf$ are continuous. Hence $C$ is closed under addition and scalar multiplication.

Since one of the major reasons for introducing vector spaces was to elucidate the concept of 'linear transformation', much of our time will be devoted to the study of linear transformations. Whenever $T$ is a linear transformation, it is always of interest to find those vectors $x$ such that $T(x)$ is zero.

3.11 Definition  Let $V$ and $W$ be vector spaces over a field $F$ and let $T\colon V \to W$ be a linear transformation. The subset of $V$
$$\ker T = \{\, x \in V \mid T(x) = 0_W \,\}$$
is called the kernel of $T$.

In this definition '$0_W$' denotes the zero element of the space $W$. Since the domain $V$ and codomain $W$ of the transformation $T$ may very well be different vector spaces, there are two different kinds of vectors simultaneously
under discussion, and it is important not to confuse the zero element of $V$ with the zero element of $W$, although it is normal to use the same symbol '$0$' for each. Occasionally, as here, we will append a subscript to indicate which zero vector we are talking about.

By definition the kernel of $T$ consists of those vectors in $V$ which are mapped to the zero vector of $W$. It is natural to ask whether the zero of $V$ is one of these. In fact, it always is.

3.12 Proposition  Let $V$ and $W$ be vector spaces over $F$ and let $0_V$ and $0_W$ be the zero elements of $V$ and $W$ respectively. If $T\colon V \to W$ is a linear transformation then $T(0_V) = 0_W$.

Proof.  Certainly $T$ must map $0_V$ to some element of $W$; let $w = T(0_V)$ be this element. Since $0_V + 0_V = 0_V$ we have
$$w = T(0_V) = T(0_V + 0_V) = T(0_V) + T(0_V) = w + w,$$
since $T$ preserves addition. By 3.6 it follows that $w = 0_W$.

We now prove the theorem which was foreshadowed in #9:

3.13 Theorem  If $T\colon V \to W$ is a linear transformation then $\ker T$ is a subspace of $V$.

Proof.  By 3.12 we know that $0_V \in \ker T$, and therefore $\ker T$ is nonempty. By 3.10 it remains to prove that $\ker T$ is closed under addition and scalar multiplication. Suppose that $u, v \in \ker T$ and $\lambda$ is a scalar. Then by linearity of $T$,
$$T(u + v) = T(u) + T(v) = 0_W + 0_W = 0_W$$
and
$$T(\lambda u) = \lambda T(u) = \lambda 0_W = 0_W \qquad\text{(by 3.7 above).}$$
Thus $u + v$ and $\lambda u$ are in $\ker T$, and it follows that $\ker T$ is closed under addition and scalar multiplication.
Examples

#17  The solution set of the simultaneous linear equations
$$\begin{aligned} x + y + z &= 0 \\ x - 2y - z &= 0 \end{aligned}$$
may be viewed as the kernel of the linear transformation $T\colon \mathbb{R}^3 \to \mathbb{R}^2$ given by
$$T\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & -2 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix},$$
and is therefore a subspace of $\mathbb{R}^3$.

#18  Let $D$ be the vector space of all differentiable real valued functions on $\mathbb{R}$, and let $\mathcal F$ be the vector space of all real valued functions on $\mathbb{R}$. For each $f \in D$ define $Tf \in \mathcal F$ by
$$(Tf)(x) = f'(x) + (\cos x)f(x).$$
Show that $f \mapsto Tf$ is a linear map from $D$ to $\mathcal F$, and calculate its kernel.

Let $f, g \in D$ and $\lambda, \mu \in \mathbb{R}$. Then for all $x \in \mathbb{R}$ we have
$$\begin{aligned}
\bigl(T(\lambda f + \mu g)\bigr)(x) &= (\lambda f + \mu g)'(x) + (\cos x)(\lambda f + \mu g)(x) \\
&= \frac{d}{dx}\bigl(\lambda f(x) + \mu g(x)\bigr) + (\cos x)\bigl(\lambda f(x) + \mu g(x)\bigr) \\
&= \lambda f'(x) + \lambda(\cos x)f(x) + \mu g'(x) + \mu(\cos x)g(x) \\
&= \lambda\bigl((Tf)(x)\bigr) + \mu\bigl((Tg)(x)\bigr) \\
&= \bigl(\lambda(Tf) + \mu(Tg)\bigr)(x),
\end{aligned}$$
and it follows that $T(\lambda f + \mu g) = \lambda(Tf) + \mu(Tg)$. Hence $T$ is linear.

By definition, the kernel of $T$ is the set of all functions $f \in D$ such that $Tf = 0$. Hence, finding the kernel of $T$ means solving the differential equation $f'(x) + (\cos x)f(x) = 0$. Techniques for solving differential equations are dealt with in other courses. In this case the method is to multiply through by the "integrating factor" $e^{\sin x}$, and then integrate. This gives $f(x)e^{\sin x} = C$, where $C$ is any constant. Hence the kernel consists of all functions $f$ such that $f(x) = Ce^{-\sin x}$ for some constant $C$.

Note that the sum of two functions of this form (corresponding to two constants $C_1$ and $C_2$) is again of the same form; similarly, any scalar multiple of a function of this form has the same form. Thus the kernel of $T$ is indeed a subspace of $D$, as we knew (by Theorem 3.13) that it would be.
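The kernel found in #18 is easy to check. If $f(x) = Ce^{-\sin x}$ then the chain rule gives $f'(x) = -C(\cos x)e^{-\sin x}$, so $Tf = f' + (\cos x)f$ vanishes identically. A Python sketch (not part of the text) that confirms this at sample points, and also cross-checks the analytic derivative against a central difference:

```python
import math

# Check (a sketch, not part of the text) that f(x) = C*exp(-sin x)
# lies in the kernel of T from Example #18: f'(x) + cos(x)*f(x) = 0.

C = 2.0

def f(x):
    return C * math.exp(-math.sin(x))

def fprime(x):
    # derivative of f by the chain rule
    return -C * math.cos(x) * math.exp(-math.sin(x))

for x in [0.0, 0.7, 1.9, -2.3]:
    # (Tf)(x) = f'(x) + cos(x) f(x) should vanish
    assert abs(fprime(x) + math.cos(x) * f(x)) < 1e-12
    # cross-check the analytic derivative numerically
    h = 1e-6
    central = (f(x + h) - f(x - h)) / (2 * h)
    assert abs(central - fprime(x)) < 1e-5
```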
Theorem 3.13 says that the kernel of a linear transformation $T$ is a subspace of the domain of $T$. Our next result is, in a sense, dual to 3.13, and says that the image of a linear transformation is a subspace of the codomain. Recall (see §0c) that the image of $T$ is the set $\operatorname{im} T = \{\, T(v) \mid v \in V \,\}$.

3.14 Theorem  If $T\colon V \to W$ is a linear transformation then the image of $T$ is a subspace of $W$.

Proof.  We have by 3.12 that $T(0_V) = 0_W$, and hence $0_W \in \operatorname{im} T$. So $\operatorname{im} T \ne \emptyset$. Let $x, y \in \operatorname{im} T$ and let $\lambda$ be a scalar. By definition of $\operatorname{im} T$ there exist $u, v \in V$ with $T(u) = x$ and $T(v) = y$, and now linearity of $T$ gives
$$x + y = T(u) + T(v) = T(u + v) \qquad\text{and}\qquad \lambda x = \lambda T(u) = T(\lambda u),$$
so that $x + y$ and $\lambda x$ are in $\operatorname{im} T$. Hence $\operatorname{im} T$ is closed under addition and scalar multiplication.

Example

#19  Let $T\colon \mathbb{R}^2 \to \mathbb{R}^3$ be defined by
$$T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & -3 \\ 4 & -10 \\ -2 & 9 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$
By #11 we know that $T$ is a linear transformation, and so its image is a subspace of $\mathbb{R}^3$. The image in fact consists of all triples $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$ such that there exist $x, y \in \mathbb{R}$ with
$$\begin{aligned} x - 3y &= a \\ 4x - 10y &= b \\ -2x + 9y &= c. \end{aligned} \tag{$*$}$$
It turns out that this is the same as the set
$$\left\{ \begin{pmatrix} a \\ b \\ c \end{pmatrix} \,\middle|\, 16a - 3b + 2c = 0 \right\}.$$
One way to prove this is to form the augmented matrix corresponding to the system ($*$) and apply row operations to obtain an echelon matrix. The result is
$$\begin{pmatrix} 1 & -3 & a \\ 0 & 2 & b - 4a \\ 0 & 0 & 16a - 3b + 2c \end{pmatrix}.$$
Thus the equations are consistent if and only if $16a - 3b + 2c = 0$, as required.

Our final result for this section gives a useful criterion for determining whether a linear transformation is injective.

3.15 Proposition  A linear transformation is injective if and only if its kernel is the zero subspace, $\{0\}$.

Proof.  Let $\theta\colon V \to W$ be a linear transformation. We have proved in 3.13 that the kernel of $\theta$ is a subspace of $V$; hence it contains the zero of $V$. (This was proved explicitly in the proof of 3.13.) Here we will use the same notation, '$0$', for the zero elements of $V$ and $W$; this should cause no confusion.

Assume first that $\theta$ is injective. Let $v$ be an arbitrary element of $\ker\theta$. Then we have
$$\theta(v) = 0 = \theta(0),$$
and since $\theta$ is injective it follows that $v = 0$. Hence $0$ is the only element of $\ker\theta$, and so $\ker\theta = \{0\}$.

For the converse, assume that $\ker\theta = \{0\}$. We seek to prove that $\theta$ is injective, so assume that $u, v \in V$ with $\theta(u) = \theta(v)$. By linearity of $\theta$ we obtain
$$\theta(u - v) = \theta(u) + \theta(-v) = \theta(u) - \theta(v) = 0,$$
so that $u - v \in \ker\theta$. Hence $u - v = 0$, and $u = v$, as claimed. Hence $\theta$ is injective.
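Returning to Example #19: every vector in the image of $T$ should satisfy $16a - 3b + 2c = 0$, and this is easy to confirm on samples. A Python sketch (not part of the text):

```python
# Check (a sketch, not part of the text) that every vector in the
# image of the map T of Example #19 satisfies 16a - 3b + 2c = 0.

def T(x, y):
    # multiply (x, y) by the 3 x 2 matrix of Example #19
    return (x - 3 * y, 4 * x - 10 * y, -2 * x + 9 * y)

for (x, y) in [(1, 0), (0, 1), (3, -2), (7, 5), (-4, 11)]:
    a, b, c = T(x, y)
    assert 16 * a - 3 * b + 2 * c == 0
```

In particular the two columns of the matrix, $T(1, 0)$ and $T(0, 1)$, satisfy the condition, and by linearity so does every combination of them.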
For completeness we state the analogous result for surjective linear transformations, although it is immediate from the definitions concerned.

3.16 Proposition  A linear transformation is surjective if and only if its image is the whole codomain.

§3e  Linear combinations

Let $v_1, v_2, \ldots, v_n$ and $w$ be vectors in some space $V$. We say that $w$ is a linear combination of $v_1, v_2, \ldots, v_n$ if
$$w = \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n$$
for some scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$. This concept is, in a sense, the fundamental concept of vector space theory, since it is made up of addition and scalar multiplication.

3.17 Definition  The vectors $v_1, v_2, \ldots, v_n$ are said to span $V$ if every element $w \in V$ can be expressed as a linear combination of the $v_i$.

For example, the set $S$ of all triples $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ such that $x + y + z = 0$ is a subspace of $\mathbb{R}^3$, and it is easily checked that the triples
$$u = \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}, \qquad v = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \qquad\text{and}\qquad w = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}$$
span $S$. Indeed, if $x + y + z = 0$ then
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{y - z}{3}\begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} + \frac{z - x}{3}\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} + \frac{x - y}{3}\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}.$$
In this example it can be seen that there are infinitely many ways of expressing each element of the set $S$ in terms of $u$, $v$ and $w$: since $u + v + w = 0$, if any number $\alpha$ is added to each of the coefficients of $u$, $v$ and $w$, then the answer will not be altered. Thus one can arrange for the coefficient of $u$ to be zero, and it follows that in fact $S$ is spanned by $v$ and $w$. (It is equally true that $u$ and $v$ span $S$, and that $u$ and $w$ span $S$.)

It is usual to try to find spanning sets which are minimal, so that no element of the spanning set is expressible in terms of the others. The elements of a minimal spanning set are then linearly independent, in the following sense.

3.18 Definition  We say that vectors $v_1, v_2, \ldots, v_r$ are linearly independent if the only solution of
$$\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_r v_r = 0$$
is given by $\lambda_1 = \lambda_2 = \cdots = \lambda_r = 0$. Vectors $v_1, v_2, \ldots, v_r$ are linearly dependent if they are not linearly independent; that is, if there is a solution of $\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_r v_r = 0$ for which the scalars $\lambda_i$ are not all zero.

For example, the triples $u$ and $v$ above are linearly independent, since if
$$\lambda\begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} + \mu\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
then inspection of the first two components immediately gives $\lambda = \mu = 0$.

Example

#20  Show that the functions $f$, $g$ and $h$ defined by
$$f(x) = \sin x, \qquad g(x) = \sin\bigl(x + \tfrac{\pi}{4}\bigr), \qquad h(x) = \sin\bigl(x + \tfrac{\pi}{2}\bigr) \qquad\text{for all } x \in \mathbb{R}$$
are linearly dependent over $\mathbb{R}$.

We attempt to find $\alpha, \beta, \gamma \in \mathbb{R}$ which are not all zero and which satisfy $\alpha f + \beta g + \gamma h = 0$; in other words,
$$\alpha\sin x + \beta\sin\bigl(x + \tfrac{\pi}{4}\bigr) + \gamma\sin\bigl(x + \tfrac{\pi}{2}\bigr) = 0 \qquad\text{for all } x \in \mathbb{R}.$$
By some well known trigonometric formulae we find that
$$\sin\bigl(x + \tfrac{\pi}{4}\bigr) = \sin x\cos\tfrac{\pi}{4} + \cos x\sin\tfrac{\pi}{4} = \tfrac{1}{\sqrt 2}\sin x + \tfrac{1}{\sqrt 2}\cos x,$$
and similarly
$$\sin\bigl(x + \tfrac{\pi}{2}\bigr) = \sin x\cos\tfrac{\pi}{2} + \cos x\sin\tfrac{\pi}{2} = \cos x,$$
and so our equation becomes
$$\Bigl(\alpha + \tfrac{\beta}{\sqrt 2}\Bigr)\sin x + \Bigl(\tfrac{\beta}{\sqrt 2} + \gamma\Bigr)\cos x = 0 \qquad\text{for all } x \in \mathbb{R}.$$
Clearly $\alpha = 1$, $\beta = -\sqrt 2$ and $\gamma = 1$ solves this, since the coefficients of $\sin x$ and $\cos x$ both become zero. Hence $f$, $g$ and $h$ are linearly dependent, as claimed.

3.19 Definition  If $v_1, v_2, \ldots, v_n$ are linearly independent and span $V$ we say that they form a basis of $V$.

If a vector space $V$ has a basis consisting of $n$ vectors then a general element of $V$ can be completely described by specifying $n$ scalar parameters (the coefficients of the basis elements). The number $n$ is called the dimension; it can thus be thought of as the number of "degrees of freedom" in the space. One of our major tasks is to show that the dimension is an invariant of a vector space; that is, any two bases must have the same number of elements. We will do this in the next chapter; see also Exercise 5 below.

For example, the subspace $S$ of $\mathbb{R}^3$ described above has two degrees of freedom, since it has a basis consisting of the two elements $u$ and $v$, and so it certainly ought to be a two-dimensional space. Choosing two scalars $\lambda$ and $\mu$ determines an element $\lambda u + \mu v$ of $S$, and each element of $S$ is uniquely expressible in this form. Geometrically, $S$ represents a plane through the origin.

We have seen in #11 above that if $A \in \operatorname{Mat}(m \times n, F)$ then $T(x) = Ax$ defines a linear transformation $T\colon F^n \to F^m$. The set $S = \{\, Ax \mid x \in F^n \,\}$ is the image of $T$, and so (by 3.14) it is a subspace of $F^m$. Let $a_1, a_2, \ldots, a_n$ be the columns of $A$. Since $A$ is an $m \times n$ matrix these columns all have $m$ components; that is, $a_i \in F^m$ for each $i$. By multiplication of partitioned matrices we find that
$$A\begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \end{pmatrix} = \bigl(\, a_1 \;\; a_2 \;\; \cdots \;\; a_n \,\bigr)\begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \end{pmatrix} = \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n,$$
and we deduce that $S$ is precisely the set of all linear combinations of the columns of $A$.

3.20 Definition  If $A \in \operatorname{Mat}(m \times n, F)$, the column space, $\operatorname{CS}(A)$, of the matrix $A$ is the subspace of $F^m$ consisting of all linear combinations of the columns of $A$. The row space, $\operatorname{RS}(A)$, is the subspace of ${}^t\!F^n$ consisting of all linear combinations of the rows of $A$.

Comment 3.20.1  Observe that the system of equations $Ax = b$ is consistent if and only if $b$ is in the column space of $A$.

3.21 Proposition  Let $A \in \operatorname{Mat}(m \times n, F)$ and $B \in \operatorname{Mat}(n \times p, F)$. Then
(i) $\operatorname{RS}(AB) \subseteq \operatorname{RS}(B)$, and
(ii) $\operatorname{CS}(AB) \subseteq \operatorname{CS}(A)$.

Proof.  Let the $(i, j)$-entry of $A$ be $\alpha_{ij}$ and let the $i$th row of $B$ be $b_i$. It is a general fact that the $i$th row of $AB$ equals ($i$th row of $A$)$B$, and since the $i$th row of $A$ is $(\, \alpha_{i1} \;\; \alpha_{i2} \;\; \ldots \;\; \alpha_{in} \,)$ we have
$$i\text{th row of } AB = (\, \alpha_{i1} \;\; \alpha_{i2} \;\; \ldots \;\; \alpha_{in} \,)\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = \alpha_{i1} b_1 + \alpha_{i2} b_2 + \cdots + \alpha_{in} b_n \in \operatorname{RS}(B).$$
In words, we have shown that the $i$th row of $AB$ is a linear combination of the rows of $B$, the scalar coefficients being the entries in the $i$th row of $A$. So all rows of $AB$ are in the row space of $B$, and since the row space of $B$ is closed under addition and scalar multiplication it follows that all linear combinations of the rows of $AB$ are also in the row space of $B$. That is, $\operatorname{RS}(AB) \subseteq \operatorname{RS}(B)$. The proof of (ii) is similar and is omitted.
Example

#21  Prove that if $A$ is an $m \times n$ matrix and $B$ an $n \times p$ matrix, then the $i$th row of $AB$ is $aB$, where $a$ is the $i$th row of $A$.

Note that since $a$ is a $1 \times n$ row vector and $B$ is an $n \times p$ matrix, $aB$ is a $1 \times p$ row vector. The $j$th entry of $aB$ is $\sum_{k=1}^{n} a_k B_{kj}$, where $a_k$ is the $k$th entry of $a$. But since $a$ is the $i$th row of $A$, the $k$th entry of $a$ is in fact the $(i, k)$-entry of $A$. So we have shown that the $j$th entry of $aB$ is $\sum_{k=1}^{n} A_{ik} B_{kj}$, which is equal to $(AB)_{ij}$. Thus for all $j$, the $j$th entry of $aB$ is the same as the $j$th entry of the $i$th row of $AB$; hence $aB$ equals the $i$th row of $AB$, as required.

3.22 Corollary  Suppose that $A \in \operatorname{Mat}(n \times n, F)$ is invertible, and let $B \in \operatorname{Mat}(n \times p, F)$ be arbitrary. Then $\operatorname{RS}(AB) = \operatorname{RS}(B)$.

Proof.  By 3.21 we have $\operatorname{RS}(AB) \subseteq \operatorname{RS}(B)$. Moreover, applying 3.21 with $A^{-1}$ and $AB$ in place of $A$ and $B$ gives
$$\operatorname{RS}(B) = \operatorname{RS}\bigl(A^{-1}(AB)\bigr) \subseteq \operatorname{RS}(AB).$$
Thus we have shown that $\operatorname{RS}(B) \subseteq \operatorname{RS}(AB)$ and $\operatorname{RS}(AB) \subseteq \operatorname{RS}(B)$, as required.

From 3.22 it follows that if $A$ is invertible then the function "premultiplication by $A$", which takes $B$ to $AB$, does not change the row space. (It is trivial to prove also the column space analogue: postmultiplication by invertible matrices leaves the column space unchanged.) Since performing row operations on a matrix is equivalent to premultiplying by elementary matrices, and elementary matrices are invertible, it follows that row operations do not change the row space of a matrix. Hence the reduced echelon matrix $E$ which is the end result of the pivoting algorithm described in Chapter 2 has the same row space as the original matrix $A$. The nonzero rows of $E$ therefore span the row space of $A$. We leave it as an exercise for the reader to prove that the nonzero rows of $E$ are also linearly independent, and therefore form a basis for the row space of $A$.

Examples

#22  Find a basis for the row space of the matrix
$$\begin{pmatrix} 1 & -3 & 5 & 5 \\ -2 & 2 & 1 & -3 \\ -3 & 1 & 7 & -1 \end{pmatrix}.$$

$$\begin{pmatrix} 1 & -3 & 5 & 5 \\ -2 & 2 & 1 & -3 \\ -3 & 1 & 7 & -1 \end{pmatrix}
\xrightarrow[\;R_3 := R_3 + 3R_1\;]{\;R_2 := R_2 + 2R_1\;}
\begin{pmatrix} 1 & -3 & 5 & 5 \\ 0 & -4 & 11 & 7 \\ 0 & -8 & 22 & 14 \end{pmatrix}
\xrightarrow{\;R_3 := R_3 - 2R_2\;}
\begin{pmatrix} 1 & -3 & 5 & 5 \\ 0 & -4 & 11 & 7 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
Since row operations do not change the row space, the echelon matrix we have obtained has the same row space as the original matrix $A$. Hence the row space consists of all linear combinations of the rows $(\,1\;\;{-3}\;\;5\;\;5\,)$ and $(\,0\;\;{-4}\;\;11\;\;7\,)$, since the zero row can clearly be ignored when computing linear combinations of the rows. Furthermore, because the matrix is echelon, its nonzero rows are linearly independent. Indeed, if
$$\lambda\,(\,1\;\;{-3}\;\;5\;\;5\,) + \mu\,(\,0\;\;{-4}\;\;11\;\;7\,) = (\,0\;\;0\;\;0\;\;0\,)$$
then looking at the first entry we see immediately that $\lambda = 0$, whence (from the second entry) $\mu$ must be zero also. Hence the two nonzero rows of the echelon matrix form a basis of $\operatorname{RS}(A)$.

#23  Is the column vector $v = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ in the column space of the matrix
$$A = \begin{pmatrix} 2 & -1 & 4 \\ -1 & 3 & -1 \\ 4 & 3 & 2 \end{pmatrix}?$$

The question can be rephrased as follows: can $v$ be expressed as a linear combination of the columns of $A$? That is, do the equations
$$x\begin{pmatrix} 2 \\ -1 \\ 4 \end{pmatrix} + y\begin{pmatrix} -1 \\ 3 \\ 3 \end{pmatrix} + z\begin{pmatrix} 4 \\ -1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
have a solution? Since these equations can also be written as
$$\begin{pmatrix} 2 & -1 & 4 \\ -1 & 3 & -1 \\ 4 & 3 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},$$
our task is to determine whether or not these equations are consistent. (This is exactly the point made in 3.20.1 above.) We simply apply standard row operation techniques:
$$\begin{pmatrix} 2 & -1 & 4 & 1 \\ -1 & 3 & -1 & 1 \\ 4 & 3 & 2 & 1 \end{pmatrix}
\xrightarrow{\;R_1 \leftrightarrow R_2\;}
\begin{pmatrix} -1 & 3 & -1 & 1 \\ 2 & -1 & 4 & 1 \\ 4 & 3 & 2 & 1 \end{pmatrix}
\xrightarrow[\;R_3 := R_3 + 4R_1\;]{\;R_2 := R_2 + 2R_1\;}
\begin{pmatrix} -1 & 3 & -1 & 1 \\ 0 & 5 & 2 & 3 \\ 0 & 15 & -2 & 5 \end{pmatrix}
\xrightarrow{\;R_3 := R_3 - 3R_2\;}
\begin{pmatrix} -1 & 3 & -1 & 1 \\ 0 & 5 & 2 & 3 \\ 0 & 0 & -8 & -4 \end{pmatrix}.$$
Now the coefficient matrix has been reduced to echelon form, and there is no row for which the left hand side of the equation is zero and the right hand side nonzero. Hence the equations must be consistent. Thus $v$ is in the column space of $A$.

Although the question did not ask for the coefficients $x$, $y$ and $z$ to be calculated, let us nevertheless do so, as a check. The last equation of the echelon system gives $z = 1/2$; substituting this into the second equation gives $y = 2/5$, and then the first equation gives $x = -3/10$. It is easily checked that
$$-\tfrac{3}{10}\begin{pmatrix} 2 \\ -1 \\ 4 \end{pmatrix} + \tfrac{2}{5}\begin{pmatrix} -1 \\ 3 \\ 3 \end{pmatrix} + \tfrac{1}{2}\begin{pmatrix} 4 \\ -1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$

Exercises

1.  Prove that if $A$ is an $m \times n$ matrix over $\mathbb{R}$ then the solution set of $Av = 0$ is a subspace of $\mathbb{R}^n$.

2.  Prove that the set $T$ of all solutions of the simultaneous equations
$$\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 1 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
is a subspace of $\mathbb{R}^4$.
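The arithmetic in Examples #22 and #23 above can be replayed mechanically. The Python sketch below (not part of the text) applies the same row operations to the #22 matrix and verifies the coefficients found by back-substitution in #23, using exact rational arithmetic:

```python
from fractions import Fraction as Fr

# Replaying (as a sketch, not part of the text) the arithmetic of
# Examples #22 and #23.

def add_multiple(M, i, j, c):
    # row operation R_i := R_i + c * R_j
    M[i] = [a + c * b for a, b in zip(M[i], M[j])]

# Example #22: reduce the matrix to echelon form.
M = [[1, -3, 5, 5],
     [-2, 2, 1, -3],
     [-3, 1, 7, -1]]
add_multiple(M, 1, 0, 2)    # R2 := R2 + 2 R1
add_multiple(M, 2, 0, 3)    # R3 := R3 + 3 R1
add_multiple(M, 2, 1, -2)   # R3 := R3 - 2 R2
assert M == [[1, -3, 5, 5], [0, -4, 11, 7], [0, 0, 0, 0]]

# Example #23: the coefficients x = -3/10, y = 2/5, z = 1/2 express
# (1, 1, 1) as a combination of the columns of A.
cols = [(2, -1, 4), (-1, 3, 3), (4, -1, 2)]
coeffs = [Fr(-3, 10), Fr(2, 5), Fr(1, 2)]
combo = [sum(c * col[i] for c, col in zip(coeffs, cols)) for i in range(3)]
assert combo == [1, 1, 1]
```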
3.  In each case decide whether or not the set $S$ is a vector space over the field $F$, relative to obvious operations of addition and scalar multiplication.
(i) $S = \mathbb{C}$ (complex numbers), $F = \mathbb{R}$.
(ii) $S = \mathbb{R}$, $F = \mathbb{C}$.
(iii) $S = \mathbb{R}[X]$ (polynomials over $\mathbb{R}$ in the variable $X$, that is, expressions of the form $a_0 + a_1X + \cdots + a_nX^n$ with $a_i \in \mathbb{R}$), $F = \mathbb{R}$.
(iv) $S = \operatorname{Mat}(n, \mathbb{C})$ ($n \times n$ matrices over $\mathbb{C}$), $F = \mathbb{R}$.
(v) $S = \mathbb{C}$, $F = \mathbb{Q}$ (rational numbers).

4.  Prove that the nonzero rows of an echelon matrix are necessarily linearly independent.

5.  Let $v_1, v_2, \ldots, v_m$ and $w_1, w_2, \ldots, w_n$ be two bases of a vector space $V$. Let $A$ be the $m \times n$ coefficient matrix obtained when the $v_i$ are expressed as linear combinations of the $w_j$, and let $B$ be the $n \times m$ coefficient matrix obtained when the $w_j$ are expressed as linear combinations of the $v_i$. Combine these equations to express the $v_i$ as linear combinations of themselves, and then use linear independence of the $v_i$ to deduce that $AB = I$. Similarly, show that $BA = I$, and use 2.10 to deduce that $m = n$.

6.  Let $A$ be an $n \times n$ matrix over $\mathbb{R}$, and $\lambda$ any eigenvalue of $A$. Prove that if $S$ is the subset of $\mathbb{R}^n$ which consists of the zero column and all eigenvectors associated to the eigenvalue $\lambda$, then $S$ is a subspace of $\mathbb{R}^n$.

7.  Which of the following functions are linear transformations?
(i) $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$.
(ii) $S\colon \mathbb{R}^2 \to \mathbb{R}^2$ defined by $S\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$.
(iii) $g\colon \mathbb{R}^2 \to \mathbb{R}^3$ defined by $g\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2x + y \\ y \\ x - y \end{pmatrix}$.
(iv) $f\colon \mathbb{R} \to \mathbb{R}^2$ defined by $f(x) = \begin{pmatrix} x \\ x + 1 \end{pmatrix}$.

8.  Let $V$ be a vector space and $S$ any set. Show that the set of all functions from $S$ to $V$ can be made into a vector space in a natural way.
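A quick numerical probe can rule candidates in Exercise 7 in or out, though it is not a substitute for the proofs asked for. The sketch below (not part of the text) tests additivity for part (i) and the value at $0$ for part (iv), taking part (iv) to be $f(x) = (x, x + 1)$ as in the exercise:

```python
# A numerical probe (not part of the text, and not a proof) for
# Exercise 7: any linear map must satisfy T(0) = 0 and
# T(u + v) = T(u) + T(v).

def T(v):                      # part (i): multiply by [[1, 2], [2, 1]]
    x, y = v
    return (x + 2 * y, 2 * x + y)

def f(x):                      # part (iv): f(x) = (x, x + 1)
    return (x, x + 1)

# (i) passes the probes (and is in fact linear, being x -> Ax as in #11)
assert T((0, 0)) == (0, 0)
u, v = (1, -2), (3, 5)
assert T((u[0] + v[0], u[1] + v[1])) == tuple(a + b for a, b in zip(T(u), T(v)))

# (iv) fails already at zero: f(0) = (0, 1) != (0, 0), so f is not linear
assert f(0) != (0, 0)
```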
9.  Let $V$ be a vector space and let $S$ and $T$ be subspaces of $V$.
(i) Prove that $S \cap T$ is a subspace of $V$.
(ii) Let $S + T = \{\, x + y \mid x \in S \text{ and } y \in T \,\}$. Prove that $S + T$ is a subspace of $V$.

10.  Let $V$, $W$ be vector spaces over a field $F$.
(i) Prove that if $\varphi$ and $\psi$ are linear transformations from $V$ to $W$ then $\varphi + \psi\colon V \to W$ defined by
$$(\varphi + \psi)(v) = \varphi(v) + \psi(v) \qquad\text{for all } v \in V$$
is also a linear transformation.
(ii) Prove that if $\varphi\colon V \to W$ is a linear transformation and $\alpha \in F$ then $\alpha\varphi\colon V \to W$ defined by
$$(\alpha\varphi)(v) = \alpha\,\varphi(v) \qquad\text{for all } v \in V$$
is also a linear transformation.

11.  (i) Let $U$, $V$, $W$ be vector spaces over a field $F$, and let $f\colon V \to W$, $g\colon U \to V$ be linear transformations. Prove that $fg\colon U \to W$ defined by $(fg)(u) = f\bigl(g(u)\bigr)$ for all $u \in U$ is a linear transformation.
(ii) Let $U = \mathbb{R}^2$, $V = \mathbb{R}^3$, $W = \mathbb{R}^5$ and suppose that $f$, $g$ are defined by
$$f\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 2 \\ -1 & -1 & 1 \\ 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix}, \qquad
g\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix}.$$
Give a formula for $(fg)\begin{pmatrix} a \\ b \end{pmatrix}$.
v ∈ V } of M ath ity em  u ∈ S. Co ri g py Sc (i ) Prove that if addition and scalar multiplication are deﬁned in the obvious way then becomes a vector space. 1 a 0 b (Hint: the formulae f = and f = deﬁne 0 c 1 d the coeﬃcients of A.) (iii ) Can these results be generalized to deal with linear transformations from F n to F m for arbitrary n and m? ho ht u v u v ol rs ive Un ati =u+v V2 ={  u.80 Chapter Three: Introduction to vector spaces 12. Let V be a vector space and let S and T be subspaces of V . c. v ∈ T } of Sy dn ey tat ist ics cs an dS . 13. b. Calculate ker f and im f . 14.) (ii ) Prove that if f is any linear transformation from F 2 to F 2 then a b there exists a matrix A = (where a. Prove that for any a ∈ F the function f : F → F given by f (x) = ax for all x ∈ F is a linear transformation. d ∈ F ) such c d that f (x) = Ax for all x ∈ F 2 . Prove that ˙ S+T ={ is a subspace of V 2 . Prove that the set U of all functions f : R → R which satisfy the diﬀerential equation f (t) − f (t) = 0 is a vector subspace of the vector space of all realvalued functions on R. ˙ (ii ) Prove that f : S + T → V deﬁned by f u v is a linear transformation. Prove also that any linear transformation from F to F has this form. (Note that this implicitly uses the fact that F is a vector space over itself. (i ) Let F be any ﬁeld.
. v2 . . . . It is convenient to formulate most of the results in this chapter in terms of sequences of vectors. .1 Definition Let (v1 . vn ) if there n exist scalars λi such that w = i=1 λi vi . . . . . v2 . . unless otherwise stated. . an . v2 . 4. λn ∈ F ) Sc ho ol of Sy dn ey dS tat ist ics ati n cs λi vi  λi ∈ F }. vn ) be a sequence of vectors in V . vn ) is called the span of (v1 . in it we investigate bases in arbitrary vector spaces. . . (ii) The subset of V consisting of all linear combinations of (v1 . . v2 . λ2 . v2 . §4a Preliminary lemmas If n is a nonnegative integer and for each positive integer i ≤ n we are given a vector vi ∈ V then we will say that (v1 . . . and we start by restating the deﬁnitions from §3e in terms of sequences. vn ) = { i=1 If Span(v1 . . . . . vn ) is said to span V . . 81 (λ1 . vn ): Span(v1 . . . . and use them to relate abstract vector spaces to spaces of ntuples. . (iii) The sequence (v1 . v2 . . . .4 The structure of abstract vector spaces Co rig py ht rs ive Un M ath ity em of This chapter probably constitutes the hardest theoretical part of the course. . v2 . . vn ) = V then (v1 . Throughout the chapter. . . . . . . v2 . vn ) is a sequence of vectors. V will be a vector space over the ﬁeld F . v2 . vn ) is said to be linearly independent if the only solution of λ1 v1 + λ2 v2 + · · · + λn vn = 0 ˜ is given by λ1 = λ2 = · · · = λn = 0. (i) An element w ∈ V is a linear combination of (v1 . . . . .
v2 . vn ).1.6 (This is a rather technical point. . so it is a basis for {0}. . 4.1.2 It is not diﬃcult to prove that Span(v1 .5 Empty sums are always given the value zero. and even for the sequence which has no terms. vn ) is said to be a basis of V if it is linearly independent and spans V . . .1. v). v2 . ˜ Furthermore. Suppose that v is a nonzero vector. (The word ‘generate’ is often used as a synonym for ‘span’. v2 . . A deﬁnition of linear independence for sets of vectors would encounter the Sc ho ht em ol rs ive Un ati cs of M ath ity of Sy dn an ey dS tat ist ics . e2 .1. v2 . (v. . The equation λ1 v + λ2 v = 0 has a nonzero solution—namely. v2 .˜ ˜ 4. . and consider the twoterm sequence (v. en ) is a basis of F n .1 The sequence of vectors (v1 . . vn . vn ) of vectors in V such that V = Span(v1 . . vn ) is always a subspace of V . The sequence (0) is certainly not linearly independent. vn )’ is not meant to imply that n ≥ 2. . since λ = 1 is a nonzero solution ˜ of λ0 = 0. . . . . .) We say that the vector space V is ﬁnitely generated if there exists a ﬁnite sequence (v1 .1. it is commonly called the subspace generated by v1 . ˜ 4. λ2 = −1. Thus the sequence (v) is linearly independent if and only if v = 0. The reason for our choice can be illustrated with a simple example. . . . . . if v = 0 then by 3. ˜ Thus. v2 . It is easily proved that (e1 . thus Span(v) = { λv  λ ∈ F }. . and indeed we want all our statements about sequences of vectors to be valid for oneterm sequences. .4 The notation ‘(v1 . . that is. . v2 . . by our deﬁnitions. Co ri g py 4. vn ) is said to be linearly dependent if it is not linearly independent. . . the empty sequence is considered to be linearly independent. vn }. vn ) rather than a set of vectors {v1 . so it seems reasonable that an empty linear combination should give the zero vector. . We will call this basis the standard basis of F n .82 Chapter Four: The structure of abstract vector spaces (iv) The sequence (v1 . 
and a linear combination of (v) is just a scalar multiple of v. . .3 Let V = F n and deﬁne ei ∈ V to be the column with 1 as its ith entry and all other entries zero. v2 .) It may seem a little strange that the above deﬁnitions have been phrased in terms of a sequence of vectors (v1 . . if there is a solution of n i=1 λi vi = 0 for which at least one of the λi is nonzero. v) is a linearly dependent sequence of vectors. .1. . . λ1 = 1. . . ˜ 4.7 we know that the only solution of ˜ ˜ ˜ λv = 0 is λ = 0. in which n = 2. However. . Each v ∈ V gives rise to a oneterm sequence (v). v2 . Accordingly we adopt the convention that the empty sequence generates the subspace {0}. . . Comments 4. .
. The point is that sets A and B are equal if and only if each element of A is an element of B and vice versa.2. v2 . . v2 . . .1 By ‘(v1 . .8 One consequence of our terminology is that the order in which elements of a basis are listed is important. . n must be at least 1. . v2 . . Write S = Span(v1 . . . .Chapter Four: The structure of abstract vector spaces problem that {v. . um ) are equal if and only if m = n and ui = vi for each i. and if a sequence spans V then so does any rearrangement. vj+1 . . v2 . u2 . Sc ho ht em ol rs ive Un ati cs of M ath ity of Sy dn an ey dS tat ist ics . v2 . .1. v2 . vj−1 . . . . . it is the concept of an ordered basis which is appropriate. 4. . . . . vj−1 . u2 . vn ∈ V .6. Comment 4. . v3 ) are not equal. However. Our terminology is a little at odds with the mathematical world at large in this regard: what we are calling a basis most authors call an ordered basis. then a vector v is a linear combination of (v1 . . indeed. . . . .2 Lemma Suppose that v1 . vn ). . . vn ) and (u1 . . . vn ) is linearly independent. . . um }. then so also is (v1 . . . u2 . It is true that if a sequence of vectors is linearly independent then so is any rearrangement of that sequence. . . . . the concept of ‘linear combination’ could have been deﬁned perfectly satisfactorily for sets. . Thus a rearrangement of a basis is also a basis—but a diﬀerent basis. so that there is a term to delete. . for our discussion of matrices and linear transformations in Chapter 6. However. v} = {v}: we would be forced to say that {v. . v} is linearly independent. vn ) and (u1 . on the other hand.1. . . . 4. vn ) and S = Span(v1 . and let 1 ≤ j ≤ n. .7 Despite the remarks just made in 4. and writing an element down again does not change the set. . vn } = {u1 . vn )’ we mean the sequence which is obtained from (v1 . . If v1 = v2 then the sequences (v1 . . . 
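When V = Qᵐ the definitions above are effectively computable: w is a linear combination of (v1, . . . , vn) precisely when the linear system whose columns are the vi, with right-hand side w, is consistent. A rough sketch using exact rational arithmetic (the helper name is ours, not the text's):

```python
from fractions import Fraction

def is_linear_combination(w, vs):
    """Decide whether w lies in Span(vs): build the augmented matrix whose
    columns are the v's followed by w, row-reduce, and test consistency."""
    m, n = len(w), len(vs)
    M = [[Fraction(v[i]) for v in vs] + [Fraction(w[i])] for i in range(m)]
    r = 0
    for c in range(n):  # row-reduce the coefficient columns only
        piv = next((i for i in range(r, m) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    # inconsistent iff some row is zero in the coefficient part but nonzero at the end
    return all(any(M[i][c] != 0 for c in range(n)) or M[i][n] == 0
               for i in range(m))

print(is_linear_combination((1, 1, 0), [(1, 0, 0), (0, 1, 0)]))  # True
print(is_linear_combination((0, 0, 1), [(1, 0, 0), (0, 1, 0)]))  # False
```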
Proof of 4.2.  (i) If v ∈ S′ then for some scalars λi,
v = λ1v1 + · · · + λj−1vj−1 + λj+1vj+1 + · · · + λnvn
  = λ1v1 + · · · + λj−1vj−1 + 0vj + λj+1vj+1 + · · · + λnvn ∈ S.
This shows that S′ ⊆ S.
(ii) Assume that vj ∈ S′. This gives
vj = α1v1 + · · · + αj−1vj−1 + αj+1vj+1 + · · · + αnvn
for some scalars αi. Now let v be an arbitrary element of S. Then for some λi we have
v = λ1v1 + λ2v2 + · · · + λnvn
  = λ1v1 + · · · + λj−1vj−1 + λj(α1v1 + · · · + αj−1vj−1 + αj+1vj+1 + · · · + αnvn) + λj+1vj+1 + · · · + λnvn
  = (λ1 + λjα1)v1 + · · · + (λj−1 + λjαj−1)vj−1 + (λj+1 + λjαj+1)vj+1 + · · · + (λn + λjαn)vn ∈ S′.
Thus every element of S is in S′; that is, S ⊆ S′. Since the reverse inclusion was proved in (i), it follows that S′ = S.
(iii) Assume that (v1, v2, . . . , vn) is linearly independent, and suppose that
($)  λ1v1 + · · · + λj−1vj−1 + λj+1vj+1 + · · · + λnvn = 0,
where the coefficients λi are elements of F. Then
λ1v1 + · · · + λj−1vj−1 + 0vj + λj+1vj+1 + · · · + λnvn = 0,
and, by linear independence of (v1, v2, . . . , vn), all the coefficients in this equation must be zero. Hence λ1 = · · · = λj−1 = λj+1 = · · · = λn = 0. Since we have shown that this is the only solution of ($), it follows that (v1, . . . , vj−1, vj+1, . . . , vn) is linearly independent.
A subsequence of a sequence is a sequence obtained by deleting terms from the original. (This is meant to include the possibility that the number of terms deleted is zero; thus a sequence is considered to be a subsequence of itself.) Repeated application of 4.2 (iii) gives:

4.3 Corollary  Any subsequence of a linearly independent sequence is linearly independent.

4.4 Lemma  Suppose that v1, . . . , vr ∈ V are such that the sequence (v1, . . . , vr) is linearly dependent, but (v1, . . . , vr−1) is linearly independent. Then vr ∈ Span(v1, . . . , vr−1).

Proof.  Since (v1, . . . , vr) is linearly dependent there exist coefficients λ1, λ2, . . . , λr which are not all zero and which satisfy
(∗∗)  λ1v1 + λ2v2 + · · · + λrvr = 0.
If λr = 0 then λ1, λ2, . . . , λr−1 are not all zero, and (∗∗) gives
λ1v1 + λ2v2 + · · · + λr−1vr−1 = 0.
But this contradicts the assumption that (v1, . . . , vr−1) is linearly independent. Hence λr ≠ 0, and rearranging (∗∗) gives
vr = −λr⁻¹(λ1v1 + λ2v2 + · · · + λr−1vr−1) ∈ Span(v1, . . . , vr−1).

§4b Basis theorems

This section consists of a list of the main facts about bases which every student should know, collected together for ease of reference. The proofs will be given in the next section.

4.5 Theorem  Every finitely generated vector space has a basis.

In a similar vein to 4.5, and only slightly harder, we have:

4.6 Proposition  If (w1, w2, . . . , wm) and (v1, v2, . . . , vn) are both bases of V then m = n.

As we have already remarked, if V has a basis then the number of elements in any basis of V is called the dimension of V, denoted ‘dim V’.

4.7 Proposition  Suppose that (v1, v2, . . . , vn) spans V, and that no proper subsequence of (v1, v2, . . . , vn) spans V. Then (v1, v2, . . . , vn) is a basis of V.

Dual to 4.7 we have:

4.8 Proposition  Suppose that (v1, v2, . . . , vn) is a linearly independent sequence in V which is not a proper subsequence of any other linearly independent sequence. Then (v1, v2, . . . , vn) is a basis of V.

4.9 Proposition  If a sequence (v1, v2, . . . , vn) spans V then some subsequence of (v1, v2, . . . , vn) is a basis of V.

Again there is a dual result: any linearly independent sequence extends to a basis. However, to avoid the complications of infinite dimensional spaces, we insert the proviso that the space is finitely generated.

4.10 Proposition  If (v1, v2, . . . , vn) is a linearly independent sequence in a finitely generated vector space V then there exist vn+1, vn+2, . . . , vd ∈ V such that (v1, v2, . . . , vd) is a basis of V.

4.11 Proposition  If V is a finitely generated vector space and U is a subspace of V then U is also finitely generated, and dim U ≤ dim V. Furthermore, if U ≠ V then dim U < dim V.

4.12 Proposition  Let V be a finitely generated vector space of dimension d and s = (v1, v2, . . . , vn) a sequence of elements of V.
(i) If n < d then s does not span V.
(ii) If n > d then s is not linearly independent.
(iii) If n = d then s spans V if and only if s is linearly independent.

§4c The Replacement Lemma

We turn now to the proofs of the results stated in the previous section. In the interests of rigour, the proofs are detailed. This makes them look more complicated than they are, but for the most part the ideas are simple enough. The key fact is that the number of terms in a linearly independent sequence of vectors cannot exceed the number of terms in a spanning sequence, and the strategy of the proof is to replace terms of the spanning sequence by terms of the linearly independent sequence and keep track of what happens.
(ii) If n > d then s is not linearly independent. .86 Chapter Four: The structure of abstract vector spaces 4. .
. x1 . . . xs ) spans V . vr . . . . x1 . x2 . . . ˜ and the conclusion that some xj can be replaced by v1 without losing the spanning property. .Chapter Four: The structure of abstract vector spaces terms of the linearly independent sequence and keep track of what happens. . Since vr+1 ∈ V = Span(v1 . vr . vr . . in the case s = 0 the Lemma says that it is impossible for (v1 . . . . . . . . (Observe that this gives a contradiction if Sc ho ht em ol rs ive Un ati k=1 of M ath µk xk . . there exists a j with 1 ≤ j ≤ s such that the sequence (v1 . if a sequence s of vectors is linearly independent and another sequence t spans V then it is possible to use the vectors in s to replace an equal number of vectors in t. vr . .13. vr+1 ) to be linearly independent if (v1 . ity of Sy cs dn an ey dS tat ist ics . xs ) there exist scalars λi and µk with s vr+1 = i=1 Writing λr+1 = −1 we have λ1 v1 + · · · + λr vr + λr+1 vr+1 + µ1 x1 + · · · + µs xs = 0. .1 If r = 0 the assumption that (v1 . . v2 . . . . . . vr . . . . . since the coeﬃcient of λr+1 is nonzero. xs elements of V such that (v1 . ˜ which shows that (v1 . so the hypotheses of the Lemma cannot be satisﬁed if s = 0. . . . xs ) spans V . 4. x2 . . always preserving the spanning property. 87 Co r rig py λi vi + 4. . vr+1 ) is linearly independent and (v1 . v2 . . . . . xj−1 . vr . and v1 . x1 . s are nonnegative integers. . . . . x1 . . . Comments 4. since it is designed speciﬁcally for use in the proof of the next theorem. . vr ) spans V .13. . . .2 One conclusion of the Lemma is that there is an xj to replace. .13. vr+1 and x1 . . . x2 . vr+1 ) is linearly independent reduces to v1 = 0. . xs ) is linearly dependent. . . v2 . v2 . vr+1 . xs ) spans V . . More exactly. xj+1 . . vr+1 . The other assumption is that (x1 . That is. v2 . . . . . The statement of the lemma is rather technical. Proof of 4. Then another spanning sequence can be obtained from this by inserting vr+1 and deleting a suitable xj . . In other words. 
.13 The Replacement Lemma Suppose that r.
xj+1 . . x2 . . x2 ) . xj−1 . . . . . wm−i ) spans V . vr+1 . while Span(v1 . . . . wm ) such that (i) (i) (i) (v1 . . . . . We can now prove the key theorem. . . . . . wm ) be a sequence of elements of V which spans V . . . xj+1 . x1 . . . . v2 . . . xj−1 ) ⊆ Span(v1 . m}: (∗) There exists a subsequence (w1 . . . xj−1 ) is linearly independent. v2 . . xs ) = V by hypothesis. We can therefore ﬁnd the ﬁrst linearly dependent sequence in the list: let j be the least positive integer for which (v1 . xs ). . . xs ) = V. v2 . Then (v1 .vr+1 . . xj ) is linearly dependent. Obviously then S ⊆ V . x1 . . . . We will use induction on i to prove the following statement for all i ∈ {1. vn ) be a linearly independent sequence in V . . . . .vr+1 . . vr . . xj−1 . . . and by 4. But (by 4.2 (i)). . . w2 . 4. .2 (i)) we know that Span(v1 . . . . . vr+1 .2 above. . . So Span(v1 . . w2 . . by 4. x1 .) Now consider the sequences (v1 . vr+1 . . x1 . . x1 . . Sy Hence.13. . xs ) = Span(v1 . x1 . xj . . Let us call this last space S. .88 Chapter Four: The structure of abstract vector spaces s = 0. . and let (v1 .4. . . xj+1 . . .14 Theorem Let (w1 . . . . . . . . . xs ). . . v2 .2 (ii). . . . . . v2 . .vr+1 ) (v1 . vr+1 . . . vr+1 . . vi . v2 . x1 . . x1 . . . x1 . . .vr+1 . . x1 . by repeated application of the Replacement Lemma. w2 . . . vr+1 . . Then n ≤ m. . xj+1 . . . . x1 ) (v1 . v2 . . . Proof. . . x2 . . xj ∈ Span(v1 . . Suppose that n > m. x1 . . . . . . w1 . . v2 . . vr . xs ) Sc ho ht em ol rs ive Un ati cs of M ath ity of (by 4. . . (i) (i) (i) an dS . . justifying the remarks in 4. Co ri g py The ﬁrst of these is linearly independent. w2 . wm−i ) of (w1 . 2. v2 . . . . . . . xj−1 . . as required. . and the last is linearly dependent. . vr+1 . . (v1 . . . xs ) ⊆ S. x2 . xj−1 . . . . v2 . . . . . . . . . . dn ey tat ist ics Span(v1 . . .
. wm ) such that (v1 . . it holds in particular for i = m. Sc ho ht em (k−1) . it follows that m ≤ n. . . vn ) is linearly independent. wm−k ) spans V . . it follows from 4. wm−k+1 ). . . . For such an r we have vr = −λ−1 (λ1 v1 + · · · + λr−1 vr−1 + λr+1 vr+1 + · · · + λn vn ) r ∈ Span(v1 . vm ). . . vi without losing the spanning property. . . vm ) spans V . wm ) spans V proves (∗) in the case i = 0: in this case the subsequence involved is the whole sequence (w1 . v2 . . vk−1 . 89 Co rig py By our hypothesis there is a subsequence (w1 of (w1 . . showing that the assumption that n > m is false.Chapter Four: The structure of abstract vector spaces In other words. . v2 . . vk−1 . w1 . . . . . . and completing the proof. wm ) can be replaced by v1 . . . . . w2 . . This ˜ contradicts the fact that the sequence of vj is linearly independent. wm−k+1 ) (k−1) (k−1) ol rs ive Un ati cs (k−1) (k−1) of M ath ity of (k−1) (k−1) (k−1) Sy dn an ey dS tat ist ics . . . . . Suppose that the sequence (v1 . . . Proof of 4. v2 . . . wm−k ) being obtained by deleting one term of (w1 . v2 . . . . vk . (w1 . . . . w2 . v2 . w2 . . v2 . . . w2 . wm ) is linearly independent. . . w2 . . Hence (∗) holds for i = k. vn ) spans and (w1 . v2 . . . . . . w2 . . . vr+1 . Repeated application of 4. wm ) spans and (v1 . giving a solution of λ1 v1 + λ2 v2 + · · · + λm+1 vm+1 = 0 with λm+1 = −1 = 0. . v2 . w1 . . . and this will complete our induction. w2 . Then there exist λi ∈ F with λ1 v1 + λ2 v2 + · · · + λn vn = 0 ˜ and λr = 0 for some r. so since (v1 . . vr−1 . as required. Proof of 4. . what this says is that i of the terms of (w1 . Now assume that 1 ≤ k ≤ m and that (∗) holds when i = k − 1. w2 (k−1) . .6. Since (∗) has been proved for all i with 0 ≤ i ≤ m. v2 . Since (w1 . .2 (iii) shows that any subsequence of a linearly independent sequence is linearly independent. . so vm+1 ∈ Span(v1 . Now we can apply the Replacement Lemma to conclude (k) (k) (k) (k) (k) (k) that (v1 . . Dually. 
and in this case it says that (v1 . since (v1 . . . . wm−k+1 ) spans V . . . w2 . . vn ) is linearly independent and n > m ≥ k it follows that (v1 . . . . . wm ) and there is no replacement of terms at all. . . . . Since m + 1 ≤ n we have that vm+1 exists. vn ) is linearly dependent. . . The fact that (w1 . . . . . . . . . It is now straightforward to deal with the theorems and propositions stated in the previous section. w2 .7. . . vn ). .14 that n ≤ m. . . vk ) is linearly independent. w2 . We will prove that (∗) holds for i = k. .
v2 . .10. Proof of 4. v2 . v2 . v) is not linearly independent. . . . . . let S be the set of all subsequences of (v1 . vn . . and. v2 . vn ). vn ) spans V . . . namely (v1 . v2 . vn ) which span V .90 Chapter Four: The structure of abstract vector spaces and. Sc ho ht em ol rs ive Un ati cs of M ath ity of Sy dn an ey dS tat ist ics . since it is also linearly independent. . vn . .5 follows immediately from 4. v2 . . . . v2 . do so. vn ). Repeat this process for as long as possible. . vn ). . . . Since V is ﬁnitely generated there is a ﬁnite sequence (w1 . . vn ). v ∈ Span(v1 .4. . .14 it follows that a linearly independent sequence in V can have at most m elements. . . . . . . . . . vd ) which cannot be extended. . . . and. . . . . vn ) spans V whilst proper subsequences of it do not. . Hence. and eventually (in at most n steps) we will obtain a sequence (u1 . v2 . vn ). . vr−1 . . . . . v2 . and which therefore spans V . By 4. wm ) which spans V . . . If S is not equal to V then it must be a proper subset of V . Thus at some stage we obtain a linearly independent sequence (v1 . v2 .4 would give vn+1 ∈ Span(v1 . . . vn+1 ) were linearly dependent 4. So we have obtained a longer linearly independent sequence. . it is therefore a basis. Observe that the set S has at least one element. . . Proof of 4.9. . Let v ∈ V . us ) in S with the property that no proper subsequence of (u1 . . vn ). . vn . . . w2 . . . . Span(v1 .7 above. u2 . and if it is possible to delete a term and still be left with a sequence which spans V . . is therefore a basis. . . .2 (ii). . Co ri g py Proof of 4. . . vn ) is linearly independent. v2 . By our hypotheses the sequence (v1 . Since the reverse inclusion is obvious we deduce that (v1 . . . vn ) = Span(v1 . . u2 . . since it also spans. v2 . . . a contradiction. / If the sequence (v1 . Let S = Span(v1 . Hence the process described in the above paragraph cannot continue indeﬁnitely. Repeat this for as long as possible. us ) is in S. 
and we may choose vn+1 ∈ V with vn+1 ∈ S. v2 . and by 4. Start with this sequence.8. . .9. v2 . . Given that V = Span(v1 . v2 . . . such a sequence is necessarily a basis. . . vn . . Since this holds for all v ∈ V it follows that V ⊆ Span(v1 . . vn+1 ) and increase the length again. . by 4. vr+1 . and if this also fails to span V we may choose vn+2 ∈ V which is not in Span(v1 . vn ) is. This contradicts the assumption that (v1 . vn ) itself. . . . Observe that Theorem 4. but the sequence (v1 . Hence (v1 . . . . . by 4.
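For V = Qᵈ these criteria can be tested mechanically: by 4.12, d vectors in Qᵈ form a basis exactly when they are linearly independent, and independence is detected by a rank computation. A sketch in exact rational arithmetic (the sample vectors happen to be those of the worked example in §4e below):

```python
from fractions import Fraction

def rank(rows):
    """Row-reduce a list of vectors over Q and return the rank."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Three vectors in Q^3: independent iff rank 3, and then (by 4.12(iii))
# they automatically span, so they form a basis.
v = [(1, 0, 1), (2, 1, 1), (1, 1, 1)]
print(rank(v))  # 3
```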
v2 . Conversely. . λn ∈ F with 91 Co rig py Sc (♥) v = λ1 v1 + λ2 v2 + · · · + λn vn . . and linear independence of the vi gives λi − λi = 0 for each i. however. . suppose that (♥) has at most one solution for each v ∈ V . (v1 . . it is useful from time to time: 4. vn ) is linearly independent if and only if (♥) has at most one solution for each v ∈ V . . . . . Let (v1 . . Thus λi ˜ λi . n) are such that for some v ∈ V . it is natural to investigate their relationships with bases. This section contains two theorems in this direction. . . .12 are left as exercises. .15 Proposition Let V be a vector space over the ﬁeld F . vn ) spans V if and only if for each v ∈ V the equation (♥) has a solution for the scalars λi . .11 and 4. . . . v2 . v2 . and suppose that λi . . 2. . v2 . . wn be arbitrary elements . λi ∈ F (i = 1. Collecting terms gives i=1 (λi − λi )vi = 0. . Proof. . . v2 . vn ) is linearly independent. and so λi = 0 is ˜ the unique solution to this. w2 . . 4. . . vn ) is linearly independent. . . vn ) is a basis for V if and only if for each v ∈ V there exist unique λ1 .16 Theorem Let V and W be vector spaces over the same ﬁeld F . . . . ho n i=1 ht em ol rs ive Un ati n λi vi = i=1 of M ath n ity of Sy λi vi = v cs dn an dS ey ist ics tat §4d Two properties of linear transformations Since linear transformations are of central importance in this subject. A sequence (v1 . . Hence it will be suﬃcient to prove that (v1 . .Chapter Four: The structure of abstract vector spaces The proofs of 4. . λ2 . and it = follows that (♥) cannot have more than one solution. vn ) be a basis for V and let w1 . That is. Clearly (v1 . Suppose that (v1 . Our ﬁnal result in this section is little more than a rephrasing of the deﬁnition of a basis. . n Then in particular i=1 λi vi = 0 has at most one solution. . v2 .
. vn ) n spans V there exist λi ∈ F with v = i=1 λi vi . as required.92 Chapter Four: The structure of abstract vector spaces of W . β ∈ F . αu + βv = i=1 (αλi + βµi )vi . Let v ∈ V . and write v = i=1 λi vi as above. n. v ∈ V and α. . Let v ∈ V . . . . .15 the scalars λi are uniquely determined by v. This gives us a well deﬁned rule for obtaining an element of W for each element of V . Proof. . that is. and therefore i=1 λi wi is a uniquely determined element of W . Then there is a unique linear transformation θ: V → W such that θ(vi ) = wi for i = 1. v2 . assume that θ and ϕ are both linear transformations from V to W and that θ(vi ) = ϕ(vi ) = wi for all i. 2. We prove ﬁrst that there is at most one such linear transformation. and we ﬁnd Co ri g py Sc n n ho λi vi λi vi θ ht ol of M n i=1 rs ive Un θ(v) = θ i=1 n ath = i=1 n λi θ(vi ) λi ϕ(vi ) i=1 (by linearity of θ) ity em = =ϕ (since θ(vi ) = ϕ(vi )) of ati Sy (by linearity of ϕ) cs an dn ey i=1 dS = ϕ(v). To do this. By n 4. Since (v1 . Let u = Then n n i=1 λi vi and v = n i=1 µi vi . We must now prove the existence of a linear transformation with the n required properties. . Thus θ = ϕ. . a function from V to W . Let u. Thus there is a function θ: V → W satisfying tat ist ics n λi vi = i=1 λi wi for all λi ∈ F . and it remains for us to prove that it is linear.
. where n is the dimension of the space. ht em ol rs ive Un ati cs of M ath ity of Sy dn an ey dS tat ist ics . .1 Theorem 4. . Sc ho λi wi + β i=1 µi wi = αθ(u) + βθ(v). essentially the the same as F n . θ(v2 ). . Comment 4. . the proof of which is left as an exercise. (ii) θ is surjective if and only if (θ(v1 ).17. Our second theorem of this section.16. . .17 that (θ(v1 ). . once its values on the basis elements have been speciﬁed its values everywhere else are uniquely determined. Hence the existence of a bijective linear transformation from one space to another guarantees that the spaces must have the same dimension. θ(vn ) is a basis of W . v2 . . Comment 4. . Then (i) θ is injective if and only if (θ(v1 ). θ(vn )) is linearly independent. examines what injective and surjective linear transformations do to a basis of a space. after all. .16 says that a linearly transformation can be deﬁned in an arbitrary fashion on a basis. §4e Coordinates relative to a basis In this ﬁnal section of this chapter we show that choosing a basis in a vector space enables one to coordinatize the space. vn ) be a basis of a vector space V and let θ: V → W be a linear transformation. . Thus it turns out that an arbitrary ndimensional vector space over the ﬁeld F is. v2 . . θ(v2 ). in the sense that each element of the space is associated with a uniquely determined ntuple.1 If θ: V → W is bijective and (v1 . 4. θ(v2 ). θ(vn )) spans W .Chapter Four: The structure of abstract vector spaces and the deﬁnition of θ gives n 93 θ(αu + βv) = (αλi + βµi )wi n Co =α i=1 n rig py i=1 Hence θ is linear. . vn ) is a basis of V then it follows from 4. . . .17 Theorem Let (v1 . . moreover. . .
v2 . . Since T (α) = i=1 αi vi (where αi is the ith entry . vn ) is linearly independent. . . . Co ri g py αn is a linear transformation. βi be the ith entries of α. Proof. hence T is surjective if and only if (v1 . . . = T . β respectively. . vn ) is linearly independent. µ ∈ F . dS tat ist ics . . λαn µβn λαn + µβn Sc ho ht em ol rs ive Un ati cs of M ath ity of Sy dn n n n an ey = i=1 (λαi + µβi )vi = λ i=1 αi vi + µ i=1 βi vi = λT (α) + µT (β). Let αi . v2 .94 Chapter Four: The structure of abstract vector spaces 4. we have λ1 λ2 im T = T . . . .15 we know that T is injective if and only if the unique solution 0 . Moreover. . vn be arbitrary elements of V . . . ˜ 0 of α) this can be restated as follows: T is injective if and only if αi = 0 (for n all i) is the unique solution of i=1 αi vi = 0. n of T (α) = 0 is α = . v2 . λn = { λ1 v1 + λ2 v2 + · · · + λn vn  λi ∈ F } = Span(v1 . v2 .18 Theorem Let V be a vector space over F and let v1 . . vn ) spans W . Then the function T : F n → V deﬁned by α1 α2 T . . . . Let α. . Hence T is linear. v2 . . . T is surjective if and only if im T = W . . . . + . By 3. vn ) spans V and injective if and only if (v1 . . . Then we have µβ1 λα1 + µβ1 λα1 λα2 µβ2 λα2 + µβ2 T (λα + µβ) = T . That is. . . v2 . . = α1 v1 + α2 v2 + · · · + αn vn . β ∈ F n and λ. . vn ). . . By the deﬁnition of the image of a function. T is surjective if and only if the sequence (v1 . By deﬁnition. T is injective if and ˜ only if (v1 . . λi ∈ F .
. ∈ F n . λ1 . that is. vn ) is the unique column vector λ1 λ2 cvb (v) = .2) If s is the standard basis of F n then cvs (v) = v for all v ∈ F n .Chapter Four: The structure of abstract vector spaces Comment 4. #1 Let s = (e1 . satisfying cvb (v) → v for all v ∈ V . we call this column the coordinate vector of v relative to the basis b.19. . however. the standard basis of F n . en ). also proved that the corresponding mapping F n → V . . 4.17 and the standard basis of F n .18.18 above. e2 . ∈ F n . . λn then clearly v = λ1 e1 + · · · + λn en . Comment 4. That is. .15. . the coordinate vector of v is just the same as v: (4. . Sc ho ht em ol rs ive Un ati cs of Examples M ath ity of Sy dn ey ist ics tat dS an .1 Observe that this bijective correspondence between elements of V and ntuples also follows immediately from 4. Assume now that b = (v1 . and so the coordinate vector of v relative to s is the column with λi as its ith entry. vn ) is a basis of V and let T : F n → V be as deﬁned in 4. . and so it follows that there is an inverse function T −1 : V → F n which associates to every v ∈ V a column T −1 (v). v2 .18 we have that T is bijective. . We have. . 95 Co rig py λn 4. . . v2 .19 Definition The coordinate vector of v ∈ V relative to the basis b = (v1 . is linear. and we denote it by ‘cvb (v)’.16. . If v = . By 4.19. such that v = λ1 v1 + λ2 v2 + · · · + λn vn .1 The above proof can be shortened by using 4.
#2 Prove that b = ( ᵗ(1 0 1), ᵗ(2 1 1), ᵗ(1 1 1) ) is a basis for R³ and calculate the coordinate vectors of ᵗ(1 0 0), ᵗ(0 1 0) and ᵗ(0 0 1) relative to b.

−− By 4.15 we know that b is a basis of R³ if and only if each element of R³ is uniquely expressible as a linear combination of the elements of b. That is, b is a basis if and only if the equations
x ᵗ(1 0 1) + y ᵗ(2 1 1) + z ᵗ(1 1 1) = ᵗ(a b c)
have a unique solution for all a, b, c ∈ R. We may rewrite these equations as

[ 1 2 1 ] [ x ]   [ a ]
[ 0 1 1 ] [ y ] = [ b ],
[ 1 1 1 ] [ z ]   [ c ]

and we know that there is a unique solution for all a, b, c if and only if the coefficient matrix has an inverse. It has an inverse if and only if the reduced echelon matrix obtained from it by row operations is the identity matrix. We are also asked to calculate x, y and z in three particular cases, and again this involves finding the reduced echelon matrices for the corresponding augmented matrices. We can do all these things at once by applying row operations to the augmented matrix

[ 1 2 1 | 1 0 0 ]
[ 0 1 1 | 0 1 0 ]
[ 1 1 1 | 0 0 1 ].

If the left hand half reduces to the identity it means that b is a basis, and the three columns in the right hand half will be the coordinate vectors we seek.

[ 1 2 1 | 1 0 0 ]                [ 1  2 1 |  1 0 0 ]
[ 0 1 1 | 0 1 0 ]   R3 := R3−R1  [ 0  1 1 |  0 1 0 ]
[ 1 1 1 | 0 0 1 ]   −−−−−−−−−→   [ 0 −1 0 | −1 0 1 ]

                    R1 := R1−2R2 [ 1 0 −1 |  1 −2 0 ]
                    R3 := R3+R2  [ 0 1  1 |  0  1 0 ]
                    −−−−−−−−−→   [ 0 0  1 | −1  1 1 ]

                    R1 := R1+R3  [ 1 0 0 |  0 −1  1 ]
                    R2 := R2−R3  [ 0 1 0 |  1  0 −1 ]
                    −−−−−−−−−→   [ 0 0 1 | −1  1  1 ]

Hence we have shown that b is indeed a basis, and, furthermore,
cvb( ᵗ(1 0 0) ) = ᵗ(0 1 −1),  cvb( ᵗ(0 1 0) ) = ᵗ(−1 0 1),  cvb( ᵗ(0 0 1) ) = ᵗ(1 −1 1). −−

#3 Let V be the set of all polynomials over R of degree at most three. Let p0, p1, p2, p3 ∈ V be defined by pi(x) = xⁱ (for i = 0, 1, 2, 3). For each f ∈ V there exist unique coefficients ai ∈ R such that
f(x) = a0 + a1x + a2x² + a3x³ = a0p0(x) + a1p1(x) + a2p2(x) + a3p3(x).
Hence we see that b = (p0, p1, p2, p3) is a basis of V, and if f(x) = a0 + a1x + a2x² + a3x³ as above then cvb(f) = ᵗ(a0 a1 a2 a3).
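The coordinate vectors found in Example #2 can be double-checked mechanically: cvb(v) is the solution x of Ax = v, where the columns of A are the basis vectors. A small sketch using exact rational arithmetic (the helper name is ours, not the text's):

```python
from fractions import Fraction

def solve(A, b):
    """Solve Ax = b over Q by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for c in range(n):
        piv = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for i in range(n):
            if i != c:
                M[i] = [a - M[i][c] * p for a, p in zip(M[i], M[c])]
    return [row[n] for row in M]

# the columns of A are the basis vectors of b from Example #2
A = [[1, 2, 1],
     [0, 1, 1],
     [1, 1, 1]]
for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    print(solve(A, e))   # the coordinate vector cv_b(e) relative to b
```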
Exercises
1. In each of the following examples the set S has a natural vector space structure over the field F. In each case decide whether S is finitely generated, and, if it is, find its dimension.
(i) S = C (complex numbers), F = R.
(ii) S = C, F = C.
(iii) S = R, F = Q (rational numbers).
(iv) S = R[X] (polynomials over R in the variable X), F = R.
(v) S = Mat(n, C) (n × n matrices over C), F = R.

2. Determine whether or not the following two subspaces of R³ are the same:
Span( ᵗ(1 2 −1), ᵗ(2 4 1) )  and  Span( ᵗ(1 2 4), ᵗ(2 4 −5) ).
3.
Suppose that (v1 , v2 , v3 ) is a basis for a vector space V , and deﬁne elements w1 , w2 , w3 ∈ V by w1 = v1 − 2v2 + 3v3 , w2 = −v1 + v3 , w3 = v2 − v3 . (i ) Express v1 , v2 , v3 in terms of w1 , w2 , w3 . (ii ) Prove that w1 , w2 , w3 are linearly independent. (iii ) Prove that w1 , w2 , w3 span V .
4. Let V be a vector space and (v1, v2, . . . , vn) a sequence of vectors in V. Prove that Span(v1, v2, . . . , vn) is a subspace of V.
5. Prove Proposition 4.11.
6. Prove Proposition 4.12.
7. Prove Theorem 4.17.
8. Let (v1, v2, . . . , vn) be a basis of a vector space V and let
    w1 = α11v1 + α21v2 + · · · + αn1vn
    w2 = α12v1 + α22v2 + · · · + αn2vn
    . . .
    wn = α1nv1 + α2nv2 + · · · + αnnvn
where the αij are scalars. Let A be the matrix with (i, j) entry αij. Prove that (w1, w2, . . . , wn) is a basis for V if and only if A is invertible.
Chapter Five: Inner Product Spaces
To regard R² and R³ merely as vector spaces is to ignore two basic concepts of Euclidean geometry: the length of a line segment and the angle between two lines. Thus it is desirable to give these vector spaces some additional structure which will enable us to talk of the length of a vector and the angle between two vectors. To do so is the aim of this chapter.

§5a The inner product axioms
If v and w are n-tuples over R we define the dot product of v and w to be the scalar
    v · w = v1w1 + v2w2 + · · · + vnwn,
where vi is the ith entry of v and wi the ith entry of w. That is, v · w = v(tw) if v and w are rows, and v · w = (tv)w if v and w are columns. Suppose that a Cartesian coordinate system is chosen for 3-dimensional Euclidean space in the usual manner. If O is the origin and P and Q are points with coordinates v = (x1, y1, z1) and w = (x2, y2, z2) respectively, then by Pythagoras's Theorem the lengths of OP, OQ and PQ are as follows:
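As a minimal illustrative sketch (the function name is an assumption of this aside), the defining formula is a single sum:

```python
def dot(v, w):
    """Dot product of two real n-tuples: v . w = sum of v_i * w_i."""
    assert len(v) == len(w)
    return sum(vi * wi for vi, wi in zip(v, w))
```

For rows and columns alike this computes v(tw) = (tv)w.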
    OP = √(x1² + y1² + z1²)
    OQ = √(x2² + y2² + z2²)
    PQ = √((x1 − x2)² + (y1 − y2)² + (z1 − z2)²).
By the cosine rule the cosine of the angle P OQ is
    ((x1² + y1² + z1²) + (x2² + y2² + z2²) − ((x1 − x2)² + (y1 − y2)² + (z1 − z2)²)) / (2 √(x1² + y1² + z1²) √(x2² + y2² + z2²))
        = (x1x2 + y1y2 + z1z2) / (√(x1² + y1² + z1²) √(x2² + y2² + z2²))
        = (v · w) / (√(v · v) √(w · w)).
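The angle formula just derived can be sketched directly (an illustrative aside; the helper name is assumed):

```python
import math

def angle(v, w):
    """Angle between nonzero vectors, from
    cos(theta) = (v . w) / (sqrt(v . v) * sqrt(w . w))."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return math.acos(dot(v, w) / (math.sqrt(dot(v, v)) * math.sqrt(dot(w, w))))
```

Perpendicular vectors give π/2, as expected from the geometry.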
Thus we see that the dot product is closely allied with the concepts of 'length' and 'angle'. Note also the following properties:
(a) The dot product is bilinear. That is,
    (λu + µv) · w = λ(u · w) + µ(v · w)   and   u · (λv + µw) = λ(u · v) + µ(u · w)
for all n-tuples u, v, w and all λ, µ ∈ R.
(b) The dot product is symmetric. That is, v · w = w · v for all n-tuples v, w.
(c) The dot product is positive definite. That is, v · v ≥ 0 for every n-tuple v, and v · v = 0 if and only if v = 0.
The proofs of these properties are straightforward and are omitted.

It turns out that all important properties of the dot product are consequences of these three basic properties. Furthermore, these same properties arise in other, slightly different, contexts. Accordingly, we use them as the basis of a new axiomatic system.

5.1 Definition  A real inner product space is a vector space over R which is equipped with a scalar valued product which is bilinear, symmetric and positive definite. That is, writing ⟨v, w⟩ for the scalar product of v and w, we require that
(i) ⟨λu + µv, w⟩ = λ⟨u, w⟩ + µ⟨v, w⟩,
(ii) ⟨v, w⟩ = ⟨w, v⟩,
(iii) ⟨v, v⟩ > 0 if v ≠ 0,
for all vectors u, v, w and all λ, µ ∈ R.

Comment 5.1.1  Since the inner product is assumed to be symmetric, linearity in the second variable is a consequence of linearity in the first. It also follows from bilinearity that ⟨v, w⟩ = 0 if either v or w is zero. To see this, suppose that w is fixed and define a function fw from the vector space to R by fw(v) = ⟨v, w⟩. By (i) above we have that fw is linear, and hence fw(0) = 0 (by 3.12). That is, we must have ⟨0, w⟩ = 0 for all w.

Apart from Rn with the dot product, the most important example of an inner product space is the set C[a, b] of all continuous real valued functions on a closed interval [a, b], with the inner product of functions f and g defined by the formula
    ⟨f, g⟩ = ∫ₐᵇ f(x)g(x) dx.
We leave it as an exercise to check that this gives a symmetric bilinear scalar product. It is a theorem of calculus that if f is continuous and ∫ₐᵇ f(x)² dx = 0 then f(x) = 0 for all x ∈ [a, b], from which it follows that the scalar product as defined is positive definite.

Examples
#1 For all u, v ∈ R³, let ⟨u, v⟩ be defined by the formula
    ⟨u, v⟩ = (tu) [ 1 1 2 ; 1 3 2 ; 2 2 7 ] v.
Prove that ⟨ , ⟩ is an inner product on R³.

Let A be the 3 × 3 matrix appearing in the definition of ⟨ , ⟩, and let u, v, w ∈ R³ and λ, µ ∈ R. Then, using the distributive property of matrix multiplication and linearity of the "transpose" map from R³ to tR³, we have
    ⟨λu + µv, w⟩ = t(λu + µv)Aw = (λ(tu) + µ(tv))Aw = λ(tu)Aw + µ(tv)Aw = λ⟨u, w⟩ + µ⟨v, w⟩,
and it follows that (i) of the definition is satisfied. Part (ii) follows since A is symmetric: for all v, w ∈ R³,
    ⟨v, w⟩ = (tv)Aw = (tv)(tA)w = t((tw)Av) = t(⟨w, v⟩) = ⟨w, v⟩
(the second of these equalities holds since A = tA, the third since transposing reverses the order of factors in a product, and the last because ⟨w, v⟩, being a 1 × 1 matrix (that is, a scalar), is symmetric).
Now let v = t( x y z ) ∈ R³ be arbitrary. We find that
    ⟨v, v⟩ = ( x y z ) A t( x y z ) = (x² + xy + 2xz) + (yx + 3y² + 2yz) + (2zx + 2zy + 7z²).
Applying completion of the square techniques we deduce that
    ⟨v, v⟩ = (x + y + 2z)² + 2y² + 3z²,
which is nonnegative, and can only be zero if z, y and x + y + 2z are all zero. This only occurs if x = y = z = 0. So we have shown that ⟨v, v⟩ ≥ 0 for all v, and ⟨v, v⟩ > 0 whenever v ≠ 0, which is the third and last requirement in Definition 5.1.

#2 Show that if the matrix A in #1 above is replaced by either
    [ 1 2 3 ; 0 3 1 ; 1 3 7 ]   or   [ 1 1 2 ; 1 3 2 ; 2 2 1 ]
then the resulting scalar valued product on R³ does not satisfy the inner product axioms.

In the first case the fact that the matrix is not symmetric means that the resulting product would not satisfy (ii) of Definition 5.1. For example, we would have
    ⟨t( 1 0 0 ), t( 0 1 0 )⟩ = ( 1 0 0 ) [ 1 2 3 ; 0 3 1 ; 1 3 7 ] t( 0 1 0 ) = 2,
while on the other hand
    ⟨t( 0 1 0 ), t( 1 0 0 )⟩ = ( 0 1 0 ) [ 1 2 3 ; 0 3 1 ; 1 3 7 ] t( 1 0 0 ) = 0.
In the second case Part (iii) of Definition 5.1 would fail. For example, we would have
    ⟨t( −1 −1 1 ), t( −1 −1 1 )⟩ = ( −1 −1 1 ) [ 1 1 2 ; 1 3 2 ; 2 2 1 ] t( −1 −1 1 ) = −1,
contrary to the requirement that ⟨v, v⟩ > 0 whenever v ≠ 0.
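The verifications in #1 can be confirmed numerically; the following sketch (names and the random spot-check are assumptions of this aside) tests symmetry and the completed-square identity ⟨v, v⟩ = (x + y + 2z)² + 2y² + 3z² on a handful of integer vectors:

```python
import random

A = [[1, 1, 2], [1, 3, 2], [2, 2, 7]]  # the matrix of #1

def ip(u, v):
    """<u, v> = (t u) A v, computed entrywise."""
    return sum(u[i] * A[i][j] * v[j] for i in range(3) for j in range(3))

random.seed(0)
vecs = [[random.randint(-5, 5) for _ in range(3)] for _ in range(6)]
symmetric = all(ip(u, v) == ip(v, u) for u in vecs for v in vecs)
# completed-square form: <v,v> = (x+y+2z)^2 + 2y^2 + 3z^2 >= 0
pos_def = all(ip(v, v) == (v[0] + v[1] + 2 * v[2]) ** 2 + 2 * v[1] ** 2 + 3 * v[2] ** 2
              for v in vecs)
```

Integer arithmetic keeps the checks exact rather than approximate.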
#3 The trace Tr(A) of an n × n matrix A is defined to be the sum of the n diagonal entries of A; that is, Tr(A) = Σi Aii. Show that ⟨M, N⟩ = Tr((tM)N) defines an inner product on the space of all m × n matrices over R.

The (i, j) entry of tM is the (j, i) entry of M, so the (i, i) entry of (tM)N is
    ((tM)N)ii = Σj (tM)ij Nji = Σj Mji Nji,
and hence the trace of (tM)N is Σi ((tM)N)ii = Σi,j Mji Nji. That is, ⟨M, N⟩ is just the standard dot product formula: the sum of the products of each entry of M by the corresponding entry of N. Thus, if we think of an m × n matrix as an mn-tuple which is simply written as a rectangular array instead of as a row or column, then the verification that ⟨ , ⟩ as defined is an inner product involves almost exactly the same calculations as are involved in proving bilinearity, symmetry and positive definiteness of the dot product on Rn.

Let M, N, P be m × n matrices, and let λ, µ ∈ R. Then
    ⟨λM + µN, P⟩ = Σi,j (λM + µN)ji Pji = Σi,j (λMji Pji + µNji Pji) = λ Σi,j Mji Pji + µ Σi,j Nji Pji = λ⟨M, P⟩ + µ⟨N, P⟩,
verifying Property (i) of 5.1. Furthermore,
    ⟨M, N⟩ = Σi,j Mji Nji = Σi,j Nji Mji = ⟨N, M⟩,
verifying Property (ii). Finally, ⟨M, M⟩ = Σi,j (Mji)² is clearly nonnegative in all cases, and can only be zero if Mji = 0 for all i and j. Thus ⟨M, M⟩ ≥ 0, with equality only if M = 0, and Property (iii) holds as well.
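The observation that Tr((tM)N) is just the entrywise dot product can be checked mechanically (an illustrative sketch; the helper names are assumptions of this aside):

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace_ip(M, N):
    """<M, N> = Tr((tM) N) for m x n real matrices."""
    P = matmul(transpose(M), N)
    return sum(P[i][i] for i in range(len(P)))

def entrywise(M, N):
    """Sum of products of corresponding entries of M and N."""
    return sum(M[i][j] * N[i][j] for i in range(len(M)) for j in range(len(M[0])))

M = [[1, 2, 3], [4, 5, 6]]
N = [[7, 8, 9], [10, 11, 12]]
```

Both routes give the same scalar, as the computation in #3 predicts.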
It is also possible to define a dot product on Cⁿ. In this case the definition is
    v · w = (t v̄)w = v̄1w1 + v̄2w2 + · · · + v̄nwn,
where the overline indicates complex conjugation. If z is an arbitrary complex number then z̄z is real and nonnegative, and is zero only if z is zero. It follows easily that if v is an arbitrary complex column vector then v · v as defined above is real and nonnegative, and is zero only if v = 0. This apparent complication is introduced to preserve positive definiteness. The price to be paid for making the complex dot product positive definite is that it is no longer bilinear. It is still linear in the second variable, but semilinear or conjugate linear in the first:
    (λu + µv) · w = λ̄(u · w) + µ̄(v · w)   and   u · (λv + µw) = λ(u · v) + µ(u · w)
for all u, v, w ∈ Cⁿ and λ, µ ∈ C. Likewise, the complex dot product is not quite symmetric; instead, v · w is the complex conjugate of w · v. Analogy with the real case leads to the following definition.

5.2 Definition  A complex inner product space is a complex vector space V which is equipped with a scalar product ⟨ , ⟩: V × V → C satisfying
(i) ⟨u, λv + µw⟩ = λ⟨u, v⟩ + µ⟨u, w⟩,
(ii) ⟨v, w⟩ equals the complex conjugate of ⟨w, v⟩,
(iii) ⟨v, v⟩ ∈ R and ⟨v, v⟩ > 0 if v ≠ 0,
for all u, v, w ∈ V and λ, µ ∈ C.

Comments 5.2.1  Note that ⟨v, v⟩ ∈ R is in fact a consequence of (ii), since (ii) gives that ⟨v, v⟩ is its own conjugate.
5.2.2  Complex inner product spaces are often called unitary spaces, and real inner product spaces are often called Euclidean spaces.

If V is any inner product space (real or complex) we define the length or norm of a vector v ∈ V to be the nonnegative real number ‖v‖ = √⟨v, v⟩. (This definition is suggested by the fact, noted above, that the distance from the origin of the point with coordinates v = ( x y z ) is √(v · v).)
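A short sketch of the complex dot product (illustrative only; the function name is an assumption of this aside), showing conjugate linearity in the first argument and positivity of v · v:

```python
def cdot(v, w):
    """Complex dot product v . w = sum of conj(v_i) * w_i; conjugate
    linear in the first argument, so v . v is real and nonnegative."""
    return sum(vi.conjugate() * wi for vi, wi in zip(v, w))

v = [1 + 2j, 3 - 1j]
w = [2 - 1j, 1j]
```

Exact Gaussian-integer arithmetic makes the identities below hold with equality, not merely approximately.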
It is an easy exercise to prove that if v ∈ V and λ is a scalar then ‖λv‖ = |λ|‖v‖. (Recall that the absolute value of a complex number λ is given by |λ| = √(λ̄λ).) In particular, if λ = 1/‖v‖ then ‖λv‖ = 1.

Having defined the length of a vector it is natural now to say that the distance between two vectors x and y is the length of x − y; thus we define d(x, y) = ‖x − y‖. It is customary in mathematics to reserve the term 'distance' for a function d satisfying the following properties:
(i) d(x, y) ≥ 0, and d(x, y) = 0 only if x = y,
(ii) d(x, y) = d(y, x),
(iii) d(x, z) ≤ d(x, y) + d(y, z) (the triangle inequality),
for all x, y and z. It is easily seen that our definition meets the first two of these requirements; the proof that it also meets the third is deferred for a while.

Analogy with the dot product suggests also that for real inner product spaces the angle θ between two nonzero vectors v and w should be defined by the formula cos θ = ⟨v, w⟩/(‖v‖‖w‖). Obviously we want −1 ≤ cos θ ≤ 1, so to justify the definition we must prove that |⟨v, w⟩| ≤ ‖v‖‖w‖ (the Cauchy-Schwarz inequality). We will do this later.

5.3 Definition  Let V be an inner product space. Vectors v, w ∈ V are said to be orthogonal if ⟨v, w⟩ = 0. A set X of vectors is said to be orthonormal if ⟨v, v⟩ = 1 for all v ∈ X and ⟨v, w⟩ = 0 for all v, w ∈ X such that v ≠ w.

Comments 5.3.1  In view of our intended definition of the angle between two vectors, this definition will say that two nonzero vectors in a real inner product space are orthogonal if and only if they are at right angles to each other. In particular, if v1, v2, . . . , vn are nonzero vectors satisfying ⟨vi, vj⟩ = 0 whenever i ≠ j, then an orthonormal set of vectors can be obtained by replacing vi by ‖vi‖⁻¹vi for each i.
5.3.2  Observe that orthogonality is a symmetric relation, since if ⟨v, w⟩ is zero then ⟨w, v⟩ is too. Observe also that if v is orthogonal to w then all scalar multiples of v are orthogonal to w.
§5b Orthogonal projection

Note that the standard bases of Rⁿ and Cⁿ are orthonormal bases. As we shall see below, such bases are of particular importance. It is easily shown that orthonormal vectors are always linearly independent.

5.4 Proposition  Let v1, v2, . . . , vn be nonzero elements of V (an inner product space) and suppose that ⟨vi, vj⟩ = 0 whenever i ≠ j. Then the vi are linearly independent.

Proof. Suppose that Σi λivi = 0, where the λi are scalars. By 5.1 above we see that for all j,
    0 = ⟨vj, Σi λivi⟩
      = Σi λi⟨vj, vi⟩   (by linearity in the second variable)
      = λj⟨vj, vj⟩   (since the other terms are zero),
and since ⟨vj, vj⟩ ≠ 0 it follows that λj = 0.

Under the hypotheses of 5.4 the vi form a basis of the subspace they span. Such a basis is called an orthogonal basis of the subspace.

#4 The space C[−1, 1] has a subspace of dimension 3 consisting of polynomial functions on [−1, 1] of degree at most two. It can be checked that f0, f1 and f2 defined by f0(x) = 1, f1(x) = x and f2(x) = 3x² − 1 form an orthogonal basis of this subspace. Indeed, since f0(x)f1(x) and f1(x)f2(x) are both odd functions it is immediate that ∫₋₁¹ f0(x)f1(x) dx and ∫₋₁¹ f1(x)f2(x) dx are both zero, while
    ∫₋₁¹ f0(x)f2(x) dx = ∫₋₁¹ (3x² − 1) dx = [x³ − x] from −1 to 1 = 0.

5.5 Lemma  Let (u1, u2, . . . , un) be an orthogonal basis of a subspace U of V, and let v ∈ V. Then there is a unique element u ∈ U such that ⟨x, u⟩ = ⟨x, v⟩ for all x ∈ U, and it is given by the formula
    u = Σi (⟨ui, v⟩/⟨ui, ui⟩) ui   (sum over i from 1 to n).

Proof. Write λi = ⟨ui, v⟩/⟨ui, ui⟩ and u = Σi λiui. Then for each j from 1 to n we have
    ⟨uj, u⟩ = ⟨uj, Σi λiui⟩ = Σi λi⟨uj, ui⟩ = λj⟨uj, uj⟩   (since the other terms are zero)
            = ⟨uj, v⟩.
If x ∈ U is arbitrary then x = Σj µjuj for some scalars µj, and we have
    ⟨x, u⟩ = Σj µ̄j⟨uj, u⟩ = Σj µ̄j⟨uj, v⟩ = ⟨x, v⟩,
showing that u has the required property. If u′ ∈ U also has this property then for all x ∈ U,
    ⟨x, u − u′⟩ = ⟨x, u⟩ − ⟨x, u′⟩ = ⟨x, v⟩ − ⟨x, v⟩ = 0.
Since u − u′ ∈ U this gives, in particular, that ⟨u − u′, u − u′⟩ = 0, and hence u − u′ = 0. So u is uniquely determined.

Our first consequence of 5.5 is the Gram-Schmidt orthogonalization process, by which an arbitrary finite dimensional subspace of an inner product space is shown to have an orthogonal basis.

5.6 Theorem  Let (v1, v2, v3, . . . ) be a sequence (finite or infinite) of vectors in an inner product space, such that br = (v1, v2, . . . , vr) is linearly independent for all r. Let Vr be the subspace spanned by br. Then there exist vectors u1, u2, u3, . . . in V such that cr = (u1, u2, . . . , ur) is an orthogonal basis of Vr for each r.

Proof. This is proved by induction. The case r = 1 is easily settled by defining u1 = v1; since the elements of a basis must be nonzero, the statement that c1 is orthogonal is vacuously true. Assume now that r > 1 and that u1 to ur−1 have been found with the required properties. By 5.5 there exists u ∈ Vr−1 such that ⟨ui, u⟩ = ⟨ui, vr⟩ for all i from 1 to r − 1. We define ur = vr − u, noting that this is nonzero since, by linear independence of br, the vector vr is not in the subspace Vr−1. To complete the proof it remains to check that ⟨ui, uj⟩ = 0 for all i, j ≤ r with i ≠ j. If i, j ≤ r − 1 this is immediate from our inductive hypothesis. Since for all i < r,
    ⟨ui, ur⟩ = ⟨ui, vr − u⟩ = ⟨ui, vr⟩ − ⟨ui, u⟩ = 0
by the definition of u, the remaining case is clear too.

In view of 5.5 we see that if U is any finite dimensional subspace of an inner product space V then there is a unique mapping P: V → U such that ⟨u, P(v)⟩ = ⟨u, v⟩ for all v ∈ V and u ∈ U.

5.7 Definition  The transformation P: V → U defined above is called the orthogonal projection of V onto U.

Comments 5.7.1  If (u1, u2, . . . , un) is any orthogonal basis of U then we have the formula
    P(v) = Σi (⟨ui, v⟩/⟨ui, ui⟩) ui.
It follows easily from this formula that orthogonal projections are linear transformations; this is left as an exercise for the reader.
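The construction in the proof of 5.6 is an algorithm, and can be sketched directly (an illustrative aside; the function name and the use of exact fractions are assumptions, not part of the text). Each new vector is the given one minus its projections on the vectors already built:

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vs):
    """u_r = v_r - sum over i < r of (<u_i, v_r>/<u_i, u_i>) u_i."""
    us = []
    for v in vs:
        v = [Fraction(x) for x in v]
        for u in us:
            c = dot(u, v) / dot(u, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        us.append(v)
    return us

# the data of example #5 below
v1, v2, v3 = [1, 1, 1, 1], [2, 3, 2, -4], [-1, 5, -2, -1]
u1, u2, u3 = gram_schmidt([v1, v2, v3])
```

Exact rational arithmetic reproduces the fractions of the worked example rather than floating-point approximations.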
5.7.2  The Gram-Schmidt process, given in the proof of Theorem 5.6, is an algorithm for which the input is a linearly independent sequence of vectors v1, v2, v3, . . . , and the output an orthogonal sequence of vectors u1, u2, u3, . . . , which are linear combinations of the vi. The proof gives the following formulae for the ui:
    u1 = v1
    u2 = v2 − (⟨u1, v2⟩/⟨u1, u1⟩)u1
    u3 = v3 − (⟨u1, v3⟩/⟨u1, u1⟩)u1 − (⟨u2, v3⟩/⟨u2, u2⟩)u2
    . . .
    ur = vr − (⟨u1, vr⟩/⟨u1, u1⟩)u1 − · · · − (⟨ur−1, vr⟩/⟨ur−1, ur−1⟩)ur−1.
In practice it is not necessary to remember the formulae for the coefficients of the ui on the right hand side. Instead, if one remembers merely that
    ur = vr + λ1u1 + λ2u2 + · · · + λr−1ur−1
then it is easy to find λi for each i < r by taking inner products with ui. Indeed, since the terms λj⟨ui, uj⟩ are zero for i ≠ j, we get
    0 = ⟨ui, ur⟩ = ⟨ui, vr⟩ + λi⟨ui, ui⟩,
and this yields the stated formula for λi.

Examples
#5 Let V = R⁴ considered as an inner product space via the dot product, and let U be the subspace of V spanned by the vectors
    v1 = t( 1 1 1 1 ),   v2 = t( 2 3 2 −4 ),   v3 = t( −1 5 −2 −1 ).
Use the Gram-Schmidt process to find an orthonormal basis of U.

Using the formulae given in 5.7.2 above, if we define
    u1 = v1 = t( 1 1 1 1 )
    u2 = v2 − (u1 · v2 / u1 · u1)u1 = v2 − (3/4)u1 = t( 5/4 9/4 5/4 −19/4 )
    u3 = v3 − (u1 · v3 / u1 · u1)u1 − (u2 · v3 / u2 · u2)u2
       = v3 − (1/4)u1 − ((49/4)/(492/16))u2
       = (1/492) t( −860 1896 −1352 316 )
The geometrical process corresponding to the orthogonal projection is “dropping Sc ho ht em ol rs ive Un ati cs of −− M ath ity of Sy dn an ey dS tat ist ics −− . U is a plane through the origin. ui · ui We obtain √ √ −215/ 391386 1/2 5/√492 √ 391386 474/ √ 1/2 9/√492 u3 = u1 = u2 = .1 and the orthonormal basis (u1 . with the dot product. It remains to “normalize” the basis vectors. and let U be a subspace of V of dimension 2. 5/ √ 492 −338/ 391386 1/2 √ −19/ 492 79/ 391386 1/2 Co ri g py as a suitable orthonormal basis for the given subspace. . u2 . u3 ) found in #5 gives −20 8 P = (u1 · v)u1 + (u2 · v)u2 + (u3 · v)u3 19 −1 √ √ 5/√492 −215/ 391386 1/2 √ 1591 474/ √ 86 9/√492 391386 1/2 = 3 + √ + √ 1/2 5/ √ 492 492 391386 −338/ 391386 √ 1/2 −19/ 492 79/ 391386 1 5 −215 3/2 3 1 43 9 1 474 5 = + + = 5 1 2 1 246 246 −338 1 −19 79 −3/2 Suppose that V = R3 . replace each ui by a vector of length 1 which 1 is a scalar multiple of ui . Calculate P (v).6. With U and V as in #5 above. let P : V → U be the orthogonal −20 8 projection. and let v = . that is. u2 and u3 are mutually orthogonal and span U .110 Chapter Five: Inner Product Spaces then u1 . This is achieved by the formula ui = √ ui . 19 −1 #6 −− Using the formula 5. Geometrically.
Suppose that V = R³, with the dot product, and let U be a subspace of V of dimension 2. Geometrically, U is a plane through the origin. The geometrical process corresponding to the orthogonal projection is "dropping a perpendicular" to U from a given point v. Then P(v) is the point of U which is the foot of the perpendicular. It is geometrically clear that u = P(v) is the unique u ∈ U such that v − u is perpendicular to everything in U; this corresponds to the algebraic statement that ⟨x, v − P(v)⟩ = 0 for all x ∈ U. It is also geometrically clear that P(v) is the point of U which is closest to v. We ought to prove this algebraically.

5.8 Theorem  Let U be a finite dimensional subspace of an inner product space V, and let P be the orthogonal projection of V onto U. If v is any element of V then ‖v − u‖ ≥ ‖v − P(v)‖ for all u ∈ U, with equality only if u = P(v).

Proof. Given u ∈ U we have v − u = (v − P(v)) + x, where x = P(v) − u is an element of U. Since v − P(v) is orthogonal to all elements of U we see that
    ‖v − u‖² = ⟨(v − P(v)) + x, (v − P(v)) + x⟩
             = ⟨v − P(v), v − P(v)⟩ + ⟨v − P(v), x⟩ + ⟨x, v − P(v)⟩ + ⟨x, x⟩
             = ‖v − P(v)‖² + ‖x‖²
             ≥ ‖v − P(v)‖²,
with equality if and only if x = 0.

An extremely important application of this method of finding the point of a subspace which is closest to a given point is the approximate solution of inconsistent systems of equations. The equations Ax = b have a solution if and only if b is contained in the column space of the matrix A (see 3.20.1). If b is not in the column space then it is reasonable to find that point b0 of the column space which is closest to b, and solve the equations Ax = b0 instead. Inconsistent systems commonly arise in practice in cases where we can obtain as many (approximate) equations as we like by simply taking more measurements.

Examples
#7 Three measurable variables A, B and C are known to be related by a formula of the form xA + yB = C, and an experiment is performed to find the values x and y. In four cases measurement yields the results tabulated
0 5.0 case: 1. and so on) we see that a reasonable ˜ choice ˜ y comes from the point in U = Span(A.1 5.5 1.4 and 0.91.0 B 2. A )A + ( B . in which.89 and 6.87) −− Sc ho ht em ol rs ive Un ati cs of ath ity of Sy dn ey tat ist ics dS an . no doubt.9 6.112 Chapter Five: Inner Product Spaces below. 4.0 C 3.5 −0. B ) where A = A and ˜ ˜ ˜ ˜ B = B − ( A. B )B ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ and calculation gives the coeﬃcients of A and B as 5. Viewing these equations as xA + yB = C ˜ ˜ (where A ∈ R4 has ith entry Ai .8 5.0 4.0 case: 1. 5. B ) which is closest to C . (If ˜ these are the correct values of x and y˜then the values of C for the given values of A and B should be. which we do by means of the GramSchmidt process applied to A and B .97. ˜ ˜ This gives the orthogonal basis (A . What are the values of x and y which best ﬁt this data? 1 2nd 3rd 4th −− st A case: 1. in the four cases.5 ) . R ˜ To compute the projection we ﬁrst need to ﬁnd an orthogonal basis for U . B / A. for x and ˜ that is.93. C / A .5 0.98 and the coeﬃcient of A as x˜ = 5. Now P (C ) is given by the formula ˜ P (C ) = ( A .0 3.98 = 1.5 × 0.98.8 Co M x y ri g py We have a system of four equations in two unknowns: A1 A2 A3 A4 B1 B2 B3 B4 C1 C = 2 C3 C4 which is bound to be inconsistent. there are some experimental errors.4˜− 3.0 case: 1. Expressing ˜ this combination of A and B back in terms of˜A and B gives the coeﬃcient ˜ ˜ of B as y = 0. 3. C / B . the point P (C ) where P is the projection˜of˜ 4 onto the subspace U . A )A ˜ ˜ ˜ ˜ ˜ ˜ ˜ = B − (7/2)A ˜ ˜ = t ( −1.
#8 Find the parabola y = ax² + bx + c which most closely fits the graph of y = eˣ on the interval [−1, 1], in the sense that ∫₋₁¹ (f(x) − eˣ)² dx is minimized (where f(x) = ax² + bx + c).

Let P be the orthogonal projection of C[−1, 1] onto the subspace of polynomial functions of degree at most 2. We seek to calculate P(exp), where exp is the exponential function. Using the orthogonal basis (f0, f1, f2) from #4 above we find that P(exp) = λf0 + µf1 + νf2, where
    λ = ∫₋₁¹ eˣ dx / ∫₋₁¹ 1 dx,
    µ = ∫₋₁¹ xeˣ dx / ∫₋₁¹ x² dx,
    ν = ∫₋₁¹ (3x² − 1)eˣ dx / ∫₋₁¹ (3x² − 1)² dx.
That is, the sought after function is f(x) = λ + µx + ν(3x² − 1) for the above values of λ, µ and ν.

#9 Define functions c0, c1, c2, . . . and s1, s2, . . . on the interval [−π, π] by
    cn(x) = cos(nx)   (for n = 0, 1, 2, . . . )
    sn(x) = sin(nx)   (for n = 1, 2, 3, . . . ).
Show that the cn for 0 ≤ n ≤ k and the sm for 1 ≤ m ≤ k together form an orthogonal basis of a subspace of C[−π, π], and determine the formula for the element of this subspace closest to a given f.

Recall the trigonometric formulae
    2 sin(nx) cos(mx) = sin((n + m)x) + sin((n − m)x)
    2 cos(nx) cos(mx) = cos((n + m)x) + cos((n − m)x)
    2 sin(nx) sin(mx) = cos((n − m)x) − cos((n + m)x).
Since
    ∫₋π^π sin(kx) dx = 0 for all k   and   ∫₋π^π cos(kx) dx = 0 if k ≠ 0, 2π if k = 0,
we deduce that
(i) ⟨sn, sm⟩ = 0 if n ≠ m,
(ii) ⟨sn, cm⟩ = 0 for all n and m,
(iii) ⟨cn, cm⟩ = 0 if n ≠ m,
(iv) ⟨c0, c0⟩ = 2π, and ⟨cn, cn⟩ = ⟨sn, sn⟩ = π if n ≠ 0.
Hence if P is the orthogonal projection onto the subspace spanned by the cn and sm then for an arbitrary f ∈ C[−π, π],
    (P(f))(x) = (1/2π)a0 + (1/π) Σn (an cos(nx) + bn sin(nx))   (sum over n from 1 to k),
where
    an = ⟨cn, f⟩ = ∫₋π^π f(x) cos(nx) dx   and   bn = ⟨sn, f⟩ = ∫₋π^π f(x) sin(nx) dx.
We know from 5.8 that g = P(f) is the element of the subspace for which ∫₋π^π (f − g)² is minimized.

We have been happily talking about orthogonal projections onto subspaces and ignoring the fact that the word 'onto' should only be applied to functions that are surjective. Fortunately, it is easy to see that the image of the projection of V onto U is indeed the whole of U. In fact, if P is the projection then P(u) = u for all u ∈ U. (A consequence of this is that P is an idempotent transformation: it satisfies P² = P.) It is natural at this stage to ask about the kernel of P. If P(v) = 0 we must have ⟨x, v⟩ = ⟨x, P(v)⟩ = 0 for all x ∈ U. Conversely, if ⟨x, v⟩ = 0 for all x ∈ U then u = 0 is clearly an element of U satisfying ⟨x, u⟩ = ⟨x, v⟩ for all x ∈ U. Hence P(v) = 0.

5.9 Definition  If U is a finite dimensional subspace of an inner product space V and P the projection of V onto U then the subspace U⊥ = ker P is called the orthogonal complement of U.

Comment 5.9.1  By the above, the orthogonal complement of U consists of all v ∈ V which are orthogonal to all elements of U.
Our final task in this section is to prove the triangle and Cauchy-Schwarz inequalities, and some related matters.

5.10 Proposition  Assume that (u1, u2, . . . , un) is an orthogonal basis of a subspace U of V.
(i) Let v ∈ V and for all i let αi = ⟨ui, v⟩/⟨ui, ui⟩. Then ‖v‖² ≥ Σi |αi|²‖ui‖².
(ii) If u ∈ U then u = Σi (⟨ui, u⟩/⟨ui, ui⟩)ui.
(iii) If x, y ∈ U then ⟨x, y⟩ = Σi (⟨x, ui⟩⟨ui, y⟩)/⟨ui, ui⟩.

Proof. (i) If P: V → U is the orthogonal projection then 5.7.1 above gives P(v) = Σi λiui, where λi = αi. Writing x = v − P(v) we have
    ⟨v, v⟩ = ⟨x + P(v), x + P(v)⟩ = ⟨x, x⟩ + ⟨x, P(v)⟩ + ⟨P(v), x⟩ + ⟨P(v), P(v)⟩ ≥ ⟨P(v), P(v)⟩
(since P(v) ∈ U and x ∈ U⊥), so that our task is to prove that ⟨P(v), P(v)⟩ = Σi |αi|²‖ui‖². Since ⟨ui, uj⟩ = 0 for i ≠ j we have
    ⟨P(v), P(v)⟩ = Σi,j λ̄iλj⟨ui, uj⟩ = Σi |λi|²⟨ui, ui⟩ = Σi |αi|²‖ui‖²,
whence the result.
(ii) This amounts to the statement that P(u) = u for u ∈ U, and it is immediate from 5.8, since the element of U which is closest to u is obviously u itself. A direct proof is also trivial: we may write u = Σi λiui for some scalars λi (since the ui span U), and now for all j,
    ⟨uj, u⟩ = Σi λi⟨uj, ui⟩ = λj⟨uj, uj⟩,
whence the result.
(iii) This follows easily from (ii) and is left as an exercise.
5.11 Theorem  If V is an inner product space (complex or real) then
(i) |⟨v, w⟩| ≤ ‖v‖‖w‖ (the Cauchy-Schwarz inequality), and
(ii) ‖v + w‖ ≤ ‖v‖ + ‖w‖
for all v, w ∈ V.

Proof. (i) If v = 0 then both sides are zero, so that the result follows. If v ≠ 0 then v by itself forms an orthogonal basis for a 1-dimensional subspace of V, and by part (i) of 5.10 we have, for all w,
    ‖w‖²‖v‖² ≥ |⟨v, w⟩|²,
and the result follows.
(ii) If z = x + iy is any complex number then z + z̄ = 2x ≤ 2√(x² + y²) = 2|z|. Hence for all v, w ∈ V,
    (‖v‖ + ‖w‖)² − ‖v + w‖² = (‖v‖² + ‖w‖² + 2‖v‖‖w‖) − ⟨v + w, v + w⟩
        = (‖v‖² + ‖w‖² + 2‖v‖‖w‖) − (⟨v, v⟩ + ⟨v, w⟩ + ⟨w, v⟩ + ⟨w, w⟩)
        = 2‖v‖‖w‖ − (⟨v, w⟩ + ⟨w, v⟩)
        ≥ 2‖v‖‖w‖ − 2|⟨v, w⟩|,
which is nonnegative by the first part. Hence (‖v‖ + ‖w‖)² ≥ ‖v + w‖², and the result follows.

§5c Orthogonal and unitary transformations

As always in mathematics, we are particularly interested in functions which preserve the mathematical structure. Hence if V and W are inner product spaces it is of interest to investigate functions T: V → W which are linear and preserve the inner product, in the sense that
    ⟨T(u), T(v)⟩ = ⟨u, v⟩   for all u, v ∈ V.
Such a transformation is called an orthogonal transformation (for real inner product spaces) or a unitary transformation (in the complex case). It turns out that preservation of length is an equivalent condition; hence these transformations are sometimes called isometries.
5.12 Proposition  If V and W are inner product spaces then a linear transformation T: V → W preserves inner products if and only if it preserves lengths.

Proof. Since by definition the length of v is √⟨v, v⟩, it is trivial that a transformation which preserves inner products preserves lengths. For the converse, we assume that ‖T(v)‖ = ‖v‖ for all v ∈ V, and we must prove that ⟨T(u), T(v)⟩ = ⟨u, v⟩ for all u, v ∈ V. Let u, v ∈ V. Then
    ‖u + v‖² − ‖u‖² − ‖v‖² = ⟨u, v⟩ + ⟨v, u⟩
and similarly
    ‖T(u) + T(v)‖² − ‖T(u)‖² − ‖T(v)‖² = ⟨T(u), T(v)⟩ + ⟨T(v), T(u)⟩.
Since T is linear, T(u) + T(v) = T(u + v), and T preserves lengths; hence the two left hand sides are equal. Moreover ⟨u, v⟩ + ⟨v, u⟩ is twice the real part of ⟨u, v⟩, so that the real parts of ⟨u, v⟩ and ⟨T(u), T(v)⟩ are equal. But writing ⟨u, v⟩ = x + iy with x, y ∈ R we see that
    ⟨iu, v⟩ = (−i)⟨u, v⟩ = (−i)(x + iy) = y − ix,
so that the real part of ⟨iu, v⟩ is the imaginary part of ⟨u, v⟩. Likewise, the real part of ⟨T(iu), T(v)⟩ = ⟨iT(u), T(v)⟩ = (−i)⟨T(u), T(v)⟩ equals the imaginary part of ⟨T(u), T(v)⟩. By exactly the same argument as before the real parts of ⟨iu, v⟩ and ⟨T(iu), T(v)⟩ are equal, and we conclude that ⟨u, v⟩ and ⟨T(u), T(v)⟩ have the same imaginary part as well as the same real part. Hence ⟨T(u), T(v)⟩ = ⟨u, v⟩, as required.

Our next task is to describe those matrices T for which the corresponding linear transformation is an isometry. We know from §3b#11 that if T ∈ Mat(n × n, C) then the function φ: Cⁿ → Cⁿ defined by φ(x) = Tx is linear; furthermore, it is easily shown (see Exercise 14 of Chapter Three) that every linear transformation from Cⁿ to Cⁿ has this form for some matrix T. We need the following two definitions.

5.13 Definition  If A is a matrix with complex entries we define the conjugate of A to be the matrix Ā whose (i, j) entry is the conjugate of the (i, j) entry of A. The transpose of the conjugate of A will be denoted by 'A∗'.†

† Some people call A∗ the adjoint of A, in conflict with the definition of 'adjoint' given in Chapter 1.

By Exercise 6 of Chapter Two we deduce that (AB)∗ = B∗A∗ whenever AB is defined.
5.14 Definition  An n × n complex matrix T is said to be unitary if T∗ = T⁻¹. If T has real entries this becomes tT = T⁻¹, and T is said to be orthogonal.

5.15 Proposition  Let T ∈ Mat(n × n, C). The linear transformation φ: Cⁿ → Cⁿ defined by φ(x) = Tx is an isometry if and only if T is unitary.

Proof. Suppose first that φ is an isometry, and let (e1, e2, . . . , en) be the standard basis of Cⁿ. Note that ei · ej = δij. Since Tei is the ith column of T we see that (Tei)∗ is the ith row of T∗, and (Tei)∗(Tej) is the (i, j) entry of T∗T. Now
    (Tei)∗(Tej) = Tei · Tej = φ(ei) · φ(ej) = ei · ej = δij
since φ preserves the dot product, whence T∗T is the identity matrix, and it follows that T is unitary. Conversely, if T is unitary then T∗T = I, and for all u, v ∈ Cⁿ we have
    φ(u) · φ(v) = Tu · Tv = (Tu)∗(Tv) = u∗T∗Tv = u∗v = u · v.
Thus φ preserves the dot product, and is therefore an isometry.
so it is also true that T is unitary if and only if its rows form an orthonormal basis of t Cn . Sc ho ht em ol rs ive Un ati cs of M ath ity of Sy dn an ey dS tat ist ics .
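These facts are easy to check numerically. The following sketch (not part of the original notes; it uses NumPy) verifies the proposition and Comment 5.15.1 for a sample 2 × 2 unitary matrix:

```python
import numpy as np

# A sample unitary matrix: its columns are orthonormal in C^2.
T = np.array([[1, 1], [1j, -1j]]) / np.sqrt(2)

# T is unitary: T* T = I  (T* is the conjugate transpose).
assert np.allclose(T.conj().T @ T, np.eye(2))

# phi(x) = Tx preserves the dot product u . v = u* v.
rng = np.random.default_rng(0)
u = rng.standard_normal(2) + 1j * rng.standard_normal(2)
v = rng.standard_normal(2) + 1j * rng.standard_normal(2)
assert np.isclose(np.vdot(T @ u, T @ v), np.vdot(u, v))

# The columns of T form an orthonormal basis of C^2 (Comment 5.15.1).
gram = np.array([[np.vdot(T[:, i], T[:, j]) for j in range(2)]
                 for i in range(2)])
assert np.allclose(gram, np.eye(2))
print("unitary checks passed")
```

Note that `np.vdot` conjugates its first argument, matching the convention u · v = u*v used in this chapter.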
Examples

#10  Find an orthogonal matrix P and an upper triangular matrix U such that

    PU = ( 1 4 5 )
         ( 2 5 1 )
         ( 2 2 1 ).

Solution. Let the columns of P be v₁, v₂ and v₃, so that (v₁, v₂, v₃) must be an orthonormal basis of R³, and let

    U = ( a b c )
        ( 0 d e )
        ( 0 0 f ).

Then the columns of PU are av₁, bv₁ + dv₂ and cv₁ + ev₂ + fv₃. Thus we wish to find a, b, c, d, e and f such that

    av₁ = ᵗ( 1 2 2 ),   bv₁ + dv₂ = ᵗ( 4 5 2 ),   cv₁ + ev₂ + fv₃ = ᵗ( 5 1 1 ).

Since we require ‖v₁‖ = 1, the first equation gives a = 3 and v₁ = ᵗ( 1/3 2/3 2/3 ). Taking the dot product of both sides of the second equation with v₁ now gives b = 6 (since we require v₂ · v₁ = 0), and similarly taking the dot product of both sides of the third equation with v₁ gives c = 3. Substituting these values into the equations gives

    dv₂ = ᵗ( 2 1 −2 ),   ev₂ + fv₃ = ᵗ( 4 −1 −1 ).

The first of these equations gives d = 3 and v₂ = ᵗ( 2/3 1/3 −2/3 ), and then taking the dot product of v₂ with the other equation gives e = 3. After substituting these values the remaining equation becomes

    fv₃ = ᵗ( 2 −2 1 ),

and we see that f = 3 and v₃ = ᵗ( 2/3 −2/3 1/3 ). Thus the only possible solution to the given problem is

    P = ( 1/3  2/3  2/3 )        U = ( 3 6 3 )
        ( 2/3  1/3 −2/3 ),           ( 0 3 3 )
        ( 2/3 −2/3  1/3 )            ( 0 0 3 ).

It is easily checked that ᵗPP = I and that PU has the correct value. (Note that the method we used to calculate the columns of P was essentially just the Gram-Schmidt process.)
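The factorization found in #10 is what numerical libraries call a QR decomposition. A quick check (not part of the original notes; NumPy's `qr` may choose opposite signs for some columns, so we normalize the signs before comparing):

```python
import numpy as np

# The matrices found in #10: P orthogonal, U upper triangular, PU = M.
M = np.array([[1.0, 4, 5], [2, 5, 1], [2, 2, 1]])
P = np.array([[1.0, 2, 2], [2, 1, -2], [2, -2, 1]]) / 3
U = np.array([[3.0, 6, 3], [0, 3, 3], [0, 0, 3]])

assert np.allclose(P.T @ P, np.eye(3))   # tP P = I
assert np.allclose(P @ U, M)             # PU has the correct value

# A library QR routine finds the same factors, up to column signs;
# with the diagonal of R made positive the factorization is unique.
Q, R = np.linalg.qr(M)
signs = np.sign(np.diag(R))
assert np.allclose(Q * signs, P)               # flip columns of Q
assert np.allclose(signs[:, None] * R, U)      # flip rows of R
print("QR checks passed")
```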
#11  Let u₁, u₂ and u₃ be real numbers satisfying u₁² + u₂² + u₃² = 1, and let u be the 3 × 1 column with ith entry uᵢ. Define

    X = ( u₁²   u₁u₂  u₁u₃ )        Y = (  0    u₃  −u₂ )
        ( u₂u₁  u₂²   u₂u₃ ),           ( −u₃   0    u₁ )
        ( u₃u₁  u₃u₂  u₃²  )            (  u₂  −u₁   0  ).

Show that X² = X, Y² = X − I and XY = YX = 0, and show that for all θ the matrix R(u, θ) = cos θ I + (1 − cos θ)X + sin θ Y is orthogonal.

Solution. Observe that X = u(ᵗu), and so

    X² = u(ᵗu) u(ᵗu) = u ((ᵗu)u) (ᵗu) = u (u · u) (ᵗu) = X

since u · u = 1. By direct calculation we find that

    Yu = (  0    u₃  −u₂ ) ( u₁ )   ( 0 )
         ( −u₃   0    u₁ ) ( u₂ ) = ( 0 )
         (  u₂  −u₁   0  ) ( u₃ )   ( 0 ),

and hence YX = Yu(ᵗu) = 0. Since ᵗX = X and ᵗY = −Y it follows that ᵗ(XY) = (ᵗY)(ᵗX) = −YX = 0, and hence XY = 0 also. And an easy calculation yields

    Y² = ( −u₂² − u₃²   u₁u₂        u₁u₃       )   ( u₁² − 1  u₁u₂     u₁u₃    )
         ( u₂u₁        −u₁² − u₃²   u₂u₃       ) = ( u₂u₁     u₂² − 1  u₂u₃    ) = X − I,
         ( u₃u₁         u₃u₂       −u₁² − u₂² )   ( u₃u₁     u₃u₂     u₃² − 1 )

since u₁² + u₂² + u₃² = 1.

To check that R = R(u, θ) is orthogonal we show that (ᵗR)R = I. Since ᵗX = X and ᵗY = −Y,

    ᵗR = cos θ I + (1 − cos θ)X − sin θ Y.

So

    (ᵗR)R = cos²θ I + (1 − cos θ)² X² − sin²θ Y² + 2 cos θ(1 − cos θ)X + (1 − cos θ) sin θ(XY − YX)
          = cos²θ I + (1 − cos θ)² X + sin²θ (I − X) + 2 cos θ(1 − cos θ)X
          = (cos²θ + sin²θ)I + ((1 − cos θ)(1 + cos θ) − sin²θ)X
          = I

as required.
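The identities in #11 can be confirmed numerically. The sketch below (not part of the original notes) checks them for a sample unit vector, and also checks that R(u, θ) fixes the axis u, consistent with the geometric interpretation as a rotation about u:

```python
import numpy as np

# Check #11 for a sample unit vector u and angle theta.
u = np.array([1.0, 2.0, 2.0]) / 3.0          # u1^2 + u2^2 + u3^2 = 1
X = np.outer(u, u)                            # X = u (tu)
Y = np.array([[0,     u[2], -u[1]],
              [-u[2], 0,     u[0]],
              [u[1], -u[0],  0]])

assert np.allclose(X @ X, X)                  # X^2 = X
assert np.allclose(Y @ Y, X - np.eye(3))      # Y^2 = X - I
assert np.allclose(X @ Y, np.zeros((3, 3)))   # XY = 0
assert np.allclose(Y @ X, np.zeros((3, 3)))   # YX = 0

theta = 0.7
R = np.cos(theta) * np.eye(3) + (1 - np.cos(theta)) * X + np.sin(theta) * Y
assert np.allclose(R.T @ R, np.eye(3))        # R is orthogonal
assert np.allclose(R @ u, u)                  # the axis u is fixed by R
print("rotation checks passed")
```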
It can be shown that the matrix R(u, θ) defined in #11 above corresponds geometrically to a rotation through the angle θ about an axis in the direction of the vector u.

§5d  Quadratic forms

When using Cartesian coordinates to investigate geometrical problems, it invariably simplifies matters if one can choose coordinate axes which are in some sense natural for the problem concerned. It therefore becomes important to be able to keep track of what happens when a new coordinate system is chosen.

If T ∈ Mat(n × n, R) is an orthogonal matrix then, as we have seen, the transformation φ defined by φ(x) = Tx preserves lengths and angles. Hence if S is any set of points in Rⁿ then the set φ(S) = { Tx | x ∈ S } is congruent to S (in the sense of Euclidean geometry). Clearly such transformations will be important in geometrical situations. We comment that in the three-dimensional case the only length preserving linear transformations are rotations and reflections.

Observe now that if

    Tx = μ₁e₁ + μ₂e₂ + ··· + μₙeₙ,

where (e₁, e₂, ..., eₙ) is the standard basis of Rⁿ, then

    x = μ₁(T⁻¹e₁) + μ₂(T⁻¹e₂) + ··· + μₙ(T⁻¹eₙ),

whence the coordinates of Tx relative to the standard basis are the same as the coordinates of x relative to the basis (T⁻¹e₁, T⁻¹e₂, ..., T⁻¹eₙ). Note that these vectors are the columns of T⁻¹, and they form an orthonormal basis of Rⁿ since T⁻¹ is an orthogonal matrix. So choosing a new orthonormal basis is effectively the same as applying an orthogonal transformation, and doing so leaves all lengths and angles unchanged.

A quadratic form over R in variables x₁, x₂, ..., xₙ is an expression of the form

    Σᵢ aᵢᵢxᵢ² + 2 Σ_{i<j} aᵢⱼxᵢxⱼ

where the coefficients are real numbers. If A is the matrix with diagonal entries Aᵢᵢ = aᵢᵢ and off-diagonal entries Aᵢⱼ = Aⱼᵢ = aᵢⱼ (for i < j) then the quadratic form can be written as Q(x) = ᵗxAx, where x is the column with xᵢ as its ith entry. For example, multiplying out

    ( x y z ) ( a p q ) ( x )
              ( p b r ) ( y )
              ( q r c ) ( z )

gives ax² + by² + cz² + 2pxy + 2qxz + 2ryz.
In the above situation we call A the matrix of the quadratic form Q. Since Aᵢⱼ = Aⱼᵢ for all i and j we have that A is a symmetric matrix, in the sense of the following definition.

5.16 Definition  A real n × n matrix A is said to be symmetric if ᵗA = A, and a complex n × n matrix A is said to be Hermitian if A* = A.

From our discussion above we know that introducing a new coordinate system corresponding to an orthonormal basis of Rⁿ amounts to introducing new variables x′ which are related to the old variables x by x = Tx′, where T is the orthogonal matrix whose columns are the new basis vectors.† In terms of the new variables the quadratic form Q(x) = ᵗxAx becomes

    Q′(x′) = ᵗ(Tx′)A(Tx′) = ᵗx′(ᵗTAT)x′ = ᵗx′A′x′

where A′ = ᵗTAT. We would now like to be able to rotate the axes so as to simplify the expression for Q(x) as much as possible.

If f is an arbitrary smooth real valued function on Rⁿ, and if we choose a critical point of f as the origin of our coordinate system, then the terms of degree one in the Taylor series for f vanish. Ignoring the terms of degree three and higher then gives an approximation to f of the form c + Q(x), where c = f(0) is a constant and Q is a quadratic form, the coefficients of the corresponding symmetric matrix being the second order partial derivatives of f at the critical point. Simplifying Q thus helps us determine the behaviour of f near the critical point.

The following remarkable facts concerning eigenvalues and eigenvectors of Hermitian matrices are very important, and not hard to prove. Of course the theorem applies to Hermitian matrices which happen to be real; that is, the same results hold for real symmetric matrices.

5.17 Theorem  Let A be a Hermitian matrix. Then
(i)  all eigenvalues of A are real, and
(ii)  if λ and μ are two distinct eigenvalues of A, and if u and v are corresponding eigenvectors, then u · v = 0.

The proof is left as an exercise (although a hint is given).

† The T in this paragraph corresponds to the T⁻¹ above.
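Both parts of Theorem 5.17 are easy to observe numerically. A sketch (not part of the original notes), using NumPy's `eigh`, which is designed for Hermitian matrices:

```python
import numpy as np

# Theorem 5.17, checked for a sample Hermitian matrix.
A = np.array([[2.0, 1 - 1j], [1 + 1j, 3.0]])
assert np.allclose(A, A.conj().T)              # A is Hermitian

eigenvalues, eigenvectors = np.linalg.eigh(A)  # eigh assumes A Hermitian
# (i) all eigenvalues are real (eigh returns them as real numbers;
#     here they are 1 and 4).
assert np.isrealobj(eigenvalues)
# (ii) eigenvectors for distinct eigenvalues are orthogonal.
assert not np.isclose(eigenvalues[0], eigenvalues[1])
assert np.isclose(np.vdot(eigenvectors[:, 0], eigenvectors[:, 1]), 0)
print("Hermitian checks passed")
```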
5.18 Definition  Two n × n real matrices A and A′ are said to be orthogonally similar if there exists an orthogonal matrix T such that A′ = ᵗTAT. Two n × n complex matrices A and A′ are said to be unitarily similar if there exists a unitary matrix T such that A′ = T*AT.

Comment 5.18.1  In both parts of the above definition we could alternatively have written A′ = T⁻¹AT, since T⁻¹ = ᵗT in the real case and T⁻¹ = T* in the complex case.

We have shown that the change of coordinates corresponding to choosing a new orthonormal basis for Rⁿ induces an orthogonal similarity transformation on the matrix of a quadratic form. The main theorem of this section says that it is always possible to diagonalize a symmetric matrix by an orthogonal similarity transformation. Unfortunately, we have to make use of the following fact, whose proof is beyond the scope of this book:

5.18.2  Every square complex matrix has at least one complex eigenvalue.

(In fact 5.18.2 is a trivial consequence of the "Fundamental Theorem of Algebra", which asserts that the field C is algebraically closed. See also §9c.)

5.19 Theorem
(i)  If A is an n × n real symmetric matrix then there exists an orthogonal matrix T such that ᵗTAT is a diagonal matrix.
(ii)  If A is an n × n Hermitian matrix then there exists a unitary matrix U such that U*AU is a real diagonal matrix.

Proof.  We prove only part (i), since the proof of part (ii) is virtually identical. The proof is by induction on n. The case n = 1 is trivially true, since every 1 × 1 matrix is diagonal. Assume then that all k × k symmetric matrices over R are orthogonally similar to diagonal matrices, and let A be a (k + 1) × (k + 1) real symmetric matrix.

By 5.18.2 we know that A has at least one complex eigenvalue λ, and by 5.17 we know that λ ∈ R. Let v be an eigenvector corresponding to the eigenvalue λ. Since v constitutes a basis for a one-dimensional subspace of R^(k+1), we can apply the Gram-Schmidt process to find an orthogonal basis (v₁, v₂, ..., v_(k+1)) of R^(k+1) with v₁ = v. Replacing each vᵢ by ‖vᵢ‖⁻¹vᵢ makes this into an orthonormal basis, with v₁ still a λ-eigenvector for A. Now define P to be the (real) matrix which has vᵢ as its ith column (for each i). By the comments in 5.15.1 above we see that P is orthogonal.

The first column of P⁻¹AP is P⁻¹Av₁, and since v₁ is a λ-eigenvector of A this simplifies to P⁻¹(λv₁) = λP⁻¹v₁. But the same reasoning shows that P⁻¹v₁ is the first column of P⁻¹P = I, since v₁ is the first column of P. So we deduce that the first entry of the first column of P⁻¹AP is λ, and all the other entries in the first column are zero. That is,

    P⁻¹AP = ( λ  z )
            ( 0  B )

for some k-component row z and some k × k matrix B, with 0 being the k-component zero column. Since P is orthogonal and A is symmetric we have that P⁻¹AP = ᵗPᵗAP = ᵗ(ᵗPAP) = ᵗ(P⁻¹AP), so that P⁻¹AP is symmetric. It follows that z = ᵗ0 and B is symmetric: we have

    P⁻¹AP = ( λ  ᵗ0 )
            ( 0  B  ).

By the inductive hypothesis there exists a real orthogonal matrix Q such that Q⁻¹BQ is diagonal. Define

    Q′ = ( 1  ᵗ0 )
         ( 0  Q  ).

It is clear by multiplication of block matrices and the fact that (ᵗQ)Q = I that (ᵗQ′)Q′ = I. Hence Q′ is orthogonal, and since the product of two orthogonal matrices is orthogonal (see 5.15.3) it follows that PQ′ is orthogonal too. By multiplication of partitioned matrices we find that

    (PQ′)⁻¹A(PQ′) = ( 1  ᵗ0  ) ( λ  ᵗ0 ) ( 1  ᵗ0 ) = ( λ  ᵗ0     )
                    ( 0  Q⁻¹ ) ( 0  B  ) ( 0  Q  )   ( 0  Q⁻¹BQ )

which is diagonal since Q⁻¹BQ is. Thus (PQ′)⁻¹A(PQ′) is diagonal, and the induction is complete.
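Theorem 5.19(i) in action: the sketch below (not part of the original notes) uses NumPy's `eigh`, whose returned eigenvector matrix is exactly an orthogonal T with ᵗTAT diagonal.

```python
import numpy as np

# Diagonalize a real symmetric matrix by an orthogonal similarity.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigenvalues, T = np.linalg.eigh(A)   # columns of T are eigenvectors of A

assert np.allclose(T.T @ T, np.eye(2))           # T is orthogonal
D = T.T @ A @ T                                  # tT A T
assert np.allclose(D, np.diag(eigenvalues))      # ... is diagonal
assert np.allclose(np.sort(eigenvalues), [1.0, 3.0])  # eigenvalues of A
print("diagonalization checks passed")
```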
Comments
5.19.1  Given an n × n symmetric (or Hermitian) matrix, the problem of finding an orthogonal (or unitary) matrix which diagonalizes it amounts to finding an orthonormal basis of Rⁿ (or Cⁿ) consisting of eigenvectors of the matrix. To do this, proceed in the same way as for any matrix: find the characteristic equation and solve it. If there are n distinct eigenvalues then the eigenvectors are uniquely determined up to a scalar factor, and the theory above guarantees that they will be at right angles to each other. So if the eigenvectors you find for two different eigenvalues do not have the property that their dot product is zero, it means that you have made a mistake. If the characteristic equation has a repeated root λ then you will have to find more than one eigenvector corresponding to λ. In this case, when you solve the linear equations (A − λI)x = 0 you will find that the number of arbitrary parameters in the general solution is equal to the multiplicity of λ as a root of the characteristic polynomial. You must find an orthonormal basis for this solution space (by using the Gram-Schmidt process, for instance).

5.19.2  Let A be a Hermitian matrix and let U be a unitary matrix which diagonalizes A. It is clear from the above discussion that the diagonal entries of U⁻¹AU are exactly the eigenvalues of A. Note also that det U⁻¹AU = det U⁻¹ det A det U = det A, and since the determinant of the diagonal matrix U⁻¹AU is just the product of the diagonal entries we conclude that the determinant of A is just the product of its eigenvalues. (This can also be proved by observing that the determinant and the product of the eigenvalues are both equal to the constant term of the characteristic polynomial.)

The quadratic form Q(x) = ᵗxAx and the symmetric matrix A are both said to be positive definite if Q(x) > 0 for all nonzero x ∈ Rⁿ. If A is positive definite and T is an invertible matrix then certainly ᵗ(Tx)A(Tx) > 0 for all nonzero x ∈ Rⁿ, and it follows that ᵗTAT is positive definite also. Note that ᵗTAT is also symmetric. The matrices A and ᵗTAT are said to be congruent. Obviously, symmetric matrices which are orthogonally similar are congruent; the reverse, however, is not true.

It is easily checked that a diagonal matrix is positive definite if and only if the diagonal entries are all positive. It follows that an arbitrary real symmetric matrix is positive definite if and only if all its eigenvalues are positive. The following proposition can also be used as a test for positive definiteness.

5.20 Proposition  A real symmetric n × n matrix A is positive definite if and only if for all k from 1 to n the k × k matrix Aₖ with (i, j)-entry Aᵢⱼ (obtained by deleting the last n − k rows and columns of A) has positive determinant.
Proof.  Assume first that A is positive definite. Then each Aₖ is positive definite, for if x is any nonzero k-component column and y the n-component column obtained from x by appending n − k zeros then

    0 < ᵗyAy = ( ᵗx  ᵗ0 ) ( Aₖ ∗ ) ( x ) = ᵗxAₖx.
                          ( ∗  ∗ ) ( 0 )

Now a positive definite matrix has positive eigenvalues, and so its determinant, being the product of its eigenvalues, is positive also. Thus the determinant of each Aₖ is positive.

We must prove, conversely, that if all the Aₖ have positive determinants then A is positive definite. We use induction on n, the case n = 1 being trivial. For the case n > 1 the inductive hypothesis gives that Aₙ₋₁ is positive definite, since its leading submatrices A₁, A₂, ..., Aₙ₋₁ all have positive determinants. We have

    A = ( Aₙ₋₁  v )
        ( ᵗv    μ )

for some (n − 1)-component column v and some real number μ. We now prove that A is congruent to

    A′ = ( Aₙ₋₁  0 )
         ( ᵗ0    λ )

for an appropriate choice of λ. Defining λ = μ − ᵗvAₙ₋₁⁻¹v and

    X = ( Iₙ₋₁  Aₙ₋₁⁻¹v )
        ( ᵗ0    1       )

an easy calculation gives ᵗXA′X = A. Since X is upper triangular with diagonal entries 1, we have det X = 1, and hence det A = det A′ = λ det Aₙ₋₁. Since det Aₙ₋₁ is positive, λ is positive if and only if det A is positive; as det A > 0 by hypothesis, λ is a positive number. But since Aₙ₋₁ is positive definite it follows immediately that A′ is positive definite whenever λ is a positive number, and since A = ᵗXA′X with X invertible, A is positive definite too.
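Proposition 5.20 translates directly into a computational test. A minimal sketch (not part of the original notes; the function name and tolerance are my own choices):

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Test a real symmetric matrix via Proposition 5.20: A is positive
    definite iff every leading principal minor det(A_k) is positive."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > tol for k in range(1, n + 1))

# A symmetric matrix with all eigenvalues positive ...
assert is_positive_definite([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
# ... and one with a negative eigenvalue (det A_1 = 1 but det A_2 = -3).
assert not is_positive_definite([[1, 2], [2, 1]])
print("minor test checks passed")
```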
Suppose that a surface in three dimensional Euclidean space is defined by an equation of the form Q(x, y, z) = constant, where Q is a quadratic form. We can find an orthogonal matrix to diagonalize the symmetric matrix associated with Q. It is easily seen that an orthogonal matrix necessarily has determinant equal to either 1 or −1, and by multiplying a column by −1 if necessary we can make sure that our diagonalizing matrix has determinant 1. Orthogonal transition matrices of determinant 1 correspond simply to rotations of the coordinate axes, and so we deduce that a suitable rotation transforms the equation of the surface to

    λ(x′)² + μ(y′)² + ν(z′)² = constant

where the coefficients λ, μ and ν are the eigenvalues of the matrix we started with. The nature of these surfaces depends on the signs of the coefficients, and the separate cases are easily enumerated.

Exercises

1.  Prove that the dot product is an inner product on Rⁿ.

2.  Prove that ⟨f, g⟩ = ∫ₐᵇ f(x)g(x) dx defines an inner product on C[a, b].

3.  Prove part (iii) of Proposition 5.10.

4.  Prove that orthogonal projections are linear transformations.

5.  (i)  Use calculus to prove that the minimum value of ⟨v − λw, v − λw⟩ occurs at λ = ⟨v, w⟩/⟨w, w⟩. Assume any theorems of calculus you need.
    (ii)  Let V be a real inner product space and v, w ∈ V. Put λ = ⟨v, w⟩/⟨w, w⟩ and use ⟨v − λw, v − λw⟩ ≥ 0 to prove the Cauchy-Schwarz inequality in any inner product space.

6.  Define the distance between two elements v and w of an inner product space V by d(v, w) = ‖v − w‖. Prove that
    (i)  d(v, w) ≥ 0, with d(v, w) = 0 only if v = w,
    (ii)  d(v, w) = d(w, v),
    (iii)  d(u, w) ≤ d(u, v) + d(v, w),
    (iv)  d(u + v, u + w) = d(v, w),
for all u, v, w ∈ V.

7.  Prove that (AB)* = (B*)(A*) for all complex matrices A and B such that AB is defined. Hence prove that the product of two unitary matrices is unitary. Prove also that the inverse of a unitary matrix is unitary.

8.  (i)  Let A be a Hermitian matrix and λ an eigenvalue of A. Prove that λ is real. (Hint: Let v be a nonzero column satisfying Av = λv. Prove that v*A* = λ̄v*, and then calculate v*Av in two different ways.)
    (ii)  Let λ and μ be two distinct eigenvalues of the Hermitian matrix A, and let u and v be corresponding eigenvectors. Prove that u and v are orthogonal to each other. (Hint: Consider u*Av.)
Chapter 6: Relationships between spaces

In this chapter we return to the general theory of vector spaces over an arbitrary field. Our first task is to investigate isomorphism, the "sameness" of vector spaces. We then consider various ways of constructing spaces from others.

§6a  Isomorphism

Let V = Mat(2 × 2, R), the set of all 2 × 2 matrices over R, and let W = R⁴, the set of all 4-component columns over R. Then V and W are both vector spaces over R, relative to the usual addition and scalar multiplication for matrices and columns. Intuitively, to say that V and W are isomorphic is to say that, as vector spaces, they are the same. One might even like to say that V and W are really the same space; after all, whether one chooses to write four real numbers in a column or a rectangular array is surely a notational matter of no great substance. Furthermore, there is an obvious one-to-one correspondence between V and W: the function f from V to W defined by

    f ( a b )   ( a )
      ( c d ) = ( b )
                ( c )
                ( d )

is bijective. It is clear also that f interacts in the best possible way with the addition and scalar multiplication functions corresponding to V and W: the column which corresponds to the sum of two given matrices A and B is the sum of the column corresponding to A and the column corresponding to B, and the column corresponding to a scalar multiple of A is the same scalar times the column corresponding to A. In fact we say that V and W are isomorphic vector spaces, and the function f is called an isomorphism.
The property of the one-to-one correspondence f which was written out in words in the last paragraph becomes, when written symbolically,

    f(A + B) = f(A) + f(B)
    f(λA) = λf(A)

for all A, B ∈ V and λ ∈ R. That is, f is a linear transformation.

6.1 Definition  Let V and W be vector spaces over the same field F. A function f: V → W which is bijective and linear is called an isomorphism of vector spaces. If there is an isomorphism from V to W then V and W are said to be isomorphic, and we write V ≅ W.

Comments
6.1.1  Rephrasing the definition, an isomorphism is a one-to-one correspondence which preserves addition and scalar multiplication. Alternatively, an isomorphism is a linear transformation which is one-to-one and onto.
6.1.2  Every vector space is obviously isomorphic to itself: the identity function i (defined by i(x) = x for all x) is an isomorphism. This says that isomorphism is a reflexive relation.
6.1.3  Since isomorphisms are bijective functions they have inverses. We leave it as an exercise to prove that the inverse of an isomorphism is also an isomorphism. Thus if V is isomorphic to W then W is isomorphic to V; isomorphism is a symmetric relation.
6.1.4  If f: U → W and g: V → U are isomorphisms then the composite function fg: V → W is also an isomorphism. This proof is also left as an exercise. Thus if V is isomorphic to U and U is isomorphic to W then V is isomorphic to W; isomorphism is a transitive relation.

The above comments show that isomorphism is an equivalence relation (see §1e), and it follows that vector spaces can be separated into mutually nonoverlapping classes, which are known as isomorphism classes, such that spaces in the same class are isomorphic to one another. Restricting attention to finitely generated vector spaces, the next proposition shows that (for a given field F) there is exactly one equivalence class for each nonnegative integer: there is, essentially, only one vector space of each dimension.
6.2 Proposition  If U and V are finitely generated vector spaces over F of the same dimension d then they are isomorphic.

Proof.  Let b be a basis of V. We have seen (see 4.19) that v ↦ cvb(v) defines a bijective linear transformation from V to F^d. Thus V and F^d are isomorphic. By the same argument, so are U and F^d, and it follows by 6.1.3 and 6.1.4 that U and V are isomorphic.

Examples

#1  Prove that the function f from Mat(2, R) to R⁴ defined at the start of this section is an isomorphism.

Solution. Let A, B ∈ Mat(2, R) and λ ∈ R. Then

    A = ( a b )        B = ( e k )
        ( c d ),           ( g h )

for some a, b, c, d, e, g, h, k ∈ R, and we find

    f(A + B) = f ( a+e b+k ) = ᵗ( a+e  b+k  c+g  d+h )
                 ( c+g d+h )
             = ᵗ( a b c d ) + ᵗ( e k g h ) = f(A) + f(B)

and

    f(λA) = f ( λa λb ) = ᵗ( λa  λb  λc  λd ) = λ ᵗ( a b c d ) = λf(A).
              ( λc λd )

Hence f is a linear transformation.

Let v ∈ R⁴ be arbitrary. Then v = ᵗ( x y z w ) for some x, y, z, w ∈ R, and

    f ( x y ) = ᵗ( x y z w ) = v.
      ( z w )

Hence f is surjective. Suppose that A, B ∈ Mat(2, R) are such that f(A) = f(B). Writing

    A = ( a b )   and   B = ( e k )
        ( c d )             ( g h )

we have

    ᵗ( a b c d ) = f(A) = f(B) = ᵗ( e k g h ),

whence a = e, b = k, c = g and d = h, showing that A = B. Thus f is injective. Since it is also surjective and linear, f is an isomorphism.

#2  Let P be the set of all polynomial functions from R to R of degree two or less, and let F be the set of all functions from the set {1, 2, 3} to R. Show that P and F are isomorphic vector spaces over R.

Solution. Observe that P is a subset of the set of all functions from R to R, which we know to be a vector space over R (by §3b#6). A function f: R → R is in the subset P if and only if there exist a, b, c ∈ R such that f(x) = ax² + bx + c for all x ∈ R. Let f, g ∈ P and λ ∈ R be arbitrary. We have

    f(x) = ax² + bx + c,        g(x) = a′x² + b′x + c′

for some a, b, c, a′, b′, c′ ∈ R. Now by the definitions of addition and scalar multiplication for functions we have, for all x ∈ R,

    (f + g)(x) = f(x) + g(x) = (ax² + bx + c) + (a′x² + b′x + c′)
               = (a + a′)x² + (b + b′)x + (c + c′)

and similarly

    (λf)(x) = λ f(x) = λ(ax² + bx + c) = (λa)x² + (λb)x + (λc),

so that f + g and λf are both in P. Hence P is closed under addition and scalar multiplication, and is therefore a subspace. Since §3b#6 also gives that F is a vector space over R, it remains to find an isomorphism φ: P → F. There are many ways to do this. One is as follows: if f ∈ P let φ(f) be the restriction of f to {1, 2, 3}. That is, if f ∈ P then φ(f) ∈ F is defined by

    (φ(f))(1) = f(1),    (φ(f))(2) = f(2),    (φ(f))(3) = f(3).

We prove first that φ as defined above is linear. Let f, g ∈ P and λ, μ ∈ R. Then for each i ∈ {1, 2, 3} we have

    (φ(λf + μg))(i) = (λf + μg)(i) = (λf)(i) + (μg)(i) = λf(i) + μg(i)
                    = λ(φ(f))(i) + μ(φ(g))(i) = (λφ(f) + μφ(g))(i),

and so φ(λf + μg) = λφ(f) + μφ(g). Thus φ preserves addition and scalar multiplication.

To prove that φ is injective we must prove that if f and g are polynomials of degree at most two such that φ(f) = φ(g) then f = g. The assumption that φ(f) = φ(g) gives f(i) = g(i), and hence (f − g)(i) = 0, for i = 1, 2, 3. Thus x − 1, x − 2 and x − 3 are all factors of (f − g)(x), and so

    (f − g)(x) = q(x)(x − 1)(x − 2)(x − 3)

for some polynomial q. Now since f − g cannot have degree greater than two we see that the only possibility is q = 0, and it follows that f = g, as required.

Finally, we must prove that φ is surjective, and this involves showing that for every α: {1, 2, 3} → R there is a polynomial f of degree at most two with f(i) = α(i) for i = 1, 2, 3. In fact one can immediately write down a suitable f:

    f(x) = ½(x − 2)(x − 3)α(1) − (x − 1)(x − 3)α(2) + ½(x − 1)(x − 2)α(3).

(This formula is an instance of Lagrange's interpolation formula.) There is an alternative and possibly shorter proof using (instead of φ) an isomorphism which maps the polynomial ax² + bx + c to the α ∈ F given by α(1) = a, α(2) = b and α(3) = c.
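The surjectivity formula in #2 is easy to exercise in code. A sketch (not part of the original notes; the helper name `phi_inverse` is my own) building the interpolating quadratic and checking it against a library fit:

```python
import numpy as np

def phi_inverse(alpha):
    """Lagrange's formula from #2: the unique polynomial f of degree <= 2
    with f(i) = alpha[i-1] for i = 1, 2, 3, returned as a function."""
    a1, a2, a3 = alpha
    return lambda x: (0.5 * (x - 2) * (x - 3) * a1
                      - (x - 1) * (x - 3) * a2
                      + 0.5 * (x - 1) * (x - 2) * a3)

# phi_inverse is a right inverse of the restriction map phi.
f = phi_inverse((4.0, -1.0, 7.0))
assert [f(1), f(2), f(3)] == [4.0, -1.0, 7.0]

# It agrees with the quadratic fitted through the same three points.
coeffs = np.polyfit([1, 2, 3], [4.0, -1.0, 7.0], deg=2)
assert np.allclose([f(x) for x in (0.0, 1.5, 10.0)],
                   np.polyval(coeffs, [0.0, 1.5, 10.0]))
print("interpolation checks passed")
```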
§6b  Direct sums

Let F be a field, and consider the space F⁶, consisting of all 6-component columns over F. The subset

    S₁ = { ᵗ( α β 0 0 0 0 ) | α, β ∈ F }

is a subspace of F⁶ isomorphic to F²; specifically, the map which appends four zero components to the bottom of a 2-component column is an isomorphism from F² to S₁. Likewise,

    S₂ = { ᵗ( 0 0 γ δ 0 0 ) | γ, δ ∈ F }   and   S₃ = { ᵗ( 0 0 0 0 ε ζ ) | ε, ζ ∈ F }

are also subspaces of F⁶. It is easy to see that for an arbitrary v ∈ F⁶ there exist uniquely determined v₁, v₂ and v₃ such that v = v₁ + v₂ + v₃ and vᵢ ∈ Sᵢ; specifically,

    ᵗ( α β γ δ ε ζ ) = ᵗ( α β 0 0 0 0 ) + ᵗ( 0 0 γ δ 0 0 ) + ᵗ( 0 0 0 0 ε ζ ).

We say that F⁶ is the "direct sum" of its subspaces S₁, S₂ and S₃.

More generally, suppose that b = (v₁, v₂, ..., vₙ) is a basis for the vector space V, and suppose that we split b into k parts:

    b₁ = (v₁, v₂, ..., v_r₁)
    b₂ = (v_{r₁+1}, v_{r₁+2}, ..., v_r₂)
    b₃ = (v_{r₂+1}, v_{r₂+2}, ..., v_r₃)
        ...
    bₖ = (v_{r_{k−1}+1}, v_{r_{k−1}+2}, ..., vₙ)

where the rⱼ are integers with 0 = r₀ < r₁ < r₂ < ··· < r_{k−1} < rₖ = n. For each j = 1, 2, ..., k let Uⱼ be the subspace of V spanned by bⱼ. It is not hard to see that bⱼ is a basis of Uⱼ.

6.3 Proposition  In the situation described above, for each v ∈ V there exist uniquely determined u₁, u₂, ..., uₖ such that v = u₁ + u₂ + ··· + uₖ and uⱼ ∈ Uⱼ for each j.

Proof.  Let v ∈ V. Since b spans V there exist scalars λᵢ with

    v = Σᵢ₌₁ⁿ λᵢvᵢ = Σᵢ₌₁^{r₁} λᵢvᵢ + Σ_{i=r₁+1}^{r₂} λᵢvᵢ + ··· + Σ_{i=r_{k−1}+1}^{n} λᵢvᵢ.

Defining uⱼ = Σ_{i=r_{j−1}+1}^{rⱼ} λᵢvᵢ for each j, we see that uⱼ ∈ Span bⱼ = Uⱼ and v = Σⱼ₌₁ᵏ uⱼ, so that each v can be expressed in the required form.

To prove that the expression is unique we must show that if uⱼ, u′ⱼ ∈ Uⱼ and Σⱼ₌₁ᵏ uⱼ = Σⱼ₌₁ᵏ u′ⱼ then uⱼ = u′ⱼ for each j. Since Uⱼ = Span bⱼ we may write (for each j)

    uⱼ = Σ_{i=r_{j−1}+1}^{rⱼ} λᵢvᵢ   and   u′ⱼ = Σ_{i=r_{j−1}+1}^{rⱼ} λ′ᵢvᵢ

for some scalars λᵢ and λ′ᵢ. Now we have

    Σᵢ₌₁ⁿ λᵢvᵢ = Σⱼ₌₁ᵏ uⱼ = Σⱼ₌₁ᵏ u′ⱼ = Σᵢ₌₁ⁿ λ′ᵢvᵢ,

and since b is a basis for V it follows from 4.15 that λᵢ = λ′ᵢ for each i. Hence for each j,

    uⱼ = Σ_{i=r_{j−1}+1}^{rⱼ} λᵢvᵢ = Σ_{i=r_{j−1}+1}^{rⱼ} λ′ᵢvᵢ = u′ⱼ

as required.

6.4 Definition  A vector space V is said to be the direct sum of subspaces U₁, U₂, ..., Uₖ if every element of V is uniquely expressible in the form u₁ + u₂ + ··· + uₖ with uⱼ ∈ Uⱼ for each j. To signify that V is the direct sum of the Uⱼ we write V = U₁ ⊕ U₂ ⊕ ··· ⊕ Uₖ.

6.5 Definition  If V = U₁ ⊕ U₂ then U₁ and U₂ are called complementary subspaces.

Example

#3  Let b = (v₁, v₂, ..., vₙ) be a basis of V and for each i let Uᵢ be the one-dimensional space spanned by vᵢ. Prove that V = U₁ ⊕ U₂ ⊕ ··· ⊕ Uₙ.

Solution. Observe that this corresponds to the particular case of 6.3 for which the parts into which the basis is split have one element each. Let v be an arbitrary element of V. Since b spans V there exist λᵢ ∈ F such that v = Σᵢ₌₁ⁿ λᵢvᵢ. Letting uᵢ = λᵢvᵢ we find that uᵢ ∈ Uᵢ and v = Σᵢ₌₁ⁿ uᵢ, so that v can be expressed in the required form. It remains to prove that the expression is unique, and this follows directly from 4.15. For suppose that we also have v = Σᵢ₌₁ⁿ u′ᵢ with u′ᵢ ∈ Uᵢ. Since Uᵢ is spanned by vᵢ we have u′ᵢ = λ′ᵢvᵢ for some scalar λ′ᵢ, and now 4.15 gives λᵢ = λ′ᵢ, whence u′ᵢ = uᵢ, as required.

The case of two direct summands is the most important:

6.6 Theorem  If U₁ and U₂ are subspaces of the vector space V then V = U₁ ⊕ U₂ if and only if V = U₁ + U₂ and U₁ ∩ U₂ = {0}.

Proof.  Recall that, by definition, U₁ + U₂ consists of all elements of V of the form u₁ + u₂ with u₁ ∈ U₁, u₂ ∈ U₂. (See Exercise 9 of Chapter Three.) Thus V = U₁ + U₂ if and only if each element of V is expressible as u₁ + u₂.

Assume that V = U₁ + U₂ and U₁ ∩ U₂ = {0}. Then each element of V can be expressed in the form u₁ + u₂, and to show that V = U₁ ⊕ U₂ it suffices to prove that these expressions are unique. So assume that u₁ + u₂ = u′₁ + u′₂ with u₁, u′₁ ∈ U₁ and u₂, u′₂ ∈ U₂. Then u₁ − u′₁ = u′₂ − u₂; let us call this element v. By closure properties for the subspace U₁ we have v = u₁ − u′₁ ∈ U₁ (since u₁, u′₁ ∈ U₁), and similarly closure of U₂ gives v = u′₂ − u₂ ∈ U₂ (since u₂, u′₂ ∈ U₂). Hence v ∈ U₁ ∩ U₂ = {0}; that is, v = 0, and we have shown that u₁ − u′₁ = 0 = u′₂ − u₂. Thus u₁ = u′₁ and u₂ = u′₂, and uniqueness is proved.

Conversely, assume that V = U₁ ⊕ U₂. Then certainly each element of V can be expressed as u₁ + u₂ with uᵢ ∈ Uᵢ, so V = U₁ + U₂. It remains to prove that U₁ ∩ U₂ = {0}. Subspaces always contain the zero vector, so {0} ⊆ U₁ ∩ U₂. Now let v ∈ U₁ ∩ U₂ be arbitrary. If we define

    u₁ = v, u₂ = 0   and   u′₁ = 0, u′₂ = v

then we see that u₁, u′₁ ∈ U₁, u₂, u′₂ ∈ U₂ and also u₁ + u₂ = v = u′₁ + u′₂. Since each element of V is uniquely expressible as the sum of an element of U₁ and an element of U₂ this equation implies that u₁ = u′₁ and u₂ = u′₂; that is, v = 0.

The above theorem can be generalized to deal with the case of more than two direct summands. If U₁, U₂, ..., Uₖ are subspaces of V it is natural to define their sum to be

    U₁ + U₂ + ··· + Uₖ = { u₁ + u₂ + ··· + uₖ | uᵢ ∈ Uᵢ }

and it is easy to prove that this is always a subspace of V. One might hope that V is the direct sum of the Uᵢ if the Uᵢ generate V, in the sense that V is the sum of the Uᵢ, and Uᵢ ∩ Uⱼ = {0} whenever i ≠ j. However, consideration of #3 above shows that this condition will not be sufficient even in the case of one-dimensional summands, since to prove that (v₁, v₂, ..., vₙ) is a basis it is not sufficient to prove that vᵢ is not a scalar multiple of vⱼ for i ≠ j. What we really need, then, is a generalization of the notion of linear independence.
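For concrete subspaces of Rⁿ given by spanning columns, the two conditions of Theorem 6.6 can be checked with rank computations (using the dimension formula dim(U₁ ∩ U₂) = dim U₁ + dim U₂ − dim(U₁ + U₂), which is proved elsewhere). A sketch, not part of the original notes:

```python
import numpy as np

# U1 and U2 are subspaces of R^3, given by bases stored as matrix columns.
U1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # the xy-plane
U2 = np.array([[0.0], [0.0], [1.0]])                   # the z-axis

rank = np.linalg.matrix_rank
# dim(U1 + U2) is the rank of the combined columns.
sum_dim = rank(np.hstack([U1, U2]))
# dim(U1 ∩ U2) = dim U1 + dim U2 - dim(U1 + U2).
int_dim = rank(U1) + rank(U2) - sum_dim

# V = U1 + U2 and U1 ∩ U2 = {0}, so R^3 = U1 ⊕ U2 by Theorem 6.6.
assert sum_dim == 3
assert int_dim == 0
print("direct sum checks passed")
```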
6.7 Definition  The subspaces U1, U2, ..., Uk of V are said to be independent if the only solution of

u1 + u2 + · · · + uk = 0,   with ui ∈ Ui for i = 1, 2, ..., k,

is given by u1 = u2 = · · · = uk = 0.

We can now state the correct generalization of 6.6:

6.8 Theorem  Let U1, U2, ..., Uk be subspaces of the vector space V. Then V = U1 ⊕ U2 ⊕ · · · ⊕ Uk if and only if the Ui are independent and generate V.

We omit the proof of this since it is a straightforward adaption of the proof of 6.6. Note in particular that the uniqueness part of the definition of the direct sum corresponds to the independence part of 6.8.

We have proved in 6.3 that a direct sum decomposition of a space is obtained by splitting a basis into parts, the summands being the subspaces spanned by the various parts. Conversely, given a direct sum decomposition of V one can obtain a basis of V by combining bases of the summands. That is:

6.9 Theorem  Let V = U1 ⊕ U2 ⊕ · · · ⊕ Uk, and for each j = 1, 2, ..., k let bj = (v_1^{(j)}, v_2^{(j)}, ..., v_{d_j}^{(j)}) be a basis of the subspace Uj. Then

b = (v_1^{(1)}, ..., v_{d_1}^{(1)}, v_1^{(2)}, ..., v_{d_2}^{(2)}, ..., v_1^{(k)}, ..., v_{d_k}^{(k)})

is a basis of V, and dim V = Σ_{j=1}^k dim Uj.

Proof. Let v ∈ V. Then v = Σ_{j=1}^k uj for some uj ∈ Uj, and since bj spans Uj we have uj = Σ_{i=1}^{d_j} λ_{ij} v_i^{(j)} for some scalars λ_{ij}. Thus

v = Σ_{i=1}^{d_1} λ_{i1} v_i^{(1)} + Σ_{i=1}^{d_2} λ_{i2} v_i^{(2)} + · · · + Σ_{i=1}^{d_k} λ_{ik} v_i^{(k)},

a linear combination of the elements of b. Hence b spans V.

Now suppose we have an expression for 0 as a linear combination of the elements of b; that is, assume that

0 = Σ_{i=1}^{d_1} λ_{i1} v_i^{(1)} + Σ_{i=1}^{d_2} λ_{i2} v_i^{(2)} + · · · + Σ_{i=1}^{d_k} λ_{ik} v_i^{(k)}.
If we define uj = Σ_{i=1}^{d_j} λ_{ij} v_i^{(j)} for each j then we have uj ∈ Uj and u1 + u2 + · · · + uk = 0. Since V is the direct sum of the Uj we know from 6.8 that the Ui are independent, and it follows that uj = 0 for each j. So Σ_{i=1}^{d_j} λ_{ij} v_i^{(j)} = uj = 0, and since bj = (v_1^{(j)}, ..., v_{d_j}^{(j)}) is linearly independent it follows that λ_{ij} = 0 for all i and j. We have shown that the only way to write 0 as a linear combination of the elements of b is to make all the coefficients zero; thus b is linearly independent. Since b is linearly independent and spans, it is a basis for V. The number of elements in b is Σ_{j=1}^k d_j = Σ_{j=1}^k dim Uj; hence the statement about dimensions follows.

As a corollary of the above we obtain:

6.10 Proposition  Any subspace of a finite dimensional space has a complement.

Proof. Let U be a subspace of V and let d be the dimension of V. By 4.11 we know that U is finitely generated and has dimension at most d. Assume then that (u1, ..., ur) is a basis for U. By 4.10 we may choose w1, w2, ..., w_{d−r} ∈ V such that (u1, u2, ..., ur, w1, ..., w_{d−r}) is a basis of V. Now by 6.3 we have

V = Span(u1, ..., ur) ⊕ Span(w1, ..., w_{d−r}).

That is, W = Span(w1, ..., w_{d−r}) is a complement to U.

§6c Quotient spaces

Let U be a subspace of the vector space V. If v is an arbitrary element of V we define the coset of U containing v to be the subset v + U of V defined by

v + U = { v + u | u ∈ U }.

We have seen that subspaces arise naturally as solution sets of linear equations of the form T(x) = 0. Cosets arise naturally as solution sets of linear equations when the right hand side is nonzero.
6.11 Proposition  Suppose that T: V → W is a linear transformation. Let w ∈ W and let S be the set of solutions of the equation T(x) = w; that is, S = { x ∈ V | T(x) = w }. If w ∉ im T then S is empty. If w ∈ im T then S is a coset of the kernel of T:

S = x0 + K,

where x0 is a particular solution of T(x) = w and K is the set of all solutions of T(x) = 0.

Proof. By definition im T is the set of all w ∈ W such that w = T(x) for some x ∈ V, so if w ∉ im T then S is certainly empty.

Suppose that w ∈ im T and let x0 be any element of V such that T(x0) = w. If x ∈ S then x − x0 ∈ K, since linearity of T gives T(x − x0) = T(x) − T(x0) = w − w = 0, and it follows that x = x0 + (x − x0) ∈ x0 + K. Thus S ⊆ x0 + K. Conversely, if x is an element of x0 + K then we have x = x0 + v with v ∈ K, and now T(x) = T(x0 + v) = T(x0) + T(v) = w + 0 = w shows that x ∈ S. Hence x0 + K ⊆ S, and therefore x0 + K = S.
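Proposition 6.11 is the familiar principle that the general solution of Ax = w is one particular solution plus the general solution of Ax = 0. The following Python/NumPy sketch is an illustrative addition (the map and right-hand side are arbitrary choices, not taken from the text):

```python
import numpy as np

# T: R^3 -> R^2 given by T(x) = A x; A and w are illustrative choices.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
w = np.array([2.0, 3.0])

# One particular solution x0 of T(x) = w (A has full row rank, so the
# least-squares solution returned by lstsq is an exact solution) ...
x0 = np.linalg.lstsq(A, w, rcond=None)[0]
# ... and a vector spanning K = ker T:
k = np.array([-1.0, -1.0, 1.0])
assert np.allclose(A @ k, 0)

# Every element of the coset x0 + K solves T(x) = w, as 6.11 asserts.
for t in (-2.0, 0.0, 5.0):
    assert np.allclose(A @ (x0 + t * k), w)
```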
Cosets can also be described as equivalence classes. Given that U is a subspace of V, define a relation ≡ on V by

(∗)  x ≡ y if and only if x − y ∈ U.

It is straightforward to check that ≡ is reflexive, symmetric and transitive, and that the resulting equivalence classes are precisely the cosets of U.

The set of all cosets of U in V can be made into a vector space, in a manner totally analogous to that used in Chapter Two to construct a field with three elements, in the following way. It is easily checked that if x + U and y + U are cosets of U then the sum of any element of x + U and any element of y + U will lie in the coset (x + y) + U. Thus it is natural to define addition of cosets by the rule

6.12.1  (x + U) + (y + U) = (x + y) + U.

Similarly, if λ is a scalar and x + U a coset then multiplying any element of x + U by λ gives an element of (λx) + U, and so we define

6.12.2  λ(x + U) = (λx) + U.

Note that it is possible to have x + U = x′ + U and y + U = y′ + U without having x = x′ or y = y′. However, it is necessarily true in these circumstances that (x + y) + U = (x′ + y′) + U, so that addition of cosets is well-defined. A similar remark is valid for scalar multiplication.

6.12 Theorem  Let U be a subspace of the vector space V and let V/U be the set of all cosets of U in V. Then V/U is a vector space over F relative to the addition and scalar multiplication defined in 6.12.1 and 6.12.2 above.

Proof. It is routine to check that addition and scalar multiplication of cosets as defined above satisfy the vector space axioms. (Note in particular that the coset 0 + U, which is just U itself, plays the role of zero.)

Comments  The vector space defined above is called the quotient of V by U. Note that it is precisely the quotient of V by the equivalence relation ≡ (see (∗) above), as defined in §1e.

The quotient space V/U can be thought of as the space obtained from V by identifying things which differ by an element of U, or, equivalently, pretending that all the elements of U are zero. Thus it is reasonable to regard V as being in some sense made up of U and V/U. The next proposition reinforces this idea.

6.13 Proposition  Let U be a subspace of V and let W be any complement to U. Then W is isomorphic to V/U.

Proof. Define T: W → V/U by T(w) = w + U. We will prove that T is linear and bijective. By the definitions of addition and scalar multiplication of cosets we have

T(λx + µy) = (λx + µy) + U = λ(x + U) + µ(y + U) = λT(x) + µT(y)

for all x, y ∈ W and all λ, µ ∈ F. Hence T is linear.

Let w ∈ ker T. Then T(w) = w + U is the zero element of V/U; that is, w + U = U. Since w = w + 0 ∈ w + U it follows that w ∈ U, and hence that w ∈ U ∩ W. But W and U are complementary, and so it follows from 6.6 that U ∩ W = {0}, and we deduce that w = 0. So ker T = {0}, and therefore (by 3.15) T is injective.

Finally, an arbitrary element of V/U has the form v + U for some v ∈ V, and since V is the sum of U and W there exist u ∈ U and w ∈ W with v = w + u. We see from this that v ≡ w in the sense of (∗) above, and hence

v + U = w + U = T(w) ∈ im T.

Since v + U was an arbitrary element of V/U this shows that T is surjective.

Comment  6.13.1  In the situation of 6.13 suppose that (u1, ..., ur) is a basis for U and (w1, ..., ws) a basis for W. Since w → w + U is an isomorphism W → V/U we must have that the elements w1 + U, ..., ws + U form a basis for V/U. This can also be seen directly; for instance, to see that they span, observe that since each element v of V = W ⊕ U is expressible in the form

v = Σ_{j=1}^s µj wj + Σ_{i=1}^r λi ui,

it follows that

v + U = (Σ_{j=1}^s µj wj) + U = Σ_{j=1}^s µj (wj + U).

§6d The dual space

Let V and W be vector spaces over F. It is easily checked, by arguments almost identical to those used in §3b#6, that the set of all functions from V to W becomes a vector space if addition and scalar multiplication are defined in the natural way. We have also seen in Exercise 11 of Chapter Three that the sum of two linear transformations from V to W is also a linear transformation from V to W, and the product of a linear transformation by a scalar also gives a linear transformation. The set L(V, W) of all linear transformations from V to W is nonempty (the zero function is linear), and so it follows from 3.10 that L(V, W) is a vector space. The special case W = F is of particular importance.
6.14 Definition  Let V be a vector space over F. The space of all linear transformations from V to F is called the dual space of V, and it is commonly denoted 'V∗'.

Comment  6.14.1  In another strange tradition, elements of the dual space are commonly called linear functionals.

6.15 Theorem  Let b = (v1, v2, ..., vn) be a basis for the vector space V. For each i = 1, 2, ..., n there exists a unique linear functional fi such that fi(vj) = δij (Kronecker delta) for all j, and (f1, f2, ..., fn) is a basis of V∗.

Proof. Existence and uniqueness of the fi is immediate from 4.16. If Σ_{i=1}^n λi fi is the zero function then for all j we have

0 = (Σ_{i=1}^n λi fi)(vj) = Σ_{i=1}^n λi fi(vj) = Σ_{i=1}^n λi δij = λj,

and so it follows that the fi are linearly independent. It remains to show that they span. Let f ∈ V∗ be arbitrary, and for each i define λi = f(vi). We will show that f = Σ_{i=1}^n λi fi. By 4.16 it suffices to show that f and Σ_{i=1}^n λi fi take the same value on all elements of the basis b. This is indeed satisfied, since

(Σ_{i=1}^n λi fi)(vj) = Σ_{i=1}^n λi fi(vj) = Σ_{i=1}^n λi δij = λj = f(vj).

Comment  6.15.1  The basis of V∗ described in the theorem is called the dual basis of V∗ corresponding to the basis b of V.

To accord with the normal use of the word 'dual' it ought to be true that the dual of the dual space is the space itself. This cannot strictly be satisfied: elements of (V∗)∗ are linear functionals on the space V∗, whereas elements of V need not be functions at all. Nevertheless, our final result in this chapter shows that there is a natural isomorphism between V and (V∗)∗.
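For V = Rⁿ Theorem 6.15 is concrete: if the basis b consists of the columns of an invertible matrix B, then the rows of B⁻¹ define the dual basis functionals, since (B⁻¹B)ij = δij. The following Python/NumPy sketch is an illustrative addition, not part of the original notes (the basis chosen is arbitrary):

```python
import numpy as np

# A basis b of R^3, stored as the columns of B (an illustrative choice).
B = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])

# Each row r of B^{-1} defines a linear functional v -> r @ v, and since
# (B^{-1} B)_{ij} = delta_{ij} these rows satisfy f_i(v_j) = delta_{ij};
# they therefore realise the dual basis of Theorem 6.15 for this b.
F = np.linalg.inv(B)
values = np.array([[F[i] @ B[:, j] for j in range(3)] for i in range(3)])
assert np.allclose(values, np.eye(3))
```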
6.16 Theorem  Let V be a finite dimensional vector space over the field F. For each v ∈ V define ev: V∗ → F by

ev(f) = f(v)  for all f ∈ V∗.

Then ev is an element of (V∗)∗, and furthermore the mapping V → (V∗)∗ defined by v → ev is an isomorphism.

We omit the proof. All parts are in fact easy, except for the proof that v → ev is surjective, which requires the use of a dimension argument. This will be easy once we have proved the Main Theorem on Linear Transformations, which we will do in the next chapter. It is important to note, however, that Theorem 6.16 becomes false if the assumption of finite dimensionality is omitted.

Exercises

1. Prove that the inverse of a vector space isomorphism is a vector space isomorphism, and that the composite of two vector space isomorphisms is a vector space isomorphism.

2. Prove that isomorphic spaces have the same dimension.

3. In each case determine (giving reasons) whether the given linear transformation is a vector space isomorphism. If it is, give a formula for its inverse.
(i) ρ: R² → R³ defined by ρ(α, β) = (α + β, 2α + β, −β); that is, multiplication by the matrix
  [1  1]
  [2  1]
  [0 −1].
(ii) σ: R³ → R³ defined by σ(α, β, γ) = (α + β, β + γ, γ); that is, multiplication by the matrix
  [1 1 0]
  [0 1 1]
  [0 0 1].

4. Let U and V be finite dimensional vector spaces over the field F and let f: U → V be a linear transformation which is injective. Prove that U is isomorphic to a subspace of V, and hence prove that dim U ≤ dim V.
5. Let U1, U2, ..., Uk be subspaces of V. Prove that the Ui are independent if and only if each element of U1 + U2 + · · · + Uk (the subspace generated by the Ui) is uniquely expressible in the form Σ_{i=1}^k ui with ui ∈ Ui for each i.

6. Express R² as the sum of two one-dimensional subspaces in two different ways.

7. Let W be the space of all twice differentiable functions f: R → R satisfying d²f/dt² = 0. For each v = (α, β) ∈ R² let φv be the function from R to R given by

φv(t) = αt + (β − α).

Prove that φv ∈ W for all v ∈ R², and that φ: v → φv is an isomorphism from R² to W.

8. Let V be a vector space and S and T subspaces of V such that V = S ⊕ T. Prove or disprove the following assertion: If U is any subspace of V then U = (U ∩ S) ⊕ (U ∩ T).

9. Is it possible to find subspaces U, V and W of R⁴ such that R⁴ = U ⊕ V = V ⊕ W = W ⊕ U?

10. Let φ: V → V be a linear transformation such that φ² = φ. (Note that multiplication of linear transformations is always defined to be composition; thus φ²(v) = φ(φ(v)).) Let i denote the identity transformation V → V; that is, i(v) = v for all v ∈ V.
(i) Prove that i − φ: V → V (defined by (i − φ)(v) = v − φ(v)) satisfies (i − φ)² = i − φ.
(ii) Prove that im φ = ker(i − φ) and ker φ = im(i − φ).
(iii) Prove that for each element v ∈ V there exist unique x ∈ im φ and y ∈ ker φ with v = x + y.
11. (i) Let V and W be vector spaces over F. Show that the Cartesian product of V and W becomes a vector space if addition and scalar multiplication are defined in the natural way. (This space is called the external direct sum of V and W, and is sometimes denoted by 'V ⊞ W'.)
(ii) Show that V′ = { (v, 0) | v ∈ V } and W′ = { (0, w) | w ∈ W } are subspaces of V ⊞ W with V′ ≅ V and W′ ≅ W, and that V ⊞ W = V′ ⊕ W′.
(iii) Prove that dim(V ⊞ W) = dim V + dim W.

12. Let S and T be subspaces of a vector space V. Prove that (s, t) → s + t defines a linear transformation from S ⊞ T to V which has image S + T.

13. Suppose that V is the direct sum of subspaces Vi (for i = 1, 2, ..., n) and each Vi is the direct sum of subspaces Vij (for j = 1, 2, ..., mi). Prove that V is the direct sum of all the subspaces Vij.

14. Let U1, U2, ..., Uk be subspaces of V. Prove that the Ui are independent if and only if

Uj ∩ (U1 + · · · + U_{j−1} + U_{j+1} + · · · + Uk) = {0}

for all j = 1, 2, ..., k.

15. Let V be a real inner product space, and for all w ∈ V define fw: V → R by fw(v) = ⟨v, w⟩ for all v ∈ V.
(i) Prove that for each element w ∈ V the function fw: V → R is linear.
(ii) By (i) each fw is an element of the dual space V∗ of V. Prove that f: V → V∗ defined by f(w) = fw is a linear transformation.
(iii) By considering fw(w) prove that fw = 0 only if w = 0. Hence prove that the kernel of the linear transformation f in (ii) is {0}.
(iv) Assume that V is finite dimensional. Use Exercise 2 above and the fact that V and V∗ have the same dimension to prove that the linear transformation f in (ii) is an isomorphism.

16. (i) Let V be a vector space over F and let v ∈ V. Prove that the function ev: V∗ → F defined by ev(f) = f(v) (for all f ∈ V∗) is linear.
(ii) With ev as defined above, prove that the function e: V → (V∗)∗ defined by e(v) = ev for all v in V is a linear transformation.
(iii) Prove that the function e defined above is injective.
. For each vj in the basis b let T (vj ) = i=1 αij wi . Since an ntuple of numbers is (in the context of pure mathematics!) a fairly concrete sort of thing. .7 Matrices and Linear Transformations Co ri g py ht rs ive Un M ath ity em of ati Sy cs dn an ey dS of We have seen in Chapter Four that an arbitrary ﬁnite dimensional vector space is isomorphic to a space of ntuples. vn ) be a basis for the space V and c = (w1 . is there some concrete thing. w2 . wm ). and let b = (v1 . . and let T : V → W be a linear transformation. to investigate how this passage from abstract spaces to ntuples interacts with linear transformations. wm ) a basis for W . Then 148 Sc ho ol tat ist ics . . And since we introduced vector spaces to provide a context in which to discuss linear transformations. . . n Proof. . . Deﬁne Mcb (T ) to be the m × n matrix whose ith column is cvc (T vi ). it is useful to use these isomorphisms to translate questions about abstract vector spaces into questions about ntuples.1 Theorem Let V and W be vector spaces over the ﬁeld F with bases b = (v1 . surely. . . . like an ntuple. v2 . 7. Then for all v ∈ V . so that cvb (v) is the column with m λi as its ith entry. For instance.16 that the action of a linear transformation on a basis of a space determines it uniquely. . which can be associated with a linear transformation? §7a The matrix of a linear transformation We have shown in 4. cvc (T v) = Mcb (T ) cvb (v). . . Suppose then that T : V → W is a linear transformation. our ﬁrst task is. Let v ∈ V and let v = i=1 λi vi . . Thus there should be a formula expressing cvc (T v) (for any v ∈ V ) in terms of the cvc (T vi ) and cvb (v). . vn ) and c = (w1 . If we know expressions for each T vi in n terms of the wj then for any scalars λi we can calculate T ( i=1 λi vi ) in terms of the wj .
. .) ath ity cs an dS em of Sy entry of cvc (T v) ati dn ey tat ist ics . .2 The matrix of a transformation from an mdimensional space to an ndimensional space is an n × m matrix.1. . . α1n α21 α22 . α2n λ2 = . . αm1 λ1 + αm2 λ2 + · · · + αmn λn α11 α12 . . and by the deﬁnition of Mcb (T ) this gives α11 α12 . . . = Mcb (T ) cvb (v). . . αm1 αm2 . 7. . . . . . . . Mcb (T ) = . .1 The matrix Mcb (T ) is called the matrix of T relative to the bases c and b. . and since the coeﬃcient of wi in this expression is the ith it follows that α11 λ1 + α12 λ2 + · · · + α1n λn α21 λ1 + α22 λ2 + · · · + α2n λn cvc (T v) = . . . . . . αmn 149 Co rig py Sc ho We obtain ht rs ive Un T v = T (λ1 v1 + λ2 v2 + · · · + λn vn ) = λ1 T v1 + λ2 T v2 + · · · + λn T vn ol of m m m M = λ1 i=1 m αi1 wi + λ2 αi2 wi + · · · + λn αin wi i=1 i=1 = i=1 (αi1 λ1 + αi2 λ2 + · · · + αin λn )wi . .1. . . αm1 αm2 . .Chapter Seven: Matrices and Linear Transformations αij is the ith entry of the column cvc (T vj ). The theorem says that in terms of coordinates relative to bases c and b the eﬀect of a linear transformation T is multiplication by the matrix of T relative to these bases. . . (One column for each element of the basis of the domain. α2n . αmn λn Comments 7. . α1n λ1 α21 α22 . .
Examples
#1  Let V be the space considered in §4e#3, consisting of all polynomials over R of degree at most three, and define a transformation D: V → V by (Df)(x) = (d/dx)f(x); that is, Df is the derivative of f. Since the derivative of a polynomial of degree at most three is a polynomial of degree at most two, it is certainly true that Df ∈ V if f ∈ V, and hence D is a well-defined transformation from V to V. Moreover, elementary calculus gives

D(λf + µg) = λ(Df) + µ(Dg)

for all f, g ∈ V and all λ, µ ∈ R; hence D is a linear transformation.

Now, if b = (p0, p1, p2, p3) is the basis defined in §4e#3, we have

Dp0 = 0,  Dp1 = p0,  Dp2 = 2p1,  Dp3 = 3p2,

and therefore

cvb(Dp0) = (0, 0, 0, 0),  cvb(Dp1) = (1, 0, 0, 0),  cvb(Dp2) = (0, 2, 0, 0),  cvb(Dp3) = (0, 0, 3, 0)

(written as columns). Thus the matrix of D relative to b and b is

Mbb(D) = [0 1 0 0]
         [0 0 2 0]
         [0 0 0 3]
         [0 0 0 0].

(Note that the domain and codomain are equal in this example, so we may use the same basis for each. Of course, if we had wished, we could have used a basis for V considered as the codomain of D different from the basis used for V considered as the domain of D; this would result in a different matrix.)
By linearity of D we have, for all ai ∈ R,

D(a0p0 + a1p1 + a2p2 + a3p3) = a0Dp0 + a1Dp1 + a2Dp2 + a3Dp3 = a1p0 + 2a2p1 + 3a3p2,

so that if cvb(p) = (a0, a1, a2, a3) then cvb(Dp) = (a1, 2a2, 3a3, 0). To verify the assertion of the theorem in this case it remains to observe that

[0 1 0 0][a0]   [ a1]
[0 0 2 0][a1] = [2a2]
[0 0 0 3][a2]   [3a3]
[0 0 0 0][a3]   [ 0 ].

#2  Let b = ((1, 0, 1), (2, 1, 1), (1, 1, 1)) (written as columns), a basis of R³, and let s be the standard basis of R³. Let i: R³ → R³ be the identity transformation, defined by i(v) = v for all v ∈ R³. Calculate Msb(i) and Mbs(i). Calculate also the coordinate vector of (−2, −1, 2) relative to b.
−−  We proved in §4e#2 that b is a basis. The ith column of Msb(i) is the coordinate vector cvs(i(vi)), where b = (v1, v2, v3). But, by 4.19.2 and the definition of i,

cvs(i(v)) = cvs(v) = v  (for all v ∈ R³),

and it follows that

Msb(i) = [1 2 1]
         [0 1 1]
         [1 1 1].

The ith column of Mbs(i) is cvb(i(ei)) = cvb(ei), where e1, e2 and e3 are the vectors of s. Hence we must solve the equations

(∗)  ej = x1j v1 + x2j v2 + x3j v3  (j = 1, 2, 3).

The solution (x1j, x2j, x3j) of the jth of these equations gives the jth column of Mbs(i). In matrix notation the three equations can be rewritten, all at once, as

[1 0 0]   [1 2 1][x11 x12 x13]
[0 1 0] = [0 1 1][x21 x22 x23],
[0 0 1]   [1 1 1][x31 x32 x33]

in which the matrix (xij) is Mbs(i). Thus

Mbs(i) = [1 2 1]⁻¹   [ 0 −1  1]
         [0 1 1]   = [ 1  0 −1]
         [1 1 1]     [−1  1  1],

as calculated in §4e#2. (This is an instance of the general fact that if T: V → W is a vector space isomorphism and b, c are bases for V, W then Mbc(T⁻¹) = Mcb(T)⁻¹. The present example is slightly degenerate since i⁻¹ = i. Note also that if we had not already proved that b is a basis, the fact that the equations (∗) have a solution proves it. For, once the ei have been expressed as linear combinations of b it is immediate that b spans R³, and since R³ has dimension three it then follows from 4.12 that b is a basis.)

Finally, with v = (−2, −1, 2), 7.1 yields that

cvb(v) = cvb(i(v)) = Mbs(i) cvs(i(v)) = Mbs(i) v = [ 0 −1  1][−2]   [ 3]
                                                   [ 1  0 −1][−1] = [−4]
                                                   [−1  1  1][ 2]   [ 3].  −−
Co rig py
T (v) = cvc T (v) = Mcb (T ) cvb (v) = M v. Thus we have proved
Let T : F n → F m be an arbitrary linear transformation, and let b and c be the standard bases of F n and F m respectively. If M is the matrix of T relative to c and b then by 4.19.2 we have, for all v ∈ F n ,
7.2 Proposition Every linear transformation from F n to F m is given by multiplication by an m × n matrix; furthermore, the matrix involved is the matrix of the transformation relative to the standard bases. 7.3 Definition Let b and c be bases for the vector space V . Let i be the identity transformation from V to V and let Mcb = Mcb (i), the matrix of i relative to c and b. We call Mcb the transition matrix from bcoordinates to ccoordinates. The reason for this terminology is clear: if the coordinate vector of v relative to b is multiplied by Mcb the result is the coordinate vector of v relative to c.
Sc
ho
ht em
ol
rs ive Un ati cs
of
M
ath
ity
of
Sy
dn
an
ey
dS
tat
ist
ics
7.4 Proposition  In the notation of 7.3 we have cvc(v) = Mcb cvb(v) for all v ∈ V.

The proof of this is trivial. (See #2 above.)

§7b Multiplication of transformations and matrices

We saw in Exercise 10 of Chapter Three that if φ: U → V and ψ: V → W are linear transformations then their product is also a linear transformation. (Remember that multiplication of linear transformations is composition; ψφ: U → W is φ followed by ψ.) In view of the correspondence between matrices and linear transformations given in the previous section, it is natural to seek a formula for the matrix of ψφ in terms of the matrices of ψ and φ. The answer is natural too: the matrix of a product is the product of the matrices. Of course this is no fluke: matrix multiplication is defined the way it is just so that it corresponds like this to composition of linear transformations.
7.5 Theorem  Let U, V, W be vector spaces with bases b, c, d respectively, and let φ: U → V and ψ: V → W be linear transformations. Then

Mdb(ψφ) = Mdc(ψ) Mcb(φ).

Proof. Let b = (u1, ..., um), c = (v1, ..., vn) and d = (w1, ..., wp). Let the (i, j)th entries of Mdc(ψ), Mcb(φ) and Mdb(ψφ) be (respectively) αij, βij and γij. Thus, by the definition of the matrix of a transformation,

(a)  ψ(vk) = Σ_{i=1}^p αik wi    (for k = 1, 2, ..., n),
(b)  φ(uj) = Σ_{k=1}^n βkj vk    (for j = 1, 2, ..., m), and
(c)  (ψφ)(uj) = Σ_{i=1}^p γij wi    (for j = 1, 2, ..., m).

But, by the definition of the product of linear transformations,

(ψφ)(uj) = ψ(φ(uj))
         = ψ(Σ_{k=1}^n βkj vk)               (by (b))
         = Σ_{k=1}^n βkj ψ(vk)               (by linearity of ψ)
         = Σ_{k=1}^n βkj (Σ_{i=1}^p αik wi)  (by (a))
         = Σ_{i=1}^p (Σ_{k=1}^n αik βkj) wi,

and comparing with (c) we have (by 4.15 and the fact that the wi form a basis) that

γij = Σ_{k=1}^n αik βkj

for all i = 1, 2, ..., p and j = 1, 2, ..., m. Since the right hand side of this equation is precisely the formula for the (i, j)th entry of the product of the matrices Mdc(ψ) = (αij) and Mcb(φ) = (βij), it follows that Mdb(ψφ) = Mdc(ψ) Mcb(φ).
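Theorem 7.5 can be seen numerically on the differentiation operator D of §7a #1: the matrix of the composite D∘D (the second-derivative map) must be the square of Mbb(D). The sketch below (Python/NumPy, an illustrative addition, not part of the notes) checks this on one polynomial:

```python
import numpy as np

# Matrix of the differentiation operator D of #1 relative to the basis
# (p0, p1, p2, p3) of polynomials of degree at most three:
M_D = np.array([[0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 2.0, 0.0],
                [0.0, 0.0, 0.0, 3.0],
                [0.0, 0.0, 0.0, 0.0]])

# By Theorem 7.5 the matrix of D∘D is M_D @ M_D.
a = np.array([5.0, -1.0, 4.0, 2.0])   # p(x) = 5 - x + 4x^2 + 2x^3
second_deriv = (M_D @ M_D) @ a        # coefficients of p''(x) = 8 + 12x
assert np.allclose(second_deriv, [8.0, 12.0, 0.0, 0.0])
```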
Comments
7.5.1 Here is a shorter proof of 7.5. The jth column of Mdc(ψ) Mcb(φ) is, by the definition of matrix multiplication, the product of Mdc(ψ) and the jth column of Mcb(φ). Hence, by the definition of Mcb(φ), it equals Mdc(ψ) cvc(φ(uj)), which in turn equals cvd(ψ(φ(uj))) (by 7.1). But this is just cvd((ψφ)(uj)), the jth column of Mdb(ψφ).
7.5.2 Observe that the shapes of the matrices are right. Since the dimension of V is n and the dimension of W is p, the matrix Mdc(ψ) has n columns and p rows; that is, it is a p × n matrix. Similarly Mcb(φ) is an n × m matrix. Hence the product Mdc(ψ) Mcb(φ) exists and is a p × m matrix, the right shape to be the matrix of a transformation from an m-dimensional space to a p-dimensional space.

It is trivial that if b is a basis of V and i: V → V the identity then Mbb(i) is the identity matrix (of the appropriate size). Hence:

7.6 Corollary  If f: V → W is an isomorphism then Mbc(f⁻¹) = (Mcb(f))⁻¹ for any bases b, c of V, W.

Proof. Let I be the d × d identity matrix, where d = dim V = dim W (see Exercise 1 of Chapter Six). Then by 7.5 we have

Mbc(f⁻¹) Mcb(f) = Mbb(f⁻¹f) = Mbb(i) = I  and  Mcb(f) Mbc(f⁻¹) = Mcc(ff⁻¹) = Mcc(i) = I,

whence the result.

As another corollary we discover how changing the bases alters the matrix of a transformation:
156
7.7 Corollary  Let T: V → W be a linear transformation, let b1, b2 be bases for V and let c1, c2 be bases for W. Then

M_{c2 b2}(T) = M_{c2 c1} M_{c1 b1}(T) M_{b1 b2}.
ri g py
Proof. By their definitions M_{c2 c1} = M_{c2 c1}(iW) and M_{b1 b2} = M_{b1 b2}(iV) (where iV and iW are the identity transformations); so 7.5 gives M_{c2 c1} M_{c1 b1}(T) M_{b1 b2} = M_{c2 b2}(iW T iV), and since iW T iV = T the result follows.
Our next corollary is a special case of the last; it will be of particular importance later.

7.8 Corollary  Let b and c be bases of V and let T: V → V be linear. Let X = Mbc be the transition matrix from c-coordinates to b-coordinates. Then

Mcc(T) = X⁻¹ Mbb(T) X.

Proof. By 7.6 we see that X⁻¹ = Mcb, so that the result follows immediately from 7.7.
Example
#3  Let f: R³ → R³ be defined by

f [x]   [8 85 −9][x]
  [y] = [0 −2  0][y]
  [z]   [6 49 −7][z]

and let b = ((1, 0, 1), (3, 0, 2), (−4, 1, 5)) (written as columns). Prove that b is a basis for R³ and calculate Mbb(f).

−−  As in previous examples, b will be a basis if and only if the matrix with the elements of b as its columns is invertible. If it is invertible it is the
ics
transition matrix Msb, and its inverse is Mbs. Now

[1 3 −4 | 1 0 0]                 [1  3 −4 |  1 0 0]
[0 0  1 | 0 1 0]  R3 := R3 − R1  [0  0  1 |  0 1 0]
[1 2  5 | 0 0 1]  −−−−−−−−−−−→   [0 −1  9 | −1 0 1]

R2 ↔ R3          [1 3 −4 | 1 0  0]
R2 := −R2        [0 1 −9 | 1 0 −1]
−−−−−−−−→        [0 0  1 | 0 1  0]

R2 := R2 + 9R3   [1 0 0 | −2 −23  3]
R1 := R1 + 4R3   [0 1 0 |  1   9 −1]
R1 := R1 − 3R2   [0 0 1 |  0   1  0]
−−−−−−−−→
so that b is a basis. Moreover, by 7.8,

Mbb(f) = [−2 −23  3][8 85 −9][1 3 −4]
         [ 1   9 −1][0 −2  0][0 0  1]
         [ 0   1  0][6 49 −7][1 2  5]

       = [−2 −23  3][−1 6   8]
         [ 1   9 −1][ 0 0  −2]
         [ 0   1  0][−1 4 −10]

       = [−1 0  0]
         [ 0 2  0]
         [ 0 0 −2].  −−
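The diagonal outcome of #3 is easy to check directly. In the Python/NumPy sketch below (added for verification, not part of the original notes) X is the matrix Msb whose columns are the vectors of b, and 7.8 gives Mbb(f) = X⁻¹AX:

```python
import numpy as np

A = np.array([[8.0, 85.0, -9.0],
              [0.0, -2.0,  0.0],
              [6.0, 49.0, -7.0]])
X = np.array([[1.0, 3.0, -4.0],   # columns are the basis vectors of b
              [0.0, 0.0,  1.0],
              [1.0, 2.0,  5.0]])

Mbb_f = np.linalg.inv(X) @ A @ X  # Corollary 7.8
assert np.allclose(Mbb_f, np.diag([-1.0, 2.0, -2.0]))
```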
§7c
The Main Theorem on Linear Transformations
If A and B are ﬁnite sets with the same number of elements then a function from A to B which is onetoone is necessarily onto as well, and, likewise, a function from A to B which is onto is necessarily onetoone. There is an analogous result for linear transformations between ﬁnite dimensional vector spaces: if V and U are spaces of the same ﬁnite dimension then a linear transformation from V to U is onetoone if and only if it is onto. More generally, if V and U are arbitrary ﬁnite dimensional vector spaces and φ: V → U a linear transformation then the dimension of the image of φ equals the dimension of V minus the dimension of the kernel of φ. This is one of the assertions of the principal result of this section:
µ ∈ F . and since V = W + ker φ we may write v = w + z with w ∈ W and z ∈ ker φ. We have proved in 6. ity of Sy dn an ey ist ics dS tat .15. by 3.1 Recall that ‘W complements ker φ’ means that V = W ⊕ ker φ. Then the dimension of ker φ is the number of degrees of freedom which are killed by φ and the dimension of im φ the number which survive φ. We prove that ψ is an isomorphism. Then u = φ(v) for some v ∈ V . but the proof (using Zorn’s Lemma) is beyond the scope of this course. Then we have u = φ(v) = φ(w + z) = φ(w) + φ(z) = φ(w) + 0 Sc dim V = dim(im φ) + dim(ker φ). Note that it is legitimate to restrict the codomain like this since φ(v) ∈ im φ for all v ∈ W . and it remains to prove that ψ is surjective. Then (i) W is isomorphic to im φ.9.2 In physical examples the dimension of the space V can be thought of intuitively as the number of degrees of freedom of the system.10 that if V is ﬁnite dimensional then there always exists such a W .6. Let u ∈ im φ. ψ is certainly linear also: ψ(λx + µy) = φ(λx + µy) = λφ(x) + µφ(y) = λψ(x) + µψ(y) for all x. (i) Deﬁne ψ: W → im φ by ψ(v) = φ(v) Thus ψ is simply φ with the domain restricted to W and the codomain restricted to im φ. Let W be a subspace of V complementing ker φ. We have ker ψ = { x ∈ W  ψ(x) = 0 } = { x ∈ W  φ(x) = 0 } = W ∩ ker φ = {0} by 6. In fact it is also true in the inﬁnite dimensional case. ho ht em ol rs ive Un ati cs of M ath for all v ∈ W .9. (ii) if V is ﬁnite dimensional then Co ri g py Comments 7. Hence ψ is injective.9 The Main Theorem on Linear Transformations Let V and U be vector spaces over the ﬁeld F and φ: V → U a linear transformation. y ∈ W and λ. since the sum of W and ker φ is direct.158 Chapter Seven: Matrices and Linear Transformations 7. Proof. 7. Since φ is linear.
= φ(w) + 0 = φ(w), since z ∈ ker φ. Thus u = φ(w) = ψ(w). This holds for all u ∈ im φ, so we have shown that ψ: W → im φ is surjective. Hence ψ is an isomorphism, and W is isomorphic to im φ.

(ii) Let (x1, x2, ..., xr) be a basis for W. Since ψ is an isomorphism it follows from 4.17 that (ψ(x1), ψ(x2), ..., ψ(xr)) is a basis for im φ, and so dim(im φ) = dim W. Hence 6.9 yields

dim V = dim W + dim(ker φ) = dim(im φ) + dim(ker φ),

as required.

Example
#4 Let φ: R³ → R³ be defined by

φ ( x )   ( 1 1 1 ) ( x )
  ( y ) = ( 1 1 1 ) ( y ) .
  ( z )   ( 2 2 2 ) ( z )

It is easily seen that the image of φ consists of all scalar multiples of the column (1, 1, 2)ᵀ, and is therefore a space of dimension 1. By the Main Theorem the kernel of φ must have dimension 2; indeed, it is easily checked that

( ( 0 )   ( 1 ) )
( ( 1 ) , ( −1 ) )
( ( −1 )  ( 0 ) )

is a basis for ker φ.
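The dimension count in Example #4 is easy to confirm by machine. Here is a short sketch, assuming NumPy is available (the variable names are mine, not the book's):

```python
import numpy as np

# The matrix of the example: every column equals (1, 1, 2)^T.
A = np.array([[1, 1, 1],
              [1, 1, 1],
              [2, 2, 2]])

dim_image = np.linalg.matrix_rank(A)    # dim(im phi)
dim_kernel = A.shape[1] - dim_image     # the Main Theorem gives 3 - 1

# The two columns exhibited in the text really are in the kernel:
for v in ([0, 1, -1], [1, -1, 0]):
    assert np.all(A @ np.array(v) == 0)

print(dim_image, dim_kernel)   # prints: 1 2
```

Since two independent kernel vectors are exhibited and the rank is 1, the dimensions 1 and 2 are pinned down exactly.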
In our treatment of the Main Theorem we have avoided mentioning quotient spaces, so that §6c would not be a prerequisite. This does, however, distort the picture somewhat; indeed, a more natural version of the theorem is as follows:

7.10 Theorem  Let V and U be vector spaces and φ: V → U a linear transformation. Then the image of φ is isomorphic to the quotient of V by the kernel of φ; symbolically, im φ ≅ V / ker φ.

Proof. Let I = im φ and K = ker φ. Elements of V/K are cosets, of the form v + K where v ∈ V. Now if x ∈ K then

φ(v + x) = φ(v) + φ(x) = φ(v) + 0 = φ(v),

and so it follows that φ maps all elements of the coset v + K to the same element u = φ(v) in the subspace I of U. Hence there is a well-defined mapping ψ: (V/K) → I satisfying

ψ(v + K) = φ(v)

for all v ∈ V. Linearity of ψ follows easily from linearity of φ and the definitions of addition and scalar multiplication for cosets:

ψ(λ(x + K) + µ(y + K)) = ψ((λx + µy) + K) = φ(λx + µy)
                       = λφ(x) + µφ(y) = λψ(x + K) + µψ(y + K)

for all x, y ∈ V and λ, µ ∈ F.

Since every element of I has the form φ(v) = ψ(v + K) it is immediate that ψ is surjective. Moreover, if v + K ∈ ker ψ then since φ(v) = ψ(v + K) = 0 it follows that v ∈ K, whence v + K = K, the zero element of V/K. Hence the kernel of ψ consists of the zero element alone, and therefore ψ is injective.

We leave to the exercises the proof of the following consequence of the Main Theorem.

7.11 Proposition  If S and T are subspaces of V then

dim(S + T) + dim(S ∩ T) = dim S + dim T.

Another consequence of the Main Theorem is that it is possible to choose bases of the domain and codomain so that the matrix of a linear transformation has a particularly simple form.

7.12 Theorem  Let V, U be vector spaces of dimensions n, m respectively, and φ: V → U a linear transformation. Then there exist bases b, c of V, U and a positive integer r such that

Mcb(φ) = ( I(r×r)        0(r×(n−r))     )
         ( 0((m−r)×r)    0((m−r)×(n−r)) )
where I(r×r) is an r × r identity matrix, and the other blocks are zero matrices of the sizes indicated.

Proof. As in the proof of the Main Theorem, by 6.10 we may write V = W ⊕ ker φ. Let (x1, x2, ..., xr) be a basis of W; then, as in the proof of the Main Theorem, (φ(x1), φ(x2), ..., φ(xr)) is a basis for the subspace im φ of U. By 4.10 we may choose ur+1, ur+2, ..., um ∈ U so that

c = (φ(x1), φ(x2), ..., φ(xr), ur+1, ur+2, ..., um)

is a basis for U. Choose any basis of ker φ; by 6.9 it will have n − r elements, and combining it with the basis of W will give a basis of V. Let (xr+1, ..., xn) be the chosen basis of ker φ, and let b = (x1, x2, ..., xn) be the resulting basis of V.

The bases b and c having been defined, it remains to calculate the matrix of φ and check that it is as claimed. By definition the ith column of Mcb(φ) is the coordinate vector of φ(xi) relative to c. If r + 1 ≤ i ≤ n then xi ∈ ker φ and so φ(xi) = 0; so the last n − r columns of the matrix are zero. If 1 ≤ i ≤ r then φ(xi) is actually in the basis c; indeed, in this case the solution of

φ(xi) = λ1φ(x1) + ··· + λrφ(xr) + λr+1ur+1 + ··· + λmum

is λi = 1 and λj = 0 for j ≠ i. Thus for 1 ≤ i ≤ r the ith column of the matrix has a 1 in the ith position and zeros elsewhere. Altogether, the (i, j)-entry of Mcb(φ) is zero unless i = j ≤ r, in which case it is one.

Comment
7.12.1 The number r which appears in 7.12 is called the rank of φ. Note that r is an invariant of φ, in the sense that it depends only on φ and not on the bases b and c: indeed, r is just the dimension of the image of φ.

§7d Rank and nullity of matrices

Because of the connection between matrices and linear transformations, the Main Theorem on Linear Transformations ought to have an analogue for matrices. This section is devoted to the relevant concepts. Our first aim is to prove the following important theorem:

7.13 Theorem  Let A be any rectangular matrix over the field F. Then the dimension of the column space of A is equal to the dimension of the row space of A.
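Theorem 7.13 will be proved later in this section, but it is easy to test numerically in the meantime. A quick sketch, assuming NumPy is available (the construction of A is arbitrary test data, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

# A 4x6 matrix of rank at most 2: a product of 4x2 and 2x6 factors.
A = rng.integers(-3, 4, size=(4, 2)) @ rng.integers(-3, 4, size=(2, 6))

row_rank = np.linalg.matrix_rank(A.T)   # dimension of the row space
col_rank = np.linalg.matrix_rank(A)     # dimension of the column space

assert row_rank == col_rank             # the assertion of 7.13
assert row_rank <= 2                    # rank is at most 2 by construction
```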
7.14 Definition  The rank of A, written rank(A), is defined to be the dimension of the row space of A.

In view of 7.13 we could equally well define the rank to be the dimension of the column space. However, we are not yet ready to prove 7.13; there are some preliminary results we need first. The following is trivial:

7.15 Proposition  The dimension of RS(A) does not exceed the number of rows of A, and the dimension of CS(A) does not exceed the number of columns.

Proof. Immediate, since the rows span the row space and the columns span the column space.

We proved in Chapter Three (see 3.21) that if A and B are matrices such that AB exists then the row space of AB is contained in the row space of B, and the two are equal if A is invertible. It is natural to ask whether there is any relationship between the column space of AB and the column space of B.

7.16 Proposition  If A ∈ Mat(m × n, F) and B ∈ Mat(n × p, F) then φ(x) = Ax defines a surjective linear transformation from CS(B) to CS(AB).

Proof. Let the columns of B be x1, x2, ..., xp. Then the columns of AB are Ax1, Ax2, ..., Axp. Now if v ∈ CS(B) then v = λ1x1 + ··· + λpxp for some λi ∈ F, and so

φ(v) = Av = A(λ1x1 + ··· + λpxp) = λ1Ax1 + ··· + λpAxp ∈ CS(AB).

Thus φ is a function from CS(B) to CS(AB). It is trivial that φ is linear:

φ(λx + µy) = A(λx + µy) = λ(Ax) + µ(Ay) = λφ(x) + µφ(y).

It remains to prove that φ is surjective. If y ∈ CS(AB) is arbitrary then y must be a linear combination of the Axi (the columns of AB), so for some scalars λi

y = λ1Ax1 + ··· + λpAxp = A(λ1x1 + ··· + λpxp).

Defining x = λ1x1 + ··· + λpxp we have that x ∈ CS(B), and y = Ax = φ(x) ∈ im φ, as required.
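Proposition 7.16 has an immediate numerical consequence: CS(AB) is the image of CS(B) under a linear map, so its dimension can be at most dim CS(B), and an invertible factor loses nothing. A quick check, assuming NumPy (the matrices are arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(1)
rank = np.linalg.matrix_rank     # = dimension of the column space

A = rng.integers(-2, 3, size=(4, 5))
B = rng.integers(-2, 3, size=(5, 3))

# CS(AB) is the image of CS(B) under x -> Ax, so its dimension cannot grow.
assert rank(A @ B) <= rank(B)

# For an invertible left factor the dimension is preserved exactly.
P = np.eye(5, dtype=int) + np.eye(5, k=1, dtype=int)   # unit triangular
assert rank(P @ B) == rank(B)
```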
7.17 Corollary  The dimension of CS(AB) is less than or equal to the dimension of CS(B).

Proof. This follows from the Main Theorem on Linear Transformations: since the transformation φ of 7.16 is surjective we have

dim CS(B) = dim(CS(AB)) + dim(ker φ),

and since dim(ker φ) is certainly nonnegative the result follows.

7.18 Corollary  If A is invertible then CS(B) and CS(AB) have the same dimension.

Proof. If the matrix A is invertible then we may apply 7.17 with B replaced by AB and A replaced by A⁻¹ to deduce that dim(CS(A⁻¹AB)) ≤ dim(CS(AB)); that is, dim(CS(B)) ≤ dim(CS(AB)). Combining this with 7.17 itself gives the result.

The next three results are completely analogous to 7.16, 7.17 and 7.18, and so we omit the proofs.

7.19 Proposition  If A and B are matrices such that AB exists then x → xB is a surjective linear transformation from RS(A) to RS(AB).

7.20 Corollary  The dimension of RS(AB) is less than or equal to the dimension of RS(A).

7.21 Corollary  If B is invertible then dim(RS(AB)) = dim(RS(A)).

We have already seen that elementary row operations do not change the row space at all, so they certainly do not change the dimension of the row space. In 7.18 we have shown that they do not change the dimension of the column space either. Note, however, that the column space itself is changed by row operations; it is only the dimension of the column space which is unchanged.

In 7.18 and 7.21 we see that premultiplying and postmultiplying by invertible matrices both leave unchanged the dimensions of the row space and column space. Given a matrix A, then, it is natural to look for invertible matrices P and Q for which PAQ is as simple as possible. Then, with any luck, it may be obvious that dim(RS(PAQ)) = dim(CS(PAQ)); if so we will have achieved our aim of proving that dim(RS(A)) = dim(CS(A)). Theorem 7.12 readily provides the result we seek:
7.22 Theorem  Let A ∈ Mat(m × n, F). Then there exist invertible matrices P ∈ Mat(m × m, F) and Q ∈ Mat(n × n, F) such that

P A Q = ( I(r×r)        0(r×(n−r))     )
        ( 0((m−r)×r)    0((m−r)×(n−r)) )

where the integer r equals both the dimension of the row space of A and the dimension of the column space of A.

Proof. Let φ: Fⁿ → Fᵐ be given by φ(x) = Ax. Then φ is a linear transformation, and A = Ms₂s₁(φ), where s1, s2 are the standard bases of Fⁿ, Fᵐ (see 7.6). By 7.12 there exist bases b, c of Fⁿ, Fᵐ such that

Mcb(φ) = ( I 0 )
         ( 0 0 )

and if we let P and Q be the transition matrices Mcs₂ and Ms₁b (respectively) then 7.7 gives

P A Q = Mcs₂ Ms₂s₁(φ) Ms₁b = Mcb(φ) = ( I 0 )
                                      ( 0 0 ).

Note that since P and Q are transition matrices they are invertible. Hence it remains to prove that if the identity matrix I in the above equation is of shape r × r, then the row space of A and the column space of A both have dimension r. By 3.21 and 7.21 we know that RS(PAQ) and RS(A) have the same dimension. But the row space of ( I 0 ; 0 0 ) consists of all n-component rows of the form (α1, ..., αr, 0, ..., 0), and is clearly an r-dimensional space isomorphic to ᵗFʳ. Hence r = dim(RS(PAQ)) = dim(RS(A)), and by totally analogous reasoning we also obtain r = dim(CS(PAQ)) = dim(CS(A)).

Comments
7.22.1 We have now proved 7.13.
7.22.2 The proof of 7.22 also shows that the rank of a linear transformation (see 7.12.1) is the same as the rank of the matrix of that linear transformation, independent of which bases are used.
7.22.3 There is a more direct computational proof of 7.22, as follows. Apply row operations to A to obtain a reduced echelon matrix. As we have seen in Chapter One, row operations are performed by premultiplying by invertible matrices,
so the echelon matrix obtained has the form PA with P invertible. Now apply column operations to PA; similarly, this amounts to postmultiplying by an invertible matrix, so the matrix finally obtained has the form PAQ with Q invertible. It can be seen that a suitable choice of column operations will transform an echelon matrix to the desired form.

As an easy consequence of the above results we can deduce the following important fact:

7.23 Theorem  An n × n matrix has an inverse if and only if its rank is n.

Proof. Let A ∈ Mat(n × n, F) and suppose that rank(A) = n. By 7.22 there exist n × n invertible P and Q with PAQ = I. Multiplying this equation on the left by P⁻¹ and on the right by P gives AQP = I, and hence A(QP) = I. We conclude that A is invertible (QP being the inverse).

Conversely, suppose that A is invertible, and let B = A⁻¹. By 7.15 the number of rows of A is an upper bound on the rank of A, and by 7.20 we know that rank(AB) ≤ rank(A). So

n ≥ rank(A) ≥ rank(AB) = rank(I) = n,

since (reasoning as in the proof of 7.22) it is easily shown that the n × n identity has rank n. Hence rank(A) = n.

Let A be an m × n matrix over F and let φ: Fⁿ → Fᵐ be the linear transformation given by multiplication by A. If v = (x1, x2, ..., xn)ᵀ ∈ Fⁿ then

φ(v) = Av = x1(a11, a21, ..., am1)ᵀ + x2(a12, a22, ..., am2)ᵀ + ··· + xn(a1n, a2n, ..., amn)ᵀ,

so we see that for every v ∈ Fⁿ the column φ(v) is a linear combination of the columns of A, and, conversely, every linear combination of the columns of A is φ(v) for some v ∈ Fⁿ. In other words, the image of φ is just the column space of A. The kernel of φ involves us with a new concept:

7.24 Definition  The right null space of a matrix A ∈ Mat(m × n, F) is the set RN(A) consisting of all v ∈ Fⁿ such that Av = 0; its dimension is called the right nullity, rn(A), of A. Similarly, the left null space LN(A) is the set of all w ∈ ᵗFᵐ such that wA = 0, and its dimension is called the left nullity, ln(A), of A.
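The computational route sketched in 7.22.3 is easy to carry out by machine. The following self-contained sketch (pure Python, exact rational arithmetic; the sample matrix is mine) reduces a matrix to echelon form by row operations and reads off the rank as the number of nonzero rows. Applying it to a matrix and to its transpose illustrates 7.13, and the 3 × 3 identity visibly has rank 3, as used in the proof of 7.23.

```python
from fractions import Fraction

def rank(rows):
    """Rank via row reduction to echelon form (exact arithmetic)."""
    M = [[Fraction(x) for x in row] for row in rows]
    r, ncols = 0, len(M[0]) if M else 0
    for lead in range(ncols):
        # find a pivot in column `lead` at or below row r
        piv = next((i for i in range(r, len(M)) if M[i][lead] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][lead] != 0:
                f = M[i][lead] / M[r][lead]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[1, 2, 0, 3],
     [2, 4, 1, 5],
     [3, 6, 1, 8]]          # third row = first + second, so rank 2
At = [list(col) for col in zip(*A)]

assert rank(A) == rank(At) == 2               # row rank equals column rank
assert rank([[1, 0, 0], [0, 1, 0], [0, 0, 1]]) == 3   # identity has full rank
```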
It is obvious that if A and φ are as in the discussion preceding 7.24, the kernel of φ is the right null space of A. The domain of φ is Fⁿ, which has dimension equal to n, the number of columns of A; and, as we have seen, the image of φ is the column space of A, whose dimension is rank(A). Hence the Main Theorem on Linear Transformations applied to φ gives:

7.25 Theorem  For any rectangular matrix A over the field F,

rank(A) + rn(A) = n

for all A ∈ Mat(m × n, F). That is, the rank of A plus the right nullity of A equals the number of columns of A.

Comment
7.25.1 Similarly, the linear transformation x → xA has kernel equal to the left null space of A and image equal to the row space of A, and it follows that rank(A) + ln(A) = m, the number of rows of A. Combining this equation with the one in 7.25 gives ln(A) − rn(A) = m − n.

Exercises

1. Let f, g, h be the functions from R to R defined by

f(t) = sin(t),   g(t) = sin(t + π/4),   h(t) = sin(t + π/2).

(i) What is the dimension of the vector space V (over R) spanned by (f, g, h)? Determine two different bases b1 and b2 of V.
(ii) Show that differentiation yields a linear transformation from V to V, and calculate the matrix of this transformation relative to the bases b1 and b2 found in (i). (That is, use b1 as the basis of V considered as the domain, b2 as the basis of V considered as the codomain.)

2. Let U be the vector space consisting of all functions f: R → R which satisfy the differential equation f''(t) − f(t) = 0, and assume that b1 = (u1, u2) and b2 = (v1, v2) are two different bases of U, where u1, u2, v1, v2 are defined as follows: for all t ∈ R,

u1(t) = eᵗ,   u2(t) = e⁻ᵗ,   v1(t) = cosh(t),   v2(t) = sinh(t).

Find the (b1, b2)-transition matrix.
3. Let b1 = (v1, v2, ..., vn) and b2 = (w1, w2, ..., wn) be two bases of a vector space V. Prove that Mb₁b₂ = (Mb₂b₁)⁻¹.

4. Let θ: R³ → R² be defined by

θ ( x )   (  1 3 −2 ) ( x )
  ( y ) = ( −1 4  1 ) ( y ) .
  ( z )              ( z )

Calculate the matrix of θ relative to the bases ((1, 1, 0)ᵀ, (0, 1, 1)ᵀ, (1, 0, 1)ᵀ) and ((1, 1)ᵀ, (2, −3)ᵀ) of R³ and R². Hence verify Theorem 7.5 in this case.

5. Regard C as a vector space over R. Show that the function θ: C → C defined by θ(z) = z̄ (the complex conjugate of z) is a linear transformation, and calculate the matrix of θ relative to the basis (1, i) (for both the domain and codomain).

6. (i) Let φ: R² → R be defined by φ((x, y)ᵀ) = x + y. Calculate the matrix of φ relative to the bases ((1, 1)ᵀ, (0, 1)ᵀ) of R² and ((−1)) of R.
(ii) With φ as in (i) and θ as in Exercise 5, calculate φθ and its matrix relative to the two given bases.

7. Prove that the rank of a linear transformation is an invariant. (See 7.12.1.)

8. Let S and T be subspaces of a vector space V and let U be a subspace of T such that T = (S ∩ T) ⊕ U. Prove that S + T = S ⊕ U, and hence deduce that dim(S + T) = dim S + dim T − dim(S ∩ T).

9. Give an alternative proof of the result of the previous exercise by use of the Main Theorem and Exercise 14 of Chapter Six.
10. Let S = span(v1, v2, ..., vs) and T = span(w1, w2, ..., wt), where the vi and wj are n-component rows over F. Form the matrix

( v1  v1 )
( v2  v2 )
(  ⋮   ⋮ )
( vs  vs )
( w1  0  )
(  ⋮   ⋮ )
( wt  0  )

and use row operations to reduce it to echelon form. If the result is

( x1  ∗  )
(  ⋮   ⋮ )
( xl  ∗  )
( 0   y1 )
(  ⋮   ⋮ )
( 0   yk )
( 0   0  )

then (x1, x2, ..., xl) is a basis for S + T and (y1, y2, ..., yk) is a basis for S ∩ T. Think about why it works. Do this for the following example:

v1 = (1, 1, 1, 1)       w1 = (0, −2, −4, 1)
v2 = (2, 0, −2, 3)      w2 = (1, 5, 9, −1)
v3 = (1, 3, 5, 0)       w3 = (3, −3, −9, 6)
v4 = (7, −3, 8, −1)     w4 = (1, 1, 0, 0)
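The recipe in Exercise 10 is mechanical enough to implement directly. The sketch below (pure Python with exact rational arithmetic; function names are mine) forms the doubled matrix, row reduces it, and splits the nonzero rows into bases for S + T and S ∩ T. Running it on the data of the exercise confirms the identity of 7.11.

```python
from fractions import Fraction

def rref(M):
    """Row reduce M (a list of lists) over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for lead in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][lead] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][lead] != 0:
                f = M[i][lead] / M[r][lead]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return M

def sum_and_intersection(vs, ws):
    n = len(vs[0])
    doubled = [v + v for v in vs] + [w + [0] * n for w in ws]
    xs, ys = [], []
    for row in rref(doubled):
        left, right = row[:n], row[n:]
        if any(left):
            xs.append(left)        # basis vector of S + T
        elif any(right):
            ys.append(right)       # basis vector of S ∩ T
    return xs, ys

vs = [[1, 1, 1, 1], [2, 0, -2, 3], [1, 3, 5, 0], [7, -3, 8, -1]]
ws = [[0, -2, -4, 1], [1, 5, 9, -1], [3, -3, -9, 6], [1, 1, 0, 0]]

xs, ys = sum_and_intersection(vs, ws)
dim_S = len([r for r in rref(vs) if any(r)])
dim_T = len([r for r in rref(ws) if any(r)])
assert len(xs) + len(ys) == dim_S + dim_T   # dim(S+T) + dim(S∩T) = dim S + dim T
```

The key point, worth noticing while "thinking about why it works", is that any row combination has the shape (Σaᵢvᵢ + Σbⱼwⱼ | Σaᵢvᵢ); when the left half is zero, the right half lies in both S and T.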
11. Let θ: R³ → R³ be defined by

θ ( x )   (  1 −4 2 ) ( x )
  ( y ) = ( −1 −2 2 ) ( y )
  ( z )   (  1 −5 3 ) ( z )

and let b be the basis of R³ given by

b = ( (1, 1, 2)ᵀ, (2, 1, 3)ᵀ, (1, 1, 1)ᵀ ).

Calculate Mbb(θ), and find a matrix X such that

X⁻¹ (  1 −4 2 ) X = Mbb(θ).
    ( −1 −2 2 )
    (  1 −5 3 )

12. Let V, W be vector spaces over the field F and let b = (v1, v2, ..., vn), c = (w1, w2, ..., wm) be bases of V, W respectively. Let L(V, W) be the set of all linear transformations from V to W, and let Mat(m × n, F) be the set of all m × n matrices over F. Prove that the function Ω: L(V, W) → Mat(m × n, F) defined by Ω(θ) = Mcb(θ) is a vector space isomorphism.

13. Find a basis for L(V, W). (Hint: Elements of the basis must be linear transformations from V to W, and to describe a linear transformation θ from V to W it suffices to specify θ(vi) for each i. First find a basis for Mat(m × n, F) and use the isomorphism above to get the required basis of L(V, W).)

14. For each of the following linear transformations calculate the dimensions of the kernel and image, and check that your answers are in agreement with the Main Theorem on Linear Transformations.
(i) θ: R⁴ → R² given by θ((x, y, z, w)ᵀ) = ( 2 −1 3 5 ; 1 −1 1 0 )(x, y, z, w)ᵀ.
(ii) θ: R² → R³ given by θ((x, y)ᵀ) = ( 4 −2 ; −2 1 ; −6 3 )(x, y)ᵀ.
(iii) θ: V → V given by θ(p(x)) = p′(x), where V is the space of all polynomials over R of degree less than or equal to 3.
Finally, suppose that θ: R⁶ → R⁴ is a linear transformation with kernel of dimension 2. Is θ surjective?
15. Let θ: R³ → R⁴ be defined by

θ ( x )   ( 1 −1 −1 ) ( x )
  ( y ) = ( 2  3  1 ) ( y ) .
  ( z )   ( 1 −1 −2 ) ( z )
          ( 4 −2 −4 )

Find bases b = (u1, u2, u3) of R³ and c = (v1, v2, v3, v4) of R⁴ such that the matrix of θ relative to b and c is

( 1 0 0 )
( 0 1 0 )
( 0 0 1 )
( 0 0 0 ).

16. Let U and V be vector spaces over F and φ: U → V a linear transformation. Let b, b′ be bases for U and d, d′ bases for V. Prove that the matrices Mdb(φ) and Md′b′(φ) have the same rank.

17. Let A be the 50 × 100 matrix with (i, j)th entry 100(i − 1) + j. (Thus the first row consists of the numbers from 1 to 100, the second consists of the numbers from 101 to 200, and so on.) Let V be the space of all 50-component rows v over R such that vA = 0 and W the space of all 100-component columns w over R such that Aw = 0. (That is, V and W are the left null space and right null space of A.) Calculate dim W − dim V. (Hint: You do not really have to do any calculation!)

18. Let V and W be finitely generated vector spaces of the same dimension, and let θ: V → W be a linear transformation. Prove that θ is injective if and only if it is surjective.

19. Give an example, or prove that no example can exist, for each of the following.
(i) A surjective linear transformation θ: R⁸ → R⁵ with kernel of dimension 4.
(ii) Matrices A ∈ Mat(3 × 2, R), B ∈ Mat(2 × 2, R), C ∈ Mat(2 × 3, R) with

ABC = ( 2 2 2 )
      ( 3 3 0 ) .
      ( 4 0 2 )
8  Permutations and determinants

This chapter is devoted to studying determinants, and in particular to proving the properties which were stated without proof in Chapter Two. Since a proper understanding of determinants necessitates some knowledge of permutations, we investigate these first. Our discussion of this topic will be brief since all we really need is the definition of the parity of a permutation (see Definition 8.3 below), and some of its basic properties.

§8a Permutations

8.1 Definition  A permutation of the set {1, 2, ..., n} is a bijective function {1, 2, ..., n} → {1, 2, ..., n}.

For example, the function σ: {1, 2, 3} → {1, 2, 3} defined by

σ(1) = 2,   σ(2) = 3,   σ(3) = 1

is a permutation. In a notation which is commonly used one would write

σ = ( 1 2 3 )
    ( 2 3 1 )

for this permutation. In general, the notation

τ = ( 1  2  3  ···  n  )
    ( r1 r2 r3 ··· rn )

means τ(1) = r1, τ(2) = r2, ..., τ(n) = rn. The set of all permutations of {1, 2, ..., n} will be denoted by 'Sn'.

Comments
8.1.1 Altogether there are six permutations of {1, 2, 3}, namely

( 1 2 3 )  ( 1 2 3 )  ( 1 2 3 )  ( 1 2 3 )  ( 1 2 3 )  ( 1 2 3 )
( 1 2 3 )  ( 1 3 2 )  ( 2 1 3 )  ( 2 3 1 )  ( 3 1 2 )  ( 3 2 1 ).

8.1.2 It is usual to give a more general definition than 8.1 above: a permutation of any set S is a bijective function from S to itself. However, we will only need permutations of {1, 2, ..., n}.

Multiplication of permutations is defined to be composition of functions.

8.2 Definition  For σ, τ ∈ Sn define στ: {1, 2, ..., n} → {1, 2, ..., n} by (στ)(i) = σ(τ(i)).

Comments
8.2.1 We proved in Chapter One that the composite of two bijections is a bijection; hence the product of two permutations is a permutation. Furthermore, for each permutation σ there is an inverse permutation σ⁻¹ such that σσ⁻¹ = σ⁻¹σ = i (the identity permutation). In fact, permutations form an algebraic system known as a group. We leave the study of groups to another course.
8.2.2 Unfortunately, there are two conflicting conventions within the mathematical community, both widespread, concerning multiplication of permutations. Some people, perhaps most people, write permutations as right operators, rather than left operators (which is the convention that we will follow). For left operators the permutation is written to the left of the thing on which it acts, as in 'σ(i)'; the right operator convention is to write either 'iσ' or 'i^σ' instead. Naturally enough, right operator people define the product στ of two permutations σ and τ by the rule i^(στ) = (i^σ)^τ, instead of by the rule given in 8.2 above. This seems at first to be merely a notational matter of no mathematical consequence. However, for right operators 'στ' means 'apply σ first, then apply τ', while for left operators it means 'apply τ first, then apply σ' (since '(στ)(i)' means 'σ applied to (τ applied to i)'). The upshot is that we multiply permutations from right to left; others go from left to right.
Examples
#1 Compute the following product of permutations:

( 1 2 3 4 ) ( 1 2 3 4 )
( 2 3 4 1 ) ( 3 2 4 1 ).

Let σ = ( 1 2 3 4 ; 2 3 4 1 ) and τ = ( 1 2 3 4 ; 3 2 4 1 ). Then

(στ)(1) = σ(τ(1)) = σ(3) = 4.

The procedure is immediately clear: to find what the product does to 1, first find the number underneath 1 in the right hand factor—it is 3 in this case—then look under this number in the next factor to get the answer, 4. Repeat this process to find what the product does to 2, then 3 and 4. We find that

( 1 2 3 4 ) ( 1 2 3 4 ) = ( 1 2 3 4 )
( 2 3 4 1 ) ( 3 2 4 1 )   ( 4 3 1 2 ).

Note that the right operator convention leads to exactly the same method for computing products, but starting with the left hand factor and proceeding to the right. This would give

( 1 2 3 4 )
( 2 4 1 3 )

as the answer.

#2 It is equally easy to compute a product of more than two factors, for instance

( 1 2 3 ) ( 1 2 3 ) ( 1 2 3 ) = ( 1 2 3 )
( 3 1 2 ) ( 1 3 2 ) ( 2 3 1 )   ( 2 1 3 ).

Starting from the right one finds, for example, that 3 goes to 1, then to 1, then to 3; so the product maps 3 to 3.
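The bookkeeping in these examples is easy to mechanise. In the sketch below a permutation of {1, ..., n} is stored as a Python tuple p with p[i-1] = σ(i), and composition follows the left operator convention of 8.2; the assertions recompute Example #1.

```python
def compose(sigma, tau):
    """Left operator convention: (sigma tau)(i) = sigma(tau(i))."""
    return tuple(sigma[t - 1] for t in tau)

def inverse(sigma):
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma, start=1):
        inv[s - 1] = i
    return tuple(inv)

sigma = (2, 3, 4, 1)        # second row of the first factor
tau = (3, 2, 4, 1)          # second row of the second factor

assert compose(sigma, tau) == (4, 3, 1, 2)   # the product computed above
# The right operator convention multiplies the other way around:
assert compose(tau, sigma) == (2, 4, 1, 3)
assert compose(sigma, inverse(sigma)) == (1, 2, 3, 4)
```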
The notation adopted above is good, but can be somewhat cumbersome. The trouble is that each number in {1, 2, ..., n} must be written down twice. There is a shorter notation which writes each number down at most once. In this notation the permutation

σ = ( 1 2 3 4 5 6 7 8 9 10 11 )
    ( 8 6 3 2 1 9 7 5 4 11 10 )

would be written as σ = (1, 8, 5)(2, 6, 9, 4)(10, 11); that is, in the disjoint cycle notation. The idea is that the permutation is made up of disjoint "cycles". Here the cycle (1, 8, 5) indicates that σ maps 1 to 8, 8 to 5 and 5 to 1, while the cycle (2, 6, 9, 4) means that 6 is the image of 2, 9 the image of 6, 4 the image of 9 and 2 the image of 4 under the action of σ. Cycles of length one (like (3) and (7) in the above example) are usually omitted, and the cycles can be written in any order.

The computation of products is equally quick, or quicker, in this notation, and (fortunately) the cycles of length one have no effect in such computations. The reader can check, for example, that if σ = (1)(2, 3)(4) and τ = (1, 4)(2, 3) then στ and τσ are both equal to (1, 4)(2)(3).

8.3 Definition  For each σ ∈ Sn define l(σ) to be the number of ordered pairs (i, j) with i < j such that σ(i) > σ(j). If l(σ) is an odd number we say that σ is an odd permutation; if l(σ) is even we say that σ is an even permutation. We define ε: Sn → {+1, −1} by

ε(σ) = (−1)^l(σ) = +1 if σ is even, −1 if σ is odd,

for each σ ∈ Sn, and call ε(σ) the parity or sign of σ.

The reason for this definition becomes apparent (after a while!) when one considers the polynomial E in n variables x1, x2, ..., xn defined by E = Π(i<j) (xi − xj); that is, E is the product of all factors (xi − xj) with i < j. In the case n = 4 we would have

E = (x1 − x2)(x1 − x3)(x1 − x4)(x2 − x3)(x2 − x4)(x3 − x4).

Now take a permutation σ and in the expression for E replace each subscript i by σ(i); that is, the factor (xi − xj) in E is transformed into the factor (xσ(i) − xσ(j)) in σ(E). For instance, if σ = (1, 3, 4, 2) then applying σ to E in this way gives

σ(E) = (x3 − x1)(x3 − x4)(x3 − x2)(x1 − x4)(x1 − x2)(x4 − x2).

Note that one gets exactly the same factors, except that some of the signs are changed: (xσ(i) − xσ(j)) appears explicitly in the expression for E if σ(i) < σ(j), while (xσ(j) − xσ(i)) appears there if σ(i) > σ(j). Thus the number of sign changes is exactly the number of pairs (i, j) such that 1 ≤ i < j ≤ n and σ(i) > σ(j); that is, the number of sign changes is l(σ). It follows that σ(E) equals E if σ is even and −E if σ is odd; that is,

σE = ε(σ)E.

The example treats the case n = 4, but with some thought it can be seen that this will always be the case.

The following definition was implicitly used in the above argument:

8.4 Definition  Let σ ∈ Sn and let f be a real-valued function of n real variables x1, x2, ..., xn. Then σf is the function defined by

(σf)(x1, x2, ..., xn) = f(xσ(1), xσ(2), ..., xσ(n)).

If f and g are two real valued functions of n variables it is usual to define their product fg by

(fg)(x1, x2, ..., xn) = f(x1, x2, ..., xn)g(x1, x2, ..., xn).

(Note that this is the pointwise product, not to be confused with the composite, given by f(g(x)), which is the only kind of product of functions previously encountered in this book. In fact, since f is a function of n variables, f(g(x)) cannot make sense unless n = 1. Thus confusion is unlikely.) We leave it for the reader to check that, for the pointwise product, σ(fg) = (σf)(σg) for all σ ∈ Sn and all functions f and g. In particular, if c is a constant this gives σ(cf) = c(σf). Furthermore:

8.5 Proposition  For all σ, τ ∈ Sn and any function f of n variables we have (στ)f = σ(τf).

Proof. Let σ, τ ∈ Sn, let x1, x2, ..., xn be variables, and define yi = xσ(i) for each i. Then by definition

(σ(τf))(x1, x2, ..., xn) = (τf)(y1, y2, ..., yn)

and

(τf)(y1, y2, ..., yn) = f(yτ(1), yτ(2), ..., yτ(n)) = f(xσ(τ(1)), xσ(τ(2)), ..., xσ(τ(n))).

Thus we have shown that

(σ(τf))(x1, x2, ..., xn) = f(x(στ)(1), x(στ)(2), ..., x(στ)(n))

for all values of the xi, and so σ(τf) = (στ)f, as claimed.
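The inversion count l(σ) of Definition 8.3 is directly computable. A small sketch, again using the tuple representation p[i-1] = σ(i); the permutation checked is the σ = (1, 3, 4, 2) used with E above.

```python
from itertools import combinations

def inversions(p):
    """l(sigma): the number of pairs i < j with sigma(i) > sigma(j)."""
    return sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j])

def sign(p):
    return -1 if inversions(p) % 2 else 1

# The cycle (1, 3, 4, 2) as a tuple: sigma(1)=3, sigma(2)=1, sigma(3)=4, sigma(4)=2.
sigma = (3, 1, 4, 2)

assert inversions(sigma) == 3    # three sign changes in sigma(E)
assert sign(sigma) == -1         # so sigma is odd and sigma(E) = -E
```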
It can now be seen that for all σ, τ ∈ Sn, with E as above,

ε(στ)E = (στ)E = σ(τE) = σ(ε(τ)E) = ε(τ)σE = ε(τ)ε(σ)E,

and we deduce that ε(στ) = ε(σ)ε(τ).

8.6 Definition  A permutation in Sn which is of the form (i, j) for some i, j ∈ I = {1, 2, ..., n} is called a transposition. That is, if τ is a transposition then there exist i, j ∈ I such that τ(i) = j, τ(j) = i, and τ(k) = k for all other k ∈ I. A transposition of the form (i, i + 1) is called simple.

8.7 Lemma  Suppose that τ = (i, i + 1) is a simple transposition in Sn. If σ ∈ Sn and σ′ = στ then

l(σ′) = l(σ) + 1 if σ(i) < σ(i + 1),
l(σ′) = l(σ) − 1 if σ(i) > σ(i + 1).

Proof. Observe that σ′(i) = σ(i + 1) and σ′(i + 1) = σ(i), while for other values of j we have σ′(j) = σ(j). In our original notation for permutations, σ′ is obtained from σ by swapping the numbers in the ith and (i + 1)th positions in the second row. The point is now that if r > s and r occurs to the left of s in the second row of σ then the same is true in the second row of σ′, except in the case of the two numbers we have swapped. Hence the number of such pairs is one more for σ′ than it is for σ if σ(i) < σ(i + 1), one less if σ(i) > σ(i + 1).

At the risk of making an easy argument seem complicated, let us spell this out in more detail. Define

N = { (j, k) | j < k and σ(j) > σ(k) }   and   N′ = { (j, k) | j < k and σ′(j) > σ′(k) },

so that by definition the number of elements of N is l(σ) and the number of elements of N′ is l(σ′). We split N into four pieces as follows:

N1 = { (j, k) ∈ N | j ∉ {i, i + 1} and k ∉ {i, i + 1} }
N2 = { (j, k) ∈ N | j ∈ {i, i + 1} and k ∉ {i, i + 1} }
N3 = { (j, k) ∈ N | j ∉ {i, i + 1} and k ∈ {i, i + 1} }
N4 = { (j, k) ∈ N | j ∈ {i, i + 1} and k ∈ {i, i + 1} }.
Moving on now to pairs of the form (j. k) and (i + 1. τr where the τi are simple transpositions. . and in N4 and not N4 if σ(i) < σ(i + 1). Thus N1 = N1 . k) → (i + 1. k) gives a one to one correspondence between N2 and N2 . as required. . Consider now pairs (i. and since σ(i) = σ (i + 1) and σ(k) = σ (k) we see that σ(i) > σ(k) if and only if σ (i + 1) > σ (k). . The second part is proved by induction on l(σ). / we have that σ(j) > σ(i) if and only if σ (j) > σ (i + 1) and σ(j) > σ(i + 1) if and only if σ (j) > σ (i). Every element of N lies in exactly one of the Ni . i) ∈ N . i + 1). Hence 1 ≤ σ(1) < σ(2) < · · · < σ(n) ≤ n Sc ho ht em ol rs ive Un ati cs of M ath ity of Sy dn an ey dS tat ist ics . k) and (i + 1. i + 1) → (j. i+1) where j ∈ {i. Thus (i. i + 1}. where k ∈ {i. i + 1) ∈ N if and only if (j.Chapter Eight: Permutations and determinants Deﬁne also N1 . using 8. N3 and N4 by exactly the same formulae with N replaced by N . k) ∈ N . i + 1) ∈ N and (j. If neither j nor k is in {i. which is in N4 and not in N4 if σ(i) > σ(i + 1).8 Proposition (i) If σ = τ1 τ2 . . Thus (j. k). one less in the latter. 177 Co rig py 8. then l(σ) is odd is r is odd and even if r is even. N2 . The details are left as an exercise. The point is that l((τ1 τ2 . Since also j < i if and only if j < i + 1 we see that (j. We show / that (i. and every element of N lies in exactly one of the Ni . . τr−1 . exactly the same argument gives that i + 1 < k and σ(i + 1) > σ(k) if and only if i < k and σ (i) > σ (k). . τr−1 )τr ) = l(τ1 τ2 . i + 1) and (j. and so (j. (ii) Every permutation σ ∈ Sn can be expressed as a product of simple transpositions. We have now shown that the number of elements in N and not in N4 equals the number of elements in N and not in N4 . τr−1 ) ± 1 and in either case the parity of τ1 τ2 . If l(σ) = 0 then σ(i) < σ(j) whenever i < j. i + 1} then σ (j) and σ (k) are the same as σ(j) and σ(k).7. Hence N has one more element than N in the former case. 
The ﬁrst part is proved by an easy induction on r. i) ∈ N if and only if (j. τr is opposite that of τ1 τ2 . since σ(i + 1) = σ (i). namely (i. Obviously i < k if and only if i + 1 < k. i) gives a one to one correspondence between N3 and N3 . hence i < k and σ(i) > σ(k) if and only if i + 1 < k and σ (i + 1) > σ (k). k) ∈ N if and only if (i + 1. k) is in N if and only if it is in N . i) → (j. . But N4 and N4 each have at most one element. Furthermore. k) → (i. . i) and (j. i+1}. . Proof. .
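The bookkeeping in the proof of 8.7 is easy to check by machine. The following Python sketch (an illustration, not part of the original notes) represents a permutation as a 0-based tuple, counts l(σ) as the number of pairs (j, k) with j < k and σ(j) > σ(k), and verifies over all of S4 that multiplying by a simple transposition changes l(σ) by exactly 1, in the direction the proof predicts:

```python
from itertools import permutations

def l(sigma):
    # l(sigma): the number of pairs (j, k) with j < k and sigma(j) > sigma(k).
    # sigma is a tuple with sigma[j] = sigma(j), indices 0-based.
    n = len(sigma)
    return sum(1 for j in range(n) for k in range(j + 1, n) if sigma[j] > sigma[k])

def compose(sigma, tau):
    # (sigma tau)(x) = sigma(tau(x)); with tau = (i, i+1) this matches
    # sigma' = sigma tau in the proof of 8.7, so that sigma'(i) = sigma(i+1).
    return tuple(sigma[tau[x]] for x in range(len(sigma)))

n = 4
for sigma in permutations(range(n)):
    for i in range(n - 1):
        tau = list(range(n))
        tau[i], tau[i + 1] = tau[i + 1], tau[i]   # the simple transposition (i, i+1)
        sigma1 = compose(sigma, tuple(tau))
        # l goes up by 1 when sigma(i) < sigma(i+1), down by 1 otherwise
        expected = l(sigma) + (1 if sigma[i] < sigma[i + 1] else -1)
        assert l(sigma1) == expected
print("8.7 verified for all sigma in S4")
```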
The second part is proved by induction on l(σ). If l(σ) = 0 then σ(i) < σ(j) whenever i < j. Hence 1 ≤ σ(1) < σ(2) < · · · < σ(n) ≤ n, and it follows that σ(i) = i for all i. So the only permutation σ with l(σ) = 0 is the identity permutation, and this is trivially a product of simple transpositions: an empty product. (Just as empty sums are always defined to be 0, empty products are always defined to be 1.) But if you don't like empty products, observe that τ² = i for any simple transposition τ, so that i = ττ is a product of simple transpositions.

Suppose now that l(σ) = l ≥ 1, and that all permutations σ′ with l(σ′) < l can be expressed as products of simple transpositions. There must exist some i such that σ(i) > σ(i+1), or else the same argument as used above would prove that σ = i. Choosing such an i, let τ = (i, i+1) and σ′ = στ. By 8.7 we have l(σ′) = l − 1, and by the inductive hypothesis there exist simple transpositions τi with σ′ = τ1τ2 · · · τr. Since τ² = i, this gives σ = σ′τ = τ1τ2 · · · τr τ, a product of simple transpositions.

8.9 Corollary If σ, τ ∈ Sn then ε(στ) = ε(σ)ε(τ).

Proof. Write σ and τ as products of simple transpositions, and suppose that there are r factors in the expression for σ and s in the expression for τ. Multiplying these expressions expresses στ as a product of r + s simple transpositions, and so by 8.8 (i)

ε(στ) = (−1)^(r+s) = (−1)^r (−1)^s = ε(σ)ε(τ).

Our next proposition shows that if the disjoint cycle notation is employed then calculation of the parity of a permutation is a triviality.

8.10 Proposition Transpositions are odd permutations. More generally, any cycle with an even number of terms is an odd permutation, and any cycle with an odd number of terms is an even permutation.

Proof. It is easily seen (by 8.7 with σ = i, or directly) that if τ = (m, m+1) is a simple transposition then l(τ) = 1, and so τ is odd: ε(τ) = (−1)^1 = −1.

Consider next an arbitrary transposition σ = (i, j) ∈ Sn, where i < j, and let τ = (i, i+1) and ρ = (i+1, j). A short calculation yields ρσ = τρ, with both sides equal to the cycle (i, i+1, j); equivalently, (i+1, j)(i, i+1)(i+1, j) = (i, j). By 8.9 we obtain ε(ρ)ε(σ) = ε(ρσ) = ε(τρ) = ε(τ)ε(ρ), and cancelling ε(ρ) gives ε(σ) = ε(τ) = −1 by the first case. (It is also easy to calculate l(σ) directly in this case; in fact, l(σ) = 2(j − i) − 1.)
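Corollary 8.9 and the first part of 8.10 can be spot-checked by brute force. In this Python sketch (illustrative only; permutations are 0-based tuples, and ε(σ) is computed as (−1) raised to the number of inversions), multiplicativity of ε is verified over all of S4, as is the oddness of every transposition (i, j):

```python
from itertools import permutations

def sign(sigma):
    # epsilon(sigma) = (-1)^l(sigma), where l(sigma) counts inversions
    n = len(sigma)
    inv = sum(1 for j in range(n) for k in range(j + 1, n) if sigma[j] > sigma[k])
    return (-1) ** inv

def compose(sigma, tau):
    # (sigma tau)(x) = sigma(tau(x))
    return tuple(sigma[tau[x]] for x in range(len(sigma)))

n = 4
perms = list(permutations(range(n)))

# 8.9: epsilon is multiplicative
assert all(sign(compose(s, t)) == sign(s) * sign(t) for s in perms for t in perms)

# 8.10 (first part): every transposition (i, j) is odd
for i in range(n):
    for j in range(i + 1, n):
        t = list(range(n))
        t[i], t[j] = t[j], t[i]
        assert sign(tuple(t)) == -1
print("8.9 and 8.10 verified in S4")
```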
Finally, let ϕ = (r1, r2, . . . , rk) be an arbitrary cycle of k terms. Another short calculation shows that ϕ = (r1, r2)(r2, r3) · · · (rk−1, rk), a product of k − 1 transpositions. So 8.9 gives ε(ϕ) = (−1)^(k−1), which is −1 if k is even and +1 if k is odd.

Comment Proposition 8.10 shows that a permutation is even if and only if, when it is expressed as a product of cycles, the number of even-term cycles is even.

The proof of the final result of this section is left as an exercise.

8.11 Proposition (i) The number of elements of Sn is n!.
(ii) The identity i ∈ Sn, defined by i(i) = i for all i, is an even permutation.
(iii) The parity of σ⁻¹ equals that of σ (for any σ ∈ Sn).
(iv) If σ, τ, τ′ ∈ Sn are such that στ = στ′ then τ = τ′. Thus if σ is fixed then, as τ varies over all elements of Sn, so too does στ.

§8b Determinants

8.12 Definition Let F be a field and A ∈ Mat(n × n, F), and let the entry in the ith row and jth column of A be aij. The determinant of A is the element of F given by

det A = Σ_{σ∈Sn} ε(σ) a1σ(1) a2σ(2) · · · anσ(n) = Σ_{σ∈Sn} ε(σ) Π_{i=1}^{n} aiσ(i).

Comments 8.12.1 Determinants are only defined for square matrices.

8.12.2 Let us examine the definition first in the case n = 3. That is, we seek to calculate the determinant of a matrix

A = ( a11 a12 a13 )
    ( a21 a22 a23 )
    ( a31 a32 a33 ).
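Definition 8.12 can be transcribed into code almost word for word. The following Python sketch (an illustration, not from the notes; it uses 0-based indexing, and the example matrix is an arbitrary choice) sums ε(σ) a1σ(1) · · · anσ(n) over all σ ∈ Sn:

```python
from itertools import permutations

def sign(sigma):
    # epsilon(sigma) = (-1)^l(sigma), l(sigma) = number of inversions
    n = len(sigma)
    inv = sum(1 for j in range(n) for k in range(j + 1, n) if sigma[j] > sigma[k])
    return (-1) ** inv

def det(a):
    # det A = sum over sigma in S_n of epsilon(sigma) a[0][sigma(0)] ... a[n-1][sigma(n-1)]
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):   # n! permutations: feasible only for small n
        term = sign(sigma)
        for i in range(n):
            term *= a[i][sigma[i]]
        total += term
    return total

A = [[2, 0, 1],
     [1, 3, 0],
     [0, 1, 4]]
print(det(A))   # -> 25
```

Since the sum has n! terms, this is practical only for very small n.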
The determinant of this 3 × 3 matrix A = (aij) is given by

det A = a11a22a33 − a11a23a32 − a12a21a33 + a12a23a31 + a13a21a32 − a13a22a31.

The definition involves a sum over all permutations in S3, and there are six of these. Using the disjoint cycle notation we find that there are two 3-cycles ((1, 2, 3) and (1, 3, 2)), three transpositions ((1, 2), (1, 3) and (2, 3)) and the identity i = (1). The 3-cycles and the identity are even, and the transpositions are odd. So, for example, if σ = (1, 2, 3) then ε(σ) = +1, σ(1) = 2, σ(2) = 3 and σ(3) = 1, so that ε(σ)a1σ(1)a2σ(2)a3σ(3) = (+1)a12a23a31. The results of similar calculations for all the permutations in S3 are listed in the following table, in which the matrix entries aiσ(i) selected by each permutation are also indicated.

Permutation    entries aiσ(i)      ε(σ)a1σ(1)a2σ(2)a3σ(3)
(1)            a11, a22, a33       (+1)a11a22a33
(2, 3)         a11, a23, a32       (−1)a11a23a32
(1, 2)         a12, a21, a33       (−1)a12a21a33
(1, 2, 3)      a12, a23, a31       (+1)a12a23a31
(1, 3, 2)      a13, a21, a32       (+1)a13a21a32
(1, 3)         a13, a22, a31       (−1)a13a22a31

The determinant is the sum of the six terms in the third column.

8.12.3 Notice that each term in the table is a product of three entries of A chosen so that each of the three rows of A contains exactly one of the chosen entries and each of the three columns contains exactly one of them. Furthermore, the six selections shown in the table are the only ones possible.
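The six rows of the table can be generated mechanically. In the following Python sketch (illustrative only) each σ ∈ S3 is shown in one-line form, as the sequence (σ(1), σ(2), σ(3)) rather than in cycle notation, together with its signed term:

```python
from itertools import permutations

def sign(sigma):
    # epsilon(sigma) = (-1)^(number of inversions); works on any comparable tuple
    n = len(sigma)
    inv = sum(1 for j in range(n) for k in range(j + 1, n) if sigma[j] > sigma[k])
    return (-1) ** inv

# One entry per permutation sigma in S3:
# the signed term epsilon(sigma) a1sigma(1) a2sigma(2) a3sigma(3).
terms = []
for sigma in permutations((1, 2, 3)):        # sigma[i-1] = sigma(i), 1-based values
    eps = "+1" if sign(sigma) == 1 else "-1"
    term = "".join(f"a{i}{sigma[i - 1]}" for i in (1, 2, 3))
    terms.append(f"({eps}){term}")
    print(f"sigma = {sigma}: ({eps}){term}")
```

The printed terms agree, row for row, with the third column of the table above.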
So the determinant of a 3 × 3 matrix is a sum, with appropriate signs, of all possible terms obtained by multiplying three factors with the property that every row contains exactly one of the factors and every column contains exactly one of the factors.

Of course, there is nothing special about the 3 × 3 case. The determinant of a general n × n matrix is obtained by the natural generalization of the above method, as follows. Choose n matrix positions in such a way that no row contains more than one of the chosen positions and no column contains more than one of the chosen positions. Then it will necessarily be true that each row contains exactly one of the chosen positions and each column contains exactly one of the chosen positions. There are exactly n! ways of doing this. In each case multiply the entries in the n chosen positions, and then multiply by the sign of the corresponding permutation. The determinant is the sum of the n! terms so obtained.

We give an example to illustrate the rule for finding the permutation corresponding to a choice of matrix positions of the above kind. Suppose that n = 4. If one chooses the 4th entry in row 1, the 2nd in row 2, the 1st in row 3 and the 3rd in row 4, then the permutation is

( 1 2 3 4 )
( 4 2 1 3 ).

That is, if the jth entry in row i is chosen then the permutation maps i to j. (In our example the term would be multiplied by +1, since the permutation is the cycle (1, 4, 3), which is an even permutation.)

8.12.4 So far in this section we have merely been describing what the determinant is; we have not addressed the question of how best in practice to calculate the determinant of a given matrix. We will come to this question later. For now, suffice it to say that, except in the case n = 3, the definition itself does not provide a good method: it is too tiresome to compute and add up the 24 terms arising in the definition of a 4 × 4 determinant, or the 120 for a 5 × 5, and things rapidly get worse still.

We start now to investigate properties of determinants. Several are listed in the next theorem.

8.13 Theorem Let F be a field and A ∈ Mat(n × n, F), and let the rows of A be a1, a2, . . . , an ∈ tFn.
(i) Suppose that 1 ≤ k ≤ n and that ak = λa′k + µa′′k. Then det A = λ det A′ + µ det A′′, where A′ is the matrix with a′k as its kth row and all other rows the same as A, and A′′ is the matrix with a′′k as its kth row and all other rows the same as A.
(ii) Suppose that τ ∈ Sn and that B is the matrix whose ith row is aτ(i), for each i. Then det B = ε(τ) det A.
(iii) If A has a zero row then det A = 0.
(iv) If A = I then det A = 1.
(v) The determinant of the transpose of A is the same as the determinant of A.
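Before turning to the proof, the listed properties can be spot-checked numerically against Definition 8.12. In the following Python sketch the matrices, the scalars λ, µ and the permutation τ are arbitrary illustrative choices, not taken from the text:

```python
from itertools import permutations
from math import prod

def sign(sigma):
    n = len(sigma)
    return (-1) ** sum(1 for j in range(n) for k in range(j + 1, n) if sigma[j] > sigma[k])

def det(a):
    # determinant computed straight from the permutation-sum definition
    n = len(a)
    return sum(sign(s) * prod(a[i][s[i]] for i in range(n)) for s in permutations(range(n)))

A1 = [[1, 2, 0], [0, 1, 3], [2, 1, 1]]   # plays the role of A' in part (i)
A2 = [[1, 2, 0], [4, 0, 1], [2, 1, 1]]   # plays the role of A'': same rows except row k
lam, mu, k = 3, -2, 1

A = [row[:] for row in A1]
A[k] = [lam * x + mu * y for x, y in zip(A1[k], A2[k])]
assert det(A) == lam * det(A1) + mu * det(A2)          # (i): linearity in the kth row

tau = (2, 0, 1)                                        # ith row of B is row tau(i) of A1
B = [A1[tau[i]] for i in range(3)]
assert det(B) == sign(tau) * det(A1)                   # (ii): det B = epsilon(tau) det A

assert det([[1, 2, 3], [0, 0, 0], [4, 5, 6]]) == 0     # (iii): zero row
assert det([[1, 0, 0], [0, 1, 0], [0, 0, 1]]) == 1     # (iv): identity
T = [[A1[j][i] for j in range(3)] for i in range(3)]
assert det(T) == det(A1)                               # (v): transpose
print("all checks pass")
```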
Proof. (i) Let the (i, j)th entries of A, A′ and A′′ be (respectively) aij, a′ij and a′′ij. Then we have a′ij = a′′ij = aij whenever i ≠ k, and akj = λa′kj + µa′′kj for all j. Now by definition det A is the sum over all σ ∈ Sn of the terms

ε(σ) a1σ(1) a2σ(2) · · · anσ(n),

and in each such term we may substitute akσ(k) = λa′kσ(k) + µa′′kσ(k), so that the term becomes the sum of the two terms

λε(σ) a1σ(1) · · · a(k−1)σ(k−1) a′kσ(k) a(k+1)σ(k+1) · · · anσ(n)

and

µε(σ) a1σ(1) · · · a(k−1)σ(k−1) a′′kσ(k) a(k+1)σ(k+1) · · · anσ(n).

In the first of these we replace aiσ(i) by a′iσ(i) (for all i ≠ k), and in the second we likewise replace aiσ(i) by a′′iσ(i); these replacements change nothing, since the entries concerned are equal. This shows that det A is equal to

λ Σ_{σ∈Sn} ε(σ) a′1σ(1) a′2σ(2) · · · a′nσ(n) + µ Σ_{σ∈Sn} ε(σ) a′′1σ(1) a′′2σ(2) · · · a′′nσ(n),

which is λ det A′ + µ det A′′, since A′ has a′k as its kth row and other rows the same as A, and similarly for A′′.

(ii) Let aij and bij be the (i, j)th entries of A and B respectively. We are given that if bi is the ith row of B then bi = aτ(i); that is, the first row of B is aτ(1), the second is aτ(2),