

Preface
These lecture notes contain, more or less verbatim, what appears on the board during the lectures.
The colour–coding signifies the following: what is written in red is being defined. The named theorems
and statements are denoted by blue (these could be referred to by these names). Finally, green is used
for references to lecture notes of other modules (mostly MATH1048 Linear Algebra I).

Acknowledgement
For the most part, these notes were written by Dr Bernhard Koeck, and typed into LaTeX by the
undergraduate student Thomas Blundell-Hunter in summer 2011.

Contents
1 Groups

2 Fields and Vector Spaces

3 Bases

4 Linear Transformations

5 Determinants

6 Diagonalisability

1 Groups
Recall that the cartesian product G × H of two sets G and H is defined as the set

G × H := {(g, h) : g ∈ G, h ∈ H}

consisting of all pairs (g, h) where g ∈ G and h ∈ H. For instance, R2 = R × R.


Definition 1.1. (a) A binary operation on a set G is a map G × G → G from the cartesian product
G × G := {(a, b) : a ∈ G, b ∈ G} to G. We write a ∗ b (or ab or a ◦ b or a + b) for the image of the
pair (a, b) ∈ G × G in G.
(b) A group is a set G together with a binary operation G × G → G, (a, b) 7→ a ∗ b, (which we usually
refer to as “group operation” or “group multiplication” or just “multiplication”) such that the
following axioms are satisfied:
i Associativity: For all a, b, c ∈ G we have (a ∗ b) ∗ c = a ∗ (b ∗ c) in G.
ii Existence of the identity (or neutral) element: There exists e ∈ G such that for all a ∈ G we
have e ∗ a = a = a ∗ e.
iii Existence of the right inverse: For every a ∈ G there exists b ∈ G such that a ∗ b = e.
(c) A group is called abelian (or commutative) if for all a, b ∈ G we have a ∗ b = b ∗ a in G.
(d) A group is called finite if it has only finitely many elements; in this case the cardinality |G| is called
the order of G.
Example 1.2. (a) The usual addition N × N → N, (m, n) ↦ m + n, defines a binary operation on N;
so does multiplication; but not subtraction because for instance 2 − 3 = −1 ∉ N.
(b) The sets Z, Q, R and C together with addition as the binary operation are abelian groups. The sets
Q∗ := Q\{0}, R∗ := R\{0} and C∗ := C\{0} together with multiplication as the binary operation
are abelian groups.
(c) The set N together with addition is not a group because, for example, there doesn’t exist a (right)
inverse of 1 in N.
(d) For any n ∈ N, the set Rn together with vector addition is a group. For any m, n ∈ N the set
Mm×n (R) of real (m × n)-matrices together with matrix addition is a group. For every n ∈ N,
the set GLn (R) of invertible real (n × n)-matrices together with matrix multiplication is a group,
called the general linear group (see also Linear Algebra I). If n > 1, GLn (R) is not abelian, e.g.:

$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, but $\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$
Proposition 1.3. Let G be a group.
(a) There exists precisely one identity element in G.
(b) For every a ∈ G there exists precisely one right inverse b and this is also a left inverse of a (i.e.
we have b ∗ a = e). We write a−1 for the inverse of a (or −a if we use additive notation a + b).
Proof. (a) Suppose e and e′ are two identity elements in G;
=⇒ e ∗ e′ = e (because e′ is an identity element)
and e ∗ e′ = e′ (because e is an identity element);
=⇒ e = e′
(b) Let b be a right inverse of a and let c be a right inverse of b;
=⇒ a ∗ (b ∗ c) = a ∗ e (because c is a right inverse of b)
=a (because e is the identity element)
and a ∗ (b ∗ c) = (a ∗ b) ∗ c (by associativity)
=e∗c (because b is a right inverse of a)
=c (because e is the identity element).

=⇒ a = c.
=⇒ b ∗ a = b ∗ c = e (because c is a right inverse of b);
i.e. b is also a left inverse of a.
Suppose b′ is another right inverse of a.
=⇒ b = b ∗ e = b ∗ (a ∗ b′) = (b ∗ a) ∗ b′ = e ∗ b′ = b′. (These equalities hold by definition of e, by
definition of b′, by associativity, by the fact just proved and by definition of e, respectively.)

Proposition 1.4. Let G be a group.

(a) (Cancellation). If a, b, c ∈ G and a ∗ c = b ∗ c (or c ∗ a = c ∗ b) then a = b.

(b) For every a ∈ G both the map G → G, x 7→ a ∗ x, (left translation with a) and G → G, y 7→ y ∗ a
(right translation with a) are bijective. In other words: For all a, b ∈ G both the equation a ∗ x = b
and y ∗ a = b have a unique solution in G.

(c) For every a ∈ G we have (a−1 )−1 = a.

(d) For all a, b ∈ G we have (a ∗ b)−1 = b−1 ∗ a−1 .

(e) (Exponential Laws) For any m ∈ Z and a ∈ G we define:

$a^m := \begin{cases} \underbrace{a * a * \cdots * a}_{m\ \text{times}} & \text{if } m > 0; \\ e & \text{if } m = 0; \\ (a^{-1})^{|m|} & \text{if } m < 0. \end{cases}$

(We write ma instead of a^m when we use additive notation in G.)

Then a^(m+n) = a^m ∗ a^n and (a^m)^n = a^(mn) for all m, n ∈ Z.
If a, b ∈ G commute (i.e. if a ∗ b = b ∗ a) then (a ∗ b)^m = a^m ∗ b^m for all m ∈ Z.

Proof. (a) Multiply with c−1 .

(b) Injectivity: see (a).


Surjectivity: Let b ∈ G. Then b ∗ a−1 is mapped to b under right translation with a.

(c) Both (a−1 )−1 and a are solutions of the equation a−1 ∗ x = e (see 1.3(b)). Now apply (b).

(d) We have (a ∗ b) ∗ (b−1 ∗ a−1 ) = a ∗ (b ∗ (b−1 ∗ a−1 )) = a ∗ ((b ∗ b−1 ) ∗ a−1 ) = a ∗ (e ∗ a−1 ) = a ∗ a−1 =
e. =⇒ b−1 ∗ a−1 is a right inverse of a ∗ b.

(e) Left as an exercise.

Example 1.5. (a) The group table of a finite group G = {a1 , a2 , . . . , an } is given by:

∗    ⋯    aj    ⋯
⋮          ⋮
ai   ⋯  ai ∗ aj  ⋯
⋮          ⋮

The trivial group is the group with exactly one element, say e. This must have the group table:
∗ e
e e
Any group with 2 elements, say e and a, is given by the group table:

∗ e a
e e a
a a e

Any group with 3 elements, say e, a and b, must have the group table:

∗ e a b
e e a b
a a b e
b b e a

(Notice: a ∗ b = a would imply b = e by 1.4(a).) There are two “essentially different” groups of
order 4.
Note that Proposition 1.4(b) implies that the group tables must satisfy “sudoku rules”, i.e. every
group element must appear in each row and each column exactly once. However, not every table
obeying this sudoku rule defines a group, for instance the table

∗ e a b c d
e e a b c d
a a e c d b
b b c d e a
c c d a b e
d d b e a c

does not. Why? (Hint: what is a ∗ a ∗ b?)
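Such a check is mechanical, so it can be left to a computer. The following Python sketch (not part of the original notes; the names elems, table and op are ad hoc) encodes the table above, with indices 0 to 4 standing for e, a, b, c, d, and searches for triples violating associativity:

```python
# Brute-force check that the 5-element table above is not associative.
# table[x][y] encodes x * y, with 0..4 standing for e, a, b, c, d.
elems = ["e", "a", "b", "c", "d"]
table = [
    [0, 1, 2, 3, 4],   # e-row: e a b c d
    [1, 0, 3, 4, 2],   # a-row: a e c d b
    [2, 3, 4, 0, 1],   # b-row: b c d e a
    [3, 4, 1, 2, 0],   # c-row: c d a b e
    [4, 2, 0, 1, 3],   # d-row: d b e a c
]

def op(x, y):
    return table[x][y]

violations = [(x, y, z)
              for x in range(5) for y in range(5) for z in range(5)
              if op(op(x, y), z) != op(x, op(y, z))]
print(len(violations) > 0)   # True: associativity fails, so this is no group

# The hint's triple: (a*a)*b = e*b = b, but a*(a*b) = a*c = d.
a, b = 1, 2
print(elems[op(op(a, a), b)], elems[op(a, op(a, b))])   # b d
```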


(b) Let m ∈ N and define Cm := {0, 1, 2, . . . , m − 1}. We define a binary operation ⊕ on Cm by:

$x \oplus y := \begin{cases} x + y & \text{if } x + y < m; \\ x + y - m & \text{if } x + y \ge m. \end{cases}$

Then Cm together with ⊕ is an abelian group, called the cyclic group of order m.

Proof. Associativity: Let x, y, z ∈ {0, 1, . . . , m − 1}.
We want to show (x ⊕ y) ⊕ z = x ⊕ (y ⊕ z) in Cm .

First Case: Suppose x + y + z < m. =⇒ x + y < m and y + z < m.
=⇒ LHS = (x + y) ⊕ z = x + y + z = x ⊕ (y + z) = RHS (using associativity in Z, see 1.2(b)).

Second Case: Suppose m ≤ x + y + z < 2m.
Case (a): Suppose x + y < m.
=⇒ LHS = (x + y) ⊕ z = x + y + z − m.
Case (b): Suppose x + y ≥ m.
=⇒ LHS = (x + y − m) ⊕ z = x + y + z − m.
Hence in both cases (a) and (b) we have LHS = x + y + z − m.
Similarly we obtain RHS = x + y + z − m.

Third Case: Suppose x + y + z ≥ 2m. =⇒ x + y ≥ m and y + z ≥ m.
=⇒ LHS = (x + y − m) ⊕ z = x + y + z − 2m (since (x + y − m) + z ≥ m)
and RHS = x ⊕ (y + z − m) = x + y + z − 2m (since x + (y + z − m) ≥ m).

Identity Element: 0 is obviously an identity element in Cm .

Existence of the Right Inverse: The inverse of x ∈ {0, . . . , m − 1} is obviously m − x ∈ {0, . . . , m − 1}
if x ≠ 0, and 0 if x = 0.

(c) Let S be a set (such as S = {1, . . . , n} for some n) and let Sym(S) denote the set of all bijective
maps π : S → S, also called permutations of S. For any σ, π ∈ Sym(S) we define σ◦π by (σ◦π)(s) =
σ(π(s)) for s ∈ S (composition of permutations). Then Sym(S) together with composition as the
binary operation is a group, the permutation group of S. The identity element of Sym(S) is the
identity function and the inverse of π ∈ Sym(S) is the inverse function π⁻¹ (seen in Calculus I). If
S = {1, . . . , n} for some n ∈ N we write Sn for Sym(S), and for π ∈ Sn we write
$\begin{pmatrix} 1 & 2 & \cdots & n \\ \pi(1) & \pi(2) & \cdots & \pi(n) \end{pmatrix}$.
For example we have
$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 2 & 4 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 1 & 4 \end{pmatrix}$ in S4 , and
$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 4 & 1 & 2 & 5 \end{pmatrix} \circ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 5 & 1 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 1 & 5 & 3 & 2 \end{pmatrix}$ in S5 .

Definition 1.6. Let n ≥ 1 and s ≤ n. Let a1 , . . . , as ∈ {1, . . . , n} be pairwise distinct. The permutation
π ∈ Sn with
π(a1 ) = a2 , π(a2 ) = a3 , . . . , π(as−1 ) = as , π(as ) = a1
and
π(a) = a for a ∈ {1, . . . , n} not belonging to {a1 , . . . , as }
is denoted by ⟨a1 , . . . , as ⟩. Any permutation of this form is called a cycle. If s = 2 it is called a
transposition. The number s is called the length (or order) of the cycle.
E.g. ⟨3, 1, 5, 2⟩ = $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 5 & 3 & 1 & 4 & 2 & 6 \end{pmatrix}$ in S6 ; ⟨3, 2⟩ = $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 2 & 4 \end{pmatrix}$ in S4 .
Proposition 1.7. Every permutation σ ∈ Sn is a composition of cycles.

Example 1.8. (a) Let σ = $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ 2 & 5 & 6 & 9 & 1 & 4 & 10 & 11 & 3 & 7 & 8 \end{pmatrix}$ ∈ S11 .
=⇒ σ = ⟨1, 2, 5⟩ ∘ ⟨3, 6, 4, 9⟩ ∘ ⟨7, 10⟩ ∘ ⟨8, 11⟩.

(b) Let σ = $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 3 & 1 \end{pmatrix}$ ∈ S4 ; =⇒ σ = ⟨1, 4⟩.

General Recipe (which with a bit of effort can be turned into a proof of 1.7).
Let a := 1 and choose m ∈ N minimal such that
σ^m(a) ∈ {a, σ(a), σ²(a), . . . , σ^(m−1)(a)}. (We then in fact have σ^m(a) = a.)
Let σ1 be the cycle ⟨a, σ(a), . . . , σ^(m−1)(a)⟩.
Now let b be the smallest number in {1, . . . , n}\{a, σ(a), . . . , σ^(m−1)(a)} and, as above, choose l ∈ N
minimal such that σ^l(b) ∈ {b, σ(b), . . . , σ^(l−1)(b)} and put σ2 := ⟨b, σ(b), . . . , σ^(l−1)(b)⟩.
Continuing this way we find σ = σ1 ∘ σ2 ∘ . . . is a composition of cycles.
Definition 1.9. Let n ≥ 1 and σ ∈ Sn . We write σ = σ1 ∘ σ2 ∘ . . . as a composition of cycles of length
s1 , s2 , . . . . Then

$\operatorname{sgn}(\sigma) := (-1)^{(s_1-1)+(s_2-1)+\cdots} \in \{\pm 1\}$

is called the sign (or signum) of σ.
E.g. let σ be as in 1.8(a); then sgn(σ) = (−1)^(2+3+1+1) = (−1)⁷ = −1.
If σ is a transposition then sgn(σ) = −1.
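The General Recipe and Definition 1.9 are easy to carry out by machine. Below is a Python sketch (an illustration, not part of the original notes; the function names cycles and sgn are ad hoc) that decomposes a permutation of {1, . . . , n} into cycles and computes its sign; it reproduces Example 1.8(a):

```python
# Decompose a permutation of {1,...,n} into cycles (the "General Recipe")
# and compute its sign as in Definition 1.9.  sigma is given as a dict.
def cycles(sigma):
    n, seen, result = len(sigma), set(), []
    for a in range(1, n + 1):
        if a in seen:
            continue
        cyc, x = [], a
        while x not in seen:     # follow a -> sigma(a) -> ... back to a
            seen.add(x)
            cyc.append(x)
            x = sigma[x]
        if len(cyc) > 1:         # fixed points (cycles of length 1) add 0 to the sign exponent
            result.append(cyc)
    return result

def sgn(sigma):
    return (-1) ** sum(len(c) - 1 for c in cycles(sigma))

# The permutation from Example 1.8(a):
sigma = {1: 2, 2: 5, 3: 6, 4: 9, 5: 1, 6: 4, 7: 10, 8: 11,
         9: 3, 10: 7, 11: 8}
print(cycles(sigma))   # [[1, 2, 5], [3, 6, 4, 9], [7, 10], [8, 11]]
print(sgn(sigma))      # -1, matching (-1)**(2+3+1+1)
```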

Theorem 1.10. (a) The definition of sgn(σ) does not depend on the chosen cycle decomposition of σ.
(b) For all σ, π ∈ Sn we have sgn(σ ◦ π) = sgn(σ)sgn(π).
Proof. See Group Theory in Year 2.

2 Fields and Vector Spaces

Definition 2.1. A field is a set F together with binary operations on F , called addition and multipli-
cation, such that F together with addition and F × := F \{0} together with multiplication are abelian
groups (we use the notations a + b, 0, −a and a · b, 1, a−1 respectively) and such that the following axiom
holds:
Distributivity: For all a, b, c ∈ F we have a(b + c) = ab + ac in F .

Example 2.2. (a) The sets Q, R and C with the usual addition and multiplication are fields (see also
(1.2(b))).

(b) The set Z with the usual addition and multiplication is not a field because for instance there is no
multiplicative inverse of 2 in Z.

(c) The set F2 := {0, 1} together with the following operations is a field.

+ 0 1 · 0 1
0 0 1 0 0 0
1 1 0 1 0 1

Note that 1 + 1 = 2 in Q but 1 + 1 = 0 in F2 . F2 is the smallest field.

Proof. (F2 , +) and (F2 \{0}, ·) are abelian groups (see (1.5(a))).
Distributivity: We need to check a(b + c) = ab + ac for all a, b, c ∈ F2 .
1st Case: a = 0 =⇒ LHS = 0, RHS = 0 + 0 = 0
2nd Case: a = 1 =⇒ LHS = b + c = RHS

(d) (without proof) Let p be a prime. The set Fp := {0, 1, . . . , p − 1} together with the addition defined
in Example (1.5(b)) and the following multiplication is a field:

x · y := remainder left when xy is divided by p.
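The arithmetic of Example 2.2(d) is easy to experiment with. A minimal Python sketch (illustrative only; the choice p = 5 and all names are ad hoc) that finds multiplicative inverses by brute force and spot-checks distributivity:

```python
# Arithmetic in F_p as in Example 2.2(d): both operations are taken mod p.
p = 5   # an arbitrary prime for illustration

def add(x, y):
    return (x + y) % p

def mul(x, y):
    return (x * y) % p

def inv(x):
    # the multiplicative inverse of x != 0: the unique y with x*y = 1 in F_p
    return next(y for y in range(1, p) if mul(x, y) == 1)

print([inv(x) for x in range(1, p)])   # [1, 3, 2, 4] in F_5
# distributivity holds for all triples:
assert all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
           for a in range(p) for b in range(p) for c in range(p))
```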

Proposition 2.3. Let F be a field and let a, b ∈ F . Then we have:

(a) 0a = 0 in F .

(b) (−a)b = −(ab) in F .

Proof. (a) We have 0 + 0a = 0a = (0 + 0)a = 0a + 0a (by distributivity)


=⇒ 0 = 0a (cancel 0a on both sides using 1.4(a)).

(b) We have ab + (−a)b = (a + (−a))b (by distributivity)


= 0b (by definition of the additive inverse)
= 0 (by part (a))
=⇒ (−a)b is the additive inverse of ab, i.e. (−a)b = −(ab)

Definition 2.4. Let F be a field. A vector space over F is an abelian group V (whose binary operation
is written additively, i.e. (x, y) ↦ x + y) together with a map F × V → V (called scalar multiplication and
written as (a, x) ↦ ax) such that the following axioms are satisfied:

(i) 1st distributivity law: For all a, b ∈ F and x ∈ V we have


(a + b)x = ax + bx in V .

(ii) 2nd distributivity law: For all a, b ∈ F and x, y ∈ V we have


a(x + y) = ax + ay in V .

(iii) For all a, b ∈ F and for all x ∈ V we have (ab)x = a(bx) in V .

(iv) For all x ∈ V we have 1x = x in V .



The elements of V are often called vectors. We write 0F and 0V for the identity element of F and V ,
respectively, and often just 0 for both 0F and 0V . Furthermore we use the notation u − v for u + (−v)
when u, v are both in V .

Example 2.5. (a) For every n ∈ N the set Rn together with the usual addition and scalar multi-
plication (as seen in Linear Algebra I) is a vector space over R. Similarly, for any field F , the
set
F n := {(a1 , . . . , an ) : a1 , . . . , an ∈ F }
together with componentwise addition and the obvious scalar multiplication is a vector space over
F ; e.g. F2² = {(0, 0), (0, 1), (1, 0), (1, 1)} is a vector space over F2 , F = F¹ is a vector space over
F , and {0} = F⁰ is a vector space over F .

(b) Let V be the additive group of R or C. We view the usual multiplication R × V → V, (a, x) 7→ ax,
as scalar multiplication of R on V . Then V is a vector space over Q (easy to check!). Similarly C
is a vector space over R.

(c) Let V denote the abelian group R (with the usual addition). For a ∈ R and x ∈ V we put
a ⊗ x := a²x ∈ V ; this defines a scalar multiplication

R × V → V, (a, x) ↦ a ⊗ x,

of the field R on V . Which of the vector-space axioms (see 2.4) hold for V with this scalar
multiplication?

Solution:

(i) We need to check whether (a + b) ⊗ x = a ⊗ x + b ⊗ x for all a, b ∈ R and x ∈ V .
LHS = (a + b)²x; RHS = a²x + b²x = (a² + b²)x
=⇒ For a = 1, b = 1 and x = 1 we have LHS ≠ RHS.
=⇒ First distributivity law does not hold.

(ii) We need to check whether a ⊗ (x + y) = a ⊗ x + a ⊗ y for all a ∈ R and x, y ∈ V .
LHS = a²(x + y); RHS = a²x + a²y = a²(x + y)
=⇒ LHS = RHS
=⇒ Second distributivity law does hold.

(iii) We need to check whether a ⊗ (b ⊗ x) = (ab) ⊗ x for all a, b ∈ R and x ∈ V .
LHS = a ⊗ (b²x) = a²(b²x); RHS = (ab)²x = (a²b²)x = a²(b²x)
=⇒ LHS = RHS
=⇒ Axiom 3 does hold.

(iv) We have 1 ⊗ x = 1²x = 1x = x for all x ∈ V .
=⇒ Axiom 4 does hold.
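The four checks above can also be replayed numerically; a minimal Python sketch (illustrative only, with smul standing for ⊗):

```python
# Replaying Example 2.5(c): scalar multiplication a (x) x := a^2 x on V = R.
def smul(a, x):
    return a * a * x

a, b, x, y = 2.0, 3.0, 1.0, 2.0
print(smul(a + b, x), smul(a, x) + smul(b, x))    # 25.0 vs 13.0: axiom (i) fails
print(smul(a, x + y) == smul(a, x) + smul(a, y))  # True: axiom (ii) holds
print(smul(a, smul(b, x)) == smul(a * b, x))      # True: axiom (iii) holds
print(smul(1.0, x) == x)                          # True: axiom (iv) holds
```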

Proposition 2.6. Let V be a vector space over a field F and let a, b ∈ F and x, y ∈ V . Then we have:

(a) (a − b)x = ax − bx

(b) a(x − y) = ax − ay

(c) ax = 0V ⇐⇒ a = 0F or x = 0V

(d) (−1)x = −x

Proof. (a) (a − b)x + bx = ((a − b) + b)x (by 1st distributivity law)


= (a + ((−b) + b)) x = (a + 0F )x = ax (using field axioms)
=⇒ (a − b)x = ax − bx (add −bx to both sides).

(b) See Coursework



(c) “ =⇒ ”: See Coursework


“⇐=”: Put a = b and x = y in (a) and (b), respectively.

(d) Put a = 0 and b = 1 in (a) and use (c).

The next example is the “mother” of almost all vector spaces. It vastly generalises the fourth of the
following five ways of representing vectors and vector addition in R3 .

(I) [figure: the vectors a, b and their sum a + b drawn as arrows in a three-dimensional coordinate system]

(II) a = (2.5, 0, −1), b = (1, 1, 0.5), a + b = (3.5, 1, −0.5)

(III) a = $\begin{pmatrix} 1 & 2 & 3 \\ 2.5 & 0 & -1 \end{pmatrix}$, b = $\begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 0.5 \end{pmatrix}$, a + b = $\begin{pmatrix} 1 & 2 & 3 \\ 3.5 & 1 & -0.5 \end{pmatrix}$

(IV) a : {1, 2, 3} → R, 1 ↦ 2.5, 2 ↦ 0, 3 ↦ −1; b : {1, 2, 3} → R, 1 ↦ 1, 2 ↦ 1, 3 ↦ 0.5;
a + b : {1, 2, 3} → R, 1 ↦ 3.5, 2 ↦ 1, 3 ↦ −0.5

(V) [figure: the graphs of a, b and a + b as functions {1, 2, 3} → R]
Example 2.7. Let S be any set and let F be a field. Let

F^S := {f : S → F }

denote the set of all maps from S to F . We define an addition on F^S and a scalar multiplication of F
on F^S as follows. Let f, g ∈ F^S and a ∈ F ; then:

(f + g)(s) := f (s) + g(s) for any s ∈ S
(af )(s) := af (s) for any s ∈ S.
Then F^S is a vector space over F (see below for the proof).

Special Cases:

(a) Let S = {1, . . . , n}. Identifying any map f : {1, . . . , n} → F with the corresponding tuple
(f (1), . . . , f (n)) we see that F^S is nothing but the set F^n of all n-tuples (a1 , . . . , an ) considered in
Example 2.5(a).

(b) Let S = {1, . . . , n} × {1, . . . , m}. Identifying any map f : {1, . . . , n} × {1, . . . , m} → F with the
corresponding matrix

$\begin{pmatrix} f((1,1)) & \cdots & f((1,m)) \\ \vdots & & \vdots \\ f((n,1)) & \cdots & f((n,m)) \end{pmatrix}$

we see that F^S is nothing but the set Mn×m (F ) of (n × m)-matrices

$\begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nm} \end{pmatrix}$

with entries in F . In particular Mn×m (F ) is a vector space over F .

(c) Let S = N. Identifying any map f : N → F with the sequence (f (1), f (2), f (3), . . .) we see that F^N
is nothing but the set of all infinite sequences (a1 , a2 , a3 , . . .) in F .

(d) Let F = R and let S be an interval I in R. Then F^S = R^I is the set of all functions f : I → R.
Such functions are usually visualised by their graph in I × R ⊆ R × R.

Proof. Associativity: Let f, g, h ∈ F^S.
We need to show: (f + g) + h = f + (g + h) in F^S,
i.e. ((f + g) + h)(s) = (f + (g + h))(s) for all s ∈ S.
LHS = (f + g)(s) + h(s) = (f (s) + g(s)) + h(s) (by definition of addition in F^S)
RHS = f (s) + (g + h)(s) = f (s) + (g(s) + h(s)) (by definition of addition in F^S)
=⇒ LHS = RHS (by associativity in F ).

Identity element: Let 0 denote the constant function S → F, s 7→ 0F . For any f ∈ F S we have
(f + 0)(s) = f (s) + 0(s) = f (s) + 0F = f (s) for all s ∈ S, hence f + 0 = f . Similarly we have 0 + f = f .
=⇒ 0 is an identity element.

Inverse element: Let f ∈ F S . Define (−f )(s) := −f (s)


=⇒ f + (−f ) = 0 in F S (easy to check!)

Commutativity: easy to check!

1st distributivity law: Let a, b ∈ F and f ∈ F S


=⇒ For all s ∈ S we have
((a + b)f )(s) = (a + b)(f (s)) (by definition of the scalar multiplication)
= a(f (s)) + b(f (s)) (by distributivity in F )
= (af )(s) + (bf )(s) (by definition of the scalar multiplication)
= (af + bf )(s) (by definition of addition in F S )
=⇒ (a + b)f = af + bf .

Similarly one easily checks the three remaining axioms of Definition 2.4.
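For a finite set S the vector space F^S of Example 2.7 can be modelled directly on a computer. A minimal Python sketch (illustrative only; storing maps as dicts is an ad hoc choice), reusing the vectors a and b from representation (IV) above:

```python
# Elements of F^S for S = {1, 2, 3} and F = R, stored as dicts s -> f(s);
# addition and scalar multiplication are defined pointwise as in Example 2.7.
S = [1, 2, 3]

def add(f, g):
    return {s: f[s] + g[s] for s in S}

def smul(c, f):
    return {s: c * f[s] for s in S}

a = {1: 2.5, 2: 0.0, 3: -1.0}   # the vector a from representation (IV)
b = {1: 1.0, 2: 1.0, 3: 0.5}    # the vector b from representation (IV)
print(add(a, b))                 # {1: 3.5, 2: 1.0, 3: -0.5}

zero = {s: 0.0 for s in S}       # the constant function 0: the identity element
print(add(a, smul(-1.0, a)) == zero)   # True: -f = (-1)f is the additive inverse
```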

Definition 2.8. Let V be a vector space over a field F . A subset W of V is called a subspace of V if
the following conditions hold:

(a) 0V ∈ W .

(b) W is closed under addition, i.e. for all x, y ∈ W we also have x + y ∈ W .

(c) W is closed under scalar multiplication, i.e. for all a ∈ F and x ∈ W we have ax ∈ W .

Note that condition (b) states that the restriction of the addition in V to W gives a binary operation
W × W → W on W (addition in W ). Similarly, condition (c) states that the scalar multiplication of F
on V yields a map F × W → W which we view as a scalar multiplication of F on W .

Proposition 2.9. Let V be a vector space over a field F and let W be a subspace of V . Then W together
with the above mentioned addition and scalar multiplication is a vector space over F .

Proof. The following axioms hold for W because they already hold for V :

• associativity of addition;

• commutativity of addition;

• first and second distributivity laws;

• final two axioms in Definition 2.4

There exists an additive identity element in W by condition 2.8(a). It remains to show that additive
inverses exist: Let x ∈ W ; then −x = (−1)x (by 2.6(d)) is in W by condition 2.8(c), and −x satisfies the
condition of an additive inverse of x because it does so in V .

Example 2.10. (a) Examples of subspaces of Rn as seen in Linear Algebra I, such as the nullspace
of any real (m × n)-matrix.

(b) The set of convergent sequences is a subspace of the vector space RN of all sequences (a1 , a2 , a3 , . . .)
in R. A subspace of this subspace (and hence of RN ) is the set of all sequences in R that converge
to 0. (See Calculus I for proofs).

(c) Let A ∈ Ml×m (R). Then W := {B ∈ Mm×n (R) : AB = 0} is a subspace of Mm×n (R).

Proof. i We have A · 0 = 0 =⇒ 0 ∈ W .
ii Let B1 , B2 ∈ W
=⇒ A(B1 + B2 ) = AB1 + AB2 = 0 + 0 = 0
=⇒ B1 + B2 ∈ W .
iii Let a ∈ R and B ∈ W
=⇒ A(aB) = a(AB) = a0 = 0
=⇒ aB ∈ W .

(d) Let I be a non-empty interval in R. The following subsets of the vector space RI consisting of all
functions from I to R are subspaces:

i For any s0 ∈ I the subset W := {f ∈ RI : f (s0 ) = 0} of RI .

Proof. 1 The zero function 0 vanishes at s0 =⇒ 0 ∈ W .


2 Let f, g ∈ W
=⇒ (f + g)(s0 ) = f (s0 ) + g(s0 ) = 0 + 0 = 0
=⇒ f + g ∈ W .
3 Let a ∈ R and f ∈ W
=⇒ (af )(s0 ) = a f (s0 ) = a0 = 0
=⇒ af ∈ W .

ii The set of all continuous functions f : I → R (see Calculus I).


iii The set of all differentiable functions f : I → R (see Calculus I).

iv For any n ∈ N, the set Pn of polynomial functions f : I → R of degree at most n is a subspace
by 3.2(c) and 3.3. Recall that a function f : I → R is a polynomial function of degree ≤ n if
there exist a0 , . . . , an ∈ R such that:

f (s) = a0 + a1 s + . . . + an sⁿ for all s ∈ I.

Denoting the function I → R, s ↦ sᵐ, by tᵐ, this means that f = a0 + a1 t + . . . + an tⁿ as
elements of the vector space R^I.

v The space of solutions of a homogeneous linear differential equation (without further expla-
nation); e.g.:

Pn = {f ∈ R^I : f is differentiable (n + 1) times and f^(n+1) = 0}

(e) The subset Zⁿ of the vector space Rⁿ over R is obviously closed under addition but not closed under
scalar multiplication because, for instance, (1, 0, . . . , 0) ∈ Zⁿ and 1/2 ∈ R, but (1/2)(1, 0, . . . , 0) ∉ Zⁿ.

(f) The subsets W1 := {(a, 0) : a ∈ R} and W2 := {(0, b) : b ∈ R} are obviously subspaces of R². The
subset W := W1 ∪ W2 of the vector space R² is obviously closed under scalar multiplication but
not under addition because, for instance, (1, 0) and (0, 1) are in W but
(1, 0) + (0, 1) = (1, 1) ∉ W .
Proposition 2.11. Let W1 , W2 be subspaces of a vector space V over a field F . Then the intersection
W1 ∩ W2 and the sum
W1 + W2 := {x1 + x2 ∈ V : x1 ∈ W1 , x2 ∈ W2 }
are subspaces of V as well.
Proof. For W1 ∩ W2 :
• We have 0V ∈ W1 and 0V ∈ W2 (because W1 and W2 are subspaces)
=⇒ 0V ∈ W1 ∩ W2 (by the definition of intersection).
• Let x, y ∈ W1 ∩ W2
=⇒ x, y ∈ W1 and x, y ∈ W2 (by the definition of intersection)
=⇒ x + y ∈ W1 and x + y ∈ W2 (because W1 and W2 are subspaces)
=⇒ x + y ∈ W1 ∩ W2 (by the definition of intersection).

• Let a ∈ F and x ∈ W1 ∩ W2
=⇒ x ∈ W1 and x ∈ W2 (by the definition of intersection)
=⇒ ax ∈ W1 and ax ∈ W2 (because W1 and W2 are subspaces)
=⇒ ax ∈ W1 ∩ W2 (by the definition of intersection).
For W1 + W2 :

• We have 0V = 0V + 0V ∈ W1 + W2 .
• Let x, y ∈ W1 + W2
=⇒ ∃x1 , y1 ∈ W1 and ∃x2 , y2 ∈ W2 such that x = x1 + x2 , y = y1 + y2 .
=⇒ x + y = (x1 + x2 ) + (y1 + y2 ) = (x1 + y1 ) + (x2 + y2 ) ∈ W1 + W2
(because W1 and W2 are subspaces).
• Let a ∈ F and x ∈ W1 + W2
=⇒ ∃x1 ∈ W1 , x2 ∈ W2 s.t. x = x1 + x2
=⇒ ax = a(x1 + x2 ) = ax1 + ax2 ∈ W1 + W2
(because W1 and W2 are subspaces).

Example 2.12. Let W1 and W2 be as in 2.10(f) =⇒ W1 + W2 = R2 .



3 Bases
Definition 3.1. Let V be a vector space over a field F . Let x1 , . . . , xn ∈ V .
(a) An element x ∈ V is called a linear combination of x1 , . . . , xn if there are a1 , . . . , an ∈ F such that
x = a1 x1 + . . . + an xn .
(b) The subset of V consisting of all linear combinations of x1 , . . . , xn is called the Span of x1 , . . . , xn
and denoted by Span(x1 , . . . , xn ); i.e.

Span(x1 , . . . , xn ) = {a1 x1 + . . . + an xn |a1 , . . . , an ∈ F }.

(c) We say that x1 , . . . , xn span (or generate) V or that x1 , . . . , xn form a spanning set of V if V =
Span(x1 , . . . , xn ), i.e. every x ∈ V is a linear combination of x1 , . . . , xn .
(See also Sections 6.4 and 6.5 of L.A.I.)
Example 3.2. (a) Let V := Mn×m (F ) be the vector space of (n × m)-matrices with entries in F . For
i ∈ {1, . . . , n} and j ∈ {1, . . . , m} let Eij denote the (n × m)-matrix with zeroes everywhere except
at (ij) where it has the entry 1. Then the matrices Eij ; i = 1, . . . , n; j = 1, . . . , m form a spanning
set of V .

Proof. Let A = (aij ) ∈ Mn×m (F ) be an arbitrary matrix;
=⇒ A = $\sum_{i=1}^{n}\sum_{j=1}^{m} a_{ij}E_{ij}$,
e.g.: $\begin{pmatrix} 2 & 3 \\ -1 & 5 \end{pmatrix} = 2\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + 3\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} + (-1)\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} + 5\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$

(b) Do the vectors $\begin{pmatrix} 1 \\ i \end{pmatrix}$, $\begin{pmatrix} i \\ 2 \end{pmatrix}$ ∈ C² span the vector space C² over C?
Solution: Let $\begin{pmatrix} w \\ z \end{pmatrix}$ ∈ C² be an arbitrary vector.
We want to find a1 , a2 ∈ C such that $a_1\begin{pmatrix} 1 \\ i \end{pmatrix} + a_2\begin{pmatrix} i \\ 2 \end{pmatrix} = \begin{pmatrix} w \\ z \end{pmatrix}$.
Hence $\begin{pmatrix} 1 & i & w \\ i & 2 & z \end{pmatrix}$ ∼∼∼ [R2 ↦ R2 − iR1] ∼∼∼> $\begin{pmatrix} 1 & i & w \\ 0 & 3 & z - iw \end{pmatrix}$
As in Linear Algebra I we conclude that this system is solvable. (Thm 3.15 of L.A.I.)
=⇒ $\begin{pmatrix} 1 \\ i \end{pmatrix}$, $\begin{pmatrix} i \\ 2 \end{pmatrix}$ span C².

(c) Pn = Span(1, t, . . . , tⁿ) (cf. 2.10(d)iv).
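The computation in (b) can be double-checked numerically: putting x1 and x2 as the columns of a matrix M, spanning amounts to solvability of Mx = w for every w. A NumPy sketch (illustrative only; the target vector w is an arbitrary choice):

```python
import numpy as np

# Example 3.2(b): do (1, i) and (i, 2) span C^2?  They are the columns of M;
# M x = w is solvable for every w since det(M) = 2 - i*i = 3 != 0.
M = np.array([[1, 1j],
              [1j, 2]])
w = np.array([5 - 2j, 1 + 1j])   # an arbitrarily chosen target vector
a = np.linalg.solve(M, w)        # the coefficients a1, a2
print(a)
print(np.allclose(M @ a, w))     # True
```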


Proposition 3.3. Let V be a vector space over a field F . Let x1 , . . . , xn ∈ V . Then Span(x1 , . . . , xn )
is the smallest subspace of V that contains x1 , . . . , xn . In particular:
(a) If x ∈ Span(x1 , . . . , xn ) then Span(x1 , . . . , xn , x) = Span(x1 , . . . , xn ).
(b) For any a2 , . . . , an ∈ F we have
Span(x1 , . . . , xn ) = Span(x1 , x2 − a2 x1 , . . . , xn − an x1 ).
Note:
(i) The first statement means: Span(x1 , . . . , xn ) is a subspace of V that contains x1 , . . . , xn and among
all the subspaces of V with this property it is the smallest, i.e. if W is a subspace of V that contains
x1 , . . . , xn then Span(x1 , . . . xn ) ⊆ W .
(ii) The statement (b) implies that the span of the column vectors of any matrix does not change when
performing (standard) column operations.
Proof. Span (x1 , . . . , xn ) is a subspace of V :
• We have 0V = 0F x1 + . . . + 0F xn ∈ Span(x1 , . . . , xn ).

• Let x, y ∈ Span(x1 , . . . , xn );
=⇒ ∃a1 , . . . , an ∈ F, ∃b1 , . . . bn ∈ F such that
x = a1 x1 + . . . + an xn and y = b1 x1 + . . . + bn xn ;
=⇒ x + y = (a1 x1 + . . . + an xn ) + (b1 x1 + . . . + bn xn )
= (a1 x1 + b1 x1 ) + . . . + (an xn + bn xn )
(using commutativity and associativity)
= (a1 + b1 )x1 + . . . + (an + bn )xn (using 1st distributivity law)
∈ Span(x1 , . . . , xn ) (by definition)

• Let x ∈ Span(x1 , . . . , xn ) and a ∈ F .


Write x = a1 x1 + . . . an xn with a1 , . . . , an ∈ F as above;
=⇒ ax = a(a1 x1 + . . . + an xn )
= a(a1 x1 ) + . . . + a(an xn ) (using distributivity)
= (aa1 )x1 + . . . + (aan )xn (by axiom 2.4(iii))
∈ Span(x1 , . . . , xn )

Span(x1 , . . . , xn ) contains x1 , . . . , xn :
because xi = 0F x1 + . . . + 0F xi−1 + 1 · xi + 0F xi+1 + . . . + 0F xn

Span(x1 , . . . , xn ) is smallest:
Let W be a subspace of V s.t. x1 , . . . , xn ∈ W .
Let x ∈ Span(x1 , . . . , xn ); write x = a1 x1 + . . . + an xn with a1 , . . . an ∈ F .
We have a1 x1 , . . . , an xn ∈ W (by condition 2.8(c))
=⇒ x = a1 x1 + . . . + an xn ∈ W (by condition 2.8(b)).

Part (a): Span(x1 , . . . , xn ) ⊆ Span(x1 , . . . , xn , x) =: W because W is a subspace of V and x1 , . . . , xn ∈
W.
Span(x1 , . . . , xn , x) ⊆ Span(x1 , . . . , xn ) =: W̃ because W̃ is a subspace of V and x1 , . . . , xn , x ∈ W̃.

Part (b): LHS ⊆ RHS because RHS is a subspace


and x1 ∈ RHS and xi = 1(xi − ai x1 ) + ai x1 ∈ RHS.
RHS ⊆ LHS because LHS is a subspace
and x1 ∈ LHS and xi − ai x1 ∈ LHS.

Definition 3.4. Let V be a vector space over a field F . Let x1 , . . . , xn ∈ V . We say that x1 , . . . , xn
are linearly independent (over F ) if the following condition holds: whenever a1 , . . . , an ∈ F and a1 x1 +
. . . + an xn = 0 then a1 = . . . = an = 0. Otherwise we say that x1 , . . . , xn are linearly dependent.
A linear combination a1 x1 + · · · + an xn is called trivial if a1 = · · · = an = 0, otherwise it is called
non-trivial. (See also Section 6.5 of L.A.I.)

Note: x1 , . . . , xn are linearly dependent ⇐⇒ ∃a1 , . . . , an ∈ F , not all zero, such that a1 x1 + · · · +
an xn = 0. In other words, ⇐⇒ there exists a non-trivial linear combination of x1 , . . . , xn which equals
0.

Example 3.5. (a) Examples as seen in Linear Algebra I. (Section 6.5 of L.A.I.)

(b) The three vectors x1 = $\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$, x2 = $\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$, x3 = $\begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}$ ∈ F³ are not linearly independent because
x1 − x2 − x3 = 0.

(c) Determine all vectors $\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}$ ∈ C³ such that x1 := $\begin{pmatrix} 1 \\ i \\ 1 \end{pmatrix}$, x2 := $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$,
x3 := $\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}$ ∈ C³ are linearly dependent.
Solution: We apply Gaussian elimination to
$\begin{pmatrix} 1 & 0 & c_1 \\ i & 1 & c_2 \\ 1 & 0 & c_3 \end{pmatrix}$ ∼∼∼ [R2 ↦ R2 − iR1, R3 ↦ R3 − R1] ∼∼∼> $\begin{pmatrix} 1 & 0 & c_1 \\ 0 & 1 & c_2 - ic_1 \\ 0 & 0 & c_3 - c_1 \end{pmatrix}$
=⇒ The equation a1 x1 + a2 x2 + a3 x3 = 0 has a non-trivial solution (a1 , a2 , a3 ) if and only if
c3 − c1 = 0
=⇒ x1 , x2 , x3 are linearly dependent if and only if c1 = c3 .

(d) The two functions sin, cos ∈ R^R are linearly independent.

Proof. Let a, b ∈ R s.t. a sin + b cos = 0 in R^R;
=⇒ a sin(0) + b cos(0) = 0 =⇒ a · 0 + b · 1 = 0 =⇒ b = 0
and a sin(π/2) + b cos(π/2) = 0 =⇒ a · 1 + b · 0 = 0 =⇒ a = 0

(e) Let I ⊆ R be a non-empty open interval. For any i ∈ N0 , let tⁱ denote the polynomial function
I → R, s ↦ sⁱ. Then 1, t, t², . . . , tⁿ are linearly independent in R^I.

Proof. Let a0 , . . . , an ∈ R s.t. a0 + a1 t + . . . + an tⁿ = 0
=⇒ a0 + a1 s + . . . + an sⁿ = 0 for all s ∈ I
=⇒ a0 = . . . = an = 0 because of Proposition 3.6 below.
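Linear (in)dependence questions such as Example 3.5(c) reduce to rank (or determinant) computations, which a computer algebra system can do exactly. A SymPy sketch (illustrative only):

```python
from sympy import Matrix, I, symbols

# Example 3.5(c) revisited: x1, x2, x3 are linearly dependent exactly when
# the matrix with columns x1, x2, x3 has rank < 3, i.e. determinant 0.
c1, c2, c3 = symbols("c1 c2 c3")
M = Matrix([[1, 0, c1],
            [I, 1, c2],
            [1, 0, c3]])
print(M.det())   # -c1 + c3: the rank drops exactly when c1 == c3
print(Matrix([[1, 0, 1], [I, 1, 2], [1, 0, 1]]).rank())   # 2 < 3: dependent
```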

Proposition 3.6. Any non-zero polynomial function f : R → R of degree n has at most n zeroes.

Proof. Induction on n:
n ≤ 1: The polynomial a0 + a1 t has no zeroes if a1 = 0 and a0 ≠ 0, and precisely one zero (namely
−a0 a1⁻¹) if a1 ≠ 0.
n − 1 → n, n ≥ 2: Let f ∈ Pn .
Suppose s0 ∈ R is a zero of f . (If f doesn't have a zero, we are done.)
=⇒ ∃g ∈ Pn−1 such that f (s) = g(s)(s − s0 ) for all s ∈ R
(for instance by long division)
=⇒ The zeroes of f are s0 and those of g.
(Let s1 ∈ R, s1 ≠ s0 s.t. f (s1 ) = 0 =⇒ g(s1 )(s1 − s0 ) = 0 =⇒ g(s1 ) = 0.)
=⇒ f has at most n zeroes
(because g has at most n − 1 zeroes by the inductive hypothesis).

Lemma 3.7. Let V be a vector space over a field F .

(a) A single vector x ∈ V is linearly independent if and only if x 6= 0V .

(b) Every subset of any set of linearly independent vectors is linearly independent again. (This is
equivalent to: If a subset of a set of vectors is linearly dependent, then the set itself is linearly
dependent.)

(c) Let x1 , . . . , xn ∈ V and suppose that xi = 0V for some i ∈ {1, . . . , n} or that xi = xj for some
i 6= j. Then x1 , . . . , xn are linearly dependent.

(d) If x1 , . . . , xn ∈ V are linearly dependent then at least one vector xi among x1 , . . . , xn is a linear
combination of the other ones.

(e) Let x1 , . . . , xn ∈ V and x ∈ Span(x1 , . . . , xn ). Then x1 , . . . , xn , x are linearly dependent.

Proof. (a) only-if part: If x = 0V then 1 · x = 0V is a non-trivial linear combination of x.
if-part: Let x ≠ 0V and let a ∈ F such that ax = 0V .
Suppose a ≠ 0 =⇒ x = 1x = (a⁻¹a)x = a⁻¹(ax) = a⁻¹0V = 0V , a contradiction. (Alternatively, apply 2.6(c).)

(b) Let x1 , . . . , xn ∈ V be linearly independent and let y1 , . . . , ym be a subset of x1 , . . . , xn for some
m ≤ n. We prove the contrapositive statement. Suppose ∃b1 , . . . , bm ∈ F , not all zero, such that
b1 y1 + . . . + bm ym = 0V . Extending the list b1 , . . . , bm by zeroes we get a1 , . . . , an ∈ F , not all zero,
such that a1 x1 + . . . + an xn = 0V .

(c) If xi = 0V then x1 , . . . , xn are linearly dependent (by (a) and (b)).


If xi = xj for some i 6= j, then xi , xj are linearly dependent (because 1xi + (−1)xj = 0V ) and
hence x1 , . . . , xn are linearly dependent (by (b)).

(d) ∃a1 , . . . , an ∈ F, not all 0, such that a1 x1 + . . . + an xn = 0V .
After reordering we may assume an ≠ 0
=⇒ xn = −an⁻¹(a1 x1 + . . . + an−1 xn−1 )
= (−an⁻¹a1 )x1 + . . . + (−an⁻¹an−1 )xn−1
is a linear combination of x1 , . . . , xn−1 .

(e) ∃a1 , . . . , an ∈ F such that x = a1 x1 + . . . + an xn


(because x ∈ Span(x1 , . . . , xn ))
=⇒ 0V = a1 x1 + . . . + an xn + (−1)x is a non-trivial linear combination of x1 , . . . , xn , x (because
−1 6= 0 in any field).

Definition 3.8. Let V be a vector space over a field F . Let x1 , . . . , xn ∈ V . We say that x1 , . . . , xn
form a basis of V if x1 , . . . , xn both span V and are linearly independent.

Example 3.9. (a) The vectors e1 := (1, 0, . . . , 0), . . . , en := (0, . . . , 0, 1) form a basis of F n , called the
canonical or standard basis of F n (as in Linear Algebra I (Ex 6.41(a))).

(b) The polynomials 1, t, . . . , tn form a basis of Pn . (see 2.10(d)iv, 3.2(c) and 3.5(e)).

(c) 1, i form a basis of the vector space C over R (cf 2.5(b)).

(d) Determine a basis of the nullspace N (A) ⊆ R⁴ of the matrix

$A := \begin{pmatrix} 1 & -1 & 3 & 2 \\ 2 & -1 & 6 & 7 \\ 3 & -2 & 9 & 9 \\ -2 & 0 & -6 & -10 \end{pmatrix} \in M_{4\times 4}(\mathbb{R}).$

Solution: We perform row operations:

A ∼∼∼ [R2 ↦ R2 − 2R1, R3 ↦ R3 − 3R1, R4 ↦ R4 + 2R1] ∼∼∼> $\begin{pmatrix} 1 & -1 & 3 & 2 \\ 0 & 1 & 0 & 3 \\ 0 & 1 & 0 & 3 \\ 0 & -2 & 0 & -6 \end{pmatrix}$
∼∼∼ [R1 ↦ R1 + R2, R3 ↦ R3 − R2, R4 ↦ R4 + 2R2] ∼∼∼> $\begin{pmatrix} 1 & 0 & 3 & 5 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$ =: Ã

=⇒ N (A) = {a ∈ R⁴ : Aa = 0} = {a ∈ R⁴ : Ãa = 0}
= {(a1 , a2 , a3 , a4 )ᵀ ∈ R⁴ : a1 = −3a3 − 5a4 ; a2 = −3a4 }
= {(−3a3 − 5a4 , −3a4 , a3 , a4 )ᵀ : a3 , a4 ∈ R}
= {a3 (−3, 0, 1, 0)ᵀ + a4 (−5, −3, 0, 1)ᵀ : a3 , a4 ∈ R}
= Span((−3, 0, 1, 0)ᵀ, (−5, −3, 0, 1)ᵀ);

=⇒ (−3, 0, 1, 0)ᵀ and (−5, −3, 0, 1)ᵀ form a basis of N (A) (they are linearly independent because they are not
multiples of each other).
(e) Let A ∈ Mn×n (R). Then:
A invertible ⇐⇒ The columns of A form a basis of Rn
(for a proof see Theorem 5.6).
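Hand computations like the one in (d) can be confirmed with a computer algebra system; SymPy's nullspace() reads a basis off the reduced row echelon form, mirroring the steps above. A sketch (illustrative only):

```python
from sympy import Matrix

# Example 3.9(d) checked by machine: a basis of N(A) from the rref of A.
A = Matrix([[1, -1, 3, 2],
            [2, -1, 6, 7],
            [3, -2, 9, 9],
            [-2, 0, -6, -10]])
print(A.rref()[0])        # the matrix A~ computed above
for v in A.nullspace():
    print(v.T)            # [-3, 0, 1, 0] and [-5, -3, 0, 1]
```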
Proposition 3.10. Let V be a vector space over a field F . Let x1 , . . . , xn ∈ V . The following statements
are equivalent:
(a) x1 , . . . , xn form a basis of V .
(b) x1 , . . . , xn form a minimal spanning set of V (i.e. x1 , . . . , xn span V and after removing any vector
from x1 , . . . , xn the remaining ones don’t span V any more). (Compare Def 6.23 of L.A.I.)
(c) x1 , . . . , xn form a maximal linearly independent subset of V (i.e. x1 , . . . , xn are linearly independent
and for x ∈ V the n + 1 vectors x1 , . . . , xn , x are linearly dependent).
(d) Every vector x ∈ V can be written in the form

x = a1 x1 + . . . + an xn

with coefficients a1 , . . . , an ∈ F uniquely determined by x.


Proof. “(a) =⇒ (b)”: x1 , . . . , xn span V by definition of a basis.
Suppose the spanning set x1 , . . . , xn is not minimal.
=⇒ After reordering we may assume that x1 , . . . , xn−1 span V
=⇒ xn ∈ V = Span(x1 , . . . , xn−1 )
=⇒ x1 , . . . , xn−1 , xn are linearly dependent (by 3.7(e)).
“(b) =⇒ (c)”: Suppose that x1 , . . . , xn are linearly dependent.
=⇒ After reordering we have xn ∈ Span(x1 , . . . , xn−1 ) (by 3.7(d))
=⇒ V = Span(x1 , . . . , xn ) = Span(x1 , . . . , xn−1 ) (by 3.3(a)).
This contradicts the minimality in (b).
Maximality: Let x ∈ V =⇒ x ∈ Span(x1 , . . . , xn )
=⇒ x1 , . . . , xn , x are linearly dependent (by 3.7(e)).
“(c) =⇒ (d)”: Let x ∈ V
=⇒ ∃b1 , . . . , bn , b ∈ F , not all zero, s.t. b1 x1 + . . . + bn xn + bx = 0
(because x1 , . . . , xn , x are linearly dependent)
=⇒ b 6= 0 (because x1 , . . . , xn are linearly independent)
=⇒ x = a1 x1 + . . . + an xn where ai := −b−1 bi .
Suppose x = a1 x1 + . . . + an xn = b1 x1 + . . . + bn xn
=⇒ 0V = x − x = (a1 − b1 )x1 + . . . + (an − bn )xn
=⇒ a1 = b1 , . . . , an = bn (because x1 , . . . , xn are linearly independent).
“(d) =⇒ (a)”: x1 , . . . , xn obviously span V .
Let a1 , . . . , an ∈ F s.t. a1 x1 + . . . an xn = 0V
=⇒ a1 = . . . = an = 0 (because we also have 0x1 + . . . + 0xn = 0V ).
Corollary 3.11. Let V be a vector space over a field F . Suppose V = Span(x1 , . . . , xn ) for some
x1 , . . . xn ∈ V . Then a subset of x1 , . . . , xn forms a basis of V . In particular V has a basis.
Proof. By successively removing vectors from x1 , . . . , xn we arrive at a minimal spanning set y1 , . . . , ym
for some m ≤ n. Then y1 , . . . , ym form a basis of V (by 3.10(b) =⇒ (a))
The following Theorem allows us to define the dimension of a vector space.
Theorem 3.12. Let V be a vector space over a field F . Suppose x1 , . . . , xn and y1 , . . . , ym form a basis
of V . Then m = n. (Compare Thm 6.44 from L.A.I.)

Definition 3.13. Let V be a vector space over a field F . If x1 , . . . , xn ∈ V form a basis of V we say
that V is of finite dimension and call n the dimension of V ; we write dimF (V ) or just dim(V ) for n.
Note that n does not depend on the chosen basis x1 , . . . , xn by 3.12.

Example 3.14. (a) dimF (F n ) = n (by Example 3.9(a)).

(b) dimR (Pn ) = n + 1 (by Example 3.9(b)).

(c) dimR (C) = 2 (by Example 3.9(c)).

(d) dimC (C3 ) = 3, dimR (C3 ) = 6.

(e) dimR (Mn×m (R)) = nm.

(f) Let x1 ∈ V, x1 ≠ 0 =⇒ dimF (Span(x1 )) = 1.
Let x2 be another vector in V ;
=⇒ dimF (Span(x1 , x2 )) = $\begin{cases} 2 & \text{if } x_1, x_2 \text{ are lin. indep.;} \\ 1 & \text{if } x_1, x_2 \text{ are lin. dep.} \end{cases}$

Proof of Theorem 3.12. It clearly suffices to prove the following Proposition.

Proposition 3.15. Let V be a vector space over a field F . Let x1 , . . . , xn ∈ V and y1 , . . . , ym ∈ V .


Suppose x1 , . . . , xn span V . If y1 , . . . , ym are linearly independent then m ≤ n.

Proof. We will show the contrapositive: If m > n then there exist


c1 , . . . , cm ∈ F , not all zero, such that c1 y1 + . . . + cm ym = 0.
For every i ∈ {1, . . . , m} we can write yi = ai1 x1 + . . . + ain xn
with some ai1 , . . . ain ∈ F (because x1 , . . . , xn span V );
=⇒ For all c1 , . . . , cm ∈ F we have:
c1 y1 + . . . + cm ym
= c1 (a11 x1 + . . . + a1n xn ) + . . . + cm (am1 x1 + . . . + amn xn )
= (a11 c1 + . . . + am1 cm )x1 + . . . + (a1n c1 + . . . + amn cm )xn
(using the axioms of a vector space)
=⇒ It suffices to show that the system of linear equations

a11 c1 + a21 c2 + . . . + am1 cm = 0
⋮
a1n c1 + a2n c2 + . . . + amn cm = 0

has a solution (c1 , . . . , cm ) ∈ F m different from (0, . . . , 0).


This follows from Gaussian elimination (as seen in Linear Algebra I for F = R (Thm 3.15(b) from
L.A.I.)) because we have more unknowns than equations as m > n.

Corollary 3.16. Let V be a vector space over a field F . Let x1 , . . . xn ∈ V . Suppose two of the following
three statements hold. Then x1 , . . . , xn form a basis:

(a) x1 , . . . , xn are linearly independent.

(b) x1 , . . . , xn span V .

(c) n = dimF (V ).

Proof. If (a) and (b) hold then x1 , . . . , xn form a basis by definition.


Suppose (a) and (c) hold. If x1 , . . . , xn did not form a basis, we could find an x ∈ V such that
x1 , . . . , xn , x are still linearly independent (by 3.10(a) ⇐⇒ (c)). But then n + 1 ≤ n by Proposition
3.15, a contradiction.
Suppose (b) and (c) hold. After reordering we may assume that x1 , . . . , xm form a minimal spanning set
(for some m ≤ n).
=⇒ x1 , . . . , xm form a basis (by 3.10(a) ⇐⇒ (b))
=⇒ m = n (by 3.12); i.e. x1 , . . . , xn form a basis.

Example 3.17. (a) The vectors x1 := $\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$, x2 := $\begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}$, x3 := $\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$ form a basis of the vector
space Q³ over Q, of the vector space R³ over R and of the vector space C³ over C.

Proof. We first show that x1 , x2 , x3 are linearly independent over C:
Let c1 , c2 , c3 ∈ C such that c1 x1 + c2 x2 + c3 x3 = 0
=⇒ $\begin{pmatrix} 1 & -2 & 1 \\ 2 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$
Gaussian elimination yields:
$A := \begin{pmatrix} 1 & -2 & 1 \\ 2 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}$ ∼∼∼ [R2 ↦ R2 − 2R1, R3 ↦ R3 − 3R1] ∼∼∼> $\begin{pmatrix} 1 & -2 & 1 \\ 0 & 5 & -2 \\ 0 & 6 & -2 \end{pmatrix}$ ∼∼∼ [R3 ↦ R3 − (6/5)R2] ∼∼∼> $\begin{pmatrix} 1 & -2 & 1 \\ 0 & 5 & -2 \\ 0 & 0 & 2/5 \end{pmatrix}$
=⇒ c = (c1 , c2 , c3 ) = (0, 0, 0) is the only solution of Ac = 0.
=⇒ x1 , x2 , x3 are linearly independent over C and then also over R and Q.
=⇒ x1 , x2 , x3 form a basis of C³, R³ and Q³
(by 3.16 because 3 = dimC (C³) = dimR (R³) = dimQ (Q³)).

(b) We view C as a vector space over C, R and Q (see 2.5(b)).
Let x1 := 1, x2 := 2, x3 := √2, x4 := i, x5 := i√3 ∈ C.
Determine dimF (SpanF (x1 , x2 , x3 , x4 , x5 )) for F = C, R and Q.
Solution: for F = C:
We have C = SpanC (x1 ) ⊆ SpanC (x1 , . . . , x5 ) ⊆ C
=⇒ dimC (SpanC (x1 , . . . , x5 )) = dimC (C) = 1.
for F = R:
x1 and x4 span C as a vector space over R
=⇒ SpanR (x1 , . . . , x5 ) = C
=⇒ dimR (SpanR (x1 , . . . , x5 )) = dimR (C) = 2.
for F = Q: Observations:
x1 , x2 are l.d. over Q; x1 , x3 are l.i. over Q; x1 , x4 are l.i. over R.
Let's try to prove: x1 , x3 , x4 and x5 are l.i. over Q.
Let c1 , c3 , c4 and c5 ∈ Q s.t. c1 x1 + c3 x3 + c4 x4 + c5 x5 = 0
=⇒ (c1 + c3 √2) + i(c4 + c5 √3) = 0
=⇒ c1 + c3 √2 = 0 and c4 + c5 √3 = 0 (because 1, i are l.i. over R)
=⇒ c1 = c3 = 0 (suppose c3 ≠ 0 =⇒ √2 = −c1 /c3 ∈ Q, a contradiction) and
c4 = c5 = 0 (similarly).
Hence x1 , x3 , x4 and x5 form a basis of SpanQ (x1 , x3 , x4 , x5 ) over Q
=⇒ dimQ (Span(x1 , x2 , x3 , x4 , x5 )) = dimQ (Span(x1 , x3 , x4 , x5 )) = 4.
(c) Let V be a vector space of finite dimension over a field F and let W be a subspace of V . Then
dimF (W ) ≤ dimF (V ).

Proof. Extend a maximal linearly independent subset of W to a maximal linearly independent subset of
V . (Note that if vectors are linearly independent in W , they are also linearly independent in V .)
=⇒ dimF (W ) ≤ dimF (V ) (by 3.10(a) ⇐⇒ (c)).
=⇒ dimF (W ) ≤ dimF (V ) (by 3.10(a) ⇐⇒ (c)).

4 Linear Transformations
Let F be a field (e.g. F = R, Q, C or F2 ).

Definition 4.1. An (m × n)-matrix A over a field F is an array

$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$

with entries aij in F . We use the notation Mm×n (F ) for the set of all (m × n)-matrices over F (see
also 2.7(b)). We define addition and multiplication of matrices (and other notions) in the same way as
in the case F = R (as seen in Linear Algebra I). E.g.

$\begin{pmatrix} 1 & 1+i \\ 2 & 1-i \end{pmatrix}\begin{pmatrix} 1-i \\ 3 \end{pmatrix} = \begin{pmatrix} 1-i+3+3i \\ 2-2i+3-3i \end{pmatrix} = \begin{pmatrix} 4+2i \\ 5-5i \end{pmatrix}$ (matrices over C)

$\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ (as matrices over F2 ),
but $\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}$ (as matrices over R)

Definition 4.2. Let V, W be vector spaces over a field F . A map L : V → W is called a linear
transformation if the following two conditions hold:

(a) For all x, y ∈ V we have L(x + y) = L(x) + L(y) in W .

(b) For all a ∈ F and x ∈ V we have L(ax) = a(L(x)) in W .

Note: Then we also have L(0V ) = 0W and L(x − y) = L(x) − L(y)


for all x, y ∈ V .

Proof. L(0V ) = L(0V + 0V ) = L(0V ) + L(0V )


=⇒ L(0V ) = 0W (by cancelling L(0V )).
L(x − y) = L(x + (−1)y) = L(x) + L((−1)y)
= L(x) + (−1)L(y) = L(x) − L(y).

Example 4.3. (a) Let A ∈ Mm×n (F ). Then the map

$L_A : F^n \to F^m,\quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \mapsto Ax = \begin{pmatrix} a_{11}x_1 + \ldots + a_{1n}x_n \\ \vdots \\ a_{m1}x_1 + \ldots + a_{mn}x_n \end{pmatrix}$

is a linear transformation.
E.g. if $A = \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix} \in M_{2\times 2}(\mathbb{R})$ for some a ∈ R then LA : R² → R² is given by x ↦ ax (stretch
by a factor of a);
if $A = \begin{pmatrix} \cos(\varphi) & \sin(\varphi) \\ -\sin(\varphi) & \cos(\varphi) \end{pmatrix}$ for some 0 ≤ φ < 2π then LA : R² → R² is the clockwise rotation by
the angle φ.

Proof. i Let x, y ∈ F n ;
=⇒ LA (x + y) = A(x + y) = Ax + Ay = LA (x) + LA (y)
ii Let a ∈ F and x ∈ F n
=⇒ LA (ax) = A(ax) = a(Ax) = a(LA (x))
(The middle equality of both chains of equalities has been proved in Linear Algebra I if F = R,
See Thm 2.13(i) and (ii), the same proof works for any field F . The whole statement is actually
Lemma 5.3 in L.A.I.)

(b) Let V be a vector space over a field F . Then the following maps are linear transformations:
id : V → V, x 7→ x (identity);
0 : V → V, x 7→ 0V (zero map);
the map V → V, given by x 7→ ax, for any given a ∈ F fixed (stretch).

Proof. Easy. (Example 5.4(c),(d) in L.A.I.)

(c) Let L : V → W and M : W → Z be linear transformations between vector spaces over a field F .
Then their composition M ◦ L : V → Z is again a linear transformation.

Proof. (See also Section 5.3 of L.A.I.)

i Let x, y ∈ V ;
=⇒ (M ◦ L)(x + y) = M (L(x + y)) = M (L(x) + L(y))
= M (L(x)) + M (L(y)) = (M ◦ L)(x) + (M ◦ L)(y).
ii Let a ∈ F and x ∈ V ;
=⇒ (M ◦ L)(ax) = M (L(ax)) = M (a(L(x)))
= a(M (L(x))) = a(M ◦ L)(x).

(d) Let V be the subspace of RR consisting of all differentiable functions. Then differentiation D :
V → RR , f 7→ f 0 , is a linear transformation.

Proof. i Let f, g ∈ V ;
=⇒ D(f + g) = (f + g)0 = f 0 + g 0 = D(f ) + D(g)
ii Let a ∈ R and f ∈ V ;
=⇒ D(af ) = (af )0 = af 0 = a(D(f ))
(The middle equality in both chains of equalities has been proved in Calculus.)
 
(e) The map $L : \mathbb{R}^2 \to \mathbb{R},\ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto x_1x_2$, is not a linear transformation.

Proof. Let a = 2 and $x = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \in \mathbb{R}^2$. Then:
$L(ax) = L\begin{pmatrix} 2 \\ 2 \end{pmatrix} = 4$, but aL(x) = 2 · 1 = 2.

Proposition 4.4 (Matrix representation I). Let F be a field. Let L : F n → F m be a linear transforma-
tion. Then there exists a unique matrix A ∈ Mm×n (F ) such that L = LA (as defined in 4.3(a)). In this
case we say that A represents L (with respect to the standard bases of F n and F m ).

(See also Thm 5.6 of L.A.I.)
For example, the map $\mathbb{R}^3 \to \mathbb{R}^2,\ \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} \mapsto \begin{pmatrix} 2c_1 + c_3 - 4c_2 \\ c_2 \end{pmatrix}$, is represented by $A = \begin{pmatrix} 2 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix} \in M_{2\times 3}(\mathbb{R})$.

Proof. Let $e_1 := \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \ldots, e_n := \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}$ denote the standard basis of F^n.
Uniqueness: Suppose A ∈ Mm×n (F ) satisfies L = LA .
=⇒ The j-th column of A is Aej = LA (ej ) = L(ej ) (for j = 1, . . . , n)
=⇒ A is the (m × n)-matrix with the column vectors L(e1 ), . . . , L(en ).
Existence: Let A be defined this way. We want to show L = LA .

 
Let $c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \in F^n$ =⇒ c = c1 e1 + . . . + cn en ;
=⇒ L(c) = L(c1 e1 ) + . . . + L(cn en ) = c1 L(e1 ) + . . . + cn L(en )
and LA (c) = . . . = c1 LA (e1 ) + . . . + cn LA (en )
(because L and LA are linear transformations)
=⇒ L(c) = LA (c) (because L(ej ) = LA (ej ) for all j = 1, . . . , n).
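The proof of 4.4 is effectively an algorithm: evaluate L on the standard basis vectors and use the images as columns. A NumPy sketch for the example map R³ → R² above (illustrative only):

```python
import numpy as np

# Proposition 4.4 as an algorithm: the j-th column of the representing
# matrix is L(e_j).  L is the example map R^3 -> R^2 given above.
def L(c):
    c1, c2, c3 = c
    return np.array([2 * c1 + c3 - 4 * c2, c2])

A = np.column_stack([L(e) for e in np.eye(3)])   # rows of eye(3) are e1, e2, e3
print(A)                          # [[ 2. -4.  1.]
                                  #  [ 0.  1.  0.]]
c = np.array([1.0, 2.0, 3.0])
print(np.allclose(A @ c, L(c)))   # True: L = L_A
```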

Definition 4.5. Let L : V → W be a linear transformation between vector spaces V, W over a field F .
Then
ker(L) := {x ∈ V : L(x) = 0W }
is called the kernel of L and

im(L) := {y ∈ W : ∃x ∈ V : y = L(x)}

is called the image (or range) of L.

Remark 4.6. Let A ∈ Mm×n (F ). Then

ker(LA ) = N(A)

where N(A) = {c ∈ F n : Ac = 0} denotes the nullspace of A (see also Section 6.2 of L.A.I.) and

im(LA ) = Col(A)

where Col(A) denotes the column space of A; i.e. Col(A) = Span(a1 , . . . , an )


where a1 , . . . , an denotes the n columns of A. (See also Section 6.4 of L.A.I.)

Proof. First assertion: by definition.


Second assertion: follows from 4.9(a) applied to the standard basis of F n .

Lemma 4.7. Let V and W be vector spaces over a field F and let L : V → W be a linear transformation.
Then:

(a) ker(L) is a subspace of V .

(b) im(L) is a subspace of W .

Proof. (a) i We have 0V ∈ ker(L) (see Note after Def 4.2).


ii Let x, y ∈ ker(L);
=⇒ L(x + y) = L(x) + L(y) = 0W + 0W = 0W ;
=⇒ x + y ∈ ker(L).
iii Let a ∈ F and x ∈ ker(L);
=⇒ L(ax) = a(L(x)) = a0W = 0W ;
=⇒ ax ∈ ker(L).

(b) i We have 0W = L(0V ) ∈ im(L).


ii Let x, y ∈ im(L);
=⇒ ∃v, w ∈ V such that x = L(v) and y = L(w);
=⇒ x + y = L(v) + L(w) = L(v + w) ∈ im(L).
iii Let y ∈ im(L) and a ∈ F ;
=⇒ ∃x ∈ V such that y = L(x);
=⇒ ay = a(L(x)) = L(ax) ∈ im(L).

Example 4.8. Let A ∈ M4×4 (R) be as in 3.9(d). Find a basis of the image im(LA ) of LA : R⁴ → R⁴, c ↦ Ac.

Solution: We perform column operations:

A ∼∼∼ [C2 ↦ C2 + C1, C3 ↦ C3 − 3C1, C4 ↦ C4 − 2C1] ∼∼∼> $\begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 3 \\ 3 & 1 & 0 & 3 \\ -2 & -2 & 0 & -6 \end{pmatrix}$
∼∼∼ [C4 ↦ C4 − 3C2] ∼∼∼> $\begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 1 & 0 & 0 \\ -2 & -2 & 0 & 0 \end{pmatrix}$ =: Ã

=⇒ The two vectors (1, 2, 3, −2)ᵀ and (0, 1, 1, −2)ᵀ span im(LA )
(because im(LA ) = Col(A) = Col(Ã) by 4.6 and 3.3(b)).
=⇒ They form a basis of im(LA ) (because they are also linearly independent, as they are not multiples of each other).
Proposition 4.9. Let V and W be vector spaces over a field F and let L : V → W be a linear
transformation. Let x1 , . . . , xn ∈ V . Then:
(a) If x1 , . . . , xn ∈ V span V then L(x1 ), . . . , L(xn ) span im(L).
(b) If L(x1 ), . . . , L(xn ) are linearly independent then x1 , . . . , xn are linearly independent.
Proof. (a) Let y ∈ im(L);
=⇒ ∃x ∈ V such that y = L(x)
and ∃a1 , . . . , an ∈ F such that x = a1 x1 + . . . + an xn (since V = Span(x1 , . . . , xn ))
=⇒ y = L(x) = L(a1 x1 + . . . + an xn ) =
a1 L(x1 ) + . . . + an L(xn ) ∈ Span(L(x1 ), . . . , L(xn ));
=⇒ im(L) ⊆ Span(L(x1 ), . . . , L(xn ));
=⇒ im(L) = Span(L(x1 ), . . . , L(xn )) (i.e. L(x1 ), . . . , L(xn ) span im(L))
(because Span(L(x1 ), . . . , L(xn )) is the smallest subspace containing L(x1 ), . . . , L(xn ) (by 3.3)).

(b) Let a1 , . . . , an ∈ F such that a1 x1 + . . . + an xn = 0V ;


=⇒ 0W = L(0V ) = L(a1 x1 + . . . + an xn ) = a1 L(x1 ) + . . . + an L(xn );
=⇒ a1 = . . . = an = 0
(since L(x1 ), . . . , L(xn ) are linearly independent).
Hence x1 , . . . , xn are linearly independent.

Proposition 4.10 (Kernel Criterion). Let V and W be vector spaces over a field F , and let L : V → W
be a linear transformation. Then:
L injective ⇐⇒ ker(L) = {0V }.
Proof. “=⇒”: Let x ∈ ker(L) =⇒ L(x) = 0W . We also have L(0V ) = 0W ;
=⇒ x = 0V (because L is injective).
“⇐=”: Let x, y ∈ V such that L(x) = L(y);
=⇒ L(x − y) = L(x) − L(y) = 0W ;
=⇒ x − y = 0V (since ker(L) = {0V });
=⇒ x = y.
Definition 4.11. Let V, W be vector spaces over a field F . A bijective linear transformation L : V → W
is called an isomorphism. The vector spaces V and W are called isomorphic if there exists an isomorphism
L : V → W ; we then write V ≅ W .
Example 4.12. (a) For any vector space V over a field F , the identity id : V → V is an isomorphism.

(b) If L : V → W is an isomorphism then the inverse map L−1 : W → V is an isomorphism as well.


(See also Def 5.21 from L.A.I.)
(c) If A ∈ Mn×n (R) is invertible then LA : Rn → Rn is an isomorphism.
 
(d) The map $L : \mathbb{R}^2 \to \mathbb{C},\ \begin{pmatrix} a \\ b \end{pmatrix} \mapsto a + bi$, is an isomorphism between the vector spaces R² and C over
R.
(e) For any n ∈ N, the map

$L : \mathbb{R}^{n+1} \to P_n,\ \begin{pmatrix} a_0 \\ \vdots \\ a_n \end{pmatrix} \mapsto a_0 + a_1t + \ldots + a_nt^n$

is an isomorphism between the vector spaces R^(n+1) and Pn over R.


(f) For any m, n ∈ N we have R^(mn) ≅ Mm×n (R).
Proof. (a) obvious.
(b) and (c) see Coursework.
(d) and (e) follow from the following proposition and 3.9(c) and (d), respectively.
(f) (only in the case m = n = 2) The map

$\mathbb{R}^4 \to M_{2\times 2}(\mathbb{R}),\ \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix} \mapsto \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix}$

is clearly an isomorphism.
Proposition 4.13. Let V be a vector space over a field F with basis x1 , . . . , xn .
Then the map

$L : F^n \to V,\ \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \mapsto a_1x_1 + \ldots + a_nx_n$

is an isomorphism. (We will later use the notation Ix1 ,...,xn for the map L.)
   
Proof. (a) Let $a = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$ and $b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \in F^n$;
=⇒ L(a + b) = L((a1 + b1 , . . . , an + bn )ᵀ)
= (a1 + b1 )x1 + . . . + (an + bn )xn (by definition of L)
= (a1 x1 + . . . + an xn ) + (b1 x1 + . . . + bn xn )
(by distributivity, commutativity and associativity)
= L(a) + L(b) (by definition of L).

(b) Let a ∈ F and $b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \in F^n$;
=⇒ L(ab) = L((ab1 , . . . , abn )ᵀ)
= (ab1 )x1 + . . . + (abn )xn (by definition of L)
= a(b1 x1 + . . . + bn xn ) (using the axioms of a vector space)
= a(L(b)) (by definition of L).

 0 
     
 a1
 
 
(c) ker(L) =  ...  ∈ F n : a1 x1 + . . . + an xn = 0V =  ... 

   

0
 
  
an
  
(because x1 , . . . , xn are linearly independent)
=⇒ L is injective (by 4.10).

(d) im(L) = Span(L(e1 ), . . . , L(en )) (by 4.9(a))


= Span(x1 , . . . , xn ) = V (because x1 , . . . , xn span V );
=⇒ L is surjective.

Theorem 4.14. Let V and W be vector spaces over a field F of dimension n and m, respectively. Then
V and W are isomorphic if and only if n = m.

Proof. (“⇐=”): We assume that n = m.


=⇒ We have isomorphisms LV : F n → V and LW : F n → W (by 4.13);
=⇒ LW ◦ LV⁻¹ is an isomorphism between V and W (by 4.3(c) and 4.12(b)).

(“=⇒”) We assume that V and W are isomorphic.


Let L : V → W be an isomorphism and let x1 , . . . , xn be a basis of V .
=⇒ L(x1 ), . . . , L(xn ) span im(L) = W (by 4.9(a))
and are linearly independent (by 4.9(b) applied to L−1 and L(x1 ), . . . , L(xn ))
=⇒ L(x1 ), . . . , L(xn ) form a basis of W
=⇒ n = dimF (W ) = m.

Theorem 4.15 (Dimension Theorem). Let V be a vector space over a field F of finite dimension and
let L : V → W be a linear transformation from V to another vector space W over F . Then:

dimF (ker(L)) + dimF (im(L)) = dimF (V ).

Example 4.16.

L : V → W                         | dim(ker(L))     | dim(im(L))   | dim(V)                        | Verification
LA for A ∈ M4×4 (R) as in 3.9(d)  | = 2 (by 3.9(d)) | = 2 (by 4.8) | = 4                           | 2 + 2 = 4
LA for A ∈ M3×5 (R) below         | = 3             | = 2          | = 5                           | 3 + 2 = 5
Isomorphism                       | = 0 (by 4.10)   | = dim(W )    | = dim(V ) (= dim(W ) by 4.14) | 0 + dim(W ) = dim(V )
Zero map                          | = dim(V )       | = 0          | = dim(V )                     | dim(V ) + 0 = dim(V )
Let

$A = \begin{pmatrix} 1 & -2 & 2 & 3 & -1 \\ -3 & 6 & -1 & 1 & -7 \\ 2 & -4 & 5 & 8 & -4 \end{pmatrix} \in M_{3\times 5}(\mathbb{R}).$

We want to find dimR (ker(LA )) and dimR (im(LA )) – we do this by finding bases for both ker(LA ) and
im(LA ).
We find a basis of the nullspace N (A):

A ∼∼∼ [Gaussian elimination] ∼∼∼> $\begin{pmatrix} 1 & -2 & 0 & -1 & 3 \\ 0 & 0 & 1 & 2 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$ =: Ã.

Then

N (A) = {x ∈ R⁵ : Ax = 0} = {x ∈ R⁵ : Ãx = 0}
= {(x1 , x2 , x3 , x4 , x5 )ᵀ ∈ R⁵ : x1 = 2x2 + x4 − 3x5 , x3 = −2x4 + 2x5 }
= {(2x2 + x4 − 3x5 , x2 , −2x4 + 2x5 , x4 , x5 )ᵀ : x2 , x4 , x5 ∈ R}
= {x2 (2, 1, 0, 0, 0)ᵀ + x4 (1, 0, −2, 1, 0)ᵀ + x5 (−3, 0, 2, 0, 1)ᵀ : x2 , x4 , x5 ∈ R}
= Span((2, 1, 0, 0, 0)ᵀ, (1, 0, −2, 1, 0)ᵀ, (−3, 0, 2, 0, 1)ᵀ).

The three vectors above are linearly independent, since if a1 , a2 , a3 ∈ R and

a1 (2, 1, 0, 0, 0)ᵀ + a2 (1, 0, −2, 1, 0)ᵀ + a3 (−3, 0, 2, 0, 1)ᵀ = 0,

then a1 = a2 = a3 = 0 by looking at the second, fourth and fifth coordinates, respectively.
Thus these three vectors are a basis of N (A), so dimR (ker(LA )) = dimR (N (A)) = 3.
We now find a basis of the image im(LA ):

A ∼∼∼ [column operations] ∼∼∼> $\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ -3 & 0 & 5 & 0 & 0 \\ 2 & 0 & 1 & 0 & 0 \end{pmatrix}$.

So the vectors (1, −3, 2)ᵀ and (0, 5, 1)ᵀ span im(LA ). Since they are obviously not multiples of each other, they are
linearly independent, hence form a basis of im(LA ). Consequently, dimR (im(LA )) = 2.
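The two computations above can be cross-checked: by the Dimension Theorem the dimensions must add up to dim R⁵ = 5. A SymPy sketch (illustrative only):

```python
from sympy import Matrix

# Checking the Dimension Theorem for L_A with the matrix A above:
# dim ker(L_A) + dim im(L_A) must equal dim R^5 = 5.
A = Matrix([[1, -2, 2, 3, -1],
            [-3, 6, -1, 1, -7],
            [2, -4, 5, 8, -4]])
dim_ker = len(A.nullspace())    # 3, as computed above
dim_im = len(A.columnspace())   # 2, as computed above
print(dim_ker, dim_im, dim_ker + dim_im == A.cols)   # 3 2 True
```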

Proof of the Dimension Theorem. Let x1 , . . . , xr be a basis of ker(L).


We extend x1 , . . . , xr to a basis x1 , . . . , xn of the whole space V for some n ≥ r (as in Example 3.17(c)) and
show below that L(xr+1 ), . . . , L(xn ) form a basis of im(L). Then we have

dimF (ker(L)) + dimF (im(L)) = r + (n − r) = n = dimF (V ),

as we wanted to prove.
L(xr+1 ), . . . , L(xn ) form a basis of im(L):
• L(xr+1 ), . . . , L(xn ) span im(L): Let y ∈ imL.
=⇒ ∃x ∈ V such that y = L(x) (by definition of im(L))
and ∃a1 , . . . , an ∈ F s.t. x = a1 x1 + · · · + an xn (since x1 , . . . , xn span V ).
=⇒ y = L(x) = L(a1 x1 + · · · + an xn )

= a1 L(x1 ) + · · · + an L(xn ) (because L is a linear transformation)


= ar+1 L(xr+1 ) + · · · + an L(xn ) (because x1 , . . . , xr ∈ ker(L))
∈ Span(L(xr+1 ), . . . , L(xn ))
=⇒ im(L) ⊆ Span(L(xr+1 ), . . . , L(xn ))
=⇒ im(L) = Span(L(xr+1 ), . . . , L(xn )) (because L(xr+1 ), . . . , L(xn ) ∈ im(L)
and, by 3.3, Span(L(xr+1 ), . . . , L(xn )) is the smallest subspace containing L(xr+1 ), . . . , L(xn )).
• L(xr+1 ), . . . , L(xn ) are linearly independent:
Let ar+1 , . . . , an ∈ F such that ar+1 L(xr+1 ) + · · · + an L(xn ) = 0W .
=⇒ L(ar+1 xr+1 + · · · + an xn ) = 0W (because L is a linear transformation)
=⇒ ar+1 xr+1 + · · · + an xn ∈ ker(L) (by definition of kernel)
=⇒ ∃a1 , . . . ar ∈ F such that ar+1 xr+1 + · · · + an xn = a1 x1 + · · · + ar xr
(because x1 , . . . , xr span ker(L))
=⇒ a1 x1 + · · · + ar xr − ar+1 xr+1 − · · · − an xn = 0V
=⇒ a1 = · · · = ar = −ar+1 = · · · = −an = 0
(because x1 , . . . , xn are linearly independent)
=⇒ ar+1 = · · · = an = 0.
Proposition 4.17 (Matrix representation II). Let V and W be vector spaces over a field F with bases
x1 , . . . , xn and y1 , . . . , ym , respectively. Let L : V → W be a linear transformation. Then there exists a
unique matrix A ∈ Mm×n (F ) that represents L with respect to x1 , . . . , xn and y1 , . . . , ym . Here we say
that A = (aij ) ∈ Mm×n (F ) represents L (with respect to x1 , . . . , xn and y1 , . . . , ym ) if for all c1 , . . . , cn ∈
F we have
L(c1 x1 + . . . + cn xn ) = d1 y1 + . . . + dm ym
where d1 , . . . , dm ∈ F are given by

$\begin{pmatrix} d_1 \\ \vdots \\ d_m \end{pmatrix} = A\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}$

Proof. Let A ∈ Mm×n (F ). Then:
A represents L with respect to x1 , . . . , xn and y1 , . . . , ym
⇐⇒ the diagram

              L
      V  ---------->  W
      ^                ^
      | I_{x1,...,xn}  | I_{y1,...,ym}
      |                |
     F^n ----------> F^m
             L_A

commutes, i.e. L ◦ Ix1 ,...,xn = Iy1 ,...,ym ◦ LA   (?)   (see proof below);
⇐⇒ LA = I⁻¹y1 ,...,ym ◦ L ◦ Ix1 ,...,xn =: M
⇐⇒ A represents M : F^n → F^m (with respect to the standard bases of F^n and F^m).
Hence 4.17 follows from 4.4.

Proof of (?): Let c1 , . . . , cn ∈ F and let d1 , . . . , dm ∈ F be given by $A\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = \begin{pmatrix} d_1 \\ \vdots \\ d_m \end{pmatrix}$;
=⇒ $(L \circ I_{x_1,\ldots,x_n})\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = L(c_1x_1 + \cdots + c_nx_n)$
and $(I_{y_1,\ldots,y_m} \circ L_A)\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = I_{y_1,\ldots,y_m}\begin{pmatrix} d_1 \\ \vdots \\ d_m \end{pmatrix} = d_1y_1 + \cdots + d_my_m$.
Hence: L(c1 x1 + . . . + cn xn ) = d1 y1 + . . . + dm ym
⇐⇒ $(L \circ I_{x_1,\ldots,x_n})\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = (I_{y_1,\ldots,y_m} \circ L_A)\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}$.
Therefore: A represents L ⇐⇒ L ◦ Ix1 ,...,xn = Iy1 ,...,ym ◦ LA
Note: Given L, x1, . . . , xn and y1, . . . , ym as in 4.17 we find the corresponding matrix A as follows: for each i = 1, . . . , n we compute L(xi), represent L(xi) as a linear combination of y1, . . . , ym and write the coefficients of this linear combination into the ith column of A.
Example 4.18. Find the matrix A ∈ M3×4(R) representing differentiation D : P3 → P2, f ↦ f′, with respect to the bases 1, t, t², t³ and 1, t, t² of P3 and P2, respectively.
Solution: We have

D(1) = 0 = 0 + 0t + 0t²
D(t) = 1 = 1 + 0t + 0t²
D(t²) = 2t = 0 + 2t + 0t²
D(t³) = 3t² = 0 + 0t + 3t²
\[
\Longrightarrow A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}
\]
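Aside: the recipe from the Note above (compute L(xi) and write its coefficients into the ith column) can be automated; the sketch below, which assumes sympy, rebuilds the matrix of D column by column.

```python
# Building the matrix of D: P3 -> P2 column by column (a sketch; assumes sympy).
from sympy import Matrix, Poly, diff, symbols

t = symbols('t')
source_basis = [1, t, t**2, t**3]  # basis of P3
m = 3                              # dim P2, with basis 1, t, t^2

columns = []
for f in source_basis:
    # coefficients of D(f) = f' with respect to 1, t, t^2 (padded with zeros)
    coeffs = Poly(diff(f, t), t).all_coeffs()[::-1]
    columns.append(coeffs + [0] * (m - len(coeffs)))

A = Matrix(columns).T  # the coefficient vectors become the columns of A
print(A)               # Matrix([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]])
```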
Example 4.19. Let B := \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix} ∈ M2×2(R). Find the matrix A ∈ M2×2(R) representing the linear transformation LB : R² → R², x ↦ Bx, with respect to the basis \begin{pmatrix} 1 \\ -2 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} of R² (used for both source and target space).
Solution:
\[
L_B\begin{pmatrix} 1 \\ -2 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ -2 \end{pmatrix} = \begin{pmatrix} 3 \\ -6 \end{pmatrix} = 3\begin{pmatrix} 1 \\ -2 \end{pmatrix} + 0\begin{pmatrix} 1 \\ -1 \end{pmatrix}
\]
\[
L_B\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 2 \\ -2 \end{pmatrix} = 0\begin{pmatrix} 1 \\ -2 \end{pmatrix} + 2\begin{pmatrix} 1 \\ -1 \end{pmatrix}
\]
\[
\Longrightarrow A = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}.
\]
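Aside: equivalently, A = M⁻¹BM where the columns of M are the chosen basis vectors (cf. Proposition 6.8 below for the analogous statement about diagonalisation); a short sympy check, not part of the original notes:

```python
# Computing the representing matrix as M^{-1} B M (a sketch; assumes sympy).
from sympy import Matrix

B = Matrix([[1, -1], [2, 4]])
M = Matrix([[1, 1], [-2, -1]])  # basis vectors (1, -2)^T and (1, -1)^T as columns

print(M.inv() * B * M)          # Matrix([[3, 0], [0, 2]])
```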
5 Determinants
In Linear Algebra I the determinant of a square matrix has been (inductively) defined via expanding
along a row (or a column). Here we begin with the following closed formula; it is obtained by iterated
expansion (without proof).
Definition 5.1 (Leibniz). Let F be a field. Let n ≥ 1 and A = (aij)i,j=1,...,n ∈ Mn×n(F). Then
\[
\det(A) := \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)} \in F
\]
is called the determinant of A. We also write |A| for det(A).
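The Leibniz formula translates directly into code. The sketch below (plain Python, no external libraries) computes sgn(σ) by counting inversions; it is only practical for small n, since the sum runs over all n! permutations.

```python
# Direct implementation of the Leibniz formula (a sketch for small n).
from itertools import permutations
from math import prod

def sign(sigma):
    """sgn(sigma) = (-1)^(number of inversions of sigma)."""
    inv = sum(1 for i in range(len(sigma))
                for j in range(i + 1, len(sigma))
                if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def det(A):
    n = len(A)
    return sum(sign(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

print(det([[1, 2], [3, 4]]))  # 1*4 - 2*3 = -2
```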
Example 5.2. (a) Let n = 1 and A = (a11) ∈ M1×1(F);
=⇒ S1 = {id} and det(A) = sgn(id)a11 = a11.

(b) Let n = 2 and A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} ∈ M2×2(F) =⇒ S2 = {id, ⟨1, 2⟩} and
\[
\det(A) = \operatorname{sgn}(\mathrm{id})\,a_{11}a_{22} + \operatorname{sgn}(\langle 1, 2\rangle)\,a_{12}a_{21} = a_{11}a_{22} - a_{12}a_{21}.
\]
E.g. let A = \begin{pmatrix} 1+2i & 3+4i \\ 1-2i & 2-i \end{pmatrix} ∈ M2×2(C);
=⇒ det(A) = (1 + 2i)(2 − i) − (3 + 4i)(1 − 2i) = (2 + 2 + 4i − i) − (3 + 8 + 4i − 6i) = −7 + 5i.
(c) Let n = 3 and A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} ∈ M3×3(F);
=⇒ S3 = {id, ⟨1, 2, 3⟩, ⟨1, 3, 2⟩, ⟨1, 3⟩, ⟨2, 3⟩, ⟨1, 2⟩} and
\[
\det(A) = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33}
\]
(here sgn(id) = sgn(⟨1, 2, 3⟩) = sgn(⟨1, 3, 2⟩) = +1 and sgn(⟨1, 3⟩) = sgn(⟨2, 3⟩) = sgn(⟨1, 2⟩) = −1).
Trick to memorise (Rule of Sarrus): copy the first two columns of A once more to its right,
\[
\begin{array}{ccc|cc}
a_{11} & a_{12} & a_{13} & a_{11} & a_{12} \\
a_{21} & a_{22} & a_{23} & a_{21} & a_{22} \\
a_{31} & a_{32} & a_{33} & a_{31} & a_{32}
\end{array}
\]
then add the three products along the diagonals running down to the right and subtract the three products along the diagonals running up to the right.
E.g. let A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix} ∈ M3×3(F2);
=⇒ det(A) = (0 + 0 + 0) − (1 + 0 + 0) = 1 (because −1 = 1 in F2).
(d) Let A = (aij) be an upper (or lower) triangular matrix; for an upper triangular matrix this means aij = 0 if i > j, i.e. A is of the form
\[
A = \begin{pmatrix} a_{11} & \dots & \dots & a_{1n} \\ 0 & \ddots & & \vdots \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \dots & 0 & a_{nn} \end{pmatrix}.
\]
Then det(A) = a11 a22 · · · ann; i.e. det(A) is the product of the entries on the main diagonal, e.g. det(In) = 1.
Proof (for A upper triangular). Let σ ∈ Sn, σ ≠ id, and pick i0 maximal with σ(i0) ≠ i0. Since every i > i0 is fixed by σ and σ is injective, the value σ(i0) must be smaller than i0, i.e. i0 > σ(i0);
=⇒ a_{i0,σ(i0)} = 0; =⇒ \prod_{i=1}^{n} a_{i,\sigma(i)} = 0.
=⇒ det(A) = sgn(id) \prod_{i=1}^{n} a_{i,\mathrm{id}(i)} = a11 a22 · · · ann.
Theorem 5.3 (Weierstrass’ axiomatic description of the determinant map). (See Defn 4.1 of L.A.I.) Let F be a field. Let n ≥ 1. The map
\[
\det : M_{n\times n}(F) \to F, \quad A \mapsto \det(A),
\]
has the following properties and is uniquely determined by these properties:
(a) det is linear in each column: if a = (a1s, . . . , ans)^T and b = (b1s, . . . , bns)^T are two candidates for the sth column and the remaining columns (written C and D below) are kept fixed, then
\[
\det\bigl(C \;\big|\; a + b \;\big|\; D\bigr) = \det\bigl(C \;\big|\; a \;\big|\; D\bigr) + \det\bigl(C \;\big|\; b \;\big|\; D\bigr).
\]

(b) Multiplying any column of a matrix A ∈ Mn×n(F) with a scalar λ ∈ F changes det(A) by the factor λ:
\[
\det\bigl(C \;\big|\; \lambda a \;\big|\; D\bigr) = \lambda \cdot \det\bigl(C \;\big|\; a \;\big|\; D\bigr).
\]
(c) If two columns of A ∈ Mn×n (F ) are equal, then det(A) = 0.
(d) det(In ) = 1.
Proof. Omitted. (See the extended notes.)
Remark 5.4. Theorem 5.3 and the following Corollary 5.5 also hold when “columns” are replaced with
“rows” (similar proofs).
Corollary 5.5. Let F be a field. Let A ∈ Mn×n (F ). Then:
(a) For all λ ∈ F we have det(λA) = λ^n det(A).
(b) If a column of A is the zero column then det(A) = 0.
(c) Let B be obtained from A by swapping two columns of A.
Then det(B) = −det(A).
(d) Let λ ∈ F and let B be obtained from A by adding the λ-multiple of the jth column of A to the ith column of A (i ≠ j). Then det(B) = det(A).
Proof. (a) Apply Theorem 5.3(b) n times.
(b) Apply Theorem 5.3(b) with λ = 0.
(c) Let a, b denote the two columns of A to be swapped.
=⇒ det(A) + det(B) = det(. . . a . . . b . . .) + det(. . . b . . . a . . .)
= det(. . . a . . . b . . .) + det(. . . b . . . a . . .) + det(. . . a . . . a . . .) + det(. . . b . . . b . . .)
(the two added determinants are 0 by 5.3(c))
= det(. . . a . . . a + b . . .) + det(. . . b . . . a + b . . .) (by 5.3(a))
= det(. . . a + b . . . a + b . . .) (by 5.3(a))
= 0 (by 5.3(c));
=⇒ det(B) = −det(A).
(d) det(B) = det(. . . a + λb . . . b . . .)
= det(. . . a . . . b . . .) + λ det(. . . b . . . b . . .) (by 5.3(a), (b))
= det(A) (by 5.3(c)).
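Aside: properties 5.5(c) and (d) are easy to observe experimentally; the following sympy sketch (not part of the original notes) checks them on a random integer matrix.

```python
# Swapping columns flips the sign of det; adding a multiple of another column
# leaves det unchanged (a sketch; assumes sympy).
from sympy import Matrix
from sympy.matrices import randMatrix

A = randMatrix(3, 3, min=-5, max=5)

B = A.copy()
B[:, 0] = A[:, 1]
B[:, 1] = A[:, 0]                # B = A with columns 1 and 2 swapped
assert B.det() == -A.det()       # 5.5(c)

C = A.copy()
C[:, 0] = A[:, 0] + 7 * A[:, 2]  # add 7 times column 3 to column 1
assert C.det() == A.det()        # 5.5(d)
```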
Theorem 5.6. Let F be a field. Let A ∈ Mn×n (F ). Then the following are equivalent:
(a) A is invertible.
(b) The columns of A span F n .
(c) N(A) = {0}.

(d) det(A) ≠ 0.
(For (a) ⇐⇒ (d), compare Thm 4.14 of L.A.I.)
Proof. “(a) =⇒ (b)”. We’ll prove the following more precise statement:

(?) ∃B ∈ Mn×n(F) s.t. AB = In ⇐⇒ the columns of A span F^n.

Proof of (?): Let a1, . . . , an denote the columns of A and let
\[
e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \dots, e_n = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix},
\]
i.e. In = (e1, . . . , en). Then:
LHS ⇐⇒ ∃b1, . . . , bn ∈ F^n s.t. Abi = ei for all i = 1, . . . , n (use B = (b1, . . . , bn));
⇐⇒ ∃b1, . . . , bn ∈ F^n s.t. a1 bi1 + . . . + an bin = ei for all i = 1, . . . , n;
⇐⇒ e1, . . . , en ∈ Span(a1, . . . , an) ⇐⇒ RHS.
“(b) ⇐⇒ (c)”. The columns of A span F^n
⇐⇒ the columns of A form a basis of F^n (by 3.16)
⇐⇒ the columns of A are linearly independent (by 3.16)
⇐⇒ N(A) = {0} (by definition of N(A)).
“(b) =⇒ (a)”. ∃B ∈ Mn×n(F) s.t. AB = In (by (?)).
=⇒ A(BA) = (AB)A = In A = A = AIn.
=⇒ A(BA − In) = 0.
=⇒ every column of BA − In belongs to N(A).
=⇒ BA − In = 0 (because N(A) = {0} by (b) ⇐⇒ (c)).
=⇒ BA = In.
=⇒ A is invertible (as both AB = In and BA = In).
“(b) ⇐⇒ (d)”. We apply column operations to the matrix A until we arrive at a lower triangular
matrix C. Then:
The columns of A span F n ;
⇐⇒ the columns of C span F n (by 3.3(b));
⇐⇒ all the diagonal elements of C are non-zero (because C is triangular)
⇐⇒ det(C) 6= 0 (by 5.2(d))
⇐⇒ det(A) 6= 0 (because det(C) = λ det(A) for some non-zero λ ∈ F by 5.5).
Theorem 5.7. (See also Thm 4.21 of L.A.I.) Let F be a field. Let A, B ∈ Mn×n(F). Then:
\[
\det(AB) = \det(A) \cdot \det(B)
\]
Proof. Omitted. (See the proof of Thm 4.21 in L.A.I.)
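Aside: a quick experimental check of the multiplicativity of det (a sketch; assumes sympy):

```python
# det(AB) = det(A) * det(B) on random integer matrices (assumes sympy).
from sympy.matrices import randMatrix

A = randMatrix(4, 4, min=-3, max=3)
B = randMatrix(4, 4, min=-3, max=3)
assert (A * B).det() == A.det() * B.det()
```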
Example 5.8. For each m ∈ N compute det(A^m) ∈ C where
\[
A = \begin{pmatrix} 1+4i & 1 \\ 5+i & 1-i \end{pmatrix} \in M_{2\times 2}(\mathbb{C}).
\]
Solution. det(A) = (1 + 4i)(1 − i) − (5 + i) = (1 + 4 + 4i − i) − (5 + i) = 2i.
=⇒ det(A^m) = det(A)^m (by 5.7)
\[
= (2i)^m = \begin{cases} 2^m & \text{if } m \text{ is of the form } 4k; \\ 2^m i & \text{if } m \text{ is of the form } 4k+1; \\ -(2^m) & \text{if } m \text{ is of the form } 4k+2; \\ -(2^m)i & \text{if } m \text{ is of the form } 4k+3. \end{cases}
\]
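Aside: the four-step cycle of (2i)^m can be confirmed numerically (a sketch; assumes sympy):

```python
# det(A^m) = det(A)^m = (2i)^m for the matrix of Example 5.8 (assumes sympy).
from sympy import Matrix, I, expand

A = Matrix([[1 + 4*I, 1], [5 + I, 1 - I]])
for m in range(1, 9):
    assert expand((A**m).det() - (2*I)**m) == 0
print([(2*I)**m for m in range(1, 5)])  # [2*I, -4, -8*I, 16]
```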
6 Diagonalisability
Definition 6.1. Let V be a vector space over a field F and let L : V → V be a linear transformation from V to itself.

(a) For any λ ∈ F the set
\[
E_\lambda(L) := \{x \in V : L(x) = \lambda x\}
\]
is called the eigenspace of L corresponding to λ.

(b) An element λ ∈ F is called an eigenvalue of L if Eλ(L) is not the zero space. In this case any vector x in Eλ(L) different from the zero vector is called an eigenvector of L with eigenvalue λ.

(c) Let A ∈ Mn×n(F). The eigenspaces, eigenvalues and eigenvectors of A are, by definition, those of LA : F^n → F^n, x ↦ Ax.

(Compare Definition 7.1 of L.A.I.)
Lemma 6.2. Let F, V and L be as in Def 6.1. Then Eλ(L) is a subspace of V for every λ ∈ F.

Proof. (a) We have 0V ∈ Eλ(L) because L(0V) = 0V = λ · 0V.

(b) Let x, y ∈ Eλ(L)
=⇒ L(x + y) = L(x) + L(y) = λx + λy = λ(x + y)
=⇒ x + y ∈ Eλ(L).

(c) Let a ∈ F and x ∈ Eλ(L)
=⇒ L(ax) = aL(x) = a(λx) = (aλ)x = (λa)x = λ(ax)
=⇒ ax ∈ Eλ(L).
Proposition 6.3. Let F be a field. Let A ∈ Mn×n(F) and λ ∈ F. Then:

λ is an eigenvalue of A ⇐⇒ det(λIn − A) = 0.

(pA(λ) := det(λIn − A) is called the characteristic polynomial of A.) (See also Proposition 7.5 in L.A.I.)

Proof. λ is an eigenvalue of A
⇐⇒ ∃x ∈ F^n, x ≠ 0, such that Ax = λx
⇐⇒ ∃x ∈ F^n, x ≠ 0, such that (λIn − A)x = 0
⇐⇒ N(λIn − A) ≠ {0}
⇐⇒ det(λIn − A) = 0 (by 5.6(c) ⇐⇒ (d)).
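Aside: the characteristic polynomial and its roots can be computed directly from the definition pA(λ) = det(λIn − A); the sketch below (assuming sympy) uses the matrix of Example 6.4.

```python
# p_A(lambda) = det(lambda*I - A) and its roots (a sketch; assumes sympy).
from sympy import Matrix, eye, symbols, I, roots

lam = symbols('lambda')
A = Matrix([[5*I, 3], [2, -2*I]])

p = (lam * eye(2) - A).det()
print(p.expand())      # lambda**2 - 3*I*lambda + 4
print(roots(p, lam))   # {4*I: 1, -I: 1}, i.e. the eigenvalues 4i and -i
```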
Example 6.4. Determine the (complex!) eigenvalues of the matrix
\[
A := \begin{pmatrix} 5i & 3 \\ 2 & -2i \end{pmatrix} \in M_{2\times 2}(\mathbb{C})
\]
and a basis of the eigenspace of A for each eigenvalue of A.

Solution.
\[
p_A(\lambda) = \det(\lambda I_2 - A) = \det\begin{pmatrix} \lambda - 5i & -3 \\ -2 & \lambda + 2i \end{pmatrix} = (\lambda - 5i)(\lambda + 2i) - 6 = \lambda^2 - (3i)\lambda + 4.
\]
The two roots of this polynomial are
\[
\lambda_{1,2} = \frac{3i \pm \sqrt{-9 - 16}}{2} = \frac{3i \pm 5i}{2} = 4i \text{ or } -i;
\]
=⇒ eigenvalues of A are 4i and −i.
Basis of E4i(A): Apply Gaussian elimination to
\[
4iI_2 - A = \begin{pmatrix} -i & -3 \\ -2 & 6i \end{pmatrix} \xrightarrow{R1 \mapsto iR1} \begin{pmatrix} 1 & -3i \\ -2 & 6i \end{pmatrix} \xrightarrow{R2 \mapsto R2 + 2R1} \begin{pmatrix} 1 & -3i \\ 0 & 0 \end{pmatrix}
\]
=⇒ E4i(A) = {(x1, x2)^T ∈ C² : x1 = (3i)x2} = {((3i)x2, x2)^T : x2 ∈ C} = Span((3i, 1)^T)
=⇒ basis of E4i(A) is (3i, 1)^T.

Basis of E−i(A): Apply Gaussian elimination to
\[
-iI_2 - A = \begin{pmatrix} -6i & -3 \\ -2 & i \end{pmatrix} \xrightarrow{R1 \leftrightarrow R2} \begin{pmatrix} -2 & i \\ -6i & -3 \end{pmatrix} \xrightarrow{R2 \mapsto R2 - (3i)R1} \begin{pmatrix} -2 & i \\ 0 & 0 \end{pmatrix}
\]
=⇒ E−i(A) = {(x1, x2)^T ∈ C² : x1 = (i/2)x2} = {((i/2)x2, x2)^T : x2 ∈ C} = Span((i/2, 1)^T)
=⇒ basis of E−i(A) is (i, 2)^T (a non-zero multiple of (i/2, 1)^T).
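Aside: sympy can confirm both eigenvalues and eigenspaces at once; eigenvectors are only determined up to non-zero scalar multiples, so the output may differ from the bases above by a factor.

```python
# Eigenvalues and eigenspaces of the matrix of Example 6.4 (a sketch; assumes sympy).
from sympy import Matrix, I

A = Matrix([[5*I, 3], [2, -2*I]])
for eigenvalue, alg_mult, vectors in A.eigenvects():
    print(eigenvalue, alg_mult, vectors)
# Expected: eigenvalue 4*I with eigenvector proportional to (3i, 1)^T,
#           eigenvalue -I with eigenvector proportional to (i, 2)^T.
```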
Example 6.5. Let V be the real vector space of infinitely often differentiable functions from R to R and let D : V → V, f ↦ f′, denote differentiation (cf. 4.3(d)). Then for every λ ∈ R the eigenspace of D with eigenvalue λ is of dimension 1 with basis given by the function expλ : R → R, t ↦ e^{λt}.

Proof. (e^{λt})′ = λ(e^{λt}) (by the chain rule)
=⇒ expλ ∈ Eλ(D).
Suppose f ∈ Eλ(D)
=⇒ (f(t)e^{−λt})′ = f′(t)e^{−λt} + f(t)(e^{−λt})′ (by the product rule)
= λf(t)e^{−λt} − λf(t)e^{−λt} (because f ∈ Eλ(D) and by the chain rule)
= 0
=⇒ f(t)e^{−λt} is a constant, say a ∈ R (by Calculus)
=⇒ f(t) = ae^{λt}, i.e. f = a expλ.
Hence Eλ(D) = Span(expλ).
Definition 6.6. (a) Let F, V and L : V → V be as in Def 6.1. We say that L is diagonalisable if there exists a basis x1, . . . , xn of V such that the matrix D representing L with respect to this basis is a diagonal matrix.

(b) Let F be a field. We say that a square matrix A ∈ Mn×n(F) is diagonalisable if the linear transformation LA : F^n → F^n, x ↦ Ax, is diagonalisable.

Proposition 6.7. Let F, V and L : V → V be as in Def 6.1. Then L is diagonalisable if and only if V has a basis x1, . . . , xn consisting of eigenvectors of L.
Proof. only-if part: ∃ a basis x1, . . . , xn of V such that the matrix D representing L is diagonal:
\[
D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}
\]
with some λ1, . . . , λn ∈ F. We put
\[
I := I_{x_1,\dots,x_n} : F^n \to V, \quad \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \mapsto c_1 x_1 + \dots + c_n x_n.
\]
=⇒ The diagram
\[
\begin{array}{ccc}
V & \xrightarrow{\;L\;} & V \\
I \big\uparrow & & \big\uparrow I \\
F^n & \xrightarrow{\;L_D\;} & F^n
\end{array}
\]
commutes (see proof of 4.17)
=⇒ L(xi) = L(I(ei)) = I(LD(ei)) = I(λi ei) = λi xi
=⇒ x1, . . . , xn are eigenvectors of L with eigenvalues λ1, . . . , λn, respectively.
if part: Let x1, . . . , xn be a basis of V consisting of eigenvectors of L and let λi ∈ F denote the eigenvalue corresponding to xi. Define the diagonal matrix D by
\[
D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}.
\]
=⇒ D represents L with respect to x1, . . . , xn because
\[
L(c_1 x_1 + \dots + c_n x_n) = c_1 L(x_1) + \dots + c_n L(x_n) = \lambda_1 c_1 x_1 + \dots + \lambda_n c_n x_n
\]
and
\[
\begin{pmatrix} \lambda_1 c_1 \\ \vdots \\ \lambda_n c_n \end{pmatrix} = D \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \quad \forall c_1, \dots, c_n \in F.
\]
Proposition 6.8. Let F be a field. Let A ∈ Mn×n (F ). Then A is diagonalisable if and only if there
exists an invertible matrix M ∈ GLn (F ) such that M −1 AM is a diagonal matrix. (In this case we say
that M diagonalises A.)
Proof. Let M ∈ Mn×n(F) with column vectors x1, . . . , xn.
Then x1, . . . , xn form a basis of F^n if and only if M is invertible (by 5.6(a) ⇐⇒ (b) ⇐⇒ (c)).
In this case we furthermore have for i = 1, . . . , n and λ1, . . . , λn ∈ F:
xi is an eigenvector of A with eigenvalue λi
⇐⇒ Axi = λi xi
⇐⇒ AM ei = λi(M ei) (because xi = M ei, where ei denotes the ith standard basis vector, with 1 in the ith place and 0 elsewhere)
⇐⇒ AM ei = M(λi ei)
⇐⇒ M^{−1}AM ei = λi ei (multiply with M^{−1});
⇐⇒ the ith column of M^{−1}AM is λi ei, i.e. has λi in the ith place and 0 elsewhere.
Hence x1, . . . , xn are eigenvectors of A if and only if M^{−1}AM is a diagonal matrix.
Now 6.8 follows from 6.7.
Example 6.9. Show that the matrix
\[
A := \begin{pmatrix} 0 & -1 & 1 \\ -3 & -2 & 3 \\ -2 & -2 & 3 \end{pmatrix} \in M_{3\times 3}(\mathbb{R})
\]
is diagonalisable and find an invertible matrix M ∈ GL3(R) that diagonalises it.
Solution.
\[
p_A(\lambda) = \det(\lambda I_3 - A) = \det\begin{pmatrix} \lambda & 1 & -1 \\ 3 & \lambda+2 & -3 \\ 2 & 2 & \lambda-3 \end{pmatrix}
\]
= λ(λ + 2)(λ − 3) + 1 · (−3) · 2 + (−1) · 3 · 2 − (−1)(λ + 2) · 2 − λ(−3)(2) − 1 · 3(λ − 3)
= λ(λ² − λ − 6) − 6 − 6 + 2λ + 4 + 6λ − 3λ + 9
= λ³ − λ² − λ + 1 = λ²(λ − 1) − (λ − 1) = (λ² − 1)(λ − 1) = (λ − 1)²(λ + 1).
=⇒ Eigenvalues of A are 1 and −1.
Basis of E1(A): We apply Gaussian elimination to
\[
I_3 - A = \begin{pmatrix} 1 & 1 & -1 \\ 3 & 3 & -3 \\ 2 & 2 & -2 \end{pmatrix} \xrightarrow[R3 \mapsto R3 - 2R1]{R2 \mapsto R2 - 3R1} \begin{pmatrix} 1 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
=⇒ E1(A) = {(b1, b2, b3)^T ∈ R³ : b1 + b2 − b3 = 0} = {(−b2 + b3, b2, b3)^T : b2, b3 ∈ R}
= {b2(−1, 1, 0)^T + b3(1, 0, 1)^T : b2, b3 ∈ R} = Span((−1, 1, 0)^T, (1, 0, 1)^T);
=⇒ Basis of E1(A) is x1 := (−1, 1, 0)^T, x2 := (1, 0, 1)^T.
Basis of E−1(A): We apply Gaussian elimination to
\[
-I_3 - A = \begin{pmatrix} -1 & 1 & -1 \\ 3 & 1 & -3 \\ 2 & 2 & -4 \end{pmatrix} \xrightarrow[R3 \mapsto R3 + 2R1]{R2 \mapsto R2 + 3R1} \begin{pmatrix} -1 & 1 & -1 \\ 0 & 4 & -6 \\ 0 & 4 & -6 \end{pmatrix} \xrightarrow[R3 \mapsto R3 - R2]{R1 \mapsto R1 - \frac{1}{4}R2} \begin{pmatrix} -1 & 0 & \frac{1}{2} \\ 0 & 4 & -6 \\ 0 & 0 & 0 \end{pmatrix}
\]
=⇒ E−1(A) = {(b1, b2, b3)^T ∈ R³ : −b1 + (1/2)b3 = 0 and 4b2 − 6b3 = 0}
= {((1/2)b3, (3/2)b3, b3)^T : b3 ∈ R} = Span((1/2, 3/2, 1)^T);
=⇒ Basis of E−1(A) is x3 := (1/2, 3/2, 1)^T.
For M := (x1, x2, x3) = \begin{pmatrix} -1 & 1 & 1/2 \\ 1 & 0 & 3/2 \\ 0 & 1 & 1 \end{pmatrix} we have det(M) = 1/2 + 3/2 − 1 = 1 ≠ 0.
=⇒ x1, x2, x3 form a basis of R³ (by 5.6 and 3.15) consisting of eigenvectors of A.
=⇒ A is diagonalisable (by 6.7) and M diagonalises A (by the proof of 6.8).
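Aside: the conclusion of Example 6.9 is easy to verify by computing M⁻¹AM explicitly (a sketch; assumes sympy):

```python
# M^{-1} A M is diagonal with the eigenvalues 1, 1, -1 on the diagonal (assumes sympy).
from sympy import Matrix, Rational

A = Matrix([[0, -1, 1], [-3, -2, 3], [-2, -2, 3]])
M = Matrix([[-1, 1, Rational(1, 2)],
            [1, 0, Rational(3, 2)],
            [0, 1, 1]])

print(M.inv() * A * M)  # Matrix([[1, 0, 0], [0, 1, 0], [0, 0, -1]])
```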
Definition 6.10. Let F be a field. Let A ∈ Mn×n(F) and λ ∈ F be an eigenvalue of A.

(a) The algebraic multiplicity aλ(A) of λ is its multiplicity as a root of the characteristic polynomial of A.

(b) The geometric multiplicity gλ(A) of λ is the dimension of the eigenspace Eλ(A).
Example 6.11. (a) In Example 6.9 we had pA(λ) = (λ − 1)²(λ + 1) and

a1(A) = 2 = g1(A) and a−1(A) = 1 = g−1(A).
(b) Let A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} ∈ M2×2(F)
=⇒ pA(λ) = (λ − 1)² and a basis of E1(A) is \begin{pmatrix} 1 \\ 0 \end{pmatrix}
=⇒ a1(A) = 2 but g1(A) = 1.
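Aside: the two multiplicities of Example 6.11(b) can be read off with sympy (a sketch, not part of the original notes):

```python
# Algebraic vs. geometric multiplicity for A = [[1, 1], [0, 1]] (assumes sympy).
from sympy import Matrix

A = Matrix([[1, 1], [0, 1]])
print(A.charpoly().as_expr())     # lambda**2 - 2*lambda + 1 = (lambda - 1)**2

[(val, alg_mult, vecs)] = A.eigenvects()
print(val, alg_mult, len(vecs))   # 1 2 1, i.e. a_1(A) = 2 but g_1(A) = 1
assert not A.is_diagonalizable()  # consistent with Theorem 6.12
```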
Theorem 6.12. Let F be a field. Let A ∈ Mn×n(F). Then A is diagonalisable if and only if the characteristic polynomial of A splits into linear factors and the algebraic multiplicity equals the geometric multiplicity for each eigenvalue of A.

Proof. Omitted.
Example 6.13. Determine whether the matrix A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} is diagonalisable when viewed as an element of M2×2(R), of M2×2(C) and of M2×2(F2). If A is diagonalisable then determine an invertible matrix M that diagonalises A.
Solution.
\[
p_A(\lambda) = \det\begin{pmatrix} \lambda & -1 \\ 1 & \lambda \end{pmatrix} = \lambda^2 + 1.
\]
for R: λ² + 1 does not split into linear factors
=⇒ as an element of M2×2(R) the matrix A is not diagonalisable (by 6.12)
(A is a rotation by 90°).
for C: λ² + 1 = (λ + i)(λ − i) =⇒ a+i(A) = 1 and a−i(A) = 1.

Basis of Ei(A):
\[
iI_2 - A = \begin{pmatrix} i & -1 \\ 1 & i \end{pmatrix} \xrightarrow{R1 \mapsto (-i)R1} \begin{pmatrix} 1 & i \\ 1 & i \end{pmatrix} \xrightarrow{R2 \mapsto R2 - R1} \begin{pmatrix} 1 & i \\ 0 & 0 \end{pmatrix}
\]
=⇒ Ei(A) = {(b1, b2)^T ∈ C² : b1 + ib2 = 0} = {(−ib2, b2)^T : b2 ∈ C} = Span((−i, 1)^T)
=⇒ g+i(A) = 1.
Basis of E−i(A):
\[
-iI_2 - A = \begin{pmatrix} -i & -1 \\ 1 & -i \end{pmatrix} \xrightarrow{R1 \mapsto iR1} \begin{pmatrix} 1 & -i \\ 1 & -i \end{pmatrix} \xrightarrow{R2 \mapsto R2 - R1} \begin{pmatrix} 1 & -i \\ 0 & 0 \end{pmatrix}
\]
=⇒ E−i(A) = {(b1, b2)^T ∈ C² : b1 − ib2 = 0} = {(ib2, b2)^T : b2 ∈ C} = Span((i, 1)^T)
=⇒ g−i(A) = 1
=⇒ A is diagonalisable when viewed as an element of M2×2(C) (by 6.12) and M = \begin{pmatrix} -i & i \\ 1 & 1 \end{pmatrix} diagonalises A.
for F2: λ² + 1 = (λ + 1)² (as 1 + 1 = 0 in F2)
=⇒ A has the eigenvalue 1 = −1 and a1(A) = 2.
Basis of E1(A):
\[
I_2 - A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \xrightarrow{R2 \mapsto R2 - R1} \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}
\]
=⇒ E1(A) = {(b1, b2)^T ∈ F2² : b1 + b2 = 0} = {(b2, b2)^T : b2 ∈ F2} = Span((1, 1)^T)
=⇒ g1(A) = 1
=⇒ A is not diagonalisable (by 6.12).
Theorem 6.14 (Cayley-Hamilton). Let F be a field, let A ∈ Mn×n (F ) and let pA be the characteristic
polynomial of A. Then pA (A) is the zero matrix.
Example 6.15. Let A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} ∈ M2×2(F)
=⇒ pA(λ) = λ² + 1 (see 6.13)
\[
\Longrightarrow p_A(A) = A^2 + I_2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
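Aside: the Cayley-Hamilton theorem can be checked on random matrices by evaluating pA at A via the Horner scheme (a sketch; assumes sympy):

```python
# p_A(A) = 0 for a random integer matrix (assumes sympy).
from sympy import eye, zeros
from sympy.matrices import randMatrix

A = randMatrix(3, 3, min=-4, max=4)
coeffs = A.charpoly().all_coeffs()  # [1, a_{n-1}, ..., a_1, a_0]
n = A.rows

result = zeros(n, n)
for c in coeffs:
    result = result * A + c * eye(n)  # Horner evaluation of p_A at the matrix A
assert result == zeros(n, n)
```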
Proof of 6.14. First Case: Let A = \begin{pmatrix} a_1 & & 0 \\ & \ddots & \\ 0 & & a_n \end{pmatrix} be a diagonal matrix.
=⇒ pA(λ) = (λ − a1) · · · (λ − an)
=⇒ pA(A) = (A − a1In) · · · (A − anIn)
\[
= \begin{pmatrix} 0 & & & \\ & a_2 - a_1 & & \\ & & \ddots & \\ & & & a_n - a_1 \end{pmatrix}
\cdots
\begin{pmatrix} a_1 - a_n & & & \\ & \ddots & & \\ & & a_{n-1} - a_n & \\ & & & 0 \end{pmatrix}
\]
= 0 (because the product of any two diagonal matrices with diagonal entries b1, . . . , bn and c1, . . . , cn, respectively, is the diagonal matrix with diagonal entries b1c1, . . . , bncn; since the ith factor above has 0 in the ith diagonal place, every diagonal entry of the product is 0).
Second Case: Let A be a diagonalisable matrix.
=⇒ ∃M ∈ GLn(F) such that M^{−1}AM = D where D is a diagonal matrix (by 6.8).
Let pA(λ) = λ^n + a_{n−1}λ^{n−1} + . . . + a_1λ + a_0 be the characteristic polynomial of A
=⇒ pA(A) = A^n + a_{n−1}A^{n−1} + . . . + a_1A + a_0In
= (MDM^{−1})^n + a_{n−1}(MDM^{−1})^{n−1} + . . . + a_1(MDM^{−1}) + a_0In (because A = MDM^{−1})
= MD^nM^{−1} + a_{n−1}MD^{n−1}M^{−1} + . . . + a_1MDM^{−1} + a_0MM^{−1}
(because (MDM^{−1})^k = (MDM^{−1})(MDM^{−1}) · · · (MDM^{−1}) = MD(M^{−1}M)D(M^{−1}M) · · · (M^{−1}M)DM^{−1} = MD^kM^{−1})
= M(D^n + a_{n−1}D^{n−1} + . . . + a_1D + a_0In)M^{−1} (using distributivity of matrix multiplication)
= MpA(D)M^{−1}
= MpD(D)M^{−1} (see below)
= M · 0 · M^{−1} = 0 (by the first case).
pD(λ) = det(λIn − M^{−1}AM) = det(M^{−1}(λIn)M − M^{−1}AM)
= det(M^{−1}(λIn − A)M) = det(M)^{−1} det(λIn − A) det(M) (by 5.7)
= det(λIn − A) = pA(λ).
General Case: omitted.