Linear Algebra Lecture Notes
Autumn 2020
This section summarises notation and results on matrices and vectors which will be
referred to from time to time. Mostly, these results will be familiar from MA114.
Vectors
So far, you will have seen row vectors, which are ordered lists of numbers v = (v1, v2, . . . , vn), and column vectors such as $\binom{x}{y}$. The vectors in a matrix-vector equation Au = v are usually column vectors. Because column vectors take up a lot of space on the page, it is common to use the transpose operation to write a column vector as a transposed row vector, for instance
$$\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}^T.$$
You will see different notation for vectors in different places; in these notes, vectors
will always be denoted using bold upright letters (u, v, x and so on). A special role
is played by the zero vector whose entries are all zero. We denote this vector by 0.
In other words, the (i, j) entry of AB is the scalar product of the i-th row of A
with the j-th column of B. Again, the same also holds true for matrix-vector products,
thinking of vectors as matrices with a single column.
Another very useful property of matrix multiplication is the associative property
A(BC) = (AB)C,
which holds for all matrices A, B and C (which must be of the right size so that the
products are defined). A special case of this is when C has a single column – that
is, it is a column vector. Then we obtain the equation
(AB)v = A(Bv)
which holds for all vectors v. If we think of matrices as acting on vectors (i.e. ‘A
is the matrix which sends each vector v to the vector Av’), then this equation says
‘AB is the matrix which encodes the operation “apply B to v, then apply A”.’
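To make the ‘apply B, then apply A’ reading concrete, here is a minimal numerical sketch; it assumes NumPy, and the matrices and vector are just illustrative values.

    import numpy as np

    # Minimal check (assuming NumPy) that (AB)v = A(Bv): applying AB to v is the
    # same as first applying B and then applying A.
    A = np.array([[1., 2.], [3., 4.]])
    B = np.array([[0., 1.], [1., 1.]])
    v = np.array([5., 6.])

    print(np.allclose((A @ B) @ v, A @ (B @ v)))   # True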
Elementary row operations are, as the name suggests, basic steps for manipulating a
matrix, which do not affect the solutions to the corresponding system of equations.
The three operations are:
1. swap two rows;
2. multiply a row by a non-zero scalar;
3. add a multiple of one row to another row.
Each of these operations is reversible, so if we apply elementary row operations and end up with a solution, then undoing these operations will also result in a solution.
A matrix is said to be in row echelon form if each row starts with at least as many zeroes as the row above it. Pictorially, the non-zero entries form a ‘staircase’: the possibly non-zero entries sit on and to the right of the staircase of leading entries, and all entries below and to the left of it are zero.
Example 1.1. Let’s use Gaussian elimination to reduce a matrix to row echelon
form:
$$\begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \\ 2 & 3 & 4 \end{pmatrix}
\to \begin{pmatrix} 1 & 2 & 3 \\ 0 & -1 & -2 \\ 2 & 3 & 4 \end{pmatrix} \text{ (subtract row 1 from row 2)}
\to \begin{pmatrix} 1 & 2 & 3 \\ 0 & -1 & -2 \\ 0 & -1 & -2 \end{pmatrix} \text{ (subtract twice row 1 from row 3)}
\to \begin{pmatrix} 1 & 2 & 3 \\ 0 & -1 & -2 \\ 0 & 0 & 0 \end{pmatrix} \text{ (subtract row 2 from row 3)}$$
which is in row echelon form, and we can further bring this into reduced row echelon form by multiplying row 2 by −1, and subtracting twice this from row 1, to get
$$\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}.$$
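The same row operations can be replayed numerically. The following is a small sketch assuming NumPy; the array and the operations mirror Example 1.1.

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [1., 1., 1.],
                  [2., 3., 4.]])

    A[1] -= A[0]        # subtract row 1 from row 2
    A[2] -= 2 * A[0]    # subtract twice row 1 from row 3
    A[2] -= A[1]        # subtract row 2 from row 3
    print(A)            # row echelon form: [[1, 2, 3], [0, -1, -2], [0, 0, 0]]

    A[1] *= -1          # multiply row 2 by -1
    A[0] -= 2 * A[1]    # subtract twice row 2 from row 1
    print(A)            # reduced row echelon form: [[1, 0, -1], [0, 1, 2], [0, 0, 0]]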
Thus, elementary row operations let us simplify a system of linear equations, let-
ting us spot its properties more easily. For instance, the rank of a system of linear
equations is the number of ‘independent’ constraints the system places on its vari-
ables. This is not easy to spot in general, but if we put the system in row echelon
form (i.e. use Gaussian elimination on A), then the rank is equal to the number of
non-zero rows. This is not hard to see, since each non-zero row then tells us that some variable xi can be expressed in terms of the later variables xi+1 up to xn.
Since the number of constraints in a system Ax = b depends only on the matrix A,
we also call this the rank of the matrix A.
Using Gaussian elimination on the augmented matrix of a system Ax = b, i.e. the matrix (A|b) obtained by appending b as a new column of A, lets us study the solutions to a system of linear equations as follows: the system has at least one solution precisely when rank(A) = rank(A|b).
You may have seen this explained in a different way: If we use Gaussian elimination
on the augmented matrix (A|b), then the left-hand part is just A in row echelon
form. So the only way these matrices can have different ranks is if the augmented
matrix ends up with a row
(0 0 · · · 0|c)
where c is non-zero. But this is an equation of the form 0x1 + 0x2 + · · · + 0xn = c,
which is exactly what makes a linear system inconsistent!
For example, for the system x + 2y = b1 , 2x + 4y = b2 , Gaussian elimination turns the second row of the augmented matrix into (0 0 | b2 − 2b1 ). So if b2 − 2b1 ≠ 0 then the system is inconsistent (no solutions), but if b2 = 2b1 then there are solutions; in fact any pair (x, y) satisfying x + 2y = b1 will be a solution.
For a diagonal matrix A with diagonal entries a1 , a2 , . . . , an , if e1 = (1, 0, . . . , 0)T , e2 = (0, 1, 0, . . . , 0)T and so on, then Aei = ai ei for all i. Thus for each i, ei is an eigenvector for A with eigenvalue ai .
To find the eigenvalues and eigenvectors of a square matrix A:
Calculate the characteristic polynomial of A, and find its roots (the eigenvalues of A).
For each such root λ, solve (A − λIn )v = 0 to find a non-zero solution for v, which will be an eigenvector corresponding to λ.
Hence we verify that the eigenvalues of this matrix are indeed ±i, the (complex) square roots of −1. From this point, to get the eigenvectors, we substitute x = i and x = −i into A − xIn and solve: we find that −i works for x = i and i works for x = −i.
This algorithm always works: The only way it could fail is if (A − λIn )v = 0 had
the unique solution v = 0, since then we could not find a (non-zero) eigenvector
corresponding to λ. But as we shall see when we discuss matrix inverses, a linear
system only has a unique solution when the determinant is non-zero. But we know
that det(A − λIn ) = 0 since λ is a root.
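This procedure can be checked numerically. The sketch below assumes NumPy and uses the 2 × 2 matrix with rows (0, −1) and (1, 0) as an illustrative real matrix whose eigenvalues are ±i (not necessarily the matrix from the example above).

    import numpy as np

    A = np.array([[0., -1.],
                  [1.,  0.]])

    eigenvalues, eigenvectors = np.linalg.eig(A)   # roots of det(A - x I) and matching eigenvectors
    print(eigenvalues)                             # approximately [i, -i]

    # Each column of `eigenvectors` is a non-zero solution v of (A - lambda*I)v = 0.
    for lam, v in zip(eigenvalues, eigenvectors.T):
        print(np.allclose(A @ v, lam * v))         # True for each eigenpair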
Determinants
Determinants are very useful invariants of square matrices which encode both al-
gebraic and geometric information. For instance, the determinant of a real matrix
can be thought of as a ‘scaling factor’ which tells you how the volume of a shape
changes when applying the matrix as a geometric transformation.
The determinant of a matrix A is written as det(A) or |A|. For small matrices, the
determinant can be written down and calculated by hand:
$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc,$$
$$\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}
= a\begin{vmatrix} e & f \\ h & i \end{vmatrix} - b\begin{vmatrix} d & f \\ g & i \end{vmatrix} + c\begin{vmatrix} d & e \\ g & h \end{vmatrix}
= aei - afh - bdi + bfg + cdh - ceg.$$
For larger matrices, determinants can be defined inductively via a rule sometimes
known as Laplace expansion along a row or column. For instance, the 3 × 3 deter-
minant above is given by expansion along the first row: We take the elements of the
first row (a, b, c), multiplied by −1 in the even positions, and then multiplied by the
corresponding minor, i.e. the determinant of the matrix where the row and column
containing a, b or c is crossed out.
This same technique lets us define the determinant of an n × n matrix as an alternating sum of determinants of (n − 1) × (n − 1) matrices, expanding along an arbitrary row or column: If we are given an n × n matrix A = (aij ), then we can fix i or j to get the inductive formulas
$$\det A = \underbrace{\sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij}}_{\text{expand along row } i} = \underbrace{\sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij}}_{\text{expand along column } j},$$
where Aij is the matrix formed from A by deleting row i and column j.
While you will not be asked to perform gigantic calculations by hand, you should
be aware how one defines the determinant of matrices larger than 3 × 3, and for
certain matrices, expansion along a row can be very quick, particularly if the matrix
has lots of zeroes. For example, using two lots of expanding along the first row and
ignoring zeroes lets us calculate:
$$\begin{vmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 4 & 0 & 0 & 0 \end{vmatrix}
= (-1)\begin{vmatrix} 0 & 2 & 0 \\ 0 & 0 & 3 \\ 4 & 0 & 0 \end{vmatrix}
= (-1)(-2)\begin{vmatrix} 0 & 3 \\ 4 & 0 \end{vmatrix}
= (-1)(-2)(-12) = -24.$$
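As a quick sanity check of the Laplace expansion above, here is a sketch assuming NumPy.

    import numpy as np

    A = np.array([[0., 1., 0., 0.],
                  [0., 0., 2., 0.],
                  [0., 0., 0., 3.],
                  [4., 0., 0., 0.]])
    print(np.linalg.det(A))   # approximately -24.0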
Inverse matrices
The inverse of a matrix exists precisely when its determinant is non-zero. The
existence of an inverse tells us something about solutions to the corresponding linear
system:
Proposition 1.4. If A is a square matrix (i.e. m = n), then Ax = b has a unique
solution for x if and only if A−1 exists, i.e. when det A ≠ 0.
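A hedged numerical illustration of Proposition 1.4, assuming NumPy; the matrix A and vector b are arbitrary illustrative values.

    import numpy as np

    A = np.array([[1., 2.],
                  [3., 4.]])
    b = np.array([5., 6.])

    assert abs(np.linalg.det(A)) > 1e-12          # A is invertible
    x = np.linalg.solve(A, b)                     # unique solution of Ax = b
    print(np.allclose(A @ x, b))                  # True
    print(np.allclose(x, np.linalg.inv(A) @ b))   # True: same as multiplying by A^{-1}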
In MA114, you’ve seen vectors and matrices described as lists and arrays of numbers
(usually real numbers). However, all of the important properties of vectors and
matrices arise from just two things:
(1) Addition: We have a rule for adding elements of V together to get another element of V . If v, w ∈ V , we denote this new element by v + w.
(2) Scalar multiplication: We have a rule for multiplying an element of V by a scalar to get another element of V . If v ∈ V and λ ∈ F , we denote this new element by λv.
In this case we call V a vector space over F . Elements of V are called vectors, and F is called the field of scalars.
Not all sets of numbers are allowed to be scalars. Importantly, we must be able to
add, subtract and multiply scalars together, and we must also be able to divide by
non-zero scalars (otherwise Gaussian elimination wouldn’t work). So for instance,
F = R, Q or C is OK, but F = N is not (subtraction might result in numbers not
in N), and nor is F = Z (because division might result in a number not in Z).
Since we are only ever allowed to use addition and scalar multiplication to manipulate vectors, we have a special term for when we create new vectors using these: a linear combination of vectors v1 , . . . , vn ∈ V is any vector of the form
λ1 v1 + λ2 v2 + · · · + λn vn ,
where λ1 , . . . , λn ∈ F are scalars.
Examples 2.3.
1. The familiar spaces R, R2 and R3 are vector spaces over R. More generally, for
each n ≥ 0, the set Rn is a vector space over R. This is the set of n-tuples (x1 , . . . , xn ) of real numbers, with addition and scalar multiplication defined entry-wise.
3. Here is a more abstract example. Let V be the set of all functions R → R. This
becomes a vector space over R with addition and scalar multiplication defined
point-wise, i.e. if we have functions f, g : R → R then we define
(f + g)(x) := f (x) + g(x) and (λf )(x) := λ f (x) for λ ∈ R.
Similarly, consider a homogeneous linear equation
a1 x1 + a2 x2 + · · · + an xn = 0,
where a1 , . . . , an ∈ F are fixed scalars. If we regard a solution (x1 , . . . , xn ) as a vector in F n (with entry-wise addition and scalar multiplication, as in Rn above), then sums and scalar multiples of solutions to the equation are also solutions, because of the two rules:
$$\sum_{i=1}^{n} a_i(x_i + y_i) = \left(\sum_{i=1}^{n} a_i x_i\right) + \left(\sum_{i=1}^{n} a_i y_i\right),
\qquad
\sum_{i=1}^{n} a_i(\lambda x_i) = \lambda\left(\sum_{i=1}^{n} a_i x_i\right),$$
which hold for all vectors (x1 , . . . , xn ) and (y1 , . . . , yn ) ∈ F n and all scalars λ ∈ F .
Another example comes from differential equations. A homogeneous linear ODE is an equation of the form
$$a_n(x)\frac{d^n y}{dx^n} + a_{n-1}(x)\frac{d^{n-1} y}{dx^{n-1}} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)\,y = 0,$$
i.e. the coefficients ai (x) depend on x only, not on y. The set of functions R → R
satisfying a given linear ODE form a vector space over R. To check this, we
need to know that when we add two such functions together, we get another such
function, and similarly when we multiply such a function by a scalar (constant).
Check this for yourself!
The informal definition of a vector space above (Definition 2.1) is lacking some
detail, since we haven’t actually specified which properties the addition and scalar
multiplication has to satisfy. To do this, we need axioms.
Axioms for addition
For all u, v, w ∈ V :
– (u + v) + w = u + (v + w) (additive associativity);
– v + w = w + v (additive commutativity);
– there is a vector 0 ∈ V such that v + 0 = v for all v ∈ V (additive identity);
– for each v ∈ V there is a vector w ∈ V with v + w = 0 (additive inverses).
(Later in MA204, you’ll see that these four axioms tell us that V is an abelian group under addition. But you don’t need to know this name for now.)
Axioms for scalar multiplication
For all v, w ∈ V and all scalars λ, µ ∈ F :
– λ(v + w) = λv + λw (distributivity over vector addition);
– (λ + µ)v = λv + µv (distributivity over scalar addition);
– λ(µv) = (λµ)v;
– 1v = v.
The idea of these axioms is that they capture the ‘obvious’ properties of R2 , R3 etc.,
and from these axioms we can deduce all other properties of vector spaces. Here are
some examples of properties we can prove directly from the axioms:
Proposition 3.2 (Uniqueness of zero). Suppose 0 and 0′ are two vectors, both of which satisfy the additive identity axiom. Then 0 = 0′.
Proof. Applying the additive identity axiom for 0′, with v = 0, we get 0 + 0′ = 0. But now applying the additive identity axiom for 0, with v = 0′, we get 0′ + 0 = 0′, i.e. 0 + 0′ = 0′. So 0 = 0′, as claimed.
Proposition 3.3 (Uniqueness of negatives). Let v ∈ V , and suppose that w and w′ satisfy
v + w = v + w′ = 0. (1)
Then w = w′.
Note: It is tempting to say “this is obvious, we just subtract v from both sides”.
But we haven’t defined subtraction yet, only addition and scalar multiplication. To
define subtraction, we need to know that negatives are unique (so things like v1 − v2
are uniquely defined). So we need a proof that only uses the axioms.
Proof. Using the axioms and (1) gives the following equalities:
w = w + 0 = w + (v + w′) = (w + v) + w′ = (v + w) + w′ = 0 + w′ = w′,
as required.
Notation: Since negatives are unique, we can now talk about the negative of a
vector (the axiom only says ‘a negative’), and we write −v for the negative of v.
Now we are also able to define subtraction: v − w is defined to be v + (−w).
Corollary 3.4. If 0 is the zero scalar in F and v ∈ V is any vector, then 0v = 0.
Proof. Using the property 0 = 0 + 0, which holds in all fields F (think of the reals),
as well as the distributivity axioms, we have 0v = (0 + 0)v = 0v + 0v. Now we can
subtract 0v from both sides:
0v = 0v + 0v
0v − 0v = (0v + 0v) − 0v
0v − 0v = 0v + (0v − 0v) (additive associativity)
0 = 0v + 0 (additive inverse)
0 = 0v. (property of 0).
Note: This is, obviously, an extreme level of detail to go into. In an exam situation,
you would be told explicitly when you are expected to go back to the axioms to prove
a result!
Example 3.5. Consider the set V = R. We will briefly check that the usual addition and multiplication of real numbers make R satisfy the eight axioms, and therefore R is a vector space over R (it is just Rn with n = 1, of course).
In fact, in this case, most of the axioms are self-evident: written out in symbols, they become familiar properties of real numbers which we already use without thinking about them.
Subspaces
Vector spaces are important objects already, but their real power comes from study-
ing their subspaces. These are subsets where using addition and scalar multiplication
never takes you outside the subset. We say that the subset is closed under addition
and scalar multiplication. Formally:
Definition 3.6. Let V be a vector space over F . A subspace of V is a subset U ⊆ V such that
– 0 ∈ U,
– if v, w ∈ U then v + w ∈ U , and
– if v ∈ U and λ ∈ F then λv ∈ U .
Another way of phrasing this is ‘U is closed under taking linear combinations.’ One
reason subspaces are important is that they are also vector spaces:
Proposition 3.7. Let V be a vector space over F , and let U be a subspace. Then
U is also a vector space over F , with the same addition and scalar multiplication.
Proof. We need to check that each of the eight axioms holds for U . Mostly, these
are automatically true because they are true for V . For instance, consider the two
distributivity axioms. These say that the equations
λ(v + u) = λv + λu
(λ + µ)v = λv + µv
Recall that, by Proposition 3.3, −v is the unique vector satisfying
v + (−v) = 0.
So if we show that v + (−1)v = 0, then the uniqueness will tell us that (−1)v = −v. Now:
v + (−1)v = 1v + (−1)v = (1 + (−1))v = 0v = 0,
using the axiom 1v = v, distributivity, and Corollary 3.4.
Here are some examples and non-examples of subspaces.
1. Let V = R3 , the set of triples (x, y, z) of real numbers. Let U be the subset
U = {(x, x, 0) : x ∈ R}.
which is also a vector in U . Since the scalars λi were arbitrary, this shows
that U is closed under both addition and scalar multiplication.
2. Next, let V = C3 and consider the subset
U = {(x, x + 1, 0) : x ∈ C}.
We claim that this is not a subspace (equivalently, it is not a vector space). For
example, we can see that 0 ≠ (x, x + 1, 0) for any x ∈ C, so U doesn’t contain 0.
Also, U is not closed under addition, because
(x, x + 1, 0) + (y, y + 1, 0) = (x + y, x + y + 2, 0)
and the right-hand side here does not have the form (z, z + 1, 0) for any z ∈ C.
A third reason that U is not a subspace is that it is not closed under scalar
multiplication, because
λ(x, x + 1, 0) = (λx, λx + λ, 0)
and if we pick any complex number λ ≠ 1 then the right-hand side does not have
the form (y, y + 1, 0), so is not an element of U .
3. Now let V = R3 again and consider the subset
U = {(x, y, z) : x, y, z ∈ R, x ≥ 0}.
This contains 0, and is closed under addition (Exercise: check these for yourself!)
but U is not a subspace, because it is not closed under scalar multiplication; e.g.
(−1)(x, y, z) = (−x, −y, −z), which is not in U if we pick x > 0.
4. Finally, let V = R3 and consider the subset of integer points
U = {(m, n, p) : m, n, p ∈ Z}.
This contains 0 and is closed under addition, but it is not closed under scalar multiplication: for example, (1/2) · (1, 0, 0) = (1/2, 0, 0) is not in U . So U is not a subspace.
Another important family of examples: let V = F n , let A be an m × n matrix over F , and consider the matrix equation
Ax = 0, (∗)
where 0 is the all-zero column vector of length m. We claim that the subset U ⊆ V of all vectors x satisfying (∗) is a subspace of V . This is an Exercise (Sheet 2).
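As a numerical illustration of why the solution set of Ax = 0 is closed under addition and scalar multiplication, here is a sketch assuming NumPy; the matrix and solutions are illustrative values.

    import numpy as np

    A = np.array([[1., 1., 1.],
                  [0., 1., 2.]])
    x = np.array([1., -2., 1.])   # A @ x = 0
    y = np.array([2., -4., 2.])   # another solution (here a multiple of x)
    print(np.allclose(A @ (x + y), 0))      # True: sums of solutions are solutions
    print(np.allclose(A @ (3.0 * x), 0))    # True: scalar multiples are solutions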
Recall: If V is any set, and if U and W are subsets, then the intersection of U and
W is the set
U ∩ W = {x ∈ V : x ∈ U and x ∈ W }.
The sum of U and W is the set U + W = {u + w : u ∈ U, w ∈ W }. If U and W are subspaces of V then so is U + W . For example, it contains the zero vector:
$$0 = \underbrace{0}_{\in U} + \underbrace{0}_{\in W} \in U + W.$$
If u1 + w1 ∈ U + W and u2 + w2 ∈ U + W , then their sum (u1 + u2 ) + (w1 + w2 ) again lies in U + W , and similarly for scalar multiples.
Suppose now that U ⊆ W . Then every u + w with u ∈ U and w ∈ W lies in W , and conversely every w′ ∈ W can be written as 0 + w′ with 0 ∈ U ; hence
U + W = {u + w : u ∈ U, w ∈ W } = {0 + w′ : w′ ∈ W } = W.
Similarly if U = W then U = W = U ∩ W = U + W .
For a concrete example, take V = R2 and let U and W be lines through the
origin. Then we have
U = {λv1 : λ ∈ R},
W = {λv2 : λ ∈ R}
If the two lines are the same, i.e. v2 = λv1 for some non-zero λ ∈ R, then
U + W = {u + w : u ∈ U, w ∈ W } = {µ1 v1 + µ2 (λv1 ) : µ1 , µ2 ∈ R} = {γ v1 : γ ∈ R} = U = W = U ∩ W.
On the other hand, if U and W are not the same line, then the only multiples
of v1 and v2 which are equal are the zero multiples, i.e. 0. Hence U ∩ W = {0}
in this case. Moreover, U + W will be equal to all of R2 . This can easily be
seen if we draw a picture (and we’ll see a more rigorous way to show this later).
Consider the linear equation x+y +z = 0. Some easy solutions to this are (x, y, z) =
(1, −1, 0), (0, 1, −1) and (−1, 0, 1). But these three solutions are redundant: For
example, we can write
(−1, 0, 1) = (−1) · (1, −1, 0) + (−1) · (0, 1, −1),
and because the equation is linear, we know that any linear combination of solutions will also be a solution. So the solution (−1, 0, 1) can be recovered from the other two using sums and scalar multiples. We say that the set {(1, −1, 0), (0, 1, −1), (−1, 0, 1)} is linearly dependent. More rigorously:
Definition 5.1. A set {v1 , . . . , vn } of vectors in a vector space V over F is linearly dependent if there are scalars a1 , . . . , an ∈ F , not all zero, such that
a1 v1 + a2 v2 + · · · + an vn = 0. (∗)
We can also turn this definition around: The set {v1 , . . . , vn } is linearly independent
precisely when the only relation (∗) it satisfies is the trivial relation with a1 = a2 =
· · · = an = 0.
Remark: A set of vectors {v1 , . . . , vn } is linearly dependent if and only if some vi can be written as a linear combination of the others: If {v1 , . . . , vn } satisfy the linear equation (∗) above, and if ai ≠ 0, then we can write
$$v_i = -\frac{1}{a_i}\sum_{j \neq i} a_j v_j.$$
Conversely, if we can write $v_i = \sum_{j \neq i} b_j v_j$ for some scalars bj then, noticing that vi has a non-zero coefficient (‘1’), we can move everything to the left-hand side to get a non-trivial linear relation of the form (∗).
Examples 5.2.
From the previous example, we know that two such vectors form a linearly dependent set
if and only if the vectors are parallel. In particular, two such vectors are linearly
independent if and only if they are not parallel (i.e. they are not contained in the
same line through the origin).
3. The set C of complex numbers can be thought of as a vector space over C, or as
a vector space over R (thinking of the complex number x + iy as a pair (x, y) of
real numbers).
As a vector space over C, all pairs of complex numbers are scalar multiples
of each other, so every set {z1 , z2 } of distinct complex numbers is linearly
dependent.
On the other hand, as a vector space over R, two complex numbers
z1 = x1 + iy1 , z2 = x2 + iy2
are real scalar multiples of one another (i.e. {z1 , z2 } is linearly dependent over R) if and only if there are real numbers λ1 and λ2 , not both zero, such that
λ1 (x1 + iy1 ) + λ2 (x2 + iy2 ) = 0.
Identifying real and imaginary parts, this means that we need
λ 1 x1 + λ 2 x2 = 0 and λ1 y1 + λ2 y2 = 0.
So for instance, the set {1, i} is linearly independent, since 1 and i are not
(real) scalar multiples of one another.
The following proposition shows that linear dependence is preserved when making
a set larger:
Proposition 5.3. Let V be a vector space, and suppose {v1 , . . . , vn } is linearly
dependent. Then for every w ∈ V , the set {v1 , . . . , vn , w} is also linearly dependent.
In conclusion: We can have a set of two linearly independent vectors in R2 , but not
a set of three. Moreover, we’ve seen that a set of two geometric vectors is linearly
independent if and only if the vectors define a plane. This discussion extends to R3
and above:
The empty set ∅ and the zero vector 0 ∈ R3 define the zero subspace, contain-
ing only the zero vector 0.
Two vectors v, u define a line if they are linearly dependent (and not both 0),
otherwise they define the plane
{av + bu : a, b ∈ R}.
Three vectors in R3 are linearly dependent precisely when they define a line
or a plane. If they define a plane, then one of them lies in the plane defined
by the other two (so it is a linear combination of them). If they define a line,
then all three of them are scalar multiples of each other.
Definition 5.6. Let V be a vector space over F and let X be a subset of V . The span of X is the subset of V consisting of all linear combinations of elements of X:
⟨X⟩ = {λ1 x1 + λ2 x2 + · · · + λn xn : n ≥ 0, x1 , . . . , xn ∈ X, λ1 , . . . , λn ∈ F }.
In some places this is written ⟨X⟩F , to emphasise that the scalars λi come from F . Other notation commonly found in books is Span(X) or SpanF (X).
If X = {v1 , v2 , . . . , vn } is finite, we sometimes write ⟨v1 , . . . , vn ⟩ instead of ⟨{v1 , v2 , . . . , vn }⟩.
If ⟨X⟩ = V then we say that X is a spanning set in V , or X spans V .
We get 0 ∈ ⟨X⟩ by taking the zero linear combination, i.e. λi = 0 for all i. To see that ⟨X⟩ is closed under taking linear combinations, recall from Exercise Sheet 1
Linearly independent sets are those which contain no redundant vectors (they are
not ‘too big’). On the other hand, spanning sets are those from which you can obtain
every vector in the space (they are not ‘too small’). Sets which satisfy both of these
properties are therefore extremely useful:
Definition 5.9. A basis of a vector space V is a subset of V which is both
linearly independent and spans V .
(a1 , a2 , . . . , an ) = a1 e1 + a2 e2 + · · · + an en
for any choice of vector (a1 , . . . , an ) ∈ V . The set is also linearly independent, because the i-th coordinate of $\sum_{i=1}^{n} a_i e_i$ is ai , so if this sum is the zero vector (0, 0, . . . , 0) then we have ai = 0 for all i. So {e1 , . . . , en } is indeed a basis of V .
In particular, Rn has dimension n as a vector space over R, and Cn has dimension
n as a vector space over C, as we might have hoped!
Remark 5.12. The one-point vector space {0} is special. Its only subsets are ∅
and {0}, so its only basis is the empty set ∅, and hence this is a 0-dimensional
vector space.
When we are given vectors in F n (i.e. lists of numbers), there are straightforward ‘algorithms’ which will tell us whether the set is linearly independent or spanning.
To see whether a finite set {v1 , . . . , vr } is linearly dependent, we want to find a linear relation between its vectors. Gaussian elimination lets us do this:
Write the vectors v1 , . . . , vr as the rows of a matrix A.
Use Gaussian elimination to put this matrix in row echelon form; let B be the matrix we obtain. In the end, each row of B is a non-zero linear combination of the rows of A (which are the vectors v1 , . . . , vr ).
So if B has a row of zeroes, this tells us that we have a non-trivial linear relation $\sum_{i=1}^{r} a_i v_i = 0$.
Conversely, if the rows of A are linearly dependent, then B will always have a row of
zeroes at the bottom (Gaussian elimination produces as many such rows as possible).
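A computational version of this test, sketched with NumPy below; the three vectors are hypothetical illustrative values, not the ones from the next example.

    import numpy as np

    # The set is linearly independent exactly when the rank of the matrix with the
    # vectors as rows equals the number of vectors (no zero rows after row reduction).
    vectors = np.array([[1., 2., 3.],
                        [1., 1., 2.],
                        [1., 4., 5.]])

    rank = np.linalg.matrix_rank(vectors)
    print(rank < len(vectors))           # True here: the set is linearly dependent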
Example 5.13. Let V = R3 or C3 and consider a set X = {v1 , v2 , v3 } of three vectors in V . Writing the vectors as the rows of a matrix and row reducing (keeping track of the operations used), the final row becomes zero.
And this final row shows us that −3v1 + 2v2 + v3 = 0. So X is linearly dependent.
To calculate Span(X), we now know that we can write v3 = 3v1 − 2v2 . Therefore,
if a vector is a linear combination of v1 , v2 and v3 , we can substitute for v3 to write
the vector as a linear combination of v1 and v2 . Therefore Span(X) = Span(v1 , v2 ).
Finally, note that the set {v1 , v2 } is linearly independent. We can see this in two
ways: We can spot that the two vectors are not scalar multiples of one another (which
happens to work in this special case), or (a method which works more generally):
we can write these vectors as the rows of a matrix, and use Gaussian elimination. In this case, the matrix reduces to
$$\begin{pmatrix} 1 & 2 & 3 \\ 0 & -1 & -1 \end{pmatrix},$$
which has no rows of zeroes, which tells us the set is linearly independent.
Now, as {v1 , v2 } is linearly independent, it is a basis of the subspace Span({v1 , v2 }).
So this span is two-dimensional (geometrically, it is a plane in R3 ).
More generally, for a subset X of a vector space V , the following are equivalent:
X is a minimal spanning set in V (i.e. all proper subsets of X are not spanning).
X is a maximal linearly independent set in V (i.e. no subset of V properly containing X is linearly independent).
X is a basis of V .
The proof of this result is an exercise on Exercise Sheet 3. This result is useful
because it tells us that we can always find a basis by shrinking a spanning set, or
adding to a linearly independent set. We will make use of this next week, when
proving that the dimension (size of a basis) is a uniquely-determined number.
The terms ‘spanning’, ‘linearly independent’ and ‘basis’ all have interpretations in
terms of solutions to linear systems:
Proposition 5.15. Let V be a vector space and let X ⊆ V . Then
(i) X spans V if and only if, for each w ∈ V , there is a finite subset {v1 , . . . , vr } ⊆ X for which the equation
$$\sum_{i=1}^{r} \lambda_i v_i = w \qquad (†)$$
has at least one solution (λ1 , . . . , λr ).
(ii) X is linearly independent if and only if every equation of the form (†) has
at most one solution, for each w ∈ V .
(iii) X is a basis of V if and only if every equation (†) has a unique solution, for
each w ∈ V .
Proof (sketch of (ii)). Suppose first that X = {v1 , . . . , vn } is linearly independent and that
λ1 v1 + · · · + λn vn = w = µ1 v1 + · · · + µn vn
for some scalars λi and µi . This equation rearranges to $\sum_{i=1}^{n} (\lambda_i - \mu_i) v_i = 0$. This is a linear relation, and we are assuming that X is linearly independent, and therefore all the coefficients must be zero. In other words, λi = µi for all i, so (†) has at most one solution.
Conversely, if X is linearly dependent, so that $\sum_i \alpha_i v_i = 0$ where some scalar αi is non-zero, then we see that the equation
$$\sum_{i=1}^{r} \lambda_i v_i = 0$$
has at least two solutions (the trivial one, with all λi = 0, and the non-trivial one given by the αi ), so the equation (†) with w = 0 does not have at most one solution.
Corollary 6.1. (1) Every linearly independent set can be extended to a basis (just
add in vectors not in the span, one at a time, until you get a spanning set. This
will not break the property ‘linearly independent’).
(2) Every spanning set can be reduced to a basis (throw away elements until you
cannot throw away any more without breaking the property ‘spanning’).
We will now prove the following very important result, which tells us that the di-
mension is a uniquely-defined property of a vector space.
Theorem 6.2. Let V be a vector space.
1. If S = {v1 , . . . , vn } is a finite spanning set of V and I = {w1 , . . . , wm } is a linearly independent set in V , then m ≤ n (that is, |I| ≤ |S|).
2. Any two (finite) bases of V have the same size.
Proof. 1 The idea is to show that we can replace the vectors in S with vectors in I,
one at a time, without breaking the ‘spanning’ property. By doing this, we will end
up showing that we can fit the whole of I into S, so that |I| ≤ |S|.
To get started, note that because S is spanning, every vector in I can be written as
a linear combination of elements in S. In particular, there exist scalars λ1 , . . . , λn
such that
$$w_1 = \sum_{i=1}^{n} \lambda_i v_i.$$
Now, because I is linearly independent, the right-hand side of this equation cannot be 0, which means that λi ≠ 0 for at least one i. Without loss of generality, we can re-order the vectors v1 , . . . , vn so that λ1 ≠ 0 in the above equation. This gives us
$$w_1 = \lambda_1 v_1 + \cdots + \lambda_n v_n \quad\Rightarrow\quad v_1 = \frac{1}{\lambda_1} w_1 - \frac{1}{\lambda_1}\left(\lambda_2 v_2 + \cdots + \lambda_n v_n\right).$$
This shows that v1 is in the span ⟨w1 , v2 , v3 , . . . , vn ⟩. This means that
$$\{v_1, v_2, \ldots, v_n\} \subseteq \langle w_1, v_2, \ldots, v_n\rangle \quad\Rightarrow\quad \langle v_1, v_2, \ldots, v_n\rangle \subseteq \langle w_1, v_2, \ldots, v_n\rangle.$$
Since S is spanning, the left-hand side is equal to V , and therefore the right-hand
side is also equal to V . This shows that the set
S1 := {w1 , v2 , . . . , vn }
is also spanning.
Similarly, since S1 is spanning, we can write
w2 = µ1 w1 + α2 v2 + α3 v3 + · · · + αn vn
for some scalars µ1 , α2 , . . ., αn . Again, using the fact that I is linearly independent,
at least one of the αi must be non-zero, otherwise we would get the non-trivial
relation w2 = µ1 w1 between elements in I. So again, we can rearrange the elements
{v2 , . . . , vn } to assume that α2 ≠ 0, and we get
$$w_2 = \mu_1 w_1 + \alpha_2 v_2 + \alpha_3 v_3 + \cdots + \alpha_n v_n \quad\Rightarrow\quad v_2 = \frac{1}{\alpha_2}(w_2 - \mu_1 w_1) - \frac{1}{\alpha_2}(\alpha_3 v_3 + \cdots + \alpha_n v_n).$$
This shows that v2 ∈ ⟨w1 , w2 , v3 , v4 , . . . , vn ⟩, and therefore
$$\{w_1, v_2, \ldots, v_n\} \subseteq \langle w_1, w_2, v_3, v_4, \ldots, v_n\rangle \quad\Rightarrow\quad \langle w_1, v_2, \ldots, v_n\rangle \subseteq \langle w_1, w_2, v_3, \ldots, v_n\rangle,$$
and since we showed that S1 is spanning, the left-hand side is V hence so is the
right-hand side. Therefore the set
S2 := {w1 , w2 , v3 , . . . , vn }
is also spanning.
We now proceed in this way, inductively: We keep replacing an element of S with
an element from I, until we run out of elements in S or elements in I. We end up
with a spanning set
Sp = {w1 , . . . , wp , vp+1 , vp+2 , . . . , vn }
where p = min{n, m}. Recall that we’re trying to show n ≥ m.
Now note that if n < m then Sp = Sn = {w1 , . . . , wn } spans V , and moreover there
exists another vector wn+1 ∈ I. This means that wn+1 can be written as a linear
combination of the other elements in I; but this is impossible because I is linearly
independent. This is a contradiction, and so we must have n ≥ m, which is what
we wanted to show.
2 Let B1 and B2 be bases of V . By Part 1, since B1 is spanning and B2 is linearly independent, we have |B1 | ≥ |B2 |. But for the same reason, since B2 is spanning and B1 is linearly independent, we have |B2 | ≥ |B1 |. So B1 and B2 have the same size.
Corollary 6.3. If U ⊆ V are vector spaces, then dim U ≤ dim V .
But each of the expressions on the right-hand side is in ⟨X⟩, so u + w ∈ ⟨X⟩, too.
Secondly, we must show that X is linearly independent. We know that XU ∩W , XU
and XW are linearly independent since they are bases of their subspaces. So suppose
we have an equation
$$\sum_{i=1}^{r} \lambda_i v_i + \sum_{j=1}^{s} \mu_j u_j + \sum_{k=1}^{t} \sigma_k w_k = 0.$$
Rearranging, we get
$$\sum_{i=1}^{r} \lambda_i v_i + \sum_{j=1}^{s} \mu_j u_j = -\sum_{k=1}^{t} \sigma_k w_k.$$
The left-hand side here is in U , and the right-hand side is in W . So both sides lie in U ∩ W .
In particular, this says that $-\sum_{k=1}^{t} \sigma_k w_k$ is in the span of XU ∩W = {v1 , . . . , vr }; this implies that σk = 0 for all k, since the set XW = {v1 , . . . , vr , w1 , . . . , wt } is linearly independent.
An identical argument shows that µj = 0 for all j; then the above equation becomes $\sum_{i=1}^{r} \lambda_i v_i = 0$, and so λi = 0 for all i since XU ∩W is linearly independent.
Examples 6.5.
1. Consider two lines U1 and U2 in R2 . These each have dimension 1, and so we have two possibilities: since dim(U1 ∩ U2 ) + dim(U1 + U2 ) = 2 and dim(U1 ∩ U2 ) ≤ dim U1 = 1, the intersection is a point (dimension 0) precisely when the sum is all of R2 (dimension 2), and the intersection is the whole line (dimension 1) precisely when the two lines are equal, in which case the sum is that same line.
4. Things get more interesting when we consider two planes U1 and U2 in R4 . Now, since
dim(U1 ∩ U2 ) + dim(U1 + U2 ) = dim U1 + dim U2 = 4,
we have three possibilities:
        dim(U1 ∩ U2 )   dim(U1 + U2 )
(a)          0               4
(b)          1               3
(c)          2               2
Cases (b) and (c) are familiar: If two planes intersect in a line, then their sum
is a 3-dimensional subspace of R4 (note: just as there are lots of lines in R2 and
lots of planes in R3 , there are lots of 3-D subspaces of R4 ).
But case (a) is new: In R4 (and in higher-dimensional spaces Rn with n > 4),
it is possible for two planes to meet in a single point (the origin). In this case,
the sum of the two planes is R4 (or a 4-dimensional subspace of Rn , if n > 4).
To make this very concrete, let U1 and U2 be the following subspaces of R4 :
U1 = {(x, y, 0, 0) : x, y ∈ R},
U2 = {(0, 0, z, u) : z, u ∈ R}.
Then clearly U1 ∩ U2 = {(0, 0, 0, 0)}, and we can trivially spot how any vector in
R4 can be written as a sum of a vector in U1 and a vector in U2 .
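Here is a sketch, assuming NumPy, which checks this concrete example using the dimension formula dim(U1 ∩ U2 ) = dim U1 + dim U2 − dim(U1 + U2 ).

    import numpy as np

    U1 = np.array([[1., 0., 0., 0.],
                   [0., 1., 0., 0.]])   # spans {(x, y, 0, 0)}
    U2 = np.array([[0., 0., 1., 0.],
                   [0., 0., 0., 1.]])   # spans {(0, 0, z, u)}

    dim_sum = np.linalg.matrix_rank(np.vstack([U1, U2]))    # dim(U1 + U2)
    dim_U1 = np.linalg.matrix_rank(U1)
    dim_U2 = np.linalg.matrix_rank(U2)
    dim_intersection = dim_U1 + dim_U2 - dim_sum            # from the dimension formula
    print(dim_sum, dim_intersection)                        # 4 and 0, as claimed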
An m × n matrix A = (aij ) over F has m rows, which we can think of as row vectors (ai1 , ai2 , . . . , ain ) ∈ F n , and n columns, which we can think of as column vectors (a1j , a2j , . . . , amj )T ∈ F m .
The row space of A is the span of its rows (a subspace of F n ), and the row rank of A is the dimension of the row space. Similarly, the column space of A is the span of its columns (a subspace of F m ), and the column rank is its dimension. The determinantal rank of A is the largest k such that A has a k × k sub-matrix with non-zero determinant.
Examples 7.3.
1. If A is the all-zero m × n matrix, then the row space of A is spanned by the zero vector 0 = (0, 0, . . . , 0) ∈ Rn , hence is {0}, and similarly the column space of A is {0} ⊆ Rm . All of the square sub-matrices of size ≥ 1 have determinant zero, and so the determinantal rank of A is zero.
2. Let A = $\begin{pmatrix} 0 & 1 & 2 \\ 0 & 1 & 3 \end{pmatrix}$. The row space is the span of the set {(0, 1, 2), (0, 1, 3)}. These two vectors are not multiples of each other, so the set is linearly independent, hence it spans a 2-dimensional subspace of R3 , and the row rank of A is 2.
The column space of A is the span of the set $\left\{\binom{0}{0}, \binom{1}{1}, \binom{2}{3}\right\}$. This set contains the zero vector, so it is linearly dependent, but the two non-zero vectors are linearly independent, so the column space is 2-dimensional and the column rank of A is also 2.
Since the columns are all multiples of each other, we can throw two of them away without changing what the span is. So the column space is the span of $\binom{1}{1}$, so the column rank is 1.
Theorem 7.4. Let A be an m × n matrix. Then:
(1) The row rank, column rank and determinantal rank of A are all equal.
(2) If A is square, then the numbers in (1) are equal to the number of non-zero eigenvalues of A, counted with multiplicity.
We will prove the first part of this theorem. The second part requires properties of
linear maps which we have not yet covered.
Definition 7.5. The rank of an m × n matrix A is defined to be any of the
above numbers (which are all the same). This is denoted by rank(A).
This part of Theorem 7.4 can be boiled down to two statements. Firstly, Propo-
sition 7.9 shows that elementary row operations don’t change the row space of a
matrix. Secondly, Proposition 7.10 shows that row operations don’t change the col-
umn rank (although the column space itself might be shifted around). This means
that we can assume that our matrix is in reduced row echelon form, where it is easier
to study the row space and column space.
First, we need another way of thinking about elementary row operations:
1. If we swap rows i and j of A, we get the matrix Sij A, where Sij is the matrix
formed by swapping rows i and j of the m × m identity matrix Im .
2. If we multiply the i-th row of A by a non-zero scalar λ, we get Di (λ)A, where Di (λ) is the matrix formed from Im by replacing the i-th diagonal entry 1 with λ.
3. If we add the j-th row of A to the i-th row, we get Eij A, where Eij is the matrix formed from Im by adding 1 to the (i, j) position.
4. All of the matrices Sij , Di (λ) and Eij have non-zero determinant, so they are invertible.
In particular, any product B of elementary matrices is invertible: B is the product of various matrices Sij , Di (λ) and Eij from the above proposition, and since det(XY ) = det(X) det(Y ) for all square matrices X and Y , we get det(B) ≠ 0, so B is also invertible.
Proposition 7.9. If A is an m × n matrix and B is an invertible m × m matrix, then the row space of BA is equal to the row space of A (so, in particular, row rank(BA) = row rank(A)).
Proof. Each row of BA is a linear combination of the rows of A, so
row space of BA ⊆ row space of A.
On the other hand, since B is invertible we can apply the same reasoning to the matrices B −1 and BA instead of B and A, to get
row space of B −1 (BA) ⊆ row space of BA.
And since the left-hand side is just the row space of A, the spaces are equal, as claimed.
Proposition 7.10. If A is an m × n matrix and B is an invertible m × m matrix, then the column rank of BA is equal to the column rank of A.
Proof. Let {v1 , . . . , vn } be the column vectors of A. Then the column vectors of
BA are {Bv1 , . . . , Bvn }. So we want to show that the spans of these two sets have
the same dimension.
Now suppose that we have a non-trivial linear relation $\sum_{i=1}^{n} \lambda_i (Bv_i) = 0$. This rearranges to $B\left(\sum_{i=1}^{n} \lambda_i v_i\right) = 0$. Since B is invertible, the equation Bx = 0 has only the solution x = 0, and so we have $\sum_{i=1}^{n} \lambda_i v_i = 0$.
Thus, every non-trivial linear relation in {Bv1 , . . . , Bvn } gives a non-trivial linear relation in {v1 , . . . , vn }. So if we reduce {Bv1 , . . . , Bvn } to a basis of the column space of BA by throwing away some vectors Bvi , we can also throw away the corresponding vectors vi from {v1 , . . . , vn }, and we still get a spanning set for the column space of A. We don’t know that this new set is a basis for the column space of A, but it does contain a basis, and therefore
col rank(A) ≤ col rank(BA).
And as in the previous proposition, we get the reverse inequality by considering the matrices B −1 and BA instead of B and A.
Corollary 7.11. The row rank and the column rank of a matrix are equal.
Proof. By the above propositions, we can assume that A is in reduced row echelon form: each non-zero row starts with a leading 1, each leading 1 lies strictly to the right of the one in the row above, all other entries in a column containing a leading 1 are zero, and any all-zero rows appear at the bottom.
In this case, the row rank is clearly equal to the number of non-zero rows. But also,
reading left-to-right, the column rank only ever increases when we encounter a new
leading ‘1’. The number of these is equal to the number of non-zero rows. So these
are the same.
It is clear that the row rank of an m × n matrix A cannot be bigger than m, the number of rows, and the column rank of A cannot be bigger than n. Since the row and column ranks are the same, we have
rank(A) ≤ min{m, n}.
Now suppose A is a square n × n matrix. The following conditions are equivalent:
(a) A has full rank (i.e. row rank(A) = col rank(A) = n);
(b) A is invertible (equivalently, det A ≠ 0);
(c) 0 is not an eigenvalue of A.
The equivalence of (a) and (b) is left as an exercise on this week’s Exercise Sheet.
Here, we prove that (a) and (c) are equivalent. Firstly, notice that ‘rank n’ means that the n columns of A are linearly independent, or in other words, there are no non-trivial linear relations between the columns. Now observe that a linear relation $\sum_{i=1}^{n} a_i v_i = 0$ between the column vectors of A is the same thing as saying that Aa = 0, where a = (a1 , a2 , . . . , an )T . So a non-trivial such linear relation exists if and only if A has an eigenvector with eigenvalue 0. This shows that (c) is equivalent to (a).
Example 7.14. The matrix $\begin{pmatrix} 1 & 10 & 11 \\ 2 & 4 & 50 \\ -27 & 4 & 3 \end{pmatrix}$ has determinant −12472 ≠ 0. So its three rows are linearly independent, and its three columns are linearly independent. So these are both bases of the space F 3 (where F is whichever field we are using for the entries of the matrix).
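A quick numerical check of this example, assuming NumPy.

    import numpy as np

    A = np.array([[  1., 10., 11.],
                  [  2.,  4., 50.],
                  [-27.,  4.,  3.]])
    print(np.linalg.det(A))          # approximately -12472, which is non-zero
    print(np.linalg.matrix_rank(A))  # 3: full rank, so rows and columns are bases of R^3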
between them). But this means there are no non-trivial linear relations between the corresponding k rows (or columns) of A. This shows that k ≤ rank(A), i.e. the determinantal rank of A is at most rank(A).
Coordinates
Bases are extremely useful objects in linear algebra. One reason for this is that they
let us talk about vectors using coordinates. For instance, your location on a map
can be specified by two numbers precisely because R2 has a basis of size two (in the
map example, this basis would probably be the two vectors ‘1 unit north, 1 unit
east’). This extends to higher dimensions, for instance, you can arrange a meeting
by specifying three coordinates x, y, z and a time t, and this makes sense because
R4 has a basis of size 4.
The space Rn is n-dimensional, because the standard basis {e1 , . . . , en } has size n.
In particular, R3 is 3-dimensional. Its subspaces each have dimension 0, 1, 2 or 3.
The only 0-dimensional subspace is {0}, and the only three dimensional subspace
of R3 is itself (the whole space). We saw last time that the 1-dimensional subspaces
are the lines (each the span of a non-zero vector), and the 2-dimensional subspaces
are the planes (which are the spans of pairs of linearly independent vectors).
Example 8.1. Let V = R3 , and let U be the set of (x, y, z) satisfying x + y + z = 0.
Then U is a subspace (check this!), and we will show that U is a plane. The equation
x + y + z = 0 lets us eliminate one variable, say z = −x − y. This gives us a method
of finding a basis for U . We take an arbitrary vector (x, y, z) = (x, y, −x − y) ∈ U .
This can be written
(x, y, −x − y) = x(1, 0, −1) + y(0, 1, −1),
and since x and y were arbitrary, this shows that every vector in U can be written as a linear combination of (1, 0, −1) and (0, 1, −1). In fact, this expression is unique: If
If
x(1, 0, −1) + y(0, 1, −1) = x′ (1, 0, −1) + y′ (0, 1, −1)
then x = x′ and y = y′. By Proposition 5.15, this shows that {(1, 0, −1), (0, 1, −1)}
is a basis of U , so U is 2-dimensional.
A vital property of bases is that they let us describe vectors using coordinates. By
part (iii) of Proposition 5.15, if we have a basis X = {v1 , . . . , vn } of a vector space V , then every vector v ∈ V can be written uniquely as $v = \sum_{i=1}^{n} \lambda_i v_i$ for some
scalars λi . So we can represent this vector with the list (λ1 , . . . , λn ). We call these
the coordinates of v with respect to the basis X.
This is actually something we’ve already used, without thinking about it. When we
write a vector in Rn or Cn as (a1 , a2 , . . . , an ), this can be thought of as shorthand
for the expression
a1 e1 + a2 e2 + · · · + an en
where the ei are the standard basis vectors. But sometimes we want to use other
bases.
Example 8.2. Suppose that we have two maps with different orientations, and we
want to transform coordinates from one map to the other (i.e. if we know a location
on one map, we want to be able to find it on the other map easily). To make this
more concrete, suppose that on the first map, locations are described in terms of ‘1
unit north’ and ‘1 unit east’, and on the second map the locations are described in
terms of ‘1 unit north-east’ and ‘1 unit north-west.’ To make notation easier, we
give these vectors names: let v1 = ‘1 unit north’ and v2 = ‘1 unit east’ for the first map, and let w1 = ‘1 unit north-east’ and w2 = ‘1 unit north-west’ for the second, so that
w1 = (1/√2)(v1 + v2 ), w2 = (1/√2)(v1 − v2 ).
If X = {v1 , . . . , vn } is a basis of a vector space V and v = a1 v1 + · · · + an vn , we write
[v]X = (a1 , a2 , . . . , an )
for the coordinates of v with respect to X. Thus in our example above, for the location v = 3v1 + 2v2 , we have [v]{v1 ,v2 } = (3, 2), and [v]{w1 ,w2 } = (5/√2, 1/√2).
We can also transform back the other way: If we are given the coordinates of a
vector with respect to {w1 , w2 }, then we can substitute in the expressions for w1
and w2 to get the coordinates with respect to {v1 , v2 }. To continue our example
with v = (5/√2)w1 + (1/√2)w2 :
$$v = \frac{5}{\sqrt 2} w_1 + \frac{1}{\sqrt 2} w_2 = \frac{5}{\sqrt 2}\Big(\frac{1}{\sqrt 2} v_1 + \frac{1}{\sqrt 2} v_2\Big) + \frac{1}{\sqrt 2}\Big(\frac{1}{\sqrt 2} v_1 - \frac{1}{\sqrt 2} v_2\Big) = \frac{6}{2} v_1 + \frac{4}{2} v_2 = 3 v_1 + 2 v_2,$$
as expected. More generally, if v = a v1 + b v2 , then
$$v = a v_1 + b v_2 = a\Big(\frac{1}{\sqrt 2} w_1 + \frac{1}{\sqrt 2} w_2\Big) + b\Big(\frac{1}{\sqrt 2} w_1 - \frac{1}{\sqrt 2} w_2\Big) = \frac{a+b}{\sqrt 2} w_1 + \frac{a-b}{\sqrt 2} w_2,$$
so [v]{w1 ,w2 } = ((a + b)/√2, (a − b)/√2). Similarly, if we are given v = c w1 + d w2 , i.e. [v]{w1 ,w2 } = (c, d), then we can go back:
$$v = c w_1 + d w_2 = c\Big(\frac{1}{\sqrt 2} v_1 + \frac{1}{\sqrt 2} v_2\Big) + d\Big(\frac{1}{\sqrt 2} v_1 - \frac{1}{\sqrt 2} v_2\Big) = \frac{c+d}{\sqrt 2} v_1 + \frac{c-d}{\sqrt 2} v_2,$$
so [v]{v1 ,v2 } = ((c + d)/√2, (c − d)/√2).
The process of changing coordinates from one basis to another is called change of
basis, and we can describe this using matrices. To continue our example, if we use
column vectors to denote coordinates, then the above equations can be expressed as
follows:
$$\begin{pmatrix} \frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 2} \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \frac{a+b}{\sqrt 2} \\ \frac{a-b}{\sqrt 2} \end{pmatrix},
\qquad
\begin{pmatrix} \frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 2} \end{pmatrix}\begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} \frac{c+d}{\sqrt 2} \\ \frac{c-d}{\sqrt 2} \end{pmatrix}.$$
In other words, multiplying by the 2×2 matrix shown takes coordinates with respect
to {v1 , v2 } and gives coordinates with respect to {w1 , w2 }. This matrix is called
the change of basis matrix from {v1 , v2 } to {w1 , w2 }.
Note: In this example, the same matrix also takes us back the other way. But
often, we will need two different matrices to go from {v1 , v2 } to {w1 , w2 } and from
{w1 , w2 } to {v1 , v2 }. We’ll have more examples of this next time.
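The change of basis matrix from the map example can be applied numerically; the following sketch assumes NumPy.

    import numpy as np

    # Change of basis matrix from {v1, v2} to {w1, w2} in the map example.
    M = (1 / np.sqrt(2)) * np.array([[1.,  1.],
                                     [1., -1.]])

    v_coords = np.array([3., 2.])                  # [v] with respect to {v1, v2}
    w_coords = M @ v_coords
    print(w_coords)                                # approximately [5/sqrt(2), 1/sqrt(2)]
    print(np.allclose(M @ w_coords, v_coords))     # True: here the same matrix converts back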
Properties:
If V is a vector space and B1 , B2 are bases, and if M12 is the change of basis
matrix from B1 to B2 , and M21 is the change of basis matrix from B2 to B1 ,
then M12 and M21 are inverse to one another (i.e. M12 M21 = M21 M12 = In ,
the identity matrix).
Let V be an n-dimensional vector space over F , and suppose we have two bases
B = {v1 , . . . , vn }, B′ = {w1 , . . . , wn }.
Given v ∈ V , we can write
v = c1 v1 + c2 v2 + · · · + cn vn (∗)
  = d1 w1 + d2 w2 + · · · + dn wn .
If we write this using the notation from the last lecture, we have
[v]B = (c1 , . . . , cn ) and [v]B′ = (d1 , . . . , dn ).
We now show how (c1 , . . . , cn ) and (d1 , . . . , dn ) are related via a change of basis matrix.
Note that, because B′ is a basis, for each j we can write
$$v_j = \sum_{i=1}^{n} a_{ij} w_i \qquad (†)$$
for some scalars aij ∈ F . Substituting (†) into (∗) gives
$$\begin{aligned} v &= c_1 v_1 + c_2 v_2 + \cdots + c_n v_n \\ &= c_1\left(\sum_{i=1}^{n} a_{i1} w_i\right) + c_2\left(\sum_{i=1}^{n} a_{i2} w_i\right) + \cdots + c_n\left(\sum_{i=1}^{n} a_{in} w_i\right) && \text{(substituting for } v_j\text{)} \\ &= \sum_{i=1}^{n}\left(\sum_{j=1}^{n} a_{ij} c_j\right) w_i. && \text{(rearranging terms)} \end{aligned}$$
But because the scalars d1 , . . . , dn are unique, this implies that $d_i = \sum_{j=1}^{n} a_{ij} c_j$ for all i. If we write (c1 , . . . , cn ) and (d1 , . . . , dn ) as column vectors, these n equations become the matrix equation
$$[v]_{B'} = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} = A[v]_B,$$
where A = (aij ) is the change of basis matrix from B to B′.
The entries aij in the change of basis matrix come from the equation (†) above. In particular, if the j-th vector from the first basis, vj , is equal to $\sum_{i=1}^{n} a_{ij} w_i$, then the j-th column of the change of basis matrix is $(a_{1j}\; a_{2j}\; \cdots\; a_{nj})^T$. In other words:
To get column j of the change of basis matrix from B1 to B2 , take the j-th vector
in B1 and find its coordinates with respect to B2 .
Let’s have some more examples.
Example 8.4. In this example, we’ll see that although we are thinking of a basis
as a set, when we take coordinates it actually matters which order we put the basis
elements in.
Let V = R2 and let B = {v1 , v2 } be any basis. If [v]B = (c1 , c2 ) then this says that v = c1 v1 + c2 v2 . So if we let B′ = {v2 , v1 } then clearly [v]B′ = (c2 , c1 ), which will usually not be equal to (c1 , c2 ).
Let’s calculate the change of basis matrix from B to B′. Following the recipe above: To get column 1 of the matrix, we have to take the first element of B, v1 , and get its coordinates with respect to B′ = {v2 , v1 }. Since v1 = 0v2 + 1v1 , we have [v1 ]B′ = (0, 1), and so the first column of the change of basis matrix from B to B′ is $\binom{0}{1}$. Similarly, since v2 = 1v2 + 0v1 , we have [v2 ]B′ = (1, 0). Hence the second column of the matrix is $\binom{1}{0}$, and so the change of basis matrix from B to B′ is
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
In this example, like in the example from last time, the change of basis matrix from B′ to B is actually the same matrix as from B to B′. Let’s have another example where this is not the case.
Example 8.5 (‘Stretching’). Here let V = R2 again, let B = {e1 , e2 } be the standard basis, and let B′ = {5e1 , 2e2 }. If [v]B = (c1 , c2 ) then this says that v = c1 e1 + c2 e2 . So we would expect that [v]B′ = (c1 /5, c2 /2): “If I need c1 of e1 to get to v, then I should need only 1/5 of 5e1 to get there,” and similarly for e2 .
Let’s prove this. The first element of B is e1 , which is equal to (1/5)(5e1 ) + 0(2e2 ). So the first column of the change of basis matrix is $\binom{1/5}{0}$. Similarly the second element of B is e2 = 0(5e1 ) + (1/2)(2e2 ). So the second column is $\binom{0}{1/2}$. So the change of basis matrix from B to B′ is
$$\begin{pmatrix} 1/5 & 0 \\ 0 & 1/2 \end{pmatrix}.$$
Hence if v ∈ V and [v]B = (c1 , c2 ), then
$$[v]_{B'} = \begin{pmatrix} 1/5 & 0 \\ 0 & 1/2 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} c_1/5 \\ c_2/2 \end{pmatrix},$$
as we expected.
9 Linear maps
Let’s have one more example of change of basis: Let V = R2 again and let B = {e1 , e2 } be the standard basis and B′ = {e1 + e2 , e1 − e2 } (this is also a basis). Again, to get the change of basis matrix from B to B′, we have to get the coordinates of the vectors in B with respect to B′. It is not hard to see that
$$e_1 = \tfrac{1}{2}(e_1 + e_2) + \tfrac{1}{2}(e_1 - e_2), \qquad e_2 = \tfrac{1}{2}(e_1 + e_2) - \tfrac{1}{2}(e_1 - e_2),$$
i.e. [e1 ]B′ = (1/2, 1/2), so the first column of the change of basis matrix is $\binom{1/2}{1/2}$, and also [e2 ]B′ = (1/2, −1/2), so the second column is $\binom{1/2}{-1/2}$. Thus the matrix is
$$\begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} \end{pmatrix}.$$
Exercise 9.1. Show that the change of basis matrix from B′ to B is $\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$.
The process of ‘changing basis’ can be described as a function, which takes a set of
coordinates (with respect to one basis, say B), and outputs another set of coordinates
(with respect to the new basis, say B 0 ). Let’s call this f : Rn → Rn (or more
generally, F n → F n , where F is our field). This function has several nice properties.
It can be described as a matrix multiplication: f ([v]B ) = [v]B 0 = A[v]B , where A is
the change of basis matrix. From this we can deduce the following.
Definition 9.2. Let V and W be vector spaces over a field F . A linear map from V to W is a function f : V → W such that
f (v + w) = f (v) + f (w) and f (λv) = λf (v)
for all v, w ∈ V and all λ ∈ F .
Some authors also call this a linear mapping, or a homomorphism of vector spaces. If V = W then we say f is a linear transformation V → V .
Remark 9.3. One definition of ‘abstract algebra’ is ‘the study of sets with structure’
– So ‘linear algebra’ would be the study of sets with the ‘structure’ of addition and
scalar multiplication. Generally in algebra, the way we compare two such sets is
with structure preserving maps, also often called homomorphisms – For instance, it
is well known that the sets Z and Q have the same size, so there is a bijection between
them. But for the purposes of algebra, they are quite different (e.g. every non-zero
element of Q has a multiplicative inverse, but 2 does not have a multiplicative
inverse in Z). We can express this difference with that fact that there is no bijective
structure-preserving map from Z to Q. You’ll see more of this in MA204.
Examples 9.4.
1. If V is any vector space over F and c ∈ F then f (v) = cv defines a linear map
V → V . Check: For all v, w ∈ V and all λ ∈ F , we have
f (v + w) = c(v + w) = cv + cw = f (v) + f (w) and f (λv) = c(λv) = λ(cv) = λf (v),
so both conditions in the definition hold.
Linear maps are the structure-preserving functions between vector spaces, and hence
they have lots of useful properties. For instance:
Proposition 9.5. Let f : V → W be a linear map between two vector spaces V
and W . Then f sends the zero vector of V to the zero vector of W .
Proof. The ‘additive identity’ axiom tells us that the zero vectors in V and W satisfy 0 + 0 = 0. Using this property, we get
f (0) = f (0 + 0) = f (0) + f (0),
and subtracting f (0) from both sides gives 0 = f (0), as claimed.
Turning this property around, we get a condition for showing that a map is not
linear.
Corollary 9.6. If f : V → W is a map such that f (0) ≠ 0, then f is not linear.
For instance, consider the map f : R → R, f (x) = x + a for some fixed a ∈ R. Since f (0) = 0 + a = a, the corollary tells us that if a ≠ 0 then f is not linear.
For example, if T : R2 → R2 is linear with T (1, 0) = (1, 1) and T (0, 1) = (1, −1), then linearity determines T completely:
T (a, b) = T (a(1, 0) + b(0, 1)) = aT (1, 0) + bT (0, 1) = a(1, 1) + b(1, −1) = (a + b, a − b).
Example 9.9. Rotations in R2 . Rotations about the origin O = (0, 0) are linear
maps R2 → R2 . Given this information, let Rθ denote an anticlockwise rotation by
θ about the origin. Let’s calculate Rθ (x, y).
Firstly, by drawing triangles with hypotenuse of length 1, it is easy to calculate that
Rθ (1, 0) = (cos θ, sin θ) and Rθ (0, 1) = (− sin θ, cos θ). Now, calculate Rθ (x, y) using the recipe above:
Rθ (x, y) = Rθ (x(1, 0) + y(0, 1)) = xRθ (1, 0) + yRθ (0, 1) = (x cos θ − y sin θ, x sin θ + y cos θ).
In this second example, observe that if we use column vectors, then we can write
this as a matrix multiplication:
$$R_\theta\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.$$
Notice: In this matrix, the first column gives the coordinates of Rθ (1, 0) with
respect to the basis {(1, 0), (0, 1)}, and the second column gives the coordinates of
Rθ (0, 1) with respect to the basis {(1, 0), (0, 1)}. This gives us a clue as to how we
can always express a linear transformation as a matrix.
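A small numerical sketch of this example, assuming NumPy; the angle and vector are illustrative values.

    import numpy as np

    theta = np.pi / 3
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    v = np.array([2., 1.])
    print(R @ v)                                          # R_theta applied to (2, 1)
    print(np.allclose(R[:, 0], R @ np.array([1., 0.])))   # True: column 1 is R_theta(1, 0)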
Let T : V → W be a linear map between finite-dimensional vector spaces. Then:
(i) ker T = {v ∈ V : T (v) = 0} is a subspace of V ;
(ii) Im T = {T (v) : v ∈ V } is a subspace of W ;
(iii) The rank-nullity theorem: The dimensions of V , Im T and ker T are related by the equation
dim V = dim(ker T ) + dim(Im T ).
The name ‘rank-nullity theorem’ arises because dim(Im T ) is called the rank of T ,
and dim(ker T ) is sometimes called the nullity of T , although not everyone uses this
term nowadays.
Note: dim W does not appear anywhere in this equation! The reason for this is
that, if W is contained in a bigger vector space W 0 , then T still gives a linear map
V → W 0 , but this does not change the sets Im T or ker T .
Proof of (i) and (ii). (i) We need to check the three subspace conditions for the
subset ker T :
(ii) For closure under addition: if w = T (v) and w′ = T (v′ ) lie in Im T , then
T (v + v′ ) = T (v) + T (v′ ) = w + w′ ,
so w + w′ ∈ Im T ; closure under scalar multiplication, and 0 ∈ Im T , are checked similarly.
1. Thinking of elements of R2 as column vectors, let T : R2 → R2 be multiplication by the matrix
$$\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.$$
What is ker T ? What is Im T ?
Notice that if v = (x, y)T ∈ R2 then T (v) = (x + 2y, 2x + y)T .
Kernel: If T (v) = 0 then x + 2y = 2x + y = 0. Rearranging, we get x = −2y and x = −y/2. The only solution to this is y = x = 0. Hence ker T = {0}, of dimension 0.
Image: We could calculate this directly, but since we know that dim ker T = 0,
the rank-nullity theorem tells us that dim Im T = dim V − dim ker T = 2 − 0 = 2.
The only 2-dimensional subspace of R2 is R2 itself, hence Im T = R2 .
2. Now let T : R2 → R2 be multiplication by the matrix $\begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}$.
Image: every T (x, y)T = (2x + 2y)(1, 1)T is a scalar multiple of (1, 1)T , and every such multiple occurs, so Im T is the span of (1, 1)T and dim Im T = 1.
Kernel: If T (x, y)T = (2x + 2y, 2x + 2y)T = (0, 0)T then x = −y, and conversely if x = −y then we can see that T (x, y)T = (0, 0)T . Therefore ker T is the set of vectors of the form (x, −x)T = x(1, −1)T . So {(1, −1)T } is a basis for ker T , and dim ker T = 1.
As a check, we can verify that the rank-nullity theorem holds:
$$\underbrace{\dim V}_{=2} = \underbrace{\dim \ker T}_{=1} + \underbrace{\dim \operatorname{Im} T}_{=1}.$$
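Numerically, dim(Im T ) is the rank of the matrix of T , so rank-nullity can be checked with a sketch like the following (assuming NumPy); the second matrix is the one reconstructed in the example above.

    import numpy as np

    for M in (np.array([[1., 2.], [2., 1.]]),
              np.array([[2., 2.], [2., 2.]])):
        rank = np.linalg.matrix_rank(M)       # dim(Im T)
        nullity = M.shape[1] - rank           # dim(ker T), by rank-nullity
        print(rank, nullity, rank + nullity)  # (2, 0, 2) and (1, 1, 2)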
and since {v1 , . . . , vm } is a basis of ker T , it is also a linearly independent set, hence
ai = 0 for all i.
Spanning: We need to show that if v ∈ V , then there exist scalars di (i = 1, . . . , m)
and ci (i = 1, . . . , n) such that
$$\left(\sum_{i=1}^{m} d_i v_i\right) + \left(\sum_{i=1}^{n} c_i u_i\right) = v.$$
To show this, consider T (v). This lies in Im T , and we know that {w1 , . . . , wn }
spans Im T since it is a basis. Therefore, there exist scalars c1 , . . . , cn such that
$$T(v) = \sum_{i=1}^{n} c_i w_i.$$
Since wi = T (ui ) for each i, this says that $T\left(v - \sum_{i=1}^{n} c_i u_i\right) = 0$, i.e. $v - \sum_{i=1}^{n} c_i u_i \in \ker T$. As {v1 , . . . , vm } is a basis of ker T , there are scalars d1 , . . . , dm with $v - \sum_{i=1}^{n} c_i u_i = \sum_{i=1}^{m} d_i v_i$, and rearranging gives $\left(\sum_{i=1}^{m} d_i v_i\right) + \left(\sum_{i=1}^{n} c_i u_i\right) = v$, as required.
Hence, we have shown that {v1 , . . . , vm , u1 , . . . , un } is linearly independent and
spans V , hence it is a basis of size m + n, and so dim V = m + n = dim ker T +
dim Im T , and the rank-nullity theorem is proved.
The word ‘isomorphism’ comes from the Greek for ‘equal shape’; the idea is that
two isomorphic vector spaces are essentially the same.
Recall: If X and Y are sets, then a function f : X → Y is called
injective if f (x) = f (x′ ) implies x = x′ (distinct elements have distinct images);
surjective if, for every y ∈ Y , there exists some x ∈ X such that f (x) = y (in other words, Im f = Y );
bijective if it is both injective and surjective.
Proposition. Let T : V → W be a linear map. Then: (i) T is injective if and only if ker T = {0}; (ii) T is bijective if and only if ker T = {0} and Im T = W ; (iii) if T is bijective, then the inverse function T −1 : W → V is also linear.
Proof. (i) If T is injective and T (v) = 0, then also T (0) = 0, so T (v) = T (0) and
therefore v = 0 since T is injective. So the only vector v ∈ V satisfying T (v) = 0
is 0, hence ker T = {0}.
Conversely, suppose ker T = {0} and suppose that T (v) = T (v′ ) for some v, v′ ∈ V . Since T is linear, we get T (v − v′ ) = 0, so v − v′ ∈ ker T , hence v − v′ = 0, so v = v′. Therefore, T is injective.
(ii) We have just shown that T is injective if and only if ker T = {0}, and by
definition, T is surjective if and only if Im T = W . Therefore T is bijective if and
only if both of these hold.
(iii) Since T is bijective, we know that an inverse function T −1 : W → V exists. To
prove that T −1 is linear, we need to show
T −1 (w + w′ ) = T −1 (w) + T −1 (w′ ) and T −1 (λw) = λT −1 (w)
for all w, w′ ∈ W and all scalars λ.
We show these by applying T to the left-hand sides, and using the fact that T T −1 (u) = u for all u ∈ W : for example, T (T −1 (w) + T −1 (w′ )) = T T −1 (w) + T T −1 (w′ ) = w + w′ = T (T −1 (w + w′ )), and injectivity of T then gives the first equation; the second is similar.
We now discuss a general method for getting a matrix from a linear transformation.
Let T : V → W be linear. To get a matrix, we fix a basis B = {v1 , . . . , vn } of V ,
and a basis B′ = {w1 , . . . , wm } of W . Since B′ is a basis of W , we can write each vector T (vi ) uniquely as a linear combination of the wj :
$$T(v_i) = \sum_{j=1}^{m} a_{ji} w_j$$
for unique scalars aij ∈ F . Once we have done this, if we have an arbitrary vector $v = \sum_{i=1}^{n} c_i v_i$ in V , then we can calculate
$$T(v) = T\left(\sum_{i=1}^{n} c_i v_i\right) = \sum_{i=1}^{n} c_i T(v_i) = \sum_{i=1}^{n} c_i \sum_{j=1}^{m} a_{ji} w_j,$$
using that T is linear and then substituting for T (vi ). In other words, if [v]B = (c1 , . . . , cn ) and we take coordinates of T (v) with respect to the basis B′ = {w1 , . . . , wm }, we get
$$[T(v)]_{B'} = \left(\sum_{i=1}^{n} a_{1i} c_i, \; \sum_{i=1}^{n} a_{2i} c_i, \; \ldots, \; \sum_{i=1}^{n} a_{mi} c_i\right).$$
If we write this out using column vectors, this becomes a matrix-vector product:
$$[T(v)]_{B'} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}[v]_B.$$
Definition 12.1. We write [T ]B,B′ for the m × n matrix (aij ) above; this is the unique matrix satisfying the equation
[T (v)]B′ = [T ]B,B′ [v]B for all v ∈ V .
Remarks 12.2.
Remark 12.5. If we try and perform the same check for [f ]B,B′ as before, we get
$$[f]_{B,B'}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 0 & -2 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x - y \\ -x - 2z \end{pmatrix}.$$
This looks wrong: the last entry should be x + 2z. But note: these are the coordinates with respect to the basis {(1, 0), (0, −1)}, so the right-hand side actually denotes the vector (x − y)(1, 0) + (−x − 2z)(0, −1) = (x − y, x + 2z) in the standard basis.
Now that we know how to turn linear transformations into matrices, we can try and
pick ‘nice bases’ of V and W to get ‘nice matrices’ for f . For instance, if V = W
and B consists of eigenvectors for f , then [f ]B,B will be a diagonal matrix – this is
very useful, e.g. it makes performing calculations easy.
Recall that an eigenvector of a matrix A is a non-zero vector v such that Av = λv
for some scalar λ ∈ F , which we call the corresponding eigenvalue. This same
definition works for linear transformations:
Definition 12.6. Let V be a vector space over a field F and let T : V → V
be a linear transformation. An eigenvector of T is a non-zero vector v ∈ V
such that T (v) = λv for some scalar λ ∈ F , which is called the corresponding
eigenvalue of T .
It makes sense that if two matrices both correspond to the same linear transformation
T , but with respect to different bases, then these matrices should have something
to do with one another. The next proposition tells us what this relationship is.
Proposition 12.9. Let T : V → V be a linear transformation and let B1 , B2 be
bases of V . Let
M1 = [T ]B1 ,B1 , M2 = [T ]B2 ,B2 .
Finally, let A be the change of basis matrix from B1 to B2 , so that A−1 is the change
of basis matrix from B2 to B1 . Then M1 and M2 are related by the equation
M1 = A−1 M2 A.
Proof of the proposition. By definition, the matrices M1 = [T ]B1 ,B1 , M2 = [T ]B2 ,B2 and the change of basis matrices A, A−1 satisfy the following properties for all u ∈ V :
M1 [u]B1 = [T (u)]B1 , M2 [u]B2 = [T (u)]B2 , A[u]B1 = [u]B2 , A−1 [u]B2 = [u]B1 .
Putting these together, for every v ∈ V we get
A−1 M2 A[v]B1 = A−1 M2 [v]B2 = A−1 [T (v)]B2 = [T (v)]B1 = M1 [v]B1 .
This shows that M1 and A−1 M2 A do the same thing to every column vector [v]B1 ∈ F n , so they are the same matrix.
Remarks 12.11.
The proposition shows that two matrices which represent the same linear trans-
formation are conjugate. The converse is true as well: If M1 = A−1 M2 A, then
we can view M1 and M2 as representing the same linear transformation.
13 Diagonalisation
Definition 13.1. A square matrix M is diagonalisable if there is an invertible matrix P such that P −1 M P is a diagonal matrix. A linear transformation T : V → V is diagonalisable if there is a basis B of V such that [T ]B,B is a diagonal matrix.
By the above, T is diagonalisable if and only if [T ]B,B is diagonalisable for any choice
of B (in this case, [T ]B,B is diagonalisable for all choices of B).
Proposition 13.2. Let T : V → V be a linear transformation. Then T is diagonal-
isable if and only if V has a basis B which consists of eigenvectors of T .
Note that this agrees with the expression for the rotation matrix

  \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}

that we saw earlier.
Of course, it is not always easy to ‘spot’ eigenvectors in this way, so we would like
a more general method. The following gives an outline of a general process for
diagonalising a matrix or linear transformation, whenever this is possible.
General method.
Step 1. Use the characteristic equation of the matrix M,

  \det(M - xI_n) = 0,

to find its eigenvalues and eigenvectors.
With this definition, the eigenvalues of M are precisely the roots of the characteristic
polynomial of M .
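As a quick check (a Python/numpy sketch, not part of the notes; it assumes M is the matrix \frac{1}{5}\begin{pmatrix} -3 & 4 \\ 4 & 3 \end{pmatrix} used in the computation below), the characteristic polynomial and its roots can be computed numerically:

    import numpy as np

    M = np.array([[-3, 4], [4, 3]]) / 5   # the matrix of the worked example (assumed)
    coeffs = np.poly(M)                   # coefficients of det(xI - M)
    print(coeffs)                         # [ 1.  0. -1.]: the polynomial x^2 - 1 (up to rounding)
    print(np.roots(coeffs))               # its roots, the eigenvalues 1 and -1
    print(np.linalg.eigvals(M))           # the same eigenvalues, computed directly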
In our particular example, we now know that M has eigenvalues ±1. To get eigenvectors, we solve the equation (M − λI_2)v = 0 twice, once with λ = 1 and once with λ = −1. For λ = 1 we get

  \begin{pmatrix} -\tfrac{3}{5} - 1 & \tfrac{4}{5} \\ \tfrac{4}{5} & \tfrac{3}{5} - 1 \end{pmatrix}
  \begin{pmatrix} x \\ y \end{pmatrix}
  = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
The first line here gives us -\tfrac{8}{5}x + \tfrac{4}{5}y = 0, i.e. y = 2x, and the second line gives an equivalent equation.
So any non-zero solution to this gives us an eigenvector. For instance, taking x = 1, y = 2,

  \begin{pmatrix} 1 \\ 2 \end{pmatrix}

is an eigenvector with corresponding eigenvalue 1.
Repeating this for λ = −1, we get

  \begin{pmatrix} -\tfrac{3}{5} + 1 & \tfrac{4}{5} \\ \tfrac{4}{5} & \tfrac{3}{5} + 1 \end{pmatrix}
  \begin{pmatrix} x \\ y \end{pmatrix}
  = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
  \implies x = -2y,

and so

  \begin{pmatrix} -2 \\ 1 \end{pmatrix}

is an eigenvector with corresponding eigenvalue −1.
Step 2. We now have two eigenvectors for M , and these are not multiples of one
another, so they form a linearly independent set of size 2, which is a basis of R2 .
Use these as the columns of a matrix:
  P = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}.
And let's check that P^{-1}MP is diagonal. Firstly note that

  P^{-1} = \frac{1}{\det P} \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}
         = \frac{1}{5} \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}.

Hence,

  P^{-1} M P = \frac{1}{5} \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}
               \begin{pmatrix} -\tfrac{3}{5} & \tfrac{4}{5} \\ \tfrac{4}{5} & \tfrac{3}{5} \end{pmatrix}
               \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}
             = \frac{1}{25} \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}
               \begin{pmatrix} -3 & 4 \\ 4 & 3 \end{pmatrix}
               \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}
             = \frac{1}{25} \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}
               \begin{pmatrix} 5 & 10 \\ 10 & -5 \end{pmatrix}
             = \frac{1}{25} \begin{pmatrix} 25 & 0 \\ 0 & -25 \end{pmatrix}
             = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},

as we expected.
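If you want to confirm the arithmetic, here is a two-line check in Python/numpy (a sketch, not part of the notes):

    import numpy as np

    M = np.array([[-3, 4], [4, 3]]) / 5
    P = np.array([[1, -2], [2, 1]], dtype=float)   # columns are the eigenvectors found above
    D = np.linalg.inv(P) @ M @ P
    print(np.round(D, 10))                         # the diagonal matrix diag(1, -1), up to rounding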
When does this work?
There are two ways in which the above process can fail to work.
1. Firstly, it is possible that the characteristic polynomial det(M − λI_n) has some roots which do not lie in the field F. For example, suppose that θ is not a multiple of π, and consider the rotation matrix

  R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
This is a real matrix (so F = R), but its eigenvalues are complex. The characteristic polynomial is

  \lambda^2 - (2\cos\theta)\lambda + (\cos^2\theta + \sin^2\theta) = \lambda^2 - 2(\cos\theta)\lambda + 1 = (\lambda - e^{i\theta})(\lambda - e^{-i\theta}),

and we can see that the (complex) roots e^{iθ}, e^{−iθ} are not real if θ is not a multiple of π.
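A quick numerical illustration (a Python/numpy sketch, not from the notes): for any angle that is not a multiple of π, the computed eigenvalues of R_θ come out as the complex numbers e^{±iθ}.

    import numpy as np

    theta = 1.0                                       # any angle that is not a multiple of pi
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    print(np.linalg.eigvals(R))                       # approximately exp(i*theta), exp(-i*theta)
    print(np.exp(1j * theta), np.exp(-1j * theta))    # compare: they are not real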
2. Even if all the roots of the characteristic polynomial lie in the field F , we might
not be able to find enough eigenvectors to get a basis of V . For example, consider
the matrix

  M = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.
Let’s follow our diagonalisation routine. For Step 1, we get the characteristic polynomial det(M − λI_2) = (1 − λ)^2, which has a single repeated root λ = 1.
Now if we set λ = 1 and solve (M − λI_2)v = 0, we get

  \begin{pmatrix} 1-1 & 1 \\ 0 & 1-1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}
  = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}
  = \begin{pmatrix} y \\ 0 \end{pmatrix}
  = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
Clearly the only condition on x and y we get here is y = 0. But this means that every eigenvector of M with eigenvalue 1 has the form (x, 0)^T. These are all scalar multiples of a single vector, namely (1, 0)^T. Hence if we take two such vectors, then they are not linearly independent. So V does not have a basis consisting of eigenvectors of M.
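The eigenvector shortage can be seen numerically as well (a Python/numpy sketch, not from the notes): the dimension of the eigenspace for λ = 1 is 2 − rank(M − I_2) = 1, so there is only one independent eigenvector.

    import numpy as np

    M = np.array([[1., 1.], [0., 1.]])
    lam = 1.0                                             # the only root of (1 - x)^2
    geom_mult = 2 - np.linalg.matrix_rank(M - lam * np.eye(2))
    print(geom_mult)                                      # 1: only one independent eigenvector,
                                                          # so no basis of eigenvectors exists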
It turns out that the above points are the only things which can get in the way of
our diagonalisation process.
Definition 13.6. Let M be an n × n matrix, let p(x) = det(M − xI_n) be its characteristic polynomial, and let λ ∈ F be an eigenvalue of M. The algebraic multiplicity of λ is its multiplicity as a root of p(x), and the geometric multiplicity of λ is the dimension of the corresponding eigenspace, i.e. the maximum number of linearly independent eigenvectors with eigenvalue λ.
With these definitions, the following proposition summarises the above information:
Proposition 13.7. A matrix M can be diagonalised (i.e. there exists a matrix P
such that P −1 M P is diagonal) if and only if all the eigenvalues of M lie in F , and
for each such eigenvalue λ, we have
algebraic multiplicity of λ = geometric multiplicity of λ.
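The criterion can also be checked by computer. Below is a rough Python/numpy sketch (not part of the notes; the helper name is my own, and a symbolic package would be more reliable than floating-point arithmetic for matrices with repeated eigenvalues): it compares the algebraic multiplicity of each real eigenvalue with the geometric multiplicity n − rank(M − λI_n).

    import numpy as np

    def is_diagonalisable_over_R(M, tol=1e-6):
        """Rough numerical check of Proposition 13.7 for a real square matrix M."""
        n = M.shape[0]
        eigvals = np.linalg.eigvals(M)
        if np.max(np.abs(np.imag(eigvals))) > tol:
            return False                                   # some eigenvalues do not lie in F = R
        real_vals = np.real(eigvals)
        for lam in np.unique(np.round(real_vals, 6)):
            alg = int(np.sum(np.abs(real_vals - lam) < tol))           # algebraic multiplicity
            geom = n - np.linalg.matrix_rank(M - lam * np.eye(n))      # geometric multiplicity
            if alg != geom:
                return False
        return True

    print(is_diagonalisable_over_R(np.array([[-0.6, 0.8], [0.8, 0.6]])))  # True
    print(is_diagonalisable_over_R(np.array([[1., 1.], [0., 1.]])))       # False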
Direct sums
Definition 14.1. Let V and W be vector spaces over a field F. The direct sum of V and W, denoted V ⊕ W, is the set

  V \oplus W = \{ (v, w) : v \in V,\ w \in W \},

with addition and scalar multiplication defined componentwise.
The map

  \underbrace{F \oplus F \oplus \cdots \oplus F}_{n \text{ times}} \to F^n, \qquad (x_1, x_2, \ldots, x_n) \mapsto (x_1, x_2, \ldots, x_n)

is an isomorphism.
Proof. Parts (i) and (ii) are really easy and they are left as an exercise.
Let f : U ⊕ W → U + W be defined as f (u, w) = u + w. It is easy to show that this
map is linear (do it!) and surjective, so it follows that dim(U + W ) = dim(Im(f )).
It follows from the rank-nullity theorem that

  \dim(U) + \dim(W) = \dim(U \oplus W) = \dim(\ker(f)) + \dim(\mathrm{Im}(f)) = \dim(\ker(f)) + \dim(U + W).
Note that
ker(f ) = {(u, w) ∈ U ⊕ W : u + w = 0}.
However, (u, w) ∈ U ⊕ W satisfies u + w = 0 if and only if u = −w ∈ W. But that implies u ∈ U ∩ W. Conversely, if u ∈ U ∩ W then f(u, −u) = 0. Hence ker(f) ≅ U ∩ W. Since f is surjective, it is an isomorphism if and only if ker(f) = {0}, which happens exactly when U ∩ W = {0}, completing the proof.
Example 14.5. For instance, the direct sum of two 1-dimensional vector spaces (lines) is always 2-dimensional, whereas we have previously seen that the sum U + W of two lines can be a line (if U = W) or a plane (if U ≠ W, which is exactly when U ∩ W = {0}).
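The contrast is easy to see numerically (a Python/numpy sketch, not from the notes): dim(U + W) is the rank of a matrix whose columns span U and W, while dim(U ⊕ W) is always dim U + dim W.

    import numpy as np

    u = np.array([1., 2.])                       # U = span{u}, a line in R^2
    w = np.array([2., 4.])                       # W = span{w}; here W = U
    print(np.linalg.matrix_rank(np.column_stack([u, w])))   # 1: U + W is still a line
    print(1 + 1)                                             # 2: dim(U ⊕ W) = dim U + dim W always

    w = np.array([0., 1.])                       # now U ∩ W = {0}
    print(np.linalg.matrix_rank(np.column_stack([u, w])))   # 2: U + W is a plane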
Hom spaces
Definition 14.6. Let V and W be vector spaces over a field F. Let Hom(V, W) denote the set of all linear transformations T : V → W.
Proposition 14.7.
Proof.
(ii) First of all we need to show that e_{ij} ∈ Hom(V, W), i.e. that e_{ij} is indeed linear. Given v, v' ∈ V, there are unique a_k, a_k' such that v = \sum_{k=1}^{m} a_k v_k and v' = \sum_{k=1}^{m} a_k' v_k. Then

  e_{ij}(v + v') = e_{ij}\Big(\sum_{k=1}^{m} (a_k + a_k') v_k\Big) = (a_i + a_i') w_j = a_i w_j + a_i' w_j = e_{ij}(v) + e_{ij}(v').

Similarly, one can show that e_{ij}(λv) = λ e_{ij}(v). (Do it!)
Now we need to show that \{e_{ij}\}_{1 \le i \le m,\, 1 \le j \le n} form a basis of Hom(V, W). To prove linear independence, suppose we have scalars a_{ij} such that

  \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} e_{ij} = 0.
Applying this to v_k gives \sum_{j=1}^{n} a_{kj} w_j = 0, and since the w_j are linearly independent, every a_{kj} = 0; this proves linear independence.

For spanning, let T ∈ Hom(V, W). For each v_i, for i = 1, . . . , m, there are a_{ij} such that

  T(v_i) = \sum_{j=1}^{n} a_{ij} w_j.

Setting T_0 = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} e_{ij}, we get T_0(v_i) = \sum_{j=1}^{n} a_{ij} w_j = T(v_i) for each i, and hence T_0(v) = T(v) for all v ∈ V. So

  T = T_0 = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} e_{ij},

and the \{e_{ij}\} span Hom(V, W) and are a basis. Since this basis has n × m = dim(V) dim(W) elements, (iii) follows.
For the last claim regarding the matrix of e_{ij} ∈ Hom(V, W), note that if we let B = {v_1, . . . , v_m} and B' = {w_1, . . . , w_n}, then [e_{ij}(v_k)]_{B'} is the zero vector unless k = i, in which case all the entries are 0 apart from the j-th entry, which is 1. Hence the result follows from Observation 12.3.
Example 14.8. Let V = {a cos t + b sin t : a, b ∈ R}, which you have previously seen is a vector space. We claim:

(i) the map T = d/dt lies in Hom(V, V);

(ii) if B = B' = {v_1 = cos t, v_2 = sin t} is the natural basis for V, then, in the notation of Proposition 14.7, T(v_1) = −v_2 and T(v_2) = v_1, so T = −e_{12} + e_{21}.
To see (i), note that d/dt is a linear transformation on the vector space of differentiable functions, of which V is a subspace, so we only need to show that \frac{d}{dt}(V) ⊆ V. Indeed,

  \frac{d}{dt}(a \cos t + b \sin t) = -a \sin t + b \cos t \in V,    (2)

so T = d/dt ∈ Hom(V, V).
To see (ii), we just apply Proposition 14.7 (ii) to the computation in equation (2).
In this section we briefly hint at some material that is not examinable, but that would be covered if we had more time.
Observe that since we know multiplication of matrices, we can also describe this product as

  \langle v, w \rangle = v^T \cdot I \cdot w = \sum_{i=1}^{n} v_i w_i.
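In Python/numpy this identity is a one-liner (a sketch, not part of the notes): the dot product of v and w equals v^T I w.

    import numpy as np

    v = np.array([1., 2., 3.])
    w = np.array([4., 0., -1.])
    print(v @ w)                         # the dot product sum_i v_i w_i
    print(v @ np.eye(3) @ w)             # the same number, written as v^T I w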
But what happens if we replace the identity matrix above by another matrix? Do we get a notion of ‘product’? Indeed.

(iv) homogeneity in the first slot: ⟨λu, v⟩ = λ⟨u, v⟩ for all λ ∈ F and all u, v ∈ V;
This is an inner product and, in fact, it is a baby example of a Hilbert space, which is the main object of study in functional analysis.

An inner product space is very useful for formalising notions from Euclidean geometry in linear algebra. For instance, it allows us to define lengths of vectors (norms) and even angles between two vectors. To see this, notice that property (v) in the definition of an inner product means that ⟨v, v⟩ = \overline{⟨v, v⟩}, so ⟨v, v⟩ ∈ R.
Definition 15.5. Let V be a real inner product space. The angle between two vectors v, w ∈ V is defined to be

  \arccos\left( \frac{\langle v, w \rangle}{\|v\|\, \|w\|} \right).
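For the dot product on R^2 this recovers the familiar notion of angle; here is a one-line check in Python/numpy (a sketch, not part of the notes):

    import numpy as np

    v = np.array([1., 0.])
    w = np.array([1., 1.])
    angle = np.arccos(v @ w / (np.linalg.norm(v) * np.linalg.norm(w)))
    print(angle, np.pi / 4)              # the angle between (1,0) and (1,1) is pi/4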
What the above means is that all the vectors in the basis have norm 1 and are at right angles to each other. The standard basis of V = R^n with the inner product given by the dot product is an example of an orthonormal basis. This theory allows us to extend that notion to more abstract vector spaces.
There is an algorithmic procedure for producing an orthonormal basis, called the Gram-Schmidt process, but I am afraid we have run out of time. You may explore some of these topics and more in a Capstone project next year!
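Since the process is only mentioned here, the following is a minimal Python/numpy sketch of classical Gram-Schmidt for the dot product on R^n (not examinable, not from the notes, and the function name is my own): each vector has its components along the previously built vectors removed and is then normalised.

    import numpy as np

    def gram_schmidt(vectors):
        """Classical Gram-Schmidt for the dot product: returns an orthonormal list
        spanning the same subspace as the (linearly independent) input vectors."""
        basis = []
        for v in vectors:
            for b in basis:
                v = v - (v @ b) * b          # remove the component of v along b
            basis.append(v / np.linalg.norm(v))
        return basis

    print(gram_schmidt([np.array([1., 1., 0.]), np.array([1., 0., 1.])]))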