
MA201 Linear Algebra

Lecture Notes

Dr. Alastair Litterick


Dr. Jesus Martinez-Garcia

Autumn 2020


Contents

1 Recap: Matrices and vectors (MA114)
   Vectors
   Matrices and systems of linear equations
   Adding and multiplying vectors and matrices
   Gaussian elimination and elementary row operations
   Eigenvalues and eigenvectors
   The characteristic polynomial of a matrix
   Determinants
   Inverse matrices

2 Abstract vector spaces – “What is linear algebra?”

3 Vector spaces and subspaces
   Axioms for vector spaces
   Subspaces
   Fundamental example: Solutions to a system of linear equations

4 New subspaces from old (sums and intersections)

5 Linear independence, spanning sets and bases
   Linear dependence and independence
   Spans and spanning sets
   Bases and dimension
   Methods for spans and linear (in)dependence
   Basis, minimal spanning sets, maximal linearly independent sets
   Bases and linear systems

6 All bases have the same size
   Dimensions of sums and intersections

7 The rank of a matrix
   Row rank, column rank, determinantal rank
   Row rank equals column rank
   Matrices of full rank

8 Coordinates and change of basis
   Coordinates
   Change of basis matrices
   How to calculate a change of basis matrix

9 Linear maps
   Linear maps and matrices

10 Image, kernel, and the rank-nullity theorem
   Some examples of images and kernels
   Proof of the Rank-Nullity Theorem

11 Injections, surjections, isomorphism

12 Linear maps and matrices
   Linear maps and change of basis

13 Diagonalisation

14 New vector spaces from old
   Direct sums
   Hom spaces

15 An introduction to inner product spaces
   Inner product spaces


1 Recap: Matrices and vectors (MA114)

This section summarises notation and results on matrices and vectors which will be
referred to from time to time. Mostly, these results will be familiar from MA114.

Vectors

So far, you will have seen row vectors, which are ordered lists of numbers v = (v1 , v2 , . . . , vn ), and column vectors such as $\begin{pmatrix} x\\ y \end{pmatrix}$. The vectors in a matrix-vector equation Au = v are usually column vectors. Because column vectors take up a lot of space on the page, it is common to use the transpose operation to write a column vector as a transposed row vector, for instance

$$\begin{pmatrix} v_1\\ v_2\\ \vdots\\ v_n \end{pmatrix} = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}^{T}.$$

You will see different notation for vectors in different places; in these notes, vectors
will always be denoted using bold upright letters (u, v, x and so on). A special role
is played by the zero vector whose entries are all zero. We denote this vector by 0.

Matrices and systems of linear equations

A system of linear equations in n variables x1 , x2 , . . ., xn is a set of equations


$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}$$
where the quantities aij and bk are fixed constants (usually real or complex numbers).
This system of m equations can be represented as a matrix-vector equation:
Ax = b (∗)
where
     
$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}, \quad
\mathbf{x} = \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}, \quad
\mathbf{b} = \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{pmatrix}.$$
In other words, x is a column vector of length n (the number of variables), b is a
column vector of length m (the number of equations), and A is an m × n matrix (m
rows, n columns). If we have only two or three variables, it is common to name
them x, y and z instead of x1 , x2 , x3 .


Adding and multiplying vectors and matrices.

Vectors or matrices of the same size can be added together component-wise: If


A = (aij ) and B = (bij ) are both m × n matrices, then A + B is the matrix whose
(i, j)-entry is aij + bij , for each i and j. (Note: A row or column vector is the same
thing as a 1 × n or m × 1 matrix for this purpose).
Furthermore, if A is an m × n matrix and B is an n × p matrix (i.e. A has as many
columns as B has rows), then we can define matrix multiplication: writing (AB)ij
for the (i, j)-entry of AB, we have

(AB)ij = ai1 b1j + ai2 b2j + · · · + ain bnj .

In other words, the (i, j) entry of AB is the scalar product of the i-th row of A
with the j-th column of B. Again, the same holds true for matrix-vector products,
thinking of vectors as matrices with a single column.
Another very useful property of matrix multiplication is the associative property

A(BC) = (AB)C,

which holds for all matrices A, B and C (which must be of the right size so that the
products are defined). A special case of this is when C has a single column – that
is, it is a column vector. Then we obtain the equation

(AB)v = A(Bv)

which holds for all vectors v. If we think of matrices as acting on vectors (i.e. ‘A
is the matrix which sends each vector v to the vector Av’), then this equation says
‘AB is the matrix which encodes the operation “apply B to v, then apply A”.’
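As a small illustration (a sketch only, assuming Python with numpy is available; the matrices and vector here are made up and not from the module), one can check numerically that (AB)v = A(Bv):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
v = np.array([5.0, 6.0])

# "Apply B to v, then apply A" gives the same result as applying AB directly.
print((A @ B) @ v)   # [16. 38.]
print(A @ (B @ v))   # [16. 38.]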

Gaussian elimination and elementary row operations

Elementary row operations are, as the name suggests, basic steps for manipulating a
matrix, which do not affect the solutions to the corresponding system of equations.
The three operations are:

(1) swap two rows;

(2) multiply a row by a non-zero scalar;

(3) add a multiple of one row to another row.

These correspond, respectively, to swapping two equations around; multiplying all


the coefficients of an equation by the same number; and adding a multiple of one
equation to another. The resulting system has exactly the same solutions as the
system we started with; if we start with a solution (x1 , x2 , . . .) and perform these
operations, it is clear that we still have a solution. And since each of these operations
can be undone with another such operation, the converse also holds: If we do some


operations and end up with a solution, then undoing these operations will also result
in a solution.
A matrix is said to be in row echelon form if each non-zero row starts with strictly more zeroes than the row above it (with any rows consisting entirely of zeroes at the bottom). In a picture, the matrix has the form

$$\begin{pmatrix}
\ast & & \text{possibly non-zero entries}\\
 & \ddots & \\
\text{zero entries} & & \ast
\end{pmatrix}$$

where each ∗ denotes an arbitrary non-zero scalar. Gaussian elimination is the


process of using elementary row operations to reduce a matrix to row echelon form,
which can always be done.
Further elementary row operations can bring the matrix into reduced row echelon
form, where the first non-zero entry in each row is 1 and has only zeroes above
it. This can also be done for any matrix: Once the matrix is in row echelon form,
we can just divide each row by a constant to make the first non-zero entry 1, then
subtract a multiple of this row from each row above it.

Example 1.1. Let’s use Gaussian elimination to reduce a matrix to row echelon
form:
   
$$\begin{pmatrix} 1 & 2 & 3\\ 1 & 1 & 1\\ 2 & 3 & 4 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 2 & 3\\ 0 & -1 & -2\\ 2 & 3 & 4 \end{pmatrix} \quad\text{(subtract row 1 from row 2)}$$

$$\longrightarrow
\begin{pmatrix} 1 & 2 & 3\\ 0 & -1 & -2\\ 0 & -1 & -2 \end{pmatrix} \quad\text{(subtract twice row 1 from row 3)}$$

$$\longrightarrow
\begin{pmatrix} 1 & 2 & 3\\ 0 & -1 & -2\\ 0 & 0 & 0 \end{pmatrix} \quad\text{(subtract row 2 from row 3)}$$

which is in row-echelon form, and we can further bring this into reduced row echelon
form by multiplying row 2 by −1, and subtracting twice this from row 1, to get
 
$$\begin{pmatrix} 1 & 0 & -1\\ 0 & 1 & 2\\ 0 & 0 & 0 \end{pmatrix}.$$
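As an optional check (a sketch assuming sympy is available; this is not needed for the module), the same reduced row echelon form can be obtained by computer:

import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [1, 1, 1],
               [2, 3, 4]])

# rref() returns the reduced row echelon form and the indices of the pivot columns.
R, pivots = A.rref()
print(R)       # Matrix([[1, 0, -1], [0, 1, 2], [0, 0, 0]])
print(pivots)  # (0, 1)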

Thus, elementary row operations let us simplify a system of linear equations, let-
ting us spot its properties more easily. For instance, the rank of a system of linear
equations is the number of ‘independent’ constraints the system places on its vari-
ables. This is not easy to spot in general, but if we put the system in row echelon


form (i.e. use Gaussian elimination on A), then the rank is equal to the number of
non-zero rows. This is not hard to see, since each non-zero row then tells us that
some variable xi can be expressed in terms of the later variables xi+1 up to xn .
Since the number of constraints in a system Ax = b depends only on the matrix A,
we also call this the rank of the matrix A.
Using Gaussian elimination on the augmented matrix of a system Ax = b, i.e. the
matrix (A|b) obtained by appending b as a new column of A, lets us study the
solutions to a system of linear equations as follows:

Proposition 1.2 (Rouché-Capelli theorem, simplified). The system Ax = b is


consistent (has at least one solution) if and only if A and the augmented matrix
(A|b) have the same rank.

You may have seen this explained in a different way: If we use Gaussian elimination
on the augmented matrix (A|b), then the left-hand part is just A in row echelon
form. So the only way these matrices can have different ranks is if the augmented
matrix ends up with a row
(0 0 · · · 0|c)
where c is non-zero. But this is an equation of the form 0x1 + 0x2 + · · · + 0xn = c,
which is exactly what makes a linear system inconsistent!

Example 1.3. Consider the system given by


    
$$\begin{pmatrix} 1 & 2\\ 2 & 4 \end{pmatrix}\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} b_1\\ b_2 \end{pmatrix}$$

for some unknowns b1 , b2 ∈ R. Using Gaussian elimination:


   
$$\left(\begin{array}{cc|c} 1 & 2 & b_1\\ 2 & 4 & b_2 \end{array}\right)
\longrightarrow
\left(\begin{array}{cc|c} 1 & 2 & b_1\\ 0 & 0 & b_2 - 2b_1 \end{array}\right)$$

So if b2 − 2b1 ≠ 0 then the system is inconsistent (no solutions), but if b2 = 2b1 then
there are solutions, in fact any pair (x, y) satisfying x + 2y = b1 will be a solution.
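As an aside (a sketch assuming numpy is available; the choice b = (1, 5) is just for illustration), the rank criterion of Proposition 1.2 can be checked numerically for this example:

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
b = np.array([[1.0],
              [5.0]])   # here b2 - 2*b1 = 3, which is non-zero

rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))
print(rank_A, rank_Ab)   # 1 2: the ranks differ, so the system is inconsistent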

Eigenvalues and eigenvectors

If A is an n × n matrix then an eigenvector of A is a non-zero vector v of length n


such that
Av = λv
for some scalar λ. In this case, λ is called the eigenvalue of A corresponding to v.
Eigenvectors and eigenvalues can involve complex numbers, even if the matrix is
real. For example, the matrix
$$A = \begin{pmatrix} 0 & -1\\ 1 & 0 \end{pmatrix}$$


has no real eigenvectors: If $A\begin{pmatrix} x\\ y \end{pmatrix} = \lambda\begin{pmatrix} x\\ y \end{pmatrix}$ then −y = λx and x = λy, so if (x, y) ≠ (0, 0) then we get −y = λ²y, hence λ² = −1. The only solutions to this are λ = i and λ = −i, and then we get $\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} iy\\ y \end{pmatrix}$, which can never be real if y ≠ 0 (and if y = 0 then the vector is $\begin{pmatrix} 0\\ 0 \end{pmatrix}$, which is not an eigenvector, by definition).


In some special cases, it is easy to spot the eigenvalues of a matrix by inspection:


The simplest case is when the matrix is diagonal. If
 
$$A = \begin{pmatrix}
a_1 & 0 & \cdots & 0\\
0 & a_2 & \ddots & \vdots\\
\vdots & \ddots & \ddots & 0\\
0 & \cdots & 0 & a_n
\end{pmatrix}$$

and if e1 = (1, 0, . . . , 0)T , e2 = (0, 1, 0, . . . , 0)T and so on, then Aei = ai ei for all i.
Thus for each i, ei is an eigenvector for A with eigenvalue ai .
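For illustration only (a sketch assuming numpy is available; the diagonal entries are made up), this is easy to see numerically:

import numpy as np

A = np.diag([2.0, 5.0, -1.0])
print(np.linalg.eigvals(A))   # [ 2.  5. -1.] (possibly listed in a different order)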

The characteristic polynomial of a matrix

The characteristic polynomial of an n × n matrix A is defined as the determinant


det(A − xIn ), where x is a variable and In is the n × n identity matrix (diagonal
matrix with ’1’s on the diagonal). The roots of the characteristic polynomial are
precisely the eigenvalues of A. And in fact, this gives us a general algorithm for
calculating eigenvalues and eigenvectors of a square matrix A:

• Calculate the characteristic polynomial of A, and find its roots (the eigenvalues of A).

• For each such root λ, solve (A − λIn )v = 0 to find a non-zero solution for v, which will be an eigenvector corresponding to λ.

Let’s go back to a previous example:


     
$$\det\left(\begin{pmatrix} 0 & -1\\ 1 & 0 \end{pmatrix} - x\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}\right)
= \det\begin{pmatrix} -x & -1\\ 1 & -x \end{pmatrix} = x^2 + 1.$$

Hence we verify that the eigenvalues of this matrix are indeed ±i, the (complex)
square roots of −1.

From this point, to get the eigenvectors, we substitute x = i and x = −i into A − xIn : we find that $\begin{pmatrix} 1\\ -i \end{pmatrix}$ works for x = i and $\begin{pmatrix} 1\\ i \end{pmatrix}$ works for x = −i.
This algorithm always works: The only way it could fail is if (A − λIn )v = 0 had
the unique solution v = 0, since then we could not find a (non-zero) eigenvector
corresponding to λ. But as we shall see when we discuss matrix inverses, a linear
system only has a unique solution when the determinant is non-zero. But we know
that det(A − λIn ) = 0 since λ is a root.
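As an optional aside (a sketch assuming numpy is available), the whole algorithm is also implemented in standard numerical libraries, and it reproduces the example above:

import numpy as np

A = np.array([[0.0, -1.0],
              [1.0, 0.0]])

# eig returns the eigenvalues and a matrix whose columns are corresponding eigenvectors.
values, vectors = np.linalg.eig(A)
print(values)         # [0.+1.j 0.-1.j], i.e. the eigenvalues ±i
print(vectors[:, 0])  # an eigenvector for λ = i, proportional to (1, -i)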


Determinants

Determinants are very useful invariants of square matrices which encode both al-
gebraic and geometric information. For instance, the determinant of a real matrix
can be thought of as a ‘scaling factor’ which tells you how the volume of a shape
changes when applying the matrix as a geometric transformation.
The determinant of a matrix A is written as det(A) or |A|. For small matrices, the
determinant can be written down and calculated by hand:

$$\begin{vmatrix} a & b\\ c & d \end{vmatrix} = ad - bc,$$

$$\begin{vmatrix} a & b & c\\ d & e & f\\ g & h & i \end{vmatrix}
= a\begin{vmatrix} e & f\\ h & i \end{vmatrix}
- b\begin{vmatrix} d & f\\ g & i \end{vmatrix}
+ c\begin{vmatrix} d & e\\ g & h \end{vmatrix}
= aei - afh - bdi + bfg + cdh - ceg.$$
For larger matrices, determinants can be defined inductively via a rule sometimes
known as Laplace expansion along a row or column. For instance, the 3 × 3 deter-
minant above is given by expansion along the first row: We take the elements of the
first row (a, b, c), multiplied by −1 in the even positions, and then multiplied by the
corresponding minor, i.e. the determinant of the matrix where the row and column
containing a, b or c is crossed out.
This same technique lets us define the determinant of an n×n matrix as an alternat-
ing sum of determinants of (n − 1) × (n − 1) matrices, expanding along an arbitrary
row or column: If we are given an n × n matrix A = (aij ), then we can fix i or j to get the inductive formulas
$$\det A = \underbrace{\sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij}}_{\text{expand along row } i}
         = \underbrace{\sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij}}_{\text{expand along column } j},$$

where Aij is the matrix formed from A by deleting row i and column j.
While you will not be asked to perform gigantic calculations by hand, you should
be aware how one defines the determinant of matrices larger than 3 × 3, and for
certain matrices, expansion along a row can be very quick, particularly if the matrix
has lots of zeroes. For example, using two lots of expanding along the first row and
ignoring zeroes lets us calculate:

$$\begin{vmatrix} 0 & 1 & 0 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 3\\ 4 & 0 & 0 & 0 \end{vmatrix}
= (-1)\begin{vmatrix} 0 & 2 & 0\\ 0 & 0 & 3\\ 4 & 0 & 0 \end{vmatrix}
= (-1)(-2)\begin{vmatrix} 0 & 3\\ 4 & 0 \end{vmatrix}
= (-1)(-2)(-12)
= -24.$$
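As a quick numerical confirmation (a sketch assuming numpy is available):

import numpy as np

A = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3],
              [4, 0, 0, 0]], dtype=float)

print(np.linalg.det(A))   # -24.0, up to floating-point rounding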


Inverse matrices

The inverse of a matrix exists precisely when its determinant is non-zero. The
existence of an inverse tells us something about solutions to the corresponding linear
system:
Proposition 1.4. If A is a square matrix (i.e. m = n), then Ax = b has a unique
solution for x if and only if A−1 exists, i.e. when det A ≠ 0.

If the inverse exists then we multiply both sides by it to get:


Ax = b
⇐⇒ A−1 (Ax) = A−1 b
⇐⇒ (A−1 A)x = A−1 b
⇐⇒ In x = A−1 b
⇐⇒ x = A−1 b.
So x is uniquely determined. Conversely, if the determinant is zero, then when we
reduce A to row echelon form, we get a row of zeroes. A little thought shows that
this means A sends some non-zero vector v to 0. So if Ax = b then A(v + x) =
Av + Ax = 0 + b = b. So if one solution exists, then more than one exists when
the determinant is zero.
For a 2 × 2 matrix we have a nice short formula for calculating the inverse:
$$\begin{pmatrix} a & b\\ c & d \end{pmatrix}^{-1}
= \frac{1}{\det A}\begin{pmatrix} d & -b\\ -c & a \end{pmatrix}
= \frac{1}{ad - bc}\begin{pmatrix} d & -b\\ -c & a \end{pmatrix}.$$
For larger n × n matrices we also have a formula, although this is again a very slow
process to do by hand: If A = (aij ) is invertible and if A−1 = (bij ), then for each i
and j we have
$$b_{ij} = \frac{(-1)^{i+j}}{\det A} \det A_{ji}$$
where Aji is the matrix formed by deleting row j and column i (note: rows and columns have been swapped here!).
The fastest way to calculate an inverse matrix by hand is using so-called Gauss-
Jordan elimination: Form the matrix (A | In ) by adjoining In , then use Gaussian
elimination to put this in reduced row echelon form. The result is (In | A−1 ).
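The same procedure is easy to imitate by computer; the following is a sketch assuming sympy is available, with a 2 × 2 matrix made up for illustration:

import sympy as sp

A = sp.Matrix([[1, 2],
               [3, 5]])

augmented = A.row_join(sp.eye(2))   # form the augmented matrix (A | I2)
R, _ = augmented.rref()             # its reduced row echelon form is (I2 | A^{-1})
print(R[:, 2:])                     # Matrix([[-5, 2], [3, -1]]), which is A^{-1}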
Example 1.5. Consider the system given by
$$\begin{pmatrix} 1 & 2\\ 2 & c \end{pmatrix}\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} b_1\\ b_2 \end{pmatrix}$$
for some unknowns b1 , b2 , c ∈ R. The matrix here has determinant c − 4, so the matrix is invertible precisely when c ≠ 4, in which case the system has a unique solution, namely
$$\begin{pmatrix} x\\ y \end{pmatrix}
= \begin{pmatrix} 1 & 2\\ 2 & c \end{pmatrix}^{-1}\begin{pmatrix} b_1\\ b_2 \end{pmatrix}
= \frac{1}{c - 4}\begin{pmatrix} c & -2\\ -2 & 1 \end{pmatrix}\begin{pmatrix} b_1\\ b_2 \end{pmatrix}.$$
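A symbolic check of this example (a sketch assuming sympy is available), keeping c as an unknown:

import sympy as sp

c = sp.Symbol('c')
A = sp.Matrix([[1, 2],
               [2, c]])

print(A.det())               # c - 4
print(sp.simplify(A.inv()))  # Matrix([[c/(c - 4), -2/(c - 4)], [-2/(c - 4), 1/(c - 4)]])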


2 Abstract vector spaces – “What is linear algebra?”

In MA114, you’ve seen vectors and matrices described as lists and arrays of numbers
(usually real numbers). However, all of the important properties of vectors and
matrices arise from just two things:

(1) The ability to add two vectors together,

(2) The ability to multiply a vector by a scalar.

Linear algebra is a branch of abstract algebra. In abstract algebra, instead of work-


ing with concrete objects like lists of numbers, we consider abstract objects, which
are objects defined by their properties. By focusing on the properties, and not on
the vectors themselves, we can derive properties and theorems which hold for all
collections of vectors at once.
Importantly, when studying abstract objects, it doesn’t matter what they represent,
as long as they obey the rules. Vectors can represent many things: positions in
space, velocities, momenta, stock prices, quantum states, functions, probabilities,
and far, far more besides. But as long as they obey the correct rules of addition and
scalar multiplication, it doesn’t matter what the vectors are, they still obey all the
rules of linear algebra.
If we try to define the most general sets of objects with suitable addition and scalar
multiplication operations, we end up with vector spaces, which are the fundamental
objects of study in linear algebra. Here’s an informal definition (we will give more
detail later):

Definition 2.1 (Informal). A vector space is a set V together with

(1) Addition: We have a rule for adding elements of V together to get another
element of V . If v, w ∈ V , we denote this new element by v + w.

(2) Scalar multiplication: There is a rule for multiplying elements of V by


scalars to get new elements of V . We denote the set of scalars by F ; in this
module we usually take F = R or C. If λ ∈ F and v ∈ V , the new element
is denoted by λv. It is called a scalar multiple of v.

In this case we call V a vector space over F . Elements of V are called vectors,
and F is called the field of scalars.

Not all sets of numbers are allowed to be scalars. Importantly, we must be able to
add, subtract and multiply scalars together, and we must also be able to divide by
non-zero scalars (otherwise Gaussian elimination wouldn’t work). So for instance,
F = R, Q or C is OK, but F = N is not (subtraction might result in numbers not
in N), and nor is F = Z (because division might result in a number not in Z).


Since we are only ever allowed to use addition and scalar multiplication to manipu-
late vectors, we have a special term for when we create new vectors using these:

Definition 2.2. Let V be a vector space over a field F . A linear combination


of elements of V is a vector of the form

λ1 v1 + λ2 v2 + · · · + λn vn

for some scalars λ1 , . . . , λn and some vectors v1 , . . . , vn .

In other words, a linear combination of v1 , . . . , vn is any vector in V which we can


obtain by applying only addition and scalar multiplication to v1 , . . . , vn .

Examples 2.3.

1. The familiar spaces R, R2 and R3 are vector spaces over R. More generally, for
each n ≥ 0, the set Rn is a vector space over R. This is the set of n-tuples

Rn = {(x1 , x2 , . . . , xn ) : xi ∈ R for each i}.

Addition and scalar multiplication are defined component-wise: If (x1 , x2 , . . . , xn ),


(y1 , y2 , . . . , yn ) ∈ Rn , and if λ ∈ R then
(x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn ),
λ(x1 , x2 , . . . , xn ) = (λx1 , λx2 , . . . , λxn ).

2. Similarly, C, C2 , C3 etc. are vector spaces over C.

3. Here is a more abstract example. Let V be the set of all functions R → R. This
becomes a vector space over R with addition and scalar multiplication defined
point-wise, i.e. if we have functions f, g : R → R then we define
(f + g)(x) = f (x) + g(x),

and if f : R → R and λ ∈ R, then


(λf )(x) = λ(f (x)).

In words, f +g is the function which sends x to f (x)+g(x), and λf is the function


which sends x to λ(f (x)).

4. Consider a linear equation of the form

a1 x1 + a2 x2 + · · · + an xn = 0,

where the ai are constants in F = R or C, and x1 , . . . , xn are variables. Then


the set of all n-tuples (x1 , x2 , . . . , xn ) ∈ F n satisfying the equation forms a vector
space over F . If we take component-wise addition and scalar multiplication (just


as in Rn above), then sums and scalar multiples of solutions to the equation are
also solutions, because of the two rules:
$$\sum_{i=1}^{n} a_i (x_i + y_i) = \left(\sum_{i=1}^{n} a_i x_i\right) + \left(\sum_{i=1}^{n} a_i y_i\right),
\qquad
\sum_{i=1}^{n} a_i (\lambda x_i) = \lambda\left(\sum_{i=1}^{n} a_i x_i\right),$$

which hold for all vectors (x1 , . . . , xn ) and (y1 , . . . , yn ) ∈ F n and all scalars λ ∈ F .

5. Recall that a homogeneous linear ODE is an ODE of the form

$$a_n(x)\frac{d^n y}{dx^n} + a_{n-1}(x)\frac{d^{n-1} y}{dx^{n-1}} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)y = 0,$$
i.e. the coefficients ai (x) depend on x only, not on y. The set of functions R → R
satisfying a given linear ODE form a vector space over R. To check this, we
need to know that when we add two such functions together, we get another such
function, and similarly when we multiply such a function by a scalar (constant).
Check this for yourself!


3 Vector spaces and subspaces

Axioms for vector spaces

The informal definition of a vector space above (Definition 2.1) is lacking some
detail, since we haven’t actually specified which properties the addition and scalar
multiplication has to satisfy. To do this, we need axioms.

Definition 3.1. A vector space over a field F is a set of vectors, V , with an


addition and a scalar multiplication by elements of F (called scalars), such that
the following eight axioms hold:
Axioms for addition

• Commutativity: v + w = w + v for all v, w ∈ V .

• Associativity: v + (w + u) = (v + w) + u for all v, w, u ∈ V .

• Identity: There is a vector 0 ∈ V , called the zero vector, such that 0 + v = v for all vectors v ∈ V .

• Inverse: For each vector v ∈ V , there is a vector w ∈ V , called a negative of v, such that v + w = 0.

(Later in MA204, you’ll see that these four axioms tell us that V is an abelian group under addition. But you don’t need to know this name for now.)

Axioms for scalar multiplication

• Associativity: (λµ)v = λ(µv) for all scalars λ, µ ∈ F and all vectors v ∈ V .

• Identity: The scalar 1 ∈ F satisfies 1v = v for all vectors v ∈ V .

Distributivity axioms (these involve both + and scalar multiplication)

• λ(v + w) = λv + λw for all λ ∈ F and all v, w ∈ V .

• (λ + µ)v = λv + µv for all λ, µ ∈ F and all v ∈ V .

The idea of these axioms is that they capture the ‘obvious’ properties of R2 , R3 etc.,
and from these axioms we can deduce all other properties of vector spaces. Here are
some examples of properties we can prove directly from the axioms:
Proposition 3.2 (Uniqueness of zero). Suppose 0 and 0′ are two vectors, both of
which satisfy the additive identity axiom. Then 0 = 0′.

Proof. Applying the additive identity axiom for 0′, with v = 0, we get 0 = 0 + 0′.


But now applying the additive identity axiom for 0, with v = 0′, we get 0 + 0′ = 0′.
So 0 = 0′, as claimed.
Proposition 3.3 (Uniqueness of negatives). Let v ∈ V , and suppose that w and
w′ satisfy
v + w = v + w′ = 0.     (1)
Then w = w′.

Note: It is tempting to say “this is obvious, we just subtract v from both sides”.
But we haven’t defined subtraction yet, only addition and scalar multiplication. To
define subtraction, we need to know that negatives are unique (so things like v1 − v2
are uniquely defined). So we need a proof that only uses the axioms.

Proof. Using the axioms and (1) gives the following equalities:

w = w + 0               by the property of 0 (additive identity)
  = w + (v + w′)        by (1)
  = (w + v) + w′        by associativity of addition
  = (v + w) + w′        by commutativity of addition
  = 0 + w′              by (1)
  = w′                  by the property of 0,

as required.
Notation: Since negatives are unique, we can now talk about the negative of a
vector (the axiom only says ‘a negative’), and we write −v for the negative of v.
Now we are also able to define subtraction: v − w is defined to be v + (−w).
Corollary 3.4. If 0 is the zero scalar in F and v ∈ V is any vector, then 0v = 0.

Proof. Using the property 0 = 0 + 0, which holds in all fields F (think of the reals),
as well as the distributivity axioms, we have 0v = (0 + 0)v = 0v + 0v. Now we can
subtract 0v from both sides:

0v = 0v + 0v
0v − 0v = (0v + 0v) − 0v
0v − 0v = 0v + (0v − 0v) (additive associativity)
0 = 0v + 0 (additive inverse)
0 = 0v. (property of 0).

Note: This is, obviously, an extreme level of detail to go into. In an exam situation,
you would be told explicitly when you are expected to go back to the axioms to prove
a result!


Example 3.5. Consider the set V = R. We will briefly check that the usual addition
and multiplication of real numbers makes R satisfy the eight axioms, and therefore
R is a vector space over R (it is just Rn with n = 1, of course).
In fact, in this case, most of the axioms are self-evident. Writing the axioms out
using symbols, these become:

                  Addition                         Scalar multiplication
Commutativity     x + y = y + x                    −
Associativity     x + (y + z) = (x + y) + z        (λµ)x = λ(µx)
Identity          x + 0 = 0 + x = x                1x = x
Inverse           x + (−x) = (−x) + x = 0          −
Distributivity    λ(x + y) = λx + λy  and  (λ + µ)x = λx + µx

And these are all familiar properties of real numbers which we already use without
thinking about them.

Subspaces

Vector spaces are important objects already, but their real power comes from study-
ing their subspaces. These are subsets where using addition and scalar multiplication
never takes you outside the subset. We say that the subset is closed under addition
and scalar multiplication. Formally:

Definition 3.6. Let V be a vector space over F , and let U be a subset of V ,


denoted U ⊆ V . Then U is called a subspace of V if the following hold:

• U contains the zero vector of V (in symbols, 0 ∈ U ),

• U is closed under addition and under scalar multiplication, i.e.:

  – if v, w ∈ U then v + w ∈ U , and
  – if v ∈ U and λ ∈ F then λv ∈ U .

Another way of phrasing this is ‘U is closed under taking linear combinations.’ One
reason subspaces are important is that they are also vector spaces:
Proposition 3.7. Let V be a vector space over F , and let U be a subspace. Then
U is also a vector space over F , with the same addition and scalar multiplication.

Proof. We need to check that each of the eight axioms holds for U . Mostly, these
are automatically true because they are true for V . For instance, consider the two
distributivity axioms. These say that the equations

λ(v + u) = λv + λu
(λ + µ)v = λv + µv


hold for all λ, µ ∈ F and all v, u ∈ V . Since a vector in U is also a vector in V ,


these equations hold also for all λ ∈ F and all v, u ∈ U . Note that we have had to
use the fact that U is closed under addition and scalar multiplication, so that the
equations above actually make sense in U .
Similarly, commutativity and associativity of addition and scalar multiplication
hold for all vectors in V . Since a vector in U is also a vector in V , this means that
these two axioms also hold for U . The additive identity axiom holds by definition
(0 ∈ U ), and the multiplicative identity axiom also follows straight from the
same axiom in V .
The only axiom that requires a little work is the additive inverse axiom. We know
from before that each vector v ∈ V has a unique negative vector −v ∈ V . We need
to show that if v ∈ U then −v ∈ U also.
To show this, we will prove that −v = (−1)v. In this case, the right-hand side has
the form λv, hence is in U . So we conclude that −v ∈ U .
Now, −v is the unique vector in V satisfying

v + (−v) = 0.

So if we show that v + (−1)v = 0, then the uniqueness will tell us that (−1)v = −v.
Now:

v + (−1)v = 1v + (−1)v (using multiplicative identity)


= (1 − 1)v (distributivity)
= 0v
= 0. (Corollary 3.4)

Hence (−1)v = −v, and since U is closed under scalar multiplication, if v ∈ U then


(−1)v ∈ U too, and therefore −v = (−1)v ∈ U . So U contains negatives, and
therefore satisfies the additive inverse axiom.
In conclusion, we have proved that U is a vector space.
Examples 3.8.

1. Let V = R3 , the set of triples (x, y, z) of real numbers. Let U be the subset

U = {(x, x, 0) : x ∈ R}.

We claim that this is a subspace. To check this, we need 0 ∈ U , and that U is


closed under addition and scalar multiplication.
• The zero vector in R3 is 0 = (0, 0, 0), which has the form (x, x, 0), so 0 ∈ U .

• If v, u ∈ U then v = (x, x, 0) for some x ∈ R and u = (y, y, 0) for some y ∈ R, so v + u = (x + y, x + y, 0), which is also an element of U .

• If v = (x, x, 0) ∈ U and λ ∈ R then λv = (λx, λx, 0), which is also in U .

So U is indeed a subspace of V . We could also have combined checking the last


two items into a single calculation:


• If v1 , . . . , vn ∈ U , then for each i we can write vi = (xi , xi , 0) for some xi ∈ R. Then if we take a linear combination $\sum_{i=1}^{n} \lambda_i v_i$ for scalars λi , this can be written
$$\sum_{i=1}^{n} \lambda_i v_i = \left(\sum_{i=1}^{n} \lambda_i x_i,\; \sum_{i=1}^{n} \lambda_i x_i,\; 0\right),$$
which is also a vector in U . Since the scalars λi were arbitrary, this shows that U is closed under both addition and scalar multiplication.

2. Let V = C3 , and let U be the subset

U = {(x, x + 1, 0) : x ∈ C}.

We claim that this is not a subspace (equivalently, it is not a vector space). For
example, we can see that 0 ≠ (x, x + 1, 0) for any x ∈ C, so U doesn’t contain 0.
Also, U is not closed under addition, because

(x, x + 1, 0) + (y, y + 1, 0) = (x + y, x + y + 2, 0)

and the right-hand side here does not have the form (z, z + 1, 0) for any z ∈ C.
A third reason that U is not a subspace is that it is not closed under scalar
multiplication, because

λ(x, x + 1, 0) = (λx, λx + λ, 0)

and if we pick any complex number λ ≠ 1 then the right-hand side does not have
the form (y, y + 1, 0), so is not an element of U .

3. Let V = R3 again, and this time let U be the subset

U = {(x, y, z) : x, y, z ∈ R, x ≥ 0}.

This contains 0, and is closed under addition (Exercise: check these for yourself!)
but U is not a subspace, because it is not closed under scalar multiplication; e.g.
(−1)(x, y, z) = (−x, −y, −z), which is not in U if we pick x > 0.

4. Let V = R3 or C3 , and now take

U = {(m, n, p) : m, n, p ∈ Z}.

Since 0 ∈ Z, we have 0 ∈ U , and also U is closed under addition (sums of integers


are integers). But again U is not closed under scalar multiplication; for instance:
   
$$\tfrac{1}{2}(1, 1, 1) = \left(\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}\right) \notin U.$$

Fundamental example: Solutions to a system of linear equations

The following example is fundamentally important, as every subspace of a vector


space can ultimately be expressed in this way.


Let V = Rn or Cn , and think of its elements as column vectors of length n. If A is


an m × n matrix and x ∈ V , then the product Ax is a vector of length m.
So suppose we have a matrix-vector equation

Ax = 0. (∗)

where 0 is the all-zero column vector of length m. We claim that the subset U ⊆ V
of all vectors x satisfying (∗) is a subspace of V . This is an Exercise (Sheet 2).


4 New subspaces from old (sums and intersections)

Recall: If V is any set, and if U and W are subsets, then the intersection of U and
W is the set
U ∩ W = {x ∈ V : x ∈ U and x ∈ W }.

Definition 4.1. If U , W are subspaces of a vector space V , then the sum of U


and W is the set
U + W = {u + w : u ∈ U, w ∈ W }.

Proposition 4.2. If U and W are subspaces of V , then so are U ∩ W and U + W .

Proof. We check the subspace conditions, firstly for U ∩ W :

• 0 ∈ U and 0 ∈ W , since they are subspaces, and so 0 ∈ U ∩ W .

• If v, v′ ∈ U ∩ W then v + v′ ∈ U , since it is a subspace, and v + v′ ∈ W , since it is a subspace, and so v + v′ ∈ U ∩ W .

• If v ∈ U ∩ W and λ ∈ F then λv ∈ U and λv ∈ W since these are subspaces, and therefore λv ∈ U ∩ W .

Hence U ∩ W is a subspace of V . Similarly, for U + W :

• 0 = 0 + 0 ∈ U + W , where the first summand lies in U and the second in W .

• If u1 + w1 ∈ U + W and u2 + w2 ∈ U + W , then
  (u1 + w1 ) + (u2 + w2 ) = (u1 + u2 ) + (w1 + w2 ) ∈ U + W ,
  since u1 + u2 ∈ U and w1 + w2 ∈ W .

• If u + w ∈ U + W and λ ∈ F then λ(u + w) = λu + λw ∈ U + W , since λu ∈ U and λw ∈ W .

Hence U + W is also a subspace of V .

Examples 4.3 (Special cases).

• If U ⊆ W then U ∩ W = U (this is clear) and also

  U + W = {u + w : u ∈ U, w ∈ W }
        = {0 + (u + w) : u ∈ U, w ∈ W }    (note u + w ∈ W , since U ⊆ W )
        = {0 + w′ : w′ ∈ W }
        = W.


• Similarly if U = W then U = W = U ∩ W = U + W .

• For a concrete example, take V = R2 and let U and W be lines through the
origin. Then we have

U = {λv1 : λ ∈ R},
W = {λv2 : λ ∈ R}

for some non-zero vectors v1 , v2 ∈ R2 .


Now, U and W are the same line if and only if v1 and v2 are scalar multiples
of each other, say v2 = λv1 for some fixed scalar λ. In this case we have
U = W = U ∩ W , and

U + W = {u + w : u ∈ U, w ∈ W }
= {µ1 (λv1 ) + µ2 v1 : µ1 , µ2 ∈ R}
= {γv1 : γ ∈ R}
= U = W = U ∩ W.

On the other hand, if U and W are not the same line, then the only multiples
of v1 and v2 which are equal are the zero multiples, i.e. 0. Hence U ∩ W = {0}
in this case. Moreover, U + W will be equal to all of R2 . This can easily be
seen if we draw a picture (and we’ll see a more rigorous way to show this later).


5 Linear independence, spanning sets and bases

Linear dependence and independence

Consider the linear equation x+y +z = 0. Some easy solutions to this are (x, y, z) =
(1, −1, 0), (0, 1, −1) and (−1, 0, 1). But these three solutions are redundant: For
example, we can write

(−1, 0, 1) = −(0, 1, −1) − (1, −1, 0),

and because the equation is linear, we know that any linear combination of solutions will
also be a solution. So the solution (−1, 0, 1) can be recovered from the other two
using sums and scalar multiples. We say that the set {(1, −1, 0), (0, 1, −1), (−1, 0, 1)}
is linearly dependent. More rigorously:

Definition 5.1. A set of vectors {v1 , . . . , vn } is called linearly dependent if


some equation
a1 v1 + a2 v2 + · · · + an vn = 0 (∗)
is satisfied, where at least one coefficient ai is non-zero.
A set of vectors is called linearly independent if it is not linearly dependent.
An equation of the form (∗) is called a linear relation. We say that the linear
relation is non-trivial if at least one coefficient ai is non-zero.

We can also turn this definition around: The set {v1 , . . . , vn } is linearly independent
precisely when the only relation (∗) it satisfies is the trivial relation with a1 = a2 =
· · · = an = 0.
Remark: A set of vectors {v1 , . . . , vn } is linearly dependent if and only if some vi can be written as a linear combination of the others: If {v1 , . . . , vn } satisfy the linear equation (∗) above, and if ai ≠ 0, then we can write
$$v_i = -\frac{1}{a_i}\sum_{j \neq i} a_j v_j.$$
Conversely, if we can write $v_i = \sum_{j \neq i} b_j v_j$ for some scalars bj then, noticing that vi has a non-zero coefficient (‘1’), we can move everything to the left-hand side to get a non-trivial linear relation of the form (∗).
Examples 5.2.

1. If v1 , v2 are two vectors in a vector space V then {v1 , v2 } is linearly dependent


if and only if a1 v1 + a2 v2 = 0, where either a1 or a2 is non-zero. This is the same
thing as saying that either v1 is a multiple of v2 , or vice-versa (or both).

2. Consider the familiar ‘geometric’ vectors in R2 or R3 , where we think of (x, y) as


the arrow going from the origin to the point (x, y), and similarly for R3 . By the


previous example, we know that two such vectors form a linearly dependent set
if and only if the vectors are parallel. In particular, two such vectors are linearly
independent if and only if they are not parallel (i.e. they are not contained in the
same line through the origin).
3. The set C of complex numbers can be thought of as a vector space over C, or as
a vector space over R (thinking of the complex number x + iy as a pair (x, y) of
real numbers).
• As a vector space over C, all pairs of complex numbers are scalar multiples of each other, so every set {z1 , z2 } of distinct complex numbers is linearly dependent.

• On the other hand, as a vector space over R, two complex numbers z1 = x1 + iy1 , z2 = x2 + iy2 are linearly dependent if and only if there are real numbers λ1 and λ2 , not both zero, such that
λ1 (x1 + iy1 ) + λ2 (x2 + iy2 ) = 0.
Identifying real and imaginary parts, this means that we need
λ 1 x1 + λ 2 x2 = 0 and λ1 y1 + λ2 y2 = 0.
So for instance, the set {1, i} is linearly independent, since 1 and i are not
(real) scalar multiples of one another.

The following proposition shows that linear dependence is preserved when making
a set larger:
Proposition 5.3. Let V be a vector space, and suppose {v1 , . . . , vn } is linearly
dependent. Then for every w ∈ V , the set {v1 , . . . , vn , w} is also linearly dependent.

Proof. By hypothesis, the set {v1 , . . . , vn } satisfies some non-trivial linear relation, say $\sum_{i=1}^{n} a_i v_i = 0$, where at least one of the ai is non-zero. We can extend this to a relation for the bigger set {v1 , . . . , vn , w} by taking w to have coefficient 0, i.e. $(\sum_{i=1}^{n} a_i v_i) + 0w = 0$, and this still has a non-zero coefficient, namely ai , so the bigger set is linearly dependent.
Example 5.4. Every set of three vectors in R2 is linearly dependent.
Suppose u, v, w ∈ R2 . If two of these are linearly dependent, then by the proposition,
so are all three. Therefore we can assume that each pair of vectors is linearly
independent. In particular, u and v point in different directions in R2 . But this now
means that we can get every vector in R2 by taking a linear combination of u and
v. In particular, I can write w = au + bv for some scalars a and b. So the set is
linearly dependent.

In conclusion: We can have a set of two linearly independent vectors in R2 , but not
a set of three. Moreover, we’ve seen that a set of two geometric vectors is linearly
independent if and only if the vectors define a plane. This discussion extends to R3
and above:


• The empty set ∅ and the zero vector 0 ∈ R3 define the zero subspace, contain-
ing only the zero vector 0.

• A single non-zero vector v ∈ R3 defines a line {λv : λ ∈ R}.

• Two vectors v, u define a line if they are linearly dependent (and not both 0),
otherwise they define the plane

{av + bu : a, b ∈ R}.

• Three vectors in R3 are linearly dependent precisely when they define a line
or a plane. If they define a plane, then one of them lies in the plane defined
by the other two (so it is a linear combination of them). If they define a line,
then all three of them are scalar multiples of each other.

• We can’t have four linearly independent vectors in R3 : If we have three lin-


early independent vectors, then we can make every vector in R3 from linear
combinations of these; this means that four vectors always satisfy a non-trivial
linear relation.

This pattern extends to Rn : This contains a set of n linearly independent vectors,


but not a set of n + 1 linearly independent vectors. We’ll come back to this later on.
Remark 5.5 (The zero vector is linearly dependent). The vector 0 is the unique
vector such that λ0 = 0 for all scalars λ. If λ ≠ 0 then ‘λ0 = 0’ is a non-trivial
linear relation! So {0} is a linearly dependent set. By Proposition 5.3 above, any
set which contains 0 is automatically linearly dependent.

Spans and spanning sets

The discussion above leads us on to an important class of subspaces.

Definition 5.6. Let V be a vector space over F and let X be a subset of V . The span of X is the subset of V consisting of all linear combinations of elements of X. We denote this by ⟨X⟩:

⟨X⟩ = {λ1 v1 + λ2 v2 + · · · + λr vr : λ1 , . . . , λr ∈ F, v1 , . . . , vr ∈ X, r ≥ 0}.

In some places this is written ⟨X⟩F , to emphasise that the scalars λi come from F . Other notation commonly found in books is Span(X) or SpanF (X). If X = {v1 , v2 , . . . , vn } is finite, we sometimes write ⟨v1 , . . . , vn ⟩ instead of ⟨{v1 , v2 , . . . , vn }⟩.
If ⟨X⟩ = V then we say that X is a spanning set in V , or X spans V .

Lemma 5.7. ⟨X⟩ is a subspace of V .

We get 0 ∈ ⟨X⟩ by taking the zero linear combination, i.e. λi = 0 for all i. To see
that ⟨X⟩ is closed under taking linear combinations, recall from Exercise Sheet 1


that a linear combination of some linear combinations of elements of X is again a linear combination of elements of X! In more detail, if u1 , . . . , um ∈ ⟨X⟩, so that $u_i = \sum_{j=1}^{n} \lambda_{ij} v_j$ for some scalars λij , then a linear combination $\sum_{i=1}^{m} a_i u_i$ can be written
$$\sum_{i=1}^{m} a_i u_i = \sum_{i=1}^{m} a_i \sum_{j=1}^{n} \lambda_{ij} v_j = \sum_{j=1}^{n} \left(\sum_{i=1}^{m} a_i \lambda_{ij}\right) v_j,$$
which is a linear combination of v1 , . . . , vn , and hence an element of ⟨X⟩.
We can think of the span as the smallest subspace of V which contains X (i.e. if U is a subspace of V , and X ⊆ U , then ⟨X⟩ ⊆ U also, because U is closed under taking linear combinations).
Examples 5.8.

1. If V = Rn (n ≥ 0) and v ≠ 0, then ⟨v⟩ = {λv : λ ∈ R}, the line through the origin containing v.

2. If v, w ∈ Rn then ⟨v, w⟩ = {λv + µw : λ, µ ∈ R}. This set is either:

   • a plane if v and w are linearly independent;

   • a line if either v or w is a multiple of the other, and not both 0. For instance, if v = cw for some c ∈ R then
     {λv + µw : λ, µ ∈ R} = {λcw + µw : λ, µ ∈ R}
                          = {(λc + µ)w : λ, µ ∈ R}
                          = {λw : λ ∈ R}
                          = ⟨w⟩,
     where in the third line, we have used the fact that (λc + µ) takes all real values as we vary λ, µ;

   • if v = w = 0 then ⟨v, w⟩ is the zero subspace, {0}.

Bases and dimension

Linearly independent sets are those which contain no redundant vectors (they are
not ‘too big’). On the other hand, spanning sets are those from which you can obtain
every vector in the space (they are not ‘too small’). Sets which satisfy both of these
properties are therefore extremely useful:
Definition 5.9. A basis of a vector space V is a subset of V which is both
linearly independent and spans V .

[Side-note: The plural of basis is bases, pronounced base-eez. Not to be confused


with the plural of ‘base’, which is pronounced base-iz.]
So far, we have not mentioned any ways to distinguish one vector space from another.
One way is to talk about its dimension. We already have an intuitive notion of what
‘dimension’ means, e.g. R2 is 2-dimensional, R3 is 3-dimensional, and so on. Bases
let us turn this intuition into rigorous mathematics.


Definition 5.10. If V has a finite basis B ⊂ V , then the dimension of V is


defined to be the size of B.

Example 5.11. Let V = Rn or Cn . Then the standard basis of V is the set


{e1 , . . . , en }, where for each i, the element ei is the vector having 1 in the i-th
coordinate, and zero everywhere else. This set spans V , because we can write

(a1 , a2 , . . . , an ) = a1 e1 + a2 e2 + · · · + an en

for any choice of vector (a1 , . . . , an ) ∈ V . The set is also linearly independent, because the i-th coordinate of $\sum_{i=1}^{n} a_i e_i$ is ai , so if this sum is the zero vector
(0, 0, . . . , 0) then we have ai = 0 for all i. So {e1 , . . . , en } is indeed a basis of V .
In particular, Rn has dimension n as a vector space over R, and Cn has dimension
n as a vector space over C, as we might have hoped!
Remark 5.12. The one-point vector space {0} is special. Its only subsets are ∅
and {0}, so its only basis is the empty set ∅, and hence this is a 0-dimensional
vector space.

Methods for spans and linear (in)dependence

When we are given vectors in F n (i.e. lists of numbers), there are straightforward
‘algorithms’ which will tell us whether the set is linearly independent or spanning.
To see whether a finite set {v1 , . . . , vr } is linearly dependent, we want to find a
linear relation between its vectors. Gaussian elimination lets us do this:

• Write the vectors as the rows of a matrix. If we have r vectors in F n , this gives an r × n matrix, call it A.

• Use Gaussian elimination to put this matrix in row-echelon form; let B be the matrix we obtain. In the end, each row of B is a non-zero linear combination of the rows of A (which are the vectors v1 , . . . , vr ).

• So if B has a row of zeroes, this tells us that we have a non-trivial linear relation $\sum_{i=1}^{r} a_i v_i = 0$.

Conversely, if the rows of A are linearly dependent, then B will always have a row of
zeroes at the bottom (Gaussian elimination produces as many such rows as possible).
Example 5.13. Let V = R3 or C3 and consider the set

X = {(1, 2, 3), (1, 1, 2), (1, 4, 5)}.

Is this set linearly independent? Does it span V ? If not, what is Span(X)?


Call the three vectors v1 , v2 , v3 . As suggested above, we use these vectors as the
rows of a matrix, and reduce to row echelon form. We will also keep track of the


linear combinations we make along the way:


   
$$\begin{pmatrix} 1 & 2 & 3\\ 1 & 1 & 2\\ 1 & 4 & 5 \end{pmatrix}
\begin{matrix} v_1\\ v_2\\ v_3 \end{matrix}
\longrightarrow
\begin{pmatrix} 1 & 2 & 3\\ 0 & -1 & -1\\ 0 & 2 & 2 \end{pmatrix}
\begin{matrix} v_1\\ v_2 - v_1\\ v_3 - v_1 \end{matrix}
\longrightarrow
\begin{pmatrix} 1 & 2 & 3\\ 0 & -1 & -1\\ 0 & 0 & 0 \end{pmatrix}
\begin{matrix} v_1\\ v_2 - v_1\\ (v_3 - v_1) + 2(v_2 - v_1) \end{matrix}$$

And this final row shows us that −3v1 + 2v2 + v3 = 0. So X is linearly dependent.
To calculate Span(X), we now know that we can write v3 = 3v1 − 2v2 . Therefore,
if a vector is a linear combination of v1 , v2 and v3 , we can substitute for v3 to write
the vector as a linear combination of v1 and v2 . Therefore Span(X) = Span(v1 , v2 ).
Finally, note that the set {v1 , v2 } is linearly independent. We can see this in two
ways: We can spot that the two vectors are not scalar multiples of one another (which
happens to work in this special case), or (a method which works more generally):
we can write these vectors as the rows of a matrix, and use Gaussian elimination. In this case, the matrix reduces to $\begin{pmatrix} 1 & 2 & 3\\ 0 & -1 & -1 \end{pmatrix}$, which has no rows of zeroes, which tells us the set is linearly independent.
tells us the set is linearly independent.
Now, as {v1 , v2 } is linearly independent, it is a basis of the subspace Span({v1 , v2 }).
So this span is two-dimensional (geometrically, it is a plane in R3 ). More explicitly,

Span({v1 , v2 , v3 }) = Span({v1 , v2 }) = {a(1, 2, 3) + b(1, 1, 2) : a, b ∈ R}


= {(a + b, 2a + b, 3a + 2b) : a, b ∈ R}.
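The row-reduction method above is also easy to carry out by computer; here is a sketch assuming sympy is available, where the rows of M are the vectors v1 , v2 , v3 of Example 5.13:

import sympy as sp

M = sp.Matrix([[1, 2, 3],
               [1, 1, 2],
               [1, 4, 5]])

print(M.rank())         # 2, so the three vectors are linearly dependent
print(M.T.nullspace())  # [Matrix([[-3], [2], [1]])], i.e. -3*v1 + 2*v2 + v3 = 0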

Basis, minimal spanning sets, maximal linearly independent sets

The following result is very important:


Proposition 5.14. Let X be a subset of a vector space V . The following properties
are equivalent:

• X is a minimal spanning set in V (i.e. all proper subsets of X are not spanning).

• X is a maximal linearly independent subset of V (i.e. all subsets of V which properly contain X are linearly dependent).

• X is a basis of V .

The proof of this result is an exercise on Exercise Sheet 3. This result is useful
because it tells us that we can always find a basis by shrinking a spanning set, or
adding to a linearly independent set. We will make use of this next week, when
proving that the dimension (size of a basis) is a uniquely-determined number.


Bases and linear systems

The terms ‘spanning’, ‘linearly independent’ and ‘basis’ all have interpretations in
terms of solutions to linear systems:
Proposition 5.15. Let V be a vector space and let X ⊆ V . Then

(i) X spans V if and only if, for all subsets {v1 , . . . , vr } ⊆ X and each w ∈ V ,
the equation
$$\sum_{i=1}^{r} \lambda_i v_i = w \qquad (†)$$

has at least one solution (λ1 , . . . , λr ).

(ii) X is linearly independent if and only if every equation of the form (†) has
at most one solution, for each w ∈ V .

(iii) X is a basis of V if and only if every equation (†) has a unique solution, for
each w ∈ V .

Proof. (i) is just another way of saying X spans V .


(ii) Suppose two solutions give us the same vector w, so

λ1 v1 + · · · + λn vn = w = µ1 v1 + · · · + µn vn

for some scalars λi and µi . This equation rearranges to $\sum_{i=1}^{n} (\lambda_i - \mu_i) v_i = 0$. This is
a linear relation, and we are assuming that X is linearly independent, and therefore
all the coefficients must be zero. In other words, λi = µi for all i.
Conversely, if X is linearly dependent, so that $\sum \alpha_i v_i = 0$ where some scalar αi is non-zero, then we see that the equation
$$\sum_{i=1}^{r} \lambda_i v_i = 0$$
has two solutions: namely (λ1 , . . . , λr ) = (0, 0, . . . , 0), or (λ1 , . . . , λr ) = (α1 , . . . , αr ). Since some αi ≠ 0, these are distinct solutions.
(iii) now follows from (i) and (ii).


6 All bases have the same size

We now have a mathematical definition of the dimension of a vector space. We now


prove that this definition of ‘dimension’ makes sense – in other words, we need to
show that a vector space cannot contain two bases of different sizes!
Recall that Sheet 3 asked you to show that X is a basis of V if and only if it is a
minimal spanning set (i.e. every proper subset is not spanning), if and only if X is a
maximal linearly independent set (i.e. every bigger set Y ⊋ X is linearly dependent).

Corollary 6.1. (1) Every linearly independent set can be extended to a basis (just
add in vectors not in the span, one at a time, until you get a spanning set. This
will not break the property ‘linearly independent’).
(2) Every spanning set can be reduced to a basis (throw away elements until you
cannot throw away any more without breaking the property ‘spanning’).

We will now prove the following very important result, which tells us that the di-
mension is a uniquely-defined property of a vector space.
Theorem 6.2. Let V be a vector space.

(1) Let S = {v1 , . . . , vn } ⊆ V be a spanning set, and let I = {w1 , . . . , wm } ⊆ V


be linearly independent. Then n ≥ m.

(2) All bases of V have the same size.

Proof. (1) The idea is to show that we can replace the vectors in S with vectors in I,
one at a time, without breaking the ‘spanning’ property. By doing this, we will end
up showing that we can fit the whole of I into S, so that |I| ≤ |S|.
To get started, note that because S is spanning, every vector in I can be written as
a linear combination of elements in S. In particular, there exist scalars λ1 , . . . , λn
such that
$$w_1 = \sum_{i=1}^{n} \lambda_i v_i.$$
Now, because I is linearly independent, the right-hand side of this equation cannot
be 0, which means that λi ≠ 0 for at least one i. Without loss of generality, we can re-order the vectors v1 , . . . , vn so that λ1 ≠ 0 in the above equation. This gives us
$$w_1 = \lambda_1 v_1 + \cdots + \lambda_n v_n
\;\Rightarrow\; v_1 = \frac{1}{\lambda_1} w_1 - \frac{1}{\lambda_1}(\lambda_2 v_2 + \cdots + \lambda_n v_n).$$
This shows that v1 is in the span ⟨w1 , v2 , v3 , . . . , vn ⟩. This means that
{v1 , v2 , . . . , vn } ⊆ ⟨w1 , v2 , . . . , vn ⟩
⇒ ⟨v1 , v2 , . . . , vn ⟩ ⊆ ⟨w1 , v2 , . . . , vn ⟩


Since S is spanning, the left-hand side is equal to V , and therefore the right-hand
side is also equal to V . This shows that the set
S1 = {w1 , v2 , . . . , vn }
is also spanning.
Similarly, since S1 is spanning, we can write
w2 = µ1 w1 + α2 v2 + α3 v3 + · · · + αn vn
for some scalars µ1 , α2 , . . ., αn . Again, using the fact that I is linearly independent,
at least one of the αi must be non-zero, otherwise we would get the non-trivial
relation w2 = µ1 w1 between elements in I. So again, we can rearrange the elements
{v2 , . . . , vn } to assume that α2 ≠ 0, and we get
$$w_2 = \mu_1 w_1 + \alpha_2 v_2 + \alpha_3 v_3 + \cdots + \alpha_n v_n
\;\Rightarrow\; v_2 = \frac{1}{\alpha_2}(w_2 - \mu_1 w_1) - \frac{1}{\alpha_2}(\alpha_3 v_3 + \cdots + \alpha_n v_n).$$
This shows that v2 ∈ ⟨w1 , w2 , v3 , v4 , . . . , vn ⟩, and therefore
{w1 , v2 , . . . , vn } ⊆ ⟨w1 , w2 , v3 , v4 , . . . , vn ⟩
⇒ ⟨w1 , v2 , . . . , vn ⟩ ⊆ ⟨w1 , w2 , v3 , . . . , vn ⟩,
and since we showed that S1 is spanning, the left-hand side is V hence so is the
right-hand side. Therefore the set
S2 = {w1 , w2 , v3 , . . . , vn }
is also spanning.
We now proceed in this way, inductively: We keep replacing an element of S with
an element from I, until we run out of elements in S or elements in I. We end up
with a spanning set
Sp = {w1 , . . . , wp , vp+1 , vp+2 , . . . , vn }
where p = min{n, m}. Recall that we’re trying to show n ≥ m.
Now note that if n < m then Sp = Sn = {w1 , . . . , wn } spans V , and moreover there
exists another vector wn+1 ∈ I. This means that wn+1 can be written as a linear
combination of the other elements in I; but this is impossible because I is linearly
independent. This is a contradiction, and so we must have n ≥ m, which is what
we wanted to show.
(2) Let B1 and B2 be bases of V . By Part (1), since B1 is spanning and B2 is linearly
independent, we have |B1 | ≥ |B2 |. But for the same reason, since B2 is spanning
and B1 is linearly independent, we have |B2 | ≥ |B1 |. So B1 and B2 have the same
size.
Corollary 6.3. If U ⊆ V are vector spaces, then dim U ≤ dim V .

Proof. Pick a basis of U . This is linearly independent, hence it can be extended to


a basis of V . Therefore a basis of V is at least as large as a basis of U .


Dimensions of sums and intersections

Proposition 6.4. Let V be a vector space and let U , W be subspaces. Then


dim(U + W ) = dim U + dim W − dim(U ∩ W ).

Proof. Since U ∩ W ⊆ U and U ∩ W ⊆ W it follows that
dim(U ∩ W ) ≤ dim U and dim(U ∩ W ) ≤ dim W,
and so we can label the dimensions as follows:
$$\dim(U + W) = \underbrace{\dim U}_{r+s} + \underbrace{\dim W}_{r+t} - \underbrace{\dim(U \cap W)}_{r}.$$

We will prove the result by showing that dim(U + W ) = r + s + t.


Let XU ∩W = {v1 , . . . , vr } be a basis of U ∩ W . This is a linearly independent subset
of U and of W , so we can extend this to a basis of U and a basis of W :
XU = {v1 , . . . , vr , u1 , . . . , us },
XW = {v1 , . . . , vr , w1 , . . . , wt }.
We now claim that X = {v1 , . . . , vr , u1 , . . . , us , w1 , . . . , wt } is a basis of U + W .
Firstly, if u ∈ U and w ∈ W , then we can write
$$u = \left(\sum \lambda_i u_i\right) + \left(\sum \mu_j v_j\right) \quad\text{as } X_U \text{ spans } U, \text{ and}$$
$$w = \left(\sum \lambda'_i w_i\right) + \left(\sum \mu'_j v_j\right) \quad\text{as } X_W \text{ spans } W.$$

But each of the expressions on the right-hand side is in ⟨X⟩, so u + w ∈ ⟨X⟩, too.
Secondly, we must show that X is linearly independent. We know that XU ∩W , XU
and XW are linearly independent since they are bases of their subspaces. So suppose
we have an equation
Σ_{i=1}^r λi vi + Σ_{j=1}^s µj uj + Σ_{k=1}^t σk wk = 0.

We need to show that λi = µj = σk = 0 for all i, j, k. Rearranging the above equation, we get

Σ_{i=1}^r λi vi + Σ_{j=1}^s µj uj = − Σ_{k=1}^t σk wk.
The left-hand side here is in U , and the right-hand side is in W . So both sides lie
in U ∩ W .
In particular, this says that −Σ_{k=1}^t σk wk lies in U ∩ W, hence in the span of XU∩W = {v1, . . . , vr}; writing it as a linear combination of v1, . . . , vr gives a linear relation among the elements of XW = {v1, . . . , vr, w1, . . . , wt}, and since XW is linearly independent this forces σk = 0 for all k.
An identical argument shows that µj = 0 for all j; then the above equation becomes Σ_{i=1}^r λi vi = 0, and so λi = 0 for all i since XU∩W is linearly independent.

Examples 6.5.

1. Consider two lines U1 and U2 in R2. These each have dimension 1, and so we have two possibilities: since dim(U1 ∩ U2) + dim(U1 + U2) = dim U1 + dim U2 = 2 and dim(U1 ∩ U2) ≤ dim U1 = 1, either the intersection is a point (dimension 0) and the sum is all of R2 (dimension 2), or the two lines are equal, in which case the intersection and the sum are both that line (dimension 1).

2. The same thing holds for two lines in R3 , or in R4 , and so on.

3. Now let U1 and U2 be two planes in R3 . Now we have

dim(U1 ∩ U2 ) + dim(U1 + U2 ) = 4

and moreover dim(U1 ∩ U2 ) ≤ dim U1 = 2 and dim(U1 + U2 ) ≤ dim R3 = 3. So we


have two cases: Either dim(U1 ∩ U2 ) = 1 and dim(U1 + U2 ) = 3, so the two planes
intersect in a line and sum to all of R3 , or we have dim(U1 +U2 ) = dim(U1 ∩U2 ) =
2, which only happens when the two planes are the same.

4. Things get more interesting when we consider two planes U1 and U2 in R4 . Now,
we have three possibilities:

dim(U1 ∩ U2 ) dim(U1 + U2 )
(a) 0 4
(b) 1 3
(c) 2 2

Cases (b) and (c) are familiar: If two planes intersect in a line, then their sum
is a 3-dimensional subspace of R4 (note: just as there are lots of lines in R2 and
lots of planes in R3 , there are lots of 3-D subspaces of R4 ).
But case (a) is new: in R4 (and in higher-dimensional spaces Rn with n > 4),
it is possible for two planes to meet in a single point (the origin). In this case,
the sum of the two planes is R4 (or a 4-dimensional subspace of Rn , if n > 4).
To make this very concrete, let U1 and U2 be the following subspaces of R4 :

U1 = {(x, y, 0, 0) : x, y ∈ R},
U2 = {(0, 0, z, u) : z, u ∈ R}.

Then clearly U1 ∩ U2 = {(0, 0, 0, 0)}, and we can trivially spot how any vector in
R4 can be written as a sum of a vector in U1 and a vector in U2 .
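For a quick numerical check of this last example, here is a small sketch (an illustrative addition, not part of the notes; it assumes NumPy is available):

    import numpy as np

    # Spanning vectors for U1 = {(x, y, 0, 0)} and U2 = {(0, 0, z, u)} inside R^4.
    U1 = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0]], dtype=float)
    U2 = np.array([[0, 0, 1, 0],
                   [0, 0, 0, 1]], dtype=float)

    # dim(U1 + U2) is the rank of the matrix whose rows are all four spanning vectors.
    dim_sum = np.linalg.matrix_rank(np.vstack([U1, U2]))

    # dim(U1 ∩ U2) then follows from Proposition 6.4.
    dim_int = U1.shape[0] + U2.shape[0] - dim_sum

    print(dim_sum, dim_int)   # 4 0: the planes sum to R^4 and meet only at the origin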

7 The rank of a matrix

Row rank, column rank, determinantal rank

Observation 7.1. If A = (aij) is an m × n matrix with entries in F, then the rows of A are vectors in F^n, and the columns are vectors in F^m:

row vectors (ai1, ai2, . . . , ain) ∈ F^n,   one for each i = 1, . . . , m,
column vectors (a1j, a2j, . . . , amj)^T ∈ F^m,   one for each j = 1, . . . , n.
Definition 7.2. Let A be an m × n matrix with entries in a field F .

ˆ The row space of A is the subspace of F n spanned by the rows of A.

ˆ The column space of A is the subspace of F m spanned by the columns of


A.

ˆ The row rank of A is the dimension of the row space of A.

ˆ The column rank of A is the dimension of the column space of A.

ˆ The determinantal rank of A is the size of the largest square sub-matrix


of A with non-zero determinant.

Examples 7.3.
 
1. If A is the all-zero m × n matrix (every entry is 0), then the row space of A is spanned by the zero vector 0 = (0, 0, . . . , 0) ∈ R^n, hence is {0}, and similarly the column space of A is {0} ⊆ R^m. All of the square sub-matrices of size ≥ 1 have determinant zero, and so the determinantal rank of A is zero.
 
2. Let A be the 2 × 3 matrix with rows (0, 1, 2) and (0, 1, 3). The row space is the span of the set {(0, 1, 2), (0, 1, 3)}. These two vectors are not multiples of each other, so the set is linearly independent, hence it spans a 2-dimensional subspace of R^3, and the row rank of A is 2.
The column space of A is the span of the set {(0, 0)^T, (1, 1)^T, (2, 3)^T}. This is linearly dependent, because it contains the zero vector (0, 0)^T. However, if we throw this away, the set {(1, 1)^T, (2, 3)^T} is linearly independent, and so the column rank of A is 2.

Finally, we see that A has a 2 × 2 sub-matrix with non-zero determinant, namely the sub-matrix with rows (1, 2) and (1, 3).

3. Let A be the 2 × 3 matrix with rows (1, 2, 3) and (1, 2, 3). This time, we see that the set {(1, 2, 3), (1, 2, 3)} is linearly dependent. Since the set {(1, 2, 3)} is linearly independent, the row rank of A is 1.
The column space is the span of {(1, 1)^T, (2, 2)^T, (3, 3)^T}. Since these vectors are all multiples of each other, we can throw two of them away without changing what the span is. So the column space is the span of {(1, 1)^T}, so the column rank is 1.
Finally, A and all its 2 × 2 sub-matrices have determinant zero: for instance the sub-matrix with rows (1, 3) and (1, 3) has determinant zero because its two rows are multiples of one another. However A has 1 × 1 sub-matrices with non-zero determinant, for instance (2). So the determinantal rank of A is 1.
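These ranks are easy to confirm by machine. The following sketch (an addition for illustration only, assuming NumPy) checks the matrices from Examples 7.3(2) and 7.3(3):

    import numpy as np

    A2 = np.array([[0, 1, 2],
                   [0, 1, 3]], dtype=float)   # Example 7.3(2)
    A3 = np.array([[1, 2, 3],
                   [1, 2, 3]], dtype=float)   # Example 7.3(3)

    # matrix_rank returns the dimension of the row space (equivalently, the column space).
    print(np.linalg.matrix_rank(A2))   # 2
    print(np.linalg.matrix_rank(A3))   # 1

    # Row rank equals column rank, so transposing does not change the answer.
    print(np.linalg.matrix_rank(A2.T), np.linalg.matrix_rank(A3.T))   # 2 1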

Theorem 7.4. Let A be an m × n matrix.

(1) The row rank, column rank and determinantal rank of A are all equal.

(2) If A is square and diagonalisable, then the numbers in (1) are equal to the number of non-zero eigenvalues of A, counted with multiplicity.

We will prove the first part of this theorem. The second part requires properties of
linear maps which we have not yet covered.
Definition 7.5. The rank of an m × n matrix A is defined to be any of the
above numbers (which are all the same). This is denoted by rank(A).

Example 7.6. Let A be the diagonal n × n matrix with diagonal entries λ1 , λ2 , . . .,


λn . Then a maximal linearly independent set of rows (or columns) is given by the set
of all rows (or columns) where λi 6= 0. Moreover, if we delete all rows and columns
where λi = 0, we are left with a matrix whose determinant is the product of the
other λi s, which are all non-zero. So the row-rank, column rank and determinantal
rank are evidently all the same in this case.

Row rank equals column rank

This part of Theorem 7.4 can be boiled down to two statements. Firstly, Propo-
sition 7.9 shows that elementary row operations don’t change the row space of a
matrix. Secondly, Proposition 7.10 shows that row operations don’t change the col-
umn rank (although the column space itself might be shifted around). This means
that we can assume that our matrix is in reduced row echelon form, where it is easier
to study the row space and column space.
First, we need another way of thinking about elementary row operations:

Proposition 7.7. Let A be an m × n matrix. Then performing elementary row


operations on A corresponds to multiplying A on the left by an invertible matrix,
as follows:

1. If we swap rows i and j of A, we get the matrix Sij A, where Sij is the matrix
formed by swapping rows i and j of the m × m identity matrix Im .

2. If we multiply row i of A by a scalar λ, then we get the matrix Di (λ)A, where


Di (λ) is the matrix formed by multiplying the (i, i)-entry of Im by λ.

3. If we add the j-th row of A to the i-th row, we get Eij A, where Eij is the matrix
formed from Im by adding 1 to the (i, j) position.

4. All of the matrices Sij , Di (λ) and Eij have non-zero determinant, so they are invertible.

The proof of this proposition is an exercise on this week’s Exercise Sheet. It is


not a difficult proof, however – simply calculate the various products Sij A, etc., and
notice that the result is what we say it is (A with two rows swapped, and so on).
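For instance, one can check the three types of elementary matrix on a small example numerically (a sketch, not from the notes; NumPy assumed, with 0-based indices in the code standing for the 1-based row numbers in the proposition):

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
    I = np.eye(3)

    S12 = I[[1, 0, 2], :]        # swap rows 1 and 2 of the identity
    D2 = np.diag([1., 5., 1.])   # multiply the (2,2)-entry of the identity by 5
    E13 = I.copy()
    E13[0, 2] += 1.0             # add 1 in the (1,3) position

    print(S12 @ A)   # A with rows 1 and 2 swapped
    print(D2 @ A)    # A with row 2 multiplied by 5
    print(E13 @ A)   # A with row 3 added to row 1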
Corollary 7.8. Let A be an m × n matrix. When we put A into row echelon form
(or reduced row echelon form), the matrix we get is equal to BA, for some invertible
m × m matrix B.

This follows because B is the product of various matrices Sij, Di(λ) and Eij from the above proposition. Since det(XY) = det(X) det(Y) for all matrices, det(B) ≠ 0, so B is also invertible.
Proposition 7.9. If A is an m × n matrix and B is invertible, then

row space(A) = row space(BA).

Proof. Each row of BA is a linear combination of rows of A. So linear combinations


of rows of BA are also linear combinations of rows of A. Therefore

row space(BA) ⊆ row space(A).

On the other hand, since B is invertible we can apply the same reasoning to BA instead of A, to get

row space(B −1 (BA)) ⊆ row space(BA).

And since the left-hand side is just the row space of A, the spaces are equal, as
claimed.
Proposition 7.10. If A is an m × n matrix and B is invertible, then

column rank(A) = column rank(BA).

Proof. Let {v1 , . . . , vn } be the column vectors of A. Then the column vectors of
BA are {Bv1 , . . . , Bvn }. So we want to show that the spans of these two sets have
the same dimension.
Now suppose that we have a non-trivial linear relation Σ_{i=1}^n λi (Bvi) = 0. This rearranges to B(Σ_{i=1}^n λi vi) = 0. Since B is invertible, the only solution of Bx = 0 is x = 0, and so we have Σ_{i=1}^n λi vi = 0.
Thus, every non-trivial linear relation in {Bv1, . . . , Bvn} gives a non-trivial linear relation in {v1, . . . , vn}. So if we reduce {Bv1, . . . , Bvn} to a basis of the column
space of BA by throwing away some vectors Bvi , we can also throw away the
corresponding vectors vi from {v1 , . . . , vn }, and we still get a spanning set. We
don’t know that this new set is a basis for the column space of A, but it does
contain a basis, and therefore

column rank(A) ≤ column rank(BA).

And as in the previous proposition, we get the reverse inequality by considering the
matrices B −1 and BA instead of B and A.
Corollary 7.11. The row rank and the column rank of a matrix are equal.

Proof. By the above propositions, we can assume that A is in reduced row echelon form: each non-zero row starts with a leading 1, these leading 1s move strictly to the right as we go down the rows, the remaining entries of the non-zero rows are unknowns, and any rows consisting entirely of zeroes sit at the bottom of the matrix.
In this case, the row rank is clearly equal to the number of non-zero rows. But also,
reading left-to-right, the column rank only ever increases when we encounter a new
leading ‘1’. The number of these is equal to the number of non-zero rows. So these
are the same.

Matrices of full rank

It is clear that the row rank of an m × n matrix A cannot be bigger than m, the
number of rows, and the column rank of A cannot be bigger than n. Since the row
and column ranks are the same, we have

row rank(A) = column rank(A) ≤ min{m, n}.

Definition 7.12. We say an m × n matrix A has full rank if

row rank(A) = column rank(A) = min{m, n}.

If A happens to be square (so m = n), we have the following result, which is a


special case of Theorem 7.4.

Theorem 7.13. For an n × n matrix A, the following are equivalent.

(a) A has full rank (i.e. row rank(A) = col rank(A) = n).

(b) A is invertible. (i.e. det(A) 6= 0, or A has determinantal rank n).

(c) All of the eigenvalues of A are non-zero.

The equivalence of (a) and (b) is left as an exercise on this week’s Exercise Sheet.
Here, we prove that (a) and (c) are equivalent. Firstly, notice that ‘rank n’ means
that the n columns of A are linearly independent, or in other words, there are no
non-trivial linear relations between the columns. Now observe that a linear relation Σ_{i=1}^n ai vi = 0 between the column vectors of A is the same thing as saying that Aa = 0, where a = (a1, a2, . . . , an)^T. So a non-trivial such linear relation exists if and only if A has an eigenvector with eigenvalue 0. This shows that (c) is equivalent to (a).
 
Example 7.14. The matrix with rows (1, 10, 11), (2, 4, 50) and (−27, 4, 3) has determinant −12472 ≠ 0. So its three rows are linearly independent, and its three columns are linearly independent. So these are both bases of the space F^3 (where F is whichever field we are using for the entries of the matrix).
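This example can also be checked numerically (a sketch added for illustration, assuming NumPy; the field here is taken to be R):

    import numpy as np

    M = np.array([[1, 10, 11],
                  [2, 4, 50],
                  [-27, 4, 3]], dtype=float)

    print(np.linalg.det(M))          # approximately -12472, which is non-zero
    print(np.linalg.matrix_rank(M))  # 3, so the rows (and columns) are linearly independent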

Proof of Theorem 7.4(i)


Now that we know the row rank and column rank are equal, it remains to prove that these equal the determinantal rank.
In one direction, suppose that our matrix A has row rank k. We want to show
that A has a k × k sub-matrix with non-zero determinant. Pick a set of k linearly
independent rows of A. This gives a k × n sub-matrix of rank k. So we can pick a
set of k linearly independent columns. This gives a k × k sub-matrix of rank k. But
this is a matrix of full rank, so its determinant is non-zero by Theorem 7.13. This
shows that
determinantal rank(A) ≥ row rank(A).

Conversely, suppose A has determinantal rank k, so that A has a k × k sub-matrix


with non-zero determinant. Then by Theorem 7.13, the rows (and columns) of
this sub-matrix are linearly independent (there are no non-trivial linear relations

between them). But this means there are no non-trivial linear relations between the
corresponding k rows (or columns) of A. This shows that

row rank(A) ≥ determinantal rank(A),

and so these quantities are equal.

8 Coordinates and change of basis

Coordinates

Bases are extremely useful objects in linear algebra. One reason for this is that they
let us talk about vectors using coordinates. For instance, your location on a map
can be specified by two numbers precisely because R2 has a basis of size two (in the
map example, this basis would probably be the two vectors ‘1 unit north, 1 unit
east’). This extends to higher dimensions, for instance, you can arrange a meeting
by specifying three coordinates x, y, z and a time t, and this makes sense because
R4 has a basis of size 4.
The space Rn is n-dimensional, because the standard basis {e1 , . . . , en } has size n.
In particular, R3 is 3-dimensional. Its subspaces each have dimension 0, 1, 2 or 3.
The only 0-dimensional subspace is {0}, and the only three dimensional subspace
of R3 is itself (the whole space). We saw last time that the 1-dimensional subspaces
are the lines (each the span of a non-zero vector), and the 2-dimensional subspaces
are the planes (which are the spans of pairs of linearly independent vectors).
Example 8.1. Let V = R3 , and let U be the set of (x, y, z) satisfying x + y + z = 0.
Then U is a subspace (check this!), and we will show that U is a plane. The equation
x + y + z = 0 lets us eliminate one variable, say z = −x − y. This gives us a method
of finding a basis for U . We take an arbitrary vector (x, y, z) = (x, y, −x − y) ∈ U .
This can be written

(x, y, −x − y) = x(1, 0, −1) + y(0, 1, −1)

and since x and y were arbitrary, this shows that every vector in U can be written
as a linear combination of (1, 0, −1) and (0, 1, −1). In fact, this expression is unique: if

x(1, 0, −1) + y(0, 1, −1) = x′(1, 0, −1) + y′(0, 1, −1)

then x = x′ and y = y′. By Proposition 5.15, this shows that {(1, 0, −1), (0, 1, −1)} is a basis of U, so U is 2-dimensional.
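The same basis-finding procedure can be automated. The sketch below (an illustrative addition, using SymPy) computes a basis of the solution space of x + y + z = 0; note that it returns a different, but equally valid, basis for the same plane:

    import sympy as sp

    # U is the null space of the 1x3 matrix (1 1 1), i.e. the solutions of x + y + z = 0.
    A = sp.Matrix([[1, 1, 1]])
    basis = A.nullspace()

    print(basis)        # [Matrix([[-1], [1], [0]]), Matrix([[-1], [0], [1]])]
    print(len(basis))   # 2, so U is 2-dimensional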

A vital property of bases is that they let us describe vectors using coordinates. By
part (iii) of Proposition 5.15, if we have a basis X = {v1, . . . , vn} of a vector space V, then every vector v ∈ V can be written uniquely as v = Σ_{i=1}^n λi vi for some
scalars λi . So we can represent this vector with the list (λ1 , . . . , λn ). We call these
the coordinates of v with respect to the basis X.
This is actually something we’ve already used, without thinking about it. When we
write a vector in Rn or Cn as (a1 , a2 , . . . , an ), this can be thought of as shorthand
for the expression
a1 e1 + a2 e2 + · · · + an en
where the ei are the standard basis vectors. But sometimes we want to use other
bases.

Example 8.2. Suppose that we have two maps with different orientations, and we
want to transform coordinates from one map to the other (i.e. if we know a location
on one map, we want to be able to find it on the other map easily). To make this
more concrete, suppose that on the first map, locations are described in terms of ‘1
unit north’ and ‘1 unit east’, and on the second map the locations are described in
terms of ‘1 unit north-east’ and ‘1 unit north-west.’ To make notation easier, we
give these vectors names:

w1 = 1 unit north, w2 = 1 unit east, v1 = 1 unit north-east, v2 = 1 unit north-west.

Then these vectors are related according to the following equations:


w1 = (1/√2)(v1 + v2),   w2 = (1/√2)(v1 − v2),
v1 = (1/√2)(w1 + w2),   v2 = (1/√2)(w1 − w2).
Suppose we are told to get to the point (3, 2) in the coordinates determined by the
basis {v1 , v2 } but our map uses the basis {w1 , w2 }. To get the coordinates with
respect to this basis, we have to transform using the equations above. In symbols,
if we define v = 3v1 + 2v2 , then v has coordinates (3, 2) with respect to {v1 , v2 }.
If we substitute in the above expressions for v1 and v2 , we get
   
v = 3v1 + 2v2 = 3((1/√2)w1 + (1/√2)w2) + 2((1/√2)w1 − (1/√2)w2)
  = (5/√2)w1 + (1/√2)w2,

so v has coordinates (5/√2, 1/√2) with respect to the basis {w1, w2}.

Notation 8.3. If X is a basis of a vector space V , and v ∈ V , then we write

[v]X = (a1 , a2 , . . . , an )

to indicate that the coordinates of v with respect to X are (a1 , . . . , an ). If we


name the vectors in X, say X = {v1 , . . . , vn }, then this is equivalent to the
equation
v = a1 v1 + a2 v2 + · · · + an vn .

 
Thus in our example above, we have [v]{v1,v2} = (3, 2), and [v]{w1,w2} = (5/√2, 1/√2).

We can also transform back the other way: If we are given the coordinates of a
vector with respect to {w1 , w2 }, then we can substitute in the expressions for w1
and w2 to get the coordinates with respect to {v1 , v2 }. To continue our example

with v = (5/√2)w1 + (1/√2)w2:

v = (5/√2)w1 + (1/√2)w2
  = (5/√2)((1/√2)v1 + (1/√2)v2) + (1/√2)((1/√2)v1 − (1/√2)v2)
  = (6/2)v1 + (4/2)v2
  = 3v1 + 2v2,

which is what we expected to get!


This same process works for arbitrary vectors v. If v = av1 + bv2 for some scalars a, b, so that [v]{v1,v2} = (a, b), then we can get [v]{w1,w2} by substituting in the expressions for v1 and v2:

v = av1 + bv2
  = a((1/√2)w1 + (1/√2)w2) + b((1/√2)w1 − (1/√2)w2)
  = ((a + b)/√2)w1 + ((a − b)/√2)w2,

so [v]{w1,w2} = ((a + b)/√2, (a − b)/√2). Similarly, if we are given v = cw1 + dw2, i.e. [v]{w1,w2} = (c, d), then we can go back:

v = cw1 + dw2
  = c((1/√2)v1 + (1/√2)v2) + d((1/√2)v1 − (1/√2)v2)
  = ((c + d)/√2)v1 + ((c − d)/√2)v2,

so [v]{v1,v2} = ((c + d)/√2, (c − d)/√2).

The process of changing coordinates from one basis to another is called change of
basis, and we can describe this using matrices. To continue our example, if we use
column vectors to denote coordinates, then the above equations can be expressed as
follows:
M (a, b)^T = ((a + b)/√2, (a − b)/√2)^T,
M (c, d)^T = ((c + d)/√2, (c − d)/√2)^T,

where M is the 2 × 2 matrix with rows (1/√2, 1/√2) and (1/√2, −1/√2).

In other words, multiplying by the 2×2 matrix shown takes coordinates with respect
to {v1 , v2 } and gives coordinates with respect to {w1 , w2 }. This matrix is called
the change of basis matrix from {v1 , v2 } to {w1 , w2 }.
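As a quick check of this matrix description (a sketch added for illustration, assuming NumPy):

    import numpy as np

    s = 1 / np.sqrt(2)
    # Change of basis matrix from {v1, v2} to {w1, w2} in the map example above.
    M = np.array([[s,  s],
                  [s, -s]])

    coords_v = np.array([3., 2.])   # coordinates of v with respect to {v1, v2}
    coords_w = M @ coords_v
    print(coords_w)                 # [3.5355..., 0.7071...], i.e. (5/√2, 1/√2)

    # In this particular example the matrix is its own inverse, so it also goes back:
    print(M @ coords_w)             # [3., 2.]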

Note: In this example, the same matrix also takes us back the other way. But
often, we will need two different matrices to go from {v1 , v2 } to {w1 , w2 } and from
{w1 , w2 } to {v1 , v2 }. We’ll have more examples of this next time.
Properties:

ˆ Whenever we have a vector space V and bases B1 , B2 , there always exists a


change of basis matrix from B1 to B2 .

ˆ If V is a vector space and B1 , B2 are bases, and if M12 is the change of basis
matrix from B1 to B2 , and M21 is the change of basis matrix from B2 to B1 ,
then M12 and M21 are inverse to one another (i.e. M12 M21 = M21 M12 = In ,
the identity matrix).

Change of basis matrices

Let V be a vector space, let v ∈ V and let

B = {v1 , . . . , vn }, B 0 = {w1 , . . . , wn }

be two bases of V . If v ∈ V then by Proposition 5.15(iii) there are unique scalars


c1 , . . . , cn and d1 , . . . , dn such that

v = c1 v1 + c2 v2 + · · · + cn vn (∗)
= d1 w1 + d2 w2 + · · · + dn wn .

If we write this using the notation from the last lecture, we have

[v]B = (c1 , . . . , cn ), [v]B 0 = (d1 , . . . , dn ).

We now show how (c1 , . . . , cn ) and (d1 , . . . , dn ) are related via a change of basis
matrix.
Note that, because B′ is a basis, for each j we can write

vj = Σ_{i=1}^n aij wi   (†)

for unique scalars a1j, . . . , anj. Therefore, using (∗) we get

v = c1 v1 + c2 v2 + · · · + cn vn
  = c1 (Σ_{i=1}^n ai1 wi) + c2 (Σ_{i=1}^n ai2 wi) + · · · + cn (Σ_{i=1}^n ain wi)   (substituting for the vj)
  = Σ_{i=1}^n (Σ_{j=1}^n aij cj) wi.   (rearranging terms)

But because the scalars d1, . . . , dn are unique, this implies that di = Σ_{j=1}^n aij cj for all i. If we write (c1, . . . , cn) and (d1, . . . , dn) as column vectors, these n equations become the matrix equation:

[v]B′ = (d1, d2, . . . , dn)^T = A (c1, c2, . . . , cn)^T = A[v]B,

where A = (aij ) is called the change of basis matrix from B to B 0 .

How to calculate a change of basis matrix

The entries aij in the change of basis matrix come from the equation (†) above. In particular, if the j-th vector from the first basis, vj, is equal to Σ_{i=1}^n aij wi, then the j-th column of the change of basis matrix is (a1j, a2j, . . . , anj)^T. In other words:
To get column j of the change of basis matrix from B1 to B2 , take the j-th vector
in B1 and find its coordinates with respect to B2 .
Let’s have some more examples.
Example 8.4. In this example, we’ll see that although we are thinking of a basis
as a set, when we take coordinates it actually matters which order we put the basis
elements in.
Let V = R2 and let B = {v1 , v2 } be any basis. If [v]B = (c1 , c2 ) then this says that
v = c1 v1 + c2 v2 . So if we let B 0 = {v2 , v1 } then clearly [v]B 0 = (c2 , c1 ), which will
usually not be equal to (c1 , c2 ).
Let’s calculate the change of basis matrix from B to B 0 . Following the recipe above:
To get column 1 of the matrix, we have to take the first element of B, v1, and get its coordinates with respect to B′ = {v2, v1}. Since v1 = 0v2 + 1v1, we have [v1]B′ = (0, 1), and so the first column of the change of basis matrix from B to B′ is (0, 1)^T. Similarly, since v2 = 1v2 + 0v1, we have [v2]B′ = (1, 0). Hence the second column of the matrix is (1, 0)^T, and so the change of basis matrix from B to B′ is the 2 × 2 matrix with rows (0, 1) and (1, 0).

In this example, like in the example from last time, the change of basis matrix from
B 0 to B is actually the same matrix as from B to B 0 . Let’s have another example
where this is not the case.
Example 8.5 (‘Stretching’). Here let V = R2 again, let B = {e1 , e2 } be the
standard basis, and let B 0 = {5e1 , 2e2 }. If [v]B = (c1 , c2 ) then this says that
v = c1 e1 + c2 e2 . So we would expect that [v]B 0 = (c1 /5, c2 /2): “If I need c1 of e1
to get to v, then I should need only 1/5 of 5e1 to get there,” and similarly for e2 .

Let’s prove this. The first element of B is e1, which is equal to (1/5)(5e1) + 0(2e2). So the first column of the change of basis matrix is (1/5, 0)^T. Similarly the second element of B is e2 = 0(5e1) + (1/2)(2e2). So the second column is (0, 1/2)^T. So the change of basis matrix from B to B′ is the diagonal matrix with diagonal entries 1/5 and 1/2. Hence if v ∈ V and [v]B = (c1, c2), then

[v]B′ = (c1/5, c2/2),

as we expected.

9 Linear maps

Let’s have one more example of change of basis: Let V = R2 again and let B =
{e1 , e2 } be the standard basis and B 0 = {e1 + e2 , e1 − e2 } (this is also a basis).
Again, to get the change of basis matrix from B to B′, we have to get the coordinates of the vectors in B with respect to B′. It is not hard to see that

e1 = (1/2)(e1 + e2) + (1/2)(e1 − e2),   e2 = (1/2)(e1 + e2) − (1/2)(e1 − e2),

i.e. [e1]B′ = (1/2, 1/2), so the first column of the change of basis matrix is (1/2, 1/2)^T, and also [e2]B′ = (1/2, −1/2), so the second column is (1/2, −1/2)^T. Thus the matrix is the 2 × 2 matrix with rows (1/2, 1/2) and (1/2, −1/2).

Exercise 9.1. Show that the change of basis matrix from B′ to B is the 2 × 2 matrix with rows (1, 1) and (1, −1).

The process of ‘changing basis’ can be described as a function, which takes a set of
coordinates (with respect to one basis, say B), and outputs another set of coordinates
(with respect to the new basis, say B 0 ). Let’s call this f : Rn → Rn (or more
generally, F n → F n , where F is our field). This function has several nice properties.
It can be described as a matrix multiplication: f ([v]B ) = [v]B 0 = A[v]B , where A is
the change of basis matrix. From this we can deduce the following.

ˆ If v, w ∈ V then f ([v]B + [w]B ) = f ([v]B ) + f ([w]B ). In other words, if I add


coordinates, then change basis, this is the same thing as changing basis then
adding the coordinates. This makes sense, because in both cases, the result
should just be the coordinates of the vector v + w).
This can be expressed as ‘change of basis preserves addition.’

ˆ If v ∈ V and λ is a scalar, then λf ([v]B ) = f (λ[v]B ). In words, if I scale


the coordinates of v and then change basis, this is the same thing as changing
basis then scaling. This also makes sense because in both cases, we expect to
end up with the coordinates of λv with respect to B 0 .
This can be expressed as ‘change of basis preserves scalar multiplication.’

Definition 9.2. Let V and W be vector spaces over a field F . A linear map
from V to W is a function f : V → W such that

ˆ f (v + w) = f (v) + f (w) for all v, w ∈ V , and

ˆ f (λv) = λf (v) for all vectors v ∈ V and all scalars λ ∈ F .

Some authors also call this a linear mapping, or homomorphism of vector spaces.
If V = W then we say f is a linear transformation V → V . For this reason, it

is common to use the letter ‘T ’ for a linear map or linear transformation.

Remark 9.3. One definition of ‘abstract algebra’ is ‘the study of sets with structure’
– So ‘linear algebra’ would be the study of sets with the ‘structure’ of addition and
scalar multiplication. Generally in algebra, the way we compare two such sets is
with structure preserving maps, also often called homomorphisms – For instance, it
is well known that the sets Z and Q have the same size, so there is a bijection between
them. But for the purposes of algebra, they are quite different (e.g. every non-zero
element of Q has a multiplicative inverse, but 2 does not have a multiplicative
inverse in Z). We can express this difference with that fact that there is no bijective
structure-preserving map from Z to Q. You’ll see more of this in MA204.
Examples 9.4.

1. If V is any vector space over F and c ∈ F then f (v) = cv defines a linear map
V → V . Check: For all v, w ∈ V and all λ ∈ F , we have

c(v + w) = cv + cw by the distributive axiom,


c(λv) = (cλ)v = λ(cv) by associativity of multiplication.

and so this is indeed a linear map.


2. If V and W are spaces of column vectors, of lengths n and m respectively, and
A is an m × n matrix, then you have seen in MA114 that

A(v + w) = Av + Aw, and


A(λv) = λAv,

so multiplying by A gives a linear map V → W . In particular, change of basis


gives a linear map.
3. If V and W are any two vector spaces over F , then there is a zero map V → W ,
which sends every vector in V to the zero vector 0 of W . Exercise: Check that
this is a linear map.

Linear maps are the structure-preserving functions between vector spaces, and hence
they have lots of useful properties. For instance:
Proposition 9.5. Let f : V → W be a linear map between two vector spaces V
and W . Then f sends the zero vector of V to the zero vector of W .

In symbols, we can write either f (0) = 0, or f (0V ) = 0W , if we want to make it


clear where each vector lives.

Proof. The ‘additive identity’ axiom tells us that the zero vectors in V and W satisfy
0 + 0 = 0. Using this property, we get

f (0) = f (0 + 0) by the property of zero


= f (0) + f (0) since f is linear.

Now, subtracting f (0) from both sides, we get 0 = f (0), as claimed.

Turning this property around, we get a condition for showing that a map is not
linear.
Corollary 9.6. If f : V → W is a map such that f (0) 6= 0, then f is not linear.

For instance, consider the map f : R → R, f (x) = x + a for some fixed a ∈ R. Since
f (0) = 0 + a = a, the corollary tells us that if a 6= 0 then f is not linear.

Linear maps and matrices.

We have seen that change of basis can be described as a function F n → F n , sending


‘old coordinates’ to ‘new coordinates’, and can also be described as multiplication
by a change of basis matrix.
Our goal now is to extend this, and show that every linear map can be described
with a matrix. To start, we make the following
Observation 9.7. A linear map is completely determined by what it does to the
elements of a basis.

Concretely, suppose that T : V → W is a linear map, and let B = {v1, . . . , vn} be a basis of V. Then every element of V can be written uniquely in the form Σ_{i=1}^n ci vi for some scalars c1, . . . , cn.
Now suppose that we know T (vi ) for each i, and write T (vi ) = wi . Then because
T is linear, we can calculate:
T(v) = T(Σ_{i=1}^n ci vi) = Σ_{i=1}^n ci T(vi) = Σ_{i=1}^n ci wi.

So we know what T (v) is, for all v ∈ V .


Example 9.8. Let T : R2 → R2 be a linear map such that T (1, 0) = (1, 1) and
T (0, 1) = (1, −1). Let’s calculate T (a, b) for arbitrary a and b using the recipe
above:

T (a, b) = T (a(1, 0)+b(0, 1)) = aT (1, 0)+bT (0, 1) = a(1, 1)+b(1, −1) = (a+b, a−b).

Example 9.9. Rotations in R2 . Rotations about the origin O = (0, 0) are linear
maps R2 → R2 . Given this information, let Rθ denote an anticlockwise rotation by
θ about the origin. Let’s calculate Rθ (x, y).
Firstly, by drawing triangles with hypotenuse of length 1, it is easy to calculate that
Rθ (1, 0) = (cos θ, sin θ) and Rθ (0, 1) = (− sin θ, cos θ). Now, calculate Rθ (x, y) using
the recipe above:

Rθ (x, y) = Rθ (x, 0) + Rθ (0, y) = xRθ (1, 0) + yRθ (0, 1)


= x(cos θ, sin θ) + y(− sin θ, cos θ)
= (x cos θ − y sin θ, x sin θ + y cos θ)

In this second example, observe that if we use column vectors, then we can write
this as a matrix multiplication:
    
Rθ (x, y)^T = M (x, y)^T, where M is the 2 × 2 matrix with rows (cos θ, − sin θ) and (sin θ, cos θ).

Notice: In this matrix, the first column gives the coordinates of Rθ (1, 0) with
respect to the basis {(1, 0), (0, 1)}, and the second column gives the coordinates of
Rθ (0, 1) with respect to the basis {(1, 0), (0, 1)}. This gives us a clue as to how we
can always express a linear transformation as a matrix.
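Here is a small sketch (an added illustration, assuming NumPy) of the rotation matrix acting on coordinates:

    import numpy as np

    def rotation_matrix(theta):
        """Matrix of the anticlockwise rotation R_theta in the standard basis of R^2."""
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    theta = np.pi / 6                                       # rotate by 30 degrees
    print(rotation_matrix(theta) @ np.array([1.0, 0.0]))    # [cos θ, sin θ] ≈ [0.866, 0.5]
    print(rotation_matrix(theta) @ np.array([0.0, 1.0]))    # [-sin θ, cos θ] ≈ [-0.5, 0.866]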

10 Image, kernel, and the rank-nullity theorem


Definition 10.1. Let T : V → W be a linear map.

ˆ The image of T , denoted Im T , is the subset of W consisting of all vectors


of the form T (v):
def
Im T = {T (v) : v ∈ V } ⊆ W.

ˆ The kernel of T , denoted ker T , is the subset of V of all vectors sent to 0


by T :
def
ker T = {v ∈ V : T (v) = 0} ⊆ V.

Theorem 10.2. Let T : V → W be a linear map. Then:

(i) ker T is a subspace of V ;

(ii) Im T is a subspace of W ;

(iii) The rank-nullity theorem: The dimensions of V , Im T and ker T are related
by the equation:

dim(Im T ) + dim(ker T ) = dim V.

The name ‘rank-nullity theorem’ arises because dim(Im T ) is called the rank of T ,
and dim(ker T ) is sometimes called the nullity of T , although not everyone uses this
term nowadays.
Note: dim W does not appear anywhere in this equation! The reason for this is
that, if W is contained in a bigger vector space W 0 , then T still gives a linear map
V → W 0 , but this does not change the sets Im T or ker T .

Proof of (i) and (ii). (i) We need to check the three subspace conditions for the
subset ker T :

ˆ 0 ∈ ker T . This says that T (0) = 0, which we know (Proposition 9.5).


ˆ Let v, v0 ∈ ker T , so T (v) = T (v0 ) = 0. Then
T (v + v0 ) = T (v) + T (v0 ) = 0 + 0 = 0,
which shows that v + v0 ∈ ker T , too.
ˆ Let v ∈ ker T and let λ be a scalar. Then
T (λv) = λT (v) = λ0 = 0,
so λv ∈ ker T , too. Thus we have shown that ker T contains 0 and is closed
under addition and scalar multiplication, so ker T is a subspace of V .

(ii) Now we check the subspace conditions for the subset Im T ⊆ W :

ˆ 0 ∈ Im T says that there exists a vector v ∈ V such that T (v) = 0. Taking


v = 0 works, again by Proposition 9.5.

ˆ If w, w0 ∈ Im T , this says that there exists v and v0 ∈ V such that T (v) = w


and T (v0 ) = w0 . Therefore,

T (v + v0 ) = T (v) + T (v0 ) = w + w0 ,

which shows that w + w0 ∈ Im T , too.

ˆ If w ∈ Im T , so that T (v) = w for some v ∈ V , and if λ is a scalar, then

T (λv) = λT (v) = λw,

which shows that λw ∈ Im T . Hence Im T satisfies the subspace conditions.

Some examples of images and kernels.

1. Thinking of elements of R^2 as column vectors, let T : R^2 → R^2 be multiplication by the 2 × 2 matrix with rows (1, 2) and (2, 1). What is ker T? What is Im T?
Notice that if v = (x, y)^T ∈ R^2 then T(v) = (x + 2y, 2x + y)^T.
Kernel: If T(v) = 0 then x + 2y = 2x + y = 0. Rearranging, we get x = −2y and x = −(1/2)y, so −2y = −(1/2)y. The only solution to this is y = x = 0. Hence ker T = {0}, of dimension 0.
Image: We could calculate this directly, but since we know that dim ker T = 0, the rank-nullity theorem tells us that dim Im T = dim V − dim ker T = 2 − 0 = 2. The only 2-dimensional subspace of R^2 is R^2 itself, hence Im T = R^2.

2. Again think of R^2 as column vectors, and now let T : R^2 → R^2 be multiplication by the 2 × 2 matrix with rows (2, 2) and (2, 2). What are ker T and Im T now?
Image: T((x, y)^T) = (2x + 2y, 2x + 2y)^T, which can be written (x + y)(2, 2)^T. Hence Im T is spanned by the non-zero vector (2, 2)^T. Thus {(2, 2)^T} is a basis for Im T, so dim Im T = 1.
Kernel: If T((x, y)^T) = (2x + 2y, 2x + 2y)^T = 0 then x = −y, and conversely if x = −y then we can see that T((x, y)^T) = 0. Therefore ker T is the set of vectors of the form (x, −x)^T = x(1, −1)^T. So {(1, −1)^T} is a basis for ker T, and dim ker T = 1.
As a check, we can verify that the rank-nullity theorem holds:

dim V = dim ker T + dim Im T,   that is,   2 = 1 + 1.
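Both examples can be verified directly (a sketch, not part of the notes, using SymPy so that the kernel is computed exactly):

    import sympy as sp

    for M in (sp.Matrix([[1, 2], [2, 1]]), sp.Matrix([[2, 2], [2, 2]])):
        rank = M.rank()                        # dim Im T
        nullity = len(M.nullspace())           # dim ker T, from an explicit basis of the kernel
        print(rank, nullity, rank + nullity)   # (2, 0, 2) then (1, 1, 2): rank + nullity = dim V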

Proof of the Rank-Nullity Theorem

The rank-nullity theorem states that if T : V → W is a linear map, then


dim V = dim Im T + dim ker T.
Recall that the dimension of a vector space U is the size of a basis of U (it turns
out that all bases have the same size). Let m = dim ker T and n = dim Im T . We
will prove the theorem by constructing a basis for dim V of size m + n.
Let {v1 , . . . , vm } ⊆ V be a basis of ker T , and let {w1 , . . . , wn } ⊆ W be a basis of
Im T .
Now, since each wi lies in Im T , there exists ui ∈ V such that T (ui ) = wi for all i.
We now claim that the set
{v1 , v2 , . . . , vm , u1 , u2 , . . . , un }
is a basis of V . To show this, we need to show that this subset is spanning and
linearly independent.
Linear independence: Suppose we have an equation
(Σ_{i=1}^m ai vi) + (Σ_{i=1}^n bi ui) = 0   (∗)

for some scalars ai , bi . We need to show that ai = bi = 0 for all i. Notice: By


definition we have T (ui ) = wi for all i, and also vi ∈ ker T for all i, so T (vi ) = 0.
Therefore if we apply T to the equation (∗), we get
T((Σ_{i=1}^m ai vi) + (Σ_{i=1}^n bi ui)) = (Σ_{i=1}^m ai T(vi)) + (Σ_{i=1}^n bi T(ui))   (since T is linear)
  = Σ_{i=1}^n bi wi   (since each T(vi) = 0 and each T(ui) = wi)
  = 0.   (applying T to (∗), because T(0) = 0)
But now, we know that the set {w1 , . . . , wn } is linearly independent, since it is a
basis of Im T . Therefore bi = 0 for all i. Substituting this into equation (∗), we get
Σ_{i=1}^m ai vi = 0

and since {v1 , . . . , vm } is a basis of ker T , it is also a linearly independent set, hence
ai = 0 for all i.
Spanning: We need to show that if v ∈ V , then there exist scalars di (i = 1, . . . , m)
and ci (i = 1, . . . , n) such that
(Σ_{i=1}^m di vi) + (Σ_{i=1}^n ci ui) = v.

To show this, consider T (v). This lies in Im T , and we know that {w1 , . . . , wn }
spans Im T since it is a basis. Therefore, there exist scalars c1 , . . . , cn such that
T(v) = Σ_{i=1}^n ci wi.

Now, since T (ui ) = wi , this means that


T(Σ_{i=1}^n ci ui) = Σ_{i=1}^n ci T(ui) = Σ_{i=1}^n ci wi = T(v).

In turn, because T is linear, this means that


T(v − Σ_{i=1}^n ci ui) = T(v) − T(Σ_{i=1}^n ci ui) = T(v) − T(v) = 0.
This tells us that v − Σ_{i=1}^n ci ui ∈ ker T. Finally, since {v1, . . . , vm} spans ker T, this means that there exist scalars d1, . . . , dm such that

v − Σ_{i=1}^n ci ui = Σ_{i=1}^m di vi
⇒ v = (Σ_{i=1}^m di vi) + (Σ_{i=1}^n ci ui),

as required.
Hence, we have shown that {v1 , . . . , vm , u1 , . . . , un } is linearly independent and
spans V , hence it is a basis of size m + n, and so dim V = m + n = dim ker T +
dim Im T , and the rank-nullity theorem is proved.

11 Injections, surjections, isomorphism


Definition 11.1. If V and W are vector spaces, then an isomorphism is a
bijective linear map V → W .
If there exists an isomorphism T : V → W , then we say that V and W are
isomorphic.

The word ‘isomorphism’ comes from the Greek for ‘equal shape’; the idea is that
two isomorphic vector spaces are essentially the same.
Recall: If X and Y are sets, then a function f : X → Y is called

ˆ injective if f (x) = f (x0 ) implies x = x0 .

ˆ surjective if, for every y ∈ Y , there exists some x ∈ X such that f (x) = y (in
other words, Im f = Y ).

ˆ bijective if it is both injective and surjective.

Moreover, f is bijective precisely when there exists a function g : Y → X which is


inverse to f , meaning

f (g(y)) = y and g(f (x)) = x for all x ∈ X and all y ∈ Y .

Proposition 11.2. Let T : V → W be a linear map.

(i) T is injective if and only if ker T = {0}.

(ii) T is an isomorphism (i.e. bijective) if and only if Im T = W and ker T = {0}.

(iii) If T is an isomorphism, then the inverse function T −1 : W → V is also a linear


map.

Proof. (i) If T is injective and T (v) = 0, then also T (0) = 0, so T (v) = T (0) and
therefore v = 0 since T is injective. So the only vector v ∈ V satisfying T (v) = 0
is 0, hence ker T = {0}.
Conversely, suppose ker T = {0} and suppose that T (v) = T (v0 ) for some v, v0 ∈ V .
Since T is linear, we get T (v − v0 ) = 0, so v − v0 ∈ ker T , hence v − v0 = 0, so
v = v0 . Therefore, T is injective.
(ii) We have just shown that T is injective if and only if ker T = {0}, and by
definition, T is surjective if and only if Im T = W . Therefore T is bijective if and
only if both of these hold.
(iii) Since T is bijective, we know that an inverse function T −1 : W → V exists. To
prove that T −1 is linear, we need to show

ˆ T −1 (w + w0 ) = T −1 (w) + T −1 (w0 ) for all w, w0 ∈ W , and

ˆ T −1 (λw) = λT −1 (w) for all λ ∈ F and all w ∈ W .

We show these by applying T to the left-hand side, and using the fact that T T −1 (u) =
u for all u ∈ W :

T (T −1 (w) + T −1 (w0 )) = T (T −1 (w)) + T (T −1 (w0 )) since T is linear


0
=w+w
= T (T −1 (w + w0 )).

And since T is injective, this implies that T −1 (w) + T −1 (w0 ) = T −1 (w + w0 ).


Similarly,

T (λT −1 (w)) = λT (T −1 (w)) since T is linear


= λw
= T (T −1 (λw))

And again, since T is injective this implies λT −1 (w) = T −1 (λw), as required.

12 Linear maps and matrices

We now discuss a general method for getting a matrix from a linear transformation.
Let T : V → W be linear. To get a matrix, we fix a basis B = {v1 , . . . , vn } of V ,
and a basis B 0 = {w1 , . . . , wm } of W . Since B 0 is a basis of W , we can write each
vector T (vi ) uniquely as a linear combination of the wj :

T (v1 ) = a11 w1 + a21 w2 + · · · + am1 wm ,


T (v2 ) = a12 w1 + a22 w2 + · · · + am2 wm ,
..
.
T (vn ) = a1n w1 + a2n w2 + · · · + amn wm .

for unique scalars aij ∈ F. Once we have done this, if we have an arbitrary vector v = Σ_{i=1}^n ci vi in V, then we can calculate

T(v) = T(Σ_{i=1}^n ci vi) = Σ_{i=1}^n ci T(vi)   (as T is linear)
  = Σ_{i=1}^n ci (Σ_{j=1}^m aji wj).   (substituting for T(vi))

Now if we rearrange the sum, grouping the terms in wj together, we get


T(v) = Σ_{j=1}^m (Σ_{i=1}^n aji ci) wj.

In other words, if [v]B = (c1 , . . . , cn ) and we take coordinates of T (v) with respect
to the basis B 0 = {w1 , . . . , wm }, we get
[T(v)]B′ = (Σ_{i=1}^n a1i ci, Σ_{i=1}^n a2i ci, . . . , Σ_{i=1}^n ami ci).

If we write this out using column vectors, this becomes a matrix-vector product:
    
[T(v)]B′ = (aij) · (c1, c2, . . . , cn)^T = (aij) · [v]B,

where (aij) denotes the m × n matrix whose (i, j)-entry is the scalar aij defined at the start of this section.

Definition 12.1. We write [T ]B,B 0 for the m × n matrix (aij ) above; this is the
unique matrix satisfying the equation

[T (v)]B 0 = [T ]B,B 0 [v]B .

We call [T ]B,B 0 the matrix of T with respect to the bases B of V and B 0 of W .

Remarks 12.2.

ˆ If V = Rn and W = Rm , we will sometimes refer to ‘the matrix of T ’ without


specifying bases. In this case, we mean ‘the matrix of T with respect to the
standard bases of Rn and Rm .’
ˆ If we use different bases for the vector spaces V and W , we will end up with
a different matrix for T (we’ll see examples of this shortly).
ˆ Comparing the matrix equation above with the equations T (vi ) at the start
of this lecture, we get a short rule for calculating [T ]B,B 0 , as follows:
Observation 12.3. Let T : V → W be a linear map, and let B = {v1 , . . . , vn } be a
basis of V and B 0 = {w1 , . . . , wm } be a basis of W . Then the i-th column of [T ]B,B 0
is [T (vi )]B 0 . In words, to get the i-th column of [T ]B,B 0 , take the i-th member of B,
apply T , and find the coordinates of the new vector with respect to B 0 .
Example 12.4. Let V = R3 and W = R2 , and let
f : V −→ W,
(x, y, z) 7−→ (x − y, x + 2z).
This is a linear map. Let’s calculate matrix of f with respect to the standard
bases of V and W . According to the recipe above, we need to take the standard
basis B = {e1 , e2 , e3 } = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} of V , and calculate the co-
ordinates of f (1, 0, 0), f (0, 1, 0) and f (0, 0, 1) with respect to the standard basis
B 0 = {(1, 0), (0, 1)} of W .

f (e1 ) = f (1, 0, 0) = (1, 1) = 1(1, 0) + 1(0, 1),


f (e2 ) = f (0, 1, 0) = (−1, 0) = −1(1, 0) + 0(0, 1),
f (e3 ) = f (0, 0, 1) = (0, 2) = 0(1, 0) + 2(0, 1).
     
These three equations tell us that the columns of [f]B,B′ are (1, 1)^T, (−1, 0)^T and (0, 2)^T, and therefore [f]B,B′ is the 2 × 3 matrix with rows (1, −1, 0) and (1, 0, 2). Let’s check that this matrix really does what we expect:

[f]B,B′ (x, y, z)^T = (x − y, x + 2z)^T,
as required.
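The same check can be run numerically for a particular vector (an illustrative sketch, assuming NumPy; the values of x, y, z below are arbitrary):

    import numpy as np

    # Matrix of f(x, y, z) = (x - y, x + 2z) with respect to the standard bases.
    F = np.array([[1, -1, 0],
                  [1,  0, 2]], dtype=float)

    x, y, z = 4.0, 1.0, 3.0
    print(F @ np.array([x, y, z]))   # [3., 10.], which is (x - y, x + 2z)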
Now let’s do the same example, still with B the standard basis of V = R3 , but now
let B 0 = {(1, 0), (0, −1)}. Then we calculate

f (e1 ) = f (1, 0, 0) = (1, 1) = 1(1, 0) − 1(0, −1),


f (e2 ) = f (0, 1, 0) = (−1, 0) = −1(1, 0) + 0(0, −1),
f (e3 ) = f (0, 0, 1) = (0, 2) = 0(1, 0) − 2(0, −1).
 
And therefore with this new basis B′, we get that [f]B,B′ is the 2 × 3 matrix with rows (1, −1, 0) and (−1, 0, −2).

Remark 12.5. If we try and perform the same check for [f ]B,B 0 as before, we get
   
[f]B,B′ (x, y, z)^T = (x − y, −x − 2z)^T.

This looks wrong: the last entry should be x + 2z. But note: these are the coordinates with respect to the basis {(1, 0), (0, −1)}, so the right-hand side actually denotes the vector (x − y)(1, 0) + (−x − 2z)(0, −1) = (x − y, x + 2z) in the standard basis.

Now that we know how to turn linear transformations into matrices, we can try and
pick ‘nice bases’ of V and W to get ‘nice matrices’ for f . For instance, if V = W
and B consists of eigenvectors for f , then [f ]B,B will be a diagonal matrix – this is
very useful, e.g. it makes performing calculations easy.
Recall that an eigenvector of a matrix A is a non-zero vector v such that Av = λv
for some scalar λ ∈ F , which we call the corresponding eigenvalue. This same
definition works for linear transformations:
Definition 12.6. Let V be a vector space over a field F and let T : V → V
be a linear transformation. An eigenvector of T is a non-zero vector v ∈ V
such that T (v) = λv for some scalar λ ∈ F , which is called the corresponding
eigenvalue of T .

Remark 12.7. If v is an eigenvector for T with eigenvalue λ then, if we fix a basis


B of V , we get
[T ]B,B [v]B = [T (v)]B = [λv]B = λ[v]B .
So the column vector [v]B ∈ F n is an eigenvector for the matrix [T ]B,B , with the
same eigenvalue. The converse holds as well; we make this a proposition.
Proposition 12.8. Let T : V → V be a linear transformation.

1. The eigenvalues of T : V → V are exactly the same as the eigenvalues of the


matrix [T ]B,B , for any choice of basis B.
2. A vector v is an eigenvector for T if and only if [v]B is an eigenvector for [T ]B,B .

Linear maps and change of basis

It makes sense that if two matrices both correspond to the same linear transformation
T , but with respect to different bases, then these matrices should have something
to do with one another. The next proposition tells us what this relationship is.
Proposition 12.9. Let T : V → V be a linear transformation and let B1 , B2 be
bases of V . Let
M1 = [T ]B1 ,B1 , M2 = [T ]B2 ,B2 .
Finally, let A be the change of basis matrix from B1 to B2 , so that A−1 is the change
of basis matrix from B2 to B1 . Then M1 and M2 are related by the equation
M1 = A−1 M2 A.

Definition 12.10. Two n × n matrices M1 and M2 are called conjugate if


M1 = A−1 M2 A for some invertible n × n matrix A.

Proof of the proposition. By definition, the matrices M1 = [T ]B1 ,B1 , M2 = [T ]B2 ,B2
and the change of basis matrices A, A−1 satisfy the following properties for all u ∈ V :

M1 [u]B1 = [T (u)]B1 , M2 [u]B2 = [T (u)]B2 , A[u]B1 = [u]B2 , A−1 [u]B2 = [u]B1 ,

Using these, for all v ∈ V we calculate

M1 [v]B1 = [T (v)]B1 by the property of M1


−1
=A [T (v)]B2 by the property of A−1
= A−1 M2 [v]B2 by the property of M2
−1
=A M2 A[v]B1 by the property of A.

This shows that M1 and A−1 M2 A do the same thing to every column vector [v]B1 ∈
F n , so they are the same matrix.
Remarks 12.11.

ˆ In words, multiplying a vector by A−1 M2 A says ‘change basis from B1 to B2 ,


then apply M2 , then change basis back to B1 .’ It makes sense that this should
be the same thing as applying M1 (which is the same linear transformation,
but operating on coordinates with respect to B1 ).

ˆ The proposition shows that two matrices which represent the same linear trans-
formation are conjugate. The converse is true as well: If M1 = A−1 M2 A, then
we can view M1 and M2 as representing the same linear transformation.

13 Diagonalisation
Definition 13.1.

ˆ A linear transformation T : V → V is called diagonalisable if there is a


basis B of V such that [T ]B,B is a diagonal matrix.

ˆ An n × n matrix M is called diagonalisable if there is an invertible n × n


matrix A such that A−1 M A is diagonal.

By the above, T is diagonalisable if and only if [T ]B,B is diagonalisable for any choice
of B (in this case, [T ]B,B is diagonalisable for all choices of B).
Proposition 13.2. Let T : V → V be a linear transformation. Then T is diagonal-
isable if and only if V has a basis B which consists of eigenvectors of T .

Proof. Suppose B = {v1 , . . . , vn } is a basis of V such that for each i we have


T (vi ) = λi vi for some scalar λi . Then the i-th column of [T ]B,B is given by the
coordinates [T (vi )]B = [λi vi ]B , which are just λi in the i-th position, and 0
everywhere else. In other words [T ]B,B is diagonal, and its diagonal entries are
λ1 , λ2 , . . . , λn .
Conversely, if [T ]B,B is diagonal with diagonal entries λ1 , λ2 , . . . , λn , then looking at
the i-th column tells us that T (vi ) is the vector whose coordinates with respect to
B are (0, 0, . . . , 0, λi , 0, . . . , 0), where λi is in position i. This vector is λi vi , so vi is
an eigenvector of T with eigenvalue λi .

Remark 13.3. Not all matrices/linear transformations can be diagonalised. For instance, consider the rotation matrix Rθ with rows (cos θ, − sin θ) and (sin θ, cos θ). If we draw a picture,
it is clear that if θ is not a multiple of π then Rθ does not send any non-zero vector
in R2 to a scalar multiple of itself, so Rθ has no eigenvectors (so V cannot have a
basis consisting of eigenvectors).

Here’s a step-by-step recipe for how to diagonalise a linear transformation T or a


matrix M .
Quick method: “Spot” eigenvectors. In many cases, it is not hard to think of
some eigenvectors of a linear transformation. If we can find a basis this way, we can
easily find a diagonal matrix representing T .
Example 13.4. Let V = R2 and let Rπ : V → V denote a rotation through π. Then
it is not hard to see that every vector in V is sent to its negative, i.e. Rπ (v) = −v
for all v ∈ V . So if B = {v1 , v2 } is any basis of V , then Rπ (v1 ) = −v1 and
Rπ (v2 ) = −v2 . Thus [Rπ (v1 )]B = (−1, 0) and [Rπ (v2 )]B = (0, −1); using these as
columns of the matrix, we get
 
[Rπ]B,B = the 2 × 2 matrix with rows (−1, 0) and (0, −1), i.e. −I2.

Note that this agrees with the expression for the rotation matrix with rows (cos θ, − sin θ) and (sin θ, cos θ) that we saw earlier, taking θ = π.

Of course, it is not always easy to ‘spot’ eigenvectors in this way, so we would like
a more general method. The following gives an outline of a general process for
diagonalising a matrix or linear transformation, whenever this is possible.
General method.

Step 0. If we are given a linear transformation T : V → V , pick any basis B of V ,


such as the standard basis, and calculate [T ]B,B , so that we are working with
a matrix, M .

Step 1. Use the characteristic equation of the matrix M to find its eigenvalues and
eigenvectors.

Step 2. If we can find enough linearly independent eigenvectors of M to get a basis


{v1 , . . . , vn } of V , use these eigenvectors as the columns of a matrix P . Then
P −1 M P will be diagonal, with i-th diagonal entry equal to the eigenvalue
corresponding to vi .

Let’s illustrate the method with an example.


Example. Let M be the 2 × 2 matrix with rows (−3/5, 4/5) and (4/5, 3/5). (This is the matrix of the linear transformation T : R2 → R2 given by reflecting in the line y = 2x.)
Step 1. If M v = λv for some column vector v, then we can rearrange this equation
to get (M − λI2 )v = 0. If this has a non-zero solution then the determinant of the
matrix on the left-hand side is zero (because we can use Gaussian elimination to get
a row of zeroes):
det(M − λI2 ) = 0.
In our example here, we get
det(M − λI2) = (−3/5 − λ)(3/5 − λ) − (4/5)(4/5) = λ2 − 9/25 − 16/25 = λ2 − 1 = (λ + 1)(λ − 1).

This shows that if M v = λv then λ = ±1.


More generally, we define/recall the following.

Definition 13.5. If M is an n × n matrix, the characteristic equation of M is

det(M − xIn ) = 0,

where x is a variable and In is the n × n identity matrix. The expression


det(M − xIn ) is called the characteristic polynomial of M ; it is a polynomial in
x of degree n.

With this definition, the eigenvalues of M are precisely the roots of the characteristic
polynomial of M .
In our particular example, we now know that M has eigenvalues ±1. To get eigen-
vectors, we solve the equation (M − λI2 )v = 0 twice, once with λ = 1 and once with
λ = −1. For λ = 1 we get

(M − I2)(x, y)^T = (0, 0)^T, where M − I2 is the matrix with rows (−3/5 − 1, 4/5) and (4/5, 3/5 − 1).

The first row here gives us −(8/5)x + (4/5)y = 0, and the second row gives an equivalent equation. So any non-zero solution to this gives us an eigenvector. For instance (1, 2)^T is an eigenvector with corresponding eigenvalue 1 (taking x = 1, y = 2).
Repeating this for λ = −1, we get

(M + I2)(x, y)^T = (0, 0)^T, where M + I2 has rows (−3/5 + 1, 4/5) and (4/5, 3/5 + 1); this gives x = −2y.

And so (−2, 1)^T is an eigenvector with corresponding eigenvalue −1.
Step 2. We now have two eigenvectors for M , and these are not multiples of one
another, so they form a linearly independent set of size 2, which is a basis of R2 .
Use these as the columns of a matrix:

P = the 2 × 2 matrix with rows (1, −2) and (2, 1).

And let’s check that P −1 M P is diagonal. Firstly note that det P = 5, so P −1 = (1/5)[[1, 2], [−2, 1]] (writing each matrix here as the list of its rows). Hence,

P −1 M P = (1/5)[[1, 2], [−2, 1]] · [[−3/5, 4/5], [4/5, 3/5]] · [[1, −2], [2, 1]]
        = (1/25)[[1, 2], [−2, 1]] · [[−3, 4], [4, 3]] · [[1, −2], [2, 1]]
        = (1/25)[[1, 2], [−2, 1]] · [[5, 10], [10, −5]]
        = (1/25)[[25, 0], [0, −25]]
        = [[1, 0], [0, −1]],
as we expected.
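The whole computation is easy to confirm by machine (a sketch added for illustration, assuming NumPy):

    import numpy as np

    M = np.array([[-3/5, 4/5],
                  [ 4/5, 3/5]])
    P = np.array([[1., -2.],
                  [2.,  1.]])

    print(np.round(np.linalg.inv(P) @ M @ P, 10))   # [[ 1.  0.], [ 0. -1.]]

    # The eigenvalues can also be read off directly:
    print(np.linalg.eigvals(M))                      # 1 and -1 (possibly in the other order)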
When does this work?
There are two ways in which the above process can fail to work.

1. Firstly, it is possible that the characteristic polynomial det(M − λIn ) has some
solutions which do not lie in the field F . For example, suppose that θ is not a
multiple of π, and consider the rotation matrix
 
Rθ = the matrix with rows (cos θ, − sin θ) and (sin θ, cos θ).

This is a real matrix (so F = R), but its eigenvalues are complex. The characteristic polynomial is

λ2 − (2 cos θ)λ + (cos2 θ + sin2 θ) = λ2 − (2 cos θ)λ + 1 = (λ − eiθ)(λ − e−iθ).
And we can see that the (complex) roots eiθ , e−iθ are not real if θ is not a multiple
of π.
2. Even if all the roots of the characteristic polynomial lie in the field F , we might
not be able to find enough eigenvectors to get a basis of V . For example, consider
the matrix M with rows (1, 1) and (0, 1).
Let’s follow our diagonalisation routine. For Step 1, we get the characteristic polynomial det(M − λI2) = (1 − λ)2, which has a single repeated root λ = 1.
Now if we set λ = 1 and solve (M − λI2)v = 0, we get

(M − I2)(x, y)^T = (y, 0)^T = (0, 0)^T.

Clearly the only condition on x and y we get here is y = 0. But this means that every eigenvector of M with eigenvalue 1 has the form (x, 0)^T. These are all scalar multiples of a single vector, namely (1, 0)^T. Hence if we take two such vectors, then they are not linearly independent. So V does not have a basis consisting of eigenvectors of M (see the sketch below).
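Here is the promised sketch (an added illustration, using SymPy) confirming that this matrix has only a one-dimensional eigenspace and so cannot be diagonalised:

    import sympy as sp

    M = sp.Matrix([[1, 1],
                   [0, 1]])

    print(M.eigenvals())                 # {1: 2}: eigenvalue 1 with algebraic multiplicity 2
    print((M - sp.eye(2)).nullspace())   # [Matrix([[1], [0]])]: geometric multiplicity only 1
    print(M.is_diagonalizable())         # False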

It turns out that the above points are the only things which can get in the way of
our diagonalisation process.
Definition 13.6. Let M be an n × n matrix and let p(x) = det(M − xIn ) be
its characteristic polynomial, and let λ ∈ F .

ˆ The algebraic multiplicity of λ is the number of times that (x − λ) occurs


as a factor of p(x).

ˆ The geometric multiplicity of λ is the number of linearly independent eigenvectors of M we can find with corresponding eigenvalue λ.

With these definitions, the following proposition summarises the above information:
Proposition 13.7. A matrix M can be diagonalised (i.e. there exists a matrix P
such that P −1 M P is diagonal) if and only if all the eigenvalues of M lie in F , and
for each such eigenvalue λ, we have
algebraic multiplicity of λ = geometric multiplicity of λ.

[The formal proof of this proposition is omitted.]

14 New vector spaces from old

Direct sums
Definition 14.1. Let V and W be vector spaces over a field F . The direct sum
of V and W , denoted V ⊕ W , is

V ⊕ W = {(v, w) : v ∈ V, w ∈ W } ,

subject to the rules λ(v, w) = (λv, λw) and (v, w) + (v0 , w0 ) = (v + v0 , w + w0 ).


We often write (v, w) = v + w, i.e. we identify v with (v, 0) and w with (0, w).

Exercise 14.2. Show V ⊕ W is a vector space.


Example 14.3. Rn ⊕ Rm = Rn+m .
Proposition 14.4. (i) dim(V ⊕ W ) = dim V + dim W .

(ii) The map

F ⊕ F ⊕ · · · ⊕ F → F n,
| {z }
n times
(x1 , x2 , . . . , xn ) → (x1 , x2 , . . . , xn )

is an isomorphism.

(iii) If U and W are subspaces of a vector space V , then U + W is isomorphic to


U ⊕ W if and only if U ∩ W = {0}.

Proof. Parts (i) and (ii) are really easy and they are left as an exercise.
Let f : U ⊕ W → U + W be defined as f (u, w) = u + w. It is easy to show that this
map is linear (do it!) and surjective, so it follows that dim(U + W ) = dim(Im(f )).
It follows from the rank-nullity theorem that

dim(Im(f )) = dim(U ⊕ W ) − dim(ker(f )) = dim(U ) + dim(W ) − dim(ker(f )).

Note that
ker(f ) = {(u, w) ∈ U ⊕ W : u + w = 0}.
However (u, w) ∈ U ⊕ W satisfies u + w = 0 if and only if u = −w. Since w ∈ W and W is a subspace, this implies u ∈ U ∩ W. Conversely, if u ∈ U ∩ W then f(u, −u) = 0. Hence ker(f) ≅ U ∩ W. Since f is surjective, it is an isomorphism if and only if U ∩ W ≅ ker(f) = {0}, completing the proof.

Example 14.5. For instance, the direct sum of two 1-dimensional vector spaces
(lines) is always 2-dimensional, however we have previously seen the sum U + W of
two lines can be a line (if U = W ), or a plane (if U 6= W , which is exactly when
U ∩ W = {0}).

Hom spaces
Definition 14.6. Let V and W be vector spaces over a field F . Let

HomF (V, W ) = Hom(V, W ) = {all linear maps V → W }.

Proposition 14.7.

(i) Hom(V, W ) is a vector space over F , under the operations

(f + g)(v) := f (v) + g(v),


(cf )(v) := cf (v).

(ii) Let {v1 , . . . , vm } be a basis of V and {w1 , . . . , wn } be a basis of W . Then Hom(V, W ) has a basis consisting of the functions eij , for i = 1, . . . , m, j = 1, . . . , n, where we define eij on the basis of V via
\[
e_{ij}(v_k) = \begin{cases} w_j & : k = i, \\ 0 & : \text{otherwise.} \end{cases}
\]
Moreover, with these bases of V and W , the function eij can be identified with multiplication by the matrix having 1 in the (j, i)-position, and 0 everywhere else.

(iii) dim Hom(V, W ) = (dim V )(dim W ).

Proof.

(i) A map f ∈ Hom(V, W ) is determined by how it acts on each vector v ∈ V . Hence, to define f + g and cf in Hom(V, W ), it is enough to specify how they act on each vector v ∈ V , and the vector space axioms can then be verified by evaluating both sides at each v ∈ V . For instance, commutativity of addition holds because (f + g)(v) := f (v) + g(v) = g(v) + f (v) = (g + f )(v). You can similarly check the other axioms. Note that the zero element 0 ∈ Hom(V, W ) is defined by 0(v) = 0_W for all v ∈ V .

(ii) First of all we need to show that eij ∈ Hom(V, W ), i.e. that eij is indeed linear. Given v, v′ ∈ V , there are unique $a_k, a'_k$ such that $v = \sum_{k=1}^m a_k v_k$ and $v' = \sum_{k=1}^m a'_k v_k$. Then
\[
e_{ij}(v + v') = e_{ij}\Big( \sum_{k=1}^m (a_k + a'_k) v_k \Big) = (a_i + a'_i) w_j = a_i w_j + a'_i w_j = e_{ij}(v) + e_{ij}(v').
\]

Similarly, one can show that eij (λv) = λeij (v). (Do it!).
Now we need to show that the {eij }, for 1 ≤ i ≤ m and 1 ≤ j ≤ n, form a basis of Hom(V, W ). To prove linear independence, suppose we have aij such that
\[
\sum_{i=1}^m \sum_{j=1}^n a_{ij} e_{ij} = 0.
\]


By applying both sides to v = vk for k = 1, . . . , m we get
\[
\sum_{i=1}^m \sum_{j=1}^n a_{ij} e_{ij}(v_k) = \sum_{j=1}^n a_{kj} w_j = 0(v_k) = 0_W,
\]
which is a linear combination in W for each k = 1, . . . , m. However, since {w1 , . . . , wn } form a basis of W , we conclude that akj = 0 for all k, j, so the {eij } are linearly independent.
To prove that the {eij } span Hom(V, W ) we need to show that given T ∈ Hom(V, W ), there are aij such that
\[
T = \sum_{i=1}^m \sum_{j=1}^n a_{ij} e_{ij}.
\]

For each vi , for i = 1, . . . , m, there are aij such that
\[
T(v_i) = \sum_{j=1}^n a_{ij} w_j,
\]
since {wj } form a basis of W . Now, let
\[
T' := \sum_{i=1}^m \sum_{j=1}^n a_{ij} e_{ij}.
\]

We claim that T′ = T . It is enough to show that for any v ∈ V we have T′(v) = T (v), or equivalently that (T′ − T )(v) = 0_W . We note that by definition of aij and T′ we have
\[
(T' - T)(v_i) = T'(v_i) - T(v_i) = \sum_{j=1}^n a_{ij} w_j - \sum_{j=1}^n a_{ij} w_j = 0.
\]

For any v ∈ V there are bk ∈ F such that $v = \sum_{k=1}^m b_k v_k$. Hence, by linearity of T and T′:
\[
(T' - T)(v) = (T' - T)\Big( \sum_{k=1}^m b_k v_k \Big) = \sum_{k=1}^m b_k \big( (T' - T)(v_k) \big) = 0_W,
\]

for all v ∈ V , so
\[
T = T' = \sum_{i=1}^m \sum_{j=1}^n a_{ij} e_{ij}
\]
and the {eij } span Hom(V, W ), hence are a basis. Since this basis has n × m = dim(V ) dim(W ) elements, (iii) follows.
For the last claim regarding the matrix of eij ∈ Hom(V, W ), note that if we let B = {v1 , . . . , vm } and B′ = {w1 , . . . , wn } then [eij (vk )]_{B′} is the 0-vector unless k = i, in which case all the entries are 0 apart from the j-th entry, which is 1. Hence the result follows from Observation 12.3.
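As a concrete check of the last claim, take V = W = $\mathbb{R}^2$ with the standard bases B = B′ = {v1 = (1, 0), v2 = (0, 1)}. Then, for example, e21 sends v2 to w1 and v1 to 0, so the first column of its matrix is the zero vector and the second column is $(1, 0)^T$, i.e.
\[
e_{21} \longleftrightarrow \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},
\]
with the single 1 in the (j, i) = (1, 2)-position. The four maps e11 , e12 , e21 , e22 correspond to the four matrices with a single entry 1, and $\dim \operatorname{Hom}(\mathbb{R}^2, \mathbb{R}^2) = 4 = (\dim \mathbb{R}^2)(\dim \mathbb{R}^2)$, in agreement with (iii).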


Example 14.8. Let V = {a cos t + b sin t : a, b ∈ R}, which you have previously
seen is a vector space. We claim:

(i) the map T = d/dt ∈ Hom(V, V ),

(ii) Let B = B′ = {v1 = cos t, v2 = sin t} be the natural basis for V . In the
notation of Proposition 14.7, we have:

e11 (a cos t + b sin t) = a cos t, e21 (a cos t + b sin t) = b cos t,


e12 (a cos t + b sin t) = a sin t, e22 (a cos t + b sin t) = b sin t.

(iii) In particular the matrix of T in this basis is
\[
[T]_{B,B} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},
\]
so T = −e12 + e21 .

To see (i), note that d/dt is a linear transformation on the vector space of continuous functions, of which V is a subspace, so we only need to show that d/dt(V ) ⊆ V . Indeed,
\[
\frac{d}{dt}(a \cos t + b \sin t) = -a \sin t + b \cos t \in V, \tag{2}
\]
so T = d/dt ∈ Hom(V, V ).
To see (ii) we just apply Proposition 14.7 (ii), i.e.

e11 (cos t) = e11 (v1 ) = v1 = cos t, e21 (cos t) = e21 (v1 ) = 0,


e12 (cos t) = e12 (v1 ) = v2 = sin t, e22 (cos t) = e22 (v1 ) = 0,
e11 (sin t) = e11 (v2 ) = 0, e21 (sin t) = e21 (v2 ) = v1 = cos t,
e12 (sin t) = e12 (v2 ) = 0, e22 (sin t) = e22 (v2 ) = v2 = sin t.

and the fact that eij are linear.


For (iii) we note from (2) that
\[
T(a v_1 + b v_2) = -a v_2 + b v_1. \tag{3}
\]
Hence
\[
[T]_{B,B} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.
\]
Moreover, from (3) we see that T sends v1 to −v2 and v2 to v1 . We can express those maps in terms of the eij by inspection above, concluding T = −e12 + e21 .
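As a quick sanity check, take the function 3 cos t + 2 sin t, whose coordinate vector with respect to B is $(3, 2)^T$. Multiplying by the matrix above gives
\[
\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ -3 \end{pmatrix},
\]
i.e. the function 2 cos t − 3 sin t, which is indeed the derivative of 3 cos t + 2 sin t.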


15 An introduction to inner product spaces

In this section we hint at some material that is not examinable, but which would be covered if we had more time.

Inner product spaces

When V = $\mathbb{R}^n$ you have encountered the dot product between elements v = (v1 , . . . , vn ) and w = (w1 , . . . , wn ) as
\[
v \cdot w = \langle v, w \rangle = \sum_{i=1}^n v_i w_i.
\]
Observe that since we know multiplication of matrices, we can also describe this product as
\[
\langle v, w \rangle = v^T \, I \, w = \sum_{i=1}^n v_i w_i.
\]

But what happens if we replace the identity matrix above by another matrix? Do we get a notion of ‘product’? For suitable matrices we do, as the following definition makes precise.

Definition 15.1. An inner product on a vector space V over a field F = R or F = C is a function ⟨−, −⟩ : V ⊕ V → F that takes each ordered pair (u, v) ∈ V ⊕ V to a number ⟨u, v⟩ ∈ F and has the following properties:

(i) positivity: ⟨v, v⟩ ≥ 0 for all v ∈ V ;

(ii) definiteness: ⟨v, v⟩ = 0 if and only if v = 0;

(iii) additivity in first slot: ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ for all u, v, w ∈ V ;

(iv) homogeneity in first slot: ⟨λu, v⟩ = λ⟨u, v⟩ for all λ ∈ F and all u, v ∈ V ;

(v) conjugate symmetry: $\langle u, v \rangle = \overline{\langle v, u \rangle}$ for all u, v ∈ V .

Note that if F = R then $\overline{\langle v, u \rangle} = \langle v, u \rangle$, so the inner product is symmetric.


When a vector space V has an inner product attached to it we say V is an inner
product space.

Example 15.2. If F = R, V = $\mathbb{R}^n$, and A = (aij ) is a symmetric matrix (i.e. $A^T = A$, or equivalently aij = aji ) which is moreover positive definite (meaning $x^T A x > 0$ for all x ≠ 0), then we can define an inner product as $\langle x, y \rangle = x^T A y$ (check it satisfies (i)–(v) in the definition!).
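For instance, taking
\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
\]
on $\mathbb{R}^2$ gives
\[
\langle (x_1, x_2), (y_1, y_2) \rangle = 2x_1 y_1 + x_1 y_2 + x_2 y_1 + 2 x_2 y_2,
\]
and in particular $\langle x, x \rangle = 2x_1^2 + 2x_1 x_2 + 2x_2^2 = x_1^2 + x_2^2 + (x_1 + x_2)^2 > 0$ whenever x ≠ 0, so positivity and definiteness hold.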
Example 15.3. A more highbrow example: Let V be the vector space of continuous
real-valued functions on [−1, 1] and define
\[
\langle f, g \rangle = \int_{-1}^{1} f(x) g(x) \, dx.
\]


This is an inner product and, in fact, a baby example of a Hilbert space; Hilbert spaces are the main objects of study in functional analysis.

An inner product space is very useful for formalising notions from Euclidean geometry within linear algebra. For instance, it allows us to define lengths of vectors (norms) or even angles between two vectors. To see this, notice that property (v) in the definition of inner product means that $\langle v, v \rangle = \overline{\langle v, v \rangle}$, so ⟨v, v⟩ ∈ R.

Definition 15.4. Let V be an inner product space. The norm of a vector v ∈ V is defined by
\[
\|v\| = \sqrt{\langle v, v \rangle}.
\]

Definition 15.5. Let V be a real inner product space. The angle between two nonzero vectors v, w ∈ V is defined to be
\[
\arccos \frac{\langle v, w \rangle}{\|v\| \, \|w\|}.
\]
Two vectors v, w of an inner product space V are said to be orthogonal if the angle between them is π/2 (a right angle), or equivalently ⟨v, w⟩ = 0.
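For example, in $\mathbb{R}^2$ with the dot product, the vectors v = (1, 0) and w = (1, 1) satisfy ⟨v, w⟩ = 1, ‖v‖ = 1 and ‖w‖ = √2, so the angle between them is
\[
\arccos \frac{1}{\sqrt{2}} = \frac{\pi}{4},
\]
while v = (1, 0) and w = (0, 1) satisfy ⟨v, w⟩ = 0 and so are orthogonal, as the usual picture suggests.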

In many applications (e.g. computer graphics, including videogames) lengths of vectors and angles are important quantities to preserve under linear transformations (such as those involved in moving camera viewpoints around). For this reason, it is important to consider bases whose elements are at right angles to each other and have norm 1:
Definition 15.6. Let V be a finite dimensional inner product vector space over R. An orthonormal basis B = {v1 , . . . , vn } of V is one where
\[
\langle v_i, v_j \rangle = \begin{cases} 1 & : j = i, \\ 0 & : \text{otherwise.} \end{cases}
\]

What the above means is that all the basis vectors have norm 1 and are at right angles to each other. The standard basis of V = $\mathbb{R}^n$ with the inner product given by the dot product is an example of an orthonormal basis. This theory allows us to extend that notion to more abstract vector spaces.
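For instance, in $\mathbb{R}^2$ with the dot product, the set
\[
B = \left\{ \tfrac{1}{\sqrt{2}}(1, 1), \ \tfrac{1}{\sqrt{2}}(1, -1) \right\}
\]
is also an orthonormal basis: each vector has norm 1, and their inner product is $\tfrac{1}{2}\big(1 \cdot 1 + 1 \cdot (-1)\big) = 0$.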
There is an algorithmic procedure for producing an orthonormal basis, called the Gram-Schmidt process, but I am afraid we have run out of time. You may explore some of these topics and more in a Capstone project next year!
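In outline (this goes beyond what we cover here), starting from any basis {u1 , . . . , un } of a real inner product space, the Gram-Schmidt process first produces an orthogonal basis by subtracting projections,
\[
w_1 = u_1, \qquad w_k = u_k - \sum_{i=1}^{k-1} \frac{\langle u_k, w_i \rangle}{\langle w_i, w_i \rangle} \, w_i \quad (k = 2, \dots, n),
\]
and then normalises, setting $v_k = w_k / \|w_k\|$, to obtain an orthonormal basis {v1 , . . . , vn }.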
