
HONOURS MODERATION

LINEAR ALGEBRA I, 2008/09

ANNE HENKE
Contents

Topics Covered
Some remarks
Solving mathematical problems
1. Fields
2. The Algebra of Matrices
3. Vector Spaces
4. First Properties of Vector Spaces
5. Subspaces of Vector Spaces
6. Linear Dependence, Linear Independence and Spanning
7. Bases of Vector Spaces
8. Steinitz Exchange Procedure
9. Dimension of Vector Spaces and an Application to Sums
10. Linear Transformations
11. The Rank-Nullity Theorem
12. The Matrix Representation of a Linear Transformation
13. Row Reduced Echelon Matrices
14. Systems of Linear Equations
15. Invertible Matrices and Systems of Linear Equations
16. Elementary Matrices
17. Row Rank and Column Rank

Topics Covered

Algebra of matrices.
Vector spaces over the real numbers; subspaces. Linear dependence and linear
independence. The span of a (finite) set of vectors; spanning sets. Examples.
Finite dimensionality.
Definition of bases; reduction of a spanning set and extension of a linearly inde-
pendent set to a basis; proof that all bases have the same size. Dimension of a
vector space. Co-ordinates with respect to a basis.
Sums and intersections of subspaces; formula for the dimension of the sum.
Linear transformations from one (real) vector space to another. The image and
kernel of a linear transformation. The rank-nullity theorem. Applications.
The matrix representation of a linear transformation with respect to fixed bases;
change of basis and co-ordinate systems. Composition of transformations and
product of matrices.
Elementary row operations on matrices; echelon form and row-reduction. Matrix
representation of a system of linear equations. Invariance of the row space under
row operations; row rank.
Significance of image, kernel, rank and nullity for systems of linear equations.
Solution by Gaussian elimination. Bases of solution space of homogeneous equa-
tions. Applications to finding bases of vector spaces.
Invertible matrices; use of row operations to decide invertibility and to calculate
inverse.
Column space and column rank. Equality of row rank and column rank.

Some remarks

This set of notes is a collection of material which will be covered in this course, in
about the given order. It contains a few things which are not part of the syllabus,
like the section on fields, examples of vector spaces over a field different from R,
or some comments on infinite dimensional vector spaces. I will use this collection
of material to prepare the lectures. This means that I may spontaneously decide
to give different examples, or to elaborate on something that I did not write down
in these notes. I would strongly advise you to take your own notes in the lectures
– if only because it helps you to concentrate on the lecture. Equally important is
that you read linear algebra books; they are not only written with much more care
than these notes, they also contain more examples, more details and more
background material.
Please let me know of any necessary corrections of the mathematics, preferably
by email to: henke@maths.ox.ac.uk.

Solving mathematical problems

Mathematical problems. Problems play a central role in mathematics. To
solve a problem, we need to analyse the problem, to play with it and to spend
time with it, in order to eventually solve it with imagination and with a sense
for elegance and symmetry. Mathematical problems are the natural way to learn
these abilities. When studying, your aim should not be to constantly think about
how to prepare for the examinations. It is the other way around: the examination
will test whether you have learned to solve problems. Don't forget that the problem
sheets coming with your lecture courses are something entirely different from your
school homework. Copying solutions means you miss the most important part of
your education. Do not restrict yourself to just the easy problems which are there
to train a new concept; these are just warm-up exercises. The real learning effect
comes when you stretch your mind beyond the things you already know. This is
like in sports: the more you exercise, the better you get. The wise student will
solve many, many exercises, going beyond the problem sheets of the course.
Analysing. Typically you have one week to solve a problem sheet. Start
immediately and use the whole time span to think about the problems. This
means thinking, rethinking, trying repeatedly, eventually improving the solution,
finding alternative solutions. Put the sheet away in between and do something else,
and repeatedly return to the problems you have not yet solved. You cannot
expect to solve a problem in a few minutes. Many ideas need to ripen first in
your subconscious before you see light. It is vital that you really know the
problem. This does not mean that you learn the problem by heart; no, you need
to understand the problem. In order to understand the problem it is necessary
that, when reading it first, you spend so much time thinking about what you read
that you can formulate the problem in your own words, and that you can explain the
problem to a friend, any time, without thinking and without using the problem
sheet. Look up definitions and understand the context in which these definitions
appear. Ask yourself which theorems from the lectures could be helpful: is the
problem a special case of a theorem, or does it generalise a theorem? If the problem
deals with a general situation, form examples (special cases). Often when you know
the right examples, you are able to understand why a general statement is
correct, and hence you can write down a proof. Try to visualise the problems (for
example, real functions you can draw). Check which methods of proof were
used in the relevant lecture material. Trust your own intuition. Are there any
situations the problem reminds you of?
Talking. Talk about the problem. In general, try to talk as much as possible
about mathematics. Talking helps to structure your own thoughts. You can talk
to your friends, tutorial partners, your tutor. Use that chance! But be aware that
it only makes sense to talk about the problem if you have spent some time thinking
about it. Working in groups can also be a good way of learning, but only if there
is a healthy balance between giving and taking. In the end you are measured on
your own abilities. To develop these it is not enough that someone just explains
the solutions to you. You need to actively participate in the process of finding the
solution. When you have found a solution, it can prove helpful to see how other
people solve the problem, or to let other people criticise your own solution.
Writing. This is a critical moment. Here it shows whether the solution you have
thought out can really be written down. Every correct solution can be written down
in a sensible way. If you have problems writing down your solution, then you have
not yet ordered your thoughts enough; you have not yet fully understood the
solution, the mathematical mechanism. Think again, you have not yet reached
the final goal! There are two bad extremes of writing styles: the first one is to just
write a calculation without an argument; the second one is to write a whole novel
without talking precisely about the problem. The correct way is somewhere in
the middle. Give precise arguments. Moreover, a solution to a problem consists
of a properly readable English text. We do not write maths-language; we speak
and write English when we explain a solution. We write full sentences (not just
a formula without giving any context). Can you read your text aloud and does it
still make sense? Expect that you will not be handing in the first written version
of your solution. And of course, be kind and respectful to your tutor: write clearly
and do not hand in pages where half of the text is crossed out or your coffee pot
has spilled over. Your tutors also care about you making progress. Do you still
understand your own solution a couple of days later? If not, start again. Getting a
correct, elegant and well written solution is often hard work! But you will feel good
when you have achieved it.
Presenting. The communication of a solution is an important part of mathematical
work. It is part of your education at university to give a clear and
understandable presentation of your work. You will likely need it whatever you
do after university. Practise now; to learn it later will be harder.
The above is a free translation of parts of the student advice given by Prof M.
Lehn (University of Mainz); for the original see
http://www.mathematik.uni-mainz.de/Members/lehn/le/uebungsblatt.

1. Fields

This chapter is not part of the syllabus of this course. It is included in these notes
to indicate the more general setting in which linear algebra is defined. Fields will
properly be introduced and studied in some depth in the second year. The objects
that we study in this course – vector spaces, subspaces, linear maps (but similarly
groups, fields and many other mathematical objects which you meet later) – are
typically defined by a list of axioms that need to be satisfied.
Notation 1.1.
C = {a + bi | a, b ∈ R} = set of all complex numbers,
R = set of all real numbers,
Q = set of all rational numbers = {m/n | m, n ∈ Z, n ≠ 0},
Z = set of all integers,
N = set of all natural numbers.

Note: Z ⊆ Q ⊆ R ⊆ C.
Recall that the addition and multiplication of complex numbers is given by:
(a + bi) + (c + di) = (a + c) + (b + d)i,
(a + bi) · (c + di) = (ac − bd) + (ad + bc)i.
Note that this generalises the addition and multiplication of the subsets N, Z, Q
and R.
Definition 1.2. Let K be a subset of the complex numbers. Then K is called a
field if it satisfies the following conditions:

(K1) If x, y ∈ K, then x + y ∈ K and x · y ∈ K.
(K2) If x ∈ K, then −x ∈ K. If furthermore x ≠ 0, then x⁻¹ ∈ K.
(K3) The elements 0 and 1 are elements of K.
Example 1.3. (1) Claim: Q is a field.
Proof. We need to check that the axioms (K1)-(K3) hold. Let x, y ∈ Q.
Then – by the definition of Q, see above – there exist a, b, c, d ∈ Z with
b ≠ 0, d ≠ 0 such that x = a/b and y = c/d. To show that a number is in Q,
we need to write it as a fraction.
(a) We have:

x + y = a/b + c/d = (ad + bc)/(bd),
x · y = (ac)/(bd).

Since a, b, c, d ∈ Z, it follows that ad + bc ∈ Z, bd ∈ Z and ac ∈ Z.
Since b ≠ 0 and d ≠ 0, also b · d ≠ 0. Hence (ad + bc)/(bd) ∈ Q and
(ac)/(bd) ∈ Q, that is, x + y, x · y ∈ Q. Hence (K1) holds.
(b) Since a ∈ Z, also −a ∈ Z. Hence −x = (−a)/b ∈ Q. Moreover, if
x ≠ 0 then a ≠ 0 and x⁻¹ = (a/b)⁻¹ = b/a ∈ Q. Hence (K2) holds.
(c) 0, 1 are elements in Q as 0 = 0/1 and 1 = 1/1.

(2) Similarly to (1) we have: R and C are fields.

(3) Claim: Z is not a field.

Proof. It is enough to show that one of the three axioms in Definition
1.2 fails. We show that (K2) fails. Let x = 3; then x ∈ Z and x ≠ 0.
However x⁻¹ = 1/3 ∉ Z. Hence (K2) does not hold.
(4) Define the set Q(√2) = {a + b√2 | a, b ∈ Q}.
Claim: Q(√2) is a field.
Proof. Clearly Q(√2) consists of real numbers. So Q(√2) is a subset
of C. Let x = a + b√2, y = c + d√2 where a, b, c, d ∈ Q. Then x +
y = (a + c) + (b + d)√2. Since a, c ∈ Q it follows by Example (1) that
a + c ∈ Q. Similarly b, d ∈ Q implies b + d ∈ Q. Hence x + y ∈ Q(√2).
Next, x · y = (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2. Since
a, b, c, d ∈ Q and Q is a field, it follows that ac + 2bd and ad + bc are
elements in Q. So x · y ∈ Q(√2). Hence (K1) holds. To check the rest of
the axioms is left as an exercise to the reader (a hint for the inverses
follows after this list).

(5) Q(i) := {a + bi | a, b ∈ Q} is a field.
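A hint for the inverses in Example (4), filling in a step that is left to the reader
above: if x = a + b√2 ≠ 0 with a, b ∈ Q, then a² − 2b² ≠ 0 (otherwise √2 would
be rational), and

x⁻¹ = (a − b√2) / ((a + b√2)(a − b√2)) = a/(a² − 2b²) − (b/(a² − 2b²))√2 ∈ Q(√2).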

Can you find further examples of fields?


Remark 1.4. There is a more general definition of a field, which does not assume
that we work with a subset of the complex numbers. The interested reader is referred
to the literature. Linear algebra deals with vector spaces. Vector spaces are defined
over a field K. In this course we will always take K = R. The results presented
in this course could easily be generalised to hold over any field K.

2. The Algebra of Matrices

We will meet matrices at various places in this course. They will provide an
example of the most important objects of linear algebra, the so-called vector spaces.
They will also be of fundamental importance when we study maps between vector
spaces. Finally, they will be important when we study systems of linear equations.
In this section, we introduce matrices and operations defined for matrices. We are
interested in which algebraic relations matrices satisfy. Matrices can be defined
over any field K. In this course we always take K = R. We assume in this section
the usual rules of how to add and multiply elements in R. In the more general
setting of a field K these rules are precisely part of the definition of what a field
is.
Definition 2.1. Let m, n be natural numbers. An m × n matrix over R is an
array

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},

where aij ∈ R. For short we write A = (aij)_{1≤i≤m, 1≤j≤n}, or A = (aij) if the shape
of the matrix is understood. A matrix of shape m × 1 is called a vector. We
define Mm×n(R) as the set of all m × n matrices with real entries. In particular
Mn(R) = Mn×n(R) is the set of real square matrices of size n.
Example 2.2. Matrix A given below is a 3 × 3 matrix. We have three rows and
three columns. Matrix B below is a general 3 × 3 matrix with entries bij ∈ R
where 1 ≤ i, j ≤ 3. We speak of bij as the (i, j)th entry of the matrix B. This
entry lies in row i and column j. Matrices need not be square; note that the
definition takes account of a general m × n matrix, a matrix that has m rows and
n columns. Matrix C below is an example of a 2 × 4 matrix.

A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 1 & 0 \\ -2 & 2 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix}, \quad C = \begin{pmatrix} -8 & 1 & 1 & -1 \\ 4 & 1 & 0 & 5 \end{pmatrix}.
Example 2.3. Define 0m×n = (aij) where aij = 0 for i = 1, . . . , m and j = 1, . . . , n.
This is called the zero matrix of Mm×n(R). Moreover, define the matrix In = (aij)
where

a_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases}

for 1 ≤ i, j ≤ n. This is called the n × n identity matrix. Often the zero matrix
and the identity matrix are just denoted by 0 and I respectively:

0 = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}, \qquad I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.

We next define addition of matrices:



Definition 2.4. Let A, B be two m × n matrices, say A = (aij ) and B = (bij ).


Then the sum of A and B, written as A + B, is the matrix A + B = (cij ) where
cij = aij + bij with 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Remark. Note that addition of matrices is only defined for matrices of the same
shape. If A, B are two m × n matrices then also A + B is an m × n matrix. We
say: We add two matrices by adding the entries coordinate-wise. Note that this
also defines addition of vectors (which are special matrices by definition).
Example 2.5. For example, for 3 × 3 matrices we have:

\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix} = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13} \\ a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23} \\ a_{31}+b_{31} & a_{32}+b_{32} & a_{33}+b_{33} \end{pmatrix}.

We next define scalar multiplication of matrices. Note that this definition includes
also the definition of the scalar multiplication of vectors.
Definition 2.6. The product of a matrix A ∈ Mm×n(R) by a scalar λ ∈ R,
written as λA, is the matrix C = (cij) obtained by multiplying each entry of A by
λ: cij = λaij for 1 ≤ i ≤ m and 1 ≤ j ≤ n:

\lambda A = \begin{pmatrix} \lambda a_{11} & \cdots & \lambda a_{1n} \\ \vdots & \ddots & \vdots \\ \lambda a_{m1} & \cdots & \lambda a_{mn} \end{pmatrix}.
Example 2.7. If A = (aij) is a square matrix with aij = 0 for all i ≠ j, then A
is called a diagonal matrix, and we write A = diag(a11, . . . , ann). A special type of
a diagonal matrix is the scalar matrix: a matrix B is a scalar matrix if B = kIn
for some k ∈ R.

A = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}, \qquad B = kI_n = \begin{pmatrix} k & 0 & \cdots & 0 \\ 0 & k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & k \end{pmatrix}.
Proposition 2.8. For all A, B, C ∈ Mm×n (R) and for all r, s ∈ R we have:

(1) A + B = B + A,
(2) A + (B + C) = (A + B) + C,
(3) A + 0 = A = 0 + A,
(4) s(rA) = (sr)A,
(5) (r + s)A = rA + sA,
(6) r(A + B) = rA + rB.

Note that 0 in statement (3) denotes the zero matrix of shape m × n. Note that
statement (2) says that, when forming a sum of matrices, brackets can safely be
omitted.
The proof of the above statements is straightforward. To demonstrate how such
proofs should be written, we give an example by proving the first statement.

Proof. Let A = (aij ), B = (bij ) ∈ Mm×n (R). Define (cij ) = A + B and


(dij ) = B + A. By the definition of addition of matrices (see Definition 2.4), we
have cij = aij + bij and dij = bij + aij . Since real numbers are commutative with
respect to addition, that is x + y = y + x for any x, y ∈ R, this implies that
cij = dij . Hence A + B = B + A.

There is one more operation for matrices which we need, namely the product of
matrices. We first recall the summation notation: we write \sum_{i=1}^{n} a_i for short for
the sum a_1 + a_2 + . . . + a_n of real numbers a_i.

Definition 2.9. If A = (aij) is an m × n matrix over the real numbers and
B = (bij) is an n × p matrix over the real numbers, then the product AB is an
m × p matrix, defined as follows: AB = C = (cij) where c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} for
1 ≤ i ≤ m and 1 ≤ j ≤ p.

Remark. Note that multiplication of two matrices A and B is only defined if the
number of elements in a row of A equals the number of elements in a column of
B. If it is defined, then the (i, j)th entry of the matrix C = AB is obtained by
multiplying the ith row of matrix A with the jth column of the matrix B:

c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = a_{i1} b_{1j} + a_{i2} b_{2j} + \ldots + a_{in} b_{nj}.

Example 2.10. The above definition includes the multiplication of a matrix with
a vector. For example, let A be a 3 × 3 matrix and x a 3 × 1 vector; then Ax
is defined and is another 3 × 1 vector. For example,

\begin{pmatrix} 2 & 1 & 1 \\ 4 & 1 & 0 \\ -2 & 2 & 1 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \\ -6 \end{pmatrix}.
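As a computational aside (not part of the original notes), the defining formula of
Definition 2.9 translates directly into a triple loop; the following minimal Python
sketch reproduces the product computed above.

```python
# A sketch of Definition 2.9: the (i, j)th entry of AB is sum_k A[i][k] * B[k][j].
def mat_mul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[2, 1, 1], [4, 1, 0], [-2, 2, 1]]
x = [[2], [-1], [0]]        # the 3 x 1 vector from Example 2.10
print(mat_mul(A, x))        # [[3], [7], [-6]], as claimed above
```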
Proposition 2.11. Suppose A, B, C are matrices.

(1) If AB is defined then for all r ∈ R we have (rA)B = r(AB) = A(rB).


(2) If AB, AC and B + C are defined then so is A(B + C) and AB + AC =
A(B + C).
(3) If BA, CA and B + C are defined then so is (B + C)A and BA + CA =
(B + C)A.
(4) If AB and BC are defined then so are (AB)C and A(BC), and (AB)C =
A(BC).

Remark. Note that statement (4) says that, when forming a product of matrices,
brackets can safely be omitted. The proofs of all statements except (4) are
straightforward; they are left as an exercise to the reader.
We conclude this section by defining some more language about matrices and
by giving some more examples.
Example 2.12. (1) For every matrix A = (aij ), there exists a matrix B such
that A + B = 0 = B + A, namely B = (−aij ). We call B the additive
inverse of A and write −A for it. Note that −A = (−1)A; this last
equation says that the additive inverse of A is given by multiplying the
matrix A with the scalar (−1).
(2) If A ∈ Mn(R) (a square matrix) and there exists B ∈ Mn(R) such that
AB = BA = In, then we call B the (multiplicative) inverse of A. We write
A⁻¹ for the inverse matrix of A. For example, the matrices A, B, D below
are invertible with A⁻¹ = A, B⁻¹ = D and D⁻¹ = B. Matrix C is not
invertible. So there are non-zero square matrices which are not invertible.

A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad D = \begin{pmatrix} 1 & -a \\ 0 & 1 \end{pmatrix}.
Example 2.13. If A ∈ Mm×n(R) and A = (aij), then A^T = (bij) where bij = aji.
We call A^T the transpose of A. In particular A^T ∈ Mn×m(R). Note that rows
of A^T are columns of A, and columns of A^T are rows of A. For example,

A = \begin{pmatrix} 2 & 1 & 4 \\ 3 & 1 & 2 \end{pmatrix} \quad \text{then} \quad A^T = \begin{pmatrix} 2 & 3 \\ 1 & 1 \\ 4 & 2 \end{pmatrix}.

We say A is a symmetric matrix if A^T = A. We say A is skew-symmetric if
A^T = −A. We say A is orthogonal if AA^T = A^T A = I. Equivalently, A is
orthogonal if A is invertible and A⁻¹ = A^T.
Proposition 2.14. Let A, B ∈ Mm×n(R), C ∈ Mn×p(R). Then

(1) (A^T)^T = A,
(2) (A + B)^T = A^T + B^T,
(3) (λA)^T = λA^T,
(4) (BC)^T = C^T B^T.

Proof. We prove the first property and leave the others as straightforward
exercises to the reader. Let A = (aij) be a matrix of shape m × n. Note that
A and (A^T)^T have the same shape. By the definition of the transpose,
the (i, j)th entry of (A^T)^T equals the (j, i)th entry of A^T, which in turn equals
the (i, j)th entry of A. So the entries of A and (A^T)^T coincide. Hence indeed
A = (A^T)^T.

Example 2.15. Let us see, by way of an example, how to check whether a matrix
is symmetric. Let A, B be symmetric matrices of the same size. Is the matrix AB
again symmetric? We claim that AB is symmetric if and only if AB = BA. To
prove this, we need to show two directions:

(a) Assume that AB = BA. We want to show that AB is symmetric. By
assumption we know that A^T = A and B^T = B. So by the assumption,
and by using Proposition 2.14(4), we have: (AB)^T = B^T A^T = BA = AB.
(b) Conversely, assume now that AB is symmetric. We want to show that
AB = BA. Indeed, AB = (AB)^T = B^T A^T = BA, where the first equality
holds by the assumption that AB is symmetric, the second one uses
Proposition 2.14(4), and the third equality uses that A and B are symmetric
by assumption.
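A quick numerical illustration of this claim (my own example, not from the notes):
the two symmetric matrices below do not commute, and their product is indeed
not symmetric.

```python
# Sketch: for symmetric A and B, the product AB is symmetric iff AB = BA.
def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

A = [[1, 2], [2, 3]]        # symmetric
B = [[0, 1], [1, 0]]        # symmetric
AB = mul(A, B)
print(AB == mul(B, A))      # False: A and B do not commute ...
print(transpose(AB) == AB)  # ... and indeed AB is not symmetric
```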

Definition 2.16. If A is an n × n matrix, say A = (aij), then the trace of A is
defined to be

tr(A) = \sum_{i=1}^{n} a_{ii},

the sum of all the elements on the main diagonal of A.

Example 2.17. We have tr(In) = n and tr(A) = 2 + 1 + 1 = 4 where

A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 1 & 0 \\ -2 & 2 & 1 \end{pmatrix}.
Proposition 2.18. Let A, B ∈ Mm(R), C ∈ Mm×n(R), D ∈ Mn×m(R). Then

(1) tr(A^T) = tr(A),
(2) tr(A + B) = tr(A) + tr(B),
(3) tr(λA) = λ tr(A),
(4) tr(DC) = tr(CD).

Proof. We prove (4); the rest is left as an exercise to the reader. Let (xij) = DC.
Then x_{ij} = \sum_{k=1}^{m} d_{ik} c_{kj}. Similarly, let (yij) = CD. Then y_{ij} = \sum_{t=1}^{n} c_{it} d_{tj}. Since
addition and multiplication in R are commutative, we have:

tr(DC) = \sum_{t=1}^{n} x_{tt} = \sum_{t=1}^{n} \sum_{k=1}^{m} d_{tk} c_{kt} = \sum_{k=1}^{m} \sum_{t=1}^{n} c_{kt} d_{tk} = \sum_{k=1}^{m} y_{kk} = tr(CD).
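A numerical spot-check of part (4) (an aside, assuming nothing beyond plain
Python): note that DC and CD even have different shapes, yet their traces agree.

```python
# Sketch: tr(DC) = tr(CD) for C of shape m x n and D of shape n x m.
def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def tr(A):
    return sum(A[i][i] for i in range(len(A)))

C = [[1, 2], [3, 4], [5, 6]]           # 3 x 2
D = [[1, 0, 2], [0, 1, 3]]             # 2 x 3
print(tr(mul(D, C)), tr(mul(C, D)))    # 33 33: a 2x2 and a 3x3 trace agree
```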

Exercise 1. Let ω be a complex cube root of 1 (this means ω³ = 1) with ω ≠ 1.
Prove that 1 + ω + ω² = 0. Letting A be the matrix

A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & \omega & \omega^2 \\ 1 & \omega^2 & \omega \end{pmatrix},

determine A² and A⁻¹.
Exercise 2. Let A and B be two matrices such that AB and BA are defined and
of the same size. We say that A and B commute with respect to multiplication
if AB = BA. Now let A be a 2 × 2 matrix with entries in R.

(a) Show that A commutes with \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} if and only if A is diagonal.
(b) Show that A commutes with \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} if and only if A is diagonal.
(c) Which 2 × 2 matrices A commute with \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}?
(d) Deduce that A commutes with all 2 × 2 matrices if and only if
A = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} for some λ ∈ R.

(For those of you who found this problem too easy: find all n × n matrices which
commute with any matrix A ∈ Mn(R), for fixed n ∈ N. Justify your answer.)
Exercise 3. For each a ∈ R, define the matrix A(a) by

A(a) = \begin{pmatrix} 1 & a & \frac{1}{2}a^2 \\ 0 & 1 & a \\ 0 & 0 & 1 \end{pmatrix}.

Show that for all a, b ∈ R we have A(a + b) = A(a)A(b). Deduce that each matrix
A(a) is invertible.
Exercise 4. Let A and B be two square matrices of the same size with A
symmetric and B skew-symmetric. Determine which of the following matrices are
symmetric and which are skew-symmetric, and justify your answer:

(a) AB + BA,
(b) AB − BA,
(c) A²,
(d) B²,
(e) B^T(A^T + A)B,
(f) B^T(A − A^T)B.

(Do you need the assumptions on A and B in all cases?)


Exercise 5. Let A, D be square matrices of the same size. Show that the following
statements are true.

(a) If A is invertible then the inverse is unique.
(b) If A, D are invertible then AD is also invertible.
(c) Let B ∈ Mm×n(R) and C ∈ Mn×p(R). Then (BC)^T = C^T B^T.
(d) If A is invertible, then so is A^T and (A^T)⁻¹ = (A⁻¹)^T.
 
Exercise 6. Show that a 2 × 2 matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} has an inverse if and only
if ad − bc ≠ 0. Find A⁻¹.
Exercise 7. Let C and D be square matrices of the same size. Show that if both
C and D are orthogonal, then so are C⁻¹, CD and C⁻¹D.

3. Vector Spaces

We next define the main objects of linear algebra, the vector spaces over a field
K. Although we will define vector spaces over any field K, you may always take
K = R. The concrete example of vectors will underpin the abstract definition.
Definition 3.1. A vector space V over K is a triple (V, +, ·) where

(a) V is a non-empty set,


(b) + is addition of vectors, that is
+ : V × V → V with (u, v) 7→ u + v for any u, v ∈ V ,
(c) · is scalar multiplication of vectors by elements of K, that is
· : K × V → V with (λ, v) 7→ λ · v for any λ ∈ K and v ∈ V ,

such that:

(V1) u + v = v + u for all u, v ∈ V.


(V2) (u + v) + w = u + (v + w) for all u, v, w ∈ V.
(V3) There exists a special element in V , denoted by 0V , satisfying v + 0V =
0V + v = v for all v ∈ V . We call 0V the zero element of V .
(V4) For every v ∈ V there exists a special element in V , denoted by −v,
satisfying v + (−v) = (−v) + v = 0V . We call −v the additive inverse of
v.
(V5) λ(u + v) = λu + λv for all λ ∈ K and all u, v ∈ V .
(V6) (λ + µ)v = λv + µv for all λ, µ ∈ K and for all v ∈ V .
(V7) λ(µv) = (λµ)v for all λ, µ ∈ K and all v ∈ V .
(V8) 1 · v = v for all v ∈ V .

The elements of V are called vectors. If it is understood what + and · are, we will
write for short that V is a vector space instead of writing that (V, +, ·) is a vector
space. We will also typically write λv instead of λ · v, and 0 instead of 0V. Note
also that the addition and scalar multiplication are examples of so-called binary
operations.

Remark. Note that the statements in (b) and (c) mean that when checking that
some set V is a vector space, you have to check that for all u, v ∈ V and λ ∈ K,
the resulting elements u + v and λv are indeed elements belonging to V .
Example 3.2. The canonical example of a vector space is V = Rn where

R^n = \left\{ \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \;\middle|\; x_i \in \mathbb{R} \right\}.

Given two elements u, v ∈ V, there are xi, yi ∈ R for 1 ≤ i ≤ n such that

u = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad v = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.

We define the addition of elements in Rn and the scalar multiplication as follows:

u + v := \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix}, \qquad \lambda u := \begin{pmatrix} \lambda x_1 \\ \vdots \\ \lambda x_n \end{pmatrix}.
We now need to check several things to prove that (V, +, ·) is indeed a vector
space.

(a) Clearly V is a non-empty set.


(b) We need to check that V is closed with respect to addition: Note that as
xi , yi ∈ R we have xi + yi ∈ R for i = 1, . . . , n. This implies that indeed
u + v ∈ Rn = V . Hence V is closed with respect to addition.
(c) We need to check that V is closed with respect to scalar multiplication:
Note that if λ, xi ∈ R then λxi ∈ R. Hence λu ∈ V . So V is indeed closed
with respect to scalar multiplication.

We next have to check that the axioms (V1)-(V8) hold in V = Rn. Given any
elements u, v, w ∈ V, there are xi, yi, zi ∈ R for 1 ≤ i ≤ n such that

u = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad v = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad w = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}.

(V1) Since for real numbers we have xi + yi = yi + xi (for i = 1, . . . , n), it follows
that

u + v = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix} = \begin{pmatrix} y_1 + x_1 \\ \vdots \\ y_n + x_n \end{pmatrix} = v + u.

(V2) Since for real numbers we have (xi + yi) + zi = xi + (yi + zi) (for i = 1, . . . , n),
it follows that

(u + v) + w = \begin{pmatrix} (x_1 + y_1) + z_1 \\ \vdots \\ (x_n + y_n) + z_n \end{pmatrix} = \begin{pmatrix} x_1 + (y_1 + z_1) \\ \vdots \\ x_n + (y_n + z_n) \end{pmatrix} = u + (v + w).

(V3) We take

0_V = \begin{pmatrix} 0_R \\ \vdots \\ 0_R \end{pmatrix}.

Note that 0R denotes here the zero of the real numbers. Clearly 0V ∈ Rn.
Over the real numbers we have xi + 0 = 0 + xi = xi (for i = 1, . . . , n),
which implies that

u + 0_V = \begin{pmatrix} x_1 + 0_R \\ \vdots \\ x_n + 0_R \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = u.

Similarly, 0V + u = u.

(V4) Given a vector p ∈ V with, say, coordinate entries ai, we take q to have
coordinate entries −ai:

p = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}, \qquad q = \begin{pmatrix} -a_1 \\ \vdots \\ -a_n \end{pmatrix}.

Clearly then q ∈ V, and moreover

p + q = \begin{pmatrix} a_1 + (-a_1) \\ \vdots \\ a_n + (-a_n) \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}.

So indeed p + q = q + p = 0V.
(V5) If λ ∈ R then

\lambda(u + v) = \lambda \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix}    by the definition of addition of vectors,

= \begin{pmatrix} \lambda(x_1 + y_1) \\ \vdots \\ \lambda(x_n + y_n) \end{pmatrix}    by the definition of scalar multiplication,

= \begin{pmatrix} \lambda x_1 + \lambda y_1 \\ \vdots \\ \lambda x_n + \lambda y_n \end{pmatrix}    using properties of real numbers,

= \lambda \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \lambda \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \lambda u + \lambda v,

using again the definition of addition of vectors and the definition of scalar
multiplication.
(V6) If λ, µ ∈ R then

(\lambda + \mu) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} (\lambda + \mu)x_1 \\ \vdots \\ (\lambda + \mu)x_n \end{pmatrix}    definition of scalar multiplication in Rn,

= \begin{pmatrix} \lambda x_1 + \mu x_1 \\ \vdots \\ \lambda x_n + \mu x_n \end{pmatrix}    properties of real numbers,

= \begin{pmatrix} \lambda x_1 \\ \vdots \\ \lambda x_n \end{pmatrix} + \begin{pmatrix} \mu x_1 \\ \vdots \\ \mu x_n \end{pmatrix}    definition of addition in Rn,

= \lambda \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \mu \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}    definition of scalar multiplication in Rn.

Hence indeed (λ + µ)u = λu + µu for all u ∈ V.

(V7) Let λ, µ ∈ R. For real numbers we have λ(µxi) = (λµ)xi in R, where
i = 1, . . . , n. Then

\lambda(\mu u) = \lambda \begin{pmatrix} \mu x_1 \\ \vdots \\ \mu x_n \end{pmatrix}    definition of scalar multiplication in Rn,

= \begin{pmatrix} \lambda(\mu x_1) \\ \vdots \\ \lambda(\mu x_n) \end{pmatrix}    definition of scalar multiplication in Rn,

= \begin{pmatrix} (\lambda\mu)x_1 \\ \vdots \\ (\lambda\mu)x_n \end{pmatrix}    properties of real numbers,

= (\lambda\mu) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}    definition of scalar multiplication in Rn.

(V8) The element 1 in R has the property that 1 · xi = xi for all xi ∈ R. So

1 \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} 1 \cdot x_1 \\ \vdots \\ 1 \cdot x_n \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.

So 1 · v = v for all v ∈ Rn.

Hence we have verified that Rn with addition and scalar multiplication as above
is indeed a vector space.
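As an aside (not in the notes), the axioms can also be spot-checked on random
integer data; this is of course no substitute for the proof above, but it is a useful
sanity check when experimenting. The sketch below assumes only the Python
standard library.

```python
# Spot-check of some of (V1)-(V8) for R^n on random integer vectors.
import random

n = 4
rnd_vec = lambda: [random.randint(-9, 9) for _ in range(n)]
u, v = rnd_vec(), rnd_vec()
lam, mu = random.randint(-9, 9), random.randint(-9, 9)

add = lambda x, y: [a + b for a, b in zip(x, y)]
smul = lambda c, x: [c * a for a in x]

assert add(u, v) == add(v, u)                                    # (V1)
assert smul(lam, add(u, v)) == add(smul(lam, u), smul(lam, v))   # (V5)
assert smul(lam + mu, u) == add(smul(lam, u), smul(mu, u))       # (V6)
assert smul(1, u) == u                                           # (V8)
print("axioms (V1), (V5), (V6), (V8) hold on this sample")
```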
Example 3.3. Fix a natural number n. Define Rn[x] to be the set of all polynomials
f(x) of degree less than or equal to n with coefficients in R. So

Rn[x] = {f | f(x) = a0 + a1x + · · · + anx^n with ai ∈ R for i = 0, . . . , n},

which clearly is a non-empty set. Addition is defined on Rn[x] as follows. Let

f(x) = a0 + a1x + · · · + anx^n,
g(x) = b0 + b1x + · · · + bnx^n.

Then (f + g)(x) := (a0 + b0) + · · · + (an + bn)x^n. Clearly f + g ∈ Rn[x] since
ai + bi ∈ R for i = 0, . . . , n. Hence Rn[x] is closed with respect to addition.
Scalar multiplication is defined by (λf)(x) = (λa0) + · · · + (λan)x^n. As λai ∈ R it
follows that Rn[x] is closed with respect to scalar multiplication. To check axioms
(V1)-(V8) is left as an exercise to the reader. Note that the set of polynomials
of (some fixed) degree exactly n does not form a vector space.
Example 3.4. Consider the m × n matrices with entries in R, that is, consider
V = Mm×n (R) = {A | A = (aij ), aij ∈ R, i = 1, . . . , m, j = 1, . . . , n}.
Then Mm×n (R) forms a vector space with component-wise addition (see Defini-
tion 2.4) and scalar multiplication (see Definition 2.6). The proof is left as an
exercise. Hint: the element 0V is given in Example 2.3. Note that Proposition 2.8
shows some of the vector space axioms for Mm×n (R).

Example 3.5. There are many more examples of vector spaces, turning up in
different areas of mathematics. Vector spaces are defined over any field K. Similar
to above we have that

(1) (K, +, ·) is a K-vector space,


(2) (Kn [x], +, ·) is a K-vector space,
(3) (Mm×n(K), +, ·) is a K-vector space.

The precise definitions of the sets and the binary operations are left to the reader.
When doing this exercise it becomes apparent which properties a field needs to
have. We also have the following variations of the above examples:

(1) (Rn , +, ·) is an R-vector space (by Example 3.2).


(2) (Cn , +, ·) is a C-vector space (by generalising Example 3.2 to fields).
(3) (C, +, ·) is an R-vector space.
(4) (R, +, ·) is a Q-vector space.

And so on. In this course we only consider vector spaces over the real numbers.
Vector spaces over other fields than the real numbers are not part of the syllabus.
From now on we will work only with vector spaces over R. It should however be
noted that we could develop our theory equally for vector spaces over arbitrary
fields. The interested reader may try this as a (not so difficult, though possibly
somewhat boring) exercise. Examples (3) and (4) are examples of so-called field
extensions, something studied in the second year in the course “Fields”.
Example 3.6. Let X be a non-empty set. Let V = {f : X → R}. We write
x 7→ f (x) for x ∈ X. Then V is a vector space over R with the following addition
and scalar multiplication:

(a) Addition is defined by: For f1 , f2 ∈ V define (f1 + f2 )(x) := f1 (x) + f2 (x).
Note that f1 (x) ∈ R and f2 (x) ∈ R, and hence f1 (x) + f2 (x) ∈ R. So
f1 + f2 ∈ V and hence V is closed with respect to addition.
(b) Scalar multiplication is defined by: For f ∈ V and λ ∈ R define (λf )(x) =
λ · f (x). Since λ ∈ R and f (x) ∈ R, this implies λ · f (x) ∈ R. Hence
λf ∈ V and V is closed with respect to scalar multiplication.

In case X = R, we denote the vector space V by RR .
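As an aside (my own sketch, not from the notes), the pointwise operations of
Example 3.6 are easy to model with Python functions, which may help to see that
the "vectors" here really are functions.

```python
# Sketch for Example 3.6: functions X -> R with pointwise addition and scaling.
def add(f1, f2):
    return lambda x: f1(x) + f2(x)        # (f1 + f2)(x) := f1(x) + f2(x)

def smul(lam, f):
    return lambda x: lam * f(x)           # (lam * f)(x) := lam * f(x)

f = lambda x: x * x
g = lambda x: 2 * x + 1
h = add(f, smul(3.0, g))                  # the function x -> x^2 + 3(2x + 1)
print(h(2.0))                             # 4 + 15 = 19.0
```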



4. First Properties of Vector Spaces

Using the axioms (V1)-(V8) of vector spaces, we can now derive some first
properties of vector spaces. Throughout this section, let V be a vector space over
R.

Lemma 4.1. The following statements hold:

(i) The zero vector 0V is unique.


(ii) The additive inverse of any element in V is unique.

Proof. (i) Let 0V, 0′V ∈ V both have the property described in Axiom (V3),
that is:

v + 0V = 0V + v = v    ∀v ∈ V,   (a)
v + 0′V = 0′V + v = v    ∀v ∈ V.   (b)

Put v = 0′V in (a); then 0′V + 0V = 0V + 0′V = 0′V. Put v = 0V in (b);
then 0V + 0′V = 0′V + 0V = 0V. Hence 0V = 0′V.
(ii) Suppose we have two elements −v and v′ both satisfying Axiom (V4), that is:

v + (−v) = (−v) + v = 0V,   (a)
v + v′ = v′ + v = 0V.   (b)

Then

v + (−v) + v′ = (v + (−v)) + v′    by Axiom (V2),
= 0V + v′    by (a),
= v′    by Axiom (V3).

Now

v + (−v) + v′ = (v + v′) + (−v)    by Axioms (V1), (V2),
= 0V + (−v)    by (b),
= −v    by Axiom (V3).

Hence v′ = −v.

Lemma 4.2. Let V be a vector space over R. Then for all u, v ∈ V and λ ∈ R
we have:
(i) 0R · v = 0V ;
(ii) λ · 0V = 0V ;
(iii) if λ · v = 0V then λ = 0R or v = 0V ;
(iv) (−1) · v = −v;
(v) (−1) · (−v) = v;
(vi) −0V = 0V.

Proof. (i) We need to show that 0R · v satisfies Axiom (V3). If so, then
Lemma 4.1(i) implies 0R · v = 0V . Now
v + 0R · v = 1 · v + 0R · v by (V8),
= (1 + 0R ) · v by (V6),
= 1·v since 1 + 0R = 1 in R,
= v by (V8).
Therefore 0R · v = 0V .
(ii) Exercise.
(iii) Let λ · v = 0V and λ ≠ 0. We prove that v = 0V. Now
v = 1 · v    by (V8),
= (λ⁻¹ · λ) · v    as λ⁻¹ · λ = 1 in R and λ ≠ 0,
= λ⁻¹ · (λ · v)    by (V7),
= λ⁻¹ · 0V    by assumption,
= 0V    by (ii).

(iv)-(vi) Exercise.

Exercise 8. In each of the following cases, either give a careful proof that V is a
vector space over R, or give a reason why it is not:

(a) V is the set of all polynomials over R (in one variable, say x) which have
a non-zero constant term, with the usual addition of polynomials and the
usual scalar multiplication.
(b) V is the set of all functions f : X → R (for some fixed non-empty set X),
and if f, g ∈ V , α ∈ R, then the functions f + g, αf are defined by setting
(f + g)(x) = f (x) + g(x), (αf )(x) = αf (x).
(c) V is the set of all symmetric n × n matrices over R.
(d) V is the set of all skew-symmetric n × n-matrices over R.
(e) V is the set of all invertible n × n-matrices over R.
(f) V = R2 with the usual scalar multiplication and the new addition ⊕ :
V × V → V given by

\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \oplus \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 + y_2 \\ x_2 + y_1 \end{pmatrix}.
Exercise 9. Let V be a vector space over R. Use the vector space axioms to
show that for all v ∈ V and all λ ∈ R the following holds:
(a) λ · 0V = 0V , (b) (−1)v = −v.

5. Subspaces of Vector Spaces

Definition and Examples. Given an object in mathematics, one typically also
defines 'sub-objects' of this object. Assuming that the object is defined as a set
with certain properties, the sub-objects are subsets of the original object
with the same properties. In this section we define 'sub-objects' for vector spaces.
Throughout this section V denotes a vector space over R.
Definition 5.1. Let (V, +, ·) be a vector space over R. A non-empty subset W
of V is a (vector) subspace if and only if (W, +, ·) is a vector space over R. We
write W ≤ V and read this as W is a subspace of V .

Remarks. (a) Every vector space V has at least two subspaces: {0V} and V itself.
Any subspace W of V with W ≠ {0V} and W ≠ V is called a proper subspace.
(b) The zero element of a subspace W of V always coincides with the zero element
of V. To see this, use Lemma 4.1 and Definition 3.1 for V.
(c) The two binary operations (addition and scalar multiplication) needed to
define a vector space structure on W are precisely the two binary operations
given with the vector space V, restricted to the subset W. The definition now
says that, in order to check that a subset of a vector space is again a vector
space, we need to check that W is closed with respect to addition and scalar
multiplication, and that the eight axioms (V1)-(V8) of a vector space hold for W.
Since W inherits the binary operations from V, several of the axioms hold
automatically for the elements of a subset of a vector space.
Lemma 5.2 (First subspace test). A non-empty subset W of a vector space V is
a subspace of V if and only if it is closed under addition and scalar multiplication:

(i) If w1 , w2 ∈ W then w1 + w2 ∈ W .
(ii) If w ∈ W and λ ∈ R then λw ∈ W .

Proof. The proof consists of showing two statements.

“⇒”: For this direction of the proof, there is nothing to show. We assume that W ⊆ V
is a vector space. By Definition 3.1 (applied to W), the set W is closed under
addition and scalar multiplication. Hence (i) and (ii) hold.
“⇐”: Suppose W ⊆ V with W ≠ ∅, and that (i) and (ii) hold. We claim that W is a
vector space. By (i) and (ii) we know that W is closed with respect to addition and
scalar multiplication. We need to check that (V1)-(V8) hold for W.

(1) Axioms (V1), (V2) and (V5)-(V8) are inherited from V as W ⊆ V.

(2) Since W ≠ ∅, there exists w ∈ W. By assumption (ii) we know that
0R w ∈ W. By Lemma 4.2 (applied to the vector space V) we have
0R w = 0V ∈ W. Axiom (V3) holds in V by assumption. Hence 0V + w = w
for all w ∈ W ⊆ V. Hence 0W = 0V, and (V3) holds in W.
(3) Let w ∈ W ⊆ V. Then −w = (−1)w by Lemma 4.2 (applied to V).
Assumption (ii) implies that −w ∈ W. Axiom (V4) holds in V, hence
w + (−w) = 0V for all w ∈ W ⊆ V. Since 0W = 0V it follows that (V4)
holds in W.

We call Lemma 5.2 the first subspace test. It obviously speeds up the checking
that we have to do in order to prove that a given non-empty subset of a vector
space is a subspace. Often you will see conditions (i), (ii) in Lemma 5.2 simplified
into one condition. This is called the second subspace test.
Lemma 5.3 (Second subspace test). A non-empty subset W of a vector space V is
a subspace if and only if for all λ1, λ2 ∈ R and w1, w2 ∈ W we have λ1w1 + λ2w2 ∈ W.

Proof. The proof consists of showing two statements.

“⇒”: If W ≠ ∅ is a vector space, then the closure laws give λ1w1 + λ2w2 ∈ W
for any λ1, λ2 ∈ R and w1, w2 ∈ W.
“⇐”: Let W ⊆ V with W ≠ ∅. Assume λ1w1 + λ2w2 ∈ W for all w1, w2 ∈ W
and λ1, λ2 ∈ R. We need to show that W is a vector space over R. It is sufficient
to show that (i) and (ii) of Lemma 5.2 hold. Given w1, w2 ∈ W:

(1) Choose λ1 = λ2 = 1. We calculate in the vector space V. Then by
assumption, λ1w1 + λ2w2 = 1 · w1 + 1 · w2 = w1 + w2 lies in W. Here we
used (V8) for V. This shows that W is closed with respect to addition.
(2) Let λ ∈ R and choose λ1 = λ and λ2 = 0. Then by assumption
λ1w1 + λ2w2 = λ · w1 + 0 · w2 = λw1
is an element of W. Here we used Lemma 4.2 and the axioms (V3) and
(V8) for V. Hence W is closed with respect to scalar multiplication.

Lemma 5.2 now implies that W is a subspace of V.

Lemma 5.4. Let U be a subspace of V. For any k ∈ N and u1, . . . , uk ∈ U and
α1, . . . , αk ∈ R we have

\alpha_1 u_1 + \ldots + \alpha_k u_k = \sum_{i=1}^{k} \alpha_i u_i \in U.

Proof. The proof is by induction on k, using Lemma 5.3. Let k = 1; then the
claim follows from the definition of a vector space, see Definition 3.1. If k = 2
then the claim follows from Lemma 5.3. Assume the claim is true for some k ≥ 2.
Consider

x = \alpha_1 u_1 + \ldots + \alpha_k u_k + \alpha_{k+1} u_{k+1}.

Put ũ = α1u1 + . . . + αkuk. By the induction assumption, ũ ∈ U. By (V8) we have
ũ = 1 · ũ. Hence x = 1 · ũ + α_{k+1}u_{k+1}. By Lemma 5.3 we have x ∈ U.

Example 5.5. We give various examples of subspaces:

(1) R is a subspace of the R-vector space C = {a + bi | a, b ∈ R}.


(2) A subset U of R2 with {0} ≠ U and U ≠ R2 is a subspace if and only if U
is a straight line through the origin.

(3) The subspaces of R3 are precisely the following subsets of R3 :


(a) the origin,
(b) lines through the origin,
(c) planes through the origin,
(d) and R3 .
(4) Consider the set
W = {A ∈ Mn(R) | A = (aij) with aij = 0 for i > j}.
So elements in W are matrices A of the form

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & & 0 & a_{nn} \end{pmatrix}
with aij ∈ R. A matrix A of this form is called an upper triangular matrix.
We claim: the set W is a subspace of Mn(R). We use Lemma 5.2 to show
this. Note W ≠ ∅ since the zero matrix lies in W. If A = (aij), B =
(bij) ∈ W then A + B = (cij) where cij = aij + bij; and if aij = 0 for i > j
and bij = 0 for i > j, then cij = 0 for i > j. Hence A + B ∈ W. Similarly,
for λ ∈ R: if A = (aij) with aij = 0 for i > j, then λaij = 0 for i > j. So
λA ∈ W. By Lemma 5.2 it follows that W is a subspace of Mn(R).
(5) Let A ∈ Mm×n (R). Then W = {x ∈ Mn×1 (R) | Ax = 0} is a subspace of
Mn×1 (R).
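A small numerical illustration of (5) (an aside, with a matrix A of my own
choosing): any two solutions of Ax = 0, and any linear combination of them,
again solve Ax = 0, which is exactly what the subspace tests require.

```python
# Sketch for Example 5.5(5): if Ax = 0 and Ay = 0 then A(c1*x + c2*y) = 0.
def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1, 1, 1, 1], [1, 2, 3, 4]]
x = [1, -2, 1, 0]                       # Ax = 0
y = [0, 1, -2, 1]                       # Ay = 0
c1, c2 = 5, -3
z = [c1 * a + c2 * b for a, b in zip(x, y)]
print(mat_vec(A, x), mat_vec(A, y), mat_vec(A, z))   # all equal [0, 0]
```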
Example 5.6. We next give examples of subsets of vector spaces which are not
subspaces.

(1) Let V = R3 and

W = \left\{ \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} \;\middle|\; x \geq 0,\ y \geq 0 \right\}.

So the elements of W consist of precisely all points in the first quadrant
of the x-y-coordinate system. Let

v = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}. \quad \text{Then} \quad -v = \begin{pmatrix} -1 \\ -1 \\ 0 \end{pmatrix}.

Here v lies in W but the additive inverse −v ∉ W. Hence (V4) does not
hold, and W is not a subspace of V.
(2) Let V be the set of square matrices of size n. Then the subset of invertible
matrices of V is not a subspace.
(3) Let V be the set of square matrices of size n. Then the subset of orthogonal
matrices of V is not a subspace.

Intersection, union and sum of subspaces. Given two subspaces U, W of a
vector space V, let U ∩ W be the set-theoretic intersection of the sets U and W,
and let U ∪ W be the set-theoretic union of the sets U and W:

U ∩ W = {v ∈ V | v ∈ U and v ∈ W},
U ∪ W = {v ∈ V | v ∈ U or v ∈ W}.
Definition 5.7. Let U, W be subspaces of a vector space V . The sum of the
subspaces U, W is the set
U + W = {u + w | u ∈ U, w ∈ W }.

Clearly U + W is a subset of V .
Example 5.8. Let V = R2 and let U be the x-axis and W be the y-axis:

U = \left\{ \begin{pmatrix} x \\ 0 \end{pmatrix} \;\middle|\; x \in \mathbb{R} \right\}, \qquad W = \left\{ \begin{pmatrix} 0 \\ y \end{pmatrix} \;\middle|\; y \in \mathbb{R} \right\}.

Then U and W are subspaces of V and

U \cap W = \left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right\} \neq \emptyset,

U \cup W = \left\{ \begin{pmatrix} x \\ 0 \end{pmatrix} \;\middle|\; x \in \mathbb{R} \right\} \cup \left\{ \begin{pmatrix} 0 \\ y \end{pmatrix} \;\middle|\; y \in \mathbb{R} \right\},

U + W = \left\{ \begin{pmatrix} x \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ y \end{pmatrix} \;\middle|\; x, y \in \mathbb{R} \right\} = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \;\middle|\; x, y \in \mathbb{R} \right\} = \mathbb{R}^2.

The sets U ∩ W and U + W are vector spaces, and hence subspaces of V. The
set U ∪ W is not a vector space. It is not closed under addition: (1, 0)^T ∈ U and
(0, 1)^T ∈ W, hence (1, 0)^T, (0, 1)^T ∈ U ∪ W. However

\begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \notin U \cup W.
Example 5.9. Let V = M2×2(R), the set of 2 × 2 matrices with entries in R. Let

U = \left\{ \begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix} \;\middle|\; a, b \in \mathbb{R} \right\}, \qquad W = \left\{ \begin{pmatrix} x & 0 \\ y & 0 \end{pmatrix} \;\middle|\; x, y \in \mathbb{R} \right\}.

Then U and W are subspaces of V, and

U + W = \left\{ \begin{pmatrix} z & b \\ y & 0 \end{pmatrix} \;\middle|\; z, b, y \in \mathbb{R} \right\}, \qquad U \cap W = \left\{ \begin{pmatrix} a & 0 \\ 0 & 0 \end{pmatrix} \;\middle|\; a \in \mathbb{R} \right\},

since a + x = z describes the whole of R as x and a do. The sets U ∩ W and
U + W are vector spaces, and hence subspaces of V. The set U ∪ W is not a
vector space: it is not closed under addition, similarly as in the previous example.

The proof of the following (important) proposition is left as an exercise to the
reader.
Proposition 5.10. Let U, W be subspaces of a vector space V .

(a) Then U ∩ W is a subspace of V .


(b) Then U + W is a subspace of V .
(c) Then U ∪ W is a subspace of V if and only if W ⊆ U or U ⊆ W .

Remark. In general U ∪ W is not a subspace of V, but as a set of vectors in V
it will generate a subspace. The space generated by U ∪ W is precisely the sum
of U and W. The sum U + W is in fact the smallest subspace containing both U
and W. What it means for a set of elements to generate a vector space is explained
in the next section.
Exercise 10. Let U, W be subspaces of a vector space V . Prove that

(a) U ∩ W is a subspace of V ;
(b) U + W = {u + w | u ∈ U, w ∈ W } is a subspace of V ;
(c) U ∪ W is a subspace of V if and only if U ⊂ W or W ⊂ U .
Exercise 11. Let V = R[x], the vector space of all real polynomials in one
variable x. Determine whether or not U is a subspace of V when:

(a) U consists of all polynomials with degree ≥ k for fixed k, together with
the zero polynomial;
(b) U consists of all polynomials with only even powers of x;
(c) U consists of all polynomials with integral coefficients;
(d) U consists of all polynomials p(x) ∈ R[x] with p(1) = p(5).
Exercise 12. (a) Let α ∈ R. Prove that Uα = {(x1 , x2 , x3 ) ∈ R3 | x1 + x2 +
x3 = α} is a subspace of R3 if and only if α = 0.
(b) Is the set U = {(x1, x2, x3, x4) ∈ R4 | x1² = 2x2 and x1 + x2 = x3 + x4} a
subspace of R4? Justify your answer.
Exercise 13. (a) Let S be the subset {(x, 0) | x ∈ R and x > 0} of R2 .
Is S a subspace of the vector space R2 with respect to the usual scalar
multiplication and the usual addition of R2 ?
(b) Let S be the subset {(x, 0) | x ∈ R and x > 0} of R2 . Define the scalar
multiplication ∗ and addition ⊕ on S by:
α ∗ (u, 0) = (u^α, 0), (u, 0) ⊕ (v, 0) = (uv, 0)
for all α, u, v ∈ R with u, v > 0. Show that S is a vector space with respect
to ∗ and ⊕. Is (S, ∗, ⊕) a subspace of R2 ?
Exercise 14. If A is a real n × n-matrix, prove that {x ∈ Mn×1 (R) | Ax = 0} is
a subspace of Rn .
Exercise 15. For each of the following statements about subspaces X, Y, Z of a
vector space V either give a proof of the statement, or find a counterexample. R2
and R3 will provide all the counterexamples required.

(a) V \X is never a subspace of V ;



(b) (X ∩ Y ) + (X ∩ Z) = X ∩ (Y + Z);
(c) (X + Y ) ∩ (X + Z) = X + (Y ∩ Z);
(d) if Y ⊆ X, then Y + (X ∩ Z) = X ∩ (Y + Z).

6. Linear Dependence, Linear Independence and Spanning

Throughout this section V denotes a vector space over R. Whenever we
reformulate equations involving vectors in this (or later) sections, the reader is urged
to determine which vector space axioms or earlier lemmas have been applied to
obtain the reformulated equation.
Spanning.
Definition 6.1. Let S = {v1 , . . . , vn } be a subset of a vector space V .

(1) We call any expression of the form λ1 v1 + · · · + λn vn with λi ∈ R a linear


combination of v1 , . . . , vn .
(2) The span of S, denoted by Span(S) or Span{v1, . . . , vn} or ⟨v1, . . . , vn⟩, is
the set of all linear combinations of v1, . . . , vn:
⟨v1, . . . , vn⟩ = {λ1v1 + · · · + λnvn | λi ∈ R}.
Example 6.2. We give various examples of vectors spanning a vector space. In
particular the examples show that a vector space V has many different spanning
sets (also called generating systems).

(1) Let V = Rn. Define the vector ei ∈ V by letting all coordinates be zero except
the ith coordinate, which is one:

e_i = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}  (with the 1 in the ith coordinate).

Then Span{ei | 1 ≤ i ≤ n} = Rn, since for any vector (x1, . . . , xn)^T ∈ V
we have (x1, . . . , xn)^T = \sum_{i=1}^{n} x_i e_i.
T

(2) Define the vectors

v_i = \sum_{j=1}^{i} e_j

for 1 ≤ i ≤ n. Then Span{vi | 1 ≤ i ≤ n} = Rn. The proof is left as an
exercise to the reader.
(3) Let V = R3. Then

V = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\}
= \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}
= \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \right\}.

So the spanning sets of a vector space V can have different cardinality.

(4) Let V = Mn(R). Define Eij = (akl)1≤k≤n,1≤l≤n by

a_{kl} = \begin{cases} 1 & \text{if } k = i,\ l = j, \\ 0 & \text{otherwise.} \end{cases}

Then V = Span{Eij | 1 ≤ i ≤ n, 1 ≤ j ≤ n}, since any matrix B = (bij)
satisfies B = \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ij} E_{ij}.

Proposition 6.3. Let V be a vector space over R and S = {v1, . . . , vn} ⊆ V.
Then Span(S) is a subspace of V. It is the smallest subspace of V containing S.

Proof. Write X = Span{v1, . . . , vn}. Note that X ≠ ∅ as S ⊆ X. Let u, w ∈ X.
Then there exist αi, βi ∈ R with

u = \sum_{i=1}^{n} \alpha_i v_i, \qquad w = \sum_{i=1}^{n} \beta_i v_i,

see Definition 6.1. Let λ, µ ∈ R. Then

\lambda u + \mu w = \lambda \sum_{i=1}^{n} \alpha_i v_i + \mu \sum_{i=1}^{n} \beta_i v_i = (\lambda\alpha_1 + \mu\beta_1)v_1 + \cdots + (\lambda\alpha_n + \mu\beta_n)v_n \in X,

since λαi + µβi ∈ R. By Lemma 5.3 it follows that X is a subspace. Now let
Y be the smallest subspace of V containing S. Then any linear combination of
the vectors vi lies in Y, see Lemma 5.4. Hence certainly X ⊆ Y. Since Y is by
definition the smallest subspace containing S, and X is a subspace containing S,
it follows that X = Y.

Remark. At this point we can revisit the last remark given in Section 5. It is
now an easy exercise to show that the space generated by U ∪ W is precisely the
sum of U and W. It then follows from the last proposition that this sum is the
smallest subspace containing both U and W.
Linear (in)dependence.
Definition 6.4. Let V be a vector space over R and let {v1 , . . . , vn } ⊂ V.

(1) We say {v1 , . . . , vn } is linearly dependent if there exist scalars λ1 , . . . , λn ∈


R, not all zero, such that λ1 v1 + · · · + λn vn = 0.
(2) We say {v1 , . . . , vn } is linearly independent if whenever α1 v1 +· · ·+αn vn =
0 for αi ∈ R then αi = 0 for 1 ≤ i ≤ n.
Remark 6.5. (1) The phrase “not all zero” means that there is at least one
of the λi which is not zero.
(2) Note that a set of vectors is linearly independent if and only if it is not
linearly dependent.
(3) By convention the empty set is linearly independent.
(4) If vi = 0 for some i, then the set {v1, . . . , vn} is linearly dependent: take
λi = 1 and λj = 0 for j ≠ i; then \sum_{k=1}^{n} \lambda_k v_k = 0.

Example 6.6. Let V = R3. Let

v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad \text{and} \quad v_3 = \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}.

(1) The set {v1, v2, v3} is linearly dependent, as (−2)v1 + v2 + (−1)v3 = 0.

(2) The set {v1, v2} is linearly independent.
Proof. Assume α1v1 + α2v2 = 0 for some α1, α2 ∈ R. Then

\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} = \alpha_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \alpha_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ 0 \end{pmatrix}.

Hence α1 = 0 and α2 = 0.
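As a computational aside (not part of the notes, and relying on numpy's
matrix_rank): a finite set of vectors in Rn is linearly independent exactly when
the matrix with these vectors as columns has rank equal to the number of vectors.
This reproduces the conclusions of Example 6.6.

```python
# Sketch: test linear independence via the rank of the column matrix.
import numpy as np

v1, v2, v3 = [1, 0, 0], [0, 1, 0], [-2, 1, 0]
M = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(M) == 3)                           # False: dependent
print(np.linalg.matrix_rank(np.column_stack([v1, v2])) == 2)   # True: independent
```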

Lemma 6.7. Let V be a vector space over R, and let S = {v1 , . . . , vn } ⊆ V .

(1) If S is linearly independent then any T ⊆ S is linearly independent.


(2) If S is linearly dependent then any V ⊇ T ⊇ S (T finite) is linearly
dependent.

Proof. We prove the first statement. Assume there is a subset T ⊆ S which


is linearly dependent. Without loss of generality we may assume that T =
{v1 , . . . , vk } with k ≤ n. Then there exist ai ∈ R for 1 ≤ i ≤ k with a1 v1 + · · · +
ak vk = 0, and not all ai zero. Extend this to a relation between v1 , . . . , vk , . . . , vn :
0 = a1 v1 + · · · + ak vk + 0vk+1 + · · · + 0vn .
Here not all the ai are zero. This implies {v1 , . . . , vn } is linearly dependent, a
contradiction. Hence any subset T ⊆ S is linearly independent. The second
statement is equivalent to the first one.

Proposition 6.8. Let n ≥ 2. The vectors v1 , . . . , vn are linearly dependent if and


only if one of them can be expressed as a linear combination of the others.

Proof. “⇒”: Suppose v1, . . . , vn are linearly dependent. Then there is a relation

\lambda_1 v_1 + \cdots + \lambda_n v_n = 0,

where not all λi are zero. Suppose λj ≠ 0. Then

\lambda_j v_j = - \sum_{l=1, l \neq j}^{n} \lambda_l v_l.

Since λj ∈ R with λj ≠ 0, the inverse 1/λj exists. So

v_j = - \frac{1}{\lambda_j} \sum_{l=1, l \neq j}^{n} \lambda_l v_l = \sum_{l=1, l \neq j}^{n} \left(- \frac{\lambda_l}{\lambda_j}\right) v_l.

Hence vj is a linear combination of the other vectors.



“⇐”: Suppose the vector vk is a linear combination of v1, . . . , vk−1, vk+1, . . . , vn. Then
there exist αi ∈ R for 1 ≤ i ≤ n, i ≠ k, with

v_k = \sum_{i=1, i \neq k}^{n} \alpha_i v_i.

Hence

0 = \sum_{i=1, i \neq k}^{n} \alpha_i v_i + (-1) v_k.

Let αk = −1. Then 0 = \sum_{i=1}^{n} \alpha_i v_i with αk ≠ 0, and so v1, . . . , vn are linearly
dependent.

Example 6.9. For the first two examples let V = Rn.

(1) For 1 ≤ i ≤ n, define the vector ei ∈ V as above, by letting all coordinates
be zero except the ith coordinate, which is one. Then {e1, . . . , en} is linearly
independent. Any subset of {e1, . . . , en} is also linearly independent.
(2) The vectors

v_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad v_n = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}

are linearly independent. The proof is left as an exercise to the reader.
(3) Let 1 ≤ i, j ≤ n and let V = Mn(R). Define Eij = (akl)1≤k≤n,1≤l≤n by

a_{kl} = \begin{cases} 1 & \text{if } k = i,\ l = j, \\ 0 & \text{otherwise.} \end{cases}

Then {Eij | 1 ≤ i ≤ n, 1 ≤ j ≤ n} is linearly independent.

Remark. We have worked with finite subsets S of a vector space, both for spanning
and for linear independence. In fact, if we wanted to include infinite sets S in
our study, we would need to be more careful with our definitions. If S is any
(possibly infinite) subset of a vector space V, we define the span of S to be the
set of all linear combinations of finite subsets of S. We say an infinite family of
vectors S is linearly independent if each finite subset of S is linearly independent.
In this course we work only with so-called finite dimensional vector spaces. It will
therefore be enough to always assume that S is finite.
Exercise 16. (a) Which of the following sets of vectors in R3 are linearly
independent?
(i) {(1, 3, 0), (2, −3, 4), (3, 0, 4)},
(ii) {(1, 2, 3), (2, 3, 1), (3, 1, 2)}.
(b) Which of the following sets of vectors in V = {f : R → R} are linearly
independent?
(i) {f, g, h} with f (x) = 5x2 + x + 1, g(x) = 2x + 3 and h(x) = x2 − 1.
(ii) {p, q, r} with p(x) = cos2 (x), q(x) = cos(2x) and r(x) = 1.

(c) Determine all α ∈ R for which the set {(1, α, α), (α, 1, α), (α, α, 1)} is
linearly independent.
Exercise 17. Let V be an R-vector space, n ∈ N and v1, . . . , vn ∈ V. Define
vectors wi for 1 ≤ i ≤ n by

w_i = \sum_{j=1}^{i} v_j.

(a) Show that Span{v1 , . . . , vn } = Span{w1 , . . . , wn }.


(b) Show that {w1 , . . . , wn } is linearly independent if and only if {v1 , . . . , vn }
is linearly independent.
Exercise 18. Consider the vector space R3 .

(a) Find four vectors a, a′ , b, b′ ∈ R3 such that if A = Span{a, a′ } and B =


Span{b, b′ } then A + B = R3 and A ∩ B = Span{(1, 1, 1)}.
(b) Are there vectors c, c′ , d, d′ ∈ R3 such that if C = Span{c, c′ } and D =
Span{d, d′ } then C + D = {(x, y, z) | x + 2y + 3z = 0} and C ∩ D =
Span{(1, 1, −1), (5, −1, −1)}?

7. Bases of Vector Spaces

Throughout this section, let V be a vector space over R. In the previous two
sections, we introduced the span of a finite set of vectors, and we studied what it
means for a finite set of vectors to be linearly independent. These two concepts
come together in the basis of a vector space.
Definition 7.1. Let V be a vector space over R and let v1, . . . , vn be elements
of V such that:

(1) V = Span{v1, . . . , vn},
(2) {v1, . . . , vn} is linearly independent.

Then we say that {v1, . . . , vn} is a basis of V.


Example 7.2. Let ei and vi be defined as in Example 6.2(1) and 6.2(2).

(1) Rn has basis {ei | 1 ≤ i ≤ n}. See Example 6.2(1) and Example 6.9(1).
(2) Rn has basis {vi | 1 ≤ i ≤ n}. See Example 6.2(2) and Example 6.9(2).
(3) Mm×n (R) has basis {Eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n}. See Example 6.2(4) and
Example 6.9(3).
(4) C as R-vector space has basis {1, i}.
(5) Claim: Rn[x] has basis {1, x, . . . , x^n}.
Proof.
(a) Let f(x) ∈ Rn[x]. Then f(x) = a0 + a1x + · · · + anx^n for some ai ∈ R,
and clearly f(x) ∈ Span{1, x, . . . , x^n}.
(b) The vectors {1, x, . . . , x^n} are linearly independent: assume λi ∈ R
with λ0 + λ1x + · · · + λnx^n = 0. Then the polynomial
f(x) := λ0 + λ1x + · · · + λnx^n
is zero for every x ∈ R. The fundamental theorem of algebra implies
that any non-zero polynomial of degree at most n has at most n roots
over C – roots of f(x) are by definition those values x with f(x) = 0.
In particular such a polynomial has at most n roots in R. Since R has
more than n elements, this implies that f(x) is the zero polynomial,
that is,
λ0 = λ1 = . . . = λn = 0.

Proposition 7.3. Let {v1 , . . . , vn } be a basis of a vector space V. Then every


element v ∈ V has a unique expression as a linear combination of v1 , . . . , vn .

Proof. By Definition 7.1, {v1 , . . . , vn } is a spanning set. So any v ∈ V is


expressible as:
(1) v = a1 v1 + · · · + an vn ,
for ai ∈ R. Assume v = b1 v1 + · · · + bn vn with bi ∈ R is another expression of v.
Then
0 = v − v = (a1 − b1 )v1 + · · · + (an − bn )vn .

Since v1 , . . . , vn are linearly independent, this implies ai − bi = 0 for all i. Hence


ai = bi for all i, and the expression for every element v ∈ V is unique.

Definition 7.4. We call the column vector


 
a1
 ... 
an
defined by Equation (1) the coordinate vector of v with respect to the basis
{v1 , . . . , vn }, and ai is called the ith coordinate of v.
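In practice the coordinate vector can be found by solving a linear system: if the
basis vectors are the columns of a matrix B, then the coordinate vector c of v is
the unique solution of Bc = v. The following short Python sketch (not part of
the original notes; it assumes the sympy library and an invented basis of R3)
illustrates this.

```python
from sympy import Matrix

# Columns b1, b2, b3 form a (hypothetical) basis of R^3.
B = Matrix([[1, 0, 1],
            [1, 1, 0],
            [0, 1, 1]])
v = Matrix([2, 3, 1])

# The coordinate vector c satisfies B*c = v; since the columns of B
# form a basis, the solution is unique (Proposition 7.3).
c = B.solve(v)
print(c)                 # coordinates of v with respect to {b1, b2, b3}
assert B * c == v        # sanity check: v is recovered from its coordinates
```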
Proposition 7.5. Let {v1 , . . . , vk } be a finite subset of a vector space V . The
following statements are equivalent:
(1) {v1 , . . . , vk } is a maximal linearly independent set.
(2) {v1 , . . . , vk } is a minimal spanning set.
(3) {v1 , . . . , vk } is a linearly independent spanning set (= a basis).
Proof. We prove that (1) ⇔ (3) and (3) ⇔ (2).
• (3) ⇒ (1): Let {v1 , . . . , vk } be a basis of V. By Definition 7.1, {v1 , . . . , vk }
is linearly independent, and any v ∈ V is expressible as v = a1 v1 + · · · + ak vk
for some ai ∈ R. Hence for any v ∈ V , the set {v1 , . . . , vk , v} is linearly
dependent by Proposition 6.8. Hence {v1 , . . . , vk } is maximal linearly
independent.
• (1) ⇒ (3): Assume {v1 , . . . , vk } is maximal linearly independent. So in
particular {v1 , . . . , vk } is linearly independent. By assumption, adjoining
any v ∈ V to {v1 , . . . , vk } gives a linearly dependent set.
Hence there exist ai ∈ R with a1 v1 + · · · + ak vk + ak+1 v = 0 and not
all ai are zero. Assume ak+1 = 0. Then a1 v1 + · · · + ak vk = 0 with not
all ai zero. This contradicts {v1 , . . . , vk } being linearly independent.
Hence we have ak+1 ≠ 0. So v = −(1/ak+1 )(a1 v1 + · · · + ak vk ) and hence
v ∈ Span{v1 , . . . , vk }. So {v1 , . . . , vk } spans V , and hence is a basis of V.
• (3) ⇒ (2): We assume {v1 , . . . , vk } is a basis of V . Suppose {v1 , . . . , vk }
is not minimal as a spanning set. Then some vector in {v1 , . . . , vk } can
be removed without losing the span; say this is vk . Then {v1 , . . . , vk−1 }
spans V and hence in particular vk = c1 v1 + · · · + ck−1 vk−1 for some ci ∈ R.
But by Proposition 6.8, this defines a linear dependence between the ele-
ments {v1 , . . . , vk }, which contradicts the assumption. Hence {v1 , . . . , vk }
is a minimal spanning set.
• (2) ⇒ (3): We assume that {v1 , . . . , vk } is a minimal spanning set. We
need to show that {v1 , . . . , vk } is linearly independent. Assume not. Then
there exist elements ci ∈ R, not all zero, such that c1 v1 + · · · + ck vk = 0.
We may assume that c1 ≠ 0. Then v1 = −(1/c1 )(c2 v2 + · · · + ck vk ). Hence
v1 ∈ Span{v2 , . . . , vk }, which contradicts the fact that {v1 , . . . , vk } was a
minimal spanning set. So our assumption was wrong and {v1 , . . . , vk } is
indeed linearly independent. □
Corollary 7.6. Let V be a vector space over R. If S = {v1 , . . . , vr } spans V ,
then a subset of S forms a basis of V ; this means, there exist indices i1 , . . . , in ∈
{1, . . . , r} such that {vi1 , . . . , vin } ⊆ S is a basis of V.
Proof. This follows from Proposition 7.5(2). Roughly speaking, we need to
delete vectors from the spanning set S, until we reach a set T ⊆ S which is a
minimal spanning set. To choose a linearly dependent vector which can be deleted
from the spanning set without changing the span of the set, use Proposition 6.8.
The details are left to the reader. □
Example 7.7.
We demonstrate the idea for the proof of the last corollary on an example. Let
V = R3 . Define
v1 = (1, 0, 0)^T , v2 = (0, 1, 0)^T , v3 = (1, 1, 0)^T , v4 = (0, 1, 1)^T , v5 = (1, 0, 1)^T .
Find a linear dependence involving some (or all) of the five vectors, for example
v3 = v1 + v2 . Hence Span{v1 , v2 , v3 , v4 , v5 } = Span{v1 , v2 , v4 , v5 }. We next look
for a linear dependence of the remaining four vectors, for example v1 − v2 + v4 =
v5 . Hence Span{v1 , v2 , v3 , v4 , v5 } = Span{v1 , v2 , v4 }, and the latter is a minimal
spanning set.
There are various other ways to obtain a minimal spanning set. Here is one such
alternative. Note that v1 = v3 − v2 . Hence Span{v1 , v2 , v3 , v4 , v5 } = Span{v2 , v3 , v4 , v5 }.
Next we use that v2 = (1/2)(v3 + v4 − v5 ). Hence Span{v1 , v2 , v3 , v4 , v5 } =
Span{v3 , v4 , v5 }, and the latter is a minimal spanning set.
Exercise 19. Which of the following systems of vectors of R3 are linearly inde-
pendent, which form a generating system, which are a basis of R3 ?
(i) {(1, −1, 0), (0, 1, −1)};
(ii) {(1, −1, 0), (0, 1, −1), (1, 0, −1)};
(iii) {(1, a, 1 − a) | a ∈ R};
(iv) {(1, a, a2 ) | a ∈ R};
(v) all (x, y, z) with x, y, z ∈ R and x + 2y + 3z = 1.
Exercise 20. Show that the vectors
(a) the matrices
( 1 0 )   ( 0 1 )   ( 1 0 )   ( 0  1 )
( 0 1 ) , ( 1 0 ) , ( 1 0 ) , ( 1 −1 )
form a basis of M2 (R);
(b) 1, 1 + x, 1 + x + x2 , . . . , 1 + x + · · · + xn form a basis of Rn [x], the polynomials
of degree at most n in one variable x.
Exercise 21. (a) Consider the vectors v1 = (3, 5, 7), v2 = (2, 1, 2), v3 =
(1, 4, 5), v4 = (5, 3, 2) and v5 = (−4, −2, −4) in R3 . List all subsets
S ⊆ {v1 , v2 , v3 , v4 , v5 } which form a basis of R3 .
(b) Extend the set {(8, 2, 5), (−3, −5, 9)} to a basis of R3 .
Exercise 22. Let V, W be vector spaces over R. Consider the cartesian product
V × W with componentwise addition and scalar multiplication:
(v1 , w1 ) + (v2 , w2 ) = (v1 + v2 , w1 + w2 ), λ · (v1 , w1 ) = (λv1 , λw1 )
for all v1 , v2 ∈ V, w1 , w2 ∈ W and λ ∈ R (which defines on V × W a vector space
structure). Let S be a basis of V and T be a basis of W . Give a basis for the
vector space V × W and justify your answer.
Exercise 23. (i) Let S and T be subsets of a vector space V . Which of the
following statements are true? Give reasons.
(a) Span(S ∩ T ) = Span(S) ∩ Span(T );
(b) Span(S ∪ T ) = Span(S) ∪ Span(T );
(c) Span(S ∪ T ) = Span(S) + Span(T ).
(ii) Let U1 , U2 be subspaces of a vector space V , and let X1 , X2 be bases of U1
and U2 . Which of the following statements are correct? Justify your answer.
(a) X1 ∪ X2 generates U1 + U2 .
(b) X1 ∪ X2 is linearly independent in U1 + U2 .
(c) X1 ∩ X2 generates U1 ∩ U2 .
(d) X1 ∩ X2 is linearly independent in U1 ∩ U2 .
(e) X1 ⊆ X2 if and only if U1 ⊆ U2 .
(f) If U1 ∩ U2 = {0} then X1 and X2 are disjoint.
8. Steinitz Exchange Procedure
Let V be a vector space over R. We would like to define the dimension of a vector
space V to be the cardinality of any basis of V . To do this, we need to know that
two bases of the same vector space have the same cardinality. To prove such a
result, we use the exchange procedure going back to E. Steinitz. In this course
we only deal with vector spaces that have a finite basis.
Lemma 8.1 (Steinitz Exchange Lemma). Let {v1 , . . . , vn } be a basis of a vector
space V . Let
(2) w = λ1 v1 + · · · + λn vn
with λi ∈ R. If there exists an index k with 1 ≤ k ≤ n and λk ≠ 0, then the set
{v1 , . . . , vk−1 , w, vk+1 , . . . , vn } is a basis of V.
Proof. Without loss of generality, we assume k = 1, so λ1 ≠ 0. We want to
show that {w, v2 , . . . , vn } is a basis of V.
(1) Let v ∈ V . Since {v1 , . . . , vn } is a basis of V , we have v = µ1 v1 + · · · + µn vn
for some µi ∈ R. Since λ1 ≠ 0, we get from Equation (2):
v1 = (1/λ1 )w − (λ2 /λ1 )v2 − · · · − (λn /λ1 )vn
and hence
v = µ1 ((1/λ1 )w − (λ2 /λ1 )v2 − · · · − (λn /λ1 )vn ) + µ2 v2 + · · · + µn vn
  = (µ1 /λ1 )w + (µ2 − µ1 λ2 /λ1 )v2 + · · · + (µn − µ1 λn /λ1 )vn .
Hence v ∈ Span{w, v2 , . . . , vn } and so Span{w, v2 , . . . , vn } = V.
(2) To show that {w, v2 , . . . , vn } is linearly independent, assume
µw + µ2 v2 + · · · + µn vn = 0
for µ, µ2 , . . . , µn ∈ R. Then
0 = µ(λ1 v1 + · · · + λn vn ) + µ2 v2 + · · · + µn vn ,
= µλ1 v1 + (µλ2 + µ2 )v2 + · · · + (µλn + µn )vn .
Since {v1 , . . . , vn } is linearly independent, this implies
µλ1 = µλ2 + µ2 = . . . = µλn + µn = 0.
As λ1 ≠ 0, it follows that µ = 0, and hence µ2 = . . . = µn = 0. So {w, v2 , . . . , vn }
is linearly independent. □
We now apply the Steinitz Exchange Lemma repeatedly to a set of linearly indepen-
dent vectors. We call this repeated use the Steinitz exchange procedure.
Proposition 8.2. Let S = {v1 , . . . , vn } be a basis of a vector space V and let
{w1 , . . . , wr } be linearly independent vectors. Then r ≤ n and there are indices
i1 , . . . , ir ∈ {1, . . . , n} such that after exchanging in S
vi1 against w1 ,
vi2 against w2 ,
..
.
vir against wr ,
we obtain a set T which is again a basis of V. If we rearrange vectors so that
i1 = 1, i2 = 2, . . . , ir = r, then T = {w1 , . . . , wr , vr+1 , . . . , vn }.
Remark. Note that the inequality r ≤ n is part of the claim.
Proof. By induction on r.
(1) If r = 0, nothing has to be shown (induction beginning). Let r ≥ 1 and
assume the statement is true for r − 1. In particular: if {w1 , . . . , wr } is
linearly independent, then by Lemma 6.7 also {w1 , . . . , wr−1 } is linearly
independent. The induction assumption – after rearranging the vectors –
then says that {w1 , . . . , wr−1 , vr , . . . , vn } is a basis of V and r − 1 ≤ n.
(2) We prove next that r ≤ n. By the induction assumption we know that
r − 1 ≤ n. Assume r − 1 = n. In this case (1) says that {w1 , . . . , wr−1 } is a
basis of V. By assumption {w1 , . . . , wr−1 , wr } is also linearly independent,
which provides a contradiction to Proposition 7.5. Hence r − 1 ≠ n, and
so indeed r ≤ n.
(3) Since {w1 , . . . , wr−1 , vr , . . . , vn } is a basis of V by induction assumption,
we can express wr as a linear combination, say
wr = λ1 w1 + · · · + λr−1 wr−1 + λr vr + · · · + λn vn ,
with λi ∈ R. Assume λr = . . . = λn = 0. This would give
wr = λ1 w1 + · · · + λr−1 wr−1 .
By Proposition 6.8, this contradicts the linear independence of {w1 , . . . , wr }.
So at least one element of {λr , λr+1 , . . . , λn } is non-zero. Without loss of
generality assume that λr 6= 0. Using Lemma 8.1 we can exchange vr
against wr , and obtain that {w1 , . . . , wr−1 , wr , vr+1 , . . . , vn } is a basis of V. □
We next collect some important consequences of the Steinitz exchange procedure.
Theorem 8.3. Any two finite bases of a vector space V have the same number
of elements.
Proof. Assume B = {v1 , . . . , vn } is a finite basis of V and B ′ is any other
basis of V with at least n elements. Pick n elements of B ′ , say {w1 , . . . , wn }.
Then {w1 , . . . , wn } is linearly independent, see Lemma 6.7. Now apply Steinitz's
exchange procedure: by Proposition 8.2 it then follows that {w1 , . . . , wn } is a
basis of V . Since {w1 , . . . , wn } ⊆ B ′ , and B ′ is also a basis of V , it follows that
B ′ = {w1 , . . . , wn }, and hence B and B ′ both have precisely n elements. □
Remark. It should be noted that this last proof also indicates that it is not so
difficult to prove that if a vector space V has a finite basis, then any basis of V is
finite. For this purpose we would need to generalise some of the earlier definitions
(see the Remark at the end of Section 6) and statements to infinite sets. We call
a vector space V with a finite basis finitely generated. In this course we only deal
with finitely generated vector spaces.
Theorem 8.4. Let V be a finitely generated vector space and let S = {w1 , . . . , wr } ⊆
V be linearly independent. Then V has a basis B with S ⊆ B.
Proof. By assumption, the vector space V has a finite basis, say {v1 , . . . , vn }.
Apply Proposition 8.2. Then after possibly a suitable rearrangement of the vec-
tors, the set B = {w1 , . . . , wr , vr+1 , . . . , vn } forms a basis of V , and S ⊆ B. □
Remark. To prove Theorem 8.4 for infinite families of vectors (for a vector space
which is not finitely generated) is more complicated. The result of Theorem 8.4
still would be true in the infinite setting, however it would need a not completely
unproblematic tool from set theory, called Zorn’s lemma. Examples of vector
spaces with no finite basis are:
(1) V = R[t] as vector space over R.
(2) V = R as vector space over Q.
Example 8.5. We conclude this section with an example on extending a finite
linearly independent set to a basis. To do so we use Theorem 8.4. Let V = R3 .
Let v1 = (1, 1, 0)^T and v2 = (0, 1, 1)^T .
Let B = {e1 , e2 , e3 } be the canonical basis of V , see Example 7.2(1). Then,
using the Steinitz exchange procedure, we can construct a basis containing {v1 , v2 }.
Write v1 = e1 + e2 + 0 · e3 . Then we can exchange e1 in B with v1 . Next write
v2 = 0 · v1 + e2 + e3 . Hence we can exchange e2 with v2 . We obtain the basis
B ′ = {v1 , v2 , e3 } of V . Note that the basis B ′ extending the set {v1 , v2 } is by no
means unique.
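The extension of Theorem 8.4 can also be computed: append the canonical basis
to the given independent vectors and keep the pivot columns. A sketch (again not
from the notes; it assumes sympy, and the order of the appended vectors decides
which basis one obtains):

```python
from sympy import Matrix

# Extend the linearly independent set {v1, v2} of Example 8.5 to a
# basis of R^3 by appending the canonical basis and keeping pivots.
v1, v2 = Matrix([1, 1, 0]), Matrix([0, 1, 1])
e1, e2, e3 = Matrix([1, 0, 0]), Matrix([0, 1, 0]), Matrix([0, 0, 1])

A = Matrix.hstack(v1, v2, e1, e2, e3)
_, pivots = A.rref()
basis = [A[:, j] for j in pivots]
print(pivots)     # (0, 1, 2): this run yields the basis {v1, v2, e1}
```

This produces {v1 , v2 , e1 } rather than the basis {v1 , v2 , e3 } of Example 8.5,
illustrating once more that the extension is not unique.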
Exercise 24. Consider the vector space R4 . Let
U1 := Span{(1, 1, 3, 2), (0, 1, 0, 2)}
U2 := Span{(2, 2, 4, 0), (2, −1, 3, 1), (2, 1, 1, 1)}.
(a) Find a basis B of U1 ∩ U2 .
(b) Use the exchange procedure by Steinitz to get a basis Bi of Ui with B ⊆ Bi
for i = 1, 2.
(c) Prove that U1 + U2 = R4 .
Exercise 25. Consider the vector space R4 . Let
E := {(1, −2, 6, 4), (2, −6, 15, 8), (0, 2, −9, −8), (3, −8, 21, 7)}
and S = {(0, 0, 1, 0), (0, 0, 0, 1)}.
(i) Show that E is linearly independent. Why is E a generating set for R4 ?
(ii) Use the exchange procedure of Steinitz to get a basis B with S ⊆ B ⊆
S ∪ E.
9. Dimension of Vector Spaces and an Application to Sums
Dimension of vector spaces. Given any vector space V , we have not yet
shown that there always exists a basis for V . Indeed, the existence of a basis
is another consequence of the more general version of Theorem 8.4 (see the
remark following it): any family S of linearly independent vectors of a vector
space V (not necessarily finitely generated) can be extended to a basis B of V . In
particular, if we take S to be the empty set, then S is linearly independent and
by the generalised version of Theorem 8.4, S can be extended to a basis B of V .
Theorem 9.1. Every vector space has a basis.
In the case of a finitely generated vector space V , we have seen in Theorem 8.3 and
the remark following it, that any basis of V is finite and of the same cardinality.
Definition 9.2. If a vector space V has a finite basis, then we define the dimen-
sion of V as the number of elements in a basis of V. We denote the dimension of
V shortly by dim(V ) or dim V and say that V is finite dimensional. If V has no
finite basis, we call V infinite dimensional and write dim V = ∞.
Example 9.3. Compare the following with the Examples in 7.2.
(1) dim(Rn ) = n.
(2) dim(Mm×n (R)) = m · n.
(3) Let V = C be a vector space over C. Then dim(V ) = 1.
(4) Let V = C be a vector space over R. Then dim(V ) = 2.
(5) dim(Rn [X]) = n + 1.
(6) dim(R[X]) = ∞.
(7) Let V = R be a vector space over Q. Then dim(V ) = ∞.
(8) Define R∞ = {(ai ) | (ai ) = (a1 , a2 , . . . )}, the vector space of sequences of
real numbers. This is a vector space over R, and dim(R∞ ) = ∞.
Remark 9.4. Let V be a finite dimensional vector space and let W be a subspace
of V. As a consequence of Theorem 8.4 we have:
(1) dim W ≤ dim V,
(2) dim W = dim V if and only if V = W.
Remark 9.4(2) is not true for infinite dimensional vector spaces: the vector space
W of polynomials is a subspace of the vector space V of continuous functions and
dim(W ) = dim(V ) = ∞.
Sums and intersections of subspaces. The second part of this section deals
with an important dimension formula. Recall from Section 5:
Proposition 9.5. Let V be a vector space and let U, W be subspaces of V . Then
U ∩ W is a subspace of V .
Proof. We show this by using Lemma 5.3, the second subspace test. Since U, W
are subspaces, 0 ∈ W and 0 ∈ U , so 0 ∈ U ∩ W , hence U ∩ W ≠ ∅. If
v1 , v2 ∈ U ∩ W , then v1 , v2 ∈ U and v1 , v2 ∈ W . Let a1 , a2 ∈ R. Since U is a
subspace, we know that a1 v1 + a2 v2 ∈ U . Similarly, since W is a subspace, we
know that a1 v1 + a2 v2 ∈ W . Therefore a1 v1 + a2 v2 ∈ U ∩ W . So by Lemma 5.3,
U ∩ W is a subspace of V . □
Proposition 9.6. Let V be a vector space and let U, W be subspaces of V . Then
U + W is a subspace of V .
Proof. We will use Lemma 5.3, the second subspace test. Since U, W are both
subspaces, 0 ∈ U and 0 ∈ W , so 0 + 0 = 0 ∈ U + W . Hence U + W ≠ ∅.
Let v1 , v2 ∈ U + W ; then v1 = u1 + w1 for some u1 ∈ U and w1 ∈ W . Similarly,
v2 = u2 + w2 for some u2 ∈ U and w2 ∈ W . Let a1 , a2 ∈ R. So
a1 v1 + a2 v2 = a1 (u1 + w1 ) + a2 (u2 + w2 )
            = (a1 u1 + a2 u2 ) + (a1 w1 + a2 w2 ) =: v, say.
Since u1 , u2 ∈ U and U is a subspace, we have u := a1 u1 + a2 u2 ∈ U . Similarly
w := a1 w1 + a2 w2 ∈ W . Hence v = u + w ∈ U + W , and so by Lemma 5.3 we see
that U + W is a subspace of V . □
Dimension formula for sums of subspaces. Let V be a finite dimensional vec-
tor space and let U, W be subspaces of V . Our aim is to determine the dimension
of U + W . Recall from Example 5.9 that in general dim(U + W ) ≠ dim U + dim W .
Theorem 9.7. Let V be a finite dimensional vector space over R. Let U, W be
subspaces of V. Then dim(U + W ) = dim(U ) + dim(W ) − dim(U ∩ W ).
Proof. (a) Let {v1 , . . . , vn } be a basis of U ∩ W.
• By Theorem 8.4, we can extend the basis of U ∩ W to a basis of U. This
means there exist elements u1 , . . . , us ∈ U such that {v1 , . . . , vn , u1 , . . . , us }
is a basis of U.
• By Theorem 8.4, we can extend the basis of U ∩ W to a basis of W. This
means there exist elements w1 , . . . , wt ∈ W such that {v1 , . . . , vn , w1 , . . . , wt }
is a basis of W.
We will prove in (b) and (c) below that B = {v1 , . . . , vn , w1 , . . . , wt , u1 , . . . , us } is
a basis of U + W. This then implies that
dim(U + W ) = n + s + t = (n + s) + (n + t) − n
= dim U + dim W − dim(U ∩ W ).
(b) Claim: Span(B) = U + W.
Let v ∈ U + W . Then v = u + w for some u ∈ U , w ∈ W. Since u ∈ U, u equals
a linear combination of the above basis of U : there exist ai , bj ∈ R with
(3) u = a1 v1 + · · · + an vn + b1 u1 + · · · + bs us .
Similarly, since w ∈ W , w is expressible as a linear combination of the above basis
of W. So there exist ci , dj ∈ R with
(4) w = c1 v1 + · · · + cn vn + d1 w1 + · · · + dt wt .
By Equations (3) and (4): v = u + w = (a1 + c1 )v1 + · · · + (an + cn )vn + b1 u1 +
· · · + bs us + d1 w1 + · · · + dt wt . Hence v is a linear combination of the vectors in B.
Hence U + W ⊆ Span(B). Conversely, any element in Span(B) lies in U + W ,
as every vector z ∈ B lies in U + W.
(c) Claim: B is linearly independent.
Assume B is linearly dependent. Then we have a relation between the vectors in
B, say
(5) a1 v1 + · · · + an vn + b1 u1 + · · · + bs us + c1 w1 + · · · + ct wt = 0,
and not all coefficients are zero. Then:
(6) −(c1 w1 + · · · + ct wt ) = a1 v1 + · · · + an vn + b1 u1 + · · · + bs us =: z.
Since z is a linear combination of w1 , . . . , wt (which is a subset of W ), it follows
that z ∈ W. Since z is a linear combination of v1 , . . . , vn , u1 , . . . , us (which is a
basis of U ), it follows that z ∈ U. So z ∈ U ∩ W. Hence z is expressible in terms
of a basis of U ∩ W. So in particular, z is expressible with respect to v1 , . . . , vn ,
say z = f1 v1 + · · · + fn vn for some fi ∈ R. Hence:
z = f1 v1 + · · · + fn vn = −(c1 w1 + · · · + ct wt )   by Equation (6).
It follows: f1 v1 + · · · + fn vn + c1 w1 + · · · + ct wt = 0. But v1 , . . . , vn , w1 , . . . , wt
are a basis of W. Hence these vectors are linearly independent, i.e.
fi = 0 for 1 ≤ i ≤ n,   ci = 0 for 1 ≤ i ≤ t.
So z = 0 and hence Equation (6) reads: 0 = a1 v1 + · · · + an vn + b1 u1 + · · · + bs us .
Since the vectors {v1 , . . . , vn , u1 , . . . , us } form a basis of U, this implies
ai = 0 for 1 ≤ i ≤ n,   bi = 0 for 1 ≤ i ≤ s.
As all coefficients in Equation (5) are zero, B is linearly independent. □
Example 9.8.
Let V = M3 (R), and define
U = {A | A = (aij ) with aij = 0 for i > j} ⊆ V,
W = {B | B = (bij ) with bij = 0 for i < j} ⊆ V.
So
    ( a11 a12 a13 )            ( b11  0   0  )
A = (  0  a22 a23 ) ∈ U,   B = ( b21 b22  0  ) ∈ W.
    (  0   0  a33 )            ( b31 b32 b33 )
Then
(1) U, W are subspaces of V. Note that dim V = 9, dim U = 6, dim W = 6.
(2) U ∩ W = {C | C = diag(c11 , c22 , c33 )} ⊆ V. So dim(U ∩ W ) = 3.
(3) U + W = {A + B | A ∈ U, B ∈ W } = V = M3 (R).
We can verify the dimension formula from Theorem 9.7:
dim(U + W ) = dim V = 9 = 6 + 6 − 3 = dim U + dim W − dim(U ∩ W ).
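For subspaces given by spanning vectors, all four dimensions in Theorem 9.7 can
be computed from matrix ranks: dim(U + W ) is the rank of all spanning vectors
stacked together, and the formula then determines dim(U ∩ W ). A short sketch
(not from the notes; it assumes Python with numpy and two invented subspaces
of R4):

```python
import numpy as np

# Spanning vectors of two (hypothetical) subspaces U, W of R^4 (as rows).
U = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])
W = np.array([[0, 1, 0, 0], [0, 0, 1, 0]])

dim_U = np.linalg.matrix_rank(U)
dim_W = np.linalg.matrix_rank(W)
# dim(U + W) is the rank of all spanning vectors taken together.
dim_sum = np.linalg.matrix_rank(np.vstack([U, W]))
# Theorem 9.7 then forces the dimension of the intersection:
dim_cap = dim_U + dim_W - dim_sum
print(dim_U, dim_W, dim_sum, dim_cap)   # here: 2 2 3 1
```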
Direct sums. Let V be a finite dimensional vector space with subspaces U and
W . Sometimes the formula in Theorem 9.7 holds without the correction term
dim(U ∩ W ), see Example 5.8. This phenomenon gets its own name.
Definition 9.9. A vector space V is called a direct sum of the subspaces U and
W, written as V = U ⊕ W, if the following holds:
(D1) V = U + W,
(D2) U ∩ W = {0}.
Often some other characterisations of this phenomenon are useful:
Proposition 9.10. Let U, W be subspaces of a vector space V. Then the following
are equivalent:
(1) V = U ⊕ W,
(2) For every v ∈ V there is a unique u ∈ U and w ∈ W with v = u + w.
Proof. (1) ⇒ (2): We only need to show uniqueness. Let v = u + w and also
v = u′ + w′ for u, u′ ∈ U , w, w′ ∈ W. Then u + w = u′ + w′ and so w − w′ = u′ − u.
But w − w′ ∈ W and u′ − u ∈ U as U, W are subspaces. Hence w − w′ = u′ − u ∈
U ∩ W. As U ∩ W = {0} this implies w − w′ = 0 = u′ − u. So w = w′ and u = u′ .
(2) ⇒ (1): By assumption (D1) holds. We show (D2). Assume v ∈ U ∩ W. Then
v ∈ U and v ∈ W . Since U, W are subspaces, 0 ∈ U and 0 ∈ W . Note that
v = 0 + v with 0 ∈ U, v ∈ W ; and v = v + 0 with v ∈ U, 0 ∈ W . Both these
expressions for v ∈ U + W need to be the same by assumption (2). Hence v = 0
and so U ∩ W = {0}. □
Proposition 9.11. For subspaces U, W of a finite dimensional vector space V
the following conditions are equivalent:
(1) V = U ⊕ W ;
(2) V = U + W and dim V = dim U + dim W ;
(3) U ∩ W = {0} and dim V = dim U + dim W.
Proof. (1) ⇒ (2): Follows from Theorem 9.7 and Definition 9.9.
(2) ⇒ (3): By Theorem 9.7 it follows that dim(U ∩ W ) = 0, and so U ∩ W = {0}.
(3) ⇒ (1): Use Theorem 9.7 and the assumption in (3) to get:
dim(U + W ) = dim U + dim W − dim(U ∩ W ) = dim U + dim W = dim V.
Note U + W ≤ V. By Remark 9.4(2) it follows that V = U + W. □
Exercise 26. A magical square is a 3 × 3 table of nine numbers with the following
properties: the sum of all numbers in each row, in each column, and in each
diagonal is equal. This number is called the magical number. For example,
4 3 8
9 5 1
2 7 6
is a magical square with magical number 15; the number in the center of the square is 5. Consider
the set of all magical squares with entries from the set of real numbers R.
(a) Show that the magical squares form a vector space.
(b) Show that the magical number is always three times the number in the
center of the square.
(c) Find a basis of the vector space of magical squares and determine its
dimension.
Exercise 27. Let Mn (R) be the set of all n × n matrices over R.
(a) Compute the dimension of Mn (R). Show that Mn (R) has a basis with
the property that each matrix in the basis is either symmetric or skew-
symmetric.
(b) Compute the dimension of the subspace of Mn (R) consisting of all diagonal
matrices.
(c) Compute the dimension of the subspace of Mn (R) consisting of all matrices
of zero trace (that is, where the sum of the diagonal entries is zero).
Exercise 28. (a) Let U and V be two subspaces of R2n−1 and let dim(U ) =
dim(V ) = n. Prove that U ∩ V 6= {0}.
(b) Let X, Y, Z be subspaces of a vector space V . Is the following formula
correct:
dim(X + Y + Z) = dim X + dim Y + dim Z
− dim(X ∩ Y ) − dim(Y ∩ Z) − dim(Z ∩ X) + dim(X ∩ Y ∩ Z)?
(c) Given are three two-dimensional subspaces U1 , U2 , U3 of a vector space V
such that the intersection of each two subspaces is one-dimensional. Which
dimensions can occur as dim(U1 + U2 + U3 )?
Exercise 29. Let V be a vector space of dimension n over R.
(a) Prove that for each r such that 0 ≤ r ≤ n, V contains a subspace of
dimension r.
(b) Let U, W be subspaces of V with U ⊆ W . Show that there exists a
subspace W ′ in V such that W ∩ W ′ = U and W + W ′ = V .
Exercise 30. Let U := Span{(1, 2, 1, 0), (2, 3, 2, 2), (0, −1, 0, 2)}. Determine a
vector space V ≤ R4 such that U ⊕ V = R4 .
10. Linear Transformations
This course deals with finite dimensional vector spaces only. From now on we
always assume the vector spaces under consideration to be finite dimensional,
without explicitly saying so. Throughout this section, let V, W be (finite dimen-
sional) vector spaces over R. Assume that T is a map from V to W , where we
consider V and W as sets. If T respects the structure of the underlying vector
spaces, then T is called a linear transformation. More precisely:
Definition 10.1. Let V, W be vector spaces over R. Then a map T : V → W is
said to be a linear transformation (or a linear map) if and only if:
(L1) T (v1 + v2 ) = T (v1 ) + T (v2 ) ∀v1 , v2 ∈ V,
(L2) T (λv) = λT (v) ∀λ ∈ R, ∀v ∈ V.
Note that (L1) and (L2) together are equivalent to requiring:
(L) T (λ1 v1 + λ2 v2 ) = λ1 T (v1 ) + λ2 T (v2 ) ∀λ1 , λ2 ∈ R, ∀v1 , v2 ∈ V.
Remarks. (a) Instead of T (v) we also write T v.
(b) Note that λ · v is scalar multiplication in V , while λ · T (v) is scalar multipli-
cation in W. Note that v1 + v2 is addition in V and T (v1 ) + T (v2 ) is addition
in W .
Example 10.2. Let T : R2 → R2 be given by T ((x1 , x2 )^T ) = (x2 , x1 )^T . We
verify (L1) and (L2):
T ((x1 , x2 )^T + (y1 , y2 )^T ) = T ((x1 + y1 , x2 + y2 )^T )       (definition of + in R2 )
                             = (x2 + y2 , x1 + y1 )^T              (definition of T )
                             = (x2 , x1 )^T + (y2 , y1 )^T          (definition of + in R2 )
                             = T ((x1 , x2 )^T ) + T ((y1 , y2 )^T )  (definition of T ).
T (λ(x, y)^T ) = T ((λx, λy)^T )   (definition of scalar multiplication)
              = (λy, λx)^T         (definition of T )
              = λ(y, x)^T          (definition of scalar multiplication)
              = λT ((x, y)^T )     (definition of T ).
Hence (L1) and (L2) hold, and so T is a linear transformation.
Example 10.3. (1) Let T : R2 → R2 be defined by T ((x, y)^T ) = (x, 0)^T .
This is a linear transformation, called the projection onto the first coordi-
nate (or x-axis).
(2) Let f (x) = a0 + a1 x + · · · + an xn . We define T : Rn [x] → Rn−1 [x] by
differentiating: T f (x) = (d/dx) f (x) = a1 + 2a2 x + · · · + nan xn−1 . We can
verify that T (λf ) = λT (f ) and T (f1 + f2 ) = T f1 + T f2 for all f, f1 , f2 ∈ Rn [x],
and λ ∈ R. Hence T is a linear transformation.
(3) Similarly, taking T to be the operation of integration, then T : Rn−1 [x] →
Rn [x] with T (a0 + a1 x + · · · + an−1 xn−1 ) = a0 x + a1 x2 /2 + · · · + an−1 xn /n
is a linear transformation.
(4) The zero map 0 : V → {0} with v 7→ 0 for all v ∈ V is linear. The identity
map idV : V → V with v 7→ v for all v ∈ V is linear.
(5) Fix a vector v0 ∈ V with v0 ≠ 0. The constant map T : V → V given by
v 7→ v0 for all v ∈ V is not linear.
Example 10.4. Fix a square matrix M ∈ Mn (R). Let T : Mn (R) → Mn (R) be
given by multiplication with M from the right, that is T (A) = AM . Then T is a
linear transformation.
Proof. Use Proposition 2.11 and the definition of T . We have
T (A1 + A2 ) = (A1 + A2 )M = A1 M + A2 M = T (A1 ) + T (A2 ).
T (λA) = (λA)M = λ(AM ) = λT (A).
So (L1) and (L2) hold for T . Hence T is linear. □
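As a quick sanity check (not part of the notes; it assumes Python with numpy),
one can test (L1) and (L2) for T (A) = AM numerically on random integer
matrices; the checks pass exactly because matrix multiplication distributes over
addition and commutes with scalars (Proposition 2.11).

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.integers(-3, 4, (3, 3))          # fixed matrix defining T(A) = A @ M
A1 = rng.integers(-3, 4, (3, 3))
A2 = rng.integers(-3, 4, (3, 3))
lam = 5

T = lambda A: A @ M
# (L1): T(A1 + A2) = T(A1) + T(A2), and (L2): T(lam*A1) = lam*T(A1).
assert np.array_equal(T(A1 + A2), T(A1) + T(A2))
assert np.array_equal(T(lam * A1), lam * T(A1))
print("T(A) = A M is linear on these samples")
```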
Let us collect some properties of linear maps.
Lemma 10.5. Let T : V → W be linear. Then
(1) T (0) = 0,
(2) T (v − w) = T (v) − T (w) for v, w ∈ V ,
(3) T (∑_{i=1}^{r} λi vi ) = ∑_{i=1}^{r} λi T (vi ) for vi ∈ V, λi ∈ R.
Proof. We have
T (0V ) = T (0R · 0V )   by Lemma 4.2,
       = 0R · T (0V )    by (L2),
       = 0W              by Lemma 4.2.
Similarly, as v − w = v + (−w), we have
T (v − w) = T (v + (−1)w)       by Lemma 4.2,
          = T (v) + (−1)T (w)   by (L),
          = T (v) − T (w)       by Lemma 4.2.
For the last statement do induction on r using (L). □
Lemma 10.6. Let T : V → W be linear.
(1) If {v1 , . . . , vr } is linearly dependent in V, then {T v1 , . . . , T vr } is linearly
dependent in W.
(2) If {T v1 , . . . , T vr } is linearly independent in W , then {v1 , . . . , vr } is linearly
independent in V.
Remark. The converse of this lemma is false. However, if T is injective, then
the converse holds.
Proof. Assume λ1 v1 + · · · + λr vr = 0 with not all λi zero. Then applying T and
using Lemma 10.5 we get 0 = T (0) = T (λ1 v1 +· · ·+λr vr ) = λ1 T (v1 )+· · ·+λr T (vr )
with not all λi zero. Hence {T (v1 ), . . . , T (vr )} is linearly dependent. The second
statement is shown in the same way. □
Proposition 10.7. (1) Let S : U → V and T : U → V be linear, λ ∈ R.
Then λT and S + T are linear.
(2) Let T : U → V and S : V → W be linear. Then the composition S ◦ T :
U → W is linear.
(3) Let T : V → W be linear. If the inverse map T −1 of T exists then
T −1 : W → V is again linear.
Proof. Let λ, µ ∈ R and let u, v ∈ U . Since S is linear we have S(λu + µv) =
λS(u) + µS(v). Since T is linear we have T (λu + µv) = λT (u) + µT (v). By
definition of addition of maps we have (S + T )(x) = S(x) + T (x) for any x ∈ U .
Hence
(S + T )(λu + µv) = S(λu + µv) + T (λu + µv)
= λS(u) + µS(v) + λT (u) + µT (v)
= λ · (S(u) + T (u)) + µ · (S(v) + T (v))
= λ · (S + T )(u) + µ · (S + T )(v).
Hence S + T is linear. The proof of the other statements is left as an exercise to
the reader. □
Remark. The set of all R-linear maps from a vector space U to a vector space
V is denoted by HomR (U, V ). The set HomR (U, V ) is in fact an R-vector space
where the first property of the last proposition gives the scalar multiplication
and addition of this vector space. If one considers the set HomR (U, U ) – also
denoted by EndR (U ) – then this is a so-called ring (a slightly more general object
than what a field is). The second property of the last proposition defines the
multiplication of this ring. Rings are studied in the second year algebra course.
Theorem 10.8. Let V, W be vector spaces over R. Let {v1 , . . . , vn } be a basis
of V and {w1 , . . . , wn } a set of vectors in W. Then there is precisely one linear
transformation T : V → W with T (vi ) = wi for 1 ≤ i ≤ n. Moreover:
(1) im(T ) := {T v | v ∈ V } = Span{w1 , . . . , wn },
(2) T is injective if and only if {w1 , . . . , wn } is linearly independent.
Proof. (a) Given v ∈ V , write v = λ1 v1 + · · · + λn vn for λi ∈ R. By Proposition
7.3 the scalars λi are uniquely determined. Define T (v) = ∑_{i=1}^{n} λi wi . Then T is
linear with T (vi ) = wi . Hence such a linear transformation exists. Moreover, for
any linear transformation T : V → W with T (vi ) = wi we have that T (v) =
∑_{i=1}^{n} λi wi . Hence the map T is uniquely determined.
(b) We next show part (1) of the claim. Let w ∈ Span{w1 , . . . , wn }. Then w =
∑_{i=1}^{n} λi wi for some λi ∈ R. Since T (vi ) = wi we have
w = ∑_{i=1}^{n} λi wi = ∑_{i=1}^{n} λi T (vi ) = T (λ1 v1 + · · · + λn vn ),
and hence for v = λ1 v1 + · · · + λn vn we have T v = w. So w ∈ im(T ). This shows
Span{w1 , . . . , wn } ⊆ im(T ). Conversely, if v ∈ V , say v = ∑_{i=1}^{n} λi vi , then T v =
∑_{i=1}^{n} λi wi , and hence T v ∈ Span{w1 , . . . , wn } and so im(T ) ⊆ Span{w1 , . . . , wn }.
(c) Assume that T is injective. This means whenever T u = T v for some u, v ∈ V
then u = v. We show that {w1 , . . . , wn } is linearly independent. Consider
(7) 0 = λ1 w1 + . . . + λn wn .
We need to show that λi = 0 for 1 ≤ i ≤ n. As T (vi ) = wi , Equation (7) equals
0 = λ1 T (v1 )+. . .+λn T (vn ). Since T is linear, this implies 0 = T (λ1 v1 +. . .+λn vn ).
By Lemma 10.5 we have T (0) = 0, and hence T (0) = T (λ1 v1 + . . . + λn vn ). As T
is injective, this implies 0 = λ1 v1 + . . . + λn vn . As {v1 , . . . , vn } is a basis, it is in
particular linearly independent. Hence λi = 0 for 1 ≤ i ≤ n.
(d) Next, let {w1 , . . . , wn } be linearly independent. Assume that T u = T v for
some u, v ∈ V . Since {v1 , . . . , vn } is a basis, there exist λi , µi ∈ R with
u = ∑_{i=1}^{n} λi vi ,   v = ∑_{i=1}^{n} µi vi .
Applying T to u and v and using that T vi = wi we get
T u = ∑_{i=1}^{n} λi wi ,   T v = ∑_{i=1}^{n} µi wi .
Since T u = T v this implies 0 = T u − T v = ∑_{i=1}^{n} (λi − µi )wi . Since {w1 , . . . , wn }
is linearly independent, this implies that λi − µi = 0. Hence λi = µi for 1 ≤ i ≤ n.
Hence u = v, and so T is injective. □
It should be noted that in Theorem 10.8 the assumption that {v1 , . . . , vn } is a basis
is very important. To see this the reader is advised to try out examples where
{v1 , . . . , vn } is either a linearly dependent set of vectors or it is not spanning V .
Example 10.9. (a) Choose vectors
v1 = (1, 1)^T , v2 = (2, 2)^T , w1 = (1, 0)^T , w2 = (0, 1)^T .
Note that {v1 , v2 } is linearly dependent and does not span R2 . It is easily seen
that for the vectors chosen above, there is no linear map T : R2 → R2 with
T (vi ) = wi for i = 1, 2.
(b) Choose vectors
v1 = (1, 1)^T , v2 = (2, 2)^T , w1 = (1, 0)^T , w2 = (2, 0)^T .
Define
T ((x, y)^T ) = (x, 0)^T ,   S((x, y)^T ) = (x, x − y)^T .
Then both S and T are linear with T (vi ) = wi and S(vi ) = wi .
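When {v1 , . . . , vn } really is a basis, the unique T of Theorem 10.8 can be written
down concretely: if V has the vi as columns and W the wi as columns, then the
matrix of T with respect to the canonical bases is W V −1 (since the coordinates of
x are V −1 x). A sketch, not from the notes, assuming sympy and an invented basis:

```python
from sympy import Matrix

# A (hypothetical) basis {v1, v2} of R^2 and prescribed images {w1, w2}.
V = Matrix([[1, 0], [1, 1]])      # columns: v1 = (1,1)^T, v2 = (0,1)^T
W = Matrix([[1, 2], [0, 0]])      # columns: w1 = (1,0)^T, w2 = (2,0)^T

# If T(vi) = wi and x has coordinates c = V^(-1) x, then T(x) = W c,
# so the matrix of T with respect to the canonical bases is W V^(-1).
A = W * V.inv()
assert A * V[:, 0] == W[:, 0] and A * V[:, 1] == W[:, 1]
print(A)                          # the unique T of Theorem 10.8
```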
Exercise 31. Which of the following mappings T : R3 → R3 are linear transfor-
mations:
(i) T (x, y, z) = (y, z, 0);
(ii) T (x, y, z) = (|x|, −z, 0);
(iii) T (x, y, z) = (x − 1, x, y).
(iv) T (x, y, z) = (2x, y − 2, 4y).
Exercise 32. Let T : U → V be a linear transformation between vector spaces
U, V , let λ ∈ R. Show that λT is a linear transformation.
Exercise 33. Let U, V, W be vector spaces over R, and T : U → V and S : V →
W be linear transformations.
(a) Show that the composition S ◦ T : U → W is linear.
(b) Show that if the inverse map S −1 of S exists, then S −1 : W → V is linear.
11. The Rank-Nullity Theorem
Let V and W be finite dimensional vector spaces over R. In this section we
continue our study of linear maps between vector spaces. We derive in particular
the important rank-nullity theorem.
Definition 11.1. Let T : V → W be a linear transformation.
(1) We define ker(T ) = {v ∈ V | T v = 0}, called the kernel of T.
(2) We define im(T ) = {T v | v ∈ V }, called the image of T.
Proposition 11.2. Let T : V → W be linear. Then ker(T ) is a subspace of V ,
and im(T ) is a subspace of W.
Proof.
(1) Let v1 , v2 ∈ ker(T ). Then T v1 = 0 and T v2 = 0. Let λ1 , λ2 ∈ R. Then
T (λ1 v1 + λ2 v2 ) = λ1 T (v1 ) + λ2 T (v2 ) by (L),
= λ1 · 0 + λ2 · 0
= 0+0=0 by Lemma 4.2.
By Lemma 5.3, it follows that ker(T ) is a subspace.
(2) Let w1 , w2 ∈ im(T ). Then there exist v1 , v2 ∈ V with T v1 = w1 and
T v2 = w2 . Let λ1 , λ2 ∈ R. Then
T (λ1 v1 + λ2 v2 ) = λ1 T (v1 ) + λ2 T (v2 ) by (L),
= λ1 w1 + λ2 w2 .
Hence λ1 w1 + λ2 w2 ∈ im(T ). By Lemma 5.3, it follows that im(T ) is a
subspace.
Proposition 11.3. For a linear map T : V → W we have: T is injective ⇐⇒
ker(T ) = {0}.
Proof. “⇒”: Assume T is injective. Let v ∈ ker(T ). Then T (v) = 0, and hence
by Lemma 10.5: T (v) = T (0). Since T is injective, this implies v = 0. Hence
ker(T ) = {0}.
“⇐”: Let v1 , v2 ∈ V with T (v1 ) = T (v2 ). By Lemma 10.5:
T (v1 − v2 ) = T (v1 ) − T (v2 ) = 0.
So v1 − v2 ∈ ker(T ) = {0}. Hence v1 = v2 , which proves that T is injective. □
Remark. A linear map T : V → W is injective if and only if whenever {v1 , . . . , vr } ⊆
V is linearly independent, then {T v1 , . . . , T vr } is linearly independent. The proof
is left as an exercise to the reader. Compare with Lemma 10.6.
Definition 11.4. Let T : V → W be linear. We define the nullity of T to be
dim(ker T ) and write n(T ). We define the rank of T to be dim(im T ) and write
rk(T ).
Example 11.5. Define T : R3 → R2 by T ((x, y, z)^T ) = (x, 0)^T . Note that T
is linear and
im T = {(x, 0)^T | x ∈ R},   ker T = {(0, y, z)^T | y, z ∈ R}.
Hence rk(T ) = dim(im T ) = 1, and n(T ) = dim(ker T ) = 2. Note that
dim(R3 ) = 3 = 1 + 2 = rk(T ) + n(T ).
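For a linear map given by a matrix A (so T (x) = Ax), the rank is the matrix
rank of A and the nullity follows from Theorem 11.6 below. A numerical sketch,
not from the notes, assuming numpy and an invented 3 × 4 matrix:

```python
import numpy as np

# A (hypothetical) matrix of a linear map T : R^4 -> R^3, T(x) = A x.
A = np.array([[1, 2, 0, 1],
              [0, 1, 1, 0],
              [1, 3, 1, 1]])      # third row = first row + second row

rank = np.linalg.matrix_rank(A)   # dim im(T)
nullity = A.shape[1] - rank       # dim ker(T), by the rank-nullity theorem
print(rank, nullity)              # 2 2; their sum is dim(R^4) = 4
```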
Theorem 11.6 (The Rank-Nullity Theorem). Let T : V → W be a linear trans-
formation of finite dimensional vector spaces over R. Then
dim(V ) = rk(T ) + n(T ).
Proof.
(1) By Proposition 11.2, we know that ker(T ) is a subspace of V. Let {u1 , . . . , uk }
be a basis of ker(T ). So n(T ) = k. By Proposition 8.2, we can ex-
tend {u1 , . . . , uk } to a basis of V , say {u1 , . . . , uk , uk+1 , . . . , un } where
n = dim V. We show that {T uk+1 , . . . , T un } is a basis of im(T ), that
is rk(T ) = n − k. Then indeed rk(T ) + n(T ) = dim(V ).
(2) Let w ∈ im(T ). Then there exists v ∈ V with T v = w. Write v = ∑_{i=1}^{n} ai ui
with ai ∈ R. Then
w = T (v) = T (∑_{i=1}^{n} ai ui )
         = ∑_{i=1}^{n} ai T (ui )       since T is linear,
         = ∑_{i=k+1}^{n} ai T (ui )     since T (uj ) = 0 for j ≤ k.
Hence w ∈ Span{T uk+1 , . . . , T un }, and so im(T ) ⊆ Span{T uk+1 , . . . , T un }.
Since T ui ∈ im(T ) for k + 1 ≤ i ≤ n, this implies that
Span{T uk+1 , . . . , T un } = im(T ).
(3) Assume we have 0 = λk+1 T uk+1 +· · ·+λn T un for some λi ∈ R with k+1 ≤
i ≤ n. By linearity of T , this implies that 0 = T (λk+1 uk+1 + · · · + λn un ).
Let z := λk+1 uk+1 + · · · + λn un , then z ∈ ker(T ). Hence we can express z
as a linear combination of the basis elements of ker(T ), say
λk+1 uk+1 + · · · + λn un = z = λ1 u1 + · · · + λk uk .
This implies: 0 = −λ1 u1 − · · · − λk uk + λk+1 uk+1 + · · · + λn un . Since
{u1 , . . . , un } is linearly independent (it is a basis of V ), this implies that
λ1 = λ2 = . . . = λn = 0.
Hence {T uk+1 , . . . , T un } is linearly independent. □
Corollary 11.7. Between finite dimensional vector spaces V and W , there exists
a bijective linear map (called isomorphism) T : V → W if and only if dim V =
dim W.

Proof. Assume V and W are of dimension n. By Theorem 10.8, there exists
a bijective linear map, mapping a basis element vi to a basis element wi for
1 ≤ i ≤ n.
Conversely, assume there exists a bijective linear map T : V → W . So in partic-
ular, T is injective, and hence Proposition 11.3 implies ker(T ) = {0}. Since T is
surjective, im(T ) = W . By Theorem 11.6, dim(V ) = dim(im T ) = dim(W ). □
Exercise 34. Describe the kernel and image of each of the following linear trans-
formations, and in each case give the rank and nullity of the transformation:
(a) T : R4 → R3 is given by T (x) = Ax for x a column vector in R4 , and
where A is the matrix
( 1 −1  1 1 )
( 1  2 −1 1 )
( 0  3 −2 0 ).
(b) V is the vector space of all polynomials in x of degree ≤ n, and T : V → V
is given by differentiation with respect to x.
(c) V = Mn (R), and T : V → R is given by T (A) = tr(A) = ∑_{i=1}^{n} aii for
A = (aij ) ∈ V .
(d) Determine the dimension of ker(f1 ) ∩ ker(f2 ) and ker(f1 ) + ker(f2 ) for
the following linear maps from R5 → R:
f1 (x1 , x2 , x3 , x4 , x5 ) = x2 + 5x4 ,   f2 (x1 , x2 , x3 , x4 , x5 ) = x1 + 2x2 + x3 + 10x4 − x5 .
Exercise 35. Let V be a vector space of dimension n ≥ 1. If T : V → V is a
linear transformation, prove that the following statements are equivalent:
(a) im(T ) = ker(T ); (b) T 2 = 0, n is even and rk(T ) = n/2.
Exercise 36. Let T : R3 → R3 be a linear transformation. Show that im(T 2 ) ⊆
im(T ) and that ker(T ) ⊆ ker(T 2 ). Prove the equivalence of the following state-
ments:
(a) R3 = ker(T ) ⊕ im(T );
(b) ker(T ) = ker(T 2 );
(c) im(T ) = im(T 2 ).
(We write R3 = ker(T ) ⊕ im(T ) if R3 = ker(T ) + im(T ) and ker(T ) ∩ im(T ) =
{0}.)
Exercise 37. Define the vectors a1 = (1, 0, 0), a2 = (0, 1, 0), a3 = (0, 0, 1) a4 =
(2, 1, 3), b1 = (1, 2, 4, 1), b2 = (1, 1, 0, 1), b3 = (−1, 0, 4, −1) and b4 = (0, 5, 20, 0).
(i) Show that there is precisely one linear map f : R3 → R4 with f (ai ) = bi
for i = 1, 2, 3, 4.
(ii) Describe the kernel and the image of f and give the rank and the nullity
of f .
Exercise 38. Let V be an n-dimensional vector space and let S and T be linear
transformations on V .
(a) Prove that nullity(ST ) ≤ nullity(S) + nullity(T ).
(b) If S n = 0 but S n−1 ≠ 0, then determine nullity(S).
12. The Matrix Representation of a Linear Transformation
Let T : V → W be a linear transformation between finite dimensional vector
spaces V and W . Let B1 = {v1 , . . . , vn } be a basis of V, and B2 = {w1 , . . . , wm }
a basis of W. Define (using the so-called column convention) a matrix
             ( a11 a12 · · · a1n )
A = (aij ) = ( a21 a22 · · · a2n ) ∈ Mm×n (R)
             (  ⋮   ⋮        ⋮  )
             ( am1 am2 · · · amn )
by expressing T (vi ) as a linear combination in the basis elements of B2 :
T (v1 ) = a11 w1 + · · · + am1 wm
⋮
T (vi ) = a1i w1 + · · · + ami wm
⋮
T (vn ) = a1n w1 + · · · + amn wm .
Note that by Proposition 7.3, the coefficients aij are uniquely determined. Hence
the matrix A is well-defined.
Definition 12.1. Matrix A is called the matrix of T (or the matrix representing
T , or corresponding to T ) with respect to the bases B1 and B2 . Write A = M_{B2}^{B1}(T ).
Example 12.2. Let T : Rn [x] → Rn [x] be differentiation. Let B1 = B2 =
{1, x, . . . , xn }. Then
T (1) = 0 = 0 · 1 + 0 · x + · · · + 0 · xn ,
T (x) = 1 = 1 · 1 + 0 · x + · · · + 0 · xn ,
T (x2 ) = 2x = 0 · 1 + 2 · x + · · · + 0 · xn ,
⋮
T (xn ) = nxn−1 = 0 · 1 + 0 · x + · · · + n · xn−1 + 0 · xn .
Hence the matrix of T with respect to B1 and B2 is
    ( 0 1 0 · · · 0 )
    ( 0 0 2 · · · 0 )
A = ( ⋮     ⋱     ⋮ )
    ( 0 0 0 · · · n )
    ( 0 0 0 · · · 0 )
Recall the definition of a coordinate vector, given in Definition 7.4.
Proposition 12.3. Let T : V → W be linear. Let x be the coordinate vector of
v ∈ V with respect to a basis B1 . Then the coordinate vector of T (v) with respect
to a basis B2 is M_{B2}^{B1}(T ) x.
Proof. Let M_{B2}^{B1}(T ) = A = (aij ) with B1 = {v1 , . . . , vn } and B2 = {w1 , . . . , wm }.
By assumption v = ∑_{i=1}^{n} xi vi for xi ∈ R. Let x = (x1 , . . . , xn )^T be the coordinate
vector of v with respect to B1 . Then
T (v) = ∑_{i=1}^{n} xi T (vi )
     = ∑_{i=1}^{n} xi ∑_{j=1}^{m} aji wj
     = ∑_{j=1}^{m} (∑_{i=1}^{n} aji xi ) wj
     = ∑_{j=1}^{m} (j th entry of Ax) · wj .
Hence the coordinate vector of T v with respect to basis B2 is Ax. □
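Proposition 12.3 can be tried out directly on Example 12.2 with n = 2: the
matrix of differentiation applied to a coordinate vector gives the coordinates of
the derivative. A small sketch, not from the notes, assuming sympy:

```python
from sympy import Matrix

# Matrix of differentiation D on R_2[x] with respect to {1, x, x^2}
# (column convention, as in Example 12.2): columns are D(1), D(x), D(x^2).
A = Matrix([[0, 1, 0],
            [0, 0, 2],
            [0, 0, 0]])

# f(x) = 3 + 5x + 7x^2 has coordinate vector x = (3, 5, 7)^T.
x = Matrix([3, 5, 7])
print(A * x)   # (5, 14, 0)^T: the coordinates of f'(x) = 5 + 14x
```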
Theorem 12.4. If S : U → V and T : U → V are linear with BU a basis of U
and BV a basis of V, then S + T and λT are linear with
M_{BV}^{BU}(S + T ) = M_{BV}^{BU}(S) + M_{BV}^{BU}(T ),
M_{BV}^{BU}(λT ) = λ M_{BV}^{BU}(T ).
Proof. Follows from Proposition 10.7 and Definition 12.1. The details are left as
an exercise to the reader. □
Theorem 12.5. Let U, V, W be vector spaces with bases B1 , B2 and B3 respectively.
Let T : U → V and S : V → W be linear. Then ST is linear with
M_{B3}^{B1}(ST ) = M_{B3}^{B2}(S) · M_{B2}^{B1}(T ).
Proof. To check that ST is linear is left as an exercise to the reader (see
Proposition 10.7). Let
B1 = {u1 , . . . , un } be a basis of U,
B2 = {v1 , . . . , vm } be a basis of V,
B3 = {w1 , . . . , wk } be a basis of W.
Let A = M_{B2}^{B1}(T ) = (aij ), so
T (u1 ) = a11 v1 + · · · + am1 vm
⋮
(8) T (ui ) = a1i v1 + · · · + ami vm
⋮
T (un ) = a1n v1 + · · · + amn vm .
Let B = M_{B3}^{B2}(S) = (bij ), with
S(v1 ) = b11 w1 + · · · + bk1 wk
(9) ⋮
S(vm ) = b1m w1 + · · · + bkm wk .
Let C = M_{B3}^{B1}(ST ) = (cij ), with
(ST )(u1 ) = c11 w1 + · · · + ck1 wk
⋮
(10) (ST )(ui ) = c1i w1 + · · · + cki wk
⋮
(ST )(un ) = c1n w1 + · · · + ckn wk .
Then
(ST )(ui ) = S(T ui )
 = S(a1i v1 + . . . + ami vm )        by Equation (8),
 = a1i S(v1 ) + . . . + ami S(vm )    since S is linear,
 = a1i (b11 w1 + . . . + bk1 wk )     by Equation (9),
 + a2i (b12 w1 + . . . + bk2 wk )
 ⋮
 + ami (b1m w1 + . . . + bkm wk )
 = (a1i b11 + a2i b12 + . . . + ami b1m )w1    (by reordering)
 + (a1i b21 + a2i b22 + . . . + ami b2m )w2
 ⋮
 + (a1i bj1 + a2i bj2 + . . . + ami bjm )wj
 ⋮
 + (a1i bk1 + a2i bk2 + . . . + ami bkm )wk
 = ∑_{j=1}^{k} (∑_{l=1}^{m} bjl ali ) wj .
Hence cji = ∑_{l=1}^{m} bjl ali . Hence C = B · A. □
Corollary 12.6. Let T : U → V be linear. If T is invertible then T −1 is linear
with
M_{BU}^{BV}(T −1 ) = M_{BV}^{BU}(T )−1 .
Proof. Use Theorem 12.5:
I = M_{BU}^{BU}(idU ) = M_{BU}^{BU}(T −1 ◦ T ) = M_{BU}^{BV}(T −1 ) · M_{BV}^{BU}(T ). □
In the rest of this section, we consider some special cases and applications of the
results obtained so far in this section.
Definition 12.7. Let id : V → V be the identity map, and let B1 and B2 be
bases of V. We call M_{B2}^{B1}(id) the base change matrix associated with the change
of basis from basis B1 to basis B2 .
Proposition 12.8. Let V be a vector space. Let x be the coordinate vector of v
with respect to basis B1 . Let y be the coordinate vector of v with respect to basis
B2 . Then y = M_{B2}^{B1}(id) x.
Proof. Take T = id : V → V in Proposition 12.3. □
Theorem 12.9. Let T : V → W be linear. Let B_{V1} and B_{V2} be bases of V, and
B_{W1} and B_{W2} be bases of W. Then
M_{B_{W2}}^{B_{V2}}(T ) = M_{B_{W2}}^{B_{W1}}(id) · M_{B_{W1}}^{B_{V1}}(T ) · M_{B_{V2}}^{B_{V1}}(id)−1 .
Proof. Consider the composition of maps T = idW ◦ T ◦ idV with respect to the
following bases:
V (basis B_{V2}) −id→ V (basis B_{V1}) −T→ W (basis B_{W1}) −id→ W (basis B_{W2}).
Then by Theorem 12.5 and Corollary 12.6 the claim follows:
M_{B_{W2}}^{B_{V2}}(T ) = M_{B_{W2}}^{B_{V2}}(idW ◦ T ◦ idV )
 = M_{B_{W2}}^{B_{W1}}(idW ) · M_{B_{W1}}^{B_{V1}}(T ) · M_{B_{V1}}^{B_{V2}}(idV )     by 12.5,
 = M_{B_{W2}}^{B_{W1}}(idW ) · M_{B_{W1}}^{B_{V1}}(T ) · M_{B_{V2}}^{B_{V1}}(idV )−1   by 12.6. □
Example 12.10. (i) Let T : R2 → R3 be given with respect to bases B1 =
{u1 , u2 } and B2 = {v1 , v2 , v3 } as
                    (  1  2 )
A = M_{B2}^{B1}(T ) = (  0  1 )
                    ( −3 −1 )
So by Definition 12.1 we have:
(11) T u1 = 1 · v1 + 0 · v2 − 3v3 ,
     T u2 = 2 · v1 + v2 − v3 .
(ii) We take new bases for R2 and R3 , say B3 = {w1 , w2 } and B4 = {z1 , z2 , z3 }
with
w1 = u1 − 2u2 ,   z1 = v1 + v2 ,
w2 = u1 + u2 ,    z2 = v2 + v3 ,
                  z3 = v1 + v3 .
What is M_{B4}^{B3}(T )?
(iii) Note
     v1 = (1/2)(z1 − z2 + z3 ),
(12) v2 = (1/2)(z2 − z3 + z1 ),
     v3 = (1/2)(z3 + z2 − z1 ).
Then
T w1 = T u1 − 2T u2
     = (v1 − 3v3 ) − 2(2v1 + v2 − v3 )   by Eq. (11),
     = −3v1 − 2v2 − v3
     = −(3/2)(z1 − z2 + z3 ) − (z2 − z3 + z1 ) − (1/2)(z3 + z2 − z1 )   by Eq. (12),
     = −2z1 + 0 · z2 − 1 · z3 .
Similarly,
T w2 = T u1 + T u2
     = (v1 − 3v3 ) + (2v1 + v2 − v3 )
     = 3v1 + v2 − 4v3
     = (3/2)(z1 − z2 + z3 ) + (1/2)(z2 − z3 + z1 ) − 2(z3 + z2 − z1 )
     = 4z1 − 3z2 − z3 .
Hence
                    ( −2  4 )
B = M_{B4}^{B3}(T ) = (  0 −3 )
                    ( −1 −1 )
(iv) What is the relationship between A and B? Let Q = M_{B4}^{B2}(id) and let
P = M_{B3}^{B1}(id). Then by Equation (12) we have:
          (  1  1 −1 )
Q = (1/2) ( −1  1  1 )
          (  1 −1  1 )
Moreover, from (ii) we see that
P −1 = M_{B1}^{B3}(id) = (  1 1 )
                       ( −2 1 )
It is now easily seen that indeed B = QAP −1 .
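The relationship B = QAP −1 can be checked by machine; a hedged sympy
sketch with the matrices of this example (exact rational arithmetic via Rational):

```python
from sympy import Matrix, Rational

A = Matrix([[1, 2], [0, 1], [-3, -1]])        # M_{B2}^{B1}(T)
Q = Rational(1, 2) * Matrix([[1, 1, -1],
                             [-1, 1, 1],
                             [1, -1, 1]])      # M_{B4}^{B2}(id)
P_inv = Matrix([[1, 1], [-2, 1]])              # M_{B1}^{B3}(id)

B = Q * A * P_inv
print(B)    # reproduces the matrix ((-2, 4), (0, -3), (-1, -1)) from above
```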
Exercise 39. Let E2 and E3 denote the canonical bases for R2 and R3 respec-
tively, that is E2 = {(1, 0), (0, 1)} and E3 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}. Let
f : R2 → R3 and g : R3 → R2 be given by
f (x, y) = (x + 2y, x − y, 2x + y),
g(x, y, z) = (x − 2y + 3z, 2y − 3z).
(a) Determine the matrices M_{E3}^{E2}(f ), M_{E2}^{E3}(g), M_{E2}^{E2}(g ◦ f ) and M_{E3}^{E3}(f ◦ g)
representing the linear maps f, g, g ◦ f and f ◦ g with respect to the bases E2
and E3 .
(b) Show that g ◦ f is bijective and determine M_{E2}^{E2}((g ◦ f )−1 ).
Exercise 40. Consider the vector spaces R3 and R2 with the bases B and C
respectively, where
B = {(1, 1, 0), (1, −1, 1), (1, 1, 1)} and C = {(1, 1), (1, −1)}.
(a) Determine the matrices M_B^{E3}(id), M_{E3}^B(id) representing the identity map
on R3 , and determine the matrices M_C^{E2}(id), M_{E2}^C(id) representing the
identity map on R2 .
(b) For the linear maps f and g in the previous question, determine the ma-
trices M_C^B(f ), M_B^C(g), M_C^C(g ◦ f ) and M_B^B(f ◦ g) representing the linear
maps f, g, g ◦ f and f ◦ g with respect to the bases B and C.
Exercise 41. Let n ∈ N. Consider the vector space Rn [x] of polynomials of
degree at most n. Let Bn = {1, x, . . . , xn }. Define Dn : Rn [x] → Rn−1 [x] by
f 7→ f ′ , where f ′ denotes the first derivative of f .
(a) Show that Dn is linear.
(b) Determine M_{B_{n−1}}^{B_n}(Dn ).
(c) Show that there is a linear map In : Rn−1 [x] → Rn [x] with Dn ◦ In = id
and determine M_{B_n}^{B_{n−1}}(In ).
Exercise 42. Consider the vector space W of functions from R to R. Let
B = {sin(x), cos(x), sin(x) · cos(x), sin2 (x), cos2 (x)},
and define V = Span(B) ⊆ W . Consider the map F : V → V given by f (x) 7→
f ′ (x) where f ′ denotes the first derivative of f .
(i) Show that B is a basis of V .
(ii) Determine the matrix M_B^B(F ).
(iii) Give a basis of ker(F ) and im(F ).
Exercise 43. Let V = R3 [x] be the vector space of all polynomials of degree at
most three with basis B = {1, x, x2 , x3 }. Consider the maps
F : V → R, f ↦ ∫_{−1}^{1} f (x) dx   and   G : V → R3 , f ↦ (f (−1), f (0), f (1)).
(i) Show that F and G are linear.
(ii) Let E1 and E3 be the canonical bases of R and R3 . Determine M_{E1}^B(F )
and M_{E3}^B(G).
(iii) Show that ker(G) ⊆ ker(F ).
(iv) Show that there is a linear map H : R3 → R with H ◦ G = F .
Exercise 44. Let V be a vector space of dimension n and F : V → V a linear
map with F 2 = F .
(a) Show that there are subspaces U, W of V with V = U ⊕ W and F (W ) = 0,
F (u) = u for all u ∈ U .
(b) Show that there exists a basis B of V and some r ≤ n such that
M_B^B(F ) = ( Ir 0 )
            (  0 0 )
[Here Ir denotes the identity matrix of size r, and 0 denotes zero matrices
(of possibly different size).]
13. Row Reduced Echelon Matrices
The last part of this lecture course deals with how to solve systems of linear
equations. We will study in particular when a system of linear equations has
precisely one solution. In this section we introduce the row reduced echelon form
of a matrix.
Definition 13.1. An m × n matrix M is in row reduced echelon form if
(1) The zero rows of M (if any) all come below the non-zero rows.
(2) In each non-zero row the leading entry (= the left most non-zero entry) is
one.
(3) If row i and row i + 1 are non-zero, then the leading entry of row i + 1 is
strictly to the right of the leading entry of row i.
(4) If a column contains a leading entry of a non-zero row, then all its other
entries are zero.
Example 13.2. Matrices in row reduced echelon form are
( 0 )   ( 1 )
( 0 )   ( 0 )                 ( 0 1 )   ( 1 0 0 0  3 1 )
( 0 ) , ( 0 ) , ( 0 0 0 ) ,   ( 0 0 ) , ( 0 1 4 0 −2 0 ) .
( 0 )   ( 0 )                           ( 0 0 0 1  0 1 )
Or more generally, any matrix of the following shape is in row reduced echelon
form: the zero rows (if any) come last, each non-zero row has leading entry 1,
each leading 1 lies strictly to the right of the leading 1 in the row above it, every
other entry in the column of a leading 1 is zero, and the entries marked ∗ are
arbitrary:
( 0 · · · 0 1 ∗ · · · ∗ 0 ∗ · · · ∗ 0 ∗ · · · ∗ )
(           0 · · · 0 1 ∗ · · · ∗ 0 ∗ · · · ∗ )
(                       0 · · · 0 1 ∗ · · · ∗ )
( 0 · · ·                             · · · 0 )
Remark 13.3. If A ∈ Mm×n (R) is in row reduced echelon form then for each k
with 1 ≤ k ≤ n, the matrix made from the first k columns is also in row reduced
echelon form.
Matrices not in row reduced echelon form are:
( 1 0 )   ( 0 1 2 1 )   ( 1 0 0 0 0 )
( 0 2 ) , ( 0 1 0 3 ) , ( 0 0 1 0 0 )
          ( 0 0 1 0 )   ( 0 0 0 0 0 )
                        ( 0 0 0 0 1 ).
Definition 13.4. (1) The following operations are called elementary row op-
erations (eros) on a matrix:
Type I: Swap row i and row j. (Write Ri ←→ Rj .)
Type II: Multiply row i by a non-zero λ ∈ R. (Write Ri −→ λRi .)
Type III: Add to row i a multiple (c times) of row j for i 6= j. (Write
Ri −→ Ri + cRj .)
(2) If matrix B is obtained from A by applying eros, we call A and B row
equivalent.
Theorem 13.5. Every m × n matrix may be brought to a row reduced echelon
form by applying elementary row operations. The row reduced echelon form
obtained is unique.
Proof. For existence, see Algorithm 13.9 below. We omit the proof for the
uniqueness of the row reduced echelon form obtained. □
Theorem 13.5 allows us to make the following definition:
Definition 13.6. The row rank of a matrix A is defined to be the number of
non-zero rows in the row reduced echelon form of A.
Example 13.7. We give two examples of reducing a matrix to row reduced
echelon form by applying eros. In the first case the matrices have row rank two,
in the second case the matrices have row rank three. A further example is given
in Example 13.10.
(1)
M1 = ( 0 2 −1 )      e1 = R2 → (1/2)R2
     ( 2 4  8 )
M2 = ( 0 2 −1 )      e2 = R1 ↔ R2
     ( 1 2  4 )
M3 = ( 1 2  4 )      e3 = R2 → (1/2)R2
     ( 0 2 −1 )
M4 = ( 1 2  4   )    e4 = R1 → R1 − 2R2
     ( 0 1 −1/2 )
M5 = ( 1 0  5   )
     ( 0 1 −1/2 )
Notation: Given a matrix M , suppose that when applying the ero e to M we
obtain the matrix N . Then we write N = e(M ).
Remark: In the example above we have ei (Mi ) = Mi+1 for 1 ≤ i ≤ 4.
(2)
M1 = ( 0 1 0 )      e1 = R3 → (1/2)R3
     ( 0 0 1 )
     ( 2 2 0 )
M2 = ( 0 1 0 )      e2 = R1 ↔ R3
     ( 0 0 1 )
     ( 1 1 0 )
M3 = ( 1 1 0 )      e3 = R2 ↔ R3
     ( 0 0 1 )
     ( 0 1 0 )
M4 = ( 1 1 0 )      e4 = R1 → R1 − R2
     ( 0 1 0 )
     ( 0 0 1 )
M5 = ( 1 0 0 )
     ( 0 1 0 )
     ( 0 0 1 )
Remark 13.8. Given a matrix M = (mij ), we will apply eros to M using the
following language:
(1) If mij ≠ 0 then we can normalise by Ri → (mij )−1 Ri , so that the (i, j)th
entry becomes 1.
(2) We can move an entry mij up and down in its column by swapping rows,
Ri ↔ Rk .
(3) If mij = 1 then we can purge all other entries in column j (i.e. make them
zero) by applying Type III operations: Rs → Rs − msj Ri , for s ≠ i. The
element mij = 1 used to “clean out” the rest of the column is called the
pivot of the purging operation.
Algorithm 13.9 (for reducing a matrix to row reduced echelon form by eros).
Input: Matrix M of size m × n.
The algorithm is an n stage process. Starting with matrix Mk , we have
at the end of stage k the matrix Mk+1 in which the first k columns make
an m × k matrix in row reduced echelon form.
Stage 1: Inspect column 1 of M1 := M. If the column is zero, then stop and
Stage 1 is complete. Otherwise find the first non-zero entry in the first
column (reading downwards), normalise it and move it to the top. Use it
as pivot to purge the rest of the column. We obtain:
     ( 0 ∗ · · · ∗ )
M2 = ( 0 ∗ · · · ∗ )
     ( ⋮ ⋮       ⋮ )
     ( 0 ∗ · · · ∗ )
or
     ( 1 ∗ · · · ∗ )
M2 = ( 0 ∗ · · · ∗ )
     ( ⋮ ⋮       ⋮ )
     ( 0 ∗ · · · ∗ )
Stage k: We start with matrix Mk of the form
Mk = (Ak | Bk )
where Ak is an m × (k − 1)-matrix in row reduced echelon form and Bk is
an m × (n − k + 1)-matrix.
Case 1: If Ak = 0 (has no non-zero rows), then apply the Stage 1 process to
the first column of Bk . We obtain Matrix Mk+1 .
Case 2: If Ak has m non-zero rows then stop altogether. In this case Mk is
already in row reduced echelon form.
If neither Case 1 nor Case 2 occurs, then
Case 3: Write
Mk = ( Ek Fk )
     (  0 Gk )
where Ek consists of the non-zero rows of Ak , Fk is the continuation
of these non-zero rows of Ek . As Case 1 did not occur, this implies
Ek has at least one row. As Case 2 did not occur, this implies Gk has
at least one row. Note Ek is in row reduced echelon form. Inspect
the first column of Gk in Mk . If it is zero, then stop and Stage k is
complete. Otherwise select the first non-zero element, normalise and
move it to the top left hand corner of Gk . Use it as a pivot to purge
the kth column of Mk . Stop. Stage k is complete. We obtain matrix
Mk+1 .
In Case 1 and Case 3: Move on to the next stage if k ≤ n − 1, or otherwise
the algorithm stops.
Output: matrix in row reduced echelon form.
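The stages above translate directly into a program. The following Python sketch
(not part of the notes; it uses exact fractions to avoid rounding) implements the
same pivot-normalise-purge steps, column by column, and reproduces M4 from
Example 13.10 below:

```python
from fractions import Fraction

def rref(M):
    """Row reduce M (a list of rows) in the manner of Algorithm 13.9:
    for each column, find a pivot, normalise it, move it up, purge."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0                                    # next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                         # column is zero below row r
        M[r], M[pivot] = M[pivot], M[r]      # move pivot entry up (Type I)
        M[r] = [x / M[r][c] for x in M[r]]   # normalise (Type II)
        for i in range(rows):                # purge the column (Type III)
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

M = [[1, 0, 1, -2, 2], [2, 1, 1, -2, 3],
     [3, 1, 3, -6, 4], [1, 0, 2, -4, 1]]
for row in rref(M):
    print(row)           # rows of M4 from Example 13.10
```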
Remarks. (a) This algorithm proves the first part of Theorem 13.5. We have not
shown uniqueness of the row reduced echelon form in Theorem 13.5.
(b) Note that the algorithm has been applied in Example 13.7. However, many
more intermediate steps have been given in these examples, and hence the labelling
of the matrices Mi does not agree with the labelling of matrices used in Algo-
rithm 13.9.
Example 13.10. We perform Algorithm 13.9 on matrix M1 given below. There
are three steps in the algorithm and the matrices Mi from the algorithm are given
by:
M1 = ( 1 0 1 −2 2 )
     ( 2 1 1 −2 3 )
     ( 3 1 3 −6 4 )
     ( 1 0 2 −4 1 )
M2 = ( 1 0  1 −2  2 )
     ( 0 1 −1  2 −1 )
     ( 0 1  0  0 −2 )
     ( 0 0  1 −2 −1 )
M3 = ( 1 0  1 −2  2 )
     ( 0 1 −1  2 −1 )
     ( 0 0  1 −2 −1 )
     ( 0 0  1 −2 −1 )
M4 = ( 1 0 0  0  3 )
     ( 0 1 0  0 −2 )
     ( 0 0 1 −2 −1 )
     ( 0 0 0  0  0 )
It follows that the row rank of the matrices Mi for 1 ≤ i ≤ 4 is three.
Exercise 45. Find the row reduced echelon form of the matrices A and B where
A = (  2 −2  2  1 )       B = ( 2 2 −1  6  4 )
    ( −3  6  0 −1 ) ,         ( 4 4  1 10 13 ) .
    (  1 −7 10  2 )           ( 8 8 −1 26 23 )
14. Systems of Linear Equations
Notation 14.1. Given is a system of linear equations, say
a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
⋮
am1 x1 + am2 x2 + . . . + amn xn = bm .
We can write this compactly as A · x = b, where
    ( a11 a12 · · · a1n )       ( x1 )       ( b1 )
A = ( a21 a22 · · · a2n ) , x = ( x2 ) , b = ( b2 )
    (  ⋮   ⋮        ⋮  )       (  ⋮ )       (  ⋮ )
    ( am1 am2 · · · amn )       ( xn )       ( bm )
We call A the coefficient matrix of the system of equations; the m×(n+1)-matrix
(A | b) is called the augmented matrix of the equation system. A system of linear
equations Ax = b is called homogeneous if b = 0.
Remark: Note that the set of solutions {x ∈ Rn | Ax = b} of a system of linear
equations Ax = b forms a subspace of Rn if and only if b = 0.
Example 14.2. (1) The system of equations
x1 − 2x2 + 2x3 = 0
3x1 + x2 + 4x3 = −1
2x1 + x2 = 2
has augmented matrix
( 1 −2 2  0 )
( 3  1 4 −1 )
( 2  1 0  2 ).
(2) We give some examples of systems of linear equations where the corre-
sponding augmented matrix is in row reduced echelon form:
(a) Consider
( 0 0 0 )      0 · x + 0 · y = 0
( 0 0 0 ) ,    0 · x + 0 · y = 0.
Solutions are all (x, y) = (α, β) with α ∈ R, β ∈ R.
(b) Consider
( 1 0 0 −2 )      x1 = −2
( 0 1 0  3 ) ,    x2 = 3
( 0 0 1  4 )      x3 = 4.
There is precisely one solution to the system of equations, namely
(x1 , x2 , x3 ) = (−2, 3, 4).
(c) Consider the homogeneous system of equations
( 1 0 0 0  3 0 )      x1 + 3x5 = 0
( 0 1 4 0 −2 0 ) ,    x2 + 4x3 − 2x5 = 0
( 0 0 0 1  0 0 )      x4 = 0.
The solutions of this system of equations are given by all
(x1 , x2 , x3 , x4 , x5 ) = (−3β, 2β − 4α, α, 0, β)
= α(0, −4, 1, 0, 0) + β(−3, 2, 0, 0, 1)
with α ∈ R and β ∈ R. The set of all solutions forms a subspace of R5
of dimension two with basis vectors (0, −4, 1, 0, 0) and (−3, 2, 0, 0, 1).
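A brief numerical confirmation of part (c); the check below is our own illustration using NumPy, verifying that both basis vectors solve the system and that the solution space indeed has dimension 5 − 3 = 2:

```python
import numpy as np

A = np.array([[1, 0, 0, 0, 3],
              [0, 1, 4, 0, -2],
              [0, 0, 0, 1, 0]], dtype=float)
basis = np.array([[0, -4, 1, 0, 0],
                  [-3, 2, 0, 0, 1]], dtype=float)
assert np.allclose(A @ basis.T, 0)              # both vectors solve Ax = 0
print(A.shape[1] - np.linalg.matrix_rank(A))    # nullity: 5 - 3 = 2
```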
Remark. Our aim is to solve systems of linear equations. If the augmented matrix of a system of equations is in row reduced echelon form, we can read off the solutions of the system. Now suppose we are given a system of equations whose augmented matrix is not in row reduced echelon form. We can then perform row operations on the augmented matrix; doing so does not change the set of solutions of the system. The following algorithm describes how to solve a system of linear equations by so-called Gaussian elimination.
Algorithm 14.3 (Gauss-Jordan Solution of Linear Equations).
Input: Given Ax = b where A is an m × n matrix.
Step 1: Form the augmented matrix M = (A | b).
Step 2: Transform M to row reduced echelon form E.
Output:
Case (a): E = 0. Then for arbitrary αi ∈ R (1 ≤ i ≤ n), (x1, . . . , xn) = (α1, . . . , αn) is a solution of the equation system Ax = b.
Case (b): E contains (0, 0, . . . , 0, 1) as a row. Declare the system inconsistent, that is, it has no solution.
Case (c): If (a) and (b) do not occur, then all leading entries of E occur in columns 1 to n. If column j (for 1 ≤ j ≤ n) does not contain a leading entry, then give a parameterised value αj to xj. Use the equations corresponding to the non-zero rows of E to solve for the remaining variables.
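Computational aside. A self-contained sketch of Algorithm 14.3 (the name solve_gauss_jordan and the tolerance are our own choices; floating point replaces exact arithmetic). It row reduces (A | b) and then distinguishes the output cases:

```python
import numpy as np

def solve_gauss_jordan(A, b, tol=1e-12):
    """Row reduce (A | b); return None if inconsistent, otherwise a particular
    solution together with a basis of the solutions of Ax = 0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    m, n = A.shape
    M = np.hstack([A, b])                     # Step 1: the augmented matrix
    row, pivot_cols = 0, []
    for col in range(n + 1):                  # Step 2: transform to rref
        nz = np.nonzero(np.abs(M[row:, col]) > tol)[0]
        if nz.size == 0:
            continue
        i = row + nz[0]
        M[[row, i]] = M[[i, row]]
        M[row] = M[row] / M[row, col]
        for r in range(m):
            if r != row:
                M[r] = M[r] - M[r, col] * M[row]
        pivot_cols.append(col)
        row += 1
        if row == m:
            break
    if n in pivot_cols:                       # Case (b): a row (0, ..., 0, 1)
        return None                           # the system is inconsistent
    free_cols = [j for j in range(n) if j not in pivot_cols]
    x0 = np.zeros(n)                          # particular solution (parameters 0)
    for r, col in enumerate(pivot_cols):
        x0[col] = M[r, n]
    basis = []                                # one basis vector per parameter
    for j in free_cols:
        v = np.zeros(n)
        v[j] = 1.0
        for r, col in enumerate(pivot_cols):
            v[col] = -M[r, j]
        basis.append(v)
    return x0, basis                          # Cases (a) and (c)
```

Applied to Example 14.4 below, this returns the particular solution (3, −2, −1, 0) together with the single homogeneous basis vector (0, 0, 2, 1), matching the parametrised family found there.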
Example 14.4. Consider the system of equations
2x1 + x2 + x3 − 2x4 = 3
x1 + x3 − 2x4 = 2
x1 + 2x3 − 4x4 = 1
3x1 + x2 + 3x3 − 6x4 = 4.
Using Gaussian elimination (see Algorithm 14.3), we obtain the row reduced echelon form of the augmented matrix:
\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 3 \\ 0 & 1 & 0 & 0 & -2 \\ 0 & 0 & 1 & -2 & -1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right).
Solutions: for any α ∈ R, (x1, x2, x3, x4) = (3, −2, −1 + 2α, α) is a solution.
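The stated family of solutions is easy to spot-check numerically, for instance:

```python
import numpy as np

A = np.array([[2, 1, 1, -2],
              [1, 0, 1, -2],
              [1, 0, 2, -4],
              [3, 1, 3, -6]], dtype=float)
b = np.array([3, 2, 1, 4], dtype=float)
for alpha in (0.0, 1.0, -2.5):                    # sample parameter values
    x = np.array([3, -2, -1 + 2 * alpha, alpha])
    assert np.allclose(A @ x, b)                  # every alpha gives a solution
```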
Example 14.5. For which values of a, b, c is the system of equations
3x + y + 2z = −1
x + 2y − z = a
x + z = −1
2x + by − z = c
consistent? Which values of a, b, c give infinitely many solutions? Taking the third equation first, the augmented matrix is
\left(\begin{array}{ccc|c} 1 & 0 & 1 & -1 \\ 3 & 1 & 2 & -1 \\ 1 & 2 & -1 & a \\ 2 & b & -1 & c \end{array}\right).
Performing row operations on the augmented matrix in accordance with Algorithm 14.3, we get
\left(\begin{array}{ccc|c} 1 & 0 & 1 & -1 \\ 0 & 1 & -1 & 2 \\ 0 & 2 & -2 & a+1 \\ 0 & b & -3 & c+2 \end{array}\right)
(we cleaned the first column by subtracting multiples of row one), and then
\left(\begin{array}{ccc|c} 1 & 0 & 1 & -1 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & a-3 \\ 0 & 0 & b-3 & c-2b+2 \end{array}\right)
(we cleaned the second column by subtracting multiples of row two).
Instead of solving the system completely, note at this stage that the system of equations is consistent only if a = 3. So put a = 3 and swap rows three and four to get
\left(\begin{array}{ccc|c} 1 & 0 & 1 & -1 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & b-3 & c-2b+2 \\ 0 & 0 & 0 & 0 \end{array}\right).
We hence obtain the following cases:
(1) If a = 3 and b ≠ 3, then the system has a unique solution.
(2) If a = 3, b = 3 and c = 4, then (x, y, z) = (−1 − α, 2 + α, α) is a solution for any α ∈ R. Hence we have infinitely many solutions.
(3) If a ≠ 3, or if a = 3, b = 3 and c ≠ 4, then the system is inconsistent.
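Numerical cross-check. A system Ax = b is consistent exactly when the reduced augmented matrix contains no row (0, . . . , 0, 1), which amounts to rank A = rank(A | b). The helper below (our own illustration) confirms the case analysis:

```python
import numpy as np

def consistent(a, b_, c):
    A = np.array([[3, 1, 2], [1, 2, -1], [1, 0, 1], [2, b_, -1]], dtype=float)
    rhs = np.array([-1, a, -1, c], dtype=float)
    Ab = np.column_stack([A, rhs])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(Ab)

print(consistent(3, 5, 0))    # True:  a = 3, b != 3 (unique solution)
print(consistent(3, 3, 4))    # True:  infinitely many solutions
print(consistent(2, 3, 4))    # False: a != 3
print(consistent(3, 3, 5))    # False: a = 3, b = 3, c != 4
```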
Exercise 46. Solve the following systems of equations using Gaussian elimination:
(a)
x + 2y − 4z = −4
2x + 5y − 9z = −10 ,
3x − 2y + 3z = 11
(b)
x + 2y − 3z = −1
−3x + y − 2z = −7 ,
5x + 3y − 4z = 2
(c)
x + 2y − 3z = 1
2x + 5y − 8z = 4 .
3x + 8y − 13z = 7
Exercise 47. Use the Gaussian elimination algorithm to do the following:
(i) to determine a basis of the subspace of R4 spanned by the following vectors: (1, −2, 5, −3), (2, 3, 1, −4) and (3, 8, −3, −5).
(ii) to decide whether the vectors (2, 5, −3, −2), (−2, −3, 2, −5), (1, 3, −2, 2)
and (−1, −6, 4, 3) form a basis of R4 .
Exercise 48. Determine whether the following sums are direct sums or not:
(i) Span{(3, −2, −5, 4), (−5, 2, 8, −5)} + Span{(−2, 4, 7, −3), (2, −3, −5, 8)},
(ii) Span{(1, −2, 5, −3), (4, −4, 6, −3)} + Span{(3, 4, 0, 1), (−3, 8, −2, 1)}.
15. Invertible Matrices and Systems of Linear Equations
Notation 15.1. Recall Definitions 11.1 and 11.4, where we defined the image, kernel, rank and nullity of a linear transformation. Let A be an m × n matrix. Consider the map fA : Rn → Rm given by fA(x) = Ax. Note that fA is a linear map. We define:
im A := im fA (the image of A), ker A := ker fA (the null space of A),
rk A := dim(im A) (the rank of A), n(A) := dim(ker A) (the nullity of A).
Proposition 15.2. Let A ∈ Mn(R) and x ∈ Mn×1(R). The following are equivalent:
(1) Ax = 0 has x = 0 as unique solution.
(2) The row reduced echelon form of A is In.
(3) For any b ∈ Mn×1(R), the equation system Ax = b has a unique solution.
Proof. (1) ⇒ (2): Bring the augmented matrix (A | 0) into row reduced echelon
form, say (E | 0). So Ax = 0 if and only if Ex = 0. As Ax = 0 has a unique
solution, this implies that Ex = 0 has a unique solution. Hence there are no
parameters in the general solution of Ex = 0. Hence each of the columns of E
contains a leading entry. This implies E = In .
(2) ⇒ (3): Bring (A | b) into row reduced echelon form. Since A has row reduced
echelon form In , it follows that (A | b) has row reduced echelon form (In | c). So
Ax = b if and only if In x = c. But In x = c has unique solution x = c. Hence
Ax = b has a unique solution.
(3) ⇒ (1): By assumption, Ax = b has a unique solution for every b ∈ Mn×1(R). Take b = 0; then Ax = 0 has a unique solution. □
Remark: Note that the conditions in the last proposition are also equivalent to each of the following: the nullity of A is zero; the row rank of A is n; the rank of A is n (use the rank-nullity formula).
Next, recall that a square matrix A ∈ Mn (R) is invertible if and only if there
exists B ∈ Mn (R) such that AB = BA = In . Also recall that if A is invertible
then B is uniquely determined. Write B = A−1 .
Proposition 15.3. Suppose A ∈ Mn(R) satisfies one of the conditions in Proposition 15.2. Then there exists B ∈ Mn(R) such that AB = In. (We say that B is a right inverse of A.)

Proof. Let e1, e2, . . . , en be the columns of In. By Proposition 15.2, there is a unique solution bi of the equation system Ax = ei, for each 1 ≤ i ≤ n. Define B = (b1 | b2 | . . . | bn), the matrix with columns b1, . . . , bn. Then AB = In. □
Proposition 15.4. Suppose A ∈ Mn(R) has a right inverse. Then A is an invertible matrix.
Proof. By assumption, we have a matrix B with AB = In. We need to show that also BA = In.
(1) We show that B has a right inverse C: Assume we have a vector x with Bx = 0. Then 0 = A · 0 = ABx = In x = x. Hence Bx = 0 has the unique solution x = 0. By Proposition 15.3, there exists C ∈ Mn(R) with BC = In.
(2) Note that C = In C = (AB)C = A(BC) = AIn = A. Hence AB = In and BA = In, and so, by definition, A is invertible with A−1 = B. □
Corollary 15.5. Let A ∈ Mn(R). The following are equivalent:
(1) Ax = 0 has x = 0 as a unique solution.
(2) A has a right inverse.
(3) A is invertible.
(4) A has a left inverse.

Proof. This follows from Proposition 15.3 and Proposition 15.4, together with showing that (4) implies (1). Let B be a left inverse of A, so BA = In. Assume that x is a solution of Ax = 0. Then 0 = B · 0 = BAx = In x = x. Hence Ax = 0 has x = 0 as its unique solution. □
Algorithm 15.6 (for calculating the inverse of an n × n matrix A, or for declaring A to have no inverse).
Input: Given matrix A ∈ Mn(R).
Step 1: Form the augmented n × 2n matrix M = (A | In).
Step 2: Bring M into row reduced echelon form (E | F) with E, F ∈ Mn(R).
Output:
(a) If E ≠ In, declare A to be non-invertible.
(b) If E = In, declare A−1 = F.
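Computational aside. A minimal sketch of Algorithm 15.6 (the function name and tolerance are our own choices); it row reduces (A | In) and returns the right block, or None when the left block cannot be brought to In:

```python
import numpy as np

def invert_by_row_reduction(A, tol=1e-12):
    """Row reduce (A | I_n); return A^{-1} if the left block becomes I_n,
    and None otherwise (A is then not invertible)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])             # Step 1: the n x 2n matrix (A | I_n)
    for col in range(n):                      # Step 2: reduce the left block
        nz = np.nonzero(np.abs(M[col:, col]) > tol)[0]
        if nz.size == 0:
            return None                       # output (a): E cannot become I_n
        i = col + nz[0]
        M[[col, i]] = M[[i, col]]             # Type I
        M[col] = M[col] / M[col, col]         # Type II
        for r in range(n):
            if r != col:
                M[r] = M[r] - M[r, col] * M[col]   # Type III
    return M[:, n:]                           # output (b): A^{-1} = F
```

For the matrix A of Example 15.7 below, this sketch reproduces the inverse computed there.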
Example 15.7. Let
A = \begin{pmatrix} 1 & 0 & 2 \\ 2 & -1 & 3 \\ 4 & 1 & 8 \end{pmatrix}.
Then
A^{-1} = \begin{pmatrix} -11 & 2 & 2 \\ -4 & 0 & 1 \\ 6 & -1 & -1 \end{pmatrix},
as
\left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0 \\ 2 & -1 & 3 & 0 & 1 & 0 \\ 4 & 1 & 8 & 0 & 0 & 1 \end{array}\right) \qquad R2 → R2 − 2R1, R3 → R3 − 4R1
\left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & -1 & -1 & -2 & 1 & 0 \\ 0 & 1 & 0 & -4 & 0 & 1 \end{array}\right) \qquad R2 → −R2, R3 → R3 − R2, R3 → −R3
\left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0 \\ 0 & 1 & 1 & 2 & -1 & 0 \\ 0 & 0 & 1 & 6 & -1 & -1 \end{array}\right) \qquad R1 → R1 − 2R3, R2 → R2 − R3
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -11 & 2 & 2 \\ 0 & 1 & 0 & -4 & 0 & 1 \\ 0 & 0 & 1 & 6 & -1 & -1 \end{array}\right).
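A one-line numerical check of this computation:

```python
import numpy as np

A = np.array([[1, 0, 2], [2, -1, 3], [4, 1, 8]], dtype=float)
Ainv = np.array([[-11, 2, 2], [-4, 0, 1], [6, -1, -1]], dtype=float)
assert np.allclose(A @ Ainv, np.eye(3)) and np.allclose(Ainv @ A, np.eye(3))
```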
Exercise 49. Find the inverses of the following matrices using elementary row operations:
A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 1 & 2 & 2 \\ 2 & 1 & 1 & 2 \\ 2 & 2 & 1 & 2 \\ 3 & 3 & 1 & 3 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & -1 & 1 & 2 \\ 0 & 1 & 2 & -1 \\ 3 & 1 & 1 & 1 \\ 3 & 2 & 1 & 0 \end{pmatrix}.
Exercise 50. Let A be a matrix with entries in R. Prove the following statements:
(a) A system of linear equations Ax = 0 with fewer equations than variables always has a non-trivial solution.
(b) A system of linear equations Ax = b with fewer equations than variables
either has no solution or has several different solutions.
(c) A system of linear equations Ax = b where the rank of A equals the
number of equations in the system, always has a solution.
(d) A system of linear equations Ax = b where the rank of A equals the
number of variables in the system, has at most one solution.
(e) A system of linear equations Ax = b where the rank of A equals the
number of variables in the system and equals the number of equations in
the system, has precisely one solution.
[Denote by B the row reduced echelon form of a given matrix A. Then the rank
of A is defined to be the number of non-zero rows of B.]
Exercise 51. The n × n Vandermonde matrix is the matrix A defined by
A = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{pmatrix}
where x1, x2, . . . , xn are distinct real numbers. Show that A is invertible. [Hint: let
x = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix}
be a solution to the simultaneous equations Ax = 0. Show that x1, . . . , xn are all roots of the polynomial a_0 + a_1 x + a_2 x^2 + . . . + a_{n-1} x^{n-1}.]
16. Elementary Matrices
Recall Definition 13.4 of an elementary row operation (ero) on a matrix. If a matrix B is obtained from a matrix A by applying elementary row operations, we call A and B row equivalent. In this section, we study how row equivalent matrices are related.
Definition 16.1. Let e be an elementary row operation. An n × n matrix E =
e(In ) obtained by applying e to the identity matrix In is called an elementary
matrix.
Example 16.2. The elementary matrices for Example 13.7(1) are:

I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \xrightarrow{\,e_1 = (R_2 \to \frac{1}{2} R_2)\,} E_1 := e_1(I_2) = \begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{2} \end{pmatrix}

I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \xrightarrow{\,e_2 = (R_1 \leftrightarrow R_2)\,} E_2 := e_2(I_2) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}

I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \xrightarrow{\,e_3 = (R_2 \to \frac{1}{2} R_2)\,} E_3 := e_3(I_2) = \begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{2} \end{pmatrix}

I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \xrightarrow{\,e_4 = (R_1 \to R_1 - 2R_2)\,} E_4 := e_4(I_2) = \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}.
Note that in Example 13.7 we have Ei Mi = Mi+1 .
Lemma 16.3. Let A ∈ Mm×n(R) and let e be an elementary row operation. Then e(A) = e(Im) · A.

Proof. We sketch a proof for the Type II elementary row operation; we leave it as an exercise to the reader to write down the matrix multiplications formally. The proofs for the other two types of elementary row operations are similar and likewise left to the reader. Let A = (aij) and let e be scalar multiplication of row i by λ ∈ R (a Type II ero). Then e(Im) = diag(1, . . . , 1, λ, 1, . . . , 1) with λ in row and column i. Hence

e(I_m) \cdot A = \begin{pmatrix} 1 & & & & 0 \\ & \ddots & & & \\ & & \lambda & & \\ & & & \ddots & \\ 0 & & & & 1 \end{pmatrix} \cdot \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ \lambda a_{i1} & \lambda a_{i2} & \cdots & \lambda a_{in} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = e(A). □
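The statement of Lemma 16.3 is also easy to test numerically; a sketch for a Type II operation (the sizes m, n, the row index i and the scalar λ below are chosen arbitrarily):

```python
import numpy as np

m, n, lam, i = 4, 3, 2.5, 2              # arbitrary sizes, scalar and row index
A = np.arange(m * n, dtype=float).reshape(m, n)

E = np.eye(m)                            # e(I_m) for the Type II ero R_i -> lam R_i
E[i, i] = lam

eA = A.copy()                            # e(A): scale row i of A directly
eA[i] = lam * eA[i]

assert np.allclose(E @ A, eA)            # Lemma 16.3: e(A) = e(I_m) . A
```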
Lemma 16.4. Any elementary matrix is invertible.

Proof. (1) Note that each elementary row operation is invertible:
(i) Ri ↔ Rj is its own inverse.
(ii) The inverse of e = (Ri → cRi) with c ≠ 0 is f = (Ri → (1/c)Ri).
(iii) The inverse of e = (Ri → Ri + cRj) with i ≠ j is f = (Ri → Ri − cRj).
(2) Let e be an elementary row operation and let f be its inverse elementary row operation. Let E = e(In) and F = f(In) be the corresponding elementary matrices. Applying Lemma 16.3 twice,
F E = F (E In) = f(e(In)) = In and E F = E (F In) = e(f(In)) = In.
Hence F is the inverse matrix of E. □
Proposition 16.5. Matrices A and B are row equivalent if and only if there exists an invertible matrix P with B = P A.

Proof. “⇒”: Since A and B are row equivalent, there are elementary row operations ei (for 1 ≤ i ≤ t) such that
(13) B = e1 e2 · · · et (A).
Let Ei = ei(I) for 1 ≤ i ≤ t, and let P = E1 E2 · · · Et. By Lemma 16.4, each Ei is invertible, hence so is P; and by Equation (13) and Lemma 16.3 we have B = P A.
“⇐”: Suppose A and B are matrices with B = P A where P is invertible. Since P is invertible, it follows by Corollary 15.5 and Proposition 15.2 that the row reduced echelon form of P is the identity matrix In. This means P can be brought to In by elementary row operations, say Es Es−1 · · · E1 P = In with elementary matrices Ei. Hence P = E1^{−1} E2^{−1} · · · Es^{−1}, and by Lemma 16.4 each Ei^{−1} is again an elementary matrix. So B = P A can be obtained from A by applying the elementary row operations corresponding to Es^{−1} first and E1^{−1} last. □
Remark 16.6. Why does Algorithm 15.6 (for inverting a matrix or for declaring it to be non-invertible) work? If A ∈ Mn(R) is invertible, then by Corollary
15.5 and Proposition 15.2 matrix A is row equivalent to In . Hence there exist
elementary matrices Ei for 1 ≤ i ≤ k such that
In = Ek Ek−1 · · · E1 A.
As A is invertible, we can multiply this equation by A−1 from the right to get:
A−1 = Ek Ek−1 · · · E1 In .
The meaning of this last equation is the following: the elementary row operations
used to get the row reduced echelon form of A, if applied to In , give precisely the
inverse of A.
Exercise 52. (a) Write the following matrix C as a product of elementary matrices:
C = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix}.
(b) Given are matrices A and B with
A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \\ 2 & 1 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
Find matrices P and Q such that P AQ = B.
17. Row Rank and Column Rank
Row and column operations. Similarly to the elementary row operations defined in Section 13, we can define elementary column operations:
Definition 17.1. (1) The following operations are called elementary column
operations (ecos) on a matrix:
Type I: Swap column i and column j: Ci ↔ Cj .
Type II: Multiply column i by a non-zero λ ∈ R: Ci → λCi .
Type III: Add to column i a multiple (λ times) of column j, for i ≠ j: Ci → Ci + λCj.
(2) Two matrices A and B are column equivalent if we can pass from A to B by a sequence of elementary column operations.
Definition 17.2. Let A = (aij) be an m × n matrix with entries in R.
(1) We define the row space of A as the vector space spanned by the rows ai = (ai1, ai2, . . . , ain) of A, for 1 ≤ i ≤ m. We consider the row space as a subspace of Rn.
(2) We define the column space of A as the vector space spanned by the columns ai = (a1i, a2i, . . . , ami) of A, for 1 ≤ i ≤ n. We consider the column space as a subspace of Rm.

The proof of the following statement is left as an exercise to the reader.

Lemma 17.3. Let V = ⟨v1, . . . , vi, . . . , vj, . . . , vn⟩ be the vector space spanned by the vectors v1, . . . , vn. Then:
(1) V = ⟨v1, . . . , vj, . . . , vi, . . . , vn⟩ for j > i (swap the vectors vi and vj);
(2) V = ⟨v1, . . . , λvi, . . . , vn⟩ for λ ≠ 0 (multiply the vector vi by a scalar λ);
(3) V = ⟨v1, . . . , vi + λvj, . . . , vn⟩ for any λ ∈ R and i ≠ j (add to the vector vi a multiple of the vector vj, for i ≠ j).
Proposition 17.4. Let A, B be matrices where B is obtained from A through elementary row (column) operations. Then the row (column) space of A equals the row (column) space of B.

Proof. The rows of B are obtained from the rows of A by
- reordering rows,
- scalar multiplication of rows,
- addition of (multiples of) rows.
By Lemma 17.3, all these operations preserve the space spanned by the rows. The proof that the column space is preserved under elementary column operations is similar. □
Remark. Note that the column space of a matrix B is in general not equal to the column space of a matrix A if B is obtained from A by elementary row operations. Likewise, the row space of B is in general not equal to the row space of A if B is obtained from A by elementary column operations.
Lemma 17.5. Let A ∈ Mm×n(R), and let c be an elementary column operation applicable to A. Let r be the corresponding elementary row operation (that is, of the same type, applied to the row with the same number). Then c(A) = (r(AT))T.

Proof. Elementary column operations applied to A have the same effect as the corresponding elementary row operations applied to AT (followed by transposing the result). □
Remark 17.6. We leave it to the reader to formulate a definition analogous to Definition 13.1 for the column reduced echelon form of a matrix. The column reduced echelon form of a matrix A should equal the transpose of the row reduced echelon form of AT.

Recall that by Definition 13.6, the row rank of a matrix A equals the number of non-zero rows in the row reduced echelon form of A.
Definition 17.7. We define the column rank of a matrix A to be equal to the
number of non-zero columns in the column reduced echelon form of A.
Corollary 17.8. Let A be an m × n matrix over R. Then the row rank of A equals the dimension of the row space of A. Moreover, the column rank of A equals the dimension of the column space of A, which in turn equals the rank of A.

Proof. (1) Let B be the row reduced echelon form of the given matrix A. Note that the non-zero row vectors in B are linearly independent. Hence the dimension of the row space of B equals the number of non-zero rows of B. By Proposition 17.4, it follows that the row rank of A equals the dimension of the row space of A.
(2) The proof that the column rank is equal to the dimension of the column space is similar to the argument given in (1).
(3) By definition, the rank of a matrix A equals the dimension of the image of A, see 15.1. Let {e1, . . . , en} be the canonical basis of Rn. Then the image of A equals Span{Ae1, . . . , Aen}. Note that Aei is precisely the ith column of A. Hence the image of A equals the column space of A. This implies that the rank of A equals the column rank of A. □
Row and column rank are equal. We define elementary matrices for elementary column operations, similarly to the elementary matrices defined for elementary row operations (see Section 16). Note that applying an elementary column operation to a matrix A amounts to multiplying A by the corresponding elementary matrix from the right (not from the left). Analogously to Proposition 16.5, we hence have:
Proposition 17.9. Matrices A and B are column equivalent if and only if there
exists an invertible matrix Q with B = A · Q.
Theorem 17.10. Let A ∈ Mm×n(R) with row rank r. Then there exist an invertible matrix P ∈ Mm(R) and an invertible matrix Q ∈ Mn(R) such that
P A Q = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}.

Proof. By Propositions 16.5 and 17.9, it is sufficient to show that A can be brought into this form by a combination of elementary row operations and elementary column operations.

Step 1: Bring A to row reduced echelon form, say E, by a sequence of elementary row operations. Then E has r non-zero rows.

Step 2: Take the r columns containing the leading entries of the non-zero rows and use them to make up the first r columns. This can be done by elementary column operations. We obtain a matrix of the form
\begin{pmatrix} I_r & * \\ 0 & 0 \end{pmatrix}.

Step 3: Use the leading entry of each non-zero row to purge the rest of the first r rows. This means applying elementary column operations. We obtain the matrix
\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}.

To calculate P and Q, keep track of the elementary row operations and elementary column operations applied (see the proof of Proposition 16.5). □
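Computational aside. The proof of Theorem 17.10 is constructive, and the bookkeeping for P and Q can be coded directly. A sketch of our own (the name normal_form is ours; floating point replaces exact arithmetic):

```python
import numpy as np

def normal_form(A, tol=1e-12):
    """Return (P, Q, r) with P @ A @ Q = [[I_r, 0], [0, 0]], where P tracks the
    row operations and Q the column operations of Theorem 17.10."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    P, Q, B = np.eye(m), np.eye(n), A.copy()
    r = 0
    for col in range(n):                          # Step 1: row reduce, tracking P
        nz = np.nonzero(np.abs(B[r:, col]) > tol)[0]
        if nz.size == 0:
            continue
        i = r + nz[0]
        B[[r, i]], P[[r, i]] = B[[i, r]].copy(), P[[i, r]].copy()
        P[r] = P[r] / B[r, col]                   # scale the tracker first,
        B[r] = B[r] / B[r, col]                   # then the working matrix
        for row in range(m):
            if row != r:
                P[row] = P[row] - B[row, col] * P[r]
                B[row] = B[row] - B[row, col] * B[r]
        r += 1
        if r == m:
            break
    for i in range(r):                            # Step 2: pivot columns to front
        c = int(np.argmax(np.abs(B[i]) > tol))    # column of the leading entry
        if c != i:
            B[:, [i, c]] = B[:, [c, i]]
            Q[:, [i, c]] = Q[:, [c, i]]
    for i in range(r):                            # Step 3: purge the first r rows
        for c in range(r, n):
            if abs(B[i, c]) > tol:
                Q[:, c] = Q[:, c] - B[i, c] * Q[:, i]
                B[:, c] = B[:, c] - B[i, c] * B[:, i]
    return P, Q, r
```

Applied to the matrix A of Example 17.12 below, normal_form returns precisely the matrices P and Q computed there.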
Theorem 17.11. Let A ∈ Mm×n(R). Then the row rank of A equals the column rank of A (equals the rank of A).

Proof. Let r be the row rank of A. Then by Theorem 17.10, there exist invertible matrices P, Q with
P A Q = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}.

(i) Note that P AQ is row equivalent to AQ (see Proposition 16.5). The elementary row operations involved leave the last n − r columns zero, so at least n − r columns of AQ are zero. Hence the column reduced echelon form of AQ has at most r non-zero columns, and so the column rank of AQ is at most r. Since A and AQ are column equivalent, the column rank of A equals the column rank of AQ. Hence the column rank of A is at most the row rank of A.
(ii) Apply the result of (i) to AT. Then
row rank of A = column rank of AT ≤ row rank of AT = column rank of A.
By (i) and (ii), the row rank of A equals the column rank of A. □
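Theorem 17.11 can also be observed numerically, for instance on random integer matrices (matrix_rank computes the rank via an SVD):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(4, 6)).astype(float)
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)
```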
Example 17.12. Given
A = \begin{pmatrix} 1 & 2 & 1 \\ -2 & -4 & 1 \end{pmatrix},
find matrices P, Q such that
P A Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},
using the algorithm in the proof of Theorem 17.10.
Step 1: Bring A to row reduced echelon form E and apply the same elementary row operations to I2:

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 1 \\ -2 & -4 & 1 \end{pmatrix} \qquad R2 → R2 + 2R1

\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 3 \end{pmatrix} \qquad R2 → (1/3)R2

\begin{pmatrix} 1 & 0 \\ 2/3 & 1/3 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 1 \end{pmatrix} \qquad R1 → R1 − R2

\begin{pmatrix} 1/3 & -1/3 \\ 2/3 & 1/3 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

Hence for
P = \begin{pmatrix} 1/3 & -1/3 \\ 2/3 & 1/3 \end{pmatrix}
we have P A = E.

Step 2/3: Bring E to column reduced echelon form and apply the same elementary column operations to I3:

\begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad C2 ↔ C3

\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \qquad C3 → C3 − 2C1

\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & -2 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.
Then taking
Q = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},
we have
P A Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.
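A quick numerical check of the computed factorisation:

```python
import numpy as np

P = np.array([[1/3, -1/3], [2/3, 1/3]])
A = np.array([[1, 2, 1], [-2, -4, 1]], dtype=float)
Q = np.array([[1, 0, -2], [0, 0, 1], [0, 1, 0]], dtype=float)
print(P @ A @ Q)    # [[1, 0, 0], [0, 1, 0]]
```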
Exercise 53. Determine the row rank and the column rank of the following matrix:
X = \begin{pmatrix} 1 & 2 & 3 & \cdots & n \\ 2 & 3 & 4 & \cdots & n+1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ n & n+1 & n+2 & \cdots & 2n-1 \end{pmatrix}.
Exercise 54. Matrix U is obtained from matrix A by subtracting row one from row three:
A = \begin{pmatrix} 1 & 3 & 2 \\ 0 & 1 & 1 \\ 1 & 3 & 2 \end{pmatrix}, \quad U = \begin{pmatrix} 1 & 3 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
(i) Find bases for the two column spaces.
(ii) Find bases for the two row spaces.
(iii) Find bases for the two null spaces.