
MATH 2020 — Linear Algebra

Gordon Royle

October 28, 2008

1 Vector Spaces

The first vector space that most students encounter is R2 which is the set
of all pairs (x, y) of real numbers. The vectors in R2 are usually interpreted
either as points in the plane, or as quantities (such as velocity) that have
both magnitude and direction. In Figure 1 we either view the point P as
being located at position (3, 2) or representing a quantity whose magnitude
and direction are represented by the directed line segment from O to P, where O is
the origin.

Figure 1: Vectors in R2, showing the point P = (3, 2).

The vectors in R2 satisfy certain useful properties (see Linear Algebra Notes
pages 44-45) including (among others) the following:

• Vector Addition: Vectors can be added according to the rule

(u1 , u2 ) + (v1 , v2 ) = (u1 + v1 , u2 + v2 ).

• Scalar Multiplication: Vectors can be multiplied by real numbers


according to the rule

α(v1 , v2 ) = (αv1 , αv2 ).

• Zero Vector: There is a zero vector 0 = (0, 0) such that

u + 0 = 0 + u = u

for any u ∈ R2 .

Clearly the set R3 of triples of real numbers also satisfies these properties
and in general we have the n-dimensional space Rn . Although R2 represents
points in the plane and R3 represents points in space, they are really the
“same sort” of structure.

1.1 Abstract Real Vector Spaces

One of the main techniques of mathematics is the abstraction of general


principles and structures from close examination of particular instances, and
their distillation into a collection of axioms. The axioms or rules are meant
to represent the “essential features” that have proven useful in a range of
specific cases, gathered into a single abstract definition that will apply to all
such examples, including any that may be discovered in the future.

Definition 1. A real vector space is a set V of objects called vectors together
with two functions
+:V ×V →V
and
·:R×V →V
called addition and scalar multiplication such that the following conditions
hold:

(i) Associativity: u + (v + w) = (u + v) + w for all u, v, w ∈ V .

(ii) Commutativity: u + v = v + u for all u, v ∈ V .

(iii) Zero vector: There is some vector 0 such that v +0 = v for all v ∈ V .

(iv) Inverses: For each vector v ∈ V there is another vector v′ ∈ V such
that v + v′ = 0 — we usually denote this vector by −v.

(v) Distributivity: α · (u + v) = αu + αv and (α + β) · v = α · v + β · v


for all α, β ∈ R and u, v ∈ V .

(vi) Associativity of ·: α · (β · v) = (αβ) · v for all α, β ∈ R and v ∈ V .

(vii) Identity: 1 · v = v for all v ∈ V .

Although we have given them the “normal” names of + and ·, it is important


to realize that they can be any function at all, and they may in some situa-
tions look very different from the usual addition and multiplication (even if
V is a familiar set).

Example 1. The set V = R2 with the usual addition and scalar multiplica-
tion is a real vector space.
Example 2. The set R2×2 of all 2 × 2 real matrices with vector addition
defined by

[ a1  b1 ]   [ a2  b2 ]   [ a1 + a2   b1 + b2 ]
[ c1  d1 ] + [ c2  d2 ] = [ c1 + c2   d1 + d2 ]

and scalar multiplication by

    [ a  b ]   [ αa  αb ]
α · [ c  d ] = [ αc  αd ]

is a real vector space.


Example 3. The set of 2×2 real symmetric matrices (with the usual addition
and multiplication) is a real vector space.
Example 4. The set of 2 × 2 matrices with determinant 1 and the usual
addition and multiplication is not a real vector space, because the set is not
closed under ·.
Example 5. The set V = {F} with addition defined by

F+F=F

and scalar multiplication defined by

α·F=F

is a real vector space.


Example 6. Let V be the set of all positive real numbers and define

u + v = uv

and
k · v = v^k.
Is this a real vector space?

We need to check all of the conditions in turn.

(i) Vector addition is associative because

u + (v + w) = u(vw) = (uv)w = (u + v) + w

by the normal properties of real numbers.

(ii) Vector addition is commutative because

u + v = uv = vu = v + u.

(iii) The zero vector is 0 = 1 because 1 + v = 1v = v for all v.

(iv) The inverse of v is 1/v because

v + 1/v = v · (1/v) = 1 = 0.

(v) Distributivity holds because

α · (u + v) = (u + v)^α = (uv)^α = u^α v^α = u^α + v^α = α · u + α · v

and similarly (α + β) · v = v^(α+β) = v^α v^β = v^α + v^β = α · v + β · v.

(vi) Associativity holds because

α · (β · v) = α · (v^β) = (v^β)^α = v^(αβ) = (αβ) · v.

(vii) Identity holds because


1 · v = v^1 = v.

Therefore this set does form a real vector space under these operations.
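These verifications can also be spot-checked numerically. The following Python sketch (illustrative only; testing sampled values is of course not a proof) checks all eight axioms for the operations u + v := uv and k · v := v^k:

```python
# Numerical spot-check of Example 6: V is the set of positive reals,
# "addition" is u + v := uv and "scalar multiplication" is k . v := v**k.
import math
import random

def add(u, v):
    return u * v          # vector "addition" is ordinary multiplication

def smul(k, v):
    return v ** k         # scalar "multiplication" is exponentiation

close = lambda x, y: math.isclose(x, y, rel_tol=1e-9)

random.seed(0)
for _ in range(100):
    u, v, w = (random.uniform(0.1, 10.0) for _ in range(3))
    a, b = random.uniform(-3.0, 3.0), random.uniform(-3.0, 3.0)
    assert close(add(u, add(v, w)), add(add(u, v), w))             # (i)  associativity
    assert close(add(u, v), add(v, u))                             # (ii) commutativity
    assert close(add(v, 1.0), v)                                   # (iii) zero vector is 1
    assert close(add(v, 1.0 / v), 1.0)                             # (iv) inverse of v is 1/v
    assert close(smul(a, add(u, v)), add(smul(a, u), smul(a, v)))  # (v)  distributivity
    assert close(smul(a + b, v), add(smul(a, v), smul(b, v)))      # (v)  distributivity
    assert close(smul(a, smul(b, v)), smul(a * b, v))              # (vi) associativity of .
    assert close(smul(1.0, v), v)                                  # (vii) identity
```

Note in particular that `smul(0.0, v)` is `v**0 == 1.0`, agreeing with Theorem 1(i): 0 · v must be the zero vector, which in this space is the number 1.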

1.2 Other fields

Throughout the preceding discussion we assumed that the scalars were real
numbers. However there are many other “number systems” and we can define
a vector space using any of these as the scalars.

The technical term for a “suitable” number system is a field but we will not
need to know the exact definition of a field, just some examples of fields that
we can use. We’re already familiar with a number of common fields

• The field R of real numbers.
• The field C of complex numbers:

C = {a + bi | a, b ∈ R}

where i is a symbol with the property that i2 = −1. We can add


together and multiply complex numbers according to the following rules

(a + bi) + (c + di) = (a + c) + (b + d)i


(a + bi)(c + di) = (ac − bd) + (ad + bc)i

• The field Q of rational numbers which are numbers of the form a/b
where a, b are integers (i.e. whole numbers).

All of these fields have infinitely many scalars, but there are also finite fields.
The simplest is

• The binary field F2 = {0, 1} where addition and multiplication are


given by

+ | 0 1        · | 0 1
0 | 0 1        0 | 0 0
1 | 1 0        1 | 0 1

To a computer scientist these are simply XOR and AND in disguise, while
to a mathematician it is simply arithmetic modulo 2.
• The prime fields Fp = {0, 1, . . . , p − 1} where p is a prime (this is
important!) and addition and multiplication are performed modulo p.

The formal definition of a vector space over a field F is exactly the same as
above, but with every occurrence of R replaced by an arbitrary field. For
example, F2³ is the vector space of all triples of scalars from F2 and thus

F2³ = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}

is a finite vector space containing just eight vectors.
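Because this vector space of triples over F2 is finite, its basic properties can be checked exhaustively. A small Python sketch (illustrative, representing vectors as tuples over {0, 1}):

```python
# Build F_2^3 and check it is closed under coordinatewise addition mod 2
# and under scalar multiplication by the two scalars of F_2.
from itertools import product

def vadd(u, v):
    # addition in F_2^n: coordinatewise addition mod 2 (i.e. XOR)
    return tuple((a + b) % 2 for a, b in zip(u, v))

def smul(c, v):
    # scalars come from F_2 = {0, 1}
    return tuple((c * a) % 2 for a in v)

F2_3 = set(product((0, 1), repeat=3))
assert len(F2_3) == 8              # exactly the eight vectors listed above

for u in F2_3:
    for v in F2_3:
        assert vadd(u, v) in F2_3  # closed under +
    for c in (0, 1):
        assert smul(c, u) in F2_3  # closed under scalar multiplication
    assert vadd(u, u) == (0, 0, 0) # over F_2 every vector is its own inverse
```

The last assertion records a quirk of characteristic 2: since 1 + 1 = 0, we have v + v = 0 for every vector v.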



1.3 Theoretical Consequences

To prove that something is true in all vector spaces, we need to ensure that
our argument uses only the axioms and already-proved consequences. It is
very easy to accidentally assume something is true from the familiar exam-
ples.

Theorem 1. Let V be a vector space, v ∈ V and α ∈ R. Then

(i) 0 · v = 0

(ii) α · 0 = 0

(iii) (−1) · v = −v

(iv) (−α) · v = −(αv) = α(−v)

(v) If αv = 0 then α = 0 or v = 0.

Proof. We prove each of the assertions in turn, using just the axioms or the
earlier results:

Proof of (i): To prove that 0 · v = 0 we start by using Axiom (v) to deduce


that

0 · v + 0 · v = (0 + 0) · v
              = 0 · v

Adding the inverse of 0 · v to both sides we get

0 · v + 0 = 0
    0 · v = 0    (by Axiom (iii))

Proof of (ii): By Axiom (v),

α · v + α · 0 = α · (v + 0)
= α·v

and so adding −(α · v) to both sides and arguing similarly to part (i) we get
α · 0 = 0.

Proof of (iii): For any vector v, Axiom (vii) shows that v = 1 · v and so

v + (−1) · v = 1 · v + (−1) · v
             = (1 + (−1)) · v    (by Axiom (v))
             = 0 · v
             = 0

and so (−1) · v is equal to −v.

Proof of (iv): Similar to (iii).

Proof of (v): Later.

Conventions

From now on, we will adopt the following conventions:

1. We will drop the · for scalar multiplication and just use αu


to denote α · u.

2. We will define an operation − by

v − w = v + (−w).

With these conventions we can now define a linear combination of vectors


{v1 , v2 , . . . , vn } to be any vector of the form

v = α1 v1 + α2 v2 + · · · + αn vn

without any ambiguity.

1.4 Some more interesting vector spaces

We list some further examples of real vector spaces, but omit the proofs.

1. The vector space R∞ of sequences
Consider the set of all infinite sequences of real numbers

R∞ = {(a1 , a2 , . . . , an , . . .) | ai ∈ R}

and define vector addition and scalar multiplication so that if a =


(a1 , a2 , . . .) and b = (b1 , b2 , . . .) then

a + b = (a1 + b1 , a2 + b2 , . . .)

and
αa = (αa1 , αa2 , . . .).

This is a real vector space with

0 = (0, 0, . . .)

and
−a = (−a1 , −a2 , . . .).

2. The vector space R[x] of polynomials


Let R[x] denote the set of all polynomial functions of a variable x.
Therefore some sample vectors in this vector space are

v1 = 1 + 2x − x³
v2 = x¹⁰⁰
v3 = 3

Notice that each vector in this space is an entire polynomial (and not,
for example, the polynomial evaluated at a single point). Two polyno-
mials f and g are equal if and only if they “have the same graph” or
in other words if and only if f (x) = g(x) for every value of x.
The zero vector 0 in this vector space is the polynomial f (x) = 0 that
takes the value 0 everywhere.
The addition of two polynomials is defined simply by adding the coeffi-
cients of the corresponding powers of x, and multiplying a polynomial
by a scalar in a similar fashion.

3. The vector space RR of functions
Let RR denote the set of all functions f : R → R. This includes the
polynomial functions and so

R[x] ⊆ RR

but of course there are many functions that are not polynomials and so

R[x] ⊂ RR .

The set of all functions includes “tame” functions that are given by
simple formulas such as sin x, cos x and can easily be graphed, along
with truly “wild” functions such as
f(x) = { x,   if x ∈ Q,
       { x²,  if x ∉ Q,

which can’t even be drawn.


If f and g are functions, then the sum f + g and scalar multiple αf are
functions that are defined by their effect on a value x as follows:

(f + g)(x) = f (x) + g(x),


(αf )(x) = αf (x).
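The pointwise definitions above translate directly into code. A small Python sketch (illustrative) treating vectors of RR as ordinary functions:

```python
# Functions as vectors: (f + g)(x) = f(x) + g(x) and (alpha f)(x) = alpha f(x).
import math

def fadd(f, g):
    return lambda x: f(x) + g(x)      # the sum function f + g

def fsmul(alpha, f):
    return lambda x: alpha * f(x)     # the scalar multiple alpha f

h = fadd(math.sin, math.cos)          # the function x -> sin x + cos x
k = fsmul(3.0, math.sin)              # the function x -> 3 sin x

assert math.isclose(h(0.5), math.sin(0.5) + math.cos(0.5))
assert math.isclose(k(0.5), 3.0 * math.sin(0.5))

zero = lambda x: 0.0                  # the zero vector of R^R
assert fadd(math.sin, zero)(1.2) == math.sin(1.2)
```

The zero vector here is the constant function 0, matching the zero polynomial described earlier.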

2 Subspaces

Several of the examples in the previous section showed that vector spaces
can contain smaller vector spaces inside them.
Definition 2 (Subspace). Let V be a vector space and suppose that W ⊆ V
is a subset of V . Then W is called a subspace of V if W is itself a vector
space (with the same field, vector addition and scalar multiplication as V ).

Proving that a set of vectors is a subspace is a lot easier than proving that
a set is a vector space from scratch, because most of the axioms hold
automatically in W since they are true in V . For example, we never need to

check that vector addition is commutative in W because we already know
that it is commutative in V .

In fact there are only two things that need to be checked:


Theorem 2. If V is a vector space and W is a non-empty subset of V then
W is a subspace if and only if

1. W is closed under vector addition,

2. W is closed under scalar multiplication.

Proof. Omitted here but given in lecture.


Example 7. Is the set of vectors W = {(x, y, z) | x + y + z = 1} a subspace
of R3 ?

The answer is “No”. The set W is not closed under + because if v = (1, 0, 0)
and w = (0, 1, 0) then v, w ∈ W but v + w = (1, 1, 0) ∉ W.
Example 8. Is the set of even functions a subspace of RR?

A function f is even if
f (−x) = f (x)
for all x ∈ R, and so the question is whether this property still holds if we
add together two even functions. If h = f + g then

h(−x) = f (−x) + g(−x) = f (x) + g(x) = h(x)

and so h is itself an even function. In a similar fashion, h = αf is also even


and so the set of even functions is a subspace.

2.1 Linear Combinations

As noted earlier, in any vector space, an expression of the form

α1 v1 + α2 v2 + · · · + αn vn

is unambiguously defined and is called a linear combination of {v1 , v2 , . . . , vn }.

Definition 3. If V is a vector space over a field F , and v1 , v2 , . . ., vn ∈ V
then the linear span (usually just called span) of {v1 , v2 , . . . , vn } is the set of
all linear combinations:

span({v1 , v2 , . . . , vn }) = {α1 v1 + α2 v2 + · · · + αn vn | αi ∈ F for all i}.

Theorem 3. The span of any set of vectors is a subspace of V .

Example 9. In R3 let

S1 = {(1, 0, 0)}
S2 = {(1, 1, 0), (1, 0, 0)}
S3 = {(1, 2, 3), (1, 2, 0), (2, 0, 0)}
S4 = {(0, 0, 0), (1, 0, 0), (−2, 0, 0)}

Then span(S1 ) is the x-axis, span(S2 ) is the xy-plane, span(S3 ) is the whole
of R3 and span(S4 ) is also the x-axis.

Self Study Problems
1. Which of the following sets are subspaces of R3 ?

(a) All vectors of the form (a, 0, 0)


(b) All vectors of the form (a, a², −a²)
(c) All vectors of the form (a, b, −a − b)
(d) All vectors of the form (a, b, c) where c = a + b − 1
(e) The solutions (x1 , x2 , x3 ) to the equation

[  1  2  3 ] [ x1 ]   [ 0 ]
[ −3  2  4 ] [ x2 ] = [ 0 ]
[  3  1  0 ] [ x3 ]   [ 0 ]

(f) The solutions to the equation

[  1  2  3 ] [ x1 ]   [ 1 ]
[ −3  2  4 ] [ x2 ] = [ 0 ]
[  3  1  0 ] [ x3 ]   [ 1 ]

(g) The line passing through the points (1, 1, 2) and (1, 2, 1)
(h) The line passing through the points (1, 0, 3) and (−2, 0, −6)
(i) The point (1, 2, 3)
(j) The point (0, 0, 0)
(k) The points at distance at most 1 from the origin
(l) All vectors of the form (x, sin² x, cos² x)
(m) All vectors of the form (0, sin² x, cos² x)
(n) All vectors of the form (1, − sin² x, − cos² x)
(o) All vectors of the form (α, −α sin² x, −α cos² x)

2. Which of the following sets are subspaces of R3×3 ?

(a) Symmetric matrices.


(b) Diagonal matrices.
(c) Upper triangular matrices.
(d) Singular matrices.
(e) Invertible matrices.
(f) Non-invertible matrices.
(g) Matrices A such that Ax = 0 where x = (1, 1, 1)T .
(h) Matrices A such that Ax = x for all x ∈ R3 .
(i) Matrices with an odd number of entries equal to zero.
(j) Matrices where the sum of all the entries is 0.
(k) Matrices where the sum of all the entries is 1.
(l) “Row-and-column magic” squares i.e. matrices such that every
row and column sum to the same value (not including diagonals).
(m) “Fully magic” squares — include the diagonal and anti-diagonal.

3. Which of the following sets are subspaces of R[x]?

(a) Polynomials of degree exactly 3.


(b) Polynomials of degree at most 3.
(c) Constant polynomials.
(d) Even polynomials.
(e) Polynomials f such that f (2) = 1
(f) Polynomials f such that f (2) = 0
(g) Polynomials with zero constant term.
(h) Polynomials that satisfy f (1) = f (−1).
(i) Polynomials of even degree.

(j) Polynomials f such that ∫_{−π}^{π} f (x) dx = 0
(k) Polynomials that satisfy f (1)f (−1) = 0

4. List all the subspaces of F2³.

5. Which of the following are subspaces of RR ?

(a) Continuous functions


(b) Differentiable functions
(c) Piecewise smooth functions
(d) Functions of the form α sin x + β cos x
(e) Functions f such that ∫_{−∞}^{∞} f (x) dx = 0

3 Bases and Dimension

In this section we consider the idea of a basis of a vector space or subspace.


The main concept here is that we wish to specify a vector space or (more
usually) a subspace in an “efficient” manner.

3.1 Linear Independence

An easy way to specify a subspace W is to give a spanning set of vectors for
W . For example, in R3 here are three ways to specify a subspace:

W1 = span({(1, 0, 1), (0, 1, 1)})
W2 = span({(1, 0, 1), (0, 1, 1), (1, 1, 2)})
W3 = span({(1, 1, 2), (1, −1, 0)})

Each of these descriptions gives us a complete specification of the subspace


— a vector v is in W1 if and only if it can be expressed as a linear combination

α1 (1, 0, 1) + α2 (0, 1, 1).

For any particular vector we can always decide whether it is in W1 simply by


solving a system of linear equations.

Example Are the vectors v1 = (2, −3, −1) and v2 = (2, 3, 1) in W1 ? To
check v1 we need to solve the vector equation

α1 (1, 0, 1) + α2 (0, 1, 1) = (2, −3, −1).

Looking at each coordinate in turn, we get a system of linear equations

1α1 + 0α2 = 2
0α1 + 1α2 = −3
1α1 + 1α2 = −1

and this particular system of equations is easy to solve, giving us α1 = 2 and
α2 = −3. Therefore we conclude that v1 is a linear combination of the two
vectors {(1, 0, 1), (0, 1, 1)} and so it is in W1 .

When we try to repeat this with v2 we get a different system of equations


1α1 + 0α2 = 2
0α1 + 1α2 = 3
1α1 + 1α2 = 1
and when we attempt to solve these we discover that they have no solution.
Therefore v2 is not in W1 .
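For this particular W1 the membership test can be mechanized: the first two equations force α1 and α2, and the third equation then either holds or fails. A Python sketch (the helper `in_W1` is ours, introduced for illustration):

```python
# Deciding membership in W1 = span({(1,0,1), (0,1,1)}), following the worked
# example: the first two coordinates force alpha1 and alpha2, and the third
# coordinate must then agree.

def in_W1(v):
    a1, a2 = v[0], v[1]      # from the equations a1 = v[0] and a2 = v[1]
    return a1 + a2 == v[2]   # third equation: a1 + a2 must equal v[2]

assert in_W1((2, -3, -1))    # v1 from the example: alpha1 = 2, alpha2 = -3
assert not in_W1((2, 3, 1))  # v2 from the example: the system is inconsistent
```

The same pattern (solve, then check consistency) works for any spanning set, though in general it requires full Gaussian elimination rather than reading the answer off the first two coordinates.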

The subspace W3 is actually the same as W1 because any vector that can be
expressed as a linear combination of {(1, 0, 1), (0, 1, 1)} can also be expressed
as a linear combination of {(1, 1, 2), (1, −1, 0)}. [Question: How can you
prove this?]

The subspace W2 is also the same as W1 , but we have expressed it less


efficiently. The set of linear combinations that can be reached by using all
three vectors is no greater than the set that can be reached by using just the
first two, because (1, 1, 2) is already a linear combination of the other two
vectors. Therefore the set of vectors {(1, 0, 1), (0, 1, 1), (1, 1, 2)} has some
redundancy in it.

We need a definition to capture this concept of redundancy.


Definition 4 (Independence). Let S = {v1 , v2 , . . . , vn } be a set of vectors in
a vector space. Then S is called linearly independent if the only solution to
the vector equation
α1 v1 + α2 v2 + · · · + αn vn = 0 (1)

is the trivial solution
α1 = α2 = · · · = αn = 0.

If there is a non-trivial solution to (1) then S is called linearly dependent.

Linearly independent sets of vectors have many nice properties:

Theorem 4. If S = {v1 , v2 , . . . , vn } is a linearly independent set of vectors


then

1. Any vector v ∈ span(S) can be expressed uniquely as a linear combina-


tion of the vectors in S.

2. Any subset of S is also linearly independent.

3. If v ∉ span(S) then S ∪ {v} is also linearly independent.

4. S does not contain 0.

One of the most important properties of linearly independent sets is a con-


sequence of this unassuming lemma.

Lemma 1. Suppose that S and T are linearly independent sets of size m and
n respectively where m ≤ n and that span(S) = span(T ). If |S ∩ T | = k < m
then there is a linearly independent set S′ of size m such that span(S′) =
span(T ) and |S′ ∩ T | = k + 1.

Proof. Suppose that

S = {v1 , v2 , . . . , vk , uk+1 , . . . , um }
T = {v1 , v2 , . . . , vk , wk+1 , . . . , wm , . . . , wn }

As the two sets have the same span, we can certainly find an expression for
wk+1 as a linear combination of vectors from S.

wk+1 = α1 v1 + α2 v2 + · · · + αk vk + βk+1 uk+1 + · · · + βm um (2)

As T is linearly independent, wk+1 cannot be a linear combination of
v1 , . . . , vk alone, so at least one of the βs must be non-zero; suppose that
βj ≠ 0 (where k + 1 ≤ j ≤ m).

Now let S′ = (S − {uj }) ∪ {wk+1 } (i.e. we replace uj with wk+1 ). Then
span(S′) = span(S) because anything that can be obtained as a linear
combination of the vectors in S can be expressed as a linear combination
of the vectors in S′ by using (2) to get an expression for uj in terms of the
vectors in S′. A similar argument shows that S′ is independent.

3.2 Bases

A basis for a vector space V is a set of vectors S such that

• S is linearly independent

• V = span(S)

The easiest way to specify a vector space or subspace is to give a basis for it,
because a basis provides a complete and economical specification of exactly
which vectors are in that vector space or subspace. A vector space is said to
be finite dimensional if it has a finite basis.
Theorem 5. All bases for a finite-dimensional vector space V have the same
size, which is known as the dimension of V .

Proof. Suppose that B and C are both bases for V and that |B| ≤ |C|. If
B is not contained in C, then by Lemma 1, we can find a sequence B = B1 ,
B2 , . . . of bases of V that have increasingly large intersection with C until
eventually Bi ⊆ C. As a basis cannot properly contain another basis, it
follows that Bi = C and hence that B and C have the same size.

Many of the vector spaces that we have seen have a particular basis that is
used so often that it is called the standard basis:

• The standard basis for R3 is

{(1, 0, 0), (0, 1, 0), (0, 0, 1)}

and so its dimension is 3.

• The standard basis for R2×2 is
       
1 0 0 1 0 0 0 0
, , ,
0 0 0 0 1 0 0 1

and so its dimension is 4.

• The standard basis for R[x] is

{1, x, x², x³, x⁴, . . .}

and so it is not finite dimensional.

• The standard basis for R3 [x] is

{1, x, x², x³}

and so it has dimension 4.

There are two main ways to find a basis:

• Start with an independent set of vectors S and (if necessary) add more
vectors to S while keeping it independent until it becomes a spanning
set.

• Start with a spanning set of vectors S and (if necessary) remove vectors
from S while keeping it spanning until it becomes independent.

The first way is called extending an independent set to a basis; the second is
called reducing a spanning set to a basis.

The row space of a matrix is the vector space spanned by its rows. Often
however the rows will not be linearly independent and so while they form
a spanning set for the row space, they do not form a basis. Gaussian elim-
ination is a procedure for replacing the rows of a matrix with a linearly
independent set of rows that span the same space and thus it can be viewed
as a very convenient way to find a basis for a vector space.
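This use of Gaussian elimination can be sketched in Python with exact rational arithmetic (a minimal illustrative implementation; `row_space_basis` is our own helper name):

```python
# Extract a basis for the row space of a matrix by Gaussian elimination:
# the nonzero rows of the row-echelon form are linearly independent and
# span the same space as the original rows.
from fractions import Fraction

def row_space_basis(rows):
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0])
    basis, col = [], 0
    while m and col < ncols:
        # find a row with a nonzero entry in the current column
        pivot = next((r for r in m if r[col] != 0), None)
        if pivot is None:
            col += 1
            continue
        m.remove(pivot)
        pivot = [x / pivot[col] for x in pivot]   # scale the pivot entry to 1
        # eliminate the current column from all remaining rows
        m = [[x - r[col] * p for x, p in zip(r, pivot)] for r in m]
        basis.append(pivot)
        col += 1
    return basis

# W2 from earlier: (1,1,2) is redundant, so only two basis rows survive.
b = row_space_basis([(1, 0, 1), (0, 1, 1), (1, 1, 2)])
assert len(b) == 2
```

Using `Fraction` keeps the arithmetic exact, so there is no floating-point ambiguity about whether a pivot is "really" zero.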

3.3 Coordinates

If B = {v1 , v2 , . . . , vn } is an ordered basis for V then any vector v ∈ V can be


expressed as a unique linear combination v = α1 v1 + α2 v2 + · · · + αn vn and
the vector
[v]B = (α1 , α2 , . . . , αn )
is called the coordinate vector of v with respect to the basis B.

This then allows us to specify vectors in any vector space simply as a list of
scalars, even if the actual vectors are more complicated, such as polynomials
or matrices.

Example Consider the basis B for R2×2 consisting of the four matrices

[ 1 0 ]   [ 1  0 ]   [ 0 1 ]   [  0 1 ]
[ 0 1 ] , [ 0 −1 ] , [ 1 0 ] , [ −1 0 ]

What is the coordinate vector with respect to B of the vector

A = [ 1 2 ]
    [ 1 1 ]  ?

We need to solve the equation

[ 1 2 ]      [ 1 0 ]      [ 1  0 ]      [ 0 1 ]      [  0 1 ]
[ 1 1 ] = α1 [ 0 1 ] + α2 [ 0 −1 ] + α3 [ 1 0 ] + α4 [ −1 0 ]
Looking at the main diagonal we get
α1 + α2 = 1
α1 − α2 = 1
which has the solution α1 = 1 and α2 = 0, and looking at the other two
corners we get
α3 + α4 = 2
α3 − α4 = 1
and so α3 = 3/2 and α4 = 1/2. Thus
[A]B = (1, 0, 3/2, 1/2).
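The computed coordinates can be verified by reassembling the linear combination; a Python sketch using exact fractions:

```python
# Check of the coordinate-vector computation: reassembling
# 1*M1 + 0*M2 + (3/2)*M3 + (1/2)*M4 should reproduce the matrix A.
from fractions import Fraction as F

M1 = ((1, 0), (0, 1))
M2 = ((1, 0), (0, -1))
M3 = ((0, 1), (1, 0))
M4 = ((0, 1), (-1, 0))
coords = (F(1), F(0), F(3, 2), F(1, 2))     # [A]_B from the example

A = tuple(
    tuple(sum(c * M[i][j] for c, M in zip(coords, (M1, M2, M3, M4)))
          for j in range(2))
    for i in range(2)
)
assert A == ((1, 2), (1, 1))
```

This is exactly the "uniqueness" half of Theorem 4 in action: since B is a basis, these are the only coefficients that work.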

4 Linear Transformations

In this section we consider maps between vector spaces that preserve the
linear structure, and consider various subspaces associated with those maps.

4.1 Linear Maps

Definition 5. If V and W are vector spaces over the same field F , then a
function T : V → W is called a linear transformation if the following two
conditions are satisfied for all u, v ∈ V and α ∈ F :

T (u + v) = T (u) + T (v)
T (αu) = αT (u)

Notation: Normally we just write T u rather than T (u) when the meaning
is clear.

Example Let T : R3 → R2 be defined by

T (x, y, z) = (x + y, y + z).

This is a linear transformation because if we put u = (u1 , u2 , u3 ) and v =


(v1 , v2 , v3 ) then u + v = (u1 + v1 , u2 + v2 , u3 + v3 ) and so

T (u + v) = (u1 + v1 + u2 + v2 , u2 + v2 + u3 + v3 )

and

T u + T v = (u1 + u2 , u2 + u3 ) + (v1 + v2 , v2 + v3 )
          = (u1 + u2 + v1 + v2 , u2 + u3 + v2 + v3 )
          = T (u + v)

as required.

Example On the other hand, the mapping T : R2 → R2 given by

T (x, y) = (xy, x + y)

is not linear, because
T (1, 1) = (1, 2)
T (2, 2) = (4, 4)
and so T (2, 2) ≠ 2T (1, 1).
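A numerical check of the same failure, as a small Python sketch:

```python
# T(x, y) = (xy, x + y) does not preserve scalar multiplication,
# mirroring the argument above.

def T(v):
    x, y = v
    return (x * y, x + y)

def smul(a, v):
    return tuple(a * c for c in v)

u = (1, 1)
assert T(u) == (1, 2)
assert T(smul(2, u)) == (4, 4)          # T(2u)
assert smul(2, T(u)) == (2, 4)          # 2 T(u)
assert T(smul(2, u)) != smul(2, T(u))   # so T is not linear
```

A single counterexample like this is enough: linearity is a universal statement, so failing at one pair (α, u) already disproves it.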

Many common operations that we perform on vectors, matrices and polyno-


mials are actually linear transformations:

1. Projection of vectors onto a subspace, such as T : R3 → R3 given by

T (x, y, z) = (x, y, 0).

2. Taking the trace of a matrix — T : R3×3 → R1 defined by

   [ a11 a12 a13 ]
T  [ a21 a22 a23 ]  = (a11 + a22 + a33 )
   [ a31 a32 a33 ]

3. Differentiation of polynomials — T : R[x] → R[x] given by


T f = f ′.

4. Rotation of vectors (e.g. in computer graphics) — Tθ : R2 → R2 given by

Tθ [ x ]   [ cos θ  − sin θ ] [ x ]
   [ y ] = [ sin θ    cos θ ] [ y ]

4.2 The rank-nullity theorem

Let T : V → W be a linear transformation. Then we define the following


sets of vectors, called the kernel and range of T
ker T = {v ∈ V | T v = 0}
range T = {w ∈ W | w = T v for some v ∈ V }

Of course these are not just arbitrary sets of vectors, but they are actually
subspaces.

Theorem 6. If T : V → W is a linear transformation then ker T is a
subspace of V and range T is a subspace of W .

Proof. We prove that ker T is a subspace, as the proof for the range is similar.
So suppose that v1 , v2 ∈ ker T . Then we have to check whether v1 + v2 is in
the kernel, and also whether αv1 is in the kernel.

T (v1 + v2 ) = T v1 + T v2 (by linearity)


=0+0 (by definition of kernel)
= 0.

Similarly

T (αv1 ) = αT v1 (by linearity)


= α0 (by definition of kernel)
= 0.

We have special names for the dimensions of these subspaces.

Definition 6. If T : V → W is a linear transformation then the rank of T


is the dimension of the range and the nullity of T is the dimension of the
kernel.

Example: Consider the projection T : R3 → R3 given by

T (x, y, z) = (x, y, 0).

Then the kernel of T is the set of vectors {(0, 0, a) | a ∈ R} or (speaking
geometrically) the z-axis. The range of T is the whole of the xy-plane.
Therefore the rank of T is 2 and the nullity of T is 1.

Theorem 7 (The rank-nullity theorem). If T : V → W is a linear transfor-


mation between finite-dimensional vector spaces V and W then

rank(T ) + nullity(T ) = dim(V ).

Proof. Suppose that dim(V ) = n and that the nullity of T is k. Then let v1 ,
v2 , . . ., vk be a basis for the kernel of T and extend it to a basis for V .

B = {v1 , v2 , . . . , vk , vk+1 , . . . , vn }

Then we claim that

C = {T vk+1 , T vk+2 , . . . , T vn }

is a basis for the range of T and so it has dimension n − k.

To prove the claim we need to show that every vector in the range is a lin-
ear combination of the vectors in C and that the vectors in C are linearly
independent. It is fairly clear that every vector in the range of T is a linear
combination of the vectors in C and so we just show that they are indepen-
dent. So suppose that

αk+1 T vk+1 + . . . + αn T vn = 0 (3)

Then by the linearity of T we have

T (αk+1 vk+1 + . . . + αn vn ) = 0

which implies that αk+1 vk+1 +· · ·+αn vn ∈ ker T and so is a linear combination
of the vectors in B. But if

α1 v1 + · · · + αk vk = αk+1 vk+1 + · · · + αn vn

then because B is a basis for V it follows that

α1 = α2 = · · · = αk = αk+1 = · · · = αn = 0

and (3) is the trivial linear combination.

The rank-nullity theorem is often useful for determining the dimension of the
kernel or range when one of them is awkward to compute directly for some
reason.

Example Let T : R2 [x] → R3 be given by

T f = (f (−1), f (0), f (1))

What is the kernel of T ?

It is easy to see that the range of T is the whole of R3 because T (1) = (1, 1, 1)
and T (x) = (−1, 0, 1) and T (x²) = (1, 0, 1) and these three vectors are
linearly independent. Therefore the rank is 3 and so the nullity is 0 and
hence the only vector in the kernel of T is the zero vector 0.
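Independence of the three image vectors can be confirmed with a determinant; a Python sketch (`det3` is our own helper):

```python
# T(1) = (1,1,1), T(x) = (-1,0,1), T(x^2) = (1,0,1) are linearly
# independent exactly when the 3x3 matrix with these rows has nonzero
# determinant, so the rank of T is 3.

def det3(m):
    # cofactor expansion along the first row
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

rows = ((1, 1, 1), (-1, 0, 1), (1, 0, 1))
assert det3(rows) != 0
```

The determinant here works out to 2, so the three rows do span all of R3, as claimed.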

4.3 The matrix of a linear transformation

It is easy to see that multiplication by a matrix is a linear transformation.


Thus if we let A be an m × n matrix (i.e. it has m rows and n columns) then
the map
T : Rn → Rm
given by
T x = Ax
(where x is viewed as a column-vector rather than a row-vector) is a linear
transformation.

In fact, in a very strong sense, every linear transformation is essentially equiv-


alent to multiplication by a matrix.
Definition 7. If B = {v1 , v2 , . . . , vn } and C are ordered bases for V and W
respectively and T : V → W is a linear transformation, then the matrix of T
with respect to B and C is the matrix
[T ]CB = [ [T v1 ]C  [T v2 ]C  · · ·  [T vn ]C ]

where the jth column is the vector [T vj ]C written as a column vector.


Theorem 8. With notation as above, for any vector v ∈ V we have

[T v]C = [T ]CB [v]B

thus showing that any linear transformation can be expressed as multiplication


by a matrix.

Proof. If v ∈ V then v = α1 v1 + · · · + αn vn and so its coordinate vector is

[v]B = (α1 , α2 , . . . , αn ),

written as a column vector.

Multiplying this vector by the matrix of the linear transformation we get

[T ]CB [v]B = α1 [T v1 ]C + α2 [T v2 ]C + · · · + αn [T vn ]C

which is
[T (α1 v1 + α2 v2 + · · · + αn vn )]C
as required.

Note: The key point of this result is that multiplying by the matrix of
the linear transformation has the effect of “taking the B-coordinate vector of
v and returning the C-coordinate vector of T v”. In other words, the matrix
multiplication both applies the linear transformation and expresses the result
in C-coordinates.

5 Change of Basis

Let V be a vector space and suppose that B and C are bases for V . Then we
can consider the identity linear transformation I : V → V where

Iv = v.

In other words, I maps each vector to itself!

The matrix of this linear transformation with respect to the bases B and C
is defined as usual by

[I]CB = [ [v1 ]C  [v2 ]C  . . .  [vn ]C ]

where B = {v1 , v2 , . . . , vn }.

We recall that multiplying by the matrix of a linear transformation has two


effects on the B-coordinate vector of v

• It applies the linear transformation to v

• It expresses the result in C-coordinates

When the transformation is the identity then the only effect is to translate B-
coordinates to C-coordinates and in this case the matrix is called the transition
matrix between the bases B and C.

Example Let V = R3 and consider the two bases

B = {(1, 1, 1), (1, 1, 0), (1, 0, 0)}

and
C = {(1, −1, 0), (1, 0, −1), (1, 0, 1)}

Express each of the vectors in B as a linear combination of those in C to get


the coordinate vectors.

(1, 1, 1) = (−1)(1, −1, 0) + (1/2)(1, 0, −1) + (3/2)(1, 0, 1)

so [(1, 1, 1)]C = (−1, 1/2, 3/2) is the first column of the transition matrix.
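The coefficients in this expansion can be verified directly; a Python sketch with exact fractions (assuming the coefficients −1, 1/2 and 3/2 computed above):

```python
# Check that (1,1,1) = (-1)(1,-1,0) + (1/2)(1,0,-1) + (3/2)(1,0,1).
from fractions import Fraction as F

def comb(coeffs, vectors):
    # form the linear combination coordinatewise
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors))
                 for i in range(3))

C = ((1, -1, 0), (1, 0, -1), (1, 0, 1))
coeffs = (F(-1), F(1, 2), F(3, 2))
assert comb(coeffs, C) == (1, 1, 1)
```

Repeating this for the other two vectors of B would fill in the remaining columns of the transition matrix.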

6 Eigenvalues, Eigenvectors and Eigenspaces

Suppose that A is a real n × n matrix. Then a non-zero vector v ∈ Rn is
called an eigenvector of A with eigenvalue λ if

Av = λv

In other words the vector is simply multiplied by λ.

Example: If

A = [ 1  1 ]
    [ 0  3 ]

then v = (1, 2) (as a column vector) is an eigenvector for A with eigenvalue 3
because

[ 1  1 ] [ 1 ]   [ 3 ]
[ 0  3 ] [ 2 ] = [ 6 ] = 3v.

If v is an eigenvector with eigenvalue λ then so is every non-zero scalar
multiple of v.


Theorem 9. If A is a real n × n matrix, then the set of all eigenvectors of
A with eigenvalue λ (together with the zero vector) is a subspace of Rn called
the eigenspace corresponding to λ.

Proof. Suppose that v, w are both eigenvectors with eigenvalue λ. Then


A(v + w) = Av + Aw = λv + λw = λ(v + w)
and so v + w is an eigenvector with eigenvalue λ.

6.1 The characteristic polynomial

We can determine all the possible eigenvalues for a matrix as follows. First
we note that if
Av = λv
then
(λI − A)v = 0
or in other words v is in the null space of the matrix λI − A.

Therefore λI − A is not an invertible matrix and so its determinant is zero.


Hence we define the characteristic polynomial of A to be the polynomial
ϕ(λ) = |λI − A|.
Then the only possible eigenvalues of A are the zeros of this characteristic
polynomial.

Example Let

A = [ 0  0  −2 ]
    [ 1  2   1 ]
    [ 1  0   3 ]

Then

       | λ    0    2  |
ϕ(λ) = | −1  λ−2  −1  | = (λ − 1)(λ − 2)²
       | −1   0  λ−3  |

Therefore the only possible eigenvalues for A are λ = 1 and λ = 2.

To find the corresponding eigenspaces we need to solve the two systems of
linear equations

[ 0  0  −2 ] [ x ]   [ x ]
[ 1  2   1 ] [ y ] = [ y ]
[ 1  0   3 ] [ z ]   [ z ]

and

[ 0  0  −2 ] [ x ]   [ 2x ]
[ 1  2   1 ] [ y ] = [ 2y ]
[ 1  0   3 ] [ z ]   [ 2z ]

The first system of linear equations is equivalent to

[ −1  0  −2 ] [ x ]   [ 0 ]
[  1  1   1 ] [ y ] = [ 0 ]
[  1  0   2 ] [ z ]   [ 0 ]

which after reduction to row-echelon form leaves us with

[ 1  0   2 ] [ x ]   [ 0 ]
[ 0  1  −1 ] [ y ] = [ 0 ]
[ 0  0   0 ] [ z ]   [ 0 ]

and so we have the solution space

{(−2z, z, z) | z ∈ R}

which is a one-dimensional eigenspace spanned by {(−2, 1, 1)}.

The second system is equivalent to

[ −2  0  −2 ] [ x ]   [ 0 ]
[  1  0   1 ] [ y ] = [ 0 ]
[  1  0   1 ] [ z ]   [ 0 ]

which after reduction to row-echelon form leaves us with

[ 1  0  1 ] [ x ]   [ 0 ]
[ 0  0  0 ] [ y ] = [ 0 ]
[ 0  0  0 ] [ z ]   [ 0 ]

and so we have y, z being free variables and x = −z. Hence the solution
space is
{(−z, y, z) | y, z ∈ R}
which is a 2-dimensional eigenspace spanned by {(−1, 0, 1), (0, 1, 0)}.
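The eigenvectors found above can be checked by direct multiplication; a Python sketch:

```python
# Verify the eigenspace computations: A v = v for the lambda = 1
# eigenvector and A v = 2v for the two lambda = 2 eigenvectors.

A = ((0, 0, -2), (1, 2, 1), (1, 0, 3))

def matvec(m, v):
    return tuple(sum(r[j] * v[j] for j in range(3)) for r in m)

def smul(a, v):
    return tuple(a * c for c in v)

assert matvec(A, (-2, 1, 1)) == (-2, 1, 1)           # eigenvalue 1
assert matvec(A, (-1, 0, 1)) == smul(2, (-1, 0, 1))  # eigenvalue 2
assert matvec(A, (0, 1, 0)) == smul(2, (0, 1, 0))    # eigenvalue 2
```

Together these three eigenvectors account for dimensions 1 + 2 = 3, so they form a basis of R3 consisting of eigenvectors of A.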

7 Markov Chains

Consider some sort of system that evolves over time in a probabilistic fashion.
For example, suppose that out of Perth’s 1 million adult inhabitants there
are initially 20% Dockers supporters, 40% Eagles supporters and 40% not in-
terested in AFL. We can represent these proportions in a vector representing
the state of the system at time 0:
 
s0 = [ 0.2 ]
     [ 0.4 ]
     [ 0.4 ]

After each year, some people change their habits as follows:

• Of the Dockers supporters at the end of any year, 50% remain Dockers
fans, 30% switch to the Eagles and 20% give up in disgust and lose
interest in football.

• Of the Eagles supporters 20% change to the Dockers while 80% remain
with the Eagles.

• Of those not interested, 30% become Dockers fans, 30% become Eagles
fans and 40% remain uninterested.

The proportions/probabilities are expressed in a transition matrix where each
column contains non-negative real numbers that sum to 1.

       [ 0.5  0.2  0.3 ]
   T = [ 0.3  0.8  0.3 ]
       [ 0.2  0    0.4 ]

Definition 8. A stochastic matrix is a square matrix where each column


contains non-negative real numbers that sum to 1.

The state of the system after 1 year (now working with numbers of people
rather than proportions) is then

        [ 0.5  0.2  0.3 ] [ 200000 ]   [ 300000 ]
   s1 = [ 0.3  0.8  0.3 ] [ 400000 ] = [ 500000 ]
        [ 0.2  0    0.4 ] [ 400000 ]   [ 200000 ]

After another year we have

               [ 310000 ]
   s2 = T s1 = [ 550000 ]
               [ 140000 ]

and then the sequence of states continues

        [ 307000 ]        [ 303900 ]        [ 302030 ]        [ 301031 ]
   s3 = [ 575000 ]   s4 = [ 587500 ]   s5 = [ 593750 ]   s6 = [ 596875 ] .
        [ 118000 ]        [ 108600 ]        [ 104220 ]        [ 102094 ]

The fluctuations appear to be settling down and the system approaching a
“steady state” s that must satisfy the equation

   T s = s.

Therefore the steady state of this Markov chain is an eigenvector with eigen-
value 1. Solving the system of linear equations we discover that the steady
state vector is

       [ 300000 ]
   s = [ 600000 ]
       [ 100000 ]

and that the system appears to be approaching this steady state.
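The convergence can be illustrated by simply iterating the transition matrix. A short NumPy sketch (not part of the notes):

```python
import numpy as np

# Transition matrix and initial state from the supporter example above.
T = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.8, 0.3],
              [0.2, 0.0, 0.4]])
s = np.array([200000.0, 400000.0, 400000.0])   # s0, in numbers of people

# Iterate s_{k+1} = T s_k; the fluctuations die out and the state
# settles down to the steady state.
for _ in range(100):
    s = T @ s

print(np.round(s))   # -> [300000. 600000. 100000.]
```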

What are the other eigenvalues and eigenvectors of the transition matrix T ?
It is rather tedious to calculate the characteristic polynomial but it turns out
that the eigenvalues of T are {1/5, 1/2, 1} with eigenvectors v1 = (−1, 0, 1),
v2 = (1, −3, 2), v3 = (3, 6, 1) respectively. These eigenvectors form a basis for
R3 and so the initial state vector is some linear combination of those vectors

s0 = α1 v1 + α2 v2 + α3 v3

Therefore the state at time k is given by

   sk = T^k s0 = α1 (1/5)^k v1 + α2 (1/2)^k v2 + α3 v3 .

As k increases the terms (1/5)^k and (1/2)^k become vanishingly small and all
that remains is the term α3 v3 . Therefore any initial state vector will tend
towards a multiple of v3 provided the initial vector is not in span({v1 , v2 }).
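The claimed eigenvalue/eigenvector pairs of T are easy to check numerically. A verification sketch (not part of the notes):

```python
import numpy as np

T = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.8, 0.3],
              [0.2, 0.0, 0.4]])

# The eigenvalues 1/5, 1/2, 1 with their claimed eigenvectors.
pairs = [(0.2, np.array([-1.0, 0.0, 1.0])),
         (0.5, np.array([1.0, -3.0, 2.0])),
         (1.0, np.array([3.0, 6.0, 1.0]))]

for lam, v in pairs:
    assert np.allclose(T @ v, lam * v)   # T v = lambda v

# The steady state is the multiple of v3 = (3, 6, 1) whose entries sum
# to the population of 1 000 000, namely (300000, 600000, 100000).
steady = 100000 * np.array([3.0, 6.0, 1.0])
assert np.allclose(T @ steady, steady)
```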

Under certain mild conditions, a Markov chain can be guaranteed to converge
to a steady state vector.

Theorem 10. If a Markov chain is described by a stochastic transition matrix
T such that some power T^k of T has strictly positive entries, then regardless
of the initial state the chain converges to a unique steady state.

7.1 The $100 billion eigenvector

The most famous Markov chain of all, and certainly the most lucrative, is the
one underlying the original PageRank algorithm of Google.

This algorithm views the entire web as a giant Markov process where users
move from page to page depending on the links on each page. In particular,
the entry Tij models the probability that a user will surf from page j to page
i, which depends on whether page j links to page i and how many other links
there are from page j to other pages. Finally, there is always a small chance
that the user will jump to a random page rather than follow a link.

The steady state vector of this giant Markov chain represents the overall
“popularity” of each web page and so the pages are ranked according to the

values in the steady state vector. As T is a matrix with around 4 billion rows
and columns, this is a massive computation!
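As a toy illustration of the idea (a hypothetical four-page "web", not Google's actual implementation; the damping value 0.85 is an assumption), one can build the transition matrix from a link structure and find the steady state by repeated multiplication:

```python
import numpy as np

# Hypothetical toy web: links[j] lists the pages that page j links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = 4
damping = 0.85   # assumed chance of following a link vs. jumping randomly

# Column j spreads page j's probability evenly over its out-links,
# mixed with a small uniform "random jump" component, so every
# column is non-negative and sums to 1 (a stochastic matrix).
T = np.full((n, n), (1 - damping) / n)
for j, outs in links.items():
    for i in outs:
        T[i, j] += damping / len(outs)

# Power iteration towards the steady state vector.
s = np.full(n, 1.0 / n)
for _ in range(100):
    s = T @ s

ranking = np.argsort(-s)   # pages ordered from most to least "popular"
print(np.round(s, 3), ranking)
```

Page 3, which no other page links to, ends up with the smallest steady-state probability; the real computation is the same power iteration run at web scale.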

8 Inner Products

An inner product on a vector space V is a function

   ⟨·, ·⟩ : V × V → R

that satisfies the following conditions for all vectors u, v, w ∈ V and all
scalars α:

1. ⟨u, v⟩ = ⟨v, u⟩

2. ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩

3. ⟨αu, v⟩ = α⟨u, v⟩ = ⟨u, αv⟩

4. ⟨u, u⟩ ≥ 0 and ⟨u, u⟩ = 0 ⇒ u = 0

These properties of an inner product are modelled after the familiar dot
product in Rn given by

u · v = u1 v1 + u2 v2 + · · · + un vn .

There are a number of other examples of inner products for different vector
spaces.

• The usual dot product in Rn (as above).

• If we pick n + 1 distinct real numbers x0 , x1 , . . ., xn then the function

   ⟨f, g⟩ = f(x0)g(x0) + f(x1)g(x1) + · · · + f(xn)g(xn)

is an inner product on the vector space Rn [x] of polynomials of degree at
most n.

• The vector space C[0, 2π], which is the set of continuous real-valued
  functions defined on the interval 0 ≤ x ≤ 2π, has an inner product

     ⟨f, g⟩ = ∫₀^{2π} f(x)g(x) dx.

• The function
     ⟨A, B⟩ = tr(AB^T)
  is an inner product on the space of square matrices R^{n×n}.
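The matrix example is easy to experiment with. Here is a small sketch (not from the notes) that spot-checks the four axioms for ⟨A, B⟩ = tr(AB^T) on random matrices:

```python
import numpy as np

def inner(A, B):
    # <A, B> = tr(A B^T); this equals the sum of the entrywise products,
    # which is why <A, A> is a sum of squares and hence non-negative.
    return float(np.trace(A @ B.T))

rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((3, 3, 3))   # three random 3x3 matrices
alpha = 1.7

assert np.isclose(inner(A, B), inner(B, A))                    # symmetry
assert np.isclose(inner(A, B + C), inner(A, B) + inner(A, C))  # additivity
assert np.isclose(inner(alpha * A, B), alpha * inner(A, B))    # homogeneity
assert inner(A, A) >= 0                                        # positivity
```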

A vector space V together with an inner product is called an inner product
space.

Definition 9. In an inner product space a set of non-zero vectors {v1 , . . . , vk }
is called orthogonal if
   ⟨vi , vj ⟩ = 0
for all i ≠ j, and the set of vectors is called orthonormal if it is orthogonal
and in addition
   ⟨vi , vi ⟩ = 1
for all i.

8.1 Orthogonal Projection

Let W be a subspace of an inner product space V. The next lemma shows
that any vector in V can be expressed as the sum of two vectors, one in W
and one in the orthogonal complement of W.

Lemma 2. Let W be a subspace of an inner product space V. Then any
vector v ∈ V can be expressed in the form

   v = w + w'                                             (4)

where w ∈ W and w' ∈ W^⊥.

Proof. Let W have an orthonormal basis {w1 , w2 , . . . , wk } and define

   w = ⟨v, w1 ⟩w1 + ⟨v, w2 ⟩w2 + · · · + ⟨v, wk ⟩wk

and w' = v − w. Then it is obvious that w ∈ W and that v = w + w'. To
show that w' ∈ W^⊥ we simply show that it is orthogonal to each wi where
1 ≤ i ≤ k. Since the basis is orthonormal, ⟨w, wi ⟩ = ⟨v, wi ⟩, and so

   ⟨w', wi ⟩ = ⟨v − w, wi ⟩ = ⟨v, wi ⟩ − ⟨w, wi ⟩ = 0.

The vector w in (4) is called the projection of v onto W and denoted projW (v).
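The formula from the proof of Lemma 2 translates directly into code. A small sketch (assuming an orthonormal basis of W is given; not part of the notes):

```python
import numpy as np

def proj(v, basis):
    # proj_W(v) = <v,w1>w1 + ... + <v,wk>wk for an orthonormal basis of W.
    return sum(np.dot(v, w) * w for w in basis)

# W = the xy-plane in R^3, with orthonormal basis {(1,0,0), (0,1,0)}.
basis = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
v = np.array([3.0, 2.0, 5.0])

w = proj(v, basis)    # the component of v in W
w_perp = v - w        # the component of v in the orthogonal complement

print(w, w_perp)      # -> [3. 2. 0.] [0. 0. 5.]

# w' is orthogonal to every basis vector of W, as Lemma 2 promises.
assert all(np.isclose(np.dot(w_perp, b), 0.0) for b in basis)
```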

8.2 Symmetric Matrices

One important class of linear transformations is the class represented by
symmetric matrices. The major properties of symmetric matrices are
summarised in the next theorem.

Theorem 11. Suppose that A is an n × n symmetric matrix. Then

• The matrix A has n real eigenvalues.

• If λ is an eigenvalue of A with multiplicity mλ then the eigenspace Eλ
  has dimension mλ .

• Eigenvectors belonging to different eigenspaces of A are orthogonal.

Proof. For now we just consider the third statement. Suppose that v1 has
eigenvalue λ1 and that v2 has eigenvalue λ2 , where λ1 ≠ λ2 . Since A is
symmetric and v1^T Av2 is a scalar, transposing it shows that

   v1^T Av2 = v2^T Av1 .
As v2 is an eigenvector for A, the left-hand side is

   v1^T λ2 v2 = λ2 v1^T v2 = λ2 (v1 · v2 )

and as v1 is an eigenvector for A, the right-hand side is

   v2^T λ1 v1 = λ1 v2^T v1 = λ1 (v1 · v2 ).

Therefore
   λ1 (v1 · v2 ) = λ2 (v1 · v2 )
and as λ1 ≠ λ2 it follows that v1 · v2 = 0.

Definition 10. An invertible matrix P is called orthogonal if

   P^{−1} = P^T.

Corollary 1. A matrix is orthogonally diagonalizable if and only if it is a
symmetric matrix.
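Theorem 11 and Corollary 1 can be seen in action with NumPy's `eigh` routine for symmetric matrices. A verification sketch (the matrix below is an illustrative choice, not from the notes):

```python
import numpy as np

# A symmetric matrix (its eigenvalues turn out to be 1, 2 and 4).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is designed for symmetric matrices: it returns real eigenvalues
# and a matrix P whose columns are orthonormal eigenvectors.
eigenvalues, P = np.linalg.eigh(A)

# P is orthogonal: P^{-1} = P^T.
assert np.allclose(P.T @ P, np.eye(3))

# A is orthogonally diagonalized: P^T A P is diagonal.
assert np.allclose(P.T @ A @ P, np.diag(eigenvalues))

# Eigenvectors for distinct eigenvalues are orthogonal (third part of Theorem 11).
assert np.isclose(np.dot(P[:, 0], P[:, 2]), 0.0)
```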
