
MATH 115 Lecture Notes

University of Waterloo
December 1, 2020
Week 1: September 8 – September 11
Lecture Page Topic Textbook
Lecture 1 2 Complex Numbers - Standard Form 9.1
Lecture 2 7 Complex Conjugate, Modulus, Geometry 9.1
Lecture 3 14 Polar Form, Powers of Complex Numbers 9.1
Lecture 4 21 Complex nth Roots, the Complex Exponential 9.1
Due: Assignment 0 by 8:30am on Friday, September 11

Week 2: September 14 – September 18


Lecture Page Topic Textbook
Lecture 5 27 Complex Polynomials, Introduction to Proofs, Roundoff Error –
Lecture 6 32 Vector Algebra 1.1, 1.4
Lecture 7 41 Norm and Dot Product 1.3, 1.5
Lecture 8 48 Complex Vectors, The Cross Product in R3 1.3, 9.3 [1]
Due: Assignment 1 by 8:30am on Friday, September 18

Week 3: September 21 – September 25


Lecture Page Topic Textbook
Lecture 9 54 The Vector Equation of a Line 1.1
Lecture 10 62 The Vector Equation of a Plane, The Scalar Equation of a Plane in R3 1.1, 1.3
Lecture 11 70 Projections, Distances from Points to Lines and Planes 1.5
Lecture 12 76 Volumes of Parallelepipeds in R3, Introduction to Set Theory 1.5, –
Due: Assignment 2 by 8:30am on Friday, September 25

[1] Only the material regarding complex vectors, conjugates, inner products and norms. The material about vector spaces and the Gram Schmidt Procedure can be ignored.

Week 4: September 28 – October 2
Lecture Page Topic Textbook
Lecture 13 82 Spanning Sets 1.2, 1.4
Lecture 14 90 Linear Dependence and Linear Independence 1.2, 1.4
Lecture 15 96 Bases, Subspaces of Rn 1.4
Lecture 16 100 Bases of Subspaces 1.4
Due: Assignment 3 by 8:30am on Friday, October 2

Week 5: October 5 – October 9


Lecture Page Topic Textbook
Lecture 17 105 k-Flats and Hyperplanes, Orthogonal Sets and Bases, Orthonormal Sets and Bases 7.1
Lecture 18 110 Systems of Linear Equations 2.1
Lecture 19 116 Solving Systems of Linear Equations 2.1, 2.2, 9.2 [2]
Lecture 20 123 Spanning, Rank 2.3 [3], 2.2
Due: Term Test 1 by 8:30am on Friday, October 9

Reading Week: October 12 – October 16

Week 6: October 19 – October 23


Lecture Page Topic Textbook
Lecture 21 130 Homogeneous Systems of Linear Equations 2.2
Lecture 22 136 More on Systems of Equations, Dimension 2.3
Lecture 23 142 Applications: Chemical Reactions, Linear Models 2.4
Lecture 24 147 Applications: Network Flow, Electrical Networks 2.4
Due: Assignment 4 by 8:30am on Friday, October 23

[2] Complex Numbers in Electrical Circuit Equations is omitted.
[3] Only Spanning Problems are covered for now.

Week 7: October 26 – October 30
Lecture Page Topic Textbook
Lecture 25 157 Matrix Algebra 3.1
Lecture 26 162 The Matrix–Vector Product 3.1
Lecture 27 167 The Fundamental Subspaces Associated with a Matrix, Matrix Multiplication 3.1, 3.4
Lecture 28 173 Complex Matrices, Application: Directed Graphs –
Due: Assignment 5 by 8:30am on Friday, October 30

Week 8: November 2 – November 6


Lecture Page Topic Textbook
Lecture 29 182 Application: Markov Chains 6.3 [4]
Lecture 30 188 Matrix Inverses 3.5 [5]
Lecture 31 193 Invertible Matrix Theorem, Matrix Transformations, Linear Transformations 3.5, 3.2
Lecture 32 200 Linear Transformations, Projections, Reflections 3.2
Due: Assignment 6 by 8:30am on Friday, November 6

Week 9: November 9 – November 13


Lecture Page Topic Textbook
Lecture 33 206 Rotations, Compressions and Stretches, Contractions and Dilations, Shears, Operations on Linear Transformations 3.2
Lecture 34 214 Inverses of Linear Transformations, Application: Linear Transformations 3.5
Lecture 35 221 Kernel and Range, One-to-One Linear Transformations 3.4, 4.7 [6]
Lecture 36 228 Onto Linear Transformations, One-to-One Correspondence, Determinants 4.7 [7], 5.1
Due: Term Test 2 by 8:30am on Friday, November 13

[4] The material here is a bit advanced - understanding the lecture notes will be sufficient.
[5] Omit Linear Mappings for now.
[6] The lecture notes are better here - the textbook material is too abstract.
[7] The lecture notes are better here - the textbook material is too advanced.

Week 10: November 16 – November 20
Lecture Page Topic Textbook
Lecture 37 233 Cofactors, Adjugates, Inverses, Elementary Row and Column Operations 5.1, 5.2, 5.3 [8]
Lecture 38 241 Properties of Determinants 5.2
Lecture 39 247 Application: Polynomial Interpolation, Determinants and Area 5.4
Lecture 40 253 Determinants and Volume, Eigenvalues and Eigenvectors, Characteristic Polynomials 5.4, 6.1 [9]
Due: Assignment 7 by 8:30am on Friday, November 20

Week 11: November 23 – November 27


Lecture Page Topic Textbook
Lecture 41 256 Eigenspaces, Algebraic and Geometric Multiplicities 6.1
Lecture 42 260 Diagonalization 6.2
Lecture 43 264 Powers of a Matrix 6.3
Lecture 44 270 Vector Spaces, Examples, The Vector Space of m × n Matrices 4.2, 4.3
Due: Assignment 8 by 8:30am on Friday, November 27

Week 12: November 30 – December 4


Lecture Page Topic Textbook
Lecture 45 275 The Vector Space of Polynomials 4.1, 4.3
Lecture 46 278 Abstract Vector Spaces 4.2
Due: Assignment 9 by 8:30am on Friday, December 4

Final Exam Period: December 9 – December 23


Term Test 3 will be held on Friday, December 18

[8] Omit Cramer’s Rule.
[9] Omit The Power Method of Determining Eigenvalues.

Lecture 1

Complex Numbers – Standard Form


Recall the number systems you know:

Natural Numbers: N = {1, 2, 3, . . .}


Integers: Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}
Rational Numbers: Q = { a/b : a, b ∈ Z, b ≠ 0 }
Real Numbers: R, the set of all rational and irrational numbers

Note that every natural number is an integer, every integer is a rational number (with
denominator equal to 1) and that every rational number is a real number. Consider the
following five equations:

x+3=5 (1)
x+4=3 (2)
2x = 1 (3)
x2 = 2 (4)
x2 = −2 (5)

Equation (1) has solution x = 2, and thus can be solved using natural numbers. Equation (2)
does not have a solution in the natural numbers, but it does have a solution in the integers,
namely x = −1. Equation (3) does not have a solution in the integers, but it does have a
rational solution of x = 1/2. Equation (4) does not have a rational solution, but it does have a
real solution: x = √2. Finally, since the square of any real number is greater than or equal
to zero, Equation (5) does not have a real solution. In order to solve this last equation, we
will need a “larger” set of numbers.

We introduce a bit of notation here. When we write x ∈ R, we mean that the variable x is
a real number. As another example, by p, q ∈ Z, we mean that both p and q are integers.
By x ∈/ N, we mean that x is not a natural number.

Definition 1.1. A complex number in standard form is an expression of the form x + yj


where x, y ∈ R and j satisfies j 2 = −1. The set of all complex numbers is denoted by

C = {x + yj | x, y ∈ R}.

Note that mathematicians (and most other humans including the authors of the text) use
i rather than j, however engineers use j since i is often used in the modelling of electric
networks.

Example 1.2.
• 3 = 3 + 0j ∈ C

• 4j = 0 + 4j ∈ C

• 3 + 4j ∈ C

• sin(π/7) + π^π j ∈ C
In fact, every x ∈ R can be expressed as x = x + 0j ∈ C, so every real number is a complex
number. However, not every complex number is real, for example, 3 + 4j ∉ R.

We introduce a little bit more notation here. We just mentioned that every real number is a
complex number. We denote this by R ⊆ C and say that R is a subset of C. We also showed
that not every complex number is a real number, which we denote by C ⊄ R and say that
C is not a subset of R. From the definitions of natural numbers, integers, rational numbers,
real numbers and complex numbers, we have

N⊆Z⊆Q⊆R⊆C

Definition 1.3. Let z = x + yj ∈ C with x, y ∈ R. We call x the real part of z and y the
imaginary part of z:

x = Re(z) (sometimes written as R(z))


y = Im(z) (sometimes written as I(z))

If y = 0, then z = x is purely real (we normally just say “real”). If x = 0, then z = yj is


purely imaginary.
Example 1.4.
• Re(3 − 4j) = 3

• Im(3 − 4j) = −4
It is important to note that Im(3 − 4j) ≠ −4j. By definition, for any z ∈ C we have
Re(z), Im(z) ∈ R, that is, both the real and imaginary parts of a complex number are real
numbers.

Having defined complex numbers, we now look at how the basic algebraic operations of
addition, subtraction, multiplication and division are defined.
Definition 1.5. Two complex numbers z = x + yj and w = u + vj with x, y, u, v ∈ R are
equal if and only if x = u and y = v, that is, if and only if Re(z) = Re(w) and Im(z) = Im(w).
In words, two complex numbers are equal if they have the same real parts and the same
imaginary parts.

Definition 1.6. Let x + yj and u + vj be two complex numbers in standard form. We define
addition, subtraction and multiplication as

(x + yj) + (u + vj) = (x + u) + (y + v)j


(x + yj) − (u + vj) = (x − u) + (y − v)j
(x + yj)(u + vj) = (xu − yv) + (xv + yu)j

To add two complex numbers, we simply add the real parts and add the imaginary parts.
Subtraction is done similarly. With our definition of multiplication, we can verify that
j 2 = −1:

j 2 = (j)(j) = (0 + 1j)(0 + 1j) = (0(0) − 1(1)) + (0(1) + 1(0))j = −1 + 0j = −1

There is no need to memorize the formula for multiplication of complex numbers. Using the
fact that j 2 = −1, we can simply do a binomial expansion:

(x + yj)(u + vj) = xu + xvj + yuj + yvj 2


= xu + xvj + yuj − yv
= (xu − yv) + (xv + yu)j

Example 1.7. Let z = 3 − 2j and w = −2 + j. Compute z + w, z − w and zw. Express


your answers in standard form.

Solution. We have

z + w = (3 − 2j) + (−2 + j) = 1 − j
z − w = (3 − 2j) − (−2 + j) = 5 − 3j
zw = (3 − 2j)(−2 + j) = −6 + 3j + 4j − 2j 2 = −6 + 3j + 4j + 2 = −4 + 7j
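As an aside (not part of MATH 115), arithmetic like that in Example 1.7 can be checked numerically with Python, whose built-in complex type happens to use the same j notation as these notes. A minimal sketch:

    # Numeric check of Example 1.7 with Python's built-in complex numbers.
    z = 3 - 2j
    w = -2 + 1j
    print(z + w)   # (1-1j)   -> 1 - j
    print(z - w)   # (5-3j)   -> 5 - 3j
    print(z * w)   # (-4+7j)  -> -4 + 7j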

We see that addition, subtraction and multiplication are similar to that of real numbers, just
a little more complicated. We now look at division of complex numbers.

Example 1.8. Let z = x + yj be a nonzero complex number. Show that


1/z = x/(x² + y²) − (y/(x² + y²)) j.

Solution. Since z is nonzero, x and y cannot both be zero. It follows that x − yj ≠ 0. We have

1/z = 1/(x + yj) = (1/(x + yj)) ((x − yj)/(x − yj)) = (x − yj)/(x² − xyj + yxj − y²j²) = (x − yj)/(x² + y²) = x/(x² + y²) − (y/(x² + y²)) j.

Since x, y are not both zero, x² + y² > 0, which guarantees that 1/z is defined.

Notice that when we divide by a nonzero complex number x + yj, we multiply both the
numerator and denominator by x − yj. This is because (x + yj)(x − yj) = x2 + y 2 ∈ R,
which allows us to put the quotient into standard form. We can now divide any complex
number by any nonzero complex number.
Example 1.9. With z = 3 − 2j and w = −2 + j, compute z/w.
Solution. We have

z/w = (3 − 2j)/(−2 + j) = ((3 − 2j)/(−2 + j)) ((−2 − j)/(−2 − j)) = (−6 − 3j + 4j + 2j²)/(4 + 2j − 2j − j²) = (−8 + j)/(4 + 1) = −8/5 + (1/5) j.
Example 1.10. Express

((1 − 2j) − (3 + 4j))/(5 − 6j)

in standard form.

Solution. We carry out our operations as we would with real numbers.

((1 − 2j) − (3 + 4j))/(5 − 6j) = (−2 − 6j)/(5 − 6j)
= ((−2 − 6j)/(5 − 6j)) ((5 + 6j)/(5 + 6j))
= (−10 − 12j − 30j − 36j²)/(25 + 36)
= (26 − 42j)/61
= 26/61 − (42/61) j
Note that for z ∈ C, we have z¹ = z, and for any integer k ≥ 2, z^k = z(z^(k−1)). For z ≠ 0
(here, 0 = 0 + 0j), z⁰ = 1. As usual, 0⁰ is undefined. For any z ∈ C with z ≠ 0, we have
z^(−k) = 1/z^k for any positive integer k. In particular, z^(−1) = 1/z for z ≠ 0.

We now summarize the rules of arithmetic in C. Notice that we’ve used some of these rules
already.
Theorem 1.11 (Properties of Arithmetic in C). Let u, v, z ∈ C with z = x + yj. Then
(1) (u + v) + z = u + (v + z) addition is associative

(2) u + v = v + u addition is commutative

(3) z + 0 = z 0 is the additive identity

(4) z + (−z) = 0 −z is the additive inverse of z

(5) (uv)z = u(vz) multiplication is associative

(6) uv = vu multiplication is commutative

(7) z(1) = z 1 is the multiplicative identity

(8) for z 6= 0, z −1 z = 1 z −1 is the multiplicative inverse of z 6= 0

(9) z(u + v) = zu + zv distributive law

These rules show that complex numbers behave much like real numbers with respect to
addition, subtraction, multiplication and division.

Lecture 2
Example 2.1. Find all z ∈ C satisfying z 2 = −7 + 24j.
Solution. Let z = a + bj with a, b ∈ R. Then

z 2 = (a + bj)2 = a2 − b2 + 2abj = −7 + 24j.

Equating real and imaginary parts gives

a2 − b2 = −7 (6)
2ab = 24 (7)
From (7), we have that a, b ≠ 0, so b = 24/(2a) = 12/a. Substituting b = 12/a into (6) gives

a² − (12/a)² = −7
a² − 144/a² = −7
a⁴ + 7a² − 144 = 0
(a² + 16)(a² − 9) = 0
(a² + 16)(a + 3)(a − 3) = 0.

Since a ∈ R, a² + 16 > 0, so we conclude that a + 3 = 0 or a − 3 = 0, which gives a = 3 or
a = −3. Since b = 12/a, b = 12/3 = 4 or b = 12/(−3) = −4. Thus z = 3 + 4j or z = −3 − 4j.
Example 2.2. Find all z ∈ C satisfying z 2 = −2.
Solution. Let z = a + bj with a, b ∈ R. Then

z 2 = (a + bj)2 = a2 − b2 + 2abj = −2.

Equating real and imaginary parts gives

a2 − b2 = −2 (8)
2ab = 0 (9)
From (9) we see that a = 0 or b = 0. If a = 0 then (8) reduces to −b² = −2, that is b² = 2.
Hence b = √2 or b = −√2. In this case, z = √2 j or z = −√2 j. On the other hand, if b = 0
then a² = −2, which has no solutions since a ∈ R implies that a² ≥ 0. Thus z = √2 j and
z = −√2 j are the only solutions.

Conjugate and Modulus


We introduce the conjugate and modulus of a complex number and state their basic prop-
erties.

Definition 2.3. The complex conjugate of z = x + yj with x, y ∈ R is z = x − yj.

Example 2.4.

• The conjugate of 1 + 3j is 1 − 3j

• The conjugate of √2 j is −√2 j

• The conjugate of −4 is −4

Note that for a complex number z ≠ 0,

z⁻¹ = 1/z = z̄/(z z̄).
Theorem 2.5 (Properties of Conjugates). Let z, w ∈ C with z = x + yj where x, y ∈ R. Then

(1) the conjugate of z̄ is z

(2) z ∈ R ⇐⇒ z̄ = z

(3) z is purely imaginary ⇐⇒ z̄ = −z

(4) the conjugate of z + w is z̄ + w̄

(5) the conjugate of zw is z̄ w̄

(6) the conjugate of z^k is (z̄)^k for k ∈ Z, k ≥ 0, (k ≠ 0 if z = 0)

(7) the conjugate of z/w is z̄/w̄, provided w ≠ 0

(8) z + z̄ = 2x = 2Re(z)

(9) z − z̄ = 2yj = 2jIm(z)

(10) z z̄ = x² + y²

Note that “⇐⇒” means “if and only if”.


Proof. Let z, w ∈ C with z = x + yj and w = u + vj where x, y, u, v ∈ R.

(1) z = x + yj = x − yj = x + (−y)j = x − (−y)j = x + yj = z.

(2) z = z ⇐⇒ x − yj = x + yj ⇐⇒ 2yj = 0 ⇐⇒ y = 0 ⇐⇒ Im(z) = 0 ⇐⇒ z ∈ R.

(3) Let z be purely imaginary. Then x = 0 so z = yj. It follows that z = yj = −yj = −z


so z = −z. Now assume that z = −z. Then x − yj = −(x + yj) = −x − yj so 2x = 0
from which it follows that x = 0 so z is purely imaginary.

(4) z + w = (x + yj) + (u + vj) = (x + u) + (y + v)j = (x + u) − (y + v)j
= (x − yj) + (u − vj) = z + w.

(5) We have

zw = (x + yj)(u + vj) = (xu − yv) + (xv + yu)j = (xu − yv) − (xv + yu)j

and
 
z w = x + yj u + vj = (x − yj)(u − vj) = (xu − yv) + (−xv − yu)j
= (xu − yv) − (xv + yu)j

from which it follows that zw = z w.

(6) This requires a proof technique called induction which we do not cover in MATH 115.

(7) For w 6= 0,
     
1 1 u v u v
= = 2 2
− 2 2
j = 2 2
+ 2 j
w u + vj u +v u +v u +v u + v2

and
1 1 u v
= = 2 + j
w u − vj u + v 2 u2 + v 2
so  
1 1
= .
w w
Now, using (5) we obtain
z    
1 1 1 z
= z =z =z =
w w w w w

(8) z + z = (x + yj) + (x − yj) = 2x = 2Re(z).

(9) z − z = (x + yj) − (x − yj) = 2yj = 2jIm(z).

(10) zz = (x + yj)(x − yj) = x2 + xyj − xyj − y 2 j 2 = x2 + y 2 .

Definition 2.6. The modulus of z = x + yj with x, y ∈ R is the nonnegative real number |z| = √(x² + y²).

Example 2.7.

• |1 + j| = √(1² + 1²) = √2

• |3j| = √(0² + 3²) = √9 = 3

• |−4| = √((−4)² + 0²) = √16 = 4
For x ∈ R, we know that since R ⊆ C, x ∈ C. Thus the modulus of x is given by

|x| = |x + 0j| = √(x² + 0²) = √(x²) = |x|,

where the left-hand |x| is the modulus and the right-hand |x| is the absolute value.

We see that for real numbers x, the modulus of x is the absolute value of x. Thus the
modulus is the extension of the absolute value to the complex numbers which is why we
have chosen the same notation for the modulus as the absolute value. We will see shortly
that the modulus of a complex number can be interpreted as the size or magnitude of that
complex number, just like the absolute value of a real number can be interpreted as the size
of magnitude of that real number.

Given two complex numbers, it is natural to ask if one number is greater than the other, for
example, is 1+j < 3j? This is easy to decide for real numbers, but it is actually undefined for
complex numbers. [10] This is where the modulus can help us: we compare complex numbers
by comparing their moduli, since the modulus of a complex number is real. For instance,
|1 + j| = √2 < 3 = |3j| (but we don’t say 1 + j < 3j).
Theorem 2.8 (Properties of Modulus). Let z, w ∈ C. Then
(1) |z| = 0 ⇐⇒ z = 0
(2) |z̄| = |z|
(3) z z̄ = |z|²
(4) |zw| = |z||w|
(5) |z/w| = |z|/|w| provided w ≠ 0
(6) |z + w| ≤ |z| + |w| which is known as the Triangle Inequality
Proof. Let z, w ∈ C.

(1) Assume first that z = 0. pThen |z| = 02 + 02 = 0. Assume now that z = x + yj is
such that |z| = 0. Then x2 + y 2 = 0 and so x2 + y 2 = 0. It follows that x = y = 0
and so z = 0.
[10] We say that R is ordered by ≤, that is, for x, y ∈ R, x ≤ y or y ≤ x. As defined, ≤ doesn’t make sense for complex numbers. One can, however, redefine what ≤ means for complex numbers so that the complex numbers are ordered by ≤, but this is beyond the scope of MATH 115.

p p
(2) |z| = |x − yj| = |x + (−y)j| = x2 + (−y)2 = x2 + y 2 = |z|.

(3) Using Theorem 2.5(10), we have zz = x2 + y 2 = |z|2 .

(4) We have

|zw|2 = (zw)zw by (3)


= zwz w by Theorem 2.5(5)
= zzww
= |z|2 |w|2 by (3)
= (|z||w|)2

Thus |zw|2 = (|z||w|)2 . Since the modulus of a complex number is never negative, we
can take square roots of both sides to obtain |zw| = |z||w|.

(5) Let w 6= 0. Using (4) and Theorem 2.5(7), we have

z 2 zz z z zz |z|2
= = = =
w w w ww ww |w|2

Since the modulus of a complex number is never negative, we can take square roots of
z |z|
both sides to obtain = .
w |w|
(6) Left as an exercise.

Note that for a complex number z ≠ 0, the modulus and the complex conjugate give us a
nice way to write z⁻¹:

z⁻¹ = 1/z = z̄/(z z̄) = z̄/|z|².

Geometry
Visually, we interpret the set of real numbers as a line. Given that R ⊆ C and that there
are complex numbers that are not real, the set of complex numbers should be “bigger” than
a line. In fact, the set of complex numbers is a plane, much like the xy–plane as shown in
Figure 1. We “identify” the complex number x + yj ∈ C with the point (x, y) ∈ R2 . In
this sense, the complex plane is simply a “relabelling” of the xy–plane. The x–axis in the
xy–plane corresponds to the real axis in the complex plane which contains the real numbers,
The y–axis of the xy–plane corresponds to the imaginary axis in the complex plane which
contains the purely imaginary numbers. Note we will often label the real axis as “Re” and
the imaginary axis as “Im”.

(a) The xy-plane, known as R2 . (b) The complex plane C

Figure 1: The xy-plane and the complex plane.

We also have a geometric interpretation of the complex conjugate and the modulus as well
which is shown in Figure 2.

Figure 2: Visually interpreting the complex conjugate and the modulus of a complex number.

For z ∈ C, we see that that z is a reflection of z in the real axis and that |z| is the distance
between 0 and z. Also note that any complex number w lying on the green circle in Figure 2
satisfies |w| = |z|. If w is inside the green circle, then |w| < |z| and if w is outside the green
circle, then |w| > |z|.
We also gain a geometric interpretation of addition:

Figure 3: Visually interpreting complex addition.

We see that the complex numbers 0, z, w and z+w form a parallelogram with the line segment
between 0 and z + w as one of the diagonals. Finally, we look at the triangle determined by
0, z and z + w.

Figure 4: Visualizing the Triangle Inequality.

Since the length of any one side of a triangle cannot exceed the sum of the other two sides
(or else the triangle wouldn’t “close”), we have must have
|z + w| ≤ |z| + |w|
Note that this is not a proof of the Triangle Inequality.

We will require a little more work before we can have a meaningful geometric understanding
of complex multiplication.
Lecture 3

Complex Numbers – Polar Form


We now look at another way that we can represent complex numbers that will be useful for
complex multiplication. Consider a nonzero complex number z = x + yj in standard form.
Let r = |z| > 0 and let θ denote the angle the line segment from 0 to z makes with the
positive real axis, measured counterclockwise. We refer to r > 0 as the radius of z and θ as
an argument of z.

Figure 5: A complex number with its radius and an argument.


Given z = x + yj ≠ 0, we can compute r = |z| = √(x² + y²), and we can compute θ using

cos θ = x/r   and   sin θ = y/r
and given the radius r and an argument θ of a nonzero complex number z, we can compute
the real and imaginary parts using

x = r cos θ and y = r sin θ.

Thus
z = x + yj = (r cos θ) + (r sin θ)j = r(cos θ + j sin θ).

Note that | cos θ + j sin θ| = cos2 θ + sin2 θ = 1, and as a result, we may understand an
argument of a complex number z as giving us a point on a circle of radius 1 to move towards
(that is measured counterclockwise from the positive real axis), while r > 0 tell us how far
to move in that direction to reach z. This is illustrated in Figure 6.

Figure 6: Using r and θ to locate a complex number. Here, r > 1.

Definition 3.1. The polar form [11] of a complex number z ≠ 0 is given by

z = r(cos θ + j sin θ)

where r = |z| and θ is an argument of z.

So far, we have considered complex numbers z 6= 0. For z = 0, it is clear that r = 0, so


0 is also the polar form of z = 0 with any θ ∈ R serving as an argument for z = 0 since
0 = 0(cos θ + j sin θ) for any θ ∈ R.

Note that unlike standard form, z does not have a unique polar form. Recall that for any
k ∈ Z,
cos θ = cos(θ + 2kπ) and sin θ = sin(θ + 2kπ)
so 
r(cos θ + j sin θ) = r cos(θ + 2kπ) + j sin(θ + 2kπ)
for any k ∈ Z.

Example 3.2. Write the following complex numbers in polar form.



(1) 1 + 3j

(2) 7 + 7j

[11] We typically write cos θ + j sin θ rather than cos θ + (sin θ)j to avoid the extra brackets. For standard form, we still write x + yj and not x + jy.

Solution.

(1) We have r = |1 + √3 j| = √(1² + (√3)²) = √(1 + 3) = √4 = 2. Thus, factoring r = 2 out of 1 + √3 j gives

1 + √3 j = 2(1/2 + (√3/2) j).

As this is of the form r(cos θ + j sin θ), we have that cos θ = 1/2 and sin θ = √3/2. We thus take θ = π/3 so

1 + √3 j = 2(cos(π/3) + j sin(π/3)).

(2) Since r = |7 + 7j| = √(7² + 7²) = √(2(49)) = 7√2, we have that

7 + 7j = 7√2 (7/(7√2) + (7/(7√2)) j) = 7√2 (1/√2 + (1/√2) j)

so cos θ = 1/√2 = √2/2 and sin θ = √2/2. Thus we take θ = π/4 to obtain

7 + 7j = 7√2 (cos(π/4) + j sin(π/4)).

Note that we can add 2π to either of our above arguments to obtain

1 + √3 j = 2(cos(7π/3) + j sin(7π/3))
7 + 7j = 7√2 (cos(9π/4) + j sin(9π/4))

which verifies that the polar form of a complex number is not unique. Normally, we choose
our arguments θ such that 0 ≤ θ < 2π or −π < θ ≤ π to avoid this problem.

Converting from standard form to polar form is a bit computational, however the next
example shows it is quite easy to convert from polar form back to standard form.
Example 3.3. Write 3(cos(5π/6) + j sin(5π/6)) in standard form.

Solution. We have

3(cos(5π/6) + j sin(5π/6)) = 3(−√3/2 + (1/2) j) = −3√3/2 + (3/2) j.
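For readers who want to check conversions like these numerically, Python's cmath module provides polar and rect (this is only an illustrative aside, not part of the course):

    import cmath

    z = 1 + 3 ** 0.5 * 1j          # 1 + sqrt(3) j from Example 3.2(1)
    r, theta = cmath.polar(z)      # modulus and an argument in (-pi, pi]
    print(r, theta)                # 2.0 and approximately pi/3
    print(cmath.rect(r, theta))    # back to approximately (1 + 1.732j)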

As mentioned, polar form is useful for complex multiplication. To see how, we begin by
recalling the angle sum formulas

cos(θ1 + θ2 ) = cos θ1 cos θ2 − sin θ1 sin θ2

sin(θ1 + θ2 ) = sin θ1 cos θ2 + cos θ1 sin θ2

If
z1 = r1 (cos θ1 + j sin θ1 ) and z2 = r2 (cos θ2 + j sin θ2 )
are two complex numbers in polar form, then
 
z1 z2 = r1 (cos θ1 + j sin θ1 ) r2 (cos θ2 + j sin θ2 )
= r1 r2 (cos θ1 + j sin θ1 )(cos θ2 + j sin θ2 )

= r1 r2 (cos θ1 cos θ2 − sin θ1 sin θ2 ) + j(sin θ1 cos θ2 + cos θ1 sin θ2 )

= r1 r2 cos(θ1 + θ2 ) + j sin(θ1 + θ2 )

Thus 
z1 z2 = r1 r2 cos(θ1 + θ2 ) + j sin(θ1 + θ2 ) .
This now allows us to understand polar multiplication geometrically. Given a complex
number z = r(cos θ + j sin θ), multiplying by z can be viewed as a counterclockwise rotation
by θ about the number 0 in the complex plane, and a scaling by a factor of r. This is
illustrated in Figure 7. Note that a counterclockwise rotation by θ is a clockwise rotation
by −θ. Thus, if θ = − π4 for example, then multiplication by z can be viewed as a clockwise
rotation by π4 (plus a scaling by a factor of r).

Figure 7: Multiplication of complex numbers in polar form. Note that in this image,
|z1 |, |z2 | > 1 and θ1 , θ2 > 0.

Recall that multiplying complex numbers in standard form requires a binomial expansion
which can be tedious and error prone by hand. Although it is also tedious to convert a
complex number in standard form to polar form, multiplying complex numbers in polar form

is quite simple. We simply multiply the two moduli together, which is just multiplication of
real numbers, and add the arguments together, which is just addition of real numbers.

Example 3.4. Let z1 = 2(cos(π/3) + j sin(π/3)) and z2 = 7√2 (cos(π/4) + j sin(π/4)). Express z1 z2 in polar form.

Solution. We have

z1 z2 = 2(7√2)(cos(π/3 + π/4) + j sin(π/3 + π/4)) = 14√2 (cos(7π/12) + j sin(7π/12)).
Example 3.5. Let z1 = r1 (cos θ1 + j sin θ1) and z2 = r2 (cos θ2 + j sin θ2) be two complex
numbers in polar form with z2 ≠ 0 (from which it follows that r2 ≠ 0). Show that

z1/z2 = (r1/r2)(cos(θ1 − θ2) + j sin(θ1 − θ2)).

Solution. Recall that

cos(θ1 − θ2) = cos θ1 cos θ2 + sin θ1 sin θ2
sin(θ1 − θ2) = sin θ1 cos θ2 − cos θ1 sin θ2

We have

z1/z2 = (r1 (cos θ1 + j sin θ1))/(r2 (cos θ2 + j sin θ2))
= (r1/r2) ((cos θ1 + j sin θ1)/(cos θ2 + j sin θ2)) ((cos θ2 − j sin θ2)/(cos θ2 − j sin θ2))
= (r1/r2) ((cos θ1 cos θ2 + sin θ1 sin θ2) + j(sin θ1 cos θ2 − cos θ1 sin θ2))/(cos² θ2 + sin² θ2)
= (r1/r2)(cos(θ1 − θ2) + j sin(θ1 − θ2)).

Powers of Complex Numbers


We now look at computing z n for any integer n. Let z = r(cos θ + j sin θ) be a complex
number in polar form. Then
z 2 = z z = r(cos θ + j sin θ)r(cos θ + j sin θ)
= r2 cos(θ + θ) + j sin(θ + θ)


= r2 cos(2θ) + j sin(2θ) .


A similar computation shows that


z 3 = z 2 z = r3 cos(3θ) + j sin(3θ) .


Continuing with this process, it appears that for any positive integer n,

z n = rn cos(nθ) + j sin(nθ) .


Example 3.6. For z = r(cos θ + j sin θ) ≠ 0, show that

z⁻¹ = 1/z = (1/r)(cos(−θ) + j sin(−θ)).

Solution. Note that |1| = 1 and θ = 0 is an argument for 1. Using the result of Example 3.5, we have

z⁻¹ = 1/z = (1(cos 0 + j sin 0))/(r(cos θ + j sin θ)) = (1/r)(cos(0 − θ) + j sin(0 − θ)) = (1/r)(cos(−θ) + j sin(−θ)).

The above example shows that z n = rn cos(nθ) + j sin(nθ) holds for n = −1 as well. We
have the following important result.
Theorem 3.7 (de Moivre’s Theorem). If z = r(cos θ + j sin θ) 6= 0, then

z n = rn cos(nθ) + j sin(nθ)


for any n ∈ Z.
Since de Moivre’s Theorem is stated for n ∈ Z, we have to allow for n < 0 and thus the
restriction that z 6= 0. It is easy to verify that de Moivre’s Theorem holds for z = 0 provided
n ≥ 1. The proof of de Moivre’s Theorem again requires induction so is not included here.
Example 3.8. Compute (2 + 2j)7 using de Moivre’s Theorem and express your answer in
standard form.
Solution. We have r = |2 + 2j| = √(4 + 4) = √(2(4)) = 2√2 and so

2 + 2j = 2√2 (2/(2√2) + (2/(2√2)) j) = 2√2 (√2/2 + (√2/2) j)

from which we find θ = π/4. Thus

2 + 2j = 2√2 (cos(π/4) + j sin(π/4)).

Then

(2 + 2j)⁷ = (2√2 (cos(π/4) + j sin(π/4)))⁷
= (2√2)⁷ (cos(7π/4) + j sin(7π/4))   by de Moivre’s Theorem
= 1024√2 (√2/2 − (√2/2) j)
= 1024 − 1024j
Example 3.9. Compute (1/2 + (√3/2) j)⁶⁰² and express your answer in standard form.

Solution. Since r = |1/2 + (√3/2) j| = √(1/4 + 3/4) = 1, we see that

1/2 + (√3/2) j = cos(π/3) + j sin(π/3).

Thus

(1/2 + (√3/2) j)⁶⁰² = (cos(π/3) + j sin(π/3))⁶⁰²
= cos(602π/3) + j sin(602π/3)   by de Moivre’s Theorem
= cos(2π/3) + j sin(2π/3)
= −1/2 + (√3/2) j
It is hopefully apparent that trigonometry will play a role here, so we include the unit circle
in the complex plane. Note that in MATH 115, we use radians to measure angles as opposed
to degrees.

Figure 8: The unit circle in the complex plane.

Lecture 4

Complex nth Roots


Let n be a positive integer and z ∈ C. We have seen that we can use polar form to compute
z n rather easily, but suppose instead that we are asked to find all w ∈ C such that wn = z.
We refer to such a w as an nth root of z. Examples 2.1 and 2.2 illustrate one way of solving
this problem when n = 2, but for larger n, this method becomes much more difficult. Again,
polar form will help us. Of course, if z = 0, then wn = 0 implies that w = 0, so we assume
z 6= 0.

Let z = r(cos θ + j sin θ) and let w = R(cos φ + j sin φ). From w^n = z we have

(R(cos φ + j sin φ))^n = r(cos θ + j sin θ).

Using de Moivre’s Theorem, we obtain

R^n (cos(nφ) + j sin(nφ)) = r(cos θ + j sin θ).

From this we find that

R^n = r   and   nφ = θ + 2kπ

for some k ∈ Z. To understand this, notice that since w^n = z, it must be the case that w^n
and z have the same modulus and so R^n = r, and that any argument of w^n must be equal
to an argument of z plus some integer multiple of 2π. Solving for R and φ gives

R = r^(1/n)   and   φ = (θ + 2kπ)/n

for some k ∈ Z. Here, r^(1/n) is the nth root of the real number r and is evaluated in the
normal way. Thus, for any k ∈ Z, let

w_k = r^(1/n) (cos((θ + 2kπ)/n) + j sin((θ + 2kπ)/n)).

Then

w_k^n = (r^(1/n) (cos((θ + 2kπ)/n) + j sin((θ + 2kπ)/n)))^n
= (r^(1/n))^n (cos(n(θ + 2kπ)/n) + j sin(n(θ + 2kπ)/n))   by de Moivre’s Theorem
= r(cos(θ + 2kπ) + j sin(θ + 2kπ))
= r(cos θ + j sin θ)
= z.

Hence w_k^n = z for any integer k. It is tempting to think that there will be infinitely many
solutions to w^n = z, but in fact we obtain exactly n solutions.

Theorem 4.1. Let z = r(cos θ + j sin θ) be nonzero, and let n be a positive integer. Then
the n distinct nth roots of z are given by

w_k = r^(1/n) (cos((θ + 2kπ)/n) + j sin((θ + 2kπ)/n))

for k = 0, 1, . . . , n − 1.

A few examples will show why we only need to consider k = 0, 1, . . . , n − 1.

Example 4.2. Find the 3rd roots of 1, that is, find all w ∈ C such that w3 = 1.

Solution. Here, z = 1 and n = 3. In polar form, 1 = 1(cos 0 + j sin 0) so the 3rd roots of 1 are given by

w_k = 1^(1/3) (cos((0 + 2kπ)/3) + j sin((0 + 2kπ)/3)) = cos(2kπ/3) + j sin(2kπ/3),   k = 0, 1, 2

Thus

w_0 = cos 0 + j sin 0 = 1
w_1 = cos(2π/3) + j sin(2π/3) = −1/2 + (√3/2) j
w_2 = cos(4π/3) + j sin(4π/3) = −1/2 − (√3/2) j

Thus, the 3rd roots of 1 are given by 1, −1/2 + (√3/2) j and −1/2 − (√3/2) j. This means that

1³ = (−1/2 + (√3/2) j)³ = (−1/2 − (√3/2) j)³ = 1.

If we try to compute w_−1 and w_3, we find

w_−1 = cos(−2π/3) + j sin(−2π/3) = −1/2 − (√3/2) j = w_2
w_3 = cos(2π) + j sin(2π) = 1 = w_0

We see that as we increase k, we rotate counterclockwise by an angle of 2π/3, and thus after
three rotations, we are back where we started. The 3rd roots of 1 are plotted in Figure 9.

Figure 9: The 3rd roots of 1.

Example 4.3. Find all 4th roots of −256 in standard form and plot them in the complex
plane.

Solution. Here, z = −256 and n = 4. We have that −256 = 256(cos π + j sin π) so the 4th roots are given by

w_k = 256^(1/4) (cos((π + 2kπ)/4) + j sin((π + 2kπ)/4)) = 4(cos((π + 2kπ)/4) + j sin((π + 2kπ)/4)),   k = 0, 1, 2, 3.

Thus

w_0 = 4(cos(π/4) + j sin(π/4)) = 4(√2/2 + (√2/2) j) = 2√2 + 2√2 j
w_1 = 4(cos(3π/4) + j sin(3π/4)) = 4(−√2/2 + (√2/2) j) = −2√2 + 2√2 j
w_2 = 4(cos(5π/4) + j sin(5π/4)) = 4(−√2/2 − (√2/2) j) = −2√2 − 2√2 j
w_3 = 4(cos(7π/4) + j sin(7π/4)) = 4(√2/2 − (√2/2) j) = 2√2 − 2√2 j

which we plot in the complex plane. Notice again that the roots are evenly spaced out on a
circle of radius 4.

Figure 10: The 4th roots of −256.
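As an optional aside, Theorem 4.1 translates directly into a short Python sketch (the helper nth_roots below is written just for this illustration):

    import cmath

    def nth_roots(z, n):
        # The n distinct nth roots of a nonzero complex number z (Theorem 4.1).
        r, theta = cmath.polar(z)
        return [r ** (1 / n) * cmath.exp(1j * (theta + 2 * cmath.pi * k) / n)
                for k in range(n)]

    # The 4th roots of -256 from Example 4.3; each w**4 is approximately -256.
    for w in nth_roots(-256, 4):
        print(w, w ** 4)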


Example 4.4. Find the 3rd roots of 4 − 4√3 j. Express your answers in polar form.

Solution. Since |4 − 4√3 j| = 4|1 − √3 j| = 4√(1 + 3) = 4(2) = 8, we have

4 − 4√3 j = 8(4/8 − (4√3/8) j) = 8(1/2 − (√3/2) j) = 8(cos(5π/3) + j sin(5π/3)).

Thus, the 3rd roots are given by

w_k = 8^(1/3) (cos((5π/3 + 2kπ)/3) + j sin((5π/3 + 2kπ)/3)) = 2(cos((5π + 6kπ)/9) + j sin((5π + 6kπ)/9)),   k = 0, 1, 2

so

w_0 = 2(cos(5π/9) + j sin(5π/9))
w_1 = 2(cos(11π/9) + j sin(11π/9))
w_2 = 2(cos(17π/9) + j sin(17π/9))

In the last example, it is difficult to write w_0, w_1 and w_2 in standard form without a calculator.

The Complex Exponential


Let θ ∈ R. Then
ejθ = cos θ + j sin θ
is known as Euler’s Formula. Given that the left hand side is an exponential function and
that the right hand side is a trigonometric function, it is quite remarkable that the two
quantities are equal. The proof of this requires the use of power series, which you will learn
about in MATH 118 or MATH 119.

If z = r(cos θ + j sin θ) is the polar form of z ∈ C, then z = rejθ is the complex exponential
form of z. As polar form is not unique, neither is complex exponential form:

rejθ = rej(θ+2kπ) for any k ∈ Z.

Thus, ejθ is 2π-periodic (that is, it oscillates like trigonometric functions)!
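Euler's Formula and this 2π-periodicity are easy to check numerically with Python's cmath module (an aside, not part of the course):

    import cmath, math

    theta = 5 * math.pi / 6
    print(cmath.exp(1j * theta))                      # e^{j theta}
    print(complex(math.cos(theta), math.sin(theta)))  # cos(theta) + j sin(theta), same value
    print(cmath.exp(1j * (theta + 2 * math.pi)))      # adding 2*pi changes nothing (up to rounding)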

Let z1 = r1 (cos θ1 + j sin θ1 ) = r1 ejθ1 and z2 = r2 (cos θ2 + j sin θ2 ) = r2 ejθ2 . Then

z1 z2 = r1 r2 (cos(θ1 + θ2 ) + j sin(θ1 + θ2 )) = r1 r2 ej(θ1 +θ2 )

In particular, taking r1 = r2 = 1 gives

ejθ1 ejθ2 = ej(θ1 +θ2 )

which is consistent with the rules of multiplication of exponential functions!

Also, recall that de Moivre’s Theorem states that for z = r(cos θ + j sin θ) and n ∈ Z, we
have that z n = rn (cos(nθ) + j sin(nθ)). Thus (rejθ )n = rn ej(nθ) . Taking r = 1 gives
n
ejθ = ej(nθ)

which is again consistent with the rules of powers of exponential functions!

Now consider ejπ = cos π + j sin π = −1. Rearranging gives

ejπ + 1 = 0

which is known as Euler’s Identity and is often regarded as the most beautiful equation in
mathematics because it combines some of the most important quantities mathematicians use
into one tidy little equation:

e − irrational number appearing all over mathematics, particularly in differential equations
π − irrational number important for trigonometry
j − most famous nonreal complex number
1 − the multiplicative identity
0 − the additive identity

Unless specifically asked otherwise, you may use complex exponential form instead of polar
form.

Example 4.5. Find the 6th roots of −64 in standard form.

Solution. Since −64 = 64(cos π + j sin π) = 64e^(jπ), the 6th roots are given by

w_k = 64^(1/6) e^(j(π + 2kπ)/6) = 2e^(j(π + 2kπ)/6),   k = 0, 1, 2, 3, 4, 5.

Thus,

w_0 = 2e^(jπ/6) = 2(√3/2 + (1/2) j) = √3 + j
w_1 = 2e^(jπ/2) = 2(0 + j) = 2j
w_2 = 2e^(j5π/6) = 2(−√3/2 + (1/2) j) = −√3 + j
w_3 = 2e^(j7π/6) = 2(−√3/2 − (1/2) j) = −√3 − j
w_4 = 2e^(j3π/2) = 2(0 − j) = −2j
w_5 = 2e^(j11π/6) = 2(√3/2 − (1/2) j) = √3 − j

In a course on complex analysis, one often begins by studying well-known functions from
calculus while allowing the variable to be√ complex. Given z ∈ C, one considers functions
such as ez , sin z, cos z, tan z, ln z and z. As our work above suggests, these functions
behave quite differently when the variable is allowed to be complex. As an example of how
different the behaviour is, it can be shown that there exist infinitely many z ∈ C such that
sin z = w where w is any given complex number (say, w = 7). However, this is a topic for
another course.

Lecture 5

Complex Polynomials
Recall that p(x) = an xn + an−1 xn−1 + · · · + a1 x + a0 is the equation of a polynomial. We
call x the variable and a0 , a1 , . . . , an the coefficients. If an 6= 0, we say p(x) has degree n. A
number c is a root of p(x) if p(c) = 0.

If a0 , a1 , . . . , an ∈ R, then we call p(x) a real polynomial. If a0 , a1 , . . . , an ∈ C, then we call


p(x) a complex polynomial. As R ⊆ C, every real polynomial is also a complex polynomial.
For complex polynomials that are not real, we often use the variable z in place of x.
Example 5.1. The polynomial p(z) = jz 3 − (1 − j)z 2 + 3z + (4 − j) is a complex polynomial
of degree 3.
We know real polynomials can have non-real roots. For example, p(x) = x2 + 1 has roots ±j
since (±j)2 + 1 = −1 + 1 = 0. Notice that j and −j are complex conjugates of one another.
Theorem 5.2. Let p(x) = an xn + an−1 xn−1 + · · · + a1 x + a0 be a real polynomial. If z ∈ C
is a root of p(x), then so too is z.
Proof. Let p(x) = an xn + an−1 xn−1 + · · · + a1 x + a0 be a real polynomial and suppose z ∈ C
is a root of p(x). Then p(z) = 0, that is

an z n + an−1 z n−1 + · · · + a1 z + a0 = 0

Taking complex conjugates of both sides and using the fact that 0, a0 , a1 , . . . , an ∈ R, we
have

an z n + an−1 z n−1 + · · · + a1 z + a0 = 0
an z n + an−1 z n−1 + · · · + a1 z + a0 = 0
an z n + an−1 z n−1 + · · · + a1 z + a0 = 0
an z n + an−1 z n−1 + · · · + a1 z + a0 = 0
an z n + an−1 z n−1 + · · · + a1 z + a0 = 0.

Thus p(z) = 0 so z is a root of p(x).


Example 5.3. Let p(x) = x3 + 16x. If p(x) = 0, then 0 = x3 + 16x = x(x2 + 16). Thus
x = 0 or x2 + 16 = 0. For x2 + 16 = 0, we can use the quadratic formula:
x = (−0 ± √(0² − 4(1)(16))) / (2(1))
= ±√(−64)/2
= ±8j/2   since (±8j)² = −64
= ±4j.

Thus the roots of p(x) are 0, 4j and −4j. Note that given any of these roots, the complex
conjugate of that root is also a root of p(x).

Note that we require p(x) to be a real polynomial for Theorem 5.2 to hold. The complex
polynomial
p(z) = z 2 + (2 + 3j)z − (5 − j)
has roots 1 − j and −3 − 2j, neither of which is a complex conjugate of the other.

A Very Brief Introduction to Proofs


Linear Algebra is often the first time students encounter proofs. The goal of this section is
to give a brief introduction to proofs - what to assume, what to prove, and how much detail
should be given. First, consider the following example.

Example 5.4. Let z1 , z2 , z3 ∈ C. Prove that if |z1 | = |z2 | = |z3 | = 1, then


1 1 1
z1 + z2 + z3 = + + .
z1 z2 z3
We are told that z1 , z2 , z3 are arbitrary complex numbers. The statement
1 1 1
if |z1 | = |z2 | = |z3 | = 1, then z1 + z2 + z3 = + + .
z1 z2 z3
is what we have to prove. Statements of the form “if . . . then . . .” are called implications.
The part following the word “if” is called the hypothesis, while the part following the word
“then” is called the conclusion. To prove an implication (that is, to prove the implication
is always true), we assume that hypothesis is true, and then show that the conclusion must
be true. For our example, we may assume that z1 , z2 , z3 are complex numbers such that
|z1 | = |z2 | = |z3 | = 1, and under these assumptions, we must show that
1 1 1
z1 + z2 + z3 = + + .
z1 z2 z3
Note by letting z1 = z2 = z3 = 1, we have that z1 , z2 , z3 ∈ C and |z1 | = |z2 | = |z3 | = 1
so our hypothesis is satisfied. Also, it is clear that z1 + z2 + z3 = 1 + 1 + 1 = 3 = 3 and
that z11 + z12 + z13 = 11 + 11 + 11 = 3 so the conclusion also holds. However, this does not
prove the statement. We must prove the statement for arbitrary z1 , z2 , z3 ∈ C satisying
|z1 | = |z2 | = |z3 | = 1. The following is a correct proof.

Proof. Let z1 , z2 , z3 ∈ C and assume that |z1 | = |z2 | = |z3 | = 1. Then for each i = 1, 2, 3,
zi 6= 0 and it follows that zi 6= 0. We have
1 1 1 z1 z2 z3
+ + = + +
z1 z2 z3 z1 z1 z2 z2 z3 z3

z1 z2 z3
= 2
+ 2
+
|z1 | |z2 | |z3 |2

= z1 + z2 + z3 since |z1 | = |z2 | = |z3 | = 1

= z1 + z2 + z3 .

In the above proof, we began by stating our assumptions: z1 , z2 , z3 ∈ C and |z1 | = |z2 | =
|z3 | = 1. We then deduce that since |zi | = 1 for i = 1, 2, 3, zi 6= 0 from which it follows that
z i 6= 0. This justifies why we can multiply each z1i term by zzii . From there, we use properties
of conjugates to finish showing the conclusion holds. Note that in the proof, we state where
we used the hypothesis |z1 | = |z2 | = |z3 | = 1. Additionally, note that at no point in the
proof did we assume the conclusion was true – it is incorrect to write the following:

1 1 1
z1 + z2 + z3 = + +
z1 z2 z3

z1 z2 z3
z1 + z2 + z3 = + +
z1 z1 z2 z2 z3 z3

z1 z2 z3
z1 + z2 + z3 = 2
+ 2
+
|z1 | |z2 | |z3 |2

z1 + z2 + z3 = z1 + z2 + z3 since |z1 | = |z2 | = |z3 | = 1

z1 + z2 + z3 = z1 + z2 + z3

as the very first line implies that we are already assuming the conclusion is true when it is
in fact the very statement we want to show is true.

Example 5.5. Let z ∈ C. Show that |Re(z)| + |Im(z)| ≤ √2 |z|.

Proof. Let z ∈ C. Then z = x + yj with x, y ∈ R. Then x = Re(z) and y = Im(z). Since

(√2 |z|)² = 2|z|² = 2(x² + y²) = 2x² + 2y² = 2|x|² + 2|y|²

and

(|Re(z)| + |Im(z)|)² = (|x| + |y|)² = |x|² + 2|x||y| + |y|²

we have

(√2 |z|)² − (|Re(z)| + |Im(z)|)² = 2|x|² + 2|y|² − |x|² − 2|x||y| − |y|²
= |x|² − 2|x||y| + |y|²
= (|x| − |y|)²
= (|Re(z)| − |Im(z)|)²
≥ 0

Thus (√2 |z|)² − (|Re(z)| + |Im(z)|)² ≥ 0, that is, (|Re(z)| + |Im(z)|)² ≤ (√2 |z|)². Since both
|Re(z)| + |Im(z)| and √2 |z| are nonnegative real numbers, we conclude that
|Re(z)| + |Im(z)| ≤ √2 |z|.

Recall that for any x ∈ R, √(x²) = |x|. However, if we know that x ≥ 0, then |x| = x, so
√(x²) = x in this case. This observation can be useful when dealing with radicals – we often
square both sides of an equality (or inequality) if it rids us of radicals, and then take square
roots once we are done.

Roundoff Error
Rounding real numbers to a certain number of decimal places is an extremely useful idea.
For example if you knew your exact weight was 123456/2345 kilograms and someone asked
you what your weight was, you wouldn’t likely respond with “123456/2345 kilograms.” You
would more likely use the fact that
123456
≈ 52.6
2345
and say that your weight was 52.6 kilograms. The reason you do this is because 52.6 is easier
to remember and it is more meaningful - it is not immediately clear how big 123456/2345
actually is.

But we do need to be careful when we round. Consider x = 1.01. Then

x100 = 1.01100 ≈ 2.705

If we decide to round 1.01 down to 1, then we have

x100 ≈ 1100 = 1

and we see that the resulting answers are not very close. We observe that a very small
change in x (exactly 0.01) leads to a relatively large change in x100 (approximately 1.705).
This phenomenon is known as roundoff error. We can attempt to avoid roundoff error by
not rounding exact answers before using them in further computations.

It might seem like the above roundoff error occurred because of the high power on x, but
consider the following system of equations:

x + y = 2
x + 1.014y = 0

To solve this system, we isolate for y in the first equation to get y = 2 − x. Substituting this
into the second equation gives

x + 1.014(2 − x) = 0, that is, 0.014x = 2.028 so x = 144.857142

and it follows that


y = 2 − x = 2 − 144.857142 = −142.857142
If we round our values of x and y to the nearest tenth, we have

x = 144.9 and y = −142.9

Now consider the above system where we round the coefficients to the nearest hundredth:

x + y = 2
x + 1.01y = 0

Using the same steps as above to solve this system leads to

x = 202 and y = −200

Here we observe an even worse roundoff error than before, and we only changed one of the
coefficients in the original system of equations by 0.004. Note also that there are no high
powers on any of the variables.
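Both versions of the system can be solved with the same substitution y = 2 − x, which makes the sensitivity easy to see (the helper below is written only for this illustration; the coefficient a stands for 1.014 or 1.01):

    def solve(a):
        # Solve x + y = 2 and x + a*y = 0 by substituting y = 2 - x.
        x = 2 * a / (a - 1)
        return x, 2 - x

    print(solve(1.014))   # approximately (144.857, -142.857)
    print(solve(1.01))    # approximately (202.0, -200.0)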

It is extremely important to control roundoff error. For instance, in the previous example,
if x and y represent changes to the amount of drugs being administered to a patient in a
hospital, then this roundoff error could severely harm the patient or cause their death.

So how do we avoid roundoff error? In general, it is not always possible. Many of the numbers
used in real-world applications have many decimal places, so even computers are forced to
round off or truncate values. A course in applied mathematics will introduce students to
sensitivity analysis, where we test how sensitive the output variables are to small changes to
the input variables.

Lecture 6

Vector Algebra
We now begin our study of linear algebra. Mostly we will focus on the “real case”, that is,
linear algebra using real numbers, but we will at times address the “complex case” as well.

We begin with the Cartesian Plane. We choose an origin O and two perpendicular axes
called the x1 −axis and the x2 −axis. [12] A point P in this plane is represented by the ordered
pair (p1 , p2 ). We think of p1 as a measure of how far to the right (if p1 > 0) or how far to
the left (if p1 < 0) P is from the x2 −axis and we think of p2 as a measure of how far above
(if p2 > 0) or how far below (if p2 < 0) the x1 −axis P is. This is illustrated in Figure 11.

Figure 11: The Cartesian Plane and the point P (p1 , p2 ).

It is often convenient to associate to each point a vector which we view geometrically as an


“arrow”, and refer to as a directed line segment. Given a point P (p1 , p2 ) in our Cartesian
plane, we associate to it the vector

~p = [p1]
     [p2]

as illustrated in Figure 12. We define

R2 = { [x1]  :  x1 , x2 ∈ R }
       [x2]

to be the collection of all such vectors. We refer to x1 and x2 as the entries or components
of the vector. We will define how to add these vectors and multiply them by constants,
two operations which we will see are vital to linear algebra. Of course, these ideas extend
naturally to three-dimensional space and beyond.

[12] You might be more familiar with the names x−axis and y−axis. However, as we’ll see, it’s more convenient to call them the x1 −axis and the x2 −axis.

Figure 12: The Cartesian Plane and the point P (p1 , p2 ).

Definition 6.1. We define

Rn = { [x1]  :  x1 , . . . , xn ∈ R }
       [..]
       [xn]

to be the set of all vectors with n components, each of which is a real number. A vector in
R3 is illustrated in Figure 13.

Figure 13: A vector in R3 . Note the labelling of the axes.

Definition 6.2. Two vectors

~x = [x1]   and   ~y = [y1]
     [..]              [..]
     [xn]              [yn]

in Rn are equal if x1 = y1 , x2 = y2 , . . . , xn = yn , that is, if their corresponding entries are
equal, and we write ~x = ~y in this case. Otherwise, we write ~x ≠ ~y .

It is important to note that if ~x ∈ Rn and ~y ∈ Rm with n ≠ m, then ~x and ~y can never be
equal. For example,

[1]  ≠  [1]
[2]     [2]
        [0]

as one vector belongs to R2 and the other belongs to R3 .
Definition 6.3. The zero vector in Rn is denoted by

~0Rn = [0]
       [..]
       [0]

For example,

~0R2 = [0]    ~0R3 = [0]    ~0R4 = [0]    and so on.
       [0]           [0]           [0]
                     [0]           [0]
                                   [0]

Often, we denote the zero vector in Rn simply as ~0 when it is clear that we are talking about
~0Rn to avoid the messy subscript.

Definition 6.4. Let

~x = [x1]   and   ~y = [y1]
     [..]              [..]
     [xn]              [yn]

be two vectors in Rn . We define vector addition as

~x + ~y = [x1 + y1]  ∈ Rn ,
          [  ..   ]
          [xn + yn]

that is, we add vectors by adding the corresponding entries.

Example 6.5.

• [1] + [−1] = [0]
  [2]   [ 3]   [5]

• [1]   [ 2]   [3]
  [2] + [ 3] = [5]
  [3]   [−2]   [1]

• [1]   [1]
  [1] + [2]  is not defined as one vector is in R3 while the other is in R2 .
  [1]
We have a nice geometric interpretation of vector addition that is similar to what we observed
for the addition of complex numbers. This is illustrated in Figure 14 (compare to Figure
3) where we see that two vectors determine a parallelogram with their sum appearing as a
diagonal of this parallelogram.

Figure 14: Geometrically interpreting vector addition. The figure on the left is in R2 with
vector components labelled on the corresponding axes and the figure on the right is vector
addition viewed for vectors in Rn .

Definition 6.6. Let

~x = [x1]
     [..]
     [xn]

be a vector in Rn and let c ∈ R. We define scalar multiplication as

c~x = [cx1]  ∈ Rn
      [ ..]
      [cxn]

that is, we multiply each entry of ~x by c.
Example 6.7.

•  2 [ 1]   [  2]
     [ 6] = [ 12]
     [−4]   [ −8]
     [ 8]   [ 16]

•  0 [−1]   [0]
     [ 0] = [0] = ~0
     [ 2]   [0]
We often refer to c as a scalar, and call c~x a scalar multiple of ~x. Figure 15 helps us
understand geometrically what scalar multiplication of a nonzero vector ~x ∈ R2 looks like.
The picture is similar for ~x ∈ Rn .

Figure 15: Geometrically interpreting scalar multiplication in R2 .

Definition 6.8. Two nonzero vectors in Rn are parallel if they are scalar multiples of one
another.
Example 6.9. The vectors

~x = [ 2]   and   ~y = [−4]
     [−5]              [10]

are parallel since ~y = −2~x, or equivalently, ~x = −(1/2)~y . The vectors

~u = [−2]   and   ~v = [ −2]
     [−3]              [ −1]
     [−4]              [−13]

are not parallel, for ~u = c~v would imply that −2 = −2c, −3 = −c and −4 = −13c, which
implies that c = 1, 3, 4/13 simultaneously, which is impossible.

Having equipped the set Rn with vector addition and scalar multiplication, we state here a
theorem that gives the resulting properties which we will use often throughout the course.

Theorem 6.10. Let ~w, ~x, ~y ∈ Rn and let c, d ∈ R. We have

V1. ~x + ~y ∈ Rn Rn is closed under addition

V2. ~x + ~y = ~y + ~x addition is commutative

V3. (~x + ~y ) + w
~ = ~x + (~y + w)
~ addition is associative

V4. There exists a vector ~0 ∈ Rn such that ~v + ~0 = ~v for every ~v ∈ Rn zero vector

V5. For each ~x ∈ Rn there exists a (−~x) ∈ Rn such that ~x + (−~x) = ~0 additive inverse

V6. c~x ∈ Rn Rn is closed under scalar multiplication

V7. c(d~x) = (cd)~x scalar multiplication is associative

V8. (c + d)~x = c~x + d~x distributive law

V9. c(~x + ~y ) = c~x + c~y distributive law

V10. 1~x = ~x scalar multiplicative identity

Note that the zero vector of Rn is ~0 = ~0Rn and the additive inverse of ~x ∈ Rn is −~x = (−1)~x.
Many of these properties may seem obvious and it might not be clear as to why they are
stated as a theorem. One of the reasons is that everything we do in this course will fol-
low from these ten properties, so it is important to list them all here. Also, as we proceed
through the course, we will see that vectors in Rn are not the only mathematical objects
that are subject to these properties, and it is quite useful and powerful to understand what
other classes of objects behave the same as vectors in Rn .

We make the following definition that will be important throughout the course.

Definition 6.11. Let ~x1 , ~x2 , . . . , ~xk ∈ Rn and c1 , c2 , . . . , ck ∈ R for some positive integer k.
We call the vector
c1~x1 + c2~x2 + · · · + ck ~xk
a linear combination of the vectors ~x1 , ~x2 , . . . , ~xk .

Note that properties V1 and V6 of Theorem 6.10 together guarantee that if ~x1 , . . . , ~xk ∈ Rn
and c1 , . . . , ck ∈ R, then c1~x1 + c2~x2 + · · · + ck ~xk ∈ Rn , that is, every linear combination
of ~x1 , . . . , ~xk will again be a vector in Rn . Hence we say that Rn is closed under linear
combinations.

Example 6.12. In R3 , let

~e1 = [1] ,   ~e2 = [0] ,   and   ~e3 = [0] .
      [0]           [1]                 [0]
      [0]           [0]                 [1]

Then for any

~x = [x1]  ∈ R3
     [x2]
     [x3]

we see that

~x = x1~e1 + x2~e2 + x3~e3 .

That is, every ~x ∈ R3 can be expressed as a linear combination of ~e1 , ~e2 and ~e3 .
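As an aside, vector addition, scalar multiplication and linear combinations in Rn can be experimented with numerically using NumPy (the particular vector chosen below is arbitrary, just for illustration):

    import numpy as np

    e1 = np.array([1.0, 0.0, 0.0])
    e2 = np.array([0.0, 1.0, 0.0])
    e3 = np.array([0.0, 0.0, 1.0])

    x = np.array([4.0, -2.0, 7.0])
    combo = 4.0 * e1 + (-2.0) * e2 + 7.0 * e3   # a linear combination of e1, e2, e3
    print(combo, np.array_equal(combo, x))      # [ 4. -2.  7.] True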

Thus far, we have associated vectors in Rn with points. Recall that given a point P (p1 , . . . , pn ), we associate with it the vector

~p = [p1]  ∈ Rn
     [..]
     [pn]

and view ~p as a directed line segment from the origin to P . Before we continue, we briefly
mention that vectors may also be thought of as directed segments between arbitrary points.
For example, given two points A and B in the x1 x2 −plane, we denote the directed line
segment from A to B by AB (written with an arrow over it). In this sense, the vector ~p from
the origin O to the point P can be denoted as ~p = OP . This is illustrated in Figure 16.

Figure 16: Vectors between points in R2 (the picture in Rn is similar).

Notice that Figure 16 is in R2 , but that we can view directed segments between vectors in
Rn in a similar way. We realize that there is something special about directed segments from
the origin to a point P . In particular, given a point P , the entries in the vector ~p = OP are
simply the coordinates of the point P (refer to Figures 12 and 13). Thus we refer to the vector
~p = OP as the position vector of P and we say that ~p is in standard position. Note
that in Figure 16, only the vector ~p is in standard position.

Finding a vector from a point A to a point B in Rn is also not difficult. For two points
A(a1 , a2 ) and B(b1 , b2 ) we have that

AB = [b1 − a1] = [b1] − [a1] = OB − OA
     [b2 − a2]   [b2]   [a2]

which is illustrated in Figure 17.

−→
Figure 17: Finding the components of AB ∈ R2 .

This generalizes naturally to Rn where for A(a1 , . . . , an ) and B(b1 , . . . , bn ) we have

AB = [b1 − a1] = [b1] − [a1] = OB − OA.
     [  ..   ]   [..]   [..]
     [bn − an]   [bn]   [an]
Example 6.13. Find the vector from A(1, 1, 1) to B(2, 3, 4).

Solution. The vector from A to B is the vector AB. We have

AB = OB − OA = [2] − [1] = [1] .
               [3]   [1]   [2]
               [4]   [1]   [3]

Now in Rn , given three points A, B and C, we have that
−→ −→ −→ −−→ −→ −→ −−→ −→ −−→
AC = OC − OA = OB − OA + OC − OB = AB + BC

which is illustrated in Figure 18.

Figure 18: Vector Addition.

Finally, putting everything together, we see that two points A and B and their corresponding
−→ −−→
position vectors OA and OB determine a parallelogram and that the sum and difference of
these vectors determine the diagonals of this parallelogram. This is displayed in Figure 19,
−−→
where the image on the right is obtained from the one on the left by setting ~u = OB and
−→ −−→ −→
~v = OA. Note that by orienting vectors this way, OB − OA = ~u − ~v is not in standard
position.

Figure 19: The parallelogram determined by two vectors. The diagonals of the parallelogram
are represented by the sum and difference of the two vectors.

Lecture 7

Norms and Dot Products


Having introduced vectors in Rn , we now define the norm of a vector and the dot product of
two vectors. We will see how these two concepts are related and we will use them frequently.
We begin with the norm of a vector.
Definition 7.1. The norm of

~x = [x1]  ∈ Rn
     [..]
     [xn]

is the nonnegative real number

||~x|| = √(x1² + · · · + xn²).
We interpret the norm of a vector in Rn as the length or magnitude of the vector. Figure 20
shows this for a vector in R2 .

Figure 20: A vector ~x ∈ R2 and its norm, interpreted as length.

Example 7.2.

• If ~x = [1] ∈ R2 , then ||~x|| = √(1² + 2²) = √5
          [2]

• If ~x = [1] ∈ R4 , then ||~x|| = √(1² + 1² + 1² + 1²) = √4 = 2
          [1]
          [1]
          [1]

Example 7.3. Find the distance from A(1, −1, 2) to B(3, 2, 1).

Solution. Since

AB = OB − OA = [3] − [ 1] = [ 2] ,
               [2]   [−1]   [ 3]
               [1]   [ 2]   [−1]

the distance from A to B is

||AB|| = √(2² + 3² + (−1)²) = √(4 + 9 + 1) = √14.

Theorem 7.4 (Properties of Norms). Let ~x, ~y ∈ Rn and c ∈ R. Then

(1) ||~x|| ≥ 0 with equality if and only if ~x = ~0

(2) ||c~x|| = |c| ||~x||

(3) ||~x + ~y || ≤ ||~x|| + ||~y || which we call the Triangle Inequality.

Property (3) is known as the Triangle Inequality. As with complex numbers, the Triangle
Inequality has the same interpretation for vectors. Namely, that in the triangle determined
by vectors ~x, ~y and ~x + ~y , the length of any one side of the triangle cannot exceed the sum of
the lengths of the remaining two sides. This is illustrated in Figure 21 (compare to Figures
3 and 4).

Figure 21: Interpreting the Triangle Inequality.

Definition 7.5. A vector ~x ∈ Rn is a unit vector if ||~x|| = 1.

Example 7.6.

• ~x = [1] is a unit vector since ||~x|| = √(1² + 0²) = 1
       [0]

• ~x = −(1/√3) [1] is a unit vector since ||~x|| = |−1/√3| √(1² + 1² + 1²) = (1/√3)√3 = 1
               [1]
               [1]
Consider a nonzero vector ~x ∈ Rn . Then

~y = (1/||~x||) ~x

is a unit vector in the direction of ~x. To see this, note that since ~x ≠ ~0, ||~x|| > 0 by Theorem
7.4(1). Thus ~y is a positive scalar multiple of ~x so ~y is in the same direction as ~x. Now

||~y || = ||(1/||~x||) ~x|| = (1/||~x||) ||~x|| = 1

so ~y is a unit vector in the direction of ~x.
 
Example 7.7. Find a unit vector in the direction of

~x = [4] .
     [5]
     [6]

Solution. Since ||~x|| = √(4² + 5² + 6²) = √(16 + 25 + 36) = √77, we have

~y = (1/√77) [4] = [4/√77]
             [5]   [5/√77]
             [6]   [6/√77]

is the desired vector.
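Example 7.7 can be reproduced numerically with NumPy's norm function (an optional aside):

    import numpy as np

    x = np.array([4.0, 5.0, 6.0])
    y = x / np.linalg.norm(x)      # divide x by its norm, sqrt(77)
    print(y)                       # [0.4558... 0.5698... 0.6838...]
    print(np.linalg.norm(y))       # 1.0 up to rounding, so y is a unit vector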
We now define the dot product of two vectors in Rn .
Definition 7.8. Let

~x = [x1]   and   ~y = [y1]
     [..]              [..]
     [xn]              [yn]

be vectors in Rn . The dot product [13] of ~x and ~y is the real number

~x · ~y = x1 y1 + · · · + xn yn .
Example 7.9.

[1]   [−3]
[1] · [−4] = 1(−3) + 1(−4) + 2(5) = −3 − 4 + 10 = 3.
[2]   [ 5]

[13] The dot product is sometimes referred to as the scalar product or the standard inner product. The term scalar product comes from the fact that the dot product returns a real number, which we call a scalar.

Theorem 7.10 (Properties of Dot Products). Let ~w, ~x, ~y ∈ Rn and c ∈ R.
(1) ~x · ~y ∈ R

(2) ~x · ~y = ~y · ~x

(3) ~x · ~0 = 0

(4) ~x · ~x = k~xk2

(5) (c~x) · ~y = c(~x · ~y ) = ~x · (c~y )

(6) ~w · (~x ± ~y ) = ~w · ~x ± ~w · ~y
Proof. We prove (2), (4) and (5). Let c ∈ R and
   
x1 y1
 .   . 
~x =  ..  and ~y =  .. 
xn yn

be vectors in Rn . For (2) we have

~x · ~y = x1 y1 + · · · + xn yn = y1 x1 + · · · + yn xn = ~y · ~x.

Now to prove (4), we have

~x · ~x = x1 x1 + · · · + xn xn = x21 + · · · + x2n = k~xk2 .

For (5),
(c~x) · ~y = (cx1 )y1 + · · · + (cxn )yn = c(x1 y1 + · · · + xn yn ) = c(~x · ~y ).
That ~x · (c~y ) = c(~x · ~y ) is shown similarly.
Property (4) of Theorem 7.10 shows how the norm and dot product are related. Together,
norms and dot products lead to a nice geometric interpretation about angles between vectors.
Given two vectors ~x, ~y ∈ Rn , they determine an angle θ as shown in Figure 22. We restrict
θ to 0 ≤ θ ≤ π to avoid multiple values for θ and to avoid reflex angles.

Figure 22: Acute, Obtuse and Orthogonal angles determined by vectors.

44
Theorem 7.11. For two nonzero vectors ~x, ~y ∈ Rn determining an angle θ,

~x · ~y = k~xkk~y k cos θ

Proof. Consider the triangle determined by the vectors ~x, ~y and ~x − ~y .

From the Cosine Law, we have

k~x − ~y k2 = k~xk2 + k~y k2 − 2k~xkk~y k cos θ. (10)

Using Theorem 7.10, we obtain

k~x − ~y k2 = (~x − ~y ) · (~x − ~y )


= (~x − ~y ) · ~x − (~x − ~y ) · ~y
= ~x · ~x − ~y · ~x − ~x · ~y + ~y · ~y
= k~xk2 − 2(~x · ~y ) + k~y k2 .

Thus (10) becomes

k~xk2 − 2(~x · ~y ) + k~y k2 = k~xk2 + k~y k2 − 2k~xkk~y k cos θ

and subtracting k~xk2 + k~y k2 from both sides and then multiplying both sides by −1/2 gives
~x · ~y = k~xkk~y k cos θ as required.
As an easy consequence of Theorem 7.11, we have the Cauchy-Schwarz Inequality, which
states that the size of the dot product of two vectors ~x, ~y ∈ Rn cannot exceed the product
of their norms. Note that the Cauchy-Schwarz Inequality holds for any vectors in Rn .
14
Corollary 7.12 (Cauchy-Schwarz Inequality). For any two vectors ~x, ~y ∈ Rn , we have

|~x · ~y | ≤ k~xkk~y k.
14
A Corollary is a result that follows from the preceding Theorem.

45
Proof. The inequality holds (with equality) if either ~x = ~0 or ~y = ~0. Thus, we assume both
~x and ~y are nonzero. Then, by Theorem 7.11, ~x · ~y = k~xkk~y k cos θ. Taking absolute values
of both sides gives
|~x · ~y | = k~xkk~y k| cos θ| ≤ k~xkk~y k
where we have used the fact that | cos θ| ≤ 1 for any θ ∈ R.
The result of Theorem 7.11 can also be rearranged in order to compute the angle determined
by two nonzero vectors ~x and ~y . Indeed, for nonzero ~x and ~y we have that k~xk, k~y k > 0 and
so from ~x · ~y = k~xkk~y k cos θ we obtain

cos θ = (~x · ~y ) / (k~xkk~y k). (11)

Given that 0 ≤ θ ≤ π, we can solve for θ using

θ = cos^(−1)( (~x · ~y ) / (k~xkk~y k) ).

Example 7.13. Compute the angle between the vectors

~x = [2, 1, −1]^T and ~y = [1, −1, −2]^T.

Solution. We have that

cos θ = (~x · ~y ) / (k~xkk~y k) = (2(1) + 1(−1) − 1(−2)) / (sqrt(4 + 1 + 1) sqrt(1 + 1 + 4)) = 3 / (sqrt(6) sqrt(6)) = 1/2

so

θ = cos^(−1)(1/2) = π/3.
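If you want to check an angle computation like this numerically, a short NumPy sketch (illustrative only, not part of the course) implements Equation (11) directly:

```python
import numpy as np

x = np.array([2.0, 1.0, -1.0])
y = np.array([1.0, -1.0, -2.0])

cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_theta)       # valid since 0 <= theta <= pi

print(cos_theta)                   # 0.5
print(theta, np.pi / 3)            # both approximately 1.047197...
```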
For nonzero vectors ~x, ~y ∈ Rn determining an angle θ, we are often not interested in the spe-
cific value of θ, but rather in the approximate size of θ. That is, we are often only concerned
if ~x and ~y determine an acute angle, an obtuse angle, or if the vectors are orthogonal (refer
to Figure 22). Recalling that
cos θ > 0 for 0 ≤ θ < π/2
cos θ = 0 for θ = π/2
cos θ < 0 for π/2 < θ ≤ π

46
we see from Equation (11) that the sign of cos θ is determined by the sign of ~x · ~y since
k~xkk~y k > 0. Thus
~x · ~y > 0 ⇐⇒ 0 ≤ θ < π/2 ⇐⇒ ~x and ~y determine an acute angle
~x · ~y = 0 ⇐⇒ θ = π/2 ⇐⇒ ~x and ~y are orthogonal
~x · ~y < 0 ⇐⇒ π/2 < θ ≤ π ⇐⇒ ~x and ~y determine an obtuse angle
Example 7.14. For

~x = [1, 2]^T and ~y = [6, −2]^T,

we compute

~x · ~y = 1(6) + 2(−2) = 2 > 0

and so ~x and ~y determine an acute angle.

Note that to find the exact angle determined by ~x and ~y in the previous example we compute

cos θ = (~x · ~y ) / (k~xkk~y k) = 2 / (sqrt(1 + 4) sqrt(36 + 4)) = 2 / (sqrt(5) sqrt(40)) = 2/sqrt(200) = 2/(10 sqrt(2)) = 1/(5 sqrt(2))

so

θ = cos^(−1)( 1/(5 sqrt(2)) )

which is our exact answer for θ, as any computer or calculator will return an approximation of this value.

47
Lecture 8
We have defined the norm for any vector in Rn and the dot product for any two vectors
in Rn . However, our work with angles determined by vectors has required that our vectors
be nonzero thus far. Now since ~x · ~0 = 0 for every ~x ∈ Rn , we define the zero vector to
be orthogonal to every vector in Rn . Thus we may simply say that two vectors ~x, ~y ∈ Rn
are orthogonal if and only if ~x · ~y = 0 and not insist that ~x, ~y are nonzero. Although the
zero vector of Rn is orthogonal to all vectors in Rn , we don’t explicitly compute the angle ~0
makes with another vector ~x ∈ Rn since
~x · ~y
cos θ =
k~xkk~y k

is not defined if either of ~x or ~y is the zero vector. Thus, we interpret ~x and ~y being orthogonal
to mean that their dot product is zero, and if they are both nonzero, then they determine
an angle of π2 .

Complex Vectors
The idea here is to extend our work in Rn to vectors whose entries are complex numbers.
For z1, . . . , zn ∈ C, we define

~z = [z1, ..., zn]^T

to be a complex vector (a vector with complex entries) and

Cn = { [z1, ..., zn]^T | z1, . . . , zn ∈ C }

to be the set of all complex vectors with n entries. For ~z, ~w ∈ Cn, we define vector addition by

~z + ~w = [z1, ..., zn]^T + [w1, ..., wn]^T = [z1 + w1, ..., zn + wn]^T

and for any α ∈ C we define scalar multiplication by

α~z = [αz1, ..., αzn]^T.

48
Addition and scalar multiplication for vectors in Cn are defined in the same way as for vec-
tors in Rn . However, the norm and the dot product don’t behave quite the same way. Note
that for a vector ~x ∈ Rn , we have that k~xk ∈ R, k~xk ≥ 0 and that k~xk2 = ~x · ~x. We would
like to extend these operations to Cn in such a way that these properties still hold.

Consider

~z = [z1, z2]^T = [1 + j, 2 − j]^T.

We compute

~z · ~z = z1^2 + z2^2 = (1 + j)^2 + (2 − j)^2 = 3 − 2j.

Thus, ~z · ~z ∉ R, so defining a norm so that k~z k^2 = ~z · ~z would mean that k~z k ∉ R. Instead, consider (writing conj(z) for the complex conjugate of z)

conj(z1) z1 + conj(z2) z2 = (1 − j)(1 + j) + (2 + j)(2 − j) = 7
which is a nonnegative real number. This motivates the following definition.

Definition 8.1. Let ~z = [z1, ..., zn]^T and ~w = [w1, ..., wn]^T be vectors in Cn. The complex inner product 15 of ~z and ~w is

h~z, ~wi = conj(z1) w1 + · · · + conj(zn) wn

and the norm of ~z is

k~z k = sqrt( conj(z1) z1 + · · · + conj(zn) zn ).

We make a few remarks here.

• For ~z ∈ Cn , we have

h~z, ~z i = conj(z1) z1 + · · · + conj(zn) zn = |z1|^2 + · · · + |zn|^2

so h~z, ~z i ∈ R with h~z, ~z i ≥ 0. We also see that k~z k^2 = h~z, ~z i, from which it follows that k~z k = sqrt( h~z, ~z i ) is a nonnegative real number.

• The dot product ~z · ~w will still be useful later.

• If ~z, ~w ∈ Rn , then h~z, ~wi = ~z · ~w.
15
The definition of the complex inner product given here is the one used by engineers. Mathematicians
define the complex inner product as h~z, wi
~ = z1 w1 + · · · + zn wn , that is, engineers put the complex conjugate
on the first entry in each term of the sum, whereas mathematicians put it on the second entry. We will
use the definition where the conjugate appears on the first entries, but be careful if you pick up a different
Linear Algebra text!

49
For

~z = [z1, ..., zn]^T ∈ Cn

we define

conj(~z) = [conj(z1), ..., conj(zn)]^T ∈ Cn

from which we see

h~z, ~wi = conj(~z) · ~w.

From this, we can see that the complex inner product of ~z, ~w ∈ Cn can be viewed as a dot product of conj(~z) and ~w rather than of ~z and ~w.

Example 8.2. Let

~z = [2 − 2j, 1 + j]^T and ~w = [2 + j, 3]^T

be two vectors in Cn . Compute h~z, ~wi and h~w, ~z i.

Solution.

h~z, ~wi = conj(2 − 2j)(2 + j) + conj(1 + j)(3)
= (2 + 2j)(2 + j) + (1 − j)(3)
= (2 + 6j) + (3 − 3j)
= 5 + 3j

and

h~w, ~z i = conj(2 + j)(2 − 2j) + conj(3)(1 + j)
= (2 − j)(2 − 2j) + (3)(1 + j)
= (2 − 6j) + (3 + 3j)
= 5 − 3j
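As a numerical check (illustrative only, not part of the course), NumPy's vdot function uses the same convention as Definition 8.1, conjugating its first argument:

```python
import numpy as np

z = np.array([2 - 2j, 1 + 1j])
w = np.array([2 + 1j, 3 + 0j])

# np.vdot conjugates its *first* argument, matching the engineers' convention
print(np.vdot(z, w))                   # (5+3j)
print(np.vdot(w, z))                   # (5-3j)
print(np.sqrt(np.vdot(z, z).real))     # norm of z: sqrt(8 + 2) = sqrt(10)
```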

The last example shows us that for ~z, ~w ∈ Cn , h~z, ~wi ≠ h~w, ~z i in general.

Theorem 8.3 (Properties of Complex Inner Products). Let ~v , ~w, ~z ∈ Cn and α ∈ C. Then

(1) h~z, ~z i ≥ 0 with equality if and only if ~z = ~0

(2) h~z, ~wi = conj( h~w, ~z i )

(3) h~v + ~w, ~z i = h~v , ~z i + h~w, ~z i and h~z, ~v + ~w i = h~z, ~v i + h~z, ~w i

(4) hα~z, ~w i = conj(α) h~z, ~w i and h~z, α~w i = α h~z, ~w i

(5) |h~z, ~w i| ≤ k~z k k~wk (Cauchy–Schwarz Inequality)

(6) k~z + ~wk ≤ k~z k + k~wk (Triangle Inequality)

Proof. We prove (4). Let ~z = [z1, ..., zn]^T and ~w = [w1, ..., wn]^T be vectors in Cn and α ∈ C. Then

hα~z, ~w i = conj(αz1) w1 + · · · + conj(αzn) wn
= conj(α) conj(z1) w1 + · · · + conj(α) conj(zn) wn
= conj(α) ( conj(z1) w1 + · · · + conj(zn) wn )
= conj(α) h~z, ~w i

and

h~z, α~w i = conj(z1)(αw1) + · · · + conj(zn)(αwn)
= α conj(z1) w1 + · · · + α conj(zn) wn
= α ( conj(z1) w1 + · · · + conj(zn) wn )
= α h~z, ~w i

We end this section with a few comments.

• From Theorem 8.3(2), we see that the complex inner product does not commute, that is, for ~z, ~w ∈ Cn , h~z, ~wi ≠ h~w, ~z i in general. Of course, if ~z, ~w happen to have all real entries, then h~z, ~wi = h~w, ~z i. Thus, when we say the complex inner product doesn’t commute, we mean that there exist ~z, ~w ∈ Cn so that h~z, ~wi ≠ h~w, ~z i. Of course, we have that h~z, ~wi = conj( h~w, ~z i ), so knowing the value of h~z, ~wi allows us to easily compute h~w, ~z i.

• It might seem that the complex inner product is “made up” as it was our attempt
to make sure h~z, ~z i is a nonnegative real number. However, given that the complex
inner product obeys the Triangle Inequality and the Cauchy–Schwarz Inequality as
well as many of the other properties dot products satisfy for vectors in Rn , it should
be apparent that the complex inner product was the correct choice.

• Generalization is very common in mathematics. We have defined vectors in Rn along


with how to add them and multiply them by scalars. Then we generalized those ideas to
vectors in Cn and saw that everything worked the same way. We also defined the norm
and the dot product for vectors in Rn and then generalized those properties to vectors

51
in Cn , however this time, we had to do a little bit of work to make sure everything still
made sense. Near the end of the course, we will extend these ideas further – vector
addition and scalar multiplication can be applied to objects other than the vectors in
Rn and Cn and studying collections of such objects is one of the things that makes
Linear Algebra such an interesting and beautiful branch of mathematics.

The Cross Product in R3


We return to the relative safety of real numbers. We now define a product of two vectors
that is only valid16 in R3 . Whereas the dot product of two vectors in R3 is a real number,
the cross product of two vectors in R3 is a vector in R3 .

Definition 8.4. Let ~x = [x1, x2, x3]^T and ~y = [y1, y2, y3]^T be two vectors in R3 . The cross product 17 of ~x and ~y is

~x × ~y = [ x2 y3 − y2 x3 , −(x1 y3 − y1 x3) , x1 y2 − y1 x2 ]^T

Example 8.5. Let

~x = [1, 6, 3]^T and ~y = [−1, 3, 2]^T.

Then

~x × ~y = [ 6(2) − 3(3) , −(1(2) − (−1)(3)) , 1(3) − (−1)(6) ]^T = [3, −5, 9]^T.

From this example, we compute

~x · (~x × ~y ) = 1(3) + 6(−5) + 3(9) = 3 − 30 + 27 = 0
~y · (~x × ~y ) = −1(3) + 3(−5) + 2(9) = −3 − 15 + 18 = 0

from which we see that ~x × ~y is orthogonal to both ~x and ~y .
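A quick numerical check of this example (illustrative only, assuming NumPy) computes the cross product and verifies the two dot products are zero:

```python
import numpy as np

x = np.array([1.0, 6.0, 3.0])
y = np.array([-1.0, 3.0, 2.0])

n = np.cross(x, y)
print(n)              # [ 3. -5.  9.]
print(np.dot(x, n))   # 0.0, so n is orthogonal to x
print(np.dot(y, n))   # 0.0, so n is orthogonal to y
```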


16
This is not entirely true. There is a cross product in R7 as well, but it is beyond the scope of this course.
17
The cross product is sometimes called the vector product because the result is also a vector.

52
The formula for ~x × ~y is quite tedious to remember. Here we give a simpler way. For a, b, c, d ∈ R, define

| a b |
| c d | = ad − bc

so that for ~x = [x1, x2, x3]^T and ~y = [y1, y2, y3]^T the entries of ~x × ~y are

first entry:   | x2 y2 |
               | x3 y3 |      ←− remove x1 and y1

second entry: −| x1 y1 |
               | x3 y3 |      ←− remove x2 and y2 (don’t forget the “−” sign)

third entry:   | x1 y1 |
               | x2 y2 |      ←− remove x3 and y3

that is,

~x × ~y = [ x2 y3 − y2 x3 , −(x1 y3 − y1 x3) , x1 y2 − y1 x2 ]^T.

It’s a good idea to try this “trick” using the above example.

53
Lecture 9
Theorem 9.1 (Properties of Cross Products). Let ~x, ~y , ~w ∈ R3 , c ∈ R. Then
(1) ~x × ~y ∈ R3

(2) ~x × ~y is orthogonal to both ~x and ~y

(3) ~x × ~0 = ~0 = ~0 × ~x

(4) ~x × ~x = ~0

(5) ~x × ~y = −(~y × ~x )

(6) (c~x ) × ~y = c(~x × ~y ) = ~x × (c~y )

(7) ~w × (~x ± ~y ) = (~w × ~x ) ± (~w × ~y )

(8) (~x ± ~y ) × ~w = (~x × ~w ) ± (~y × ~w )
Proof. We prove (5). Let ~x = [x1, x2, x3]^T and ~y = [y1, y2, y3]^T. Then

~x × ~y = [ x2 y3 − y2 x3 , −(x1 y3 − y1 x3) , x1 y2 − y1 x2 ]^T
       = [ −(y2 x3 − x2 y3) , y1 x3 − x1 y3 , −(y1 x2 − x1 y2) ]^T
       = − [ y2 x3 − x2 y3 , −(y1 x3 − x1 y3) , y1 x2 − x1 y2 ]^T
       = −(~y × ~x ).

Example 9.2. Let ~x, ~y ∈ R3 be parallel vectors. Compute ~x × ~y .


Solution. Let ~x, ~y ∈ R3 be parallel vectors. Then ~y = c~x for some c ∈ R. Using Theorem
9.1(4),(6) we have
~x × ~y = ~x × (c~x ) = c(~x × ~x) = c(~0) = ~0.
~ ∈ R3
Example 9.3. Show that the cross product is not associative. That is, find ~x, ~y , w
such that
(~x × ~y ) × w
~ 6= ~x × (~y × w
~ ).
Solution. Consider

~x = [1, 1, 0]^T , ~y = [0, 1, 0]^T and ~w = [0, 0, 1]^T.

54
Then

(~x × ~y ) × ~w = ( [1, 1, 0]^T × [0, 1, 0]^T ) × [0, 0, 1]^T = [0, 0, 1]^T × [0, 0, 1]^T = [0, 0, 0]^T

and

~x × (~y × ~w ) = [1, 1, 0]^T × ( [0, 1, 0]^T × [0, 0, 1]^T ) = [1, 1, 0]^T × [1, 0, 0]^T = [0, 0, −1]^T

so we see that (~x × ~y ) × ~w ≠ ~x × (~y × ~w ). Thus, the cross product is not associative.
Since the cross product is not associative, the expression ~x × ~y × w ~ is undefined. We must
always include brackets to indicate in which order we should evaluate the cross products as
changing the order will change the result. Also note that the cross product is not commu-
tative as ~x × ~y ≠ ~y × ~x. However, since ~x × ~y = −(~y × ~x), we say that the cross product
is anti-commutative, that is, changing the order of ~x and ~y in the cross product changes the
result by a factor of −1.
Example 9.4. Find a nonzero vector orthogonal to both
   
~x = [1, 2, 3]^T and ~y = [1, −1, −1]^T.

Moreover, show that this vector is orthogonal to any linear combination of ~x and ~y .

Solution. Using Theorem 9.1(2), we have that

~n = ~x × ~y = [1, 2, 3]^T × [1, −1, −1]^T = [1, 4, −3]^T
is orthogonal to both ~x and ~y . Now for any s, t ∈ R,
~n · (s~x + t~y ) = s(~n · ~x) + t(~n · ~y ) = s(0) + t(0) = 0
so ~n = ~x × ~y is orthogonal to any linear combination of ~x and ~y .
Example 9.4 demonstrates one of the main uses of the cross product in R3 . Given two non
parallel vectors ~x, ~y ∈ R3 , it is quite useful to find a nonzero vector that is orthogonal to both
~x and ~y (and hence to any linear combination of them). Also, we note here that once the
cross product of ~x, ~y ∈ R3 is computed, we can check that our work is correct by verifying
that (~x × ~y ) · ~x = 0 = (~x × ~y ) · ~y .

We now look at how the cross product can be used to compute the area of a parallelogram.
We will need the following result which is stated without proof.

55
Theorem 9.5 (Lagrange Identity). Let ~x, ~y ∈ R3 . Then k~x × ~y k2 = k~x k2 k~y k2 − (~x · ~y )2 .

Let ~x, ~y ∈ R3 be nonzero vectors. Then

~x · ~y = k~x kk~y k cos θ

where 0 ≤ θ ≤ π. Substituting this into the Lagrange Identity gives

k~x × ~y k2 = k~x k2 k~y k2 − (k~x kk~y k cos θ)2


= k~x k2 k~y k2 − k~x k2 k~y k2 cos2 θ
= k~x k2 k~y k2 (1 − cos2 θ)
= k~x k2 k~y k2 sin2 θ.

Since sin θ ≥ 0 for 0 ≤ θ ≤ π, we may take square roots to obtain

k~x × ~y k = k~x kk~y k sin θ.

Now consider the parallelogram determined by the nonzero vectors ~x and ~y .

Figure 23: The parallelogram determined by ~x and ~y .

Denoting the base by b and the height by h, we see that b = k~x k and that h satisfies
sin θ = h / k~y k, which gives h = k~y k sin θ. Denoting the area of the parallelogram by A, we have

A = bh = k~x kk~y k sin θ = k~x × ~y k.

We see that the norm of the cross product of two nonzero vectors ~x, ~y ∈ R3 gives the area
of the parallelogram that ~x and ~y determine. Our derivation has been for nonzero vectors ~x
and ~y , and we implicitly assumed that ~x and ~y were not parallel in the above diagram. Note
that if ~x and ~y are parallel, then the parallelogram they determine is simply a line segment
(a degenerate parallelogram) and thus the area is zero. Moreover, if any of ~x and ~y are zero,
then the area of the resulting parallelogram is again zero. Note that in these two cases we
have ~x × ~y = ~0, so our formula A = k~x × ~y k holds for any ~x, ~y ∈ R3 .

56
Example 9.6. Let

~x = [1, 1, 1]^T and ~y = [1, 2, −3]^T.

Find

(a) the area of the parallelogram determined by ~x and ~y .

(b) the area of the triangle determined by ~x and ~y .

Solution.

(a) Since

~x × ~y = [1, 1, 1]^T × [1, 2, −3]^T = [−5, 4, 1]^T,

the area of the parallelogram is

A = k~x × ~y k = sqrt(25 + 16 + 1) = sqrt(42).

(b) The area of the triangle determined by ~x and ~y is half of the area of the parallelogram determined by ~x and ~y , that is, the area of the triangle is (1/2) sqrt(42) (see Figure 24).
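As a numerical check of Example 9.6 (illustrative only, assuming NumPy), the area formula A = k~x × ~y k can be evaluated directly:

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0])
y = np.array([1.0, 2.0, -3.0])

area_parallelogram = np.linalg.norm(np.cross(x, y))   # sqrt(42)
area_triangle = area_parallelogram / 2                # sqrt(42)/2

print(area_parallelogram)   # 6.4807...
print(area_triangle)        # 3.2403...
```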

Figure 24: The triangle determined by ~x and ~y .

The Vector Equation of a Line


In R2 , we define lines by equations18 such as

x2 = mx1 + b or ax1 + bx2 = c


18
Recall we are using x1 , x2 , x3 , . . . rather than x, y, z, . . .

57
where m, a, b, c are constants. How do we describe lines in Rn (for example, in R3 )? It might
be tempting to think the above equations are equations of lines in Rn as well, but this is
not the case. Consider the graph of the line x2 = x1 in R2 . This graph consists of all points
(x1 , x2 ) such that x2 = x1 , which yields a line (see Figure 25). If we consider the equation

Figure 25: The graph of x2 = x1 is a line in R2 .

x2 = x1 in R3 , then we are considering all points (x1 , x2 , x3 ) with the property that x2 = x1 .
Notice that there is no restriction on x3 , so we can take x3 to be any real number. It follows
that the equation x2 = x1 represents a plane in R3 and not a line (see Figure 26).

Figure 26: The graph of x2 = x1 is a plane in R3 . The red line indicates the intersection of
the plane with the x1 x2 −plane.

58
Note that we require two things to describe a line:
1) A point P on the line,

2) A vector d~ in the direction of the line (called a direction vector for the line).
~ where d~ ∈ Rn is nonzero,
Definition 9.7. A line in Rn through a point P with direction d,
is given by the vector equation
 
x1
 .  −→ ~ t ∈ R.
~x =  ..  = OP + td,
xn
−→
Figure 27 shows how the line through P with direction d~ is “drawn out” by the vector OP +td~
as t varies from −∞ to ∞.

−→
Figure 27: The line through P with direction d~ and the vector OP + td~ for a few values of t.
−→
We can also think of the equation ~x = OP + td~ as first moving us from the origin to the
~ This is shown
point P , and then moving from P as far as we like in the direction given by d.
in Figure 28.
Example 9.8. Find the vector equation of the line through the points A(1, 1, −1) and
B(4, 0, −3).
Solution. We first find a direction vector for the line. Since the line passes through the points
A and B, we take the direction vector to be the vector from A to B. That is,
     
d~ = AB = OB − OA = [4, 0, −3]^T − [1, 1, −1]^T = [3, −1, −2]^T.

59
Figure 28: An equivalent way to understand the vector equation ~x = OP + t d~.

Hence, using the point A, we have a vector equation for our line:

~x = OA + t AB = [1, 1, −1]^T + t [3, −1, −2]^T ,   t ∈ R.

Note that the vector equation for a line is not unique. In fact, in Example 9.8, we could
−→
have used the vector BA as our direction vector, and we could have used B as the point on
our line to obtain
   
~x = OB + t BA = [4, 0, −3]^T + t [−3, 1, 2]^T ,   t ∈ R.

Indeed, we can use any known point on the line and any nonzero scalar multiple of the
direction vector for the line when constructing the vector equation. Thus, there are infinitely
many vector equations for a line (see Figure 29).

Finally, given one of the vector equations for the line in Example 9.8, we have
           
~x = [x1, x2, x3]^T = [1, 1, −1]^T + t [3, −1, −2]^T = [1, 1, −1]^T + [3t, −t, −2t]^T = [1 + 3t, 1 − t, −1 − 2t]^T

from which it follows that

60
Figure 29: Two different vector equations for the same line.

x1 = 1 + 3t
x2 = 1 − t, t∈R
x3 = −1 − 2t

which we call the parametric equations of the line. For each choice of t ∈ R, these equations
give the x1 −, x2 − and x3 −coordinates of a point on the line. Note that since the vector
equation for a line is not unique, neither are the parametric equations for a line.
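The parametric equations are convenient for generating points on the line. A small sketch (illustrative only, assuming NumPy) evaluates OA + t d~ for a few values of t; t = 0 returns A and t = 1 returns B:

```python
import numpy as np

p = np.array([1.0, 1.0, -1.0])    # the point A on the line
d = np.array([3.0, -1.0, -2.0])   # a direction vector for the line

for t in [-1.0, 0.0, 1.0, 2.0]:
    print(t, p + t * d)           # a point on the line for each parameter value
```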

61
Lecture 10

The Vector Equation of a Plane


We extend the vector equation for a line in Rn to a vector equation for a plane in Rn .
Definition 10.1. The vector equation for a plane in Rn through a point P is given by
 
~x = [x1, ..., xn]^T = OP + s~u + t~v ,   s, t ∈ R

where ~u, ~v ∈ Rn are nonzero nonparallel vectors.


We may think of this vector equation as taking us from the origin to the point P on the
plane, and then adding any linear combination of ~u and ~v to reach any point on the plane.
It is important to note that the parameters s and t are chosen independently of one another,
that is, the choice of one parameter does not determine the choice of the other. See Figure
30.

Figure 30: Using vectors to describe a plane in Rn

Example 10.2. Find a vector equation for the plane containing the points A(1, 1, 1),
B(1, 2, 3) and C(−1, 1, 2).

62
Solution. We compute

AB = OB − OA = [1, 2, 3]^T − [1, 1, 1]^T = [0, 1, 2]^T
AC = OC − OA = [−1, 1, 2]^T − [1, 1, 1]^T = [−2, 0, 1]^T

and note that AB and AC are nonzero and nonparallel. A vector equation is thus

~x = [x1, x2, x3]^T = OA + s AB + t AC = [1, 1, 1]^T + s [0, 1, 2]^T + t [−2, 0, 1]^T ,   s, t ∈ R.

The plane from the previous example is shown in Figure 31. We see that by setting either of
s, t ∈ R to be zero and letting the other parameter be arbitrary, we obtain vector equations
for two lines – each of which lies in the given plane:

~x = OA + s AB = [1, 1, 1]^T + s [0, 1, 2]^T , s ∈ R   and   ~x = OA + t AC = [1, 1, 1]^T + t [−2, 0, 1]^T , t ∈ R.

This is illustrated in Figure 32.

Figure 31: The plane through the points A, B and C

63
Figure 32: A plane and two nonparallel lines in it.

We also note that evaluating the right hand side of the above vector equation gives
         
~x = [x1, x2, x3]^T = [1, 1, 1]^T + s [0, 1, 2]^T + t [−2, 0, 1]^T = [1 − 2t, 1 + s, 1 + 2s + t]^T

from which we derive the parametric equations of the plane:

x1 = 1 − 2t
x2 = 1 + s s, t ∈ R
x3 = 1 + 2s + t

Finally, we note that as with lines, our vector equation for the plane in Example 10.2 is not
unique as we could have chosen
−−→ −−→ −→
~x = OB + sBC + tAB, s, t ∈ R
−−→ −→
as the vector equation instead (it is easy to verify that BC and AB are nonzero and non-
parallel).

Example 10.3. Find a vector equation of the plane containing the point P (1, −1, −2) and
the line with vector equation
   
1 1
~x =  3  + r  1  , r ∈ R.
   

−1 4

64
Solution. We construct two vectors lying in the plane. For one, we can take the direction
vector of the given line, and for the other, we can take a vector from a known point on the
given line to the point P . Thus we let
       
1 1 1 0
~u =  1  and ~v =  −1  −  3  =  −4  .
       

4 −2 −1 −1

Then, since ~u and ~v are nonzero and nonparallel, a vector equation for the plane is
−→
~x = OP + s~u + t~v
     
1 1 0
=  −1  + s  1  + t  −4  , s, t ∈ R.
     

−2 4 −1

We note that for the vector equation for a plane, we do require ~u and ~v to be nonparallel. If
~u and ~v are parallel, say ~u = c~v for some c ∈ R, then the vector equation we derive is
−→ −→ −→
~x = OP + s~u + t~v = OP + s(c~v ) + t~v = OP + (sc + t)~v ,

which is the vector equation for a line through P , not a plane.

The Scalar Equation of a Plane in R3


Given a plane in R3 and any point P on this plane, there is a unique line through that point
that is perpendicular to the plane. Let ~n be a direction vector for this line. Then for any Q
−→
on the plane, ~n is orthogonal to P Q.

Figure 33: A line that is perpendicular to a plane.

65
Definition 10.4. A nonzero vector ~n ∈ R3 is a normal vector for a plane if for any two
−→
points P and Q on the plane, ~n is orthogonal to P Q.

We note that given a plane in R3 , a normal vector for that plane is not unique as any nonzero
scalar multiple of that vector will also be a normal vector for that plane.

Now consider a plane in R3 with a normal vector


 
n1
~n =  n2  ,
 

n3

and suppose P (a, b, c) is a given point on this plane. For any point Q(x1 , x2 , x3 ), Q lies on
the plane if and only if
   
n1 x1 − a
−→ −→ −→ 
0 = ~n · P Q = ~n · OQ − OP =  n2  ·  x2 − b  = n1 (x1 − a) + n2 (x2 − b) + n3 (x3 − c).
  

n3 x3 − c

Definition 10.5. The scalar equation of a plane in R3 with normal vector


 
n1
~n =  n2
 

n3

containing the point P (a, b, c) is given by

n1 x1 + n2 x2 + n3 x3 = n1 a + n2 b + n3 c.

Example 10.6. Find a scalar equation of the plane containing the points A(3, 1, 2), B(1, 2, 3)
and C(−2, 1, 3).

Solution. We have three points lying on the plane, so we only need to find a normal vector
for the plane. We compute
     
AB = OB − OA = [1, 2, 3]^T − [3, 1, 2]^T = [−2, 1, 1]^T
AC = OC − OA = [−2, 1, 3]^T − [3, 1, 2]^T = [−5, 0, 1]^T

66
Figure 34: The normal vector ~n is orthogonal to both AB and AC.

and notice that AB and AC are nonzero nonparallel vectors in R3 . We compute
     
~n = AB × AC = [−2, 1, 1]^T × [−5, 0, 1]^T = [1, −3, 5]^T
and recall that the nonzero vector ~n is orthogonal to both AB and AC. It follows from
Example 9.4 that ~n is orthogonal to the entire plane and is thus a normal vector for the
plane. Hence, using the point A(3, 1, 2), our scalar equation is
1(x1 − 3) − 3(x2 − 1) + 5(x3 − 2) = 0
which evaluates to
x1 − 3x2 + 5x3 = 10.
Figure 34 helps us visualize the plane from the previous example.
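The computation in Example 10.6 is easy to reproduce numerically (an illustrative sketch only, assuming NumPy): the normal vector comes from a cross product, and the right-hand side of the scalar equation is ~n dotted with any point on the plane.

```python
import numpy as np

A = np.array([3.0, 1.0, 2.0])
B = np.array([1.0, 2.0, 3.0])
C = np.array([-2.0, 1.0, 3.0])

n = np.cross(B - A, C - A)           # normal vector [ 1. -3.  5.]
rhs = np.dot(n, A)                   # 10, so the scalar equation is x1 - 3x2 + 5x3 = 10

print(n, rhs)
print(np.dot(n, B), np.dot(n, C))    # both 10, so B and C also satisfy the equation
```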
We make a few remarks about the preceding example here.
• Using the point B or C rather than A to compute the scalar equation would lead to
the same scalar equation as is easily verified.
• As the normal vector for the above plane is not unique, neither is the scalar equation.
In fact, 2~n is also a normal vector for the plane, and using it instead of ~n would lead to
the scalar equation 2x1 − 6x2 + 10x3 = 20, which is just the scalar equation we found
multiplied by a factor of 2.
• From our work above, we see that we can actually compute a vector equation for the
plane:
     
3 −2 −5
−→ −→ −→  
~x = OA + sAB + tAC =  1  + s  1  + t  0  , s, t ∈ R
   

2 1 1

67
−→
for example. In fact, given a vector equation ~x = OP + s~u + t~v for a plane in R3
containing a point P , we can compute a normal vector ~n = ~u × ~v .

• Note that in the scalar equation x1 − 3x2 + 5x3 = 10, the coefficients on the variables
x1 , x2 and x3 are exactly the entries in the normal vector as predicted by the formula
for the scalar equation. Thus, if we are given a scalar equation of a different plane, say
3x1 − 2x2 + 5x3 = 72, we can deduce immediately that

~n = [3, −2, 5]^T

is a normal vector for that plane.


Given a plane in R3 , when is it better to use a vector equation and when is it better to
use a scalar equation? Consider a plane with scalar equation 4x1 − x2 − x3 = 2 and vector
equation      
1 1 1
~x =  1  + s  2  + t  1  , s, t ∈ R.
     

1 2 3
Suppose you are asked if the point (2, 6, 0) lies on this plane. Using the scalar equation
4x1 − x2 − x3 = 2, we see that 4(2) − 1(6) − 1(0) = 2 satisfies this equation so we can easily
conclude that (2, 6, 0) lies on the plane. However, if we use the vector equation, we must
determine if there exist s, t ∈ R such that
       
1 1 1 2
 1  + s 2  + t 1  =  6 
       

1 2 3 0

which leads to the system of equations

s + t = 1
2s + t = 5
2s + 3t = −1

With a little work, we can find that the solution19 to this system is s = 4 and t = −3
which again guarantees that (2, 6, 0) lies on the plane. It should be clear that using a scalar
equation is preferable here. On the other hand, if you are asked to find a point that lies
on the plane, then using the vector equation, we may select any two values for s and t (say
s = 0 and t = 0) to conclude that the point (1, 1, 1) lies on the plane. It is not too difficult
to find a point lying on the plane using the scalar equation either - this will likely be done
19
We will look at a more efficient technique to solve systems of equations in a few lectures.

68
by choosing two of x1 , x2 , x3 and then solving for the last, but this does involve a little bit
more math. Thus, the scalar equation is preferable when verifying if a given point lies on a
plane, and the vector equation is preferable when asked to generate points that lie on the
plane.

We have discussed parallel vectors previously, and we can use this definition to define
parallel lines and planes.

Definition 10.7. Two lines in Rn are parallel if their direction vectors are parallel. Two
planes in R3 are parallel if their normal vectors are parallel.

69
Lecture 11

Projections
Given two vectors ~u, ~v ∈ Rn with ~v 6= ~0, we can write ~u = ~u1 + ~u2 where ~u1 is a scalar
multiple of ~v and ~u2 is orthogonal to ~v . In physics, this is often done when one wishes to
resolve a force into its vertical and horizontal components.

Figure 35: Decomposing ~u ∈ Rn as ~u = ~u1 + ~u2 where ~u1 is parallel to ~v and ~u2 is orthogonal
to ~v .

This is not a new idea. In R2 , we have seen that we can write a vector ~u as a linear
combination of ~e1 = [1, 0]^T and ~e2 = [0, 1]^T in a natural way. Figure 36 shows that we are actually
writing a vector ~u ∈ R2 as the sum of a vector parallel to ~e1 and a vector orthogonal to ~e1 .

Figure 36: Writing a vector ~u ∈ R2 as a linear combination of ~e1 and ~e2 .

Now for ~u, ~v ∈ Rn with ~v 6= ~0,

~u = ~u1 + ~u2 =⇒ ~u2 = ~u − ~u1


~u2 orthogonal to ~v =⇒ ~u2 · ~v = 0
~u1 a scalar multiple of ~v =⇒ ~u1 = t~v for some t ∈ R

70
so if we can find t, then we can find ~u1 and then find ~u2 . To find t, we have

0 = ~u2 · ~v = (~u − ~u1 ) · ~v = ~u · ~v − ~u1 · ~v = ~u · ~v − (t~v ) · ~v .

Hence
0 = ~u · ~v − t(~v · ~v ) = ~u · ~v − tk~v k2
and since ~v 6= ~0,
~u · ~v
t= .
k~v k2

Definition 11.1. Let ~u, ~v ∈ Rn with ~v 6= ~0. The projection of ~u onto ~v is


~u · ~v
proj ~v ~u = ~v
k~v k2

and the projection of ~u perpendicular to ~v (or the perpendicular of ~u onto ~v ) is

perp ~v ~u = ~u − proj ~v ~u.

Note that from our above work, ~u1 = proj ~v ~u and ~u2 = perp ~v ~u.

Figure 37: Visualizing projections and perpendiculars based on the angle determined by
~u, ~v ∈ Rn .

Example 11.2. Let

~u = [1, 2, 3]^T and ~v = [−1, 1, 2]^T.

71
Then

proj ~v ~u = ((~u · ~v ) / k~v k^2) ~v = ((−1 + 2 + 6)/(1 + 1 + 4)) [−1, 1, 2]^T = (7/6) [−1, 1, 2]^T = [−7/6, 7/6, 7/3]^T

and

perp ~v ~u = ~u − proj ~v ~u = [1, 2, 3]^T − [−7/6, 7/6, 7/3]^T = [13/6, 5/6, 2/3]^T.

In the previous example, note that

• proj ~v ~u = (7/6) ~v which is a scalar multiple of ~v ,

• (perp ~v ~u) · ~v = −13/6 + 5/6 + 4/3 = −8/6 + 8/6 = 0 so perp ~v ~u is orthogonal to ~v ,

• proj ~v ~u + perp ~v ~u = ~u.
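A short numerical sketch (illustrative only, assuming NumPy) reproduces Example 11.2 and confirms the three observations above:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([-1.0, 1.0, 2.0])

proj = (np.dot(u, v) / np.dot(v, v)) * v   # [-7/6, 7/6, 7/3]
perp = u - proj                            # [13/6, 5/6, 2/3]

print(proj, perp)
print(np.dot(perp, v))    # 0.0 up to rounding, so perp is orthogonal to v
print(proj + perp)        # recovers u
```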

Example 11.3. For ~u, ~v ∈ Rn with ~v 6= ~0, prove that proj ~v ~u and perp ~v ~u are orthogonal.

Proof. We have

proj ~v ~u · perp ~v ~u = proj ~v ~u · (~u − proj ~v ~u)


= (proj ~v ~u) · u − proj ~v ~u · proj ~v ~u
     
~u · ~v ~u · ~v ~u · ~v
= ~v · ~u − ~v · ~v
k~v k2 k~v k2 k~v k2
   2
~u · ~v ~u · ~v
= (~v · ~u) − (~v · ~v )
k~v k2 k~v k2
(~u · ~v )2 (~u · ~v )2
 
= − k~v k2
k~v k2 k~v k4
(~u · ~v )2 (~u · ~v )2
= −
k~v k2 k~v k2
=0

and thus proj ~v ~u and perp ~v ~u are orthogonal.

Distances from Points to Lines and Planes


Given a point P , and a line (or a plane), we are interested in finding the point Q on the line
(or the plane) that is closest to P , and also the shortest distance from P to the line (or the
plane).

72
Example 11.4. Find the shortest distance from the point P (1, 2, 3) to the line L which
passes through the point P0 (2, −1, 2) with direction vector
 
d~ = [1, 1, −1]^T.

Also, find the point Q on L that is closest to P .

Before we state the solution, we illustrate the situation in Figure 38. Note that the line L and
the point P were plotted arbitrarily, so it is not meant to be accurate. It does however, give
us a way to think about the problem geometrically and inform us as to what computations
we should do.

Figure 38: Finding the distance from a point to a line.

Solution. We construct the vector from the point P0 lying on the line to the point P which
gives

P0P = OP − OP0 = [1, 2, 3]^T − [2, −1, 2]^T = [−1, 3, 1]^T.
−−→
Projecting the vector P0 P onto the direction vector of the line leads to
     
proj d~ P0P = ((P0P · d~) / kd~k^2) d~ = ((−1 + 3 − 1)/(1 + 1 + 1)) [1, 1, −1]^T = (1/3) [1, 1, −1]^T = [1/3, 1/3, −1/3]^T

73
and it follows that
     
perp d~ P0P = P0P − proj d~ P0P = [−1, 3, 1]^T − [1/3, 1/3, −1/3]^T = [−4/3, 8/3, 4/3]^T.

The shortest distance from P to L is thus given by


kperp d~ P0P k = (1/3) sqrt(16 + 64 + 16) = (1/3) sqrt(16(1 + 4 + 1)) = (4/3) sqrt(6).
We have two ways to find the point Q since
     
OQ = OP0 + proj d~ P0P = [2, −1, 2]^T + [1/3, 1/3, −1/3]^T = [7/3, −2/3, 5/3]^T

and

OQ = OP − perp d~ P0P = [1, 2, 3]^T − [−4/3, 8/3, 4/3]^T = [7/3, −2/3, 5/3]^T.

In either case, Q(7/3, −2/3, 5/3) is the point on L closest to P .
Now we see that Figure 38 was indeed inaccurate: it suggests that proj d~ P0P is approximately (5/2) d~, but our computations show that proj d~ P0P = (1/3) d~.
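A numerical version of Example 11.4 (illustrative only, assuming NumPy) follows the same steps: project P0P onto d~, take the perpendicular part, and measure its length.

```python
import numpy as np

P  = np.array([1.0, 2.0, 3.0])
P0 = np.array([2.0, -1.0, 2.0])
d  = np.array([1.0, 1.0, -1.0])

v = P - P0                                 # the vector from P0 to P
proj = (np.dot(v, d) / np.dot(d, d)) * d
perp = v - proj

print(np.linalg.norm(perp))   # 3.2659... = (4/3) * sqrt(6)
print(P0 + proj)              # the closest point Q = [7/3, -2/3, 5/3]
```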

Example 11.5. Find the shortest distance from the point P (1, 2, 3) to the plane T with
equation x1 + x2 − 3x3 = −2. Also, find the point Q on T that is closest to P .

Figure 39 can be used to help us decide how to proceed.


Solution. We see that P0 (−2, 0, 0) lies on T since −2 + 0 − 3(0) = −2. We also have that
 
~n = [1, 1, −3]^T

is a normal vector for T . Now



    
1 −2 3
−−→ −→ −−→   
P0 P = OP − OP0 =  2  −  0  =  2 
  

3 0 3

74
−→ −−→
Figure 39: Finding the distance from a point to a plane. Note that kQP k = kproj ~n P0 P k.

and
   
proj ~n P0P = ((P0P · ~n) / k~nk^2) ~n = ((3 + 2 − 9)/(1 + 1 + 9)) [1, 1, −3]^T = −(4/11) [1, 1, −3]^T.

The shortest distance from P to T is



kproj ~n P0P k = |−4/11| sqrt(1 + 1 + 9) = 4 sqrt(11) / 11.

To find Q we have
    
OQ = OP − proj ~n P0P = [1, 2, 3]^T + (4/11) [1, 1, −3]^T = [15/11, 26/11, 21/11]^T

so Q(15/11, 26/11, 21/11) is the point on T closest to P .
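The same check works for Example 11.5 (illustrative only, assuming NumPy), this time projecting onto the normal vector of the plane:

```python
import numpy as np

P  = np.array([1.0, 2.0, 3.0])
P0 = np.array([-2.0, 0.0, 0.0])     # a point on the plane x1 + x2 - 3x3 = -2
n  = np.array([1.0, 1.0, -3.0])     # normal vector read off the scalar equation

proj = (np.dot(P - P0, n) / np.dot(n, n)) * n

print(np.linalg.norm(proj))   # 1.2060... = 4*sqrt(11)/11
print(P - proj)               # the closest point Q = [15/11, 26/11, 21/11]
```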

75
Lecture 12

Volumes of Parallelepipeds in R3
Consider three nonzero vectors w, ~ ~x, ~y ∈ R3 such that no one vector is a linear combination
of the other two (that is, w,
~ ~x, ~y are nonzero and nonparallel and no one of them lies on the
plane determined by the other two20 ). These three vectors determine a parallelepiped, which
is the three dimensional analogue of a parallelogram.

Figure 40: A parallelepiped determined by the vectors ~w, ~x and ~y .

The volume of the parallelepiped is the product of its height with the area of its base. We
know that the area of the base is given by k~x ×~y k (which is nonzero since ~x and ~y are nonzero
and nonparallel), and we can find the height by computing the length of the projection of w ~
onto ~x × ~y . Thus, the volume V of the parallelepiped is given by

V = k~x × ~y k kproj ~x×~y ~w k
  = k~x × ~y k k ((~w · (~x × ~y )) / k~x × ~y k^2) (~x × ~y ) k
  = k~x × ~y k ( |~w · (~x × ~y )| / k~x × ~y k^2 ) k~x × ~y k
  = |~w · (~x × ~y )|
20
~ ~x and ~y are equivalent to the set {w,
As we will see shortly, our conditions on w, ~ ~x, ~y } being linearly
independent.

76
Example 12.1. Let

~w = [1, 1, 1]^T , ~x = [1, 1, 2]^T and ~y = [1, 2, −3]^T.

Then

~w · (~x × ~y ) = [1, 1, 1]^T · ( [1, 1, 2]^T × [1, 2, −3]^T ) = [1, 1, 1]^T · [−7, 5, 1]^T = −7 + 5 + 1 = −1

so the volume of the parallelepiped determined by ~w, ~x and ~y is

V = |~w · (~x × ~y )| = |−1| = 1.
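The scalar triple product is a one-line computation (illustrative sketch only, assuming NumPy):

```python
import numpy as np

w = np.array([1.0, 1.0, 1.0])
x = np.array([1.0, 1.0, 2.0])
y = np.array([1.0, 2.0, -3.0])

volume = abs(np.dot(w, np.cross(x, y)))   # |w . (x cross y)|
print(volume)                             # 1.0
```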

We make a couple of remarks here:

• In our derivation of the formula for the volume of the parallelepiped determined by
the vectors w,
~ ~x and ~y , there was nothing special about labelling the vectors the way
that we did. We only needed to call one of the vectors w, ~ one of them ~x and one of
them ~y . Also, we could have chosen any of the six sides of the parallelogram to be the
base. Thus, we also have

V = |~w · (~y × ~x)| = |~x · (~w × ~y )| = |~x · (~y × ~w)| = |~y · (~x × ~w)| = |~y · (~w × ~x)|.

• Our derivation also required that no one of the vectors w, ~ ~x and ~y was a linear combi-
nation of the others (so no one of the three vectors lay in the plane through the origin
determined by the other two). Suppose one of the vectors is a linear combination of
the others, say w~ is a linear combination of ~x and ~y . Then w ~ = s~x + t~y for some
s, t ∈ R (from which we see w ~ lies in the plane through the origin determined by ~x
and ~y ). Geometrically, the resulting parallelepiped determined by w, ~ ~x and ~y is “flat”,
and thus the volume should be zero. Since w ~ lies in the plane determined by ~x and ~y ,
we have that w ~ is orthogonal to ~x × ~y and so w ~ · (~x × ~y ) = 0 so our derived formula
does indeed return the correct volume. A similar result occurs if ~x or ~y is a linear
combination of the other two vectors. Thus, our formula V = |w ~ · (~x × ~y )| holds for
3
~ ~x, ~y ∈ R .
any three vectors w,

Introduction to Set Theory


Sets will play a large role in linear algebra, so we need to be able to understand the basic
results concerning them.

77
Definition 12.2. A set is a collection of objects. We call the objects elements of the set.21
Example 12.3.
• S = {1, 2, 3} is a set with three elements, namely 1, 2 and 3,
• T = {♥, f (x), {1, 2}, 3},
• ∅ = { }, the set with no elements, which is called the empty set.
We see that one way to describe a set is to list the elements of the set between curly braces
“{” and “}”. The set T shows that a set can have elements other than numbers - the elements
can be functions, other sets, or other symbols. The empty set has no elements in it, and we
normally prefer using ∅ over { } in this case.
Given a set S, we write x ∈ S if x is an element of S, and x ∉ S if x is not an element of S.
Example 12.4. For T = {♥, f (x), {1, 2}, 3}, we have
♥ ∈ T, f (x) ∈ T, {1, 2} ∈ T and 3 ∈ T
but
1∈
/T and 2 ∈
/T
Example 12.5. Here are a few more sets that we know:
• N = {1, 2, 3, . . .},
• Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .},
na o
• Q= a, b ∈ Z, b 6= 0 ,
b
• R is the set of all numbers that are either rational or irrational,
• C = {a + bj | a, b ∈ R},
  
 x 1 
 .. 
 
n
• R =  .  x1 , . . . , x n ∈ R .

 

xn
Note that each of these sets contains infinitely many elements. The sets N and Z are defined
by listing their elements (or rather, listing enough elements so that you “get the idea”), the
set R is defined using words, and the sets Q, C and Rn are defined using set builder notation
where an arbitrary element is described. For example, the set
na o
Q= a, b ∈ Z, b 6= 0
b
is understood to mean “Q is the set of all fractions of the form ab where a and b are integers
and b is nonzero”.
21
This definition is far from the formal definition, and can lead to contradictions if we are not careful. For
our purposes here, however, this definition will be sufficient.

78
    
Example 12.6. Let S = { [x1, x2, x3]^T ∈ R3 | 2x1 − x2 + x3 = 4 }. Is [1, 2, 3]^T ∈ S?

Solution. Since 2(1) − 2 + 3 = 3 ≠ 4, we have that [1, 2, 3]^T ∉ S.
We now define two ways that we can combine given sets to create new sets.
Definition 12.7. Let S, T be sets. The union of S and T is the set

S ∪ T = {x | x ∈ S or x ∈ T }

and the intersection of S and T is the set

S ∩ T = {x | x ∈ S and x ∈ T }.

We can visualize the union and intersection of two sets using Venn Diagrams. Although
Venn Diagrams can help us visualize sets, they should never be used as part of a proof of
any statement regarding sets.

(a) A Venn Diagram depicting the union (b) A Venn Diagram depicting the inter-
of two sets S and T . section of two sets S and T .

Figure 41: Venn Diagrams.

Example 12.8. If S = {1, 2, 3, 4} and T = {−1, 2, 4, 6, 7}, then

S ∪ T = {−1, 1, 2, 3, 4, 6, 7}
S ∩ T = {2, 4}
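For finite sets like these, Python's built-in set type mirrors the definitions directly (an illustration only; the order in which elements are printed may vary):

```python
S = {1, 2, 3, 4}
T = {-1, 2, 4, 6, 7}

print(S | T)      # union: {1, 2, 3, 4, 6, 7, -1} in some order
print(S & T)      # intersection: {2, 4}
print(2 in S)     # True, i.e. 2 is an element of S
print(5 in S)     # False
```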

Definition 12.9. Let S, T be sets. We say that S is a subset of T (and we write S ⊆ T ) if


for every x ∈ S we have that x ∈ T . If S is not a subset of T , then we write S ⊈ T .

Example 12.10. Let S = {1, 2, 4} and T = {1, 2, 3, 4}. Then S ⊆ T since every element of
S is an element of T , but T ⊈ S since 3 ∈ T , but 3 ∉ S.

79
Figure 42: A Venn diagram showing an instance when S ⊆ T on the left, and an instance
when S 6⊆ T on the right.

Note that it’s important to distinguish between an element of a set and a subset of a set.
For example,
1 ∈ {1, 2, 3} but 1 ⊈ {1, 2, 3}
and
{1} ∉ {1, 2, 3} but {1} ⊆ {1, 2, 3}.
More interestingly,

{1, 2} ∈ {1, 2, {1, 2}} and {1, 2} ⊆ {1, 2, {1, 2}}

which shows that an element of a set may also be a subset of a set. This last example can cause students to stumble, so the following may help: {1, 2} ∈ {1, 2, {1, 2}} because the set {1, 2} is itself listed as one of the elements, and {1, 2} ⊆ {1, 2, {1, 2}} because both 1 and 2 are elements of {1, 2, {1, 2}}.

Finally we mention that for any set S, we have that ∅ ⊆ S. This generally seems quite
strange at first. However if ∅ ⊈ S, then there must be some element x ∈ ∅ such that x ∉ S.
But the empty set contains no elements, so we can never show that ∅ is not a subset of S.
Thus we are forced to conclude that ∅ ⊆ S.22

Definition 12.11. Let S, T be sets. We say that S = T if S ⊆ T and T ⊆ S.

22
The statement ∅ ⊆ S is called vacuously true, that is, it is a true statement simply because we cannot
show that it is false.

80
Example 12.12. Let

S = { c1 [1, 2]^T + c2 [1, 1]^T + c3 [2, 3]^T | c1 , c2 , c3 ∈ R }
T = { d1 [1, 2]^T + d2 [1, 1]^T | d1 , d2 ∈ R }.

Show that S = T .
Before we give the solution, we note that S is the set of all linear combinations of the vectors
" # " # " #
1 1 2
, and
2 1 3

while T is the set of all linear combinations of just


" # " #
1 1
and
2 1

Solution. We show that S = T by showing that S ⊆ T and that T ⊆ S. To show that


S ⊆ T , we choose an arbitrary ~x ∈ S and show that ~x ∈ T . So, let ~x ∈ S. Then there exist
c1 , c2 , c3 ∈ R such that
" # " # " #
1 1 2
~x = c1 + c2 + c3
2 1 3
" # " # " # " #!
1 1 1 1
= c1 + c2 + c3 +
2 1 2 1
" # " #
1 1
= (c1 + c3 ) + (c2 + c3 )
2 1

from which it follows that ~x ∈ T (take d1 = c1 + c3 and d2 = c2 + c3 ) and hence S ⊆ T . We


now show that T ⊆ S by showing that if ~y ∈ T then ~y ∈ S. Let ~y ∈ T . Then there exist
d1 , d2 ∈ R such that
" # " #
1 1
~y = d1 + d2
2 1
" # " # " #
1 1 2
= d1 + d2 +0
2 1 3

from which it follows that ~y ∈ S (take c1 = d1 , c2 = d2 and c3 = 0) and hence T ⊆ S. Since


S ⊆ T and T ⊆ S, we conclude that S = T .

81
Lecture 13

Spanning Sets
Definition 13.1. Let B = {~v1 , . . . , ~vk } be a set of vectors in Rn . The span of B is
Span B = {c1~v1 + · · · + ck~vk | c1 , . . . , ck ∈ R}.
We say that the set Span B is spanned by B and that B is a spanning set for Span B.
It is important to note that there are two sets here: B and Span B. Given that B =
{~v1 , . . . , ~vk } is a set of vectors in Rn , Span B is simply the set of all linear combinations
of the vectors ~v1 , . . . , ~vk . To show that a vector ~x ∈ Rn belongs to Span B, we must show
that we can express ~x as a linear combination of ~v1 , . . . , ~vk . As an example, note that for
i = 1, . . . , k,
~vi = 0~v1 + · · · + 0~vi−1 + 1~vi + 0~vi+1 + · · · + 0~vk
from which we see that vi ∈ Span B for i = 1, . . . , k. This shows that B ⊆ Span B.
Example 13.2. Determine whether or not
" # (" # " #)
2 4 3
∈ Span , .
3 5 3

Solution. Let c1 , c2 ∈ R and consider


" # " # " # " #
2 4 3 4c1 + 3c2
= c1 + c2 = .
3 5 3 5c1 + 3c2

Equating corresponding entries leads to the system of equations

2 = 4c1 + 3c2
3 = 5c1 + 3c2 .
Subtracting the first equation from the second equation gives c1 = 1 and substituting c1 = 1
into either equation and solving for c2 gives c2 = −2/3. Thus

[2, 3]^T = 1 [4, 5]^T − (2/3) [3, 3]^T
" # " # " #
2 4 3
and so can be expressed as a linear combination of and which allows
3 5 3
us to conclude that " # (" # " #)
2 4 3
∈ Span , .
3 5 3
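Finding the coefficients c1, c2 above amounts to solving a small linear system, which can also be done numerically (illustrative only, assuming NumPy; systematic methods for solving such systems come later in the course):

```python
import numpy as np

# the columns of A are the spanning vectors [4, 5] and [3, 3]
A = np.array([[4.0, 3.0],
              [5.0, 3.0]])
b = np.array([2.0, 3.0])          # the vector we are testing

c = np.linalg.solve(A, b)         # solves A c = b for the coefficients
print(c)                          # [ 1.  -0.6666...], i.e. c1 = 1, c2 = -2/3
```

For a system with more equations than unknowns, as in Example 13.3 below, one would instead have to check whether a solution exists at all.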

82
Example 13.3. Determine whether or not
     
1  1
 1 

2 ∈ Span     .
0 , 1
     
 
 
3 1 0
 

Solution. Let c1 , c2 ∈ R and consider


       
1 1 1 c1 + c2
 2  = c1  0  + c2  1  =  c2  .
       

3 1 0 c1

We obtain the system of equations

1 = c1 + c2
2 = c2
3 = c1

It is clear that the last two equations give c1 = 3 and c2 = 2, but from the first equation
we
 have c1 + c2 = 3 + 2 = 5 6= 1, so our system cannot have a solution.
  Here we see that
1 1 1
 2  cannot be expressed as a linear combination of  0  and  1  and we conclude
     

3 1 0
that      
1  1
 1  
2 ∈
/ Span     .
0 , 1
     
 
 
3 1 0
 

Given a set of vectors ~v1 , . . . , ~vk , we now try to understand what Span {~v1 , . . . , ~vk } looks like
geometrically.
Example 13.4. Describe the subset
 
 1 
 
S = Span  2  .
 
 
3
 

of R3 geometrically.
Solution. By definition,

 1 

S = s 2  s ∈ R .
 
 
3
 

83
Thus, ~x ∈ S if and only if  
1
~x = s  2 
 

3
for some s ∈ R. The equation  
1
~x = s  2  , s∈R
 

3
is called a vector equation for S. But we see that this is simply a vector equation for
 a line in
1
3
R through the origin. Hence, S is a line through the origin with direction vector  2 .
 

Example 13.5. Describe the subset


   
 1
 1 
S = Span  0  ,  1 
   
 
1 0
 

of R3 geometrically.

Solution. By definition,
     

 1 1 

S = s  0  + t  1  s, t ∈ R
   
 
1 0
 

so a vector equation for S is


   
1 1
~x = s  0  + t  1  , s, t ∈ R.
   

1 0

Since the vectors    


1 1
 0  and  1 
   

1 0
are not scalar multiples of one another, we see that S is a plane in R3 through the origin23 .
23
The set S is from Example 13.3. In light of what we have observed here, Example 13.3 shows us that
the point P (1, 2, 3) does not lie on the plane S.

84
Example 13.6. Let      
 1
 1 1 
S = Span  0  ,  1  ,  2  .
     
 
0 0 1
 

Show that S = R3 .
Solution. We show S = R3 by showing that S ⊆ R3 and that R3 ⊆ S. To see that S ⊆ R3 ,
note that      
1 1 1
3
 0 , 1 , 2  ∈ R
     

0 0 1
and that S contains all linear combinations of these three vectors. Since R3 is closed under
linear combinations (see V 1 and V 6 from Theorem 6.10), every vector in S must be a vector
in R3 , so S ⊆ R3 . Now, let  
x1
~x =  x2  ∈ R3
 

x3
and for c1 , c2 , c3 ∈ R consider
         
x1 1 1 1 c1 + c2 + c3
 x2  = c1  0  + c2  1  + c3  2  =  c2 + 2c3 
         

x3 0 0 1 c3
We have the system of equations

x 1 = c1 + c2 + c3
x2 = c2 + 2c3
x3 = c3
The last equation gives c3 = x3 , and from the second equation we have that
c2 = x2 − 2c3 = x2 − 2x3 .
Finally, from the first equation, we see
c1 = x1 − c2 − c3 = x1 − (x2 − 2x3 ) − x3 = x1 − x2 + x3 .
Thus        
x1 1 1 1
 x2  = (x1 − x2 + x3 )  0  + (x2 − 2x3 )  1  + x3  2 
       

x3 0 0 1
and it follows that ~x ∈ S so R3 ⊆ S. Hence, S = R3 .

85
It would seem (at least in R3 ) that the span of one vector gives a line through the origin,
the span of two vectors gives a plane through the origin, and the span of three vectors gives
all of R3 . Unfortunately this is not always true as the next example shows.
Example 13.7. Describe the subset
     
 1
 0 1 
S = Span  0  ,  1  ,  1 
     
 
0 0 0
 

of R3 geometrically.
Solution. By definition,
       

 1 0 1 

S = c1  0  + c2  1  + c3  1  c1 , c2 , c3 ∈ R
     
 
0 0 0
 

so a vector equation for S is


     
1 0 1
~x = c1  0  + c2  1  + c3  1  , c1 , c2 , c3 ∈ R.
     

0 0 0

However, we observe that      


1 1 0
 1 = 0 + 1 
     

0 0 0
so
       
1 0 1 0
~x = c1  0  + c2  1  + c3  0  +  1 
       

0 0 0 0
   
1 0
= (c1 + c3 )  0  + (c2 + c3 )  1  .
   

0 0

Setting d1 = c1 + c3 and d2 = c2 + c3 (where c1 , c2 , c3 arbitrary imply d1 , d2 are arbitrary),


we have    
1 0
~x = d1  0  + d2  1  , d1 , d2 ∈ R
   

0 0

86
is also a vector equation for S. Since the vectors
   
1 0
 0  and  1 
   

0 0

are not scalar multiples of one another, we see that S is a plane in R3 through the origin.
From      
1 0 d1
~x = d1  0  + d2  1  =  d2  ,
     

0 0 0
we clearly24 see that S is the x1 x2 −plane of R3 .
In the previous example, one of the vectors in the spanning set for S was a linear combination
of the other vectors in that spanning set. We saw that we could remove that vector from the
spanning set and the resulting smaller set would still span S. It was important to do this as
it allowed us to understand that S was geometrically a plane in R3 through the origin.
Theorem 13.8. Let ~v1 , . . . , ~vk ∈ Rn . One of these vectors, say ~vi , can be expressed as a
linear combination of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk if and only if

Span {~v1 , . . . , ~vk } = Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }.

We make a comment here before giving the proof. The theorem we need to prove is a double
implication as evidenced by the words if and only if.25 Thus we must prove two implications:
1. If ~vi can be expressed as a linear combination of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk , then
Span {~v1 , . . . , ~vk } = Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }
2. If Span {~v1 , . . . , ~vk } = Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }, then ~vi can be expressed as a
linear combination of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk .
The result of this theorem is that the two statements
“ ~vi can be expressed as a linear combination of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk ”
and
“ Span {~v1 , . . . , ~vk } = Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }”
are equivalent, that is, they are both true or they are both false. The proof that follows is
often not understood after just the first reading - it takes a bit of time to understand, so
don’t be discouraged if you need to read it a few times before it begins to make sense.
24
Please be very careful if you use “clearly”, as what seems clear to you might not be clear to someone
else. Many marks have been lost by students due to “clearly” being used when their work was not clear at
all.
25
We sometimes write ⇐⇒ to mean “if and only if”. To prove a statement of the form A ⇐⇒ B, we must
prove the two implications A =⇒ B and B =⇒ A, and so we call A ⇐⇒ B a double implication.
87
Proof. Without loss of generality26 , we assume i = k. To simplify the writing of the proof,
we let

A = Span {~v1 , . . . , ~vk−1 , ~vk }


B = Span {~v1 , . . . , ~vk−1 }.

To prove the first implication, assume that ~vk can be expressed as a linear combination of
~v1 , . . . , ~vk−1 . Then there exist c1 , . . . , ck−1 ∈ R such that

~vk = c1~v1 + · · · + ck−1~vk−1 . (12)

We must show that A = B. Let ~x ∈ A. Then there exist d1 , . . . , dk−1 , dk ∈ R such that

~x = d1~v1 + · · · + dk−1~vk−1 + dk~vk

and we make the substitution for ~vk using Equation (12) to obtain

~x = d1~v1 + · · · + dk−1~vk−1 + dk (c1~v1 + · · · + ck−1~vk−1 )


= (d1 + dk c1 )~v1 + · · · + (dk−1 + dk ck−1 )~vk−1

from which we see that ~x can be expressed as a linear combination of ~v1 , . . . , ~vk−1 and it
follows that ~x ∈ B. Hence A ⊆ B. Now let ~y ∈ B. Then there exist a1 , . . . , ak−1 ∈ R such
that

~y = a1~v1 + · · · + ak−1~vk−1
= a1~v1 + · · · + ak−1~vk−1 + 0~vk

and we have that ~y can be expressed as a linear combination of ~v1 , . . . , ~vk from which it
follows that ~y ∈ A. We have that B ⊆ A and combined with A ⊆ B we conclude that
A = B.

To prove the second implication, we now assume that A = B and we must show that
~vk can be expressed as a linear combination of ~v1 , . . . , ~vk−1 . Since vk ∈ A (recall that
~vk = 0~v1 + · · · + 0~vk−1 + 1~vk ) and A = B, we have ~vk ∈ B. Thus, there exist b1 , . . . , bk−1 ∈ R
such that ~vk = b1~v1 + · · · + bk−1~vk−1 as required.

Example 13.9. Consider


(" # " # " # " #)
1 5 2 0
S = Span , , , .
0 0 4 1
26
What we mean here is that if i 6= k, then we may “rename” the vectors ~v1 , . . . , ~vk so that ~vk is the vector
that can be expressed as a linear combination of the first k − 1 vectors. Thus we just assume i = k. Note
that for i = k, Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk } = Span {~v1 , . . . , ~vk−1 }.

88
Since " # " # " # " # " #
5 1 1 2 0
=5 =5 +0 +0
0 0 0 4 1
Theorem 13.8 gives (" # " # " #)
1 2 0
S = Span , ,
0 4 1
and since " # " # " #
2 1 0
=2 +4
4 0 1
it again follows from Theorem 13.8 that
(" # " #)
1 0
S = Span ,
0 1
and since " # " #
1 0
and
0 1
are not scalar multiples of one another, we cannot remove either of them from the spanning
set without changing the span. A vector equation for S is
" # " #
1 0
~x = c1 + c2 , c1 , c2 ∈ R.
0 1
Combining the vectors on the right gives
" #
c1
~x = .
c2
and it is clear that S = R2 .
Regarding the last example, the vector(s) that were chosen to be removed from the spanning
set depended on us noticing that some were linear combinations of others. Of course, we
could have noticed that " # " # " #
1 1 2 0
= −2
0 2 4 1
and concluded that (" # " # " #)
5 2 0
S = Span , ,
0 4 1
and then continued from there. Indeed, any of
nh i h io nh i h io nh i h io nh i h io
1 2 5 2 5 0 2 0
S = Span 0 , 4 = Span 0 , 4 = Span 0 , 1 = Span 4 , 1

are also correct descriptions of S where the spanning sets cannot be further reduced.

89
Lecture 14

Linear Dependence and Linear Independence


In the previous lecture, we saw that given a spanning set for a set S, if one of the vectors in
the spanning set was a linear combination of the others, then we could remove this vector
and the set of remaining vectors would still span S. Thus far, our method has been to simply
observe whether or not one vector in a given spanning set was a linear combination of the
others. However, suppose we are given
       

 1 −2 1 −6 

 
 2   1   6   −6 
       
S = Span  , , ,  .

  −3   4   8   3  

 

7 8 2 7
It’s likely not immediately clear that
       
−6 1 −2 1
 −6   2   1   6 
 = −  + 2 −
       
 
 3   −3   4   8 
7 7 8 2
and that we can thus remove the last vector from the spanning set for S. Now imagine being
given 500 vectors in R1000 and trying to decide if any one of them is a linear combination of
the other 499 vectors. Inspection clearly won’t help here, so we need a better way to spot
these “dependencies” among a set of vectors should they exist. We make a definition here,
and will see soon how it can help us spot such dependencies.
Definition 14.1. Let B = {~v1 , . . . , ~vk } be a set of vectors in Rn . We say that B is linearly
dependent if there exist c1 , . . . , ck ∈ R, not all zero,27 so that
c1~v1 + · · · + ck~vk = ~0.
We say that B is linearly independent if the only solution to
c1~v1 + · · · + ck~vk = ~0
is c1 = · · · = ck = 0, which we call the trivial solution.
Example 14.2. Is the set A = { [2, 3]^T , [−1, 2]^T }
linearly dependent or linearly independent?
27
When we say c1 , . . . , ck are not all zero, we mean that at least one of c1 , . . . , ck is nonzero.

90
Solution. Let c1 , c2 ∈ R and consider
" # " # " #
2 −1 0
c1 + c2 = .
3 2 0

Equating entries gives


2c1 − c2 = 0
3c1 + 2c2 = 0
If we add twice the first equation to the second equation,28 then we have that 7c1 = 0, that
is, c1 = 0, and substituting c1 = 0 into either of the above equations gives c2 = 0. As
c1 = c2 = 0 is the only solution, we have that A is linearly independent.
Example 14.3. Is the set
     

 1 2 1 
B=  0 , 1 , 1 
     
 
−1 0 1
 

linearly dependent or linearly independent?


Solution. Let c1 , c2 , c3 ∈ R and consider
       
c1 [1, 0, −1]^T + c2 [2, 1, 0]^T + c3 [1, 1, 1]^T = [0, 0, 0]^T.

We obtain
c1 + 2c2 + c3 = 0
c2 + c3 = 0
−c1 + c3 = 0
From the third equation, we see that c1 = c3 and from the second equation we have c2 = −c3 .
Substituting into the first equation gives

c3 + 2(−c3 ) + c3 = 0

which holds for any value of c3 . Thus we let c3 = t, where t ∈ R is any scalar. We then have

c1 = t, c2 = −t and c3 = t, t ∈ R.

Letting t 6= 0 gives nontrivial solutions, so we conclude that B is linearly dependent.


In the last example, we saw that the set B was linearly dependent. We showed that
$$t\begin{bmatrix}1\\0\\-1\end{bmatrix} - t\begin{bmatrix}2\\1\\0\end{bmatrix} + t\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}$$
held for any t ∈ R. Choosing any t ≠ 0, say t = 1, gives
$$\begin{bmatrix}1\\0\\-1\end{bmatrix} - \begin{bmatrix}2\\1\\0\end{bmatrix} + \begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}. \qquad (13)$$
We may rearrange this as
$$\begin{bmatrix}1\\0\\-1\end{bmatrix} = \begin{bmatrix}2\\1\\0\end{bmatrix} - \begin{bmatrix}1\\1\\1\end{bmatrix}$$
and use Theorem 13.8 to conclude that
$$\operatorname{Span} B = \operatorname{Span}\left\{\begin{bmatrix}1\\0\\-1\end{bmatrix}, \begin{bmatrix}2\\1\\0\end{bmatrix}, \begin{bmatrix}1\\1\\1\end{bmatrix}\right\} = \operatorname{Span}\left\{\begin{bmatrix}2\\1\\0\end{bmatrix}, \begin{bmatrix}1\\1\\1\end{bmatrix}\right\}.$$
In this case we could solve for any vector on the left-hand side of Equation (13) in terms of the other two, to alternatively arrive at
$$\begin{bmatrix}2\\1\\0\end{bmatrix} = \begin{bmatrix}1\\0\\-1\end{bmatrix} + \begin{bmatrix}1\\1\\1\end{bmatrix} \implies \operatorname{Span}\left\{\begin{bmatrix}1\\0\\-1\end{bmatrix}, \begin{bmatrix}2\\1\\0\end{bmatrix}, \begin{bmatrix}1\\1\\1\end{bmatrix}\right\} = \operatorname{Span}\left\{\begin{bmatrix}1\\0\\-1\end{bmatrix}, \begin{bmatrix}1\\1\\1\end{bmatrix}\right\}$$
or
$$\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}2\\1\\0\end{bmatrix} - \begin{bmatrix}1\\0\\-1\end{bmatrix} \implies \operatorname{Span}\left\{\begin{bmatrix}1\\0\\-1\end{bmatrix}, \begin{bmatrix}2\\1\\0\end{bmatrix}, \begin{bmatrix}1\\1\\1\end{bmatrix}\right\} = \operatorname{Span}\left\{\begin{bmatrix}1\\0\\-1\end{bmatrix}, \begin{bmatrix}2\\1\\0\end{bmatrix}\right\}.$$
   

Example 14.4. Show that the set
$$C = \left\{ \begin{bmatrix}1\\0\\-1\end{bmatrix}, \begin{bmatrix}1\\1\\1\end{bmatrix} \right\}$$
is linearly independent.

Solution. For c1 , c2 ∈ R, consider
$$c_1\begin{bmatrix}1\\0\\-1\end{bmatrix} + c_2\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}.$$

We obtain the system of equations

c1 + c2 = 0
c2 = 0
−c1 + c2 = 0

We see from the second equation that c2 = 0 and substituting c2 = 0 into both the first and
third equations each gives c1 = 0. Thus we have only the trivial solution c1 = c2 = 0 and we
conclude that C is linearly independent.

Theorem 14.5. A set of vectors {~v1 , . . . , ~vk } in Rn is linearly dependent if and only if

~vi ∈ Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }

for some i = 1, . . . , k.

Proof. Assume first that the set {~v1 , . . . , ~vk } in Rn is linearly dependent. Then there exist
c1 , . . . , ck ∈ R, not all zero, such that

c1~v1 + · · · + ci−1~vi−1 + ci~vi + ci+1~vi+1 + · · · + ck~vk = ~0.

Without loss of generality, assume that ci ≠ 0. Then we may solve for ~vi on one side of the equation:
$$\vec{v}_i = -\frac{c_1}{c_i}\vec{v}_1 - \cdots - \frac{c_{i-1}}{c_i}\vec{v}_{i-1} - \frac{c_{i+1}}{c_i}\vec{v}_{i+1} - \cdots - \frac{c_k}{c_i}\vec{v}_k$$
which shows that ~vi ∈ Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }. To prove the other implication, we
assume that ~vi ∈ Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk } for some i = 1, . . . , k. Then there exist
d1 , . . . , di−1 , di+1 , . . . , dk ∈ R such that

~vi = d1~v1 + · · · + di−1~vi−1 + di+1~vi+1 + · · · + dk~vk

and rearranging gives

d1~v1 + · · · + di−1~vi−1 − 1~vi + di+1~vi+1 + · · · + dk~vk = ~0

which shows that {~v1 , . . . , ~vk } is linearly dependent.

Given a spanning set of vectors {~v1 , . . . , ~vk } in Rn for a set S, we can now consider the vector
equation
c1~v1 + · · · + ck~vk = ~0. (14)
If the only solution to (14) is the trivial solution (c1 = · · · = ck = 0), then {~v1 , . . . , ~vk } is lin-
early independent. It follows that removing any vector from {~v1 , . . . , ~vk } will leave a set
that no longer spans S. If, on the other hand, there exists a nontrivial solution
to (14) where say, ci 6= 0, then we can solve for ~vi in (14) to express ~vi as a linear combination
of ~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk which shows that ~vi ∈ Span {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }. It follows
that we can remove ~vi from the spanning set and the resulting set {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vk }
will still span S by Theorem 13.8.
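For small sets we can carry out this test by hand, as in the examples above; for anything like the 500 vectors in R1000 mentioned earlier, the same test is done numerically. The following Python/NumPy sketch is an illustration only (it is not part of the course material, and the library calls are standard NumPy rather than anything specific to these notes); it applies the test to the four vectors in R4 from the start of this lecture.

import numpy as np

# Columns of V are the four vectors from the spanning set for S at the start of Lecture 14.
V = np.array([[ 1, -2,  1, -6],
              [ 2,  1,  6, -6],
              [-3,  4,  8,  3],
              [ 7,  8,  2,  7]], dtype=float)

k = V.shape[1]
# The set is linearly independent exactly when Vc = 0 has only the trivial solution c = 0.
print(np.linalg.matrix_rank(V) == k)      # False: the set is linearly dependent

# A nontrivial solution c of Vc = 0 can be read off from the singular value decomposition:
# right-singular vectors belonging to (numerically) zero singular values solve Vc = 0.
_, sigma, Vt = np.linalg.svd(V)
c = Vt[-1]                                # here the solution space is one-dimensional
print(np.round(c / c[-1], 6))             # [ 1. -2.  1.  1.]: v1 - 2v2 + v3 + v4 = 0

The printed dependency is exactly the one observed at the start of the lecture, rearranged so that the last coefficient is 1.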

We conclude with a few more examples involving linear dependence and linear independence.

Example 14.6. Consider the set {~v1 , . . . , ~vk , ~0} of vectors in Rn . Then

0~v1 + · · · + 0~vk + (1)~0 = ~0

which shows that {~v1 , . . . , ~vk , ~0} is linearly dependent. Note that any subset of Rn containing
~0 ∈ Rn will be linearly dependent.

Example 14.7. Let ~v1 , ~v2 , ~v3 ∈ Rn be such that {~v1 , ~v2 , ~v3 } is linearly independent. Prove
that {~v1 , ~v1 + ~v2 , ~v1 + ~v2 + ~v3 } is linearly independent.

Proof. We must prove that the set {~v1 , ~v1 + ~v2 , ~v1 + ~v2 + ~v3 } is linearly independent. To do
so, we consider the vector equation

c1~v1 + c2 (~v1 + ~v2 ) + c3 (~v1 + ~v2 + ~v3 ) = ~0, c1 , c2 , c3 ∈ R.

Rearranging this equation gives

(c1 + c2 + c3 )~v1 + (c2 + c3 )~v2 + c3~v3 = ~0.

Since {~v1 , ~v2 , ~v3 } is linearly independent, we must have that

c1 + c2 + c3 = 0
c2 + c3 = 0
c3 = 0

We see that c3 = 0 and it follows that c2 = 0 and then that c1 = 0. Hence we have only the
trivial solution, so our set {~v1 , ~v1 + ~v2 , ~v1 + ~v2 + ~v3 } is linearly independent.

Example 14.8. Let {~v1 , . . . , ~vk } be a linearly independent set of vectors in Rn . Prove that
{~v1 , . . . , ~vk−1 } is linearly independent.

Proof. It is given that {~v1 , . . . , ~vk } is linearly independent. Suppose for a contradiction that
{~v1 , . . . , ~vk−1 } is linearly dependent. Then there exist c1 , . . . , ck−1 , not all zero, such that

c1~v1 + · · · + ck−1~vk−1 = ~0.

But then adding 0~vk to both sides gives

c1~v1 + · · · + ck−1~vk−1 + 0~vk = ~0

which shows that {~v1 , . . . , ~vk } is linearly dependent, since not all of c1 , . . . , ck−1 are zero. But
this is a contradiction since we were given that {~v1 , . . . , ~vk } is linearly independent. Hence,
our supposition that {~v1 , . . . , ~vk−1 } is linearly dependent was incorrect. This leaves only that
{~v1 , . . . , ~vk−1 } is linearly independent, as required.
In the previous example, we used a proof technique known as Proof by Contradiction. When
using proof by contradiction, you are proving a statement is true by proving that it cannot
be false. In the proof, we had to show that {~v1 , . . . , ~vk−1 } was linearly independent. The
set {~v1 , . . . , ~vk−1 } is either linearly independent or linearly dependent, but not not both.
Instead of proving that {~v1 , . . . , ~vk−1 } was linearly independent directly, we supposed that it
was linearly dependent. From that supposition, we argued until we arrived at {~v1 , . . . , ~vk }
being linearly dependent, which was impossible since we were given that {~v1 , . . . , ~vk } was
linearly independent. We arrived at a contradiction: if the set {~v1 , . . . , ~vk−1 } was linearly
dependent, then the set {~v1 , . . . , ~vk } was both linearly independent and linearly dependent.
Thus we showed that {~v1 , . . . , ~vk−1 } cannot be linearly dependent, and so must be linearly
independent (which is what we were asked to prove).

It follows from the last example that every nonempty subset of a linearly independent set is
also linearly independent. Of course, we should consider the empty set, since it is a subset
of every set. As the empty set contains no vectors, we cannot exhibit vectors from the
empty set that form a linearly dependent set. Thus, the empty set is (vacuously) linearly
independent. Thus, we can now say that given any linearly independent set B, every subset
of B is linearly independent as well.

Lecture 15

Bases

Having discussed spanning and linear independence, we now combine the two ideas and consider linearly independent spanning sets. (“Bases” is the plural of “basis”.)

Definition 15.1. Let S be a subset of Rn . If B = {~v1 , . . . , ~vk } is a linearly independent set


of vectors in S such that S = Span {~v1 , . . . , ~vk }, then B is a basis for S. We define a basis
for {~0} to be the empty set, ∅.

Example 15.2. Prove that
$$B = \left\{ \begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix} \right\}$$
is a basis for R2 .

Solution. We first prove that Span B = R2 . For ~x = [x1 x2]^T ∈ R2 , we have
$$\begin{bmatrix}x_1\\x_2\end{bmatrix} = x_1\begin{bmatrix}1\\0\end{bmatrix} + x_2\begin{bmatrix}0\\1\end{bmatrix}$$
which shows that ~x ∈ Span B, and we conclude that R2 ⊆ Span B. Since [1 0]^T, [0 1]^T ∈ R2 and R2 is closed under linear combinations, we have that Span B ⊆ R2 . Hence Span B = R2 . To show that B is linearly independent, let c1 , c2 ∈ R and consider
$$\begin{bmatrix}0\\0\end{bmatrix} = c_1\begin{bmatrix}1\\0\end{bmatrix} + c_2\begin{bmatrix}0\\1\end{bmatrix} = \begin{bmatrix}c_1\\c_2\end{bmatrix}.$$

We immediately see that c1 = c2 = 0 and so B is linearly independent. Hence we have that


B is a basis for R2 .

Definition 15.3. For i = 1, . . . , n, let ~ei ∈ Rn be the vector whose ith entry is 1 and whose
other n − 1 entries are 0. The set
{~e1 , . . . , ~en }
is a basis for Rn , called the standard basis for Rn .

Example 15.4. In R2 the standard basis is
$$\{\vec{e}_1, \vec{e}_2\} = \left\{ \begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix} \right\}$$
and in R3 the standard basis is
$$\{\vec{e}_1, \vec{e}_2, \vec{e}_3\} = \left\{ \begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix} \right\}.$$

It should now be clear how to write out the standard basis for R4 , R5 and so on. Note that
we have seen the standard basis for R3 before in Example 6.12. It is important to realize
that the definition of ~ei depends on n: in the previous example, ~e1 and ~e2 are expressed in
two different ways depending on the value of n. In general it will be clear from the context
what ~e1 , . . . , ~en are, that is, what the value of n is. As with Example 6.12, it is easy to write
any vector in Rn as a linear combination of the standard basis vectors for Rn .
Example 15.5. Is
$$B = \left\{ \begin{bmatrix}1\\2\\0\end{bmatrix}, \begin{bmatrix}-1\\2\\1\end{bmatrix} \right\}$$
a basis for R3 ?
Solution. Let ~x = [x1 x2 x3]^T ∈ R3 and for c1 , c2 ∈ R, consider
$$\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = c_1\begin{bmatrix}1\\2\\0\end{bmatrix} + c_2\begin{bmatrix}-1\\2\\1\end{bmatrix}.$$
We obtain the system of equations
c1 − c2 = x 1
2c1 + 2c2 = x2
c2 = x 3
From the last equation we see that c2 = x3 and substitution into the second equation gives
$$c_1 = \frac{x_2 - 2x_3}{2}.$$
Substituting our values for c1 and c2 into the first equation gives
$$\frac{x_2 - 2x_3}{2} - x_3 = x_1$$
and simplifying leads to
2x1 − x2 + 4x3 = 0.
From this, we deduce that ~x = [x1 x2 x3]^T ∈ Span B if and only if 2x1 − x2 + 4x3 = 0, and it follows that Span B ≠ R3 , so B cannot be a basis for R3 . For example, since 2(1) − 1 + 4(1) = 5 ≠ 0, we have [1 1 1]^T ∉ Span B.

Note that in the previous example, B is linearly independent since it contains only two
vectors which are not scalar multiples of one another. A vector equation for Span B is
$$\vec{x} = c_1\begin{bmatrix}1\\2\\0\end{bmatrix} + c_2\begin{bmatrix}-1\\2\\1\end{bmatrix}, \qquad c_1 , c_2 \in \mathbb{R}$$

which we recognize as the vector equation for a plane in R3 . Indeed, taking the cross product
of the vectors in B yields a normal vector for this plane, and leads to the scalar equation
2x1 − x2 + 4x3 = 0.

Given a set S, we would like to find a basis for S if possible. (We will soon see exactly which subsets of Rn can have a basis, but the work we've done thus far might lead you to guess which ones.) Thus, we would like to find a linearly independent set B ⊆ S such that Span B = S. A good reason to require Span B = S is that we would like to be able to write every vector in S (and only those vectors in S) as a
linear combination of the vectors in B. The reason for requiring linear independence may
not be so clear.
Theorem 15.6. If B = {~v1 , . . . , ~vk } is a basis for a set S ⊆ Rn , then every ~x ∈ S can be
expressed as a linear combination of ~v1 , . . . , ~vk in a unique way.
Proof. Since B is a basis for S, S = Span B and so every ~x ∈ S can be expressed as a linear
combination of the vectors in B. Thus we only need to show that this expression is unique.
Suppose for c1 , d1 , . . . , ck , dk ∈ R we have two ways to express ~x as a linear combination of
~v1 , . . . , ~vk :
~x = c1~v1 + · · · + ck~vk and ~x = d1~v1 + · · · + dk~vk .
Then
c1~v1 + · · · + ck~vk = d1~v1 + · · · + dk~vk
and so
(c1 − d1 )~v1 + · · · + (ck − dk )~vk = ~0.
Since B is linearly independent, we have that c1 − d1 = · · · = ck − dk = 0, that is, ci = di
for i = 1, . . . , k, which shows any ~x ∈ S can be expressed uniquely as a linear combination
of the vectors in B.
We thus think of a basis B for a subset S of Rn as a minimal spanning set in the sense that
B spans S, but since B is linearly independent, we cannot remove a vector from B as we
would obtain a set that no longer spans S.

Subspaces of Rn
We now address which subsets of Rn admit a basis. If a subset S of Rn has a basis, then
S = Span B for some set B = {~v1 , . . . , ~vk }. It follows that ~v1 , . . . , ~vk ∈ S and that S is closed
under linear combinations. We’ve seen that Rn itself is closed under linear combinations,
so we expect subsets of Rn that have a basis to act very much like Rn itself under vector
addition and scalar multiplication. Thus we have the following definition:
Definition 15.7. A subset S of Rn is a subspace of Rn if for every ~w, ~x, ~y ∈ S and c, d ∈ R
we have

S1 ~x + ~y ∈ S S is closed under addition

S2 ~x + ~y = ~y + ~x addition is commutative

S3 (~x + ~y ) + ~w = ~x + (~y + ~w)        addition is associative

S4 There exists a vector ~0 ∈ S such that ~v + ~0 = ~v for every ~v ∈ S zero vector

S5 For each ~x ∈ S there exists a (−~x) ∈ S such that ~x + (−~x) = ~0 additive inverse

S6 c~x ∈ S S is closed under scalar multiplication

S7 c(d~x) = (cd)~x scalar multiplication is associative

S8 (c + d)~x = c~x + d~x distributive law

S9 c(~x + ~y ) = c~x + c~y distributive law

S10 1~x = ~x scalar multiplicative identity

This should seem similar to Theorem 6.10. In fact, if we replace S with Rn in the above def-
inition, then we have Theorem 6.10, so we see immediately that Rn is itself a subspace of Rn .

Aside from Rn , we see that S = {~0} is a subspace of Rn (take ~w = ~x = ~y = ~0 in the above definition), called the trivial subspace.

Given a subset S of Rn , it would appear that in order to show that S is a subspace of Rn , we


would need to verify all ten properties given in the above definition. However, we can use
the fact that since S is a subset of Rn , every vector in S is a vector in Rn and so properties
S2, S3, S7, S8, S9 and S10 are simply properties V 2, V 3, V 7, V 8, V 9 and V 10 of Theorem
6.10 and so must hold.

Thus to show that a nonempty subset S of Rn is a subspace of Rn , we need only verify


properties S1, S4, S5 and S6 as these depend on S (and not on Rn as in Theorem 6.10).
However, our work is cut even further as once S1 and S6 hold, we may conclude that for
any ~x ∈ S, 0~x = ~0 ∈ S and (−~x) = (−1)~x ∈ S, that is, that S4 and S5 hold.

Note that the empty set, ∅, is not a subspace of Rn since ~0 ∉ ∅ and so S4 fails to hold.

Lecture 16
Theorem 16.1 (Subspace Test). Let S be a nonempty subset of Rn . If for every ~x, ~y ∈ S
and for every c ∈ R, we have that ~x + ~y ∈ S and c~x ∈ S, then S is a subspace of Rn .

Example 16.2. The set
$$S = \left\{ \begin{bmatrix}1\\1\end{bmatrix}, \begin{bmatrix}1\\2\end{bmatrix} \right\}$$
is not a subspace of R2 since
$$\begin{bmatrix}1\\1\end{bmatrix} + \begin{bmatrix}1\\2\end{bmatrix} = \begin{bmatrix}2\\3\end{bmatrix} \notin S,$$

that is, S is not closed under vector addition.

Example 16.3. Prove that
$$S = \left\{ \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} \;\middle|\; x_1 + x_2 = 0 \text{ and } x_2 - x_3 = 0 \right\}$$
is a subspace of R3 .

Proof. By definition, S ⊆ R3 , and since 0 + 0 = 0 and 0 − 0 = 0, ~0 ∈ S so S is nonempty.


Now let
$$\vec{x} = \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} \quad\text{and}\quad \vec{y} = \begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}$$
be two vectors in S. Then x1 + x2 = 0 = y1 + y2 and x2 − x3 = 0 = y2 − y3 . We must first
show that
$$\vec{x} + \vec{y} = \begin{bmatrix}x_1+y_1\\x_2+y_2\\x_3+y_3\end{bmatrix}$$
belongs to S by showing that (x1 + y1 ) + (x2 + y2 ) = 0 and that (x2 + y2 ) − (x3 + y3 ) = 0.
We have

(x1 + y1 ) + (x2 + y2 ) = (x1 + x2 ) + (y1 + y2 ) = 0 + 0 = 0

and

(x2 + y2 ) − (x3 + y3 ) = (x2 − x3 ) + (y2 − y3 ) = 0 + 0 = 0

so ~x + ~y ∈ S. For any c ∈ R, we must next show that
$$c\vec{x} = \begin{bmatrix}cx_1\\cx_2\\cx_3\end{bmatrix}$$

belongs to S by showing that cx1 + cx2 = 0 and cx2 − cx3 = 0. We have

cx1 + cx2 = c(x1 + x2 ) = c(0) = 0

and

cx2 − cx3 = c(x2 − x3 ) = c(0) = 0

so c~x ∈ S. Hence S is a subspace of R3 .


Note that when asked to show that a subset S of Rn is a subspace of Rn , one of the things
we must show is that S is nonempty. It is often easiest to do so by showing that the zero
vector of Rn belongs to S. This is for two reasons: first, the zero vector is easy to work with
mathematically; and second, if the zero vector does not belong to S, then property S4 fails
and we can conclude immediately that S is not a subspace of Rn .
Example 16.4. The set
$$S = \left\{ \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} \;\middle|\; x_1 + x_2 - x_3 = 4 \right\}$$
is not a subspace of R3 since 0 + 0 − 0 = 0 ≠ 4, that is, ~0 ∉ S.

It is of course okay to show S is nonempty by showing that a nonzero vector in Rn belongs to S; however, a nonzero vector failing to belong to S does not exclude S from being a subspace of Rn . Indeed, in Example 16.3, the vector [1 1 1]^T ∉ S and yet S is a subspace of R3 .

Example 16.5. Show that
$$S = \left\{ c\begin{bmatrix}1\\3\end{bmatrix} \;\middle|\; c \in \mathbb{R} \right\}$$
is a subspace of R2 .
Solution. Since c[1 3]^T ∈ R2 for any c ∈ R, we have that S ⊆ R2 , and taking c = 0 gives [0 0]^T ∈ S, so S is indeed nonempty. Now let ~x, ~y ∈ S. Then there exist c1 , c2 ∈ R such that
$$\vec{x} = c_1\begin{bmatrix}1\\3\end{bmatrix} \quad\text{and}\quad \vec{y} = c_2\begin{bmatrix}1\\3\end{bmatrix}.$$

Then we have that
$$\vec{x} + \vec{y} = c_1\begin{bmatrix}1\\3\end{bmatrix} + c_2\begin{bmatrix}1\\3\end{bmatrix} = (c_1+c_2)\begin{bmatrix}1\\3\end{bmatrix}$$
so ~x + ~y ∈ S and for any c ∈ R,
$$c\vec{x} = c\left(c_1\begin{bmatrix}1\\3\end{bmatrix}\right) = (cc_1)\begin{bmatrix}1\\3\end{bmatrix}$$

so c~x ∈ S. Hence S is a subspace of R2 .


The next theorem shows that given a set {~v1 , . . . , ~vk } of vectors in Rn , the span of that set
will always be a subspace of Rn .
Theorem 16.6. Let ~v1 , . . . , ~vk ∈ Rn . Then
S = Span {~v1 , . . . , ~vk }
is a subspace of Rn .
Proof. Since ~v1 , . . . , ~vk ∈ Rn and Rn is closed under linear combinations, we have that
S ⊆ Rn and since ~0 = 0~v1 + · · · + 0~vk ∈ S, we see S is nonempty. Now let ~x, ~y ∈ S. Then
there exist c1 , . . . , ck , d1 , . . . , dk ∈ R such that
~x = c1~v1 + · · · + ck~vk and ~y = d1~v1 + · · · + dk~vk .
Then
~x + ~y = c1~v1 + · · · + ck~vk + d1~v1 + · · · + dk~vk
= (c1 + d1 )~v1 + · · · + (ck + dk )~vk
and so ~x + ~y ∈ S. For any c ∈ R,
c~x = c(c1~v1 + · · · + ck~vk )
= (cc1 )~v1 + · · · + (cck )~vk
from which we see that c~x ∈ S. Thus, S is a subspace of Rn .
The last theorem shows that we can always generate a subspace by looking at the set
spanned by a finite set of vectors. In fact, every subspace S of Rn can be expressed as
S = Span {~v1 , . . . , ~vk } for some ~v1 , . . . , ~vk ∈ S, however, a proof of this last statement is
beyond the scope of this course.

Bases of Subspaces
We have discussed that every subspace S of Rn can be expressed as S = Span {~v1 , . . . , ~vk }
for some ~v1 , . . . , ~vk . Thus {~v1 , . . . , ~vk } is a spanning set for S. Removing any dependencies
from the set {~v1 , . . . , ~vk } will leave us with a linearly independent spanning set for S, that
is, a basis for S. We now look at how to find a basis for a subspace of Rn .

Example 16.7. Find a basis for the subspace
$$S = \left\{ \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} \;\middle|\; x_1 + x_2 = 0 \text{ and } x_2 - x_3 = 0 \right\}$$
of R3 .
Solution. Let ~x = [x1 x2 x3]^T ∈ S. Then x1 + x2 = 0 and x2 − x3 = 0, and thus x1 = −x2 and x3 = x2 . It follows that
$$\vec{x} = \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}-x_2\\x_2\\x_2\end{bmatrix} = x_2\begin{bmatrix}-1\\1\\1\end{bmatrix}.$$
Thus S ⊆ Span{ [−1 1 1]^T }. Now since [−1 1 1]^T ∈ S and since S is closed under linear combinations (properties S1 and S6 from the definition of a subspace of Rn combine to give that every linear combination of vectors from S is again in S), we have that Span{ [−1 1 1]^T } ⊆ S and so Span{ [−1 1 1]^T } = S. Hence the set
$$B = \left\{ \begin{bmatrix}-1\\1\\1\end{bmatrix} \right\}$$

is a spanning set for S. Since B consists of a single nonzero vector, B is linearly independent
and is hence a basis for S.
Note that once we obtain a basis for S, we see that S is the set of all linear combinations (or in this case, scalar multiples) of the vector [−1 1 1]^T. Thus S is a line through the origin with direction vector [−1 1 1]^T.
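The computation in Example 16.7 can also be checked with exact-arithmetic software. The sketch below is an illustration only (it uses SymPy, which is not part of the course); it finds a basis for S as the set of solutions of the two constraint equations x1 + x2 = 0 and x2 − x3 = 0.

from sympy import Matrix

# Each row is one constraint: x1 + x2 = 0 and x2 - x3 = 0.
A = Matrix([[1, 1,  0],
            [0, 1, -1]])

# nullspace() returns a list of basis vectors for {x : Ax = 0}, which is exactly S.
for v in A.nullspace():
    print(v.T)      # (-1, 1, 1): the basis vector found in Example 16.7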

When finding a spanning set for a subspace S of Rn , we choose an arbitrary ~x ∈ S and try
to “decompose” ~x as a linear combination of some ~v1 , . . . , ~vk ∈ S. This then shows that
S ⊆ Span {~v1 , . . . , ~vk }. Technically, we should also show that Span {~v1 , . . . , ~vk } ⊆ S, but this
is trivial as S is a subspace and thus contains all linear combinations of ~v1 , . . . , ~vk . Thus for
a subspace S of Rn , S ⊆ Span {~v1 , . . . , ~vk } implies that S = Span {~v1 , . . . , ~vk }, and we don’t
normally show (or even mention) that Span {~v1 , . . . , ~vk } ⊆ S.
Example 16.8. Consider the subspace
$$S = \left\{ \begin{bmatrix}a-b\\b-c\\c-a\end{bmatrix} \;\middle|\; a, b, c \in \mathbb{R} \right\}$$
of R3 . Find a basis for S.


Solution. Let ~x ∈ S. Then for some a, b, c ∈ R,
$$\vec{x} = \begin{bmatrix}a-b\\b-c\\c-a\end{bmatrix} = a\begin{bmatrix}1\\0\\-1\end{bmatrix} + b\begin{bmatrix}-1\\1\\0\end{bmatrix} + c\begin{bmatrix}0\\-1\\1\end{bmatrix}.$$
Thus
$$S = \operatorname{Span}\left\{ \begin{bmatrix}1\\0\\-1\end{bmatrix}, \begin{bmatrix}-1\\1\\0\end{bmatrix}, \begin{bmatrix}0\\-1\\1\end{bmatrix} \right\}.$$
Now since
$$\begin{bmatrix}0\\-1\\1\end{bmatrix} = -\begin{bmatrix}1\\0\\-1\end{bmatrix} - \begin{bmatrix}-1\\1\\0\end{bmatrix},$$
we have from Theorem 13.8 that
$$S = \operatorname{Span}\left\{ \begin{bmatrix}1\\0\\-1\end{bmatrix}, \begin{bmatrix}-1\\1\\0\end{bmatrix} \right\}$$
so
$$B = \left\{ \begin{bmatrix}1\\0\\-1\end{bmatrix}, \begin{bmatrix}-1\\1\\0\end{bmatrix} \right\}$$

is a spanning set for S. Moreover, since neither vector in B is a scalar multiple of the other,
B is linearly independent and hence a basis for S.
Note that we now see that S is a plane through the origin and a vector equation for S is
$$\vec{x} = s\begin{bmatrix}1\\0\\-1\end{bmatrix} + t\begin{bmatrix}-1\\1\\0\end{bmatrix}, \qquad s, t \in \mathbb{R}$$

and it’s not hard to derive that x1 + x2 + x3 = 0 is a scalar equation for S.
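When a subspace is presented as a span, as in Example 16.8, a basis can likewise be extracted by software: keep a maximal linearly independent subset of the spanning vectors. An illustrative SymPy sketch (not part of the course material):

from sympy import Matrix

# Columns are the three spanning vectors obtained in Example 16.8.
A = Matrix([[ 1, -1,  0],
            [ 0,  1, -1],
            [-1,  0,  1]])

# columnspace() returns a basis for the span of the columns, built from
# linearly independent columns of A.
for v in A.columnspace():
    print(v.T)      # (1, 0, -1) and (-1, 1, 0): the basis B found above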

Lecture 17

k−Flats and Hyperplanes


We can now use the concept of linear independence to extend the notions of lines and planes.
We recall:

Definition 17.1. Let P be a point in Rn and let $\vec{p} = \overrightarrow{OP}$.

• Let ~v1 ∈ Rn be such that {~v1 } is linearly independent (or equivalently, ~v1 ≠ ~0). The set with vector equation

~x = p~ + c1~v1 , c1 ∈ R

is a line in Rn through the point P .

• Let ~v1 , ~v2 ∈ Rn be such that {~v1 , ~v2 } is linearly independent (or equivalently, ~v1 and ~v2 are not scalar multiples of one another). The set with vector
equation
~x = p~ + c1~v1 + c2~v2 , c1 , c2 ∈ R
is a plane in Rn through the point P .

We now generalize the definition of a line and of a plane as follows:

Definition 17.2. For some positive integer k ≤ n − 1, let ~v1 , . . . , ~vk ∈ Rn be such that
{~v1 , . . . , ~vk } is linearly independent. The set with vector equation

~x = p~ + c1~v1 + · · · + ck~vk , c1 , . . . , ck ∈ R

is a k−flat in Rn through the point P .

Thus in Rn , a 1−flat is a line and a 2−flat is a plane, both of which we have seen before.
We may think of a 3−flat as a “three dimensional plane”, but we aren’t normally able to
visualize such things in higher dimensions. We also mention that a 0−flat is simply a point
and has vector equation ~x = p~. The last type of k−flat that we have encountered already is
an (n − 1)−flat, known as a hyperplane:

Definition 17.3. Let ~v1 , . . . , ~vn−1 ∈ Rn be such that {~v1 , . . . , ~vn−1 } is linearly independent.
The set with vector equation

~x = p~ + c1~v1 + · · · + cn−1~vn−1 , c1 , . . . , cn−1 ∈ R

is a hyperplane in Rn through the point P .


Note that the definition of a hyperplane depends on n, so how we geometrically interpret a
hyperplane depends on n. For example, in R2 , a hyperplane is a 1−flat, or a line. In R3 , a
hyperplane is a 2−flat, or a plane. The reason we are concerned with hyperplanes is that
they are the only k−flats in Rn that have scalar equations. Indeed, a scalar equation for a
line in R2 is of the form ax1 + bx2 = c for some a, b, c ∈ R and a scalar equation for a plane
in R3 is of the form ax1 + bx2 + cx3 = d for some a, b, c, d ∈ R. Hyperplanes will play a role
when we study systems of equations and their geometric interpretations shortly.

Finally, using our new terminology, we can now give a simple geometric description of all of
the subspaces of Rn : they are exactly the k−flats through the origin for k = 0, 1, . . . , n − 1
along with Rn itself. We note that in Rn , a k−flat with vector equation ~x = p~+c1~v1 +· · ·+ck~vk
is a subspace of Rn if and only if p~ ∈ Span {~v1 , . . . , ~vk }.

Orthogonal Sets and Bases


Definition 17.4. A set {~v1 , . . . , ~vk } ⊆ Rn is an orthogonal set if ~vi · ~vj = 0 for i 6= j.
Example 17.5. The standard basis {~e1 , . . . , ~en } for Rn is an orthogonal set. The set
$$\left\{ \begin{bmatrix}1\\1\\1\end{bmatrix}, \begin{bmatrix}1\\-2\\1\end{bmatrix}, \begin{bmatrix}-1\\0\\1\end{bmatrix} \right\}$$
is also an orthogonal set in R3 .


Note that an orthogonal set may contain the zero vector, and that any set containing the
zero vector is linearly dependent. However, if we insist that our orthogonal set contain only
nonzero vectors, then we obtain a linearly independent set.
Theorem 17.6. If {~v1 , . . . , ~vk } ⊆ Rn is an orthogonal set of nonzero vectors, then {~v1 , . . . , ~vk }
is linearly independent.
Proof. For c1 , . . . , ck ∈ R, consider c1~v1 + · · · + ck~vk = ~0. For each i = 1, . . . , k,
~vi · (c1~v1 + · · · + ck~vk ) = ~vi · ~0.
Expanding the dot product on the left and evaluating the one on the right gives
c1 (~vi · ~v1 ) + · · · + ci−1 (~vi · ~vi−1 ) + ci (~vi · ~vi ) + ci+1 (~vi · ~vi+1 ) + · · · + ck (~vi · ~vk ) = 0.
Since {~v1 , . . . , ~vk } is an orthogonal set, we have ~vi · ~vj = 0 for i 6= j. We thus obtain
ci (~vi · ~vi ) = 0,
that is,
ci k~vi k2 = 0.
Since ~vi ≠ ~0, we have ||~vi || ≠ 0 and we must have ci = 0. Since i was arbitrary, it follows that
c1 = · · · = ck = 0 and we have that {~v1 , . . . , ~vk } is linearly independent.

Definition 17.7. If an orthogonal set B is a basis for a subspace S of Rn , then B is an
orthogonal basis for S.
If B = {~v1 , . . . , ~vk } is an orthogonal basis of a subspace S of Rn and ~x ∈ S, then, since B is
a basis for S, there exist c1 , . . . , ck ∈ R such that ~x = c1~v1 + · · · + ck~vk . For any i = 1, . . . , k
it follows from B being an orthogonal set that
~vi · ~x = ~vi · (c1~v1 + · · · + ck~vk )
= ci k~vi k2

and using the fact that ~vi ≠ ~0, we find
$$c_i = \frac{\vec{x} \cdot \vec{v}_i}{\|\vec{v}_i\|^2}.$$
Thus,
$$\vec{x} = \frac{\vec{x} \cdot \vec{v}_1}{\|\vec{v}_1\|^2}\vec{v}_1 + \cdots + \frac{\vec{x} \cdot \vec{v}_k}{\|\vec{v}_k\|^2}\vec{v}_k.$$

Hence, we can compute the coefficients that are used to express ~x as a linear combination of
the vectors in B directly, that is, without solving a system of equations. Also note that we
can solve for the coefficients independently of one another.
Example 17.8. Let ~x = [−1 1]^T and
$$B = \left\{ \begin{bmatrix}1\\3\end{bmatrix}, \begin{bmatrix}6\\-2\end{bmatrix} \right\}$$

be an orthogonal basis for R2 . Write ~x as a linear combination of the vectors in B.


Solution. For c1 , c2 ∈ R, consider
$$\vec{x} = c_1\begin{bmatrix}1\\3\end{bmatrix} + c_2\begin{bmatrix}6\\-2\end{bmatrix}.$$
Then
$$c_1 = \frac{[-1\;\;1]^T \cdot [1\;\;3]^T}{\|[1\;\;3]^T\|^2} = \frac{-1+3}{1+9} = \frac{2}{10} = \frac{1}{5}, \qquad c_2 = \frac{[-1\;\;1]^T \cdot [6\;\;{-2}]^T}{\|[6\;\;{-2}]^T\|^2} = \frac{-6-2}{36+4} = \frac{-8}{40} = -\frac{1}{5}$$
and so
$$\vec{x} = \frac{1}{5}\begin{bmatrix}1\\3\end{bmatrix} - \frac{1}{5}\begin{bmatrix}6\\-2\end{bmatrix}.$$
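Since each coefficient is given directly by ci = (~x · ~vi )/||~vi ||2 , these computations are easy to automate. The NumPy sketch below is an illustration only (not part of the course material); it reproduces Example 17.8.

import numpy as np

x  = np.array([-1.0, 1.0])
v1 = np.array([ 1.0, 3.0])
v2 = np.array([ 6.0, -2.0])

# Coefficients with respect to an orthogonal basis: c_i = (x . v_i) / ||v_i||^2,
# computed independently of one another.
c1 = x @ v1 / (v1 @ v1)
c2 = x @ v2 / (v2 @ v2)
print(c1, c2)                              # 0.2 -0.2, i.e. 1/5 and -1/5
print(np.allclose(c1*v1 + c2*v2, x))       # True: x is recovered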

Orthonormal Sets and Bases

Definition 17.9. An orthogonal set {~v1 , . . . , ~vk } ⊆ Rn is called an orthonormal set if


k~vi k = 1 for i = 1, . . . , k. If an orthonormal set B is a basis for subspace S of Rn , then
B is an orthonormal basis for S.

Example 17.10. The standard basis {~e1 , . . . , ~en } for Rn is an orthonormal set (and an
orthonormal basis for Rn ). The set
$$\left\{ \begin{bmatrix}1/\sqrt{3}\\1/\sqrt{3}\\1/\sqrt{3}\end{bmatrix}, \begin{bmatrix}1/\sqrt{6}\\-2/\sqrt{6}\\1/\sqrt{6}\end{bmatrix}, \begin{bmatrix}-1/\sqrt{2}\\0\\1/\sqrt{2}\end{bmatrix} \right\}$$
is also an orthonormal set (and also an orthonormal basis for R3 ).

Note that the condition k~vi k = 1 excludes the zero vector from any orthonormal set. It
follows that any orthonormal set is an orthogonal set of nonzero vectors (though not every orthogonal set of nonzero vectors is an orthonormal set), and as such, must
be linearly independent by Theorem 17.6.

Given an orthogonal basis B = {~v1 , . . . , ~vk } for a subspace S of Rn , we can obtain an orthonormal basis
$$C = \{\vec{w}_1, \ldots, \vec{w}_k\}$$
for S by letting
$$\vec{w}_i = \frac{1}{\|\vec{v}_i\|}\vec{v}_i$$
for i = 1, . . . , k.

If B = {~v1 , . . . , ~vk } is an orthonormal basis for a subspace S of Rn , then B is an orthogonal


basis for S and so for any ~x ∈ S we have
$$\vec{x} = \frac{\vec{x} \cdot \vec{v}_1}{\|\vec{v}_1\|^2}\vec{v}_1 + \cdots + \frac{\vec{x} \cdot \vec{v}_k}{\|\vec{v}_k\|^2}\vec{v}_k$$

and since k~vi k = 1 for i = 1, . . . , k, we have

~x = (~x · ~v1 )~v1 + · · · + (~x · ~vk )~vk .

We see that for an orthonormal basis B = {~v1 , . . . , ~vk } of a subspace S of Rn , evaluating


the k dot products ~x · ~v1 , . . . , ~x · ~vk is all that is required to compute the coefficients used to
express ~x ∈ S as a linear combination of ~v1 , . . . , ~vk .
Example 17.11. From before,
$$B = \left\{ \begin{bmatrix}1\\3\end{bmatrix}, \begin{bmatrix}6\\-2\end{bmatrix} \right\}$$
is an orthogonal basis for R2 . Obtain an orthonormal basis C for R2 from B and express ~x = [−1 1]^T as a linear combination of the vectors in C.

Solution. Since
$$\left\|\begin{bmatrix}1\\3\end{bmatrix}\right\| = \sqrt{10} \quad\text{and}\quad \left\|\begin{bmatrix}6\\-2\end{bmatrix}\right\| = \sqrt{40} = 2\sqrt{10},$$
we have that
$$C = \left\{ \begin{bmatrix}1/\sqrt{10}\\3/\sqrt{10}\end{bmatrix}, \begin{bmatrix}3/\sqrt{10}\\-1/\sqrt{10}\end{bmatrix} \right\}$$
is an orthonormal basis for R2 . Since
$$\begin{bmatrix}-1\\1\end{bmatrix} \cdot \begin{bmatrix}1/\sqrt{10}\\3/\sqrt{10}\end{bmatrix} = \frac{2}{\sqrt{10}} \quad\text{and}\quad \begin{bmatrix}-1\\1\end{bmatrix} \cdot \begin{bmatrix}3/\sqrt{10}\\-1/\sqrt{10}\end{bmatrix} = -\frac{4}{\sqrt{10}},$$
we obtain
$$\vec{x} = \frac{2}{\sqrt{10}}\begin{bmatrix}1/\sqrt{10}\\3/\sqrt{10}\end{bmatrix} - \frac{4}{\sqrt{10}}\begin{bmatrix}3/\sqrt{10}\\-1/\sqrt{10}\end{bmatrix}.$$
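The same check works for the orthonormal case, where the coefficients reduce to plain dot products. An illustrative NumPy sketch (not part of the course material) redoing Example 17.11:

import numpy as np

x = np.array([-1.0, 1.0])
B = [np.array([1.0, 3.0]), np.array([6.0, -2.0])]     # the orthogonal basis from before

# Normalize each basis vector to obtain the orthonormal basis C.
C = [v / np.linalg.norm(v) for v in B]

# For an orthonormal basis the coefficients are simply the dot products x . w_i.
coeffs = [x @ w for w in C]
print(coeffs)                              # approx. [0.632, -1.265] = 2/sqrt(10), -4/sqrt(10)
print(np.allclose(sum(c*w for c, w in zip(coeffs, C)), x))   # True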

Lecture 18

Systems of Linear Equations


Our computations stemming from linear combinations, spanning, and linear dependence and
independence all involve solving systems of equations. Thus far, our method to solve such
systems has mainly been elimination and substitution whereby we isolate for a variable in
one equation and substitute the resulting expression for this variable into the remaining
equations, effectively “eliminating” one of the variables. This has been sufficient given that
our systems have had few equations and few variables. However, as the number of variables
and the number of equations grow, we require a more systematic approach to solving these
systems.
Definition 18.1. A linear equation in n variables is an equation of the form

a1 x 1 + a2 x 2 + · · · + an x n = b

where x1 , . . . , xn ∈ R are the variables or unknowns, a1 , . . . , an ∈ R are coefficients and b ∈ R


is the constant term. A system of linear equations (also called a linear system of equations)
is a collection of finitely many linear equations.
Example 18.2. The system

3x1 + 2x2 − x3 = 3
2x1 + x3 = −1
3x2 − 4x3 = 4

is a system of three linear equations in three variables. Assuming we are working in R3 , we


see that each equation is the scalar equation of a hyperplane (a plane in R3 ). More generally,
a system of m linear equations in n variables is written as

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$

The number aij is the coefficient of xj in the ith equation and bi is the constant term in the
ith equation. Each of the m equations is a scalar equation of a hyperplane in Rn .
Definition 18.3. A vector
$$\vec{s} = \begin{bmatrix}s_1\\ \vdots \\ s_n\end{bmatrix} \in \mathbb{R}^n$$

is a solution to a system of m equations in n variables if all m equations are satisfied when
we set xj = sj for j = 1, . . . , n. The set of all solutions to a system of equations is called the
solution set.
We may view the solution set of a system of m equations in n variables as the intersection
of the m hyperplanes determined by the system.
Example 18.4. Solving the system of two linear equations in two variables
a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2
can be viewed as finding the points of intersection of the two lines with scalar equations
a11 x1 + a12 x2 = b1 and a21 x1 + a22 x2 = b2 . Figure 43 shows the possible outcomes.

Figure 43: Number of solutions resulting from intersecting two lines in R2 .

We see that a system of two equations in two variables can have no solutions, exactly one
solution or infinitely many solutions. Figure 44 shows a similar situation when we consider a
system of three equations in three variables, which we may view geometrically as intersecting
three planes in R3 . Indeed we will see that for any linear system of m equations in n variables,
we will obtain either no solutions, exactly one solution, or infinitely many solutions.
Definition 18.5. We call a linear system of equations consistent if it has at least one
solution. Otherwise, we call the linear system inconsistent.
Example 18.6. Solve the linear system
x1 + 3x2 = −1
x 1 + x2 = 3
Solution. To begin, we will eliminate x1 in the second equation by subtracting the first
equation from the second:
$$\begin{aligned} x_1 + 3x_2 &= -1 \\ x_1 + x_2 &= 3 \end{aligned} \quad\xrightarrow{\text{subtract the first equation from the second}}\quad \begin{aligned} x_1 + 3x_2 &= -1 \\ -2x_2 &= 4 \end{aligned}$$

Figure 44: Number of solutions resulting from intersecting three planes. Note that there are
other ways to arrange these planes to obtain the given number of solutions.

Next, we multiply the second equation by a factor of −1/2:
$$\begin{aligned} x_1 + 3x_2 &= -1 \\ -2x_2 &= 4 \end{aligned} \quad\xrightarrow{\text{multiply the second equation by } -\frac{1}{2}}\quad \begin{aligned} x_1 + 3x_2 &= -1 \\ x_2 &= -2 \end{aligned}$$

Finally we eliminate x2 from the first equation by subtracting the second equation from the
first equation three times:
$$\begin{aligned} x_1 + 3x_2 &= -1 \\ x_2 &= -2 \end{aligned} \quad\xrightarrow{\text{subtract 3 times the second equation from the first}}\quad \begin{aligned} x_1 &= 5 \\ x_2 &= -2 \end{aligned}$$

From here, we conclude that the given system is consistent and
$$\begin{aligned} x_1 &= 5 \\ x_2 &= -2 \end{aligned} \qquad\text{or}\qquad \begin{bmatrix}x_1\\x_2\end{bmatrix} = \begin{bmatrix}5\\-2\end{bmatrix} \qquad\text{or}\qquad (x_1 , x_2 ) = (5, -2)$$

which we refer to as the parametric form of the solution, the vector form of the solution and
the point form of the solution respectively.
Notice that when we write a system of equations, we always list the variables in order and
that when we solve a system of equations, we are ultimately concerned with the coefficients
and constant terms. Thus, we can write the above systems of equations and the subsequent
operations we used to solve the system more compactly:
$$\left[\begin{array}{rr|r} 1 & 3 & -1 \\ 1 & 1 & 3 \end{array}\right] \xrightarrow{R_2 - R_1} \left[\begin{array}{rr|r} 1 & 3 & -1 \\ 0 & -2 & 4 \end{array}\right] \xrightarrow{-\frac{1}{2}R_2} \left[\begin{array}{rr|r} 1 & 3 & -1 \\ 0 & 1 & -2 \end{array}\right] \xrightarrow{R_1 - 3R_2} \left[\begin{array}{rr|r} 1 & 0 & 5 \\ 0 & 1 & -2 \end{array}\right]$$

so
$$\begin{bmatrix}x_1\\x_2\end{bmatrix} = \begin{bmatrix}5\\-2\end{bmatrix}$$
as above. We call
$$\begin{bmatrix}1 & 3\\1 & 1\end{bmatrix}$$
the coefficient matrix of the linear system, which is often denoted by A. (A matrix will be formally defined in Lecture 25; for now, we view matrices as rectangular arrays of numbers used to represent systems of linear equations.) The vector
$$\begin{bmatrix}-1\\3\end{bmatrix}$$
is the constant matrix (or constant vector) of the linear system and will be denoted by ~b. Finally,
$$\left[\begin{array}{rr|r} 1 & 3 & -1 \\ 1 & 1 & 3 \end{array}\right]$$
is the augmented matrix of the linear system, and will be denoted by [ A | ~b ].

From the previous example, we see that by taking the augmented matrix of a linear system
of equations, we can “reduce” it to an augmented matrix of a simpler system from which we
can “read off” the solution. Notice that by doing this, we are simply removing the variables
from the system (since we know x1 is always the first variable and x2 is always the second
variable), and treating the equations as rows of the augmented matrix. Thus, the operation
R2 − R1 written to the right of the second row of an augmented matrix means that we are
subtracting the first row from the second to obtain a new second row which would appear
in the next augmented matrix.

We are allowed to perform the following Elementary Row Operations (EROs) to the aug-
mented matrix of a linear system of equations:
• Swap two rows
• Add a scalar multiple of one row to another
• Multiply any row by a nonzero scalar
We say that two systems are equivalent if they have the same solution set. A system derived
from a given system by performing elementary row operations on its augmented matrix will
be equivalent to the given system. Thus elementary row operations allow us to reduce a
complicated system to one that is easier to solve. In the previous example, since
$$\left[\begin{array}{rr|r} 1 & 3 & -1 \\ 1 & 1 & 3 \end{array}\right] \longrightarrow \left[\begin{array}{rr|r} 1 & 0 & 5 \\ 0 & 1 & -2 \end{array}\right],$$
the systems they represent

$$\begin{aligned} x_1 + 3x_2 &= -1 \\ x_1 + x_2 &= 3 \end{aligned} \qquad\text{and}\qquad \begin{aligned} x_1 &= 5 \\ x_2 &= -2 \end{aligned}$$

must have the same solution set. Clearly, the second system is easier to solve as we can
simply read off the solution.

Example 18.7. Solve the linear system of equations

2x1 + x2 + 9x3 = 31
x2 + 2x3 = 8
x1 + 3x3 = 10

Solution. To solve this system, we perform elementary row operations to the augmented
matrix:
$$\left[\begin{array}{rrr|r} 2 & 1 & 9 & 31 \\ 0 & 1 & 2 & 8 \\ 1 & 0 & 3 & 10 \end{array}\right] \xrightarrow{R_1 \leftrightarrow R_3} \left[\begin{array}{rrr|r} 1 & 0 & 3 & 10 \\ 0 & 1 & 2 & 8 \\ 2 & 1 & 9 & 31 \end{array}\right] \xrightarrow{R_3 - 2R_1} \left[\begin{array}{rrr|r} 1 & 0 & 3 & 10 \\ 0 & 1 & 2 & 8 \\ 0 & 1 & 3 & 11 \end{array}\right] \xrightarrow{R_3 - R_2}$$
$$\left[\begin{array}{rrr|r} 1 & 0 & 3 & 10 \\ 0 & 1 & 2 & 8 \\ 0 & 0 & 1 & 3 \end{array}\right] \xrightarrow[R_2 - 2R_3]{R_1 - 3R_3} \left[\begin{array}{rrr|r} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{array}\right]$$

We thus have
$$\begin{aligned} x_1 &= 1 \\ x_2 &= 2 \\ x_3 &= 3 \end{aligned} \qquad\text{or}\qquad \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}1\\2\\3\end{bmatrix} \qquad\text{or}\qquad (x_1 , x_2 , x_3 ) = (1, 2, 3)$$

as our solution.
Note that the augmented matrix
$$\left[\begin{array}{rrr|r} 1 & 0 & 3 & 10 \\ 0 & 1 & 2 & 8 \\ 0 & 0 & 1 & 3 \end{array}\right]$$
corresponds to the linear system of equations

x1 + 3x3 = 10
x2 + 2x3 = 8
x3 = 3

From here, we can see that x3 = 3. We can then use the second equation to solve for x2 and
then the first equation to solve for x1 :

x2 = 8 − 2x3 = 8 − 2(3) = 8 − 6 = 2
x1 = 10 − 3x3 = 10 − 3(3) = 10 − 9 = 1

to again arrive at x1 = 1, x2 = 2 and x3 = 3. This technique is called back substitution.
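Systems of this size are also routinely solved by numerical software, which carries out essentially the same elimination. The following NumPy sketch is an illustration only (not part of the course material); it solves the system of Example 18.7.

import numpy as np

# Coefficient matrix and constant vector from Example 18.7.
A = np.array([[2.0, 1.0, 9.0],
              [0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0]])
b = np.array([31.0, 8.0, 10.0])

# np.linalg.solve performs elimination (an LU factorization) for us.
x = np.linalg.solve(A, b)
print(x)      # [1. 2. 3.]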

Lecture 19

Solving Systems of Linear Equations


In the last lecture, it likely wasn’t clear what elementary row operations one should perform
on an augmented matrix in order to solve a linear system of equations. Note that in the two
examples done last lecture, we computed
$$\left[\begin{array}{rr|r} 1 & 3 & -1 \\ 1 & 1 & 3 \end{array}\right] \longrightarrow \left[\begin{array}{rr|r} 1 & 0 & 5 \\ 0 & 1 & -2 \end{array}\right]$$
and
$$\left[\begin{array}{rrr|r} 2 & 1 & 9 & 31 \\ 0 & 1 & 2 & 8 \\ 1 & 0 & 3 & 10 \end{array}\right] \longrightarrow \left[\begin{array}{rrr|r} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{array}\right]$$
In both cases, we chose our elementary row operations in order to get to the augmented
matrices on the right, and this is the “form” that we are looking for.
Definition 19.1.
• The first nonzero entry in each row of a matrix is called a leading entry (or a pivot).
• A matrix is in Row Echelon Form (REF) if
(1) All rows whose entries are all zero appear below all rows that contain nonzero entries,
(2) Each leading entry is to the right of the leading entries above it.
• A matrix is in Reduced Row Echelon Form (RREF) if it is in REF and
(3) Each leading entry is a 1 called a leading one,
(4) Each leading one is the only nonzero entry in its column.
Note that by definition, if a matrix is in RREF, then it is in REF.

When row reducing the augmented matrix of a linear system of equations, we aim first to
reduce the augmented matrix to REF. Once we have reached an REF form, we may either
use back substitution, or continue using elementary row operations until we reach RREF
where we can simply read off the solution.
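Computer algebra systems can carry an augmented matrix all the way to RREF in exact arithmetic. The SymPy sketch below is an illustration only (not part of the course material); it reproduces the reduction of the augmented matrix from Example 18.7.

from sympy import Matrix

aug = Matrix([[2, 1, 9, 31],
              [0, 1, 2,  8],
              [1, 0, 3, 10]])

# rref() returns the reduced row echelon form together with the indices of the pivot columns.
R, pivots = aug.rref()
print(R)        # Matrix([[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]])
print(pivots)   # (0, 1, 2): leading ones in the first three columns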

From our last example, we rewrite the steps, marking the leading entries in bold:
$$\left[\begin{array}{rrr|r} \mathbf{2} & 1 & 9 & 31 \\ 0 & \mathbf{1} & 2 & 8 \\ \mathbf{1} & 0 & 3 & 10 \end{array}\right] \xrightarrow{R_1 \leftrightarrow R_3} \left[\begin{array}{rrr|r} \mathbf{1} & 0 & 3 & 10 \\ 0 & \mathbf{1} & 2 & 8 \\ \mathbf{2} & 1 & 9 & 31 \end{array}\right] \xrightarrow{R_3 - 2R_1} \left[\begin{array}{rrr|r} \mathbf{1} & 0 & 3 & 10 \\ 0 & \mathbf{1} & 2 & 8 \\ 0 & \mathbf{1} & 3 & 11 \end{array}\right] \xrightarrow{R_3 - R_2}$$
$$\underbrace{\left[\begin{array}{rrr|r} \mathbf{1} & 0 & 3 & 10 \\ 0 & \mathbf{1} & 2 & 8 \\ 0 & 0 & \mathbf{1} & 3 \end{array}\right]}_{\text{REF}} \xrightarrow[R_2 - 2R_3]{R_1 - 3R_3} \underbrace{\left[\begin{array}{rrr|r} \mathbf{1} & 0 & 0 & 1 \\ 0 & \mathbf{1} & 0 & 2 \\ 0 & 0 & \mathbf{1} & 3 \end{array}\right]}_{\text{REF and RREF}}$$

We point out here that any matrix has many REFs, but the RREF is always unique for any
matrix.
Example 19.2. Solve the linear system of equations

3x1 + x2 = 10
2x1 + x2 + x3 = 6
−3x1 + 4x2 + 15x3 = −20

Solution. We use elementary row operations to carry the augmented matrix of the system
to RREF.
$$\left[\begin{array}{rrr|r} 3 & 1 & 0 & 10 \\ 2 & 1 & 1 & 6 \\ -3 & 4 & 15 & -20 \end{array}\right] \xrightarrow{R_1 - R_2} \left[\begin{array}{rrr|r} 1 & 0 & -1 & 4 \\ 2 & 1 & 1 & 6 \\ -3 & 4 & 15 & -20 \end{array}\right] \xrightarrow[R_3 + 3R_1]{R_2 - 2R_1} \left[\begin{array}{rrr|r} 1 & 0 & -1 & 4 \\ 0 & 1 & 3 & -2 \\ 0 & 4 & 12 & -8 \end{array}\right] \xrightarrow{R_3 - 4R_2} \left[\begin{array}{rrr|r} 1 & 0 & -1 & 4 \\ 0 & 1 & 3 & -2 \\ 0 & 0 & 0 & 0 \end{array}\right]$$

If we write out the resulting system, we have

x1 − x3 = 4
x2 + 3x3 = −2
0 = 0

The last equation is clearly always true, and from the first two equations, we can solve for
x1 and x2 respectively to obtain

x1 = 4 + x3
x2 = −2 − 3x3

We see that there is no restriction on x3 , so we let x3 = t ∈ R. Thus our solution is
$$\begin{aligned} x_1 &= 4 + t \\ x_2 &= -2 - 3t \\ x_3 &= t \end{aligned}, \quad t \in \mathbb{R} \qquad\text{or}\qquad \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}4\\-2\\0\end{bmatrix} + t\begin{bmatrix}1\\-3\\1\end{bmatrix}, \quad t \in \mathbb{R}.$$

Geometrically, we view solving the above system of equations as finding those points in R3
that lie on the three planes 3x1 + x2 = 10, 2x1 + x2 + x3 = 6 and −3x1 + 4x2 + 15x3 = −20.
Notice that the solution we obtained,
$$\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}4\\-2\\0\end{bmatrix} + t\begin{bmatrix}1\\-3\\1\end{bmatrix}, \quad t \in \mathbb{R},$$

is the vector equation of a line in R3 . Hence we see that the three planes intersect in a line,
and we have found the vector equation for that line. See Figure 45.

Figure 45: The intersection of the three planes in R3 is a line. Note that the planes may not
be arranged exactly as shown.

That our solution was a line in R3 was a direct consequence of the fact that there were no
restrictions on the variable x3 and that as a result, our solutions for x1 and x2 depended on
x3 . This motivates the following definition.

Definition 19.3. Consider a consistent system of equations with augmented matrix [ A | ~b ]


and let [ R | ~c ] be any REF of [ A | ~b ]. If the jth column of R has a leading entry in it, then
the variable xj is called a leading variable. If the jth column R does not have a leading
entry, then xj is called a free variable.

In our last example (leading entries in bold),
$$\left[\begin{array}{rrr|r} \mathbf{3} & 1 & 0 & 10 \\ \mathbf{2} & 1 & 1 & 6 \\ \mathbf{-3} & 4 & 15 & -20 \end{array}\right] \xrightarrow{R_1 - R_2} \left[\begin{array}{rrr|r} \mathbf{1} & 0 & -1 & 4 \\ \mathbf{2} & 1 & 1 & 6 \\ \mathbf{-3} & 4 & 15 & -20 \end{array}\right] \xrightarrow[R_3 + 3R_1]{R_2 - 2R_1} \left[\begin{array}{rrr|r} \mathbf{1} & 0 & -1 & 4 \\ 0 & \mathbf{1} & 3 & -2 \\ 0 & \mathbf{4} & 12 & -8 \end{array}\right] \xrightarrow{R_3 - 4R_2} \underbrace{\left[\begin{array}{rrr|r} \mathbf{1} & 0 & -1 & 4 \\ 0 & \mathbf{1} & 3 & -2 \\ 0 & 0 & 0 & 0 \end{array}\right]}_{\text{REF (RREF actually)}}$$

In light of Definition 19.3, we can take
$$R = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{bmatrix}.$$
As the first two columns of R have leading entries (leading ones in this case), we have that
x1 and x2 are leading variables. There is no leading entry in the third column of R, so x3 is
a free variable.

When solving a system, if there are free variables, then each free variable is assigned a
different parameter, and then the leading variables are solved for in terms of the parameters.
The existence of a free variable guarantees that there will be infinitely many solutions to the
linear system of equations.
Example 19.4. Solve the linear system of equations
x1 + 6x2 − x4 = −1
x3 + 2x4 = 7
Solution. We have that the augmented matrix for this system of linear equations,
$$\left[\begin{array}{rrrr|r} 1 & 6 & 0 & -1 & -1 \\ 0 & 0 & 1 & 2 & 7 \end{array}\right],$$
is already in RREF. The leading entries are in the first and third columns, so x1 and x3
are leading variables while x2 and x4 are free variables. We will assign x2 and x4 different
parameters. We have
$$\begin{aligned} x_1 &= -1 - 6s + t \\ x_2 &= s \\ x_3 &= 7 - 2t \\ x_4 &= t \end{aligned}, \qquad s, t \in \mathbb{R}$$
or as a vector equation
$$\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}-1\\0\\7\\0\end{bmatrix} + s\begin{bmatrix}-6\\1\\0\\0\end{bmatrix} + t\begin{bmatrix}1\\0\\-2\\1\end{bmatrix}, \qquad s, t \in \mathbb{R},$$
which we recognize as the equation of a plane in R4 .
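A general solution with free variables can also be produced symbolically. The SymPy sketch below is an illustration only (not part of the course material); it solves the system of Example 19.4, expressing the leading variables x1 and x3 in terms of the free variables x2 and x4.

from sympy import Matrix, symbols, linsolve

x1, x2, x3, x4 = symbols('x1 x2 x3 x4')

# Augmented matrix of x1 + 6x2 - x4 = -1 and x3 + 2x4 = 7.
aug = Matrix([[1, 6, 0, -1, -1],
              [0, 0, 1,  2,  7]])

# linsolve parametrizes the solution set; x2 and x4 remain as free parameters.
print(linsolve(aug, x1, x2, x3, x4))
# {(-6*x2 + x4 - 1, x2, 7 - 2*x4, x4)}   (up to the order in which terms are printed)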

In the previous example, note that when we reached RREF and we begin to find the values
of x1 , x2 , x3 and x4 that will give the solution, it was easiest to solve for x4 first, then x3
followed by x2 and finally x1 .

Example 19.5. Solve the linear system of equations

2x1 + 12x2 − 8x3 = −4


2x1 + 13x2 − 6x3 = −5
−2x1 − 14x2 + 4x3 = 7

Solution. We have
$$\left[\begin{array}{rrr|r} 2 & 12 & -8 & -4 \\ 2 & 13 & -6 & -5 \\ -2 & -14 & 4 & 7 \end{array}\right] \xrightarrow[R_3 + R_1]{R_2 - R_1} \left[\begin{array}{rrr|r} 2 & 12 & -8 & -4 \\ 0 & 1 & 2 & -1 \\ 0 & -2 & -4 & 3 \end{array}\right] \xrightarrow{R_3 + 2R_2} \left[\begin{array}{rrr|r} 2 & 12 & -8 & -4 \\ 0 & 1 & 2 & -1 \\ 0 & 0 & 0 & 1 \end{array}\right]$$

The resulting system is


2x1 + 12x2 − 8x3 = −4
x2 + 2x3 = −1
0 = 1
Clearly, the last equation can never be satisfied for any x1 , x2 , x3 ∈ R. Hence our system is
inconsistent, that is, it has no solution.
Geometrically, we see that the three planes 2x1 + 12x2 − 8x3 = −4, 2x1 + 13x2 − 6x3 = −5
and −2x1 − 14x2 + 4x3 = 7 in R3 have no point in common. Notice that no two of the planes
are parallel so the planes are arranged similar to what is depicted in Figure 46.

Figure 46: Three nonparallel planes that have no common point of intersection.

Keeping track of our leading entries in the last example (shown in bold), we see
$$\left[\begin{array}{rrr|r} \mathbf{2} & 12 & -8 & -4 \\ \mathbf{2} & 13 & -6 & -5 \\ \mathbf{-2} & -14 & 4 & 7 \end{array}\right] \xrightarrow[R_3 + R_1]{R_2 - R_1} \left[\begin{array}{rrr|r} \mathbf{2} & 12 & -8 & -4 \\ 0 & \mathbf{1} & 2 & -1 \\ 0 & \mathbf{-2} & -4 & 3 \end{array}\right] \xrightarrow{R_3 + 2R_2} \underbrace{\left[\begin{array}{rrr|r} \mathbf{2} & 12 & -8 & -4 \\ 0 & \mathbf{1} & 2 & -1 \\ 0 & 0 & 0 & \mathbf{1} \end{array}\right]}_{\text{REF (but not RREF)}}$$

If row reducing an augmented matrix reveals a row of the form


$$\left[\begin{array}{ccc|c} 0 & \cdots & 0 & c \end{array}\right]$$
with c ≠ 0, then the system is inconsistent. Thus, there is no need to continue row operations in this case. Note that in a row of the form [ 0 · · · 0 | c ] with c ≠ 0, the entry c is a leading
entry. Thus, a leading entry appearing in the last column of an augmented matrix indicates
that the system of linear equations is inconsistent.

All of our work for systems of linear equations can easily be generalized to the complex case.

Example 19.6. Solve the linear system of equations

jz1 − z2 − z3 + (−1 + j)z4 = −1


− (1 + j)z3 − 2jz4 = −1 − 3j
2jz1 − 2z2 − z3 − (1 − 3j)z4 = j

Solution. Our method to solve this system is no different than in the real case. We take the
augmented matrix of the system and use elementary row operations to carry it to RREF.
Note that our elementary row operations now involve multiplying a row by a complex number
and adding a complex multiple of one row to another, in addition to swapping two distinct
rows.
$$\left[\begin{array}{rrrr|r} j & -1 & -1 & -1+j & -1 \\ 0 & 0 & -1-j & -2j & -1-3j \\ 2j & -2 & -1 & -1+3j & j \end{array}\right] \xrightarrow{R_3 - 2R_1} \left[\begin{array}{rrrr|r} j & -1 & -1 & -1+j & -1 \\ 0 & 0 & -1-j & -2j & -1-3j \\ 0 & 0 & 1 & 1+j & 2+j \end{array}\right] \xrightarrow[R_2 \leftrightarrow R_3]{-jR_1}$$
$$\left[\begin{array}{rrrr|r} 1 & j & j & 1+j & j \\ 0 & 0 & 1 & 1+j & 2+j \\ 0 & 0 & -1-j & -2j & -1-3j \end{array}\right] \xrightarrow{R_3 + (1+j)R_2} \left[\begin{array}{rrrr|r} 1 & j & j & 1+j & j \\ 0 & 0 & 1 & 1+j & 2+j \\ 0 & 0 & 0 & 0 & 0 \end{array}\right] \xrightarrow{R_1 - jR_2} \left[\begin{array}{rrrr|r} 1 & j & 0 & 2 & 1-j \\ 0 & 0 & 1 & 1+j & 2+j \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
We see that the system is consistent and that z1 and z3 are leading variables while z2 and
z4 are free variables. Thus

$$\begin{aligned} z_1 &= (1 - j) - js - 2t \\ z_2 &= s \\ z_3 &= (2 + j) - (1 + j)t \\ z_4 &= t \end{aligned}, \qquad s, t \in \mathbb{C}$$
or
$$\begin{bmatrix}z_1\\z_2\\z_3\\z_4\end{bmatrix} = \begin{bmatrix}1-j\\0\\2+j\\0\end{bmatrix} + s\begin{bmatrix}-j\\1\\0\\0\end{bmatrix} + t\begin{bmatrix}-2\\0\\-1-j\\1\end{bmatrix}, \qquad s, t \in \mathbb{C}.$$
Note that when we are dealing with a complex system of linear equations, our parameters
should be complex numbers rather than just real numbers.
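Exact computation works just as well over C. The SymPy sketch below is an illustration only (not part of the course material; SymPy writes the imaginary unit as I where these notes use j); it reduces the augmented matrix of Example 19.6.

from sympy import Matrix, I

aug = Matrix([[  I, -1,     -1, -1 + I,       -1],
              [  0,  0, -1 - I,   -2*I, -1 - 3*I],
              [2*I, -2,     -1, -1 + 3*I,      I]])

R, pivots = aug.rref()
print(R)        # Matrix([[1, I, 0, 2, 1 - I], [0, 0, 1, 1 + I, 2 + I], [0, 0, 0, 0, 0]])
print(pivots)   # (0, 2): z1 and z3 are the leading variables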

Lecture 20
Example 20.1. Determine if
$$\begin{bmatrix}31\\8\\10\end{bmatrix} \in \operatorname{Span}\left\{ \begin{bmatrix}2\\0\\1\end{bmatrix}, \begin{bmatrix}1\\1\\0\end{bmatrix}, \begin{bmatrix}9\\2\\3\end{bmatrix} \right\}.$$

Solution. For c1 , c2 , c3 ∈ R, consider
$$\begin{bmatrix}31\\8\\10\end{bmatrix} = c_1\begin{bmatrix}2\\0\\1\end{bmatrix} + c_2\begin{bmatrix}1\\1\\0\end{bmatrix} + c_3\begin{bmatrix}9\\2\\3\end{bmatrix}$$
which leads to the system of equations with augmented matrix
$$\begin{aligned} 2c_1 + c_2 + 9c_3 &= 31 \\ c_2 + 2c_3 &= 8 \\ c_1 + 3c_3 &= 10 \end{aligned} \qquad\longrightarrow\qquad \left[\begin{array}{rrr|r} 2 & 1 & 9 & 31 \\ 0 & 1 & 2 & 8 \\ 1 & 0 & 3 & 10 \end{array}\right].$$
We’ve seen this system before - it is the system from Example 18.7 with x1 , x2 , x3 replaced
with c1 , c2 , c3 . Thus we have that the system is consistent and c1 = 1, c2 = 2 and c3 = 3.
Hence,
$$\begin{bmatrix}31\\8\\10\end{bmatrix} = \begin{bmatrix}2\\0\\1\end{bmatrix} + 2\begin{bmatrix}1\\1\\0\end{bmatrix} + 3\begin{bmatrix}9\\2\\3\end{bmatrix}$$
and
$$\begin{bmatrix}31\\8\\10\end{bmatrix} \in \operatorname{Span}\left\{ \begin{bmatrix}2\\0\\1\end{bmatrix}, \begin{bmatrix}1\\1\\0\end{bmatrix}, \begin{bmatrix}9\\2\\3\end{bmatrix} \right\}.$$
 

Note that the coefficient matrix of the system in the previous example is
$$\begin{bmatrix} 2 & 1 & 9 \\ 0 & 1 & 2 \\ 1 & 0 & 3 \end{bmatrix}$$
and that its columns are the vectors from the above spanning set.
Theorem 20.2. Let ~v1 , . . . , ~vk ∈ Rn . Then ~b ∈ Span {~v1 , . . . , ~vk } if and only if the system
with augmented matrix
$$\left[\begin{array}{ccc|c} \vec{v}_1 & \cdots & \vec{v}_k & \vec{b} \end{array}\right]$$

is consistent. Note that ~v1 , . . . , ~vk , ~b are the columns of the augmented matrix.

Example 20.3. From Example 19.2, the linear system of equations

3x1 + x2 = 10
2x1 + x2 + x3 = 6
−3x1 + 4x2 + 15x3 = −20
is consistent so we have that
$$\begin{bmatrix}10\\6\\-20\end{bmatrix} \in \operatorname{Span}\left\{ \begin{bmatrix}3\\2\\-3\end{bmatrix}, \begin{bmatrix}1\\1\\4\end{bmatrix}, \begin{bmatrix}0\\1\\15\end{bmatrix} \right\}.$$

Moreover, as the solution to this system is x1 = 4 + t, x2 = −2 − 3t and x3 = t for t ∈ R, we see that
$$\begin{bmatrix}10\\6\\-20\end{bmatrix} = (4+t)\begin{bmatrix}3\\2\\-3\end{bmatrix} - (2+3t)\begin{bmatrix}1\\1\\4\end{bmatrix} + t\begin{bmatrix}0\\1\\15\end{bmatrix},$$
that is, there are infinitely many ways to express [10 6 −20]^T as a linear combination of [3 2 −3]^T, [1 1 4]^T and [0 1 15]^T.

Example 20.4. From Example 19.5, the linear system of equations

2x1 + 12x2 − 8x3 = −4


2x1 + 13x2 − 6x3 = −5
−2x1 − 14x2 + 4x3 = 7
is inconsistent, so
$$\begin{bmatrix}-4\\-5\\7\end{bmatrix} \notin \operatorname{Span}\left\{ \begin{bmatrix}2\\2\\-2\end{bmatrix}, \begin{bmatrix}12\\13\\-14\end{bmatrix}, \begin{bmatrix}-8\\-6\\4\end{bmatrix} \right\}.$$

We observe that we now have a couple of ways to view a linear system of equations:
• In terms of its rows, where we can think of the system geometrically as intersecting
hyperplanes. The solution can then be interpreted as a description of this intersection
(if the system is inconsistent, then there is no intersection and the solution set is
empty).
• In terms of the columns (of the augmented matrix), where we think of the system
algebraically as determining if a vector ~b is in the span of a given set of vectors. The
answer is affirmative if the system is consistent and the solution tells us how to write
~b as a linear combination of the given vectors, and the answer is negative if the system
is inconsistent.

Rank
After solving numerous systems of equations, we are beginning to see the importance of
leading entries in an REF of the augmented matrix of the system. This motivates the
following definition.

Definition 20.5. The rank of a matrix A, denoted by rank (A), is the number of leading
entries in any REF of A.

Note that although we don’t prove it here, given a matrix and any two of its REFs, the
number of leading entries in both of these REFs will be the same. This means that our
definition of rank actually makes sense.

Example 20.6. Consider the following three matrices A, B and C along with one of their
REFs. Note that A and B are being viewed as augmented matrices for a linear system of
equations, while C is being viewed as a coefficient matrix.
$$A = \left[\begin{array}{rrr|r} 2 & 1 & 9 & 31 \\ 0 & 1 & 2 & 8 \\ 1 & 0 & 3 & 10 \end{array}\right] \longrightarrow \left[\begin{array}{rrr|r} 1 & 0 & 3 & 10 \\ 0 & 1 & 2 & 8 \\ 0 & 0 & 1 & 3 \end{array}\right]$$
$$B = \left[\begin{array}{rrrr|r} 2 & 0 & 1 & 3 & 4 \\ 5 & 1 & 6 & -7 & 3 \end{array}\right] \longrightarrow \left[\begin{array}{rrrr|r} 1 & 1 & 4 & -13 & -5 \\ 0 & -2 & -7 & 29 & 14 \end{array}\right]$$
$$C = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \end{bmatrix}$$

We see that rank (A) = 3, rank (B) = 2 and rank (C) = 1.

Note that the requirement that a matrix be in REF before counting leading entries is im-
portant. The matrix
$$C = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}$$
has two leading entries, but rank (C) = 1.

Note that if a matrix has m rows and n columns, then rank (A) ≤ min{m, n}, the minimum
of m and n. This follows from the definition of leading entries and REF: there can be at
most one leading entry in each row and each column.
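Rank is also computed directly by software, which in effect row reduces for us. An illustrative SymPy sketch (not part of the course material) for the three matrices of Example 20.6:

from sympy import Matrix

A = Matrix([[2, 1, 9, 31], [0, 1, 2, 8], [1, 0, 3, 10]])
B = Matrix([[2, 0, 1, 3, 4], [5, 1, 6, -7, 3]])
C = Matrix([[1, 2, 3], [2, 4, 6]])

# rank() counts the leading entries in a row echelon form of the matrix.
print(A.rank(), B.rank(), C.rank())      # 3 2 1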

The next theorem is useful to analyze systems of equations and will appear throughout the
course.

Theorem 20.7 (System-Rank Theorem). Let [ A | ~b ] be the augmented matrix of a system
of m linear equations in n variables.
(1) The system is consistent if and only if rank (A) = rank [ A | ~b ]


(2) If the system is consistent, then the number of parameters in the general solution is
the number of variables minus the rank of A:

# of parameters = n − rank (A).

(3) The system is consistent for all ~b ∈ Rm if and only if rank (A) = m.
We don’t prove the System-Rank Theorem here. However, we will look at some of the
systems we have encountered thus far and show that they each satisfy all three parts of the
System-Rank Theorem.
Example 20.8. From Example 18.7, the system of m = 3 linear equations in n = 3 variables

2x1 + x2 + 9x3 = 31
x2 + 2x3 = 8
x1 + 3x3 = 10

has augmented matrix
$$[\,A \mid \vec{b}\,] = \left[\begin{array}{rrr|r} 2 & 1 & 9 & 31 \\ 0 & 1 & 2 & 8 \\ 1 & 0 & 3 & 10 \end{array}\right] \longrightarrow \left[\begin{array}{rrr|r} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{array}\right]$$
and solution
$$\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}1\\2\\3\end{bmatrix}.$$
From the System-Rank Theorem we see that
(1) rank (A) = 3 = rank [ A | ~b ] so the system is consistent.


(2) # of parameters = n − rank (A) = 3 − 3 = 0 so there are no parameters in the solution


(unique solution).

(3) rank (A) = 3 = m so the system will be consistent for any ~b ∈ R3 , that is, the system

2x1 + x2 + 9x3 = b1
x2 + 2x3 = b2
x1 + 3x3 = b3
will be consistent (with a unique solution) for any choice of b1 , b2 , b3 ∈ R.
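All three parts of the System-Rank Theorem are easy to check for a particular system by comparing rank (A) with rank [ A | ~b ]. An illustrative SymPy sketch (not part of the course material) for the system of Example 20.8:

from sympy import Matrix

A = Matrix([[2, 1, 9], [0, 1, 2], [1, 0, 3]])
b = Matrix([31, 8, 10])
aug = A.row_join(b)                  # the augmented matrix [ A | b ]

n = A.cols                           # number of variables
print(A.rank() == aug.rank())        # True: the system is consistent
print(n - A.rank())                  # 0: no parameters, so the solution is unique
print(A.rank() == A.rows)            # True: consistent for every b in R^3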

Example 20.9. From Example 19.2, the system of m = 3 linear equations in n = 3 variables

3x1 + x2 = 10
2x1 + x2 + x3 = 6
−3x1 + 4x2 + 15x3 = −20

has augmented matrix
$$[\,A \mid \vec{b}\,] = \left[\begin{array}{rrr|r} 3 & 1 & 0 & 10 \\ 2 & 1 & 1 & 6 \\ -3 & 4 & 15 & -20 \end{array}\right] \longrightarrow \left[\begin{array}{rrr|r} 1 & 0 & -1 & 4 \\ 0 & 1 & 3 & -2 \\ 0 & 0 & 0 & 0 \end{array}\right]$$
and solution
$$\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}4\\-2\\0\end{bmatrix} + t\begin{bmatrix}1\\-3\\1\end{bmatrix}, \quad t \in \mathbb{R}.$$
From the System-Rank Theorem, we have

(1) rank (A) = 2 = rank [ A | ~b ] so the system is consistent.




(2) # of parameters = n − rank (A) = 3 − 2 = 1 so there is 1 parameter in the solution


(infinitely many solutions).

(3) rank (A) = 2 ≠ 3 = m, so the system will not be consistent for every ~b ∈ R3 , that is,
the system

3x1 + x2 = b1
2x1 + x2 + x 3 = b2
−3x1 + 4x2 + 15x3 = b3
will be inconsistent for some choice of b1 , b2 , b3 ∈ R.

Example 20.10. From Example 19.4, the system of m = 2 equations in n = 4 variables

x1 + 6x2 − x4 = −1
x3 + 2x4 = 7

has augmented matrix that is already in RREF,
$$[\,A \mid \vec{b}\,] = \left[\begin{array}{rrrr|r} 1 & 6 & 0 & -1 & -1 \\ 0 & 0 & 1 & 2 & 7 \end{array}\right],$$

and solution
$$\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}-1\\0\\7\\0\end{bmatrix} + s\begin{bmatrix}-6\\1\\0\\0\end{bmatrix} + t\begin{bmatrix}1\\0\\-2\\1\end{bmatrix}, \qquad s, t \in \mathbb{R}.$$
From the System-Rank Theorem,
(1) rank (A) = 2 = rank [ A | ~b ] so the system is consistent.
(2) # of parameters = n − rank (A) = 4 − 2 = 2 so there are 2 parameters in the solution
(infinitely many solutions).

(3) rank (A) = 2 = m, so the system will be consistent for every ~b ∈ R2 , that is, the system
x1 + 6x2 − x 4 = b1
x3 + 2x4 = b2
will be consistent (with infinitely many solutions) for any choice of b1 , b2 ∈ R.
Example 20.11. From Example 19.5, the system of m = 3 linear equations in n = 3
variables
2x1 + 12x2 − 8x3 = −4
2x1 + 13x2 − 6x3 = −5
−2x1 − 14x2 + 4x3 = 7
has augmented matrix
$$[\,A \mid \vec{b}\,] = \left[\begin{array}{rrr|r} 2 & 12 & -8 & -4 \\ 2 & 13 & -6 & -5 \\ -2 & -14 & 4 & 7 \end{array}\right] \longrightarrow \left[\begin{array}{rrr|r} 2 & 12 & -8 & -4 \\ 0 & 1 & 2 & -1 \\ 0 & 0 & 0 & 1 \end{array}\right]$$
and is inconsistent. From the System-Rank Theorem, we see
(1) rank (A) = 2 < 3 = rank [ A | ~b ] , so the system is inconsistent.


(2) as the system is inconsistent, the System-Rank Theorem does not apply here.

(3) rank (A) = 2 < 3 = m so the system will not be consistent for every ~b ∈ R3 . Indeed, as our work shows, the system is clearly not consistent for ~b = [−4 −5 7]^T.

In our last example, it is tempting to think that the system [ A | ~b ] will be inconsistent for
every ~b ∈ R3 , however, this is not the case. If we take ~b = ~0, then our system becomes

2x1 + 12x2 − 8x3 = 0


2x1 + 13x2 − 6x3 = 0
−2x1 − 14x2 + 4x3 = 0

It isn’t difficult to see that x1 = x2 = x3 = 0 is a solution, so that this system is indeed
consistent. Of course, the question now is for which ~b ∈ R3 is this system consistent.

Example 20.12. Find an equation that b1 , b2 , b3 ∈ R must satisfy so that the system

2x1 + 12x2 − 8x3 = b1


2x1 + 13x2 − 6x3 = b2
−2x1 − 14x2 + 4x3 = b3

is consistent.
Solution. We look at the augmented matrix of this system, and carry it to REF.
$$\left[\begin{array}{rrr|c} 2 & 12 & -8 & b_1 \\ 2 & 13 & -6 & b_2 \\ -2 & -14 & 4 & b_3 \end{array}\right] \xrightarrow[R_3 + R_1]{R_2 - R_1} \left[\begin{array}{rrr|c} 2 & 12 & -8 & b_1 \\ 0 & 1 & 2 & b_2 - b_1 \\ 0 & -2 & -4 & b_3 + b_1 \end{array}\right] \xrightarrow{R_3 + 2R_2} \left[\begin{array}{rrr|c} 2 & 12 & -8 & b_1 \\ 0 & 1 & 2 & b_2 - b_1 \\ 0 & 0 & 0 & -b_1 + 2b_2 + b_3 \end{array}\right]$$

Since rank (A) = 2, we require rank [ A | ~b ] = 2 for consistency. Thus, we require that


−b1 + 2b2 + b3 = 0.

Note that if −b1 + 2b2 + b3 ≠ 0, then the above system is inconsistent.

Lecture 21
Our last examples showed that the System-Rank Theorem did indeed use the rank of a
matrix to predict whether or not a system was consistent, and if it was consistent, how
many parameters the solution would have. Of course, we already knew the answers to these
problems as we had previously solved those systems. Here we look at another example of
using the System-Rank Theorem to predict how many solutions a system will have based
on the values of the coefficients in the system. In this situation, we are not concerned with
what the solutions are, but simply if solutions exist and how many solutions there are.

Example 21.1. For which values of k, ` ∈ R does the system

2x1 + 6x2 = 5
4x1 + (k + 15)x2 = ` + 8

have no solutions? A unique solution? Infinitely many solutions?

Solution. Let
$$A = \begin{bmatrix} 2 & 6 \\ 4 & k+15 \end{bmatrix} \quad\text{and}\quad \vec{b} = \begin{bmatrix} 5 \\ \ell+8 \end{bmatrix}.$$
We carry [ A | ~b ] to REF.
$$\left[\begin{array}{rr|c} 2 & 6 & 5 \\ 4 & k+15 & \ell+8 \end{array}\right] \xrightarrow{R_2 - 2R_1} \left[\begin{array}{rr|c} 2 & 6 & 5 \\ 0 & k+3 & \ell-2 \end{array}\right]$$

If k + 3 ≠ 0, that is, if k ≠ −3, then rank (A) = 2 = rank [ A | ~b ] so the system is consistent with 2 − rank (A) = 2 − 2 = 0 parameters. Hence we obtain a unique solution. If k + 3 = 0, that is, if k = −3, then we have
$$\left[\begin{array}{rr|c} 2 & 6 & 5 \\ 0 & k+3 & \ell-2 \end{array}\right] = \left[\begin{array}{rr|c} 2 & 6 & 5 \\ 0 & 0 & \ell-2 \end{array}\right].$$
If ℓ − 2 ≠ 0, that is, if ℓ ≠ 2, then rank (A) = 1 < 2 = rank [ A | ~b ] so the system is inconsistent and thus has no solutions. If ℓ − 2 = 0, that is, if ℓ = 2, then rank (A) = 1 = rank [ A | ~b ] so the system is consistent with 2 − rank (A) = 2 − 1 = 1 parameter. Hence we have infinitely many solutions.

In summary,
Unique Solution: k ≠ −3
No Solutions: k = −3 and ℓ ≠ 2
Infinitely Many Solutions: k = −3 and ℓ = 2

Definition 21.2. A linear system of m equations in n variables is underdetermined if n > m, that is, if it has more variables than equations.
Example 21.3. The linear system of equations

x1 + x 2 − x3 + x4 − x5 = 1
x1 − x2 − 3x3 + 2x4 + 2x5 = 7
is underdetermined.
Theorem 21.4. A consistent underdetermined linear system of equations has infinitely many
solutions.
Proof. Consider a consistent underdetermined linear system of m equations in n variables
with augmented matrix [ A | ~b ]. Since rank (A) ≤ min{m, n} = m, the system will have
n − rank (A) ≥ n − m > 0 parameters and so will have infinitely many solutions.
Definition 21.5. A linear system of m equations in n variables is overdetermined if n < m, that is, if it has more equations than variables.
Example 21.6. The linear system of equations

−2x1 + x2 = 2
x1 − 3x2 = 4
3x1 + 2x2 = 7
is overdetermined.
Note that overdetermined systems are often inconsistent. Indeed, the system in the previous
example is inconsistent. To see why this is, consider for example, three lines in R2 (so a
system of three equations in two variables like the one in the previous example). When
chosen arbitrarily, it is generally unlikely that all three lines would intersect in a common
point and hence we would generally expect no solutions.

Homogeneous Systems of Linear Equations


We now discuss a particular type of linear system of equations that appears quite frequently.
Definition 21.7. A homogeneous linear equation is a linear equation where the constant
term is zero. A system of homogeneous linear equations is a collection of finitely many
homogeneous equations.
Example 21.8. A homogeneous system of m linear equations in n variables is written as

a11 x1 + a12 x2 + · · · + a1n xn = 0


a21 x1 + a22 x2 + · · · + a2n xn = 0
.. .. .. .. ..
. . . . .
am1 x1 + am2 x2 + · · · + amn xn = 0

131
As this is still a linear system of equations, we use our usual techniques to solve such systems.
However, notice that x1 = x2 = · · · = xn = 0 satisfies each equation in the homogeneous
system, and thus ~0 ∈ Rn is a solution to this system, called the trivial solution. As every
homogeneous system has a trivial solution, we see immediately that homogeneous linear
systems of equations are always consistent.

Example 21.9. Solve the homogeneous linear system

x1 + x2 + x3 = 0
3x2 − x3 = 0

Solution. We have

[ 1  1   1 | 0 ]  (1/3)R2   [ 1  1    1  | 0 ]  R1 − R2   [ 1  0   4/3 | 0 ]
[ 0  3  −1 | 0 ]    −→      [ 0  1  −1/3 | 0 ]    −→      [ 0  1  −1/3 | 0 ]

so x1 = −(4/3)t, x2 = (1/3)t, x3 = t for t ∈ R, or equivalently

[ x1 x2 x3 ]T = t [ −4/3  1/3  1 ]T ,  t ∈ R.
We make a few remarks about this example:

• Note that taking t = 0 gives the trivial solution. However, as our system was underde-
termined, we have infinitely many solutions. Indeed, the solution set is actually a line
through the origin.

• We can simplify our solution a little bit by eliminating fractions:

  [ x1 x2 x3 ]T = t [ −4/3  1/3  1 ]T = (t/3) [ −4  1  3 ]T = s [ −4  1  3 ]T ,  s ∈ R

  where s = t/3. Hence we can let the parameter “absorb” the factor of 1/3. This is not
  necessary, but is useful if one wishes to eliminate fractions.

• When working with homogeneous systems of linear equations, notice that the aug-
mented matrix [ A | ~0 ] will always have the last column containing all zero entries.
Thus, it is common to row reduce only the coefficient matrix.
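As an aside (not part of the course material), Example 21.9 can be cross-checked with sympy: nullspace() returns a basis for the solution space of A~x = ~0, and rref() reproduces the reduction above.

```python
# Sketch of Example 21.9 with sympy: row reduce the coefficient matrix and
# ask for a basis of the solution space of Ax = 0.
from sympy import Matrix

A = Matrix([[1, 1, 1],
            [0, 3, -1]])
print(A.rref())        # (Matrix([[1, 0, 4/3], [0, 1, -1/3]]), (0, 1))
print(A.nullspace())   # [Matrix([[-4/3], [1/3], [1]])] -- the direction vector found above
```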

Given a non-homogeneous linear system of equations with augmented matrix [ A | ~b ] (so


~b 6= ~0), the homogeneous system with augmented matrix [ A | ~0 ] is called the associated
homogeneous system. The solution to the associated homogeneous system tells us a lot
about the solution of the original non-homogeneous system.

132
Example 21.10. If we solve the system

x1 + x2 + x3 = 1
3x2 − x3 = 3

we obtain

[ 1  1   1 | 1 ]  (1/3)R2   [ 1  1    1  | 1 ]  R1 − R2   [ 1  0   4/3 | 0 ]
[ 0  3  −1 | 3 ]    −→      [ 0  1  −1/3 | 1 ]    −→      [ 0  1  −1/3 | 1 ]

so x1 = −(4/3)t, x2 = 1 + (1/3)t, x3 = t for t ∈ R, or equivalently

[ x1 x2 x3 ]T = [ 0  1  0 ]T + t [ −4/3  1/3  1 ]T ,  t ∈ R.

Note that the solution to the associated homogeneous system (from Example 21.9) is
   
x1 −4/3
 x2  = t  1/3  , t ∈ R
   

x3 1

so we view the homogeneous solution from Example 21.9 as a line, say L0 , through the
origin, and the solution from Example 21.10 as a line, say L1 , through P (0, 1, 0) parallel
to L0 . We refer to [ 0 1 0 ]T as a particular solution to the system in Example 21.10 and note
that in general, the solution to a non-homogeneous system of linear equations is a particular
solution plus the solution to the associated homogeneous system of linear equations, provided
the non-homogeneous system of linear equations is consistent.

system of equations:
    [ x1 x2 x3 ]T = [ 0  1  0 ]T + t [ −4/3  1/3  1 ]T ,  t ∈ R

associated homogeneous system of equations:
    [ x1 x2 x3 ]T = t [ −4/3  1/3  1 ]T ,  t ∈ R

Here [ 0 1 0 ]T is the particular solution and t [ −4/3 1/3 1 ]T is the associated homogeneous
solution.
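As a quick numerical aside (not part of the notes), this decomposition is easy to check with numpy: adding any multiple of the homogeneous solution to the particular solution still solves A~x = ~b. The matrices below are those of Example 21.10.

```python
# Sanity check: x_p + t*v solves Ax = b for every t, where v spans the
# solution space of the associated homogeneous system.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [0.0, 3.0, -1.0]])
b = np.array([1.0, 3.0])
x_p = np.array([0.0, 1.0, 0.0])      # particular solution
v   = np.array([-4/3, 1/3, 1.0])     # homogeneous direction vector
for t in (-2.0, 0.5, 7.0):
    assert np.allclose(A @ (x_p + t * v), b)
print("x_p + t*v solves Ax = b for every t tested")
```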

Example 21.11. Consider the system of linear equations

x1 + 6x2 − x4 = −1
x3 + 2x4 = 7

133
We know from Example 19.4 that the solution is
       
x1 −1 −6 1
 x2   0   1   0 
=  + s  + t , s, t ∈ R,
       

 x3   7   0   −2 
x4 0 0 1

so the solution to the associated homogeneous system

x1 + 6x2 − x4 = 0
x3 + 2x4 = 0

is      
x1 −6 1
 x2   1   0 
 = s  + t , s, t ∈ R.
     

 x3   0   −2 
x4 0 1
which we recognize as a plane through the origin in R4 since the two vectors appearing in
the solution are nonzero and nonparallel.

From Examples 21.9 and 21.11 we saw that our solution sets were lines and planes through
the origin which we recognize as subspaces. The following theorem shows that the solution
set to any homogeneous system in n variables will indeed be a subspace of Rn .

Theorem 21.12. Let S be the solution set to a homogeneous system of m linear equations
in n variables. Then S is a subspace of Rn .

Proof. Since the system has n variables, S ⊆ Rn and since the system is homogeneous, ~0 ∈ S
so S is nonempty. Now let
  
y1 z1
 .   . 
~y =  ..  and ~z =  .. 
yn zn

be vectors in S. To show that S is closed under vector addition and scalar multiplication, it
is enough to consider one arbitrary equation of the system:

a1 x1 + · · · + an xn = 0.

Then ~y , ~z ∈ S imply that

a1 y1 + · · · + an yn = 0 = a1 z1 + · · · + an zn .

134
It follows that
a1 (y1 + z1 ) + · · · + an (yn + zn ) = a1 y1 + · · · + an yn + a1 z1 + · · · + an zn = 0 + 0 = 0
so ~y + ~z satisfies any equation of the system and thus ~y + ~z ∈ S. For c ∈ R,
a1 (cy1 ) + · · · + an (cyn ) = c(a1 y1 + · · · + an yn ) = c(0) = 0
so c~y ∈ S. Hence S is a subspace of Rn .
Note that we call the solution set of a homogeneous system the solution space of the system.
Example 21.13. Solve the homogeneous system of linear equations

4x1 − 2x2 + 3x3 + 5x4 = 0


8x1 − 4x2 + 6x3 + 11x4 = 0
−4x1 + 2x2 − 3x3 − 7x4 = 0
and find a basis for the solution space S. Describe S geometrically.
Solution. As we have a homogeneous system, we carry the coefficient matrix to RREF.
     
[  4  −2   3    5 ]  R2 − 2R1   [ 4  −2  3   5 ]  R1 − 5R2   [ 4  −2  3  0 ]
[  8  −4   6   11 ]  R3 + R1    [ 0   0  0   1 ]  R3 + 2R2   [ 0   0  0  1 ]
[ −4   2  −3   −7 ]    −→       [ 0   0  0  −2 ]    −→       [ 0   0  0  0 ]

                     (1/4)R1    [ 1  −1/2  3/4  0 ]
                       −→       [ 0    0    0   1 ]
                                [ 0    0    0   0 ]

so x1 = (1/2)s − (3/4)t, x2 = s, x3 = t, x4 = 0 for s, t ∈ R, or equivalently

[ x1 x2 x3 x4 ]T = s [ 1/2  1  0  0 ]T + t [ −3/4  0  1  0 ]T ,  s, t ∈ R.

Taking
B = { [ 1/2  1  0  0 ]T , [ −3/4  0  1  0 ]T },
we can express the solution set S of our homogeneous system of linear equations as S =
Span B. As B contains two vectors that are not scalar multiples of one another, we have
that B is a basis for S. We see that S is a plane through the origin in R4 .

135
Lecture 22
Consider the homogeneous system of linear equations

x 1 + x2 + x3 + 4x5 = 0
x4 + 2x5 = 0

The coefficient matrix " #


1 1 1 0 4
0 0 0 1 2
is already in reduced row echelon form36 and our solution is
       
x1 = −t1 − t2 − 4t3 ,  x2 = t1 ,  x3 = t2 ,  x4 = −2t3 ,  x5 = t3 ,  or equivalently

[ x1 x2 x3 x4 x5 ]T = t1 [ −1  1  0  0  0 ]T + t2 [ −1  0  1  0  0 ]T + t3 [ −4  0  0  −2  1 ]T

with t1 , t2 , t3 ∈ R so

B = { [ −1  1  0  0  0 ]T , [ −1  0  1  0  0 ]T , [ −4  0  0  −2  1 ]T }

is a spanning set for the solution space S of the system. We check B for linear independence.
Note however that the variables x2 , x3 and x5 are free variables. If we consider the second,
third and fifth entries in vectors of our spanning set
     

 −1 −1 −4  
 
1   0   0 

      

 
     
B=   0  ,  1  ,  0 
    

 0   0   −2 

      

 

 
 0 0 1 

we see that each vector has a 1 where the other two vectors have zeros in that same position.
Thus no vector in B is in the span of the others, and so B is linearly independent by Theorem
14.5. Hence B is a basis for the solution space S.
36
Remember that for homogeneous systems of linear equations, we normally row reduce just the coefficient
matrix.

136
Theorem 22.1. Let [ A | ~b ] be the augmented matrix for a consistent system of m linear
equations in n variables. If rank (A) = k < n, then the general solution of the system is of
the form
~x = d~ + t1~v1 + · · · + tn−k~vn−k
where d~ ∈ Rn , t1 , . . . , tn−k ∈ R and the set {~v1 , . . . , ~vn−k } ⊆ Rn is linearly independent. In
particular, the solution set is an (n − k)−flat in Rn .

Note that if rank (A) = n in the above theorem, then there are n − n = 0 parameters and so
our solution ~x = d~ is unique.

When solving a homogeneous system of linear equations, we see that the spanning set for
the solution space we find by solving the system is linearly independent. However, given
an arbitrary spanning set B for a subspace of Rn , we cannot assume that B is linearly
independent, and so we must still check. We now show a faster way to do so.
Consider        
1 2 1 5
 1   2   2   7 
~v1 =   , ~v2 =   , ~v3 =   and ~v4 = 
       

 2   4   3   12 
3 6 4 17
and let B = {~v1 , ~v2 , ~v3 , ~v4 } and S = Span B. We wish to find a basis B 0 for S with B 0 ⊆ B.
That is, find a linearly independent subset B 0 of B with Span B 0 = S. For c1 , c2 , c3 , c4 ∈ R,
considering
c1~v1 + c2~v2 + c3~v3 + c4~v4 = ~0
gives a homogeneous system whose coefficient matrix we carry to RREF:
     
1 2 1 5 −→ 1 2 1 5 R1 −R2 1 2 0 3
 1 2 2 7  R2 −R1
 0 0 1 2  −→  0 0 1 2 
     
     
 2 4 3 12  R3 −2R1  0 0 1 2  R3 −R2  0 0 0 0 
3 6 4 17 R4 −3R1 0 0 1 2 R4 −R2 0 0 0 0

We see that c2 and c4 are free variables so we obtain nontrivial solutions to the system and
hence B is linearly dependent. Our work with bases thus far has shown us that since we can
find solutions with c2 6= 0 and c4 6= 0, we can remove one of ~v2 or ~v4 from B and then test
the resulting smaller set for linear independence. We show here that we can simply remove
both ~v2 and ~v4 and arrive at B 0 = {~v1 , ~v3 } as our basis for S immediately.

To begin, note that c1 and c3 were leading variables in the above system. Using our work
above, we see that by considering the homogeneous system

c1~v1 + c3~v3 = ~0

137
we obtain    
1 1 1 0
 1 2   0 1 
→
   
 
 2 3   0 0 
3 4 0 0
which has only the trivial solution so {~v1 , ~v3 } is linearly independent. If we try to write ~v4 as
a linear combination of ~v1 , ~v2 and ~v3 , we obtain the system with augmented matrix
   
1 2 1 5 1 2 0 3
 1 2 2 7   0 0 1 2 
 −→ 
   
 
 2 4 3 12   0 0 0 0 
3 6 4 17 0 0 0 0

The system is consistent (with infinitely many solutions), so ~v4 ∈ Span {~v1 , ~v2 , ~v3 } and so by
Theorem 13.8, Span {~v1 , ~v2 , ~v3 , ~v4 } = Span {~v1 , ~v2 , ~v3 } so we “discard” ~v4 . Now, if we try to
express ~v2 as a linear combination of ~v1 , we obtain the system with augmented matrix
   
1 2 1 2
 1 2   0 0 
 −→ 
   
 
 2 4   0 0 
3 6 0 0

which is also consistent (with a unique solution) so ~v2 ∈ Span {~v1 } ⊆ Span {~v1 , ~v3 } and we
have that Span {~v1 , ~v2 , ~v3 } = Span {~v1 , ~v3 } by Theorem 13.8. We will thus “discard” ~v2 . In
summary, we’ve shown

S = Span B = Span {~v1 , ~v2 , ~v3 , ~v4 } = Span {~v1 , ~v2 , ~v3 } = Span {~v1 , ~v3 }

with {~v1 , ~v3 } linearly independent. Hence B 0 = {~v1 , ~v3 } is a basis for S.

Thus, we see that given a spanning set B = {~v1 , . . . , ~vk } for a subspace S of Rn , to find a
basis B 0 for S with B 0 ⊆ B, we construct the matrix [ ~v1 · · · ~vk ] which we carry to (reduced)
row echelon form. For i = 1, . . . , k, take ~vi ∈ B 0 if and only if the ith column of any REF of
our matrix has a leading entry. We also see that for each ~vj not taken in B 0 , ~vj can be expressed as a linear
combination of the vectors in {~v1 , . . . , ~vj−1 } ∩ B 0 .
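As an aside, the procedure just described is exactly what sympy's rref() reports: the pivot columns of an REF of [ ~v1 · · · ~vk ] tell us which of the original vectors to keep. The sketch below (not part of the notes) uses the vectors of the example above.

```python
# Sketch of the basis-selection method: keep the columns of the original
# matrix that correspond to pivot columns of its RREF.
from sympy import Matrix

v1, v2, v3, v4 = [1, 1, 2, 3], [2, 2, 4, 6], [1, 2, 3, 4], [5, 7, 12, 17]
M = Matrix([v1, v2, v3, v4]).T          # columns are v1, v2, v3, v4
R, pivots = M.rref()
print(pivots)                            # (0, 2): keep v1 and v3
basis = [M.col(i) for i in pivots]       # B' = {v1, v3}
```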
Example 22.2. Let
       

 1 1 1 3 

B =  −1  ,  2  ,  5  ,  6  .
       
 
1 −3 −7 −9
 

Find a basis B 0 for Span B with B 0 ⊆ B.

138
Solution. We have
     
1 1 1 3 −→ 1 1 1 3 −→ 1 1 1 3
 −1 2 5 6  R2 +R1  0 3 6 9   0 3 6 9 
     

1 −3 −7 −9 R3 −R1 0 −4 −8 −12 R3 + 43 R2 0 0 0 0

As only the first two columns of an REF of our matrix contain leading entries, the first two
vectors in B comprise B 0 , that is
   

 1 1 

0
B =  −1  ,  2
   

 
1 −3
 

is a basis for Span B.


Note that if we had continued to row reduce to RREF, we would have found
   
1 1 1 3 1 0 −1 0
 −1 2 5 6  −→  0 1 2 3 . (?)
   

1 −3 −7 −9 0 0 0 0

Note that the third and fourth columns of the RREF do not contain leading ones. We see
that those vectors in B not taken in B 0 satisfy
          
1 1 1 1 1 1 1 0 −1 omit 4th
 5  = −1  −1  + 2  2  since  −1 2 5  −→  0 1 2   columns from 
          

−7 1 −3 1 −3 −7 0 0 0 matrices in (?)
          
3 1 1 1 1 3 1 0 0 omit 3rd
 6  = 0  −1  + 3  2  since  −1 2 6  −→  0 1 3   columns from 
          

−9 1 −3 1 −3 −9 0 0 0 matrices in (?)

Dimension
Let S be a subspace of Rn and B = {~v1 , ~v2 } be a basis for S. If C = {~w1 , ~w2 , ~w3 } is a set of
vectors in S, then C must be linearly dependent. To see this, note that since B is a basis
for S, Theorem 15.6 gives that there are unique a1 , a2 , b1 , b2 , c1 , c2 ∈ R so that

~w1 = a1~v1 + a2~v2 ,   ~w2 = b1~v1 + b2~v2   and   ~w3 = c1~v1 + c2~v2 .

Now for t1 , t2 , t3 ∈ R, consider


~0 = t1 ~w1 + t2 ~w2 + t3 ~w3

139
= t1 (a1~v1 + a2~v2 ) + t2 (b1~v1 + b2~v2 ) + t3 (c1~v1 + c2~v2 )
= (a1 t1 + b1 t2 + c1 t3 )~v1 + (a2 t1 + b2 t2 + c2 t3 )~v2

Since B = {~v1 , ~v2 } is linearly independent we have,

a1 t1 + b1 t2 + c1 t3 = 0
a2 t1 + b2 t2 + c2 t3 = 0

This is an underdetermined homogeneous system, so it is consistent with nontrivial solutions


and it follows that C = {~w1 , ~w2 , ~w3 } is linearly dependent.

The above generalizes as follows.


Theorem 22.3. Let B = {~v1 , . . . , ~vk } be a basis for a subspace S of Rn . If C = {~w1 , . . . , ~w` }
is a set in S with ` > k, then C is linearly dependent.
It follows from the statement of the previous theorem that if C is linearly independent, then
` ≤ k. We now state the following important result:
Theorem 22.4. If B = {~v1 , . . . , ~vk } and C = {~w1 , . . . , ~w` } are both bases for a subspace S
of Rn , then k = `.
Proof. Since B is a basis for S and C is linearly independent, we have that ` ≤ k. Since C
is a basis for S and B is linearly independent, k ≤ `. Hence k = `.
Hence, given a nontrivial subspace S of Rn , there are many bases for S, but they will all
contain the same number of vectors. This motivates the following definition.
Definition 22.5. If B = {~v1 , . . . , ~vk } is a basis for a subspace S of Rn , then we say the
dimension of S is k, and we write dim(S) = k. If S = {~0}, then dim(S) = 0 since ∅ is a basis
for S.
Example 22.6. Since the standard basis for Rn is {~e1 , . . . , ~en }, we see that dim(Rn ) = n.
Example 22.7. We saw in Example 16.8 that the subspace
  

 a − b 

S =  b − c  a, b, c ∈ R
 
 
c−a
 

of R3 had basis    

 1 −1 
B=  ,
0   1 
   
 
−1 0
 

so dim(S) = 2.
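As an optional sympy check (an aside): dim(S) equals the rank of any matrix whose columns span S. Here the columns [ 1 0 −1 ]T , [ −1 1 0 ]T and [ 0 −1 1 ]T come from writing the entries of S in terms of a, b and c.

```python
# dim(S) equals the rank of a matrix whose columns span S.
from sympy import Matrix

M = Matrix([[1, -1, 0],
            [0, 1, -1],
            [-1, 0, 1]])
print(M.rank())   # 2, so dim(S) = 2 as claimed
```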

140
Theorem 22.8. If S is a k−dimensional subspace of Rn with k > 0, then

(1) A set of more than k vectors in S is linearly dependent,

(2) A set of fewer than k vectors in S cannot span S,

(3) A set of exactly k vectors in S spans S if and only if it is linearly independent.

Example 22.9. Let S be a subspace of R3 with dim(S) = 2. Suppose that


   
1 1
~v1 =  1  and ~v2 =  2
   

−2 −3

belong to S. Since ~v1 and ~v2 are nonzero and nonparallel, we have that {~v1 , ~v2 } is a linearly
independent set of two vectors in S. Since dim(S) = 2, we have that S = Span {~v1 , ~v2 } by
Theorem 22.8(3). Thus {~v1 , ~v2 } is a basis for S.

Note that we must know dim(S) before we use Theorem 22.8. In the previous example, we
could not have used the linear independence of {~v1 , ~v2 } to conclude that S = Span {~v1 , ~v2 }
if we weren’t told the dimension of S.

141
Lecture 23
We now begin to look at some applications of systems of linear equations.

Application: Chemical Reactions


A very simple chemical reaction often learned in high school is the combination of hydrogen
molecules (H2 ) and oxygen molecules (O2 ) to produce water (H2 O). Symbolically, we write

H2 + O2 −→ H2 O

The process by which molecules combine to form new molecules is called a chemical reaction.
Note that each hydrogen molecule is composed of two hydrogen atoms, each oxygen molecule
is composed of two oxygen atoms, and that each water molecule is composed of two hydrogen
atoms and one oxygen atom. Our goal is to balance this chemical reaction, that is, compute
how many hydrogen molecules and how many oxygen molecules are needed so that there are
the same number of atoms of each type both before and after the chemical reaction takes
place. By inspection, we find that

2H2 + O2 −→ 2H2 O

That is, two hydrogen molecules and one oxygen molecule combine to create two water
molecules. Before this chemical reaction takes place, there are four hydrogen atoms and
two oxygen atoms. After the reaction, there are again four hydrogen atoms and two oxygen
atoms. Thus we have balanced the chemical reaction.

Balancing chemical reactions by inspection becomes increasingly difficult as more complex


molecules are introduced. For example, the chemical reaction photosynthesis is a process
where plants combine carbon dioxide (CO2 ) and water (H2 O) to produce glucose (C6 H12 O6 )
and oxygen (O2 ):
CO2 + H2 O −→ C6 H12 O6 + O2
Although this could be solved by inspection, we look at another method. Let x1 denote
the number of CO2 molecules, x2 the number of H2 O molecules, x3 the number of C6 H12 O6
molecules and x4 the number of O2 molecules. Then we have

x1 CO2 + x2 H2 O −→ x3 C6 H12 O6 + x4 O2

Equating the number of atoms of each type before and after the reaction gives the equations

C: x1 = 6x3
O : 2x1 + x2 = 6x3 + 2x4
H: 2x2 = 12x3

142
Moving all variables to the left in each equation gives the homogeneous system

x1 − 6x3 = 0
2x1 + x2 − 6x3 − 2x4 = 0
2x2 − 12x3 = 0
Row reducing the augmented matrix of this system to RREF gives
     
[ 1  0   −6   0 | 0 ]  R2 − 2R1   [ 1  0   −6   0 | 0 ]  R3 − R2   [ 1  0   −6   0 | 0 ]
[ 2  1   −6  −2 | 0 ]  (1/2)R3    [ 0  1    6  −2 | 0 ]    −→      [ 0  1    6  −2 | 0 ]  −(1/2)R3
[ 0  2  −12   0 | 0 ]    −→       [ 0  1   −6   0 | 0 ]            [ 0  0  −12   2 | 0 ]    −→

[ 1  0  −6   0 | 0 ]  R1 + R3   [ 1  0  0  −1 | 0 ]  (1/6)R3   [ 1  0  0   −1  | 0 ]
[ 0  1   6  −2 | 0 ]  R2 − R3   [ 0  1  0  −1 | 0 ]    −→      [ 0  1  0   −1  | 0 ]
[ 0  0   6  −1 | 0 ]    −→      [ 0  0  6  −1 | 0 ]            [ 0  0  1  −1/6 | 0 ]
We see that for t ∈ R,
x1 = t, x2 = t, x3 = t/6 and x4 = t
There are infinitely many solutions to the homogeneous system. However, since we cannot
have a fractional number of molecules, we require that x1 , x2 , x3 and x4 be nonnegative
integers. This implies that t should be an integer multiple of 6. Moreover, we wish to have
the simplest (or smallest) solution, so we will take t = 6. This gives x1 = x2 = x4 = 6 and
x3 = 1. Thus,
6CO2 + 6H2 O −→ C6 H12 O6 + 6O2
balances the chemical reaction.
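As an aside, this balancing computation can be automated: the coefficient vector lies in the nullspace of the "atom balance" matrix built from the three equations above. The sympy sketch below (not part of the notes) clears denominators to obtain the smallest positive integer solution.

```python
# Balance CO2 + H2O -> C6H12O6 + O2 by finding the nullspace of the
# atom-balance matrix (rows: C, O, H; columns: CO2, H2O, C6H12O6, O2).
from sympy import Matrix, ilcm

M = Matrix([[1, 0, -6, 0],
            [2, 1, -6, -2],
            [0, 2, -12, 0]])
v = M.nullspace()[0]                 # Matrix([[1], [1], [1/6], [1]])
scale = ilcm(*[entry.q for entry in v])   # least common denominator = 6
print((scale * v).T)                 # [[6, 6, 1, 6]], i.e. 6 CO2 + 6 H2O -> C6H12O6 + 6 O2
```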
Example 23.1. The fermentation of sugar is a chemical reaction given by the following
equation:
C6 H12 O6 −→ CO2 + C2 H5 OH
where C6 H12 O6 is glucose, CO2 is carbon dioxide and C2 H5 OH is ethanol37 . Balance this
chemical reaction.
Solution. Let x1 denote the number of C6 H12 O6 molecules, x2 the number of CO2 molecules
and x3 the number of C2 H5 OH molecules. We obtain
x1 C6 H12 O6 −→ x2 CO2 + x3 C2 H5 OH
Equating the number of atoms of each type before and after the reaction gives the equations

C : 6x1 = x2 + 2x3
O : 6x1 = 2x2 + x3
H : 12x1 = 6x3
37
Ethanol is also denoted by C2 H6 O and CH3 CH2 OH

143
which leads to the homogeneous system of equations

6x1 − x2 − 2x3 = 0
6x1 − 2x2 − x3 = 0
12x1 − 6x3 = 0

Carrying the augmented matrix of this system to RREF gives


     
6 −1 −2 0 −→ 6 −1 −2 0 R1 −R2 6 0 −3 0 1
R
6 1

 6 −2 −1 0  R2 −R1  0 −1 1 0  −→  0 −1 1 0  −R2
     

12 0 −6 0 R3 −2R1 0 2 −2 0 R3 +2R2 0 0 0 0 −→
 
1 0 −1/2 0
 0 1 −1 0 
 

0 0 0 0

Thus, for t ∈ R,
x1 = t/2, x2 = t and x3 = t
Taking t = 2 gives the smallest nonnegative integer solution, and we conclude that

C6 H12 O6 −→ 2CO2 + 2C2 H5 OH

Application: Linear Models

Example 23.2. An industrial city has four heavy industries (denoted by A1 , A2 , A3 , A4 )


each of which burns coal to manufacture its products. By law, no industry can burn more
than 45 units of coal per day. Each industry produces the pollutants Pb (lead), SO2 (sulfur
dioxide), and NO2 (nitrogen dioxide) at (different) daily rates per unit of coal burned and
these are released into the atmosphere. The rates are shown in the following table.

Industry A1 A2 A3 A4
Pb 1 0 1 7
SO2 2 1 2 9
NO2 0 2 2 0

The CAAG (Clean Air Action Group) has just leaked a government report that claims that
on one day last year, 250 units of Pb, 550 units of SO2 and 400 units of NO2 were measured
in the atmosphere. An inspector reported that A3 did not break the law on that day. Which
industry (or industries) broke the law on that day?

144
Solution. Let ai denote the number of units of coal burned by Industry Ai , for i = 1, 2, 3, 4.
Using the above table, we account for each of the pollutants on that day.

Pb : a1 + a3 + 7a4 = 250
SO2 : 2a1 + a2 + 2a3 + 9a4 = 550
NO2 : 2a2 + 2a3 = 400

Carrying the augmented matrix of the above system to RREF, we have


     
1 0 1 7 250 −→ 1 0 1 7 250 −→ 1 0 1 7 250 −→
 2 1 2 9 550  R2 −2R1  0 1 0 −5 50   0 1 0 −5 50 
     

0 2 2 0 400 0 2 2 0 400 R3 −2R2 0 0 2 10 300 1


R
2 3
   
1 0 1 7 250 R1 −R3 1 0 0 2 100
 0 1 0 −5 50  −→  0 1 0 −5 50 
   

0 0 1 5 150 0 0 1 5 150

From this, we find that

a1 = 100 − 2t, a2 = 50 + 5t, a3 = 150 − 5t, a4 = t

where t ∈ R. Now we look for conditions on t. We know A3 did not break the law, so
0 ≤ a3 ≤ 45, that is,
0 ≤ 150 − 5t ≤ 45
−150 ≤ −5t ≤ −105
30 ≥ t ≥ 21
It immediately follows that A4 did not break the law either, since a4 = t and 21 ≤ t ≤ 30 ≤ 45. Looking at A2 , we have

21 ≤ t ≤ 30
105 ≤ 5t ≤ 150
155 ≤ 50 + 5t ≤ 200
155 ≤ a2 ≤ 200

so A2 broke the law. Finally, for A1 , we find

21 ≤ t ≤ 30
−42 ≥ −2t ≥ −60
58 ≥ 100 − 2t ≥ 40
58 ≥ a1 ≥ 40

so it is possible that A1 broke the law, but we cannot be sure without more information.
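As a small numerical aside (not part of the notes), sweeping the parameter t over the interval [21, 30] found above reproduces these ranges; the formulas for a1 , . . . , a4 are taken from the solution.

```python
# Sweep t over [21, 30] and report the range of each industry's coal usage.
import numpy as np

t = np.linspace(21, 30, 91)
a1, a2, a3, a4 = 100 - 2*t, 50 + 5*t, 150 - 5*t, t
for name, a in zip(("a1", "a2", "a3", "a4"), (a1, a2, a3, a4)):
    print(name, float(a.min()), float(a.max()))
# a2 is always at least 155 > 45, so A2 certainly broke the law.
```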

145
Example 23.3. An engineering company has three divisions (Design, Production, Testing)
with a combined annual budget of $1.5 million. Production has an annual budget equal to
the combined annual budgets of Design and Testing. Testing requires a budget of at least
$80 000. What is the Production budget and the maximum possible budget for the Design
division?

Solution. Let x1 denote the annual Design budget, x2 the annual Production budget, and
x3 the annual Testing budget. It follows that x1 + x2 + x3 = 1 500 000. Since the annual
Production budget is equal to the combined Design and Testing budgets, we have x2 =
x1 + x3 . This gives the system of equations

x1 + x2 + x3 = 1 500 000
x1 − x2 + x3 = 0

Row reducing the above system gives


" # " #
1 1 1 1 500 000 −→ 1 1 1 1 500 000 −→
1 −1 1 0 R2 −R1 0 −2 0 −1 500 000 − 12 R2
" # " #
1 1 1 1 500 000 R1 −R2 1 0 1 750 000
0 1 0 750 000 −→ 0 1 0 750 000

This gives
x1 = 750 000 − t, x2 = 750 000, x3 = t
where t ∈ R. We know that the Testing budget requires at least $80 000 and can re-
ceive no more than $750 000 (since Testing shares a budget of $750 000 with Design). Thus
80 000 ≤ t ≤ 750 000. It follows that

−750 000 ≤ −t ≤ −80 000


0 ≤ 750 000 − t ≤ 670 000
0 ≤ x1 ≤ 670 000

Hence the Production budget is $750 000 and the maximum Design budget is $670 000.

146
Lecture 24

Application: Network Flow


A network consists of a system of junctions or nodes that are connected by directed line
segments. These networks are used to model real world problems such as traffic flow, fluid
flow, or any such system where a flow is observed. We observe here the central rule that
must be obeyed by these systems.

Junction Rule: At each of the junctions (or nodes) in the network, the flow into that
junction must equal the flow out of that junction.

Our goal is to achieve a network such that every junction obeys the Junction Rule. We say
that such a system is in a steady state or equilibrium.

Figure 47 below gives an example of a network with four nodes, A, B, C and D, and eight
directed line segments. We wish to compute all possible values of f1 , f2 , f3 and f4 so that
the system is in equilibrium.

Figure 47: A simple network

147
Using the Junction Rule at each node, we construct the following table:

Flow In Flow Out


A: 40 = f1 + f4
B: f1 + f2 = 50
C: 60 = f2 + f3
D: f3 + f4 = 50

Rearranging each of the above four linear equations leads to the following system:

f1 + f4 = 40
f1 + f2 = 50
f2 + f3 = 60
f3 + f4 = 50

Row reducing the augmented matrix to RREF, we have


     
1 0 0 1 40 −→ 1 0 0 1 40 −→ 1 0 0 1 40 −→

 1 1 0 0 50 
 R2 −R1

 0 1 0 −1 10 


 0 1 0 −1 10 

     
 0 1 1 0 60   0 1 1 0 60  R3 −R2  0 0 1 1 50 
0 0 1 1 50 0 0 1 1 50 0 0 1 1 50 R4 −R3
 
1 0 0 1 40
 0 1
 0 −1 10 

 
 0 0 1 1 50 
0 0 0 0 0

We find that
f1 = 40 − t, f2 = 10 + t, f3 = 50 − t and f4 = t
where t ∈ R. We see that there are infinitely many values for f1 , f2 , f3 and f4 so that the
system is in equilibrium. Note that a negative solution for one of the variables means that the
flow is in the opposite direction than the one indicated in the diagram. Depending on what
the network is representing, we may require that each of f1 , f2 , f3 and f4 be nonnegative.
In this case,
f1 ≥ 0 =⇒ 40 − t ≥ 0 =⇒ t ≤ 40
f2 ≥ 0 =⇒ 10 + t ≥ 0 =⇒ t ≥ −10
f3 ≥ 0 =⇒ 50 − t ≥ 0 =⇒ t ≤ 50
f4 ≥ 0 =⇒ t≥0
Here, we see that 0 ≤ t ≤ 40. There may be more constraints on f1 , f2 , f3 and f4 . For exam-
ple, if the flows in the above network represent the number of automobiles moving between

148
the junctions, then we further require f1 , f2 , f3 and f4 to be integers. In our example, this
would make t = 0, 1, 2, . . . 40, giving us 41 possible solutions.

When using linear algebra to model real world problems, we must be able to interpret our
solutions in terms of the problem it is modelling. This includes incorporating any real world
restrictions imposed by the system we are modelling.
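As an aside (not part of the notes), the integer-flow remark above is easy to verify directly: enumerate the solution family f1 = 40 − t, f2 = 10 + t, f3 = 50 − t, f4 = t from the text and check the Junction Rule and nonnegativity for each integer t.

```python
# Count the nonnegative integer equilibrium flows of the network above.
count = 0
for t in range(0, 41):
    f1, f2, f3, f4 = 40 - t, 10 + t, 50 - t, t
    # Junction Rule at nodes A, B, C, D
    assert f1 + f4 == 40 and f1 + f2 == 50 and f2 + f3 == 60 and f3 + f4 == 50
    assert min(f1, f2, f3, f4) >= 0
    count += 1
print(count)   # 41 possible solutions
```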

Example 24.1. Consider four train stations labelled A, B, C and D. In the figure below, the
directed line segments represent train tracks to and from stations, and the numbers represent
the number of trains travelling on that track per day. Assume the tracks are one-way, so
trains may not travel in the other direction.

a) Find all values of f1 , . . . , f5 so that the system is in equilibrium.

b) Suppose the tracks from A to C and from D to A are closed due to maintenance. Is it
still possible for the system to be in equilibrium?

Solution.

a) We construct a table:

Flow In Flow Out


A: 15 + f4 = 10 + f1 + f5
B: 20 + f1 = 10 + f2
C: 15 + f2 + f5 = 25 + f3
D: 5 + f3 = 10 + f4

149
Rearranging gives the linear system of equations
f1 − f4 + f5 = 5
f1 − f2 = −10
f2 − f3 + f5 = 10
f3 − f4 = 5
which we carry to RREF
   
1 0 0 −1 1 5 −→ 1 0 0 −1 1 5 −→
 1 −1 0 0 0 −10  −R2  −1
  1 0 0 0 10  R2 +R1
 
   
 0 1 −1 0 1 10  −R3  0 −1 1 0 −1 −10 
0 0 1 −1 0 5 −R4 0 0 −1 1 0 −5
   
1 0 0 −1 1 5 −→ 1 0 0 −1 1 5 −→
 0
 1 0 −1 1 15 

 0 1
 0 −1 1 15 
   
 0 −1 1 0 −1 −10  R3 +R2  0 0 1 −1 0 5 
0 0 −1 1 0 −5 0 0 −1 1 0 −5 R4 +R3
 
1 0 0 −1 1 5
 0 1 0 −1 1 15 
 
 
 0 0 1 −1 0 5 
0 0 0 0 0 0
giving
f1 = 5 + s − t, f2 = 15 + s − t, f3 = 5 + s, f4 = s and f5 = t
for integers s, t (as we cannot have fractional trains). Moreover, as trains cannot go
the other way, we immediately have
f1 ≥0 =⇒ 5 + s − t ≥ 0 =⇒ s − t ≥ −5
f2 ≥0 =⇒ 15 + s − t ≥ 0 =⇒ s − t ≥ −15
f3 ≥0 =⇒ 5+s≥0 =⇒ s ≥ −5
f4 ≥0 =⇒ s≥0
f5 ≥0 =⇒ t≥0
so we have s, t ≥ 0 and s − t ≥ −5.
b) Assume the tracks from A to C and from D to A are closed. This forces f4 = f5 = 0.
From our previous solution, we have that s = t = 0. Since s − t = 0 ≥ −5, this is a
valid solution. We have
f1 = 5, f2 = 15, f3 = 5, f4 = 0 and f5 = 0
Notice here we have a unique solution.

150
Application: Electrical Networks
Consider the following electrical network shown in Figure 48:

Figure 48: An electrical network

It consists of voltage sources, resistors and wires. A voltage source (often a battery) provides
an electromotive force V measured in volts. This electromotive force moves electrons through
the network along a wire at a rate we refer to as current I measured in amperes (or amps).
The resistors (lightbulbs for example) are measured in ohms Ω, and serve to retard the
current by slowing the flow of electrons. The intersection point between three or more wires
is called a node. The nodes break the wires up into short paths between two nodes. Every
such path can have a different current, and the arrow on each path is called a reference
direction. Pictured here is a voltage source (left) and a resistor (right) between two nodes.

One remark about voltage sources. If a current passes through a battery supplying V volts
from the “−” to the “+”, then there is a voltage increase of V volts. If the current passes
through the same battery from the “+” to the “−”, then there is a voltage drop (decrease)
of V volts.

151
Our aim is to compute the currents I1 , I2 and I3 in Figure 48. The following laws will be
useful.

Ohm’s Law The potential difference V across a resistor is given by V = IR, where I is the
current and R is the resistance.

Note that the reference direction is important when using Ohm’s Law. A current I travelling
across a resistor of 10Ω in the reference direction will result in a voltage drop of 10I while
the same current travelling across the same resistor against the reference direction will result
in a voltage gain of 10I.

Kirchoff ’s Laws

1. Conservation of Energy: Around any closed voltage loop in the network, the algebraic
sum of voltage drops and voltage increases caused by resistors and voltage sources is
zero.

2. Conservation of Charge: At each node, the total inflow of current equals the total
outflow of current.

Kirchoff’s Laws will be used to derive a system of equations that we can solve in order to find
the currents. The Conservation of Energy requires using Ohm’s Law. Returning to Figure
48, we can now solve for I1 , I2 and I3 . Notice that there is an upper loop, and a lower loop.
We may choose any orientation we like for either loop. Given the reference directions, we
will use a clockwise orientation for the upper loop and a counterclockwise orientation for the
lower loop. We will compute the voltage increases and drops as we move around both loops.
Conservation of Energy says the voltage drops must equal the voltage gains around each loop.

For the upper loop, we can start at node A. Moving clockwise, we first have a voltage gain
of 5 from the battery, then a voltage drop of 5I1 at the 5Ω resistor and a 10I2 voltage drop
at the 10Ω resistor. Thus

5I1 + 10I2 = 5 (15)

For the lower loop, we can again start at node A. Moving counterclockwise, we have a
voltage drop of 5I3 followed by a voltage increase of 10 and finally a voltage drop of 10I2 .
We have

10I2 + 5I3 = 10 (16)

Now, applying the Conservation of Charge to node A gives I1 + I3 = I2 so we obtain

I1 − I2 + I3 = 0 (17)

152
Note that at node B we obtain the same equation, so including it would be redundant.
Combining equations (15), (16) and (17) gives the system of equations

I1 − I2 + I3 = 0
5I1 + 10I2 = 5
10I2 + 5I3 = 10
Carrying the augmented matrix of this system to RREF,
     
1 −1 1 0 −→ 1 −1 1 0 −→ 1 −1 1 0 −→
 5 10 0 5  R2 −5R1  0 15 −5 5  5 R2  0 3 −1 1  R2 −R3
    1  

0 10 5 10 0 10 5 10 1
R
5 3
0 2 1 2
     
1 −1 1 0 R1 +R2 1 0 −1 −1 −→ 1 0 −1 −1 R1 +R3

 0 1 −2 −1  −→  0 1 −2 −1   0 1 −2 −1  R2 +2R3
     

0 2 1 2 R3 −2R2 0 0 5 4 1
R
5 3
0 0 1 4/5 −→
 
1 0 0 −1/5
 0 1 0 3/5 
 

0 0 1 4/5
we see that I1 = −1/5 amps, I2 = 3/5 amps and I3 = 4/5 amps. Notice that I1 is negative.
This simply means that our reference direction for I1 in Figure 48 is incorrect and the cur-
rent flows in the opposite direction there. Note that the reference directions may be assigned
arbitrarily.
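As a computational aside (not part of the notes), the 3 × 3 system built from equations (15)–(17) can be solved numerically with numpy, reproducing the currents found above.

```python
# Solve the circuit equations (15)-(17) for Figure 48.
import numpy as np

A = np.array([[1.0, -1.0, 1.0],    # I1 - I2 + I3 = 0
              [5.0, 10.0, 0.0],    # 5I1 + 10I2   = 5
              [0.0, 10.0, 5.0]])   # 10I2 + 5I3   = 10
b = np.array([0.0, 5.0, 10.0])
print(np.linalg.solve(A, b))       # [-0.2  0.6  0.8], i.e. I1 = -1/5, I2 = 3/5, I3 = 4/5
```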

Note that there is actually a third loop in Figure 48: the loop that travels along the outside
of the network. If we start at node A and travel clockwise around this loop, we first have
a voltage increase of 5, then a voltage drop of 5I1 , then another voltage drop of 10 (as we
pass through the 10V battery from “+” to “−”) and finally a voltage increase of 5I3 (as we
pass through the 5Ω resistor in the opposite reference direction for I3 ). As voltage increases
equal voltage drops, we have 5 + 5I3 = 5I1 + 10, or 5I1 − 5I3 = −5. However, this is just
Equation (16) subtracted from Equation (15). Including this equation in our above system
of equations would only result in an extra row of zeros when we carried the resulting system
of equations to RREF. This will be true in general, and shows that when computing current
in an electrical network, we only need to consider the “smallest” loops.

Another note is that we chose to orient the upper loop in the clockwise direction and the
lower loop in the counterclockwise direction. This was totally arbitrary (but made sense
given the reference directions). We could have changed either of the directions. Of course,
as we saw in the previous paragraph, we have to consider which way our orientation will
cause the current to flow through a battery, and how to handle resistors if our orientation
has us moving in the opposite direction of a reference direction.

153
One last thing to notice here is that since I1 is negative, the current is actually flowing
backwards through the 5V battery. This can happen in a poorly designed electrical network
- the 10V battery is too strong and actually forces the current to travel through the 5V
battery in the wrong direction. Too much current being forced through a battery in the
wrong direction will lead to a fire.

Example 24.2. Find the currents in the following electrical network:

Solution. We begin by using the Conservation of Energy on each of the three smallest closed
loops. Going clockwise around the left loop starting at A, we see a voltage drop of 20I2 , a
voltage gain of 10 and then a drop of 20I1 . This gives

20I1 + 20I2 = 10 or 2I1 + 2I2 = 1

Traversing the middle loop clockwise starting at A, we have a voltage drop of 20I3 followed
by a gain of 20I2 (note that we pass the resistor between A and C in the opposite direction
of I2 ). We obtain
20I2 = 20I3 or I2 − I3 = 0
Moving clockwise around the right loop starting at B, we observe a voltage gain of 20,
followed by a drop of 20I5 and then a gain of 20I3 leading to

20I5 = 20 + 20I3 or I3 − I5 = −1

Next, we apply the Conservation of Charge to the nodes A, B, C and D (in that order) to
obtain the equations

I1 − I2 − I4 =0
I3 − I4 + I5 =0
I1 − I2 − I6 =0
I3 + I5 − I6 =0

154
Finally, we have constructed the system of equations

2I1 + 2I2 = 1
I2 − I3 = 0
I3 − I5 = −1
I1 − I2 − I4 = 0
I3 − I4 + I5 = 0
I1 − I2 − I6 = 0
I3 + I5 − I6 = 0

Carrying the augmented matrix of this system to RREF, we have


   
2 2 0 0 0 0 1 −→ 1 −1 0 −1 0 0 0 −→
1 −1 1 −1
   
 0 0 0 0 0   0 0 0 0 0 
   
 0
 0 1 0 −1 0 −1   R1 ↔R4  0
 0 1 0 −1 0 −1  
 1 −1 0 −1 0 0 0   2 2 0 0 0 0 1  R4 −2R1
   
   
 0
 0 1 −1 1 0 0 

 0
 0 1 −1 1 0 0 

 1 −1 0 −1
0 0 0   1 −1 0 0 0 −1 0 
 
  R6 −R1
0 0 1 0 1 −1 0 0 0 1 0 1 −1 0
   
1 −1 0 −1 0 0 0 R1 +R2 1 0 −1 −1 0 0 0 R1 +R3

1 −1 0  −→  0 1 −1
   
 0 0 0 0 0 0 0 0  R2 +R3
   
 0
 0 1 0 −1 0 −1 

 0 0
 1 0 −1 0 −1  −→

 0 4 0 2 0 0 1  R4 −4R2  0 0 4 2 0 0 1  R4 −4R3
   
   
 0
 0 1 −1 1 0 0 

 0 0
 1 −1 1 0 0  R5 −R3

 0
 0 0 1 0 −1 0 

 0 0
 0 1 0 −1 0 
0 0 1 0 1 −1 0 0 0 1 0 1 −1 0 R7 −R3
   
1 0 0 −1 −1 0 −1 −→ 1 0 0 −1 −1 0 −1 R1 +R4

0 −1 0 −1  0 −1 0 −1  −→
   
 0 1 0  0 1 0
   
 0 0 1
 0 −1 0 −1 
 0 0 1
 0 −1 0 −1  
 0 0 0 2 4 0 5   0 0 0 1 0 −1 0 
   
   
 0 0 0 −1 2 0  R4 ↔R6  0 0 0 −1
1  2 0 1  R5 +R4


 0 0 0
 1 0 −1 0 

 0 0 0
 2 4 0 5  R6 −2R4
0 0 0 0 2 −1 1 0 0 0 0 2 −1 1

155
   
1 0 0 0 −1 −1 −1 −→ 1 0 0 0 −1 −1 −1 R1 +R5

0 −1 0 −1  −1 0 −1  R2 +R5
   
 0 1 0  0 1 0 0
   

 0 0 1 0 −1 0 −1 
  0 0 1 0
 −1 0 −1   R3 +R5
0 0 0 1 0 −1 0   0 0 0 1 0 −1 0  −→
   

   

 0 0 0 0 2 −1 1  2 R4 
 1
 0 0 0 0 1 −1/2 1/2 

 0 0 0 0 4 2 5  4 R5 
 1
 0 0 0 0 1 1/2 5/4  R6 −R5
0 0 0 0 2 −1 1 1
R
2 7
0 0 0 0 1 −1/2 1/2 R7 −R5
   
1 0 0 0 0 −3/2 −1/2 R1 + 32 R6 1 0 0 0 0 0 5/8
0 0 −1/2 −1/2  R2 + 12 R6  0 1 0 0 0 0 −1/8 
   
 0 1 0
   

 0 0 1 0 0 −1/2 −1/2 
 R 3 + 1
2
R 6
 0 0
 1 0 0 0 −1/8  
0 0 0 1 0 −1 0  R4 +R6  0 0 0 1 0 0 3/4 
   

   

 0 0 0 0 1 −1/2 1/2 
 R5 + 12 R6  0 0
 0 0 1 0 7/8 
0 0 0 0 0 1  −→  0 0
3/4  0 0 0 1 3/4 
 
 
0 0 0 0 0 0 0 0 0 0 0 0 0 0

Finally, we see
5 1 1
I1 = amps, I2 = − amps, I3 = − amps,
8 8 8

3 7 3
I4 = amps, I5 = amps, I6 = amps
4 8 4
In particular, the reference arrows for I2 and I3 are pointing in the wrong direction.

156
Lecture 25

Matrix Algebra
We first encountered matrices when we solved systems of equations, where we performed
elementary row operations to the augmented matrix or the coefficient matrix of the system.
Here, we look at matrices as their own algebraic objects, and we will find that they are not
so different from vectors in Rn .
Definition 25.1. An m × n matrix A is a rectangular array with m rows and n columns.
The entry in the ith row and jth column will be denoted by aij , that is38
 
a11 a12 · · · a1j · · · a1n
 a21 a22 · · · a2j · · · a2n 
 
 . .. .. .. 
 .. . . . 
A=
 

 ai1 ai2 · · · aij · · · ain 
 .. .. .. .. 
 
 . . . . 
am1 am2 · · · amj · · · amn
which we sometimes abbreviate as A = [aij ] when the size of the matrix is known. Two
m × n matrices A and B are equal if aij = bij for all i = 1, . . . , m and j = 1, . . . , n, and we
write A = B. The set of all m × n matrices with real entries is denoted by Mm×n (R).
For a matrix A ∈ Mm×n (R), we say that A has size m × n and call aij the (i, j)−entry of A.
Note that we may write (A)ij instead of aij . If m = n, we say that A is a square matrix.
Example 25.2. Let  
1 2 " #
0 0
A= 6 4  and B =
 
0 sin π
3 1
Then A is a 3 × 2 matrix and B is a 2 × 2 square matrix.
Definition 25.3. The m × n matrix with all zero entries is called a zero matrix, denoted by
0m×n , or just 0 if the size is clear. Note that the matrix B in the previous example is the
2 × 2 zero matrix.
Definition 25.4. For A, B ∈ Mm×n (R) we define matrix addition as
(A + B)ij = (A)ij + (B)ij
and for c ∈ R, scalar multiplication is defined by
(cA)ij = c(A)ij
38
We normally use an uppercase letter to denote a matrix, such as A, B or C. We will then use aij , bij or
cij , respectively, to denote the entry in the ith row and jth column.

157
Example 25.5. Find a, b, c ∈ R such that
[ a  b  c ] − 2 [ c  a  b ] = [ −3  3  6 ]

Solution. Since
[ a  b  c ] − 2 [ c  a  b ] = [ a − 2c   b − 2a   c − 2b ]

we require
a − 2c = −3
−2a + b = 3
−2b + c = 6

     
1 0 −2 −3 −→ 1 0 −2 −3 −→ 1 0 −2 −3 −→
 −2 1 0 3  R2 +2R1  0 1 −4 −3   0 1 −4 −3 
     

0 −2 1 6 0 −2 1 6 R3 +2R2 0 0 −7 0 − 17 R3
   
1 0 −2 −3 R1 +2R3 1 0 0 −3
 0 1 −4 −3  R2 +4R3  0 1 0 −3 
   

0 0 1 0 −→ 0 0 1 0

so a = b = −3 and c = 0.
Note that for any A ∈ Mm×n (R) and any c ∈ R we have that

0A = 0m×n and c 0m×n = 0m×n .

Example 25.6. Let c ∈ R and A ∈ Mm×n (R) be such that cA = 0m×n . Prove that either
c = 0 or A = 0m×n .

Proof. Since cA = 0m×n , we have that

caij = 0 for every i = 1, . . . , m and j = 1, . . . , n. (18)

If c = 0, then the result holds, so we assume c 6= 0. But then from (18), we see that aij = 0
for every i = 1, . . . , m and j = 1, . . . , n, that is, A = 0m×n .
The next theorem is very similar to Theorem 6.10, and shows that under our operations of
addition and scalar multiplication, matrices behave very similarly to vectors.

Theorem 25.7. Let A, B, C ∈ Mm×n (R) and let c, d ∈ R. We have

V1. A + B ∈ Mm×n (R) (Mm×n (R) is closed under addition)

V2. A + B = B + A (addition is commutative)

158
V3. (A + B) + C = A + (B + C) (addition is associative)

V4. There exists a matrix 0m×n ∈ Mm×n (R) such that A + 0m×n = A for every
A ∈ Mm×n (R) (zero matrix)

V5. For each A ∈ Mm×n (R) there exists a (−A) ∈ Mm×n (R) such that A + (−A) = 0m×n
(additive inverse)

V6. cA ∈ Mm×n (R) (Mm×n (R) is closed under scalar multiplication)

V7. c(dA) = (cd)A (scalar multiplication is associative)

V8. (c + d)A = cA + dA (distributive law)

V9. c(A + B) = cA + cB (distributive law)

V10. 1A = A (scalar multiplicative identity)


Note for A ∈ Mm×n (R), the additive inverse of A is (−A) = (−1)A.
Definition 25.8. Let A ∈ Mm×n (R). The transpose of A, denoted by AT , is the n × m
matrix satisfying (AT )ij = (A)ji .
Example 25.9. Let
 
1 " #
h i 4 2
A =  2 , B= 1 4 8 and C = .
 
−1 3
3

Then  
1 " #
h i 4 −1
AT = 1 2 3 , BT =  4  and C T = .
 
2 3
8

Theorem 25.10 (Properties of Transpose). Let A, B ∈ Mm×n (R) and c ∈ R. Then


(1) AT ∈ Mn×m (R)
T
(2) AT = A

(3) (A + B)T = AT + B T

(4) (cA)T = cAT


Example 25.11. Solve for A if
" #!T " #
1 2 2 3
2AT − 3 = .
−1 1 −1 2

159
Solution. Using Theorem 25.10, we have
" #!T " #
T T
 1 2 2 3
2A − 3 = by (3)
−1 1 −1 2
" #T " #
T T
 1 2 2 3
2 A −3 = by (4)
−1 1 −1 2
" # " #
1 −1 2 3
2A − 3 = by (2)
2 1 −1 2
" # " #
2 3 3 −3
2A = +
−1 2 6 3
" #
1 5 0
A=
2 5 5
" #
5/2 0
A=
5/2 5/2

Definition 25.12. A matrix A is symmetric if AT = A.

Note that if A ∈ Mm×n (R), then AT ∈ Mn×m (R) so AT = A implies n = m. Thus a


symmetric matrix must be a square matrix.

Example 25.13. Let


 
" # 1 −2 3
1 6
A= and B =  −2 4 5 .
 
6 9
3 6 7

Then
" #
1 6
AT = =A
6 9
 
1 −2 3
B T =  −2 4 6  6= B
 

3 5 7

so A is symmetric while B is not symmetric.

Example 25.14. Prove that if A, B ∈ Mn×n (R) are symmetric, then sA + tB is symmetric
for any s, t ∈ R.

160
Proof. Since A and B are symmetric, we have that AT = A and B T = B. We must show
that (sA + tB)T = sA + tB. We have

(sA + tB)T = (sA)T + (tB)T = sAT + tB T = sA + tB

so sA + tB is symmetric.
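As a small numpy aside (not part of the notes), this is easy to observe numerically; A below is the symmetric matrix of Example 25.13, while B and the scalars are values chosen arbitrarily for illustration.

```python
# Linear combinations of symmetric matrices remain symmetric.
import numpy as np

A = np.array([[1.0, 6.0], [6.0, 9.0]])
B = np.array([[2.0, -1.0], [-1.0, 5.0]])   # arbitrary symmetric example
C = 3.0 * A + (-2.0) * B
print(np.allclose(C, C.T))                 # True
```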

161
Lecture 26

The Matrix-Vector Product


Thus far, given a system of linear equations, we have worked with the augmented matrix in
order to solve the system and to verify various properties of systems of equations. We now
show that a linear system of equations can be viewed as a matrix-vector equation. We will
see that in addition to giving us a compact way to express a system of equations, this new
notation will make it easier to verify properties of systems of equations.

To begin, consider the linear system of equations

x1 + 3x2 − 2x3 = −7
−x1 − 4x2 + 3x3 = 8

Let  
" # x1 " #
1 3 −2 −7
A= , ~x =  x2  and ~b = .
 
−1 −4 3 8
x3
and let " # " # " #
1 3 −2
~a1 = , ~a2 = and ~a3 =
−1 −4 3

be the columns of A so that A = [ ~a1 ~a2 ~a3 ]. Now our above system is consistent if and
only if we can find x1 , x2 , x3 ∈ R so that
" # " # " # " # " #
~b = −7 x 1 + 3x 2 − 2x 3 1 3 −2
= = x1 + x2 + x3
8 −x1 − 4x2 + 3x3 −1 −4 3
= x1~a1 + x2~a2 + x3~a3

that is, the system is consistent if and only if ~b ∈ Span {~a1 , ~a2 , ~a3 }. This is simply Theorem
20.2 which in this case states that ~b ∈ Span {~a1 , ~a2 , ~a3 } if and only if the system with
augmented matrix [ ~a1 ~a2 ~a3 ~b ] is consistent. We make the following definition.

Definition 26.1. Let A = [ ~a1 · · · ~an ] ∈ Mm×n (R) (it follows that ~a1 , . . . , ~an ∈ Rm ) and
~x = [ x1 · · · xn ]T ∈ Rn . Then the vector A~x is defined by

A~x = x1~a1 + · · · + xn~an ∈ Rm .

162
Using this definition, we can rewrite our above system as
 
" # x1 " #
1 3 −2  −7
 x2  =

−1 −4 3 8
x3
or more simply as
A~x = ~b.
Example 26.2.
       
1 5 " # 1 5 9
 −1
 −1 2  = (−1)  −1  + 2  2  =  5 
      
2
−2 1 −2 1 4

from which we see that ~x = [ −1 2 ]T is a solution to the linear system

x1 + 5x2 = 9
−x1 + 2x2 = 5
−2x1 + x2 = 4

Notice in the previous example that the entries in the solution ~x to the system A~x = ~b are
the coefficients that express ~b as a linear combination of the columns of the coefficient matrix
A.
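As an optional numpy check of Definition 26.1 (an aside, using the matrices of Example 26.2): A~x equals the linear combination of the columns of A with the entries of ~x as weights.

```python
# Ax as a linear combination of the columns of A.
import numpy as np

A = np.array([[1.0, 5.0],
              [-1.0, 2.0],
              [-2.0, 1.0]])
x = np.array([-1.0, 2.0])
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
print(A @ x)                       # [9. 5. 4.]
print(np.allclose(A @ x, combo))   # True
```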
Theorem 26.3.
(1) Every linear system of equations can be expressed as A~x = ~b for some matrix A and
some vector ~b,
(2) The system A~x = ~b is consistent if and only if ~b can be expressed as a linear combina-
tion of the columns of A,
(3) If ~a1 , . . . , ~an are the columns of A ∈ Mm×n (R) and ~x = [ x1 · · · xn ]T , then ~x
satisfies A~x = ~b if and only if x1~a1 + · · · + xn~an = ~b.
It’s important to keep the sizes of our matrices and vectors in mind:
A~x = ~b ,  where A is m × n, ~x ∈ Rn and ~b ∈ Rm .

For example, the matrix-vector product


  
1 2 1
A~x =  3 4   2 
  

1 4 −1
is not defined since A has two columns but ~x is not in R2 .

163
Example 26.4. " #" # " # " # " #
1 1 1 1 1 0
=1 −1 =
1 1 −1 1 1 0

This shows that for A ∈ Mm×n (R) and ~x ∈ Rn with A 6= 0m×n and ~x 6= ~0Rn we are not
guaranteed that A~x is nonzero.

Theorem 26.5. Let A, B ∈ Mm×n (R), ~x, ~y ∈ Rn and c ∈ R. Then

(1) A(~x + ~y ) = A~x + A~y

(2) A(c~x) = c(A~x) = (cA)~x

(3) (A + B)~x = A~x + B~x

Proof. We prove (1). Let A = [ ~a1 · · · ~an ] where ~a1 , . . . , ~an ∈ Rm , ~x = [ x1 · · · xn ]T


and ~y = [ y1 · · · yn ]T . Then
 
x 1 + y1
..
A(~x + ~y ) = [ ~a1 · · · ~an ] 
 
. 
x n + yn
= (x1 + y1 )~a1 + · · · + (xn + yn )~an
= x1~a1 + · · · + xn~an + y1~a1 + · · · + yn~an
= A~x + A~y

Recall Theorem 21.12 which states that the solution set for a homogeneous system of equa-
tions in n variables is a subspace of Rn , called the solution space. We proved Theorem 21.12,
but we prove it again here using our new notation for systems of equations. Note how much
more concise the proof now is.

Example 26.6. Let A ∈ Mm×n (R) and ~x ∈ Rn . Let S denote the solution set to the
homogeneous system of linear equations A~x = ~0. Show S is a subspace of Rn .

Proof. Since ~x ∈ Rn , we have that S ⊆ Rn , and since A~0Rn = ~0, ~0Rn ∈ S so S is nonempty.
Suppose ~y , ~z ∈ S. Then A~y = ~0 = A~z and

A(~y + ~z) = A~y + A~z = ~0 + ~0 = ~0

so ~y + ~z ∈ S. For any c ∈ R,
A(c~y ) = cA~y = c ~0 = ~0
so c~y ∈ S. Hence S is a subspace of Rn .

164
We return now to examine the matrix-vector product. We have seen that A~x can be viewed
as a linear combination of the columns of A which has allowed us to talk about systems of
equations. Writing out and evaluating a linear combination can be tedious, and we will see
that dot products can simplify the task. If we compute A~x where
   
1 −1 6 1
A= 0 2 1  and ~x =  1  .
   

4 −3 2 2

then we have
         
1 −1 6 1(1) + 1(−1) + 2(6) 12
A~x = 1  0  + 1  2  + 2  1  =  1(0) + 1(2) + 2(1) = 4 
         

4 −3 2 1(4) + 1(−3) + 2(2) 5


(the entries of the rightmost vector look like dot products)

If we define     

1 0 4
~r1 =  −1  , ~r2 =  2  and ~r3 =  −3 
     

6 1 2
then  
~r1 · ~x
A~x =  ~r2 · ~x 
 

~r3 · ~x
In general, given A ∈ Mm×n (R), there are vectors ~r1 , . . . , ~rm ∈ Rn so that

~r1T
 
 . 
A =  .. 
~rmT

and for any ~x ∈ Rn ,


~r1T
   T   
~r1 ~x ~r1 · ~x
 .   .   . 
A~x =  ..  ~x =  ..  =  .. 
~rmT ~rmT ~x ~rm · ~x
from which we see that the ith entry of A~x is the dot product ~ri · ~x where ~riT is the ith row
of A.
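As an aside (not part of the notes), the row/dot-product view is easy to confirm with numpy using the same A and ~x as in the computation above: entry i of A~x is the dot product of row i of A with ~x.

```python
# Row-wise (dot product) computation of Ax.
import numpy as np

A = np.array([[1.0, -1.0, 6.0],
              [0.0, 2.0, 1.0],
              [4.0, -3.0, 2.0]])
x = np.array([1.0, 1.0, 2.0])
rows = np.array([A[i, :] @ x for i in range(3)])
print(rows, np.allclose(rows, A @ x))   # [12.  4.  5.] True
```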

Definition 26.7. The n × n identity matrix, denoted by In (or In×n or just I if the size is
clear) is the square matrix of size n × n with aii = 1 for i = 1, 2, . . . , n (these entries make
up what we call the main diagonal of the matrix) and zeros elsewhere.

165
For example,
 
  1 0 0 0
" # 1 0 0
1 0  0 1 0 0 
I2 = I3 =  0 1 0  I4 = 
   

0 1 , ,  0 0 1 0  ,
0 0 1
= [ ~e1 ~e2 ] 0 0 0 1
= [ ~e1 ~e2 ~e3 ]
= [ ~e1 ~e2 ~e3 ~e4 ]

Example 26.8. Show In~x = ~x for every ~x ∈ Rn .


Proof. Let ~x = [ x1 · · · xn ]T ∈ Rn . Then

In~x = x1~e1 + · · · + xn~en = ~x

since {~e1 , . . . , ~en } is the standard basis for Rn .


Note that In~x = ~x for every ~x ∈ Rn is exactly why we call In the identity matrix. It is
also why we require In to be a square matrix. If I were an m × n matrix with m 6= n and
~x ∈ Rn , then I~x ∈ Rm 6= Rn so I~x would never be equal to ~x.
Example 26.9. Let
" # " # " #
1 0 3 −1 1
A= , B= and ~x = .
2 3 2 3 2

Then " #" # " # " #" #


1 0 1 1 3 −1 1
A~x = = = = B~x.
2 3 2 8 2 3 2
We see that A~x = B~x with ~x 6= ~0, and yet A 6= B.
This might seem strange39 . For a, b, x ∈ R with x 6= 0, we know that if ax = bx, then a = b.
As the previous example shows, this result does not hold for matrices: A~x = B~x for a given
vector ~x is not sufficient to guarantee A = B.
Theorem 26.10 (Matrices Equal Theorem). Let A, B ∈ Mm×n (R). If A~x = B~x for every
~x ∈ Rn , then A = B.
Proof. Let A = [ ~a1 · · · ~an ] and B = [ ~b1 · · · ~bn ]. Since A~x = B~x for every ~x ∈ Rn ,
we have that A~ei = B~ei for i = 1, . . . , n. Since

A~ei = ~ai and B~ei = ~bi

we have that ~ai = ~bi for i = 1, . . . , n. Hence A = B.

39
Note that A~x = B~x is equivalent to (A − B)~x = ~0, and we have seen in Example 26.4 that we
can have A − B and ~x nonzero despite their product being zero.

166
Lecture 27

Fundamental Subspaces of a Matrix


In this section, we define the nullspace, column space and row space of an m × n matrix.
We will refer to these three sets as the fundamental subspaces of a matrix.
Definition 27.1. Let A ∈ Mm×n (R). The nullspace (sometimes called the kernel ) of A is
the subset of Rn defined by

Null (A) = {~x ∈ Rn | A~x = ~0}.

Definition 27.2. Let A = [ ~a1 · · · ~an ] ∈ Mm×n (R). The column space of A is the subset
of Rm defined by

Col (A) = {A~x | ~x ∈ Rn }


= Span {~a1 , . . . , ~an }.

Definition 27.3. Let


~r1T
 
 . 
A =  ..  ∈ Mm×n (R).
~rmT
The row space of A is subset of Rn defined by

Row (A) = {AT ~x | ~x ∈ Rm }


= Span {~r1 , . . . , ~rm }.

Note that the nullspace of A is simply the solution space of the homogeneous system of
equations A~x = ~0 and is hence a subspace of Rn by Theorem 21.12. Since the column space
of A is simply the span of the columns of A, we have that Col (A) is a subspace of Rm by
Example 16.6. Similarly, Row (A) is a subspace of Rn .

From our previous work, we know that the system A~x = ~b is consistent if and only if ~b is a
linear combination of the columns of A. Now we can say that A~x = ~b is consistent if and
only if ~b ∈ Col (A).

Let A ∈ Mm×n (R). We already know how to find a basis for the nullspace of A, and since
the column space of A is simply the span of the columns of A, finding a basis for Col (A)
amounts to removing dependencies among the columns of A, which is a method we have
previously derived. But how do we find a basis for the row space of A?

Theorem 27.4. Let A ∈ Mm×n (R). If R is obtained from A by a series of elementary row
operations, then Row (R) = Row (A).
167
Proof. Let A ∈ Mm×n (R) with rows ~r1T , . . . , ~rmT . It is sufficient to show that Row (A) is
unchanged by each of the three elementary row operations. Let 1 ≤ i, j ≤ m with i 6= j. If
we swap the ith row and jth row of A, then the row space of the resulting matrix will be
spanned by
~r1 , . . . , ~ri−1 , ~rj , ~ri+1 , . . . , ~rj−1 , ~ri , ~rj+1 , . . . , ~rm
(we’ve shown the case for i < j, the case j < i being similar) and it’s not difficult to see that

Span {~r1 , . . . , ~rm } = Span {~r1 , . . . , ~ri−1 , ~rj , ~ri+1 , . . . , ~rj−1 , ~ri , ~rj+1 , . . . , ~rm }. (19)

If we add k times the ith row of A to the jth row of A, then the resulting matrix will have
a row space spanned by
~r1 , . . . , ~rj−1 , ~rj + k~ri , ~rj+1 , . . . , ~rm
and it’s not difficult to show that

Span {~r1 , . . . , ~rm } = Span {~r1 , . . . , ~rj−1 , ~rj + k~ri , ~rj+1 , . . . , ~rm }. (20)

Finally, if we multiply the ith row of A by a nonzero scalar k ∈ R, then the row space of the
resulting matrix will be spanned by

~r1 , . . . , k~ri , . . . , ~rm

and it is again not difficult to show that

Span {~r1 , . . . , ~rm } = Span {~r1 , . . . , k~ri , . . . , ~rm }. (21)

Together, equations (19), (20) and (21) show that if R is obtained from A by a series of
elementary row operations, then Row (R) = Row (A).
It follows from Theorem 27.4 that to find a basis for Row (A), it is sufficient to find a basis
for Row (R). The next example will show that this is quite easy if R is the reduced row
echelon form of A.
Example 27.5. Let  
1 1 5 1
A= 1 2 7 2 
 

2 3 12 3
Find a basis for Null (A), Col (A) and Row (A), and state the dimensions of each of these
subspaces.
Solution. Carrying A to RREF gives
     
1 1 5 1 −→ 1 1 5 1 R1 −R2 1 0 3 0
 1 2 7 2  R2 −R1  0 1 2 1  −→  0 1 2 1 
     

2 3 12 3 R3 −2R1 0 1 2 1 R3 −R2 0 0 0 0

168
The solution to the homogeneous system A~x = ~0 is
     
x1 −3 0
 x   −2   −1 
 2 
 = s  + t , s, t ∈ R
   

 x3   1   0 
x4 0 1
so    

 −3 0 


 −2   −1 

B1 =  ,
   


  1   0 

 

0 1
is a basis for Null (A) and dim(Null (A)) = 2. Also, as only the first two columns of the
RREF of A have leading entries,
   
 1
 1  
B2 =  1  ,  2 
   
 
2 3
 

is a basis for Col (A) and dim(Col (A)) = 2. Theorem 27.4 tells us that the rows of the
reduced row echelon form of A span Row (A), so
      

 1 0 0 
 0   1   0  
 
Row (A) = Span   ,   ,   .
     

  3   2   0 

 

0 1 0

Since each of the nonzero vectors in our spanning set for Row (A) has a 1 where the others
have a zero, the nonzero vectors in our spanning set are linearly independent and still span
Row (A). Hence
   

 1 0  

 0   1  
B3 =   ,  
   

  3   2  

 

0 1
is a basis for Row (A) and dim(Row (A)) = 2.
Note that if R is the reduced row echelon form of any matrix A, then the nonzero rows of
R will each contain a 1 where the other rows will have a zero and so the nonzero rows of R
will be a linearly independent set and hence a basis for Row (A).40
40
With a bit more thought, one realizes that the nonzero rows of any row echelon form of A form a basis
for Row (A).

169
Note that to find a basis for Col (A), we carry A to any row echelon form R (preferably
reduced row echelon form, particularly if we also seek a basis for Null (A)) and look for the
columns of R with leading entries. The corresponding columns of A will form a basis for
Col (A). To find a basis for Row (A), we simply take nonzero rows of R. It follows that for
any A ∈ Mm×n (R),
dim(Col (A)) = rank (A) = dim(Row (A)).
Also, by the System–Rank Theorem,

dim(Null (A)) = n − rank (A).
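As a computational aside (not part of the notes), sympy provides these three subspaces directly; the sketch below uses the matrix of Example 27.5 and confirms the dimension count dim(Null (A)) = n − rank (A).

```python
# Bases for the fundamental subspaces of the matrix in Example 27.5.
from sympy import Matrix

A = Matrix([[1, 1, 5, 1],
            [1, 2, 7, 2],
            [2, 3, 12, 3]])
print(A.rank())              # 2
print(len(A.nullspace()))    # 2  (= 4 - rank(A))
print(A.columnspace())       # the first two columns of A
print(A.rowspace())          # nonzero rows of an echelon form of A
```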

Example 27.6. Find a basis for Null (A), Col (A) and Row (A) where
 
1 2 1 3 4
A= 3 6 2 6 9 
 

−2 −4 1 1 −1

and state their dimensions.

Solution. We carry A to RREF:


   
1 2 1 3 4 −→ 1 2 1 3 4 R1 +R2

 3 6 2 6 9  R2 −3R1  0 0 −1 −3 −3  −→
   

−2 −4 1 1 −1 R3 +2R1 0 0 3 7 7 R3 +3R2
     
1 2 0 0 1 −→ 1 2 0 0 1 −→ 1 2 0 0 1
 0 0 −1 −3 −3  −R2  0 0 1 3 3  R2 −3R3  0 0 1 0 0 
     

0 0 0 −2 −2 − 21 R3 0 0 0 1 1 0 0 0 1 1

We have
         
x1 −2 −1 
 −2 −1 

 
x2 1 0 1   0
      
   
      
 

         

 x3  = s
  0  + t
  0 ,
 s, t ∈ R so B1 = 
 , 0
0   


x4 0 −1  0   −1
      
   
      
 


 
x5 0 1  0 1 

is a basis for Null (A) showing that dim(Null (A)) = 2. As the first, third and fourth columns
of the RREF of A have leading entries,
     

 1 1 3  
B2 =  3  ,  2  ,  6 
     
 
−2 1 1
 

170
is a basis for Col (A) and dim(Col (A)) = 3. Finally, the nonzero rows of the reduced row
echelon form of A give      

 1 0 0 

 
2 0 0

      

     

     
B3 =   0 ,
  1 ,
  0 


0 0 1

      

      


 
 1 0 1 

as a basis for Row (A) and dim(Row (A)) = 3.

Matrix Multiplication
We now extend the matrix–vector product to matrix multiplication.

Definition 27.7. If A ∈ Mm×n (R) and B = [ ~b1 · · · ~bk ] ∈ Mn×k (R), then the matrix
product AB is the m × k matrix

AB = [ A~b1 · · · A~bk ].

Example 27.8. Let


 
" # 1 2
1 2 3
A= and B =  1 −1 
 
−1 −1 1
2 2
so    
1 2
~b1 = 
 1 

and ~b2 =  −1  .
 

2 2
Then
   
" # 1 " # " # 2 " #
1 2 3 9 1 2 3 6
A~b1 =  1 = and A~b2 =  −1  =
   
−1 −1 1 0 −1 −1 1 1
2 2
so " #
9 6
AB = [ A~b1 A~b2 ] = .
0 1

In the above example, we saw that

(A2×3 )(B3×2 ) = (AB)2×2

171
In general, for the product AB to be defined, the number of columns of A must equal
the number of rows of B. If this is the case, then A ∈ Mm×n (R) and B ∈ Mn×k (R) and
AB ∈ Mm×k (R).

The above method to multiply matrices can be quite tedious. As with the matrix–vector
product, we can simplify the task using dot products. For

$$A = \begin{bmatrix} \vec{r}_1^{\,T} \\ \vdots \\ \vec{r}_m^{\,T} \end{bmatrix} \in M_{m\times n}(\mathbb{R}) \quad\text{and}\quad B = [\, \vec{b}_1 \ \cdots \ \vec{b}_k \,] \in M_{n\times k}(\mathbb{R})$$
we see that $\vec{r}_i \in \mathbb{R}^n$ for $i = 1,\dots,m$ and $\vec{b}_j \in \mathbb{R}^n$ for $j = 1,\dots,k$, so the dot product $\vec{r}_i \cdot \vec{b}_j$ is defined. Then we have
$$AB = \begin{bmatrix} \vec{r}_1^{\,T} \\ \vdots \\ \vec{r}_m^{\,T} \end{bmatrix}[\, \vec{b}_1 \ \cdots \ \vec{b}_k \,]
= \begin{bmatrix} \vec{r}_1^{\,T}\vec{b}_1 & \cdots & \vec{r}_1^{\,T}\vec{b}_k \\ \vdots & & \vdots \\ \vec{r}_m^{\,T}\vec{b}_1 & \cdots & \vec{r}_m^{\,T}\vec{b}_k \end{bmatrix}
= \begin{bmatrix} \vec{r}_1\cdot\vec{b}_1 & \cdots & \vec{r}_1\cdot\vec{b}_k \\ \vdots & & \vdots \\ \vec{r}_m\cdot\vec{b}_1 & \cdots & \vec{r}_m\cdot\vec{b}_k \end{bmatrix}$$
Thus, the (i, j)−entry of AB is ~ri · ~bj .

Example 27.9. Let
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 1 & 3 \\ 4 & -2 & 1 \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 1 & 1 & 3 \\ 4 & -2 & 1 \end{bmatrix} = \begin{bmatrix} 1(1)+2(4) & 1(1)+2(-2) & 1(3)+2(1) \\ 3(1)+4(4) & 3(1)+4(-2) & 3(3)+4(1) \end{bmatrix} = \begin{bmatrix} 9 & -3 & 5 \\ 19 & -5 & 13 \end{bmatrix}$$

In the previous example, note that A ∈ M2×2 (R) and B ∈ M2×3 (R) so AB ∈ M2×3 (R).
However, the number of columns of B is not equal to the number of rows of A, so the
product BA is not defined.
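As an optional check, the product of Example 27.9 can be verified numerically; the sketch below assumes the third-party NumPy library, which the notes do not rely on.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[1, 1, 3],
              [4, -2, 1]])

print(A @ B)   # [[ 9 -3  5]
               #  [19 -5 13]]
# Attempting B @ A raises a ValueError: B has 3 columns while A has only 2 rows,
# so the product BA is not defined, exactly as discussed above.
```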

Lecture 28
Example 28.1. Let
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 2 & 1 \end{bmatrix}, \qquad
BA = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 3 \\ 0 & 0 \end{bmatrix}$$
from which we see that $AB \ne BA$ despite the products AB and BA both being defined and having the same size.

Examples 27.9 and 28.1 show us that matrix multiplication is not commutative. That is,
given two matrices A and B such that AB is defined, the product BA may not be defined,
and even if it is, BA may not be equal to AB (in fact, BA need not have the same size as
AB: consider A ∈ M2×3 (R) and B ∈ M3×2 (R)).

Example 28.2. Let
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 1 \\ -1 & 2 \end{bmatrix}.$$
Then
$$(AB)^T = \left(\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ -1 & 2 \end{bmatrix}\right)^T = \begin{bmatrix} -1 & 5 \\ -1 & 11 \end{bmatrix}^T = \begin{bmatrix} -1 & -1 \\ 5 & 11 \end{bmatrix}$$
but
$$A^T B^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 6 & 6 \end{bmatrix}$$
and in fact
$$B^T A^T = \begin{bmatrix} 1 & -1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} -1 & -1 \\ 5 & 11 \end{bmatrix} = (AB)^T$$

Theorem 28.3. Let c ∈ R and A, B, C be matrices so that the following are defined.41

(1) IA = A I is an identity matrix

(2) AI = A I is an identity matrix

(3) A(BC) = (AB)C Matrix multiplication is associative

(4) A(B + C) = AB + AC Left distributive law

(5) (B + C)A = BA + CA Right distributive law

(6) (cA)B = c(AB) = A(cB)

(7) (AB)T = B T AT

Note that since we defined matrix products in terms of the matrix vector product, we have
that (3) holds for the matrix vector product also: A(B~x) = (AB)~x where ~x has the same
number of entries as B has columns. We also note that (7) can be generalized as

$$(A_1 A_2 \cdots A_k)^T = A_k^T \cdots A_2^T A_1^T \qquad (22)$$
where $A_1, \ldots, A_k$ are matrices of appropriate sizes. In fact, taking $A_1 = \cdots = A_k = A$ for some square matrix A, we obtain
$$(A^k)^T = (A^T)^k$$
for any positive integer k.42

Example 28.4. Simplify A(3B − C) + (A − 2B)C + 2B(C + 2A).

Solution. We have

A(3B − C) + (A − 2B)C + 2B(C + 2A) = 3AB − AC + AC − 2BC + 2BC + 4BA


= 3AB + 4BA

Note:

• A(3B − C) = 3AB − AC, that is, when distributing, A must remain on the left

• (A − 2B)C = AC − 2BC, that is, when distributing, C must remain on the right

• 3AB + 4BA ≠ 7AB since we cannot assume AB = BA.

Example 28.5. If A, B, C ∈ Mn×n (R) and C commutes (with respect to multiplication)


with both A and B, then prove that C commutes with AB.
41
Here we mean that the matrices A, B and C have sizes such that their products and sums are defined.
The sizes of each matrix may change depending on which statement we are considering.
42 For a square matrix A and a positive integer k, $A^k = \underbrace{A \cdots A}_{k\ \text{times}}$.
Proof. Since C commutes with both A and B, we have that AC = CA and BC = CB. Thus

(AB)C = A(BC) = A(CB) = (AC)B = (CA)B = C(AB)

and so C commutes with AB.

Complex Matrices
We denote the set of m×n matrices with complex entries by Mm×n (C). The rules of addition,
scalar multiplication, matrix-vector product, matrix multiplication and transpose derived for
real matrices also hold for complex matrices.

Example 28.6. Let
$$A = \begin{bmatrix} j & 2-j \\ 4+j & 1-2j \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & j \\ 2j & 1-j \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} j & 2-j \\ 4+j & 1-2j \end{bmatrix}\begin{bmatrix} 1 & j \\ 2j & 1-j \end{bmatrix} = \begin{bmatrix} 2+5j & -3j \\ 8+3j & -2+j \end{bmatrix}$$
$$BA = \begin{bmatrix} 1 & j \\ 2j & 1-j \end{bmatrix}\begin{bmatrix} j & 2-j \\ 4+j & 1-2j \end{bmatrix} = \begin{bmatrix} -1+5j & 4 \\ 3-3j & 1+j \end{bmatrix}$$
from which we see that $AB \ne BA$, so multiplication of complex matrices also doesn't commute.

Note that if

$$A = \begin{bmatrix} \vec{r}_1^{\,T} \\ \vdots \\ \vec{r}_m^{\,T} \end{bmatrix} \in M_{m\times n}(\mathbb{C}) \quad\text{and}\quad B = [\, \vec{b}_1 \ \cdots \ \vec{b}_k \,] \in M_{n\times k}(\mathbb{C})$$
then $\vec{r}_i \in \mathbb{C}^n$ for $i = 1,\dots,m$ and $\vec{b}_j \in \mathbb{C}^n$ for $j = 1,\dots,k$, so the dot product $\vec{r}_i \cdot \vec{b}_j$ is defined. Then we have
$$AB = \begin{bmatrix} \vec{r}_1^{\,T} \\ \vdots \\ \vec{r}_m^{\,T} \end{bmatrix}[\, \vec{b}_1 \ \cdots \ \vec{b}_k \,] = \begin{bmatrix} \vec{r}_1\cdot\vec{b}_1 & \cdots & \vec{r}_1\cdot\vec{b}_k \\ \vdots & & \vdots \\ \vec{r}_m\cdot\vec{b}_1 & \cdots & \vec{r}_m\cdot\vec{b}_k \end{bmatrix}$$

Thus, the (i, j)−entry of AB is ~ri · ~bj . It’s important to note that we use the dot product
here, and not the complex inner product.

Definition 28.7. Let $A = [a_{ij}] \in M_{m\times n}(\mathbb{C})$. Then the conjugate of A is
$$\overline{A} = [\,\overline{a_{ij}}\,]$$
and the conjugate transpose of A is
$$A^* = \overline{A}^{\,T}.$$

Example 28.8.
$$\begin{bmatrix} 1+j & 1-2j & j \\ 2 & -j & 3+j \end{bmatrix}^* = \begin{bmatrix} 1-j & 2 \\ 1+2j & j \\ -j & 3-j \end{bmatrix}$$

Definition 28.9. Let A ∈ Mn×n (C). A is called Hermitian if A∗ = A.

Example 28.10. Let
$$A = \begin{bmatrix} j & 1+j \\ 1+j & 3 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 3 & 2-j \\ 2+j & 6 \end{bmatrix}.$$
Then
$$A^* = \begin{bmatrix} -j & 1-j \\ 1-j & 3 \end{bmatrix} \ne A, \qquad B^* = \begin{bmatrix} 3 & 2-j \\ 2+j & 6 \end{bmatrix} = B$$
so B is Hermitian and A is not. Notice however that $A^T = A$ so A is symmetric.

Theorem 28.11. If $A \in M_{m\times n}(\mathbb{C})$ and $\vec{z} \in \mathbb{C}^n$, then $\overline{A\vec{z}} = \overline{A}\,\overline{\vec{z}}$.

Theorem 28.12. If A, B are complex matrices of appropriate sizes, ~z ∈ Cn and α ∈ C,


then
(1) $(A^*)^* = A$

(2) (A + B)∗ = A∗ + B ∗

(3) $(\alpha A)^* = \overline{\alpha}\,A^*$

(4) (AB)∗ = B ∗ A∗

(5) (A~z)∗ = ~z ∗ A∗

Application: Adjacency Matrices for Directed Graphs
A directed graph (or digraph) is a set of vertices and a set of directed edges between some of
the pairs of vertices. We may move from one vertex in the directed graph to another vertex
if there is a directed edge pointing in the direction we wish to move. Consider the directed
graph below:

This graph has four vertices, V1 , V2 , V3 and V4 . A directed edge between two vertices Vi and
Vj is simply the arrow pointing from Vi to Vj . As seen in the figure, we may have a directed
edge from a vertex to the same vertex (see V1 ), an edge may be directed in both directions
(see V2 and V3 ) and there may be more than one directed edge from one vertex to another
(see V3 and V4 ).

One question we may ask is in how many distinct ways can we get from V1 to V4 travelling
along exactly 3 directed edges, that is, how many distinct 3−edged paths are there from V1
to V4 ? A little counting reveals that there are 6 distinct such paths:
$V_1 \to V_1 \to V_3 \xrightarrow{\text{upper}} V_4$
$V_1 \to V_1 \to V_3 \xrightarrow{\text{lower}} V_4$
$V_1 \to V_1 \to V_2 \to V_4$
$V_1 \to V_2 \to V_3 \xrightarrow{\text{upper}} V_4$
$V_1 \to V_2 \to V_3 \xrightarrow{\text{lower}} V_4$
$V_1 \to V_3 \to V_2 \to V_4$

Note that each time we move from V3 to V4 , we specify which directed edge we are taking
since there is more than one. We could alternatively label each directed edge as we have the
vertices. However, we are more concerned with counting the number of paths and not with
actually listing them all out.

Counting may seem easy, but what if we were asked to find all distinct 20−edged paths
from V1 to V4 ? After months of counting, you would find 2 584 875 distinct paths. Clearly,
counting the paths one-by-one is not the best method.

Consider the 4 × 4 matrix A whose (i, j)−entry is the number of directed edges from Vi to
Vj . Then
 
$$A = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 2 \\ 1 & 0 & 0 & 0 \end{bmatrix}$$
We compute
$$A^2 = \begin{bmatrix} 1 & 2 & 2 & 3 \\ 1 & 1 & 0 & 2 \\ 2 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \quad\text{and}\quad A^3 = \begin{bmatrix} 4 & 3 & 3 & 6 \\ 3 & 1 & 2 & 1 \\ 3 & 3 & 2 & 2 \\ 1 & 2 & 2 & 3 \end{bmatrix}$$
and note that the (1, 4)−entry of A3 is 6 which is the number of distinct 3−edged paths
from V1 to V4 . In fact, the (i, j)−entry of A3 gives the number of distinct 3−edged paths
from Vi to Vj for any i and j with 1 ≤ i, j ≤ 4.
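As an optional aside, the same powers (and much larger ones, such as the 20-edged count mentioned earlier) are easy to compute with NumPy, an external library not assumed by the course; this is only an illustrative sketch.

```python
import numpy as np

A = np.array([[1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 2],
              [1, 0, 0, 0]])

A3 = np.linalg.matrix_power(A, 3)
print(A3[0, 3])                               # 6 three-edged paths from V1 to V4
print(np.linalg.matrix_power(A, 20)[0, 3])    # should reproduce the 20-edged path count quoted above
```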
Definition 28.13. Consider a directed graph with n vertices V1 , V2 , . . . , Vn . The adjacency
matrix of the directed graph is the n × n matrix A whose (i, j)−entry is the number of
directed edges from Vi to Vj .
Theorem 28.14. Consider a directed graph with n vertices V1 , V2 , . . . , Vn . For any positive
integer k, the number of distinct k−edged paths from Vi to Vj is given by the (i, j)−entry of
Ak .
Proof.43 The result is true for $k = 1$ since the $(i,j)$-entry of $A^1 = A$ is by definition the number of distinct 1-edged paths from $V_i$ to $V_j$. Assume now that the result is true for some positive integer $k$. Denote the $(i,j)$-entry of $A^k$ by $a_{ij}^{(k)}$ so that the number of distinct $k$-edged paths from $V_i$ to $V_j$ is $a_{ij}^{(k)}$. Consider the $(i,j)$-entry of $A^{k+1}$, denoted by $a_{ij}^{(k+1)}$. We have
$$a_{ij}^{(k+1)} = \sum_{\ell=1}^{n} a_{i\ell}^{(k)} a_{\ell j} = a_{i1}^{(k)} a_{1j} + a_{i2}^{(k)} a_{2j} + \cdots + a_{in}^{(k)} a_{nj}$$
43 The proof technique used here is called induction. Although you will not be asked to give a proof by induction, we include this proof here as it illustrates why $A^k$ gives the number of $k$-edged paths between the vertices of a directed graph.
Note that every $(k+1)$-edged path from $V_i$ to $V_j$ is of the form
$$V_i \longrightarrow \cdots \longrightarrow V_\ell \longrightarrow V_j \qquad (23)$$
for some $\ell = 1, 2, \ldots, n$. By assumption, there are $a_{i\ell}^{(k)}$ distinct $k$-edged paths from $V_i$ to $V_\ell$. Since there are $a_{\ell j}$ distinct 1-edged paths from $V_\ell$ to $V_j$, there are $a_{i\ell}^{(k)} a_{\ell j}$ distinct $(k+1)$-edged paths of the form given by (23). Thus, the total number of $(k+1)$-edged paths from $V_i$ to $V_j$ is given by
$$a_{i1}^{(k)} a_{1j} + a_{i2}^{(k)} a_{2j} + \cdots + a_{in}^{(k)} a_{nj}$$
which is $a_{ij}^{(k+1)}$, the $(i,j)$-entry of $A^{k+1}$.

Example 28.15. An airline company offers flights between the cities of Toronto, Beijing,
Paris and Sydney. You can fly between these cities as you like, except that there is no
flight from Beijing to Sydney, and there is no flight between Toronto and Sydney in either
direction.

(a) If you depart from Toronto, how many distinct sequences of flights can you take if you
plan to arrive in Beijing after no more than 5 flights? (You may arrive at Beijing in
less than 5 flights and then leave, provided you end up back in Beijing after no later
than the 5th flight).

(b) Suppose you wish to depart from Sydney and arrive in Beijing after the 5th flight. In
how many ways can this be done so that your second flight takes you to Toronto?

(c) Suppose you wish to depart from Sydney and arrive in Beijing after the 5th flight. In
how many ways can this be done so that you visit Toronto at least once?

Solution. We denote the four cities as vertices, and place a directed arrow between two cities
if we can fly between the two cities in that direction. We label Toronto as V1 , Beijing as V2 ,
Paris as V3 and Sydney as V4 . We obtain the following directed graph:

We construct the adjacency matrix A as
$$A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}$$

As we can take at most 5 flights, we will compute $A^2$, $A^3$, $A^4$ and $A^5$:
$$A^2 = \begin{bmatrix} 2&1&1&1\\ 1&2&1&1\\ 1&2&3&0\\ 2&1&1&1 \end{bmatrix}, \quad
A^3 = \begin{bmatrix} 2&4&4&1\\ 3&3&4&1\\ 5&4&3&3\\ 2&4&4&1 \end{bmatrix}, \quad
A^4 = \begin{bmatrix} 8&7&7&4\\ 7&8&7&4\\ 7&11&12&3\\ 8&7&7&4 \end{bmatrix}, \quad
A^5 = \begin{bmatrix} 14&19&19&7\\ 15&18&19&7\\ 23&22&21&12\\ 14&19&19&7 \end{bmatrix}$$

(a) Since the (1, 2)−entry of Ak gives the number of distinct ways to fly from Toronto to
Beijing using k flights, we simply add the (1, 2)−entries of these five matrices. We
have 1 + 1 + 4 + 7 + 19 = 32. Thus, there are 32 distinct ways to fly from Toronto to
Beijing using no more than 5 flights.

(b) Here, we need to fly from Sydney to Beijing using exactly 5 flights. The (4, 2)−entry of
A5 tells us that there are 19 ways to do this. However, we must pass through Toronto
after the second flight. This restriction implies our final answer should be no greater
than 19. We will compute the number of ways to fly from Sydney to Toronto in two
flights, and then the number of ways to fly from Toronto to Beijing in three flights,
and finally multiply our results together to get the final answer. Thus we compute
$$a_{41}^{(2)} \cdot a_{12}^{(3)} = 2 \cdot 4 = 8$$

There are 8 ways to fly from Sydney to Beijing in 5 flights, stopping in Toronto after
the second flight.

(c) Here it is tempting to count the number of flights from Sydney that pass through
Toronto after the first flight, then the number of flights that pass through Toronto
after the second flight, third flight and fourth flight, then add the results, that is, to
compute
$$a_{41}^{(1)} \cdot a_{12}^{(4)} + a_{41}^{(2)} \cdot a_{12}^{(3)} + a_{41}^{(3)} \cdot a_{12}^{(2)} + a_{41}^{(4)} \cdot a_{12}^{(1)} = 0 \cdot 7 + 2 \cdot 4 + 2 \cdot 1 + 8 \cdot 1 = 18$$

and conclude that there are 18 such flights. However, the sequence of flights

Sydney −→ Paris −→ Toronto −→ Beijing −→ Toronto −→ Beijing

is “double-counted” as it passes through Toronto twice. Thus there should be less than
18 such flights. To avoid this double-counting, we will instead count the number of
ways to fly from Sydney to Beijing without visiting Toronto, and we will accomplish
this by removing Toronto from our directed graph:

This leads to a new adjacency matrix
$$B = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}$$

It's left as an exercise to see that
$$B^5 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 3 & 4 & 1 \\ 0 & 5 & 4 & 4 \\ 0 & 5 & 5 & 3 \end{bmatrix}$$

The (4, 2)-entry of $B^5$ shows that there are 5 distinct ways to fly from Sydney to Beijing
in 5 flights without stopping in Toronto. Since the (4, 2)-entry of A5 shows there are
19 ways to fly from Sydney to Beijing in 5 flights, there must be 19 − 5 = 14 distinct
ways to fly from Sydney to Beijing in 5 flights while visiting Toronto at least once.
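The three answers in Example 28.15 can also be reproduced with a few matrix powers; the optional sketch below uses NumPy (an assumption of the illustration, not course material) and the adjacency matrices A and B from the solution.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 1, 1, 0]])
B = A.copy()
B[0, :] = 0              # delete every flight out of Toronto (vertex V1) ...
B[:, 0] = 0              # ... and every flight into Toronto

P = {k: np.linalg.matrix_power(A, k) for k in range(1, 6)}

print(sum(P[k][0, 1] for k in range(1, 6)))             # (a) 1 + 1 + 4 + 7 + 19 = 32
print(P[2][3, 0] * P[3][0, 1])                          # (b) 2 * 4 = 8
print(P[5][3, 1] - np.linalg.matrix_power(B, 5)[3, 1])  # (c) 19 - 5 = 14
```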

Lecture 29

Application: Markov Chains


In the zombie apocalypse, a person exists in exactly one of two states: human or zombie.
If a person is a human on any given day, then that person will become a zombie on the
next day with probability 1/2, and if a person is a zombie on any given day, then with
a simple application of the moderately successful miracle-cure ZombieGoneTM , they will
become a human on the next day with probability 1/4. We are interested in computing
the probability that a person is a human or a zombie k days after the onset of the zombie
apocalypse. Given the above data, we can construct the following table:
From: Human   From: Zombie      To:
    1/2            1/4          Human
    1/2            3/4          Zombie
For k a nonnegative integer, let hk be the probability that a person is a human on day k and
let zk be the probability that a person is a zombie on day k. Since 1/2 of the humans on
day k will remain human the next day, and 1/4 of the zombies on day k will become human
the next day, we have that
$$h_{k+1} = \frac{1}{2}h_k + \frac{1}{4}z_k$$
Similarly, we find
$$z_{k+1} = \frac{1}{2}h_k + \frac{3}{4}z_k$$
giving us a system of equations. In matrix notation,
$$\begin{bmatrix} h_{k+1} \\ z_{k+1} \end{bmatrix} = \begin{bmatrix} 1/2 & 1/4 \\ 1/2 & 3/4 \end{bmatrix}\begin{bmatrix} h_k \\ z_k \end{bmatrix}$$
Now let
$$\vec{s}_k = \begin{bmatrix} h_k \\ z_k \end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix} 1/2 & 1/4 \\ 1/2 & 3/4 \end{bmatrix}$$
to obtain
$$\vec{s}_{k+1} = P\vec{s}_k \qquad (24)$$
for every nonnegative integer k. Since a person must be either a human or a zombie (but not
both), we have that hk + zk = 1 for each k. Notice how we can find the matrix P directly
from the table we constructed above, and that it follows from how we constructed the table
that the entries of P are all nonnegative and the entries in each column of P sum to 1.

182
Definition 29.1. A vector ~s ∈ Rn is called a probability vector if the entries in the vector are
nonnegative and sum to 1. A square matrix is called stochastic if its columns are probability
vectors. Given a stochastic matrix P , a Markov Chain is a sequence of probability vectors
~s0 , ~s1 , ~s2 , . . . where
~sk+1 = P~sk
for every nonnegative integer k. In a Markov Chain, the probability vectors ~sk are called
state vectors.

Now suppose that for k = 0 (the moment the zombie apocalypse begins - referred to by
survivors as “Z–Day”), everyone is still human. Thus, a person is a human with probability
1 and a person is a zombie with probability 0. This gives
" #
1
~s0 =
0

We can now compute
$$\vec{s}_1 = P\vec{s}_0 = \begin{bmatrix} 1/2 & 1/4 \\ 1/2 & 3/4 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/2 \\ 1/2 \end{bmatrix}$$
showing that one day after the start of the zombie apocalypse, 1/2 of the population are humans while the other 1/2 of the population are now zombies. Now
$$\vec{s}_2 = P\vec{s}_1 = \begin{bmatrix} 1/2 & 1/4 \\ 1/2 & 3/4 \end{bmatrix}\begin{bmatrix} 1/2 \\ 1/2 \end{bmatrix} = \begin{bmatrix} 3/8 \\ 5/8 \end{bmatrix} = \begin{bmatrix} 0.37500 \\ 0.62500 \end{bmatrix}$$
$$\vec{s}_3 = P\vec{s}_2 = \begin{bmatrix} 1/2 & 1/4 \\ 1/2 & 3/4 \end{bmatrix}\begin{bmatrix} 3/8 \\ 5/8 \end{bmatrix} = \begin{bmatrix} 11/32 \\ 21/32 \end{bmatrix} = \begin{bmatrix} 0.34375 \\ 0.65625 \end{bmatrix}$$

Continuing to use the formula $\vec{s}_{k+1} = P\vec{s}_k$, we obtain
$$\vec{s}_4 = \begin{bmatrix} 43/128 \\ 85/128 \end{bmatrix} \approx \begin{bmatrix} 0.33594 \\ 0.66406 \end{bmatrix}, \quad
\vec{s}_5 = \begin{bmatrix} 171/512 \\ 341/512 \end{bmatrix} \approx \begin{bmatrix} 0.33398 \\ 0.66602 \end{bmatrix}, \quad
\vec{s}_6 = \begin{bmatrix} 683/2048 \\ 1365/2048 \end{bmatrix} \approx \begin{bmatrix} 0.33350 \\ 0.66650 \end{bmatrix}$$
and after some work, we find that
$$\vec{s}_{10} = \begin{bmatrix} 174\,763/524\,288 \\ 349\,525/524\,288 \end{bmatrix} \approx \begin{bmatrix} 0.33333 \\ 0.66666 \end{bmatrix}$$

It appears that the sequence $\vec{s}_0, \vec{s}_1, \vec{s}_2, \ldots$ is converging44 to
$$\vec{s} = \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix}$$
In fact,
$$P\vec{s} = \begin{bmatrix} 1/2 & 1/4 \\ 1/2 & 3/4 \end{bmatrix}\begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix} = \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix} = \vec{s}$$
Thus, if the system reaches state ~s (or starts in this state), then the probabilities that a
given person is a human or a zombie no longer change over time.
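The state vectors above come from repeatedly applying $\vec{s}_{k+1} = P\vec{s}_k$; a short optional sketch (using NumPy, an external library the notes do not assume) makes the convergence toward (1/3, 2/3) easy to watch.

```python
import numpy as np

P = np.array([[0.5, 0.25],
              [0.5, 0.75]])
s = np.array([1.0, 0.0])        # s_0: everyone is human on Z-Day

for k in range(1, 11):
    s = P @ s                   # s_{k+1} = P s_k
    print(k, s)                 # approaches [0.3333..., 0.6666...]
```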

Definition 29.2. If P is a stochastic matrix, then a state vector ~s is called a steady-state


vector for P if P~s = ~s. It can be shown that every stochastic matrix has a steady-state
vector.

To algebraically determine any steady-state vectors in above our example, we start with
P~s = ~s. Then

P~s − ~s = ~0
P~s − I~s = ~0
(P − I)~s = ~0

so that we have a homogeneous system. Note the introduction of the identity matrix I above.
It might be tempting to go from P~s − ~s = ~0 to (P − 1)~s = ~0, but since P is a matrix and 1
is a number, P − 1 is not defined. Computing the coefficient matrix P − I and row reducing
gives
$$\begin{bmatrix} -1/2 & 1/4 \\ 1/2 & -1/4 \end{bmatrix}
\xrightarrow{\,R_2 + R_1\,}
\begin{bmatrix} -1/2 & 1/4 \\ 0 & 0 \end{bmatrix}
\xrightarrow{\,-2R_1\,}
\begin{bmatrix} 1 & -1/2 \\ 0 & 0 \end{bmatrix}$$

We find that for $t \in \mathbb{R}$,
$$\vec{s} = \begin{bmatrix} \tfrac{1}{2}t \\ t \end{bmatrix}$$
Recalling that $\vec{s}$ is a state vector, we additionally require that $(1/2)t + t = 1$, that is, $t = 2/3$. This gives
$$\vec{s} = \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix}$$
44 By converging, we mean that each component of $\vec{s}_k$ is converging to the corresponding component of $\vec{s}$ as $k$ tends to infinity.
Now, what happens if we change our initial state vector $\vec{s}_0$? If we let $\vec{s}_0 = \begin{bmatrix} h_0 & z_0 \end{bmatrix}^T$ and recall that $h_0 + z_0 = 1$, we obtain that $z_0 = 1 - h_0$, so $\vec{s}_0 = \begin{bmatrix} h_0 & 1 - h_0 \end{bmatrix}^T$. It is a good exercise to show that by repeatedly using Equation (24), we obtain
$$\vec{s}_k = \begin{bmatrix} \dfrac{h_0}{4^k} + \dfrac{1}{3} - \dfrac{1}{3\cdot 4^k} \\[2mm] -\dfrac{h_0}{4^k} + \dfrac{2}{3} + \dfrac{1}{3\cdot 4^k} \end{bmatrix}$$
Since both $h_0/4^k$ and $1/(3\cdot 4^k)$ tend to zero as $k$ tends to infinity, we see that for any initial state vector $\vec{s}_0$, the sequence $\vec{s}_0, \vec{s}_1, \vec{s}_2, \ldots$ tends to
$$\vec{s} = \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix}$$

This means that once the zombie apocalypse begins, in the long-run for a city of 100 000
people, we can expect 33 333 humans and 66 667 zombies each day, and that this long-term
outcome does not depend on the initial state ~s0 . Note that once this steady-state is achieved
humans are still turning into zombies, and zombies are still reverting back to humans each
day, but that the number of humans turning into zombies is equal to the number of zombies
turning into humans. It’s worth noting that once the steady-state is achieved, there would
actually be 33 333.3̄ humans and 66 666.6̄ zombies. We have rounded our final answers due
to the real-world constraints that we cannot have fractional humans.
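To find the steady state numerically rather than by hand, one can solve $(P - I)\vec{s} = \vec{0}$ together with the constraint that the entries sum to 1; the optional sketch below does this with a least-squares call from NumPy (an external library, used here only for illustration).

```python
import numpy as np

P = np.array([[0.5, 0.25],
              [0.5, 0.75]])

# Stack (P - I) with a row of ones so the system encodes both (P - I)s = 0 and s1 + s2 = 1.
M = np.vstack([P - np.eye(2), np.ones((1, 2))])
rhs = np.array([0.0, 0.0, 1.0])
s, *_ = np.linalg.lstsq(M, rhs, rcond=None)
print(s)      # [0.3333... 0.6666...] -- the steady-state vector (1/3, 2/3)
```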

Note that in our above zombie example, our stochastic matrix P had a unique steady-state
and that the Markov Chain converged to this steady-state regardless of the initial state vec-
tor ~s0 . The next two examples show that this is not always the case.

Example 29.3. The n × n identity matrix I is a stochastic matrix, and for any state vector
~s ∈ Rn , I~s = ~s. This shows that every state vector ~s ∈ Rn is a steady-state vector for I.
Thus we do not have a unique steady-state vector.
Example 29.4. For our second example, consider the stochastic matrix
$$Q = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$
Then for any state vector $\vec{s} = \begin{bmatrix} s_1 & s_2 \end{bmatrix}^T$,
$$Q\vec{s} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} s_1 \\ s_2 \end{bmatrix} = \begin{bmatrix} s_2 \\ s_1 \end{bmatrix}$$

In order for $\vec{s}$ to be a steady-state vector, we require $Q\vec{s} = \vec{s}$, and so we have that $s_1 = s_2 = 1/2$. Thus we have a unique steady-state vector (which we also could have found by solving the homogeneous system $(Q - I)\vec{s} = \vec{0}$ as above). However, if we take the initial state vector $\vec{s}_0 = \begin{bmatrix} 1 & 0 \end{bmatrix}^T$, we find
$$\vec{s}_1 = Q\vec{s}_0 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad
\vec{s}_2 = Q\vec{s}_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \vec{s}_0$$
We see that
$$\vec{s}_k = \begin{cases} \begin{bmatrix} 1 \\ 0 \end{bmatrix}, & \text{if } k \text{ is even,} \\[3mm] \begin{bmatrix} 0 \\ 1 \end{bmatrix}, & \text{if } k \text{ is odd} \end{cases}$$
so that the Markov Chain doesn't converge to the steady-state with this initial state. In fact, the Markov Chain converges to the steady-state only when $\vec{s}_0 = \begin{bmatrix} 1/2 & 1/2 \end{bmatrix}^T$.

Clearly, the stochastic matrix P from the zombie apocalypse example is special in the sense
that P has a unique steady-state vector and any Markov Chain will converge to this steady-
state regardless of the initial state vector chosen. This is because the matrix P is regular.

Definition 29.5. An n × n stochastic matrix P is called regular if for some positive integer
k, the matrix P k has all positive entries.

Since a stochastic matrix has all entries between 0 and 1 inclusive, a stochastic matrix $P$ fails to be regular when $P^k$ contains a zero entry for every positive integer $k$. Clearly,
$$P = P^1 = \begin{bmatrix} 1/2 & 1/4 \\ 1/2 & 3/4 \end{bmatrix}$$
is regular as all entries are positive. The $n \times n$ identity matrix is not regular since for any positive integer $k$, $I^k = I$ contains zero entries. The matrix
$$Q = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

is not regular since for any positive integer $k$,
$$Q^k = \begin{cases} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, & \text{if } k \text{ is even,} \\[3mm] \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, & \text{if } k \text{ is odd} \end{cases}$$
always contains a zero entry.


Theorem 29.6. Let P be a regular n × n stochastic matrix. Then P has a unique steady-
state vector ~s and for any initial state vector ~s0 ∈ Rn , the resulting Markov Chain converges
to the steady-state vector ~s.
To solve a Markov Chain problem (in this course):
1. Read and understand the problem
2. Determine the stochastic matrix P and verify that P is regular
3. Determine the initial state vector ~s0 if required
4. Solve the homogeneous system (P − I)~s = ~0
5. Choose values for any parameters resulting from solving the above system so that ~s is
a probability vector
6. Conclude by Theorem 29.6 that ~s is the steady-state vector
7. Interpret the entries of ~s in terms of the original problem as needed
A brief word about notation. Our examples here dealt mostly with two states - for example,
human and zombie. In general, we will have many states in our Markov Chain. This means
that our stochastic matrix P will be an n × n matrix and our state vectors ~sk ∈ Rn for
k = 0, 1, 2, . . .. Oftentimes, the following notation is used for state vectors:
$$\vec{s}_k = \begin{bmatrix} s_1^{(k)} \\ \vdots \\ s_n^{(k)} \end{bmatrix}$$
for $k = 0, 1, 2, \ldots$ and
$$\vec{s} = \begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix}$$
for our steady-state vector.

Lecture 30

Matrix Inverses45
We have seen that like real numbers, we can multiply matrices. For real numbers, we know
that 1 is the multiplicative identity since 1(x) = x = x(1) for any x ∈ R. We also know that
if x, y ∈ R are such that xy = 1 = yx, then x and y are multiplicative inverses of each other,
and we say that they are both invertible. We have recently seen that for an n × n matrix A,
IA = A = AI where I is the n × n identity matrix which shows that I is the multiplicative
identity for Mn×n (R). It is then natural to ask: given a matrix A, does there exist
a matrix B so that AB = I = BA?46

Definition 30.1. Let A ∈ Mn×n (R). If there exists a B ∈ Mn×n (R) such that

AB = I = BA

then A is invertible and B is an inverse of A (and B is invertible with A an inverse of B).

Example 30.2. Let
$$A = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} = BA$$

so A is invertible and B is an inverse of A.

Example 30.3. Let
$$A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}.$$
Then for any $b_1, b_2, b_3, b_4 \in \mathbb{R}$,
$$\begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} b_1 & b_2 \\ b_3 & b_4 \end{bmatrix} = \begin{bmatrix} b_1 + 2b_3 & b_2 + 2b_4 \\ 0 & 0 \end{bmatrix} \ne \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
so A is not invertible.
45 When we say inverse here, we mean multiplicative inverse. Given any matrix A ∈ Mm×n(R), the additive inverse of A is −A, which is both easy to compute and not very interesting to study.
46 The requirement that AB = BA imposes the condition that A be a square matrix.
Notice that in the previous example, A is a nonzero matrix that fails to be invertible. This
might be surprising since for a real number x, we know that x being invertible is equivalent
to x being nonzero. Clearly this is not the case for n × n matrices.

By the above definition, to show that B ∈ Mn×n (R) is an inverse of A ∈ Mn×n (R), we must
check that both AB = I and BA = I. The next theorem shows that if AB = I, then it
follows that BA = I (or equivalently, if BA = I then it follows that AB = I) so that we
may verify only one of AB = I and BA = I to conclude that B is an inverse of A.

Theorem 30.4. Let A, B ∈ Mn×n (R) be such that AB = I. Then BA = I. Moreover,


rank (A) = rank (B) = n.

Proof. Let A, B ∈ Mn×n (R) be such that AB = I. We first show that rank (B) = n. Let
~x ∈ Rn be such that B~x = ~0. Since AB = I,

~x = I~x = (AB)~x = A(B~x) = A~0 = ~0

so ~x = ~0 is the only solution to the homogeneous system B~x = ~0. Thus, rank (B) = n by
the System–Rank Theorem(2).

We next show that BA = I. Let ~y ∈ Rn . Since rank (B) = n and B has n rows, the
System–Rank Theorem(3) guarantees that we will find ~x ∈ Rn such that ~y = B~x. Then

(BA)~y = (BA)B~x = B(AB)~x = BI~x = B~x = ~y = I~y

so (BA)~y = I~y for every ~y ∈ Rn . Thus BA = I by the Matrices Equal Theorem.

Finally, since BA = I, it follows that rank (A) = n by the first part of our proof with the
roles of A and B interchanged.
We have now proven that if A ∈ Mn×n (R) is invertible, then rank (A) = n. It follows that
the reduced row echelon form of A is I. We now prove that if A is invertible, then the inverse
of A is unique.

Theorem 30.5. Let A ∈ Mn×n (R) be invertible. If B, C ∈ Mn×n (R) are both inverses of A,
then B = C.

Proof. Assume for A, B, C ∈ Mn×n (R) that both B and C are inverses of A. Then BA = I
and AC = I. We have

B = BI = B(AC) = (BA)C = IC = C.

Hence, if A is invertible, the inverse of A is unique, and we denote this inverse by A−1 .

Theorem 30.6. Let $A, B \in M_{n\times n}(\mathbb{R})$ be invertible and let $c \in \mathbb{R}$ with $c \ne 0$. Then
(1) $(cA)^{-1} = \frac{1}{c}A^{-1}$
(2) $(AB)^{-1} = B^{-1}A^{-1}$
(3) $(A^k)^{-1} = (A^{-1})^k$ for $k$ a positive integer
(4) $(A^T)^{-1} = (A^{-1})^T$
(5) $(A^{-1})^{-1} = A$
Proof. We prove (2) and (4) only. For (2), since
$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$$
we have that $(AB)^{-1} = B^{-1}A^{-1}$, and for (4), since
$$A^T (A^{-1})^T = (A^{-1}A)^T = I^T = I$$
we see that $(A^T)^{-1} = (A^{-1})^T$.
Note that (2) from Theorem 30.6 generalizes for more than two matrices. For invertible matrices $A_1, A_2, \ldots, A_k \in M_{n\times n}(\mathbb{R})$ we have that $A_1 A_2 \cdots A_k$ is invertible and
$$(A_1 A_2 \cdots A_k)^{-1} = A_k^{-1} \cdots A_2^{-1} A_1^{-1}.$$
In particular, if $A_1 = A_2 = \cdots = A_k = A$ is invertible, then
$$(A^k)^{-1} = (A^{-1})^k$$
for any positive integer $k$.

Matrix Inversion Algorithm


Having shown many properties of matrix inverses, we have yet to actually compute the in-
verse of an invertible matrix. We know that for a real number x, x is invertible if and only
if x ≠ 0, and in this case $x^{-1} = \frac{1}{x}$. Things aren't quite so easy with matrices.47 We derive
an algorithm here that will tell us if a matrix is invertible, and compute the inverse should
the matrix be invertible. Our construction is for 3 × 3 matrices, but generalizes naturally
for n × n matrices.

Consider A ∈ M3×3 (R). If A is invertible, then there exists an X = [ ~x1 ~x2 ~x3 ] ∈ M3×3 (R)
such that

AX = I
47 Don't you even think about writing $A^{-1} = \frac{1}{A}$. This is wrong as $\frac{1}{A}$ is not even defined.
A[ ~x1 ~x2 ~x3 ] = [ ~e1 ~e2 ~e3 ]
[ A~x1 A~x2 A~x3 ] = [ ~e1 ~e2 ~e3 ]
Thus
A~x1 = ~e1 , A~x2 = ~e2 and A~x3 = ~e3 .
We have three systems of equations with the same coefficient matrix, so we construct an
augmented matrix
[ A | ~e1 ~e2 ~e3 ] = [ A | I ]
We must consider two cases when solving this system. First, if the reduced row echelon form
of A is I, then
[ A | I ] −→ [ I | ~b1 ~b2 ~b3 ]
where B = [ ~b1 ~b2 ~b3 ] ∈ M3×3 (R) is the matrix that I reduces to under the same elemen-
tary row operations that carry A to I. From this, we see that ~b1 is the solution to A~x1 = ~e1 ,
~b2 is the solution to A~x2 = ~e2 and ~b3 is the solution to A~x3 = ~e3 , that is,

~x1 = ~b1 , ~x2 = ~b2 and ~x3 = ~b3


Hence
AX = AB = A[ ~b1 ~b2 ~b3 ] = [ A~b1 A~b2 A~b3 ] = [ ~e1 ~e2 ~e3 ] = I
so A−1 = B. Second, if the reduced row echelon form of A is not I, then rank (A) < 3 and
A cannot be invertible since if A were invertible, we would have rank (A) = 3 by Theorem 30.4

Thus, for A ∈ Mn×n (R), to see if A is invertible (and to compute A−1 if A is invertible),
carry the matrix [ A | I ] to reduced row echelon form. If the reduced row echelon form of
[ A | I ] is [ I | B ] for some B ∈ Mn×n (R), then B = A−1 , but if the reduced row echelon form
of A is not I, then A is not invertible. This is known as the Matrix Inversion Algorithm.
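The Matrix Inversion Algorithm can also be phrased as a short program: row reduce [A | I] and read off the right block. The sketch below is one possible Python/NumPy implementation (NumPy and the partial pivoting step are assumptions of this illustration); it is meant only to mirror the hand procedure, not to replace it.

```python
import numpy as np

def inverse_by_row_reduction(A, tol=1e-12):
    """Carry [A | I] to RREF; return the right block if A reduces to I, else None."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])                  # the augmented matrix [A | I]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r, col]))
        if abs(M[pivot, col]) < tol:
            return None                            # rank(A) < n, so A is not invertible
        M[[col, pivot]] = M[[pivot, col]]          # swap the pivot row into place
        M[col] /= M[col, col]                      # scale so the pivot entry is 1
        for r in range(n):
            if r != col:
                M[r] -= M[r, col] * M[col]         # clear the rest of the column
    return M[:, n:]                                # [I | A^{-1}] -> return A^{-1}

A = [[2.0, 3.0], [4.0, 5.0]]
Ainv = inverse_by_row_reduction(A)
print(Ainv)                                        # [[-2.5  1.5] [ 2.  -1. ]], as in Example 30.7
print(np.allclose(np.array(A) @ Ainv, np.eye(2)))  # True
```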
Example 30.7. Let
$$A = \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}.$$
Find $A^{-1}$ if it exists.
Solution. We have
$$\left[\begin{array}{cc|cc} 2 & 3 & 1 & 0 \\ 4 & 5 & 0 & 1 \end{array}\right]
\xrightarrow{\,R_2 - 2R_1\,}
\left[\begin{array}{cc|cc} 2 & 3 & 1 & 0 \\ 0 & -1 & -2 & 1 \end{array}\right]
\xrightarrow{\,R_1 + 3R_2\,}
\left[\begin{array}{cc|cc} 2 & 0 & -5 & 3 \\ 0 & -1 & -2 & 1 \end{array}\right]
\xrightarrow[\,-R_2\,]{\,\frac12 R_1\,}
\left[\begin{array}{cc|cc} 1 & 0 & -5/2 & 3/2 \\ 0 & 1 & 2 & -1 \end{array}\right]$$
So A is invertible (since the reduced row echelon form of A is I) and
$$A^{-1} = \begin{bmatrix} -5/2 & 3/2 \\ 2 & -1 \end{bmatrix}.$$

Example 30.8. Let
$$A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}.$$
Find $A^{-1}$ if it exists.
Solution. We have
$$\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 2 & 4 & 0 & 1 \end{array}\right]
\xrightarrow{\,R_2 - 2R_1\,}
\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & 0 & -2 & 1 \end{array}\right]$$
We see that the reduced row echelon form of A is
$$\begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} \ne \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
so A is not invertible (note that rank(A) = 1 < 2).

Example 30.9. Let
$$A = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & -2 \\ 1 & 2 & -2 \end{bmatrix}$$
Find $A^{-1}$ if it exists.
Solution. We have
$$\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 1 & 1 & -2 & 0 & 1 & 0 \\ 1 & 2 & -2 & 0 & 0 & 1 \end{array}\right]
\xrightarrow[\,R_3 - R_1\,]{\,R_2 - R_1\,}
\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & -1 & -1 & 1 & 0 \\ 0 & 2 & -1 & -1 & 0 & 1 \end{array}\right]
\xrightarrow{\,R_3 - 2R_2\,}
\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & -1 & -1 & 1 & 0 \\ 0 & 0 & 1 & 1 & -2 & 1 \end{array}\right]
\xrightarrow[\,R_2 + R_3\,]{\,R_1 + R_3\,}
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 2 & -2 & 1 \\ 0 & 1 & 0 & 0 & -1 & 1 \\ 0 & 0 & 1 & 1 & -2 & 1 \end{array}\right]$$
and we conclude that A is invertible and
$$A^{-1} = \begin{bmatrix} 2 & -2 & 1 \\ 0 & -1 & 1 \\ 1 & -2 & 1 \end{bmatrix}.$$

Note that if you find A to be invertible and you compute A−1 , then you can check your work
by ensuring that AA−1 = I.

Lecture 31

Properties of Matrix Inverses


Theorem 31.1. Let A ∈ Mn×n (R) be invertible
(1) For all B, C ∈ Mn×k (R), if AB = AC, then B = C left cancellation
(2) For all B, C ∈ Mk×n (R), if BA = CA, then B = C right cancellation
Proof. We prove (1). We have
$$\begin{aligned} AB &= AC \\ A^{-1}(AB) &= A^{-1}(AC) \\ (A^{-1}A)B &= (A^{-1}A)C \\ IB &= IC \\ B &= C \end{aligned}$$
Note that our two cancellation laws require that A is invertible. Indeed
$$\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 7 & 8 & 9 \\ 4 & 5 & 6 \end{bmatrix}$$
but
$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \ne \begin{bmatrix} 7 & 8 & 9 \\ 4 & 5 & 6 \end{bmatrix}.$$
Notice that $\operatorname{rank}\!\left(\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right) = 1 < 2$ so $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ is not invertible.
Example 31.2. If A, B, C ∈ Mn×n (R) are such that A is invertible and AB = CA, does
B = C?
Solution. The answer is no. To see this, consider
$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 2 & 0 \\ 1 & 0 \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}, \qquad
CA = \begin{bmatrix} 2 & 0 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}$$
So $AB = CA$ but $B \ne C$.

The previous example shows that we do not have mixed cancellation. This is a direct result of matrix multiplication not being commutative. From $AB = CA$, we can obtain $B = A^{-1}CA$, and since $B \ne C$, we have $C \ne A^{-1}CA$. Note that we cannot cancel $A$ and $A^{-1}$ here.

Example 31.3. For A, B ∈ Mn×n (R) with A, B and A + B invertible, do we have that
(A + B)−1 = A−1 + B −1 ?

Solution. The answer is no. Let $A = B = I$. Then $A + B = 2I$ and
$$(A + B)^{-1} = (2I)^{-1} = \frac{1}{2}I^{-1} = \frac{1}{2}I$$
but
$$A^{-1} + B^{-1} = I^{-1} + I^{-1} = I + I = 2I$$
As $\frac{1}{2}I \ne 2I$, $(A + B)^{-1} \ne A^{-1} + B^{-1}$.


The following theorem summarizes many of the results we have seen thus far in the course,
and shows the importance of matrix invertibility. This theorem is central to all of linear
algebra and actually contains many more parts, some of which we will encounter later. Note
that we have already proven several of these equivalences.

Theorem 31.4 (Invertible Matrix Theorem). Let A ∈ Mn×n (R). The following are equiva-
lent.

(1) A is invertible

(2) rank (A) = n

(3) The reduced row echelon form of A is I

(4) For all ~b ∈ Rn , the system A~x = ~b is consistent and has a unique solution

(5) Null (A) = {~0}

(6) The columns of A form a linearly independent set

(7) The columns of A span Rn

(8) AT is invertible

(9) Null (AT ) = {~0}

(10) The rows of A form a linearly independent set

(11) The rows of A span Rn

In particular, for A invertible, the system A~x = ~b has a unique solution. We can solve for ~x
using our matrix algebra:

A~x = ~b
A−1 A~x = A−1~b
I~x = A−1~b
~x = A−1~b

Example 31.5. Consider the system of equations $A\vec{x} = \vec{b}$ with
$$A = \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix} \quad\text{and}\quad \vec{b} = \begin{bmatrix} 4 \\ -1 \end{bmatrix}$$
Then A is invertible (see Example 30.7) and
$$\vec{x} = A^{-1}\vec{b} = \begin{bmatrix} -5/2 & 3/2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 4 \\ -1 \end{bmatrix} = \begin{bmatrix} -23/2 \\ 9 \end{bmatrix}$$
Of course we could have solved the above system $A\vec{x} = \vec{b}$ by row reducing the augmented matrix $[\, A \mid \vec{b} \,] \rightarrow \left[\, I \,\middle|\, \begin{smallmatrix} -23/2 \\ 9 \end{smallmatrix} \,\right]$. Note that to find $A^{-1}$ we row reduced $[\, A \mid I \,] \rightarrow [\, I \mid A^{-1} \,]$ and that the elementary row operations used in both cases are the same.
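Example 31.5 can also be checked numerically; the optional sketch below (using NumPy, which the notes do not assume) solves Ax = b both by forming the inverse and with a direct solver.

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, 5.0]])
b = np.array([4.0, -1.0])

print(np.linalg.inv(A) @ b)     # [-11.5   9. ], i.e. (-23/2, 9)
print(np.linalg.solve(A, b))    # same solution, computed without explicitly forming A^{-1}
```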

Linear Transformations
Recall that a function is a rule that assigns to every element in one set (called the domain
of the function) a unique element in another set (called the codomain 48 of the function).
Given sets U and V we write f : U → V to indicate that f is a function with domain U
and codomain V , and it is understood that to each element u ∈ U , the function f assigns
a unique element v ∈ V . We say that f maps u to v and that v is the image of u under f .
We typically write v = f (u). See Figure 49.

In calculus, one studies functions $f : \mathbb{R} \to \mathbb{R}$, for example $f(x) = x^2$ or $f(x) = \sin(x)$. We


will consider functions f : Rn → Rm . In fact, for A ∈ Mm×n (R) and ~x ∈ Rn , we have seen
how to compute the matrix-vector product A~x, and we know that A~x ∈ Rm . This motivates
the following definition.
48
The codomain of a function is often confused with the range of a function. We will define the range of
a function shortly.

Figure 49: An example of a function (on the left) and something that fails to be a function (on the right). (a) A function with domain U and codomain V. (b) This fails to be a function from U to V for two reasons: it doesn't assign an image in V to all points in U, and it assigns to one point in U more than one image in V.

Definition 31.6. For A ∈ Mm×n (R), the function fA : Rn → Rm defined by fA (~x) = A~x
for every ~x ∈ Rn is called the matrix transformation corresponding to A. We call Rn the
domain of fA and Rm the codomain of fA . We say that fA maps ~x to A~x and say that A~x
is the image of ~x under fA .

We make a few notes here:

• It is not uncommon to say matrix mapping instead of matrix transformation. We may


use the words transformation and mapping interchangeably.

• The subscript A in fA is merely to indicate that the function depends on the matrix
A. If we change the matrix A, we change the function fA .

• For A ∈ Mm×n (R), we have that fA : Rn → Rm . This is a result of how we defined the
matrix-vector product.

Example 31.7. Let
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & -1 & 1 \end{bmatrix}.$$
Then $A \in M_{2\times 3}(\mathbb{R})$ and so $f_A : \mathbb{R}^3 \to \mathbb{R}^2$. We can compute
$$f_A(1, 1, 4) = \begin{bmatrix} 1 & 2 & 3 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix} = (15, 4),$$

and more generally,
$$f_A(x_1, x_2, x_3) = \begin{bmatrix} 1 & 2 & 3 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = (x_1 + 2x_2 + 3x_3,\ x_1 - x_2 + x_3).$$

Since for $A \in M_{m\times n}(\mathbb{R})$, the function $f_A$ sends vectors in $\mathbb{R}^n$ to vectors in $\mathbb{R}^m$, we should be writing
$$f_A\!\left(\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}\right) = A\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix},$$
but as functions are often viewed as sending points to points, we will prefer the notation
$$f(x_1, \ldots, x_n) = (y_1, \ldots, y_m) \quad\text{or}\quad f(x_1, \ldots, x_n) = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}.$$
However, we must still write
$$A\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$
and not $A(x_1, \ldots, x_n)$, due to our rules for the matrix-vector product.

Theorem 31.8. Let A ∈ Mm×n (R) and let fA be the matrix transformation corresponding
to A. For every ~x, ~y ∈ Rn and for every c ∈ R,

(1) fA (~x + ~y ) = fA (~x) + fA (~y )

(2) fA (c~x) = cfA (~x)

Proof. We use the properties of the matrix-vector product. We have

fA (~x + ~y ) = A(~x + ~y ) = A~x + A~y = fA (~x) + fA (~y )

and
fA (c~x) = A(c~x) = cA~x = cfA (~x).

Thus matrix transformations preserve vector sums and scalar multiplication. Combin-
ing these two results shows that matrix transformations preserve linear combinations: for
~x1 , . . . , ~xk ∈ Rn and c1 , . . . , ck ∈ R,

fA (c1~x1 + · · · + ck ~xk ) = c1 fA (~x1 ) + · · · + ck fA (~xk ).

Functions which preserve linear combinations are called linear transformations or linear
mappings.

Definition 31.9. A function L : Rn → Rm is called a linear transformation (or a linear


mapping) if for every ~x, ~y ∈ Rn and for every s, t ∈ R we have

L(s~x + t~y ) = sL(~x) + tL(~y ).

For m = n, a linear transformation L : Rn → Rn is often called a linear operator on Rn .

It follows immediately from Theorem 31.8 that every matrix transformation is a linear trans-
formation.

By taking s = t = 0 in the definition of a linear transformation, we find that

L(~0Rn ) = ~0Rm ,

that is, a linear transformation always sends the zero vector of the domain to the zero vector
of the codomain. By taking s = −1 and t = 0, we see that

L(−~x) = −L(~x)

so linear transformations preserve additive inverses as well.

Linear transformations are important throughout mathematics – in fact, we have seen them
in calculus.49 For differentiable functions f, g : R → R, and s, t ∈ R we have
$$\frac{d}{dx}\big(sf(x) + tg(x)\big) = s\frac{d}{dx}f(x) + t\frac{d}{dx}g(x).$$
Example 31.10. Show that L : R2 → R2 defined by

L(x1 , x2 ) = (x1 − x2 , 2x1 + x2 )

is a linear transformation.

Solution. Let $\vec{x}, \vec{y} \in \mathbb{R}^2$ and $s, t \in \mathbb{R}$. With $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$, we have

$$\begin{aligned} L(s\vec{x}+t\vec{y}) &= L(sx_1+ty_1,\ sx_2+ty_2)\\ &= \big((sx_1+ty_1)-(sx_2+ty_2),\ 2(sx_1+ty_1)+(sx_2+ty_2)\big)\\ &= (sx_1-sx_2,\ 2sx_1+sx_2)+(ty_1-ty_2,\ 2ty_1+ty_2)\\ &= s(x_1-x_2,\ 2x_1+x_2)+t(y_1-y_2,\ 2y_1+y_2)\\ &= sL(\vec{x})+tL(\vec{y}). \end{aligned}$$
As $L(s\vec{x}+t\vec{y}) = sL(\vec{x})+tL(\vec{y})$, we see that L is a linear transformation.
49 It is important to always remember that linear algebra is far better than calculus.


Note that we could have also noticed that for any $\vec{x} \in \mathbb{R}^2$,
$$L(\vec{x}) = \begin{bmatrix} x_1 - x_2 \\ 2x_1 + x_2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

which shows that L is a matrix transformation and hence a linear transformation.

Example 31.11. Show that L : R2 → R defined by L(~x) = k~xk is not linear.

Solution. To show that L is not linear, we must exhibit two vectors $\vec{x}, \vec{y} \in \mathbb{R}^2$ and two scalars $s, t \in \mathbb{R}$ such that $L(s\vec{x} + t\vec{y}) \ne sL(\vec{x}) + tL(\vec{y})$. We know that the norm does not generally preserve sums, so we will take $s = t = 1$ and choose two nonzero nonparallel vectors $\vec{x}, \vec{y} \in \mathbb{R}^2$. Consider
$$\vec{x} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad\text{and}\quad \vec{y} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
Then
$$L(\vec{x} + \vec{y}) = L(1, 1) = \left\|\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right\| = \sqrt{2}$$
and
$$L(\vec{x}) + L(\vec{y}) = L(1, 0) + L(0, 1) = \left\|\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right\| + \left\|\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\| = 1 + 1 = 2.$$
As we have found vectors $\vec{x}, \vec{y} \in \mathbb{R}^2$ such that $L(\vec{x} + \vec{y}) \ne L(\vec{x}) + L(\vec{y})$, we conclude that L is not linear.

Lecture 32
Example 32.1. Show that $L : \mathbb{R}^3 \to \mathbb{R}^2$ defined by $L(x_1, x_2, x_3) = (x_1 + x_2 + x_3,\ x_3^2 + 3)$ is not linear.
Solution. Consider
$$\vec{x} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \quad\text{and}\quad \vec{y} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}.$$
Then
$$L(\vec{x} + \vec{y}) = L(1, 1, 0) = (2, 3)$$
but
$$L(\vec{x}) + L(\vec{y}) = L(1, 0, 0) + L(0, 1, 0) = (1, 3) + (1, 3) = (2, 6),$$
which shows that L is not linear.
Recall that a linear transformation always maps the zero vector of the domain to the
zero vector of the codomain. Thus in Example 32.1, we could have quickly noticed that
L(0, 0, 0) = (0, 3) ≠ (0, 0) and concluded immediately that L was not linear. Note however,
that a function sending the zero vector of the domain to the zero vector of the codomain
does not guarantee that the function is linear – see Example 31.11.

Example 32.2. Let L : R2 → R4 be a linear transformation such that

L(1, 2) = (1, 2, 3, 4) and L(2, 3) = (1, 4, 0, −1)

Then
L(3, 5) = L(1, 2) + L(2, 3) = (1, 2, 3, 4) + (1, 4, 0, −1) = (2, 6, 3, 3).

In general, for a linear transformation L : Rn → Rm , if we are given L(~x1 ), . . . , L(~xk ) for


~x1 , . . . , ~xk ∈ Rn , then we can compute L(~x) for any ~x ∈ Span {~x1 , . . . , ~xk } since L pre-
serves linear combinations. In particular, if {~v1 , . . . , ~vn } is a basis for Rn and we know
L(~v1 ), . . . , L(~vn ), then we can compute L(~v ) for any ~v ∈ Rn which is an extremely powerful
property!

Indeed, from Example 32.2, the set
$$\left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \end{bmatrix} \right\}$$
is a basis for $\mathbb{R}^2$. Thus, for any $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$ we have
$$\left[\begin{array}{cc|c} 1 & 2 & x_1 \\ 2 & 3 & x_2 \end{array}\right]
\xrightarrow{\,R_2 - 2R_1\,}
\left[\begin{array}{cc|c} 1 & 2 & x_1 \\ 0 & -1 & x_2 - 2x_1 \end{array}\right]
\xrightarrow{\,-R_2\,}
\left[\begin{array}{cc|c} 1 & 2 & x_1 \\ 0 & 1 & 2x_1 - x_2 \end{array}\right]
\xrightarrow{\,R_1 - 2R_2\,}
\left[\begin{array}{cc|c} 1 & 0 & -3x_1 + 2x_2 \\ 0 & 1 & 2x_1 - x_2 \end{array}\right]$$
and so
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = (-3x_1 + 2x_2)\begin{bmatrix} 1 \\ 2 \end{bmatrix} + (2x_1 - x_2)\begin{bmatrix} 2 \\ 3 \end{bmatrix}.$$
It follows that

L(x1 , x2 ) = (−3x1 + 2x2 )L(1, 2) + (2x1 − x2 )L(2, 3)


= (−3x1 + 2x2 )(1, 2, 3, 4) + (2x1 − x2 )(1, 4, 0, −1)
= (−x1 + x2 , 2x1 , −9x1 + 6x2 , −14x1 + 9x2 ).

Thus, by knowing just L(1, 2) and L(2, 3) we can compute L(~x) for any ~x ∈ R2 . Also note
that    
−x1 + x2 −1 1 " #
 2x1   2 0  x1
L(x1 , x2 ) =  =
   

 −9x1 + 6x2   −9 6  x2
−14x1 + 9x2 −14 9
which shows that L is a matrix transformation.

Recall that Theorem 31.8 guarantees that every matrix transformation from Rn to Rm is a
linear transformation. We also noticed that the linear transformations from Examples 31.10
and 32.2 were matrix transformations, so it is natural to ask if every linear transformation
from Rn to Rm is a matrix transformation. The following theorem shows the answer is yes.
Theorem 32.3. If L : Rn → Rm is a linear transformation, then L is a matrix transforma-
tion with corresponding matrix

[ L ] = [ L(~e1 ) · · · L(~en ) ] ∈ Mm×n (R),

that is, L(~x) = [ L ]~x for every ~x ∈ Rn .


Proof. Let ~x = [ x1 · · · xn ]T ∈ Rn . Then ~x = x1~e1 + · · · + xn~en . We have

$$\begin{aligned} L(\vec{x}) &= L(x_1\vec{e}_1 + \cdots + x_n\vec{e}_n) \\ &= x_1 L(\vec{e}_1) + \cdots + x_n L(\vec{e}_n) && \text{since } L \text{ is linear} \\ &= [\, L(\vec{e}_1) \ \cdots \ L(\vec{e}_n) \,]\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \\ &= [\,L\,]\vec{x} \end{aligned}$$

Given a linear transformation L : Rn → Rm , we refer to [ L ] ∈ Mm×n (R) as the standard
matrix of L. Theorems 31.8 and 32.3 combine to give that a transformation is linear if and
only if it is a matrix transformation.
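Theorem 32.3 says the standard matrix is found by applying L to the standard basis vectors and using the images as columns; here is a small optional sketch of that recipe, where the function L is the map from Example 31.10 and NumPy is an assumption of the illustration.

```python
import numpy as np

def L(x):
    """L(x1, x2) = (x1 - x2, 2x1 + x2), the linear map from Example 31.10."""
    x1, x2 = x
    return np.array([x1 - x2, 2 * x1 + x2])

# [L] = [ L(e1)  L(e2) ]: apply L to the standard basis vectors and use the images as columns
std_matrix = np.column_stack([L(e) for e in np.eye(2)])
print(std_matrix)                              # [[ 1. -1.] [ 2.  1.]]

x = np.array([3.0, -2.0])
print(np.allclose(std_matrix @ x, L(x)))       # True: L(x) = [L]x
```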

Example 32.4. Let $\vec{d} \in \mathbb{R}^2$ be nonzero and define $L : \mathbb{R}^2 \to \mathbb{R}^2$ by $L(\vec{x}) = \operatorname{proj}_{\vec{d}}\vec{x}$ for every $\vec{x} \in \mathbb{R}^2$. Show L is linear, and then find the standard matrix of L with $\vec{d} = \begin{bmatrix} -1 \\ 3 \end{bmatrix}$.
Solution. We first show L is linear. Let $\vec{x}, \vec{y} \in \mathbb{R}^2$ and $s, t \in \mathbb{R}$. We have
$$\begin{aligned} L(s\vec{x} + t\vec{y}) &= \operatorname{proj}_{\vec{d}}(s\vec{x} + t\vec{y}) \\ &= \frac{(s\vec{x} + t\vec{y})\cdot\vec{d}}{\|\vec{d}\|^2}\vec{d} \\ &= s\,\frac{\vec{x}\cdot\vec{d}}{\|\vec{d}\|^2}\vec{d} + t\,\frac{\vec{y}\cdot\vec{d}}{\|\vec{d}\|^2}\vec{d} && \text{by properties of the dot product} \\ &= s\operatorname{proj}_{\vec{d}}\vec{x} + t\operatorname{proj}_{\vec{d}}\vec{y} \\ &= sL(\vec{x}) + tL(\vec{y}) \end{aligned}$$

so L is linear. Now with $\vec{d} = \begin{bmatrix} -1 \\ 3 \end{bmatrix}$,
$$L(\vec{e}_1) = \operatorname{proj}_{\vec{d}}\vec{e}_1 = \frac{\vec{e}_1\cdot\vec{d}}{\|\vec{d}\|^2}\vec{d} = -\frac{1}{10}\begin{bmatrix} -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1/10 \\ -3/10 \end{bmatrix}, \qquad
L(\vec{e}_2) = \operatorname{proj}_{\vec{d}}\vec{e}_2 = \frac{\vec{e}_2\cdot\vec{d}}{\|\vec{d}\|^2}\vec{d} = \frac{3}{10}\begin{bmatrix} -1 \\ 3 \end{bmatrix} = \begin{bmatrix} -3/10 \\ 9/10 \end{bmatrix}$$
so
$$[\,L\,] = [\, L(\vec{e}_1) \ L(\vec{e}_2) \,] = \begin{bmatrix} 1/10 & -3/10 \\ -3/10 & 9/10 \end{bmatrix}.$$
Note that if we take $\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ for example, we can compute the projection of $\vec{x}$ onto $\vec{d} = \begin{bmatrix} -1 \\ 3 \end{bmatrix}$ as
$$L(\vec{x}) = \operatorname{proj}_{\vec{d}}\vec{x} = \begin{bmatrix} 1/10 & -3/10 \\ -3/10 & 9/10 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -1/2 \\ 3/2 \end{bmatrix},$$
that is, we can compute projections using matrix multiplication.

Example 32.5. Let $L : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $L(\vec{x}) = 2\operatorname{proj}_{\vec{d}}\vec{x} - \vec{x}$ where $\vec{d} \in \mathbb{R}^2$ is a nonzero vector. Figure 50 shows that L represents a reflection in the line through the origin with direction vector $\vec{d}$. Show that L is linear, and then find the standard matrix of L with $\vec{d} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

Figure 50: Reflecting $\vec{x}$ in a line through the origin with direction vector $\vec{d}$. Note that $2\operatorname{proj}_{\vec{d}}\vec{x} - \vec{x} = \vec{x} - 2\operatorname{perp}_{\vec{d}}\vec{x}$.

Solution. We first show that L is linear using the fact that $\operatorname{proj}_{\vec{d}}\vec{x}$ is linear. For $\vec{x}, \vec{y} \in \mathbb{R}^2$ and $s, t \in \mathbb{R}$ we have
$$\begin{aligned} L(s\vec{x} + t\vec{y}) &= 2\operatorname{proj}_{\vec{d}}(s\vec{x} + t\vec{y}) - (s\vec{x} + t\vec{y}) \\ &= 2(s\operatorname{proj}_{\vec{d}}\vec{x} + t\operatorname{proj}_{\vec{d}}\vec{y}) - s\vec{x} - t\vec{y} \\ &= s(2\operatorname{proj}_{\vec{d}}\vec{x} - \vec{x}) + t(2\operatorname{proj}_{\vec{d}}\vec{y} - \vec{y}) \\ &= sL(\vec{x}) + tL(\vec{y}) \end{aligned}$$
so L is linear. Now with $\vec{d} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$,
$$L(\vec{e}_1) = 2\operatorname{proj}_{\vec{d}}\vec{e}_1 - \vec{e}_1 = 2\left(\frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) - \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad
L(\vec{e}_2) = 2\operatorname{proj}_{\vec{d}}\vec{e}_2 - \vec{e}_2 = 2\left(\frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) - \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
and so
$$[\,L\,] = [\, L(\vec{e}_1) \ L(\vec{e}_2) \,] = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

Note that in $\mathbb{R}^2$, the line through the origin with direction vector $\vec{d} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ has scalar equation $x_2 = x_1$. For any $\vec{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \in \mathbb{R}^2$,
$$L(\vec{y}) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} y_2 \\ y_1 \end{bmatrix}$$

from which we see that reflecting a vector in the line x2 = x1 simply swaps the coordinates
of that vector.

Example 32.6. Let L : R3 → R3 be defined by L(~x) = ~x − 2proj ~n ~x where ~n ∈ R3 is
a nonzero vector. Figure 51 shows that L represents a reflection in the plane through the
origin with normal vector ~n. Show that L is linear, and find the standard matrix of L if the
plane has scalar equation x1 − x2 + 2x3 = 0.

Figure 51: Reflecting ~x in a plane through the origin with normal vector ~n.

Solution. We first show that L is linear using the fact that projections are linear. For $\vec{x}, \vec{y} \in \mathbb{R}^3$ and $s, t \in \mathbb{R}$,
$$\begin{aligned} L(s\vec{x} + t\vec{y}) &= (s\vec{x} + t\vec{y}) - 2\operatorname{proj}_{\vec{n}}(s\vec{x} + t\vec{y}) \\ &= s\vec{x} + t\vec{y} - 2(s\operatorname{proj}_{\vec{n}}\vec{x} + t\operatorname{proj}_{\vec{n}}\vec{y}) \\ &= s(\vec{x} - 2\operatorname{proj}_{\vec{n}}\vec{x}) + t(\vec{y} - 2\operatorname{proj}_{\vec{n}}\vec{y}) \\ &= sL(\vec{x}) + tL(\vec{y}) \end{aligned}$$
and so L is linear. Now for the plane $x_1 - x_2 + 2x_3 = 0$, we have that $\vec{n} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$. We compute
$$L(\vec{e}_1) = \vec{e}_1 - 2\operatorname{proj}_{\vec{n}}\vec{e}_1 = \vec{e}_1 - 2\,\frac{\vec{e}_1\cdot\vec{n}}{\|\vec{n}\|^2}\vec{n} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} - 2\left(\frac{1}{6}\right)\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2/3 \\ 1/3 \\ -2/3 \end{bmatrix}$$
$$L(\vec{e}_2) = \vec{e}_2 - 2\operatorname{proj}_{\vec{n}}\vec{e}_2 = \vec{e}_2 - 2\,\frac{\vec{e}_2\cdot\vec{n}}{\|\vec{n}\|^2}\vec{n} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} - 2\left(\frac{-1}{6}\right)\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1/3 \\ 2/3 \\ 2/3 \end{bmatrix}$$
$$L(\vec{e}_3) = \vec{e}_3 - 2\operatorname{proj}_{\vec{n}}\vec{e}_3 = \vec{e}_3 - 2\,\frac{\vec{e}_3\cdot\vec{n}}{\|\vec{n}\|^2}\vec{n} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} - 2\left(\frac{2}{6}\right)\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} -2/3 \\ 2/3 \\ -1/3 \end{bmatrix}$$

Hence the standard matrix of L is
$$[\,L\,] = [\, L(\vec{e}_1) \ L(\vec{e}_2) \ L(\vec{e}_3) \,] = \begin{bmatrix} 2/3 & 1/3 & -2/3 \\ 1/3 & 2/3 & 2/3 \\ -2/3 & 2/3 & -1/3 \end{bmatrix}.$$

In the last two examples, we required the objects we were reflecting in (a line and a plane)
to be through the origin. The reason is that if our line or plane does not contain
the origin, then our transformation would not send the zero vector to the zero vector and
thus not be linear.
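One compact way to package the reflection of Example 32.6 is the identity [refl_n] = I − 2nn^T/||n||², which is just L(x) = x − 2proj_n(x) written with an outer product; the optional sketch below (NumPy being an assumption of the illustration) reproduces the matrix found above.

```python
import numpy as np

n = np.array([1.0, -1.0, 2.0])                    # normal vector of x1 - x2 + 2x3 = 0

refl = np.eye(3) - 2 * np.outer(n, n) / (n @ n)   # I - 2 n n^T / ||n||^2
print(refl)
# [[ 0.6667  0.3333 -0.6667]
#  [ 0.3333  0.6667  0.6667]
#  [-0.6667  0.6667 -0.3333]]  -- matches the standard matrix in Example 32.6
```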

Oftentimes, we use more descriptive names for our linear transformations:
$\operatorname{proj}_{\vec{d}} : \mathbb{R}^n \to \mathbb{R}^n$ is the projection onto $\vec{d} \ne \vec{0}$ in $\mathbb{R}^n$
$\operatorname{perp}_{\vec{d}} : \mathbb{R}^n \to \mathbb{R}^n$ is the perpendicular onto $\vec{d} \ne \vec{0}$ in $\mathbb{R}^n$
$\operatorname{refl}_{\vec{n}} : \mathbb{R}^n \to \mathbb{R}^n$ is the reflection in a hyperplane through the origin with normal vector $\vec{n} \ne \vec{0}$ in $\mathbb{R}^n$

Lecture 33
We are seeing that linear transformations (or equivalently, matrix transformations) give us a
way to geometrically understand the matrix–vector product. We have seen that projections
and reflections are both linear transformations, and we now look at some additional linear
transformations that are common in many fields, such as computer graphics.

We first consider rotations. Let Rθ : R2 → R2 be a counterclockwise rotation about the


origin by an angle of θ. To see that Rθ is linear, we use basic trigonometry to write ~x ∈ R2
as (notice the similarity to polar form for complex numbers)
$$\vec{x} = \begin{bmatrix} r\cos\varphi \\ r\sin\varphi \end{bmatrix}$$

where r ∈ R satisfies r = k~xk ≥ 0 and φ ∈ R is the angle ~x makes with the positive x1 −axis
measured counterclockwise (if ~x = ~0, then r = 0 and we may take φ to be any real number).
See Figure 52.

Figure 52: Rotating a vector in R2 .

Since Rθ (~x) is obtained from rotating ~x counterclockwise about the origin, it is clear that
kRθ (~x)k = r and that Rθ (~x) makes an angle of θ + φ with the positive x1 −axis (this is
illustrated in Figure 52). Thus using the angle-sum formulas for sine and cosine, we have
$$\begin{aligned} R_\theta(\vec{x}) &= \begin{bmatrix} r\cos(\varphi + \theta) \\ r\sin(\varphi + \theta) \end{bmatrix} \\ &= \begin{bmatrix} r(\cos\varphi\cos\theta - \sin\varphi\sin\theta) \\ r(\sin\varphi\cos\theta + \cos\varphi\sin\theta) \end{bmatrix} \\ &= \begin{bmatrix} \cos\theta(r\cos\varphi) - \sin\theta(r\sin\varphi) \\ \sin\theta(r\cos\varphi) + \cos\theta(r\sin\varphi) \end{bmatrix} \\ &= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} r\cos\varphi \\ r\sin\varphi \end{bmatrix} \\ &= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\vec{x} \end{aligned}$$

and we see that $R_\theta$ is a matrix transformation and thus a linear transformation. We also see that
$$[\,R_\theta\,] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

Example 33.1. Find the vector that results from rotating $\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ counterclockwise about the origin by an angle of $\frac{\pi}{6}$.
Solution. We have
$$R_{\pi/6}(\vec{x}) = [\,R_{\pi/6}\,]\vec{x} = \begin{bmatrix} \cos\frac{\pi}{6} & -\sin\frac{\pi}{6} \\ \sin\frac{\pi}{6} & \cos\frac{\pi}{6} \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} \sqrt{3} - 2 \\ 1 + 2\sqrt{3} \end{bmatrix}.$$
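Rotation matrices are easy to experiment with numerically; the following optional sketch (again assuming NumPy) rebuilds [R_theta] and applies it to the vector from Example 33.1.

```python
import numpy as np

def rotation_matrix(theta):
    """Standard matrix of a counterclockwise rotation about the origin by theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

x = np.array([1.0, 2.0])
print(rotation_matrix(np.pi / 6) @ x)
# [-0.1340  2.2321], i.e. ((sqrt(3) - 2)/2, (1 + 2*sqrt(3))/2)
```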

Note that a clockwise rotation about the origin by an angle of θ is simply a counterclockwise
rotation about the origin by an angle of −θ. Thus a clockwise rotation by θ is given by the
linear transformation
$$[\,R_{-\theta}\,] = \begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$

where we have used the fact that cos θ is an even function (cos(−θ) = cos θ) and sin θ is an
odd function (sin(−θ) = − sin θ).

We briefly mention that we can generalize these results for rotations about a coordinate axis
in $\mathbb{R}^3$. Consider50
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}, \quad
B = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}, \quad
C = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Then

L1 : R3 → R3 defined by L1 (~x) = A~x is a counterclockwise rotation about the x1 − axis,


L2 : R3 → R3 defined by L2 (~x) = B~x is a counterclockwise rotation about the x2 − axis,
L3 : R3 → R3 defined by L3 (~x) = C~x is a counterclockwise rotation about the x3 − axis.
50
For the matrix B, notice that the negative sign is on the “other” instance of sin θ. The reason for
this is if one “stares” down the positive x2 −axis, then they see the x1 x3 −plane, however, the orientation is
backwards – the positive x1 −axis is to the left of the positive x3 −axis. Thus the roles of “clockwise” and
“counterclockwise” are reversed in this instance.

In fact, we can rotate about any line through the origin in R3 , but finding the standard
matrix of such a transformation is beyond the scope of this course.

We next look at stretches and compressions. For t a positive real number, let
$$A = \begin{bmatrix} t & 0 \\ 0 & 1 \end{bmatrix}$$
and define $L : \mathbb{R}^2 \to \mathbb{R}^2$ by $L(\vec{x}) = A\vec{x}$ for every $\vec{x} \in \mathbb{R}^2$. Then L is a matrix transformation and hence a linear transformation. For $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$,
$$L(\vec{x}) = \begin{bmatrix} t & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} tx_1 \\ x_2 \end{bmatrix}.$$

If $t > 1$, then we say that L is a stretch in the $x_1$-direction by a factor of t, and if $0 < t < 1$, we say that L is a compression in the $x_1$-direction. A stretch in the $x_1$-direction is illustrated in Figure 53.

Figure 53: A stretch in the x1 −direction by a factor of t > 1.

Note the requirement that t > 0. If t = 0, then L is actually a projection onto the x2 −axis,
and if t < 0, then L is a reflection in the x2 −axis followed by a stretch or compression by a
factor of −t > 0. A stretch or compression in the x2 −direction is defined in a similar way.

We next consider dilations and contractions. For $t \in \mathbb{R}$ with $t > 0$, let
$$B = \begin{bmatrix} t & 0 \\ 0 & t \end{bmatrix}$$
and define $L(\vec{x}) = B\vec{x}$ for every $\vec{x} \in \mathbb{R}^2$. Then L is a matrix transformation and thus a linear transformation. For $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$,
$$L(\vec{x}) = \begin{bmatrix} t & 0 \\ 0 & t \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} tx_1 \\ tx_2 \end{bmatrix} = t\vec{x}.$$

We see that L(~x) is simply a scalar multiple of ~x. We call L a dilation if t > 1 and we call
L a contraction if 0 < t < 1. If t = 1, then B is the identity matrix and L(~x) = ~x. Figure
54 illustrates a dilation.

Figure 54: A dilation by a factor of t > 1.

Finally, we consider shears. For $s \in \mathbb{R}$, let
$$C = \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}$$
and define $L : \mathbb{R}^2 \to \mathbb{R}^2$ by $L(\vec{x}) = C\vec{x}$ for every $\vec{x} \in \mathbb{R}^2$. Then L is a matrix transformation and hence a linear transformation. For $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$,
$$L(\vec{x}) = \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 + sx_2 \\ x_2 \end{bmatrix}$$

and we see that L is a shear in the x1 −direction by a factor of s (also referred to as a


horizontal shear by a factor of s). Figure 55 illustrates a shear in the x1 −direction for s > 0.

Figure 55: A shear in the x1 −direction by a factor of s > 0.

Note that a shear in the $x_2$-direction (or a vertical shear) by a factor of $s \in \mathbb{R}$ has standard matrix
$$\begin{bmatrix} 1 & 0 \\ s & 1 \end{bmatrix}.$$

Operations on Linear Transformations


We now study linear transformations more algebraically. Given the relationship between
linear transformations and matrices, it shouldn’t be too much of a surprise that linear trans-
formations behave very much like matrices under the operations of addition and scalar mul-
tiplication.

Definition 33.2. Let L, M : Rn → Rm be (linear) transformations51 . If L(~x) = M (~x) for


every ~x ∈ Rn , then we say L and M are equal and we write L = M . If for some ~x ∈ Rn we
have that L(~x) ≠ M(~x), then L and M are not equal and we write L ≠ M.

Note that if L, M : Rn → Rm are linear transformations, then

L = M ⇐⇒ L(~x) = M (~x) for every ~x ∈ Rn


⇐⇒ [ L ]~x = [ M ]~x for every ~x ∈ Rn
⇐⇒ [ L ] = [ M ] by the Matrices Equal Theorem.

Definition 33.3. Let L, M : Rn → Rm be (linear) transformations and let c ∈ R. We define


(L + M ) : Rn → Rm by
(L + M )(~x) = L(~x) + M (~x)
for every ~x ∈ Rn , and we define (cL) : Rn → Rm by

(cL)(~x) = cL(~x)

for every ~x ∈ Rn .

Example 33.4. Let L, M : R3 → R2 be linear transformations such that L(x1 , x2 , x3 ) =


(2x1 + x2 , x1 − x2 + x3 ) and M (x1 , x2 , x3 ) = (x3 , x1 + 2x2 + 3x3 ). Calculate L + M and −2L.
Solution. For $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3$ we have

(L + M )(~x) = L(~x) + M (~x)


= (2x1 + x2 , x1 − x2 + x3 ) + (x3 , x1 + 2x2 + 3x3 )
= (2x1 + x2 + x3 , 2x1 + x2 + 4x3 )

and
(−2)L(~x) = −2(2x1 + x2 , x1 − x2 + x3 ) = (−4x1 − 2x2 , −2x1 + 2x2 − 2x3 ).
51
This definition works for any two functions L, M : Rn → Rm

Notice that in the previous example, $L + M$ and $-2L$ are both linear transformations as well. In fact, since L and M are linear transformations, we have that for any $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3$
$$\begin{aligned} (L + M)(\vec{x}) &= L(\vec{x}) + M(\vec{x}) = [\,L\,]\vec{x} + [\,M\,]\vec{x} = \big([\,L\,] + [\,M\,]\big)\vec{x} \\ &= \left(\begin{bmatrix} 2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 1 \\ 1 & 2 & 3 \end{bmatrix}\right)\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 \\ 2 & 1 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2x_1 + x_2 + x_3 \\ 2x_1 + x_2 + 4x_3 \end{bmatrix} \end{aligned}$$
and
$$(-2L)(\vec{x}) = -2L(\vec{x}) = -2[\,L\,]\vec{x} = -2\begin{bmatrix} 2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = -2\begin{bmatrix} 2x_1 + x_2 \\ x_1 - x_2 + x_3 \end{bmatrix} = \begin{bmatrix} -4x_1 - 2x_2 \\ -2x_1 + 2x_2 - 2x_3 \end{bmatrix}$$
which is consistent with what we found in the above example.

Theorem 33.5. Let L, M : Rn → Rm be linear transformations and c ∈ R. Then


L + M : Rn → Rm and cL : Rn → Rm are linear transformations. In addition,

[ L + M ] = [ L ] + [ M ] and [ cL ] = c[ L ].

Proof. We prove the result for cL. For any ~x, ~y ∈ Rn and s, t ∈ R, we have

(cL)(s~x + t~y ) = cL(s~x + t~y ) by definition of cL



= c sL(~x) + tL(~y ) since L is linear
= csL(~x) + ctL(~y )
= s(cL)(~x) + t(cL)(~y ) by definition of cL

and we see that cL is a linear transformation. Now for any ~x ∈ Rn

[cL]~x = (cL)(~x) by definition of the standard matrix of cL


= cL(~x) by definition of cL
= c[ L ]~x by the definition of the standard matrix of L

from which we see that [ cL ] = c[ L ] by the Matrices Equal Theorem (Theorem 26.10).

Aside from adding and scaling linear transformations, we can also compose them.
Definition 33.6. Let L : Rn → Rm and M : Rm → Rp be (linear) transformations. The
composition M ◦ L : Rn → Rp is defined by

(M ◦ L)(~x) = M (L(~x))

for every ~x ∈ Rn .
The composition of two transformations is illustrated in Figure 56. It is important to note
that in order for M ◦ L to be defined, the domain of M must contain the codomain of L.

Figure 56: Composing two transformations

Example 33.7. Let L : R3 → R2 and M : R2 → R2 be linear transformations defined by


L(x1 , x2 , x3 ) = (x1 + x2 , x2 + x3 ) and M (x1 , x2 ) = (x1 − 3x2 , 2x1 ). Calculate M ◦ L.
Solution. We have

(M ◦ L)(x1 , x2 , x3 ) = M L(x1 , x2 , x3 )
= M (x1 + x2 , x2 + x3 )

= (x1 + x2 ) − 3(x2 + x3 ), 2(x1 + x2 )
= (x1 − 2x2 − 3x3 , 2x1 + 2x2 ).

Notice that $M \circ L$ is also a linear transformation with domain $\mathbb{R}^3$ and codomain $\mathbb{R}^2$. In fact, computing the standard matrices for L and M gives
$$[\,L\,] = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \quad\text{and}\quad [\,M\,] = \begin{bmatrix} 1 & -3 \\ 2 & 0 \end{bmatrix}$$
and computing their product gives
$$[\,M\,][\,L\,] = \begin{bmatrix} 1 & -3 \\ 2 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -2 & -3 \\ 2 & 2 & 0 \end{bmatrix}$$

which is the standard matrix for M ◦ L.
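The relation [M ∘ L] = [M][L] from Example 33.7 is easy to verify on sample vectors; the short optional sketch below uses NumPy (an assumption of the illustration, not something the notes rely on).

```python
import numpy as np

L = np.array([[1, 1, 0],
              [0, 1, 1]])    # standard matrix of L(x1,x2,x3) = (x1+x2, x2+x3)
M = np.array([[1, -3],
              [2,  0]])      # standard matrix of M(x1,x2) = (x1-3x2, 2x1)

print(M @ L)                 # [[ 1 -2 -3] [ 2  2  0]] -- the standard matrix of M o L

x = np.array([1.0, 2.0, 3.0])
print(np.allclose((M @ L) @ x, M @ (L @ x)))   # True: (M o L)(x) = M(L(x))
```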

Theorem 33.8. Let L : Rn → Rm and M : Rm → Rp be linear transformations. Then
M ◦ L : Rn → Rp is a linear transformation and

[ M ◦ L ] = [ M ][ L ].

Proof. We first show that M ◦ L is linear. Let ~x, ~y ∈ Rn and s, t ∈ R. Then



(M ◦ L)(s~x + t~y ) = M L(s~x + t~y ) by definition of composition

= M sL(~x) + tL(~y ) since L is linear
 
= sM L(~x) + tM L(~y ) since M is linear
= s(M ◦ L)(~x) + t(M ◦ L)(~y ) by definition of composition

and we see that M ◦ L is linear. Now for any ~x ∈ Rn ,

[ M ◦ L ]~x = (M ◦ L)(~x) by definition of the standard matrix of M ◦ L



= M L(~x) by definition of composition
= M ([ L ]~x) by definition of the standard matrix of L
= [ M ]([ L ]~x) by definition of the standard matrix of M
= ([ M ][ L ])~x

from which we see that [ M ◦ L ] = [ M ][ L ] by the Matrices Equal Theorem.


Note that for linear transformations L : Rn → Rm and M : Rm → Rp , [ L ] ∈ Mm×n (R) and
[ M ] ∈ Mp×m (R). It follows that [ M ◦ L ] = [ M ][ L ] ∈ Mp×n (R) which is consistent with
M ◦ L : Rn → Rp . Note that [ L ][ M ] is not defined unless n = p, and even then, we are not
guaranteed that [ L ][ M ] = [ M ][ L ], that is, if M ◦ L and L ◦ M are both defined, we are
not guaranteed that they are equal.

Lecture 34
Example 34.1. Let L : R2 → R2 be a counterclockwise rotation about the origin by an
angle of π/4 and let M : R2 → R2 be a projection onto the x1 −axis. Find the standard
matrices for M ◦ L and L ◦ M .
Solution. Since L and M are linear, we have
$$[\,L\,] = \begin{bmatrix} \cos\pi/4 & -\sin\pi/4 \\ \sin\pi/4 & \cos\pi/4 \end{bmatrix} = \begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}, \qquad
[\,M\,] = [\, \operatorname{proj}_{\vec{e}_1}\vec{e}_1 \ \ \operatorname{proj}_{\vec{e}_1}\vec{e}_2 \,] = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$
and thus
$$[\,M \circ L\,] = [\,M\,][\,L\,] = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix} = \begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ 0 & 0 \end{bmatrix}$$
$$[\,L \circ M\,] = [\,L\,][\,M\,] = \begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} \sqrt{2}/2 & 0 \\ \sqrt{2}/2 & 0 \end{bmatrix}.$$

We notice in the previous example that although M ◦ L and L ◦ M are both defined,
[ M ◦ L ] ≠ [ L ◦ M ] from which we conclude that M ◦ L and L ◦ M are not the same
linear transformation, that is, L and M do not commute.
Example 34.2. Let L, M : R2 → R2 be linear transformations defined by L(x1 , x2 ) =
(2x1 + x2 , x1 + x2 ) and M (x1 , x2 ) = (x1 − x2 , −x1 + 2x2 ). Find [ M ◦ L ] and [ L ◦ M ].
Solution. Since L and M are linear, we have
$$[\,L\,] = [\, L(\vec{e}_1) \ L(\vec{e}_2) \,] = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}, \qquad
[\,M\,] = [\, M(\vec{e}_1) \ M(\vec{e}_2) \,] = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}$$
and thus
$$[\,M \circ L\,] = [\,M\,][\,L\,] = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
[\,L \circ M\,] = [\,L\,][\,M\,] = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

We see that [ M ◦ L ] = I = [ L ◦ M ], so M ◦ L = L ◦ M .

214
Inverse Linear Transformations
We have studied invertible matrices, and have seen that the inverse is only defined for
square matrices. We now study invertible linear transformations, which will only be defined
for linear operators on Rn .52
Definition 34.3. The linear transformation Id : Rn → Rn defined by Id(~x) = ~x for every
~x ∈ Rn is called the identity transformation.
Clearly, h i h i
[ Id ] = Id(~e1 ) · · · Id(~en ) = ~e1 · · · ~en = I.

Definition 34.4. If L : Rn → Rn is a linear transformation and there exists another linear


transformation M : Rn → Rn such that M ◦ L = Id = L ◦ M , then we say that L is invertible
and call M the inverse of L, and write L−1 = M .
From Example 34.2, we see that L−1 = M (and equivalently that M −1 = L). We also see
that [ L ]−1 = [ M ] (and equivalently that [ M ]−1 = [ L ]).
Theorem 34.5. If L, M : Rn → Rn are linear transformations, then M is the inverse of L
if and only if [ M ] is the inverse of [ L ].
Proof. We have

M is the inverse of L ⇐⇒ M ◦ L = Id = L ◦ M
⇐⇒ [ M ◦ L ] = [ Id ] = [ L ◦ M ]
⇐⇒ [ M ][ L ] = I = [ L ][ M ]
⇐⇒ [ M ] is the inverse of [ L ].

It follows from Theorem 34.5 that if L : Rn → Rn is an invertible linear transformation, then

[ L−1 ] = [ L ]−1 .

Geometrically, given an invertible linear transformation L : Rn → Rn , we can view


L−1 : Rn → Rn as “undoing” what L does.
Example 34.6. Recall that Rθ : R2 → R2 denotes a counterclockwise rotation about the
origin through an angle of θ. Describe the inverse transformation of Rθ and find its standard
matrix.
Solution. We see that Rθ−1 is a rotation by an angle of −θ, so Rθ−1 = R−θ . As we have seen
before, " # " #
cos(−θ) − sin(−θ) cos θ sin θ
[ R−θ ] = = .
sin(−θ) cos(−θ) − sin θ cos θ
52
Recall, a linear operator is a linear transformation L : Rn → Rn , that is, a linear transformation whose
codomain is equal to its domain.

215
Note that we have just shown that [ Rθ ]−1 = [ R−θ ], that is,
" #−1 " #
cos(θ) − sin(θ) cos θ sin θ
= .
sin(θ) cos(θ) − sin θ cos θ

We could have used the Inversion Algorithm to compute [ Rθ ]−1 :


" # " #
cos θ − sin θ 1 0 1 0 cos θ sin θ

sin θ cos θ 0 1 0 1 − sin θ cos θ

but this is quite tedious. Indeed, understanding what multiplication by a square matrix does
geometrically can give us a fast way to decide if the matrix is invertible, and if so, what the
inverse of that matrix is.
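For instance, one can confirm numerically that the inverse of a rotation matrix is the rotation by the opposite angle (a small Python sketch assuming NumPy; the helper name rotation is ours):

import numpy as np

def rotation(theta):
    # standard matrix of a counterclockwise rotation by theta
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = 0.7
R = rotation(theta)
print(np.allclose(np.linalg.inv(R), rotation(-theta)))   # True: the inverse of R_theta is R_{-theta}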

Complex Linear Transformations


We briefly mention that the theory of linear transformations from Cn to Cm mirrors that of
linear transformations from Rn to Rm . We consider a simple example:

Example 34.7. Let L : C3 → C2 be a linear transformation such that


" # " # " #
1−j j 2
L(~e1 ) = , L(~e2 ) = and L(~e3 ) = .
2+j 3 1+j

Compute L(1, j, 1 + j).

Solution. We first compute the standard matrix for L:


" #
h i 1−j j 2
[ L ] = L(~e1 ) L(~e2 ) L(~e3 ) = .
2+j 3 1+j

Then  
" # 1 " #
1−j j 2 2+j
L(1, j, 1 + j) =  j = .
 
2+j 3 1+j 2 + 6j
1+j

Application - Linear Transformations


Suppose we have a robotic arm anchored at the origin in the plane. It can rotate about the
origin by an angle θ1 , and also rotate about its “elbow” by an angle θ2 . We don’t allow for
θ2 = π as the arm would fold back on itself. Thus we restrict θ2 to −π < θ2 < π. The lower
part of the arm has length `1 and the upper part of the arm has length `2 .

216
We wish to know the coordinates of the endpoint of the arm. We begin with the portion
of the arm with length `2 , assuming it is based at the origin and is lying on the x1 -axis so
its end has coordinates (`2 , 0). We perform a counterclockwise rotation by θ2 , followed by a
translation by the vector [ `1 0 ]T (which can be thought as inserting the remaining portion
of the arm with its base at the origin and laying parallel to the x1 -axis), and then rotate
counterclockwise by θ1 .

(a) Begin with [ `2 0 ]T (b) Rotate counterclockwise by θ2

(c) Translate by [ `1 0 ]T (d) Rotate counterclockwise by θ1

217
In terms of transformations, we obtain
" # " # " #" #!
cos θ1 − sin θ1 `1 cos θ2 − sin θ2 `2
+ =
sin θ1 cos θ1 0 sin θ2 cos θ2 0
" # " # " #!
cos θ1 − sin θ1 `1 `2 cos θ2
= +
sin θ1 cos θ1 0 `2 sin θ2
" #" #
cos θ1 − sin θ1 `1 + `2 cos θ2
=
sin θ1 cos θ1 `2 sin θ2
" #
`1 cos θ1 + `2 cos θ1 cos θ2 − `2 sin θ1 sin θ2
=
`1 sin θ1 + `2 sin θ1 cos θ2 + `2 cos θ1 sin θ2
" #
`1 cos θ1 + `2 (cos θ1 cos θ2 − sin θ1 sin θ2 )
=
`1 sin θ1 + `2 (sin θ1 cos θ2 + cos θ1 sin θ2 )
" #
`1 cos θ1 + `2 cos(θ1 + θ2 )
=
`1 sin θ1 + `2 sin(θ1 + θ2 )

where we have used the angle-sum formulas. We see the endpoint of the arm has coordinates

(x1 , x2 ) = `1 cos θ1 + `2 cos(θ1 + θ2 ), `1 sin θ1 + `2 sin(θ1 + θ2 )

Note that although our transformation is linear, it was constructed using translations (a
shift in the direction of a nonzero vector), which are nonlinear transformations. We can see
the nonlinearity of a translation: for `1 6= 0 we define
" # " #
x1 `1
L(x1 , x2 ) = +
x2 0

and note that " # " # " # " #


0 `1 `1 0
L(0, 0) = + = 6=
0 0 0 0
giving that L is not linear (recall, a linear transformation must map the zero vector of the
domain to the zero vector of the codomain). Since L is not linear, we see that a translation
cannot be accomplished using matrix multiplication (in our usual coordinates of Rn ).

Below we introduce homogeneous coordinates. These coordinates require adding a 1 onto the
coordinates of a vector in Rn thus giving a vector in Rn+1 . We see that in these coordinates,
we actually can compute a translation using matrix multiplication.
Definition 34.8. To each point (x1 , x2 ) in the x1 x2 −plane there is a corresponding point
(x1 , x2 , 1) lying in the plane x3 = 1 in R3 . We call the coordinates (x1 , x2 , 1) the homogeneous
coordinates of the point (x1 , x2 ).

218
For a, b ∈ R, not both zero, consider a transformation L : R2 → R2 defined by
" # " # " #
x1 a x1 + a
L(x1 , x2 ) = + =
x2 b x2 + b

This is not a linear transformation, but using homogeneous coordinates, we are able to apply
the transformation by matrix multiplication:
    
1 0 a x1 x1 + a
 0 1 b   x2  =  x2 + b 
    

0 0 1 1 1

Also, for any linear transformation L : R2 → R2 defined by L(~x) = A~x, where A ∈ M2×2 (R),
we can construct the 3 × 3 matrix " #
A ~0R2
~0 T2 1
R

to represent the transformation in homogeneous coordinates.
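A small Python sketch (assuming NumPy; the helper name translation is ours) showing a translation carried out by matrix multiplication in homogeneous coordinates:

import numpy as np

def translation(a, b):
    # 3x3 matrix sending (x1, x2, 1) to (x1 + a, x2 + b, 1)
    return np.array([[1.0, 0.0, a],
                     [0.0, 1.0, b],
                     [0.0, 0.0, 1.0]])

p = np.array([2.0, 5.0, 1.0])   # the point (2, 5) in homogeneous coordinates
print(translation(3, -1) @ p)   # [5. 4. 1.], i.e. the point (5, 4)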


Example 34.9. In the example with our robotic arm, we can find the coordinates of the
endpoint as follows:
    
cos θ1 − sin θ1 0 1 0 `1 cos θ2 − sin θ2 0 `2
 sin θ1 cos θ1 0   0 1 0   sin θ2 cos θ2 0   0 
    

0 0 1 0 0 1 0 0 1 1
   
cos θ1 − sin θ1 0 1 0 `1 `2 cos θ2
=  sin θ1 cos θ1 0   0 1 0   `2 sin θ2 
   

0 0 1 0 0 1 1
  
cos θ1 − sin θ1 0 `1 + `2 cos θ2
=  sin θ1 cos θ1 0   `2 sin θ2
  

0 0 1 1
 
`1 cos θ1 + `2 cos θ1 cos θ2 − `2 sin θ1 sin θ2
=  `1 sin θ1 + `2 sin θ1 cos θ2 + `2 cos θ1 sin θ2 
 

1
 
`1 cos θ1 + `2 cos(θ1 + θ2 )
=  `1 sin θ1 + `2 sin(θ1 + θ2 ) 
 

We again see that the coordinates are



(x1 , x2 ) = `1 cos θ1 + `2 cos(θ1 + θ2 ), `1 sin θ1 + `2 sin(θ1 + θ2 )

219
Another interesting use for linear transformations is the differentiation of polynomials. If we
write our polynomials as
p(x) = a0 + a1 x + a2 x2 + · · · + an−1 xn−1 + an xn
where n is a positive integer, then we may represent the polynomial as a vector in Rn+1
 
a0
 
 a1 
 
 a2 
p(x) −→  . 
 
 .. 
 
 a 
 n−1 
an
Since
d
p(x) = a1 + 2a2 x + · · · + (n − 1)an−1 xn−2 + nan xn−1
dx
we have  
a1
 
 2a2 
 
d  3a3 
p(x) −→  . 
 
dx .
 . 
 
 na 
 n 
0
In fact, given an arbitrary vector [ a_0 a_1 a_2 · · · a_{n-1} a_n ]^T ∈ R^{n+1}, we have that

\begin{bmatrix} a_1 \\ 2a_2 \\ 3a_3 \\ \vdots \\ na_n \\ 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 2 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & n \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{n-1} \\ a_n \end{bmatrix}
Example 34.10. We can represent
d
(3 − 2x + 4x2 − 7x3 ) = −2 + 8x − 21x2
dx
as

\begin{bmatrix} -2 \\ 8 \\ -21 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 3 \\ -2 \\ 4 \\ -7 \end{bmatrix}
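The same differentiation matrix is easy to build and test in code (a sketch assuming NumPy; diff_matrix is our own name, not one used in the notes):

import numpy as np

def diff_matrix(n):
    # (n+1) x (n+1) matrix mapping coefficients (a0, ..., an) of a degree-n
    # polynomial to the coefficients of its derivative
    D = np.zeros((n + 1, n + 1))
    for k in range(1, n + 1):
        D[k - 1, k] = k
    return D

coeffs = np.array([3, -2, 4, -7])   # 3 - 2x + 4x^2 - 7x^3
print(diff_matrix(3) @ coeffs)      # [ -2.   8. -21.   0.], i.e. -2 + 8x - 21x^2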

220
Lecture 35

The Kernel and the Range of a Linear Transformation


Given a linear transformation L : Rn → Rm , there are two important sets that carry with
them information about L.

Definition 35.1. Let L : Rn → Rm be a (linear) transformation. The kernel 53 of L is

Ker (L) = {~x ∈ Rn | L(~x) = ~0}

Note that Ker (L) ⊆ Rn .

Example 35.2. Let L : R2 → R2 be a linear transformation defined by

L(x1 , x2 ) = (x1 − x2 , −3x1 + 3x2 ).

Determine which of ~x1 = [ 00 ], ~x2 = [ 11 ] and ~x3 = [ 32 ] belong to Ker (L).

Solution. We compute

L(~x1 ) = L(0, 0) = (0 − 0, −3(0) + 3(0)) = (0, 0)


L(~x2 ) = L(1, 1) = (1 − 1, −3(1) + 3(1)) = (0, 0)
L(~x3 ) = L(3, 2) = (3 − 2, −3(3) + 3(2)) = (1, −3)

from which we deduce that ~x1 , ~x2 ∈ Ker (L) and ~x3 ∈
/ Ker (L).

Definition 35.3. Let L : Rn → Rm be a (linear) transformation. The range of L is

Range (L) = {L(~x) | ~x ∈ Rn }.

Note that Range (L) ⊆ Rm .

Example 35.4. Let L : R2 → R3 be a linear transformation defined by

L(x1 , x2 ) = (x1 + x2 , 2x1 + x2 , 3x2 ).


Determine which of ~y1 = (2, 3, 3) and ~y2 = (1, 1, 2) belong to Range (L).

Solution. To see if ~y1 ∈ Range (L), we try to find ~x = [ xx12 ] ∈ R2 such that L(~x) = ~y1 . Thus
we need
L(x1 , x2 ) = (x1 + x2 , 2x1 + x2 , 3x2 ) = (2, 3, 3).
53
The kernel of L can also be called the nullspace of L, denoted by Null (L).

221
This leads to a system of equations

x1 + x2 = 2
2x1 + x2 = 3
3x2 = 3

Carrying the augmented matrix of this system to reduced row echelon form gives
       
1 1 2 −→ 1 1 2 R1 +R2 1 0 1 −→ 1 0 1
 2 1 3  R2 −2R1  0 −1 −1  −→  0 −1 −1  −R2  0 1 1 
       

0 3 3 0 3 3 R3 +3R1 0 0 0 0 0 0

from which we see that x1 = x2 = 1 and so L(1, 1) = (2, 3, 3). Thus ~y1 ∈ Range (L). For
~y2 , we seek ~x = [ xx12 ] ∈ R2 such that L(x1 , x2 ) = (1, 1, 2). A similar computation leads to a
system of equations with augmented matrix
   
1 1 1 1 1 1
 2 1 1  −→  0 −1 −1  .
   

0 3 2 0 0 −1

As this system is inconsistent, there is no ~x = [ xx12 ] ∈ R2 such that L(x1 , x2 ) = (1, 1, 2) and
so ~y2 ∈
/ Range (L).
As one might expect, the kernel and range for a linear transformation are both subspaces.

Theorem 35.5. Let L : Rn → Rm be a linear transformation. Then

(1) Ker (L) is a subspace of Rn , and

(2) Range (L) is a subspace of Rm .

Proof.

(1) By definition, Ker (L) ⊆ Rn and since L is linear, L(~0Rn ) = ~0Rm so ~0Rn ∈ Ker (L) and
Ker (L) is nonempty. For ~x, ~y ∈ Ker (L), we have that L(~x) = ~0 = L(~y ). Then, since
L is linear
L(~x + ~y ) = L(~x) + L(~y ) = ~0 + ~0 = ~0
so ~x + ~y ∈ Ker (L) and Ker (L) is closed under vector addition. For c ∈ R, we again
use the linearity of L to obtain

L(c~x) = cL(~x) = c ~0 = ~0

showing that c~x ∈ Ker (L) so that Ker (L) is closed under scalar multiplication. Hence,
Ker (L) is a subspace of Rn .

222
(2) By definition, Range (L) ⊆ Rm and since L is linear, L(~0Rn ) = ~0Rm so ~0Rm ∈ Range (L)
and Range (L) is nonempty. For ~x, ~y ∈ Range (L), there exist ~u, ~v ∈ Rn such that
~x = L(~u) and ~y = L(~v ). Then since L is linear,

L(~u + ~v ) = L(~u) + L(~v ) = ~x + ~y

and so ~x + ~y ∈ Range (L). For c ∈ R, we use the linearity of L to obtain

L(c~u) = cL(~u) = c~x

and so c~x ∈ Range (L). Thus Range (L) is a subspace of Rm .

We note that for a linear transformation L : Rn → Rm , the standard matrix of L is

[ L ] = [ L(~e1 ) · · · L(~en ) ]

and that L(~x) = [ L ]~x for every ~x ∈ Rn . Thus we may view the kernel of L as the nullspace
of [ L ] and the range of L as the column space of [ L ].

Theorem 35.6. Let L : Rn → Rm be a linear transformation with standard matrix [ L ].


Then

(1) Ker (L) = Null ([ L ]), and

(2) Range (L) = Col ([ L ]).

Note that in Example 35.4, to see if ~y1 ∈ Range (L), we are ultimately checking if the linear system of equations [ L ]~x = ~y1 is consistent, that is, if ~y1 ∈ Col ([ L ]).
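Because Ker (L) = Null ([ L ]) and Range (L) = Col ([ L ]), these sets can be computed symbolically; here is a sketch for Example 35.4, assuming SymPy is available (the rank-based membership test at the end is our own illustration):

from sympy import Matrix

A = Matrix([[1, 1],
            [2, 1],
            [0, 3]])               # standard matrix of L from Example 35.4

print(A.nullspace())               # []  -- Ker(L) = {0}
print(A.columnspace())             # a basis for Range(L) = Col([L])

y1, y2 = Matrix([2, 3, 3]), Matrix([1, 1, 2])
print(A.row_join(y1).rank() == A.rank())   # True:  y1 is in Range(L)
print(A.row_join(y2).rank() == A.rank())   # False: y2 is not in Range(L)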

Example 35.7. Let L : R3 → R3 be a projection onto the line through the origin with direction vector d~ = (1, 1, 1). Find a basis for Ker (L) and Range (L).

Solution. As L is linear, the standard matrix of L is


 
h i h i 1/3 1/3 1/3
[L] = L(~e1 ) L(~e2 ) L(~e3 ) = proj d~ ~e1 proj d~ ~e2 proj d~ ~e3 =  1/3 1/3 1/3 
 

1/3 1/3 1/3

If ~x ∈ Ker (L), then L(~x) = [ L ]~x = ~0. Carrying [ L ] to reduced row echelon form gives
   
1/3 1/3 1/3 1 1 1
 1/3 1/3 1/3  −→  0 0 0 
   

1/3 1/3 1/3 0 0 0

223
and we see that    
−1 −1
~x = s  1  + t  0  , s, t ∈ R
   

0 1
so    

 −1 −1 

 1 , 0 
   
 
0 1
 

is a basis for Ker (L). To find a basis for Range (L), we find all vectors ~y ∈ Rm for which
there exists a ~x ∈ Rn with L(~x) = ~y . But this is equivalent to finding all ~y ∈ Rm for which
the system [ L ]~x = ~y is consistent, and the system [ L ]~x = ~y is consistent if and only if
~y ∈ Col ([ L ]). Hence we simply seek a basis for Col ([ L ]). From our work above, we see
that the reduced row echelon form of [ L ] has a leading one in the first column only, and so
a basis for Range (L) is  
 1/3 
 
 1/3  .
 
 
1/3
 

Example 35.8. Find a basis for Ker (L) and Range (L) where L is the linear transformation
satisfying
L(x1 , x2 , x3 ) = (x1 + x2 , x2 + x3 ).
Solution. We have
" #
h i 1 1 0
[L] = L(~e1 ) L(~e2 ) L(~e3 ) = .
0 1 1

Carrying [ L ] to reduced row echelon form gives


" # " #
1 1 0 R1 −R2 1 0 −1
0 1 1 −→ 0 1 1

from which we see a solution to L(~x) = [ L ]~x = ~0 is


 
1
~x = t  −1  , t ∈ R
 

1
and so  

 1 
 −1 
 
 
1
 

224
is a basis for Ker (L). As the reduced row echelon form of [ L ] has leading ones in the first
two columns, a basis for Range (L) is
(" # " #)
1 1
, .
0 1

In Example 35.8,
h 1 inote that geometrically, Ker (L) is a line through the origin with direction
vector d~ = −1 , which is a 1−dimensional subspace of R3 and that Range (L) = R2 .
1
Figure 58 gives a more general geometric interpretation of the domain and range of a linear
transformation from Rn to Rm .

Figure 58: Visualizing the kernel and the range of a linear transformation.

One-to-One and Onto Linear Transformations


For a linear transformation L : Rn → Rm , we now know how to determine if an element
of the codomain is the image of some element in the domain, that is, if the element of the
codomain belongs to the range of L. We now ask whether there is an element in the codomain that is the image of more than one element from the domain.
This motivates the definition of a one-to-one transformation.

Definition 35.9. Let L : Rn → Rm be a (linear) transformation.54 We say that L is


one-to-one (or injective) if L(~x1 ) = L(~x2 ) implies that ~x1 = ~x2 for any ~x1 , ~x2 ∈ Rn .

Note that an equivalent definition is that L is one-to-one if ~x1 6= ~x2 implies L(~x1 ) 6= L(~x2 ).
Thus a one-to-one transformation cannot send distinct elements of the domain to the same
element in the range, and it follows that each element of the range is the image of at most
one element in the domain. This is illustrated in Figure 59.
54
Again, this definition holds for any function from Rn to Rm . In fact, if n = m = 1, then we have a
function from R to R, and in this case the definition amounts to the horizontal line test often seen in a
calculus course.

225
(a) An example of a one-to-one transforma- (b) An example of a transformation (or func-
tion (or function). tion) that is not one-to-one. Note that ~y1 is
the image of both ~x1 and ~x2 , but ~x1 6= ~x2 .

Figure 59: For a one-to-one transformation, every element of the codomain is the image of
at most one element from the domain.

Showing that a transformation is one-to-one can be challenging. However, if our transfor-


mation is linear, the work is simplified.
Theorem 35.10. If L : Rn → Rm is a linear transformation, then L is one-to-one if and
only if Ker (L) = {~0}.
Proof. Assume first that L is one-to-one. Let ~x ∈ Ker (L). Then L(~x) = ~0. Since L is linear,
L(~0) = ~0 so we have that L(~x) = L(~0). Because L is one-to-one, we have that ~x = ~0. Hence
Ker (L) ⊆ {~0} and since clearly {~0} ⊆ Ker (L), we have that Ker (L) = {~0}.

Assume now that Ker (L) = {~0}. If ~x1 , ~x2 ∈ Rn are such that L(~x1 ) = L(~x2 ), then using the
linearity of L, we have
~0 = L(~x1 ) − L(~x2 ) = L(~x1 − ~x2 )
and so ~x1 − ~x2 ∈ Ker (L). Since Ker (L) = {~0}, we see that ~x1 − ~x2 = ~0, that is, ~x1 = ~x2 and
so L is one-to-one.
Given that the kernel of a linear transformation from Rn to Rm is simply the nullspace of
the standard matrix, we can actually use the rank of the standard matrix to determine if a
linear transformation is one-to-one.
Theorem 35.11. Let L : Rn → Rm be a linear transformation. Then L is one-to-one if and
only if rank ([ L ]) = n.
Proof. Since L : Rn → Rm is a linear transformation, [ L ] ∈ Mm×n (R). Then

L is one-to-one ⇐⇒ Ker (L) = {~0} by Theorem 35.10


⇐⇒ [ L ]~x = ~0 has only the trivial solution
⇐⇒ rank ([ L ]) = n by the System-Rank Theorem (2).

226
Example 35.12. Consider the linear transformations L : R2 → R3 and M : R2 → R2
defined by

L(x1 , x2 ) = (x1 , x2 − x1 , x2 )
M (x1 , x2 ) = (x1 + x2 , 2x1 + 2x2 )

Determine which of L and M are one-to-one.

Solution. Note that in the case of L, n = 2. The standard matrix for L is


   
1 0 1 0
[ L ] =  −1 1  −→  0 1 
   

0 1 0 0

and we see that rank ([ L ]) = 2 = n and thus L is one-to-one. In the case of M , n = 2 and
the standard matrix for M is
" # " #
1 1 1 1
[M ] = −→
2 2 0 0

and thus rank ([ M ]) = 1 < 2 = n so M is not one-to-one.
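Theorem 35.11 is easy to apply numerically; a sketch assuming NumPy (for large or ill-conditioned matrices the rank computed this way is only as reliable as floating point allows):

import numpy as np

L = np.array([[1, 0],
              [-1, 1],
              [0, 1]])
M = np.array([[1, 1],
              [2, 2]])

# L is one-to-one exactly when rank([L]) equals the dimension n of the domain
print(np.linalg.matrix_rank(L))   # 2 = n, so L is one-to-one
print(np.linalg.matrix_rank(M))   # 1 < 2 = n, so M is not one-to-one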


In the previous example, note that M (~x) = ~0 gives
" #
−1
~x = t , t ∈ R.
1

Thus, for example, we see that

M (−1, 1) = (0, 0) = M (−3, 3)

but " # " #


−1 −3
6= .
1 3

227
Lecture 36
Recall that a linear transformation L : Rn → Rm is one-to-one if L(~x1 ) = L(~x2 ) implies
that ~x1 = ~x2 for every ~x1 , ~x2 ∈ Rn . Recall also that this means that every element in the
codomain of L is the image of at most one element in the domain, and that knowing the rank
of [ L ] allows us to verify if L is one-to-one. We now look for a condition that guarantees
that every element in the codomain is the image of at least one element from the domain.
Definition 36.1. Let L : Rn → Rm be a (linear) transformation. L is called onto (or
surjective) if for every ~y ∈ Rm there exists an ~x ∈ Rn such that L(~x) = ~y .
It is clear that Range (L) ⊆ Rm . It follows that if L : Rn → Rm is onto, then Range (L) = Rm .
Figure 60 gives an illustration of an onto transformation.

(a) An example of an onto transformation. (b) An example of a transformation that is


not onto. Note that neither ~y2 nor ~y5 is the image of any element in the domain.

Figure 60: For an onto transformation, every element of the codomain is the image of at
least one element from the domain.

The next theorem shows that we can use the rank of the standard matrix of a linear trans-
formation to determine if the linear transformation is onto.
Theorem 36.2. Let L : Rn → Rm be a linear transformation. Then L is onto if and only if
rank ([ L ]) = m.
Proof. Since L : Rn → Rm is a linear transformation, [ L ] ∈ Mm×n (R). Then
L is onto ⇐⇒ for every ~y ∈ Rm there exists a ~x ∈ Rn such that L(~x) = ~y
⇐⇒ [ L ]~x = ~y is consistent for every ~y ∈ Rm
⇐⇒ rank ([ L ]) = m by the System-Rank Theorem (3).
Example 36.3. Let L : R3 → R2 and M : R2 → R3 be linear transformations defined by
L(x1 , x2 , x3 ) = (x1 + x2 − x3 , x2 + x3 )
M (x1 , x2 ) = (x1 , x2 , 0)
Determine which of L and M are onto.

228
Solution. In the case of L, m = 2. The standard matrix for L is
" # " #
1 1 −1 1 0 −2
[L] = −→
0 1 1 0 1 1

and we see rank ([ L ]) = 2 = m, and thus L is onto. In the case of M , m = 3 and the
standard matrix for M is  
1 0
[M ] =  0 1 
 

0 0
and thus rank ([ M ]) = 2 < 3 = m. Hence M is not onto.

Example 36.4. Let {~x1 , . . . , ~xk } ⊆ Rn and let L : Rn → Rm be a linear transformation.


Prove that:

(1) If {~x1 , . . . , ~xk } is linearly independent and L is one-to-one, then {L(~x1 ), . . . , L(~xk )} is
linearly independent.

(2) If Span {~x1 , . . . , ~xk } = Rn and L is onto, then Span {L(~x1 ), . . . , L(~xk )} = Rm .

Proof.

(1) Assume L is one-to-one and that {~x1 , . . . , ~xk } is linearly independent. For scalars
c1 , . . . , ck ∈ R consider
c1 L(~x1 ) + · · · + ck L(~xk ) = ~0.
We must show that c1 = · · · = ck = 0. Since L is linear, we have

L(c1~x1 + · · · + ck ~xk ) = ~0

and thus, c1~x1 +· · ·+ck ~xk ∈ Ker (L). Since L is one-to-one, Ker (L) = {~0} by Theorem
35.10. Hence
c1~x1 + · · · + ck ~xk = ~0
and since {~x1 , . . . , ~xk } is linearly independent, c1 = · · · = ck = 0. Hence {L(~x1 ), · · · , L(~xk )}
is linearly independent.

(2) Assume L is onto and that Span {~x1 , . . . , ~xk } = Rn . Let ~y ∈ Rm . We must show that
~y can be expressed as a linear combination of L(~x1 ), . . . , L(~xk ). Since L is onto, there
exists an ~x ∈ Rn so that L(~x) = ~y . As Span {~x1 , . . . , ~xk } = Rn , there are c1 , . . . , ck ∈ R
so that
~x = c1~x1 + · · · + ck ~xk .
Then since L is linear

~y = L(~x) = L(c1~x1 + · · · + ck ~xk ) = c1 L(~x1 ) + · · · + ck L(~xk ),

229
and we see that ~y ∈ Span {L(~x1 ), . . . , L(~xk )}. This shows

Rm ⊆ Span {L(~x1 ), . . . , L(~xk )}

and equality follows as Rm is closed under linear combinations.

Given a linear transformation, we are able to determine if it is one-to-one or onto by simply


looking at the rank of the standard matrix. It is natural then to consider linear transforma-
tions that are one-to-one and onto. Such a linear transformation L : Rn → Rm requires that
n = rank ([ L ]) = m, that is, m = n. Thus only linear operators L : Rn → Rn can be both
one-to-one and onto.

Definition 36.5. A (linear) transformation L : Rn → Rn is a one-to-one correspondence


(or bijective) if it is both one-to-one and onto.

If a linear transformation L : Rn → Rn is a one-to-one correspondence, then it is both


one-to-one and onto. Thus, every element in the codomain is the image of at most one
element from the domain and at least one element from the domain. Hence for a one-to-one
correspondence, every element in the codomain is the image of exactly one element in the
domain. This is illustrated in Figure 61.

Figure 61: An example of a one-to-one correspondence.

Theorem 36.6. Let L : Rn → Rn be a linear transformation. Then L is a one-to-one


correspondence if and only if L is invertible.

Proof. Since L : Rn → Rn is a linear transformation, [ L ] ∈ Mn×n (R). Then

L is a one-to-one correspondence ⇐⇒ L is both one-to-one and onto


⇐⇒ rank ([ L ]) = n (by Theorems 35.11 and 36.2)
⇐⇒ [ L ] is invertible (by the Invertible Matrix Theorem)
⇐⇒ L is invertible (by Theorem 34.5).

230
Determinants, Adjugates and Matrix Inverses
We return now to studying matrices. Previously, we used the Matrix Inversion Algorithm
to both decide if an n × n matrix A was invertible and to compute A−1 if A was invertible.
Now we study a number associated to an n × n matrix A, called the determinant and we
will see how the determinant is related to the invertibility. We begin with a 2 × 2 matrix.

Definition 36.7. Let " #


a b
A= ∈ M2×2 (R).
c d
The determinant of A is
a b
det A = = ad − bc
c d
and the adjugate of A is " #
d −b
adj A = .
−c a

Example 36.8. Consider " #


1 2
A= .
3 4
Then
det A = 1(4) − 2(3) = 4 − 6 = −2
and " #
4 −2
adj A =
−3 1
Also,
" #" # " # " #
1 2 4 −2 −2 0 1 0
A(adj A) = = = −2 = (det A)I
3 4 −3 1 0 −2 0 1
" #" # " # " #
4 −2 1 2 −2 0 1 0
(adj A)A = = = −2 = (det A)I
−3 1 3 4 0 −2 0 1
From this we see " # " #! " #
1 2 1 4 −2 1 0
− =
3 4 2 −3 1 0 1
so " #
−2 1
A−1 =
3/2 −1/2
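A short check of the 2 × 2 formulas in code (a sketch assuming NumPy; the determinant and adjugate are spelled out explicitly rather than taken from a library so that they mirror Definition 36.7):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
det_A = a * d - b * c                 # -2
adj_A = np.array([[d, -b],
                  [-c, a]])

print(A @ adj_A)                                       # (det A) I = [[-2, 0], [0, -2]]
print(np.allclose(np.linalg.inv(A), adj_A / det_A))    # True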

231
Theorem 36.9. Let A ∈ M2×2 (R). Then
A(adj A) = (det A)I = (adj A)A
Moreover, A is invertible if and only if det A 6= 0 and in this case
1
A−1 = adj A
det A
Proof. Let " #
a b
A= ∈ M2×2 (R)
c d
Then " #
d −b
det A = ad − bc and adj A =
−c a
Now " #" # " #
a b d −b ad − bc 0
A(adj A) = = = (det A)I
c d−c a 0 ad − bc
" #" # " #
d −b a b ad − bc 0
(adj A)A = = = (det A)I
−c a c d 0 ad − bc
Assume then that det A 6= 0. From
A(adj A) = (det A)I = (adj A)A
we obtain    
1 1
A adj A = I = adj A A
det A det A
so
1
A−1 = adj A.
det A
Thus det A 6= 0 implies that A is invertible and gives our formula for A−1 . We now show that if A is
invertible, then det A 6= 0. Assume for a contradiction that det A = 0. Since A is invertible,
A 6= 0, so at least one of a, b, c, d is nonzero. Since
A(adj A) = (det A)I = 0I = 0,
we have " # " # " # " #
d 0 −b 0
A = and A = .
−c 0 a 0
Since not all of a, b, c, d are zero, we have that either
" # " # " # " #
d 0 −b 0
6= or 6=
−c 0 a 0
from which we see that the homogeneous system A~x = ~0 has a nontrivial solution, so A is
not invertible by the Invertible Matrix Theorem. This is a contradiction, so our assumption
that det A = 0 was incorrect, and we must have det A 6= 0.

232
Lecture 37
We now turn our attention to computing the determinant of an n × n matrix. We will see
that the definition of the determinant of an n × n matrix is recursive - to compute such
a determinant, we will compute n determinants of size (n − 1) × (n − 1). This can be
quite tedious by hand, so we will also begin to explore how elementary row (and column)
operations can greatly reduce our work.
Definition 37.1. Let A ∈ Mn×n (R) and let A(i, j) be the (n − 1) × (n − 1) matrix obtained
from A by deleting the ith row and jth column of A. The (i, j)-cofactor of A, denoted by
Cij , is
Cij = (−1)i+j det A(i, j).
Example 37.2. Let  
1 −2 3
A= 1 0 4 
 

4 1 1
then the (3, 2)-cofactor of A is

1 3
C32 = (−1)3+2 det A(3, 2) = (−1)5 = (−1)(4 − 3) = −1
1 4

and the (2, 2)-cofactor of A is

1 3
C22 = (−1)2+2 det A(2, 2) = (−1)4 = 1(1 − 12) = −11.
4 1

Definition 37.3. Let A ∈ Mn×n (R). For any i = 1, . . . , n, we define the determinant of A
as
det A = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin
which we refer to as a cofactor expansion of A along the ith row of A. Equivalently, for any
j = 1, . . . , n,
det A = a1j C1j + a2j C2j + · · · + anj Cnj
which we refer to as a cofactor expansion of A along the jth column of A.

Note that we can do a cofactor expansion along any row or column we choose. This is
illustrated in the next example.
Example 37.4. Compute det A where
 
1 2 −3
A =  4 −5 6 
 

−7 8 9

233
Solution. Doing a cofactor expansion along the first row gives

det A = \begin{vmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{vmatrix} = 1\begin{vmatrix} -5 & 6 \\ 8 & 9 \end{vmatrix} - 2\begin{vmatrix} 4 & 6 \\ -7 & 9 \end{vmatrix} - 3\begin{vmatrix} 4 & -5 \\ -7 & 8 \end{vmatrix}
= 1(−45 − 48) − 2(36 + 42) − 3(32 − 35)
= 1(−93) − 2(78) − 3(−3)
= −93 − 156 + 9
= −240

Alternatively, a cofactor expansion along the second column leads to

det A = \begin{vmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{vmatrix} = -2\begin{vmatrix} 4 & 6 \\ -7 & 9 \end{vmatrix} - 5\begin{vmatrix} 1 & -3 \\ -7 & 9 \end{vmatrix} - 8\begin{vmatrix} 1 & -3 \\ 4 & 6 \end{vmatrix}
= −2(36 + 42) − 5(9 − 21) − 8(6 + 12)
= −2(78) − 5(−12) − 8(−18)
= −240 (as before)
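Definition 37.3 translates directly into a short recursive function; the sketch below (plain Python, our own helper name det) expands along the first row, so it is exponential-time and meant only to illustrate the definition, not to compute large determinants efficiently:

def det(A):
    # determinant by cofactor expansion along the first row (Definition 37.3)
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]   # delete row 1 and column j+1
        total += (-1) ** j * A[0][j] * det(minor)           # (-1)**j is the cofactor sign
    return total

print(det([[1, 2, -3], [4, -5, 6], [-7, 8, 9]]))   # -240, as in Example 37.4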

Example 37.5. Find det B if  


1 0 −2
B= 0 3 4 
 

5 6 −7

Solution. Expanding along the first column,

1 0 −2
3 4 0 −2 0 −2
det B = 0 3 4 = 1(−1)1+1 + 0(−1)2+1 + 5(−1)3+1
6 −7 6 −7 3 4
5 6 −7
= 1(−21 − 24) + 0 + 5(0 + 6)
= −45 + 30
= −15

234
Example 37.6. Find det A if
 
1 2 −1 3
 1 2 0 4 
A=
 

 0 0 0 3 
−1 1 2 1

Solution. Performing a cofactor expansion along the third row, we have

1 2 −1 3
1 2 −1
1 2 0 4
det A = = −3 1 2 0
0 0 0 3
−1 1 2
−1 1 2 1

To evaluate the determinant of the 3 × 3 matrix, we can do a cofactor expansion along the
third column. This gives
!
1 2 1 2
det A = −3 −1 +2
−1 1 1 2
= −3(−1(1 + 2) + 2(2 − 2))
= −3(−3 + 0)
=9

Definition 37.7. Let A = [aij ] ∈ Mn×n (R)

1. Cij = (−1)i+j det A(i, j) is the (i, j)-cofactor of A

2. The cofactor matrix of A is

cof A = [Cij ] ∈ Mn×n (R)

3. The adjugate of A is
adj A = [Cij ]T ∈ Mn×n (R)

Example 37.8. Find adj A if  


1 2 3
A= 1 1 2 
 

3 4 5

235
Solution.
adj A = \begin{bmatrix} \begin{vmatrix} 1 & 2 \\ 4 & 5 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 3 & 5 \end{vmatrix} & \begin{vmatrix} 1 & 1 \\ 3 & 4 \end{vmatrix} \\ -\begin{vmatrix} 2 & 3 \\ 4 & 5 \end{vmatrix} & \begin{vmatrix} 1 & 3 \\ 3 & 5 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} \\ \begin{vmatrix} 2 & 3 \\ 1 & 2 \end{vmatrix} & -\begin{vmatrix} 1 & 3 \\ 1 & 2 \end{vmatrix} & \begin{vmatrix} 1 & 2 \\ 1 & 1 \end{vmatrix} \end{bmatrix}^T = \begin{bmatrix} -3 & 1 & 1 \\ 2 & -4 & 2 \\ 1 & 1 & -1 \end{bmatrix}^T = \begin{bmatrix} -3 & 2 & 1 \\ 1 & -4 & 1 \\ 1 & 2 & -1 \end{bmatrix}
Note that in the previous example, we computed all of the cofactors of A. Thus we can
easily compute the determinant of A by doing a cofactor expansion along, say, the first row:

1 2 3
det A = 1 1 2 = a11 C11 + a12 C12 + a13 C13 = 1(−3) + 2(1) + 3(1) = 2.
3 4 5

Note that
    
1 2 3 −3 2 1 2 0 0
A(adj A) =  1 1 2   1 −4 1 = 0 2 0  = 2I = (det(A))I
    

3 4 5 1 2 −1 0 0 2
    
−3 2 1 1 2 3 2 0 0
(adj A)A =  1 −4 1  1 1 2 = 0 2 0  = 2I = (det(A))I
    

1 2 −1 3 4 5 0 0 2

More generally, let  


a11 a12 a13
A =  a21 a22 a23  .
 

a31 a32 a33


Then  T  
C11 C12 C13 C11 C21 C31
adj A =  C21 C22 C23  =  C12 C22 C32 
   

C31 C32 C33 C13 C23 C33


so
 
a11 C11 + a12 C12 + a13 C13 a11 C21 + a12 C22 + a13 C23 a11 C31 + a12 C32 + a13 C33
A(adj A) =  a21 C11 + a22 C12 + a23 C13 a21 C21 + a22 C22 + a23 C23 a21 C31 + a22 C32 + a23 C33 
 

a31 C11 + a32 C12 + a33 C13 a31 C21 + a32 C22 + a33 C23 a31 C31 + a32 C32 + a33 C33

236
and
 
a11 C11 + a21 C21 + a31 C31 a12 C11 + a22 C21 + a32 C31 a13 C11 + a23 C21 + a33 C31
(adj A)A =  a11 C12 + a21 C22 + a31 C32 a12 C12 + a22 C22 + a32 C32 a13 C12 + a23 C22 + a33 C32 
 

a11 C13 + a21 C23 + a31 C33 a12 C13 + a22 C23 + a32 C33 a13 C13 + a23 C23 + a33 C33

The (1, 1)−, (2, 2)− and (3, 3)− entries of A(adj A) are respectively the cofactor expansions
along the first, second and third rows of A, and thus are each equal to det A. The (1, 1)−,
(2, 2)− and (3, 3)− entries of (adj A)A are respectively the cofactor expansions along the first,
second and third columns of A, and are thus each equal to det A. The entries of A(adj A)
and (adj A)A that are not on the main diagonal look like cofactor expansions, but they are
not (they are sometimes called false determinants). These always evaluate to zero.

The following theorem generalizes Theorem 36.9 for n × n matrices. The proof is similar and
is thus omitted.

Theorem 37.9. Let A ∈ Mn×n (R). Then

A(adj A) = (det A)I = (adj A)A

Moreover, A is invertible if and only if det A 6= 0. In this case,


1
A−1 = adj A
det A
Example 37.10. Find det A, adj A and A−1 if
 
1 1 2
A= 1 1 4 
 

1 2 4

Solution. Using a cofactor expansion along the first row, we obtain

1 4 1 4 1 1
det A = 1 −1 +2
2 4 1 4 1 2
= 1(4 − 8) − 1(4 − 4) + 2(2 − 1)
= −4 + 2
= −2

237
Then
adj A = \begin{bmatrix} \begin{vmatrix} 1 & 4 \\ 2 & 4 \end{vmatrix} & -\begin{vmatrix} 1 & 4 \\ 1 & 4 \end{vmatrix} & \begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} \\ -\begin{vmatrix} 1 & 2 \\ 2 & 4 \end{vmatrix} & \begin{vmatrix} 1 & 2 \\ 1 & 4 \end{vmatrix} & -\begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} \\ \begin{vmatrix} 1 & 2 \\ 1 & 4 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 1 & 4 \end{vmatrix} & \begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} \end{bmatrix}^T = \begin{bmatrix} -4 & 0 & 1 \\ 0 & 2 & -1 \\ 2 & -2 & 0 \end{bmatrix}^T = \begin{bmatrix} -4 & 0 & 2 \\ 0 & 2 & -2 \\ 1 & -1 & 0 \end{bmatrix}

so

A^{-1} = -\frac{1}{2}\begin{bmatrix} -4 & 0 & 2 \\ 0 & 2 & -2 \\ 1 & -1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 0 & -1 \\ 0 & -1 & 1 \\ -1/2 & 1/2 & 0 \end{bmatrix}
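These computations can be reproduced exactly with a computer algebra system; a sketch assuming SymPy is available (det, adjugate and inv are existing SymPy Matrix methods):

from sympy import Matrix, eye

A = Matrix([[1, 1, 2],
            [1, 1, 4],
            [1, 2, 4]])

print(A.det())                                # -2
print(A.adjugate())                           # [[-4, 0, 2], [0, 2, -2], [1, -1, 0]]
print(A * A.adjugate() == A.det() * eye(3))   # True, as in Theorem 37.9
print(A.inv() == A.adjugate() / A.det())      # True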
Although we have developed determinants for square real matrices, determinants are also
defined for square complex matrices, and the computations are identical to those for real
matrices.

Example 37.11. Find A−1 if " #


1+j 2−j
A=
3j 4

Solution. We have

det A = (1 + j)4 − 3j(2 − j) = 4 + 4j − 6j − 3 = 1 − 2j

and

adj A = \begin{bmatrix} 4 & -2+j \\ -3j & 1+j \end{bmatrix}

so

A^{-1} = \frac{1}{1-2j}\begin{bmatrix} 4 & -2+j \\ -3j & 1+j \end{bmatrix} = \begin{bmatrix} \frac45 + \frac85 j & -\frac45 - \frac35 j \\ \frac65 - \frac35 j & -\frac15 + \frac35 j \end{bmatrix}.

238
Elementary Row/Column Operations
After computing several determinants, we see that having a row or column consisting of
mostly zeros greatly simplifies our work. We now investigate how a determinant changes after
a matrix has elementary row operations (or elementary column operations55,56 performed on
it). Our goal is to use these operations to introduce rows and/or columns with many zero
entries.
Example 37.12. Consider
" # " # " # " #
1 2 2 1 1 3 2 2
A= , B= , C= and D =
1 4 4 1 1 5 2 4

We compute the determinants of these four matrices:

det A = 2, det B = −2, det C = 2 and det D = 4

and notice that B, C and D can each be derived from A by exactly one elementary column
operation.
" # " #
1 2 −→ 2 1
A= =B and det B = − det A
1 4 C1 ↔C2 4 1
" # " #
1 2 −→ 1 3
A= = C and det C = det A
1 4 C1 +C2 →C2 1 5
" # " #
1 2 2C1 →C1 2 2
A= = D and det D = 2 det A
1 4 −→ 2 4

It appears that the determinant changes predictably under these elementary column opera-
tions (the same holds for elementary row operations).
Theorem 37.13. Let A ∈ Mn×n (R).
(1) If A has a row (or column) of zeros, then det A = 0.

(2) If B is obtained from A by swapping two distinct rows (or two distinct columns), then
det B = − det A.

(3) If B is obtained from A by adding a multiple of one row to another row (or a multiple
of one column to another column) then det B = det A.
55
Elementary column operations are the same as elementary row operations, but performed on the columns.
One may think of performing an elementary column operation on A as performing an elementary row
operation on AT .
56
When solving a linear system of equations by carrying the augmented matrix to reduced row echelon
form, you must perform elementary row operations, and not elementary column operations.

239
(4) If two distinct rows of A (or two distinct columns of A) are equal, then det A = 0.

(5) If B is obtained from A by multiplying a row (or a column) by c ∈ R, then


det B = c det A.

Note: Do not perform elementary row operations and elementary column operations at the
same time. In particular, do not add a multiple of a row to a column, or swap a row with a
column. If you need to do both types of operations, do the row operations in one step and
the column operations in another.

240
Lecture 38
We now use elementary row and column operations to simplify the taking of determinants.

Example 38.1. Find det A if 


1 2 3
A= 4 5 6 
 

7 8 10

Solution. Rather than immediately doing a cofactor expansion, we will perform elementary
row operations to A to introduce two zeros in the first column, and then do a cofactor
expansion along that column.

det A = \begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{vmatrix} \overset{R_2-4R_1,\ R_3-7R_1}{=} \begin{vmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & -6 & -11 \end{vmatrix} = 1\begin{vmatrix} -3 & -6 \\ -6 & -11 \end{vmatrix}

Of course, we could now evaluate the 2 × 2 determinant, but to include another example, we
will instead multiply the first column by a factor of −1/3 and then evaluate the simplified
determinant.

det A = \begin{vmatrix} -3 & -6 \\ -6 & -11 \end{vmatrix} \overset{-\frac13 C_1 \to C_1}{=} (-3)\begin{vmatrix} 1 & -6 \\ 2 & -11 \end{vmatrix} = (−3)(−11 + 12) = −3.

A couple of things to note here. First, we are using “=” rather than “−→” when we perform
our elementary operations on A. This is because we are really working with determinants,
and provided we are making the necessary adjustments mentioned in Theorem 37.13, we will
maintain equality. Secondly, when we performed the operation −(1/3)C1 → C1 , a factor of −3
appeared rather than a factor of −1/3. To see why this is, consider

C = \begin{bmatrix} -3 & -6 \\ -6 & -11 \end{bmatrix} \quad and \quad B = \begin{bmatrix} 1 & -6 \\ 2 & -11 \end{bmatrix}

Since

C = \begin{bmatrix} -3 & -6 \\ -6 & -11 \end{bmatrix} \xrightarrow{-\frac13 C_1 \to C_1} \begin{bmatrix} 1 & -6 \\ 2 & -11 \end{bmatrix} = B
we see that B is obtained from C by multiplying the first column of C by −1/3. Thus by
Theorem 37.13
det B = −(1/3) det C
and so
det C = −3 det B

241
which is why we have
\begin{vmatrix} -3 & -6 \\ -6 & -11 \end{vmatrix} = (-3)\begin{vmatrix} 1 & -6 \\ 2 & -11 \end{vmatrix}
We normally view this type of row or column operation as “factoring out” of that row or
column, and we omit writing this type of operation as we reduce.

Example 38.2. Let  


1 a a2
A =  1 b b2 
 

1 c c2
Show that det A = (b − a)(c − a)(c − b).

Solution. We again introduce two zeros into the first column by performing elementary row
operations on A, and then do a cofactor expansion along that column.

det A = \begin{vmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{vmatrix} \overset{R_2-R_1,\ R_3-R_1}{=} \begin{vmatrix} 1 & a & a^2 \\ 0 & b-a & b^2-a^2 \\ 0 & c-a & c^2-a^2 \end{vmatrix} = 1\begin{vmatrix} b-a & (b-a)(b+a) \\ c-a & (c-a)(c+a) \end{vmatrix}
= (b-a)(c-a)\begin{vmatrix} 1 & b+a \\ 1 & c+a \end{vmatrix}
= (b − a)(c − a)(c + a − b − a)
= (b − a)(c − a)(c − b)

Again, notice that

\begin{vmatrix} b-a & (b-a)(b+a) \\ c-a & (c-a)(c+a) \end{vmatrix} = (b-a)(c-a)\begin{vmatrix} 1 & b+a \\ 1 & c+a \end{vmatrix} \qquad (25)

results from removing a factor of b − a from the first row of the determinant on the left, and
removing a factor of c − a from the second row. These correspond to the row operations
(1/(b − a))R1 → R1 and (1/(c − a))R2 → R2 . It is natural to ask what happens if a = b or a = c since it
would appear that we are dividing by zero in these cases. However, if a = b or a = c, we see
that both sides of (25) evaluate to zero, so that we still have equality.

Example 38.3. Consider  


x x 1
A =  x 1 x .
 

1 x x
For what values of x ∈ R does A fail to be invertible?

242
Solution. A fails to be invertible exactly when det A = 0. Thus we have

det A = \begin{vmatrix} x & x & 1 \\ x & 1 & x \\ 1 & x & x \end{vmatrix} \overset{R_1-xR_3,\ R_2-xR_3}{=} \begin{vmatrix} 0 & x-x^2 & 1-x^2 \\ 0 & 1-x^2 & x-x^2 \\ 1 & x & x \end{vmatrix} = 1\begin{vmatrix} x(1-x) & (1+x)(1-x) \\ (1+x)(1-x) & x(1-x) \end{vmatrix}
= (1-x)^2\begin{vmatrix} x & 1+x \\ 1+x & x \end{vmatrix}
= (1 − x)^2 (x^2 − (1 + x)^2 ) = (1 − x)^2 (x^2 − 1 − 2x − x^2 )
= −(1 − x)^2 (1 + 2x)

so A is not invertible exactly when −(1−x)2 (1+2x) = 0, that is, when x = 1 or x = −1/2.
Example 38.4. Compute det A if
 
1 0 0 0
 2 3 0 0 
A=
 

 4 5 6 0 
7 8 9 10

Solution.

det A = \begin{vmatrix} 1 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 10 \end{vmatrix} = 1\begin{vmatrix} 3 & 0 & 0 \\ 5 & 6 & 0 \\ 8 & 9 & 10 \end{vmatrix} = 1(3)\begin{vmatrix} 6 & 0 \\ 9 & 10 \end{vmatrix} = 1(3)(6)(10) = 180

Note that in the previous example, det A is just the product of the entries on the main
diagonal57
Definition 38.5. Let A ∈ Mm×n (R). A is called upper triangular if every entry below the
main diagonal is zero. A is called lower triangular if every entry above the main diagonal is
zero.
Example 38.6. The matrices
   
\begin{bmatrix} 4 & -7 \\ 0 & 3 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 10 \\ 0 & 0 & -2 \end{bmatrix} \qquad and \qquad \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
57
Recall that for A = [aij ] ∈ Mm×n (R), the main diagonal of A consists of the entries a11 , a22 , . . . , akk
with k being the minimum of m and n.

243
are upper triangular, and the matrices

\begin{bmatrix} 3 & 0 \\ 2 & -4 \end{bmatrix}, \qquad \begin{bmatrix} 0 & 0 & 0 \\ 1 & 2 & 0 \\ -1 & 3 & 4 \end{bmatrix} \qquad and \qquad \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}

are lower triangular.


Theorem 38.7. If A = [aij ] ∈ Mn×n (R) is a triangular matrix (upper or lower triangular),
then

det(A) = a_{11} a_{22} \cdots a_{nn} = \prod_{i=1}^{n} a_{ii}.

Properties of Determinants

Theorem 38.8. If A ∈ Mn×n (R) and k ∈ R, then

det(kA) = k n det A.

Proof. If k = 0, then the result holds trivially. If k 6= 0, then kA is obtained from A by multiplying each of the n rows of A by k, so applying Theorem 37.13 (5) once per row gives det(kA) = k^n det A.
Example 38.9. Find (det A)(det B) and det(AB) where
" # " #
1 2 1 1
A= and B =
3 4 −1 2

Solution. We have

(det A)(det B) = (4 − 6)(2 − (−1)) = −2(3) = −6

and
−1 5
det(AB) = = −11 − (−5) = −6.
−1 11

Theorem 38.10. If A, B ∈ Mn×n (R), then det(AB) = (det A)(det B).


Note that Theorem 38.10 says that for n × n matrices, the determinant distributes over
matrix multiplication. Since multiplication of real numbers is commutative, we have

det(AB) = (det A)(det B) = (det B)(det A) = det(BA)

for any A, B ∈ Mn×n (R). This means that even though A and B do not commute in general,
we are guaranteed that det(AB) = det(BA).

244
Note that Theorem 38.10 extends to more than two matrices. For A1 , A2 , . . . , Ak ∈ Mn×n (R),

det(A1 A2 · · · Ak ) = (det A1 )(det A2 ) · · · (det Ak ).

In particular, if A1 = A2 = · · · = Ak = A for any positive integer k, then we obtain

det(Ak ) = (det A)k .

Theorem 38.11. Let A ∈ Mn×n (R) be invertible. Then


1
det(A−1 ) =
det A
Proof. For an invertible matrix A we have,

(det A)(det(A−1 )) = det(AA−1 ) = det I = 1

and since A invertible implies det A 6= 0, we obtain


1
det(A−1 ) = .
det A
For an invertible matrix A, we define A−k = (A−1 )k for any positive integer k and we define
A0 = I. Thus
det(A^{-k}) = det\bigl((A^{-1})^k\bigr) = \bigl(det(A^{-1})\bigr)^k = \bigl((det A)^{-1}\bigr)^k = (det A)^{-k}


and

det(A0 ) = det I = 1 = 10 = (det A)0 .

It follows that
det(Ak ) = (det A)k
for any integer k where k ≤ 0 requires that A be invertible.

Recalling Theorem 30.6, we have that for A1 , A2 , . . . , Ak ∈ Mn×n (R) invertible, the product
A1 A2 · · · Ak is invertible and

(A_1 A_2 \cdots A_k)^{-1} = A_k^{-1} \cdots A_2^{-1} A_1^{-1}.

Example 38.12. Let A1 , A2 , . . . , Ak ∈ Mn×n (R) be such that A1 A2 · · · Ak is invertible. Then

0 6= det(A1 A2 · · · Ak ) = (det A1 )(det A2 ) · · · (det Ak )

and so for i = 1, 2, . . . , k, we have that det Ai 6= 0 and thus Ai is invertible for i = 1, . . . , k.

Theorem 38.13. Let A ∈ Mn×n (R). Then det(AT ) = det(A).

245
Note that since det(AT ) = det(A) for a square matrix A, we see why we may perform column
operations on A when computing det(A) – column operations performed on A are just row
operations performed on AT .

Example 38.14. If det(A) = 3, det(B) = −2 and det(C) = 4 for A, B, C ∈ Mn×n (R), find
det(A2 B T C −1 B 2 (A−1 )2 )

Solution. We have

det(A2 B T C −1 B 2 (A−1 )2 ) = det(A2 ) det(B T ) det(C −1 ) det(B 2 ) det((A−1 )2 )


= (det A)^2 (det B) \frac{1}{\det C} (det B)^2 \frac{1}{(\det A)^2}
= \frac{(\det B)^3}{\det C}
= \frac{(−2)^3}{4} = −\frac{8}{4} = −2.
We are seeing that the determinant behaves well with matrix multiplication. However, we
are not so lucky with matrix addition.

Example 38.15. Let

A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad and \quad B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.
Then
det A + det B = 0 + 0 = 0
but
det(A + B) = det I = 1
so for A, B ∈ Mn×n (R), det(A + B) 6= det A + det B in general, that is, the determinant does
not distribute over matrix addition.

246
Lecture 39

Application: Polynomial Interpolation


During experiments, data is often observed, measured and recorded in the form (x, y) where
x is the independent (or control) variable and y is the dependent (or responding) variable.
Given a set of data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ), we seek a polynomial p(x) such that
yi = p(xi ) for each i = 1, . . . , n. We can then use the polynomial to approximate y values
corresponding to other x values. Since two distinct points determine a line and three distinct
points determine a quadratic (provided they don’t all lie on a line), given n data points we
seek a polynomial of degree n − 1.

Example 39.1. Find a cubic polynomial p(x) whose graph passes through each of the points
(−2, −5), (−1, 4), (1, 4) and (3, 60).

Solution. Let p(x) = a0 + a1 x + a2 x2 + a3 x3 for a0 , a1 , a2 , a3 ∈ R. For each data point (xi , yi ),


we evaluate the equation p(xi ) = yi .

(−2, −5) : a0 − 2a1 + 4a2 − 8a3 = −5


(−1, 4) : a0 − a1 + a2 − a3 = 4
(1, 4) : a0 + a1 + a2 + a3 = 4
(3, 60) : a0 + 3a1 + 9a2 + 27a3 = 60

Converting to matrix notation, we obtain


    
\begin{bmatrix} 1 & -2 & 4 & -8 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \\ 1 & 3 & 9 & 27 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} -5 \\ 4 \\ 4 \\ 60 \end{bmatrix}

Solving the system gives a0 = 3, a1 = −2, a2 = 1 and a3 = 2, that is, p(x) = 3−2x+x2 +2x3 .
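The same interpolation is a one-line solve once the Vandermonde matrix is set up; a sketch assuming NumPy (np.vander with increasing=True builds the rows [1, x, x^2, x^3] used above):

import numpy as np

x = np.array([-2.0, -1.0, 1.0, 3.0])
y = np.array([-5.0, 4.0, 4.0, 60.0])

V = np.vander(x, increasing=True)   # rows [1, x_i, x_i^2, x_i^3]
coeffs = np.linalg.solve(V, y)
print(coeffs)                        # [ 3. -2.  1.  2.]  ->  p(x) = 3 - 2x + x^2 + 2x^3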
More generally, given n data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ), we construct a polynomial
p(x) = a0 + a1 x + a2 x2 + · · · + an−1 xn−1 such that p(xi ) = yi for each i = 1, . . . , n. This gives
the system of equations

a0 + a1 x1 + a2 x21 + · · · + an−1 x1n−1 = y1


a0 + a1 x2 + a2 x22 + · · · + an−1 x2n−1 = y2
.. .. .. .. ..
. . . . .
2 n−1
a0 + a1 xn + a2 xn + · · · + an−1 xn = yn

247
whose matrix equation is
    
\begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \qquad (26)

For x1 , x2 , . . . , xn ∈ R and n ≥ 2, the n × n matrix


 
A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{bmatrix}

is called a Vandermonde matrix. For n = 2, the Vandermonde matrix is

A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \end{bmatrix}

with det A = x2 − x1 . For n = 3, we have

A = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \end{bmatrix}

with det A = (x3 − x1 )(x3 − x2 )(x2 − x1 ). For the n × n Vandermonde matrix,

det A = \prod_{1 \le i < j \le n} (x_j - x_i)

that is, det A is the product of the terms (xj − xi ) where j > i and i, j both lie between
1 and n inclusively. It follows that the n × n Vandermonde matrix is invertible if and only
if x1 , x2 , . . . , xn are all distinct and that in this case, Equation (26) has a unique solution.
This shows the following:

Theorem 39.2. For the n data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) where x1 , x2 , . . . , xn are
all distinct, there exists a unique polynomial

p(x) = a0 + a1 x + a2 x2 + · · · + an−1 xn−1

satisfying p(xi ) = yi for each i = 1, 2 . . . , n.

248
Example 39.3. A car manufacturing company uses a wind tunnel to test the force due to
air resistance experienced by the car windshield. The following data was collected:
Air velocity (m/s) 20 33 45
Force on windshield (N) 200 310 420
Construct a quadratic polynomial to model this data, and use it to predict the force due to
air resistance from a wind speed of 40m/s.
Solution. Let p(x) = a0 +a1 x+a2 x2 where a0 , a1 , a2 ∈ R. Using our data points (20, 200), (33, 310)
and (45, 420) we obtain the system of equations in matrix notation
    
\begin{bmatrix} 1 & 20 & 400 \\ 1 & 33 & 1089 \\ 1 & 45 & 2025 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 200 \\ 310 \\ 420 \end{bmatrix}
The determinant of the coefficient matrix is (45 − 20)(45 − 33)(33 − 20) = 25 · 12 · 13 = 3900
and the adjugate is
\begin{bmatrix} 17820 & -936 & 12 \\ -22500 & 1625 & -25 \\ 8580 & -689 & 13 \end{bmatrix}^T = \begin{bmatrix} 17820 & -22500 & 8580 \\ -936 & 1625 & -689 \\ 12 & -25 & 13 \end{bmatrix}

so

\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \frac{1}{3900}\begin{bmatrix} 17820 & -22500 & 8580 \\ -936 & 1625 & -689 \\ 12 & -25 & 13 \end{bmatrix}\begin{bmatrix} 200 \\ 310 \\ 420 \end{bmatrix} = \begin{bmatrix} 642/13 \\ 209/30 \\ 11/390 \end{bmatrix}
Thus
642 209 11 2
p(x) = + x+ x
13 30 390
When x = 40, we have
642 209 11 2 14 554
p(40) = + 40 + 40 = ≈ 373.18
13 30 390 39
When the air velocity is 40 m/s, the windshield experiences approximately 373.18N of force.

249
Determinants and Area
Let

\vec{u} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \quad and \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}

be vectors in R2 . Recall that (u1 , u2 ) 6= (u1 , u2 , 0) and (v1 , v2 ) 6= (v1 , v2 , 0), and that the parallelogram determined by (u1 , u2 ) and (v1 , v2 ) is a subset of R2 while the parallelogram determined by (u1 , u2 , 0) and (v1 , v2 , 0) is a subset of R3 . However, these two parallelograms do have the same area. See Figure 62.

Figure 62: A parallelogram determined by ~u, ~v ∈ R2 on the left, and its “realization” lying
in the x1 x2 −plane of R3 on the right.

Thus the area, A, of the parallelogram ~u, ~v ∈ R2 determine can be computed58 as


     
A = \left\| \begin{bmatrix} u_1 \\ u_2 \\ 0 \end{bmatrix} \times \begin{bmatrix} v_1 \\ v_2 \\ 0 \end{bmatrix} \right\| = \left\| \begin{bmatrix} 0 \\ 0 \\ u_1 v_2 - v_1 u_2 \end{bmatrix} \right\| = \sqrt{(u_1 v_2 - v_1 u_2)^2}
= |u_1 v_2 − v_1 u_2 | = \left| \det \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \end{bmatrix} \right| = \left| \det[\, \vec{u}\ \ \vec{v}\,] \right|.
58
We need to be careful here: we explicitly write “det” when indicating a determinant since we are using
“| · · · |” to indicate absolute value and not the determinant. Mathematics often uses the same notation to
mean different things in different settings, and so we must be careful in cases such as this when such notation
could be interpreted in several ways.

250
Example 39.4. The area of the parallelogram determined by the vectors
" # " #
1 3
~u = and ~v =
2 4

is

A = \left| \det \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \right| = |4 − 6| = | − 2| = 2.
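The area formula, and the scaling by |det [ L ]| derived just below, are easy to test numerically (a sketch assuming NumPy; the matrix Lmat is our own example, not one from the notes):

import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
print(abs(np.linalg.det(np.column_stack((u, v)))))   # 2.0, as in Example 39.4

# applying a linear map L scales the area by |det [L]|
Lmat = np.array([[1.0, 2.0],
                 [-1.0, 1.0]])                       # det = 3
print(abs(np.linalg.det(np.column_stack((Lmat @ u, Lmat @ v)))))   # 6.0 = 3 * 2.0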

Now, consider ~u, ~v ∈ R2 and a linear transformation L : R2 → R2 with standard matrix [ L ].


The area of the parallelogram determined by L(~u) and L(~v ) is

A = \left| \det[\, L(\vec{u})\ \ L(\vec{v})\,] \right|
= \left| \det[\, [\,L\,]\vec{u}\ \ [\,L\,]\vec{v}\,] \right|
= \left| \det\bigl([\,L\,][\, \vec{u}\ \ \vec{v}\,]\bigr) \right|
= \left| \det[\,L\,] \right|\,\left| \det[\, \vec{u}\ \ \vec{v}\,] \right|.

Example 39.5. Let ~u, ~v ∈ R2 determine a parallelogram with area equal to 4. Let
L : R2 → R2 be a linear transformation with standard matrix
" #
1 2
[L] = .
−1 1

Then the area, A, of the parallelogram determined by L(~u) and L(~v ) is

A = \left| \det[\, L(\vec{u})\ \ L(\vec{v})\,] \right| = \left| \det[\,L\,] \right|\,\left| \det[\, \vec{u}\ \ \vec{v}\,] \right| = |1 − (−2)|\,(4) = 3(4) = 12

An illustration of this is shown in Figure 63.

Figure 63: The parallelogram determined by ~u and ~v on the left and its image under the
linear transformation L on the right.

251
Although we have focused on parallelograms, our work generalizes to any shape in R2 . For
example, consider a circle of radius r = 1 centred at the origin in R2 . The area of this circle
is Acircle = πr2 = π(1)2 = π. If we consider a stretch in the x1 −direction by a factor of 2,
then we are considering the linear transformation L : R2 → R2 with standard matrix
" #
2 0
[L] = .
0 1

The image of our circle under L is called an ellipse, and this ellipse has area

Aellipse = |det[ L ]| Acircle = |2|π = 2π.

Figure 64 depicts our circle and the resulting ellipse, and shows that our result for the area
of the ellipse is consistent with the actual formula for the area of an ellipse.

Figure 64: A circle of radius 1 centred at the origin on the left, and its image under the
linear transformation L on the right.

252
Lecture 40

Determinants and Volume


Let

\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \quad and \quad \vec{w} = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}

be three vectors in R3 . From before, we know the volume, V , of the parallelepiped they determine is given by V = |~u · (~v × ~w)|. Working with the components of ~u, ~v and ~w gives

V = |\vec{u} \cdot (\vec{v} \times \vec{w})|
= \left| \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} \cdot \left( \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \times \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} \right) \right|
= \left| \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} \cdot \begin{bmatrix} v_2 w_3 - w_2 v_3 \\ -(v_1 w_3 - w_1 v_3) \\ v_1 w_2 - w_1 v_2 \end{bmatrix} \right|
= \left| \det \begin{bmatrix} u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \\ u_3 & v_3 & w_3 \end{bmatrix} \right|
= \left| \det[\, \vec{u}\ \ \vec{v}\ \ \vec{w}\,] \right|.

From this, it follows that for three vectors ~u, ~v , ~w ∈ R3 and any linear transformation L : R3 → R3 , the volume of the parallelepiped determined by L(~u), L(~v ) and L(~w) is

V = \left| \det[\, L(\vec{u})\ \ L(\vec{v})\ \ L(\vec{w})\,] \right| = \left| \det[\,L\,] \right|\,\left| \det[\, \vec{u}\ \ \vec{v}\ \ \vec{w}\,] \right|
with the derivation being the same as the two dimensional case.

As in R2 , our work generalizes to any shape in R3 . For example, consider a sphere of radius
r = 1 centred at the origin in R3 . The volume of this sphere is Vsphere = 34 πr3 = 43 π(1)3 = 43 π.
If we consider a stretch in the x2 −direction by a factor of 2 and a stretch in the x3 −direction
by a factor of 3, then we have the linear transformation L : R3 → R3 with standard matrix
 
1 0 0
[L] =  0 2 0 .
 

0 0 3

253
The image of our sphere under L is an ellipsoid, and this ellipsoid has volume

Vellipsoid = |det[ L ]| Vsphere = |6| · (4/3)π = 8π.
Figure 65 illustrates our sphere and the resulting ellipsoid, and shows that our result for the
volume of the ellipsoid is consistent with the actual formula for the volume of an ellipsoid.

Figure 65: A sphere of radius 1 centred at the origin on the left, and its image under the
linear transformation L on the right.

Eigenvalues and Eigenvectors


For A ∈ Mm×n (R), ~x ∈ Rn and ~b ∈ Rm , we have seen that the equation A~x = ~b has
been central to our study of linear algebra. We now focus on a particular instance of this
equation where ~b is a scalar multiple of ~x. This of course requires ~b ∈ Rn and it follows that
A ∈ Mn×n (R). Thus, given A ∈ Mn×n (R), we seek a nonzero59 vector ~x and a scalar λ such
that
A~x = λ~x
Definition 40.1. For A ∈ Mn×n (R), a scalar λ is an eigenvalue of A if A~x = λ~x for some
nonzero vector ~x. The vector ~x is then called an eigenvector of A corresponding to λ.
Example 40.2. If " # " #
−3/5 4/5 1
A= and ~x =
4/5 3/5 2
59
We insist ~x 6= ~0 as otherwise the above equation becomes A~0 = λ~0 which trivially holds for any scalar λ.

254
then " #" # " #
−3/5 4/5 1 1
A~x = = = 1~x,
4/5 3/5 2 2
and so λ = 1 is an eigenvalue of A and ~x = [ 12 ] is a corresponding eigenvector.
Example 40.3. Let L : R2 → R2 be a reflection in the x2 −axis. Then we know L is a linear
transformation and " #
−1 0
A = [L] =
0 1
is the standard matrix of L. Thinking geometrically, we see that the reflection of ~e1 in
the x2 −axis is −~e1 , that is, A~e1 = −~e1 = (−1)~e1 so λ = −1 is an eigenvalue of A with
corresponding eigenvector ~e1 . Similarly, we see A~e2 = ~e2 = 1~e2 , so λ = 1 is an eigenvalue
of A with corresponding eigenvector ~e2 . In fact, any nonzero vector lying on the x1 −axis is
an eigenvector corresponding to λ = −1 and any nonzero vector lying on the x2 −axis is an
eigenvector corresponding to λ = 1.
How do we find eigenvalues and eigenvectors for A ∈ Mn×n (R)? For a nonzero vector ~x and
scalar λ, we have that λ is an eigenvalue of A with corresponding eigenvector ~x if and only
if
A~x = λ~x ⇐⇒ A~x − λ~x = ~0 ⇐⇒ A~x − λI~x = ~0 ⇐⇒ (A − λI)~x = ~0.
Thus we will consider the homogeneous system (A − λI)~x = ~0. Since ~x 6= ~0, we require
nontrivial solutions to this system, and since A − λI is an n × n matrix, the Invertible
Matrix Theorem gives that A − λI cannot be invertible, and so det(A − λI) = 0. This
verifies the following theorem.
Theorem 40.4. Let A ∈ Mn×n (R). A number λ is an eigenvalue of A if and only if λ satisfies
the equation
det(A − λI) = 0.
If λ is an eigenvalue of A, then the nonzero solutions of the homogeneous system of equations
(A − λI)~x = ~0
are exactly the eigenvectors of A corresponding to λ.
Theorem 40.4 indicates that to find the eigenvalues and corresponding eigenvectors of an
n × n matrix A, we first find all scalars λ so that det(A − λI) = 0 which will be our
eigenvalues. Then for each eigenvalue λ of A, we find the nullspace of A − λI by solving the
homogeneous system (A − λI)~x = ~0. The nonzero vectors of Null (A − λI) will be the set of
eigenvectors of A corresponding to λ. We make the following definition.
Definition 40.5. Let A ∈ Mn×n (R). The characteristic polynomial of A is
CA (λ) = det(A − λI).
We note that λ is an eigenvalue of A if and only if CA (λ) = 0. As we will see, CA (λ) is
indeed a polynomial. Since A ∈ Mn×n (R), CA (λ) will have real coefficients, but may have
non–real roots.

255
Lecture 41
Example 41.1. Find the eigenvalues and all corresponding eigenvectors for the matrix
" #
1 2
A= .
−1 4

Solution. We first compute the characteristic polynomial of A.

1−λ 2
CA (λ) = det(A − λI) = = (1 − λ)(4 − λ) − 2(−1)
−1 4 − λ
= 4 − 5λ + λ2 + 2 = λ2 − 5λ + 6 = (λ − 2)(λ − 3).

Now λ is an eigenvalue of A if and only if CA (λ) = 0, that is, if and only if (λ − 2)(λ − 3) = 0.
Thus λ1 = 2 and λ2 = 3 are the eigenvalues of A. To find the eigenvectors of A corresponding
to λ1 = 2, we solve the homogeneous system (A − 2I)~x = ~0.
" # " # " #
−1 2 −→ −1 2 −R1 1 −2
A − 2I =
−1 2 R2 −R1 0 0 −→ 0 0
so " # " #
2t 2
~x = =t , t ∈ R.
t 1
Thus the eigenvectors of A corresponding to λ1 = 2 are
" #
2
t , t ∈ R, t 6= 0.
1

To find the eigenvectors of A corresponding to λ2 = 3, we solve the homogeneous system


(A − 3I)~x = ~0.
" # " # " #
−2 2 −→ −2 2 − 12 R1 1 −1
A − 3I =
−1 1 R2 − 12 R1 0 0 −→ 0 0
so " # " #
s 1
~x = =s , s∈R
s 1
and thus the eigenvectors of A corresponding to λ2 = 3 are
" #
1
s , s ∈ R, s 6= 0.
1

256
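Numerically, eigenvalues and eigenvectors are obtained in one call; a sketch assuming NumPy (np.linalg.eig returns unit-length eigenvectors as the columns of the second output, in no particular order):

import numpy as np

A = np.array([[1.0, 2.0],
              [-1.0, 4.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # 2. and 3., as in Example 41.1
print(eigenvectors)   # columns are scalar multiples of [2, 1] and [1, 1]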
Definition 41.2. Let λ be an eigenvalue of A ∈ Mn×n (R). The set containing all of the eigen-
vectors of A corresponding to λ together with the zero vector of Rn is called the eigenspace
of A corresponding to λ, and is denoted by Eλ (A). It follows that
Eλ (A) = Null (A − λI)
and is hence a subspace of Rn .
Thus we seek a basis for each eigenspace Eλ (A) of A. Once we have a basis for Eλ (A), we can
construct all eigenvectors of A corresponding to λ by taking all non-zero linear combinations
of these basis vectors.

From our previous example, the eigenvalues of A were λ1 = 2 and λ2 = 3. Hence


(" #) (" #)
2 1
and
1 1
are bases for Eλ1 (A) and Eλ2 (A) respectively. Note that each eigenspace is a line through
the origin in R2 .

Note that we can verify our work is correct by ensuring that our basis vectors for each
eigenspace satisfy the equation A~x = λ~x for the corresponding eigenvalue λ:
" # " #" # " # " #
2 1 2 2 4 2
A = = =2
1 −1 4 1 2 1
" # " #" # " # " #
1 1 2 1 3 1
A = = =3 .
1 −1 4 1 3 1
Example 41.3. Find the eigenvalues and a basis for each eigenspace of A where
 
0 1 1
A =  1 0 1 .
 

1 1 0
Solution. We begin by computing the characteristic polynomial of A, using elementary row
operations to aid in our computations.
CA (λ) = det(A − λI) = \begin{vmatrix} -\lambda & 1 & 1 \\ 1 & -\lambda & 1 \\ 1 & 1 & -\lambda \end{vmatrix} \overset{R_1+\lambda R_2,\ R_3-R_2}{=} \begin{vmatrix} 0 & 1-\lambda^2 & 1+\lambda \\ 1 & -\lambda & 1 \\ 0 & 1+\lambda & -\lambda-1 \end{vmatrix}

and performing a cofactor expansion along the first column and factoring entries as needed leads to

= (-1)\begin{vmatrix} (1+\lambda)(1-\lambda) & 1+\lambda \\ 1+\lambda & -(1+\lambda) \end{vmatrix} = (-1)(1+\lambda)^2\begin{vmatrix} 1-\lambda & 1 \\ 1 & -1 \end{vmatrix}
= (−1)(λ + 1)^2 ((1 − λ)(−1) − 1) = (−1)(λ + 1)^2 (λ − 2).

Hence the eigenvalues of A are λ1 = −1 and λ2 = 2. For λ1 = −1, we solve (A + I)~x = ~0.
   
1 1 1 −→ 1 1 1
A + I =  1 1 1  R2 −R1  0 0 0 
   

1 1 1 R3 −R1 0 0 0
so      
−s − t −1 −1
~x =  s  = s 1  + t 0 , s, t ∈ R.
     

t 0 1
Hence a basis for Eλ1 (A) is
   
 −1
 −1  
B1 =  1  ,  0  .
   
 
0 1
 

For λ2 = 2, we solve (A − 2I)~x = ~0.


     
A − 2I = \begin{bmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{bmatrix} \xrightarrow{R_1+2R_2,\ R_3-R_2} \begin{bmatrix} 0 & -3 & 3 \\ 1 & -2 & 1 \\ 0 & 3 & -3 \end{bmatrix} \xrightarrow{R_3+R_1,\ -\frac13 R_1} \begin{bmatrix} 0 & 1 & -1 \\ 1 & -2 & 1 \\ 0 & 0 & 0 \end{bmatrix} \xrightarrow{R_2+2R_1,\ R_1 \leftrightarrow R_2} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}
so    
t 1
~x =  t  = t  1  , t ∈ R.
   

t 1
Hence a basis for Eλ2 (A) is  
 1 
 
B2 =  1  .
 
 
1
 

Note that in the last example, the matrix A was 3 × 3 and the characteristic polynomial of
A was of degree 3. This is true in general: for A ∈ Mn×n (R), CA (λ) will be of degree n.
Notice also in the last example that we only had two eigenvalues: λ1 = −1 (which was a
double–root of CA (λ)) and λ2 = 2 (which was a single–root of CA (λ)).

258
Definition 41.4. Let A ∈ Mn×n (R) with eigenvalue λ. The algebraic multiplicity of λ,
denoted by aλ , is the number of times λ appears as a root of CA (λ).60
In our previous example, λ1 = −1 and λ2 = 2 were the only two eigenvalues of A, and we
observed that
aλ1 = 2 and aλ2 = 1.
Also in our last example, we see that dim(Eλ1 (A)) = 2 and dim(Eλ2 (A)) = 1.
Definition 41.5. Let A ∈ Mn×n (R) with eigenvalue λ. The geometric multiplicity of λ,
denoted by gλ , is the dimension of the eigenspace Eλ (A).
Again from our previous example, we have that,
gλ1 = 2 and gλ2 = 1.
The next theorem states a relationship between the algebraic and geometric multiplicities of
an eigenvalue. The proof is omitted as it is beyond the scope of this course.
Theorem 41.6. For any A ∈ Mn×n (R) and any eigenvalue λ of A,
1 ≤ gλ ≤ aλ ≤ n.
Example 41.7. Find the eigenvalues of A and a basis for each eigenspace where
" #
1 0
A=
5 1
Solution. We have
\[
C_A(\lambda) = \det(A - \lambda I) = \begin{vmatrix} 1-\lambda & 0 \\ 5 & 1-\lambda \end{vmatrix} = (1-\lambda)^2
\]
which shows that λ1 = 1 is the only eigenvalue of A and aλ1 = 2. We solve (A − I)~x = ~0.
" # " # " #
0 0 −→ 0 0 R1 ↔R2 1 0
A−I =
5 0 1
R
5 2
1 0 −→ 0 0
so " # " #
0 0
~x = =t , t∈R
t 1
Thus (" #)
0
1
is a basis for Eλ1 (A), and we see gλ1 = 1 < 2 = aλ1 .
We see from this example that the geometric multiplicity of an eigenvalue can be less than
its algebraic multiplicity. We also notice that for a square upper or lower triangular matrix,
the eigenvalues of A are the entries on the main diagonal of A.
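One way to see this numerically is through the rank–nullity theorem: gλ = dim Null(A − λI) = n − rank(A − λI). A minimal sketch, assuming NumPy (not part of the course material):

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [5.0, 1.0]])
    lam = 1.0
    n = A.shape[0]

    # geometric multiplicity = nullity of (A - lambda I) = n - rank(A - lambda I)
    g = n - np.linalg.matrix_rank(A - lam * np.eye(n))
    print(g)   # 1, even though the algebraic multiplicity of lambda = 1 is 2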
60
We can find the algebraic multiplicities of the eigenvalues of a matrix from the factorization of its
characteristic polynomial. In Example 41.3, we saw that CA (λ) = (−1)(λ + 1)2 (λ − 2). The exponent of “2”
on the λ + 1 term means that λ1 = −1 has algebraic multiplicity 2 while the exponent of “1” on the λ − 2
means that λ2 = 2 has algebraic multiplicity 1.
Lecture 42
Given A ∈ Mn×n (R) we have seen that CA (λ) is a real polynomial of degree n. However, we
have seen before that a real polynomial can have non-real roots, and it thus follows that a
real matrix can have non-real eigenvalues.
Example 42.1. Let
\[
A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}.
\]
Find the eigenvalues of A, and for each eigenvalue, find one corresponding eigenvector.
Solution. We have

\[
C_A(\lambda) = \det(A - \lambda I) = \begin{vmatrix} 1-\lambda & -1 \\ 1 & 1-\lambda \end{vmatrix} = (1-\lambda)^2 + 1 = \lambda^2 - 2\lambda + 2.
\]

Turning to the quadratic formula61 , we find


\[
\lambda = \frac{-(-2) \pm \sqrt{(-2)^2 - 4(1)(2)}}{2(1)} = \frac{2 \pm \sqrt{-4}}{2} = \frac{2 \pm 2\sqrt{-1}}{2} = 1 \pm j
\]
so we take λ1 = 1 − j and λ2 = 1 + j as the eigenvalues of A. To find an eigenvector of A
corresponding to λ1 = 1 − j, we solve (A − (1 − j)I)~x = ~0.
" # " # " #
j −1 −jR1 1 j −→ 1 j
A − (1 − j)I =
1 j −→ 1 j R2 −R1 0 0
so " # " #
−jt −j
~x = =t , t∈C
t 1
and we have that −j
 
1 is an eigenvector of A corresponding to λ1 = 1 − j. For λ2 = 1 + j,
we solve (A − (1 + j)I)~x = ~0.
" # " # " #
−j −1 jR1 1 −j −→ 1 −j
A − (1 + j)I =
1 −j −→ 1 −j R2 −R1 0 0
so " # " #
jt j
~x = =t , t∈C
t 1
j
and 1 is an eigenvector of A corresponding to λ2 = 1 + j.
61
Recall for a polynomial $p(x) = ax^2 + bx + c$ with $a \neq 0$, we know that $p(x) = 0$ if and only if $x = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.

As a reminder, we can check our work:
" #" # " # " #
1 −1 −j −1 − j −j
= = (1 − j)
1 1 1 1−j 1
" #" # " # " #
1 −1 j −1 + j j
= = (1 + j)
1 1 1 1+j 1

Recall from Theorem 5.2 that if a real polynomial has a complex root $z$, then $\overline{z}$ is also a root of the polynomial. Thus it follows that if a real n × n matrix A has a complex eigenvalue $\lambda$, then $\overline{\lambda}$ is also an eigenvalue of A, which is exactly what we observed in the previous example. Moreover, we observed that if $\vec{x}$ is an eigenvector of a real n × n matrix A corresponding to a complex eigenvalue $\lambda$, then $\overline{\vec{x}}$ is an eigenvector of A corresponding to the complex eigenvalue $\overline{\lambda}$.
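A numerical check (again assuming NumPy, which handles complex arithmetic natively) reproduces both the conjugate pair of eigenvalues and the conjugate eigenvectors:

    import numpy as np

    A = np.array([[1.0, -1.0],
                  [1.0,  1.0]])

    # A real matrix can have complex eigenvalues; they come in conjugate pairs.
    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)    # approximately 1+1j and 1-1j

    # The two eigenvector columns are complex conjugates of each other
    # (up to scaling), as observed above.
    print(eigenvectors)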

Diagonalization
Note that for our following discussions about diagonalization, it is assumed that our matrices
are square matrices with real entries, and that the eigenvalues (and thus eigenvectors) of
our matrices are real. Our work does generalize naturally to real matrices with complex
eigenvalues, and even to complex matrices with complex eigenvalues, but we do not pursue
this here.
Definition 42.2. An n × n matrix D such that dij = 0 for all i 6= j is called a diagonal
matrix and is denoted by D = diag(d11 , . . . , dnn ).
Example 42.3. The matrices
 
" # " # 1 0 0
1 0 0 0
, , 0 2 0 
 
0 1 0 0
0 0 3

are diagonal matrices. Note that diagonal matrices are both upper and lower triangular
matrices.
Lemma 42.4. If D = diag(d11 , . . . , dnn ) and E = diag(e11 , . . . , enn ) then it follows
1) D + E = diag(d11 + e11 , . . . , dnn + enn )

2) DE = diag(d11 e11 , . . . , dnn enn ) = diag(e11 d11 , . . . , enn dnn ) = ED


In particular, for any positive integer k,

\[
D^k = \operatorname{diag}(d_{11}^k, \ldots, d_{nn}^k).
\]

In fact, this holds for any integer k provided none of d11 , . . . , dnn are zero, that is, if D is
invertible.
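As a small illustration of Lemma 42.4 (a sketch assuming NumPy), the kth power of a diagonal matrix can be formed by raising its diagonal entries to the kth power instead of multiplying matrices repeatedly:

    import numpy as np

    d = np.array([1.0, 2.0, 3.0])
    D = np.diag(d)
    k = 5

    # Entrywise powers of the diagonal agree with the usual matrix power.
    print(np.allclose(np.diag(d**k), np.linalg.matrix_power(D, k)))   # True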

Definition 42.5. An n × n matrix A is diagonalizable if there exists an n × n invertible
matrix P and an n × n diagonal matrix D so that P −1 AP = D. In this case, we say that P
diagonalizes A to D.
It is important to note that P −1 AP = D does not imply that A = D in general. This
is because matrix multiplication does not commute, so we cannot cancel P and P −1 in the
expression P −1 AP = D. However, given two n×n matrices A and B such that P −1 AP = B,
it can be shown that A and B have many similarities.
Theorem 42.6. If A, B are n × n matrices such that P −1 AP = B for some invertible n × n
matrix P , then
1) det A = det B,
2) A and B have the same eigenvalues,
3) rank (A) = rank (B),
4) tr (A) = tr (B) where
tr (A) = a11 + · · · + ann
is called the trace of A.
This motivates the following definition.
Definition 42.7. If A and B are n × n matrices such that P −1 AP = B for some n × n
invertible matrix P , then A and B are said to be similar.
In light of Definition 42.7, we can restate Definition 42.5 by saying that an n × n matrix is
diagonalizable if it is similar to a diagonal matrix.

We now consider how to determine if a square matrix A is diagonalizable, and to find the
invertible matrix P that diagonalizes A (provided A is indeed diagonalizable). Suppose A
is an n × n matrix whose distinct eigenvalues are λ1 , . . . , λk with algebraic multiplicities
aλ1 , . . . , aλk . Since CA (λ) is a polynomial of degree n, it has exactly n roots (counting
complex roots and repeated roots). Thus aλ1 + · · · + aλk = n. From Theorem 41.6, we
have that 1 ≤ gλ ≤ aλ ≤ n for any eigenvalue λ of A so k ≤ gλ1 + · · · + gλk ≤ n. In fact,
gλ1 + · · · + gλk = n if and only if gλi = aλi for each i = 1, . . . , k.
Lemma 42.8. Let A be an n × n matrix and let λ1 , . . . , λk be distinct eigenvalues of A. If Bi
is a basis for the eigenspace Eλi (A) for i = 1, . . . , k, then B = B1 ∪ B2 ∪ · · · ∪ Bk is linearly
independent.
Lemma 42.8 simply states that if we have bases for eigenspaces corresponding to the distinct
eigenvalues of an n × n matrix A and we construct a set B that contains all of those basis vectors, then the set B will be linearly independent. Since the number of vectors in each basis Bi of the eigenspace Eλi (A) is gλi , there are k ≤ gλ1 + · · · + gλk ≤ n vectors in B. If there are in fact n
vectors in B, then B is a basis for Rn consisting of eigenvectors of A. The following theorem
gives us a condition under which A is diagonalizable.

Theorem 42.9 (Diagonalization Theorem). An n × n matrix A is diagonalizable if and only
if there exists a basis for Rn consisting of eigenvectors of A.

Proof. We first assume that A is diagonalizable. Then there exists an invertible matrix
P = [ ~x1 · · · ~xn ] and a diagonal matrix D = diag(λ1 , . . . , λn ) such that P −1 AP = D,
that is, such that AP = P D. Thus

A[ ~x1 · · · ~xn ] = P [ λ1~e1 · · · λn~en ]


[ A~x1 · · · A~xn ] = [ λ1 P~e1 · · · λn P~en ]
[ A~x1 · · · A~xn ] = [ λ1~x1 · · · λn~xn ].

We see that A~xi = λi~xi for i = 1, . . . , n, and since P = [ ~x1 · · · ~xn ] is invertible, it follows
from the Invertible Matrix Theorem that the set {~x1 , . . . , ~xn } is a basis for Rn so that ~xi 6= ~0
for i = 1, . . . , n. Thus {~x1 , . . . , ~xn } is a basis for Rn consisting of eigenvectors of A.

We now assume that there is a basis {~x1 , . . . , ~xn } of Rn consisting of eigenvectors of A. Then
for each i = 1, . . . n, A~xi = λi~xi for some eigenvalue λi of A. It follows from the Invertible
Matrix Theorem that P = [ ~x1 · · · ~xn ] is invertible and thus

P −1 AP = P −1 [ A~x1 · · · A~xn ]
= P −1 [ λ1~x1 · · · λn~xn ]
= P −1 [ λ1 P~e1 · · · λn P~en ]
= P −1 P [ λ1~e1 · · · λn~en ]
= diag(λ1 , . . . , λn )

which shows that A is diagonalizable.


The proof of the Diagonalization Theorem is a constructive proof, that is, given a diagonal-
izable matrix A, it tells us exactly how to construct the invertible matrix P and the diagonal
matrix D so that P −1 AP = D. Given that A is diagonalizable, the jth column of P will
contain the jth vector from the basis of eigenvectors, and the jth column of the diagonal
matrix D will contain the corresponding eigenvalue in the (j, j)−entry.
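For a diagonalizable matrix, this construction can also be carried out numerically. In the sketch below (assuming NumPy, and using the symmetric matrix from Example 41.3, which is diagonalizable), the eigenvector columns returned by np.linalg.eig serve directly as the columns of P, and D is the diagonal matrix of the corresponding eigenvalues in the same order:

    import numpy as np

    A = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])

    # Columns of P are eigenvectors of A; D holds the matching eigenvalues.
    eigenvalues, P = np.linalg.eig(A)
    D = np.diag(eigenvalues)

    print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True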

The following are easy consequences of Theorem 42.9.

Corollary 42.10. An n × n matrix A is diagonalizable if and only if aλ = gλ for every


eigenvalue λ of A.

Corollary 42.11. If an n × n matrix A has n distinct eigenvalues, then A is diagonalizable.

Lecture 43
Example 43.1. Diagonalize the matrix
" #
1 2
A= .
−1 4

Solution. From Example 41.1, the eigenvalues of A are

λ1 = 2 with algebraic multiplicity aλ1 = 1


λ2 = 3 with algebraic multiplicity aλ2 = 1

and
(" #)
2
is a basis for Eλ1 (A) so λ1 = 2 has geometric multiplicity gλ1 = 1
1
(" #)
1
is a basis for Eλ2 (A) so λ2 = 3 has geometric multiplicity gλ2 = 1.
1

We see that aλ1 = gλ1 and aλ2 = gλ2 and so A is diagonalizable by Corollary 42.10$^{62}$. We
take
\[
P = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}
\]
and have that P diagonalizes A, that is,
\[
P^{-1}AP = \operatorname{diag}(2, 3) = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} = D.
\]

We can (and should) check our work:


" #
1 1 1 −1
P −1 = adj (P ) =
det(P ) 1 −1 2
so
" #" #" # " #" # " #
1 −1 1 2 2 1 2 −2 2 1 2 0
P −1 AP = = = =D
−1 2 −1 4 1 1 −3 6 1 1 0 3

Note that P and D are not unique. We could have chosen $P = \left[\begin{smallmatrix} 1 & 2 \\ 1 & 1 \end{smallmatrix}\right]$ which would have
diagonalized A to D = diag(3, 2). Moreover, we can use the vectors from any bases for the
eigenspaces of A, not just the ones we found in Example 41.1.
62
In this case, we don’t even need to compute the geometric multiplicities of A to conclude that A is
diagonalizable - since the two eigenvalues of A are distinct and A is a 2×2 matrix, Corollary 42.11 immediately
tells us that A is diagonalizable.

Example 43.2. Diagonalize the matrix
 
\[
A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}.
\]

Solution. From Example 41.3 the eigenvalues of A are

λ1 = −1 with algebraic multiplicity aλ1 = 2


λ2 = 2 with algebraic multiplicity aλ2 = 1

and
   
 −1
 −1 

 1 , 0  is a basis for Eλ1 (A) so λ1 = −1 has geometric multiplicity gλ1 = 2
   
 
0 1
 
 
 1
 

 1  is a basis for Eλ2 (A) so λ2 = 2 has geometric multiplicity gλ2 = 1.
 
 
1
 

Since aλ1 = gλ1 and aλ2 = gλ2 , we see that A is diagonalizable so we take
 
\[
P = \begin{bmatrix} -1 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}
\]
from which it follows that
\[
P^{-1}AP = \operatorname{diag}(-1, -1, 2) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 2 \end{bmatrix} = D.
\]

Again, it’s a good idea to check P −1 AP = D even though it’s a bit more work to compute
P −1 for a 3 × 3 matrix.

We note that for a diagonalizable matrix A, the ith column of P is an eigenvector of A which
must correspond to the eigenvalue lying in the ith column of D. Thus, when A is diagonal-
izable, we normally construct P which then allows us to easily write out the diagonal matrix
D based on how we constructed P . We also note that an eigenvalue λ of a diagonalizable matrix A appears aλ times on the diagonal of D.

Example 43.3. Recall from Example 41.7 that
" #
1 0
A=
5 1
has eigenvalues λ1 = 1 with aλ1 = 2. However, a basis for Eλ1 (A) is
(" #)
0
1
so gλ1 = 1 ≠ 2 = aλ1 . Hence A is not diagonalizable. This means that we cannot find two
linearly independent eigenvectors of A to form an invertible 2 × 2 matrix P .

Powers of Matrices
A useful application of diagonalizing is computing high powers of a matrix. Suppose A is
an n × n diagonalizable matrix. Then P −1 AP = D for some n × n invertible P and n × n
diagonal matrix D. Then A = P DP −1 and
A2 = P DP −1 P DP −1 = P DIDP −1 = P D2 P −1
Similarly, A3 = P D3 P −1 and more generally, Ak = P Dk P −1 for any positive integer k.
Although computing a high power of an arbitrary matrix directly is tedious, Lemma 42.4 states that to compute a positive power of a diagonal matrix, one need only raise each of the diagonal entries to that power.
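The same computation is easy to carry out numerically. The following sketch (assuming NumPy) uses the diagonalization of the matrix from Example 43.1 and compares P D^k P^{-1} with the power obtained by repeated multiplication; the k = 10 values match the formula derived in Example 43.4 below.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [-1.0, 4.0]])
    P = np.array([[2.0, 1.0],
                  [1.0, 1.0]])
    k = 10

    # D^k is formed by raising the diagonal entries (the eigenvalues) to the kth power.
    Dk = np.diag([2.0**k, 3.0**k])
    Ak = P @ Dk @ np.linalg.inv(P)

    print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True
    print(Ak)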
Example 43.4. Find Ak for any positive integer k where
" #
1 2
A= .
−1 4
Solution. From Example 43.1, A is diagonalizable with
" # " #
2 1 2 0
P = and D = .
1 1 0 3
Thus
\[
\begin{aligned}
A^k &= P D^k P^{-1} \\
&= \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 2^k & 0 \\ 0 & 3^k \end{bmatrix}
\begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \\
&= \begin{bmatrix} 2^{k+1} & 3^k \\ 2^k & 3^k \end{bmatrix}
\begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \\
&= \begin{bmatrix} 2^{k+1} - 3^k & (2)3^k - 2^{k+1} \\ 2^k - 3^k & (2)3^k - 2^k \end{bmatrix}.
\end{aligned}
\]

Note that we can verify our work is reasonable - taking k = 1 gives
" # " #
1+1 1 1 1+1
2 − 3 (2)3 − 2 1 2
A1 = = = A.
21 − 31 (2)31 − 21 −1 4

Note also that we can now easily compute, say, A10 :


" #
−57001 116050
A10 = .
−58025 117074

Example 43.5. Find $A^k$ for
\[
A = \begin{bmatrix} 3 & -4 \\ -2 & 1 \end{bmatrix}.
\]

Solution.

\[
C_A(\lambda) = \begin{vmatrix} 3-\lambda & -4 \\ -2 & 1-\lambda \end{vmatrix}
= (3-\lambda)(1-\lambda) - 8 = \lambda^2 - 4\lambda + 3 - 8 = \lambda^2 - 4\lambda - 5 = (\lambda - 5)(\lambda + 1)
\]

so λ1 = −1 and λ2 = 5 are the eigenvalues of A. We see aλ1 = 1 = aλ2 , that is, the 2 × 2
matrix A has two distinct eigenvalues, so we are guaranteed that A is diagonalizable by
Corollary 42.11. For λ1 = −1,
" # " # " #
4 −4 −→ 4 −4 1
4
R 1 1 −1
A+I =
−2 2 R2 + 21 R1 0 0 −→ 0 0
so (" #)
1
1
is a basis for Eλ1 (A). For λ2 = 5,
" # " # " #
−2 −4 −→ −2 −4 1
− 2 R1 1 2
A − 5I =
−2 −4 R2 −R1 0 0 −→ 0 0
so (" #)
−2
1
is a basis for Eλ2 (A). Now, let
" # " #
1 −2 −1 0
P = from which it follows that D = .
1 1 0 5

Then
\[
P^{-1} = \frac{1}{3}\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}
\]
and
\[
\begin{aligned}
A^k = P D^k P^{-1}
&= \begin{bmatrix} 1 & -2 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} (-1)^k & 0 \\ 0 & 5^k \end{bmatrix}
\cdot \frac{1}{3}\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} \\
&= \frac{1}{3}\begin{bmatrix} (-1)^k & (-2)5^k \\ (-1)^k & 5^k \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} \\
&= \frac{1}{3}\begin{bmatrix} (-1)^k + (2)5^k & 2(-1)^k - (2)5^k \\ (-1)^k - 5^k & 2(-1)^k + 5^k \end{bmatrix}.
\end{aligned}
\]

We can use the eigenvalues of an n × n matrix A to compute the determinant and trace of
A. Suppose A has k distinct eigenvalues λ1 , . . . , λk with algebraic multiplicities aλ1 , . . . , aλk .
Then aλ1 + · · · + aλk = n and the characteristic polynomial of A is of the form

\[
C_A(\lambda) = \det(A - \lambda I) = (-1)^n (\lambda - \lambda_1)^{a_{\lambda_1}} \cdots (\lambda - \lambda_k)^{a_{\lambda_k}}.
\]
Taking λ = 0 gives
\[
\begin{aligned}
\det(A - 0I) = \det A &= (-1)^n (-\lambda_1)^{a_{\lambda_1}} \cdots (-\lambda_k)^{a_{\lambda_k}} \\
&= (-1)^n (-1)^{a_{\lambda_1} + \cdots + a_{\lambda_k}} \lambda_1^{a_{\lambda_1}} \cdots \lambda_k^{a_{\lambda_k}} \\
&= (-1)^n (-1)^n \lambda_1^{a_{\lambda_1}} \cdots \lambda_k^{a_{\lambda_k}} \\
&= \lambda_1^{a_{\lambda_1}} \cdots \lambda_k^{a_{\lambda_k}} \\
&= \prod_{i=1}^{k} \lambda_i^{a_{\lambda_i}}.
\end{aligned}
\]

Thus, det A is the product of the eigenvalues of A where each eigenvalue λ of A appears in
the product aλ times. With a bit more work, one can show that
\[
\operatorname{tr} A = \lambda_1 a_{\lambda_1} + \cdots + \lambda_k a_{\lambda_k} = \sum_{i=1}^{k} \lambda_i a_{\lambda_i},
\]

that is, the trace of A is the sum of the eigenvalues of A where each eigenvalue λ of A appears
in the sum aλ times.
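Numerically (a sketch assuming NumPy), this is easy to verify, since np.linalg.eigvals returns each eigenvalue repeated according to its algebraic multiplicity:

    import numpy as np

    A = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])

    eigenvalues = np.linalg.eigvals(A)

    # det A is the product of the eigenvalues, tr A is their sum.
    print(np.isclose(np.prod(eigenvalues), np.linalg.det(A)))   # True
    print(np.isclose(np.sum(eigenvalues), np.trace(A)))         # True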

Example 43.6. For
\[
A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix},
\]

λ1 = −1 and λ2 = 2 with aλ1 = 2 and aλ2 = 1. Thus

\[
\det A = (-1)^2 \, 2^1 = (-1)(-1)(2) = 2
\]
\[
\operatorname{tr} A = (-1)(2) + 2(1) = (-1) + (-1) + 2 = 0.
\]

We also note that A is invertible if and only if 0 is not an eigenvalue of A.

Lecture 44

Vector Spaces
Back in Lecture 6, we defined the operations of vector addition and scalar multiplication for
vectors in Rn . Theorem 6.10 then gave ten properties that vectors in Rn obey under our
definitions of vector addition and scalar multiplication. The notion of a vector space is to
consider a set V of objects with an operation of addition and scalar multiplication defined
upon them such that a similar set of properties as those stated in Theorem 6.10 also hold.
As an example, in Lecture 25, we defined addition and scalar multiplication for matrices in
Mm×n (R), and Theorem 25.7 showed that the same ten properties held for matrices under
these two operations.
Definition 44.1. A set V with an operation of addition, denoted ~x + ~y , and an operation of
scalar multiplication, denoted c~x, c ∈ R is called a vector space over R if for every ~v , ~x, ~y ∈ V
and for every c, d ∈ R
V1: ~x + ~y ∈ V
V2: ~x + ~y = ~y + ~x
V3: (~x + ~y ) + ~v = ~x + (~y + ~v )
V4: There exists a vector ~0 ∈ V, called the zero vector, so that ~x + ~0 = ~x for every ~x ∈ V.
V5: For every ~x ∈ V there exists a (−~x) ∈ V so that ~x + (−~x) = ~0
V6: c~x ∈ V
V7: c(d~x) = (cd)~x
V8: (c + d)~x = c~x + d~x
V9: c(~x + ~y ) = c~x + c~y
V10: 1~x = ~x
We call the elements of V vectors 63 .
Note that in the above definition, “over R” means that our scalars are real numbers. Later,
we will briefly mention vector spaces over C. Until then, all vector spaces are over R and we
will simply say “vector space”.
Example 44.2. We have seen that Rn , subspaces of Rn , and Mm×n (R) all satisfy these
properties and are thus vector spaces with the usual definition of addition and scalar multi-
plication.
Example 44.3. The set L(Rn , Rm ) of linear transformations from Rn to Rm is a vector
space with the standard definition of addition and scalar multiplication.
63
The textbook uses a boldface x to denote a vector in a vector space V.

Example 44.4. Let a, b ∈ R with a < b. With the standard addition and scalar multiplica-
tion,

• The set F(a, b) of all functions f : (a, b) → R is a vector space

• The set C(a, b) of all continuous functions f : (a, b) → R is a vector space

• The set $C^1(a, b)$ of all differentiable functions f : (a, b) → R is a vector space

Example 44.5. The set of discontinuous functions f : R → R with the standard addition
and scalar multiplication is not a vector space. To see this, consider
\[
f_1(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}
\qquad \text{and} \qquad
f_2(x) = \begin{cases} 0, & x \geq 0 \\ 1, & x < 0 \end{cases}
\]

Both are discontinuous, but their sum

(f1 + f2 )(x) = f1 (x) + f2 (x) = 1

for every x ∈ R and is thus continuous. Hence, V1 fails: the set of discontinuous functions
is not closed under addition and is thus not a vector space.

Our work involving spanning sets, linear independence and linear dependence, bases and
subspaces all carry over naturally to vector spaces. We restate those definitions here for an
arbitrary vector space V.

Definition 44.6. Let B = {~v1 , . . . , ~vk } be a set of vectors in a vector space V. The
span of B is
Span B = {c1~v1 + · · · + ck~vk | c1 , . . . , ck ∈ R}.
The set Span B is spanned by B, and B is a spanning set for Span B.

Definition 44.7. Let B = {~v1 , . . . , ~vk } be a set of vectors in a vector space V. We say that
B is linearly dependent if there exist c1 , . . . , ck ∈ R, not all zero so that
~0 = c1~v1 + · · · + ck~vk .

We say that B is linearly independent if the only solution to


~0 = c1~v1 + · · · + ck~vk

is c1 = · · · = ck = 0.

Definition 44.8. A subset S of V is a subspace of V if properties V 1 − V 10 of Definition


44.1 hold for every ~x, ~y , ~z ∈ S and for all c, d ∈ R.

In particular, {~0} is a subspace of V, called the trivial subspace, and V is a subspace of V.

Theorem 44.9 (Subspace Test). Let S be a nonempty subset of V. If for every ~x, ~y ∈ S,
and for every c ∈ R, we have that ~x + ~y ∈ S and c~x ∈ S, then S is a subspace of V.

Definition 44.10. Let S be a subspace of V, and let B = {~v1 , . . . , ~vk } be a set of vectors in
S. Then B is a basis for S if B is linearly independent and S = Span B. If S = {~0}, then we
define B = ∅ to be a basis for S.

Definition 44.11. The dimension of a subspace S of V, denoted by dim(S), is the number


of vectors in any basis for S.

Having reviewed the important definitions, we begin to look at some examples.

Example 44.12. Let B ∈ Mn×n (R) be fixed and let

S = {A ∈ Mn×n (R) | AB = 0n×n }.

Show S is a subspace of Mn×n (R).

Solution. By definition, S ⊆ Mn×n (R), and since 0n×n B = 0n×n , we have that 0n×n ∈ S and
S is nonempty. Let A1 , A2 ∈ S. Then A1 B = 0n×n = A2 B so

(A1 + A2 )B = A1 B + A2 B = 0n×n + 0n×n = 0n×n

and A1 + A2 ∈ S. For any c ∈ R,

(cA1 )B = c(A1 B) = c 0n×n = 0n×n

so cA1 ∈ S. Thus S is a subspace of Mn×n (R).

Example 44.13. Is S = {A ∈ M2×2 (R)|A2 = A} a subspace of M2×2 (R)?

Solution. S is not a subspace of M2×2 (R). To see this, note that I ∈ S since I² = I. However, since (2I)² = 4I ≠ 2I, the matrix 2I ∉ S, so S is not closed under scalar multiplication (property V6 fails).

Example 44.14. Consider the set


(" # " # " # " #)
1 0 0 1 0 0 0 0
B= , , , .
0 0 0 0 1 0 0 1

Show that B is a basis for the vector space M2×2 (R).

Solution. We show that B is linearly independent. For c1 , c2 , c3 , c4 ∈ R, consider


" # " # " # " # " #
1 0 0 1 0 0 0 0 0 0
c1 + c2 + c3 + c4 = .
0 0 0 0 1 0 0 1 0 0

This gives
\[
\begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]
and so clearly c1 = c2 = c3 = c4 = 0 and thus B is linearly independent. Also note that for
any $\left[\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\right] \in M_{2\times 2}(\mathbb{R})$,
" # " # " # " # " #
a b 1 0 0 1 0 0 0 0
=a +b +c +d
c d 0 0 0 0 1 0 0 1

so Span B = M2×2 (R). Thus B is a basis for M2×2 (R), called the standard basis for M2×2 (R).
Since B has 4 vectors, dim(M2×2 (R)) = 4.
We construct the standard basis for Mm×n (R) similarly, so dim(Mm×n (R)) = mn.
Example 44.15. Let
(" # " # " # " #)
1 1 1 1 0 1 1 0
B= , , ,
0 1 1 0 1 1 1 1

Show B is a basis for M2×2 (R) and express $A = \left[\begin{smallmatrix} 1 & 2 \\ 3 & 4 \end{smallmatrix}\right]$ as a linear combination of the vectors
(matrices) in B.
Solution. For c1 , c2 , c3 , c4 ∈ R, consider
" # " # " # " # " #
1 2 1 1 1 1 0 1 1 0
= c1 + c2 + c3 + c4
3 4 0 1 1 0 1 1 1 1
Equating corresponding entries gives the system
\[
\begin{aligned}
c_1 + c_2 + c_4 &= 1 \\
c_1 + c_2 + c_3 &= 2 \\
c_2 + c_3 + c_4 &= 3 \\
c_1 + c_3 + c_4 &= 4
\end{aligned}
\]
which we carry to reduced row echelon form.
     
\[
\left[\begin{array}{cccc|c} 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 & 2 \\ 0 & 1 & 1 & 1 & 3 \\ 1 & 0 & 1 & 1 & 4 \end{array}\right]
\xrightarrow{\substack{R_2 - R_1 \\ R_4 - R_1}}
\left[\begin{array}{cccc|c} 1 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 & 1 \\ 0 & 1 & 1 & 1 & 3 \\ 0 & -1 & 1 & 0 & 3 \end{array}\right]
\xrightarrow{\substack{R_1 - R_3 \\ R_4 + R_3}}
\left[\begin{array}{cccc|c} 1 & 0 & -1 & 0 & -2 \\ 0 & 0 & 1 & -1 & 1 \\ 0 & 1 & 1 & 1 & 3 \\ 0 & 0 & 2 & 1 & 6 \end{array}\right]
\]
\[
\xrightarrow{\substack{R_1 + R_2 \\ R_3 - R_2 \\ R_4 - 2R_2}}
\left[\begin{array}{cccc|c} 1 & 0 & 0 & -1 & -1 \\ 0 & 0 & 1 & -1 & 1 \\ 0 & 1 & 0 & 2 & 2 \\ 0 & 0 & 0 & 3 & 4 \end{array}\right]
\xrightarrow{\substack{R_2 \leftrightarrow R_3 \\ \frac{1}{3}R_4}}
\left[\begin{array}{cccc|c} 1 & 0 & 0 & -1 & -1 \\ 0 & 1 & 0 & 2 & 2 \\ 0 & 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 1 & 4/3 \end{array}\right]
\xrightarrow{\substack{R_1 + R_4 \\ R_2 - 2R_4 \\ R_3 + R_4}}
\left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 1/3 \\ 0 & 1 & 0 & 0 & -2/3 \\ 0 & 0 & 1 & 0 & 7/3 \\ 0 & 0 & 0 & 1 & 4/3 \end{array}\right]
\]

so c1 = 1/3, c2 = −2/3, c3 = 7/3, c4 = 4/3 and
" # " # " # " # " #
1 2 1 1 1 2 1 1 7 0 1 4 1 0
= − + + .
3 4 3 0 1 3 1 0 3 1 1 3 1 1

Also, since the coefficient matrix reduces to I, the resulting homogeneous system derived
from
\[
c_1 \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} + c_2 \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} + c_3 \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} + c_4 \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]
has only the trivial solution, so B is linearly independent. Since B has 4 vectors and
dim(M2×2 (R)) = 4, Span B = M2×2 (R) so B is a basis for M2×2 (R).
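As a cross-check on the arithmetic (a sketch assuming NumPy, not part of the original solution), the coordinates c1, ..., c4 can be found by flattening each basis matrix into a column of a 4 × 4 linear system:

    import numpy as np

    # Columns are the four basis matrices of B, flattened row by row.
    M = np.column_stack([
        np.array([[1, 1], [0, 1]]).flatten(),
        np.array([[1, 1], [1, 0]]).flatten(),
        np.array([[0, 1], [1, 1]]).flatten(),
        np.array([[1, 0], [1, 1]]).flatten(),
    ]).astype(float)

    b = np.array([[1, 2], [3, 4]]).flatten().astype(float)

    c = np.linalg.solve(M, b)
    print(c)   # approximately [ 1/3  -2/3  7/3  4/3 ]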

Example 44.16. Let S = {A ∈ M2×2 (R) | AT = A} be a subspace of M2×2 (R). Find a basis
for S.
" #
a1 a2
Solution: Let A = ∈ S. Then AT = A so
a3 a4
" # " #
a1 a3 a1 a2
=
a2 a4 a3 a4

from which we see a3 = a2 . Thus


" # " # " # " #
a1 a2 1 0 0 1 0 0
A= = a1 + a2 + a4
a2 a4 0 0 1 0 0 1
so (" # " # " #)
1 0 0 1 0 0
B= , ,
0 0 1 0 0 1
is a spanning set for S. Since each vector in B contains a nonzero entry where others contain
a zero entry, B is linearly independent and thus a basis for S. It follows that dim(S) = 3.

Lecture 45
Definition 45.1. The set
P (R) = {a0 + a1 x + a2 x² + · · · | a0 , a1 , a2 , . . . ∈ R, where only finitely many of the ai are nonzero}


denotes the set of all real polynomials. The set


Pn (R) = {a0 + a1 x + · · · + an xn | a0 , a1 , . . . , an ∈ R}
denotes the set of all real polynomials of degree at most n. We denote the zero polynomial
by 0 = 0 + 0x + · · · + 0xn ∈ Pn (R). Note that Pn (R) ⊆ P (R).
If p(x), q(x) ∈ Pn (R), then
p(x) = a0 + a1 x + · · · + an xn
q(x) = b0 + b1 x + · · · + bn xn
for some a0 , . . . , an , b0 , . . . , bn ∈ R. We have that
• p(x) = q(x) if and only if ai = bi for i = 0, 1, . . . , n
• (p + q)(x) = p(x) + q(x) = (a0 + b0 ) + (a1 + b1 )x + · · · + (an + bn )xn
• (kp)(x) = kp(x) = ka0 + ka1 x + · · · + kan xn for any k ∈ R
and that Pn (R) is a vector space under these operations.
Example 45.2. Let S = {p(x) ∈ P3 (R) | p(1) = 0}. Show S is a subspace of P3 (R).
Solution. By definition, S ⊆ P3 (R). Also, if p(x) is the zero polynomial, then p(1) = 0 so
the zero polynomial belongs to S and S is nonempty. Now for p(x), q(x) ∈ S, p(1) = 0 = q(1)
so
(p + q)(1) = p(1) + q(1) = 0 + 0 = 0
which shows (p + q)(x) ∈ S. For any t ∈ R,
(tp)(1) = tp(1) = t(0) = 0
so (tp)(x) ∈ S. Hence S is a subspace of P3 (R).
Example 45.3. Consider the set B = {1, x, . . . , xn } ⊆ Pn (R). For c0 , c1 , . . . , cn ∈ R,
consider
c0 (1) + c1 x + · · · + cn xn = 0 = 0 + 0x + · · · + 0xn .
We have that c0 = c1 = · · · = cn = 0 so B is linearly independent. Also, for any polynomial
p(x) = a0 + a1 x + · · · + an xn ∈ Pn (R), p(x) is trivially a linear combination of the elements
in B:
p(x) = a0 (1) + a1 (x) + · · · + an (xn )
so Span B = Pn (R). Thus B is a basis for Pn (R), called the standard basis of Pn (R). It
follows that dim(Pn (R)) = n + 1.

Note that B = {1, x, x2 , . . . } is the standard basis for P (R). We see then that P (R) is
infinite-dimensional!

Example 45.4. Let B = {1, 1 + x, 1 + x + x2 } ⊆ P2 (R). Show B is a basis for P2 (R).

Solution. For c1 , c2 , c3 ∈ R, consider

c1 (1) + c2 (1 + x) + c3 (1 + x + x2 ) = 0

Rearranging gives

(c1 + c2 + c3 ) + (c2 + c3 )x + c3 x2 = 0 + 0x + 0x2

Thus
c1 + c2 + c3 = 0
c2 + c3 = 0
c3 = 0
and we see that c3 = 0 which implies that c2 = 0 which in turn gives c1 = 0. Thus B is linearly
independent. Since B has 3 elements and dim(P2 (R)) = 3, we see that Span B = P2 (R) and
so B is a basis for P2 (R).

Example 45.5. Let B = {1 + x, 1 − x, 1, 2x, x + x2 } ⊆ P2 (R). Find a basis B 0 for Span B


with B 0 ⊆ B. Find the dimension of Span B.

Solution. For c1 , . . . , c5 ∈ R, consider

c1 (1 + x) + c2 (1 − x) + c3 (1) + c4 (2x) + c5 (x + x2 ) = 0.

Rearranging gives

(c1 + c2 + c3 ) + (c1 − c2 + 2c4 + c5 )x + c5 x2 = 0 + 0x + 0x2

from which we obtain

c1 + c2 + c3 = 0
c1 − c2 + 2c4 + c5 = 0
c5 = 0

We see immediately that this system is underdetermined and thus has nontrivial solutions.
This allows us to conclude that B is a linearly dependent set. Carrying the coefficient matrix
of our system to reduced row echelon form gives
     
\[
\begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\xrightarrow{R_2 - R_1}
\begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & -2 & -1 & 2 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\xrightarrow{-\frac{1}{2}R_2}
\begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1/2 & -1 & -1/2 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\]
\[
\xrightarrow{R_1 - R_2}
\begin{bmatrix} 1 & 0 & 1/2 & 1 & 1/2 \\ 0 & 1 & 1/2 & -1 & -1/2 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\xrightarrow{\substack{R_1 - \frac{1}{2}R_3 \\ R_2 + \frac{1}{2}R_3}}
\begin{bmatrix} 1 & 0 & 1/2 & 1 & 0 \\ 0 & 1 & 1/2 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\]

From any of the above row echelon forms, we can see that there are leading entries in the
first, second and fifth columns. Thus we can tell that 1 and 2x can be expressed as linear
combinations of 1 + x and 1 − x, but from the reduced row echelon form, we see that
\[
\begin{aligned}
1 &= \tfrac{1}{2}(1 + x) + \tfrac{1}{2}(1 - x) \\
2x &= 1(1 + x) - 1(1 - x)
\end{aligned}
\]

From any of the above row echelon forms, we conclude that

B 0 = {1 + x, 1 − x, x + x2 }

is a linearly independent subset of B with Span B 0 = Span B. Since B 0 has 3 elements and
dim(P2 (R)) = 3, Span B 0 = P2 (R), so B 0 is a basis for P2 (R).
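The reduction above can also be reproduced symbolically; the sketch below assumes SymPy (not part of the course material), whose rref method returns both the reduced row echelon form and the pivot columns:

    from sympy import Matrix

    # Coefficient matrix of the homogeneous system in Example 45.5;
    # column j holds the coefficients contributed by the jth polynomial of B.
    M = Matrix([[1,  1, 1, 0, 0],
                [1, -1, 0, 2, 1],
                [0,  0, 0, 0, 1]])

    rref_M, pivot_columns = M.rref()
    print(pivot_columns)   # (0, 1, 4): keep 1 + x, 1 - x and x + x^2 for B'
    print(rref_M)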

Example 45.6. Consider the subspace S = {p(x) ∈ P2 (R) | p(2) = 0} of P2 (R). Find a basis
for S.

Solution: Let p(x) ∈ S. Then p(2) = 0 so x − 2 is a factor of p(x). Since p(x) ∈ P2 (R), there
are a, b ∈ R so that p(x) = (x − 2)(ax + b) = ax2 + bx − 2ax − 2b = a(x2 − 2x) + b(x − 2) so
S = Span {x2 − 2x, x − 2}. Since neither x2 − 2x nor x − 2 is a scalar multiple of the other,
{x2 − 2x, x − 2} is linearly independent and thus a basis for S.

Lecture 46
Example 46.1. Let V = {x ∈ R | x > 0}. Then under the standard operations of addition and scalar multiplication of real numbers, V is not a vector space over R since, for example, V6 fails. To see this, note that 2 ∈ V and −1 ∈ R, but that (−1)(2) = −2 ∉ V. We also note V4 and V5 fail. However, we define a new addition, ⊕, and a new scalar multiplication, ⊙, as follows: for all x, y ∈ V and for all c ∈ R
\[
x \oplus y = xy, \qquad c \odot x = x^c
\]
Show that under these operations, V is a vector space over R.


Solution. Let x, y, z ∈ V, c, d ∈ R. Then
V1: x ⊕ y = xy > 0 since x, y > 0. Thus x ⊕ y ∈ V.

V2: x ⊕ y = xy = yx = y ⊕ x

V3: (x ⊕ y) ⊕ z = (xy) ⊕ z = (xy)z = x(yz) = x ⊕ (yz) = x ⊕ (y ⊕ z)

V4: Since 1 ∈ V and x ⊕ 1 = x(1) = x for all x ∈ V, 1 is the zero vector.


V5: Since x ∈ V, x > 0 so $\frac{1}{x} > 0$ and $\frac{1}{x} \in V$. Then since $x \oplus \frac{1}{x} = x \cdot \frac{1}{x} = 1$, $\frac{1}{x}$ is the additive inverse of x.

V6: $c \odot x = x^c > 0$ since x > 0, so $c \odot x \in V$.

V7: $c \odot (d \odot x) = c \odot (x^d) = (x^d)^c = x^{cd} = (cd) \odot x$

V8: $(c + d) \odot x = x^{c+d} = x^c x^d = x^c \oplus x^d = (c \odot x) \oplus (d \odot x)$

V9: $c \odot (x \oplus y) = c \odot (xy) = (xy)^c = x^c y^c = x^c \oplus y^c = (c \odot x) \oplus (c \odot y)$

V10: $1 \odot x = x^1 = x$
This shows that V is a vector space over R.
This example serves to show that it can be possible to redefine the operations of vector
addition and scalar multiplication to make a set a vector space. The resulting vector space
is quite bizarre: we saw that 1 is the zero vector64 of V, and the additive inverse of x ∈ V is 1/x.

We don’t always use ⊕ and to denote vector addition and scalar multiplication, even if
these definitions have been redefined. If the definitions are clearly understood, then we can
use the standard notation.
64
This does not imply that 1 = 0. It simply says that under our new rules of vector addition and scalar
multiplication, 1 plays the role of the zero vector, that is, x ⊕ 1 = x for every ~x ∈ V. Note that as defined,
0∈
/ V.

Theorem 46.2. If V is a vector space, then for every ~x ∈ V,
1) 0~x = ~0
2) −~x = (−1)~x.
Proof of (2). For ~x ∈ V
(−1)~x = (−1)~x + ~0 by V4
= (−1)~x + (~x + (−~x)) by V5
= ((−1)~x + ~x) + (−~x) by V3
= ((−1)~x + 1~x) + (−~x) by V10
= ((−1) + 1)~x + (−~x) by V8
= 0~x + (−~x) Since − 1 + 1 = 0
= ~0 + (−~x) by part (1) above
= (−~x) + ~0 by V2
= −~x by V4
As an illustration of Theorem 46.2, we consider V from Example 46.1. For any x ∈ V,
\[
0 \odot x = x^0 = 1 \;\leftarrow\; \text{the zero vector}
\]
and
\[
-x = \frac{1}{x} = x^{-1} = (-1) \odot x
\]
which is consistent with Theorem 46.2.
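A quick numerical spot-check of these identities, written as a short Python sketch (the function names v_add and v_scale are just illustrative labels for ⊕ and ⊙):

    # The "vectors" are positive reals; addition is multiplication and
    # scalar multiplication is exponentiation, as in Example 46.1.
    def v_add(x, y):
        return x * y

    def v_scale(c, x):
        return x ** c

    x = 2.0
    print(v_scale(0, x))              # 1.0, the zero vector of V
    print(v_add(x, v_scale(-1, x)))   # 1.0, so x**(-1) = 1/x is the additive inverse of x
    print(v_scale(-1, x), 1 / x)      # both 0.5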
Vector Spaces over C
• Cn is a vector space over65 C. For ~z ∈ Cn ,
 
\[
\vec{z} = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} = z_1 \vec{e}_1 + \cdots + z_n \vec{e}_n.
\]
We call {~e1 , . . . , ~en } (where ~ei is the ith column of the n × n identity matrix) the
standard basis for Cn , so dim(Cn ) = n.
• Mm×n (C) is a vector space over C. The standard basis for Mm×n (C) is the same as for
Mm×n (R), so dim(Mm×n (C)) = mn.
• Pn (C) (the set of polynomials of degree at most n) is a vector space over C. The
standard basis is {1, x, . . . , xn } (here, x ∈ C), so dim(Pn (C)) = n + 1.
The notions of subspace, span, linear independence, basis and dimension are handled the
same way as for real vector spaces.
65
The expression “over C” means our scalars are complex numbers.

THE END

55
A 4−dimensional cube, often called a tesseract or a hypercube. The same hypercube is depicted on the
cover of these notes, but is viewed from a different angle.

