You are on page 1of 87

# Math 308: Lecture Notes

## May 30, 2016

Contents
1 Linear Systems 3
1.1 Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Linear Systems and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4.1 Traffic flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4.2 Data fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2 Euclidean space 17
2.1 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.1 Geometry of vector operations . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.3 Ax = b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Linear independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 Spans and linear independence . . . . . . . . . . . . . . . . . . . . . . . . 25

3 Matrices 27
3.1 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.2 What does T tell us? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.3 Geometry of Linear Transformations . . . . . . . . . . . . . . . . . . . . . 31
3.2 Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.1 The basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.2 Transpose of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.3 Powers of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.2 Finding A1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

1
4 Subspaces 45
4.1 Introduction to subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.1 Span and nullspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.2 Kernel and range of linear transformations . . . . . . . . . . . . . . . . . 49
4.2 Basis and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.2 Finding a basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.2.3 Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.3 Row and Column Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5 Determinants 61
5.1 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.1 A Formula for the Determinant . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.2 Applications of the determinant . . . . . . . . . . . . . . . . . . . . . . . . 65

## 6 Eigenvalues and Eigenvectors 67

6.1 The Eigenvalue Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.1.1 Motivating examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.1.2 Finding Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.1.3 Finding Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.1.4 Multiplicity of an Eigenvalue . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.1.5 Remarks about Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3 Change of Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3.2 Changing it . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.3.3 Change of Basis Matrices between Nonstandard Bases . . . . . . . . . . . 81
6.4 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4.1 How to diagonalize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

2
Introduction
Linear algebra is ubiquitous with advanced applications of math. Here are some examples of
applications, that we will talk about more during the quarter.

Ch. 1 Scheduling and optimization: Lufthansa pilots, flight attendants, cleaning, maintenance,
schedules.
Ch. 2 Video Games and computer animation: quickly update a 3d scene by rotating and scaling
vectors.
Ch. 4 Computer vision: rectify points in stereo vision.
Ch. 6 Google PageRank : determine how to list search results.

1 Linear Systems
1.1 Systems of Linear Equations
Lets motivate linear algebra with an example.
Example 1.1. Xavier is two years older than Yang. Their ages sum to 6.
x+y =6
x y = 2.
There are three ways to think about this: substitution, doing algebra with equations, or
graphically. Many of you are probably used to solving this with substitution. For example, if
we solve the second equation for x, we find that
x = y + 2.
Now, we can plug this into the first equation to get
y+2+y =6
which means y = 2, and x = 4. There is nothing wrong with this method, but well soon be
trying to solve more complicated systems (with lots of variables and lots of equations), so we
might want to focus on a different method.
We could solve this equation in another way by doing a little equation algebra. For instance,
if we just add the first equation to the second, we get
x+y+xy =6+2
or
2x = 8
which immediately tells us that x = 4. We can then solve for y to see y = 2 as well. This is the
process we are going to formalize in this course.
Finally, another beneficial way to do this would be to graph the linear system. The solution
is the intersection of two lines. We know how to graph lines and can just sketch them to find
the solution. Well come back to this throughout the course.

3
What if we wanted to solve a harder question?

Example 1.2. Ling, an engineer, was just hired to orchestrate the upkeep of 7 bridges in the
San Francisco area. She has a budget of \$700,000. Older bridges cost more per day to repair.
In fact, the nth oldest bridge costs n thousand dollars per day. They need more work too. She
plans to spend 2 days more on each bridge than the one built just before it. How many days
does she allocate to each bridge?
What if the same scheme was kept, but instead Ling had 100 bridges and \$770,000,000
dollars? How many days should the first bridge get? How could you program a computer to
solve this? (answer 10 days)
What if Ling had M bridges and 5M (M + 1) thousand dollars? How many days should the
first bridge get? Can a computer even solve this question? (answer 76 days)

Well come back to this later. The course will be focused on studying these equations and
systems of these equations. Lets begin with the terminology.

Notation 1.3. Rather than x and y we might have a lot of variables. So we use x1 , x2 , . . . , xn
denote different variables. Other ways they might appear are as xi , yi , wi , zi . We also let
a1 , . . . , an R denote constants. Here ai R means ai is an element of the set of real numbers.

## Definition 1.4. A linear equation is an equation of the form

a1 x1 + a2 x2 + + an xn = b

## where a1 , a2 , . . . , an R and b R are constants and x1 , x2 , . . . , xn are variables.

Example 1.5.
2x1 + 3x2 + x3 = 6
is an example of a linear equation with 3 variables.

## Example 1.6. 2x21 + 3x1 x3 = 7 is not a linear equation. Neither is x1 /x2 = 6.

Definition 1.7. A solution to a linear equation is a set of numbers (s1 , s2 , . . . , sn ) that satisfies
the equation.

## Example 1.8. By plugging in (1, 1, 1) to the equation

2x1 + 3x2 + x3 = 6,

4
we get
2+3+1=6
which shows that (1, 1, 1) is a solution to the equation.

Definition 1.9. The solution set to a linear equation is the set of all solutions to a given
equation.

equations

## a11 x1 + a12 x2 + + a1n xn = b1

a21 x1 + a22 x2 + + a2n xn = b2
..
.
am1 x1 + am2 x2 + + amn xn = bm .

Notice the double subscripts; aij means ith down and jth over when arranged in standard form.

## x1 + 2x2 + 3x3 + 4x4 + 5x5 + 6x6 + 7x7 = 700

x1 + x2 = 2
x2 + x3 = 2
x3 + x4 = 2
x4 + x5 = 2
x6 + x7 = 2

Can you tell that x1 = 17? Later well introduce a quick way to solve this system.

## The following is an example of a linear system with 3 variables and 3 equations.

Example 1.12.

2x1 + 3x2 + x3 = 6
x1 x2 x3 = 2
2x1 + x2 7x3 = 1

Definition 1.13. A solution to a linear system is a set of numbers (s1 , s2 , . . . , sn ) that simul-
taneously satisfy all of the equations in the linear system.

Definition 1.14. We say a linear system is consistent if it has at least one solution. We say
it is inconsistent if it has no solutions.

5
Keep in mind that there can be more than one solution to a linear system; for example the
system
2x1 + 3x2 = 6
4x1 + 6x2 = 12

has infinitely many solutions (because it is two multiples of the same equation).
We can also use the graph perspective to analyze solution sets.

In the example of two lines intersecting (solving a system of two equations with two variables),
we know the lines could either intersect in one point (meaning there is exactly one solution), or
intersect nowhere by being parallel (meaning there are no solutions), or be the same line (meaning
there are infinitely many solutions). This is our first indication of the following theorem, which
well prove soon.
Theorem 1.15. A linear system has no solutions, exactly one solution, or infinitely many
solutions.
Two questions: Why is linearity important here? And, how do we actually find these solu-
tions? We want to develop an algorithm to systematically find the solutions. First, lets start
with a few special types of systems.
Example 1.16. Solve the system
x1 + 2x2 + 2x3 = 2
x2 x3 = 1
x3 = 2.

This triangular system is something we can solve by back-substitution (i.e., start at the
bottom and substitute upwards to find all the variables). We start at the bottom, where x3 = 2,
and then plug that in to the second equation to find that x2 2 = 1, or x2 = 3, and then finally
plug these into the first equation to find x1 = 8. Therefore, we find the (unique) solution is
(8, 3, 2).

6
Definition 1.17. The above system is an example of a triangular system, where the linear
system has n equations and n variables, and the first nonzero term of the k-th equation is the
k-th variable.

## Theorem 1.18. Triangular systems always have exactly one solution.

We can see this in the above example, and it is true in general because back-substitution
will always give us that solution.

## 2x1 4x2 + 2x3 + x4 = 11

x2 x3 + 2x4 = 5
3x4 = 9.

This system is not triangular (there are more variables than equations) but we can still try
to solve it with back substitution. The last equation tells us that x4 = 3, and if we try to plug
that in to the second equation, we see that x2 = x3 1 and that we have no information about
x3 !
Because there are no restrictions on x3 , that means it is a free variable and can take any
value. We assign it a free parameter (a letter representing any number we want), says x3 = s.
Then, x2 = s 1 and we can plug all of this into the first equation to find x1 = 2 + s. In other
words, the solutions are
(2 + s, 1 + s, s, 3)
where s R.

Definition 1.20. The above system is an example of a system in echelon form. We say a
system is in echelon form if

(a) Every variable is the leading variable (the first nonzero term) of at most one equation.

(b) The system is organized in a stair-step pattern, meaning the leading varibles move down
and to the right.

2x1 4x2 = 11
x1 x2 = 5.

## is not in echelon form because x1 is the leading variable of two equations.

Theorem 1.22. A system in echelon form always has at least one solution.

This is true for the same reason as the analogous fact about triangular systems; we can find
the solutions by back-substitution.

7
1.2 Linear Systems and Matrices
In the last section, we saw that if a system was in echelon form, we could solve it with back
substitution. In this section, well figure out how to convert systems into echelon form to easily
find solutions. To do that, were going to introduce matrices.
Definition 1.23. A matrix is a rectangular array of numbers.
Example 1.24.    
1 2 3 1 2
and
3 4 1 4 7
are examples of matrices.
Notation 1.25. We will use both hard [] and soft () brackets for matrices. They both mean
the same thing.
Each linear system corresponds to two matrices.
Definition 1.26. The linear system

## a11 x1 + a12 x2 + + a1n xn = b1

a21 x1 + a22 x2 + + a2n xn = b2
..
.
am1 x1 + am2 x2 + + amn xn = bm .

## has coefficient matrix

a11 a12 ... a1n
a21 a22 ... a2n
.

.. ..
. .
am1 am2 ... amn
It has augmented matrix

a11 a12 ... a1n b1
a21 a22 ... a2n b2
.

.. ..
. .
am1 am2 ... amn bm
For the first few chapters, well focus on the augmented matrix only. Note that the book
leaves out the vertical line separating the a and b terms.
Example 1.27. The linear system
4x1 + 2x2 + x3 = 2
x1 x2 + 4x3 = 1

## has augmented matrix  

4 2 1 2
.
1 1 4 1

8
We can use the matrix to systematically find solutions to systems of equations. First, well
introduce echelon and reduced echelon forms, and then elementary operations.

10 7 20 4
A = 3 2 6 1
1 4 2 9
Definition 1.28. A matrix is in echelon form if

(a) the leading terms (the first nonzero entries of each row) move down and to the right in a
stair step pattern.

(b) any rows of zeros are grouped at the bottom of the matrix.

7
2 25

1 10
echelon(A) = 0 1 0 2
0 0 0 0
Definition 1.29. A matrix is in reduced echelon form if

## (b) all leading terms are equal to 1

(c) each leading term is the only nonzero term in its column

1 0 2 1
reduced echelon(A) = 0 1 0 2
0 0 0 0
Definition 1.30. Elementary operations are operations that we can do to a linear system that
dont change its solution set. There are three of them:

## 3. Add a multiple of one row to another row

Now, we will outline an algorithm for converting a matrix into an equivalent matrix in reduced
echelon form, and see how that gives us the solutions to the corresponding linear system. We
will outline the process with an example.

## 2x1 4x2 6x3 = 2

x1 x2 2x3 = 1
x1 + 3x2 + 5x3 = 2.

9
1. Write the linear system as the corresponding augmented matrix.
In the example,
2 4 6 2
1 1 2 1 .
1 3 5 2

2. If necessary, interchange two rows to get a nonzero entry in the upper left corner. Multiply
the first row by a constant to make that entry a 1.
In the example, this means multiplying the first row by -1/2,

1 2 3 1
1 1 2 1 .
1 3 5 2

3. Add multiples of the first row to the other rows to zero out the rest of the column.
In the example, this corresponds to adding R1 to R2 and adding R1 to R2 ,

1 2 3 1
0 1 1 2 .
0 1 2 1

4. Move over one column and down one row and repeat.
In this example, we move to the entry in the second row and second column. Its already
a 1, so we proceed to step 3 to zero out the rest of the second column by adding 2R2 to the
first row adding R2 to R3 ,
1 0 1 3
0 1 1 2 .
0 0 1 1
Now, we move down and over again to the third row and third column. Its also a 1, so
we proceed to step 3 and zero out the rest of the column,

1 0 0 2
0 1 0 3 .
0 0 1 1

## 5. Translate back to system of equations and write down the solution.

In the example, if we translate back to a system of equations, we see that

x1 = 2
x2 = 3
x3 = 1

so the solution is (2, 3, 1). Its always a good idea to check this!

10
This process will work for any system.

## Example 1.32. Infinitely many solutions:

1 3 9 7
1 2 6 5
2 7 21 16

1R1

1 3 9 7
1 2 6 5
2 7 21 16
1R1 + R2
2R1 + R3

1 3 9 7
0 1 3 2
0 1 3 2
1R2

1 3 9 7
0 1 3 2
0 1 3 2
1R2 + R3

1 3 9 7
0 1 3 2
0 0 0 0
Reduced echelon form:
1 0 0 1
0 1 3 2
0 0 0 0
Example 1.33. No solutions.
10 3 5
3 1 3
5 2 4
1/10R1
3
21

1 10
3 1 3
5 2 4
3R1 + R2
5R1 + R3
3
21

1 10
1
0
10 29
0 12 13
2

11
10R2
3
21

1 10
0 1 45
0 12 13
2
1/2R2 + R3
3
21

1 10
0 1 45
0 0 16
1/16R3
3
21

1 10
0 1 45
0 0 1
Reduced echelon form:
1 0 0
0 1 0
0 0 1
Lets make a few observations about systems in reduced echelon form.

1. In reduced echelon form, if the last row looks like [0 . . . 0 | 1], that translates to the
equation 0 = 1, which is FALSE, so the system has NO SOLUTIONS.

2. If there is not a row like that, then either each column has a leading one, meaning there
is exactly one solution, or there is at least one column without a leading one, meaning it
is free and can take any value, so there are infinitely many solutions.

3. Combining the two statements above, weve actually proven Theorem 1.11, because these
are the only possibilities in reduced echelon form.

4. If there are more variables than equations (more columns than rows in the matrix), then
each column CANNOT have a leading one. This means the system has either no solutions
or infinitely many, but it cannot have exactly one.

Now we introduce special type of linear systems that ALWAYS have solutions.
Definition 1.34. A linear system is homogeneous if all of the constant terms are equal to 0,
i.e. it looks like

## a11 x1 + a12 x2 + + a1n xn = 0

a21 x1 + a22 x2 + + a2n xn = 0
..
.
am1 x1 + am2 x2 + + amn xn = 0.

Note that these systems always have a solution, namely (0, 0, . . . 0), called the trivial so-
lution. There may be (infinitely many) other solutions, but we at least know the system is
consistent.

12
1.3
SKIP

1.4 Applications
1.4.1 Traffic flow
Arcata, on the northern coast of California, is a small college town with a central plaza (Figure
1). Figure 2 shows the streets surrounding and adjacent to the towns central plaza. As indicated
by the arrows, all streets in the vicinity of the plaza are oneway.

Figure 2 Traffic volumes around the Arcata plaza. Traffic flows north and south on G and H
streets, respectively, and east and west on 8th and 9th streets, respectively. The number of
cars flowing on and off the plaza during a typical 15-minute period on a Saturday morning is
also shown. Our goal is to find x1 , x2 , x3 , and x4 , the volume of traffic along each side of the
plaza. The four intersections are labeled A, B, C, and D. At each intersection, the number of
cars entering the intersection must equal the number leaving. For example, the number of cars
entering A is 100 + x1 and the number exiting is 20 + x2 . Since these must be equal, we end up
with the equation

A : 100 + x1 = 20 + x2
Applying the same reasoning to intersections B, C, and D, we arrive at three more equations,

B : x3 + 30 = x1 + 100
C : x2 + 25 = x3 + 95
D : x3 + 75 = x4 + 15.
Rewriting the equations in the usual form, we obtain the system:

13
To solve the system, we populate an augmented matrix and transform to echelon form.

1 1 0 0 80
1
0 0 1 70

0 1 1 0 70
0 0 1 1 60
1R1 + R2

1 1 0 0 80
0
1 0 1 10

0 1 1 0 70
0 0 1 1 60
1R2 + R3

1 1 0 0 80
0
1 0 1 10

0 0 1 1 60
0 0 1 1 60
1R3

1 1 0 0 80
0
1 0 1 10

0 0 1 1 60
0 0 1 1 60
1R3 + R4

1 1 0 0 80
0
1 0 1 10

0 0 1 1 60
0 0 0 0 0
Reduced echelon form:
1 0 0 1 70
0
1 0 1 10

0 0 1 1 60
0 0 0 0 0
The last coordinate x4 = s1 is free. Back substitution yields the general solution
x1 = 70 + s1 , x2 = 10 + s1 , x3 = 60 + s1 , x4 = s1 .
where s1 is a free parameter. A moments thought reveals why it makes sense that this system
has infinitely many solutions. There can be an arbitrary number of cars simply circling the plaza,
perhaps looking for a parking space. Note also that since each of x1 , x2 , x3 , and x4 must be
nonnegative, it follows that the parameter s1 70. The analysis performed here can be carried
over to much more complex traffic questions, or to other similar settings, such as computer
networks.

14
1.4.2 Data fitting
Given a set of ordered pairs we want to find a curve that goes through all of them.
Example 1.35. Find a quadratic through (0, 1), (1, 2) and (2, 7).

y = a1 x2 + a2 x + a3 .
So we have a system:

a3 = 1
a1 + a2 + a3 = 2
4a1 + 2a2 + a3 = 7

0 0 1 1
1 1 1 2
4 2 1 7
R1  R2

1 1 1 2
0 0 1 1
4 2 1 7
4R1 + R3

1 1 1 2
0 0 1 1
0 2 3 1
R2  R3

1 1 1 2
0 2 3 1
0 0 1 1
1/2R2

1 1 1 2
0 1 3 1
2 2
0 0 1 1
Reduced echelon form:
1 0 0 2
0 1 0 1
0 0 1 1
Example 1.36. Find the plane equation going through (1, 1, 0), (1, 2, 1) and (0, 0, 1).

a1 x1 + a2 x2 + a3 x3 b = 0.

1 1 0 1 0
1 2 1 1 0
0 0 1 1 0

15
1R1 + R2

1 1 0 1 0
0 1 1 0 0
0 0 1 1 0
Reduced echelon form:
1 0 0 2 0
0 1 0 1 0
0 0 1 1 0

a1 = 2b, a2 = b, a3 = b.
So we have infinitely many solutions. But what is one solution? Well, set b equal to anything
nonzero. For instance, if b = 1 then we have

2x1 x2 + x3 1 = 0

## is a plane containing those points.

16
2 Euclidean space
2.1 Vectors
Definition 2.1. A vector is an ordered list of real numbers (called components) arranged in
a vertical column or in wedge brackets h i.

Example 2.2.
u1
u2 1
2 h1, 2, 3i

..
.
3
un
We need to know where vectors live.

Notation 2.3. We write a R to mean that a is a real number. Often we call a a scalar.
Vectors on the other hand live in higher-dimensional space. The set of all n-dimensional vectors
is denoted by Rn , here R is the set of all real numbers. We write ~u Rn .

u1 v1
u2 v2
Definition 2.4. Let ~u, ~v Rn . That is ~u = . and ~v = . .

.. ..
un vn

~u = ~v ui = vi for all 1 i n.

u1 + v1
u2 + v2
~u + ~v =

..
.
un + vn

u1 cu1
u2 cu2
c~u = c . = .

.. ..
un cun

3 3 11
Example 2.5. 2 + 2 2 = 4
0 0 6

Vectors have all the nice properties that we associate to scalars. Like c(~u~v )+w ~ uc~v .
~ = w+c~
See the book for a list and proof of all of these properties. There are other vector operations.
You might remember the dot product from multi-dimensional calculus.

## Definition 2.6. The dot product of two vectors ~u, ~v is defined as ~u ~v = u1 v1 + un vn .

Example 2.7.
3 3
2 2 = 9 + 4 + 1 = 14.
1 1

17

c1
.
Notation 2.8. The dot product allows us write linear equations compactly. Fix b. Call ~c = ..
cn

x1
.
and ~x = .. then
xn

c1 x1 + cn xn = b is the same as ~c ~x = b.

We are interested in something that looks similar to a dot product, but involves summing
scalars times vectors.

Definition 2.9. Given c1 , . . . , cm R a linear combination of vectors ~u1 , . . . , ~um is the vector

c1 ~u1 + cm ~um .

Notation 2.10. Given vectors ~u1 , . . . , ~um it is convenient to write the matrix with columns
corresponding to these vectors as [~u1 ~um ].

We wont elaborate on this much now, but this notation allows us to concisely describe a
linear combination.

b1 c1
. .
Fact 2.11. Fix ~b = .. . Call ~c = .. and A = [~u1 ~um ] with ~ui Rn . We will see
bm cn
later that a linear combination is the same as A~c, and that a system of equations is the same as
A~c = ~b.

General solutions can be written in terms of linear combinations of vectors. This is called
the vector form of the general solution.

## Example 2.12. The Arcata traffic example had solutions:

x1 = 70 + s1 , x2 = 10 + s1 , x3 = 60 + s1 , x4 = s1 .

## We could instead write this solution set as

70 1
10 1
60 + s1 1 .
~x =

0 1

Well see that there are several other ways to represent solutions.

## 2.1.1 Geometry of vector operations

There are visual ways to understand the vector operations:

18
Application 2.13. Vector graphics such as PDF and CAD use vectors to make scalable
images. Rather than recording image pixel data, these record everything as either a line, circle,
ellipse, curve with start and end points. Rescaling is then multiplying by a constant. So vector
graphics can be scaled infinitely! Basically the file has 1000s of instructions that read like:

Points in a vector graphic are described by a vector describing their position from the
origin.

## Draw a dot at point b

Connect the points together with a straight line (or curve) of a certain thickness

## All of the letters in this pdf are recorded as vectors!

Advantage: The size of the file does not change if the image needs to be largerthe same
instructions will make the image as large as you like, and yet the line would always remain
smooth and clear. Well see later that rotations, reflections and translations are very easy with
vectors.
Simple geometric shapes such as letters, autoshapes and logos are ideal for storing as vector
images. Some images are not suitable for storing as a vector image. For example, a detailed
photograph would need so many vector instructions to recreate it, that it is better to store it
as, say, a BITMAP.

19
Example 2.14. This is stored as a vector graphic consisting of three lines. Each line is
stored as a vector and a base point for the vector. Suppose that the base of the arrow is at the
origin and has length 4.
Describe the vector and base point of the three lines in the arrow. (There are many choices
that could work.) Then, rescale it to be 4 times larger in size. What would make it point left
instead of right?  
4
Solution: The line from base to tip is rooted at (0, 0) with the vector . The two points
0
   
1 1
of the arrows are the vectors , based at (4, 0). To rescale we multiply everything by
1 1
4. Multiplying everything by 1 would change the direction.

2.2 Span
Thinking of vectors as building blocks. The span is the set of all points we can build with our
vectors. Better yet, think of vectors like your favorite smoothie ingredients. Imagine you want
to blend them together to hit a certain nutrient profile (fat, protein, carbs, iron, calcium, . . .).
Studying the span asks what profiles we can make with the vectors and which ones we cannot.
We expand on this here with an example in R2 .

2.2.1 Introduction

x1
x2 n
Question 2.15. How many vectors, ~u1 , ~u2 , . . . , ~um does it take so that for any ~x = ... in R

xn
we can find c1 , c2 , . . . , cm R so that c1 ~u1 + c2 ~u2 + + cm ~um = ~x?
More succinctly, how many vectors do we need to express every point in Rn as a linear
combination?
   
2 1 1
Example 2.16. Lets look at R . Imagine we have vectors ~u = and ~v = and we want
2 0
to use them to get to the point (5, 6). With ~v we can adjust   our horizontal position as we please.
3
So, we first use ~u to get to height 6. This is at 3~u = . Next, we need to shift back by 1 in
6

20
the x-coordinate. This requires ~v . So
 
5
3~u ~v = .
6

In this manner we can express any point in R2 using ~u and ~v . To see this, take (x, y) R2
and solve
 
x
c1 ~u + c2~v = .
y
This is equivalent to

c1 + c2 = x
2c1 = y.

## So, c1 = y/2 and c2 = x y2 .

2 1
Example 2.17. What if we take ~u1 = 1 and ~u2 = 2 and ask where we can go with these?
1 3
We can go to any point
2x1 + x2
x1 ~u1 + x2 ~u2 = x1 + 2x2 .
x1 + 3x2
Notice that we cannot form (1, 0, 0) with these.

2 1 1 1 2 0
1 2 0 0 1 0
1 3 9 0 0 1
So we cant make everything with just these two vectors.

Given a set of vectors in Rn we are interested in the set of all points we can make with linear
combinations. This is called the span.

Definition 2.18. The span of vectors ~u1 , . . . , ~um Rn is the set of all linear combinations

## c1 ~u1 + c2 ~u2 + + cm ~um ,

where c1 , . . . , cm R.

2.2.2 Properties
Whether or not a matrix has a solution tells us whether a vector belongs to the span.

Theorem 2.19. Let ~u1 , . . . , ~um Rn . Then v span{~u1 , . . . , ~um } if and only if the matrix
equation [~u1 ~um | ~v ] has at least one solution.

## Theorem 2.20. Let S = span{~u1 , . . . , ~um } if ~u S then S = span{~u, ~u1 , . . . , ~um }.

21

1 1 1
Example 2.21. Is span 1 , 0 , 0 = R3 ?
2 1 0

Call these vectors ~u1 , ~u2 , ~u3 . Given (a, b, c) R3 we want to find x1 , x2 , x3 so that

a
x1 ~u1 + x2 ~u2 + x3 ~u3 = b .
c
This gives matrix
1 1 1 a 1 0 0 b
1 0 0 b 0 1 0 2 b + c
2 1 0 c 0 0 1 a 3b + c
Yes!

1 3 2
Example 2.22. Is span 1 , 1 , 1 = R3 ?
1 1 0

Lets setup a matrix and reduce.
1 0 21 12 a + 32 b

1 3 2 a
1 1 1 b 0 1 12 1
2a 2b
1
1 1 0 c 0 0 0 a 2b + c
No! It is described by the plane a 2b + c = 0.
Theorem 2.23. If m < n then ~u1 , . . . , ~um Rn do not span Rn . If m n then maybe they
span Rn .
Proof: Idea for m < n: Row reduce to get to a row of all zeros. Keep track of the operations.
Augment, setting the all zero row equal to 1. Reverse the operations. The new augmented
vector is not in the span. Heres a worked out example from the book:

22
For m n. The vectors ei = (0, . . . , 1, 0, . . . , 0) (1 in the ith place) span Rn . But if was take
2n copies of just say e1 , this doesnt span Rn .


2.2.3 Ax = b
With spans we are interested in solving vector equations. It will be useful to have a compact
way to express these.

Definition 2.24. Let ~a1 , . . . , ~am Rn be vectors. Set A = [~a1 ~am ]. Let ~x = (x1 , . . . , xm )
with xi R. We have
A~x = x1 ~u1 + xm ~um .

Example 2.25. Kyle pays very close attention to what he eats post workout, and his macronu-
trient needs vary greatly from week to week. He typically blends a smoothie with yogurt, banana,
avocado and tofu. These have macro-nutrient ratios per 100g as

Y B A T
Protein 9 1 2 5
.
Fat 5 1 15 3
Carb 4 22 1 2

x1
x2
If A is the nutrition matrix, then ~x =
x3 is the recipe Kyle follows, and A~x is the resulting

x4
smoothie.

Note that A needs to have the same number of columns as ~x has rows.

0 0 1
Example 2.26. Let A = 1 1 1 . The system
4 2 1

1
A~x = 2

7

0 0 1 1
1 1 1 2 .
4 2 1 7

## 2.3 Linear independence

We are interested in detecting how a set of vectors relate to one another. In particular if we can
get by with fewer vectors. We start with an abstract definition, then give an example.

23
Definition 2.27. A set of vectors ~u1 , . . . , ~um Rn is linearly independent if the only solution
to
x1 ~u1 + + xm ~um = ~0
is the trivial solution ~x = ~0 (x1 = 0, x2 = 0, . . . , xm = 0). Otherwise the set is linearly
dependent. Another way to say this is that the homogeneous equation A~x = 0 has only
one solution.

1 1 1
Example 2.28. Is 0 , 1 , 1 linearly independent?

0 0 2

## Ie., for x1 , x2 , x3 does

1 1 1 0
x1 0 + x2 1 + x3 1 = 0?

0 0 2 0
We reduce the corresponding augmented matrix to echelon form,

1 1 1 : 0 1 0 0 : 0
0 1 1 : 0 0 1 0 : 0
0 0 2 : 0 0 0 1 : 0
So our only solution is x1 = x2 = x3 = 0, therefore the set of vectors is linearly independent.
Notation 2.29. Notice that when row reducing matrices for homogeneous equations the aug-
mented column doesnt change. So, its okay to omit it, and add it at the end.
This is a big deal in computing efficiency, but we will look at a more pragmatic example
first.
Example 2.30. Kyle pays very close attention to what he eats post workout, and his macronu-
trient needs vary greatly from week to week. He typically blends a smoothie with yogurt, banana,
avocado and tofu. These have macro-nutrient ratios per 100g as

Y B A T
Protein 9 1 2 5
.
Fat 5 1 15 3
Carb 4 22 1 2
Kyle is curious whether he can cut down on ingredients and still hit any macro-goal he might
have.

24
Say that (y, b, a, t) is the amount of each ingredient he uses. He reduces this matrix:

y b a t 745

9 1 2 5 1 0 0 1261 0
0 1 0 12 0 .
9 1 15 3

1261
2
0 0 1 13 0
4 22 1 2
Letting s = t, this has solution set
745 12 2
( s, s, s, s).
1261 1261 13
Set s = 1261 and we have

## Thus we can write

1
Y = (12B + 184A + 1261T ) .
745
So for any unit of yogurt, we can instead use the blend described above. This means we can
eliminate yogurt from the ingredient list and still make the same amount of macro-decomposition
smoothies!
We further observe that if we were to similarly try to eliminate bananas, we would be left
with the equation
1
B= (745Y 184A 1261T ) .
12
The negative coefficients on the right hand side make this substitution impossible in practice.
The same is true for avocados and tofu.

Theorem 2.31. If n < m, then ~u1 , . . . , ~um Rn are linearly dependent. If m n, then they
might be linearly independent.

## Proof: The vector equation

x1 ~u1 + xm~vm = ~0
is a homogeneous linear system of n equations and m unknowns. Since m > n, there must
be infinitely many solutions, and therefore a nontrivial solution, meaning the set of vectors is
linearly dependent. 

## 2.3.1 Spans and linear independence

Span and linear independence, while different concepts, are fundamentally related. If we consider
a set of two vectors {~u1 , ~u2 }, the set is linearly dependent precisely when ~u1 is a scalar multiple
of ~u2 or vice-versa. But this just corresponds to ~u1 span{~u2 }. This connection was hinted at
in our macronutrient example, and we now generalize it.

Theorem 2.32. Let {~u1 , . . . , ~um } be a set of vectors in Rn . The set is linearly dependent if
and only if one of the vectors is in the span of the other vectors.

25

1 1 0 2

0 , , , 0 linearly dependent?
2 1

Example 2.33. Is the set of vectors
3 1 2
0

1 0 1 4

Answering this question corresponds to solving a homogeneous system, and to save time
we may omit the column of zeros in our augmented matrix. Converting the matrix to reduced
echelon form,
1 1 0 2 1 0 0 0
0 2 1 0 0 1 0 2
0 3 1 2 0 0 1 4 .

1 0 1 4 0 0 0 0
Letting x4 = t, the solution set to x1 ~u1 + x2 ~u2 + x3 ~u3 + x4 ~u4 = ~0 is (0, 2t, 4t, t). In particular,
this tells us that ~u1 is not in span{~u2 , ~u3 , ~u4 }. This example shows an important caveat of the
above theorem, that a set being linearly dependent only implies that some vector is in the span
of the others, but this need not be true for every vector.

## Another connection between the concepts is evidenced in the following theorem.

Theorem 2.34. Let{~u1 , . . . , ~um } be a set of vectors in Rn . Then {~u1 , . . . , ~um } is linearly
independent if and only if the system [~u1 ~un : ~v ] has at most one solution for every ~v Rn .

So geometrically speaking, if its possible to reach a vector ~v with a linearly independent set
{~u1 , . . . , ~um }, then the displacement along ~u1 , . . . , ~um is always the same.
When m = n, we have a much stronger statement.

Theorem 2.35. Let {~u1 , . . . , ~un } be a set of vectors in Rn . The following are equivalent:

## 3. The system [~u1 ~un | ~v ] has a unique solution for every ~v Rn .

26
3 Matrices
Lets review the story so far. We have been interested in solving systems of linear equations
A~x = ~b where we try to solve for ~x in terms of A and ~v . We studied linear independence and
span to understand better when we have solutions. To do this we introduced a bit of matrix
algebraexplaining how a matrix acts on a vector. Here we look at this from a more geometric

Example 3.1. You have a picture of a cat that is winking, and want it to cock its head to the
side in a cute way. How do you animate this?

Example 3.2. For an art project you are going to film the shadows of two statues over the
course of a 1 hour period. You want the statues to give each other a high-5 at the end of the
hour.

You need to calculate how the shadows move across the (say perfectly flat) pavement. The
sun at any given instant can be thought of as a function from R3 to R2 . We are interested in
characterizing properties of this function.

27
3.1 Linear Transformations
When we take an n m matrix A and multiply it by a vector in ~v Rm we get a linear
combination of the column vectors in A. This new vector, A~v , lives in Rn . So, we can think of
A as a function that takes vectors in Rm to vectors in Rn .

## A~x = ~b to mean [A | ~b].

Example 3.4.
1 2 1 x1 2
A = 0 1 3 , ~x = x2 ,
~b = 0 .
1 2 5 x3 7

## Then A~x = ~b means

1 2 1 x1 2
0 1 3 x2 = 0 ,
1 2 5 x3 7
This is the same as
1 2 1 : 2
0 1 3 : 0 .
1 2 5 : 7

Fact 3.5. A nice general form of the above is if A = [~a1 ~am ] then

## A~x = ~a1 x1 + ~a2 x2 + ~am xm .

m n
 ~x2  R and outputs values f (~x) R .
Notation 3.6. Say that f is a function that takes vectors
x
We will use the f : Rm Rn . For example if f (x) = then f : R R2 . Often we call f
x+1
a map.

Definition 3.7. The domain of a function is what can be inputted in the function. The image
is the result of the function. We call the range the set of images. The codomain is the space
where images live.

Notation 3.8. We denote the domain and codomain of a function f by: Dom(f ), and Cod(f ).
Also, we let Range(f ) denote the range.

3.1.1 Basics
Definition 3.9. A function T : Rm Rn is a linear transformation if for all vectors ~u, ~v Rm
and scalars r we have

## 2. T (r~u) = rT (~u) (called scaling)

Fact 3.10. T (~0) = ~0. This is because of linearity: T (~u) = T (~u + 0) = T (~u) + T (~0).

28

  x+y
x
Example 3.11. T = x is a linear transformation that takes R2 R3 , but
y
y
   
x xy
T = is not a linear transformation because it isnt linear.
y y
This may look daunting, but it is a property satisfied by matrices. That is for any n m
matrix A and vectors ~x, ~y Rn we have
1. A(~x + ~y ) = A~x + A~y
2. A(r~x) = r(A~x).
This will be our focus for this chapter. We record this fact as a theorem.
Theorem 3.12. Let A = [~a1 ~am ] be a matrix with ~ai Rn and define T (~x) = A~x. Then
T : Rm Rn is a linear transformation.

1 2
Example 3.13. Let A = 2 3 and define T (~x) = A(~x). Then T : R2 R3 is a linear
0 1
 
2 3 1
transformation. The domain is R and the range is R . Let ~u = . Then
3

1 2 7
A~u = 1 2 + 3 3 = 7 .
0 1 3

7 a
So 7 is the image of ~u under T . But is any vector b in the range of T ? That is, does
3 c

  a
x
there exist ~x = 1 such that T (~x) = b ? Well this would mean
x2
c
" # x 2x x 2x a
1 2 x
1 2 1 2
2 3 1 = ?
2x1 + 3x2 = 2x1 + 3x2 = b .

0 1 x2
0 x2 x2 c
No, x2 = c, x1 = (b 3c)/2. So a = (b 3c)/2 2c, which means it isnt free.
Or, we could have row reduced this system and seen it was inconsistent for some values of
a
b .
c

1 2 a
0 1 27 a + 17 b
0 0 27 a 17 b + c
Theorem 3.14. Let A = [~a1 ~am ] be an n m matrix. and let T : Rm Rn with T (~x) = A~x
a linear transformation. Then,
~ Range(T ) if and only if T (~x) = Aw
1. w ~ is a consistent linear system.
2. Range(T ) = span{~a1 , . . . , ~am }.

29
3.1.2 What does T tell us?
Theorem 3.15. For any linear transformation T there is a matrix A so that T (~x) = A~x.

0
.
Proof: Let ~ei = ..1 with a 1 in the ith place. The matrix A = [T (~e1 ) T (~em )] is the matrix
..
.
we want. Any ~x Rm can be written as x1~e1 + xm~em . We then have

## T (~x) = T (x1~e1 + xm~em ) = T (x1~e1 ) + T (xm~em ) = x1 T (~e1 ) + xm T (~em ) = A~x.


So, we are interested in properties of T . Here are a few things one can say about any function:

Definition 3.16. We say that T is one-to-one if T (~u) is unique for all ~u in the domain. We
say that T is onto if every ~v in the codomain is equal to T (~u) for some ~u in the domain.

## 2. T is onto if span{~a1 , . . . , ~am } = Rn .

Theorem 3.18. Let A = {~a1 ~an } and T : Rn Rn be given by T (~x) = A~x. The following
are equivalent:

1. A spans Rn .

2. A is linearly independent.

## 3. A~x = ~b has a unique solution for all ~b Rn .

4. T is onto.

5. T is one-to-one.

30
3.1.3 Geometry of Linear Transformations
There are three basic transformations: Dilations, Rotations and Translations. Lets look at
these three examples in more depth.
Before we can describe a dilation, we need some notion of the size of a vector.
p
Definition 3.19. Given ~x Rn we let k~xk = x21 + x22 + + x2m .
Example 3.20. A dilation by a factor of r R+ is just the map T (~x) = r~x. Notice that
kr~xk = rk~xk so we
 are indeed scaling by
 r. Can
 you think of the matrix attached to this map?
r 0 x1 rx1
It is Ar = . This is because Ar = . Notice that his works in any dimension!
0 r x2 rx2
Also, note that if we want to scale in two directions differently we multiply by the matrix
 
r 0
Ar,s = .
0 s

Some food for thought... what happens if r < 0? Well, we still scale, but something else
geometric also happens.
 
2 cos sin
Example 3.21. A rotation of every vector by in R is given by the matrix A = .
sin cos
Lets see why with the vector h1, 0i
   
1 cos ()
A = .
0 sin ()

 
x
If we think about a general vector then the trig is a little trickier. This is a CAMPS exercise.
y
If youre curious, here is the resulting vector:
      
x cos () sin () x x cos () y sin ()
A = = .
y sin () cos () y y cos () + x sin ()

Say we want to rotate in Rd about one of the axes x, y, or z by . We use the matrices
(respectively):

1 0 0 cos 0 sin cos sin 0
Ax, = 0 cos sin Ay, = 0 1 0 Az, = sin cos 0 .
0 sin cos sin 0 cos 0 0 1

31
More complicated rotations are possibly by multiplying these matrices together, but we wont
get into that now.
 
x
Example 3.22. Say we want to translate a vector, R2 , over 3 and up 2. The natural
y
   
x x+3
thing to try would be T = . But this isnt a linear transformation, because
y y+3
T (r~x) 6= rT (~x). It turns out that to make this work we need to think of R2 as a plane sitting
inside R3 .  
1
Thinking of vectors as points, lets start with the vector ~e1 = . We would like to translate
0
it over 3 and up 2 to get  
3
.
2
Weve seen that this isnt possible with a linear transformation from R2
R2 . The fix is to go
  x
3 x
up a dimension and map the plane into Z via the function f = y . Note this isnt a

y
1
~ ~
linear transformation because f (0) 6= 0. Can you see where in R this takes R2 ? This gives
3

1
f (~e1 ) = 0 .
1

1 0 a
Now consider the matrix Aa,b = 0 1 b .
0 0 1
We have
x 1 0 a x x+a
Aa,b y = 0 1 b y = y + b
1 0 0 1 1 1
This works because this is whats called a shear. Think of pushing the sides of this slinky:

32
3.2 Matrix Algebra
3.2.1 The basics
Definition 3.23. Let c R. Let A = [~a1 ~am ] with ~ai Rn . We can write the row vectors of
A as ~ar1 , ~ar2 , . . . , ~arn . This lets us write A alternatively as
r
~a1
~ar2
A= ... .

~arm

Note that A is an n m matrix. Let B = [~b1 ~bk ] with ~bi Rm . Notice that B is an m k
matrix. We define addition, scalar multiplication and matrix multiplication:

1. A + B = [~a1 + ~b1 ~am + ~bm ] This only works if A and B have the same

2. cA = [c~a1 c~am ]

r

## 3. AB = ... .. where ~a ~b is the dot product.

.
~arn ~b1 ~arn ~b2 ~arn ~bk
Explicitly, the entry in the ith row and jth column of the product matrix is given by ~ari ~bj .
The resulting matrix is n k.

Fact 3.24. Say that A is n m and B is j k. We can only multiply A and B if m = j. The
resulting matrix is n k.

## Example 3.25. A is n m and ~x Rm then A~x is a n 1 matrix and thus in Rn .

Example 3.26.
 1 2 3
  
1 1 2 4 8 3
1 2 0 = .
1 0 1 0 0 3
1 2 0
Note that if we switched the matrices then the product would be undefined.

~ m ].

## Notation 3.28. If S : Rm Rk and T : Rk Rn are linear transformations, then we write

W = T S if W : Rm Rn is the linear transformation given by

W (~x) = T (S(~x)).

## This is called function composition.

If we have matrices A and B such that

## S(~x) = B~x and T (~y ) = A~y ,

then
W (~x) = A(B~x) = (AB)~x.

33
Example 3.29. Let T : R2 R2 be the linear transformation that rotates a vector ~x be . Let
 
cos sin
A = .
sin cos

T T : R2 R2

## will rotate by + . This has corresponding matrix A A .

Our first example showed matrices for which AB was defined, but not BA. In general, even
when both products are defined they are not the same.
Example 3.30.      
1 2 1 1 7 5
=
3 0 4 2 3 3
     
1 1 1 2 2 2
=
4 2 3 0 10 8
Moreover, if A is n m and B is m n, then AB is n n and BA is m m. So the two products
may not even have the same dimensions.
Definition 3.31. We define the following direction vectors of Rm ,

1 0 0
0 1 0
~e1 = . , ~e2 = . , , ~em = . .

.. .. ..
0 0 1

## The m m identity matrix is denoted by Im = [~e1 ~em ].

Although AB is not necessarily equal to BA, there are several properties of matrix multipli-
cation that coincide with the behavior weve come to expect from multiplication.
Fact 3.32. Let A, B and C be matrices. It holds that
1. A(BC) = (AB)C

2. A(B + C) = AB + AC

3. (A + B)C = AC + BC

## 4. s(AB) = (sA)B = A(sB)

5. AI = A

6. IA = A
The one we choose presently concerns identity matrices.
Theorem 3.33. Let A be an n m matrix, then

In A = A = AIm .

34
Equally important are the ways in which matrix algebra differs from the algebra of the real
numbers. Its important to keep these rules in mind to prevent erroneous mistakes.
Theorem 3.34. Let A, B, and C be nonzero matrices.
(a) It is possible that AB 6= BA.
(b) AB = 0 does not imply that A = 0 or B = 0.
(c) AC = BC does not imply A = B or C = 0.
Example 3.35. Weve already seen an example of (a). For part (b) we consider the product
    
1 2 2 2 0 0
= .
2 4 1 1 0 0
For part (c) we observe
       
3 1 1 1 5 5 5 0 1 1
= = .
2 1 2 2 4 4 0 2 2 2

## 3.2.2 Transpose of a matrix

Sometimes it is convenient to switch the rows and the columns. This is called taking the transpose
of a matrix.
Definition 3.36. Let A be a matrix. We call the matrix that puts the columns of A into the
rows, and the rows of A into the columns the transpose of A. This is denoted by AT .
Notice that if A is n m then AT is m n.
Example 3.37.
1 0 0
0 1 0 1 3
3 3 0 3 1 3

1
has transpose
1 3
0 3 3 3
3 3 3
Transposes have several properties:
Fact 3.38. Let A and B be n m matrices and C and m k matrix. Let s be a scalar. It
holds that
1. (A + B)T = AT + B T
2. (sA)T = sAT
3. (AC)T = C T AT .
Definition 3.39. A matrix A is symmetric if A = AT . It is skew-symmetric if A = AT .
Example 3.40.
2 1 0 2 1 0
A= 1 1 0 and AT = 1 1 0
0 0 2 0 0 2

0 2 1 0 2 1
T
A = 2 0 4 and A = 2 0 4
1 4 0 1 4 0

35
Example 3.41. Two neat applications of skew symmetric matrices to physics are as follows:

1. The cross product matrix B with the property that B~x = ~b ~x.

0 b3 b2
b3 0 b1
b2 b3 0

is skew-symmetric.

## 2. The inertia matrix for a collection of n moving particles is also skew-symmetric, Pi , at

~ and let ~ri = ~ri C. Then I =
points ~ri with masses mi . Call the center of mass C
[~r1 ~rn ] is skew symmetric.

## 3.2.3 Powers of Matrices

If A is an n n then A A is also n n, as is A A A, A A A A, etc. It therefore makes
sense to talk about powers of square matrices, and in general

Ak = |A A{z
A} .
k terms

## Example 3.42. Consider a binary tree with 7 vertices.

We can encode this graph with an adjacency matrix where we put a 1 for any two vertices
that have an edge between them.

A B D E C F G

A 0 1 0 0 1 0 0

B 1 0 1 1 0 0 0

D 0 1 0 0 0 0 0
Q=

E 0 1 0 0 0 0 0

C 1 0 0 0 0 1 1

F 0 0 0 0 1 0 0
G 0 0 0 0 1 0 0
A neat feature of matrix multiplication is that Qk counts exactly how many paths of length k
between two vertices there are. The way to see this is to realize that when we take the dot
product of say the B row with the B column we will get a +1 for each connected vertex. Thus,

36
the result is exactly how many ways one can get from B back to itself in 2 steps. This reasoning
continues for higher powers.

2 0 1 1 0 1 1 0 4 0 0 4 0 0 8 0 4 4 0 4 4
0 3 0 0 1 0 0 4 0 3 3 0 1 1 0 10 0 0 6 0 0

1 0 1 1 0 0 0 0 3 0 0 1 0 0 4 0 3 3 0 1 1
2 3 4

Q = 1 0 1 1 0 0 0 , Q = 0 3 0 0 1 0 0 , Q =

4 0 3 3 0 1 1

0 1 0 0 3 0 0 4 0 1 1 0 3 3 0 6 0 0 10 0 0

1 0 0 0 0 1 1 0 1 0 0 3 0 0 4 0 1 1 0 3 3
1 0 0 0 0 1 1 0 1 0 0 3 0 0 4 0 1 1 0 3 3

## So something larger like Q13 contains some pretty non-trivial information:

A B D E C F G
A 0 4096 0 0 4096 0 0

B 4096 0 2080 2080 0 2016 2016

13
D 0 2080 0 0 2016 0 0
Q =
E 0 2080 0 0 2016 0 0

C 4096 0 2016 2016 0 2080 2080

F 0 2016 0 0 2080 0 0
G 0 2016 0 0 2080 0 0
Example 3.43. Suppose the internet was simpler and consisted of 6 webpages linked together
as follows:

Bryan is surfing the web and randomly clicking on links. Another term for this is random
walk. He starts on page 1. What is the probability that Brian is at page 5 after 10 clicks?

37
We can describe how a random walk behaves after 1 click with the transition probability
matrix:

0.0 0.5 0.5 0.0 0.0 0.0
0.5 0.0 0.5 0.0 0.0 0.0

0.0 0.5 0.0 0.5 0.0 0.0
P = 0.0

0.0 0.0 0.0 0.5 0.5

0.0 0.0 0.5 0.5 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0
If we start at website #1 and ask where we can transition this is like computing P ~e1 . Now, to
figure out the next step this is like P (P ~e1 ). This continues so that the answer to our question
is look at the 5th entry of P 10~e1 .

0.076 0.151 0.225 0.22 0.22 0.108 .076
0.075 0.152 0.225 0.22 0.22 0.108 .151

10
0.076 0.148 0.223 0.225 0.217 0.111 10
.225
P =
, P ~e1 =
.
0.073 0.146 0.217 0.22 0.228 0.116 .22

0.07 0.149 0.225 0.228 0.22 0.108 .22
0.076 0.141 0.223 0.217 0.232 0.111 .108
This is .22. For a transition probability matrix P it holds that the matrix P = limk P k
describes the average amount of time a random walk spends at each site. Google actually uses
something very similar to this to rank pages. They meticulously index the web, put it into a
matrix, and the page they show first when you type in a query is the one that has the highest
probability in P . Well give more details about this in Chapter 6.

There are two matrix properties that are preserved when we take powers of matrices.

## Definition 3.44. An n n matrix A is a diagonal matrix if it has the form

a11 0 0
0 a22 0
A= . .. .

. .
. . .
. . . .
0 0 ann

Theorem 3.45. If A is an n n diagonal matrix as above then Ak is also diagonal for all k 1.
In fact, k
a11 0 0
0 ak 0
22
Ak = . .. .

.. ..
.. . . .
0 0 aknn
Definition 3.46. Let A be an n n matrix. We say A is upper triangular if it has the form

a11 a12 a1n
0 a22 a2n
A= . .. .

.. ..
.. . . .
0 0 ann

38
We say A is lower triangular if it has the form

a11 0 0
a21 a22 0
A= . .

.. .. ..
.. . . .
an1 an2 ann
A matrix is triangular is it is either upper or lower triangular.
Theorem 3.47. Let A be an n n upper (lower) triangular matrix and k 1. Then Ak is also
upper (lower) triangular.

3.3 Inverses
In this section we tackle the problem of reversing a linear transformation, to the extent that
its possible. This process has countless applications, including cryptography and 3D transfor-
mations.

3.3.1 Basics
Example 3.48. Let T : R3 R3 be given by T (~x) = A~x, where

1 3 2
A = 2 7 2 .
2 6 5

When we solve T (~x) = ~y for some ~y R3 , we can alternatively interpret this as the inverse of
T for ~y , or T 1 (~y ). To do this we need to solve
x1 + 3x2 + 2x3 = y1
2x1 7x2 + 2x3 = y2
2x1 + 6x2 + 5x3 = y3
Transforming this system into reduced echelon form is equivalent to solving for the xi -s, giving
us the system
x1 = 47y1 + 3y2 20y3
x2 = 14y1 y2 + 6y3
x3 = 2y1 + y3
.
So if we set
47 3 20
B = 14 1 6 ,
2 0 1
then T 1 (~y ) = B~y . In particular, T 1 is also a linear transformation.
Definition 3.49. A linear transformation T : Rm Rn is invertible if T is one-to-one and
onto. When T is invertible, the inverse T 1 : Rn Rm is defined by
T 1 (~y ) = ~x if and only if T (~x) = ~y

39
Theorem 3.50. Let T : Rm Rn be a linear transformation. Then

## (b) If T is invertible, then T 1 is also a linear transformation.

Example 3.51. It is important to keep in mind that m = n is a necessary but not sufficient
requirement for T to be invertible. If T (~x) = A~x with
 
3 6
A= ,
4 8

then the columns of this matrix are linearly dependent, so T is neither one-to-one nor onto, and
therefore not invertible.

## If T : Rn Rn is an invertible linear transformation given by T (~x) = A~x, with T 1 (~x) = B~x,

then for any ~x Rn ,
~x = T T 1 (~x) = T (B~x) = AB~x.


Since this holds for all ~x, it follows that AB = In , the identity matrix. This yields the corre-
sponding notion of invertible matrices.

AB = I.

## Example 3.53. Consider

  1 2 1
1 2
A= and B = 2 5 1
3 5
1 2 0

Since     
1 2 5 2 1 0
= ,
3 5 3 1 0 1
and
1 2 1 2 2 3 1 0 0
2 5 1 1 1 1 = 0 1 0 ,
1 2 0 1 0 1 0 0 1
both A and B are invertible.

Theorem 3.54. Let A be an invertible matrix with AB = I. Then BA = I, and this matrix
B is unique.

Definition 3.55. If A is invertible, then A1 is the inverse of A given by the unique matrix
such that AA1 = A1 A = I.
A square matrix that is not invertible is also called singular.

## Theorem 3.56. Let A and B be n n invertible matrices and C and D be n m matrices.

Then

40
1
(a) A1 is invertible, with A1 =A
(b) AB is invertible, with (AB)1 = B 1 A1
(c) If AC = AD then C = D
(d) If AC = 0nm then C = 0nm
Example 3.57. Inverses are useful for cryptography. Suppose Chuck Norris has a message:
When I slice onions, onions cry.
He can encode this as numbers in a simple way. Letting 0 denote a space, and each other
number the ordinal of the letter a = 1, b = 2, . . ..

23, 8, 5, 14, 0, 9, 19, 12, 9, 3, 5, 0, 15, 14, 9, 15, 14, 19, 15, 14, 9, 15, 14, 19, 0, 3, 18, 25
Lets put it into a 4 7 matrix

23 8 5 14 0 9 19
12 9 3 5 0 15 14
B=
9 15 14 19 15 14 9
15 14 19 0 3 18 25
This could easily be broken though. So, what he does is scramble it with a 4 4 matrix A. Lets
say

0 1 0 1
3 3 3 2
A= 2 4 2 3

3 3 2 2
This has inverse

1 0 1 1
5 2 3 0
A1 =
0 1 0 1
6 2 3 0
So, instead of B, Chuck can send the message AB:

0 1 0 1 23 8 5 14 0 9 19 27 23 22 5 3 33 39
3 3 3 2 12 9 3 5 0 15 14 162 124 104 114 51 150 176
AB =
2 4
=
2 3 9 15 14 19 15 14 9 157 124 107 86 39 160 187
3 3 2 2 15 14 19 0 3 18 25 153 109 90 95 36 136 167
Jimmy (the deputy Ranger) decodes it via the formula
A1 (AB) = (A1 A)B = IB = B.
That is

1 0 1 1 27 23 22 5 3 33 39 23 8 5 14 0 9 19
5 2 3 0 162 124 104 114 51 150 176 12 9 3 5 0 15 14
157 124 107 86 39 160 187 = 9 15 14 19 15 14 9 .

0 1 0 1
6 2 3 0 153 109 90 95 36 136 167 15 14 19 0 3 18 25

41
History: This was invented by Lester Hill in 1929, along with a machine that decoded it,
the Hill Cipher:

## Unfortunately, the basic Hill cipher is vulnerable to a known-plaintext attack because it

is completely linear. An opponent who intercepts n2 plaintext/ciphertext character pairs can
set up a linear system which can (usually) be easily solved; if it happens that this system is
indeterminate, it is only necessary to add a few more plaintext/cipher text pairs. Calculating
this solution by standard linear algebra algorithms then takes very little time.

3.3.2 Finding A1
If A = [~a1 ~an ] and B = [~b1 ~bn ] are matrices such that AB = In , then by breaking down
matrix multiplication column by column tells us.

## A~b1 = ~e1 , A~b2 = ~e2 , ..., A~bn = ~en .

We can then solve for each ~bi with the following augmented matrices

## [~a1 ~an : ~e1 ] [~a1 ~an : ~en ].

However, in solving each of these systems we will perform exactly the same row operations, so
it makes sense to combine it all into one process by setting up one larger augmented matrix:

## [~a1 ~an : ~e1 ~en ].

Once weve transformed the left half to In , provided it is possible to do so, then the right hand
side will be transformed to B = A1 .
In short: We perform Gauss-Jordan elimination on [A : In ] to get [In : A1 ].

42

1 2 3
Example 3.58. Consider A = 0 1 5. To find A1 we will row-reduce [A : I3 ].
5 6 0

1 2 3 : 1 0 0 1 2 3 : 1 0 0
0 1 5 : 0 1 0 0 1 5 : 0 1 0
5 6 0 : 0 0 1 0 4 15 : 5 0 1

1 2 3 : 1 0 0
0
1 5 : 0 1 0
0 0 5 : 5 4 1
2 0 : 4 12 35

1 5
0
1 0 : 5 3 1
4 1
0 0 1 : 1 5 5
0 0 : 6 18 7

1 5 5
0
1 0 : 5 3 1
0 0 1 : 1 45 1
5

## Example 3.59. What does this

process look like when our matrix turns out to not be invertible?
1 1 2
Consider B = 2 1 3, we start the elimination process:
3 1 4

1 1 2 : 1 0 0 1 1 2 : 1 0 0
2 1 3 : 0 1 0 0 1 1 : 2 1
0
3 1 4 : 0 0 1 0 2 2 : 3 0 1

1 1 2 : 1 0 0
0 1 1 : 2 1 0
0 0 0 : 1 2 1

Our process terminates because it is no longer possible to transform the right half into I3 . This
is what will happen for any singular matrix.

Fact 3.60. In the 2 2 case we even have a nice formula for inverses. If
 
a b
A= ,
c d

## then its inverse is given by  

1 1 d b
A = .
ad bc c a
We observe that this formula is undefined when ad bc = 0, and this is precisely when a 2 2
matrix is not invertible! Well explain why in Chapter 5.

## Theorem 3.61. Let A be an n n matrix. Then the following are equivalent:

(a) A is invertible.

(b) A~x = ~b has a uniue solution for all ~b, given by ~x = A1~b.

43
(c) A~x = ~0 has only the trivial solution.

## In wrapping up Chapter 3, we have some additions to our Big Theorem.

Theorem 3.62 (The Big Theorem). Let A = [~a1 ~an ] be an n n matrix and T : Rn Rn
be given by T (~x) = A~x. Then the following are equivalent:

## (c) A~x = ~b has a unique solution for all ~b Rn .

(d) T is onto.

(e) T is one-to-one.

(f) T is invertible.

(g) A is invertible.

44
4 Subspaces
We have studied linear systems of equations, matrices, and linear transformations. Now we are
interested in studying the subsets of Rn that these relate to. It is often the case that the span
of a matrix, or the range of a linear transformation is not all of the codomain.
 
1 2
Example 4.1. T (~x) = ~x has range 0 = 4x + y, or equivalently the line y = 4x.
4 8

## 4.1 Introduction to subspaces

Definition 4.2. A subset S Rn is a subspace if S satisfies the following 3 conditions:
1. S contains 0, that is 0 S.

2. If ~u, ~v S the ~u + ~v S.

## 3. If r R and ~u S then r~u S.

Example 4.3. Let

x+y =6
xy =2
 
4
The solution set is not a subspace because ~0 is not a solution.
2
Example 4.4. Let

x+y =6
5x 5y = 30


6t
The solution set is . Notice that ~0 again is not in the solution set, so the solutions set is
t
not a subspace.

## 4.1.1 Span and nullspace

Theorem 4.5. Let S = span{~u1 , . . . ~um } be a subset of Rn . Then S is a subspace.
This is true because ~0 S, and linear combinations of vectors are in S by the definition of
the span.
There is an algorithm to determine if a set is subspace:

## Step 1. Check if ~0 S. If not, then S isnt a subspace.

Step 2. If you can show that S is generated by a set of vectors, i.e. that is the span of some set of
vectors, then by the previous theorem S is a subspace.

Step 3. Show by hand that the second and third conditions of the subspace definition are or arent
met by S.

45
Example 4.6. S = {~0} and S = Rn are subspaces. We call them the trivial subspaces.

Example 4.7. Let ` be a line in Rn not containing the origin. The ` is not a subspace because
~0
/ `. However, any line L that contains ~0 is a subspace because it can be written as the span
of a single vector.

v1
v2 n
Example 4.8. Let S be the set of all vectors ... in R such that v1 + + vn = 0. Then

vn
~
S is a subspace because (1) 0 belongs to S, and one can check that the second two conditions
hold as well:
(2) ~u, ~v S implies that (u1 + v1 ) + (un + vn ) = 0 so ~u + ~v S.
(3) If ~u S then r(u1 + un ) = 0 so r~u S.

Matrices also have subspaces attached to them. For instance, homogeneous systems A~x = ~0
have solution sets that are subspaces.

## These solution sets have a special name.

Definition 4.10. The set of solutions to A~x = ~0 is called the null space of A. This is denoted
by null(A).

Example 4.11. We give three more intuitive examples of what the Span and null space might
describe:

1. Lets suppose that the matrix A represents a physical system. As an example, lets assume
our system is a rocket, and A is a matrix representing the directions we can go based on
our thrusters. So what do the null space and the span represent?
Well lets suppose we have a direction that were interested in. Is it in our column space?
If so, then we can move in that direction. The column space is the set of directions that
we can achieve based on our thrusters. Lets suppose that we have three thrusters equally
spaced around our rocket. If theyre all perfectly functional then we can move in any
direction. In this case our column space is the entire range. But what happens when a
thruster breaks? Now weve only got two thrusters. Our linear system will have changed
(the matrix A will be different), and our column space will be reduced.

46
Whats the null space? The null space are the set of thruster instructions that completely
waste fuel. Theyre the set of instructions where our thrusters will thrust, but the direction
will not be changed at all.
Suppose a space ship has thrusters which allow it to move in directions

1 0 0 2 1
~u1 = 1 , ~u2 = 0 , ~u3 = 1 , ~u4 = 3 ~u5 = 1 .
1 1 0 2 2

Suppose that each of these vectors has an antipodal thruster that can fire in the opposite
direction. So, it makes sense to apply a negative thrust in the direction. What thrust
amounts, t1 , t2 , t3 , t4 , t5 result in no motion?
We solve the system A~x = ~0 :

1 0 0 2 1 0
1 0 1 3 1 0
1 1 0 2 2 0

and obtain

1 0 0 2 1 0
0 1 0 4 1 0 .
0 0 1 1 2 0

2t4 t5
So, if we choose t4 and t5 , then any thrusting for the first three of the form 4t4 t5
t4 + 2t5
will result in no motion. Note that because the Span is all of R3 we can go anywhere with
our space ship.

2. Or A can represent measurements on investments. The range are all the measurements
achievable by a portfolio. The null space are all the investments that can be made that
wouldnt change the measurements at all.

47

R

An investment has 4 measurements attached to it where R is the average rate of

v
return, is the amount the stock has outperformed the index, is the correlation to the
index, and v is the riskiness or volatility of the stock. An investor can either buy an index
or short it. So, it is possible to have positive and negative linear combinations of the
following five investments:

3 1 14 1 5
1 0 2 1 1
Quascorp: 3; LundinWorx: 1; GruelInc: 11 ; Holtheroni: 1; Bruzzard 2 .

4 11 20 1 3
What stock blends will result in a completely neutral portfolio? We set up an augmented
matrix to solve for the nullspace:

3 1 14 1 5 0
1
0 2 1 1 0

3 1 11 1 2 0
4 11 20 1 3 0

## This row reduces to:

0 586

1 0 0 65 0
54
0
1 0 0 65 0

0 101
0 1 0 65 0
723
0 0 0 1 65 0
So, the blend is determined by how many stocks of Bruzzard we buy.

3. Another example that we wont work out is room illumination. A math professor at
Indiana University once told Matt that the hardest math problem he couldnt solve was
buying lighting for his house (true story). Apparently it is really hard to get the light just
right in a room. The range of A represents the area of the room that can be illuminated.
The null space of A represents the power we can apply to lamps that dont change the
illumination in the room at all.

48
4.1.2 Kernel and range of linear transformations
We can define similar notions for linear transformations.

## Definition 4.12. Let T : Rm Rn be a linear transformation. The kernel of T is the set of

all vectors that T maps to ~0. i.e. all ~x Rm such that T (~x) = ~0. We denote this by ker T .

## Theorem 4.13. ker T and range(T ) are both subspaces.

Both of these can be checked by using the linearity and scaling properties of linear transfor-
mations.

## Example 4.14. Suppose that

  x1 2x2
x1
T = 3x1 + 6x2
x2
2x1 4x2
tells us what direction a space ship moves given its two thrusters fire with x1 and x2 . We can
write T (~x) = A~x with
1 2 1 2
A = 3 6 0 0 .
2 4 0 0
Augmenting with ~0, this is equivalent to the single equation x1 2x2 = 0. So, we can write for
all s R  
2
ker T = s .
1

1
Notice that range(T) = span 3. This is just a line. A thruster redesign might be in order...
2
We can relate the kernel to one-to-one-ness.

## 4.2 Basis and Dimension

Weve talked about how homogeneous solution sets, span and range relate to subspaces. Now
we want to introduce linear independence.

49
4.2.1 Introduction
Example 4.16. Bryan is very health conscious. He not only cares about macro nutrients, but
about the constitution of these macros. Here are the standard recommendations.
Calories 2000
Fat 65 g
Sodium 2,400 mg
Carbohydrate 300 g
Protein 50 g
Let
K
F
S R5

C
P
be the nutrition information of 100g of each food item. Notice that calories are determined by
the formula
K = 9F + 4C + 4P.
Thus, the space of possible nutrition vectors is the subspace W R5 described by all vectors
satisfying
0 = K + 9F + 4C + 4P.
Bryan also has a narrow palate. Suppose he only likes
whole milk, avocado, tahini, orange juice, pineapple and coconut milk.
Can he hit any nutrition goal with just these? What is the least amount of food he can eat
in order to be able to hit any nutrition goal he might have? We first gather the nutrition
information:

Nutrition Facts
Serving Size 100 g

## Amount Per Serving

Calories 179
% Daily Value*
Total Fat 15 g 2%
Sodium 7 mg 8%
Total Carbohydrate 9 g 1%
Protein 2 g

## * Percent Daily Values are based on a

2,000 calorie diet.

50
This gives a matrix of column vectors:

A T O M P C
179 638 44 41 41 232 K

15 54 0 1 1 24 F

7 115 1 44 15 2 S

9 21 10 5 6 1 C
2 17 1 3 2 3 P
Suppose also, that Bryan can have really targeted workouts so he depletes exactly the negative
of any of these nutrition vectors. The echelon form of the above is:
5176 8417
1 0 0 0 18807 12538
0 1 0 0 1741 1705
18807 6269
3690 8741

0 0 1 0
6269 12538
811 9483
6269 12538
0 0 0 1
0 0 0 0 0 0
Note that had we augmented with a generic vector we would have the echelon form:
1 638 44 41 41 232 1

179 179 179 179 179 179 K
0 1 55
8 109
24 109
24
17
2
179 5
96 F 32 K

3611 3379 2060 1791 449 8

0
0 1 4947 4947 1649 6596 F + 19788 K + 4947 S ,
0 0 0 1 6269 12538 12538 C + 41367
811 9483 4947
50152 F 4547
50152 K + 547
12538 S
0 0 0 0 0 0 4C + 9 F K + 4P
which checks out with the fact these vectors all lie in the subspace W .
The above tells us that Bryan can hit every nutritional goal in W . But can he cut out a
food item? Yes! These vectors are not linearly independent. Suppose he took out coconut milk
and pineapple. Then we would have {~a, ~t, ~o, m}
~ is a linearly independent system that still spans
W . This sets us up PERFECTLY to define a basis.
Lets define a basis.
Definition 4.17. A set B = {~u1 , . . . , ~um } is a basis for a subspace S if
1. span(B) = S.
2. B is linearly independent.
Theorem 4.18. If B is a basis for S then every element of S can be written as a unique linear
combination of vectors in B.

## 4.2.2 Finding a basis

There are two ways to find a basis. One involves row reducing. It hinges on the idea that the
span of the rows a matrix is unaffected by row operations.

a~1
a~2 m
Definition 4.19. Let A = ... , with row vectors ~ai R . We define the rowspace to be

a~n
span{~a1 , . . . , ~am }. We denote this by rowspace(A).

51
Theorem 4.20. If A and B are equivalent (i.e. theres a way to go from one to the other by
row operations) then
rowspace(A) = rowspace(B).
So if we know that ~u1 , . . . , ~um span S and want to find a minimal set, it is enough to write
these as row vectors then go to echelon form. The remaining non-zero vectors will be linearly
independent, and still span S.
Example 4.21. The one in the book is fine:

## How to rower-ize a basis

1. To find a basis for S = span ~u1 , . . . , ~um , Use the vectors ~u1 , . . . , ~um to form the rows of a
matrix A.

## 3. The nonzero rows of B give a basis for S.

There is another way to find a basis. This uses the fact that row operations preserve linear
combinations:

52
Fact 4.22. If U = [~u1 ~um ] and V = [~v1 ~vm ] are equivalent matrices, then any linear
dependence among the ~ui still exists for the ~vi .

## 2. Transform A to echelon form B. The pivot (leading 1) columns of B will be linearly

independent, and the other columns will be linearly dependent on the pivot columns.

## 3. The columns of A corresponding to the pivot columns of B form a basis for S.

53
4.2.3 Dimension
Now that we have the notion of a basis, we can finally
 make sense of dimension. For example, it
1
has always been tempting to say that span = R, because it basically looks like the real
0
line. However, this isnt true because it lives in R2 . We will see that this set has dimension
1, and so it looks like R, although it lives somewhere else...
Before defining a dimension we need to establish that every basis for a given subspace will
have the same size.
Theorem 4.24. If S is a subspace of Rn , then every basis of S has the same number of vectors.
Definition 4.25. Let S be subspace of Rn , then the dimension of S is the number of vectors
in any basis of S.
Note that {~0} and Rn have dimensions 0 and n, respectively. To see that Rn has dimension
n we can use the standard basis vectors ~e1 , . . . , ~en .
We are interested in three different tasks:
1. Determining the dimension of a subspace.

## 3. Expanding a set of vectors to be a basis for a given subspace.

Lets look at examples of these.
Example 4.26. Determine the dimension of the subspace

6 1 1 7
4 0 4 1

S = span 1 , 2 , 9 , 0 .

1 3 16 1

2 4 22 3

We do so by row reducing the corresponding matrix and counting the number of pivots.

6 1 1 7 1 0 1 0
4 0 4 1 0 1 5 0

1
2 9 0
0 0
0 1
.
1 3 16 1 0 0 0 0
2 4 22 3 0 0 0 0

## There are 3, so the dimension is 3.

Before our next example, we give a theorem.
Theorem 4.27. Let U = {~u1 , . . . , ~um } be a set of vectors in a subspace S Rn .
1. If U is linearly independent then either U is a basis, or vectors can be added to U to form
a basis of S.

2. If span(U) = S, then either U is already a basis, or vectors can be removed from U to form
a basis.

54
Now lets expand a set to form a basis. We go about this in a round-about way. Put simply,
we add too many vectors in, then remove them using techniques we already know.

## Example 4.29. Let A = [~a1 ~a6 ] with

1 3 2 0 2 0
2 6 5 2 4 3
A= 0

0 5 10 0 15
2 6 0 8 4 18

55
Describe the nullspace of A and exhibit a basis. We row reduce and obtain

1 3 0 4 2 0 0
0 0 1 2 0 0 0
A 0

0 0 0 0 1 0
0 0 0 0 0 0 0

x2

2x 4

.

x4

x5
0

## Another way to write this is as

3 4 2

1 0 0

0
+ x4 0 + x5 0 .

x2
0 1 0

0 0 1
0 0 0

Thus,

3 4 2

1 0 0

0 0 0

nullspace(A) = span , ,
0 1 0 .

0 0 1

0 0 0
We often care about the dimension of the nullspace. This gets a special name.
Definition 4.30. The nullity of a matrix A is the dimension of the nullspace. We denote this
by nullity(A).
Knowing the dimension of a subspace helps us find the dimension.
Theorem 4.31. If S has dimension m and U is a set of m vectors that either span S or are
linearly independent, then U is a basis for S.

1
Example 4.32. Suppose that S is a subspace of dimension 2 and contains the vectors 0 and
2

0
1. Show that these two vectors form a basis for S.
0
To do this it suffices to check that these are linearly independent, which either visual inspec-
tion or row reducing quickly reveals.

56
Fact 4.33. If S1 and S2 are subspaces of Rn and S1 is a subset of S2 , then dim(S1 ) dim(S2 )
and dim(S1 ) = dim(S2 ) if and only if S1 = S2 .
Fact 4.34. If U = {~u1 , . . . , ~um } is a set of vectors in a subspace S of dimension k, then
1. If m < k, then U does not span S.

## 4.3 Row and Column Spaces

In developing a method for finding bases we started viewing matrices in the context of subspaces
spanned by its row and column vectors.
Definition 4.35. Let A be an n m matrix. The row space of A is the subspace of Rm
spanned by its row vectors, denoted row(A). Similarly, the column space of A is the subspace
of Rn spanned by the column vectors of A, denoted col(A).
Formalizing what we saw in section 4.2 gives the following theorem.
Theorem 4.36. Let A be a nonzero matrix and B an echelon form of A.
1. The nonzero rows of B form a basis for row(A).

2. The columns of A corresponding to the pivot columns of B form a basis for col(A).
Example 4.37. Find a basis and the dimension of the row and column spaces for A.

1 3 1 0 3
A = 2 6 0 2 4
3 9 3 6 3

## Our first step is to find an echelon form of A,

1 3 1 0 3 1 3 1 0 3
A = 2 6 0
2 4 0 0 2 2 2 = B
3 9 3 6 3 0 0 0 0 0

By the previous theorem, the nonzero rows of B form a basis for row(A), or

1 0

3 0

1 , 2 .

0 2

3 2

For the row space, we observe that the pivot columns of B are the first and third columns, so
the corresponding columns of A will be a basis for col(A), or

1 1
2 , 0 .
3 3

Both subspaces have bases consisting of two vectors, so they have dimension 2.

57
In the above example the row space and column space of our matrix, even though they were
subspaces of different Euclidean spaces, had the same dimension. It turns out this is always the
case.

Theorem 4.38. For any matrix A, the dimensions of the row space and the column space are
the same.

Proof: Given A, we use elementary row operations to find an echelon form B of A. The
dimension of the row space of A will correspond to the number of nonzero rows of B, while
the dimension of the column space will correspond to the number of pivot columns of B. We
observe that each nonzero row of B has exactly one leading variable, or pivot, and by the
stair-step pattern of B different nonzero rows will have different pivot columns. It follows that

## # nonzero rows of B = # pivot columns of B .

| {z } | {z }
dimension of row space dimension of column space


With this relationship established we can make sense of the rank of a matrix.

Definition 4.39. The rank of a matrix A is the dimension of the row (or column) space of A,
denoted rank(A).

1 2 3 0 1
Example 4.40. What is the rank and nullity of 0 0 1 3 5 ?
0 0 0 0 0
Well, we see that there are two pivots, so the rank is 2. To determine the nullity we augment
with ~0 and observe that there are 3 free variables: x2 , x4 , x5 . Back substitution yields the
following solutions for the null space:

x3 = 3x4 5xt ,
x1 = 2x2 3x3 + x5 = 2x2 9x4 + 16x5 .

## Call s1 = x2 , s2 = x3 , x3 = x5 and the solutions are

2 9 16
1 0 0

0 + s2 3 + s3 5 .
~x = s1
0 1 0
0 0 1

These are linearly independent, and thus form a basis for the nullspace. We deduce that the
nullity is 3.

In this example we saw that for our 3 5 matrix the rank was 2 and the nullity was 3. Notice
that 2 + 3 = 5. This holds in general:

## Theorem 4.41 (Rank-Nullity Theorem). Let A be any n m matrix. Then,

rank(A) + nullity(A) = m.

58
This is true because in echelon form, each pivot of A corresponds to a vector in the basis
for column (or row) space of A. The remaining columns correspond to free variables, each of
which is a parameter for a basis vector of the solution set A~x = 0. So, these two objects sum to
m = #columns.

Example 4.42. Suppose that A is 3742 and T (~x) = A~x has a kernel with dimension 12. What
is the rank of A? Well, since the kernel of T and the nullspace of A have the same dimension,
we can use the rank-nullity theorem to deduce that rank(A) = 42 12 = 30.

Theorem 4.43 (The Big Theorem). Let A = [~a1 ~an ] be an n n matrix and T : Rn Rn
be given by T (~x) = A~x. Then the following are equivalent:

## (c) A~x = ~b has a unique solution for all ~b Rn .

(d) T is onto.

(e) T is one-to-one.

(f) T is invertible.

(g) A is invertible.

## (i) ~a1 , . . . , ~an form a basis for Rn .

(j) col(A) = Rn

(k) row(A) = Rn

(l) rank(A) = n

(m) nullity(A) = 0.

We end with one last example that ties a lot of this together.

1 1

0 0 4
1 , ~u2 = 2 R
span ~u1 =

1 2

## and Q be the plane

(x y z = 0) R3 .
Find a linear transformation T : R4 R3 such that T (P ) = Q and ker T has dimension 1.
Step 1: We We first find a basis for R4 that includes the vectors spanning P . To do this we
augment with the standard basis vectors of R4 and row reduce.

59

0 1 1 0 0 0 1 0 2 0 0 1
0 0 0 1 0 0 0 1 1 0 0 0

1 2 0 0 1 0 0 0 0 1 0 0
1 2 0 0 0 1 0 0 0 0 1 1
Looking at the pivots we have {~u1 , ~u2 , ~e2 , ~e3 } is a basis for R4 .
Next we need to find spanning vectors
for the subspace Q. We can write x = y + z, so
x
another way to represent vectors y Q is as
z

y+z 1 1
y = y 1 + z 0 .
z 0 1

1 1
This means that ~v1 = 1 and ~v2 = 0 is a spanning set for Q. We still need to find another
0 1
vector to make this a basis for R3 . A sure way to do this is to augment again with the standard
basis vectors and row reduce:

1 1 1 0 0 1 0 0 1 0
1 0 0 1 0 0 1 0 0 1
0 1 0 0 1 0 0 1 1 1
So, we have a basis {~v1 , ~v2 , ~e1 } is a basis for R3 .
Step 2: We observe that a linear transformation is completely determined by how it acts on a
basis. So, the map T : R4 R3 given by:

## T (~u1 ) = ~v1 ; T (~u2 ) = ~v2 ; T (~e2 ) = ~e1 ; T (~e3 ) = ~0

is a unique transformation and does what we want.

60
5 Determinants
5.1 Determinants
The determinant is a number we associate to a square matrix A. We call it det(A), or sometimes
just |A| (Caution, the determinant can be negative). It will tell us, among other things, whether
or not A is invertible. Rather than say how to compute it, lets say how we would like det(A)
to behave like:
Theorem 5.1. Let A be an n n matrix. The number det(A) has the following properties:
1. det A 6= 0 if and only if A is invertible. (Puts us in the Big Theorem situation of lots being
true about the row and column vectors of A, as well as the induced linear transformation
and bases.)

2. The area of the unit cube under the linear transformation T is | det(A)|.

3. det I = 1.

4. If B is obtained from A by switching a single row or column with another then det B =
det A.

a b
= c d .

c d a b

ta tb
= t a b .

c d c d

## (b) It is linear in each row and column:

a + a0 b + b0 a b a0 b0

= +
c d c d c d

6. If two rows (or columns) are the same then det A = 0. This is by Property 3, since we
could swap two rows and obtain det A = det A and so det A = 0.

## 7. Subtracting r Ri from Rj does not change the determinant.

a b a b a b
c ra d rb = c d + ra rb

a b a b
= + r

c d a b

a b
= +0
c d

## 8. A row (or column) of zeros means det A = 0.

9. If B is the upper triangular we get from A after row reducing like in 6, then det A is the
product of the diagonal entries of B times (1)#row exchanges .

61
10. det(AB) = det A det B

11. det A1 = 1
det A .

## 5.1.1 A Formula for the Determinant

Formula 5.2. If A is a 1 1 matrix, A = [a11 ], then det(A) = a11 .
Formula 5.3. If A is a 2 2 matrix, things get a little bit harder, but are still pretty easy to
compute. If A is the matrix  
a11 a12
a21 a22
then the determinant of A is given by

## det(A) = a11 a22 a12 a21

.
Note: weve
  talked about the quick formula for the inverse of a 2 2 matrix: if A is
a a
the matrix 11 12 , then
a21 a22
 
1 1 a22 a12
A = .
a11 a22 a12 a21 a21 a11
So weve already seen the determinant of a 2 2 matrix come up and we know that a 2 2
matrix A is invertible if and only if det(A) 6= 0.
 
2 4
Example 5.4. Find the determinant of the matrix A = .
3 1
The determinant is (2)(1) 4(3) = 14.

Formula 5.5. Once we know the formula for a 2 2 determinant, we begin to compute de-
terminants recursively. In order to do this, we first introduce some notation. If A is an n n
matrix, denote the ij th entry by aij . Let Mij be the (n 1) (n 1) matrix gotten by cross-
ing out the ith row and j th column of A. Mij is called the ij th minor of A. Finally, let
Cij = (1)i+j det(Mij ). Cij is called the ij th cofactor of A.
Example 5.6. As an example, consider the matrix

2 1 1
A = 3 2 4 .
2 1 0

Then,
a13 = 1
 
3 2
M13 =
2 1

62
and
C13 = (1)1+3 det M13 = (1)4 (3(1) (2)(2)) = 1.
Well use these to compute the determinant of any matrix. Heres the formula: if A is the
n n matrix
a11 a12 . . . a1n
.. .. .. ..
. . . .
A= .. .. .. ..

. . . .
an1 an2 . . . ann
then
det(A) = a11 C11 + a12 C12 + + a1n C1n .

2 1 1
Example 2. Find the determinant of the matrix A = 3 2 4 .
2 1 0
The formula above tells us det(A) = a11 C11 + a12 C12 + a13 C13 . We know

## a11 = 2, a12 = 1, a13 = 1

and  
1+1 2 4
C11 = (1) det M11 = det = 4
1 0
 
1+2 3 4
C12 = (1) det M11 = det = 8
2 0
 
1+3 3 2
C13 = (1) det M13 = det = 1
2 1
so
det(A) = 2(4) + 1(8) 1(1) = 15.

## The formula is recursive, meaning we compute determinants of 3 3 matrices by computing

determinants of 2 2 matrices. We compute determinants of 4 4 matrices by computing
determinants of 3 3 matrices, which we do by computing determinants of 2 2 matrices, and
so on.

## Formula 5.7. Here are two formulas for 3 3 determinants:

63
If you are curious, here is a worked out example for a 4 4 matrix:

1 1 2 1
0 2 1 1
Example 5.8. Find the determinant of the matrix A = 0 3 2
.
4
0 2 1 0
The formula tells us det(A) = a11 C11 + a12 C12 + a13 C13 + a14 C14 . We know
a11 = 1, a12 = 1, a13 = 2, a14 = 1
and, by what we computed above (example (2)),

2 1 1
C11 = (1)1+1 det M11 = det 3 2 4 = 15.
2 1 0
We also need to find

0 1 1
C12 = (1)1+2 det M11 = det 0 2 4
0 1 0

0 2 1
C13 = (1)1+3 det M13 = det 0 3 4
0 2 0

0 2 1
C14 = (1)1+4 det M14 = det 0 3 2 .
0 2 1
Lets do the first one together:

0 1 1       
2 4 0 4 0 2
C12 = det 0 2 4 = 0 det 1 det + (1) det = 0.
1 0 0 0 0 1
0 1 0
Similarly, we can compute C13 = 0 and C14 = 0 and we find that so
det(A) = 1(15) 1(0) + 2(0) + 1(0) = 15.

64
A lot of the determinants in the above example turned out to be 0, and thats a reflection
of the following fact: we can compute determinants by expanding along the first row (what
weve been doing in the formula above, using the entries from the first row to compute the
determinant), or we could expand along any row or column of our choice. This is summarized
in the following formulas.

a11 a12 ... a1n
.. .. .. ..
A= . . . .
an1 an2 ... ann

## then we can compute the determinant via

1. An expansion across row i: det(A) = ai1 Ci1 + ai2 Ci2 + + ain Cin

2. An expansion across column j: det(A) = a1j C1j + a2j C2j + + anj Cnj

Example 5.9. Lets revisit the earlier example and try applying part (2) of the formula to
the example above, with j = 1 (expanding across the first column). The formula tells us
det(A) = 1 C11 + 0 C21 + 0 C31 + 0 C41 . So, we need to find

2 1 1
C11 = (1)1+1 det M11 = det 3 2 4 = 15
2 1 0

## 5.1.2 Applications of the determinant

Determinants let us quickly find inverses of a matrix:

## Theorem 5.10 (Cramers Rule).

1
det A
Where adj(A) is the adjoint of A given by

C11 C21 Cn1
.. .. .. ..
. . . .
adj(A) = .. .. .. ..

. . . .
C1n C2n Cnn

Example 5.11.
1
2 0 1 1 1 2
1 2 1 = 1 0 0 3
3
0 1 0 1 2 4

65
Fact 5.12. If T : Rn Rn with T (~x) = A~x then the volume of the image of the unit cube is
det A.
Fact 5.13. You will see in multivariable calculus that even for non-linear coordinate changes the
determinant tells you what happens to the volume of a cube. For example, the transformation
from polar coordinates F (r, ) = (r cos , r sin ) induces a determinant

cos r sin
sin r cos = r.

So the distortion is r. This means the amount of stretching increases as r increases. See the
picture below:

What this amounts to is an extra factor of r when we compute double integrals in polar
coordinates: Z Z Z Z
f (x, y)dxdy = f (r cos , r sin )rdrd.

## Definition 5.14. A matrix is totally unimodular if every submatrix has determinant in

{1, 0, 1}.
Fact 5.15. If A is totally unimodular and ~b is only integers then the unique solution to A~x = ~b
is an integer valued vector.
Example 5.16. The adjacency graphs of certain link structures (where there are two groups of
sites that do not link to on another) are totally unimodular.

For the matrix above we know that all equations A~x = ~b have integer solutions.
This follows from Cramers rule since

## det(A with ~b in ith column)

A~x = ~b ~x = A1~b each xi = .
det A
It is (among many other things) useful for minimum cost flow problems. Like getting goods
from a warehouse to a factory with different paths having different costs.

66
6 Eigenvalues and Eigenvectors
The prefix eigen- is adopted from the German word eigen for own- or unique to, peculiar
to, or belonging to in the sense of idiosyncratic in relation to the originating matrix.

## 6.1 The Eigenvalue Problem

6.1.1 Motivating examples
That said, lets get started with an example about centrality measures.

Example 6.1. Suppose you take facebook and draw lines between all friends. What is a good
way to measure how well-connected an individual is? It maybe isnt just how many friends
someone has, its also how important their friends are.
What if we assume that there are n people and each person has a certain importance score,
ci . To account for how important your friends are mattering, lets say this is proportional to its
neighbors scores. Another way to write this is as
1
ci = (sum of neighbors importance scores).

If ~c is everyones importance score, we can encode this with an equation involving the adjacency
matrix, A to be:
1
~c = A~c.

Thus, the importance scores can be found by solving A~c = ~c for and ~c. For example, if we
.41
1/2
.58

solve this for the two graphs below we get vectors (a) 2/2 and (b) .41 .

1/2 .52
.21

67
One more example: hippos.

Example 6.2. Suppose that a large hippo population exists in a wildlife preserve. Hippos
produce different average amounts of offspring at different times in their lives. In fact,

## There are 3 stages s1 , s2 , s3 .

x1 5
We start with ~x = x2 of each stage hippo. Say ~x = 10 all in thousands.
x3 8

b1 0
~ ~
During stages s1 , s2 , s3 they produce b = b2 offspring on average. Lets say that b = 2.

b3 1
Suppose that the probability a hippo in s1 reaches s2 is .5, and the probability a hippo in
s2 reaches s3 is 1/5.

Does the population stabilize, go extinct, or grow exponentially? At what rate does this
happen?
Lets start by putting this data into a matrix:

b1 b2 b3 0 2 1
A = p1 0 0 = 21 0 0
0 p2 0 0 15 0
Notice that multiplying the age vector ~x by A gives the age distribution for the next gener-
ation:
57
5 28 7 2
A0 ~x = 10 , A1 ~x = 52 , A2 ~x = 14 , A3 ~x = 72 .
1 14
8 2 2 5

It would be nice to find a simpler formula for Ak ~x. That way we could make predictions
about the long term behavior of the hippo population.
Suppose we could solve the equation

A~u = ~u.

## This would ensure that

Ak ~u = k ~u.
Suppose further that there were three solutions (1 , ~u1 ), (2 , ~u2 ), (3 , ~u3 ) to this equation and
that the ~ui were linearly independent. Then we could write ~x as a linear combination of the ~ui .
This means that

68
~x = c1 ~u1 + c2 ~u2 + c3 ~u3 .
Now, if we multiply by A we have
Ak ~x = Ak (c1 ~u1 + c2 ~u2 + c3 ~u3 )
= c1 Ak ~u1 + c2 Ak ~u2 + c3 Ak ~u3
= c1 k1 ~u1 + c2 k2 ~u2 + c3 k3 ~u3 .
This decomposes the matrix multiplication into a simple formula involving just scalars. What
is eigen (peculiar) is that we can do this! We will say how soon, but for the hippo example we
have
1
1 = .95, ~u1 = .53
.11

1
2 = .10, ~u2 = 4.9
9.8

1
3 = 1.04, ~u3 = .48
.09
And, by solving a linear system we get

c1 11.8
c2 = .8 .
c3 16
This gives us a much simpler formula for Ak ~x:

1 1 1
Ak ~x = 11.8(.95)k .53 + .8(.10)k 4.9 + 16(1.04)k .48 .
.11 9.8 .09
Notice that because 1 , 2 < 1 they go to zero as k gets large. So, we see that the stable
equation is (note we multiply 16~u1 ):

16
Ak ~x 1.1k 7.68 .
1.44
This tells us the rate of growth, 1.04k , as well as the stabilizing population. For example,
16
16+7.68+1.44 .64 says that 64% of the hippos are young in the 1000th generation.

69
The calculation for the hippo example was done on a computer. The one for the centrality
scores is at the end of this section.
In the above examples, for our square matrix A, we heavily relied on knowing scalars
nonzero vectors ~u such that A~u = ~u. We wish to develop a method for finding these for any
square matrix.

Definition 6.3. A vector ~u is an eigenvector of A if there exists some scalar such that
A~u = ~u. We then call an eigenvalue of A. Finally, the set of all eigenvectors corresponding
to a given is called the eigenspace of .

Fact 6.4. If ~u is an eigenvector of a matrix with eigenvector , then so is any scalar multiple
of ~u.
   
3 5 5
Example 6.5. If A = , verify that ~u = is an eigenvector of A with corresponding
4 2 4
eigenvalues = 7.
To verify this, we need to check the A~u = 7~u. But this is true:
       
3 5 5 35 5
A~u = = =7 = 7~u.
4 2 4 28 4

## 6.1.2 Finding Eigenvalues

We want to find and ~u such that
A~u = ~u.
Subtracting ~u from both sides, we get

A~u ~u = ~0.

## Now, factoring out the ~u (on the right), we get

(A I)~u = ~0.

Rearranging the eigenvalue problem in this way, were looking for values of such that the
equation
(A I)~x = ~0
has a nonzero solution ~x = ~u. By the Big Theorem, this equation has a nonzero solution if and
only if A I is a singular matrix, ie. det(A I) = 0.

By rearranging the problem in this way, we see that we can find eigenvalues of A by deter-
mining when det(A I) = 0. So we are solving for the zeros of a polynomial in .

## Definition 6.6. The characteristic polynomial of an n n matrix A is det(A I).

So we find the eigenvalues of A by setting its characteristic polynomial to 0 then solving for
.

70
Example 6.7. Find the eigenvalues for the matrix

1 0 1
A = 0 1 2
0 4 3

1 0 1
1 2
det(A I) = 0
1 2 = (1 )
+0+0
4 3
0 4 3
= (1 )((1 )(3 ) 8)
= (1 )(2 4 5)
= (1 )( 5)( + 1)

## 6.1.3 Finding Eigenvectors

Once we know the eigenvalues of A, we can find eigenvectors corresponding to an eigenvalue
as nonzero solutions to (A I)~x = ~0.

Fact 6.8. The eigenspace of is then seen to be the set of all solutions to (A I)~x = ~0. It it
is therefore the nullspace of A I, and a subspace of Rn .
 
3 3
Example 6.9. Find the eigenvalues and a basis for each eigenspace of the matrix A = .
6 4
We first want to find eigenvalues of A, which we do by finding such that det(A I) = 0.
     
3 3 1 0 3 3
A I = = .
6 4 0 1 6 4

Now we compute

## Setting this equal to 0, we find that the eigenvalues are 1 = 5 and 2 = 6.

We can find the corresponding eigenvectors (and eigenspaces) by solving the equation (A
I)~x = ~0. For 1 = 5,  
2 3
A 5I =
6 9
and we solve the equation by reducing
   
2 3 0 1 3/2 0
[A 5I|~0] = .
6 9 0 0 0 0

The solutions to this equation are the eigenspace corresponding to 1 = 5, so the eigenspace is
the set of vectors of the form  
3/2
s .
1

71
A basis for this eigenspace would be  
3/2
.
1
Note that although ~0 is in the eigenspace, eigenvectors are required to be nonzero, so any value
s 6= 0 would give an eigenvector corresponding to 1 = 5.

## Now, we repeat the process for the other eigenvalue 2 = 6:

   
~ 9 3 0 1 1/3 0
[A + 6I|0] =
6 2 0 0 0 0

meaning the eigenspace corresponding to 2 = 6 (the solutions to (A + 6I)~x = ~0) is the set of
vectors of the form  
1/3
s .
1
One basis for this eigenspace would be
 
1/3
.
1

1 2 1
Example 6.10. Find the eigenvalues and a basis for each eigenspace of A = 1 0 1.
1 2 3
First we need to find the characteristic polynomial, det(A I). But,

1 2 1
det(A I) = det 1 1 = 3 + 42 4 = ( 2)2 .
1 2 3

This equals zero when is 0 or 2, so the eigenvalues are 1 = 0 and 2 = 2. To find the
corresponding eigenspaces, we want to find all solutions to (A I)~x = ~0.

## For 1 = 0, we solve this by reducing

1 2 1 0 1 0 1 0
[A 0I|~0] = 1 0 1 0 0 1 1 0 .
1 2 3 0 0 0 0 0

1
span 1 .
1

## We repeat the process for 2 = 2, reducing

1 2 1 0 1 2 1 0
[A 2I|~0] = 1 2 1 0 0 0 0 0 .
1 2 1 0 0 0 0 0

72
The solutions to this system (the eigenspace corresponding to 2 = 2) are given by vectors of
the form
1 2
s1 0 + s2 1 ,

1 0
so a basis for the eigenspace of 2 = 2 is

1 2
0 , 1 .
1 0

## 6.1.4 Multiplicity of an Eigenvalue

In the above example, the characteristic polynomial was ( 2)2 . Because of the square,
2 = 2 appeared as a root twice.
Definition 6.11. The number of times an eigenvalues appears as a root of the characteristic
polynomial is called the multiplicity of .
Fact 6.12. The dimension of the eigenspace associated to is always less than or equal to the
multiplicity of (in the above example, it was actually equal).

2 0 0 0
1 2 0 0 2 2
Example 6.13. Let A = 0 1 3 0. This has characteristic polynomial ( 2) ( 3) ,

0 0 1 3
and so has eigenvalues 2 and 3 each with multiplicity 2.
Lets look at the eigenspace for = 2. This is the nullspace of

0 0 0 0 1 0 0 0
1 0 0 0 0 1 1 0
0 1 1 0 0 0 1 1 .

0 0 1 1 0 0 0 0

0
0
x4
= span 1 . So the nullspace has dimension 1, despite the

This has nullspace x4 1

x4 1

fact that = 2 has multiplicity 2.
Another thing to be aware of when calculating eigenvalues and eigenspaces is that a matrix
need not having solely real eigenvalues.
 
0 1
Example 6.14. Let B = , the matrix corresponding to a rotation by 2 in R2 . Thinking
1 0
about it geometrically, rotating a non-vector by /2 will never yield a scalar multiple of that
vector. Therefore our intuition is that this (invertible) matrix should have no eigenvectors.
1
Actually working it out, the characteristic polynomial of B is = 2 + 1. But then
1

2 + 1 = 0 implies that = 1 = i.
So we see that matrices can have i maginary or complex eigenvalues. However, any B is real-
valued, so multiplying it by any ~x Rn will result in another vector in Rn . So any eigenspaces
corresponding these eigenvalues will not be subspaces of Rn .

73
6.1.5 Remarks about Eigenvalues
Things to keep in mind when finding eigenvalues and eigenvectors:
1. If A is an n n matrix, the characteristic polynomial det(A I) should always be a
degree n polynomial (you get n s from the n terms on the diagonal).

## 2. There can be at most n different eigenvalues for an n n matrix A.

3. If A is a triangular matrix, the eigenvalues of A are just the entries on the diagonal.

4. When youre finding eigenvectors and reducing [A I|~0], you should ALWAYS get free
variables and rows of zeros because weve chosen so that A I is a singular matrix. If
you dont get free variables and rows of zeros, something went wrong in your calculation!

5. Because eigenvalues are all scalars such that A I is a singular matrix, we can say
that A itself is a singular matrix if and only if 0 is an eigenvalue of A. But, this means we
can add something to the Big Theorem!
Theorem 6.15 (Big Theorem). Let S = {~a1 , . . . , ~an } be a set of n vectors in Rn , and let
A = [~a1 . . . ~an ]. Let T : Rn Rn be the linear transformation T (~x) = A~x. Then the following
are equivalent:
1. S spans Rn .

2. S is linearly independent.

## 3. A~x = ~b has a unique solution for all ~b in Rn .

4. T is onto.

5. T is one-to-one.

6. ker(T ) = {~0}.

7. range(T ) = Rn .

8. null(A) = {~0}.

9. col(A) = Rn .

10. row(A) = Rn .

11. nullity(A) = 0.

12. rank(A) = n.

14. det(A) 6= 0.

## 16. A row reduces to the identity matrix.

Here is the facebook example calculations from earlier:

74
Figure 3.2: Eigenvector Centrality Example.

Example 3.2. For the graph shown in Figure 3.2(a), the adjacency matrix is
2 3
66 0 1 0 77
6 7
A = 666 1 0 1 777 . (3.10)
4 5
0 1 0
Based on Equation 3.9, we need to solve Ce = ACe , or
(A I)Ce = 0. (3.11)
Assuming Ce = [u1 u2 u3 ]T ,
2 32 3 2 3
66 0 1 0 77 66 u1 77 66 0 77
66 1 0 1 777 666 u2 777 = 666 0 777 .
66 75 64 75 64 75 (3.12)
4
0 1 0 u3 0
Since Ce , [0 0 0]T , the characteristic equation is
0 1 0
det(A I) = 1 0 1 = 0, (3.13)
0 1 0
or equivalently,
2 3 2
( )( 1)
1( ) = 2 = (2 ) = 0. (3.14)
p p p
So the eigenvalues are ( 2, 0, + 2). We select the largest eigenvalue: 2.
We compute the corresponding eigenvector:
2 p 32 3 2 3
66 0 2 1p 0 77 6 u1 7 6 0 7
66 77 66 77 66 77
666 1 0 2 1p 777 666 u2 777 = 666 0 777 . (3.15)
64 75 4 5 4 5
0 1 0 2 u3 0

77
Assuming Ce vector has norm 1, its solution is
2 3 2 3
66 u1 77 666 p1/2 777
6 7
Ce = 666 u2 777 = 6666 2/2 7777 , (3.16)
4 5 4 5
u3 1/2

which denotes that node v2 is the most central node and nodes v1 and v3 have equal
centrality values.

Example 3.3. For the graph shown in Figure 3.2(b), the adjacency matrix is as
follows:
2 3
66 0 1 0 1 0 77
66 7
66 1 0 1 1 1 777
66 7
A = 666 0 1 0 1 0 7777 . (3.17)
66 1 1 1 0 0 77
66 77
4 5
0 1 0 0 0
The eigenvalues of A are ( 1.74, 1.27, 0.00, +0.33, +2.68). For eigenvector
centrality, the largest eigenvalue is selected: 2.68. The corresponding eigenvector
is the eigenvector centrality vector and is
2 3
66 0.4119 77
66 77
66 0.5825 77
6 77
Ce = 6666 0.4119 77 .
77 (3.18)
66 0.5237 77
66 75
4
0.2169

## 3.1.3 Katz Centrality

A major problem with eigenvector centrality arises when it considers di-
rected graphs (see Problem 1 in the Exercises). Centrality is only passed
on when we have (outgoing) edges, and in special cases such as when a
node is in a directed acyclic graph, centrality becomes zero, even though
the node can have many edges connected to it. In this case, the problem
can be rectified by adding a bias term to the centrality value. The bias term
is added to the centrality values for all nodes no matter how they are
situated in the network (i.e., irrespective of the network topology). The

78
Heres one more example with the Fuzzy Ball Graph:

Example 6.16.
This has eigenvector [1.0, 1.0, 1.0, 1.0, 1.0, 0.96, 0.95, 0.95, 0.95, 0.95, 0.54, 0.1]T . So we see that
five of the nodes are best connected and the rest have lower scores.

6.2
SKIP

## 6.3 Change of Basis

6.3.1 Motivation
Before we get into the technical details of this section, lets first revisit the idea of a basis and
what wed use a basis for. If W is a subspace of Rn , then remember that a basis is a linearly
independent spanning set for W . Why would we want this? Remember the main theorem about
bases: if B = {~u1 , . . . , ~um } is a basis for W , then any vector w ~ in W can be written as a linear
combination
~ = c1 ~u1 + + cm ~um
w
~ in the basis B. In some
in a unique way. We call the scalars c1 , . . . , cm the coordinates of w
sense, were imposing a coordinate system on the subspace W where each basis vector acts like
a coordinate axis.

Were used to seeing coordinates in the standard basis. For example, if {~e1 , ~e2 } are the
standard basis vectors for R2 , then any vector in R2 can be written as a linear combination of
these coordinates in a unique way, namely
     
a 1 0
=a +b .
b 0 1
 
a
This tells us the coordinates of the vector in the standard basis are just (a, b). This makes
b
sensewe go a units in the direction of ~e1 (the x-direction) and b units in the direction of ~e2 (the
y-direction) to get to the point (a, b).

77
Now, using other bases, we just want to extend this idea to other subspaces. For example,
if I have some plane in R3 , can we find some vector that acts like the x-axis and some vector
that acts like the y-axis? Well, if we have a basis for that plane, we will be able to impose a
coordinate system.

We are also interested in extending our ideas of coordinate systems beyond the standard
basis because some bases are much easier to work with than others. For example, if we have an
n n matrix A and a basis B = {~u1 , . . . , ~un } of Rn consisting of eigenvectors of A, then it is very
easy to compute A~x for any vector ~x in Rn . Why? Well, suppose the eigenvectors {~u1 , . . . , ~un }
correspond to the eigenvalues 1 , . . . , n . Then, we know A~ui = i ~ui . So, if we can find scalars
c1 , . . . , cn such that
~x = c1 ~u1 + + cn ~un ,
then
A~x = c1 A~u1 + + cn A~un = c1 1 ~u1 + + cn n ~un .
This becomes very useful when were trying to find higher powers of A; it says
Ak ~x = c1 Ak ~u1 + + cn Ak ~un = c1 k1 ~u1 + + cn kn ~un .
So, instead of multiplying ~x by A k times, we just have to multiply the eigenvectors by powers
of scalars and add them up. This is much easier to do!

We can investigate some of these ideas further in the future, but for now, thats a bit of
motivation of why wed want to be able to find a basis and then change to a different basis.
Now well move on to the computational side of how to change to a new basis. We restrict our
attention to changing between bases of Rn .

6.3.2 Changing it
Notation 6.17. If B = {~u1 , . . . , ~un } is a basis for Rn , and ~x is a vector in Rn , we write

c1
..
~xB = .
cn B

## where c1 , . . . , cn are the coordinates of ~x in the basis B, i.e.

~x = c1 ~u1 + + cn ~un .
So, when we write
c1
..
~xB = .
cn B
we DO NOT mean that ~x is equal to the vector with components c1 , . . . , cn . i.e we DO NOT
mean
c1
..
~x = . .
cn

78
Instead, we mean that ~x is a linear combination of the basis vectors in B with coefficients given
by c1 , . . . , cn .
     
2 1 2 4
Example 6.18. Let B be the basis , for R . If ~xB = , find .
1 3 5 B
 
4
The notation ~xB = means that
5 B
     
2 1 3
~x = 4 5 = .
1 3 19

## we always mean that ~x = ~xS , where S is

Notation 6.19. When we write ~x with no subscript,
c1
n ..
the standard basis for R . In other words, ~x = . with no subscript actually means that ~x is
cn
the vector whose components are given by c1 , . . . , cn .

## Lets bring it back to smoothies.

Example 6.20. Brian is all about smoothies. His favorite ingredients by far are banana, avocado
and tofu.

~b ~a ~t

Protein 1 2 5
.
Fat 1 15 3
Carb 22 1 2
These vectors form a basis for the set of macro nutrients R3 . Brian likes these ingredients so
much he would rather express macronutrients in terms of B = {~b, ~a, ~t} than in the standard
m1
basis. So instead of making a smoothie with a macro-profile m ~ = m2 , he would like to write
m3

r1
m
~ B = r2 where r1 , r2 and r3 describe the amount of banana, avocado and tofu he needs to
r2 B
put in his smoothie. Letting B = [~b ~a ~t], the equation m
~ = ~rB is the same as:

m
~ = Bm
~ B.

## This is the same as

B 1 m
~ =m
~ B.
Note that
27 1 69
1
B 1 = 64 108 2 .
1490
329 43 13

79
So, if his partner
wants
to go get fried oysters later, Brian can understand the macro composition
13
of oysters ~o = 12 in terms of his favorite foods, ~oB by multiplying
9

111 0.1
1
N 1~o = 68 0.1 .
745
1932 2.6
So if we take 10g of banana (i.e. -.1 100g servings of banana), 10g of avocado and 260g of
avocado we end up with the same macros as fried oysters.

Recap.

If B = {~u1 , . . . , ~un } is a basis for Rn , and ~x is a vector in Rn , then we can find scalars
c1 , . . . , cn such that
~x = c1 ~u1 + + cn ~un .
However, this says
c1
..
~xB = . .
cn B
If U is the matrix whose columns are the basis vectors, A = [~u1 . . . ~un ], then we can rewrite the
first equation as
~x = U~xB .
In words, this is saying that multiplying the coordinate vector ~xB of ~x in the basis B by the
matrix U gives us ~x, the coordinate vector of ~x in the standard basis S. Hence, we call U the
change of basis matrix from B to the standard basis.

What if were given a vector in the standard basis, and we want to find ~xB ? Well, because
B is a basis, the matrix U is invertible, so we can solve for ~xB by multiplying both sides of the
equation above by U 1 to get the equation

~xB = U 1 ~x.

## We call U 1 the change of basis matrix from the standard basis to B.

Here are a few more examples.
     
3 5 2
Example 6.21. Let B = {~u1 , ~u2 } = , . Find ~x if ~xB = .
1 2 1 B

We can do this problem using the formula ~x = U~xB . We first set U = [~u1 ~u2 ]. Then,
     
3 5 2 1
~x = U~xB = = .
1 2 1 0

 
1
Example 6.22. Let B be as in Example 2. Find ~xB if ~x = .
1

80
 
3 5
We do this problem using the formula ~xB = U 1 ~x. We know U = , so U 1 =
1 2
 
2 5
. Then,
1 3
     
1 2 5 1 3
~xB = U ~x = = .
1 3 1 2 B

## 6.3.3 Change of Basis Matrices between Nonstandard Bases

Now we want to find the change of basis matrix from B1 to B2 , where B1 and B2 are any bases
for Rn . Let B1 = {~u1 , . . . , ~un } and B2 = {~v1 , . . . , ~vn }. Let U be the matrix U = [~u1 . . . ~un ] and
V be the matrix V = [~v1 . . . ~vn ]. From the formulas above, we know ~x = U~xB1 and ~xB2 = V 1 ~x.
Substituting the first equation into the second, we see that

~xB2 = V 1 U~xB1

## Similarly, we know ~x = V ~xB2 and ~xB1 = U 1 ~x, so

~xB1 = U 1 V ~xB2

## and we say that U 1 V is the change of basis matrix from B2 to B1 .

Lets make it real with a smoothie example.

Example 6.23. Recall Brian just wants to think of macro nutrients in terms of avocado, banana
and tofu. His friend Adrian only thinks in terms of coconut milk, yogurt and pineapple. These
have macro profiles:
~b ~a ~t

~c ~y p~
Protein 3 9 2 Protein 1 2 5
A= Fat
, B= .
24 5 1 Fat 1 15 3
Carb 1 4 6 Carb 22 1 2

One canreadily check that A = {~c, ~y , p~} is a basis. We already saw how to compute fried oysters,
13
~o = 12, in Brians basis, ~oB . Say, that Brian calls up Adrian and asks him about the macros

9
in his fried oyster meal. If Brian tells Adrian ~oB , how does Adrian translate to his basis A?.
That is how do we relate ~oB and ~oA ?
First, we compute some inverses

27 1 69 26 46 1
1 1
B 1 = 64 108 2 , A1 = 143 16 45
1490 1027
329 43 13 91 3 201
Notice that
A1~o = ~oA , and B~oB = ~o.

81
It follows that
A1 B~oB = ~oA .
This is our conversion formula. If youre curious we have

6.2
~oA 9.8 .
55.3

## Heres a non-smoothie example.

1 1 2 1 1 2
Example 6.24. Let B1 = 1 , 4 , 1
and let B2 = 0 , 3 , 1 . Find ~xB1 if

3 2 6 1 0 2

3
~xB2 = 2 .
1 B
2
1 1 2 1 1 2
First set U = 1 4 1 and V = 0 3 1. We know ~xB1 = U 1 V ~xB2 so we need to
3 2 6 1 0 2
1
find U . This is
22 2 7
U 1 = 3 0 1 ,
10 1 3
so
22 2 7 1 1 2 3 129
~xB1 = U 1 V ~xB2 = 3 0 1 0 3 1 2 = 16 .
10 1 3 1 0 2 1 B 60 B
2 1

82
6.4 Diagonalization
6.4.1 How to diagonalize
Suppose A is an n n matrix with eigenvectors B = {~u1 , . . . , ~un } that form a basis for Rn . Let
1 , . . . , n be the corresponding eigenvectors. If ~x is any vector in Rn , because these eigenvectors
form a basis, we can write
~x = c1 ~u1 + + cn ~un
or
c1
..
~xB = . .
cn B
Because these are eigenvectors, if we multiply the first equation by A, we get

## A~x = c1 1 ~u1 + + cn n ~un

or
1 c1
(A~x)B = ... .

n cn B
But, if D is the matrix
1 0 ... 0
..
0 2 .
D=
.. ..
,

. . 0
0 ... 0 n
we have just derived the formula

1 c1 c1
.. ..
(A~x)B = . = D . = D~xB .
n cn cn

~xB = P 1 ~x

## This equation holds for any vector ~x, so tells us that

A = P DP 1 .

Definition 6.25. We say that an nn matrix A is diagonalizable if we can find these matrices
P and D (where D is a diagonal matrix) such that A = P DP 1 .

83
In the situation outlined above, we see that we can find P and D by finding a basis for Rn
consisting of eigenvectors {~u1 , . . . , ~un } of A with corresponding eigenvalues 1 , . . . , n . Then
P = [~u1 . . . ~un ] and D is the diagonal matrix with the eigenvalues on the diagonal.
 
3 1
Example 6.26. Find matrices P and D to diagonalize A = .
2 0
First, we want to find the eigenvalues and eigenvectors of A. A computation shows that
the eigenvalues
 are
 1 = 1 and 2 = 2, and an eigenvector corresponding to 1 = 1 is the 
1 1
vector ~u1 = . Similarly, an eigenvector corresponding to 2 = 2 is the vector ~u2 = .
2 1
   
1 1 1 0
Then, the vectors {~u1 , ~u2 } form a basis for R2 , so we set P = and D = . Then,
2 1 0 2
A = P DP 1 .

Remark. Not every matrix is diagonalizable. In order to do this computation, we need to find
a basis for Rn consisting of eigenvectors of A, and we cannot always do this. We can if either:

## The dimension of the -eigenspace of each eigenvalue, , is equal to the multiplicity of .

6.4.2 Applications
The main application of this is the following formula:

## Fact 6.27. If A = P DP 1 is diagonalizable then Ak = P Dk P 1 . Since D is a diagonal matrix,

Dk is obtained by raising each entry of D to the kth power.

For example, (P DP 1 )2 = P DP 1 P DP 1 = P DP 1 .

Example 6.28 (Hippos). Recall that the hippo population model had a matrix:

b1 b2 b3 0 2 1
1
A = p1 0 0 =

2 0 0
0 p2 0 0 15 0
This has eigenvalues and vectors:

1
1 = .95, ~u1 = .53
.11

1
2 = .10, ~u2 = 4.9
9.8

1
3 = 1.04, ~u3 = .48
.09

84

.95 0 0
Let P = [~u1 ~u2 ~u3 ] and D = 0 .10 0 . As we take larger and larger powers Ak
0 0 1.04

0 0 0
we get the matrix Dk starts to look like Dk 0 0 0 .
0 0 1.04k
One final example. Remember that e 2.714 is a ubiquitous number in math. This number
mysteriously comes up all the time. For example,
p
e = lim n LCM(1, 2, . . . , n),
n

and
1
e 1 = 0.
Heres two from probability. If n people are at the dog park, the dogs run around, then eventually
return to a random owner, we will have n/e dogs return to their actual owner. Or, if you interview
n candidates for a job, and must make the hiring decision immediately after each interview, you
should interview n/e of the candidates then hire the next best one to come along (called the
secretary problem, wiki it).
Another way to define ex from Math 126 is as an infinite sum:

x0 x1 x2 xk
ex = + + + + + .
0! 1! 2! k!
We can define the exponential of a matrix in the same way.

A0 A1 A2 Ak
eA = + + + + + .
0! 1! 2! k!
This is hard in general. Computers cant even do it. But it becomes tractable for diagonal
matrices:

## Fact 6.29. If A = P DP 1 is diagonalizable with eigenvalues 1 , . . . , n then

e 1 0 0
0 e 2
A 1
e =P . P

. .
. .
. .
.
. . . .
0 0 0 en
 
3 1
Example 6.30. Find eA for A = .
2 0
   
1 1 1 1
This is the example from just before. We have P = , P 1 = 1
3 , and
2 1 2 1
 
1 0
D= . By the fact we have
0 2
2
e2 + 13 e 1
e2 13 e
 1   
e 0 1
eA = P P = 3
2
3 .
0 e2 3 e2 23 e 1
3 e2 + 23 e

85
Note you can also write things like etA or 3A = elog(3)A . It is possible to check from the
d tA
definition that dt e = AetA . We can use this to use to solve systems of differential equations.
 
S(t)
Example 6.31. Suppose that ~x = is a function of t and represents people at different
F (t)
stages of having a facebook account. Let S(t) denote number of people that are susceptible to
making a facebook, but not yet on it. Let F (t) denote number of people who have an account.
We assume:
The fraction of the susceptible who make facebooks per unit time (say decades) is propor-
tional to the number with facebooks, b is the proportional number.
A fixed fraction rS of the facebook population deletes their facebook, but still is susceptible
to making a new account.
Another proportion dI vows to never use facebook again with 0 d 1 r.
The system of differential equations that model this phenomenom are

S 0 = bS + rF
I 0 = bS rF dF

 
S(t)
Letting ~x(t) = . The above looks like matrix multiplication. We can express this as
F (t)
 
d b r
~x(t) = ~x(t).
dt b r d
 
S(0)
Call the above matrix A. Suppose that ~x0 = . The solution to the above is given by
I(0)

## You can check this by taking the derivative of both sides.

For the concrete case r = 4/5, b = 2/3, d = 1/10 we have
 
2/3 4/5
A=
2/3 9/10
has eigenvalues and eigenvectors:
     
1 1
1.52, , 0.043, .
1.07 0.78
If we diagonalize and exponentiate we get

## 0.42 e1.52t + 0.57 e.043t 0.54 e1.52t + 0.54 e.043t

 
tA
e = .
0.45 e1.52t + 0.45 e.043t 0.57 e1.52t + 0.42 e.043t
 
4
Now if we plug in initial conditions of say ~x0 = (in billions) we can estimate when no
1
one is on facebook anymore by solving for when the second coordinate of etA ~x0 is equal to 0.

86
This amounts to multiplying the second row of A with ~x0 , setting equal to 0, and solving for
t
4(0.45 e1.52t + 0.45e.043t ) + 1(0.57e1.52t + 0.42e.043t ) = 0.
A solver numerically estimates this at t = 1.31 decades.

87