
MATH 146 January 9 Section 2

Pseudo-Definition. A field is a “number system” F having:


(1) elements 0,1 (and possibly others). Note: we require that 0 ̸= 1.
(2) operations + and ×, and secondary operations − and ( )−1 (the latter defined only for nonzero elements)
and satisfying the “obvious” algebraic laws. (See the end of these notes for the official definition.)
Example 1. R, C, Q, Zp (p a prime) are fields. Another example is R(x), the field of all rational
functions p(x)/q(x) where p(x) and q(x) are polynomials over R (and q(x) ̸= 0).
Example 2. N, Z, Zn (n composite) are not fields. The class of all ordinals (under ordinal addition
and multiplication) is not a field.
Let F be a field.
Definition. A vector space over F is a set V on which two operations
• addition, V × V → V , denoted x + y
• scalar multiplication, F × V → V , denoted ax
are defined , and such that the following conditions hold :
For all x, y, z ∈ V and a, b ∈ F:
(VS 1) x+y =y+x
(VS 2) (x + y) + z = x + (y + z)
(VS 3) There exists a “zero vector” in V , denoted 0, which satisfies x + 0 = x for all x ∈ V .
(VS 4) For every x ∈ V there exists u ∈ V satisfying x + u = 0.
(VS 5) 1x = x
(VS 6) (ab)x = a(bx)
(VS 7) a(x + y) = ax + ay
(VS 8) (a + b)x = ax + bx
To define a vector space, you must specify the set and the two operations.
To prove that a set with two operations is a vector space, you need to verify the 8 conditions.
Example 3. Rn is the set of all n-tuples (a1 , a2 , . . . , an ) of real numbers. Addition and scalar
multiplication (by real numbers) on Rn are defined “coordinate-wise,” i.e.,
(a1 , a2 , . . . , an ) + (b1 , b2 , . . . , bn ) := (a1 +b1 , a2 +b2 , . . . , an +bn )
c(a1 , a2 , . . . , an ) := (ca1 , ca2 , . . . , can ).
Claim. Rn with coordinate-wise addition and scalar multiplication is a vector space over R.
Proof sketch. Must verify that each of (VS 1)–(VS 8) holds. □
Example 4. More generally, for any field F, the set Fn = {(a1 , a2 , . . . , an ) : a1 , . . . , an ∈ F} of
all n-tuples from F, with coordinatewise addition and scalar multiplication (calculated in F), is a
vector space over F.
Example 5. For any field F, the set FN = {(a1 , a2 , . . . , an , . . .) : ai ∈ F} of all infinite sequences
over F, with coordinatewise addition and scalar multiplication (calculated in F), is a vector space
over F.
Example 6. Let F = C. Let V = {o} where o is the apple I brought to class. Defining addition
and scalar multiplication in the only possible way, V is a vector space over C.

Here is our official definition of “field” (from Appendix C of Friedberg, Insel & Spence).
Definition. A field is a set F on which two operations + and · (called addition and multiplica-
tion) are defined (for all possible pairs in F, and always producing an element of F), and having
two privileged elements 0 and 1 with 0 ̸= 1, such that the following laws hold: for all a, b, c ∈ F,
(F 1) a + b = b + a and a · b = b · a
(F 2) a + (b + c) = (a + b) + c and a · (b · c) = (a · b) · c
(F 3) 0 + a = a and 1 · a = a.
(F 4) For every a ∈ F there exists x ∈ F with a + x = 0.
For every a ∈ F with a ̸= 0 there exists y ∈ F with a · y = 1.
(F 5) a · (b + c) = (a · b) + (a · c)
In any field F, we define −a to be the unique solution x to a + x = 0. Note that (F 4) guarantees
the existence of a solution, and uniqueness can be deduced from (F 1)–(F 3).
In any field F, if a ∈ F and a ̸= 0, then we define a−1 to be the unique solution x to ax = 1.
Note that (F 4) guarantees the existence of a solution, and uniqueness can again be deduced from
(F 1)–(F 3).

MATH 146 January 11 Lecture notes

Announcements:

More examples of vector spaces (over a field F):


(1) Fix m, n ≥ 1. Mm×n (F) is the set of all m × n matrices with entries from F. Define addition of
matrices in the usual way (componentwise); also define scalar multiplication componentwise:
c \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} := \begin{pmatrix} ca_{11} & ca_{12} & \cdots & ca_{1n} \\ ca_{21} & ca_{22} & \cdots & ca_{2n} \\ \vdots & \vdots & & \vdots \\ ca_{m1} & ca_{m2} & \cdots & ca_{mn} \end{pmatrix}.
This is a vector space. What is the zero vector? Note: though you know how to multiply matrices,
the product operation is not itself part of the vector space. (Also, if m ̸= n then you can’t multiply
two m × n matrices.)
(2) Let F(R, R) be the set of all functions f : R → R. We can turn F(R, R) into a vector space over R, using
the usual operations of adding two functions, and scaling a function by a real number.
(3) More generally, fix a nonempty set D. Let F(D, F) be the set of all functions f : D → F. We can
turn F(D, F) into a vector space by defining addition and scalar multiplication “pointwise.” I.e., if
f, g ∈ F(D, F), then we need to define f + g and this sum must be a function from D to F. We do
this by setting, for each x ∈ D,
(f + g)(x) := f (x) + g(x).
Scalar multiplication is defined similarly: if f ∈ F(D, F) and c ∈ F, then the function cf is defined
by setting, for each x ∈ D,
(cf )(x) := c · f (x).
With these two operations, F(D, F) is a vector space over F.
(4) Fix n ≥ 0. Pn (F) denotes the set of all “formal polynomials” in the variable x, of degree at most
n, using coefficients from F. Thus
Pn (F) = {a0 + a1 x + a2 x2 + · · · + an xn : a0 , a1 , . . . , an ∈ F}.
“Formal” means that two polynomials are considered to be equal if and only if they have exactly
the same coefficients (not when they define the same polynomial functions F → F).
Addition of polynomials in Pn (F) is defined “term-wise” in the usual way. Similarly, multiplica-
tion of a polynomial by a scalar c ∈ F is defined by multiplying the coefficient of each term by c.
Pn (F) with these operations is a vector space over F.
Note that
F = P0 (F) ⊂ P1 (F) ⊂ P2 (F) ⊂ · · · ⊂ Pn (F) ⊂ Pn+1 (F) ⊂ · · ·
We let P(F) be the set of all formal polynomials (of all degrees) with coefficients from F. Thus
P(F) = ⋃_{n=0}^{∞} Pn (F).

P(F) with addition and scalar multiplication defined term-wise is a vector space over F.
Remark. A more standard notation for P(F) is F[x], but we will use P(F) in this course.
Next: some basic facts true of all vector spaces.
Theorem 1.1 (Cancellation Law). Let V be a vector space. If x, y, u ∈ V and x + u = y + u, then x = y.
Proof. Assume x + u = y + u. Let 0 be the zero vector of V (given by (VS 3)). By (VS 4), there exists
z ∈ V with u + z = 0. Then
x = x+0 (VS 3)
= x + (u + z) by choice of z above
= (x + u) + z (VS 2)
= (y + u) + z assumption
= y + (u + z) (VS 2)
= y+0 choice of z above
= y (VS 3).

Corollary 1. Suppose V is a vector space. There is only one vector in V that can be the zero vector.
Proof. Suppose 01 , 02 ∈ V both satisfy (VS 3). Thus x + 01 = x + 02 = x for all x ∈ V . Use (VS 1) to flip
this to get 01 + x = 02 + x. Then 01 = 02 by the Cancellation Law. □
Corollary 2. Suppose V is a vector space and x ∈ V . There is only one vector u ∈ V satisfying x + u = 0.
Proof. Like the proof of Corollary 1; exercise. □
Definition. Let V be a vector space and x, y ∈ V .
(1) −x denotes the unique vector u ∈ V satisfying x + u = 0.
(2) x − y denotes x + (−y).
Theorem 1.2. Suppose V is a vector space over F, x ∈ V , and a ∈ F.
(1) 0x = 0.
(2) (−a)x = −(ax) = a(−x).
(3) a0 = 0.
Remark. Parts (1) and (2) are proved in the text. Parts (1) and (3) are also proved in Prof. Tatarko’s
notes.
Also note the overloaded notation in the statement. For example, in part (1), the first 0 is the scalar
0 ∈ F, while the second 0 is the zero vector in V . In part (2), the first − means minus in F, while the
second and third mean the minus in V defined above.
Here is a proof of (1), where I will use bold font for vectors, nonbold for scalars: In the field F we have
0 + 0 = 0; thus 0x = (0 + 0)x = 0x + 0x by (VS 8). On the other hand, we have 0x = 0x + 0 = 0 + 0x by
(VS 3) and (VS 1). Putting these together gives 0x + 0x = 0 + 0x. Now apply the Cancellation Law to
get 0x = 0.

MATH 146 January 13 Section 2

Assume V is a vector space over F and x1 , . . . , xn ∈ V . Because of (VS 2), we can (and do) write
expressions like x1 + x2 + · · · + xn without declaring where the brackets go. And we could (though we
won’t) prove true facts like
x1 + x2 + · · · + xn = xσ(1) + xσ(2) + · · · + xσ(n) for any permutation σ of {1, 2, . . . , n}
c(x1 + x2 + · · · + xn ) = cx1 + cx2 + · · · + cxn for any scalar c ∈ F
−(x1 + x2 + · · · + xn ) = −x1 − x2 − · · · − xn
Going forward, you can freely use these facts, except when told not to.

Definition. Let V be a vector space over a field F, and let W ⊆ V .


(1) W is closed under addition (of V ) if x, y ∈ W =⇒ x + y ∈ W .
(2) W is closed under scalar multiplication (of V ) if x ∈ W and a ∈ F =⇒ ax ∈ W .
Definition (Different from our text!). Let V be a vector space over F. A subset W ⊆ V is called a
subspace of V if
(1) W is closed under the operations of V , and
(2) W ̸= ∅.
Lemma. Suppose V is a vector space over F and W is a subspace of V . Then W , with the operations of V
restricted to W , is itself a vector space over F.
Proof. First, the restrictions to W of the operations of V are operations of the right kind (W × W → W
and F × W → W ), because W is closed under the operations of V .
Next, (VS 1), (VS 2), and (VS 5)–(VS 8) follow automatically from their truth in V because of their
logical nature (universally quantified statements) and the fact that the operations of W are just the
restrictions to W of the operations of V . Proving (VS 3) and (VS 4) requires more work. Let +V be the
addition operation of V and let + denote its restriction to W .
(VS 3): since W ̸= ∅, we can choose x ∈ W . Then 0x ∈ W (as W is closed under scalar multiplication).
But 0x is the zero vector 0V of V (by Theorem 1.2(1) applied to V ), so 0V ∈ W . We’ll now show that 0V
is the zero vector of W : for any x ∈ W ,
x + 0V = x +V 0V (+ is the restriction of +V )
=x since (VS 3) is true in V and x ∈ V .

(VS 4): given x ∈ W , recall that −1 ∈ F so (−1)x ∈ W . Applying Theorem 1.2(2) and (VS 5) to V , we
get (−1)x = −(1x) = −x (the additive inverse of x in V ), which proves −x ∈ W . We now show that −x
is also an additive inverse of x in W :
x + (−x) = x +V (−x) = 0V by definition of −x. □
Incidentally, the proof shows that if W is a subspace of V , then:
• W always contains the zero vector of V (which is also the zero vector of W );
• W is closed under additive inverses (calculated in V ). Moreover, if x ∈ W then the additive inverse
of x calculated in V is the same as the additive inverse of x calculated in the vector space W .
Example.
(1) V = R3 as a vector space over R (usual operations). Let W be the following plane in R3 :
W = {(x, y, z) ∈ R3 : x + y + z = 0}.
It’s easy to check that W is a subspace of R3 : i.e., it is a subset of R3 , it is closed under addition
and scalar multiplication, and it is nonempty.
(2) Same V . Let W1 = {(x, y, z) ∈ R3 : x + y + z = 1}. W1 is not a subspace of R3 , for many reasons:
(i) it doesn’t contain the zero element of R3 , so it can’t be a subspace by the observation following
the proof of the Lemma; (ii) W isn’t closed under scalar multiplication; and (iii) W isn’t closed
under vector addition.
(3) Let W2 be the paraboloid
W2 = {(x, y, z) ∈ R3 : z = x2 + y 2 }.
W2 is not a subspace of R3 , as it is not closed under scalar multiplication. (It’s also not closed
under vector addition.)
(4) Given any vector space V :
• V is a subspace of itself.
• {0} is a subspace.
(5) The subspaces of R3 are R3 , all planes through 0, all lines through 0, and {0}.
(6) Let V be C considered as a vector space over R. (So scalar multiplication is R × C → C.)
(a) R is a subspace of V . (Proof: R ̸= ∅, R is closed under (complex) addition, and R is closed
under scaling by real numbers.)
(b) Can you find another subspace of V (other than C and {0})?
(7) Is R2 a subspace of R3 ?
(8) For each field F and n ≥ 0, Pn (F) is a subspace of Pn+1 (F) and is also a subspace of P(F).

MATH 146 January 16 Section 2

Theorem 1.4. Let V be a vector space over F. Let W1 , W2 be subspaces of V . Then W1 ∩ W2 is also a
subspace of V .
Proof. We check the definition of being a subspace.
(1) Let x, y ∈ W1 ∩ W2 . Then x, y ∈ W1 so x + y ∈ W1 . Similarly, x, y ∈ W2 so x + y ∈ W2 . Thus
x + y ∈ W1 ∩ W2 , proving W1 ∩ W2 is closed under addition.
(2) Let x ∈ W1 ∩ W2 and c ∈ F. A similar proof shows cx ∈ W1 ∩ W2 .
(3) We have 0 ∈ W1 and 0 ∈ W2 by the earlier Lemma, so 0 ∈ W1 ∩ W2 , proving W1 ∩ W2 ̸= ∅. □
Note: in general, it is not true that the union of two subspaces is a subspace. See Assignment 1.

§1.4

Definition. Suppose V is a vector space over F and x ∈ V .


(1) For v1 , . . . , vn ∈ V , we say that x is a linear combination of v1 , . . . , vn (or of {v1 , . . . , vn }) if
there exist c1 , . . . , cn ∈ F such that x = c1 v1 + · · · + cn vn .
(2) If S ⊆ V is infinite, we say that x is a linear combination of S if x is a linear combination of
some v1 , . . . , vn ∈ S.
Definition 1.5. Suppose V is a vector space over F and ∅ ̸= S ⊆ V . The span of S is the set
span(S) := {all linear combinations of S} ⊆ V.
Also span(∅) := {0}.
Example. In R3 , the span of {(1, 0, 0), (0, 1, 0)} is
{a(1, 0, 0) + b(0, 1, 0) : a, b ∈ R} = {(a, b, 0) : a, b ∈ R},
the x-y plane in R3 .
Example. In P(R), what is the span of the infinite set S = {x, x2 , x3 , x4 , . . .}? It includes all linear
combinations of finitely many of x, x2 , x3 , x4 , . . .. Thus we get all polynomials of the form
a1 x + a2 x2 + · · · + an xn , with a1 , a2 , . . . , an ∈ R.
In other words, span(S) = {f (x) ∈ P(R) : f (0) = 0}.
span(S) will turn out to always be a subspace. Before proving this, we need the following.
Technical Observation. Suppose V is a vector space, S ⊆ V with S infinite, and x, y ∈ span(S). Then
each of x, y is a linear combination of some finite set of vectors in S, but perhaps not the same finite set.
Not a problem: if
x = a1 u1 + · · · + am um with u1 , . . . , um ∈ S
y = b1 v1 + · · · + bn vn with v1 , . . . , vn ∈ S
then, letting k = |{u1 , . . . , um } ∩ {v1 , . . . , vn }|, without loss of generality we can assume ui = vi for
i = 1, . . . , k and {uk+1 , . . . , um } ∩ {vk+1 , . . . , vn } = ∅ and write
x = a1 u1 + · · · + ak uk + ak+1 uk+1 + · · · + am um + 0·vk+1 + · · · + 0·vn
y = b1 u1 + · · · + bk uk + 0 · uk+1 + · · · + 0 · um + bk+1 vk+1 + · · · + bn vn
and both x, y are now written as linear combinations of the same finite subset {u1 , . . . , um } ∪ {vk+1 , . . . , vn }
of S.
Theorem 1.6. Let V be a vector space over F. Let S ⊆ V . Then:
(1) span(S) is a subspace of V .
(2) span(S) contains S, i.e., S ⊆ span(S).
(3) Moreover, span(S) is the smallest subspace of V containing S, in the following sense: if W is any
subspace of V with S ⊆ W , then span(S) ⊆ W .
Proof. The case when S = ∅ is left as an exercise. So assume S ̸= ∅.
(1) First show span(S) is closed under scalar multiplication. If x ∈ span(S) then (whether S is finite or
infinite) there exist u1 , . . . , un ∈ S and scalars a1 , . . . , an ∈ F with
x = a1 u1 + · · · + an un .
For any scalar c ∈ F we then get
cx = c(a1 u1 + · · · + an un ) = (ca1 )u1 + · · · + (can )un ,
which is also a linear combination of u1 , . . . , un . This proves cx ∈ span(S) as required.
Next show span(S) is closed under addition. Let x, y ∈ span(S). By the earlier technical remark, there
exist v1 , . . . , vn ∈ S such that both x and y can be written as linear combinations of v1 , . . . , vn , say
x = a1 v1 + · · · + an vn
y = b1 v1 + · · · + bn vn .
Then
x + y = (a1 v1 + · · · + an vn ) + (b1 v1 + · · · + bn vn )
= (a1 v1 + b1 v1 ) + · · · + (an vn + bn vn ) (commutativity and associativity)
= (a1 + b1 )v1 + · · · + (an + bn )vn (VS 8).
This last expression is a linear combination of v1 , . . . , vn , vectors in S. This proves x + y ∈ span(S).
(2) If S = {v1 , . . . , vn }, then v ∈ S implies v = vi for some i. Then
v = 0·v1 + · · · + 0·vi−1 + 1·vi + 0·vi+1 + · · · + 0·vn
which proves v ∈ span(S) in this case. If S is infinite, then we can write each v ∈ S as v = 1·v, which is a
linear combination of finitely many (i.e., one) vector from S. So again v ∈ span(S).
(3) This will be proved on Wednesday. □

MATH 146 January 18 Section 2

Theorem 1.6. Let V be a vector space over F. Let S ⊆ V . Then:


(1) span(S) is a subspace of V .
(2) span(S) contains S, i.e., S ⊆ span(S).
(3) Moreover, span(S) is the smallest subspace of V containing S, in the following sense: if W is any
subspace of V with S ⊆ W , then span(S) ⊆ W .
Proof (continued). (3) Suppose W is a subspace of V and S ⊆ W . Let x ∈ span(S). So x = a1 v1 +· · ·+an vn
for some v1 , . . . , vn ∈ S and a1 , . . . , an ∈ F. Then
• v1 , . . . , vn ∈ W (as S ⊆ W )
• So a1 v1 , a2 v2 , . . . , an vn ∈ W (as W is a subspace, so is closed under scalar multiplication)
• So a1 v1 + · · · + an vn ∈ W (as W is closed under addition)
I.e., x ∈ W . Since x was arbitrary in span(S), this proves span(S) ⊆ W . □
Comment. Theorem 1.6 gives an easy way to prove span(S) ⊆ W when S ⊆ V and W is a subspace of
V : just prove S ⊆ W and then use Theorem 1.6(3) to deduce span(S) ⊆ W .
Jargon. If S ⊆ V and span(S) = W , then we say that S spans the subspace W , or that S is a spanning
set of W . In particular, if span(S) = V then we say S spans V .
A basic question we will be frequently faced with is deciding whether a given vector x is in the span of
a given set of vectors.
Example. Consider the vector space M2×2 (R) of 2 × 2 matrices over R. Determine whether
\begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix} \in \operatorname{span}\left\{ \begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} \right\}.
Solution. We are asking whether there exist scalars a, b, c, d ∈ R satisfying
a \begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix} + b \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} + c \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} + d \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix}.
Equivalently,
\begin{pmatrix} a & 0 \\ -a & 0 \end{pmatrix} + \begin{pmatrix} b & b \\ 0 & -b \end{pmatrix} + \begin{pmatrix} 0 & c \\ c & c \end{pmatrix} + \begin{pmatrix} d & 2d \\ d & 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix}
or more simply
\begin{pmatrix} a+b+d & b+c+2d \\ -a+c+d & -b+c \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix}.
Since matrices are equal if and only if their corresponding entries are equal, this last equation simplifies to
the following system of four linear equations:
(∗)   a + b + d = 2
      b + c + 2d = 1
      −a + c + d = −1
      −b + c = 1
The point of this example is that the original question (is the first matrix in the span of the other four
matrices?) is equivalent to asking whether the above system of 4 linear equations in the four unknowns
a, b, c, d has a solution in real numbers (remember that in this example, the scalars are real numbers).
We determine whether the system has a solution by solving it. There are many ways to do this (software,
row-reduction of an appropriate matrix). For now, we want you to know how to solve the system by direct
manipulation of the equations, called the method of elimination. I’ll illustrate in this example.
Step 1: eliminate the variable a from the 3rd equation, by adding the first equation to it. This changes
the system to
      a + b + d = 2
      b + c + 2d = 1
      b + c + 2d = 1
      −b + c = 1
It’s easy to argue that every solution to the first system is also a solution to the 2nd system. The
converse is also true, since we can return to the original system by subtracting the first equation from the
(new) third equation. In other words, the original system and the new system are equivalent (have the
same solutions).
The 3rd equation is now irrelevant since it is a repeat. So we can delete it.
      a + b + d = 2
      b + c + 2d = 1
      −b + c = 1
Step 2. There are no other occurrences of a that we can eliminate, so we consider eliminating another
variable. For example, we could eliminate b from the first and second equations by adding the third
equation to each of them. This gives
      a + c + d = 3
      2c + 2d = 2
      −b + c = 1
Again, this system is equivalent to the original one.


Step 3. An obvious simplification to do here is to divide the second equation by 2 (i.e., multiply by 1/2).
Let’s also multiply the 3rd equation by −1. These simplifications do not change the solution set of the
system.
      a + c + d = 3
      c + d = 1
      b − c = −1
Step 4. Can we eliminate another variable? If I try to eliminate c from the 2nd equation, I would need to
either add the 3rd equation to it, but that would re-introduce b; or subtract the first equation from it, but
that would introduce a. So I don’t want to do that. However, I could add the 2nd equation to the third
and eliminate c from the 3rd equation; and I could subtract the 2nd equation from the first and eliminate
c from it. Let’s do that:
(∗∗)  a = 2
      c + d = 1
      b + d = 0
Step 5: Neither occurrence of d can be eliminated (without re-introducing a, b or c). This signals that we
are done with the elimination process. Since we didn’t find a contradiction (such as 0 = 1), this system
has solutions. Since solutions exist, this means that the answer to our original question is YES:
\begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix} \in \operatorname{span}\left\{ \begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} \right\}.
To actually see how to express the first matrix as a linear combination of the other four, we first describe
all solutions to the system (∗). Introducing a new symbol, or parameter, for d, say d = t, we can express a, b, c, d in terms of t using the equations in (∗∗), obtaining
      a = 2
      b = −t
      c = 1 − t        (t ∈ R)
      d = t
As t ranges over R, the equations above describe all solutions to (∗). For example, when t = 0 we get the
solution (a, b, c, d) = (2, 0, 1, 0). Thus one way to describe the original matrix as a sum of the other four is
“2 times the first matrix plus the third matrix,” i.e.,
\begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix} = 2 \cdot \begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix} + 0 \cdot \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} + 1 \cdot \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} + 0 \cdot \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}.
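Aside: a quick machine check of the elimination above. This is only a sketch (it assumes the SymPy library is available; the variable names and the use of linsolve are mine, not part of the course), solving the same system (∗) and recovering the same one-parameter family of solutions.

from sympy import symbols, linsolve

# The system (*) from the example, written as expressions equal to 0.
a, b, c, d = symbols('a b c d')
equations = [a + b + d - 2,       # a + b + d = 2
             b + c + 2*d - 1,     # b + c + 2d = 1
             -a + c + d + 1,      # -a + c + d = -1
             -b + c - 1]          # -b + c = 1

# linsolve returns the full solution set; with one free variable it is
# parametrized by d itself: {(2, -d, 1 - d, d)}, matching a = 2, b = -t,
# c = 1 - t, d = t above. A nonempty solution set answers the span question YES.
print(linsolve(equations, [a, b, c, d]))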
More information about the method of elimination will be placed on Learn.

MATH 146 January 20 Section 2

Properties of span. Let V be a vector space and S, T ⊆ V .


(1) If S ⊆ T then span(S) ⊆ span(T ).
Proof. Can prove this directly. Or argue like this: T ⊆ span(T ) (Theorem 1.6), so S ⊆ T =⇒
S ⊆ span(T ) =⇒ span(S) ⊆ span(T ) (using Theorem 1.6, since span(T ) is a subspace). □
(2) Is span(S ∪ T ) = span(S) ∪ span(T )?
Answer. No: right-hand side might not be a subspace; left-hand side is always a subspace. □

§1.5
Jargon. Let V be a vector space over F. Given a linear combination
a1 v1 + · · · + an vn
of vectors from V , we call it trivial if a1 = a2 = · · · = an = 0, and nontrivial otherwise.
Definition. Let V be a vector space over a field F. Let S ⊆ V .
(1) Say S is linearly dependent if there exists a nontrivial linear combination of distinct vectors from
S which equals 0; i.e., if it is possible to write
(∗) a1 v1 + a2 v2 + · · · + an vn = 0
for some distinct vectors v1 , . . . , vn ∈ S , and such that at least one ai is ̸= 0 .
(2) Say S is linearly independent if S is not linearly dependent: i.e., if
v1 , . . . , vn ∈ S, v1 , . . . , vn distinct, and a1 v1 + · · · + an vn = 0 =⇒ a1 = a2 = · · · = an = 0.
Example. (1) In R2 , the set {(1, −1), (−2, 2), (3, 4)} is linearly dependent because we can write
2·(1, −1) + 1·(−2, 2) + 0·(3, 4) = (0, 0).
(2) In F3 , is the set {(1, 1, 0), (1, 0, 1), (0, 1, 1)} linearly dependent or independent? (A computational sketch over Z2 follows this list of examples.)
Solution: solve a(1, 1, 0) + b(1, 0, 1) + c(0, 1, 1) = (0, 0, 0). We get the system
a + b = 0
a + c = 0
b + c = 0.
If F = R (or C or Q), we can solve to get a = b = c = 0, meaning the set is linearly independent.
But if F = Z2 , we get the solution a = b = c = t (t ∈ Z2 ). In fact, the sum of the three vectors in
(Z2 )3 is the zero vector, so in (Z2 )3 these vectors are linearly dependent.
(3) In any vector space, {0} is linearly dependent. Any set containing 0 is linearly dependent.
(4) {v} is linearly dependent iff v = 0. (Use A1 Problem 3 for ⇒.)
(5) What about ∅? Is it linearly dependent, or independent? (Answer: independent)
(6) Suppose S ⊆ T ⊆ V . Which of the following implications are correct?
S is linearly dependent =⇒ T is linearly dependent. (True)
T is linearly dependent =⇒ S is linearly dependent. (False)
S is linearly independent =⇒ T is linearly independent. (False)
T is linearly independent =⇒ S is linearly independent. (True)
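Aside (referenced in Example (2) above): a brute-force check, in plain Python, that the three vectors are linearly dependent over Z2 . This is only an illustrative sketch; the helper names are mine.

# Try every coefficient triple (a, b, c) with entries in {0, 1} and look for a
# nontrivial combination of (1,1,0), (1,0,1), (0,1,1) that equals 0 in (Z_2)^3.
vectors = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]

def combo_mod2(coeffs):
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) % 2 for i in range(3))

nontrivial = [(a, b, c)
              for a in (0, 1) for b in (0, 1) for c in (0, 1)
              if (a, b, c) != (0, 0, 0) and combo_mod2((a, b, c)) == (0, 0, 0)]
print(nontrivial)   # [(1, 1, 1)]: the sum of the three vectors is 0 over Z_2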
Proposition. Let V be a vector space over F. Let S ⊆ V . TFAE:
(1) S is linearly dependent.
(2) S = {0} or ∃v ∈ S such that v can be expressed as a linear combination of other vectors in S.
Proof. (⇐) We have already explained why {0} is linearly dependent. Suppose the vector v ∈ S can be
written as a linear combination of other vectors u1 , . . . , un ∈ S, say
v = c1 u1 + · · · + cn un , c1 , . . . , cn ∈ F.
We can assume that u1 , . . . , un are distinct. Since v ̸∈ {u1 , . . . , un } by assumption, it follows that
u1 , . . . , un , v are distinct. Now note that (−1)v = −v by Theorem 1.2(2) and so
c1 u1 + · · · + cn un + (−1)v = 0.
As u1 , . . . , un , v are distinct vectors in S and at least one of the coefficients (−1) in the above linear
combination is not 0, we get that S is linearly dependent.
(⇒) Assume S is linearly dependent. By assumption there exist distinct u1 , . . . , un ∈ S and a1 , . . . , an ∈
F, not all 0, such that
(∗) a1 u1 + · · · + an un = 0.
By “weeding out” terms where ai = 0, we can assume that ai ̸= 0 for all i = 1, . . . , n.
Case 1: n = 1. Then a1 u1 = 0 with a1 ̸= 0. Thus u1 = 0 by A1 Problem 3, so {0} ⊆ S. One option is S = {0}; the other is that there exists v ∈ S with v ̸= 0. In the second option, we can write 0 ∈ S as a linear combination of v ∈ S, namely, 0 = 0v.
Case 2: n > 1.
In this case we can show that every ui can be written as a linear combination of the other uj ’s. For
example, here is the proof that u1 can be written as a linear combination of u2 , . . . , un : rewrite (∗) as
a1 u1 = (−a2 )u2 + · · · + (−an )un .
Recall that a1 ̸= 0, so a1⁻¹ exists (in F). Multiplying both sides of the equation by a1⁻¹ gives
u1 = (−a2 /a1 )u2 + · · · + (−an /a1 )un
proving u1 ∈ S can be written as a linear combination of u2 , . . . , un ∈ S. □
Comment. Another equivalent condition to “S is linearly dependent” is
∃v ∈ S with v ∈ span(S \ {v}).
Indeed, if v ∈ S and v = a1 u1 + · · · + an un with u1 , . . . , un ∈ S and ui ̸= v for all i, then v ∈ span(S \ {v}).
If S = {0} then 0 ∈ span(∅) = span(S \ {0}).
Taking the negation, we get
Corollary. A set S in a vector space is linearly independent iff ∀v ∈ S we have v ̸∈ span(S \ {v}).

MATH 146 January 23 Section 2

§1.6 Recall: a set S (in a vector space over F) is linearly independent if
v1 , . . . , vn ∈ S, v1 , . . . , vn distinct, and a1 v1 + · · · + an vn = 0 =⇒ a1 = a2 = · · · = an = 0.

Definition. Let V be a vector space over F. A subset S ⊆ V is a basis for V if S is linearly independent
and spans V .
Example. (1) V = Rn : a basis is En = {e1 , e2 , . . . , en }, where e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, . . . , 0, 1).
(2) More generally, En is a basis for Fn (for any F).
(3) V = Mm×n (F): a basis is Em,n = {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} where Eij is the matrix of all 0’s
except one 1 in the (i, j) spot.
(4) V = P(F): a basis is B = {1, x, x2 , . . .}.
Proof sketch of (4). Show span(B) = P(F): ⊆ is obvious. For ⊇, let f ∈ P(F), say
f = a0 + a1 x + a2 x2 + · · · + an xn
= a0 ·1 + a1 x + · · · + an xn .
So f is a linear combination of 1, x, x2 , . . . , xn , all in B; so f ∈ span(B). This proves P(F) ⊆ span(B).
Prove B is linearly independent: assume xi1 , . . . , xik are distinct vectors in B, ai1 , . . . , aik ∈ F, and
ai1 xi1 + · · · + aik xik = 0.
By definition, a polynomial is the zero polynomial iff all of its coefficients are 0, so
ai1 = · · · = aik = 0
proving B is linearly independent. So B is a basis. □
(5) Here is another basis for P(R), invented by Legendre: {P0 , P1 , . . . , Pn , . . .} where
Pn = \sum_{k=0}^{n} \binom{n}{k} \binom{n+k}{k} \left( \frac{x-1}{2} \right)^{k} .
For example, P0 = 1, P1 = x, P2 = (3x2 − 1)/2, . . . This basis for P(R) is “better” than B in some applications.
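Aside: the sum formula for Pn is easy to check by machine for small n. A sketch (assuming SymPy is available; not part of the notes):

from sympy import symbols, binomial, expand

x = symbols('x')

def legendre(n):
    # Evaluate P_n using the sum formula quoted above.
    return expand(sum(binomial(n, k) * binomial(n + k, k) * ((x - 1) / 2)**k
                      for k in range(n + 1)))

for n in range(3):
    print(n, legendre(n))
# Prints: 0 1,  1 x,  2 3*x**2/2 - 1/2, i.e. P2 = (3x^2 - 1)/2 as claimed.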
Theorem 1.8. Suppose B = {v1 , . . . , vn } ⊆ V is finite. TFAE:
(1) B is a basis for V .
(2) Every v ∈ V admits a unique representation v = a1 v1 + · · · + an vn as a linear combination of B.
Proof sketch. (1) ⇒ (2). Since B spans V , every vector in V admits some representation as a linear
combination of B. To prove uniqueness, assume that v ∈ V admits two representations
v = a1 v1 + · · · + an vn
and v = b1 v1 + · · · + bn vn .
Subtract the 2nd from the first to get
0 = (a1 − b1 )v1 + · · · + (an − bn )vn .
Use linear independence of B to get
a1 − b1 = · · · = an − bn = 0, i.e., ai = bi for all i = 1, . . . , n.
(2) ⇒ (1). By assumption, every v ∈ V admits a representation as a linear combination of B, so
span(B) = V . To prove B is linearly independent, note that one representation of the zero vector (as a
linear combination of B) is
0 = 0v1 + · · · + 0vn .
By the uniqueness assumption, this is the only representation of the zero vector. That is,
a1 v1 + · · · + an vn = 0 =⇒ a1 = · · · = an = 0,
proving B is linearly independent. □
Corollary. Let V be a vector space over F. If V has a finite basis B = {v1 , . . . , vn }, then V is “naturally” in 1-1 correspondence with Fn via
a1 v1 + · · · + an vn ∈ V ↭ (a1 , . . . , an ) ∈ Fn .

We’ll discuss this correspondence more later.

Existence of bases
Theorem 1.9. Suppose the vector space V is spanned by a finite set S. Then S can be “shrunk” to a
basis B of V (i.e., ∃ basis B ⊆ S).
Proof sketch. We have span(S) = V . If S is linearly independent, then S itself is a basis. Otherwise,
∃v ∈ S such that v ∈ span(S \ {v}), and hence span(S \ {v}) = span(S) = V . Choose such v and let
S ′ = S \ {v}. It still spans V but is smaller. Repeat until you can’t anymore; the final S (k) must be linearly
independent and a basis. □
Question: is the theorem true for infinite spanning sets? More care would be needed in the proof. For
example, it is possible to construct a vector space V ̸= {0} and an infinite spanning set {v0 , v1 , . . .} with
the property that for every k, vk ∈ span({vk+1 , vk+2 , . . .}). So you might first throw away v0 , then v1 , then
v2 , etc. and at the end you will have reduced S to ∅ (which doesn’t span V ).1

1And even more problems arise if S is uncountable.


MATH 146 January 25 Section 2

Question. Suppose V has a finite linearly independent set S. Can S always be “grown” (i.e., extended)
to a basis for V ?
Naive algorithm proving “YES.”
Assume S ⊆ V is linearly independent. If span(S) = V , then S is already a basis. Otherwise, there
exists x ∈ V with x 6∈ span(S). Let S 0 = S ∪ {x}. I want to say that S 0 is also linearly independent. (I’ll
justify this later.)
Repeat: if span(S 0 ) = V then S 0 is a basis. Otherwise, there exists x0 ∈ V with x0 6∈ span(S 0 ). Let
S 00 = S 0 ∪ {x0 }. Then S 00 is still linearly independent.
Continue: we get S ⊂ S 0 ⊂ S 00 ⊂ · · · ⊂ S (n) ⊂ · · · with each set S (k) linearly independent. Now there is
the question of whether this process terminates. It turns out that we can prove termination, provided we
assume that V has a finite spanning set.
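Aside: here is a rough sketch (my own illustration, assuming SymPy is available) of the naive algorithm inside Fn . Membership x ∉ span(S) is tested by a rank computation: appending x as a new column raises the rank exactly when x lies outside the span. To guarantee termination, the candidate vectors are drawn from a finite spanning set, as discussed above.

from sympy import Matrix

def extend_to_basis(start, candidates):
    S = [Matrix(v) for v in start]
    for x in (Matrix(v) for v in candidates):
        old_rank = Matrix.hstack(*S).rank() if S else 0
        if Matrix.hstack(*(S + [x])).rank() > old_rank:   # x is not in span(S)
            S.append(x)                                   # so S ∪ {x} stays independent
    return S

start = [(1, 1, 0, 0), (0, 0, 1, 1)]                      # linearly independent in R^4
E4 = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
print(len(extend_to_basis(start, E4)))                    # 4: extended to a basis of R^4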
To fully justify the claims above, we need the following two theorems.
Theorem 1.7. Suppose V is a vector space, S ⊆ V , and x ∈ V with x ̸∈ S. Then
S ∪ {x} is linearly independent ⇐⇒ (S is linearly independent and x ̸∈ span(S)).
Proof sketch. (⇒) By known facts.
(⇐) Assume S is linearly independent and x ̸∈ span(S). Suppose S ∪ {x} is linearly dependent. So there exist v1 , . . . , vn ∈ S ∪ {x} (distinct) and scalars a1 , . . . , an ∈ F, not all 0, with
(∗) a1 v1 + · · · + an vn = 0.
By weeding out those ai vi for which ai = 0, we can assume that we have ai ̸= 0 for all i.
We must have x ∈ {v1 , . . . , vn } (why?). WLOG assume that vn = x. Since an ̸= 0, we can “solve for x” in (∗) to get x ∈ span(v1 , . . . , vn−1 ) ⊆ span(S), contradiction. □
Theorem 1.10 (Baby Replacement Theorem1). Suppose V is spanned by some finite set of size n. Then
every linearly independent set S ⊆ V satisfies |S| ≤ n.
(Proof deferred until Friday.)
Now consider again the Question and the naive algorithm for solving it. Assume V is spanned by a
finite set of size n, and S is linearly independent. Recall the sequence of growing linearly independent sets
S ⊂ S ′ ⊂ S ′′ ⊂ · · ·
By the Baby Replacement Theorem, all of these sets must have size ≤ n, so the sequence cannot go on
forever. Therefore at some step, say S (k) , the algorithm terminates, which must be because S (k) is a basis.
This proves:
Corollary 2 (c). Suppose V is spanned by a finite set. Then every linearly independent set S ⊆ V can be
extended to a basis for V .
Here is another application of the Baby Replacement Theorem.
Corollary 1. Assume V is spanned by a finite set.
(1) V has a finite basis B.
(2) Let n = |B|. Every basis of V has size n.
Footnote 1: The full Replacement Theorem says that, if T is the spanning set of size n, then not only do we have |S| ≤ n, but also there exists a subset H ⊆ T of size |H| = n − |S|, such that S ∪ H spans V .
Proof. (1) Let S be a finite spanning set. S can be shrunk to a basis B, by Theorem 1.9 (Jan 23). Obviously
B is also finite.
(2) Let C be another basis. Apply the Baby Replacement Theorem 1.10 using B as the finite spanning
set and S := C as the linearly independent set to get |C| ≤ |B|. So C is also a finite basis. So we can
reverse the roles of B and C in the above argument to get |B| ≤ |C|. So |C| = |B| = n. □

MATH 146 January 27 Section 2
Recap: suppose V has a finite spanning set.
• Theorem 1.9: finite spanning sets shrink to bases.
• (Baby) Replacement Theorem: finite spanning set B, S lin. indep. =⇒ |S| ≤ |B|.
• Corollary 2(c): lin. indep. sets extend to bases.
• Corollary 1(1): V has a basis.
• Corollary 1(2): all bases have same (finite) size.
Still need to prove the (Baby) Replacement Theorem. First we prove:
Exchange Lemma∗ . Suppose V is a vector space, span(x1 , . . . , xk , v1 , . . . , vℓ ) = V (possibly with k = 0), and x ∈ V with x ̸∈ span(x1 , . . . , xk ). Then there exists i ∈ {1, . . . , ℓ} such that
span(x1 , . . . , xk , x, v1 , . . . , vi−1 , vi+1 , . . . , vℓ ) = V.

Proof sketch. Let S = {x1 , . . . , xk , v1 , . . . , vℓ }. Since span(S) = V we can write
(∗) x = a1 x1 + · · · + ak xk + b1 v1 + · · · + bℓ vℓ .
At least one bi ̸= 0 (why?); choose such i. Let T = (S \ {vi }) ∪ {x}. Using bi ̸= 0, we can manipulate (∗) to show vi ∈ span(T ). Clearly S \ {vi } ⊆ T ⊆ span(T ), so S ⊆ span(T ), so span(S) ⊆ span(T ) (by Theorem 1.6), proving span(T ) = V . □
(Baby) Replacement Theorem. Suppose V = span(B) with |B| = n. Then every linearly independent
set S ⊆ V satisfies |S| ≤ n.
Proof. Write B = {v1 , . . . , vn }. Assume there is a counter-example; then there is one with |S| = n + 1. Write S = {x1 , . . . , xn+1 }. Observe that
span(v1 , v2 , . . . , vn ) = V (this is span(B)) and x1 ̸∈ span(∅) (why?).
So by the Exchange Lemma, there exists i ∈ {1, . . . , n} so that
B ′ := (B \ {vi }) ∪ {x1 } still spans V .
For concreteness, let’s assume it is i = 1: so
span(x1 , v2 , . . . , vn ) = V (this is span(B ′ )).
Repeating, we have x2 ̸∈ span(x1 ) (why?). Again by the Exchange Lemma, there exists j ∈ {2, . . . , n} such that (B ′ \ {vj }) ∪ {x2 } still spans V . For concreteness, assume it is j = 2, so
span(x1 , x2 , v3 , . . . , vn ) = V.
Keep repeating until we run out of v’s. Then we will have
span(x1 , x2 , . . . , xn ) = V.
In particular, xn+1 ∈ span(x1 , . . . , xn ), contradicting that S is linearly independent. □
Corollary 1 justifies the next definition.
Definition. Let V be a vector space.
(1) Suppose V is spanned by a finite set. The dimension of V , written dim V , is the unique size n of
every basis of V . We also say that V is finite-dimensional.
(2) If V has no finite spanning set, then we say V is infinite-dimensional.
Example.
(1) Fn . It has the finite basis En of size n. So dim Fn = n.
(2) Mm×n (F). It has the finite basis Em,n of size mn. So dim Mm×n (F) = mn.
(3) P(F). It has the infinite basis {1, x, x2 , . . .} (which is linearly independent). By the Replacement
Theorem, P(F) has no finite spanning set, i.e., is infinite-dimensional.
(4) F(R, R). It has an infinite linearly independent subset (A2). So by the Replacement Theorem, it
has no finite spanning set. So it is infinite-dimensional.
Here is an intuitively obvious but important consequence of our results.
Theorem 1.11. Let V be a finite-dimensional vector space. Let W be a subspace. Then:
(1) W is finite-dimensional and dim W ≤ dim V .
(2) If W is a proper subspace (i.e., W ⊂ V ), then dim W < dim V .
Proof. Let n = dim V . Fix a subspace W . By the Replacement Theorem or Corollary 2(c), every lin.
indep. set in V has size ≤ n. In particular, every lin. indep. set in W has size ≤ n. This means that if we
start with some lin. indep. subset S ⊆ W (say S = ∅), the Naive Algorithm for extending S to a basis
for W will terminate, say at C. Then |C| = dim W (by definition). As C is linearly independent (in W ,
and hence in V ), we have |C| ≤ n (see the comments at the start of this proof). Thus dim W ≤ n, proving
(1).
(2) Let C be the basis for W we found in (1). C is linearly independent in V , so we can extend C to a
basis B of V (by Corollary 2(c)). We have C ⊆ B, |C| = dim W ≤ n, and |B| = dim V = n. If C = B, then
we would have W = span(C) = span(B) = V , contradiction. So C ⊂ B, so |C| < |B|, proving (2). □
Here is another cute result that is sometimes useful.
Corollary 2 (continued). Assume V is a finite-dimensional vector space with dim V = n. Suppose S ⊆ V is a subset with |S| = n. Then
(1) S is linearly independent ⇐⇒ S spans V ⇐⇒ S is a basis for V .
Proof. (⇒) By Corollary 2(c), S can be extended to a basis C. Then |C| = n. But S ⊆ C and |S| = n. So S = C and S is already a basis for V .
(⇐) By Theorem 1.9 (Jan 23), we can shrink S to a basis C. Then |C| = n. But C ⊆ S and |S| = n. So S = C and S is already a basis for V . □
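Aside: a tiny numerical illustration of the corollary in R3 (my own sketch, assuming SymPy): for a set S of exactly 3 vectors, being independent, spanning, or a basis all come down to the 3 × 3 matrix with those vectors as columns having rank 3.

from sympy import Matrix

S = [Matrix([1, 0, 2]), Matrix([0, 1, 1]), Matrix([1, 1, 3])]
print(Matrix.hstack(*S).rank())
# 2 < 3: this particular S fails all three equivalent conditions at once
# (the third vector is the sum of the first two).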

MATH 146 January 30 Section 2

The material in this week’s lectures is not in our textbooks (only in the exercises of Friedberg-Insel-Spence).
Example. V = M2×2 (R),
W1 = \left\{ \begin{pmatrix} a & b \\ c & 0 \end{pmatrix} : a, b, c ∈ R \right\}, \qquad W2 = \left\{ \begin{pmatrix} 0 & b \\ b & d \end{pmatrix} : b, d ∈ R \right\}.
Warm-up: what is dim W1 ? dim W2 ? W1 ∩ W2 ? dim(W1 ∩ W2 )?
Definition. Suppose V is a vector space and W1 , W2 are subspaces. Their sum is the set
W1 + W2 := {w1 + w2 : wi ∈ Wi for i = 1, 2} ⊆ V.
Example. V = M2×2 (R), W1 and W2 as in the warm-up example above. Then W1 + W2 = V .
Example. V = R3 , W1 = {the x-axis} = span(e1 ) and W2 = {the y-axis} = span(e2 ). Then
W1 + W2 = {(a, 0, 0) + (0, b, 0) : a, b ∈ R} = {(a, b, 0) : a, b ∈ R} = {the x, y-plane} = span(e1 , e2 ).
Proposition 1. If W1 , W2 are subspaces of V , then W1 + W2 is a subspace of V . Moreover, W1 + W2
contains both W1 and W2 .
Proof. 0 = 0 + 0 ∈ W1 + W2 , so W1 + W2 ̸= ∅.
Suppose x, y ∈ W1 + W2 (must show x + y ∈ W1 + W2 ). By definition this means x = w1 + w2 and
y = u1 + u2 for some w1 , u1 ∈ W1 and w2 , u2 ∈ W2 . So
x + y = (w1 + w2 ) + (u1 + u2 ) = (w1 + u1 ) + (w2 + u2 ) (by commutativity, associativity),
where w1 + u1 ∈ W1 and w2 + u2 ∈ W2 .

So x + y ∈ W1 + W2 . A similar proof, using one of the distributive axioms, shows W1 + W2 is closed under
scalar multiplication. So W1 + W2 is a subspace of V .
Finally, w1 ∈ W1 =⇒ w1 = w1 + 0 ∈ W1 + W2 , so W1 ⊆ W1 + W2 . Similarly, w2 = 0 + w2 ∈ W1 + W2 shows W2 ⊆ W1 + W2 . □
Remark. W1 + W2 is the smallest subspace containing both W1 and W2 ; that is, every subspace that
contains W1 and W2 also contains W1 + W2 .
Proposition 2. For any subsets S, T ⊆ V , span(S ∪ T ) = span(S) + span(T ).
Proof. (⊆) S ⊆ span(S) ⊆ span(S) + span(T ). Similarly, T ⊆ span(T ) ⊆ span(S) + span(T ). So
S ∪ T ⊆ span(S) + span(T ).
Since the RHS is a subspace (Prop. 1), we get
span(S ∪ T ) ⊆ span(S) + span(T )
by Theorem 1.6.
(⊇) Clearly span(S ∪ T ) is a subspace containing both span(S) and span(T ). Thus span(S ∪ T ) contains
the smallest subspace containing span(S) and span(T ), which by a Remark is span(S) + span(T ). □

Remark. Can define W1 + W2 + · · · + Wk similarly. It is always a subspace, and is the smallest subspace
containing all W1 , . . . , Wk . span(S1 ∪ · · · ∪ Sk ) = span(S1 ) + · · · + span(Sk ).
Definition. If W1 , W2 are subspaces of V , then we say that V is the direct sum of W1 and W2 , and we
write V = W1 ⊕ W2 , provided:
(1) W1 + W2 = V .
(2) Every v ∈ V can be written uniquely as v = w1 + w2 with w1 ∈ W1 and w2 ∈ W2 . This means if v ∈ V and v = w1 + w2 = w1′ + w2′ with wi , wi′ ∈ Wi for i = 1, 2, then wi = wi′ for i = 1, 2.
Remark. We view this as “decomposing” or “factoring” the space V . This is especially useful in analyzing
the structure of linear operators on vector spaces (MATH 245).
  
Example. Let V = M2×2 (R) and W1 be as before. Let W3 = \left\{ \begin{pmatrix} 0 & b \\ 0 & b \end{pmatrix} : b ∈ R \right\}. Then V = W1 ⊕ W3 , because every \begin{pmatrix} a & b \\ c & d \end{pmatrix} ∈ V can be written
\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b-d \\ c & 0 \end{pmatrix} + \begin{pmatrix} 0 & d \\ 0 & d \end{pmatrix} (the first matrix is in W1 , the second in W3 ),
and this is the unique way to write it.
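Aside: a numeric cross-check of this example (mine, assuming SymPy). Flatten 2 × 2 matrices to vectors in R4 in the order (a11 , a12 , a21 , a22 ). The spanning sets of W1 and W3 together already have rank 4, so W1 + W3 = M2×2 (R); since dim W1 + dim W3 = 3 + 1 = 4, this is consistent with the direct-sum decomposition V = W1 ⊕ W3 .

from sympy import Matrix

W1 = [Matrix([1, 0, 0, 0]), Matrix([0, 1, 0, 0]), Matrix([0, 0, 1, 0])]  # [a b; c 0]
W3 = [Matrix([0, 1, 0, 1])]                                              # [0 b; 0 b]

print(Matrix.hstack(*W1).rank(),            # 3 = dim W1
      Matrix.hstack(*W3).rank(),            # 1 = dim W3
      Matrix.hstack(*(W1 + W3)).rank())     # 4 = dim(W1 + W3) = dim M_{2x2}(R)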


Remark. We can similarly define V = W1 ⊕ W2 ⊕ · · · ⊕ Wk to mean
(1) V = W1 + · · · + Wk , and
(2) Every v ∈ V can be written uniquely as v = w1 + · · · + wk with wi ∈ Wi for i = 1, . . . , k.
Example. {v1 , . . . , vn } is a basis for V iff
V = span(v1 ) ⊕ span(v2 ) ⊕ · · · ⊕ span(vn ).
Theorem 1. V = W1 ⊕ W2 ⊕ · · · ⊕ Wk iff
(1) V = W1 + W2 + · · · + Wk , and
(2′) the only solution to
0 = w1 + w2 + · · · + wk with w1 ∈ W1 , w2 ∈ W2 , . . . , wk ∈ Wk
is w1 = w2 = · · · = wk = 0.
Proof. Just need to prove (2) ⇐⇒ (2′).
(2) ⇒ (2′). We can write 0 = 0 + · · · + 0 (0 ∈ Wi for i = 1, . . . , k). By (2), this is the unique way to write 0 = w1 + · · · + wk with wi ∈ Wi for all i, which proves (2′).
(2′) ⇒ (2). Assume (2′). Suppose v ∈ V can be written
v = w1 + · · · + wk
and v = w1′ + · · · + wk′
where wi , wi′ ∈ Wi for i = 1, . . . , k. Then
0 = v − v = (w1 + · · · + wk ) − (w1′ + · · · + wk′ ) = (w1 − w1′ ) + · · · + (wk − wk′ ),
where wi − wi′ ∈ Wi for each i. Now (2′) implies wi − wi′ = 0 for i = 1, . . . , k, so wi = wi′ for all i, proving (2). □
Theorem 2. V = W1 ⊕ W2 iff
(1) V = W1 + W2 , and
(2″) W1 ∩ W2 = {0}.
Proof sketch. Just need to prove (2′) ⇐⇒ (2″) (where k = 2 in (2′)). The trick is to note that if w1 ∈ W1 and w2 ∈ W2 , then 0 = w1 + w2 ⇐⇒ w2 = −w1 =⇒ w1 ∈ W1 ∩ W2 . □
MATH 146 February 1 Section 2

Given subspaces W1 , W2 of vector space V , we have defined


W1 + W2 := {w1 + w2 : wi ∈ Wi for i = 1, 2}.
If V is fin. dim., then so are W1 , W2 , and W1 + W2 (Theorem 1.11).
What can we say about dim(W1 + W2 )? Does it equal dim(W1 ) + dim(W2 )?
Theorem 1. Suppose V is finite-dimensional and W1 , W2 are subspaces of V . Then
dim(W1 + W2 ) = dim W1 + dim W2 − dim(W1 ∩ W2 ).
Before proving, let’s look at some applications.
(1) Suppose W1 ∩ W2 = {0}. Then dim(W1 + W2 ) = dim(W1 ) + dim(W2 ).
(2) In particular, if V = W1 ⊕ W2 , then dim(V ) = dim W1 + dim W2 .
(3) Suppose dim V = 5 and dim W1 = dim W2 = 4. Then dim(W1 ∩ W2 ) ≥ 3.
Proof. We have
dim(W1 + W2 ) = 4 + 4 − dim(W1 ∩ W2 )
so
dim(W1 ∩ W2 ) = 8 − dim(W1 + W2 ).
But dim(W1 + W2 ) ≤ dim V = 5, so dim(W1 ∩ W2 ) ≥ 3. □
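Aside: a quick numeric check of Theorem 1 (mine, assuming SymPy), using the warm-up subspaces W1 , W2 of M2×2 (R) from January 30, flattened to R4 in the order (a11 , a12 , a21 , a22 ).

from sympy import Matrix

W1 = [Matrix([1, 0, 0, 0]), Matrix([0, 1, 0, 0]), Matrix([0, 0, 1, 0])]  # [a b; c 0]
W2 = [Matrix([0, 1, 1, 0]), Matrix([0, 0, 0, 1])]                        # [0 b; b d]

dim_W1 = Matrix.hstack(*W1).rank()           # 3
dim_W2 = Matrix.hstack(*W2).rank()           # 2
dim_sum = Matrix.hstack(*(W1 + W2)).rank()   # 4, so W1 + W2 = M_{2x2}(R)
print(dim_W1 + dim_W2 - dim_sum)             # 1 = dim(W1 ∩ W2), the matrices [0 b; b 0]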
Proof of Theorem 1. Start by letting B = {v1 , . . . , vk } be a basis for W1 ∩ W2 . We can consider B as a
subset of W1 . Note that B is linearly independent. Hence by Corollary 2(c), B can be extended to a basis
C1 for W1 , say,
C1 = {v1 , . . . , vk , x1 , . . . , xℓ }.
Note: x1 , . . . , xℓ ∈ W1 \ W2 .
Similarly, B can be extended to a basis C2 for W2 , say,
C2 = {v1 , . . . , vk , y1 , . . . , ym }.
Now let
D = C1 ∪ C2 = {v1 , . . . , vk , x1 , . . . , xℓ , y1 , . . . , ym }.
Clearly |D| = k + ℓ + m. (This is where we use the observation that each xi ̸∈ W2 ; hence no xi can equal
any yj .) Clearly
span(D) = span(C1 ∪ C2 ) = span(C1 ) + span(C2 ) = W1 + W2 .
Claim: D is linearly independent.
Proof of Claim. Assume
(∗) a1 v1 + · · · + ak vk + b1 x1 + · · · + bℓ xℓ + c1 y1 + · · · + cm ym = 0.
Then
(∗∗) a1 v1 + · · · + ak vk + b1 x1 + · · · + bℓ xℓ = −(c1 y1 + · · · + cm ym ),
where the left-hand side is in W1 and the right-hand side is in W2 .
Thus both sides of (∗∗) are in W1 ∩ W2 . Since W1 ∩ W2 = span(B), we get


−(c1 y1 + · · · + cm ym ) = d1 v1 + · · · + dk vk for some d1 , . . . , dk ∈ F,
Then
c1 y1 + · · · + cm ym + d1 v1 + · · · + dk vk = 0,
which is a linear combination of C2 . Since C2 is linearly independent, c1 = · · · = cm = 0. (Also d1 = · · · = dk = 0.)
So (∗) simplifies to
a1 v1 + · · · + ak vk + b1 x1 + · · · + bℓ xℓ = 0.
This is a linear combination of C1 , and C1 is linearly independent, so
a1 = · · · = ak = b1 = · · · = bℓ = 0.
So all the coefficients in (∗) are 0. This proves D is linearly independent.
So D is a basis for W1 + W2 , so dim(W1 + W2 ) = |D| = k + ℓ + m. On the other hand,
dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 ) = (k + ℓ) + (k + m) − k = k + ℓ + m,
as required. □

Next, we start quotient spaces.


Definition 1.1. Suppose V is a vector space and W is a subspace. Given v ∈ V , define
v + W := {v + w : w ∈ W }.
Example. Let V = R2 . Let W = span({(1, 1)}) = {(a, a) : a ∈ R}.

[Figure: the line W through the origin in R2 and a parallel translate v + W ; note that v + W = v ′ + W for any other vector v ′ on the same translate.]
Jargon. v + W is called the translation of W by v, or the coset of W containing v. (Note that v ∈ v + W .)


Definition. Given a subspace W of V , we let
V /W = {v + W : v ∈ V }.
I.e., V /W is the set of all translations of W .

MATH 146 February 3 Section 2
Lemma 1. Let W be a subspace of V . Let u, v ∈ V . Then
u + W = v + W ⇐⇒ u − v ∈ W.
Proof. (⇒) Assume u + W = v + W . Observe that u ∈ u + W . So u ∈ v + W , meaning u = v + w for
some w ∈ W . Then u − v ∈ W .
(⇐) Assume u − v =: w ∈ W . Then
u + W = (v + w) + W = v + (w + W ) = v + W (as w + W = W ; exercise.) □
Lemma 2. Let W be a subspace of V . The translations of W partition V ; that is,
(1) Their union is V , and
(2) If u + W ̸= v + W then (u + W ) ∩ (v + W ) = ∅.
Proof. (1) is obvious (v ∈ v + W for each v ∈ V ). To prove (2), we prove its contrapositive. Assume
(u + W ) ∩ (v + W ) ̸= ∅. Choose x ∈ (u + W ) ∩ (v + W ). This means
x = u + w for some w ∈ W ,
and x = v + w′ for some w′ ∈ W .
Then u + w = v + w′ , so u − v = w′ − w ∈ W . So u + W = v + W by Lemma 1. □
Note: the translations of W are the equivalence classes of an equivalence relation we can define on V ,
called “congruence mod W .” The definition is
u≡v (mod W ) ⇐⇒ u − v ∈ W.

Recall that for a fixed subspace W , V /W denotes the set of all translations of W . Thus
elements of V /W ↭ (certain) subsets of V
We use vectors in V to “name” elements of V /W (v “names” v + W ), but the vector names are not the
translations. In particular, elements of V /W can be “named” in more than one way. For example, if
V = R2 , W = span({(1, 1)}), and S = (−1, 0) + W , then also S = (0, 1) + W , since
(−1, 0) − (0, 1) = (−1, −1) ∈ W, so (−1, 0) + W = (0, 1) + W (Lemma 1).
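Aside: Lemma 1 is easy to “run” in this example. A tiny sketch in plain Python (the helper names are mine): two names give the same translate of W = span({(1, 1)}) exactly when their difference has equal coordinates.

def in_W(v):                         # membership in span{(1, 1)}: both coordinates equal
    return v[0] == v[1]

def same_coset(u, v):                # u + W == v + W  <=>  u - v in W  (Lemma 1)
    return in_W((u[0] - v[0], u[1] - v[1]))

print(same_coset((-1, 0), (0, 1)))   # True: the two names above give the same element of V/W
print(same_coset((-1, 0), (1, 0)))   # False: these name different translates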
We want to turn the set V /W into a vector space. So we need to define addition and scalar multiplication
on the elements of V /W . The idea is illustrated in the following example.
Example. V = R2 , W = span({(1, 1)}), S = (−1, 0) + W , T = (2, −1) + W .
[Figure: the parallel lines W , S = u + W with u = (−1, 0), T = v + W with v = (2, −1), and their sum S + T = (u + v) + W .]

In general, if S = u + W and T = v + W , then we want to define S + T := (u + v) + W . Is there a problem if we use different names for S and T ?
Lemma (There is no problem). Suppose V is a vector space, W is a subspace, and u, u′ , v, v ′ ∈ V . Then
u + W = u′ + W and v + W = v ′ + W =⇒ (u + v) + W = (u′ + v ′ ) + W.
In other words, defining addition on V /W by
(u + W ) + (v + W ) := (u + v) + W
is well-defined (does not depend on the choice of names u, v).
Proof. The hypothesis implies u − u′ ∈ W and v − v ′ ∈ W . The conclusion will follow if we can establish
(u + v) − (u′ + v ′ ) ∈ W . Well,
(u + v) − (u′ + v ′ ) = (u − u′ ) + (v − v ′ ) ∈ W , since each of u − u′ and v − v ′ is in W . □

Remark. Similarly, if F is the field of scalars for V , then we can define scalar multiplication on V /W by
c · (v + W ) := (cv) + W
and there is no problem (it is well-defined: if v + W = v ′ + W then (cv) + W = (cv ′ ) + W ).
Proposition. V /W with the operations defined as above is a vector space over F.
Proof. Ugh. There are 8 axioms that we must verify. Some are easy: e.g., commutativity (VS 1). Suppose
S, T ∈ V /W , say S = x + W and T = y + W . Then
S + T = (x + W ) + (y + W ) = (x + y) + W = (y + x) + W = (y + W ) + (x + W ) = T + S,
where the middle equality is an application of (VS 1) in V and the other equalities use the definition of + on V /W . Or left distributivity (VS 7):
a(S + T ) = a((x + W ) + (y + W )) = a((x + y) + W ) = (a(x + y)) + W = ((ax + ay) + W ) = (ax + W ) + (ay + W ) = a(x + W ) + a(y + W ) = aS + aT,
where a(x + y) = ax + ay is (VS 7) in V and the other equalities use the definitions of the operations on V /W .
Existence of a zero vector is more interesting. What is the zero vector in V /W ? (It is W .)
And the additive inverse of v+W is (−v)+W , because obviously (v+W )+((−v)+W ) = 0+W = W . □
Now that we know that V /W is a vector space, we can talk about linear combinations of “vectors” (i.e.,
translations of W ), linear independence, span, etc. The next theorem tells us what is dim(V /W ) when
dim V is finite.
Theorem. Suppose V is finite-dimensional, say dim V = n. Let W be a subspace, say of dimension k.
Then dim(V /W ) = n − k = dim V − dim W .
Proof. Let B = {v1 , . . . , vk } be a basis for W . We can view B as a linearly independent subset of V . So
we can extend B to a basis C = {v1 , . . . , vk , vk+1 , . . . , vn } for V . Let D := {vk+1 + W, . . . , vn + W }. We
will show that D is a basis for V /W . Since |D| = n − k, this will prove the theorem.
We first show that D is linearly independent (in V /W ). Suppose ak+1 , . . . , an ∈ F and
(∗) ak+1 (vk+1 + W ) + · · · + an (vn + W ) = W.
Using the definitions of the operations in V /W , we can simplify this to
(ak+1 vk+1 + · · · + an vn ) + W = W.
By Lemma 1, this implies
ak+1 vk+1 + · · · + an vn ∈ W,
so
ak+1 vk+1 + · · · + an vn = b1 v1 + · · · + bk vk for some b1 , . . . , bk ∈ F,
since {v1 , . . . , vk } spans W . Then
(−b1 )v1 + · · · + (−bk )vk + ak+1 vk+1 + · · · + an vn = 0.
This is a linear combination of C. Since C is linearly independent, all the coefficients are 0. In particular,
ak+1 = · · · = an = 0. This proves that the only solution to (∗) is the trivial one, so D is linearly
independent.
Next we will show that D spans V /W . Let v + W ∈ V /W be given. Because C spans V , we can write
v = a1 v1 + · · · + ak vk + ak+1 vk+1 + · · · + an vn for some a1 , . . . , an ∈ F.
Claim. v + W = ak+1 (vk+1 + W ) + · · · + an (vn + W ) (so v + W ∈ span(D)).
Proof of Claim. The RHS of the Claim simplifies to u + W where u = ak+1 vk+1 + · · · + an vn . We want to
prove v + W = u + W . We can use Lemma 1: it is enough to prove v − u ∈ W . Well,
v − u = (a1 v1 + · · · + ak vk + ak+1 vk+1 + · · · + an vn ) − (ak+1 vk+1 + · · · + an vn )
= a1 v1 + · · · + ak vk ,
which is in W because v1 , . . . , vk ∈ W and W is a subspace. □

MATH 146 February 6 Section 2

§2.1. Here is the definition of “good” functions between vector spaces.


Definition. Let V and W be vector spaces over the same field F. A function T : V → W is called a
linear transformation, or is said to be linear, if:
(1) T (x + y) = T (x) + T (y) for all x, y ∈ V , and
(2) T (ax) = aT (x) for all x ∈ V and a ∈ F.
A linear transformation that is also a bijection is called an isomorphism.
Remark: We often write T x in place of T (x).
Example.
(1) (F = R). Let V = W = R2 . Define T : R2 → R2 by T (x1 , x2 ) = (−x2 , x1 ). This maps, e.g.,
(1, 0) 7→ (0, 1) and (0, 1) 7→ (−1, 0).
In fact, this T is just “rotation c.c.w. by 90◦ about (0, 0).”
Claim. T is a linear transformation.
Proof. For any x = (x1 , x2 ), y = (y1 , y2 ) ∈ R2 ,
T (x + y) = T ((x1 , x2 ) + (y1 , y2 ))
= T ((x1 + y1 , x2 + y2 )) (def. of + in R2 )
= (−(x2 + y2 ), x1 + y1 ) (def. of T )
= (−x2 , x1 ) + (−y2 , y1 )
= T (x) + T (y)
and similarly,
T (ax) = T (a(x1 , x2 ))
= T ((ax1 , ax2 ))
= (−ax2 , ax1 )
= a(−x2 , x1 )
= aT (x). □
In fact, T is an isomorphism (from R2 to R2 ).
(2) V = W = R2 , T : R2 → R2 given by T (x1 , x2 ) = ((x1 + x2 )/2, (x1 + x2 )/2). The range of T is the line y = x. In effect, T maps each point in R2 to the closest point on this line.
Claim. T is linear.
Proof. Exercise. □
(3) The previous two examples are special cases of the following: let F be a field, and let A ∈ Mm×n (F),
say
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}.
Given x = (x1 , . . . , xn ) ∈ Fn , define
Ax = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} := \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix} ∈ Fm .
(This is “matrix-vector multiplication.”) Note that the i-th entry of the product Ax is obtained by
combining the entries in the i-th row of A with the entries of x “like a dot product.”
For example, if F = Z5 and A = \begin{pmatrix} 0 & 2 & 3 \\ 4 & 1 & 2 \end{pmatrix} ∈ M2×3 (Z5 ) and x = (3, 2, 1) ∈ (Z5 )3 , then
Ax = \begin{pmatrix} 0 & 2 & 3 \\ 4 & 1 & 2 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 0·3 + 2·2 + 3·1 \\ 4·3 + 1·2 + 2·1 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix} ∈ (Z5 )2 .
Jeff ’s comment: Ax can also be defined as the linear combination of the columns of A using the
entries of x as scalars. So in the above example,
Ax = \begin{pmatrix} 0 & 2 & 3 \\ 4 & 1 & 2 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = 3 \begin{pmatrix} 0 \\ 4 \end{pmatrix} + 2 \begin{pmatrix} 2 \\ 1 \end{pmatrix} + 1 \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \end{pmatrix} + \begin{pmatrix} 4 \\ 2 \end{pmatrix} + \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.
(A small computational sketch of this two-way calculation appears after this list of examples.)
(Thank you, Jeff!) Returning to the general situation: given A ∈ Mm×n (F), define LA : Fn → Fm
by LA (x) = Ax.
Claim. Every such map LA : Fn → Fm is linear.
Proof. Tedious but not difficult. □
 
Note: in Example 1 we had T = LA : R2 → R2 with A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. In Example 2 we had T = LA : R2 → R2 with A = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}.
(4) Define D : P(R) → P(R) by D(f ) = f ′ (derivative). D is linear. Is it injective? Surjective?
(5) Let C(R) be the set of all continuous functions R → R. Define T : C(R) → C(R) by
(T f )(x) = \int_0^x f (t) dt.
T is linear. Is it injective? Surjective?
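Aside (referenced in Example (3) above): a small sketch in plain Python, computing Ax over Z5 both ways, as row-by-row “dot products” and as a combination of the columns of A. The names are mine.

p = 5
A = [[0, 2, 3],
     [4, 1, 2]]
x = [3, 2, 1]

# i-th entry: combine row i of A with x "like a dot product", reduced mod p.
by_rows = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) % p for row in A]

# Jeff's description: add up x_j times column j of A.
by_cols = [0] * len(A)
for j, x_j in enumerate(x):
    for i in range(len(A)):
        by_cols[i] = (by_cols[i] + x_j * A[i][j]) % p

print(by_rows, by_cols)   # [2, 1] [2, 1], matching the worked example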
Facts. Suppose V, W are vector spaces and T : V → W is a function.
(1) If T is linear, then
• T (0) = 0. (Proof: T (0) = T (0 + 0) = T (0) + T (0).)
• T (−x) = −T (x). (Hint: −x = (−1)x.)
• T (a1 x1 + · · · + an xn ) = a1 T (x1 ) + · · · + an T (xn ). (By induction on n)
(2) T is linear ⇐⇒ T (ax + by) = aT (x) + bT (y) for all x, y ∈ V and all a, b ∈ F. (Exercise)
Theorem 2.6. Let V, W be vector spaces over F with V finite-dimensional. Suppose {v1 , . . . , vn } is a basis
for V . Let w1 , . . . , wn be any vectors in W (not necessarily distinct). Then there exists a unique linear
transformation T : V → W satisfying
T vi = wi , for i = 1, . . . , n.
Proof of Theorem 2.6. To prove existence, define T : V → W by
T (a1 v1 + · · · + an vn ) := a1 w1 + · · · + an wn
(where a1 v1 + · · · + an vn is an arbitrary element of V ).

Every element of V can be written as on the LHS, and in just one way (Thm. 1.8, Jan 23), so there is no
issue of being well-defined.
Clearly T vi = wi for each i = 1, . . . , n since
T vi = T (0·v1 + · · · + 1·vi + · · · + 0·vn ) = 0·w1 + · · · + 1·wi + · · · + 0·wn = wi .
Check that T is linear. Let x, y ∈ V . Say x = a1 v1 + · · · + an vn and y = b1 v1 + · · · + bn vn . Then
T (x + y) = T ((a1 + b1 )v1 + · · · + (an + bn )vn ) = (a1 + b1 )w1 + · · · + (an + bn )wn
= (a1 w1 + · · · + an wn ) + (b1 w1 + · · · + bn wn ) = T x + T y.

A similar proof shows T (cx) = cT (x).


To prove uniqueness, assume more generally that T, T ′ are two linear transformations V → W satisfying
T vi = T ′ vi for i = 1, . . . , n. We must show T = T ′ , i.e., T x = T ′ x for all x ∈ V . Fix x ∈ V ; because
{v1 , . . . , vn } spans V , we can write x = a1 v1 + · · · + an vn . Then
T x = T (a1 v1 + · · · + an vn )
= a1 T v1 + · · · + an T vn T is linear
= a1 T ′ v1 + · · · + an T ′ vn assumption

= T ′ (a1 v1 + · · · + an vn ) T ′ is linear
= T ′ x.
So T = T ′ as claimed. □
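Aside: in Fn , Theorem 2.6 can be made concrete with matrices (a tool we will meet properly later). The sketch below (mine, assuming SymPy) builds the unique T with T vi = wi for a chosen basis {v1 , v2 } of R2 and arbitrary prescribed images w1 , w2 , and checks it.

from sympy import Matrix

basis = [Matrix([1, 1]), Matrix([1, -1])]   # a basis of R^2
w = [Matrix([2, 0]), Matrix([0, 3])]        # prescribed images T(v_i) = w_i (my choice)

B = Matrix.hstack(*basis)                   # columns are the basis vectors
W = Matrix.hstack(*w)                       # columns are the prescribed images
T = W * B.inv()                             # matrix of T in standard coordinates

for v_i, w_i in zip(basis, w):
    assert T * v_i == w_i                   # T sends each basis vector where required
print(T)                                    # Matrix([[1, 1], [3/2, -3/2]])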

MATH 146 February 8 Section 2

Definition. Suppose T : V → W is linear. The range (or image) of T is


Ran(T ) := {T (x) : x ∈ V }.
Note: Ran(T ) ⊆ W . The text uses R(T ) in place of Ran(T ).
Theorem 2.1 (a). Let T : V → W be linear. Then Ran(T ) is a subspace of W .
Proof. Must show Ran(T ) ̸= ∅, Ran(T ) is closed under +, and Ran(T ) is closed under scalar multiplication.
(1) T (0V ) = 0W so 0W ∈ Ran(T ). (Alternate proof: T is a function with nonempty domain, so duh.)
(2) Given T (x), T (y) ∈ Ran(T ), we have T (x) + T (y) = T (?), proving T (x) + T (y) ∈ Ran(T ).
(3) Given T (x) ∈ Ran(T ) and a ∈ F, we have aT (x) = T (?), proving aT (x) ∈ Ran(T ). □
The following will be useful.
Theorem 2.2. Suppose T : V → W is linear with V finite-dimensional. If V = span(v1 , . . . , vn ), then
Ran(T ) = span(T v1 , . . . , T vn ).
Proof. Ran(T ) = {T x : x ∈ V } = {T (a1 v1 + · · · + an vn ) : ai ∈ F}
= {a1 T v1 + · · · + an T vn : ai ∈ F} = span(T v1 , . . . , T vn ). □
Corollary. Suppose T : V → W is linear with V finite-dimensional. Then Ran(T ) is also finite-dimensional and dim(Ran(T )) ≤ dim(V ).
Proof. Let n = dim V . By picking a basis for V , Theorem 2.2 gives a spanning set for Ran(T ) of size ≤ n. We can shrink it to a basis for Ran(T ), which will also have size ≤ n. □
Example. V = W = R2 , T : V → W given by T (x1 , x2 ) = ((x1 + x2 )/2, (x1 + x2 )/2) = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.
We know that V is spanned by {e1 , e2 } = {(1, 0), (0, 1)}. Hence by Theorem 2.2,
Ran(T ) = span(T e1 , T e2 ) = span((1/2, 1/2), (1/2, 1/2)) = span((1/2, 1/2)) = span((1, 1)).
Definition. Suppose T : V → W is linear. The kernel (or null space) of T is
Ker(T ) := {x ∈ V : T x = 0W }.
Note: Ker(T ) ⊆ V . Text uses N (T ) in place of Ker(T ).
Example. Same T : R2 → R2 .
Ker(T ) = {x ∈ R2 : T x = 0} = {(x1 , x2 ) : ((x1 + x2 )/2, (x1 + x2 )/2) = (0, 0)} = {(x1 , x2 ) : x1 + x2 = 0} = span((−1, 1)).
Theorem 2.1 (b). Let T : V → W be linear. Then Ker(T ) is a subspace of V .
Proof. Again must show Ker(T ) is nonempty and closed under the vector space operations.
(1) T (0V ) = 0W , so 0V ∈ Ker(T ), so Ker(T ) 6= ∅.
(2) Suppose x, y ∈ Ker(T ). So T x = T y = 0. So T (x + y) = T x + T y = 0 + 0 = 0. So x + y ∈ Ker(T ).
This proves Ker(T ) is closed under addition.
(3) Suppose x ∈ Ker(T ) and a ∈ F. So T x = 0. So T (ax) = aT (x) = a0 = 0, so ax ∈ Ker(T ). This
proves Ker(T ) is closed under scalar multiplication. 
Coverage of midterm ends here
Theorem 2.4. Suppose T : V → W is linear. Then T is injective ⇐⇒ Ker(T ) = {0}.
Proof. (⇒) Assume T is injective. Since {0V } ⊆ Ker(T ), it suffices to show Ker(T ) ⊆ {0V }. Let x ∈
Ker(T ). So T x = 0W = T (0V ). By injectivity, we get x = 0V . This proves Ker(T ) ⊆ {0V }.
(⇐) Assume Ker(T ) = {0V }. Let x, y ∈ V and assume T x = T y. Then
T(x − y) = T(x + (−y)) = T x + T(−y) = T x + (−T(y)) = T x − T y = 0 (since T x = T y).
Hence x − y ∈ Ker(T ). But Ker(T ) = {0} so x − y = 0, so x = y. This proves T is injective. 
Theorem 2.3 (Rank-Nullity Theorem). Suppose T : V → W is linear with V finite-dimensional. Then
dim(Ran(T)) + dim(Ker(T)) = dim(V),
where dim(Ran(T)) is called the "rank of T" and dim(Ker(T)) the "nullity of T".

MATH 146 February 10 Section 2

Definition. Suppose T : V → W is linear with V finite-dimensional.


• The rank of T is dim Ran(T ).
• The nullity of T is dim Ker(T ).
Theorem 2.3 (Rank-Nullity Theorem). Suppose T : V → W is linear with V finite-dimensional. Then
rank(T ) + nullity(T ) = dim V.
Intuitively: the dimension of V is “divided up” between the range and kernel of T .

Proof. Suppose dim V = n and dim Ker(T) = k. We need bases for V, Ker(T), and Ran(T) which are
related, so we can compare their sizes.
Start with a basis S = {v1, . . . , vk} for Ker T. S is linearly independent in V, so we can extend S to a
basis B = {v1, . . . , vk, x1, . . . , xn−k} for V. Since B spans V, it follows that
{T v1, . . . , T vk, T x1, . . . , T xn−k} spans Ran(T) (Theorem 2.2), where T v1 = · · · = T vk = 0.
Let C = {T x1 , . . . , T xn−k }. So C spans Ran(T ). We’ll show that |C| = n−k and C is linearly independent.
Assume first that ∃i ̸= j with T xi = T xj . Then T (xi − xj ) = T xi − T xj = 0, so xi − xj ∈ Ker(T ). Then
xi − xj = a1 v1 + · · · + ak vk for some scalars a1 , . . . , ak . But then xi can be written as a linear combination
of other vectors in B, contradicting linear independence of B. So T xi ̸= T xj ∀i ̸= j.
Next let’s show that C is linearly independent. Suppose c1 , . . . , cn−k ∈ F and
c1 T x1 + · · · + cn−k T xn−k = 0.
Then
T (c1 x1 + · · · + cn−k xn−k ) = 0,
so
c1 x1 + · · · + cn−k xn−k ∈ Ker(T ).
Recall that {v1 , . . . , vk } spans Ker(T ), so ∃a1 , . . . , ak ∈ F with
c1 x1 + · · · + cn−k xn−k = a1 v1 + · · · + ak vk .
So
(−a1 )v1 + · · · + (−ak )vk + c1 x1 + · · · + cn−k xn−k = 0.
But this is a linear combination of B, and B is linearly independent. So c1 = · · · = cn−k = 0 (and also
a1 = · · · = ak = 0). This proves C is linearly independent.
So C is a basis for Ran(T ), and |C| = n − k. This proves

rank(T ) = dim(Ran(T )) = n − k = dim V − dim(Ker(T )) = dim V − nullity(T )

so rank(T ) + nullity(T ) = dim V as required. □


Theorem 2.5. Suppose T : V → W is linear where V, W are finite-dimensional and dim V = dim W . The
following are equivalent:
(1) T is injective.
(2) T is surjective.
(3) rank(T ) = dim V .
Proof.
T injective ⇐⇒ Ker(T) = {0}  (Thm. 2.4)  ⇐⇒ nullity(T) = 0 ⇐⇒ rank(T) = dim V  (Thm. 2.3)
⇐⇒ rank(T) = dim W ⇐⇒ dim(Ran(T)) = dim(W) ⇐⇒(∗) Ran(T) = W ⇐⇒ T surjective,
where (∗) holds since Ran(T) is a subspace of the finite-dimensional space W (using Theorem 1.11). □
Recall that an isomorphism is a bijective linear transformation. We can add to Theorem 2.5
(4) T is an isomorphism.
In case T is an isomorphism, we write T : V ≅ W.
Theorem 2.19. Suppose V, W are vector spaces over F with dim V, dim W < ∞. Then
∃ an isomorphism T : V ≅ W ⇐⇒ dim W = dim V.
Proof sketch. (⇒) Assume T : V ≅ W. By the Rank-Nullity Theorem,
rank(T) + nullity(T) = dim V, where rank(T) = dim W and nullity(T) = 0; hence dim W = dim V.

(⇐) Assume dim V = dim W = n. Let {v1 , . . . , vn } be a basis for V . Let {w1 , . . . , wn } be a basis for W .
Let T be the unique linear transformation T : V → W sending vi 7→ wi for i = 1, . . . , n (Theorem 2.6).
Then
Ran(T ) = span(T v1 , . . . , T vn ) (Theorem 2.2)
= span(w1 , . . . , wn ) = W as {w1 , . . . , wn } is a basis for W
so T is surjective. Since dim V = dim W < ∞, T is also injective (Theorem 2.5). □
Definition 2.20. We write V ≅ W to mean ∃ an isomorphism T : V ≅ W.
Example.
(1) Pn(F) ≅ Fn+1 (since both have dimension n + 1).
(2) Mm×n(F) ≅ Fmn (since both have dimension mn).
(3) C "as a vector space over R" ≅ R² (both have dimension 2).

MATH 146 February 13 Section 2

Suppose A ∈ Mm×n (F) and x ∈ Fn . Recall that Ax ∈ Fm .


Notation. We write A = [a1 a2 · · · an ], meaning a1 , . . . , an ∈ Fm are the columns of A.
Fact. With A as above, if x = (x1, . . . , xn) ∈ Fn then, by Jeff's comment (Feb. 6 lecture),
Ax = x1 a1 + · · · + xn an .
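As a quick sanity check of this Fact (not part of the notes), here is a small Python sketch, using numpy purely for illustration, that computes Ax both ways — by the usual matrix–vector product and as the linear combination x1 a1 + · · · + xn an of the columns — and confirms they agree. The particular matrix and vector are made up for the example.

```python
import numpy as np

# A hypothetical 3x2 real matrix and a vector x in R^2, for illustration only.
A = np.array([[2.0, 0.0],
              [1.0, -4.0],
              [3.0, 2.0]])
x = np.array([5.0, -1.0])

Ax = A @ x                                            # usual matrix-vector product

# Column view: Ax = x1*a1 + ... + xn*an, where a1, ..., an are the columns of A.
combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))

print(Ax, combo)
assert np.allclose(Ax, combo)   # the two computations agree
```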
Lemma 1. If A, B ∈ Mm×n (F) and x, y ∈ Fn , then
(1) A(x + y) = Ax + Ay.
(2) (A + B)x = Ax + Bx.
Proof. Write A = [a1 a2 · · · an], x = (x1, . . . , xn), and y = (y1, . . . , yn). Then x + y = (x1 + y1, . . . , xn + yn), so
A(x + y) = (x1 + y1 )a1 + · · · + (xn + yn )an
= (x1 a1 + · · · + xn an ) + (y1 a1 + · · · + yn an )
= Ax + Ay.
Now write B = [b1 b2 · · · bn ]. Then A + B = [a1 +b1 a2 +b2 · · · an +bn ], so
(A + B)x = x1 (a1 + b1 ) + · · · + xn (an + bn )
= (x1 a1 + · · · + xn an ) + (x1 b1 + · · · + xn bn )
= Ax + Bx. □
Similarly one can prove the following.
Lemma 2. If A ∈ Mm×n (F), x ∈ Fn , and α ∈ F, then A(αx) = α(Ax) = (αA)x.
Also recall that I defined the map LA : Fn → Fm by LA (x) = Ax. Going forward, let’s use the notation
TA instead of LA for this map.
Corollary. For each A ∈ Mm×n (F), TA is a linear transformation.
Proof. For any x, y ∈ Fn and α ∈ F, using Lemmas 1 and 2 we get
TA (x + y) = A(x + y) = Ax + Ay = TA (x) + TA (y) and TA (αx) = A(αx) = α(Ax) = αTA (x). □
Here are two more useful facts.
Lemma 3. If A ∈ Mm×n (F) and ei ∈ Fn is the ith standard basis vector, then Aei = the ith column of A.
 
Proof. Write A = [a1 a2 · · · an] ∈ Mm×n(F) and ei = (0, . . . , 0, 1, 0, . . . , 0), with the 1 in position i. Then Aei = 0·a1 + · · · + 1·ai + · · · + 0·an = ai. □
Lemma 4. If A, B ∈ Mm×n (F), then A = B ⇐⇒ Ax = Bx ∀x ∈ Fn .
Proof. (⇒) Obvious.
(⇐) Write A = [a1 a2 · · · an ] and B = [b1 b2 · · · bn ]. Assume Ax = Bx for all x ∈ Fn . Then in
particular, for each i = 1, . . . , n we have ai = Aei = Bei = bi . So A = B. □
Definition. Given vector spaces V, W over F, let L(V, W ) denote the set of all linear transformations
from V to W .
Example 1. L(Fn , Fm ) = {TA : A ∈ Mm×n (F)}.
Proof. (⊇) by the Corollary.
(⊆): Let T ∈ L(Fn , Fm ), so T : Fn → Fm is linear. For i = 1, . . . , n define ai = T (ei ) ∈ Fm . Let
A = [a1 a2 · · · an ] ∈ Mm×n (F). We want to show that T = TA . For this, it is enough to show that T and
TA have the same values on some basis for Fn (by Theorem 2.6, Feb. 6).
Consider the standard basis {e1 , . . . , en } for Fn . For each i we have
TA (ei ) = Aei = ai = T (ei ) ✓
So T = TA . □
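A small illustration of the recipe in this proof (a sketch, not part of the notes; the sample map T below is hypothetical): form A whose i-th column is T(ei), and check that TA agrees with T.

```python
import numpy as np

def T(v):
    # A sample linear map T : R^3 -> R^2 (made up for illustration):
    # T(x1, x2, x3) = (x1 + 2*x3, 4*x2 - x1).
    x1, x2, x3 = v
    return np.array([x1 + 2 * x3, 4 * x2 - x1])

n = 3
# Columns of A are T(e1), ..., T(en), exactly as in the proof of Example 1.
A = np.column_stack([T(np.eye(n)[:, i]) for i in range(n)])

# T and T_A agree on the standard basis, hence (Theorem 2.6) everywhere;
# spot-check on one more vector.
v = np.array([3.0, -1.0, 2.0])
assert np.allclose(A @ v, T(v))
print(A)
```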

Now let’s do something mind-blowing.


Definition. Given vector spaces V, W over F, we turn L(V, W ) into a vector space over F by defining
operations pointwise in W . That is, for T1 , T2 ∈ L(V, W ) we define T1 + T2 : V → W by
(T1 + T2)(v) := T1 v + T2 v   (v ∈ V),
and for T ∈ L(V, W) and α ∈ F define αT : V → W by
(αT)(v) := α · T v   (v ∈ V).
Many things must be verified.
(1) If T1 , T2 ∈ L(V, W ) then T1 + T2 ∈ L(V, W ).
(2) If T ∈ L(V, W ) and α ∈ F then αT ∈ L(V, W ).
(3) L(V, W ) with the above operations satisfies (VS 1)–(VS 8).
Let’s check a couple of these.
(1) Let T = T1 + T2 . To prove T is linear, let x, y ∈ V . Then
T (x + y) = (T1 + T2 )(x + y) T = T1 + T2
= T1 (x + y) + T2 (x + y) definition of T1 + T2
= (T1 x + T1 y) + (T2 x + T2 y) T1 and T2 are linear
= (T1 x + T2 x) + (T1 y + T2 y) commutativity and associativity in W
= (T1 + T2 )(x) + (T1 + T2 )(y) definition of T1 + T2
= T x + T y. T = T1 + T2
A similar proof shows (T1 + T2 )(ax) = a · (T1 + T2 )x.
(3) (VS 3) What is the “zero vector” in L(V, W )?
Theorem 2. Define L : Mm×n(F) → L(Fn, Fm) by L(A) = TA. Then L : Mm×n(F) ≅ L(Fn, Fm).
Proof. (Linear): Given A, B ∈ Mm×n (F), we first show L(A + B) = L(A) + L(B), equivalently, TA+B =
TA + TB , equivalently, TA+B (x) = (TA + TB )(x) for all x ∈ Fn . So fix x ∈ Fn . Using Lemma 1,
TA+B(x) = (A + B)x = Ax + Bx = TA(x) + TB(x) = (TA + TB)(x).
As explained above, this proves L(A + B) = L(A) + L(B). A similar proof shows L(αA) = αL(A).
(Injective): Assume A, B ∈ Mm×n (F) and L(A) = L(B); i.e., TA = TB ; i.e., TA (x) = TB (x) ∀x ∈ Fn ;
i.e., Ax = Bx ∀x ∈ Fn . Then A = B by Lemma 4.
(Surjective): by Example 1. □

MATH 146 February 15 Section 2

§2.2. Coordinatization
Definition. Let V be a finite-dimensional vector space with dim V = n. An ordered basis for V is a
basis B = {v1 , . . . , vn } with a specified order (or if you are Patrick, indexing). I’ll write B = (v1 , . . . , vn ).
Example. Let V = Fn . The standard ordered basis is E = (e1 , . . . , en ).
Definition. Let V be a finite-dimensional vector space over F, let B = (v1 , . . . , vn ) be an ordered basis
for V , and let v ∈ V . Recall (Theorem 1.8, Jan. 23) that we can write
(∗) v = a1 v1 + · · · + an vn
in exactly one way. The unique scalars a1 , . . . , an satisfying (∗) are called the coordinates of v with
respect to the ordered basis B. The coordinate vector of v with respect to B, denoted [v]B , is
(∗∗)   [v]B := (a1, . . . , an) ∈ Fn   (written as a column).
Note: by definition, the equation in (∗∗) is equivalent to the equation in (∗).
Example.
       
(1) Let V = M2×2(R) and E = (E11, E12, E21, E22) = ([1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]).
If M = [a b; c d] then [M]E = (a, b, c, d), because [a b; c d] = aE11 + bE12 + cE21 + dE22.
(2) B = (E21, E12, E11, E22) is another ordered basis for M2×2(R). Then [M]B = (c, b, a, d).
(3) Let B = (1, x, x², x³), the standard ordered basis for P3(R). Then [x³ − x + 2]B = (2, −1, 0, 1).
(4) Let B0 = (1, x, ½(3x² − 1), ½(5x³ − 3x)), the Legendre basis for P3(R). To find [x³ − x + 2]B0, we
must find a1, a2, a3, a4 ∈ R satisfying
x³ − x + 2 = a1 · 1 + a2 · x + a3 · ½(3x² − 1) + a4 · ½(5x³ − 3x).
After expanding and combining like powers on the right-hand side, this simplifies to the system of
linear equations
a1 − (1/2)a3 = 2
a2 − (3/2)a4 = −1
(3/2)a3 = 0
(5/2)a4 = 1
This system is easily solved to get (a1, a2, a3, a4) = (2, −2/5, 0, 2/5). Thus [x³ − x + 2]B0 = (2, −2/5, 0, 2/5) = (2, −0.4, 0, 0.4).
In general, the problem “given v, find [v]B ” often reduces to solving a system of linear equations. This
system is obtained by forming the equation (∗) involving the unknowns a1 , . . . , an . In the reverse direction,
the problem “given [v]B , find v” is easy, since the equation (∗) tells you how to define v from its coordinates.
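For instance, the Legendre-basis computation in Example (4) amounts to solving a 4 × 4 linear system. Here is a short numpy sketch (an illustration, not part of the notes) that builds the coefficient matrix whose columns are the coordinate vectors of the basis polynomials with respect to (1, x, x², x³) and solves for [x³ − x + 2]_{B0}.

```python
import numpy as np

# Columns: coefficients (in the monomial basis 1, x, x^2, x^3) of the
# Legendre basis polynomials 1, x, (3x^2 - 1)/2, (5x^3 - 3x)/2.
M = np.array([[1.0, 0.0, -0.5,  0.0],
              [0.0, 1.0,  0.0, -1.5],
              [0.0, 0.0,  1.5,  0.0],
              [0.0, 0.0,  0.0,  2.5]])

# Monomial coefficients of v = x^3 - x + 2, i.e. (2, -1, 0, 1).
v = np.array([2.0, -1.0, 0.0, 1.0])

coords = np.linalg.solve(M, v)   # the coordinate vector [v]_{B0}
print(coords)                    # expected: [ 2.  -0.4  0.   0.4]
```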
In general, if V is a vector space over F of dimension n, and B is an ordered basis for V , then the [ ]B
“operator” (sending v 7→ [v]B ) is a bijection from V to Fn . (This is the same “natural” 1-1 correspondence
mentioned in a Corollary on Jan. 23.) In fact,
Claim. [ ]B is an isomorphism from V to Fn , for each ordered basis B.
Proof. I'll show [v + w]B = [v]B + [w]B. Write B = (v1, . . . , vn), [v]B = (a1, . . . , an), and [w]B = (b1, . . . , bn). This implies
v = a1 v1 + · · · + an vn and w = b1 v1 + · · · + bn vn .
So
v + w = (a1 v1 + · · · + an vn ) + (b1 v1 + · · · + bn vn )
= (a1 + b1 )v1 + · · · + (an + bn )vn
which tells us that [v + w]B = (a1 + b1, . . . , an + bn), which equals [v]B + [w]B. A similar proof shows [αv]B = α[v]B. □
Now suppose V, W are finite-dimensional vector spaces over F, say dim V = n and dim W = m, and let T : V → W be linear.
Let A = (v1 , . . . , vn ) and B = (w1 , . . . , wm ) be ordered bases for V and W . Note that:
• T is completely determined by its values T v1 , . . . , T vn on A (by Theorem 2.6, Feb. 6).
• Each T vj is determined, relative to B, by its coordinate vector [T vj ]B ∈ Fm .
Definition. In this context, the matrix of T with respect to A and B is the m × n matrix over F
whose columns are [T v1 ]B , . . . , [T vn ]B :
 
[ [T v1]B  · · ·  [T vj]B  · · ·  [T vn]B ].
We denote this matrix by [T]BA (preferred) or [T]^B_A (as in the text).

Example. Consider D : P3(R) → P2(R) given by D(f) = f′. Choosing ordered bases A = (1, x, x², x³)
and B = (1, x, x²), we calculate
D(1) = 0 = 0(1) + 0(x) + 0(x²)
D(x) = 1 = 1(1) + 0(x) + 0(x²)
D(x²) = 2x = 0(1) + 2(x) + 0(x²)
D(x³) = 3x² = 0(1) + 0(x) + 3(x²)
so [D]BA = [0 1 0 0; 0 0 2 0; 0 0 0 3].
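A quick sketch of this calculation in Python (illustration only; the coordinate-vector representation of polynomials below is an assumption of the sketch): build [D]BA column by column from D applied to the basis A, and check it against the matrix above. It also previews Theorem 2.14 on the polynomial x³ − x + 2.

```python
import numpy as np

def D(coeffs):
    # Derivative of a0 + a1*x + a2*x^2 + a3*x^3, returned as coordinates
    # with respect to B = (1, x, x^2): (a1, 2*a2, 3*a3).
    a0, a1, a2, a3 = coeffs
    return np.array([a1, 2 * a2, 3 * a3])

# Columns of [D]_BA are [D(1)]_B, [D(x)]_B, [D(x^2)]_B, [D(x^3)]_B.
D_BA = np.column_stack([D(np.eye(4)[:, j]) for j in range(4)])
print(D_BA)            # expected: [[0 1 0 0], [0 0 2 0], [0 0 0 3]]

# Preview of Theorem 2.14: [D]_BA [v]_A = [Dv]_B, tried on v = x^3 - x + 2.
v_A = np.array([2.0, -1.0, 0.0, 1.0])
assert np.allclose(D_BA @ v_A, D(v_A))   # derivative is 3x^2 - 1, i.e. (-1, 0, 3)
```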
Given T : V → W with dim V = n and dim W = m, ordered bases A and B for V and W , and a vector
v ∈ V , we have:
• the vector [v]A ∈ Fn .
• the vector [T v]B ∈ Fm .
• the matrix [T ]BA ∈ Mm×n (F).
Theorem 2.14. In the context above, [T ]BA [v]A = [T v]B .

MATH 146 February 17 Section 2

Theorem 2.14. Fix V, W vector spaces over F, dim V = n, dim W = m, A an ordered basis for V , B
an ordered basis for W . Let T : V → W be linear and v ∈ V . Then
[T ]BA [v]A = [T v]B .
Proof. Write v = a1 v1 + · · · + an vn . Then
T v = T (a1 v1 + · · · + an vn ) = a1 T v1 + · · · + an T vn .
So
[T v]B = [a1 T v1 + · · · + an T vn ]B
= a1 [T v1 ]B + · · · + an [T vn ]B because [ ]B is linear
= [T]BA (a1, . . . , an)     (by the Jeff-observation: the columns of [T]BA are the [T vi]B, and Ax = x1 a1 + · · · + xn an)
= [T ]BA [v]A . □
Corollary. (Same hypothesis): if B = [T ]BA , then [T v]B = B · [v]A = TB ([v]A ) for all v ∈ V .
In other words, the following diagram "commutes": applying T : V → W and then the coordinate map [ ]B gives the same result as applying [ ]A and then TB : Fn → Fm (where B = [T]BA and TB = LB). That is, [ ]B ∘ T = TB ∘ [ ]A, with the vertical maps [ ]A and [ ]B being isomorphisms.

Again with V, W, A, B fixed, but now allowing T to vary, we can view [ ]BA as a map L(V, W ) → Mm×n (F).
Theorem 2.8. With V, W, A, B as above, [ ]BA : L(V, W ) → Mm×n (F) is an isomorphism.
Proof. At the end of these notes. □
Note: we’ve already seen something like this in a special case: when V = Fn and W = Fm . In that case,
we had L : Mm×n (F) ∼ = L(Fn , Fm ). It turns out that if we choose the standard ordered bases En and Em
for F and F , then [ ]Em En : L(Fn , Fm ) ∼
n m
= Mm×n (F) and [ ]Em En is the inverse map to L.

§2.3
It’s time to define multiplication of matrices. Suppose A ∈ Mk×m (F) and B ∈ Mm×n (F). We will define
AB to be a k × n matrix as follows. First, an example: take
A = [1 2 3 0; 0 −1 1 5; 1 0 2 4] ∈ M3×4(R)   and   B = [2 0; 1 −4; 3 2; 1 2] ∈ M4×2(R),
so AB will be 3 × 2. Write B = [b1 b2]. The first column of AB is
Ab1 = 2(1, 0, 1) + 1(2, −1, 0) + 3(3, 1, 2) + 1(0, 5, 4) = (13, 7, 12).
The second column of AB is
Ab2 = 0(1, 0, 1) + (−4)(2, −1, 0) + 2(3, 1, 2) + 2(0, 5, 4) = (−2, 16, 12).
So AB = [13 −2; 7 16; 12 12]. Also note that if you want to compute just one entry of AB, say the row-2, column-2 entry, just multiply the elements of row 2 of A with the elements of column 2 of B and add:
0(0) + (−4)(−1) + 2(1) + 2(5) = 16.


In general:
Definition 2.9. If A ∈ Mk×m (F) and B ∈ Mm×n (F), then AB is the k × n matrix defined as follows: if
B = [b1 b2 · · · bn ] then AB = [Ab1 Ab2 · · · Abn ].
Row-Column characterization. If A = (aij )i,j and B = (bjℓ )j,ℓ , then AB = (ciℓ )i,ℓ where the row-i,
column-ℓ entry ciℓ of AB is the sum of the products of the i-th row of A with the ℓ-th column of B:
ciℓ = Σ_{j=1}^{m} aij bjℓ.
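Here is a short Python sketch (illustration, not part of the notes) of Definition 2.9: compute AB column by column as [Ab1 · · · Abn], compare with the row–column formula, and check both against numpy's own product, using the 3 × 4 and 4 × 2 matrices from the example above.

```python
import numpy as np

A = np.array([[1, 2, 3, 0],
              [0, -1, 1, 5],
              [1, 0, 2, 4]])
B = np.array([[2, 0],
              [1, -4],
              [3, 2],
              [1, 2]])

# Definition 2.9: the columns of AB are A b_1, ..., A b_n.
AB_cols = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# Row-column characterization: (AB)_{il} = sum_j a_{ij} b_{jl}.
k, m = A.shape
_, n = B.shape
AB_entries = np.array([[sum(A[i, j] * B[j, l] for j in range(m)) for l in range(n)]
                       for i in range(k)])

print(AB_cols)                                   # expected [[13 -2], [7 16], [12 12]]
assert np.array_equal(AB_cols, AB_entries)
assert np.array_equal(AB_cols, A @ B)
```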

Matrix multiplication is related to composition of linear transformations. Suppose we have T : V → W


and S : W → U , both linear. We can compose T and S:
S ◦ T : V → U,   (S ◦ T)(v) := S(T(v)).
We also write ST for S ◦ T .
Lemma 2.9. If T : V → W and S : W → U are linear, then ST : V → U is linear.
Proof. Suppose x, y ∈ V and a, b ∈ F. Then
ST (ax + by) = S(T (ax + by)) = S(a(T x) + b(T y)) = a(S(T x)) + b(S(T y)) = a(ST x) + b(ST y).
Thus by a theorem (Fact 2, Feb. 6), ST is linear. □
Example. Suppose T : Fn → Fm and S : Fm → Fk are linear. By Example 1 (Feb. 13), T = TB for
some B ∈ Mm×n (F). Similarly, S = TA for some A ∈ Mk×m (F). The dimensions allow us to multiply AB,
getting a k × n matrix.
Proposition. TA TB = TAB .
Proof. TA TB and TAB are both linear transformations Fn → Fk . To prove they are equal, it’s enough
to prove they have the same values on a basis for Fn (Theorem 2.6, Feb. 6). So it’s enough to prove
(TA TB )(ei ) = TAB (ei ) for all i = 1, . . . , n. Write B = [b1 b2 · · · bn ], so AB = [Ab1 Ab2 · · · Abn ]. Then
(TA TB)(ei) = TA(TB(ei)) = TA(B ei) =(∗) TA(bi) = A bi
while
(TAB)(ei) = (AB)ei =(∗) i-th column of AB = A bi,
where (∗) is by Lemma 3 (Feb. 13). □
Here is a proof of Theorem 2.8, which I will restate.
Theorem 2.8. Let V, W be finite-dimensional vector spaces, dim V = n, dim W = m, and let A be an
ordered basis for V and B an ordered basis for W . Then [ ]BA : L(V, W ) → Mm×n (F) is an isomorphism.
Proof. First show [ ]BA is linear. Suppose S, T ∈ L(V, W ). We want [S + T ]BA = [S]BA + [T ]BA . Write
A = (v1 , . . . , vn ). Then by definition,
[S]BA = [ [Sv1 ]B · · · [Svn ]B ]
[T ]BA = [ [T v1 ]B · · · [T vn ]B ]
[S + T ]BA = [ [(S + T )v1 ]B · · · [(S + T )vn ]B ]
= [ [Sv1 + T v1 ]B · · · [Svn + T vn ]B ] definition of S + T
= [ [Sv1 ]B + [T v1 ]B · · · [Svn ]B + [T vn ]B ] linearity of [ ]B (Claim, Feb 15.)
Now we can see that for every i, the i-th column of [S + T ]BA is equal to the sum of the i-th columns of
[S]BA and [T ]BA , which proves [S + T ]BA equals [S]BA + [T ]BA . A similar proof shows [αT ]BA = α[T ]BA for
any α ∈ F.
Next we show [ ]BA is injective. For this, it is enough to show that ker([ ]BA ) = {0} where 0 : V → W
is the constant zero transformation. Suppose T ∈ ker([ ]BA ). So [T ]BA = O (the m × n zero matrix). This
means that for each i = 1, . . . , n,
[T vi]B = (0, . . . , 0) ∈ Fm.
Let B = (w1 , . . . , wm ). The equation displayed above is equivalent to
T vi = 0w1 + · · · + 0wm
which implies T vi = 0 (the zero vector of W ) for all i = 1, . . . , n. Thus if v ∈ V is an arbitrary vector,
then v = a1 v1 + · · · + an vn for some scalars a1 , . . . , an , and
T v = T (a1 v1 + · · · + an vn ) = a1 T v1 + · · · + an T vn = a1 0 + · · · + an 0 = 0,
which proves that T is the constant zero transformation, i.e., T = 0.
We’ve shown ker([ ]BA ) ⊆ {0}, and the opposite inclusion is obvious, so ker([ ]BA ) = {0} and hence [ ]BA
is injective.
Finally, we show that [ ]BA is surjective. Let A ∈ Mm×n be given. Recall that B = (w1 , . . . , wm ). For
each i = 1, . . . , n, let the i-th column of A be
(a1i, a2i, . . . , ami)
and define xi = a1i w1 + a2i w2 + · · · + ami wm ∈ W . Using Theorem 2.6 (Feb 6), there exists a (unique)
linear transformation T : V → W satisfying T vi = xi for i = 1, . . . , n. Obviously
[T vi]B = (a1i, a2i, . . . , ami) = the i-th column of A,
so [T ]BA and A have exactly the same columns. Hence [T ]BA = A, proving [ ]BA is surjective. □

MATH 146 February 27 Section 2

Recall from Feb 17:


• If A ∈ Mk×m(F) and B = [b1 · · · bn] ∈ Mm×n(F), then AB := [Ab1 · · · Abn] ∈ Mk×n(F).
• Proposition: If A ∈ Mk×m (F) and B ∈ Mm×n (F), then TAB = TA ◦ TB .
The Proposition shows that matrix multiplication is related to composition of linear transformations of
the form TA : Fn → Fm . In fact, the connection is more general. Suppose T : V → W and S : W → U
where V, W, U are finite-dimensional spaces over F, say of dimensions n, m, k.
Let A = (v1 , . . . , vn ) be an ordered basis for V , let B = (w1 , . . . , wm ) be an ordered basis for W , and let
C = (u1 , . . . , uk ) be an ordered basis for U .
Recall that [T ]BA is the m × n matrix whose i-th column is [T vi ]B . Similarly, [S]CB is a k × m matrix.
We also have ST : V → U , so [ST ]CA is a k × n matrix.
Theorem 2.11. In this situation, [ST ]CA = [S]CB [T ]BA .
Proof. For each i = 1, . . . , n,
i-th col. of RHS = [S]CB (i-th col. of [T ]BA ) Def. of matrix multiplication
= [S]CB [T vi ]B Def. of [T ]BA
= [S(T vi )]C Theorem 2.14
= [(ST )vi ]C Def. of ST = S ◦ T
= i-th column of LHS.
Since this is true for every i, we have LHS = RHS. □
Also recall from Feb 17:
• Lemma 1: if A, B ∈ Mk×m(F) and x, y ∈ Fm, then
(1) A(x + y) = Ax + Ay.
(2) (A + B)x = Ax + Bx.
• Lemma 2: A(αx) = α(Ax) = (αA)x.
From these facts about matrix-vector multiplication, we can deduce the following about matrix-matrix
multiplication.
Theorem 2.12. Let A, B ∈ Mk×m (F), C, D ∈ Mm×n (F), and α ∈ F.
(1) A(C + D) = AC + AD.
(2) (A + B)C = AC + BC.
(3) α(AC) = (αA)C = A(αC).
Proof. (1) Write C = [c1 · · · cn ] and D = [d1 · · · dn ]. Then C + D = [c1 +d1 · · · cn +dn ]. So
i-th col. of A(C + D) = A(i-th col. of C + D) def. of matrix multiplication
= A(ci + di )
= Aci + Adi Lemma 1(1)
= (i-th col. of AC) + (i-th col. of AD)
= i-th col. of AC + AD.
(2) i-th col. of (A + B)C = (A + B)ci
= Aci + Bci Lemma 1(2)
= (i-th col. of AC) + (i-th col. of BC)
= i-th col. of AC + BC.
(3) Proved similarly, using Lemma 2. □
Returning to the start of this lecture, we can restate TAB = TA ◦ TB as follows:
• If A is k × m and B is m × n, then for all x ∈ Fn , (AB)x = A(Bx).
This is a kind of “associative law” for matrix-matrix-vector multiplication. From it we can deduce that
matrix multiplication is associative.
Theorem 2.16. Let A ∈ Mk×m (F), B ∈ Mm×n (F), and C ∈ Mn×p (F). Then (AB)C = A(BC).
Proof. Write C = [c1 · · · cp ]. Then
i-th col. of (AB)C = (AB)ci def. of (AB)C
= A(Bci ) Restatement of TAB = TA ◦ TB
= A(i-th col. of BC) def. of BC
= i-th col. of A(BC). def. of A(BC) □
More facts. Suppose A ∈ Mk×m (F).
(1) Ik A = A and AIm = A where In is the n × n identity matrix In = [e1 e2 · · · en ]. For example,
 
I4 = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1].
(2) Oj×k A = Oj×m and AOm×n = Ok×n , where Op×q is the p × q zero matrix.
Having to pay attention to make sure the sizes of matrices are compatible for multiplication is a bit of
a pain. One case when you don’t have to pay attention is when multiplying square matrices: if A, B, C ∈
Mn (F) := Mn×n (F), then all possible products of A, B, C exist and are in Mn (F).
Note. Mn (F) with +, ×, On×n , In is a ring. Adding scalar multiplication, it becomes an F-algebra.
Cautions.    
(1) AB ≠ BA in general. Example: A = [1 1; 0 0], B = [0 1; 0 0]. Then AB = B but BA = O.
(2) AB = O ̸⇒ (A = O or B = O).
(3) Cancellation does not hold. In the above example, AB = IB and B ̸= O, but A ̸= I.
In any ring (with identity!) we can define “invertible” elements. In Mn (F) the definition is the following.
Definition. If A ∈ Mn (F), we say that A is invertible if there exists B ∈ Mn (F) with AB = BA = In .
Such B is called an inverse of A.
 
Example. F = R, A = [0 1; 1 0]. AA = I, so A is an inverse of itself.
   
Example. F = R, A = [1 1; 0 0]. No matter what B is, AB will have the form [∗ ∗; 0 0], so it cannot equal I. A is not invertible.
We will learn all the standard techniques for testing for invertibility, and finding inverses, later in the
course.
Fact. If A is invertible, then its inverse B is unique.
Proof. If C is another inverse, then C = CI = C(AB) = (CA)B = IB = B. □
Notation. When A is invertible, its unique inverse is denoted A⁻¹.

MATH 146 March 1 Section 2

Recall: If A ∈ Mn (F), we say that A is invertible if there exists B ∈ Mn (F) with AB = BA = In . Such
B is called an inverse of A.

Notation. When A is invertible, its inverse is unique and is denoted A−1 .

Theorem 1 (Properties of inverses). Suppose A, B ∈ Mn (F).


(1) If A is invertible, then so is A−1 and (A−1 )−1 = A.
(2) If A is invertible and α ∈ F with α ≠ 0, then αA is invertible and (αA)⁻¹ = (1/α)A⁻¹.
(3) If A, B are invertible then so is AB and (AB)−1 = B −1 A−1 .

Proof. (1) We just need to show that B := A works as an inverse to A−1 , i.e., A−1 A = AA−1 = I, which
is immediate since A−1 is the inverse to A.
(2) (αA)((1/α)A⁻¹) = (α/α)(AA⁻¹) = 1·I = I. Similarly for the product in the other direction.
(3) By associativity, (AB)(B −1 A−1 ) = ((AB)B −1 )A−1 = (A(BB −1 ))A−1 = (AI)A−1 = AA−1 = I.
Similarly for the product in the other order. □

Question: if we just assume that AB is invertible, can we say (AB)−1 = B −1 A−1 ?

Definition. Given T ∈ L(V, W ), we say that T is invertible if there exists S ∈ L(W, V ) such that
ST = IV and T S = IW , where IV : V → V is the identity transformation on V , and similarly for IW . Such
an S is called an inverse of T .

Theorem 2. Let T ∈ L(V, W ).


(1) If T is invertible, then its inverse is unique. (We denote it T −1 .)
(2) T is invertible iff T is an isomorphism.

Proof. (1) Assume that S1 , S2 ∈ L(W, V ) are both inverses of T . So S1 T = S2 T = IV and T S1 = T S2 = IW .


Then
S1 = S1 ◦ IW = S1 ◦ (T ◦ S2 ) = (S1 ◦ T ) ◦ S2 = IV ◦ S2 = S2 .
Before proving (2), note the following

Fact about Functions. Suppose f : A → B and g : B → C, so g ◦ f : A → C. If g ◦ f is a bijection,


then f is injective and g is surjective.

(2) (⇒) Assume T is invertible, so an inverse S ∈ L(W, V ) exists. We have ST = IV , which is a bijection;
so T is injective. And we have T S = IW , which is a bijection; so T is surjective. Hence T is a bijection,
so is an isomorphism.
(⇐) Assume T is an isomorphism. So T is a bijection from V to W. Hence the inverse map T⁻¹ : W → V
exists and is defined by T⁻¹(w) = v iff T v = w. We need to show that T⁻¹ ∈ L(W, V) and that T and T⁻¹
satisfy the compositions T T⁻¹ = IW and T⁻¹T = IV. The compositions are easily checked. We need to
prove T −1 is linear. Suppose w1 , w2 ∈ W ; let v1 , v2 ∈ V be such that T vi = wi (and T −1 wi = vi ). We know
T (v1 + v2 ) = T v1 + T v2 = w1 + w2 ; hence T −1 (w1 + w2 ) = v1 + v2 = T −1 w1 + T −1 w2 . So T −1 preserves +.
A similar proof works for scalar multiplication. □

Theorem 2.18. Let V, W be vector spaces over F with dim(V ) = n and dim(W ) = m and ordered bases
A and B. Let T ∈ L(V, W ). Then T is invertible iff n = m and [T ]BA is invertible.

Proof. Let A = [T ]BA ∈ Mm×n (F).


(⇒) Assume T is invertible, with inverse T −1 . Then T is an isomorphism by the previous theorem, so
dim V = dim W by Theorem 2.19 (Feb 10), i.e., n = m. Let B = [T −1 ]AB . Note that A, B ∈ Mn (F). Also,
AB = [T ]BA [T −1 ]AB
= [T T −1 ]BB Theorem 2.11 (Feb 27)
= [IW ]BB
= In (exercise)
A similar proof shows BA = In . So A is invertible and B is its inverse.
(⇐) Assume n = m and A is invertible. Then dim V = dim W . Let B = A−1 . I will prove first that
TA : Fn → Fn is injective. Assume x ∈ ker(TA ), so Ax = 0. Then
x = In x = (BA)x = B(Ax) = B0 = 0.
So ker(TA ) ⊆ {0} and thus TA is injective.
Next I’ll show that T is injective. Let v ∈ ker(T ), so T v = 0. Consider x = [v]A ∈ Fn . By Theorem 2.14
(Feb 17), we get
TA (x) = Ax = [T ]BA [v]A = [T v]B = [0]B = 0.
Since TA is injective, we get x = 0, i.e., [v]A = 0. But [ ]A : V ∼ = Fn is an isomorphism, so v = 0. This
proves ker(T ) ⊆ {0}, so T is injective.
Finally, recall that T : V → W with dim V = dim W = n. Since T is injective, it is an isomorphism by
Theorem 2.5 (Feb 10). Hence T is invertible by Theorem 2. □

MATH 146 March 3 Section 2

Recall:
Theorem 2.18. Let V, W be vector spaces over F with dim(V ) = n and dim(W ) = m and ordered bases
A and B. Let T ∈ L(V, W ). Then T is invertible iff n = m and [T ]BA is invertible.

Corollary. If A ∈ Mn (F), then TA : Fn → Fn is invertible iff A is invertible.


Proof. Let En be the standard ordered basis for Fn . Then [TA ]En En = A. Apply Theorem 2.18. □
Now we can answer the earlier question about (AB)−1 .
Theorem 1. Let A, B ∈ Mn (F). If AB is invertible, then A and B are invertible (so (AB)−1 = B −1 A−1 ).
Proof.
AB invertible =⇒ TAB is invertible above Corollary
=⇒ TAB is an isomorphism Theorem 2 (Mar 1)
=⇒ TA ◦ TB is a bijection Proposition Feb 17
=⇒ TA is surjective and TB is injective Fact about functions
=⇒ TA and TB are isomorphisms Theorem 2.5 (Cor. of RNT)
=⇒ TA and TB are invertible Theorem 2 (Mar 1)
=⇒ A and B are invertible. above Corollary □
Corollary. Suppose A, B ∈ Mn (F) satisfy AB = In . Then BA = In , A is invertible, and B = A−1 .
Proof. In is invertible. So if AB = In then A, B are both invertible by the previous theorem, so A−1 exists.
Then AB = In =⇒ A−1 (AB) = A−1 In =⇒ B = A−1 =⇒ BA = In . □

§2.5 Change of coordinates


Let V be a fin. dim. vector space over F, say dim V = n. Let A be an ordered basis for V . Then we can
assign to every v ∈ V its coordinate vector [v]A; the coordinatization map [ ]A : V → Fn is an isomorphism, V ≅ Fn.
The coordinate vector depends on A. Thus if we have a different ordered basis B, we would get a
different coordinate vector [v]B .
We can transform [v]A to [v]B via a change of coordinate matrix.
Definition. Suppose A, B are two ordered bases for the finite-dimensional vector space V . The change
of coordinate matrix from A to B is the matrix [IV ]BA .
Recall that, in general, if T : V → W and A, B are ordered bases for V and W , then for all v ∈ V ,
[T v]B = [T ]BA [v]A . (Theorem 2.14, Feb 17)
Apply this with W = V and T = IV to get
[v]B = [IV (v)]B = [IV ]BA [v]A .
So multiplication by [IV ]BA changes [v]A to [v]B .
Example. Let V = R2 , A = (e1 , e2 ), B = ((1, 2), (0, 1)). Then
[IR2]BA = [[IR2(e1)]B [IR2(e2)]B] = [[e1]B [e2]B].
We can solve e1 = a(1, 2) + b(0, 1) to get a = 1, b = −2, so [e1]B = (1, −2). Similarly we can solve
e2 = a(1, 2) + b(0, 1) to get a = 0, b = 1. So [e2]B = (0, 1), and so [IV]BA = [1 0; −2 1].
 
Now let v = (a, b) ∈ R² be arbitrary. Then [v]A = (a, b), while
[v]B = [1 0; −2 1] (a, b) = (a, −2a + b).
Theorem 2.22. Any change of coordinate matrix [IV ]BA is invertible and ([IV ]BA )−1 = [IV ]AB .
Proof. [IV ]AB [IV ]BA = [IV ◦ IV ]AA = [IV ]AA = In where the first equality is by Theorem 2.11 (Feb 27). Now
apply the earlier Corollary. □
Example. With V, A, B as before, we get
([IR2]BA)⁻¹ = [1 0; −2 1]⁻¹ = [IR2]AB = [[IR2(1, 2)]A [IR2(0, 1)]A] = [1 0; 2 1].
You can verify that these two matrices multiply to give I2 (in either order).
Now suppose that dim V = n and we have T : V → V . We are most interested in [T ]A (= [T ]AA ) when
A is an ordered basis. If B is another ordered basis, how can we relate [T ]A to [T ]B ?
Theorem 2.23. In this situation, [T ]B = [IV ]BA [T ]A [IV ]AB .
We summarize this as
[T ]B = P −1 [T ]A P where P is the change-of-coordinate matrix from B to A.
 
Example. Let C = [1 3; 2 −2] and T = TC : R² → R². We know that
[TC]A = C.
Let P = [IV]AB = [1 0; 2 1]. Then
[TC]B = P⁻¹CP = [1 0; −2 1] [1 3; 2 −2] [1 0; 2 1] = [7 3; −16 −8].
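A numerical check of this example (a sketch, using numpy for illustration only): build P = [IV]AB from the basis B = ((1, 2), (0, 1)), and verify both Theorem 2.22 (P⁻¹ = [IV]BA) and the similarity formula [TC]B = P⁻¹CP.

```python
import numpy as np

C = np.array([[1.0, 3.0],
              [2.0, -2.0]])

# P = [I]_AB has columns [(1,2)]_A and [(0,1)]_A, i.e. the B-basis vectors
# written in standard coordinates.
P = np.column_stack([[1.0, 2.0], [0.0, 1.0]])
P_inv = np.linalg.inv(P)

print(P_inv)                    # expected [[1 0], [-2 1]] = [I]_BA  (Theorem 2.22)
TC_B = P_inv @ C @ P
print(TC_B)                     # expected [[7 3], [-16 -8]]
assert np.allclose(TC_B, [[7.0, 3.0], [-16.0, -8.0]])
```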
Definition. Suppose A, B ∈ Mn (F). We say that B is similar to A if there exists an invertible matrix
P ∈ Mn (F) with B = P −1 AP .
So [T ]A and [T ]B are similar matrices (for any T : V → V , any ordered bases A and B).

Definition. Given A ∈ Mm×n (F), its transpose is the matrix At ∈ Mn×m (F) obtained from A by
“switching rows and columns.”
 
Example. If A = [1 2 4; 0 1 5] then Aᵗ = [1 0; 2 1; 4 5].

MATH 146 March 6 Section 2

Definition. Given a matrix A ∈ Mm×n (F), the transpose of A is the matrix At ∈ Mn×m (F) obtained
from A by “switching rows and columns.”
Formally, the (i, j)-entry of At is the (j, i)-entry of A.
Properties of Transpose.
(1) (A + B)t = At + B t
(2) (αA)t = α(At )
(3) (At )t = A
(4) (AB)t = B t At (whenever A is m × n and B is n × k).
Proof. (1)–(3) are obvious, since switching rows with columns commutes with addition and scalar multi-
plication.
(4) is not obvious, but can be deduced as follows. AB is m × k, so (AB)t is k × m. B t is k × n and At
is n × m, so B t At is also k × m. Finally,
(i, j)-entry of (AB)t = (j, i)-entry of AB
= sum of products of pairs from row j of A and column i of B
= sum of products of pairs from column j of At and row i of B t
= (i, j) entry of B t At .
Thus (AB)t and B t At have the same dimensions and exactly the same entries, so (AB)t = B t At . □
Corollary. Let A ∈ Mn (F). If A is invertible, then so is At and (At )−1 = (A−1 )t .
Proof. From A−1 A = In we get (A−1 A)t = (In )t which simplifies to At (A−1 )t = In . Then the 2nd Corollary
from March 3 gives that At is invertible and (A−1 )t = (At )−1 . □

Fundamental Subspaces

Definition. Suppose A ∈ Mm×n (F).


(1) The column space of A is Col(A) := span{columns of A}, a subspace of Fm.
(2) The row space of A is Row(A) := span{rows of A}, a subspace of Fn.
Equivalently: if A = [a1 · · · an ] then
Col(A) = span{a1 , . . . , an } = {α1 a1 + · · · + αn an : α1 , . . . , αn ∈ F}
= { [a1 · · · an] (α1, . . . , αn) : α1, . . . , αn ∈ F } = {Av : v ∈ Fn}
= {TA (v) : v ∈ Fn } = Ran(TA ).
Since {rows of A} = {columns of At }, we can also write Row(A) = Col(At ) = Ran(TAt ).
(3) The null space of A is Null(A) = {v ∈ Fn : Av = 0}, a subspace of Fn .
(4) The left null space of A is Null(At ), a subspace of Fm .
Note that for v ∈ Fm ,
v ∈ Null(At ) ⇐⇒ At v = 0 ⇐⇒ (At v)t = 0t ⇐⇒ v t A = 0t
so Null(At ) = {v ∈ Fm : v t A = 0t }. Also note that Null(A) = ker(TA ) and Null(At ) = ker(TAt ).
Definition. Suppose A ∈ Mm×n(F).
(1) The rank of A is rank(A) := dim Col(A).
(2) The nullity of A is nullity(A) := dim Null(A).
(3) The left nullity of A is nullity(Aᵗ) := dim Null(Aᵗ).
Observations.
(1) rank(A) = dim Col(A) = dim Ran(TA ) = rank(TA ).
(2) nullity(A) = dim Null(A) = dim Ker(TA ) = nullity(TA ).
So by the Rank-Nullity Theorem,
rank(A) + nullity(A) = dim Col(A) + dim Null(A) = rank(TA) + nullity(TA) = dim(Fn) = n.
Since Row(A) = Col(Aᵗ) = Ran(TAᵗ), we similarly get
rank(Aᵗ) + nullity(Aᵗ) = dim Row(A) + dim Null(Aᵗ) = rank(TAᵗ) + nullity(TAᵗ) = dim(Fm) = m.

One of the most interesting facts about ranks is that rank(A) = rank(At ), or equivalently, Col(A) and
Row(A) have the same dimension. Before proving it, let me state the key idea on which the proof rests.
Lemma. Suppose A ∈ Mm×n (F) and B ∈ Mm×r (F). The following are equivalent:
(1) Col(A) ⊆ Col(B).
(2) There exists C ∈ Mr×n (F) with A = BC.
Proof. (2) ⇒ (1). Assume A = BC with C = [c1 · · · cn ]. Then the columns of A are Bc1 , Bc2 , . . . , Bcn .
Recall that Bx is a linear combination of the columns of B and so is in Col(B) (for any x ∈ Fr ). In
particular, every column of A is in Col(B), and hence Col(A) ⊆ Col(B).
(1) ⇒ (2). Write A = [a1 · · · an ] and B = [b1 · · · br ]. Since Col(A) ⊆ Col(B) we get that a1 , . . . , an ∈
span{b1 , . . . , br }. Consider a1 . By the previous remarks, there exist scalars α1 , . . . , αr ∈ F so that a1 =
α1 b1 + · · · + αr br . Define
c1 := (α1, . . . , αr) ∈ Fr.
Then Bc1 = α1 b1 + · · · + αr br = a1 . Similarly we can find c2 , . . . , cn ∈ Fr so that Bci = ai for each
i = 1, . . . , n. Let C = [c1 · · · cn ] ∈ Mr×n (F). Then
BC = B[c1 · · · cn ] = [Bc1 · · · Bcn ] = [a1 · · · an ] = A. □
Theorem 1. For any A ∈ Mm×n (F), rank(A) = rank(At ).
Proof. Write A = [a1 a2 · · · an ] and let r = rank(A). Let {b1 , . . . , br } ⊆ Fm be a basis for Col(A).
Let B = [b1 · · · br ] ∈ Mm×r (F). By definition, Col(B) = Col(A). Hence by the Lemma, there exists
C ∈ Mr×n (F) with
A = BC.
Taking transposes of both sides gives
At = (BC)t = C t B t .
Then the Lemma gives Col(At ) ⊆ Col(C t ). Hence
rank(At ) = dim Col(At ) ≤ dim Col(C t ) = dim Row(C) ≤ r since C has only r rows.
In other words, rank(At ) ≤ rank(A). Applying this to At gives rank((At )t ) ≤ rank(At ), so rank(A) =
rank(At ). □
Corollary. For any matrix A, dim Row(A) = dim Col(A) = rank(A).
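As a numerical illustration of Theorem 1 and the two dimension counts above (a sketch only; numpy's matrix_rank uses floating-point SVD, so it is a check, not a proof), here is a quick test on a random real matrix of small rank.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6
# A random 4x6 real matrix of rank at most 2 (product of a 4x2 and a 2x6 factor).
A = rng.integers(-3, 4, size=(m, 2)) @ rng.integers(-3, 4, size=(2, n))

r = np.linalg.matrix_rank(A)          # dim Col(A)
r_t = np.linalg.matrix_rank(A.T)      # dim Row(A) = dim Col(A^t)
print("rank(A) =", r, " rank(A^t) =", r_t)
assert r == r_t                       # Theorem 1: rank(A) = rank(A^t)

# Rank-Nullity counts: nullity(A) = n - rank(A), nullity(A^t) = m - rank(A^t).
print("nullity(A) =", n - r, " nullity(A^t) =", m - r_t)
```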
MATH 146 March 8 Section 2

§3.1
Consider a system (S) of m linear equations in n unknowns:
(S):
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
⋮
am1 x1 + am2 x2 + · · · + amn xn = bm
We always have a field F in mind. The coefficients aij and right-hand sides bi belong to F, and we are
looking for solutions x1 , . . . , xn ∈ F.
We can write the system as
x1 (a11, . . . , am1) + · · · + xn (a1n, . . . , amn) = (b1, . . . , bm)
and hence as
[a11 · · · a1n; ⋮ ; am1 · · · amn] (x1, . . . , xn) = (b1, . . . , bm),
that is, as a matrix-vector equation
Ax = b
where A ∈ Mm×n (F) is the coefficient matrix of (S), b ∈ Fm is the RHS vector of (S), and x ∈ Fn
represents a potential solution.
It is also convenient to form the m × (n + 1) matrix
(A | b) = [a11 · · · a1n b1; ⋮ ; am1 · · · amn bm],
which encodes (S) and which we call the augmented matrix of (S).
Example. For the system 
2x1 + 3x2 − x3 = 1
x1 − 2x2 + 4x3 = 2
we have
A = [2 3 −1; 1 −2 4],   x = (x1, x2, x3),   b = (1, 2),   (A | b) = [2 3 −1 1; 1 −2 4 2].
Goal: to develop techniques to
(1) Solve a system of linear equations.
(2) Determine the rank of a matrix A.
(3) Determine whether a square matrix A is invertible, and if so, to find A−1 .
We will do all of these via manipulations of matrices called elementary operations.
Definition. An elementary row operation (on matrices) is any one of the following actions:
(1) Switching two rows. Ri ⇆ Rj
(2) Multiplying one row by a nonzero scalar. Ri ← αRi (α ̸= 0)
(3) Adding a scalar multiple of one row to another row. Ri ← Ri + aRj
An elementary column operation is any action of the above kinds, but with rows replaced by columns.
We use notation Ci ⇆ Cj , Ci ← αCi etc.
An elementary row operation applies to m × n matrices where m is at least as large as the index i (or
indices i, j) of the row(s) it references. Similarly, an elementary column operation applies to m×n matrices
where n is at least as large as the index i (or indices i, j) of the column(s) it references.
Notation. If A ∈ Mm×n, O is an elementary operation that can be applied to A, and A′ is the result of applying O to A, then we write A −O→ A′.
Newton’s 3rd Law of Elementary Operations. To every elementary operation O there is an equal
and opposite elementary operation O−1 of the same kind.
For example, if O is the row operation Ri ← Ri + αRj, then O⁻¹ is Ri ← Ri + (−α)Rj. Clearly if A −O→ A′ then A′ −O⁻¹→ A.
Willard’s Law. For every elementary row operation O there is a transpose column operation Ot , so that
O Ot
A −→ B iff At −→ B t . (Just change rows to columnns.) Similarly for column operations.
Definition. Let O be an elementary operation and let n be an integer big enough so that O can act on n × n matrices. The n × n elementary matrix corresponding to O is the matrix E where In −O→ E.
 
Example. [1 0 0; 0 0 1; 0 1 0] is an elementary matrix of type 1, using either R2 ⇆ R3 or C2 ⇆ C3.
[1 0 0; 0 1 0; 0 0 α] (where α ≠ 0) is an elementary matrix of type 2, using either R3 ← αR3 or C3 ← αC3.
[1 0 0; α 1 0; 0 0 1] is an elementary matrix of type 3, using either R2 ← R2 + αR1 or C1 ← C1 + αC2.
Theorem 3.1. Fix m, n and suppose that O is an elementary column operation which can act on □ × n matrices. Let In −O→ E, so E is the corresponding n × n elementary matrix. Then for all A ∈ Mm×n(F), A −O→ AE.
Proof. Write A = [a1 · · · an ] and In = [e1 · · · en ]. Now consider cases.
Case 1: O is Ci ⇆ Cj . (Assume i < j.)
Then E = [e1 · · · ej · · · ei · · · en ]. So
AE = [Ae1 · · · Aej · · · Aei · · · Aen ] = [a1 · · · aj · · · ai · · · an ],
which is the result of switching columns i and j of A, so A −O→ AE.
Case 2: O is Ci ← αCi (α ̸= 0).
Then E = [e1 · · · αei · · · en ], so
AE = [Ae1 · · · A(αei ) · · · Aen ] = [Ae1 · · · α(Aei ) · · · Aen ] = [a1 · · · αai · · · an ],
which is the result of multiplying column i of A by α, so A −O→ AE.
Case 3: O is Ci ← Ci + αCj . (Left as an exercise.) □
There is a corresponding result for elementary row operations.
Corollary. Fix m, n and suppose that O is an elementary row operation that can act on m × □ matrices. Let Im −O→ E, so E is the corresponding m × m elementary matrix. Then for all A ∈ Mm×n(F), A −O→ EA.
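A small Python sketch (illustration only; the two helper functions are mine, not part of the notes) of Theorem 3.1 and its Corollary: apply an operation to the identity to get E, then check that a row operation on A is left multiplication by E, while a column operation is right multiplication.

```python
import numpy as np

A = np.array([[2.0, 4.0, 1.0, 0.0],
              [-1.0, -2.0, 1.0, 3.0],
              [3.0, 6.0, 0.0, -3.0]])

def row_op(M, i, j, alpha):
    # The elementary row operation R_i <- R_i + alpha * R_j, on a copy of M.
    M = M.copy()
    M[i] += alpha * M[j]
    return M

def col_op(M, i, j, alpha):
    # The elementary column operation C_i <- C_i + alpha * C_j, on a copy of M.
    M = M.copy()
    M[:, i] += alpha * M[:, j]
    return M

# Row op (indices 0-based, so this is R2 <- R2 + 2 R1): E = op applied to I_m,
# and the op applied to A equals E @ A.
E_row = row_op(np.eye(3), 1, 0, 2.0)
assert np.allclose(row_op(A, 1, 0, 2.0), E_row @ A)

# Column op (C2 <- C2 - 2 C1): E = op applied to I_n, and the op on A is A @ E.
E_col = col_op(np.eye(4), 1, 0, -2.0)
assert np.allclose(col_op(A, 1, 0, -2.0), A @ E_col)
print("both identities check out")
```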
MATH 146 March 10 Section 2

Theorem 3.1. Suppose O is an elementary column operation which can act on □ × n matrices. Let E be the corresponding n × n elementary matrix, i.e., In −O→ E. Then for all A ∈ Mm×n(F), A −O→ AE.
Corollary 1. Suppose O is an elementary row operation that can act on m × □ matrices. Let Im −O→ E. Then for all A ∈ Mm×n(F), A −O→ EA.
Proof sketch. Oᵗ is an elementary column operation that can act on □ × m matrices. We have Im −O→ E, so Im = Imᵗ −Oᵗ→ Eᵗ. We can apply Oᵗ to Aᵗ (Theorem 3.1) and we get Aᵗ −Oᵗ→ AᵗEᵗ = (EA)ᵗ. So A −O→ EA. □
Now we can prove the following intuitively obvious result.
Theorem 3.2. Elementary matrices are invertible. Moreover, if E is the n × n elementary matrix corre-
sponding to O, then E −1 is the n × n elementary matrix corresponding to O−1 .
Proof. Let In −O⁻¹→ F. Then F −O→ In.
Case 1: O is a column op. Then F −O→ FE (Theorem 3.1), so FE = In. Since E, F are square, we get E is invertible and F = E⁻¹ (2nd Corollary, March 3).
Case 2: O is a row op. Then F −O→ EF (Corollary 1). So EF = In. Use the same logic. □

§3.2
Notation. Suppose A, B ∈ Mm×n(F). We write A ⇝ B if there exists a sequence of elementary operations that can transform A to B. We write A ⇝row B if we can do this using just row operations; similarly for A ⇝col B.
Consider the following sequence O1, . . . , O11 which witnesses
A = [2 4 1 0; −1 −2 1 3; 3 6 0 −3] ⇝ [1 0 0 0; 0 1 0 0; 0 0 0 0] = B:
A −R1⇆R2→ [−1 −2 1 3; 2 4 1 0; 3 6 0 −3] −R1←(−R1)→ [1 2 −1 −3; 2 4 1 0; 3 6 0 −3]
−R2←R2−2R1→ [1 2 −1 −3; 0 0 3 6; 3 6 0 −3] −R3←R3−3R1→ [1 2 −1 −3; 0 0 3 6; 0 0 3 6] −C2←C2−2C1→ [1 0 −1 −3; 0 0 3 6; 0 0 3 6]
−C3←C3+C1→ [1 0 0 −3; 0 0 3 6; 0 0 3 6] −C4←C4+3C1→ [1 0 0 0; 0 0 3 6; 0 0 3 6] −R3←R3−R2→ [1 0 0 0; 0 0 3 6; 0 0 0 0]
−C4←C4−2C3→ [1 0 0 0; 0 0 3 0; 0 0 0 0] −C2⇆C3→ [1 0 0 0; 0 3 0 0; 0 0 0 0] −R2←(1/3)R2→ [1 0 0 0; 0 1 0 0; 0 0 0 0] = B.
(The operations O1, . . . , O11 label the arrows, in order.)
Observe that O1, O2, O3, O4, O8, O11 are elementary row operations while O5, O6, O7, O9, O10 are elementary
column operations. For i = 1, . . . , 11 let Ei be the appropriately sized elementary matrix corresponding to
Oi. (So E1, E2, E3, E4, E8, E11 are 3 × 3 and E5, E6, E7, E9, E10 are 4 × 4.) Then
B = (E11 E8 E4 E3 E2 E1) A (E5 E6 E7 E9 E10) = P A Q,   where P := E11 E8 E4 E3 E2 E1 and Q := E5 E6 E7 E9 E10.
Since elementary matrices are invertible (Theorem 3.2), and products of invertible matrices are invertible
(Properties of inverses, Mar 1), we get that P, Q are invertible. This obviously generalizes.
Theorem 1. Let A, B ∈ Mm×n(F).
(1) If A ⇝ B, then B = P AQ for some invertible P ∈ Mm(F) and Q ∈ Mn(F).
(2) If A ⇝row B, then B = P A for some invertible P ∈ Mm(F).
(3) If A ⇝col B, then B = AQ for some invertible Q ∈ Mn(F).
Theorem 3.4. Suppose A ∈ Mm×n (F). Let P ∈ Mm (F) and Q ∈ Mn (F) be invertible.
(1) Col(A) = Col(AQ).
(2) Row(A) = Row(P A).
(3) rank(A) = rank(P A) = rank(AQ) = rank(P AQ).
Proof. (1) Col(AQ) = Ran(TAQ ) = Ran(TA ◦ TQ ). Q is invertible so TQ is an isomorphism and hence is
surjective. So Ran(TA ◦ TQ ) = Ran(TA ) = Col(A).
(2) Row(P A) = Col((P A)t ) = Col(At P t ) = Col(At ) (by (1)) = Row(A).
(3) rank(AQ) = dim Col(AQ) = dim Col(A) = rank(A). Similarly for rank(P A). Finally,
rank(P AQ) = rank((P A)Q) = rank(P A) = rank(A). □
Combining Theorem 3.4 with Theorem 1, we get:
Corollary 2.
(1) If A ⇝row B, then Row(A) = Row(B).
(2) If A ⇝col B, then Col(A) = Col(B).
(3) If A ⇝ B, then rank(A) = rank(B).
 
Example. Find rank(A) where A = [1 1 1 2; 2 0 −1 2; 1 1 1 2].
Solution:
A −(R2←R2−2R1, R3←R3−R1)→ [1 1 1 2; 0 −2 −3 −2; 0 0 0 0] −C2←C2−C1→ [1 0 1 2; 0 −2 −3 −2; 0 0 0 0] −C2←(−1/2)C2→ [1 0 1 2; 0 1 −3 −2; 0 0 0 0] =: B.
Obviously rank(B) = 2 (the first two columns of B form a basis for Col(B)). Hence rank(A) = 2 by
Corollary 2.
Note that A, B do not have the same column space. But their column spaces have the same dimension.
Also note that, if we wanted, we could continue the above calculation as follows:
B −(C3←C3−C1, C4←C4−2C1)→ [1 0 0 0; 0 1 −3 −2; 0 0 0 0] −(C3←C3+3C2, C4←C4+2C2)→ [1 0 0 0; 0 1 0 0; 0 0 0 0] =: D.
This generalizes.
Theorem 3.6. For every nonzero matrix A ∈ Mm×n (F) there exists a matrix D of the form
 
D = [Ir O; O′ O′′]
where r ≥ 1 and O, O′ , O′′ are all-zero matrices, such that A ⇝ D. Obviously rank(D) = r, so rank(A) = r.
Proof sketch. A has a nonzero entry, and using type-1 operations we can move it to the 1,1 position. By
a type-2 operation, we can change it to 1. Then using type-3 operations, we can “clear” the remaining
entries in the first row and column. Thus we have converted A to a matrix A′ of the form
A′ =
[ 1 0 · · · 0 ]
[ 0           ]
[ ⋮     B     ]
[ 0           ]
where B is the remaining (m−1) × (n−1) block.
If B is all 0s we’re done. Else repeat: we can move a nonzero entry of B to the 2,2 position of A′ ; make it
equal 1; and then clear the rest of the 2nd row and column. Etc. □

MATH 146 March 13 Section 2

Theorem 1. For square A ∈ Mn (F), the following are equivalent:


(1) A is invertible.
(2) rank(A) = n.
(3) A ⇝ In .
(4) A ⇝row In.
(5) A can be written as a product of elementary matrices.
Proof. (1) ⇔ (2): A is invertible ⇐⇒ TA is invertible ⇐⇒ TA is surjective (⇐ by Thm. 2.5, Feb 10)
⇐⇒ Ran(TA) = Fn ⇐⇒ Col(A) = Fn ⇐⇒ dim Col(A) = n ⇐⇒ rank(A) = n.
(2) ⇔ (3): By Theorem 3.6 (Mar 10), rank(A) = r iff A ⇝ D = [Ir O; O′ O′′]. Apply with r = n.
(3) ⇒ (5): If A ⇝ In , then In ⇝ A. Thus
A = (Ek · · · E2 E1 )In (E1′ E2′ · · · Eℓ′ )
= Ek · · · E2 E1 E1′ E2′ · · · Eℓ′
where E1 , . . . , Ek are the elementary matrices corresponding to the row operations used in transforming
In to A, and E1′ , . . . , Eℓ′ are the elementary matrices corresponding to the column operations used.
(5) ⇒ (4): Assume that A = E1 E2 · · · Ek where each Ei is elementary. Thus A = E1 E2 · · · Ek (In ). Each
Ei represents a row operation when multiplying a matrix on the left. Thus In ⇝row A. Hence A ⇝row In using
the reverse (row) operations.
(4) ⇒ (3): obvious. □
Here is an application. Suppose A is invertible, so can be transformed by elementary row operations to
In . Let E1 , . . . , Ek be the corresponding elementary matrices. Thus
In = Ek · · · E2 E1 A.
Multiply both sides of this equation on the right by A−1 to get
A−1 = Ek · · · E2 E1 In .
This equation shows that
Theorem 2. If A is invertible, then any sequence of elementary row operations which transforms A to In
also transforms In to A−1 .
This gives an easy way to find A−1 (when it exists).
(1) Form the augmented matrix (A | In ) (n × 2n).
(2) Using row operations, try to transform A to In , but apply the operations to (A | In ).
(3) If (A | In ) is transformed to (In | B), then A is invertible and B = A−1 .
 
Example. Let A = [1 −3 1; 1 0 2; 0 1 0]. Then
(A | I3) = [1 −3 1 | 1 0 0; 1 0 2 | 0 1 0; 0 1 0 | 0 0 1] −→ [1 −3 1 | 1 0 0; 0 3 1 | −1 1 0; 0 1 0 | 0 0 1] −→ [1 −3 1 | 1 0 0; 0 1 0 | 0 0 1; 0 3 1 | −1 1 0] −→
[1 0 1 | 1 0 3; 0 1 0 | 0 0 1; 0 3 1 | −1 1 0] −→ [1 0 1 | 1 0 3; 0 1 0 | 0 0 1; 0 0 1 | −1 1 −3] −→ [1 0 0 | 2 −1 6; 0 1 0 | 0 0 1; 0 0 1 | −1 1 −3].
Thus A is invertible and A⁻¹ = [2 −1 6; 0 0 1; −1 1 −3].
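A sketch of this recipe in Python (illustration only; here sympy's exact rref stands in for the by-hand row reduction): augment A with I3, reduce, and read A⁻¹ off the right-hand block. On the matrix of this example it reproduces the inverse found above.

```python
from sympy import Matrix, eye

A = Matrix([[1, -3, 1],
            [1,  0, 2],
            [0,  1, 0]])

aug = A.row_join(eye(3))        # the 3 x 6 augmented matrix (A | I3)
R, pivots = aug.rref()          # row-reduce, with exact rational arithmetic

assert R[:, :3] == eye(3)       # left block became I3, so A is invertible
A_inv = R[:, 3:]                # the right block is A^{-1} (Theorem 2)
print(A_inv)                    # Matrix([[2, -1, 6], [0, 0, 1], [-1, 1, -3]])
assert A * A_inv == eye(3)
```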
If A is not invertible, then rank(A) < n and any attempt to transform A to In will fail. As soon as it
becomes “obvious” that the LHS has rank < n, you can stop and conclude that rank(A) < n and hence A
is not invertible.
 
Example. Let A = [1 2 1; 2 1 −1; 1 5 4]. Then
(A | I3) = [1 2 1 | 1 0 0; 2 1 −1 | 0 1 0; 1 5 4 | 0 0 1] −→ [1 2 1 | 1 0 0; 0 −3 −3 | −2 1 0; 1 5 4 | 0 0 1] −→ [1 2 1 | 1 0 0; 0 −3 −3 | −2 1 0; 0 3 3 | −1 0 1]
−→ [1 2 1 | 1 0 0; 0 −3 −3 | −2 1 0; 0 0 0 | −3 1 1] =: (A′ | B).
At this point we see that the LHS (A′ ) has rank < 3 (as it has only two nonzero rows, so its row space has
dimension ≤ 2). At this point we can stop and conclude that rank(A) < 3 and hence A is not invertible.
(Alternatively, we might have noted at the start that the first and third columns of A add to make the
second column, so rank(A) ≤ 2, so A is not invertible.)

Recall the four fundamental subspaces of A ∈ Mm×n (F):


Col(A) ⊆ Fm
Row(A) ⊆ Fn
Null(A) ⊆ Fn
Null(At ) ⊆ Fm .
Focus on Row(A) and Null(A); both are subspaces of Fn . Moreover,
dim Row(A) + dim Null(A) = rank(A) + nullity(A) = n,
so it might be tempting to guess that Fn = Row(A) ⊕ Null(A).
And in MATH 136 this will be stated as a true fact. But it is only true if the field F is a subfield of R.
Theorem. Assume F ⊆ R. If A ∈ Mm×n (F), then Fn = Row(A) ⊕ Null(A).
Proof. It is enough to prove Row(A) ∩ Null(A) = {0}. (Because if we know that, then
dim(Row(A) + Null(A)) = dim Row(A) + dim Null(A) − dim({0})
= rank(A) + nullity(A) = n = dim Fn .
Since Row(A) + Null(A) is a subspace of Fn , it will follow that Row(A) + Null(A) = Fn .)
(Continued on Wednesday)

MATH 146 March 15 Section 2

Theorem. Assume F ⊆ R. If A ∈ Mm×n(F), then Fn = Row(A) ⊕ Null(A).


Proof. (Resumed from Monday) It is enough to prove Row(A) ∩ Null(A) = {0}.
So suppose v ∈ Row(A) ∩ Null(A). v ∈ Row(A) =⇒ v ∈ Col(At ) =⇒ ∃x ∈ Fm with v = At x. Then
v ∈ Null(A) =⇒ Av = 0 =⇒ AAt x = 0.
This last 0 is the zero vector in Fm , which we can view as a m × 1 matrix. AAt x is also an m × 1 matrix.
We can multiply both on their left by xt to get
xt AAt x = 0 where both sides are scalars.
We can write the LHS as (Aᵗx)ᵗ(Aᵗx) = vᵗv. Write v = (a1, . . . , an). Then
vᵗv = a1² + a2² + · · · + an² = 0.
But a sum of squares of real numbers equals 0 iff all terms equal 0, so
a1 = a2 = · · · = an = 0 so v = 0. □

 
We've seen that every A ⇝ [Ir O; O′ O′′], and the RHS is unique (for A). What if we can only use ⇝row?
In the following definition, the leading entry of a nonzero row is the first (left-most) nonzero entry.
For example, in the following matrix
 
[1 0 2 0 −2 3; 0 1 −1 0 1 1; 0 0 0 1 −2 2; 0 0 0 0 0 0]
the leading entries of rows 1–3 are in columns 1, 2 and 4.
Definition. A matrix R ∈ Mm×n (F) is in reduced row echelon form (RREF) if:
(1) All zero rows (if any) are at the bottom.
(2) The leading entry of a nonzero row is always strictly to the right of the leading entries of the rows
above it.
(3) Every leading entry (of a nonzero row) equals 1.
(4) The entries directly above a leading entry (i.e., in the same column) all equal 0.
If R just satisfies (1) and (2), then it is in row echelon form (REF).
Examples:
[1 0 0 2; 0 1 0 0; 0 0 1 −1],   [0 1 0 0 2; 0 0 1 0 0; 0 0 0 1 −1],   [0 1 2 2; 0 0 0 0; 0 0 0 0]
Some matrices not in RREF:
[1 0 2 0; 0 1 −1 0; 0 0 0 2],   [0 1 0 2; 0 0 0 1; 0 0 0 0]

Note: in each RREF example R, the columns containing the leading 1s are standard basis vectors (so
are linearly independent), and span Col(R). This is always the case. Thus
Theorem 1. If R is in RREF, then rank(R) = number of leading ones = number of nonzero rows.
Theorem 2. For every A ∈ Mm×n(F) there exists R ∈ Mm×n(F) in RREF such that A ⇝row R.
Proof sketch. Like the proof of Thm 3.6, but no column operations allowed:
• If A = O, then we’re done. Else, find the first nonzero column; pick a nonzero entry in it and use
a row op to move it to the top of that column. (This entry is called a pivot.)
• Using row operations, change the pivot to 1 and “clear” the entries below it. We now have something
like
A′ =
[ 0 0 1 ∗ · · · ∗ ]
[ 0 0 0           ]
[ ⋮ ⋮ ⋮     B     ]
[ 0 0 0           ]
• Repeat. If B = O then we are done. Otherwise, find its first nonzero column, pick a nonzero entry
in that column, move it to the top of B (so to the 2nd row of A′ ). (This is our 2nd pivot.)
• Change this pivot to a 1 and use it to clear the entries above and below it to get something like
A′′ =
[ 0 0 1 ∗ ∗ 0 ∗ · · · ∗ ]
[ 0 0 0 0 0 1 ∗ · · · ∗ ]
[ 0 0 0 0 0 0           ]
[ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮     C     ]
[ 0 0 0 0 0 0           ]
Eventually we arrive at an RREF matrix R. We used only row ops, so A ⇝row R. □
Notes
(1) The algorithm sketched in the above proof is known as Gauss-Jordan elimination. Another
variation is described in the example below.
(2) This algorithm is non-deterministic – each time you examine a first nonzero column, you have a
choice in which nonzero element from that column to use as a pivot.
Example. Consider the following matrix A ∈ M4×6 (R):
 
A = [2 4 1 0 −4 2; 0 0 2 −4 4 4; 1 2 2 −3 1 4; 3 6 −2 7 −13 4]
We can transform it to a matrix in RREF as follows.
(1) In the leftmost nonzero column, use elementary row operations (if necessary) to get a 1 in the first row. (This will be a leading one.)
A −R1←(1/2)R1→ [1 2 1/2 0 −2 1; 0 0 2 −4 4 4; 1 2 2 −3 1 4; 3 6 −2 7 −13 4]
(2) By means of type 3 elementary row operations, use the first row to create zeros in the remaining entries of the leftmost nonzero column; that is, below the leading one created in the previous step.
−(R3←R3−R1, R4←R4−3R1)→ [1 2 1/2 0 −2 1; 0 0 2 −4 4 4; 0 0 3/2 −3 3 3; 0 0 −7/2 7 −7 1]
(3) Consider the submatrix consisting of the columns to the right of the column we just modified and the rows beneath the row that just got a leading one. Use elementary row operations (if necessary) to get a leading one in the top of the first nonzero column of this submatrix.
−R2←(1/2)R2→ [1 2 1/2 0 −2 1; 0 0 1 −2 2 2; 0 0 3/2 −3 3 3; 0 0 −7/2 7 −7 1]
(4) Use elementary row operations to obtain zeros below the 1 created in the preceding step. (But do not create zeroes above the leading one now; Gaussian elimination does this later.)
−(R3←R3−(3/2)R2, R4←R4+(7/2)R2)→ [1 2 1/2 0 −2 1; 0 0 1 −2 2 2; 0 0 0 0 0 0; 0 0 0 0 0 8]
(5) Repeat Steps 3 and 4 until no nonzero rows remain. This completes the forward phase.
−R3↔R4→ [1 2 1/2 0 −2 1; 0 0 1 −2 2 2; 0 0 0 0 0 8; 0 0 0 0 0 0] −R3←(1/8)R3→ [1 2 1/2 0 −2 1; 0 0 1 −2 2 2; 0 0 0 0 0 1; 0 0 0 0 0 0]
(6) Now we will create zeroes above the leading ones. Working backwards, begin with the last nonzero row and add multiples of it to each row above it to create zeros above its leading one.
−(R1←R1−R3, R2←R2−2R3)→ [1 2 1/2 0 −2 0; 0 0 1 −2 2 0; 0 0 0 0 0 1; 0 0 0 0 0 0]
(7) Repeat the process in Step 6 for the second-to-last nonzero row, then the third-to-last nonzero row, etc. until it has been performed to every nonzero row except the first row. This completes the backward phase. At this point the matrix should be in RREF.
−R1←R1−(1/2)R2→ [1 2 0 1 −3 0; 0 0 1 −2 2 0; 0 0 0 0 0 1; 0 0 0 0 0 0], which is in RREF.
0 0 0 0 0 0 0 0 0 0 0 0

Note that, in this example, we did not clear the entries above the pivots until the end (and then we
did them in reverse order). This variation of the algorithm is called Gaussian elimination and is the
preferred algorithm for computer implementations, because it is slightly more efficient than the Gauss-
Jordan algorithm.
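Since the note mentions computer implementations, here is a minimal Python sketch of such an elimination routine (an illustration under my own design choices, not the course's official algorithm); it uses exact rational arithmetic via the standard fractions module and reproduces the RREF of the 4 × 6 example above.

```python
from fractions import Fraction

def rref(rows):
    """Return the RREF of a matrix given as a list of lists of Fractions.
    Minimal Gauss-Jordan-style sketch: forward phase creates leading 1s and
    clears below; backward phase clears above each leading 1."""
    M = [row[:] for row in rows]
    m, n = len(M), len(M[0])
    pivot_cols = []
    r = 0                                   # next row to receive a pivot
    for c in range(n):
        pivot = next((i for i in range(r, m) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]     # type-1 op: move pivot row up
        M[r] = [x / M[r][c] for x in M[r]]  # type-2 op: make the leading entry 1
        for i in range(r + 1, m):           # type-3 ops: clear below the pivot
            M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        pivot_cols.append(c)
        r += 1
        if r == m:
            break
    # Backward phase: clear the entries above each leading 1.
    for r, c in reversed(list(enumerate(pivot_cols))):
        for i in range(r):
            M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
    return M

A = [[Fraction(x) for x in row] for row in
     [[2, 4, 1, 0, -4, 2],
      [0, 0, 2, -4, 4, 4],
      [1, 2, 2, -3, 1, 4],
      [3, 6, -2, 7, -13, 4]]]

expected = [[Fraction(x) for x in row] for row in
            [[1, 2, 0, 1, -3, 0],
             [0, 0, 1, -2, 2, 0],
             [0, 0, 0, 0, 0, 1],
             [0, 0, 0, 0, 0, 0]]]

R = rref(A)
assert R == expected     # matches the RREF computed by hand above
print(R)
```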

Definition. Let A ∈ Mm×n (F) and b ∈ Fm .


(1) The system Ax = b is consistent if it has at least one solution. Otherwise it is inconsistent.
(2) Ax = b is homogeneous if b = 0; otherwise it is nonhomogeneous.
Given a system Ax = b, I’ll typically let S denote the set of solutions of Ax = b. Thus S ⊆ Fn . I’ll often
write SH for S when b = 0, i.e., when discussing Ax = 0.
Note Ax = b is consistent iff S ̸= ∅. Also, Ax = 0 is always consistent (0 ∈ Fn is always a solution).
Facts.
(1) SH = Null(A) = Ker(TA ), a subspace of Fn .
(2) dim(SH ) = nullity(A) = n − rank(A).
(3) SH has only the trivial solution ⇐⇒ nullity(A) = 0 ⇐⇒ rank(A) = n.
Corollary. If m < n, then Ax = 0 must have a nontrivial solution.
Proof. Col(A) is a subspace of Fm , so rank(A) = dim Col(A) ≤ m. If m < n then rank(A) < n so
dim(SH ) > 0. □
Theorem 3. Let A ∈ Mm×n (F) and b ∈ Fm . The following are equivalent:
(1) Ax = b is consistent.
(2) b ∈ Col(A).
(3) rank(A | b) = rank(A).
Proof. Write A = [a1 · · · an ]. A solution to Ax = b is a vector x = (x1 , . . . , xn ) ∈ Fn such that
x1 a1 + · · · + xn an = b.
Such (x1 , . . . , xn ) exists iff b ∈ span{a1 , . . . , an } = Col(A). This proves (1) ⇔ (2). Since
rank(A) = dim span{a1 , . . . , an } and rank(A | b) = dim span{a1 , . . . , an , b},
the two ranks are equal iff b ∈ span{a1 , . . . , an }. □
Theorem 4. Let A ∈ Mm×n(F) and b ∈ Fm. Let S be the solution set to Ax = b and let SH be the solution
set to Ax = 0. Assume that Ax = b is consistent (i.e., S ̸= ∅) and let xp ∈ Fn be one particular solution
(i.e., xp ∈ S). Then
S = xp + SH .
(Proof on Friday)

MATH 146 March 17 Section 2

Theorem 4. Let A ∈ Mm×n(F) and b ∈ Fm. Let S be the solution set to Ax = b and let SH be the solution
set to Ax = 0. Assume that Ax = b is consistent (i.e., S ̸= ∅) and let xp ∈ Fn be one particular solution
(i.e., xp ∈ S). Then
S = xp + SH .
Proof. (⊆) Suppose x ∈ S, so Ax = b. Then A(x − xp ) = Ax − Axp = b − b = 0 so x − xp ∈ SH . Then
x = xp + (x − xp ) ∈ xp + SH .
(⊇) Suppose x ∈ xp + SH . Then x = xp + v for some v ∈ SH . Then Ax = A(xp + v) = Axp + Av = b + 0 = b
so x ∈ S. □

Computational problem: given A ∈ Mm×n (F) and b ∈ Fm , “describe” the solution set S to Ax = b by:
(P1) Determining whether Ax = b is consistent (i.e., whether S ̸= ∅).
(P2) If S ̸= ∅, finding
(a) one solution xp to Ax = b, and (b) a basis {v1 , . . . , vk } for SH (= Null(A)),
so that S = xp + span{v1 , . . . , vk }.
Fact. Let R ∈ Mm×n (F) and b ∈ Fm . If (R | b) is in RREF, then we can easily “read off” a solution to
(P1) and (P2) for Rx = b.
(P1) Rx = b is consistent ⇐⇒ b ∈ Col(R) ⇐⇒ (R | b) does not have a pivot in the last column (b).
   
$$
(R\mid b)=\left[\begin{array}{cccc|c} 1&2&0&-1&0\\ 0&0&1&4&0\\ 0&0&0&0&1 \end{array}\right]\ \text{inconsistent}
\qquad
(R\mid b)=\left[\begin{array}{cccc|c} 1&2&0&-1&2\\ 0&0&1&4&3\\ 0&0&0&0&0 \end{array}\right]\ \text{consistent}
$$
This is because, generally, a pivot column in an RREF matrix is never in the span of the columns
to its left.
(P2) If Rx = b is consistent:
(a) Write the system of equations corresponding to Rx = b. In the 2nd example above, this would
correspond to
x1 + 2x2 − x4 = 2
x3 + 4x4 = 3
0 = 0
(b) The columns of R correspond to the variables x1 , . . . , xn in the equations. The variables
corresponding to columns with a pivot are called pivot variables. The remaining variables
are called free variables.
If r = rank(R), then R has r pivots. So R has n − r free variables. Choose new parameter
names for them s1 , . . . , sk (k = n − r) (for convenience).
In the above example, n = 4 and r = 2, so k = 2. The free variables are x2 , x4 and we
rename them x2 = s1 and x4 = s2 .
(c) Replace the free variables with their parameter names and move them to the RHS:
x1 = 2 − 2s1 + s2
x3 = 3 − 4s2
(d) (Optional) Insert the equations equating the free variables with the parameters.
x1 = 2 − 2s1 + 1s2
x2 = 0 + 1s1 + 0s2
x3 = 3 + 0s1 − 4s2
x4 = 0 + 0s1 + 1s2
(e) Translate to a vector equation:
       
$$
\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}
=\begin{bmatrix}2\\0\\3\\0\end{bmatrix}
+s_1\begin{bmatrix}-2\\1\\0\\0\end{bmatrix}
+s_2\begin{bmatrix}1\\0\\-4\\1\end{bmatrix}
$$
Here s1 , s2 can freely vary over F.
(f) Thus the solution set S to Rx = b is
$$
S=\underbrace{\begin{bmatrix}2\\0\\3\\0\end{bmatrix}}_{x_p}
+\operatorname{span}\underbrace{\left\{\begin{bmatrix}-2\\1\\0\\0\end{bmatrix},\ \begin{bmatrix}1\\0\\-4\\1\end{bmatrix}\right\}}_{\text{basis for }S_H}
$$

Theorem 5. In general, if (R | b) ∈ Mm×(n+1) (F) is in RREF, S is the solution set to Rx = b, and S ̸= ∅, then steps (a)–(f) produce
S = xp + span{v1 , . . . , vk }
where k = n−rank(R), xp is a particular solution to Rx = b, and {v1 , . . . , vk } is a basis for SH (= Null(R)).
Finally, we consider general systems Ax = b.
Definition. We say that two systems of linear equations are equivalent if they have exactly the same
solution sets.
Theorem 6. Suppose A ∈ Mm×n (F) and b ∈ Fm . If (A | b) ⇝ (Ã | b̃) by row operations, then Ax = b and Ãx = b̃ are equivalent.
Proof. Assume (A | b) ⇝ (Ã | b̃). Then there exists an invertible P ∈ Mm (F) with P (A | b) = (Ã | b̃). So P A = Ã and P b = b̃.
Let x ∈ Fn be arbitrary. If Ax = b, then P Ax = P b, i.e., Ãx = b̃. Conversely, if Ãx = b̃, i.e., P Ax = P b, then P −1 P Ax = P −1 P b, i.e., Ax = b. So Ax = b ⇐⇒ Ãx = b̃ as desired. □
Putting this all together, we have an obvious
Algorithm to solve any system Ax = b
(1) Form the augmented matrix: (A | b).
(2) Transform (A | b) ⇝ (R | b̃) in RREF (by row operations). Ax = b and Rx = b̃ have the same solution set S.
(3) Check Rx = b̃ for consistency.
(4) If consistent, do steps (a)–(f) above to get xp , v1 , . . . , vn−r ∈ Fn so that S = xp + span(v1 , . . . , vn−r ).
Summary. If Ax = b is consistent and (A | b) ⇝ (R | b̃) with (R | b̃) in RREF, then
• # of pivots = rank(A) = # of nonzero rows (of R).
• dim(SH ) = # of free variables = n − # of pivots = n − rank(A).
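(Aside, not from the lectures.) The whole pipeline — row-reduce, check consistency, read off xp and a basis for SH — can be mimicked with sympy. This is only a sketch: sympy's rref and nullspace do the work that steps (a)–(f) describe, and the matrices below are the consistent example from (P1)/(P2) above.

from sympy import Matrix

A = Matrix([[1, 2, 0, -1],
            [0, 0, 1, 4],
            [0, 0, 0, 0]])
b = Matrix([2, 3, 0])

R, pivots = A.row_join(b).rref()       # step (2): row-reduce the augmented matrix
n = A.cols
consistent = n not in pivots           # step (3): no pivot in the last column
print(consistent)                      # True

if consistent:
    xp = Matrix.zeros(n, 1)            # particular solution: free variables set to 0
    for row, col in enumerate(pivots):
        xp[col] = R[row, n]
    basis = A.nullspace()              # basis for S_H = Null(A)
    print(xp.T)                        # (2, 0, 3, 0)
    print([v.T for v in basis])        # (-2, 1, 0, 0) and (1, 0, -4, 1)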
Theorem 7. Let A ∈ Mm×n (F). TFAE:
(1) For all b ∈ Fm , Ax = b has a unique solution.
(2) m = n and A is invertible.
Proof. (1) ⇒ (2): In particular, Ax = 0 has a unique solution, i.e., dim(SH ) = 0, so rank(A) = n, so n ≤ m
(as rank(A) ≤ m). On the other hand, Ax = b is consistent for every b ∈ Fm , which implies Fm ⊆ Col(A),
so Col(A) = Fm , so rank(A) = m. Hence m = n.
So A ∈ Mn (F) is square and rank(A) = n. It follows that A is invertible (Theorem 1, March 13).
(2) ⇒ (1): If m = n and A is invertible, then A−1 exists. So for any b ∈ Fn , Ax = b ⇐⇒ x = A−1 b. □

MATH 146 March 20 Section 2

One more fact about row-reduction to RREF: it preserves “linear dependencies among the columns.”
Consider the example from March 15 lecture notes:
   
$$
A=\begin{bmatrix} 2 & 4 & 1 & 0 & -4 & 2\\ 0 & 0 & 2 & -4 & 4 & 4\\ 1 & 2 & 2 & -3 & 1 & 4\\ 3 & 6 & -2 & 7 & -13 & 4\end{bmatrix}
\ \overset{\text{row}}{\rightsquigarrow}\
\begin{bmatrix} 1 & 2 & 0 & 1 & -3 & 0\\ 0 & 0 & 1 & -2 & 2 & 0\\ 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & 0\end{bmatrix}
= R \ \text{in RREF.}
$$
Write A = [a1 a2 · · · a6 ] and R = [r1 r2 · · · r6 ]. Observe that:
• r1 ̸= 0. Also a1 ̸= 0.
• r2 = 2r1 . Also a2 = 2a1 .
• r3 ̸∈ span{r1 , r2 }. Also a3 ̸∈ span{a1 , a2 }.
• r4 = r1 − 2r3 . Claim: a4 = a1 − 2a3 .
• By the apparent pattern, we should get a5 = −3a1 + 2a3 and a6 ̸∈ span{a1 , . . . , a5 }, because
r5 = −3r1 + 2r3 and r6 ̸∈ span{r1 , . . . , r5 }.
The formal statement of the pattern observed above is the following theorem.
Theorem 3.16. Suppose A ∈ Mm×n (F) and A ⇝ R (by row operations) with R in RREF. Write A = [a1 a2 · · · an ] and R = [r1 r2 · · · rn ]. Then for all t = 1, . . . , n:
(1) rt = 0 iff at = 0.
(2) For all i1 , . . . , ik ̸= t and α1 , . . . , αk ∈ F,
rt = α1 ri1 + · · · + αk rik ⇐⇒ at = α1 ai1 + · · · + αk aik .
Proof sketch. (Not given in class) Because A ⇝ R by row operations, there exists invertible P ∈ Mm (F) with P A = R and P −1 R = A. Hence P ai = ri and P −1 ri = ai for i = 1, . . . , n.
(1) Suppose at = 0. Multiply both sides on the left by P to get rt = 0. Conversely, if rt = 0, multiply
both sides on the left by P −1 to get at = 0.
(2) Suppose at = α1 ai1 + · · · + αk aik . Multiply both sides on the left by P (and use linearity of matrix-
vector multiplication) to get rt = α1 ri1 + · · · + αk rik . For the opposite implication, multiply both sides on
the left by P −1 . □
Returning to the example at the top of this page: obviously {r1 , . . . , r6 } spans Col(R). Consider what
will happen if we apply the Naive Algorithm to shrink {r1 , . . . , r6 } to a basis of Col(R). The above bullet
points (starting with r1 ̸= 0) tell us that the output of the algorithm will be {r1 , r3 , r6 }, i.e., the columns
of R containing pivots.
Now consider what will happen if we apply the Naive Algorithm to shrink {a1 , . . . , a6 } to a basis of
Col(A). Because the columns of A satisfy exactly the same linear dependencies as do the columns of R,
the output of the algorithm will be {a1 , a3 , a6 }. Hence {a1 , a3 , a6 } is a basis for Col(A).
These remarks hold generally.
Corollary 1. Suppose A ∈ Mm×n (F) and A ⇝ R (by row operations) with R in RREF. Write A = [a1 a2 · · · an ]. Suppose R has
pivots in columns i1 , i2 , . . . , ir . Then the Naive Algorithm to shrink {a1 , . . . , an } to a basis for Col(A) will
return {ai1 , . . . , air }. Hence the columns of A corresponding to the pivot columns of R form a basis for
Col(A).
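(Aside, not from the lectures.) Corollary 1 is easy to check by machine; here is a small sympy sketch on the matrix A at the top of this page. sympy's rref reports the pivot columns, and its columnspace method implements exactly this corollary.

from sympy import Matrix

A = Matrix([[2, 4, 1, 0, -4, 2],
            [0, 0, 2, -4, 4, 4],
            [1, 2, 2, -3, 1, 4],
            [3, 6, -2, 7, -13, 4]])

R, pivots = A.rref()
print(pivots)                        # (0, 2, 5): columns a1, a3, a6 (0-indexed here)
print([A.col(j).T for j in pivots])  # these columns form a basis for Col(A)
print(A.columnspace())               # should agree with the basis above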
Lastly, we can use Theorem 3.16 to prove the uniqueness of RREFs.
Corollary 2. For all A ∈ Mm×n (F) there exists only one RREF matrix R with A ⇝ R (by row operations).
Proof sketch. (Not given in class) Suppose A ⇝ R with R in RREF. Write R = [r1 · · · rn ]. We will show
that the columns of R are determined by A. If A = 0 then R = 0 and R is completely determined. If
A ̸= 0, let i1 be the column number of the first nonzero column of A. Then the first i1 − 1 columns of
R equal 0 (by Theorem 3.16(1)) and column i1 of R equals e1 (by Theorem 3.16(1) and the definition of
RREF). So the first i1 columns of R are determined. Inductively suppose that i1 < t ≤ n and we have
proved that the first t − 1 columns of R are determined. Let the pivot columns of R in the first t − 1
columns be in column numbers i1 , . . . , ik (so the corresponding columns of R are e1 , . . . , ek ). Consider rt .
Case 1: at ∈ span{ai1 , . . . , aik }. Say
(∗) at = α 1 ai 1 + · · · + α k ai k .
Then rt = α1 e1 + · · · + αk ek by Theorem 3.16(2) so rt is determined by (∗), which is determined by A.
Case 2: at ̸∈ span{ai1 , . . . , aik }.
Then rt ̸∈ span{e1 , . . . , ek } by Theorem 3.16(2), so rt is the next pivot column (by definition of RREF),
i.e., rt = ek+1 , which was determined by A. □

Chapter 4 (starting at §4.5) – Determinants


For every F and n ≥ 2 there exists a function det : Mn (F) → F having many remarkable and useful
properties, including:
• det(AB) = det(A) det(B).
• det At = det A.
• det(A) ̸= 0 iff the columns of A are linearly independent (i.e., A is invertible).
• (When F = R and n = 2): If A = [v1 v2 ] ∈ M2 (R) and the columns are linearly independent, then
det(A) is ± the area of the parallelogram spanned by v1 , v2 .
You probably know of the 2 × 2 determinant function over R:
 
$$\det\begin{bmatrix} a & b\\ c & d\end{bmatrix} = ad - bc.$$
You might know the formula for the 3 × 3 determinant function over R:
 
$$\det\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix}
= (a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}) - (a_{11}a_{23}a_{32} + a_{13}a_{22}a_{31} + a_{12}a_{21}a_{33}).$$
There is a formula for the general n × n determinant function, known as the Leibniz formula: if
$$A=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix}$$
then
$$\det A = \sum_{\sigma\in S_n} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)}$$

where Sn is the set of all permutations on {1, . . . , n} and
$$\operatorname{sgn}(\sigma)=\begin{cases} 1 & \text{if } \sigma \text{ is even}\\ -1 & \text{if } \sigma \text{ is odd.}\end{cases}$$
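(Aside, not from the lectures.) The Leibniz formula is easy to transcribe into Python, summing over all n! permutations; sgn is computed here by counting inversions. This is purely illustrative — the formula is hopeless computationally for large n.

from itertools import permutations
from math import prod

def sgn(sigma):
    # a permutation is even iff it has an even number of inversions
    inv = sum(1 for i in range(len(sigma)) for j in range(i + 1, len(sigma)) if sigma[i] > sigma[j])
    return 1 if inv % 2 == 0 else -1

def det_leibniz(A):
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))

print(det_leibniz([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3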
A completely different “formula” for det A is the following. If
 
$$A=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ | & | & \cdots & |\\ v_1 & v_2 & \cdots & v_n\\ | & | & \cdots & | \end{bmatrix} \quad\text{with } v_1,\dots,v_n\in F^{n-1},$$
then
det A = a11 det [v2 v3 · · · vn ] − a12 det [v1 v3 · · · vn ] + · · · + (−1)n−1 a1n det [v1 v2 · · · vn−1 ].
This is a recursive definition of det on Mn (F) in terms of det on Mn−1 (F), called cofactor expansion along the first row, or Laplace expansion.
Proving that the two formulas above actually define the same function, and that this function has the
various properties we care about, is tedious and a lot of work. Instead, this week we will do the following:
(1) Identify some simple postulates or axioms for determinant functions.
(2) Deduce from these postulates all the other properties we will need to know (including techniques
to calculate determinants, and a proof that det A is given by the Leibniz formula).
The only issue is that we won't have proved that the postulates are consistent. We'll fix this at the end
by showing that the cofactor definition satisfies the postulates.
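(Aside, not from the lectures.) The recursive cofactor formula transcribes into Python just as directly as the Leibniz formula; this sketch agrees with the Leibniz version above on any square matrix, though both take exponential time.

def det_laplace(A):
    # cofactor expansion along the first row (the recursive formula above)
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in A[1:]]   # delete the first row and column j+1
        total += (-1) ** j * A[0][j] * det_laplace(minor)
    return total

print(det_laplace([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3, same as the Leibniz formula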

MATH 146 March 22 Section 2

To facilitate the definition, we will interpret the determinant functions both as
$$\det : M_n(F) \to F \qquad\text{and as}\qquad D : \underbrace{F^n \times \cdots \times F^n}_{n} \to F$$
where D(v1 , . . . , vn ) = det [v1 · · · vn ].


Definition. Fix F and n ≥ 1. The n × n determinant function for F is a function det : Mn (F) → F,
also written D : Fn × · · · × Fn → F, satisfying
(P1) Linearity in each column: for all k = 1, . . . , n,
(a) D(v1 , . . . , αvk , . . . , vn ) = αD(v1 , . . . , vk , . . . , vn )
(b) D(v1 , . . . , x + y, . . . , vn ) = D(v1 , . . . , x, . . . , vn ) + D(v1 , . . . , y, . . . , vn )
(P2) Alternation: If A ∈ Mn (F) has two equal adjacent columns, then det A = 0.
(P3) Normalization: det In = 1. (Equivalently, D(e1 , e2 , . . . , en ) = 1.)
 
Exercise. Show that the function $\det\begin{bmatrix} a & b\\ c & d\end{bmatrix} = ad - bc$ satisfies (P1)–(P3).
Lemma 1 (Type 1 Column Op). If A ∈ Mn (F) and A −→ B via the column swap Ci ⇆ Cj , then det B = − det A.
Proof sketch. Case 1: j = i + 1. Write A = [v1 · · · vi vi+1 · · · vn ] so B = [v1 · · · vi+1 vi · · · vn ]. Let
x = vi and y = vi+1 . Observe that
0 = D(v1 , . . . , x + y, x + y, . . . , vn ) (P2)
= D(v1 , . . . , x, x + y, . . . , vn ) + D(v1 , . . . , y, x + y, . . . , vn ) (P1b) in column i
= D(v1 , . . . , x, x, . . . , vn ) + D(v1 , . . . , x, y, . . . , vn )
+ D(v1 , . . . , y, x, . . . , vn ) + D(v1 , . . . , y, y, . . . , vn ) (P1b) in column i + 1
= D(v1 , . . . , vi , vi+1 , . . . , vn ) + D(v1 , . . . , vi+1 , vi , . . . , vn ) (P2)
= det A + det B.
Case 2: j > i + 1. We can simulate swapping columns i and j by a sequence of 2(j − i) − 1 swaps of
adjacent columns. For example, if i = 1 and j = n, then in permutation notation,
(1 n) = (1 2)(2 3) · · · (n−2 n−1)(n−1 n)(n−2 n−1) · · · (2 3)(1 2).
By Case 1, each adjacent column swap changes the determinant by a factor of −1. So 2(j − i) − 1 such
swaps changes the determinant by a factor of (−1)2(j−i)−1 = −1. □
Lemma 2 (P2+ ). If A ∈ Mn (F) has two equal columns, then det A = 0.
Proof. Suppose A = [v1 · · · vi · · · vj · · · vn ] and vi = vj . If j = i + 1 we get det A = 0 by (P2). If j > i + 1, then we let A −→ B via Ci+1 ⇆ Cj and we get det A = − det B = 0 by Lemma 1 and (P2). □
Lemma 3 (Type 2 Column Op). If A ∈ Mn (F) and A −→ B via Ci ← αCi (α ̸= 0), then det B = α det A.
Proof. This is (P1a). □
Lemma 4. If A ∈ Mn (F) has a zero column, then det A = 0.
Proof. Write A = [v1 · · · vi · · · vn ] where vi = 0. Then vi = 0vi (this 0 is the scalar), so
det A = D(v1 , . . . , vi , . . . , vn ) = D(v1 , . . . , 0vi , . . . , vn ) = 0 · D(v1 , . . . , vi , . . . , vn ) = 0 · det A = 0. □
Lemma 5. If A ∈ Mn (F) and the columns of A are linearly dependent, then det A = 0.
Proof. If n = 1 then A = [0], so A has a zero column, so det A = 0 by Lemma 4. If n > 1, then some
column of A can be written as a linear combination of the other columns. Write A = [v1 v2 · · · vn ] and
suppose for some t we have i1 , . . . , ik ̸= t and α1 , . . . , αk ∈ F with vt = α1 vi1 + · · · + αk vik . Then
$$\det A = D(v_1,\dots,v_t,\dots,v_n) = D\Big(v_1,\dots,\sum_{j=1}^{k}\alpha_j v_{i_j},\dots,v_n\Big)$$
$$= \sum_{j=1}^{k}\alpha_j\, D(v_1,\dots,v_{i_j},\dots,v_n) \qquad\text{(linearity in column } t)$$
$$= 0 \qquad\text{(P2}^{+})$$
as each [v1 · · · vij · · · vn ] has two equal columns (vij occurs in columns ij and t). □
Lemma 6 (Type 3 Column Op). If A ∈ Mn (F) and A −→ B via Ci ← Ci + αCj , then det B = det A.
Proof. Write A = [v1 · · · vi · · · vj · · · vn ]. Then
det B = D(v1 , . . . , vi + αvj , . . . , vj , . . . , vn )
= D(v1 , . . . , vi , . . . , vj , . . . , vn ) + αD(v1 , . . . , vj , . . . , vj , . . . , vn ) (P1)
= det A + 0 (P2+ ) □
Corollary 1. For every elementary column operation O there exists a nonzero α ∈ F, determined by O, such that if A ∈ Mn (F) and A −→ B via O, then det B = α(det A).
Proof. α = −1 works for type 1 operations by Lemma 1. The scalar α in the type 2 operation Ci ← αCi
works by Lemma 3. α = 1 works for type 3 operations by Lemma 6. □
Theorem 1. If A ∈ Mn (F) then A is invertible iff det A ̸= 0.
Proof. (⇒) Assume A is invertible. Then so is At , so At ⇝ In by row operations, so A ⇝ In by column operations. By Corollary 1 there exist
α1 , . . . , αℓ ∈ F \ {0} corresponding to the column operations used to transform A to In , such that
det In = αℓ · · · α2 α1 (det A).
Since det In = 1 (P3), we must have det A ̸= 0.
(⇐) We prove the contrapositive. If A is not invertible, then rankA < n, so the columns of A are linearly
dependent. So det A = 0 by Lemma 5. □

MATH 146 March 24 Section 2

Corollary 2. Let E ∈ Mn (F) be elementary, let O be the elementary column operation corresponding to
E, and let α be the nonzero scalar corresponding to O by Corollary 1.
(1) α = det E.
(2) For all A ∈ Mn (F), det(AE) = (det A)(det E).
Proof. (1) In −→ E via O, so det E = α · det In by Corollary 1. Apply (P3).
(2) A −→ AE via O, so det(AE) = α · det A by Corollary 1. Apply (1). □
Lemma 7. For all A ∈ Mn (F) and all elementary matrices E1 , . . . , Eℓ ,
(1) det(AE1 E2 · · · Eℓ ) = (det A)(det E1 )(det E2 ) · · · (det Eℓ )
(2) and det(E1 E2 · · · Eℓ ) = (det E1 )(det E2 ) · · · (det Eℓ ).
Proof. (Not given in lecture.) (2) follows from (1) by setting A = In . We prove (1) by induction on ℓ. If ℓ = 1 then the claim is just Corollary 2. Inductively,
det(AE1 · · · Eℓ Eℓ+1 ) = det((AE1 · · · Eℓ )Eℓ+1 )
= det(AE1 · · · Eℓ ) · (det Eℓ+1 )    Corollary 2
= (det A)(det E1 ) · · · (det Eℓ ) · (det Eℓ+1 ).    Induction □
Theorem 2. det(AB) = (det A)(det B) for all A, B ∈ Mn (F).
Proof. If rank B < n, then rank(AB) ≤ rank B < n and det B = det(AB) = 0 by Theorem 1 and the result
holds. If rank B = n, then B is a product of elementary matrices, say B = E1 · · · Eℓ . Apply Lemma 7. □
Theorem 3. det At = det A for all A ∈ Mn (F).
Proof. First we prove det E t = det E whenever E is elementary. Let O be the elementary column operation
corresponding to E. If O is of type 1 (Ci ⇆ Cj ) or of type 2 (Ci ← αCi ) then E t = E, as can be seen by
inspection, so obviously det E t = det E. If instead O is of type 3 (Ci ← Ci +βCj ) then E t is the elementary
matrix of the type-3 operation O′ : Cj ← Cj + βCi (as can be seen by inspection). The corresponding scalar α associated to both O and O′ by Corollary 1 is α = 1, so det E = det E t = 1 in this case (by
Corollary 2).
Now we prove the theorem in the general case. If rank A < n then rank At < n and det A = 0 = det At
by Theorem 1. If rank A = n, then A = E1 · · · Ek for some elementary matrices Ei , so
At = (E1 · · · Ek )t = (Ek )t · · · (E1 )t
so
det At = (det(Ek )t ) · · · (det(E1 )t ) Lemma 7(2)
= (det Ek ) · · · (det E1 ) Comments in first paragraph
= det A.    Lemma 7(2) □
Theorem 3 implies that to every “column fact” from Wednesday there is a corresponding “row fact”; for
example, there is a “row version” of Corollary 1.
Theorem 4 (Leibniz formula). Let A ∈ Mn (F), say
$$A=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix}.$$
Then
$$\det A = \sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\, a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)}.$$
Proof sketch. Let the rows of A be v1 , . . . , vn , and note that each $v_i=\sum_{j=1}^{n}a_{ij}e_j$. We will pass to At , whose columns are v1 , . . . , vn .
$$\det A = \det A^t = D\Big(\sum_{j_1=1}^{n}a_{1j_1}e_{j_1},\ \sum_{j_2=1}^{n}a_{2j_2}e_{j_2},\ \dots,\ \sum_{j_n=1}^{n}a_{nj_n}e_{j_n}\Big)
= \sum_{j_1=1}^{n}\sum_{j_2=1}^{n}\cdots\sum_{j_n=1}^{n} a_{1j_1}a_{2j_2}\cdots a_{nj_n}\, D(e_{j_1},e_{j_2},\dots,e_{j_n}).$$
Note that each D(ej1 , ej2 , . . . , ejn ) = 0 unless all of j1 , . . . , jn are distinct, in which case the map i 7→ ji is
a permutation σ. Thus
$$= \sum_{\sigma\in S_n} a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)}\, D(e_{\sigma(1)},e_{\sigma(2)},\dots,e_{\sigma(n)})
= \sum_{\sigma\in S_n} a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)}\, \det(P_\sigma)
= \sum_{\sigma\in S_n} a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)}\, \operatorname{sgn}(\sigma). \qquad\square$$
(Here Pσ = [eσ(1) · · · eσ(n) ] is the permutation matrix of σ, and det Pσ = sgn(σ) since Pσ is obtained from In by a sequence of column swaps whose parity is that of σ.)

Exercise. Work out this formula in the n = 3 case.


Remark. We started by assuming that det is a function Mn (F) → F satisfying (P1)–(P3), and we have
deduced that det is defined by the Leibniz formula. This proves that (for given n and F) there is at most one
function satisfying (P1)–(P3). Put another way, the only candidate for a function satisfying (P1)–(P3) is
the function defined by the Leibniz formula.
We could finish the theoretical exposition by now verifying that the Leibniz formula satisfies (P1)–(P3).
While this is possible, it is tedious and (in my opinion) unenlightening. Instead, here is a proof that the
functions defined by Laplace expansion (cofactor expansion along the first row) satisfy (P1)–(P3).
(What follows was not given in the lecture.)
Definition. For fixed F and n ≥ 1, define Ln : Mn (F) → F recursively by
• L1 ([a]) = a.
• If n > 1, then given
$$A=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ | & | & \cdots & |\\ v_1 & v_2 & \cdots & v_n\\ | & | & \cdots & | \end{bmatrix} \quad\text{with } v_1,\dots,v_n\in F^{n-1},$$
define
Ln (A) = a11 Ln−1 [v2 v3 · · · vn ] − a12 Ln−1 [v1 v3 · · · vn ] + · · · + (−1)n−1 a1n Ln−1 [v1 v2 · · · vn−1 ].
Theorem. For all F and all n ≥ 1, Ln satisfies (P1)–(P3). Hence the det function exists for all F and n,
det(A) is given by the Leibniz formula, and det(A) = Ln (A) for all A ∈ Mn (F).
Proof. We only need to prove the first sentence. The proof is by induction on n. If n = 1, (P1) and (P3)
are obvious and (P2) holds vacuously (since there are no pairs of adjacent columns).
Now assume that n > 1 and that we know Ln−1 satisfies (P1)–(P3).
(P1)(a). Let
$$A=\begin{bmatrix} a_{11} & \cdots & \alpha a_{1j} & \cdots & a_{1n}\\ v_1 & \cdots & \alpha v_j & \cdots & v_n\end{bmatrix},\qquad
B=\begin{bmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n}\\ v_1 & \cdots & v_j & \cdots & v_n\end{bmatrix}.$$
Then
$$L_n(A)=\sum_{i=1}^{j-1}(-1)^{i-1}a_{1i}\,L_{n-1}[\,v_1\cdots v_{i-1}\,v_{i+1}\cdots \underset{j}{\alpha v_j}\cdots v_n\,]$$
$$\qquad+(-1)^{j-1}\alpha a_{1j}\,L_{n-1}[\,v_1\cdots v_{j-1}\,v_{j+1}\cdots v_n\,]$$
$$\qquad+\sum_{i=j+1}^{n}(-1)^{i-1}a_{1i}\,L_{n-1}[\,v_1\cdots \underset{j}{\alpha v_j}\cdots v_{i-1}\,v_{i+1}\cdots v_n\,]$$
(in each bracket the column vi has been deleted, and the column sitting in position j is αvj ).

In the first and third lines we can inductively use the fact that Ln−1 satisfies (P1)(a) to pull out a factor
of α from each summand; since the middle line also has a factor of α, we can see that Ln (A) = αLn (B) as
required.
(P1)(b). Let
$$A=\begin{bmatrix} a_{11} & \cdots & b+c & \cdots & a_{1n}\\ v_1 & \cdots & x+y & \cdots & v_n\end{bmatrix},\quad
B=\begin{bmatrix} a_{11} & \cdots & b & \cdots & a_{1n}\\ v_1 & \cdots & x & \cdots & v_n\end{bmatrix},\quad
C=\begin{bmatrix} a_{11} & \cdots & c & \cdots & a_{1n}\\ v_1 & \cdots & y & \cdots & v_n\end{bmatrix}$$
where the columns $\begin{bmatrix}b+c\\x+y\end{bmatrix}$, $\begin{bmatrix}b\\x\end{bmatrix}$, $\begin{bmatrix}c\\y\end{bmatrix}$ sit in column j of A, B, C respectively. Then
$$L_n(A)=\sum_{i=1}^{j-1}(-1)^{i-1}a_{1i}\,L_{n-1}[\,v_1\cdots v_{i-1}\,v_{i+1}\cdots \underset{j}{x+y}\cdots v_n\,]$$
$$\qquad+(-1)^{j-1}(b+c)\,L_{n-1}[\,v_1\cdots v_{j-1}\,v_{j+1}\cdots v_n\,]$$
$$\qquad+\sum_{i=j+1}^{n}(-1)^{i-1}a_{1i}\,L_{n-1}[\,v_1\cdots \underset{j}{x+y}\cdots v_{i-1}\,v_{i+1}\cdots v_n\,]$$

and similar calculations give

$$L_n(B)=\sum_{i=1}^{j-1}(-1)^{i-1}a_{1i}\,L_{n-1}[\,v_1\cdots v_{i-1}\,v_{i+1}\cdots \underset{j}{x}\cdots v_n\,]$$
$$\qquad+(-1)^{j-1}b\,L_{n-1}[\,v_1\cdots v_{j-1}\,v_{j+1}\cdots v_n\,]$$
$$\qquad+\sum_{i=j+1}^{n}(-1)^{i-1}a_{1i}\,L_{n-1}[\,v_1\cdots \underset{j}{x}\cdots v_{i-1}\,v_{i+1}\cdots v_n\,]$$

and
$$L_n(C)=\sum_{i=1}^{j-1}(-1)^{i-1}a_{1i}\,L_{n-1}[\,v_1\cdots v_{i-1}\,v_{i+1}\cdots \underset{j}{y}\cdots v_n\,]$$
$$\qquad+(-1)^{j-1}c\,L_{n-1}[\,v_1\cdots v_{j-1}\,v_{j+1}\cdots v_n\,]$$
$$\qquad+\sum_{i=j+1}^{n}(-1)^{i-1}a_{1i}\,L_{n-1}[\,v_1\cdots \underset{j}{y}\cdots v_{i-1}\,v_{i+1}\cdots v_n\,]$$
By applying (P1)(b) to Ln−1 in the first and third lines of the formula for Ln (A), and expanding the
product in the second line, we can see that Ln (A) = Ln (B) + Ln (C) as required.
 
(P2). Suppose $A=\begin{bmatrix} a_{11} & \cdots & b & b & \cdots & a_{1n}\\ v_1 & \cdots & x & x & \cdots & v_n\end{bmatrix}$ has two consecutive equal columns, in columns j and j + 1 as shown. In the Laplace expansion of Ln (A) we will have:
• n − 2 summands of the form (−1)^{i−1} a1i Ln−1 [· · · x x · · · ]. Note that all of these equal 0 because Ln−1 satisfies (P2).
• (−1)^{j−1} b Ln−1 [v1 · · · vj−1 x vj+1 · · · vn ]   (the first x skipped)
• (−1)^{j} b Ln−1 [v1 · · · vj−1 x vj+1 · · · vn ]   (the second x skipped)
The last two summands are identical except that they have opposite sign, so they cancel and Ln (A) = 0
as required.
(P3). This is really easy and is left as an exercise. □

MATH 146 March 27 Section 2

Recall the formula for Laplace expansion of a determinant: If
$$A=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ | & | & \cdots & |\\ v_1 & v_2 & \cdots & v_n\\ | & | & \cdots & | \end{bmatrix} \quad\text{with } v_1,\dots,v_n\in F^{n-1},$$
then
det A = a11 det [v2 v3 · · · vn ] − a12 det [v1 v3 · · · vn ] + · · · + (−1)n−1 a1n det [v1 v2 · · · vn−1 ].
The matrices on the RHS are obtained from A by deleting the first row and column 1, . . . , n respectively.
Definition. Suppose A ∈ Mn (F) and 1 ≤ i, j ≤ n. Aij denotes the submatrix of A obtained by deleting
row i and column j.
Example 1. Let
$$A=\begin{bmatrix} 2 & 4 & 0 & 9\\ -1 & -3 & 0 & 8\\ 0 & 2 & 0 & 0\\ 4 & 5 & 7 & 6\end{bmatrix}.\quad\text{Then}\quad
A_{12}=\begin{bmatrix} -1 & 0 & 8\\ 0 & 0 & 0\\ 4 & 7 & 6\end{bmatrix}\quad\text{and}\quad
A_{32}=\begin{bmatrix} 2 & 0 & 9\\ -1 & 0 & 8\\ 4 & 7 & 6\end{bmatrix}.$$
In general, suppose A ∈ Mn (F), say
$$A=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix}
=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ | & | & \cdots & |\\ v_1 & v_2 & \cdots & v_n\\ | & | & \cdots & | \end{bmatrix}.$$
Using this notation, the formula for Laplace expansion is
det A = a11 det A11 − a12 det A12 + · · · + (−1)^{n−1} a1n det A1n .
It can be shown that det A is given by cofactor expansion along any row or column, using the following
pattern for signs:
$$\begin{bmatrix} + & - & + & \cdots\\ - & + & - & \cdots\\ + & - & + & \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}.$$
(I will prove this claim after Example 2.) For example, cofactor expansion along column 2 means “going
down” the entries of the second column of A, i.e., a12 , a22 , . . . , an2 , starting with − and alternating signs,
and multiplying each entry by the submatrices A12 , A22 , . . . , An2 obtained from A by deleting column 2
and row 1, 2, . . . , n respectively:

det A = −a12 det A12 + a22 det A22 − · · · + (−1)^{n+2} an2 det An2 .
Example 2. Using A from Example 1, note that row 3 has only one nonzero entry. Using cofactor
expansion along row 3, we get
       
$$\det A = +0\cdot\det\begin{bmatrix}4&0&9\\-3&0&8\\5&7&6\end{bmatrix} - 2\cdot\det\begin{bmatrix}2&0&9\\-1&0&8\\4&7&6\end{bmatrix} + 0\cdot\det\begin{bmatrix}2&4&9\\-1&-3&8\\4&5&6\end{bmatrix} - 0\cdot\det\begin{bmatrix}2&4&0\\-1&-3&0\\4&5&7\end{bmatrix}
= -2\cdot\det\begin{bmatrix}2&0&9\\-1&0&8\\4&7&6\end{bmatrix}.$$
We can calculate the 3 × 3 determinant by cofactor expansion along column 2:

 
$$\det\begin{bmatrix}2&0&9\\-1&0&8\\4&7&6\end{bmatrix} = -0\cdot\det\begin{bmatrix}-1&8\\4&6\end{bmatrix} + 0\cdot\det\begin{bmatrix}2&9\\4&6\end{bmatrix} - 7\cdot\det\begin{bmatrix}2&9\\-1&8\end{bmatrix}
= -7\cdot\det\begin{bmatrix}2&9\\-1&8\end{bmatrix}.$$

Hence
$$\det A = (-2)(-7)\det\begin{bmatrix}2&9\\-1&8\end{bmatrix} = 14\,(2\cdot 8 - (-1)\cdot 9) = 14(25) = 350.$$

Theorem 5 (Cofactor expansion along any row). Let A ∈ Mn (F). Then for any row i = 1, . . . , n, cofactor
expansion along row i correctly calculates det A. That is,
$$\det A = \sum_{j=1}^{n}(-1)^{i+j}a_{ij}\det A_{ij}.$$

Proof sketch. (Not given in lectures) If i = 1 then this is just the last Theorem from the March 24 lecture.
Consider the case i = 2. Write
$$A=\begin{bmatrix} a_{11} & \cdots & a_{1,j-1} & a_{1j} & a_{1,j+1} & \cdots & a_{1n}\\ a_{21} & \cdots & a_{2,j-1} & a_{2j} & a_{2,j+1} & \cdots & a_{2n}\\ | & \cdots & | & | & | & \cdots & |\\ v_1 & \cdots & v_{j-1} & v_j & v_{j+1} & \cdots & v_n\\ | & \cdots & | & | & | & \cdots & | \end{bmatrix}
\quad\text{so}\quad
A_{2j}=\begin{bmatrix} a_{11} & \cdots & a_{1,j-1} & a_{1,j+1} & \cdots & a_{1n}\\ | & \cdots & | & | & \cdots & |\\ v_1 & \cdots & v_{j-1} & v_{j+1} & \cdots & v_n\\ | & \cdots & | & | & \cdots & | \end{bmatrix}.$$

Let B be the matrix obtained from A by swapping rows 1 and 2; thus
$$B=\begin{bmatrix} a_{21} & \cdots & a_{2,j-1} & a_{2j} & a_{2,j+1} & \cdots & a_{2n}\\ a_{11} & \cdots & a_{1,j-1} & a_{1j} & a_{1,j+1} & \cdots & a_{1n}\\ | & \cdots & | & | & | & \cdots & |\\ v_1 & \cdots & v_{j-1} & v_j & v_{j+1} & \cdots & v_n\\ | & \cdots & | & | & | & \cdots & | \end{bmatrix}.$$

Observe that B1j = A2j for all j = 1, . . . , n. Hence

det A = − det B Lemma 1 (row version)


= − (a21 det B11 − a22 det B12 + a23 det B13 − · · · ) Cof. exp. of det B on row 1
= −a21 det A21 + a22 det A22 − a23 det A23 + · · ·
= cofactor expansion of det A on row 2.

Finally, consider the general case i ≥ 2. Let B be the matrix obtained from A by cyclically permuting
the first i rows of A, so that rows 1, 2, . . . , i − 1, i of A become rows 2, 3, . . . , i, 1 of B. In particular, the
i-th row of A is the first row of B. The cyclic permutation (1 2 · · · i) can be simulated by i − 1 swaps
of pairs of rows. Hence

(∗) det B = (−1)i−1 det A.


And by construction, B1j = Aij for all j = 1, . . . , n (exercise). Hence
det A = (−1)i−1 det B By (∗)
i−1
= (−1) (ai1 det B11 − ai2 det B12 + ai3 det B13 − · · · ) Cof. exp. of det B on row 1
= (−1)i+1 ai1 det Ai1 + (−1)i+2 ai2 det Ai2 + (−1)i+3 ai3 det Ai3 + · · ·
= cofactor expansion of det A on row i. □
Corollary (Cofactor expansion along any column). Let A ∈ Mn (F). Then for any column j = 1, . . . , n, cofactor expansion along column j correctly calculates det A. That is,
$$\det A = \sum_{i=1}^{n}(-1)^{i+j}a_{ij}\det A_{ij}.$$

Proof. Cofactor expansion of det A along column j is cofactor expansion of det At along row j. □

The method of expansion by cofactors is useful if a matrix is sparse (has many zeroes), but otherwise is
generally not useful in calculations. The best method in practice uses row-reduction to an upper-triangular
matrix.
Definition. Let A ∈ Mn (F), say
$$A=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix}.$$
(1) The diagonal entries of A are a11 , a22 , . . . , ann . Collectively they are called the diagonal.
(2) A is diagonal if all entries above and below the diagonal are 0.
(3) A is upper triangular if all entries below the diagonal are 0.
Theorem 6. If A ∈ Mn (F) is upper-triangular, then det A is the product of the diagonal entries of A.
Proof sketch. By induction on n. In the inductive step, write
$$A=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ 0 & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & a_{nn}\end{bmatrix}$$
and expand det A by cofactors along the first column. □
This suggests a method to calculate det A: reduce
A⇝U upper triangular
keeping track of the elementary row and/or column operations O1 , . . . , Oℓ used. If α1 , . . . , αℓ are the scalars
corresponding to O1 , . . . , Oℓ , then
det U = α1 · · · αℓ (det A).
det U can be easily calculated using Theorem 6, and α1 , . . . , αℓ are known, so we can solve for det A.
 
Example. To find $\det\begin{bmatrix}0&1&3\\-2&-4&-5\\3&-1&1\end{bmatrix}$ you can do
$$A=\begin{bmatrix}0&1&3\\-2&-4&-5\\3&-1&1\end{bmatrix}
\xrightarrow{\,-1\,}\begin{bmatrix}-2&-4&-5\\0&1&3\\3&-1&1\end{bmatrix}
\xrightarrow{\,-1/2\,}\begin{bmatrix}1&2&\frac52\\0&1&3\\3&-1&1\end{bmatrix}
\xrightarrow{\,1\,}\begin{bmatrix}1&2&\frac52\\0&1&3\\0&-7&-\frac{13}{2}\end{bmatrix}
\xrightarrow{\,1\,}\begin{bmatrix}1&2&\frac52\\0&1&3\\0&0&\frac{29}{2}\end{bmatrix}=:U.$$
Then det U = (−1)(−1/2)(1)(1)(det A) by tracking row operations, while det U = 29/2 by upper-triangularity. Hence det A = 29.
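(Aside, not from the lectures.) The row-reduction method above is essentially how determinants are computed in practice. Here is a Python sketch over Q that uses only type 1 and type 3 row operations (so the only scalars to track are the signs from swaps) and then multiplies the diagonal, as in Theorem 6.

from fractions import Fraction

def det_by_elimination(rows):
    A = [[Fraction(x) for x in row] for row in rows]
    n = len(A)
    sign = 1
    for c in range(n):
        piv = next((i for i in range(c, n) if A[i][c] != 0), None)
        if piv is None:
            return Fraction(0)               # rank < n, so det = 0 (Theorem 1)
        if piv != c:
            A[c], A[piv] = A[piv], A[c]      # type 1 operation: flips the sign
            sign = -sign
        for i in range(c + 1, n):            # type 3 operations: determinant unchanged
            f = A[i][c] / A[c][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    d = Fraction(1)
    for i in range(n):                       # Theorem 6: product of the diagonal
        d *= A[i][i]
    return sign * d

print(det_by_elimination([[0, 1, 3], [-2, -4, -5], [3, -1, 1]]))   # 29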

Here is one last fact to blow your mind.
Definition. Let A ∈ Mn (F) and 1 ≤ i, j ≤ n. The (i, j)-cofactor of A is the scalar (−1)i+j det Aij .
Definition. Given A ∈ Mn (F), for each 1 ≤ i, j ≤ n let cij be the (i, j)-cofactor of A, and let C be the
n × n matrix whose (i, j)-entry is cij . The transpose C t of C is called the adjugate of A.
Theorem 7. For any A ∈ Mn (F), A(C t ) = det A · In .
Proof sketch. (Not given in lecture) Consider the (i, i)-entry of A(C t ). It is the sum of pairwise products
from row i of A and column i of C t , or in other words, of the i-th rows of A and C. The i-th rows of A
and C are
j: 1 2 ··· n
A: ai1 ai2 ··· ain
C: (−1)i+1 det Ai1 (−1)i+2 det Ai2 · · · (−1)i+n det Ain
so the (i, i) entry of A(C t ) is
ai1 (−1)i+1 det Ai1 + ai2 (−1)i+2 det Ai2 + · · · + ain (−1)i+n det Ain
which is exactly the cofactor expansion of det A on row i, and hence equals det A.
Next consider the (i, j)-entry of A(C t ) where j ̸= i. Let B be the matrix obtained from A by replacing
row j (of A) with a copy of row i (of A). Thus det B = 0 since B has two equal rows.
As before, the (i, j)-entry of A(C t ) is the sum of pairwise products from row i of A and column j of C t ,
or in other words, of row i of A and row j of C. Note that row i of A is the same as row j of B. And
because A and B are identical except on row j, we get that Bjℓ = Ajℓ for all ℓ = 1, . . . , n. From this we
can deduce that the (i, j) entry is the cofactor expansion of det B on row j, which equals 0. □
Corollary 3. Suppose A ∈ Mn (F) and det A ̸= 0. Then $A^{-1}=\frac{1}{\det A}C^t$, where C t is the adjugate of A.
Proof. $A\cdot\big(\tfrac{1}{\det A}C^t\big)=\tfrac{1}{\det A}A(C^t)=\tfrac{\det A}{\det A}I_n=I_n$. □
Here is a typical application.
Corollary 4. Suppose A ∈ Mn (R) has integer entries. If det A = ±1, then A−1 also has integer entries.
Proof. The determinant of any square matrix with integer entries is an integer, by the Leibniz formula.
Since A has integer entries, so does every submatrix Aij , and hence every cofactor (−1)i+j det Aij . Thus
the adjugate C t has integer entries. If det A = ±1 then A−1 = ±C t by Corollary 3 and so A−1 has integer
entries. □
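(Aside, not from the lectures.) A small sympy sketch of the adjugate and of Corollary 4. The matrix below is just a sample integer matrix with determinant 1 (my choice, not from the notes); minor_submatrix deletes one row and one column, matching the definition of Aij.

from sympy import Matrix, eye

def adjugate(A):
    # transpose of the cofactor matrix C, so that A * adjugate(A) = det(A) * I
    n = A.rows
    C = Matrix(n, n, lambda i, j: (-1) ** (i + j) * A.minor_submatrix(i, j).det())
    return C.T

A = Matrix([[2, 1, 1],
            [1, 1, 1],
            [1, 1, 2]])                      # integer entries, det(A) = 1
print(A * adjugate(A) == A.det() * eye(3))   # True  (Theorem 7)
print(A.inv())                               # equals adjugate(A) here, so it has integer entries (Corollary 4)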

MATH 146 March 29 Section 2

§5.1 Diagonalizability
Motivation: given a linear operator T ∈ L(V ) (with dim V = n), we can represent T by [T ]BA where A, B
are ordered bases of V .
If we are free to choose A and B, then we can always find A, B so that
 
$$[T]_{BA}=\begin{bmatrix} I_r & O\\ O' & O''\end{bmatrix}\quad\text{with } r=\operatorname{rank}(T)$$
(exercise) which is a very nice matrix. However if we want to understand T 2 , T 3 , etc., then we will likely
want A = B. (Then [T 2 ]B = [T ]B [T ]B = ([T ]B )2 , [T 3 ]B = ([T ]B )3 , etc.)
Definition. T ∈ L(V ) is called diagonalizable if there exists an ordered basis B for V such that [T ]B is
diagonal. If V = Fn and T = TA , then we also say that A is diagonalizable.
Diagonalization Problem: Which T ∈ L(V ) are diagonalizable? If T is diagonalizable, how can I find
B so that [T ]B is diagonal?
Special Case: V = Fn , T = TA . Suppose B = (v1 , . . . , vn ) is an ordered basis such that [TA ]B is a
diagonal matrix, say D. Let E be the standard basis for Fn . We have
$$D = [T_A]_B = \underbrace{[\mathrm{Id}]_{BE}}_{Q^{-1}}\;\underbrace{[T_A]_E}_{A}\;\underbrace{[\mathrm{Id}]_{EB}}_{Q}$$
where Q = [v1 · · · vn ]. So A is similar to D. The converse is also true: if A is similar to the diagonal
matrix D, say D = Q−1 AQ, then [TA ]B = D where B is the ordered basis for Fn consisting of the columns
of Q (exercise). So TA is diagonalizable iff A is similar to a diagonal matrix.
Matrix diagonalization problem: Which A ∈ Mn (F) are similar to a diagonal matrix? If A is diago-
nalizable, how can I find invertible Q and diagonal D such that Q−1 AQ = D (equivalently, QDQ−1 = A)?

Definition. Let V be a vector space over F. Let T ∈ L(V ).


(1) v ∈ V is an eigenvector of T if v ̸= 0 and T v = λv for some (unique) λ ∈ F, i.e., T v ∈ span(v).
(2) λ is called the eigenvalue corresponding to v, and an eigenvalue of T .
If V = Fn and T = TA , we also call these eigenvectors and eigenvalues of A.
     
Example. Let $A=\begin{bmatrix}3/2 & -1\\ 1/2 & 0\end{bmatrix}\in M_2(\mathbb{R})$ so $T_A\in\mathcal{L}(\mathbb{R}^2)$. Let $v_1=\begin{bmatrix}1\\1\end{bmatrix}$ and $v_2=\begin{bmatrix}2\\1\end{bmatrix}$.
$T_A(v_1)=Av_1=\begin{bmatrix}1/2\\1/2\end{bmatrix}=\tfrac12 v_1$, so v1 is an eigenvector with corresponding eigenvalue λ = 1/2.
$T_A(v_2)=Av_2=\begin{bmatrix}2\\1\end{bmatrix}=v_2$, so v2 is an eigenvector with corresponding eigenvalue λ = 1.
BTW, if B = (v1 , v2 ), then $[T_A]_B=\begin{bmatrix}1/2 & 0\\ 0 & 1\end{bmatrix}$ (so A is diagonalizable!). Similarly, if $Q=[v_1\ v_2]=\begin{bmatrix}1&2\\1&1\end{bmatrix}$ then $Q^{-1}AQ$ is diagonal.
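(Aside, not from the lectures.) The same computation can be checked in sympy. The eigenvectors sympy returns may be scaled differently from v1 and v2, but they span the same lines.

from sympy import Matrix, Rational

A = Matrix([[Rational(3, 2), -1],
            [Rational(1, 2), 0]])

print(A.eigenvals())           # {1/2: 1, 1: 1}  (eigenvalue: algebraic multiplicity)
print(A.eigenvects())          # eigenvectors proportional to (1, 1) and (2, 1)

Q = Matrix([[1, 2],
            [1, 1]])           # columns v1, v2
print(Q.inv() * A * Q)         # Matrix([[1/2, 0], [0, 1]])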
In general:
Theorem 1. If V is finite-dimensional and T ∈ L(V ), then
T is diagonalizable ⇐⇒ ∃ a basis for V consisting of eigenvectors for T .
Proof. (Not given in lecture) (⇒) Suppose B is an ordered basis such that [T ]B is a diagonal matrix, say
$$[T]_B=\begin{bmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{bmatrix}.$$
Write B = (v1 , . . . , vn ). Then by definition of [T ]B , T vi = λi vi for each i. Obviously each vi is nonzero
(since it is part of a basis). So each vi is an eigenvector of T , so B is an (ordered) basis for V consisting
of eigenvectors.
(⇐) Suppose B = (v1 , . . . , vn ) is an (ordered) basis for V consisting of eigenvectors of T . For each
i = 1, . . . , n let λi be the eigenvalue corresponding to vi . By definition of [T ]B ,
$$[T]_B=\begin{bmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{bmatrix}$$
so T is diagonalizable. □
Question: How do you find eigenvectors? Answer: First find the eigenvalues.
Lemma 1. If A ∈ Mn (F) and λ ∈ F, then λ is an eigenvalue of A ⇐⇒ det(A − λIn ) = 0.
Proof. λ is an eigenvalue of A ⇐⇒ ∃v ∈ Fn , v ̸= 0 and Av = λv
⇐⇒ ∃v ∈ Fn , v ≠ 0 and Av − λv = 0
⇐⇒ ∃v ∈ Fn , v ̸= 0 and (A − λIn )v = 0
⇐⇒ Ker(A − λIn ) ̸= {0}
⇐⇒ rank(A − λIn ) < n
⇐⇒ det(A − λIn ) = 0. □
So in the example, I formed
$$A-\lambda I_2=\begin{bmatrix}3/2 & -1\\ 1/2 & 0\end{bmatrix}-\begin{bmatrix}\lambda & 0\\ 0 & \lambda\end{bmatrix}=\begin{bmatrix}3/2-\lambda & -1\\ 1/2 & -\lambda\end{bmatrix}$$
and calculated the determinant:
$$\det(A-\lambda I_2)=(3/2-\lambda)(-\lambda)-(-1)(1/2)=\lambda^2-\tfrac32\lambda+\tfrac12,$$
which equals 0 at λ = 1/2 and λ = 1. (These are the eigenvalues.)
Then for each λ I found an eigenvector v by solving (A − λI2 )v = 0 and choosing a nonzero solution.
   
λ = 1/2: $(A-\tfrac12 I_2 \mid 0)=\left[\begin{array}{cc|c}1 & -1 & 0\\ 1/2 & -1/2 & 0\end{array}\right]\rightsquigarrow\left[\begin{array}{cc|c}1 & -1 & 0\\ 0 & 0 & 0\end{array}\right]$
whose solution set is $v=\alpha\begin{bmatrix}1\\1\end{bmatrix}$, α ∈ R. I chose $v_1=\begin{bmatrix}1\\1\end{bmatrix}$.
λ = 1: $(A-1 I_2 \mid 0)=\left[\begin{array}{cc|c}1/2 & -1 & 0\\ 1/2 & -1 & 0\end{array}\right]\rightsquigarrow\left[\begin{array}{cc|c}1 & -2 & 0\\ 0 & 0 & 0\end{array}\right]$
whose solution set is $v=\alpha\begin{bmatrix}2\\1\end{bmatrix}$, α ∈ R. I chose $v_2=\begin{bmatrix}2\\1\end{bmatrix}$.
Let’s generalize. Fix an indeterminate (formal variable) t.
Definition. Given A ∈ Mn (F), the characteristic polynomial of A, denoted pA (t), is det(A − tIn ).
Corollary. The eigenvalues of A are the roots (zeroes) of pA (t).
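(Aside, not from the lectures.) With a computer algebra system the characteristic polynomial can be formed literally as det(A − tIn). Here is a sketch for the 2 × 2 example above. (sympy also has a built-in charpoly, but note it uses the convention det(tI − A), which differs from pA(t) by the sign (−1)^n.)

from sympy import Matrix, Rational, symbols, eye, solve

t = symbols('t')
A = Matrix([[Rational(3, 2), -1],
            [Rational(1, 2), 0]])

p = (A - t * eye(2)).det()     # p_A(t) = det(A - t*I_2)
print(p)                       # t**2 - 3*t/2 + 1/2 (up to term order)
print(solve(p, t))             # [1/2, 1]: the eigenvalues are the roots of p_A(t)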
MATH 146 March 31 Section 2

Recall: if A ∈ Mn (F) then pA (t) = det(A − tIn ). The eigenvalues of A are the roots of pA (t).
 
Example. $A=\begin{bmatrix}3/2 & -1\\ 1/2 & 0\end{bmatrix}$, $p_A(t)=t^2-\tfrac32 t+\tfrac12$, eigenvalues are λ = 1/2 and λ = 1.
Theorem 2. Let A ∈ Mn (F).
(1) pA (t) is a polynomial cn tn + cn−1 tn−1 + · · · + c1 t + c0 of degree n.
(2) cn = (−1)n .
(3) cn−1 = (−1)n−1 trace(A).
(4) c0 = det A.
Proof. (4) c0 = pA (0) = det(A − 0In ) = det A.
(1)–(3): At end of these lecture notes. □
Now we turn to eigenvectors. First, another definition.
Definition. Suppose V is a vector space over F, T ∈ L(V ), and λ ∈ F is an eigenvalue of T . The
eigenspace of T corresponding to λ is
Eλ = Ker(T − λ·Id) = {v ∈ V : T v = λv}
= {eigenvectors corresponding to λ} ∪ {0}.
Notes:
(1) If V = Fn and T = TA , then Eλ = Null(A − λIn ).
(2) dim Eλ ≥ 1 always.
To find eigenvector(s) corresponding to an eigenvalue λ for A, we can let B = A − λIn and solve Bv = 0.
Our usual method gives a basis for the solution set (which is Eλ ); all nonzero linear combinations of the
basis vectors are eigenvectors.
Recall: to diagonalize T , what we need is a basis for V consisting of eigenvectors; that means a basis
contained in the union of the eigenspaces. The next result will guarantee that we can “glue” bases of the
eigenspaces together and still have a linearly independent set.
Theorem 3. Assume T ∈ L(V ) and λ1 , . . . , λk are distinct eigenvalues of T . Let W = Eλ1 + · · · + Eλk .
Then W = Eλ1 ⊕ · · · ⊕ Eλk .
Proof. At end of these lecture notes. □
As a consequence, we get the following.
Corollary 1. Suppose dim V = n, T ∈ L(V ), and λ1 , . . . , λk are the distinct eigenvalues of T .
(1) dim Eλ1 + · · · + dim Eλk = dim(Eλ1 ⊕ · · · ⊕ Eλk ) ≤ n.
(2) T is diagonalizable ⇐⇒ dim Eλ1 + · · · + dim Eλk = n ⇐⇒ V = Eλ1 ⊕ · · · ⊕ Eλk .
(3) If T is diagonalizable, then a diagonalizing basis can be found by taking the union of bases for the
eigenspaces of T .
Proof. At end of these lecture notes. □

Example 1. A ∈ M2 (R) at start of lecture. Two distinct eigenvalues λ = 1/2 and λ = 1. Each eigenspace has dimension ≥ 1, and the sum of their dimensions is ≤ 2, so $\dim E_{1/2}=\dim E_1=1$ and the dimensions sum to 2. Hence A is diagonalizable.
 
Example 2. Let $A=\begin{bmatrix}4&0&1\\2&3&2\\1&0&4\end{bmatrix}\in M_3(\mathbb{R})$.
$$p_A(t)=\det(A-tI_3)=\det\begin{bmatrix}4-t&0&1\\2&3-t&2\\1&0&4-t\end{bmatrix}=(3-t)\det\begin{bmatrix}4-t&1\\1&4-t\end{bmatrix}=(3-t)\big((4-t)^2-1\big)=(3-t)(t^2-8t+15)=-(t-3)^2(t-5).$$
Thus the eigenvalues are 3 and 5 (we say that λ = 3 has “algebraic multiplicity 2”)
Next let’s find the eigenspaces and their dimensions.
λ = 3:
$$E_3=\operatorname{Null}(A-3I_3)=\operatorname{Null}\begin{bmatrix}1&0&1\\2&0&2\\1&0&1\end{bmatrix}$$
By inspection, rank(A − 3I3 ) = 1, so dim(E3 ) = nullity(A − 3I3 ) = 2. To get a basis for E3 , solve the system (A − 3I3 )x = 0. The augmented matrix for this system is
$$\left[\begin{array}{ccc|c}1&0&1&0\\2&0&2&0\\1&0&1&0\end{array}\right]\longrightarrow\left[\begin{array}{ccc|c}1&0&1&0\\0&0&0&0\\0&0&0&0\end{array}\right].$$
A basis for the solution set is v1 = (0, 1, 0), v2 = (−1, 0, 1).
λ = 5:
$$E_5=\operatorname{Null}(A-5I_3)=\operatorname{Null}\begin{bmatrix}-1&0&1\\2&-2&2\\1&0&-1\end{bmatrix}$$
By inspection, rank(A − 5I3 ) = 2, so dim(E5 ) = nullity(A − 5I3 ) = 1. To get a basis for E5 , solve the system (A − 5I3 )x = 0. The augmented matrix for this system is
$$\left[\begin{array}{ccc|c}-1&0&1&0\\2&-2&2&0\\1&0&-1&0\end{array}\right]\longrightarrow\left[\begin{array}{ccc|c}1&0&-1&0\\0&1&-2&0\\0&0&0&0\end{array}\right].$$
A basis for the solution set is v3 = (1, 2, 1).
We have dim E3 + dim E5 = 3 so A is diagonalizable. A diagonalizing basis for TA is B = (v1 , v2 , v3 ) and
$$[T_A]_B=\begin{bmatrix}3&0&0\\0&3&0\\0&0&5\end{bmatrix}=:D.$$
If we let
$$Q=[v_1\ v_2\ v_3]=\begin{bmatrix}0&-1&1\\1&0&2\\0&1&1\end{bmatrix}$$
then Q−1 AQ = D.
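(Aside, not from the lectures.) sympy will perform this whole computation; its diagonalize method returns a matrix of eigenvector columns and the corresponding diagonal matrix (possibly with the eigenvectors scaled or ordered differently than above).

from sympy import Matrix

A = Matrix([[4, 0, 1],
            [2, 3, 2],
            [1, 0, 4]])

P, D = A.diagonalize()         # columns of P are eigenvectors of A
print(D)                       # diag(3, 3, 5)
print(P.inv() * A * P == D)    # True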
 
Example 3. Let's do the same thing for the matrix $B=\begin{bmatrix}3&1&0\\0&3&0\\0&0&5\end{bmatrix}$.
$$p_B(t)=\det(B-tI_3)=\det\begin{bmatrix}3-t&1&0\\0&3-t&0\\0&0&5-t\end{bmatrix}=(3-t)^2(5-t)=-(t-3)^2(t-5).$$
The same as pA (t); hence the same eigenvalues and same multiplicities. Let’s find the eigenspaces and
their dimensions.
λ = 3:
$$E_3=\operatorname{Null}(B-3I_3)=\operatorname{Null}\begin{bmatrix}0&1&0\\0&0&0\\0&0&2\end{bmatrix}.$$
By inspection, rank(B − 3I3 ) = 2, so dim(E3 ) = nullity(B − 3I3 ) = 1.
λ = 5.
We will see next week that dim Eλ is always ≤ the algebraic multiplicity of λ. Since 5 has algebraic
multiplicity 1, we know that dim(E5 ) = 1.
So dim E3 + dim E5 = 1 + 1 = 2 < 3, so B is not diagonalizable.
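(Aside, not from the lectures.) The defect in Example 3 is visible by machine as well: for each eigenvalue, sympy's eigenvects reports the algebraic multiplicity together with a basis of the eigenspace, and is_diagonalizable compares the two, as in Corollary 1.

from sympy import Matrix

B = Matrix([[3, 1, 0],
            [0, 3, 0],
            [0, 0, 5]])

for val, alg_mult, vecs in B.eigenvects():
    print(val, alg_mult, len(vecs))   # 3: alg. mult. 2 but dim E_3 = 1;  5: 1 and 1
print(B.is_diagonalizable())          # False, since 1 + 1 < 3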

The following was not given in lectures.


Proof of Theorem 2. (1)–(3) Look at A − tIn :
$$A-tI_n=\begin{bmatrix} a_{11}-t & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22}-t & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}-t\end{bmatrix}.$$
Each entry is a polynomial of degree 0 or 1. By the Leibniz formula, det(A − tIn ) is an alternating sum
of products of entries of A − tIn . Hence pA (t) is a polynomial. Each product in the sum has n factors, so
each product has degree at most n, so pA (t) has degree ≤ n. Write
pA (t) = cn tn + cn−1 tn−1 + · · · + c1 t + c0 .
Observe that only one product in the Leibniz formula contributes to the coefficient of tn : the product of
the diagonal entries
(a11 − t)(a22 − t) · · · (ann − t).
In fact, this is also the only product that contributes to cn−1 , since any product other than the diagonal
product has at least two nondiagonal factors. So we expand the diagonal product:

$$(a_{11}-t)(a_{22}-t)\cdots(a_{nn}-t)=(-t)^n+\Big(\sum_{i=1}^{n}a_{ii}\Big)(-t)^{n-1}+(\text{lower terms})
=(-1)^n t^n+(-1)^{n-1}\underbrace{(a_{11}+\cdots+a_{nn})}_{\operatorname{trace}A}\,t^{n-1}+(\text{lower terms}).$$

This proves (1)–(3). □


Proof of Theorem 3. Equivalently, if xi ∈ Eλi for i = 1, . . . , k, then
x1 + · · · + xk = 0 =⇒ x1 = · · · = xk = 0.
We prove this claim by induction on k. If k = 1 there is nothing to prove. Assume k > 1 and the claim is
true with k replaced by k − 1. Assume xi ∈ Eλi for i = 1, . . . , k and
(1) x1 + · · · + xk−1 + xk = 0.
Apply T to both sides:
T (x1 + · · · + xk−1 + xk ) = T (0),
i.e.,
(2) λ1 x1 + · · · + λk−1 xk−1 + λk xk = 0.
Multiply (1) by λk :
(3) λk x1 + · · · + λk xk−1 + λk xk = 0.
Subtract (3) from (2):
(λ1 − λk )x1 + · · · + (λk−1 − λk )xk−1 = 0.
| {z } | {z }
∈Eλ1 ∈Eλk−1

By induction,
(λ1 − λk )x1 = · · · = (λk−1 − λk )xk−1 = 0
and since the λi ’s are distinct, we get
x1 = · · · = xk−1 = 0.
Plugging back into (1) gives xk = 0 as well. □
Before proving Corollary 1, we state and prove a consequence of Theorem 3 that we will need.
Corollary 0. Suppose dim V = n, T ∈ L(V ), and λ1 , . . . , λk are the distinct eigenvalues of T . Let
B1 , . . . , Bk be bases for Eλ1 , . . . , Eλk respectively. Then B1 ∪ · · · ∪ Bk is linearly independent.
Proof. Let di = dim Eλi and write $B_i=\{v_1^i,\dots,v_{d_i}^i\}$. Suppose $a_t^i$ are scalars such that
$$\underbrace{a_1^1 v_1^1+\cdots+a_{d_1}^1 v_{d_1}^1}_{x_1}+\cdots+\underbrace{a_1^i v_1^i+\cdots+a_{d_i}^i v_{d_i}^i}_{x_i}+\cdots+\underbrace{a_1^k v_1^k+\cdots+a_{d_k}^k v_{d_k}^k}_{x_k}=0.$$
Let x1 , . . . , xk be the elements named in the above display. Then xi ∈ Eλi for each i. By the proof of Theorem 3, x1 = · · · = xk = 0. That is, for each fixed i, $\sum_{t=1}^{d_i}a_t^i v_t^i=0$. Since Bi is linearly independent, each $a_t^i=0$. Since all coefficients are 0, B1 ∪ · · · ∪ Bk is linearly independent. □
Proof of Corollary 1. (1) The equality follows from repeated use of dim(W1 ⊕ W2 ) = dim W1 + dim W2 (see
the remarks after Theorem 1 on Feb. 1). The inequality follows from the fact that Eλ1 ⊕ · · · ⊕ Eλk is a
subspace of V .
(2) For i = 1, . . . , k let di = dim Eλi . Clearly dim(Eλ1 ⊕ · · · ⊕ Eλk ) = $\sum_{i=1}^{k}d_i$, so V = Eλ1 ⊕ · · · ⊕ Eλk iff $\sum_{i=1}^{k}d_i=n$. It remains to prove the first ⇐⇒.
(⇒) Assume T is diagonalizable. Then V has a basis B consisting of eigenvectors of T , by Theorem 1
(March 29). For each i = 1, . . . , k let Bi = B ∩ Eλi . Then B = B1 ∪ · · · ∪ Bk is a partition of B. Each Bi
is a linearly independent subset of Eλi so |Bi | ≤ di . Thus
n = |B| = |B1 | + · · · + |Bk | ≤ d1 + · · · + dk ≤ n by (1)
so $\sum_{i=1}^{k}d_i=n$ as required.
(⇐) Assume $\sum_{i=1}^{k}d_i=n$. For each i let Bi be a basis for Eλi . Then |Bi | = di . Let B = B1 ∪ · · · ∪ Bk .
Then |B| = d1 + · · · + dk = n. Furthermore, B is linearly independent by Corollary 0. So B is a basis for
V . By construction, B consists entirely of eigenvectors of T . So T is diagonalizable by Theorem 1.
(3) This follows from the proof of (2)(⇐). □

MATH 146 April 3 Section 2

Definition. Given A ∈ Mn (F) and an eigenvalue λ of A, the algebraic multiplicity of λ is the maximum
m ≥ 1 such that (t − λ)m | pA (t).
 
Example. Let $A=\begin{bmatrix}4&0&1\\2&3&2\\1&0&4\end{bmatrix}\in M_3(\mathbb{R})$. We saw on Friday that $p_A(t)=-(t-3)^2(t-5)$. Eigenvalue λ = 3 has algebraic multiplicity 2. Eigenvalue λ = 5 has algebraic multiplicity 1.
Theorem 4. Suppose A ∈ Mn (F) and λ is an eigenvalue of A. Then
dim Eλ ≤ (algebraic multiplicity of λ).
(We’ll prove this at the end of the lecture.)
 
Example. $A=\begin{bmatrix}1&0&0\\0&0&-1\\0&1&0\end{bmatrix}\in M_3(\mathbb{R})$. Is A diagonalizable?
$$p_A(t)=\det\begin{bmatrix}1-t&0&0\\0&-t&-1\\0&1&-t\end{bmatrix}=(1-t)\det\begin{bmatrix}-t&-1\\1&-t\end{bmatrix}=-(t-1)(t^2+1).$$
t2 + 1 cannot be factored over R. We say that pA (t) doesn't split (over R), because it doesn't completely
factor as a product of linear factors. Considered as a matrix over R, the only eigenvalue is λ = 1. Its
eigenspace will have dimension 1, not enough to diagonalize A.
If on the other hand we consider A as a matrix over C, then pA (t) does split:
pA (t) = −(t − 1)(t − i)(t + i)
There are three distinct eigenvalues: 1, i and −i. Each has algebraic multiplicity 1. The dimension of each
eigenspace is 1, so the sum of the dimensions is 3, and A is diagonalizable under this interpretation.
We see that when discussing diagonalizability of a matrix, we must also be careful to specify over which
field F we are considering it. (In other words, we must specify the vector space Fn on which TA operates.)
Using Theorem 4, we can prove one more characterization of diagonalizability.
Theorem 5. Suppose A ∈ Mn (F). A is diagonalizable over F iff pA (t) splits over F and dim Eλ =
(alg. mult. of λ) for each eigenvalue λ.
Proof. (Not given in lectures) Let λ1 , . . . , λk be the distinct eigenvalues of A and let m1 , . . . , mk be their
algebraic multiplicities. Then
pA (t) = c(t − λ1 )m1 · · · (t − λk )mk if pA (t) splits;
while if pA (t) doesn’t split, then
pA (t) = c(t − λ1 )m1 · · · (t − λk )mk q(t) with deg(q(t)) ≥ 2.

Note that deg(pA (t)) = n by Theorem 2 (March 31). Hence using Theorem 4 we get
$$\sum_{i=1}^{k}\dim E_{\lambda_i}\ \le\ \sum_{i=1}^{k}m_i=\begin{cases} n & \text{if } p_A(t) \text{ splits}\\ n-\deg(q(t)) & \text{if } p_A(t) \text{ does not split.}\end{cases}$$
So we get $\sum_{i=1}^{k}\dim E_{\lambda_i}=n$ exactly when pA (t) splits and dim Eλi = mi for all i. □
The rest of this lecture is devoted to a proof of Theorem 4. To get there, we need to extend some of our
analysis of eigenvalues from matrices to linear operators.
First, given T ∈ L(V ) where V is finite-dimensional, how can we find the eigenvalues of T ?
Answer: Pick an ordered basis B, find eigenvalues of [T ]B .
This works because T and A := [T ]B have the same eigenvalues:
T v = λv ⇐⇒ [T v]B = [λv]B
⇐⇒ [T ]B [v]B = λ[v]B
⇐⇒ A[v]B = λ[v]B .
We also get a complete translation between
Eigenvectors of T and eigenvectors of [T ]B : v ↭ [v]B
Eigenspaces of T and eigenspaces of [T ]B : EλT ≅ EλA via v 7→ [v]B
Dimensions of eigenspaces: dim EλT = dim EλA .
Note: we haven’t yet defined the characteristic polynomial of a linear operator. Time to fix that.
Definition. Suppose V is a vector space over F with dim V = n, and T ∈ L(V ). The characteristic
polynomial of T , denoted pT (t), is obtained by choosing an ordered basis B for V , letting A = [T ]B , and
def
defining pT (t) = pA (t).
There is one annoying issue: we need to prove that pT (t) is well-defined, i.e., doesn’t depend on the
choice of B.
Lemma. Let V and T be as above. Then pT (t) is well-defined; that is, if A = [T ]B and B = [T ]C where
B, C are ordered bases for V , then pA (t) = pB (t).
Proof. This will follow from the fact that A and B are similar. Write B = Q−1 AQ where Q is the
change-of-coordinate matrix from C to B. First observe that det B = det A, since
det B = (det Q−1 )(det A)(det Q)
= (det A)(det Q−1 )(det Q)
= (det A) det(Q−1 Q) = det A.
(Hey, we’ve proved that similar matrices always have the same determinant.) Next observe that
B − tIn = B − tQ−1 In Q
= Q−1 AQ − Q−1 (tIn )Q
= Q−1 (A − tIn )Q.
So B − tIn and A − tIn are similar. So
pB (t) = det(B − tIn ) = det(A − tIn ) = pA (t). □
Now we can prove Theorem 4. It is easier to prove its generalization to linear operators.
Theorem 4 (Generalization). Suppose V is finite-dimensional, T ∈ L(V ), and λ is an eigenvalue of T .
Then
dim Eλ ≤ (algebraic multiplicity of λ).
Proof. Let d = dim Eλ . It suffices to prove (t − λ)d | pT (t).
Let (v1 , . . . , vd ) be an ordered basis for Eλ , and extend it to an ordered basis B = (v1 , . . . , vd , vd+1 , . . . , vn )
for V . Let A = [T ]B , so pT (t) = pA (t) by the Lemma.
Observe that for i = 1, . . . , d,
T (vi ) = λvi (because vi ∈ Eλ )
= 0v1 + · · · + 0vi−1 + λvi + 0vi+1 + · · · 0vd + 0vd+1 + · · · + 0vn .
Hence
$$A=\begin{bmatrix}
\lambda & 0 & \cdots & 0 & * & \cdots & *\\
0 & \lambda & \cdots & 0 & * & \cdots & *\\
\vdots & \vdots & \ddots & \vdots & * & \cdots & *\\
0 & 0 & \cdots & \lambda & * & \cdots & *\\
0 & 0 & \cdots & 0 & * & \cdots & *\\
\vdots & \vdots & & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 0 & * & \cdots & *
\end{bmatrix}
=\begin{bmatrix} \lambda I_d & B\\ O & C\end{bmatrix}$$
for some matrices of the appropriate dimensions. Then
$$A-tI_n=\begin{bmatrix} (\lambda-t)I_d & B\\ O & C-tI_m\end{bmatrix}\quad\text{where } m=n-d.$$
Both A and A − tIn are block upper triangular (with square blocks on the diagonal). In A6 Q3 you will
show that if B = O then the determinant of such a matrix is equal to the product of the determinants of
the two diagonal blocks. Essentially the same argument shows that this is true even if B ̸= O (exercise!).
Note that
$$\det\big((\lambda-t)I_d\big)=\det\begin{bmatrix}\lambda-t & 0 & \cdots & 0\\ 0 & \lambda-t & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda-t\end{bmatrix}_{d\times d}=(\lambda-t)^d.$$
Thus
pT (t) = det((λ − t)Id ) · det(C − tIm )
= (λ − t)d · det(C − tIm )
= (λ − t)d · pC (t).
This proves that (t − λ)d divides pT (t) as required. □

MATH 146 April 5 Section 2

Here is a little parlour trick which is actually examinable material. Suppose A ∈ Mn (F) is diagonalizable.
Thus there exists an invertible matrix P and a diagonal matrix
$$D=\begin{bmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{bmatrix}$$
such that P −1 AP = D. We can solve for A to get A = P DP −1 . Then for any m ≥ 1,
$$A^m=\underbrace{(PDP^{-1})(PDP^{-1})\cdots(PDP^{-1})}_{m}=PD^mP^{-1}.$$
Now Dm is easily calculated:
$$D^m=\begin{bmatrix}\lambda_1^m & 0 & \cdots & 0\\ 0 & \lambda_2^m & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n^m\end{bmatrix}.$$

If we know P and P −1 , then we can get an explicit formula for Am .


 
Example. Consider the matrix $A=\begin{bmatrix}4&0&1\\2&3&2\\1&0&4\end{bmatrix}\in M_3(\mathbb{R})$ from Example 2 last Friday. We found that:
$$p_A(t)=-(t-3)^2(t-5),\qquad \text{eigenvalues: } \lambda=3 \text{ (alg. mult. 2) and } \lambda=5 \text{ (alg. mult. 1)},$$
$$E_3=\operatorname{Null}\begin{bmatrix}1&0&1\\2&0&2\\1&0&1\end{bmatrix}=\operatorname{span}\left\{\begin{bmatrix}0\\1\\0\end{bmatrix},\begin{bmatrix}-1\\0\\1\end{bmatrix}\right\},\qquad
E_5=\operatorname{Null}\begin{bmatrix}-1&0&1\\2&-2&2\\1&0&-1\end{bmatrix}=\operatorname{span}\left\{\begin{bmatrix}1\\2\\1\end{bmatrix}\right\}.$$
A is diagonalizable, and we can let
$$P=\begin{bmatrix}0&-1&1\\1&0&2\\0&1&1\end{bmatrix}\quad\text{and}\quad D=\begin{bmatrix}3&0&0\\0&3&0\\0&0&5\end{bmatrix}$$
and we will get P −1 AP = D. So A = P DP −1 and Am = P Dm P −1 .
We need to know P −1 .
$$(P\mid I_3)=\left[\begin{array}{ccc|ccc}0&-1&1&1&0&0\\1&0&2&0&1&0\\0&1&1&0&0&1\end{array}\right]
\ \overset{\text{row}}{\rightsquigarrow}\
\left[\begin{array}{ccc|ccc}1&0&0&-1&1&-1\\0&1&0&-\frac12&0&\frac12\\0&0&1&\frac12&0&\frac12\end{array}\right]=(I_3\mid P^{-1}).$$
So
$$A^m=PD^mP^{-1}=\begin{bmatrix}0&-1&1\\1&0&2\\0&1&1\end{bmatrix}\begin{bmatrix}3^m&0&0\\0&3^m&0\\0&0&5^m\end{bmatrix}\begin{bmatrix}-1&1&-1\\-\frac12&0&\frac12\\ \frac12&0&\frac12\end{bmatrix}
=\begin{bmatrix}0&-3^m&5^m\\3^m&0&2\cdot5^m\\0&3^m&5^m\end{bmatrix}\begin{bmatrix}-1&1&-1\\-\frac12&0&\frac12\\ \frac12&0&\frac12\end{bmatrix}
=\begin{bmatrix}\frac12(3^m+5^m)&0&\frac12(-3^m+5^m)\\-3^m+5^m&3^m&-3^m+5^m\\ \frac12(-3^m+5^m)&0&\frac12(3^m+5^m)\end{bmatrix}.$$
Check m = 1:
$$\begin{bmatrix}\frac12(3+5)&0&\frac12(-3+5)\\-3+5&3&-3+5\\ \frac12(-3+5)&0&\frac12(3+5)\end{bmatrix}=\begin{bmatrix}4&0&1\\2&3&2\\1&0&4\end{bmatrix}=A\ \checkmark$$
Example. Recall the Fibonacci numbers
f0 = 0, f1 = 1, fn = fn−2 + fn−1 for n ≥ 2.
 
Let $B=\begin{bmatrix}0&1\\1&1\end{bmatrix}$. An easy induction argument shows that
$$B^n=\begin{bmatrix}f_{n-1}&f_n\\f_n&f_{n+1}\end{bmatrix}$$
for all n ≥ 1. On the other hand, you can check that B is diagonalizable with eigenvalues $\frac{1+\sqrt5}{2}$ (the golden ratio ϕ) and $\frac{1-\sqrt5}{2}=-\frac{1}{\phi}$. If you follow the above steps to find an explicit formula for B^n , you will obtain an explicit formula for fn involving powers of ϕ and 1/ϕ.
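(Aside, not from the lectures.) Carrying out those steps gives the classical closed form f_n = (φ^n − (−1/φ)^n)/√5, often called Binet's formula. A quick numerical sketch comparing it with the matrix powers:

from math import sqrt
import numpy as np

phi = (1 + sqrt(5)) / 2

def fib(n):
    # closed form obtained from B = P D P^{-1} with D = diag(phi, -1/phi)
    return round((phi ** n - (-1 / phi) ** n) / sqrt(5))

B = np.array([[0, 1],
              [1, 1]])
print(np.linalg.matrix_power(B, 10))   # [[34 55] [55 89]] = [[f9, f10], [f10, f11]]
print([fib(n) for n in range(1, 11)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]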

(What follows is not examinable.)


Some of you have asked me to explain "what it means" when dim Eλ < alg. mult. of λ. At least for matrices over R, one way to answer this lies in "local analysis."
Consider two archetypal examples: $A=I_2=\begin{bmatrix}1&0\\0&1\end{bmatrix}$ and $B=\begin{bmatrix}1&1\\0&1\end{bmatrix}$, both considered as matrices over R. Both have characteristic polynomial $(t-1)^2$, so both have just one eigenvalue λ = 1, which has algebraic multiplicity 2. For A we have E1 = R2 and A is diagonalizable. For B we have $E_1=\operatorname{Null}\begin{bmatrix}0&1\\0&0\end{bmatrix}=\operatorname{span}\left\{\begin{bmatrix}1\\0\end{bmatrix}\right\}$, the x-axis, and B is not diagonalizable.
How to explain the difference? Consider a small perturbation of A:
$$A_{\varepsilon,\delta}=\begin{bmatrix}1-\varepsilon & \delta\\ 0 & 1+\varepsilon\end{bmatrix}.$$
Aε,δ has two eigenvalues: 1 − ε and 1 + ε. Its eigenspaces are
$$E_{1-\varepsilon}=\operatorname{Null}\begin{bmatrix}0 & \delta\\ 0 & 2\varepsilon\end{bmatrix}=\operatorname{span}\left\{\begin{bmatrix}1\\0\end{bmatrix}\right\},\ \text{the $x$-axis}$$
$$E_{1+\varepsilon}=\operatorname{Null}\begin{bmatrix}-2\varepsilon & \delta\\ 0 & 0\end{bmatrix}=\operatorname{span}\left\{\begin{bmatrix}\delta\\ 2\varepsilon\end{bmatrix}\right\},\ \text{the line through 0 with slope } \tfrac{2\varepsilon}{\delta}.$$
As ε, δ → 0, the second eigenspace may rotate wildly, and in effect can be anything we want. In particular,
we can pick our rates of convergence so that E1+ε converges to the y-axis. So in the limit, when the two
eigenvalues converge to each other, we might expect a 2-dimensional eigenspace; which is what happens.
Now do the same thing to B; let
$$B_{\varepsilon,\delta}=\begin{bmatrix}1-\varepsilon & 1+\delta\\ 0 & 1+\varepsilon\end{bmatrix}.$$
Bε,δ has the same two eigenvalues: 1 − ε and 1 + ε. But this time its eigenspaces are
$$E_{1-\varepsilon}=\operatorname{Null}\begin{bmatrix}0 & 1+\delta\\ 0 & 2\varepsilon\end{bmatrix}=\operatorname{span}\left\{\begin{bmatrix}1\\0\end{bmatrix}\right\},\ \text{the $x$-axis}$$
$$E_{1+\varepsilon}=\operatorname{Null}\begin{bmatrix}-2\varepsilon & 1+\delta\\ 0 & 0\end{bmatrix}=\operatorname{span}\left\{\begin{bmatrix}1+\delta\\ 2\varepsilon\end{bmatrix}\right\},\ \text{the line through 0 with slope } \tfrac{2\varepsilon}{1+\delta}.$$
As ε, δ → 0, the second eigenspace converges to the first, so in the limit, there is only 1 dimension’s worth
of eigenvectors. Viewed in this way, the 1-dimensional eigenspace of B can be seen as a “singularity,” that
is, two 1-dimensional eigenspaces which have collapsed onto each other.
