Dave Penneys
August 7, 2008
Contents
1 Background Material
  1.1 Sets
  1.2 Functions
  1.3 Fields
  1.4 Matrices
  1.5 Systems of Linear Equations

2 Vector Spaces
  2.1 Definition
  2.2 Linear Combinations, Span, and (Internal) Direct Sum
  2.3 Linear Independence and Bases
  2.4 Finitely Generated Vector Spaces and Dimension
  2.5 Existence of Bases

3 Linear Transformations
  3.1 Linear Transformations
  3.2 Kernel and Image
  3.3 Dual Spaces
  3.4 Coordinates

4 Polynomials
  4.1 The Algebra of Polynomials
  4.2 The Euclidean Algorithm
  4.3 Prime Factorization
  4.4 Irreducible Polynomials
  4.5 The Fundamental Theorem of Algebra
  4.6 The Polynomial Functional Calculus

5 Eigenvalues, Eigenvectors, and the Spectrum
  5.1 Eigenvalues and Eigenvectors
  5.2 Determinants
  5.3 The Characteristic Polynomial
  5.4 The Minimal Polynomial

6 Operator Decompositions
  6.1 Idempotents
  6.2 Matrix Decomposition
  6.3 Diagonalization
  6.4 Nilpotents
  6.5 Generalized Eigenvectors
  6.6 Primary Decomposition

7 Canonical Forms
  7.1 Cyclic Subspaces
  7.2 Rational Canonical Form
  7.3 The Cayley-Hamilton Theorem
  7.4 Jordan Canonical Form
  7.5 Uniqueness of Canonical Forms
  7.6 Holomorphic Functional Calculus

8 Sesquilinear Forms and Inner Product Spaces
  8.1 Sesquilinear Forms
  8.2 Inner Products and Norms
  8.3 Orthonormality
  8.4 Finite Dimensional Inner Product Subspaces

9 Operators on Hilbert Space
  9.1 Hilbert Spaces
  9.2 Adjoints
  9.3 Unitaries
  9.4 Projections
  9.5 Partial Isometries
  9.6 Dirac Notation and Rank One Operators

10 The Spectral Theorems and the Functional Calculus
  10.1 Spectra of Normal Operators
  10.2 Unitary Diagonalization
  10.3 The Spectral Theorems
  10.4 The Functional Calculus
  10.5 The Polar and Singular Value Decompositions
Introduction
These notes are designed to be a self-contained treatment of linear algebra. The author assumes the reader has taken an elementary course in matrix theory and is well versed in matrices and Gaussian elimination. It is my intention that each time a definition is given, there will be at least two examples. Most times, the examples will be presented without proof that they are, in fact, examples, and it will be left to the reader to verify this.
Chapter 1
Background Material
1.1 Sets
Throughout these notes, sets will be denoted with curly brackets or letters. A vertical bar in the middle of the curly brackets means "such that." For example, N = {1, 2, 3, . . .} is the set of natural numbers, Z = {0, 1, −1, 2, −2, . . .} is the set of integers, Q = { p/q | p, q ∈ Z and q ≠ 0 } is the set of rational numbers, R is the set of real numbers (which we will not define), and C = R + iR = { x + iy | x, y ∈ R }, where i^2 + 1 = 0, is the set of complex numbers. If z = x + iy is in C where x, y are in R, then its complex conjugate is the number z̄ = x − iy. The modulus, or absolute value, of z is

    |z| = √(x^2 + y^2) = √(z·z̄).
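For example, if z = 3 + 4i, then z̄ = 3 − 4i, z·z̄ = 9 + 16 = 25, and |z| = √25 = 5.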
Definition 1.1.1. To say x is an element of the set X, we write x ∈ X. We say X is a subset of Y, denoted X ⊂ Y, if x ∈ X ⇒ x ∈ Y (the double arrow ⇒ means "implies"). Sets X and Y are equal if both X ⊂ Y and Y ⊂ X. If X ⊂ Y and we want to emphasize that X and Y may be equal, we will write X ⊆ Y. If X and Y are sets, we write

(1) X ∪ Y = { x | x ∈ X or x ∈ Y },
(2) X ∩ Y = { x | x ∈ X and x ∈ Y },
(3) X \ Y = { x ∈ X | x ∉ Y }, and
(4) X × Y = { (x, y) | x ∈ X and y ∈ Y }. X × Y is called the (Cartesian) product of X and Y.

There is a set with no elements in it, and it is denoted ∅ or {}. A subset X ⊂ Y is called proper if Y \ X ≠ ∅, i.e., there is a y ∈ Y \ X.
Remark 1.1.2. Note that sets can contain other sets. For example, {N, Z, Q, R, C} is a set. In particular, there is a difference between {∅} and ∅. The first is the set containing the empty set. The second is the empty set. There is something in the first set, namely the empty set, so it is not empty.
Examples 1.1.3.
(1) N ∪ Z = Z, N ∩ Z = N, and Z \ N = {0, −1, −2, . . .}.
(2) ∅ ∪ X = X, ∅ ∩ X = ∅, X \ X = ∅, and X \ ∅ = X for all sets X.
(3) R \ Q is the set of irrational numbers.
(4) R × R = R^2 = { (x, y) | x, y ∈ R }.

Definition 1.1.4. If X is a set, then the power set of X, denoted P(X), is { S | S ⊂ X }.

Examples 1.1.5.
(1) If X = ∅, then P(X) = {∅}.
(2) If X = {x}, then P(X) = {∅, {x}}.
(3) If X = {x, y}, then P(X) = {∅, {x}, {y}, {x, y}}.
Exercises
Exercise 1.1.6. A relation on a set X is a subset R of X × X. We usually write xRy if (x, y) ∈ R. The relation R is called
(i) reflexive if xRx for all x ∈ X,
(ii) symmetric if xRy implies yRx for all x, y ∈ X,
(iii) antisymmetric if xRy and yRx implies x = y,
(iv) transitive if xRy and yRz implies xRz, and
(v) skew-transitive if xRy and xRz implies yRz for all x, y, z ∈ X.
Find the following examples of a relation R on a set X:
1. X and R such that R is transitive, but not symmetric, antisymmetric, or reflexive,
2. X and R such that R is reflexive, symmetric, and transitive, but not antisymmetric,
3. X and R such that R is reflexive, antisymmetric, and transitive, but not symmetric, and
4. X and R such that R is reflexive, symmetric, antisymmetric, and transitive.

Exercise 1.1.7. Show that a reflexive relation R on X is symmetric and transitive if and only if R is skew-transitive.
1.2 Functions
Definition 1.2.1. A function (or map) f from the set X to the set Y, denoted f : X → Y, is a rule that associates to each x ∈ X a unique y ∈ Y, denoted f(x). The sets X and Y are called the domain and codomain of f respectively. The function f is called
(1) injective (or one-to-one) if f(x) = f(y) implies x = y,
(2) surjective (or onto) if for all y ∈ Y there is an x ∈ X such that f(x) = y, and
(3) bijective if f is both injective and surjective.
To say the element x ∈ X maps to the element y ∈ Y via f, i.e. f(x) = y, we sometimes write f : x ↦ y, or x ↦ y when f is understood. The image, or range, of f, denoted im(f) or f(X), is { y ∈ Y | there is an x ∈ X with f(x) = y }. Another way of saying that f is surjective is that im(f) = Y. The graph of f is the set { (x, y) ∈ X × Y | y = f(x) }.
Examples 1.2.2.
(1) The natural inclusion N → Z is injective.
(2) The absolute value function Z → N ∪ {0} is surjective.
(3) Neither of the first two is bijective. The map N → Z given by 1 ↦ 0, 2 ↦ 1, 3 ↦ −1, 4 ↦ 2, 5 ↦ −2, and so forth is bijective.
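In closed form, this bijection is f(n) = n/2 for n even and f(n) = −(n − 1)/2 for n odd.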
Deﬁnition 1.2.3. Suppose f : X →Y and g : Y →Z. The composite of g with f, denoted
g ◦ f, is the function X →Z given by (g ◦ f)(x) = g(f(x)).
Examples 1.2.4.
(1) If f : N → Z is the natural inclusion and g : Z → N ∪ {0} is the absolute value, then (g ◦ f)(n) = |n| = n for all n ∈ N.
(2) If f(x) = x + 1 and g(x) = x^2 as functions R → R, then (g ◦ f)(x) = (x + 1)^2, while (f ◦ g)(x) = x^2 + 1, so composition of functions is not commutative.
Proposition 1.2.5. Suppose f : X →Y and g : Y →Z.
(1) If f, g are injective, then so is g ◦ f.
(2) If f, g are surjective, then so is g ◦ f.
Proof. Exercise.
Proposition 1.2.6. Function composition is associative.
Proof. Exercise.
Definition 1.2.7. Let f : X → Y, and let W ⊂ X. Then the restriction of f to W, denoted f|_W : W → Y, is the function given by f|_W(w) = f(w) for all w ∈ W.

Examples 1.2.8.
(1) The restriction of the absolute value function R → R to [0, ∞) is the identity map on [0, ∞).
(2) The inclusion N → Z of Examples 1.2.2 is the restriction of the identity map Z → Z to N.
Proposition 1.2.9. Let f : X → Y.
(1) f is injective if and only if it has a left inverse, i.e. a function g : Y → X such that g ◦ f = id_X, the identity on X.
(2) f is surjective if and only if it has a right inverse, i.e. a function g : Y → X such that f ◦ g = id_Y.

Proof.
(1) Suppose f is injective. Then for each y ∈ im(f), there is a unique x with f(x) = y. Pick x_0 ∈ X, and define a function g : Y → X as follows:

    g(y) = x if f(x) = y, and g(y) = x_0 otherwise.

It is immediate that g ◦ f = id_X. Suppose now that there is a g such that g ◦ f = id_X. Then if f(x_1) = f(x_2), applying g to the equation yields x_1 = (g ◦ f)(x_1) = (g ◦ f)(x_2) = x_2, so f is injective.
(2) Suppose f is surjective. Then the sets I_y = { x ∈ X | f(x) = y } are nonempty for each y ∈ Y. By the axiom of choice, we may choose a representative x_y ∈ I_y for each y ∈ Y. Construct a function g : Y → X by setting g(y) = x_y. It is immediate that f ◦ g = id_Y. Suppose now that f has a right inverse g. Then if y ∈ Y, we have that (f ◦ g)(y) = y, so y ∈ im(f) and f is surjective.
Remark 1.2.10. The axiom of choice is formulated as follows:
Let X be a collection of nonempty sets. A choice function f is a function such that for every S ∈ X, f(S) ∈ S.
Axiom of Choice: If X is a set of nonempty sets, then there exists a choice function f on X.
Note that in the proof of 1.2.9 (2), our collection of nonempty sets is { I_y | y ∈ Y }, and our choice function is I_y ↦ x_y.
We give a corollary whose proof uses a uniqueness technique found frequently in mathematics.

Corollary 1.2.11. f : X → Y is bijective if and only if it admits an inverse g : Y → X such that f ◦ g = id_Y and g ◦ f = id_X.

Proof. It is clear by 1.2.9 that if an inverse exists, f is bijective as it admits both a left and a right inverse. Suppose now that f is bijective. Then by the preceding proposition, there is a left inverse g and a right inverse h. We then see that

    g = g ◦ id_Y = g ◦ (f ◦ h) = (g ◦ f) ◦ h = id_X ◦ h = h,

so g = h is an inverse of f. Note that this direction uses the axiom of choice as it was used in 1.2.9. To prove this fact without the axiom of choice, we define g : Y → X by g(y) = x if f(x) = y. As the sets I_y as in the proof of 1.2.9 each have only one element in them, the axiom of choice is unnecessary.
Remarks 1.2.12.
(1) The proof of 1.2.11 also shows that if f admits an inverse, then it is unique.
(2) Note that we used the associativity of composition of functions in the proof of 1.2.11.

Definition 1.2.13. For n ∈ N, let [n] = {1, 2, . . . , n}. The set X
(1) has n elements if there exists a bijection [n] → X,
(2) is finite if there is a surjection [n] → X for some n ∈ N,
(3) is infinite if there is an injection N → X (equivalently, if X is not finite),
(4) is countable if there is a surjection N → X,
(5) is denumerable if there is a bijection N → X, and
(6) is uncountable if X is not countable.

Examples 1.2.14.
(1) [n] has n elements and N is denumerable.
(2) Q is countable, and R is uncountable.
(3) [n] is countable, but not denumerable.
(4) If X has n elements, then P(X) has 2^n elements.

Definition 1.2.15. If A is a finite set, then |A| ∈ N is the number n such that A has n elements.
Exercises
Exercise 1.2.16. Prove 1.2.5.
Exercise 1.2.17. Show that Q is countable.
Hint: Find a surjection Z^2 \ (Z × {0}) → Q, and construct a surjection N → Z^2 \ (Z × {0}). Then use 1.2.5.
1.3 Fields
Definition 1.3.1. A binary operation on a set X is a function #: X × X → X given by (x, y) ↦ x#y. A binary operation #: X × X → X is called
(1) associative if x#(y#z) = (x#y)#z for all x, y, z ∈ X, and
(2) commutative if x#y = y#x for all x, y ∈ X.

Examples 1.3.2.
(1) Addition and multiplication on Z, Q, R, C are associative and commutative binary operations.
(2) The cross product ×: R^3 × R^3 → R^3 is neither commutative nor associative.
Definition 1.3.3. A field (F, +, ·) is a triple consisting of a set F and two binary operations + and · on F, called addition and multiplication respectively, such that the following axioms are satisfied:
(F1) additive associativity: (a + b) + c = a + (b + c) for all a, b, c ∈ F;
(F2) additive identity: there exists 0 ∈ F such that a + 0 = a = 0 + a for all a ∈ F;
(F3) additive inverse: for each a ∈ F, there exists an element b ∈ F such that a + b = 0 = b + a;
(F4) additive commutativity: a + b = b + a for all a, b ∈ F;
(F5) distributivity: a·(b + c) = a·b + a·c and (a + b)·c = a·c + b·c for all a, b, c ∈ F;
(F6) multiplicative associativity: (a·b)·c = a·(b·c) for all a, b, c ∈ F;
(F7) multiplicative identity: there exists 1 ∈ F \ {0} such that a·1 = a = 1·a for all a ∈ F;
(F8) multiplicative inverse: for each a ∈ F \ {0}, there exists b ∈ F \ {0} such that a·b = 1 = b·a;
(F9) multiplicative commutativity: a·b = b·a for all a, b ∈ F.
Examples 1.3.4.
(1) Q, R, C are fields.
(2) Q(√n) = { x + y√n | x, y ∈ Q } is a field.
Remark 1.3.5. These axioms are also used in defining the following algebraic structures:

    axioms       name            axioms       name
    (F1)         semigroup       (F1)-(F5)    nonassociative ring
    (F1)-(F2)    monoid          (F1)-(F6)    ring
    (F1)-(F3)    group           (F1)-(F7)    ring with unity
    (F1)-(F4)    abelian group   (F1)-(F8)    division ring
                                 (F1)-(F9)    field
Remarks 1.3.6.
(1) The additive inverse of a ∈ F in (F3) is usually denoted −a.
(2) The multiplicative inverse of a ∈ F \ {0} in (F8) is usually denoted a^{−1} or 1/a.
(3) For simplicity, we will often denote the field by F instead of (F, +, ·).

Definition 1.3.7. Let F be a field. A subfield K ⊂ F is a subset such that +|_{K×K} and ·|_{K×K} are well defined binary operations on K, (K, +|_{K×K}, ·|_{K×K}) is a field, and the multiplicative identity of K is the multiplicative identity of F.
Examples 1.3.8.
(1) Q is a subfield of R, and R is a subfield of C.
(2) Q(√2) is a subfield of R.
(3) The irrational numbers R \ Q do not form a subfield of R.
Finite Fields

Theorem 1.3.9 (Euclidean Algorithm). Suppose x ∈ Z_{≥0} and n ∈ N. Then there are unique k, r ∈ Z_{≥0} with r < n such that x = kn + r.

Proof. There is a largest k ∈ Z_{≥0} such that kn ≤ x. Set r = x − kn. Then 0 ≤ r < n and x = kn + r. It is obvious that k, r are unique.

Definition 1.3.10. For x ∈ Z_{≥0} and n ∈ N, we define x mod n = r if x = kn + r with 0 ≤ r < n.
Examples 1.3.11.
(1) 17 mod 5 = 2, since 17 = 3·5 + 2.
(2) 12 mod 4 = 0, since 12 = 3·4 + 0.
Definition 1.3.12.
(1) For x, y ∈ Z, we say x divides y, denoted x|y, if there is a k ∈ Z such that y = kx.
(2) For x, y ∈ N, the greatest common divisor, denoted gcd(x, y), is the largest number k ∈ N such that k|x and k|y.
(3) We call x, y ∈ N relatively prime if gcd(x, y) = 1.
(4) A number p ∈ N \ {1} is called prime if gcd(p, n) = 1 for all n ∈ [p − 1].
Proposition 1.3.13. x, y ∈ N are relatively prime if and only if there are r, s ∈ Z such that rx + sy = 1.

Proof. If gcd(x, y) = k, then k|(rx + sy) for all r, s ∈ Z, so the existence of r, s ∈ Z such that rx + sy = 1 implies gcd(x, y) = 1.
Now suppose gcd(x, y) = 1. Let S = { sx + ty ∈ N | s, t ∈ Z }. Then S has a smallest element n = sx + ty for some s, t ∈ Z. By the Euclidean Algorithm 1.3.9, there are unique k, r ∈ Z_{≥0} with r < n such that x = kn + r. But then

    r = x − kn = x − k(sx + ty) = (1 − ks)x + (−kt)y,

so if r > 0, then r ∈ S with r < n, contradicting the minimality of n. Hence r = 0, and x = kn. Similarly, we have an l ∈ Z_{≥0} such that y = ln. Hence n|x and n|y, so n = 1.
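The r and s of Proposition 1.3.13 can be computed with the extended Euclidean algorithm; the following Python sketch (an illustration, not part of the original notes) maintains the invariant g = r·x + s·y:

    def extended_gcd(x, y):
        # returns (g, r, s) with g = gcd(x, y) and g == r*x + s*y
        if y == 0:
            return (x, 1, 0)
        g, r, s = extended_gcd(y, x % y)
        # here g == r*y + s*(x % y) == s*x + (r - (x // y)*s)*y
        return (g, s, r - (x // y) * s)

    g, r, s = extended_gcd(34, 55)   # 34 and 55 are relatively prime
    assert g == 1 and r * 34 + s * 55 == 1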
Definition 1.3.14. For n ∈ N \ {1}, we define Z/n = {0, 1, . . . , n − 1}, and we define binary operations # and ∗ on Z/n by

    x#y = (x + y) mod n  and  x ∗ y = (xy) mod n.

Proposition 1.3.15. (Z/n, #, ∗) is a field if and only if n is prime.

Proof. First note that Z/n has additive inverses, as the additive inverse of x ∈ Z/n is n − x. It then follows that Z/n is a commutative ring with unit (one must check that addition and multiplication are compatible with the operation x ↦ x mod n). Thus Z/n is a field if and only if it has multiplicative inverses. We claim that x ∈ Z/n \ {0} has a multiplicative inverse if and only if gcd(x, n) = 1, which immediately implies the result.
Suppose gcd(x, n) = 1. By 1.3.13, we have r, s ∈ Z with rx + sn = 1. Hence

    (r mod n) ∗ x = (rx) mod n = (rx + sn) mod n = 1 mod n = 1,

so x has a multiplicative inverse.
Now suppose x has a multiplicative inverse y. Then (xy) mod n = 1, so there is a k ∈ Z such that xy = kn + 1. Thus xy + (−k)n = 1, and gcd(x, n) = 1 by 1.3.13.
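As a quick computational companion to Proposition 1.3.15 (a sketch, not from the notes), one can search Z/n \ {0} for multiplicative inverses directly:

    def is_field(n):
        # Z/n is a field iff every nonzero x has a y with (x*y) mod n == 1
        return all(any((x * y) % n == 1 for y in range(1, n)) for x in range(1, n))

    print([n for n in range(2, 20) if is_field(n)])   # [2, 3, 5, 7, 11, 13, 17, 19]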
Exercises
Exercise 1.3.16 (Fields). Show that Q(√2) = { a + b√2 | a, b ∈ Q } is a field, where addition and multiplication are the restrictions of addition and multiplication in R.

Exercise 1.3.17. Construct addition and multiplication tables for Z/2, Z/3, and Z/5, and deduce from the tables that they are fields.
1.4 Matrices
This section is assumed to be prerequisite knowledge for the student, so definitions in this section will be given without examples, and all results will be stated without proof. For this section, F is a field.

Definition 1.4.1. An m × n matrix A over F is a function A: [m] × [n] → F. Usually, A is denoted by an m × n array of elements of F, and the (i, j)th entry, denoted A_{i,j}, is A(i, j) where (i, j) ∈ [m] × [n]. The set of all m × n matrices over F is denoted M_{m×n}(F), and we will write M_n(F) = M_{n×n}(F). The ith row of the matrix A is the 1 × n matrix A|_{{i}×[n]}, and the jth column is the m × 1 matrix A|_{[m]×{j}}. The identity matrix I ∈ M_n(F) is the matrix given by

    I_{i,j} = 0 if i ≠ j, and I_{i,j} = 1 if i = j.
Definition 1.4.2.
(1) If A, B ∈ M_{m×n}(F), then A + B is the matrix in M_{m×n}(F) given by (A + B)_{i,j} = A_{i,j} + B_{i,j}.
(2) If A ∈ M_{m×n}(F) and λ ∈ F, then λA ∈ M_{m×n}(F) is the matrix given by (λA)_{i,j} = λA_{i,j}.
(3) If A ∈ M_{m×n}(F) and B ∈ M_{n×p}(F), then AB ∈ M_{m×p}(F) is the matrix given by

    (AB)_{i,j} = Σ_{k=1}^n A_{i,k} B_{k,j}.

(4) If A ∈ M_{m×n}(F) and B ∈ M_{m×p}(F), then [A|B] ∈ M_{m×(n+p)}(F) is the matrix given by

    [A|B]_{i,j} = A_{i,j} if j ≤ n, and B_{i,j−n} otherwise.
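Formula (3) transcribes directly into code; a small Python sketch (for concreteness only, representing matrices as lists of rows):

    def mat_mul(A, B):
        # (AB)[i][j] = sum over k of A[i][k] * B[k][j]
        m, n, p = len(A), len(B), len(B[0])
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
                for i in range(m)]

    assert mat_mul([[1, 0], [0, 1]], [[2, 3], [4, 5]]) == [[2, 3], [4, 5]]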
Remark 1.4.3. Note that matrix addition and multiplication are associative.

Proposition 1.4.4. The identity matrix I ∈ M_n(F) is the unique matrix in M_n(F) such that AI = A for all A ∈ M_{m×n}(F) for all m ∈ N and IB = B for all B ∈ M_{n×p}(F) for all p ∈ N.

Proof. It is clear that AI = A for all A ∈ M_{m×n}(F) for all m ∈ N and IB = B for all B ∈ M_{n×p}(F) for all p ∈ N. If J ∈ M_n(F) is another such matrix, then J = IJ = I.

Definition 1.4.5. A matrix A ∈ M_n(F) is invertible if there is a matrix B ∈ M_n(F) such that AB = BA = I.

Proposition 1.4.6. If A ∈ M_n(F) is invertible, then the inverse is unique.

Remark 1.4.7. In this case, the unique inverse of A is usually denoted A^{−1}.
Definition 1.4.8.
(1) The transpose of the matrix A ∈ M_{m×n}(F) is the matrix A^T ∈ M_{n×m}(F) given by (A^T)_{i,j} = A_{j,i}.
(2) The adjoint of the matrix A ∈ M_{m×n}(C) is the matrix A^* ∈ M_{n×m}(C) given by (A^*)_{i,j} = A̅_{j,i}, the complex conjugate of A_{j,i}.
Proposition 1.4.9.
(1) If A, B ∈ M_{m×n}(F) and λ ∈ F, then (A + λB)^T = A^T + λB^T. If F = C, then (A + λB)^* = A^* + λ̄B^*.
(2) If A ∈ M_{m×n}(F) and B ∈ M_{n×p}(F), then (AB)^T = B^T A^T. If F = C, then (AB)^* = B^* A^*.
(3) If A ∈ M_n(F) is invertible, then A^T is invertible, and (A^T)^{−1} = (A^{−1})^T. If F = C, then A^* is invertible, and (A^*)^{−1} = (A^{−1})^*.

Definition 1.4.10. Let A ∈ M_n(F). We call A
(1) upper triangular if i > j implies A_{i,j} = 0,
(2) lower triangular if A^T is upper triangular, i.e. j > i implies A_{i,j} = 0, and
(3) diagonal if A is both upper and lower triangular, i.e. i ≠ j implies A_{i,j} = 0.
Definition 1.4.11. Let A ∈ M_n(F). We call A
(1) block upper triangular if there are square matrices A_1, . . . , A_m with m ≥ 2 such that

    A = [ A_1      ∗   ]
        [     ⋱        ]
        [ 0        A_m ]

where the ∗ denotes entries in F,
(2) block lower triangular if A^T is block upper triangular, and
(3) block diagonal if A is both block upper triangular and block lower triangular.
Definition 1.4.12. Matrices A, B ∈ M_n(F) are similar, denoted A ∼ B, if there is an invertible S ∈ M_n(F) such that S^{−1}AS = B.

Exercises

Let F be a field.

Exercise 1.4.13. Show that similarity gives a relation on M_n(F) that is reflexive, symmetric, and transitive (see 1.1.6).

Exercise 1.4.14. Prove 1.4.9.
1.5 Systems of Linear Equations
This section is assumed to be prerequisite knowledge for the student, so definitions in this section will be given without examples, and all results will be stated without proof. For this section, F is a field.

Definition 1.5.1.
(1) An F-linear equation, or a linear equation over F, is an equation of the form

    Σ_{i=1}^n λ_i x_i = λ_1 x_1 + ··· + λ_n x_n = µ,  where λ_1, . . . , λ_n, µ ∈ F.

The x_i's are called the variables, and λ_i is called the coefficient of the variable x_i. An F-linear equation is completely determined by a matrix

    A = [λ_1 ··· λ_n] ∈ M_{1×n}(F)

and a number µ ∈ F. In this sense, we can give an equivalent definition of an F-linear equation as a pair (A, µ) with A ∈ M_{1×n}(F) and µ ∈ F.
(2) A solution of the F-linear equation Σ_{i=1}^n λ_i x_i = µ is an element

    (r_1, . . . , r_n) ∈ F^n = F × ··· × F  (n copies)

such that Σ_{i=1}^n λ_i r_i = λ_1 r_1 + ··· + λ_n r_n = µ. Equivalently, a solution to the F-linear equation (A, µ) is an x ∈ M_{n×1}(F) such that Ax = µ.

(3) A system of F-linear equations is a finite number m of F-linear equations:

    λ_{1,1} x_1 + ··· + λ_{1,n} x_n = µ_1
        ⋮
    λ_{m,1} x_1 + ··· + λ_{m,n} x_n = µ_m

where λ_{i,j} ∈ F for all i, j. Equivalently, a system of F-linear equations is a pair (A, y) where A ∈ M_{m×n}(F) and y ∈ M_{m×1}(F).
(4) A solution to the system of F-linear equations

    λ_{1,1} x_1 + ··· + λ_{1,n} x_n = µ_1
        ⋮
    λ_{m,1} x_1 + ··· + λ_{m,n} x_n = µ_m

is an element (r_1, . . . , r_n) ∈ F^n such that

    λ_{1,1} r_1 + ··· + λ_{1,n} r_n = µ_1
        ⋮
    λ_{m,1} r_1 + ··· + λ_{m,n} r_n = µ_m.

Equivalently, a solution to the system of F-linear equations (A, y) is an x ∈ M_{n×1}(F) such that Ax = y.
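For instance, over F = R, the system x_1 + 2x_2 = 5, 3x_1 + 4x_2 = 11 is the pair (A, y) with A the 2 × 2 matrix with rows (1, 2) and (3, 4) and y = (5, 11)^T; its unique solution is x = (1, 2)^T.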
Definition 1.5.2.
(1) An elementary row operation on a matrix A ∈ M_{m×n}(F) is performed by
(i) doing nothing,
(ii) switching two rows,
(iii) multiplying one row by a nonzero constant λ ∈ F, or
(iv) replacing the ith row with the sum of the ith row and a scalar (constant) multiple of another row.
(2) An n × n elementary matrix is a matrix obtained from the identity matrix I ∈ M_n(F) by performing not more than one elementary row operation.
(3) Matrices A, B ∈ M_{m×n}(F) are row equivalent if there are elementary matrices E_1, . . . , E_n such that A = E_n ··· E_1 B.
Remark 1.5.3. Elementary row operations are invertible, i.e. every elementary row operation can be undone by another elementary row operation.

Proposition 1.5.4. Performing an elementary row operation on a matrix A ∈ M_{m×n}(F) is equivalent to multiplying A on the left by the elementary matrix obtained by doing the same elementary row operation to the identity.

Corollary 1.5.5. Elementary matrices are invertible.

Proposition 1.5.6. Suppose A ∈ M_n(F). Then A is invertible if and only if A is row equivalent to I.

Definition 1.5.7. A pivot of the ith row of the matrix A ∈ M_{m×n}(F) is the first nonzero entry in the ith row. If there is no such entry, then the row has no pivot.

Definition 1.5.8. Let A ∈ M_{m×n}(F).
(1) A is said to be in row echelon form if the pivot in the (i+1)th row (if it exists) is in a column strictly to the right of the pivot in the ith row (if it exists) for i = 1, . . . , m − 1, i.e. if the pivot of the ith row is A_{i,j} and the pivot of the (i+1)th row is A_{i+1,k}, then k > j.
(2) A is said to be in reduced row echelon form, or is said to be row reduced, if A is in row echelon form, all pivots of A are equal to 1, and all entries that occur above a pivot are zeroes.

Theorem 1.5.9 (Gaussian Elimination Algorithm). Every matrix over F is row equivalent to a matrix in row echelon form, and every matrix over F is row equivalent to a unique matrix in reduced row echelon form.

Theorem 1.5.10. Suppose A ∈ M_{m×n}(F). The system of F-linear equations (A, y) has a solution if and only if the augmented matrix [A|y] can be row reduced so that no pivot occurs in the (n+1)th column. If a solution exists, it is unique if and only if every column of A contains a pivot after row reduction; when A is square, this means A can be row reduced to the identity matrix.
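The algorithm behind Theorems 1.5.9 and 1.5.10 is short enough to sketch in Python (an illustration, not part of the notes; Fractions keep the arithmetic in Q):

    from fractions import Fraction

    def rref(M):
        # returns the reduced row echelon form of M, given as a list of rows
        A = [[Fraction(x) for x in row] for row in M]
        rows, cols, r = len(A), len(A[0]), 0
        for c in range(cols):
            pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
            if pivot is None:
                continue                                  # no pivot in this column
            A[r], A[pivot] = A[pivot], A[r]               # (ii) switch rows
            A[r] = [x / A[r][c] for x in A[r]]            # (iii) scale the pivot to 1
            for i in range(rows):
                if i != r and A[i][c] != 0:               # (iv) clear the column
                    A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
            r += 1
        return A

    print(rref([[1, 2, 5], [3, 4, 11]]))   # rows (1, 0, 1), (0, 1, 2): solution (1, 2)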
Exercises
Let F be a ﬁeld.
Exercise 1.5.11. Prove 1.5.5.
Exercise 1.5.12. Prove 1.5.9.
Exercise 1.5.13. Prove 1.5.10.
Exercise 1.5.14. Prove 1.5.6.
Chapter 2
Vector Spaces
The objects that we will study this semester are vector spaces. The main theorem in this
chapter is that vector spaces over a ﬁeld F are classiﬁed by their dimension. In this chapter
F will denote a ﬁeld.
2.1 Definition

Definition 2.1.1. A vector space consists of
(1) a scalar field F;
(2) a set V, whose elements are called vectors;
(3) a binary operation + on V called vector addition such that
(V1) addition is associative, i.e. u + (v + w) = (u + v) + w for all u, v, w ∈ V,
(V2) addition is commutative, i.e. u + v = v + u for all u, v ∈ V,
(V3) vector additive identity: there is a vector 0 ∈ V such that 0 + v = v for all v ∈ V, and
(V4) vector additive inverse: for each v ∈ V, there is a −v ∈ V such that v + (−v) = 0;
(4) and a map ·: F × V → V given by (λ, v) ↦ λ·v, called scalar multiplication, such that
(V5) scalar unit: 1·v = v for all v ∈ V, and
(V6) scalar associativity: λ·(µ·v) = (λµ)·v for all v ∈ V and λ, µ ∈ F;
such that the distributive properties hold:
(V7) λ·(u + v) = (λ·u) + (λ·v) for all λ ∈ F and u, v ∈ V, and
(V8) (λ + µ)·v = (λ·v) + (µ·v) for all λ, µ ∈ F and v ∈ V.
Examples 2.1.2.
(1) (F, +, ·) is a vector space over F, where 0 is the additive identity and −x is the additive inverse of x ∈ F.
(2) Let M_{m×n}(F) denote the m × n matrices over F. (M_{m×n}(F), +, ·) is a vector space where + is addition of matrices and · is the usual scalar multiplication: if A = (A_{ij}), B = (B_{ij}) ∈ M_{m×n}(F) and λ ∈ F, then (A + B)_{ij} = A_{ij} + B_{ij} and (λA)_{ij} = λA_{ij}.
(3) Note that F^n = M_{n×1}(F) is a vector space over F.
(4) Let K ⊂ F be a subfield. Then F is a vector space over K.
(5) F(X, F) = {f : X → F} is a vector space over F with pointwise addition and scalar multiplication: (λf + g)(x) = λf(x) + g(x).
(6) C(X, F), the set of continuous functions from X ⊂ F to F, is a vector space over F. We will write C(a, b) = C((a, b), R), and similarly for closed or half-open intervals.
(7) C^1(a, b), the set of R-valued, continuously differentiable functions on the interval (a, b) ⊂ R, is a vector space over R. This example can be generalized to C^n(a, b), the n-times continuously differentiable functions on (a, b), and C^∞(a, b), the infinitely differentiable functions.
Now that we know what a vector space is, we begin to develop the necessary tools to prove that vector spaces over F are classified by their dimension. From this point on, let (V, +, ·) be a vector space over the field F.

Definition 2.1.3. Let (V, +, ·) be a vector space over F. Then W ⊂ V is a vector subspace (or subspace) if +|_{W×W} is an addition on W (W is closed under addition), ·|_{F×W} is a scalar multiplication on W (W is closed under scalar multiplication), and (W, +|_{W×W}, ·|_{F×W}) is a vector space over F. A subspace W ⊂ V is called proper if W ≠ V.
Examples 2.1.4.
(1) Every vector space has a zero subspace (0) consisting of the vector 0. Also, if V is a vector space, V is a subspace of V.
(2) R^2 is not a subspace of R^3; R^2 is not even a subset of R^3. There are, however, subspaces of R^3 which look exactly like R^2. We will explain what this means when we discuss the concept of isomorphism of vector spaces.
(3) If A ∈ M_{m×n}(R) is an m × n matrix, then the null space NS(A) ⊂ R^n, consisting of all x ∈ R^n such that Ax = 0, is a subspace.
(4) C^n(a, b) is a subspace of C(a, b) for all n ∈ N ∪ {∞}.
(5) C(a, b) is a subspace of F((a, b), R).

Proposition 2.1.5. A subset W ⊂ V is a subspace if it is closed under addition and scalar multiplication, i.e. λu + v ∈ W for all u, v ∈ W and λ ∈ F.
Proof. If W is closed under addition and scalar multiplication, then clearly +|_{W×W} and ·|_{F×W} satisfy the distributive properties. It remains to check that W has additive inverses and contains 0. Since −1 ∈ F, we have that if w ∈ W, then (−1)·w ∈ W. It is an easy exercise to check that (−1)·w = −w, the additive inverse of w. Now since w, −w ∈ W, we have w + (−w) = 0 ∈ W, and we are finished.
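For example, W = { (x, 2x) | x ∈ R } ⊂ R^2 satisfies the criterion: λ(x, 2x) + (y, 2y) = (λx + y, 2(λx + y)) ∈ W, so W is a subspace of R^2.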
Exercises
V will denote a vector space over F. When confusion can arise, 0_V will denote the additive identity of V, and 0_F will denote the additive identity of F.

Exercise 2.1.6. Show that (R_{>0}, #, ·) is a vector space over R, where x#y = xy is the addition and r·x = x^r is the scalar multiplication.

Exercise 2.1.7. Show that if V is a vector space over F, then the additive identity is unique. Show that if v ∈ V, then the additive inverse of v is unique.

Exercise 2.1.8. Let λ ∈ F. Show that λ·0_V = 0_V.

Exercise 2.1.9. Let v ∈ V and λ ∈ F \ {0}. Show that λ·v = 0 implies v = 0.

Exercise 2.1.10. Let v ∈ V. Show that 0_F·v = 0_V.

Exercise 2.1.11. Let v ∈ V. Show that (−1)·v = −v, the additive inverse of v.
Exercise 2.1.12 (Complexification). Let (V, +, ·) be a vector space over R. Let V_C = V × V, and define functions +: V_C × V_C → V_C and ·: C × V_C → V_C by

    (u_1, v_1) + (u_2, v_2) = (u_1 + u_2, v_1 + v_2)  and  (x + iy)·(u_1, v_1) = (xu_1 − yv_1, yu_1 + xv_1).

(1) Show (V_C, +, ·) is a vector space over C, called the complexification of the real vector space V.
(2) Show that (0, v) = i(v, 0) for all v ∈ V.
Note: This implies that we can think of V ⊂ V_C as all vectors of the form (v, 0), and we can write (u, v) = u + iv. Addition and scalar multiplication can then be rewritten in the naive way as

    (u_1 + iv_1) + (u_2 + iv_2) = (u_1 + u_2) + i(v_1 + v_2)  and  (x + iy)(u_1 + iv_1) = xu_1 − yv_1 + i(yu_1 + xv_1).

For w = u + iv ∈ V_C, the real part of w is Re(w) = u ∈ V, the imaginary part is Im(w) = v ∈ V, and the conjugate is w̄ = u − iv ∈ V_C.
(3) Show that u_1 + iv_1 = u_2 + iv_2 if and only if u_1 = u_2 and v_1 = v_2. Hence, two vectors are equal if and only if their real and imaginary parts are equal.
(4) Find (R^n)_C and (M_{m×n}(R))_C.
Hint: Use the identification V_C = V + iV described in the note.
Exercise 2.1.13 (Quotient Spaces). Let W ⊂ V be a subspace. For v ∈ V, define

    v + W = { v + w | w ∈ W }.

v + W is called a coset of W. Let V/W = { v + W | v ∈ V }, the set of cosets of W.
(1) Show that u + W = v + W if and only if u − v ∈ W.
(2) Define an addition # and a scalar multiplication ∗ on V/W so that (V/W, #, ∗) is a vector space.
2.2 Linear Combinations, Span, and (Internal) Direct Sum

Definition 2.2.1. Let v_1, . . . , v_n ∈ V. A linear combination of v_1, . . . , v_n is a vector of the form

    v = Σ_{i=1}^n λ_i v_i

where λ_i ∈ F for all i = 1, . . . , n. It is convention that the empty linear combination is equal to zero (i.e. the sum of no vectors is zero).

Definition 2.2.2. Let S ⊂ V be a subset. Then

    span(S) = ⋂ { W | W is a subspace of V with S ⊂ W },

the intersection of all subspaces of V containing S.
Examples 2.2.3.
(1) If A ∈ M_{m×n}(F) is a matrix, then the row space of A is the subspace of R^n spanned by the rows of A, and the column space of A is the subspace of R^m spanned by the columns of A. These subspaces are denoted RS(A) and CS(A) respectively.
(2) span(∅) = (0), the zero subspace.

Remarks 2.2.4. Suppose S ⊂ V.
(1) span(S) is the smallest subspace of V containing S, i.e. if W is a subspace containing S, then span(S) ⊂ W.
(2) Since span(S) is closed under addition and scalar multiplication, all (finite) linear combinations of elements of S are contained in span(S). As the set of all linear combinations of elements of S forms a subspace, we have that

    span(S) = { Σ_{i=1}^n λ_i v_i | n ∈ Z_{≥0}, λ_i ∈ F, v_i ∈ S for all i = 1, . . . , n }.

Note that this still makes sense for S = ∅, as span(S) contains the empty linear combination, zero.
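For example, span{(1, 0), (1, 1)} = R^2, since (x, y) = (x − y)(1, 0) + y(1, 1) for every (x, y) ∈ R^2.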
Definition 2.2.5. If V is a vector space and W_1, . . . , W_n ⊂ V are subspaces, then we define the subspace

    W_1 + ··· + W_n = Σ_{i=1}^n W_i = { Σ_{i=1}^n w_i | w_i ∈ W_i for all i ∈ [n] } ⊂ V.

Examples 2.2.6.
(1) span{(1, 0)} + span{(0, 1)} = R^2.
(2) In R^3, span{(1, 0, 0), (0, 1, 0)} + span{(0, 1, 0), (0, 0, 1)} = R^3.
Proposition 2.2.7. Suppose W_1, . . . , W_n ⊂ V are subspaces. Then

    Σ_{i=1}^n W_i = span( ∪_{i=1}^n W_i ).

Proof. Suppose v ∈ Σ_{i=1}^n W_i. Then v is a linear combination of elements of the W_i's, so v ∈ span( ∪_{i=1}^n W_i ).
Now suppose v ∈ span( ∪_{i=1}^n W_i ). Then v is a linear combination of elements of the W_i's, so there are vectors w^i_j and scalars λ^i_j for i ∈ [n] and j ∈ [m_i], with w^i_j ∈ W_i for all i, j, such that

    v = Σ_{i=1}^n Σ_{j=1}^{m_i} λ^i_j w^i_j.

For i ∈ [n], set

    u_i = Σ_{j=1}^{m_i} λ^i_j w^i_j ∈ W_i

to see that v = Σ_{i=1}^n u_i ∈ Σ_{i=1}^n W_i.
Definition-Proposition 2.2.8. Suppose W_1, . . . , W_n ⊂ V are subspaces, and let

    W = Σ_{i=1}^n W_i.

The following conditions are equivalent:
(1) W_i ∩ Σ_{j≠i} W_j = (0) for all i ∈ [n],
(2) for each v ∈ W, there are unique w_i ∈ W_i for i ∈ [n] such that v = Σ_{i=1}^n w_i, and
(3) if w_i ∈ W_i for i ∈ [n] are such that Σ_{i=1}^n w_i = 0, then w_i = 0 for all i ∈ [n].
If any of the above three conditions is satisfied, we call W the direct sum of the W_i's, denoted

    W = ⊕_{i=1}^n W_i.
Proof.
(1) ⇒ (2): Suppose W_i ∩ Σ_{j≠i} W_j = (0) for all i ∈ [n], and let v ∈ W. Since W = Σ_{i=1}^n W_i, there are w_i ∈ W_i for i ∈ [n] such that v = Σ_{i=1}^n w_i. Suppose also that v = Σ_{i=1}^n w_i′ with w_i′ ∈ W_i for all i ∈ [n]. Then

    0 = v − v = Σ_{i=1}^n w_i − Σ_{i=1}^n w_i′ = Σ_{i=1}^n (w_i − w_i′).

Since w_i − w_i′ ∈ W_i for all i ∈ [n], we have

    w_j − w_j′ = Σ_{i≠j} (w_i′ − w_i) ∈ W_j ∩ Σ_{i≠j} W_i = (0),

so w_j − w_j′ = 0. Similarly, w_i = w_i′ for all i ∈ [n], and the expression is unique.
(2) ⇒ (3): Trivial.
(3) ⇒ (1): Now suppose (3) holds, and suppose

    w ∈ W_i ∩ Σ_{j≠i} W_j

for some i ∈ [n]. Then we have w_j ∈ W_j for all j ≠ i such that

    w = Σ_{j≠i} w_j.

Setting w_i = −w, we have that

    0 = w − w = Σ_{j=1}^n w_j,

so w_j = 0 for all j ∈ [n], and in particular w_i = −w = 0. Thus w = 0.
Remarks 2.2.9.
(1) If V = ⊕_{i=1}^n W_i and v = Σ_{i=1}^n w_i where w_i ∈ W_i for all i ∈ [n], then w_i is called the W_i-component of v for i ∈ [n].
(2) The proof of (1) ⇒ (2) highlights another important proof technique, the "in two places at once" technique.

Examples 2.2.10.
(1) V = V ⊕ (0) for all vector spaces V.
(2) R^2 = span{(1, 0)} ⊕ span{(0, 1)}.
(3) C(R, R) = {even functions} ⊕ {odd functions}.
(4) Suppose Y ⊂ X, a set. Then

    F(X, F) = { f ∈ F(X, F) | f(y) = 0 for all y ∈ Y } ⊕ { f ∈ F(X, F) | f(x) = 0 for all x ∉ Y }.
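The decomposition in example (3) is explicit: every f ∈ C(R, R) equals f_e + f_o, where f_e(x) = (f(x) + f(−x))/2 is even and f_o(x) = (f(x) − f(−x))/2 is odd, and this is the only such decomposition.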
Exercises
2.3 Linear Independence and Bases
Definition 2.3.1. A subset S ⊂ V is linearly independent if for every finite subset {v_1, . . . , v_n} ⊂ S,

    Σ_{i=1}^n λ_i v_i = 0 implies that λ_i = 0 for all i ∈ [n].

We say the vectors v_1, . . . , v_n are linearly independent if {v_1, . . . , v_n} is linearly independent. If a set S is not linearly independent, it is linearly dependent.
Examples 2.3.2.
(1) Letting e_i = (0, . . . , 0, 1, 0, . . . , 0)^T ∈ F^n, where the 1 is in the ith slot, for i ∈ [n], we get that { e_i | i ∈ [n] } is linearly independent.
(2) Define E_{i,j} ∈ M_{m×n}(F) by

    (E_{i,j})_{k,l} = 1 if i = k and j = l, and 0 otherwise.

Then { E_{i,j} | i ∈ [m], j ∈ [n] } is linearly independent.
(3) The functions δ_x : X → F given by

    δ_x(y) = 1 if x = y, and 0 otherwise

are linearly independent in F(X, F), i.e. { δ_x | x ∈ X } is linearly independent.
Remark 2.3.3. Note that the zero element of a vector space is never in a linearly independent
set.
Proposition 2.3.4. Let S_1 ⊂ S_2 ⊂ V.
(1) If S_1 is linearly dependent, then so is S_2.
(2) If S_2 is linearly independent, then so is S_1.

Proof.
(1) If S_1 is linearly dependent, then there is a finite subset {v_1, . . . , v_n} ⊂ S_1 and scalars λ_1, . . . , λ_n, not all zero, such that

    Σ_{i=1}^n λ_i v_i = 0.

Then as {v_1, . . . , v_n} is a finite subset of S_2, S_2 is not linearly independent.
(2) Let {v_1, . . . , v_n} ⊂ S_1, and suppose

    Σ_{i=1}^n λ_i v_i = 0.

Since {v_1, . . . , v_n} is also a finite subset of S_2 (padding the linear combination with zero coefficients on any extra vectors if necessary), the linear independence of S_2 gives λ_i = 0 for all i.
Remark 2.3.5. Note that in the proof of 2.3.4 (1), we have scalars λ_1, . . . , λ_n which are not all zero. This means that there is a λ_i ≠ 0 for some i ∈ [n]. This is different from saying that λ_1, . . . , λ_n are all not zero. The order of "not" and "all" is extremely important, and it is often confused by beginning students.
Proposition 2.3.6. Suppose S is a linearly independent subset of V, and suppose v ∉ span(S). Then S ∪ {v} is linearly independent.

Proof. Let {v_1, . . . , v_n} be a finite subset of S. Then we know

    Σ_{i=1}^n λ_i v_i = 0 ⇒ λ_i = 0 for all i.

Now suppose

    µv + Σ_{i=1}^n µ_i v_i = 0.

If µ ≠ 0, then we have

    v = (−1/µ) Σ_{i=1}^n µ_i v_i = Σ_{i=1}^n (−µ_i/µ) v_i.

Hence v is a linear combination of the v_i's, and v ∈ span(S), a contradiction. Hence µ = 0, and then all µ_i = 0 by the linear independence of S.
Proposition 2.3.7. Let S ⊂ V be linearly independent, and let v ∈ span(S) \ {0}. Then there are unique v_1, . . . , v_n ∈ S and λ_1, . . . , λ_n ∈ F \ {0} such that

    v = Σ_{i=1}^n λ_i v_i.

Proof. As v ∈ span(S), v can be written as a linear combination of elements of S. As v ≠ 0, there are v_1, . . . , v_n ∈ S and λ_1, . . . , λ_n ∈ F \ {0} such that

    v = Σ_{i=1}^n λ_i v_i.

Suppose there are w_1, . . . , w_m ∈ S and µ_1, . . . , µ_m ∈ F \ {0} such that

    v = Σ_{j=1}^m µ_j w_j.

Let T = {v_1, . . . , v_n} ∩ {w_1, . . . , w_m}. If T = ∅, then

    0 = v − v = Σ_{i=1}^n λ_i v_i − Σ_{j=1}^m µ_j w_j,

so λ_i = 0 for all i ∈ [n] and µ_j = 0 for all j ∈ [m] as S is linearly independent, a contradiction. So write T = {u_1, . . . , u_k}. After reindexing, we may assume u_i = v_i = w_i for all i ∈ [k]. Thus

    0 = v − v = Σ_{i=1}^k λ_i v_i + Σ_{i=k+1}^n λ_i v_i − Σ_{j=1}^k µ_j w_j − Σ_{j=k+1}^m µ_j w_j
              = Σ_{i=1}^k (λ_i − µ_i)u_i + Σ_{i=k+1}^n λ_i v_i − Σ_{j=k+1}^m µ_j w_j,

so λ_i = µ_i for all i ∈ [k], λ_i = 0 for all i = k + 1, . . . , n, and µ_j = 0 for all j = k + 1, . . . , m. Since the λ's and µ's are all nonzero, we must have k = n = m, and the expression is unique.
Definition 2.3.8. A subset B ⊂ V is called a basis for V if B is linearly independent and V = span(B) (B spans V).

Examples 2.3.9.
(1) The set { e_i | i ∈ [n] } is the standard basis of F^n.
(2) The matrices E_{i,j}, which have all entries zero except a 1 in the (i, j)th entry, form the standard basis of M_{m×n}(F).
(3) The functions δ_x : X → F given by

    δ_x(y) = 1 if x = y, and 0 otherwise

form a basis of F(X, F) if and only if X is finite. Note that the function f(x) = 1 for all x ∈ X is not a finite linear combination of these functions if X is infinite.
Definition 2.3.10. A vector space V is finitely generated if there is a finite subset S ⊂ V such that span(S) = V. Such a set S is called a finite generating set for V.

Examples 2.3.11.
(1) Any vector space with a finite basis is finitely generated. For example, F^n and M_{m×n}(F) are finitely generated.
(2) The matrices E_{i,j} for i ∈ [m], j ∈ [n] form a finite generating set for M_{m×n}(F).
(3) We will see shortly that C(a, b) for a < b is not finitely generated.
Exercises
Exercise 2.3.12 (Symmetric and Exterior Algebras).
(1) Find a basis for the following real vector spaces:
(a) S^2(R^n) = { A ∈ M_n(R) | A = A^T }, i.e. the real symmetric matrices, and
(b) Λ^2(R^n) = { A ∈ M_n(R) | A = −A^T }, i.e. the real antisymmetric (or skew-symmetric) matrices.
(2) Show that M_n(R) = S^2(R^n) ⊕ Λ^2(R^n).
Exercise 2.3.13 (Even and Odd Functions). A function f ∈ C(R, R) is called even if
f(x) = f(−x) for all x ∈ R and odd if f(x) = −f(−x) for all x ∈ R. Show that
(1) the subset of even, respectively odd, functions is a subspace of C(R, R) and
(2) C(R, R) = {even functions} ⊕ {odd functions}.
Exercise 2.3.14. Let S_1, S_2 ⊂ V be subsets of V.
(1) What is span(span(S_1))?
(2) Show that if S_1 ⊂ S_2, then span(S_1) ⊂ span(S_2).
(3) Suppose S_1 ⊔ S_2 (the disjoint union) is linearly independent. Show

    span(S_1 ⊔ S_2) = span(S_1) ⊕ span(S_2).
Exercise 2.3.15.
(1) Let B = {v_1, . . . , v_n} be a basis for V. Suppose W is a k-dimensional subspace of V. Show that for any subset {v_{i_1}, . . . , v_{i_m}} of B with m > n − k, there is a nonzero vector w ∈ W that is a linear combination of {v_{i_1}, . . . , v_{i_m}}.
(2) Let W be a subspace of V having the property that there exists a unique subspace W′ such that V = W ⊕ W′. Show that W = V or W = (0).
2.4 Finitely Generated Vector Spaces and Dimension
For this section, V will denote a ﬁnitely generated vector space over F.
Lemma 2.4.1. Let A ∈ M_{m×n}(F) be a matrix.
(1) y ∈ CS(A) ⊂ F^m if and only if there is an x ∈ F^n such that Ax = y.
(2) If v_1, . . . , v_n are n vectors in F^m with m < n, then {v_1, . . . , v_n} is linearly dependent.

Proof.
(1) Let A_1, . . . , A_n be the columns of A. We have

    y ∈ CS(A) ⇔ there are λ_1, . . . , λ_n such that y = Σ_{i=1}^n λ_i A_i
              ⇔ there are λ_1, . . . , λ_n such that, if x = (λ_1, λ_2, . . . , λ_n)^T, then Ax = y
              ⇔ there is an x ∈ F^n such that Ax = y.

(2) We must show there are scalars λ_1, . . . , λ_n, not all zero, such that

    Σ_{i=1}^n λ_i v_i = 0.

Form a matrix A ∈ M_{m×n}(F) by letting the ith column be v_i: A = [v_1|v_2| ··· |v_n]. By (1), the above condition is now equivalent to finding a nonzero x ∈ F^n such that Ax = 0. To solve this system of linear equations, we augment the matrix so it has n + 1 columns by letting the last column be all zeroes. Performing Gaussian elimination, we get a row reduced matrix U which is row equivalent to the matrix B = [A|0]. Now, the number of pivots (the first nonzero entry of a row) of U must be less than or equal to m, as there are only m rows. Hence, at least one of the first n columns does not have a pivot in it. These columns correspond to free variables when we are looking for solutions of the original system of linear equations, i.e. there is a nonzero x such that Ax = 0.
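The pivot-counting argument in this proof is easy to mechanize; the Python sketch below reuses the rref function sketched after Theorem 1.5.10 (so it is an illustration under that assumption, not part of the notes):

    def is_independent(vectors):
        # vectors: a list of n column vectors in F^m, each a list of entries;
        # they are independent iff rref of [v1|...|vn] has n pivots (nonzero rows)
        A = [list(row) for row in zip(*vectors)]   # stack the vectors as columns
        pivots = sum(any(x != 0 for x in row) for row in rref(A))
        return pivots == len(vectors)

    # three vectors in R^2 are always dependent (Lemma 2.4.1 (2) with m = 2 < 3 = n)
    print(is_independent([[1, 0], [0, 1], [1, 1]]))   # False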
Theorem 2.4.2. Suppose S_1 is a finite set that spans V and S_2 ⊂ V is linearly independent. Then S_2 is finite and |S_2| ≤ |S_1|.

Proof. Let S_1 = {v_1, . . . , v_m} and {w_1, . . . , w_n} ⊂ S_2. We show n ≤ m. Since S_1 spans V, there are scalars λ_{i,j} ∈ F such that

    w_j = Σ_{i=1}^m λ_{i,j} v_i for all j = 1, . . . , n.

Define A ∈ M_{m×n}(F) by A_{i,j} = λ_{i,j}. If n > m, then the columns of A are linearly dependent by 2.4.1, so there is an x = (µ_1, . . . , µ_n)^T ∈ F^n \ {0} such that Ax = 0. This means that

    Σ_{j=1}^n A_{i,j} µ_j = Σ_{j=1}^n λ_{i,j} µ_j = 0 for all i ∈ [m].

This implies that

    Σ_{j=1}^n µ_j w_j = Σ_{j=1}^n µ_j Σ_{i=1}^m λ_{i,j} v_i = Σ_{i=1}^m ( Σ_{j=1}^n λ_{i,j} µ_j ) v_i = 0,

a contradiction, as {w_1, . . . , w_n} is linearly independent and the µ_j are not all zero. Hence n ≤ m. In particular, every finite subset of S_2 has at most m elements, so S_2 is finite and |S_2| ≤ |S_1|.
Corollary 2.4.3. Suppose V has a finite basis B with n elements. Then every basis of V has n elements.

Proof. Let B′ be another basis of V. Then by 2.4.2, B′ is finite and |B′| ≤ |B|. Applying 2.4.2 after switching B and B′ yields |B| ≤ |B′|, and we are finished.

Corollary 2.4.4. Every infinite subset of a finitely generated vector space is linearly dependent.

Definition 2.4.5. The vector space V is finite dimensional if there is a finite subset B ⊂ V such that B is a basis for V. The number of elements of B is the dimension of V, denoted dim(V). A vector space that is not finite dimensional is infinite dimensional. In this case, we will write dim(V) = ∞.
Examples 2.4.6.
(1) M_{m×n}(F) has dimension mn.
(2) If a ≠ b, then C(a, b) and C^n(a, b) are infinite dimensional for all n ∈ N ∪ {∞}.
(3) F(X, F) is finite dimensional if and only if X is finite.
Theorem 2.4.7 (Contraction). Let S ⊂ V be a finite subset such that S spans V.
(1) There is a subset B ⊂ S such that B is a basis of V.
(2) If dim(V) = n and S has n elements, then S is a basis for V.

Proof.
(1) If S is not a basis, then S is not linearly independent. Thus, there is some vector v ∈ S such that v is a linear combination of the other elements of S. Set S_1 = S \ {v}. It is clear that S_1 spans V. If S_1 is not a basis, repeat the process to get S_2. Repeating this process as many times as necessary, since S is finite, we will eventually get that S_n is linearly independent for some n ∈ N, and thus a basis for V.
(2) Suppose S is not a basis for V. Then by (1), there is a proper subset T of S such that T is a basis for V. But T has fewer elements than S, a contradiction to 2.4.3.
Corollary 2.4.8 (Existence of Bases). Let V be a finitely generated vector space. Then V has a basis.

Proof. This follows immediately from 2.4.7 (1).

Corollary 2.4.9. The vector space V is finitely generated if and only if V is finite dimensional.

Proof. By 2.4.8, we see that a finitely generated vector space has a finite basis. The other direction is trivial.
Lemma 2.4.10. A subspace W of a finite dimensional vector space V is finitely generated.

Proof. We construct a maximal linearly independent subset S of W as follows: if W = (0), we are finished. Otherwise, choose w_1 ∈ W \ {0}, set S_1 = {w_1}, and note that S_1 is linearly independent. If W_1 = span(S_1) = W, we are finished. Otherwise, choose w_2 ∈ W \ W_1, and set S_2 = {w_1, w_2}, which is again linearly independent by 2.3.6. If W_2 = span(S_2) = W, we are finished. Otherwise we may repeat this process. This algorithm must terminate, as all linearly independent subsets of W have at most dim(V) elements by 2.4.2. By 2.3.6, the resulting maximal linearly independent set S ⊂ W must be a basis for W. Hence W is finitely generated.
Theorem 2.4.11 (Extension). Let W be a subspace of V. Let B be a basis of W (we know one exists by 2.4.8 and 2.4.10).
(1) If |B| = dim(V), then W = V.
(2) There is a basis C of V such that B ⊂ C.

Proof.
(1) Suppose B = {w_1, . . . , w_n} and dim(V) = n. Suppose W ≠ V. Then there is a w_{n+1} ∈ V \ W, and B_1 = B ∪ {w_{n+1}} is linearly independent by 2.3.6. This is a contradiction to 2.4.2, as every linearly independent subset of V must have at most n elements. Hence W = V.
(2) If span(B) ≠ V, then pick v_1 ∈ V \ span(B). It is immediate from 2.3.6 that B_1 = B ∪ {v_1} is linearly independent. If span(B_1) ≠ V, then we may pick v_2 ∈ V \ span(B_1), and once again by 2.3.6, B_2 = B_1 ∪ {v_2} is linearly independent. Repeating this process as many times as necessary, eventually some B_n will have dim(V) elements and be linearly independent, so its span must be V by (1).
Corollary 2.4.12. Let W be a subspace of V , and let B be a basis of W. If dim(V ) = n < ∞,
then dim(W) ≤ dim(V ). If in addition W is a proper subspace, then dim(W) < dim(V ).
Proof. This is immediate from 2.4.11.
Proposition 2.4.13. Suppose V is finite dimensional and W_1, . . . , W_n are subspaces of V such that

    V = ⊕_{i=1}^n W_i.

(1) For i = 1, . . . , n, let n_i = dim(W_i) and let B_i = {v^i_1, . . . , v^i_{n_i}} be a basis for W_i. Then B = ∪_{i=1}^n B_i is a basis for V.
(2) dim(V) = Σ_{i=1}^n dim(W_i).

Proof.
(1) It is clear that B spans V. We must show it is linearly independent. Suppose

    Σ_{i=1}^n Σ_{j=1}^{n_i} λ^i_j v^i_j = 0.

Then setting w_i = Σ_{j=1}^{n_i} λ^i_j v^i_j ∈ W_i, we have Σ_{i=1}^n w_i = 0, so w_i = 0 for all i ∈ [n] by 2.2.8. Now since B_i is a basis for all i ∈ [n], λ^i_j = 0 for all i, j.
(2) This is immediate from (1).
Exercises
Exercise 2.4.14 (Dimension Formulas). Let W_1, W_2 be subspaces of a vector space V.
(1) Show that dim(W_1 + W_2) + dim(W_1 ∩ W_2) = dim(W_1) + dim(W_2).
(2) Suppose W_1 ∩ W_2 = (0). Find an expression for dim(W_1 ⊕ W_2) similar to the formula found in (1).
(3) Let W be a subspace of V, and suppose V = W_1 ⊕ W_2. Show that if W_1 ⊂ W or W_2 ⊂ W, then

    W = (W ∩ W_1) ⊕ (W ∩ W_2).

Is this still true if we omit the condition "W_1 ⊂ W or W_2 ⊂ W"?
Exercise 2.4.15 (Complexification 2). Let V be a vector space over R.
(1) Show that dim_R(V) = dim_C(V_C). Deduce that dim_R(V_C) = 2 dim_R(V).
(2) If T ∈ L(V), we define the complexification of the operator T to be the operator T_C ∈ L(V_C) given by

    T_C(u + iv) = Tu + iTv.

(a) Show that the complexification of multiplication by A ∈ M_n(R) on R^n is multiplication by A ∈ M_n(C) on C^n.
(b) Show that the complexification of multiplication by f ∈ F(X, R) on F(X, R) is multiplication by f ∈ F(X, C) on F(X, C).
2.5 Existence of Bases
A question now arises: do all vector spaces have bases? The answer to this question is yes
as we will see shortly. We saw in the previous section that this result is very easy for ﬁnitely
generated vector spaces. However, to prove this in full generality, we will need Zorn’s Lemma
which is logically equivalent to the Axiom of Choice, i.e. assuming one, we can prove the
other. We will not prove the equivalence of these two statements as it is beyond the scope
of this course. For this section, V will denote a vector space over F (which is not necessarily
ﬁnitely generated).
Definition 2.5.1.
(1) A partial order on the set P is a subset R of P × P (a relation R on P) such that
(i) (reflexive) (p, p) ∈ R for all p ∈ P,
(ii) (antisymmetric) if (p, q) ∈ R and (q, p) ∈ R, then p = q, and
(iii) (transitive) (p, q), (q, r) ∈ R implies (p, r) ∈ R.
Usually, we denote (p, q) ∈ R as p ≤ q. Hence, we may restate these conditions as:
(i) p ≤ p for all p ∈ P,
(ii) p ≤ q and q ≤ p implies p = q, and
(iii) p ≤ q and q ≤ r implies p ≤ r.
Note that we may not always be able to compare p, q ∈ P. In this sense the order is partial. The pair (P, ≤), or sometimes just P if ≤ is understood, is called a partially ordered set.
(2) Suppose P is a partially ordered set. A subset T ⊂ P is totally ordered if for every s, t ∈ T, we have either s ≤ t or t ≤ s. Note that we can compare all elements of a totally ordered set.
(3) An upper bound for a totally ordered subset T ⊂ P of a partially ordered set is an element x ∈ P such that t ≤ x for all t ∈ T.
(4) An element m ∈ P is called maximal if p ∈ P with m ≤ p implies p = m.
Examples 2.5.2.
(1) Let X be a set, and let P(X) be the power set of X, i.e. the set of all subsets of X. Then we may partially order P(X) by inclusion, i.e. we say S_1 ≤ S_2 for S_1, S_2 ∈ P(X) if S_1 ⊆ S_2. Note that this is not a total order unless X has at most one element. Furthermore, note that P(X) has a unique maximal element: X.
(2) We may also partially order P(X) by reverse inclusion, i.e. we say S_2 ≤ S_1 for S_1, S_2 ∈ P(X) if S_1 ⊆ S_2. Note once again that this is not a total order unless X has at most one element. Also, P(X) has a unique maximal element for this order as well: ∅.
Lemma 2.5.3 (Zorn). Suppose every nonempty totally ordered subset T of a nonempty
partially ordered set P has an upper bound. Then P has a maximal element.
Remark 2.5.4. Note that the maximal element may not be (and probably is not) unique.
Theorem 2.5.5 (Existence of Bases). Let V be a vector space. Then V has a basis.
Proof. Let P be the set of all linearly independent subsets of V. We partially order P by inclusion. Let T be a totally ordered subset of P. We show that T has an upper bound. Our candidate for an upper bound is

    X = ∪_{t∈T} t,

the union of all sets contained in T.
We must show X ∈ P, i.e. X is linearly independent. Let Y = {y_1, . . . , y_n} be a finite subset of X. Then there are t_1, . . . , t_n ∈ T such that y_i ∈ t_i for all i = 1, . . . , n. Since T is totally ordered, one of the t_i's contains the others. Call this set s. Then Y ⊂ s, and s is linearly independent. Hence Y is linearly independent, and thus so is X.
We must show that X is indeed an upper bound for T. This is obvious, as t ⊂ X for all t ∈ T. Hence t ≤ X for all t ∈ T.
Now, we invoke Zorn's Lemma to get a maximal element B of P. We claim that B is a basis for V, i.e. it spans V (B is linearly independent as it is in P). Suppose not. Then there is some v ∈ V \ span(B). Then B ∪ {v} is linearly independent by 2.3.6, and B ⊂ B ∪ {v}. Hence B ≤ B ∪ {v}, but B ≠ B ∪ {v}. This is a contradiction to the maximality of B. Hence B is a basis for V.
Theorem 2.5.6 (Extension). Let W be a subspace of V . Let B be a basis of W. Then there
is a basis C of V such that B ⊂ C.
Proof. Let P be the set whose elements are linearly independent subsets of V containing
B partially ordered by inclusion. As in the proof of 2.5.5, we see that each totally ordered
subset has an upper bound, so P has a maximal element C ∈ P (which means B ⊂ C).
Once again, as in the proof of 2.5.5, we must have that C is a basis for V .
Theorem 2.5.7 (Contraction). Let S ⊂ V be a subset such that S spans V . Then there is
a subset B ⊂ S such that B is a basis of V .
Proof. Let P be the set whose elements are linearly independent subsets of S partially
ordered by inclusion. As in the proof of 2.5.5, we see that each totally ordered subset has
an upper bound, so P has a maximal element B ∈ P (which means B ⊂ S). Once again, as
in the proof of 2.5.5, we must have that B is a basis for V .
Proposition 2.5.8. Let W ⊂ V be a subspace. Then there is a (non-unique) subspace U of V such that V = U ⊕ W.

Proof. This is merely a restatement of 2.5.6. Let B be a basis of W, and extend B to a basis C of V. Set U = span(C \ B). Then clearly U ∩ W = (0) and U + W = V, so V = U ⊕ W by 2.2.8. Note that if V is finitely generated, we may use 2.4.11 instead of 2.5.6.
Exercises
Exercise 2.5.9.
Chapter 3
Linear Transformations
For this chapter, unless stated otherwise, V, W will be vector spaces over F.
3.1 Linear Transformations
Definition 3.1.1. A linear transformation from the vector space V to the vector space W, denoted T : V → W, is a function from V to W such that

    T(λu + v) = λT(u) + T(v)

for all λ ∈ F and u, v ∈ V. This condition is called F-linearity. Often T(v) is denoted Tv. Sometimes we will refer to a linear transformation as a linear operator, an operator, or a map. The set of all linear transformations from V to W is denoted L(V, W). If V = W, we write L(V) = L(V, V).
Examples 3.1.2.
(1) There is a zero linear transformation 0: V → W given by 0(v) = 0 ∈ W for all v ∈ V. There is an identity linear transformation I ∈ L(V) given by Iv = v for all v ∈ V.
(2) Let A ∈ M_{m×n}(F). Then left multiplication by A defines a linear transformation L_A : F^n → F^m given by x ↦ Ax.
(3) The projection maps F^n → F given by e_i^*(λ_1, . . . , λ_n)^T = λ_i for i ∈ [n] are F-linear.
(4) Integration from a to b, where a, b ∈ R, is a linear map C[a, b] → R.
(5) The derivative map D: C^n(a, b) → C^{n−1}(a, b) given by f ↦ f′ is an operator.
(6) Let x ∈ X, a set. Then evaluation at x is a linear transformation ev_x : F(X, F) → F given by f ↦ f(x).
(7) Suppose B = {v_1, . . . , v_n} is a basis for V. Then the projection maps v_j^* : V → F given by

    v_j^*( Σ_{i=1}^n λ_i v_i ) = λ_j

are linear transformations.
Definition 3.1.3. Let V be a vector space over R, and let V_C be as in 2.1.12. If T ∈ L(V), we define the complexification of the operator T to be the operator T_C ∈ L(V_C) given by

    T_C(u + iv) = Tu + iTv.

Examples 3.1.4.
(1) The complexification of multiplication by the matrix A on R^n is multiplication by the matrix A on C^n.
(2) The complexification of multiplication by x on R[x] is multiplication by x on C[x].
Remarks 3.1.5.
(1) Note that a linear transformation is completely determined by what it does to a basis. If
{v_i} is a basis for V and we know what Tv_i is for all i, then by linearity, we know Tv for any
vector v ∈ V. Hence, to define a linear transformation, one only needs to specify where a
basis goes, and then we may “extend it by linearity,” i.e. if {v_i} is a basis for V, we specify
Tv_i for all i, and then we decree that
T(∑_{j=1}^n λ_j v_j) = ∑_{j=1}^n λ_j Tv_j
for all finite subsets {v_1, . . . , v_n} ⊂ {v_i}. This rule defines T on all of V as {v_i} spans V,
and T is well defined since {v_i} is linearly independent.
For example, in example 7 above, v_j^* is the unique map that sends v_j to one, and v_i to
zero for i ≠ j.
(2) If T_1 and T_2 are linear transformations V → W and λ ∈ F, then we may define
T_1 + T_2 : V → W by (T_1 + T_2)v = T_1 v + T_2 v, and we may define λT_1 : V → W by
(λT_1)v = λ(T_1 v). Hence, if V and W are vector spaces over F, then L(V, W), the set of all
F-linear transformations V → W, is a vector space (it is easy to see that + is an addition
and · is a scalar multiplication which satisfy the distributive property, each linear
transformation has an additive inverse, and there is a zero linear transformation).
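For example, to illustrate (1), one may define a T ∈ L(F^2) simply by declaring Te_1 = e_1 + e_2 and Te_2 = 2e_2 on the standard basis {e_1, e_2}; extending by linearity then forces
T(λ_1 e_1 + λ_2 e_2) = λ_1 Te_1 + λ_2 Te_2 = λ_1 e_1 + (λ_1 + 2λ_2)e_2,
so this T is completely determined by its two values on the basis.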
Example 3.1.6. The map L : M_{m×n}(F) → L(F^n, F^m) given by A ↦ L_A is a linear transformation.
Exercises
Exercise 3.1.7. Let v ∈ V. Show that ev_v : L(V, W) → W given by T ↦ Tv is a linear operator.
3.2 Kernel and Image
Note that we may talk about injectivity and surjectivity of operators as they are functions.
The immediate question to ask is, “Why should we care about linear transformations?” The
answer is that these maps are the “natural” maps to consider when we are thinking about
vector spaces. As we shall see, linear transformations preserve vector space structure in the
sense of the following deﬁnition and proposition:
Definition 3.2.1. Let T ∈ L(V, W). Let the kernel of T, denoted ker(T), be
{v ∈ V | Tv = 0},
and let the image, or range, of T, denoted im(T) or TV, be
{w ∈ W | w = Tv for some v ∈ V}.
Examples 3.2.2.
(1) Let A ∈ M_{m×n}(F) and consider L_A. Then ker(L_A) = NS(A) and im(L_A) = CS(A).
(2) Consider ev_x : F(X, F) → F. Then ker(ev_x) = {f : X → F | f(x) = 0} and im(ev_x) = F as λδ_x ↦ λ.
(3) If B = {v_1, . . . , v_n} is a basis for V, then ker(v_1^*) = span{v_2, . . . , v_n} and im(v_1^*) = F.
(4) Let D : C^1[0, 1] → C[0, 1] be the derivative map f ↦ f′. Then ker(D) = {constant functions} and im(D) = C[0, 1] as
D ∫_0^x f(t) dt = f(x) for all x ∈ [0, 1]
by the fundamental theorem of calculus, i.e. D has a right inverse.
Proposition 3.2.3. Let T ∈ L(V, W). Then ker(T) and im(T) are vector spaces.
Proof. It suffices to show they are closed under addition and scalar multiplication. Clearly
Tv = 0 = Tu implies T(λu + v) = 0 for all u, v ∈ ker(T) and λ ∈ F, so ker(T) is a subspace
of V, and thus a vector space. Let y, z ∈ im(T) and µ ∈ F, and let w, x ∈ V such that
Tw = y and Tx = z. Then T(µw + x) = µy + z ∈ im(T), so im(T) is a subspace of W, and
thus a vector space.
Proposition 3.2.4. T ∈ L(V, W) is injective if and only if ker(T) = (0).
Proof. Suppose T is injective. Then since T0 = 0, if Tv = 0, then v = 0. Suppose now that
ker(T) = (0). Let x, y ∈ V such that Tx = Ty. Then T(x −y) = 0, so x −y = 0 and x = y.
Hence T is injective.
Proposition 3.2.5. Suppose T ∈ L(V, W) is bijective, and let {v_i} be a basis for V. Then
{Tv_i} is a basis for W.
Proof. We show {Tv_i} is linearly independent. Suppose {Tv_1, . . . , Tv_n} is a finite subset of
{Tv_i}, and suppose
∑_{j=1}^n λ_j Tv_j = T(∑_{j=1}^n λ_j v_j) = 0.
Then since T is injective, by 3.2.4,
∑_{j=1}^n λ_j v_j = 0.
Now since {v_i} is linearly independent, we have λ_j = 0 for all j = 1, . . . , n.
We show {Tv_i} spans W. Let w ∈ W. Then there is a v ∈ V such that Tv = w as T is
surjective. Then v ∈ span{v_i}, so there are v_1, . . . , v_n ∈ {v_i} and λ_1, . . . , λ_n ∈ F such that
v = ∑_{j=1}^n λ_j v_j.
Then
w = Tv = T ∑_{j=1}^n λ_j v_j = ∑_{j=1}^n λ_j Tv_j,
so w ∈ span{Tv_i}.
Definition 3.2.6. An operator T ∈ L(V, W) is called invertible, or an isomorphism (of vector
spaces), if there is a linear operator S ∈ L(W, V) such that T ◦ S = id_W and S ◦ T = id_V, where
id means the identity linear transformation. Often, when composing linear transformations,
we will omit the ◦ and write ST and TS. We call vector spaces V, W isomorphic, denoted
V ≅ W, if there is an isomorphism T ∈ L(V, W).
Examples 3.2.7.
(1) F^n ≅ V if dim(V) = n.
(2) F(X, F) ≅ F^n if |X| = n.
Proposition 3.2.8. T ∈ L(V, W) is invertible if and only if it is bijective.
Proof. It is clear that an invertible operator is bijective.
Suppose T is bijective. Then there is an inverse function S : W → V. It remains to show
S is a linear transformation. Suppose y, z ∈ W and λ ∈ F. Then there are w, x ∈ V such
that Tw = y and Tx = z. Then
S(λy + z) = S(λTw + Tx) = ST(λw + x) = λw + x = λSTw + STx = λSy + Sz.
Hence S is a linear transformation.
Remark 3.2.9. If T is an isomorphism, then T maps bases to bases. In this way, we can
identify the domain and codomain of T as the same vector space. In particular, if V ≅ W,
then dim(V) = dim(W) (we allow the possibility that both are ∞).
Proposition 3.2.10. Suppose dim(V ) = dim(W) = n < ∞. Then there is an isomorphism
T : V →W.
Proof. Let {v_1, . . . , v_n} be a basis for V and {w_1, . . . , w_n} be a basis for W. Define a linear
transformation T : V → W by Tv_i = w_i for i = 1, . . . , n, and extend by linearity. It is easy to
see that the inverse of T is the operator S : W → V defined by Sw_i = v_i for all i = 1, . . . , n
and extending by linearity.
Remark 3.2.11. We see that if U ≅ V via isomorphism T and V ≅ W via isomorphism S,
then U ≅ W via isomorphism ST.
Our task is now to prove the Rank-Nullity Theorem. One particular application of this
theorem is that injectivity, surjectivity, and bijectivity are all equivalent for T ∈ L(V, W)
when V, W are of the same finite dimension.
Lemma 3.2.12. Let T ∈ L(V, W), and let M be a subspace such that V = ker(T) ⊕ M (one
exists by 2.5.8).
(1) T|_M is injective.
(2) im(T|_M) = im(T).
(3) T|_M is an isomorphism M → im(T).
Proof.
(1) Suppose u, v ∈ M such that Tu = Tv. Then T(u −v) = 0, so u −v ∈ M ∩ ker(T) = (0).
Hence u −v = 0, and u = v.
(2) Let B = {u_1, . . . , u_n} be a basis for ker(T), and let C = {w_1, . . . , w_m} be a basis for
M. Then by 2.4.13, B ∪ C is a basis for V. Hence {Tu_1, . . . , Tu_n, Tw_1, . . . , Tw_m} spans
im(T). But since Tu_i = 0 for all i, we have that {Tw_1, . . . , Tw_m} spans im(T). Thus
im(T) = im(T|_M).
(3) By (1), T|_M is injective, and by (2), T|_M is surjective onto im(T). Hence it is bijective
and thus an isomorphism.
Theorem 3.2.13 (Rank-Nullity). Suppose V is finite dimensional, and let T ∈ L(V, W).
Then
dim(V ) = dim(im(T)) + dim(ker(T)).
Proof. By 2.5.8 there is a subspace M of V such that V = ker(T) ⊕M. By 2.4.13, dim(V ) =
dim(ker(T)) + dim(M). We must show dim(M) = dim(im(T)). This follows immediately
from 3.2.12 and 3.2.9.
Remark 3.2.14. Suppose V is ﬁnite dimensional, and let T ∈ L(V, W). Then dim(im(T)) is
sometimes called the rank of T, denoted rank(T), and dim(ker(T)) is sometimes called the
nullity of T, denoted nullity(T). Using this terminology, 3.2.13 says that
dim(V ) = rank(T) + nullity(T),
which is how 3.2.13 gets the name “Rank-Nullity Theorem.”
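For example, let T = L_A ∈ L(F^2) for
A =
[ 1 1 ]
[ 1 1 ].
Then im(T) = span{(1, 1)^T} has dimension 1 and ker(T) = span{(1, −1)^T} has dimension 1, so rank(T) + nullity(T) = 1 + 1 = 2 = dim(F^2), as 3.2.13 predicts.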
Corollary 3.2.15. Let V, W be ﬁnite dimensional vector spaces with dim(V ) = dim(W),
and let T ∈ L(V, W). The following are equivalent:
(1) T is injective,
(2) T is surjective, and
(3) T is bijective.
Proof.
(3) ⇒(1): Obvious.
(1) ⇒(2): Suppose T is injective. Then dim(ker(T)) = 0, so dim(V ) = dim(im(T)) =
dim(W), and T is surjective.
(2) ⇒(3): Suppose T is surjective. Then by similar reasoning, dim(ker(T)) = 0, so T is
injective. Hence T is bijective.
Exercises
Exercise 3.2.16. Let T ∈ L(V, W), let S = {v_1, . . . , v_n} ⊂ V, and recall that TS =
{Tv_1, . . . , Tv_n} ⊂ W.
(1) Prove or disprove the following statements:
(a) If S is linearly independent, then TS is linearly independent.
(b) If TS is linearly independent, then S is linearly independent.
(c) If S spans V , then TS spans W.
(d) If TS spans W, then S spans V .
(e) If S is a basis for V , then TS is a basis for W.
(f) If TS is a basis for W, then S is a basis for V .
(2) For each of the false statements above (if there are any), ﬁnd a condition on T which
makes the statement true.
Exercise 3.2.17 (Rank of a Matrix).
(1) Let A ∈ M_n(F). Prove that dim(CS(A)) = dim(RS(A)). This number is called the rank
of A, denoted rank(A).
(2) Show that rank(A) = rank(L_A).
Exercise 3.2.18 (Square Matrix Theorem). Prove the following (celebrated) theorem from
matrix theory. For A ∈ M_n(F), the following are equivalent:
(1) A is invertible,
(2) There is a B ∈ M_n(F) such that AB = I,
(3) There is a B ∈ M_n(F) such that BA = I,
(4) A is nonsingular, i.e. if Ax = 0 for x ∈ F^n, then x = 0,
(5) For all y ∈ F^n, there is a unique x ∈ F^n such that Ax = y,
(6) For all y ∈ F^n, there is an x ∈ F^n such that Ax = y,
(7) CS(A) = F^n,
(8) RS(A) = F^n,
(9) rank(A) = n, and
(10) A is row equivalent to I.
Exercise 3.2.19 (Quotient Spaces 2). Let W ⊂ V be a subspace.
(1) Show that the map q : V →V/W given by v →v+W is a surjective linear transformation
such that ker(q) = W.
(2) Suppose V is ﬁnite dimensional. Using q, ﬁnd dim(V/W) in terms of dim(V ) and dim(W).
(3) Show that if T ∈ L(V, U) and W ⊂ ker(T), then T factors uniquely through q, i.e. there
is a unique linear transformation T̄ : V/W → U such that the triangle formed by T, q, and
T̄ commutes, i.e. T = T̄ ◦ q. Show that if W = ker(T), then T̄ is injective.
3.3 Dual Spaces
We now study L(V, F) in the case of a ﬁnite dimensional vector space V over F.
Definition 3.3.1. Let V be a vector space over F. The dual of V, denoted V^*, is the vector
space of all linear transformations V → F, i.e. V^* = L(V, F). Elements of V^* are called
linear functionals.
Proposition 3.3.2. Let V be a vector space over F, and let ϕ ∈ V^*. Then either ϕ = 0 or
ϕ is surjective.
Proof. We know that dim(im(ϕ)) ≤ 1. If it is 0, then ϕ = 0. If it is 1, then ϕ is surjective.
Definition-Proposition 3.3.3. Let B = {v_1, . . . , v_n} be a basis for the finite dimensional
vector space V. Then B^* = {v_1^*, . . . , v_n^*} is a basis for V^* called the dual basis.
Proof. We must show that B^* is a basis for V^*. First, we show B^* is linearly independent.
Suppose
∑_{i=1}^n λ_i v_i^* = 0,
the zero linear transformation. Then we have that
(∑_{i=1}^n λ_i v_i^*) v_j = λ_j = 0
for all j = 1, . . . , n. Thus, B^* is linearly independent. We show B^* spans V^*. Suppose
ϕ ∈ V^*. Let λ_j = ϕ(v_j) for j = 1, . . . , n. Then we have that
(ϕ − ∑_{i=1}^n λ_i v_i^*) v_j = 0
for all j = 1, . . . , n. Since a linear transformation is completely determined by its values on
a basis, we have that
ϕ − ∑_{i=1}^n λ_i v_i^* = 0 ⟹ ϕ = ∑_{i=1}^n λ_i v_i^*.
Remark 3.3.4. One of the most useful techniques to show linear independence is the “kill-off”
method used in the previous proof:
(∑_{i=1}^n λ_i v_i^*) v_j = λ_j = 0.
Applying the v_j allows us to see that each λ_j is zero. We will see this technique again when
we discuss eigenvalues and eigenvectors.
Proposition 3.3.5. Let V be a finite dimensional vector space over F. There is a canonical
isomorphism ev : V → V^{**} = (V^*)^* given by v ↦ ev_v, the linear transformation V^* → F
given by the evaluation map: ev_v(ϕ) = ϕ(v).
Proof. First, it is clear that ev is a linear transformation as ev_{λu+v} = λ ev_u + ev_v. Hence, we
must show the map ev is bijective. Suppose ev_v = 0. Then ev_v is the zero linear functional
on V^*, i.e., for all ϕ ∈ V^*, ϕ(v) = 0. If v ≠ 0, we may extend {v} to a basis of V by 2.5.6,
and then the dual basis element v^* satisfies v^*(v) = 1 ≠ 0, a contradiction. Hence v = 0, and
the map ev is injective. Suppose now that x ∈ V^{**}.
Let B = {v_1, . . . , v_n} be a basis for V and let B^* = {v_1^*, . . . , v_n^*} be the dual basis. Setting
λ_j = x(v_j^*) for j = 1, . . . , n, we see that if
u = ∑_{i=1}^n λ_i v_i,
then (x − ev_u)v_j^* = 0 for all j = 1, . . . , n. Once more since a linear transformation is
completely determined by where it sends a basis, we have x − ev_u = 0, so x = ev_u and ev is
surjective.
Remark 3.3.6. The word “canonical” here means “completely determined,” or “the best
one.” It is independent of any choices of basis or vector in the space.
Proposition 3.3.7. Let V be a finite dimensional vector space over F. Then V is isomorphic
to V^*, but not canonically.
Proof. Let B = {v_1, . . . , v_n} be a basis for V. Then B^* is a basis for V^*. Define a linear
transformation T : V → V^* by v_i ↦ v_i^* for all v_i ∈ B, and extend this map by linearity.
Then T is an isomorphism as there is the obvious inverse map.
Remark 3.3.8. It is important to ask why there is no canonical isomorphism between V
and V^*. The naive, but incorrect, thing to ask is whether v ↦ v^* would give a canonical
isomorphism V → V^*. Note that the definition of v^* requires a basis of V as in 3.3.3.
We cannot define the coordinate projection v^* without a distinguished basis. For example,
if V = F^2 and v = (1, 0), we can complete {v} to a basis in many different ways. Let
B_1 = {v, (0, 1)} and B_2 = {v, (1, 1)}. Then we see that v^*(1, 1) = 1 relative to B_1, but
v^*(1, 1) = 0 relative to B_2.
Exercises
Exercise 3.3.9 (Dual Spaces of Infinite Dimensional Spaces). Suppose B = {v_n | n ∈ N} is
a basis of V. Is it true that V^* = span{v_n^* | n ∈ N}?
3.4 Coordinates
For this section, V, W will denote finite dimensional vector spaces over F. We now discuss
the way in which all finite dimensional vector spaces over F look like F^n (nonuniquely). We
then study L(V, W), with the main result being that if dim(V) = n and dim(W) = m, then
L(V, W) ≅ M_{m×n}(F) (nonuniquely), which is left to the reader as an exercise.
Definition 3.4.1. Let V be a finite dimensional vector space over F, and let B = (v_1, . . . , v_n)
be an ordered basis for V. The map [·]_B : V → F^n given by
v = ∑_{i=1}^n λ_i v_i ⟼ (λ_1, λ_2, . . . , λ_n)^T = (v_1^*(v), v_2^*(v), . . . , v_n^*(v))^T
is called the coordinate map with respect to B. We say that [v]_B are the coordinates for v with
respect to the basis B. Note that we denote an ordered basis as a sequence of vectors in
parentheses rather than a set of vectors contained in curly brackets.
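For example, if V = R^2 with ordered basis B = ((1, 1), (1, −1)), then (3, 1) = 2(1, 1) + 1(1, −1), so [(3, 1)]_B = (2, 1)^T, whereas with respect to the standard ordered basis E = (e_1, e_2) we have [(3, 1)]_E = (3, 1)^T. The coordinates of a vector thus depend on the choice of ordered basis.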
Proposition 3.4.2. The coordinate map is an isomorphism.
Proof. It is clear that the coordinate map is a linear transformation as [λu + v]_B = λ[u]_B + [v]_B
for all u, v ∈ V and λ ∈ F. We must show [·]_B is bijective. We show it is injective. Suppose
that [v]_B = (0, . . . , 0)^T. Then since
v = ∑_{i=1}^n λ_i v_i
for unique λ_1, . . . , λ_n ∈ F, we must have that λ_i = 0 for all i = 1, . . . , n, and v = 0. Hence
the map is injective. We show the map is surjective. Suppose (λ_1, . . . , λ_n)^T ∈ F^n. Then
define the vector u by
u = ∑_{i=1}^n λ_i v_i.
Then [u]_B = (λ_1, . . . , λ_n)^T, so the map is surjective.
Definition 3.4.3. Let V and W be finite dimensional vector spaces, let B = (v_1, . . . , v_n) be
an ordered basis for V, let C = (w_1, . . . , w_m) be an ordered basis for W, and let T ∈ L(V, W).
The matrix [T]_B^C ∈ M_{m×n}(F) is given by
([T]_B^C)_{ij} = w_i^*(Tv_j) = e_i^*([Tv_j]_C)
where w_i^* is projection onto the i-th coordinate in W and e_i^* is the projection onto the i-th
coordinate in F^m.
We can also define [T]_B^C as the matrix whose (i, j)-th entry is λ_{i,j}, where the λ_{i,j} are the
unique elements of F such that
Tv_j = ∑_{i=1}^m λ_{i,j} w_i.
We can see this in terms of matrix augmentation as follows:
[T]_B^C = ( [Tv_1]_C | · · · | [Tv_n]_C ).
The matrix [T]_B^C is called the coordinate matrix for T with respect to B and C. If V = W
and B = C, then we denote [T]_B^B as [T]_B.
Remark 3.4.4. Note that the j-th column of [T]_B^C is [Tv_j]_C.
Examples 3.4.5.
(1) Let B = (v_1, . . . , v_4) be an ordered basis for the finite dimensional vector space V. Let
S ∈ L(V, V) be given by Sv_i = v_{i+1} for i ≠ 4 and Sv_4 = v_1. Then
[S]_B =
[ 0 0 0 1 ]
[ 1 0 0 0 ]
[ 0 1 0 0 ]
[ 0 0 1 0 ].
(2) Let S be as above, but consider the ordered basis B′ = (v_4, . . . , v_1). Then
[S]_{B′} =
[ 0 1 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 1 ]
[ 1 0 0 0 ].
(3) Let S be as above, but consider the ordered basis B′′ = (v_3, v_2, v_4, v_1). Then
[S]_{B′′} =
[ 0 1 0 0 ]
[ 0 0 0 1 ]
[ 1 0 0 0 ]
[ 0 0 1 0 ].
Proposition 3.4.6. Let V and W be finite dimensional vector spaces, let B = (v_1, . . . , v_n) be
an ordered basis for V, let C = (w_1, . . . , w_m) be an ordered basis for W, and let T ∈ L(V, W).
Then the square with T : V → W on top, the coordinate maps [·]_B : V → F^n and
[·]_C : W → F^m down the sides, and [T]_B^C : F^n → F^m on the bottom commutes, i.e.
[T]_B^C [v]_B = [Tv]_C for all v ∈ V.
Proof. To show both matrices are equal, we must show that their entries are equal. We know
([T]_B^C)_{i,j} = e_i^*([Tv_j]_C). Hence
([T]_B^C [v]_B)_{i,1} = ∑_{j=1}^n ([T]_B^C)_{i,j} ([v]_B)_{j,1} = ∑_{j=1}^n e_i^*([Tv_j]_C) ([v]_B)_{j,1} = e_i^* ∑_{j=1}^n ([v]_B)_{j,1} [Tv_j]_C
= e_i^* [∑_{j=1}^n ([v]_B)_{j,1} Tv_j]_C = e_i^* [T ∑_{j=1}^n ([v]_B)_{j,1} v_j]_C = e_i^*([Tv]_C) = ([Tv]_C)_{i,1}
as [·]_C, e_i^*, and T are linear transformations, and
v = ∑_{j=1}^n ([v]_B)_{j,1} v_j.
Remarks 3.4.7. When we proved 2.4.3 and 2.5.6 (1), we used the coordinate map implicitly.
The algorithm of Gaussian elimination proves the hard result of 2.4.2, which in turn proves
these technical theorems after we identify the vector space with F^n for some n.
Proposition 3.4.8. Suppose S, T ∈ L(V, W) and λ ∈ F, and suppose that B = {v_1, . . . , v_n}
is a basis for V and C = {w_1, . . . , w_m} is a basis for W. Then [S + λT]_B^C = [S]_B^C + λ[T]_B^C.
Thus [·]_B^C : L(V, W) → M_{m×n}(F) is a linear transformation.
Proof. First, we note that this follows immediately from 3.4.2 and 3.4.6 as
[S + λT]_B^C x = [(S + λT)([·]_B^{−1}x)]_C = [S([·]_B^{−1}x) + λT([·]_B^{−1}x)]_C = [S([·]_B^{−1}x)]_C + λ[T([·]_B^{−1}x)]_C = [S]_B^C x + λ[T]_B^C x
for all x ∈ F^n.
We give a direct proof as well. We have that
Sv_j = ∑_{i=1}^m ([S]_B^C)_{i,j} w_i and Tv_j = ∑_{i=1}^m ([T]_B^C)_{i,j} w_i
for all j ∈ [n]. Thus
(S + λT)v_j = Sv_j + λTv_j = ∑_{i=1}^m ([S]_B^C)_{i,j} w_i + λ ∑_{i=1}^m ([T]_B^C)_{i,j} w_i = ∑_{i=1}^m (([S]_B^C)_{i,j} + λ([T]_B^C)_{i,j}) w_i
for all j ∈ [n], and ([S + λT]_B^C)_{i,j} = ([S]_B^C)_{i,j} + λ([T]_B^C)_{i,j}, so [S + λT]_B^C = [S]_B^C + λ[T]_B^C.
Proposition 3.4.9. Suppose T ∈ L(U, V) and S ∈ L(V, W), and suppose that A = {u_1, . . . , u_m}
is a basis for U, B = {v_1, . . . , v_n} is a basis for V, and C = {w_1, . . . , w_p} is a basis for W.
Then [ST]_A^C = [S]_B^C [T]_A^B.
Proof. We have that
STu_j = S ∑_{i=1}^n ([T]_A^B)_{i,j} v_i = ∑_{i=1}^n ([T]_A^B)_{i,j} (Sv_i) = ∑_{i=1}^n ([T]_A^B)_{i,j} (∑_{k=1}^p ([S]_B^C)_{k,i} w_k)
= ∑_{k=1}^p (∑_{i=1}^n ([S]_B^C)_{k,i} ([T]_A^B)_{i,j}) w_k = ∑_{k=1}^p ([S]_B^C [T]_A^B)_{k,j} w_k
for all j ∈ [m], so [ST]_A^C = [S]_B^C [T]_A^B.
Exercises
V, W will denote vector spaces. Let T ∈ L(V, W).
Exercise 3.4.10.
Exercise 3.4.11.
Exercise 3.4.12. Suppose dim(V ) = n < ∞ and dim(W) = m < ∞, and let B, C be bases
for V, W respectively.
(1) Show that L(V, W) ≅ M_{m×n}(F).
(2) Show T is invertible if and only if [T]_B^C is invertible.
Exercise 3.4.13. Suppose V is finite dimensional and B, C are two bases of V. Show
[T]_B ∼ [T]_C.
Chapter 4
Polynomials
In this section, we discuss the background material on polynomials needed for linear algebra.
The two main results of this section are the Euclidean Algorithm and the Fundamental
Theorem of Algebra. The ﬁrst is a result on factoring polynomials, and the second says
that every polynomial in C[z] has a root. The latter result relies on a result from complex
analysis which is stated but not proved. For this section, F is a ﬁeld.
4.1 The Algebra of Polynomials
Definition 4.1.1. A polynomial p over F is a sequence p = (a_i)_{i∈Z_{≥0}} where a_i ∈ F for all
i ∈ Z_{≥0} such that there is an n ∈ N such that a_i = 0 for all i > n. The minimal n ∈ Z_{≥0}
such that a_i = 0 for all i > n (if it exists) is called the degree of p, denoted deg(p), and we
define the degree of the zero polynomial, the sequence of all zeroes, denoted 0, to be −∞.
The leading coefficient of p, denoted LC(p), is a_{deg(p)}, and the leading coefficient of 0 is 0.
Note that p = 0 if and only if LC(p) = 0. A polynomial p ∈ F[z] is called monic if LC(p) = 1.
Often, we identify a polynomial with the function it induces. The zero polynomial 0
induces the zero function 0 : F → F by z ↦ 0. A nonzero polynomial p = (a_i) induces a
function still denoted p : F → F given by
p(z) = ∑_{i=0}^{deg(p)} a_i z^i.
Sometimes when we identify the polynomial with the function, we call p a polynomial over
F with indeterminate z, and a_i is called the coefficient of the z^i term for i = 0, . . . , n. If we
say that p is the polynomial given by
p(z) = ∑_{i=0}^n a_i z^i,
then it is implied that p = (a_i) where a_i = 0 for all i > n. Note that it is still possible in
this case that deg(p) < n.
The set of all polynomials over F is denoted F[z]. The set of all polynomials of degree
less than or equal to n ∈ Z_{≥0} is denoted P_n(F). If p, q ∈ F[z] with p = (a_i) and q = (b_i),
then we define the polynomial p + q = (c_i) ∈ F[z] by c_i = a_i + b_i for all i ∈ Z_{≥0}. Note that
if we identify p, q with the functions they induce:
p(z) = ∑_{i=0}^{deg(p)} a_i z^i and q(z) = ∑_{i=0}^{deg(q)} b_i z^i,
the polynomial p + q ∈ F[z] is given by
(p + q)(z) = p(z) + q(z) = ∑_{i=0}^{deg(p)} a_i z^i + ∑_{i=0}^{deg(q)} b_i z^i = ∑_{i=0}^{max{deg(p),deg(q)}} (a_i + b_i) z^i.
We define the polynomial pq = (c_i) ∈ F[z] by
c_k = ∑_{i+j=k} a_i b_j for all k ∈ Z_{≥0}.
Alternatively, if we are using the function notation,
(pq)(z) = p(z)q(z) = (∑_{i=0}^{deg(p)} a_i z^i)(∑_{i=0}^{deg(q)} b_i z^i) = ∑_{k=0}^{deg(p)+deg(q)} (∑_{i+j=k} a_i b_j) z^k.
If λ ∈ F and p = (a_i) ∈ F[z], then we define the polynomial λp = (c_i) by c_i = λa_i for all
i ∈ Z_{≥0}. Alternatively, we have
(λp)(z) = λp(z) = λ ∑_{i=0}^{deg(p)} a_i z^i = ∑_{i=0}^{deg(p)} (λa_i) z^i.
Examples 4.1.2.
(1) p(z) = z^2 + 1 is a polynomial in F[z].
(2) A linear polynomial is a polynomial of degree 1, i.e. it is of the form λz − µ for λ, µ ∈ F
with λ ≠ 0.
Remarks 4.1.3.
(1) It is clear that LC(pq) = LC(p) LC(q), so pq is monic if p and q are monic.
(2) A consequence of (1) is that pq = 0 implies p = 0 or q = 0 for p, q ∈ F[z].
(3) Another consequence of (1) is that deg(pq) = deg(p) + deg(q) for all p, q ∈ F[z].
(4) deg(p + q) ≤ max{deg(p), deg(q)} for all p, q ∈ F[z], and if p, q ≠ 0, we have deg(p + q) ≤
deg(p) + deg(q).
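For example, if p(z) = z^2 + 1 and q(z) = z − 1 in R[z], then (pq)(z) = z^3 − z^2 + z − 1 and (p + q)(z) = z^2 + z, so deg(pq) = 3 = deg(p) + deg(q) and deg(p + q) = 2 = max{deg(p), deg(q)}.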
Definition 4.1.4. An F-algebra, or an algebra over F, is a vector space (A, +, ·) together with
a binary operation ∗ on A called multiplication such that the following axioms hold:
(A1) multiplicative associativity: (a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c ∈ A,
(A2) distributivity: a ∗ (b + c) = a ∗ b + a ∗ c and (a + b) ∗ c = a ∗ c + b ∗ c for all a, b, c ∈ A, and
(A3) compatibility with scalar multiplication: λ · (a ∗ b) = (λ · a) ∗ b for all λ ∈ F and a, b ∈ A.
The algebra is called unital if there is a multiplicative identity 1 ∈ A such that a ∗ 1 = 1 ∗ a = a
for all a ∈ A, and the algebra is called commutative if a ∗ b = b ∗ a for all a, b ∈ A.
Examples 4.1.5.
(1) The zero vector space (0) is a unital, commutative algebra over F with multiplication
given as 0 ∗ 0 = 0. The unit in the algebra in this case is 0.
(2) M_n(F) is an algebra over F.
(3) L(V) is an algebra over F if V is a vector space over F, where multiplication is given by
S ∗ T = S ◦ T = ST.
(4) C(a, b) is an algebra over R. This is also true for closed and half-open intervals.
(5) C^n(a, b) is an algebra over R for all n ∈ N ∪ {∞}.
Proposition 4.1.6. F[z] is an F-algebra with addition, multiplication, and scalar multiplication
defined as in 4.1.1.
Proof. Exercise.
Exercises
Exercise 4.1.7 (Ideals). An ideal I in an F-algebra A is a subspace I ⊂ A such that a ∗ x ∈ I
and x ∗ a ∈ I for all x ∈ I and a ∈ A.
(1) Show that I = {p ∈ F[z] | p(0) = 0} is an ideal in F[z].
(2) Show that I_x = {f ∈ C[a, b] | f(x) = 0} is an ideal in C[a, b] for all x ∈ [a, b].
(3) Show that I = {f ∈ C(R, R) | there are a, b ∈ R such that f(x) = 0 for all x < a and x > b}
is an ideal in C(R, R) (note that the a, b ∈ R depend on the f).
Exercise 4.1.8 (Ideals of Matrix Algebras). Find all ideals of M_n(F).
Exercise 4.1.9. Let P_n(F) be the subset of F[z] consisting of all polynomials of degree less
than or equal to n, i.e.
P_n(F) = {p ∈ F[z] | deg(p) ≤ n}.
Show that P_n(F) is a subspace of F[z] and that dim(P_n(F)) = n + 1.
Exercise 4.1.10. Let B = (1, z, z^2, z^3), respectively C = (1, z, z^2), be an ordered basis
of P_3(F), respectively P_2(F); let B′ = (z^3, z^2, z, 1), respectively C′ = (z^2, z, 1), be another
ordered basis of P_3(F), respectively P_2(F); and let D ∈ L(P_3(F), P_2(F)) be the derivative
map
az^3 + bz^2 + cz + d ⟼ 3az^2 + 2bz + c for all a, b, c, d ∈ F.
Find [D]_B^C and [D]_{B′}^{C′}.
4.2 The Euclidean Algorithm
Theorem 4.2.1 (Euclidean Algorithm). Let p, q ∈ F[z] \ {0} with deg(q) ≤ deg(p). Then
there are unique k, r ∈ F[z] such that p = qk + r and deg(r) < deg(q).
Proof. Suppose
p(z) = ∑_{i=0}^m a_i z^i and q(z) = ∑_{i=0}^n b_i z^i
where m ≥ n, a_m ≠ 0, and b_n ≠ 0. Then set
k_1(z) = (a_m/b_n) z^{m−n} and p_2(z) = p(z) − k_1(z)q(z),
and note that deg(p_2) < deg(p). If deg(p_2) < deg(q), we are finished by setting k = k_1 and
r = p_2. Otherwise, set
k_2(z) = (LC(p_2)/b_n) z^{deg(p_2)−n} and p_3(z) = p_2(z) − k_2(z)q(z),
and note that deg(p_3) < deg(p_2). If deg(p_3) < deg(q), we are finished by setting k = k_1 + k_2
and r = p_3. Otherwise, set
k_3(z) = (LC(p_3)/b_n) z^{deg(p_3)−n} and p_4(z) = p_3(z) − k_3(z)q(z),
and note that deg(p_4) < deg(p_3). If deg(p_4) < deg(q), we are finished by setting k = k_1 + k_2 + k_3
and r = p_4. Otherwise, we may repeat this process as many times as necessary, and note
that this algorithm will terminate as deg(p_{j+1}) < deg(p_j) (eventually deg(p_j) < deg(q) for
some j).
It remains to show k, r are unique. Suppose p = k_1 q + r_1 = k_2 q + r_2 with deg(r_1), deg(r_2) <
deg(q). Then 0 = (k_1 − k_2)q + (r_1 − r_2). Now since deg(r_1 − r_2) < deg(q), we have that
0 = LC((k_1 − k_2)q + (r_1 − r_2)) = LC((k_1 − k_2)q) = LC(k_1 − k_2) LC(q),
so k_1 = k_2 as q ≠ 0. It immediately follows that r_1 = r_2.
Exercises
Exercise 4.2.2 (Principal Ideals). Suppose A is a unital, commutative algebra over F. An
ideal I ⊂ A is called principal if there is an x ∈ I such that I = {ax | a ∈ A}. Show that
every ideal of F[z] is principal (see 4.1.7).
4.3 Prime Factorization
Definition 4.3.1. For p, q ∈ F[z], we say p divides q, denoted p | q, if there is a polynomial
k ∈ F[z] such that kp = q.
Examples 4.3.2.
(1) (z ± 1) | (z^2 − 1).
(2) (z ± i) | (z^2 + 1) in C[z], but no linear polynomial in R[z] divides z^2 + 1.
Remark 4.3.3. Note that the above deﬁnes a relation on F[z] that is reﬂexive and transitive
(see 1.1.6). It is also antisymmetric on the set of monic polynomials.
Definition 4.3.4. A polynomial p ∈ F[z] with deg(p) ≥ 1 is called irreducible (or prime) if
q ∈ F[z] with q | p and deg(q) < deg(p) implies q is constant.
Examples 4.3.5.
(1) Every linear (degree 1) polynomial is irreducible.
(2) z^2 + 1 is irreducible over R (in R[z]), but not over C (in C[z]).
Definition 4.3.6. Polynomials p_1, . . . , p_n ∈ F[z] where n ≥ 2 and deg(p_i) ≥ 1 for all i ∈ [n]
are called relatively prime if q | p_i for all i ∈ [n] implies q is constant.
Examples 4.3.7.
(1) Any n ≥ 2 distinct monic linear polynomials in F[z] are relatively prime.
(2) Any n ≥ 2 distinct quadratics z^2 + az + b ∈ R[z] such that a^2 − 4b < 0 are relatively
prime.
(3) Any quadratic z^2 + az + b ∈ R[z] with a^2 − 4b < 0 and any linear polynomial in R[z] are
relatively prime.
Proposition 4.3.8. p_1, . . . , p_n ∈ F[z] where n ≥ 2 are relatively prime if and only if there
are q_1, . . . , q_n ∈ F[z] such that
∑_{i=1}^n q_i p_i = 1.
Proof. Suppose there are q_1, . . . , q_n ∈ F[z] such that
∑_{i=1}^n q_i p_i = 1,
and suppose q | p_i for all i ∈ [n]. Then q | 1, so q must be constant, and the p_i's are relatively
prime.
Suppose now that the p_i's are relatively prime. Choose polynomials q_1, . . . , q_n ∈ F[z]
such that
f = ∑_{i=1}^n q_i p_i
has minimal degree which is not −∞ (in particular, f ≠ 0). Now we know deg(f) ≤ deg(p_j)
as we could have q_j = 1 and q_i = 0 for all i ≠ j. By the Euclidean algorithm, for j ∈ [n],
there are unique k_j, r_j with deg(r_j) < deg(f) such that p_j = k_j f + r_j. Since
r_j = p_j − k_j f = p_j − k_j ∑_{i=1}^n q_i p_i = (1 − k_j q_j)p_j − ∑_{i≠j} k_j q_i p_i,
we must have that r_j = 0 for all j ∈ [n] since deg(f) was chosen to be minimal. Thus f | p_i
for all i ∈ [n], so f must be constant. As f ≠ 0, we may divide by f, and replacing q_i with
q_i/f for all i ∈ [n], we get the desired result.
Corollary 4.3.9. Suppose p, q_1, . . . , q_n ∈ F[z] where n ≥ 2 and p is irreducible. If
p | (q_1 · · · q_n), then p | q_i for some i ∈ [n].
Proof. We proceed by induction on n ≥ 2.
n = 2: Suppose q_1, q_2 ∈ F[z] and p | q_1 q_2. If p | q_1, we are finished. Otherwise, p and q_1 are
relatively prime (if k is nonconstant and k | p, then k is a nonzero constant multiple of p by
irreducibility, but p ∤ q_1, so k ∤ q_1). By 4.3.8, there are g_1, g_2 ∈ F[z] such that g_1 p + g_2 q_1 = 1.
We then get
q_2 = g_1 q_2 p + g_2 q_1 q_2.
As p | g_1 q_2 p and p | g_2 q_1 q_2, we have p | q_2.
n − 1 ⇒ n: We have p | (q_1 · · · q_{n−1})q_n, so by the case n = 2, either p | q_n, in which case we are
finished, or p | q_1 · · · q_{n−1}, in which case we apply the induction hypothesis to get p | q_i for some
i ∈ [n − 1].
Lemma 4.3.10. Suppose p, q, r ∈ F[z] with p ≠ 0 such that pq = pr. Then q = r.
Proof. We have p(q − r) = 0. As LC(p(q − r)) = LC(p) LC(q − r), we must have that
LC(q − r) = 0, and q = r.
Theorem 4.3.11 (Unique Factorization). Let p ∈ F[z] with deg(p) ≥ 1. Then there are
unique monic irreducible polynomials p_1, . . . , p_n and a unique constant λ ∈ F \ {0} such that
p = λ p_1 · · · p_n = λ ∏_{i=1}^n p_i.
Proof.
Existence: We proceed by induction on deg(p).
deg(p) = 1: This case is trivial as p/LC(p) is a monic irreducible polynomial and p = LC(p)(p/LC(p)).
deg(p) > 1: We assume the result holds for all polynomials of degree less than deg(p). If p is
irreducible, then so is p/LC(p), and we have p = LC(p)(p/LC(p)). If p is not irreducible,
there are nonconstant polynomials q, r ∈ F[z] with deg(q), deg(r) < deg(p) such that
p = qr. As the result holds for q and r, it holds for p.
Uniqueness: Suppose
p = λ ∏_{i=1}^n p_i = µ ∏_{j=1}^m q_j
where all the p_i's and q_j's are monic, irreducible polynomials. Then λ = µ = LC(p). Now p_n divides
q_1 · · · q_m, so p_n | q_i for some i ∈ [m]. After relabeling, we may assume i = m. But then
p_n = q_m, so
p_1 · · · p_{n−1} p_n = q_1 · · · q_{m−1} p_n.
By 4.3.10, we have that p_1 · · · p_{n−1} = q_1 · · · q_{m−1}, and we may repeat this process. Eventually,
we see that n = m and that the q_j's are at most a rearrangement of the p_i's.
Exercises
Exercise 4.3.12. Suppose p, q ∈ F[z] are relatively prime. Show that p^n and q^m are relatively
prime for n, m ∈ N.
Exercise 4.3.13. Let λ_1, . . . , λ_n ∈ F be distinct and let µ_1, . . . , µ_n ∈ F. Show that there is
a unique polynomial p ∈ F[z] of degree ≤ n − 1 such that p(λ_i) = µ_i for all i ∈ [n]. Deduce
that a polynomial of degree d is uniquely determined by where it sends d + 1 (distinct) points
of F.
Exercise 4.3.14 (Greatest Common Divisors and Least Common Multiples). Let p_1, p_2 ∈
F[z] with degree ≥ 1.
(1) Show that there is a unique monic polynomial q ∈ F[z] of minimal degree such that p_1 | q
and p_2 | q. This polynomial, usually denoted lcm(p_1, p_2), is called the least common multiple
of p_1 and p_2.
(2) Show that there is a unique monic polynomial q ∈ F[z] of maximal degree such that
q | p_1 and q | p_2. This polynomial, usually denoted gcd(p_1, p_2), is called the greatest
common divisor of p_1 and p_2.
4.4 Irreducible Polynomials
For this section, F will denote a field and K ⊂ F will be a subfield such that F is a finite
dimensional K-vector space.
Definition 4.4.1. λ ∈ F is called a root of p ∈ K[z] where deg(p) ≥ 1 if p(λ) = 0.
Definition-Proposition 4.4.2. Let λ ∈ F. The irreducible polynomial of λ over the field
K is the unique monic irreducible polynomial Irr_{K,λ} ∈ K[z] of minimal degree such that
Irr_{K,λ}(λ) = 0.
Proof.
Existence: Since λ ∈ F, λ^k ∈ F for all k ∈ Z_{≥0}. The set {λ^k | k ∈ Z_{≥0}} cannot be linearly
independent over K as F is a finite dimensional K-vector space, so there are scalars µ_0, . . . , µ_n ∈ K
with µ_n ≠ 0 such that
∑_{i=0}^n µ_i λ^i = 0 ⟹ ∑_{i=0}^n (µ_i/µ_n) λ^i = 0.
Define p ∈ K[z] by
p(z) = ∑_{i=0}^n (µ_i/µ_n) z^i.
Then p(λ) = 0, so there is a monic polynomial in K[z] with λ as a root.
Pick a monic polynomial q ∈ K[z] of minimal degree such that q(λ) = 0. Then q is
irreducible (otherwise, there are k, r ∈ K[z] of degree ≥ 1 such that q = kr, so q(λ) =
k(λ)r(λ) = 0, so either k(λ) = 0 or r(λ) = 0, a contradiction as q was chosen of minimal
degree).
Uniqueness: Suppose p, q ∈ K[z] are both monic polynomials of minimal degree such that
p(λ) = q(λ) = 0. Then deg(p) = deg(q), so by the Euclidean algorithm, there are k, r ∈ K[z]
with deg(r) < deg(q) such that p = kq + r. Now p(λ) = k(λ)q(λ) + r(λ), so r(λ) = 0. This
is only possible if r = 0 as deg(r) < deg(q). Hence p = kq with deg(p) = deg(q), so k is a
constant. As p, q are both monic, k = 1, and p = q.
Corollary 4.4.3. Suppose λ ∈ F. Then λ ∈ K if and only if deg(Irr_{K,λ}) = 1.
Corollary 4.4.4. Suppose λ ∈ F and dim_K(F) = n. Then deg(Irr_{K,λ}) ≤ n.
Lemma 4.4.5. Suppose λ ∈ F is a root of p ∈ K[z] where deg(p) ≥ 1. Then Irr_{K,λ} | p.
Proof. The proof is similar to 4.4.2. Without loss of generality, we may assume p is monic (if
not, we replace p with p/LC(p)). Now since deg(Irr_{K,λ}) ≤ deg(p), by the Euclidean algorithm,
there are k, r ∈ K[z] with deg(r) < deg(Irr_{K,λ}) such that p = k Irr_{K,λ} + r. We immediately
see r(λ) = 0, so r = 0 by the minimality of deg(Irr_{K,λ}).
Example 4.4.6. We know that C is a two dimensional real vector space. If λ ∈ C is a root
of p ∈ R[z], then the complex conjugate λ̄ is also a root of p. In fact, since p(λ) = 0 and the
coefficients of p are all real numbers, p(λ̄) equals the complex conjugate of p(λ), which is 0.
Hence, the irreducible polynomial in R[z] for λ ∈ C is given by
Irr_{R,λ}(z) = z − λ if λ ∈ R, and
Irr_{R,λ}(z) = (z − λ)(z − λ̄) = z^2 − 2 Re(λ)z + |λ|^2 if λ ∉ R.
Definition 4.4.7. The polynomial p ∈ F[z] splits (into linear factors) if there are λ, λ_1, . . . , λ_n ∈
F such that
p(z) = λ ∏_{i=1}^n (z − λ_i).
Examples 4.4.8.
(1) Every constant polynomial trivially splits in F[z].
(2) The polynomial z^2 + 1 splits in C[z] but not in R[z].
Deﬁnition 4.4.9. The ﬁeld F is called algebraically closed if every p ∈ F[z] splits.
Exercises
4.5 The Fundamental Theorem of Algebra
Definition 4.5.1. A function f : C → C is entire if there is a power series representation
f(z) = ∑_{n=0}^∞ a_n z^n, where a_n ∈ C for all n ∈ Z_{≥0},
that converges for all z ∈ C.
Examples 4.5.2.
(1) Every polynomial in C[z] is entire as its power series representation is itself.
(2) f(z) = e^z is entire as
e^z = ∑_{n=0}^∞ z^n/n!.
(3) f(z) = cos(z) is entire as
cos(z) = ∑_{n=0}^∞ (−1)^n z^{2n}/(2n)!.
(4) f(z) = sin(z) is entire as
sin(z) = ∑_{n=0}^∞ (−1)^n z^{2n+1}/(2n + 1)!.
In order to prove the Fundamental Theorem of Algebra, we will need a theorem from
complex analysis. We state it without proof.
Theorem 4.5.3 (Liouville). Every bounded, entire function f : C →C is constant.
Theorem 4.5.4 (Fundamental Theorem of Algebra). Every nonconstant polynomial in C[z]
has a root.
Proof. Let p be a nonconstant polynomial in C[z]. If p has no roots, then 1/p is entire and
bounded. By Liouville's Theorem, 1/p is constant, so p is constant, a contradiction.
Remark 4.5.5. Let p ∈ C[z] be a nonconstant polynomial, and let n = deg(p). Then by
4.5.4, p has a root, say µ_1, so there is a p_1 ∈ C[z] such that p(z) = (z − µ_1)p_1(z). If p_1 is
nonconstant, then apply 4.5.4 again to see p_1 has a root µ_2, so there is a p_2 ∈ C[z] with
deg(p_2) = n − 2 such that p(z) = (z − µ_1)(z − µ_2)p_2(z). Repeating this process, we see that
p_n must be constant as a polynomial of degree n has at most n roots. Hence there are
µ_1, . . . , µ_n ∈ C such that
p(z) = k ∏_{j=1}^n (z − µ_j) = k(z − µ_1) · · · (z − µ_n) for some k ∈ C.
Hence every polynomial in C[z] splits.
Proposition 4.5.6. Every nonconstant polynomial p ∈ R[z] splits into linear and quadratic
factors.
Proof. We know p has a root λ ∈ C by 4.5.4, so by 4.4.6, Irr_{R,λ} | p, i.e. there is a p_2 ∈ R[z]
with deg(p_2) < deg(p) such that p = Irr_{R,λ} p_2. If p_2 is constant we are finished. If not,
we may repeat the process for p_2 to get p_3, and so forth. This process will terminate as
deg(p_{n+1}) < deg(p_n).
Exercises
4.6 The Polynomial Functional Calculus
Definition 4.6.1 (Polynomial Functional Calculus). Given a polynomial
p(z) = ∑_{j=0}^n λ_j z^j ∈ F[z]
and T ∈ L(V), we can define an operator by
p(T) = ∑_{j=0}^n λ_j T^j
with the convention that T^0 = I.
Proposition 4.6.2. The polynomial functional calculus satisﬁes:
(1) (p +q)(T) = p(T) + q(T) for all p, q ∈ F[z] and T ∈ L(V ),
(2) (pq)(T) = p(T)q(T) = q(T)p(T) for all p, q ∈ F[z] and T ∈ L(V ),
(3) (λp)(T) = λp(T) for all λ ∈ F, p ∈ F[z] and T ∈ L(V ), and
(4) (p ◦ q)(T) = p(q(T)) for all p, q ∈ F[z] and T ∈ L(V ).
Proof. Exercise.
Exercises
Exercise 4.6.3. Prove 4.6.2.
Chapter 5
Eigenvalues, Eigenvectors, and the
Spectrum
For this chapter, V will denote a vector space over F. We begin our study of operators in
L(V) by studying the spectrum of an element T ∈ L(V), which gives a lot of information
about the operator. We then discuss methods for calculating the spectrum sp(T) when V is
finite dimensional, namely the characteristic and minimal polynomials. For A ∈ M_n(F), we
will identify L_A ∈ L(F^n) with the matrix A.
5.1 Eigenvalues and Eigenvectors
Definition 5.1.1. Let T ∈ L(V).
(1) The spectrum of T, denoted sp(T), is {λ ∈ F | T − λI is not invertible}.
(2) If v ∈ V \ {0} such that Tv = λv for some λ ∈ F, then v is called an eigenvector of T
with corresponding eigenvalue λ.
(3) If λ is an eigenvalue of T, i.e. there is an eigenvector v of T with corresponding eigenvalue
λ, then E_λ = {w ∈ V | Tw = λw} is the eigenspace associated to the eigenvalue λ. It is clear
that E_λ is a subspace of V.
Proposition 5.1.2. Suppose V is finite dimensional. Then λ ∈ F is an eigenvalue of
T ∈ L(V) if and only if λ ∈ sp(T).
Proof. By 3.2.15, T − λI is not invertible if and only if T − λI is not injective. It is clear
that T −λI is not injective if and only if there is an eigenvector v of T with corresponding
eigenvalue λ.
Remark 5.1.3. Note that 5.1.2 depends on the ﬁnite dimensionality of V as 3.2.15 depends
on the ﬁnite dimensionality of V .
Examples 5.1.4.
(1) Suppose A ∈ M_n(F) is an upper triangular matrix. Then sp(L_A) is the set of distinct
values on the diagonal of A. This can be seen as A − λI is not invertible if and only if λ
appears on the diagonal of A.
(2) The eigenvalues of the operator
A = 1/2 [ 1 1 ]
        [ 1 1 ] ∈ M_2(F)
are 0, 1, corresponding to the respective eigenvectors
(1/√2)(1, −1)^T and (1/√2)(1, 1)^T.
If we had an eigenvector (a, b)^T associated to the eigenvalue λ, then
A(a, b)^T = (1/2)(a + b, a + b)^T = λ(a, b)^T,
so a + b = 2λa = 2λb. Thus λ(a − b) = 0, so either λ = 0 or a = b. In the latter case, we
have λ = 1 as a = b ≠ 0. Hence sp(L_A) = {0, 1}.
(3) The operator
B = [ 0 −1 ]
    [ 1  0 ] ∈ M_2(R)
has no eigenvalues. In fact, if
B(a, b)^T = (−b, a)^T = λ(a, b)^T,
then we must have that λa = −b and λb = a, so λ^2 b = λa = −b, and (λ^2 + 1)b = 0. Now
λ^2 + 1 is nonzero as λ ∈ R, so b = 0. Thus a = 0, and B has no eigenvectors.
If instead we consider B ∈ M_2(C), then B has eigenvalues ±i corresponding to the respective
eigenvectors
(1/√2)(1, ∓i)^T.
(4) Consider L_z ∈ L(F[z]) given by L_z(p(z)) = zp(z). Then the set of eigenvalues of L_z is
empty as zp(z) = λp(z) if and only if (z − λ)p(z) = 0 if and only if p = 0, for all λ ∈ F.
(5) Recall that a polynomial is really a sequence of elements of F which is eventually zero.
Now the operator L_z discussed above is really the right shift operator R on F[z]:
R(a_0, a_1, a_2, a_3, . . . ) = L_z(a_0, a_1, a_2, a_3, . . . ) = (0, a_0, a_1, a_2, . . . ).
One can also define the left shift operator L ∈ L(F[z]) by
L(a_0, a_1, a_2, a_3, . . . ) = (a_1, a_2, a_3, a_4, . . . ).
Note that LR = I, the identity operator in L(F[z]), but R is not surjective as the constant
polynomials are not hit, and L is not injective as the constant polynomials are killed. Thus
L, R are not invertible, so 0 ∈ sp(R) and 0 ∈ sp(L). Thus R is an example of an operator
with no eigenvalues, but 0 ∈ sp(R). In fact, the only eigenvalue of L is 0, as Lp = λp implies
deg(λp) = deg(Lp) = deg(p) − 1 if deg(p) ≥ 1, and deg(Lp) = −∞ if deg(p) ≤ 0,
which is only possible if λp = 0, and Lp = 0 only for the constant polynomials.
Definition 5.1.5. Suppose T ∈ L(V). A subspace W ⊂ V is called T-invariant if TW ⊂ W.
Examples 5.1.6. Suppose T ∈ L(V).
(1) (0) and V are the trivial T-invariant subspaces.
(2) ker(T) and im(T) are T-invariant subspaces.
(3) An eigenspace of T is a T-invariant subspace.
Notation 5.1.7. Given T ∈ L(V) and a T-invariant subspace W ⊂ V, we define the
restriction of T to W in L(W), denoted T|_W, by T|_W(w) = Tw for all w ∈ W. Note that
T|_W is just the restriction of the function T : V → V to W with the codomain restricted as
well.
Lemma 5.1.8 (Polynomial Eigenvalue Mapping). Suppose p ∈ F[z], T ∈ L(V ), and v is an
eigenvector for T corresponding to λ ∈ sp(T). Then p(T)v = p(λ)v, so p(λ) ∈ sp(p(T)).
Proof. Exercise.
Proposition 5.1.9. Let T ∈ L(V), and suppose λ_1, . . . , λ_n are distinct eigenvalues of T.
Suppose v_i ∈ E_{λ_i} for all i ∈ [n] such that ∑_{i=1}^n v_i = 0. Then v_i = 0 for all i ∈ [n]. Thus
⊕_{i=1}^n E_{λ_i}
is a well-defined subspace of V.
Proof. For i ∈ [n] define f_i ∈ F[z] by
f_i(z) = ∏_{j≠i} (z − λ_j) / ∏_{j≠i} (λ_i − λ_j).
Then
f_i(λ_j) = 1 if i = j, and f_i(λ_j) = 0 otherwise.
By 5.1.8, for i ∈ [n],
0 = f_i(T)0 = f_i(T) ∑_{j=1}^n v_j = ∑_{j=1}^n f_i(T)v_j = ∑_{j=1}^n f_i(λ_j)v_j = v_i.
Remark 5.1.10. Eigenvectors corresponding to distinct eigenvalues are linearly independent,
so T can have at most dim(V ) many distinct eigenvalues.
Exercises
Exercise 5.1.11. We call two operators S, T ∈ L(V) similar, denoted S ∼ T, if there is an
invertible J ∈ L(V) such that J^{−1}SJ = T. The similarity class of S is {T ∈ L(V) | T ∼ S}.
Show
(1) ∼ defines a relation on L(V) which is reflexive, symmetric, and transitive (see 1.1.6),
(2) distinct similarity classes are disjoint, and
(3) if S ∼ T, then sp(S) = sp(T). Thus the spectrum is an invariant of a similarity class.
Exercise 5.1.12 (Finite Shift Operators). Let B = {v_1, . . . , v_n} be a basis of V. Find the
spectrum of the shift operator T ∈ L(V) given by Tv_i = v_{i+1} for i = 1, . . . , n − 1 and
Tv_n = v_1.
Exercise 5.1.13 (Infinite Shift Operators). Let
ℓ^1(N, R) = {(a_n) | a_n ∈ R for all n ∈ N and ∑_{n=1}^∞ |a_n| < ∞},
ℓ^∞(N, R) = {(a_n) | |a_n| ≤ M for all n ∈ N for some M ∈ N}, and
R^∞ = {(a_n) | a_n ∈ R for all n ∈ N}.
In other words, ℓ^1(N, R) is the set of absolutely convergent sequences of real numbers,
ℓ^∞(N, R) is the set of bounded sequences of real numbers, and R^∞ is the set of all sequences
of real numbers.
(1) Show that ℓ^1(N, R), ℓ^∞(N, R), and R^∞ are vector spaces over R.
Hint: First show that R^∞ is a vector space over R, and then show ℓ^1(N, R) and ℓ^∞(N, R) are
subspaces. To show ℓ^1(N, R) is closed under addition, use the fact that if (a_n) is absolutely
convergent, then we can add up the terms |a_n| in any order that we want and we will still
get the same number.
(2) Define S_1 ∈ L(ℓ^1(N, R)), S_∞ ∈ L(ℓ^∞(N, R)), and S_0 ∈ L(R^∞) by
S_i(a_1, a_2, a_3, . . . ) = (a_2, a_3, a_4, . . . ) for i = 0, 1, ∞.
(a) Find the set of eigenvalues, denoted E_{S_i}, for i = 0, 1, ∞.
(b) Why is part (a) asking you to find the set of eigenvalues of S_i and not the spectrum
of S_i for i = 0, 1, ∞?
Exercise 5.1.14. Let V be a real vector space. λ ∈ C \ R is called a pseudo-eigenvalue of
T ∈ L(V) if λ is an eigenvalue of T_C ∈ L(V_C). Suppose λ = a + ib is a pseudo-eigenvalue of
T ∈ L(V), and suppose u + iv ∈ V_C \ {0} such that T_C(u + iv) = λ(u + iv). Set B = {u, v}.
Show
(1) W = span(B) ⊂ V is invariant for T,
(2) B is a basis for W, and
(3) min_{T|_W} = Irr_{R,λ}.
(4) Find [T|_W]_B.
5.2 Determinants
In this section, we discuss how to take the determinant of a square matrix. The determinant
is a crude tool for calculating the spectrum of L_A for A ∈ M_n(F) via the characteristic
polynomial which is deﬁned in the next section. This method is only recommended for small
matrices, and it is generally useless for large matrices without the use of a computer. Even
then, one can only get numerical approximations to the eigenvalues.
Definition 5.2.1. A permutation on [n] is a bijection σ : [n] → [n]. The set of all permutations
on [n] is denoted S_n. Permutations are denoted
( 1    · · ·  n    )
( σ(1) · · · σ(n) ).
A permutation σ ∈ S_n is called a transposition if σ(i) = i for all but two distinct elements
j, k ∈ [n], where σ(j) = k and σ(k) = j. Note that the composite of two permutations in S_n is
a permutation in S_n.
Examples 5.2.2.
(1)
(2)
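For instance, the map σ ∈ S_3 given by σ(1) = 2, σ(2) = 3, σ(3) = 1, denoted
( 1 2 3 )
( 2 3 1 ),
is a permutation on [3], and the permutation τ ∈ S_3 with τ(1) = 2, τ(2) = 1, τ(3) = 3 is a transposition.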
Definition 5.2.3. An inversion in σ ∈ S_n is a pair (i, j) ∈ [n] × [n] such that i < j and
σ(i) > σ(j). We call σ ∈ S_n odd if there are an odd number of inversions in σ, and we call σ
even if there are an even number of inversions in σ. Define the sign function sgn : S_n → {±1}
by sgn(σ) = 1 if σ is even and sgn(σ) = −1 if σ is odd.
Examples 5.2.4.
(1) The identity permutation has sign 1 since it has no inversions, and a transposition has
sign −1 as it has only one inversion.
(2)
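For instance, the permutation σ ∈ S_3 with σ(1) = 3, σ(2) = 1, σ(3) = 2 has exactly two inversions, namely (1, 2) and (1, 3), so σ is even and sgn(σ) = 1.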
Definition 5.2.5. The determinant of A ∈ M_n(F) is
det(A) = ∑_{σ∈S_n} sgn(σ) ∏_{i=1}^n A_{i,σ(i)}.
Example 5.2.6. We calculate the determinant of the 2 × 2 matrix
A = [ a b ]
    [ c d ] ∈ M_2(F).
There are two permutations in S_2: the identity permutation and the transposition switching
1 and 2. Call these id, σ respectively. It is clear that id has no inversions and σ has one
inversion, so sgn(id) = 1 and sgn(σ) = −1. Hence
det(A) = sgn(id) ∏_{i=1}^2 A_{i,id(i)} + sgn(σ) ∏_{i=1}^2 A_{i,σ(i)} = A_{1,1}A_{2,2} + (−1)A_{1,2}A_{2,1} = ad − bc ∈ F.
Proposition 5.2.7. For A ∈ M_n(F), we can calculate det(A) recursively. Let A_{i,j} ∈
M_{n−1}(F) be the matrix obtained from A by deleting the i-th row and j-th column. Then fixing
i ∈ [n], we have
det(A) = ∑_{j=1}^n (−1)^{i+j} A_{i,j} det(A_{i,j}),
where the A_{i,j} outside det(·) denotes the (i, j)-th entry of A. This procedure is commonly
referred to as taking the determinant of A by expanding along the i-th row of A. We may
also take the determinant by expanding along the j-th column of A. Fixing j ∈ [n], we get
det(A) = ∑_{i=1}^n (−1)^{i+j} A_{i,j} det(A_{i,j}).
Proof.
FINISH: We proceed by induction on n.
n = 1: Obvious.
n − 1 ⇒ n: Expanding along the i-th row and using the induction hypothesis, we see
∑_{j=1}^n (−1)^{i+j} A_{i,j} det(A_{i,j}) = ∑_{j=1}^n (−1)^{i+j} A_{i,j} ∑_{σ∈S_{n−1}} sgn(σ) ∏_{k=1}^{n−1} (A_{i,j})_{k,σ(k)}
= ∑_{j=1}^n ∑_{σ∈S_{n−1}} (−1)^{i+j} A_{i,j} sgn(σ) ∏_{k=1}^{n−1} (A_{i,j})_{k,σ(k)},
where [n − 1] is identified with [n] \ {j} in the columns (and with [n] \ {i} in the rows) of
each minor A_{i,j}.
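For example, expanding along the first row,
det [ 1 2 0 ]
    [ 0 3 1 ]
    [ 1 0 2 ] = 1(3 · 2 − 1 · 0) − 2(0 · 2 − 1 · 1) + 0(0 · 0 − 3 · 1) = 6 + 2 + 0 = 8.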
Corollary 5.2.8. If A ∈ M_n(F) has a row or column of zeroes, then det(A) = 0.
Lemma 5.2.9. Let A ∈ M_n(F).
(1) det(A) = det(A^T).
(2) If A is block upper triangular, i.e., there are square matrices A_1, . . . , A_m such that
A = [ A_1    ∗  ]
    [     ⋱     ]
    [ 0     A_m ],
then
det(A) = ∏_{i=1}^m det(A_i).
Proof.
(1) This is obvious by 5.2.7 as taking the determinant of A along the i-th row of A is the same
as taking the determinant of A^T along the i-th column of A^T.
(2) We proceed by cases.
Case 1: Suppose m = 2. We proceed by induction on k where A_1 ∈ M_k(F).
k = 1: If A_1 ∈ M_1(F), the result is trivial by taking the determinant along the first column, as
the minor A_{1,1} = A_2 and the entries A_{i,1} = 0 for i > 1.
k − 1 ⇒ k: Suppose A_1 ∈ M_k(F) with k > 1. Then taking the determinant along the first
column, we have
A_{i,1} = [ (A_1)_{i,1} ∗  ]
         [ 0        A_2 ] for all i ≤ k,
so applying the induction hypothesis, we have det(A_{i,1}) = det((A_1)_{i,1}) det(A_2) for all i ≤ k.
Then as A_{i,1} = 0 for all i > k, we have
det(A) = ∑_{i=1}^k (−1)^{1+i} A_{i,1} det(A_{i,1}) = (∑_{i=1}^k (−1)^{i+1} A_{i,1} det((A_1)_{i,1})) det(A_2) = det(A_1) det(A_2).
Case 2: Suppose m > 2, and set
B_1 = [ A_2    ∗  ]
      [     ⋱     ]
      [ 0     A_m ]
⟹ A = [ A_1  ∗  ]
       [ 0   B_1 ].
Applying Case 1 gives det(A) = det(A_1) det(B_1). We may repeat this trick to peel off the
A_i's to get
det(A) = ∏_{i=1}^m det(A_i).
Corollary 5.2.10. If A is block lower triangular or block diagonal, then det(A) is the product
of the determinants of the blocks on the diagonal. If A is upper triangular, lower triangular,
or diagonal, then det(A) is the product of the diagonal entries.
Proposition 5.2.11. Suppose A ∈ M_n(F), and let E ∈ M_n(F) be an elementary matrix.
Then det(EA) = det(E) det(A).
Proof. There are four cases depending on the type of elementary matrix.
Case 1: If E = I, the result is trivial.
Case 2: We must show
(a) the determinant of the matrix E obtained by switching two rows of I is −1, and
(b) switching two rows of A switches the sign of det(A).
Note that the result is trivial if the two rows are adjacent by 5.2.7. To get the result if the
rows are not adjacent, we note that the interchanging of two rows can be accomplished by
an odd number of adjacent switches. If we want to switch rows i and j with i < j, we switch
j with j −1, then j −1 with j −2, all the way to i + 1 with i for a total of j −i switches.
We then switch i + 1 with i + 2 as the old i
th
row is now in the (i + 1)
th
place, we switch
i + 2 with i + 3, all the way up to switching j − 1 with j for a total of j − i − 1 switches.
Hence, we switch a total of 2j −2i −1 adjacent rows, which is always an odd number, and
the result holds.
Case 3: We must show
(a) the determinant of the matrix E obtained from the identity by multiplying a row by a
nonzero constant λ is λ, and
(b) multiplying a row of A by a nonzero constant λ changes the determinant by multiplying
by λ.
We see (a) immediately holds from 5.2.9 as E is diagonal, and (b) immediately holds by
5.2.7 by expanding along the row multiplied by λ. If the i-th row is multiplied by λ, then
det(EA) = ∑_{j=1}^n (−1)^{i+j} (EA)_{i,j} det((EA)_{i,j}) = ∑_{j=1}^n (−1)^{i+j} λA_{i,j} det(A_{i,j}) = det(E) det(A)
as λ = det(E) and (EA)_{i,j} = A_{i,j} as minors, since only the i-th row differs between the two
matrices.
Case 4: We must show
(a) the determinant of the matrix E obtained from the identity by adding a constant
multiple of one row to another row is 1, and
(b) adding a constant multiple of the i-th row of A to the k-th row of A with k ≠ i does not
change the determinant of A.
To prove this result, we need a lemma:
Lemma 5.2.12. Suppose A ∈ M_n(F) has two identical rows. Then det(A) = 0.
Proof. We see from Case 2 that if we switch two rows of A, the determinant changes sign. As
A has two identical rows, switching these rows does not change the sign of the determinant,
so the determinant must be zero.
Once again, note (a) is trivial by 5.2.9 as E is either upper or lower triangular. To show
(b), suppose EA is obtained from A by adding λ times the i-th row to the k-th row. We note
that (EA)_{k,j} = A_{k,j} as minors for all j ∈ [n] (the altered k-th row is deleted), so by 5.2.7
det(EA) = ∑_{j=1}^n (−1)^{k+j} (EA)_{k,j} det((EA)_{k,j}) = ∑_{j=1}^n (−1)^{k+j} (λA_{i,j} + A_{k,j}) det(A_{k,j})
= λ ∑_{j=1}^n (−1)^{k+j} A_{i,j} det(A_{k,j}) + ∑_{j=1}^n (−1)^{k+j} A_{k,j} det(A_{k,j}) = λ det(B) + det(A) = det(A)
by 5.2.12, as B is the matrix obtained from A by replacing the k-th row with the i-th row, so
two rows of B are the same, and det(B) = 0.
Theorem 5.2.13. A ∈ M_n(F) is invertible if and only if det(A) ≠ 0.
Proof. There is a unique matrix U in reduced row echelon form such that A = E_n · · · E_1 U
for elementary matrices E_1, . . . , E_n. By iterating 5.2.11, we have
det(A) = det(E_n) · · · det(E_1) det(U),
which is nonzero if and only if det(U) ≠ 0, as each elementary matrix has nonzero determinant.
Now det(U) ≠ 0 if and only if U = I as U is in reduced row echelon form, so det(A) ≠ 0 if and
only if A is row equivalent to I if and only if A is invertible.
Proposition 5.2.14. Suppose A, B ∈ M_n(F). Then det(AB) = det(A) det(B).
Proof. If A is not invertible, then AB is not invertible by 3.2.18, so det(AB) = det(A) det(B) =
0 by 5.2.13.
Now suppose A is invertible. Then there are elementary matrices E_1, . . . , E_n such that
A = E_1 · · · E_n as A is row equivalent to I. By repeatedly applying 5.2.11, we get
det(AB) = det(E_1 · · · E_n B) = det(E_1) · · · det(E_n) det(B) = det(A) det(B).
Corollary 5.2.15. For A, B ∈ M_n(F), det(AB) = det(BA).
Proposition 5.2.16.
(1) Suppose S ∈ M_n(F) is invertible. Then det(S^{−1}) = det(S)^{−1}.
(2) Suppose A, B ∈ M_n(F) and A ∼ B. Then det(A) = det(B).
Proof.
(1) By 5.2.14, we have
1 = det(I) = det(S^{−1}S) = det(S^{−1}) det(S).
(2) By 5.2.14 and (1),
det(A) = det(S^{−1}BS) = det(S^{−1}) det(B) det(S) = det(S^{−1}) det(S) det(B) = det(B).
Proposition 5.2.17. Suppose A, B ∈ M_n(F).
(1) trace(AB) = trace(BA).
(2) If A ∼ B, then trace(A) = trace(B).
Proof. Exercise.
Exercises
V will denote a ﬁnite dimensional vector space over F.
Exercise 5.2.18 (A Faithful Representation of S_n). Show that the symmetric group S_n can
be embedded into L(V) where dim(V) = n, i.e. there is an injective function Φ : S_n → L(V)
such that Φ(στ) = Φ(σ)Φ(τ) for all σ, τ ∈ S_n.
Hint: Use operators like T defined in 5.1.12.
Note: If G is a group, a function Φ : G → L(V) such that Φ(gh) = Φ(g)Φ(h) is called a
representation of G. An injective representation is usually called a faithful representation.
5.3 The Characteristic Polynomial
For this section, V will denote a ﬁnite dimensional vector space over F.
Definition 5.3.1.
(1) For A ∈ M_n(F), define the characteristic polynomial of A, denoted char_A ∈ F[z], by
char_A(z) = det(zI − A). Note that char_A ∈ F[z] is a monic polynomial of degree n.
(2) Let T ∈ L(V). Define the characteristic polynomial of T, denoted char_T ∈ F[z], by
char_T(z) = det(zI − [T]_B)
where B is some basis for V. Note that this is well defined as if C is another basis of V,
then [T]_B ∼ [T]_C by 3.4.13, so zI − [T]_B ∼ zI − [T]_C. Now we apply 5.2.16.
Remark 5.3.2. Note that char_A = char_{L_A} if A ∈ M_n(F).
Examples 5.3.3.
(1) If A_± ∈ M_2(F) is given by
A_± = [ 0 ±1 ]
      [ 1  0 ],
we see that char_{A_±}(z) = det(zI − A_±) = z^2 ∓ 1.
(2)
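For example, if
A = [ 2 1 ]
    [ 0 3 ],
then char_A(z) = det(zI − A) = (z − 2)(z − 3), so the roots of char_A are exactly the diagonal entries 2 and 3, in agreement with 5.1.4 (1).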
Remark 5.3.4. If A ∼ B, then char_A = char_B.
Proposition 5.3.5. Let T ∈ L(V). Then λ ∈ sp(T) if and only if λ ∈ F is a root of char_T.
Proof. This is immediate from 5.2.13 and 3.4.12.
Remark 5.3.6. 5.3.5 tells us that one way to find sp(T) is to first pick a basis B of V, find
[T]_B, and then compute the roots of the polynomial
char_T(z) = det(zI − [T]_B).
The problem with this technique is that it is usually very difficult to factor polynomials of
high degree. For example, there are quadratic, cubic, and quartic formulas for calculating
roots of polynomials of degree less than or equal to 4, but there is no general formula for
finding roots of polynomials of degree greater than or equal to 5. Hence, if dim(V) ≥ 5, there is
no good way known to factor the characteristic polynomial.
Another problem is that it is very hard to calculate determinants without the use of
computers. Even with a computer, we can only calculate determinants to a certain degree of
accuracy, so we may still be unsure of the spectrum of the matrix even if the characteristic
polynomial obtained in this fashion can be factored.
Examples 5.3.7. We will calculate the spectrum for some operators.
(1)
(2)
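For example, for the matrix B ∈ M_2(R) of 5.1.4 (3), we have char_B(z) = z^2 + 1 by 5.3.3 (1), which has no roots in R, so sp(L_B) = ∅ over R; over C, the roots are ±i, so sp(L_B) = {±i}, recovering the computation in 5.1.4 (3).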
Exercises
V will denote a ﬁnite dimensional vector space over F.
Exercise 5.3.8 (Roots of Unity). Suppose V is a finite dimensional vector space over C
with ordered basis B = (v_1, . . . , v_n), and let T ∈ L(V) be the finite shift operator defined in
5.1.12. Compute char_T(z) = det(zI − [T]_B), and relate your answer to 5.1.12.
Exercise 5.3.9. Compute the characteristic polynomial of
(1) the operator L_B ∈ L(F^n), where B ∈ M_n(F) is the matrix with λ ∈ F in each diagonal
entry, 1 in each superdiagonal entry, and 0 elsewhere:
B = [ λ 1      0 ]
    [   λ ⋱      ]
    [      ⋱  1  ]
    [ 0        λ ],
and
(2) the operator L_C ∈ L(F^n), where
C = [ 0 0 · · · 0 −a_0     ]
    [ 1 0 · · · 0 −a_1     ]
    [ ⋮  ⋱      ⋮  ⋮       ]
    [ 0 · · · 1 0 −a_{n−2} ]
    [ 0 · · · 0 1 −a_{n−1} ]
and a_i ∈ F for all i ∈ {0, 1, . . . , n − 1}.
5.4 The Minimal Polynomial
For this section, V will denote a ﬁnite dimensional vector space over F.
Definition-Proposition 5.4.1. Recall that if V is finite dimensional, then L(V) is finite
dimensional. For T ∈ L(V), the set
{T^n | n ∈ Z_{≥0}},
where T^0 = I, cannot be linearly independent, so there is a unique monic polynomial p ∈ F[z]
of minimal degree ≥ 1 such that p(T) = 0. This polynomial is called the minimal polynomial
of T, and it is denoted min_T.
Proof. We must show the polynomial p is unique. If q ∈ F[z] is monic with q(T) = 0 and
deg(p) = deg(q), by the Euclidean Algorithm 4.2.1, there are unique polynomials k, r ∈ F[z]
such that p = kq + r and deg(r) < deg(q). Since p(T) = q(T) = 0, we must also have that
r(T) = 0, so r = 0 as p was chosen of minimal degree. Thus p = kq, and since deg(p) = deg(q),
k must be constant. Since p and q are both monic, the constant k must be 1.
Examples 5.4.2.
(1) If T = 0, we have that min_T(z) = z as this polynomial is a monic linear polynomial which
gives the zero operator when evaluated at T.
(2) If T = I, we have that min_T(z) = z − 1 by similar reasoning as above (recall that 1(T) = I,
i.e. the constant polynomial 1 evaluated at T is I).
(3) We have that min_T is linear if and only if T = λI for some λ ∈ F.
Proposition 5.4.3. Let p ∈ F[z]. Then p(T) = 0 if and only if min_T | p.
Proof. It is obvious that if min_T | p, then p(T) = 0. Suppose p(T) = 0. Since min_T has
minimal degree such that min_T(T) = 0, we have deg(p) ≥ deg(min_T). By 4.2.1, there
are unique k, r ∈ F[z] with deg(r) < deg(min_T) and p = k min_T + r. Then since p(T) =
min_T(T) = 0, we have r(T) = 0, so r = 0 as min_T was chosen of minimal degree. Hence
min_T | p.
Proposition 5.4.4. For T ∈ L(V), λ ∈ sp(T) if and only if λ is a root of min_T.

Proof. First, suppose λ ∈ sp(T) with corresponding eigenvector v. Then

0 = min_T(T)v = min_T(λ)v,

and since v ≠ 0, λ is a root of min_T. Now suppose λ is a root of min_T. Then (z − λ) divides min_T(z), so there is a p ∈ F[z] with min_T(z) = (z − λ)^n p(z) for some n ∈ N such that λ is not a root of p(z). By 5.4.3, p(T) ≠ 0 as min_T ∤ p, so there is a v such that w = p(T)v ≠ 0. Now we must have that

0 = min_T(T)v = (T − λI)^n p(T)v = (T − λI)^n w.

Hence (T − λI)^k w is an eigenvector for T corresponding to the eigenvalue λ, where k ∈ {0, . . . , n − 1} is maximal such that (T − λI)^k w ≠ 0.

Corollary 5.4.5 (Existence of Eigenvalues). Suppose F = C and T ∈ L(V). Then T has an eigenvalue.

Proof. We know that min_T ∈ C[z] has a root by 4.5.4. The result follows immediately by 5.4.4.

Remark 5.4.6. More generally, T ∈ L(V) has an eigenvalue if V is a vector space over an algebraically closed field.
Exercises

V will denote a finite dimensional vector space over F.

Exercise 5.4.7. Find the minimal polynomial of the following operators:

(1) The operator L_A ∈ L(R^4) where

    A = [ 1  2  3  4  ]
        [ 0  5  6  7  ]
        [ 0  0  8  9  ]
        [ 0  0  0  10 ].

(2) The operator L_B ∈ L(F^n) where λ ∈ F and

    B = [ λ  1         0 ]
        [    λ  ⋱        ]
        [        ⋱    1  ]
        [ 0           λ  ]  ∈ M_n(F).

(3) The operator L_C ∈ L(F^n) where a_0, . . . , a_{n−1} ∈ F and

    C = [ 0  0  ⋯  0  −a_0      ]
        [ 1  0  ⋯  0  −a_1      ]
        [    ⋱  ⋱      ⋮        ]
        [        ⋱  0  −a_{n−2} ]
        [ 0         1  −a_{n−1} ]  ∈ M_n(F).

(4) The finite shift operator T in 5.1.12.
Chapter 6

Operator Decompositions

We begin with the easiest type of decomposition, which is matrix decomposition via idempotents. We then discuss diagonalization and a generalization of diagonalization called primary decomposition.

6.1 Idempotents

For this section, V will denote a vector space over F, and U, W will denote subspaces of V such that V = U ⊕ W. Please note that the results of this section are highly dependent on the specific direct sum decomposition V = U ⊕ W.

Definition 6.1.1. An operator E ∈ L(V) is called idempotent if E^2 = E ◦ E = E.

Examples 6.1.2.
(1) The zero operator and the identity operator are idempotents.
(2) Let A ∈ M_n(F) be given by

    A = (1/n) [ 1  ⋯  1 ]
              [ ⋮  ⋱  ⋮ ]
              [ 1  ⋯  1 ],

and let E = L_A. Then E^2 = E as A^2 = A.
Facts 6.1.3. Suppose V is finite dimensional and E ∈ L(V) is an idempotent.
(1) As E^2 = E, we know E^2 − E = 0. If E = 0, then min_E(z) = p_1(z) = z, and if E = I, then min_E(z) = p_2(z) = z − 1, both of which divide p_3(z) = z^2 − z. If E ≠ 0, I, then we have that p_1(E) ≠ 0 and p_2(E) ≠ 0, but p_3(E) = 0. As E ≠ λI for any λ ∈ F, we have deg(min_E) ≥ 2, so min_E = p_3.
(2) We see that if E is a nontrivial idempotent, i.e. E ≠ 0, I, then sp(E) = {0, 1} as these are the only roots of z^2 − z.

Definition-Proposition 6.1.4. Define E_U: V → V by E_U(v) = u if v = u + w with u ∈ U and w ∈ W. Note that E_U is well defined since by 2.2.8, for each v ∈ V there are unique u ∈ U and w ∈ W such that v = u + w. Then

(1) E_U is a linear operator,
(2) E_U^2 = E_U,
(3) ker(E_U) = W, and
(4) im(E_U) = {v ∈ V | E_U(v) = v} = U.

Thus E_U is an idempotent called the idempotent onto U along W. Note that we also have E_W ∈ L(V), the idempotent onto W along U, which satisfies conditions (1)-(4) above after switching U and W. The operators E_U, E_W ∈ L(V) satisfy

(5) E_W E_U = 0 = E_U E_W, and
(6) I = E_U + E_W.

Proof.
(1) Let v_1 = u_1 + w_1, v_2 = u_2 + w_2, and λ ∈ F, where u_i ∈ U and w_i ∈ W for i = 1, 2. Since λu_1 + u_2 ∈ U and λw_1 + w_2 ∈ W,

E_U(λv_1 + v_2) = E_U((λu_1 + u_2) + (λw_1 + w_2)) = λu_1 + u_2 = λE_U(v_1) + E_U(v_2).

(2) If v = u + w, then E_U^2 v = E_U E_U(v) = E_U u = u = E_U v, so E_U^2 = E_U.

(3) E_U(v) = 0 if and only if the U-component of v is zero if and only if v ∈ W.

(4) E_U(v) = v if and only if the W-component of v is zero if and only if v ∈ U. This shows U = {v ∈ V | E_U(v) = v} ⊆ im(E_U). Suppose now that v ∈ im(E_U). Then there is an x ∈ V such that E_U(x) = v. By (2), v = E_U x = E_U^2 x = E_U v, so v ∈ {v ∈ V | E_U(v) = v} = U.

(5) If v = u + w, we have

E_W E_U(v) = E_W E_U(u + w) = E_W(u) = 0 = E_U(w) = E_U E_W(u + w) = E_U E_W v.

Hence E_W E_U = 0 = E_U E_W.

(6) If v = u + w, we have

(E_U + E_W)v = E_U(u + w) + E_W(u + w) = u + w = v.

Hence E_U + E_W = I.
Remark 6.1.5. It is very important to note that E_U is dependent on the complementary subspace W. For example, if we set

U = span{(1, 0)^T} ⊂ F^2,  W_1 = span{(0, 1)^T} ⊂ F^2,  and  W_2 = span{(1, 1)^T} ⊂ F^2,

then U ⊕ W_1 = V = U ⊕ W_2. Applying 6.1.4 using W_1, we have that

E_U (1, 1)^T = (1, 0)^T,

but using W_2, we would have

E_U (1, 1)^T = (0, 0)^T.
Proposition 6.1.6. Suppose E ∈ L(V) with E = E^2. Then

(1) I − E ∈ L(V) satisfies (I − E)^2 = (I − E),
(2) im(E) = {v ∈ V | Ev = v},
(3) V = ker(E) ⊕ im(E),
(4) E is the idempotent onto im(E) along ker(E), and
(5) I − E is the idempotent onto ker(E) along im(E).

Proof.
(1) (I − E)^2 = I − 2E + E^2 = I − 2E + E = I − E.
(2) It is clear that {v ∈ V | Ev = v} ⊆ im(E). If v ∈ im(E), then there is a u ∈ V such that Eu = v. Then v = Eu = E^2 u = Ev.
(3) Suppose v ∈ ker(E) ∩ im(E). Then by (2), v = Ev = 0, so v = 0, and ker(E) ∩ im(E) = (0). Let v ∈ V, and set u = Ev and w = v − u. Then u ∈ im(E) and Ew = Ev − Eu = u − u = 0, so w ∈ ker(E), and v = w + u ∈ ker(E) + im(E).
(4) This follows immediately from 6.1.4.
(5) We have that ker(I − E) = im(E) and im(I − E) = ker(E), so the result follows from (4) (or 6.1.4).

Exercises

Exercise 6.1.7 (Idempotents and Direct Sum). Show that

V = ⊕_{i=1}^n W_i

for nontrivial subspaces W_i ⊂ V for i ∈ [n] if and only if there are nontrivial idempotents E_i ∈ L(V) for i ∈ [n] such that

(1) Σ_{i=1}^n E_i = I, and
(2) E_i E_j = 0 for all i ≠ j.

Note: In this case, we can define T_{i,j} ∈ L(W_j, W_i) by T_{i,j} = E_i T E_j, and note that T̄ = (T_{i,j}) ∈ L(W_1 ⊕ ⋯ ⊕ W_n) ≅ L(V).
6.2 Matrix Decomposition

Definition 6.2.1. Let U, W be vector spaces. The external direct sum of U and W is the vector space

U ⊕ W = { (u, w)^T | u ∈ U and w ∈ W }

where addition and scalar multiplication are defined in the obvious way:

(u_1, w_1)^T + (u_2, w_2)^T = (u_1 + u_2, w_1 + w_2)^T  and  λ(u, w)^T = (λu, λw)^T

for all u, u_1, u_2 ∈ U, w, w_1, w_2 ∈ W, and λ ∈ F.

Notation 6.2.2 (Matrix Decomposition). Since V = U ⊕ W, there is a canonical isomorphism from V onto the external direct sum of U and W given by

u + w ↦ (u, w)^T

with obvious inverse. If T ∈ L(V), then by 6.1.4,

T = (E_U + E_W) T (E_U + E_W) = E_U T E_U + E_U T E_W + E_W T E_U + E_W T E_W.

If we set T_1 = E_U T E_U ∈ L(U), T_2 = E_U T E_W ∈ L(W, U), T_3 = E_W T E_U ∈ L(U, W), and T_4 = E_W T E_W ∈ L(W), define the operator

    T̄ = [ T_1  T_2 ] : U ⊕ W → U ⊕ W
        [ T_3  T_4 ]

by matrix multiplication, i.e. if u ∈ U and w ∈ W, define

    [ T_1  T_2 ] [ u ]  =  [ T_1 u + T_2 w ]
    [ T_3  T_4 ] [ w ]     [ T_3 u + T_4 w ].

The map given by T ↦ T̄ is an isomorphism

    L(V) ≅ { [ A  B ] | A ∈ L(U), B ∈ L(W, U), C ∈ L(U, W), and D ∈ L(W) }
           { [ C  D ] }

where addition and scalar multiplication are defined in the obvious way:

    [ A  B ] + [ A′  B′ ] = [ A + A′  B + B′ ]   and   λ [ A  B ] = [ λA  λB ]
    [ C  D ]   [ C′  D′ ]   [ C + C′  D + D′ ]           [ C  D ]   [ λC  λD ]

for all A, A′ ∈ L(U), B, B′ ∈ L(W, U), C, C′ ∈ L(U, W), D, D′ ∈ L(W), and λ ∈ F. This decomposition can be extended to the case when

V = ⊕_{i=1}^n W_i

for subspaces W_i ⊂ V for i = 1, . . . , n.
Exercises

Exercise 6.2.3. Suppose V = U ⊕ W. Verify the claims made in 6.2.2, i.e.

1. V is isomorphic to the external direct sum of U and W, and
2. L(V) ≅ { [ A  B ] | A ∈ L(U), B ∈ L(W, U), C ∈ L(U, W), and D ∈ L(W) }.
           { [ C  D ] }

Exercise 6.2.4.

Exercise 6.2.5. Suppose T ∈ L(V), V = U ⊕ W for subspaces U, W, and

    T̄ = [ A  B ]
        [ 0  C ]

where A ∈ L(U), B ∈ L(W, U), and C ∈ L(W) as in 6.2.2. Show that sp(T) = sp(A) ∪ sp(C).

6.3 Diagonalization

For this section, V will denote a finite dimensional vector space over F. The main results of this section are characterizations of when an operator T ∈ L(V) is diagonalizable, in terms of its associated eigenspaces and in terms of its minimal polynomial.

Definition 6.3.1. Let T ∈ L(V). T is diagonalizable if there is a basis of V consisting of eigenvectors of T. A matrix A ∈ M_n(F) is called diagonalizable if L_A is diagonalizable.

Examples 6.3.2.
(1) Every diagonal matrix is diagonalizable.
(2) There are matrices which are not diagonalizable. For example, consider

    A = [ 0  1 ]
        [ 0  0 ].

We see that zero is the only eigenvalue of L_A, but dim(E_0) = 1. Hence there is not a basis of R^2 consisting of eigenvectors of L_A.

Remark 6.3.3. T ∈ L(V) is diagonalizable if and only if there is a basis B of V such that [T]_B is diagonal.
Theorem 6.3.4. T ∈ L(V) is diagonalizable if and only if

V = ⊕_{i=1}^m E_{λ_i}

where sp(T) = {λ_1, . . . , λ_m}.

Proof. Certainly

W = ⊕_{i=1}^m E_{λ_i}

is a subspace of V by 5.1.9.

Suppose T is diagonalizable. Then there is a basis of V consisting of eigenvectors of T. Thus W = V as every element of V can be written as a linear combination of eigenvectors of T.

Suppose now that V = W. Let B_i be a basis for E_{λ_i} for i = 1, . . . , m. Then by 2.4.13,

B = ∪_{i=1}^m B_i

is a basis for V. Since B consists of eigenvectors of T, T is diagonalizable.

Corollary 6.3.5. Let sp(T) = {λ_1, . . . , λ_m} and let n_i = dim(E_{λ_i}). Then T ∈ L(V) is diagonalizable if and only if

Σ_{i=1}^m n_i = dim(V).
Remark 6.3.6. Suppose A ∈ M_n(F) is diagonalizable. Then there is a basis {v_1, . . . , v_n} of F^n consisting of eigenvectors of L_A corresponding to eigenvalues λ_1, . . . , λ_n. Form the matrix

S = [ v_1 | v_2 | ⋯ | v_n ],

which is invertible by 3.2.15, and note that

S^{−1} A S = S^{−1} A [ v_1 | ⋯ | v_n ] = S^{−1} [ λ_1 v_1 | ⋯ | λ_n v_n ] = S^{−1} S diag(λ_1, . . . , λ_n) = diag(λ_1, . . . , λ_n),

where D = diag(λ_1, . . . , λ_n) is the diagonal matrix in M_n(F) whose (i, i)th entry is λ_i. Hence, there is an invertible matrix S such that S^{−1} A S = D, a diagonal matrix. This is the usual way diagonalizability is presented in a matrix theory course. We show the converse holds, so that this definition of diagonalizability is equivalent to the one given in these notes.

Suppose S^{−1} A S = D, a diagonal matrix, for some invertible S ∈ M_n(F). Then we know CS(S) = F^n by 3.2.18, and if B = {v_1, . . . , v_n} are the columns of S, we have

[ Av_1 | ⋯ | Av_n ] = AS = SD = [ D_{1,1} v_1 | ⋯ | D_{n,n} v_n ],

so B is a basis of F^n consisting of eigenvectors of L_A, and A is diagonalizable.
Exercises

Exercise 6.3.7. Suppose T ∈ L(V) is diagonalizable and W ⊂ V is a T-invariant subspace. Show T|_W is diagonalizable.

Exercise 6.3.8. Suppose V is finite dimensional. Let S, T ∈ L(V) be diagonalizable. Show that S, T ∈ L(V) commute if and only if S, T are simultaneously diagonalizable, i.e. there is a basis of V consisting of eigenvectors of both S and T.
Hint: First show that the eigenspaces E_λ of T are S-invariant. Then show S_λ = S|_{E_λ} ∈ L(E_λ) is diagonalizable for all λ ∈ sp(T).

6.4 Nilpotents

Definition 6.4.1.
(1) An operator N ∈ L(V) is called nilpotent if N^n = 0 for some n ∈ N. We say N is nilpotent of order n if n is the smallest natural number such that N^n = 0.

Examples 6.4.2.
(1) The matrix

    A = [ 0  1 ]
        [ 0  0 ]

is nilpotent of order 2. In fact, the n × n matrix

    [ 0  1         0 ]
    [    0  ⋱        ]
    [        ⋱    1  ]
    [ 0           0  ]  ∈ M_n(F)

is nilpotent of order n for all n ∈ N.

(2) The block matrix

    N = [ 0  B ]
        [ 0  0 ],

where 0, B ∈ M_n(F) for some n ∈ N and B ≠ 0, is nilpotent of order 2 in M_{2n}(F).
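A quick numerical check of the order of nilpotency of the superdiagonal shift matrix in (1), assuming numpy:

    import numpy as np

    n = 4
    A = np.diag(np.ones(n - 1), k=1)   # ones on the first superdiagonal
    for k in range(1, n + 1):
        print(k, np.allclose(np.linalg.matrix_power(A, k), 0))
    # False for k = 1, 2, 3 and True for k = 4: A is nilpotent of order n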
Exercises
6.5 Generalized Eigenvectors

For this section, V will denote a finite dimensional vector space over F. The main result of this section is Theorem 12 of Section 6.8 in [3].

Definition 6.5.1. Let T ∈ L(V), and suppose p ∈ F[z] is a monic irreducible factor of min_T.
(1) There is a maximal m ∈ N such that p^m | min_T. Define K_p = ker(p(T)^m).
(2) Define G_p = {v ∈ V | p(T)^n v = 0 for some n ∈ N}.
(3) If deg(p) = 1, then p(z) = z − λ for some λ ∈ sp(T) by 5.4.4, and we set G_λ = G_p. In this case, elements of G_λ \ {0} are called generalized eigenvectors of T corresponding to the eigenvalue λ. We say the multiplicity of λ is M_λ = dim(G_λ).

Remarks 6.5.2.
(1) Note that G_p and K_p are both T-invariant subspaces of V.
(2) For λ ∈ sp(T), there can be at most M_λ linearly independent eigenvectors corresponding to the eigenvalue λ as E_λ ⊂ G_λ.

Examples 6.5.3.
(1) Every eigenvector of T ∈ L(V) is a generalized eigenvector of T.
(2) Suppose

    A = [ 0  1 ]
        [ 0  0 ].

We saw earlier that A has one eigenvalue, namely zero, and that dim(E_0) = 1, so A is not diagonalizable. However, we see that A^2 = 0, so every v ∈ F^2 \ {0} is a generalized eigenvector corresponding to the eigenvalue 0.
(3) Let B ∈ M_4(R) be given by

    B = [ 0  −1   1   0 ]
        [ 1   0   0   1 ]
        [ 0   0   0  −1 ]
        [ 0   0   1   0 ].

We calculate G_p for all monic irreducible p | min_{L_B}. First, we calculate char_B. We see that zI − B is block upper triangular, so

    char_B(z) = det(zI − B) = | z   1  −1   0 |  =  | z   1 | · | z   1 |  =  (z^2 + 1)^2.
                              | −1  z   0  −1 |     | −1  z |   | −1  z |
                              | 0   0   z   1 |
                              | 0   0  −1   z |

We claim char_B = min_{L_B}. Note that q(z) = z^2 + 1 evaluated at B is

    q(B) = B^2 + I = [ −1   0   0  −2 ]   [ 1  0  0  0 ]   [ 0  0  0  −2 ]
                     [  0  −1   2   0 ] + [ 0  1  0  0 ] = [ 0  0  2   0 ]
                     [  0   0  −1   0 ]   [ 0  0  1  0 ]   [ 0  0  0   0 ]
                     [  0   0   0  −1 ]   [ 0  0  0  1 ]   [ 0  0  0   0 ],

so we see immediately that (B^2 + I)^2 = 0. We now have a candidate for min_{L_B}. Set p(z) = (z^2 + 1)^2. As p(B) = 0, we have that min_{L_B} | p, but as the only monic irreducible factor of p is q(z) = z^2 + 1 and q(B) ≠ 0, we must have that min_{L_B} = p. Hence K_q = G_q = V as q(B)^2 v = 0 for all v ∈ V.
Remark 6.5.4. If T ∈ L(V) is nilpotent, then every v ∈ V \ {0} is a generalized eigenvector of T corresponding to the eigenvalue zero.

Lemma 6.5.5. Let T ∈ L(V), and suppose p, q ∈ F[z] are two distinct monic irreducible factors of min_T.
(1) G_p ∩ G_q = (0).
(2) q(T)|_{G_p} is bijective.

Proof.
(1) Suppose v ∈ G_p ∩ G_q. Then there are n, m ∈ N such that p(T)^n v = 0 = q(T)^m v. But as p^n, q^m are relatively prime, by 4.3.8, there are f, g ∈ F[z] such that f p^n + g q^m = 1. Thus,

v = 1(T)v = f(T) p(T)^n v + g(T) q(T)^m v = 0,

and we are finished.
(2) First, we note that G_p is T-invariant, so it is q(T)-invariant. Next, suppose q(T)v = 0 for some v ∈ G_p. Then v ∈ G_p ∩ G_q = (0) by (1), so v = 0. Thus q(T)|_{G_p} is injective and thus bijective by 3.2.15.

Proposition 6.5.6. Let T ∈ L(V), and suppose p ∈ F[z] is a monic irreducible factor of min_T. Then K_p = G_p.

Proof. Certainly K_p ⊂ G_p. Let m be the multiplicity of p, and let f be the unique monic polynomial such that min_T = f p^m. We know G_p is f(T)-invariant, and as f = q_1 ⋯ q_k for monic irreducible q_i ≠ p which divide min_T for all i ∈ [k], by 6.5.5, each q_i(T)|_{G_p} is bijective, and thus so is

f(T)|_{G_p} = (q_1 ⋯ q_k)(T)|_{G_p} = (q_1(T)|_{G_p}) ⋯ (q_k(T)|_{G_p}).

Thus, if x ∈ G_p, then there is a y ∈ G_p such that f(T)y = x, so

0 = min_T(T)y = (f p^m)(T)y = p(T)^m f(T)y = p(T)^m x,

and x ∈ K_p.
Exercises

6.6 Primary Decomposition

Theorem 6.6.1 (Primary Decomposition). Suppose T ∈ L(V), and suppose

min_T = p_1^{r_1} ⋯ p_n^{r_n}

where the p_i ∈ F[z] are distinct monic irreducible polynomials for i ∈ [n]. Then

(1) V = ⊕_{i=1}^n K_{p_i} = ⊕_{i=1}^n G_{p_i}, and
(2) if T_i = T|_{K_{p_i}} ∈ L(K_{p_i}) for i ∈ [n], then min_{T_i} = p_i^{r_i}.
Proof.
(1) For i ∈ [n], set

f_i = min_T / p_i^{r_i} = ∏_{j≠i} p_j^{r_j},

and note that min_T | f_i f_j for all i ≠ j. The f_i's are relatively prime, so there are polynomials g_i ∈ F[z] such that

Σ_{i=1}^n f_i g_i = 1.

Set h_i = f_i g_i for i ∈ [n] and E_i = h_i(T). First note that

Σ_{i=1}^n E_i = Σ_{i=1}^n h_i(T) = (Σ_{i=1}^n h_i)(T) = q(T) = I

where q ∈ F[z] is the constant polynomial given by q(z) = 1. Moreover, for i ≠ j,

E_i E_j = h_i(T) h_j(T) = (h_i h_j)(T) = (g_i g_j f_i f_j)(T) = 0

by 5.4.3 as min_T | f_i f_j. Hence the E_i's are idempotents:

E_j = E_j I = E_j (Σ_{i=1}^n E_i) = Σ_{i=1}^n E_j E_i = E_j^2.

Since they sum to I, they correspond to a direct sum decomposition of V:

V = ⊕_{i=1}^n im(E_i).

It remains to show im(E_i) = K_{p_i}. If v ∈ im(E_i), then v = E_i v, so

p_i(T)^{r_i} v = p_i(T)^{r_i} E_i v = p_i(T)^{r_i} h_i(T) v = (p_i^{r_i} f_i g_i)(T) v = (min_T g_i)(T) v = 0.

Hence v ∈ K_{p_i} and im(E_i) ⊂ K_{p_i}. Now suppose v ∈ K_{p_i}. If j ≠ i, then (f_j g_j)(T)v = 0 as p_i^{r_i} | f_j. Thus E_j v = h_j(T)v = (f_j g_j)(T)v = 0, and

E_i v = (Σ_{j=1}^n E_j) v = Iv = v.

Hence v ∈ im(E_i), and K_{p_i} ⊆ im(E_i).

(2) Since p_i(T_i)^{r_i} ∈ L(K_{p_i}) is the zero operator, we have that min_{T_i} | p_i^{r_i}. Conversely, suppose g ∈ F[z] with g(T_i) = 0. If v ∈ V, then by (1), we can write

v = Σ_{j=1}^n v_j  where v_j ∈ K_{p_j} for all j ∈ [n],

and we have that

g(T) f_i(T) v = g(T) Σ_{j=1}^n f_i(T) v_j = g(T) f_i(T) v_i = f_i(T) g(T_i) v_i = 0,

since f_i(T) v_j = 0 for j ≠ i (as p_j^{r_j} | f_i) and g(T_i) = 0 on K_{p_i}. Hence g(T) f_i(T) = 0, so min_T | g f_i, and p_i^{r_i} f_i divides g f_i. This implies p_i^{r_i} | g, so min_{T_i} = p_i^{r_i}.
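The proof above is constructive, and for two coprime factors the Euclidean algorithm produces the g_i directly. A sympy sketch (my own example; I am assuming sympy's gcdex and Poly interfaces) building the idempotents E_i = (f_i g_i)(T) for min_T = (z − 1)(z − 2):

    import sympy as sp

    z = sp.symbols('z')
    T = sp.Matrix([[1, 0, 0], [0, 1, 0], [0, 0, 2]])   # min_T = (z-1)(z-2)
    f1, f2 = z - 2, z - 1             # f_i = min_T / p_i^{r_i}
    g1, g2, h = sp.gcdex(f1, f2, z)   # g1*f1 + g2*f2 = h = 1, as f1, f2 are coprime

    def at(p, M):
        # Evaluate the polynomial p (in z) at the matrix M by Horner's rule.
        R = sp.zeros(*M.shape)
        for c in sp.Poly(p, z).all_coeffs():
            R = R * M + c * sp.eye(M.shape[0])
        return R

    E1, E2 = at(sp.expand(g1 * f1), T), at(sp.expand(g2 * f2), T)
    print(E1 + E2 == sp.eye(3))       # True
    print(E1 * E2 == sp.zeros(3, 3))  # True
    print(E1**2 == E1)                # True: E1 is the idempotent onto K_{p_1}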
Corollary 6.6.2. V = ⊕_{λ∈sp(T)} G_λ if and only if min_T splits (into linear factors).

Corollary 6.6.3. T ∈ L(V) is diagonalizable if and only if min_T splits into distinct linear factors in F[z], i.e.

min_T(z) = ∏_{λ∈sp(T)} (z − λ).

Proof. Suppose T is diagonalizable, and set

p(z) = ∏_{λ∈sp(T)} (z − λ).

We claim p = min_T, which will imply min_T splits into distinct linear factors in F[z]. By 6.3.4,

V = ⊕_{λ∈sp(T)} E_λ.

Let v ∈ V. Then by 2.2.8, there are unique v_λ ∈ E_λ for each λ ∈ sp(T) such that

v = Σ_{λ∈sp(T)} v_λ,

so it is clear that p(T)v = 0, and p(T) = 0. Furthermore, as (z − λ) | min_T(z) for all λ ∈ sp(T) by 5.4.4, p = min_T.

Suppose now that

min_T(z) = ∏_{λ∈sp(T)} (z − λ).

Then by 6.6.1, we have that

V = ⊕_{λ∈sp(T)} ker(T − λI) = ⊕_{λ∈sp(T)} E_λ.

Hence T is diagonalizable by 6.3.4.

Exercises
Chapter 7

Canonical Forms

For this chapter, V will denote a finite dimensional vector space over F. We discuss the rational canonical form and the Jordan canonical form. Along the way, we prove the Cayley-Hamilton Theorem, which relates the characteristic and minimal polynomials of an operator. We will then show to what extent these canonical forms are unique. Finally, we discuss the holomorphic functional calculus, which is an application of the Jordan canonical form.

7.1 Cyclic Subspaces

Definition 7.1.1. Let p ∈ F[z] be the monic polynomial given by

p(z) = z^n + Σ_{i=0}^{n−1} a_i z^i.

Then the companion matrix for p is the matrix

    [ 0  0  ⋯  0  −a_0      ]
    [ 1  0  ⋯  0  −a_1      ]
    [    ⋱  ⋱      ⋮        ]
    [        ⋱  0  −a_{n−2} ]
    [ 0         1  −a_{n−1} ].

Examples 7.1.2.
(1) Suppose p(z) = z − λ. Then the companion matrix for p is (λ) ∈ M_1(F).
(2) Suppose p(z) = z^2 + 1. Then the companion matrix for p is

    [ 0  −1 ]
    [ 1   0 ]  ∈ M_2(F).

(3) If p(z) = z^n, then the companion matrix for p is

    [ 0           0 ]
    [ 1  ⋱          ]
    [     ⋱   ⋱     ]
    [ 0       1   0 ].
Proposition 7.1.3. Let A ∈ M_n(F) be the companion matrix of p ∈ F[z]. Then char_A = min_A = p.

Proof. The routine exercises of checking that p = char_A and p(A) = 0 are left to the reader. Note that if {e_1, . . . , e_n} is the standard basis of F^n, we have that Ae_i = e_{i+1} for i ∈ [n − 1], so {A^i e_1 | i = 0, . . . , n − 1} is linearly independent. Thus deg(min_A) ≥ n, as q(A)e_1 ≠ 0 for all nonzero q ∈ F[z] with deg(q) < n. As deg(p) = n, p is a monic polynomial of least degree such that p(A) = 0, and thus p = min_A.
Definition 7.1.4. Suppose T ∈ L(V).
(1) For v ∈ V, the T-cyclic subspace generated by v is Z_{T,v} = span{T^n v | n ∈ Z_{≥0}}.
(2) A subspace W ⊂ V is called T-cyclic if there is a w ∈ W such that W = Z_{T,w}. This w is called a T-cyclic vector for W.

Examples 7.1.5.
(1) If A ∈ M_n(F) is a companion matrix for p ∈ F[z], then F^n is L_A-cyclic with cyclic vector e_1, as Ae_i = e_{i+1} for all i ∈ [n − 1].
(2)

Remarks 7.1.6.
(1) Every T-cyclic subspace is T-invariant. In fact, if w ∈ Z_{T,v}, then there is an n ∈ N and there are scalars λ_0, . . . , λ_n ∈ F such that

w = Σ_{i=0}^n λ_i T^i v,  so  Tw = Σ_{i=0}^n λ_i T^{i+1} v ∈ Z_{T,v}.

(2) Suppose W is a T-invariant subspace and w ∈ W. Then Z_{T,w} ⊂ W as T^i w ∈ W for all i ∈ N.

Proposition 7.1.7. Suppose v ∈ V \ {0}. Then there is an n ∈ Z_{≥0} such that {T^i v | i = 0, . . . , n} is a basis for Z_{T,v}.

Proof. As V is finite dimensional, the set {T^i v | i ∈ Z_{≥0}} is linearly dependent, so we can pick a maximal n ∈ Z_{≥0} such that B = {T^i v | i = 0, . . . , n} is linearly independent. We claim that T^m v ∈ span(B) for all m > n, so B is the desired basis. We prove this claim by induction on m. The base case is m = n + 1.

m = n + 1: We already know T^{n+1} v ∈ span(B) as n was chosen to be maximal. Hence there are scalars λ_0, . . . , λ_n ∈ F such that

T^{n+1} v = Σ_{i=0}^n λ_i T^i v.

m ⇒ m + 1: Suppose now that T^i v ∈ span(B) for all i = 0, . . . , m. Then

T^{m+1} v = T^{m−n} T^{n+1} v = T^{m−n} Σ_{i=0}^n λ_i T^i v = Σ_{i=0}^n λ_i T^{m−n+i} v ∈ span(B)

by the induction hypothesis.

Definition 7.1.8. Suppose W ⊂ V is a nontrivial T-cyclic subspace of V. Then a T-cyclic basis of W is a basis of the form {w, Tw, . . . , T^n w} for some n ∈ Z_{≥0}. We know such a basis exists by 7.1.7. If W = Z_{T,v}, then the T-cyclic basis associated to v is denoted B_{T,v}.

Examples 7.1.9.
(1) If A ∈ M_n(F) is a companion matrix for p ∈ F[z], then the standard basis for F^n is an L_A-cyclic basis. In fact, F^n = Z_{L_A, e_1}, and the standard basis is equal to B_{L_A, e_1}.
(2)
Proposition 7.1.10. Suppose T ∈ L(V), and suppose V is T-cyclic.
(1) Let B be a T-cyclic basis for V. Then [T]_B is the companion matrix for min_T.
(2) deg(min_T) = dim(V).

Proof.
(1) Let n = dim(V). We know that [T]_B is a companion matrix for some polynomial p ∈ F[z], as B = {v, Tv, . . . , T^{n−1} v} and

    [T]_B = [ [Tv]_B | [T^2 v]_B | ⋯ | [T^n v]_B ] = [ 0  0  ⋯  0  −a_0      ]
                                                     [ 1  0  ⋯  0  −a_1      ]
                                                     [    ⋱  ⋱      ⋮        ]
                                                     [        ⋱  0  −a_{n−2} ]
                                                     [ 0         1  −a_{n−1} ]

where

T^n v = −Σ_{i=0}^{n−1} a_i T^i v  and  p(z) = z^n + Σ_{i=0}^{n−1} a_i z^i.

Now min_{L_{[T]_B}} = min_T = p by 7.1.3.
(2) This follows immediately from (1).

Lemma 7.1.11. Suppose p is a monic irreducible factor of min_T for T ∈ L(V), and d = deg(p). Suppose that v ∈ V \ {0} and m ∈ N is minimal with p(T)^m v = 0. Then B_{T,v} = {T^i v | i = 0, . . . , dm − 1} (so Z_{T,v} has dimension dm).

Proof. Let W = Z_{T,v}, and note that W is a T-invariant subspace. Set S = T|_W, and note that p(T)^m v = 0 implies that p(T)^m w = 0 for all w ∈ W. Hence p(S)^m = 0, and min_S | p^m by 5.4.3. But p is irreducible, so min_S = p^k for some k ≤ m; as p(T)^k v ≠ 0 for all k < m, we must have min_S = p^m. As W is S-cyclic, we must have that

dm = deg(p^m) = deg(min_S) = dim(W)

by 7.1.10, so |B_{T,v}| = dm. The result now follows by 7.1.7.
Exercises

Exercise 7.1.12. Let T ∈ L(V) with min_T = p^m for some monic irreducible p ∈ F[z] and some m ∈ N. Show the following are equivalent:
(1) V is T-cyclic,
(2) deg(min_T) = dim(V), and
(3) min_T = char_T.

7.2 Rational Canonical Form

The proof of the main theorem in this section is adapted from [2].

Definition 7.2.1. Let T ∈ L(V).
(1) Subspaces Z_1, . . . , Z_n are called a rational canonical decomposition of V for T if

V = ⊕_{i=1}^n Z_i,

Z_i is T-cyclic for all i ∈ [n], and Z_i ⊂ K_{p_i} for some monic irreducible p_i | min_T for all i ∈ [n].
(2) A basis B of V is called a rational canonical basis for T if B is the disjoint union of nonempty sets B_i, denoted

B = ⊔_{i=1}^n B_i,

such that each B_i is a T-cyclic basis for span(B_i) ⊂ K_{p_i} for some monic irreducible p_i | min_T for all i ∈ [n].
(3) The matrix [T]_B is called a rational canonical form of T if B is a rational canonical basis for T.
(4) The matrix A ∈ M_n(F) is said to be in rational canonical form if the standard basis of F^n is a rational canonical basis for L_A.

Remark 7.2.2. Note that if B is a rational canonical basis for T, then [T]_B is a block diagonal matrix such that each block is a companion matrix for a polynomial q ∈ F[z] of the form p^m, where p ∈ F[z] is monic and irreducible and m ∈ N.
Proposition 7.2.3. Suppose T ∈ L(V) is such that min_T = p^m for some monic, irreducible p ∈ F[z] and some m ∈ N. Suppose w_1, . . . , w_n ∈ V are such that

S_1 = ⊔_{i=1}^n B_{T,w_i}

is linearly independent, so that

W = ⊕_{i=1}^n Z_{T,w_i}

is a subspace of V. For each i ∈ [n], suppose there is v_i ∈ V such that p(T)v_i = w_i, i.e. w_i ∈ im(p(T)) for all i ∈ [n]. Then

S_2 = ⊔_{i=1}^n B_{T,v_i}

is linearly independent.
Proof. Set Z_i = Z_{T,w_i}, T_i = T|_{Z_i}, and m_i = |B_{T,v_i}| for each i ∈ [n]. Suppose

Σ_{i=1}^n Σ_{j=0}^{m_i−1} λ_{i,j} T^j v_i = 0.

For i ∈ [n], let

f_i(z) = Σ_{j=0}^{m_i−1} λ_{i,j} z^j  so that  0 = Σ_{i=1}^n f_i(T) v_i.

Applying p(T), we get

0 = p(T) Σ_{i=1}^n f_i(T) v_i = Σ_{i=1}^n f_i(T) p(T) v_i = Σ_{i=1}^n f_i(T) w_i.

Now as f_i(T) w_i ∈ Z_i and W is a direct sum of the Z_i's, we must have f_i(T) w_i = 0 for all i ∈ [n]. Hence f_i(T_i) = 0 on Z_i, and min_{T_i} | f_i. Since min_T(T_i) = 0, we must have that min_{T_i} | min_T, so min_{T_i} = p^k for some k ≤ m. Hence p | f_i for all i ∈ [n]. Let g_i ∈ F[z] be such that p g_i = f_i for all i ∈ [n]. Then

Σ_{i=1}^n f_i(T) v_i = Σ_{i=1}^n g_i(T) p(T) v_i = Σ_{i=1}^n g_i(T) w_i = 0,

so once again, g_i(T) w_i = 0 for all i ∈ [n], as W is the direct sum of the Z_i's and g_i(T) w_i ∈ Z_i for all i ∈ [n]. But then

0 = g_i(T) w_i = f_i(T) v_i = Σ_{j=0}^{m_i−1} λ_{i,j} T^j v_i,

so we must have that λ_{i,j} = 0 for all i ∈ [n] and all j, as B_{T,v_i} is linearly independent.
Proposition 7.2.4. Suppose T ∈ L(V) with min_T = p^m for some monic, irreducible p ∈ F[z] and some m ∈ N. Suppose that W is a T-invariant subspace of V with basis B.
(1) Suppose v ∈ ker(p(T)) \ W. Then B ∪ B_{T,v} is linearly independent.
(2) There are v_1, . . . , v_n ∈ ker(p(T)) such that

B′ = B ∪ ⋃_{i=1}^n B_{T,v_i}

is linearly independent and span(B′) contains ker(p(T)).

Proof.
(1) Let d = deg(p), and note that B_{T,v} = {T^i v | i = 0, . . . , d − 1} by 7.1.11 (as p(T)v = 0). Suppose B = {v_1, . . . , v_n} and

Σ_{i=1}^n λ_i v_i + w = 0  where  w = Σ_{j=0}^{d−1} μ_j T^j v.

Then w ∈ span(B) = W and w ∈ Z_{T,v}, both of which are T-invariant subspaces. Thus we have Z_{T,w} ⊂ W and Z_{T,w} ⊂ Z_{T,v}, so we have

Z_{T,w} ⊂ Z_{T,v} ∩ W ⊂ Z_{T,v}.

If w ≠ 0, then p(T)w = 0 as p(T)|_{Z_{T,v}} = 0, so by 7.1.11,

d = dim(Z_{T,w}) ≤ dim(Z_{T,v} ∩ W) ≤ dim(Z_{T,v}) = d,

so equality holds. But then Z_{T,v} = Z_{T,v} ∩ W, which is a subset of W, a contradiction as v ∉ W. Thus w = 0, and μ_j = 0 for all j = 0, . . . , d − 1. But as B is a basis, we then have λ_i = 0 for all i ∈ [n].

(2) If ker(p(T)) ⊄ W, then pick v_1 ∈ ker(p(T)) \ W. By (1), B ∪ B_{T,v_1} is linearly independent, and W_1 = span(B ∪ B_{T,v_1}) is T-invariant. If ker(p(T)) ⊄ W_1, pick v_2 ∈ ker(p(T)) \ W_1. By (1), B ∪ B_{T,v_1} ∪ B_{T,v_2} is linearly independent, and W_2 = span(B ∪ B_{T,v_1} ∪ B_{T,v_2}) is T-invariant. We can repeat this process until W_k contains ker(p(T)) for some k ∈ N.

Lemma 7.2.5. Suppose T ∈ L(V) is such that min_T = p^m, where p ∈ F[z] is monic and irreducible and m ∈ N. Then there is a rational canonical basis B for T.

Proof. We proceed by induction on m ∈ N.

m = 1: Then V = ker(p(T)), so we may apply 7.2.4 with W = (0) to get a rational canonical basis for V = ker(p(T)).

m − 1 ⇒ m: We know W = im(p(T)) is T-invariant, and T|_W has minimal polynomial p^{m−1}. By the induction hypothesis, we have a rational canonical basis

B = ⊔_{i=1}^n B_{T,w_i}

for T|_W. As W = im(p(T)), there are v_i ∈ V for i ∈ [n] such that p(T)v_i = w_i, and

C = ⊔_{i=1}^n B_{T,v_i}

is linearly independent by 7.2.3. By 7.2.4, there are u_1, . . . , u_k ∈ ker(p(T)) such that

D = C ∪ ⋃_{i=1}^k B_{T,u_i}

is linearly independent and spans a subspace containing ker(p(T)). Let U = span(D). We claim that U = V. First, note that U is T-invariant, so it is p(T)-invariant. Let S = p(T)|_U. Then

im(S) = SU ⊇ p(T) span(C) = span(B) = W = im(p(T)),  and  ker(S) ⊇ ker(p(T))

as U contains ker(p(T)). By 3.2.13 (rank-nullity),

dim(U) = dim(im(S)) + dim(ker(S)) ≥ dim(im(p(T))) + dim(ker(p(T))) = dim(V),

so U = V. Thus D is a rational canonical basis for T.
Theorem 7.2.6. Every T ∈ L(V) has a rational canonical basis.

Proof. By 4.3.11, we can factor min_T uniquely:

min_T = ∏_{i=1}^n p_i^{m_i}

where the p_i ∈ F[z] are distinct monic irreducible factors of min_T and m_i ∈ N for all i ∈ [n]. By 6.6.1, we have

V = ⊕_{i=1}^n K_{p_i}

where the K_{p_i} = ker(p_i(T)^{m_i}) are T-invariant subspaces, and if T_i = T|_{K_{p_i}}, then min_{T_i} = p_i^{m_i} for all i ∈ [n]. By 7.2.5, we know that there is a rational canonical basis B_i for T_i on K_{p_i}, so

B = ⊔_{i=1}^n B_i

is the desired rational canonical basis of T.

Corollary 7.2.7. For T ∈ L(V), there is a rational canonical decomposition V = ⊕_{i=1}^n Z_{T,v_i} into cyclic subspaces.

Proof. By 7.2.6 there is a rational canonical basis

B = ⊔_{i=1}^n B_{T,v_i}

for some v_1, . . . , v_n ∈ V. Setting Z_{T,v_i} = span(B_{T,v_i}), we get the desired result.

Exercises

Exercise 7.2.8. Classify all pairs of polynomials (m, p) ∈ F[z]^2 such that there is an operator T ∈ L(V), where V is a finite dimensional vector space over F, such that min_T = m and char_T = p.

Exercise 7.2.9. Show that two matrices A, B ∈ M_n(F) are similar if and only if they share a rational canonical form, i.e. there are bases C, D of F^n such that [L_A]_C = [L_B]_D is in rational canonical form.

Exercise 7.2.10. Prove or disprove: A square matrix A ∈ M_n(F) is similar to its transpose A^T. If the statement is false, find a condition which makes it true.
7.3 The Cayley-Hamilton Theorem

Theorem 7.3.1 (Cayley-Hamilton). For T ∈ L(V), char_T(T) = 0.

Proof. Factor min_T into irreducibles as in 4.3.11:

min_T = ∏_{i=1}^n p_i^{m_i}.

Let B be a rational canonical basis for T as in 7.2.6. By 7.1.10, [T]_B is block diagonal, so let A_1, . . . , A_k be the blocks. We know that A_j is the companion matrix for p_{i_j}^{n_j}, where i_j ∈ [n] and n_j ∈ N for j ∈ [k]. As

0 = [min_T(T)]_B = min_T([T]_B) = min_T(diag(A_1, . . . , A_k)) = diag(min_T(A_1), . . . , min_T(A_k)),

we must have that n_j ≤ m_{i_j} for all j ∈ [k]. But as min_T is the minimal polynomial of [T]_B, for each i ∈ [n] we must have n_j = m_i for some j with i_j = i. By 7.1.3, char_T(z) = det(zI − [T]_B) is a product

char_T(z) = det(zI − [T]_B) = ∏_{i=1}^n p_i(z)^{r_i}

where r_i ≥ m_i for all i ∈ [n]. Thus,

char_T(T) = ∏_{i=1}^n p_i(T)^{r_i} = min_T(T) ∏_{i=1}^n p_i(T)^{r_i − m_i} = 0.

Corollary 7.3.2. min_T | char_T for all T ∈ L(V).
Proposition 7.3.3. Let A ∈ M_n(F).
(1) The coefficient of the z^{n−1} term in char_A(z) is equal to −trace(A).
(2) The constant term in char_A(z) is equal to (−1)^n det(A).

Proof. First note that the result is trivial if A is the companion matrix to a polynomial

p(z) = z^n + Σ_{i=0}^{n−1} a_i z^i ∈ F[z],

as char_A(z) = p(z), trace(A) = −a_{n−1}, and det(A) = (−1)^n a_0. For the general case, let B be a rational canonical basis for L_A so that [A]_B is a block diagonal matrix

[A]_B = diag(A_1, . . . , A_m)

where each block A_i is the companion matrix of some polynomial

p_i(z) = z^{n_i} + Σ_{j=0}^{n_i−1} a^i_j z^j ∈ F[z].

As A ∼ [A]_B, we have that

char_A(z) = det(zI − [A]_B) = ∏_{i=1}^m p_i(z),

trace(A) = trace([A]_B), and det(A) = det([A]_B).

(1) It is easy to see that the trace of [A]_B is the sum of the traces of the A_i, i.e.

trace(A) = trace([A]_B) = Σ_{i=1}^m trace(A_i) = −Σ_{i=1}^m a^i_{n_i−1}.

Now one sees that the coefficient of z^{n−1} in char_A is exactly the negative of the right hand side, as

char_A(z) = ∏_{i=1}^m ( z^{n_i} + Σ_{j=0}^{n_i−1} a^i_j z^j ) = z^n + ( Σ_{i=1}^m a^i_{n_i−1} ) z^{n−1} + ⋯ + ∏_{i=1}^m a^i_0,

since the z^{n−1} terms in the product above are obtained by taking the (n_i − 1)th term of p_i and multiplying by the leading terms (the z^{n_j} terms) for j ≠ i. Similarly, the constant term of char_A is obtained by taking the product of the constant terms of the p_i's.

(2) In (1), we calculated that the constant term of char_A is ∏_{i=1}^m a^i_0. But this is (−1)^n det(A), as

det(A) = det([A]_B) = ∏_{i=1}^m det(A_i) = ∏_{i=1}^m (−1)^{n_i} a^i_0 = (−1)^n ∏_{i=1}^m a^i_0.

Exercises

7.4 Jordan Canonical Form

Definition 7.4.1. A Jordan block of size n associated to λ ∈ F is the matrix J ∈ M_n(F) such that

J_{i,j} = λ if i = j, 1 if j = i + 1, and 0 otherwise,

i.e. J looks like

    [ λ  1         0 ]
    [    λ  ⋱        ]
    [        ⋱    1  ]
    [ 0           λ  ].
Proposition 7.4.2. Suppose J ∈ M_n(F) is a Jordan block associated to λ. Then char_J(z) = min_J(z) = (z − λ)^n.

Proof. Obvious.

Remark 7.4.3. Let T ∈ L(V) and λ ∈ sp(T). If m ∈ N is maximal such that (z − λ)^m | min_T and S = T|_{G_λ}, then (z − λ)^m = min_S by 6.6.1. This means that the operator N = (T − λI)|_{G_λ} is nilpotent of order m. Now by 7.2.6, there is a rational canonical decomposition for N:

G_λ = ⊕_{i=1}^k Z_{N,v_i}

for some k ∈ N, where v_i ∈ G_λ for all i ∈ [k]. Set n_i = |B_{N,v_i}| for each i ∈ [k]. Then

B_{N,v_i} = {(T − λI)^j v_i | j = 0, . . . , n_i − 1},

and note that [N|_{Z_{N,v_i}}]_{B_{N,v_i}} is the matrix in M_{n_i}(F) of the form

    [ 0           0 ]
    [ 1  ⋱          ]
    [     ⋱   ⋱     ]
    [ 0       1   0 ]

as the minimal polynomial of N|_{Z_{N,v_i}} is z^{n_i}. Setting

C_{N,v_i} = ((T − λI)^{n_i−1} v_i, . . . , (T − λI)v_i, v_i),

i.e. reordering B_{N,v_i}, we have that [N|_{Z_{N,v_i}}]_{C_{N,v_i}} is the matrix in M_{n_i}(F) of the form

    [ 0  1         0 ]
    [    0  ⋱        ]
    [        ⋱    1  ]
    [ 0           0  ],

i.e. it is a Jordan block associated to 0 of size n_i. Thus, we see that if we set

B = ⊔_{i=1}^k B_{N,v_i},

we have that [N]_B is a block diagonal matrix where the ith block is the companion matrix of the polynomial z^{n_i}, and if we set

C = ⊔_{i=1}^k C_{N,v_i},

we have that [N]_C is a block diagonal matrix where the ith block is a Jordan block of size n_i associated to 0. Thus [T|_{G_λ}]_C = [N]_C + λI is a block diagonal matrix where the ith block is a Jordan block of size n_i associated to λ.
Theorem 7.4.4. Suppose T ∈ L(V) and min_T splits in F[z]. Then there is a basis B of V such that [T]_B is a block diagonal matrix where each block is a Jordan block. The diagonal entries of [T]_B are precisely the eigenvalues of T, and if λ ∈ sp(T), then λ appears exactly M_λ times on the diagonal of [T]_B.

Proof. By 6.6.2, min_T splits if and only if

V = ⊕_{λ∈sp(T)} G_λ.

By 7.4.3, for each λ ∈ sp(T), there is a basis B_λ of G_λ such that [T|_{G_λ}]_{B_λ} is a block diagonal matrix in M_{M_λ}(F) where each block is a Jordan block associated to λ. Setting

B = ⊔_{λ∈sp(T)} B_λ

gives the desired result.
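In practice one rarely builds B by hand; e.g. sympy exposes this theorem as Matrix.jordan_form (an assumption about the library, with my own example matrix):

    import sympy as sp

    A = sp.Matrix([[5, 1],
                   [-1, 3]])     # char_A = (z - 4)^2 and dim(E_4) = 1
    P, J = A.jordan_form()       # A = P J P^{-1}
    print(J)                     # Matrix([[4, 1], [0, 4]]): a single Jordan block
    print(P.inv() * A * P == J)  # True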
Definition 7.4.5.
(1) Let T ∈ L(V). A basis B for V as in 7.4.4 is called a Jordan canonical basis for T, and the matrix [T]_B is called a Jordan canonical form of T.
(2) A matrix A ∈ M_n(F) is said to be in Jordan canonical form if the standard basis is a Jordan canonical basis of F^n for L_A.

Examples 7.4.6.
(1)
(2)

Definition 7.4.7. Let T ∈ L(V). Recall that if v ∈ G_λ \ {0} for some λ ∈ sp(T), then there is a minimal n such that (T − λI)^n v = 0. This means that B_{T−λI,v} = {(T − λI)^i v | i = 0, . . . , n − 1} is a basis for Z_{T−λI,v}.
(1) The set C_{T−λI,v} = ((T − λI)^{n−1} v, . . . , (T − λI)v, v) is called a Jordan chain for T associated to the eigenvalue λ of length n. The vector (T − λI)^{n−1} v is called the lead vector of the Jordan chain C_{T−λI,v}; note that it is an eigenvector corresponding to the eigenvalue λ.

Examples 7.4.8.
(1) Suppose J ∈ M_n(F) is a Jordan block associated to λ ∈ F:

    J = [ λ  1         0 ]
        [    λ  ⋱        ]
        [        ⋱    1  ]
        [ 0           λ  ].

One can easily see that (J − λI)e_i = e_{i−1} for all 1 < i ≤ n and (J − λI)e_1 = 0, so the standard basis of F^n is a Jordan chain for L_J associated to λ with lead vector e_1.

(2) If A is a block diagonal matrix

A = diag(J_1, . . . , J_n)

where J_i is a Jordan block associated to λ ∈ F for all i ∈ [n], then we see by (1) that the standard basis is a disjoint union of Jordan chains associated to λ.
Corollary 7.4.9. Suppose B is a Jordan canonical basis for T ∈ L(V). Then B is a disjoint union of Jordan chains associated to the eigenvalues of T.

Proof. Let sp(T) = {λ_1, . . . , λ_n}. We have that [T]_B is a block diagonal matrix

[T]_B = diag(A_1, . . . , A_n)

where each A_i is a block diagonal matrix corresponding to λ_i ∈ sp(T) composed of Jordan blocks:

A_i = diag(J^i_1, . . . , J^i_{n_i}).

Now set B_i = B ∩ G_{λ_i} for each i ∈ [n], so that

B = ⊔_{i=1}^n B_i,

and set T_i = T|_{G_{λ_i}}. Then [T_i]_{B_i} = A_i, and it is easy to check that B_i is the disjoint union of Jordan chains associated to λ_i for all i ∈ [n]. Hence B is a disjoint union of Jordan chains associated to the eigenvalues of T.

Exercises

Exercise 7.4.10. Classify all pairs of polynomials (m, p) ∈ C[z]^2 such that there is an operator T ∈ L(V), where V is a finite dimensional vector space over C, such that min_T = m and char_T = p.

Exercise 7.4.11. Show that two matrices A, B ∈ M_n(C) are similar if and only if they share a Jordan canonical form.

7.5 Uniqueness of Canonical Forms

In this section, we discuss to what extent a rational canonical form of T ∈ L(V) is unique, and to what extent a Jordan canonical form is unique when min_T splits. The information in this section is adapted from Sections 7.2 and 7.4 of [2].
Notation 7.5.1 (Jordan Canonical Dot Diagrams).

Ordering Jordan Blocks: Suppose T ∈ L(V) is such that min_T splits, and let B be a Jordan canonical basis for T. If λ ∈ sp(T) and m ∈ N is the largest integer such that (z − λ)^m | min_T, then S = T|_{G_λ} ∈ L(G_λ) has minimal polynomial (z − λ)^m, and there is a subset C ⊂ B that is a Jordan canonical basis for S. This means that [S]_C is a block diagonal matrix, so let A_1, . . . , A_n be these blocks, i.e.

[S]_C = diag(A_1, . . . , A_n).

We know that A_i is a Jordan block of size m_i, where m_i ≤ m for all i ∈ [n]. We now impose the condition that in order for [S]_C to be in Jordan canonical form, we must have that m_i ≥ m_{i+1} for all i ∈ [n − 1]. This means that to find a Jordan canonical basis for T ∈ L(V), we first decompose G_λ into T-cyclic subspaces for all λ ∈ sp(T), then we take bases of these T-cyclic subspaces, and then we order these bases by how many elements they have.

Nilpotent Operators: Suppose min_T = z^m. Let

B = ⊔_{i=1}^n C_{T,v_i}

be a Jordan canonical basis of V consisting of disjoint Jordan chains C_{T,v_i} for i ∈ [n]. We can picture B as an array of dots called a Jordan canonical dot diagram for T relative to the basis B. For i ∈ [n], let m_i = |C_{T,v_i}|, and note that the ordering of Jordan blocks requires that m_i ≥ m_{i+1} for all i ∈ [n − 1]. Write B as an array of dots such that

(1) the array contains n columns starting at the top,
(2) the ith column contains m_i dots, and
(3) the (i, j)th dot is labelled T^{m_i − j} v_i:

    • T^{m_1−1} v_1    • T^{m_2−1} v_2    ⋯    • T^{m_n−1} v_n
    • T^{m_1−2} v_1    • T^{m_2−2} v_2    ⋯    • T^{m_n−2} v_n
          ⋮                  ⋮                       ⋮
          ⋮                  ⋮                  • T v_n
          ⋮                  ⋮                  • v_n
          ⋮             • T v_2
          ⋮             • v_2
    • T v_1
    • v_1

Operators: Suppose that T ∈ L(V), and let B be a Jordan canonical basis for T. Let B_λ = B ∩ G_λ for λ ∈ sp(T) and T_λ = T|_{G_λ}. Then T_λ − λI is nilpotent, so we may apply the nilpotent case above to get a dot diagram for T_λ − λI relative to B_λ for each λ ∈ sp(T). The dot diagram for T relative to B is the disjoint union of the dot diagrams for the T_λ relative to the B_λ for λ ∈ sp(T).

Lemma 7.5.2. Suppose T ∈ L(V) is nilpotent of order m, i.e. min_T(z) = z^m. Let B be a Jordan canonical basis for T, and let r_i denote the number of dots in the ith row of the dot diagram for T relative to B as in 7.5.1. Then

(1) if R_k = Σ_{j=1}^k r_j for k ∈ [m], then R_k = nullity(T^k) = dim(ker(T^k)) for all k ∈ [m],
(2) r_1 = dim(V) − rank(T), and
(3) r_k = rank(T^{k−1}) − rank(T^k) for all k > 1.

Hence the dot diagram for T is independent of the choice of Jordan canonical basis B for T, as each r_i is completely determined by T, and the Jordan canonical form [T]_B is unique.
Proof.
(1) We see that there are at least R_k linearly independent vectors in ker(T^k), namely the dots T^{m_i−j} v_i for all i ∈ [n] and j ∈ [k] with j ≤ m_i, so nullity(T^k) ≥ R_k. Moreover,

{T^k (T^{m_i−j} v_i) | i ∈ [n] and j > k}

is a linearly independent subset of B, so rank(T^k) ≥ |B| − R_k. By the rank-nullity theorem, we have

|B| = rank(T^k) + nullity(T^k) ≥ |B| − R_k + R_k = |B|,

so equality holds, and we must have that nullity(T^k) = R_k.
(2) This is immediate from (1) and the rank-nullity theorem.
(3) For k > 1, by (1) we have that

r_k = R_k − R_{k−1} = (dim(V) − rank(T^k)) − (dim(V) − rank(T^{k−1})) = rank(T^{k−1}) − rank(T^k).
Theorem 7.5.3. Suppose T ∈ L(V) and min_T splits in F[z]. Two Jordan canonical forms (using the convention of 7.5.1) of T ∈ L(V) differ only by a permutation of the eigenvalues of T.

Proof. If B is a Jordan canonical basis of T as in 7.5.1, we set B_λ = B ∩ G_λ and T_λ = T|_{G_λ} for all λ ∈ sp(T), and we note that

B = ⊔_{λ∈sp(T)} B_λ.

Note further that T_λ − λI is nilpotent, and [T_λ − λI]_{B_λ} is in Jordan canonical form, which is unique by 7.5.2. Hence [T_λ]_{B_λ} = [T_λ − λI]_{B_λ} + λI is unique, so [T]_B is unique up to the ordering of the λ ∈ sp(T). In fact, if {λ_1, . . . , λ_n} is an enumeration of sp(T) and

B = ⊔_{i=1}^n B_{λ_i},

then setting T_i = T_{λ_i} and B_i = B_{λ_i}, we have

[T]_B = diag([T_1]_{B_1}, . . . , [T_n]_{B_n}).
Notation 7.5.4 (Rational Canonical Dot Diagrams).

Ordering Companion Blocks: Let B be a rational canonical basis for T ∈ L(V). If p ∈ F[z] is a monic irreducible factor of min_T and m ∈ N is the largest integer such that p^m | min_T, then S = T|_{K_p} ∈ L(K_p) has minimal polynomial p^m, and there is a subset C ⊂ B that is a rational canonical basis for S. This means that [S]_C is a block diagonal matrix, so let A_1, . . . , A_n be these blocks, i.e.

[S]_C = diag(A_1, . . . , A_n).

We know that A_i is the companion matrix for p^{m_i}, where m_i ≤ m for all i ∈ [n]. We now impose the condition that in order for [S]_C to be in rational canonical form, we must have that m_i ≥ m_{i+1} for all i ∈ [n − 1]. This means that to find a rational canonical basis for T ∈ L(V), we first decompose K_p into T-cyclic subspaces for all monic irreducible p ∈ F[z] dividing min_T, then we take bases of these T-cyclic subspaces, and then we order these bases by how many elements they have.

Case 1: Suppose min_T = p^m for some monic, irreducible p ∈ F[z] and some m ∈ N. Let

B = ⊔_{i=1}^n B_{T,v_i}

be a rational canonical basis of V for T. B can be represented as an array of dots called a rational canonical dot diagram for T relative to B. For i ∈ [n], let B_i = B_{T,v_i}, let Z_i = Z_{T,v_i}, let T_i = T|_{Z_i}, and let m_i ∈ N be the minimal number such that p(T)^{m_i} v_i = 0, i.e. min_{T_i} = p^{m_i} and m_i ≤ m for all i ∈ [n]. Set d = deg(p), and note that |B_i| = d m_i by 7.1.11. Now by the ordering of the companion blocks, we must have that m_i ≥ m_{i+1} for all i ∈ [n − 1]. The dot diagram for T relative to B is given by an array of dots such that

1. the array contains n columns starting at the top,
2. the ith column has m_i dots, and
3. the (i, j)th dot is labelled p(T)^{m_i − j} v_i.

Note that there are exactly |B|/d dots in the dot diagram.

Operators: As before, for a general T ∈ L(V), we let B be a rational canonical basis for T, and we let B_p = B ∩ G_p and T_p = T|_{G_p} for each distinct monic irreducible p | min_T. The dot diagram for T is then the disjoint union of the dot diagrams for the T_p's relative to the B_p's.

Lemma 7.5.5. Suppose T ∈ L(V) with min_T = p^m for a monic irreducible p ∈ F[z] and some m ∈ N. Let d = deg(p), let

B = ⊔_{i=1}^n B_{T,v_i}

be a rational canonical basis of V for T, and let m_i ∈ N be minimal such that p(T)^{m_i} v_i = 0. Let r_j be the number of dots in the jth row of the dot diagram for T relative to B. Then

(1) C_i^l = {p(T)^{m_i − h} T^l v_i | h ∈ [m_i]} is a p(T)-cyclic basis for Z_{p(T), T^l v_i} for 0 ≤ l < d and i ∈ [n]; C_i = ⊔_{l=0}^{d−1} C_i^l is a basis for Z_i = Z_{T,v_i}, so it is a Jordan canonical basis of Z_i for p(T)|_{Z_i}; and hence C = ⊔_{i=1}^n C_i is a Jordan canonical basis of V for p(T),

(2) r_1 = (1/d)(dim(V) − rank(p(T))), and

(3) r_j = (1/d)(rank(p(T)^{j−1}) − rank(p(T)^j)) for j > 1.

Hence the dot diagram for T is independent of the choice of rational canonical basis B for T, as each r_j is completely determined by p(T), and the rational canonical form [T]_B is unique.
Proof. We have that (2) and (3) are immediate from (1) and 7.5.2, as the Jordan canonical form of p(T), a nilpotent operator of order m, is unique. It suffices to prove (1).

We may suppose p(z) ≠ z, as in that case min_T would split and the result would follow as in 7.5.3. The fact that C_i^l is linearly independent for all i, l comes from 7.1.11, as T^l v_i ≠ 0 and m_i is minimal such that p(T)^{m_i} T^l v_i = 0 (in fact the restriction of T^l to Z_i is invertible: if q(z) = z^l, then q(T) is bijective on G_p = V by the proof of 6.5.5 as p, q are relatively prime).

We show C_i is a basis for Z_i for a fixed i ∈ [n]. Suppose

Σ_{l=0}^{d−1} Σ_{h=1}^{m_i} λ_{h,l} p(T)^{m_i − h} T^l v_i = 0,

and set

q_h(z) = Σ_{l=0}^{d−1} λ_{h,l} z^l  for h ∈ [m_i].

Then we see that deg(q_h) < d for all h ∈ [m_i], and

Σ_{h=1}^{m_i} p(T)^{m_i − h} q_h(T) v_i = 0.

Now setting

q(z) = Σ_{h=1}^{m_i} p(z)^{m_i − h} q_h(z),

we have that deg(q) < deg(p^{m_i}) and q(T) v_i = 0. Suppose q ≠ 0. If p ∤ q, then p, q are relatively prime, so q(T) is invertible on G_p = V by the proof of 6.5.5, which is a contradiction as q(T) v_i = 0. Hence p | q. Let s ∈ N be maximal such that p^s | q, and let f ∈ F[z] be such that f p^s = q. Then f ≠ 0 and p, f are relatively prime, so f(T) is invertible on G_p = V. But then

0 = f(T)^{−1} q(T) v_i = f(T)^{−1} f(T) p(T)^s v_i = p(T)^s v_i,

so we must have that s ≥ m_i, a contradiction as deg(q) < deg(p^{m_i}). Hence q = 0.

As deg(q_h) < d = deg(p) for all h, the term p^{m_i−1} q_1 has strictly larger degree than all the other terms, so

0 = LC(q) = LC(p^{m_i−1} q_1) = LC(p^{m_i−1}) LC(q_1)  ⟹  q_1 = 0.

Once more, as deg(q_h) < d for all h, we now have that

0 = LC(q) = LC(p^{m_i−2} q_2) = LC(p^{m_i−2}) LC(q_2)  ⟹  q_2 = 0.

We repeat this process to see that q_h = 0 for all h ∈ [m_i]. Hence λ_{h,l} = 0 for all h, l, and C_i is linearly independent. Now we see that |C_i| = d m_i = dim(Z_i) by 7.1.11, and we are finished by the extension theorem.

It follows immediately that C is a basis for V as V = ⊕_{i=1}^n Z_i.
Theorem 7.5.6. Suppose T ∈ L(V). Two rational canonical forms of T ∈ L(V) differ only by a permutation of the powers of the irreducible monic factors of min_T.

Proof. This follows immediately from 7.5.5 and 6.6.1.

Exercises
V will denote a finite dimensional vector space over F.

Exercise 7.5.7. Recall that the spectrum is an invariant of a similarity class by 5.1.11. In this sense, we may define the spectrum of a similarity class of L(V) to be the spectrum of one of its elements. Given distinct λ_1, λ_2, λ_3 ∈ C, how many similarity classes of matrices in M_7(C) have spectrum {λ_1, λ_2, λ_3}?

7.6 Holomorphic Functional Calculus

For this section, V will denote a finite dimensional vector space over C.

Definition 7.6.1.
(1) The open ball of radius r > 0 centered at z_0 ∈ C is

B_r(z_0) = {z ∈ C | |z − z_0| < r}.

(2) A subset U ⊂ C is called open if for each z ∈ U, there is an ε > 0 such that B_ε(z) ⊂ U.

Examples 7.6.2.
(1)
(2)

Definition 7.6.3. Suppose U ⊂ C is open. A function f: U → C is called holomorphic (on U) if for each z_0 ∈ U, there is a power series representation of f on B_ε(z_0) ⊂ U for some ε > 0, i.e., for each z_0 ∈ U, there is an ε > 0 and a sequence (λ_k)_{k∈Z_{≥0}} such that

f(z) = Σ_{k=0}^∞ λ_k (z − z_0)^k

converges for all z ∈ B_ε(z_0). Note that if f is holomorphic on U, then f is infinitely many times differentiable at each point in U. We have

f′(z) = Σ_{k=1}^∞ λ_k k (z − z_0)^{k−1},  f′′(z) = Σ_{k=2}^∞ λ_k k(k − 1) (z − z_0)^{k−2},  etc.

In particular,

f^{(n)}(z_0) = λ_n n!  ⟹  λ_n = f^{(n)}(z_0) / n!.

This is Taylor's Formula.

Examples 7.6.4.
(1)
(2)
Definition 7.6.5 (Holomorphic Functional Calculus).

Jordan Blocks: Suppose A ∈ M_n(C) is a Jordan block, i.e. there is a λ ∈ C such that

    A = [ λ  1         0 ]
        [    λ  ⋱        ]
        [        ⋱    1  ]
        [ 0           λ  ].

We see that we can write A canonically as A = D + N, with D ∈ M_n(C) diagonal (D = λI) and N ∈ M_n(C) nilpotent of order n:

    A = [ λ       0 ]   [ 0  1         0 ]
        [    ⋱      ] + [    0  ⋱        ]
        [ 0       λ ]   [        ⋱    1  ]
          (= D)         [ 0           0  ]
                          (= N).

Now sp(A) = {λ}, so if ε > 0 and f: B_ε(λ) → C is holomorphic, then there is a sequence (μ_k)_{k∈Z_{≥0}} such that

f(z) = Σ_{k=0}^∞ μ_k (z − λ)^k

converges for all z ∈ B_ε(λ), so we may define

    f(A) = f(D + N) = Σ_{k=0}^∞ μ_k (D + N − λI)^k = Σ_{k=0}^{n−1} μ_k N^k

         = [ μ_0  μ_1  ⋯  μ_{n−1} ]     [ f(λ)  f′(λ)  ⋯  f^{(n−1)}(λ)/(n−1)! ]
           [      ⋱    ⋱   ⋮      ]  =  [       ⋱      ⋱   ⋮                  ]
           [           ⋱   μ_1    ]     [              ⋱   f′(λ)              ]
           [ 0             μ_0    ]     [ 0                f(λ)               ].

Matrices: Suppose A ∈ M_n(C). Then there is an invertible S ∈ M_n(C) and a matrix J ∈ M_n(C) in Jordan canonical form such that J = S^{−1} A S. Let K_1, . . . , K_n be the Jordan blocks of J, and let λ_i ∈ sp(A) be the diagonal entry of K_i for i ∈ [n]. Let U ⊂ C be open such that sp(A) ⊂ U, and suppose f: U → C is holomorphic. By the above discussion, we know how to define f(K_i) for all i ∈ [n]. We define

f(J) = diag(f(K_1), . . . , f(K_n)).

We then define f(A) = S f(J) S^{−1}. We must check that f(A) is well defined. If J_1, J_2 are two Jordan canonical forms of A, then there are invertible S_1, S_2 ∈ M_n(C) such that J_i = S_i^{−1} A S_i for i = 1, 2. We must show S_1 f(J_1) S_1^{−1} = S_2 f(J_2) S_2^{−1}. By 7.5.3, we know J_1 and J_2 differ only by a permutation of the Jordan blocks, and since J_1 = S_1^{−1} S_2 J_2 S_2^{−1} S_1, we must have that S = S_1^{−1} S_2 is a generalized permutation matrix. It is easy to see that f(J_1) = S f(J_2) S^{−1}, as the Jordan blocks do not interact under conjugation by the generalized permutation matrix. Thus S_1 f(J_1) S_1^{−1} = S_2 f(J_2) S_2^{−1}, and we are finished.

Operators: Suppose T ∈ L(V). By 7.4.4, there is a Jordan canonical basis B for T, so [T]_B is in Jordan canonical form. Thus, we define f(T) to be the operator determined by

[f(T)]_B = f([T]_B).

We must check that this is well defined, i.e. if B′ is another Jordan canonical basis for T, then the operator determined by f([T]_{B′}) is the same. This follows directly from the above discussion and 3.4.13.
Examples 7.6.6.
(1) If p ∈ C[z], the p(T) defined in 4.6.2 agrees with the p(T) defined in 7.6.5. Hence the holomorphic functional calculus is a generalization of the polynomial functional calculus when V is a complex vector space.

(2) We will compute cos(A) for

    A = [ 0  −1 ]
        [ 1   0 ].

The matrix

    U = (1/√2) [  1  1 ]
               [ −i  i ]  ∈ M_2(C)

is a unitary matrix that maps the standard orthonormal basis to an orthonormal basis consisting of eigenvectors of A. We then see that

    D = [ i   0 ] = U* A U.
        [ 0  −i ]

Since A = U D U*, and cos is entire, the holomorphic functional calculus gives

cos(A) = cos(U D U*) = U cos(D) U* = U diag(cos(i), cos(−i)) U* = U cosh(1) I U* = cosh(1) I.
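As a cross-check of (2), scipy ships a matrix cosine (scipy.linalg.cosm, a library call I am assuming is available) which agrees with the functional calculus here:

    import numpy as np
    from scipy.linalg import cosm

    A = np.array([[0., -1.],
                  [1.,  0.]])
    print(cosm(A))                    # approximately cosh(1) * I
    print(np.cosh(1.0) * np.eye(2))   # for comparison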
Theorem 7.6.7 (Spectral Mapping). Suppose T ∈ L(V) and U ⊂ C is open such that sp(T) ⊂ U. Then for f holomorphic on U, sp(f(T)) = f(sp(T)).

Proof. Exercise.

Proposition 7.6.8. Let T ∈ L(V), let f, g: U → C be holomorphic such that sp(T) ⊂ U, and let h: W → C be holomorphic, where W ⊂ C is open with sp(f(T)) ⊂ W. The holomorphic functional calculus satisfies

(1) (f + g)(T) = f(T) + g(T),
(2) (fg)(T) = f(T) g(T),
(3) (λf)(T) = λ f(T) for all λ ∈ C, and
(4) (h ◦ f)(T) = h(f(T)).

Proof. Exercise.

Exercises
V will denote a finite dimensional vector space over C.

Exercise 7.6.9. Compute

    sin [ 0  −1 ]   and   exp [ 0  −π ]
        [ 1   0 ]             [ π   0 ].

Exercise 7.6.10. Suppose V is a finite dimensional vector space over C and T ∈ L(V) with sp(T) ⊂ B_1(1) ⊂ C. Use the holomorphic functional calculus to show T is invertible.
Hint: Look at f(z) = 1/z = 1/(1 − (1 − z)). Where is f holomorphic?

Exercise 7.6.11 (Square Roots). Determine which matrices in M_n(C) have square roots, i.e. all A ∈ M_n(C) such that there is a B ∈ M_n(C) with B^2 = A.
Hint: The function g given by g(z) = √z is holomorphic on the plane with a ray from 0 removed. First look at the case where A is a single Jordan block associated to λ ∈ C. In particular, when does a Jordan block associated to λ = 0 have a square root?
Chapter 8

Sesquilinear Forms and Inner Product Spaces

For this chapter, F will denote either R or C, and V will denote a vector space over F.

8.1 Sesquilinear Forms

Definition 8.1.1. A sesquilinear form on V is a function ⟨·, ·⟩: V × V → F such that

(i) ⟨·, ·⟩ is linear in the first variable, i.e. for each v ∈ V, the function ⟨·, v⟩: V → F given by u ↦ ⟨u, v⟩ is a linear transformation, and
(ii) ⟨·, ·⟩ is conjugate linear in the second variable, i.e. for each u ∈ V, we have ⟨u, λv + w⟩ = λ̄⟨u, v⟩ + ⟨u, w⟩ for all λ ∈ F and v, w ∈ V.

The sesquilinear form ⟨·, ·⟩ is called

(1) self adjoint if ⟨u, v⟩ = conj(⟨v, u⟩) for all u, v ∈ V,
(2) positive if ⟨v, v⟩ ≥ 0 for all v ∈ V,
(3) definite if ⟨v, v⟩ = 0 implies v = 0, and
(4) an inner product if it is a positive definite self adjoint sesquilinear form.

An inner product space over F is a vector space V over F together with an inner product on V.

Remarks 8.1.2.
(1) Note that linearity in the first variable and self adjointness of a sesquilinear form ⟨·, ·⟩ imply that ⟨·, ·⟩ is conjugate linear in the second variable.
(2) If V is a vector space over R, then a sesquilinear form on V is usually called a bilinear form, and the adjective "self adjoint" is replaced by "symmetric." Note that in this case, conjugate linear in the second variable means linear in the second variable. Note further that if ⟨·, ·⟩ is linear in the first variable and symmetric, then it is also linear in the second variable.

Examples 8.1.3.
(1) The function

⟨u, v⟩ = Σ_{i=1}^n e_i*(u) conj(e_i*(v))

is the standard inner product on F^n.

(2) Let x_0, x_1, . . . , x_n be n + 1 distinct points in F. Then

⟨p, q⟩ = Σ_{i=0}^n p(x_i) conj(q(x_i))

is an inner product on P_m if m ≤ n, but it is not definite if m > n (including m = ∞).

(3) The function

⟨f, g⟩ = ∫_a^b f(x) conj(g(x)) dx

is the standard inner product on C([a, b], F).

(4) trace: M_n(F) → F given by

trace(A) = Σ_{i=1}^n A_{ii}

induces an inner product on M_{m×n}(F) by ⟨A, B⟩ = trace(B* A).

(5) Let a < b in R. Then the function

⟨p, q⟩ = ∫_a^b p(x) conj(q(x)) dx

is an inner product on F[x].

(6) Suppose V is a real inner product space. Then the complexification V_C defined in 2.1.12 is a complex inner product space with inner product given by

⟨u_1 + iv_1, u_2 + iv_2⟩_C = ⟨u_1, u_2⟩ + ⟨v_1, v_2⟩ + i(⟨v_1, u_2⟩ − ⟨u_1, v_2⟩).

In particular, note that ⟨·, ·⟩_C is definite.
Proposition 8.1.4 (Polarization Identity).
(1) If V is a vector space over R and ⟨·, ·⟩ is a symmetric bilinear form, then

4⟨u, v⟩ = ⟨u + v, u + v⟩ − ⟨u − v, u − v⟩.

(2) If V is a vector space over C and ⟨·, ·⟩ is a self adjoint sesquilinear form, then

4⟨u, v⟩ = Σ_{k=0}^3 i^k ⟨u + i^k v, u + i^k v⟩.

Proof. This is immediate from the definition of a symmetric bilinear form or self adjoint sesquilinear form respectively.

Proposition 8.1.5 (Cauchy-Schwartz Inequality). Let ⟨·, ·⟩ be a positive, self adjoint sesquilinear form on V. Then

|⟨u, v⟩|^2 ≤ ⟨u, u⟩⟨v, v⟩  for all u, v ∈ V.

Proof. Let u, v ∈ V. Since ⟨·, ·⟩ is positive, we know that for all λ ∈ F,

0 ≤ ⟨u − λv, u − λv⟩ = ⟨u, u⟩ − 2 Re(λ̄⟨u, v⟩) + |λ|^2 ⟨v, v⟩.  (∗)

Case 1: If ⟨v, v⟩ = 0, then 2 Re(λ̄⟨u, v⟩) ≤ ⟨u, u⟩. Since this holds for all λ, we must have ⟨u, v⟩ = 0.

Case 2: If ⟨v, v⟩ ≠ 0, then in particular, equation (∗) holds for

λ = ⟨u, v⟩ / ⟨v, v⟩.

Hence,

0 ≤ ⟨u, u⟩ − 2|⟨u, v⟩|^2/⟨v, v⟩ + |⟨u, v⟩|^2/⟨v, v⟩ = ⟨u, u⟩ − |⟨u, v⟩|^2/⟨v, v⟩.

Thus,

|⟨u, v⟩|^2/⟨v, v⟩ ≤ ⟨u, u⟩  ⟹  |⟨u, v⟩|^2 ≤ ⟨u, u⟩⟨v, v⟩.

Exercises

Exercise 8.1.6. Show that the sesquilinear form

⟨f, g⟩ = ∫_a^b f(x) conj(g(x)) dx

on C([a, b], F) is an inner product.
Hint: Show that if |f(x)| > 0 for some x ∈ [a, b], then by the continuity of f, there is some δ > 0 such that |f(y)| > 0 for all y ∈ (x − δ, x + δ) ∩ (a, b).

Exercise 8.1.7. Show that the function ⟨·, ·⟩_2: M_n(F) × M_n(F) → F given by ⟨A, B⟩_2 = trace(B* A) is an inner product on M_n(F).
8.2 Inner Products and Norms
From this point on, (V, ¸, )) will denote an inner product space over F.
We illustrate an important proof technique called “true for all, then true for a speciﬁc
one.”
Proposition 8.2.1.
(1) Suppose u, v ∈ V such that ¸u, w) = ¸v, w) for all w ∈ W. Then u = v.
(2) Suppose S, T ∈ L(V ) such that ¸Sx, y) = ¸Tx, y) for all x, y ∈ V . Then S = T.
(3) Suppose S, T ∈ L(V ) such that ¸x, Sy) = ¸x, Ty) for all x, y ∈ V . Then S = T.
Proof.
(1) We have that ¸u −v, w) = 0 for all w ∈ V . In particular, this holds for w = u −v. Hence
¸u −v, u −v) = 0, so u −v −0 by deﬁniteness. Hence u = v.
(2) Let x ∈ V , and set u = Sx and v = Tx. Applying (1), we see Sx = Tx. Since this is true
for all x ∈ V , we have S = T.
(3) This follows immediately from selfadjointness of an inner product and (2).
Deﬁnition 8.2.2. A norm on V is a function  : V →R
≥0
such that
(i) (deﬁniteness) v = 0 implies v = 0,
(ii) (homogeneity) λv = [λ[ v for all λ ∈ F and v ∈ V , and
(iii) (triangle inequality) u +v ≤ u +v for all u, v ∈ V .
Examples 8.2.3.
(1) The Euclidean norm on $F^n$ is given by
\[ \|v\| = \sqrt{\sum_{i=1}^n |e_i^*(v)|^2}. \]
We will see in 8.2.4 that it is the norm induced by the standard inner product on $F^n$.
(2) The 1-norm on $C([a,b], F)$ is given by
\[ \|f\|_1 = \int_a^b |f|\,dx, \]
and the 2-norm is given by
\[ \|f\|_2 = \left( \int_a^b |f|^2\,dx \right)^{1/2}. \]
We will see in 8.2.4 that the 2-norm is the norm induced from the standard inner product on $C([a,b], F)$. The $\infty$-norm is given by
\[ \|f\|_\infty = \max\big\{ |f(x)| \,\big|\, x \in [a,b] \big\}. \]
It exists by the Extreme Value Theorem, which says that a continuous, real-valued function, namely $|f|$, achieves its maximum on a closed, bounded interval, namely $[a,b]$. These norms are all different. For example, if $[a,b] = [0, 2\pi]$ and $f \colon [0, 2\pi] \to \mathbb{R}$ is given by $f(x) = \sin(x)$, then $\|f\|_1 = 4$, $\|f\|_2 = \sqrt{\pi}$, and $\|f\|_\infty = 1$.
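These values follow from the standard integrals
\[ \int_0^{2\pi} |\sin x|\,dx = 4 \quad\text{and}\quad \|f\|_2^2 = \int_0^{2\pi} \sin^2 x\,dx = \pi, \]
together with $\max_{x \in [0, 2\pi]} |\sin x| = 1$.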
(3) The norms defined in (2) can all be defined for $F[x]$ as well.
Proposition 8.2.4. The function $\|\cdot\| \colon V \to \mathbb{R}_{\ge 0}$ given by $\|v\| = \sqrt{\langle v, v\rangle}$ is a norm. It is usually called the induced norm on $V$.

Proof. Clearly $\|\cdot\|$ is definite by definition. It is homogeneous since
\[ \|\lambda v\| = \sqrt{\langle \lambda v, \lambda v\rangle} = \sqrt{|\lambda|^2 \langle v, v\rangle} = |\lambda|\,\|v\|. \]
Now the Cauchy-Schwarz inequality can be written as $|\langle u, v\rangle| \le \|u\|\,\|v\|$ after taking square roots, so we have
\begin{align*}
\|u + v\|^2 &= \langle u + v, u + v\rangle = \langle u, u\rangle + 2\operatorname{Re}\langle u, v\rangle + \langle v, v\rangle \\
&\le \langle u, u\rangle + 2|\langle u, v\rangle| + \langle v, v\rangle \le \|u\|^2 + 2\|u\|\,\|v\| + \|v\|^2 = (\|u\| + \|v\|)^2.
\end{align*}
Taking square roots now gives the desired result.

Remark 8.2.5. In many books, the Cauchy-Schwarz inequality is proved for inner products (not positive, self adjoint sesquilinear forms as was done in 8.1.5), and it is usually stated in the form used in the proof of 8.2.4:
\[ |\langle u, v\rangle| \le \|u\|\,\|v\|. \]
Proposition 8.2.6 (Parallelogram Identity). The induced norm $\|\cdot\|$ on $V$ satisfies
\[ \|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2 \quad\text{for all } u, v \in V. \]
Proof. This is immediate from the definition of $\|\cdot\|$.
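Indeed, expanding both terms on the left hand side as inner products gives
\[ \big(\|u\|^2 + 2\operatorname{Re}\langle u, v\rangle + \|v\|^2\big) + \big(\|u\|^2 - 2\operatorname{Re}\langle u, v\rangle + \|v\|^2\big) = 2\|u\|^2 + 2\|v\|^2. \]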
Definition 8.2.7.
(1) If $u, v \in V$ and $\langle u, v\rangle = 0$, then we say $u$ is perpendicular to $v$. Sometimes this is denoted $u \perp v$.
(2) Let $S \subset V$ be a subset. Then $S^\perp = \big\{ v \in V \,\big|\, v \perp s \text{ for all } s \in S \big\}$ is a subspace of $V$.
(3) We say the set $S_1$ is orthogonal to the set $S_2$, denoted $S_1 \perp S_2$, if $v \in S_1$ and $w \in S_2$ implies $v \perp w$.
Examples 8.2.8.
(1) The standard basis vectors in $F^n$ are all pairwise orthogonal. If we pick a nonzero $v \in F^n$, then $\{v\}^\perp \cong F^{n-1}$.
(2) Suppose we have the inner product on $\mathbb{R}[x]$ given by
\[ \langle p, q\rangle = \int_0^1 p(x)q(x)\,dx. \]
Then $1 \perp x - 1/2$. Moreover, $\{1\}^\perp$ is infinite dimensional, as $x^n - 1/(n+1) \in \{1\}^\perp$ for all $n \in \mathbb{N}$.
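Both claims follow from the computation
\[ \left\langle x^n - \frac{1}{n+1},\, 1 \right\rangle = \int_0^1 \left( x^n - \frac{1}{n+1} \right) dx = \frac{1}{n+1} - \frac{1}{n+1} = 0; \]
the case $n = 1$ gives $1 \perp x - 1/2$, and the polynomials $x^n - 1/(n+1)$ are linearly independent since their degrees are distinct.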
Proposition 8.2.9 (Pythagorean Theorem). Suppose $u, v \in V$ such that $u \perp v$. Then
\[ \|u + v\|^2 = \|u\|^2 + \|v\|^2. \]
Proof. $\|u + v\|^2 = \langle u + v, u + v\rangle = \langle u, u\rangle + 2\operatorname{Re}\langle u, v\rangle + \langle v, v\rangle = \|u\|^2 + \|v\|^2$, since $\langle u, v\rangle = 0$.
Exercises
Exercise 8.2.10.
8.3 Orthonormality

Definition 8.3.1.
(1) A subset $S \subset V$ is called an orthogonal set if $u, v \in S$ with $u \ne v$ implies that $u \perp v$.
(2) A subset $S \subset V$ is called an orthonormal set if $S$ is an orthogonal set and $v \in S$ implies $\|v\| = 1$.

Examples 8.3.2.
(1) Zero is never in an orthonormal set, but can be in an orthogonal set.

Proposition 8.3.3. Let $S$ be an orthogonal set such that $0 \notin S$. Then $S$ is linearly independent. Hence all orthonormal sets are linearly independent.
Proof. Let $\{v_1, \dots, v_n\}$ be a finite subset of $S$, and suppose there are scalars $\lambda_1, \dots, \lambda_n \in F$ such that
\[ \sum_{i=1}^n \lambda_i v_i = 0. \]
For each $j$, we have a linear functional $v_j^* = \langle\cdot, v_j\rangle \in V^*$ given by $v_j^*(u) = \langle u, v_j\rangle$ for all $u \in V$. We apply $v_j^*$ to the above expression ("hit it with $v_j^*$") to get
\[ 0 = v_j^*\left( \sum_{i=1}^n \lambda_i v_i \right) = \sum_{i=1}^n \lambda_i v_j^*(v_i) = \sum_{i=1}^n \lambda_i \langle v_i, v_j\rangle = \lambda_j \langle v_j, v_j\rangle = \lambda_j \|v_j\|^2, \]
as $\langle v_i, v_j\rangle = 0$ if $i \ne j$. Since $v_j \ne 0$, we have $\|v_j\|^2 \ne 0$, so $\lambda_j = 0$.
Theorem 8.3.4 (Gram-Schmidt Orthonormalization). Let $S = \{v_1, \dots, v_n\}$ be a linearly independent subset of $V$. Then there is an orthonormal set $\{u_1, \dots, u_n\}$ such that for each $k \in \{1, \dots, n\}$, $\operatorname{span}\{u_1, \dots, u_k\} = \operatorname{span}\{v_1, \dots, v_k\}$.

Proof. The proof is by induction on $n$.
$n = 1$: Setting $u_1 = v_1/\|v_1\|$ works.
$n - 1 \Rightarrow n$: Suppose we have a linearly independent set $S = \{v_1, \dots, v_n\}$. By the induction hypothesis, there is an orthonormal set $\{u_1, \dots, u_{n-1}\}$ such that $\operatorname{span}\{u_1, \dots, u_k\} = \operatorname{span}\{v_1, \dots, v_k\}$ for each $k \in \{1, \dots, n-1\}$. Set
\[ w_n = v_n - \sum_{i=1}^{n-1} \langle v_n, u_i\rangle u_i \quad\text{and}\quad u_n = \frac{w_n}{\|w_n\|}. \]
Note that $w_n \ne 0$, as $v_n \notin \operatorname{span}\{v_1, \dots, v_{n-1}\} = \operatorname{span}\{u_1, \dots, u_{n-1}\}$. It is clear that $w_n \perp u_j$ for all $j = 1, \dots, n-1$ by applying the linear functional $\langle\cdot, u_j\rangle$:
\[ \langle w_n, u_j\rangle = \langle v_n, u_j\rangle - \sum_{i=1}^{n-1} \langle v_n, u_i\rangle\langle u_i, u_j\rangle = \langle v_n, u_j\rangle - \langle v_n, u_j\rangle\langle u_j, u_j\rangle = 0. \]
Hence $u_n \perp u_j$ for all $j = 1, \dots, n-1$, and $R = \{u_1, \dots, u_n\}$ is an orthonormal set. It remains to show that $\operatorname{span}(S) = \operatorname{span}(R)$. Recall that $\operatorname{span}(S \setminus \{v_n\}) = \operatorname{span}(R \setminus \{u_n\})$. It is clear that $\operatorname{span}(R) \subseteq \operatorname{span}(S)$, since $u_1, \dots, u_{n-1} \in \operatorname{span}(S)$ and $u_n$ is a linear combination of $u_1, \dots, u_{n-1}$ and $v_n$. But we immediately see $\operatorname{span}(S) \subseteq \operatorname{span}(R)$, as $v_1, \dots, v_{n-1} \in \operatorname{span}(R)$ and $v_n$ is a linear combination of $u_1, \dots, u_n$.
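For example, running this procedure on the linearly independent set $\{1, x\} \subset P_1$ with inner product $\langle p, q\rangle = \int_0^1 p(x)q(x)\,dx$ gives $u_1 = 1$ (as $\|1\| = 1$), and then
\[ w_2 = x - \langle x, 1\rangle 1 = x - \frac{1}{2}, \qquad \|w_2\|^2 = \int_0^1 \left( x - \frac{1}{2} \right)^2 dx = \frac{1}{12}, \qquad u_2 = \sqrt{12}\left( x - \frac{1}{2} \right). \]
Compare with 8.2.8 (2), where we saw $1 \perp x - 1/2$.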
Definition 8.3.5. If $V$ is a finite dimensional inner product space over $F$, a subset $B \subset V$ is called an orthonormal basis of $V$ if $B$ is an orthonormal set and $B$ is a basis of $V$.

Examples 8.3.6.
(1) The standard basis of $F^n$ is an orthonormal basis.
(2) The set $\{1, x, x^2, \dots, x^n\}$ is a basis, but not an orthonormal basis, of $P_n$ with the inner product given by
\[ \langle p, q\rangle = \int_0^1 p(x)\overline{q(x)}\,dx. \]

Theorem 8.3.7 (Existence of Orthonormal Bases). Let $V$ be a finite dimensional inner product space over $F$. Then $V$ has an orthonormal basis.

Proof. Let $B$ be a basis of $V$. By 8.3.4, there is an orthonormal set $C$ such that $\operatorname{span}(C) = \operatorname{span}(B) = V$. Moreover, $C$ is linearly independent by 8.3.3. Hence $C$ is an orthonormal basis for $V$.
Proposition 8.3.8. Let $B = \{v_1, \dots, v_n\} \subset V$ be an orthogonal set that is also a basis of $V$. Then if $w \in V$,
\[ w = \sum_{i=1}^n \frac{\langle w, v_i\rangle}{\langle v_i, v_i\rangle}\, v_i. \]
If $B$ is an orthonormal basis for $V$, then this expression simplifies to
\[ w = \sum_{i=1}^n \langle w, v_i\rangle v_i, \]
and the $\langle w, v_i\rangle$ are called the Fourier coefficients of $w$ with respect to $B$.

Proof. Since $B$ spans $V$, there are scalars $\lambda_1, \dots, \lambda_n \in F$ such that
\[ w = \sum_{i=1}^n \lambda_i v_i. \]
Now we apply $\langle\cdot, v_j\rangle$ to see that
\[ \langle w, v_j\rangle = \sum_{i=1}^n \lambda_i \langle v_i, v_j\rangle = \lambda_j \langle v_j, v_j\rangle, \]
as $\langle v_i, v_j\rangle = 0$ if $i \ne j$. Dividing by $\langle v_j, v_j\rangle$ gives the desired formula.
Proposition 8.3.9 (Parseval's Identity). Suppose $B = \{v_1, \dots, v_n\}$ is an orthonormal basis of $V$. Then
\[ \|v\|^2 = \sum_{i=1}^n |\langle v, v_i\rangle|^2 \quad\text{for all } v \in V. \]
Proof. The reader may check that this identity follows immediately from 8.3.8 using induction and the Pythagorean Theorem (8.2.9).
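As a concrete illustration of 8.3.8 and 8.3.9, take $V = \mathbb{R}^2$ with orthonormal basis $v_1 = \frac{1}{\sqrt{2}}(1,1)$ and $v_2 = \frac{1}{\sqrt{2}}(1,-1)$, and let $w = (1, 2)$. The Fourier coefficients are
\[ \langle w, v_1\rangle = \frac{3}{\sqrt{2}} \quad\text{and}\quad \langle w, v_2\rangle = -\frac{1}{\sqrt{2}}, \]
so $w = \frac{3}{\sqrt{2}}v_1 - \frac{1}{\sqrt{2}}v_2$, and Parseval's identity checks out: $\|w\|^2 = 5 = \frac{9}{2} + \frac{1}{2}$.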
Fact 8.3.10. Suppose $B = (v_1, \dots, v_n)$ is an ordered basis of the vector space $V$. We may impose an inner product on $V$ by setting
\[ \langle u, v\rangle_B = \langle [u]_B, [v]_B\rangle_{F^n}, \]
and we check that
\[ \langle v_i, v_j\rangle_B = \langle [v_i]_B, [v_j]_B\rangle_{F^n} = \langle e_i, e_j\rangle_{F^n} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{else,} \end{cases} \]
so $B$ is an orthonormal basis of $V$ with inner product $\langle\cdot,\cdot\rangle_B$. Moreover, we see that the linear functional $v_j^*$ equals $\langle\cdot, v_j\rangle_B$ for all $j \in [n]$. If
\[ v = \sum_{i=1}^n \lambda_i v_i, \]
then we have that
\[ \langle v, v_j\rangle_B = \left\langle \sum_{i=1}^n \lambda_i v_i,\, v_j \right\rangle_B = \sum_{i=1}^n \lambda_i \langle v_i, v_j\rangle_B = \lambda_j = v_j^*(v). \]
Exercises

Exercise 8.3.11. Use Gram-Schmidt orthonormalization on $\{1, z, z^2, z^3\}$ to find an orthonormal basis for $P_3(F)$ with inner product
\[ \langle p, q\rangle = \int_0^1 p(z)\overline{q(z)}\,dz. \]

Exercise 8.3.12. Suppose $\{v_1, \dots, v_n\}$ is an orthonormal basis for $V$. Show that the linear functionals $L(V) \to F$ given by $T \mapsto \langle Tv_i, v_j\rangle$, for $i, j \in [n]$, are a basis for $L(V)^*$.
8.4 Finite Dimensional Inner Product Subspaces

Lemma 8.4.1. Let $W$ be a finite dimensional subspace of $V$, and let $\{w_1, \dots, w_n\}$ be an orthonormal basis for $W$. Then $x \in W^\perp$ if $x \perp w_i$ for all $i = 1, \dots, n$.

Proof. Let $w \in W$. Then by 8.3.8,
\[ w = \sum_{i=1}^n \langle w, w_i\rangle w_i. \]
Then we have
\[ \langle x, w\rangle = \sum_{i=1}^n \overline{\langle w, w_i\rangle}\,\langle x, w_i\rangle = 0, \]
so $x \in W^\perp$.
Proposition 8.4.2. Let $W \subset V$ be a finite dimensional subspace, and let $v \in V$. Then there is a unique $w \in W$ minimizing $\|v - w\|$, i.e. $\|v - w\| \le \|v - w'\|$ for all $w' \in W$.

Proof. Let $\{w_1, \dots, w_n\}$ be an orthonormal basis for $W$. Set
\[ w = \sum_{i=1}^n \langle v, w_i\rangle w_i \in W \]
and $u = v - w$. We have $u \in W^\perp$ by 8.4.1, as $\langle u, w_j\rangle = 0$ for all $j = 1, \dots, n$:
\[ \langle u, w_j\rangle = \langle v, w_j\rangle - \sum_{i=1}^n \langle v, w_i\rangle\langle w_i, w_j\rangle = \langle v, w_j\rangle - \langle v, w_j\rangle\langle w_j, w_j\rangle = 0. \]
We show $w \in W$ is the unique vector minimizing the distance to $v$. Suppose $w' \in W$ such that $\|v - w'\| \le \|v - w\|$. Since $v = u + w$ and $u \perp (w - w')$, by 8.2.9,
\[ \|u\|^2 + \|w - w'\|^2 = \|u + (w - w')\|^2 = \|v - w'\|^2 \le \|v - w\|^2 = \|u\|^2. \]
Hence $\|w - w'\|^2 = 0$ and $w = w'$.
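For example, take $V = \mathbb{R}^2$, $W = \operatorname{span}\{w_1\}$ with $w_1 = \frac{1}{\sqrt{2}}(1, 1)$, and $v = (1, 0)$. Then the closest point of $W$ to $v$ is
\[ w = \langle v, w_1\rangle w_1 = \left( \frac{1}{2}, \frac{1}{2} \right), \]
and $u = v - w = \left( \frac{1}{2}, -\frac{1}{2} \right)$ is indeed perpendicular to $w_1$, with minimal distance $\|v - w\| = \frac{1}{\sqrt{2}}$.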
Definition 8.4.3. Let $W \subset V$ be a finite dimensional subspace. Then $P_W$, the projection onto $W$, is the operator in $L(V)$ defined by $P_W(v) = w$, where $w$ is the unique vector in $W$ closest to $v$, which exists by 8.4.2.

Examples 8.4.4.
(1) Let $v \in V \setminus \{0\}$. The projection onto $\operatorname{span}\{v\}$, denoted by $P_v$, is the operator in $L(V)$ given by
\[ u \longmapsto \frac{\langle u, v\rangle}{\|v\|^2}\, v. \]
The corresponding component operator is the linear functional $C_v \in V^*$ given by
\[ u \longmapsto \frac{\langle u, v\rangle}{\|v\|^2}. \]
Note that if $\|v\| = 1$, the formulas simplify to $P_v(u) = \langle u, v\rangle v$ and $C_v(u) = \langle u, v\rangle$.
(2) Suppose $u, v \in V$ with $u, v \ne 0$ and $u = \lambda v$ for some $\lambda \in F$. Then $P_u = P_v$.
(3) Let $W$ be a finite dimensional subspace of $V$, and let $\{w_1, \dots, w_n\}$ be an orthonormal basis of $W$. Then the projection onto $W$ is the operator in $L(V)$ given by
\[ u \longmapsto \sum_{i=1}^n \langle u, w_i\rangle w_i. \]
In particular, this definition is independent of the choice of orthonormal basis for $W$.
Lemma 8.4.5. Let $W \subset V$ be a finite dimensional subspace. Then
(1) $P_W^2 = P_W$,
(2) $\operatorname{im}(P_W) = \big\{ v \in V \,\big|\, P_W(v) = v \big\} = W$,
(3) $\ker(P_W) = W^\perp$, and
(4) $V = W \oplus W^\perp$.

Proof. We have that (2)-(4) follow immediately from 6.1.6 if (1) holds.
(1) Let $v \in V$. Then there is a unique $w \in W$ closest to $v$ by 8.4.2. Then by the definition of $P_W$, we have $P_W(v) = w = P_W(w)$. Hence $P_W^2(v) = P_W(w) = w = P_W(v)$, and $P_W^2 = P_W$.
Corollary 8.4.6. $\langle P_W u, v\rangle = \langle u, P_W v\rangle$ for all $u, v \in V$.

Proof. Let $u, v \in V$. Then by 2.2.8, there are unique $w_1, w_2 \in W$ and $x_1, x_2 \in W^\perp$ such that $u = w_1 + x_1$ and $v = w_2 + x_2$. Then
\begin{align*}
\langle P_W u, v\rangle &= \langle P_W(w_1 + x_1), w_2 + x_2\rangle = \langle w_1, w_2 + x_2\rangle = \langle w_1, w_2\rangle \\
&= \langle w_1 + x_1, w_2\rangle = \langle w_1 + x_1, P_W(w_2 + x_2)\rangle = \langle u, P_W v\rangle.
\end{align*}
Exercises

$V$ will denote an inner product space over $F$.

Exercise 8.4.7 (Invariant Subspaces). Let $T \in L(V)$, and let $W \subset V$ be a finite dimensional subspace.
(1) Show $W$ is $T$-invariant if and only if $P_W T P_W = T P_W$.
(2) Show $W$ and $W^\perp$ are $T$-invariant if and only if $T P_W = P_W T$.
Chapter 9

Operators on Hilbert Space

9.1 Hilbert Spaces

The preceding discussion brings up a few questions. What can we say about subspaces of an inner product space $V$ that are not finite dimensional? Is there a projection $P_W$ onto an infinite dimensional subspace? Is it still true that $V = W \oplus W^\perp$ if $W$ is infinite dimensional? As we are not assuming a knowledge of basic topology and analysis, there is not much we can say about these questions. Hence, we will need to restrict our attention to finite dimensional inner product spaces, which are particular examples of Hilbert spaces.

Definition 9.1.1. For these notes, a Hilbert space over $F$ will mean a finite dimensional inner product space over $F$.

Remark 9.1.2. The study of operators on infinite dimensional Hilbert spaces is a vast area of research that is widely popular today. One of the biggest differences between an undergraduate course on linear algebra and a graduate course in functional analysis is that in the undergraduate course, one only studies finite dimensional Hilbert spaces.
For this section, $H$ will denote a Hilbert space over $F$; we discuss various types of operators in $L(H)$.

Proposition 9.1.3. Let $K \subset H$ be a subspace. Then $(K^\perp)^\perp = K$.

Proof. It is obvious that $K \subset (K^\perp)^\perp$. We know $H = K \oplus K^\perp$ by 8.4.5. By 8.3.7, choose orthonormal bases $B = \{v_1, \dots, v_n\}$ and $C = \{u_1, \dots, u_m\}$ for $K$ and $K^\perp$ respectively. Then $B \cup C$ is an orthonormal basis for $H$ by 2.4.13. Suppose $w \in (K^\perp)^\perp$. Then by 8.3.8,
\[ w = \sum_{i=1}^n \langle w, v_i\rangle v_i + \sum_{i=1}^m \langle w, u_i\rangle u_i, \]
but $w \perp u_i$ for all $i = 1, \dots, m$, so $w \in \operatorname{span}(B) = K$. Hence $(K^\perp)^\perp \subset K$.
Lemma 9.1.4. Suppose $T \in L(H, V)$ where $V$ is a vector space. Then $T = T P_{\ker(T)^\perp}$.

Proof. We know that $H = \ker(T) \oplus \ker(T)^\perp$ by 8.4.5. Let $v \in \ker(T)$ and $w \in \ker(T)^\perp$. Then $T(v + w) = Tv + Tw = 0 + Tw = Tw = T P_{\ker(T)^\perp}(v + w)$. Thus, $T = T P_{\ker(T)^\perp}$.
Theorem 9.1.5 (Riesz Representation). The map $\Phi \colon H \to H^*$ given by $v \mapsto \langle\cdot, v\rangle$ is a conjugate-linear isomorphism, i.e. $\Phi(\lambda u + v) = \overline{\lambda}\Phi(u) + \Phi(v)$ for all $\lambda \in F$ and $u, v \in H$, and $\Phi$ is bijective.

Proof. It is obvious that the map is conjugate-linear. We show $\Phi$ is injective. Suppose $\langle\cdot, v\rangle = 0$, i.e. $\langle u, v\rangle = 0$ for all $u \in H$. Then in particular, $\langle v, v\rangle = 0$, so $v = 0$ by definiteness. Note that the proof of 3.2.4 still works for conjugate-linear transformations, so $\Phi$ is injective. We show $\Phi$ is surjective. It is clear that $\Phi(0) = 0$. Suppose that $\varphi \in H^*$ with $\varphi \ne 0$. Then $\ker(\varphi) \ne H$, so $\ker(\varphi)^\perp \ne (0)$. Pick $v \in \ker(\varphi)^\perp$ such that $\varphi(v) \ne 0$. Now consider the functional
\[ \psi = \varphi(v)\,\frac{\langle\cdot, v\rangle}{\|v\|^2} = \Phi\left( \frac{\overline{\varphi(v)}}{\|v\|^2}\, v \right). \]
Since $\operatorname{span}\{v\} = \ker(\varphi)^\perp$, by 9.1.4 and 3.3.2 we have
\[ \psi(u) = \varphi(v)\,\frac{\langle u, v\rangle}{\|v\|^2} = \varphi\left( \frac{\langle u, v\rangle}{\|v\|^2}\, v \right) = \varphi(P_v(u)) = \varphi(u). \]
Hence $\psi = \varphi$, and $\Phi$ is surjective.
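For example, on $\mathbb{R}^2$ with the standard inner product, the functional $\varphi(u_1, u_2) = 2u_1 + 3u_2$ is represented by the vector $v = (2, 3)$:
\[ \varphi(u) = 2u_1 + 3u_2 = \langle u, (2, 3)\rangle, \quad\text{i.e. } \varphi = \Phi((2, 3)). \]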
Exercises

Exercise 9.1.6 (Sesquilinear Forms and Operators). Show that there is a bijective correspondence $\Psi \colon L(H) \to \{\text{sesquilinear forms on } H\}$.
9.2 Adjoints

Definition 9.2.1. Let $T \in L(H)$. Then if $v \in H$, $u \mapsto \langle Tu, v\rangle = (\Phi(v) \circ T)(u)$ defines a linear functional on $H$. By the Riesz Representation Theorem, 9.1.5, there is a vector in $H$, which we will denote $T^*v$, such that
\[ \langle Tu, v\rangle = \langle u, T^*v\rangle \quad\text{for all } u \in H. \]
We show the map $T^* \colon H \to H$ given by $v \mapsto T^*v$ is linear. Suppose $\lambda \in F$ and $w \in H$. Then
\[ \langle Tu, \lambda v + w\rangle = \overline{\lambda}\langle Tu, v\rangle + \langle Tu, w\rangle = \overline{\lambda}\langle u, T^*v\rangle + \langle u, T^*w\rangle = \langle u, \lambda T^*v + T^*w\rangle \]
for all $u \in H$. Hence $T^*(\lambda v + w) = \lambda T^*v + T^*w$ by 8.2.1. The map $T^*$ is called the adjoint of $T$.
Examples 9.2.2.
(1) For $\lambda I \in L(H)$, $(\lambda I)^* = \overline{\lambda} I$.
(2) For $A \in M_n(F)$, we have $(L_A)^* = L_{A^*}$, multiplication by the adjoint matrix.
(3) Suppose $H$ is a real Hilbert space and $T \in L(H)$. Then $(T_{\mathbb{C}})^* \in L(H_{\mathbb{C}})$ is defined by the following formula:
\[ (T_{\mathbb{C}})^*(u + iv) = T^*u + iT^*v. \]
In other words, $(T_{\mathbb{C}})^* = (T^*)_{\mathbb{C}}$.
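For a concrete instance of (2), take
\[ A = \begin{pmatrix} 1 & i \\ 0 & 2 \end{pmatrix} \in M_2(\mathbb{C}), \qquad A^* = \begin{pmatrix} 1 & 0 \\ -i & 2 \end{pmatrix}; \]
one checks directly, e.g., that $\langle Ae_2, e_1\rangle = i = \langle e_2, A^*e_1\rangle$.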
Proposition 9.2.3. Let $S, T \in L(H)$ and $\lambda \in F$.
(1) The map $* \colon L(H) \to L(H)$ is conjugate-linear, i.e. $(\lambda T + S)^* = \overline{\lambda} T^* + S^*$,
(2) $(ST)^* = T^*S^*$, and
(3) $T^{**} = (T^*)^* = T$.
In short, $*$ is a conjugate-linear anti-automorphism of period two.

Proof.
(1) For all $u, v \in H$, we have
\[ \langle u, (\lambda T + S)^*v\rangle = \langle (\lambda T + S)u, v\rangle = \lambda\langle Tu, v\rangle + \langle Su, v\rangle = \langle u, \overline{\lambda} T^*v\rangle + \langle u, S^*v\rangle = \langle u, (\overline{\lambda} T^* + S^*)v\rangle. \]
By 8.2.1 we get the desired result.
(2) For all $u, v \in H$, we have
\[ \langle u, (ST)^*v\rangle = \langle STu, v\rangle = \langle Tu, S^*v\rangle = \langle u, T^*S^*v\rangle. \]
By 8.2.1 we get the desired result.
(3) For all $u, v \in H$, we have
\[ \langle Tu, v\rangle = \langle u, T^*v\rangle = \overline{\langle T^*v, u\rangle} = \overline{\langle v, T^{**}u\rangle} = \langle T^{**}u, v\rangle. \]
Hence, by 8.2.1, we get $T = T^{**}$.
Definition 9.2.4. An operator $T \in L(H)$ is called
(1) normal if $T^*T = TT^*$,
(2) self adjoint if $T = T^*$ (note that a self adjoint operator is normal),
(3) positive if $T$ is self adjoint and $\langle Tv, v\rangle \ge 0$ for all $v \in H$, and
(4) positive definite if $T$ is positive and $\langle Tv, v\rangle = 0$ implies $v = 0$.
(5) A matrix $A \in M_n(F)$ is called positive (definite) if $L_A$ is positive (definite).
Examples 9.2.5.
(1) The following matrices in $M_2(\mathbb{C})$ are normal:
\[ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad\text{and}\qquad \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}. \]
(2) The second and third matrices in (1) are moreover self adjoint; the first is normal, but not self adjoint.
Definition 9.2.6.
(1) An operator $T \in L(H)$ is called an isometry if $T^*T = I$.
(2) An operator $U \in L(H)$ is called unitary if $U^*U = UU^* = I$.
(3) An operator $P \in L(H)$ is called a projection (or an orthogonal projection) if $P = P^* = P^2$.
(4) An operator $V \in L(H)$ is called a partial isometry if $V^*V$ is a projection.
Examples 9.2.7.
(1) Suppose $A \in M_n(F)$. Then $L_A$ is unitary if and only if $A^*A = AA^* = I$. In fact, $L_A$ is unitary if and only if the columns of $A$ are orthonormal, if and only if the rows of $A$ are orthonormal.
(2) The simplest example of a projection is $L_A$ where $A \in M_n(F)$ is a diagonal matrix with only zeroes and ones on the diagonal.
(3) If $T \in L(H)$ is a projection or (partial) isometry, and if $U \in L(H)$ is unitary, then $U^*TU$ is a projection or (partial) isometry respectively.
(4) Let $H = P_n$ with the inner product
\[ \langle p, q\rangle = \int_0^1 p(x)\overline{q(x)}\,dx. \]
Then multiplication by $p \in P_n$ is a unitary operator if and only if $|p(x)| = 1$ for all $x \in [0, 1]$, if and only if $p$ is a constant polynomial $p(x) = \lambda$ with $|\lambda| = 1$.
Exercises

Exercise 9.2.8. Let $B$ be an orthonormal basis for $H$, and let $T \in L(H)$. Show that $[T^*]_B = [T]_B^*$.

Exercise 9.2.9 (Sesquilinear Forms and Operators 2). Let $\Psi \colon L(H) \to \{\text{sesquilinear forms on } H\}$ be the bijective correspondence found in 9.1.6. For $T \in L(H)$, show that the map $\Psi$ satisfies
(1) $T$ is self adjoint if and only if $\Psi(T)$ is self adjoint,
(2) $T$ is positive if and only if $\Psi(T)$ is positive, and
(3) $T$ is positive definite if and only if $\Psi(T)$ is an inner product.

Exercise 9.2.10. Suppose $T \in L(H)$. Show
(1) if $T$ is self adjoint, then $\operatorname{sp}(T) \subset \mathbb{R}$,
(2) if $T$ is positive, then $\operatorname{sp}(T) \subset [0, \infty)$, and
(3) if $T$ is positive definite, then $\operatorname{sp}(T) \subset (0, \infty)$.
9.3 Unitaries

Lemma 9.3.1. Suppose $B$ is an orthonormal basis of $H$. Then $[\,\cdot\,]_B$ preserves inner products, i.e.
\[ \langle u, v\rangle_H = \langle [u]_B, [v]_B\rangle_{F^n} \quad\text{for all } u, v \in H. \]

Proof. Let $B = \{v_1, \dots, v_n\}$, and suppose
\[ u = \sum_{i=1}^n \mu_i v_i \quad\text{and}\quad v = \sum_{i=1}^n \lambda_i v_i. \]
Then we have
\[ [u]_B = \sum_{i=1}^n \mu_i e_i \quad\text{and}\quad [v]_B = \sum_{i=1}^n \lambda_i e_i, \]
so
\[ \langle [u]_B, [v]_B\rangle_{F^n} = \sum_{i=1}^n \mu_i \overline{\lambda_i} = \langle u, v\rangle_H. \]
Theorem 9.3.2 (Unitary). The following are equivalent for $U \in L(H)$:
(1) $U$ is unitary,
(2) $U^*$ is unitary,
(3) $U$ is an isometry,
(4) $\langle Uv, Uw\rangle = \langle v, w\rangle$ for all $v, w \in H$,
(5) $\|Uv\| = \|v\|$ for all $v \in H$,
(6) If $B$ is an orthonormal basis of $H$, then $UB$ is an orthonormal basis of $H$, i.e. $U$ maps orthonormal bases to orthonormal bases,
(7) If $B$ is an orthonormal basis of $H$, then the columns of $[U]_B$ form an orthonormal basis of $F^n$, and
(8) If $B$ is an orthonormal basis of $H$, then the rows of $[U]_B$ form an orthonormal basis of $F^n$.
Proof.
(1) $\Leftrightarrow$ (2): Obvious.
(1) $\Rightarrow$ (3): Obvious.
(3) $\Rightarrow$ (4): Suppose $U^*U = I$, and let $v, w \in H$. Then
\[ \langle Uv, Uw\rangle = \langle U^*Uv, w\rangle = \langle Iv, w\rangle = \langle v, w\rangle. \]
(4) $\Rightarrow$ (5): Suppose $\langle Uv, Uw\rangle = \langle v, w\rangle$ for all $v, w \in H$. Then
\[ \|Uv\|^2 = \langle Uv, Uv\rangle = \langle v, v\rangle = \|v\|^2. \]
Now take square roots.
(5) $\Rightarrow$ (1): Suppose $\|Uv\| = \|v\|$ for all $v \in H$. Applying the polarization identity 8.1.4 to the positive, self adjoint sesquilinear form $(v, w) \mapsto \langle Uv, Uw\rangle$ recovers (4), so $\langle U^*Uv, w\rangle = \langle Uv, Uw\rangle = \langle v, w\rangle$ for all $v, w \in H$, and $U^*U = I$ by 8.2.1. Moreover, if $v \in \ker(U)$, then
\[ 0 = \|0\| = \|Uv\| = \|v\|, \]
so $v = 0$ and $U$ is injective. Hence $U$ is bijective by 3.2.15, so $U$ is invertible with $U^{-1} = U^*$. Thus $UU^* = I$, and $U$ is unitary.
(4) $\Rightarrow$ (6): Let $B = \{v_1, \dots, v_n\}$ be an orthonormal basis of $H$. Then
\[ \langle Uv_i, Uv_j\rangle = \langle v_i, v_j\rangle = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{else,} \end{cases} \]
so $UB = \{Uv_1, \dots, Uv_n\}$ is an orthonormal basis of $H$.
(6) $\Rightarrow$ (7): Suppose $B = \{v_1, \dots, v_n\}$ is an orthonormal basis of $H$. Then
\[ [U]_B = \begin{pmatrix} [Uv_1]_B & \cdots & [Uv_n]_B \end{pmatrix}, \]
and by 9.3.1 and (6) we have that
\[ \langle [Uv_i]_B, [Uv_j]_B\rangle_{F^n} = \langle Uv_i, Uv_j\rangle_H = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{else,} \end{cases} \]
so the columns $[Uv_i]_B$ of $[U]_B$ form an orthonormal basis of $F^n$.
(7) $\Rightarrow$ (8): Suppose $B$ is an orthonormal basis of $H$ and the columns of $[U]_B$ form an orthonormal basis of $F^n$. Then we see that
\[ [U]_B^*[U]_B = I = [U]_B[U]_B^* \]
(the first equality since the columns are orthonormal, and then the second since a square matrix with a one-sided inverse is invertible). Hence if $U_i \in M_{1\times n}(F)$ is the $i^{\text{th}}$ row of $[U]_B$ for $i \in [n]$, then we have
\[ U_i U_j^* = I_{i,j} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{else,} \end{cases} \]
so the rows of $[U]_B$ form an orthonormal basis of $F^n$.
(8) $\Rightarrow$ (1): Suppose $B$ is an orthonormal basis of $H$ and the rows of $[U]_B$ form an orthonormal basis of $F^n$. Then by 9.2.8,
\[ [UU^*]_B = [U]_B[U^*]_B = [U]_B[U]_B^* = I, \]
so $UU^* = I$. Thus $U^*$ is injective and invertible by 3.2.15, so $U^*U = I$, and $U$ is unitary.

Remark 9.3.3. Note that only (5) $\Rightarrow$ (1) fails if we do not assume that $H$ is finite dimensional.
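For example, the rotation matrix
\[ R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \in M_2(\mathbb{R}) \]
is unitary: its columns $(\cos\theta, \sin\theta)^t$ and $(-\sin\theta, \cos\theta)^t$ are orthonormal, illustrating (7), and $\|R_\theta v\| = \|v\|$ for all $v \in \mathbb{R}^2$, illustrating (5).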
Exercises

Exercise 9.3.4 (The Trace). Let $\{v_1, \dots, v_n\}$ be an orthonormal basis of $H$. Define $\operatorname{tr} \colon L(H) \to F$ by
\[ \operatorname{tr}(T) = \sum_{i=1}^n \langle Tv_i, v_i\rangle. \]
(1) Show that $\operatorname{tr} \in L(H)^*$.
(2) Show that $\operatorname{tr}(TS) = \operatorname{tr}(ST)$ for all $S, T \in L(H)$. Deduce that $\operatorname{tr}(UTU^*) = \operatorname{tr}(T)$ for all unitary $U \in L(H)$. Deduce further that $\operatorname{tr}$ is independent of the choice of orthonormal basis of $H$.
(3) Let $B$ be an orthonormal basis of $H$. Show that
\[ \operatorname{tr}(T) = \operatorname{trace}([T]_B) \]
for all $T \in L(H)$, where $\operatorname{trace} \in M_n(F)^*$ is given by
\[ \operatorname{trace}(A) = \sum_{i=1}^n A_{ii}. \]
9.4 Projections

Proposition 9.4.1. There is a bijective correspondence between the set of projections $P(H) \subset L(H)$ and the set of subspaces of $H$.

Proof. We show the map $K \mapsto P_K$, where $P_K$ is as in 8.4.3, is bijective. First, suppose $P_L = P_K$ for subspaces $L, K$. Suppose $u \in L$. Then $P_K(u) = P_L(u) = u$ by 8.4.5, so $u \in K$. Hence $L \subseteq K$. By symmetry, i.e. switching $L$ and $K$ in the preceding argument, we have that $K \subseteq L$, so $L = K$.

We must show now that all projections are of the form $P_K$ for some subspace $K$. Let $P$ be a projection, and let $K = \operatorname{im}(P)$. We claim that $P = P_K$. First, note that $P^2 = P$, so if $w \in K$, then $Pw = w$. Second, if $u \in K^\perp$, then for all $w \in K$,
\[ 0 = \langle w, u\rangle = \langle Pw, u\rangle = \langle w, Pu\rangle. \]
Hence $Pu \in K \cap K^\perp$, so $Pu = 0$ by 8.4.5. Now let $v \in H = K \oplus K^\perp$. Then by 2.2.8 there are unique $y \in K$ and $z \in K^\perp$ such that $v = y + z$. Then
\[ Pv = Py + Pz = y = P_K y + P_K z = P_K v, \]
so $P = P_K$.
Corollary 9.4.2. Let $P \in L(H)$ be a projection. Then
(1) $H = \ker(P) \oplus \operatorname{im}(P)$ and
(2) $\operatorname{im}(P) = \big\{ v \in H \,\big|\, Pv = v \big\} = \ker(P)^\perp$.

Remark 9.4.3. Let $P$ be a minimal projection, i.e. a projection onto a one dimensional subspace. In light of 9.4.1, we see that $P$ is minimal in the sense that $\operatorname{im}(P)$ has no nonzero proper subspaces. Hence, there are no nonzero projections that are "smaller" than $P$.
Definition 9.4.4. Projections $P, Q \in L(H)$ are called orthogonal, denoted $P \perp Q$, if $\operatorname{im}(P) \perp \operatorname{im}(Q)$.

Examples 9.4.5.
(1) Suppose $u, v \in H$ with $u \perp v$. Then $P_u \perp P_v$.
(2) If $L, K$ are two subspaces of $H$ such that $L \perp K$, then $P_L \perp P_K$. In particular, $P_K \perp P_{K^\perp}$ for all subspaces $K \subset H$.
Proposition 9.4.6. Let $P, Q \in L(H)$ be projections. The following are equivalent:
(1) $P \perp Q$,
(2) $PQ = 0$, and
(3) $QP = 0$.

Proof.
(1) $\Rightarrow$ (2), (3): Suppose $P \perp Q$. For all $u, v \in H$, we have
\[ \langle PQu, v\rangle = \langle Qu, Pv\rangle = 0, \]
as $Qu \in \operatorname{im}(Q)$ and $Pv \in \operatorname{im}(P)$. Hence $PQ = 0$ by 8.2.1, and similarly $QP = 0$.
(2) $\Leftrightarrow$ (3): We have $PQ = 0$ if and only if $QP = (PQ)^* = 0$.
(2) $\Rightarrow$ (1): Now suppose $PQ = 0$, and let $u \in \operatorname{im}(P)$ and $v \in \operatorname{im}(Q)$. Then there are $x, y \in H$ such that $u = Px$ and $v = Qy$, and
\[ \langle u, v\rangle = \langle Px, Qy\rangle = \langle x, PQy\rangle = \langle x, 0\rangle = 0. \]
Definition 9.4.7.
(1) We say the projection $P \in L(H)$ is larger than the projection $Q \in L(H)$, denoted $P \ge Q$, if $\operatorname{im}(Q) \subset \operatorname{im}(P)$.
(2) If $P, Q \in L(H)$, then we define the sup of $P$ and $Q$ by $P \vee Q = P_{\operatorname{im}(P) + \operatorname{im}(Q)}$ and the inf of $P$ and $Q$ by $P \wedge Q = P_{\operatorname{im}(P) \cap \operatorname{im}(Q)}$.

Proposition 9.4.8.
(1) $P \vee Q$ is the smallest projection larger than both $P$ and $Q$, i.e. if $P, Q \le E$ and $E \in L(H)$ is a projection, then $P \vee Q \le E$.
(2) $P \wedge Q$ is the largest projection smaller than both $P$ and $Q$, i.e. if $F \le P, Q$ and $F \in L(H)$ is a projection, then $F \le P \wedge Q$.

Proof.
(1) It is clear that $\operatorname{im}(P), \operatorname{im}(Q) \subset \operatorname{im}(P) + \operatorname{im}(Q)$, so $P, Q \le P \vee Q$. Suppose $P, Q \le E$. Then $\operatorname{im}(P) \subset \operatorname{im}(E)$ and $\operatorname{im}(Q) \subset \operatorname{im}(E)$, so we must have that $\operatorname{im}(P) + \operatorname{im}(Q) \subset \operatorname{im}(E)$, and $\operatorname{im}(P \vee Q) \subset \operatorname{im}(E)$. Thus $P \vee Q \le E$.
(2) It is clear that $\operatorname{im}(P) \cap \operatorname{im}(Q) \subset \operatorname{im}(P), \operatorname{im}(Q)$, so $P \wedge Q \le P, Q$. Suppose $F \le P, Q$. Then $\operatorname{im}(F) \subset \operatorname{im}(P)$ and $\operatorname{im}(F) \subset \operatorname{im}(Q)$, so we must have that $\operatorname{im}(F) \subset \operatorname{im}(P) \cap \operatorname{im}(Q)$, and $\operatorname{im}(F) \subset \operatorname{im}(P \wedge Q)$. Thus $F \le P \wedge Q$.
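For example, in $F^3$, if $P$ and $Q$ are the projections onto $\operatorname{span}\{e_1, e_2\}$ and $\operatorname{span}\{e_2, e_3\}$ respectively, then
\[ P \vee Q = P_{F^3} = I \quad\text{and}\quad P \wedge Q = P_{\operatorname{span}\{e_2\}}. \]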
Proposition 9.4.9. Let $P, Q \in L(H)$ be projections. The following are equivalent:
(1) $\operatorname{im}(Q) \subset \operatorname{im}(P)$, i.e. $Q \le P$,
(2) $Q = PQ$, and
(3) $Q = QP$.

Proof.
(1) $\Rightarrow$ (2): Suppose $Q \le P$. Then $\big\{ v \in H \,\big|\, Qv = v \big\} \subset \big\{ v \in H \,\big|\, Pv = v \big\}$. Let $v \in H$, and note there are unique $x \in \operatorname{im}(Q)$ and $y \in \ker(Q)$ such that $v = x + y$. Then
\[ Qv = Qx + Qy = x = Px = PQx = PQv, \]
so $Q = PQ$.
(2) $\Leftrightarrow$ (3): We have $Q = PQ$ if and only if $Q = Q^* = (PQ)^* = QP$.
(2) $\Rightarrow$ (1): Suppose $Q = PQ$, and suppose $v \in \operatorname{im}(Q)$. Then $Qv = v$, so $v = Qv = PQv = Pv$, and $v \in \operatorname{im}(P)$.

Corollary 9.4.10. Suppose $P, Q \in L(H)$ are projections with $Q \le P$. Then $P - Q$ is a projection.

Proof. We know $P - Q$ is self adjoint. By 9.4.9,
\[ (P - Q)^2 = P - QP - PQ + Q = P - Q - Q + Q = P - Q, \]
so $P - Q$ is a projection.
Exercises

9.5 Partial Isometries

Proposition 9.5.1. Suppose $V \in L(H)$ is a partial isometry. Set $P = V^*V$ and $Q = VV^*$.
(1) $P$ is the projection onto $\ker(V)^\perp$.
(2) $Q$ is the projection onto $\operatorname{im}(V)$. In particular, $V^*$ is a partial isometry.

Proof.
(1) We show $\ker(P) = \ker(V)$, so that $\operatorname{im}(P) = \ker(P)^\perp = \ker(V)^\perp$. It is clear that $\ker(V) \subset \ker(P)$, as $Vv = 0$ implies $Pv = V^*Vv = V^*0 = 0$. Now suppose $v \in \ker(P)$, so $Pv = 0$. Then
\[ 0 = \langle Pv, v\rangle = \langle V^*Vv, v\rangle = \langle Vv, Vv\rangle = \|Vv\|^2, \]
so $v \in \ker(V)$.
(2) By 9.1.4 and (1), we see $V = VP$, so $Q^2 = VV^*VV^* = VPV^* = VV^* = Q$; since also $Q^* = Q$, $Q$ is a projection, and hence $V^*$ is a partial isometry. Suppose $v \in \operatorname{im}(V)$. Then $v = Vu$ for some $u \in H$. Then $Qv = QVu = VV^*Vu = VPu = Vu = v$, so $v \in \operatorname{im}(Q)$. Now suppose $v \in \operatorname{im}(Q)$. Then $v = Qv = VV^*v$, so $v \in \operatorname{im}(V)$.

Remark 9.5.2. If $V$ is a partial isometry, then $\ker(V)^\perp$ is called the initial subspace of $V$ and $\operatorname{im}(V)$ is called the final subspace of $V$.
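For example,
\[ V = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad\text{satisfies}\quad V^*V = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad VV^* = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \]
so $V$ is a partial isometry with initial subspace $\operatorname{span}\{e_2\} = \ker(V)^\perp$ and final subspace $\operatorname{span}\{e_1\} = \operatorname{im}(V)$.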
Theorem 9.5.3. Let $V \in L(H)$. The following are equivalent:
(1) $V$ is a partial isometry,
(2) $\|Vv\| = \|v\|$ for all $v \in \ker(V)^\perp$,
(3) $\langle Vu, Vv\rangle = \langle u, v\rangle$ for all $u, v \in \ker(V)^\perp$, and
(4) $V|_{\ker(V)^\perp} \in L(\ker(V)^\perp, \operatorname{im}(V))$ is an isomorphism with inverse $V^*|_{\operatorname{im}(V)} \in L(\operatorname{im}(V), \ker(V)^\perp)$.

Proof.
(1) $\Rightarrow$ (2): Suppose $V$ is a partial isometry. Then if $v \in \ker(V)^\perp$,
\[ \|Vv\|^2 = \langle Vv, Vv\rangle = \langle V^*Vv, v\rangle = \langle v, v\rangle = \|v\|^2 \]
by 9.5.1. Taking square roots gives $\|Vv\| = \|v\|$.
(2) $\Rightarrow$ (3): If $u, v \in \ker(V)^\perp$, then assuming $H$ is a complex Hilbert space, by 8.1.4,
\begin{align*}
4\langle Vu, Vv\rangle &= \sum_{k=0}^3 i^k \langle Vu + i^k Vv, Vu + i^k Vv\rangle = \sum_{k=0}^3 i^k \|V(u + i^k v)\|^2 \\
&= \sum_{k=0}^3 i^k \|u + i^k v\|^2 = \sum_{k=0}^3 i^k \langle u + i^k v, u + i^k v\rangle = 4\langle u, v\rangle.
\end{align*}
Now divide by 4. The proof is similar using 8.1.4 if $H$ is a real Hilbert space.
(3) $\Rightarrow$ (1): We show $V^*V$ is a projection. Let $w, x \in H$. Then there are unique $y_1, y_2 \in \ker(V)$ and $z_1, z_2 \in \ker(V)^\perp$ such that $w = y_1 + z_1$ and $x = y_2 + z_2$. Then
\begin{align*}
\langle V^*Vw, x\rangle &= \langle V^*V(y_1 + z_1), y_2 + z_2\rangle = \langle Vy_1 + Vz_1, Vy_2 + Vz_2\rangle = \langle Vz_1, Vz_2\rangle = \langle z_1, z_2\rangle \\
&= \langle z_1, y_2 + z_2\rangle = \langle P_{\ker(V)^\perp}w, x\rangle.
\end{align*}
Since this holds for all $w, x \in H$, by 8.2.1, $V^*V = P_{\ker(V)^\perp}$.
(1) $\Rightarrow$ (4): We know by 9.5.1 that $V^*V$ is the projection onto $\ker(V)^\perp$ and $VV^*$ is the projection onto $\operatorname{im}(V)$. Hence if $S = V|_{\ker(V)^\perp} \in L(\ker(V)^\perp, \operatorname{im}(V))$ and $T = V^*|_{\operatorname{im}(V)} \in L(\operatorname{im}(V), \ker(V)^\perp)$, then $ST = I_{\operatorname{im}(V)}$ and $TS = I_{\ker(V)^\perp}$.
(4) $\Rightarrow$ (1): Suppose $v \in H$. Then there are unique $x \in \ker(V)$ and $y \in \ker(V)^\perp$ such that $v = x + y$. Then $V^*Vv = V^*V(x + y) = V^*Vy = y = P_{\ker(V)^\perp}v$, so $V^*V = P_{\ker(V)^\perp}$, a projection, and $V$ is a partial isometry.
Exercises

Exercise 9.5.4 (Equivalence of Projections). Projections $P, Q \in L(H)$ are said to be equivalent if there is a partial isometry $V \in L(H)$ such that $VV^* = P$ and $V^*V = Q$.
(1) Show that $\operatorname{tr}(P) = \dim(\operatorname{im}(P))$ for all projections $P \in L(H)$.
(2) Show that projections $P, Q \in L(H)$ are equivalent if and only if $\operatorname{tr}(P) = \operatorname{tr}(Q)$.
9.6 Dirac Notation and Rank One Operators

Notation 9.6.1 (Dirac). Given the inner product space $(H, \langle\cdot,\cdot\rangle)$, we can define a function $\langle\,\cdot\,|\,\cdot\,\rangle \colon H \times H \to F$ by
\[ \langle u | v\rangle = \langle v, u\rangle \quad\text{for all } u, v \in H. \]
In some treatments of linear algebra, inner products are linear in the second variable and conjugate-linear in the first. We can go back and forth between these two conventions by using the above definition.

In his work on quantum mechanics, Dirac found a beautiful and powerful notation that is now referred to as "Dirac notation" or "bras and kets." A vector in $H$ is sometimes called a "ket" and is sometimes denoted using the right half of the alternate inner product defined above:
\[ v \in H \quad\text{or}\quad |v\rangle \in H. \]
A linear functional in $L(H, F)$ is sometimes called a "bra" and is sometimes denoted using the left half of the alternate inner product:
\[ v^* = \langle\cdot, v\rangle \in H^* \quad\text{or}\quad \langle v| \in H^*. \]
Now the (alternate) inner product of $u, v \in H$ is nothing more than the "bra" $\langle u|$ applied to the "ket" $|v\rangle$ to get the "bra-ket" $\langle u | v\rangle$.

The power of this notation is that it allows for the opposite composition to get "rank one operators" or "ket-bras." If $u, v \in H$, we define a linear transformation $|u\rangle\langle v| \in L(H)$ by
\[ w = |w\rangle \longmapsto \langle v | w\rangle |u\rangle = \langle w, v\rangle u. \]
If we write out the equation naively, we see $(|u\rangle\langle v|)|w\rangle = |u\rangle\langle v | w\rangle$. The term "rank one" refers to the fact that the dimension of the image of a rank one operator is less than or equal to one. The dimension is one if and only if $u, v \ne 0$.

For example, recall that $P_v$ for nonzero $v \in H$ was defined to be the projection onto $\operatorname{span}\{v\}$ given by
\[ u \longmapsto \frac{\langle u, v\rangle}{\|v\|^2}\, v. \]
Thus, we see that $P_v = \|v\|^{-2}|v\rangle\langle v|$. This makes it clear that if $u = \lambda v$ with $u, v \ne 0$ and $\lambda \in F \setminus \{0\}$, then $P_u = P_v$:
\[ P_u = \frac{|u\rangle\langle u|}{\|u\|^2} = \frac{|\lambda v\rangle\langle \lambda v|}{\|\lambda v\|^2} = \frac{|\lambda|^2\,|v\rangle\langle v|}{|\lambda|^2\,\|v\|^2} = \frac{|v\rangle\langle v|}{\|v\|^2} = P_v. \]
In particular, we have that $P$ is a minimal projection if and only if $P = |v\rangle\langle v|$ for some $v \in H$ with $\|v\| = 1$. Sometimes these projections are called "rank one" projections.

One can easily show that composition of rank one operators $|u\rangle\langle v|$ and $|w\rangle\langle x|$ is exactly the naive composition:
\[ (|u\rangle\langle v|)(|w\rangle\langle x|) = |u\rangle\langle v|w\rangle\langle x| = \langle v|w\rangle\,|u\rangle\langle x| \quad\text{for all } u, v, w, x \in H, \]
and taking adjoints is also easy:
\[ (|u\rangle\langle v|)^* = |v\rangle\langle u| \quad\text{for all } u, v \in H. \]
Furthermore, if $T \in L(H)$, then composition is also naive:
\[ T|u\rangle\langle v| = |Tu\rangle\langle v| \quad\text{and}\quad |u\rangle\langle v|T = |u\rangle\langle T^*v|. \]
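In matrix terms, with respect to the standard basis of $F^n$, $|u\rangle\langle v|$ is the $n \times n$ matrix $uv^*$ (the column $u$ times the row $v^*$). For example, in $F^2$,
\[ |e_1\rangle\langle e_2| = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \]
a rank one partial isometry sending $e_2 \mapsto e_1$ and $e_1 \mapsto 0$.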
Exercises

Exercise 9.6.2. Let $u, v \in H$ and $T \in L(H)$.
(1) Show that $(|u\rangle\langle v|)^* = |v\rangle\langle u|$.
(2) Show that $T|u\rangle\langle v| = |Tu\rangle\langle v|$ and $|u\rangle\langle v|T = |u\rangle\langle T^*v|$.
Chapter 10

The Spectral Theorems and the Functional Calculus

For this chapter, $H$ will denote a Hilbert space over $F$. The main result of this chapter is the spectral theorem, which says we can decompose $H$ into invariant subspaces for an operator $T \in L(H)$ under a few conditions.
10.1 Spectra of Normal Operators

This section provides some of the main lemmas for the proofs of the spectral theorems.

Proposition 10.1.1. Let $T \in L(H)$ be a normal operator.
(1) $\|Tv\| = \|T^*v\|$ for all $v \in H$.
(2) Suppose $v \in H$ is an eigenvector of $T$ with corresponding eigenvalue $\lambda$. Then $v$ is an eigenvector of $T^*$ with corresponding eigenvalue $\overline{\lambda}$.

Proof.
(1) We have that
\[ \|Tv\|^2 = \langle Tv, Tv\rangle = \langle T^*Tv, v\rangle = \langle TT^*v, v\rangle = \langle T^*v, T^*v\rangle = \|T^*v\|^2. \]
Now take square roots.
(2) Since $T$ is normal, so is $T - \lambda I$. By 9.2.3, we know $(T - \lambda I)^* = T^* - \overline{\lambda}I$. Now we apply (1) to get
\[ 0 = \|(T - \lambda I)v\| = \|(T - \lambda I)^*v\| = \|(T^* - \overline{\lambda}I)v\|. \]
Hence $T^*v = \overline{\lambda}v$.
Proposition 10.1.2. Let $T \in L(H)$ be normal.
(1) If $v_1, v_2$ are eigenvectors of $T$ corresponding to distinct eigenvalues $\lambda_1, \lambda_2$ respectively, then $v_1 \perp v_2$. Hence $E_{\lambda_1} \perp E_{\lambda_2}$.
(2) If $S = \{v_1, \dots, v_n\}$ is a set of eigenvectors of $T$ corresponding to distinct eigenvalues, then $S$ is linearly independent.

Proof.
(1) By 10.1.1 we have
\[ \lambda_1\langle v_1, v_2\rangle = \langle \lambda_1 v_1, v_2\rangle = \langle Tv_1, v_2\rangle = \langle v_1, T^*v_2\rangle = \langle v_1, \overline{\lambda_2}v_2\rangle = \lambda_2\langle v_1, v_2\rangle. \]
As $\lambda_1 \ne \lambda_2$, the only way these two can be equal is if $v_1 \perp v_2$.
(2) By 8.3.3, it suffices to show $S$ is orthogonal, but this follows by (1).
Exercises

$H$ will denote a Hilbert space over $F$.

10.2 Unitary Diagonalization

Let $V$ be a finite dimensional vector space over $F$. In Chapter 6, we proved that $T \in L(V)$ is diagonalizable if and only if
\[ V = \bigoplus_{i=1}^m E_{\lambda_i} \quad\text{where } \operatorname{sp}(T) = \{\lambda_1, \dots, \lambda_m\}. \]
Recall that this theorem was independent of an inner product structure on $V$ and merely relies on the finite dimensionality of $V$. In this section, we will characterize when an operator $T \in L(H)$ is unitarily diagonalizable, which is inherently connected to the inner product structure of $H$, as we need an inner product structure to define a unitary operator.

Definition 10.2.1. Let $T \in L(H)$. $T$ is unitarily diagonalizable if there is an orthonormal basis of $H$ consisting of eigenvectors of $T$. A matrix $A \in M_n(F)$ is called unitarily diagonalizable if $L_A$ is unitarily diagonalizable.

Examples 10.2.2.
(1) Every diagonal matrix is unitarily diagonalizable.
(2) Not every diagonalizable matrix is unitarily diagonalizable. An example is
\[ A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}. \]
The basis
\[ \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\} \]
is a basis of $\mathbb{R}^2$ consisting of eigenvectors of $L_A$, but there is no orthonormal basis of $\mathbb{R}^2$ consisting of eigenvectors of $L_A$.
Proposition 10.2.3. $T \in L(H)$ is unitarily diagonalizable if and only if
\[ H = \bigoplus_{i=1}^n E_{\lambda_i} \quad\text{and}\quad E_{\lambda_i} \perp E_{\lambda_j} \text{ for all } i \ne j, \]
where $\operatorname{sp}(T) = \{\lambda_1, \dots, \lambda_n\}$.

Proof. Suppose $T$ is unitarily diagonalizable, and let $B = \{v_1, \dots, v_m\}$ be an orthonormal basis of $H$ consisting of eigenvectors of $T$. Let $\lambda_1, \dots, \lambda_n$ be the distinct eigenvalues of $T$, and let $B_i = B \cap E_{\lambda_i}$ for $i \in [n]$. Then $E_{\lambda_i} = \operatorname{span}(B_i)$ for all $i \in [n]$ and
\[ H = \bigoplus_{i=1}^n E_{\lambda_i} \quad\text{as}\quad B = \bigcup_{i=1}^n B_i. \]
Now $E_{\lambda_i} \perp E_{\lambda_j}$ for $i \ne j$, as $B_i \perp B_j$ for $i \ne j$.

Suppose now that
\[ H = \bigoplus_{i=1}^n E_{\lambda_i} \quad\text{and}\quad E_{\lambda_i} \perp E_{\lambda_j} \text{ for all } i \ne j, \]
where $\operatorname{sp}(T) = \{\lambda_1, \dots, \lambda_n\}$. For $i \in [n]$, let $B_i$ be an orthonormal basis for $E_{\lambda_i}$, and note that $B_i \perp B_j$ as $E_{\lambda_i} \perp E_{\lambda_j}$ for all $i \ne j$. Then
\[ B = \bigcup_{i=1}^n B_i \]
is an orthonormal basis for $H$ consisting of eigenvectors of $T$, as $B_i$ is an orthonormal basis for $E_{\lambda_i}$ for all $i \in [n]$ and $B$ is orthogonal.
Corollary 10.2.4. Let $T \in L(H)$. $T$ is unitarily diagonalizable if and only if there are mutually orthogonal projections $P_1, \dots, P_n \in L(H)$ and distinct scalars $\lambda_1, \dots, \lambda_n \in F$ such that
\[ I = \sum_{i=1}^n P_i \quad\text{and}\quad T = \sum_{i=1}^n \lambda_i P_i. \]

Proof. We know by 10.2.3 that $T$ is unitarily diagonalizable if and only if
\[ H = \bigoplus_{i=1}^n E_{\lambda_i} \quad\text{and}\quad E_{\lambda_i} \perp E_{\lambda_j} \text{ for all } i \ne j, \]
where $\operatorname{sp}(T) = \{\lambda_1, \dots, \lambda_n\}$.

Suppose that $H$ is the orthogonal direct sum of the eigenspaces of $T$. By 6.1.7, setting $P_i = P_{E_{\lambda_i}}$ for all $i \in [n]$, we have that
\[ I = \sum_{i=1}^n P_i \quad\text{and}\quad P_iP_j = 0 \text{ if } i \ne j. \]
Hence by 9.4.6, the $P_i$'s are mutually orthogonal. Now if $v \in H$, $v$ can be written uniquely as a sum of elements of the $E_{\lambda_i}$'s by 2.2.8:
\[ v = \sum_{i=1}^n v_i \quad\text{where } v_i \in E_{\lambda_i}. \]
Now it is immediate that
\[ Tv = \sum_{i=1}^n Tv_i = \sum_{i=1}^n \lambda_i v_i = \sum_{i=1}^n \lambda_i P_i v_i = \sum_{j=1}^n \lambda_j P_j \sum_{i=1}^n v_i = \left( \sum_{j=1}^n \lambda_j P_j \right) v, \]
so we have
\[ T = \sum_{i=1}^n \lambda_i P_i. \]

Now suppose there are mutually orthogonal projections $P_1, \dots, P_n \in L(H)$ and distinct scalars $\lambda_1, \dots, \lambda_n \in F$ such that
\[ I = \sum_{i=1}^n P_i \quad\text{and}\quad T = \sum_{i=1}^n \lambda_i P_i. \]
The reader should check that $\operatorname{sp}(T) = \{\lambda_1, \dots, \lambda_n\}$ and $E_{\lambda_i} = \operatorname{im}(P_i)$. Note that $E_{\lambda_i} \perp E_{\lambda_j}$ for $i \ne j$, as $P_i \perp P_j$ for $i \ne j$. Finally, by 6.1.7, we know that
\[ H = \bigoplus_{i=1}^n \operatorname{im}(P_i) = \bigoplus_{i=1}^n E_{\lambda_i}, \]
and we are finished.
Remark 10.2.5. Note that if $T \in L(H)$ with
\[ T = \sum_{i=1}^n \lambda_i P_i, \]
where the $\lambda_i \in F$ are distinct and the $P_i$'s are mutually orthogonal projections in $L(H)$ that sum to $I$, we can immediately see that $\operatorname{sp}(T) = \{\lambda_1, \dots, \lambda_n\}$, and the corresponding eigenspaces are $\{\operatorname{im}(P_1), \dots, \operatorname{im}(P_n)\}$.
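For example, in $M_3(F)$,
\[ \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 5 \end{pmatrix} = 2\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} + 5\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \]
so the spectrum is $\{2, 5\}$, with eigenspaces $\operatorname{span}\{e_1, e_2\}$ and $\operatorname{span}\{e_3\}$.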
Exercises

10.3 The Spectral Theorems

The complex, respectively real, spectral theorem is a classification of unitarily diagonalizable operators on a complex, respectively real, Hilbert space. The key result for this section is 5.4.5.
Theorem 10.3.1 (Complex Spectral). Suppose $H$ is a finite dimensional inner product space over $\mathbb{C}$, and let $T \in L(H)$. Then $T$ is unitarily diagonalizable if and only if $T$ is normal.

Proof. Suppose $T$ is unitarily diagonalizable. Then by 10.2.4, there are $\lambda_1, \dots, \lambda_n \in \mathbb{C}$ and mutually orthogonal projections $P_1, \dots, P_n$ such that
\[ I = \sum_{i=1}^n P_i \quad\text{and}\quad T = \sum_{i=1}^n \lambda_i P_i. \]
Then
\[ TT^* = \sum_{i=1}^n \lambda_i P_i \sum_{j=1}^n \overline{\lambda_j} P_j = \sum_{i=1}^n \lambda_i\overline{\lambda_i} P_i = \sum_{i=1}^n \overline{\lambda_i}\lambda_i P_i = \sum_{j=1}^n \overline{\lambda_j} P_j \sum_{i=1}^n \lambda_i P_i = T^*T. \]

Suppose now that $T$ is normal. By 5.4.5, we know $T$ has an eigenvector. Let $S = \{v_1, \dots, v_k\}$ be a maximal orthonormal set of eigenvectors of $T$ corresponding to eigenvalues $\lambda_1, \dots, \lambda_k$. Let $K = \operatorname{span}(S)$. We need to show $K = H$, or equivalently $K^\perp = (0)$. First, we show $TP_K = P_KT$. By the Extension Theorem, we may extend $S$ to an orthonormal basis $B = \{v_1, \dots, v_n\}$ of $H$. By 10.1.1, for $v \in H$, we have
\begin{align*}
P_K(Tv) &= P_K\sum_{i=1}^n \langle Tv, v_i\rangle v_i = \sum_{i=1}^n \langle Tv, v_i\rangle P_K v_i = \sum_{i=1}^k \langle Tv, v_i\rangle v_i = \sum_{i=1}^k \langle v, T^*v_i\rangle v_i \\
&= \sum_{i=1}^k \langle v, \overline{\lambda_i}v_i\rangle v_i = \sum_{i=1}^k \langle v, v_i\rangle\lambda_i v_i = \sum_{i=1}^k \langle v, v_i\rangle Tv_i = T\left( \sum_{i=1}^k \langle v, v_i\rangle v_i \right) \\
&= T\left( \sum_{i=1}^n \langle v, v_i\rangle P_K v_i \right) = T(P_K v).
\end{align*}
By 8.4.7, we know that $K$ and $K^\perp$ are invariant subspaces for $T$, so $T|_{K^\perp} = (I - P_K)T(I - P_K)$ is a well defined normal operator in $L(K^\perp)$. Suppose $K^\perp \ne (0)$. By 5.4.5, $T|_{K^\perp}$ has an eigenvector $w \in K^\perp$. We may assume $\|w\| = 1$. But then $S \cup \{w\}$ is an orthonormal set of eigenvectors of $T$ which is strictly larger than $S$, a contradiction. Hence $K^\perp = (0)$, and $K = H$.
Lemma 10.3.2. If $T \in L(H)$ is self adjoint, then all eigenvalues of $T$ are real.

Proof. Suppose $\lambda$ is an eigenvalue of $T$ corresponding to the eigenvector $v \in H$. Then
\[ \lambda\langle v, v\rangle = \langle Tv, v\rangle = \langle v, Tv\rangle = \overline{\lambda}\langle v, v\rangle. \]
As $\langle v, v\rangle \ne 0$, the only way this is possible is if $\lambda \in \mathbb{R}$.
Lemma 10.3.3. Suppose $H$ is a finite dimensional inner product space over $\mathbb{R}$, and let $T \in L(H)$ be self adjoint. Then $T$ has a real eigenvalue.

Proof. Recall by 2.1.12 and 3.1.3 that the complexification $H_{\mathbb{C}}$ of $H$ is a complex Hilbert space and that the complexification of $T$ is given by $T_{\mathbb{C}}(u + iv) = Tu + iTv$. Note that $(T_{\mathbb{C}})^*$ is given by
\[ (T_{\mathbb{C}})^*(u + iv) = T^*u + iT^*v = Tu + iTv = T_{\mathbb{C}}(u + iv), \]
so $T_{\mathbb{C}}$ is self adjoint, hence unitarily diagonalizable by 10.3.1. Hence, there is an orthonormal basis $\{w_1, \dots, w_n\}$ of $H_{\mathbb{C}}$ consisting of eigenvectors of $T_{\mathbb{C}}$. For $j = 1, \dots, n$, let $w_j = u_j + iv_j$, and let $\lambda_j \in \mathbb{R}$ (by 10.3.2) be the eigenvalue corresponding to $w_j$. First note that $Tu_j = \lambda_j u_j$ and $Tv_j = \lambda_j v_j$ by 2.1.12:
\[ Tu_j + iTv_j = T_{\mathbb{C}}(u_j + iv_j) = \lambda_j(u_j + iv_j) = \lambda_j u_j + i\lambda_j v_j. \]
Hence one of $u_j, v_j$ must be nonzero as $w_j \ne 0$, and is thus an eigenvector of $T$ with corresponding real eigenvalue $\lambda_j$.
Theorem 10.3.4 (Real Spectral). Suppose $H$ is a finite dimensional inner product space over $\mathbb{R}$, and let $T \in L(H)$. Then $T$ is unitarily diagonalizable if and only if $T$ is self adjoint.

Proof. Suppose $T$ is unitarily diagonalizable. Then by 10.2.4, there are distinct $\lambda_1, \dots, \lambda_n \in \mathbb{R}$ and mutually orthogonal projections $P_1, \dots, P_n$ such that
\[ I = \sum_{i=1}^n P_i \quad\text{and}\quad T = \sum_{i=1}^n \lambda_i P_i. \]
This immediately implies
\[ T^* = \sum_{i=1}^n \overline{\lambda_i} P_i = \sum_{i=1}^n \lambda_i P_i = T. \]
Now suppose $T$ is self adjoint. Then by 10.3.3, $T$ has an eigenvector. Let $S = \{v_1, \dots, v_m\}$ be a maximal orthonormal set of eigenvectors, and let $K = \operatorname{span}(S)$. Then as in the proof of 10.3.1, we have $P_KT = TP_K$. Note that $T|_{K^\perp} = (I - P_K)T(I - P_K)$ is a well defined self adjoint operator in $L(K^\perp)$. The rest of the argument is exactly the same as in 10.3.1.
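For example, the self adjoint matrix
\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \in M_2(\mathbb{R}) \]
has eigenvalues $3$ and $1$ with orthonormal eigenvectors $\frac{1}{\sqrt{2}}(1, 1)^t$ and $\frac{1}{\sqrt{2}}(1, -1)^t$, so it is unitarily diagonalizable, with spectral decomposition
\[ A = 3 \cdot \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} + 1 \cdot \frac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}. \]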
Exercises

Exercise 10.3.5. Let $S, T \in L(H)$ be normal operators. Show that $S, T$ commute if and only if $S, T$ are simultaneously unitarily diagonalizable, i.e. there is an orthonormal basis of $H$ consisting of eigenvectors of both $S$ and $T$.

Exercise 10.3.6 (Rayleigh's Principle). Suppose $T \in L(H)$ with $T = T^*$, and let
\[ \operatorname{sp}(T) = \{\lambda_{\min} = \lambda_1 < \lambda_2 < \cdots < \lambda_n = \lambda_{\max}\} \]
(see 9.2.10). Show that
\[ \lambda_{\min} \le \frac{\langle Tv, v\rangle}{\|v\|^2} \le \lambda_{\max} \]
for all $v \in H \setminus \{0\}$, with equality at either side if and only if $v$ is an eigenvector with the corresponding eigenvalue.
10.4 The Functional Calculus

Lemma 10.4.1. Suppose $P_1, \dots, P_n \in L(H)$ are mutually orthogonal nonzero projections. Then
\[ \sum_{i=1}^n \lambda_i P_i = \sum_{i=1}^n \mu_i P_i \quad\text{if and only if}\quad \lambda_i = \mu_i \text{ for all } i \in [n]. \]
Proof. For $i \in [n]$, let $v_i \in \operatorname{im}(P_i) \setminus \{0\}$. Then
\[ \lambda_i v_i = \left( \sum_{j=1}^n \lambda_j P_j \right) v_i = \left( \sum_{j=1}^n \mu_j P_j \right) v_i = \mu_i v_i, \]
so $(\lambda_i - \mu_i)v_i = 0$ and $\lambda_i - \mu_i = 0$ for all $i \in [n]$.
Proposition 10.4.2. Let $T \in L(H)$ be normal if $F = \mathbb{C}$ or self adjoint if $F = \mathbb{R}$. Then $T$ is
(1) self adjoint if and only if $\operatorname{sp}(T) \subset \mathbb{R}$,
(2) positive if and only if $\operatorname{sp}(T) \subset [0, \infty)$,
(3) a projection if and only if $\operatorname{sp}(T) \subset \{0, 1\}$,
(4) a unitary if and only if $\operatorname{sp}(T) \subset S^1 = \big\{ \lambda \in \mathbb{C} \,\big|\, |\lambda| = 1 \big\}$,
(5) a partial isometry if and only if $\operatorname{sp}(T) \subset S^1 \cup \{0\}$.

Proof. We write
\[ T = \sum_{i=1}^n \lambda_i P_i \]
as in 10.2.4, as $T$ is unitarily diagonalizable by the Spectral Theorem (10.3.1 or 10.3.4).
(1) Clearly $\lambda_i = \overline{\lambda_i}$ for all $i$ implies that
\[ T = \sum_{i=1}^n \lambda_i P_i = \sum_{i=1}^n \overline{\lambda_i} P_i = T^*. \]
Now if $T = T^*$, then we have by 9.4.6 that $\lambda_j P_j = TP_j = T^*P_j = \overline{\lambda_j} P_j$ for all $j = 1, \dots, n$, which implies $\lambda_j = \overline{\lambda_j}$ for all $j$.
(2) Suppose $T \ge 0$. Then $\langle Tv, v\rangle \ge 0$ for all $v \in H$, and in particular, for all eigenvectors. Hence $\lambda_i\|v_i\|^2 = \langle Tv_i, v_i\rangle \ge 0$ for any eigenvector $v_i$ corresponding to $\lambda_i$, so $\lambda_i \ge 0$ for all $i = 1, \dots, n$, and $\operatorname{sp}(T) \subset [0, \infty)$. Now suppose $\operatorname{sp}(T) \subset [0, \infty)$, and let $v \in H$. Choose an orthonormal basis $\{v_1, \dots, v_m\}$ of $H$ consisting of eigenvectors of $T$, say $Tv_j = \mu_j v_j$ with $\mu_j \in \operatorname{sp}(T)$, and write
\[ v = \sum_{j=1}^m c_j v_j \]
for scalars $c_1, \dots, c_m \in F$. Now this means
\[ \langle Tv, v\rangle = \left\langle \sum_{j=1}^m c_j\mu_j v_j,\, \sum_{j=1}^m c_j v_j \right\rangle = \sum_{j=1}^m \mu_j |c_j|^2 \ge 0, \]
and $T$ is self adjoint by (1), so $T$ is positive.
(3) We have
\[ T = T^* = T^2 \iff \sum_{i=1}^n \lambda_i P_i = \sum_{i=1}^n \overline{\lambda_i} P_i = \sum_{i=1}^n \lambda_i^2 P_i \iff \lambda_i = \overline{\lambda_i} = \lambda_i^2 \text{ for all } i = 1, \dots, n \iff \lambda_i \in \{0, 1\} \text{ for all } i = 1, \dots, n. \]
The second $\iff$ follows by arguments similar to those in (1).
(4) We have
\[ T^*T = \sum_{i=1}^n \overline{\lambda_i} P_i \sum_{j=1}^n \lambda_j P_j = \sum_{i=1}^n |\lambda_i|^2 P_i, \]
and by 10.4.1 this equals $I = \sum_{i=1}^n P_i$ if and only if $|\lambda_i|^2 = 1$ for all $i = 1, \dots, n$, if and only if $\lambda_i \in S^1$ for all $i = 1, \dots, n$. The result now follows by 9.3.2.
(5) We have that $T^*T$ is a projection if and only if $\operatorname{sp}(T^*T) \subset \{0, 1\}$ by (3). By the proof of (4), we see that $\operatorname{sp}(T^*T) = \big\{ |\lambda_i|^2 \,\big|\, \lambda_i \in \operatorname{sp}(T) \big\}$. It is clear that $|\lambda_i|^2 \in \{0, 1\}$ if and only if $\lambda_i \in S^1 \cup \{0\}$.
Definition 10.4.3. Let $T \in L(H)$ be normal if $F = \mathbb{C}$ or self adjoint if $F = \mathbb{R}$. Then by 10.2.4 and the Spectral Theorem, there are mutually orthogonal projections $P_1, \dots, P_n \in L(H)$ and distinct scalars $\lambda_1, \dots, \lambda_n \in F$ such that
\[ I = \sum_{i=1}^n P_i \quad\text{and}\quad T = \sum_{i=1}^n \lambda_i P_i. \]
The spectral projections of $T$ are the operators of the form
\[ P = \sum_{j=1}^m P_{i_j}, \]
where the $i_j$ are distinct elements of $\{1, \dots, n\}$ and $0 \le m \le n$ (if $m = 0$, then $P = 0$). Note that the spectral projections of $T$ are projections by 9.4.6.
Examples 10.4.4.
(1) If
\[ A = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \in M_2(\mathbb{R}), \]
then $A$ is self adjoint. We see that the eigenvalues of $A$ are $1$ and $0$, corresponding to the eigenvectors
\[ \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad\text{and}\quad \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \in \mathbb{R}^2. \]
We see then that our rank one projections are
\[ P_1 = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \quad\text{and}\quad P_2 = \frac{1}{2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}\begin{pmatrix} 1 & -1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \in L(\mathbb{R}^2), \]
and it is clear $P_1P_2 = P_2P_1 = 0$. Hence the spectral projections of $A$ are
\[ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad \frac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \quad\text{and}\quad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]
(2) If
\[ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \in M_2(\mathbb{C}), \]
then $A$ is normal. We see that the eigenvalues of $A$ are $\pm i$, corresponding to the eigenvectors
\[ \begin{pmatrix} 1 \\ -i \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 \\ i \end{pmatrix} \in \mathbb{C}^2. \]
We see then that our rank one projections are
\[ P_1 = \frac{1}{2}\begin{pmatrix} 1 \\ -i \end{pmatrix}\begin{pmatrix} 1 & i \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix} \quad\text{and}\quad P_2 = \frac{1}{2}\begin{pmatrix} 1 \\ i \end{pmatrix}\begin{pmatrix} 1 & -i \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix} \in L(\mathbb{C}^2) \]
(note the factor $\frac{1}{2} = \|v\|^{-2}$, as these eigenvectors are not normalized), and we see that $P_1P_2 = P_2P_1 = 0$. Hence the spectral projections of $A$ are
\[ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad \frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}, \quad \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix}, \quad\text{and}\quad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]
Definition 10.4.5 (Functional Calculus). Let $T \in L(H)$ be normal if $F = \mathbb{C}$ or self adjoint if $F = \mathbb{R}$. Then $T$ is unitarily diagonalizable by the Spectral Theorem. By 10.2.4, there are mutually orthogonal projections $P_1, \dots, P_n \in L(H)$ and distinct scalars $\lambda_1, \dots, \lambda_n \in F$ such that
\[ T = \sum_{i=1}^n \lambda_i P_i. \]
For $f \colon \operatorname{sp}(T) \to F$, define the operator $f(T) \in L(H)$ by
\[ f(T) = \sum_{i=1}^n f(\lambda_i) P_i. \]
Examples 10.4.6.
(1) For $p \in F[z]$, we have that $p(T)$ as defined in 4.6.2 agrees with the $p(T)$ defined in 10.4.5 for normal $T$. This is shown by proving that
\[ p\left( \sum_{i=1}^n \lambda_i P_i \right) = \sum_{i=1}^n p(\lambda_i) P_i, \]
which follows easily from the mutual orthogonality of the $P_i$'s using 9.4.6. Hence the functional calculus is a generalization of the polynomial functional calculus for a normal operator.
(2) Suppose $U \subset \mathbb{C}$ is an open set containing $\operatorname{sp}(T)$ and $f \colon U \to \mathbb{C}$ is holomorphic. Then the $f(T)$ defined in 7.6.5 agrees with the $f(T)$ defined in 10.4.5. Hence the functional calculus is a generalization of the holomorphic functional calculus for a normal operator.
(3) Suppose we want to find $\cos(A)$ where
\[ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \in M_2(\mathbb{C}). \]
We saw in 10.4.4 that the minimal nonzero spectral projections of $A$ are
\[ P_1 = \frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix} \quad\text{and}\quad P_2 = \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix}, \]
so we see that
\[ A = i\,P_1 + (-i)\,P_2. \]
Now, we apply 10.4.5 and use the fact that $\cos(z)$ is even to get
\[ \cos(A) = \cos(i)\,P_1 + \cos(-i)\,P_2 = \cos(i)\,(P_1 + P_2) = \cos(i)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \cosh(1)\,I. \]
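Similarly, using the spectral projections $P_1, P_2$ of the matrix $A$ from 10.4.4 (1), where $A = 1 \cdot P_1 + 0 \cdot P_2 = P_1$, any $f \colon \{0, 1\} \to \mathbb{R}$ gives $f(A) = f(1)P_1 + f(0)P_2$. For instance,
\[ e^A = e\,P_1 + P_2 = \frac{1}{2}\begin{pmatrix} e + 1 & e - 1 \\ e - 1 & e + 1 \end{pmatrix}, \quad\text{and}\quad \sqrt{A} = 1 \cdot P_1 + 0 \cdot P_2 = A, \]
as expected for a projection.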
Proposition 10.4.7. Let $T \in L(H)$ be normal if $F = \mathbb{C}$ or self adjoint if $F = \mathbb{R}$. The functional calculus gives a well defined function $\operatorname{ev}_T \colon F(\operatorname{sp}(T), F) \to L(H)$. This map has the following properties:
(1) $\operatorname{ev}_T$ is a linear transformation,
(2) $\operatorname{ev}_T$ is injective,
(3) $\operatorname{ev}_T(\overline{f}) = (\operatorname{ev}_T(f))^*$ for all $f \in F(\operatorname{sp}(T), F)$,
(4) $\operatorname{im}(\operatorname{ev}_T)$ is contained in the normal operators if $F = \mathbb{C}$ or the self adjoint operators if $F = \mathbb{R}$, and
(5) $\operatorname{ev}_T(f \circ g) = \operatorname{ev}_{g(T)}(f)$ for all $g \in F(\operatorname{sp}(T), F)$ and all $f \in F(\operatorname{sp}(g(T)), F)$.

Proof. By 10.2.4 and the Spectral Theorem, there are mutually orthogonal projections $P_1, \dots, P_n$ and distinct scalars $\lambda_1, \dots, \lambda_n$ such that
\[ I = \sum_{i=1}^n P_i \quad\text{and}\quad T = \sum_{i=1}^n \lambda_i P_i. \]
(1) Suppose $\lambda \in F$ and $f, g \in F(\operatorname{sp}(T), F)$. Then
\[ (\lambda f + g)(T) = \sum_{i=1}^n (\lambda f + g)(\lambda_i) P_i = \sum_{i=1}^n \big( \lambda f(\lambda_i) + g(\lambda_i) \big) P_i = \lambda\sum_{i=1}^n f(\lambda_i) P_i + \sum_{i=1}^n g(\lambda_i) P_i = \lambda f(T) + g(T). \]
(2) This is immediate from 10.4.1.
(3) We have
\[ \overline{f}(T) = \sum_{i=1}^n \overline{f(\lambda_i)} P_i = \left( \sum_{i=1}^n f(\lambda_i) P_i \right)^* = f(T)^*. \]
(4) We have
\[ f(T) = \sum_{i=1}^n f(\lambda_i) P_i, \]
so $f(T)$ is unitarily diagonalizable by 10.2.4, and is thus normal if $F = \mathbb{C}$ or self adjoint if $F = \mathbb{R}$.
(5) Note that $f(g(T))$ is well defined by (4). We have
\[ (f \circ g)(T) = \sum_{i=1}^n (f \circ g)(\lambda_i) P_i = \sum_{i=1}^n f(g(\lambda_i)) P_i = f\left( \sum_{i=1}^n g(\lambda_i) P_i \right) = f(g(T)). \]
Remark 10.4.8. We see that the spectral projections of a normal or self adjoint $T \in L(H)$ are precisely the operators of the form $f(T)$ where $f \colon \operatorname{sp}(T) \to F$ such that $\operatorname{im}(f) \subset \{0, 1\}$.
Proposition 10.4.9. Suppose $T \in L(H)$. Define $|T| = \sqrt{T^*T}$.
(1) $|T|^2 = T^*T$, and $|T|$ is the unique positive square root of $T^*T$.
(2) $\|Tv\| = \||T|v\|$ for all $v \in H$.
(3) $\ker(T) = \ker(|T|)$.

Proof.
(1) That $|T|^2 = T^*T$ follows immediately from 10.4.7 part (5). We know $|T|$ is positive by 10.4.2 and 10.4.5, as $T^*T$ is positive:
\[ \langle T^*Tv, v\rangle = \langle Tv, Tv\rangle = \|Tv\|^2 \ge 0 \quad\text{for all } v \in H. \]
Suppose now that $S \in L(H)$ is positive and $S^2 = T^*T$. Then there are mutually orthogonal projections $P_1, \dots, P_n$ and distinct scalars $\lambda_1, \dots, \lambda_n \in [0, \infty)$ such that
\[ S = \sum_{i=1}^n \lambda_i P_i \implies T^*T = S^2 = \sum_{i=1}^n \lambda_i^2 P_i. \]
As the $\lambda_i$'s are distinct, the $\lambda_i^2$'s are distinct, so applying the functional calculus, we see
\[ |T| = \sqrt{T^*T} = \sum_{i=1}^n \sqrt{\lambda_i^2}\, P_i = \sum_{i=1}^n \lambda_i P_i = S. \]
(2) If $v \in H$,
\[ \|Tv\|^2 = \langle Tv, Tv\rangle = \langle T^*Tv, v\rangle = \langle |T|^2 v, v\rangle = \langle |T|v, |T|v\rangle = \||T|v\|^2. \]
Now take square roots.
(3) This is immediate from (2).
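For example, if
\[ A = \begin{pmatrix} 0 & -3 \\ 2 & 0 \end{pmatrix}, \quad\text{then}\quad A^*A = \begin{pmatrix} 4 & 0 \\ 0 & 9 \end{pmatrix} \quad\text{and}\quad |A| = \sqrt{A^*A} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}. \]
Note that $\|Ae_1\| = 2 = \||A|e_1\|$, illustrating (2).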
Exercises

Exercise 10.4.10 (Hahn-Jordan Decomposition of an Operator).
(1) Show that if $F = \mathbb{C}$, every operator $T \in L(H)$ can be written uniquely as $T = T_1 + iT_2$ with $T_1, T_2$ self adjoint.
(2) Show that every self adjoint operator $T \in L(H)$ can be written uniquely as the difference $T = T_+ - T_-$ of two positive operators with $T_+T_- = 0$.
10.5 The Polar and Singular Value Decompositions

Theorem 10.5.1 (Polar Decomposition). For $T \in L(H)$, there is a unique partial isometry $V \in L(H)$ such that $\ker(V) = \ker(T) = \ker(|T|)$ and $T = V|T|$, where $|T| = \sqrt{T^*T}$ as in 10.4.9.

Proof. As $|T|$ is positive, $|T|$ is unitarily diagonalizable, so there is an orthonormal basis $\{v_1, \dots, v_n\}$ of $H$ consisting of eigenvectors of $|T|$. Let $\lambda_i$ be the eigenvalue corresponding to $v_i$ for $i \in [n]$, and note that $\lambda_i \in [0, \infty)$ by 10.4.2. After relabeling, we may assume $\lambda_i \ge \lambda_{i+1}$ for all $i \in [n-1]$. Let $k \in \{0\} \cup [n]$ be minimal such that $\lambda_i = 0$ for all $i > k$. Define an operator $V \in L(H)$ by
\[ Vv_i = \begin{cases} \dfrac{1}{\lambda_i}\, Tv_i & \text{if } i \in [k] \\ 0 & \text{else,} \end{cases} \]
extended by linearity. Note that $\ker(T) = \ker(|T|) = \operatorname{span}\{v_{k+1}, \dots, v_n\} = \ker(V)$.

We show $V$ is a partial isometry. If $v \in \ker(V)^\perp$, then there are scalars $\mu_1, \dots, \mu_k$ such that
\[ v = \sum_{i=1}^k \mu_i v_i. \]
Then by 10.4.9, we have
\[ \|Vv\| = \left\| \sum_{i=1}^k \frac{\mu_i}{\lambda_i}\, Tv_i \right\| = \left\| T\sum_{i=1}^k \frac{\mu_i}{\lambda_i}\, v_i \right\| = \left\| |T|\sum_{i=1}^k \frac{\mu_i}{\lambda_i}\, v_i \right\| = \left\| \sum_{i=1}^k \frac{\mu_i}{\lambda_i}\, \lambda_i v_i \right\| = \left\| \sum_{i=1}^k \mu_i v_i \right\| = \|v\|. \]
Hence $V$ is a partial isometry by 9.5.3.

We show $V|T| = T$. If $v \in H$, then there are scalars $\mu_1, \dots, \mu_n$ such that
\[ v = \sum_{i=1}^n \mu_i v_i. \]
Then
\[ V|T|v = V|T|\sum_{i=1}^n \mu_i v_i = \sum_{i=1}^n \mu_i V|T|v_i = \sum_{i=1}^n \mu_i\lambda_i Vv_i = \sum_{i=1}^k \mu_i\lambda_i\,\frac{1}{\lambda_i}\, Tv_i = \sum_{i=1}^n \mu_i Tv_i = Tv, \]
as $\lambda_i = 0$ implies $Tv_i = 0$ since $\ker(T) = \ker(|T|)$.

Suppose now that $T = U|T|$ for a partial isometry $U$ with $\ker(U) = \ker(T)$. Then we would have that $Uv_i = 0$ if $i > k$, and
\[ Uv_i = U\left( \frac{1}{\lambda_i}(\lambda_i v_i) \right) = U\left( \frac{1}{\lambda_i}\big(|T|v_i\big) \right) = U|T|\left( \frac{1}{\lambda_i}\, v_i \right) = T\left( \frac{1}{\lambda_i}\, v_i \right) = \frac{1}{\lambda_i}\, Tv_i \]
if $i \in [k]$. Hence $U$ and $V$ agree on a basis of $H$, so $U = V$.
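Continuing the example after 10.4.9: for
\[ A = \begin{pmatrix} 0 & -3 \\ 2 & 0 \end{pmatrix}, \qquad |A| = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}, \qquad V = A|A|^{-1} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \]
and indeed $A = V|A|$. Here $\ker(A) = (0)$, so the partial isometry $V$ is in fact a unitary.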
Definition 10.5.2. The eigenvalues of $|T|$ are called the singular values of $T$.

Notation 10.5.3 (Singular Value Decomposition). Let $T \in L(H)$. By 10.5.1, there is a unique partial isometry $V$ with $\ker(V) = \ker(T)$ and $T = V|T|$. As $|T|$ is positive, $|T|$ is unitarily diagonalizable, so there is an orthonormal basis $\{v_1, \dots, v_n\}$ of $H$ and scalars $\lambda_1, \dots, \lambda_n$ such that
\[ |T| = \sum_{i=1}^n \lambda_i\, |v_i\rangle\langle v_i|. \]
Setting $u_i = Vv_i$ for $i = 1, \dots, n$, we have that
\[ T = V|T| = V\sum_{i=1}^n \lambda_i\, |v_i\rangle\langle v_i| = \sum_{i=1}^n \lambda_i\, V|v_i\rangle\langle v_i| = \sum_{i=1}^n \lambda_i\, |u_i\rangle\langle v_i|. \]
This last expression is called a singular value decomposition, or Schmidt decomposition, of $T$.
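For the matrix $A$ above, $v_1 = e_1$, $v_2 = e_2$, $u_1 = Ve_1 = e_2$, and $u_2 = Ve_2 = -e_1$, so a singular value decomposition is
\[ A = 2\,|e_2\rangle\langle e_1| + 3\,|{-e_1}\rangle\langle e_2|, \]
with singular values $2$ and $3$.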
Exercises

$H$ will denote a Hilbert space over $F$.

Exercise 10.5.4. Compute the polar decompositions of
\[ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \in M_2(\mathbb{C}). \]
and (2) commutative if x#y = y#x for all x.3.2.11. (1) The proof of 1.3 Fields Deﬁnition 1.2. . z ∈ X.11 also shows that if f admits an inverse. y. Exercises Exercise 1.5. . let [n] = {1.2.5. Exercise 1. and construct a surjection N → Z2 \ Z × {0}. (2) Note that we used the associativity of composition of functions in the proof of 1.12. 11 .2.13.3. then it is unique. (5) is denumerable if there is a bijection N → X. Deﬁnition 1. . Hint: Find a surjection Z2 \ Z × {0} → Q. y) → x#y.2.17.15.1. A binary operation # : X × X → X is called (1) associative if x#(y#z) = (x#y)#z for all x. 2. R. then A ∈ N is the number n such that A has n elements.Remarks 1. Then use 1. If A is a ﬁnite set. . (1) [n] has n elements and N is denumerable.2.2.2. then P(X) has 2n elements. The set X (1) has n elements if there exists a bijection [n] → A.14. (3) is inﬁnite if there is an injection N → X (equivalently if X is not ﬁnite). For n ∈ N.2. 1.16. (6) is uncountable if X is not countable.2. n}. but not denumerable. C are associative and commutative binary operations. (1) Addition and multiplication on Z.2. y ∈ X. (2) is ﬁnite if there is a surjection [n] → X for some n ∈ N. (4) If X has n elements. A binary operation on a set X is a function # : X × X → X given by (x. (3) [n] is countable. Q. Show that Q is countable. Examples 1. Prove 1. (2) Q is countable. Examples 1. and R is uncountable. Deﬁnition 1. (4) is countable if there is a surjection N → X.
(3) For simplicity. Deﬁnition 1. These axioms are also used in deﬁning the following algebraic structures: axioms (F1) (F1)(F2) (F1)(F3) (F1)(F4) Remarks 1. +. ·K×K ) is a ﬁeld. there exists b ∈ F \ {0} such that a · b = 1 = b · a.4. A subﬁeld K ⊂ F is a subset such that +K×K and ·K×K are well deﬁned binary operations on K. and the multiplicative identity of K is the the multiplicative identity of F.3.3.3.3. Remark 1. A ﬁeld (F. R. b ∈ F. we will often denote the ﬁeld by F instead of (F. Examples 1.(2) The cross product × : R3 × R3 → R3 is not commutative or associative. (F6) multiplicative associativity: (a · b) · c = a · (b · c) for all a.3. b. c ∈ F.6. ·) is a triple consisting of a set F and two binary operations + and · on F called addition and multiplication respectively such that the following axioms are satisﬁed: (F1) additive associativity: (a + b) + c = a + (b + c) for all a. (1) The additive inverse of a ∈ F in (F3) is usually denoted −a. (F2) additive identity: there exists 0 ∈ F such that a + 0 = a = 0 + a for all a ∈ F. (1) Q. (F8) multiplicative inverse: for each a ∈ F \ {0}. Deﬁnition 1.3. +. (F5) distributivity: a · (b + c) = a · b + a · c and (a + b) · c = a · c + b · c for all a. (F4) additive commutativity: a + b = b + a for all a. (2) The multiplicative inverse of a ∈ F \ {0} in (F8) is usually denoted a−1 or 1/a. b ∈ F. c ∈ F. y ∈ Q is a ﬁeld. ·). (F3) additive inverse: for each a ∈ F. √ √ (2) Q( n) = x + y n x. b. b. C are ﬁelds. Let F be a ﬁeld.7. (K. (F9) multiplicative commutativity: a · b = b · a for all a. +K×K . 12 name semigroup monoid group abelian group axioms (F1)(F5) (F1)(F6) (F1)(F7) (F1)(F8) (F1)(F9) name nonassociative ring ring ring with unity division ring ﬁeld . there exists an element b ∈ F such that a + b = 0 = b + a. c ∈ F. (F7) multiplicative identity: there exists 1 ∈ F \ {0} such that a · 1 = a = 1 · a for all a ∈ F.5.
y ∈ Z.10.9 (Euclidean Algorithm).11.3. 13 . we have an l ∈ Z≥0 such that y = ln. the greatest common divisor.3. But r < n. so n = 1. Then there are unique k. But then r = x − kn = x − k(sx + ty) = (1 − ks)x + (−kt)y ∈ S. y) = 1. and x = kn. r ∈ Z≥0 with r < n such that x = kn + r. √ (2) Q( 2) is a subﬁeld of R. (1) For x.9. Proposition 1. y ∈ N. Proof. Examples 1. y ∈ N relatively prime if gcd(x. (4) A number p ∈ N \ {1} is called prime if gcd(p. (1) (2) Deﬁnition 1. denoted gcd(x. we deﬁne x mod n = r if x = kn + r with r < n. Hence nx and ny. Deﬁnition 1.3. Similarly. r are unique. Suppose x ∈ Z≥0 and n ∈ N.12.13. is the largest number k ∈ N such that kx and ky. Let S = sx + ty ∈ N s. s ∈ Z. t ∈ Z . y ∈ N are relatively prime if and only if there are r.3. y) = 1. For x ∈ Z≥0 and n ∈ N. then k(rx + sy) for all r. Now if gcd(x.3. n) = 1 for all n ∈ [p − 1]. Set r = x − kn. Proof. Then S has a smallest element n = sx + ty for some s. y). t ∈ Z. r ∈ Z≥0 with r < n such that x = kn + r. (1) Q is a subﬁeld of R and R is a subﬁeld of C. By the Euclidean Algorithm 1.Examples 1. (3) The irrational numbers R \ Q do not form a subﬁeld of R. There is a smallest k ∈ Z≥0 such that kn ≤ x. s ∈ Z such that rx + sy = 1 implies gcd(x. y) = k.3. y) = 1. we say x divides y. there are unique k. Now suppose gcd(x.8. x. It is obvious that k. so r = 0.3. Finite Fields Theorem 1. Then r < n and x = kn + r. (2) For x. if there is a k ∈ Z such that x = ky. so existence of r. denoted xy. s ∈ Z such that rx + sy = 1. (3) We call x.
we deﬁne Z/n = {0.3. n) = 1. s ∈ N with rx + sn = 1.3. n − 1}. is A(i.j = 1 if i = j. Exercise 1. and gcd(x.3. Then (xy) mod n = 1. Now suppose x has a multiplicative inverse y. and all results will be stated without proof. 1. and Z/5 and deduce they are ﬁelds from the tables. Show that Q( 2) = a + b 2 a. The identity matrix I ∈ Mn (F) is the matrix given by 0 if i = j Ii. and we will write Mn (F) = Mn×n (F). Proposition 1.4. (Z/n. Deﬁnition 1.3. . The set of all m × n matrices over F is denoted Mm×n (F).15. we have r. Proof.17. n) = 1 by 1.3. By 1.Deﬁnition 1.13. Thus xy + (−k)n = 1.14. . and we deﬁne binary operations # and ∗ on Z/n by x#y = (x + y) mod n and x ∗ y = (xy) mod n. 14 . n) = 1.16 (Fields). 1. so there is an k ∈ Z such that xy = kn + 1. denoted Ai. and deﬁnitions in this section will be given without examples. Thus Z/n is a ﬁeld if and only if it has multiplicative inverses.13. For n ∈ N \ {1}. F is a ﬁeld. . which will immediately imply the result. Hence (r mod n) ∗ x = (rx) mod n = (rx + sn) mod n = 1 mod n = 1. Suppose gcd(x. First note that Z/n has additive inverses as the additive inverse of x ∈ Z/n is n − x. so x has a multiplicative inverse.3. An m × n matrix A over F is a function A : [m] × [n] → F. For this section.1. j) where (i. . Exercises √ √ Exercise 1. Usually. It then follows that Z/n is a commutative ring with unit (one must check that addition and multiplication are compatible with the operation x → x mod n). #.4 Matrices This section is assumed to be prerequisite knowledge for the student.j . The ith row of the matrix A is the 1 × n matrix A{i}×[n] . b ∈ Q is a ﬁeld where addition and multiplication are the restriction of addition and multiplication in R. A is denoted by an m × n array of elements of F. and the i. j) ∈ [m] × [n]. ∗) is a ﬁeld if and only if n is prime. and the j th row is the m × 1 matrix A[m]×{j} . We claim that x ∈ Z/n \ {0} has a multiplicative inverse if and only if gcd(x. j th entry. Construct an addition and multiplication table for Z/2. Z/3.
Proposition 1. (1) If A.4. then (AB)∗ = B ∗ A∗ . Deﬁnition 1.i . then A∗ is invertible. Proposition 1. Note that matrix addition and multiplication are associative. (2) If A ∈ Mm×n (F) and λ ∈ F.j = k=1 Ai.4. It is clear that AI = A for all A ∈ Mm×n (F) for all m ∈ N and IB = B for all B ∈ Mn×p (F) for all p ∈ N.4.4.k Bk. B ∈ Mm×n (F). B ∈ Mm×n (F) and λ ∈ F.8.10. the unique inverse of A is usually denoted A−1 . (1) The transpose of the matrix A ∈ Mm×n (F) is the matrix AT ∈ Mn×m (F) given by (AT )i.7.j = Ai. then AB ∈ Mm×p (F) is the matrix given by n (AB)i. (2) If A ∈ Mm×n (F) and B ∈ Mn×p (F). Proof. then λA ∈ Mm×n (F) is the matrix given by (λA)i.Deﬁnition 1.4. then (AB)T = B T AT .j = Aj. If F = C. Proposition 1. (3) If A ∈ Mn (F) is invertible. If J ∈ Mn (F) is another such matrix.5.3.6.j = Ai.4.4. Deﬁnition 1. then A+B is the matrix in Mm×n (F) given by (A+B)i. then the inverse is unique.j . then [AB] ∈ Mm×(n+p) (F) is the matrix given by [AB]i.j−n if j ≤ n else.j . In this case. (1) If A.4. (2) The adjoint of the matrix A ∈ Mm×n (C) is the matrix A∗ ∈ Mn×m (C) given by (A∗ )i. then (A + λB)T = AT + λB T . If F = C.j +Bi.4. If F = C. If A ∈ Mn (F) is invertible. and (A∗ )−1 = (A−1 )∗ .j .2. Remark 1. then J = IJ = I. The identity matrix I ∈ Mn (F) is the unique matrix in Mn (F) such that AI = A for all A ∈ Mm×n (F) for all m ∈ N and IB = B for all B ∈ Mn×p (F) for all p ∈ N.j Bi.j = λAi. and (AT )−1 = (A−1 )T . Remark 1. (3) If A ∈ Mm×n (F) and B ∈ Mn×p (F). (4) If A ∈ Mm×n (F) and B ∈ Mm×p (F). Let A ∈ Mn (F). then AT is invertible. We call A 15 . then (A + λB)∗ = A∗ + λB ∗ . A matrix A ∈ Mn (F) is invertible if there is a matrix B ∈ Mn (F) such that AB = BA = I.j = Aj. Deﬁnition 1.9.4.i .
Deﬁnition 1.4.4. .. and (3) block diagonal if A is both block upper triangular and block lower triangular.j = 0. In this sense.(1) upper triangular if i > j implies Ai.6). µ) with A ∈ M1×n (F) and a µ ∈ F. Exercise 1.1. Exercise 1.j = 0. is an equation of the form n λi xi = λ1 x1 + · · · + λn xn = µ where λ1 . j > i implies Ai. symmetric.12.5 Systems of Linear Equations This section is assumed to be prerequisite knowledge for the student. if there is an invertible S ∈ Mn (F) such that S −1 AS = B. we can give an equivalent deﬁnition of an Flinear equation as a pair (A.4. We call A (1) block upper triangular if there are square matrices A1 . 16 . An Flinear equation is completely determined by a matrix A = λ1 · · · λn ∈ M1×n (F) and a number µ ∈ F. i = j implies Ai. . 1.14. (1) An Flinear equation.1. . λn . i.5.e. . and (3) diagonal if A is both upper and lower triangular. or a linear equation over F. Show that similarity gives a relation on Mn (F) that is reﬂexive. and λi is called the coeﬃcient of the variable xi . and transitive (see 1.11.4. Let A ∈ Mn (F). . Matrices A. For this section. Deﬁnition 1. . Deﬁnition 1. and deﬁnitions in this section will be given without examples. denoted A ∼ B..e. . (2) block lower triangular if AT is block upper triangular.9. A= 0 Am where the ∗ denotes entries in F. B ∈ Mn (F) are similar. µ ∈ F.13.j = 0. i=1 The xi ’s are called the variables. F is a ﬁeld. and all results will be stated without proof.4. i. (2) lower triangular if AT is upper triangular. Am with m ≥ 2 such that A1 ∗ . Exercises Let F be a ﬁeld. . Prove 1.
a system of Flinear equations is a pair (A.i xi = λm. a solution to the Flinear equation (A.i xi = λm.i ri = λm.(2) A solution of the Flinear equation n λi xi = λ1 x1 + · · · + λn xn = µ i=1 is an element (r1 . .1 r1 + · · · + λ1.1 r1 + · · · + λm.i xi = λ1. . . n λm.1 x1 + · · · + λm. . . . j. n λm.1 x1 + · · · + λ1. . y) where A ∈ Mm×n (F) and y ∈ Mm×1 (F). i=1 Equivalently. Equivalently. i=1 Equivalently. rn ) ∈ Fn = F × · · · × F n copies such that n λi ri = λ1 r1 + · · · + λn rn = µ. .n rn = µn . (3) A system of Flinear equations is a ﬁnite number m of Flinear equations: n λ1. . .n xn = µn i=1 is an element (r1 . n λm. . (4) A solution to the system of Flinear equations n λ1.1 x1 + · · · + λm. µ) is an x ∈ Mn×1 (F) such that Ax = µ. rn ) ∈ Fn such that n λ1.i xi = λ1. y) is an x ∈ Mn×1 (F) such that Ax = y.j ∈ F for all i.n xn = µ1 i=1 . a solution to the system of Flinear equations (A.n rn = µ1 i=1 . .i ri = λ1. .n xn = µ1 i=1 .1 x1 + · · · + λ1.n xn = µn i=1 where λi. 17 . .
k . and every matrix over F is row equivalent to a unique matrix in reduced row echelon form. Theorem 1.6. The solution is unique if and only if A can be row reduced to the identity matrix. If there is no such entry. 18 . B ∈ Mm×n (F) are row equivalent if there are elementary matrices E1 . or is said to be row reduced. if A is in row echelon form.5.5. (2) An n × n elementary matrix is a matrix obtained from the identity matrix I ∈ Mn (F) by performing not more than one elementary row operation. if the pivot of the ith row is Ai. i.10. Proposition 1.5.7. Suppose A ∈ Mm×n (F).e.8. (1) An elementary row operation on a matrix A ∈ Mm×n (F) is performed by (i) doing nothing. . A pivot of the ith row of the matrix A ∈ Mm×n (F) is the ﬁrst nonzero entry in the ith row. Remark 1. (ii) switching two rows.Deﬁnition 1.5. (2) A is said to be in reduced row echelon form. or (iv) replacing the ith row with the sum of the ith row and a scalar (constant) multiple of another row. Then A is invertible if and only if A is row equivalent to I. Let A ∈ Mm×n (F). (3) Matrices A.j and the pivot of the (i + 1)th row is A(i+1).5.e. . . Performing an elementary row operation on a matrix A ∈ Mm×n (F) is equivalent to multiplying A by the elementary matrix obtained by doing the same elementary row operation to the identity. . Deﬁnition 1. then the row has no pivots. every elementary row operation can be undone by another elementary row operation.5.5. . . all pivots of A are equal to 1. (iii) multiplying one row by a constant λ ∈ F. Every matrix over F is row equivalent to a matrix in row echelon form. . y) has a solution if and only if the augmented matrix [Ay] can be row reduced so that no pivot occurs in the (n + 1)th column.9 (Gaussian Elimination Algorithm).3. and all entries that occur above a pivot are zeroes. En such that A = En · · · E1 B. Theorem 1. Corollary 1. Elementary row operations are invertible.4. (1) A is said to be in row echelon form if the pivot in the (i + 1)th row (if it exists) is in a column strictly to the right of the pivot in the ith row (if it exists) for i = 1. Elementary matrices are invertible.5. Proposition 1.2. The system of Flinear equations (A. n − 1. .5. then k > j. i. Deﬁnition 1. Suppose A ∈ Mn (F).5.
Exercise 1. Prove 1.Exercises Let F be a ﬁeld.13. 19 .5. Exercise 1. Exercise 1.9.5.5. Prove 1.12. Prove 1.14.6.5.5. Exercise 1.5. Prove 1.5.11.10.5.5.
20 .
2. v ∈ V (V3) vector additive identity: there is a vector 0 ∈ V such that 0 + v = v for all v ∈ V . i. (4) and a map · : F × V → V given by (λ.e. The main theorem in this chapter is that vector spaces over a ﬁeld F are classiﬁed by their dimension. u + (v + w) = (u + v) + w for all u. v. 2. A vector space consists of (1) a scalar ﬁeld F. and (V4) vector additive inverse: for each v ∈ V . Examples 2. µ ∈ F and v ∈ V . v) → λ · v called a scalar multiplication such that (V5) scalar unit: 1 · x = x for all v ∈ V and (V6) scalar associativity: λ · (µ · v) = (λµ) · v for all v ∈ V and λ.e. (2) a set V . 21 .1. (3) a binary operation + called vector addition on V such that (V1) addition is associative. µ ∈ F.Chapter 2 Vector Spaces The objects that we will study this semester are vector spaces. whose elements are called vectors. In this chapter F will denote a ﬁeld.1 Deﬁnition Deﬁnition 2.1. u + v = v + u for all u. such that the distributive properties hold: (V7) λ · (u + v) = (λ · u) + (λ · v) for all λ ∈ F and u.1. v ∈ V and (V8) (λ + µ) · v = (λ · v) + (µ · v) for all λ. w ∈ V . i. (V2) addition is commutative. there is a −v ∈ V such that v + (−v) = 0.
There are subspaces of R3 which look exactly like R2 . if A = (Aij ). (5) F (X.e. This example can be generalized to C n (a. We will write C(a. b). Now that we know what a vector space is. ·) be a vector space over the ﬁeld F. (7) C 1 (a. then N S(A) ⊂ Rn is the subspace consisting of all x ∈ Rn such that Ax = 0. we begin to develop the necessary tools to prove that vector spaces over F are classiﬁed by their dimension. Proposition 2. and (W. and similarly for closed or halfopen intervals. (4) C n (a. b). +. i. +. (3) If A ∈ Mm×n (F) is an m × n matrix. (4) Let K ⊂ F be a subﬁeld. v ∈ W and λ ∈ F. R). b) for all n ∈ N ∪ {∞}. (5) C(a. b). (2) Let Mm×n (F) denote the m × n matrices over F. Let (V. F). (1) Every vector space has a zero subspace (0) consisting of the vector 0. ·) be a vector space over F.1. λu + v ∈ W for all u. b) is a subspace of F ((a. b). Also. A subset W ⊂ V is a subspace if it is closed under addition and scalar multiplication.(1) (F. b) = C((a. +.1.4. V is a subspace of V . 22 . F) = {f : X → F} is a vector space over F with pointwise addition and scalar multiplication: (λf + g)(x) = λf (x) + g(x). A subspace W ⊂ V is called proper if W = V . b). b) ⊂ R is a vector space over R. the ntimes continuously diﬀerentiable functions on (a. the set of continuous functions from X ⊂ F to F is a vector space over F. the set of Rvalued. (3) Note that Fn = Mn×1 (F) is a vector space over F.3.5. We will explain what this means when we discuss the concept of isomorphism of vector spaces. (6) C(X. +. ·) is a vector space where + is addition of matrices and · is the usual scalar multiplication. R2 is not even a subset of R3 . Examples 2. let (V. R). +W ×W . (Mn×k (F). and C ∞ (a. then (A + B)ij = Aij + Bij and (λA)ij = λAij .1. continuously diﬀerentiable functions on the interval (a. From this point on. b). ·F×W ) is a vector space over F. b) is a subspace of C(a. if V is a vector space. Then F is a vector space over K. (2) R2 is not a subspace of R3 . Then W ⊂ V is a vector subspace (or subspace) if +W ×W is an addition on W (W is closed under addition). Deﬁnition 2. the inﬁnitely diﬀerentiable functions. ·F×W is a scalar multiplication on W (W is closed under scalar multiplication). B = (Bij ) ∈ Mm×n (F) and λ ∈ F. ·) is a vector space over F where 0 is the additive identity and −x is the additive inverse of x ∈ F.
yu1 + xv1 ). (3) Show that u1 + iv1 = u2 + iv2 if and only if u1 = u2 and v1 = v2 . Exercise 2. 0). Exercise 2. (1) Show (VC .1. v1 + v2 ) and (x + iy) · (u1 . ·) is a vector space over C called the complexiﬁcation of the real vector space V. For w = u + iv ∈ VC . Exercise 2.11. the additive inverse of w. (4) Find (Rn )C and (Mm×n (R))C . Since −1 ∈ F. Exercise 2. Exercise 2. Let v ∈ V . v) = i(v.7. Exercise 2. v) = u + iv. Let v ∈ V and λ ∈ F \ {0}. Note: This implies that we can think of V ⊂ VC as all vectors of the form (v. the additive inverse of v. v1 ) = (xu1 − yv1 .1. It is an easy exercise to check that −1 · w = −w. then the additive inverse of v is unique. Exercise 2. Show that (R>0 .1. #. and deﬁne functions + : VC × VC → VC and · : C × VC → VC by (u1 . then the additive identity is unique. Exercises V will denote a vector space over F. (2) Show that (0. +. Show if v ∈ V . the real part of u + iv ∈ VC is Re(w) = u ∈ V . −w ∈ W . Let λ ∈ F.9. Show that 0F · v = 0V .Proof. we have w + (−w) = 0 ∈ W .10. 0V will denote the additive identity of V . Show that (−1) · v = −v. v1 ) + (u2 .8. Let VC = V ×V . two vectors are equal if and only if their real and imaginary parts are equal. Show that λ · 0V = 0V . then clearly aW ×W and sF×W will satisfy the distributive property. Let (V. Addition and scalar multiplication can then be rewritten in the naive way as (u1 +iv1 )+(u2 +iv2 ) = (u1 +u2 )+i(v1 +v2 ) and (x+iy)·(u1 +iv1 ) = xu1 −yv1 +i(yu1 +xv1 ). 23 . −1 · w ∈ W . and we can write (u. It remains to check that W has additive inverses and W has 0. and the conjugate is w = u − iv ∈ VC . and 0F will denote the additive identity of F.6. If W is closed under addition and scalar multiplication. Now since w. we have that if w ∈ W . v2 ) = (u1 + u2 . Show that λ · v = 0 implies v = 0. When confusion can arise. Let v ∈ V .1. Show that if V is a vector space over F. and we are ﬁnished.1. 0) for all v ∈ V . the imaginary part is Im(v) = v ∈ V .12 (Complexiﬁcation). ·) be a vector space over R. +.1. Hence.1. ) is a vector space over R where x#y = xy is the addition and r x = xr is the scalar multiplication.
As the set of all linear combinations of elements of S forms a subspace. #. Let v1 . . Deﬁnition 2. vn is a vector of the form n v= i=1 λi vi where λi ∈ F for all i = 1. n . i. . v + W is called a coset of W . vi ∈ S for all i = 1. Let S ⊂ V be a subset. span(S) = i=1 λi vi n ∈ Z≥0 . 2. . .2. the sum of no vectors is zero). These subspaces are denoted RS(A) and CS(A) respectively. Remarks 2.4. . then the row space of A is the subspace of Rn spanned by the rows of A and the column space of A is the subspace of Rm spanned by the columns of A. .3. deﬁne v+W = v+w w ∈W . . . Exercise 2. ∗) is a vector space.Hint: Use the identiﬁcation VC = V + iV described in the note. the set of cosets of W . .2.e.2. vn ∈ V . A linear combination of v1 . and (Internal) Direct Sum Deﬁnition 2.13 (Quotient Spaces). 24 .1. (2) span(∅) = (0). Let W ⊂ V be a subspace.2.e. (2) Deﬁne an addition # and a scalar multiplication ∗ on V /W so that (V /W. . (1) If A ∈ Mm×n (F) is a matrix.1. Then span(S) = Examples 2. zero.2. (1) span(S) is the smallest subspace of V containing S.2 Linear Combinations. we have that n W W is a subspace of V with S ⊂ W . Let V /W = v + W v ∈ V . (2) Since span(S) is closed under addition and scalar multiplication. we have that all (ﬁnite) linear combinations of elements of S are contained in span(S). . (1) Show that u + W = v + W if and only if u − v ∈ W . Span. . . . then span(S) ⊂ W . if W is a subspace containing S. It is convention that the empty linear combination is equal to zero (i. Suppose S ⊂ V . the zero subspace. . n. For v ∈ V . . λi ∈ F. Note that this still makes sense for S = ∅ as span(S) contains the empty linear combination.
. λn . . wmn and scalars λ1 .5. . 25 . . . .2. If V is a vector space and W1 . . . .2. . 0)} + span{(0. . . λ1 1 .8. . . Then v is a linear combination of elements of the Wi ’s. Suppose W1 . DeﬁnitionProposition 2. n 1 n 1 so there are w1 . . . .7.2. . . Proof. .2. . Then v is a linear combination of elements of the Wi ’s. . Wn ⊂ V are subspaces. .6. The following conditions are equivalent: (1) Wi ∩ j=i Wj = (0) for all i ∈ [n]. λn n such that m m 1 1 i wj ∈ Wi for all j ∈ [mj ] and n mj i λi w j . then we deﬁne the subspace n n W1 + · · · + Wn = i=1 Wi = i=1 wi wi ∈ Wi for all i ∈ [n] ⊂ V. . and let n W = i=1 Wi . . wm1 . (1) span{(1. Wn ⊂ V are subspaces. w1 . so v ∈ span i=1 Wi . .Deﬁnition 2. . . n Now suppose v ∈ span i=1 Wi . set mj ui = j=1 n n i λi wj ∈ Wi j to see that v = i=1 ui ∈ i=1 Wi . 1)} = R2 . . . . . Suppose W1 . . . (2) Proposition 2. Suppose v ∈ i=1 n Wi . . Wn ⊂ V are subspaces. Examples 2. Then n n Wi = span i=1 n i=1 Wi . . j i=1 j=1 v= For i ∈ [n].
Then we have wj ∈ Wj for all j = i such that w= j=i wj .n (2) for each v ∈ W . (2) ⇒ (3): Trivial. Then n 0=v−v = i=1 wi − i=1 wi = i=1 wi − wi . we have wj − wj = i=j wi − wi ∈ Wj ∩ i=j Wi = (0). we have that n 0=w−w = j=1 wj . denoted n W = i=1 Wi . we call W the direct sum of the Wi ’s. Suppose w ∈ Wi ∩ j=i Wj for i ∈ [n]. so wj = 0 for all j ∈ [n]. 26 . If any of the above three conditions are satisﬁed. Since W = i=1 n n Wi . n (1) ⇒ (2): Suppose Wi ∩ j=i Wj = (0) for all i ∈ [n]. (3) ⇒ (1): Now suppose (3) holds. Suppose v = i=1 n n wi with wi ∈ Wi for all i ∈ [n]. and let v ∈ W . Proof. there are wi ∈ Wi for i ∈ [n] such that v = i=1 wi . Setting wi = −w. there are unique wi ∈ Wi for i ∈ [n] such that v = i=1 n wi . (3) if wi ∈ Wi for i ∈ [n] such that i=1 wi = 0. Thus w = 0. Since wi − wi ∈ Wi for all i ∈ [n]. so wj − wj = 0. wi = wi for all i ∈ [n]. and wi = 0 = −w. then wi = 0 for all i ∈ [n]. Similarly. and the expression is unique.
j i ∈ [m]. . λi vi = 0 implies that λi = 0 for all i ∈ [n]. . / Exercises 2. Then E i. .e. .9. (1) V = V ⊕ (0) for all vector spaces V .3 Linear Independence and Bases n Deﬁnition 2. . vn } is linearly independent. F) f (y) = 0 for all y ∈ Y ⊕ f ∈ F (X. . . (2) Deﬁne E i.Remarks 2. . Then F (X. 27 . F) f (x) = 0 for all x ∈ Y . then wi is called the Wi component of w for i ∈ [n]. F). (3) The functions δx : X → F given by δx (y) = are linearly independent in F (X.l = 1 0 if i = k. i. .2. (2) C(R. vn are linearly independent if {v1 . .2. .j ∈ Mm×n (F) by (E i. If a set S is not linearly independent.2.j )k. 0 1 if x = y if x = y δx x ∈ X is linearly independent. 0. R) = {even functions} ⊕ {odd functions}. Examples 2. (2) R2 = span{(1. Examples 2. i=1 We say the vectors v1 . vn } ⊂ S. . a set. . F) = f ∈ F (X. . 1)}. .10.1. j ∈ [n] is linearly independent. 0. 1. 0)} ⊕ span{(0. . (2) The proof (1) ⇒ (2) highlights another important proof technique called the “in two places at once” technique.3. . 0)T ∈ Fn where the 1 is in the ith slot for i ∈ [n]. A subset S ⊂ V is linearly independent if for every ﬁnite subset {v1 . . (2) Suppose Y ⊂ X. (1) Letting ei = (0. j = l else. it is linearly dependent. n n (1) If V = i=1 Wi and v = i=1 wi where wi ∈ Wi for all i ∈ [n]. . . we get that ei i ∈ [n] is linearly independent.3.
. . n}. . Remark 2. . then λi = 0 for all i as {v1 .4 (1). . The order of “not” and “all” is extremely important. Then we know n λi vi = 0 =⇒ λi = 0 for all i. Let {v1 . then we have S2 = S1 ∪ {w1 . . wm }. . . . . λn were all not zero. .3. Then S ∪ {v} is linearly independent. If S2 is ﬁnite. vn } is a ﬁnite subset of S2 . . . . . Let S1 ⊂ S2 ⊂ V . . so λi = 0 for all i = 1. This is diﬀerent than if λ1 . . . . Suppose S is a linearly independent subset of V . . . . . . i=1 Now suppose n µv + i=1 µi vi = 0. Proposition 2.5. This means that there is a λi = 0 for some i ∈ {1. . . (1) If S1 is linearly dependent. .6. . and suppose v ∈ / span(S). . . . Note that the zero element of a vector space is never in a linearly independent set. Proof. i=1 If S2 is inﬁnite. λn not all zero such that n λi vi = 0. i=1 Then as {v1 . . and it is often confused by many beginning students. . .Remark 2. . Note that in the proof of 2. we have scalars λ1 .3. then so is S2 . vn } ⊂ S1 and scalars λ1 . then so is S1 .3.3. 28 . . .3. . m. . Proposition 2. vn } is a ﬁnite subset of S2 . and suppose n λi vi = 0. vn } ⊂ S1 . . Proof. (2) Let {v1 . . . . n. then there is a ﬁnite subset {v1 . .3. .4. . (2) If S2 is linearly independent. . vn } be a ﬁnite subset of S. and we have that n m λi vi + i=1 j=1 µj wj = 0 with µj = 0 for all j = 1. S2 is not linearly independent. (1) If S1 is linearly dependent. . λn which are not all zero.
3. there are v1 . . . . . . vn } ∩ {w1 . Hence µ = 0. . . so λi = µi for all i ∈ [k]. λn ∈ F \ {0} such that n v= i=1 λi vi . . . Let T = {v1 .8. After reindexing. . . As v ∈ span(S). . . . µ Hence v is a linear combination of the vi ’s. . n. Let {u1 . then we have −1 v= µ n n µi v i = i=1 i=1 −µi vi . µm ∈ F \ {0} such that m v= j=1 µj wj . uk } = {v1 . vn ∈ S and λ1 . .9.3. . . Thus k n k m k n m 0 = v−v = i=1 λi vi + i=k+1 λi vi − j=1 µj w j − j=k+1 µj w j = i=1 (λi −µi )ui + i=k+1 λi vi − j=k+1 µj w j . . and v ∈ span(S). . v can be written as a linear combination of elements of S. wm }. . .3. . wm ∈ S and µ1 .If µ = 0. . . . we may assume ui = vi = wi for all i ∈ [k]. Let S ⊂ V be a linearly independent. . λn ∈ F \ {0} such that n v= i=1 λi vi . a contradiction. . . Examples 2. . . . wm }. . . . . . . . m. Proof. . . Deﬁnition 2. If T = ∅. As v = 0. . a contradiction. and let v ∈ span(S) \ {0}. . vn } ∩ {w1 . .7. A subset B ⊂ V is called a basis for V if B is linearly independent and V = span(B) (B spans V ). . . Proposition 2. . (1) The set ei i ∈ [n] is the standard basis of Fn . . λi = 0 for all i = k + 1. then n m 0=v−v = i=1 λi vi − j=1 µj w j . and the expression is unique. so we have that k = n = m. . vn ∈ S and λ1 . 29 . Suppose there are w1 . . Then there are unique v1 . . so λi = 0 for all i ∈ [n] and µj = 0 for all j ∈ [m] as S is linearly independent. . and all µi = 0. and µj = 0 for all j = k + 1.
Note that the function f (x) = 1 for all x ∈ X is not a ﬁnite linear combination of these functions if X is inﬁnite. (1) Any vector space with a ﬁnite basis is ﬁnitely generated. R) is called even if f (x) = f (−x) for all x ∈ R and odd if f (x) = −f (−x) for all x ∈ R. Fn and Mm×n (F) are ﬁnitely generated. A function f ∈ C(R. (3) The functions δx : X → F given by δx (y) = 0 1 if x = y if x = y form a basis of F (X. R) and (2) C(R.3. ie. S2 ⊂ V be subsets of V . Show that (1) the subset of even. (1) Find a basis for the following real vector spaces: (a) S 2 (Rn ) = A ∈ Mn (R) A = AT .e.13 (Even and Odd Functions). the real symmetric matrices and (b) (Rn ) = matrices.3. For example. 2 A ∈ Mn (R) A = −AT . functions is a subspace of C(R. Deﬁnition 2. Exercise 2. Exercises Exercise 2.14. (2) The matrices Eij which have as entries all zeroes except a 1 in the ij th entry form the standard basis of Mm×n (F).11.10. Examples 2. 30 .3.12 (Symmetric and Exterior Algebras). (3) We will see shortly that C(a.j which have as entries all zeroes except a 1 in the (i. j)th entry form the standard basis of Mm×n (F). Show span(S1 S2 ) = span(S1 ) ⊕ span(S2 ). b) for a < b is not ﬁnitely generated. R) = {even functions} ⊕ {odd functions}. the real antisymmetric (or skewsymmetric) 2 (2) Show that Mn (R) = S 2 (Rn ) ⊕ (Rn ). Let S1 . (3) Suppose S1 S2 is linearly independent. Exercise 2.3. A vector space is ﬁnitely generated if there is a ﬁnite subset S ⊂ V such that span(S) = V .(2) The matrices E i. Such a set S is called a ﬁnite generating set for V . respectively odd. (1) What is span(span(S1 ))? (2) Show that if S1 ⊂ S2 . i. F) if and only if X is ﬁnite.3. then span(S1 ) ⊂ span(S2 ).
. we augment the matrix so it has n + 1 columns by letting the last column be all zeroes.3. Suppose W is a kdimensional subspace of V . We have n y ∈ CS(A) ⇐⇒ there are λ1 . Performing Gaussian elimination.e. Let A ∈ Mm×n (F ) be a matrix.1. . 31 . . vim }. . . .4 Finitely Generated Vector Spaces and Dimension For this section. . there is a nonzero vector w ∈ W that is a linear combination of {vi1 . the above condition is now equivalent to ﬁnding a nonzero x ∈ Fn such that Ax = 0. . vn are n vectors in Fm with m < n. (1) y ∈ CS(A) ⊂ Fm if and only if there is an x ∈ Fn such that Ax = y. .15. (2) If v1 . 2. . . vn } is linearly dependent. (1) Let B = {v1 . . . . Show that for any subset {vi1 . . . . . . Hence. To solve this system of linear equations. . λn such that if x = . λn such that y = i=1 λi Ai λ1 λ2 ⇐⇒ there are λ1 . then Ax = y .Exercise 2. (2) We must show there are scalars λ1 . (2) Let W be a subspace of V having the property that there exists a unique subspace W such that V = W ⊕ W . . . . . Show that W = V or W = (0). such that n n λi vi = 0. These columns correspond to free variables when we are looking for solutions of the original system of linear equations. not all zero. Lemma 2. we get a row reduced matrix U which is row equivalent to the matrix B = [A0]. vn } be a basis for V . . . . λn ⇐⇒ there is an x ∈ F such that Ax = y. . there is a nonzero x such that Ax = 0. i=1 Form a matrix A ∈ Mm×n (F) by letting the ith column be vi : A = [v1 v2  · · · vm ]. . i. V will denote a ﬁnitely generated vector space over F. vim } of B with m > n − k. . . (1) Let A1 . . Now. at least one of the ﬁrst n columns does not have a pivot in it. By (1). . .4. . λn . then {v1 . Proof. . . An be the columns of A. the number of pivots (the ﬁrst entry of a row that is nonzero) of U must be less than or equal to m as there are only m rows. .
Theorem 2.4.2. Suppose S1 is a ﬁnite set that spans V and S2 ⊂ V is linearly independent. Then S2 is ﬁnite and S2  ≤ S1 . Proof. Let S1 = {v1 , . . . , vm } and {w1 , . . . , wn } ⊂ S2 . We show n ≤ m. Since S1 spans V , there are scalars λi,j ∈ F such that
m
wj =
i=1
λi,j vi for all j = 1, . . . , n.
Let A ∈ Mm×n (F) by Ai,j = λi,j . If m > n, then the rows of A are linearly dependent by 2.4.1, so there is an x = (µ1 , . . . , µn )T ∈ Rn \ {0} such that Ax = 0. This means that
n n
Ai,j µj =
j=1 j=1
λi,j µj = 0 for all i ∈ [m].
This implies that
n n m m n
µj w j =
j=1 j=1
µj
i=1
λi,j vi =
i=1 j=1
λi,j µj
vi = 0,
a contradiction as {w1 , . . . , wn } is linearly independent. Hence n ≤ m. Corollary 2.4.3. Suppose V has a ﬁnite basis B with n elements. Then every basis of V has n elements. Proof. Let B be another basis of V . Then by 2.4.2, B is ﬁnite and B  ≤ B. Applying 2.4.2 after switching B, B yields B ≤ B , and we are ﬁnished. Corollary 2.4.4. Every inﬁnite subset of a ﬁnitely generated vector space is linearly dependent. Deﬁnition 2.4.5. The vector space V is ﬁnite dimensional if there is a ﬁnite subset B ⊂ V such that B is a basis for V . The number of elements of B is the dimension of V , denoted dim(V ). A vector space that is not ﬁnite dimensional is inﬁnite dimensional. In this case, we will write dim(V ) = ∞. Examples 2.4.6. (1) Mm×n (F) has dimension mn. (2) If a = b, then C(a, b) and C n (a, b) are inﬁnite dimensional for all n ∈ N ∪ {∞}. (3) F (X, F) is ﬁnite dimensional if and only if X is ﬁnite. Theorem 2.4.7 (Contraction). Let S ⊂ V be a ﬁnite subset such that S spans V . (1) There is a subset B ⊂ S such that B is a basis of V . 32
(2) If dim(V ) = n and S has n elements, then S is a basis for V . Proof. (1) IfS is not a basis, then S is not linearly independent. Thus, there is some vector v ∈ S such that v is a linear combination of the other elements of S. Set S1 = S \ {v}. It is clear that S1 spans V . If S1 is not a basis, repeat the process to get S2 . Repeating this process as many times as necessary, since S is ﬁnite, we will eventually get that Sn is linearly independent for some n ∈ N, and thus a basis for V . (2) Suppose S is not a basis for V . Then by (1), there is a proper subset T of S such that T is a basis for V . But T has fewer elements than S, a contradiction to 2.4.3. Corollary 2.4.8 (Existence of Bases). Let V be a ﬁnitely generated vector space. Then V has a basis. Proof. This follows immediately from 2.4.7 (1). Corollary 2.4.9. The vector space V is ﬁnitely generated if and only if V is ﬁnite dimensional. Proof. By 2.4.8, we see that a ﬁnitely generated vector space has a ﬁnite basis. The other direction is trivial. Lemma 2.4.10. A subspace W of a ﬁnite dimensional vector space V is ﬁnitely generated. Proof. We construct a maximal linearly independent subset S of W as as follows: if W = (0), we are ﬁnished. Otherwise, choose w1 ∈ W \ {0}, and set S1 = {w1 }, and note that S1 is linearly independent by 2.3.6. If W1 = span(S1 ) = W , we are ﬁnished. Otherwise, choose w2 ∈ W \ W1 , and set S2 = {w1 , w2 }, which is again linearly independent by 2.3.6. If W2 = span(S2 ) = W , we are ﬁnished. Otherwise we may repeat this process. This algorithm will terminate as all linearly independent subsets of W must have less than or equal to dim(V ) elements by 2.4.2. Now it is clear by 2.3.6 that our maximal linearly independent set S ⊂ W must be a basis for W by 2.3.6. Hence W is ﬁnitely generated. Theorem 2.4.11 (Extension). Let W be a subspace of V . Let B be a basis of W (we know one exists by 2.4.8 and 2.4.10). (1) If B = dim(V ), then W = V . (2) There is a basis C of V such that B ⊂ C. Proof. (1) Suppose B = {w1 , . . . , wn } and dim(V ) = n. Suppose W = V . Then there is a wn+1 ∈ V \ W , and B1 = B ∪ {wn+1 } is linearly independent by 2.3.6. This is a contradiction to 2.4.2 as every linearly independent subset of V must have less than or equal to n elements. Hence W = V . 33
(2) If span(B) = V , then pick v1 ∈ V \span(B). It is immediate from 2.3.6 that B1 = B ∪{v1 } is linearly independent. If span(B1 ) = V , then we may pick v2 ∈ V \ span(B1 ), and once again by 2.3.6, B2 = B1 ∪ {v2 } is linearly independent. Repeating this process as many times as necessary, we will have that eventually Bn will have dim(V ) elements and be linearly independent, so its span must be V by (1). Corollary 2.4.12. Let W be a subspace of V , and let B be a basis of W . If dim(V ) = n < ∞, then dim(W ) ≤ dim(V ). If in addition W is a proper subspace, then dim(W ) < dim(V ). Proof. This is immediate from 2.4.11. Proposition 2.4.13. Suppose V is ﬁnite dimensional and W1 , . . . , Wn are subspaces of V such that
n
V =
i=1
Wi .
i u (1) For i = 1, . . . , n, let ni = dim(Wi ) and let Bi = {v1 , . . . , vni } be a basis for Wi . Then n
B=
i=1
Bi is a basis for V .
n
(2) dim(V ) =
i=1
dim(Wi ).
Proof. (1) It is clear that B spans V . We must show it is linearly independent. Suppose
n ni i λi vj = 0. j i=1 j=1 ni n i λi wj , we have j j=1 i=1
Then setting wj =
wi = 0, so wi = 0 for all i ∈ [n] by 2.2.8. Now since
Bi is a basis for all i ∈ [n], λi = 0 for all i, j. j (2) This is immediate from (1).
Exercises
Exercise 2.4.14 (Dimension Formulas). Let W1 , W2 be subspaces of a vector space V . (1) Show that dim(W1 + W2 ) + dim(W1 ∩ W2 ) = dim(W1 ) + dim(W2 ). (2) Suppose W1 ∩ W2 = (0). Find an expression for dim(W1 ⊕ W2 ) similar to the formula found in (1). 34
we can prove the other. Let V be a vector space over R. Usually. (1) A partial order on the set P is a subset R of P × P (a relation R on P ). V will denote a vector space over F (which is not necessarily ﬁnitely generated). to prove this in full generality. Deﬁnition 2. However. (2) If T ∈ L(V ). and (iii) (transitive) (p. Is this still true if we omit the condition “W1 ⊂ W or W2 ⊂ W ”? Exercise 2.e. p) ∈ R.1. and suppose V = W1 ⊕ W2 . (b) Show that the complexiﬁcation of multiplication by f ∈ F (X.(3) Let W be a subspace of V . (ii) (antisymmetric) (p. then W = (W ∩ W1 ) ⊕ (W ∩ W2 ). R) on F (X. we denote (p. q) ∈ R and (q. Hence. We saw in the previous section that this result is very easy for ﬁnitely generated vector spaces. For this section.4. (ii) p ≤ q and q ≤ p implies p = q. p) ∈ R for all p ∈ P . q). and (iii) p ≤ q and q ≤ r implies p ≤ r. we may restate these conditions as: (i) p ≤ p for all p ∈ P . q) ∈ R as p ≤ q. r) ∈ R implies (p. we will need Zorn’s Lemma which is logically equivalent to the Axiom of Choice. such that (i) (reﬂexive) (p. 2. r) ∈ R. Deduce that dimR (VC ) = 2 dimR (V ). C) on F (X. i.5 Existence of Bases A question now arises: do all vector spaces have bases? The answer to this question is yes as we will see shortly. Show that if W1 ⊂ W or W2 ⊂ W . then p = q. (1) Show that dimR (V ) = dimC (VC ). 35 . We will not prove the equivalence of these two statements as it is beyond the scope of this course. C).15 (Complexiﬁcation 2). (q. R) is multiplication by f ∈ F (X. assuming one. (a) Show that the complexicifation of multiplication by A ∈ Mn (R) on Rn is multiplication by A ∈ Mn (C) on Cn .5. we deﬁne the complexiﬁcation of the operator T to be the operator TC ∈ L(VC ) given by TC (u + iv) = T u + iT v.
we have either s ≤ t or t ≤ s. Note that the maximal element may not be (and probably is not) unique.4. We must show X ∈ P . . Hence B is a basis for V .5. We partially order P by inclusion. a partially ordered set.2. The pair (P. i. . note that P(X) has a unique maximal element: X. we say S2 ≤ S1 for S1 . Let T be a totally ordered subset of P .e. This is obvious as t ⊂ X for all t ∈ T . Let V be a vector space. Hence Y is linearly independent. and let P(X) be the power set of X. Now. Examples 2. Then P has a maximal element. i. we invoke Zorn’s lemma to get a maximal element B of P .e. t ∈ T . (2) We may partially order P(X) by reverse inclusion. Suppose every nonempty totally ordered subset T of a nonempty partially ordered set P has an upper bound. Hence. Theorem 2. Then Y ⊂ s. the set of all subsets of X. Let P be the set of all linearly independent subsets of V . (2) Suppose P is partially ordered set.5. one of the ti ’s contains the others. Lemma 2. yn } be a ﬁnite subset of X. (4) An element m ∈ P is called maximal if p ∈ P with m ≤ p implies p = m. it spans V (B is linearly independent as it is in P ).5. . (1) Let X be a set. . i. and B ⊂ B ∪ {v}. Proof. we say S1 ≤ S2 for S1 . is an element x ∈ P such that t ≤ x for all t ∈ T . Then there are t1 . S2 ∈ P(X) if S1 ⊆ S2 .e. the union of all sets contained in T . Call this set s.3 (Zorn). We must show that X is indeed an upper bound for T . . Then we may partially order P(X) by inclusion. q ∈ P . .Note that we may not always compare p. . but B = B ∪ {v}. ≤). Remark 2. . and thus so is X. tn ∈ T such that yi ∈ ti for all i = 1. S2 ∈ P(X) if S1 ⊆ S2 . We show that T has an upper bound. (3) An upper bound for a totally ordered subset T ⊂ P . i. This is a contradiction to the maximality of B. Then V has a basis. Then B ∪ {v} is linearly independent. Note that we can compare all elements of a totally ordered set. . B ≤ B ∪ {v}. Let Y = {y1 . . Furthermore. n. 36 . Suppose not. . Our candidate for an upper bound will be X= t∈T t. .e.e. Note once again that this is not a total order unless X has one element. or sometimes P if ≤ is understood. but s is linearly independent. Note that this is not a total order unless X has only one element. The subset T is totally ordered if for every s. is called a partially ordered set. i.5 (Existence of Bases). P(X) has a unique maximal element for this order as well: ∅. Then there is some v ∈ V \ span(B). X is linearly independent. Hence t ≤ X for all t ∈ T . Since T is totally ordered. Also. We claim that B is a basis for V .5. In this sense the order is partial.
As in the proof of 2. Note that if V is ﬁnitely generated.8.5.2. Set U = span(C \ B). we see that each totally ordered subset has an upper bound.5.4. we must have that C is a basis for V . Let W ⊂ V be a subspace. we may use 2. as in the proof of 2. Then clearly U ∩ W = (0) and U + W = V . we see that each totally ordered subset has an upper bound. Then there is a (nonunique) subspace U of V such that V = U ⊕ W .5. Theorem 2.5.5.6. Let B be a basis of W .5. Let W be a subspace of V .Theorem 2. Once again.5. Proof. Exercises Exercise 2.5. Let P be the set whose elements are linearly independent subsets of V containing B partially ordered by inclusion.5.5.6 (Extension). so P has a maximal element B ∈ P (which means B ⊂ S). Once again. Then there is a basis C of V such that B ⊂ C.6. Let B be a basis of W . and extend B to a basis C of V .9. This is merely a restatement of 2. Then there is a subset B ⊂ S such that B is a basis of V .11 instead of 2. Proof.5. Let P be the set whose elements are linearly independent subsets of S partially ordered by inclusion. we must have that B is a basis for V . Proof.5.7 (Contraction). so P has a maximal element C ∈ P (which means B ⊂ C). 37 .5. so V = U ⊕ W by 2. as in the proof of 2.5. Proposition 2. Let S ⊂ V be a subset such that S spans V . As in the proof of 2.8.
38 .
(5) The derivative map D : C n (a. Often T (v) is denoted T v. b) given by f → f is an operator. vn } is a basis for V . we write L(V ) = L(V.2.1. . The set of all linear transformations from V to W is denoted L(V. is a function from V to W such that T (λu + v) = λT (u) + T (v) for all λ ∈ F and u. . a set. Then left multiplication by A deﬁnes a linear transformation LA : Fn → Fm given by x → Ax. unless stated otherwise.1. .1 Linear Transformations Deﬁnition 3. .1. λn )T = λi for i ∈ [n] are Flinear. (2) Let A ∈ Mm×n (F). Then evaluation at x is a linear transformation evx : F (X. or a map. i (4) Integration from a to b where a. (1) There is a zero linear transformation 0 : V → W given by 0(v) = 0 ∈ W for all v ∈ V . (3) The projection maps Fn → F given by e∗ (λ1 . ∗ (7) Suppose B = {v1 . v ∈ V . denoted T : V → W . Sometimes we will refer to a linear transformation as a linear operator. W ). . If V = W . Then the projection maps vj : V → F given by n ∗ vj i=1 λi v i 39 = λj . W will be vector spaces over F. There is an identity linear transformation I ∈ L(V ) given by Iv = v for all v ∈ V . F) → F given by f → f (x). Examples 3.Chapter 3 Linear Transformations For this chapter. This condition is called Flinearity. . an operator. . b ∈ R is a map C[a. V. 3. b) → C n−1 (a. . (6) Let x ∈ X. b] → R. A linear transformation from the vector space V to the vector space W . V ).
then we may deﬁne T1 + T2 : V → W by (T1 + T2 )v = T1 v + T2 v. then L(V. Hence. .1. Examples 3.7.1. vn } ⊂ {vi }. Let V be a vector space over R.1. and then we decree that n n T j=1 λj vj = j=1 λj T vj for all ﬁnite subsets {v1 . ∗ For example. and let VC be as in 2. and T is well deﬁned since {vi } is linearly independent.” i.e. (1) The complexiﬁcation of multiplication by the matrix A on Rn is multiplication by the matrix A on Cn . Deﬁnition 3. Hence. Show that evv : L(V.1. one only needs to specify where a basis goes. if {vi } is a basis for V . (2) The complexiﬁcation of multiplication by x on R[x] is multiplication by x on C[x].1. each linear transformation has an additive inverse. (2) If T1 and T2 are linear transformations V → W and λ ∈ F. the set of all Flinear transformations V → W . we specify T vi for all i. W ).3. and vi to zero for i = j. we deﬁne the complexiﬁcation of the operator T to be the operator TC ∈ L(VC ) given by TC (u + iv) = T u + iT v.4. if V and W are vector spaces over F. Example 3. This rule deﬁnes T on all of V as {vi } spans V . to deﬁne a linear transformation. in example 7 above. W ) → W given by T → T v is a linear operator. 40 . Exercises Exercise 3. and then we may “extend it by linearity. is a vector space (it is easy to see that + is an addition and · is a scalar multiplication which satisfy the distributive property.are linear transformations. Remarks 3.12. Fm ) given by A → LA is a linear transformation. If T ∈ L(V ). . Let v ∈ V .5. (1) Note that a linear transformation is completely determined by what it does to a basis. The map L : Mm×n (F) → L(Fn . we know T v for any vector v ∈ V . . and there is a zero linear transformation). and we may deﬁne λT1 : V → W by (λT1 )v = λ(T1 v). then by linearity. . If {vi } is a basis for V and we know what T vi is for all i. vj is the unique map that sends vj to one.6.1.
if T v = 0. Let T ∈ L(V. then v = 0. . . (2) Consider evx : F (X. Suppose {T v1 . and thus a vector space.2. W ). denoted ker(T ).3. We show {T vi } is linearly independent. Then since T 0 = 0. then ker(v1 ) = span{v2 . or range. (4) Let D : C 1 [0. Proposition 3. . . “Why should we care about linear transformations?” The answer is that these maps are the “natural” maps to consider when we are thinking about vector spaces. Proposition 3. so im(T ) is a subspace of W . and let {vi } be a basis for V . Suppose T ∈ L(V. of T . As we shall see. .2.4. Then {T vi } is a basis for W . Let y. Proof. ker(T ) and im(T ) are vector spaces.2. and let w. Clearly T v = 0 = T u implies T (λu + v) = 0 for all u. Proof. W ) is injective if and only if ker(T ) = (0). T vn } is a ﬁnite subset of {T vi }. vn } and im(v1 ) = F. v ∈ ker(T ) and λ ∈ F. W ). The immediate question to ask is. Then T (x − y) = 0. Let x. so x − y = 0 and x = y. . W ) is bijective. .1.2. Proposition 3. .3. Suppose now that ker(T ) = (0). . Let T ∈ L(V. . Then T (λw + x) = λy + z ∈ im(T ). . Proof. It suﬃces to show they are closed under addition and scalar multiplication. 1] as x D 0 f (t) dt = f (x) for all x ∈ [0. i.2. and thus a vector space. D has a right inverse. Let the kernel of T . T ∈ L(V. Hence T is injective.2.e. Examples 3. and suppose n n λj T vj = T j=1 j=1 λj vj 41 = 0. . z ∈ im(T ) and µ ∈ F.5. (1) Let A ∈ Mm×n (F) and consider LA . so ker(T ) is a subspace of V . x ∈ V such that T w = y and T x = z. 1] be the derivative map f → f . linear transformations preserve vector space structure in the sense of the following deﬁnition and proposition: Deﬁnition 3. .2 Kernel and Image Note that we may talk about injectivity and surjectivity of operators as they are functions. vn } is a basis for V . be w ∈ W w = T v for some v ∈ V . 1] → C[0. y ∈ V such that T x = T y. Then ker(D) = {constant functions} and im(D) = C[0. F) → F. be v ∈ V T v = 0 and let the image. Suppose T is injective. ∗ ∗ (3) If B = {v1 . 1] by the fundamental theorem of calculus. Then ker(evx ) = f : X → F f (x) = 0 and im(evx ) = F as λδx → λ. Then ker(LA ) = N S(A) and im(LA ) = CS(A). denoted im(T ) or T V .
we will omit the ◦ and write ST and T S. . . W ) is invertible if and only if it is bijective. Deﬁnition 3. . (1) Fn ∼ V if dim(V ) = n. so w ∈ span{T vi }. Then w = Tv = T n n λj vj = j=1 j=1 λj T vj . It is clear that an invertible operator is bijective. = (2) F (X. n. Then v ∈ span{vi }. λn ∈ F such that n v= j=1 λj vj . j=1 Now since {vi } is linearly independent. = Examples 3. if V ∼ W . by 3. Remark 3. Suppose y. An operator T ∈ L(V.8. if there is a linear operator S ∈ L(W. Then S(λy + z) = S(λT w + T z) = ST (λw + x) = λw + x = λST w + ST x = λSy + Sz. so there are v1 . Proposition 3. 42 . Hence S is a linear transformation. Often.2. denoted V ∼ W . F) ∼ Fn if X = n.4 n λj vj = 0. if there is an isomorphism T ∈ L(V. If T is an isomorphism. .9. W ) is called invertible. .2. Let w ∈ W . In this way.6. It remains to show S is a linear transformation. T ∈ L(V. . . We call vector spaces V. We show {T vi } spans W .7.Then since T is injective. . . In particular. = Proposition 3. z ∈ W and λ ∈ F. Proof. x ∈ W such that T w = y and T x = z. Then there is an isomorphism T : V → W. where id means the identity linear transformation. vn ∈ {vi } and λ1 . W ).2.2. Suppose dim(V ) = dim(W ) = n < ∞. when composing linear transformations. or an isomorphism (of vector spaces). Then there are w. Then there is a v ∈ V such that T v = w as T is surjective. W isomorphic. Suppose T is bijective. = then dim(V ) = dim(W ) (we allow the possibility that both are ∞). . we have λj = 0 for all j = 1. V ) such that T ◦S = idW and S◦T = idV . then T maps bases to bases. .2.2. we can identify the domain and codomain of T as the same vector space. . Then there is an inverse function S : W → V .10.
5. .2. . W ). . Then by 2. Proof. T un . Using this terminology. and let C = {w1 . W ). . n and extending by linearity. . T w1 .” 43 . Deﬁne a linear transformation T : V → W by T vi = wi for i = 1. By 2. Let T ∈ L(V. Lemma 3. Hence {T u1 .2.2. . . Remark 3.2.13. . . T wm } spans im(T ). and dim(ker(T )) is sometimes called the nullity of T .2. wm } be a basis for M . But since T ui = 0 for all i. Thus im(T ) = im(T M ). and extend by linearity. Hence u − v = 0. and bijectivity are all equivalent for T ∈ L(V. . It is easy to see that the inverse of T is the operator S : W → V deﬁned by Swi = vi for all i = 1. .2. W are of the same ﬁnite dimension. W ) when V. . . we have that {T w1 . . dim(V ) = dim(ker(T )) + dim(M ). n. .11. denoted nullity(T ). W ).12 and 3. B ∪ C is a basis for V . v ∈ M such that T u = T v. un } be a basis for ker(T ). This follows immediately from 3. .13 says that dim(V ) = rank(T ) + nullity(T ). Then T (u − v) = 0. T M is injective. Suppose V is ﬁnite dimensional. .2. We must show dim(M ) = dim(im(T )). vn } be a basis for V and {w1 . . . .9. Then dim(im(T )) is sometimes called the rank of T . = = then U ∼ W via isomorphism ST .5. . .13. Let {v1 .13 gets the name “RankNullity Theorem. and by (2). wn } be a basis for W .4.4. . . . = Our task is now to prove the RankNullity Theorem. . . so u − v ∈ M ∩ ker(T ) = (0). T wm } spans im(T ). By 2. . and let T ∈ L(V. Proof. . (3) By (1).13 (RankNullity). Theorem 3.14. . Suppose V is ﬁnite dimensional. and let M be a subspace such that V = ker(T ) ⊕ M (one exists by 2. (3) T M is an isomorphism M → im(T ). . (1) Suppose u. Remark 3.2.8).Proof. (2) Let B = {u1 . and let T ∈ L(V. (2) im(T M ) = im(T ). . 3.12. . (1) T M is injective. denoted rank(T ). . One particular application of this theorem is that injectivity.8 there is a subspace M of V such that V = ker(T ) ⊕ M . and u = v. surjectivity. Then dim(V ) = dim(im(T )) + dim(ker(T )). . which is how 3. We see that if U ∼ V via isomorphism T and V ∼ W via isomorphism S. . Hence it is bijective and thus an isomorphism. T M is surjective onto im(T ).
then T S spans W .2. The following are equivalent: (1) T is injective. let S = {v1 . Prove that dim(CS(A)) = dim(RS(A)). Hence T is bijective. (f) If T S is a basis for W . . Then dim(ker(T )) = 0. This number is called the rank of A. . Proof. . Let T ∈ L(V. W ). dim(ker(T )) = 0. so dim(V ) = dim(im(T )) = dim(W ).2. then T S is linearly independent.2. For A ∈ Mn (F). (e) If S is a basis for V .16. . and recall that T S = {T v1 . (c) If S spans V . so T is injective. the following are equivalent: (1) A is invertible. (b) If T S is linearly independent. and (3) T is bijective. then S is a basis for V . (2) For each of the false statements above (if there are any). (1) ⇒ (2): Suppose T is injective. W be ﬁnite dimensional vector spaces with dim(V ) = dim(W ). and let T ∈ L(V. (2) ⇒ (3): Suppose T is surjective. (2) Show that rank(A) = rank(LA ).2.15. . Exercise 3. . and T is surjective. T vn } ⊂ W . then S is linearly independent. (3) ⇒ (1): Obvious.Corollary 3. [Square Matrix Theorem] Prove the following (celebrated) theorem from matrix theory.17 (Rank of a Matrix). then S spans V .18. Exercises Exercise 3. ﬁnd a condition on T which makes the statement true. Let V. (1) Prove or disprove the following statements: (a) If S is linearly independent. Then by similar reasoning. (1) Let A ∈ Mn (F). vn } ⊂ V . Exercise 3. . (2) T is surjective. W ). then T S is a basis for W . (d) If T S spans W . 44 . . denoted rank(A).
T = T ◦ q. Then either ϕ = 0 or ϕ is surjective.3. vn } be a basis for the ﬁnite dimensional ∗ ∗ vector space V . (6) For all y ∈ Fn . then T is injective.19 (Quotient Spaces 2). If it is 0. there is an x ∈ Fn such that Ax = y. . Show that if W = ker(T ). vn } is a basis for V ∗ called the dual basis. if Ax = 0 for x ∈ Fn . then ϕ = 0. (5) For all y ∈ Fn . F) in the case of a ﬁnite dimensional vector space V over F. The dual of V . . Deﬁnition 3. (4) A is nonsingular.1. Exercise 3. then x = 0. then ϕ is surjective. F). denoted V ∗ . (3) There is a B ∈ Mn (F) such that BA = I. ﬁnd dim(V /W ) in terms of dim(V ) and dim(W ). (8) RS(A) = Fn . i. .(2) There is a B ∈ Mn (F) such that AB = I. We know that dim(im(ϕ)) ≤ 1.3. Let W ⊂ V be a subspace. Elements of V ∗ are called linear functionals. . .e. If it is 1. . . V ∗ = L(V. (9) rank(A) = n. there is a unique x ∈ Fn such that Ax = y. (1) Show that the map q : V → V /W given by v → v +W is a surjective linear transformation such that ker(q) = W .e.3. Proof. Using q.e.e.2. and let ϕ ∈ V ∗ . 3. there is a unique linear transformation T : V /W → U such that the diagram V DD T DD DD q DD " / <U zz z zz zz T e z V /W commutes. DeﬁnitionProposition 3.3 Dual Spaces We now study L(V. i. is the vector space of all linear transformations V → F.3. and (10) A is row equivalent to I. then T factors uniquely through q. i. 45 . . Let V be a vector space over F. Let V be a vector space over F. (2) Suppose V is ﬁnite dimensional.2. i. (3) Show that if T ∈ L(V. Let B = {v1 . Then B ∗ = {v1 . Proposition 3. U ) and W ⊂ ker(T ). (7) CS(A) = Fn .
we have that n n ∗ λi vi = 0 =⇒ ϕ = i=1 i=1 ∗ λi vi . Hence. i=1 the zero linear transformation. . . i=1 Applying the vj allows us to see that each λj is zero. i. . for all ϕ ∈ V ∗ . . we must show the map ev is bijective. . we show B ∗ is linearly independent. we see that if n u= i=1 λi vi . Suppose ϕ ∈ V ∗ . . Let λj = ϕ(vj ) for j = 1. . 46 . the map ev is injective. n. . we know that ϕ = 0. . n. . Then evv is the zero linear functional on V ∗ . Proposition 3. it is clear that ev is a linear transformation as evλu+v = λ evu + evv . .3.. . . Let V be a ﬁnite dimensional vector space over F. . Suppose n ∗ λi vi = 0. One of the most useful techniques to show linear independence is the “killoﬀ” method used in the previous proof: n ∗ λi vi vj = λj = 0. Then we have that n ϕ− i=1 ∗ λi vi vj = 0 for all j = 1. First. Since ϕ is completely determined by where it sends a basis of V .5. We show B ∗ spans V ∗ .Proof. n. . Since a linear transformation is completely determined by its values on a basis. First. . the linear transformation V ∗ → F given by the evaluation map: evv (ϕ) = ϕ(v). . B ∗ is linearly independent.e. Hence. Proof. Then we have that n ∗ λi vi i=1 vj = λj = 0 for all j = 1. vn } be a basis for V and let B ∗ = {v1 . ϕ(v) = 0. vn } be the dual basis. We will see this technique again when we discuss eigenvalues and eigenvectors. . ∗ ∗ Let B = {v1 . We must show that B ∗ is a basis for V ∗ . .4. There is a canonical isomorphism ev : V → V ∗∗ = (V ∗ )∗ given by v → evv . . .3. Suppose evv = 0. Suppose now that x ∈ V ∗∗ . n. . Setting ∗ λj = x(vj ) for j = 1. Thus. . . ϕ− Remark 3.
∗ then (x − evu )vj = 0 for all j = 1, . . . , n. Once more since a linear transformation is completely determined by where it sends a basis, we have x − evu = 0, so x = evu and ev is surjective.
Remark 3.3.6. The word “canonical” here means “completely determined,” or “the best one.” It is independent of any choices of basis or vector in the space. Proposition 3.3.7. Let V be a ﬁnite dimensional vector space over F. Then V is isomorphic to V ∗ , but not canonically. Proof. Let B = {v1 , . . . , vn } be a basis for V . Then B ∗ is a basis for V ∗ . Deﬁne a linear ∗ transformation T : V → V ∗ by vi → vi for all vi ∈ B, and extend this map by linearity. Then T is an isomorphism as there is the obvious inverse map. Remark 3.3.8. It is important to ask why there is no canonical isomorphism between V and V ∗ . The naive, but incorrect, thing to ask is whether v → v ∗ would give a canonical isomorphism V → V ∗ . Note that the deﬁnition of v ∗ requires a basis of V as in 3.3.3. We cannot deﬁne the coordinate projection v ∗ without a distinguished basis. For example, if V = F2 and v = (1, 0), we can complete {v} to a basis in many diﬀerent ways. Let B1 = {v, (0, 1)} and B2 = {v, (1, 1)}. Then we see that v ∗ (1, 1) = 1 relative to B1 , but v ∗ (1, 1) = 0 relative to B2 .
Exercises
Exercise 3.3.9 (Dual Spaces of Inﬁnite Dimensional Spaces). Suppose B = vn n ∈ N is ∗ a basis of V . Is it true that V ∗ = span vn n ∈ N ?
3.4
Coordinates
For this section, V, W will denote ﬁnite dimensional vector spaced over F. We now discuss the way in which all ﬁnite dimensional vector spaces over F look like Fn (nonuniquely). We then study L(V, W ), with the main result being that if dim(V ) = n and dim(W ) = m, then L(V, W ) ∼ Mm×n (F) (nonuniquely) which is left to the reader as an exercise. = Deﬁnition 3.4.1. Let V be a ﬁnite dimensional vector space over F, and let B = (v1 , . . . , vn ) be an ordered basis for V . The map [·]B : V → Rn given by ∗ λ1 v1 (v) n λ2 v ∗ (v) 2 v= λi vi −→ . = . . . . . i=1 ∗ λn vn (v) is called the coordinate map with respect to B. We say that [v]B are coordinates for v with respect to the basis B. Note that we denote an ordered basis as a sequence of vectors in parentheses rather than a set of vectors contained in curly brackets. 47
Proposition 3.4.2. The coordinate map is an isomorphism. Proof. It is clear that the coordinate map is a linear transformation as [λu+v]B = λ[u]B +[v]B for all u, v ∈ V and λ ∈ F. We must show [·]B is bijective. We show it is injective. Suppose that [v]B = (0, . . . , 0)T . Then since
n
v=
i=1
λi vi
for unique λ1 , . . . , λn ∈ F, we must have that λi = 0 for all i = 1, . . . , n, and v = 0. Hence the map is injective. We show the map is surjective. Suppose (λ1 , . . . , λn )T ∈ Fn . Then deﬁne the vector u by
n
u=
i=1 T
λi vi .
Then [u]B = (λ1 , . . . , λn ) , so the map is surjective. Deﬁnition 3.4.3. Let V and W be ﬁnite dimensional vector spaces, let B = (v1 , . . . , vn ) be an ordered basis for V , let C = (w1 , . . . , wm ) be an ordered basis for W , and let T ∈ L(V, W ). The matrix [T ]C ∈ Mm×n (F) is given by B
∗ ([T ]C )ij = wi (T vj ) = e∗ ([T vj ]C ) B i ∗ where wi is projection onto the ith coordinate in W and e∗ is the projection onto the ith i coordinate in Fm . We can also deﬁne [T ]C as the matrix whose (i, j)th entry is λi,j where the λi,j are the B unique elements of F such that m
T vj =
i=1
λi,j wi .
We can see this in terms of matrix augmentation as follows: [T ]C = [T v1 ]C · · · [T vn ]C . B The matrix [T ]C is called coordinate matrix for T with respect to B and C. If V = W B and B = C, then we denote [T ]B as [T ]B . B Remark 3.4.4. Note that the j th column of [T ]C is [T vj ]C . B Examples 3.4.5. (1) Let B = (v1 , . . . , v4 ) be an ordered basis for the ﬁnite dimensional vector space V . Let S ∈ L(V, V ) be given by Svi = vi+1 for i = 4 and Sv4 = v1 . Then 0 0 0 1 1 0 0 0 [S]B = 0 1 0 0 . 0 0 1 0 48
(2) Let S be as above, but consider the ordered basis B = (v4 , . . . , v1 ). Then 0 1 0 0 0 0 1 0 [S]B = 0 0 0 1 . 1 0 0 0 (3) Let S be as above, but consider the ordered basis B = (v3 , v2 , v4 , v1 ). Then 0 1 0 0 0 0 0 1 [S]B = 1 0 0 0 . 0 0 1 0 Proposition 3.4.6. Let V and W be ﬁnite dimensional vector spaces, let B = (v1 , . . . , vn ) be an ordered basis for V , let C = (w1 , . . . , wm ) be an ordered basis for W , and let T ∈ L(V, W ). Then the diagram V V
[·]B [T ]C B T
/W /W
[·]C
commutes, i.e. [T ]C [v]B = [T v]C for all v ∈ V . B Proof. To show both matrices are equal, we must show that their entries are equal. We know ([T ]C )i,j = e∗ [T vj ]C . Hence i B
n n n
([T ]C [v]B )i,1 B
=
j=1
([T ]C )i,j ([v]B )j,1 B
n
=
j=1
e∗ [T vj ]C ([v]B )j,1 i
n
=
e∗ i
j=1
([v]B )j,1 [T vj ]C
=
e∗ i
j=1
([v]B )j,1 T vj
C
=
e∗ i
T
j=1
([v]B )j,1 vj
C
= e∗ [T v]C = ([T v]C )i,1 i
as [·]C , e∗ , and T are linear transformations, and i
n
v=
j=1
([v]B )j1 vj .
Remarks 3.4.7. When we proved 2.4.3 and 2.5.6 (1), we used the coordinate map implicitly. The algorithm of Gaussian elimination proves the hard result of 2.4.2, which in turn, proves these technical theorems after we identify the vector space with Fn for some n. Proposition 3.4.8. Suppose S, T ∈ L(V, W ) and λ ∈ F, and suppose that B = {v1 , . . . , vn } is a basis for V and C = {w1 , . . . , wm } is a basis for W . Then [S + λT ]C = [S]C + λ[T ]C . B B B Thus [·]C : L(V, W ) → Mm×n (F) is a linear transformation. B 49
j wi B +λ i=1 ([T ]C )i. W ) ∼ Mm×n (F). Exercise 3.j . A B A Proof. Let T ∈ L(V. and suppose that A = {u1 . so [S + λT ]C = [S]C + λ[T ]C . .4.12.j (Svi ) A p = i=1 ([T ]B )i. Exercise 3. First.4.13. . Thus m m m (S + λT )vj = Svj + λT vj = i=1 ([S]C )i. W ). . Suppose V is ﬁnite dimensional and B. V ) and S ∈ L(V.6 as [S+λT ]C x = [(S+λT )[x]−1 ]C = [S[x]−1 +λT [x]−1 ]C = [S[x]−1 ]C +λ[T [x]−1 ]C = [S]C x+λ[T ]C x B B B B B B B B for all x ∈ Fn .2 and 3.j for all j ∈ [m].j A k=1 ([S]C )k. so [ST ]C = [S]C [T ]B . and C = {w1 .j = ([S]C )i. .j B A wk = k=1 [S]C [T ]B B A k. Suppose T ∈ L(U. W respectively. We have that n n n p ST uj = S i=1 p ([T ]B )i. (1) Show that L(V. C be bases for V. . .9. .i ([T ]B )i. B = {v1 .4. B B for all j ∈ [n]. . Then [ST ]C = [S]C [T ]B . Exercise 3. We give a direct proof as well.j + λ([T ]C )i. .11. um } is a basis for U .j wi B and T vj = i=1 ([T ]C )i. vn } is a basis for V . wp } is a basis for W . = (2) Show T is invertible if and only if [T ]C is invertible.j )wi . . A B A Exercises V. Show [T ]B ∼ [T ]C . .4. C are two bases of V .4. We have that m m Svj = i=1 ([S]C )i. and let B.j wi B = i=1 (([S]C )i. B Exercise 3.Proof.i wk B wk = k=1 i=1 ([S]C )k. B B B B B B Proposition 3.j wi B for all j ∈ [n].10. 50 .4. and ([S + λT ]C )i.j + λ([T ]C )i. W will denote vector spaces.j vi A n = i=1 ([T ]B )i. . Suppose dim(V ) = n < ∞ and dim(W ) = m < ∞.4. we note that this follows immediately from 3. W ).
The minimal n ∈ Z≥0 such that ai = 0 for all i > n (if it exists) is called the degree of p. . 4. . denoted deg(p). A polynomial p ∈ F[z] is called monic if LC(p) = 1. F is a ﬁeld. Note that p = 0 if and only if LC(p) = 0. The latter result relies on a result from complex analysis which is stated but not proved. we discuss the background material on polynomials needed for linear algebra. denoted 0. denoted LC(p) is adeg(p) . n. the sequence of all zeroes.Chapter 4 Polynomials In this section. Often.1. and the leading coeﬃcient of 0 is 0. . A polynomial p over F is a sequence p = (ai )i∈Z≥0 where ai ∈ F for all i ∈ Z≥0 such that there is an n ∈ N such that ai = 0 for all i > n. The zero polynomial 0 induces the zero function 0 : F → F by z → 0. and the second says that every polynomial in C[z] has a root. 51 . The ﬁrst is a result on factoring polynomials. to be −∞. Note that it is still possible in this case that deg(p) < n. For this section. The leading coeﬃcient of of p. The two main results of this section are the Euclidean Algorithm and the Fundamental Theorem of Algebra. then it is implied that p = (ai ) where ai = 0 for all i > n. and we deﬁne the degree of the zero polynomial. .1. we identify a polynomial with the function it induces. and ai is called the coeﬃcient of the z i term for i = 1. A nonzero polynomial p = (ai ) induces a function still denoted p : F → F given by deg(p) p(z) = i=0 ai z i . we call p a polynomial over F with indeterminate z. Sometimes when we identify the polynomial with the function.1 The Algebra of Polynomials Deﬁnition 4. If we say that p is the polynomial given by n p(z) = i=0 ai z i .
e.3. (3) Another consequence of (1) is that deg(pq) = deg(p) + deg(q) for all p. q ∈ F[z]. (4) deg(p + q) ≤ max{deg(p). it is of the form λz − µ for λ. q with the functions they induce: deg(p) deg(q) p(z) = i=0 ai z and q(z) = i=0 i bi z i . the polynomial p + q ∈ F[z] is given by deg(p) deg(q) max{deg(p). we have deg(p) deg(p) (λp)(z) = λp(z) = λ i=0 ai z = i=0 i (λai )z i . q ∈ F[z]. (1) It is clear that LC(pq) = LC(p) LC(q). q = 0. The set of all polynomials of degree less than or equal to n ∈ Z≥0 is denoted Pn (F).1. q ∈ F[z] with p = (ai ) and q = (bi ).1. If λ ∈ F and p = (ai ) ∈ F[z]. then we deﬁne the polynomial λp = (ci ) by ci = λai for all i ∈ Z≥0 . (2) A consequence of (1) is that pq = 0 implies p = 0 or q = 0 for p. Alternatively. and if p. Alternatively. q are monic. deg(p) deg(q) deg(p)+deg(q) (pq)(z) = p(z)q(z) = i=0 ai z i i=0 bi z = k=0 i+j=k i ai bj zk . Remarks 4. we have deg(p + q) ≤ deg(p) + deg(q). Examples 4. (2) A linear polynomial is a polynomial of degree 1. (1) p(z) = z 2 + 1 is a polynomial in F[z]. if we are using the function notation. We deﬁne the polynomial pq = (ci ) ∈ F[z] by ck = i+j=k ai bj for all k ∈ Z≥0 .The set of all polynomials over F is denoted F[z]. deg(q)} for all p. 52 . µ ∈ F with λ = 0. so pq is monic if and only if p. If p.2.deg(q)} (p + q)(z) = p(z) + q(z) = i=0 ai z + i=0 i bi z = i=1 i (ai + bi )z i . Note that if we identify p. i. q ∈ F[z]. then we deﬁne the polynomial p + q = (ci ) ∈ F[z] by ci = ai + bi for all i ∈ Z≥0 .
Deﬁnition 4.1.4. An Falgebra, or an algebra over F, is a vector space (A, +, ·) and a binary operation ∗ on A called multiplication such that the following axioms hold: (A1) multiplicative associativity: (a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c ∈ A, (A2) distributivity: a ∗ (b + c) = a ∗ b + b ∗ c and (a + b) ∗ c = a ∗ c + b ∗ c for all a, b, c ∈ A, and (A3) compatibility with scalar multiplication: λ · (a ∗ b) = (λ · a) ∗ b for all λ ∈ F and a, b ∈ A. The algebra is called unital if there is a multiplicative identity 1 ∈ A such that a∗1 = 1∗a = a for all a ∈ A, and the algebra is called commutative if a ∗ b = b ∗ a for all a, b ∈ A. Examples 4.1.5. (1) The zero vector space (0) is a unital, commutative algebra over F with multiplication given as 0 ∗ 0 = 0. The unit in the algebra in this case is 0. (2) Mn (F) is an algebra over F. (3) L(V ) is an algebra over F if V is a vector space over F where multiplication is given by S ∗ T = S ◦ T = ST . (4) C(a, b) is an algebra over R. This is also true for closed and half open intervals. (5) C n (a, b) is an algebra over R for all n ∈ N ∪ {∞}. Proposition 4.1.6. F[z] is an Falgebra with addition, multiplication, and scalar multiplication deﬁned as in 4.1.1. Proof. Exercise.
Exercises
Exercise 4.1.7 (Ideals). An ideal I in an Falgebra A is a subspace I ⊂ A such that a∗x ∈ I and x ∗ a ∈ I for all x ∈ I and a ∈ A. (1) Show that I = p ∈ F[z] p(0) = 0 is an ideal in F[z]. (2) Show that Ix = f ∈ C[a, b] f (x) = 0 is an ideal in C[a, b] for all x ∈ [a, b]. (3) Show that I = f ∈ C(R, R) there are a, b ∈ R such that f (x) = 0 for all x < a, x > b is an ideal in C(R, R) (note that the a, b ∈ R depend on the f ). Exercise 4.1.8 (Ideals of Matrix Algebras). Find all ideals of Mn (F). Exercise 4.1.9. Let Pn (F) be the subset of F[z] consisting of all polynomials of degree less than or equal to n, i.e. Pn (F) = p ∈ F[z] deg(p) ≤ n . Show that Pn (F) is a subspace of F[z] and dim(Pn (F) = n + 1. 53
Exercise 4.1.10. Let B = (1, x, x2 , x3 ), respectively C = (1, x, x2 ), be an ordered basis of P3 (F), respectively P2 (F); let B = (x3 , x2 , x, 1), respectively C = (x2 , x, 1) be another ordered basis of P3 (F), respectively P2 (F); and let D ∈ L(P3 (F), P2 (F)) be the derivative map az 3 + bz 2 + cz + d −→ 3az 2 + 2bz + c for all a, b, c ∈ F. Find [D]C . and [D]C . B B
4.2
The Euclidean Algorithm
Theorem 4.2.1 (Euclidean Algorithm). Let p, q ∈ F[z] \ {0} with deg(q) ≤ deg(p). Then there are unique k, r ∈ F[z] such that p = qk + r and deg(r) < deg(q). Proof. Suppose
m n
p(z) =
i=0
ai z and q(z) =
i=0
i
bi z i
where m ≥ n, am = 0, and bn = 0. Then set k1 (z) = am m−n z and p2 (z) = p(z) − k1 (z)q(z), bn
and note that deg(p2 ) < deg(p). If deg(p2 ) < deg(q), we are ﬁnished by setting k = k1 and r = p2 . Otherwise, set k2 (z) = LC(p2 ) deg(p2 )−n z and p3 (z) = p2 (z) − k2 (z)q(z), bn
ad note that deg(p3 ) < deg(p2 ). If deg(p3 ) < deg(q), we are ﬁnished by setting k = k1 + k2 and r = p3 . Otherwise, set k3 (z) = LC(p3 ) deg(p3 )−n z and p4 (z) = p3 (z) − k3 (z)q(z), bn
ad note that deg(p4 ) < deg(p3 ). If deg(p4 ) < deg(q), we are ﬁnished by setting k = k1 +k2 +k3 and r = p4 . Otherwise, we may repeat this process as many times as necessary, and note that this algorithm will terminate as deg(pj+1 ) < deg(pj ) (eventually deg(pj ) < deg(q) for some q). It remains to show k, r are unique. Suppose p = k1 q+r1 = k2 q+r2 with deg(r1 ), deg(r2 ) < deg(q). Then 0 = (k1 − k2 )q + (r1 − r2 ). Now since deg(r1 − r2 ) < deg(q), we have that 0 = LC((k1 − k2 )q + (r1 − r2 )) = LC((k1 − k2 )q) = LC(k1 − k2 ) LC(q), so k1 = k2 as q = 0. It immediately follows that r1 = r2 . 54
Exercises
Exercise 4.2.2 (Principal Ideals). Suppose A is a unital, commutative algebra over F. An ideal I ⊂ A is called principal if there is an x ∈ I such that I = ax a ∈ A . Show that every ideal of F[z] is principal (see 4.1.7).
4.3
Prime Factorization
Deﬁnition 4.3.1. For p, q ∈ F[z], we say p divides q, denoted pq, if there is a polynomial k ∈ F[z] such that kp = q. Examples 4.3.2. (1) (z ± 1)  (z 2 − 1). (2) (z ± i)  (z 2 + 1) in C[z], but there are no nonconstant polynomials in R[z] that divide z 2 + 1. Remark 4.3.3. Note that the above deﬁnes a relation on F[z] that is reﬂexive and transitive (see 1.1.6). It is also antisymmetric on the set of monic polynomials. Deﬁnition 4.3.4. A polynomial p ∈ F[z] with deg(p) ≥ 1 is called irreducible (or prime) if q ∈ F[z] with q  p and deg(q) < deg(p) implies q is constant. Examples 4.3.5. (1) Every linear (degree 1) polynomial is irreducible. (2) z 2 + 1 is irreducible over R (in R[z]), but not over C (in C[z]). Deﬁnition 4.3.6. Polynomials p1 , . . . , pn ∈ F[z] where n ≥ 2 and deg(pi ) ≥ 1 for all i ∈ [n] are called relatively prime if q  pi for all i ∈ [n] implies q is constant. Examples 4.3.7. (1) Any n ≥ 2 distinct monic linear polynomials in F[z] is relatively prime. (2) Any n ≥ 2 distinct quadratics z 2 + az + b ∈ R[z] such that a2 − 4b < 0 are relatively prime. (3) Any quadratic z 2 + az + b ∈ R[z] with a2 − 4b < 0 and any linear polynomial in R[z] are relatively prime. Proposition 4.3.8. p1 , . . . , pn ∈ F[z] where n ≥ 2 are relatively prime if and only if there are q1 , . . . , qn ∈ F[z] such that
n
qi pi = 1.
i=1
55
Proof. 56 .Proof. We have p(q − r) = 0. If p  q1 . Suppose p. for j ∈ [n].3. By 4. q1 . By the Euclidean algorithm. p.3. Choose polynomials q1 . so f must be constant. in which case we are ﬁnished. qn ∈ F[z] such that n f= i=1 qi p i has minimal degree which is not −∞ (in particular. we are ﬁnished. we have pq2 . g2 ∈ F[z] such that g1 p + g2 q1 = 1. and the pi ’s are relatively prime. Proof. q1 are relatively prime (if k is nonconstant and k  p. we must have that LC(q − r) = 0. . in which case we apply the induction hypothesis to get pqi for some i ∈ [n − 1]. and q = r. Now we know deg(f ) ≤ deg(pj ) as we could have qj = 1 and qi = 0 for all i = j. rj with deg(rj ) < deg(f ) such that pj = kj f + rj . We then get q2 = g1 q2 p + g2 q1 q2 . . Thus f pi for all i ∈ [n]. qn ∈ F[z] such that n qi pi = 1. Since n rj = pj − kf = pj − k i=1 qi pi = (1 − kqj )pj + i=j kqi pi . Otherwise. r ∈ F[z] with p = 0 such that pq = pr. qn ∈ F[z] where n ≥ 2 and p is irreducible. p  qi for some i ∈ [n]. . . so q must be constant. We proceed by induction on n ≥ 2.8. . or pq1 · · · qn−1 . either pqn .9.3. .10. . q2 ∈ F[z] and p  q1 q2 . Then q = r. Corollary 4. n = 2: Suppose q1 . we must have that rj = 0 for all j ∈ [n] since deg(f ) was chosen to be minimal. As f = 0. we get the desired result. q. and replacing qi with qi /f for all i ∈ [n]. Lemma 4. Then q1. but k q1 ). Then if p  (q1 · · · qn ). then k = p. As pg1 q2 p and pg2 q1 q2 . so by the case n = 2. f = 0). Suppose p. n − 1 ⇒ n: We have p(q1 · · · qn−1 )qn . there are unique kj . . . we may divide by f . there are g1 . Suppose there are q1 . As LC(p(q − r)) = LC(p) LC(q − r). i=1 and suppose qpi for all i ∈ [n]. Suppose now that the pi ’s are relatively prime. . . .
is called the greatest common divisor of p1 and p2 . By 4. r ∈ F[z] with deg(q). As the result holds for q. Suppose p.3. (2) Show that there is a unique polynomial q ∈ F[z] of maximal degree with LT(q) = 1 such that q  p1 and q  p2 . Existence: We proceed by induction on deg(p). . µn ∈ F. Show that there is a unique polynomial p ∈ F[z] of degree n − 1 such that p(λi ) = µi for all i ∈ [n]. Proof. so pn qi for some i ∈ [m]. deg(p) > 1: We assume the result holds for all polynomials of degree less than deg(p). there is are nonconstant polynomials q. .3. . 57 . Let λ1 . p2 ∈ F[z] with degree ≥ 1. Show that pn and q m are relatively prime for n. . Deduce that a polynomial of degree d is uniquely determined by where it sends d + 1 (distinct) points of F. Uniqueness: Suppose n m p=λ i=1 pi = µ j=1 qi where all pi .11 (Unique Factorization).13. we have that p1 · · · pn−1 = q1 · · · qm−1 . and we may repeat this process. If p is not irreducible. . qj ’s are monic. Then there are unique monic irreducible polynomials p1 . we have that the result holds for p. we may assume i = m. Exercise 4. pn and a unique constant λ ∈ F \ {0} such that n p = λp1 · · · pn = λ i=1 pi . Let p1 . (1) Show that there is a unique monic polynomial q ∈ F[z] of minimal degree such that p1  q and p2  q. deg(p) = 1: This case is trivial as p/ LC(p) is a monic irreducible polynomial and p = LC(p)(p/ LC(p)). Now pn divides q1 · · · qm . . is called the least common multiple of p1 and p2 . . Exercise 4. After relabeling. p2 ). Then λ = µ = LC(p). But then pn = qm .3. irreducible polynomials. then so is p/ LC(p). usually denoted lcm(p1 . usually denoted gcd(p1 .Theorem 4.14 (Greatest Common Divisors and Least Common Multiples). deg(r) < deg(p) such that p = qr. λn ∈ F be distinct and let µ1 .3. p2 ).10. q ∈ F[z] are relatively prime. . . and we have p = LC(p)(p/ LC(p)).3. This polynomial.12. This polynomial. m ∈ N. Exercises Exercise 4. so p1 · · · pn−1 pn = q1 · · · qm−1 pn . Eventually. r. If p is irreducible. . we see that n = m and that the qj ’s are at most a rearrangement of the pi ’s. . . Let p ∈ F[z] with deg(p) ≥ 1.
58 .λ ) ≤ deg(p).4. We immediately see r(λ) = 0. Uniqueness: Suppose p.4. Pick a monic polynomial q ∈ K[z] of minimal degree such that q(λ) = 0. Corollary 4.4 Irreducible Polynomials For this section. we may assume p is monic (if not.λ (λ) = 0. Without loss of generality. µn with µn = 0 such that n n µi i i λ = 0.2. there are k. we replace p with p/ LC(p). . q are both monic.4. As p. Then λ ∈ K if and only if deg(IrrK. Proof. Hence p = kq with deg(p) = deg(q). so there is a monic polynomial in K[z] with λ as a root. Suppose λ ∈ F and dimK (F) = n. Then IrrK. Lemma 4. The irreducible polynomial of λ over the ﬁeld K is the unique monic irreducible polynomial IrrK. so r(λ) = 0.3. Then deg(p) = deg(q). so there are scalars µ0 . q ∈ K[z] are both monic polynomials of minimal degree such that p(λ) = q(λ) = 0. and p = q.4. µi λ = 0 =⇒ µn i=0 i=0 Deﬁne p ∈ K[z] by n p(z) = i=0 µi i z. . Suppose λ ∈ F is a root of p ∈ K[x] where deg(p) ≥ 1. r ∈ K[z] with deg(r) < deg(IrrK. so q(λ) = k(λ)r(λ) = 0. F will denote a ﬁeld and K ⊂ F will be a subﬁeld such that F is a ﬁnite dimensional Kvector space. so k is a constant. DeﬁnitionProposition 4.2. r ∈ K[z] with deg(r) < deg(q) such that p = kq + r. by the Euclidean algorithm. Then q is irreducible (otherwise. Suppose λ ∈ F.λ +r. Existence: Since λ ∈ F. r ∈ K[z] of degree ≥ 1 such that q = kr. The proof is similar to 4. so r = 0.1. The set λk k ∈ Z≥0 cannot be linearly independent over K as F is a ﬁnite dimensional Kvector space.λ ) ≤ n. λ ∈ F is called a root of p ∈ K[z] where deg(p) ≥ 1 if p(λ) = 0. Proof. Now p(λ) = k(λ)q(λ) + r(λ).4. λk ∈ F for all k ∈ Z≥0 . Then deg(IrrK. Now since deg(IrrK.4. Deﬁnition 4.λ ) = 1. . Let λ ∈ F. so by the Euclidean algorithm.4. k = 1.λ ) such that p = k IrrK. Corollary 4. there are k. µn Then p(λ) = 0. This is only possible if r = 0 as deg(r) < deg(q). so either k(λ) = 0 or r(λ) = 0.5.λ p. a contradiction as q was chosen of minimal degree). there are k.4.λ ∈ K[z] of minimal degree such that IrrK. .
In fact.7. Deﬁnition 4. (2) The polynomial z 2 + 1 splits in C[z] but not in R[z].2. . .1. If λ ∈ C is a root of p ∈ R[z].8. / Deﬁnition 4.4. since p(λ) = 0. The polynomial p ∈ F[z] splits (into linear factors) if there are λ. Examples 4. Hence. λ1 . the irreducible polynomial in R[z] for λ ∈ C is given by IrrR. we have that p(λ) = p(λ) = 0 since the coeﬃcients of p are all real numbers.λ (z) = z−λ (z − λ)(z − λ) = z 2 − 2 Re(λ)z + λ2 if λ ∈ R if λ ∈ R. .9.4. The ﬁeld F is called algebraically closed if every p ∈ F[z] splits.Example 4. .6. A function f : C → C is entire if there is a power series representation f (z) = n=0 an z n where an ∈ C for all n ∈ Z≥0 that converges for all z ∈ C. Exercises 4. Examples 4. (1) Every polynomial in C[z] is entire as its power series representation is itself. then λ is also a root of p.4. We know that C is a two dimensional real vector space.5 The Fundamental Theorem of Algebra ∞ Deﬁnition 4.4.5. (2) f (z) = ez is entire as ∞ ez = j=0 zn n! z 2n (2n)! (3) f (z) = cos(z) is entire as ∞ cos(z) = j=0 (−1)n (4) f (z) = sin(z) is entire as ∞ sin(z) = j=0 (−1)n z 2n+1 (2n + 1)! 59 .5. (1) Every constant polynomial trivially splits in F[z]. λn ∈ F such that n p(z) = λ i=1 (z − λi ).
Irrλ p. Hence there are µ1 . and T ∈ L(V ). . so there is a p1 ∈ C[z] such that p(z) = (z − µ1 )p1 (z). we see that pn must be constant as a polynomial has at most n roots. Every bounded.5. we will need a theorem from complex analysis. so p is constant. Let p ∈ C[z] be a nonconstant polynomial.4. . We know p has a root λ ∈ C by 4. Given a polynomial p(z) = j=0 λj z j ∈ F[z]. 60 . Repeating this process.4 (Fundamental Theorem of Algebra). Every nonconstant polynomial in C[x] has a root.6 The Polynomial Functional Calculus n Deﬁnition 4.6. Proof. If p1 is nonconstant. Theorem 4. then 1/p is entire and bounded. µn ∈ C such that n p(z) = k j=1 (z − µj ) = k(z − µ1 ) · · · (z − µn ) for some k ∈ C. Proof. Proposition 4.5.5. We state it without proof.5. If not. By Liouville’s Theorem.1 (Polynomial Functional Calculus). Remark 4. .5.5. If p2 is constant we are ﬁnished. Exercises 4. 1/p is constant. then apply 4. Let p be a polynomial in C[z]. say µ1 . . p has a root. and let n = deg(p).3 (Liouville).4. we may repeat the process for p2 to get p3 . so by 4. Theorem 4.In order to prove the Fundamental Theorem of Algebra.5. Hence every polynomial in C[z] splits.6. and so forth. we can deﬁne an operator by n p(T ) = j=0 λj T j with the convention that T 0 = I. Then by 4. there is a p2 ∈ R[z] with deg(p2 ) < deg(p) such that p = Irrλ p2 .4. i. This process will terminate as deg(pn ) < deg(pn+1 ). entire function f : C → C is constant.6.5. If p has no roots. Every nonconstant polynomial p ∈ R[z] splits into linear and quadratic factors.e. so there is a p2 ∈ C[z] with deg(p2 ) = n − 2 such that p(z) = (z − µ1 )(z − µ2 )p2 (z).4 again to see p1 has a root µ2 .
Exercises Exercise 4. and (4) (p ◦ q)(T ) = p(q(T )) for all p. p ∈ F[z] and T ∈ L(V ).2.6. Prove 4. q ∈ F[z] and T ∈ L(V ). (3) (λp)(T ) = λp(T ) for all λ ∈ F. The polynomial functional calculus satisﬁes: (1) (p + q)(T ) = p(T ) + q(T ) for all p.6.3. Exercise.2. 61 .6. q ∈ F[z] and T ∈ L(V ). q ∈ F[z] and T ∈ L(V ).Proposition 4. Proof. (2) (pq)(T ) = p(T )q(T ) = q(T )p(T ) for all p.
62 .
(2) If v ∈ V \ {0} such that T v = λv for some λ ∈ F. is λ ∈ F T − λI is not invertible .1. then v is called an eigenvector of T with corresponding eigenvalue λ. We begin our study of operators in L(V ) by studying the spectrum of an element T ∈ L(V ) which gives a lot of information about the operator. We then discuss methods for calculating the spectrum sp(T ) when V is ﬁnite dimensional. Eigenvectors. Let T ∈ L(V ).e. T − λI is not invertible if and only if T − λI is not injective. there is an eigenvector v of T with corresponding eigenvalue λ. It is clear that T − λI is not injective if and only if there is an eigenvector v of T with corresponding eigenvalue λ.Chapter 5 Eigenvalues. (3) If λ is an eigenvalue of T .2 depends on the ﬁnite dimensionality of V as 3. Remark 5. V will denote a vector space over F. 63 .1 Eigenvalues and Eigenvectors Deﬁnition 5.1. Note that 5. By 3.15 depends on the ﬁnite dimensionality of V .2. Proof. namely the characteristic and minimal polynomials. denoted sp(T ). (1) The spectrum of T .1. Then λ ∈ F is an eigenvalue of T ∈ L(V ) if and only if λ ∈ sp(V ).2. Examples 5. Suppose V is ﬁnite dimensional. we will identify LA ∈ L(Fn ) with the matrix A. Proposition 5.15.1.1. and the Spectrum For this chapter.3. 5. For A ∈ Mn (F).4.1.2. It is clear that Eλ is a subspace of V . i. then Eλ = w ∈ V T w = λw is the eigenspace associated to the eigenvalue λ.
a0 . a1 . 1}. a1 . ±i 2 (4) Consider Lz ∈ L(F[z]) given by Lz (p(z)) = zp(z). . and B has no eigenvectors. . a1 . ). b 0 −1 1 0 ∈ M2 (R) then we must have that λa = −b and λb = a. . ). . 64 . . Now the ﬁrst is nonzero as λ ∈ R. Thus a = 0. Thus λ(a − b) = 0. a4 . a2 . a2 . a1 . . then 1 2 1 1 1 1 a b = 1 2 a+b a+b =λ a . b so a + b = 2λa = 2λb. a3 . In fact. we have λ = 1 as a = b = 0. (2) The eigenvalues of the operator A= 1 2 1 1 1 1 ∈ Mn (F) are 0. a2 . . −1 If we had an eigenvector associated to the eigenvalue λ. One can also deﬁne the left shift operator L ∈ L(F[z]) by L(a0 . . . ) = Lz (a0 . a3 . . Hence sp(LA ) = {0. or a = b. Then sp(LA ) is the set of distinct values on the diagonal of A. a2 .(1) Suppose A ∈ Mn (F) is an upper triangular matrix. so b = 0. (5) Recall that a polynomial is really a sequence of elements of F which is eventually zero. This can be seen as A − λI is not invertible if and only if λ appears on the diagonal of A. if 0 −1 1 0 a b = −b a =λ a . ) = (0. Then the set of eigenvalues of Lz is empty as zp(z) = λp(z) if and only if (z − λ)p(z) = 0 if and only if p = 0 for all λ ∈ F. (3) The operator B= has no eigenvalues. so either λ = 0. If instead we consider B ∈ M2 (C). a2 . . . and (λ2 + 1)b = 0. Now the operator Lz discussed above is really the right shift operator R on F[z]: R(a0 . . then B has eigenvalues ±i corresponding to eigenvectors 1 1 √ . a3 . ) = (a1 . In the latter case. a3 . 1 corresponding to respective eigenvectors 1 √ 2 1 1 1 and √ 2 1 . . so −b = λa = λ2 b. .
1. Lemma 5. Then p(T )v = p(λ)v. Suppose T ∈ L(V ).1. . j=i (λi − λj ) j=i .1. R are not invertible. Thus n Eλi i=1 is a welldeﬁned subspace of V .6. so 0 ∈ sp(R) and 0 ∈ sp(L). and Lp = 0 only for the constant polynomials. but R is not surjective as the constant polynomials are not hit. and L is not injective as the constant polynomials are killed. by T W (w) = T w for all w ∈ W . Suppose T ∈ L(V ). For i ∈ [n] deﬁne fi ∈ F[z] by (z − λj ) fi (z) = Then fi (λj ) = 1 0 65 if i = j else. but 0 ∈ sp(R). and suppose λ1 . (3) An eigenspace of T is a T invariant subspace. Notation 5.1. so p(λ) ∈ sp(p(T )). Given T ∈ L(V ) and a T invariant subspace W ⊂ V . Proof. Proposition 5. Exercise. and v is an eigenvector for T corresponding to λ ∈ sp(T ).5. denoted T W . Examples 5. Let T ∈ L(V ).Note that LR = I. . the only eigenvalue of L is 0 as Lp = λp implies deg(λp) = deg(Lp) = deg(p) − 1 −∞ if deg(p) ≥ 2 if deg(p) < 2 which is only possible if λp = 0. T ∈ L(V ). λn are eigenvalues of T . Proof. (1) (0) and V are the trivial T invariant subspaces. Thus R is an example of an operator with no eigenvalues. (2) ker(T ) and im(T ) are T invariant subspaces. . Suppose p ∈ F[z]. Note that T W is just the restriction of the function T : V → V to W with the codomain restricted as well. Deﬁnition 5. Then vi = 0 for all i ∈ [n]. . A subspace W ⊂ V is called T invariant if T W ⊂ W . Thus L.9.8 (Polynomial Eigenvalue Mapping). In fact. .7. the identity operator in L(F[z]).1. Suppose n vi ∈ Eλi for all i ∈ [n] such that i=1 vi = 0. we deﬁne the restriction of T to W in L(W ).
To show 1 (N. n − 1 and T vn = v1 . Thus the spectrum is an invariant of a similarity class. Let n 1 (N. if there is an invertible J ∈ L(V ) such that J −1 SJ = T . then we can add up the terms an  in any order that we want and we will still get the same number. .1. R) is closed under addition. (2) distinct similarity classes are disjoint. R) = (an ) an  ∈ [−M. a3 . R). a4 . .13 (Inﬁnite Shift Operators). so T can have at most dim(V ) many distinct eigenvalues. denoted S ∼ T . Exercise 5. Show (1) ∼ deﬁnes a relation on L(V ) which is reﬂexive.8. S∞ ∈ L( ∞ (N. then sp(S) = sp(T ). Let B = {v1 . .By 5. The similarity class of S is T ∈ L(V ) T ∼ S . (2) Deﬁne S1 ∈ L( 1 (N. use the fact that if (an ) is absolutely convergent. for i ∈ [n]. and transitive (see 1. ∞ (N. vn } be a basis of V . Remark 5. We call two operators in S. 1 (N. R)) and S0 ∈ L(R∞ ) by Si (a1 . . a3 . .11. ∞ (N. ∞ (N. ∞.1. R) is the set of absolutely convergent sequences of real numbers.1. R) and ∞ (N.12 (Finite Shift Operators). R)).10.6). R) is the set of bounded sequences of real numbers. M ] for all n ∈ N for some M ∈ N . . . In other words. Eigenvectors corresponding to distinct eigenvalues are linearly independent. 1.1. . and R∞ is the set of all sequences of real numbers. and R∞ = (an ) an ∈ R for all n ∈ N . ) for i = 0. R) = (an ) a∈ R for all n ∈ N and n=1 an  < ∞ . 66 .1. R). (1) Show that 1 (N. and then show 1 (N. . a2 . . Exercises Exercise 5. Hint: First show that R∞ is a vector space over R. . ) = (a2 . . . and R∞ are vector spaces over R.1. n n n 0 = fi (T )0 = fi (T ) j=1 vj = j=1 fi (T )vj = j=1 fi (λj )vj = vi . R) are subspaces. Find the spectrum of the shift operator T ∈ L(V ) given by T vi = vi+1 for i = 1. symmetric. and (3) if S ∼ T . . T ∈ L(V ) similar. Exercise 5.
∞.(a) Find the set of eigenvalues. Examples 5. Suppose λ = a + ib is a psuedoeigenvalue of T ∈ L(V ). This method is only recommended for small matrices.4. Even then. v}.1.2.14. Deﬁnition 5.2 Determinants In this section. and (3) minT W = IrrR. The set of all permutations on [n] is denoted Sn . 1. (2) B is a basis for W . denoted ESi for i = 0.2. k ∈ [n] and σ(j) = k and σ(k) = j.1. j) ∈ [n] × [n] such that i < j and σ(i) > σ(j). λ ∈ C \ R is called a psuedoeigenvalue of T ∈ L(V ) if λ is an eigenvalue of TC ∈ L(VC ). 67 . and we call σ even if there are an even number of inversions in σ. Deﬁne the sign function sgn : Sn → {±1} by sgn(σ) = 1 if σ is even and −1 if σ is odd. Show (1) W = span(B) ⊂ V is invariant for T . we discuss how to take the determinant of a square matrix. Permutations are denoted 1 ··· σ(1) · · · n . σ(n) A permutation σ ∈ Sn is called a transposition if σ(i) = i for all but two distinct elements j. Note that the composite of two permutations in Sn is a permutation in Sn . and suppose u + iv ∈ VC \ {0} such that TC (u + iv) = λ(u + iv). (b) Why is part (a) asking you to ﬁnd the set of eigenvalues of Si and not the spectrum of Si for i = 0. 1.3.λ .2.2. Let F be a real vector space. An inversion in σ ∈ Sn is a pair (i. (1) (2) Deﬁnition 5. The determinant is a crude tool for calculating the spectrum of LA for A ∈ Mn (F) via the characteristic polynomial which is deﬁned in the next section. one can only get numerical approximations to the eigenvalues. Set B = {u. ∞? Exercise 5. and it is generally useless for large matrices without the use of a computer. Examples 5.2. (4) Find [T W ]B . A permutation on [n] is a bijection σ : [n] → [n]. 5. We call σ ∈ Sn odd if there are an odd number of inversions in σ.
j det(Ai. we see n n n−1 (−1)i+j Ai.j )k. There are two permutations in S2 : the identity permutation and the transposition switching 1 and 2.σ(i) .j ∈ Mn−1 (F) be the matrix obtained from A by deleting the ith row and j th column. Proof.1 = ad−bc ∈ F.j ). The determinant of A ∈ Mn (F) is n det(A) = σ∈Sn sgn(σ) i=1 Ai. we can calculate det(A) recursively. We calculate the determinant of the 2 × 2 matrix A= a b c d ∈ M2 (F). we get n det(A) = i=1 (−1)i+j Ai. n = 1: Obvious.7.j σ∈Sn−1 i+j sgn(σ) k=1 (Ai. Proposition 5. (2) Deﬁnition 5.2.j det(Ai.2.σ(k) where [n − 1] is identiﬁed with 68 . so sgn(id) = 1 and sgn(σ) = −1.σ(k) n−1 = j=1 σ∈Sn−1 (−1) Ai. we have n det(A) = j=1 (−1)i+j Ai.(1) The identity permutation has sign 1 since it has no inversions.5.j ).2. This procedure is commonly referred to as taking the determinant of A by expanding along the ith row of A. n − 1 ⇒ n: Expanding along the ith row and using the induction hypothesis. It is clear that id has no inversions and σ has one inversion. Call these id.j det(Ai. We may also take the determinant by expanding along the j th column of A. Hence n n det(A) = sgn(id) i=1 Ai.6. and a transposition has sign −1 as it has only one inversion.j )k.id(i) + sgn(σ) i=1 Ai.j ) = j=1 j=1 n (−1)i+j Ai.σ(i) = A1.j sgn(σ) k=1 (Ai.2 A2. Then ﬁxing i ∈ [n]. For A ∈ Mn (F).2 +(−1)A1. FINISH: We proceed by induction on n. Let Ai. Fixing j ∈ [n]. σ respectively.1 A2. Example 5.
(2) If A is block upper triangular.1 = A2 and A2. 0 A2 so applying the induction hypothesis.1 ∗ Ai. Case 1: Suppose m = 2.1 ) det(A2 ) = det(A1 ) det(A2 ). i. . k = 1: A1 ∈ M1 (F). We may repeat this trick to peel oﬀ the Ai ’s to get m det(A) = i=1 Ai .8.1 = 0. (2) We proceed by cases.1 ) det(A2 ) for all i ≤ k.2.1 = for all i ≤ k.1 = 0 for all i > k. Let A ∈ Mn (F).. there are square matrices A1 . .e. and set B1 = A2 . we have k k det(A) = i=1 (−1) 1+i Ai. 0 . .1 det((A1 )i. 0 B1 Applying case 1 gives det(A) = det(A1 ) det(B1 ). Then taking the determinant along the ﬁrst column.9.7 as taking the determinant of A along the ith row of A is the same as taking the determinant of AT along the ith column of AT . A= .2. Am such that A1 ∗ ... k − 1 ⇒ k: Suppose A1 ∈ Mk (F) with k > 1. If A ∈ Mn (F) has a row or column of zeroes. (1) det(A) = det(AT ). . Lemma 5. ∗ Am =⇒ A = A1 ∗ . Case 2: Suppose m > 2.1 (−1)i+1 Ai. . we have det(Ai. 0 Am then det(A) = i=1 m det(Ai ). Proof. we have (A1 )i.Corollary 5. 69 . (1) This is obvious by 5.1 ) = det((A1 )i. We proceed by induction on k where A1 ∈ Mk (F).1 det(A ) = i=1 i. Then as Ai.2. then det(A) = 0. the result is trivial by taking the determinant along the ﬁrst column as A1.
2.7 by expanding along the row multiplied by λ. Suppose A ∈ Mn (F). Proof. the result is trivial.j ) = det(E) det(A) as λ = det(E) and (EA)i. We then switch i + 1 with i + 2 as the old ith row is now in the (i + 1)th place. we switch j with j − 1. and (b) switching two rows of A switches the sign of det(A).2.2. Case 1: If E = I. or diagonal. If we want to switch rows i and j with i < j. Proposition 5. and let E ∈ Mn (F) be an elementary matrix. Case 3: We must show (a) the determinant of the matrix E obtained from the identity by multiplying a row by a nonzero constant λ is λ. then det(A) is the product of the diagonal entries. To get the result if the rows are not adjacent.10. then n n det(EA) = j=1 (−1)i+j (EA)i.j as only the ith row diﬀers between the two matrices. and (b) adding a constant multiple of the ith row of A to the k th row of A with k = i does not change the determinant of A.Corollary 5.2. and (b) immediately holds by 5.2. Then det(EA) = det(E) det(A). all the way to i + 1 with i for a total of j − i switches. lower triangular. Hence. we note that the interchanging of two rows can be accomplished by an odd number of adjacent switches. we switch a total of 2j − 2i − 1 adjacent rows.j det((EA)i. we switch i + 2 with i + 3.j det(Ai. then det(A) is the product of the determinants of the blocks on the diagonal.j = Ai. If A is block lower triangular or block diagonal. 70 . Case 2: We must show (a) the determinant of the matrix E obtained by switching two rows of I is −1. and the result holds.9 as E is diagonal. There are four cases depending on the type of elementary matrix.7. then j − 1 with j − 2. all the way up to switching j − 1 with j for a total of j − i − 1 switches. If A is upper triangular.11. which is always an odd number. Note that the result is trivial if the two rows are adjacent by 5. Case 4: We must show (a) the determinant of the matrix E obtained from the identity by adding a constant multiple of one row to another row is 1. and (b) multiplying a row of A by a nonzero constant λ changes the determinant by multiplying by λ.j ) = j=1 (−1)i+j λAi. If the ith row is multiplied by λ. We see (a) immediately holds from 5.
j ) + j=1 det(B) (−1)k+j Ak.2. we need a lemma: Lemma 5. Proof. En such that A = E1 · · · En as A is row equivalent to I. To show (b).2. so the determinant must be zero. Now det(U ) = 0 if and only if U = I as U is in reduced row echelon form. (1) Suppose S ∈ Mn (F) is invertible.2.j ) = j=1 (−1)i+j Ai.j det((EA) ) = j=1 n k. For A.j = Ak.12. Now suppose A is invertible. . There is a unique matrix U in reduced row echelon form such that A = En · · · E1 U for elementary matrices E1 . we get det(AB) = det(E1 · · · En B) = det(E1 ) · · · det(En ) det(B) = det(A) det(B).2.12 as B is the matrix obtained from A by replacing the k th row with the ith row. we note that (EA)k. Proposition 5.2. If A is not invertible.7 n n det(EA) = j=1 n (−1) k+j (EA)k. . . and det(B) = 0. Then det(AB) = det(A) det(B). By repeatedly applying 5.9 as E is either upper or lower triangular. By iterating 5.j det(Ak.2.2.15. so det(A) = 0 if and only if A is row equivalent to I if and only if A is invertible. Proposition 5. Then det(S −1 ) = det(S)−1 71 . Then det(A) = 0. .j + Ak.2. Then there are elementary matrices E1 . . A ∈ Mn (F) is invertible if and only if det(A) = 0.j ) = det(A) det(A) by 5. which is nonzero if and only if det(U ) = 0.2. switching these rows does not change the sign of the determinant.13. .2. B ∈ Mn (F).11.j ) det(Ak. we have det(A) = det(En ) · · · det(E1 ) det(U ). Proof. so two rows of B are the same. Proof.18. Suppose A.j for all j ∈ [n].2. Once again. then AB is not invertible by 3.16. . det(AB) = det(BA). . Theorem 5. As A has two identical rows.13. the determinant changes sign. We see from Case 2 that if we switch two rows of A.To prove this result. so det(AB) = det(A) det(B) = 0 by 5. so by 5. Corollary 5. Suppose A ∈ Mn (F) has two identical rows.11. En . note (a) is trivial by 5.14.2. B ∈ Mn (F).j det(Ak.j (−1)k+j (Ai.
3. Proof. Hint: Use operators like T deﬁned in 5.14 and (1). by charA (z) = det(zI − A).14. Deﬁne the characteristic polynomial of T . (1) For A ∈ Mn (F). det(A) = det(S −1 BS) = det(S −1 ) det(B) det(S) = det(S −1 ) det(S) det(B) = det(B).2.2. so zI − [T ]B ∼ zI − [T ]C . there is an injective function Φ : Sn → L(V ) such that Φ(στ ) = Φ(σ)Φ(τ ) for all σ. Note that this is well deﬁned as if C is another basis of V . Exercise. denoted charA ∈ F[z]. i. Exercise 5. τ ∈ Sn .2. denoted charT ∈ F[z]. then [T ]B ∼ [T ]C by 3. a function Φ : G → L(V ) such that Φ(gh) = Φ(g)Φ(h) is called a representation of G.1.18 (A Faithful Representation of Sn ). An injective representation is usually called a faithful representation. 72 .2. B ∈ Mn (F) and A ∼ B. Note that charA ∈ F[z] is a monic polynomial of degree n. 5. Show that the symmetric group Sn can be embedded into L(V ) where dim(V ) = n. Deﬁnition 5. Note: If G is a group.4. Now we apply 5. (1) By 5. then trace(A) = trace(B).3 The Characteristic Polynomial For this section. Suppose A.17. (2) By 5.(2) Suppose A. Proposition 5. (1) trace(AB) = trace(BA).16. BinMn (F). Exercises V will denote a ﬁnite dimensional vector space over F. (2) Let T ∈ L(V ). by charT (z) = det(zI − [T ]B ) where B is some basis for V .13.12. Proof. we have 1 = det(I) = det(S −1 S) = det(S −1 ) det(S).e. deﬁne the characteristic polynomial of A. V will denote a ﬁnite dimensional vector space over F.2.1. Then det(A) = det(B). (2) If A ∼ B.
and relate your answer to 5.1. For example. if dim(V ) ≥ 5.12. and quartic equations for calculating roots of polynomials of degree less than or equal to 4. Another problem is that it is very hard to calculate determinants without the use of computers. Compute charT (z) = det(zI − [T ]B ).Remark 5.5 tells us that one way to ﬁnd sp(T ) is to ﬁrst pick a basis B of V . 1 0 1.3. Then λ ∈ sp(T ) if and only if λ is a root of charT and λ ∈ F. vn ).12.4.3. and then compute the roots of the polynomial charT (z) = det(zI − [T ]B ). This is immediate from 5. Proof.7. (1) (2) 0 ±1 . The problem with this technique is that it is usually very diﬃcult to factor polynomials of high degree. there are quadratic.6. ﬁnd [T ]B . we can only calculate determinants to a certain degree of accuracy.4. but there is no formula for ﬁnding roots of polynomials with degree greater than or equal to 5.1.3. . cubic.8 (Roots of Unity).13 and 3.2. Suppose V is a ﬁnite dimensional vector space over C with ordered basis B = (v1 .3. . Let T ∈ L(V ). Examples 5. Remark 5.12.9. We will calculate the spectrum for some operators.5. Compute the characteristic polynomial of 73 . Proposition 5.2.3. Exercise 5. (1) If A± ∈ M2 (F) is given by A± = we see that charA± (z) = det(zI − A± ) = z 2 (2) Remark 5.3.3. Exercises V will denote a ﬁnite dimensional vector space over F. 5. Examples 5.3. Exercise 5.3. .3. then charA = charB . Hence. . so we are still unsure of the spectrum of the matrix if the characteristic polynomial obtained in the fashion can be factored. Note that charA = charLA if A ∈ Mn (F). Even with a computer. If A ∼ B. and let T ∈ L(V ) be the ﬁnite shift operator deﬁned in 5. there is no good way known to factor the characteristic polynomial.
there are polynomials unique k.. Since p(T ) = q(T ) = 0. (2) If T = I. we must also have that r(T ) = 0. .2.4. This polynomial is called the minimal polynomial. Recall that if V is ﬁnite dimensional. Then p(T ) = 0 if and only if minT  p. . . the set T n n ∈ Z≥0 where T 0 = I cannot be linearly independent.. .1. .. Proof. ∈ Mn (F) and ai ∈ F for . . Proof. (1) the operator LB ∈ L(F ) where B = n λ 0 0 1 n (2) the operator LC ∈ L(F ) where C = 0 all i ∈ [n − 1]. .1. Proposition 5. 1 0 . we have that minT (z) = z as this polynomial is a linear polynomial which gives the zero operator when evaluated at T . . we have r(T ) = 0. . 74 . . and it is denoted minT . We must show the polynomial p is unique. . the constant k must be 1. . Examples 5. we have deg(p) ≥ deg(minT ). i. by the Euclidean Algorithm 4.1. . By 4. Then since p(T ) = minT (T ) = 0. and 1 λ 0 · · · 0 −a0 0 · · · 0 −a1 . ∈ Mn (F) and λ ∈ F. the constant polynomial 1 evaluated at T is I). so r = 0 as minT was chosen of minimal degree. Suppose p(T ) = 0. so there is a unique monic polynomial p ∈ C[z] of minimal degree ≥ 1 such that p(T ) = 0. Since p and q are both monic.4.e. V will denote a ﬁnite dimensional vector space over F.2. r ∈ C[z] such that p = kq + r and deg(r) < deg(q). .3. If q ∈ C[z] is monic with deg(p) = deg(q). and since deg(p) = deg(q). so r = 0 as p was chosen of minimal degree. DeﬁnitionProposition 5. For T ∈ L(V ).4. Since minT has minimal degree such that minT (T ) = 0. 0 −an−2 1 −an−1 5. then L(V ) is ﬁnite dimensional.4 The Minimal Polynomial For this section. there are unique k.2. we have that minT (z) = z−1 for similar reasoning as above (recall that 1(T ) = I. Let p ∈ F[z]... k must be constant. . Thus p = kq. Hence minT  p. It is obvious that if minT  p. (1) If T = 0. then p(T ) = 0. . (3) We have that minT is linear if and only if T = λI for some λ ∈ F. r ∈ C[z] with deg(r) < deg(minT ) and p = k minT +r.
λ ∈ sp(T ) if and only if λ is a root of minT .4. First. Find the minimal polynomial of the following operators: 1 2 3 4 0 5 6 7 (1) The operator LA ∈ L(R4 ) where A = 0 0 8 9 .. Then (z − λ) divides minT (z).4.7. . ∈ Mn (F) and ai ∈ F for ...4. so there is a p ∈ C[z] with minT (z) = (z − λ)n p(z) for some n ∈ N such that λ is not a root of p(z). 0 −an−2 0 1 −an−1 all i ∈ [n − 1]. More generally. Suppose F = C and T ∈ L(V ). Exercises V will denote a ﬁnite dimensional vector space over F..Proposition 5. 0 0 0 10 λ 1 0 . Now suppose λ is a root of minT . Corollary 5. . Proof.4. so there is a v such that w = p(T )v = 0. Now we must have tha 0 = minT (T )v = (T − λI)n p(T )v = (T − λI)n w. .4. For T ∈ L(V ). p(T ) = 0 as minT p. .. . Then T has an eigenvalue. so λ is a root of minT . 75 . .6.. Remark 5.5. The result follows immediately by 5. Then minT (T )v = minT (λ)v = 0..4. . .4.. suppose λ ∈ sp(T ) corresponding to eigenvector v. .12..5 (Existence of Eigenvalues). . n − 1}. By 5. .3. Hence (T − λI)k w is an eigenvector for T corresponding to the eigenvalue λ for some k ∈ {0.4.. Exercise 5. n .4. 1 0 λ 0 0 · · · 0 −a0 1 0 · · · 0 −a1 . T ∈ L(V ) has an eigenvalue if V is a vector space over an algebraically closed ﬁeld. .1. n (2) The operator LB ∈ L(F ) where B = ∈ Mn (F) and λ ∈ F. We know that minT ∈ C[z] has a root by 4. Proof. (3) The operator LC ∈ L(F ) where C = . (4) The ﬁnite shift operator T in 5..
76 .
E = 0. 6. n . 1} as these are the only roots of z 2 − z. . 77 1 . . 1 ··· and let E = LA .1. I. If E = 0. so minE = p3 .3. and if E = I. Please note that the results of this section are highly dependent on the speciﬁc direct sum decomposition V = U ⊕ W . then minE (z) = p2 (z) = z − 1. we have deg(minE ) ≥ 2. V will denote a vector space over F. which is matrix decomposition via idempotents.1. i. but p3 (E) = 0. We then discuss diagonalization and a generalization of diagonalization called primary decomposition. then we have that p1 (E) = 0 and p2 (E) = 0. W will denote subspaces of V such that V = U ⊕ W . Examples 6.1 Idempotents For this section. (1) The zero operator and the identity operator are idempotents.1.e. then we know minE (z) = p1 (z) = z. both of which divide p3 (z) = z 2 − z. As E = λI for any λ ∈ F. Deﬁnition 6.2. . If E = 0. (2) We see that if E is a nontrivial idempotent. 1 . Suppose V is ﬁnite dimensional and E ∈ L(V ) is an idempotent. I. (2) Let A ∈ Mn (F) be given by 1 ··· 1 . and U. An operator E ∈ L(V ) is called idempotent if E 2 = E ◦ E = E. then sp(E) = {0. Facts 6. A = .1. so we know E 2 − E = 0. Then E 2 = E as A2 = A.Chapter 6 Operator Decompositions We begin with the easiest type of decomposition. (1) As E 2 = E.
Thus EU is an idempotent called the idempotent onto U along W . and λ ∈ F where ui ∈ U and wi ∈ W for i = 1. so EU = EU . Proof. 2 (2) EU = EU . Remark 6. Hence EU + EW = I.1. Hence EW EU = 0 = EU EW .8. for each v ∈ V there are unique u ∈ U and w ∈ W such that v = u + w. (4) EU (v) = v if and only if the W component of v is zero if and only if v ∈ U . Deﬁne EU : V → V by EU (v) = u if v = u + w with u ∈ U and w ∈ W .2. . and W2 = span 1 1 ⊂ F2 . the idempotent onto W along U which satisﬁes conditions (1)(4) above after switching U and W . v2 = u2 + w2 .4. then EU v = EU EU (v) = EU u = u = EU v. EW ∈ L(V ) satisfy (5) EW EU = 0 = EU EW . By (2). (3) ker(EU ) = W . so v ∈ v ∈ V EU (v) = v = U . if we set U = span 1 0 ⊂ F2 .DeﬁnitionProposition 6. W1 = span 0 1 78 ⊂ F2 . For example. 2. Then there is an x ∈ V 2 such that EU (x) = v.1. (1) Let v = u1 + w1 . The operators EU . This shows U = v ∈ V EU (v) = v ⊆ im(EU ). Then (1) EU is a linear operator. Note that EU is well deﬁned since by 2. we have EW EU (v) = EW EU (u + w) = EW (u) = 0 = EU (w) = EU EW (u + w) = EU EW v. (6) If v = u + w. ∈U ∈W 2 2 (2) If v = u + w. Suppose now that v ∈ im(EU ).5. v = EU x = EU x = EU v. Then EU (λv1 + v2 ) = EU ((λu1 + u2 ) + (λw1 + w2 )) = λu1 + u2 = λEU (v1 ) + EU (v2 ). (5) If v = u + w. we have (EU + EW )v = EU (u + w) + EW (u + w) = u + w = v. It is very important to note that EU is dependent on the complementary subspace W . (3) EU (v) = 0 if and only if the U component of V is zero if and only if v ∈ W . and (4) im(EU ) = v ∈ V EU (v) = v = U . Note that we also have EW ∈ L(V ). and (6) I = EU + EW .
j ∈ L(Wj .1. Then by (2).1. and v = w + u ∈ ker(E) + im(E).then U ⊕ W1 = V = U ⊕ W2 . v = Ev = 0.1.j = Ei T Ej . Then u ∈ im(E) and Ew = Ev − Eu = u − u = 0. we would have EU 1 1 = 0 . we have that EU but using W2 . Applying 6. (3) Suppose v ∈ ker(E)∩im(E). and ker(E)∩im(E) = (0). If v ∈ im(E). (4) E is the idempotent onto im(E) along ker(E). so w ∈ ker(E).7 (Idempotents and Direct Sum). (2) im(E) = v ∈ V Ev = v . Wi ) by Ti. and note that T = (Ti.1. (2) It is clear v ∈ V Ev = v ⊆ im(E). Suppose E ∈ L(V ) with E = E 2 . Then v = Eu = E 2 u = Ev. (5) We have that ker(I − E) = im(E) and im(I − E) = ker(E). Then (1) I − E ∈ L(V ) satisﬁes (I − E)2 = (I − E). Show that n V = i=1 Wi for nontrivial subspaces Wi ⊂ V for i ∈ [n] if and only if there are nontrivial idempotents Ei ∈ L(V ) for i ∈ [n] such that n (1) i=1 Ei = I and (2) Ei Ej = 0 for all i = j. 0 Proposition 6. (3) V = ker(E) ⊕ im(E). so the result follows from (4) (or 6. and (5) I − E is the idempotent onto ker(E) along im(E).4. = 79 .6. Proof. Note: In this case.1.4). 0 1 1 = 1 .j ) ∈ L(W1 ⊕ · · · ⊕Wn ) ∼ L(V ). then there is a u ∈ V such that Eu = v. so v = 0. Exercises Exercise 6.4 using W1 . Let v ∈ V . we can deﬁne Ti. and set u = Ev and w = v − u. (1) (I − E)2 = I − 2E + E 2 = I − 2E + E = I − E. (4) This follows immediately from 6.
Since V = U ⊕ W . W ). there is a canonical isomorphism V = U ⊕ W ∼ U ⊕W given by = u + w −→ with obvious inverse. . The external direct sum of U and W is the vector space u U ⊕W = u ∈ U and w ∈ W w where addition and scalar multiplication are deﬁned in the obvious way: u1 u2 + w1 w2 = u1 + u2 w1 + w2 and λ u w = λu λw for all u. B. W be vector spaces. This decomposition can be extended to the case when n V = i=1 Wi for subspaces Wi ⊂ V for i = 1. W ).1.2. i. B ∈ L(W. U ). and D ∈ L(W ) where addition and scalar multiplication are deﬁned in the obvious way: A B A + C D C B D = A+A C +C B+B D+D and λ A B C D = λA λB λC λD for all A. then by 6.6. deﬁne T1 T2 T3 T4 = T1 u + T2 w T3 u + T4 w The map given by T → T is an isomorphism L(V ) ∼ = A B C D A ∈ L(U ). U ). . D. .4. n. T3 = EW T EU ∈ L(U. w. D ∈ L(W ). C ∈ L(U. w2 ∈ W .2. and T4 = EW T EW ∈ L(W ).1. . If we set T1 = EU T EU ∈ L(U ). B ∈ L(W. deﬁne the operator T = T1 T2 T3 T4 u w : U ⊕W → U ⊕W. C. w1 . u1 . A ∈ L(U ). and λ ∈ F. Notation 6. U ). u2 ∈ U .2 Matrix Decomposition Deﬁnition 6. W ).e. 80 . T = (EU + EW )T (EU + EW ) = EU T EU + EU T EW + EW T EU + EW T EW .2 (Matrix Decomposition). u w by matrix multiplication. T2 = EU T EW ∈ L(W. and λ ∈ F. If T ∈ L(V ). if u ∈ U and w ∈ W . Let U. C ∈ L(U.
Suppose T ∈ L(V ). W ).3.2. (1) Every diagonal matrix is diagonalizable. Let T ∈ L(V ). where A ∈ L(U ). V ∼ U ⊕W and = 2.2.2. B ∈ L(W. V = U ⊕ W for subspaces U.4. and D ∈ L(W ) .2. T is diagonalizable if there is a basis of V consisting of eigenvectors of T .5. (2) There are matrices which are not diagonalizable. Show that sp(T ) = sp(A)∪sp(C). Suppose V = U ⊕ W . For example.2. T ∈ L(V ) is diagonalizable if and only if there is a basis B of V such that [T ]B is diagonal. . Deﬁnition 6. Hence there is not a basis of R2 consisting of eigenvectors of LA .2. U ). but dim(E0 ) = 1. and T = A B 0 C A B C D A ∈ L(U ).3. The main results of this section are characterization of when an operator T ∈ L(V ) is diagonalizable in terms of its associated eigenspaces and in terms of its minimal polynomial.3.3 Diagonalization For this section. A matrix A ∈ Mn (F) is called diagonalizable if LA is diagonalizable. Verify the claims made in 6. L(V ) ∼ = Exercise 6. V will denote a ﬁnite dimensional vector space over F. 81 .Exercises Exercise 6. Remark 6.e. .3. B ∈ L(W. U ). 1.4.2.3. and C ∈ L(W ) as in 6. .1. C ∈ L(U.2. consider A= 0 1 . Exercise 6. W . i. Examples 6. . Theorem 6. λm }.3. 0 0 We see that zero is the only eigenvalue of LA . T ∈ L(H) is diagonalizable if and only if m V = i=1 Eλi where sp(T ) = {λ1 . 6.
5. . Dn. . Hint: First show that the eigenspaces Eλ of T are Sinvariant.2. . Since B consists of eigenvectors of T . Certainly m W = i=1 E λi is a subspace of V by 5. . . Then show Sλ = SEλ ∈ L(Eλ ) is diagonalizable for all λ ∈ sp(T ). λn . . . Then T ∈ L(V ) is n diagonalizable if and only if i=1 ni = dim(V ). a diagonal matrix for some invertible S ∈ Mn (F). Then we know CS(S) = Fn by 3. . . . Corollary 6. vn } are the columns of S. . there is an invertible matrix S such that S −1 AS = D. . Then there is a basis {v1 . Suppose V is ﬁnite dimensional. . . 82 .3. Form the matrix S = [v1 v2  · · · vn ]. Hence. . Show T W is diagonalizable. .6. Suppose now that V = W . T ∈ L(V ) commute if and only if S. . .3. Remark 6. . . .7.13. so B is a basis of Fn consisting of eigenvectors of LA . m B= i=1 Bi is a basis for B. and note that S −1 AS = S −1 A[v1  · · · vn ] = S −1 [λ1 v1  · · · λn vn ] = S −1 Sdiag(λ1 . Suppose S −1 AS = D. Suppose A ∈ Mn (F) is diagonalizable. i.Proof. λn ) where D = diag(λ1 .1. Suppose T is diagonalizable.8.4. Then there is a basis of V consisting of eigenvectors of T . λn ) = diag(λ1 . T is diagonalizable. and A is diagonalizable. . . . vn } of Fn consisting of eigenvectors of LA corresponding to eigenvalues λ1 . which is invertible by 3. . This is the usual way diagonalizability is presented in a matrix theory course. Let Bi be a basis for Eλi for i = 1.2. .18. We show the converse holds. . .1 v1  . . λn ) is the diagonal matrix in Mn (F) whose (i.n λn ]. T ∈ L(V ) be diagonalizable.15. Let S. so that this deﬁnition of diagonalizability is equivalent to the one given in these notes. . we have [Av1  · · · Avn ] = AS = SD = [D1. λn } and let ni = dim(Eλi ).3. . Let sp(T ) = {λ1 . T are simultaneously diagonalizable. there is a basis of V consisting of eigenvectors of both S and T . Exercise 6. Then by 2. .3. Thus W = V as every element of V can be written as a linear combination of eigenvectors of T . Exercises Exercise 6. and if B = {v1 . Suppose T ∈ L(V ) is diagonalizable and W ⊂ V is a T invariant subspace. . m. Show that S. .e.9. i)th entry is λi . a diagonal matrix.
1. ∈ Mn (F) 1 0 0 is nilpotent of order n for all n ∈ N.4. V will denote a ﬁnite dimensional vector space over F. (2) Deﬁne Gp = v ∈ V p(T )n v = 0 for some n ∈ N . 83 .1. Remarks 6.4 Nilpotents Deﬁnition 6. and suppose p ∈ F[z] is a monic irreducible factor of minT . We say N is nilpotent of order n if n is the smallest n ∈ N such that N n = 0. Deﬁne Kp = ker(p(T )m ). In this case.5 Generalized Eigenvectors For this section.2.. Exercises 6. (1) The matrix A= is nilpotent of order 2. (1) An operator N ∈ L(V ) is called nilpotent if N n = 0 for some n ∈ N. (1) Note that Gp and Kp are both T invariant subspaces of V . We say the multiplicity of λ is Mλ = dim(Gλ ).4.4. 0 1 0 .4. Examples 6. An = ...8 in [3] Deﬁnition 6. .2.. (1) There is a maximal m ∈ N such that pm  minT . then p(z) = z − λ for some λ ∈ sp(T ) by 5. . elements of Gλ \ {0} are called generalized eigenvectors of T corresponding the the eigenvalue λ. (2) The block matrix N= 0 B 0 0 0 1 0 0 where 0.5. In fact. B ∈ Mn (F) for some n ∈ N is nilpotent of order 2 in M2n (F).5.6. and we set Gλ = Gp . . (3) If deg(p) = 1. This main result of this section is Theorem 12 of Section 6. Let T ∈ L(V ).
5. (3) Let B ∈ M4 (R) be given by 0 −1 1 0 1 0 0 1 B= 0 0 0 −1 . (1) Every eigenvector of T ∈ L(V ) is a generalized eigenvector of T . As p(B) = 0.3.5. 0 0 0 0 so we see immediately that (B 2 + I)2 = 0. we must have that minLB = p. q ∈ F[z] are two distinct monic irreducible factors of minT . Note that −1 0 0 −2 1 0 −1 2 0 0 (z 2 + 1)B = B 2 + I = + 0 0 −1 0 0 0 0 0 −1 0 0 1 0 0 0 0 1 0 0 0 0 0 = 0 0 1 0 0 0 0 0 0 −2 2 0 . 84 . Set p(z) = (z 2 + 1)2 .(2) For λ ∈ sp(T ). First. 0 0 1 0 We calculate Gp for all monic irreducible p  minLB . namely zero. Let T ∈ L(V ). We see that B − λI is block upper triangular. we see that A2 = 0. We now have a candidate for minLB . so z −1 charB (z) = det(zI − B) = 0 0 1 −1 0 z 0 −1 z 1 z 1 = (z 2 + 1)2 . 0 0 We saw earlier that A has a one eigenvalue. If T ∈ L(V ) is nilpotent. we have that minLB  p. Hence Kp = Gp = V as p(B)2 v = 0v = 0 for all v ∈ V . (1) Gp ∩ Gq = (0). so every v ∈ F2 is a generalized eigenvector corresponding to the eigenvalue 0. Remark 6. and that dim(E0 ) = 1 so A is not diagonalizable. Examples 6.4. However.5. we calculate charB . = −1 z −1 z 0 z 1 0 −1 z We claim charB = minLB .5. then every v ∈ V \ {0} is a generalized eigenvector of T corresponding to the eigenvalue zero. (2) Suppose A= 0 1 . there can be most Mλ linearly independent eigenvectors corresponding to the eigenvalue λ as Eλ ⊂ Gλ . Lemma 6. but as the only monic irreducible factor of p is q(z) = z 2 +1 and q(B) = 0. and suppose p.
suppose q(T )v = 0 for some v ∈ Gp . Then Kp = Gp . Proof. Next. Then v ∈ Gp ∩ Gq = (0) by (1). q m are relatively prime.6. Then n n (1) V = i=1 Kp i = i=1 Gpi and (2) If Ti = T Kpi ∈ L(Kpi ) for i ∈ [n]. then there is a y ∈ Gp such that f (T )y = x. minTi = pri . Thus q(T )Gp is injective and thus bijective by 3.3. We know Gp is f (T )invariant. so v = 0. by 4. Proof. and as f = q1 · · · qm for monic irreducible qi = p which divide minT for all i ∈ [m]. j Proof. we note that Gp is T invariant. g ∈ F[z] such that f pn + gq m = 1. there are f.(2) q(T )Gp is bijective. Let T ∈ L(V ). Proposition 6.5. (2) First.2.6 Primary Decomposition minT = pr1 · · · prn 1 n Theorem 6. 85 . Certainly Kp ⊂ Gp . Then there are n. Thus. and thus so is f (T )Gp = (q1 · · · qm )(T )Gp = (q1 (T )Gp ) · · · (qm (T )Gp ). and suppose p ∈ F[z] is a monic irreducible factor of minT . so 0 = minT (T )y = (f pm )(T )y = p(T )m f (T )y = p(T )m x.1 (Primary Decomposition). and suppose where the pj ∈ F[z] are distinct irreducible polynomials for i ∈ [n]. v = 1(T )v = f (T )p(T )n v + g(T )q(T )m v = 0. so it is q(T )invariant.5.5. (1) Suppose v ∈ Gp ∩ Gq . by 6. Thus. and let f be the unique monic polynomial such that minT = f pm .15. and we are ﬁnished. qi (T ) is bijective. Exercises 6.8. if x ∈ Gp . But as pn . Suppose T ∈ L(V ).6. Let m be the multiplicity of p. m ∈ N such that p(T )n v = 0 = q(T )m v. and x ∈ Kp .
then fj gj (T )v = 0 as pri fj .(1) For i ∈ [n]. Moreover. (2) Since pi (Ti )ri ∈ L(Kpi ) is the zero operator. The fi ’s are relatively prime. Hence v ∈ im(Ei ). so there are polynomials gi ∈ F[z] such that n fi gi = 1. Now suppose v ∈ Kpi . That is.3 as minT fi fj . set fi = minT = pri i pj j . then by (1). if g ∈ F[z] such that g(Ti ) = 0. It remains to show im(Ei ) = Kpi . Thus Ej v = hj (T )v = (fj gj )v = 0. we can write n v= j=1 vj where vj ∈ Kpj for all j ∈ [n]. j=i r and note that minT fi fj for all i = j. Since they sum to I. and i n n Ei v = j=1 Ej v = j=1 Ej v = Iv = v. 86 . they corrspond to a direct sum decomposition of V : n V = i=1 im(Ei ). i=1 Set hi = fi gi for i ∈ [n] and Ei = hi (T ). Conversely. but g(T ) = 0 on Kpi . i Hence v ∈ Kpi and im(Ei ) ⊂ Kpi . First note that n n n Ei = i=1 i=1 hi (T ) = i=1 hi (T ) = q(T ) = I where q ∈ F[z] is the constant polynomial given by q(z) = 1. and Kpi ⊆ im(Ei ). Hence the Ei ’s are idempotents: n n Ej = Ej I = Ej i=1 Ei = i=1 2 Ej Ei = Ej . then g(T )fi (T ) = 0 as fi (T ) is only nonzero on Kpi . If v ∈ im(Ei ). we have that minTi (pi )ri . Ei Ej = hi (T )hj (T ) = (hi hj )(T ) = (gi gj fi fj )(T ) = 0 by 5. if v ∈ V .4. If j = i. so pi (T )ri v = pi (T )ri Ei v = pi (T )ri hi (T )v = (pri fi gi )(T )v = (minT gi )(T )v = 0. then v = Ei v.
1.4. and pri fi divides gfi .6.4. and set p(z) = (z − λ). Corollary 6. Exercises 87 . Hence minT gfi . i i i Corollary 6. Furthermore.2. λ∈sp(T ) Then by 6. Hence T is diagonalizable by 6. By 6.8. λ∈sp(T ) We claim p = minT . T ∈ L(V ) is diagonalizable if and only if minT splits into distinct linear factors in F[z]. V = λ∈sp(T ) Gλ if and only if minT splits (into linear factors).2. which will imply minT splits into distinct linear factors in F[z].6. there are unique vλ ∈ Eλ for each λ ∈ sp(T ) such that v= λ∈sp(T ) vλ . i. we have that V = λ∈sp(T ) ker(T − λI) = λ∈sp(T ) Eλ .4.and we have that n n g(T )fi (T )v = g(T )fi (T ) j=1 vj = g(T ) j=1 fi (T )vj = g(T )vi = 0. Suppose T is diagonalizable.3. so minTi = pri . λ∈sp(T ) Proof.4. p = minT .3.e.3. as (z −λ) minT (z) for all λ ∈ sp(T ) by 5. and p(T ) = 0. so it is clear that p(T )v = 0. Suppose now that minT (z) = (z − λ). This implies pri g. λ∈sp(T ) Let v ∈ V . V = Eλ . minT (z) = (z − λ). Then by 2.6.
88 .
. 0 −an−2 0 1 −an−1 Examples 7. 1 0 89 .Chapter 7 Canonical Forms For this chapter. (1) Suppose p(z) = z − λ. Finally. . . we prove the CayleyHamilton Theorem which relates the characteristic and minimal polynomials of an operator. V will denote a ﬁnite dimensional vector space over F.1.. Then the companion matrix for p is the matrix given by 0 0 · · · 0 −a0 1 0 · · · 0 −a1 . . 7.1...2. Let p ∈ F[z] be the monic polynomial given by p(z) = t + i=0 n ai z i . (2) Suppose p(z) = z 2 + 1 ∈ M2 (F).. .. We discuss the rational canonical form and the Jordan canonical form. we discuss the holomorphic functional calculus which is an application of the Jordan canonical form.1. Then the companion matrix for p is (λ) ∈ M1 (F)..1 Cyclic Subspaces n−1 Deﬁnition 7. . Along the way. Then the companion matrix for p is 0 −1 . We then will show to what extent these canonical forms are unique. . .
. (1) For v ∈ V . In fact. Then charA = minA = p. p is a monic polynomial of least degree such that p(A) = 0. Note that if {e1 . λn ∈ F such that n n w= i=0 λi T v. (2) A subspace W ⊂ V is called T cyclic if there is a w ∈ W such that W = ZT. . Examples 7. We claim that T m v ∈ span(B) for all m > n so B is the desired basis. . Suppose v ∈ V \{0}. n − 1 is linearly independent. n is linearly independent. . As deg(p) = n. . (2) Remarks 7. 1 . . (1) Every T cyclic subspace is T invariant. As V is ﬁnite dimensional.4..1. The base case is m = n + 1.1.v .w ⊂ W as T i w ∈ W for all i ∈ N. .3. then Fn is LA cyclic with cyclic vector e1 as Aei = ei+1 for all i ∈ [n − 1]. n is a basis for ZT..1. . (1) If A ∈ Mn (F) is a companion matrix for p ∈ F[z]. There there is an n ∈ Z≥0 such that T i v i = 0. The routine exercise of checking that p = charA and p(A) = 0 are left to the reader. We prove this claim by induction on m. .v . Proof. (2) Suppose W is a T invariant subspace and w ∈ W .(3) If p(z) = z n . en } is the standard basis of Fn . so Ai e1 i = 0. Deﬁnition 7. . .v .1. so T w = i=0 i λi T i+1 v ∈ ZT. Proposition 7. then there is an n ∈ N and there are scalars λ0 . 0 1 0 Proposition 7.v = span T n v n ∈ Z≥0 .7. . . Proof. then the companion matrix for p is 0 0 . .6. and thus p = minA . . if w ∈ ZT. we have that Aei = ei+1 for i ∈ [n − 1]. Then ZT.5. so we can pick a maximal n ∈ Z≥0 such that B = T i v i = 0.1. . Let A ∈ Mn (F) be the companion matrix of p ∈ F[z]. . Thus deg(minA ) ≥ n as q(A)e1 = 0 for all nonzero q ∈ F[z] with deg(q) < n. . the T cyclic subspace generated by v is ZT..w . 90 . .. . . Suppose T ∈ L(V ). This w is called a T cyclic vector for W . the set T i v i ∈ Z≥0 is linearly dependent. . .
then the standard basis for Fn is an LA cyclic basis. and 0 0 · · · 0 −a0 1 0 · · · 0 −a1 . Hence there are scalars λ0 .1. T n−1 v}. .8. . (2) Proposition 7. and suppose V is T cyclic. In fact. Now minL[T ]B = minT = p by 7. . . . . .1. . . Then n n T m+1 v=T m−n T n+1 v=T m−n i=0 λi T v = i=0 i λi T m−n+i v ∈ span(B) by the induction hypothesis. (2) deg(minT ) = dim(V ). and the standard basis is equal to BLA . .3. then the T cyclic basis associated to v is denoted BT..1.v . Then a T cyclic basis of W is a basis of the form {w.1.e1 . T v.10. If W = ZT. We know such a basis exists by 7. m. T n w} for some n ∈ Z≥0 . . . .v . . We know that [T ]B is a companion matrix for some polynomial p ∈ F[z] as B = {v. . . Suppose T ∈ L(V ). . m ⇒ m + 1: Suppose now that T i v ∈ span(B) for all i = 0.. .9. . (1) If A ∈ Mn (F) is a companion matrix for p ∈ F[z]. (1) Let B be a T cyclic basis for V .. ... 91 .m = n + 1: We already know T n+1 v ∈ span(B) as n was chosen to be maximal. Examples 7. (1) Let n = dim(V ). . λn ∈ F such that n T n+1 v= i=0 λi T i v. Fn = ZLA . . . 0 −an−2 0 1 −an−1 n−1 [T ]B = [T v]B [T 2 v]B · · · [T n v]B where T nv = − n−1 ai T i v and p(z) = z n + i=0 i=1 ai z i . .e1 . = .1. T w. Proof. Deﬁnition 7.7. Suppose W ⊂ V is a nontrivial T cyclic subspace of V . (2) This follows immediately from (1). Then [T ]B is the companion matrix for minT .
v = T i v i = 0. . Exercises Exercise 7.1. and minS  pm by 5. .2 Rational Canonical Form The proof of the main theorem in this section is adapted from [2]. Suppose that v ∈ V \ {0} and m ∈ N is minimal with p(T )m v = 0. Deﬁnition 7. and Zi ⊂ Kpi for some monic irreducible pi  minT for all i ∈ [n]. (3) The matrix [T ]B is called a rational canonical form of T if B is a rational canonical basis for T . Let W = ZT. and note that W is a T invariant subspace.Lemma 7. (1) Subspaces Z1 . dm − 1 (so ZT. so BT. denoted n B= i=1 Bi . .1. we must have minS = pm . and note that p(T )m v = 0 implies that p(T )m w = 0 for all w ∈ W . but as p(T )k v = 0 for all k < m. But p is irreducible. Let T ∈ L(V ). and (3) minT = charT . Proof. Show the following are equivalent: (1) V is T cyclic. such that each Bi is a T cyclic basis for span(Bi ) ⊂ Kpi for some monic irreducible pi  minT for all i ∈ [n].11. . . Suppose p is a monic irreducible factor of minT for T ∈ L(V ). Zn are called a rational canonical decomposition of V for T if n V = i=1 Zi . Zi is T cyclic for all i ∈ [n]. . As W is Scyclic.v  = dm.1. .12. we must have that dm = deg(pm ) = deg(minS ) = dim(W ) by 7.3. so minS = pk for some k ≤ m. and d = deg(p).10.1. 7.v .v has dimension d). Set S = T W . Let T ∈ L(V ) with minT = pm for some monic irreducible p ∈ F[z] and some m ∈ N. 92 .7. (2) deg(minT ) = dim(V ). The result now follows by 7.2. (2) A basis B of V is called a rational canonical basis for T if B is the disjoint union of nonempty sets Bi . Then BT. Hence p(S)m = 0.1.4. .
Applying p(T ). i=1 j=1 For i ∈ [n]. Proposition 7. and minTi  fi .2. Since minT (Ti ) = 0. Set Zi = ZT.2. so minTi = pk for some k ≤ m. vn ∈ V such that n S1 = i=1 BT. Note that if B is a rational canonical basis for T .3. 93 .wi is a subspace of V . we get n n n 0 = p(T ) i=1 fi (T )vi = i=1 fi (T )p(T )vi = i=1 fi (T )wi .(4) The matrix A ∈ Mn (F) is said to be in rational canonical form if the standard basis of Fn is a rational canonical basis for LA . Let gi ∈ F[z] such that pgi = fi for all i ∈ [n]. . Then n S2 = i=1 BT.j T j vi = 0. Remark 7.vi  for each i ∈ [n]. irreducible p ∈ F[z] and some m ∈ N. let mi n fi (z) = j=1 λi. Suppose n mi λi. Then n n n fi (T )vi = i=1 i=1 gi (T )p(T )vi = i=1 gi (T )wi = 0. i. . and mi = BT. Suppose T ∈ L(V ) such that minT = pm for some monic. then [T ]B is a block diagonal matrix such that each block is a companion matrix for a polynomial q ∈ F[z] of the form pm where p ∈ F[z] is monic and irreducible and m ∈ N.wi is linearly independent so that n W = i=1 ZT.vi is linearly independent. we must have fi (T )wi = 0 for all i ∈ [n]. Proof. we must have that minTi  minT .e.wi . Suppose v1 . For each i ∈ [n].j z so that 0 = i=1 j fi (T )vi . Hence pfi for all i ∈ [n]. Ti = T Zi . . wi ∈ im(p(T )) for all i ∈ [n]. Suppose there is vi ∈ V such that p(T )vi = wi .2. . Hence fi (Ti ) = 0 on Zi . Now as fi (T )wi ∈ Zi and W is a direct sum of the Zi ’s.
B ∪BT.v2 is linearly independent. But as B is a basis. Suppose that W is a T invariant subspace of V with basis B. .4. d = dim(ZT.v1 is linearly independent.11. Suppose T ∈ L(V ) such that minT = pm where p ∈ F[z] is monic and irreducible and m ∈ N.v1 ∪BT. so by 7. We can repeat this process until Wk contains ker(p(T )) for some k ∈ N. .w ⊂ W and ZT.11. vn ∈ ker(p(T )) such that n B =B∪ i=1 BT.v = B = {v1 . By (1).v . But then mi 0 = gi (T )wi = fi (T )vi = j=1 λi.v = ZT. Suppose T ∈ L(V ) with minT = pm for some monic. (2) If ker(p(T )) W . .vt ) is T invariant.w ⊂ ZT. If ker(p(T )) W1 . a contradiction as v ∈ W .v ∩ W ) ≤ dim(ZT. then pick v1 ∈ W \ ker(p(T )).v is linearly independent.w ) ≤ dim(ZT. pick v2 ∈ W1 \ ker(p(T )). so equality holds.v ∩ W . . (1) Suppose v ∈ ker(p(T )) \ W .v2 ) is T invariant. .v ) = d. so we have ZT.j T j vi . Lemma 7.5. Thus w = 0. . B ∪ BT.2. (1) Let d = deg(minT ). and note that BT. . (2) There are v1 . Suppose λi vi + w = 0 where w = i=1 j=0 µj T j v. .2. . vn } and n T i v i = 0. Then there is a rational canonical basis B for T . Then w ∈ span(B) = W and w ∈ ZT. . so we must have that λi = 0 for all i ∈ [n] and j ∈ [mi ] as BT. gi (T )wi = 0 for all i ∈ [n] as W is the direct sum of the Zi ’s and gi (T )wi ∈ Zi for all i ∈ [n]. both of which are T invariant subspaces. d − 1 d−1 by 7. Then B ∪ BT. . . and W2 = span(B ∪BT. By (1). .1.vi is linearly independent and contains ker(p(T )). and W1 = span(B ∪ BT.vi is linearly independent.v1 ∪BT.v .v ∩ W ⊂ ZT.w ⊂ ZT. . j Proposition 7. and µj = 0 for all j = 0. d − 1. .v . Thus we have ZT. which is a subset of W . irreducible p ∈ F[z] and some m ∈ N. 94 . we have λi = 0 / for all i ∈ [n].1. If w = 0. then p(T )w = 0 as p(T )ZT. But then ZT. Proof.so once again. .v = 0.
m = 1: Then V = ker(p(T )). so U = V . By 7. . By 6.6. uk ∈ ker(p(T )) such that k D=C∪ i=1 BT. Then im(S) = SU = S span(C) ⊇ W = im(p(T )). and T W has minimal polynomial pm−1 . note that U is T invariant. we know that there is a rational canonical basis Bi for Ti on Kpi . .13 dim(U ) = dim(im(S)) + dim(ker(S)) ≥ dim(im(p(T ))) + dim(ker(p(T ))) = dim(V ). By 3. so n B= i=1 Bi is the desired rational canonical basis of T . we have n V = i=1 Kp i where Kpi = ker(pi (T )mi ) are T invariant subspaces. we can factor minT uniquely: n minT = i=1 pm i i where pi ∈ F[z] are distinct monic irreducible factors of minT and mi ∈ N for all i ∈ [n]. and ker(S) ⊇ ker(p(T )) as U contains ker(p(T )). Proof. there are u1 .2. By 7. Let S = p(T )U . By 4.4.2. so it is p(T ) invariant. Theorem 7. By the induction hypothesis.2.vi is linearly independent by 7. and n C= i=1 BT. 95 . then minTi = pmi for i all i ∈ [n].3.2. We proceed by induction on m ∈ N. so we may apply 7.Proof. . We claim that U = V . there are vi ∈ V for i ∈ [n] such that p(T )vi = wi . m − 1 ⇒ m: We know W = im(p(T )) is T invariant.2.3.5. and if Ti = T Kpi . Thus D is a rational canonical basis for T . First. As W = im(p(T )). Every T ∈ L(V ) has a rational canonical decomposition.4 with W = (0) to get a basis for ker(p(T )). we have a rational canonical basis n B= i=1 BT.6.ui is linearly independent and contains ker(p(T )).wi for T W .2. Let U = span(D).1. .11.
vn ∈ V . there are bases C.1. . [T ]B is block diagonal. . .9.vi = span(BT. we see that charT (z) = det(zI − [T ]B ) j is a product n ni charT (z) = det(zI − [T ]B ) = i=1 pi (z)ri 96 .3.2. ﬁnd a condition which makes it true. For T ∈ L(V ). charT (T ) = 0. Prove or disprove: A square matrix A ∈ Mn (F) is similar to its transpose AT .11: n minT = i=1 pm i .2. there is a rational canonical decomposition V = i=1 ZT.10.6.7. 7. 0 = [minT (T )]B = minT ([T ]B ) = minT = . But as minT is the minimal polynomial of [T ]B . Proof.3.e. i Let B be a rational canonical basis for T as in 7. If the statement is false.6 there is a rational canonical basis n B= i=1 BT.3 The CayleyHamilton Theorem Theorem 7. Show that two matrices A..2. .10.2. i. By 7.vi ). Factor minT into irreducibles as in 4. For T ∈ L(V ). . D of Fn such that [LA ]C = [LB ]D is in rational canonical form.8.2.1.3. we get the desired result. . . Exercise 7. By 7. . Ak be the blocks.n Corollary 7. As A1 minT (A1 ) .1 (CayleyHamilton). Proof.vi for v1 . so let A1 . j we must have that ni = mi for some i ∈ [n].2. Ak minT (Ak ) we must have that ni ≤ mi for all i ∈ [n]. B ∈ Mn (F) are similar if and only if they share a rational canonical form. p) ∈ F[z]2 such that there is an operator T ∈ L(V ) where V is a ﬁnite dimensional vector space over F such that minT = m and charT = p. Classify all pairs of polynomials (m. By 7. Exercise 7. Exercises Exercise 7.. Setting ZT. . . We know that Aj is the companion matrix for pijj where ij ∈ [n] and nj ∈ N for j ∈ [k].vi into cyclic subspaces. .
Corollary 7. m m trace(A) = trace([A]B ) = i=1 trace(Ai ) = i=1 −ai i −1 . [A]B = . and det(A) = (−1)n a0 . Proposition 7. and det(A) = det([A]B ). the constant term of charA is obtained by taking the product of the constant terms of the pi ’s. (1) The coeﬃcient of the z n−1 term in charA (z) is equal to − trace(A). Proof. As A ∼ [A]B . First note that the result is trivial if A is the companion matrix to a polynomial n−1 p(z) = z + i=0 n ai z i ∈ F[z] as charA (z) = p(z). Let A ∈ Mn (F). let B be a rational canonical basis for LA so that [A]B is a block diagonal matrix A1 0 . Similarly. n n charT (T ) = i=1 pi (z) (T ) = minT (T ) i=1 ri pi (T )ri −mi = 0. n Now one sees that the (n − 1)th coeﬃcient of charA is exactly the negative of the right hand side as n ni m m charA (z) = i=1 z ni + j=0 ai z j j =z + i=1 n ai i −1 n z n−1 + ··· + i=1 ai 0 since the (n − 1)th terms in the product above are obtained by taking the (ni − 1)th term of pi and multiplying by the leading term (the z nj term) for j = i. For the general case. (1) It is easy to see that the trace of [A]B is the sum of the traces of the Ai . minT  charT for all T ∈ L(V ). Thus.. trace(A) = −an−1 .where ri ≥ mi for all i ∈ [n].e. i. we have that n ni + j=0 ai z j ∈ j charA (z) = det(zI − [A]B ) = i=1 pi (z). trace(A) = trace([A]B ).3.2.3. (2) The constant term in charA (z) is equal to (−1)n det(A).3. 0 Am ni −1 where each block Ai is the companion matrix of some polynomial pi (z) = z F[z]. 97 .
4... Obvious. Suppose J ∈ Mn (F) is a Jordan block associated to λ.e.j = 1 if j = i + 1 0 else.3. ni − 1 .vi = (T − λI)j vi j = 0.6. A Jordan block of size n associated to λ ∈ F is a matrix J ∈ Mn (F) such that λ if i = j Ji.2. . Then BN.4. .vi  for each i ∈ [n]. Then charJ (z) = minT (z) = (z − λ)n . we calculated that the constant term of charA is as det(A) = det([A]B ) = i=1 i=1 m m ai .vi is a matrix in Mni (F) of the form 0 0 . .6.. .. then (z −λ)m = minS by 6. there is a rational canonical decomposition for N k Gλ = i=1 ZN. 0 Exercises 7. Set ni = BN. Now by 7. and note that [N BN. Remark 7.4 Jordan Canonical Form Deﬁnition 7..1.m (2) In (1). Let T ∈ L(V ) and λ ∈ sp(T ). 1 0 λ λ Proposition 7. This means that the operator N = (T −λI)Gλ is nilpotent of order m.. 0 1 0 98 ..vi for some k ∈ N where vi ∈ Gλ for all i ∈ [n]. i. ..2. . J looks like 1 0 .1. But this is (−1)n det(A) 0 m det(Ai ) = i=1 (−1)ni ai 0 = (−1) n i=1 ai . 1 . . .vi ]BN.. . Proof. . If m ∈ N is maximal such that (z − λ)m  minT and S = T Gλ . .4.
4 is called a Jordan canonical basis for T .2.6. Then there is a basis B of V such that [T ]B is a block diagonal matrix where each block is a Jordan block. and if we set k C= i=1 CN. .3.. 1 0 0 0 i..4. Deﬁnition 7. Thus [T Gλ ]C = [N ]C + λI is a block diagonal matrix where the ith block is a Jordan block of size ni associated to λ. and the matrix [T ]B is called a Jordan canonical form of T . . By 6. Setting CN.as the minimal polynomial of N BN.. (1) Let T ∈ L(V ). i. and if λ ∈ sp(T ). . 0 . we have that [N CN.5. A basis B for V as in 7. .vi is a matrix in Mni (F) of the form 1 0 .vi ]CN. 99 . there is a basis Bλ of Gλ such that T Bλ is a block diagonal matrix in MMλ (F) where each block is a Jordan block associated to λ. we see that if we set k B= i=1 BN.4. . it is a Jordan block associated to 0 of size ni .vi . by 7. Suppose T ∈ L(V ) and minT splits in F[z].. we have that [N ]C is a block diagonal matrix where the ith block is a Jordan block of size ni associated to 0. Setting B= λ∈sp(T ) Bλ gives the desired result. .vi .4.vi is z ni . ..e. Proof. The diagonal elements of [T ]B are precisely the eigenvalues of T . . Thus.4.vi = (T − λI)j vi j = ni − 1. we have that [N ]B is a block diagonal matrix where the ith block is the companion matrix of the polynomial z ni . for each λ ∈ sp(T ).4. Theorem 7. reordering BN. we have minT splits if and only if V = λ∈sp(T ) Gλ .e. λ appears exactly Mλ times on the diagonal of [T ]B .vi .
v . . (1) Suppose J ∈ Mn (F) is a Jordan block associated to λ ∈ F 1 0 . then there is a minimal n such that (T −λI)n v = 0. . Corollary 7.v = ((T −λI)n−1 v. 1 0 λ λ One can easily see that (J − λI)ei = ei−1 for all 1 < i ≤ n and (J − λI)e1 = 0. 0 An 100 .8.9. so the standard basis of Fn is a Jordan chain for LJ associated to λ with lead vector e1 . then we see by (1) that the standard basis is a disjoint union of Jordan chains associated to λ. Proof. λn }. Then B is a disjoint union of Jordan chains associated to the eigenvalues of λ. Suppose B is a Jordan canonical basis for T ∈ L(V ). . . . . Let T ∈ L(V ).4. .4.. [T ]B = . (2) If A is a block diagonal matrix J1 . and note that it is an eigenvector corresponding to the eigenvalue λ.(2) A matrix A ∈ Mn (F) is said to be in Jordan canonical form if the standard basis is a Jordan canonical basis of Fn for LA . v) is called a Jordan chain for T associated to the eigenvalue λ of length n.6. (1) The set CT −λI. . J = .. .. A= 0 0 Jn where Ji is a Jordan block associated to λ ∈ F for all i ∈ [n]. Recall that if v ∈ Gλ \{0} for some λ ∈ sp(T ). Examples 7. Let sp(T ) = {λ1 .7. This means that BT −λI. We have that [T ]B is a block diagonal matrix A1 0 . .v = (T − λI)i v i = 0. (1) (2) Deﬁnition 7... Examples 7.. n − 1 is a basis for ZT −λI. .. . The vector (T − λI)n−1 v is called the lead vector of the Jordan chain CT −λI..4.4. .v . . . (T −λI)v.
so let A1 . and set Ti = T Gλi . .where each Ai is a block diagonal matrix corresponding to λi ∈ sp(T ) composed of Jordan blocks: i 0 J1 .. Exercise 7. . we discuss to what extent a rational canonical form of T ∈ L(V ) is unique.11. and to what extend the rational canonical form is unique if minT splits.5 Uniqueness of Canonical Forms In this section.2 and 7. Hence B is a disjoint union of Jordan chains associated to the eigenvalues of T . Classify all pairs of polynomials (m. This means that [S]C is a block diagonal matrix.5. 0 Now set Bi = B ∩ Gλi for each i ∈ [n] so that n i Jni B= i=1 Bi . and there is a subset C ⊂ B that is a Jordan canonical basis for S.4 of [2]. . p) ∈ C[z]2 such that there is an operator T ∈ L(V ) where V is a ﬁnite dimensional vector space over C such that minT = m and charT = p. . and it is easy to check that Bi is the disjoint union of Jordan chains associated to λi for all i ∈ [n]. An be these blocks.4.4. [S]C = . Then [Ti ]Bi = Ai .1 (Jordan Canonical Dot Diagrams).10. The information from this section is adapted from sections 7. Exercises Exercise 7. If λ ∈ sp(T ) and m ∈ N is the largest integer such that (z −λ)m  minT . Show that two matrices A. we must have that 101 . .e A1 0 . We now impose the condition that in order for [S]C to be in Jordan canonical form. 7. then we have that S = T Gλ ∈ L(Gλ ) has minimal polynomial z m . Ordering Jordan Blocks: Suppose T ∈ L(V ) such that minT splits. 0 An We know that Ai is a Jordan block of size mi where mi ≤ m for all i ∈ [n]. Notation 7.. and let B is a Jordan canonical basis for T . i.. B ∈ Mn (C) are similar if and only if they share a Jordan canonical form. Ai = .
and let ri denote the number of dots in the ith row of the dot diagram for T relative to B as in 7.mi ≥ mi+1 for all i ∈ [n − 1]. (2) the ith column contains mi dots. . . Then Tλ − λI is nilpotent. and let B be a Jordan canonical basis for T . . . let mi = CT. we ﬁrst decompose Gλ into T cyclic subspaces for all λ ∈ sp(T ). The dot diagram for T relative to B is the disjoint union of the dot diagrams for Tλ relative to the Bλ for λ ∈ sp(T ). • T vn . Write B as an array of dots such that (1) the array contains n columns starting at the top. . . . . • vn . . • v2 • T v1 • v1 Operators: Suppose that T ∈ L(V ). Suppose T ∈ L(V ) is nilpotent of order m.vi for i ∈ [n]. .5. and (3) the (i. . (2) r1 = dim(V ) − rank(T ). Nilpotent Operators: Suppose minT = z m for some λ ∈ F.5.vi . .1 for i ∈ [n]. Let B be a Jordan canonical basis for T . 102 . minT (z) = z m . i. For i ∈ [n]. This means that to ﬁnd a Jordan canonical basis for T ∈ L(V ). We can picture B as an array of dots called a Jordan canonical dot diagram for T relative to the basis B. Let n B= i=1 CT. . and then we order these bases by how many elements they have.vi be a Jordan canonical basis of V consisting of disjoint Jordan chains CT.2. then Rk = nullity(T k ) = dim(ker(T k )) for all k ∈ [n]. . • T m1 −1 v1 · · · • T m2 −1 v2 · · · • T mn −1 vn • T m1 −2 v1 · · · • T m2 −2 v2 · · · • T mn −2 vn .e. . and (3) rk = rank(T k−1 ) − rank(T k ) for all k > 1. • T v2 . . . j)th dot is labelled T mi −j vi . . . so we may apply (2) to get a dot diagram for Tλ − λI relative to Bλ for each λ ∈ sp(T ). Let Bλ = B ∩ Gλ for λ ∈ sp(T ) and Tλ = T Gλ . Lemma 7. then we take bases of these T cyclic subspaces. Then k (1) If Rk = j=1 rj for k ∈ [n]. . and note that the ordering of Jordan blocks requires that mi ≥ mi+1 for all i ∈ [n − 1]. . . .
namely T mi −j vi for all i ∈ [n] and j ∈ [k]. so nullity(T k ) ≥ Rk .5. we have B = rank(T k ) + nullity(T k ) ≥ B − Rk + Rk = B. 0 [Tn ]Bn 103 . and the Jordan canonical form [T ]B is unique. Hence [Tλ ]Bλ = [Tλ − λI]Bλ + λI is unique.5. we have [T1 ]B1 0 . and [Tλ − λI]Bλ is in Jordan canonical form.Hence the dot diagram for T is independent of the choice of Jordan canonical basis B for T as each ri is completely determined by T . we set Bλ = B ∩ Gλ and Tλ = T Gλ for all λ ∈ sp(T ). (2) This is immediate from (1) and the ranknullity theorem. . which is unique by 7. so equality holds. by (1) we have that k k−1 rk = i=1 ri − i=1 ri = (dim(V )−nullity(T k ))−(dim(V )−nullity(T k−1 )) = rank(T k )−rank(T k−1 ).1) of T ∈ L(V ) diﬀer only by a permutation of the eigenvalues of T . T k (T mi −j vi ) i ∈ [n] and j > k is a linearly independent subset of B. [T ]B = . Proof. Moreover. so is unique up to the ordering of the λ ∈ sp(T ). Theorem 7. Two Jordan canonical forms (using the convention of 7..1. In fact. Suppose T ∈ L(V ) and minT splits in F[z]. .5. Note further that Tλ − λI is nilpotent. (1) We see that there are at least Rk linearly independent vectors in ker(T k ). if {λ1 . then setting Ti = Tλi and Bi = Bλi . By the ranknullity theorem. Proof. (3) For k > 1. and we note that B= λ∈sp(T ) Bλ . If B is a Jordan canonical basis of T as in 7.5.2. λn } is an enumeration of sp(T ) and n B= i=1 Bλi . so rank(T k ) ≥ B − Rk .. . .3. and we must have that nullity(T k ) = Rk .
Operators: As before. 2. Set d = deg(p). Now by the ordering of the companion blocks. and let mi ∈ N be the minimal number such that p(T )mi vi = 0. The dot diagram for T is then the disjoint union of the dot diagrams for the Tp ’s relative to the Bp ’s. This means that [S]C is a block diagonal matrix. Let n B= i=1 BT. so let A1 . and there is a subset C ⊂ B that is a rational canonical basis for S.1. This means that to ﬁnd a rational canonical basis for T ∈ L(V ).5. Let d = deg(p).vi be a rational canonical basis of V for T . j)th dot is labelled p(T )mi −j vi . the array contains n columns starting at the top. . An be these blocks.e A1 0 .5. we must have that mi ≥ mi+1 for all i ∈ [n − 1]. Ordering Companion Blocks: Let B is a rational canonical basis for T ∈ L(V ). Note that there are exactly B/d dots in the dot diagram. i. 0 An We know that Ai is the companion matrix for pmi where mi ≤ m for all i ∈ [n].vi .11. . let Bi = BT. . for a general T ∈ L(V ). let Zi = ZT. .vi . irreducible p ∈ F[z] and some m ∈ N.Notation 7. we must have that mi ≥ mi+1 for all i ∈ [n − 1]. [S]C = . Suppose T ∈ L(V ) with minT = pm for a monic irreducible p ∈ F[z] and some m ∈ N. and then we order these bases by how many elements they have.vi 104 . then we take bases of these T cyclic subspaces.5.. and we let Bp = B ∩ Gp and Tp = T Gp for each distinct monic irreducible p  minT . The dot diagram for T relative to B is given by an array of dots such that 1. the ith column has mi dots. let Ti = T Zi . then we have that S = T Kp ∈ L(Kp ) has minimal polynomial pm . we let B be a Rational canonical basis for T . . B can be represented as an array of dots called a rational canonical dot diagram for T relative to B. minTi = pmi and mi ≤ m for all i ∈ [n]. Lemma 7. let n B= i=1 BT. For i ∈ [n]. and note that Bi  = dmi by 7. We now impose the condition that in order for [S]C to be in rational canonical form.e.4 (Rational Canonical Dot Diagrams). If p ∈ F[z] is a monic irreducible factor of minT and m ∈ N is the largest integer such that pm  minT . the (i. i. Case 1: Suppose minT = pm for some monic. and 3. we ﬁrst decompose Kp into T cyclic subspaces for all monic irreducible p ∈ F[z] dividing minT .
a nilpotent operator of order m.1. and the rational canonical form [T ]B is unique. Then we see that deg(qh ) < d for all h ∈ [mi ]. We show Ci is a basis for Zi for a ﬁxed i ∈ [n].5. l=0 h=1 and set d−1 qh (z) = l=0 λh.l z l for h ∈ [mi ]. h=1 Now setting mi q(z) = h=1 p(z)mi −h qh (z). Then (1) Set Cli = (p(T )mi −h T l vi h ∈ [mi ] is a p(T )cyclic basis for Zp(T ). Then Ci = l=0 Cli is a basis for Zi = ZT. We may suppose p(z) = z as this case would be similar to 7.5. Proof. minT would split. and d 1 (3) rj = (rank(p(T )j−1 ) − rank(p(T )j )) for j > 1. and mi p(T )mi −h qh (T )vi = 0.be a rational canonical basis of V for T .11 as T l vi = 0 and mi is minimal such that p(T )mi T l vi = 0 (in fact the restriction of T l to Zi is invertible as if q(z) = z l . d Hence the dot diagram for T is independent of the choice of rational canonical basis B for T as each rj is completely determined by p(T ). We have that (2) and (3) are immediate from (1) and 7. so it is a Jordan canonical n basis of Zi for p(T )Zi . It suﬃces to prove (1).5. 105 .vi .2 as the Jordan canonical form of p(T ).l p(T )mi −h T l vi = 0. and let mi ∈ N be minimal such that p(T )mi vi = 0.5 as p. then q(T ) is bijective on Gp = V by the proof of 6. Hence C = i=1 Ci is a Jordan canonical basis of V for p(T ). Let rj be the number of dots in the j th row of the dot diagram for T relative to B for j ∈ [l]. is unique.T l vi for 0 ≤ l < d d−1 and i ∈ [n]. The fact that Cli is linearly independent for all i. q are relatively prime). Suppose d−1 mi λh.3 as in this case. l comes from 7. 1 (2) r1 = (dim(V ) − rank(p(T ))).
1. λ3 ∈ C.5. This follows immediately from 7.6 Holomorphic Functional Calculus For this section. If f = 0.l = 0 for all h. Hence p  q. a contradiction as deg(q) < deg(pmi ).5. which is a contradiction as q(T )vi = 0.11.5 and 6. λ2 . so q = 0. Exercise 7. n It follows immediately that C is a basis for V as V = i=1 Zi .6. Theorem 7. Hence λh.1. and let f ∈ F[z] such that f ps = q. Exercises V will denote a ﬁnite dimensional vector space over F. l.6. then p. Now we see that Ci  = dmi = dim(Zi ) = dmi by 7.5. we may deﬁne the spectrum of a similarity class of L(V ) to be the spectrum of one of its elements.1. But then 0 = f (T )−1 0 = f (T )−1 f (T )p(T )s vi = p(T )s vi . In this sense. If p q. we must have that 0 = LC(q) = LC(pmi −1 q1 ) = LC(pmi −1 ) LC(q1 ) =⇒ q1 = 0.11. and Ci is linearly independent. As deg(qh ) < d = deg(p) for all h. Once more. (1) The open ball of radius r > 0 centered at z0 ∈ C is Br (z0 ) = z ∈ C z − z0  < r . f are relatively prime. Proof. Two rational canonical forms of T ∈ L(V ) diﬀer only by a permutation of the irreducible monic factor powers of minT . so f (T ) is invertible on Gp = V . how many similarity classes of matrices in M7 (C) have spectrum {λ1 . q are relatively prime. Hence f = 0. We repeat this process to see that qh = 0 for all h ∈ [mi ]. and we are ﬁnished by the extension theorem. Suppose T ∈ L(V ). V will deﬁne a ﬁnite dimensional vector space over C.5. then p. as deg(qn ) < d for all h. 106 . so q(T ) is invertible on Gp = V by the proof of 6.1.we have that deg(q) < deg(pmi ) and q(T )vi = 0.7. so we must have that s ≥ mi . Recall that spectrum is an invariant of similarity class by 5. Let s ∈ N be maximal such that ps  q.5. λ3 }? 7. Deﬁnition 7. we now have that 0 = LC(q) = LC(pmi −2 q2 ) = LC(pmi −2 ) LC(q2 ) =⇒ q2 = 0. λ2 .6. Given distinct λ1 .
+ . there is a power series representation of f on Bε (z0 ) ⊂ U for some ε > 0. A= .. 1 0 λ We see that we can write A canonically as and N ∈ Mn (C) nilpotent of order n: λ ... 1 λ 0 0 0 0 N 107 .. for each z0 ∈ U .6. .. f (z) = k=2 λk k(k − 1)(z − z0 )k−2 .6...4. .3.. Suppose U ⊂ C is open. etc.. Jordan Blocks: Suppose A ∈ Mn (C) is a Jordan block. Note that if f is holomorphic on U .. i.e.2.. n! A = D + N with D ∈ Mn (C) diagonal (D = λI) 1 0 . (1) (2) Deﬁnition 7.6. there is a λ ∈ C such that λ 1 0 . Examples 7.. . 0 D f (n) (z0 ) . (1) (2) Deﬁnition 7. . In particular.6.5 (Holomorphic Functional Calculus). We have ∞ ∞ f (z) = k=1 λk k(z − z0 )k−1 . Examples 7. then f is inﬁnitely many times diﬀerentiable at each point in U ..e.(2) A subset U ⊂ C is called open if for each z ∈ U . i. there is an ε > 0 and a sequence (λk )k∈Zgeq0 such that ∞ f (z) = k=0 λk (z − z0 )k converges for all z ∈ Bε (z0 ). f (n) (z0 ) = λn n =⇒ λn = This is Taylor’s Formula... there is an ε > 0 such that Bε (z) ⊂ U . A= . A function f : U → C is called holomorphic (on U ) if for each z0 ∈ U ...
We deﬁne f (K1 ) 0 . so if ε > 0 and f : Bε (λ) → C is holomorphic. we must have −1 that S = S1 S2 is a generalized permutation matrix. and since J1 = S1 S2 J2 S2 S1 . If J1 . ∞ n−1 . . By 7. then there are invertible S1 . . and we are ﬁnished. we know J1 and J2 diﬀer −1 −1 only by a permutation of the Jordan blocks. there is a Jordan canonical basis B for T . and suppose f : U → C is holomorphic. . we deﬁne f (T ) = [f ([T ]B )]−1 . . Let K1 . = . .. Thus S1 f (J1 )S1 = S2 f (J2 )S2 .Now sp(T ) = {λ}. Kn be the Jordan blocks of J. J2 are two Jordan canonical forms of A. . S2 ∈ Mn (F) such that Ji = Si−1 ASi −1 −1 for i = 1.4.6. B We must check that if B is another Jordan canonical basis for T . . k k f (A) = f (D + N ) = µk (D + N − λI) = µk N = . Operators: Suppose T ∈ L(V ). then [f ([T ]B )]−1 = B [f ([T ]B )]−1 . and let λi ∈ sp(A) be the diagonal entry of Ki for i ∈ [n] . Thus. .4. It is easy to see that f (J2 ) = Sf (J1 )S −1 as the Jordan blocks do not interact under multiplication by the generalized permutation −1 −1 matrix..6. 108 .. Let U ⊂ C be open such that sp(A) ⊂ U . . . By 7. 2. By the above discussion. so [T ]B is in Jordan canonical form. . f (λ) 0 f (λ) Matrices: Suppose A ∈ Mn (C).. so we may deﬁne µ0 µ1 · · · µn−1 . µ1 k=0 k=0 0 µ0 (n−1) (λ) f (λ) f (λ) · · · f (n−1)! . Then there is an invertible S ∈ Mn (C) and a matrix J ∈ Mn (C) in Jordan canonical form such that J = S −1 AS. .3.. . ..4... We must show S1 f (J1 )S1 = S2 f (J2 )S2 . . 0 f (Kn ) We then deﬁne f (A) = Sf (J)S −1 . .13. f (J) = . then there is a sequence (µn )n∈Zgeq0 such that ∞ f (z) = k=0 µk (z − λ)k converges for all z ∈ Bε (λ). B Examples 7.5. This follows directly from the above discussion and 3. we know how to deﬁne f (Ki ) for all i ∈ [n]. We must check that f (A) is well deﬁned..
we have that cos(A) = cos(U DU ∗ ) = U cos(D)U ∗ = U cos =U i 0 U∗ 0 −i cos(i) 0 U ∗ = U cosh(1)IU ∗ = cosh(1)I. g : U → C be holomorphic such that sp(T ) ⊂ U .6. (2) (f g)(T ) = f (T )g(T ). Proposition 7.6. Since A = U DU ∗ . let f.(1) If p ∈ F[z]. Exercise.6.7 (Spectral Mapping).8. Exercise. Suppose T ∈ L(V ) and U ⊂ C is open such that sp(T ) ⊂ U . Then sp(f (T )) = f (sp(T )). (2) We will compute cos(A) for A= We have that U= 1 1 −i i ∈ M2 (C) 0 −1 .2 agrees with the p(T ) deﬁned in 7. 109 . and let h : V → C be holomorphic such that sp(f (T )) ⊂ V . 0 cos(−i) Theorem 7.6. 1 0 is a unitary matrix that maps the standard orthonormal basis to the orthonormal basis consisting of eigenvectors of 0 −1 A= . by the entire functional calculus. 1 0 We then see that D= i 0 0 −i = 1 1 −i i ∗ 0 −1 1 0 1 1 −i i = U ∗ AU. Hence the holomorphic functional calculus is a generalization of the polynomial functional calculus when V is a complex vector space. Proof. we have that the p(T ) deﬁned in 4. (3) (λf )(T ) = λf (T ) for all λ ∈ C. Let T ∈ L(V ).5. Proof. and (4) (h ◦ f )(T ) = h(f (T )). The holomorphic functional calculus satisﬁes (1) (f + g)(T ) = f (T ) + g(T ).
When is f holomorphic? Hint: Look at f (z) = = z 1 − (1 − z) Exercise 7. First look at the case where A is a single Jordan block associated to λ ∈ C.11 (Square Roots). Compute sin 0 −1 1 0 and exp 0 −π . when does a Jordan block associated to λ = 0 have a square root? 110 . Determine which matrices in Mn (C) have square roots.6. Exercise 7. √ Hint: The function g : C \ {0} → C given by g(z) = z is holomorphic.Exercises V will denote a ﬁnite dimensional vector space over C. In particular.10. 1 1 .6.9. all A ∈ Mn (C) such that there is a B ∈ Mn (C) with B 2 = A.e. i.6. Suppose V is a ﬁnite dimensional vector space over C and T ∈ L(V ) with sp(T ) ⊂ B1 (1) ⊂ C. π 0 Exercise 7. Use the holomorphic functional calculus to show T is invertible.
i. (1) Note that linearity in the ﬁrst variable and self adjointness of a sesquilinear form ·. An inner product space over F is a vector space V over F together with an inner product on V.2. · : V × V → F such that (i) ·. · is conjugate linear in the second variable. v : V → F given by u → u. A sesquilinear form on V is a function ·.e. Remarks 8. 111 . · implies that ·. u is a linear transformation. v is a linear transformation. then a sesquilinear form on V is usually called a bilinear form. The sesquilinear form ·.1. for each v ∈ V . conjugate linear in the second variable means linear in the second variable. v ∈ V . · is conjugate linear in the second variable. · is linear in the ﬁrst variable. and V will denote a vector space over F. the function v. · is linear in the ﬁrst variable and symmetric.e.1. · is called (1) self adjoint if u.1 Sesquilinear Forms Deﬁnition 8.1. i. 8. and the adjective “self adjoint” is replaced by “symmetric. then it is also linear in the second variable. and (3) deﬁnite if v. F will denote either R or C. (4) an inner product it is a positive deﬁnite selfadjoint sesquilinear form. for each v ∈ V . v = 0 implies v = 0.” Note that in this case. (2) If V is a vector space over R.Chapter 8 Sesquilinear Forms and Inner Product Spaces For this chapter. Note further that if ·. · : V → F given by u → v. (2) positive if v. the function ·. v = v. v ≥ 0 for all v ∈ V . u for all u. and (ii) ·.
is deﬁnite.3. v1 − u1 . v1 + v1 .1. q = i=0 p(xi )q(xi ) is an inner product on Pm if m ≤ n. (5) Let a. Then the function b p. Proposition 8. q = a p(x)q(x) dx is an inner product on F[x]. B = tr(B ∗ A).12 is a complex inner product space with inner product given by u1 + iv1 . (6) Suppose V is a real inner product space. · C C = u1 . 112 . xn be n + 1 points in F. (1) The function n u.1. v = u + v. (1) If V is a vector space space over R and ·. then 4 u.Examples 8. u + v − u − v. g = a f (x)g(x) dx is the standard inner product on C([a. . b]. · is a symmetric bilinear form. b ∈ R. note that ·. . v = i=1 e∗ (u)e∗ (v) i i is the standard inner product on Fn . u − v . v2 + i( u2 . Then n p. but it is not deﬁnite if m > n (including m = ∞).1. (3) The function b f.4 (Polarization Identity). Then the complexiﬁcation VC deﬁned in 2. . u2 + iv2 In particular. . x1 . v2 ). F) (4) trace : Mn (F) → F given by trace(A) = i=1 n Aii induces an inner product on Mm×n (F) by A. (2) Let x0 .
Hint: Show that if f (x) > 0 for some x ∈ [a. F) is an inner product.1. 113 2 = . v v. · 2 : Mn (F) × Mn (F) → F given by A. v + λ2 v. v. we must have u. u v. · is positive. v v. v v. Show that the sesquilinear form b f.1. u − . b]. u u. B trace(B ∗ A) is an inner product on Mn (F). u v. This is immediate from the deﬁnition of a symmetric bilinear form or self adjoint sesquilinear form respectively. v.(2) If V is a vector space space over C and ·. v 2 ≤ u. v ∈ V . v. Proof. v 2 ≤ u. b]. Show that the function ·. Proposition 8. Case 2: If v. equation (∗) holds for λ= Hence. v + 0 ≤ u. · be a positive. u + ik v . u − λv = u. v 2 v. g = a f (x)g(x) dx on C([a. v v. then by the continuity of f . v = k=0 ik u + ik v. then in particular. we know that for all λ ∈ F . v 2 ≤ u. v for all u.5 (CauchySchwartz Inequality). v ∈ V. v 2  u. u v.  u. then 3 4 u. u =⇒  u. v = 0. u −2 Re v. there is some δ > 0 such that f (y) > 0 for all y ∈ (x − δ. x + δ) ∩ (a. u −2  u. Since this holds for all λ. Exercise 8. v = 0. v . (∗) Case 1: If v. v . self adjoint sesquilinear form on V . Since ·.1. 0 ≤ u − λv. u . Let u. then 2 Re λ u. Proof.6. u v. v = 0. v Exercises Exercise 8. Let ·. v Thus. v ≤ u. · is a self adjoint sesquilinear form.7. v 2 + = u. v 2  u. v = u. b). Then  u. u − 2 Re λ u.
y ∈ V . (2) Let x ∈ V . A norm on V is a function (i) (deﬁniteness) v = 0 implies v = 0. Then S = T .4 that it is the norm induced by the standard inner product on Fn . (2) The 1norm on C([a. w for all w ∈ W . v ∈ V . i We will see in 8. (V. Applying (1). b]. (3) Suppose S. Examples 8.3.2. (2) Suppose S. = a 114 . y for all x. y ∈ V . T ∈ L(V ) such that x. and (iii) (triangle inequality) u + v ≤ u + v for all u. Then S = T . ·.1. f 2 b 1/2 f 2 dx . T y for all x. we see Sx = T x. (3) This follows immediately from selfadjointness of an inner product and (2). Since this is true for all x ∈ V .2. then true for a speciﬁc one.2. w = 0 for all w ∈ V . Sy = x.2. Then u = v. · ) will denote an inner product space over F. T ∈ L(V ) such that Sx. Proof. In particular. so u − v − 0 by deﬁniteness. u − v = 0. w = v. we have S = T . F) is given by b f and the 2norm is given by 1 = a f  dx. (1) The Euclidean norm on Fn is given by n · : V → R≥0 such that v = i=1 e∗ (v). Hence u = v. (ii) (homogeneity) λv = λ · v for all λ ∈ F and v ∈ V . Hence u − v.2 Inner Products and Norms From this point on. and set u = Sx and v = T x. (1) Suppose u.2. (1) We have that u − v.” Proposition 8. this holds for w = u − v. We illustrate an important proof technique called “true for all. v ∈ V such that u. y = T x. Deﬁnition 8.8.
5. v ∈ V. Clearly · is deﬁnite by deﬁnition. v after taking square v. v ≤ u.2.4.2. namely f . Proof. 115 . and f ∞ = 1. achieves its maximum on a closed. In many books.We will see in 8. if [a. then we say u is perpendicular to v. which says that a continuous. v + v.7. b]. Proposition 8. self adjoint sesquilinear forms as was done in 8. v is a norm. These norms are all diﬀerent.8. v  ≤ u roots. Taking square roots now gives the desired result. It exists by the Extreme Value Theorem. 2 for all u. Sometimes this is denoted as u ⊥ v. the CauchyScwartz inequality is proved for inner products (not positive. It is homogeneous since λv = λv. Examples 8. v = λ · v .2. so we have u+v 2 = u + v. Remark 8. and it is usually in the form used in the proof of 8. v ≤ u 2 + 2 u v + v 2 = ( u + v )2 . f 2 = π.2.5). Proof. 2π] and f : [0. F). (2) Let S ⊂ V be a subset. (1) If u. namely [a. 2π] → R is given by f (x) = sin(x). Then S ⊥ = v ∈ V v ⊥ s for all s ∈ S is a subspace of V . then f 1 = 4. u + v = u.6 (Parallelogram Identity). bounded interval. The function · : V → R≥0 given by v = usually called the induced norm on V . (3) The norms deﬁned in (2) can all be deﬁned for F[x] as well. Proposition 8.1. v = 0. λv = λ2 v.2.4:  u.2. b] . v  + v. u + 2 Re u. It is Now the CauchySchwartz inequality can be written as  u. u + 2 u. This is immediate from the deﬁnition of Deﬁnition 8.4 that the 2norm is the norm induced from the standard inner product on C([a. The induced norm u+v 2 · on V satisﬁes + u−v 2 =2 u 2 +2 v · . (3) We say the set S1 is orthogonal to the set S2 . b] = [0. For example. b].2. denoted S1 ⊥ S2 if v ∈ S1 and w ∈ S2 implies v ⊥ w. The ∞norm is given by f ∞ = max f (x) x ∈ [a. v  ≤ u v . v ∈ V and u. realvalued function.
vj = 0 if i = j. u + v = u.1. Let {v1 . 8. (2) A subset S ⊂ V is called an orthonormal set if S is an orthogonal set and v ∈ S implies v = 1. Suppose u. .3. Moreover.2. but can be in an orthogonal set. v = u + v 2. . We ∗ apply ϕj to the above expression (“hit is with vj ”) to get n n n ∗ λi vj (vi ) i=1 0 = vj i=1 λi vi = = i=1 λi vi . v + v. vj = λj vj . = (2) Suppose we have the inner product on R[x] given by 1 p. Proposition 8. λn ∈ F such that n λi vi = 0. (1) A subset S ⊂ V is called an orthogonal set if u.(1) The standard basis vectors in Fn are all pairwise orthogonal. q = 0 p(x)q(x) dx. Let S be an orthogonal set such that 0 ∈ S. 2 = u + v.3. .9 (Pythagorean Theorem). . vj for all u ∈ V . Exercises Exercise 8. vj ∈ V ∗ given by vj (u) = u. and suppose there are scalars λ1 .2. Examples 8.2. v ∈ V such that u ⊥ v. . Hence all orthonormal sets are linearly independent. vj = λj vj 2 . we have vj 116 = 0. i=1 ∗ ∗ Then we have a linear functional vj = ·. . . u + v 2 2 2 = u + v 2. . Since vj = 0. (1) Zero is never in an orthonromal set. then {v}⊥ ∼ Fn−1 . so λj = 0. Proof. If we pick v ∈ Fn . u + 2 Re u. Then S is linearly inde/ pendent. Then 1 ⊥ x − 1/2. vn } be a ﬁnite subset of S. {1}⊥ is inﬁnite dimensional as xn − 1/(n + 1) ∈ {1}⊥ for all n ∈ N. 2 as vi . v ∈ S with u = v implies that u ⊥ v. Proposition 8.3.3.10. Then u+v Proof.3 Orthonormality Deﬁnition 8. .
. Examples 8. n − 1 ⇒ n: Suppose we have a linearly independent set S = {v1 . . Set n−1 wn = vn − i=1 vn+1 . . . . . . . . If V is a ﬁnite dimensional inner product space over F. . (1) The standard basis of Fn is an orthonormal basis. un . . . . (2) The set {1. But we immediately see span(S) ⊆ span(R) as u1 . . . un−1 and vn . . vn } be a linearly independent subset of V . . . . . x. n − 1}. .4. Proof. uj = vn+1 . wn It is clear that wn+1 ⊥ ui for all j = 1. . uj = 0. Then V has an orthonormal basis. . uk } = span{v1 .3. un+1 } is an orthonormal set. . . Let B be a basis of V . .6. vn−1 ∈ span(S) and vn is a linear combination of u1 .4 (GramSchmidt Orthonormalization). . . C is linearly independent by 8. Then there is an orthonormal set {u1 . . n}. uj = vn+1 . vn }.3. Deﬁnition 8. . . . . n − 1. . . there is an orthonormal set {u1 . . xn } is a basis. . . there is an orthonormal set C such that span(B) = span(C). span{u1 . x2 . . By the induction hypothesis. Moreover. un−1 } such that span{u1 . .Theorem 8. 117 .5. . . . vk } for each k ∈ {1. . . uj − vn+1 . q = 0 p(x)q(x) dx. . Proof.7 (Existence of Orthonormal Bases).3. . . n = 1: Setting u1 = v1 / v1 works. un } such that for each k ∈ {1. . . and R = {u1 . . . . uk } = span{v1 . . Hence C is an orthonormal basis for V. Let S = {v1 . uj uj . uj − i=1 vn+1 . . . . but not an orthonormal basis of Pn with the inner product given by 1 p. . The proof is by induction on n.3.3.3. . vk }. . ui ui . Theorem 8. . uj : n−1 wn+1 . By 8. Hence un ⊥ uj for all j = 1.3. . a subset B ⊂ V is called an orthonormal basis of V if B is an orthonormal set and B is a basis of V . . ui ui and un = wn . . It is clear that span(R) ⊆ span(S) since v1 . . . . It remains to show that span(S) = span(R). Recall that span(S \ {un }) = span(R \ {vn }). . Let V be a ﬁnite dimensional inner product space over F. un−1 ∈ span(R) and un is a linear combination of u1 . . . . n − 1 by applying the linear functional ·. .
vj B = i=1 λi vi . vi 2 for all v ∈ V. . then we have that n n v. . Suppose B = {v1 . ej Fn = 1 0 if i=j else. . Then if w ∈ V . vj = 0 if i = j. . = [vi ]B . vj B ∗ = λj = vj (v). vn } be an orthonormal basis of V . Then n v 2 = i=1  v. vj as vi . vj = i=1 λi vi . . vn ) is an ordered basis of the vector space V . Fact 8.9). Let B = {v1 .9 (Parseval’s Identity). . vi i=1 If B is an orthonormal basis for V .10.3. vj gives the desired formula.8 using induction and the Pythagorean Theorem (8. we see that the linear ∗ functional vj = ·. so B is an orthonormal basis of V with inner product ·.3. The reader may check that this identity follows immediately from 8. · B . . . .3. [vj ]B Fn = ei . and the w.3. . Proof. . Suppose B = (v1 . . vi vi . We may impose an inner product on V by setting u. then this expression simpliﬁes to n w= i=1 w. vj for all j ∈ [n]. vj B B = [u]B . there are scalars λ1 . Dividing by vj . λn ∈ F such that n w= i=1 λi vi . vi vi . v and we check that vi . vj = λj vj . vn } ⊂ V be an orthogonal set that is also basis of V . . [v]B Fn .2. . . . Since B spans V . vj B = i=1 λi vi . 118 . w= vi .Proposition 8. n w. If n v= i=1 λi vi . Now we apply ·. Proof. vi are called Fourier coeﬃcients of w with respect to B. Proposition 8. Moreover.8. vj to see that n w.
.3. n.4. . Since v = u + w and u ⊥ (w − w ). Proposition 8. wi = 0. u Hence w − w 2 + w−w 2 2 = u + (w − w ) 2 = v−w 2 ≤ v−w 2 = u 2. wj = 0. 8. vn } is an orthonormal basis for V . Let W ⊂ V be a ﬁnite dimensional subspace.3. .2. Show that the linear functionals ·vi . wi wi ∈ W and u = v − w. . Let {w1 . Suppose {v1 . Suppose w ∈ W such that v − w ≤ v − w . w = n w. wi x. . wn } be an orthonormal basis for W .11. n: n u.e. Then x ∈ W ⊥ if x ⊥ wi for all i = 1. Proof. wj = v. Then by 8. by 8. wj − v. Set n w= i=1 v. wn } be an orthonormal basis for W . n w= i=1 w.12. wi wi . . . Proof. v − w ≤ v − w for all w ∈ W . . q = 0 p(z)q(z) dz. wj = 0 for all j = 1. wi wi . Let W be a ﬁnite dimensional subspace of V . = 0 and w = w .9. z 2 . . Then there is a unique w ∈ W minimizing v − w . . . vj : L(V ) → F given by T → T vi . . .1. We show w ∈ W is the unique vector minimizing the distance to v. wj wj . i.8. . Use GramSchmidt orthonormalization on {1.1 as u. Let w ∈ W . z. . . . .3. and let {w1 . 119 . Then we have x. wj = v. vj are a basis for L(V )∗ . i=1 so x ∈ W ⊥ .Exercises Exercise 8. We have u ∈ W ⊥ by 8.2.4.4. . Exercise 8. . and let v ∈ V . z 3 } to ﬁnd an orthonormal basis for P3 (F) with inner product 1 p. wj − i=1 v.4 Finite Dimensional Inner Product Subspaces Lemma 8.
(3) Let W be a ﬁnite dimensional subspace of V . u −→ v 2 The corresponding component operator is the linear functional Cv ∈ V ∗ given by u −→ u. is the operator in L(V ) deﬁned by PW (v) = w where w is the unique vector in W closest to v which exists by 8.2. the formulas simplify to Pv (u) = u. v .6 if (1) holds. Proof. is is the operator in L(V ) given by u.4. and let {w1 . Corollary 8. In particular. We have that (2)(4) follow immediately from 6. this deﬁnition is independent of the choice of orthonormal basis for W . v = 0 and u = λv for some λ ∈ F. PW v . . v . w2 + x2 = w1 . (2) im(PW ) = v ∈ V PW (v) = v = W .2.2. Let W ⊂ V be a ﬁnite dimensional subspace. 120 . Then PW u. w2 ∈ W and x1 .4. v = u. Lemma 8. . (2) Suppose u. Let W ⊂ V be a ﬁnite dimensional subspace. and (4) V = W ⊕ W ⊥ .5. Then by 2. Then the projection onto W is the operator in L(V ) given by n u −→ i=1 u. Proof. Then 2 (1) PW = PW .Deﬁnition 8. wi wi . v ∈ V . v ∈ V with u.3. (3) ker(PW ) = W ⊥ . Then Pu = Pv .8. The projection onto span{v} denoted by Pv .4. v 2 Note that if v = 1. Let u. (1) Let v ∈ V . w2 + x2 = w1 . v v and Cv (u) = u. v ∈ V . and PW = PW . w2 = w1 + x1 . we have PW (v) = w = PW (w). PW (w2 + x2 ) = u. v = PW (w1 + x1 ). PW v for all u. x2 ∈ W ⊥ such that u = w1 + x1 and v = w2 + x2 .1.4. there are unique w1 . Then there is a unique w ∈ W closest to v by 8. .6.4. Examples 8. PW u. Then by the deﬁnition 2 2 of PW . v v.4.4. w2 = w1 + x1 . the projection onto W . . Hence PW (v) = PW (w) = w = PW (v). (1) Let v ∈ V . wn } be an orthonormal basis of W . Then PW .
Exercise 8. (1) Show W is T invariant if and only if PW T PW = T PW . Let T ∈ L(V ).4.7 (Invariant Subspaces). 121 . (2) Show W and W ⊥ are T invariant if and only if T PW = PW T .Exercises V will denote an inner product space over F. and let W ⊂ V be a ﬁnite dimensional subspace.
122 .
. Deﬁnition 9. Suppose T ∈ L(H. Then B ∪ C is an orthonormal basis for H by 2. we will need to restrict our attention to ﬁnite dimensional inner product spaces. We know H = K ⊕ K ⊥ by 8. Proposition 9.5. Hence K ⊂ (K ⊥ )⊥ .7. Then (K ⊥ )⊥ = K. Let K ⊂ H be a subspace.1. . For these notes.3. .3. .1. one only studies ﬁnite dimensional Hilbert spaces.4. By 8.1. . a Hilbert space over F will mean a ﬁnite dimensional inner product space over F.8. so w ∈ span(B) = K. One of the biggest diﬀerences between an undergraduate course on linear algebra and a graduate course in functional analysis is that in the undergraduate course. but w ⊥ ui for all i = 1. 123 .2.1. m. The study of operators on inﬁnite dimensional Hilbert spaces is a vast area of research that is widely popular today.1 Hilbert Spaces The preceding discussion brings up a few questions. there is not much we can say about these questions.4. choose orthonormal bases B = {v1 . . n m w= i=1 w. um } for K and K ⊥ respectively. . Hence. we discuss various types of operators in L(H). .13. . Lemma 9. . Remark 9.Chapter 9 Operators on Hilbert Space 9. Then T = T Pker(T )⊥ .4. In this section. vn } and C = {u1 . It is obvious that K ⊂ (K ⊥ )⊥ .3. ui ui . vi vi + i=1 w. Suppose w ∈ (K ⊥ )⊥ . Then by 8. For this section H will denote a Hilbert space over F. Proof. V ) where V is a vector space. which are particular examples of Hilbert spaces. . What can we say about subspaces of an inner product space V that are not ﬁnite dimensional? Is there a projection PW onto an inﬁnite dimensional subspace? Is it still true that V = W ⊕ W ⊥ if W is inﬁnite dimensional? As we are not assuming a knowledge of basic topology and analysis. .1.
(1) For λI ∈ L(H). We show Φ is surjective. The map T ∗ is called the adjoint of T .1. and Φ is surjective. and Φ is bijective. u → T u. so v = 0 by deﬁniteness. v = (Φ(v) ◦ T )(u) deﬁnes a linear operator on H. The map Φ : H → H ∗ given by v → ·. v ∈ H. v = 0.4. v = 0 for all u ∈ V . v =ϕ v 2 u. so Φ is injective.e. Let T ∈ L(H). We show Φ is injective.2 Adjoints Deﬁnition 9. It is clear that Φ(0) = 0.2. by 9.2. v + T u. 2 v v 2 Since span{v} = ker(ϕ)⊥ . Then in particular. i.3. T ∗ v + u.1. Suppose ·.2.1.5. w = λ u. v v v 2 = ϕ(Pv (u)) = ϕ(u). v ϕ(v) ψ = ϕ(v) =Φ v . Hence ψ = ϕ. 9. Now consider the functional ·.1. u. Then if v ∈ H. which we will denote T ∗ v. We know that H = ker(T ) ⊕ ker(T )⊥ by 8. Then ker(ϕ) = H. Proof. there is a vector in H.4 still works for conjugatelinear transformations. 124 . Suppose λ ∈ F and w ∈ H. Then T u.Proof. Note that the proof of 3. Show that there is a bijective correspondence Ψ : L(H) → {sesquilinear forms on H}. T ∗ w = u. v = 0. 9. such that T u. Then T (v + w) = T v + T w = 0 + T w = T w = T Pker(T )⊥ (v + w). T = T Pker(T )⊥ . Suppose that ϕ ∈ H ∗ with ϕ = 0.5. (λI)∗ = λI. Pick v ∈ ker(ϕ)⊥ such that ϕ(v) = 0. It is obvious that the map is conjugate linear. v = u.1. v. Φ(λu + v) = λΦ(u) + Φ(v) for all λ ∈ F and u.5 (Reisz Representation). T ∗ v for all u ∈ H.2.6 (Sesquilinear Forms and Operators).1.2. Let v ∈ ker(T ) and w ∈ ker(T )⊥ . so ker(ϕ)⊥ = (0).4 and 3. Thus. Examples 9. By the Reisz Representation Theorem. v is a conjugatelinear isomorphism. Exercises Exercise 9. We show the map T ∗ : H → H given by v → T ∗ v is linear. i.e. λT ∗ v + T ∗ w for all u ∈ H. Hence T ∗ (λv + w) = λT ∗ v + T ∗ w by 8. λv + w = λ T u.2 we have ψ(u) = ϕ(v) u. Theorem 9.
(λT + S)∗ = λT ∗ + S. (2) (ST )∗ = T ∗ S ∗ . (3) Suppose H is a real Hilbert space and T ∈ L(H). we have u. v = λ T u.1.e.3. (1) The map ∗ : L(H) → L(H) is conjugatelinear. v + Su. and 1 1 125 0 −i . v ≥ 0 for all v ∈ H.(2) For A ∈ Mn (F).2. we have (LA )∗ = LA∗ . antiautomorphism of period two. v . S ∗ v = u. T ∗ S ∗ v .2.2. T ∗ v = T ∗ v. (5) A matrix A ∈ Mn (F) is called positive (deﬁnite) if LA is positive (deﬁnite). (2) self adjoint if T = T ∗ (note that a self adjoint operator is normal). T ∗∗ u = T ∗∗ u. Let S. In short. By 8.1 we get the desired result. Proposition 9. by 8. we have T u. i 0 . u = v.2. (3) positive if T is self adjoint and T v. Deﬁnition 9. (λT +S)∗ v = (λT +S)u. ∗ is a conjugatelinear.5. By 8. v = u. 1 0 2 1 1 . i. and (4) positive deﬁnite if T is positive and T v. Examples 9. (ST )∗ v = ST u. (1) The following matrices in M2 (F) are normal: 0 −1 1 . T ∈ L(H) and λ ∈ F. (1) For all u. we get T = T ∗∗ . v ∈ H. (TC )∗ = (T ∗ )C . λT ∗ v + u. (λT ∗ +S ∗ )v . v = 0 implies v = 0.2. In other words.1 we get the desired result.4. v = u. (3) For all u. Proof. An operator T ∈ L(H) is called (1) normal if T ∗ T = T T ∗ . and (3) T ∗∗ = (T ∗ )∗ = T . Then (TC )∗ ∈ L(HC ) is deﬁned by the following formula: (TC )∗ (u + iv) = T ∗ u + iT ∗ v. v = u. (2) For all u. v ∈ H. multiplication by the adjoint matrix. we have u.2. Hence. v ∈ H.
Definition 9.2.6.
(1) An operator T ∈ L(H) is called an isometry if T*T = I.
(2) An operator U ∈ L(H) is called unitary if U*U = UU* = I.
(3) An operator P ∈ L(H) is called a projection (or an orthogonal projection) if P = P* = P².
(4) An operator V ∈ L(H) is called a partial isometry if V*V is a projection.

Examples 9.2.7.
(1) Suppose A ∈ Mn(F). Then L_A is unitary if and only if A*A = AA* = I. In fact, L_A is unitary if and only if the columns of A are orthonormal if and only if the rows of A are orthonormal.
(2) The simplest example of a projection is L_A where A ∈ Mn(F) is a diagonal matrix with only zeroes and ones on the diagonal.
(3) If T ∈ L(H) is a projection or (partial) isometry, and if U ∈ L(H) is unitary, then U*TU is a projection or (partial) isometry respectively.
(4) Let H = Pn with the inner product ⟨p, q⟩ = ∫₀¹ p(x) \overline{q(x)} dx. Then multiplication by p ∈ Pn is a unitary operator if and only if |p(x)| = 1 for all x ∈ [0, 1], if and only if p(x) = λ for all x ∈ [0, 1] where |λ| = 1.

Exercises

Exercise 9.2.8. Let B be an orthonormal basis for H, and let T ∈ L(H). Show that [T*]_B = [T]_B*.

Exercise 9.2.9 (Sesquilinear Forms and Operators 2). Let Ψ : L(H) → {sesquilinear forms on H} be the bijective correspondence found in 9.1.6. For T ∈ L(H), show that the map Ψ satisfies
(1) T is self adjoint if and only if Ψ(T) is self adjoint,
(2) T is positive if and only if Ψ(T) is positive, and
(3) T is positive definite if and only if Ψ(T) is an inner product.

Exercise 9.2.10. Suppose T ∈ L(H). Show
(1) if T is self adjoint, then sp(T) ⊂ R,
(2) if T is positive, then sp(T) ⊂ [0, ∞), and
(3) if T is positive definite, then sp(T) ⊂ (0, ∞).
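Examples 9.2.7 (1) can be seen concretely with numpy. The sketch below is illustrative only (not from the notes): it builds a unitary via the QR factorization, and the assertions restate U*U = UU* = I as orthonormality of the columns and rows.

```python
import numpy as np

rng = np.random.default_rng(3)
# Build a unitary via the QR factorization of a random complex matrix.
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(A)

I = np.eye(4)
# U*U is the Gram matrix of the columns of U, and UU* is the Gram matrix
# of the rows, so these say exactly "orthonormal columns/rows".
assert np.allclose(U.conj().T @ U, I)
assert np.allclose(U @ U.conj().T, I)
# Unitaries preserve norms (condition (5) of Theorem 9.3.2 below).
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
assert np.isclose(np.linalg.norm(U @ v), np.linalg.norm(v))
```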
9.3 Unitaries

Lemma 9.3.1. Suppose B is an orthonormal basis of H. Then [·]_B preserves inner products, i.e.,

⟨u, v⟩_H = ⟨[u]_B, [v]_B⟩_{Fⁿ} for all u, v ∈ H.

Proof. Let B = {v1, ..., vn}, and suppose u = Σ_{i=1}^{n} μi vi and v = Σ_{i=1}^{n} λi vi. Then [u]_B = Σ_{i=1}^{n} μi ei and [v]_B = Σ_{i=1}^{n} λi ei, so

⟨[u]_B, [v]_B⟩_{Fⁿ} = Σ_{i=1}^{n} μi λ̄i = ⟨u, v⟩_H.

Theorem 9.3.2 (Unitary). The following are equivalent for U ∈ L(H):
(1) U is unitary;
(2) U* is unitary;
(3) U is an isometry;
(4) ⟨Uv, Uw⟩ = ⟨v, w⟩ for all v, w ∈ H;
(5) ‖Uv‖ = ‖v‖ for all v ∈ H;
(6) if B is an orthonormal basis of H, then UB is an orthonormal basis of H, i.e., U maps orthonormal bases to orthonormal bases;
(7) if B is an orthonormal basis of H, then the columns of [U]_B form an orthonormal basis of Fⁿ; and
(8) if B is an orthonormal basis of H, then the rows of [U]_B form an orthonormal basis of Fⁿ.

Proof.
(1) ⇔ (2): Obvious.
(1) ⇒ (3): Obvious.
(3) ⇒ (4): Suppose U*U = I. Then for all v, w ∈ H,

⟨Uv, Uw⟩ = ⟨U*Uv, w⟩ = ⟨Iv, w⟩ = ⟨v, w⟩.
(4) ⇒ (5): Suppose ⟨Uv, Uw⟩ = ⟨v, w⟩ for all v, w ∈ H. Then

‖Uv‖² = ⟨Uv, Uv⟩ = ⟨v, v⟩ = ‖v‖².

Now take square roots.

(5) ⇒ (1): Suppose ‖Uv‖ = ‖v‖ for all v ∈ H, and let v ∈ ker(U). Then 0 = ‖0‖ = ‖Uv‖ = ‖v‖, so v = 0 and U is injective. Hence U is bijective, as an injective operator on a finite dimensional space is invertible. Moreover, by the polarization identity, ⟨Uv, Uw⟩ = ⟨v, w⟩ for all v, w ∈ H, so ⟨U*Uv, w⟩ = ⟨v, w⟩ for all v, w ∈ H. Hence U*U = I, so U* = U⁻¹, UU* = I as well, and U is unitary.

(4) ⇒ (6): Let B = {v1, ..., vn} be an orthonormal basis of H. Then

⟨Uvi, Uvj⟩ = ⟨vi, vj⟩ = 1 if i = j and 0 else,

so UB = {Uv1, ..., Uvn} is an orthonormal basis of H.

(6) ⇒ (7): Suppose B = {v1, ..., vn} is an orthonormal basis of H, so UB is an orthonormal basis of H. Then [U]_B = [ [Uv1]_B ⋯ [Uvn]_B ], and by 9.3.1 we have

⟨[Uvi]_B, [Uvj]_B⟩_{Fⁿ} = ⟨Uvi, Uvj⟩ = 1 if i = j and 0 else,

so the columns [Uvi]_B of [U]_B form an orthonormal basis of Fⁿ.

(7) ⇒ (8): Suppose B is an orthonormal basis of H and the columns of [U]_B form an orthonormal basis of Fⁿ. Then we see that [U]_B*[U]_B = I = [U]_B[U]_B*, since a square matrix with a one-sided inverse is invertible. Hence if Ui ∈ M_{1×n}(F) is the ith row of [U]_B for i ∈ [n], then

Ui Uj* = 1 if i = j and 0 else,

so the rows of [U]_B form an orthonormal basis of Fⁿ.

(8) ⇒ (1): Suppose B is an orthonormal basis of H and the rows of [U]_B form an orthonormal basis of Fⁿ. Then by 9.2.8,

[UU*]_B = [U]_B [U*]_B = [U]_B [U]_B* = I,

so UU* = I. Thus U* is injective, hence invertible, so U*U = I as well, and U is unitary.

Remark 9.3.3. Note that only (5) ⇒ (1) fails if we do not assume that H is finite dimensional.
Exercises

Exercise 9.3.4 (The Trace). Let {v1, ..., vn} be an orthonormal basis of H. Define tr : L(H) → F by

tr(T) = Σ_{i=1}^{n} ⟨Tvi, vi⟩.

(1) Show that tr ∈ L(H)*.
(2) Show that tr(TS) = tr(ST) for all S, T ∈ L(H). Deduce that tr(UTU*) = tr(T) for all unitary U ∈ L(H). Deduce further that tr is independent of the choice of orthonormal basis of H.
(3) Let B be an orthonormal basis of H. Show that tr(T) = trace([T]_B) for all T ∈ L(H), where trace ∈ Mn(F)* is given by trace(A) = Σ_{i=1}^{n} A_{ii}.
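The basis independence and the tracial property in this exercise are easy to spot-check numerically. The following sketch assumes numpy and is not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# tr(T) computed in an arbitrary orthonormal basis (columns of a unitary Q).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
tr_B = sum((T @ Q[:, i]) @ Q[:, i].conj() for i in range(n))

assert np.isclose(tr_B, np.trace(T))                   # basis independence
S = rng.standard_normal((n, n))
assert np.isclose(np.trace(T @ S), np.trace(S @ T))    # tr(TS) = tr(ST)
```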
9.4 Projections

Proposition 9.4.1. There is a bijective correspondence between the set of projections P(H) ⊂ L(H) and the set of subspaces of H.

Proof. We show the map K ↦ P_K, where P_K is the orthogonal projection onto K constructed in Chapter 8, is bijective.

First, suppose P_L = P_K for subspaces L, K. If u ∈ L, then P_K(u) = P_L(u) = u, so u ∈ K. Hence L ⊆ K. Switching L and K in the preceding argument gives K ⊆ L, so L = K, and the map is injective.

We must now show that all projections are of the form P_K for some subspace K. Let P be a projection, and let K = im(P). We claim that P = P_K. First, note that P² = P, so if w ∈ K, then Pw = w. Second, if u ∈ K⊥, then for all w ∈ K,

0 = ⟨w, u⟩ = ⟨Pw, u⟩ = ⟨w, Pu⟩,

so Pu ∈ K ∩ K⊥ = (0), and Pu = 0. Now let v ∈ H = K ⊕ K⊥. There are unique y ∈ K and z ∈ K⊥ such that v = y + z, and

Pv = Py + Pz = y = P_K y + P_K z = P_K v.

Hence P = P_K, and the map is surjective.

Corollary 9.4.2. Let P ∈ L(H) be a projection. Then
(1) H = ker(P) ⊕ im(P), and
(2) im(P) = {v ∈ H | Pv = v} = ker(P)⊥.
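A small numpy sketch of the correspondence in Proposition 9.4.1 (illustrative only, not from the notes): represent a subspace K by an orthonormal basis in the columns of a matrix B and form P_K = BB*.

```python
import numpy as np

rng = np.random.default_rng(5)
# A subspace K of R^5 spanned by two random vectors.
X = rng.standard_normal((5, 2))
B, _ = np.linalg.qr(X)            # orthonormal basis of K in the columns
P = B @ B.T                       # P_K in the standard basis

assert np.allclose(P, P @ P) and np.allclose(P, P.T)  # P is a projection
# im(P) = K: P fixes the spanning vectors of K ...
assert np.allclose(P @ X, X)
# ... and kills K^perp = ker(P).
v = rng.standard_normal(5)
assert np.allclose(P @ (v - P @ v), 0)
```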
Definition 9.4.3.
(1) We say the projection P ∈ L(H) is larger than the projection Q ∈ L(H), denoted P ≥ Q, if im(Q) ⊂ im(P).
(2) Projections P, Q ∈ L(H) are called orthogonal, denoted P ⊥ Q, if im(P) ⊥ im(Q).

Examples 9.4.4.
(1) For u, v ∈ H with u ⊥ v, we have P_u ⊥ P_v.
(2) If L, K are two subspaces of H such that L ⊥ K, then P_L ⊥ P_K. In particular, P_K ⊥ P_{K⊥} for all subspaces K ⊂ H.

Definition 9.4.5. A minimal projection is a projection onto a one dimensional subspace. In light of 9.4.3, we see that a minimal projection P is minimal in the sense that im(P) has no nonzero proper subspaces, i.e., there are no nonzero projections that are "smaller" than P.

Proposition 9.4.6. Let P, Q ∈ L(H) be projections. The following are equivalent:
(1) P ⊥ Q;
(2) PQ = 0; and
(3) QP = 0.

Proof.
(1) ⇒ (2): Suppose P ⊥ Q. For all u, v ∈ H, we have ⟨PQu, v⟩ = ⟨Qu, Pv⟩ = 0, as Qu ∈ im(Q) and Pv ∈ im(P). Hence PQ = 0.
(2) ⇔ (3): We have PQ = 0 if and only if QP = Q*P* = (PQ)* = 0.
(2) ⇒ (1): Suppose PQ = 0, and let u ∈ im(P) and v ∈ im(Q). Then there are x, y ∈ H such that u = Px and v = Qy, and

⟨u, v⟩ = ⟨Px, Qy⟩ = ⟨QPx, y⟩ = 0,

as QP = (PQ)* = 0. Hence im(P) ⊥ im(Q).

Definition 9.4.7. If P, Q ∈ L(H) are projections, then we define the sup of P and Q by P ∨ Q = P_{im(P)+im(Q)} and the inf of P and Q by P ∧ Q = P_{im(P)∩im(Q)}.

Proposition 9.4.8.
(1) P ∨ Q is the smallest projection larger than both P and Q, i.e., P, Q ≤ P ∨ Q, and if P, Q ≤ E for a projection E ∈ L(H), then P ∨ Q ≤ E.
(2) P ∧ Q is the largest projection smaller than both P and Q, i.e., P ∧ Q ≤ P, Q, and if F ≤ P, Q for a projection F ∈ L(H), then F ≤ P ∧ Q.

Proof.
(1) It is clear that im(P), im(Q) ⊂ im(P) + im(Q), so P, Q ≤ P ∨ Q. Suppose P, Q ≤ E. Then im(P) ⊂ im(E) and im(Q) ⊂ im(E), so we must have im(P) + im(Q) ⊂ im(E), i.e., im(P ∨ Q) ⊂ im(E). Thus P ∨ Q ≤ E.
(2) It is clear that im(P) ∩ im(Q) ⊂ im(P), im(Q), so P ∧ Q ≤ P, Q. Suppose F ≤ P, Q. Then im(F) ⊂ im(P) and im(F) ⊂ im(Q), so we must have im(F) ⊂ im(P) ∩ im(Q) = im(P ∧ Q). Thus F ≤ P ∧ Q.

Proposition 9.4.9. Let P, Q ∈ L(H) be projections. The following are equivalent:
(1) im(Q) ⊂ im(P), i.e., Q ≤ P;
(2) Q = PQ; and
(3) Q = QP.

Proof.
(1) ⇒ (2): Suppose Q ≤ P. Let v ∈ H, and note there are unique x ∈ im(Q) and y ∈ ker(Q) such that v = x + y. Then, as x ∈ im(Q) ⊂ im(P),

Qv = Qx + Qy = x = Px = PQx = PQv,

so Q = PQ.
(2) ⇔ (3): We have Q = PQ if and only if Q = Q* = (PQ)* = QP.
(2) ⇒ (1): Suppose Q = PQ, and suppose v ∈ im(Q). Then Qv = v, so v = Qv = PQv = Pv, and v ∈ im(P).

Corollary 9.4.10. Suppose P, Q ∈ L(H) are projections with Q ≤ P. Then P − Q is a projection.

Proof. We know P − Q is self adjoint, and by 9.4.9,

(P − Q)² = P − QP − PQ + Q = P − Q − Q + Q = P − Q,

so P − Q is a projection.
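Proposition 9.4.9 and Corollary 9.4.10 can be checked on coordinate projections. A minimal sketch, assuming numpy (not part of the notes):

```python
import numpy as np

# Coordinate projections onto nested subspaces of R^4:
# im(Q) = span{e1} is contained in im(P) = span{e1, e2}.
P = np.diag([1.0, 1.0, 0.0, 0.0])
Q = np.diag([1.0, 0.0, 0.0, 0.0])

assert np.allclose(P @ Q, Q) and np.allclose(Q @ P, Q)  # Q <= P  (9.4.9)
D = P - Q
assert np.allclose(D, D @ D) and np.allclose(D, D.T)    # P - Q is a projection
# Here P v Q = P and P ^ Q = Q, since im(Q) is contained in im(P).
```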
9.5 Partial Isometries

Proposition 9.5.1. Suppose V ∈ L(H) is a partial isometry, and set P = V*V and Q = VV*.
(1) P is the projection onto ker(V)⊥, i.e., V*V = P_{ker(V)⊥}.
(2) Q is the projection onto im(V). In particular, V* is a partial isometry.

Proof.
(1) Since V is a partial isometry, P is a projection, so by 9.4.2 it suffices to show ker(P) = ker(V). It is clear that ker(V) ⊂ ker(P), as Vv = 0 implies Pv = V*Vv = V*0 = 0. Now suppose v ∈ ker(P), so Pv = 0. Then

0 = ⟨Pv, v⟩ = ⟨V*Vv, v⟩ = ⟨Vv, Vv⟩ = ‖Vv‖²,

so v ∈ ker(V). Hence im(P) = ker(P)⊥ = ker(V)⊥.
(2) Note Q* = Q, and by (1) and 9.1.4,

Q² = V(V*V)V* = V P_{ker(V)⊥} V* = VV* = Q,

so Q is a projection. Suppose v ∈ im(V), say v = Vu. Then

Qv = QVu = V(V*V)u = V P_{ker(V)⊥} u = Vu = v,

so v ∈ im(Q). Conversely, if v ∈ im(Q), then v = Qv = V(V*v), so v ∈ im(V). Hence Q is the projection onto im(V). Finally, V*(V*)* = V*V is a projection, so V* is a partial isometry.

Remark 9.5.2. If V is a partial isometry, then ker(V)⊥ is called the initial subspace of V and im(V) is called the final subspace of V.

Theorem 9.5.3. Let V ∈ L(H). The following are equivalent:
(1) V is a partial isometry;
(2) ‖Vv‖ = ‖v‖ for all v ∈ ker(V)⊥;
(3) ⟨Vu, Vv⟩ = ⟨u, v⟩ for all u, v ∈ ker(V)⊥; and
(4) V|_{ker(V)⊥} ∈ L(ker(V)⊥, im(V)) is an isomorphism with inverse V*|_{im(V)} ∈ L(im(V), ker(V)⊥).

Proof.
(1) ⇒ (2): Suppose V is a partial isometry. By 9.5.1, V*V = P_{ker(V)⊥}, so if v ∈ ker(V)⊥, then

‖Vv‖² = ⟨Vv, Vv⟩ = ⟨V*Vv, v⟩ = ⟨v, v⟩ = ‖v‖².

Taking square roots gives ‖Vv‖ = ‖v‖.

(2) ⇒ (3): If u, v ∈ ker(V)⊥, then, assuming H is a complex Hilbert space, the polarization identity gives

4⟨Vu, Vv⟩ = Σ_{k=0}^{3} i^k ‖Vu + i^k Vv‖² = Σ_{k=0}^{3} i^k ‖V(u + i^k v)‖² = Σ_{k=0}^{3} i^k ‖u + i^k v‖² = 4⟨u, v⟩,

since u + i^k v ∈ ker(V)⊥. Now divide by 4. The proof is similar using the real polarization identity if H is a real Hilbert space.

(3) ⇒ (1): We show V*V is a projection. Let w, x ∈ H. Then there are unique y1, y2 ∈ ker(V) and z1, z2 ∈ ker(V)⊥ such that w = y1 + z1 and x = y2 + z2, and

⟨V*Vw, x⟩ = ⟨Vw, Vx⟩ = ⟨Vy1 + Vz1, Vy2 + Vz2⟩ = ⟨Vz1, Vz2⟩ = ⟨z1, z2⟩ = ⟨P_{ker(V)⊥}w, x⟩.

Since this holds for all w, x ∈ H, V*V = P_{ker(V)⊥}, which is a projection, so V is a partial isometry.

(1) ⇒ (4): We know by 9.5.1 that V*V is the projection onto ker(V)⊥ and VV* is the projection onto im(V). Hence if S = V|_{ker(V)⊥} ∈ L(ker(V)⊥, im(V)) and T = V*|_{im(V)} ∈ L(im(V), ker(V)⊥), then TS = I_{ker(V)⊥} and ST = I_{im(V)}.

(4) ⇒ (1): Suppose v ∈ H. Then there are unique x ∈ ker(V) and y ∈ ker(V)⊥ such that v = x + y, and

V*Vv = V*V(x + y) = V*Vy = y = P_{ker(V)⊥}v.

Hence V*V = P_{ker(V)⊥}, a projection, so V is a partial isometry.
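One concrete way to build a partial isometry, anticipating the rank one notation of the next section, is to send one orthonormal family to another and kill everything orthogonal to it. The following numpy sketch (not from the notes) checks Proposition 9.5.1 and condition (2) of Theorem 9.5.3 on such an example.

```python
import numpy as np

rng = np.random.default_rng(6)
# A partial isometry on R^5: map an orthonormal pair {b1, b2} to another
# orthonormal pair {c1, c2} and kill the complement of span{b1, b2}.
B, _ = np.linalg.qr(rng.standard_normal((5, 2)))  # initial subspace basis
C, _ = np.linalg.qr(rng.standard_normal((5, 2)))  # final subspace basis
V = C @ B.T                                       # V = |c1><b1| + |c2><b2|

P, Q = V.T @ V, V @ V.T
for R in (P, Q):                                  # V*V and VV* are projections
    assert np.allclose(R, R @ R) and np.allclose(R, R.T)

# V is isometric on ker(V)^perp = im(B): ||V(Bx)|| = ||Bx|| for any x.
x = rng.standard_normal(2)
assert np.isclose(np.linalg.norm(V @ (B @ x)), np.linalg.norm(B @ x))
```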
Exercises

Exercise 9.5.4 (Equivalence of Projections). Projections P, Q ∈ L(H) are said to be equivalent if there is a partial isometry V ∈ L(H) such that VV* = P and V*V = Q.
(1) Show that tr(P) = dim(im(P)) for all projections P ∈ L(H).
(2) Show that projections P, Q ∈ L(H) are equivalent if and only if tr(P) = tr(Q).

9.6 Dirac Notation and Rank One Operators

Notation 9.6.1 (Dirac). In his work on quantum mechanics, Dirac found a beautiful and powerful notation that is now referred to as "Dirac notation" or "bras and kets." Given the inner product space (H, ⟨·, ·⟩), we can define a function ⟨·|·⟩ : H × H → F by

⟨u|v⟩ = ⟨v, u⟩.

Note that this alternate inner product is linear in the second variable and conjugate linear in the first. We can go back and forth between the two notations by using the above convention.

A vector in H is sometimes called a "ket" and is sometimes denoted using the right half of the alternate inner product defined above: v ∈ H or |v⟩ ∈ H. A linear functional in L(H, F) is sometimes called a "bra" and is sometimes denoted using the left half of the alternate inner product: v* = ⟨·, v⟩ or ⟨v|. Now the (alternate) inner product of u, v ∈ H is nothing more than the "bra" ⟨u| applied to the "ket" |v⟩ to get the "braket" ⟨u|v⟩.

The power of this notation is that it allows for the opposite composition to get "rank one operators" or "ketbras." If u, v ∈ H, we define a linear transformation |u⟩⟨v| ∈ L(H) by

w = |w⟩ ↦ ⟨v|w⟩|u⟩ = ⟨w, v⟩ u.

If we write out the equation naively, we see (|u⟩⟨v|)|w⟩ = |u⟩⟨v|w⟩. The term "rank one" refers to the fact that the dimension of the image of a rank one operator is less than or equal to one. The dimension is one if and only if u, v ≠ 0.
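In coordinates, |u⟩⟨v| is just the outer product u v*. A minimal numpy sketch (the ketbra helper is hypothetical, introduced here only for illustration):

```python
import numpy as np

def ketbra(u, v):
    # |u><v| acting on w as <v|w> u = <w, v> u; as a matrix, u v*.
    return np.outer(u, v.conj())

rng = np.random.default_rng(7)
u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# (|u><v|) w = <w, v> u, where <w, v> = sum_j w_j conj(v_j).
assert np.allclose(ketbra(u, v) @ w, np.sum(w * v.conj()) * u)
assert np.linalg.matrix_rank(ketbra(u, v)) == 1   # rank one when u, v != 0
```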
For example, recall that P_v for v ∈ H \ {0} was defined to be the projection onto span{v} given by

u ↦ (⟨u, v⟩/‖v‖²) v, i.e., P_v = (1/‖v‖²)|v⟩⟨v|.

This makes it clear that if u = λv with λ ≠ 0, then P_u = P_v:

P_u = (1/‖u‖²)|u⟩⟨u| = (1/(|λ|²‖v‖²))|λv⟩⟨λv| = (λλ̄/(|λ|²‖v‖²))|v⟩⟨v| = (1/‖v‖²)|v⟩⟨v| = P_v.

In particular, if ‖v‖ = 1, then P_v = |v⟩⟨v|. Thus we have that P is a minimal projection if and only if P = |v⟩⟨v| for some v ∈ H with ‖v‖ = 1. Sometimes these projections are called "rank one" projections.

One can easily show that composition of rank one operators |u⟩⟨v| and |w⟩⟨x| is exactly the naive composition:

(|u⟩⟨v|)(|w⟩⟨x|) = |u⟩⟨v|w⟩⟨x| = ⟨v|w⟩ |u⟩⟨x| for all u, v, w, x ∈ H,

and taking adjoints is also easy: (|u⟩⟨v|)* = |v⟩⟨u| for all u, v ∈ H. Furthermore, if T ∈ L(H), then composition with T is also naive:

T|u⟩⟨v| = |Tu⟩⟨v| and |u⟩⟨v|T = |u⟩⟨T*v|.

Exercises

Exercise 9.6.2. Let u, v ∈ H and T ∈ L(H).
(1) Show that (|u⟩⟨v|)* = |v⟩⟨u|.
(2) Show that T|u⟩⟨v| = |Tu⟩⟨v| and |u⟩⟨v|T = |u⟩⟨T*v|.
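The composition and adjoint rules above, and the description of minimal projections, can all be verified numerically. Again the ketbra helper is a hypothetical convenience and numpy is assumed; this sketch is not part of the notes.

```python
import numpy as np

def ketbra(u, v):
    return np.outer(u, v.conj())   # matrix of |u><v| (hypothetical helper)

rng = np.random.default_rng(8)
u, v, w, x = (rng.standard_normal(4) + 1j * rng.standard_normal(4)
              for _ in range(4))

inner = lambda a, b: np.sum(a * b.conj())          # <a, b>, linear in a

# (|u><v|)(|w><x|) = <v|w> |u><x| = <w, v> |u><x|.
assert np.allclose(ketbra(u, v) @ ketbra(w, x), inner(w, v) * ketbra(u, x))
# (|u><v|)* = |v><u|.
assert np.allclose(ketbra(u, v).conj().T, ketbra(v, u))
# A unit vector gives a minimal (rank one) projection P = |v><v|.
e = v / np.linalg.norm(v)
P = ketbra(e, e)
assert np.allclose(P, P @ P) and np.allclose(P, P.conj().T)
```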
Chapter 10

The Spectral Theorems and the Functional Calculus

For this chapter, H will denote a Hilbert space over F. The main result of this chapter is the spectral theorem, which says we can decompose H into invariant subspaces for T under a few conditions.

10.1 Spectra of Normal Operators

This section provides some of the main lemmas for the proofs of the spectral theorems.

Proposition 10.1.1. Let T ∈ L(H) be normal.
(1) ‖Tv‖ = ‖T*v‖ for all v ∈ H.
(2) Suppose v ∈ H is an eigenvector of T with corresponding eigenvalue λ. Then v is an eigenvector of T* with corresponding eigenvalue λ̄.

Proof.
(1) We have that

‖Tv‖² = ⟨Tv, Tv⟩ = ⟨T*Tv, v⟩ = ⟨TT*v, v⟩ = ⟨T*v, T*v⟩ = ‖T*v‖².

Now take square roots.
(2) Since T is normal, so is T − λI. We apply (1) to get

0 = ‖(T − λI)v‖ = ‖(T − λI)*v‖ = ‖(T* − λ̄I)v‖.

Hence T*v = λ̄v.

Proposition 10.1.2. Let T ∈ L(H) be a normal operator.
(1) If v1, v2 are eigenvectors of T corresponding to distinct eigenvalues λ1, λ2 respectively, then v1 ⊥ v2. Hence E_{λ1} ⊥ E_{λ2}.
(2) If S = {v1, ..., vn} is a set of eigenvectors of T corresponding to distinct eigenvalues, then S is linearly independent.

Proof.
(1) By 10.1.1 we have

λ1⟨v1, v2⟩ = ⟨Tv1, v2⟩ = ⟨v1, T*v2⟩ = ⟨v1, λ̄2 v2⟩ = λ2⟨v1, v2⟩.

As λ1 ≠ λ2, the only way these two can be equal is if v1 ⊥ v2.
(2) Since an orthogonal set of nonzero vectors is linearly independent, it suffices to show S is orthogonal, but this follows by (1).
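Both propositions are visible numerically for any normal matrix. The sketch below, which assumes numpy and is not part of the notes, uses a random skew adjoint matrix (T* = −T), a standard source of normal operators.

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T = A - A.conj().T                       # T* = -T, so T*T = TT*

v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
assert np.isclose(np.linalg.norm(T @ v), np.linalg.norm(T.conj().T @ v))

# Eigenvectors for distinct eigenvalues are orthogonal.
vals, vecs = np.linalg.eig(T)
gram = vecs.conj().T @ vecs
for i in range(4):
    for j in range(4):
        if i != j and not np.isclose(vals[i], vals[j]):
            assert abs(gram[i, j]) < 1e-8
```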
10.2 Unitary Diagonalization

Let V be a finite dimensional vector space over F. In Chapter 6, we proved that T ∈ L(V) is diagonalizable if and only if

V = ⊕_{i=1}^{m} E_{λi} where sp(T) = {λ1, ..., λm}.

Recall that this theorem was independent of an inner product structure on V and merely relies on the finite dimensionality of V. In this section, we will characterize when an operator T ∈ L(H) is unitarily diagonalizable, a notion which is inherently connected to the inner product structure of H, as we need an inner product structure to define a unitary operator.

Definition 10.2.1. Let T ∈ L(H). We say T is unitarily diagonalizable if there is an orthonormal basis of H consisting of eigenvectors of T. A matrix A ∈ Mn(F) is called unitarily diagonalizable if L_A is (unitarily) diagonalizable.

Examples 10.2.2.
(1) Every diagonal matrix is unitarily diagonalizable.
(2) Not every diagonalizable matrix is unitarily diagonalizable. An example is

A = [[1, 1], [0, 0]].

The set {(1, 0), (1, −1)} is a basis of R² consisting of eigenvectors of L_A, but there is no orthonormal basis of R² consisting of eigenvectors of L_A.
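The example in (2) can be confirmed directly. A minimal numpy check (not part of the notes):

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 0.0]])

# A is diagonalizable: eigenvalues 1 and 0 with eigenvectors (1,0), (1,-1).
assert np.allclose(A @ [1, 0], [1, 0])
assert np.allclose(A @ [1, -1], [0, 0])

# But A is not normal, so it cannot be unitarily diagonalizable:
# the two one dimensional eigenspaces are not orthogonal.
assert not np.allclose(A @ A.T, A.T @ A)
assert not np.isclose(np.dot([1, 0], [1, -1]), 0)
```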
Proposition 10.2.3. T ∈ L(H) is unitarily diagonalizable if and only if

H = ⊕_{i=1}^{n} E_{λi} and E_{λi} ⊥ E_{λj} for all i ≠ j, where sp(T) = {λ1, ..., λn}.

Proof. Suppose T is unitarily diagonalizable, and let B = {v1, ..., vm} be an orthonormal basis of H consisting of eigenvectors of T. Let λ1, ..., λn be the distinct eigenvalues of T, and let Bi = B ∩ E_{λi} for i ∈ [n]. Then E_{λi} = span(Bi) for all i ∈ [n], and H = ⊕_{i=1}^{n} E_{λi} as B = ∪_{i=1}^{n} Bi. Now E_{λi} ⊥ E_{λj} for all i ≠ j, as Bi ⊥ Bj for all i ≠ j.

Suppose now that H = ⊕_{i=1}^{n} E_{λi} and E_{λi} ⊥ E_{λj} for all i ≠ j, where sp(T) = {λ1, ..., λn}. For i ∈ [n], let Bi be an orthonormal basis for E_{λi}, and note that Bi ⊥ Bj for all i ≠ j, as E_{λi} ⊥ E_{λj} for all i ≠ j. Then

B = ∪_{i=1}^{n} Bi

is an orthonormal basis for H consisting of eigenvectors of T, as Bi is an orthonormal basis for E_{λi} for all i ∈ [n] and B is orthogonal.

Corollary 10.2.4. Let T ∈ L(H). T is unitarily diagonalizable if and only if there are mutually orthogonal projections P1, ..., Pn ∈ L(H) and distinct scalars λ1, ..., λn ∈ F such that

I = Σ_{i=1}^{n} Pi and T = Σ_{i=1}^{n} λi Pi.

Proof. We know by 10.2.3 that T is unitarily diagonalizable if and only if H = ⊕_{i=1}^{n} E_{λi} and E_{λi} ⊥ E_{λj} for all i ≠ j, where sp(T) = {λ1, ..., λn}. Suppose T is unitarily diagonalizable, and set Pi = P_{E_{λi}} for all i ∈ [n].
Then the Pi are mutually orthogonal, and I = Σ_{i=1}^{n} Pi. Now if v ∈ H, then v can be written uniquely as a sum of elements of the E_{λi}'s:

v = Σ_{i=1}^{n} vi where vi ∈ E_{λi}.

Now it is immediate that

Tv = Σ_{i=1}^{n} Tvi = Σ_{i=1}^{n} λi vi = Σ_{i=1}^{n} λi Pi vi = Σ_{j=1}^{n} λj Pj (Σ_{i=1}^{n} vi) = Σ_{j=1}^{n} λj Pj v,

using Pj vi = 0 for i ≠ j, so we have T = Σ_{i=1}^{n} λi Pi.

Now suppose there are mutually orthogonal projections P1, ..., Pn ∈ L(H) and distinct scalars λ1, ..., λn ∈ F such that I = Σ Pi and T = Σ λi Pi. Then we know that

H = ⊕_{i=1}^{n} im(Pi) = ⊕_{i=1}^{n} E_{λi}.

The reader should check that sp(T) = {λ1, ..., λn} and that the corresponding eigenspaces are {im(P1), ..., im(Pn)}. Finally, E_{λi} ⊥ E_{λj} for i ≠ j as Pi ⊥ Pj for i ≠ j, so T is unitarily diagonalizable by 10.2.3, and we are finished.

Remark 10.2.5. Note that if T ∈ L(H) with T = Σ_{i=1}^{n} λi Pi, where the λi ∈ F are distinct and the Pi's are mutually orthogonal projections in L(H) that sum to I, then we can immediately see that sp(T) = {λ1, ..., λn} and E_{λi} = im(Pi).

10.3 The Spectral Theorems

The complex, respectively real, spectral theorem is a classification of unitarily diagonalizable operators on a complex, respectively real, Hilbert space. The key result for this section is the fact from Chapter 5 that every operator on a nonzero finite dimensional complex vector space has an eigenvector.
Theorem 10.3.1 (Complex Spectral). Suppose H is a finite dimensional inner product space over C, and let T ∈ L(H). Then T is unitarily diagonalizable if and only if T is normal.

Proof. Suppose T is unitarily diagonalizable. Then by 10.2.4, there are λ1, ..., λn ∈ C and mutually orthogonal projections P1, ..., Pn such that I = Σ Pi and T = Σ λi Pi. Then

TT* = (Σ_{i=1}^{n} λi Pi)(Σ_{j=1}^{n} λ̄j Pj) = Σ_{i=1}^{n} λi λ̄i Pi = Σ_{i=1}^{n} λ̄i λi Pi = (Σ_{j=1}^{n} λ̄j Pj)(Σ_{i=1}^{n} λi Pi) = T*T,

so T is normal.

Suppose now that T is normal. Since T has an eigenvector, we may let S = {v1, ..., vk} be a maximal orthonormal set of eigenvectors of T corresponding to eigenvalues λ1, ..., λk, and let K = span(S). We need to show K = H, or equivalently K⊥ = (0).

First, we show T P_K = P_K T. For v ∈ H, we have

P_K(Tv) = Σ_{i=1}^{k} ⟨Tv, vi⟩ vi = Σ_{i=1}^{k} ⟨v, T*vi⟩ vi = Σ_{i=1}^{k} ⟨v, λ̄i vi⟩ vi = Σ_{i=1}^{k} λi ⟨v, vi⟩ vi = T(Σ_{i=1}^{k} ⟨v, vi⟩ vi) = T(P_K v),

using 10.1.1 to get T*vi = λ̄i vi. Hence K and K⊥ are invariant subspaces for T, so T|_{K⊥} = (I − P_K)T(I − P_K)|_{K⊥} is a well defined normal operator in L(K⊥).

Suppose K⊥ ≠ (0). Then T|_{K⊥} has an eigenvector w ∈ K⊥, and we may assume ‖w‖ = 1. But then S ∪ {w} is an orthonormal set of eigenvectors of T which is strictly larger than S, a contradiction. Hence K⊥ = (0), and K = H.

Lemma 10.3.2. If T ∈ L(H) is self adjoint, then all eigenvalues of T are real.

Proof. Suppose λ is an eigenvalue of T corresponding to the eigenvector v ∈ H. Then

λ⟨v, v⟩ = ⟨Tv, v⟩ = ⟨v, Tv⟩ = ⟨v, λv⟩ = λ̄⟨v, v⟩.

The only way this is possible is if λ ∈ R.
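A hedged numerical illustration of the spectral resolution T = Σ λi Pi for a normal operator, assuming numpy (not part of the notes): we manufacture T from a prescribed orthonormal eigenbasis and then reassemble it from rank one spectral projections.

```python
import numpy as np

rng = np.random.default_rng(10)
# T = U D U* with U unitary and D diagonal is always normal.
U, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
lam = rng.standard_normal(4) + 1j * rng.standard_normal(4)
T = U @ np.diag(lam) @ U.conj().T

assert np.allclose(T @ T.conj().T, T.conj().T @ T)   # T is normal

# Spectral resolution: T = sum_i lam_i P_i with P_i = |u_i><u_i|.
Ps = [np.outer(U[:, i], U[:, i].conj()) for i in range(4)]
assert np.allclose(sum(Ps), np.eye(4))               # the P_i sum to I
assert np.allclose(sum(l * P for l, P in zip(lam, Ps)), T)
```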
Lemma 10.3.3. Suppose H is a finite dimensional inner product space over R, and let T ∈ L(H) be self adjoint. Then T has a real eigenvalue.

Proof. Recall that the complexification H_C of H is a complex Hilbert space and that the complexification of T is given by T_C(u + iv) = Tu + iTv. By 9.2.2, (T_C)* is given by

(T_C)*(u + iv) = T*u + iT*v = Tu + iTv = T_C(u + iv),

so T_C is self adjoint, hence normal, and thus unitarily diagonalizable by 10.3.1. Therefore there is an orthonormal basis {w1, ..., wn} of H_C consisting of eigenvectors of T_C. For j = 1, ..., n, let wj = uj + ivj, and let λj ∈ R (by 10.3.2) be the eigenvalue corresponding to wj. Then

Tuj + iTvj = T_C(uj + ivj) = λj(uj + ivj) = λj uj + iλj vj,

so Tuj = λj uj and Tvj = λj vj. As wj ≠ 0, one of uj, vj must be nonzero, and it is thus an eigenvector of T with corresponding real eigenvalue λj.

Theorem 10.3.4 (Real Spectral). Suppose H is a finite dimensional inner product space over R, and let T ∈ L(H). Then T is unitarily diagonalizable if and only if T is self adjoint.

Proof. Suppose T is unitarily diagonalizable. Then by 10.2.4, there are distinct λ1, ..., λn ∈ R and mutually orthogonal projections P1, ..., Pn such that I = Σ Pi and T = Σ λi Pi. This immediately implies

T* = Σ_{i=1}^{n} λi Pi = T.

Now suppose T is self adjoint. By 10.3.3, T has an eigenvector. Let S = {v1, ..., vm} be a maximal orthonormal set of eigenvectors of T, and let K = span(S). Then as in the proof of 10.3.1, we have P_K T = T P_K, and T|_{K⊥} = (I − P_K)T(I − P_K)|_{K⊥} is a well defined self adjoint operator in L(K⊥). The rest of the argument is exactly the same as in 10.3.1.

Exercises

Exercise 10.3.5. Let S, T ∈ L(H) be normal operators. Show that S and T commute if and only if S and T are simultaneously unitarily diagonalizable, i.e., there is an orthonormal basis of H consisting of eigenvectors of both S and T.

Exercise 10.3.6 (Rayleigh's Principle). Suppose T ∈ L(H) with T = T*, and let sp(T) = {λmin = λ1 < λ2 < ⋯ < λn = λmax} (the spectrum is real by 9.2.10). Show that

λmin ‖v‖² ≤ ⟨Tv, v⟩ ≤ λmax ‖v‖² for all v ∈ H \ {0},

with equality at either side if and only if v is an eigenvector with the corresponding eigenvalue.
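Rayleigh's principle is easy to test by sampling. The following sketch (numpy assumed, not from the notes) checks the two-sided bound for a random real self adjoint matrix.

```python
import numpy as np

rng = np.random.default_rng(11)
A = rng.standard_normal((5, 5))
T = (A + A.T) / 2                         # a real self adjoint matrix

vals = np.linalg.eigvalsh(T)              # real, sorted ascending
lam_min, lam_max = vals[0], vals[-1]

for _ in range(1000):
    v = rng.standard_normal(5)
    rayleigh = (T @ v) @ v / (v @ v)      # <Tv, v> / ||v||^2
    assert lam_min - 1e-12 <= rayleigh <= lam_max + 1e-12
```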
10.4 The Functional Calculus

Lemma 10.4.1. Suppose P1, ..., Pn ∈ L(H) are mutually orthogonal projections. Then

Σ_{i=1}^{n} λi Pi = Σ_{i=1}^{n} μi Pi if and only if λi = μi for all i ∈ [n].

Proof. One direction is trivial. For the other, for i ∈ [n], let vi ∈ im(Pi) \ {0}. Then

λi vi = Σ_{j=1}^{n} λj Pj vi = Σ_{j=1}^{n} μj Pj vi = μi vi,

so (λi − μi)vi = 0, and λi − μi = 0 for all i ∈ [n].

Proposition 10.4.2. Let T ∈ L(H) be normal if F = C or self adjoint if F = R, and write T = Σ_{i=1}^{n} λi Pi as in 10.2.4. Then T is
(1) self adjoint if and only if sp(T) ⊂ R,
(2) positive if and only if sp(T) ⊂ [0, ∞),
(3) a projection if and only if sp(T) ⊂ {0, 1},
(4) a unitary if and only if sp(T) ⊂ S¹ = {λ ∈ C | |λ| = 1}, and
(5) a partial isometry if and only if sp(T) ⊂ S¹ ∪ {0}.

Proof.
(1) Clearly λi = λ̄i for all i implies that

T* = Σ_{i=1}^{n} λ̄i Pi = Σ_{i=1}^{n} λi Pi = T.

Now if T = T*, then λj Pj = T Pj = T*Pj = λ̄j Pj for all j = 1, ..., n, which implies λj = λ̄j for all j by 10.4.1.

(2) Suppose T ≥ 0. For i = 1, ..., n, let vi ∈ im(Pi) be a unit vector. Then

⟨Tvi, vi⟩ = λi ‖vi‖² = λi ≥ 0 for all i = 1, ..., n,

so sp(T) ⊂ [0, ∞). Now suppose sp(T) ⊂ [0, ∞); note T = T* by (1) since sp(T) ⊂ R. Let v ∈ H. By the Spectral Theorem, T is unitarily diagonalizable, so there is an orthonormal basis {w1, ..., wm} of H consisting of eigenvectors of T, say Twi = μi wi with each μi ∈ sp(T) ⊂ [0, ∞), and there are scalars c1, ..., cm ∈ F such that v = Σ_{i=1}^{m} ci wi.
Then

⟨Tv, v⟩ = ⟨Σ_{i=1}^{m} μi ci wi, Σ_{j=1}^{m} cj wj⟩ = Σ_{i=1}^{m} μi |ci|² ≥ 0,

so T is positive.

(3) We have

T = T* = T² ⟺ Σ_{i=1}^{n} λi Pi = Σ_{i=1}^{n} λ̄i Pi = Σ_{i=1}^{n} λi² Pi ⟺ λi = λ̄i = λi² for all i = 1, ..., n ⟺ λi ∈ {0, 1} for all i = 1, ..., n.

(4) We have

T*T = (Σ_{i=1}^{n} λ̄i Pi)(Σ_{j=1}^{n} λj Pj) = Σ_{i=1}^{n} |λi|² Pi = Σ_{i=1}^{n} Pi = I

if and only if |λi|² = 1 for all i = 1, ..., n, if and only if λi ∈ S¹ for all i = 1, ..., n; the computation for TT* is identical.

(5) We have that T*T is a projection if and only if sp(T*T) ⊂ {0, 1} by (3). By the computation in (4), sp(T*T) = {|λi|² | λi ∈ sp(T)}, and |λi|² ∈ {0, 1} if and only if λi ∈ S¹ ∪ {0}.

Definition 10.4.3. Let T ∈ L(H) be normal if F = C or self adjoint if F = R. By 10.2.4 and the Spectral Theorem, there are mutually orthogonal projections P1, ..., Pn ∈ L(H) and distinct scalars λ1, ..., λn ∈ F such that

I = Σ_{i=1}^{n} Pi and T = Σ_{i=1}^{n} λi Pi.

The spectral projections of T are the operators of the form

P = Σ_{j=1}^{m} P_{i_j},

where the i_j are distinct elements of {1, ..., n} and 0 ≤ m ≤ n (if m = 0, then P = 0). Note that the spectral projections of T are indeed projections, since a sum of mutually orthogonal projections is a projection.
Examples 10.4.4.
(1) If

A = (1/2)[[1, 1], [1, 1]] ∈ M2(R),

then A is self adjoint. We see that the eigenvalues of A are 0 and 1, corresponding to the eigenvectors (1/√2)(1, −1) and (1/√2)(1, 1) ∈ R² respectively. We see then that our rank one projections are

P1 = (1/2)|(1, 1)⟩⟨(1, 1)| = (1/2)[[1, 1], [1, 1]] and P2 = (1/2)|(1, −1)⟩⟨(1, −1)| = (1/2)[[1, −1], [−1, 1]] ∈ L(R²),

and it is clear that P1 P2 = P2 P1 = 0. Hence the spectral projections of A are

0, (1/2)[[1, 1], [1, 1]], (1/2)[[1, −1], [−1, 1]], and [[1, 0], [0, 1]].

(2) If

A = [[0, −1], [1, 0]] ∈ M2(C),

then A is normal. We see that the eigenvalues of A are ±i, corresponding to the eigenvectors (1, −i) and (1, i) ∈ C² respectively. We see then that our rank one projections are

P1 = (1/2)|(1, −i)⟩⟨(1, −i)| = (1/2)[[1, i], [−i, 1]] and P2 = (1/2)|(1, i)⟩⟨(1, i)| = (1/2)[[1, −i], [i, 1]] ∈ L(C²),

and we see that P1 P2 = P2 P1 = 0. Hence the spectral projections of A are

0, (1/2)[[1, i], [−i, 1]], (1/2)[[1, −i], [i, 1]], and [[1, 0], [0, 1]].

Definition 10.4.5 (Functional Calculus). Let T ∈ L(H) be normal if F = C or self adjoint if F = R, and write I = Σ_{i=1}^{n} Pi and T = Σ_{i=1}^{n} λi Pi with the λi distinct and the Pi mutually orthogonal as in 10.2.4. For f : sp(T) → F, define the operator f(T) ∈ L(H) by

f(T) = Σ_{i=1}^{n} f(λi) Pi.
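This definition translates directly into a few lines of code. The sketch below (a hypothetical helper named functional_calculus, restricted to self adjoint inputs so that numpy's eigh applies; not part of the notes) forms f(T) = Σ f(λi) Pi from rank one projections and checks that indicator functions return spectral projections.

```python
import numpy as np

def functional_calculus(T, f):
    # f(T) = sum_i f(lam_i) |v_i><v_i| for self adjoint T (hypothetical helper).
    vals, vecs = np.linalg.eigh(T)
    return sum(f(l) * np.outer(vecs[:, i], vecs[:, i].conj())
               for i, l in enumerate(vals))

A = np.array([[1.0, 1.0], [1.0, 1.0]]) / 2     # Example 10.4.4 (1)

assert np.allclose(functional_calculus(A, lambda x: x), A)
assert np.allclose(functional_calculus(A, lambda x: x**2), A @ A)
# An indicator function with values in {0, 1} yields a spectral projection;
# the indicator of {1} recovers the projection onto the 1-eigenspace, A itself.
assert np.allclose(functional_calculus(A, lambda x: float(np.isclose(x, 1))), A)
```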
Examples 10.4.6.
(1) For p ∈ F[z], we have that p(T) as defined by the polynomial functional calculus in Chapter 4 agrees with the p(T) defined in 10.4.5. This is shown by proving that

p(Σ_{i=1}^{n} λi Pi) = Σ_{i=1}^{n} p(λi) Pi,

which follows easily from the mutual orthogonality of the Pi's using 9.4.6. Hence the functional calculus is a generalization of the polynomial functional calculus for a normal operator.

(2) Suppose U ⊂ C is an open set containing sp(T) and f : U → C is holomorphic. Then the f(T) defined by the holomorphic functional calculus in Chapter 7 agrees with the f(T) defined in 10.4.5 for normal T. Hence the functional calculus is a generalization of the holomorphic functional calculus for a normal operator.

(3) Suppose we want to find cos(A) where A = [[0, −1], [1, 0]] ∈ M2(C). We saw in 10.4.4 that the minimal nonzero spectral projections of A are

(1/2)[[1, i], [−i, 1]] and (1/2)[[1, −i], [i, 1]],

so we see that

A = i · (1/2)[[1, i], [−i, 1]] + (−i) · (1/2)[[1, −i], [i, 1]].

Now we apply 10.4.5 and use the fact that cos(z) is even to get

cos(A) = cos(i) · (1/2)[[1, i], [−i, 1]] + cos(−i) · (1/2)[[1, −i], [i, 1]] = cos(i) · [[1, 0], [0, 1]] = cosh(1) I.
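Example (3) can be verified numerically from the spectral projections computed in 10.4.4 (2). A minimal numpy check (not from the notes):

```python
import numpy as np

A = np.array([[0, -1], [1, 0]], dtype=complex)

# Minimal spectral projections from Example 10.4.4 (2).
P1 = np.array([[1, 1j], [-1j, 1]]) / 2        # eigenvalue  i
P2 = np.array([[1, -1j], [1j, 1]]) / 2        # eigenvalue -i

assert np.allclose(1j * P1 + (-1j) * P2, A)   # spectral resolution of A
cosA = np.cos(1j) * P1 + np.cos(-1j) * P2     # functional calculus
assert np.allclose(cosA, np.cosh(1) * np.eye(2))
```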
Proposition 10.4.7. Let T ∈ L(H) be normal if F = C or self adjoint if F = R. The functional calculus gives a well defined function evT : F(sp(T), F) → L(H), f ↦ f(T). This map has the following properties:
(1) evT is a linear transformation,
(2) evT is injective,
(3) evT(f̄) = (evT(f))* for all f ∈ F(sp(T), F),
(4) im(evT) is contained in the normal operators if F = C or the self adjoint operators if F = R, and
(5) evT(f ∘ g) = ev_{g(T)}(f) for all g ∈ F(sp(T), F) and all f ∈ F(sp(g(T)), F).

Proof. Write I = Σ Pi and T = Σ λi Pi as in 10.2.4.
(1) Suppose λ ∈ F and f, g ∈ F(sp(T), F). Then

(λf + g)(T) = Σ_{i=1}^{n} (λf + g)(λi) Pi = Σ_{i=1}^{n} (λf(λi) + g(λi)) Pi = λ Σ_{i=1}^{n} f(λi) Pi + Σ_{i=1}^{n} g(λi) Pi = λf(T) + g(T).

(2) This is immediate from 10.4.1.
(3) We have

f̄(T) = Σ_{i=1}^{n} \overline{f(λi)} Pi = (Σ_{i=1}^{n} f(λi) Pi)* = f(T)*.

(4) We have f(T) = Σ f(λi) Pi, where the Pi are mutually orthogonal projections summing to I, so f(T) is unitarily diagonalizable by 10.2.4, and is thus normal if F = C or self adjoint if F = R by the Spectral Theorem.
(5) Note that f(g(T)) is well defined by (4). We have

(f ∘ g)(T) = Σ_{i=1}^{n} (f ∘ g)(λi) Pi = Σ_{i=1}^{n} f(g(λi)) Pi = f(Σ_{i=1}^{n} g(λi) Pi) = f(g(T)).

Remark 10.4.8. We see that the spectral projections of a normal or self adjoint T ∈ L(H) are precisely the operators of the form f(T), where f : sp(T) → F is such that im(f) ⊂ {0, 1}.

Definition. Suppose T ∈ L(H). The operator T*T is positive, as

⟨T*Tv, v⟩ = ⟨Tv, Tv⟩ = ‖Tv‖² ≥ 0 for all v ∈ H,

so we may define |T| = √(T*T) by applying the functional calculus 10.4.5 with f = √ to T*T.

Proposition 10.4.9. Suppose T ∈ L(H). Then
(1) |T|² = T*T, and |T| is the unique positive square root of T*T,
(2) ‖|T|v‖ = ‖Tv‖ for all v ∈ H, and
(3) ker(|T|) = ker(T).

Proof. We know |T| is positive by 10.4.2, as T*T is positive and √ maps sp(T*T) ⊂ [0, ∞) into [0, ∞).
(1) That |T|² = T*T follows immediately from 10.4.7 part (5), as squaring √ gives the identity function on sp(T*T). Suppose now that S ∈ L(H) is positive and S² = T*T. Then there are mutually orthogonal projections P1, ..., Pn and distinct scalars λ1, ..., λn ∈ [0, ∞) such that

S = Σ_{i=1}^{n} λi Pi, whence T*T = S² = Σ_{i=1}^{n} λi² Pi.

As the λi's are distinct and nonnegative, the λi²'s are distinct, so applying the functional calculus, we see

|T| = √(T*T) = Σ_{i=1}^{n} √(λi²) Pi = Σ_{i=1}^{n} λi Pi = S.

(2) If v ∈ H, then

‖|T|v‖² = ⟨|T|v, |T|v⟩ = ⟨|T|²v, v⟩ = ⟨T*Tv, v⟩ = ⟨Tv, Tv⟩ = ‖Tv‖².

Now take square roots.
(3) This is immediate from (2).
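The square root |T| = √(T*T) is computable by exactly the recipe in the definition above. A sketch assuming numpy, with a hypothetical helper sqrt_psd (not part of the notes):

```python
import numpy as np

def sqrt_psd(S):
    # Unique positive square root of a positive matrix S, via eigh.
    vals, vecs = np.linalg.eigh(S)
    vals = np.clip(vals, 0, None)          # kill tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

rng = np.random.default_rng(12)
T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
absT = sqrt_psd(T.conj().T @ T)            # |T| = sqrt(T*T)

assert np.allclose(absT @ absT, T.conj().T @ T)       # |T|^2 = T*T
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
assert np.isclose(np.linalg.norm(absT @ v), np.linalg.norm(T @ v))  # (2)
```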
Exercises

Exercise 10.4.10 (HahnJordan Decomposition of an Operator).
(1) Show that every operator T ∈ L(H) on a complex Hilbert space can be written uniquely as T = S1 + iS2 where S1, S2 are self adjoint.
(2) Show that every self adjoint operator can be written uniquely as the difference of two positive operators S = S₊ − S₋ with S₊S₋ = 0.

10.5 The Polar and Singular Value Decompositions

Theorem 10.5.1 (Polar Decomposition). For T ∈ L(H), there is a unique partial isometry V ∈ L(H) such that ker(V) = ker(|T|) = ker(T) and T = V|T|, where |T| = √(T*T) as in 10.4.9.

Proof. As |T| is positive, |T| is unitarily diagonalizable, so there is an orthonormal basis {v1, ..., vn} of H consisting of eigenvectors of |T|. Let λi be the eigenvalue corresponding to vi for i ∈ [n], and note that λi ∈ [0, ∞) by 10.4.2. After relabeling, we may assume λi ≥ λ_{i+1} for all i ∈ [n − 1]. Let k ∈ {0} ∪ [n] be minimal such that λi = 0 for all i > k, and note that

ker(T) = ker(|T|) = span{v_{k+1}, ..., vn}

by 10.4.9. Define an operator V ∈ L(H) by

V vi = (1/λi) T vi if i ∈ [k] and V vi = 0 else,

extended by linearity.
We show V is a partial isometry with ker(V) = ker(T). For i, j ∈ [k],

⟨Vvi, Vvj⟩ = (1/(λi λj)) ⟨Tvi, Tvj⟩ = (1/(λi λj)) ⟨T*Tvi, vj⟩ = (1/(λi λj)) ⟨|T|²vi, vj⟩ = (λi²/(λi λj)) ⟨vi, vj⟩,

which is 1 if i = j and 0 else. Hence {Vv1, ..., Vvk} is orthonormal, so V is injective on span{v1, ..., vk}, ker(V) = span{v_{k+1}, ..., vn} = ker(T), and for v = Σ_{i=1}^{k} μi vi ∈ ker(V)⊥ we have

‖Vv‖² = ‖Σ_{i=1}^{k} μi Vvi‖² = Σ_{i=1}^{k} |μi|² = ‖v‖².

Hence V is a partial isometry by 9.5.3.

We show V|T| = T. If v ∈ H, then there are scalars μ1, ..., μn such that v = Σ_{i=1}^{n} μi vi, and

V|T|v = Σ_{i=1}^{n} μi V(|T|vi) = Σ_{i=1}^{n} μi λi Vvi = Σ_{i=1}^{k} μi λi (1/λi) Tvi = Σ_{i=1}^{n} μi Tvi = Tv,

as λi = 0 implies Tvi = 0 for i > k, since ker(|T|) = ker(T).

Suppose now that T = U|T| for a partial isometry U with ker(U) = ker(T). Then Uvi = 0 = Vvi if i > k, and for i ∈ [k],

Uvi = U((1/λi) λi vi) = U|T|((1/λi) vi) = T((1/λi) vi) = (1/λi) Tvi = Vvi.

Hence U and V agree on a basis of H, so U = V.

Definition 10.5.2. Let T ∈ L(H). The eigenvalues of |T| are called the singular values of T.

Theorem 10.5.3 (Singular Value Decomposition). Let T ∈ L(H). As |T| is positive, |T| is unitarily diagonalizable, so there are an orthonormal basis {v1, ..., vn} of H and scalars λ1, ..., λn such that

|T| = Σ_{i=1}^{n} λi |vi⟩⟨vi|.

By 10.5.1, there is a unique partial isometry V with ker(V) = ker(T) and T = V|T|. Then we have that

T = V|T| = V Σ_{i=1}^{n} λi |vi⟩⟨vi| = Σ_{i=1}^{n} λi |Vvi⟩⟨vi| = Σ_{i=1}^{n} λi |ui⟩⟨vi|,

setting ui = Vvi for i = 1, ..., n. This last line is called a singular value decomposition, or Schmidt decomposition, of T.
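Numerically, the polar and singular value decompositions are both repackagings of numpy's SVD. The following sketch (not from the notes) recovers |T|, the partial isometry V, and the rank one expansion of Theorem 10.5.3.

```python
import numpy as np

rng = np.random.default_rng(13)
T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# numpy's SVD T = W diag(s) Xh packages both decompositions:
W, s, Xh = np.linalg.svd(T)
absT = Xh.conj().T @ np.diag(s) @ Xh       # |T| = X diag(s) X*
V = W @ Xh                                 # here unitary, as T is invertible

assert np.allclose(V @ absT, T)                         # T = V |T|
assert np.allclose(V.conj().T @ V, np.eye(4))           # V is an isometry
# Singular value decomposition T = sum_i s_i |w_i><x_i|:
R = sum(s[i] * np.outer(W[:, i], Xh[i]) for i in range(4))
assert np.allclose(R, T)
```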
Exercises

H will denote a Hilbert space over F.

Exercise 10.5.4. Compute the polar decomposition of

A = [[0, −1], [1, 0]] and B = [[1, −1], [1, 1]] ∈ M2(C).
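One way to check an answer to this exercise (an illustrative numpy sketch, not the intended pen-and-paper solution): A is already unitary, so A = A · I, and B*B = 2I, so |B| = √2 I and V = B/√2.

```python
import numpy as np

A = np.array([[0, -1], [1, 0]], dtype=complex)
B = np.array([[1, -1], [1, 1]], dtype=complex)

for M in (A, B):
    W, s, Xh = np.linalg.svd(M)
    absM = Xh.conj().T @ np.diag(s) @ Xh   # |M|
    V = W @ Xh                             # the (here unitary) polar factor
    assert np.allclose(V @ absM, M)        # M = V |M|

# Sanity checks on the closed forms quoted above.
assert np.allclose(A.conj().T @ A, np.eye(2))       # |A| = I, V = A
assert np.allclose(B.conj().T @ B, 2 * np.eye(2))   # |B| = sqrt(2) I
```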