Guillaume Pouliot
August 23, 2008
Abstract
We here present the main conclusions and theorems from a first rigorous inquiry into linear algebra. Although some basic definitions and lemmas have been omitted so as to keep this exposition decently short, all the main theorems necessary to prove and understand the spectral, or diagonalization, theorem are presented here. Special attention has been placed on making the proofs not only proofs of existence, but as enlightening as possible to the reader.
1 Introduction
Since William Rowan Hamilton developed the concept of quaternions in 1843, and since Arthur Cayley suggested and developed the idea of matrices, linear algebra has known phenomenal growth. It has become central to the study of many fields, with the notable example of statistics. For that reason, the study of linear algebra has increasingly imposed itself as an unavoidable field for students endeavoring to do empirical research. Hence the pages to follow consist of a pedagogical presentation of standard material in linear algebra up to and including the complex and real spectral theorems.
Throughout the paper, we assume the reader has had some experience with vector spaces and linear algebra. We hope to provide all the necessary information regarding the study of operators. As indicated by the title, we work towards proving the spectral theorem. We do so because we believe it is one of the most important results in the applications of linear algebra, and why it is true thus deserves some attention.
2 The Rank-Nullity Theorem
We start by giving a proof of the rank-nullity theorem, as it will be used throughout our way towards proving the spectral theorem.
Let $V$ be a finite-dimensional vector space. Let $L(V)$ be the space of linear transformations from $V$ to $V$.
2.1 Theorem: The Rank-Nullity Theorem
Let $T \in L(V)$. Then $\dim V = \dim \operatorname{range} T + \dim \ker T$.
Proof
$\operatorname{range} T$ and $\ker T$ are both subspaces, thus they both have bases. Let $(w_1, \ldots, w_n)$ be a basis for the former, and $(u_1, \ldots, u_k)$ be a basis for the latter. That is, $\dim \operatorname{range} T = n$ and $\dim \ker T = k$.
Now we define the set $(v_1, \ldots, v_n)$ such that $T(v_i) = w_i$, $i = 1, \ldots, n$.
Take any $v \in V$; then
$$T(v) = a_1 w_1 + \cdots + a_n w_n$$
for some $a_i \in F$, $i = 1, \ldots, n$. So
$$T(a_1 v_1 + \cdots + a_n v_n - v) = 0$$
and thus $a_1 v_1 + \cdots + a_n v_n - v \in \ker T$. Hence
$$a_1 v_1 + \cdots + a_n v_n - v = b_1 u_1 + \cdots + b_k u_k,$$
so
$$v = a_1 v_1 + \cdots + a_n v_n - b_1 u_1 - \cdots - b_k u_k.$$
Therefore $(v_1, \ldots, v_n, u_1, \ldots, u_k)$ spans $V$. Now assume
$$c_1 v_1 + \cdots + c_n v_n + c_{n+1} u_1 + \cdots + c_{n+k} u_k = 0.$$
Applying $T$ to both sides of the above equality, we get
$$T(c_1 v_1 + \cdots + c_n v_n + c_{n+1} u_1 + \cdots + c_{n+k} u_k) = c_1 w_1 + \cdots + c_n w_n = 0.$$
But the $w_i$'s are linearly independent, so $c_1 = \cdots = c_n = 0$, leaving
$$c_{n+1} u_1 + \cdots + c_{n+k} u_k = 0;$$
but the $u_i$'s are linearly independent, thus $c_{n+1} = \cdots = c_{n+k} = 0$. Hence $(v_1, \ldots, v_n, u_1, \ldots, u_k)$ is linearly independent and thus a basis of $V$. Therefore
$$\dim V = n + k = \dim \operatorname{range} T + \dim \ker T. \qquad \blacksquare$$
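As a quick numerical sketch of the theorem (ours, not part of the original exposition), the matrix below is a hypothetical example whose rank and nullity are easy to see by inspection:

```python
import numpy as np

# Hypothetical example: a 4x4 matrix whose 2nd and 4th rows are
# linear combinations of the others, so dim range T = 2.
T = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],    # 2 * row 1
              [0., 0., 1., 1.],
              [1., 2., 4., 5.]])   # row 1 + row 3

n = T.shape[0]                          # dim V
rank = np.linalg.matrix_rank(T)         # dim range T
nullity = n - rank                      # dim ker T, by the theorem

# Cross-check the nullity directly: count (near-)zero singular values.
s = np.linalg.svd(T, compute_uv=False)
nullity_direct = int(np.sum(s < 1e-10))
print(rank, nullity, nullity_direct)
```

The direct count of zero singular values agrees with the count predicted by the theorem.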
3 Eigenvalues and Eigenvectors
3.1 Definition: Operator
An operator is a linear map from a vector space to itself.
3.2 Definition: Invariant Subspace
Let $T \in L(V)$. A subspace $U$ is invariant under $T$ if $Tu \in U$ for every $u \in U$.
3.3 Definition: Eigenvalue
Let $T \in L(V)$. A scalar $\lambda \in F$ is an eigenvalue if there is a non-zero vector $v \in V$ such that $Tv = \lambda v$.
Nota Bene: $T$ has a 1-dimensional invariant subspace if and only if $T$ has an eigenvalue.
Observe that $Tu = \lambda u \iff (T - \lambda I)u = 0$. Therefore $\lambda$ is an eigenvalue of $T$ if and only if $(T - \lambda I)$ is not injective $\iff (T - \lambda I)$ is not surjective $\iff (T - \lambda I)$ is not invertible. These equivalences, of course, rely on $V$ being finite-dimensional.
3.4 Definition: Eigenvector
We call the $v$ in $Tv = \lambda v$ an eigenvector of $T$.
Note that because $Tu = \lambda u \iff (T - \lambda I)u = 0$, the set of eigenvectors of $T$ corresponding to $\lambda$ equals $\ker(T - \lambda I)$, which is a subspace of the vector space $V$.
The following statement, true in all dimensions, may make things more intuitive: an operator has an eigenvalue if and only if there exists a nonzero vector in its domain that gets sent by the operator to a scalar multiple of itself.
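The identification of the eigenspace with $\ker(T - \lambda I)$ can be checked numerically; the matrix below is a hypothetical example, and we count the zero singular values to measure the null space:

```python
import numpy as np

# Hypothetical example: a Jordan-type block with eigenvalue 3.
A = np.array([[3., 1.],
              [0., 3.]])
lam = 3.0

# The eigenvectors for lam are exactly the nonzero elements of
# ker(A - lam*I); its dimension is the number of zero singular values.
M = A - lam * np.eye(2)
s = np.linalg.svd(M, compute_uv=False)
eigenspace_dim = int(np.sum(s < 1e-10))
print(eigenspace_dim)
```

Here the eigenspace is 1-dimensional even though the eigenvalue 3 is repeated, which is exactly what $\ker(A - 3I)$ reports.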
3.5 Theorem
Let $T \in L(V)$. Suppose that $\lambda_1, \ldots, \lambda_m$ are distinct eigenvalues of $T$ and $v_1, \ldots, v_m$ are corresponding nonzero eigenvectors. Then $\{v_1, \ldots, v_m\}$ is linearly independent.
Proof
Suppose not. Let $v_k$ be the eigenvector, with the smallest subscript, that is a linear combination of the others. Then
$$v_k = a_1 v_1 + \cdots + a_{k-1} v_{k-1} \qquad [1]$$
where some $a_i \neq 0$. Applying $T$,
$$Tv_k = T(a_1 v_1 + \cdots + a_{k-1} v_{k-1})$$
$$\lambda_k v_k = a_1 T v_1 + \cdots + a_{k-1} T v_{k-1} = a_1 \lambda_1 v_1 + \cdots + a_{k-1} \lambda_{k-1} v_{k-1}. \qquad [2]$$
Multiplying [1] by $\lambda_k$ and subtracting from [2] gives
$$0 = a_1(\lambda_1 - \lambda_k)v_1 + \cdots + a_{k-1}(\lambda_{k-1} - \lambda_k)v_{k-1},$$
where $\lambda_1, \ldots, \lambda_{k-1} \neq \lambda_k$, and the $v_i$'s, $1 \le i \le k-1$, are linearly independent by the minimality of $k$, which implies that all the $a_i$'s are 0. But some $a_i \neq 0$, since the $a_i$'s were the coefficients of the linear combination expressing the nonzero $v_k$. This is a contradiction. Then $(v_1, \ldots, v_m)$ is linearly independent. $\blacksquare$
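A numerical sketch of Theorem 3.5 (a hypothetical example of ours): when the eigenvalues are distinct, the matrix of eigenvectors has full rank, i.e. the eigenvectors are linearly independent.

```python
import numpy as np

# Hypothetical example with three distinct eigenvalues (2, 3 and 5).
A = np.array([[2., 1., 0.],
              [0., 3., 1.],
              [0., 0., 5.]])

vals, vecs = np.linalg.eig(A)
distinct = len(set(np.round(vals.real, 8)))

# Theorem 3.5: the matrix whose columns are the eigenvectors must
# then have full rank, i.e. the eigenvectors are linearly independent.
rank = np.linalg.matrix_rank(vecs)
print(distinct, rank)
```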
3.6 Theorem
Every operator on a finite-dimensional, nonzero, complex vector space has an eigenvalue.
Proof
Take some complex vector space $V$ of dimension $n$ and pick any nonzero $v \in V$. Consider the vectors
$$(v, Tv, \ldots, T^n v).$$
As this list contains $n+1$ vectors, it must be linearly dependent: there exist $a_0, \ldots, a_n \in \mathbb{C}$ with some $a_i \neq 0$. Let $m$ be the greatest integer such that $a_m \neq 0$. Then
$$0 = a_0 v + a_1 Tv + \cdots + a_m T^m v$$
$$= (a_0 I + a_1 T + \cdots + a_m T^m)v$$
$$= c(T - \lambda_1 I) \cdots (T - \lambda_m I)v,$$
where the factorization follows from the fundamental theorem of algebra. Hence $T - \lambda_j I$ is not injective for some $j$, so $T$ has an eigenvalue. $\blacksquare$
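Theorem 3.6 is easy to see in action (a sketch of ours): a plane rotation has no real eigenvalue, yet viewed over the complex numbers it must have one.

```python
import numpy as np

# A rotation by 90 degrees: no real eigenvalue exists, yet viewed as an
# operator on a complex vector space it must have one (Theorem 3.6).
R = np.array([[0., -1.],
              [1.,  0.]])
vals = np.linalg.eigvals(R)   # NumPy computes over the complex numbers
print(vals)
```

The eigenvalues come out as $\pm i$: purely imaginary, which is precisely why no real eigenvalue exists.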
3.7 Lemma
For $T \in L(V)$: $T$ has an eigenvalue if and only if $T$ has a 1-dimensional invariant subspace.
Proof
First suppose $T$ has an eigenvalue. There exists $\lambda \in F$ such that, for some nonzero $u \in V$, we have $Tu = \lambda u$. Then consider the 1-dimensional subspace $\operatorname{span}(u) = U$. Take any $w \in U$, so $w = \alpha u$ for some $\alpha \in F$. Then
$$Tw = T(\alpha u) = \alpha Tu = \alpha \lambda u \in U,$$
so $U$ is invariant.
Now suppose $T$ has a one-dimensional invariant subspace, and let $U$ be that subspace. For any nonzero $u \in U$, we have that $Tu \in U$, so $Tu = \lambda u$ for some $\lambda \in F$, and $\lambda$ is an eigenvalue of $T$. $\blacksquare$
3.8 Definition: Projection Operator
If $V = U \oplus W$, such that we can write any $v \in V$ as $v = u + w$ where $u \in U$ and $w \in W$, the projection operator is given by $P_{U,W}\, v = u$. Also, $P_{U,W}$ is an orthogonal projection if $W = U^{\perp}$.
4.2 Definition
The orthogonal projection of $V$ onto $U$, $P_U$, is defined such that $P_U v = u$ where $v = u + u^{\perp}$ for $u \in U$ and $u^{\perp} \in U^{\perp}$.
It is also quite straightforward to show that $P_U$, an orthogonal projection of $V$ onto $U$, has the following properties:
$$\operatorname{range} P_U = U$$
$$\ker P_U = U^{\perp}$$
$$v - P_U v \in U^{\perp} \ \text{for every } v \in V$$
$$P_U^2 = P_U$$
$$\|P_U v\| \le \|v\|, \ \forall v \in V.$$
However, showing that some of these properties suffice to define an orthogonal projection is trickier.
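The listed properties can all be verified numerically. Below is a minimal sketch of ours: we build $P_U$ for $U = C(A)$ via the standard formula $P = A(A'A)^{-1}A'$, where $A$ is a hypothetical full-column-rank matrix.

```python
import numpy as np

# Sketch: orthogonal projection onto the column space of a random A.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))          # U is a 2-dimensional subspace of R^5
P = A @ np.linalg.inv(A.T @ A) @ A.T     # orthogonal projection onto U

v = rng.standard_normal(5)
idempotent = np.allclose(P @ P, P)                    # P_U^2 = P_U
residual_perp = np.allclose(A.T @ (v - P @ v), 0.)    # v - P_U v is in U-perp
contractive = np.linalg.norm(P @ v) <= np.linalg.norm(v) + 1e-12
print(idempotent, residual_perp, contractive)
```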
4.3 Theorem
If $P \in L(V)$ is idempotent, i.e. $P^2 = P$, and every vector in $\ker P$ is orthogonal to every vector in $\operatorname{range} P$, then $P$ is an orthogonal projection.
Proof
Take any $v \in V$; then
$$v = Pv + (I - P)v$$
where, clearly, $Pv \in \operatorname{range} P$ and $(I - P)v \in \ker P$ because $P((I - P)v) = Pv - PPv = Pv - Pv = 0$.
Now take any $v \in \ker P \cap \operatorname{range} P$. Then $v \in \operatorname{range} P$ means $\exists v'$ s.t. $Pv' = v$, and $v \in \ker P$ means $Pv = 0$, so $PPv' = 0 \Rightarrow Pv' = 0 \Rightarrow v = 0$. Hence
$$\ker P \cap \operatorname{range} P = \{0\}.$$
Observe how we didn't need orthogonality to prove that the direct sum of $\ker P$ and $\operatorname{range} P$ is $V$. That is, an idempotent operator is necessarily a projection, but not necessarily an orthogonal projection.
So $V = \ker P \oplus \operatorname{range} P$. But $\ker P$ and $\operatorname{range} P$ are orthogonal, so $P$ is an orthogonal projection onto $P(V)$. $\blacksquare$
We saw that if a matrix or a projector is idempotent and its column space is orthogonal to its null space, then that matrix or projector is an orthogonal projection.
4.4 Theorem
Suppose $P \in L(V)$ is idempotent, i.e. $P^2 = P$. Then $P$ is an orthogonal projection if and only if $P$ is self-adjoint.
Proof
First assume $P$ is self-adjoint. $P$ is idempotent, so $P = P_{\operatorname{range} P, \ker P}$. Thus we need to show that $\operatorname{range} P$ and $\ker P$ are orthogonal in order to show that $P$ is an orthogonal projection. We take $v \in \operatorname{range} P$ and $w \in \ker P$. There exists $v'$ s.t. $Pv' = v$. Then
$$\langle v, w \rangle = \langle Pv', w \rangle = \langle v', P^* w \rangle = \langle v', Pw \rangle = \langle v', 0 \rangle = 0.$$
So $P$ is an orthogonal projection.
Now assume $P$ is an orthogonal projection. Take $v, z \in V$ and consider their unique decompositions $v = u + w$ and $z = u' + w'$ where $u, u' \in \operatorname{range} P$ and $w, w' \in \ker P$. Then
$$\langle Pv, z \rangle = \langle u, u' + w' \rangle = \langle u, u' \rangle.$$
Similarly,
$$\langle v, Pz \rangle = \langle u + w, u' \rangle = \langle u, u' \rangle,$$
so $\langle (P - P^*)v, z \rangle = \langle Pv, z \rangle - \langle v, Pz \rangle = 0$ for all $v, z \in V$, which gives $P^* = P$. $\blacksquare$
Anticipating the next proof, we take an instant to note that the uniqueness of a projection $P_{U,W}$ follows from the unique decomposition $v = u + w$ into elements of the two subspaces, $U$ and $W$, the direct sum of which is $V$. That is, for $M$ and $P$, two projections onto $U$ along $W$, for our arbitrary $v$ we get $Mv = u = Pv$, so $M = P$.
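As a counterpoint sketch (our hypothetical example), here is an idempotent but non-symmetric ("oblique") projection, which shows why Theorem 4.4 needs self-adjointness:

```python
import numpy as np

# An idempotent but non-symmetric projection.
Q = np.array([[1., 1.],
              [0., 0.]])
idempotent = np.allclose(Q @ Q, Q)     # Q^2 = Q: Q is a projection
symmetric = np.allclose(Q, Q.T)        # fails: Q is not self-adjoint

# Because ker Q and range Q are not orthogonal, Q can lengthen vectors,
# something an orthogonal projection never does (||P_U v|| <= ||v||).
v = np.array([1., 1.]) / np.sqrt(2.)   # a unit vector
stretch = np.linalg.norm(Q @ v)
print(idempotent, symmetric, stretch)
```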
4.5 Definition
Let $T \in L(V)$. Then let $C(M(T))$ be the column space of the matrix representation of $T$. That is, if $M(T) = \begin{bmatrix} c_1 & \cdots & c_n \end{bmatrix}$ where $c_1, c_2, \ldots, c_n$ are the columns of $M(T)$, then $C(M(T)) = \operatorname{span}(\{c_1, c_2, \ldots, c_n\})$.
4.6 Theorem
Let $o_1, \ldots, o_r$ be an orthonormal basis for $C(X)$, and let $O = [o_1, \ldots, o_r]$. Then $OO' = \sum_{i=1}^{r} o_i o_i'$, where we use $'$ to denote the transpose of the matrix, is the perpendicular projection operator onto $C(X)$.
Proof
First we show that $OO'$ is idempotent, i.e. $(OO')(OO') = OO'$. Note that $O'O$ is an $r \times r$ matrix whose diagonal elements are inner products of equal orthonormal vectors, i.e. $o_i' o_i = 1$, and whose off-diagonal elements are inner products of unequal orthonormal vectors, i.e. $o_i' o_j = 0$ because $i \neq j$. Therefore $O'O = I_r$, so
$$(OO')(OO') = O(O'O)O' = O I_r O' = OO',$$
i.e. $OO'$ is idempotent. Because $OO'$ is also symmetric, if we show that $C(OO') = C(X)$, then $OO'$ is the perpendicular projection onto $C(X)$.
First we want to show that $C(OO') \subseteq C(O)$. We know that $OO' = \sum_{i=1}^{r} o_i o_i'$. Let us write $o_i = (a_{1i}, \ldots, a_{ni})'$. Then
$$o_i o_i' = \begin{bmatrix} a_{1i}a_{1i} & a_{1i}a_{2i} & \cdots & a_{1i}a_{ni} \\ a_{2i}a_{1i} & a_{2i}a_{2i} & \cdots & a_{2i}a_{ni} \\ \vdots & \vdots & & \vdots \\ a_{ni}a_{1i} & a_{ni}a_{2i} & \cdots & a_{ni}a_{ni} \end{bmatrix} = \begin{bmatrix} a_{1i}\, o_i & a_{2i}\, o_i & \cdots & a_{ni}\, o_i \end{bmatrix},$$
so
$$OO' = \sum_{i=1}^{r} o_i o_i' = \begin{bmatrix} \sum_{i=1}^{r} a_{1i}\, o_i & \sum_{i=1}^{r} a_{2i}\, o_i & \cdots & \sum_{i=1}^{r} a_{ni}\, o_i \end{bmatrix},$$
where each column is a linear combination of the columns of $O = [o_1, \ldots, o_r]$. Hence $C(OO') \subseteq C(O) = C(X)$.
Conversely, take $v \in C(X)$, so $v = Ob$ for some $b \in \mathbb{R}^r$. Since $O'O = I_r$,
$$v = Ob = O(O'O)b = (OO')(Ob),$$
thus $v$ is a linear combination of the columns of $OO'$, i.e. $v \in C(OO')$. This also reads $v = OO'v$: $OO'$ maps $v$ to itself.
Hence $C(X) \subseteq C(OO')$, so $C(X) = C(OO')$, and $OO'$ is the perpendicular projection onto $C(X)$. $\blacksquare$
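A numerical sketch of Theorem 4.6 (our own example): obtain an orthonormal basis $O$ for $C(X)$, here via a QR factorization, and check that $OO'$ is the perpendicular projection onto $C(X)$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 3))        # C(X) is (generically) 3-dimensional

O, _ = np.linalg.qr(X)                 # 6x3, orthonormal columns o_1, ..., o_r
P = O @ O.T                            # OO' = sum of the outer products o_i o_i'

basis_ok = np.allclose(O.T @ O, np.eye(3))   # O'O = I_r
idempotent = np.allclose(P @ P, P)
symmetric = np.allclose(P, P.T)
fixes_CX = np.allclose(P @ X, X)             # OO' maps each column of X to itself
print(basis_ok, idempotent, symmetric, fixes_CX)
```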
4.7 Theorem
$\dim U^{\perp} = \dim V - \dim U$.
Proof
Consider any subspace $U$; it has a unique orthogonal complement $U^{\perp}$. These suffice to define the orthogonal projection $P_U$, for which $\operatorname{range} P_U = U$ and $\ker P_U = U^{\perp}$. Thus, by the rank-nullity theorem,
$$\dim V = \dim \operatorname{range} P_U + \dim \ker P_U$$
$$\dim V = \dim U + \dim U^{\perp}. \qquad \blacksquare$$
4.8 Definition
For any $T \in L(V, W)$, the adjoint of $T$ is defined to be the linear map $T^* \in L(W, V)$ such that for any two $v \in V$, $w \in W$, we have $\langle Tv, w \rangle = \langle v, T^* w \rangle$.
4.9 Proposition
For $T \in L(V)$ and $U$ a subspace of $V$, $U$ is invariant under $T$ if and only if $U^{\perp}$ is invariant under $T^*$.
Proof
Pick any $v \in U$, $w \in U^{\perp}$, so that $\langle v, w \rangle = 0$. Suppose $U$ is invariant under $T$. Then $Tv \in U$, so
$$\langle Tv, w \rangle = 0$$
$$\langle v, T^* w \rangle = 0$$
for every $v \in U$, which gives $T^* w \in U^{\perp}$: $U^{\perp}$ is invariant under $T^*$.
The proof in the other direction is perfectly analogous. $\blacksquare$
4.10 Proposition
Let $T \in L(V, W)$. Then
$$\dim \operatorname{range} T^* = \dim \operatorname{range} T.$$
Proof
First observe that
$$w \in \ker T^* \iff \langle v, T^* w \rangle = 0, \ \forall v \in V \iff \langle Tv, w \rangle = 0, \ \forall v \in V \iff w \in (\operatorname{range} T)^{\perp},$$
so $\ker T^* = (\operatorname{range} T)^{\perp}$. Similarly, $\ker T = (\operatorname{range} T^*)^{\perp}$.
Then the proof becomes almost trivial because
$$\dim V = \dim \operatorname{range} T^* + \dim (\operatorname{range} T^*)^{\perp} = \dim \operatorname{range} T^* + \dim \ker T$$
and, by the rank-nullity theorem,
$$\dim V = \dim \operatorname{range} T + \dim \ker T,$$
so $\dim \operatorname{range} T^* = \dim \operatorname{range} T$. $\blacksquare$
Interestingly, it directly follows from this result that the dimension of the column space and the row space of a matrix must be the same.
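That consequence of Proposition 4.10 is easy to check numerically; the rank-deficient matrix below is a hypothetical example of ours.

```python
import numpy as np

# A 7x5 matrix of rank at most 4 (a product through a 4-dim space).
rng = np.random.default_rng(2)
A = rng.standard_normal((7, 4)) @ rng.standard_normal((4, 5))

rank_cols = np.linalg.matrix_rank(A)      # dim of the column space
rank_rows = np.linalg.matrix_rank(A.T)    # dim of the row space
print(rank_cols, rank_rows)
```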
5 Inner-Product Spaces Continued
5.1 Definition
An operator $T \in L(V)$ is self-adjoint if $T = T^*$.
5.2 Proposition
Every eigenvalue of a self-adjoint operator is real.
Proof
Suppose $T$ is a self-adjoint operator on $V$. Let $\lambda$ be an eigenvalue of $T$, and take $v$ a corresponding (nonzero) eigenvector. Then
$$\lambda \|v\|^2 = \langle \lambda v, v \rangle = \langle Tv, v \rangle = \langle v, Tv \rangle = \langle v, \lambda v \rangle = \bar{\lambda} \|v\|^2$$
because $T = T^*$. Thus $\lambda = \bar{\lambda}$, so $\lambda \in \mathbb{R}$ for all eigenvalues $\lambda$ of $T$. $\blacksquare$
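A sketch of Proposition 5.2 with a hypothetical self-adjoint (Hermitian) matrix: it equals its conjugate transpose, and its eigenvalues come out real.

```python
import numpy as np

# A Hermitian matrix: equal to its conjugate transpose.
H = np.array([[2., 1. + 1.j],
              [1. - 1.j, 3.]])
is_self_adjoint = np.allclose(H, H.conj().T)

vals = np.linalg.eigvals(H)
eigs_real = np.allclose(vals.imag, 0.)   # Proposition 5.2
print(is_self_adjoint, eigs_real)
```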
5.3 Definition
An operator $T$ is normal if $TT^* = T^* T$.
5.4 Proposition
If $T \in L(V)$ is normal, then
$$\operatorname{range} T = \operatorname{range} T^*.$$
Proof
First we show that $\ker T = \ker T^* T$. Take $u \in \ker T^* T$. Then $T^* T u = 0 \Rightarrow Tu \in \ker T^* = (\operatorname{range} T)^{\perp} \Rightarrow Tu \in \operatorname{range} T \cap (\operatorname{range} T)^{\perp} \Rightarrow Tu = 0 \Rightarrow u \in \ker T$, so $\ker T^* T \subseteq \ker T$. And of course, if $u \in \ker T$, then $Tu = 0 \Rightarrow T^* T u = 0$ and $u \in \ker T^* T$, so $\ker T \subseteq \ker T^* T$. Hence $\ker T^* T = \ker T$. With this in hand, the same argument applied to $T^*$ gives $\ker T T^* = \ker T^*$. But $TT^* = T^* T$ implies that
$$\ker T^* = \ker TT^* = \ker T^* T = \ker T.$$
Furthermore,
$$\ker T^* = \ker T \Rightarrow (\ker T^*)^{\perp} = (\ker T)^{\perp} \Rightarrow \operatorname{range} T = \operatorname{range} T^*. \qquad \blacksquare$$
5.5 Proposition
If $T \in L(V)$ is normal, then
$$\ker T^k = \ker T$$
and
$$\operatorname{range} T^k = \operatorname{range} T$$
for every positive integer $k$.
Proof
First we show that $\ker T^2 = \ker T$. Take $u \in \ker T^2$. Then $TTu = 0 \Rightarrow Tu \in \ker T = \ker T^*$ (by normality) $= (\operatorname{range} T)^{\perp} \Rightarrow Tu \in \operatorname{range} T \cap (\operatorname{range} T)^{\perp} \Rightarrow Tu = 0 \Rightarrow u \in \ker T$, so $\ker T^2 \subseteq \ker T$. And again, it is trivial that $\ker T \subseteq \ker T^2$, which in turn implies that $\ker T = \ker T^2$. The inductive step is identical as long as $T^k$ is also normal, which is obvious.
Now we show that $\operatorname{range} T^k = \operatorname{range} T$. First we show that $\operatorname{range} T^2 = \operatorname{range} T$. Indeed,
$$(\operatorname{range} T^2)^{\perp} = \ker (T^2)^* = \ker (T^*)^2 = \ker T^* = (\operatorname{range} T)^{\perp},$$
so $\operatorname{range} T^2 = \operatorname{range} T$, and the inductive step is again identical. $\blacksquare$
5.6 Theorem
$M(T^*, (f_1, \ldots, f_m), (e_1, \ldots, e_n))$ is the conjugate transpose of $M(T, (e_1, \ldots, e_n), (f_1, \ldots, f_m))$.
Proof
Suppose $T \in L(V, W)$. Assume $(e_1, \ldots, e_n)$ is an orthonormal basis of $V$ and $(f_1, \ldots, f_m)$ is an orthonormal basis of $W$. We know the $k$th column of $M(T)$ is obtained by placing the $j$th coefficient of the linear combination expressing $Te_k$ in its $j$th row cell. Furthermore, because $(f_1, \ldots, f_m)$ is an orthonormal basis, we can write
$$Te_k = \langle Te_k, f_1 \rangle f_1 + \cdots + \langle Te_k, f_m \rangle f_m.$$
Thus
$$M(T) = \begin{bmatrix} \langle Te_1, f_1 \rangle & \cdots & \langle Te_n, f_1 \rangle \\ \vdots & & \vdots \\ \langle Te_1, f_m \rangle & \cdots & \langle Te_n, f_m \rangle \end{bmatrix}.$$
Similarly, we find the entries of the $k$th column of $M(T^*)$ from the linear decomposition
$$T^* f_k = \langle T^* f_k, e_1 \rangle e_1 + \cdots + \langle T^* f_k, e_n \rangle e_n = \langle f_k, Te_1 \rangle e_1 + \cdots + \langle f_k, Te_n \rangle e_n = \overline{\langle Te_1, f_k \rangle}\, e_1 + \cdots + \overline{\langle Te_n, f_k \rangle}\, e_n.$$
Thus
$$M(T^*) = \begin{bmatrix} \overline{\langle Te_1, f_1 \rangle} & \cdots & \overline{\langle Te_1, f_m \rangle} \\ \vdots & & \vdots \\ \overline{\langle Te_n, f_1 \rangle} & \cdots & \overline{\langle Te_n, f_m \rangle} \end{bmatrix},$$
which is the conjugate transpose of $M(T)$. $\blacksquare$
6 The Complex Spectral Theorem
6.1 Theorem: The Complex Spectral Theorem
Suppose that $V$ is a complex inner-product space and $T \in L(V)$. Then $V$ has an orthonormal basis consisting of eigenvectors of $T$ if and only if $T$ is normal.
Proof
First suppose $V$ has an orthonormal basis $(v_1, \ldots, v_n)$ of eigenvectors of $T$, so that $M(T, (v_1, \ldots, v_n)) = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, which is a diagonal matrix. Then $M(T^*, (v_1, \ldots, v_n)) = \operatorname{diag}(\bar{\lambda}_1, \ldots, \bar{\lambda}_n)$ is also a diagonal matrix. Matrix multiplication with diagonal matrices is obviously commutative, which implies that $TT^* = T^* T$: $T$ is normal.
Now suppose $T$ is normal. Recall that every operator on a complex vector space has an upper-triangular matrix with respect to some orthonormal basis $(e_1, \ldots, e_n)$ (Schur's theorem), so
$$M(T) = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ & \ddots & \vdots \\ 0 & & a_{nn} \end{bmatrix}$$
for some $a_{ij} \in \mathbb{C}$, $1 \le i \le j \le n$. Therefore
$$Te_1 = a_{11} e_1$$
and thus
$$\|Te_1\|^2 = |a_{11}|^2.$$
And
$$\|T^* e_1\|^2 = |a_{11}|^2 + |a_{12}|^2 + \cdots + |a_{1n}|^2.$$
However, because $T$ is normal we have that
$$\|Te_1\|^2 = \langle Te_1, Te_1 \rangle = \langle e_1, T^* T e_1 \rangle = \langle e_1, TT^* e_1 \rangle = \langle T^* e_1, T^* e_1 \rangle = \|T^* e_1\|^2,$$
so
$$|a_{11}|^2 = |a_{11}|^2 + |a_{12}|^2 + \cdots + |a_{1n}|^2$$
and thus
$$|a_{12}|^2, \ldots, |a_{1n}|^2 = 0$$
and thus
$$a_{12}, \ldots, a_{1n} = 0.$$
Similarly, row by row, we have that for all $1 \le j \le n$ the entries $a_{j,j+1}, \ldots, a_{jn}$ above the diagonal vanish. And thus
$$M(T) = \begin{bmatrix} a_{11} & & 0 \\ & \ddots & \\ 0 & & a_{nn} \end{bmatrix}$$
is a diagonal matrix, so the basis vectors $e_i$ are eigenvectors of $T$. $\blacksquare$
7 The Real Spectral Theorem
7.1 Theorem: The Real Spectral Theorem
Suppose that $V$ is a real inner-product space and $T \in L(V)$. Then $V$ has an orthonormal basis consisting of eigenvectors of $T$ if and only if $T$ is self-adjoint.
Proof
First, suppose that $V$ has an orthonormal basis $B$ consisting of eigenvectors of $T$. Then $M(T, B)$ is diagonal with real entries, so
$$M(T, B) = M(T^*, B)$$
because $V$ is a real vector space. In other words $T = T^*$, that is, $T$ is self-adjoint.
Now suppose $T$ is self-adjoint. We shall induct on $\dim V$ to show that $V$ has an orthonormal basis consisting of eigenvectors of $T$. If $\dim V = 1$, then $T$ is a scalar multiple of the identity, so our claim holds. Now, let $n = \dim V$ and choose some $v \in V$ such that $v \neq 0$. Then it must be that the list of $n + 1$ vectors
$$(v, Tv, \ldots, T^n v)$$
is linearly dependent.
Therefore, there exist $a_0, a_1, \ldots, a_n$, not all zero, such that
$$0 = a_0 v + a_1 Tv + \cdots + a_n T^n v$$
$$= (a_0 I + a_1 T + \cdots + a_n T^n)v$$
$$= c(T^2 + \alpha_1 T + \beta_1 I) \cdots (T^2 + \alpha_M T + \beta_M I)(T - \lambda_1 I) \cdots (T - \lambda_m I)v$$
by polynomial decomposition, with $c, \alpha_i, \beta_i, \lambda_j \in \mathbb{R}$, $1 \le i \le M$, $1 \le j \le m$, and $m + 2M \ge 1$, where it can be shown that $(T^2 + \alpha_j T + \beta_j I)$ is injective for $1 \le j \le M$ (using the self-adjointness of $T$ and $\alpha_j^2 < 4\beta_j$). This implies that
$$(T - \lambda_1 I) \cdots (T - \lambda_m I)v = 0$$
and consequently
$$(T - \lambda_j I) \ \text{is not injective for some } j,$$
from which it follows that $T$ has an eigenvalue.
Observation: for $T - \lambda_k I$ not injective, and $(T - \lambda_1 I) \cdots (T - \lambda_m I)v = 0$, we have that for some choice of exponents $\delta_1, \ldots, \delta_{k-1}, \delta_{k+1}, \ldots, \delta_m$, where some are 0 and the others are 1,
$$(T - \lambda_1 I)^{\delta_1} \cdots (T - \lambda_{k-1} I)^{\delta_{k-1}} (T - \lambda_{k+1} I)^{\delta_{k+1}} \cdots (T - \lambda_m I)^{\delta_m} v$$
is an eigenvector of $T$ corresponding to the eigenvalue $\lambda_k$.
Now let $\lambda$ be an eigenvalue of $T$. There exists a corresponding eigenvector $w$. Take $u = \frac{w}{\|w\|}$, so that $\|u\| = 1$. Also, take
$$U = \{\alpha u \mid \alpha \in \mathbb{R}\};$$
then $U$ is invariant under $T$, and, since $T$ is self-adjoint, $U^{\perp}$ is invariant under $T$ as well.
We can thus define $S = T|_{U^{\perp}}$, which is also self-adjoint. Then by our inductive hypothesis, there exists an orthonormal basis of $U^{\perp}$ consisting of eigenvectors of $S$; call it $B_S$. Thus $B_S \cup \{u\}$ is an orthonormal basis of $V$ consisting of eigenvectors of $T$. Of course, with respect to that basis, $M(T)$ is a diagonal matrix. $\blacksquare$
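The real spectral theorem can be sketched numerically: for a real symmetric matrix, `np.linalg.eigh` returns an orthonormal basis of eigenvectors in which the operator is diagonal (our hypothetical example below).

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
S = B + B.T                            # a real self-adjoint operator

vals, Q = np.linalg.eigh(S)            # columns of Q: orthonormal eigenvectors
orthonormal = np.allclose(Q.T @ Q, np.eye(4))
diagonalized = np.allclose(Q.T @ S @ Q, np.diag(vals))
print(orthonormal, diagonalized)
```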
8 Positive Operators
8.1 Definition
An operator $T \in L(V)$ is positive if $T$ is self-adjoint and
$$\langle Tv, v \rangle \ge 0, \ \forall v \in V.$$
We first observe that the set of orthogonal projections is a subset of the set of positive operators.
8.1.1 Proposition
Every orthogonal projection is positive.
Proof
Consider $T$, an orthogonal projection onto $U$. Take any $v \in V$ and consider its unique decomposition $v = u + w$ where $u \in U$ and $w \in U^{\perp}$. Then
$$\langle Tv, v \rangle = \langle u, u + w \rangle = \langle u, u \rangle + \langle u, w \rangle = \|u\|^2 \ge 0. \qquad \blacksquare$$
Depending on what kind of problem we are looking at, it becomes interesting to look at alternative definitions of a positive operator. Indeed, and as we will show, it is perfectly correct to define a positive operator as an operator $T$ being self-adjoint and having nonnegative eigenvalues; as having a positive square root; as having a self-adjoint square root; or as an operator $T$ for which there exists an operator $S \in L(V)$ such that $S^* S = T$.
Indeed, if $T$ is positive, then consider any eigenvalue $\lambda$ of $T$ (the existence of which is guaranteed by self-adjointness) and a corresponding eigenvector $v$:
$$0 \le \langle Tv, v \rangle = \langle \lambda v, v \rangle = \lambda \|v\|^2 \Rightarrow \lambda \ge 0.$$
Furthermore, because $T$ is self-adjoint by definition, we find that being positive implies being self-adjoint with nonnegative eigenvalues.
Now, from there we find that, by the real spectral theorem, there exists an orthonormal basis of $V$ of eigenvectors of $T$; label it $(v_1, \ldots, v_n)$. Then for each element of the basis, we find that, because the eigenvalues are nonnegative, we can write
$$Tv_i = \lambda_i v_i = \sqrt{\lambda_i}\sqrt{\lambda_i}\, v_i,$$
where $\lambda_i$ is the eigenvalue corresponding to the eigenvector $v_i$. And we thus fully define an operator $S$ acting on each basis element in the following way: $Sv_i = \sqrt{\lambda_i}\, v_i$, $1 \le i \le n$. Thus
$$SSv = SS \sum_i a_i v_i = \sum_i a_i \sqrt{\lambda_i}\sqrt{\lambda_i}\, v_i = \sum_i a_i T v_i = T \sum_i a_i v_i = Tv$$
for any $v = \sum_i a_i v_i \in V$.
Furthermore,
$$\langle Sv, v \rangle = \Big\langle S \sum_i a_i v_i, \sum_j a_j v_j \Big\rangle = \Big\langle \sum_i a_i \sqrt{\lambda_i}\, v_i, \sum_j a_j v_j \Big\rangle = \sum_i \sqrt{\lambda_i}\, |a_i|^2 \ge 0$$
because the basis is orthonormal, where the last inequality holds because each $\sqrt{\lambda_i}$ is the square root of a nonnegative real.
Furthermore,
$$\langle Sv, v \rangle = \sum_i \Big\langle a_i \sqrt{\lambda_i}\, v_i, a_i v_i \Big\rangle = \sum_i \Big\langle a_i v_i, a_i \sqrt{\lambda_i}\, v_i \Big\rangle = \langle v, Sv \rangle$$
because $\sqrt{\lambda_i} \in \mathbb{R}$, so $S = S^*$: $S$ is a self-adjoint (indeed positive) square root of $T$, with $S^* S = SS = T$.
Conversely, if $T = S^* S$ for some $S \in L(V)$, then
$$\langle S^* S v, v \rangle = \langle v, (S^* S)^* v \rangle = \langle v, S^* S v \rangle = \langle v, Tv \rangle,$$
so $T$ is self-adjoint, and
$$\langle Tv, v \rangle = \langle S^* S v, v \rangle = \langle Sv, Sv \rangle = \|Sv\|^2 \ge 0,$$
so $T$ is positive.
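The equivalences above can be sketched numerically (a hypothetical example of ours): $T = B'B$ is positive, its eigenvalues are nonnegative, and defining $S$ by $Sv_i = \sqrt{\lambda_i}\, v_i$ gives a self-adjoint square root with $SS = T$.

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3))
T = B.T @ B                            # of the form S*S, hence positive

vals, Q = np.linalg.eigh(T)
nonneg = np.all(vals >= -1e-10)        # nonnegative eigenvalues

# Square root via the spectral decomposition: S v_i = sqrt(lambda_i) v_i.
S = Q @ np.diag(np.sqrt(np.clip(vals, 0., None))) @ Q.T
is_root = np.allclose(S @ S, T)
self_adjoint = np.allclose(S, S.T)
print(nonneg, is_root, self_adjoint)
```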