Linear Algebra Ma106 Iitb

Linear Algebra
Murali K. Srinivasan
Jugal K. Verma
February 13, 2014
Contents
1 Matrices, Linear Equations and Determinants
1.1 Matrix Operations
1.2 Gauss Elimination
1.3 Determinants
2 Vector spaces and Linear Transformations
2.1 Vector Spaces
2.2 Linear transformations
3 Inner product spaces
3.1 Length, Projection, and Angle
3.2 Projections and Least Squares Approximations
3.3 Determinant and Volume
4 Eigenvalues and eigenvectors
4.1 Algebraic and Geometric multiplicities
4.2 Spectral Theorem
Chapter 1
Matrices, Linear Equations and Determinants
1.1 Matrix Operations
Convention 1.1.1. We shall write $F$ to mean either the real numbers $\mathbb{R}$ or the complex numbers $\mathbb{C}$. Elements of $F$ will be called scalars.
Let $m, n$ be positive integers. An $m \times n$ matrix $A$ over $F$ is a collection of $mn$ scalars $a_{ij} \in F$ arranged in a rectangular array of $m$ rows and $n$ columns:
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}.
\]
The entry in row $i$ and column $j$ is $a_{ij}$. We also write $A = (a_{ij})$ to denote the entries. When all the entries are in $\mathbb{R}$ we say that $A$ is a real matrix. Similarly, we define complex matrices. For example,
\[
\begin{bmatrix} 1 & 1 & 3/2 \\ 5/2 & 6 & 11.2 \end{bmatrix}
\]
is a $2 \times 3$ real matrix.
A $1 \times n$ matrix $[a_1 \; a_2 \; \cdots \; a_n]$ is called a row vector and an $m \times 1$ matrix
\[
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
\]
is called a column vector. An $n \times n$ matrix is called a square matrix.
Matrix Addition
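Let $M, N$ be $m \times n$ matrices. Then $M + N$ is an $m \times n$ matrix whose $(i, j)$ entry is the sum of the $(i, j)$ entries of $M$ and $N$. For example,
\[
\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix}
+
\begin{bmatrix} 1 & 0 & 3 \\ 4 & -3 & 1 \end{bmatrix}
=
\begin{bmatrix} 3 & 1 & 3 \\ 5 & 0 & 6 \end{bmatrix}.
\]
Note that addition is defined only when both matrices have the same size.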
Scalar multiplication
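Let $\alpha \in F$ and let $M$ be an $m \times n$ matrix. Then $\alpha M$ is an $m \times n$ matrix whose $(i, j)$ entry is $\alpha$ times the $(i, j)$ entry of $M$. For example,
\[
2 \begin{bmatrix} 0 & 1 \\ 2 & 3 \\ 2 & 1 \end{bmatrix}
=
\begin{bmatrix} 0 & 2 \\ 4 & 6 \\ 4 & 2 \end{bmatrix}.
\]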
Matrix multiplication
First we define the product of a row vector $a = [a_1 \; \cdots \; a_n]$ and a column vector
\[
b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix},
\]
both with $n$ components. Define $ab$ to be the scalar $\sum_{i=1}^{n} a_i b_i$.
The product of two matrices $A = (a_{ij})$ and $B = (b_{ij})$, denoted $AB$, is defined only when the number of columns of $A$ is equal to the number of rows of $B$. So let $A$ be an $m \times n$ matrix and let $B$ be an $n \times p$ matrix. Let the row vectors of $A$ be $A_1, A_2, \ldots, A_m$ and let the column vectors of $B$ be $B_1, B_2, \ldots, B_p$. We write
\[
A = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{bmatrix}, \qquad B = [B_1 \; B_2 \; \cdots \; B_p].
\]
Then $M = AB$ is an $m \times p$ matrix whose $(i, j)$ entry $m_{ij}$, for $1 \le i \le m$ and $1 \le j \le p$, is given by
\[
m_{ij} = A_i B_j = \sum_{k=1}^{n} a_{ik} b_{kj}.
\]
For example,
\[
\begin{bmatrix} 1 & 3 & 1 \\ 2 & 4 & 2 \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 5 & 4 \\ 8 & 6 \end{bmatrix}.
\]
The usefulness and meaning of this definition will emerge as this course progresses. Meanwhile, let us note other ways of thinking about matrix multiplication. First a definition. By a linear combination of $n \times 1$ column vectors $v_1, v_2, \ldots, v_r$ we mean a column vector $v$ of the form $v = \alpha_1 v_1 + \cdots + \alpha_r v_r$, where the scalars $\alpha_i \in F$ are called the coefficients. Similarly we define a linear combination of row vectors.
Matrix times a column vector
Lemma 1.1.2. Let $B = [B_1 \; B_2 \; \cdots \; B_p]$ be an $n \times p$ matrix with columns $B_1, \ldots, B_p$. Let
\[
x = \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix}
\]
be a column vector with $p$ components. Then
\[
Bx = x_1 B_1 + x_2 B_2 + \cdots + x_p B_p.
\]
Proof. Both sides are $n \times 1$. By definition
\[
Bx = \begin{bmatrix} \sum_{j=1}^{p} b_{1j} x_j \\ \sum_{j=1}^{p} b_{2j} x_j \\ \vdots \\ \sum_{j=1}^{p} b_{nj} x_j \end{bmatrix}
= \sum_{j=1}^{p} \begin{bmatrix} b_{1j} x_j \\ b_{2j} x_j \\ \vdots \\ b_{nj} x_j \end{bmatrix}
= \sum_{j=1}^{p} x_j B_j,
\]
as desired. So, $Bx$ can be thought of as a linear combination of the columns of $B$, with column $l$ having coefficient $x_l$. This way of thinking about $Bx$ is very important.
Example 1.1.3. Let $e_1, e_2, \ldots, e_p$ denote the standard column vectors with $p$ components, i.e., $e_i$ denotes the $p \times 1$ column vector with 1 in component $i$ and all other components 0. Then $Be_i = B_i$, column $i$ of $B$.
Row vector times a matrix
Let $A$ be an $m \times n$ matrix with rows $A_1, \ldots, A_m$. Let $y = [y_1 \; \cdots \; y_m]$ be a row vector with $m$ components. Then (why?)
\[
yA = y_1 A_1 + y_2 A_2 + \cdots + y_m A_m.
\]
So, $yA$ can be thought of as a linear combination of the rows of $A$, with row $i$ having coefficient $y_i$.
Columns and rows of product
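Let $A$ and $B$ be as above, i.e., $A$ is $m \times n$ with rows $A_1, \ldots, A_m$ and $B$ is $n \times p$ with columns $B_1, \ldots, B_p$. Then (why?)
\[
AB = [AB_1 \; AB_2 \; \cdots \; AB_p] = \begin{bmatrix} A_1 B \\ A_2 B \\ \vdots \\ A_m B \end{bmatrix}.
\]
So, the $j$th column of $AB$ is a linear combination of the columns of $A$, the coefficients coming from the $j$th column $B_j$ of $B$. For example,
\[
\begin{bmatrix} 1 & 3 & 1 \\ 2 & 4 & 2 \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 5 & 4 \\ 8 & 6 \end{bmatrix}.
\]
The second column of the product can be written as
\[
0 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 1 \begin{bmatrix} 3 \\ 4 \end{bmatrix} + 1 \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ 6 \end{bmatrix}.
\]
Similarly, the $i$th row $A_i B$ of $AB$ is a linear combination of the rows of $B$, the coefficients coming from the $i$th row $A_i$ of $A$.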
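The two descriptions of the product above (entry by entry, and column by column) can be compared numerically. The following is a minimal sketch in plain Python, with matrices stored as lists of rows; the function names `matmul`, `column` and `column_as_combination` are hypothetical helpers introduced only for this illustration.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    # (i, j) entry is the sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


def column(M, j):
    """Column j (0-based) of the matrix M, as a list."""
    return [row[j] for row in M]


def column_as_combination(A, B, j):
    """Column j of AB built as a linear combination of the columns of A,
    with coefficients taken from column j of B."""
    m, n = len(A), len(B)
    result = [0] * m
    for k in range(n):
        coeff = B[k][j]
        for i in range(m):
            result[i] += coeff * A[i][k]
    return result


A = [[1, 3, 1], [2, 4, 2]]
B = [[2, 0], [1, 1], [0, 1]]
print(matmul(A, B))                    # [[5, 4], [8, 6]]
print(column_as_combination(A, B, 1))  # [4, 6]
```

Both calls reproduce the worked example: the full product, and its second column obtained from the columns of $A$.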
Properties of Matrix Operations
Theorem 1.1.4. The following identities hold for matrix sum and product, whenever the sizes of the matrices involved are compatible (for the stated operations).
(i) $A(B + C) = AB + AC$.
(ii) $(P + Q)R = PR + QR$.
(iii) $A(BC) = (AB)C$.
(iv) $c(AB) = (cA)B = A(cB)$.
Proof. We prove item (iii) (leaving the others as exercises). Let $A = (a_{ij})$ have $p$ columns, $B = (b_{kl})$ have $p$ rows and $q$ columns, and $C = (c_{rs})$ have $q$ rows. Then the entry in row $i$ and column $s$ of $A(BC)$ is
\[
\sum_{m=1}^{p} a_{im} \cdot (\text{entry in row } m, \text{ column } s \text{ of } BC)
= \sum_{m=1}^{p} a_{im} \left( \sum_{n=1}^{q} b_{mn} c_{ns} \right)
= \sum_{n=1}^{q} \left( \sum_{m=1}^{p} a_{im} b_{mn} \right) c_{ns},
\]
which is the entry in row $i$ and column $s$ of $(AB)C$.
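Matrix multiplication is not commutative. For example:
\[
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},
\]
but
\[
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
\]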
Definition 1.1.5. A matrix all of whose entries are zero is called the zero matrix. The entries $a_{ii}$ of a square matrix $A = (a_{ij})$ are called the diagonal entries. If the only nonzero entries of a square matrix $A$ are the diagonal entries then $A$ is called a diagonal matrix. An $n \times n$ diagonal matrix whose diagonal entries are all 1 is called the $n \times n$ identity matrix. It is denoted by $I_n$. A square matrix $A = (a_{ij})$ is called upper triangular if all the entries below the diagonal are zero, i.e., $a_{ij} = 0$ for $i > j$. Similarly we define lower triangular matrices.
A square matrix $A$ is called nilpotent if $A^r = 0$ for some $r \ge 1$.
Example 1.1.6. Let $A = (a_{ij})$ be an upper triangular $n \times n$ matrix with diagonal entries zero. Then $A$ is nilpotent. In fact $A^n = 0$.
Since column $j$ of $A^n$ is $A^n e_j$, it is enough to show that $A^n e_j = 0$ for $j = 1, \ldots, n$. Denote column $j$ of $A$ by $A_j$.
We have $Ae_1 = A_1 = 0$. Now
\[
A^2 e_2 = A(Ae_2) = AA_2 = A(a_{12} e_1) = a_{12} Ae_1 = 0.
\]
Similarly
\[
A^3 e_3 = A^2(Ae_3) = A^2 A_3 = A^2(a_{13} e_1 + a_{23} e_2) = 0.
\]
Continuing in this fashion we see that $A^j e_j = 0$, and hence $A^n e_j = A^{n-j}(A^j e_j) = 0$, for every $j$. So all columns of $A^n$ are zero.
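As a numerical check of this example, the sketch below computes the powers of one strictly upper triangular $4 \times 4$ matrix and finds the first power that vanishes. The particular matrix and the helper names (`matmul`, `matrix_power`, `nilpotency_index`) are assumptions made for illustration; they do not come from the notes.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matrix_power(A, r):
    """A multiplied by itself r times (r >= 0), starting from the identity."""
    n = len(A)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(r):
        result = matmul(result, A)
    return result


def is_zero(M):
    return all(entry == 0 for row in M for entry in row)


def nilpotency_index(A):
    """Smallest r with 1 <= r <= n and A^r = 0, or None if there is none."""
    n = len(A)
    for r in range(1, n + 1):
        if is_zero(matrix_power(A, r)):
            return r
    return None


# Upper triangular with zero diagonal (sample data chosen for illustration).
A = [[0, 2, -1, 5],
     [0, 0, 3, 7],
     [0, 0, 0, -4],
     [0, 0, 0, 0]]
print(nilpotency_index(A))  # 4
```

For this sample matrix the bound $A^n = 0$ is attained exactly: $A^3$ still has one nonzero entry, so no smaller power vanishes.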
Inverse of a Matrix
Definition 1.1.7. Let $A$ be an $n \times n$ matrix. If there is an $n \times n$ matrix $B$ such that $AB = I_n = BA$ then we say $A$ is invertible and $B$ is the inverse of $A$. The inverse of $A$ is denoted by $A^{-1}$.
Remark 1.1.8. (1) The inverse of a matrix is uniquely determined. Indeed, if $B$ and $C$ are inverses of $A$ then
\[
B = BI = B(AC) = (BA)C = IC = C.
\]
(2) If $A$ and $B$ are invertible $n \times n$ matrices, then $AB$ is also invertible. Indeed,
\[
(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}B = I.
\]
Similarly $(AB)(B^{-1}A^{-1}) = I$. Thus $AB$ is invertible and $(AB)^{-1} = B^{-1}A^{-1}$.
(3) We will see later (in Chapter 3) that if, for an $n \times n$ matrix $A$, there exists an $n \times n$ matrix $B$ such that $AB = I$ or $BA = I$, then $A$ is invertible. This fact fails for non-square matrices. For example
\[
[1 \; 2] \begin{bmatrix} 1 \\ 0 \end{bmatrix} = [1] = I_1, \quad \text{but} \quad \begin{bmatrix} 1 \\ 0 \end{bmatrix} [1 \; 2] = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} \neq I_2.
\]
(4) The inverse of a square matrix need not exist. For example, let $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$. If $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is any $2 \times 2$ matrix, then
\[
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix} \neq I_2
\]
for any $a, b, c, d$.
Transpose of a Matrix
Definition 1.1.9. Let $A = (a_{ij})$ be an $m \times n$ matrix. Then the transpose of $A$, denoted by $A^t$, is the $n \times m$ matrix $(b_{ij})$ such that $b_{ij} = a_{ji}$ for all $i, j$.
Thus rows of $A$ become columns of $A^t$ and columns of $A$ become rows of $A^t$. For example, if
\[
A = \begin{bmatrix} 2 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix} \quad \text{then} \quad A^t = \begin{bmatrix} 2 & 1 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}.
\]
Lemma 1.1.10. (i) For matrices $A$ and $B$ of suitable sizes, $(AB)^t = B^t A^t$.
(ii) For any invertible square matrix $A$, $(A^{-1})^t = (A^t)^{-1}$.
Proof. For any matrix $C$, let $C_{ij}$ denote its $(i, j)$th entry.
(i) Let $A = (a_{ij})$, $B = (b_{ij})$. Then, for all $i, j$,
\[
((AB)^t)_{ij} = (AB)_{ji} = \sum_k a_{jk} b_{ki} = \sum_k (A^t)_{kj} (B^t)_{ik} = \sum_k (B^t)_{ik} (A^t)_{kj} = (B^t A^t)_{ij}.
\]
(ii) Since $AA^{-1} = I = A^{-1}A$, we have $(AA^{-1})^t = I = (A^{-1}A)^t$. By (i), $(A^{-1})^t A^t = I = A^t (A^{-1})^t$. Thus $(A^t)^{-1} = (A^{-1})^t$.
Definition 1.1.11. A square matrix $A$ is called symmetric if $A = A^t$. It is called skew-symmetric if $A^t = -A$.
Lemma 1.1.12. (i) If $A$ is an invertible symmetric matrix then so is $A^{-1}$. (ii) Every square matrix $A$ is a sum of a symmetric and a skew-symmetric matrix in a unique way.
Proof. (i) is clear from part (ii) of Lemma 1.1.10.
(ii) Since
\[
A = \frac{1}{2}(A + A^t) + \frac{1}{2}(A - A^t),
\]
every matrix is a sum of a symmetric and a skew-symmetric matrix. To see the uniqueness, suppose that $P$ is a symmetric matrix and $Q$ is a skew-symmetric matrix such that
\[
A = P + Q.
\]
Then $A^t = P^t + Q^t = P - Q$. Hence $P = \frac{1}{2}(A + A^t)$ and $Q = \frac{1}{2}(A - A^t)$.
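The decomposition in part (ii) is easy to compute. The sketch below does so with exact rational arithmetic from Python's `fractions` module; the sample matrix and the function names `transpose` and `symmetric_skew_parts` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def symmetric_skew_parts(A):
    """Return (P, Q) with P = (A + A^t)/2 symmetric and Q = (A - A^t)/2 skew-symmetric."""
    At = transpose(A)
    n = len(A)
    half = Fraction(1, 2)
    P = [[half * (A[i][j] + At[i][j]) for j in range(n)] for i in range(n)]
    Q = [[half * (A[i][j] - At[i][j]) for j in range(n)] for i in range(n)]
    return P, Q


A = [[1, 2, 0],
     [4, 3, -1],
     [6, 5, 2]]
P, Q = symmetric_skew_parts(A)
n = len(A)

# P is symmetric, Q is skew-symmetric, and P + Q recovers A.
assert P == transpose(P)
assert Q == [[-x for x in row] for row in transpose(Q)]
assert [[P[i][j] + Q[i][j] for j in range(n)] for i in range(n)] == A
```

Exact fractions are used so that the halves do not introduce rounding error.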
1.2 Gauss Elimination
We discuss a widely used method called the Gauss elimination method to solve a system of $m$ linear equations in $n$ unknowns $x_1, \ldots, x_n$:
\[
a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n = b_i, \quad i = 1, 2, \ldots, m,
\]
where the $a_{ij}$'s and the $b_i$'s are known scalars in $F$. If each $b_i = 0$ then the system above is called a homogeneous system. Otherwise, we say it is inhomogeneous.
Set $A = (a_{ij})$, $b = (b_1, \ldots, b_m)^t$, and $x = (x_1, \ldots, x_n)^t$. We can write the system above in the matrix form
\[
Ax = b.
\]
The matrix $A$ is called the coefficient matrix. By a solution, we mean any choice of the unknowns $x_1, \ldots, x_n$ which satisfies all the equations.
Lemma 1.2.1. Let $A$ be an $m \times n$ matrix over $F$, $b \in F^m$, and $E$ an invertible $m \times m$ matrix over $F$. Set $U = EA$ and $c = Eb$. Then $Ax = b$ has the same solutions as $Ux = c$.
Proof. $Ax = b$ implies $EAx = Eb$. Similarly, $EAx = Eb$ implies $E^{-1}(EAx) = E^{-1}(Eb)$, or $Ax = b$.
The idea of Gauss elimination is the following:
(i) Find a suitable invertible $E$ so that $U$ is in row echelon form or row canonical form (defined below).
(ii) All solutions to $Ux = c$, when $U$ is in row echelon form or row canonical form, can be written down easily.
We first describe step (ii) and then step (i).
Definition 1.2.2. An $m \times n$ matrix $M$ is said to be in row echelon form (ref) if it satisfies the following conditions:
(a) By a zero row of $M$ we mean a row with all entries zero. Suppose $M$ has $k$ nonzero rows and $m - k$ zero rows. Then the last $m - k$ rows of $M$ are the zero rows.
(b) The first nonzero entry in a nonzero row is called a pivot. For $i = 1, 2, \ldots, k$, suppose that the pivot in row $i$ occurs in column $j_i$. Then we have $j_1 < j_2 < \cdots < j_k$. The columns $\{j_1, \ldots, j_k\}$ are called the set of pivotal columns of $M$. Columns $\{1, \ldots, n\} \setminus \{j_1, \ldots, j_k\}$ are the nonpivotal or free columns.
Definition 1.2.3. An $m \times n$ matrix $M$ is said to be in row canonical form (rcf) if it satisfies the following conditions:
(a) By a zero row of $M$ we mean a row with all entries zero. Suppose $M$ has $k$ nonzero rows and $m - k$ zero rows. Then the last $m - k$ rows of $M$ are the zero rows.
(b) The first nonzero entry in every nonzero row is 1. This entry is called a pivot. For $i = 1, 2, \ldots, k$, suppose that the pivot in row $i$ occurs in column $j_i$.
(c) We have $j_1 < j_2 < \cdots < j_k$. The columns $\{j_1, \ldots, j_k\}$ are called the set of pivotal columns of $M$. Columns $\{1, \ldots, n\} \setminus \{j_1, \ldots, j_k\}$ are the nonpivotal or free columns.
(d) The matrix formed by the first $k$ rows and the $k$ pivotal columns of $M$ is the $k \times k$ identity matrix, i.e., the only nonzero entry in a pivotal column is the pivot 1.
Note that a matrix in row canonical form is automatically in row echelon form. Also note that, in both the definitions above, the number of pivots $k$ satisfies $k \le m$ and $k \le n$.
Example 1.2.4. Consider the following $4 \times 8$ matrix
\[
U = \begin{bmatrix}
0 & 1 & a_{13} & a_{14} & 0 & a_{16} & 0 & a_{18} \\
0 & 0 & 0 & 0 & 1 & a_{26} & 0 & a_{28} \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & a_{38} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix},
\]
where the $a_{ij}$'s are arbitrary scalars. It may be checked that $U$ is in rcf with pivotal columns 2, 5, 7 and nonpivotal columns 1, 3, 4, 6, 8. Now let $R$ be the matrix
\[
R = \begin{bmatrix}
0 & a & a_{13} & a_{14} & a_{15} & a_{16} & a_{17} & a_{18} \\
0 & 0 & 0 & 0 & b & a_{26} & a_{27} & a_{28} \\
0 & 0 & 0 & 0 & 0 & 0 & c & a_{38} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix},
\]
where $a, b, c$ are nonzero scalars and the $a_{ij}$'s are arbitrary scalars. It may be checked that $R$ is in ref with pivotal columns 2, 5, 7 and nonpivotal columns 1, 3, 4, 6, 8.
Example 1.2.5. Let $U$ be the matrix from the example above. Let $c = (c_1, c_2, c_3, c_4)^t$. We want to write down all solutions to the system $Ux = c$.
(i) If $c_4 \neq 0$ then clearly there is no solution.
(ii) Now assume that $c_4 = 0$. Call the variables $x_2, x_5, x_7$ pivotal and the variables $x_1, x_3, x_4, x_6, x_8$ nonpivotal or free.
Give arbitrary values $x_1 = s$, $x_3 = t$, $x_4 = u$, $x_6 = v$, $x_8 = w$ to the free variables. These values can be extended to values of the pivotal variables in one and only one way to get a solution to the system $Ux = c$:
\[
\begin{aligned}
x_7 &= c_3 - a_{38} w, \\
x_5 &= c_2 - a_{26} v - a_{28} w, \\
x_2 &= c_1 - a_{13} t - a_{14} u - a_{16} v - a_{18} w.
\end{aligned}
\]
Thus (why?) the set of all solutions to $Ux = c$ can be written as
\[
\begin{bmatrix}
s \\ c_1 - a_{13} t - a_{14} u - a_{16} v - a_{18} w \\ t \\ u \\ c_2 - a_{26} v - a_{28} w \\ v \\ c_3 - a_{38} w \\ w
\end{bmatrix},
\]
where $s, t, u, v, w$ are arbitrary scalars.
(iii) The column vector above can be written as
\[
\begin{bmatrix} 0 \\ c_1 \\ 0 \\ 0 \\ c_2 \\ 0 \\ c_3 \\ 0 \end{bmatrix}
+ s \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ t \begin{bmatrix} 0 \\ -a_{13} \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ u \begin{bmatrix} 0 \\ -a_{14} \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ v \begin{bmatrix} 0 \\ -a_{16} \\ 0 \\ 0 \\ -a_{26} \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ w \begin{bmatrix} 0 \\ -a_{18} \\ 0 \\ 0 \\ -a_{28} \\ 0 \\ -a_{38} \\ 1 \end{bmatrix}.
\]
Thus every solution to $Ux = c$ is of the form above, for arbitrary scalars $s, t, u, v, w$. Note that the first vector in the expression above is the unique solution to $Ux = c$ that has all free variables zero and that the other vectors (without the coefficients) are the unique solutions to $Ux = 0$ that have one free variable equal to 1 and the other free variables equal to zero.
Example 1.2.6. Let $R$ be the matrix from the example above. Let $c = (c_1, c_2, c_3, c_4)^t$. We want to write down all solutions to the system $Rx = c$.
(i) If $c_4 \neq 0$ then clearly there is no solution.
(ii) Now assume that $c_4 = 0$. Call the variables $x_2, x_5, x_7$ pivotal and the variables $x_1, x_3, x_4, x_6, x_8$ nonpivotal or free.
Give arbitrary values $x_1 = s$, $x_3 = t$, $x_4 = u$, $x_6 = v$, $x_8 = w$ to the free variables. These values can be extended to values of the pivotal variables in one and only one way to get a solution to the system $Rx = c$:
\[
\begin{aligned}
x_7 &= (c_3 - a_{38} w)/c, \\
x_5 &= (c_2 - a_{26} v - a_{27} x_7 - a_{28} w)/b, \\
x_2 &= (c_1 - a_{13} t - a_{14} u - a_{15} x_5 - a_{16} v - a_{17} x_7 - a_{18} w)/a.
\end{aligned}
\]
The process above is called back substitution. Given arbitrary values for the free variables, we first solve for the value of the largest pivotal variable, then using this value (and the values of the free variables) we get the value of the second largest pivotal variable, and so on.
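Back substitution as described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input matrix is assumed to already be in row echelon form, indices are 0-based, and the names `pivot_columns` and `back_substitute` as well as the sample data are hypothetical.

```python
from fractions import Fraction


def pivot_columns(R):
    """0-based column of the first nonzero entry in each nonzero row of R."""
    pivots = []
    for row in R:
        nonzero = [j for j, entry in enumerate(row) if entry != 0]
        if nonzero:
            pivots.append(nonzero[0])
    return pivots


def back_substitute(R, c, free_values):
    """Solve R x = c, where R is assumed to be in row echelon form.

    free_values maps each free column (0-based) to the value chosen for that
    variable. Returns the solution as a list, or None if a zero row of R
    faces a nonzero entry of c (no solution).
    """
    n = len(R[0])
    pivots = pivot_columns(R)
    k = len(pivots)
    if any(c[i] != 0 for i in range(k, len(R))):
        return None
    x = [Fraction(0)] * n
    for j, value in free_values.items():
        x[j] = Fraction(value)
    # Work upwards from the last nonzero row: largest pivotal variable first.
    for i in reversed(range(k)):
        j = pivots[i]
        rest = sum(R[i][l] * x[l] for l in range(j + 1, n))
        x[j] = (Fraction(c[i]) - rest) / R[i][j]
    return x


R = [[0, 2, 1, 0, 3],
     [0, 0, 0, 4, 1],
     [0, 0, 0, 0, 0]]
c = [7, 9, 0]
x = back_substitute(R, c, {0: 1, 2: 1, 4: 1})
print(x == [1, Fraction(3, 2), 1, 2, 1])  # True
```

Here columns 1 and 3 (0-based) are pivotal, and the free variables in columns 0, 2 and 4 are all set to 1.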
We extract the following lemma from the examples above; its proof is left as an exercise.
Lemma 1.2.7. Let $U$ be an $m \times n$ matrix in ref. Then the only solution to the homogeneous system $Ux = 0$ which is zero in all free variables is the zero solution.
Note that a matrix in rcf is also in ref and the lemma above also applies to such matrices.
Theorem 1.2.8. Let $Ax = b$, with $A$ an $m \times n$ matrix. Let $c$ be a solution of $Ax = b$ and $S$ the set of all solutions of the associated homogeneous system $Ax = 0$. Then the set of all solutions to $Ax = b$ is
\[
c + S = \{ c + v : v \in S \}.
\]
Proof. Let $Au = b$. Then $A(u - c) = Au - Ac = b - b = 0$. So $u - c \in S$ and $u = c + (u - c) \in c + S$.
Conversely, let $v \in S$. Then $A(c + v) = Ac + Av = b + 0 = b$. Hence $c + v$ is a solution to $Ax = b$.
The proof of the following important result is almost obvious and is left as an exercise.
Theorem 1.2.9. Let $U$ be an $m \times n$ matrix in ref with $k$ pivotal columns $P = \{ j_1 < j_2 < \cdots < j_k \}$ and nonpivotal or free columns $F = \{1, \ldots, n\} \setminus P$. Let $c = (c_1, \ldots, c_m)^t$.
(i) The system $Ux = c$ has a solution iff $c_{k+1} = \cdots = c_m = 0$.
(ii) Assume $c_{k+1} = \cdots = c_m = 0$. Given arbitrary scalars $y_i$, $i \in F$, there exist unique scalars $y_i$, $i \in P$, such that $y = (y_1, \ldots, y_n)^t$ satisfies $Uy = c$.
(iii) For $i \in F$, let $s_i$ be the unique solution of $Ux = 0$ which is zero in all free components except component $i$, where it is 1. Then every solution of $Ux = 0$ is of the form
\[
\sum_{i \in F} a_i s_i,
\]
where the $a_i$'s are arbitrary scalars.
(iv) Let $p$ be the unique solution of $Ux = c$ having all free variables zero. Then every solution of $Ux = c$ is of the form
\[
p + \sum_{i \in F} a_i s_i,
\]
where the $a_i$'s are arbitrary scalars.
Example 1.2.10. In our previous two examples $P = \{2, 5, 7\}$ and $F = \{1, 3, 4, 6, 8\}$. To make sure the notation of the theorem is understood write down $p$ and $s_i$, $i = 1, 3, 4, 6, 8$.
We now discuss the first step in Gauss elimination, namely, how to reduce a matrix to ref or rcf. We define a set of elementary row operations to be performed on the equations of a system. These operations transform a system of equations into another system with the same solution set. Performing an elementary row operation on $Ax = b$ is equivalent to replacing this system by the system $EAx = Eb$, where $E$ is an invertible elementary matrix.
Elementary row operations and elementary matrices
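Let $e_{ij}$ denote the $m \times n$ matrix with 1 in the $i$th row and $j$th column and zero elsewhere. Any matrix $A = (a_{ij})$ of size $m \times n$ can be written as
\[
A = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} e_{ij}.
\]
For this reason the $e_{ij}$'s are called the matrix units. Let us see the effect of multiplying $e_{13}$ with a matrix $A$ written in terms of row vectors:
\[
e_{13} A =
\begin{bmatrix}
0 & 0 & 1 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 0
\end{bmatrix}_{m \times m}
\begin{bmatrix} R_1 \\ R_2 \\ R_3 \\ \vdots \\ R_m \end{bmatrix}_{m \times n}
=
\begin{bmatrix} R_3 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
\]
In general, if $e_{ij}$ is an $m \times m$ matrix unit and $A$ is an $m \times n$ matrix with rows $R_1, \ldots, R_m$ then
\[
e_{ij} A = \begin{bmatrix} 0 \\ \vdots \\ R_j \\ \vdots \\ 0 \end{bmatrix},
\]
where $R_j$ appears in the $i$th row and all other rows are zero.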
We now define three kinds of elementary row operations and elementary matrices. Consider the system $Ax = b$, where $A$ is $m \times n$, $b$ is $m \times 1$, and $x$ is an $n \times 1$ unknown vector.
(i) Elementary row operation of type I: For $i \neq j$ and a scalar $a$, add $a$ times equation $j$ to equation $i$ in the system $Ax = b$.
What effect does this operation have on $A$ and $b$? Consider the matrix
\[
E = \begin{bmatrix} 1 & & & \\ & 1 & \cdots & a \\ & & \ddots & \vdots \\ & & & 1 \end{bmatrix}
\quad \text{or} \quad
\begin{bmatrix} 1 & & & \\ & 1 & & \\ & \vdots & \ddots & \\ & a & \cdots & 1 \end{bmatrix}
= I + a e_{ij}, \quad i \neq j.
\]
This matrix has 1's on the diagonal and a scalar $a$ as an off-diagonal entry, in row $i$ and column $j$. By the above observation
\[
(I + a e_{ij}) \begin{bmatrix} R_1 \\ R_2 \\ \vdots \\ R_m \end{bmatrix}
= \begin{bmatrix} R_1 \\ R_2 \\ \vdots \\ R_m \end{bmatrix}
+ a \begin{bmatrix} 0 \\ \vdots \\ R_j \\ \vdots \\ 0 \end{bmatrix}
= \begin{bmatrix} R_1 \\ \vdots \\ R_i + a R_j \\ \vdots \\ R_m \end{bmatrix},
\]
where $R_j$ in the middle term and $R_i + aR_j$ on the right both appear in the $i$th row.
It is now clear that performing an elementary row operation of type I on the system $Ax = b$ we get the new system $EAx = Eb$.
Suppose we perform an elementary row operation of type I as above. Then perform the same elementary row operation of type I but with the scalar $a$ replaced by the scalar $-a$. It is clear that we get back the original system $Ax = b$. It follows (why?) that $E^{-1} = I - a e_{ij}$.
(ii) Elementary row operation of type II: For $i \neq j$ interchange equations $i$ and $j$ in the system $Ax = b$.
What effect does this operation have on $A$ and $b$? Consider the matrix
\[
F = \begin{bmatrix}
1 & & & & \\
& 0 & \cdots & 1 & \\
& \vdots & \ddots & \vdots & \\
& 1 & \cdots & 0 & \\
& & & & 1
\end{bmatrix}
= I + e_{ij} + e_{ji} - e_{ii} - e_{jj},
\]
which is the identity matrix with rows $i$ and $j$ interchanged.
Premultiplication by this matrix has the effect of interchanging the $i$th and $j$th rows. Performing this operation twice in succession gives back the original system. Thus $F^2 = I$.
(iii) Elementary row operation of type III: Multiply equation $i$ in the system $Ax = b$ by a nonzero scalar $c$.
What effect does this operation have on $A$ and $b$? Consider the matrix
\[
G = \begin{bmatrix}
1 & & & & \\
& \ddots & & & \\
& & c & & \\
& & & \ddots & \\
& & & & 1
\end{bmatrix}
= I + (c - 1) e_{ii}, \quad c \neq 0,
\]
where $c$ is the $i$th diagonal entry.
Premultiplication by $G$ has the effect of multiplying the $i$th row by $c$. Doing this operation twice in succession, the first time with the scalar $c$ and the second time with the scalar $1/c$, yields the original system back. It follows that $G^{-1} = I + (c^{-1} - 1) e_{ii}$.
The matrices $E, F, G$ above are called elementary matrices of type I, II, III respectively. We summarize the above discussion in the following result.
Theorem 1.2.11. Performing an elementary row operation (of a certain type) on the system $Ax = b$ is equivalent to premultiplying $A$ and $b$ by an elementary matrix $E$ (of the same type), yielding the system $EAx = Eb$.
Elementary matrices are invertible and the inverse of an elementary matrix is an elementary matrix of the same type.
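The three kinds of elementary matrices, and the formulas for their inverses, can be checked on a small example. The sketch below is only an illustration: rows are indexed from 0, and the constructor names `type1`, `type2`, `type3` and the sample matrix are hypothetical choices, not notation from the notes.

```python
from fractions import Fraction


def identity(m):
    return [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def type1(m, i, j, a):
    """I + a*e_ij with i != j: premultiplying adds a times row j to row i."""
    if i == j:
        raise ValueError("type I requires i != j")
    E = identity(m)
    E[i][j] = Fraction(a)
    return E


def type2(m, i, j):
    """Identity with rows i and j interchanged: premultiplying swaps rows i and j."""
    F = identity(m)
    F[i], F[j] = F[j], F[i]
    return F


def type3(m, i, c):
    """I + (c - 1)*e_ii with c != 0: premultiplying multiplies row i by c."""
    if c == 0:
        raise ValueError("type III requires a nonzero scalar")
    G = identity(m)
    G[i][i] = Fraction(c)
    return G


A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]

# Row operation "row 1 minus 2 times row 0" (0-based) as a matrix product.
assert matmul(type1(3, 1, 0, -2), A) == [[2, 1, 1], [0, -8, -2], [-2, 7, 2]]
# Inverses: replace a by -a, swap again, replace c by 1/c.
assert matmul(type1(3, 1, 0, -2), type1(3, 1, 0, 2)) == identity(3)
assert matmul(type2(3, 0, 2), type2(3, 0, 2)) == identity(3)
assert matmul(type3(3, 1, 5), type3(3, 1, Fraction(1, 5))) == identity(3)
```

Each assertion mirrors one statement from the discussion: the effect of premultiplication, and the inverse of each type being of the same type.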
Since elementary matrices are invertible it follows that performing elementary row operations does not change the solution set of the system. We now show how to reduce a matrix to row canonical form using a sequence of elementary row operations.
Theorem 1.2.12. Every matrix can be reduced to a matrix in rcf by a sequence of elementary row operations.
Proof. We apply induction on the number of rows. If the matrix $A$ is a row vector, the conclusion is obvious. Now suppose that $A$ is $m \times n$, where $m \ge 2$. If $A = 0$ then we are done. If $A$ is not the zero matrix then there is a nonzero column in $A$. Find the first nonzero column, say column $j_1$, from the left. Interchange rows to move the first nonzero entry in column $j_1$ to the top row. Now multiply by a nonzero scalar to make this entry (in row 1 and column $j_1$) equal to 1. Now add suitable multiples of the first row to the remaining rows so that all entries in column $j_1$, except the entry in row 1, become zero. The resulting matrix looks like
\[
A_1 = \begin{bmatrix}
0 & \cdots & 0 & 1 & * & \cdots & * \\
0 & \cdots & 0 & 0 & * & \cdots & * \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & * & \cdots & *
\end{bmatrix},
\]
where the 1 is in column $j_1$ and $*$ denotes an arbitrary scalar.
By induction, the submatrix of $A_1$ consisting of rows $2, 3, \ldots, m$ can be reduced to row canonical form. So now the resulting matrix looks like
\[
A_2 = \begin{bmatrix} & 1 & v \\ & & D \end{bmatrix},
\]
where blank space consists of 0's, $v$ is a row vector with $n - j_1$ components, and $D$ is an $(m - 1) \times (n - j_1)$ matrix in rcf. Let the pivotal columns of $D$, numbered as columns of $A_2$, be $j_2 < j_3 < \cdots < j_k$, where $j_1 < j_2$. By subtracting suitable multiples of rows $2, \ldots, k$ of $A_2$ from row 1 of $A_2$ we can make the entries in columns $j_2, \ldots, j_k$ of row 1 equal to 0. The resulting matrix is in rcf.
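The reduction in this proof can be turned into a short program. The sketch below is an iterative variant (it sweeps the columns from left to right and clears each pivotal column completely as it goes) rather than a literal transcription of the inductive argument; the function name `row_canonical_form`, the 0-based indexing and the sample matrix are assumptions made for illustration.

```python
from fractions import Fraction


def row_canonical_form(A):
    """Return (U, pivots): a row canonical form of A obtained by elementary
    row operations, and the list of its pivotal columns (0-based)."""
    U = [[Fraction(x) for x in row] for row in A]
    m, n = len(U), len(U[0])
    pivots = []
    r = 0  # index of the row that will receive the next pivot
    for j in range(n):
        if r == m:
            break
        # Look for a nonzero entry in column j, at or below row r.
        p = next((i for i in range(r, m) if U[i][j] != 0), None)
        if p is None:
            continue  # column j is a free column
        U[r], U[p] = U[p], U[r]                 # type II: interchange rows
        pivot = U[r][j]
        U[r] = [x / pivot for x in U[r]]        # type III: make the pivot 1
        for i in range(m):
            if i != r and U[i][j] != 0:
                factor = U[i][j]
                # type I: subtract a multiple of the pivot row
                U[i] = [x - factor * y for x, y in zip(U[i], U[r])]
        pivots.append(j)
        r += 1
    return U, pivots


M = [[1, 3, -2, 0, 2, 0, 0],
     [2, 6, -5, -2, 4, -3, -1],
     [0, 0, 5, 10, 0, 15, 5],
     [2, 6, 0, 8, 4, 18, 6]]
U, pivots = row_canonical_form(M)
print(pivots)  # [0, 2, 5]
```

Exact fractions avoid rounding issues when testing entries against zero; with floating point numbers a tolerance would be needed.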
Before giving examples of row operations we collect together some results on systems of linear equations that follow from Gauss elimination.
Theorem 1.2.13. Let $Ax = b$, with $A$ an $m \times n$ matrix.
(i) Suppose $m < n$. Then there is a nontrivial solution to the homogeneous system $Ax = 0$.
(ii) The number of solutions to $Ax = b$ is either 0, 1, or $\infty$.
Proof. (i) Reduce $A$ to rcf $U$ by Gauss elimination. Since $m < n$ there is at least one free variable. It follows that there is a nontrivial solution.
(ii) Reduce $Ax = b$ to $EAx = Eb$ using Gauss elimination, where $U = EA$ is in rcf. Put $c = Eb = (c_1, \ldots, c_m)^t$. Suppose $U$ has $k$ nonzero rows. Three cases arise:
(a) at least one of $c_{k+1}, \ldots, c_m$ is nonzero: in this case there is no solution.
(b) $c_{k+1} = \cdots = c_m = 0$ and $k = n$: there is a unique solution (why?).
(c) $c_{k+1} = \cdots = c_m = 0$ and $k < n$: there are infinitely many solutions (why?).
No other cases are possible (why?). That completes the proof.
In the following examples an elementary row operation of type I is indicated by $R_i + aR_j$, of type II by $R_i \leftrightarrow R_j$, and of type III by $aR_i$.
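Example 1.2.14. Consider the system
\[
Ax = \begin{bmatrix} 2 & 1 & 1 \\ 4 & -6 & 0 \\ -2 & 7 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 5 \\ -2 \\ 9 \end{bmatrix} = b.
\]
Applying the indicated elementary row operations to $A$ and $b$ we get
\[
\left[\begin{array}{rrr|r} 2 & 1 & 1 & 5 \\ 4 & -6 & 0 & -2 \\ -2 & 7 & 2 & 9 \end{array}\right]
\xrightarrow{R_2 - 2R_1,\ R_3 + R_1}
\left[\begin{array}{rrr|r} 2 & 1 & 1 & 5 \\ 0 & -8 & -2 & -12 \\ 0 & 8 & 3 & 14 \end{array}\right]
\xrightarrow{R_3 + R_2}
\left[\begin{array}{rrr|r} 2 & 1 & 1 & 5 \\ 0 & -8 & -2 & -12 \\ 0 & 0 & 1 & 2 \end{array}\right]
\]
\[
\xrightarrow{R_1 - R_3,\ R_2 + 2R_3}
\left[\begin{array}{rrr|r} 2 & 1 & 0 & 3 \\ 0 & -8 & 0 & -8 \\ 0 & 0 & 1 & 2 \end{array}\right]
\xrightarrow{R_1 + (1/8)R_2}
\left[\begin{array}{rrr|r} 2 & 0 & 0 & 2 \\ 0 & -8 & 0 & -8 \\ 0 & 0 & 1 & 2 \end{array}\right]
\xrightarrow{(1/2)R_1,\ (-1/8)R_2}
\left[\begin{array}{rrr|r} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 2 \end{array}\right].
\]
Since there are no free columns the problem has a unique solution given by $x_1 = x_2 = 1$ and $x_3 = 2$.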
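Example 1.2.15. Consider the system
\[
Ax = \begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 \\
0 & 0 & 5 & 10 & 0 & 15 \\
2 & 6 & 0 & 8 & 4 & 18
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 0 \\ -1 \\ 5 \\ 6 \end{bmatrix} = b.
\]
Applying the indicated elementary row operations to $A$ and $b$ we get
\[
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 & -1 \\
0 & 0 & 5 & 10 & 0 & 15 & 5 \\
2 & 6 & 0 & 8 & 4 & 18 & 6
\end{array}\right]
\xrightarrow{R_2 - 2R_1,\ R_4 - 2R_1}
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & -1 & -2 & 0 & -3 & -1 \\
0 & 0 & 5 & 10 & 0 & 15 & 5 \\
0 & 0 & 4 & 8 & 0 & 18 & 6
\end{array}\right]
\]
\[
\xrightarrow{-R_2}
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 5 & 10 & 0 & 15 & 5 \\
0 & 0 & 4 & 8 & 0 & 18 & 6
\end{array}\right]
\xrightarrow{R_3 - 5R_2,\ R_4 - 4R_2}
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 6 & 2
\end{array}\right]
\]
\[
\xrightarrow{R_3 \leftrightarrow R_4}
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 6 & 2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]
\xrightarrow{(1/6)R_3}
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 1/3 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]
\]
\[
\xrightarrow{R_2 - 3R_3}
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1/3 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]
\xrightarrow{R_1 + 2R_2}
\left[\begin{array}{rrrrrr|r}
1 & 3 & 0 & 4 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1/3 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right].
\]
It may be checked that every solution to $Ax = b$ is of the form
\[
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1/3 \end{bmatrix}
+ s \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ r \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ t \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix},
\]
for some scalars $s, t, r$.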
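The claim "it may be checked" can be confirmed mechanically. The sketch below multiplies out $Ax$ for the particular solution and for each of the three homogeneous solutions. The minus signs in $A$, $b$ and the solution vectors are the ones forced by the row operations shown in the example; the variable names (`p`, `s2`, `s4`, `s5`) and helper functions are hypothetical.

```python
from fractions import Fraction

A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]
b = [0, -1, 5, 6]

p = [0, 0, 0, 0, 0, Fraction(1, 3)]   # solution with all free variables zero
s2 = [-3, 1, 0, 0, 0, 0]              # free variable x_2 = 1, other free variables 0
s4 = [-4, 0, -2, 1, 0, 0]             # free variable x_4 = 1, other free variables 0
s5 = [-2, 0, 0, 0, 1, 0]              # free variable x_5 = 1, other free variables 0


def matvec(A, x):
    """Product of the matrix A with the column vector x (given as a list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def general_solution(s, r, t):
    """p + s*s2 + r*s4 + t*s5 for scalars s, r, t."""
    return [pi + s * a + r * c + t * d for pi, a, c, d in zip(p, s2, s4, s5)]


assert matvec(A, p) == b
assert all(matvec(A, v) == [0, 0, 0, 0] for v in (s2, s4, s5))
assert matvec(A, general_solution(2, -1, Fraction(5, 7))) == b
```

This matches the description in Theorem 1.2.9: one particular solution plus an arbitrary combination of solutions of the homogeneous system.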
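Example 1.2.16. Consider the system
\[
Ax = \begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 \\
0 & 0 & 5 & 10 & 0 & 15 \\
2 & 6 & 0 & 8 & 4 & 18
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 0 \\ -1 \\ 6 \\ 6 \end{bmatrix} = b.
\]
Applying the indicated elementary row operations to $A$ and $b$ we get
\[
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 & -1 \\
0 & 0 & 5 & 10 & 0 & 15 & 6 \\
2 & 6 & 0 & 8 & 4 & 18 & 6
\end{array}\right]
\xrightarrow{R_2 - 2R_1,\ R_4 - 2R_1}
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & -1 & -2 & 0 & -3 & -1 \\
0 & 0 & 5 & 10 & 0 & 15 & 6 \\
0 & 0 & 4 & 8 & 0 & 18 & 6
\end{array}\right]
\]
\[
\xrightarrow{-R_2}
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 5 & 10 & 0 & 15 & 6 \\
0 & 0 & 4 & 8 & 0 & 18 & 6
\end{array}\right]
\xrightarrow{R_3 - 5R_2,\ R_4 - 4R_2}
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 6 & 2
\end{array}\right]
\]
\[
\xrightarrow{R_3 \leftrightarrow R_4}
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 6 & 2 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]
\xrightarrow{(1/6)R_3}
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 1/3 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]
\]
\[
\xrightarrow{R_2 - 3R_3}
\left[\begin{array}{rrrrrr|r}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1/3 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]
\xrightarrow{R_1 + 2R_2}
\left[\begin{array}{rrrrrr|r}
1 & 3 & 0 & 4 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1/3 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right].
\]
The last row corresponds to the equation $0 = 1$. It follows that the system has no solution.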
Calculation of $A^{-1}$ by Gauss elimination
Lemma 1.2.17. Let $A$ be a square matrix. Then the following are equivalent:
(a) $A$ can be reduced to $I$ by a sequence of elementary row operations.
(b) $A$ is a product of elementary matrices.
(c) $A$ is invertible.
(d) The system $Ax = 0$ has only the trivial solution $x = 0$.
Proof. (a) $\Rightarrow$ (b). Let $E_1, \ldots, E_k$ be elementary matrices so that $E_k \cdots E_1 A = I$. Thus $A = E_1^{-1} \cdots E_k^{-1}$, and each $E_i^{-1}$ is an elementary matrix by Theorem 1.2.11.
(b) $\Rightarrow$ (c). Elementary matrices are invertible, and a product of invertible matrices is invertible.
(c) $\Rightarrow$ (d). Suppose $A$ is invertible and $Ax = 0$. Then $x = A^{-1}(Ax) = 0$.
(d) $\Rightarrow$ (a). First observe that a square matrix in rcf is either the identity matrix or its bottom row is zero. If $A$ cannot be reduced to $I$ by elementary row operations then $U$, the rcf of $A$, has a zero row at the bottom. Hence $Ux = 0$ consists of at most $n - 1$ nontrivial equations in $n$ unknowns, and so has a nontrivial solution by Theorem 1.2.13(i). Since $Ax = 0$ and $Ux = 0$ have the same solutions, this contradicts (d).
This lemma provides us with an algorithm to calculate the inverse of a matrix if it exists. If $A$ is invertible then there exist elementary matrices $E_1, E_2, \ldots, E_k$ such that $E_k \cdots E_1 A = I$. Multiply by $A^{-1}$ on the right on both sides to get $E_k \cdots E_1 I = A^{-1}$.
Lemma 1.2.18. (Gauss-Jordan Algorithm) Let $A$ be an invertible matrix. To compute $A^{-1}$, apply elementary row operations to $A$ to reduce it to an identity matrix. The same operations, when applied to $I$, produce $A^{-1}$.
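Example 1.2.19. We find the inverse of the matrix
\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}
\]
by forming the $3 \times 6$ matrix
\[
[A \mid I] = \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{array}\right].
\]
Now perform row operations to reduce the matrix $A$ to $I$. In this process the identity matrix will reduce to $A^{-1}$.
\[
[A \mid I]
\xrightarrow{R_2 - R_1,\ R_3 - R_1}
\left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 1 & 0 \\ 0 & 1 & 1 & -1 & 0 & 1 \end{array}\right]
\xrightarrow{R_3 - R_2}
\left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 1 & 0 \\ 0 & 0 & 1 & 0 & -1 & 1 \end{array}\right].
\]
Hence
\[
A^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}.
\]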
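The Gauss-Jordan procedure of this example can be sketched as a short Python function operating on the augmented matrix $[A \mid I]$. The function name `inverse` and the convention of returning `None` for a non-invertible input are choices made for this illustration, not part of the notes.

```python
from fractions import Fraction


def inverse(A):
    """Inverse of a square matrix by Gauss-Jordan elimination on [A | I].

    Returns None when A cannot be reduced to the identity (A is not invertible).
    """
    n = len(A)
    # Augmented matrix [A | I] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for j in range(n):
        # Find a row at or below row j with a nonzero entry in column j.
        p = next((i for i in range(j, n) if M[i][j] != 0), None)
        if p is None:
            return None
        M[j], M[p] = M[p], M[j]                  # type II
        pivot = M[j][j]
        M[j] = [x / pivot for x in M[j]]         # type III
        for i in range(n):
            if i != j and M[i][j] != 0:
                factor = M[i][j]
                M[i] = [x - factor * y for x, y in zip(M[i], M[j])]  # type I
    # The left half is now I, so the right half is the inverse.
    return [row[n:] for row in M]


A = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(inverse(A) == [[1, 0, 0], [-1, 1, 0], [0, -1, 1]])  # True
print(inverse([[1, 0], [0, 0]]))                          # None
```

The second call uses the non-invertible matrix from Remark 1.1.8(4).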
1.3 Determinants
In this section we study determinants of matrices. Recall the formula for determinants of $k \times k$ matrices, for $k = 1, 2, 3$:
\[
\det[a] = a, \qquad \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc,
\]
and
\[
\det \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = aei - ahf - bdi + bgf + cdh - ceg.
\]
Our approach to determinants of $n \times n$ matrices is via their properties (rather than via an explicit formula as above). It makes their study more elegant. Later, we will give a geometric interpretation of the determinant in terms of volume.
Let $d$ be a function that associates a scalar $d(A) \in F$ with every $n \times n$ matrix $A$ over $F$. We use the following notation. If the columns of $A$ are $A_1, A_2, \ldots, A_n$, we write $d(A) = d(A_1, A_2, \ldots, A_n)$.
Definition 1.3.1. (i) $d$ is called multilinear if for each $k = 1, 2, \ldots, n$, scalars $\alpha, \beta$ and $n \times 1$ column vectors $A_1, \ldots, A_{k-1}, A_{k+1}, \ldots, A_n, B, C$,
\[
d(A_1, \ldots, A_{k-1}, \alpha B + \beta C, A_{k+1}, \ldots, A_n) =
\alpha\, d(A_1, \ldots, A_{k-1}, B, A_{k+1}, \ldots, A_n) + \beta\, d(A_1, \ldots, A_{k-1}, C, A_{k+1}, \ldots, A_n).
\]
(ii) $d$ is called alternating if $d(A_1, A_2, \ldots, A_n) = 0$ whenever $A_i = A_j$ for some $i \neq j$.
(iii) $d$ is called normalized if $d(I) = d(e_1, e_2, \ldots, e_n) = 1$, where $e_i$ is the $i$th standard column vector with 1 in the $i$th coordinate and 0's elsewhere.
(iv) A normalized, alternating, and multilinear function $d$ on $n \times n$ matrices is called a determinant function of order $n$.
Our immediate objective is to show that there is only one determinant function of order $n$. This fact is very useful in proving that certain formulas yield the determinant. We simply show that the formula defines an alternating, multilinear and normalized function on the columns of $n \times n$ matrices.
Lemma 1.3.2. Suppose that $d(A_1, A_2, \ldots, A_n)$ is a multilinear alternating function on columns of $n \times n$ matrices. Then
(a) If some $A_k = 0$ then $d(A_1, A_2, \ldots, A_n) = 0$.
(b) $d(A_1, A_2, \ldots, A_k, A_{k+1}, \ldots, A_n) = -d(A_1, A_2, \ldots, A_{k+1}, A_k, \ldots, A_n)$.
(c) $d(A_1, A_2, \ldots, A_i, \ldots, A_j, \ldots, A_n) = -d(A_1, A_2, \ldots, A_j, \ldots, A_i, \ldots, A_n)$.
Proof. (a) If $A_k = 0$ then $A_k = 0 \cdot A_k$, so by multilinearity
\[
d(A_1, A_2, \ldots, 0 \cdot A_k, \ldots, A_n) = 0 \cdot d(A_1, A_2, \ldots, A_k, \ldots, A_n) = 0.
\]
(b) Put $A_k = B$, $A_{k+1} = C$. Then by the alternating property and multilinearity of $d$,
\[
\begin{aligned}
0 &= d(A_1, A_2, \ldots, B + C, B + C, \ldots, A_n) \\
&= d(A_1, A_2, \ldots, B, B + C, \ldots, A_n) + d(A_1, A_2, \ldots, C, B + C, \ldots, A_n) \\
&= d(A_1, A_2, \ldots, B, C, \ldots, A_n) + d(A_1, A_2, \ldots, C, B, \ldots, A_n).
\end{aligned}
\]
Hence $d(A_1, A_2, \ldots, B, C, \ldots, A_n) = -d(A_1, A_2, \ldots, C, B, \ldots, A_n)$.
(c) Follows from (b), since interchanging columns $i$ and $j$ with $i < j$ can be achieved by $2(j - i) - 1$ interchanges of adjacent columns, which is an odd number.
Remark 1.3.3. Note that the properties (a), (b), (c) have been derived from the defining properties of determinant functions without having any formula at our disposal yet.
Computation of determinants
Example 1.3.4. We now derive the familiar formula for the determinant of $2 \times 2$ matrices. Suppose $d(A_1, A_2)$ is an alternating multilinear normalized function on $2 \times 2$ matrices $A = (A_1, A_2)$. Then
\[
d \begin{bmatrix} x & y \\ z & u \end{bmatrix} = xu - yz.
\]
To derive this formula, write the first column as $A_1 = xe_1 + ze_2$ and the second column as $A_2 = ye_1 + ue_2$. Then
\[
\begin{aligned}
d(A_1, A_2) &= d(xe_1 + ze_2, ye_1 + ue_2) \\
&= d(xe_1 + ze_2, ye_1) + d(xe_1 + ze_2, ue_2) \\
&= d(xe_1, ye_1) + d(ze_2, ye_1) + d(xe_1, ue_2) + d(ze_2, ue_2) \\
&= yz\, d(e_2, e_1) + xu\, d(e_1, e_2) \\
&= (xu - yz)\, d(e_1, e_2) \\
&= xu - yz.
\end{aligned}
\]
Similarly, the formula for $3 \times 3$ determinants can also be derived as above. We leave this as an exercise.
Lemma 1.3.5. Suppose $f$ is a multilinear alternating function on $n \times n$ matrices and $f(e_1, e_2, \ldots, e_n) = 0$. Then $f$ is identically zero.
Proof. Let $A = (a_{ij})$ be an $n \times n$ matrix with columns $A_1, \ldots, A_n$. Write $A_j$ as
\[
A_j = a_{1j} e_1 + a_{2j} e_2 + \cdots + a_{nj} e_n.
\]
Since $f$ is multilinear we have (why?)
\[
f(A_1, \ldots, A_n) = \sum_{h} a_{h(1)1} a_{h(2)2} \cdots a_{h(n)n}\, f(e_{h(1)}, e_{h(2)}, \ldots, e_{h(n)}),
\]
where the sum is over all functions $h : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$.
Since $f$ is alternating we have (why?)
\[
f(A_1, \ldots, A_n) = \sum_{h} a_{h(1)1} a_{h(2)2} \cdots a_{h(n)n}\, f(e_{h(1)}, e_{h(2)}, \ldots, e_{h(n)}),
\]
where the sum is now over all one-to-one onto functions $h : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$.
By using part (c) of the lemma above we see that we can write
\[
f(A_1, \ldots, A_n) = \sum_{h} \pm\, a_{h(1)1} a_{h(2)2} \cdots a_{h(n)n}\, f(e_1, e_2, \ldots, e_n),
\]
where the sum is over all one-to-one onto functions $h : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$ and the sign $\pm$ depends only on $h$.
Thus $f(A) = 0$.
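For a function that is also normalized, the last expansion in this proof becomes an explicit sum over permutations, with the sign of each term equal to the sign of the permutation $h$ (the parity of the number of column interchanges needed to sort $e_{h(1)}, \ldots, e_{h(n)}$). The notes do not state this formula at this point, so the sketch below should be read as an illustration of where the argument leads, with hypothetical function names `sign` and `det_by_permutations`.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images.

    Computed from the number of inversions: +1 if even, -1 if odd.
    """
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det_by_permutations(A):
    """Sum over all permutations h of sign(h) * a_{h(1)1} * ... * a_{h(n)n}."""
    n = len(A)
    total = 0
    for h in permutations(range(n)):
        term = sign(h)
        for col in range(n):
            term *= A[h[col]][col]  # entry in row h(col), column col
        total += term
    return total


print(det_by_permutations([[1, 2], [3, 4]]))                      # -2
print(det_by_permutations([[2, 1, 1], [4, -6, 0], [-2, 7, 2]]))   # -16
```

The sum has $n!$ terms, so this is useful for understanding the structure of the determinant rather than for computing it on large matrices.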
Existence and uniqueness of determinant function
Theorem 1.3.6. (Uniqueness of determinant function). Let $f$ be an alternating multilinear function of order $n$ and $d$ a determinant function of order $n$. Then for all $n \times n$ matrices $A = (A_1, A_2, \ldots, A_n)$,
\[ f(A_1, A_2, \ldots, A_n) = d(A_1, A_2, \ldots, A_n)\, f(e_1, e_2, \ldots, e_n). \]
In particular, if $f$ is also a determinant function then $f(A_1, A_2, \ldots, A_n) = d(A_1, A_2, \ldots, A_n)$.
Proof. Consider the function
\[ g(A_1, A_2, \ldots, A_n) = f(A_1, A_2, \ldots, A_n) - d(A_1, A_2, \ldots, A_n)\, f(e_1, e_2, \ldots, e_n). \]
Since $f, d$ are alternating and multilinear so is $g$. Since
\[ g(e_1, e_2, \ldots, e_n) = 0 \]
the result follows from the previous lemma.
We have proved the uniqueness of the determinant function of order $n$. It remains to show its existence.
Convention 1.3.7. We shall denote the determinant of $A$ by $\det A$ or $|A|$.
Setting $\det[a] = a$ shows existence for $n = 1$.
Assume that we have shown the existence of a determinant function on $(n-1) \times (n-1)$ matrices. The determinant of an $n \times n$ matrix $A$ can be computed in terms of certain $(n-1) \times (n-1)$ determinants.
Let $A_{ij}$ denote the $(n-1) \times (n-1)$ matrix obtained from $A$ by deleting the $i$th row and $j$th column of $A$.
Theorem 1.3.8. Let $A = (a_{ij})$ be an $n \times n$ matrix. Then the function
\[ a_{11} \det A_{11} - a_{12} \det A_{12} + \cdots + (-1)^{n+1} a_{1n} \det A_{1n} \]
is multilinear, alternating, and normalized on $n \times n$ matrices, hence is the determinant function.
Proof. Denote the function by $f(A_1, A_2, \ldots, A_n)$.
Suppose that the columns $A_j$ and $A_{j+1}$ of $A$ are equal. Then $A_{1i}$ has two equal columns except when $i = j$ or $i = j+1$. By induction $f(A_{1i}) = 0$ for $i \neq j, j+1$. Thus
\[ f(A) = (-1)^{j+1} a_{1j}\, f(A_{1j}) + (-1)^{j+2} a_{1,j+1}\, f(A_{1,j+1}). \]
Since $A_j = A_{j+1}$, we have $a_{1j} = a_{1,j+1}$ and $A_{1j} = A_{1,j+1}$. Thus $f(A) = 0$. Therefore $f(A_1, A_2, \ldots, A_n)$ is alternating.
If $A = (e_1, e_2, \ldots, e_n)$ then by induction
\[ f(A) = 1 \cdot f(A_{11}) = f(e_1, e_2, \ldots, e_{n-1}) = 1. \]
We leave the multilinear property of $f(A_1, \ldots, A_n)$ as an exercise for the reader.
The formula in the theorem above is called expansion by the first row. Just as in the theorem above, we can also prove the following formula for expansion by row $k$. We leave its proof as an exercise.
Theorem 1.3.9. Let $A = (a_{ij})$ be an $n \times n$ matrix and let $1 \leq k \leq n$. Then
\[ \det A = \sum_{j=1}^{n} (-1)^{k+j} a_{kj} \det A_{kj}. \]
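The row expansion formula translates directly into a recursive procedure. The sketch below is illustrative only: the helper names `minor` and `det_expand` are assumptions, indices are 0-based (which does not change the parity of $k + j$), and the smaller determinants are themselves expanded along their first row.

```python
def minor(a, i, j):
    """Matrix obtained from a by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]

def det_expand(a, k=0):
    """Determinant of a square matrix by expansion along row k (0-based)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** (k + j) * a[k][j] * det_expand(minor(a, k, j))
               for j in range(n))

A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
# Expanding along any of the three rows should give the same value.
print([det_expand(A, k) for k in range(3)])
```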
Theorem 1.3.10. (i) Let $U$ be an upper triangular or a lower triangular matrix. Then $\det U$ is the product of the diagonal entries of $U$.
(ii) Let $E$ be an elementary matrix of the type $I + a e_{ij}$, for some $i \neq j$. Then $\det E = 1$.
(iii) Let $E$ be an elementary matrix of the type $I + e_{ij} + e_{ji} - e_{ii} - e_{jj}$, for some $i \neq j$. Then $\det E = -1$.
(iv) Let $E$ be an elementary matrix of the type $I + (a - 1) e_{ii}$, $a \neq 0$. Then $\det E = a$.
Proof. (i) Let $U = (u_{ij})$ be upper triangular. Arguing as in Lemma 1.3.5 we see that
\[ \det U = \sum_{h} \pm\, u_{h(1)1}\, u_{h(2)2} \cdots u_{h(n)n}, \]
where the sum is over all one-to-one onto functions $h : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$. Since $U$ is upper triangular the only choice of $h$ yielding a nonzero term is the identity function (and this gives a plus sign).
The proof for a lower triangular matrix is similar.
(ii) Follows from part (i).
(iii) $E$ is obtained from the identity matrix by exchanging columns $i$ and $j$. The result follows since the determinant is an alternating function.
(iv) Follows from part (i).
Determinant and Invertibility
Theorem 1.3.11. Let $A, B$ be two $n \times n$ matrices. Then
\[ \det(AB) = \det A \, \det B. \]
Proof. Let $D_i$ denote the $i$th column of a matrix $D$. Then
\[ (AB)_i = A B_i. \]
Therefore we need to prove that
\[ \det(AB_1, AB_2, \ldots, AB_n) = \det(A_1, A_2, \ldots, A_n) \det(B_1, \ldots, B_n). \]
Keep $A$ fixed and define
\[ f(B_1, B_2, \ldots, B_n) = \det(AB_1, AB_2, \ldots, AB_n). \]
We show that $f$ is alternating and multilinear. Let $C$ be an $n \times 1$ column vector. Then
\[ f(B_1, \ldots, B_i, \ldots, B_i, \ldots, B_n) = \det(AB_1, \ldots, AB_i, \ldots, AB_i, \ldots, AB_n) = 0 \]
and
\begin{align*}
f(B_1, \ldots, B_k + C, \ldots, B_n) &= \det(AB_1, \ldots, A(B_k + C), \ldots, AB_n) \\
&= \det(AB_1, \ldots, AB_k + AC, \ldots, AB_n) \\
&= \det(AB_1, \ldots, AB_k, \ldots, AB_n) + \det(AB_1, \ldots, AC, \ldots, AB_n) \\
&= f(B_1, \ldots, B_n) + f(B_1, \ldots, C, \ldots, B_n).
\end{align*}
Compatibility with scalar multiples in each column follows in the same way from $A(cB_k) = c(AB_k)$. Therefore, by Theorem 1.3.6,
\[ f(B_1, B_2, \ldots, B_n) = \det(B_1, \ldots, B_n)\, f(e_1, e_2, \ldots, e_n). \]
Now note that
\begin{align*}
f(e_1, e_2, \ldots, e_n) &= \det(Ae_1, \ldots, Ae_n) \\
&= \det(A_1, \ldots, A_n) \\
&= \det A.
\end{align*}
Hence $\det(AB) = \det A \, \det B$.
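A quick numerical sanity check of the product rule can be run on small integer matrices. This is only a sketch: `det` (first-row expansion) and `matmul` are illustrative helpers written for this check, and a single example of course proves nothing.

```python
def det(a):
    """Determinant by first-row expansion; the empty matrix has determinant 1."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 2]]
print(det(matmul(A, B)), det(A) * det(B))
```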
Lemma 1.3.12. (i) If $A$ is an invertible matrix then $\det A \neq 0$ and
\[ \det A^{-1} = \frac{1}{\det A}. \]
(ii) $\det A \neq 0$ implies $A$ is invertible.
(iii) Suppose $A, B$ are square matrices with $AB = I$. Then $A$ is invertible and $B = A^{-1}$.
Proof. (i) Since $AA^{-1} = I$, we have $\det A^{-1} \det A = \det I = 1$.
(ii) Suppose $A$ is not invertible. Then, by Chapter 2, there is a nonzero column vector $x$ such that $Ax = 0$. So some column of $A$ is a linear combination of the other columns of $A$ (i.e., excluding itself). It now follows from the multilinearity and alternating properties that $\det A = 0$.
(iii) Taking determinants we have $\det A \det B = 1$. So $\det A \neq 0$ and $A$ is invertible. Now
\[ B = (A^{-1}A)B = A^{-1}(AB) = A^{-1}. \]
Theorem 1.3.13. For any $n \times n$ matrix $A$,
\[ \det A = \det A^t. \]
Proof. Let $B$ be the rcf of $A$. Then $EA = B$, where $E$ is a product of elementary matrices. Since inverses of elementary matrices are elementary matrices (of the same type) we can write
\[ A = E_1 \cdots E_k B, \qquad A^t = B^t E_k^t \cdots E_1^t, \]
where the $E_i$ are elementary matrices.
Now the transpose of an elementary matrix is also an elementary matrix (of the same type) and has the same determinant (by Theorem 1.3.10). Thus, by multiplicativity of the determinant, we need to show that $\det(B) = \det(B^t)$.
Case (i) $A$ is not invertible (i.e., $\det(A) = 0$): In this case $\det(B) = 0$ and the last row of $B$, and hence the last column of $B^t$, is $0$. Thus $\det(B^t) = 0$.
Case (ii) $A$ is invertible: In this case $B$ and $B^t$ are both equal to the identity matrix.
The theorem above shows that the determinant is also a normalized, alternating, and multilinear function of the rows of a square matrix, and we have the following formula for the determinant, called expansion by column $k$.
Theorem 1.3.14. Let $A = (a_{ij})$ be an $n \times n$ matrix and let $1 \leq k \leq n$. Then
\[ \det A = \sum_{i=1}^{n} (-1)^{k+i} a_{ik} \det A_{ik}. \]
Example 1.3.15. (Computation by the Gauss elimination method). This is one of the most efficient ways to calculate determinants. Let $A$ be an $n \times n$ matrix with rows $A_1, \ldots, A_n$. Suppose
$E$ = the $n \times n$ elementary matrix for the row operation $A_i \to A_i + cA_j$,
$F$ = the $n \times n$ elementary matrix for the row operation $A_i \leftrightarrow A_j$,
$G$ = the $n \times n$ elementary matrix for the row operation $A_i \to cA_i$.
Suppose that $U$ is the rcf of $A$. If $c_1, c_2, \ldots, c_p$ are the multipliers used for the row operations $A_i \to cA_i$ and $r$ row exchanges have been used to get $U$ from $A$ then for any alternating multilinear function $d$, $d(U) = (-1)^r c_1 c_2 \cdots c_p\, d(A)$. To see this we simply note that
\[ d(FA) = -d(A), \quad d(EA) = d(A), \quad d(GA) = c\, d(A). \]
Suppose that $u_{11}, u_{22}, \ldots, u_{nn}$ are the diagonal entries of $U$. Then
\[ d(A) = (-1)^r (c_1 c_2 \cdots c_p)^{-1}\, u_{11} u_{22} \cdots u_{nn}\, d(e_1, e_2, \ldots, e_n). \]
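The elimination method can be sketched in a few lines. The version below is an illustration under simplifying assumptions: it reduces only to an upper triangular matrix, using row exchanges and the operations $A_i \to A_i + cA_j$, so no scaling multipliers $c_1, \ldots, c_p$ occur and the determinant is $(-1)^r$ times the product of the diagonal entries. Exact arithmetic is done with `fractions.Fraction`; the function name `det_by_elimination` is hypothetical.

```python
from fractions import Fraction

def det_by_elimination(rows):
    """Determinant via reduction to upper triangular form with exact fractions."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: determinant is zero
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign  # each row exchange flips the sign
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            # Adding a multiple of one row to another leaves the determinant unchanged.
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= a[i][i]
    return result

print(det_by_elimination([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))
```

This takes on the order of $n^3$ arithmetic operations, compared with the $n!$ terms of the permutation sum.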
The cofactor matrix
Definition 1.3.16. Let $A = (a_{ij})$ be an $n \times n$ matrix. The cofactor of $a_{ij}$, denoted by $\operatorname{cof} a_{ij}$, is defined as
\[ \operatorname{cof} a_{ij} = (-1)^{i+j} \det A_{ij}. \]
The cofactor matrix of $A$, denoted by $\operatorname{cof} A$, is the matrix
\[ \operatorname{cof} A = (\operatorname{cof} a_{ij}). \]
When $n = 1$, $A_{11}$ is the empty matrix and its determinant is taken to be $1$.
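Theorem 1.3.17. For any $n \times n$ matrix $A$,
\[ A (\operatorname{cof} A)^t = (\det A) I = (\operatorname{cof} A)^t A. \]
In particular, if $\det A$ is nonzero then $A^{-1} = \frac{1}{\det A} (\operatorname{cof} A)^t$, hence $A$ is invertible.
Proof. The $(i, j)$ entry of $(\operatorname{cof} A)^t A$ is
\[ a_{1j} \operatorname{cof} a_{1i} + a_{2j} \operatorname{cof} a_{2i} + \cdots + a_{nj} \operatorname{cof} a_{ni}. \]
If $i = j$, this is the expansion of $\det A$ by column $i$, so it equals $\det A$. When $i \neq j$ consider the matrix $B$ obtained by replacing the $i$th column of $A$ by the $j$th column of $A$. So $B$ has a repeated column and $\det B = 0$. The sum above is exactly the expansion of $\det B$ by column $i$, so it is $0$. The other equation $A (\operatorname{cof} A)^t = (\det A) I$ is proved similarly.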
Theorem 1.3.18. (Cramer's Rule) Suppose
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
\]
is a system of $n$ linear equations in $n$ unknowns $x_1, x_2, \ldots, x_n$. Suppose the coefficient matrix $A = (a_{ij})$ is invertible. Let $C_j$ be the matrix obtained from $A$ by replacing the $j$th column of $A$ by $b = (b_1, b_2, \ldots, b_n)^t$. Then for $j = 1, 2, \ldots, n$,
\[ x_j = \frac{\det C_j}{\det A}. \]
Proof. Let $A_1, \ldots, A_n$ be the columns of $A$. Write $b = x_1 A_1 + x_2 A_2 + \cdots + x_n A_n$. Then $\det(b, A_2, A_3, \ldots, A_n) = x_1 \det A$ (why?). So $x_1 = \frac{\det C_1}{\det A}$. Similarly for $x_2, \ldots, x_n$.
Chapter 2
Vector spaces and Linear
Transformations
2.1 Vector Spaces
Definition 2.1.1. A nonempty set $V$ of objects (called elements or vectors) is called a vector space over the scalars $\mathbb{F}$ if the following axioms are satisfied.
I. Closure axioms:
1. (closure under vector addition) For every pair of elements $x, y \in V$ there is a unique element $x + y \in V$ called the sum of $x$ and $y$.
2. (closure under scalar multiplication of vectors by elements of $\mathbb{F}$) For every $x \in V$ and every scalar $\alpha \in \mathbb{F}$ there is a unique element $\alpha x \in V$ called the product of $\alpha$ and $x$.
II. Axioms for vector addition:
3. (commutative law) $x + y = y + x$ for all $x, y \in V$.
4. (associative law) $x + (y + z) = (x + y) + z$ for all $x, y, z \in V$.
5. (existence of zero element) There exists a unique element $0$ in $V$ such that $x + 0 = 0 + x = x$ for all $x \in V$.
6. (existence of inverses or negatives) For $x \in V$ there exists a unique element written as $-x$ such that $x + (-x) = 0$.
III. Axioms for scalar multiplication:
7. (associativity) For all $\alpha, \beta \in \mathbb{F}$, $x \in V$,
\[ \alpha(\beta x) = (\alpha\beta) x. \]
8. (distributive law for addition in $V$) For all $x, y \in V$ and $\alpha \in \mathbb{F}$,
\[ \alpha(x + y) = \alpha x + \alpha y. \]
9. (distributive law for addition in $\mathbb{F}$) For all $\alpha, \beta \in \mathbb{F}$ and $x \in V$,
\[ (\alpha + \beta) x = \alpha x + \beta x. \]
10. (existence of identity for multiplication) For all $x \in V$,
\[ 1x = x. \]
Remark 2.1.2. When $\mathbb{F} = \mathbb{R}$ we say that $V$ is a real vector space. If we replace real numbers in the above definition by complex numbers then we get the definition of a complex vector space.
Examples of Vector Spaces
In the examples below we leave the verification of the vector addition and scalar multiplication axioms as exercises.
Example 2.1.3. 1. $V = \mathbb{R}$, $\mathbb{F} = \mathbb{R}$ with ordinary addition and multiplication as vector addition and scalar multiplication. This gives a real vector space.
2. $V = \mathbb{C}$, $\mathbb{F} = \mathbb{C}$ with ordinary addition and multiplication as vector addition and scalar multiplication. This gives a complex vector space.
3. $V = \mathbb{C}$, $\mathbb{F} = \mathbb{R}$ with ordinary addition and multiplication as vector addition and scalar multiplication. This gives a real vector space.
4. $V = \mathbb{R}^n = \{(a_1, a_2, \ldots, a_n) \mid a_1, \ldots, a_n \in \mathbb{R}\}$, $\mathbb{F} = \mathbb{R}$ with addition of row vectors as vector addition and multiplication of a row vector by a real number as scalar multiplication. This gives a real vector space. We can similarly define a real vector space of column vectors with $n$ real components. Depending on the context $\mathbb{R}^n$ could refer to either row vectors or column vectors with $n$ real components.
5. $V = \mathbb{C}^n = \{(a_1, a_2, \ldots, a_n) \mid a_1, \ldots, a_n \in \mathbb{C}\}$, $\mathbb{F} = \mathbb{C}$ with addition of row vectors as vector addition and multiplication of a row vector by a complex number as scalar multiplication. This gives a complex vector space. We can similarly define a complex vector space of column vectors with $n$ complex components. Depending on the context $\mathbb{C}^n$ could refer to either row vectors or column vectors with $n$ complex components.
6. Let $a < b$ be real numbers and set $V = \{f : [a, b] \to \mathbb{R}\}$, $\mathbb{F} = \mathbb{R}$. If $f, g \in V$ then we set $(f + g)(x) = f(x) + g(x)$ for all $x \in [a, b]$. If $c \in \mathbb{R}$ and $f \in V$ then $(cf)(x) = c f(x)$ for all $x \in [a, b]$. This gives a real vector space. Here $V$ is also denoted by $\mathbb{R}^{[a,b]}$.
7. Let $t$ be an indeterminate. The set $\mathcal{P}_n(\mathbb{R}) = \{a_0 + a_1 t + \cdots + a_n t^n \mid a_0, a_1, \ldots, a_n \in \mathbb{R}\}$ is a real vector space under the usual addition of polynomials and multiplication of polynomials by real numbers.
8. $C[a, b] = \{f : [a, b] \to \mathbb{R} \mid f \text{ is continuous on } [a, b]\}$ is a real vector space under the addition and scalar multiplication defined in item 6 above.
9. $V = \{f : [a, b] \to \mathbb{R} \mid f \text{ is differentiable at } x \in [a, b],\ x \text{ fixed}\}$ is a real vector space under the operations described in item 6 above.
10. The set of all solutions to the differential equation $y'' + a y' + b y = 0$, where $a, b \in \mathbb{R}$, forms a real vector space. More generally, in this example we can take $a = a(x)$, $b = b(x)$ to be suitable functions of $x$.
11. Let $V = M_{m \times n}(\mathbb{R})$ denote the set of all $m \times n$ matrices with real entries. Then $V$ is a real vector space under the usual matrix addition and multiplication of a matrix by a real number.
The above examples indicate that the notion of a vector space is quite general. A result proved for vector spaces will simultaneously apply to all the different examples above.
Subspace of a Vector Space
Definition 2.1.4. Let $V$ be a vector space over $\mathbb{F}$. A nonempty subset $W$ of $V$ is called a subspace of $V$ if
(i) $0 \in W$.
(ii) $u, v \in W$ implies $u + v \in W$.
(iii) $u \in W$, $\alpha \in \mathbb{F}$ implies $\alpha u \in W$.
Before giving examples we discuss an important notion.
Linear span
Let $V$ be a vector space over $\mathbb{F}$. Let $x_1, \ldots, x_n$ be vectors in $V$ and let $c_1, \ldots, c_n \in \mathbb{F}$. The vector $\sum_{i=1}^{n} c_i x_i \in V$ is called a linear combination of the $x_i$'s, and $c_i$ is called the coefficient of $x_i$ in this linear combination.
Definition 2.1.5. Let $S$ be a subset of a vector space $V$ over $\mathbb{F}$. The linear span of $S$ is the subset of all vectors in $V$ expressible as linear combinations of finite subsets of $S$, i.e.,
\[ L(S) = \left\{ \sum_{i=1}^{n} c_i x_i \;\middle|\; n \geq 0,\ x_1, x_2, \ldots, x_n \in S \text{ and } c_1, c_2, \ldots, c_n \in \mathbb{F} \right\}. \]
The empty sum of vectors is the zero vector. Thus $L(\emptyset) = \{0\}$. We say that $L(S)$ is spanned by $S$.
The linear span $L(S)$ is actually a subspace of $V$. In fact, we have
Lemma 2.1.6. The smallest subspace of $V$ containing $S$ is $L(S)$.
Proof. Note that $L(S)$ is a subspace (why?). Now, if $S \subseteq W \subseteq V$ and $W$ is a subspace of $V$ then $L(S) \subseteq W$ (why?). The result follows.
Example 2.1.7. 1. Let $A$ be an $m \times n$ matrix over $\mathbb{F}$, with rows $R_1, \ldots, R_m$ and columns $C_1, \ldots, C_n$. The row space of $A$, denoted $\mathcal{R}(A)$, is the subspace of $\mathbb{F}^n$ spanned by the rows of $A$. The column space of $A$, denoted $\mathcal{C}(A)$, is the subspace of $\mathbb{F}^m$ spanned by the columns of $A$. The null space of $A$, denoted $\mathcal{N}(A)$, is defined by
\[ \mathcal{N}(A) = \{x \in \mathbb{F}^n : Ax = 0\}. \]
Check that $\mathcal{N}(A)$ is a subspace of $\mathbb{F}^n$.
2. Different sets may span the same subspace. For example $L(\{e_1, e_2\}) = L(\{e_1, e_2, e_1 + e_2\}) = \mathbb{R}^2$. The vector space $\mathcal{P}_n(\mathbb{R})$ is spanned by $\{1, t, t^2, \ldots, t^n\}$ and also by $\{1, (1+t), \ldots, (1+t)^n\}$ (why?).
Bases and dimension of vector spaces
We have introduced the notion of the linear span of a subset $S$ of a vector space. This raises some natural questions:
(i) Which spaces can be spanned by a finite number of elements?
(ii) If a vector space $V = L(S)$ for a finite subset $S$ of $V$ then what is the size of the smallest such $S$?
To answer these questions we introduce the notions of linear dependence and independence, basis and dimension of a vector space.
Linear independence
Definition 2.1.8. Let $V$ be a vector space. A subset $S \subseteq V$ is called linearly dependent (L.D.) if there exist distinct elements $v_1, v_2, \ldots, v_n \in S$ (for some $n \geq 1$) and scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$, not all zero, such that
\[ \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = 0. \]
A set $S$ is called linearly independent (L.I.) if it is not linearly dependent, i.e., for all $n \geq 1$ and for all distinct $v_1, v_2, \ldots, v_n \in S$ and scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$,
\[ \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = 0 \quad \text{implies} \quad \alpha_i = 0 \text{ for all } i. \]
Elements of a linearly independent set are called linearly independent. Note that the empty set is linearly independent.
Remark 2.1.9. (i) Any subset of V containing a linearly dependent set is linearly dependent.
(ii) Any subset of a linearly independent set in V is linearly independent.
Example 2.1.10. (i) If a set $S$ contains the zero vector $0$ then $S$ is dependent since $1 \cdot 0 = 0$.
(ii) Consider the vector space $\mathbb{R}^n$ and let $S = \{e_1, e_2, \ldots, e_n\}$. Then $S$ is linearly independent. Indeed, if $\alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n = 0$ for some scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$ then $(\alpha_1, \alpha_2, \ldots, \alpha_n) = 0$. Thus each $\alpha_i = 0$. Hence $S$ is linearly independent.
(iii) Let $V$ be the vector space of all continuous functions from $\mathbb{R}$ to $\mathbb{R}$. Let $S = \{1, \cos^2 t, \sin^2 t\}$. Then the relation $\cos^2 t + \sin^2 t - 1 = 0$ shows that $S$ is linearly dependent.
(iv) Let $\lambda_1 < \lambda_2 < \cdots < \lambda_n$ be real numbers. Let $V = \{f : \mathbb{R} \to \mathbb{R} \mid f \text{ is continuous}\}$. Consider the set $S = \{e^{\lambda_1 x}, e^{\lambda_2 x}, \ldots, e^{\lambda_n x}\}$. We show that $S$ is linearly independent by induction on $n$. Let $n = 1$ and $\alpha e^{\lambda_1 x} = 0$. Since $e^{\lambda_1 x} \neq 0$ for any $x$, we get $\alpha = 0$. Now assume that the assertion is true for $n - 1$ and
\[ \alpha_1 e^{\lambda_1 x} + \cdots + \alpha_n e^{\lambda_n x} = 0. \]
Then, multiplying by $e^{-\lambda_n x}$,
\[ \alpha_1 e^{(\lambda_1 - \lambda_n)x} + \cdots + \alpha_n e^{(\lambda_n - \lambda_n)x} = 0. \]
Let $x \to \infty$ to get $\alpha_n = 0$. Now apply the induction hypothesis to get $\alpha_1 = \cdots = \alpha_{n-1} = 0$.
(v) Let $\mathcal{P}$ denote the vector space of all polynomials $p(t)$ with real coefficients. Then the set $S = \{1, t, t^2, \ldots\}$ is linearly independent. Suppose that $0 \leq n_1 < n_2 < \cdots < n_r$ and
\[ \alpha_1 t^{n_1} + \alpha_2 t^{n_2} + \cdots + \alpha_r t^{n_r} = 0 \]
for certain real numbers $\alpha_1, \alpha_2, \ldots, \alpha_r$. Differentiate $n_1$ times and set $t = 0$ to get $\alpha_1 = 0$. Continuing this way we see that all of $\alpha_1, \alpha_2, \ldots, \alpha_r$ are zero.
Bases and Dimension
Bases and dimension are two important notions in the study of vector spaces. A vector space may be realized as the linear span of several sets of different sizes. We study properties of the smallest sets whose linear span is a given vector space.
Definition 2.1.11. A subset $S$ of a vector space $V$ is called a basis of $V$ if the elements of $S$ are linearly independent and $V = L(S)$. A vector space $V$ possessing a finite basis is called finite dimensional. Otherwise $V$ is called infinite dimensional.
Exercise 2.1.12. Let $\{v_1, \ldots, v_n\}$ be a basis of a finite dimensional vector space $V$. Show that every $v \in V$ can be uniquely expressed as $v = a_1 v_1 + \cdots + a_n v_n$, for scalars $a_1, \ldots, a_n$.
We show that all bases of a finite dimensional vector space have the same cardinality (i.e., they contain the same number of elements). For this we prove the following result.
Lemma 2.1.13. Let $S = \{v_1, v_2, \ldots, v_k\}$ be a subset of a vector space $V$. Then any $k + 1$ elements in $L(S)$ are linearly dependent.
Proof. We shall give two proofs.
(first proof) Suppose $T = \{w_1, \ldots, w_n\}$ are linearly independent vectors in $L(S)$. We shall show that $n \leq k$. This will prove the result.
We shall construct a sequence of sets
\[ S = S_0, S_1, \ldots, S_n \]
such that
(i) each $S_i$ spans $L(S)$, $i = 0, 1, \ldots, n$.
(ii) $|S_i| = k$, $i = 0, 1, \ldots, n$.
(iii) $\{w_1, \ldots, w_i\} \subseteq S_i$, $i = 0, 1, \ldots, n$.
We shall produce this sequence of sets inductively, the base case $i = 0$ being clear. Now suppose we have sets $S_0, \ldots, S_j$ satisfying (i), (ii), (iii) above, for some $j < n$.
Since $S_j$ spans $L(S)$ we can write
\[ w_{j+1} = \sum_{s \in S_j} c_s s, \]
for some scalars $c_s$, $s \in S_j$. Since $w_1, \ldots, w_{j+1}$ are linearly independent there exists $t \in S_j \setminus \{w_1, \ldots, w_j\}$ with $c_t \neq 0$ (why?). It follows that
\[ t = \frac{1}{c_t} \Big( w_{j+1} - \sum_{s \in S_j \setminus \{t\}} c_s s \Big) \]
and hence the set $(S_j \setminus \{t\}) \cup \{w_{j+1}\}$ satisfies conditions (i), (ii), and (iii) above for $i = j + 1$. That completes the proof.
(second proof) Let $T = \{u_1, \ldots, u_{k+1}\} \subseteq L(S)$. Write
\[ u_i = \sum_{j=1}^{k} a_{ij} v_j, \quad i = 1, \ldots, k + 1. \]
Consider the $(k + 1) \times k$ matrix $A = (a_{ij})$.
Since $A$ has more rows than columns there exists (why?) a nonzero row vector $c = [c_1, \ldots, c_{k+1}]$ such that $cA = 0$, i.e., for $j = 1, \ldots, k$,
\[ \sum_{i=1}^{k+1} c_i a_{ij} = 0. \]
We now have
\begin{align*}
\sum_{i=1}^{k+1} c_i u_i &= \sum_{i=1}^{k+1} c_i \Big( \sum_{j=1}^{k} a_{ij} v_j \Big) \\
&= \sum_{j=1}^{k} \Big( \sum_{i=1}^{k+1} c_i a_{ij} \Big) v_j \\
&= 0,
\end{align*}
completing the proof.
Theorem 2.1.14. Any two bases of a finite dimensional vector space have the same number of elements.
Proof. Suppose $S$ and $T$ are bases of a finite dimensional vector space $V$. Suppose $|S| < |T|$. Since $T \subseteq L(S) = V$, $T$ is linearly dependent by Lemma 2.1.13. This is a contradiction.
Definition 2.1.15. The number of elements in a basis of a finite dimensional vector space $V$ is called the dimension of $V$. It is denoted by $\dim V$.
Example 2.1.16. (1) The $n$ coordinate vectors $e_1, e_2, \ldots, e_n$ in $\mathbb{R}^n$ form a basis of $\mathbb{R}^n$.
(2) Let $A$ be an $n \times n$ matrix. Then the columns of $A$ form a basis of $\mathbb{F}^n$ iff $A$ is invertible. (why?)
(3) $\mathcal{P}_n(\mathbb{R}) = \{a_0 + a_1 t + \cdots + a_n t^n \mid a_0, a_1, \ldots, a_n \in \mathbb{R}\}$ is spanned by $S = \{1, t, t^2, \ldots, t^n\}$. Since $S$ is independent, $\dim \mathcal{P}_n(\mathbb{R}) = n + 1$.
(4) Let $M_{m \times n}(\mathbb{F})$ denote the vector space of all $m \times n$ matrices with entries in $\mathbb{F}$. Let $e_{ij}$ denote the $m \times n$ matrix with $1$ in the $(i, j)$ position and $0$ elsewhere. If $A = (a_{ij}) \in M_{m \times n}(\mathbb{F})$ then $A = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} e_{ij}$. It is easy to see that the $mn$ matrices $e_{ij}$ are linearly independent. Hence $M_{m \times n}(\mathbb{F})$ is an $mn$-dimensional vector space.
(5) The space of solutions of the differential equation
\[ y'' - 2y' - 3y = 0 \]
has dimension $2$. A basis is $\{e^{-x}, e^{3x}\}$. Every solution is a linear combination of the solutions $e^{-x}$ and $e^{3x}$.
Exercise 2.1.17. What is the dimension of $M_{n \times n}(\mathbb{C})$ as a real vector space?
Lemma 2.1.18. Suppose $V$ is a finite dimensional vector space. Let $S$ be a linearly independent subset of $V$. Then $S$ can be enlarged to a basis of $V$.
Proof. Suppose that $\dim V = n$ and $S$ has fewer than $n$ elements. Let $v \in V \setminus L(S)$. Then $S \cup \{v\}$ is a linearly independent subset of $V$ (why?). Continuing this way we can enlarge $S$ to a basis of $V$.
Gauss elimination, row space, and column space
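Lemma 2.1.19. Let $A$ be an $m \times n$ matrix over $\mathbb{F}$ and $E$ a nonsingular $m \times m$ matrix over $\mathbb{F}$. Then
(a) $\mathcal{R}(A) = \mathcal{R}(EA)$. Hence $\dim \mathcal{R}(A) = \dim \mathcal{R}(EA)$.
(b) Let $1 \leq i_1 < i_2 < \cdots < i_k \leq n$. Columns $i_1, \ldots, i_k$ of $A$ are linearly independent if and only if columns $i_1, \ldots, i_k$ of $EA$ are linearly independent. Hence $\dim \mathcal{C}(A) = \dim \mathcal{C}(EA)$.
Proof. (a) $\mathcal{R}(EA) \subseteq \mathcal{R}(A)$ since every row of $EA$ is a linear combination of the rows of $A$. Similarly,
\[ \mathcal{R}(A) = \mathcal{R}(E^{-1}(EA)) \subseteq \mathcal{R}(EA). \]
(b) Suppose columns $i_1, \ldots, i_k$ of $A$ are linearly independent. Then (why?)
\begin{align*}
& \alpha_1 (EA)_{i_1} + \alpha_2 (EA)_{i_2} + \cdots + \alpha_k (EA)_{i_k} = 0 \\
\text{iff } & E(\alpha_1 A_{i_1} + \alpha_2 A_{i_2} + \cdots + \alpha_k A_{i_k}) = 0 \\
\text{iff } & E^{-1}(E(\alpha_1 A_{i_1} + \alpha_2 A_{i_2} + \cdots + \alpha_k A_{i_k})) = 0 \\
\text{iff } & \alpha_1 A_{i_1} + \alpha_2 A_{i_2} + \cdots + \alpha_k A_{i_k} = 0 \\
\text{iff } & \alpha_1 = \cdots = \alpha_k = 0.
\end{align*}
Thus columns $i_1, \ldots, i_k$ of $EA$ are linearly independent. The proof of the converse is similar.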
Theorem 2.1.20. Let $A$ be an $m \times n$ matrix. Then $\dim \mathcal{R}(A) = \dim \mathcal{C}(A)$.
Proof. We give two proofs.
(first proof) Let $r$ be $\dim \mathcal{C}(A)$. Then there are $m \times 1$ column vectors $v_1, v_2, \ldots, v_r$ that form a basis for the column space of $A$. Form an $m \times r$ matrix $C$ with columns $v_1, \ldots, v_r$.
For each $1 \leq j \leq n$, there exists (why?) an $r \times 1$ column vector $u_j$ such that the $j$th column of $A$ is equal to $C u_j$.
Form an $r \times n$ matrix $B$ with columns $u_1, \ldots, u_n$. Then (why?) $A = CB$.
Now the row space of $A$ is contained in the row space of $B$ and so $\dim \mathcal{R}(A) \leq r = \dim \mathcal{C}(A)$. Applying this argument to the transpose of $A$ shows that $\dim \mathcal{C}(A) \leq \dim \mathcal{R}(A)$.
(second proof) Apply row operations to reduce $A$ to the rcf $U$. Thus $A = EU$, where $E$ is a product of nonsingular elementary matrices. Suppose the first $k$ rows of $U$ are nonzero. Thus $U$ has $k$ pivotal columns.
Then (why?) the first $k$ rows of $U$ are a basis of $\mathcal{R}(A)$. Let $j_1, \ldots, j_k$ be the pivotal columns of $U$. Then (why?) columns $j_1, \ldots, j_k$ of $A$ form a basis of $\mathcal{C}(A)$.
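Example 2.1.21. Let $A$ be a $4 \times 6$ matrix whose row echelon form is
\[
U = \begin{pmatrix}
1 & 2 & 3 & 4 & 5 & 6 \\
0 & 0 & 0 & 1 & 7 & 8 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]
Columns 1, 4, 6 of $A$ form a basis of $\mathcal{C}(A)$ and the first 3 rows of $U$ form a basis of $\mathcal{R}(A)$.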
Definition 2.1.22. The rank of an $m \times n$ matrix $A$, denoted by $r(A)$ or $\operatorname{rank}(A)$, is $\dim \mathcal{R}(A) = \dim \mathcal{C}(A)$. The nullity of $A$ is the dimension of the null space $\mathcal{N}(A)$ of $A$.
The rank-nullity Theorem
Theorem 2.1.23. Let $A$ be an $m \times n$ matrix. Then
\[ \operatorname{rank} A + \operatorname{nullity} A = n. \]
Proof. Let $k = r(A)$. Reduce $A$ to rcf (or even ref) $U$ using elementary row operations. Then $U$ has $k$ nonzero rows and $k$ pivotal columns. We need to show that $\dim \mathcal{N}(A) = \dim \mathcal{N}(U) = n - k$.
Let $j_1, \ldots, j_k$ be the indices of the pivotal columns of $U$. Set $P = \{j_1, \ldots, j_k\}$ and $F = \{1, 2, \ldots, n\} \setminus P$, so $|F| = n - k$. Recall from Section 1.2 (Gauss Elimination) the following:
(i) Given arbitrary scalars $x_i$ for $i \in F$, there are unique scalars $x_i$ for $i \in P$ such that $x = (x_1, \ldots, x_n)^t$ satisfies $Ux = 0$.
(ii) Given $i \in F$, there is a unique $s_i = (x_1, \ldots, x_n)^t$ satisfying $U s_i = 0$, $x_i = 1$, and $x_j = 0$ for all $j \in F \setminus \{i\}$.
Then $\{s_i : i \in F\}$ forms a basis of $\mathcal{N}(A)$ (why?).
Fundamental Theorem for systems of linear equations
Theorem 2.1.24. Consider the following system of $m$ linear equations in $n$ unknowns $x_1, x_2, \ldots, x_n$:
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}
\quad \text{or} \quad Ax = b.
\]
(1) The system has a solution iff $r(A) = r([A \mid b])$.
(2) If $r(A) = r([A \mid b]) = n$ then $Ax = b$ has a unique solution.
(3) If $r(A) = r([A \mid b]) = r < n$ then $Ax = b$ has infinitely many solutions.
Proof. (1) Let $C_1, C_2, \ldots, C_n$ be the columns of $A$. Suppose $Ax = b$ has a solution $x_1 = a_1, x_2 = a_2, \ldots, x_n = a_n$. Then
\[ b = a_1 C_1 + a_2 C_2 + \cdots + a_n C_n. \]
Hence $b \in \mathcal{C}(A)$, so $A$ and $[A \mid b]$ have the same column space. Thus they have equal rank. Conversely, if $r(A) = r([A \mid b])$, then $b \in \mathcal{C}(A)$. Hence $b = d_1 C_1 + \cdots + d_n C_n$ for some scalars $d_1, d_2, \ldots, d_n$. Then
\[ d_1 C_1 + \cdots + d_n C_n = A \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{pmatrix} = b. \]
Hence $x_1 = d_1, \ldots, x_n = d_n$ is a solution.
(2) Let $r(A) = r([A \mid b]) = n$. Then by the rank-nullity theorem, $\operatorname{nullity}(A) = 0$. Hence $Ax = 0$ has a unique solution, namely $x_1 = \cdots = x_n = 0$. If $Ax = b = Ay$ then $A(x - y) = 0$. Hence $x - y = 0$. Thus $x = y$.
(3) Suppose $r(A) = r([A \mid b]) = r < n$. Then $n - r = \dim \mathcal{N}(A) > 0$. Thus $Ax = 0$ has infinitely many solutions. Let $c \in \mathbb{F}^n$ and $Ac = b$. Then we have seen before that all the solutions of $Ax = b$ are in the set $c + \mathcal{N}(A) = \{c + x \mid Ax = 0\}$. Hence $Ax = b$ has infinitely many solutions.
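The three cases of the theorem can be decided by comparing two ranks. The sketch below assumes exact (integer or rational) data; `rank` counts pivots found by forward elimination and `classify` is a hypothetical helper that returns a short description of the solution set.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, computed as the number of pivots in forward elimination."""
    a = [[Fraction(x) for x in row] for row in rows]
    m, n = len(a), len(a[0])
    r = 0
    for c in range(n):
        p = next((i for i in range(r, m) if a[i][c] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        for i in range(r + 1, m):
            factor = a[i][c] / a[r][c]
            a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        r += 1
        if r == m:
            break
    return r

def classify(a, b):
    """Compare r(A) with r([A | b]) and with the number of unknowns n."""
    n = len(a[0])
    augmented = [row + [bi] for row, bi in zip(a, b)]
    if rank(a) != rank(augmented):
        return "no solution"
    return "unique solution" if rank(a) == n else "infinitely many solutions"

print(classify([[1, 1], [1, -1]], [2, 0]))
print(classify([[1, 1], [2, 2]], [1, 3]))
print(classify([[1, 1], [2, 2]], [1, 2]))
```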
Rank in terms of determinants
We characterize rank in terms of minors of $A$. Recall that a minor of order $r$ of $A$ is a submatrix of $A$ consisting of $r$ columns and $r$ rows of $A$.
Theorem 2.1.25. An $m \times n$ matrix $A$ has rank $r \geq 1$ iff $\det M \neq 0$ for some order $r$ minor $M$ of $A$ and $\det N = 0$ for all order $r + 1$ minors $N$ of $A$.
Proof. Let the rank of $A$ be $r \geq 1$. Then some $r$ columns of $A$ are linearly independent. Let $B$ be the $m \times r$ matrix consisting of these $r$ columns of $A$. Then $\operatorname{rank}(B) = r$ and thus some $r$ rows of $B$ will be linearly independent. Let $C$ be the $r \times r$ matrix consisting of these $r$ rows of $B$. Then $\det(C) \neq 0$ (why?).
Let $N$ be an $(r + 1) \times (r + 1)$ minor of $A$. Without loss of generality we may take $N$ to consist of the first $r + 1$ rows and columns of $A$. Suppose $\det(N) \neq 0$. Then the $r + 1$ rows of $N$, and hence the first $r + 1$ rows of $A$, are linearly independent, a contradiction.
The converse is left as an exercise.
2.2 Linear transformations
Let $A$ be an $m \times n$ matrix with real entries. Then $A$ acts on the $n$-dimensional space $\mathbb{R}^n$ by left multiplication: if $v \in \mathbb{R}^n$ then $Av \in \mathbb{R}^m$.
In other words, $A$ defines a function
\[ T_A : \mathbb{R}^n \to \mathbb{R}^m, \quad T_A(v) = Av. \]
By properties of matrix multiplication, $T_A$ satisfies the following conditions:
(i) $T_A(v + w) = T_A(v) + T_A(w)$
(ii) $T_A(cv) = c\, T_A(v)$
where $c \in \mathbb{R}$ and $v, w \in \mathbb{R}^n$. We say that $T_A$ respects the two operations in the vector space $\mathbb{R}^n$. In this section we study such maps between vector spaces.
Definition 2.2.1. Let $V, W$ be vector spaces over $\mathbb{F}$. A linear transformation $T : V \to W$ is a function satisfying
\[ T(v + w) = T(v) + T(w) \quad \text{and} \quad T(cv) = c\, T(v) \]
where $v, w \in V$ and $c \in \mathbb{F}$.
Exercise 2.2.2. Let $T : V \to W$ be a linear map. Show that $T(0) = 0$.
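Example 2.2.3.
(1) Let $c \in \mathbb{R}$, $V = W = \mathbb{R}^2$. Define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by
\[ T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} c & 0 \\ 0 & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} cx \\ cy \end{pmatrix}. \]
$T$ stretches each vector $v$ in $\mathbb{R}^2$ to $cv$. Hence
\begin{align*}
T(v + w) &= c(v + w) = cv + cw = T(v) + T(w), \\
T(dv) &= c(dv) = d(cv) = d\, T(v).
\end{align*}
Hence $T$ is a linear transformation.
(2) Rotation
Fix $\theta$ and define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by
\[ T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix}. \]
Then $T(e_1) = (\cos\theta, \sin\theta)^t$ and $T(e_2) = (-\sin\theta, \cos\theta)^t$. Thus $T$ rotates the whole space by $\theta$. (Draw a picture to convince yourself of this. Another way is to identify the vector $(x, y)^t$ with the complex number $z = x + iy$. Then we can write $T(z) = z e^{i\theta}$.)
(3) Let $\mathcal{T}$ be the vector space of differentiable functions $f : \mathbb{R} \to \mathbb{R}$ such that $f^{(n)}$ exists for all $n$. Define $D : \mathcal{T} \to \mathcal{T}$ by
\[ D(f) = f'. \]
Then $D(af + bg) = af' + bg' = aD(f) + bD(g)$. Hence $D$ is a linear transformation.
(4) Define $\mathcal{I} : \mathcal{T} \to \mathcal{T}$ by
\[ \mathcal{I}(f)(x) = \int_0^x f(t)\, dt. \]
By properties of integration, $\mathcal{I}$ is a linear transformation.
(5) Consider the differential equation
\[ y'' - 3y' + 2y = 0. \]
Let $D : \mathcal{T} \to \mathcal{T}$ be the linear transformation defined as above. Then $D \circ D(y) = y''$. Let $I$ be the identity map $I(y) = y$. Then the differential equation can be written as
\[ (D^2 - 3D + 2I)(y) = 0. \]
It can be shown that $e^x$ and $e^{2x}$ are solutions of the differential equation. Let $T = D^2 - 3D + 2I$. Then for any $\alpha, \beta \in \mathbb{R}$,
\[ T(\alpha e^x + \beta e^{2x}) = \alpha T(e^x) + \beta T(e^{2x}) = 0. \]
(6) The map $T : \mathbb{R} \to \mathbb{R}$ given by $T(x) = x^2$ is not linear (why?).
(7) Let $V = M_{n \times n}(\mathbb{F})$ be the vector space of all $n \times n$ matrices over $\mathbb{F}$. Fix $A \in V$. The map $T : V \to V$ given by $T(N) = AN$ is linear (why?).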
Rank and Nullity
Let $T : V \to W$ be a linear transformation of vector spaces. There are two important subspaces associated with $T$.
Nullspace of $T$ = $\mathcal{N}(T) = \{v \in V \mid T(v) = 0\}$.
Image of $T$ = $\operatorname{Im}(T) = \{T(v) \mid v \in V\}$.
Let $V$ be a finite dimensional vector space. Suppose that $\alpha, \beta$ are scalars. If $v, w \in \mathcal{N}(T)$ then $T(\alpha v + \beta w) = \alpha T(v) + \beta T(w) = 0$. Hence $\alpha v + \beta w \in \mathcal{N}(T)$. Thus $\mathcal{N}(T)$ is a subspace of $V$. The dimension of $\mathcal{N}(T)$ is called the nullity of $T$ and it is denoted by $\operatorname{nullity}(T)$. Suppose that $v, w \in V$. Then
\[ \alpha T(v) + \beta T(w) = T(\alpha v + \beta w). \]
Thus $\operatorname{Im}(T)$ is a subspace of $W$. The dimension of $\operatorname{Im} T$, denoted by $\operatorname{rank}(T)$, is called the rank of $T$.
Lemma 2.2.4. Let $T : V \to W$ be a linear map of vector spaces. Then $T$ is 1-1 if and only if $\mathcal{N}(T) = \{0\}$.
Proof. (if) $T(u) = T(v)$ implies $T(u - v) = 0$, which implies $u - v \in \mathcal{N}(T) = \{0\}$, i.e., $u = v$.
(only if) $T(v) = 0 = T(0)$ implies $v = 0$.
Lemma 2.2.5. Let $V, W$ be vector spaces. Assume $V$ is finite dimensional with $(v_1, \ldots, v_n)$ as an ordered basis. Let $(w_1, \ldots, w_n)$ be an arbitrary sequence of vectors in $W$. Then there is a unique linear map $T : V \to W$ with $T(v_i) = w_i$, for all $i = 1, \ldots, n$.
Proof. (uniqueness) Given $v \in V$ we can write (uniquely) $v = a_1 v_1 + \cdots + a_n v_n$, for scalars $a_i$. Then $T(v) = a_1 T(v_1) + \cdots + a_n T(v_n) = a_1 w_1 + \cdots + a_n w_n$. So $T$ is determined by $(w_1, \ldots, w_n)$.
(existence) Define $T$ as follows. Given $v \in V$ write (uniquely) $v = a_1 v_1 + \cdots + a_n v_n$, for scalars $a_i$. Define $T(v) = a_1 w_1 + \cdots + a_n w_n$. Show that $T$ is linear (exercise).
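Theorem 2.2.6 (The Rank-Nullity Theorem). Let $T : V \to W$ be a linear transformation of vector spaces where $V$ is finite dimensional. Then
\[ \operatorname{rank}(T) + \operatorname{nullity}(T) = \dim V. \]
Proof. Suppose $\dim V = n$. Let $B = \{v_1, v_2, \ldots, v_l\}$ be a basis of $\mathcal{N}(T)$. We can extend $B$ to a basis $C = \{v_1, v_2, \ldots, v_l, w_1, w_2, \ldots, w_{n-l}\}$ of $V$. We show that
\[ D = \{T(w_1), T(w_2), \ldots, T(w_{n-l})\} \]
is a basis of $\operatorname{Im}(T)$. Any $v \in V$ can be expressed uniquely as
\[ v = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_l v_l + \beta_1 w_1 + \cdots + \beta_{n-l} w_{n-l}. \]
Hence
\begin{align*}
T(v) &= \alpha_1 T(v_1) + \cdots + \alpha_l T(v_l) + \beta_1 T(w_1) + \cdots + \beta_{n-l} T(w_{n-l}) \\
&= \beta_1 T(w_1) + \cdots + \beta_{n-l} T(w_{n-l}).
\end{align*}
Hence $D$ spans $\operatorname{Im} T$. Suppose
\[ \beta_1 T(w_1) + \cdots + \beta_{n-l} T(w_{n-l}) = 0. \]
Then
\[ T(\beta_1 w_1 + \cdots + \beta_{n-l} w_{n-l}) = 0. \]
Hence $\beta_1 w_1 + \cdots + \beta_{n-l} w_{n-l} \in \mathcal{N}(T)$. Hence there are scalars $\alpha_1, \alpha_2, \ldots, \alpha_l$ such that
\[ \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_l v_l = \beta_1 w_1 + \beta_2 w_2 + \cdots + \beta_{n-l} w_{n-l}. \]
By linear independence of $v_1, v_2, \ldots, v_l, w_1, w_2, \ldots, w_{n-l}$ we conclude that $\beta_1 = \beta_2 = \cdots = \beta_{n-l} = 0$. Hence $D$ is a basis of $\operatorname{Im} T$. Thus
\[ \operatorname{rank}(T) = n - l = \dim V - \dim \mathcal{N}(T). \]
In a later exercise in this section we ask you to derive the rank-nullity theorem for matrices from the result above.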
Coordinate vectors
Let $V$ be a finite dimensional vector space (fdvs) over $\mathbb{F}$. By an ordered basis of $V$ we mean a sequence $(v_1, v_2, \ldots, v_n)$ of distinct vectors of $V$ such that the set $\{v_1, \ldots, v_n\}$ is a basis. Let $B = (v_1, v_2, \ldots, v_n)$ be an ordered basis and let $u \in V$. Write uniquely (why?)
\[ u = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n, \quad a_i \in \mathbb{F}. \]
Define the coordinate vector of $u$ with respect to (wrt) the ordered basis $B$ by
\[ [u]_B = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}. \]
Note that (why?) for vectors $u, v \in V$ and scalar $a \in \mathbb{F}$ we have
\[ [u + v]_B = [u]_B + [v]_B, \quad [av]_B = a[v]_B. \]
Suppose $C = (u_1, \ldots, u_n)$ is another ordered basis of $V$. Given $u \in V$, what is the relation between $[u]_B$ and $[u]_C$?
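Define $M_B^C$, the transition matrix from $C$ to $B$, to be the $n \times n$ matrix whose $j$th column is $[u_j]_B$:
\[ M_B^C = \big[\, [u_1]_B \ [u_2]_B \ \cdots \ [u_n]_B \,\big]. \]
Lemma 2.2.7. Set $M = M_B^C$. Then, for all $u \in V$, we have
\[ [u]_B = M [u]_C. \]
Proof. Let
\[ [u]_C = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}. \]
Then $u = a_1 u_1 + a_2 u_2 + \cdots + a_n u_n$ and we have
\begin{align*}
[u]_B &= [a_1 u_1 + \cdots + a_n u_n]_B \\
&= a_1 [u_1]_B + \cdots + a_n [u_n]_B \\
&= \big[\, [u_1]_B \ [u_2]_B \ \cdots \ [u_n]_B \,\big] \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} \\
&= M [u]_C.
\end{align*}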
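Example 2.2.8. Let $V = \mathbb{R}^3$ and let
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},\; v_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix},\; v_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix},\; u_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},\; u_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},\; u_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$
Consider the ordered bases $B = (v_1, v_2, v_3)$ and $C = (u_1, u_2, u_3)$. We have (why?)
$$M = M^C_B = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}.$$
Let $u = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$. So (why?) $[u]_C = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$.
Then
$$[u]_B = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}.$$
Check that
$$\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$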
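The computation in Example 2.2.8 can be replayed numerically. The following is a minimal sketch in plain Python; the helper name `mat_vec` is our own choice and not part of the notes. It applies the transition matrix to $[u]_C$ and then rebuilds $u$ from the resulting $B$-coordinates.

```python
def mat_vec(M, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

# Transition matrix from C (the standard basis) to B = (v1, v2, v3).
# Its columns are [u1]_B, [u2]_B, [u3]_B, because
# u1 = v1 - v2, u2 = v2 - v3, u3 = v3.
M = [[1, 0, 0],
     [-1, 1, 0],
     [0, -1, 1]]

u_C = [2, 3, 4]        # coordinates of u wrt C
u_B = mat_vec(M, u_C)  # coordinates of u wrt B

# Rebuild u from its B-coordinates as a sanity check.
v1, v2, v3 = [1, 1, 1], [0, 1, 1], [0, 0, 1]
rebuilt = [u_B[0] * a + u_B[1] * b + u_B[2] * c
           for a, b, c in zip(v1, v2, v3)]

print(u_B)      # [2, 1, 1]
print(rebuilt)  # [2, 3, 4]
```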
Lemma 2.2.9. Let $V$ be a fdvs and $B$ and $C$ be two ordered bases. Then
$$M^C_B = (M^B_C)^{-1}.$$
Proof. Put $M = M^B_C$ and $N = M^C_B$. We need to show that $MN = NM = I$.
We have, for all $u \in V$,
$$[u]_B = N[u]_C, \qquad [u]_C = M[u]_B.$$
It follows that, for all $u \in V$,
$$[u]_B = N[u]_C = NM[u]_B, \qquad [u]_C = M[u]_B = MN[u]_C.$$
Thus (why?) $MN = NM = I$.
Example 2.2.10. Let $M$ be the $(n + 1) \times (n + 1)$ matrix, with rows and columns indexed by $0, 1, \ldots, n$, and with entry in row $i$ and column $j$, $0 \le i, j \le n$, given by $\binom{j}{i}$. We show that $M$ is invertible and find the inverse explicitly.
Consider the vector space $\mathcal{P}_n(\mathbb{R})$ of real polynomials of degree $\le n$. Then $B = (1, x, x^2, \ldots, x^n)$ and $C = (1, x - 1, (x - 1)^2, \ldots, (x - 1)^n)$ are both ordered bases (why?).
We claim that $M = M^B_C$. To see this note the following computation. For $0 \le j \le n$ we have
$$x^j = (1 + (x - 1))^j = \sum_{i=0}^{j} \binom{j}{i} (x - 1)^i = \sum_{i=0}^{n} \binom{j}{i} (x - 1)^i,$$
where in the last step we have used the fact that $\binom{j}{i} = 0$ for $i > j$.
Thus $M^{-1} = M^C_B$ and its entries are given by the following computation. For $0 \le j \le n$ we have
$$(x - 1)^j = \sum_{i=0}^{j} (-1)^{j-i} \binom{j}{i} x^i = \sum_{i=0}^{n} (-1)^{j-i} \binom{j}{i} x^i.$$
Thus, the entry in row $i$ and column $j$ of $M^{-1}$ is $(-1)^{j-i} \binom{j}{i}$.
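As a quick check of Example 2.2.10, the sketch below builds both matrices for a small $n$ and multiplies them. It is only an illustration: the function names are ours, and the value $n = 4$ is an arbitrary choice.

```python
from math import comb

def binomial_matrix(n):
    """Entry in row i, column j (0 <= i, j <= n) is C(j, i)."""
    return [[comb(j, i) for j in range(n + 1)] for i in range(n + 1)]

def signed_binomial_matrix(n):
    """Entry in row i, column j is (-1)^(j-i) C(j, i), and 0 when i > j."""
    return [[(-1) ** (j - i) * comb(j, i) if i <= j else 0
             for j in range(n + 1)] for i in range(n + 1)]

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(k):
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]

n = 4  # arbitrary small size chosen for illustration
M = binomial_matrix(n)
N = signed_binomial_matrix(n)
print(mat_mul(M, N) == identity(n + 1))  # True
print(mat_mul(N, M) == identity(n + 1))  # True
```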
Matrices and linear transformations
Let $V$ and $W$ be finite dimensional vector spaces with $\dim V = n$ and $\dim W = m$. Suppose $E = (e_1, e_2, \ldots, e_n)$ is an ordered basis for $V$ and $F = (f_1, f_2, \ldots, f_m)$ is an ordered basis for $W$.
Let $T : V \to W$ be a linear transformation. We define $M^E_F(T)$, the matrix of $T$ with respect to the ordered bases $E$ and $F$, to be the $m \times n$ matrix whose $j$th column is $[T(e_j)]_F$:
$$M^E_F(T) = \big[\, [T(e_1)]_F \;\; [T(e_2)]_F \;\; \cdots \;\; [T(e_n)]_F \,\big].$$
Please do the following important exercise.
Exercise 2.2.11. Let $A$ be an $m \times n$ matrix over $\mathbb{F}$ and consider the linear map $T_A : \mathbb{F}^n \to \mathbb{F}^m$ given by $T_A(v) = Av$, for $v \in \mathbb{F}^n$ (we are considering column vectors here).
Consider the ordered bases $E = (e_1, \ldots, e_n)$ and $F = (e_1, \ldots, e_m)$ of $\mathbb{F}^n$ and $\mathbb{F}^m$ respectively. Show that $M^E_F(T_A) = A$.
Let $L(V, W)$ denote the set of all linear transformations from $V$ to $W$. Suppose $S, T \in L(V, W)$ and $c$ is a scalar. Define $S + T$ and $cS$ as follows:
$$(S + T)(x) = S(x) + T(x), \qquad (cS)(x) = cS(x)$$
for all $x \in V$. It is easy to show that $L(V, W)$ is a vector space under these operations.
Lemma 2.2.12. Fix ordered bases $E$ and $F$ of $V$ and $W$ respectively. For all $S, T \in L(V, W)$ and scalar $c$ we have
(i) $M^E_F(S + T) = M^E_F(S) + M^E_F(T)$,
(ii) $M^E_F(cS) = cM^E_F(S)$,
(iii) $M^E_F(S) = M^E_F(T) \iff S = T$.
Proof. Exercise.
Lemma 2.2.13. Suppose $V, W$ are vector spaces of dimensions $n, m$ respectively. Suppose $T : V \to W$ is a linear transformation. Suppose $E = (e_1, \ldots, e_n)$, $F = (f_1, \ldots, f_m)$ are ordered bases of $V, W$ respectively. Then
$$[T(v)]_F = M^E_F(T)[v]_E, \quad v \in V.$$
Proof. Let
$$[v]_E = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}.$$
Then $v = a_1 e_1 + a_2 e_2 + \cdots + a_n e_n$ and hence $T(v) = a_1 T(e_1) + a_2 T(e_2) + \cdots + a_n T(e_n)$.
We have
$$[T(v)]_F = [a_1 T(e_1) + \cdots + a_n T(e_n)]_F = a_1 [T(e_1)]_F + \cdots + a_n [T(e_n)]_F = \big[\, [T(e_1)]_F \;\; [T(e_2)]_F \;\; \cdots \;\; [T(e_n)]_F \,\big] \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = M^E_F(T)[v]_E.$$
Lemma 2.2.14. Suppose $U, V, W$ are vector spaces of dimension $n, p, m$ respectively. Suppose $T : U \to V$ and $S : V \to W$ are linear transformations. Suppose $E = (e_1, \ldots, e_n)$, $F$, $G$ are ordered bases of $U, V, W$ respectively. Then
$$M^E_G(S \circ T) = M^F_G(S)\, M^E_F(T).$$
Proof. The $j$th column of $M^E_G(S \circ T)$ is
$$[(S \circ T)(e_j)]_G = [S(T(e_j))]_G.$$
Now the $j$th column of $M^F_G(S)\, M^E_F(T)$ is
$$M^F_G(S)\,(j\text{th column of } M^E_F(T)) = M^F_G(S)[T(e_j)]_F = [S(T(e_j))]_G,$$
where the last equality is Lemma 2.2.13 applied to $S$ and the vector $T(e_j)$.
Let $V$ be a fdvs. A linear map $T : V \to V$ is said to be a linear operator on $V$. Let $B, C$ be ordered bases of $V$. The square matrix $M^B_B(T)$ is said to be the matrix of $T$ with respect to the ordered basis $B$. Note that the transition matrix $M^C_B$ from $C$ to $B$ is the matrix $M^C_B(I)$ of the identity map wrt $C$ and $B$. Thus it follows that (why?) $M^C_B(I) = M^B_C(I)^{-1}$.
Exercise 2.2.15. Prove Lemma 2.2.9 using Lemma 2.2.14.
An easy induction gives the following generalization of the lemma above. Its proof is left as an exercise.
Lemma 2.2.16. Suppose $V_i$, $1 \le i \le m + 1$, are finite dimensional vector spaces and $T_i : V_i \to V_{i+1}$, $1 \le i \le m$, are linear maps. Suppose $E_i$ is an ordered basis of $V_i$, for $1 \le i \le m + 1$. Then
$$M^{E_1}_{E_{m+1}}(T_m \circ T_{m-1} \circ \cdots \circ T_2 \circ T_1) = M^{E_m}_{E_{m+1}}(T_m) \cdots M^{E_2}_{E_3}(T_2)\, M^{E_1}_{E_2}(T_1).$$
Lemma 2.2.17. We have
$$M^B_B(T) = (M^B_C)^{-1}\, M^C_C(T)\, M^B_C.$$
Proof. Applying Lemma 2.2.16 with $V_1 = V_2 = V_3 = V_4 = V$ and $T_1 = T_3 = I$, $T_2 = T$, and $E_1 = E_4 = B$, $E_2 = E_3 = C$ we get
$$M^B_B(T) = M^C_B\, M^C_C(T)\, M^B_C.$$
Since $M^C_B = (M^B_C)^{-1}$ the proof is complete.
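Example 2.2.18. Consider the linear transformation
$$T : \mathbb{R}^2 \to \mathbb{R}^2, \quad T(e_1) = e_1, \quad T(e_2) = e_1 + e_2.$$
Let $C = (e_1, e_2)$ and $B = (e_1 + e_2, e_1 - e_2)$. Then
$$M^C_C(T) = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad M^B_C = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad M^C_B = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{bmatrix}.$$
Hence
$$M^B_B(T) = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 3 & -1 \\ 1 & 1 \end{bmatrix}.$$
We can also find this directly:
$$T(e_1 + e_2) = 2e_1 + e_2 = \tfrac{3}{2}(e_1 + e_2) + \tfrac{1}{2}(e_1 - e_2),$$
$$T(e_1 - e_2) = -e_2 = -\tfrac{1}{2}(e_1 + e_2) + \tfrac{1}{2}(e_1 - e_2).$$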
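The change of basis in Example 2.2.18 can be verified with exact rational arithmetic. This is a small sketch using Python's `fractions` module; the variable names `T_C`, `P`, `P_inv`, `T_B` are ours ($P$ stands for $M^B_C$ and `P_inv` for $M^C_B$).

```python
from fractions import Fraction as Fr

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

T_C = [[1, 1],
       [0, 1]]                  # matrix of T wrt C = (e1, e2)
P = [[1, 1],
     [1, -1]]                   # columns: e1 + e2 and e1 - e2 in C-coordinates
P_inv = [[Fr(1, 2), Fr(1, 2)],
         [Fr(1, 2), Fr(-1, 2)]]  # inverse of P, as stated in the example

# Matrix of T wrt B, following Lemma 2.2.17.
T_B = mat_mul(P_inv, mat_mul(T_C, P))
print(T_B)  # entries 3/2, -1/2 in the first row and 1/2, 1/2 in the second
```

The columns of `T_B` are exactly the coefficient pairs obtained in the direct computation of $T(e_1 + e_2)$ and $T(e_1 - e_2)$.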
Exercise 2.2.19. (i) Deduce the rank-nullity theorem for matrices from the rank-nullity theorem for linear maps.
(ii) Let $T : V \to W$ be a linear map of rank $r$ between fdvs $V$ and $W$. Show that there are ordered bases $E$ of $V$ and $F$ of $W$ such that
$$M^E_F(T) = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix},$$
where $I$ is the $r \times r$ identity matrix and $0$ stands for a matrix of zeros of appropriate size.
Given subspaces $V, W$ of a vector space $U$ define the sum of $V$ and $W$, denoted $V + W$, by
$$V + W = L(V \cup W).$$
Lemma 2.2.20. Let $V, W$ be subspaces of a fdvs $U$. Then
$$\dim V + \dim W = \dim(V \cap W) + \dim(V + W).$$
Proof. We shall give a sketch of a proof leaving the reader to fill in the details.
Consider the set $V \times W = \{(v, w) : v \in V, w \in W\}$. This set is a vector space with componentwise addition and scalar multiplication. Check that the dimension of this space is $\dim V + \dim W$.
Define a linear map $T : V \times W \to V + W$ by $T((v, w)) = v - w$. Check that $T$ is onto and that the nullspace of $T$ is $\{(v, v) : v \in V \cap W\}$. The result now follows from the rank-nullity theorem for linear maps.
Example 2.2.21. Let $V, W$ be finite dimensional vector spaces over $\mathbb{F}$ with dimensions $n, m$ respectively. Fix ordered bases $E, F$ for $V, W$ respectively.
Consider the map $f : L(V, W) \to M_{m \times n}(\mathbb{F})$ given by $f(T) = M^E_F(T)$, for $T \in L(V, W)$. Lemma 2.2.12 shows that $f$ is linear and 1-1 and Lemma 2.2.5 shows that $f$ is onto. It follows (why?) that $\dim L(V, W) = mn$.
Example 2.2.22. Often we see statements like "If every vector in a vector space is uniquely determined by $t$ parameters the dimension of $V$ is $t$". In this example we show one possible way of making this precise.
Let $V$ be a vector space over $\mathbb{F}$. A linear functional is a linear map $f : V \to \mathbb{F}$. We shall refer to a linear functional as a parameter. Suppose we have $t$ parameters $f_i : V \to \mathbb{F}$, $i = 1, 2, \ldots, t$. Suppose every vector in $V$ is uniquely determined by these $t$ parameters, i.e., given arbitrary scalars $a_1, a_2, \ldots, a_t$ in $\mathbb{F}$, there is a unique vector $v \in V$ with $f_i(v) = a_i$, $i = 1, \ldots, t$. Then $\dim V = t$. We show this as follows.
For $i = 1, \ldots, t$, let $v_i \in V$ be the (unique) vector with $f_i(v_i) = 1$ and $f_j(v_i) = 0$, for $j \ne i$. We claim that $v_1, \ldots, v_t$ is a basis of $V$.
Let $v \in V$. Put $a_i = f_i(v)$, $i = 1, \ldots, t$. Consider the vector $v - (a_1 v_1 + \cdots + a_t v_t)$. Check that $f_i(v - (a_1 v_1 + \cdots + a_t v_t)) = 0$, for $i = 1, \ldots, t$. Since each of the $f_i$ is linear and the parameters $f_1, \ldots, f_t$ uniquely determine the vectors in $V$ it follows that the only vector with all parameters $0$ is the $0$ vector. Thus $v - (a_1 v_1 + \cdots + a_t v_t) = 0$ and $v_1, \ldots, v_t$ span $V$.
Now suppose $a_1 v_1 + \cdots + a_t v_t = 0$. Then, for all $i$, $f_i(a_1 v_1 + \cdots + a_t v_t) = a_i = 0$ and thus linear independence follows.
Example 2.2.23. Given an $n \times n$ matrix, by $r_i$ we mean the sum of elements in row $i$. Similarly, by $c_j$ we mean the sum of elements in column $j$.
A real magic square of order $n$ is a real $n \times n$ matrix satisfying
$$r_1 = r_2 = \cdots = r_n = c_1 = c_2 = \cdots = c_n.$$
Let $RMS(n)$ denote the set of all real magic squares of order $n$. It is easy to see that $RMS(n)$ is a subspace of $M_{n \times n}(\mathbb{R})$, the vector space of all $n \times n$ real matrices. The dimension of $M_{n \times n}(\mathbb{R})$ is $n^2$. What is the dimension of $RMS(n)$?
We show that $\dim RMS(n) = (n - 1)^2 + 1$ using the previous example.
For $1 \le i, j \le n - 1$, define a linear map $f_{ij} : RMS(n) \to \mathbb{R}$ by
$$f_{ij}(M) = \text{entry in row } i \text{ and column } j \text{ of } M, \quad M \in RMS(n).$$
Define a linear map $f : RMS(n) \to \mathbb{R}$ by
$$f(M) = \text{sum of the entries in row 1 of } M, \quad M \in RMS(n).$$
Check that the $(n - 1)^2 + 1$ parameters $f, f_{ij}$ satisfy the hypothesis of the previous example.
Let $V$ be a finite dimensional vector space and let $V = R \oplus N$ be a direct sum decomposition. It is easy to see that there is a unique linear map $p_R : V \to V$ satisfying
$$p_R(v) = v, \text{ for } v \in R \quad \text{and} \quad p_R(v) = 0, \text{ for } v \in N.$$
We say that $p_R$ is the projection onto $R$ along $N$. Note that $p_R^2 = p_R$.
Lemma 2.2.24. Let $P : V \to V$ satisfy $P^2 = P$. Then $P$ is the projection onto $\operatorname{Im}(P)$ along $\mathcal{N}(P)$.
Proof. Let $v \in V$. Write $v = P(v) + (v - P(v))$. Now $P(v - P(v)) = P(v) - P^2(v) = 0$. It follows that $V = \operatorname{Im}(P) + \mathcal{N}(P)$.
Now let $P(u) \in \operatorname{Im}(P) \cap \mathcal{N}(P)$. Then $P(u) = P^2(u) = P(P(u)) = 0$. It follows that $V = \operatorname{Im}(P) \oplus \mathcal{N}(P)$.
Let $P(u) \in \operatorname{Im}(P)$. Then $P(P(u)) = P^2(u) = P(u)$. Clearly $P(v) = 0$ for $v \in \mathcal{N}(P)$. That
Chapter 3
Inner product spaces
3.1 Length, Projection, and Angle
The concept of a (real) vector space abstracts the operations of adding directed line segments and
multiplying a directed line segment by a real number. In plane geometry we also speak of other
geometric concepts such as length, angle, perpendicularity, projection of a point on a line etc.
Remarkably, we need to put only a single additional algebraic structure, that of an inner product,
on a vector space in order to have these geometric concepts available in the abstract setting.
We shall use the following notation. Recall that $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$. Given $a \in \mathbb{F}$, $\bar{a}$ will denote $a$ if $\mathbb{F} = \mathbb{R}$ and the complex conjugate of $a$ if $\mathbb{F} = \mathbb{C}$. Given a matrix $A$ over $\mathbb{F}$ we denote by $A^*$ the conjugate transpose of $A$, i.e., if $A = (a_{ij})$ then $A^* = (\bar{a}_{ji})$. We call $A^*$ the adjoint of $A$.
Definition 3.1.1. Let $V$ be a vector space over $\mathbb{F}$. An inner product on $V$ is a rule which to any ordered pair of elements $(u, v)$ of $V$ associates a scalar, denoted by $\langle u, v \rangle$, satisfying the following axioms: for all $u, v, w$ in $V$ and $c$ any scalar we have
(1) $\langle u, v \rangle = \overline{\langle v, u \rangle}$ (Hermitian property or conjugate symmetry),
(2) $\langle u, v + w \rangle = \langle u, v \rangle + \langle u, w \rangle$ (additivity),
(3) $\langle cu, v \rangle = \bar{c} \langle u, v \rangle$ (homogeneity),
(4) $\langle v, v \rangle \ge 0$ with $\langle v, v \rangle = 0$ iff $v = 0$ (positive definite).
A vector space with an inner product is called an inner product space.
Example 3.1.2. (1) Let $v = (x_1, x_2, \ldots, x_n)^t$ and $w = (y_1, y_2, \ldots, y_n)^t \in \mathbb{R}^n$. Define $\langle v, w \rangle = \sum_{i=1}^{n} x_i y_i = v^t w$. This is an inner product on the real vector space $\mathbb{R}^n$. This is often called the standard inner product.
(2) Let $v = (x_1, x_2, \ldots, x_n)^t$ and $w = (y_1, y_2, \ldots, y_n)^t \in \mathbb{C}^n$. Define $\langle v, w \rangle = \sum_{i=1}^{n} \bar{x}_i y_i = v^* w$. This is an inner product on the complex vector space $\mathbb{C}^n$. This is often called the standard inner product.
(3) Let $V$ = the vector space of all real valued continuous functions on the unit interval $[0, 1]$. For $f, g \in V$, put
$$\langle f, g \rangle = \int_0^1 f(t) g(t)\, dt.$$
Simple properties of the integral show that $\langle f, g \rangle$ is an inner product.
(4) Let $B$ be a nonsingular $n \times n$ complex matrix. Set $A = B^* B$. Given $x, y \in \mathbb{C}^n$ define $\langle x, y \rangle = x^* A y$. Denote the standard inner product on $\mathbb{C}^n$ by the dot product (i.e., the inner product of $u$ and $v$ is $u \cdot v$). We have
$$\langle x, y \rangle = x^* A y = x^* B^* B y = (Bx)^* B y = Bx \cdot By.$$
Check that $\langle \cdot, \cdot \rangle$ is an inner product.
Definition 3.1.3. Given an inner product space $V$ and an element $v \in V$ we define its length or norm by
$$\|v\| = \sqrt{\langle v, v \rangle}.$$
We say $v$ is a unit vector if $\|v\| = 1$. Elements $v, w$ of $V$ are said to be orthogonal or perpendicular if $\langle v, w \rangle = 0$. We write this as $v \perp w$.
Note that if $c \in \mathbb{F}$ and $v \in V$ then
$$\|cv\| = \sqrt{\langle cv, cv \rangle} = \sqrt{\bar{c} c \langle v, v \rangle} = |c| \|v\|.$$
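Theorem 3.1.4. (Pythagoras) If $v \perp w$ then $\|v + w\|^2 = \|v\|^2 + \|w\|^2$.
Proof. We have
$$\|v + w\|^2 = \langle v + w, v + w \rangle = \langle v, v \rangle + \langle w, w \rangle + \langle v, w \rangle + \langle w, v \rangle = \|v\|^2 + \|w\|^2.$$
Exercise 3.1.5. Prove the Parallelogram law: for $v, w \in V$ we have
$$\|v + w\|^2 + \|v - w\|^2 = 2\|v\|^2 + 2\|w\|^2.$$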
Definition 3.1.6. Let $v, w \in V$ with $w \ne 0$. We define
$$p_w(v) = \frac{\langle w, v \rangle}{\langle w, w \rangle}\, w$$
to be the projection of $v$ on $w$.
Note that the map $p_w : V \to V$ given by $v \mapsto p_w(v)$ is linear. This is the reason for using $\langle w, v \rangle$ instead of $\langle v, w \rangle$ in the definition of $p_w(v)$.
The next lemma, whose geometric content is clear, explains the use of the term projection.
Lemma 3.1.7. Let $v, w \in V$ with $w \ne 0$. Then
(i) $p_w(v) = p_{w/\|w\|}(v)$, i.e., the projection of $v$ on $w$ is the same as the projection of $v$ on the unit vector in the direction of $w$.
(ii) $p_w(v)$ and $v - p_w(v)$ are orthogonal.
(iii) $\|p_w(v)\| \le \|v\|$ with equality iff $v, w$ are linearly dependent.
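Proof. (i) We have
$$p_w(v) = \frac{\langle w, v \rangle}{\langle w, w \rangle}\, w = \frac{\langle w, v \rangle}{\|w\|^2}\, w = \left\langle \frac{w}{\|w\|}, v \right\rangle \frac{w}{\|w\|} = p_{w/\|w\|}(v),$$
where in the last step we have used the fact that $\left\langle \frac{w}{\|w\|}, \frac{w}{\|w\|} \right\rangle = 1$.
(ii) In view of part (i) we may assume that $w$ is a unit vector. We have
$$\langle p_w(v), v - p_w(v) \rangle = \langle p_w(v), v \rangle - \langle p_w(v), p_w(v) \rangle$$
$$= \langle \langle w, v \rangle w, v \rangle - \langle \langle w, v \rangle w, \langle w, v \rangle w \rangle$$
$$= \overline{\langle w, v \rangle} \langle w, v \rangle - \overline{\langle w, v \rangle} \langle w, v \rangle \langle w, w \rangle$$
$$= 0.$$
(iii) We have (in the third step below we have used part (ii) and Pythagoras)
$$\|v\|^2 = \langle v, v \rangle = \langle p_w(v) + v - p_w(v),\; p_w(v) + v - p_w(v) \rangle = \|p_w(v)\|^2 + \|v - p_w(v)\|^2 \ge \|p_w(v)\|^2.$$
Clearly, there is equality in the last step iff $v = p_w(v)$, i.e., iff $v, w$ are dependent.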
Theorem 3.1.8. (Cauchy-Schwarz inequality) For $v, w \in V$
$$|\langle w, v \rangle| \le \|v\| \|w\|,$$
with equality iff $v, w$ are linearly dependent.
Proof. The result is clear if $w = 0$. So we may assume that $w \ne 0$.
Case (i): $w$ is a unit vector. In this case the lhs of the C-S inequality is $\|p_w(v)\|$ and the result follows from part (iii) of the previous lemma.
Case (ii): $w$ is not a unit vector. Set $u = \frac{w}{\|w\|}$. We have $|\langle w, v \rangle| = \|w\| (|\langle u, v \rangle|)$ and $\|v\| \|w\| = \|w\| (\|v\| \|u\|)$. The result follows from Case (i) above.
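Theorem 3.1.9. (Triangle Inequality) For $v, w \in V$
$$\|v + w\| \le \|v\| + \|w\|.$$
Proof. We have, using C-S inequality,
$$\|v + w\|^2 = \langle v + w, v + w \rangle$$
$$= \langle v, v \rangle + \langle v, w \rangle + \langle w, v \rangle + \langle w, w \rangle$$
$$= \langle v, v \rangle + \langle v, w \rangle + \overline{\langle v, w \rangle} + \langle w, w \rangle$$
$$\le \|v\|^2 + \|w\|^2 + 2\|v\| \|w\|$$
$$= (\|v\| + \|w\|)^2.$$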
Definition 3.1.10. Let $V$ be a real inner product space. Given $v, w \in V$ with $v, w \ne 0$, by C-S inequality
$$-1 \le \frac{\langle v, w \rangle}{\|v\| \|w\|} \le 1.$$
So, there is a unique $0 \le \theta \le \pi$ satisfying $\cos(\theta) = \frac{\langle v, w \rangle}{\|v\| \|w\|}$. This $\theta$ is the angle between $v$ and $w$.
The distance between $u$ and $v$ in $V$ is defined as $d(u, v) = \|u - v\|$.
Lemma 3.1.11. Let $u, v, w \in V$. Then
(i) $d(u, v) \ge 0$ with equality iff $u = v$.
(ii) $d(u, v) = d(v, u)$.
(iii) $d(u, v) \le d(u, w) + d(w, v)$.
Proof. Exercise.
Definition 3.1.12. Let $V$ be an $n$-dimensional inner product space. A basis $v_1, v_2, \ldots, v_n$ of $V$ is called orthogonal if its elements are mutually perpendicular, i.e., if $\langle v_i, v_j \rangle = 0$ for $i \ne j$. If, in addition, $\|v_i\| = 1$, for all $i$, we say that the basis is orthonormal.
Example 3.1.13. In $\mathbb{F}^n$, with the standard inner product, the basis $e_1, \ldots, e_n$ is orthonormal.
Lemma 3.1.14. Let $U = \{u_1, u_2, \ldots, u_n\}$ be a set of nonzero vectors in an inner product space $V$. If $\langle u_i, u_j \rangle = 0$ for $i \ne j$, $1 \le i, j \le n$, then $U$ is linearly independent.
Proof. Suppose $c_1, c_2, \ldots, c_n$ are scalars with
$$c_1 u_1 + c_2 u_2 + \cdots + c_n u_n = 0.$$
Take inner product with $u_i$ on both sides to get
$$c_i \langle u_i, u_i \rangle = 0.$$
Since $u_i \ne 0$, we get $c_i = 0$. Thus $U$ is linearly independent.
Example 3.1.15. (1) Consider $\mathbb{R}^2$ with standard inner product. Then $v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ are orthogonal. Dividing $v_1$ and $v_2$ by their lengths we get an orthonormal basis.
(2) Let $V$ denote the real inner product space of all continuous real functions on $[0, 2\pi]$ with inner product given by
$$\langle f, g \rangle = \int_0^{2\pi} f(x) g(x)\, dx.$$
Define $g_n(x) = \cos(nx)$, for $n \ge 0$. Then
$$\|g_n\|^2 = \int_0^{2\pi} \cos^2 nx\, dx = \begin{cases} 2\pi, & n = 0, \\ \pi, & n \ge 1, \end{cases}$$
and
$$\langle g_m, g_n \rangle = \int_0^{2\pi} \cos(mx) \cos(nx)\, dx = 0, \quad m \ne n.$$
So, $g_0, \ldots, g_n$ are orthogonal.
Theorem 3.1.16. Let $V$ be a finite dimensional inner product space. Let $W \subseteq V$ be a subspace and let $w_1, \ldots, w_m$ be an orthogonal basis of $W$. If $W \ne V$, then there exist elements $w_{m+1}, \ldots, w_n$ of $V$ such that $w_1, \ldots, w_n$ is an orthogonal basis of $V$.
Taking $W = \{0\}$, the zero subspace, we see that $V$ has an orthogonal, and hence orthonormal, basis.
Proof. The method of proof is as important as the theorem and is called the Gram-Schmidt orthogonalization process.
Since $W \ne V$, we can find a vector $v_{m+1}$ such that $w_1, \ldots, w_m, v_{m+1}$ is linearly independent. The idea is to take $v_{m+1}$ and subtract from it its projections along $w_1, \ldots, w_m$. Define
$$w_{m+1} = v_{m+1} - p_{w_1}(v_{m+1}) - p_{w_2}(v_{m+1}) - \cdots - p_{w_m}(v_{m+1}).$$
(Recall that $p_w(v) = \frac{\langle w, v \rangle}{\langle w, w \rangle}\, w$.)
Clearly, $w_{m+1} \ne 0$ as otherwise $w_1, \ldots, w_m, v_{m+1}$ would be linearly dependent. We now check that $w_1, \ldots, w_{m+1}$ is orthogonal. For this, it is enough to check that $w_{m+1}$ is orthogonal to each of $w_i$, $1 \le i \le m$.
For $i = 1, 2, \ldots, m$ we have
$$\langle w_i, w_{m+1} \rangle = \Big\langle w_i,\; v_{m+1} - \sum_{j=1}^{m} p_{w_j}(v_{m+1}) \Big\rangle$$
$$= \langle w_i, v_{m+1} \rangle - \Big\langle w_i, \sum_{j=1}^{m} p_{w_j}(v_{m+1}) \Big\rangle$$
$$= \langle w_i, v_{m+1} \rangle - \langle w_i, p_{w_i}(v_{m+1}) \rangle \quad (\text{since } \langle w_i, w_j \rangle = 0 \text{ for } i \ne j)$$
$$= \langle w_i, v_{m+1} - p_{w_i}(v_{m+1}) \rangle$$
$$= 0 \quad (\text{by part (ii) of Lemma 3.1.7}).$$
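Example 3.1.17. Find an orthonormal basis for the subspace of $\mathbb{R}^4$ (under standard inner product) spanned by
$$\begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ -2 \\ 0 \\ 0 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} 1 \\ 0 \\ -1 \\ 2 \end{bmatrix}.$$
Denote these vectors by $a, b, c$ respectively. Set
$$b' = b - \frac{b \cdot a}{a \cdot a}\, a = \frac{1}{3} \begin{bmatrix} 4 \\ -5 \\ 0 \\ 1 \end{bmatrix}.$$
Now subtract from $c$ its projections along $a$ and $b'$:
$$c' = c - \frac{c \cdot a}{a \cdot a}\, a - \frac{c \cdot b'}{b' \cdot b'}\, b' = \frac{1}{7} \begin{bmatrix} -4 \\ -2 \\ -7 \\ 6 \end{bmatrix}.$$
Now $a, b', c'$ are orthogonal and generate the same subspace as $a, b, c$. Dividing by the lengths we get the orthonormal basis
$$\frac{a}{\|a\|}, \quad \frac{b'}{\|b'\|}, \quad \frac{c'}{\|c'\|}.$$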
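The Gram-Schmidt steps of Example 3.1.17 can be reproduced mechanically. The sketch below is an illustration in plain Python with exact fractions; the function names `dot` and `gram_schmidt` are ours, and the code assumes the input vectors are linearly independent (otherwise a division by zero occurs).

```python
from fractions import Fraction

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors):
    """Return an orthogonal list spanning the same subspace.

    Each new vector has its projections along the previously
    constructed orthogonal vectors subtracted from it.
    Assumes the input vectors are linearly independent.
    """
    basis = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for b in basis:
            coeff = dot(b, v) / dot(b, b)  # <b, v> / <b, b>
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        basis.append(w)
    return basis

a = [1, 1, 0, 1]
b = [1, -2, 0, 0]
c = [1, 0, -1, 2]
a1, b1, c1 = gram_schmidt([a, b, c])
print(b1)  # (1/3) * (4, -5, 0, 1)
print(c1)  # (1/7) * (-4, -2, -7, 6)
```

Dividing each output vector by its length (a square root, hence no longer rational in general) gives the orthonormal basis.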
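Example 3.1.18. Let $V = P_3[-1, 1]$ denote the real vector space of polynomials of degree at most 3 defined on $[-1, 1]$. $V$ is an inner product space under the inner product
$$\langle f, g \rangle = \int_{-1}^{1} f(t) g(t)\, dt.$$
To find an orthonormal basis, we begin with the basis $1, x, x^2, x^3$. Set $v_1 = 1$. Then
$$v_2 = x - \langle x, 1 \rangle \frac{1}{\|1\|^2} = x - \frac{1}{2} \int_{-1}^{1} t\, dt = x,$$
$$v_3 = x^2 - \langle x^2, 1 \rangle \frac{1}{2} - \langle x^2, x \rangle \frac{x}{(2/3)} = x^2 - \frac{1}{2} \int_{-1}^{1} t^2\, dt - \frac{3}{2} x \int_{-1}^{1} t^3\, dt = x^2 - \frac{1}{3},$$
$$v_4 = x^3 - \langle x^3, 1 \rangle \frac{1}{2} - \langle x^3, x \rangle \frac{x}{(2/3)} - \left\langle x^3, x^2 - \tfrac{1}{3} \right\rangle \frac{x^2 - \frac{1}{3}}{(8/45)} = x^3 - \frac{3}{5} x.$$
Thus $1, x, x^2 - \frac{1}{3}, x^3 - \frac{3}{5} x$ is an orthogonal basis. We divide these by their respective norms to get an orthonormal basis:
$$\left\{ \frac{1}{\sqrt{2}},\; x \sqrt{\frac{3}{2}},\; \left(x^2 - \frac{1}{3}\right) \frac{3\sqrt{5}}{2\sqrt{2}},\; \left(x^3 - \frac{3}{5} x\right) \frac{5\sqrt{7}}{2\sqrt{2}} \right\}.$$
You will meet these polynomials later when you will learn about differential equations.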
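The orthogonal polynomials of Example 3.1.18 can be recomputed exactly. In the sketch below a polynomial is represented by its list of coefficients (constant term first); this representation and the helper names are our own choices. The inner product uses the fact that $\int_{-1}^{1} t^k\, dt$ equals $2/(k+1)$ for even $k$ and $0$ for odd $k$.

```python
from fractions import Fraction

def poly_mul(f, g):
    """Multiply two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def inner(f, g):
    """Integral of f(t) g(t) over [-1, 1], computed term by term."""
    return sum(c * Fraction(2, k + 1)
               for k, c in enumerate(poly_mul(f, g)) if k % 2 == 0)

def gram_schmidt_polys(polys):
    """Gram-Schmidt on coefficient lists of equal length."""
    basis = []
    for p in polys:
        w = [Fraction(c) for c in p]
        for b in basis:
            coeff = inner(b, p) / inner(b, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        basis.append(w)
    return basis

# 1, x, x^2, x^3 as coefficient lists padded to length 4.
monomials = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
basis = gram_schmidt_polys(monomials)
norms_squared = [inner(p, p) for p in basis]
print(basis[2])       # coefficients of x^2 - 1/3
print(basis[3])       # coefficients of x^3 - (3/5) x
print(norms_squared)  # 2, 2/3, 8/45, 8/175
```

The squared norms explain the normalizing factors in the example: for instance $\sqrt{8/45} = 2\sqrt{2}/(3\sqrt{5})$.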
3.2 Projections and Least Squares Approximations
Let $V$ be a finite dimensional inner product space. We have seen how to project a vector onto a nonzero vector. We now discuss the (orthogonal) projection of a vector onto a subspace.
Let $W$ be a subspace of $V$. Define the orthogonal complement $W^\perp$ of $W$:
$$W^\perp = \{u \in V \mid u \perp w \text{ for all } w \in W\}.$$
Check that $W^\perp$ is a subspace of $V$.
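Theorem 3.2.1. Every $v \in V$ can be written uniquely as
$$v = x + y,$$
where $x \in W$ and $y \in W^\perp$.
Proof. (Existence) Let $v_1, v_2, \ldots, v_k$ be an orthonormal basis of $W$. Set
$$x = \langle v_1, v \rangle v_1 + \langle v_2, v \rangle v_2 + \cdots + \langle v_k, v \rangle v_k$$
and put $y = v - x$. Clearly $v = x + y$ and $x \in W$. We now check that $y \in W^\perp$. For $i = 1, 2, \ldots, k$ we have
$$\langle y, v_i \rangle = \langle v - x, v_i \rangle$$
$$= \langle v, v_i \rangle - \langle x, v_i \rangle$$
$$= \langle v, v_i \rangle - \Big\langle \sum_{j=1}^{k} \langle v_j, v \rangle v_j,\; v_i \Big\rangle$$
$$= \langle v, v_i \rangle - \sum_{j=1}^{k} \overline{\langle v_j, v \rangle} \langle v_j, v_i \rangle$$
$$= \langle v, v_i \rangle - \langle v, v_i \rangle \quad (\text{by orthonormality})$$
$$= 0.$$
It follows that (why?) $y \in W^\perp$.
(Uniqueness) Let $v = x + y = x' + y'$, where $x, x' \in W$ and $y, y' \in W^\perp$. Then $x - x' = y' - y \in W \cap W^\perp$. But $W \cap W^\perp = \{0\}$ (why?). Hence $x = x'$ and $y = y'$.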
Lemma 3.2.2. We have $\dim W + \dim W^\perp = \dim V$.
Proof. Exercise.
Exercise 3.2.3. Consider $\mathbb{R}^n$ with standard inner product. Given a nonzero vector $v \in \mathbb{R}^n$, by $H_v$ we mean the hyperplane (i.e., a subspace of dimension $n - 1$) orthogonal to $v$:
$$H_v = \{u \in \mathbb{R}^n : u \cdot v = 0\}.$$
By a reflection we mean a linear operator $T_v : \mathbb{R}^n \to \mathbb{R}^n$ which, for some nonzero $v$, sends $v$ to $-v$ and fixes every vector in $H_v$, i.e., $T_v(v) = -v$ and $T_v(u) = u$, for $u \in H_v$.
Show that, for all $w \in \mathbb{R}^n$,
$$T_v(w) = w - \frac{2(w \cdot v)}{v \cdot v}\, v.$$
Definition 3.2.4. For a subspace $W$, we define a function $p_W : V \to W$ as follows: given $v \in V$, express $v$ (uniquely) as $v = x + y$, where $x \in W$ and $y \in W^\perp$. Define $p_W(v) = x$. We call $p_W(v)$ the orthogonal projection of $v$ onto $W$. Note that $v - p_W(v) \in W^\perp$. Note also that the map $p_W$ is linear.
The diligent reader should observe that, in the language of the previous chapter, $p_W$ is the projection onto $W$ along $W^\perp$.
Example 3.2.5. Consider $V = \mathbb{F}^n$ with the standard inner product. Let $P$ be an $n \times n$ matrix over $\mathbb{F}$ with associated linear map $T_P : \mathbb{F}^n \to \mathbb{F}^n$. Assume that $P^2 = P$ and $P = P^*$. Then we claim that $T_P = p_{\operatorname{Im}(T_P)} = p_{\mathcal{C}(P)}$. To see this proceed as follows.
We have already seen that $P^2 = P$ implies that $T_P$ is the projection onto $\mathcal{C}(P)$ along $\mathcal{N}(P)$. It is enough to show that $\mathcal{C}(P)$ and $\mathcal{N}(P)$ are orthogonal. Let $v \in \mathcal{C}(P)$ and $u \in \mathcal{N}(P)$. Then
$$\langle v, u \rangle = \langle T_P(v), u \rangle = \langle Pv, u \rangle = (Pv)^* u = v^* P^* u = v^* P u = 0,$$
completing the proof.
This example is our first hint of the connection between adjointness and orthogonality. We shall come back to this theme when we discuss the spectral theorem.
Definition 3.2.6. Let $W$ be a subspace of $V$ and let $v \in V$. A best approximation to $v$ by vectors in $W$ is a vector $w$ in $W$ such that
$$\|v - w\| \le \|v - u\|, \text{ for all } u \in W.$$
The next result shows that orthogonal projection gives the unique best approximation.
Theorem 3.2.7. Let $v \in V$ and let $W$ be a subspace of $V$. Let $w \in W$. Then the following are equivalent:
(i) $w$ is a best approximation to $v$ by vectors in $W$.
(ii) $w = p_W(v)$.
(iii) $v - w \in W^\perp$.
Proof. We have
$$\|v - w\|^2 = \|v - p_W(v) + p_W(v) - w\|^2 = \|v - p_W(v)\|^2 + \|p_W(v) - w\|^2,$$
where the second equality follows from Pythagoras' theorem on noting that $p_W(v) - w \in W$ and $v - p_W(v) \in W^\perp$. It follows that (i) and (ii) are equivalent. To see the equivalence of (ii) and (iii) write $v = w + (v - w)$ and apply Theorem 3.2.1.
Consider $\mathbb{R}^n$ with the standard inner product (we think of $\mathbb{R}^n$ as column vectors). Let $A$ be an $n \times m$ ($m \le n$) matrix and let $b \in \mathbb{R}^n$. We want to project $b$ onto the column space of $A$. Here is a method for doing this:
(i) Use Gauss elimination to find a basis of $\mathcal{C}(A)$.
(ii) Now use the Gram-Schmidt process to find an orthogonal basis $B$ of $\mathcal{C}(A)$.
(iii) We have
$$p_{\mathcal{C}(A)}(b) = \sum_{w \in B} p_w(b).$$
We now discuss another method for doing this. The projection of $b$ onto the column space of $A$ will be a vector of the form $p = Ax$ for some $x \in \mathbb{R}^m$. From Theorem 3.2.7, $p$ is the projection iff $b - Ax$ is orthogonal to every column of $A$. In other words, $x$ should satisfy the equations
$$A^t (b - Ax) = 0, \quad \text{or} \quad A^t A x = A^t b.$$
The above equations are called normal equations in the Gauss-Markov theory in statistics. Thus, if $x$ is any solution of the normal equations, then $Ax$ is the required projection of $b$.
Lemma 3.2.8.
$$\operatorname{rank}(A) = \operatorname{rank}(A^t A).$$
Proof. We have $\operatorname{rank}(A) \ge \operatorname{rank}(A^t A)$ (why?). Let $A^t A z = 0$, for $z \in \mathbb{R}^m$. Then $A^t w = 0$, where $w = Az$, i.e., $w$ is in the column space of $A$ and is orthogonal to every column of $A$. This implies (why?) that $w = Az = 0$. Thus $\operatorname{nullity}(A) \ge \operatorname{nullity}(A^t A)$ and so, by the rank-nullity theorem, $\operatorname{rank}(A) \le \operatorname{rank}(A^t A)$. It follows that $\operatorname{rank}(A^t A) = \operatorname{rank}(A)$.
If the columns of $A$ are linearly independent, the (unique) solution to the normal equations is $(A^t A)^{-1} A^t b$ and the projection of $b$ onto the column space of $A$ is $A (A^t A)^{-1} A^t b$. Note that the normal equations always have a solution (why?), although the solution will not be unique in case the columns of $A$ are linearly dependent (since $\operatorname{rank}(A^t A) = \operatorname{rank}(A) < m$).
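Example 3.2.9. Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $b = \begin{bmatrix} 1 \\ 0 \\ -5 \end{bmatrix}$.
Then $A^t A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ and $A^t b = \begin{bmatrix} 1 \\ -4 \end{bmatrix}$. The unique solution to the normal equations is $x = \begin{bmatrix} 2 \\ -3 \end{bmatrix}$ and $b - Ax = \begin{bmatrix} 2 \\ -2 \\ -2 \end{bmatrix}$ (note that this vector is orthogonal to the columns of $A$). The projection of $b$ onto the column space of $A$ is $p = Ax = \begin{bmatrix} -1 \\ 2 \\ -3 \end{bmatrix}$.
Now let $B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 1/2 \\ 0 & 1 & 1/2 \end{bmatrix}$. We have $B^t B = \begin{bmatrix} 2 & 1 & 3/2 \\ 1 & 2 & 3/2 \\ 3/2 & 3/2 & 3/2 \end{bmatrix}$ and $B^t b = \begin{bmatrix} 1 \\ -4 \\ -3/2 \end{bmatrix}$.
Note that $A$ and $B$ have the same column spaces (the third column of $B$ is the average of the first two columns). So the projection of $b$ onto the column space of $B$ will be the same as before. However the normal equations do not have a unique solution in this case. Check that
$$x = \begin{bmatrix} 2 \\ -3 \\ 0 \end{bmatrix} \quad \text{and} \quad x = \begin{bmatrix} 3 \\ -2 \\ -2 \end{bmatrix}$$
are both solutions of the normal equations $B^t B x = B^t b$.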
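The numbers in Example 3.2.9 can be checked with a few lines of exact arithmetic. This is only a sketch: the helpers `transpose`, `mat_mul`, `mat_vec` are ours, and the $2 \times 2$ system is solved by Cramer's rule, which is a convenience for this small case and not the method prescribed by the notes.

```python
from fractions import Fraction as Fr

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

A = [[1, 1], [1, 0], [0, 1]]
b = [1, 0, -5]

AtA = mat_mul(transpose(A), A)  # [[2, 1], [1, 2]]
Atb = mat_vec(transpose(A), b)  # [1, -4]

# Solve the 2x2 normal equations AtA x = Atb by Cramer's rule.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x = [Fr(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1], det),
     Fr(AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0], det)]

p = mat_vec(A, x)                            # projection of b onto C(A)
residual = [bi - pi for bi, pi in zip(b, p)]  # b - Ax, orthogonal to C(A)

# Second part: the normal equations for B have several solutions.
B = [[1, 1, Fr(1)], [1, 0, Fr(1, 2)], [0, 1, Fr(1, 2)]]
BtB = mat_mul(transpose(B), B)
Btb = mat_vec(transpose(B), b)
solutions = [[2, -3, 0], [3, -2, -2]]
print(all(mat_vec(BtB, s) == Btb for s in solutions))  # True
```

Both solutions give the same projection, since `mat_vec(B, s)` equals `p` for each of them.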
Suppose we have a large number of data points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, collected from some experiment. Frequently there is reason to believe that these points should lie on a straight line. So we want a linear function $y(x) = s + tx$ such that $y(x_i) = y_i$, $i = 1, \ldots, n$. Due to uncertainty in data and experimental error, in practice the points will deviate somewhat from a straight line and so it is impossible to find a linear $y(x)$ that passes through all of them. So we seek a line that fits the data well, in the sense that the errors are made as small as possible. A natural question that arises now is: how do we define the error?
Consider the following system of linear equations, in the variables $s$ and $t$, and known coefficients $x_i, y_i$, $i = 1, \ldots, n$:
$$\begin{aligned} s + x_1 t &= y_1 \\ s + x_2 t &= y_2 \\ &\;\;\vdots \\ s + x_n t &= y_n \end{aligned}$$
Note that typically $n$ would be much greater than 2. If we can find $s$ and $t$ to satisfy all these equations, then we have solved our problem. However, for reasons mentioned above, this is not always possible. For given values of $s$ and $t$ the error in the $i$th equation is $|y_i - s - x_i t|$. There are several ways of combining the errors in the individual equations to get a measure of the total error. The following are three examples:
$$\sqrt{\sum_{i=1}^{n} (y_i - s - x_i t)^2}, \qquad \sum_{i=1}^{n} |y_i - s - x_i t|, \qquad \max_{1 \le i \le n} |y_i - s - x_i t|.$$
Both analytically and computationally, a nice theory exists for the first of these choices and this is what we shall study. The problem of finding $s, t$ so as to minimize
$$\sqrt{\sum_{i=1}^{n} (y_i - s - x_i t)^2}$$
is called a least squares problem.
Let
$$A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \text{and} \quad x = \begin{bmatrix} s \\ t \end{bmatrix}, \quad \text{so that} \quad Ax = \begin{bmatrix} s + t x_1 \\ s + t x_2 \\ \vdots \\ s + t x_n \end{bmatrix}.$$
The least squares problem is finding an $x$ such that $\|b - Ax\|$ is minimized, i.e., find an $x$ such that $Ax$ is the best approximation to $b$ in the column space of $A$. This is precisely the problem of finding $x$ such that $b - Ax$ is orthogonal to the column space of $A$.
A straight line can be considered as a polynomial of degree 1. We can also try to fit an $m$th degree polynomial
$$y(x) = s_0 + s_1 x + s_2 x^2 + \cdots + s_m x^m$$
to the data points $(x_i, y_i)$, $i = 1, \ldots, n$, so as to minimize the error (in the least squares sense). In this case $s_0, s_1, \ldots, s_m$ are the variables and we have
$$A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix}, \quad b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad x = \begin{bmatrix} s_0 \\ s_1 \\ \vdots \\ s_m \end{bmatrix}.$$
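Example 3.2.10. Find $s, t$ such that the straight line $y = s + tx$ best fits the following data in the least squares sense:
$$y = 1 \text{ at } x = -1, \quad y = 1 \text{ at } x = 1, \quad y = 3 \text{ at } x = 2.$$
We want to project $b = \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix}$ onto the column space of $A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$. Now $A^t A = \begin{bmatrix} 3 & 2 \\ 2 & 6 \end{bmatrix}$ and $A^t b = \begin{bmatrix} 5 \\ 6 \end{bmatrix}$. The normal equations are
$$\begin{bmatrix} 3 & 2 \\ 2 & 6 \end{bmatrix} \begin{bmatrix} s \\ t \end{bmatrix} = \begin{bmatrix} 5 \\ 6 \end{bmatrix}.$$
The solution is $s = 9/7$, $t = 4/7$ and the best line is $y = \frac{9}{7} + \frac{4}{7} x$.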
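For a straight-line fit the normal equations are always a $2 \times 2$ system with entries $n$, $\sum x_i$, $\sum x_i^2$ on the left and $\sum y_i$, $\sum x_i y_i$ on the right, so they can be solved in closed form. The sketch below does this for the data of Example 3.2.10; the function name `fit_line` is ours, and the code assumes the $x_i$ are not all equal (otherwise the determinant is zero).

```python
from fractions import Fraction

def fit_line(xs, ys):
    """Least squares line y = s + t x through the points (xs[i], ys[i]).

    Solves the normal equations
        [ n    sx  ] [s]   [ sy  ]
        [ sx   sxx ] [t] = [ sxy ]
    by Cramer's rule. Assumes the xs are not all equal.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = Fraction(n * sxx - sx * sx)
    s = Fraction(sxx * sy - sx * sxy) / det
    t = Fraction(n * sxy - sx * sy) / det
    return s, t

s, t = fit_line([-1, 1, 2], [1, 1, 3])
print(s, t)  # 9/7 4/7
```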
3.3 Determinant and Volume
The proper definition of volume needs multivariable calculus. We do not discuss this here but instead give an elementary geometric definition of the $k$-volume of a parallelopiped with $k$ sides in $\mathbb{R}^n$ and show that this is given by a determinant.
Definition 3.3.1. Consider $\mathbb{R}^n$ with standard inner product. For $1 \le k \le n$, let $v_1, \ldots, v_k$ be $k$ vectors in $\mathbb{R}^n$. For $i = 1, \ldots, k$ define $W_i$ to be the subspace spanned by $v_1, \ldots, v_i$.
Inductively define the volume $V(v_1, \ldots, v_i)$ of the $i$-parallelopiped with sides $v_1, \ldots, v_i$ as follows:
$$V(v_1) = \|v_1\|, \quad \text{if } i = 1,$$
$$V(v_1, \ldots, v_i) = \|v_i - p_{W_{i-1}}(v_i)\|\, V(v_1, \ldots, v_{i-1}), \quad \text{if } 1 < i \le k.$$
While this definition is geometrically meaningful, it is not clear that the volume is independent of the order in which the $v_1, \ldots, v_k$ are listed. This will follow from the next result.
Exercise 3.3.2. Show that if $v_1, \ldots, v_k$ are linearly dependent then $V(v_1, \ldots, v_k) = 0$.
Exercise 3.3.3. Show that $V(v_1, \ldots, v_k) \le \|v_1\| \|v_2\| \cdots \|v_k\|$, with equality iff the $v_i$'s are orthogonal.
For $i = 1, 2, \ldots, k$, let $A_i$ be the $n \times i$ matrix with $v_1, \ldots, v_i$ as the columns. Note that the entry in row $r$ and column $s$ of $A_i^t A_i$ is $v_r \cdot v_s$.
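Theorem 3.3.4. We have
$$V(v_1, \ldots, v_k)^2 = \det(A_k^t A_k), \quad k = 1, \ldots, n.$$
Proof. By induction on $k$, the result being clear for $k = 1$.
Set $u_k = v_k - p_{W_{k-1}}(v_k)$ and write
$$p_{W_{k-1}}(v_k) = a_1 v_1 + \cdots + a_{k-1} v_{k-1},$$
where $a_i \in \mathbb{R}$ for all $i$. Then
$$v_k = u_k + a_1 v_1 + \cdots + a_{k-1} v_{k-1}.$$
Set $a = (a_1, \ldots, a_{k-1})^t$ and let $B_k$ be the $n \times k$ matrix with $v_1, \ldots, v_{k-1}, u_k$ as the columns. Then (why?)
$$A_k = B_k \begin{bmatrix} I & a \\ 0 & 1 \end{bmatrix},$$
where the second matrix on the rhs is $k \times k$ with $I$ the $(k - 1) \times (k - 1)$ identity matrix. Thus
$$A_k^t A_k = \begin{bmatrix} I & a \\ 0 & 1 \end{bmatrix}^t B_k^t B_k \begin{bmatrix} I & a \\ 0 & 1 \end{bmatrix}.$$
Since $u_k$ is orthogonal to $v_1, v_2, \ldots, v_{k-1}$ we have
$$B_k^t B_k = \begin{bmatrix} A_{k-1}^t A_{k-1} & 0 \\ 0 & u_k \cdot u_k \end{bmatrix},$$
where both the $0$'s have $k - 1$ components.
By induction $\det(A_{k-1}^t A_{k-1}) = V(v_1, \ldots, v_{k-1})^2$ and the result follows by taking determinants in the formula above for $A_k^t A_k$, noting that the triangular matrix $\begin{bmatrix} I & a \\ 0 & 1 \end{bmatrix}$ has determinant 1 and that $u_k \cdot u_k = \|v_k - p_{W_{k-1}}(v_k)\|^2$.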
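Theorem 3.3.4 can be tested on small examples by computing both sides independently. In the sketch below the squared volume is obtained from the inductive definition (the residuals $v_i - p_{W_{i-1}}(v_i)$ are produced by Gram-Schmidt), while the Gram determinant $\det(A_k^t A_k)$ is computed by cofactor expansion. All function names are ours, and the sample vectors are simply borrowed from Example 3.1.17.

```python
from fractions import Fraction

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def squared_volume(vectors):
    """V(v1, ..., vk)^2 from the inductive definition.

    The residual w of each vector after subtracting its projection onto
    the span of the earlier ones is computed by Gram-Schmidt; the squared
    volume is the product of the squared lengths of these residuals.
    """
    basis, vol2 = [], Fraction(1)
    for v in vectors:
        w = [Fraction(x) for x in v]
        for b in basis:
            c = dot(b, v) / dot(b, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        vol2 *= dot(w, w)
        if dot(w, w) != 0:  # a zero residual adds nothing to the span
            basis.append(w)
    return vol2

def gram_det(vectors):
    """det(A^t A), where A has the given vectors as columns."""
    return det([[dot(u, v) for v in vectors] for u in vectors])

vs = [[1, 1, 0, 1], [1, -2, 0, 0], [1, 0, -1, 2]]
print(squared_volume(vs), gram_det(vs))  # 30 30
```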
Exercise 3.3.5. Show that $V(v_1, \ldots, v_k)$ is invariant under permutation of the $v_i$'s.
Exercise 3.3.6. Let $A$ be a square $n \times n$ real matrix with columns $v_1, \ldots, v_n$. Prove Hadamard's inequality: $|\det(A)| \le \|v_1\| \|v_2\| \cdots \|v_n\|$, with equality iff the $v_i$'s are orthogonal.
Chapter 4
Eigenvalues and eigenvectors
4.1 Algebraic and Geometric multiplicities
Definition 4.1.1. Let $V$ be a vector space over $\mathbb{F}$ and let $T : V \to V$ be a linear operator. A scalar $\lambda \in \mathbb{F}$ is said to be an eigenvalue of $T$ if there is a nonzero vector $v \in V$ such that $T(v) = \lambda v$. We say that $v$ is an eigenvector of $T$ with eigenvalue $\lambda$.
Definition 4.1.2. Let $A$ be an $n \times n$ matrix over $\mathbb{F}$. An eigenvalue and eigenvector of $A$ are an eigenvalue and eigenvector of the linear map
$$T_A : \mathbb{F}^n \to \mathbb{F}^n$$
given by $T_A(x) = Ax$, $x \in \mathbb{F}^n$, i.e., $\lambda \in \mathbb{F}$ is an eigenvalue of $A$ if there exists a nonzero (column) vector $x \in \mathbb{F}^n$ with $Ax = \lambda x$.
Example 4.1.3. (i) Let $V$ be the real vector space of all smooth real valued functions on $\mathbb{R}$. Let $D : V \to V$ be the derivative map. The function $f(x) = e^{\lambda x}$ is an eigenvector with eigenvalue $\lambda$ since $D(e^{\lambda x}) = \lambda e^{\lambda x}$.
(ii) Let $A$ be a diagonal matrix with scalars $\lambda_1, \ldots, \lambda_n$ on the diagonal. We write this as $A = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Then $Ae_i = \lambda_i e_i$ and so $e_1, \ldots, e_n$ are eigenvectors of $A$ with (corresponding) eigenvalues $\lambda_1, \ldots, \lambda_n$.
Let $T : V \to V$ be linear and let $\lambda \in \mathbb{F}$. It can be checked that
$$V_\lambda = \{v \in V : T(v) = \lambda v\}$$
is a subspace of $V$. If $V_\lambda \ne \{0\}$, then $\lambda$ is an eigenvalue of $T$ and any nonzero vector in $V_\lambda$ is an eigenvector with eigenvalue $\lambda$. In this case we say that $V_\lambda$ is the eigenspace of the eigenvalue $\lambda$.
Theorem 4.1.4. Let $T : V \to V$ be a linear operator. Let $\lambda_1, \ldots, \lambda_n \in \mathbb{F}$ be distinct eigenvalues of $T$ and let $v_i \in V_{\lambda_i}$, $i = 1, \ldots, n$. Then
$$v_1 + v_2 + \cdots + v_n = 0 \text{ implies } v_i = 0, \text{ for all } i.$$
That is, eigenvectors corresponding to distinct eigenvalues are linearly independent.
Proof. By induction on $n$, the case $n = 1$ being clear. Let $n > 1$. Assume
$$v_1 + v_2 + \cdots + v_n = 0.$$
So
$$\lambda_1 v_1 + \lambda_1 v_2 + \cdots + \lambda_1 v_n = 0.$$
Applying $T$ to the equation $v_1 + v_2 + \cdots + v_n = 0$ gives
$$\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n = 0.$$
Subtracting gives
$$(\lambda_2 - \lambda_1) v_2 + \cdots + (\lambda_n - \lambda_1) v_n = 0.$$
Since $\lambda_1, \lambda_2, \ldots, \lambda_n$ are distinct we get (why?), by induction, that $v_2 = \cdots = v_n = 0$. And now we get (why?) $v_1 = 0$.
Example 4.1.5. Continuation of Example 4.1.3 (i). e
1
x
, . . . , e
n
x
(
1
, . . . ,
n
distinct) are
linearly independent functions.
Definition 4.1.6. Let $V$ be a fdvs over $F$ and let $T : V \to V$ be a linear operator. $T$ is said to be diagonalizable if there exists a basis (or, equivalently, an ordered basis) of $V$ consisting of eigenvectors of $T$. If $B = (v_1, \ldots, v_n)$ is such an ordered basis with $T(v_i) = \lambda_i v_i$, $\lambda_i \in F$, then
$$M_B^B(T) = \mathrm{diag}(\lambda_1, \ldots, \lambda_n).$$
Definition 4.1.7. An $n \times n$ matrix $A$ over $F$ is said to be diagonalizable iff the linear map $T_A : F^n \to F^n$, given by $T_A(x) = Ax$, $x \in F^n$, is diagonalizable.
Lemma 4.1.8. An $n \times n$ matrix $A$ over $F$ is diagonalizable iff $P^{-1}AP$ is a diagonal matrix, for some invertible matrix $P$ over $F$. In that case, the columns of $P$ are eigenvectors of $A$ and the $i$th diagonal entry of $P^{-1}AP$ is the eigenvalue associated with the $i$th column of $P$.
Proof. (only if) Let $v_1, \ldots, v_n$ be a basis of $F^n$ with $T_A(v_i) = Av_i = \lambda_i v_i$. Let $P$ be the $n \times n$ matrix with $i$th column $v_i$ and let $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Then check that
$$AP = PD.$$
Since $P$ is invertible (why?), we have $P^{-1}AP = D$.
(if) Suppose $P^{-1}AP = D$, where $D$ is diagonal. Then $AP = PD$. It now follows, as in the only if part, that the $i$th column of $P$ is an eigenvector with eigenvalue the $i$th diagonal entry of $D$. Since $P$ is invertible, its columns form a basis of $F^n$.
Definition 4.1.9. Let $A$ be an $n \times n$ matrix over $F$. We define the characteristic polynomial $P_A$ of $A$ to be
$$P_A(t) = \det(tI - A).$$
Note that $P_A(t)$ is a monic polynomial of degree $n$ (i.e., the coefficient of the term $t^n$ is 1).
Lemma 4.1.10. If $A = PBP^{-1}$ then $P_A(t) = P_B(t)$.
Proof. We have
$$\begin{aligned}
P_A(t) &= \det(tI - PBP^{-1}) \\
&= \det(P(tI - B)P^{-1}) \\
&= \det(P)\det(tI - B)\det(P^{-1}) \\
&= P_B(t).
\end{aligned}$$
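Lemma 4.1.11. (1) Eigenvalues of a square matrix $A$ are the roots of $P_A(t)$ lying in $F$.
(2) For a scalar $\lambda \in F$, $V_\lambda$ = nullspace of $A - \lambda I$.
Proof. (1) $\lambda \in F$ is an eigenvalue of $A$ iff $Av = \lambda v$ for some nonzero $v$ iff $(A - \lambda I)v = 0$ for some nonzero $v$ iff the nullity of $A - \lambda I$ is positive iff $\mathrm{rank}(A - \lambda I)$ is less than $n$ iff $\det(A - \lambda I) = 0$.
(2) $V_\lambda = \{ v \mid Av = \lambda v \} = \{ v \mid (A - \lambda I)v = 0 \} = \mathcal{N}(A - \lambda I)$.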
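Example 4.1.12. (1) Let $A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}$. To find the eigenvalues of $A$ we solve the equation
$$\det(\lambda I - A) = \det \begin{pmatrix} \lambda - 1 & -2 \\ 0 & \lambda - 3 \end{pmatrix} = (\lambda - 1)(\lambda - 3) = 0.$$
Hence the eigenvalues of $A$ are 1 and 3. Let us calculate the eigenspaces $V_1$ and $V_3$. By definition
$$V_1 = \{ v \mid (A - I)v = 0 \} \quad \text{and} \quad V_3 = \{ v \mid (A - 3I)v = 0 \}.$$
$A - I = \begin{pmatrix} 0 & 2 \\ 0 & 2 \end{pmatrix}$. Suppose
$$\begin{pmatrix} 0 & 2 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2y \\ 2y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
Then $y = 0$. Hence $V_1 = L((1, 0))$.
$A - 3I = \begin{pmatrix} 1 - 3 & 2 \\ 0 & 3 - 3 \end{pmatrix} = \begin{pmatrix} -2 & 2 \\ 0 & 0 \end{pmatrix}$. Suppose
$$\begin{pmatrix} -2 & 2 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
Then $\begin{pmatrix} -2x + 2y \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$. Hence $x = y$. Thus $V_3 = L((1, 1))$.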
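(2) Let $A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$, where $\theta \neq 0, \pi$. Now
$$\begin{aligned}
P_A(t) &= \det \begin{pmatrix} t - \cos\theta & \sin\theta \\ -\sin\theta & t - \cos\theta \end{pmatrix} \\
&= (t - \cos\theta)^2 + \sin^2\theta = (t - e^{i\theta})(t - e^{-i\theta}),
\end{aligned}$$
where $i = \sqrt{-1}$.
So, as a real matrix $A$ has no eigenvalues (and thus no eigenvectors). This is clear geometrically as $A$ represents counterclockwise rotation by $\theta$. But as a complex matrix $A$ has two distinct eigenvalues $e^{i\theta}$ and $e^{-i\theta}$. An eigenvector corresponding to $e^{i\theta}$ is $(1, -i)^t$ and an eigenvector corresponding to $e^{-i\theta}$ is $(-i, 1)^t$.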
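The complex eigenvectors of the rotation matrix can be checked numerically. The following is a minimal sketch, assuming a sample angle $\theta = 0.7$ (any angle other than $0$ and $\pi$ would do); the helper names `mat_vec` and `is_eigenpair` are illustrative and not part of the notes. It tests $Av = \lambda v$ for the pairs $(e^{i\theta}, (1, -i)^t)$ and $(e^{-i\theta}, (-i, 1)^t)$.

```python
import cmath
import math

def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a vector (list of entries)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def is_eigenpair(A, lam, v, tol=1e-12):
    """Check A v = lam v entrywise, up to a floating-point tolerance."""
    Av = mat_vec(A, v)
    return all(abs(Av[k] - lam * v[k]) < tol for k in range(len(v)))

theta = 0.7  # assumed sample angle, not 0 or pi
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

# Eigenvalue e^{i theta} with eigenvector (1, -i)^t
print(is_eigenpair(A, cmath.exp(1j * theta), [1, -1j]))
# Eigenvalue e^{-i theta} with eigenvector (-i, 1)^t
print(is_eigenpair(A, cmath.exp(-1j * theta), [-1j, 1]))
```

Both checks hold for every $\theta$, since $\cos\theta \pm i\sin\theta = e^{\pm i\theta}$.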
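Example 4.1.13. Find $A^8$, where $A = \begin{pmatrix} 4 & -3 \\ 2 & -1 \end{pmatrix}$.
Check that the eigenvalues of $A$ are 2, 1 and that the corresponding eigenvectors are $\begin{pmatrix} 3 \\ 2 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
Set $P = \begin{pmatrix} 3 & 1 \\ 2 & 1 \end{pmatrix}$ and $D = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$. Then $P^{-1} = \begin{pmatrix} 1 & -1 \\ -2 & 3 \end{pmatrix}$ and $A = PDP^{-1}$.
We have
$$\begin{aligned}
A^8 &= (PDP^{-1})^8 = (PDP^{-1}) \cdots (PDP^{-1}) = PD^8P^{-1} \\
&= \begin{pmatrix} 3 & 1 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 2^8 & 0 \\ 0 & 1^8 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -2 & 3 \end{pmatrix} \\
&= \begin{pmatrix} 3 & 1 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 256 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -2 & 3 \end{pmatrix} \\
&= \begin{pmatrix} 766 & -765 \\ 510 & -509 \end{pmatrix}.
\end{aligned}$$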
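The computation of $A^8$ can be cross-checked by multiplying $A$ by itself eight times and comparing with $PD^8P^{-1}$. This is a small sketch in exact integer arithmetic, taking $A = \begin{pmatrix} 4 & -3 \\ 2 & -1 \end{pmatrix}$, $P = \begin{pmatrix} 3 & 1 \\ 2 & 1 \end{pmatrix}$ and $P^{-1} = \begin{pmatrix} 1 & -1 \\ -2 & 3 \end{pmatrix}$; the function names `mat_mul` and `mat_pow` are illustrative.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_pow(X, k):
    """k-th power of X by repeated multiplication (k >= 1)."""
    result = X
    for _ in range(k - 1):
        result = mat_mul(result, X)
    return result

A = [[4, -3], [2, -1]]
P = [[3, 1], [2, 1]]
P_inv = [[1, -1], [-2, 3]]
D8 = [[2**8, 0], [0, 1**8]]  # D^8 for D = diag(2, 1)

direct = mat_pow(A, 8)                      # seven matrix products
via_diag = mat_mul(mat_mul(P, D8), P_inv)   # two matrix products

print(direct)    # [[766, -765], [510, -509]]
print(via_diag)  # [[766, -765], [510, -509]]
```

The diagonalized route needs only two matrix products regardless of the exponent, which is the point of the example.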
Let $T : V \to V$ be a linear transformation of a fdvs over $F$. We define the characteristic polynomial $P_T(t)$ of $T$ to be $P_A(t)$, where $A = M_B^B(T)$ with respect to an ordered basis $B$ of $V$. By Lemma 4.1.10 it is immaterial which ordered basis $B$ we take.
Definition 4.1.14. (i) Let $f(x)$ be a polynomial with coefficients in $F$. Let $\lambda \in F$ be a root of $f(x)$. Then $(x - \lambda)$ divides $f(x)$. The multiplicity of the root $\lambda$ is the largest positive integer $k$ such that $(x - \lambda)^k$ divides $f(x)$.
(ii) Let $V$ be a fdvs over $F$ and let $T : V \to V$ be a linear operator. Let $\lambda$ be an eigenvalue of $T$. The geometric multiplicity of $\lambda$ is the dimension of the eigenspace $V_\lambda$ and the algebraic multiplicity of $\lambda$ is the multiplicity of $\lambda$ as a root of $P_T(t)$.
Theorem 4.1.15. Let $V$ be a fdvs over $F$ and let $T : V \to V$ be a linear operator. Then the geometric multiplicity of an eigenvalue $\lambda \in F$ of $T$ is less than or equal to the algebraic multiplicity of $\lambda$.
Proof. Suppose that the algebraic multiplicity of $\lambda$ is $k$. Let $g$ be the geometric multiplicity of $\lambda$. Hence $V_\lambda$ has a basis of $g$ eigenvectors $v_1, v_2, \ldots, v_g$. We can extend this basis of $V_\lambda$ to an (ordered) basis of $V$, say $B = (v_1, v_2, \ldots, v_g, \ldots, v_n)$. Now
$$M_B^B(T) = \begin{pmatrix} \lambda I_g & D \\ 0 & C \end{pmatrix}$$
where $D$ is a $g \times (n - g)$ matrix and $C$ is an $(n - g) \times (n - g)$ matrix. From the form of $M_B^B(T)$, we see that $(t - \lambda)^g$ is a factor of $P_T(t) = \det(tI - M_B^B(T))$. Thus $g \leq k$.
Theorem 4.1.16. Let $T : V \to V$ be a linear operator, where $V$ is an $n$-dimensional vector space over $F$.
(i) $T$ is diagonalizable iff the sum of the dimensions of the eigenspaces of $T$ equals $n$.
(ii) Assume $F = \mathbb{C}$. Then $T$ is diagonalizable iff the algebraic and geometric multiplicities are equal for all eigenvalues of $T$.
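Proof. (i) (if) Let $\lambda_1, \ldots, \lambda_k$ be the distinct eigenvalues of $T$. Let $B_i$ be a basis of $V_{\lambda_i}$. By Theorem 4.1.4, $B_1 \cup B_2 \cup \cdots \cup B_k$ is linearly independent. By hypothesis it has $n$ elements, so it is a basis of $V$ consisting of eigenvectors of $T$.
(only if) Exercise.
(ii) Let $\lambda_1, \ldots, \lambda_k$ be the distinct eigenvalues of $T$. By the Fundamental Theorem of Algebra, $P_T(t) = \prod_{i=1}^{k} (t - \lambda_i)^{m_i}$, where $m_i$ is the algebraic multiplicity of $\lambda_i$. Since $\sum_i m_i = n$, the result now follows from part (i) together with Theorem 4.1.15.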
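Example 4.1.17. (1) Let $A = \begin{pmatrix} 3 & 0 & 0 \\ -2 & 4 & 2 \\ -2 & 1 & 5 \end{pmatrix}$. Then $\det(\lambda I - A) = (\lambda - 3)^2(\lambda - 6)$.
Hence the eigenvalues of $A$ are 3 and 6. The eigenvalue $\lambda = 3$ has algebraic multiplicity 2. Let us find the eigenspaces $V_3$ and $V_6$.
$\lambda = 3$: $A - 3I = \begin{pmatrix} 0 & 0 & 0 \\ -2 & 1 & 2 \\ -2 & 1 & 2 \end{pmatrix}$. Hence $\mathrm{rank}(A - 3I) = 1$.
Thus nullity$(A - 3I) = 2$. By solving the system $(A - 3I)v = 0$, we find that
$$\mathcal{N}(A - 3I) = V_3 = L((1, 0, 1)^t, (1, 2, 0)^t).$$
Hence the geometric multiplicity of $\lambda = 3$ is 2.
$\lambda = 6$: $A - 6I = \begin{pmatrix} -3 & 0 & 0 \\ -2 & -2 & 2 \\ -2 & 1 & -1 \end{pmatrix}$. Hence $\mathrm{rank}(A - 6I) = 2$.
Thus $\dim V_6 = 1$. It can be shown that $(0, 1, 1)^t$ is a basis of $V_6$. Thus the algebraic and geometric multiplicities of $\lambda = 6$ are one.
If we define
$$P = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 1 & 0 & 1 \end{pmatrix}$$
then we have $P^{-1}AP = \mathrm{diag}(3, 3, 6)$.
(2) Let $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. Then $\det(\lambda I - A) = (\lambda - 1)^2$. Thus $\lambda = 1$ has algebraic multiplicity 2.
$A - I = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$. Hence nullity$(A - I) = 1$ and $V_1 = L(e_1)$. In this case the geometric multiplicity of $\lambda = 1$ is 1, which is less than its algebraic multiplicity 2, so $A$ is not diagonalizable.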
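Part (1) of the example can be verified without inverting $P$: it suffices that $AP = PD$ and $\det P \neq 0$. The sketch below assumes the entries $A = \begin{pmatrix} 3 & 0 & 0 \\ -2 & 4 & 2 \\ -2 & 1 & 5 \end{pmatrix}$, with $P$ having columns $(1,0,1)^t$, $(1,2,0)^t$, $(0,1,1)^t$ and $D = \mathrm{diag}(3, 3, 6)$; the helper names `mat_mul` and `det3` are illustrative.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[3, 0, 0], [-2, 4, 2], [-2, 1, 5]]
P = [[1, 1, 0], [0, 2, 1], [1, 0, 1]]   # columns are the eigenvectors
D = [[3, 0, 0], [0, 3, 0], [0, 0, 6]]

AP = mat_mul(A, P)
PD = mat_mul(P, D)

print(AP == PD)   # True: each column of P is an eigenvector of A
print(det3(P))    # 3, nonzero, so P is invertible and P^{-1} A P = D
```

The same test applied to part (2) cannot succeed for any invertible $P$, because there the eigenspace is only one-dimensional.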
Exercise 4.1.18. Show that an upper triangular matrix with distinct scalars on the diagonal is
diagonalizable.
Theorem 4.1.19. (Schur) Let $V$ be an $n$-dimensional vector space over $\mathbb{C}$ and let $T : V \to V$ be a linear operator. Then there exists an ordered basis $B$ of $V$ such that $M_B^B(T)$ is upper triangular.
Proof. By induction on the dimension of $V$, the dimension 1 case being clear. Let $\lambda$ be an eigenvalue of $T$ (it exists since we have taken the scalars to be $\mathbb{C}$). Choose an eigenvector $v$ with eigenvalue $\lambda$. Extend $v$ to an ordered basis $(v, u_2, \ldots, u_n)$ of $V$ and let $W$ denote the subspace spanned by $u_2, \ldots, u_n$.
Define a linear map $S : W \to W$ as follows: given $u \in W$, write (uniquely) $T(u) = av + w$, where $a \in \mathbb{C}$ and $w \in W$. Define $S(u) = w$. Check that this defines a linear map.
By induction, there is an ordered basis $C = (w_2, \ldots, w_n)$ of $W$ such that $M_C^C(S)$ is upper triangular. Set $B = (v, w_2, \ldots, w_n)$. It is now easy to check that $M_B^B(T)$ is upper triangular.
Exercise 4.1.20. Let $A$ be a complex $n \times n$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$ (in this notation eigenvalues may possibly be repeated; the number of times an eigenvalue is repeated is equal to its algebraic multiplicity). The trace of $A$, denoted $\mathrm{tr}(A)$, is the sum of its diagonal entries.
1. Show that $\det(A) = \lambda_1 \lambda_2 \cdots \lambda_n$.
2. Show that $\mathrm{tr}(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n$. (Hint: $\mathrm{tr}(P^{-1}AP) = \mathrm{tr}(A)$. Now use the previous theorem.)
3. Show that the eigenvalues of $A^k$ are $\lambda_1^k, \ldots, \lambda_n^k$.
Example 4.1.21. Let $A$ be a complex $n \times n$ matrix. Then $A$ is nilpotent iff all eigenvalues of $A$ are 0.
Choose an ordered basis $B$ of $\mathbb{C}^n$ such that $M = M_B^B(T_A) = (m_{ij})$ is upper triangular. So the eigenvalues of $A$ are $m_{ii}$, $i = 1, \ldots, n$. If all $m_{ii}$ are 0 then $M$, and hence $A$, is nilpotent (by an example from Chapter 1). On the other hand, if some $m_{jj} \neq 0$, then the $j$th diagonal entry of all powers of $M$ will be nonzero, so $M$ (and hence the similar matrix $A$) cannot be nilpotent.
Example 4.1.22. Let $A$ be a complex $n \times n$ matrix. Then $A$ is nilpotent iff $\mathrm{tr}(A^i) = 0$, for $i = 1, \ldots, n$.
The only if part is clear. For the if part, we prove only the $n = 3$ case. The reader is urged to work out the $n = 4$ case (which is similar) after which the general case becomes clear. In the process the reader will have discovered Newton's identities (on symmetric functions).
Assume that the eigenvalues of $A$ are $\lambda_1, \lambda_2, \lambda_3$. Since the trace is the sum of the eigenvalues we have
$$p_i = \lambda_1^i + \lambda_2^i + \lambda_3^i = 0, \quad i = 1, 2, 3.$$
We want to show $\lambda_1 = \lambda_2 = \lambda_3 = 0$.
Set $e_1 = p_1$, $e_2 = \lambda_1\lambda_2 + \lambda_1\lambda_3 + \lambda_2\lambda_3$ and $e_3 = \lambda_1\lambda_2\lambda_3$.
First we show $e_1 = e_2 = e_3 = 0$. We have $e_1 = p_1 = 0$. Now consider
$$0 = (\lambda_1 + \lambda_2 + \lambda_3)^2 = p_2 + 2e_2.$$
It follows that $e_2 = 0$. Similarly consider
$$0 = (\lambda_1 + \lambda_2 + \lambda_3)^3 = p_3 + 6e_3 + 3\sum_{i \neq j} \lambda_i^2 \lambda_j = p_3 + 6e_3 + 3(p_2 p_1 - p_3).$$
It follows that $e_3 = 0$. Now $e_1 = e_2 = e_3 = 0$ easily implies that $\lambda_1 = \lambda_2 = \lambda_3 = 0$, since the $\lambda_i$ are the roots of $t^3 - e_1 t^2 + e_2 t - e_3 = t^3$.
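The two polynomial identities used in the $n = 3$ argument hold for arbitrary scalars, not only when the power sums vanish, and this is easy to spot-check. The following sketch assumes the sample values $(2, -5, 7)$, chosen arbitrarily; the name `power_sum` is illustrative.

```python
from itertools import permutations

def power_sum(lams, i):
    """p_i = sum of the i-th powers of the given scalars."""
    return sum(x**i for x in lams)

lams = (2, -5, 7)  # arbitrary sample values standing in for the eigenvalues
l1, l2, l3 = lams

p1, p2, p3 = (power_sum(lams, i) for i in (1, 2, 3))
e1 = l1 + l2 + l3
e2 = l1 * l2 + l1 * l3 + l2 * l3
e3 = l1 * l2 * l3

# Sum over ordered pairs of distinct positions i != j of lam_i^2 * lam_j
mixed = sum(x**2 * y for x, y in permutations(lams, 2))

print(e1**2 == p2 + 2 * e2)               # True
print(mixed == p2 * p1 - p3)              # True
print(e1**3 == p3 + 6 * e3 + 3 * mixed)   # True
```

Setting $p_1 = p_2 = p_3 = 0$ in these identities gives $e_2 = 0$ and $e_3 = 0$, as in the text.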
4.2 Spectral Theorem
In this section we prove one of the most important results in Linear Algebra, namely, the Spectral Theorem. We shall see that the three concepts of symmetry or conjugate symmetry (of the entries of a matrix), orthogonality (of eigenvectors), and commutativity (of operators) are linked together in a deep way. More generally, adjointness, orthogonality, and commutativity go hand in hand. This is the beginning of the beautiful subject of representation theory.
Recall the definition of a diagonalizable complex matrix: an $n \times n$ complex matrix $A$ is diagonalizable if there exists a basis of $\mathbb{C}^n$ consisting of eigenvectors of $A$. Now, $\mathbb{C}^n$ is not just a vector space. Endowed with the dot product, it is an inner product space. The following definition is thus natural.
Definition 4.2.1. (i) An $n \times n$ complex matrix $A$ is orthonormally diagonalizable if there exists an orthonormal basis of $\mathbb{C}^n$ consisting of eigenvectors of $A$. Similarly,
(ii) An $n \times n$ real matrix $A$ is orthonormally diagonalizable if there exists an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A$.
Theorem 4.2.2. (i) If a square real matrix $A$ is orthonormally diagonalizable then $A^t = A$, i.e., $A$ is symmetric.
(ii) If a square complex matrix $A$ is orthonormally diagonalizable then $A^*A = AA^*$.
Proof. (i) Let $A$ be a real $n \times n$ orthonormally diagonalizable matrix. Let $v_1, v_2, \ldots, v_n$ be an orthonormal basis of $\mathbb{R}^n$ with $Av_i = \lambda_i v_i$, $\lambda_i \in \mathbb{R}$, and let $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Let $P$ be the $n \times n$ matrix with $i$th column $v_i$. Then $AP = PD$.
Since the $v_i$ are orthonormal we have (why?) $P^tP = I$. Thus
$$A = PDP^t \quad \text{and} \quad A^t = PD^tP^t.$$
Since $D$ is a real diagonal matrix we have $D = D^t$ and $A = A^t$.
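(ii) Let $A$ be a complex $n \times n$ orthonormally diagonalizable matrix. Let $v_1, v_2, \ldots, v_n$ be an orthonormal basis of $\mathbb{C}^n$ with $Av_i = \lambda_i v_i$, $\lambda_i \in \mathbb{C}$, and let $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Let $P$ be the $n \times n$ matrix with $i$th column $v_i$. Then $AP = PD$.
Since the $v_i$ are orthonormal we have (why?) $P^*P = I$. Thus
$$A = PDP^* \quad \text{and} \quad A^* = PD^*P^*.$$
It follows that $AA^* = (PDP^*)(PD^*P^*) = PDD^*P^*$ and $A^*A = PD^*DP^*$. Since $D$ is diagonal we have $DD^* = D^*D$ and hence $AA^* = A^*A$.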
A square complex matrix $A$ is called normal if $A^*A = AA^*$. The spectral theorem asserts the converse of the theorem above: any real symmetric matrix is orthonormally diagonalizable (over reals) and any complex normal matrix¹ is orthonormally diagonalizable (over complexes).
A square complex matrix $A$ is called Hermitian or self-adjoint if $A = A^*$. Note that Hermitian matrices are normal. Also note that a real matrix (treated as a complex matrix) is Hermitian iff it is symmetric. We shall first prove the spectral theorem for Hermitian matrices and then deduce the normal case from this.
¹ Unlike human beings, it is possible for a matrix to be both complex and normal.
Theorem 4.2.3. The eigenvalues of a Hermitian matrix are real.
Proof. Let $A$ be a Hermitian matrix. Then for any $v \in \mathbb{C}^n$
$$(v^*Av)^* = v^*A^*v = v^*Av,$$
so $v^*Av$ is a real number.
Now let $\lambda$ be an eigenvalue of $A$ with eigenvector $v$. Then
$$v^*Av = v^*(\lambda v) = \lambda(v^*v) = \lambda \|v\|^2.$$
Since $v^*Av$ is real and $\|v\|$ is a nonzero real number it follows that $\lambda$ is real.
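For a $2 \times 2$ Hermitian matrix $H = \begin{pmatrix} a & b \\ \bar{b} & d \end{pmatrix}$ with $a, d$ real, the theorem can be seen directly: the characteristic polynomial is $t^2 - \mathrm{tr}(H)\,t + \det(H)$ and its discriminant is $(a - d)^2 + 4|b|^2 \geq 0$. The sketch below works this out for a sample matrix chosen here purely for illustration.

```python
import cmath

# Sample Hermitian matrix (an assumption for illustration): H equals its conjugate transpose
H = [[2, 1 - 1j],
     [1 + 1j, 3]]

trace = H[0][0] + H[1][1]
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]

# Roots of t^2 - trace * t + det by the quadratic formula
root = cmath.sqrt(trace**2 - 4 * det)
eigenvalues = [(trace + root) / 2, (trace - root) / 2]

for ev in eigenvalues:
    print(ev.real, ev.imag)  # imaginary parts are 0
```

Here $\mathrm{tr}(H) = 5$ and $\det(H) = 6 - |1 - i|^2 = 4$, so the eigenvalues are $4$ and $1$.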

Though a proof of the spectral theorem for self-adjoint matrices can be given working only with matrices, a coordinate free approach is more intuitive and more memorable. Therefore, we first develop a coordinate free version of the concept of a self-adjoint matrix. The following definition covers both the real and complex cases.
Definition 4.2.4. Let $V$ be a finite dimensional inner product space over $F$. A linear operator $T : V \to V$ is said to be Hermitian or self-adjoint if
$$\langle x, T(y) \rangle = \langle T(x), y \rangle, \quad x, y \in V.$$
Theorem 4.2.5. Let $V$ be a finite dimensional inner product space over $F$ and let $T : V \to V$ be a linear operator. Then $T$ is self-adjoint iff $M_B^B(T)$ is a self-adjoint matrix for every ordered orthonormal basis $B$ of $V$.
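Proof. Let $B = (v_1, \ldots, v_n)$ be an ordered orthonormal basis of $V$ and let $A = (a_{ij}) = M_B^B(T)$. The inner product is taken to be conjugate linear in the first argument, in agreement with the dot product $x \cdot y = x^*y$ on $\mathbb{C}^n$.
(only if) $T(v_j) = \sum_{k=1}^{n} a_{kj} v_k$. So
$$\langle T(v_j), v_i \rangle = \Big\langle \sum_{k=1}^{n} a_{kj} v_k, v_i \Big\rangle = \overline{a_{ij}}.$$
Thus
$$\overline{a_{ij}} = \langle T(v_j), v_i \rangle = \langle v_j, T(v_i) \rangle = \Big\langle v_j, \sum_{k=1}^{n} a_{ki} v_k \Big\rangle = a_{ji}.$$
(if) We are given that $\overline{a_{ij}} = a_{ji}$. From the computations in the only if part we have
$$\langle T(v_j), v_i \rangle = \langle v_j, T(v_i) \rangle.$$
Let $x = \sum_{j=1}^{n} a_j v_j$ and $y = \sum_{i=1}^{n} b_i v_i$.
We have
$$\langle x, T(y) \rangle = \Big\langle \sum_j a_j v_j, \sum_i b_i T(v_i) \Big\rangle = \sum_{j,i} \overline{a_j}\, b_i \langle v_j, T(v_i) \rangle,$$
$$\langle T(x), y \rangle = \Big\langle \sum_j a_j T(v_j), \sum_i b_i v_i \Big\rangle = \sum_{j,i} \overline{a_j}\, b_i \langle T(v_j), v_i \rangle.$$
It follows that $T$ is self-adjoint.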
Theorem 4.2.6. (Spectral Theorem for Self-Adjoint Operators) Let $V$ be a finite dimensional inner product space over $F$ and let $T : V \to V$ be a self-adjoint linear operator. Then there exists an orthonormal basis of $V$ consisting of eigenvectors of $T$.
Proof. By the fundamental theorem of algebra and Theorem 4.2.3 (why?) there exist $\lambda \in \mathbb{R}$ and $0 \neq v \in V$ with $T(v) = \lambda v$. We may assume that $v$ is a unit vector.
Put $W = L(v)^{\perp}$. We claim that
(i) $w \in W$ implies $T(w) \in W$.
(ii) $T : W \to W$ is self-adjoint.
Proof of claim (i): We have
$$\langle T(w), v \rangle = \langle w, T(v) \rangle = \langle w, \lambda v \rangle = \lambda \langle w, v \rangle = 0,$$
since $w \in W$. Thus $T(w) \in W$.
Proof of claim (ii): This is clear.
By induction on dimension, there is an orthonormal basis $B$ of $W$ consisting of eigenvectors of $T$. Now $\{v\} \cup B$ is the required orthonormal basis of $V$.
Corollary 4.2.7. Let $T : V \to V$ be a self-adjoint linear operator on a finite dimensional inner product space over $F$. Let $\lambda_1, \ldots, \lambda_k$ be the distinct (real) eigenvalues of $T$. Then the eigenspaces $V_{\lambda_1}, \ldots, V_{\lambda_k}$ are mutually orthogonal and the sum of their dimensions is equal to the dimension of $V$.
Proof. Exercise.
An $n \times n$ matrix $U$ over $\mathbb{C}$ is said to be unitary if $U^*U = UU^* = I$, i.e., the columns (and rows) of $U$ are orthonormal vectors in $\mathbb{C}^n$ (why?).
An $n \times n$ matrix $O$ over $\mathbb{R}$ is said to be orthogonal if $O^tO = OO^t = I$, i.e., the columns (and rows) of $O$ are orthonormal vectors in $\mathbb{R}^n$ (why?).
Corollary 4.2.8. (Spectral Theorem for Real Symmetric Matrices and Self-Adjoint Matrices) (i) Let $A$ be an $n \times n$ real symmetric matrix with (real) eigenvalues $\lambda_1, \ldots, \lambda_n$. Set $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Then there exists an $n \times n$ real orthogonal matrix $O$ such that
$$O^tAO = D.$$
(ii) Let $A$ be an $n \times n$ complex self-adjoint matrix with (real) eigenvalues $\lambda_1, \ldots, \lambda_n$. Set $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Then there exists an $n \times n$ unitary matrix $U$ such that
$$U^*AU = D.$$
Proof. Exercise.
Here is a procedure for orthonormally diagonalizing a self-adjoint $n \times n$ matrix $A$:
(i) Find the eigenvalues of $A$. These should all be real. Let us call the distinct ones $\lambda_1, \ldots, \lambda_k$.
(ii) For each eigenvalue $\lambda_i$ of $A$ find a basis of $V_{\lambda_i} = \mathcal{N}(A - \lambda_i I)$ (this can be done with Gauss elimination). Suppose this basis has $d_i$ vectors. Then $d_1 + \cdots + d_k = n$.
(iii) Apply the Gram-Schmidt process to the basis obtained in step (ii) to get an orthonormal basis $B(\lambda_i)$ of $V_{\lambda_i}$, for all $i$.
(iv) Form an $n \times n$ matrix as follows: the first $d_1$ columns are the (column) vectors in $B(\lambda_1)$ (in any order), the next $d_2$ columns are the vectors in $B(\lambda_2)$ (in any order), and so on. This is the matrix $U$ (in the complex case) or $O$ (in the real case).
(v) The diagonal matrix $D$ is the following: the first $d_1$ elements on the diagonal are $\lambda_1$, the next $d_2$ elements are $\lambda_2$, and so on.
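Example 4.2.9. Consider the real symmetric matrix
$$A = \begin{pmatrix} 1 & 2 & -2 \\ 2 & 1 & 2 \\ -2 & 2 & 1 \end{pmatrix}.$$
Check that the eigenvalues of $A$ are $3, 3, -3$.
The eigenvectors for $\lambda = 3$ (the nonzero vectors of the null space $\mathcal{N}(A - 3I)$) are solutions of
$$\begin{pmatrix} -2 & 2 & -2 \\ 2 & -2 & 2 \\ -2 & 2 & -2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$
Hence we obtain the single equation $x - y + z = 0$. Hence
$$V_3 = \{ (y - z, y, z) \mid y, z \in \mathbb{R} \} = L(u_1 = (0, 1, 1)^t,\ u_2 = (-1, 1, 2)^t).$$
Now we apply the Gram-Schmidt process to get an orthonormal basis of $V_3$:
$$v_1 = \left( 0, \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right)^t \quad \text{and} \quad v_2 = \left( -\sqrt{\frac{2}{3}}, -\frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}} \right)^t.$$
Similarly, check that the unit vector $v_3 = (1/\sqrt{3}, -1/\sqrt{3}, 1/\sqrt{3})^t$ forms an orthonormal basis of $V_{-3}$.
The orthogonal matrix $O$ for diagonalization is $O = [v_1, v_2, v_3]$ and $D = \mathrm{diag}(3, 3, -3)$.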
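The outcome of the procedure can be checked numerically by forming $O^tAO$ and comparing it with $D$. The sketch below assumes the entries $A = \begin{pmatrix} 1 & 2 & -2 \\ 2 & 1 & 2 \\ -2 & 2 & 1 \end{pmatrix}$ and the orthonormal vectors $v_1 = (0, 1/\sqrt2, 1/\sqrt2)^t$, $v_2 = (-\sqrt{2/3}, -1/\sqrt6, 1/\sqrt6)^t$, $v_3 = (1/\sqrt3, -1/\sqrt3, 1/\sqrt3)^t$; the helper names are illustrative. It uses the fact that entry $(i, j)$ of $O^tAO$ equals $v_i \cdot (Av_j)$.

```python
import math

def dot(x, y):
    """Standard dot product of two real vectors."""
    return sum(a * b for a, b in zip(x, y))

def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in A]

A = [[1, 2, -2], [2, 1, 2], [-2, 2, 1]]
v1 = [0, 1 / math.sqrt(2), 1 / math.sqrt(2)]
v2 = [-math.sqrt(2 / 3), -1 / math.sqrt(6), 1 / math.sqrt(6)]
v3 = [1 / math.sqrt(3), -1 / math.sqrt(3), 1 / math.sqrt(3)]
cols = [v1, v2, v3]  # columns of O

# Gram matrix O^t O should be the identity
gram = [[dot(cols[i], cols[j]) for j in range(3)] for i in range(3)]
# Entry (i, j) of O^t A O is v_i . (A v_j)
OtAO = [[dot(cols[i], mat_vec(A, cols[j])) for j in range(3)] for i in range(3)]

for row in OtAO:
    print([round(x, 10) for x in row])
```

Up to rounding, the printed matrix is $\mathrm{diag}(3, 3, -3)$, with the eigenvalues in the same order as the columns of $O$.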
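A square matrix $K$ over $\mathbb{C}$ is said to be skew-Hermitian or skew-self-adjoint if $K^* = -K$.
Lemma 4.2.10. A skew-self-adjoint matrix is orthonormally diagonalizable and its eigenvalues are purely imaginary (i.e., of the form $ia$, for some real $a$).
Proof. If $K$ is skew-self-adjoint then $A = iK$ is self-adjoint, since $(iK)^* = -iK^* = iK$. The result follows, because $K = -iA$ has the same eigenvectors as $A$ and its eigenvalues are $-i$ times the (real) eigenvalues of $A$.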
Lemma 4.2.11. Let $V$ be an $n$-dimensional complex inner product space. Let $A, B$ be two commuting self-adjoint operators on $V$, i.e., $AB = BA$. Then there exists an (ordered) orthonormal basis $(v_1, \ldots, v_n)$ of $V$ such that each $v_i$ is an eigenvector of both $A$ and $B$.
Proof. Let $V_1, \ldots, V_t$ be the distinct eigenspaces of $A$ with associated eigenvalues $\lambda_1, \ldots, \lambda_t$. Then $V_1, \ldots, V_t$ are mutually orthogonal and the sum of their dimensions is $n$.
Let $v \in V_i$. Then we claim that $B(v) \in V_i$. To see this note that
$$A(B(v)) = (AB)(v) = (BA)(v) = B(A(v)) = B(\lambda_i v) = \lambda_i B(v),$$
so $B(v)$ satisfies the eigenvalue equation for $\lambda_i$ and hence belongs to $V_i$.
Thus, for all $i$, $B : V_i \to V_i$ is a self-adjoint operator. Hence each $V_i$ has an orthonormal basis consisting of eigenvectors of $B$ and all of these vectors are already eigenvectors of $A$. That completes the proof.
Theorem 4.2.12. (Spectral Theorem for Normal Matrices) A complex normal matrix is orthonormally diagonalizable.
Proof. Let $N$ be normal. Write
$$N = \frac{N + N^*}{2} + \frac{N - N^*}{2}.$$
Put $A = (N + N^*)/2$ and $B = (N - N^*)/2$. Check that $A$ is self-adjoint, $B$ is skew-self-adjoint, and $AB = BA$.
Now $C = -iB$ is self-adjoint and $AC = CA$. By the previous lemma there is a common orthonormal eigenbasis $\mathcal{B}$ of $A$ and $C$. Since $B = iC$, $\mathcal{B}$ is also an orthonormal eigenbasis of $B$ and of $N = A + B$.
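The "check" steps in this proof are short matrix computations. The following sketch carries them out for one sample normal matrix, $N = \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}$, which is an assumption chosen for illustration (it is normal but not Hermitian); the helper names are likewise illustrative.

```python
def conj_transpose(M):
    """Conjugate transpose M^* of a square complex matrix."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(X, Y, tol=1e-12):
    """Entrywise comparison up to a tolerance."""
    n = len(X)
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(n) for j in range(n))

N = [[1 + 0j, 1j], [1j, 1 + 0j]]  # sample normal matrix (assumed for illustration)
Ns = conj_transpose(N)

A = [[(N[i][j] + Ns[i][j]) / 2 for j in range(2)] for i in range(2)]
B = [[(N[i][j] - Ns[i][j]) / 2 for j in range(2)] for i in range(2)]
C = [[-1j * b for b in row] for row in B]
minus_B = [[-b for b in row] for row in B]

print(close(mat_mul(N, Ns), mat_mul(Ns, N)))   # N is normal
print(close(conj_transpose(A), A))             # A is self-adjoint
print(close(conj_transpose(B), minus_B))       # B is skew-self-adjoint
print(close(mat_mul(A, B), mat_mul(B, A)))     # A and B commute
print(close(conj_transpose(C), C))             # C = -iB is self-adjoint
```

For this $N$ one gets $A = I$ and $C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, so any orthonormal eigenbasis of $C$ diagonalizes $N$.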
Lemma 4.2.13. Let $U$ be an $n \times n$ unitary matrix. Then $U$ is orthonormally diagonalizable and every eigenvalue $\lambda$ of $U$ satisfies $|\lambda| = 1$.
Proof. Since $U$ is normal it is orthonormally diagonalizable.
Let $x, y \in \mathbb{C}^n$. Then
$$Ux \cdot Uy = (Ux)^*Uy = x^*U^*Uy = x^*y = x \cdot y.$$
Hence
$$\|Ux\|^2 = Ux \cdot Ux = x \cdot x = \|x\|^2.$$
If $x$ is an eigenvector with eigenvalue $\lambda$ then $Ux = \lambda x$. So
$$\|x\| = \|Ux\| = \|\lambda x\| = |\lambda| \|x\|.$$
Thus $|\lambda| = 1$.
