Linear Algebra Notes
Sagnik Chakraborty
February 2, 2021
Abstract
These are some rough lecture notes of the course on Linear Algebra. The textbook for this
course is the ‘Linear Algebra’ book by Hoffman & Kunze. The notes are not complete, but
only intended to serve as a guideline. I’ll omit many details which you should fill in. Also,
frequent references will be made to the textbook (henceforth referred to as HK).
Contents
Lecture 1 (Matrices - row, column operations)
Lecture 4 (Basis)
Lecture 14 (Projections)
Lectures 28, 29 (Polar decomposition, Spectral theory and Self-adjoint algebras)
Lecture 1 (24/9/2020) :
Linear algebra consists of two words - linear and algebra.
Algebra, in high school, was a ‘tool to solve equations’. With a major transition from previously
familiar arithmetic which involves computation with numbers, to a bunch of symbols like ‘x, y, z
etc.’ and a list of formulas like ‘(x + y)^2 = x^2 + y^2 + 2xy’. This algorithmic approach of ‘you
know the formula, you know the solution’ actually facilitates computation, allowing us to solve
equations more efficiently - our approach to solve a problem is no longer dependent on a specific
set of ‘values’; rather, any set of values given as ‘inputs’ in the formulas gives us solutions.
After that, ‘college algebra’ became a game involving ‘sets with operations’. The salient feature of
this so-called ‘abstract algebra’ is its axiomatic development. In this approach, to study a struc-
ture, we extract a set of properties of this structure and then formulate them as axioms. Then
formal deductions are made from these axioms. And whatever results we get then apply to every
possible structure satisfying that particular set of axioms.
Now the word ‘linear’ comes from ’line’ - in a sense, it means ‘degree 1’. What is a line? If a, b
are two points in Rn (or, more generally, in a vector space V over a field F ) then the line through
a and b, denoted by l(a, b), is defined as l(a, b) := {(1 − t)a + t b | t ∈ F }.
The simplest linear equation is ‘ax = b’. To make it meaningful, we should know what ax is,
i.e., a concept of multiplication. Then, to solve it, if a ≠ 0, we should have a multiplicative inverse
(note that in a ring R we usually don’t allow 0 to have an inverse as R = 0 iff 1 = 0 in R iff 0 has
a multiplicative inverse in R). For example, 2x = 1 is a perfectly nice linear equation over Z. But
it has no solution in Z, the solution is in Q.
Next, if we consider two linear equations in two variables
ax + by = c
a′x + b′y = c′
then we need one more operation - addition. To solve the equations, we also need subtraction.
So to work with linear equations and to find their solutions (whenever they exist!), the coefficients
must come from a division ring. To make life simpler, we’ll actually assume that the coefficients
are coming from a field.
The information of a system of linear equations is captured by an array of scalars, the so-called
augmented matrix. As we observed in the case of two linear equations in two variables, the most
basic (and surprisingly effective!) tool for solving linear equations is the so-called ‘elimination of
variables’. The essence of this operation is formalized in the process of row reduction of a matrix
by elementary row operations. There are three types of elementary row operations: multiplying a
row by a nonzero scalar c, interchanging two rows, and adding c times one row to another row.
Remarks.
(i) If A′ is a matrix obtained from an m × n matrix A by applying a single elementary row
operation then A′ = I′A, where I′ is the m × m matrix obtained from the identity matrix by
applying the same elementary row operation.
(ii) Just as elementary row operations, we can similarly define elementary column operations. We
say that two matrices A, B ∈ Mm×n are column equivalent, denoted by A ∼c B, if B can be
obtained from A by a finite sequence of elementary column operations. Check that column
equivalence is also an equivalence relation.
(iii) If A′ is a matrix obtained from an m × n matrix A by applying a single elementary column
operation then A′ = AI′, where I′ is the n × n matrix obtained from the identity matrix by
applying the same elementary column operation. Therefore A is row/column equivalent to
B iff A^t is column/row equivalent to B^t.
(iv) Note that, when applied to the identity matrix, elementary row operations are ‘same as’
elementary column operations. For example, multiplying the i-th row by c ≠ 0 is same as
multiplying the i-th column by c ≠ 0, interchanging i-th and j-th row is same as interchanging
i-th and j-th column, and finally adding c-times of the i-th row to the j-th row is same as
adding c-times of the j-th column to the i-th column.
(v) While solving a system of linear equations we never use elementary column operations since
it will ‘disturb the variables’.
Exercise. A matrix E ∈ Mn (F ) is called an elementary matrix if it can be obtained from the
identity matrix by applying a single elementary row/column operation. Prove that the set of ele-
mentary matrices generate GLn (F ).
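As an aside, Remark (i) is easy to check numerically. The following small sketch (in Python with numpy; the 3 × 3 matrix and the particular row operation are a hypothetical example, not taken from the text) verifies that applying an elementary row operation to A gives the same result as multiplying A on the left by the corresponding elementary matrix.

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 10.]])

# Elementary matrix for "add (-4) times row 0 to row 1", obtained by applying
# that same operation to the 3x3 identity matrix.
E = np.eye(3)
E[1, 0] = -4.0

# Apply the row operation to A directly.
A_prime = A.copy()
A_prime[1] += -4.0 * A_prime[0]

# Remark (i): A' = E A.
assert np.allclose(E @ A, A_prime)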
Lectures 2 and 3 (28/9/2020, 30/9/2020) :
Remark. Let A be an m×n matrix over F . Let TA : F n → F m be the linear transformation given
by A. As elementary matrices generate the general linear group, A ∼r A′ (respectively A ∼c A′) iff
there exists an automorphism φ of F m (respectively ψ of F n ) such that TA′ = φ ◦ TA (respectively
TA′ = TA ◦ ψ).
We have been doing row reductions to find the set of solutions of a system of linear equations
all our life. The following proposition shows that it is a legitimate operation.
Proposition. If a system of linear equations Ax = b is transformed into A′x = b′ by a finite
sequence of elementary row operations (applied to the augmented matrix), then the two systems
have the same set of solutions.
Proof. By induction, it suffices to show that the solution set remains invariant under a single
elementary row operation. After applying an elementary row operation, let A′x = b′ be the new
system of linear equations. Then there exists an elementary matrix I′ such that A′ = I′A and
b′ = I′b. Since I′ ∈ GLm (F ), it follows that for an element ξ = (ξ1 , ..., ξn ) ∈ F n , Aξ = b iff
A′ξ = b′.
Alternatively, ξ := (ξ1 , ..., ξn ) ∈ F n is a solution of Ax = b iff Σ_i ξi Ai = b, where A1 , ..., An ∈ F m
are the column vectors of A. Since every row operation of A induces an automorphism of F m , say
θ, Σ_i ξi Ai = b iff Σ_i ξi θ(Ai ) = θ(b).
If Ax = b is a system of m linear equations in n-variables, let Sol (Ax = b) denote the set
of its solutions. Then we know that for each ξ = (ξ1 , ..., ξn ) ∈ Sol (Ax = b), Sol (Ax = b) =
ξ + Sol (Ax = 0). Therefore homogeneous linear equations play a special role in solving linear
equations. Observe that the solution of a system of homogeneous linear equations Ax = 0 which
is nothing but the kernel of TA , depends only on the row space of A and not on the individual row
vectors. Therefore row operations, which do not change the row space, also do not change the solu-
tion set of a system of homogeneous linear equations. In fact, if A ∈ Mm×n (F ) and B ∈ Mr×n (F )
are two matrices having the same row space then Ax = 0 and Bx = 0 have the same set of solutions.
Definition. Let A be an m × n matrix over F . Then the rank of A is defined to be the largest
non-negative integer r such that A contains an r × r sub-matrix whose determinant is nonzero.
Note that the rank of A cannot be more than either m or n, and it remains invariant under any
field extension of F .
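Over R the rank can also be checked numerically; a minimal sketch with numpy (the 3 × 4 matrix is a hypothetical example), illustrating that the rank is at most min{m, n} and that A and A^t have the same rank:

import numpy as np

A = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],    # = 2 * first row, so it adds nothing to the rank
              [1., 0., 1., 0.]])

r = np.linalg.matrix_rank(A)             # computed via SVD; here r = 2
assert r <= min(A.shape)
assert np.linalg.matrix_rank(A.T) == r   # A and its transpose have the same rank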
Proposition. Let E/F be a field extension and let Ax = b be a system of linear equations with
coefficients in F (and b ∈ F m ). (i) If the system has a solution over E, then it has a solution over F .
(ii) The homogeneous system Ax = 0 has a non-trivial solution over F iff it has one over E.
Proof. (i) Let Ax = b be a system of m linear equations in n variables. Let (ξ1 , ..., ξn ) ∈ E n
be a solution of this system. Then we can find an F -linearly independent sequence of elements
e0 := 1, e1 , ..., er ∈ E such that each ξi can be written as an F -linear combination of ej ’s. Now
looking at each equation at a time and expanding every ξi as an F -linear combination of ej ’s, one
can easily see that the coefficient of e0 = 1 gives a solution of the system over F .
Alternatively, let A1 , ..., An ∈ F m ⊆ E m be the column vectors of A. Then Ax = b has a solution
over F iff b ∈ F m is contained in the F -column space of A iff the coefficient matrix A and the
augmented matrix (A|b) have the same rank (equivalently, the same column rank). Similarly, Ax = b has a
solution over E iff b ∈ E m is contained in the E-column space of A iff the coefficient matrix A and
the augmented matrix (A|b) have the same rank. But the rank of a matrix does not change under
field extensions!
(ii) Whether a system of homogeneous linear equations has a non-trivial solution depends only on the rank of the coefficient
matrix. But the rank of A does not depend on whether we treat it as an element of Mm×n (F ) or
as an element of Mm×n (E).
Note that the proof crucially uses the equality of rank and column rank which we’ll prove in the
next class.
Lemma. Let V, W be vector spaces of dimension n, m respectively. If T : V → W is
a linear transformation of rank r then there exists an ordered basis x1 , ..., xn of V such that
T (V ) = < T (x1 ), ..., T (xr ) > and xr+1 , ..., xn ∈ Ker T .
Proposition. Let A ∈ Mm×n (F ). Then there exists a non-negative integer r such that by
applying elementary row and column operations on A we can turn it into a matrix
which has an r × r identity matrix at the top left and zero everywhere else. Equivalently, there
exist invertible matrices P ∈ GLm (F ) and Q ∈ GLn (F ) such that P AQ is of the desired form. As the
row rank and the column rank of A remain invariant under both row and column operations, it
implies that the row rank of A is equal to the column rank of A, both being equal to r.
Corollary. Let A be an m × n matrix over F . Then the rank of A is equal to its row rank and
column rank.
Proof. We have already proved that the row rank and column rank of A are equal.
If A has rank r, then there exists an r × r sub-matrix A′ of A such that det A′ ≠ 0. Therefore
the row rank of A′ is equal to r. Let R1 , ..., Rr be the rows of A appearing in A′. If we denote
the rows of A′ by R′1 , ..., R′r then there exists a certain projection π : F n → F r under which each Ri
is mapped to R′i . As R′1 , ..., R′r are linearly independent, so must be R1 , ..., Rr , implying that the
row rank of A is at least r.
Conversely, suppose R1 , ..., Rs are linearly independent rows of A. First look at the s × n sub-matrix
of A obtained by deleting the other rows. Since this sub-matrix has row rank s, it must have s
linearly independent columns. Deleting the other columns we get an s × s invertible sub-matrix of
A, which implies that the rank of A cannot be smaller than the row rank of A. Together, we get
the result.
Remarks.
1. If A ∈ Mm×n (F ) then A and At have the same rank, row rank and column rank. In the next
class, we’ll see that they are all equal.
2. If we define an equivalence relation on Mm×n (F ) by saying that A is equivalent to B if there
exist invertible matrices P ∈ GLm (F ) and Q ∈ GLn (F ) such that B = P AQ, then there
exist exactly min {m, n} + 1 different equivalence classes - the number of different possible
ranks of such matrices. So, under this equivalence relation, two matrices are equivalent iff
they have the same rank.
3. Let A be an m × n matrix over F . Then Ax = b has a solution for all b ∈ F m iff the column
rank of A is equal to m. In particular m ≤ n. And if m = n, then A must be invertible.
If m < n, then Ax = 0 has a non-trivial solution.
4. Let E/F be a field extension. Then x1 , ..., xr ∈ F n ⊆ E n are linearly independent over F iff
they are linearly independent over E. One can see this by noting that the rank of a matrix
does not change under field extensions. Another approach is to extend this set to a
basis of F n which is then also a basis of E n .
The geometry behind the elimination of variables : The main philosophy behind the
elimination of variables is that a system of linear equations in n variables is ‘easier to solve’ than
a system of linear equations in (n + 1) variables.
We concentrate on a system of homogeneous linear equations Ax = 0 to illustrate this idea. Af-
ter re-arranging the variables, if required, let Σ_i ai xi = 0 be a linear equation of this system
with a1 ≠ 0. Then the hyperplane H defined by the equation x1 = −a1 −1 (Σ_{j>1} aj xj ) con-
tains all solutions of this system. Note that the natural projection π1 : k n → k n−1 given by
(α1 , ..., αn ) ↦ (α2 , ..., αn ) induces an isomorphism between H and k n−1 . This allows us to view
the set of solutions as a subset of k n−1 and the original system of equations reduces to a system
of equations in (n − 1) variables which, in principle, is easier to solve.
The following ‘well-known’ fact which we are not going to prove will be used time and again.
There exists a monoid homomorphism det : Mn (F ) → F such that a matrix A is invertible iff
det (A) ≠ 0. The set of all invertible matrices, which forms a group under matrix multiplication,
is called the general linear group of degree n and denoted by GLn (F ).
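The multiplicativity of det and the invertibility criterion can be sanity-checked numerically; a small Python/numpy sketch (the 2 × 2 matrices are a hypothetical example):

import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[0., 1.],
              [1., 1.]])

# det is a monoid homomorphism: det(AB) = det(A) det(B).
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# A is invertible iff det(A) != 0; here det(A) = -2, so the inverse exists.
assert not np.isclose(np.linalg.det(A), 0.0)
assert np.allclose(A @ np.linalg.inv(A), np.eye(2))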
Vector Spaces
A ring, for us, will always mean a ring with unity. Usually they’re also assumed to be commutative
unless, of course, we are dealing with the rings of matrices or linear operators. If R is a ring then
there exists a unique ring homomorphism Z → R and the non-negative generator of the kernel
of this map is called the characteristic of R, denoted by ch R. Note that, if R is an integral domain
then ch R is either zero or a prime number.
If (V, +) is an abelian group, then a scalar multiplication · : F × V → V is ‘same as’ a ring
homomorphism φ : F → EndZ (V ). If a scalar multiplication is given, we can define a ring homo-
morphism φ : F → EndZ (V ) which sends an element a ∈ F to φ(a), defined by φ(a)(v) := a · v
for all v ∈ V . Conversely, given a ring homomorphism φ : F → EndZ (V ), we can define a scalar
multiplication by a · v := φ(a)(v). Clearly, this is a one-to-one correspondence.
Examples.
1. 0, F and F n for any non-negative integer n.
2. Let I be a set. Then F I , the set of all functions from I to F is a vector space over F under
point-wise addition and scalar multiplication.
This is a special case of a more general phenomenon. If X is a set and G is an algebraic
structure then GX , under point-wise operations, tend to reflect the algebraic structure of G.
F (I) , the set of all functions from I to F which takes nonzero values only at finitely many
elements of I, is a subspace of F I . Note that F I = F (I) iff I is a finite set.
We’ll later see that vector spaces ‘look like’ F (I) whereas dual spaces ‘look like’ F I . In fact
(F (I) )∗ = F I .
3. The set of all m × n matrices over a field F .
4. If E/F is a field extension, then E is automatically a vector space over F .
5. F [X] = ⋃_{n≥0} F n = F (N) .
which contains W .
The set of all linear combinations of a sequence (x1 , ..., xn ) ∈ V n , i.e, the set {λ1 x1 + ... +
λn xn | λ1 , ..., λn ∈ F }, is called the linear span of (x1 , ..., xn ). Note that the linear span of the
sequence (x1 , ..., xn ) is equal to the image of the homomorphism φ : F n → V , where φ(ei ) := xi
for all i; and rearranging the elements of a sequence does not change its linear span.
If S is a subset of V , then the linear span of S, denoted by < S >, is defined to be the union of the
linear spans of all finite sequences of elements of S. Note that < S > = ⋂_{S⊆W ≤V} W , the
intersection running over all subspaces W of V that contain S.
A finite sequence of elements (x1 , ..., xn ) ∈ V n is said to be linearly independent if the natural
homomorphism φ : F n → V , with φ(ei ) := xi for all i, is one-to-one. Note that a linearly independent
sequence of vectors retains its property under any rearrangement of the vectors. A set S ⊆ V is
said to be linearly independent if all finite sequences of distinct elements of S are linearly indepen-
dent.
A vector space V is said to be finitely generated if there exists a finite set S ⊆ V such that
V =< S >.
Operations on subspaces of V .
1. Union: If V1 , V2 ≤ V , then V1 ∪ V2 is a subspace of V iff either V1 ⊆ V2 or V2 ⊆ V1 .
A partially ordered set (Γ, ≤) is said to be directed (upwards) if for any two elements i, j ∈ Γ
there exists an element l ∈ Γ satisfying i ≤ l and j ≤ l. A family of subspaces {Vi } is said to be
a directed system if it forms a directed set under the inclusion relation. The corresponding
union ∪i Vi may then be called a directed union. Note that a directed union of subspaces is
always a subspace.
2. Sum : If V1 , V2 ≤ V then we define their sum as V1 + V2 := {x + y | x ∈ V1 and y ∈ V2 }.
Note that V1 + V2 is a subspace of V . Inductively, we can define a sum of finitely many
subspaces of V . Since addition is a commutative operation in V , the order of the summands
does not change the sum of subspaces. If {Vi }i∈I is a family of subspaces (finite or infinite!)
of V , then the sum of this family, denoted by Σ_i Vi , is defined to be the directed union of
the sums of all finite subfamilies of {Vi }. If I is a finite set then this definition matches with
the definition of the sum of finitely many subspaces. Note that

    Σ_i Vi = ⋂_{∪i Vi ⊆ W ≤ V} W = < ∪i Vi >,

the intersection running over all subspaces W of V containing ∪i Vi .
To see that the inclusion ∪i Vi ⊆ Σ_i Vi can be strict, just look at three distinct lines passing through
the origin in R2 .
Remarks.
3. Any vector space V is a directed union of its finite dimensional subspaces. It’s also a union
of its one dimensional subspaces, although this union is not directed.
4. If V is finitely generated and S ⊆ V is a spanning set then there exists a finite set S0 ⊆ S
such that V =< S0 >.
5. Let S ⊆ V be a linearly independent set. Then for all x ∈ V , S ∪ {x} is linearly independent
iff x ∉ < S >.
Theorem. Let S, T be subsets of a vector space V . If < S >= V and T is linearly independent,
then |T | ≤ |S|.
Exercises.
1. HK : Section 1.6 - 9,12; Section 2.1 - 4,6,7; Section 2.2 - 1,2,5,8 (you only need ch F ≠ 2);
2. Prove that the ring of upper (or lower) triangular n × n matrices is not commutative for all
n ≥ 2.
3. If F is a field prove that both the groups (F, +) and (F ∗ , ·) can be embedded in SL2 (F ).
4. Let Sn , An be the symmetric group and the alternating group of degree n respectively. Prove
that for any commutative ring R, Sn and An can be embedded in GLn (R) and SLn (R)
respectively.
5. Let m > n be positive integers. Show that for all A ∈ Mm×n (F ), B ∈ Mn×m (F ), the product
AB ∈ Mm (F ) is not invertible.
If m ≤ n, can you always find A ∈ Mm×n (F ), B ∈ Mn×m (F ) such that AB = Im ?
6. Let R be a commutative ring with unity. If I is an ideal of R satisfying I ≠ I^2 then prove
that I, under the induced operations, is a commutative ring without unity.
7. (*)
(i) Let V be the vector space of all real-valued functions defined on R. Show that the set
{e^{αx} }α∈R is linearly independent.
(ii) Let V be the vector space of all real-valued functions defined on the unit interval [0, 1].
Show that the set { 1/(x−c) }c∈R\[0,1] is linearly independent.
8. Let S1 , S2 be linearly independent sets of a vector space V . Prove that the image of S1 is
linearly independent in V / < S2 > iff the image of S2 is linearly independent in V / < S1 >.
Further if S1 ∩S2 = ∅, show that the above conditions are equivalent to S1 ∪S2 being linearly
independent.
9. If F is a finite field, find the number of elements in GLn (F ), SLn (F ) and the set of n × n
elementary matrices.
10. Let A ∈ Mn (Z). For any prime number p, let Ap denote the image of A in Mn (Z/pZ). Prove
that rk A ≥ rk Ap for all p, and the equality holds for all but finitely many p.
11. Let A be an n×n matrix over a field F whose every entry is equal to 1. Show that A^{i+1} = nA^i
for all i ≥ 1. Deduce that A is a nilpotent matrix iff ch F | n.
14. A set X ⊆ V is said to be closed under lines (I don’t think that this terminology is standard,
but the concept is!) if l(x, y) ⊆ X for all x, y ∈ X. Then prove that
(i) A set X ⊆ V is closed under lines iff −X is closed under lines iff v + X is closed under
lines for all v ∈ V .
(ii) If Ax = b is a system of m linear equations in n variables then the set of its solutions
is closed under lines.
(iii) If F (≠?) then a non-empty set X ⊆ V is closed under lines iff there exists a vector
v ∈ V and a subspace W ≤ V such that X = v + W .
15. If V1 , V2 are subspaces of a vector space V over F (≠?), then prove that

    V1 + V2 = ⋃_{x∈V1 , y∈V2} l(x, y).
16. Let Ω be a subset of a vector space V over F (≠?). For any set X ⊆ V , we define

    l(X) := ⋃_{x,y∈X} l(x, y).
If 0 ∈ Ω, then we inductively define a sequence of sets by Ω0 := Ω and Ωi+1 := l(Ωi ) for all
i ≥ 0. Now prove that
(a) Ωi ⊆ Ωi+1 for all i.
(b) If Ω ⊆ Ω0 then Ωi ⊆ Ω0i for all i.
(c) If dim < Ω >= n < ∞, then there exists a finite set Γ ⊆ Ω, consisting of n + 1 points
which includes the origin, such that Γn = Ωn =< Ω >.
In general, show that < Ω >= ∪i Ωi .
Lecture 4 (9/10/2020) :
We will continue with the proof of the theorem that ‘If S, T ⊆ V are such that T is linearly inde-
pendent and S is a spanning set then |T | ≤ |S|.’
We have already proved it in the case when S is finite. The ‘proposed proof’ when S is infinite
doesn’t work because if {Ti }i∈N is an increasing sequence of linearly independent sets in V with
injective maps φi : Ti → S such that V = < (S \ φi (Ti )) ∪ Ti > for all i, still (S \ φ̃(∪i Ti )) ∪ (∪i Ti )
may not span V .
For example, let V := F [X], S := {1, 1 + X, 1 + X + X 2 , ...} and T := {X, X 2 , X 3 , ...}. Let Sn , Tn
be the set of first n elements of S, T respectively (strictly speaking, ‘first n elements of a set’ doesn’t
make any sense, but you know what I mean!). Then (S \ Sn ) ∪ Tn is a spanning set of V for all n
but, as you can see, V ≠ < T >.
Lemma. Let X be an infinite set. Then F(X), the set of all finite subsets of X, has the same
cardinality as X.
If S is an infinite spanning set of V , by the above lemma, the elements of F(S) can be indexed
by S, say F(S) = {Si }i∈S . For each i ∈ S, let Vi :=< Si >. Then {Vi }i∈S is a collection of finite
dimensional subspaces of V indexed by S and V = ∪i Vi . If Ti := T ∩ Vi , then T = ∪i Ti . Note that
each Ti is a finite set since |Ti | ≤ |Si | for every i. Now the following lemma, whose proof we leave
as an exercise, finishes the proof of the theorem.
Lemma. Let Ω be an infinite set and X any set. Let {Xi }i∈Ω be a collection of subsets of
X. If each Xi is at most countable, then | ⋃_{i∈Ω} Xi | ≤ |Ω|.
Theorem. Every vector space has a basis.
Proof. We will show that every vector space V contains a maximal linearly independent set
with respect to set inclusion. If V is finitely generated, we can inductively construct such a set.
Otherwise, we have to apply Zorn’s lemma which says that ‘If every chain of a partially ordered
set (P, ≤) has an upper bound then P has a maximal element.’. Now consider the collection of all
linearly independent subsets of V . This is a partially ordered set under set inclusion. If {Ti } is
a chain in this family then ∪i Ti is also linearly independent (in fact, more generally, the union of
any directed family of linearly independent sets is linearly independent), which serves as an upper
bound of the chain. So by Zorn’s lemma, V contains a maximal linearly independent set, which is a
basis: if it did not span V , adjoining any vector outside its span would give a strictly larger linearly
independent set, contradicting maximality.
Question. Can you use Zorn’s lemma to conclude the existence of a minimal generating set?
Remark.
1. Any two bases of a vector space V have the same cardinality. The cardinality of any (and
hence, all) basis of a vector space V is called its dimension and denoted by dimF V , or simply
by dim V if F is understood from the context.
2. If V, W are two vector spaces over a field F then V ≅ W iff dim V = dim W . So once
the base field F is fixed, vector spaces over F , up to isomorphism, are ‘nothing but cardinal
numbers’.
3. Let V, W be two vector spaces over a field F . If there exist injective (respectively surjective)
linear maps φ : V → W and ψ : W → V then V ≅ W . This is analogous to the Schroeder-
Bernstein theorem for sets; and the proof, perhaps not surprisingly, uses the Schroeder-
Bernstein theorem to conclude that dim V = dim W .
4. If I is a set, then the canonical vector space over F with I as a basis is F (I) . In fact, given any
vector space V , there exists a natural one-to-one correspondence between the set maps from
I to V and the vector space homomorphisms from F (I) to V . So F (I) ‘converts’ set-theoretic
maps into vector space homomorphisms.
5. Any linearly independent set of V can be extended to a basis of V .
Definitions. Let {Vα }α∈Γ be a family of vector spaces over F . Then the direct product of
the family, denoted by ∏_{α∈Γ} Vα , is defined to be the set of all functions f : Γ → ∪α Vα such that
f (α) ∈ Vα for all α ∈ Γ. Then pointwise addition and scalar multiplication makes it a vector space
over F .
The (external) direct sum of the family {Vα }α∈Γ , denoted by ⊕_{α∈Γ} Vα , is defined to be the set of
all functions f : Γ → ∪α Vα such that f (α) ∈ Vα for all α ∈ Γ and f (α) ≠ 0 for at most finitely many
α. Then pointwise addition and scalar multiplication makes ⊕_{α∈Γ} Vα a vector space over F .
It is clear from the definitions that ⊕_{α∈Γ} Vα is a subspace of ∏_{α∈Γ} Vα and ⊕_{α∈Γ} Vα = ∏_{α∈Γ} Vα
iff Γ is a finite set.
Remarks.
1. For each α ∈ Γ, there exists a natural inclusion ια : Vα ↪ ⊕_{α∈Γ} Vα , where ια (xα ) is defined
to be the function fxα : Γ → ∪α Vα which takes the value xα at α and 0 everywhere else.
Similarly, for each α ∈ Γ, there exists a natural projection πα : ⊕_{α∈Γ} Vα → Vα , where
f ∈ ⊕_{α∈Γ} Vα is mapped to f (α) ∈ Vα .
It is clear that πα ◦ ια = idVα and πβ ◦ ια = 0 whenever α ≠ β.
Also, one can check that Σ_{α∈Γ} ια ◦ πα is the identity map of ⊕_{α∈Γ} Vα (the possibly infinite sum of linear
operators makes sense because when it is applied to any particular element of ⊕_{α∈Γ} Vα , only
finitely many summands are nonzero). In particular, if we identify Vα with its image ια (Vα ),
then ⊕_{α∈Γ} Vα = < ∪α Vα >. Under this identification, if Bα is a basis of Vα for each α ∈ Γ
then B := ∪α Bα is a basis of ⊕_{α∈Γ} Vα .
Lemma. If V1 , V2 are two subspaces of a vector space V then (V1 + V2 )/V1 ≅ V2 /(V1 ∩ V2 ).
Consequently, if V1 , V2 are finite dimensional then dim (V1 + V2 ) = dim V1 + dim V2 −
dim (V1 ∩ V2 ).
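The dimension formula can be tested numerically by representing V1, V2 as column spaces; a sketch using numpy/scipy (the matrices and the shared direction are a hypothetical example; it assumes A and B have full column rank, so that the null space of [A | −B] has the same dimension as V1 ∩ V2):

import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))                      # V1 = column space of A
B = np.column_stack([A[:, 0] + A[:, 1],              # one direction shared with V1
                     rng.standard_normal(5)])        # V2 = column space of B

dim_V1 = np.linalg.matrix_rank(A)                    # 3
dim_V2 = np.linalg.matrix_rank(B)                    # 2
dim_sum = np.linalg.matrix_rank(np.hstack([A, B]))   # dim(V1 + V2)

# (x, y) lies in the null space of [A | -B] exactly when Ax = By, a vector of V1 ∩ V2.
dim_int = null_space(np.hstack([A, -B])).shape[1]    # dim(V1 ∩ V2) = 1 here

assert dim_sum == dim_V1 + dim_V2 - dim_int          # 4 == 3 + 2 - 1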
Definition. Let R be a commutative ring. By an R-algebra A, we mean an ordered pair
(A, φA : R → A) where A is a ring (not necessarily commutative) and φA is a ring homomorphism,
called the structure-homomorphism of the R-algebra A, such that φA (R) is contained in the center
of A. Note that φA need not be injective. We will often simply say that A is an R-algebra when
the structure homomorphism φA is understood from the context.
As for examples of R-algebras, any commutative ring R is trivially an algebra over itself with the
identity map giving the structure homomorphism. A standard example of an R-algebra is the
polynomial algebra R[X]. We will mostly be interested in R-algebras where R = F is a field so
that, whenever A 6= 0, we can actually identify F as a subring of A contained in the center. For us,
the most important F -algebras will be the matrix rings Mn (F ) and the rings of linear operators
of vector spaces V over F .
By an R-algebra homomorphism between two R-algebras (A, φA : R → A) and (B, φB : R → B)
we mean a ring homomorphism f : A → B such that f ◦ φA = φB , i.e., the triangle formed by
φA , φB and f commutes. Loosely speaking, an R-algebra homomorphism from A to B is a ring homomor-
phism from A to B which does not ‘disturb’ the elements of R.
Exercises.
1. HK : Section 2.3 - 7,8,9 (ch F ≠ 2 suffices),11,13; 2.4 - 6,7.
2. Prove that a vector space V over an infinite field F cannot be written as a finite union of its
proper subspaces.
If V is a vector space of dimension > 1 over a finite field F , show that V can be written as
a union of q + 1 proper subspaces, where |F | = q.
3. (*) Let F be a field and F (X) := Q(F [X]), the field of rational functions in one variable over
F . Show that dimF F (X) = max {ℵ0 , |F |}, where ℵ0 = |N|.
Hint. Can you prove that the set { 1/(X−c) }c∈F is linearly independent over F ?
4. Let V1 , V2 be two subspaces of a vector space V . Let B12 be a basis of V1 ∩ V2 . If B1 ⊆ V1
and B2 ⊆ V2 are two sets such that their images are bases of V1 /(V1 ∩ V2 ) and V2 /(V1 ∩ V2 )
respectively, then prove that B1 ∪ B2 ∪ B12 is a basis of V1 + V2 .
5. Prove that Rm is isomorphic to Rn as abelian groups for any two positive integers m and n.
(*) Also, prove that R ≅ R^N as abelian groups.
Hint. Can you view the above groups as suitable vector spaces?
6. Let E/F be a finite field extension, i.e., E/F is a field extension such that E is a finite
dimensional vector space over F . If V is a finite dimensional vector space over E, then prove
that
dimF V = [E : F ] · dimE V,
where [E : F ] := dimF E.
Hint. If {λi } is a basis of E over F and {ej } is a basis of V over E, then {λi ej } is a basis of
V over F .
7. Let I be a set and E/F a field extension. Then there exists a natural inclusion of F -vector
spaces F (I) ↪ E (I) . If S ⊆ F (I) is a linearly independent set over F , show that S ⊆ E (I)
remains linearly independent over E.
Let E/F be a field extension and V a vector space over E. Then by restriction of scalars,
we can view V as a vector space over F . If S ⊆ V is linearly independent over F does it
remain linearly independent over E?
8. (*) If X is a metric space (or topological space, if you know what it is!), let C(X, F ) denote
the set of all continuous functions from X to F , where F := R/C. If X is homeomorphic to
Y , then prove that C(X, F ) and C(Y, F ) are isomorphic as F -algebras.
9. (*)
i) Let V := C(R, R), the set of all real-valued continuous functions defined on R. Prove
that the set {e^{cx} }c∈R ⊆ V is linearly independent over R.
Deduce that V has uncountable dimension over R; in fact, dimR V = ℵ1 , where ℵ1 := |R|.
ii) Let V := C([0, 1], R), the set of all real-valued continuous functions defined on the closed
unit interval [0, 1]. Then prove that dimR V = ℵ1 .
Note that the natural restriction gives a surjective linear map from C(R, R) to C([0, 1], R),
implying that dimR C([0, 1], R) ≤ dimR C(R, R).
Hint. You may use the ideas of the previous exercises to prove (ii). But given a countable
set of continuous functions S from [0, 1] to R, can you actually construct a function
f ∈ C([0, 1], R) such that f ∉ < S >?
10. (*) Let F := R/C and V := {(an ) ∈ F N | |an |^{1/n} → 0 as n → ∞}. Show that V is a vector
space over F and its dimension is uncountable.
Hint. You may prove it using the previous exercises. But given a countably infinite subset
S of V , can you actually construct a sequence (an ) ∈ V such that (an ) ∉ < S >?
Lecture 5 (12/10/2020) :
Co-ordinate system
Let V be a finite dimensional vector space over F . A finite sequence of vectors B := (x1 , ..., xn ) ∈
V n is called an ordered basis of V if the natural linear map φB : F n → V , given by ei ↦ xi for all i,
is an isomorphism. Equivalently, (x1 , ..., xn ) is a linearly independent sequence whose linear span
is the whole of V . If α ∈ V , then the co-ordinate representation of α with respect to B, denoted
by [α]B , is the pre-image of α with respect to φB . If B′ := (x′1 , ..., x′n ) ∈ V n is another ordered
basis of V , let f : V → V be the isomorphism defined by f (xi ) := x′i for all i, and let θf : F n → F n
be the linear map with θf (ei ) := e′i , where e′i := [x′i ]B , the co-ordinate vector of x′i with respect
to B. Then the square formed by θf , f and the co-ordinate maps commutes, i.e., f ◦ φB = φB ◦ θf .
Let α ∈ V and [α]B , [α]B′ be the corresponding co-ordinate representations with respect to the
ordered bases B, B′ respectively. Let P be the n × n invertible matrix whose columns are the
vectors e′1 , ..., e′n . For any α ∈ V , if we write [α]B , [α]B′ as n × 1 column matrices, then
[α]B = P [α]B′ , or equivalently, [α]B′ = P −1 [α]B . Note that the ordered bases of F n are in
one-to-one correspondence with the elements of GLn (F ), the set of n × n invertible matrices over
F . Writing the vectors of an ordered basis of F n as column vectors gives us an invertible matrix.
Conversely, the column vectors of an invertible matrix give an ordered basis of F n . Similarly, if
V is an n-dimensional vector space over F , then the ordered bases of V are in one-to-one corre-
spondence with the elements of GL(V ), the group of invertible linear operators of V .
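A quick numerical illustration of the change of co-ordinates [α]B = P [α]B′ (Python/numpy; B is taken to be the standard basis of R^2, and B′ and α are a hypothetical example):

import numpy as np

# Columns of P are the B-coordinates of the vectors of B' = ((1,1), (1,-1)).
P = np.array([[1., 1.],
              [1., -1.]])

alpha_B = np.array([3., 5.])              # [alpha]_B
alpha_Bp = np.linalg.solve(P, alpha_B)    # [alpha]_B' = P^{-1} [alpha]_B

# Reassembling alpha from its B'-coordinates recovers the same vector.
assert np.allclose(P @ alpha_Bp, alpha_B)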
Linear Transformations
Let V, W be vector spaces over a field F . Then a set-theoretic map T : V → W is called a linear
transformation from V to W if the following diagrams commute: T ◦ +V = +W ◦ (T × T ) and
T ◦ ·V = ·W ◦ (idF × T ), where (T × T )(x, y) := (T x, T y) and (idF × T )(λ, x) := (λ, T x). In other
words, T (x + y) = T (x) + T (y) and T (λx) = λ T (x) for all λ ∈ F and x, y ∈ V .
In short linear transformations are precisely those set-theoretic maps which preserve linear combi-
nations.
Remark. For various algebraic structures like groups, rings, vector spaces etc. morphisms
(homomorphisms, if you like) are defined to be those set-theoretic maps which respect the relevant
structure, and we can represent this property by using various diagrams as above.
A linear operator T : V → V is said to be nilpotent if there exists a positive integer n such that
T n = 0. T is said to be locally nilpotent if for all x ∈ V , there exists a positive integer nx (de-
pending on x) such that T nx (x) = 0. Note that a nilpotent operator is locally nilpotent and the
converse is true if V is finite dimensional. However, a locally nilpotent operator, in general, need
not be nilpotent. For example, if V := R[X] and D : V → V is the usual differential operator, i.e.,
D(f (X)) := f ′ (X), then D is a locally nilpotent operator which is not nilpotent.
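The example of the differential operator can be played with symbolically; a small sketch using sympy (the particular polynomial and the exponent are a hypothetical example):

import sympy as sp

x = sp.symbols('x')

def D(f):
    # the differentiation operator on R[x]
    return sp.diff(f, x)

f = 3*x**4 - x + 2
g = f
for _ in range(5):          # deg f + 1 applications of D annihilate f,
    g = D(g)                # so D is locally nilpotent ...
assert g == 0

# ... but no fixed power of D kills every polynomial: D^n(x^n) = n! is nonzero.
n = 7
assert sp.diff(x**n, x, n) == sp.factorial(n)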
If T is a linear operator on V , then W ≤ V is said to be T -invariant if T (W ) ⊆ W . A linear
operator T is said to be locally finite if for all x ∈ V , the linear span of the set {x, T x, T 2 x, T 3 x, ...}
is finite dimensional. Note that < {x, T x, T 2 x, T 3 x, ...} > is the smallest T -invariant subspace
of V which contains x. Consequently, T is locally finite iff every x ∈ V is contained in a finite
dimensional T -invariant subspace, or equivalently, V is a union of finite-dimensional T -invariant
subspaces. Note that a locally nilpotent operator is automatically locally finite. Also, every linear
operator on a finite-dimensional vector space is locally finite.
Examples.
1. Let V be the set of infinitely differentiable real-valued functions defined on R. Then the
usual differential operator D : V → V is a linear operator. Note that Ker D^n consists of all
polynomials of degree < n and ⋃_{n≥0} Ker D^n = R[X].
Remarks.
1. If F = R/C, then every linear transformation T : F n → F m is continuous.
2. If V, W are vector spaces over F , then LF (V, W ) (or simply L(V, W )), the set of all linear
transformations from V to W , is also a vector space over F . If dim V = n and dim W = m
then dim L(V, W ) = mn.
If V = W , we usually denote L(V, V ) by just L(V ) and this is an F -algebra.
3. If T, T ′ ∈ L(V, W ) then T = T ′ iff they match on a generating set of V .
4. Let T : V → W be a linear transformation. Then T is injective iff ker T = 0 iff T takes
every linearly independent set of V to a linearly independent set of W iff there exists a linear
transformation T 0 : W → V such that T 0 ◦ T = idV .
T is surjective iff coker T = 0 iff T takes every spanning set of V to a spanning set of W iff
there exists a linear transformation T 0 : W → V such that T ◦ T 0 = idW .
T is bijective iff T takes a (respectively every) basis of V to a basis of W iff there exists a linear
transformation T 0 : W → V such that T 0 ◦ T = idV and T ◦ T 0 = idW .
Note that if there exist linear transformations T 0 , T 00 : W → V such that T 0 ◦ T = idV and
T ◦ T 00 = idW then T 0 = T 00 and T is an isomorphism.
If T is a linear operator on a finite-dimensional vector space V , then by rank-nullity theorem,
T is injective iff it’s surjective iff it’s an isomorphism.
5. Let V be a finite-dimensional vector space over F = R/C with two ordered bases B :=
(x1 , ..., xn ) and B′ := (x′1 , ..., x′n ). If φB and φB′ are the corresponding co-ordinate maps from
F n to V , then we can give two different metrics on V by defining d(x, y) := |φB −1 (x) − φB −1 (y)|
and d′ (x, y) := |φB′ −1 (x) − φB′ −1 (y)|. Then d, d′ are topologically equivalent, i.e., they induce the
same topology on V , or, in other words, a set X ⊆ V is open with respect to d iff it’s open
with respect to d′ . To see this, note that d, d′ are topologically equivalent iff the identity map
from (V, d) to (V, d′ ) is a homeomorphism. Now consider the invertible linear operator T of F n
satisfying φB′ ◦ T = φB ; in fact, T (ei ) = φB′ −1 (xi ) for all i. Note that, by construction,
φB : (F n , | |) → (V, d) and φB′ : (F n , | |) → (V, d′ ) are homeomorphisms. Also, T , being an
invertible linear operator, is a homeomorphism of (F n , | |). Therefore idV : (V, d) → (V, d′ ) is also
a homeomorphism. So if V is a finite-dimensional vector space over F = R/C, then we can give it
a topology which is independent of any particular choice of the co-ordinate system.
Exercises.
(1) HK : Section 3.2 - 1,5,7,8,9,11,12; Section 3.3 - 7.
(2) (*) If T : R2 → R2 is a continuous map such that T n is a linear operator for all n ≥ 2,
is T a linear operator?
(3) Let T : Rn → Rm be a group homomorphism. Show that the following statements are
equivalent.
(i) T is uniformly continuous.
(ii) T is continuous.
(iii) T is continuous at the origin.
(iv) There exists a point x ∈ Rn such that T is continuous at x.
(4) (*)
(i) Let T : Rn → Rm be a map satisfying T (λx) = λT (x) for all λ ∈ R and x ∈ Rn . Is
T continuous?
(ii) If T : Rn → Rm is a continuous map satisfying T (λx) = λT (x) for all λ ∈ R and
x ∈ Rn , is T a linear transformation?
Hint: You may visualize in R2 .
(5) (*) Let T : V → W be a linear transformation. If T 0 : W → V is a set-theoretic map such
that T 0 ◦ T = idV and T ◦ T 0 = idW , then prove that T 0 is also a linear transformation.
Does the same conclusion hold if we only assume that T 0 ◦ T = idV (or T ◦ T 0 = idW )?
Prove that T is an isomorphism iff there exist linear transformations T 0 , T 00 : W → V
such that T 0 ◦ T is an invertible linear operator on V and T ◦ T 00 is an invertible linear
operator on W .
6. Give examples of two linear operators T, T ′ ∈ L(V ) such that T ◦ T ′ = idV but T ′ ◦ T ≠ idV .
7. Let T : V → V be a linear operator.
i) Prove that {ker T n }n≥1 is an increasing and {im T n }n≥1 is a decreasing sequence of
subspaces in V .
ii) If ker T i = ker T i+1 for some positive integer i, then ker T i = ker T i+j for all j ≥ 0.
Similarly, if im T i = im T i+1 for some positive integer i, then im T i = im T i+j for all
j ≥ 0.
If V is finite-dimensional then ker T i = ker T i+1 iff im T i = im T i+1 .
iii) ker T = ker T 2 iff ker T ∩ im T = 0.
iv) If V is a vector space of dimension n, then V = ker T n ⊕ im T n .
v) If V ′ := ⋃i ker T i then V ′ is a T -invariant subspace of V and T |V ′ is a locally nilpotent
operator which is nilpotent iff there exists a positive integer n such that ker T n =
ker T n+1 .
vi) Give an example of a linear operator T : V → V such that ker T i ∩ im T i ≠ 0 for all
i ≥ 1.
8. If V is a vector space of dimension > 1, prove that L(V ) is not a commutative ring.
12. Let V be a vector space and S, T ∈ L(V ) such that ST = T S. Prove that ker T and im T
are S-invariant and vice versa.
13. (*) Let T be a linear operator on an n-dimensional vector space V . Prove that there exists
a T -invariant decomposition V = V1 ⊕ V2 such that T |V1 is nilpotent and T |V2 is invertible
with dim V2 = rk T n so that rk T |V1 = rk T − rk T n .
Can you see how the matrix representation of T looks like with respect to a basis B1 ∪ B2 ,
where B1 is an ordered basis of V1 and B2 is an ordered basis of V2 ?
14. Let F be a field and Pol (F ) the set of all polynomial functions on F , i.e., all functions
F → F of the form x ↦ a0 + a1 x + · · · + an x^n with a0 , ..., an ∈ F .
Show that the natural F -algebra homomorphism from F [X] to Pol (F ) which sends a poly-
nomial to the corresponding polynomial function, is injective iff F is infinite. If F is finite,
can you describe the kernel of this map?
If F is finite, prove that every function from F to F is a polynomial function.
15. Let V, W be vector spaces over a field F and σ : F → F a ring homomorphism. Let
f : V → W be a group homomorphism satisfying f (λv) = σ(λ)f (v) for all λ ∈ F and v ∈ V
(if σ = idF then we get the definition of linear transformation).
Prove that ker f is a subspace of V ; and if σ is surjective then im f is a subspace of W .
Let x1 , ..., xn ∈ V . If f (x1 ), ..., f (xn ) are linearly independent, prove that x1 , ..., xn are
linearly independent. The converse is also true if f is injective and σ is surjective.
If σ is surjective then S ⊆ V is a spanning set of V iff f (S) ⊆ im f is a spanning set of im f .
Lecture 6 (14/10/2020) :
Matrix representation of linear transformations
Let V, W be vector spaces over F of dimension n, m respectively and T : V → W a linear trans-
formation. Let BV := (x1 , ..., xn ) and BW := (y1 , ..., ym ) be ordered bases of V, W respectively.
We want to find the matrix representation of T with respect to these ordered bases. The ma-
trix, denoted by [T ]BV ,BW , is the unique matrix in Mm×n (F ) (viewed as a linear map F n → F m )
making the following square commutative: T ◦ φBV = φBW ◦ [T ]BV ,BW ,
where φBV is the isomorphism which sends the natural basis vectors of F n to xi ’s and φBW is the
isomorphism which sends the natural basis vectors of F m to yj ’s. The (i, j)-th entry of [T ]BV ,BW
is the coefficient of yi in T (xj ).
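Concretely, [T ]BV ,BW can be computed by solving the commuting square T ◦ φBV = φBW ◦ [T ]BV ,BW column by column; a short numpy sketch (the map T and the two bases, given as the columns of BV and BW, are a hypothetical example):

import numpy as np

# T : R^3 -> R^2, written in the standard bases.
T_std = np.array([[1., 0., 2.],
                  [0., 3., 1.]])

BV = np.array([[1., 1., 0.],      # columns: the ordered basis of R^3
               [0., 1., 1.],
               [0., 0., 1.]])
BW = np.array([[2., 1.],          # columns: the ordered basis of R^2
               [1., 1.]])

# T_std @ BV = BW @ M, so M = [T]_{BV,BW} is obtained by solving a linear system.
M = np.linalg.solve(BW, T_std @ BV)

assert np.allclose(T_std @ BV, BW @ M)   # the square commutes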
Remarks.
1. For any fixed pair of ordered bases BV , BW , we get a vector space isomorphism ΦBV ,BW :
L(V, W ) → Mm×n (F ), given by T ↦ [T ]BV ,BW .
If V = W and BV = BW = B, then we denote the matrix corresponding to T by [T ]B , and
the map from L(V ) to Mn (F ), given by T ↦ [T ]B , is an F -algebra isomorphism.
2. If V, W are vector spaces of dimension n, m respectively, the set of all possible isomorphisms
from L(V, W ) to Mm×n (F ) is parametrized by the set of ordered pairs (BV , BW ) where
BV is an ordered basis of V and BW is an ordered basis of W .
In particular, the ordered bases of V parametrize the set of isomorphisms from L(V ) to
Mn (F ).
3. If U, V, W are vector spaces with ordered bases BU , BV , BW and dimension n, m, p respec-
tively, then for any linear transformations T : U → V and T 0 : V → W , [T 0 ◦ T ]BU ,BW =
[T 0 ]BV ,BW [T ]BU ,BV .
And this is why the matrix multiplication rule may appear somewhat peculiar at first glance:
it is designed to ‘reflect’ the composition of two linear transformations.
If T : V → W is an isomorphism, then [T −1 ]BW ,BV = ([T ]BV ,BW )−1 .
For the details, one may look at Hoffman and Kunze, section 3.4.
Change of basis. Let T : V → W be a linear transformation and BV , BW ordered bases of
V, W respectively. Let A := [T ]BV ,BW , so that [T x]BW = A [x]BV for all x ∈ V .
If B′V , B′W is another pair of ordered bases, how can we find A′ := [T ]B′V ,B′W ?
Note that, by change of co-ordinates, we have [x]BV = PV [x]B′V and [T x]BW = PW [T x]B′W ,
where PV is the invertible matrix whose columns are the vectors of B′V represented with respect
to BV , and similarly PW . Therefore we get that

    A′ = PW −1 A PV .
Definition. Two matrices A, B ∈ Mn (F ) are said to be similar, denoted by A ∼ B, if there
exists an invertible matrix P ∈ GLn (F ) such that B = P −1 AP .
Observe that similarity is an equivalence relation. For a matrix A ∈ Mn (F ), if TA : F n → F n is
the corresponding linear operator whose matrix representation with respect to the natural ordered
basis of F n is A, then a similar matrix B = P −1 AP is the matrix representation of TA with respect
to the ordered basis given by the column vectors of P . As a result, for the purpose of discussing
linear operators, similar matrices are often considered ‘same’; and important invariants associated
to matrices, like trace, determinant etc., always take the same value on similar matrices.
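As a quick check of the last sentence, here is a numpy sketch (the matrices A and P are a hypothetical example) verifying that a matrix and a similar matrix have the same trace and determinant:

import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])
P = np.array([[1., 1.],
              [1., 2.]])                 # invertible (det = 1)

B = np.linalg.inv(P) @ A @ P             # B = P^{-1} A P, so A ~ B

assert np.isclose(np.trace(A), np.trace(B))
assert np.isclose(np.linalg.det(A), np.linalg.det(B))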
Lagrange’s interpolation
Lemma. Let λ1 , ..., λn ∈ F be distinct elements. Then the corresponding Vandermonde matrix

    V(λ1 ,...,λn ) :=
    [ 1  λ1  λ1^2  · · ·  λ1^{n−1} ]
    [ 1  λ2  λ2^2  · · ·  λ2^{n−1} ]
    [ ⋮   ⋮    ⋮             ⋮     ]
    [ 1  λn  λn^2  · · ·  λn^{n−1} ]

is invertible.
Proof. Let us consider the system of homogeneous linear equations V(λ1 ,...,λn ) X = 0. If it has
a non-trivial solution, say (c0 , c1 , ..., cn−1 ) ∈ F n , then the polynomial f (X) := c0 + c1 X + ... +
cn−1 X n−1 ∈ F [X] is a nonzero polynomial of degree at most n − 1, and hence has at most n − 1
roots in F . But f (λi ) = 0 for all i = 1, ..., n, a contradiction.
Theorem (Lagrange’s interpolation). Let λ1 , ..., λn ∈ F be distinct elements. Then for all
c1 , ..., cn ∈ F there exists a unique polynomial f (X) ∈ F [X] of degree < n such that f (λi ) = ci for
all i. Here we follow the standard convention where the degree of the zero polynomial is taken to
be either −1 or −∞.
Proof. We’ll give two proofs - one using linear algebra and the other using ring theory.
First proof: Let C be the n × 1 column vector given by (c1 , ..., cn ). As the Vandermonde matrix
V(λ1 ,...,λn ) is invertible, the system of linear equations V(λ1 ,...,λn ) X = C has a unique solution,
say (a0 , ..., an−1 ) ∈ F n , which can be obtained by using Cramer’s rule. Then the polynomial
f (X) := a0 + a1 X + ... + an−1 X n−1 satisfies the given conditions.
Second Proof: First, for each i we’ll construct a polynomial fi of degree < n such that fi (λj ) = δij .
Then f (X) := Σ_i ci fi (X) does the job.
As fi (λj ) = 0 for all j ≠ i, fi (X) must be divisible by the product ∏_{j≠i} (X − λj ). Now the
condition fi (λi ) = 1 forces the polynomial to be

    fi (X) = ∏_{j≠i} (X − λj )/(λi − λj ).
The uniqueness of the polynomial follows from the fact that if f (X), g(X) ∈ F [X] are two poly-
nomials satisfying the given conditions then each λi is a root of the polynomial f (X) − g(X),
which has degree < n and n distinct roots, and hence must be the zero polynomial.
Note that if the condition that the polynomial has degree < n is dropped, then we cannot retain
the uniqueness part of the claim.
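Both proofs translate directly into computations over R; a small numpy sketch (the nodes λi and values ci are a hypothetical example):

import numpy as np
from numpy.polynomial import Polynomial

lam = np.array([0., 1., 2., 4.])          # distinct nodes
c = np.array([1., 3., 2., 5.])            # prescribed values f(lam_i) = c_i

# First proof: solve the Vandermonde system for the coefficients of f.
V = np.vander(lam, increasing=True)       # rows (1, lam_i, lam_i^2, lam_i^3)
a = np.linalg.solve(V, c)
f = Polynomial(a)
assert np.allclose(f(lam), c)

# Second proof: f = sum_i c_i f_i with f_i(lam_j) = delta_ij.
def f_i(i):
    others = np.delete(lam, i)
    return Polynomial.fromroots(others) / np.prod(lam[i] - others)

g = sum(float(c[i]) * f_i(i) for i in range(len(lam)))
assert np.allclose(g(lam), c)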
Schroeder-Bernstein Theorem.
Statement. Let A, B be two sets. If there exist injective maps φ : A → B and ψ : B → A, then
there exists a bijection between A and B.
Axiom of choice. Let {Xi }i∈I be a non-empty family of non-empty sets. Then ∏_{i∈I} Xi ≠ ∅,
where ∏_{i∈I} Xi is the set of all maps f : I → ∪i∈I Xi such that f (i) ∈ Xi for all i.
Lemma. Let A, B be two non-empty sets. Then there exists an injective map φ : A → B iff
there exists a surjective map ψ : B → A.
Proof. Let φ : A → B be an injective map. Let a ∈ A. Then we can define a surjective map
ψ : B → A as follows:

    ψ(b) := φ−1 (b) if b ∈ φ(A), and ψ(b) := a otherwise.
Note that ψ ◦ φ = idA , and if φ is not surjective then ψ is unique iff A is a singleton set.
Conversely, suppose that ψ : B → A is a surjective map. For each a ∈ A, let Fa := ψ −1 (a), the
fiber over a. By axiom of choice, ∏_{a∈A} Fa is non-empty, which gives us an injective map from A
to B. In fact, for any such injective map φ : A → B, ψ ◦ φ = idA .
Corollary. In view of the above lemma, if A, B are two non-empty sets then there exists
a bijection from A to B iff there exist injective (respectively surjective) maps φ : A → B and
ψ : B → A.
Exercises.
1. HK : Section 3.4 - 6,8,9,10,11,12.
2. Let λ1 , ..., λn be n distinct elements in a field F . Show that there exist n elements c1 , ..., cn ∈
F such that there does not exist any polynomial f (X) ∈ F [X] of degree < n − 1 such that
f (λi ) = ci for all i.
3. Let λ1 , ..., λn ∈ F be distinct elements and c1 , ..., cn ∈ F . If g(X) ∈ F [X] is a polynomial
such that g(λi ) 6= 0 for all i, then there exists a polynomial f (X) ∈ F [X], divisible by g(X),
such that f (λi ) = ci for all i.
4. Let T be a linear operator on a vector space V . If T commutes with every invertible operator
on V , prove that T is a scalar operator.
Deduce that if A ∈ Mn (F ) commutes with all elements of SLn (F ) then A = λI, for some
λ ∈ F.
In particular, the center of Mn (F ) consists of precisely the scalar matrices.
Hint: If T : V → V is a linear operator such that T (v) ∈< v > for all v ∈ V , can you show
that T is a scalar operator?
5. (*) For A ∈ Mn (F ), let Sim (A) denote the set of matrices which are similar to A. Prove
that Sim (A) is a singleton set iff A is a scalar matrix.
If A ∈ Mn (F ) is not a scalar matrix, show that there exists a set-theoretic injective map
from F to Sim (A). In particular, |Sim (A)| = |F | whenever F is infinite.
Deduce that there exist uncountably many R-algebra embeddings of C in M2 (R).
6. (*) Let F be a field of characteristic zero and A ∈ Mn (F ) has trace zero. Then prove that A
is similar to a matrix B whose every diagonal entry is equal to zero.
Does the above result hold over a field of positive characteristic?
Lecture 7 (16/10/20) :
Dual spaces.
Definitions. If V is a vector space over F , a linear transformation φ : V → F is sometimes
called a linear functional on V ; and we denote the set of all linear functionals on V by V ∗ , i.e.,
V ∗ := L(V, F ). As we already know, V ∗ is a vector space over F . It’s called the dual space (or
simply, dual) of V .
If B := {xi } is a basis of V , we define a set B∗ := {x∗i : V → F | xi ∈ B} ⊆ V ∗ , where x∗i is
defined as x∗i (xj ) := δij for all xj ∈ B.
Lemma. The set B∗ ⊆ V ∗ is linearly independent.
Proof. If possible, let x1 , ..., xn ∈ B be such that x∗1 , ..., x∗n ∈ V ∗ are linearly dependent, and let
φ := Σ_i λi x∗i = 0 be a non-trivial linear relation. But then 0 = φ(xj ) = λj for all j = 1, ..., n,
implying that λi = 0 for all i, a contradiction.
Theorem. If I is an infinite set then dim F I > dim F (I) . Therefore if V := F (I) then
dim V ∗ > dim V .
Proof. Let V := F (I) with a basis B := {xi }i∈I . Since I is infinite, |I| = |I × N|. So we can
partition B into a family of subsets {Bi }i∈I such that each Bi is a countably infinite set. Let
Vi :=< Bi > for all i. As {Bi }i∈I is a partition of B, it’s easy to see that V = ⊕i Vi . If possible,
let S ⊆ V ∗ be a spanning set of V ∗ with |S| = |I|. Since I is infinite, F(S), the collection of all
finite subsets of S, has the same cardinality as I, allowing us to write F(S) as F(S) = {Si }i∈I .
With each Si being finite and Vi being of infinite dimension, we can find an element φi ∈ Vi∗ such
that φi ∉ < Si |Vi >, the span of the restrictions to Vi of the elements of Si . As V = ⊕i Vi , there
exists a unique φ ∈ V ∗ such that φ|Vi = φi for all i ∈ I. But then φ ∉ ⋃_{i∈I} < Si > = V ∗ , a
contradiction.
Examples.
1. Let X be a metric space and V := C(X, R), the set of all continuous functions from X
to R. Then for each x ∈ X, we can define the evaluation map at x, denoted by evx , as
evx (f ) := f (x) for all f ∈ V . This induces a vector space homomorphism from R(X) to V ∗ .
Can you see that this map is injective?
2. Let V := C([a, b], R), the set of all real valued continuous functions defined on the closed
interval [a, b]. Then the integration operator ∫_a^b : V → R, defined as ∫_a^b f := ∫_a^b f (t) dt, is a
linear functional.
Remarks.
1. To study a mathematical object, it’s a standard practice to focus on a set of ‘nice’ functions
defined on it. If V := F n then the most natural functions on V are the natural projections/co-
ordinate functions and V ∗ is nothing but the linear span of these projections in F V , the set
of all functions from V to F which is a vector space over F in its own right.
2. If V is a vector space with a basis B, then the set-theoretic map B → B∗ with xi ↦
x∗i , induces an injective linear map from V to V ∗ . This is an isomorphism iff V is finite-
dimensional.
However, even if V is finite-dimensional, the isomorphism between V and V ∗ is not ‘natural’,
i.e., it depends on the chosen basis B of V , so the isomorphism is not co-ordinate-free. While
discussing double dual V ∗∗ , we’ll see that for a finite-dimensional vector space V , there’s a
natural isomorphism between V and V ∗∗ which does not depend on any particular choice of
basis of V .
3. If V := F (I) then V ∗ = F I . So in some sense, ‘vector spaces are like direct sums and dual
spaces are like direct products’.
4. If V is finite-dimensional then dim V = dim V ∗ .
5. For a field F , Lagrange’s interpolation tells us that F (F ) ↪ F [X]∗ , where we send an element
of F to the corresponding evaluation map, viz., a ↦ eva , where eva (f (X)) := f (a) for all
f (X) ∈ F [X]. In particular, if F is uncountable like F = R/C, then the dual of F [X] is
not countably generated. Now F [X] is not countably generated even when F is finite or
countable, but that requires a different proof.
6. If x ∈ V then x = 0 iff φ(x) = 0 for all φ ∈ V ∗ .
7. Let E := (e1 , ..., en ) be the natural ordered basis of F n . Then a homogeneous linear equation
Σ_i ai Xi = 0 may be viewed as a linear functional on F n , viz., Σ_i ai e∗i , where e∗i is the i-th
co-ordinate function on F n ; and the solution set of the linear equation is nothing but the
hyperplane defined by this linear functional, i.e., the kernel of this linear functional.
In fact, if V is a finite-dimensional vector space with an ordered basis B := (x1 , ..., xn ), then
by virtue of the commutative square formed by the map ei ↦ e∗i : F n → (F n )∗ (top), the
co-ordinate isomorphisms φB : ei ↦ xi and φB∗ : e∗i ↦ x∗i (the vertical maps), and the map
xi ↦ x∗i : V → V ∗ (bottom),
every linear functional is ‘like’ a homogeneous linear equation. Linear functionals, therefore,
may be viewed as a ‘generalization of homogeneous linear equations’.
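To make Remark 7 concrete: a linear functional on R^n given by a row vector has the corresponding hyperplane as its kernel, which is exactly the solution set of the homogeneous equation. A small numpy/scipy sketch (the coefficients are a hypothetical example):

import numpy as np
from scipy.linalg import null_space

a = np.array([[2., -1., 3.]])      # the functional 2 e1* - e2* + 3 e3*, i.e. 2x1 - x2 + 3x3 = 0

H = null_space(a)                  # a basis of the hyperplane ker(phi) in R^3
assert H.shape[1] == 2             # codimension 1
assert np.allclose(a @ H, 0.0)     # every basis vector solves the equation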
Lemma. Let T : V → W be a linear transformation. We define the transpose of T , denoted
by T t , as a linear transformation T t : W ∗ → V ∗ with T t (φ) := φ ◦ T for all φ ∈ W ∗ . Then T is
injective (respectively surjective) iff T t is surjective (respectively injective).
Proof. Left as an exercise.
Definition. For a subset S ⊆ V , its annihilator is defined as S ◦ := {φ ∈ V ∗ | S ⊆ ker φ}.
Proposition. Let W ≤ V be a subspace. Then (i) the restriction map V ∗ → W ∗ , φ ↦ φ|W , is
surjective with kernel W ◦ (so W ∗ ≅ V ∗ /W ◦ ); and (ii) W ◦ ≅ (V /W )∗ .
Proof. (i) If W ≤ V , we’ve a natural linear map from V ∗ to W ∗ given by φ ↦ φ|W (formally,
this is obtained by composing φ with the natural inclusion ι : W ,→ V ). Note that this map is
surjective with the kernel being W ◦ .
(ii) Consider the natural projection π : V → V /W and a linear functional φ : V → F . There
exists φ̄ : V /W → F with φ̄ ◦ π = φ iff W ⊆ ker φ. Therefore every element of W ◦ induces a
linear functional on V /W ; and this results in an isomorphism W ◦ ≅ (V /W )∗ .
Corollary 1. If φ, ψ ∈ V ∗ then < φ > = < ψ > iff ker φ = ker ψ. So hyperplanes determine
nonzero linear functionals up to nonzero constants.
Corollary 2. If W ≤ V is a subspace of finite codimension r, then W is an intersection of r
hyperplanes of V .
Proof. Replacing V by V /W , we’ve to show that if dim V = r then there exist r hyperplanes
in V , say H1 , ..., Hr , such that ⋂_i Hi = 0. If B = (x1 , ..., xr ) is an ordered basis of V , then we can
define Hi := ker x∗i , so that we have ⋂_i Hi = 0.
Lemma. If H1 , ..., Hm are hyperplanes in V , then H := ⋂_i Hi has codimension ≤ m. It is
equal to m iff the corresponding linear functionals are linearly independent.
Proof. The first assertion follows from the fact that codim (Hi ∩ Hj ) ≤ 2 for all i, j, together
with induction.
Let φ1 , ..., φm ∈ V ∗ be such that Hi = ker φi for all i and let W := ⋂_i Hi . Note that φ1 , ..., φm ∈ W ◦ .
As W ◦ ≅ (V /W )∗ , if codim W < m, then the φi ’s cannot be linearly independent. Conversely, if
some φi is linearly dependent on the other φj ’s, then W = ⋂_{j≠i} Hj , implying that codim W < m.
Corollary. If a matrix A ∈ Mm×n (F ) has row rank r and column rank c, then r = c.
Exercises.
is infinite.
5. (*) Let E/F be a field extension with F being infinite. If A, B ∈ Mn (F ), then prove that A
and B are similar over F iff they are similar over E.
The result is true even without assuming that F is infinite, but the only proof known to me
uses ‘fundamental theorem of modules over a PID’ which you’ll learn in Linear Algebra II.
Hint. Let P ∈ GLn (E) such that AP = P B. Then we can find finitely many elements
λ1 , ..., λr ∈ E, linearly independent over F , such that each entry of P can be written as an
F -linear combination of λ1 , ..., λr . Note that we can then write P as P = λ1 P1 + ... + λr Pr ,
with P1 , ..., Pr ∈ Mn (F ). As the sequence λ1 , ..., λr is linearly independent over F , APi = Pi B
for all i = 1, ..., r. Let R := F [T1 , ..., Tr ] be the polynomial ring in r variables over F and
consider the matrix P̃ := T1 P1 + ... + Tr Pr ∈ Mn (R). Note that det P̃ is a polynomial, say
φ(T1 , ..., Tr ) ∈ F [T1 , ..., Tr ], which is a nonzero polynomial because φ(λ1 , ..., λr ) ≠ 0. Since
F is infinite, we can find (α1 , ..., αr ) ∈ F r such that φ(α1 , ..., αr ) ≠ 0.
Lecture 8 (2/11/20) :
Double dual
Definition. The double dual of a vector space V , denoted by V ∗∗ , is defined to be the dual of V ∗ ,
i.e., V ∗∗ = (V ∗ )∗ := L(V ∗ , F ).
Lemma. If V is a vector space, there exists a natural (co-ordinate free, for our purpose) injec-
tive linear transformation from V to V ∗∗ which is surjective iff V is finite dimensional.
Lemma. Let V be a finite-dimensional vector space. Then every basis of V ∗ is the dual basis of
some basis of V .
Proof. Let B′ := {f1 , ..., fn } be a basis of V ∗ . If (B′ )∗ = {f1∗ , ..., fn∗ } is the dual basis of B′
in V ∗∗ , then the natural isomorphism V ≅ V ∗∗ allows us to get n elements x1 , ..., xn ∈ V such
that Lxi = fi∗ for all i = 1, ..., n. As fi∗ (fj ) = Lxi (fj ) = fj (xi ) = δij for all i, j, we conclude that
B′ is the dual basis of B := {x1 , ..., xn }.
Alternatively, let Hi be the hyperplane defined by fi . Then for each i, Vi := ⋂_{j≠i} Hj is one-
dimensional and Vi ⊄ Hi . So for each i, we can find a vector xi ∈ Vi \ Hi such that fi (xi ) = 1.
Then it is easy to see that B′ is the dual basis of B := {x1 , ..., xn }.
Remarks.
5. If f : V → F is a linear functional then ker f ⊆ f ◦ (after identifying V with its image in
V ∗∗ ), and equality holds iff V is finite-dimensional.
6. If x ∈ V then x = 0 iff f (x) = 0 for all f ∈ V ∗ . Similarly, for f ∈ V ∗ , f = 0 iff f (x) = 0
for all x ∈ V or, in other words, Lx (f ) = 0 for all x ∈ V . The duality relations between V
and V ∗ are especially useful when V is finite-dimensional, because then V can be naturally
identified with V ∗∗ .
7. Let AX = 0 and BX = 0 be two systems of homogeneous linear equations in n variables.
Recall that A is said to be row-equivalent to B iff they have the same row space. Then
A is row-equivalent to B iff the two systems have the same solution set. To see this, let
RA and RB be the row spaces of A, B respectively. If the linear equations are considered as
linear functionals on F n , then the kernel of each linear functional gives the solution set of
the corresponding equation. If the two systems have the same solution set then RA◦ = RB◦ ,
implying that (RA )◦◦ = (RB )◦◦ . As RA , RB can be naturally identified with (RA )◦◦ , (RB )◦◦
respectively, it follows that RA = RB , i.e., A and B are row-equivalent.
Lemma. Let W ≤ V be a subspace and f1 , ..., fr ∈ W ◦ . Then f1 , ..., fr are linearly independent
in V ∗ iff the induced functionals f̄1 , ..., f̄r are linearly independent in (V /W )∗ .
Proof. Note that f1 , ..., fr ∈ W ◦ and f̄i is the image of fi under the natural linear transfor-
mation from W ◦ to (V /W )∗ . Therefore if f1 , ..., fr are linearly dependent, so are f̄1 , ..., f̄r .
Conversely, let λ1 f̄1 + · · · + λr f̄r = 0 be a non-trivial linear relation. If f1 , ..., fr were linearly
independent, then there would exist an element x ∈ V such that (λ1 f1 + · · · + λr fr )(x) ≠ 0. But then
(λ1 f̄1 + · · · + λr f̄r )(x̄) := (λ1 f1 + · · · + λr fr )(x) ≠ 0,
a contradiction.
Lemma. Let V be a vector space and f1 , ..., fr ∈ V ∗ linear functionals on V . Then f1 , ..., fr are
linearly independent iff the linear transformation T : V → F r , given by T (x) := (f1 (x), ..., fr (x))
is surjective.
In particular, f1 , · · · , fr are linearly independent iff there exists an r-dimensional subspace V 0 ≤ V ,
such that f1 |V 0 , ..., fr |V 0 is a basis of (V 0 )∗ .
Proof. If fi = Σ_{j≠i} µj fj then clearly fi (x) = Σ_{j≠i} µj fj (x) for all x ∈ V , and therefore the
map cannot be surjective.
Conversely, suppose that f1 , ..., fr are linearly independent. If Hi := ker fi , then recall that
W := ∩i Hi has codimension r in V . Therefore, for each i, we can find an element xi ∈ ∩_{j≠i} Hj \ W
such that fi (xi ) = 1. Then T (xi ) = ei for every i, and hence T is surjective.
We can take V 0 to be the subspace of V generated by x1 , ..., xr .
Remark. In the above lemma, we saw that if f1 , ..., fr ∈ V ∗ are linearly independent, then
there exist x1 , · · · , xr ∈ V such that fi (xj ) = δij for all i, j. Conversely, it’s also true that if
x1 , · · · , xr ∈ V are linearly independent elements, then there exist f1 , · · · , fr ∈ V ∗ , such that
fi (xj ) = δij for all i, j.
Remark. If V, W are two vector spaces over F , then the transpose map from L(V, W ) to
L(W ∗ , V ∗ ) which sends T to T t , is an injective linear transformation. By the dimension argument,
the map is also surjective if V, W are finite dimensional.
If S : U → V , T : V → W are linear transformations, then (T ◦ S)t = S t ◦ T t .
(i) (im T )◦ = ker T t . In particular, if T is surjective then T t is injective. If V, W are finite-
dimensional, then rk T = rk T t .
The factorization T = T̄ ◦ π, where π : V → V /ker T is the quotient map and T̄ : V /ker T → W
the induced injective map, dualizes to the commutative diagram T t = π t ◦ (T̄ )t : W ∗ → (V /ker T )∗ → V ∗ .
Recall that (V /ker T )∗ maps isomorphically onto (ker T )◦ under π t . As T̄ is injective, by (iii),
(T̄ )t is surjective. Therefore, by the commutativity of the diagram, im T t = (ker T )◦ .
Note that we have proved that T is injective (respectively surjective) iff T t is surjective (respec-
tively injective).
Moreover, the natural maps V → V ∗∗ and W → W ∗∗ fit into a commutative square with T : V → W
on one side and T tt : V ∗∗ → W ∗∗ on the other, where T tt = (T t )t . So while discussing finite-dimensional
vector spaces, for all practical purposes, we usually don't make any distinction between V and V ∗∗ .
Proposition. Let V, W be finite dimensional vector spaces with ordered bases BV := (x1 , · · · , xn )
and BW := (y1 , · · · , ym ) respectively. Then [T t ]B∗W ,B∗V = ([T ]BV ,BW )t .
Remark. The above proposition gives a justification for the usual ‘rules’ of matrix transpose
like (A + B)t = At + B t , (AB)t = B t At etc.
It also gives an alternative proof of the fact that the row rank of a matrix is equal to its column rank.
Exercises.
1. HK : Section 3.6 - 1,2,3 (note that V (S, F ) = (F (S) )∗ ); Section 3.7 - 1,2,3,4,5,6 (we only
need ch F = 0, or at least ch F > n),7,8.
2. Give an example of an injective linear operator T ∈ L(V ) such that T t is not injective.
Also, give an example of a linear operator T ∈ L(V ) which is not injective, but T t is injective.
Lecture 9 and 10 (4/11/2020, 6/11/2020) :
We’ll now study the structure of linear operators.
The simplest linear operators, as all of us would agree, are the ones which can be described by a
single number - the scalar operators. But scalar operators are ‘too special’ in the sense that every
nonzero vector space which is not a field admits ‘a lot of’ non-scalar operators. So in terms of
simplicity, next comes the class of linear operators which are ‘made up’ of scalar operators, the
so-called diagonalizable operators. More precisely, a linear operator T : V → V is diagonalizable
if there exists a T -invariant direct sum decomposition V = ⊕i Vi such that T , restricted to each
Vi , acts as a scalar operator, so that T , in some sense, becomes a direct sum of scalar operators.
In general, diagonalizable linear operators turn out to be the most ‘well-behaved’ linear operators
which can be found ‘in abundance’.
The study of linear operators becomes very difficult if the underlying vector space is not finite-
dimensional. So we’ll mostly restrict ourselves to the finite-dimensional case, but the definitions
will be given in a general set-up unless an assumption on the dimension is mandatory. The follow-
ing questions will broadly guide us in the study of linear operators.
Question 1. Is every linear operator diagonalizable? If not, how do we characterize the class of
linear operators which are diagonalizable?
Question 3. If not every linear operator is diagonalizable, what's the next best class of linear
operators and how do we classify its members?
We will try to answer the above questions in the case when V is a finite-dimensional vector space.
Remarks.
1. A set of nonzero vectors {xi } ⊆ V is linearly independent iff the corresponding collection
of one-dimensional subspaces {< xi >} is linearly independent. Note that, although the
inclusion of 0 makes any family of vectors linearly dependent, the inclusion of the zero
subspace has no effect on the linear independence property of a family of subspaces.
2. Let {Vi } be a family of subspaces of V and ⊕i Vi the external direct sum. Then the natural
map ⊕i Vi → V , defined as (xi ) ↦ Σi xi , is injective iff the family {Vi } is linearly indepen-
dent. Note that the image of this map is Σi Vi . Therefore, for a linearly independent family
of subspaces {Vi }, the external direct sum is ‘same’ as the internal direct sum, i.e., they're
naturally isomorphic. As a consequence, if V = Σi Vi then the external direct sum ⊕i Vi is
isomorphic to V iff {Vi } is linearly independent. Note that {Vi } is linearly independent iff
every finite sub-family of {Vi } is linearly independent.
If {V1 , · · · , Vr } is a set of linearly independent subspaces in a finite dimensional vector space
V (Strictly speaking, ‘linear independence’ is a property of the family and not the individual
subspaces!), then V = V1 ⊕ · · · ⊕ Vr iff dim V = dim V1 + · · · + dim Vr .
3. For every linear operator T : V → V , 0 and V are always T -invariant subspaces. Also, ker T
and im T are T -invariant.
If S : V → V is another linear operator such that ST = T S then both ker T and im T are
S-invariant and vice-versa. In particular, every eigenspace of T is S-invariant and vice-versa.
However, a T -invariant subspace of V need not be S-invariant.
4. Let {Vi } be a family of T -invariant subspaces. Then Σi Vi and ∩i Vi are both T -invariant;
and if {Vi } is a directed family, then ∪i Vi is also T -invariant.
5. If T : V → V is a linear operator then {Vλ }λ∈F , the collection of eigenspaces of T , is a
linearly independent family. Consequently, if V is finite dimensional, T can have at most
finitely many eigenvalues. In fact, if dim V = n, then T cannot have more than n distinct
eigenvalues.
Note that T is diagonalizable iff every vector in V can be written as a sum of eigenvectors of T .
If V is finite dimensional then T is diagonalizable iff there exists an ordered basis B of V such
that [T ]B is a diagonal matrix, and hence the name diagonalizable.
Similarly, a matrix A ∈ Mn (F ) is said to be diagonalizable if it's similar to a diagonal matrix. Note
that A is diagonalizable iff every matrix similar to A is also diagonalizable. Therefore a linear
operator T on a finite-dimensional vector space V is diagonalizable iff its matrix representation
with respect to every ordered basis of V is diagonalizable.
Remarks.
1. If T is a diagonalizable operator, so is f (T ) for every polynomial f ∈ F [X].
2. If S, T ∈ L(V ) are diagonalizable operators then S + T, ST need not be diagonalizable.
However, we’ll later see that they are diagonalizable if ST = T S.
How to find the eigenvalues?
With the above discussion, it’s clear that the eigenvalues play a crucial role in the study of linear
operators. But given a linear operator T : V → V , how to find its eigenvalues?
The situation, in general, is quite hopeless if V is not finite-dimensional because it’s not possible to
check whether T − λI is injective or not for every λ ∈ F . The finite-dimensional case, however, is
remarkably simple to deal with as the following lemma gives us a recipe for finding the eigenvalues
of a linear operator.
Definition. Let R be a commutative ring and A ∈ Mn (R). Then the characteristic polynomial
of A, denoted by chA (X), is defined to be the determinant of the matrix XI − A ∈ Mn (R[X]).
Note that if A and B = P −1 AP are similar matrices in Mn (R) then
chB (X) := det (XI − B) = det (XI − P −1 AP ) = det (P −1 (XI − A)P ) = det (XI − A) = chA (X).
As similar matrices have the same characteristic polynomial, for a linear operator T on a finite-
dimensional vector space V , we can define its characteristic polynomial chT (X) to be the charac-
teristic polynomial of any matrix representation of T .
The reason is the following: for any ring homomorphism φ : R[X] → R and its entrywise extension
φ̃ : Mn (R[X]) → Mn (R), we have det (φ̃(M )) = φ(det M ) for every M ∈ Mn (R[X]); in other words,
the square formed by φ̃, φ and the two determinant maps commutes. Taking M = XI − A and φ to
be evaluation at λ ∈ R gives chA (λ) = det (λI − A).
As a result, we see that if T is a linear operator on a finite dimensional vector space V then the
eigenvalues of T are precisely the roots of its characteristic polynomial chT (X), and hence they’re
also called the characteristic values/characteristic roots of T .
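As a quick illustration of this recipe (a sketch in SymPy; the matrix below is an arbitrary example, not one from the text), the eigenvalues of a matrix can be read off as the roots of its characteristic polynomial:

from sympy import Matrix, symbols, roots

X = symbols('X')
A = Matrix([[2, 1],
            [0, 3]])              # example matrix, chosen only for illustration

ch = A.charpoly(X).as_expr()      # det(X*I - A)
print(ch)                         # X**2 - 5*X + 6
print(roots(ch))                  # {2: 1, 3: 1} -- eigenvalues with multiplicities
print(A.eigenvals())              # the same answer, computed directly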
Remarks.
1. Similarly, we can define the eigenvalues, eigenvectors, eigenspaces etc. of an n × n square
matrix A over F . For example, the eigenvalues of A (in F ) are the roots of chA (X) (in F ),
every solution of the system of homogeneous linear equations (A−λI)X = 0 is an eigenvector
of A, and the λ-eigenspace of A is the set of solutions of this system of equations.
2. The eigenvalues of a matrix/linear operator depend on the base field F . For example,
the 90◦ -rotation of R2 does not have any eigenvalue in R, but it has two eigenvalues in C,
namely ±i. As a result, the corresponding matrix is not diagonalizable over R, but it’s
diagonalizable over C. So while discussing the diagonalizability of a matrix/linear operator
one should always keep the underlying field in mind.
3. If V is an n-dimensional vector space and T ∈ L(V ), then chT (X) has degree n, showing
that T cannot have more than n distinct eigenvalues; and if T has n distinct eigenvalues,
then it’s diagonalizable.
We’re now in a position to take the first stab at diagonalizability of a matrix/linear operator. If
we start with a linear operator T on a finite-dimensional vector space V , then first take a matrix
representation of T to get a matrix A ∈ Mn (F ). Then find the roots of the characteristic polyno-
mial chA (X). If the roots are λ1 , · · · , λr , find the dimensions of ker (A − λi I) for each i. If the
dimensions are d1 , · · · , dr , then T is diagonalizable iff n = d1 + · · · + dr .
If T is diagonalizable, and Bi is an ordered basis of ker (T − λi I), then [T ]B is a diagonal matrix
where B := {B1 , · · · , Br }.
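A sketch of the above procedure (using SymPy; the symmetric matrix is a made-up example, and diagonalize() is a built-in used here only to cross-check the dimension count):

from sympy import Matrix, eye

A = Matrix([[5, 4, 2],
            [4, 5, 2],
            [2, 2, 2]])                            # example matrix; eigenvalues 1, 1, 10
n = A.shape[0]

# d_i = dim ker(A - lambda_i I) for each root lambda_i of chA(X)
dims = {lam: len((A - lam * eye(n)).nullspace()) for lam in A.eigenvals()}
print(dims)                                        # {1: 2, 10: 1}
print("diagonalizable:", sum(dims.values()) == n)  # criterion: n = d1 + ... + dr

P, D = A.diagonalize()                             # columns of P: the bases B1, ..., Br
print(D)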
The above procedure checks diagonalizability of a matrix/linear operator; and in the process,
gives an actual diagonalization whenever it exists. Therefore it’s not an efficient method if one’s
only interested in checking diagonalizability and not in the actual diagonalization. We need to
introduce a few more concepts to give a useful criterion for diagonalizability.
Definitions. Let F be a field and R a finite-dimensional algebra over it. If a ∈ R, then the
annihilator of a (over F ), denoted by ann a, is defined to be the kernel of the F -algebra homomor-
phism from F [X] to R which sends X to a (note that the kernel is nonzero!). Every polynomial in
this kernel is called an annihilating polynomial of a, and the unique monic generator of this kernel
is said to be the minimal polynomial of a which is denoted by mina (X). Sometimes, the minimal
polynomial of a is also referred to as the annihilator of a, but it doesn’t lead to any confusion be-
cause the nonzero monic polynomials of F [X] are in one-to-one correspondence with the nonzero
ideals of F [X].
Remarks.
2. If R is a finite-dimensional algebra over F , then the degree of mina (X) is equal to the
dimension of the F -algebra F [a]. As a result, if E/F is a field extension and A ∈ Mn (F ), then
the minimal polynomial of A doesn’t depend on whether we think of it as a matrix over F
or a matrix over E.
3. Unlike minimal polynomials which can be directly defined for a matrix as well as a linear
operator on a finite-dimensional vector space, characteristic polynomials can only be defined
for a matrix. Because similar matrices have the same characteristic polynomial, we can
extend the definition to linear operators on finite-dimensional vector spaces.
4. Let T be a linear operator on a finite dimensional vector space V and W ≤ V a T -invariant
subspace. If I := ann TW and J := ann T̄W , then IJ ⊆ ann T ⊆ I ∩J, where either inclusion
can be strict. Equivalently, we have
lcm(minTW (X), minT̄W (X)) | minT (X) | minTW (X) · minT̄W (X).
5. Let T be a linear operator on a vector space V of dimension n and W ≤ V a T -invariant
subspace of dimension r. Let B := (x1 , · · · , xr , xr+1 , · · · , xn ) be an ordered basis of V with
B0 := (x1 , · · · , xr ) being an ordered basis of W . Then B00 := (x̄r+1 , · · · , x̄n ) is an ordered
basis of V /W . In this case,
[T ]B = ( [TW ]B0   A
              0     [T̄W ]B00 ),
where A is a certain r × (n − r) matrix over F . Since
det (XIn − [T ]B ) = det (XIr − [TW ]B0 ) · det (XIn−r − [T̄W ]B00 ),
we conclude that chT (X) = chTW (X) · chT̄W (X).
6. Let V = V1 ⊕ · · · ⊕ Vr be a T -invariant direct sum decomposition of a finite dimensional vector
space V and Ti := T |Vi for each i. If Bi is an ordered basis of Vi for each i and B := (B1 , · · · , Br ),
then [T ]B is the block diagonal matrix with diagonal blocks [T1 ]B1 , · · · , [Tr ]Br .
As a result, chT (X) = Π_{i=1}^{r} chTi (X). Also, tr T = Σ_{i=1}^{r} tr Ti and det T = Π_{i=1}^{r} det Ti .
Conversely, let V1 , · · · , Vr be finite dimensional vector spaces and Ti ∈ L(Vi ) for all i; then
we can define a linear operator T := ⊕_{i=1}^{r} Ti : ⊕i Vi → ⊕i Vi , as T ((xi )) := (Ti (xi )).
Then it's easy to see that each Vi is T -invariant with T |Vi = Ti for all i, so
that the above discussion can be applied in this situation.
7. If T is a linear operator on a finite-dimensional vector space V , then for all λ ∈ F , we have
chT −λI (X) = chT (X + λ) and minT −λI (X) = minT (X + λ).
8. A locally nilpotent operator cannot have any nonzero eigenvalue. Therefore a nonzero locally
nilpotent linear operator cannot be diagonalizable. In particular, a nonzero nilpotent matrix
is not diagonalizable.
Exercises.
1. HK : Section 6.2 - 6,10,11,12,13 (*),14,15;
3. (*) Prove that the linear span of n × n nilpotent matrices has dimension n² − 1.
Hint. The 2 × 2 matrix with both rows equal to (1, −1) is nilpotent.
4. Let φ : R → S be a homomorphism of commutative rings. Then φ induces two ring homo-
morphisms
φP : R[X] → S[X] and φM : Mn (R) → Mn (S),
where φP (a0 + a1 X + · · · + an X n ) := φ(a0 ) + φ(a1 )X + · · · + φ(an )X n , and for A ∈ Mn (R),
φM (A) is obtained by applying φ on each entry of A. Now prove that
(i) det φM (A) = φ(det A).
(ii) tr φM (A) = φ(tr A).
(iii) chφM (A) (X) = φP (chA (X)).
(iv) If A, B ∈ Mn (R) then φM (chA (B)) = chφM (A) (φM (B)).
5. Let T : V → V be a linear operator. If null T = r, show that null T m ≤ rm for all m ≥ 1.
If T is a nilpotent operator on an n-dimensional vector space V , prove that T n−1 ≠ 0 iff
rk T = n − 1.
6. Let V be a vector space of dimension n and T : V → V a linear operator. Find the set of all
T -invariant subspaces in the following cases.
(i) T is a scalar operator.
(ii) T has n distinct eigenvalues.
(iii) T is a nilpotent operator such that T n−1 ≠ 0.
7. Let A, B be square matrices of order n ≤ 3 over an algebraically closed field F . If chA (X) =
chB (X) and minA (X) = minB (X), then show that A and B are similar matrices.
What happens if we drop the assumption that n ≤ 3?
Remark. It is a fact that if E/F is a field extension and A, B ∈ Mn (F ), then A and B are
similar over F iff they are similar over E. Taking E to be an algebraic closure of F , one can
therefore drop the assumption that F is algebraically closed.
8. (*) Let f (X) ∈ F [X] be a monic polynomial of degree n and V := F [X]/(f (X)). If T
is the linear operator on V defined as the multiplication by X̄, then prove that chT (X) =
minT (X) = f (X). The matrix representation of T with respect to the basis (1̄, X̄, · · · , X̄ n−1 )
is called the companion matrix of f (X).
Hint. What is the annihilator of the one-dimensional subspace generated by 1̄?
9. Let T be a linear operator on a finite-dimensional vector space V . Then prove that the
following are equivalent.
(i) V = ker T ⊕ im T .
(ii) ker T and im T are linearly independent.
(iii) ker T = ker T 2 .
12. (*) Let T be a linear operator on a vector space V with W being a T -invariant subspace of
V which contains the image of T . If S is a linear operator on W such that STW = TW S
then prove that there exists a linear operator S̃ ∈ L(V ) such that S̃|W = S and S̃T = T S̃ iff
im T is S-invariant.
13. (*) Let T : V → V be a nonzero linear operator which is not surjective. If W is the image of
T and S ∈ L(W ), then show that there exists a linear operator S̃ ∈ L(V ) such that S̃|W = S
and S̃T ≠ T S̃.
If T is a nonzero nilpotent operator on F 2 , show that there exists a linear operator S ∈ L(F 2 )
such that ker f (T ) and im f (T ) are both S-invariant for every polynomial f (X) ∈ F [X], but
ST ≠ T S.
Lecture 11,12 and 13 (9/11/2020, 11/11/2020, 13/11/2020) :
So far, in the study of linear operators, eigenvalues and eigenvectors have played a central role.
Now this approach, although perfectly well-suited for diagonalizable operators and useful in gen-
eral, has its own limitations because everything in V , lying outside the sum of the eigenspaces,
remains ‘invisible’. Firstly, if F is not algebraically closed then some (maybe even all!) of the
eigenvalues of T may ‘escape’, as can be seen in the case of the 90◦ -rotation of R2 . To avoid
this situation, we'll have to work over an algebraically closed field. But even then we may not
have ‘enough’ eigenvectors, as is well-demonstrated in the case of a nonzero nilpotent operator. In
particular, if T is a nilpotent operator on an n-dimensional vector space V such that T n−1 ≠ 0,
then Σ_{λ∈F} Vλ = V0 is just one-dimensional, which hardly tells anything about the structure of T .
To overcome these problems, we’ll now take a more holistic approach in studying linear operators
on a finite-dimensional vector space. But most of our qualitative discussions will apply to every
linear operator which has a nonzero annihilating polynomial. In this approach, we’ll ‘break’ V into
a direct sum of T -invariant subspaces and study the restrictions individually to get a global picture.
The motivation behind this comes from the intuitive idea that the complexity in the structure of
L(V ), in general, ‘should’ come down if the dimension of V drops.
In other words, our present approach will be a ‘top-down’ one where we start with V and then
break it into smaller subspaces, whereas the previous one was ‘bottom-up’ where we started with
the small eigenspaces and then checked whether V can be ‘covered’ by them.
We’ll first prove an extremely useful result - the Cayley-Hamilton theorem. We’ve already seen
that every eigenvalue of a square matrix A is a root of its characteristic polynomial and conversely.
Now the Cayley-Hamilton theorem says much more - that not only the eigenvalues of A are roots
of chA (X), but A itself is a ‘root’ of its characteristic polynomial.
Cayley-Hamilton theorem
Statement. Let R be a commutative ring and A an n × n matrix over R. Then chA (A) = 0.
Proof. We’ll break the proof in several steps. We’ll first prove it over C, the field of complex
numbers, and then show that the general result follows from this.
Step 1. If R is a commutative ring and A ∈ Mn (R) a diagonal matrix then chA (A) = 0.
Step 2. Let A ∈ Mn (F ) be a diagonalizable matrix with B := P −1 AP being a diagonal matrix.
Then chA (A) = chB (A) = chB (P BP −1 ) = P chB (B)P −1 = 0.
Step 3. The set of all diagonalizable matrices over C contains the set of all matrices whose
characteristic polynomials have n distinct roots. If f (X) ∈ C[X] is a monic polynomial of degree
n then the discriminant of f (X), denoted by Disc(f (X)), is defined as (up to sign)
Disc(f (X)) := Π_{i≠j} (ri − rj ),
where r1 , · · · , rn are the roots of f (X). Note that the discriminant of f (X) is nonzero iff f (X) has
n distinct roots. Now it’s a fact that the discriminant of f (X) is a polynomial in the coefficients of
f (X). As the coefficients of the characteristic polynomial of a matrix are polynomials in the entries
of the matrix, it follows that disc : Mn (C) → C, given as disc(A) := Disc(chA (X)), is a polynomial
function in the entries of the matrix A. So we have a polynomial in the n² variables {Xij }1≤i,j≤n ,
say Φ(Xij ) ∈ C[Xij ]1≤i,j≤n , such that the matrices in Mn (C) which have n distinct eigenvalues
are precisely the points of C^{n²} (which can be ‘identified’ with Mn (C)) at which Φ takes a nonzero
value. Now it's a fact, whose proof we leave as an exercise, that if φ(Y1 , · · · , Yr ) ∈ C[Y1 , · · · , Yr ] is a
nonzero polynomial in r-variables then the set
{(α1 , · · · , αr ) ∈ C r | φ(α1 , · · · , αr ) ≠ 0}
is a dense open set in Cr . In particular, the set of diagonalizable matrices is dense in Mn (C).
Therefore, since A ↦ chA (A) is a continuous map on Mn (C), the result follows.
Step 4. Let A be an n × n matrix over R. Now consider the ring R̃ := Z[Xij ]1≤i,j≤n where Xij
are variables over Z. Let à be the generic n × n matrix over Z, i.e., à is the n × n matrix over
R̃ whose ij-th entry is Xij . If we denote the ij-entry of A by aij then there exists a unique ring
homomorphism from R̃ to R, viz., which sends Xij to aij for all i, j, such that under the induced
ring homomorphism from Mn (R̃) to Mn (R), Ã is mapped to A. Therefore it suffices to show that
the generic matrix satisfies its characteristic polynomial. Now chà (Ã) is a matrix whose entries are
polynomials in n² variables over Z. So if some entry of this matrix is a nonzero polynomial then
we can find n² integers, say cij ∈ Z, such that the polynomial, evaluated at the point (cij ) ∈ Z^{n²},
gives a nonzero value. But then the matrix C ∈ Mn (Z), whose ij-th entry is cij , does not satisfy
its characteristic polynomial, a contradiction.
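A quick computational sanity check of the theorem (a sketch with SymPy; the integer matrix is an arbitrary example, and chA(A) is evaluated by Horner's rule):

from sympy import Matrix, symbols, eye, zeros

X = symbols('X')
A = Matrix([[1, 2, 0],
            [3, -1, 4],
            [0, 2, 2]])               # arbitrary example matrix

n = A.shape[0]
coeffs = A.charpoly(X).all_coeffs()   # [1, c_{n-1}, ..., c_0] of chA(X) = det(X*I - A)

chA_of_A = zeros(n, n)
for c in coeffs:                      # Horner: (((I*A + c_{n-1} I)*A + ...)*A + c_0 I)
    chA_of_A = chA_of_A * A + c * eye(n)

print(chA_of_A == zeros(n, n))        # True: A satisfies its own characteristic polynomial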
Lemma. Let T be a linear operator on a finite dimensional vector space V . Then every eigen-
value of T is a root of minT (X) and conversely. Equivalently, the roots of minT (X) are same as
the roots of chT (X), both being equal to the set of eigenvalues of T .
Proof. First, let λ ∈ F be an eigenvalue of T . Then there exists a nonzero vector v ∈ V such
that T (v) = λv. As minT (T ) = 0, in particular, we’ve (minT (T ))(v) = 0. But then minT (λ)·v = 0,
implying that minT (λ) = 0.
Conversely, let λ ∈ F be a root of minT (X). Then we can write minT (X) as minT (X) =
(X − λ)g(X) for some g(X) ∈ F [X]. If λ is not an eigenvalue of T , then T − λI is an in-
vertible linear operator on V , forcing that g(T ) = 0, a contradiction.
Now we'll show not only that the roots of minT (X) and chT (X) are the same, but that they have the
same irreducible factors. But to give a proof of it, we need a preparatory lemma and a result from
field theory.
Lemma. Let E/F be a field extension. Then f (X), g(X) ∈ F [X] are relatively prime in F [X]
iff they’re relatively prime in E[X].
We’ll also use the following fact about field extensions without giving any proof :
Let F be a field. Then there exists a field extension F̄ /F (unique up to F -isomorphism), called
an algebraic closure of F , such that F̄ is algebraically closed and every element of F̄ satisfies a
nonzero polynomial over F , i.e., F̄ /F is an algebraic extension.
Proposition. Let A be an n × n matrix over F . Then minA (X) and chA (X) have the same
irreducible factors.
Proof. As minA (X) | chA (X), every irreducible factor of minA (X) also divides chA (X). Con-
versely, let f (X) be an irreducible factor of chA (X). Recall that the characteristic and the minimal
polynomial of A remain unchanged if we treat A as a matrix over F̄ . Let λ ∈ F̄ be a root of f (X).
Then λ is also a root of minA (X), implying that f (X) and minA (X) are not relatively prime in
F̄ [X], and therefore they are not relatively prime in F [X]. Since f (X) is irreducible, it follows
that f (X) | minA (X).
Theorem. Let T be a linear operator on an n-dimensional vector space V and chT (X) =
Π_{i=1}^{r} fi^{ei}(X) the prime factorization of chT (X) into monic irreducible polynomials. If Vi :=
ker fi^{ei}(T ) for all i = 1, · · · , r then the following assertions hold.
(i) V = V1 ⊕ · · · ⊕ Vr is an internal direct sum decomposition of V into T -invariant subspaces.
(ii) Let Ti := T |Vi . Then for each i, minTi (X) = fi^{e′i}(X) for some positive integer e′i satisfying
1 ≤ e′i ≤ ei . Consequently, minT (X) = Π_{i=1}^{r} fi^{e′i}(X), where 1 ≤ e′i ≤ ei for every i.
(iii) For every i, chTi (X) = fi^{ei}(X). In particular, dim Vi = ei · deg fi .
(iv) Vi = ker fi^{e′i}(T ) = ∪_{d=1}^{∞} ker fi^d (T ).
Proof. To prove (i), apply induction together with the result that if f (X), g(X) ∈ F [X] are
relatively prime polynomials such that (f g)(T ) = 0 then V = ker f (T ) ⊕ ker g(T ).
To prove (ii), first note that fi^{ei}(X) is an annihilating polynomial of Ti . As the minimal polynomial
of Ti is a non-constant divisor of fi^{ei}(X), it follows that it is of the form fi^{e′i}(X) for some positive
integer e′i satisfying 1 ≤ e′i ≤ ei . Therefore, we get that minT (X) = Π_{i=1}^{r} fi^{e′i}(X) where 1 ≤ e′i ≤ ei
for every i.
To prove (iii), note that for each i, the characteristic polynomial of Ti must be a power of fi (X).
As chT (X) = Π_i chTi (X), it follows that chTi (X) = fi^{ei}(X) for all i.
For (iv), it's clear that ker fi^{e′i}(T ) ⊆ Vi ⊆ ∪_{d=1}^{∞} ker fi^d (T ). To prove the other inclusion, let n ≥ e′i .
We'll show that ker fi^n (T ) = ker fi^{e′i}(T ). Let x ∈ ker fi^n (T ). As gcd(fi^n (X), minT (X)) = fi^{e′i}(X),
it follows that fi^{e′i}(T )(x) = 0.
Remarks.
1. The above criterion for diagonalizability gives an alternative proof of the fact that if T is
a diagonalizable linear operator on a finite-dimensional vector space V and W ≤ V is a
T -invariant subspace, then both TW and T̄W are diagonalizable.
2. If A is an n × n matrix over a field F then A − λI is invertible for all but finitely many values
of λ ∈ F . If A is invertible then A−1 ∈ F [A], i.e., A−1 is a polynomial in A over F (a concrete
sketch of this appears after these remarks).
3. Let A ∈ Mn (F ) and minA (X) = Π_{i=1}^{r} fi^{ei}(X) an irreducible decomposition into the powers
of distinct monic irreducible polynomials. Then
F [A] ≅ F [X]/(minA (X)) ≅ Π_{i=1}^{r} F [X]/(fi^{ei}(X)).
In particular, F [A] contains exactly 2^r idempotent elements; and F [A] is a field
iff minA (X) is an irreducible polynomial.
4. If a linear operator T is annihilated by a polynomial which is a product of distinct linear
factors then T is diagonalizable. The converse, however, may fail if V is not finite-dimensional.
Can you give an example?
5. Let A be an n × n matrix over F . If f (X) is a polynomial over F then f (A) is invertible iff
gcd(f (X), minA (X)) = gcd(f (X), chA (X)) = 1.
6. A ∈ Mn (F ) is a nilpotent matrix iff chA (X) = X n .
7. If T : V → V is a linear operator then T and T − cI have the same set of eigenvectors for all
c ∈ F.
8. Let F = R/C. Then the set of all m × n matrices having rank ≤ r is a closed subset of
Mm×n (F ) for every r ≥ 0. In fact, as r increases, we get an increasing chain of closed subsets
of Mm×n (F ), say C0 ⊆ C1 ⊆ · · · , with C0 = {0} and Cl = Mm×n (F ) where l = min{m, n}.
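The assertion in Remark 2 that A−1 ∈ F [A] can be made concrete via the Cayley-Hamilton theorem: if chA (X) = X^n + c_{n−1} X^{n−1} + · · · + c_0 and c_0 ≠ 0 (equivalently, A is invertible), then A (A^{n−1} + c_{n−1} A^{n−2} + · · · + c_1 I) = −c_0 I, so A−1 = −(1/c_0)(A^{n−1} + · · · + c_1 I). A sketch (the 3 × 3 matrix is an arbitrary invertible example):

from sympy import Matrix, symbols, eye, zeros

X = symbols('X')
A = Matrix([[2, 1, 0],
            [0, 1, 1],
            [1, 0, 3]])                 # arbitrary invertible example (det = 7)

n = A.shape[0]
c = A.charpoly(X).all_coeffs()          # [1, c_{n-1}, ..., c_1, c_0]

q_of_A = zeros(n, n)                    # q(A) = A^{n-1} + c_{n-1} A^{n-2} + ... + c_1 I
for coeff in c[:-1]:                    # Horner on [1, c_{n-1}, ..., c_1]
    q_of_A = q_of_A * A + coeff * eye(n)

A_inv = -q_of_A / c[-1]                 # A^{-1} = -q(A)/c_0, a polynomial in A
print(A_inv == A.inv())                 # True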
We’ll end our discussion on diagonalizable operators by giving another criterion for diagonaliz-
ability.
Lemma. Let T be a linear operator on a finite dimensional vector space V . Then for every
eigenvalue λ of T , the geometric multiplicity of λ cannot be more than its algebraic multiplicity.
Proof. Let chT (X) = (X − λ)^r g(X), with g(λ) ≠ 0. Then we've to show that dim Vλ ≤ r.
Now V = ker (T − λI)^r ⊕ ker g(T ) and null (T − λI)^r = r. As Vλ ⊆ ker (T − λI)^r , the inequality
follows.
Note that the algebraic multiplicity of λ is equal to the dimension of the generalized λ-eigenspace
of T whereas the geometric multiplicity of λ is equal to the dimension of the λ-eigenspace of T .
Proposition. Let T be a linear operator on a finite-dimensional vector space V whose characteristic
polynomial splits into linear factors over F . Then T is diagonalizable iff the algebraic and geometric
multiplicities of every eigenvalue of T are equal.
Proof. First, let T be a diagonalizable operator with eigenvalues λ1 , · · · , λr . Then minT (X) =
(X − λ1 ) · · · (X − λr ) and chT (X) = (X − λ1 )^{e1} · · · (X − λr )^{er} with some positive integers e1 , · · · , er .
If Vi is the λi -eigenspace of T then Vi = ker (T − λi I) = ker (T − λi I)^{ei} , implying that dim Vi = ei .
Conversely, let chT (X) = (X − λ1 )^{e1} · · · (X − λr )^{er} . Then we can write V as an internal direct
sum of the generalized eigenspaces. By the given condition, every eigenspace is actually equal to
the corresponding generalized eigenspace, implying that T is diagonalizable.
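A small check of this criterion (a SymPy sketch; the matrix is a made-up example with a defective eigenvalue):

from sympy import Matrix

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])                 # example: a Jordan block for 2, plus the eigenvalue 3

for lam, alg_mult, vects in A.eigenvects():
    geo_mult = len(vects)               # dimension of the lambda-eigenspace
    print(lam, alg_mult, geo_mult)      # 2: (2, 1) -- defective; 3: (1, 1)

print(A.is_diagonalizable())            # False, since 2 has unequal multiplicities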
Remarks.
1. In general, eigenspaces and generalized eigenspaces may be as different as chalk and cheese.
For example, if T is a nilpotent linear operator on an n-dimensional vector space V satisfying
T n−1 ≠ 0, then the eigenvalue 0 has algebraic multiplicity n but geometric multiplicity only one! And for
a more ‘shocking’ example, you may look at the differentiation operator on the polynomial
ring R[X].
2. The following discussion is applicable to any linear operator which has a nonzero annihilating
polynomial (Strictly speaking, we’ve only defined annihilating polynomials for linear oper-
ators on a finite dimensional vector space, but you can guess the definition in the general
case! The only thing is that, for a linear operator on an infinite-dimensional vector space, the
definition will come with a phrase ‘if exists’.). In particular, it’s applicable to every linear
operator on a finite-dimensional vector space.
Not having enough eigenvalues is kind of an ‘artificial’ problem for a linear operator which
we can get rid of by moving to a larger field. But this difference between algebraic and
geometric multiplicities of various eigenvalues λ is the ‘real reason’ behind the failure of diag-
onalizability. Even in the infinite-dimensional case, when we may not be able to talk about
algebraic and geometric multiplicities, this failure is captured in the difference between the
λ-eigenspace and the generalized λ-eigenspace. Note that T − λI acts as a nilpotent operator
on the generalized λ-eigenspace, which is zero iff the generalized λ-eigenspace is the same as the
λ-eigenspace. When all the eigenvalues of T are ‘present’ in F , then non-diagonalizability
of T manifests itself in the nature of nilpotency of various translates of T at its different
eigenvalues. The difference between various eigenspaces and the corresponding generalized
eigenspaces tells us how ‘badly’ a linear operator fails to be diagonalizable.
So far, we’ve been largely dealing with diagonalizable operators and gave various criteria for diago-
nalizability, as well as a method of diagonalization whenever it exists. Now we are going to address
the final question raised in the introduction.
Diagonalizable operators, on a finite dimensional vector space V , are characterized by the property
that their matrix representations are diagonal with respect to suitable ordered bases. After diag-
onal matrices, the next class of ‘simple-looking’ matrices are the triangular ones (upper or lower).
So we’re now going to study those linear operators whose matrix representations, with respect to
suitable ordered bases, are upper/lower triangular.
Lemma. A linear operator T on an n-dimensional vector space V is triangulable, i.e., [T ]B is
upper-triangular for some ordered basis B of V , iff there exists a chain of T -invariant subspaces
0 = V0 ⊊ V1 ⊊ · · · ⊊ Vn = V .
Proof. If [T ]B is upper triangular with respect to B = (x1 , · · · , xn ), then for each i, we may
take Vi :=< x1 , · · · , xi >.
Conversely, note that each Vi has codimension one in Vi+1 and if we take an ordered basis
B := (x1 , · · · , xn ) such that Vi+1 = Vi + < xi+1 > for all i < n, then [T ]B is an upper-triangular
matrix.
We’ll now give a different proof of the Cayley-Hamilton theorem over an algebraically closed
field F .
Proposition. A triangulable linear operator T on a finite-dimensional vector space V satisfies
its characteristic polynomial.
In particular, the Cayley-Hamilton theorem holds over an algebraically closed field; and if we are
ready to accept the ‘fact’ that every field F can be embedded in an algebraically closed field, then
it proves the theorem over an arbitrary field F .
Remarks. In the following remarks, we assume that S, T are linear operators on a finite-
dimensional vector space V .
1. As the definition suggests, unlike diagonalizability, we define triangulability only for linear
operators on a finite dimensional vector space.
2. If an annihilating polynomial of T splits into linear factors then T is triangulable.
3. Even if S, T are triangulable, still S + T, ST need not be triangulable. We’ll later see that
they’re triangulable if ST = T S.
4. Every nilpotent operator on V is triangulable (Note the difference with diagonalizability!).
5. A triangulable operator T is diagonalizable iff every eigenvalue of T has the same algebraic
and geometric multiplicity, or equivalently, every eigenspace is equal to the corresponding
generalized eigenspace.
Throughout this section, all vector spaces are assumed to be finite-dimensional. In principle, one
can talk about simultaneous diagonalization even for a family of linear operators on an infinite-
dimensional vector space, but we’re not going to do it here.
We start with a few definitions.
Definitions. Let S ⊆ L(V ) be a set of linear operators. We say that S is a commuting family
of linear operators if ST = T S for all S, T ∈ S.
If S ⊆ L(V ) is a set of linear operators, then x ∈ V is called an eigenvector of S if T (x) ∈< x >
for all T ∈ S, i.e., if x is an eigenvector of every linear operator in S.
W ≤ V is said to be S-invariant if W is T -invariant for all T ∈ S. If W ≤ V is S-invariant then
we define SW := {TW | T ∈ S} ⊆ L(W ) and S̄W := {T̄W | T ∈ S} ⊆ L(V /W ).
An S-invariant subspace W of V is said to be an eigenspace of S if TW is a scalar operator for all
T ∈ S.
A direct sum decomposition V = V1 ⊕· · ·⊕Vr is said to be an S-invariant direct sum decomposition
if each Vi is S-invariant.
A set of diagonalizable (respectively triangulable) linear operators S ⊆ L(V ) is said to be simul-
taneously diagonalizable (respectively simultaneously triangulable) if there exists an ordered basis
B of V such that [T ]B is a diagonal (respectively upper-triangular) matrix for all T ∈ S.
Similarly, a set S ⊆ Mn (F ) is said to be simultaneously diagonalizable (respectively simultaneously
triangulable) if there exists an invertible matrix P ∈ GLn (F ) such that P −1 AP is a diagonal (re-
spectively upper-triangular) matrix for all A ∈ S.
In the definition, we could have replaced ‘upper-triangular’ matrices by ‘lower-triangular’ matrices.
Note that S ⊆ L(V ) is simultaneously diagonalizable (respectively simultaneously triangulable) iff
the corresponding family of matrix representations SB := { [T ]B | T ∈ S } ⊆ Mn (F ) is simulta-
neously diagonalizable (respectively simultaneously triangulable) for all ordered bases B of V.
Remarks.
Proposition. A family S ⊆ L(V ) of diagonalizable linear operators on a finite-dimensional vector
space V is simultaneously diagonalizable iff S is a commuting family.
Proof. We've already seen that if S is simultaneously diagonalizable then it must be a com-
muting family of linear operators.
Conversely, let S be a commuting family of linear operators such that every member of S is diag-
onalizable. We apply induction on dim V .
If dim V = 1, there’s nothing to prove.
Now assuming that the assertion is true for all vector spaces of dimension ≤ r, let dim V = r + 1.
If S consists of only scalar operators, again, there’s nothing to prove. Otherwise, let T ∈ S be
a linear operator which is not scalar. Then we can find a non-trivial T -invariant decomposition
of V , say V = V ′ ⊕ V ′′ , such that both V ′ and V ′′ are sums of eigenspaces of T . As S is a family of
commuting operators, V = V ′ ⊕ V ′′ is also an S-invariant decomposition. Let S ′ , S ′′ be the families
of linear operators induced by S on V ′ , V ′′ respectively. Then by the induction hypothesis, both S ′
and S ′′ are simultaneously diagonalizable. Therefore S is also simultaneously diagonalizable.
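A small numerical illustration (a sketch; the matrices are made-up commuting examples, and we use the fact, cf. Exercise 11 of this lecture, that when A has distinct eigenvalues any matrix commuting with A is diagonalized by the same change of basis):

from sympy import Matrix, eye, simplify

A = Matrix([[1, 2],
            [2, 1]])                    # eigenvalues 3 and -1 (distinct)
B = A**2 + 3*A - eye(2)                 # a polynomial in A, hence commutes with A

assert A*B == B*A                       # {A, B} is a commuting family

P, D = A.diagonalize()                  # P^{-1} A P = D
print(D)
print(simplify(P.inv() * B * P))        # also diagonal: the same P works for B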
Lemma. Let V be an n-dimensional vector space and S ⊆ L(V ) a set of triangulable linear
operators. Then S is simultaneously triangulable iff there exists a chain of S-invariant subspaces
0 = V0 ⊊ V1 ⊊ · · · ⊊ Vn = V .
Proof. Left as an exercise. It's almost the same as the proof which we did for triangulable opera-
tors.
Proof. Left as an exercise. Again, it's almost the same as the proof which we did for triangulable
operators.
Lemma. Let S be a commuting family of triangulable linear operators on a finite-dimensional
vector space V . Then V contains an eigenvector of S.
Proof. Using induction on dim V , the proof follows from the above lemma. The details are
left as an exercise.
Corollary. Let 𝓕 be a commutative F -subalgebra of Mn (F ), where n ≥ 2. Then dim 𝓕 ≤ n(n+1)/2 − 1.
Proof. Let E be an algebraically closed field containing F and 𝓕E the smallest E-subalgebra
of Mn (E) containing 𝓕. Then under a suitable automorphism of Mn (E), the image of 𝓕E lies
inside the set of upper-triangular matrices of Mn (E). As 𝓕E is a commutative E-algebra and the
set of upper-triangular matrices isn't commutative, we deduce that dimE 𝓕E ≤ n(n+1)/2 − 1, and
consequently, dim 𝓕 ≤ n(n+1)/2 − 1.
Exercises.
1. HK : Section 6.3 - 1,3,4,6,7,8,9,10; Section 6.4 - 9,10,11,13; Section 6.5 - 2 (do it for n = 2, 3,
don’t try to do it for general n - that’s a big theorem!),5.
2. Give an example of a linear operator T on a finite-dimensional vector space V over F such
that T has no eigenvalues.
Does your answer change if F is algebraically closed?
What if F is algebraically closed, but V is not finite-dimensional?
3. Let F be an algebraically closed field and A ∈ Mn (F ). Then show that the following
statements are equivalent.
(i) A is diagonalizable.
(ii) For every non-constant polynomial f (X) ∈ F [X], the equation f (X) = A has a solution
in Mn (F ).
(iii) For all λ ∈ F , the equation X n − λ = A has a solution in Mn (F ).
4. Let A be an n-dimensional algebra over a field F . Then prove that A, as an F -algebra, can
be embedded in Mn (F ).
In particular C, as an R-algebra, can be embedded in M2 (R).
Hint. Can you ‘view’ the elements of A as F -linear operators on A?
5. Let D be the set of all diagonalizable matrices in Mn (F ), where F = R/C. Then prove that
D◦ , the interior of D, consists of precisely those matrices which have n distinct eigenvalues.
Hint. Look at 2 × 2 upper-triangular matrices with both diagonal entries equal to λ.
6. Let R be a commutative ring and A, B ∈ Mn (R). Then prove that chAB (X) = chBA (X).
Now suppose that R = F is a field. Then AB and BA have the same set of eigenvalues and
AB is triangulable iff BA is triangulable. Also, AB is nilpotent iff BA is nilpotent.
Do AB and BA have the same set of eigenvectors?
Is minAB (X) = minBA (X)?
If AB is diagonalizable, is BA also diagonalizable?
Hint. First prove that chAB (X) = chBA (X) for R = C. For that, fix a matrix A ∈ Mn (C)
and observe that the result holds for all B ∈ GLn (C). As GLn (C) is dense in Mn (C), you can
now follow a similar line of arguments as given in the proof of the Cayley-Hamilton theorem.
The only thing is that, instead of one, here you’ve to consider two generic matrices, so that
R̃ will be a polynomial ring in 2n² variables over Z.
7. Let V be a finite-dimensional vector space over an algebraically closed field F . If T ∈ L(V ),
show that T is diagonalizable iff the restriction of T to every two-dimensional subspace of V
is diagonalizable.
(*) Prove that we can drop the condition that ‘V is finite-dimensional’ if T is assumed to be
a locally finite linear operator.
8. Let A ⊆ B be unique factorization domains with A being a principal ideal domain. If
B ∗ ∩ A = A∗ , then prove that any two elements a, b ∈ A are relatively prime in A iff they’re
relatively prime in B.
9. Let T be a linear operator on a finite-dimensional vector space V over an algebraically closed
field F . Then show that for every polynomial f (X) ∈ F [X], µ ∈ F is an eigenvalue of f (T )
iff µ = f (λ) for some eigenvalue λ of T (Note that the implication in one direction does not
require F to be algebraically closed.).
Can we drop the assumption that F is algebraically closed?
10. Let T be a linear operator on a vector space V . If W ≤ V is T -invariant, then show that TW
is diagonalizable iff W ⊆ Σ_{λ∈F} Vλ .
11. If T is a linear operator on an n-dimensional vector space V then show that the following
statements are equivalent.
(i) T has n distinct eigenvalues.
(ii) Any S ∈ L(V ) which commutes with T can be written as a polynomial in T over F .
(iii) Any S ∈ L(V ) which commutes with T is diagonalizable.
Hint. Look at the 2 × 2 matrix with rows (a, b) and (0, a).
12. Let T be a triangulable linear operator on a finite-dimensional vector space V . Then prove
that the following statements are equivalent.
(a) T is diagonalizable.
(b) Every T -invariant subspace W ≤ V has a T -invariant complement.
(c) For all λ ∈ F , λ is not an eigenvalue of T̄Vλ .
13. If F = R/C, we can give a metric d on F [X] by defining
d(f, g) := ( Σi |fi − gi |² )^{1/2} .
Then it follows from the proof of the Cayley-Hamilton theorem that the map from Mn (F ) to
F [X] which sends a matrix to its characteristic polynomial, is a continuous map.
Is the map A ↦ minA (X) continuous?
If you give a correct answer, it’ll follow that, unlike characteristic polynomials, the coeffi-
cients of minimal polynomials are not polynomials in the entries of a matrix.
In the remaining exercises, we’ll explore the possibilities of extending some of the results,
which we’ve already seen in the context of finite-dimensional vector spaces, to vector spaces
of arbitrary dimension.
the kernel is nonzero, then the monic annihilating polynomial of least degree is called the
minimal polynomial of T and denoted by minT (X). Again, unlike finite-dimensional vector
spaces, T may not have any minimal polynomial, and T has a minimal polynomial iff it has
a nonzero annihilating polynomial.
lcm(minTW (X), minT̄W (X)) | minT (X) | minTW (X) · minT̄W (X).
18. (*) For each λ ∈ F , the generalized λ-eigenspace of T , denoted by Ṽλ , is defined to be the
union ∪_{n=1}^{∞} ker (T − λI)^n . Then prove that
(ii) If every T -invariant subspace W ≤ V has a T -invariant complement, show that T is
locally finite.
(iii) Prove that T is diagonalizable iff every T -invariant subspace W ≤ V has a T -invariant
complement.
24. (*) If T has a nonzero annihilating polynomial then T is diagonalizable iff minT (X) is a
product of distinct linear factors.
25. (*) If T is a triangulable linear operator then show that the following statements are equiv-
alent.
(i) T is diagonalizable.
(ii) Vλ = Ṽλ for all λ ∈ F .
(iii) λ is not an eigenvalue of T̄Vλ for all λ ∈ F .
Hint. For a linear operator T ∈ L(V ), the induced map T̄ : V /ker T → V /ker T is injective
iff ker T = ker T 2 .
Lecture 14 (18/11/2020) :
The simplest diagonalizable operators, after scalar operators, are those having exactly two eigenvalues;
and especially the ones with eigenvalues 0 and 1.
Remarks.
1. If π ∈ L(V ) is a projection, then the image and kernel of π are denoted by R(π) and N (π)
respectively. As π 2 = π, it follows that R(π) ∩ N (π) = 0.
2. From the theory of diagonalizable operators which we’ve developed so far, it’s easy to see
that every idempotent operator is diagonalizable.
But one can also check this directly, without appealing to any fancy machinery, from the fact
that for all x ∈ V , x − π(x) ∈ N (π), so that V = R(π) + N (π); and since they’re linearly
independent, it follows that V = R(π) ⊕ N (π).
3. If π ∈ L(V ) is an idempotent operator, so is I − π. Then R(I − π) = N (π) and N (I − π) =
R(π).
4. If π1 , π2 are idempotent operators then π1 = π2 iff N (π1 ) = N (π2 ) and R(π1 ) = R(π2 ).
5. The linear span of the set of n × n idempotent matrices is equal to Mn (F ).
6. A linear operator T on a finite-dimensional vector space V is a projection iff its matrix
representation [T ]B with respect to some (equivalently, all) ordered basis B of V is an
idempotent matrix.
7. A linear operator T ∈ L(V ) is a projection iff V = ker T ⊕ im T and T |im T = id im T . In other
words, a non-scalar operator is idempotent iff it’s a diagonalizable operator with eigenvalues
0 and 1.
8. If T ∈ L(V ) is a diagonalizable operator with exactly two eigenvalues, then there exist
a(6= 0), b ∈ F such that aT + bI is a projection.
9. If V = V 0 ⊕ V 00 is a non-trivial direct sum decomposition of V , i.e., V 0 and V 00 are both
nonzero, then there exists a unique idempotent operator π ∈ L(V ) such that R(π) = V 0 and
N (π) = V 00 . In this situation, we say that π is a projection onto V 0 along V 00 . Again, a few
examples in R2 /R3 will illustrate the idea.
10. If T is a diagonalizable linear operator on a vector space V , then for all S ∈ L(V ), ST = T S
iff S preserves all eigenspaces of T . In particular, for a projection π on V , S ∈ L(V ) commutes
with π iff both R(π) and N (π) are S-invariant. Therefore V has a direct sum decomposition
into nonzero T -invariant subspaces iff T commutes with a non-trivial idempotent operator.
11. It’s easy to see that the non-scalar idempotent operators on V are in one-to-one correspon-
dence with the direct sum decompositions of V as a sum of two proper subspaces. We’re
soon going to generalize this.
Definition. Let V be a vector space. A (finite) set of nonzero idempotent operators {π1 , · · · , πr }
⊆ L(V ) is said to be a resolution of identity if the following two conditions are satisfied.
(i) π1 + · · · + πr = I
(ii) πi πj = 0 for all i ≠ j.
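A concrete sketch (the subspaces of F^3 below are arbitrary examples; each πi is built from the change-of-basis matrix P as the projection onto Vi along the sum of the others):

from sympy import Matrix, eye, zeros, diag

v1 = Matrix([1, 1, 0])                          # basis of V1
v2, v3 = Matrix([0, 1, 1]), Matrix([1, 0, 1])   # basis of V2
P = Matrix.hstack(v1, v2, v3)                   # columns: a basis of V = V1 + V2

pi1 = P * diag(1, 0, 0) * P.inv()               # keep the V1-coordinate, kill V2
pi2 = P * diag(0, 1, 1) * P.inv()               # keep the V2-coordinates, kill V1

assert pi1 + pi2 == eye(3)                      # (i)  pi1 + pi2 = I
assert pi1 * pi2 == zeros(3, 3) and pi2 * pi1 == zeros(3, 3)   # (ii) pi_i pi_j = 0 for i != j
assert pi1**2 == pi1 and pi2**2 == pi2          # each pi_i is idempotent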
Remark. The word ‘finite’ has been kept in the bracket because one can also define an arbi-
trary resolution of identity without the finiteness assumption. But for us, a resolution of identity
will always mean a finite resolution of identity.
Proof. Let x ∈ V . Then there exist x1 , · · · , xr ∈ V such that xi ∈ Vi for all i and
x = x1 + · · · + xr . Then (π1 + · · · + πr )(x1 + · · · + xr ) = π1 (x1 ) + · · · + πr (xr ) = x. As πj (V ) = Vj ⊆ Ṽi = N (πi )
for all j ≠ i, it follows that πi πj = 0 for all i ≠ j. Therefore {π1 , · · · , πr } is a resolution of identity.
Proposition. Let V be a nonzero vector space. Then the elements of the set of all finite
resolutions of identity are in one-to-one correspondence with the elements of the set of all finite
direct sum decompositions of V where each summand is nonzero.
Proof. Let D, R be the set of all finite direct sum decompositions of V and the set of all finite
resolutions of identity. Then we can define two set-theoretic maps Φ : D → R and Ψ : R → D as
follows.
Φ({V1 , · · · , Vr }) := {π1 , · · · , πr },
where πi is the idempotent linear operator whose image is Vi and kernel is Ṽi := Σ_{j≠i} Vj , and
Ψ({π1 , · · · , πr }) := {R(π1 ), · · · , R(πr )}.
Now one can check that Ψ ◦ Φ = idD and Φ ◦ Ψ = idR . The details are left as an exercise.
Remark. If we take finite sequences instead of finite sets, we’ll get a one-to-one correspondence
between all finite sequences of nonzero projections giving a resolution of identity and the set of all
finite sequences of nonzero subspaces of V which gives a direct sum decomposition. Then, unlike
sets, one has to distinguish between say (π, I − π) and (I − π, π), and likewise between (V 0 , V 00 )
and (V 00 , V 0 ).
As the above proposition suggests, ‘breaking’ a vector space into finitely many nonzero subspaces
is the ‘same as’ ‘breaking’ the corresponding identity operator into a finite sum of projections which don't
‘interact’ with each other.
Exercises.
1. HK : Section 6.7 - 2,5,8,9 (assume that ch F 6= 2),10 (assume that ch F = 0; what happens
in positive characteristic?),11.
2. Let V1 , · · · , Vn be subspaces of a vector space V . For each i, let Ṽi := Σ_{j≠i} Vj . Then prove
that V = Vi ⊕ Ṽi is a direct sum decomposition of V for every i iff V = V1 ⊕ · · · ⊕ Vn is a
direct sum decomposition.
3. Let T ∈ L(V ) and V = V 0 ⊕ V 00 a T -invariant decomposition of V . If ker T ⊆ V 0 then show
that ker T n ⊆ V 0 for all n ≥ 1.
Hence or otherwise, prove that if T commutes with a projection π ∈ L(V ), then for all λ ∈ F ,
ker π contains the λ-eigenspace of T iff it contains the generalized λ-eigenspace of T . If we
drop the condition that πT = T π, does the conclusion still hold?
4. Let S, T be linear operators on a vector space V satisfying ST = T S. Suppose that f (X) ∈
F [X] is a nonzero polynomial such that f (T ) = 0. Let f (X) = Π_{i=1}^{r} fi^{ei}(X) be an irreducible
decomposition of f (X) and Vi := ker fi^{ei}(T ) for all i. Then show that V = V1 ⊕ · · · ⊕ Vr is
an S-invariant direct sum decomposition of V .
Prove that every generalized eigenspace of T is S-invariant.
5. If π1 , π2 ∈ L(V ) are idempotent operators then
Hints. To prove (ii), note that if S, T ∈ L(V ) satisfies ST = cT S for some nonzero constant
c ∈ F , then both the kernel and the image of T are S-invariant.
For (iii), you may think about 2 × 2 matrices.
Lecture 15 (20/11/2020) :
Why projections?
We’ve seen that finite direct sum decompositions of a vector space V are ‘same as’ finite resolutions
of the identity operator on V . Now if V = V1 ⊕ · · · ⊕ Vr is a direct sum decomposition then for a
linear operator T ∈ L(V ), the decomposition is T -invariant iff T πi = πi T for all i, where πi ∈ L(V )
is the projection with R(πi ) = Vi and N (πi ) = Σ_{j≠i} Vj . Note that Ti , the restriction of T to Vi , is
not a linear operator on V . So we use projections to bring different restrictions of T ‘on the same
platform’, i.e., make them linear operators on V by noting that the action of T πi = πi T is ‘same’
as the action of Ti on Vi . The resolution of identity π1 + · · · + πr = I allows us to similarly ‘break’
T into its various restrictions as T = Σi T πi , where T πi becomes the substitute of Ti .
Using projections, we’ll be able to characterize (and even describe!) diagonalizable operators. In
the remaining part of this lecture, we'll often assume that the linear operator under consideration
satisfies a nonzero polynomial over F (This is automatic if V is finite-dimensional!). In this context,
the following conditions are equivalent for a linear operator T on V .
(i) T satisfies a nonzero polynomial over F .
(ii) The F -algebra homomorphism from F [X] to L(V ) which sends X to T is not injective.
(iii) The F -algebra F [T ] is a finite-dimensional vector space over F .
We’ll start with some general results.
and R(π) = Σ_{λj such that φ(λj )=1} Ṽλj ,
Lemma. Let T be a linear operator on a vector space V and f (X), g(X) ∈ F [X] non-constant
relatively prime polynomials. Suppose that (f g)(T ) = 0. If π ∈ L(V ) is the projection of
V whose kernel is ker g(T ) and image is ker f (T ), then π ∈ F [T ].
Proof. As f (X), g(X) are relatively prime polynomials, we can find two polynomials f1 (X), g1 (X)
∈ F [X] such that f f1 (X) + gg1 (X) = 1. Then one can easily check that π = (gg1 )(T ).
Proposition. Let V (≠ 0) be a vector space and T ∈ L(V ) a diagonalizable operator with
finitely many eigenvalues λ1 , · · · , λr (The second condition is superfluous if V is finite-dimensional.).
Then there exists a resolution of identity {π1 , · · · , πr } such that T = λ1 π1 + · · · + λr πr .
Conversely, if {π1 , · · · , πr } is a resolution of identity and λ1 , · · · , λr are (distinct) elements of F
such that T = λ1 π1 + · · · + λr πr , then T is a diagonalizable operator with eigenvalues λ1 , · · · , λr .
Remark. In the second part of the above proposition, if we only assume that T is a sum of
a finite set of commuting projections, then although T still remains diagonalizable, its eigenvalues
need not have any relation with λ1 , · · · , λr . For example, one may take I − I = 0.
It follows from our previous discussions that when T is diagonalizable, each πi in the above propo-
sition can be written as a polynomial in T . To see this directly, let T = λ1 π1 + · · · + λr πr . Then
for every polynomial h(X) ∈ F [X],
h(T ) = h(λ1 )π1 + · · · + h(λr )πr .
Now by Lagrange’s interpolation, for each i, there exists a polynomial fi (X) ∈ F [X] of degree
≤ r − 1 such that fi (λj ) = δij , so that πi = fi (T ).
You may try to state and prove a matrix-theoretic analogue of the above proposition. Believe me,
it’s a rewarding exercise!
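Here is that computation carried out on the example matrix from the earlier sketch (a SymPy illustration; the Lagrange polynomials fi are built directly as products, so each πi is visibly a polynomial in T):

from sympy import Matrix, eye, zeros, simplify

T = Matrix([[5, 4, 2],
            [4, 5, 2],
            [2, 2, 2]])                 # diagonalizable; eigenvalues 1 and 10
n = T.shape[0]
eigs = list(T.eigenvals().keys())       # the distinct eigenvalues

proj = {}
for lam in eigs:
    # f_lam(X) = prod_{mu != lam} (X - mu)/(lam - mu), so f_lam(mu) = delta_{lam, mu}
    pi = eye(n)
    for mu in eigs:
        if mu != lam:
            pi = pi * (T - mu * eye(n)) / (lam - mu)
    proj[lam] = simplify(pi)            # pi_lam = f_lam(T)

assert sum(proj.values(), zeros(n, n)) == eye(n)                        # resolution of identity
assert all(proj[l] * proj[m] == zeros(n, n) for l in eigs for m in eigs if l != m)
assert sum((l * proj[l] for l in eigs), zeros(n, n)) == T               # T = sum lambda_i pi_i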
The next proposition highlights the relation between diagonalizable operators and projections.
Proposition. Let T be a linear operator on V and f (X) ∈ F [X] a nonzero polynomial such
that f (T ) = 0. Let p(X) be the minimal polynomial of T with deg p(X) = r. Then the following
statements are equivalent.
1. T is diagonalizable.
2. p(X) has r distinct monic irreducible factors.
3. F [T ] contains 2^r distinct projections.
4. T can be written as an F -linear combination of the idempotent operators contained in F [T ].
5. As an F -algebra, F [T ] is generated by the idempotent elements contained in it.
6. As an F -algebra, F [T ] is generated by the diagonalizable operators contained in it.
Proof. You may prove that (1) ⇐⇒ (2) ⇐⇒ (3) =⇒ (4) =⇒ (5) =⇒ (6) =⇒ (1).
The details are left as an exercise.
The next two results are similar in spirit as both of them identify nilpotency as the ‘real
reason’ behind the failure of diagonalizability, at least when all the eigenvalues are ‘present’ in F .
Proposition. Let T be a linear operator on V and f (X) ∈ F [X] a nonzero polynomial such
that f (T ) = 0. Suppose that f (X) splits into a product of linear factors over F . Then T is
diagonalizable iff F [T ] is a reduced ring, i.e., it does not contain any nonzero nilpotent element.
Proof. Let p(X) be the minimal polynomial of T , i.e., the monic polynomial of least degree
satisfied by T . If p(X) := Π_{i=1}^{r} (X − λi )^{ei} is the irreducible decomposition of p(X) into the product
of powers of distinct monic irreducible polynomials, then the assertion follows from the F -algebra
isomorphism
F [T ] ≅ Π_{i=1}^{r} F [X]/((X − λi )^{ei}).
Finally, the following theorem allows us to ‘split’ a linear operator T in its diagonalizable and
nilpotent ‘parts’ so that T is diagonalizable iff its nilpotent part is 0.
Theorem. Let T be a linear operator on V (≠ 0) and f (X) ∈ F [X] a nonzero polynomial
such that f (T ) = 0. Suppose that f (X) splits into a product of linear factors over F . Then there
exist unique linear operators D, N ∈ L(V ) satisfying the following conditions.
1. D is diagonalizable and N is nilpotent.
2. T = D + N .
3. DN = N D.
In this case, both D, N ∈ F [T ].
Proof. Let p(X) be the minimal polynomial of T and p(X) = Π_{i=1}^{r} (X − λi )^{ei} the irreducible
decomposition of p(X) into a product of powers of distinct monic irreducible polynomials. For
each i, let Vi := ker (T − λi I)^{ei} . Then V = V1 ⊕ · · · ⊕ Vr is a T -invariant direct sum decomposition.
Let πi be the projection whose image is Vi and kernel is Wi := Σ_{j≠i} Vj . Then {π1 , · · · , πr } is a
resolution of identity and D := Σ_{i=1}^{r} λi πi is a diagonalizable linear operator. By the lemma above,
applied to the relatively prime polynomials (X − λi )^{ei} and Π_{j≠i} (X − λj )^{ej} , each πi can be written
as a polynomial in T ; consequently, D is also a polynomial in T . Let N := T − D and e the maximum
among the ei 's. Then N^e = 0 as N |Vi = (T − λi I)|Vi for all i. It implies that N ∈ F [T ] is a nilpotent
operator.
To prove uniqueness, let T = D + N = D0 + N 0 be two decompositions satisfying the proper-
ties of the theorem, with D, N being the linear operators constructed above. We want to show
that D = D0 and N = N 0 . As D0 commutes with N 0 , both D0 and N 0 commute with T . Since
both D and N are polynomials in T , it implies that DD0 = D0 D and N N 0 = N 0 N . Therefore,
D − D0 = N 0 − N is a diagonalizable nilpotent operator, implying that D = D0 and N = N 0 .
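A computational sketch of the theorem (the matrix is a made-up example whose characteristic polynomial splits; SymPy's jordan_form is used here only as a shortcut to the generalized eigenspaces, whereas the proof above assembles D from the projections πi):

from sympy import Matrix, diag, zeros

A = Matrix([[3, 1, 0],
            [0, 3, 0],
            [0, 1, 2]])                 # chA(X) = (X - 3)^2 (X - 2)

n = A.shape[0]
P, J = A.jordan_form()                  # A = P * J * P^{-1}
D = P * diag(*[J[i, i] for i in range(n)]) * P.inv()   # keep only the diagonal part of J
N = A - D

assert A == D + N
assert D * N == N * D                   # D and N commute
assert N**n == zeros(n, n)              # N is nilpotent
assert D.is_diagonalizable()            # D is similar to a diagonal matrix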
Remark. Let’s try to ‘understand’ the above result and its proof, at least in the case when
V is finite dimensional. The reason why we only consider finite-dimensional vector spaces is that
we’ll consider a matrix representation of T which is easy to ‘visualize’.
Let T be a linear operator on an n-dimensional vector space V with chT (X) = Π_{i=1}^{r} (X − λi )^{ei} .
Let Vi be the generalized λi -eigenspace of T , i.e., Vi = ker (T − λi I)^{ei} . For each i, we can choose
an ordered basis Bi of Vi such that the corresponding matrix representation [TVi ]Bi is an upper-
triangular matrix. Then B := (B1 , · · · , Br ) is an ordered basis of V , and clearly [T ]B is the block
diagonal matrix with diagonal blocks [TV1 ]B1 , · · · , [TVr ]Br , each block being an upper-triangular
matrix of size dim Vi . Also, for each i, every diagonal entry of [TVi ]Bi is equal to λi . Now if you look at the
proof of the above theorem, you’ll readily see that [D]B is the n × n diagonal matrix consisting of
r blocks of scalar matrices, which is obtained by collecting the ‘scalar part’ from each [TVi ]Bi . It
follows from the construction of D as given in the proof, as well as from the above discussion, that
chT (X) = chD (X). Also, the above discussion makes it obvious that T − D is a nilpotent operator.
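To see the decomposition concretely, here is a rough computational sketch, assuming the sympy library is available; the matrix A below is a hypothetical example with ch_A(X) = (X − 2)²(X − 5), and the diagonalizable part is read off from the Jordan form rather than via the Lagrange-interpolation construction used in the proof.

    from sympy import Matrix, diag

    A = Matrix([[2, 1, 0],
                [0, 2, 0],
                [0, 0, 5]])
    P, J = A.jordan_form()                                     # A = P J P^{-1}
    D = P * diag(*[J[i, i] for i in range(J.rows)]) * P.inv()  # diagonalizable part
    N = A - D                                                  # nilpotent part
    print(D * N == N * D, (N**2).is_zero_matrix)               # True True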
Exercises.
1. HK : Section 6.7 - 1,4,5,6,8,9; Section 6.8 - 5,6,7,8,10,11,12,13,14 (can you give an ‘easier’
proof if F is infinite?).
2. If A, B ∈ Mn (F ) are nilpotent matrices, does it follow that A+B, AB are nilpotent matrices?
What if we further assume that AB = BA?
3. Let A, B ∈ Mn (F ) such that A is invertible and B is nilpotent. Is A+B an invertible matrix?
Does your answer change if we further assume that AB = BA?
4. If T : V → V is a linear operator, then show that the following statements are equivalent.
(i) ker T = ker T 2 .
(ii) ker T ∩ im T = 0.
Further, if V is finite-dimensional, the above conditions are equivalent to
(iii) rk T = rk T 2 .
(iv) V = ker T ⊕ im T .
Deduce that if V is a finite-dimensional vector space over an algebraically closed field F and
T ∈ L(V ) is a linear operator, then T is diagonalizable iff rk (T − λI) = rk (T − λI)2 for all
λ ∈ F.
5. (*) Let F be an algebraically closed field and A ∈ Mn (F ). Then X r = A has a solution in
Mn (F ) for all r ≥ 1 iff rk A = rk A2 .
Can we drop the condition that F is algebraically closed?
6. Let T : V → V be a linear operator satisfying a non-constant polynomial in F [X]. Then
show that the following statements are equivalent.
(a) minT (X) is a power of an irreducible polynomial.
(b) F [T ] does not contain any non-trivial idempotent element.
(c) F [T ] does not contain any non-scalar diagonalizable operator.
10. If A is an n × n matrix over F then show that A is diagonalizable (respectively triangulable)
iff At is diagonalizable (respectively triangulable).
11. Let S, T ∈ L(V ) be linear operators satisfying ST = T S. Then prove that for all λ ∈ F , the
generalized λ-eigenspace of T is S-invariant.
12. (*) Let λ be an eigenvalue of a linear operator T ∈ L(V ). If there exists a projection
π ∈ L(V ) such that T π = πT and R(π) = Vλ , then show that Vλ is equal to the generalized
λ-eigenspace of T .
Let T ∈ L(V ) be a linear operator satisfying a nonzero polynomial f (X) ∈ F [X] which
splits into a product of linear factors over F . If λ1 , · · · , λr are the eigenvalues of T and
for each i, there exists a projection πi ∈ F [T ] such that R(πi ) = Vλi , then prove that T is
diagonalizable.
13. (*) Let T be a linear operator on an n-dimensional vector space V such that the minimal
polynomial of T has degree n. Then prove that every linear operator S ∈ L(V ) which
commutes with T can be written as a polynomial in T over F .
Hint. Find an element x ∈ V such that {x, T (x), · · · , T n−1 (x)} is a basis of V . Then there
exists a polynomial g(X) ∈ F [X] such that S(x) = g(T )(x). Can you show that S = g(T )?
14. Let T be a linear operator on V such that every T -invariant subspace of V has a T -invariant
complement. Then prove that ker T = ker T 2 .
Deduce that for all λ ∈ F , the λ-eigenspace of T is equal to its generalized λ-eigenspace.
Further, if we assume that F is algebraically closed and T is locally finite, then prove that
T is a diagonalizable operator.
15. Let T : V → V be a linear operator. Then show that im T has a T -invariant complement iff
V = im T + ker T .
Deduce that if V is moreover finite-dimensional, then the following statements are equivalent.
(a) V = im T ⊕ ker T .
(b) im T and ker T are linearly independent.
(c) im T has a T -invariant complement.
(d) im T has a unique T -invariant complement.
16. Let T be a linear operator on a finite-dimensional vector space V . Then show that V does
not have any nonzero proper T -invariant subspace iff chT (X) is an irreducible polynomial.
17. Let T be a linear operator on a finite-dimensional vector space V . If chT (X) = minT (X) is
a power of an irreducible polynomial, then show that a nonzero proper T -invariant subspace
of V cannot have any T -invariant complement.
18. (*) Let A, B ∈ Mn (F ) be such that ch_A(X) = ch_B(X) = ∏_{i=1}^{r} (X − λ_i)^{e_i} and min_A(X) =
min_B(X). If e_i ≤ 3 for all i, then prove that A and B are similar matrices.
19. (*) Let A be an n × n matrix over a field F . Then the commutator of A, denoted by CA , is
defined as
CA := {M ∈ Mn (F ) | AM = M A}.
Now prove the following statements.
(i) If A and B = P −1 AP are similar matrices then CB = P −1 CA P . In particular, dim CA =
dim CB .
(ii) Given A ∈ Mn (F ), define a linear operator TA : Mn (F ) → Mn (F ) given by TA (M ) :=
AM − M A. Then CA = ker TA . Prove that there exists an ordered basis B (independent
of A) of Mn (F ) such that [TA ]B = diag(A, · · · , A), i.e., [TA ]B is a block diagonal matrix
with exactly n blocks of size n × n and each block is equal to A.
Deduce that the map Mn (C) → M_{n²} (C), given by A 7→ [TA ]B , is a continuous linear
transformation (One can also conclude this from the fact that every linear transforma-
tion between finite dimensional vector spaces over F := R/C is continuous, but we can
actually ‘see’ it from the above construction.).
(iii) If E/F is a field extension and A ∈ Mn (F ), then we can also define
20. (*) Let A ∈ Mn (C) and let λ(A) denote the set of eigenvalues of A. Prove the following statements.
(i) If A and B = P^{−1}AP are similar matrices then lim_{n→∞} A^n exists iff lim_{n→∞} B^n exists,
and in this case lim_{n→∞} B^n = P^{−1}(lim_{n→∞} A^n)P .
(ii) lim_{n→∞} A^n = 0 iff λ(A) ⊆ B(0, 1).
(iii) (*) The set {A^n} is bounded iff λ(A) ⊆ B[0, 1] and for all λ ∈ S^1 ∩ λ(A), the geometric
multiplicity of λ is equal to its algebraic multiplicity, i.e., rk (A − λI) = rk (A − λI)^2.
(iv) (*) The sequence (A^n) is convergent iff λ(A) ⊆ B(0, 1) ∪ {1} and the geometric multi-
plicity of 1 is equal to its algebraic multiplicity, i.e., rk (A − I) = rk (A − I)^2. In this
case, lim_{n→∞} A^n is an idempotent operator whose rank is equal to the geometric multi-
plicity of 1.
Hint. To prove (ii), note that A is similar to a matrix A′ such that A′ can be written
as A′ = D + N , where D is a diagonal matrix with ch_D(X) = ch_A(X), N is a nilpotent
matrix and DN = N D. Now expand (D + N )^n.
For (iii), note that if a, b ∈ F then, writing 2 × 2 matrices by their rows,
[[a, b], [0, a]]^n = [[a^n, n a^{n−1} b], [0, a^n]] for all n ≥ 1.
To prove (iv), observe that if α ∈ C, then lim_{n→∞} α^n = 1 iff α = 1. (A small numerical
illustration of (iv) is sketched after this exercise list.)
21. Let T be a linear operator on V . For an element v ∈ V , the annihilator of v with respect
to T , denoted by annT v, is defined as annT v := {f (X) ∈ F [X] | f (T )(v) = 0}; and if
S ⊆ V , the annihilator of S with respect to T is defined as annT S := ∩_{v∈S} annT v. Note
that annT V = ann T . If annT v ≠ 0, then the unique monic generator of annT v is also called
the annihilator of v with respect to T . In the following exercises, assume that T satisfies a
nonzero polynomial over F .
(i) Let V 0 , V 00 ≤ V with f (X) := annT V 0 and g(X) := annT V 00 . If f (X), g(X) are
relatively prime, then show that V 0 ∩ V 00 = 0.
(ii) For all x ∈ V , annT x = annT Vx , where Vx is the smallest T -invariant subspace of V
containing x.
(iii) Let u, v ∈ V . If f (X) := annT u and g(X) := annT v are relatively prime, then prove
that annT (u + v) = f g(X).
(iv) If φ(X) is a non-constant factor of minT (X), then show that there exists an element
v ∈ V such that annT v = φ(X).
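The following rough numerical sketch, assuming numpy, illustrates part (iv) of exercise 20 above for a hypothetical matrix with eigenvalues 1 and 1/2: the powers A^n converge and the limit is (numerically) an idempotent matrix of rank 1.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 0.5]])                 # eigenvalues 1 and 1/2
    L = np.linalg.matrix_power(A, 200)         # a proxy for lim A^n
    print(np.allclose(L @ L, L))               # True: the limit is idempotent
    print(np.linalg.matrix_rank(L))            # 1, the geometric multiplicity of 1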
Lecture 16, 17, 18 and 19 (14/12/2020, 16/12/20, 21/12/2020,
23/12/2020) :
Inner Product Spaces
So far, we have studied linear algebra from a purely algebraic viewpoint. But we have also seen
from different courses on analysis and geometry that Rn comes equipped with a natural geometric
structure - the so-called Euclidean metric which can be studied and used to our benefit.
We will now systematically develop a geometric structure on vector spaces. To do so, from now
on, unless otherwise mentioned, we will exclusively work over the field of real or complex numbers
which already have ‘in-built’ geometric structures. With co-ordinate systems at our disposal, our
main goal is to introduce the notions of ‘length’ and ‘angle’ - the fundamental tools of co-ordinate
geometry. In fact, we will soon see that the notion of angle, per se, is not that important, but the
concept of ‘orthogonality’ is, i.e., when two vectors are mutually perpendicular.
We will mainly focus on finite-dimensional vector spaces and many of the results which we prove
in this context have their natural generalizations to vector spaces of arbitrary dimensions, as one
can see in functional analysis. In a sense, the study of finite-dimensional inner product spaces in
linear algebra is an excellent prelude to a course on functional analysis.
Definitions. Let V be a vector space over a field F := R/C. Then an inner product on V is
defined to be a function
h , i: V × V → F
satisfying the following properties
(i) Linearity in the first co-ordinate: For all x, y, z ∈ V and λ ∈ F , hx + λy, zi = hx, zi + λhy, zi.
(ii) Conjugate-symmetry: For all x, y ∈ V , hy, xi = hx, yi.
(iii) Positivity: For all x ∈ V , if x is nonzero then hx, xi > 0.
An ordered pair (V, h , i), where V is a vector space over F and h , i : V × V → F is an inner
product on V , is called an inner product space, or IPS, in short, over F . Often we’ll denote an
inner product space simply by V if either the inner product h , i is understood from the context or
if the discussion is independent of any particular choice of the inner product. If F = R (or F = C)
then V is called a real inner product space (or a complex inner product space).
Two vectors x, y ∈ V are said to be mutually orthogonal or orthogonal to each other if hx, yi = 0.
In symbol, we write this as x ⊥ y. Note that x ⊥ y iff y ⊥ x and a nonzero vector x cannot be
orthogonal to itself.
A finite sequence of vectors x1 , . . . , xn ∈ V is said to be an orthogonal sequence if xi ⊥ xj for all
i 6= j.
An orthogonal sequence of vectors x1 , . . . , xn ∈ V is said to be an orthonormal sequence if each xi
has norm 1, i.e., if ‖xi‖ := √⟨xi , xi⟩ = 1 for all i. A vector whose norm is 1 is called a unit vector.
Thus an orthonormal sequence is an orthogonal sequence of unit vectors.
Note that an orthogonal sequence of nonzero vectors is linearly independent. In particular, an
orthonormal sequence of vectors is linearly independent.
A set of vectors S ⊆ V is said to be an orthogonal set (respectively an orthonormal set) if ev-
ery finite sequence of distinct elements of S is orthogonal (respectively orthonormal). Note that
we’re following the same pattern of definitions as given for linearly independent sequence/set of
vectors. However, unlike linear independence, orthogonality is always checked pairwise. Therefore,
we could have directly defined a set S ⊆ V to be orthogonal if any two distinct elements of the set
are mutually orthogonal. Further, if S consists of only unit vectors, then S is an orthonormal set.
In particular, a singleton set {x} ⊆ V is always orthogonal, and it’s orthonormal iff kxk = 1.
Remarks.
1. If (V, h , i) is an inner product space then for every subspace W ≤ V , the restriction h , i|W ×W
is an inner product on W , so that (W, h , i|W ×W ) itself becomes an inner product space.
More generally, let V, W be vector spaces over F := R/C with (V, h , iV ) being an inner
product space. If T : W → V is an injective linear transformation, then we can consider W to
be an inner product space by identifying it with its image T (W ), i.e., h , iW : W × W → F ,
defined as hx, yiW := hT (x), T (y)iV for all x, y ∈ W , is an inner product on W .
57
2. If F = R, then an inner product on V is a symmetric bilinear form, i.e., a symmetric function
on V × V which is linear in both the co-ordinates.
However, if F = C, then an inner product on V is neither bilinear nor symmetric. It’s
a so called sesquilinear form, i.e., one-and-half linear, because although it’s linear in the
first co-ordinate and additive in the second co-ordinate, it’s only conjugate-linear in the
second co-ordinate. Actually, over C, it couldn’t have been bilinear without sacrificing the
‘positivity’ condition; because if f : V × V → F is a bilinear function, then for all x ∈ V ,
f (ix, ix) = −f (x, x), implying that the positivity condition fails whenever V 6= 0.
If x ⊥ y then ‖x + y‖² = ‖x‖² + ‖y‖². This is known as the Pythagorean law and it’s just a
generalization of what we learnt in our high school geometry: given any right-angled triangle,
hypotenuse² = base² + height².
7. Let (V, h , i) be an inner product space. Then the induced norm k k satisfies the parallelogram
law, i.e., kx + yk2 + kx − yk2 = 2(kxk2 + kyk2 ) for all x, y ∈ V . We’ll later see that every
norm which satisfies this parallelogram law is induced by an inner product.
3. Let F := R/C and V := C([0, 1], F ), the F -vector space of all continuous F -valued functions
defined on the unit interval [0, 1]. Then ⟨f, g⟩ := ∫₀¹ f(x) \overline{g(x)} dx is an inner product on V .
Definitions. Let V be an inner product space. An orthonormal set S ⊆ V is said to be a
complete orthonormal set if it’s a maximal element in the family of all orthonormal subsets of V
with respect to set inclusion.
An orthonormal set S ⊆ V is said to be an orthonormal basis or a Hilbert basis if the linear span
of S is dense in V with respect to the induced metric.
It is clear that every orthonormal basis is a complete orthonormal set. Now using Zorn’s
lemma, we can easily show that every inner product space contains a complete orthonormal set (If
V = 0, it’s the empty set!). But surprisingly, there are examples of inner product spaces without
any orthonormal basis; in particular, showing that a complete orthonormal set need not be an
orthonormal basis. However, we’ll later see that every complete inner product space, i.e., a Hilbert
space, always has an orthonormal basis. In fact, in a Hilbert space, every complete orthonormal
set is a Hilbert basis.
Next, we do Gram-Schmidt orthogonalization, perhaps the most useful theorem for finite-
dimensional inner product spaces. It’ll allow us to ‘convert’ bases into orthonormal bases; so
that, theoretically, every problem about a finite-dimensional inner product space can be reduced
to a problem about F n with the usual dot product. In fact, often it suffices to consider just F 2 ,
as we’ll see.
Note that when yi = 0, λi could have been chosen to be any constant. Also, yr+1 = 0 iff
xr+1 ∈ ⟨x1 , . . . , xr ⟩. Therefore y1 , y2 , . . . can be chosen to be an orthonormal sequence iff
the original sequence x1 , x2 , . . . is linearly independent.
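As a rough illustration, assuming numpy, here is the Gram-Schmidt recipe carried out in R^3 for three hypothetical vectors: each new vector has its projections onto the previously constructed (nonzero) vectors subtracted, and the resulting Gram matrix is diagonal.

    import numpy as np

    def gram_schmidt(xs):
        ys = []
        for x in xs:
            y = np.array(x, dtype=float)
            for z in ys:
                if np.dot(z, z) > 1e-12:                     # skip any zero vector
                    y = y - (np.dot(y, z) / np.dot(z, z)) * z
            ys.append(y)
        return ys

    ys = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
    gram = [[round(float(np.dot(u, v)), 10) for v in ys] for u in ys]
    print(gram)                                              # off-diagonal entries are 0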
Remarks.
1. If y1 , y2 , . . . is an orthogonal sequence satisfying the condition of the above theorem, then so
is the sequence λ1 y1 , λ2 y2 , . . . for every sequence of nonzero elements λ1 , λ2 , . . . ∈ F .
And if y1 , y2 , . . . is an orthonormal sequence satisfying the condition of the above theorem,
then so is the sequence λ1 y1 , λ2 y2 , . . . for every sequence of elements λ1 , λ2 , . . . ∈ F such
that |λi | = 1 for all i.
So the Gram-Schmidt orthogonalization only gives us one orthogonal sequence satisfying the
desired condition. It is not unique even if we restrict ourselves only to orthonormal sequences.
2. Every finite dimensional inner product space V has an orthonormal basis. In this case, an
orthonormal set S ⊆ V is an orthonormal basis iff it’s a basis of V .
is a Hilbert space and every orthonormal basis of V is also an orthonormal basis of V̂ . Now
suppose that V is an inner product space of countably infinite dimension as given in the
above example. Then choose a vector x ∈ V̂ \ V and let W := V + ⟨x⟩. Clearly, W has
a countable dimension. Now every orthonormal basis S of V is also an orthonormal basis
of W , but S cannot span W . Therefore, unlike the finite-dimensional case, an orthonormal
basis need not be a spanning set of a countably infinite-dimensional inner product space.
4. Let V be an inner product space over F = R/C. For every positive integer n, let Tn :=
{(a1 , . . . , an ) ∈ F n | |ai | = 1 for all i}. Then Tn is a group under coordinate-wise multipli-
cation and it acts on the set of all orthonormal sequences of length n, where the action is
given by ((a1 , . . . , an ), (x1 , . . . , xn )) 7→ (a1 x1 , . . . , an xn ). Similarly, the group (F ∗ )n acts on
the set of all orthogonal sequences of length n. Note that if F = C, then Tn = (S 1 )n and if
F = R, then Tn = {±1}n .
Theorem (Cauchy-Schwarz inequality). Let (V, ⟨ , ⟩) be an inner product space. Then
|⟨x, y⟩| ≤ ‖x‖ ‖y‖ for all x, y ∈ V , and equality holds iff x and y are linearly dependent.
Proof. It’s easy to see that if x, y are linearly dependent then |⟨x, y⟩| = ‖x‖ ‖y‖.
So let us assume that W , the subspace of V generated by x and y, has dimension 2. Note that it
suffices to prove the inequality after replacing x by x/kxk. So we may assume that kxk = 1. Since
W is two-dimensional, we can use Gram-Schmidt orthogonalization to find an orthonormal basis
{x, e} of W . Let y = c1 x + c2 e where c2 ≠ 0. Then
|⟨x, y⟩| = |⟨x, c1 x + c2 e⟩| = |c1| < √(|c1|² + |c2|²) = ‖y‖.
Therefore the equality holds in Cauchy-Schwarz inequality iff x and y are linearly dependent.
We can use Cauchy-Schwarz inequality to show that the norm induced by an inner product
satisfies the triangle inequality.
Corollary. Let (V, h , i) be an inner product space with the induced norm k k. Then
kx + yk ≤ kxk + kyk.
Proof. As both sides are non-negative, it suffices to prove the inequality after squaring them.
Now kx + yk2 = kxk2 + kyk2 + 2 Re hx, yi ≤ kxk2 + kyk2 + 2 |hx, yi| ≤ kxk2 + kyk2 + 2 kxk kyk =
(kxk + kyk)2 .
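A quick numerical check, assuming numpy, of the Cauchy-Schwarz inequality and the triangle inequality for a few random vectors in C^3 (the convention below takes the inner product conjugate-linear in the second argument, as in these notes).

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(3):
        x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
        y = rng.standard_normal(3) + 1j * rng.standard_normal(3)
        inner = np.vdot(y, x)                  # <x, y>, linear in x, conjugate-linear in y
        nx, ny = np.linalg.norm(x), np.linalg.norm(y)
        print(abs(inner) <= nx * ny, np.linalg.norm(x + y) <= nx + ny)   # True True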
Recall that if z is a complex number, then Im z = Re (−iz). Therefore, Im hx, yi = Re (−ihx, yi) =
Re hx, iyi. So we can write
hx, yi = Re hx, yi + i Re hx, iyi.
Now one can identify the vector space Cn with R2n by sending a vector (z1 , . . . , zn ) ∈ Cn
to the vector (Re z1 , Im z1 , . . . , Re zn , Im zn ) ∈ R2n , which is an R-linear transformation. Let
h , iC , h , iR denote the usual dot products on Cn and R2n respectively. Then one can check that
for all x, y ∈ Cn , Re hx, yiC = hx, yiR , where the right hand side is computed after identifying x, y
with their images in R2n under the prescribed identification. One can also check that
the induced norms ‖ ‖C and ‖ ‖R are the same, i.e., it does not matter whether we compute the norm
of an element x ∈ Cn directly, or after identifying it with an element of R2n , which is
consistent with our intuition. Therefore, although the induced norms are the same, the complex dot
product on Cn seems to ‘carry more information’ than its real counterpart, the dot product on R2n .
But actually the complex inner product does not give us anything ‘more’, as the ‘imaginary
part’ of a complex inner product can be retrieved from its ‘real part’. Hence (Cn , ·) contains the
‘same information’ as (R2n , ·).
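The identification can be checked numerically; the sketch below, assuming numpy, uses two hypothetical vectors in C^2 and verifies that the real part of their complex dot product equals the real dot product of their images in R^4.

    import numpy as np

    def to_real(z):
        # (z1, ..., zn) |-> (Re z1, Im z1, ..., Re zn, Im zn)
        return np.column_stack([z.real, z.imag]).ravel()

    x = np.array([1 + 2j, 3 - 1j])
    y = np.array([2 - 1j, 1 + 4j])
    lhs = np.vdot(y, x).real                   # Re <x, y>_C
    rhs = np.dot(to_real(x), to_real(y))       # <x, y>_R in R^4
    print(np.isclose(lhs, rhs))                # True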
The angle between two nonzero vectors
If x, y ∈ R2 , we know from the knowledge of Cartesian geometry that θ(x, y), the angle between x
and y, is given by x · y = kxk kyk cos θ(x, y). Now any two vectors in Rn which are not collinear
generate a plane in Rn , and after identifying the plane with R2 , we can compute the angle between
those vectors by treating them as vectors in R2 . One can check that the angle θ(x, y) is then given by
θ(x, y) = cos⁻¹( (x · y) / (‖x‖ ‖y‖) ), where 0 ≤ θ(x, y) ≤ π.
So if (V, h , iR ) is a real inner product space and x, y ∈ V are nonzero vectors, we define θ(x, y),
the angle between x and y, as
θ(x, y) = cos⁻¹( ⟨x, y⟩R / (‖x‖ ‖y‖) ), where 0 ≤ θ(x, y) ≤ π.
Now if V is a complex inner product space then the inner product takes values in complex num-
bers. So, for example, if V = Cn with the usual dot product, then it’s only natural to identify Cn
with R2n and use the available notion of angle in R2n . This is something which we’ve always done
by identifying C with R2 . So inspired by the relation between Cn and R2n with their usual dot
products, we give the general definition of the angle between two nonzero vectors as follows.
Definition. Let (V, h , i) be an inner product space over F = R/C. If x, y ∈ V are nonzero
vectors then θ(x, y), the angle between x and y, is defined as
θ(x, y) = cos⁻¹( Re ⟨x, y⟩ / (‖x‖ ‖y‖) ), where 0 ≤ θ(x, y) ≤ π.
But one should remember that the above definition is not very ‘intuition-friendly’ over C. As
for a simple example, let V := C be the one-dimensional vector space over C with the usual dot
product. Then in the algebraic sense, i, 1 ∈ V are ‘C-collinear’ because together they generate a
one-dimensional space. However, by the above definition, the angle between them is θ(i, 1) = π/2,
i.e., the vectors are mutually perpendicular! Here ‘the mystery’ lies in the fact that we ‘forget’
the ‘imaginary part’ of the inner product while computing the angle. The vectors i and 1 are not
orthogonal in the ‘complex sense’ because hi, 1i = i 6= 0. But they ‘become’ perpendicular when
we identify C with R2 , where 1 is mapped to (1, 0) and i is mapped to (0, 1).
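A one-line numerical check, assuming numpy, of the example just discussed: in V = C with the usual inner product ⟨x, y⟩ = x ȳ, the angle between i and 1 computed from the definition above is π/2.

    import numpy as np

    x, y = 1j, 1.0
    inner = x * np.conj(y)                               # <x, y> = i
    theta = np.arccos(inner.real / (abs(x) * abs(y)))    # only Re <x, y> enters
    print(theta, np.pi / 2)                              # both are pi/2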
Matrix representations of inner products
Let (V, h , i) be a finite dimensional inner product space. If B := (x1 , . . . , xn ) is an ordered basis
of V , then the matrix MB whose ij-th entry is hxj , xi i, uniquely determines the inner product.
We say that MB is the matrix representation of h , i with respect to the ordered basis B. One can
check that for all x, y ∈ V , hx, yi = [y]∗B MB [x]B .
Conversely, a matrix M ∈ Mn (F ) is called a matrix of an inner product h , i if there exists an
inner product h , i : V × V → F and an ordered basis B := (x1 , . . . , xn ) of V such that M = MB .
We make a few observations about such a matrix M .
(i) M is a Hermitian matrix if F = C and a symmetric matrix if F = R. Recall that a complex
matrix A is said to be Hermitian if A = A∗ . So if A ∈ Mn (R), then A, as a matrix over C,
is Hermitian iff it’s symmetric.
(ii) Every diagonal entry of M is a positive real number.
(iii) For every n × 1 column vector x over F , x∗ M x > 0. A Hermitian matrix A ∈ Mn (C)
(respectively, a symmetric matrix A ∈ Mn (R)) is said to be positive-definite if x∗ Ax > 0
(respectively, xt Ax > 0) for all x ∈ Cn \ {0} (respectively, x ∈ Rn \ {0}). Therefore a matrix
is a matrix of an inner product iff it’s positive-definite.
(iv) Every eigenvalue of M is a nonzero positive real number.
(v) If P ∈ GLn (F ), then hx, yi := y ∗ P ∗ P x defines an inner product on F n . Therefore P ∗ P
is a matrix of an inner product. Later we’ll see that the converse is also true, i.e., every
positive-definite matrix is of this particular form.
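Observation (v) can be checked numerically; in the sketch below, assuming numpy, P is a hypothetical invertible 2 × 2 complex matrix, M = P*P is Hermitian, and x*Mx = ‖Px‖² is positive for a nonzero vector x.

    import numpy as np

    P = np.array([[1 + 1j, 2],
                  [0, 3 - 1j]])                # invertible, hypothetical
    M = P.conj().T @ P                         # M = P* P
    print(np.allclose(M, M.conj().T))          # True: M is Hermitian
    x = np.array([2 - 1j, 1 + 3j])             # a nonzero test vector
    print((x.conj() @ M @ x).real)             # positive, since x* M x = ||P x||^2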
Remarks.
1. Let A ∈ Mn (C) be a Hermitian matrix. Then x∗ Ax ∈ M1 (C) is also a Hermitian matrix for
all n × 1 column vectors x over C. Therefore Ax · x is a real number for all x ∈ Cn . We’ll
later connect this to certain characterizing property of complex self-adjoint operators.
2. For all A ∈ Mn (F ), A + A∗ and AA∗ are both Hermitian matrices. Note that a scalar matrix
λI ∈ Mn (F ) is Hermitian iff λ ∈ R. Actually, complex Hermitian matrices ‘behave’ like
real numbers. For example, every matrix A ∈ Mn (C) can be uniquely written in the form
A = B + iC, where B, C are Hermitian matrices. (Can you solve the equation A = X + iY
assuming that X, Y are Hermitian matrices?). We will further explore the analogy between
complex Hermitian matrices and real numbers in the future.
3. A positive (or negative) definite matrix is invertible.
4. The map φ : Mn (F ) → Mn (F ), given by φ(A) := A∗ , is an R-algebra anti-isomorphism, i.e.,
it satisfies all the properties of an R-algebra isomorphism except that it switches the order
of multiplication, viz., φ(AB) = φ(B)φ(A). Note that φ2 = Id.
5. Let A ∈ Mn (F ) be a Hermitian matrix with n > 1 (What does a 1 × 1 Hermitian/positive
definite/positive semi-definite/negative definite/negative semi-definite matrix look like?).
Then we have seen that Ax · x ∈ R for all x ∈ Mn×1 (F ). So we can partition F n into
three sets - Ω+ (A) := {x ∈ F n | Ax · x > 0}, Ω− (A) := {x ∈ F n | Ax · x < 0} and
Ω0 (A) := {x ∈ F n | Ax · x = 0}. Note that all three sets are closed under nonzero scalar
multiplication.
The matrix A is positive (respectively, negative) definite iff Ω+ (A) ∪ {0} = V (respectively,
Ω− (A) ∪ {0} = V ).
Similarly, A is positive (respectively, negative) semi-definite iff Ω− (A) = ∅ (respectively,
Ω+ (A) = ∅). As F n \ {0} is connected and Ω+ (A), Ω− (A) are disjoint open sets, it fol-
lows that if A is an indefinite matrix then Ω0 (A) ≠ 0. In this case, each one of the sets
Ω+ (A) ∪ {0}, Ω− (A) ∪ {0} and Ω0 (A) is a union of lines passing through the origin.
To illustrate the idea, we consider the set of 2 × 2 real symmetric matrices. For a
symmetric matrix A = [[α, β], [β, γ]] (written by its rows), we see that Ω0 (A) is the solution set
of the homogeneous quadratic equation αX² + 2βXY + γY² = 0, where αX² + 2βXY + γY² ∈ R[X, Y ].
Clearly, Ω0 (A) can either be the singleton set consisting of the origin, or a line passing
through the origin, or a pair of lines passing through the origin. In the first case, A is a
positive (or negative) definite matrix. In the second case, A is a positive (or negative) semi-
definite matrix which is not positive (or negative) definite. And in the third case, A is an
indefinite matrix (why?).
You may consider the 2 × 2 real matrices A := [[1, 0], [0, 0]], [[1, 0], [0, −1]], [[3, 0], [0, −1]] to get
an idea about how these sets may look.
The first matrix is positive semi-definite, with Ω0 (A) = {(0, y) ∈ R2 | y ∈ R}. The other two
matrices are indefinite, but there’s a difference. The second matrix gives the reflection with
respect to the X-axis. Here Ω0 (A) = {(x, y) ∈ R2 | x2 = y 2 }. So we get a partition of R2
in four symmetric quadrants, where the opposite quadrants consist of vectors of the ‘same
sign’. The last matrix, besides being indefinite, also imparts some stretch on the vectors.
Here Ω0 (A) = {(x, y) ∈ R2 | 3x2 = y 2 }. Now the quadrants are not symmetric anymore.
Only the opposite quadrants are symmetric. The quadrants in which the vectors of ‘positive
sign’ are lying, are bounded by two lines meeting at 120◦ at the origin. So we can actually
choose an orthonormal basis u, v ∈ R2 such that Au · u and Av · v are both positive, but A is
not a positive definite matrix. In a sense, the first matrix is a ‘limiting case’ of the other two
matrices when the two lines describing Ω0 (A) ‘coincide’ (If you imagine that the two lines
describing Ω0 (A) for the second or the third matrix are ‘pulled’ towards the Y -axis, then, as
a ‘limit’, we get the first matrix.).
A mental picture ‘identifying’ the set of all 2 × 2 real symmetric matrices with the real plane
R2 using the eigenvalues of the matrices may aid our understanding (We don’t want to make
it precise. For different matrices may have the same eigenvalues, and for a symmetric matrix
A with eigenvalues, say 1 and 2, can be sent either to (1, 2) or to (2, 1)!). It’ll give us the
following correspondences.
8. Let A := [[1, 1], [−1, 1]] ∈ M2 (R). Then one can check that for all x := (x1 , x2 )ᵗ ∈ R2 \ {0},
xᵗAx = x1² + x2² > 0. But A is not a symmetric matrix.
However, we’ll later see that for any matrix M ∈ Mn (C), the condition that x∗ M x > 0 for
all nonzero vectors x ∈ Cn actually implies that M is Hermitian.
Let A, B ∈ Mn (F ) be Hermitian matrices. Then the following statements hold.
(a) A + B, ABA, BAB are Hermitian matrices, and AB is a Hermitian matrix iff AB = BA.
(b) If A is positive (respectively, negative) definite and B is positive (respectively, negative)
semi-definite, then A + B is a positive (respectively, negative) definite matrix.
(c) If A, B are positive (respectively, negative) definite matrices, then so are ABA and
BAB.
(d) If A, B are positive (respectively, negative) semi-definite matrices, then so are ABA and
BAB.
(e) If A is a positive (respectively, negative) definite matrix, so is A−1 . Now the scalar
matrix λI is positive (respectively, negative) definite iff λ is a positive (respectively,
negative) real number. Therefore, for a positive definite matrix A, λA is positive (re-
spectively, negative) definite iff λ > 0 (respectively, λ < 0).
(f) If A, B are positive (or negative) (semi-)definite matrices such that AB = BA then we’ll
later see that AB is a positive (semi-)definite matrix.
Can you give an example of positive definite matrices A, B ∈ Mn (F ) such that AB is
not even Hermitian?
11. Every vector z ∈ Cn can be uniquely written as z = x + iy, where x, y ∈ Rn . Using this,
it’s easy to prove that if a symmetric matrix A ∈ Mn (R) is a positive definite (or negative
definite/positive semi-definite/negative semi-definite) matrix, then A, treated as an n × n
matrix over C, is also positive definite (or negative definite/positive semi-definite/negative
semi-definite).
In the proof, we crucially use the fact that Az · z ∈ R for all z ∈ Cn . Without this property,
it may happen for some matrix A ∈ Mn (R) that Ax · x > 0 for all x ∈ Rn , but Az · z < 0 for
some z ∈ Cn , as the example in Remark 8 shows. Thus, if we define a matrix to be ‘positive
definite’ if it only satisfies the positivity condition, without requiring it to be Hermitian,
then this will lead us to a situation where a matrix A ∈ Mn (R) will be ‘positive-definite’;
but when treated as a matrix over C, it won’t be ‘positive definite’ !
12. Let A ∈ Mn (F ) be a positive definite matrix. Then we’ve seen that all eigenvalues of A are
positive real numbers. Therefore det A is also positive. For each i ≤ n, let Ai be the i-th
leading principal sub-matrix of A, i.e., the matrix obtained from A by deleting the rows and
columns starting from i + 1. Then it’s easy to see that each Ai is also positive definite (Just
look at the vectors in F n whose last n − i co-ordinates are 0.). It implies that for each i ≤ n,
det Ai , called the i-th leading principal minor of A, is a positive real number. Surprisingly,
the converse is also true. Sylvester’s criterion, which we’ll not prove, states that ‘A Hermitian
matrix M ∈ Mn (F ) is positive definite iff every leading principal minor of M is positive’.
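A small numerical comparison, assuming numpy, of the eigenvalue test and Sylvester's criterion for a hypothetical real symmetric matrix: all eigenvalues are positive exactly when all leading principal minors are positive.

    import numpy as np

    A = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
    eig_test = bool(np.all(np.linalg.eigvalsh(A) > 0))
    minor_test = all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, 4))
    print(eig_test, minor_test)                # True True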
Definitions. Let V be a vector space over F = R/C. Then a norm on V is defined to be a
function ‖ ‖ : V → R≥0 satisfying the following properties
(i) Triangle inequality: For all x, y ∈ V , ‖x + y‖ ≤ ‖x‖ + ‖y‖.
(ii) Absolute homogeneity: For all x ∈ V and λ ∈ F , ‖λx‖ = |λ| ‖x‖.
(iii) Positive definiteness: For all nonzero vectors x ∈ V , ‖x‖ > 0.
An ordered pair (V, ‖ ‖), where V is a vector space over F = R/C and ‖ ‖ : V → R≥0 is a norm on
V , is called a normed linear space, or in short, an NLS. Like inner product spaces, a normed linear
space is called separable, complete etc. if it’s so with respect to the induced metric. A complete
normed linear space is also called a Banach space, named after Stefan Banach.
By a metric vector space, or an MVS, in short, we mean an ordered pair (V, d) where V is a vec-
tor space over F = R/C, and d : V × V → R is a metric such that the vector space operations
+ : V × V → V and · : F × V → V are continuous, where the products are given the product
metrics.
A topological vector space, or a TVS, in short, means an ordered pair (V, I) where V is a vector
space over F = R/C, and I ⊆ P(V ) is a topology on V such that the vector space operations
+ : V × V → V and · : F × V → V are continuous, with the products being given the product
topologies.
Remarks.
1. Every inner product space is also a normed linear space with respect to the induced norm.
Similarly, every normed linear space is a metric vector space with respect to the induced
metric, and every metric vector space is a topological vector space with respect to the induced
topology. So IPS =⇒ NLS =⇒ MVS =⇒ TVS gives certain hierarchy.
2. We have seen that a norm which is induced by an inner product satisfies the parallelogram
law. Conversely, if a norm ‖ ‖ on a vector space V satisfies the parallelogram law, then it
is induced by an inner product which can be retrieved from the norm using the polarization
identities
⟨x, y⟩R := (1/4)(‖x + y‖² − ‖x − y‖²), if F = R,
and
⟨x, y⟩C := (1/4)(‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖²), if F = C.
It requires some computation to show that the functions defined above actually give inner
products on V . Clearly, different inner products cannot induce the same norm. (A numerical
sketch of the complex polarization identity appears after these remarks.)
3. If V is a normed linear space then the norm may not be induced by an inner product. For
example, one can consider the sup norm k ksup on F 2 , defined as k(x, y)ksup := max {|x|, |y|},
which is not induced by an inner product (why?). More generally, if we take V := F (N) , and
define the `p -norm on V as
‖(xn)‖p := ( Σ_{n=1}^{∞} |xn|^p )^{1/p},
then one can show that (We’re not going to prove it!) the `p -norm satisfies the parallelogram
law iff p = 2.
4. Let V be a vector space over F = R/C. Two norms `1 , `2 on V are said to be equivalent if
there exist real numbers c, C > 0 such that c ℓ1 (x) ≤ ℓ2 (x) ≤ C ℓ1 (x)
for all x ∈ V . It’s easy to check that this indeed gives an equivalence relation on the set of all
norms defined on V . Note that if `1 , `2 are two norms on V and S1 := {x ∈ V | `1 (x) = 1},
then `1 and `2 are equivalent iff inf {`2 (x) | x ∈ S1 } and sup {`2 (x) | x ∈ S1 } are both nonzero
real numbers, in which case, we can take c := inf {`2 (x) | x ∈ S1 } and C := sup {`2 (x) | x ∈
S1 }.
Thankfully, and perhaps not surprisingly, any two norms defined on a finite-dimensional vector
space V are equivalent. To prove this, one can show that if ℓ : F n → R≥0 is a norm on F n ,
then ℓ is a continuous function with respect to the usual Euclidean metric. We leave its proof
as an exercise.
However, two norms on an infinite dimensional normed linear space need not be equivalent.
For example, one may compare the ℓ1 -norm with the sup norm on F (N) , and look at the
sequence (e1 + e2 + · · · + en )_{n=1}^{∞} .
5. If `1 , `2 are equivalent norms on a vector space V over F = R/C, then a set X ⊆ V is open
with respect to the metric induced by `1 iff it’s open with respect to the metric induced by `2 .
Therefore, if needed, we can answer various topological questions after replacing the original
norm by an equivalent norm.
6. If (V, k k) is a finite-dimensional normed linear space then the unit sphere, defined as S :=
{x ∈ V | kxk = 1}, is a compact set.
The converse is also true. It follows from Riesz’s lemma, which states that
‘Let (V, ‖ ‖) be a normed linear space, Y a closed proper subspace of V and α a real number
with 0 < α < 1. Then there exists an x ∈ S such that ‖x − y‖ ≥ α for all y ∈ Y .’
We will not prove Riesz’s lemma in this course.
7. As we saw in the case of inner product spaces, if `1 , `2 are norms on a vector space V over
F = R/C, then ℓ1 + ℓ2 is also a norm on V . If V ≠ 0, then λℓ1 is a norm on V iff λ > 0.
Analogous properties hold for metric vector spaces (What happens to the assertion involving
the difference of two norms/metrics?).
8. If (V1 , k k1 ), (V2 , k k2 ) are two normed linear spaces then we can define a norm on the direct
sum V1 ⊕ V2 as follows
k(x, y)k := kxk1 + kyk2 .
9. A metric d which makes F n a metric vector space need not be induced by a norm. To give
such an example, it’s tempting to consider the discrete metric on F n . But F n does not
become a metric vector space with respect to the discrete metric because then the scalar
multiplication is not continuous. So we consider the following metric
d(x, y) := ‖x − y‖ / (1 + ‖x − y‖),
where ‖ ‖ is the usual Euclidean norm on F n . Then one can check that (F n , d) is a metric
vector space. But since d is bounded, it cannot be induced by a norm.
If (V, d) is a metric vector space, then d is induced by a norm iff it’s translation invariant,
i.e., d(x + z, y + z) = d(x, y) for all x, y, z ∈ V , and absolutely homogeneous, i.e., d(λx, λy) =
|λ|d(x, y) for all x, y ∈ V and λ ∈ F . In this case, we can ‘get back’ the norm by simply
defining kxk = d(0, x). In the above example, d is translation invariant, but not absolutely
homogeneous. We’ll later see an example of an absolutely homogeneous metric which is not
translation invariant.
10. Clearly, two different norms cannot induce the same metric. However, it’s a fact, which we
won’t prove, that there exists a unique topology on F n , viz., the product topology, which
makes it a topological vector space. Therefore every metric on F n which makes it a metric
vector space, induces the same topology.
11. Every norm on a one-dimensional vector space V over F = R/C is induced by an inner
product. In fact, it follows from the parallelogram law that if (V, ‖ ‖) is a normed linear
space, then the norm on V is induced by an inner product iff for every two dimensional
subspace W ≤ V , the restriction ‖ ‖|W is induced by an inner product on W .
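As promised in Remark 2, here is a small numerical sketch, assuming numpy, of the complex polarization identity: the inner product of two hypothetical vectors in C^2 is recovered from the induced norm alone.

    import numpy as np

    def norm(v):
        return np.sqrt(np.vdot(v, v).real)

    x = np.array([1 + 1j, 2 - 1j])
    y = np.array([0 + 2j, 1 + 1j])
    polar = (norm(x + y)**2 - norm(x - y)**2
             + 1j * norm(x + 1j*y)**2 - 1j * norm(x - 1j*y)**2) / 4
    print(np.isclose(polar, np.vdot(y, x)))    # True: recovers <x, y>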
Morphisms
A linear transformation T between inner product spaces (V, h , iV ), (W, h , iW ) is said to be a
morphism (or homomorphism) of inner product spaces if T preserves the inner product, i.e., if
hT (x), T (y)iW = hx, yiV for all x, y ∈ V (Can you see the diagrammatic interpretation?). We’ll
later see that every set-theoretic map preserving the inner product is actually linear (Essentially
because the coefficients of a vector can be written in terms of the inner product.). Note that a
morphism of inner product spaces also preserves the induced norm and metric.
A linear transformation T between normed linear spaces (V, k kV ), (W, k kW ) is said to be a mor-
phism (or homomorphism) of normed linear spaces if T preserves the norm, i.e., if kT (x)kW = kxkV
for all x ∈ V .
If (V, dV ), (W, dW ) are metric vector spaces then a linear transformation T : V → W is said to be a
morphism (or homomorphism) of metric vector spaces if T preserves distance, i.e., it’s an isometry.
If T is also surjective, then we call it an isometric isomorphism.
Finally, if (V, IV ), (W, IW ) are topological vector spaces, then a continuous linear transformation
T : V → W is called a morphism (or homomorphism) of topological vector spaces.
Remarks.
1. If (V, h , i) is an n-dimensional inner product space, then choosing an ordered orthonormal
basis B := (x1 , . . . , xn ) of V is ‘same as’ giving an inner product space isomorphism between
(V, h , i) and (F n , ·). So just like vector spaces, up to isomorphism, there exists a unique
n-dimensional inner product space for every positive integer n. Note that for all x, y ∈ V ,
hx, yi = Σ_{i=1}^{n} hx, xi i \overline{hy, xi i}, just like the usual dot product.
2. A linear transformation between normed linear spaces preserves the norm iff it preserves the
induced metric.
3. Every morphism of inner product spaces/normed linear spaces/metric vector spaces is injective.
4. If T is a linear transformation between inner product spaces V, W then T preserves the inner
product iff it preserves the induced norm. The implication is trivial in one direction, and the
other direction follows from the polarization identities.
Let (V, k k) be a normed linear space over F = R/C. Suppose there exists a symmetric relation on
V , called orthogonality and denoted by x ⊥ y, such that the following properties are satisfied.
1. For all x 6= 0, x⊥ := {y ∈ V | x ⊥ y} is a hyperplane in V .
2. If x, y ∈ V are orthogonal vectors, i.e., x ⊥ y, then ‖x + y‖² = ‖x‖² + ‖y‖².
Then one can check that k k satisfies the parallelogram law, or equivalently, the norm on V is
induced by an inner product. The details are left as an exercise. Note that the first property
allows us to mimic Gram-Schmidt orthogonalization, whereas the second property, Pythagorean
law, captures the essence of orthogonality and connects it to the given norm.
Exercises.
1. HK : Section 8.1 - 3,6,7,8,11,13,14,15,16,17.
2. (i) Let F = R/C and Mn (F ) the set of n × n matrices over F . Then show that the rank
function rk : Mn (F ) → N is a lower semi-continuous function with respect to the
Euclidean metric, i.e., if A ∈ Mn (F ) has rank r then there exists an ε > 0 such that
rk B ≥ r for all B ∈ Mn (F ) satisfying ‖A − B‖ < ε.
Is the rank function continuous?
(ii) Let V be a finite-dimensional normed linear space and x1 , . . . , xn ∈ V linearly indepen-
dent vectors. Then there exists an ε > 0 such that for all y1 , . . . , yn ∈ V , if d(xi , yi ) < ε
for all i, then y1 , . . . , yn ∈ V are also linearly independent.
3. If V is an arbitrary vector space over F = R/C, can you define an inner product on V ?
4. Prove that an inner product space V is separable iff there exists a countable set S ⊆ V such
that the linear span of S is dense in V .
5. Let h , i, h , i0 be two inner products on a vector space V over F . Suppose that S ⊆ V is a
spanning set of V . Then h , i = h , i0 iff hx, yi = hx, yi0 for all x, y ∈ S.
6. Show that an orthonormal set in an inner product space cannot have any limit point.
7. (*) Let S be an orthonormal set in an inner product space V . If x ∈ V , then show that the
set {y ∈ S | hx, yi ≠ 0} is at most countable.
8. (*) Let (V, h , i) be an inner product space. Let Ṽ be the set of all Cauchy sequences of V
with respect to the induced metric. Two Cauchy sequences (xn ), (yn ) ∈ V N are said to be
equivalent if for all ε > 0 there exists a positive integer n_ε ∈ N such that d(xn , yn ) < ε for all
n ≥ n_ε , where d is the induced metric. Let V̂ be the metric space completion of V , i.e., the
set of all equivalence classes of Cauchy sequences of V . Clearly, V can be embedded in V̂
by sending an element of V to the equivalence class of the corresponding constant sequence.
Then show that (V̂ , h , i), with respect to the following definition, also becomes an inner
product space containing V as a dense subspace.
(i) [(xn )] + [(yn )] := [(xn + yn )].
(ii) λ[(xn )] := [(λxn )].
(iii) h[(xn )], [(yn )]i := lim_{n→∞} hxn , yn i.
9. Prove that an infinite-dimensional Hilbert space V cannot have an orthogonal spanning set.
Deduce that a Hilbert space V which is not finite-dimensional must have an uncountable
dimension. In particular, a Hilbert space V is finite dimensional iff it has an orthonormal
spanning set.
10. Let V be an inner product space. Then show that for all x, y ∈ V , kx − yk ≥ | kxk − kyk |,
where k k is the induced norm function.
Now prove the result for an arbitrary normed linear space (without assuming that the norm
is induced by an inner product).
11. Let T : (V h , iV ) → (W, h , iW ) be a linear transformation between inner product spaces.
Then show that the following statements are equivalent.
(i) T preserves inner product, i.e., hT (x), T (y)iW = hx, yiV for all x, y ∈ V .
(ii) For every basis B of V , hT (x), T (y)iW = hx, yiV for all x, y ∈ B.
(iii) There exists a basis B of V such that hT (x), T (y)iW = hx, yiV for all x, y ∈ B.
12. Let (V, h , i) be an n-dimensional inner product space over F . Then show that (V, h , i),
as an inner product space, is isomorphic to (F n , ·), i.e., there exists a linear transformation
T : V → F n such that hx, yi = T (x) · T (y) for all x, y ∈ V .
13. Let V be an inner product space. If W is a subspace of V , then show that W , the closure of
W in V with respect to the induced metric, is also a subspace of V .
Prove that every finite-dimensional subspace in an inner product space is closed with re-
spect to the induced metric (We’ve already seen examples of proper dense subspaces, so all
subspaces need not be closed.).
14. Let (V, h , i) be an inner product space and x1 , . . . , xn ∈ V . Let A ∈ Mn (F ) be the matrix
defined as Aij = hxi , xj i for all i, j. Then prove the following statements.
(a) x1 , . . . , xn are linearly independent iff A is an invertible matrix.
(b) x1 , . . . , xn is an orthogonal sequence iff A is a diagonal matrix. x1 , . . . , xn is an orthog-
onal sequence of nonzero vectors iff A is a diagonal matrix whose every diagonal entry
is nonzero.
(c) x1 , . . . , xn is an orthonormal sequence iff A = In .
(d) If W is the subspace of V generated by x1 , . . . , xn , then rk A = dim W .
15. (*) Let (V, h , i) be an inner product space and S := {x ∈ V | kxk = 1}, the unit sphere of
V . Then show that S is compact iff V is finite dimensional.
16. (*) Show that the following statements are equivalent for an inner product space (V, h , i).
(i) V is finite dimensional.
(ii) Every subspace of V is closed.
(iii) Every hyperplane of V is closed.
(iv) Every linear functional on V is continuous.
17. Let (V, h , i)V , (W, h , i)W be inner product spaces. If T : V → W is a linear transformation,
then prove that the following statements are equivalent.
(i) T preserves the inner product, i.e., hT (x), T (y)iW = hx, yiV for all x, y ∈ V .
(ii) For every orthonormal sequence x1 , . . . , xn ∈ V , the sequence T (x1 ), . . . , T (xn ) ∈ W is
also orthonormal.
If dim V > 1, then the above two conditions are equivalent to
(iii) If u, v ∈ V are orthonormal vectors then so are T (u), T (v) ∈ W .
18. (*) Let A be an m × n matrix over F = R/C. Then show that the following statements are
equivalent.
(i) A has rank m.
(ii) For all b ∈ Mm×1 (F ), there exists a real number ε > 0 (depending only on A) such that
the system of linear equations A′ X = b has a solution in F n for all A′ ∈ Mm×n (F )
satisfying ‖A − A′‖ < ε.
(iii) There exists a nonzero column vector b ∈ Mm×1 (F ) and a real number ε > 0 such
that the system of linear equations A′ X = b has a solution in F n for all
A′ ∈ Mm×n (F ) satisfying ‖A − A′‖ < ε.
19. If A is an n × n matrix over F = R/C, then prove the following statements.
(ii) Give an example of a length preserving map f : V → W which is not an isometry (In
fact, can you give an example where f is continuous only at the origin?).
(iii) If F = R and f : V → W is an isometry such that f (0) = 0, then show that f preserves
the inner product, i.e., hf (x), f (y)iW = hx, yiV for all x, y ∈ V .
What if we replace R by C?
22. If A ∈ Mn (C) is a Hermitian matrix, show that chA (X) ∈ R[X].
23. If (V, h , i) is an inner product space then for all nonzero vectors x, y ∈ V and nonzero real
numbers c1 , c2 , if c1 c2 > 0 then θ(x, y) = θ(c1 x, c2 y).
24. (*) Let V be a vector space over C, the field of complex numbers. Then we can naturally
view V as a vector space over R. Then show that
(i) If h , iC is a complex inner product on V then h , iR : V × V → R, defined as
hx, yiR := Re hx, yiC , is a real inner product on the R-vector space V .
(ii) Conversely, if h , iR is a real inner product on the R-vector space V , then hx, yiC :=
hx, yiR + i hx, iyiR defines a complex inner product on V iff hix, iyiR = hx, yiR for all x, y ∈ V .
25. (*) Let (V, h , iV ), (W, h , iW ) be inner product spaces and T : V → W a linear transforma-
tion. Then show that
(a) The following statements are equivalent.
(i) T preserves length, i.e., kT (x)kW = kxkV for all x ∈ V .
(ii) T preserves length of unit vectors, i.e., kT (x)kW = 1 for all unit vectors x ∈ V .
(b) If T is injective then the following statements are equivalent.
(i) T is angle preserving, i.e., θ(T (x), T (y)) = θ(x, y) for all nonzero vectors x, y ∈ V .
(ii) T preserves the angle between unit vectors, i.e., θ(T (x), T (y)) = θ(x, y) for all
x, y ∈ V satisfying kxkV = kykV = 1.
In both the conditions, the injectivity of T must be assumed as we define angle only
between nonzero vectors.
(c) T preserves inner product iff T preserves the inner product between any two unit vectors
of V .
(d) The following statements are equivalent.
(i) T preserves inner product.
(ii) T preserves length.
(iii) T is an isometry.
(e) If T is length preserving, then it’s also angle preserving.
(f) Give an example of a linear transformation T which is angle preserving, but not length
preserving.
26. (a) If (V, h , i) is an inner product space then show that the induced norm ‖ ‖ : V → R≥0
and the vector space operations +, · are continuous with respect to the induced metric.
(b) If (V, k k) is a normed linear space, then show that the vector space operations +, · are
continuous with respect to the induced metric.
27. (*) Every finite-dimensional subspace of a normed linear space is closed.
28. (*) Let (V, k k) be a Banach Space. Then show that the following statements are equivalent.
(i) V is finite-dimensional.
(ii) Every subspace of V is closed.
(iii) Every hyperplane of V is closed.
(iv) Every linear functional on V is a continuous function.
Can you give an example of an infinite dimensional normed linear space V and a linear
functional f : V → F such that f is not continuous?
29. (*) If V, W are real inner product spaces and f : V → W is a set theoretic map, then show
that the following statements are equivalent.
(a) f preserves inner product.
(b) f preserves both the length of vectors and the angle between nonzero vectors.
If we instead take V and W to be complex inner product spaces, does (b) =⇒ (a)? What
if we further assume that f (ix) = if (x) for all x ∈ V ?
30. (a) Let V be a vector space over C. Then we can as well view V as a vector space over R.
Now show that
(i) Every C-linearly independent sequence of vectors is also R-linearly independent.
(ii) Every R-spanning set of V is also a C-spanning set.
(iii) If S ⊆ V is a C-spanning set, then S ∪ iS is an R-spanning set, where iS := {ix | x ∈
S}.
(iv) If (x1 , . . . , xr ) is a C-linearly independent sequence in V then (x1 , . . . , xr , ix1 , . . . , ixr )
is an R-linearly independent sequence.
(b) Now suppose that (V, h , iC ) is a complex inner product space. Then V becomes a real
inner product space under the induced inner product
hx, yiR := Re hx, yiC , for all x, y ∈ V .
Then show that
(i) Every C-orthogonal set (with respect to h , iC ) is R-orthogonal (with respect to
h , iR ).
(ii) If x, y ∈ V are R-orthogonal, so are λx, µy for all λ, µ ∈ C satisfying λµ̄ ∈ R.
(iii) Two nonzero vectors x, y ∈ V are perpendicular to each other, i.e., θ(x, y) = π/2, iff
hx, yiR = 0, or equivalently, hx, yiC is a purely imaginary number. Therefore, although
R-orthogonality is the same as being perpendicular, C-orthogonality is actually
a ‘stronger’ condition.
(iv) Give an example of mutually perpendicular vectors x, y ∈ V , such that x, iy are
not mutually perpendicular (Note that in a real inner product space, nonzero
multiples of mutually perpendicular vectors remain mutually perpendicular. This
happens because, unlike real numbers, multiplication by a complex number can
impart ‘non-trivial rotation’ on a vector. So the concept of ‘angle’, in some sense,
is essentially a ‘real’ concept.).
(v) If (x1 , . . . , xr ) is a C-orthogonal (respectively, C-orthonormal) sequence, then the
sequence (x1 , . . . , xr , ix1 , . . . , ixr ) is R-orthogonal (respectively, R-orthonormal).
31. Let x, y be nonzero vectors in an inner product space V . Then show that kx + yk2 =
kxk2 + kyk2 iff x and y are mutually perpendicular, i.e., θ(x, y) = π/2.
32. (*) Here we consider the metric, called the French railways metric, defined on, say R2 , as
d(x, y) := ‖x − y‖ if x, y lie on the same line passing through the origin, and
d(x, y) := ‖x‖ + ‖y‖ otherwise,
where k k is the usual Euclidean norm on R2 . Then show that (R2 , d) is a metric vector
space. The metric d is absolutely homogeneous, but not translation invariant.
Remark. The name of the metric comes from the idea that suppose every rail line in France
passes through Paris and Paris is the only junction where two rail lines meet. To visualize
the situation, you may think of an infinite collection of lines passing through the origin, but
there’s no plane as such to traverse!
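A minimal sketch, assuming numpy, of the French railways metric described in exercise 32; collinearity with the origin is tested via the 2 × 2 determinant.

    import numpy as np

    def french_railways(x, y, tol=1e-12):
        x, y = np.asarray(x, float), np.asarray(y, float)
        if abs(x[0]*y[1] - x[1]*y[0]) < tol:   # x, y on a common line through the origin
            return float(np.linalg.norm(x - y))
        return float(np.linalg.norm(x) + np.linalg.norm(y))

    print(french_railways([1, 1], [2, 2]))     # ~1.414: same line through the origin
    print(french_railways([1, 0], [0, 1]))     # 2.0: the trip goes via the origin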
Lecture 20 and 21 (28/12/2020, 30/12/2020) :
Best approximations and orthogonal projections
To understand best approximations and orthogonal projections let us first look at the following
simple real life situation -
“Suppose that a child is locked alone in an empty room with only some balloons hanging from
the ceiling. Now if he/she wants to get hold of one of the balloons what will he/she do?”
The ‘kiddish’ approach to solve this ‘problem’ gives us an idea of best approximations and
orthogonal projections. In a formal mathematical set-up, the problem is to look at R3 and try
to find the (?) point on the XY -plane which is closest to the point p := (0, 0, 1). We all know
from our knowledge in co-ordinate geometry that the desired point is the origin - the orthogonal
projection of p on the XY -plane. We take a line passing through p which is perpendicular to the
XY -plane, and the point of intersection of this line with the XY -plane turns out to be the point
which is closest to p among all points lying on the XY -plane.
Question. Let (X, d) be a metric space, x ∈ X and Y a subset of X. Then does there exist a
point y ∈ Y such that d(x, y) = d(x, Y )? And if such a point exists, is it unique?
Let us first consider a few cases where either best approximation doesn’t exist, or it’s not
unique.
1. If Y is not closed in X and x ∈ Ȳ \ Y , then d(x, Y ) = 0 but d(x, y) > 0 for all y ∈ Y .
Therefore, unless Y is closed, best approximations cannot exist in general. In particular, if
V is an infinite dimensional inner product space then for any subspace W ≤ V which is not
closed, there will be vectors in V without having any best approximation in W . We’ll later
see that in an inner product space, a best approximation, whenever it exists, is unique.
2. If W is a finite-dimensional subspace of a normed linear space (V, k k), then every x ∈ V
has a best approximation in W . To see this, first replace V by V 0 , the finite-dimensional
subspace of V generated by W and x. If d(x, W ) = r, let C be the compact set, defined as
C := {v ∈ V | d(x, v) = r}. As d(C, W ) = 0, and W is a closed subset of V , it follows that
C ∩ W 6= ∅, showing the existence of best approximation.
Best approximations, however, needn’t be unique in normed linear spaces. For example, let
us consider F 2 with respect to the sup norm. If W is the subspace of F 2 generated by (1, 0),
then every vector (λ, 0) ∈ W satisfying |λ| ≤ 1 is a best approximation of (0, 1) in W .
As best approximations may not be unique in a general normed linear space, we’ll mostly talk
about them in the context of inner product spaces. Let us begin with a few definitions.
Definitions. Let W be a subspace of an inner product space (V, h , i). Then for an element
x ∈ V , xW ∈ W is said to be a best approximation of x in W if kx − xW k ≤ kx − wk for all
w ∈ W.
Two sets S1 , S2 in V are said to be mutually orthogonal, or orthogonal to each other, if x ⊥ y for
all x ∈ S1 and y ∈ S2 .
If x ∈ V , then the orthogonal complement of x, denoted by x⊥ , is defined as
x⊥ := {y ∈ V | hx, yi = 0}.
If S ⊆ V , the orthogonal complement of S, denoted by S ⊥ , is defined as
S ⊥ := ∩_{x∈S} x⊥ .
Some basic properties
1. For all x ∈ V , x⊥ is a closed hyperplane in V . It is so because the linear functional on V
which sends an element y ∈ V to hy, xi ∈ F , is a continuous function. In particular, S ⊥ is a
closed subspace of V for all S ⊆ V .
2. If S1 ⊆ S2 ⊆ V , then we have the reverse inclusion S2⊥ ⊆ S1⊥ .
3. Note that 0⊥ = V and V ⊥ = 0. In fact, if W ≤ V is a dense subspace, then W ⊥ = 0.
Surprisingly, there exist examples of inner product spaces V and subspaces W ≤ V , such
that W is not dense in V , but W ⊥ = 0. However, such ‘abnormalities’ cannot occur if V is
complete.
4. If S ⊆ V , then S ⊥ = ⟨S⟩⊥ = \overline{⟨S⟩}^⊥ , where \overline{⟨S⟩} denotes the closure of the linear span of S.
5. For all W ≤ V , W and W ⊥ are linearly independent, i.e., W ∩ W ⊥ = 0.
6. If S ⊆ V , then \overline{⟨S⟩} ⊆ S ⊥⊥ (:= (S ⊥ )⊥ ), as S ⊆ S ⊥⊥ and S ⊥⊥ is a closed subspace of V .
There exist examples to show that the inclusion can be strict. However, when V is complete,
then we’ll later see that we always have an equality.
7. If V1 , V2 are subspaces of V , then the following assertions hold.
(a) V1 ⊥ V2 ⇐⇒ V1 ⊆ V2⊥ ⇐⇒ V2 ⊆ V1⊥ .
(b) V1⊥ + V2⊥ ⊆ (V1 ∩ V2 )⊥ . To see that the inclusion can be strict, let V1 be a proper dense
subspace of V and V2 a one dimensional subspace generated by an element outside V1 .
(c) (V1 + V2 )⊥ = V1⊥ ∩ V2⊥ .
8. Let Wi be a family of subspaces in V such that W := ∪i Wi is also a subspace of V . Then
x ⊥ W iff x ⊥ Wi for all i.
9. If x ∈ V and w, w0 ∈ W are such that x − w ⊥ W and x − w0 ⊥ W , then w = w0 .
10. If x ⊥ W and W 0 is a subspace of W then x ⊥ W 0 .
11. Let x ∈ V and W a subspace of V . If xW ∈ W is a best approximation of x in W , then for
all subspaces W 0 ≤ W , if xW ∈ W 0 then xW = xW 0 .
Proposition. Let W be a finite-dimensional subspace of an inner product space V with an
orthonormal basis (x1 , . . . , xn ), and let x ∈ V . Extend (x1 , . . . , xn ) to an orthonormal basis
(x1 , . . . , xn+1 ) of W + ⟨x⟩ (if x ∈ W , simply take the term involving xn+1 below to be 0). Then
for every y = c1 x1 + . . . + cn xn ∈ W ,
kx − yk2 = |c1 − hx, x1 i|2 + . . . + |cn − hx, xn i|2 + |hx, xn+1 i|2 .
Therefore it’s easy to see that xW := hx, x1 ix1 + . . . + hx, xn ixn is the unique best approximation
of x in W . Also, the association x 7→ xW is a linear transformation.
Conversely, suppose that x − y ⊥ W . Now take any element w ∈ W . Then x − y ⊥ y − w, and
consequently kx − wk2 = kx − yk2 + ky − wk2 . Clearly, for all w 6= y, kx − wk2 > kx − yk2 .
Therefore, y is the unique best approximation of x in W .
To prove the second assertion, let y, y 0 ∈ W be two best approximations of x in W . Then x−y ⊥ W
and x − y 0 ⊥ W . In particular, y − y 0 ⊥ W . But that means y − y 0 ⊥ y − y 0 , implying that y = y 0 .
Remark. From the proposition preceding the above lemma, one may get the impression that
best approximation depends on the chosen ordered orthonormal basis (x1 , . . . , xn+1 ) as we ex-
pressed xW in terms of them. But the above lemma shows that this is not true, the best approxi-
mation does not depend on the chosen orthonormal basis.
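A rough numerical sketch, assuming numpy, of the discussion above: for a hypothetical two-dimensional subspace W of R^3, the best approximation x_W is computed from an orthonormal basis of W (obtained here by a QR factorization), and the residual x − x_W is orthogonal to W.

    import numpy as np

    w1, w2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
    Q, _ = np.linalg.qr(np.column_stack([w1, w2]))   # columns: an orthonormal basis of W
    x = np.array([1.0, 2.0, 3.0])
    xW = Q @ (Q.T @ x)                               # sum of <x, u_i> u_i
    print(np.allclose(Q.T @ (x - xW), 0))            # True: x - x_W is orthogonal to W
    print(np.linalg.norm(x - xW) <= np.linalg.norm(x - w1))   # True: no worse than w1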
Lemma. Let (V, h , i) be an inner product space and W a subspace of V . Suppose that every
x ∈ V has a best approximation in W , say xW . Then the map πW : V → V , sending each x ∈ V
to its best approximation xW ∈ W , has the following properties
(i) πW is a linear transformation.
(ii) πW is a projection with im πW = W and ker πW = W ⊥ , implying that V = W ⊕ W ⊥ is an
orthogonal decomposition.
Proof. Let x, y ∈ V . If W 0 is a finite-dimensional subspace of W containing both xW and yW ,
then πW 0 (x) = xW 0 = xW = πW (x) and πW 0 (y) = yW 0 = yW = πW (y). Now the linearity follows
from the finite-dimensional case.
The rest of the proof is left as an exercise.
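For concreteness, here is a small numerical sketch of πW in coordinates (NumPy and the random test data are my choices for illustration, and are not part of HK): if the columns of a matrix Q form an orthonormal basis of a finite-dimensional W ≤ F n , then πW (x) = Q Q∗ x.

```python
import numpy as np

rng = np.random.default_rng(0)

# W = a 2-dimensional subspace of C^5, spanned by two random vectors;
# the columns of Q form an orthonormal basis of W (Gram-Schmidt via QR).
A = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))
Q, _ = np.linalg.qr(A)

def proj_W(x):
    """Orthogonal projection of x onto W: x_W = sum_i <x, q_i> q_i = Q Q* x."""
    return Q @ (Q.conj().T @ x)

x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
xW = proj_W(x)

# x - x_W is orthogonal to W, and x_W beats any other element of W:
print(np.allclose(Q.conj().T @ (x - xW), 0))
w = Q @ (rng.standard_normal(2) + 1j * rng.standard_normal(2))   # some other element of W
print(np.linalg.norm(x - xW) <= np.linalg.norm(x - w) + 1e-12)
```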
We can sum up the relation between best approximations and orthogonal projections in
the form of the following proposition.
Remarks.
1. If π : V → V is an orthogonal projection, then kπ(x)k ≤ kxk for all x ∈ V , with the equality
holding iff x is contained in the image of π. In particular, π is a continuous linear operator.
But a continuous projection, on the other hand, need not be orthogonal. For example, every
projection in F n is continuous, but they are seldom orthogonal.
A projection in an infinite dimensional inner product space may not be continuous. For
example, one may take V := F (N) with the usual dot product and π : V → V the projection
defined as π(en ) := ne1 for all n ≥ 1, where e1 , e2 , . . . is the natural orthonormal basis of V .
2. If W ≤ V admits an orthogonal projection, then W must be closed in V . The converse is
not true in general. In fact, there are even examples of closed subspaces W ≤ V , such that
W ⊥ = 0. However, we’ll later see that the converse is true if V is complete.
3. If V is a finite dimensional inner product space, then dim V = dim W + dim W ⊥ for all W ≤ V .
If V1 , . . . , Vr are mutually orthogonal subspaces of V then V = ⊕i Vi is an orthogonal
decomposition iff dim V = Σi dim Vi .
4. If V = V1 ⊕ V2 is an orthogonal decomposition, then V1⊥ = V2 and V2⊥ = V1 .
5. If V1 , V2 are subspaces of an inner product space V , then V1 ⊥ V2 implies that the closures of
V1 and V2 in V are also mutually orthogonal. This follows from the fact that the inner product is a continuous function.
Bessel’s inequality
Let x1 , . . . , xn be an orthonormal sequence of vectors in an inner product space (V, h , i). Then
for each x ∈ V , the best approximation of x in W :=< x1 , . . . , xn > is given by
xW = hx, x1 ix1 + . . . + hx, xn ixn ,
and this is true for every orthonormal sequence in V . Consequently, if S ⊆ V is a set of nonzero
orthogonal vectors, then
Σy∈S |hx, yi|2 / kyk2 ≤ kxk2 .
In fact, at most countably many summands in the above sum can be nonzero. To see this, let
Sx := {y ∈ S | hx, yi 6= 0}. If Sn := {y ∈ S | |hx, y/kyki| ≥ 1/n}, then Sn is a finite set for every
positive integer n. Since Sx = ∪∞n=1 Sn , the assertion follows.
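The following sketch (NumPy and random data, purely for illustration) checks Bessel's inequality for a small orthonormal sequence obtained from a QR factorization.

```python
import numpy as np

rng = np.random.default_rng(1)

# An orthonormal sequence x_1, x_2, x_3 in C^6 (the columns of Q) and a vector x.
Q, _ = np.linalg.qr(rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3)))
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)

bessel_sum = np.sum(np.abs(Q.conj().T @ x) ** 2)   # sum_i |<x, x_i>|^2
print(bessel_sum <= np.linalg.norm(x) ** 2 + 1e-12)
```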
Exercises.
1. HK : Section 8.2 - 5,6,7,8,9,10,13,15,17.
2. Let V be an inner product space and T : V → V a linear operator. If T commutes with
every orthogonal projection of V , then show that T is a scalar operator.
(i) f preserves the real part of the inner product, i.e., Re hf (x), f (y)iW = Re hx, yiV for
all x, y ∈ V .
(ii) f is R-linear, i.e., f is an additive group homomorphism and f (λx) = λf (x) for all
x ∈ V and λ ∈ R. In particular, if F = R, then f is an injective linear transformation.
If F = C, must then f be a linear transformation?
Does the answer change if we further assume that f (ix) = if (x) for all x ∈ V ?
4. (i) Let M be an n × n matrix over F = R/C. If y ∗ M x = 0 for all x, y ∈ F n , then show
that M = 0.
Deduce that A ∈ Mn (F ) is a Hermitian matrix iff Ax · y = x · Ay for all x, y ∈ F n .
(ii) If A ∈ Mn (F ) is a Hermitian matrix and we think of A as a linear operator on F n , then
show that ker A ⊥ im A.
Deduce that for every Hermitian matrix A ∈ Mn (F ), there exists an invertible matrix
P ∈ Mn (F ) such that P P ∗ = I and P ∗ AP is a real diagonal matrix.
5. (i) If A ∈ Mn (F ) is a Hermitian matrix and f (X) ∈ R[X], then show that f (A) is also a
Hermitian matrix.
What happens if we take f (X) ∈ C[X]?
(ii) Let A ∈ Mn (F ) be a Hermitian matrix. We think of A as a linear operator on F n .
Suppose that f (X), g(X) ∈ R[X] are relatively prime polynomials. If f g(A) = 0 then
show that ker f (A) ⊥ ker g(A).
6. Let V be a finite-dimensional inner product space. A linear operator T : V → V is said to
be orthogonally diagonalizable if there exists an ordered orthonormal basis B := (x1 , . . . , xn )
such that [T ]B is a diagonal matrix, i.e., there exists an orthonormal basis of V consisting
of eigenvectors of T . Show that a projection T : V → V is an orthogonal projection iff T is
orthogonally diagonalizable.
7. Let Y be a subspace of an inner product space V and x ∈ V . If (yn )n∈N is a sequence
of elements in Y such that limn→∞ d(x, yn ) = d(x, Y ), then show that (yn )n∈N is a Cauchy
sequence.
Deduce that if Y is a closed subspace of a Hilbert space V then every element x ∈ V has a
unique best approximation in Y .
8. Show that the following statements are equivalent in a Hilbert space V .
(i) W ≤ V is a closed subspace of V .
(ii) There exists a continuous projection π : V → V such that the image of π is W .
(iii) W admits an orthogonal projection, i.e., there exists an orthogonal projection π : V → V
whose image is W . In particular, if W ≤ V , then W = W ⊥⊥ .
(iv) V = W ⊕ W ⊥
(v) Every element x in V has a unique best approximation in W .
Deduce that if V is a Hilbert space and W ≤ V , then W ⊥ = 0 iff W = V . In particular,
every maximal orthonormal set of V is an orthonormal basis.
9. Let (V, h , i) be an inner product space. If x, y ∈ V then show that x ⊥ y iff kxk ≤ kx + λyk
for all λ ∈ F .
Deduce that a projection π : V → V is an orthogonal projection iff kπ(x)k ≤ kxk for all
x∈V.
10. Let (V, h , i) be an inner product space and π : V → V an orthogonal projection. Then show
that hπ(x), yi = hx, π(y)i = hπ(x), π(y)i for all x, y ∈ V .
Deduce that if W ≤ V is π-invariant, then so is W ⊥ .
11. Let V be an inner product space and π1 , π2 ∈ L(V ) orthogonal projections. Then show that
(iii) π1 π2 is an orthogonal projection iff it’s a projection. Deduce that if π1 π2 = π2 π1 , then
π1 π2 is an orthogonal projection.
(iv) If π1 π2 is an orthogonal projection then show that the image of π2 is π1 -invariant.
Deduce that π1 π2 is an orthogonal projection iff π1 π2 = π2 π1 .
12. (*) If V1 , V2 are mutually orthogonal subspaces of a Hilbert space V , then prove that
the closure of V1 + V2 in V is equal to the sum of the closures of V1 and V2 in V .
Does the equality hold if we do not assume V to be complete?
Does the equality hold if we only assume V1 and V2 to be linearly independent, but not
mutually orthogonal?
Hint. To answer the second question, let X be an inner product space which is not complete,
with completion X̂. Let V be the subspace of X̂ ⊕ X̂ generated by X ⊕ X and an element (x, x)
such that x ∈ X̂ \ X. Then V1 := X ⊕ 0, V2 := 0 ⊕ X are both closed in V , but V1 + V2 6= V ,
while the closure of V1 + V2 in V is V .
13. (*) Let (xn )n∈N , (yn )n∈N be two sequences in an inner product space (V, h , i). Suppose that
limn→∞ kxn − yn k = 0. If the set {x1 , x2 , . . . } generates a finite-dimensional subspace of V , is
the same true for the set {y1 , y2 , . . . }?
14. (*) Make the following statement precise, and then prove it.
‘If V is a finite-dimensional inner product space, then the set of all orthogonal projections of
V is nowhere dense in the set of all projections of V .’
Hint. Can you interpret it in the language of matrices?
Lecture 22, 23, 24 and 25 (4/1/2021, 8/1/2021, 11/1/2021,
13/1/2021) :
Orthogonal diagonalization
For linear operators on arbitrary vector spaces, the ‘nicest’ ones are those which can be diagonal-
ized. But now that we’ve added more structure to an arbitrary vector space, we can talk about
an even more special class of linear operators - the ‘orthogonally diagonalizable’ ones, i.e., those
diagonalizable operators which also respect the inner product.
As a general principle, we can take almost every result about linear operators on finite-dimensional
vector spaces, throw the word ‘orthogonal’ into the mix, and get a result about linear operators
on finite-dimensional inner product spaces.
While discussing orthogonal diagonalization, most of the time we’ll restrict ourselves to finite-
dimensional inner product spaces, as a purely algebraic approach isn’t suitable to deal with general
linear operators on an infinite dimensional inner product space in any satisfactory way. One needs
serious analytic machinery to study them which falls in the realm of functional analysis.
Definitions. Let (V, h , i) be an inner product space and T ∈ L(V ) a linear operator. Then T
is said to be orthogonally diagonalizable if it’s diagonalizable and distinct eigenspaces are mutually
orthogonal. If V is finite-dimensional, then T is orthogonally diagonalizable iff there exists an or-
dered orthonormal basis B := (x1 , . . . , xn ) of V such that [T ]B is a diagonal matrix. If F = C, an
orthogonally diagonalizable linear operator on V is sometimes also called a unitarily diagonalizable
linear operator. Perhaps the name is motivated by the fact that complex inner product spaces are
also known as unitary spaces.
A matrix U ∈ Mn (C) is called a unitary matrix if U U ∗ = U ∗ U = I. A real unitary matrix is also
called an orthogonal matrix. Note that a matrix A ∈ Mn (F ) is unitary iff its column vectors form
an orthonormal basis of F n . A unitary matrix is automatically invertible.
Two matrices A, B ∈ Mn (R) are said to be orthogonally equivalent if there exists an orthogonal
matrix P ∈ Mn (R) such that B = P t AP . In particular, we say that A ∈ Mn (R) is orthogo-
nally diagonalizable if it’s orthogonally equivalent to a diagonal matrix. Similarly, two matrices
A, B ∈ Mn (C) are said to be unitarily equivalent if there exists a unitary matrix U ∈ Mn (C) such
that B = U ∗ AU . In particular, A ∈ Mn (C) is said to be unitarily diagonalizable if it’s unitarily
equivalent to a diagonal matrix. Note that both orthogonal equivalence and unitary equivalence
are indeed equivalence relations. We’ll later see that if two real matrices are unitarily equivalent,
then they are also orthogonally equivalent.
Like orthogonal diagonalization, we can define a linear operator T on a finite dimensional inner
product space V to be orthogonally triangulable if there exists an orthonormal basis B of V such
that [T ]B is an upper (or lower) triangular matrix. But this is not particularly interesting since,
by Gram-Schmidt orthogonalization, a linear operator T is orthogonally triangulable iff it’s trian-
gulable. In particular, every complex n × n matrix is orthogonally triangulable.
Lemma. Let V be an inner product space and f : V → F a linear functional with W := ker f .
Then f is continuous iff W is closed.
Proof. Let V̂ be a completion of V with V ⊆ V̂ being a dense subspace. Let W̄ be the closure
of W in V̂ . Then W̄ 6= V̂ , as W̄ ∩ V = W 6= V (we’re assuming f 6= 0). Since V̂ is complete and W̄
is a proper closed subspace, W̄ ⊥ 6= 0. Let y ∈ W̄ ⊥ be a nonzero element and φy := h , yi. Then φy
is a continuous linear functional on V̂ . As φy |V and f have the same kernel, f must be a nonzero
multiple of φy |V , and hence continuous.
The implication in the other direction is trivial.
If (V, h , i) is an inner product space, we’ve seen that every x ∈ V ‘gives’ a continuous linear
functional φx : V → F , defined as φx (y) := hy, xi for all y ∈ V . Then ker φx = x⊥ is a closed
hyperplane of V . Since we have an orthogonal decomposition, V = x⊥ ⊕ < x >, x⊥⊥ =< x >.
If f ∈ V ∗ is a continuous linear functional, then we say that f comes from the inner product if
there exists an element xf ∈ V such that f = φxf . From the general theory of linear functionals,
it follows that for all u, v ∈ V , < u >=< v > iff ker φu = ker φv , or equivalently, iff u⊥ = v ⊥ .
The association x 7→ φx gives an R-linear homomorphism Φ : V → V ∗ . Consequently, Φ is a
linear transformation if F = R; and when F = C, it gives a conjugate-linear transformation,
i.e., Φ(λx) := φλx = λ̄Φ(x). Note that Φ is always injective, and it’s surjective whenever V is
finite-dimensional. In fact, when F = R, Φ gives a natural linear isomorphism between V and
V ∗ . When V is finite-dimensional, we can explicitly describe the element xf ∈ V . To do so, let
B := (x1 , . . . , xn ) be an ordered orthonormal basis of V . Then ci := f (xi ) = hxi , xf i, i.e., hxf , xi i = c̄i , for all i = 1, . . . , n,
implying that
xf = c̄1 x1 + . . . + c̄n xn .
(iii) f is continuous, i.e., W is closed, and further W ⊥ 6= 0. Then f comes from the inner product
because for every nonzero element y ∈ W ⊥ , f and h , yi have the same kernel.
In a nutshell, f ∈ V ∗ comes from the inner product iff (ker f )⊥ 6= 0, for which W being closed, or
equivalently, f being continuous, is a necessary condition, but not sufficient as we’ll shortly see.
If V is an inner product space of infinite dimension, then we know that dim V ∗ > dim V .
Therefore the map Φ : V → V ∗ cannot be surjective. In fact, for such a V , we can construct
any number of linear functionals which are not continuous. Just take an orthonormal sequence
(x1 , x2 , . . . ) in V and define a linear functional f on V such that f (xn ) := n for all n ≥ 1 (extending
f arbitrarily to a basis of V containing the xn ). Clearly, f isn’t continuous, and therefore cannot come from the inner product.
Another example is V := C([0, 1], F ), the set of all continuous F -valued functions defined on the
closed unit interval, with the inner product being defined as
hf, gi := ∫01 f (t)ḡ(t) dt.
Then any nonzero linear combination of evaluation maps is not continuous, and therefore cannot
come from the inner product. To see this, note that if α1 , . . . , αr ∈ [0, 1] are finitely many points,
then we can construct a sequence of continuous functions (fn )n∈N ∈ V N such that limn→∞ kfn k = 0,
and for each n ∈ N, fn (α1 ) = 1 and fn (αi ) = 0 for all i = 2, . . . , r.
However, if V is a Hilbert space, then every continuous linear functional defined on V comes
from the inner product.
Riesz representation theorem. If (V, h , i) is a Hilbert space then for every continuous
linear functional f ∈ V ∗ , there exists a unique element xf ∈ V such that f = h , xf i.
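In finite dimensions the representing vector can be computed directly from the formula given above. Here is a rough NumPy sketch (the particular functional f and the random data are made up for illustration and are not part of the notes).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# An orthonormal basis of C^4 (columns of Q) and a linear functional f;
# here f is simply given by a row vector c, i.e. f(y) = c @ y.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
c = rng.standard_normal(n) + 1j * rng.standard_normal(n)
f = lambda y: c @ y

# x_f = conj(f(x_1)) x_1 + ... + conj(f(x_n)) x_n, so that f(y) = <y, x_f>.
xf = sum(np.conj(f(Q[:, i])) * Q[:, i] for i in range(n))

y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
print(np.allclose(f(y), y @ np.conj(xf)))   # <y, x_f> = sum_k y_k conj((x_f)_k)
```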
A linear transformation between normed linear spaces is said to be continuous if it is continuous
with respect to the induced metrics. Below we discuss a related concept which is sometimes useful
for dealing with continuity in a more efficient manner.
Definitions. Let (V, k kV ), (W, k kW ) be normed linear spaces over F = R/C with V 6= 0 and
T : V → W a linear transformation. Then the operator norm of T , denoted by kT kop , is defined
as
kT kop := sup{ kT (x)kW / kxkV | x ∈ V \ {0} } = sup{ kT (x)kW | x ∈ V, kxkV = 1 }.
Simply put, kT kop is the ‘maximum stretch imparted on the vectors of V by T ’. If V = W and
T : V → V is a linear operator, then T is said to be a bounded linear operator if kT kop < ∞.
Similar definitions apply to linear transformations between inner product spaces by considering
the induced norms.
Remarks.
1. As norm is translation invariant, a linear transformation T : V → W is continuous iff it’s
continuous at the origin.
2. A linear transformation T between normed linear spaces is continuous iff kT kop < ∞.
3. For every linear transformation T , kT kop ≥ 0; and kT kop = 0 iff T = 0.
4. If T = λI is a scalar operator then kλIkop = |λ|. In fact, if λ ∈ F is an eigenvalue of T then
|λ| ≤ kT kop .
5. If T : V → W is a continuous linear transformation then kT (x)kW ≤ kT kop kxkV for all
x∈V.
6. If V is a finite-dimensional normed linear space, then every linear transformation defined on
V is continuous. One can see this by ‘identifying’ V with F n with the Euclidean norm, and
noting that any two norms on a finite-dimensional inner product space are equivalent.
7. If S, T ∈ L(V, W ) are linear transformations then kS + T kop ≤ kSkop + kT kop . If λ ∈ F and
T : V → W is a continuous linear transformation, then kλT kop = |λ| kT kop . Therefore, the
set of all continuous linear transformations from V to W is a subspace of L(V, W ), denoted
by BL(V, W ), which becomes a normed linear space under the operator norm.
8. If T1 : V1 → V2 , T2 : V2 → V3 are continuous linear transformations then so is the composition
T2 ◦T1 : V1 → V3 . In fact kT2 ◦T1 kop ≤ kT1 kop kT2 kop . Therefore, the set of all bounded linear
operators on a normed linear space V forms an F -subalgebra of L(V ), denoted by BL(V ).
9. The continuity of linear transformations becomes important only when the inner product
spaces/normed linear spaces involved are not of finite dimension, because every linear trans-
formation between normed linear spaces of finite dimension is always continuous.
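For a matrix A acting between Euclidean spaces, kAkop is the largest singular value of A. The following NumPy sketch (the data are random and purely illustrative) compares a brute-force estimate of the supremum in the definition with numpy.linalg.norm(A, 2).

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))              # a linear map R^6 -> R^4

# Brute-force estimate of sup{ ||A x|| : ||x|| = 1 } over random unit vectors ...
xs = rng.standard_normal((6, 10000))
xs /= np.linalg.norm(xs, axis=0)
sup_estimate = np.max(np.linalg.norm(A @ xs, axis=0))

# ... versus the exact operator norm, the largest singular value of A.
print(sup_estimate, np.linalg.norm(A, 2))    # the estimate is a slight underestimate
```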
Now we introduce adjoints of linear transformations, an indispensable tool in studying orthog-
onal diagonalizability of linear operators. This is modelled on and generalizes the transpose map
A 7→ At of a linear operator A on (Rn , ·), which satisfies Ax · y = x · At y for all x, y ∈ Rn .
A linear operator T is closely related to its adjoint T ∗ as the following result shows.
(i) T ∗ has an adjoint. In fact, T ∗∗ := (T ∗ )∗ = T .
(ii) im T ∗ ⊥ ker T and im T ⊥ ker T ∗ .
(iii) If V, W are further finite-dimensional, then
(a) rk T = rk T ∗ ; and null T = null T ∗ iff dim V = dim W .
(b) V = ker T ⊕ im T ∗ and W = ker T ∗ ⊕ im T are orthogonal decompositions.
(c) T is injective iff T ∗ is surjective, and T ∗ is injective iff T is surjective. In particular, T
is an isomorphism iff T ∗ is an isomorphism. In this case, (T ∗ )−1 = (T −1 )∗ .
(d) rk T T ∗ = rk T ∗ T = rk T = rk T ∗ .
Proof. (i) follows directly from the definition of adjoint.
To prove (ii), let y ∈ ker T . Then hy, T ∗ (x)iV = hT (y), xiW = 0, implying that im T ∗ ⊥ ker T .
The second assertion follows since T ∗∗ = T .
From (ii), it follows that rk T ∗ ≤ rk T . Also, rk T = rk T ∗∗ ≤ rk T ∗ . Together, we get (iii)-(a).
(iii)-(b) follows from (ii) and (iii)-(a).
The first two assertions in (iii)-(c) follow from (iii)-(b). For the third assertion, let x ∈ V, y ∈ W .
Then
hT −1 (y), xiV = hT −1 (y), T ∗ (T ∗ )−1 (x)iV = hT T −1 (y), (T ∗ )−1 (x)iW = hy, (T ∗ )−1 (x)iW .
Therefore (T ∗ )−1 = (T −1 )∗ .
Finally, (iii)-(d) follows from (ii). Actually, we’ve already seen the ‘matrix-theoretic version’ of (iii)-(d).
We can also give matrix-theoretic proofs of (iii) after we’ve done matrix representations of adjoint
linear transformations. Please do it!
Although we need adjoints only for linear operators on finite dimensional inner product spaces,
we prove a more general result as it doesn’t require much of an extra effort.
Theorem. Let (V, h , iV ), (W, h , iW ) be inner product spaces with V being complete. Then
every continuous linear transformation T : V → W has an adjoint T ∗ which is also continuous. In
fact, kT kop = kT ∗ kop .
kT ∗ (y)k2V = hT ∗ (y), T ∗ (y)iV = |hT T ∗ (y), yiW | ≤ kT T ∗ (y)kW kykW ≤ kT kop kT ∗ (y)kV kykW ,
implying that kT ∗ (y)kV ≤ kT kop kykW ; and consequently, kT ∗ kop ≤ kT kop . Similarly, kT kop ≤
kT ∗ kop , and together, we get kT kop = kT ∗ kop
Remark. It’s a fact which we are not going to prove that if a linear operator T on a Hilbert
space V has an adjoint T ∗ , then T must be continuous.
Proof. The ij-th entry of [T ∗ ]BW ,BV is the xi -th coefficient of T ∗ (yj ), which is equal to
hT ∗ (yj ), xi iV , the conjugate of hxi , T ∗ (yj )iV = hT (xi ), yj iW , i.e., the conjugate of the ji-th entry of [T ]BV ,BW .
Hence the assertion follows.
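As a quick sanity check of the defining property hT (x), yiW = hx, T ∗ (y)iV in coordinates, here is a NumPy sketch where T is given by a complex matrix A on (Cn , ·) and T ∗ by the conjugate-transpose A∗ (the random data are purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A_star = A.conj().T                          # matrix of the adjoint w.r.t. the standard basis

inner = lambda u, v: u @ np.conj(v)          # <u, v> = sum_k u_k conj(v_k)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

print(np.allclose(inner(A @ x, y), inner(x, A_star @ y)))
```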
Immediately, we see that the matrix representation of a self-adjoint operator on a finite-
dimensional inner product space with respect to an ordered orthonormal basis is Hermitian.
Corollary. Let V be a finite-dimensional inner product space. If T ∈ L(V ) then the following
statements are equivalent.
(i) T is a self-adjoint operator.
(ii) If B is an ordered orthonormal basis of V , then [T ]B is a Hermitian (symmetric, if F = R)
matrix.
(iii) There exists an ordered orthonormal basis B of V , such that [T ]B is a Hermitian (symmetric,
if F = R) matrix.
Proof. Left as an exercise.
If you’ve been paying attention, similarities between adjoint and transpose of a linear transfor-
mation couldn’t have escaped your eyes. It’s time to make the relation precise.
induced by that element via inner product. Then ΦV , ΦV 0 are linear isomorphisms if F = R,
and conjugate-linear isomorphisms if F = C. In particular, every finite-dimensional real inner
product space is naturally isomorphic to its dual. If W is a subspace of V , then one can check that
ΦV (W ⊥ ) = W ◦ , so that we can ‘naturally identify’ W ⊥ with W ◦ .
If T : V → V 0 is a linear transformation, then we have the following commutative diagram of
R-linear transformations

    V 0 −−−T ∗−−→ V
     ↓ ΦV 0          ↓ ΦV
  (V 0 )∗ −−−T t−−→ V ∗

where ΦV 0 : x0 7→ h , x0 iV 0 , ΦV : x 7→ h , xiV , and T t is the transpose of T ; i.e., ΦV ◦ T ∗ = T t ◦ ΦV 0 .
Note that the vertical arrows are R-linear isomorphisms (conjugate linear, if F = C). It may be
quite helpful to keep this connection in mind.
Examples
1. 0∗ = 0, and I ∗ = I. In fact, for every scalar operator λI ∈ L(V ), we’ve (λI)∗ = λ̄I. In
particular, a scalar operator λI is self-adjoint iff λ ∈ R. Note that every scalar operator has
an adjoint, irrespective of the nature of V .
2. Let V := Mn (F ) where the inner product is given by hM, N i := tr (N ∗ M ). If A ∈ Mn (F ) is
a fixed matrix, then we can consider the linear operator LA given by the left multiplication
by A, i.e., LA (B) := AB for all B ∈ Mn (F ). Now one can check that (LA )∗ = LA∗ , so that
LA is self-adjoint iff A is a Hermitian matrix.
Formulate a similar result for RA , the right multiplication by A.
Next, let A, B ∈ Mn (F ) be fixed matrices. Now consider the linear operator TA,B : Mn (F ) →
Mn (F ), given by TA,B (M ) := AM B for all M ∈ Mn (F ). Then TA,B = LA ◦ RB = RB ◦ LA
and one can check that (TA,B )∗ = TA∗ ,B ∗ .
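Here is a NumPy sketch (illustrative only, with random data) verifying (LA )∗ = LA∗ numerically for the trace inner product hM, N i = tr (N ∗ M ).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
A, B, C = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)) for _ in range(3))

inner = lambda M, N: np.trace(N.conj().T @ M)       # <M, N> := tr(N* M)

# (L_A)* = L_{A*}:  <A B, C> = <B, A* C> for all B, C.
print(np.allclose(inner(A @ B, C), inner(B, A.conj().T @ C)))
```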
3. Let V := F [X], the polynomial ring in one variable over F = R/C, with the inner product
being defined as hf, gi := ∫01 f (t)ḡ(t) dt. Let φ ∈ F [X] be a fixed polynomial and Tφ : V → V
the linear operator given by the multiplication by φ. Then it’s easy to see that (Tφ )∗ = Tφ̄ ,
where φ̄ ∈ F [X] is obtained from φ by replacing its coefficients by their conjugates. Therefore
Tφ is self-adjoint iff φ ∈ R[X]. For each φ ∈ F [X], Tφ is continuous; but it doesn’t have
any eigenvalue unless φ is a constant polynomial. So we have examples of continuous self-
adjoint operators without any eigenvalues! ‘Oddities’ like this dissuade us from attempting
to classify orthogonally diagonalizable operators on infinite dimensional inner product spaces.
4. Let V = `2 (F ) := {(xn ) ∈ F N | Σn |xn |2 < ∞}, the set of all square-summable sequences
over F . Then V is a Hilbert space containing F (N) as a dense subspace. Let R be the right
translation on V defined as
R((x1 , x2 , . . . )) := (0, x1 , x2 , . . . ).
Then R preserves the inner product, and hence is continuous. Its adjoint is the left translation L on V defined as
L((x1 , x2 , . . . )) := (x2 , x3 , . . . ),
which is also continuous. Note that R is injective, L is surjective and L ◦ R = Id, but
R ◦ L 6= Id. Also, R doesn’t have any eigenvalues.
5. Let W be a subspace of an inner product space V and ι : W ,→ V the natural inclusion. If ι∗
exists then y − ι∗ (y) ∈ W ⊥ for all y ∈ V , implying that ι∗ = πW , the orthogonal projection
of V onto W . Therefore, for every W ≤ V which does not admit any orthogonal projection,
for example, when W is not closed, ι is a continuous linear transformation without having
any adjoint.
6. Let us consider V := F (N) with the usual dot product, and T : V → V be the linear operator
defined as T (en ) := ne1 for all n ≥ 1. Then T is not continuous. We claim that T doesn’t
have any adjoint. To see this, consider the sequence (en /n)n∈N . If T ∗ exists then
1 = hT (en /n), e1 i = hen /n, T ∗ (e1 )i for all n ≥ 1; but en /n → 0, so hen /n, T ∗ (e1 )i → 0,
which is a contradiction.
In the following discussion, S, T are assumed to be linear operators on an inner product space
V , both having an adjoint. If you’re not comfortable with this general set-up, just assume V to
be finite-dimensional so that every linear operator is automatically continuous and has an adjoint.
The purpose of stating things in more generality is twofold. Firstly, restricting ourselves only to
finite dimensional inner product spaces doesn’t make too much of a difference. And secondly, I
want you to actively extract the arguments for the finite dimensional case, which, at times, are
somewhat simpler.
1. If B is an ordered orthonormal basis of the finite-dimensional inner product space V , then
the following diagram commutes

     L(V ) −−T 7→T ∗−−→ L(V )
       ↓ T 7→[T ]B          ↓ T 7→[T ]B
    Mn (F ) −−A7→A∗−−→ Mn (F )
Therefore it’s no coincidence that we’re using the same symbol to denote the adjoint of a
linear operator, as well as the conjugate-transpose of a matrix! Note that if we consider
Mn (F ), the set of n × n matrices over F , then A 7→ Ā is an isomorphism, whereas A 7→ At
and A 7→ A∗ are anti-isomorphisms.
From an even more naive perspective, if we somehow ignore the fact that taking adjoints
switches the order of multiplication, the adjoint map ‘just looks like’ the conjugation map on
C. This ‘erroneous’ way of viewing adjoints actually turns out to be highly illuminating,
especially in the case of orthogonally diagonalizable operators on finite-dimensional complex
inner product spaces, as we’re soon going to see.
2. If W ≤ V is a T -invariant subspace, then W ⊥ is T ∗ -invariant. To see this, let x ∈ W and
y ∈ W ⊥ . Then hx, T ∗ (y)i = hT (x), yi = 0, implying that T ∗ (y) ∈ W ⊥ .
In particular, if V = ⊕i Vi is a T -invariant orthogonal decomposition of V , then the decom-
position is also T ∗ -invariant. This follows because, for each i, Ṽi := ⊕j6=i Vj is a T -invariant
subspace with Ṽi⊥ = Vi .
9. For every linear operator T which has an adjoint T ∗ , the linear operators T + T ∗ , T T ∗ , T ∗ T
are all self-adjoint.
10. Let T ∈ L(V ) be an orthogonally diagonalizable operator. If W ≤ V is a T -invariant
subspace then T |W is also orthogonally diagonalizable. If V = ⊕λ∈F Vλ is the orthogonal
decomposition of V into the eigenspaces of T , then W = ⊕λ∈F Wλ , where Wλ = Vλ ∩ W
for all λ ∈ F . For each λ ∈ F , if Wλ0 is the orthogonal complement of Wλ in Vλ , then
W ⊥ = ⊕λ∈F Wλ0 . To see this, first note that ⊕λ∈F Wλ0 ⊆ W ⊥ . Conversely, let y ∈ W ⊥ .
Then we can write y as y = Σλ yλ , where yλ ∈ Vλ for all λ ∈ F and yλ = 0 for all but finitely
many λ. If yµ ∈/ Wµ0 for some µ ∈ F , then there exists xµ ∈ Wµ such that hxµ , yµ i 6= 0,
implying that hxµ , yi = hxµ , yµ i 6= 0. But as xµ ∈ W , it implies that y ∈/ W ⊥ , which is
a contradiction. Therefore W ⊥ = ⊕λ∈F Wλ0 is also T -invariant and T |W ⊥ is orthogonally
diagonalizable.
We’re now in a position to discuss orthogonal diagonalizability of a linear operator on a finite-
dimensional inner product space. Suppose that a linear operator T on a finite-dimensional inner
product space V is orthogonally diagonalizable. If B is an ordered orthonormal basis of V such
that A := [T ]B is a diagonal matrix, then [T ∗ ]B = A∗ is also a diagonal matrix, implying that
T and T ∗ have a common orthonormal eigenbasis. In particular, T T ∗ = T ∗ T ; and when F = R,
A = A∗ , i.e., T is a self-adjoint operator. We are soon going to see that these necessary conditions
for orthogonal diagonalizability are also sufficient.
Proof. With C being algebraically closed, every linear operator on a finite-dimensional com-
plex inner product space is triangulable. Therefore (i) follows directly from the above theorem.
To prove (ii), let T be a self-adjoint operator on a finite-dimensional real inner product space.
Choose an orthonormal basis B := (x1 , . . . , xn ) of V and consider the matrix representation
A := [T ]B ∈ Mn (R). Then A is a symmetric matrix, and hence Hermitian, if treated as a matrix
over C. If λ ∈ C is an eigenvalue of A corresponding to an eigenvector x ∈ Cn , then Ax · x = x · Ax,
implying that λ = λ̄, or equivalently, λ ∈ R. It shows that every eigenvalue of A, and hence of
T , is real. Therefore T is a triangulable normal operator over R, implying that T is orthogonally
diagonalizable.
The matrix theoretic version of the corollary is left as an exercise.
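Numerically, the real symmetric case is exactly what numpy.linalg.eigh computes; the sketch below (random data, for illustration only) produces an orthogonal P such that P t AP is real diagonal.

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                             # a real symmetric matrix

eigvals, P = np.linalg.eigh(A)                # columns of P: an orthonormal eigenbasis
print(np.allclose(P.T @ P, np.eye(4)))                 # P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(eigvals)))      # P^t A P is real diagonal
```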
In the following remarks, unless mentioned otherwise, S, T are assumed to be linear operators
on a finite-dimensional inner product space V .
Remarks.
1. A linear operator T is self-adjoint iff it’s orthogonally diagonalizable with real eigenvalues.
In particular, chT (X) ∈ R[X]; and the same is true for a Hermitian matrix.
not very interesting; here the ‘correct’ notion is that of a conjugate-transpose. The ‘reason’
for this may be traced back to the conjugate-linearity of the inner product.
5. Every complex square matrix is unitarily triangulable, i.e., if A ∈ Mn (C), then there exists
a unitary matrix U ∈ Mn (C) such that U −1 AU is an upper triangular matrix. On the other
hand, for a real matrix A ∈ Mn (R), there exists an orthogonal matrix P ∈ Mn (R) such that
P −1 AP is upper triangular iff A is triangulable over R.
6. The set of normal matrices is a proper closed subset of Mn (F ). So orthogonal diagonaliz-
ability isn’t as ‘common’ as usual diagonalizability.
The set of all symmetric matrices in Mn (R) is a closed subspace of dimension (n2 + n)/2; and the
set of all Hermitian matrices in Mn (C) is a closed R-linear subspace of real dimension n2 .
7. The only thing which prevents a normal operator on a finite-dimensional real inner product
space from being orthogonally diagonalizable, is the ‘absence’ of eigenvalues.
We’ve already seen the above definitions in the context of matrices. The following proposition
captures the relation.
Proposition. Let T be a self-adjoint operator on a finite-dimensional inner product space V .
Then the following statements are equivalent.
Proof. We only prove the positive definite case, the other ones are similar. If T is positive
definite, then T = T ∗ . So Ti∗ = T ∗ |Vi = T |Vi = Ti for all i, implying that Ti is self-adjoint. Now if
x ∈ Vi is nonzero then hTi (x), xi = hT (x), xi > 0, implying that Ti is positive definite.
Conversely, suppose that each Ti is positive definite. Then T ∗ = ⊕i Ti∗ = ⊕i Ti = T , i.e., T is
self-adjoint. Now let x ∈ V be a nonzero element. Then x can be uniquely written as x = Σi xi ,
where each xi ∈ Vi , with at least one xi and at most finitely many xi being nonzero. Then
hT (x), xi = hT (Σi xi ), Σi xi i = hΣi T (xi ), Σi xi i = Σi hTi (xi ), xi i > 0. Therefore T is positive
definite.
The next lemma may appear out of place at first sight. But the corollaries immediately follow-
ing the lemma justify its significance.
Lemma. Let R be a commutative ring with A, A0 being R-algebras (not necessarily commuta-
tive). Let φ : A → A0 be an R-algebra homomorphism and f (X) ∈ R[X] a polynomial. If f (X) = a
has a solution in A then f (X) = φ(a) has a solution in A0 . If moreover φ is an isomorphism then
the number of solutions of f (X) = a is same as the number of solutions of f (X) = φ(a).
We’re mainly interested in the cases when R = F is a field, either A = L(V ) and A0 = Mn (F ) or
A = A0 = Mn (F ), and φ is an F -algebra isomorphism.
1. If T is normal then T ∗ can be written as a polynomial in T .
Is it true without the normality assumption?
Proof. Just choose an orthonormal basis B of V such that [T ]B is a diagonal matrix. Note
that the map L(V ) → Mn (F ), obtained by sending a linear operator to its matrix representation
with respect to B, is an F -algebra isomorphism so that we can apply the above lemma. And for
matrices A ∈ Mn (F ), choose unitary matrices U ∈ Mn (F ) such that U −1 AU are diagonal. Now
the original questions reduce to questions about diagonal matrices. The remaining details are left
as an exercise.
Exercises
1. HK : Section 8.3 - 3,6,8,9,10,12;
2. Let T be a linear operator on a finite-dimensional inner product space V . Suppose that T has
n distinct eigenvalues. Then prove that T is orthogonally diagonalizable iff T is diagonalizable
and F [T ] contains 2n distinct orthogonal projections.
(i) If S := {xi } is a basis of V , then show that there exists a unique inner product on V
with respect to which S is an orthonormal set.
(ii) If h , i is an inner product on V , then prove that (V, h , i) is isomorphic to (F (I) , ·) iff
there exists an orthonormal spanning set of V indexed by I.
(iii) If T ∈ L(V ) is a diagonalizable linear operator, then show that there exists an inner
product on V with respect to which T is orthogonally diagonalizable.
9. Let T be a linear operator on a finite-dimensional complex inner product space V . If B is
an ordered basis of V , not necessarily orthonormal, is it true that [T ∗ ]B = ([T ]B )∗ ?
If [T ∗ ]B = ([T ]B )∗ for all T ∈ L(V ), does it follow that B is orthonormal?
12. (*) Let T be a linear operator on an inner product space V . Suppose that {Vi } is a family
of T -invariant subspaces of V such that V = ∪i Vi . Then show that T is orthogonally
diagonalizable iff T |Vi is orthogonally diagonalizable for each i.
13. (*) Let T be a normal operator on a complex inner product space V . If T is locally finite,
then prove that T is orthogonally diagonalizable.
Hint. Can you show that each x ∈ V is contained in a finite-dimensional subspace Vx which
is invariant under both T and T ∗ ?
14. Can you give an example of a real normal matrix which is not orthogonally diagonalizable
over R?
15. Let T be a triangulable linear operator on a finite-dimensional inner product space V . Then
show that the following statements are equivalent.
(i) T is orthogonally diagonalizable.
(ii) For every T -invariant subspace W ≤ V , W ⊥ is also T -invariant.
16. (*) Let (V, h , iV ), (W, h , iW ) be inner product spaces and T : V → W a linear transforma-
tion. If there exists a sequence (xn )n∈N ∈ V N and an element y ∈ W such that limn→∞ xn = 0,
but limn→∞ hT (xn ), yiW 6= 0, then prove that T cannot have an adjoint.
Now consider F [X] to be an inner product space with respect to the inner product defined as
hf, gi := ∫01 f (t)ḡ(t) dt. Then show that the differentiation operator D : F [X] → F [X], given
by D(f ) := f 0 , does not have any adjoint.
17. Let T be a normal operator on an inner product space V . Then show that kT (x)k = kT ∗ (x)k
for all x ∈ V .
Deduce that an element v ∈ V is an eigenvector of T associated to λ ∈ F iff v is an eigenvector
of T ∗ associated to λ̄.
18. (*) Let T : V → W be a linear transformation of inner product spaces. If T has an adjoint
T ∗ : W → V , then prove that
Deduce that if T is a linear operator on a finite-dimensional inner product space V , then
kT kop is equal to the square root of the largest eigenvalue of the positive semi-definite operator
T T ∗ (or T ∗ T ).
Hint. For the first part, kT (x)k2 = hT (x), T (x)i = |hx, T ∗ T (x)i| ≤ kT ∗ T kop kxk2 .
19. Let (V, h , i) be an inner product space and π : V → V a projection which has an adjoint
π ∗ . Then show that the following statements are equivalent.
Can you give an example of a projection π which does not have an adjoint?
Hint. You may prove (i) =⇒ (ii) =⇒ (iii) =⇒ (i), (i) =⇒ (iv), (v) and (v) =⇒
(iv) =⇒ (i).
20. (*) Prove that a Hilbert space V cannot have an infinite orthogonal decomposition.
Deduce that if V is a Hilbert space then an orthogonally diagonalizable operator T ∈ L(V )
can have only finitely many eigenvalues, and therefore T is continuous.
Give an example of an orthogonally diagonalizable operator which is not continuous.
21. Let T be a linear operator on an inner product space V which has an adjoint T ∗ . Then show
that T T ∗ , T ∗ T are positive semi-definite operators.
Prove that T T ∗ (respectively, T ∗ T ) is positive definite iff T ∗ (respectively, T ) is injective.
22. (*) Let T be a linear operator on a complex inner product space V . Then prove that T = 0
iff hT (x), xi = 0 for all x ∈ V . Deduce that T is self-adjoint iff hT (x), xi ∈ R for all x ∈ V .
Do the assertions hold if we replace the complex inner product space by a real inner product
space?
23. (*) If T is an injective linear operator on a complex inner product space V , then show that
the following statements are equivalent.
(a) Re hT (x), xi = 0 for all x ∈ V .
π
(b) θ(T (x), x) = 2 for all nonzero x ∈ V .
∗
(c) T + T = 0.
24. A linear operator T on an inner product space V is said to be an anti-self-adjoint operator if
T ∗ + T = 0. Then show that the following statements are equivalent for a linear operator T
on a finite-dimensional complex inner product space V .
25. Let A be an n × n real matrix such that At + A = 0. If A is diagonalizable, then show that
A = 0.
26. Give examples of two normal matrices A, B ∈ Mn (R) such that A + B, AB are not normal.
27. Let T be a normal operator on an inner product space V (V is neither assumed to be finite-
dimensional, nor over C!). Then show that ker T ⊥ im T .
Deduce that distinct eigenspaces of T are mutually orthogonal.
28. (*) If A ∈ Mn (C), then we can define the operator norm of A as
where kxk denotes the Euclidean norm of the vector x ∈ Cn . If kAk := (tr (AA∗ ))1/2 , then show
that kAkop ≤ kAk.
Hint. AA∗ is a positive semi-definite matrix.
Lecture 26 (18/1/2021) :
Unitary operators
Recall that if (V, h , iV ), (W, h , iW ) are inner product spaces, then an inner product space homo-
morphism T : V → W is a linear transformation T which preserves the inner product; and for a
linear transformation T : V → W , T preserves the inner product iff it preserves the induced norm
iff it preserves the induced metric. Therefore, an inner product space homomorphism T , which is
always one-to-one, is nothing but a linear isometry.
If V is a finite-dimensional real inner product space, an inner product space homomorphism
T : V → V is also called an orthogonal transformation.
The following lemma explicitly states the relation between unitary operators and unitary/orthogonal
matrices.
Proof. Observe that for a matrix A ∈ Mn (F ), the ij-th entry of AA∗ (respectively, A∗ A) is
just ri · rj (respectively, cj · ci ), where r1 , . . . , rn are the row vectors and c1 , . . . , cn are the column
vectors of A. Therefore A ∈ Mn (F ) is unitary (or orthogonal, if F = R) iff its row vectors and
column vectors form orthonormal sequences.
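A quick NumPy sketch (illustrative, with random data): the unitary factor of a QR factorization of an invertible complex matrix satisfies U U ∗ = U ∗ U = I, and the entries of these products are exactly the inner products of the rows, respectively columns, of U .

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
# The Q-factor of an invertible complex matrix is unitary.
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

I = np.eye(n)
# U U* = U* U = I, i.e. the columns (and the rows) of U are orthonormal.
print(np.allclose(U @ U.conj().T, I), np.allclose(U.conj().T @ U, I))
```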
Proposition. Let T be a linear operator on an inner product space (V, h , i). Then T is a
unitary operator iff T T ∗ = T ∗ T = I.
Proof. Suppose that T ∈ L(V ) is a unitary operator. Then T −1 is also unitary, and
hT (x), yi = hT −1 T (x), T −1 (y)i = hx, T −1 (y)i for all x, y ∈ V , implying that T ∗ = T −1 .
Conversely, suppose that T T ∗ = T ∗ T = I. Then T is invertible. So we only need to check that
T preserves the inner product. Now hT (x), T (y)i = hx, T ∗ T (y)i = hx, yi for all x, y ∈ V , implying
that T preserves the inner product (Where are we using T T ∗ = I?).
Remarks.
1. Every inner product space (V, h , i) of finite dimension n is isomorphic, as an inner product space, to (F n , ·).
The proof essentially boils down to choosing an orthonormal basis of V . In particular, two
inner product spaces V and W of finite dimensions are isomorphic iff they have the same
dimension.
7. A ∈ Mn (C) is a unitary matrix iff Ā, At and A∗ := Āt are unitary matrices.
Similarly, A ∈ Mn (R) is an orthogonal matrix iff At is an orthogonal matrix.
12. As every isometry of Rn which fixes the origin is a linear operator, every isometry of Rn is
an orthogonal transformation followed by a translation.
13. We’ve seen that important properties of matrices like invertibility, triangulability, diagonal-
izability etc. remain invariant under the general equivalence given by the invertible ma-
trices. Similarly, for n × n matrices over F = R/C, important properties like normality,
self-adjointness, orthogonal diagonalizability, anti-self-adjointness, positive definiteness, pos-
itive semi-definiteness, negative definiteness, negative semi-definiteness, indefiniteness etc.
are preserved under unitary equivalence (or orthogonal equivalence, if F = R).
Similar assertions hold for linear operators on a finite-dimensional inner product space V .
14. If F = R/C, then On (F ), the set of all n × n orthogonal matrices over F , is a closed subset of
Mn (F ); and the set of all n × n complex unitary matrices Un (C) is a closed subset of Mn (C).
15. If T is an invertible linear operator on an inner product space V then for each λ ∈ F , λ is
an eigenvalue of T iff λ−1 is an eigenvalue of T −1 , and the λ-eigenspace of T is same as the
λ−1 -eigenspace of T −1 . In particular, T and T −1 have the same set of eigenvectors, and T is
(orthogonally) diagonalizable iff T −1 is (orthogonally) diagonalizable.
Lecture 27 (22/1/2021) :
QR decomposition
A careful application of the Gram-Schmidt orthogonalization process allows us to decompose in-
vertible matrices.
Theorem. Let A be an n × n invertible matrix over F = R/C. Then there exists a unique
pair of n × n matrices (QA , RA ) satisfying the following properties
(i) QA is a unitary (orthogonal, if F = R) matrix and RA is an upper triangular matrix with
positive diagonal entries.
(ii) A = QA RA , which is called the QR decomposition of A.
Before we begin the proof of the theorem, let us make a simple observation.
Lemma. Let F be an arbitrary field and M ∈ Mn (F ) an invertible upper (or lower) triangular
matrix. Then M −1 is also an upper (or lower) triangular matrix. Further, if F = R/C and each
diagonal entry of M is positive/negative, then the same is true for M −1 .
Proof. One can prove this lemma by direct computation. Alternatively, let T : F n → F n
be the linear operator whose matrix representation with respect to the natural ordered
basis B := (e1 , . . . , en ) is M . If 0 = V0 ( V1 ( . . . ( Vn = F n is a chain of T -invariant sub-
spaces then the chain is also T −1 -invariant, as T −1 can be written as a polynomial in T . Since
[T −1 ]B = ([T ]B )−1 = M −1 , the assertion follows. The rest is trivial.
Proof of the theorem. Let B := (e1 , . . . , en ) be the natural ordered orthonormal basis of F n
and TA : F n → F n the linear operator such that [TA ]B = A. If x1 , . . . , xn ∈ F n are the column
vectors of A, then clearly, TA (ei ) = xi for all i = 1, . . . , n. Now Bx := (x1 , . . . , xn ) is an ordered
basis of F n . Let (y1 , . . . , yn ) be the ordered orthonormal basis of F n which is obtained from Bx
by using Gram-Schmidt orthogonalization, and G : F n → F n the linear operator which sends xi
to yi for all i = 1, . . . , n. Then [G]Bx is an upper triangular matrix with positive diagonal entries.
Note that G ◦ TA is a unitary operator of F n . Therefore [G ◦ TA ]B = U ∈ Mn (F ) is a unitary
(orthogonal, if F = R) matrix. By the change of basis formula for matrix representations of a
linear transformation, we have [G]B = A[G]Bx A−1 . Now U = [G ◦ TA ]B = [G]B [TA ]B = A[G]Bx A−1 A = A[G]Bx ,
so that A = U ([G]Bx )−1 . By the above lemma, ([G]Bx )−1 is an upper triangular matrix with positive
diagonal entries, so QA := U and RA := ([G]Bx )−1 give the required decomposition A = QA RA .
Remark. Taking transpose on both sides of A = QA RA , one can easily see that A can also
be uniquely written in the form A = R0A Q0A , where R0A is a lower triangular matrix with positive
diagonal entries and Q0A is a unitary (orthogonal, if F = R) matrix.
Appropriate versions of QR decompositions hold for arbitrary square (not necessarily invertible),
and even non-square matrices. But we won’t discuss them here.
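For square invertible matrices, numpy.linalg.qr almost gives the decomposition of the theorem; it only remains to normalize the factors so that the diagonal of R becomes positive. The helper qr_positive below is my own illustrative sketch, not a library routine.

```python
import numpy as np

def qr_positive(A):
    """QR decomposition A = Q R with R having positive diagonal entries.

    A minimal sketch for an invertible square matrix A over R or C: it
    rescales the factors returned by np.linalg.qr by unit scalars so
    that the diagonal of R becomes positive."""
    Q, R = np.linalg.qr(A)
    d = np.diag(R)
    phases = d / np.abs(d)            # unit scalars; just signs in the real case
    Q = Q * phases                    # rescale columns of Q ...
    R = phases.conj()[:, None] * R    # ... and rows of R, so that diag(R) > 0
    return Q, R

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, R = qr_positive(A)
print(np.allclose(Q @ R, A),
      np.allclose(Q.conj().T @ Q, np.eye(4)),
      np.allclose(np.triu(R), R),
      np.all(np.diag(R).real > 0) and np.allclose(np.diag(R).imag, 0))
```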
Orthogonal transformations of Rn
To understand orthogonal transformations, let’s first consider the simplest nontrivial case -
the real plane. Let T : R2 → R2 be an orthogonal transformation with A ∈ M2 (R) being its
matrix representation with respect to the natural ordered orthonormal basis (e1 , e2 ). Then A is
an orthogonal matrix. Now it is easy to see that
O2 (R) = \{ \begin{pmatrix} a & -b \\ b & a \end{pmatrix} : a^2 + b^2 = 1 \} \cup \{ \begin{pmatrix} a & b \\ b & -a \end{pmatrix} : a^2 + b^2 = 1 \},
with the first set being SO2 (R). If T has a real eigenvalue, then both its eigenvalues are real, in
which case A must be of the form
A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
with respect to a suitable ordered orthonormal basis. Note that the second matrix represents
rotation at an angle π and the third one the reflection with respect to a line passing through the
origin.
If the eigenvalues of T are not real, then A is a special orthogonal matrix, and we can find
φ ∈ (0, π) ∪ (π, 2π) such that A is of the form
A = \begin{pmatrix} \cos φ & -\sin φ \\ \sin φ & \cos φ \end{pmatrix},
which is nothing but the anti-clockwise rotation of the plane at an angle φ because cos (π/2 + φ) =
- sin φ and sin (π/2 + φ) = cos φ. If we parametrize the (anti-clockwise) rotations of R2 as
Aθ = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}, \quad θ ∈ [0, 2π),
then SO2 (R) = {Aθ | θ ∈ [0, 2π)}; and for every rotation except those at an angle θ = 0 or π, we get an
orthogonal matrix whose eigenvalues are not real. Note that if we naturally identify R2 with C,
by sending e1 to 1 and e2 to i, then the anti-clockwise rotation of R2 at an angle θ is same as the
multiplication map in C, given by the element eiθ = cos θ + i sin θ.
where t is a non-negative integer, r, s ∈ {0, 1} and f1 , . . . , ft ∈ R[X] are distinct monic irreducible
polynomials of degree 2; and the characteristic polynomial of T is of the form
chT (X) = (X − 1)^{r′} (X + 1)^{s′} f1^{e1}(X) . . . ft^{et}(X),
where r′ (≥ r), s′ (≥ s) are non-negative integers and e1 , . . . , et are positive integers such that
n = r′ + s′ + 2(e1 + . . . + et ).
Proof. From the theory of general linear operators, we know that V = ker f (T ) ⊕ ker g(T ) is
a T -invariant direct sum decomposition and ker g(T ) = im f (T ). So we only need to show that
ker f (T ) ⊥ imf (T ). Now f (T ) being normal, ker f (T ) = ker f (T )∗ . Since im f (T ) ⊥ ker f (T )∗ ,
the assertion follows.
Now continuing with the discussion of the orthogonal transformation T : Rn → Rn , the above
lemma implies that Rn has a T -invariant orthogonal decomposition
Rn = V+ ⊕ V− ⊕ V1 ⊕ . . . ⊕ Vt ,
To understand the action of T on Vi , let us consider the orthogonal transformation Ti := T |Vi . Then minTi (X) = fi (X) and
chTi (X) = fiei (X). If W ≤ Vi is a Ti -invariant subspace then W ⊥ is Ti∗ = Ti−1 -invariant. As
Ti = (Ti−1 )−1 is a polynomial in Ti−1 , W ⊥ is also Ti -invariant. It follows that each Vi has a further
orthogonal decomposition of the form
Vi = Vi,1 ⊕ . . . ⊕ Vi,ei
into two-dimensional T -invariant subspaces. Note that each Vi,j is isomorphic, as an inner product
space, to (R2 , ·). So there exists a unique θi ∈ (0, π) ∪ (π, 2π) such that with respect to a suitable
ordered orthonormal basis Bi,j of Vi,j ,
[T |Vi,j ]Bi,j = \begin{pmatrix} \cos θi & -\sin θi \\ \sin θi & \cos θi \end{pmatrix}.
Putting together all the pieces, we see that the matrix representation of T , with respect to a
suitable ordered orthonormal basis B of Rn , has the following block diagonal form
[T ]B = \begin{pmatrix} I_{r′} & & & & \\ & -I_{s′} & & & \\ & & M_{θ_1} & & \\ & & & \ddots & \\ & & & & M_{θ_t} \end{pmatrix},
where, for each i, Mθi is a 2ei × 2ei block diagonal matrix of the form
Mθi = \begin{pmatrix} A_{θ_i} & & \\ & \ddots & \\ & & A_{θ_i} \end{pmatrix},
with Aθi = \begin{pmatrix} \cos θi & -\sin θi \\ \sin θi & \cos θi \end{pmatrix} for all i. And this completes the description of matrix representa-
tions of orthogonal transformations of Rn .
It’s clear that On (R) is not a connected set (why?). However, we can now show that SOn (R) :=
{A ∈ On (R) | det A = 1} is connected. To do so, we first need a few definitions.
Definitions. Let X be a metric space (or more generally, a topological space). Then X is
said to be path connected if for all x, y ∈ X, there exists a continuous function γ : [0, 1] → X such
that γ(0) = x and γ(1) = y. We say that γ is a path from x to y. More generally, for an arbitrary
metric space (or, topological space) X, if we define a relation x ∼p y if there exists a path from x
to y, then one can check that ∼p is an equivalence relation, so that X is path connected iff there
exists a unique equivalence class in X with respect to ∼p .
Note that a path connected metric space (or, topological space) is connected, and the continuous
image of a path connected metric space (or, topological space) is path connected.
Proof. Let A ∈ SOn (R). As −I2 represents the rotation of the R2 -plane at an angle π, there
exists an orthogonal matrix P ∈ On (R) such that
P t AP = \begin{pmatrix} I_r & & & \\ & A_{θ_1} & & \\ & & \ddots & \\ & & & A_{θ_m} \end{pmatrix},
such that n = 2m + r and θi ∈ (0, 2π) for all i = 1, . . . , m. We want to show that there exists a
path γ : [0, 1] → SOn (R) such that γ(0) = In and γ(1) = P t AP . As Aθi ∈ SO2 (R) for all i, it
suffices to construct paths γi : [0, 1] → SO2 (R) such that γi (0) = I2 and γi (1) = Aθi . But that is
easy. For each i, we can take γi : [0, 1] → SO2 (R), defined as
γi (t) := \begin{pmatrix} \cos tθi & -\sin tθi \\ \sin tθi & \cos tθi \end{pmatrix},
so that γi (0) = I2 and γi (1) = Aθi . So there exists a path from In to P t AP . Now let φP :
Mn (R) → Mn (R) be the homeomorphism defined as φP (M ) := P M P t for all M ∈ Mn (R). Then
φP induces a homeomorphism of SOn (R) and φP ◦ γ : [0, 1] → SOn (R) is a path from In to A.
As ∼p is an equivalence relation and In ∼p A for all A ∈ SOn (R), it follows that SOn (R) is path
connected. Note that we’ve essentially reduced the proof into just showing that SO2 (R) is path
connected.
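The explicit path used in the proof is easy to test numerically; the sketch below (NumPy, illustrative) checks that γ(t) = Atθ stays inside SO2 (R) and joins I2 to Aθ .

```python
import numpy as np

def A(theta):
    """The anti-clockwise rotation of R^2 at an angle theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = 2.0
gamma = lambda t: A(t * theta)            # the path gamma_i used in the proof

for t in np.linspace(0.0, 1.0, 5):
    M = gamma(t)
    assert np.allclose(M.T @ M, np.eye(2)) and np.isclose(np.linalg.det(M), 1.0)

print(np.allclose(gamma(0.0), np.eye(2)), np.allclose(gamma(1.0), A(theta)))
```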
Wrapping up
Let V be a finite dimensional complex inner product space. Then a normal operator T on V can
be thought of as a bunch of complex numbers as V has a T -invariant orthogonal decomposition,
with T acting on each component by the multiplication by a complex number. Therefore, if we
‘identify’ the set of normal operators on V with the complex plane, then it leads to the following
identifications which is obvious if dim V = 1.
Exercises.
1. HK : Section 8.4 - 1,3 (T = Mγ iff the matrix representation of T with respect to the ordered
basis (1, i) is of the form . . . ),7,10,11,13 (U ∗ = U −1 , the last 3 problems are related.); Section
9.5 - 3,4,6,8.
2. If A ∈ Mn (C) is a unitary matrix, then show that for every positive integer r, there exist
exactly rn unitary matrices B ∈ Mn (C) such that B r = A.
If A ∈ Mn (R) is an orthogonal matrix, then show that for every positive integer r, there
exists an orthogonal matrix B ∈ Mn (R) such that B 2r+1 = A. When is the solution unique?
If A ∈ SOn (R), then show that for every positive integer r, there exists a special orthogonal
matrix B ∈ SOn (R) such that B r = A.
Does there exist an orthogonal matrix B ∈ O2 (R) such that
B^2 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} ?
3. Let A ∈ On (R) be an orthogonal matrix of determinant −1. Then show that there exists an
orthogonal matrix P ∈ On (R) such that
if n = 2m + 2 is even then
P t AP = \begin{pmatrix} 1 & & & & \\ & -1 & & & \\ & & A_{θ_1} & & \\ & & & \ddots & \\ & & & & A_{θ_m} \end{pmatrix},
P t AP = \begin{pmatrix} A_{θ_1} & & \\ & \ddots & \\ & & A_{θ_m} \end{pmatrix},
has a and a−1 as its eigenvalues. In particular, the absolute values of the eigenvalues of
a complex normal orthogonal matrix need not be equal to 1.
9. (i) If T is a linear operator on a complex inner product space (V, h , i) such that hT (x), xi >
0 for all nonzero x ∈ V , then show that T is a positive definite linear operator.
(ii) Suppose that T is a linear operator on an inner product space (V, h , i) over F = R/C
such that hT (x), xi > 0 for all nonzero x ∈ V (If F = R, such a T need not be positive
definite!). Then show that the function b : V × V → F , defined as b(x, y) := hT (x), yi,
is an inner product on V . Prove that a linear operator S : V → V is a unitary operator
with respect to this new inner product b iff S has an adjoint S ∗ satisfying ST S ∗ = T .
10. (*) Let T be a linear operator on an inner product space V over F = R/C which has an
adjoint T ∗ . Then the following assertions hold.
(i) If S : V → V is a self-adjoint operator, so are T ∗ ST and T ST ∗ .
(ii) If S : V → V is a positive (or negative) semi-definite operator, so are T ∗ ST and T ST ∗ .
(iii) For every positive definite (respectively, negative definite) linear operator S : V → V ,
T ∗ ST and T ST ∗ are both positive definite (respectively, negative definite) iff T (or
equivalently, T ∗ ) is invertible.
(iv) If T ∗ T = T T ∗ is a scalar operator, i.e., T ∗ T = T T ∗ = λI for some non-negative real
number λ, then for every normal operator S ∈ L(V ), T ST ∗ and T ∗ ST are both normal
operators.
11. Let U be a unitary operator on a finite-dimensional inner product space (V, h , i). If W ≤ V
is a U -invariant subspace then prove that W ⊥ is also U -invariant.
12. Explicitly describe all orthogonal transformations of R3 .
13. Let T be a normal operator on a finite-dimensional complex inner product space V . Then
show that T is self-adjoint iff iT is anti-self-adjoint. More generally, if S, T ∈ L(V ) are
commuting normal operators then prove that
(i) If S is self-adjoint and T is anti-self-adjoint, then ST is anti-self-adjoint.
(ii) If S, T are both anti-self-adjoint operators then ST is negative semi-definite.
14. (*) Let T be an invertible linear operator on a finite-dimensional inner product space V .
Then show that the following statements are equivalent.
(i) T T ∗ = T ∗ T is a nonzero scalar operator.
(ii) For every self-adjoint operator S ∈ L(V ), T −1 ST is also self-adjoint.
(iii) For every anti-self-adjoint operator S ∈ L(V ), T −1 ST is also anti-self-adjoint.
Hint. For (ii) =⇒ (i), use that orthogonal projections are self-adjoint operators.
15. Let V be a finite-dimensional inner product space. Then show that a linear operator T ∈
L(V ) is self-adjoint iff there exists a T -invariant orthogonal decomposition V = V+ ⊕ V− ⊕ V0
such that T |V+ is positive definite, T |V− is negative definite and T |V0 = 0V0 .
16. (*) Let (V, h , i) be an inner product space over F = R/C and x ∈ V a nonzero element. Let
Wx := {y ∈ V | θ(x, y) = π/2} ∪ {0}. Then prove that Wx is an R-linear subspace of V satisfying
x⊥ ⊆ Wx ( V . Also, show that Wx has R-codimension one in V . In particular, x⊥ = Wx iff
F = R.
17. Let T be a linear operator on a finite-dimensional inner product space V . If kT − Ikop < 1,
then prove that T is invertible; and if T is invertible, then show that kT −1 kop ≥ (kT kop )−1 .
(c) If V1 , V2 , V3 are inner product spaces, and f : V1 → V2 , g : V2 → V3 are both length/angle
preserving functions, then so is g ◦ f .
(d) Prove that every nonzero scalar operator and every unitary operator on V is angle
preserving.
(e) If f : V → V is both length and angle preserving then show that f is R-linear. In
particular, if F = R, then f is an orthogonal transformation.
Is f linear if F = C? What if we further assume that f (ix) = if (x) for all x ∈ V ?
(f) If a linear operator T ∈ L(V ) preserves angle, then prove that for all x, y ∈ V , if x ⊥ y
then T (x) ⊥ T (y).
Deduce that, for such a T , there exists a nonzero λ ∈ F such that λT is a unitary
operator.
Hint. For the second part, can you reduce it to the case when dim V = 2 and T has 1
as an eigenvalue?
Lecture 28 and 29 (25/1/2021, 27/1/2021) :
Polar decomposition
We are familiar with the polar decomposition of complex numbers. If z ∈ C, then z can be written as
z = reiθ , where r = |z| := √(z z̄) is a non-negative real number and eiθ ∈ S 1 is a complex number
of unit modulus. Here r = |z| is uniquely determined by z, and a representation z = rz uz ,
where rz is a non-negative real number and uz is a complex number of unit modulus, is unique iff
z 6= 0, or equivalently, iff z is invertible.
Theorem. Let T be a linear operator on a finite-dimensional inner product space (V, h , i).
Then T can be written as T = U N , where U is a unitary operator and N is a positive semi-definite
(or non-negative) operator. The operator N is uniquely determined by T , and U is unique iff T is
invertible.
Proof. If there exists a unitary operator U and a non-negative operator N such that T = U N ,
then T ∗ = N ∗ U ∗ = N U ∗ , implying that T ∗ T = N U ∗ U N = N 2 . Therefore, N is the unique
non-negative square root of the non-negative operator T ∗ T .
To determine U , we first consider the case when T is invertible. Then T ∗ T is also invertible, and
hence so is N . Therefore, it suffices to show that U := T N −1 is a unitary operator. Now
U U ∗ = T N −1 (T N −1 )∗ = T N −1 (N −1 )∗ T ∗ = T N −2 T ∗ = T (T ∗ T )−1 T ∗ = T T −1 (T ∗ )−1 T ∗ = I.
hT (N 0 )−1 (x), T (N 0 )−1 (x)i = h(N 0 )−1 (x), T ∗ T (N 0 )−1 (x)i = hx, (N 0 )−1 (N 0 )2 (N 0 )−1 (x)i = hx, xi,
implying that T 0 (N 0 )−1 preserves the inner product, and this completes the proof. Note that
there’s no canonical way of extending U 0 to U ∈ U (V ), because U can be defined in various ways
on ker T , and that’s the source of non-uniqueness of U .
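Numerically, a polar decomposition can be read off from a singular value decomposition T = W ΣV ∗ : take N = V ΣV ∗ and U = W V ∗ . The function polar below is an illustrative NumPy sketch; for singular T it returns just one of the many admissible unitary factors U .

```python
import numpy as np

def polar(T):
    """Polar decomposition T = U N with U unitary and N positive semi-definite.

    A sketch via the SVD T = W diag(s) V*: then N = V diag(s) V* is the
    non-negative square root of T* T, and U := W V* is unitary."""
    W, s, Vh = np.linalg.svd(T)
    N = Vh.conj().T @ np.diag(s) @ Vh
    U = W @ Vh
    return U, N

rng = np.random.default_rng(9)
T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, N = polar(T)

print(np.allclose(U @ N, T),
      np.allclose(U @ U.conj().T, np.eye(4)),
      np.allclose(N, N.conj().T),
      np.all(np.linalg.eigvalsh(N) >= -1e-12))
```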
Remark. Taking adjoint on both sides of T = U N , we see that T can also be written in the
form T = N1 U1 , where N1 is a non-negative operator satisfying N12 = T T ∗ , and U1 is a unitary
operator; and U1 is unique iff T is invertible.
For a polar decomposition T = U N , U and N commute with each other iff T is a normal operator.
In fact, if T is normal and B is an orthonormal eigenbasis of T in V , then [T ]B is a diagonal
matrix, so that by simply taking polar decompositions of the diagonal entries, we can construct a
polar decomposition of T . Then it is also obvious that U N = N U . This is one more ‘evidence’
supporting our philosophy that ‘normal operators on finite-dimensional inner product spaces be-
have like complex numbers’.
As an application of polar decomposition, we will show that any two real matrices which are
unitarily equivalent are also orthogonally equivalent. But before that, we prove a result of a more
general flavour.
Proposition. Let E/F be an arbitrary field extension where F is an infinite field. If A, B are
n × n matrices over F , then A is similar to B over F iff A is similar to B over E.
Proof. If A is similar to B over F then, clearly, it's also similar to B over E. Conversely,
suppose that there exists P ∈ GLn(E) such that P⁻¹AP = B, or equivalently, AP = PB. Let
α1, . . . , αm ∈ E be linearly independent elements over F such that each entry of P can be written
as an F-linear combination of α1, . . . , αm. Then we can write P as
P = α1P1 + . . . + αmPm,
where P1, . . . , Pm ∈ Mn(F). Comparing, entry by entry, the coefficients of the αi in AP = PB
(which we may do, as the αi are linearly independent over F and the entries of APi, PiB lie in F),
we get APi = PiB for all i. Now consider
Pt := t1P1 + . . . + tmPm ∈ Mn(F[t1, . . . , tm]),
where t1, . . . , tm are algebraically independent elements over F, so that we can treat them as
variables. Now det Pt is a nonzero polynomial in t1, . . . , tm, because when we put ti = αi for all
i, we get the nonzero value det P. As F is infinite, we can find β1, . . . , βm ∈ F such that
Pβ := β1P1 + . . . + βmPm ∈ Mn(F) is an invertible matrix. But then APβ = PβB, implying that A
is similar to B over F.
Remark. The above result is true even without the assumption that F is infinite, but the
proof uses the structure theorem for finitely generated modules over a PID.
Theorem. If A, B ∈ Mn (R) are unitarily equivalent matrices, then they are also orthogonally
equivalent.
Proof. If A, B are unitarily equivalent, then there exists a unitary matrix U ∈ Un(C) such
that B = U*AU, or equivalently, AU = UB. Then A*U = UB*. Therefore, using the previous
proposition (more precisely, the argument in its proof, applied to the relations AU = UB and
A*U = UB* simultaneously), we can find an invertible matrix S ∈ GLn(R) such that AS = SB and A*S = SB*.
Let S = P N be the polar decomposition of S, so that P ∈ On (R) and N is a real positive
definite matrix satisfying N 2 = S ∗ S. We want to show that AP = P B. Now N is orthogonally
diagonalizable, having the same eigenspaces as N 2 = S ∗ S. As S ∗ SB = S ∗ AS = BS ∗ S, B preserves
every eigenspace of N 2 , and hence of N , implying that BN = N B. Now from AS = SB, we get
AP N = P N B = P BN , which implies AP = P B as N is invertible.
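A hedged numerical illustration of the theorem and of the recipe in its proof: starting from a
genuinely complex unitary equivalence between two real matrices, a real invertible intertwiner S
is obtained as a generic real combination of Re U and Im U, and its polar decomposition yields the
orthogonal matrix. The example below uses a symmetric A purely to make it easy to write down a
complex unitary U with AU = UB; all names and numbers are mine, and numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M + M.T                                     # real symmetric (only to build the example easily)
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))
B = Q0.T @ A @ Q0                               # real matrix, unitarily equivalent to A

# A genuinely complex unitary U with AU = UB (and A^T U = U B^T, since A, B are symmetric):
w, V = np.linalg.eigh(B)
U = Q0 @ (V @ np.diag(np.exp(1j * w)) @ V.T)    # U = Q0 e^{iB}; e^{iB} commutes with B

# Step 1: a real invertible S with AS = SB, as a generic combination of Re U and Im U.
S = max((U.real + t * U.imag for t in (0.1, 0.37, 1.0, 2.3)),
        key=lambda X: abs(np.linalg.det(X)))
print(np.allclose(A @ S, S @ B))                # S intertwines A and B

# Step 2: polar decomposition S = P N with P orthogonal and N = sqrt(S^T S).
w2, V2 = np.linalg.eigh(S.T @ S)
N = V2 @ np.diag(np.sqrt(w2)) @ V2.T
P = S @ np.linalg.inv(N)

print(np.allclose(P.T @ P, np.eye(4)))          # P is orthogonal
print(np.allclose(P.T @ A @ P, B))              # B = P^{-1} A P over the reals
```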
Spectral theory
Recall that a diagonalizable linear operator on a finite-dimensional vector space can be written as
a finite linear combination of mutually commuting projections. Not surprisingly, an analogous result
holds for orthogonally diagonalizable linear operators.
Remark. Recall that for an orthogonally diagonalizable operator T : V → V , the spectrum of
T , denoted by σ(T ), is defined to be the set of eigenvalues of T . So if σ(T ) = {λ1 , . . . , λr } and
V1 , . . . , Vr are the associated eigenspaces, then for each i, the orthogonal projection πi ∈ L(V )
which has Vi as its image, can be written as a polynomial in T. We have already seen this in the
context of diagonalizable linear operators, and the proof does not change. If T = λ1π1 + . . . + λrπr,
then for every polynomial φ(X) ∈ F[X], φ(T) = φ(λ1)π1 + . . . + φ(λr)πr. Applying Lagrange's
interpolation, one can easily see that
πi = ∏_{j≠i} (T − λjI)/(λi − λj).
Recall also the resolution of identity and the spectral resolution
I = π1 + . . . + πr,
T = λ1π1 + . . . + λrπr.
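A quick sketch (my own, assuming numpy) of the Lagrange formula above: for a symmetric matrix
with spectrum {−1, 2, 5}, the products ∏_{j≠i}(T − λjI)/(λi − λj) are the orthogonal eigenprojections,
they sum to I, and they reproduce T.

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
T = Q @ np.diag([2.0, 2.0, 5.0, -1.0]) @ Q.T          # symmetric, spectrum {-1, 2, 5}

spectrum = sorted(set(np.round(np.linalg.eigvalsh(T), 8)))
I = np.eye(4)

projections = []
for lam_i in spectrum:
    P = I.copy()
    for lam_j in spectrum:
        if lam_j != lam_i:
            P = P @ (T - lam_j * I) / (lam_i - lam_j)  # Lagrange factor (T - lam_j I)/(lam_i - lam_j)
    projections.append(P)

print(np.allclose(sum(projections), I))                                   # I = pi_1 + ... + pi_r
print(np.allclose(sum(l * P for l, P in zip(spectrum, projections)), T))  # T = sum lam_i pi_i
print(all(np.allclose(P @ P, P) and np.allclose(P, P.T) for P in projections))  # orthogonal projections
```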
If
T = λ1π1 + . . . + λrπr
is the spectral resolution of T, then we just saw that for every polynomial φ(X) ∈ F[X],
φ(T) = φ(λ1)π1 + . . . + φ(λr)πr.
More generally, for any function f : S(⊆ F) → F with σ(T) ⊆ S, we define
f(T) := f(λ1)π1 + . . . + f(λr)πr.
For example, suppose T is a non-negative operator. If
T = λ1π1 + . . . + λrπr
is the spectral resolution of T, then λi ≥ 0 for all i = 1, . . . , r. Now the square root function
√ : [0, ∞) → [0, ∞) is injective. Therefore, by the above definition, we can set
√T := √λ1 π1 + . . . + √λr πr,
and it's easy to check that (√T)² = T. Moreover, if S is a non-negative operator with S² = T, then
its spectral resolution, together with the injectivity of the square root, forces S = √T; so √T is the
unique non-negative square root of T.
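A short numerical check (toy matrix of my own, assuming numpy) of the definition of √T: compute
the spectral data of a non-negative operator, build the operator from the square roots of the
eigenvalues, and verify that it squares back to T and is again non-negative.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
T = M @ M.T                                   # a non-negative (positive semi-definite) operator

w, V = np.linalg.eigh(T)                      # spectral data of T
sqrtT = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T   # sqrt(T) := sum_i sqrt(lambda_i) pi_i

print(np.allclose(sqrtT @ sqrtT, T))                  # (sqrt T)^2 = T
print(np.min(np.linalg.eigvalsh(sqrtT)) >= -1e-10)    # sqrt T is again non-negative
```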
Proposition. Let T be an orthogonally diagonalizable linear operator on a finite-dimensional
inner product space V with a spectral resolution
T = λ1π1 + . . . + λrπr,
let f be a function defined on a subset of F containing σ(T), and let U : V → V′ be an isomorphism
of inner product spaces. Then
(i) f(T) is an orthogonally diagonalizable linear operator with σ(f(T)) = f(σ(T));
(ii) T′ := UTU⁻¹ is an orthogonally diagonalizable linear operator on V′ with spectral resolution
T′ = λ1Uπ1U⁻¹ + . . . + λrUπrU⁻¹, so that σ(T′) = σ(T), and f(T′) = Uf(T)U⁻¹.
Proof. It follows directly from the spectral theorem that f(T) is an orthogonally diagonalizable
linear operator with σ(f(T)) = f(σ(T)).
Now
T 0 = U (λ1 π1 + . . . + λr πr )U −1 = λ1 U π1 U −1 + . . . + λr U πr U −1 .
Note that U ∗ = U −1 . Therefore, for each i, U πi U −1 is a self-adjoint projection, implying that
U πi U −1 is an orthogonal projection. As
UπiU⁻¹ ∘ UπjU⁻¹ = U(πi ∘ πj)U⁻¹ = 0 for all i ≠ j, and
Uπ1U⁻¹ + . . . + UπrU⁻¹ = U(π1 + . . . + πr)U⁻¹ = U IV U⁻¹ = IV′,
we get that
IV′ = Uπ1U⁻¹ + . . . + UπrU⁻¹
is a resolution of identity. Therefore, by the spectral theorem, we conclude that T 0 is an orthogonally
diagonalizable linear operator with the spectral resolution
T 0 = λ1 U π1 U −1 + . . . + λr U πr U −1 .
In particular, σ(T 0 ) = σ(T ). We could have also proved the orthogonal diagonalizability of T 0 by
showing that V 0 = U (V1 ) ⊕ . . . ⊕ U (Vr ) is an orthogonal decomposition of V 0 , where V1 , . . . , Vr are
the eigenspaces of T .
Finally,
f(T′) := f(λ1)Uπ1U⁻¹ + . . . + f(λr)UπrU⁻¹ = U(f(λ1)π1 + . . . + f(λr)πr)U⁻¹ = Uf(T)U⁻¹.
We’ve defined certain functions of orthogonally diagonalizable operators. Now we’ll define
functions of orthogonally diagonalizable normal matrices (i.e., real symmetric matrices and complex
normal matrices). Let A ∈ Mn (F ) be an orthogonally diagonalizable matrix. Let TA : F n → F n
be the linear operator whose matrix representation with respect to the natural ordered basis
B0 := (e1 , . . . , en ) of F n is A. Then TA is an orthogonally diagonalizable operator. Let σ(A) =
σ(TA ) = {λ1 , . . . , λr } and f : S(⊆ F ) → F a function such that σ(A) ⊆ S. Then we define
f(A) := [f(TA)]B0. If
TA = λ1π1 + . . . + λrπr
is the spectral resolution of TA, then the spectral resolution of A can be defined as
A = λ1E1 + . . . + λrEr,
where Ei := [πi]B0 for each i. The matrices Ei satisfy
(i) Ei² = Ei for all i,
(ii) EiEj = 0 for all i ≠ j,
(iii) I = E1 + . . . + Er and
(iv) Ei* = Ei for all i.
As A is an orthogonally diagonalizable matrix, it's tempting to first take a diagonal matrix D
which is orthogonally equivalent to A, apply f to D, and then reverse the procedure to get back
f(A). This approach works too. However, unlike the spectral resolution, which is canonically
determined by A, there can be many diagonal matrices which are orthogonally equivalent to A. So
'well-definedness' is an issue which needs to be addressed. For that, let P, P′ ∈ Mn(F) be unitary
(orthogonal, if F = R) matrices such that D := P⁻¹AP and D′ := (P′)⁻¹AP′ are diagonal matrices.
We want to show that f(A) = Pf(D)P⁻¹ = P′f(D′)(P′)⁻¹. To do so, let B, B′ be the ordered
orthonormal bases of F^n corresponding to P, P′ respectively. Then D = [TA]B and D′ = [TA]B′.
Now
[f(TA)]B = [f(λ1)π1 + . . . + f(λr)πr]B = f(D),
and similarly, [f(TA)]B′ = f(D′). Therefore, by the change of basis formula for matrix representations,
we have f(A) = Pf(D)P⁻¹ = P′f(D′)(P′)⁻¹.
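A small sketch (mine, assuming numpy) of the recipe just described: diagonalize A orthogonally,
apply f to the diagonal, and conjugate back; a second orthogonal diagonalization (here obtained by
permuting the eigenvector columns) gives the same f(A), illustrating the well-definedness argument.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
A = M + M.T                                   # real symmetric, hence orthogonally diagonalizable

w, P = np.linalg.eigh(A)                      # A = P diag(w) P^T with P orthogonal
fA_1 = P @ np.diag(np.exp(w)) @ P.T           # f(A) with f = exp, via this diagonalization

# A second orthogonal diagonalization of A: permute the eigenvector columns
# (and the eigenvalues accordingly); the resulting f(A) must be the same.
perm = [2, 0, 3, 1]
P2, w2 = P[:, perm], w[perm]
fA_2 = P2 @ np.diag(np.exp(w2)) @ P2.T

print(np.allclose(fA_1, fA_2))                # well-definedness of f(A)
```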
Recall the real exponential function exp : R → R, given by the power series
e^x := 1 + x + x²/2! + x³/3! + . . . ;
the image of the real exponential function is the set of positive real numbers. Similarly, we
can define the complex exponential function exp : C → C, defined as exp(z) := e^z, where e^z is
defined using the power series representation
e^z := 1 + z + z²/2! + z³/3! + . . . .
The complex exponential function has the following properties.
Using the same power series, we can also define the exponential of a matrix:
e^A := I + A + A²/2! + A³/3! + . . .
for all A ∈ Mn(F). To show that the above series converges, we first need a definition.
Definition. Let (V, ‖·‖) be a normed linear space. Then a series Σn xn in V is said to be
absolutely convergent if Σn ‖xn‖ < ∞.
It follows from the triangle inequality that for an absolutely convergent series Σn xn, the sequence
of partial sums is a Cauchy sequence, and therefore the series converges if V is complete.
Since Mn(F) is complete with respect to the norm ‖A‖ := √(tr(AA*)) (in fact, with respect to
every norm, as they are all equivalent), it suffices to show that the above series is absolutely
convergent, i.e.,
1 + ‖A‖ + ‖A²‖/2! + ‖A³‖/3! + . . . < ∞.
To show the absolute convergence of the series, it is enough to show that ‖A^n‖ ≤ ‖A‖^n for all
n ≥ 1, because then the sum is bounded by e^{‖A‖}. To prove it, one can either directly prove that
‖AB‖ ≤ ‖A‖ ‖B‖ for all A, B ∈ Mn(F), or prove the analogous result for the operator norm on
Mn(F) (which is easier!), and use the fact that any two norms on Mn(F) are equivalent, so that
convergence of the series with respect to the operator norm implies the same for the Euclidean norm.
In the third inequality, we're implicitly using the fact that ‖AB‖ ≤ ‖A‖ ‖B‖ for all A, B ∈ Mn(F).
This is something which we have not proved. But then we can again work with the operator norm,
for which this inequality is obvious. And because any two norms on a finite-dimensional vector
space over F = R/C are equivalent, questions about convergence, continuity etc. do not depend
on which norm we are working with.
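The following sketch (my own; it assumes numpy and scipy are available) sums the first thirty terms
of the series I + A + A²/2! + . . . for a random real matrix and compares the result with
scipy.linalg.expm, as a numerical counterpart of the convergence argument above.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))

partial = np.zeros((4, 4))
term = np.eye(4)
for k in range(1, 30):
    partial += term
    term = term @ A / k                       # next term of the series: A^k / k!

print(np.linalg.norm(partial - expm(A)))      # essentially zero: the series has converged
print(np.exp(np.linalg.norm(A)))              # e^{||A||}, which dominates 1 + sum_{k>=1} ||A^k||/k!
```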
The exponential function of matrices enjoys the following properties. Most of the proofs are
easy and we leave them as exercises.
For every pair of commuting matrices A, B ∈ Mn(C) there exist two sequences of complex
matrices (An)n∈N, (Bn)n∈N satisfying the following properties.
(i) For each n, both An and Bn are diagonalizable.
(ii) AnBn = BnAn for all n ≥ 1.
(iii) An → A and Bn → B as n → ∞.
Hence the proof becomes obvious (of course, modulo the above result, which is perhaps not
'easy' to prove!).
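The property most naturally proved by this approximation argument is, presumably, the identity
e^{A+B} = e^A e^B for commuting A and B (the list of properties is not reproduced above, so this is
my reading): the identity is easy for commuting diagonalizable pairs and passes to the limit by
continuity. A quick numerical check, with toy matrices of my own and scipy's expm:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
C = rng.standard_normal((3, 3))
A, B = C @ C, 2.0 * C + np.eye(3)        # polynomials in the same matrix, hence commuting

print(np.allclose(A @ B, B @ A))                      # True
print(np.allclose(expm(A + B), expm(A) @ expm(B)))    # True for commuting matrices

X, Y = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
print(np.allclose(expm(X + Y), expm(X) @ expm(Y)))    # generally False without commutativity
```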
Similarly, for a linear operator T on a finite-dimensional inner product space V we define
e^T := I + T + T²/2! + T³/3! + . . . .
Note that if B is an orthonormal basis of V then [e^T]B = e^A, where A := [T]B. We can work
with any norm on L(V) because any two such norms are equivalent. Now we state a few more
properties of the exponential function. We could have stated them for matrices, but the language
of linear operators seems to be a more natural set-up.
which is not a diagonalizable matrix, and therefore eT is not diagonalizable. And for orthogonal
diagonalizability, one only needs to observe that if T is diagonalizable, then because T and eT have
the same eigenspaces, T is orthogonally diagonalizable iff eT is orthogonally diagonalizable.
The remaining assertions are obvious.
Proposition. Let V be a finite-dimensional inner product space and F ⊆ L(V) a commuting
family of orthogonally diagonalizable linear operators. Then F can be simultaneously orthogonally
diagonalized, i.e., there exists an ordered orthonormal basis B of V such that [T]B is a diagonal
matrix for all T ∈ F.
In particular, if F ⊆ Mn(R) (respectively, F ⊆ Mn(C)) is a commuting family of symmetric (re-
spectively, normal) matrices then there exists an orthogonal matrix P ∈ On(R) (respectively, a
unitary matrix U ∈ Un(C)) such that P^t AP ∈ Mn(R) (respectively, U*AU ∈ Mn(C)) is a diagonal
matrix for all A ∈ F.
Proof. We give two proofs of the above proposition. The first one exploits certain property
of normal matrices, whereas the second one mimics the idea used to prove the similar result for
diagonalizable operators on vector spaces.
(1) Since F is a commuting family of triangulable operators, there exists an ordered basis of V with
respect to which the matrix representation of every element of F is an upper triangular matrix.
So using Gram-Schmidt orthogonalization, we can find an ordered orthonormal basis B of V such
that [T ]B is an upper triangular matrix for all T ∈ F. As each [T ]B is an upper triangular normal
matrix, it follows that [T ]B is actually a diagonal matrix for all T ∈ F, implying that B simulta-
neously orthogonally diagonalizes the family F.
(2) Alternatively, we can apply induction on dim V. There's nothing to prove if dim V = 1. Now
supposing that the result is true whenever dim V ≤ r, let dim V = r + 1. If every element of F
is a scalar operator, again, there's nothing to prove. Otherwise, let T ∈ F be a linear operator
which is not scalar and W an eigenspace of T. As T is not scalar, W ≠ V. Then W⊥ is a
sum of eigenspaces of T. Since every element of F commutes with T, it preserves each eigenspace
of T; therefore W and W⊥ are both invariant under every element of F. As
dim W, dim W⊥ ≤ r, from the induction hypothesis it follows that there exist ordered orthonormal
bases B′, B″ of W, W⊥ respectively, such that [T|W]B′ and [T|W⊥]B″ are both diagonal matrices
for all T ∈ F. Therefore, if we take B := (B′, B″), then B is an ordered orthonormal basis of V
such that [T]B is a diagonal matrix for all T ∈ F.
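A hedged computational counterpart of the proposition (not the proof given in the notes): for a
commuting family of real symmetric matrices, the orthonormal eigenbasis of a generic linear
combination of the family already diagonalizes every member. The two matrices below are my own
toy example, and numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(7)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = Q @ np.diag([1.0, 1.0, 2.0]) @ Q.T      # commuting symmetric matrices whose
B = Q @ np.diag([3.0, 4.0, 4.0]) @ Q.T      # individual eigenspaces are different

# Diagonalize a generic linear combination of the family: its orthonormal eigenbasis
# separates the joint eigenspaces, so it diagonalizes both A and B.
_, P = np.linalg.eigh(A + np.pi * B)
for T in (A, B):
    D = P.T @ T @ P
    print(np.allclose(D, np.diag(np.diag(D))))    # True, True
```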
Proof. If F = C, the assertion directly follows from the above proposition because a complex
square matrix is normal iff it’s unitarily diagonalizable.
Otherwise, we can view A, B as complex matrices. Then C[A, B] is a commutative algebra of
complex normal matrices; and hence, R[A, B] ⊆ C[A, B] is a commutative algebra of real normal
matrices.
Remarks.
1. Let V1, V2 ≤ V be subspaces, invariant under every element of F, such that T|V1, T|V2 are
scalar operators for all T ∈ F. If V1 ∩ V2 ≠ 0, then T|V1+V2 is also a scalar operator for every
T ∈ F. It follows that if V1, V2 are eigenspaces of F, then either V1 = V2 or V1 ∩ V2 = 0.
Lemma. Let F be a family of orthogonally diagonalizable linear operators on an inner product
space V . Then different eigenspaces of F are mutually orthogonal.
Proposition. Let F be a commuting family of orthogonally diagonalizable linear operators on a
finite-dimensional inner product space V, and let V1, . . . , Vr be the distinct eigenspaces of F. Then
V = V1 ⊕ . . . ⊕ Vr.
If πi ∈ L(V) denotes the orthogonal projection onto Vi, then
I = π1 + . . . + πr
is called the resolution of identity determined by F, and for each T ∈ F, the representation
T = λ1π1 + . . . + λrπr
is called the spectral resolution of T in terms of this family (note that here the λi's may not be
distinct). In particular, every eigenspace of T is a sum of eigenspaces of F.
Before we give a proof of the proposition, let us consider a simple example which will help us
to understand the above ideas. Let V := R4 , equipped with the usual dot product and F = {S, T },
where S, T ∈ L(R4 ) are linear operators whose matrix representations with respect to the natural
ordered orthonormal basis (e1 , e2 , e3 , e4 ) are
diag(1, 1, 0, 0) and diag(1, 0, 1, 0) respectively.
Note that none of the eigenspaces of F is an eigenspace of either S or T . However, we’ll later see
that every eigenspace of F is an eigenspace of a polynomial in S, T .
If πi is the orthogonal projection of R4 onto Rei, then
S = 1 · π1 + 1 · π2 + 0 · π3 + 0 · π4 and T = 1 · π1 + 0 · π2 + 1 · π3 + 0 · π4
are the spectral resolutions of S and T in terms of the family F.
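A tiny verification (mine, in numpy) of this example: the joint eigenprojections onto Re1, . . . , Re4
can be written as ST, S(I − T), (I − S)T, (I − S)(I − T), i.e., as polynomials in S and T, they sum
to the identity, and they give the spectral resolutions displayed above.

```python
import numpy as np

I = np.eye(4)
S = np.diag([1.0, 1.0, 0.0, 0.0])
T = np.diag([1.0, 0.0, 1.0, 0.0])

# joint eigenprojections pi_1, ..., pi_4 onto R e_1, ..., R e_4, as polynomials in S and T
pi = [S @ T, S @ (I - T), (I - S) @ T, (I - S) @ (I - T)]

print(np.allclose(sum(pi), I))            # resolution of identity determined by F
print(np.allclose(S, pi[0] + pi[1]))      # S = 1*pi1 + 1*pi2 + 0*pi3 + 0*pi4
print(np.allclose(T, pi[0] + pi[2]))      # T = 1*pi1 + 0*pi2 + 1*pi3 + 0*pi4
```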
Proof of the proposition. We know that distinct eigenspaces of F are mutually orthogo-
nal. Let W be the sum of all eigenspaces of F. If W = V , we are done. Otherwise, W ⊥ is also
F-invariant and FW ⊥ := {T |W ⊥ | T ∈ F} is a commuting family of orthogonally diagonalizable
linear operators. So they can be simultaneously orthogonally diagonalized; and in particular, W ⊥
contains an eigenvector of F, which is a contradiction.
Definition. Let V = V1 ⊕ . . . ⊕ Vn and V = V1′ ⊕ . . . ⊕ Vm′ be two orthogonal decompositions of V.
The second is called a refinement of the first if for each i ∈ {1, . . . , m} there exists
ri ∈ {1, . . . , n} such that Vi′ ⊆ Vri. Similarly, we can define refinement of a resolution of identity.
Note that refinement gives a partial order relation on the set of all orthogonal decompositions of
V and associated resolutions of identity. If F is a commuting family of orthogonally diagonalizable
linear operators, then the resolution of identity determined by F is a refinement of every resolution
of identity given by an element of F.
I = π1 + . . . + πr
AF = {c1 π1 + . . . + cr πr ∈ L(V ) | c1 , . . . , cr ∈ F }.
Proof. Let B be an ordered basis of V such that A := [S]B , B := [T ]B are diagonal matrices.
Let {c1 = 0, c2, . . . , cs}, {c′1 = 0, c′2, . . . , c′t} be the sets of eigenvalues of S, T respectively. For each
i = 2, . . . , t, let Xi be the set of eigenvalues of S which appear as the diagonal entries of A corresponding
to the diagonal entries of B which are equal to c′i. As null T > 1, |Xi| ≤ n − 2 for all i. Since
|F| ≥ n − 1, we can choose, for each i, an element ηi ∈ F \ Xi. Now using Lagrange's interpolation,
we can find a polynomial f(X) ∈ F[X] such that f(0) = 0 and f(c′i) = ηi for all i = 2, . . . , t. Then
it is easy to see that ker(S − f(T)) = W.
Finally, in order to show that AF is generated by one element, let µ1, . . . , µr be distinct el-
ements of F. Then using Lagrange's interpolation, one can easily see that AF = F[T̃], where
T̃ := µ1π1 + . . . + µrπr.
Remarks.
1. Let F := F2, the field consisting of two elements. Now consider the diagonal matrices
A := diag(1, 1, 0, 0) and B := diag(1, 0, 1, 0)
over F. Then ker A = ⟨e3, e4⟩, ker B = ⟨e2, e4⟩ and ker A ∩ ker B = ⟨e4⟩. However, for
every polynomial f(X) ∈ F[X], the first diagonal entry of f(B) is equal to its third diagonal
entry, implying that ker(A − f(B)) ≠ ⟨e4⟩. Therefore, in the above lemma, we cannot
drop the condition that |F| ≥ n − 1.
2. Again, let F := F2, and consider the diagonal matrices
A := diag(1, 0, 0) and B := diag(0, 0, 1),
and take F := {A, B}. Then F [F] ⊆ M3 (F ) is the set of all diagonal matrices. As every
diagonal matrix of M3 (F ) satisfies the equation X 2 = X, the F -algebra F [F], which has
dimension 3, cannot be generated by one element as an F -algebra.
3. If F = R/C, and A is a commutative self-adjoint algebra of orthogonally diagonalizable linear
operators on an inner product space V of dimension n, then there exists T ∈ A such that
A = F [T ]. In particular, dim A ≤ n.
4. It follows from the proof of the theorem that if V is a finite-dimensional inner product
space over F = R/C, then every commutative algebra of orthogonally diagonalizable linear
operators A ⊆ L(V ) is a self-adjoint algebra.
Now if π : V → V is a projection then the only projections contained in F[π] are 0, I, π and
I − π. Therefore, if π is not an orthogonal projection, then F[π] is a commutative algebra
of diagonalizable operators (not orthogonally diagonalizable!) which is not a self-adjoint
algebra.
Exercises.
1. HK : Section 9.5 - 5,7,9 (For 9(c), you may need Thm. 10 on p. 337).
2. Let F be an arbitrary field with |F| ≥ n. If A is an F-subalgebra of the algebra of n × n diagonal
matrices over F, then prove that there exists a matrix T ∈ A such that A = F[T].
3. Let V be a finite-dimensional inner product space and T : V → V an orthogonally diagonal-
izable operator with a spectral resolution
T = λ1 π1 + . . . + λr πr .
Now consider the inverse function on F*, which sends a nonzero element λ ∈ F to 1/λ. Then
we can define the inverse function of T iff σ(T) ⊆ F*, i.e., iff T is invertible in the usual sense.
In this case, prove that the spectral resolution of 1/T is given by
1/T = (1/λ1)π1 + . . . + (1/λr)πr,
and that 1/T = T⁻¹.
4. (*) If A, B ∈ Mn(F), then prove that ‖AB‖ ≤ ‖A‖ ‖B‖, where ‖A‖ := √(tr(AA*)).
Deduce that ‖e^A‖ ≤ e^{‖A‖} for all A ∈ Mn(F).
5. (a) If A ∈ Mn(C), then prove that σ(e^A) = e^{σ(A)}.
Does the result hold for real matrices?
(b) Give an example of A ∈ Mn (R) such that A is not diagonalizable, but eA is diagonal.
Hint. To prove the first part of (a), use triangulation. For the second part of (a) and (b),
take the 2 × 2 matrix A with rows (0, −π) and (π, 0), and use a suitable embedding of C in
M2(R) to conclude that e^A = −I2. (A numerical check appears after this list of exercises.)
6. (*) Use the following steps to prove that exp : Mn (C) → GLn (C) is surjective.
(i) Every diagonal matrix of nonzero determinant is contained in the image of the expo-
nential map.
(ii) Every diagonalizable matrix of nonzero determinant is contained in the image of the
exponential map.
(iii) If A ∈ GLn (C) there exists a diagonalizable matrix D ∈ GLn (C) and a nilpotent matrix
N ∈ Mn (C) such that A = D +N and DN = N D. So it suffices to show that I +D−1 N
is contained in the image of the exponential map.
(iv) For nilpotent matrices N ∈ Mn(C), we can define the logarithm function
log(I + N) := N − N²/2 + N³/3 − . . . + (−1)^n N^{n−1}/(n − 1).
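Two quick numerical checks (my own, using numpy and scipy.linalg.expm) related to the hints
above: the rotation generator from the hint to Exercise 5 exponentiates to −I2, and for a nilpotent
N the finite logarithm series of (iv) really inverts the exponential on I + N.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, -np.pi], [np.pi, 0.0]])
print(np.allclose(expm(A), -np.eye(2)))          # e^A = -I_2, although A is not diagonalizable over R

N = np.array([[0.0, 1.0, 5.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])                  # nilpotent: N^3 = 0
logIN = N - N @ N / 2                            # log(I + N) = N - N^2/2 (the series terminates)
print(np.allclose(expm(logIN), np.eye(3) + N))   # exp(log(I + N)) = I + N
```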
Miscellaneous Exercises
1. Let V be a vector space and S ⊆ V a linearly independent set. If x ∉ ⟨S⟩, then show that
x + S := {x + y ∈ V | y ∈ S} is also linearly independent.
2. Let A be an m × n matrix over a field F . Show that A can be written as a product of two
matrices A = BC, where the columns of B and the rows of C are linearly independent.
3. Let V be a vector space over a field F . If f : V → V is a group homomorphism, then show
that the set Lin (f) := {λ ∈ F | f(λx) = λf(x) for all x ∈ V } is a subfield of F . Further, if
F is a subfield of C and f : F n → F n is a continuous group homomorphism, then prove that
Lin (f) is a closed subfield of F .
4. Let S, T, U be linear operators on a finite-dimensional vector space V. Then prove that
rk T + rk SUT ≥ rk UT + rk ST.
Deduce that rk T^n + rk T^{n+2} ≥ 2 rk T^{n+1} for all positive integers n. Therefore the sequence
(rk T^n − rk T^{n+1})_{n=1}^∞ is a decreasing sequence.
5. Let F be an algebraically closed field and A ∈ Mn(F). If ch F = p > 0, then show that there
exists a positive integer r (depending on n) such that A^{p^r} is a diagonalizable matrix.
Give an example of a 2 × 2 real matrix A such that A^m is not diagonalizable for all positive
integers m.
6. Let A be an r × r matrix over C such that the limit lim_{n→∞} A^n exists. Then prove that
lim_{n→∞} A^n is an idempotent matrix of rank equal to the rank of A^r.
If moreover A is an invertible matrix, then show that lim_{n→∞} A^n = I_r.
------------------------------------------------------------------------------
7. Let T be a linear operator on a vector space V. Suppose that f(X) ∈ F[X] is a polynomial
of prime degree p such that f(T) = 0. Then prove that V has a T-invariant direct sum
decomposition V = ⊕i Vi such that each Vi has dimension p.
------------------------------------------------------------------------------
8. Prove that the following statements are equivalent for a linear operator T on a finite-
dimensional inner product space V .
(i) T is normal.
(ii) T ∗ ∈ F [T ], i.e., T ∗ can be written as a polynomial in T .
(iii) Every T -invariant subspace W ≤ V is also T ∗ -invariant.
(iv) For every T -invariant subspace W ≤ V , W ⊥ is also T -invariant.
(*) If T is a normal operator on an arbitrary inner product space V (not necessarily finite-
dimensional), then does it imply T ∗ ∈ F [T ]?
Hint. For the last question, let V be an inner product space having an orthonormal basis
indexed by S 1 , the unit circle. Then consider the unitary operator U : V → V , defined by
U (eλ ) := λeλ for all λ ∈ S 1 .
9. Let V be an inner product space with a countable orthonormal basis B, like F (N) . As Z, the
set of integers, is also a countable set, the orthonormal basis B can as well be indexed by Z.
Let B = {ei }i∈Z . Now consider the linear transformation T : V → V , given by T (ei ) := ei+1
for all i ∈ Z. Then show that
(i) T is a unitary operator.
(ii) T does not have any finite-dimensional nonzero invariant subspace. In particular, T
does not have any eigenvalue.
(iii) If W := ⟨ei | i ∈ N⟩, then W is a T-invariant subspace. But T|W is not a unitary operator,
and W⊥ is not T-invariant.
10. Show that every diagonalizable normal operator on a finite-dimensional inner product space
is orthogonally diagonalizable.
11. Prove that SO2 (R), as a group, is isomorphic to S 1 . In particular, SO2 (R) is an abelian
group.
Can you see that SO2 (R) is homeomorphic to S 1 ?
If n > 1 is a positive integer, and 0 < p < n is another positive integer, then show that
SOn (R) contains a subgroup isomorphic to SOp (R) × SOn−p (R). Deduce that SOn (R) is
not abelian for all n ≥ 3.
12. (*) In the following exercises, we explore certain topological properties of a few matrix groups.
(i) On (R), SOn (R) are compact sets.
On (R) has two connected components, both homeomorphic to SOn (R).
(ii) The set of all upper (or lower) triangular real matrices with positive diagonal entries is
a path connected set.
(iii) Let SUn (C) := {A ∈ Un (C) | det A = 1}. Then Un (C), SUn (C) are path connected
compact sets.
(iv) GLn (C), SLn (C) are both path connected. Are these sets compact?
(v) SLn (R) is a path connected set. Is it compact?
−
(vi) GLn (R) is not connected. It has two connected components - GL+ n (R) and GLn (R),
the set of all invertible real matrices of positive and negative determinant respectively.
−
In fact, GL+ n (R) is path connected and homeomorphic to GLn (R).
Hint. To prove (ii), construct paths for each entry of the matrix.
For (iii), first note that the matrices are unitarily diagonalizable, and then construct
paths for each entry.
After triangulation, use a similar idea for (iv).
For (v) and (vi), you may have to use QR decompositions. And finally, to show that
−
GL+ n (R) is homeomorphic to GLn (R), use the diagonal matrix whose first entry is −1,
and the remaining diagonal entries are equal to 1.
13. Let V be a complex inner product space. If x ∈ V is such that hx, yi = hy, xi for all y ∈ V ,
then show that x = 0.
14. Prove that the set of all n × n nilpotent matrices over F = R/C is a closed subset of Mn (F ).
Is it compact?
15. (*) Let T be a linear operator on a finite-dimensional inner product space V over F = R/C.
If T = U N is a polar decomposition of T , then show that T is normal iff U N = N U .
16. Let A be an n × n upper (or lower) triangular matrix over F = R/C. If A is diagonalizable,
does it follow that A is a diagonal matrix? What if A is orthogonally diagonalizable?
17. (*) If A ∈ Mn (C) is not a scalar matrix then show that the set {U ∗ AU | U ∈ SUn (C)} is
uncountable. Deduce that if T is a non-scalar linear operator on a finite-dimensional complex
inner product space V , then the set
{[T]B ∈ Mn(F) | B is an ordered orthonormal basis of V}
is uncountable.
If A ∈ SO2 (R) \ {±I}, then show that the set {P t AP | P ∈ O2 (R)} contains exactly two
elements. So the above result does not extend to real inner product spaces.
18. (*) Let V be a finite-dimensional inner product space and T a linear operator on V. Show
that the following statements are equivalent.
(i) T is a scalar operator.
(ii) The set
{[T]B ∈ Mn(F) | B is an ordered orthonormal basis of V}
is a singleton set.
Hint. Reduce the problem to the case when dim V = 2.