Linear Algebra Notes
Sagnik Chakraborty
February 2, 2021
Abstract
These are some rough lecture notes of the course on Linear Algebra. The textbook for this
course is the ‘Linear Algebra’ book by Hoffman & Kunze. The notes are not complete, but
only intended to serve as a guideline. I’ll omit many details which you should fill in. Also,
frequent references will be made to the textbook (henceforth referred to as HK).
Contents
Lecture 1 (Matrices - row, column operations)
Lecture 4 (Basis)
Lecture 14 (Projections)
Lectures 28, 29 (Polar decomposition, Spectral theory and Self-adjoint algebras)
Lecture 1 (24/9/2020) :
Linear algebra consists of two words - linear and algebra.
Algebra, in high school, was a ‘tool to solve equations’. With a major transition from previously
familiar arithmetic which involves computation with numbers, to a bunch of symbols like ‘x, y, z
etc.’ and a list of formulas like ‘(x + y)^2 = x^2 + y^2 + 2xy’. This algorithmic approach of ‘you
know the formula, you know the solution’ actually facilitates computation, allowing us to solve
equations more efficiently - our approach to solve a problem is no longer dependent on a specific
set of ‘values’; rather, any set of values given as ‘inputs’ in the formulas gives us solutions.
After that, ‘college algebra’ became a game involving ‘sets with operations’. The salient feature of
this so-called ‘abstract algebra’ is its axiomatic development. In this approach, to study a struc-
ture, we extract a set of properties of this structure and then formulate them as axioms. Then
formal deductions are made from these axioms. And whatever results we get then apply to every
possible structure satisfying that particular set of axioms.
Now the word ‘linear’ comes from ’line’ - in a sense, it means ‘degree 1’. What is a line? If a, b
are two points in Rn (or, more generally, in a vector space V over a field F ) then the line through
a and b, denoted by l(a, b), is defined as l(a, b) := {(1 − t)a + t b | t ∈ F }.
The simplest linear equation is ‘ax = b’. To make it meaningful, we should know what ax is,
i.e., a concept of multiplication. Then, to solve it, if a ≠ 0, we should have a multiplicative inverse
(note that in a ring R we usually don’t allow 0 to have an inverse as R = 0 iff 1 = 0 in R iff 0 has
a multiplicative inverse in R). For example, 2x = 1 is a perfectly nice linear equation over Z. But
it has no solution in Z, the solution is in Q.
Next, if we consider two linear equations in two variables
ax + by = c
a′x + b′y = c′
then we need one more operation - addition. To solve the equations, we also need subtraction.
So to work with linear equations and to find their solutions (whenever they exist!), the coefficients
must come from a division ring. To make life simpler, we’ll actually assume that the coefficients
are coming from a field.
The information of a system of linear equations is captured by an array of scalars, the so-called
augmented matrix. As we observed in the case of two linear equations in two variables, the most
basic (and surprisingly effective!) tool for solving linear equations is the so-called ‘elimination of
variables’. The essence of this operation is formalized in the process of row reduction of a matrix
by elementary row operations. There are three types of elementary row operations: multiplying a
row by a nonzero scalar c, interchanging two rows, and adding c times one row to another row.
Remarks.
(i) If A′ is a matrix obtained from an m × n matrix A by applying a single elementary row
operation then A′ = I′A, where I′ is the m × m matrix obtained from the identity matrix by
applying the same elementary row operation.
(ii) Just as elementary row operations, we can similarly define elementary column operations. We
say that two matrices A, B ∈ Mm×n are column equivalent, denoted by A ∼c B, if B can be
obtained from A by a finite sequence of elementary column operations. Check that column
equivalence is also an equivalence relation.
(iii) If A′ is a matrix obtained from an m × n matrix A by applying a single elementary column
operation then A′ = AI′, where I′ is the n × n matrix obtained from the identity matrix by
applying the same elementary column operation. Therefore A is row/column equivalent to
B iff A^t is column/row equivalent to B^t.
(iv) Note that, when applied to the identity matrix, elementary row operations are ‘same as’
elementary column operations. For example, multiplying the i-th row by c ≠ 0 is same as
multiplying the i-th column by c ≠ 0, interchanging i-th and j-th row is same as interchanging
i-th and j-th column, and finally adding c-times of the i-th row to the j-th row is same as
adding c-times of the j-th column to the i-th column.
(v) While solving a system of linear equations we never use elementary column operations since
it will ‘disturb the variables’.
Exercise. A matrix E ∈ Mn (F ) is called an elementary matrix if it can be obtained from the
identity matrix by applying a single elementary row/column operation. Prove that the set of ele-
mentary matrices generate GLn (F ).
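As an aside, Remark (i) is easy to check numerically. The following small sketch (in Python with numpy; the 3 × 3 matrix and the particular row operation are a hypothetical example, not taken from the text) verifies that applying an elementary row operation to A gives the same result as multiplying A on the left by the corresponding elementary matrix.

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 10.]])

# Elementary matrix for "add (-4) times row 0 to row 1", obtained by applying
# that same operation to the 3x3 identity matrix.
E = np.eye(3)
E[1, 0] = -4.0

# Apply the row operation to A directly.
A_prime = A.copy()
A_prime[1] += -4.0 * A_prime[0]

# Remark (i): A' = E A.
assert np.allclose(E @ A, A_prime)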
Lectures 2 and 3 (28/9/2020, 30/9/2020) :
Remark. Let A be an m×n matrix over F . Let TA : F n → F m be the linear transformation given
by A. As elementary matrices generate the general linear group, A ∼r A′ (respectively A ∼c A′) iff
there exists an automorphism φ of F m (respectively ψ of F n ) such that TA′ = φ ◦ TA (respectively
TA′ = TA ◦ ψ).
We have been doing row reductions to find the set of solutions of a system of linear equations
all our life. The following proposition shows that it is a legitimate operation.
Proposition. If a system of linear equations Ax = b is transformed into A′x = b′ by a finite
sequence of elementary row operations (applied to the augmented matrix), then the two systems
have the same set of solutions.
Proof. By induction, it suffices to show that the solution set remains invariant under a single
elementary row operation. After applying an elementary row operation, let A′x = b′ be the new
system of linear equations. Then there exists an elementary matrix I′ such that A′ = I′A and
b′ = I′b. Since I′ ∈ GLm (F ), it follows that for an element ξ = (ξ1 , ..., ξn ) ∈ F n , Aξ = b iff
A′ξ = b′.
Alternatively, ξ := (ξ1 , ..., ξn ) ∈ F n is a solution of Ax = b iff Σ_i ξi Ai = b, where A1 , ..., An ∈ F m
are the column vectors of A. Since every row operation of A induces an automorphism of F m , say
θ, Σ_i ξi Ai = b iff Σ_i ξi θ(Ai ) = θ(b).
If Ax = b is a system of m linear equations in n-variables, let Sol (Ax = b) denote the set
of its solutions. Then we know that for each ξ = (ξ1 , ..., ξn ) ∈ Sol (Ax = b), Sol (Ax = b) =
ξ + Sol (Ax = 0). Therefore homogeneous linear equations play a special role in solving linear
equations. Observe that the solution of a system of homogeneous linear equations Ax = 0 which
is nothing but the kernel of TA , depends only on the row space of A and not on the individual row
vectors. Therefore row operations, which do not change the row space, also do not change the solu-
tion set of a system of homogeneous linear equations. In fact, if A ∈ Mm×n (F ) and B ∈ Mr×n (F )
are two matrices having the same row space then Ax = 0 and Bx = 0 have the same set of solutions.
Definition. Let A be an m × n matrix over F . Then the rank of A is defined to be the largest
non-negative integer r such that A contains an r × r sub-matrix whose determinant is nonzero.
Note that the rank of A cannot be more than either m or n, and it remains invariant under any
field extension of F .
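Over R the rank can also be checked numerically; a minimal sketch with numpy (the 3 × 4 matrix is a hypothetical example), illustrating that the rank is at most min{m, n} and that A and A^t have the same rank:

import numpy as np

A = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],    # = 2 * first row, so it adds nothing to the rank
              [1., 0., 1., 0.]])

r = np.linalg.matrix_rank(A)             # computed via SVD; here r = 2
assert r <= min(A.shape)
assert np.linalg.matrix_rank(A.T) == r   # A and its transpose have the same rank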
Proposition. Let E/F be a field extension and let Ax = b be a system of linear equations with
coefficients in F (and b ∈ F m ). (i) If the system has a solution over E, then it has a solution over F .
(ii) The homogeneous system Ax = 0 has a non-trivial solution over F iff it has one over E.
Proof. (i) Let Ax = b be a system of m linear equations in n variables. Let (ξ1 , ..., ξn ) ∈ E n
be a solution of this system. Then we can find an F -linearly independent sequence of elements
e0 := 1, e1 , ..., er ∈ E such that each ξi can be written as an F -linear combination of ej ’s. Now
looking at each equation at a time and expanding every ξi as an F -linear combination of ej ’s, one
can easily see that the coefficient of e0 = 1 gives a solution of the system over F .
Alternatively, let A1 , ..., An ∈ F m ⊆ E m be the column vectors of A. Then Ax = b has a solution
over F iff b ∈ F m is contained in the F -column space of A iff the coefficient matrix A and the
augmented matrix (A|b) have the same rank (equivalently, the same column rank). Similarly, Ax = b has a
solution over E iff b ∈ E m is contained in the E-column space of A iff the coefficient matrix A and
the augmented matrix (A|b) have the same rank. But the rank of a matrix does not change under
field extensions!
(ii) Whether a system of homogeneous linear equations has a non-trivial solution depends only on the rank of the coefficient
matrix. But the rank of A does not depend on whether we treat it as an element of Mm×n (F ) or
as an element of Mm×n (E).
Note that the proof crucially uses the equality of rank and column rank which we’ll prove in the
next class.
Lemma. Let V, W be vector spaces of dimension n, m respectively. If T : V → W is
a linear transformation of rank r then there exists an ordered basis x1 , ..., xn of V such that
T (V ) = < T (x1 ), ..., T (xr ) > and xr+1 , ..., xn ∈ Ker T .
Proposition. Let A ∈ Mm×n (F ). Then there exists a non-negative integer r such that by
applying elementary row and column operations on A we can turn it into a matrix
which has an r × r identity matrix at the top left and zero everywhere else. Equivalently, there
exist invertible matrices P ∈ GLm (F ) and Q ∈ GLn (F ) such that P AQ is of the desired form. As the
row rank and the column rank of A remain invariant under both row and column operations, it
implies that the row rank of A is equal to the column rank of A, both being equal to r.
Corollary. Let A be an m × n matrix over F . Then the rank of A is equal to its row rank and
column rank.
Proof. We have already proved that the row rank and column rank of A are equal.
If A has rank r, then there exists an r × r sub-matrix A′ of A such that det A′ ≠ 0. Therefore
the row rank of A′ is equal to r. Let R1 , ..., Rr be the rows of A appearing in A′. If we denote
the rows of A′ by R′1 , ..., R′r then there exists a certain projection π : F n → F r under which each Ri
is mapped to R′i . As R′1 , ..., R′r are linearly independent, so must be R1 , ..., Rr , implying that the
row rank of A is at least r.
Conversely, suppose R1 , ..., Rs are linearly independent rows of A. First look at the s × n sub-matrix
of A obtained by deleting the other rows. Since this sub-matrix has row rank s, it must have s
linearly independent columns. Deleting the other columns we get an s × s invertible sub-matrix of
A, which implies that the rank of A cannot be smaller than the row rank of A. Together, we get
the result.
Remarks.
1. If A ∈ Mm×n (F ) then A and At have the same rank, row rank and column rank. In the next
class, we’ll see that they are all equal.
2. If we define an equivalence relation on Mm×n (F ) by saying that A is equivalent to B if there
exist invertible matrices P ∈ GLm (F ) and Q ∈ GLn (F ) such that B = P AQ, then there
exist exactly min {m, n} + 1 different equivalence classes - the number of different possible
ranks of such matrices. So, under this equivalence relation, two matrices are equivalent iff
they have the same rank.
3. Let A be an m × n matrix over F . Then Ax = b has a solution for all b ∈ F m iff the column
rank of A is equal to m. In particular m ≤ n. And if m = n, then A must be invertible.
If m < n, then Ax = 0 has a non-trivial solution.
4. Let E/F be a field extension. Then x1 , ..., xr ∈ F n ⊆ E n are linearly independent over F iff
they are linearly independent over E. One can see this by noting that the rank of a matrix
does not change under field extensions. Another approach is to extend this set to a
basis of F n which is then also a basis of E n .
The geometry behind the elimination of variables : The main philosophy behind the
elimination of variables is that a system of linear equations in n variables is ‘easier to solve’ than
a system of linear equations in (n + 1) variables.
We concentrate on a system of homogeneous linear equations Ax = 0 to illustrate this idea. Af-
ter re-arranging the variables, if required, let Σ_i ai xi = 0 be a linear equation of this system
with a1 ≠ 0. Then the hyperplane H defined by the equation x1 = −a1 −1 (Σ_{j>1} aj xj ) con-
tains all solutions of this system. Note that the natural projection π1 : k n → k n−1 given by
(α1 , ..., αn ) ↦ (α2 , ..., αn ) induces an isomorphism between H and k n−1 . This allows us to view
the set of solutions as a subset of k n−1 and the original system of equations reduces to a system
of equations in (n − 1) variables which, in principle, is easier to solve.
The following ‘well-known’ fact which we are not going to prove will be used time and again.
There exists a monoid homomorphism det : Mn (F ) → F such that a matrix A is invertible iff
det (A) ≠ 0. The set of all invertible matrices, which forms a group under matrix multiplication,
is called the general linear group of degree n and denoted by GLn (F ).
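The multiplicativity of det and the invertibility criterion can be sanity-checked numerically; a small Python/numpy sketch (the 2 × 2 matrices are a hypothetical example):

import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[0., 1.],
              [1., 1.]])

# det is a monoid homomorphism: det(AB) = det(A) det(B).
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# A is invertible iff det(A) != 0; here det(A) = -2, so the inverse exists.
assert not np.isclose(np.linalg.det(A), 0.0)
assert np.allclose(A @ np.linalg.inv(A), np.eye(2))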
Vector Spaces
A ring, for us, will always mean a ring with unity. Usually they’re also assumed to be commutative
unless, of course, we are dealing with the rings of matrices or linear operators. If R is a ring then
there exists a unique ring homomorphism Z → R and the non-negative generator of the kernel
of this map is called the characteristic of R, denoted by ch R. Note that, if R is an integral domain
then ch R is either zero or a prime number.
If (V, +) is an abelian group, then a scalar multiplication · : F × V → V is ‘same as’ a ring
homomorphism φ : F → EndZ (V ). If a scalar multiplication is given, we can define a ring homo-
morphism φ : F → EndZ (V ) which sends an element a ∈ F to φ(a), defined by φ(a)(v) := a · v
for all v ∈ V . Conversely, given a ring homomorphism φ : F → EndZ (V ), we can define a scalar
multiplication by a · v := φ(a)(v). Clearly, this is a one-to-one correspondence.
Examples.
1. 0, F and F n for any non-negative integer n.
2. Let I be a set. Then F I , the set of all functions from I to F is a vector space over F under
point-wise addition and scalar multiplication.
This is a special case of a more general phenomenon. If X is a set and G is an algebraic
structure then GX , under point-wise operations, tend to reflect the algebraic structure of G.
F (I) , the set of all functions from I to F which takes nonzero values only at finitely many
elements of I, is a subspace of F I . Note that F I = F (I) iff I is a finite set.
We’ll later see that vector spaces ‘look like’ F (I) whereas dual spaces ‘look like’ F I . In fact
(F (I) )∗ = F I .
3. The set of all m × n matrices over a field F .
4. If E/F is a field extension, then E is automatically a vector space over F .
5. F [X] = ⋃_{n≥0} F n = F (N) .
which contains W .
The set of all linear combinations of a sequence (x1 , ..., xn ) ∈ V n , i.e, the set {λ1 x1 + ... +
λn xn | λ1 , ..., λn ∈ F }, is called the linear span of (x1 , ..., xn ). Note that the linear span of the
sequence (x1 , ..., xn ) is equal to the image of the homomorphism φ : F n → V , where φ(ei ) := xi
for all i; and rearranging the elements of a sequence does not change its linear span.
If S is a subset of V , then the linear span of S, denoted by < S >, is defined to be the union of the
linear spans of all finite sequences of elements of S. Note that < S > = ⋂_{S⊆W ≤V} W , the
intersection running over all subspaces W of V that contain S.
A finite sequence of elements (x1 , ..., xn ) ∈ V n is said to be linearly independent if the natural
homomorphism φ : F n → V , with φ(ei ) := xi for all i, is one-to-one. Note that a linearly independent
sequence of vectors retains its property under any rearrangement of the vectors. A set S ⊆ V is
said to be linearly independent if all finite sequences of distinct elements of S are linearly indepen-
dent.
A vector space V is said to be finitely generated if there exists a finite set S ⊆ V such that
V =< S >.
Operations on subspaces of V .
1. Union: If V1 , V2 ≤ V , then V1 ∪ V2 is a subspace of V iff either V1 ⊆ V2 or V2 ⊆ V1 .
A partially ordered set (Γ, ≤) is said to be directed (upwards) if for any two elements i, j ∈ Γ
there exists an element l ∈ Γ satisfying i ≤ l and j ≤ l. A family of subspaces {Vi } is said to be
a directed system if it forms a directed set under the inclusion relation. The corresponding
union ∪i Vi may then be called a directed union. Note that a directed union of subspaces is
always a subspace.
2. Sum : If V1 , V2 ≤ V then we define their sum as V1 + V2 := {x + y | x ∈ V1 and y ∈ V2 }.
Note that V1 + V2 is a subspace of V . Inductively, we can define a sum of finitely many
subspaces of V . Since addition is a commutative operation in V , the order of the summands
does not change the sum of subspaces. If {Vi }i∈I is a family of subspaces (finite or infinite!)
of V , then the sum of this family, denoted by Σ_i Vi , is defined to be the directed union of
the sums of all finite subfamilies of {Vi }. If I is a finite set then this definition matches with
the definition of the sum of finitely many subspaces. Note that

    Σ_i Vi = ⋂_{∪i Vi ⊆ W ≤ V} W = < ∪i Vi >,

the intersection running over all subspaces W of V containing ∪i Vi .
To see that the inclusion ∪i Vi ⊆ Σ_i Vi can be strict, just look at three distinct lines passing through
the origin in R2 .
Remarks.
3. Any vector space V is a directed union of its finite dimensional subspaces. It’s also a union
of its one dimensional subspaces, although this union is not directed.
4. If V is finitely generated and S ⊆ V is a spanning set then there exists a finite set S0 ⊆ S
such that V =< S0 >.
5. Let S ⊆ V be a linearly independent set. Then for all x ∈ V , S ∪ {x} is linearly independent
iff x ∉ < S >.
Theorem. Let S, T be subsets of a vector space V . If < S >= V and T is linearly independent,
then |T | ≤ |S|.
Exercises.
1. HK : Section 1.6 - 9,12; Section 2.1 - 4,6,7; Section 2.2 - 1,2,5,8 (you only need ch F ≠ 2);
2. Prove that the ring of upper (or lower) triangular n × n matrices is not commutative for all
n ≥ 2.
3. If F is a field prove that both the groups (F, +) and (F ∗ , ·) can be embedded in SL2 (F ).
4. Let Sn , An be the symmetric group and the alternating group of degree n respectively. Prove
that for any commutative ring R, Sn and An can be embedded in GLn (R) and SLn (R)
respectively.
5. Let m > n be positive integers. Show that for all A ∈ Mm×n (F ), B ∈ Mn×m (F ), the product
AB ∈ Mm (F ) is not invertible.
If m ≤ n, can you always find A ∈ Mm×n (F ), B ∈ Mn×m (F ) such that AB = Im ?
6. Let R be a commutative ring with unity. If I is an ideal of R satisfying I ≠ I^2 then prove
that I, under the induced operations, is a commutative ring without unity.
7. (*)
(i) Let V be the vector space of all real-valued functions defined on R. Show that the set
{e^{αx} }α∈R is linearly independent.
(ii) Let V be the vector space of all real-valued functions defined on the unit interval [0, 1].
Show that the set { 1/(x−c) }c∈R\[0,1] is linearly independent.
8. Let S1 , S2 be linearly independent sets of a vector space V . Prove that the image of S1 is
linearly independent in V / < S2 > iff the image of S2 is linearly independent in V / < S1 >.
Further if S1 ∩S2 = ∅, show that the above conditions are equivalent to S1 ∪S2 being linearly
independent.
9. If F is a finite field, find the number of elements in GLn (F ), SLn (F ) and the set of n × n
elementary matrices.
10. Let A ∈ Mn (Z). For any prime number p, let Ap denote the image of A in Mn (Z/pZ). Prove
that rk A ≥ rk Ap for all p, and the equality holds for all but finitely many p.
11. Let A be an n×n matrix over a field F whose every entry is equal to 1. Show that A^{i+1} = nA^i
for all i ≥ 1. Deduce that A is a nilpotent matrix iff ch F | n.
14. A set X ⊆ V is said to be closed under lines (I don’t think that this terminology is standard,
but the concept is!) if l(x, y) ⊆ X for all x, y ∈ X. Then prove that
(i) A set X ⊆ V is closed under lines iff −X is closed under lines iff v + X is closed under
lines for all v ∈ V .
(ii) If Ax = b is a system of m linear equations in n variables then the set of its solutions
is closed under lines.
(iii) If F (≠?) then a non-empty set X ⊆ V is closed under lines iff there exists a vector
v ∈ V and a subspace W ≤ V such that X = v + W .
15. If V1 , V2 are subspaces of a vector space V over F (≠?), then prove that

    V1 + V2 = ⋃_{x∈V1 , y∈V2} l(x, y).
16. Let Ω be a subset of a vector space V over F (≠?). For any set X ⊆ V , we define

    l(X) := ⋃_{x,y∈X} l(x, y).
If 0 ∈ Ω, then we inductively define a sequence of sets by Ω0 := Ω and Ωi+1 := l(Ωi ) for all
i ≥ 0. Now prove that
(a) Ωi ⊆ Ωi+1 for all i.
(b) If Ω ⊆ Ω0 then Ωi ⊆ Ω0i for all i.
(c) If dim < Ω >= n < ∞, then there exists a finite set Γ ⊆ Ω, consisting of n + 1 points
which includes the origin, such that Γn = Ωn =< Ω >.
In general, show that < Ω >= ∪i Ωi .
Lecture 4 (9/10/2020) :
We will continue with the proof of the theorem that ‘If S, T ⊆ V are such that T is linearly inde-
pendent and S is a spanning set then |T | ≤ |S|.’
We have already proved it in the case when S is finite. The ‘proposed proof’ when S is infinite
doesn’t work because if {Ti }i∈N is an increasing sequence of linearly independent sets in V with
injective maps φi : Ti → S such that V = < (S \ φi (Ti )) ∪ Ti > for all i, still (S \ φ̃(∪i Ti )) ∪ (∪i Ti )
may not span V .
For example, let V := F [X], S := {1, 1 + X, 1 + X + X 2 , ...} and T := {X, X 2 , X 3 , ...}. Let Sn , Tn
be the set of first n elements of S, T respectively (strictly speaking, ‘first n elements of a set’ doesn’t
make any sense, but you know what I mean!). Then (S \ Sn ) ∪ Tn is a spanning set of V for all n
but, as you can see, V ≠ < T >.
Lemma. Let X be an infinite set. Then F(X), the set of all finite subsets of X, has the same
cardinality as X.
If S is an infinite spanning set of V , by the above lemma, the elements of F(S) can be indexed
by S, say F(S) = {Si }i∈S . For each i ∈ S, let Vi :=< Si >. Then {Vi }i∈S is a collection of finite
dimensional subspaces of V indexed by S and V = ∪i Vi . If Ti := T ∩ Vi , then T = ∪i Ti . Note that
each Ti is a finite set since |Ti | ≤ |Si | for every i. Now the following lemma, whose proof we leave
as an exercise, finishes the proof of the theorem.
Lemma. Let Ω be an infinite set and X any set. Let {Xi }i∈Ω be a collection of subsets of
X. If each Xi is at most countable, then | ⋃_{i∈Ω} Xi | ≤ |Ω|.
Theorem. Every vector space has a basis.
Proof. We will show that every vector space V contains a maximal linearly independent set
with respect to set inclusion. If V is finitely generated, we can inductively construct such a set.
Otherwise, we have to apply Zorn’s lemma which says that ‘If every chain of a partially ordered
set (P, ≤) has an upper bound then P has a maximal element.’. Now consider the collection of all
linearly independent subsets of V . This is a partially ordered set under set inclusion. If {Ti } is
a chain in this family then ∪i Ti is also linearly independent (in fact, more generally, the union of
any directed family of linearly independent sets is linearly independent), which serves as an upper
bound of the chain. So by Zorn’s lemma, V contains a maximal linearly independent set, which is a
basis: if it did not span V , adjoining any vector outside its span would give a strictly larger linearly
independent set, contradicting maximality.
Question. Can you use Zorn’s lemma to conclude the existence of a minimal generating set?
Remark.
1. Any two bases of a vector space V have the same cardinality. The cardinality of any (and
hence, all) basis of a vector space V is called its dimension and denoted by dimF V , or simply
by dim V if F is understood from the context.
2. If V, W are two vector spaces over a field F then V ≅ W iff dim V = dim W . So once
the base field F is fixed, vector spaces over F , up to isomorphism, are ‘nothing but cardinal
numbers’.
3. Let V, W be two vector spaces over a field F . If there exist injective (respectively surjective)
linear maps φ : V → W and ψ : W → V then V ≅ W . This is analogous to the Schroeder-
Bernstein theorem for sets; and the proof, perhaps not surprisingly, uses the Schroeder-
Bernstein theorem to conclude that dim V = dim W .
4. If I is a set, then the canonical vector space over F with I as a basis is F (I) . In fact, given any
vector space V , there exists a natural one-to-one correspondence between the set maps from
I to V and the vector space homomorphisms from F (I) to V . So F (I) ‘converts’ set-theoretic
maps into vector space homomorphisms.
5. Any linearly independent set of V can be extended to a basis of V .
Definitions. Let {Vα }α∈Γ be a family of vector spaces over F . Then the direct product of
the family, denoted by ∏_{α∈Γ} Vα , is defined to be the set of all functions f : Γ → ∪α Vα such that
f (α) ∈ Vα for all α ∈ Γ. Then pointwise addition and scalar multiplication makes it a vector space
over F .
The (external) direct sum of the family {Vα }α∈Γ , denoted by ⊕_{α∈Γ} Vα , is defined to be the set of
all functions f : Γ → ∪α Vα such that f (α) ∈ Vα for all α ∈ Γ and f (α) ≠ 0 for at most finitely many
α. Then pointwise addition and scalar multiplication makes ⊕_{α∈Γ} Vα a vector space over F .
It is clear from the definitions that ⊕_{α∈Γ} Vα is a subspace of ∏_{α∈Γ} Vα and ⊕_{α∈Γ} Vα = ∏_{α∈Γ} Vα
iff Γ is a finite set.
Remarks.
1. For each α ∈ Γ, there exists a natural inclusion ια : Vα ↪ ⊕_{α∈Γ} Vα , where ια (xα ) is defined
to be the function fxα : Γ → ∪α Vα which takes the value xα at α and 0 everywhere else.
Similarly, for each α ∈ Γ, there exists a natural projection πα : ⊕_{α∈Γ} Vα → Vα , where
f ∈ ⊕_{α∈Γ} Vα is mapped to f (α) ∈ Vα .
It is clear that πα ◦ ια = idVα and πβ ◦ ια = 0 whenever α ≠ β.
Also, one can check that Σ_{α∈Γ} ια ◦ πα is the identity map of ⊕_{α∈Γ} Vα (the possibly infinite sum of linear
operators makes sense because when it is applied to any particular element of ⊕_{α∈Γ} Vα , only
finitely many summands are nonzero). In particular, if we identify Vα with its image ια (Vα ),
then ⊕_{α∈Γ} Vα = < ∪α Vα >. Under this identification, if Bα is a basis of Vα for each α ∈ Γ
then B := ∪α Bα is a basis of ⊕_{α∈Γ} Vα .
Lemma. If V1 , V2 are two subspaces of a vector space V then (V1 + V2 )/V1 ≅ V2 /(V1 ∩ V2 ).
Consequently, if V1 , V2 are finite dimensional then dim (V1 + V2 ) = dim V1 + dim V2 −
dim (V1 ∩ V2 ).
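The dimension formula can be tested numerically by representing V1, V2 as column spaces; a sketch using numpy/scipy (the matrices and the shared direction are a hypothetical example; it assumes A and B have full column rank, so that the null space of [A | −B] has the same dimension as V1 ∩ V2):

import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))                      # V1 = column space of A
B = np.column_stack([A[:, 0] + A[:, 1],              # one direction shared with V1
                     rng.standard_normal(5)])        # V2 = column space of B

dim_V1 = np.linalg.matrix_rank(A)                    # 3
dim_V2 = np.linalg.matrix_rank(B)                    # 2
dim_sum = np.linalg.matrix_rank(np.hstack([A, B]))   # dim(V1 + V2)

# (x, y) lies in the null space of [A | -B] exactly when Ax = By, a vector of V1 ∩ V2.
dim_int = null_space(np.hstack([A, -B])).shape[1]    # dim(V1 ∩ V2) = 1 here

assert dim_sum == dim_V1 + dim_V2 - dim_int          # 4 == 3 + 2 - 1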
Definition. Let R be a commutative ring. By an R-algebra A, we mean an ordered pair
(A, φA : R → A) where A is a ring (not necessarily commutative) and φA is a ring homomorphism,
called the structure-homomorphism of the R-algebra A, such that φA (R) is contained in the center
of A. Note that φA need not be injective. We will often simply say that A is an R-algebra when
the structure homomorphism φA is understood from the context.
As for examples of R-algebras, any commutative ring R is trivially an algebra over itself with the
identity map giving the structure homomorphism. A standard example of an R-algebra is the
polynomial algebra R[X]. We will mostly be interested in R-algebras where R = F is a field so
that, whenever A 6= 0, we can actually identify F as a subring of A contained in the center. For us,
the most important F -algebras will be the matrix rings Mn (F ) and the rings of linear operators
of vector spaces V over F .
By an R-algebra homomorphism between two R-algebras (A, φA : R → A) and (B, φB : R → B)
we mean a ring homomorphism f : A → B such that f ◦ φA = φB , i.e., the triangle formed by
φA , φB and f commutes. Loosely speaking, an R-algebra homomorphism from A to B is a ring homomor-
phism from A to B which does not ‘disturb’ the elements of R.
Exercises.
1. HK : Section 2.3 - 7,8,9 (ch F ≠ 2 suffices),11,13; 2.4 - 6,7.
2. Prove that a vector space V over an infinite field F cannot be written as a finite union of its
proper subspaces.
If V is a vector space of dimension > 1 over a finite field F , show that V can be written as
a union of q + 1 proper subspaces, where |F | = q.
3. (*) Let F be a field and F (X) := Q(F [X]), the field of rational functions in one variable over
F . Show that dimF F (X) = max {ℵ0 , |F |}, where ℵ0 = |N|.
Hint. Can you prove that the set { 1/(X−c) }c∈F is linearly independent over F ?
4. Let V1 , V2 be two subspaces of a vector space V . Let B12 be a basis of V1 ∩ V2 . If B1 ⊆ V1
and B2 ⊆ V2 are two sets such that their images are bases of V1 /(V1 ∩ V2 ) and V2 /(V1 ∩ V2 )
respectively, then prove that B1 ∪ B2 ∪ B12 is a basis of V1 + V2 .
5. Prove that Rm is isomorphic to Rn as abelian groups for any two positive integers m and n.
(*) Also, prove that R ≅ R^N as abelian groups.
Hint. Can you view the above groups as suitable vector spaces?
6. Let E/F be a finite field extension, i.e., E/F is a field extension such that E is a finite
dimensional vector space over F . If V is a finite dimensional vector space over E, then prove
that
dimF V = [E : F ] · dimE V,
where [E : F ] := dimF E.
Hint. If {λi } is a basis of E over F and {ej } is a basis of V over E, then {λi ej } is a basis of
V over F .
7. Let I be a set and E/F a field extension. Then there exists a natural inclusion of F -vector
spaces F (I) ↪ E (I) . If S ⊆ F (I) is a linearly independent set over F , show that S ⊆ E (I)
remains linearly independent over E.
Let E/F be a field extension and V a vector space over E. Then by restriction of scalars,
we can view V as a vector space over F . If S ⊆ V is linearly independent over F does it
remain linearly independent over E?
8. (*) If X is a metric space (or topological space, if you know what it is!), let C(X, F ) denote
the set of all continuous functions from X to F , where F := R/C. If X is homeomorphic to
Y , then prove that C(X, F ) and C(Y, F ) are isomorphic as F -algebras.
9. (*)
i) Let V := C(R, R), the set of all real-valued continuous functions defined on R. Prove
that the set {e^{cx} }c∈R ⊆ V is linearly independent over R.
Deduce that V has uncountable dimension over R; in fact, dimR V = ℵ1 , where ℵ1 := |R|.
ii) Let V := C([0, 1], R), the set of all real-valued continuous functions defined on the closed
unit interval [0, 1]. Then prove that dimR V = ℵ1 .
Note that the natural restriction gives a surjective linear map from C(R, R) to C([0, 1], R),
implying that dimR C([0, 1], R) ≤ dimR C(R, R).
Hint. You may use the ideas of the previous exercises to prove (ii). But given a countable
set of continuous functions S from [0, 1] to R, can you actually construct a function
f ∈ C([0, 1], R) such that f ∉ < S >?
10. (*) Let F := R/C and V := {(an ) ∈ F N | |an |^{1/n} → 0 as n → ∞}. Show that V is a vector
space over F and its dimension is uncountable.
Hint. You may prove it using the previous exercises. But given a countably infinite subset
S of V , can you actually construct a sequence (an ) ∈ V such that (an ) ∉ < S >?
Lecture 5 (12/10/2020) :
Co-ordinate system
Let V be a finite dimensional vector space over F . A finite sequence of vectors B := (x1 , ..., xn ) ∈
V n is called an ordered basis of V if the natural linear map φB : F n → V , given by ei ↦ xi for all i,
is an isomorphism. Equivalently, (x1 , ..., xn ) is a linearly independent sequence whose linear span
is the whole of V . If α ∈ V , then the co-ordinate representation of α with respect to B, denoted
by [α]B , is the pre-image of α with respect to φB . If B′ := (x′1 , ..., x′n ) ∈ V n is another ordered
basis of V , let f : V → V be the isomorphism defined by f (xi ) := x′i for all i, and let θf : F n → F n
be the linear map with θf (ei ) := e′i , where e′i := [x′i ]B , the co-ordinate vector of x′i with respect
to B. Then the square formed by θf , f and the co-ordinate maps commutes, i.e., f ◦ φB = φB ◦ θf .
Let α ∈ V and [α]B , [α]B′ be the corresponding co-ordinate representations with respect to the
ordered bases B, B′ respectively. Let P be the n × n invertible matrix whose columns are the
vectors e′1 , ..., e′n . For any α ∈ V , if we write [α]B , [α]B′ as n × 1 column matrices, then
[α]B = P [α]B′ , or equivalently, [α]B′ = P −1 [α]B . Note that the ordered bases of F n are in
one-to-one correspondence with the elements of GLn (F ), the set of n × n invertible matrices over
F . Writing the vectors of an ordered basis of F n as column vectors gives us an invertible matrix.
Conversely, the column vectors of an invertible matrix give an ordered basis of F n . Similarly, if
V is an n-dimensional vector space over F , then the ordered bases of V are in one-to-one corre-
spondence with the elements of GL(V ), the group of invertible linear operators of V .
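A quick numerical illustration of the change of co-ordinates [α]B = P [α]B′ (Python/numpy; B is taken to be the standard basis of R^2, and B′ and α are a hypothetical example):

import numpy as np

# Columns of P are the B-coordinates of the vectors of B' = ((1,1), (1,-1)).
P = np.array([[1., 1.],
              [1., -1.]])

alpha_B = np.array([3., 5.])              # [alpha]_B
alpha_Bp = np.linalg.solve(P, alpha_B)    # [alpha]_B' = P^{-1} [alpha]_B

# Reassembling alpha from its B'-coordinates recovers the same vector.
assert np.allclose(P @ alpha_Bp, alpha_B)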
Linear Transformations
Let V, W be vector spaces over a field F . Then a set-theoretic map T : V → W is called a linear
transformation from V to W if the following diagrams commute: T ◦ +V = +W ◦ (T × T ) and
T ◦ ·V = ·W ◦ (idF × T ), where (T × T )(x, y) := (T x, T y) and (idF × T )(λ, x) := (λ, T x). In other
words, T (x + y) = T (x) + T (y) and T (λx) = λ T (x) for all λ ∈ F and x, y ∈ V .
In short linear transformations are precisely those set-theoretic maps which preserve linear combi-
nations.
Remark. For various algebraic structures like groups, rings, vector spaces etc. morphisms
(homomorphisms, if you like) are defined to be those set-theoretic maps which respect the relevant
structure, and we can represent this property by using various diagrams as above.
A linear operator T : V → V is said to be nilpotent if there exists a positive integer n such that
T n = 0. T is said to be locally nilpotent if for all x ∈ V , there exists a positive integer nx (de-
pending on x) such that T nx (x) = 0. Note that a nilpotent operator is locally nilpotent and the
converse is true if V is finite dimensional. However, a locally nilpotent operator, in general, need
not be nilpotent. For example, if V := R[X] and D : V → V is the usual differential operator, i.e.,
D(f (X)) := f ′ (X), then D is a locally nilpotent operator which is not nilpotent.
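The example of the differential operator can be played with symbolically; a small sketch using sympy (the particular polynomial and the exponent are a hypothetical example):

import sympy as sp

x = sp.symbols('x')

def D(f):
    # the differentiation operator on R[x]
    return sp.diff(f, x)

f = 3*x**4 - x + 2
g = f
for _ in range(5):          # deg f + 1 applications of D annihilate f,
    g = D(g)                # so D is locally nilpotent ...
assert g == 0

# ... but no fixed power of D kills every polynomial: D^n(x^n) = n! is nonzero.
n = 7
assert sp.diff(x**n, x, n) == sp.factorial(n)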
If T is a linear operator on V , then W ≤ V is said to be T -invariant if T (W ) ⊆ W . A linear
operator T is said to be locally finite if for all x ∈ V , the linear span of the set {x, T x, T 2 x, T 3 x, ...}
is finite dimensional. Note that < {x, T x, T 2 x, T 3 x, ...} > is the smallest T -invariant subspace
of V which contains x. Consequently, T is locally finite iff every x ∈ V is contained in a finite
dimensional T -invariant subspace, or equivalently, V is a union of finite-dimensional T -invariant
subspaces. Note that a locally nilpotent operator is automatically locally finite. Also, every linear
operator on a finite-dimensional vector space is locally finite.
Examples.
1. Let V be the set of infinitely differentiable real-valued functions defined on R. Then the
usual differential operator D : V → V is a linear operator. Note that Ker D^n consists of all
polynomials of degree < n and ⋃_{n≥0} Ker D^n = R[X].
Remarks.
1. If F = R/C, then every linear transformation T : F n → F m is continuous.
2. If V, W are vector spaces over F , then LF (V, W ) (or simply L(V, W )), the set of all linear
transformations from V to W , is also a vector space over F . If dim V = n and dim W = m
then dim L(V, W ) = mn.
If V = W , we usually denote L(V, V ) by just L(V ) and this is an F -algebra.
3. If T, T ′ ∈ L(V, W ) then T = T ′ iff they match on a generating set of V .
4. Let T : V → W be a linear transformation. Then T is injective iff ker T = 0 iff T takes
every linearly independent set of V to a linearly independent set of W iff there exists a linear
transformation T 0 : W → V such that T 0 ◦ T = idV .
T is surjective iff coker T = 0 iff T takes every spanning set of V to a spanning set of W iff
there exists a linear transformation T 0 : W → V such that T ◦ T 0 = idW .
T is bijective iff T takes a (respectively every) basis of V to a basis of W iff there exists a linear
transformation T 0 : W → V such that T 0 ◦ T = idV and T ◦ T 0 = idW .
Note that if there exist linear transformations T 0 , T 00 : W → V such that T 0 ◦ T = idV and
T ◦ T 00 = idW then T 0 = T 00 and T is an isomorphism.
If T is a linear operator on a finite-dimensional vector space V , then by rank-nullity theorem,
T is injective iff it’s surjective iff it’s an isomorphism.
5. Let V be a finite-dimensional vector space over F = R/C with two ordered bases B :=
(x1 , ..., xn ) and B′ := (x′1 , ..., x′n ). If φB and φB′ are the corresponding co-ordinate maps from
F n to V , then we can give two different metrics on V by defining d(x, y) := |φB −1 (x) − φB −1 (y)|
and d′ (x, y) := |φB′ −1 (x) − φB′ −1 (y)|. Then d, d′ are topologically equivalent, i.e., they induce the
same topology on V , or, in other words, a set X ⊆ V is open with respect to d iff it’s open
with respect to d′ . To see this, note that d, d′ are topologically equivalent iff the identity map
from (V, d) to (V, d′ ) is a homeomorphism. Now consider the invertible linear operator T of F n
satisfying φB′ ◦ T = φB ; in fact, T (ei ) = φB′ −1 (xi ) for all i. Note that, by construction,
φB : (F n , | |) → (V, d) and φB′ : (F n , | |) → (V, d′ ) are homeomorphisms. Also, T , being an
invertible linear operator, is a homeomorphism of (F n , | |). Therefore idV : (V, d) → (V, d′ ) is also
a homeomorphism. So if V is a finite-dimensional vector space over F = R/C, then we can give it
a topology which is independent of any particular choice of the co-ordinate system.
Exercises.
(1) HK : Section 3.2 - 1,5,7,8,9,11,12; Section 3.3 - 7.
(2) (*) If T : R2 → R2 is a continuous map such that T n is a linear operator for all n ≥ 2,
is T a linear operator?
(3) Let T : Rn → Rm be a group homomorphism. Show that the following statements are
equivalent.
(i) T is uniformly continuous.
(ii) T is continuous.
(iii) T is continuous at the origin.
(iv) There exists a point x ∈ Rn such that T is continuous at x.
(4) (*)
(i) Let T : Rn → Rm be a map satisfying T (λx) = λT (x) for all λ ∈ R and x ∈ Rn . Is
T continuous?
(ii) If T : Rn → Rm is a continuous map satisfying T (λx) = λT (x) for all λ ∈ R and
x ∈ Rn , is T a linear transformation?
Hint: You may visualize in R2 .
(5) (*) Let T : V → W be a linear transformation. If T 0 : W → V is a set-theoretic map such
that T 0 ◦ T = idV and T ◦ T 0 = idW , then prove that T 0 is also a linear transformation.
Does the same conclusion hold if we only assume that T 0 ◦ T = idV (or T ◦ T 0 = idW )?
Prove that T is an isomorphism iff there exist linear transformations T 0 , T 00 : W → V
such that T 0 ◦ T is an invertible linear operator on V and T ◦ T 00 is an invertible linear
operator on W .
6. Give examples of two linear operators T, T ′ ∈ L(V ) such that T ◦ T ′ = idV but T ′ ◦ T ≠ idV .
7. Let T : V → V be a linear operator.
i) Prove that {ker T n }n≥1 is an increasing and {im T n }n≥1 is a decreasing sequence of
subspaces in V .
ii) If ker T i = ker T i+1 for some positive integer i, then ker T i = ker T i+j for all j ≥ 0.
Similarly, if im T i = im T i+1 for some positive integer i, then im T i = im T i+j for all
j ≥ 0.
If V is finite-dimensional then ker T i = ker T i+1 iff im T i = im T i+1 .
iii) ker T = ker T 2 iff ker T ∩ im T = 0.
iv) If V is a vector space of dimension n, then V = ker T n ⊕ im T n .
v) If V ′ := ⋃i ker T i then V ′ is a T -invariant subspace of V and T |V ′ is a locally nilpotent
operator which is nilpotent iff there exists a positive integer n such that ker T n =
ker T n+1 .
vi) Give an example of a linear operator T : V → V such that ker T i ∩ im T i ≠ 0 for all
i ≥ 1.
8. If V is a vector space of dimension > 1, prove that L(V ) is not a commutative ring.
12. Let V be a vector space and S, T ∈ L(V ) such that ST = T S. Prove that ker T and im T
are S-invariant and vice versa.
13. (*) Let T be a linear operator on an n-dimensional vector space V . Prove that there exists
a T -invariant decomposition V = V1 ⊕ V2 such that T |V1 is nilpotent and T |V2 is invertible
with dim V2 = rk T n so that rk T |V1 = rk T − rk T n .
Can you see how the matrix representation of T looks like with respect to a basis B1 ∪ B2 ,
where B1 is an ordered basis of V1 and B2 is an ordered basis of V2 ?
14. Let F be a field and Pol (F ) the set of all polynomial functions on F , i.e., all functions
F → F of the form x ↦ a0 + a1 x + · · · + an x^n with a0 , ..., an ∈ F .
Show that the natural F -algebra homomorphism from F [X] to Pol (F ) which sends a poly-
nomial to the corresponding polynomial function, is injective iff F is infinite. If F is finite,
can you describe the kernel of this map?
If F is finite, prove that every function from F to F is a polynomial function.
15. Let V, W be vector spaces over a field F and σ : F → F a ring homomorphism. Let
f : V → W be a group homomorphism satisfying f (λv) = σ(λ)f (v) for all λ ∈ F and v ∈ V
(if σ = idF then we get the definition of linear transformation).
Prove that ker f is a subspace of V ; and if σ is surjective then im f is a subspace of W .
Let x1 , ..., xn ∈ V . If f (x1 ), ..., f (xn ) are linearly independent, prove that x1 , ..., xn are
linearly independent. The converse is also true if f is injective and σ is surjective.
If σ is surjective then S ⊆ V is a spanning set of V iff f (S) ⊆ im f is a spanning set of im f .
Lecture 6 (14/10/2020) :
Matrix representation of linear transformations
Let V, W be vector spaces over F of dimension n, m respectively and T : V → W a linear trans-
formation. Let BV := (x1 , ..., xn ) and BW := (y1 , ..., ym ) be ordered bases of V, W respectively.
We want to find the matrix representation of T with respect to these ordered bases. The ma-
trix, denoted by [T ]BV ,BW , is the unique matrix in Mm×n (F ) (viewed as a linear map F n → F m )
making the following square commutative: T ◦ φBV = φBW ◦ [T ]BV ,BW ,
where φBV is the isomorphism which sends the natural basis vectors of F n to xi ’s and φBW is the
isomorphism which sends the natural basis vectors of F m to yj ’s. The (i, j)-th entry of [T ]BV ,BW
is the coefficient of yi in T (xj ).
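Concretely, [T ]BV ,BW can be computed by solving the commuting square T ◦ φBV = φBW ◦ [T ]BV ,BW column by column; a short numpy sketch (the map T and the two bases, given as the columns of BV and BW, are a hypothetical example):

import numpy as np

# T : R^3 -> R^2, written in the standard bases.
T_std = np.array([[1., 0., 2.],
                  [0., 3., 1.]])

BV = np.array([[1., 1., 0.],      # columns: the ordered basis of R^3
               [0., 1., 1.],
               [0., 0., 1.]])
BW = np.array([[2., 1.],          # columns: the ordered basis of R^2
               [1., 1.]])

# T_std @ BV = BW @ M, so M = [T]_{BV,BW} is obtained by solving a linear system.
M = np.linalg.solve(BW, T_std @ BV)

assert np.allclose(T_std @ BV, BW @ M)   # the square commutes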
Remarks.
1. For any fixed pair of ordered bases BV , BW , we get a vector space isomorphism ΦBV ,BW :
L(V, W ) → Mm×n (F ), given by T ↦ [T ]BV ,BW .
If V = W and BV = BW = B, then we denote the matrix corresponding to T by [T ]B , and
the map from L(V ) to Mn (F ), given by T ↦ [T ]B , is an F -algebra isomorphism.
2. If V, W are vector spaces of dimension n, m respectively, the set of all possible isomorphisms
from L(V, W ) to Mm×n (F ) is parametrized by the set of ordered pairs (BV , BW ) where
BV is an ordered basis of V and BW is an ordered basis of W .
In particular, the ordered bases of V parametrize the set of isomorphisms from L(V ) to
Mn (F ).
3. If U, V, W are vector spaces with ordered bases BU , BV , BW and dimension n, m, p respec-
tively, then for any linear transformations T : U → V and T 0 : V → W , [T 0 ◦ T ]BU ,BW =
[T 0 ]BV ,BW [T ]BU ,BV .
And this is why the matrix multiplication rule may appear somewhat peculiar at first glance:
it is designed to ‘reflect’ the composition of two linear transformations.
If T : V → W is an isomorphism, then [T −1 ]BW ,BV = ([T ]BV ,BW )−1 .
For the details, one may look at Hoffman and Kunze, section 3.4.
Change of basis. Let T : V → W be a linear transformation and BV , BW ordered bases of
V, W respectively. Let A := [T ]BV ,BW , so that [T x]BW = A [x]BV for all x ∈ V .
If B′V , B′W is another pair of ordered bases, how can we find A′ := [T ]B′V ,B′W ?
Note that, by change of co-ordinates, we have [x]BV = PV [x]B′V and [T x]BW = PW [T x]B′W ,
where PV is the invertible matrix whose columns are the vectors of B′V represented with respect
to BV , and similarly PW . Therefore we get that

    A′ = PW −1 A PV .
Definition. Two matrices A, B ∈ Mn (F ) are said to be similar, denoted by A ∼ B, if there
exists an invertible matrix P ∈ GLn (F ) such that B = P −1 AP .
Observe that similarity is an equivalence relation. For a matrix A ∈ Mn (F ), if TA : F n → F n is
the corresponding linear operator whose matrix representation with respect to the natural ordered
basis of F n is A, then a similar matrix B = P −1 AP is the matrix representation of TA with respect
to the ordered basis given by the column vectors of P . As a result, for the purpose of discussing
linear operators, similar matrices are often considered ‘same’; and important invariants associated
to matrices, like trace, determinant etc., always take the same value on similar matrices.
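As a quick check of the last sentence, here is a numpy sketch (the matrices A and P are a hypothetical example) verifying that a matrix and a similar matrix have the same trace and determinant:

import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])
P = np.array([[1., 1.],
              [1., 2.]])                 # invertible (det = 1)

B = np.linalg.inv(P) @ A @ P             # B = P^{-1} A P, so A ~ B

assert np.isclose(np.trace(A), np.trace(B))
assert np.isclose(np.linalg.det(A), np.linalg.det(B))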
Lagrange’s interpolation
Lemma. Let λ1 , ..., λn ∈ F be distinct elements. Then the corresponding Vandermonde matrix

    V(λ1 ,...,λn ) :=
    [ 1  λ1  λ1^2  · · ·  λ1^{n−1} ]
    [ 1  λ2  λ2^2  · · ·  λ2^{n−1} ]
    [ ⋮   ⋮    ⋮             ⋮     ]
    [ 1  λn  λn^2  · · ·  λn^{n−1} ]

is invertible.
Proof. Let us consider the system of homogeneous linear equations V(λ1 ,...,λn ) X = 0. If it has
a non-trivial solution, say (c0 , c1 , ..., cn−1 ) ∈ F n , then the polynomial f (X) := c0 + c1 X + ... +
cn−1 X n−1 ∈ F [X] is a nonzero polynomial of degree at most n − 1, and hence has at most n − 1
roots in F . But f (λi ) = 0 for all i = 1, ..., n, a contradiction.
Theorem (Lagrange’s interpolation). Let λ1 , ..., λn ∈ F be distinct elements. Then for all
c1 , ..., cn ∈ F there exists a unique polynomial f (X) ∈ F [X] of degree < n such that f (λi ) = ci for
all i. Here we follow the standard convention where the degree of the zero polynomial is taken to
be either −1 or −∞.
Proof. We’ll give two proofs - one using linear algebra and the other using ring theory.
First proof: Let C be the n × 1 column vector given by (c1 , ..., cn ). As the Vandermonde matrix
V(λ1 ,...,λn ) is invertible, the system of linear equations V(λ1 ,...,λn ) X = C has a unique solution,
say (a0 , ..., an−1 ) ∈ F n , which can be obtained by using Cramer’s rule. Then the polynomial
f (X) := a0 + a1 X + ... + an−1 X n−1 satisfies the given conditions.
Second Proof: First, for each i we’ll construct a polynomial fi of degree < n such that fi (λj ) = δij .
Then f (X) := Σ_i ci fi (X) does the job.
As fi (λj ) = 0 for all j ≠ i, fi (X) must be divisible by the product ∏_{j≠i} (X − λj ). Now the
condition fi (λi ) = 1 forces the polynomial to be

    fi (X) = ∏_{j≠i} (X − λj )/(λi − λj ).
The uniqueness of the polynomial follows from the fact that if f (X), g(X) ∈ F [X] are two poly-
nomials satisfying the given conditions then each λi is a root of the polynomial f (X) − g(X),
which has degree < n and n distinct roots, and hence must be the zero polynomial.
Note that if the condition that the polynomial has degree < n is dropped, then we cannot retain
the uniqueness part of the claim.
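Both proofs translate directly into computations over R; a small numpy sketch (the nodes λi and values ci are a hypothetical example):

import numpy as np
from numpy.polynomial import Polynomial

lam = np.array([0., 1., 2., 4.])          # distinct nodes
c = np.array([1., 3., 2., 5.])            # prescribed values f(lam_i) = c_i

# First proof: solve the Vandermonde system for the coefficients of f.
V = np.vander(lam, increasing=True)       # rows (1, lam_i, lam_i^2, lam_i^3)
a = np.linalg.solve(V, c)
f = Polynomial(a)
assert np.allclose(f(lam), c)

# Second proof: f = sum_i c_i f_i with f_i(lam_j) = delta_ij.
def f_i(i):
    others = np.delete(lam, i)
    return Polynomial.fromroots(others) / np.prod(lam[i] - others)

g = sum(float(c[i]) * f_i(i) for i in range(len(lam)))
assert np.allclose(g(lam), c)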
Schroeder-Bernstein Theorem.
Statement. Let A, B be two sets. If there exist injective maps φ : A → B and ψ : B → A, then
there exists a bijection between A and B.
Axiom of choice. Let {Xi }i∈I be a non-empty family of non-empty sets. Then ∏_{i∈I} Xi ≠ ∅,
where ∏_{i∈I} Xi is the set of all maps f : I → ∪i∈I Xi such that f (i) ∈ Xi for all i.
Lemma. Let A, B be two non-empty sets. Then there exists an injective map φ : A → B iff
there exists a surjective map ψ : B → A.
Proof. Let φ : A → B be an injective map. Let a ∈ A. Then we can define a surjective map
ψ : B → A as follows:

    ψ(b) := φ−1 (b) if b ∈ φ(A), and ψ(b) := a otherwise.
Note that ψ ◦ φ = idA , and if φ is not surjective then ψ is unique iff A is a singleton set.
Conversely, suppose that ψ : B → A is a surjective map. For each a ∈ A, let Fa := ψ −1 (a), the
fiber over a. By axiom of choice, ∏_{a∈A} Fa is non-empty, which gives us an injective map from A
to B. In fact, for any such injective map φ : A → B, ψ ◦ φ = idA .
Corollary. In view of the above lemma, if A, B are two non-empty sets then there exists
a bijection from A to B iff there exist injective (respectively surjective) maps φ : A → B and
ψ : B → A.
Exercises.
1. HK : Section 3.4 - 6,8,9,10,11,12.
2. Let λ1 , ..., λn be n distinct elements in a field F . Show that there exist n elements c1 , ..., cn ∈
F such that there does not exist any polynomial f (X) ∈ F [X] of degree < n − 1 such that
f (λi ) = ci for all i.
3. Let λ1 , ..., λn ∈ F be distinct elements and c1 , ..., cn ∈ F . If g(X) ∈ F [X] is a polynomial
such that g(λi ) 6= 0 for all i, then there exists a polynomial f (X) ∈ F [X], divisible by g(X),
such that f (λi ) = ci for all i.
4. Let T be a linear operator on a vector space V . If T commutes with every invertible operator
on V , prove that T is a scalar operator.
Deduce that if A ∈ Mn (F ) commutes with all elements of SLn (F ) then A = λI, for some
λ ∈ F.
In particular, the center of Mn (F ) consists of precisely the scalar matrices.
Hint: If T : V → V is a linear operator such that T (v) ∈< v > for all v ∈ V , can you show
that T is a scalar operator?
5. (*) For A ∈ Mn (F ), let Sim (A) denote the set of matrices which are similar to A. Prove
that Sim (A) is a singleton set iff A is a scalar matrix.
If A ∈ Mn (F ) is not a scalar matrix, show that there exists a set-theoretic injective map
from F to Sim (A). In particular, |Sim (A)| = |F | whenever F is infinite.
Deduce that there exist uncountably many R-algebra embeddings of C in M2 (R).
6. (*) Let F be a field of characteristic zero and A ∈ Mn (F ) has trace zero. Then prove that A
is similar to a matrix B whose every diagonal entry is equal to zero.
Does the above result hold over a field of positive characteristic?
Lecture 7 (16/10/20) :
Dual spaces.
Definitions. If V is a vector space over F , a linear transformation φ : V → F is sometimes
called a linear functional on V ; and we denote the set of all linear functionals on V by V ∗ , i.e.,
V ∗ := L(V, F ). As we already know, V ∗ is a vector space over F . It’s called the dual space (or
simply, dual) of V .
If B := {xi } is a basis of V , we define a set B∗ := {x∗i : V → F | xi ∈ B} ⊆ V ∗ , where x∗i is
defined as x∗i (xj ) := δij for all xj ∈ B.
Lemma. The set B∗ ⊆ V ∗ is linearly independent.
Proof. If possible, let x1 , ..., xn ∈ B be such that x∗1 , ..., x∗n ∈ V ∗ are linearly dependent, and let
φ := Σ_i λi x∗i = 0 be a non-trivial linear relation. But then 0 = φ(xj ) = λj for all j = 1, ..., n,
implying that λi = 0 for all i, a contradiction.
Theorem. If I is an infinite set then dim F I > dim F (I) . Therefore if V := F (I) then
dim V ∗ > dim V .
Proof. Let V := F (I) with a basis B := {xi }i∈I . Since I is infinite, |I| = |I × N|. So we can
partition B into a family of subsets {Bi }i∈I such that each Bi is a countably infinite set. Let
Vi :=< Bi > for all i. As {Bi }i∈I is a partition of B, it’s easy to see that V = ⊕i Vi . If possible,
let S ⊆ V ∗ be a spanning set of V ∗ with |S| = |I|. Since I is infinite, F(S), the collection of all
finite subsets of S, has the same cardinality as I, allowing us to write F(S) as F(S) = {Si }i∈I .
With each Si being finite and Vi being of infinite dimension, we can find an element φi ∈ Vi∗ such
that φi ∉ < Si |Vi >, the span of the restrictions to Vi of the elements of Si . As V = ⊕i Vi , there
exists a unique φ ∈ V ∗ such that φ|Vi = φi for all i ∈ I. But then φ ∉ ⋃_{i∈I} < Si > = V ∗ , a
contradiction.
Examples.
1. Let X be a metric space and V := C(X, R), the set of all continuous functions from X
to R. Then for each x ∈ X, we can define the evaluation map at x, denoted by evx , as
evx (f ) := f (x) for all f ∈ V . This induces a vector space homomorphism from R(X) to V ∗ .
Can you see that this map is injective?
2. Let V := C([a, b], R), the set of all real valued continuous functions defined on the closed
interval [a, b]. Then the integration operator ∫_a^b : V → R, defined as ∫_a^b f := ∫_a^b f (t) dt, is a
linear functional.
Remarks.
1. To study a mathematical object, it’s a standard practice to focus on a set of ‘nice’ functions
defined on it. If V := F n then the most natural functions on V are the natural projections/co-
ordinate functions and V ∗ is nothing but the linear span of these projections in F V , the set
of all functions from V to F which is a vector space over F in its own right.
2. If V is a vector space with a basis B, then the set-theoretic map B → B∗ with xi ↦
x∗i , induces an injective linear map from V to V ∗ . This is an isomorphism iff V is finite-
dimensional.
However, even if V is finite-dimensional, the isomorphism between V and V ∗ is not ‘natural’,
i.e., it depends on the chosen basis B of V , so the isomorphism is not co-ordinate-free. While
discussing double dual V ∗∗ , we’ll see that for a finite-dimensional vector space V , there’s a
natural isomorphism between V and V ∗∗ which does not depend on any particular choice of
basis of V .
3. If V := F (I) then V ∗ = F I . So in some sense, ‘vector spaces are like direct sums and dual
spaces are like direct products’.
4. If V is finite-dimensional then dim V = dim V ∗ .
5. For a field F , Lagrange’s interpolation tells us that F (F ) ↪ F [X]∗ , where we send an element
of F to the corresponding evaluation map, viz., a ↦ eva , where eva (f (X)) := f (a) for all
f (X) ∈ F [X]. In particular, if F is uncountable like F = R/C, then the dual of F [X] is
not countably generated. Now F [X] is not countably generated even when F is finite or
countable, but that requires a different proof.
6. If x ∈ V then x = 0 iff φ(x) = 0 for all φ ∈ V ∗ .
7. Let E := (e1 , ..., en ) be the natural ordered basis of F n . Then a homogeneous linear equation
Σ_i ai Xi = 0 may be viewed as a linear functional on F n , viz., Σ_i ai e∗i , where e∗i is the i-th
co-ordinate function on F n ; and the solution set of the linear equation is nothing but the
hyperplane defined by this linear functional, i.e., the kernel of this linear functional.
In fact, if V is a finite-dimensional vector space with an ordered basis B := (x1 , ..., xn ), then
by virtue of the commutative square formed by the map ei ↦ e∗i : F n → (F n )∗ (top), the
co-ordinate isomorphisms φB : ei ↦ xi and φB∗ : e∗i ↦ x∗i (the vertical maps), and the map
xi ↦ x∗i : V → V ∗ (bottom),
every linear functional is ‘like’ a homogeneous linear equation. Linear functionals, therefore,
may be viewed as a ‘generalization of homogeneous linear equations’.
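To make Remark 7 concrete: a linear functional on R^n given by a row vector has the corresponding hyperplane as its kernel, which is exactly the solution set of the homogeneous equation. A small numpy/scipy sketch (the coefficients are a hypothetical example):

import numpy as np
from scipy.linalg import null_space

a = np.array([[2., -1., 3.]])      # the functional 2 e1* - e2* + 3 e3*, i.e. 2x1 - x2 + 3x3 = 0

H = null_space(a)                  # a basis of the hyperplane ker(phi) in R^3
assert H.shape[1] == 2             # codimension 1
assert np.allclose(a @ H, 0.0)     # every basis vector solves the equation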
Lemma. Let T : V → W be a linear transformation. We define the transpose of T , denoted
by T t , as a linear transformation T t : W ∗ → V ∗ with T t (φ) := φ ◦ T for all φ ∈ W ∗ . Then T is
injective (respectively surjective) iff T t is surjective (respectively injective).
Proof. Left as an exercise.
Definition. For a subset S ⊆ V , its annihilator is defined as S ◦ := {φ ∈ V ∗ | S ⊆ ker φ}.
Proposition. Let W ≤ V be a subspace. Then (i) the restriction map V ∗ → W ∗ , φ ↦ φ|W , is
surjective with kernel W ◦ (so W ∗ ≅ V ∗ /W ◦ ); and (ii) W ◦ ≅ (V /W )∗ .
Proof. (i) If W ≤ V , we’ve a natural linear map from V ∗ to W ∗ given by φ ↦ φ|W (formally,
this is obtained by composing φ with the natural inclusion ι : W ,→ V ). Note that this map is
surjective with the kernel being W ◦ .
(ii) Consider the natural projection π : V → V /W and a linear functional φ : V → F . There
exists φ̄ : V /W → F with φ̄ ◦ π = φ iff W ⊆ ker φ. Therefore every element of W ◦ induces a
linear functional on V /W ; and this results in an isomorphism W ◦ ≅ (V /W )∗ .
Corollary 1. If φ, ψ ∈ V ∗ then < φ > = < ψ > iff ker φ = ker ψ. So hyperplanes determine
nonzero linear functionals up to nonzero constants.
Corollary 2. If W ≤ V is a subspace of finite codimension r, then W is an intersection of r
hyperplanes of V .
Proof. Replacing V by V /W , we’ve to show that if dim V = r then there exist r hyperplanes
in V , say H1 , ..., Hr , such that ⋂_i Hi = 0. If B = (x1 , ..., xr ) is an ordered basis of V , then we can
define Hi := ker x∗i , so that we have ⋂_i Hi = 0.
Lemma. If H1 , ..., Hm are hyperplanes in V , then H := ⋂_i Hi has codimension ≤ m. It is
equal to m iff the corresponding linear functionals are linearly independent.
Proof. The first assertion follows from the fact that codim (Hi ∩ Hj ) ≤ 2 for all i, j, together
with induction.
Let φ1 , ..., φm ∈ V ∗ be such that Hi = ker φi for all i and let W := ⋂_i Hi . Note that φ1 , ..., φm ∈ W ◦ .
As W ◦ ≅ (V /W )∗ , if codim W < m, then the φi ’s cannot be linearly independent. Conversely, if
some φi is linearly dependent on the other φj ’s, then W = ⋂_{j≠i} Hj , implying that codim W < m.
Corollary. If a matrix A ∈ Mm×n (F ) has row rank r and column rank c, then r = c.
Exercises.
is infinite.
5. (*) Let E/F be a field extension with F being infinite. If A, B ∈ Mn (F ), then prove that A
and B are similar over F iff they are similar over E.
The result is true even without assuming that F is infinite, but the only proof known to me
uses ‘fundamental theorem of modules over a PID’ which you’ll learn in Linear Algebra II.
Hint. Let P ∈ GLn (E) such that AP = P B. Then we can find finitely many elements
λ1 , ..., λr ∈ E, linearly independent over F , such that each entry of P can be written as an
F -linear combination of λ1 , ..., λr . Note that we can then write P as P = λ1 P1 + ... + λr Pr ,
with P1 , ..., Pr ∈ Mn (F ). As the sequence λ1 , ..., λr is linearly independent over F , APi = Pi B
for all i = 1, ..., r. Let R := F [T1 , ..., Tr ] be the polynomial ring in r variables over F and
consider the matrix P̃ := T1 P1 + ... + Tr Pr ∈ Mn (R). Note that det P̃ is a polynomial, say
φ(T1 , ..., Tr ) ∈ F [T1 , ..., Tr ], which is a nonzero polynomial because φ(λ1 , ..., λr ) ≠ 0. Since
F is infinite, we can find (α1 , ..., αr ) ∈ F r such that φ(α1 , ..., αr ) ≠ 0.
Lecture 8 (2/11/20) :
Double dual
Definition. The double dual of a vector space V , denoted by V ∗∗ , is defined to be the dual of V ∗ ,
i.e., V ∗∗ = (V ∗ )∗ := L(V ∗ , F ).
Lemma. If V is a vector space, there exists a natural (co-ordinate free, for our purpose) injec-
tive linear transformation from V to V ∗∗ which is surjective iff V is finite dimensional.
Lemma. Let V be a finite-dimensional vector space. Then every basis of V ∗ is the dual basis of
some basis of V .
Proof. Let B′ := {f1 , ..., fn } be a basis of V ∗ . If (B′ )∗ = {f1∗ , ..., fn∗ } is the dual basis of B′
in V ∗∗ , then the natural isomorphism V ≅ V ∗∗ allows us to get n elements x1 , ..., xn ∈ V such
that Lxi = fi∗ for all i = 1, ..., n. As fi∗ (fj ) = Lxi (fj ) = fj (xi ) = δij for all i, j, we conclude that
B′ is the dual basis of B := {x1 , ..., xn }.
Alternatively, let Hi be the hyperplane defined by fi . Then for each i, Vi := ⋂_{j≠i} Hj is one-
dimensional and Vi ⊄ Hi . So for each i, we can find a vector xi ∈ Vi \ Hi such that fi (xi ) = 1.
Then it is easy to see that B′ is the dual basis of B := {x1 , ..., xn }.
Remarks.
5. If f : V → F is a linear functional then ker f ⊆ f ◦ (after identifying V with its image in
V ∗∗ ), and equality holds iff V is finite-dimensional.
6. If x ∈ V then x = 0 iff f (x) = 0 for all f ∈ V ∗ . Similarly, for f ∈ V ∗ , f = 0 iff f (x) = 0
for all x ∈ V or, in other words, Lx (f ) = 0 for all x ∈ V . The duality relations between V
and V ∗ are especially useful when V is finite-dimensional, because then V can be naturally
identified with V ∗∗ .
7. Let AX = 0 and BX = 0 be two systems of homogeneous linear equations in n variables.
Recall that A is said to be row-equivalent to B iff they have the same row space. Then
A is row-equivalent to B iff the two systems have the same solution set. To see this, let
RA and RB be the row spaces of A, B respectively. If the linear equations are considered as
linear functionals on F n , then the kernel of each linear functional gives the solution set of
the corresponding equation. If the two systems have the same solution set then RA◦ = RB◦ ,
implying that (RA )◦◦ = (RB )◦◦ . As RA , RB can be naturally identified with (RA )◦◦ , (RB )◦◦
respectively, it follows that RA = RB , i.e., A and B are row-equivalent.
Lemma. Let W ≤ V be a subspace and f1 , ..., fr ∈ W ◦ . Then f1 , ..., fr are linearly independent
in V ∗ iff the induced functionals f̄1 , ..., f̄r are linearly independent in (V /W )∗ .
Proof. Note that f1 , ..., fr ∈ W ◦ and f̄i is the image of fi under the natural linear transfor-
mation from W ◦ to (V /W )∗ . Therefore if f1 , ..., fr are linearly dependent, so are f̄1 , ..., f̄r .
Conversely, let λ1 f̄1 + · · · + λr f̄r = 0 be a non-trivial linear relation. If f1 , ..., fr were linearly
independent, then there would exist an element x ∈ V such that (λ1 f1 + · · · + λr fr )(x) ≠ 0. But then
(λ1 f̄1 + · · · + λr f̄r )(x̄) := (λ1 f1 + · · · + λr fr )(x) ≠ 0,
a contradiction.
Lemma. Let V be a vector space and f1 , ..., fr ∈ V ∗ linear functionals on V . Then f1 , ..., fr are
linearly independent iff the linear transformation T : V → F r , given by T (x) := (f1 (x), ..., fr (x))
is surjective.
In particular, f1 , · · · , fr are linearly independent iff there exists an r-dimensional subspace V 0 ≤ V ,
such that f1 |V 0 , ..., fr |V 0 is a basis of (V 0 )∗ .
Proof. If fi = Σ_{j≠i} µj fj then clearly fi (x) = Σ_{j≠i} µj fj (x) for all x ∈ V , and therefore the
map cannot be surjective.
Conversely, suppose that f1 , ..., fr are linearly independent. If Hi := ker fi , then recall that
W := ∩i Hi has codimension r in V . Therefore, for each i, we can find an element xi ∈ ∩_{j≠i} Hj \ W
such that fi (xi ) = 1. Then T (xi ) = ei for every i, and hence T is surjective.
We can take V 0 to be the subspace of V generated by x1 , ..., xr .
Remark. In the above lemma, we saw that if f1 , ..., fr ∈ V ∗ are linearly independent, then
there exist x1 , · · · , xr ∈ V such that fi (xj ) = δij for all i, j. Conversely, it’s also true that if
x1 , · · · , xr ∈ V are linearly independent elements, then there exist f1 , · · · , fr ∈ V ∗ , such that
fi (xj ) = δij for all i, j.
Remark. If V, W are two vector spaces over F , then the transpose map from L(V, W ) to
L(W ∗ , V ∗ ) which sends T to T t , is an injective linear transformation. By the dimension argument,
the map is also surjective if V, W are finite dimensional.
If S : U → V , T : V → W are linear transformations, then (T ◦ S)t = S t ◦ T t .
(i) (im T )◦ = ker T t . In particular, if T is surjective then T t is injective. If V, W are finite-
dimensional, then rk T = rk T t .
The factorization T = T̄ ◦ π, where π : V → V /ker T is the quotient map and T̄ : V /ker T → W
the induced injective map, dualizes to the commutative diagram T t = π t ◦ (T̄ )t : W ∗ → (V /ker T )∗ → V ∗ .
Recall that (V /ker T )∗ maps isomorphically onto (ker T )◦ under π t . As T̄ is injective, by (iii),
(T̄ )t is surjective. Therefore, by the commutativity of the diagram, im T t = (ker T )◦ .
Note that we have proved that T is injective (respectively surjective) iff T t is surjective (respec-
tively injective).
Moreover, the natural maps V → V ∗∗ and W → W ∗∗ fit into a commutative square with T : V → W
on one side and T tt : V ∗∗ → W ∗∗ on the other, where T tt = (T t )t . So while discussing finite-dimensional
vector spaces, for all practical purposes, we usually don't make any distinction between V and V ∗∗ .
Proposition. Let V, W be finite dimensional vector spaces with ordered bases BV := (x1 , · · · , xn )
and BW := (y1 , · · · , ym ) respectively. Then [T t ]B∗W ,B∗V = ([T ]BV ,BW )t .
Remark. The above proposition gives a justification for the usual ‘rules’ of matrix transpose
like (A + B)t = At + B t , (AB)t = B t At etc.
It also gives an alternative proof of the fact that the row rank of a matrix is equal to its column rank.
Exercises.
1. HK : Section 3.6 - 1,2,3 (note that V (S, F ) = (F (S) )∗ ); Section 3.7 - 1,2,3,4,5,6 (we only
need ch F = 0, or at least ch F > n),7,8.
2. Give an example of an injective linear operator T ∈ L(V ) such that T t is not injective.
Also, give an example of a linear operator T ∈ L(V ) which is not injective, but T t is injective.
Lecture 9 and 10 (4/11/2020, 6/11/2020) :
We’ll now study the structure of linear operators.
The simplest linear operators, as all of us would agree, are the ones which can be described by a
single number - the scalar operators. But scalar operators are ‘too special’ in the sense that every
nonzero vector space which is not a field admits ‘a lot of’ non-scalar operators. So in terms of
simplicity, next comes the class of linear operators which are ‘made up’ of scalar operators, the
so-called diagonalizable operators. More precisely, a linear operator T : V → V is diagonalizable
if there exists a T -invariant direct sum decomposition V = ⊕i Vi such that T , restricted to each
Vi , acts as a scalar operator, so that T , in some sense, becomes a direct sum of scalar operators.
In general, diagonalizable linear operators turn out to be the most ‘well-behaved’ linear operators
which can be found ‘in abundance’.
The study of linear operators becomes very difficult if the underlying vector space is not finite-
dimensional. So we’ll mostly restrict ourselves to the finite-dimensional case, but the definitions
will be given in a general set-up unless an assumption on the dimension is mandatory. The follow-
ing questions will broadly guide us in the study of linear operators.
Question 1. Is every linear operator diagonalizable? If not, how do we characterize the class of
linear operators which are diagonalizable?
Question 3. If not every linear operator is diagonalizable, what's the next best class of linear
operators and how do we classify its members?
We will try to answer the above questions in the case when V is a finite-dimensional vector space.
Remarks.
1. A set of nonzero vectors {xi } ⊆ V is linearly independent iff the corresponding collection
of one-dimensional subspaces {< xi >} is linearly independent. Note that, although the
inclusion of 0 makes any family of vectors linearly dependent, the inclusion of the zero
subspace has no effect on the linear independence property of a family of subspaces.
2. Let {Vi } be a family of subspaces of V and ⊕i Vi the external direct sum. Then the natural
map ⊕i Vi → V , defined as (xi ) ↦ Σi xi , is injective iff the family {Vi } is linearly indepen-
dent. Note that the image of this map is Σi Vi . Therefore, for a linearly independent family
of subspaces {Vi }, the external direct sum is ‘same’ as the internal direct sum, i.e., they're
naturally isomorphic. As a consequence, if V = Σi Vi then the external direct sum ⊕i Vi is
isomorphic to V iff {Vi } is linearly independent. Note that {Vi } is linearly independent iff
every finite sub-family of {Vi } is linearly independent.
If {V1 , · · · , Vr } is a set of linearly independent subspaces in a finite dimensional vector space
V (Strictly speaking, ‘linear independence’ is a property of the family and not the individual
subspaces!), then V = V1 ⊕ · · · ⊕ Vr iff dim V = dim V1 + · · · + dim Vr .
3. For every linear operator T : V → V , 0 and V are always T -invariant subspaces. Also, ker T
and im T are T -invariant.
If S : V → V is another linear operator such that ST = T S then both ker T and im T are
S-invariant and vice-versa. In particular, every eigenspace of T is S-invariant and vice-versa.
However, a T -invariant subspace of V need not be S-invariant.
4. Let {Vi } be a family of T -invariant subspaces. Then Σi Vi and ∩i Vi are both T -invariant;
and if {Vi } is a directed family, then ∪i Vi is also T -invariant.
5. If T : V → V is a linear operator then {Vλ }λ∈F , the collection of eigenspaces of T , is a
linearly independent family. Consequently, if V is finite dimensional, T can have at most
finitely many eigenvalues. In fact, if dim V = n, then T cannot have more than n distinct
eigenvalues.
Note that T is diagonalizable iff every vector in V can be written as a sum of eigenvectors of T .
If V is finite dimensional then T is diagonalizable iff there exists an ordered basis B of V such
that [T ]B is a diagonal matrix, and hence the name diagonalizable.
Similarly, a matrix A ∈ Mn (F ) is said to be diagonalizable if it's similar to a diagonal matrix. Note
that A is diagonalizable iff every matrix similar to A is also diagonalizable. Therefore a linear
operator T on a finite-dimensional vector space V is diagonalizable iff its matrix representation
with respect to every ordered basis of V is diagonalizable.
Remarks.
1. If T is a diagonalizable operator, so is f (T ) for every polynomial f ∈ F [X].
2. If S, T ∈ L(V ) are diagonalizable operators then S + T, ST need not be diagonalizable.
However, we’ll later see that they are diagonalizable if ST = T S.
How to find the eigenvalues?
With the above discussion, it’s clear that the eigenvalues play a crucial role in the study of linear
operators. But given a linear operator T : V → V , how to find its eigenvalues?
The situation, in general, is quite hopeless if V is not finite-dimensional because it’s not possible to
check whether T − λI is injective or not for every λ ∈ F . The finite-dimensional case, however, is
remarkably simple to deal with as the following lemma gives us a recipe for finding the eigenvalues
of a linear operator.
Definition. Let R be a commutative ring and A ∈ Mn (R). Then the characteristic polynomial
of A, denoted by chA (X), is defined to be the determinant of the matrix XI − A ∈ Mn (R[X]).
Note that if A and B = P −1 AP are similar matrices in Mn (R) then
chB (X) := det (XI − B) = det (XI − P −1 AP ) = det (P −1 (XI − A)P ) = det (XI − A) = chA (X).
As similar matrices have the same characteristic polynomial, for a linear operator T on a finite-
dimensional vector space V , we can define its characteristic polynomial chT (X) to be the charac-
teristic polynomial of any matrix representation of T .
The reason is the following: for any ring homomorphism φ : R[X] → R and its entrywise extension
φ̃ : Mn (R[X]) → Mn (R), we have det (φ̃(M )) = φ(det M ) for every M ∈ Mn (R[X]); in other words,
the square formed by φ̃, φ and the two determinant maps commutes. Taking M = XI − A and φ to
be evaluation at λ ∈ R gives chA (λ) = det (λI − A).
As a result, we see that if T is a linear operator on a finite dimensional vector space V then the
eigenvalues of T are precisely the roots of its characteristic polynomial chT (X), and hence they’re
also called the characteristic values/characteristic roots of T .
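As a quick illustration of this recipe (a sketch in SymPy; the matrix below is an arbitrary example, not one from the text), the eigenvalues of a matrix can be read off as the roots of its characteristic polynomial:

from sympy import Matrix, symbols, roots

X = symbols('X')
A = Matrix([[2, 1],
            [0, 3]])              # example matrix, chosen only for illustration

ch = A.charpoly(X).as_expr()      # det(X*I - A)
print(ch)                         # X**2 - 5*X + 6
print(roots(ch))                  # {2: 1, 3: 1} -- eigenvalues with multiplicities
print(A.eigenvals())              # the same answer, computed directly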
Remarks.
1. Similarly, we can define the eigenvalues, eigenvectors, eigenspaces etc. of an n × n square
matrix A over F . For example, the eigenvalues of A (in F ) are the roots of chA (X) (in F ),
every solution of the system of homogeneous linear equations (A−λI)X = 0 is an eigenvector
of A, and the λ-eigenspace of A is the set of solutions of this system of equations.
2. The eigenvalues of a matrix/linear operator depend on the base field F . For example,
the 90◦ -rotation of R2 does not have any eigenvalue in R, but it has two eigenvalues in C,
namely ±i. As a result, the corresponding matrix is not diagonalizable over R, but it’s
diagonalizable over C. So while discussing the diagonalizability of a matrix/linear operator
one should always keep the underlying field in mind.
3. If V is an n-dimensional vector space and T ∈ L(V ), then chT (X) has degree n, showing
that T cannot have more than n distinct eigenvalues; and if T has n distinct eigenvalues,
then it’s diagonalizable.
We’re now in a position to take the first stab at diagonalizability of a matrix/linear operator. If
we start with a linear operator T on a finite-dimensional vector space V , then first take a matrix
representation of T to get a matrix A ∈ Mn (F ). Then find the roots of the characteristic polyno-
mial chA (X). If the roots are λ1 , · · · , λr , find the dimensions of ker (A − λi I) for each i. If the
dimensions are d1 , · · · , dr , then T is diagonalizable iff n = d1 + · · · + dr .
If T is diagonalizable, and Bi is an ordered basis of ker (T − λi I), then [T ]B is a diagonal matrix
where B := {B1 , · · · , Br }.
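A sketch of the above procedure (using SymPy; the symmetric matrix is a made-up example, and diagonalize() is a built-in used here only to cross-check the dimension count):

from sympy import Matrix, eye

A = Matrix([[5, 4, 2],
            [4, 5, 2],
            [2, 2, 2]])                            # example matrix; eigenvalues 1, 1, 10
n = A.shape[0]

# d_i = dim ker(A - lambda_i I) for each root lambda_i of chA(X)
dims = {lam: len((A - lam * eye(n)).nullspace()) for lam in A.eigenvals()}
print(dims)                                        # {1: 2, 10: 1}
print("diagonalizable:", sum(dims.values()) == n)  # criterion: n = d1 + ... + dr

P, D = A.diagonalize()                             # columns of P: the bases B1, ..., Br
print(D)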
The above procedure checks diagonalizability of a matrix/linear operator; and in the process,
gives an actual diagonalization whenever it exists. Therefore it’s not an efficient method if one’s
only interested in checking diagonalizability and not in the actual diagonalization. We need to
introduce a few more concepts to give a useful criterion for diagonalizability.
Definitions. Let F be a field and R a finite-dimensional algebra over it. If a ∈ R, then the
annihilator of a (over F ), denoted by ann a, is defined to be the kernel of the F -algebra homomor-
phism from F [X] to R which sends X to a (note that the kernel is nonzero!). Every polynomial in
this kernel is called an annihilating polynomial of a, and the unique monic generator of this kernel
is said to be the minimal polynomial of a which is denoted by mina (X). Sometimes, the minimal
polynomial of a is also referred to as the annihilator of a, but it doesn’t lead to any confusion be-
cause the nonzero monic polynomials of F [X] are in one-to-one correspondence with the nonzero
ideals of F [X].
Remarks.
2. If R is a finite-dimensional algebra over F , then the degree of mina (X) is equal to the
dimension of the F -algebra F [a]. As a result, if E/F is a field extension and A ∈ Mn (F ), then
the minimal polynomial of A doesn’t depend on whether we think of it as a matrix over F
or a matrix over E.
3. Unlike minimal polynomials which can be directly defined for a matrix as well as a linear
operator on a finite-dimensional vector space, characteristic polynomials can only be defined
for a matrix. Because similar matrices have the same characteristic polynomial, we can
extend the definition to linear operators on finite-dimensional vector spaces.
4. Let T be a linear operator on a finite dimensional vector space V and W ≤ V a T -invariant
subspace. If I := ann TW and J := ann T̄W , then IJ ⊆ ann T ⊆ I ∩J, where either inclusion
can be strict. Equivalently, we have
lcm(minTW (X), minT̄W (X)) | minT (X) | minTW (X) · minT̄W (X).
5. Let T be a linear operator on a vector space V of dimension n and W ≤ V a T -invariant
subspace of dimension r. Let B := (x1 , · · · , xr , xr+1 , · · · , xn ) be an ordered basis of V with
B0 := (x1 , · · · , xr ) being an ordered basis of W . Then B00 := (x̄r+1 , · · · , x̄n ) is an ordered
basis of V /W . In this case,
[T ]B = ( [TW ]B0   A
              0     [T̄W ]B00 ),
where A is a certain r × (n − r) matrix over F . Since
det (XIn − [T ]B ) = det (XIr − [TW ]B0 ) · det (XIn−r − [T̄W ]B00 ),
we conclude that chT (X) = chTW (X) · chT̄W (X).
6. Let V = V1 ⊕ · · · ⊕ Vr be a T -invariant direct sum decomposition of a finite dimensional vector
space V and Ti := T |Vi for each i. If Bi is an ordered basis of Vi for each i and B := (B1 , · · · , Br ),
then [T ]B is the block diagonal matrix with diagonal blocks [T1 ]B1 , · · · , [Tr ]Br .
As a result, chT (X) = Π_{i=1}^{r} chTi (X). Also, tr T = Σ_{i=1}^{r} tr Ti and det T = Π_{i=1}^{r} det Ti .
Conversely, let V1 , · · · , Vr be finite dimensional vector spaces and Ti ∈ L(Vi ) for all i; then
we can define a linear operator T := ⊕_{i=1}^{r} Ti : ⊕i Vi → ⊕i Vi , as T ((xi )) := (Ti (xi )).
Then it's easy to see that each Vi is T -invariant with T |Vi = Ti for all i, so
that the above discussion can be applied in this situation.
7. If T is a linear operator on a finite-dimensional vector space V , then for all λ ∈ F , we have
chT −λI (X) = chT (X + λ) and minT −λI (X) = minT (X + λ).
8. A locally nilpotent operator cannot have any nonzero eigenvalue. Therefore a nonzero locally
nilpotent linear operator cannot be diagonalizable. In particular, a nonzero nilpotent matrix
is not diagonalizable.
Exercises.
1. HK : Section 6.2 - 6,10,11,12,13 (*),14,15;
3. (*) Prove that the linear span of n × n nilpotent matrices has dimension n² − 1.
Hint. The 2 × 2 matrix with both rows equal to (1, −1) is nilpotent.
4. Let φ : R → S be a homomorphism of commutative rings. Then φ induces two ring homo-
morphisms
φP : R[X] → S[X] and φM : Mn (R) → Mn (S),
where φP (a0 + a1 X + · · · + an X n ) := φ(a0 ) + φ(a1 )X + · · · + φ(an )X n , and for A ∈ Mn (R),
φM (A) is obtained by applying φ on each entry of A. Now prove that
(i) det φM (A) = φ(det A).
(ii) tr φM (A) = φ(tr A).
(iii) chφM (A) (X) = φP (chA (X)).
(iv) If A, B ∈ Mn (R) then φM (chA (B)) = chφM (A) (φM (B)).
5. Let T : V → V be a linear operator. If null T = r, show that null T m ≤ rm for all m ≥ 1.
If T is a nilpotent operator on an n-dimensional vector space V , prove that T n−1 ≠ 0 iff
rk T = n − 1.
6. Let V be a vector space of dimension n and T : V → V a linear operator. Find the set of all
T -invariant subspaces in the following cases.
(i) T is a scalar operator.
(ii) T has n distinct eigenvalues.
(iii) T is a nilpotent operator such that T n−1 ≠ 0.
7. Let A, B be square matrices of order n ≤ 3 over an algebraically closed field F . If chA (X) =
chB (X) and minA (X) = minB (X), then show that A and B are similar matrices.
What happens if we drop the assumption that n ≤ 3?
Remark. It is a fact that if E/F is a field extension and A, B ∈ Mn (F ), then A and B are
similar over F iff they are similar over E. Taking E to be an algebraic closure of F , one can
therefore drop the assumption that F is algebraically closed.
8. (*) Let f (X) ∈ F [X] be a monic polynomial of degree n and V := F [X]/(f (X)). If T
is the linear operator on V defined as the multiplication by X̄, then prove that chT (X) =
minT (X) = f (X). The matrix representation of T with respect to the basis (1̄, X̄, · · · , X̄ n−1 )
is called the companion matrix of f (X).
Hint. What is the annihilator of the one-dimensional subspace generated by 1̄?
9. Let T be a linear operator on a finite-dimensional vector space V . Then prove that the
following are equivalent.
(i) V = ker T ⊕ im T .
(ii) ker T and im T are linearly independent.
(iii) ker T = ker T 2 .
12. (*) Let T be a linear operator on a vector space V with W being a T -invariant subspace of
V which contains the image of T . If S is a linear operator on W such that STW = TW S
then prove that there exists a linear operator S̃ ∈ L(V ) such that S̃|W = S and S̃T = T S̃ iff
im T is S-invariant.
13. (*) Let T : V → V be a nonzero linear operator which is not surjective. If W is the image of
T and S ∈ L(W ), then show that there exists a linear operator S̃ ∈ L(V ) such that S̃|W = S
and S̃T ≠ T S̃.
If T is a nonzero nilpotent operator on F 2 , show that there exists a linear operator S ∈ L(F 2 )
such that ker f (T ) and im f (T ) are both S-invariant for every polynomial f (X) ∈ F [X], but
ST ≠ T S.
Lecture 11,12 and 13 (9/11/2020, 11/11/2020, 13/11/2020) :
So far, in the study of linear operators, eigenvalues and eigenvectors have played a central role.
Now this approach, although perfectly well-suited for diagonalizable operators and useful in gen-
eral, has its own limitations because everything in V , lying outside the sum of the eigenspaces,
remains ‘invisible’. Firstly, if F is not algebraically closed then some (maybe even all!) of the
eigenvalues of T may ‘escape’, as can be seen in the case of the 90◦ -rotation of R2 . To avoid
this situation, we'll have to work over an algebraically closed field. But even then we may not
have ‘enough’ eigenvectors, as is well-demonstrated in the case of a nonzero nilpotent operator. In
particular, if T is a nilpotent operator on an n-dimensional vector space V such that T n−1 ≠ 0,
then Σ_{λ∈F} Vλ = V0 is just one-dimensional, which hardly tells anything about the structure of T .
To overcome these problems, we’ll now take a more holistic approach in studying linear operators
on a finite-dimensional vector space. But most of our qualitative discussions will apply to every
linear operator which has a nonzero annihilating polynomial. In this approach, we’ll ‘break’ V into
a direct sum of T -invariant subspaces and study the restrictions individually to get a global picture.
The motivation behind this comes from the intuitive idea that the complexity in the structure of
L(V ), in general, ‘should’ come down if the dimension of V drops.
In other words, our present approach will be a ‘top-down’ one where we start with V and then
break it into smaller subspaces, whereas the previous one was ‘bottom-up’ where we started with
the small eigenspaces and then checked whether V can be ‘covered’ by them.
We’ll first prove an extremely useful result - the Cayley-Hamilton theorem. We’ve already seen
that every eigenvalue of a square matrix A is a root of its characteristic polynomial and conversely.
Now the Cayley-Hamilton theorem says much more - that not only the eigenvalues of A are roots
of chA (X), but A itself is a ‘root’ of its characteristic polynomial.
Cayley-Hamilton theorem
Statement. Let R be a commutative ring and A an n × n matrix over R. Then chA (A) = 0.
Proof. We’ll break the proof in several steps. We’ll first prove it over C, the field of complex
numbers, and then show that the general result follows from this.
Step 1. If R is a commutative ring and A ∈ Mn (R) a diagonal matrix then chA (A) = 0.
Step 2. Let A ∈ Mn (F ) be a diagonalizable matrix with B := P −1 AP being a diagonal matrix.
Then chA (A) = chB (A) = chB (P BP −1 ) = P chB (B)P −1 = 0.
Step 3. The set of all diagonalizable matrices over C contains the set of all matrices whose
characteristic polynomials have n distinct roots. If f (X) ∈ C[X] is a monic polynomial of degree
n then the discriminant of f (X), denoted by Disc(f (X)), is defined as (up to sign)
Disc(f (X)) := Π_{i≠j} (ri − rj ),
where r1 , · · · , rn are the roots of f (X). Note that the discriminant of f (X) is nonzero iff f (X) has
n distinct roots. Now it’s a fact that the discriminant of f (X) is a polynomial in the coefficients of
f (X). As the coefficients of the characteristic polynomial of a matrix are polynomials in the entries
of the matrix, it follows that disc : Mn (C) → C, given as disc(A) := Disc(chA (X)), is a polynomial
function in the entries of the matrix A. So we have a polynomial in the n² variables {Xij }1≤i,j≤n ,
say Φ(Xij ) ∈ C[Xij ]1≤i,j≤n , such that the matrices in Mn (C) which have n distinct eigenvalues
are precisely the points of C^{n²} (which can be ‘identified’ with Mn (C)) at which Φ takes a nonzero
value. Now it's a fact, whose proof we leave as an exercise, that if φ(Y1 , · · · , Yr ) ∈ C[Y1 , · · · , Yr ] is a
nonzero polynomial in r-variables then the set
{(α1 , · · · , αr ) ∈ C r | φ(α1 , · · · , αr ) ≠ 0}
is a dense open set in Cr . In particular, the set of diagonalizable matrices is dense in Mn (C).
Therefore, since A ↦ chA (A) is a continuous map on Mn (C), the result follows.
Step 4. Let A be an n × n matrix over R. Now consider the ring R̃ := Z[Xij ]1≤i,j≤n where Xij
are variables over Z. Let à be the generic n × n matrix over Z, i.e., à is the n × n matrix over
R̃ whose ij-th entry is Xij . If we denote the ij-entry of A by aij then there exists a unique ring
homomorphism from R̃ to R, viz., which sends Xij to aij for all i, j, such that under the induced
ring homomorphism from Mn (R̃) to Mn (R), Ã is mapped to A. Therefore it suffices to show that
the generic matrix satisfies its characteristic polynomial. Now chà (Ã) is a matrix whose entries are
polynomials in n² variables over Z. So if some entry of this matrix is a nonzero polynomial then
we can find n² integers, say cij ∈ Z, such that the polynomial, evaluated at the point (cij ) ∈ Z^{n²},
gives a nonzero value. But then the matrix C ∈ Mn (Z), whose ij-th entry is cij , does not satisfy
its characteristic polynomial, a contradiction.
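A quick computational sanity check of the theorem (a sketch with SymPy; the integer matrix is an arbitrary example, and chA(A) is evaluated by Horner's rule):

from sympy import Matrix, symbols, eye, zeros

X = symbols('X')
A = Matrix([[1, 2, 0],
            [3, -1, 4],
            [0, 2, 2]])               # arbitrary example matrix

n = A.shape[0]
coeffs = A.charpoly(X).all_coeffs()   # [1, c_{n-1}, ..., c_0] of chA(X) = det(X*I - A)

chA_of_A = zeros(n, n)
for c in coeffs:                      # Horner: (((I*A + c_{n-1} I)*A + ...)*A + c_0 I)
    chA_of_A = chA_of_A * A + c * eye(n)

print(chA_of_A == zeros(n, n))        # True: A satisfies its own characteristic polynomial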
Lemma. Let T be a linear operator on a finite dimensional vector space V . Then every eigen-
value of T is a root of minT (X) and conversely. Equivalently, the roots of minT (X) are same as
the roots of chT (X), both being equal to the set of eigenvalues of T .
Proof. First, let λ ∈ F be an eigenvalue of T . Then there exists a nonzero vector v ∈ V such
that T (v) = λv. As minT (T ) = 0, in particular, we’ve (minT (T ))(v) = 0. But then minT (λ)·v = 0,
implying that minT (λ) = 0.
Conversely, let λ ∈ F be a root of minT (X). Then we can write minT (X) as minT (X) =
(X − λ)g(X) for some g(X) ∈ F [X]. If λ is not an eigenvalue of T , then T − λI is an in-
vertible linear operator on V , forcing that g(T ) = 0, a contradiction.
Now we'll show not only that the roots of minT (X) and chT (X) are the same, but that they have the
same irreducible factors. But to give a proof of it, we need a preparatory lemma and a result from
field theory.
Lemma. Let E/F be a field extension. Then f (X), g(X) ∈ F [X] are relatively prime in F [X]
iff they’re relatively prime in E[X].
We’ll also use the following fact about field extensions without giving any proof :
Let F be a field. Then there exists a field extension F̄ /F (unique up to F -isomorphism), called
an algebraic closure of F , such that F̄ is algebraically closed and every element of F̄ satisfies a
nonzero polynomial over F , i.e., F̄ /F is an algebraic extension.
Proposition. Let A be an n × n matrix over F . Then minA (X) and chA (X) have the same
irreducible factors.
Proof. As minA (X) | chA (X), every irreducible factor of minA (X) also divides chA (X). Con-
versely, let f (X) be an irreducible factor of chA (X). Recall that the characteristic and the minimal
polynomial of A remain unchanged if we treat A as a matrix over F̄ . Let λ ∈ F̄ be a root of f (X).
Then λ is also a root of minA (X), implying that f (X) and minA (X) are not relatively prime in
F̄ [X], and therefore they are not relatively prime in F [X]. Since f (X) is irreducible, it follows
that f (X) | minA (X).
Theorem. Let T be a linear operator on an n-dimensional vector space V and chT (X) =
Π_{i=1}^{r} fi^{ei}(X) the prime factorization of chT (X) into monic irreducible polynomials. If Vi :=
ker fi^{ei}(T ) for all i = 1, · · · , r then the following assertions hold.
(i) V = V1 ⊕ · · · ⊕ Vr is an internal direct sum decomposition of V into T -invariant subspaces.
(ii) Let Ti := T |Vi . Then for each i, minTi (X) = fi^{e′i}(X) for some positive integer e′i satisfying
1 ≤ e′i ≤ ei . Consequently, minT (X) = Π_{i=1}^{r} fi^{e′i}(X), where 1 ≤ e′i ≤ ei for every i.
(iii) For every i, chTi (X) = fi^{ei}(X). In particular, dim Vi = ei · deg fi .
(iv) Vi = ker fi^{e′i}(T ) = ∪_{d=1}^{∞} ker fi^d (T ).
Proof. To prove (i), apply induction together with the result that if f (X), g(X) ∈ F [X] are
relatively prime polynomials such that (f g)(T ) = 0 then V = ker f (T ) ⊕ ker g(T ).
To prove (ii), first note that fi^{ei}(X) is an annihilating polynomial of Ti . As the minimal polynomial
of Ti is a non-constant divisor of fi^{ei}(X), it follows that it is of the form fi^{e′i}(X) for some positive
integer e′i satisfying 1 ≤ e′i ≤ ei . Therefore, we get that minT (X) = Π_{i=1}^{r} fi^{e′i}(X) where 1 ≤ e′i ≤ ei
for every i.
To prove (iii), note that for each i, the characteristic polynomial of Ti must be a power of fi (X).
As chT (X) = Π_i chTi (X), it follows that chTi (X) = fi^{ei}(X) for all i.
For (iv), it's clear that ker fi^{e′i}(T ) ⊆ Vi ⊆ ∪_{d=1}^{∞} ker fi^d (T ). To prove the other inclusion, let n ≥ e′i .
We'll show that ker fi^n (T ) = ker fi^{e′i}(T ). Let x ∈ ker fi^n (T ). As gcd(fi^n (X), minT (X)) = fi^{e′i}(X),
it follows that fi^{e′i}(T )(x) = 0.
Remarks.
1. The above criterion for diagonalizability gives an alternative proof of the fact that if T is
a diagonalizable linear operator on a finite-dimensional vector space V and W ≤ V is a
T -invariant subspace, then both TW and T̄W are diagonalizable.
2. If A is an n × n matrix over a field F then A − λI is invertible for all but finitely many values
of λ ∈ F . If A is invertible then A−1 ∈ F [A], i.e., A−1 is a polynomial in A over F (a concrete
sketch of this appears after these remarks).
3. Let A ∈ Mn (F ) and minA (X) = Π_{i=1}^{r} fi^{ei}(X) an irreducible decomposition into the powers
of distinct monic irreducible polynomials. Then
F [A] ≅ F [X]/(minA (X)) ≅ Π_{i=1}^{r} F [X]/(fi^{ei}(X)).
In particular, F [A] contains exactly 2^r idempotent elements; and F [A] is a field
iff minA (X) is an irreducible polynomial.
4. If a linear operator T is annihilated by a polynomial which is a product of distinct linear
factors then T is diagonalizable. The converse, however, may fail if V is not finite-dimensional.
Can you give an example?
5. Let A be an n × n matrix over F . If f (X) is a polynomial over F then f (A) is invertible iff
gcd(f (X), minA (X)) = gcd(f (X), chA (X)) = 1.
6. A ∈ Mn (F ) is a nilpotent matrix iff chA (X) = X n .
7. If T : V → V is a linear operator then T and T − cI have the same set of eigenvectors for all
c ∈ F.
8. Let F = R/C. Then the set of all m × n matrices having rank ≤ r is a closed subset of
Mm×n (F ) for every r ≥ 0. In fact, as r increases, we get an increasing chain of closed subsets
of Mm×n (F ), say C0 ⊆ C1 ⊆ · · · , with C0 = {0} and Cl = Mm×n (F ) where l = min{m, n}.
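The assertion in Remark 2 that A−1 ∈ F [A] can be made concrete via the Cayley-Hamilton theorem: if chA (X) = X^n + c_{n−1} X^{n−1} + · · · + c_0 and c_0 ≠ 0 (equivalently, A is invertible), then A (A^{n−1} + c_{n−1} A^{n−2} + · · · + c_1 I) = −c_0 I, so A−1 = −(1/c_0)(A^{n−1} + · · · + c_1 I). A sketch (the 3 × 3 matrix is an arbitrary invertible example):

from sympy import Matrix, symbols, eye, zeros

X = symbols('X')
A = Matrix([[2, 1, 0],
            [0, 1, 1],
            [1, 0, 3]])                 # arbitrary invertible example (det = 7)

n = A.shape[0]
c = A.charpoly(X).all_coeffs()          # [1, c_{n-1}, ..., c_1, c_0]

q_of_A = zeros(n, n)                    # q(A) = A^{n-1} + c_{n-1} A^{n-2} + ... + c_1 I
for coeff in c[:-1]:                    # Horner on [1, c_{n-1}, ..., c_1]
    q_of_A = q_of_A * A + coeff * eye(n)

A_inv = -q_of_A / c[-1]                 # A^{-1} = -q(A)/c_0, a polynomial in A
print(A_inv == A.inv())                 # True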
We’ll end our discussion on diagonalizable operators by giving another criterion for diagonaliz-
ability.
Lemma. Let T be a linear operator on a finite dimensional vector space V . Then for every
eigenvalue λ of T , the geometric multiplicity of λ cannot be more than its algebraic multiplicity.
Proof. Let chT (X) = (X − λ)^r g(X), with g(λ) ≠ 0. Then we've to show that dim Vλ ≤ r.
Now V = ker (T − λI)^r ⊕ ker g(T ) and null (T − λI)^r = r. As Vλ ⊆ ker (T − λI)^r , the inequality
follows.
Note that the algebraic multiplicity of λ is equal to the dimension of the generalized λ-eigenspace
of T whereas the geometric multiplicity of λ is equal to the dimension of the λ-eigenspace of T .
Proposition. Let T be a linear operator on a finite-dimensional vector space V whose characteristic
polynomial splits into linear factors over F . Then T is diagonalizable iff the algebraic and geometric
multiplicities of every eigenvalue of T are equal.
Proof. First, let T be a diagonalizable operator with eigenvalues λ1 , · · · , λr . Then minT (X) =
(X − λ1 ) · · · (X − λr ) and chT (X) = (X − λ1 )^{e1} · · · (X − λr )^{er} with some positive integers e1 , · · · , er .
If Vi is the λi -eigenspace of T then Vi = ker (T − λi I) = ker (T − λi I)^{ei} , implying that dim Vi = ei .
Conversely, let chT (X) = (X − λ1 )^{e1} · · · (X − λr )^{er} . Then we can write V as an internal direct
sum of the generalized eigenspaces. By the given condition, every eigenspace is actually equal to
the corresponding generalized eigenspace, implying that T is diagonalizable.
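A small check of this criterion (a SymPy sketch; the matrix is a made-up example with a defective eigenvalue):

from sympy import Matrix

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])                 # example: a Jordan block for 2, plus the eigenvalue 3

for lam, alg_mult, vects in A.eigenvects():
    geo_mult = len(vects)               # dimension of the lambda-eigenspace
    print(lam, alg_mult, geo_mult)      # 2: (2, 1) -- defective; 3: (1, 1)

print(A.is_diagonalizable())            # False, since 2 has unequal multiplicities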
Remarks.
1. In general, eigenspaces and generalized eigenspaces may be as different as chalk and cheese.
For example, if T is a nilpotent linear operator on an n-dimensional vector space V satisfying
T n−1 ≠ 0, then the eigenvalue 0 has algebraic multiplicity n but geometric multiplicity only one! And for
a more ‘shocking’ example, you may look at the differentiation operator on the polynomial
ring R[X].
2. The following discussion is applicable to any linear operator which has a nonzero annihilating
polynomial (Strictly speaking, we’ve only defined annihilating polynomials for linear oper-
ators on a finite dimensional vector space, but you can guess the definition in the general
case! The only thing is that, for a linear operator on an infinite-dimensional vector space, the
definition will come with a phrase ‘if exists’.). In particular, it’s applicable to every linear
operator on a finite-dimensional vector space.
Not having enough eigenvalues is kind of an ‘artificial’ problem for a linear operator which
we can get rid of by moving to a larger field. But this difference between algebraic and
geometric multiplicities of various eigenvalues λ is the ‘real reason’ behind the failure of diag-
onalizability. Even in the infinite-dimensional case, when we may not be able to talk about
algebraic and geometric multiplicities, this failure is captured in the difference between the
λ-eigenspace and the generalized λ-eigenspace. Note that T − λI acts as a nilpotent operator
on the generalized λ-eigenspace, which is zero iff the generalized λ-eigenspace is the same as the
λ-eigenspace. When all the eigenvalues of T are ‘present’ in F , then non-diagonalizability
of T manifests itself in the nature of nilpotency of various translates of T at its different
eigenvalues. The difference between various eigenspaces and the corresponding generalized
eigenspaces tells us how ‘badly’ a linear operator fails to be diagonalizable.
So far, we’ve been largely dealing with diagonalizable operators and gave various criteria for diago-
nalizability, as well as a method of diagonalization whenever it exists. Now we are going to address
the final question raised in the introduction.
Diagonalizable operators, on a finite dimensional vector space V , are characterized by the property
that their matrix representations are diagonal with respect to suitable ordered bases. After diag-
onal matrices, the next class of ‘simple-looking’ matrices are the triangular ones (upper or lower).
So we’re now going to study those linear operators whose matrix representations, with respect to
suitable ordered bases, are upper/lower triangular.
Lemma. A linear operator T on an n-dimensional vector space V is triangulable, i.e., [T ]B is
upper-triangular for some ordered basis B of V , iff there exists a chain of T -invariant subspaces
0 = V0 ⊊ V1 ⊊ · · · ⊊ Vn = V .
Proof. If [T ]B is upper triangular with respect to B = (x1 , · · · , xn ), then for each i, we may
take Vi :=< x1 , · · · , xi >.
Conversely, note that each Vi has codimension one in Vi+1 and if we take an ordered basis
B := (x1 , · · · , xn ) such that Vi+1 = Vi + < xi+1 > for all i < n, then [T ]B is an upper-triangular
matrix.
We’ll now give a different proof of the Cayley-Hamilton theorem over an algebraically closed
field F .
Proposition. A triangulable linear operator T on a finite-dimensional vector space V satisfies
its characteristic polynomial.
In particular, the Cayley-Hamilton theorem holds over an algebraically closed field; and if we are
ready to accept the ‘fact’ that every field F can be embedded in an algebraically closed field, then
it proves the theorem over an arbitrary field F .
Remarks. In the following remarks, we assume that S, T are linear operators on a finite-
dimensional vector space V .
1. As the definition suggests, unlike diagonalizability, we define triangulability only for linear
operators on a finite dimensional vector space.
2. If an annihilating polynomial of T splits into linear factors then T is triangulable.
3. Even if S, T are triangulable, still S + T, ST need not be triangulable. We’ll later see that
they’re triangulable if ST = T S.
4. Every nilpotent operator on V is triangulable (Note the difference with diagonalizability!).
5. A triangulable operator T is diagonalizable iff every eigenvalue of T has the same algebraic
and geometric multiplicity, or equivalently, every eigenspace is equal to the corresponding
generalized eigenspace.
Throughout this section, all vector spaces are assumed to be finite-dimensional. In principle, one
can talk about simultaneous diagonalization even for a family of linear operators on an infinite-
dimensional vector space, but we’re not going to do it here.
We start with a few definitions.
Definitions. Let S ⊆ L(V ) be a set of linear operators. We say that S is a commuting family
of linear operators if ST = T S for all S, T ∈ S.
If S ⊆ L(V ) is a set of linear operators, then x ∈ V is called an eigenvector of S if T (x) ∈< x >
for all T ∈ S, i.e., if x is an eigenvector of every linear operator in S.
W ≤ V is said to be S-invariant if W is T -invariant for all T ∈ S. If W ≤ V is S-invariant then
we define SW := {TW | T ∈ S} ⊆ L(W ) and S̄W := {T̄W | T ∈ S} ⊆ L(V /W ).
An S-invariant subspace W of V is said to be an eigenspace of S if TW is a scalar operator for all
T ∈ S.
A direct sum decomposition V = V1 ⊕· · ·⊕Vr is said to be an S-invariant direct sum decomposition
if each Vi is S-invariant.
A set of diagonalizable (respectively triangulable) linear operators S ⊆ L(V ) is said to be simul-
taneously diagonalizable (respectively simultaneously triangulable) if there exists an ordered basis
B of V such that [T ]B is a diagonal (respectively upper-triangular) matrix for all T ∈ S.
Similarly, a set S ⊆ Mn (F ) is said to be simultaneously diagonalizable (respectively simultaneously
triangulable) if there exists an invertible matrix P ∈ GLn (F ) such that P −1 AP is a diagonal (re-
spectively upper-triangular) matrix for all A ∈ S.
In the definition, we could have replaced ‘upper-triangular’ matrices by ‘lower-triangular’ matrices.
Note that S ⊆ L(V ) is simultaneously diagonalizable (respectively simultaneously triangulable) iff
the corresponding family of matrix representations SB := { [T ]B | T ∈ S } ⊆ Mn (F ) is simulta-
neously diagonalizable (respectively simultaneously triangulable) for all ordered bases B of V.
Remarks.
Proposition. A family S ⊆ L(V ) of diagonalizable linear operators on a finite-dimensional vector
space V is simultaneously diagonalizable iff S is a commuting family.
Proof. We've already seen that if S is simultaneously diagonalizable then it must be a com-
muting family of linear operators.
Conversely, let S be a commuting family of linear operators such that every member of S is diag-
onalizable. We apply induction on dim V .
If dim V = 1, there’s nothing to prove.
Now assuming that the assertion is true for all vector spaces of dimension ≤ r, let dim V = r + 1.
If S consists of only scalar operators, again, there’s nothing to prove. Otherwise, let T ∈ S be
a linear operator which is not scalar. Then we can find a non-trivial T -invariant decomposition
of V , say V = V ′ ⊕ V ′′ , such that both V ′ and V ′′ are sums of eigenspaces of T . As S is a family of
commuting operators, V = V ′ ⊕ V ′′ is also an S-invariant decomposition. Let S ′ , S ′′ be the families
of linear operators induced by S on V ′ , V ′′ respectively. Then by the induction hypothesis, both S ′
and S ′′ are simultaneously diagonalizable. Therefore S is also simultaneously diagonalizable.
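A small numerical illustration (a sketch; the matrices are made-up commuting examples, and we use the fact, cf. Exercise 11 of this lecture, that when A has distinct eigenvalues any matrix commuting with A is diagonalized by the same change of basis):

from sympy import Matrix, eye, simplify

A = Matrix([[1, 2],
            [2, 1]])                    # eigenvalues 3 and -1 (distinct)
B = A**2 + 3*A - eye(2)                 # a polynomial in A, hence commutes with A

assert A*B == B*A                       # {A, B} is a commuting family

P, D = A.diagonalize()                  # P^{-1} A P = D
print(D)
print(simplify(P.inv() * B * P))        # also diagonal: the same P works for B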
Lemma. Let V be an n-dimensional vector space and S ⊆ L(V ) a set of triangulable linear
operators. Then S is simultaneously triangulable iff there exists a chain of S-invariant subspaces
0 = V0 ⊊ V1 ⊊ · · · ⊊ Vn = V .
Proof. Left as an exercise. It's almost the same as the proof which we did for triangulable opera-
tors.
Proof. Left as an exercise. Again, it's almost the same as the proof which we did for triangulable
operators.
Lemma. Let S be a commuting family of triangulable linear operators on a finite-dimensional
vector space V . Then V contains an eigenvector of S.
Proof. Using induction on dim V , the proof follows from the above lemma. The details are
left as an exercise.
Corollary. Let 𝓕 be a commutative F -subalgebra of Mn (F ), where n ≥ 2. Then dim 𝓕 ≤ n(n+1)/2 − 1.
Proof. Let E be an algebraically closed field containing F and 𝓕E the smallest E-subalgebra
of Mn (E) containing 𝓕. Then under a suitable automorphism of Mn (E), the image of 𝓕E lies
inside the set of upper-triangular matrices of Mn (E). As 𝓕E is a commutative E-algebra and the
set of upper-triangular matrices isn't commutative, we deduce that dimE 𝓕E ≤ n(n+1)/2 − 1, and
consequently, dim 𝓕 ≤ n(n+1)/2 − 1.
Exercises.
1. HK : Section 6.3 - 1,3,4,6,7,8,9,10; Section 6.4 - 9,10,11,13; Section 6.5 - 2 (do it for n = 2, 3,
don’t try to do it for general n - that’s a big theorem!),5.
2. Give an example of a linear operator T on a finite-dimensional vector space V over F such
that T has no eigenvalues.
Does your answer change if F is algebraically closed?
What if F is algebraically closed, but V is not finite-dimensional?
3. Let F be an algebraically closed field and A ∈ Mn (F ). Then show that the following
statements are equivalent.
(i) A is diagonalizable.
(ii) For every non-constant polynomial f (X) ∈ F [X], the equation f (X) = A has a solution
in Mn (F ).
(iii) For all λ ∈ F , the equation X n − λ = A has a solution in Mn (F ).
4. Let A be an n-dimensional algebra over a field F . Then prove that A, as an F -algebra, can
be embedded in Mn (F ).
In particular C, as an R-algebra, can be embedded in M2 (R).
Hint. Can you ‘view’ the elements of A as F -linear operators on A?
5. Let D be the set of all diagonalizable matrices in Mn (F ), where F = R/C. Then prove that
D◦ , the interior of D, consists of precisely those matrices which have n distinct eigenvalues.
Hint. Look at 2 × 2 upper-triangular matrices with both diagonal entries equal to λ.
6. Let R be a commutative ring and A, B ∈ Mn (R). Then prove that chAB (X) = chBA (X).
Now suppose that R = F is a field. Then AB and BA have the same set of eigenvalues and
AB is triangulable iff BA is triangulable. Also, AB is nilpotent iff BA is nilpotent.
Do AB and BA have the same set of eigenvectors?
Is minAB (X) = minBA (X)?
If AB is diagonalizable, is BA also diagonalizable?
Hint. First prove that chAB (X) = chBA (X) for R = C. For that, fix a matrix A ∈ Mn (C)
and observe that the result holds for all B ∈ GLn (C). As GLn (C) is dense in Mn (C), you can
now follow a similar line of arguments as given in the proof of the Cayley-Hamilton theorem.
The only thing is that, instead of one, here you’ve to consider two generic matrices, so that
R̃ will be a polynomial ring in 2n² variables over Z.
7. Let V be a finite-dimensional vector space over an algebraically closed field F . If T ∈ L(V ),
show that T is diagonalizable iff the restriction of T to every two-dimensional subspace of V
is diagonalizable.
(*) Prove that we can drop the condition that ‘V is finite-dimensional’ if T is assumed to be
a locally finite linear operator.
8. Let A ⊆ B be unique factorization domains with A being a principal ideal domain. If
B ∗ ∩ A = A∗ , then prove that any two elements a, b ∈ A are relatively prime in A iff they’re
relatively prime in B.
9. Let T be a linear operator on a finite-dimensional vector space V over an algebraically closed
field F . Then show that for every polynomial f (X) ∈ F [X], µ ∈ F is an eigenvalue of f (T )
iff µ = f (λ) for some eigenvalue λ of T (Note that the implication in one direction does not
require F to be algebraically closed.).
Can we drop the assumption that F is algebraically closed?
10. Let T be a linear operator on a vector space V . If W ≤ V is T -invariant, then show that TW
is diagonalizable iff W ⊆ Σ_{λ∈F} Vλ .
11. If T is a linear operator on an n-dimensional vector space V then show that the following
statements are equivalent.
(i) T has n distinct eigenvalues.
(ii) Any S ∈ L(V ) which commutes with T can be written as a polynomial in T over F .
(iii) Any S ∈ L(V ) which commutes with T is diagonalizable.
Hint. Look at the 2 × 2 matrix with rows (a, b) and (0, a).
12. Let T be a triangulable linear operator on a finite-dimensional vector space V . Then prove
that the following statements are equivalent.
(a) T is diagonalizable.
(b) Every T -invariant subspace W ≤ V has a T -invariant complement.
(c) For all λ ∈ F , λ is not an eigenvalue of T̄Vλ .
13. If F = R/C, we can give a metric d on F [X] by defining
d(f, g) := ( Σi |fi − gi |² )^{1/2} .
Then it follows from the proof of the Cayley-Hamilton theorem that the map from Mn (F ) to
F [X] which sends a matrix to its characteristic polynomial, is a continuous map.
Is the map A ↦ minA (X) continuous?
If you give a correct answer, it’ll follow that, unlike characteristic polynomials, the coeffi-
cients of minimal polynomials are not polynomials in the entries of a matrix.
In the remaining exercises, we’ll explore the possibilities of extending some of the results,
which we’ve already seen in the context of finite-dimensional vector spaces, to vector spaces
of arbitrary dimension.
the kernel is nonzero, then the monic annihilating polynomial of least degree is called the
minimal polynomial of T and denoted by minT (X). Again, unlike finite-dimensional vector
spaces, T may not have any minimal polynomial, and T has a minimal polynomial iff it has
a nonzero annihilating polynomial.
lcm(minTW (X), minT̄W (X)) | minT (X) | minTW (X) · minT̄W (X).
18. (*) For each λ ∈ F , the generalized λ-eigenspace of T , denoted by Ṽλ , is defined to be the
union ∪_{n=1}^{∞} ker (T − λI)^n . Then prove that
(ii) If every T -invariant subspace W ≤ V has a T -invariant complement, show that T is
locally finite.
(iii) Prove that T is diagonalizable iff every T -invariant subspace W ≤ V has a T -invariant
complement.
24. (*) If T has a nonzero annihilating polynomial then T is diagonalizable iff minT (X) is a
product of distinct linear factors.
25. (*) If T is a triangulable linear operator then show that the following statements are equiv-
alent.
(i) T is diagonalizable.
(ii) Vλ = Ṽλ for all λ ∈ F .
(iii) λ is not an eigenvalue of T̄Vλ for all λ ∈ F .
Hint. For a linear operator T ∈ L(V ), the induced map T̄ : V /ker T → V /ker T is injective
iff ker T = ker T 2 .
Lecture 14 (18/11/2020) :
The simplest diagonalizable operators, after scalar operators, are those having exactly two eigenvalues;
and especially the ones with eigenvalues 0 and 1.
Remarks.
1. If π ∈ L(V ) is a projection, then the image and kernel of π are denoted by R(π) and N (π)
respectively. As π 2 = π, it follows that R(π) ∩ N (π) = 0.
2. From the theory of diagonalizable operators which we’ve developed so far, it’s easy to see
that every idempotent operator is diagonalizable.
But one can also check this directly, without appealing to any fancy machinery, from the fact
that for all x ∈ V , x − π(x) ∈ N (π), so that V = R(π) + N (π); and since they’re linearly
independent, it follows that V = R(π) ⊕ N (π).
3. If π ∈ L(V ) is an idempotent operator, so is I − π. Then R(I − π) = N (π) and N (I − π) =
R(π).
4. If π1 , π2 are idempotent operators then π1 = π2 iff N (π1 ) = N (π2 ) and R(π1 ) = R(π2 ).
5. The linear span of the set of n × n idempotent matrices is equal to Mn (F ).
6. A linear operator T on a finite-dimensional vector space V is a projection iff its matrix
representation [T ]B with respect to some (equivalently, all) ordered basis B of V is an
idempotent matrix.
7. A linear operator T ∈ L(V ) is a projection iff V = ker T ⊕ im T and T |im T = id im T . In other
words, a non-scalar operator is idempotent iff it’s a diagonalizable operator with eigenvalues
0 and 1.
8. If T ∈ L(V ) is a diagonalizable operator with exactly two eigenvalues, then there exist
a(6= 0), b ∈ F such that aT + bI is a projection.
9. If V = V 0 ⊕ V 00 is a non-trivial direct sum decomposition of V , i.e., V 0 and V 00 are both
nonzero, then there exists a unique idempotent operator π ∈ L(V ) such that R(π) = V 0 and
N (π) = V 00 . In this situation, we say that π is a projection onto V 0 along V 00 . Again, a few
examples in R2 /R3 will illustrate the idea.
10. If T is a diagonalizable linear operator on a vector space V , then for all S ∈ L(V ), ST = T S
iff S preserves all eigenspaces of T . In particular, for a projection π on V , S ∈ L(V ) commutes
with π iff both R(π) and N (π) are S-invariant. Therefore V has a direct sum decomposition
into nonzero T -invariant subspaces iff T commutes with a non-trivial idempotent operator.
11. It’s easy to see that the non-scalar idempotent operators on V are in one-to-one correspon-
dence with the direct sum decompositions of V as a sum of two proper subspaces. We’re
soon going to generalize this.
Definition. Let V be a vector space. A (finite) set of nonzero idempotent operators {π1 , · · · , πr }
⊆ L(V ) is said to be a resolution of identity if the following two conditions are satisfied.
(i) π1 + · · · + πr = I
(ii) πi πj = 0 for all i ≠ j.
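A concrete sketch (the subspaces of F^3 below are arbitrary examples; each πi is built from the change-of-basis matrix P as the projection onto Vi along the sum of the others):

from sympy import Matrix, eye, zeros, diag

v1 = Matrix([1, 1, 0])                          # basis of V1
v2, v3 = Matrix([0, 1, 1]), Matrix([1, 0, 1])   # basis of V2
P = Matrix.hstack(v1, v2, v3)                   # columns: a basis of V = V1 + V2

pi1 = P * diag(1, 0, 0) * P.inv()               # keep the V1-coordinate, kill V2
pi2 = P * diag(0, 1, 1) * P.inv()               # keep the V2-coordinates, kill V1

assert pi1 + pi2 == eye(3)                      # (i)  pi1 + pi2 = I
assert pi1 * pi2 == zeros(3, 3) and pi2 * pi1 == zeros(3, 3)   # (ii) pi_i pi_j = 0 for i != j
assert pi1**2 == pi1 and pi2**2 == pi2          # each pi_i is idempotent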
Remark. The word ‘finite’ has been kept in the bracket because one can also define an arbi-
trary resolution of identity without the finiteness assumption. But for us, a resolution of identity
will always mean a finite resolution of identity.
Proof. Let x ∈ V . Then there exist x1 , · · · , xr ∈ V such that xi ∈ Vi for all i and
x = x1 + · · · + xr . Then (π1 + · · · + πr )(x1 + · · · + xr ) = π1 (x1 ) + · · · + πr (xr ) = x. As πj (V ) = Vj ⊆ Ṽi = N (πi )
for all j ≠ i, it follows that πi πj = 0 for all i ≠ j. Therefore {π1 , · · · , πr } is a resolution of identity.
Proposition. Let V be a nonzero vector space. Then the elements of the set of all finite
resolutions of identity are in one-to-one correspondence with the elements of the set of all finite
direct sum decompositions of V where each summand is nonzero.
Proof. Let D, R be the set of all finite direct sum decompositions of V and the set of all finite
resolutions of identity. Then we can define two set-theoretic maps Φ : D → R and Ψ : R → D as
follows.
Φ({V1 , · · · , Vr }) := {π1 , · · · , πr },
where πi is the idempotent linear operator whose image is Vi and kernel is Ṽi := Σ_{j≠i} Vj , and
Ψ({π1 , · · · , πr }) := {R(π1 ), · · · , R(πr )}.
Now one can check that Ψ ◦ Φ = idD and Φ ◦ Ψ = idR . The details are left as an exercise.
Remark. If we take finite sequences instead of finite sets, we’ll get a one-to-one correspondence
between all finite sequences of nonzero projections giving a resolution of identity and the set of all
finite sequences of nonzero subspaces of V which gives a direct sum decomposition. Then, unlike
sets, one has to distinguish between say (π, I − π) and (I − π, π), and likewise between (V 0 , V 00 )
and (V 00 , V 0 ).
As the above proposition suggests, ‘breaking’ a vector space into finitely many nonzero subspaces
is the ‘same as’ ‘breaking’ the corresponding identity operator into a finite sum of projections which don't
‘interact’ with each other.
Exercises.
1. HK : Section 6.7 - 2,5,8,9 (assume that ch F 6= 2),10 (assume that ch F = 0; what happens
in positive characteristic?),11.
2. Let V1 , · · · , Vn be subspaces of a vector space V . For each i, let Ṽi := Σ_{j≠i} Vj . Then prove
that V = Vi ⊕ Ṽi is a direct sum decomposition of V for every i iff V = V1 ⊕ · · · ⊕ Vn is a
direct sum decomposition.
3. Let T ∈ L(V ) and V = V 0 ⊕ V 00 a T -invariant decomposition of V . If ker T ⊆ V 0 then show
that ker T n ⊆ V 0 for all n ≥ 1.
Hence or otherwise, prove that if T commutes with a projection π ∈ L(V ), then for all λ ∈ F ,
ker π contains the λ-eigenspace of T iff it contains the generalized λ-eigenspace of T . If we
drop the condition that πT = T π, does the conclusion still hold?
4. Let S, T be linear operators on a vector space V satisfying ST = T S. Suppose that f (X) ∈
F [X] is a nonzero polynomial such that f (T ) = 0. Let f (X) = Π_{i=1}^{r} fi^{ei}(X) be an irreducible
decomposition of f (X) and Vi := ker fi^{ei}(T ) for all i. Then show that V = V1 ⊕ · · · ⊕ Vr is
an S-invariant direct sum decomposition of V .
Prove that every generalized eigenspace of T is S-invariant.
5. If π1 , π2 ∈ L(V ) are idempotent operators then
Hints. To prove (ii), note that if S, T ∈ L(V ) satisfies ST = cT S for some nonzero constant
c ∈ F , then both the kernel and the image of T are S-invariant.
For (iii), you may think about 2 × 2 matrices.
Lecture 15 (20/11/2020) :
Why projections?
We’ve seen that finite direct sum decompositions of a vector space V are ‘same as’ finite resolutions
of the identity operator on V . Now if V = V1 ⊕ · · · ⊕ Vr is a direct sum decomposition then for a
linear operator T ∈ L(V ), the decomposition is T -invariant iff T πi = πi T for all i, where πi ∈ L(V )
is the projection with R(πi ) = Vi and N (πi ) = Σ_{j≠i} Vj . Note that Ti , the restriction of T to Vi , is
not a linear operator on V . So we use projections to bring different restrictions of T ‘on the same
platform’, i.e., make them linear operators on V by noting that the action of T πi = πi T is ‘same’
as the action of Ti on Vi . The resolution of identity π1 + · · · + πr = I allows us to similarly ‘break’
T into its various restrictions as T = Σi T πi , where T πi becomes the substitute of Ti .
Using projections, we’ll be able to characterize (and even describe!) diagonalizable operators. In
the remaining part of this lecture, we'll often assume that the linear operator under consideration
satisfies a nonzero polynomial over F (This is automatic if V is finite-dimensional!). In this context,
the following conditions are equivalent for a linear operator T on V .
(i) T satisfies a nonzero polynomial over F .
(ii) The F -algebra homomorphism from F [X] to L(V ) which sends X to T is not injective.
(iii) The F -algebra F [T ] is a finite-dimensional vector space over F .
We’ll start with some general results.
and R(π) = Σ_{λj such that φ(λj )=1} Ṽλj ,
Lemma. Let T be a linear operator on a vector space V and f (X), g(X) ∈ F [X] non-constant
relatively prime polynomials. Suppose that (f g)(T ) = 0. If π ∈ L(V ) is the projection of
V whose kernel is ker g(T ) and image is ker f (T ), then π ∈ F [T ].
Proof. As f (X), g(X) are relatively prime polynomials, we can find two polynomials f1 (X), g1 (X)
∈ F [X] such that f f1 (X) + gg1 (X) = 1. Then one can easily check that π = (gg1 )(T ).
Proposition. Let V (≠ 0) be a vector space and T ∈ L(V ) a diagonalizable operator with
finitely many eigenvalues λ1 , · · · , λr (The second condition is superfluous if V is finite-dimensional.).
Then there exists a resolution of identity {π1 , · · · , πr } such that T = λ1 π1 + · · · + λr πr .
Conversely, if {π1 , · · · , πr } is a resolution of identity and λ1 , · · · , λr are (distinct) elements of F
such that T = λ1 π1 + · · · + λr πr , then T is a diagonalizable operator with eigenvalues λ1 , · · · , λr .
Remark. In the second part of the above proposition, if we only assume that T is a sum of
a finite set of commuting projections, then although T still remains diagonalizable, its eigenvalues
need not have any relation with λ1 , · · · , λr . For example, one may take I − I = 0.
It follows from our previous discussions that when T is diagonalizable, each πi in the above propo-
sition can be written as a polynomial in T . To see this directly, let T = λ1 π1 + · · · + λr πr . Then
for every polynomial h(X) ∈ F [X],
h(T ) = h(λ1 )π1 + · · · + h(λr )πr .
Now by Lagrange’s interpolation, for each i, there exists a polynomial fi (X) ∈ F [X] of degree
≤ r − 1 such that fi (λj ) = δij , so that πi = fi (T ).
You may try to state and prove a matrix-theoretic analogue of the above proposition. Believe me,
it’s a rewarding exercise!
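Here is that computation carried out on the example matrix from the earlier sketch (a SymPy illustration; the Lagrange polynomials fi are built directly as products, so each πi is visibly a polynomial in T):

from sympy import Matrix, eye, zeros, simplify

T = Matrix([[5, 4, 2],
            [4, 5, 2],
            [2, 2, 2]])                 # diagonalizable; eigenvalues 1 and 10
n = T.shape[0]
eigs = list(T.eigenvals().keys())       # the distinct eigenvalues

proj = {}
for lam in eigs:
    # f_lam(X) = prod_{mu != lam} (X - mu)/(lam - mu), so f_lam(mu) = delta_{lam, mu}
    pi = eye(n)
    for mu in eigs:
        if mu != lam:
            pi = pi * (T - mu * eye(n)) / (lam - mu)
    proj[lam] = simplify(pi)            # pi_lam = f_lam(T)

assert sum(proj.values(), zeros(n, n)) == eye(n)                        # resolution of identity
assert all(proj[l] * proj[m] == zeros(n, n) for l in eigs for m in eigs if l != m)
assert sum((l * proj[l] for l in eigs), zeros(n, n)) == T               # T = sum lambda_i pi_i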
The next proposition highlights the relation between diagonalizable operators and projections.
Proposition. Let T be a linear operator on V and f (X) ∈ F [X] a nonzero polynomial such
that f (T ) = 0. Let p(X) be the minimal polynomial of T with deg p(X) = r. Then the following
statements are equivalent.
1. T is diagonalizable.
2. p(X) has r distinct monic irreducible factors.
3. F [T ] contains 2^r distinct projections.
4. T can be written as an F -linear combination of the idempotent operators contained in F [T ].
5. As an F -algebra, F [T ] is generated by the idempotent elements contained in it.
6. As an F -algebra, F [T ] is generated by the diagonalizable operators contained in it.
Proof. You may prove that (1) ⇐⇒ (2) ⇐⇒ (3) =⇒ (4) =⇒ (5) =⇒ (6) =⇒ (1).
The details are left as an exercise.
The next two results are similar in spirit as both of them identify nilpotency as the ‘real
reason’ behind the failure of diagonalizability, at least when all the eigenvalues are ‘present’ in F .
Proposition. Let T be a linear operator on V and f (X) ∈ F [X] a nonzero polynomial such
that f (T ) = 0. Suppose that f (X) splits into a product of linear factors over F . Then T is
diagonalizable iff F [T ] is a reduced ring, i.e., it does not contain any nonzero nilpotent element.
Proof. Let p(X) be the minimal polynomial of T , i.e., the monic polynomial of least degree
satisfied by T . If p(X) := Π_{i=1}^{r} (X − λi )^{ei} is the irreducible decomposition of p(X) into the product
of powers of distinct monic irreducible polynomials, then the assertion follows from the F -algebra
isomorphism
F [T ] ≅ Π_{i=1}^{r} F [X]/((X − λi )^{ei}).
Finally, the following theorem allows us to ‘split’ a linear operator T in its diagonalizable and
nilpotent ‘parts’ so that T is diagonalizable iff its nilpotent part is 0.
Theorem. Let T be a linear operator on V (≠ 0) and f (X) ∈ F [X] a nonzero polynomial
such that f (T ) = 0. Suppose that f (X) splits into a product of linear factors over F . Then there
exist unique linear operators D, N ∈ L(V ) satisfying the following conditions.
1. D is diagonalizable and N is nilpotent.
2. T = D + N .
3. DN = N D.
In this case, both D, N ∈ F [T ].
Proof. Let p(X) be the minimal polynomial of T and p(X) = Π_{i=1}^{r} (X − λi )^{ei} the irreducible
decomposition of p(X) into a product of powers of distinct monic irreducible polynomials. For
each i, let Vi := ker (T − λi I)^{ei} . Then V = V1 ⊕ · · · ⊕ Vr is a T -invariant direct sum decomposition.
Let πi be the projection whose image is Vi and kernel is Wi := Σ_{j≠i} Vj . Then {π1 , · · · , πr } is a
resolution of identity and D := Σ_{i=1}^{r} λi πi is a diagonalizable linear operator. By the lemma above,
applied to the relatively prime polynomials (X − λi )^{ei} and Π_{j≠i} (X − λj )^{ej} , each πi can be written
as a polynomial in T ; consequently, D is also a polynomial in T . Let N := T − D and e the maximum
among the ei 's. Then N^e = 0 as N |Vi = (T − λi I)|Vi for all i. It implies that N ∈ F [T ] is a nilpotent
operator.
To prove uniqueness, let T = D + N = D0 + N 0 be two decompositions satisfying the proper-
ties of the theorem, with D, N being the linear operators constructed above. We want to show
that D = D0 and N = N 0 . As D0 commutes with N 0 , both D0 and N 0 commute with T . Since
both D and N are polynomials in T , it implies that DD0 = D0 D and N N 0 = N 0 N . Therefore,
D − D0 = N 0 − N is a diagonalizable nilpotent operator, implying that D = D0 and N = N 0 .
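A computational sketch of the theorem (the matrix is a made-up example whose characteristic polynomial splits; SymPy's jordan_form is used here only as a shortcut to the generalized eigenspaces, whereas the proof above assembles D from the projections πi):

from sympy import Matrix, diag, zeros

A = Matrix([[3, 1, 0],
            [0, 3, 0],
            [0, 1, 2]])                 # chA(X) = (X - 3)^2 (X - 2)

n = A.shape[0]
P, J = A.jordan_form()                  # A = P * J * P^{-1}
D = P * diag(*[J[i, i] for i in range(n)]) * P.inv()   # keep only the diagonal part of J
N = A - D

assert A == D + N
assert D * N == N * D                   # D and N commute
assert N**n == zeros(n, n)              # N is nilpotent
assert D.is_diagonalizable()            # D is similar to a diagonal matrix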
Remark. Let’s try to ‘understand’ the above result and its proof, at least in the case when
V is finite dimensional. The reason why we only consider finite-dimensional vector spaces is that
we’ll consider a matrix representation of T which is easy to ‘visualize’.
Let T be a linear operator on an n-dimensional vector space V with chT (X) = Π_{i=1}^{r} (X − λi )^{ei} .
Let Vi be the generalized λi -eigenspace of T , i.e., Vi = ker (T − λi I)^{ei} . For each i, we can choose
an ordered basis Bi of Vi such that the corresponding matrix representation [TVi ]Bi is an upper-
triangular matrix. Then B := (B1 , · · · , Br ) is an ordered basis of V , and clearly [T ]B is the block
diagonal matrix with diagonal blocks [TV1 ]B1 , · · · , [TVr ]Br , each block being an upper-triangular
matrix of size dim Vi . Also, for each i, every diagonal entry of [TVi ]Bi is equal to λi . Now if you look at the
proof of the above theorem, you’ll readily see that [D]B is the n × n diagonal matrix consisting of
r blocks of scalar matrices, which is obtained by collecting the ‘scalar part’ from each [TVi ]Bi . It
follows from the construction of D as given in the proof, as well as from the above discussion, that
chT (X) = chD (X). Also, the above discussion makes it obvious that T − D is a nilpotent operator.
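To see the decomposition concretely, here is a rough computational sketch, assuming the sympy library is available; the matrix A below is a hypothetical example with ch_A(X) = (X − 2)²(X − 5), and the diagonalizable part is read off from the Jordan form rather than via the Lagrange-interpolation construction used in the proof.

    from sympy import Matrix, diag

    A = Matrix([[2, 1, 0],
                [0, 2, 0],
                [0, 0, 5]])
    P, J = A.jordan_form()                                     # A = P J P^{-1}
    D = P * diag(*[J[i, i] for i in range(J.rows)]) * P.inv()  # diagonalizable part
    N = A - D                                                  # nilpotent part
    print(D * N == N * D, (N**2).is_zero_matrix)               # True True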
Exercises.
1. HK : Section 6.7 - 1,4,5,6,8,9; Section 6.8 - 5,6,7,8,10,11,12,13,14 (can you give an ‘easier’
proof if F is infinite?).
2. If A, B ∈ Mn (F ) are nilpotent matrices, does it follow that A+B, AB are nilpotent matrices?
What if we further assume that AB = BA?
3. Let A, B ∈ Mn (F ) such that A is invertible and B is nilpotent. Is A+B an invertible matrix?
Does your answer change if we further assume that AB = BA?
4. If T : V → V is a linear operator, then show that the following statements are equivalent.
(i) ker T = ker T 2 .
(ii) ker T ∩ im T = 0.
Further, if V is finite-dimensional, the above conditions are equivalent to
(iii) rk T = rk T 2 .
(iv) V = ker T ⊕ im T .
Deduce that if V is a finite-dimensional vector space over an algebraically closed field F and
T ∈ L(V ) is a linear operator, then T is diagonalizable iff rk (T − λI) = rk (T − λI)2 for all
λ ∈ F.
5. (*) Let F be an algebraically closed field and A ∈ Mn (F ). Then X r = A has a solution in
Mn (F ) for all r ≥ 1 iff rk A = rk A2 .
Can we drop the condition that F is algebraically closed?
6. Let T : V → V be a linear operator satisfying a non-constant polynomial in F [X]. Then
show that the following statements are equivalent.
(a) minT (X) is a power of an irreducible polynomial.
(b) F [T ] does not contain any non-trivial idempotent element.
(c) F [T ] does not contain any non-scalar diagonalizable operator.
10. If A is an n × n matrix over F then show that A is diagonalizable (respectively triangulable)
iff At is diagonalizable (respectively triangulable).
11. Let S, T ∈ L(V ) be linear operators satisfying ST = T S. Then prove that for all λ ∈ F , the
generalized λ-eigenspace of T is S-invariant.
12. (*) Let λ be an eigenvalue of a linear operator T ∈ L(V ). If there exists a projection
π ∈ L(V ) such that T π = πT and R(π) = Vλ , then show that Vλ is equal to the generalized
λ-eigenspace of T .
Let T ∈ L(V ) be a linear operator satisfying a nonzero polynomial f (X) ∈ F [X] which
splits into a product of linear factors over F . If λ1 , · · · , λr are the eigenvalues of T and
for each i, there exists a projection πi ∈ F [T ] such that R(πi ) = Vλi , then prove that T is
diagonalizable.
13. (*) Let T be a linear operator on an n-dimensional vector space V such that the minimal
polynomial of T has degree n. Then prove that every linear operator S ∈ L(V ) which
commutes with T can be written as a polynomial in T over F .
Hint. Find an element x ∈ V such that {x, T (x), · · · , T n−1 (x)} is a basis of V . Then there
exists a polynomial g(X) ∈ F [X] such that S(x) = g(T )(x). Can you show that S = g(T )?
14. Let T be a linear operator on V such that every T -invariant subspace of V has a T -invariant
complement. Then prove that ker T = ker T 2 .
Deduce that for all λ ∈ F , the λ-eigenspace of T is equal to its generalized λ-eigenspace.
Further, if we assume that F is algebraically closed and T is locally finite, then prove that
T is a diagonalizable operator.
15. Let T : V → V be a linear operator. Then show that im T has a T -invariant complement iff
V = im T + ker T .
Deduce that if V is moreover finite-dimensional, then the following statements are equivalent.
(a) V = im T ⊕ ker T .
(b) im T and ker T are linearly independent.
(c) im T has a T -invariant complement.
(d) im T has a unique T -invariant complement.
16. Let T be a linear operator on a finite-dimensional vector space V . Then show that V does
not have any nonzero proper T -invariant subspace iff chT (X) is an irreducible polynomial.
17. Let T be a linear operator on a finite-dimensional vector space V . If chT (X) = minT (X) is
a power of an irreducible polynomial, then show that a nonzero proper T -invariant subspace
of V cannot have any T -invariant complement.
18. (*) Let A, B ∈ Mn (F ) be such that ch_A(X) = ch_B(X) = ∏_{i=1}^{r} (X − λ_i)^{e_i} and min_A(X) =
min_B(X). If e_i ≤ 3 for all i, then prove that A and B are similar matrices.
19. (*) Let A be an n × n matrix over a field F . Then the commutator of A, denoted by CA , is
defined as
CA := {M ∈ Mn (F ) | AM = M A}.
Now prove the following statements.
(i) If A and B = P −1 AP are similar matrices then CB = P −1 CA P . In particular, dim CA =
dim CB .
(ii) Given A ∈ Mn (F ), define a linear operator TA : Mn (F ) → Mn (F ) given by TA (M ) :=
AM − M A. Then CA = ker TA . Prove that there exists an ordered basis B (independent
of A) of Mn (F ) such that [TA ]B = diag(A, · · · , A), i.e., [TA ]B is a block diagonal matrix
with exactly n blocks of size n × n and each block is equal to A.
Deduce that the map Mn (C) → M_{n²} (C), given by A 7→ [TA ]B , is a continuous linear
transformation (One can also conclude this from the fact that every linear transforma-
tion between finite dimensional vector spaces over F := R/C is continuous, but we can
actually ‘see’ it from the above construction.).
(iii) If E/F is a field extension and A ∈ Mn (F ), then we can also define
20. (*) Let A ∈ Mn (C) and let λ(A) denote the set of eigenvalues of A. Prove the following statements.
(i) If A and B = P^{−1}AP are similar matrices then lim_{n→∞} A^n exists iff lim_{n→∞} B^n exists,
and in this case lim_{n→∞} B^n = P^{−1}(lim_{n→∞} A^n)P .
(ii) lim_{n→∞} A^n = 0 iff λ(A) ⊆ B(0, 1).
(iii) (*) The set {A^n} is bounded iff λ(A) ⊆ B[0, 1] and for all λ ∈ S^1 ∩ λ(A), the geometric
multiplicity of λ is equal to its algebraic multiplicity, i.e., rk (A − λI) = rk (A − λI)^2.
(iv) (*) The sequence (A^n) is convergent iff λ(A) ⊆ B(0, 1) ∪ {1} and the geometric multi-
plicity of 1 is equal to its algebraic multiplicity, i.e., rk (A − I) = rk (A − I)^2. In this
case, lim_{n→∞} A^n is an idempotent operator whose rank is equal to the geometric multi-
plicity of 1.
Hint. To prove (ii), note that A is similar to a matrix A′ such that A′ can be written
as A′ = D + N , where D is a diagonal matrix with ch_D(X) = ch_A(X), N is a nilpotent
matrix and DN = N D. Now expand (D + N )^n.
For (iii), note that if a, b ∈ F then, writing 2 × 2 matrices by their rows,
[[a, b], [0, a]]^n = [[a^n, n a^{n−1} b], [0, a^n]] for all n ≥ 1.
To prove (iv), observe that if α ∈ C, then lim_{n→∞} α^n = 1 iff α = 1. (A small numerical
illustration of (iv) is sketched after this exercise list.)
21. Let T be a linear operator on V . For an element v ∈ V , the annihilator of v with respect
to T , denoted by annT v, is defined as annT v := {f (X) ∈ F [X] | f (T )(v) = 0}; and if
S ⊆ V , the annihilator of S with respect to T is defined as annT S := ∩_{v∈S} annT v. Note
that annT V = ann T . If annT v ≠ 0, then the unique monic generator of annT v is also called
the annihilator of v with respect to T . In the following exercises, assume that T satisfies a
nonzero polynomial over F .
(i) Let V 0 , V 00 ≤ V with f (X) := annT V 0 and g(X) := annT V 00 . If f (X), g(X) are
relatively prime, then show that V 0 ∩ V 00 = 0.
(ii) For all x ∈ V , annT x = annT Vx , where Vx is the smallest T -invariant subspace of V
containing x.
(iii) Let u, v ∈ V . If f (X) := annT u and g(X) := annT v are relatively prime, then prove
that annT (u + v) = f g(X).
(iv) If φ(X) is a non-constant factor of minT (X), then show that there exists an element
v ∈ V such that annT v = φ(X).
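The following rough numerical sketch, assuming numpy, illustrates part (iv) of exercise 20 above for a hypothetical matrix with eigenvalues 1 and 1/2: the powers A^n converge and the limit is (numerically) an idempotent matrix of rank 1.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 0.5]])                 # eigenvalues 1 and 1/2
    L = np.linalg.matrix_power(A, 200)         # a proxy for lim A^n
    print(np.allclose(L @ L, L))               # True: the limit is idempotent
    print(np.linalg.matrix_rank(L))            # 1, the geometric multiplicity of 1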
Lecture 16, 17, 18 and 19 (14/12/2020, 16/12/20, 21/12/2020,
23/12/2020) :
Inner Product Spaces
So far, we have studied linear algebra from a purely algebraic viewpoint. But we have also seen
from different courses on analysis and geometry that Rn comes equipped with a natural geometric
structure - the so-called Euclidean metric which can be studied and used to our benefit.
We will now systematically develop a geometric structure on vector spaces. To do so, from now
on, unless otherwise mentioned, we will exclusively work over the field of real or complex numbers
which already have ‘in-built’ geometric structures. With co-ordinate systems at our disposal, our
main goal is to introduce the notions of ‘length’ and ‘angle’ - the fundamental tools of co-ordinate
geometry. In fact, we will soon see that the notion of angle, per se, is not that important, but the
concept of ‘orthogonality’ is, i.e., when two vectors are mutually perpendicular.
We will mainly focus on finite-dimensional vector spaces and many of the results which we prove
in this context have their natural generalizations to vector spaces of arbitrary dimensions, as one
can see in functional analysis. In a sense, the study of finite-dimensional inner product spaces in
linear algebra is an excellent prelude to a course on functional analysis.
Definitions. Let V be a vector space over a field F := R/C. Then an inner product on V is
defined to be a function
h , i: V × V → F
satisfying the following properties
(i) Linearity in the first co-ordinate: For all x, y, z ∈ V and λ ∈ F , hx + λy, zi = hx, zi + λhy, zi.
(ii) Conjugate-symmetry: For all x, y ∈ V , hy, xi = hx, yi.
(iii) Positivity: For all x ∈ V , if x is nonzero then hx, xi > 0.
An ordered pair (V, h , i), where V is a vector space over F and h , i : V × V → F is an inner
product on V , is called an inner product space, or IPS, in short, over F . Often we’ll denote an
inner product space simply by V if either the inner product h , i is understood from the context or
if the discussion is independent of any particular choice of the inner product. If F = R (or F = C)
then V is called a real inner product space (or a complex inner product space).
Two vectors x, y ∈ V are said to be mutually orthogonal or orthogonal to each other if hx, yi = 0.
In symbol, we write this as x ⊥ y. Note that x ⊥ y iff y ⊥ x and a nonzero vector x cannot be
orthogonal to itself.
A finite sequence of vectors x1 , . . . , xn ∈ V is said to be an orthogonal sequence if xi ⊥ xj for all
i 6= j.
An orthogonal sequence of vectors x1 , . . . , xn ∈ V is said to be an orthonormal sequence if each xi
has norm 1, i.e., if ‖xi‖ := √⟨xi , xi⟩ = 1 for all i. A vector whose norm is 1 is called a unit vector.
Thus an orthonormal sequence is an orthogonal sequence of unit vectors.
Note that an orthogonal sequence of nonzero vectors is linearly independent. In particular, an
orthonormal sequence of vectors is linearly independent.
A set of vectors S ⊆ V is said to be an orthogonal set (respectively an orthonormal set) if ev-
ery finite sequence of distinct elements of S is orthogonal (respectively orthonormal). Note that
we’re following the same pattern of definitions as given for linearly independent sequence/set of
vectors. However, unlike linear independence, orthogonality is always checked pairwise. Therefore,
we could have directly defined a set S ⊆ V to be orthogonal if any two distinct elements of the set
are mutually orthogonal. Further, if S consists of only unit vectors, then S is an orthonormal set.
In particular, a singleton set {x} ⊆ V is always orthogonal, and it’s orthonormal iff kxk = 1.
Remarks.
1. If (V, h , i) is an inner product space then for every subspace W ≤ V , the restriction h , i|W ×W
is an inner product on W , so that (W, h , i|W ×W ) itself becomes an inner product space.
More generally, let V, W be vector spaces over F := R/C with (V, h , iV ) being an inner
product space. If T : W → V is an injective linear transformation, then we can consider W to
be an inner product space by identifying it with its image T (W ), i.e., h , iW : W × W → F ,
defined as hx, yiW := hT (x), T (y)iV for all x, y ∈ W , is an inner product on W .
57
2. If F = R, then an inner product on V is a symmetric bilinear form, i.e., a symmetric function
on V × V which is linear in both the co-ordinates.
However, if F = C, then an inner product on V is neither bilinear nor symmetric. It’s
a so called sesquilinear form, i.e., one-and-half linear, because although it’s linear in the
first co-ordinate and additive in the second co-ordinate, it’s only conjugate-linear in the
second co-ordinate. Actually, over C, it couldn’t have been bilinear without sacrificing the
‘positivity’ condition; because if f : V × V → F is a bilinear function, then for all x ∈ V ,
f (ix, ix) = −f (x, x), implying that the positivity condition fails whenever V 6= 0.
If x ⊥ y then ‖x + y‖² = ‖x‖² + ‖y‖². This is known as the Pythagorean law and it’s just a
generalization of what we learnt in our high school geometry: given any right-angled triangle,
hypotenuse² = base² + height².
7. Let (V, h , i) be an inner product space. Then the induced norm k k satisfies the parallelogram
law, i.e., kx + yk2 + kx − yk2 = 2(kxk2 + kyk2 ) for all x, y ∈ V . We’ll later see that every
norm which satisfies this parallelogram law is induced by an inner product.
3. Let F := R/C and V := C([0, 1], F ), the F -vector space of all continuous F -valued functions
defined on the unit interval [0, 1]. Then ⟨f, g⟩ := ∫₀¹ f(x) \overline{g(x)} dx is an inner product on V .
Definitions. Let V be an inner product space. An orthonormal set S ⊆ V is said to be a
complete orthonormal set if it’s a maximal element in the family of all orthonormal subsets of V
with respect to set inclusion.
An orthonormal set S ⊆ V is said to be an orthonormal basis or a Hilbert basis if the linear span
of S is dense in V with respect to the induced metric.
It is clear that every orthonormal basis is a complete orthonormal set. Now using Zorn’s
lemma, we can easily show that every inner product space contains a complete orthonormal set (If
V = 0, it’s the empty set!). But surprisingly, there are examples of inner product spaces without
any orthonormal basis; in particular, showing that a complete orthonormal set need not be an
orthonormal basis. However, we’ll later see that every complete inner product space, i.e., a Hilbert
space, always has an orthonormal basis. In fact, in a Hilbert space, every complete orthonormal
set is a Hilbert basis.
Next, we do Gram-Schmidt orthogonalization, perhaps the most useful theorem for finite-
dimensional inner product spaces. It’ll allow us to ‘convert’ bases into orthonormal bases; so
that, theoretically, every problem about a finite-dimensional inner product space can be reduced
to a problem about F n with the usual dot product. In fact, often it suffices to consider just F 2 ,
as we’ll see.
Note that when yi = 0, λi could have been chosen to be any constant. Also, yr+1 = 0 iff
xr+1 ∈ ⟨x1 , . . . , xr ⟩. Therefore y1 , y2 , . . . can be chosen to be an orthonormal sequence iff
the original sequence x1 , x2 , . . . is linearly independent.
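As a rough illustration, assuming numpy, here is the Gram-Schmidt recipe carried out in R^3 for three hypothetical vectors: each new vector has its projections onto the previously constructed (nonzero) vectors subtracted, and the resulting Gram matrix is diagonal.

    import numpy as np

    def gram_schmidt(xs):
        ys = []
        for x in xs:
            y = np.array(x, dtype=float)
            for z in ys:
                if np.dot(z, z) > 1e-12:                     # skip any zero vector
                    y = y - (np.dot(y, z) / np.dot(z, z)) * z
            ys.append(y)
        return ys

    ys = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
    gram = [[round(float(np.dot(u, v)), 10) for v in ys] for u in ys]
    print(gram)                                              # off-diagonal entries are 0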
Remarks.
1. If y1 , y2 , . . . is an orthogonal sequence satisfying the condition of the above theorem, then so
is the sequence λ1 y1 , λ2 y2 , . . . for every sequence of nonzero elements λ1 , λ2 , . . . ∈ F .
And if y1 , y2 , . . . is an orthonormal sequence satisfying the condition of the above theorem,
then so is the sequence λ1 y1 , λ2 y2 , . . . for every sequence of elements λ1 , λ2 , . . . ∈ F such
that |λi | = 1 for all i.
So the Gram-Schmidt orthogonalization only gives us one orthogonal sequence satisfying the
desired condition. It is not unique even if we restrict ourselves only to orthonormal sequences.
2. Every finite dimensional inner product space V has an orthonormal basis. In this case, an
orthonormal set S ⊆ V is an orthonormal basis iff it’s a basis of V .
is a Hilbert space and every orthonormal basis of V is also an orthonormal basis of V̂ . Now
suppose that V is an inner product space of countably infinite dimension as given in the
above example. Then choose a vector x ∈ V̂ \ V and let W := V + ⟨x⟩. Clearly, W has
a countable dimension. Now every orthonormal basis S of V is also an orthonormal basis
of W , but S cannot span W . Therefore, unlike the finite-dimensional case, an orthonormal
basis need not be a spanning set of a countably infinite-dimensional inner product space.
4. Let V be an inner product space over F = R/C. For every positive integer n, let Tn :=
{(a1 , . . . , an ) ∈ F n | |ai | = 1 for all i}. Then Tn is a group under coordinate-wise multipli-
cation and it acts on the set of all orthonormal sequences of length n, where the action is
given by ((a1 , . . . , an ), (x1 , . . . , xn )) 7→ (a1 x1 , . . . , an xn ). Similarly, the group (F ∗ )n acts on
the set of all orthogonal sequences of length n. Note that if F = C, then Tn = (S 1 )n and if
F = R, then Tn = {±1}n .
Theorem (Cauchy-Schwarz inequality). Let (V, ⟨ , ⟩) be an inner product space. Then
|⟨x, y⟩| ≤ ‖x‖ ‖y‖ for all x, y ∈ V , and equality holds iff x and y are linearly dependent.
Proof. It’s easy to see that if x, y are linearly dependent then |⟨x, y⟩| = ‖x‖ ‖y‖.
So let us assume that W , the subspace of V generated by x and y, has dimension 2. Note that it
suffices to prove the inequality after replacing x by x/kxk. So we may assume that kxk = 1. Since
W is two-dimensional, we can use Gram-Schmidt orthogonalization to find an orthonormal basis
{x, e} of W . Let y = c1 x + c2 e where c2 ≠ 0. Then
|⟨x, y⟩| = |⟨x, c1 x + c2 e⟩| = |c1| < √(|c1|² + |c2|²) = ‖y‖.
Therefore the equality holds in Cauchy-Schwarz inequality iff x and y are linearly dependent.
We can use Cauchy-Schwarz inequality to show that the norm induced by an inner product
satisfies the triangle inequality.
Corollary. Let (V, h , i) be an inner product space with the induced norm k k. Then
kx + yk ≤ kxk + kyk.
Proof. As both sides are non-negative, it suffices to prove the inequality after squaring them.
Now kx + yk2 = kxk2 + kyk2 + 2 Re hx, yi ≤ kxk2 + kyk2 + 2 |hx, yi| ≤ kxk2 + kyk2 + 2 kxk kyk =
(kxk + kyk)2 .
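A quick numerical check, assuming numpy, of the Cauchy-Schwarz inequality and the triangle inequality for a few random vectors in C^3 (the convention below takes the inner product conjugate-linear in the second argument, as in these notes).

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(3):
        x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
        y = rng.standard_normal(3) + 1j * rng.standard_normal(3)
        inner = np.vdot(y, x)                  # <x, y>, linear in x, conjugate-linear in y
        nx, ny = np.linalg.norm(x), np.linalg.norm(y)
        print(abs(inner) <= nx * ny, np.linalg.norm(x + y) <= nx + ny)   # True True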
Recall that if z is a complex number, then Im z = Re (−iz). Therefore, Im hx, yi = Re (−ihx, yi) =
Re hx, iyi. So we can write
hx, yi = Re hx, yi + i Re hx, iyi.
Now one can identify the vector space Cn with R2n by sending a vector (z1 , . . . , zn ) ∈ Cn
to the vector (Re z1 , Im z1 , . . . , Re zn , Im zn ) ∈ R2n , which is an R-linear transformation. Let
h , iC , h , iR denote the usual dot products on Cn and R2n respectively. Then one can check that
for all x, y ∈ Cn , Re hx, yiC = hx, yiR , where the right hand side is computed after identifying x, y
with their images in R2n under the prescribed identification. One can also check that
the induced norms ‖ ‖C and ‖ ‖R are the same, i.e., it does not matter whether we compute the norm
of an element x ∈ Cn directly, or after identifying it with an element of R2n , which is
consistent with our intuition. Therefore, although the induced norms are the same, the complex dot
product on Cn seems to ‘carry more information’ than its real counterpart, the dot product on R2n .
But actually the complex inner product does not give us anything ‘more’, as the ‘imaginary
part’ of a complex inner product can be retrieved from its ‘real part’. Hence (Cn , ·) contains the
‘same information’ as (R2n , ·).
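The identification can be checked numerically; the sketch below, assuming numpy, uses two hypothetical vectors in C^2 and verifies that the real part of their complex dot product equals the real dot product of their images in R^4.

    import numpy as np

    def to_real(z):
        # (z1, ..., zn) |-> (Re z1, Im z1, ..., Re zn, Im zn)
        return np.column_stack([z.real, z.imag]).ravel()

    x = np.array([1 + 2j, 3 - 1j])
    y = np.array([2 - 1j, 1 + 4j])
    lhs = np.vdot(y, x).real                   # Re <x, y>_C
    rhs = np.dot(to_real(x), to_real(y))       # <x, y>_R in R^4
    print(np.isclose(lhs, rhs))                # True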
The angle between two nonzero vectors
If x, y ∈ R2 , we know from the knowledge of Cartesian geometry that θ(x, y), the angle between x
and y, is given by x · y = kxk kyk cos θ(x, y). Now any two vectors in Rn which are not collinear
generate a plane in Rn , and after identifying the plane with R2 , we can compute the angle between
those vectors by treating them as vectors in R2 . One can check that the angle θ(x, y) is then given by
θ(x, y) = cos⁻¹( (x · y) / (‖x‖ ‖y‖) ), where 0 ≤ θ(x, y) ≤ π.
So if (V, h , iR ) is a real inner product space and x, y ∈ V are nonzero vectors, we define θ(x, y),
the angle between x and y, as
θ(x, y) = cos⁻¹( ⟨x, y⟩R / (‖x‖ ‖y‖) ), where 0 ≤ θ(x, y) ≤ π.
Now if V is a complex inner product space then the inner product takes values in complex num-
bers. So, for example, if V = Cn with the usual dot product, then it’s only natural to identify Cn
with R2n and use the available notion of angle in R2n . This is something which we’ve always done
by identifying C with R2 . So inspired by the relation between Cn and R2n with their usual dot
products, we give the general definition of the angle between two nonzero vectors as follows.
Definition. Let (V, h , i) be an inner product space over F = R/C. If x, y ∈ V are nonzero
vectors then θ(x, y), the angle between x and y, is defined as
θ(x, y) = cos⁻¹( Re ⟨x, y⟩ / (‖x‖ ‖y‖) ), where 0 ≤ θ(x, y) ≤ π.
But one should remember that the above definition is not very ‘intuition-friendly’ over C. As
for a simple example, let V := C be the one-dimensional vector space over C with the usual dot
product. Then in the algebraic sense, i, 1 ∈ V are ‘C-collinear’ because together they generate a
one-dimensional space. However, by the above definition, the angle between them is θ(i, 1) = π/2,
i.e., the vectors are mutually perpendicular! Here ‘the mystery’ lies in the fact that we ‘forget’
the ‘imaginary part’ of the inner product while computing the angle. The vectors i and 1 are not
orthogonal in the ‘complex sense’ because hi, 1i = i 6= 0. But they ‘become’ perpendicular when
we identify C with R2 , where 1 is mapped to (1, 0) and i is mapped to (0, 1).
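A one-line numerical check, assuming numpy, of the example just discussed: in V = C with the usual inner product ⟨x, y⟩ = x ȳ, the angle between i and 1 computed from the definition above is π/2.

    import numpy as np

    x, y = 1j, 1.0
    inner = x * np.conj(y)                               # <x, y> = i
    theta = np.arccos(inner.real / (abs(x) * abs(y)))    # only Re <x, y> enters
    print(theta, np.pi / 2)                              # both are pi/2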
Matrix representations of inner products
Let (V, h , i) be a finite dimensional inner product space. If B := (x1 , . . . , xn ) is an ordered basis
of V , then the matrix MB whose ij-th entry is hxj , xi i, uniquely determines the inner product.
We say that MB is the matrix representation of h , i with respect to the ordered basis B. One can
check that for all x, y ∈ V , hx, yi = [y]∗B MB [x]B .
Conversely, a matrix M ∈ Mn (F ) is called a matrix of an inner product h , i if there exists an
inner product h , i : V × V → F and an ordered basis B := (x1 , . . . , xn ) of V such that M = MB .
We make a few observations about such a matrix M .
(i) M is a Hermitian matrix if F = C and a symmetric matrix if F = R. Recall that a complex
matrix A is said to be Hermitian if A = A∗ . So if A ∈ Mn (R), then A, as a matrix over C,
is Hermitian iff it’s symmetric.
(ii) Every diagonal entry of M is a positive real number.
(iii) For every n × 1 column vector x over F , x∗ M x > 0. A Hermitian matrix A ∈ Mn (C)
(respectively, a symmetric matrix A ∈ Mn (R)) is said to be positive-definite if x∗ Ax > 0
(respectively, xt Ax > 0) for all x ∈ Cn \ {0} (respectively, x ∈ Rn \ {0}). Therefore a matrix
is a matrix of an inner product iff it’s positive-definite.
(iv) Every eigenvalue of M is a nonzero positive real number.
(v) If P ∈ GLn (F ), then hx, yi := y ∗ P ∗ P x defines an inner product on F n . Therefore P ∗ P
is a matrix of an inner product. Later we’ll see that the converse is also true, i.e., every
positive-definite matrix is of this particular form.
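Observation (v) can be checked numerically; in the sketch below, assuming numpy, P is a hypothetical invertible 2 × 2 complex matrix, M = P*P is Hermitian, and x*Mx = ‖Px‖² is positive for a nonzero vector x.

    import numpy as np

    P = np.array([[1 + 1j, 2],
                  [0, 3 - 1j]])                # invertible, hypothetical
    M = P.conj().T @ P                         # M = P* P
    print(np.allclose(M, M.conj().T))          # True: M is Hermitian
    x = np.array([2 - 1j, 1 + 3j])             # a nonzero test vector
    print((x.conj() @ M @ x).real)             # positive, since x* M x = ||P x||^2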
Remarks.
1. Let A ∈ Mn (C) be a Hermitian matrix. Then x∗ Ax ∈ M1 (C) is also a Hermitian matrix for
all n × 1 column vectors x over C. Therefore Ax · x is a real number for all x ∈ Cn . We’ll
later connect this to certain characterizing property of complex self-adjoint operators.
2. For all A ∈ Mn (F ), A + A∗ and AA∗ are both Hermitian matrices. Note that a scalar matrix
λI ∈ Mn (F ) is Hermitian iff λ ∈ R. Actually, complex Hermitian matrices ‘behave’ like
real numbers. For example, every matrix A ∈ Mn (C) can be uniquely written in the form
A = B + iC, where B, C are Hermitian matrices. (Can you solve the equation A = X + iY
assuming that X, Y are Hermitian matrices?). We will further explore the analogy between
complex Hermitian matrices and real numbers in the future.
3. A positive (or negative) definite matrix is invertible.
4. The map φ : Mn (F ) → Mn (F ), given by φ(A) := A∗ , is an R-algebra anti-isomorphism, i.e.,
it satisfies all the properties of an R-algebra isomorphism except that it switches the order
of multiplication, viz., φ(AB) = φ(B)φ(A). Note that φ2 = Id.
5. Let A ∈ Mn (F ) be a Hermitian matrix with n > 1 (What does a 1 × 1 Hermitian/positive
definite/positive semi-definite/negative definite/negative semi-definite matrix look like?).
Then we have seen that Ax · x ∈ R for all x ∈ Mn×1 (F ). So we can partition F n into
three sets - Ω+ (A) := {x ∈ F n | Ax · x > 0}, Ω− (A) := {x ∈ F n | Ax · x < 0} and
Ω0 (A) := {x ∈ F n | Ax · x = 0}. Note that all three sets are closed under nonzero scalar
multiplication.
The matrix A is positive (respectively, negative) definite iff Ω+ (A) ∪ {0} = V (respectively,
Ω− (A) ∪ {0} = V ).
Similarly, A is positive (respectively, negative) semi-definite iff Ω− (A) = ∅ (respectively,
Ω+ (A) = ∅). As F n \ {0} is connected and Ω+ (A), Ω− (A) are disjoint open sets, it fol-
lows that if A is an indefinite matrix then Ω0 (A) ≠ 0. In this case, each one of the sets
Ω+ (A) ∪ {0}, Ω− (A) ∪ {0} and Ω0 (A) is a union of lines passing through the origin.
To illustrate the idea, we consider the set of 2 × 2 real symmetric matrices. For a
symmetric matrix A = [[α, β], [β, γ]] (written by its rows), we see that Ω0 (A) is the solution set
of the homogeneous quadratic equation αX² + 2βXY + γY² = 0, where αX² + 2βXY + γY² ∈ R[X, Y ].
Clearly, Ω0 (A) can either be the singleton set consisting of the origin, or a line passing
through the origin, or a pair of lines passing through the origin. In the first case, A is a
positive (or negative) definite matrix. In the second case, A is a positive (or negative) semi-
definite matrix which is not positive (or negative) definite. And in the third case, A is an
indefinite matrix (why?).
You may consider the 2 × 2 real matrices A := [[1, 0], [0, 0]], [[1, 0], [0, −1]], [[3, 0], [0, −1]] to get
an idea about how these sets may look.
The first matrix is positive semi-definite, with Ω0 (A) = {(0, y) ∈ R2 | y ∈ R}. The other two
matrices are indefinite, but there’s a difference. The second matrix gives the reflection with
respect to the X-axis. Here Ω0 (A) = {(x, y) ∈ R2 | x2 = y 2 }. So we get a partition of R2
in four symmetric quadrants, where the opposite quadrants consist of vectors of the ‘same
sign’. The last matrix, besides being indefinite, also imparts some stretch on the vectors.
Here Ω0 (A) = {(x, y) ∈ R2 | 3x2 = y 2 }. Now the quadrants are not symmetric anymore.
Only the opposite quadrants are symmetric. The quadrants in which the vectors of ‘positive
sign’ are lying, are bounded by two lines meeting at 120◦ at the origin. So we can actually
choose an orthonormal basis u, v ∈ R2 such that Au · u and Av · v are both positive, but A is
not a positive definite matrix. In a sense, the first matrix is a ‘limiting case’ of the other two
matrices when the two lines describing Ω0 (A) ‘coincide’ (If you imagine that the two lines
describing Ω0 (A) for the second or the third matrix are ‘pulled’ towards the Y -axis, then, as
a ‘limit’, we get the first matrix.).
A mental picture ‘identifying’ the set of all 2 × 2 real symmetric matrices with the real plane
R2 using the eigenvalues of the matrices may aid our understanding (We don’t want to make
it precise. For different matrices may have the same eigenvalues, and for a symmetric matrix
A with eigenvalues, say 1 and 2, can be sent either to (1, 2) or to (2, 1)!). It’ll give us the
following correspondences.
8. Let A := [[1, 1], [−1, 1]] ∈ M2 (R). Then one can check that for all x := (x1 , x2 )ᵗ ∈ R2 \ {0},
xᵗAx = x1² + x2² > 0. But A is not a symmetric matrix.
However, we’ll later see that for any matrix M ∈ Mn (C), the condition that x∗ M x > 0 for
all nonzero vectors x ∈ Cn actually implies that M is Hermitian.
Let A, B ∈ Mn (F ) be Hermitian matrices. Then the following statements hold.
(a) A + B, ABA, BAB are Hermitian matrices, and AB is a Hermitian matrix iff AB = BA.
(b) If A is positive (respectively, negative) definite and B is positive (respectively, negative)
semi-definite, then A + B is a positive (respectively, negative) definite matrix.
(c) If A, B are positive (respectively, negative) definite matrices, then so are ABA and
BAB.
(d) If A, B are positive (respectively, negative) semi-definite matrices, then so are ABA and
BAB.
(e) If A is a positive (respectively, negative) definite matrix, so is A−1 . Now the scalar
matrix λI is positive (respectively, negative) definite iff λ is a positive (respectively,
negative) real number. Therefore, for a positive definite matrix A, λA is positive (re-
spectively, negative) definite iff λ > 0 (respectively, λ < 0).
(f) If A, B are positive (or negative) (semi-)definite matrices such that AB = BA then we’ll
later see that AB is a positive (semi-)definite matrix.
Can you give an example of positive definite matrices A, B ∈ Mn (F ) such that AB is
not even Hermitian?
11. Every vector z ∈ Cn can be uniquely written as z = x + iy, where x, y ∈ Rn . Using this,
it’s easy to prove that if a symmetric matrix A ∈ Mn (R) is a positive definite (or negative
definite/positive semi-definite/negative semi-definite) matrix, then A, treated as an n × n
matrix over C, is also positive definite (or negative definite/positive semi-definite/negative
semi-definite).
In the proof, we crucially use the fact that Az · z ∈ R for all z ∈ Cn . Without this property,
it may happen for some matrix A ∈ Mn (R) that Ax · x > 0 for all x ∈ Rn , but Az · z < 0 for
some z ∈ Cn , as the example in Remark 8 shows. Thus, if we define a matrix to be ‘positive
definite’ if it only satisfies the positivity condition, without requiring it to be Hermitian,
then this will lead us to a situation where a matrix A ∈ Mn (R) will be ‘positive-definite’;
but when treated as a matrix over C, it won’t be ‘positive definite’ !
12. Let A ∈ Mn (F ) be a positive definite matrix. Then we’ve seen that all eigenvalues of A are
positive real numbers. Therefore det A is also positive. For each i ≤ n, let Ai be the i-th
leading principal sub-matrix of A, i.e., the matrix obtained from A by deleting the rows and
columns starting from i + 1. Then it’s easy to see that each Ai is also positive definite (Just
look at the vectors in F n whose last n − i co-ordinates are 0.). It implies that for each i ≤ n,
det Ai , called the i-th leading principal minor of A, is a positive real number. Surprisingly,
the converse is also true. Sylvester’s criterion, which we’ll not prove, states that ‘A Hermitian
matrix M ∈ Mn (F ) is positive definite iff every leading principal minor of M is positive’.
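A small numerical comparison, assuming numpy, of the eigenvalue test and Sylvester's criterion for a hypothetical real symmetric matrix: all eigenvalues are positive exactly when all leading principal minors are positive.

    import numpy as np

    A = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
    eig_test = bool(np.all(np.linalg.eigvalsh(A) > 0))
    minor_test = all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, 4))
    print(eig_test, minor_test)                # True True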
Definitions. Let V be a vector space over F = R/C. Then a norm on V is defined to be a
function ‖ ‖ : V → R≥0 satisfying the following properties
(i) Triangle inequality: For all x, y ∈ V , ‖x + y‖ ≤ ‖x‖ + ‖y‖.
(ii) Absolute homogeneity: For all x ∈ V and λ ∈ F , ‖λx‖ = |λ| ‖x‖.
(iii) Positive definiteness: For all nonzero vectors x ∈ V , ‖x‖ > 0.
An ordered pair (V, ‖ ‖), where V is a vector space over F = R/C and ‖ ‖ : V → R≥0 is a norm on
V , is called a normed linear space, or in short, an NLS. Like inner product spaces, a normed linear
space is called separable, complete etc. if it’s so with respect to the induced metric. A complete
normed linear space is also called a Banach space, named after Stefan Banach.
By a metric vector space, or an MVS, in short, we mean an ordered pair (V, d) where V is a vec-
tor space over F = R/C, and d : V × V → R is a metric such that the vector space operations
+ : V × V → V and · : F × V → V are continuous, where the products are given the product
metrics.
A topological vector space, or a TVS, in short, means an ordered pair (V, I) where V is a vector
space over F = R/C, and I ⊆ P(V ) is a topology on V such that the vector space operations
+ : V × V → V and · : F × V → V are continuous, with the products being given the product
topologies.
Remarks.
1. Every inner product space is also a normed linear space with respect to the induced norm.
Similarly, every normed linear space is a metric vector space with respect to the induced
metric, and every metric vector space is a topological vector space with respect to the induced
topology. So IPS =⇒ NLS =⇒ MVS =⇒ TVS gives certain hierarchy.
2. We have seen that a norm which is induced by an inner product satisfies the parallelogram
law. Conversely, if a norm ‖ ‖ on a vector space V satisfies the parallelogram law, then it
is induced by an inner product which can be retrieved from the norm using the polarization
identities
⟨x, y⟩R := (1/4)(‖x + y‖² − ‖x − y‖²), if F = R,
and
⟨x, y⟩C := (1/4)(‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖²), if F = C.
It requires some computation to show that the functions defined above actually give inner
products on V . Clearly, different inner products cannot induce the same norm. (A numerical
sketch of the complex polarization identity appears after these remarks.)
3. If V is a normed linear space then the norm may not be induced by an inner product. For
example, one can consider the sup norm k ksup on F 2 , defined as k(x, y)ksup := max {|x|, |y|},
which is not induced by an inner product (why?). More generally, if we take V := F (N) , and
define the `p -norm on V as
‖(xn)‖p := ( Σ_{n=1}^{∞} |xn|^p )^{1/p},
then one can show that (We’re not going to prove it!) the `p -norm satisfies the parallelogram
law iff p = 2.
4. Let V be a vector space over F = R/C. Two norms `1 , `2 on V are said to be equivalent if
there exist real numbers c, C > 0 such that c ℓ1 (x) ≤ ℓ2 (x) ≤ C ℓ1 (x)
for all x ∈ V . It’s easy to check that this indeed gives an equivalence relation on the set of all
norms defined on V . Note that if `1 , `2 are two norms on V and S1 := {x ∈ V | `1 (x) = 1},
then `1 and `2 are equivalent iff inf {`2 (x) | x ∈ S1 } and sup {`2 (x) | x ∈ S1 } are both nonzero
real numbers, in which case, we can take c := inf {`2 (x) | x ∈ S1 } and C := sup {`2 (x) | x ∈
S1 }.
Thankfully, and perhaps not surprisingly, any two norms defined on a finite-dimensional vector
space V are equivalent. To prove this, one can show that if ℓ : F n → R≥0 is a norm on F n ,
then ℓ is a continuous function with respect to the usual Euclidean metric. We leave its proof
as an exercise.
However, two norms on an infinite dimensional normed linear space need not be equivalent.
For example, one may compare the ℓ1 -norm with the sup norm on F (N) , and look at the
sequence (e1 + e2 + · · · + en )_{n=1}^{∞} .
5. If `1 , `2 are equivalent norms on a vector space V over F = R/C, then a set X ⊆ V is open
with respect to the metric induced by `1 iff it’s open with respect to the metric induced by `2 .
Therefore, if needed, we can answer various topological questions after replacing the original
norm by an equivalent norm.
6. If (V, k k) is a finite-dimensional normed linear space then the unit sphere, defined as S :=
{x ∈ V | kxk = 1}, is a compact set.
The converse is also true. It follows from Riesz’s lemma, which states that
‘Let (V, ‖ ‖) be a normed linear space, Y a closed proper subspace of V and α a real number
with 0 < α < 1. Then there exists an x ∈ S such that ‖x − y‖ ≥ α for all y ∈ Y .’
We will not prove Riesz’s lemma in this course.
7. As we saw in the case of inner product spaces, if `1 , `2 are norms on a vector space V over
F = R/C, then ℓ1 + ℓ2 is also a norm on V . If V ≠ 0, then λℓ1 is a norm on V iff λ > 0.
Analogous properties hold for metric vector spaces (What happens to the assertion involving
the difference of two norms/metrics?).
8. If (V1 , k k1 ), (V2 , k k2 ) are two normed linear spaces then we can define a norm on the direct
sum V1 ⊕ V2 as follows
k(x, y)k := kxk1 + kyk2 .
9. A metric d which makes F n a metric vector space need not be induced by a norm. To give
such an example, it’s tempting to consider the discrete metric on F n . But F n does not
become a metric vector space with respect to the discrete metric because then the scalar
multiplication is not continuous. So we consider the following metric
d(x, y) := ‖x − y‖ / (1 + ‖x − y‖),
where ‖ ‖ is the usual Euclidean norm on F n . Then one can check that (F n , d) is a metric
vector space. But since d is bounded, it cannot be induced by a norm.
If (V, d) is a metric vector space, then d is induced by a norm iff it’s translation invariant,
i.e., d(x + z, y + z) = d(x, y) for all x, y, z ∈ V , and absolutely homogeneous, i.e., d(λx, λy) =
|λ|d(x, y) for all x, y ∈ V and λ ∈ F . In this case, we can ‘get back’ the norm by simply
defining kxk = d(0, x). In the above example, d is translation invariant, but not absolutely
homogeneous. We’ll later see an example of an absolutely homogeneous metric which is not
translation invariant.
10. Clearly, two different norms cannot induce the same metric. However, it’s a fact, which we
won’t prove, that there exists a unique topology on F n , viz., the product topology, which
makes it a topological vector space. Therefore every metric on F n which makes it a metric
vector space, induces the same topology.
11. Every norm on a one-dimensional vector space V over F = R/C is induced by an inner
product. In fact, it follows from the parallelogram law that if (V, ‖ ‖) is a normed linear
space, then the norm on V is induced by an inner product iff for every two dimensional
subspace W ≤ V , the restriction ‖ ‖|W is induced by an inner product on W .
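As promised in Remark 2, here is a small numerical sketch, assuming numpy, of the complex polarization identity: the inner product of two hypothetical vectors in C^2 is recovered from the induced norm alone.

    import numpy as np

    def norm(v):
        return np.sqrt(np.vdot(v, v).real)

    x = np.array([1 + 1j, 2 - 1j])
    y = np.array([0 + 2j, 1 + 1j])
    polar = (norm(x + y)**2 - norm(x - y)**2
             + 1j * norm(x + 1j*y)**2 - 1j * norm(x - 1j*y)**2) / 4
    print(np.isclose(polar, np.vdot(y, x)))    # True: recovers <x, y>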
Morphisms
A linear transformation T between inner product spaces (V, h , iV ), (W, h , iW ) is said to be a
morphism (or homomorphism) of inner product spaces if T preserves the inner product, i.e., if
hT (x), T (y)iW = hx, yiV for all x, y ∈ V (Can you see the diagrammatic interpretation?). We’ll
later see that every set-theoretic map preserving the inner product is actually linear (Essentially
because the coefficients of a vector can be written in terms of the inner product.). Note that a
morphism of inner product spaces also preserves the induced norm and metric.
A linear transformation T between normed linear spaces (V, k kV ), (W, k kW ) is said to be a mor-
phism (or homomorphism) of normed linear spaces if T preserves the norm, i.e., if kT (x)kW = kxkV
for all x ∈ V .
If (V, dV ), (W, dW ) are metric vector spaces then a linear transformation T : V → W is said to be a
morphism (or homomorphism) of metric vector spaces if T preserves distance, i.e., it’s an isometry.
If T is also surjective, then we call it an isometric isomorphism.
Finally, if (V, IV ), (W, IW ) are topological vector spaces, then a continuous linear transformation
T : V → W is called a morphism (or homomorphism) of topological vector spaces.
Remarks.
1. If (V, h , i) is an n-dimensional inner product space, then choosing an ordered orthonormal
basis B := (x1 , . . . , xn ) of V is ‘same as’ giving an inner product space isomorphism between
(V, h , i) and (F n , ·). So just like vector spaces, up to isomorphism, there exists a unique
n-dimensional inner product space for every positive integer n. Note that for all x, y ∈ V ,
hx, yi = Σ_{i=1}^{n} hx, xi i \overline{hy, xi i}, just like the usual dot product.
2. A linear transformation between normed linear spaces preserves the norm iff it preserves the
induced metric.
3. Every morphism of inner product spaces/normed linear spaces/metric vector spaces is injective.
4. If T is a linear transformation between inner product spaces V, W then T preserves the inner
product iff it preserves the induced norm. The implication is trivial in one direction, and the
other direction follows from the polarization identities.
Let (V, k k) be a normed linear space over F = R/C. Suppose there exists a symmetric relation on
V , called orthogonality and denoted by x ⊥ y, such that the following properties are satisfied.
1. For all x 6= 0, x⊥ := {y ∈ V | x ⊥ y} is a hyperplane in V .
2. If x, y ∈ V are orthogonal vectors, i.e., x ⊥ y, then ‖x + y‖² = ‖x‖² + ‖y‖².
Then one can check that k k satisfies the parallelogram law, or equivalently, the norm on V is
induced by an inner product. The details are left as an exercise. Note that the first property
allows us to mimic Gram-Schmidt orthogonalization, whereas the second property, Pythagorean
law, captures the essence of orthogonality and connects it to the given norm.
Exercises.
1. HK : Section 8.1 - 3,6,7,8,11,13,14,15,16,17.
2. (i) Let F = R/C and Mn (F ) the set of n × n matrices over F . Then show that the rank
function rk : Mn (F ) → N is a lower semi-continuous function with respect to the
Euclidean metric, i.e., if A ∈ Mn (F ) has rank r then there exists an ε > 0 such that
rk B ≥ r for all B ∈ Mn (F ) satisfying ‖A − B‖ < ε.
Is the rank function continuous?
(ii) Let V be a finite-dimensional normed linear space and x1 , . . . , xn ∈ V linearly indepen-
dent vectors. Then there exists an ε > 0 such that for all y1 , . . . , yn ∈ V , if d(xi , yi ) < ε
for all i, then y1 , . . . , yn ∈ V are also linearly independent.
3. If V is an arbitrary vector space over F = R/C, can you define an inner product on V ?
4. Prove that an inner product space V is separable iff there exists a countable set S ⊆ V such
that the linear span of S is dense in V .
5. Let h , i, h , i0 be two inner products on a vector space V over F . Suppose that S ⊆ V is a
spanning set of V . Then h , i = h , i0 iff hx, yi = hx, yi0 for all x, y ∈ S.
6. Show that an orthonormal set in an inner product space cannot have any limit point.
7. (*) Let S be an orthonormal set in an inner product space V . If x ∈ V , then show that the
set {y ∈ S | hx, yi ≠ 0} is at most countable.
8. (*) Let (V, h , i) be an inner product space. Let Ṽ be the set of all Cauchy sequences of V
with respect to the induced metric. Two Cauchy sequences (xn ), (yn ) ∈ V N are said to be
equivalent if for all ε > 0 there exists a positive integer n_ε ∈ N such that d(xn , yn ) < ε for all
n ≥ n_ε , where d is the induced metric. Let V̂ be the metric space completion of V , i.e., the
set of all equivalence classes of Cauchy sequences of V . Clearly, V can be embedded in V̂
by sending an element of V to the equivalence class of the corresponding constant sequence.
Then show that (V̂ , h , i), with respect to the following definition, also becomes an inner
product space containing V as a dense subspace.
(i) [(xn )] + [(yn )] := [(xn + yn )].
(ii) λ[(xn )] := [(λxn )].
(iii) h[(xn )], [(yn )]i := lim_{n→∞} hxn , yn i.
9. Prove that an infinite-dimensional Hilbert space V cannot have an orthogonal spanning set.
Deduce that a Hilbert space V which is not finite-dimensional must have an uncountable
dimension. In particular, a Hilbert space V is finite dimensional iff it has an orthonormal
spanning set.
10. Let V be an inner product space. Then show that for all x, y ∈ V , kx − yk ≥ | kxk − kyk |,
where k k is the induced norm function.
Now prove the result for an arbitrary normed linear space (without assuming that the norm
is induced by an inner product).
11. Let T : (V h , iV ) → (W, h , iW ) be a linear transformation between inner product spaces.
Then show that the following statements are equivalent.
(i) T preserves inner product, i.e., hT (x), T (y)iW = hx, yiV for all x, y ∈ V .
(ii) For every basis B of V , hT (x), T (y)iW = hx, yiV for all x, y ∈ B.
(iii) There exists a basis B of V such that hT (x), T (y)iW = hx, yiV for all x, y ∈ B.
12. Let (V, h , i) be an n-dimensional inner product space over F . Then show that (V, h , i),
as an inner product space, is isomorphic to (F n , ·), i.e., there exists a linear transformation
T : V → F n such that hx, yi = T (x) · T (y) for all x, y ∈ V .
13. Let V be an inner product space. If W is a subspace of V , then show that W , the closure of
W in V with respect to the induced metric, is also a subspace of V .
Prove that every finite-dimensional subspace in an inner product space is closed with re-
spect to the induced metric (We’ve already seen examples of proper dense subspaces, so all
subspaces need not be closed.).
14. Let (V, h , i) be an inner product space and x1 , . . . , xn ∈ V . Let A ∈ Mn (F ) be the matrix
defined as Aij = hxi , xj i for all i, j. Then prove the following statements.
(a) x1 , . . . , xn are linearly independent iff A is an invertible matrix.
(b) x1 , . . . , xn is an orthogonal sequence iff A is a diagonal matrix. x1 , . . . , xn is an orthog-
onal sequence of nonzero vectors iff A is a diagonal matrix whose every diagonal entry
is nonzero.
(c) x1 , . . . , xn is an orthonormal sequence iff A = In .
(d) If W is the subspace of V generated by x1 , . . . , xn , then rk A = dim W .
15. (*) Let (V, h , i) be an inner product space and S := {x ∈ V | kxk = 1}, the unit sphere of
V . Then show that S is compact iff V is finite dimensional.
16. (*) Show that the following statements are equivalent for an inner product space (V, h , i).
(i) V is finite dimensional.
(ii) Every subspace of V is closed.
(iii) Every hyperplane of V is closed.
(iv) Every linear functional on V is continuous.
17. Let (V, h , i)V , (W, h , i)W be inner product spaces. If T : V → W is a linear transformation,
then prove that the following statements are equivalent.
(i) T preserves the inner product, i.e., hT (x), T (y)iW = hx, yiV for all x, y ∈ V .
(ii) For every orthonormal sequence x1 , . . . , xn ∈ V , the sequence T (x1 ), . . . , T (xn ) ∈ W is
also orthonormal.
If dim V > 1, then the above two conditions are equivalent to
(iii) If u, v ∈ V are orthonormal vectors then so are T (u), T (v) ∈ W .
18. (*) Let A be an m × n matrix over F = R/C. Then show that the following statements are
equivalent.
(i) A has rank m.
(ii) For all b ∈ Mm×1 (F ), there exists a real number ε > 0 (depending only on A) such that
the system of linear equations A′ X = b has a solution in F n for all A′ ∈ Mm×n (F )
satisfying ‖A − A′‖ < ε.
(iii) There exists a nonzero column vector b ∈ Mm×1 (F ) and a real number ε > 0 such
that the system of linear equations A′ X = b has a solution in F n for all
A′ ∈ Mm×n (F ) satisfying ‖A − A′‖ < ε.
19. If A is an n × n matrix over F = R/C, then prove the following statements.
(ii) Give an example of a length preserving map f : V → W which is not an isometry (In
fact, can you give an example where f is continuous only at the origin?).
(iii) If F = R and f : V → W is an isometry such that f (0) = 0, then show that f preserves
the inner product, i.e., hf (x), f (y)iW = hx, yiV for all x, y ∈ V .
What if we replace R by C?
22. If A ∈ Mn (C) is a Hermitian matrix, show that chA (X) ∈ R[X].
23. If (V, h , i) is an inner product space then for all nonzero vectors x, y ∈ V and nonzero real
numbers c1 , c2 , if c1 c2 > 0 then θ(x, y) = θ(c1 x, c2 y).
24. (*) Let V be a vector space over C, the field of complex numbers. Then we can naturally
view V as a vector space over R. Then show that
(i) If h , iC is a complex inner product on V then h , iR : V × V → R, defined as
hx, yiR := Re hx, yiC , is a real inner product on the R-vector space V .
(ii) Conversely, if h , iR is a real inner product on the R-vector space V , then hx, yiC :=
hx, yiR + i hx, iyiR defines a complex inner product on V iff hix, iyiR = hx, yiR for all x, y ∈ V .
25. (*) Let (V, h , iV ), (W, h , iW ) be inner product spaces and T : V → W a linear transforma-
tion. Then show that
(a) The following statements are equivalent.
(i) T preserves length, i.e., kT (x)kW = kxkV for all x ∈ V .
(ii) T preserves length of unit vectors, i.e., kT (x)kW = 1 for all unit vectors x ∈ V .
(b) If T is injective then the following statements are equivalent.
(i) T is angle preserving, i.e., θ(T (x), T (y)) = θ(x, y) for all nonzero vectors x, y ∈ V .
(ii) T preserves the angle between unit vectors, i.e., θ(T (x), T (y)) = θ(x, y) for all
x, y ∈ V satisfying kxkV = kykV = 1.
In both the conditions, the injectivity of T must be assumed as we define angle only
between nonzero vectors.
(c) T preserves inner product iff T preserves the inner product between any two unit vectors
of V .
(d) The following statements are equivalent.
(i) T preserves inner product.
(ii) T preserves length.
(iii) T is an isometry.
(e) If T is length preserving, then it’s also angle preserving.
(f) Give an example of a linear transformation T which is angle preserving, but not length
preserving.
26. (a) If (V, h , i) is an inner product space then show that the induced norm ‖ ‖ : V → R≥0
and the vector space operations +, · are continuous with respect to the induced metric.
(b) If (V, k k) is a normed linear space, then show that the vector space operations +, · are
continuous with respect to the induced metric.
27. (*) Every finite-dimensional subspace of a normed linear space is closed.
28. (*) Let (V, k k) be a Banach Space. Then show that the following statements are equivalent.
(i) V is finite-dimensional.
(ii) Every subspace of V is closed.
(iii) Every hyperplane of V is closed.
(iv) Every linear functional on V is a continuous function.
Can you give an example of an infinite dimensional normed linear space V and a linear
functional f : V → F such that f is not continuous?
29. (*) If V, W are real inner product spaces and f : V → W is a set theoretic map, then show
that the following statements are equivalent.
(a) f preserves inner product.
(b) f preserves both the length of vectors and the angle between nonzero vectors.
If we instead take V and W to be complex inner product spaces, does (b) =⇒ (a)? What
if we further assume that f (ix) = if (x) for all x ∈ V ?
30. (a) Let V be a vector space over C. Then we can as well view V as a vector space over R.
Now show that
(i) Every C-linearly independent sequence of vectors is also R-linearly independent.
(ii) Every R-spanning set of V is also a C-spanning set.
(iii) If S ⊆ V is a C-spanning set, then S ∪ iS is an R-spanning set, where iS := {ix | x ∈
S}.
(iv) If (x1 , . . . , xr ) is a C-linearly independent sequence in V then (x1 , . . . , xr , ix1 , . . . , ixr )
is an R-linearly independent sequence.
(b) Now suppose that (V, h , iC ) is a complex inner product space. Then V becomes a real
inner product space under the induced inner product
hx, yiR := Re hx, yiC , for all x, y ∈ V .
Then show that
(i) Every C-orthogonal set (with respect to h , iC ) is R-orthogonal (with respect to
h , iR ).
(ii) If x, y ∈ V are R-orthogonal, so are λx, µy for all λ, µ ∈ C satisfying λµ̄ ∈ R.
(iii) Two nonzero vectors x, y ∈ V are perpendicular to each other, i.e., θ(x, y) = π/2, iff
hx, yiR = 0, or equivalently, hx, yiC is a purely imaginary number. Therefore, although
R-orthogonality is the same as being perpendicular, C-orthogonality is actually
a ‘stronger’ condition.
(iv) Give an example of mutually perpendicular vectors x, y ∈ V , such that x, iy are
not mutually perpendicular (Note that in a real inner product space, nonzero
multiples of mutually perpendicular vectors remain mutually perpendicular. This
happens because, unlike real numbers, multiplication by a complex number can
impart ‘non-trivial rotation’ on a vector. So the concept of ‘angle’, in some sense,
is essentially a ‘real’ concept.).
(v) If (x1 , . . . , xr ) is a C-orthogonal (respectively, C-orthonormal) sequence, then the
sequence (x1 , . . . , xr , ix1 , . . . , ixr ) is R-orthogonal (respectively, R-orthonormal).
31. Let x, y be nonzero vectors in an inner product space V . Then show that kx + yk2 =
kxk2 + kyk2 iff x and y are mutually perpendicular, i.e., θ(x, y) = π/2.
32. (*) Here we consider the metric, called the French railways metric, defined on, say R2 , as
d(x, y) := ‖x − y‖ if x, y lie on the same line passing through the origin, and
d(x, y) := ‖x‖ + ‖y‖ otherwise,
where k k is the usual Euclidean norm on R2 . Then show that (R2 , d) is a metric vector
space. The metric d is absolutely homogeneous, but not translation invariant.
Remark. The name of the metric comes from the idea that suppose every rail line in France
passes through Paris and Paris is the only junction where two rail lines meet. To visualize
the situation, you may think of an infinite collection of lines passing through the origin, but
there’s no plane as such to traverse!
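A minimal sketch, assuming numpy, of the French railways metric described in exercise 32; collinearity with the origin is tested via the 2 × 2 determinant.

    import numpy as np

    def french_railways(x, y, tol=1e-12):
        x, y = np.asarray(x, float), np.asarray(y, float)
        if abs(x[0]*y[1] - x[1]*y[0]) < tol:   # x, y on a common line through the origin
            return float(np.linalg.norm(x - y))
        return float(np.linalg.norm(x) + np.linalg.norm(y))

    print(french_railways([1, 1], [2, 2]))     # ~1.414: same line through the origin
    print(french_railways([1, 0], [0, 1]))     # 2.0: the trip goes via the origin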
Lecture 20 and 21 (28/12/2020, 30/12/2020) :
Best approximations and orthogonal projections
To understand best approximations and orthogonal projections let us first look at the following
simple real life situation -
“Suppose that a child is locked alone in an empty room with only some balloons hanging from
the ceiling. Now if he/she wants to get hold of one of the balloons what will he/she do?”
The ‘kiddish’ approach to solve this ‘problem’ gives us an idea of best approximations and
orthogonal projections. In a formal mathematical set-up, the problem is to look at R3 and try
to find the (?) point on the XY -plane which is closest to the point p := (0, 0, 1). We all know
from our knowledge in co-ordinate geometry that the desired point is the origin - the orthogonal
projection of p on the XY -plane. We take a line passing through p which is perpendicular to the
XY -plane, and the point of intersection of this line with the XY -plane turns out to be the point
which is closest to p among all points lying on the XY -plane.
Question. Let (X, d) be a metric space, x ∈ X and Y a subset of X. Then does there exist a
point y ∈ Y such that d(x, y) = d(x, Y )? And if such a point exists, is it unique?
Let us first consider a few cases where either best approximation doesn’t exist, or it’s not
unique.
1. If Y is not closed in X and x ∈ Ȳ \ Y , then d(x, Y ) = 0 but d(x, y) > 0 for all y ∈ Y .
Therefore, unless Y is closed, best approximations cannot exist in general. In particular, if
V is an infinite dimensional inner product space then for any subspace W ≤ V which is not
closed, there will be vectors in V without having any best approximation in W . We’ll later
see that in an inner product space, a best approximation, whenever it exists, is unique.
2. If W is a finite-dimensional subspace of a normed linear space (V, k k), then every x ∈ V
has a best approximation in W . To see this, first replace V by V 0 , the finite-dimensional
subspace of V generated by W and x. If d(x, W ) = r, let C be the compact set, defined as
C := {v ∈ V | d(x, v) = r}. As d(C, W ) = 0, and W is a closed subset of V , it follows that
C ∩ W 6= ∅, showing the existence of best approximation.
Best approximations, however, needn’t be unique in normed linear spaces. For example, let
us consider F 2 with respect to the sup norm. If W is the subspace of F 2 generated by (1, 0),
then every vector (λ, 0) ∈ W satisfying |λ| ≤ 1 is a best approximation of (0, 1) in W .
As best approximations may not be unique in a general normed linear space, we’ll mostly talk
about them in the context of inner product spaces. Let us begin with a few definitions.
Definitions. Let W be a subspace of an inner product space (V, h , i). Then for an element
x ∈ V , xW ∈ W is said to be a best approximation of x in W if kx − xW k ≤ kx − wk for all
w ∈ W.
Two sets S1 , S2 in V are said to be mutually orthogonal, or orthogonal to each other, if x ⊥ y for
all x ∈ S1 and y ∈ S2 .
If x ∈ V , then the orthogonal complement of x, denoted by x⊥ , is defined as
x⊥ := {y ∈ V | hx, yi = 0}.
If S ⊆ V , the orthogonal complement of S, denoted by S ⊥ , is defined as
S ⊥ := ∩_{x∈S} x⊥ .
Some basic properties
1. For all x ∈ V , x⊥ is a closed hyperplane in V . It is so because the linear functional on V
which sends an element y ∈ V to hy, xi ∈ F , is a continuous function. In particular, S ⊥ is a
closed subspace of V for all S ⊆ V .
2. If S1 ⊆ S2 ⊆ V , then we have the reverse inclusion S2⊥ ⊆ S1⊥ .
3. Note that 0⊥ = V and V ⊥ = 0. In fact, if W ≤ V is a dense subspace, then W ⊥ = 0.
Surprisingly, there exist examples of inner product spaces V and subspaces W ≤ V , such
that W is not dense in V , but W ⊥ = 0. However, such ‘abnormalities’ cannot occur if V is
complete.
4. If S ⊆ V , then S ⊥ = ⟨S⟩⊥ = \overline{⟨S⟩}^⊥ , where \overline{⟨S⟩} denotes the closure of the linear span of S.
5. For all W ≤ V , W and W ⊥ are linearly independent, i.e., W ∩ W ⊥ = 0.
6. If S ⊆ V , then \overline{⟨S⟩} ⊆ S ⊥⊥ (:= (S ⊥ )⊥ ), as S ⊆ S ⊥⊥ and S ⊥⊥ is a closed subspace of V .
There exist examples to show that the inclusion can be strict. However, when V is complete,
then we’ll later see that we always have an equality.
7. If V1 , V2 are subspaces of V , then the following assertions hold.
(a) V1 ⊥ V2 ⇐⇒ V1 ⊆ V2⊥ ⇐⇒ V2 ⊆ V1⊥ .
(b) V1⊥ + V2⊥ ⊆ (V1 ∩ V2 )⊥ . To see that the inclusion can be strict, let V1 be a proper dense
subspace of V and V2 a one dimensional subspace generated by an element outside V1 .
(c) (V1 + V2 )⊥ = V1⊥ ∩ V2⊥ .
8. Let Wi be a family of subspaces in V such that W := ∪i Wi is also a subspace of V . Then
x ⊥ W iff x ⊥ Wi for all i.
9. If x ∈ V and w, w0 ∈ W are such that x − w ⊥ W and x − w0 ⊥ W , then w = w0 .
10. If x ⊥ W and W 0 is a subspace of W then x ⊥ W 0 .
11. Let x ∈ V and W a subspace of V . If xW ∈ W is a best approximation of x in W , then for
all subspaces W 0 ≤ W , if xW ∈ W 0 then xW = xW 0 .
Proposition. Let W be a finite-dimensional subspace of an inner product space V with an
orthonormal basis (x1 , . . . , xn ), and let x ∈ V . Extend (x1 , . . . , xn ) to an orthonormal basis
(x1 , . . . , xn+1 ) of W + ⟨x⟩ (if x ∈ W , simply take the term involving xn+1 below to be 0). Then
for every y = c1 x1 + . . . + cn xn ∈ W ,
kx − yk2 = |c1 − hx, x1 i|2 + . . . + |cn − hx, xn i|2 + |hx, xn+1 i|2 .
Therefore it’s easy to see that xW := hx, x1 ix1 + . . . + hx, xn ixn is the unique best approximation
of x in W . Also, the association x 7→ xW is a linear transformation.
Conversely, suppose that x − y ⊥ W . Now take any element w ∈ W . Then x − y ⊥ y − w, and
consequently kx − wk2 = kx − yk2 + ky − wk2 . Clearly, for all w 6= y, kx − wk2 > kx − yk2 .
Therefore, y is the unique best approximation of x in W .
To prove the second assertion, let y, y 0 ∈ W be two best approximations of x in W . Then x−y ⊥ W
and x − y 0 ⊥ W . In particular, y − y 0 ⊥ W . But that means y − y 0 ⊥ y − y 0 , implying that y = y 0 .
Remark. From the proposition preceding the above lemma, one may get the impression that
best approximation depends on the chosen ordered orthonormal basis (x1 , . . . , xn+1 ) as we ex-
pressed xW in terms of them. But the above lemma shows that this is not true, the best approxi-
mation does not depend on the chosen orthonormal basis.
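A rough numerical sketch, assuming numpy, of the discussion above: for a hypothetical two-dimensional subspace W of R^3, the best approximation x_W is computed from an orthonormal basis of W (obtained here by a QR factorization), and the residual x − x_W is orthogonal to W.

    import numpy as np

    w1, w2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
    Q, _ = np.linalg.qr(np.column_stack([w1, w2]))   # columns: an orthonormal basis of W
    x = np.array([1.0, 2.0, 3.0])
    xW = Q @ (Q.T @ x)                               # sum of <x, u_i> u_i
    print(np.allclose(Q.T @ (x - xW), 0))            # True: x - x_W is orthogonal to W
    print(np.linalg.norm(x - xW) <= np.linalg.norm(x - w1))   # True: no worse than w1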
Lemma. Let (V, h , i) be an inner product space and W a subspace of V . Suppose that every
x ∈ V has a best approximation in W , say xW . Then the map πW : V → V , sending each x ∈ V
to its best approximation xW ∈ W , has the following properties
(i) πW is a linear transformation.
(ii) πW is a projection with im πW = W and ker πW = W ⊥ , implying that V = W ⊕ W ⊥ is an
orthogonal decomposition.
Proof. Let x, y ∈ V . If W 0 is a finite-dimensional subspace of W containing both xW and yW ,
then πW 0 (x) = xW 0 = xW = πW (x) and πW 0 (y) = yW 0 = yW = πW (y). Now the linearity follows
from the finite-dimensional case.
The rest of the proof is left as an exercise.
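For concreteness, here is a small numerical sketch of πW in coordinates (NumPy and the random test data are my choices for illustration, and are not part of HK): if the columns of a matrix Q form an orthonormal basis of a finite-dimensional W ≤ F n , then πW (x) = Q Q∗ x.

```python
import numpy as np

rng = np.random.default_rng(0)

# W = a 2-dimensional subspace of C^5, spanned by two random vectors;
# the columns of Q form an orthonormal basis of W (Gram-Schmidt via QR).
A = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))
Q, _ = np.linalg.qr(A)

def proj_W(x):
    """Orthogonal projection of x onto W: x_W = sum_i <x, q_i> q_i = Q Q* x."""
    return Q @ (Q.conj().T @ x)

x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
xW = proj_W(x)

# x - x_W is orthogonal to W, and x_W beats any other element of W:
print(np.allclose(Q.conj().T @ (x - xW), 0))
w = Q @ (rng.standard_normal(2) + 1j * rng.standard_normal(2))   # some other element of W
print(np.linalg.norm(x - xW) <= np.linalg.norm(x - w) + 1e-12)
```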
We can sum up the relation between best approximations and orthogonal projections in
the form of the following proposition.
Remarks.
1. If π : V → V is an orthogonal projection, then kπ(x)k ≤ kxk for all x ∈ V , with the equality
holding iff x is contained in the image of π. In particular, π is a continuous linear operator.
But a continuous projection, on the other hand, need not be orthogonal. For example, every
projection in F n is continuous, but they are seldom orthogonal.
A projection in an infinite dimensional inner product space may not be continuous. For
example, one may take V := F (N) with the usual dot product and π : V → V the projection
defined as π(en ) := ne1 for all n ≥ 1, where e1 , e2 , . . . is the natural orthonormal basis of V .
2. If W ≤ V admits an orthogonal projection, then W must be closed in V . The converse is
not true in general. In fact, there are even examples of closed subspaces W ≤ V , such that
W ⊥ = 0. However, we’ll later see that the converse is true if V is complete.
3. If V is a finite dimensional inner product space, then dim V = dim W + dim W ⊥ for all W ≤ V .
If V1 , . . . , Vr are mutually orthogonal subspaces of V then V = ⊕i Vi is an orthogonal
decomposition iff dim V = Σi dim Vi .
4. If V = V1 ⊕ V2 is an orthogonal decomposition, then V1⊥ = V2 and V2⊥ = V1 .
5. If V1 , V2 are subspaces of an inner product space V , then V1 ⊥ V2 implies that the closures of
V1 and V2 in V are also mutually orthogonal. This follows from the fact that the inner product is a continuous function.
Bessel’s inequality
Let x1 , . . . , xn be an orthonormal sequence of vectors in an inner product space (V, h , i). Then
for each x ∈ V , the best approximation of x in W :=< x1 , . . . , xn > is given by
xW = hx, x1 ix1 + . . . + hx, xn ixn ,
and this is true for every orthonormal sequence in V . Consequently, if S ⊆ V is a set of nonzero
orthogonal vectors, then
Σy∈S |hx, yi|2 / kyk2 ≤ kxk2 .
In fact, at most countably many summands in the above sum can be nonzero. To see this, let
Sx := {y ∈ S | hx, yi 6= 0}. If Sn := {y ∈ S | |hx, y/kyki| ≥ 1/n}, then Sn is a finite set for every
positive integer n. Since Sx = ∪∞n=1 Sn , the assertion follows.
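The following sketch (NumPy and random data, purely for illustration) checks Bessel's inequality for a small orthonormal sequence obtained from a QR factorization.

```python
import numpy as np

rng = np.random.default_rng(1)

# An orthonormal sequence x_1, x_2, x_3 in C^6 (the columns of Q) and a vector x.
Q, _ = np.linalg.qr(rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3)))
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)

bessel_sum = np.sum(np.abs(Q.conj().T @ x) ** 2)   # sum_i |<x, x_i>|^2
print(bessel_sum <= np.linalg.norm(x) ** 2 + 1e-12)
```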
Exercises.
1. HK : Section 8.2 - 5,6,7,8,9,10,13,15,17.
2. Let V be an inner product space and T : V → V a linear operator. If T commutes with
every orthogonal projection of V , then show that T is a scalar operator.
(i) f preserves the real part of the inner product, i.e., Re hf (x), f (y)iW = Re hx, yiV for
all x, y ∈ V .
(ii) f is R-linear, i.e., f is an additive group homomorphism and f (λx) = λf (x) for all
x ∈ V and λ ∈ R. In particular, if F = R, then f is an injective linear transformation.
If F = C, must then f be a linear transformation?
Does the answer change if we further assume that f (ix) = if (x) for all x ∈ V ?
4. (i) Let M be an n × n matrix over F = R/C. If y ∗ M x = 0 for all x, y ∈ F n , then show
that M = 0.
Deduce that A ∈ Mn (F ) is a Hermitian matrix iff Ax · y = x · Ay for all x, y ∈ F n .
(ii) If A ∈ Mn (F ) is a Hermitian matrix and we think of A as a linear operator on F n , then
show that ker A ⊥ im A.
Deduce that for every Hermitian matrix A ∈ Mn (F ), there exists an invertible matrix
P ∈ Mn (F ) such that P P ∗ = I and P ∗ AP is a real diagonal matrix.
5. (i) If A ∈ Mn (F ) is a Hermitian matrix and f (X) ∈ R[X], then show that f (A) is also a
Hermitian matrix.
What happens if we take f (X) ∈ C[X]?
(ii) Let A ∈ Mn (F ) be a Hermitian matrix. We think of A as a linear operator on F n .
Suppose that f (X), g(X) ∈ R[X] are relatively prime polynomials. If f g(A) = 0 then
show that ker f (A) ⊥ ker g(A).
6. Let V be a finite-dimensional inner product space. A linear operator T : V → V is said to
be orthogonally diagonalizable if there exists an ordered orthonormal basis B := (x1 , . . . , xn )
such that [T ]B is a diagonal matrix, i.e., there exists an orthonormal basis of V consisting
of eigenvectors of T . Show that a projection T : V → V is an orthogonal projection iff T is
orthogonally diagonalizable.
7. Let Y be a subspace of an inner product space V and x ∈ V . If (yn )n∈N is a sequence
of elements in Y such that limn→∞ d(x, yn ) = d(x, Y ), then show that (yn )n∈N is a Cauchy
sequence.
Deduce that if Y is a closed subspace of a Hilbert space V then every element x ∈ V has a
unique best approximation in Y .
8. Show that the following statements are equivalent in a Hilbert space V .
(i) W ≤ V is a closed subspace of V .
(ii) There exists a continuous projection π : V → V such that the image of π is W .
(iii) W admits an orthogonal projection, i.e., there exists an orthogonal projection π : V → V
whose image is W . In particular, if W ≤ V , then W = W ⊥⊥ .
(iv) V = W ⊕ W ⊥
(v) Every element x in V has a unique best approximation in W .
Deduce that if V is a Hilbert space and W ≤ V , then W ⊥ = 0 iff W = V . In particular,
every maximal orthonormal set of V is an orthonormal basis.
9. Let (V, h , i) be an inner product space. If x, y ∈ V then show that x ⊥ y iff kxk ≤ kx + λyk
for all λ ∈ F .
Deduce that a projection π : V → V is an orthogonal projection iff kπ(x)k ≤ kxk for all
x∈V.
10. Let (V, h , i) be an inner product space and π : V → V an orthogonal projection. Then show
that hπ(x), yi = hx, π(y)i = hπ(x), π(y)i for all x, y ∈ V .
Deduce that if W ≤ V is π-invariant, then so is W ⊥ .
11. Let V be an inner product space and π1 , π2 ∈ L(V ) orthogonal projections. Then show that
(iii) π1 π2 is an orthogonal projection iff it’s a projection. Deduce that if π1 π2 = π2 π1 , then
π1 π2 is an orthogonal projection.
(iv) If π1 π2 is an orthogonal projection then show that the image of π2 is π1 -invariant.
Deduce that π1 π2 is an orthogonal projection iff π1 π2 = π2 π1 .
12. (*) If V1 , V2 are mutually orthogonal subspaces of a Hilbert space V , then prove that
the closure of V1 + V2 in V is equal to the sum of the closures of V1 and V2 in V .
Does the equality hold if we do not assume V to be complete?
Does the equality hold if we only assume V1 and V2 to be linearly independent, but not
mutually orthogonal?
Hint. To answer the second question, let X be an inner product space which is not complete,
with completion X̂. Let V be the subspace of X̂ ⊕ X̂ generated by X ⊕ X and an element (x, x)
such that x ∈ X̂ \ X. Then V1 := X ⊕ 0, V2 := 0 ⊕ X are both closed in V , but V1 + V2 6= V ,
while the closure of V1 + V2 in V is V .
13. (*) Let (xn )n∈N , (yn )n∈N be two sequences in an inner product space (V, h , i). Suppose that
limn→∞ kxn − yn k = 0. If the set {x1 , x2 , . . . } generates a finite-dimensional subspace of V , is
the same true for the set {y1 , y2 , . . . }?
14. (*) Make the following statement precise, and then prove it.
‘If V is a finite-dimensional inner product space, then the set of all orthogonal projections of
V is nowhere dense in the set of all projections of V .’
Hint. Can you interpret it in the language of matrices?
Lecture 22, 23, 24 and 25 (4/1/2021, 8/1/2021, 11/1/2021,
13/1/2021) :
Orthogonal diagonalization
For linear operators on arbitrary vector spaces, the ‘nicest’ ones are those which can be diagonal-
ized. But now that we’ve added more structure to an arbitrary vector space, we can talk about
an even more special class of linear operators - the ‘orthogonally diagonalizable’ ones, i.e., those
diagonalizable operators which also respect the inner product.
As a general principle, we can take almost every result about linear operators on finite-dimensional
vector spaces, throw the word ‘orthogonal’ into the mix, and get a result about linear operators
on finite-dimensional inner product spaces.
While discussing orthogonal diagonalization, most of the time we’ll restrict ourselves to finite-
dimensional inner product spaces, as a purely algebraic approach isn’t suitable to deal with general
linear operators on an infinite dimensional inner product space in any satisfactory way. One needs
serious analytic machinery to study them which falls in the realm of functional analysis.
Definitions. Let (V, h , i) be an inner product space and T ∈ L(V ) a linear operator. Then T
is said to be orthogonally diagonalizable if it’s diagonalizable and distinct eigenspaces are mutually
orthogonal. If V is finite-dimensional, then T is orthogonally diagonalizable iff there exists an or-
dered orthonormal basis B := (x1 , . . . , xn ) of V such that [T ]B is a diagonal matrix. If F = C, an
orthogonally diagonalizable linear operator on V is sometimes also called a unitarily diagonalizable
linear operator. Perhaps the name is motivated by the fact that complex inner product spaces are
also known as unitary spaces.
A matrix U ∈ Mn (C) is called a unitary matrix if U U ∗ = U ∗ U = I. A real unitary matrix is also
called an orthogonal matrix. Note that a matrix A ∈ Mn (F ) is unitary iff its column vectors form
an orthonormal basis of F n . A unitary matrix is automatically invertible.
Two matrices A, B ∈ Mn (R) are said to be orthogonally equivalent if there exists an orthogonal
matrix P ∈ Mn (R) such that B = P t AP . In particular, we say that A ∈ Mn (R) is orthogo-
nally diagonalizable if it’s orthogonally equivalent to a diagonal matrix. Similarly, two matrices
A, B ∈ Mn (C) are said to be unitarily equivalent if there exists a unitary matrix U ∈ Mn (C) such
that B = U ∗ AU . In particular, A ∈ Mn (C) is said to be unitarily diagonalizable if it’s unitarily
equivalent to a diagonal matrix. Note that both orthogonal equivalence and unitary equivalence
are indeed equivalence relations. We’ll later see that if two real matrices are unitarily equivalent,
then they are also orthogonally equivalent.
Like orthogonal diagonalization, we can define a linear operator T on a finite dimensional inner
product space V to be orthogonally triangulable if there exists an orthonormal basis B of V such
that [T ]B is an upper (or lower) triangular matrix. But this is not particularly interesting since,
by Gram-Schmidt orthogonalization, a linear operator T is orthogonally triangulable iff it’s trian-
gulable. In particular, every complex n × n matrix is orthogonally triangulable.
Lemma. Let V be an inner product space and f : V → F a linear functional with W := ker f .
Then f is continuous iff W is closed.
Proof. Let V̂ be a completion of V with V ⊆ V̂ being a dense subspace. Let W̄ be the closure
of W in V̂ . Then W̄ 6= V̂ , as W̄ ∩ V = W 6= V (we’re assuming f 6= 0). Since V̂ is complete and W̄
is a proper closed subspace, W̄ ⊥ 6= 0. Let y ∈ W̄ ⊥ be a nonzero element and φy := h , yi. Then φy
is a continuous linear functional on V̂ . As φy |V and f have the same kernel, f must be a nonzero
multiple of φy |V , and hence continuous.
The implication in the other direction is trivial.
If (V, h , i) is an inner product space, we’ve seen that every x ∈ V ‘gives’ a continuous linear
functional φx : V → F , defined as φx (y) := hy, xi for all y ∈ V . Then ker φx = x⊥ is a closed
hyperplane of V . Since we have an orthogonal decomposition, V = x⊥ ⊕ < x >, x⊥⊥ =< x >.
If f ∈ V ∗ is a continuous linear functional, then we say that f comes from the inner product if
there exists an element xf ∈ V such that f = φxf . From the general theory of linear functionals,
it follows that for all u, v ∈ V , < u >=< v > iff ker φu = ker φv , or equivalently, iff u⊥ = v ⊥ .
The association x 7→ φx gives an R-linear homomorphism Φ : V → V ∗ . Consequently, Φ is a
linear transformation if F = R; and when F = C, it gives a conjugate-linear transformation,
i.e., Φ(λx) := φλx = λ̄Φ(x). Note that Φ is always injective, and it’s surjective whenever V is
finite-dimensional. In fact, when F = R, Φ gives a natural linear isomorphism between V and
V ∗ . When V is finite-dimensional, we can explicitly describe the element xf ∈ V . To do so, let
B := (x1 , . . . , xn ) be an ordered orthonormal basis of V . Then ci := f (xi ) = hxi , xf i, i.e., hxf , xi i = c̄i , for all i = 1, . . . , n,
implying that
xf = c̄1 x1 + . . . + c̄n xn .
(iii) f is continuous, i.e., W is closed, and further W ⊥ 6= 0. Then f comes from the inner product
because for every nonzero element y ∈ W ⊥ , f and h , yi have the same kernel.
In a nutshell, f ∈ V ∗ comes from the inner product iff (ker f )⊥ 6= 0, for which W being closed, or
equivalently, f being continuous, is a necessary condition, but not sufficient as we’ll shortly see.
If V is an inner product space of infinite dimension, then we know that dim V ∗ > dim V .
Therefore the map Φ : V → V ∗ cannot be surjective. In fact, for such a V , we can construct
any number of linear functionals which are not continuous. Just take an orthonormal sequence
(x1 , x2 , . . . ) in V and define a linear functional f on V such that f (xn ) := n for all n ≥ 1 (extending
f arbitrarily to a basis of V containing the xn ). Clearly, f isn’t continuous, and therefore cannot come from the inner product.
Another example is V := C([0, 1], F ), the set of all continuous F -valued functions defined on the
closed unit interval, with the inner product being defined as
hf, gi := ∫01 f (t)ḡ(t) dt.
Then any nonzero linear combination of evaluation maps is not continuous, and therefore cannot
come from the inner product. To see this, note that if α1 , . . . , αr ∈ [0, 1] are finitely many points,
then we can construct a sequence of continuous functions (fn )n∈N ∈ V N such that limn→∞ kfn k = 0,
and for each n ∈ N, fn (α1 ) = 1 and fn (αi ) = 0 for all i = 2, . . . , r.
However, if V is a Hilbert space, then every continuous linear functional defined on V comes
from the inner product.
Riesz representation theorem. If (V, h , i) is a Hilbert space then for every continuous
linear functional f ∈ V ∗ , there exists a unique element xf ∈ V such that f = h , xf i.
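In finite dimensions the representing vector can be computed directly from the formula given above. Here is a rough NumPy sketch (the particular functional f and the random data are made up for illustration and are not part of the notes).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# An orthonormal basis of C^4 (columns of Q) and a linear functional f;
# here f is simply given by a row vector c, i.e. f(y) = c @ y.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
c = rng.standard_normal(n) + 1j * rng.standard_normal(n)
f = lambda y: c @ y

# x_f = conj(f(x_1)) x_1 + ... + conj(f(x_n)) x_n, so that f(y) = <y, x_f>.
xf = sum(np.conj(f(Q[:, i])) * Q[:, i] for i in range(n))

y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
print(np.allclose(f(y), y @ np.conj(xf)))   # <y, x_f> = sum_k y_k conj((x_f)_k)
```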
A linear transformation between normed linear spaces is said to be continuous if it is continuous
with respect to the induced metrics. Below we discuss a related concept which is sometimes useful
for dealing with continuity in a more efficient manner.
Definitions. Let (V, k kV ), (W, k kW ) be normed linear spaces over F = R/C with V 6= 0 and
T : V → W a linear transformation. Then the operator norm of T , denoted by kT kop , is defined
as
kT kop := sup{ kT (x)kW / kxkV | x ∈ V \ {0} } = sup{ kT (x)kW | x ∈ V, kxkV = 1 }.
Simply put, kT kop is the ‘maximum stretch imparted on the vectors of V by T ’. If V = W and
T : V → V is a linear operator, then T is said to be a bounded linear operator if kT kop < ∞.
Similar definitions apply to linear transformations between inner product spaces by considering
the induced norms.
Remarks.
1. As norm is translation invariant, a linear transformation T : V → W is continuous iff it’s
continuous at the origin.
2. A linear transformation T between normed linear spaces is continuous iff kT kop < ∞.
3. For every linear transformation T , kT kop ≥ 0; and kT kop = 0 iff T = 0.
4. If T = λI is a scalar operator then kλIkop = |λ|. In fact, if λ ∈ F is an eigenvalue of T then
|λ| ≤ kT kop .
5. If T : V → W is a continuous linear transformation then kT (x)kW ≤ kT kop kxkV for all
x∈V.
6. If V is a finite-dimensional normed linear space, then every linear transformation defined on
V is continuous. One can see this by ‘identifying’ V with F n with the Euclidean norm, and
noting that any two norms on a finite-dimensional inner product space are equivalent.
7. If S, T ∈ L(V, W ) are linear transformations then kS + T kop ≤ kSkop + kT kop . If λ ∈ F and
T : V → W is a continuous linear transformation, then kλT kop = |λ| kT kop . Therefore, the
set of all continuous linear transformations from V to W is a subspace of L(V, W ), denoted
by BL(V, W ), which becomes a normed linear space under the operator norm.
8. If T1 : V1 → V2 , T2 : V2 → V3 are continuous linear transformations then so is the composition
T2 ◦T1 : V1 → V3 . In fact kT2 ◦T1 kop ≤ kT1 kop kT2 kop . Therefore, the set of all bounded linear
operators on a normed linear space V forms an F -subalgebra of L(V ), denoted by BL(V ).
9. The continuity of linear transformations becomes important only when the inner product
spaces/normed linear spaces involved are not of finite dimension, because every linear trans-
formation between normed linear spaces of finite dimension is always continuous.
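For a matrix A acting between Euclidean spaces, kAkop is the largest singular value of A. The following NumPy sketch (the data are random and purely illustrative) compares a brute-force estimate of the supremum in the definition with numpy.linalg.norm(A, 2).

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))              # a linear map R^6 -> R^4

# Brute-force estimate of sup{ ||A x|| : ||x|| = 1 } over random unit vectors ...
xs = rng.standard_normal((6, 10000))
xs /= np.linalg.norm(xs, axis=0)
sup_estimate = np.max(np.linalg.norm(A @ xs, axis=0))

# ... versus the exact operator norm, the largest singular value of A.
print(sup_estimate, np.linalg.norm(A, 2))    # the estimate is a slight underestimate
```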
Now we introduce adjoints of linear transformations, an indispensable tool in studying orthog-
onal diagonalizability of linear operators. This is modelled on and generalizes the transpose map
A 7→ At of a linear operator A on (Rn , ·), which satisfies Ax · y = x · At y for all x, y ∈ Rn .
A linear operator T is closely related to its adjoint T ∗ as the following result shows.
(i) T ∗ has an adjoint. In fact, T ∗∗ := (T ∗ )∗ = T .
(ii) im T ∗ ⊥ ker T and im T ⊥ ker T ∗ .
(iii) If V, W are further finite-dimensional, then
(a) rk T = rk T ∗ ; and null T = null T ∗ iff dim V = dim W .
(b) V = ker T ⊕ im T ∗ and W = ker T ∗ ⊕ im T are orthogonal decompositions.
(c) T is injective iff T ∗ is surjective, and T ∗ is injective iff T is surjective. In particular, T
is an isomorphism iff T ∗ is an isomorphism. In this case, (T ∗ )−1 = (T −1 )∗ .
(d) rk T T ∗ = rk T ∗ T = rk T = rk T ∗ .
Proof. (i) follows directly from the definition of adjoint.
To prove (ii), let y ∈ ker T . Then hy, T ∗ (x)iV = hT (y), xiW = 0, implying that im T ∗ ⊥ ker T .
The second assertion follows since T ∗∗ = T .
From (ii), it follows that rk T ∗ ≤ rk T . Also, rk T = rk T ∗∗ ≤ rk T ∗ . Together, we get (iii)-(a).
(iii)-(b) follows from (ii) and (iii)-(a).
The first two assertions in (iii)-(c) follow from (iii)-(b). For the third assertion, let x ∈ V, y ∈ W .
Then
hT −1 (y), xiV = hT −1 (y), T ∗ (T ∗ )−1 (x)iV = hT T −1 (y), (T ∗ )−1 (x)iW = hy, (T ∗ )−1 (x)iW .
Therefore (T ∗ )−1 = (T −1 )∗ .
Finally, (iii)-(d) follows from (ii). Actually, we’ve already seen the ‘matrix-theoretic version’ of (iii)-(d).
We can also give matrix-theoretic proofs of (iii) after we’ve done matrix representations of adjoint
linear transformations. Please do it!
Although we need adjoints only for linear operators on finite dimensional inner product spaces,
we prove a more general result as it doesn’t require much of an extra effort.
Theorem. Let (V, h , iV ), (W, h , iW ) be inner product spaces with V being complete. Then
every continuous linear transformation T : V → W has an adjoint T ∗ which is also continuous. In
fact, kT kop = kT ∗ kop .
kT ∗ (y)k2V = hT ∗ (y), T ∗ (y)iV = |hT T ∗ (y), yiW | ≤ kT T ∗ (y)kW kykW ≤ kT kop kT ∗ (y)kV kykW ,
implying that kT ∗ (y)kV ≤ kT kop kykW ; and consequently, kT ∗ kop ≤ kT kop . Similarly, kT kop ≤
kT ∗ kop , and together, we get kT kop = kT ∗ kop
Remark. It’s a fact which we are not going to prove that if a linear operator T on a Hilbert
space V has an adjoint T ∗ , then T must be continuous.
Proof. The ij-th entry of [T ∗ ]BW ,BV is the xi -th coefficient of T ∗ (yj ), which is equal to
hT ∗ (yj ), xi iV , the conjugate of hxi , T ∗ (yj )iV = hT (xi ), yj iW , i.e., the conjugate of the ji-th entry of [T ]BV ,BW .
Hence the assertion follows.
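As a quick sanity check of the defining property hT (x), yiW = hx, T ∗ (y)iV in coordinates, here is a NumPy sketch where T is given by a complex matrix A on (Cn , ·) and T ∗ by the conjugate-transpose A∗ (the random data are purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A_star = A.conj().T                          # matrix of the adjoint w.r.t. the standard basis

inner = lambda u, v: u @ np.conj(v)          # <u, v> = sum_k u_k conj(v_k)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

print(np.allclose(inner(A @ x, y), inner(x, A_star @ y)))
```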
Immediately, we see that the matrix representation of a self-adjoint operator on a finite-
dimensional inner product space with respect to an ordered orthonormal basis is Hermitian.
Corollary. Let V be a finite-dimensional inner product space. If T ∈ L(V ) then the following
statements are equivalent.
(i) T is a self-adjoint operator.
(ii) If B is an ordered orthonormal basis of V , then [T ]B is a Hermitian (symmetric, if F = R)
matrix.
(iii) There exists an ordered orthonormal basis B of V , such that [T ]B is a Hermitian (symmetric,
if F = R) matrix.
Proof. Left as an exercise.
If you’ve been paying attention, similarities between adjoint and transpose of a linear transfor-
mation couldn’t have escaped your eyes. It’s time to make the relation precise.
induced by that element via inner product. Then ΦV , ΦV 0 are linear isomorphisms if F = R,
and conjugate-linear isomorphisms if F = C. In particular, every finite-dimensional real inner
product space is naturally isomorphic to its dual. If W is a subspace of V , then one can check that
ΦV (W ⊥ ) = W ◦ , so that we can ‘naturally identify’ W ⊥ with W ◦ .
If T : V → V 0 is a linear transformation, then we have the following commutative diagram of
R-linear transformations

    V 0 −−−T ∗−−→ V
     ↓ ΦV 0          ↓ ΦV
  (V 0 )∗ −−−T t−−→ V ∗

where ΦV 0 : x0 7→ h , x0 iV 0 , ΦV : x 7→ h , xiV , and T t is the transpose of T ; i.e., ΦV ◦ T ∗ = T t ◦ ΦV 0 .
Note that the vertical arrows are R-linear isomorphisms (conjugate linear, if F = C). It may be
quite helpful to keep this connection in mind.
Examples
1. 0∗ = 0, and I ∗ = I. In fact, for every scalar operator λI ∈ L(V ), we’ve (λI)∗ = λ̄I. In
particular, a scalar operator λI is self-adjoint iff λ ∈ R. Note that every scalar operator has
an adjoint, irrespective of the nature of V .
2. Let V := Mn (F ) where the inner product is given by hM, N i := tr (N ∗ M ). If A ∈ Mn (F ) is
a fixed matrix, then we can consider the linear operator LA given by the left multiplication
by A, i.e., LA (B) := AB for all B ∈ Mn (F ). Now one can check that (LA )∗ = LA∗ , so that
LA is self-adjoint iff A is a Hermitian matrix.
Formulate a similar result for RA , the right multiplication by A.
Next, let A, B ∈ Mn (F ) be fixed matrices. Now consider the linear operator TA,B : Mn (F ) →
Mn (F ), given by TA,B (M ) := AM B for all M ∈ Mn (F ). Then TA,B = LA ◦ RB = RB ◦ LA
and one can check that (TA,B )∗ = TA∗ ,B ∗ .
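Here is a NumPy sketch (illustrative only, with random data) verifying (LA )∗ = LA∗ numerically for the trace inner product hM, N i = tr (N ∗ M ).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
A, B, C = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)) for _ in range(3))

inner = lambda M, N: np.trace(N.conj().T @ M)       # <M, N> := tr(N* M)

# (L_A)* = L_{A*}:  <A B, C> = <B, A* C> for all B, C.
print(np.allclose(inner(A @ B, C), inner(B, A.conj().T @ C)))
```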
3. Let V := F [X], the polynomial ring in one variable over F = R/C, with the inner product
being defined as hf, gi := ∫01 f (t)ḡ(t) dt. Let φ ∈ F [X] be a fixed polynomial and Tφ : V → V
the linear operator given by the multiplication by φ. Then it’s easy to see that (Tφ )∗ = Tφ̄ ,
where φ̄ ∈ F [X] is obtained from φ by replacing its coefficients by their conjugates. Therefore
Tφ is self-adjoint iff φ ∈ R[X]. For each φ ∈ F [X], Tφ is continuous; but it doesn’t have
any eigenvalue unless φ is a constant polynomial. So we have examples of continuous self-
adjoint operators without any eigenvalues! ‘Oddities’ like this dissuade us from attempting
to classify orthogonally diagonalizable operators on infinite dimensional inner product spaces.
4. Let V = `2 (F ) := {(xn ) ∈ F N | Σn |xn |2 < ∞}, the set of all square-summable sequences
over F . Then V is a Hilbert space containing F (N) as a dense subspace. Let R be the right
translation on V defined as
R((x1 , x2 , . . . )) := (0, x1 , x2 , . . . ).
Then R preserves the inner product, and hence is continuous. Its adjoint is the left translation L on V defined as
L((x1 , x2 , . . . )) := (x2 , x3 , . . . ),
which is also continuous. Note that R is injective, L is surjective and L ◦ R = Id, but
R ◦ L 6= Id. Also, R doesn’t have any eigenvalues.
5. Let W be a subspace of an inner product space V and ι : W ,→ V the natural inclusion. If ι∗
exists then y − ι∗ (y) ∈ W ⊥ for all y ∈ V , implying that ι∗ = πW , the orthogonal projection
of V onto W . Therefore, for every W ≤ V which does not admit any orthogonal projection,
for example, when W is not closed, ι is a continuous linear transformation without having
any adjoint.
6. Let us consider V := F (N) with the usual dot product, and T : V → V be the linear operator
defined as T (en ) := ne1 for all n ≥ 1. Then T is not continuous. We claim that T doesn’t
have any adjoint. To see this, consider the sequence (en /n)n∈N . If T ∗ exists then
1 = hT (en /n), e1 i = hen /n, T ∗ (e1 )i for all n ≥ 1; but en /n → 0, so hen /n, T ∗ (e1 )i → 0,
which is a contradiction.
In the following discussion, S, T are assumed to be linear operators on an inner product space
V , both having an adjoint. If you’re not comfortable with this general set-up, just assume V to
be finite-dimensional so that every linear operator is automatically continuous and has an adjoint.
The purpose of stating things in more generality is twofold. Firstly, restricting ourselves only to
finite dimensional inner product spaces doesn’t make too much of a difference. And secondly, I
want you to actively extract the arguments for the finite dimensional case, which, at times, are
somewhat simpler.
1. If B is an ordered orthonormal basis of the finite-dimensional inner product space V , then
the following diagram commutes

     L(V ) −−T 7→T ∗−−→ L(V )
       ↓ T 7→[T ]B          ↓ T 7→[T ]B
    Mn (F ) −−A7→A∗−−→ Mn (F )
Therefore it’s no coincidence that we’re using the same symbol to denote the adjoint of a
linear operator, as well as the conjugate-transpose of a matrix! Note that if we consider
Mn (F ), the set of n × n matrices over F , then A 7→ Ā is an isomorphism, whereas A 7→ At
and A 7→ A∗ are anti-isomorphisms.
From an even more naive perspective, if we somehow ignore the fact that taking adjoints
switches the order of multiplication, the adjoint map ‘just looks like’ the conjugation map on
C. This ‘erroneous’ way of viewing adjoints actually turns out to be highly illuminating,
especially in the case of orthogonally diagonalizable operators on finite-dimensional complex
inner product spaces, as we’re soon going to see.
2. If W ≤ V is a T -invariant subspace, then W ⊥ is T ∗ -invariant. To see this, let x ∈ W and
y ∈ W ⊥ . Then hx, T ∗ (y)i = hT (x), yi = 0, implying that T ∗ (y) ∈ W ⊥ .
In particular, if V = ⊕i Vi is a T -invariant orthogonal decomposition of V , then the decom-
position is also T ∗ -invariant. This follows because, for each i, Ṽi := ⊕j6=i Vj is a T -invariant
subspace with Ṽi⊥ = Vi .
9. For every linear operator T which has an adjoint T ∗ , the linear operators T + T ∗ , T T ∗ , T ∗ T
are all self-adjoint.
10. Let T ∈ L(V ) be an orthogonally diagonalizable operator. If W ≤ V is a T -invariant
subspace then T |W is also orthogonally diagonalizable. If V = ⊕λ∈F Vλ is the orthogonal
decomposition of V into the eigenspaces of T , then W = ⊕λ∈F Wλ , where Wλ = Vλ ∩ W
for all λ ∈ F . For each λ ∈ F , if Wλ0 is the orthogonal complement of Wλ in Vλ , then
W ⊥ = ⊕λ∈F Wλ0 . To see this, first note that ⊕λ∈F Wλ0 ⊆ W ⊥ . Conversely, let y ∈ W ⊥ .
Then we can write y as y = Σλ yλ , where yλ ∈ Vλ for all λ ∈ F and yλ = 0 for all but finitely
many λ. If yµ ∈/ Wµ0 for some µ ∈ F , then there exists xµ ∈ Wµ such that hxµ , yµ i 6= 0,
implying that hxµ , yi = hxµ , yµ i 6= 0. But as xµ ∈ W , it implies that y ∈/ W ⊥ , which is
a contradiction. Therefore W ⊥ = ⊕λ∈F Wλ0 is also T -invariant and T |W ⊥ is orthogonally
diagonalizable.
We’re now in a position to discuss orthogonal diagonalizability of a linear operator on a finite-
dimensional inner product space. Suppose that a linear operator T on a finite-dimensional inner
product space V is orthogonally diagonalizable. If B is an ordered orthonormal basis of V such
that A := [T ]B is a diagonal matrix, then [T ∗ ]B = A∗ is also a diagonal matrix, implying that
T and T ∗ have a common orthonormal eigenbasis. In particular, T T ∗ = T ∗ T ; and when F = R,
A = A∗ , i.e., T is a self-adjoint operator. We are soon going to see that these necessary conditions
for orthogonal diagonalizability are also sufficient.
Proof. With C being algebraically closed, every linear operator on a finite-dimensional com-
plex inner product space is triangulable. Therefore (i) follows directly from the above theorem.
To prove (ii), let T be a self-adjoint operator on a finite-dimensional real inner product space.
Choose an orthonormal basis B := (x1 , . . . , xn ) of V and consider the matrix representation
A := [T ]B ∈ Mn (R). Then A is a symmetric matrix, and hence Hermitian, if treated as a matrix
over C. If λ ∈ C is an eigenvalue of A corresponding to an eigenvector x ∈ Cn , then Ax · x = x · Ax,
implying that λ = λ̄, or equivalently, λ ∈ R. It shows that every eigenvalue of A, and hence of
T , is real. Therefore T is a triangulable normal operator over R, implying that T is orthogonally
diagonalizable.
The matrix theoretic version of the corollary is left as an exercise.
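Numerically, the real symmetric case is exactly what numpy.linalg.eigh computes; the sketch below (random data, for illustration only) produces an orthogonal P such that P t AP is real diagonal.

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                             # a real symmetric matrix

eigvals, P = np.linalg.eigh(A)                # columns of P: an orthonormal eigenbasis
print(np.allclose(P.T @ P, np.eye(4)))                 # P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(eigvals)))      # P^t A P is real diagonal
```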
In the following remarks, unless mentioned otherwise, S, T are assumed to be linear operators
on a finite-dimensional inner product space V .
Remarks.
1. A linear operator T is self-adjoint iff it’s orthogonally diagonalizable with real eigenvalues.
In particular, chT (X) ∈ R[X]; and the same is true for a Hermitian matrix.
not very interesting; here the ‘correct’ notion is that of a conjugate-transpose. The ‘reason’
for this may be traced back to the conjugate-linearity of the inner product.
5. Every complex square matrix is unitarily triangulable, i.e., if A ∈ Mn (C), then there exists
a unitary matrix U ∈ Mn (C) such that U −1 AU is an upper triangular matrix. On the other
hand, for a real matrix A ∈ Mn (R), there exists an orthogonal matrix P ∈ Mn (R) such that
P −1 AP is upper triangular iff A is triangulable over R.
6. The set of normal matrices is a proper closed subset of Mn (F ). So orthogonal diagonaliz-
ability isn’t as ‘common’ as usual diagonalizability.
The set of all symmetric matrices in Mn (R) is a closed subspace of dimension (n2 + n)/2; and the
set of all Hermitian matrices in Mn (C) is a closed R-linear subspace of real dimension n2 .
7. The only thing which prevents a normal operator on a finite-dimensional real inner product
space from being orthogonally diagonalizable, is the ‘absence’ of eigenvalues.
We’ve already seen the above definitions in the context of matrices. The following proposition
captures the relation.
Proposition. Let T be a self-adjoint operator on a finite-dimensional inner product space V .
Then the following statements are equivalent.
Proof. We only prove the positive definite case, the other ones are similar. If T is positive
definite, then T = T ∗ . So Ti∗ = T ∗ |Vi = T |Vi = Ti for all i, implying that Ti is self-adjoint. Now if
x ∈ Vi is nonzero then hTi (x), xi = hT (x), xi > 0, implying that Ti is positive definite.
Conversely, suppose that each Ti is positive definite. Then T ∗ = ⊕i Ti∗ = ⊕i Ti = T , i.e., T is
self-adjoint. Now let x ∈ V be a nonzero element. Then x can be uniquely written as x = Σi xi ,
where each xi ∈ Vi , with at least one xi and at most finitely many xi being nonzero. Then
hT (x), xi = hT (Σi xi ), Σi xi i = hΣi T (xi ), Σi xi i = Σi hTi (xi ), xi i > 0. Therefore T is positive
definite.
The next lemma may appear out of place at first sight. But the corollaries immediately follow-
ing the lemma justify its significance.
Lemma. Let R be a commutative ring with A, A0 being R-algebras (not necessarily commuta-
tive). Let φ : A → A0 be an R-algebra homomorphism and f (X) ∈ R[X] a polynomial. If f (X) = a
has a solution in A then f (X) = φ(a) has a solution in A0 . If moreover φ is an isomorphism then
the number of solutions of f (X) = a is same as the number of solutions of f (X) = φ(a).
We’re mainly interested in the cases when R = F is a field, either A = L(V ) and A0 = Mn (F ) or
A = A0 = Mn (F ), and φ is an F -algebra isomorphism.
1. If T is normal then T ∗ can be written as a polynomial in T .
Is it true without the normality assumption?
Proof. Just choose an orthonormal basis B of V such that [T ]B is a diagonal matrix. Note
that the map L(V ) → Mn (F ), obtained by sending a linear operator to its matrix representation
with respect to B, is an F -algebra isomorphism so that we can apply the above lemma. And for
matrices A ∈ Mn (F ), choose unitary matrices U ∈ Mn (F ) such that U −1 AU are diagonal. Now
the original questions reduce to questions about diagonal matrices. The remaining details are left
as an exercise.
Exercises
1. HK : Section 8.3 - 3,6,8,9,10,12;
2. Let T be a linear operator on a finite-dimensional inner product space V . Suppose that T has
n distinct eigenvalues. Then prove that T is orthogonally diagonalizable iff T is diagonalizable
and F [T ] contains 2n distinct orthogonal projections.
(i) If S := {xi } is a basis of V , then show that there exists a unique inner product on V
with respect to which S is an orthonormal set.
(ii) If h , i is an inner product on V , then prove that (V, h , i) is isomorphic to (F (I) , ·) iff
there exists an orthonormal spanning set of V indexed by I.
(iii) If T ∈ L(V ) is a diagonalizable linear operator, then show that there exists an inner
product on V with respect to which T is orthogonally diagonalizable.
9. Let T be a linear operator on a finite-dimensional complex inner product space V . If B is
an ordered basis of V , not necessarily orthonormal, is it true that [T ∗ ]B = ([T ]B )∗ ?
If [T ∗ ]B = ([T ]B )∗ for all T ∈ L(V ), does it follow that B is orthonormal?
12. (*) Let T be a linear operator on an inner product space V . Suppose that {Vi } is a family
of T -invariant subspaces of V such that V = ∪i Vi . Then show that T is orthogonally
diagonalizable iff T |Vi is orthogonally diagonalizable for each i.
13. (*) Let T be a normal operator on a complex inner product space V . If T is locally finite,
then prove that T is orthogonally diagonalizable.
Hint. Can you show that each x ∈ V is contained in a finite-dimensional subspace Vx which
is invariant under both T and T ∗ ?
14. Can you give an example of a real normal matrix which is not orthogonally diagonalizable
over R?
15. Let T be a triangulable linear operator on a finite-dimensional inner product space V . Then
show that the following statements are equivalent.
(i) T is orthogonally diagonalizable.
(ii) For every T -invariant subspace W ≤ V , W ⊥ is also T -invariant.
16. (*) Let (V, h , iV ), (W, h , iW ) be inner product spaces and T : V → W a linear transforma-
tion. If there exists a sequence (xn )n∈N ∈ V N and an element y ∈ W such that limn→∞ xn = 0,
but limn→∞ hT (xn ), yiW 6= 0, then prove that T cannot have an adjoint.
Now consider F [X] to be an inner product space with respect to the inner product defined as
hf, gi := ∫01 f (t)ḡ(t) dt. Then show that the differentiation operator D : F [X] → F [X], given
by D(f ) := f 0 , does not have any adjoint.
17. Let T be a normal operator on an inner product space V . Then show that kT (x)k = kT ∗ (x)k
for all x ∈ V .
Deduce that an element v ∈ V is an eigenvector of T associated to λ ∈ F iff v is an eigenvector
of T ∗ associated to λ̄.
18. (*) Let T : V → W be a linear transformation of inner product spaces. If T has an adjoint
T ∗ : W → V , then prove that
Deduce that if T is a linear operator on a finite-dimensional inner product space V , then
kT kop is equal to the square root of the largest eigenvalue of the positive semi-definite operator
T T ∗ (or T ∗ T ).
Hint. For the first part, kT (x)k2 = hT (x), T (x)i = |hx, T ∗ T (x)i| ≤ kT ∗ T kop kxk2 .
19. Let (V, h , i) be an inner product space and π : V → V a projection which has an adjoint
π ∗ . Then show that the following statements are equivalent.
Can you give an example of a projection π which does not have an adjoint?
Hint. You may prove (i) =⇒ (ii) =⇒ (iii) =⇒ (i), (i) =⇒ (iv), (v) and (v) =⇒
(iv) =⇒ (i).
20. (*) Prove that a Hilbert space V cannot have an infinite orthogonal decomposition.
Deduce that if V is a Hilbert space then an orthogonally diagonalizable operator T ∈ L(V )
can have only finitely many eigenvalues, and therefore T is continuous.
Give an example of an orthogonally diagonalizable operator which is not continuous.
21. Let T be a linear operator on an inner product space V which has an adjoint T ∗ . Then show
that T T ∗ , T ∗ T are positive semi-definite operators.
Prove that T T ∗ (respectively, T ∗ T ) is positive definite iff T ∗ (respectively, T ) is injective.
22. (*) Let T be a linear operator on a complex inner product space V . Then prove that T = 0
iff hT (x), xi = 0 for all x ∈ V . Deduce that T is self-adjoint iff hT (x), xi ∈ R for all x ∈ V .
Do the assertions hold if we replace the complex inner product space by a real inner product
space?
23. (*) If T is an injective linear operator on a complex inner product space V , then show that
the following statements are equivalent.
(a) Re hT (x), xi = 0 for all x ∈ V .
π
(b) θ(T (x), x) = 2 for all nonzero x ∈ V .
∗
(c) T + T = 0.
24. A linear operator T on an inner product space V is said to be an anti-self-adjoint operator if
T ∗ + T = 0. Then show that the following statements are equivalent for a linear operator T
on a finite-dimensional complex inner product space V .
25. Let A be an n × n real matrix such that At + A = 0. If A is diagonalizable, then show that
A = 0.
26. Give examples of two normal matrices A, B ∈ Mn (R) such that A + B, AB are not normal.
27. Let T be a normal operator on an inner product space V (V is neither assumed to be finite-
dimensional, nor over C!). Then show that ker T ⊥ im T .
Deduce that distinct eigenspaces of T are mutually orthogonal.
28. (*) If A ∈ Mn (C), then we can define the operator norm of A as
where kxk denotes the Euclidean norm of the vector x ∈ Cn . If kAk := (tr (AA∗ ))1/2 , then show
that kAkop ≤ kAk.
Hint. AA∗ is a positive semi-definite matrix.
Lecture 26 (18/1/2021) :
Unitary operators
Recall that if (V, h , iV ), (W, h , iW ) are inner product spaces, then an inner product space homo-
morphism T : V → W is a linear transformation T which preserves the inner product; and for a
linear transformation T : V → W , T preserves the inner product iff it preserves the induced norm
iff it preserves the induced metric. Therefore, an inner product space homomorphism T , which is
always one-to-one, is nothing but a linear isometry.
If V is a finite-dimensional real inner product space, an inner product space homomorphism
T : V → V is also called an orthogonal transformation.
The following lemma explicitly states the relation between unitary operators and unitary/orthogonal
matrices.
Proof. Observe that for a matrix A ∈ Mn (F ), the ij-th entry of AA∗ (respectively, A∗ A) is
just ri · rj (respectively, cj · ci ), where r1 , . . . , rn are the row vectors and c1 , . . . , cn are the column
vectors of A. Therefore A ∈ Mn (F ) is unitary (or orthogonal, if F = R) iff its row vectors and
column vectors form orthonormal sequences.
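A quick NumPy sketch (illustrative, with random data): the unitary factor of a QR factorization of an invertible complex matrix satisfies U U ∗ = U ∗ U = I, and the entries of these products are exactly the inner products of the rows, respectively columns, of U .

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
# The Q-factor of an invertible complex matrix is unitary.
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

I = np.eye(n)
# U U* = U* U = I, i.e. the columns (and the rows) of U are orthonormal.
print(np.allclose(U @ U.conj().T, I), np.allclose(U.conj().T @ U, I))
```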
Proposition. Let T be a linear operator on an inner product space (V, h , i). Then T is a
unitary operator iff T T ∗ = T ∗ T = I.
Proof. Suppose that T ∈ L(V ) is a unitary operator. Then T −1 is also unitary, and
hT (x), yi = hT −1 T (x), T −1 (y)i = hx, T −1 (y)i for all x, y ∈ V , implying that T ∗ = T −1 .
Conversely, suppose that T T ∗ = T ∗ T = I. Then T is invertible. So we only need to check that
T preserves the inner product. Now hT (x), T (y)i = hx, T ∗ T (y)i = hx, yi for all x, y ∈ V , implying
that T preserves the inner product (Where are we using T T ∗ = I?).
Remarks.
1. Every inner product space (V, h , i) of finite dimension n is isomorphic, as an inner product space, to (F n , ·).
The proof essentially boils down to choosing an orthonormal basis of V . In particular, two
inner product spaces V and W of finite dimensions are isomorphic iff they have the same
dimension.
7. A ∈ Mn (C) is a unitary matrix iff Ā, At and A∗ := Āt are unitary matrices.
Similarly, A ∈ Mn (R) is an orthogonal matrix iff At is an orthogonal matrix.
12. As every isometry of Rn which fixes the origin is a linear operator, every isometry of Rn is
an orthogonal transformation followed by a translation.
13. We’ve seen that important properties of matrices like invertibility, triangulability, diagonal-
izability etc. remain invariant under the general equivalence given by the invertible ma-
trices. Similarly, for n × n matrices over F = R/C, important properties like normality,
self-adjointness, orthogonal diagonalizability, anti-self-adjointness, positive definiteness, pos-
itive semi-definiteness, negative definiteness, negative semi-definiteness, indefiniteness etc.
are preserved under unitary equivalence (or orthogonal equivalence, if F = R).
Similar assertions hold for linear operators on a finite-dimensional inner product space V .
14. If F = R/C, then On (F ), the set of all n × n orthogonal matrices over F , is a closed subset of
Mn (F ); and the set of all n × n complex unitary matrices Un (C) is a closed subset of Mn (C).
15. If T is an invertible linear operator on an inner product space V then for each λ ∈ F , λ is
an eigenvalue of T iff λ−1 is an eigenvalue of T −1 , and the λ-eigenspace of T is same as the
λ−1 -eigenspace of T −1 . In particular, T and T −1 have the same set of eigenvectors, and T is
(orthogonally) diagonalizable iff T −1 is (orthogonally) diagonalizable.
Lecture 27 (22/1/2021) :
QR decomposition
A careful application of the Gram-Schmidt orthogonalization process allows us to decompose in-
vertible matrices.
Theorem. Let A be an n × n invertible matrix over F = R/C. Then there exists a unique
pair of n × n matrices (QA , RA ) satisfying the following properties
(i) QA is a unitary (orthogonal, if F = R) matrix and RA is an upper triangular matrix with
positive diagonal entries.
(ii) A = QA RA , which is called the QR decomposition of A.
Before we begin the proof of the theorem, let us make a simple observation.
Lemma. Let F be an arbitrary field and M ∈ Mn (F ) an invertible upper (or lower) triangular
matrix. Then M −1 is also an upper (or lower) triangular matrix. Further, if F = R/C and each
diagonal entry of M is positive/negative, then the same is true for M −1 .
Proof. One can prove this lemma by direct computation. Alternatively, let T : F n → F n
be the linear operator whose matrix representation with respect to the natural ordered
basis B := (e1 , . . . , en ) is M . If 0 = V0 ( V1 ( . . . ( Vn = F n is a chain of T -invariant sub-
spaces then the chain is also T −1 -invariant, as T −1 can be written as a polynomial in T . Since
[T −1 ]B = ([T ]B )−1 = M −1 , the assertion follows. The rest is trivial.
Proof of the theorem. Let B := (e1 , . . . , en ) be the natural ordered orthonormal basis of F n
and TA : F n → F n the linear operator such that [TA ]B = A. If x1 , . . . , xn ∈ F n are the column
vectors of A, then clearly, TA (ei ) = xi for all i = 1, . . . , n. Now Bx := (x1 , . . . , xn ) is an ordered
basis of F n . Let (y1 , . . . , yn ) be the ordered orthonormal basis of F n which is obtained from Bx
by using Gram-Schmidt orthogonalization, and G : F n → F n the linear operator which sends xi
to yi for all i = 1, . . . , n. Then [G]Bx is an upper triangular matrix with positive diagonal entries.
Note that G ◦ TA is a unitary operator of F n . Therefore [G ◦ TA ]B = U ∈ Mn (F ) is a unitary
(orthogonal, if F = R) matrix. By the change of basis formula for matrix representations of a
linear transformation, we have [G]B = A[G]Bx A−1 . Now U = [G ◦ TA ]B = [G]B [TA ]B = A[G]Bx A−1 A = A[G]Bx ,
so that A = U ([G]Bx )−1 . By the above lemma, ([G]Bx )−1 is an upper triangular matrix with positive
diagonal entries, so QA := U and RA := ([G]Bx )−1 give the required decomposition A = QA RA .
Remark. Taking transpose on both sides of A = QA RA , one can easily see that A can also
be uniquely written in the form A = R0A Q0A , where R0A is a lower triangular matrix with positive
diagonal entries and Q0A is a unitary (orthogonal, if F = R) matrix.
Appropriate versions of QR decompositions hold for arbitrary square (not necessarily invertible),
and even non-square matrices. But we won’t discuss them here.
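For square invertible matrices, numpy.linalg.qr almost gives the decomposition of the theorem; it only remains to normalize the factors so that the diagonal of R becomes positive. The helper qr_positive below is my own illustrative sketch, not a library routine.

```python
import numpy as np

def qr_positive(A):
    """QR decomposition A = Q R with R having positive diagonal entries.

    A minimal sketch for an invertible square matrix A over R or C: it
    rescales the factors returned by np.linalg.qr by unit scalars so
    that the diagonal of R becomes positive."""
    Q, R = np.linalg.qr(A)
    d = np.diag(R)
    phases = d / np.abs(d)            # unit scalars; just signs in the real case
    Q = Q * phases                    # rescale columns of Q ...
    R = phases.conj()[:, None] * R    # ... and rows of R, so that diag(R) > 0
    return Q, R

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, R = qr_positive(A)
print(np.allclose(Q @ R, A),
      np.allclose(Q.conj().T @ Q, np.eye(4)),
      np.allclose(np.triu(R), R),
      np.all(np.diag(R).real > 0) and np.allclose(np.diag(R).imag, 0))
```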
Orthogonal transformations of Rn
To understand orthogonal transformations, let’s first consider the simplest nontrivial case -
the real plane. Let T : R2 → R2 be an orthogonal transformation with A ∈ M2 (R) being its
matrix representation with respect to the natural ordered orthonormal basis (e1 , e2 ). Then A is
an orthogonal matrix. Now it is easy to see that
O2 (R) = \{ \begin{pmatrix} a & -b \\ b & a \end{pmatrix} : a^2 + b^2 = 1 \} \cup \{ \begin{pmatrix} a & b \\ b & -a \end{pmatrix} : a^2 + b^2 = 1 \},
with the first set being SO2 (R). If T has a real eigenvalue, then both its eigenvalues are real, in
which case A must be of the form
A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
with respect to a suitable ordered orthonormal basis. Note that the second matrix represents
rotation at an angle π and the third one the reflection with respect to a line passing through the
origin.
If the eigenvalues of T are not real, then A is a special orthogonal matrix, and we can find
φ ∈ (0, π) ∪ (π, 2π) such that A is of the form
A = \begin{pmatrix} \cos φ & -\sin φ \\ \sin φ & \cos φ \end{pmatrix},
which is nothing but the anti-clockwise rotation of the plane at an angle φ because cos (π/2 + φ) =
- sin φ and sin (π/2 + φ) = cos φ. If we parametrize the (anti-clockwise) rotations of R2 as
Aθ = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}, \quad θ ∈ [0, 2π),
then SO2 (R) = {Aθ | θ ∈ [0, 2π)}; and for every rotation except those at an angle θ = 0 or π, we get an
orthogonal matrix whose eigenvalues are not real. Note that if we naturally identify R2 with C,
by sending e1 to 1 and e2 to i, then the anti-clockwise rotation of R2 at an angle θ is same as the
multiplication map in C, given by the element eiθ = cos θ + i sin θ.
where t is a non-negative integer, r, s ∈ {0, 1} and f1 , . . . , ft ∈ R[X] are distinct monic irreducible
polynomials of degree 2; and the characteristic polynomial of T is of the form
chT (X) = (X − 1)^{r′} (X + 1)^{s′} f1^{e1}(X) . . . ft^{et}(X),
where r′ (≥ r), s′ (≥ s) are non-negative integers and e1 , . . . , et are positive integers such that
n = r′ + s′ + 2(e1 + . . . + et ).
Proof. From the theory of general linear operators, we know that V = ker f (T ) ⊕ ker g(T ) is
a T -invariant direct sum decomposition and ker g(T ) = im f (T ). So we only need to show that
ker f (T ) ⊥ imf (T ). Now f (T ) being normal, ker f (T ) = ker f (T )∗ . Since im f (T ) ⊥ ker f (T )∗ ,
the assertion follows.
Now continuing with the discussion of the orthogonal transformation T : Rn → Rn , the above
lemma implies that Rn has a T -invariant orthogonal decomposition
Rn = V+ ⊕ V− ⊕ V1 ⊕ . . . ⊕ Vt ,
To understand the action of T on Vi , let us consider the orthogonal transformation Ti := T |Vi . Then minTi (X) = fi (X) and
chTi (X) = fiei (X). If W ≤ Vi is a Ti -invariant subspace then W ⊥ is Ti∗ = Ti−1 -invariant. As
Ti = (Ti−1 )−1 is a polynomial in Ti−1 , W ⊥ is also Ti -invariant. It follows that each Vi has a further
orthogonal decomposition of the form
Vi = Vi,1 ⊕ . . . ⊕ Vi,ei
into two-dimensional T -invariant subspaces. Note that each Vi,j is isomorphic, as an inner product
space, to (R2 , ·). So there exists a unique θi ∈ (0, π) ∪ (π, 2π) such that with respect to a suitable
ordered orthonormal basis Bi,j of Vi,j ,
[T |Vi,j ]Bi,j = \begin{pmatrix} \cos θi & -\sin θi \\ \sin θi & \cos θi \end{pmatrix}.
Putting together all the pieces, we see that the matrix representation of T , with respect to a
suitable ordered orthonormal basis B of Rn , has the following block diagonal form
[T ]B = \begin{pmatrix} I_{r′} & & & & \\ & -I_{s′} & & & \\ & & M_{θ_1} & & \\ & & & \ddots & \\ & & & & M_{θ_t} \end{pmatrix},
where, for each i, Mθi is a 2ei × 2ei block diagonal matrix of the form
Mθi = \begin{pmatrix} A_{θ_i} & & \\ & \ddots & \\ & & A_{θ_i} \end{pmatrix},
with Aθi = \begin{pmatrix} \cos θi & -\sin θi \\ \sin θi & \cos θi \end{pmatrix} for all i. And this completes the description of matrix representa-
tions of orthogonal transformations of Rn .
It’s clear that On (R) is not a connected set (why?). However, we can now show that SOn (R) :=
{A ∈ On (R) | det A = 1} is connected. To do so, we first need a few definitions.
Definitions. Let X be a metric space (or more generally, a topological space). Then X is
said to be path connected if for all x, y ∈ X, there exists a continuous function γ : [0, 1] → X such
that γ(0) = x and γ(1) = y. We say that γ is a path from x to y. More generally, for an arbitrary
metric space (or, topological space) X, if we define a relation x ∼p y if there exists a path from x
to y, then one can check that ∼p is an equivalence relation, so that X is path connected iff there
exists a unique equivalence class in X with respect to ∼p .
Note that a path connected metric space (or, topological space) is connected, and the continuous
image of a path connected metric space (or, topological space) is path connected.
Proof. Let A ∈ SOn (R). As −I2 represents the rotation of the R2 -plane at an angle π, there
exists an orthogonal matrix P ∈ On (R) such that
P t AP = \begin{pmatrix} I_r & & & \\ & A_{θ_1} & & \\ & & \ddots & \\ & & & A_{θ_m} \end{pmatrix},
such that n = 2m + r and θi ∈ (0, 2π) for all i = 1, . . . , m. We want to show that there exists a
path γ : [0, 1] → SOn (R) such that γ(0) = In and γ(1) = P t AP . As Aθi ∈ SO2 (R) for all i, it
suffices to construct paths γi : [0, 1] → SO2 (R) such that γi (0) = I2 and γi (1) = Aθi . But that is
easy. For each i, we can take γi : [0, 1] → SO2 (R), defined as
γi (t) := \begin{pmatrix} \cos tθi & -\sin tθi \\ \sin tθi & \cos tθi \end{pmatrix},
so that γi (0) = I2 and γi (1) = Aθi . So there exists a path from In to P t AP . Now let φP :
Mn (R) → Mn (R) be the homeomorphism defined as φP (M ) := P M P t for all M ∈ Mn (R). Then
φP induces a homeomorphism of SOn (R) and φP ◦ γ : [0, 1] → SOn (R) is a path from In to A.
As ∼p is an equivalence relation and In ∼p A for all A ∈ SOn (R), it follows that SOn (R) is path
connected. Note that we’ve essentially reduced the proof into just showing that SO2 (R) is path
connected.
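The explicit path used in the proof is easy to test numerically; the sketch below (NumPy, illustrative) checks that γ(t) = Atθ stays inside SO2 (R) and joins I2 to Aθ .

```python
import numpy as np

def A(theta):
    """The anti-clockwise rotation of R^2 at an angle theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = 2.0
gamma = lambda t: A(t * theta)            # the path gamma_i used in the proof

for t in np.linspace(0.0, 1.0, 5):
    M = gamma(t)
    assert np.allclose(M.T @ M, np.eye(2)) and np.isclose(np.linalg.det(M), 1.0)

print(np.allclose(gamma(0.0), np.eye(2)), np.allclose(gamma(1.0), A(theta)))
```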
Wrapping up
Let V be a finite dimensional complex inner product space. Then a normal operator T on V can
be thought of as a bunch of complex numbers as V has a T -invariant orthogonal decomposition,
with T acting on each component by the multiplication by a complex number. Therefore, if we
‘identify’ the set of normal operators on V with the complex plane, then it leads to the following
identifications which is obvious if dim V = 1.
Exercises.
1. HK : Section 8.4 - 1,3 (T = Mγ iff the matrix representation of T with respect to the ordered
basis (1, i) is of the form . . . ),7,10,11,13 (U ∗ = U −1 , the last 3 problems are related.); Section
9.5 - 3,4,6,8.
2. If A ∈ Mn (C) is a unitary matrix, then show that for every positive integer r, there exist
exactly rn unitary matrices B ∈ Mn (C) such that B r = A.
If A ∈ Mn (R) is an orthogonal matrix, then show that for every positive integer r, there
exists an orthogonal matrix B ∈ Mn (R) such that B 2r+1 = A. When is the solution unique?
If A ∈ SOn (R), then show that for every positive integer r, there exists a special orthogonal
matrix B ∈ SOn (R) such that B r = A.
Does there exist an orthogonal matrix B ∈ O2 (R) such that
B^2 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} ?
3. Let A ∈ On (R) be an orthogonal matrix of determinant −1. Then show that there exists an
orthogonal matrix P ∈ On (R) such that
if n = 2m + 2 is even then
P t AP = \begin{pmatrix} 1 & & & & \\ & -1 & & & \\ & & A_{θ_1} & & \\ & & & \ddots & \\ & & & & A_{θ_m} \end{pmatrix},
P t AP = \begin{pmatrix} A_{θ_1} & & \\ & \ddots & \\ & & A_{θ_m} \end{pmatrix},
has a and a−1 as its eigenvalues. In particular, the absolute values of the eigenvalues of
a complex normal orthogonal matrix need not be equal to 1.
9. (i) If T is a linear operator on a complex inner product space (V, h , i) such that hT (x), xi >
0 for all nonzero x ∈ V , then show that T is a positive definite linear operator.
(ii) Suppose that T is a linear operator on an inner product space (V, h , i) over F = R/C
such that hT (x), xi > 0 for all nonzero x ∈ V (If F = R, such a T need not be positive
definite!). Then show that the function b : V × V → F , defined as b(x, y) := hT (x), yi,
is an inner product on V . Prove that a linear operator S : V → V is a unitary operator
with respect to this new inner product b iff S has an adjoint S ∗ satisfying ST S ∗ = T .
10. (*) Let T be a linear operator on an inner product space V over F = R/C which has an
adjoint T ∗ . Then the following assertions hold.
(i) If S : V → V is a self-adjoint operator, so are T ∗ ST and T ST ∗ .
(ii) If S : V → V is a positive (or negative) semi-definite operator, so are T ∗ ST and T ST ∗ .
(iii) For every positive definite (respectively, negative definite) linear operator S : V → V ,
T ∗ ST and T ST ∗ are both positive definite (respectively, negative definite) iff T (or
equivalently, T ∗ ) is invertible.
(iv) If T ∗ T = T T ∗ is a scalar operator, i.e., T ∗ T = T T ∗ = λI for some non-negative real
number λ, then for every normal operator S ∈ L(V ), T ST ∗ and T ∗ ST are both normal
operators.
11. Let U be a unitary operator on a finite-dimensional inner product space (V, h , i). If W ≤ V
is a U -invariant subspace then prove that W ⊥ is also U -invariant.
12. Explicitly describe all orthogonal transformations of R3 .
13. Let T be a normal operator on a finite-dimensional complex inner product space V . Then
show that T is self-adjoint iff iT is anti-self-adjoint. More generally, if S, T ∈ L(V ) are
commuting normal operators then prove that
(i) If S is self-adjoint and T is anti-self-adjoint, then ST is anti-self-adjoint.
(ii) If S, T are both anti-self-adjoint operators then ST is negative semi-definite.
14. (*) Let T be an invertible linear operator on a finite-dimensional inner product space V .
Then show that the following statements are equivalent.
(i) T T ∗ = T ∗ T is a nonzero scalar operator.
(ii) For every self-adjoint operator S ∈ L(V ), T −1 ST is also self-adjoint.
(iii) For every anti-self-adjoint operator S ∈ L(V ), T −1 ST is also anti-self-adjoint.
Hint. For (ii) =⇒ (i), use that orthogonal projections are self-adjoint operators.
15. Let V be a finite-dimensional inner product space. Then show that a linear operator T ∈
L(V ) is self-adjoint iff there exists a T -invariant orthogonal decomposition V = V+ ⊕ V− ⊕ V0
such that T |V+ is positive definite, T |V− is negative definite and T |V0 = 0V0 .
16. (*) Let (V, h , i) be an inner product space over F = R/C and x ∈ V a nonzero element. Let
Wx := {y ∈ V | θ(x, y) = π/2} ∪ {0}. Then prove that Wx is an R-linear subspace of V satisfying
x⊥ ⊆ Wx ( V . Also, show that Wx has R-codimension one in V . In particular, x⊥ = Wx iff
F = R.
17. Let T be a linear operator on a finite-dimensional inner product space V . If kT − Ikop < 1,
then prove that T is invertible; and if T is invertible, then show that kT −1 kop ≥ (kT kop )−1 .
(c) If V1 , V2 , V3 are inner product spaces, and f : V1 → V2 , g : V2 → V3 are both length/angle
preserving functions, then so is g ◦ f .
(d) Prove that every nonzero scalar operator and every unitary operator on V is angle
preserving.
(e) If f : V → V is both length and angle preserving then show that f is R-linear. In
particular, if F = R, then f is an orthogonal transformation.
Is f linear if F = C? What if we further assume that f (ix) = if (x) for all x ∈ V ?
(f) If a linear operator T ∈ L(V ) preserves angle, then prove that for all x, y ∈ V , if x ⊥ y
then T (x) ⊥ T (y).
Deduce that, for such a T , there exists a nonzero λ ∈ F such that λT is a unitary
operator.
Hint. For the second part, can you reduce it to the case when dim V = 2 and T has 1
as an eigenvalue?
Lecture 28 and 29 (25/1/2021, 27/1/2021) :
Polar decomposition
We are familiar with the polar decomposition of complex numbers. If z ∈ C, then z can be written as
z = reiθ , where r = |z| := √(z z̄) is a non-negative real number and eiθ ∈ S 1 is a complex number
of unit modulus. Here r = |z| is uniquely determined by z, and a representation z = rz uz ,
where rz is a non-negative real number and uz is a complex number of unit modulus, is unique iff
z 6= 0, or equivalently, iff z is invertible.
Theorem. Let T be a linear operator on a finite-dimensional inner product space (V, h , i).
Then T can be written as T = U N , where U is a unitary operator and N is a positive semi-definite
(or non-negative) operator. The operator N is uniquely determined by T , and U is unique iff T is
invertible.
Proof. If there exists a unitary operator U and a non-negative operator N such that T = U N ,
then T ∗ = N ∗ U ∗ = N U ∗ , implying that T ∗ T = N U ∗ U N = N 2 . Therefore, N is the unique
non-negative square root of the non-negative operator T ∗ T .
To determine U , we first consider the case when T is invertible. Then T ∗ T is also invertible, and
hence so is N . Therefore, it suffices to show that U := T N −1 is a unitary operator. Now
U U ∗ = T N −1 (T N −1 )∗ = T N −1 (N −1 )∗ T ∗ = T N −2 T ∗ = T (T ∗ T )−1 T ∗ = T T −1 (T ∗ )−1 T ∗ = I.
hT (N 0 )−1 (x), T (N 0 )−1 (x)i = h(N 0 )−1 (x), T ∗ T (N 0 )−1 (x)i = hx, (N 0 )−1 (N 0 )2 (N 0 )−1 (x)i = hx, xi,
implying that T 0 (N 0 )−1 preserves the inner product, and this completes the proof. Note that
there’s no canonical way of extending U 0 to U ∈ U (V ), because U can be defined in various ways
on ker T , and that’s the source of non-uniqueness of U .
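Numerically, a polar decomposition can be read off from a singular value decomposition T = W ΣV ∗ : take N = V ΣV ∗ and U = W V ∗ . The function polar below is an illustrative NumPy sketch; for singular T it returns just one of the many admissible unitary factors U .

```python
import numpy as np

def polar(T):
    """Polar decomposition T = U N with U unitary and N positive semi-definite.

    A sketch via the SVD T = W diag(s) V*: then N = V diag(s) V* is the
    non-negative square root of T* T, and U := W V* is unitary."""
    W, s, Vh = np.linalg.svd(T)
    N = Vh.conj().T @ np.diag(s) @ Vh
    U = W @ Vh
    return U, N

rng = np.random.default_rng(9)
T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, N = polar(T)

print(np.allclose(U @ N, T),
      np.allclose(U @ U.conj().T, np.eye(4)),
      np.allclose(N, N.conj().T),
      np.all(np.linalg.eigvalsh(N) >= -1e-12))
```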
Remark. Taking adjoint on both sides of T = U N , we see that T can also be written in the
form T = N1 U1 , where N1 is a non-negative operator satisfying N12 = T T ∗ , and U1 is a unitary
operator; and U1 is unique iff T is invertible.
For a polar decomposition T = U N , U and N commute with each other iff T is a normal operator.
In fact, if T is normal and B is an orthonormal eigenbasis of T in V , then [T ]B is a diagonal
matrix, so that by simply taking polar decompositions of the diagonal entries, we can construct a
polar decomposition of T . Then it is also obvious that U N = N U . This is one more ‘evidence’
supporting our philosophy that ‘normal operators on finite-dimensional inner product spaces be-
have like complex numbers’.
As an application of polar decomposition, we will show that any two real matrices which are
unitarily equivalent are also orthogonally equivalent. But before that, we prove a result of a more
general flavour.
Proposition. Let E/F be an arbitrary field extension where F is an infinite field. If A, B are
n × n matrices over F , then A is similar to B over F iff A is similar to B over E.
Proof. If A is similar to B over F then, clearly, it's also similar to B over E. Conversely,
suppose that there exists P ∈ GLn(E) such that P⁻¹AP = B, or equivalently, AP = PB. Let
α1, . . . , αm ∈ E be linearly independent elements over F such that each entry of P can be written
as an F-linear combination of α1, . . . , αm. Then we can write P as
P = α1P1 + . . . + αmPm,
where P1, . . . , Pm ∈ Mn(F). Comparing, entry by entry, the coefficients of the αi in AP = PB
(which we may do, as the αi are linearly independent over F and the entries of APi, PiB lie in F),
we get APi = PiB for all i. Now consider
Pt := t1P1 + . . . + tmPm ∈ Mn(F[t1, . . . , tm]),
where t1, . . . , tm are algebraically independent elements over F, so that we can treat them as
variables. Now det Pt is a nonzero polynomial in t1, . . . , tm, because when we put ti = αi for all
i, we get the nonzero value det P. As F is infinite, we can find β1, . . . , βm ∈ F such that
Pβ := β1P1 + . . . + βmPm ∈ Mn(F) is an invertible matrix. But then APβ = PβB, implying that A
is similar to B over F.
Remark. The above result is true even without the assumption that F is infinite, but the
proof uses the structure theorem for finitely generated modules over a PID.
Theorem. If A, B ∈ Mn (R) are unitarily equivalent matrices, then they are also orthogonally
equivalent.
Proof. If A, B are unitarily equivalent, then there exists a unitary matrix U ∈ Un(C) such
that B = U*AU, or equivalently, AU = UB. Then A*U = UB*. Therefore, using the previous
proposition (more precisely, the argument in its proof, applied to the relations AU = UB and
A*U = UB* simultaneously), we can find an invertible matrix S ∈ GLn(R) such that AS = SB and A*S = SB*.
Let S = P N be the polar decomposition of S, so that P ∈ On (R) and N is a real positive
definite matrix satisfying N 2 = S ∗ S. We want to show that AP = P B. Now N is orthogonally
diagonalizable, having the same eigenspaces as N 2 = S ∗ S. As S ∗ SB = S ∗ AS = BS ∗ S, B preserves
every eigenspace of N 2 , and hence of N , implying that BN = N B. Now from AS = SB, we get
AP N = P N B = P BN , which implies AP = P B as N is invertible.
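A hedged numerical illustration of the theorem and of the recipe in its proof: starting from a
genuinely complex unitary equivalence between two real matrices, a real invertible intertwiner S
is obtained as a generic real combination of Re U and Im U, and its polar decomposition yields the
orthogonal matrix. The example below uses a symmetric A purely to make it easy to write down a
complex unitary U with AU = UB; all names and numbers are mine, and numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M + M.T                                     # real symmetric (only to build the example easily)
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))
B = Q0.T @ A @ Q0                               # real matrix, unitarily equivalent to A

# A genuinely complex unitary U with AU = UB (and A^T U = U B^T, since A, B are symmetric):
w, V = np.linalg.eigh(B)
U = Q0 @ (V @ np.diag(np.exp(1j * w)) @ V.T)    # U = Q0 e^{iB}; e^{iB} commutes with B

# Step 1: a real invertible S with AS = SB, as a generic combination of Re U and Im U.
S = max((U.real + t * U.imag for t in (0.1, 0.37, 1.0, 2.3)),
        key=lambda X: abs(np.linalg.det(X)))
print(np.allclose(A @ S, S @ B))                # S intertwines A and B

# Step 2: polar decomposition S = P N with P orthogonal and N = sqrt(S^T S).
w2, V2 = np.linalg.eigh(S.T @ S)
N = V2 @ np.diag(np.sqrt(w2)) @ V2.T
P = S @ np.linalg.inv(N)

print(np.allclose(P.T @ P, np.eye(4)))          # P is orthogonal
print(np.allclose(P.T @ A @ P, B))              # B = P^{-1} A P over the reals
```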
Spectral theory
Recall that a diagonalizable linear operator on a finite-dimensional vector space can be written as
a finite linear combination of mutually commuting projections. Not surprisingly, an analogous result
holds for orthogonally diagonalizable linear operators.
Remark. Recall that for an orthogonally diagonalizable operator T : V → V , the spectrum of
T , denoted by σ(T ), is defined to be the set of eigenvalues of T . So if σ(T ) = {λ1 , . . . , λr } and
V1 , . . . , Vr are the associated eigenspaces, then for each i, the orthogonal projection πi ∈ L(V )
which has Vi as its image, can be written as a polynomial in T. We have already seen this in the
context of diagonalizable linear operators, and the proof does not change. If T = λ1π1 + . . . + λrπr,
then for every polynomial φ(X) ∈ F[X], φ(T) = φ(λ1)π1 + . . . + φ(λr)πr. Applying Lagrange's
interpolation, one can easily see that
πi = ∏_{j≠i} (T − λjI)/(λi − λj).
Recall also the resolution of identity and the spectral resolution
I = π1 + . . . + πr,
T = λ1π1 + . . . + λrπr.
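A quick sketch (my own, assuming numpy) of the Lagrange formula above: for a symmetric matrix
with spectrum {−1, 2, 5}, the products ∏_{j≠i}(T − λjI)/(λi − λj) are the orthogonal eigenprojections,
they sum to I, and they reproduce T.

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
T = Q @ np.diag([2.0, 2.0, 5.0, -1.0]) @ Q.T          # symmetric, spectrum {-1, 2, 5}

spectrum = sorted(set(np.round(np.linalg.eigvalsh(T), 8)))
I = np.eye(4)

projections = []
for lam_i in spectrum:
    P = I.copy()
    for lam_j in spectrum:
        if lam_j != lam_i:
            P = P @ (T - lam_j * I) / (lam_i - lam_j)  # Lagrange factor (T - lam_j I)/(lam_i - lam_j)
    projections.append(P)

print(np.allclose(sum(projections), I))                                   # I = pi_1 + ... + pi_r
print(np.allclose(sum(l * P for l, P in zip(spectrum, projections)), T))  # T = sum lam_i pi_i
print(all(np.allclose(P @ P, P) and np.allclose(P, P.T) for P in projections))  # orthogonal projections
```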
If
T = λ1π1 + . . . + λrπr
is the spectral resolution of T, then we just saw that for every polynomial φ(X) ∈ F[X],
φ(T) = φ(λ1)π1 + . . . + φ(λr)πr.
More generally, for any function f : S(⊆ F) → F with σ(T) ⊆ S, we define
f(T) := f(λ1)π1 + . . . + f(λr)πr.
For example, suppose T is a non-negative operator. If
T = λ1π1 + . . . + λrπr
is the spectral resolution of T, then λi ≥ 0 for all i = 1, . . . , r. Now the square root function
√ : [0, ∞) → [0, ∞) is injective. Therefore, by the above definition, we can set
√T := √λ1 π1 + . . . + √λr πr,
and it's easy to check that (√T)² = T. Moreover, if S is a non-negative operator with S² = T, then
its spectral resolution, together with the injectivity of the square root, forces S = √T; so √T is the
unique non-negative square root of T.
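A short numerical check (toy matrix of my own, assuming numpy) of the definition of √T: compute
the spectral data of a non-negative operator, build the operator from the square roots of the
eigenvalues, and verify that it squares back to T and is again non-negative.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
T = M @ M.T                                   # a non-negative (positive semi-definite) operator

w, V = np.linalg.eigh(T)                      # spectral data of T
sqrtT = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T   # sqrt(T) := sum_i sqrt(lambda_i) pi_i

print(np.allclose(sqrtT @ sqrtT, T))                  # (sqrt T)^2 = T
print(np.min(np.linalg.eigvalsh(sqrtT)) >= -1e-10)    # sqrt T is again non-negative
```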
Proposition. Let T be an orthogonally diagonalizable linear operator on a finite-dimensional
inner product space V with a spectral resolution
T = λ1π1 + . . . + λrπr,
let f be a function defined on a subset of F containing σ(T), and let U : V → V′ be an isomorphism
of inner product spaces. Then
(i) f(T) is an orthogonally diagonalizable linear operator with σ(f(T)) = f(σ(T));
(ii) T′ := UTU⁻¹ is an orthogonally diagonalizable linear operator on V′ with spectral resolution
T′ = λ1Uπ1U⁻¹ + . . . + λrUπrU⁻¹, so that σ(T′) = σ(T), and f(T′) = Uf(T)U⁻¹.
Proof. It follows directly from the spectral theorem that f(T) is an orthogonally diagonalizable
linear operator with σ(f(T)) = f(σ(T)).
Now
T 0 = U (λ1 π1 + . . . + λr πr )U −1 = λ1 U π1 U −1 + . . . + λr U πr U −1 .
Note that U ∗ = U −1 . Therefore, for each i, U πi U −1 is a self-adjoint projection, implying that
U πi U −1 is an orthogonal projection. As
UπiU⁻¹ ∘ UπjU⁻¹ = U(πi ∘ πj)U⁻¹ = 0 for all i ≠ j, and
Uπ1U⁻¹ + . . . + UπrU⁻¹ = U(π1 + . . . + πr)U⁻¹ = U IV U⁻¹ = IV′,
we get that
IV′ = Uπ1U⁻¹ + . . . + UπrU⁻¹
is a resolution of identity. Therefore, by the spectral theorem, we conclude that T 0 is an orthogonally
diagonalizable linear operator with the spectral resolution
T 0 = λ1 U π1 U −1 + . . . + λr U πr U −1 .
In particular, σ(T 0 ) = σ(T ). We could have also proved the orthogonal diagonalizability of T 0 by
showing that V 0 = U (V1 ) ⊕ . . . ⊕ U (Vr ) is an orthogonal decomposition of V 0 , where V1 , . . . , Vr are
the eigenspaces of T .
Finally,
f(T′) := f(λ1)Uπ1U⁻¹ + . . . + f(λr)UπrU⁻¹ = U(f(λ1)π1 + . . . + f(λr)πr)U⁻¹ = Uf(T)U⁻¹.
We’ve defined certain functions of orthogonally diagonalizable operators. Now we’ll define
functions of orthogonally diagonalizable normal matrices (i.e., real symmetric matrices and complex
normal matrices). Let A ∈ Mn (F ) be an orthogonally diagonalizable matrix. Let TA : F n → F n
be the linear operator whose matrix representation with respect to the natural ordered basis
B0 := (e1 , . . . , en ) of F n is A. Then TA is an orthogonally diagonalizable operator. Let σ(A) =
σ(TA ) = {λ1 , . . . , λr } and f : S(⊆ F ) → F a function such that σ(A) ⊆ S. Then we define
f(A) := [f(TA)]B0. If
TA = λ1π1 + . . . + λrπr
is the spectral resolution of TA, then the spectral resolution of A can be defined as
A = λ1E1 + . . . + λrEr,
where Ei := [πi]B0 for each i. The matrices Ei satisfy
(i) Ei² = Ei for all i,
(ii) EiEj = 0 for all i ≠ j,
(iii) I = E1 + . . . + Er and
(iv) Ei* = Ei for all i.
As A is an orthogonally diagonalizable matrix, it's tempting to first take a diagonal matrix D
which is orthogonally equivalent to A, apply f to D, and then reverse the procedure to get back
f(A). This approach works too. However, unlike the spectral resolution, which is canonically
determined by A, there can be many diagonal matrices which are orthogonally equivalent to A. So
'well-definedness' is an issue which needs to be addressed. For that, let P, P′ ∈ Mn(F) be unitary
(orthogonal, if F = R) matrices such that D := P⁻¹AP and D′ := (P′)⁻¹AP′ are diagonal matrices.
We want to show that f(A) = Pf(D)P⁻¹ = P′f(D′)(P′)⁻¹. To do so, let B, B′ be the ordered
orthonormal bases of F^n corresponding to P, P′ respectively. Then D = [TA]B and D′ = [TA]B′.
Now
[f(TA)]B = [f(λ1)π1 + . . . + f(λr)πr]B = f(D),
and similarly, [f(TA)]B′ = f(D′). Therefore, by the change of basis formula for matrix representations,
we have f(A) = Pf(D)P⁻¹ = P′f(D′)(P′)⁻¹.
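A small sketch (mine, assuming numpy) of the recipe just described: diagonalize A orthogonally,
apply f to the diagonal, and conjugate back; a second orthogonal diagonalization (here obtained by
permuting the eigenvector columns) gives the same f(A), illustrating the well-definedness argument.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
A = M + M.T                                   # real symmetric, hence orthogonally diagonalizable

w, P = np.linalg.eigh(A)                      # A = P diag(w) P^T with P orthogonal
fA_1 = P @ np.diag(np.exp(w)) @ P.T           # f(A) with f = exp, via this diagonalization

# A second orthogonal diagonalization of A: permute the eigenvector columns
# (and the eigenvalues accordingly); the resulting f(A) must be the same.
perm = [2, 0, 3, 1]
P2, w2 = P[:, perm], w[perm]
fA_2 = P2 @ np.diag(np.exp(w2)) @ P2.T

print(np.allclose(fA_1, fA_2))                # well-definedness of f(A)
```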
Recall the real exponential function exp : R → R, given by the power series
e^x := 1 + x + x²/2! + x³/3! + . . . ;
the image of the real exponential function is the set of positive real numbers. Similarly, we
can define the complex exponential function exp : C → C, defined as exp(z) := e^z, where e^z is
defined using the power series representation
e^z := 1 + z + z²/2! + z³/3! + . . . .
The complex exponential function has the following properties.
Using the same power series, we can also define the exponential of a matrix:
e^A := I + A + A²/2! + A³/3! + . . .
for all A ∈ Mn(F). To show that the above series converges, we first need a definition.
Definition. Let (V, ‖·‖) be a normed linear space. Then a series Σn xn in V is said to be
absolutely convergent if Σn ‖xn‖ < ∞.
It follows from the triangle inequality that for an absolutely convergent series Σn xn, the sequence
of partial sums is a Cauchy sequence, and therefore the series converges if V is complete.
Since Mn(F) is complete with respect to the norm ‖A‖ := √(tr(AA*)) (in fact, with respect to
every norm, as they are all equivalent), it suffices to show that the above series is absolutely
convergent, i.e.,
1 + ‖A‖ + ‖A²‖/2! + ‖A³‖/3! + . . . < ∞.
To show the absolute convergence of the series, it is enough to show that ‖A^n‖ ≤ ‖A‖^n for all
n ≥ 1, because then the sum is bounded by e^{‖A‖}. To prove it, one can either directly prove that
‖AB‖ ≤ ‖A‖ ‖B‖ for all A, B ∈ Mn(F), or prove the analogous result for the operator norm on
Mn(F) (which is easier!), and use the fact that any two norms on Mn(F) are equivalent, so that
convergence of the series with respect to the operator norm implies the same for the Euclidean norm.
In the third inequality, we're implicitly using the fact that ‖AB‖ ≤ ‖A‖ ‖B‖ for all A, B ∈ Mn(F).
This is something which we have not proved. But then we can again work with the operator norm,
for which this inequality is obvious. And because any two norms on a finite-dimensional vector
space over F = R/C are equivalent, questions about convergence, continuity etc. do not depend
on which norm we are working with.
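The following sketch (my own; it assumes numpy and scipy are available) sums the first thirty terms
of the series I + A + A²/2! + . . . for a random real matrix and compares the result with
scipy.linalg.expm, as a numerical counterpart of the convergence argument above.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))

partial = np.zeros((4, 4))
term = np.eye(4)
for k in range(1, 30):
    partial += term
    term = term @ A / k                       # next term of the series: A^k / k!

print(np.linalg.norm(partial - expm(A)))      # essentially zero: the series has converged
print(np.exp(np.linalg.norm(A)))              # e^{||A||}, which dominates 1 + sum_{k>=1} ||A^k||/k!
```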
The exponential function of matrices enjoys the following properties. Most of the proofs are
easy and we leave them as exercises.
For every pair of commuting matrices A, B ∈ Mn(C) there exist two sequences of complex
matrices (An)n∈N, (Bn)n∈N satisfying the following properties.
(i) For each n, both An and Bn are diagonalizable.
(ii) AnBn = BnAn for all n ≥ 1.
(iii) An → A and Bn → B as n → ∞.
Hence the proof becomes obvious (of course, modulo the above result, which is perhaps not
'easy' to prove!).
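The property most naturally proved by this approximation argument is, presumably, the identity
e^{A+B} = e^A e^B for commuting A and B (the list of properties is not reproduced above, so this is
my reading): the identity is easy for commuting diagonalizable pairs and passes to the limit by
continuity. A quick numerical check, with toy matrices of my own and scipy's expm:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
C = rng.standard_normal((3, 3))
A, B = C @ C, 2.0 * C + np.eye(3)        # polynomials in the same matrix, hence commuting

print(np.allclose(A @ B, B @ A))                      # True
print(np.allclose(expm(A + B), expm(A) @ expm(B)))    # True for commuting matrices

X, Y = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
print(np.allclose(expm(X + Y), expm(X) @ expm(Y)))    # generally False without commutativity
```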
Similarly, for a linear operator T on a finite-dimensional inner product space V we define
e^T := I + T + T²/2! + T³/3! + . . . .
Note that if B is an orthonormal basis of V then [e^T]B = e^A, where A := [T]B. We can work
with any norm on L(V) because any two such norms are equivalent. Now we state a few more
properties of the exponential function. We could have stated them for matrices, but the language
of linear operators seems to be a more natural set-up.
which is not a diagonalizable matrix, and therefore eT is not diagonalizable. And for orthogonal
diagonalizability, one only needs to observe that if T is diagonalizable, then because T and eT have
the same eigenspaces, T is orthogonally diagonalizable iff eT is orthogonally diagonalizable.
The remaining assertions are obvious.
Proposition. Let V be a finite-dimensional inner product space and F ⊆ L(V) a commuting
family of orthogonally diagonalizable linear operators. Then F can be simultaneously orthogonally
diagonalized, i.e., there exists an ordered orthonormal basis B of V such that [T]B is a diagonal
matrix for all T ∈ F.
In particular, if F ⊆ Mn(R) (respectively, F ⊆ Mn(C)) is a commuting family of symmetric (re-
spectively, normal) matrices then there exists an orthogonal matrix P ∈ On(R) (respectively, a
unitary matrix U ∈ Un(C)) such that P^t AP ∈ Mn(R) (respectively, U*AU ∈ Mn(C)) is a diagonal
matrix for all A ∈ F.
Proof. We give two proofs of the above proposition. The first one exploits certain property
of normal matrices, whereas the second one mimics the idea used to prove the similar result for
diagonalizable operators on vector spaces.
(1) Since F is a commuting family of triangulable operators, there exists an ordered basis of V with
respect to which the matrix representation of every element of F is an upper triangular matrix.
So using Gram-Schmidt orthogonalization, we can find an ordered orthonormal basis B of V such
that [T ]B is an upper triangular matrix for all T ∈ F. As each [T ]B is an upper triangular normal
matrix, it follows that [T ]B is actually a diagonal matrix for all T ∈ F, implying that B simulta-
neously orthogonally diagonalizes the family F.
(2) Alternatively, we can apply induction on dim V. There's nothing to prove if dim V = 1. Now
supposing that the result is true whenever dim V ≤ r, let dim V = r + 1. If every element of F
is a scalar operator, again, there's nothing to prove. Otherwise, let T ∈ F be a linear operator
which is not scalar and W an eigenspace of T. As T is not scalar, W ≠ V. Then W⊥ is a
sum of eigenspaces of T. Since every element of F commutes with T, it preserves each eigenspace
of T; therefore W and W⊥ are both invariant under every element of F. As
dim W, dim W⊥ ≤ r, from the induction hypothesis it follows that there exist ordered orthonormal
bases B′, B″ of W, W⊥ respectively, such that [T|W]B′ and [T|W⊥]B″ are both diagonal matrices
for all T ∈ F. Therefore, if we take B := (B′, B″), then B is an ordered orthonormal basis of V
such that [T]B is a diagonal matrix for all T ∈ F.
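A hedged computational counterpart of the proposition (not the proof given in the notes): for a
commuting family of real symmetric matrices, the orthonormal eigenbasis of a generic linear
combination of the family already diagonalizes every member. The two matrices below are my own
toy example, and numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(7)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = Q @ np.diag([1.0, 1.0, 2.0]) @ Q.T      # commuting symmetric matrices whose
B = Q @ np.diag([3.0, 4.0, 4.0]) @ Q.T      # individual eigenspaces are different

# Diagonalize a generic linear combination of the family: its orthonormal eigenbasis
# separates the joint eigenspaces, so it diagonalizes both A and B.
_, P = np.linalg.eigh(A + np.pi * B)
for T in (A, B):
    D = P.T @ T @ P
    print(np.allclose(D, np.diag(np.diag(D))))    # True, True
```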
Proof. If F = C, the assertion directly follows from the above proposition because a complex
square matrix is normal iff it’s unitarily diagonalizable.
Otherwise, we can view A, B as complex matrices. Then C[A, B] is a commutative algebra of
complex normal matrices; and hence, R[A, B] ⊆ C[A, B] is a commutative algebra of real normal
matrices.
Remarks.
1. Let V1, V2 ≤ V be subspaces, invariant under every element of F, such that T|V1, T|V2 are
scalar operators for all T ∈ F. If V1 ∩ V2 ≠ 0, then T|V1+V2 is also a scalar operator for every
T ∈ F. It follows that if V1, V2 are eigenspaces of F, then either V1 = V2 or V1 ∩ V2 = 0.
Lemma. Let F be a family of orthogonally diagonalizable linear operators on an inner product
space V . Then different eigenspaces of F are mutually orthogonal.
Proposition. Let F be a commuting family of orthogonally diagonalizable linear operators on a
finite-dimensional inner product space V, and let V1, . . . , Vr be the distinct eigenspaces of F. Then
V = V1 ⊕ . . . ⊕ Vr.
If πi ∈ L(V) denotes the orthogonal projection onto Vi, then
I = π1 + . . . + πr
is called the resolution of identity determined by F, and for each T ∈ F, the representation
T = λ1π1 + . . . + λrπr
is called the spectral resolution of T in terms of this family (note that here the λi's may not be
distinct). In particular, every eigenspace of T is a sum of eigenspaces of F.
Before we give a proof of the proposition, let us consider a simple example which will help us
to understand the above ideas. Let V := R4 , equipped with the usual dot product and F = {S, T },
where S, T ∈ L(R4 ) are linear operators whose matrix representations with respect to the natural
ordered orthonormal basis (e1 , e2 , e3 , e4 ) are
diag(1, 1, 0, 0) and diag(1, 0, 1, 0) respectively.
Note that none of the eigenspaces of F is an eigenspace of either S or T . However, we’ll later see
that every eigenspace of F is an eigenspace of a polynomial in S, T .
If πi is the orthogonal projection of R4 onto Rei, then
S = 1 · π1 + 1 · π2 + 0 · π3 + 0 · π4 and T = 1 · π1 + 0 · π2 + 1 · π3 + 0 · π4
are the spectral resolutions of S and T in terms of the family F.
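A tiny verification (mine, in numpy) of this example: the joint eigenprojections onto Re1, . . . , Re4
can be written as ST, S(I − T), (I − S)T, (I − S)(I − T), i.e., as polynomials in S and T, they sum
to the identity, and they give the spectral resolutions displayed above.

```python
import numpy as np

I = np.eye(4)
S = np.diag([1.0, 1.0, 0.0, 0.0])
T = np.diag([1.0, 0.0, 1.0, 0.0])

# joint eigenprojections pi_1, ..., pi_4 onto R e_1, ..., R e_4, as polynomials in S and T
pi = [S @ T, S @ (I - T), (I - S) @ T, (I - S) @ (I - T)]

print(np.allclose(sum(pi), I))            # resolution of identity determined by F
print(np.allclose(S, pi[0] + pi[1]))      # S = 1*pi1 + 1*pi2 + 0*pi3 + 0*pi4
print(np.allclose(T, pi[0] + pi[2]))      # T = 1*pi1 + 0*pi2 + 1*pi3 + 0*pi4
```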
Proof of the proposition. We know that distinct eigenspaces of F are mutually orthogo-
nal. Let W be the sum of all eigenspaces of F. If W = V , we are done. Otherwise, W ⊥ is also
F-invariant and FW ⊥ := {T |W ⊥ | T ∈ F} is a commuting family of orthogonally diagonalizable
linear operators. So they can be simultaneously orthogonally diagonalized; and in particular, W ⊥
contains an eigenvector of F, which is a contradiction.
Definition. Let V = V1 ⊕ . . . ⊕ Vn and V = V1′ ⊕ . . . ⊕ Vm′ be two orthogonal decompositions of V.
The second is called a refinement of the first if for each i ∈ {1, . . . , m} there exists
ri ∈ {1, . . . , n} such that Vi′ ⊆ Vri. Similarly, we can define refinement of a resolution of identity.
Note that refinement gives a partial order relation on the set of all orthogonal decompositions of
V and associated resolutions of identity. If F is a commuting family of orthogonally diagonalizable
linear operators, then the resolution of identity determined by F is a refinement of every resolution
of identity given by an element of F.
I = π1 + . . . + πr
AF = {c1 π1 + . . . + cr πr ∈ L(V ) | c1 , . . . , cr ∈ F }.
Proof. Let B be an ordered basis of V such that A := [S]B , B := [T ]B are diagonal matrices.
Let {c1 = 0, c2, . . . , cs}, {c′1 = 0, c′2, . . . , c′t} be the sets of eigenvalues of S, T respectively. For each
i = 2, . . . , t, let Xi be the set of eigenvalues of S which appear as the diagonal entries of A corresponding
to the diagonal entries of B which are equal to c′i. As null T > 1, |Xi| ≤ n − 2 for all i. Since
|F| ≥ n − 1, we can choose, for each i, an element ηi ∈ F \ Xi. Now using Lagrange's interpolation,
we can find a polynomial f(X) ∈ F[X] such that f(0) = 0 and f(c′i) = ηi for all i = 2, . . . , t. Then
it is easy to see that ker(S − f(T)) = W.
Finally, in order to show that AF is generated by one element, let µ1, . . . , µr be distinct el-
ements of F. Then using Lagrange's interpolation, one can easily see that AF = F[T̃], where
T̃ := µ1π1 + . . . + µrπr.
Remarks.
1. Let F := F2, the field consisting of two elements. Now consider the diagonal matrices
A := diag(1, 1, 0, 0) and B := diag(1, 0, 1, 0)
over F. Then ker A = ⟨e3, e4⟩, ker B = ⟨e2, e4⟩ and ker A ∩ ker B = ⟨e4⟩. However, for
every polynomial f(X) ∈ F[X], the first diagonal entry of f(B) is equal to its third diagonal
entry, implying that ker(A − f(B)) ≠ ⟨e4⟩. Therefore, in the above lemma, we cannot
drop the condition that |F| ≥ n − 1.
2. Again, let F := F2, and consider the diagonal matrices
A := diag(1, 0, 0) and B := diag(0, 0, 1),
and take F := {A, B}. Then F [F] ⊆ M3 (F ) is the set of all diagonal matrices. As every
diagonal matrix of M3 (F ) satisfies the equation X 2 = X, the F -algebra F [F], which has
dimension 3, cannot be generated by one element as an F -algebra.
3. If F = R/C, and A is a commutative self-adjoint algebra of orthogonally diagonalizable linear
operators on an inner product space V of dimension n, then there exists T ∈ A such that
A = F [T ]. In particular, dim A ≤ n.
4. It follows from the proof of the theorem that if V is a finite-dimensional inner product
space over F = R/C, then every commutative algebra of orthogonally diagonalizable linear
operators A ⊆ L(V ) is a self-adjoint algebra.
Now if π : V → V is a projection then the only projections contained in F[π] are 0, I, π and
I − π. Therefore, if π is not an orthogonal projection, then F[π] is a commutative algebra
of diagonalizable operators (not orthogonally diagonalizable!) which is not a self-adjoint
algebra.
Exercises.
1. HK : Section 9.5 - 5,7,9 (For 9(c), you may need Thm. 10 on p. 337).
2. Let F be an arbitrary field with |F| ≥ n. If A is an F-subalgebra of the algebra of n × n diagonal
matrices over F, then prove that there exists a matrix T ∈ A such that A = F[T].
3. Let V be a finite-dimensional inner product space and T : V → V an orthogonally diagonal-
izable operator with a spectral resolution
T = λ1 π1 + . . . + λr πr .
Now consider the inverse function on F*, which sends a nonzero element λ ∈ F to 1/λ. Then
we can define the inverse function of T iff σ(T) ⊆ F*, i.e., iff T is invertible in the usual sense.
In this case, prove that the spectral resolution of 1/T is given by
1/T = (1/λ1)π1 + . . . + (1/λr)πr,
and that 1/T = T⁻¹.
4. (*) If A, B ∈ Mn(F), then prove that ‖AB‖ ≤ ‖A‖ ‖B‖, where ‖A‖ := √(tr(AA*)).
Deduce that ‖e^A‖ ≤ e^{‖A‖} for all A ∈ Mn(F).
5. (a) If A ∈ Mn(C), then prove that σ(e^A) = e^{σ(A)}.
Does the result hold for real matrices?
(b) Give an example of A ∈ Mn (R) such that A is not diagonalizable, but eA is diagonal.
Hint. To prove the first part of (a), use triangulation. For the second part of (a) and (b),
take the 2 × 2 matrix A with rows (0, −π) and (π, 0), and use a suitable embedding of C in
M2(R) to conclude that e^A = −I2. (A numerical check appears after this list of exercises.)
6. (*) Use the following steps to prove that exp : Mn (C) → GLn (C) is surjective.
(i) Every diagonal matrix of nonzero determinant is contained in the image of the expo-
nential map.
(ii) Every diagonalizable matrix of nonzero determinant is contained in the image of the
exponential map.
(iii) If A ∈ GLn (C) there exists a diagonalizable matrix D ∈ GLn (C) and a nilpotent matrix
N ∈ Mn (C) such that A = D +N and DN = N D. So it suffices to show that I +D−1 N
is contained in the image of the exponential map.
(iv) For nilpotent matrices N ∈ Mn(C), we can define the logarithm function
log(I + N) := N − N²/2 + N³/3 − . . . + (−1)^n N^{n−1}/(n − 1).
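Two quick numerical checks (my own, using numpy and scipy.linalg.expm) related to the hints
above: the rotation generator from the hint to Exercise 5 exponentiates to −I2, and for a nilpotent
N the finite logarithm series of (iv) really inverts the exponential on I + N.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, -np.pi], [np.pi, 0.0]])
print(np.allclose(expm(A), -np.eye(2)))          # e^A = -I_2, although A is not diagonalizable over R

N = np.array([[0.0, 1.0, 5.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])                  # nilpotent: N^3 = 0
logIN = N - N @ N / 2                            # log(I + N) = N - N^2/2 (the series terminates)
print(np.allclose(expm(logIN), np.eye(3) + N))   # exp(log(I + N)) = I + N
```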
Miscellaneous Exercises
1. Let V be a vector space and S ⊆ V a linearly independent set. If x ∉ ⟨S⟩, then show that
x + S := {x + y ∈ V | y ∈ S} is also linearly independent.
2. Let A be an m × n matrix over a field F . Show that A can be written as a product of two
matrices A = BC, where the columns of B and the rows of C are linearly independent.
3. Let V be a vector space over a field F . If f : V → V is a group homomorphism, then show
that the set Lin (f) := {λ ∈ F | f(λx) = λf(x) for all x ∈ V } is a subfield of F . Further, if
F is a subfield of C and f : F n → F n is a continuous group homomorphism, then prove that
Lin (f) is a closed subfield of F .
4. Let S, T, U be linear operators on a finite-dimensional vector space V. Then prove that
rk T + rk SUT ≥ rk UT + rk ST.
Deduce that rk T^n + rk T^{n+2} ≥ 2 rk T^{n+1} for all positive integers n. Therefore the sequence
(rk T^n − rk T^{n+1})_{n=1}^∞ is a decreasing sequence.
5. Let F be an algebraically closed field and A ∈ Mn(F). If ch F = p > 0, then show that there
exists a positive integer r (depending on n) such that A^{p^r} is a diagonalizable matrix.
Give an example of a 2 × 2 real matrix A such that A^m is not diagonalizable for all positive
integers m.
6. Let A be an r × r matrix over C such that the limit lim_{n→∞} A^n exists. Then prove that
lim_{n→∞} A^n is an idempotent matrix of rank equal to the rank of A^r.
If moreover A is an invertible matrix, then show that lim_{n→∞} A^n = I_r.
------------------------------------------------------------------------------
7. Let T be a linear operator on a vector space V. Suppose that f(X) ∈ F[X] is a polynomial
of prime degree p such that f(T) = 0. Then prove that V has a T-invariant direct sum
decomposition V = ⊕i Vi such that each Vi has dimension p.
------------------------------------------------------------------------------
8. Prove that the following statements are equivalent for a linear operator T on a finite-
dimensional inner product space V .
(i) T is normal.
(ii) T ∗ ∈ F [T ], i.e., T ∗ can be written as a polynomial in T .
(iii) Every T -invariant subspace W ≤ V is also T ∗ -invariant.
(iv) For every T -invariant subspace W ≤ V , W ⊥ is also T -invariant.
(*) If T is a normal operator on an arbitrary inner product space V (not necessarily finite-
dimensional), then does it imply T ∗ ∈ F [T ]?
Hint. For the last question, let V be an inner product space having an orthonormal basis
indexed by S 1 , the unit circle. Then consider the unitary operator U : V → V , defined by
U (eλ ) := λeλ for all λ ∈ S 1 .
9. Let V be an inner product space with a countable orthonormal basis B, like F (N) . As Z, the
set of integers, is also a countable set, the orthonormal basis B can as well be indexed by Z.
Let B = {ei }i∈Z . Now consider the linear transformation T : V → V , given by T (ei ) := ei+1
for all i ∈ Z. Then show that
(i) T is a unitary operator.
(ii) T does not have any finite-dimensional nonzero invariant subspace. In particular, T
does not have any eigenvalue.
(iii) If W := ⟨ei | i ∈ N⟩, then W is a T-invariant subspace. But T|W is not a unitary operator,
and W⊥ is not T-invariant.
10. Show that every diagonalizable normal operator on a finite-dimensional inner product space
is orthogonally diagonalizable.
11. Prove that SO2 (R), as a group, is isomorphic to S 1 . In particular, SO2 (R) is an abelian
group.
Can you see that SO2 (R) is homeomorphic to S 1 ?
If n > 1 is a positive integer, and 0 < p < n is another positive integer, then show that
SOn (R) contains a subgroup isomorphic to SOp (R) × SOn−p (R). Deduce that SOn (R) is
not abelian for all n ≥ 3.
12. (*) In the following exercises, we explore certain topological properties of a few matrix groups.
(i) On (R), SOn (R) are compact sets.
On (R) has two connected components, both homeomorphic to SOn (R).
(ii) The set of all upper (or lower) triangular real matrices with positive diagonal entries is
a path connected set.
(iii) Let SUn (C) := {A ∈ Un (C) | det A = 1}. Then Un (C), SUn (C) are path connected
compact sets.
(iv) GLn (C), SLn (C) are both path connected. Are these sets compact?
(v) SLn (R) is a path connected set. Is it compact?
−
(vi) GLn (R) is not connected. It has two connected components - GL+ n (R) and GLn (R),
the set of all invertible real matrices of positive and negative determinant respectively.
−
In fact, GL+ n (R) is path connected and homeomorphic to GLn (R).
Hint. To prove (ii), construct paths for each entry of the matrix.
For (iii), first note that the matrices are unitarily diagonalizable, and then construct
paths for each entry.
After triangulation, use a similar idea for (iv).
For (v) and (vi), you may have to use QR decompositions. And finally, to show that
−
GL+ n (R) is homeomorphic to GLn (R), use the diagonal matrix whose first entry is −1,
and the remaining diagonal entries are equal to 1.
13. Let V be a complex inner product space. If x ∈ V is such that hx, yi = hy, xi for all y ∈ V ,
then show that x = 0.
14. Prove that the set of all n × n nilpotent matrices over F = R/C is a closed subset of Mn (F ).
Is it compact?
15. (*) Let T be a linear operator on a finite-dimensional inner product space V over F = R/C.
If T = U N is a polar decomposition of T , then show that T is normal iff U N = N U .
16. Let A be an n × n upper (or lower) triangular matrix over F = R/C. If A is diagonalizable,
does it follow that A is a diagonal matrix? What if A is orthogonally diagonalizable?
17. (*) If A ∈ Mn (C) is not a scalar matrix then show that the set {U ∗ AU | U ∈ SUn (C)} is
uncountable. Deduce that if T is a non-scalar linear operator on a finite-dimensional complex
inner product space V , then the set
{[T]B ∈ Mn(F) | B is an ordered orthonormal basis of V}
is uncountable.
If A ∈ SO2 (R) \ {±I}, then show that the set {P t AP | P ∈ O2 (R)} contains exactly two
elements. So the above result does not extend to real inner product spaces.
18. (*) Let V be a finite-dimensional inner product space and T a linear operator on V. Show
that the following statements are equivalent.
(i) T is a scalar operator.
(ii) The set
{[T]B ∈ Mn(F) | B is an ordered orthonormal basis of V}
is a singleton set.
Hint. Reduce the problem to the case when dim V = 2.