
Linear Algebra
Second Edition

Promode Kumar Saikia


North-Eastern Hill University


No part of this eBook may be used or reproduced in any manner whatsoever without the
publisher’s prior written consent.

Copyright © 2014 Pearson India Education Services Pvt. Ltd

This eBook may or may not include all assets that were part of the print version. The
publisher reserves the right to remove any material in this eBook at any time.

ISBN: 9789332522145
eISBN: 9789332540521

Head Office: 7th Floor, Knowledge Boulevard, A-8(A) Sector 62, Noida 201 309, India.
Registered Office: Module G4, Ground Floor, Elnet Software City, TS-140, Block 2 &
9, Rajiv Gandhi Salai, Taramani, Chennai, Tamil Nadu 600113, Fax : 080-30461003,
Phone: 080-30461060, www.pearson.co.in, Email id: cs.india@pearson.com


Contents

Preface ix

Preface to the Second Edition xiii

A Note to Students xv

List of Symbols xvii

1 Matrices 1
1.1 Introduction 1
1.2 Basic Concepts 1
1.3 Matrix Operations and Their Properties 15
1.4 Invertible Matrices 27
1.5 Transpose of a Matrix 32
1.6 Partition of Matrices; Block Multiplication 36
1.7 Groups and Fields 45

2 Systems of Linear Equations 49


2.1 Introduction 49
2.2 Gaussian Elimination 49
2.3 Elementary Row Operations 55
2.4 Row Reduction 65
2.5 Invertible Matrices Again 77
2.6 LU Factorization 82
2.7 Determinant 96


3 Vector Spaces 114


3.1 Introduction 114
3.2 Basic Concepts 115
3.3 Linear Independence 127
3.4 Basis and Dimension 135
3.5 Subspaces Again 147
3.6 Rank of a Matrix 153
3.7 Orthogonality in Rn 163
3.8 Bases of Subspaces 178
3.9 Quotient Space 184

4 Linear Maps and Matrices 191


4.1 Introduction 191
4.2 Basic Concepts 191
4.3 Algebra of Linear Maps 204
4.4 Isomorphism 215
4.5 Matrices of Linear Maps 221

5 Linear Operators 237


5.1 Introduction 237
5.2 Polynomials Over Fields 238
5.3 Characteristic Polynomials and Eigenvalues 243
5.4 Minimal Polynomial 271
5.5 Invariant Subspaces 283
5.6 Some Basic Results 298
5.7 Real Quadratic Forms 310

6 Canonical Forms 321


6.1 Introduction 321
6.2 Primary Decomposition Theorem 321
6.3 Jordan Forms 329

7 Bilinear Forms 346


7.1 Introduction 346
7.2 Basic Concepts 346
7.3 Linear Functionals and Dual Space 355
7.4 Symmetric Bilinear Forms 360
7.5 Groups Preserving Bilinear Forms 374

8 Inner Product Spaces 380


8.1 Introduction 380
8.2 Hermitian Forms 380
8.3 Inner Product Space 385
8.4 Gram–Schmidt Orthogonalization Process 390
8.5 Adjoints 403
8.6 Unitary and Orthogonal Operators 409
8.7 Normal Operators 416

Bibliography 430

Index 431

Preface
This book is the outcome of a growing realization, shared by my colleagues, that there is a need for
a comprehensive textbook in linear algebra whose main emphasis should be on clarity of exposition.
There are several excellent textbooks available currently; however, the perception is that each of these
has its own area of excellence leaving room for improvement. This perception has guided the approach
to some topics of this book. For the contents of the book, I have drawn on my experience of teaching
a full semester course in linear algebra over the years for postgraduate classes in the North-Eastern
Hill University in Shillong, India. The inputs from some colleagues from undergraduate colleges have
also helped.
My main concern has always been with simplicity and clarity, and an effort has been made to
avoid cumbersome notations. I have opted for informal discussions instead of giving definitions which
appear cluttered-up. Overall, our aim has been to help readers acquire a feeling for the subject. Plenty
of examples and numerous exercises are also included in this book.
Chapter 1 introduces matrices and matrix operations and explores the algebraic structures of sets
of matrices while emphasizing the similarities with more familiar structures. The role of unit matrices
in the ring structure of matrices is also discussed. Block operations of partitioned matrices are quite
useful in later chapters. This chapter discusses such matrices to make readers comfortable with their
uses. Chapter 2 comprehensively covers the treatment of solutions of systems of linear equations
by row reduction with the help of elementary row/column operations. Elementary matrices appear
naturally; their usefulness in analysing matrices, especially invertible matrices, is also examined, and
a section on properties of determinants is also included in this chapter. Determinants are defined in
terms of expansions by minors along the first row; by doing so, it has become possible to give proofs
of properties of determinants of arbitrary orders accessible to even undergraduate students. It should
be noted that these properties are well-known and used frequently but hardly proved in classrooms.
Chapter 3 begins by introducing the basic concepts related to vector spaces. Ample examples are
provided for concepts like linear independence, basis and coordinates to make it easier for an average
student. A whole section of this chapter is devoted to the idea of the rank of a matrix in computations
as well as in theory. Rank of a matrix is defined through the row space and the column space of
the matrix; this approach has the advantage of working with ideas like linear independence to make
relevant proofs more transparent. Computations of bases of sums and intersections of subspaces have
always been difficult for students and an attempt has been made to remove the difficulties of such
computations. The easy-paced treatment of the topics of these three chapters makes this part of the
book suitable for both students and teachers of undergraduate courses.
Chapters 4 to 8 deal adequately with the essentials in linear algebra for a postgraduate student in
mathematics. More practically, the topics cover the requirements of the NET syllabus. A brief look
at the contents of these chapters follows. Linear maps between vector spaces are studied in detail in
Chapter 4. The interplay between linear maps and matrices is stressed throughout this chapter. Other


important concepts, such as isomorphism, dimension formula and similarity, are dealt with in this
chapter. Projections as well as nilpotent maps and matrices are also introduced so that readers are
familiar with them long before their actual applications. Chapter 5 is a long one; the goal is to obtain
the diagonalization theorems. However, the main emphasis is to carefully develop the concepts, such
as eigenvalues, characteristic polynomials, minimal polynomials and invariant subspaces, which are
essential in many branches of higher mathematics. Cyclic subspaces and companion matrices are also
treated here. Chapter 6 is devoted to canonical forms of matrices. A shorter and more accessible treat-
ment of Jordan form is provided. Primary decomposition theorem and rational canonical forms are the
other two topics in this chapter. Chapter 7 discusses bilinear forms. A method for diagonalizing sym-
metric matrices as well as quadratic forms is given here. Sylvester’s classical result for real symmetric
matrices is also included. Chapter 8 deals with certain essential concepts which can be treated in the
framework of inner product spaces and are introduced through hermitian forms. The main objective
of this chapter is to obtain the orthogonal diagonalization of hermitian and real symmetric matrices.
Standard topics, such as Gram-Schmidt process, adjoints, self-adjoint and normal operators, are thor-
oughly examined in this chapter leading to the Spectral theorem. Unitary and orthogonal operators are
the other key topics of this chapter.
The final chapter, Chapter 9, is devoted to a few topics which are a must for a student of linear
algebra but unfortunately do not find a place in the syllabi of linear algebra in most of the Indian
universities. The chapter begins with a discussion of rigid motions and the canonical forms for orthog-
onal operators. Many applications of linear algebra in diverse disciplines depend on the theory of real
quadratic forms and real symmetric matrices; as examples of such applications, this chapter discusses
the classifications of conics and quadrics as well as the problems of constrained optimization, and
relative extrema of real-valued functions. To facilitate the discussion of these problems, positive def-
inite matrices are also introduced. Singular value decompositions of real or complex matrices reveal
important properties of such matrices and lead to amazing applications. The last section of the chapter
deals with singular value decompositions; as an application, Moore–Penrose inverses of matrices are
briefly discussed.
Numerous exercises are provided for almost all the sections of the book. These exercises form an
integral part of the text; attempts to solve these will enhance the understanding of the material they
deal with. A word about the true/false questions included in this book: We, at the North-Eastern Hill
University, have been encouraging the practice of including such true/false questions in examination
papers. We hope that the inclusion of such questions in this book will help spread the practice to other
mathematics departments of the country.
My thoughts about the subject matter of this book have been shaped by various books and articles
on algebra and linear algebra by master expositors such as Halmos, Herstein, Artin and others. Their
influence on this book is undeniable. I take this opportunity to acknowledge my indebtedness to all
of them. I have also benefited greatly from the textbooks listed in the bibliography; I express my
gratitude to all the authors of these textbooks. The material about isometry in the last chapter closely
follows Kumaresan’s lovely article on isometries which appeared in the Mathematics Newsletter, vol.
14, March 2005.
Above all, my colleagues in the Mathematics Department of the North-Eastern Hill University
deserve special thanks for helping me in so many ways during the preparation of this book. Professor
M.B. Rege and Professor H. Mukherjee were always ready with suggestions for me; their encour-
agement kept me going. Innumerable discussions with my younger colleagues, Ashish Das, A. Tiken
Singh, A. M. Buhphang, S. Dutta, J. Singh and Deepak Subedi, helped me immensely to give the final
shape to the manuscript, especially in preparing various exercises. A. Tiken Singh and Ashish Das also
helped me to learn the intricacies of AMS Latex. Professor Nirmal Dev read through some portions of
the initial draft; I thank him for his valuable suggestions.
I must thank the authorities of the North-Eastern Hill University, especially the then Vice-
Chancellor Professor M. Miri for granting me sabbatical leave for a year in 2003 during which the
first draft of this book was prepared.
Finally, I must admit that without my wife Moinee’s support, it would have been impossible to go
through preparing and typing several drafts of the manuscript for this book in the last five years. She
deserves my special thanks.
In spite of all the care I have taken, mistakes may have remained in this book. I take full responsi-
bility for any such mistakes and would appreciate their being pointed out.
—Promode Kumar Saikia

Preface to the Second Edition


This new edition was initially conceived so that certain topics, as suggested by reviewers, could be in-
corporated to make the book useful to a wider readership. Apart from these new topics, select portions
of the first edition, mainly in Chapters 3, 6 and 7, have been rewritten for greater clarity for the present
edition. Moreover, material in this edition is arranged in such a way that due importance can be given
to real symmetric matrices already in the initial chapters. As a result, it has been possible to present the
important result about the orthogonal diagonalizability of real symmetric matrices in Chapter 5 (where
diagonalizable operators are discussed). This removes a major drawback of the first edition, where
one had to wait till Chapter 8 to obtain the same result as a consequence of the theory of self-adjoint
operators in general inner product spaces.
The following are the new additions in this edition: LU factorization (Section 2.6; permutation
matrices needed for this section are introduced in a new subsection in Section 2.3), orthogonality in
Rn with respect to the standard dot product (Section 3.7; this section also deals with orthogonal and
unitary matrices and the Gram–Schmidt process), orthogonal diagonalizability of real symmetric matri-
ces (Section 5.3, in a new subsection), groups preserving bilinear forms such as orthogonal, pseudo-
orthogonal and symplectic groups (Section 7.5). A new section 1.7 contains definitions and examples
of groups, rings and fields for the benefit of readers not familiar with these concepts.
Because of this new material, we had to omit the last chapter of the first edition on selected topics
in this edition to keep the book to a reasonable length; the omitted chapter, along with a new section on
difference equations and recurrence relations, will be uploaded to the website of the book. However,
two important topics from the omitted chapter are included in this edition at appropriate places. The
first one deals with conic sections as an application of real quadratic forms (Section 5.7) and the other
briefly discusses positive definite and positive semi-definite matrices (in Section 7.4). The topic of
rational canonical forms in Chapter 6 of the first edition is shifted to the website.
The other major changes in this edition are as follows: a new proof of the result that the character-
istic and the minimal polynomials of a linear operator have the same irreducible factors, which does
not use field extensions (Proposition 5.6.6), correction of the proof of existence of SN decomposition
(Proposition 6.2.3) and a simpler proof of Sylvester’s law of inertia (Theorem 7.4.8). Some incorrect
statements in the first edition are either corrected or removed for this edition. Some new exercises are
also added in this edition.
While preparing this edition, I have had many fruitful discussions with several of my colleagues,
especially with Prof. A. K. Das of North Eastern Hill University and Prof. R. Shukla of Allahabad
University; their valuable suggestions helped me in formulating the contents of several new topics.
Prof. S. S. Khare went through the drafts of some of the new material and offered useful comments to


make the presentation better. Thanks are due also to the teachers using the first edition in classrooms
for pointing out mistakes and shortcomings in the first edition. I must acknowledge the role of the
anonymous reviewers too whose views prompted this revised edition. Finally, my sincere thanks to all
the persons in Pearson Education India, especially Jigyasa Bhatia, Anita Yadav, Rajesh Matthews and
Nikhil Rakshit, whose patient cooperation and useful suggestions have made this new edition possible.

—Promode Kumar Saikia



A Note to Students
This book is intended to help in understanding concepts and learning procedures for computations in
linear algebra. Computations are essential in linear algebra; but to carry out computations effectively
and provide justification for the procedures adopted, one needs to thoroughly understand the related
concepts. Classroom instructions do play a key role in understanding the material in a course of linear
algebra. However, you will have to follow up the classroom lectures with your own effort. This is
where this book fits in.
I have chosen a conversational style for this book so that it will be easy for you to read and re-
read any material till you feel you have mastered it. You can test your understanding with the help of
exercises provided at the end of all sections of the book. It is essential that you work out these exercises
honestly. In doing so, you will find, more often than not, that you have to go back to the text to clear up
some points. There are true/false questions in most sets of exercises; these questions ask you to
determine whether a statement is true or false. Your answer has to be justified; a true statement requires
a short proof whereas an example (known as a counterexample) is needed to disprove a false one.
You will find that quite a few results in this book are left to the reader; some of these depend on
routine verifications. It is expected that you complete such proofs. Routine verifications are usually
based on straightforward arguments; you should go through such verifications at least once even if
they are not too exciting.
Finally, I hope that using this book proves as useful and enjoyable for you as writing it was for me.
—Promode Kumar Saikia


List of Symbols

ai j the (i, j)th entry of the matrix A


ann(T) the annihilator of T
[A b] an augmented matrix
A−1 the inverse of A
A∗ the conjugate of A
At the transpose of A
Bil(V) the set of bilinear forms on V
C the set of complex numbers
C[a, b] the set of all real-valued continuous functions on [a, b]
C( f (x)) the companion matrix of f (x)
δi j the Kronecker delta symbol
ei j the unit matrix with 1 at (i, j)th place
End(V) the set of linear maps of V into itself
EndF (V) the set of F-linear maps of V into itself
F a field
Fn the set of all n-tuples over a field F
F[x] the set of all polynomials over a field F
GLn (F) the set of all invertible matrices of order n over a field F
Hom(V, W) the set of all linear maps of V into W
In the identity matrix of order n
Im( f ) the image of a map f
Jn (a) the elementary Jordan form of order n with eigenvalue a
ker f the kernel of a linear map f
Mm×n (F) the set of all m × n matrices over F
Mn (F) the set of all matrices of order n over F
O the zero matrix
Q the set of rational numbers
R the set of all real numbers
R[x] the set of all polynomials over R
Rn [x] the set of all polynomials of degree at most n over R


Rθ the rotation of the plane through angle θ


Sym(V) the set of all symmetric bilinear forms on V
⟨S⟩ the span of the set S
∼ the relation of row equivalence
T∗ the adjoint of the operator T
T r(A) the trace of the matrix A
V⊥ the radical of V
V ⊥L the left radical of V
V ⊥R the right radical of V
V∗ the dual space of V
⟨v, w⟩ the inner product of v and w
‖v‖ the length of v
W⊥ the orthogonal complement of W
XR the set of all maps from X into R
z the zero map
Z the set of integers
Z(v, T ) the T -cyclic subspace generated by v

1 Matrices

1.1 INTRODUCTION
Matrices, these days, are indispensable in the field of mathematics and in countless applications of
mathematics. They are extremely useful in arranging, manipulating and transferring data, and so have
proved invaluable in efficient analysis of large databases. To cite a few examples, recent innovations
such as internet search engine algorithms, effective communications of images and data over the in-
ternet and to and from satellites, computer-aided designs used extensively in industries, all these rely
crucially on matrix techniques. Linear models are employed in diverse disciplines to study various
phenomena; matrices play an important role in such models. On the other hand, keeping pace with the
ever-increasing speed and memories of computers, matrix techniques in numerical analysis are under-
going constant refinement leading to vast improvement of our ability to deal with complex systems
one comes across in areas such as weather forecasting, economic planning, data management services,
etc. There is no doubt that the study of matrix techniques and the theory of matrices will keep growing
in importance in coming days.
In linear algebra, matrices are used not only to visualize abstract concepts in concrete terms, but
also to gain insights about such concepts using their matrix representations. For example, as we shall
see shortly, systems of linear equations, one of the major topics in linear algebra, can be interpreted as
matrix equations; this allows an efficient treatment of such systems using powerful matrix methods.
We shall also see that solutions of matrix equations lead naturally to concrete examples of vector
spaces, another central topic in linear algebra. The concept of linear maps or linear transformations
lies at the core of linear algebra; we shall further initiate the reader to the idea of matrices as linear
maps or linear transformations between such concrete vector spaces. So, we begin our study of linear
algebra with an introduction to the algebraic structures formed by matrices, illustrating some fundamental
concepts of linear algebra with the help of matrices.

1.2 BASIC CONCEPTS


An array A of mn numbers arranged in m rows and n columns, such as:

 
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$


is said to be an m × n matrix, or simply a rectangular matrix. The number which appears at the
intersection of the ith row and the jth column is usually referred to as the (i, j)th entry of the matrix A.
It is natural that the number appearing as the (i, j)th entry be denoted by ai j in general. Then matrix
A is described as [ai j ]m×n ; if it is clear from the context or if the size of the array is not important, we
drop the subscript m × n. Sometimes, it is more convenient to describe the (i, j)th entry of a matrix A
as Ai j .
The entries ai j of a matrix A = [ai j ] may be real numbers or complex numbers. If all the entries are
real, the matrix A is called a real matrix; otherwise, it is called a complex matrix. For example, if
$$A = \begin{bmatrix} 5 & -1 & 1/2 \\ 0 & 2 & \sqrt{2} \end{bmatrix},$$

then A is a real 2 × 3 matrix, i.e., A is a matrix having 2 rows and 3 columns of real numbers with
a11 = 5, a21 = 0, a23 = √2, etc.
The set of real numbers R as well as the set of complex numbers C are examples of algebraic struc-
tures known as fields. Like these familiar numbers, elements of a field can be added and multiplied.
Every field has two distinguished elements which behave exactly in the same way with respect to
field addition and multiplication as our familiar real numbers 0 and 1; it is customary to denote these
field elements by the same symbols 0 and 1. Moreover, subtraction and division by non-zero elements
are also allowed in a field. For formal definitions and some examples of fields and other algebraic
structures such as groups and rings, see Section 1.7 at the end of this chapter.
The elements of an arbitrary but fixed field are called scalars. A field, in general, will be denoted
by the symbol F.
Matrices over any field can be defined the same way as matrices with real or complex entries. Thus,
any rectangular array of m rows and n columns of scalars from a field F is an m × n matrix over F.
From now onwards, we will be considering matrices over an arbitrary field unless otherwise specified.
However, readers who are unfamiliar with the idea of an arbitrary field can continue to treat ma-
trices as arrays over any familiar number system without any hindrance to their understanding of the
material in the initial two chapters.
We go back to introducing new concepts and nomenclatures for matrices over a field F.
If a matrix A over a field F is a 1 × n matrix, that is, A has a single row, then we say that A is an
(n-dimensional) row vector over F. It is convenient in this case to drop the row index i and write A
simply as

A = [a1 a2 ... an ].

Similarly, an m × 1 matrix A over F is an (m-dimensional) column vector

$$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix},$$

where ai ∈ F.

Observe that an m × n matrix A = [ai j ] over F can be described as comprising m row vectors
ρ1 , ρ2 , . . . , ρm , where ρi is given by

$$\rho_i = (a_{i1}, a_{i2}, \ldots, a_{in}), \qquad 1 \le i \le m,$$

or as comprising n column vectors γ1 , γ2 , . . . , γn , where

$$\gamma_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}, \qquad 1 \le j \le n,$$

is an m-dimensional column vector over F.


Sometimes, it is convenient to write a matrix A having n columns γ1 , γ2 , . . . , γn as

A = [γ1 , γ2 , . . . , γn ].
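Readers who like to check such descriptions on a computer may try a short Python sketch along the following lines; it assumes the NumPy library, and the entries of the matrix are sample values chosen only for illustration.

```python
# A small illustration, assuming NumPy; the entries of A are sample values only.
import numpy as np

A = np.array([[5.0, -1.0, 0.5],
              [0.0,  2.0, np.sqrt(2)]])

rho_1 = A[0, :]    # the first row vector of A
gamma_2 = A[:, 1]  # the second column vector of A
print(rho_1)       # [ 5.  -1.   0.5]
print(gamma_2)     # [-1.  2.]
```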

Quite frequently, we have to consider rectangular matrices having the same number of rows and
columns. An n × n matrix (having n rows and n columns) with entries from a field F is a square matrix
of order n over F. Such square matrices, because they occur frequently in diverse areas, form a very
important subfamily of matrices.

Special Forms of Matrices


A square matrix A = [ai j ] has a distinguished set of entries, namely, its diagonal consisting of the
entries aii , that is, a11 , a22 , . . . , ann . A square matrix A = [ai j ] is said to be a diagonal matrix if all
its off-diagonal entries are zero, that is, ai j = 0 if i ≠ j. (Of course, some or all the diagonal entries of
a diagonal matrix may be zero). A diagonal matrix is called a scalar matrix if all the diagonal entries
are equal; a very special case of a scalar matrix of order n is the one where all the diagonal entries
are equal to 1; then, it is called the identity matrix of order n over F. An identity matrix of order n is
usually denoted by In . So, the identity matrix In of order n over F looks like
 
1 
 1 
 0 
 . 
 
 . ,
In =  
 . 
 . 
 0 
 1 
1

where the multiplicative identity 1 of F appears along the diagonal. The two ‘zeros’ appearing off-
diagonal indicate that except for those indicated along the diagonal, all other entries of the matrix are
zeros.
Saikia-Linear Algebra book1 February 25, 2014 0:8

4 Matrices

A square matrix over a field F is upper triangular if all its entries below the diagonal are zero.
(That does not exclude the possibility of zeros occurring as other entries). Using our ‘zero’ notation,
such a matrix of order n can be described as
 
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ & a_{22} & \cdots & a_{2n} \\ & & \ddots & \vdots \\ 0 & & & a_{nn} \end{bmatrix},$$

the zero below the diagonal indicating that all the entries below the diagonal are zeros. Similarly, a
lower triangular matrix looks like
 
$$\begin{bmatrix} a_{11} & & & 0 \\ a_{21} & a_{22} & & \\ \vdots & & \ddots & \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}.$$

We now come back to the discussion of general rectangular matrices. An m×n matrix over a field F
having all its entries zeros is called the zero matrix over F; we will denote it by 0m×n or simply by 0
if its size is not important.
Before operations on matrices are introduced, one has to know when two matrices are equal.

Definition 1.2.1. Two matrices A and B over a field F are equal, and we write A = B, if
• A and B have the same size, that is, they have the same number of rows and columns, and
• The corresponding entries of A and B are equal as elements of F.

Symbolically, two m × n matrices A = [ai j ] and B = [bi j ] over F are equal if

ai j = bi j for all i, j such that 1 ≤ i ≤ m, 1 ≤ j ≤ n.

According to the preceding definition, two n-dimensional row vectors a = (a1 , a2 , . . . , an ) and
b = (b1 , b2 , . . . , bn ) over a field F are equal if and only if ai = bi (for 1 ≤ i ≤ n) as elements of F;
this shows that the order in which the components ai appear in the row vector (a1 , a2 , . . . , an ) is
important. Since the order of appearance of the elements ai determines the row vector (a1 , a2 , . . . , an ), it is
also called an ordered n-tuple. For n = 2 such a symbol is called an ordered pair; thus, as ordered
pairs, (a, b) is not the same as (b, a) (unless a = b) even though as sets {a, b} and {b, a} are the same.
An ordered n-tuple for n = 3 is an ordered triple.
Similarly, two n-dimensional column vectors over F are equal if and only if corresponding entries
are equal. Thus the order of the entries of an n-dimensional column vector determines it uniquely and
so we can also think of it as an ordered n-tuple.
Thus, an n-dimensional row or a column vector over a field F is an ordered n-tuple and conversely
any n-tuple with elements from a field F can be considered either an n-dimensional row vector or an
n-dimensional column vector.

The set of all ordered n-tuples formed by the elements of a field is usually denoted by Fn . As we
shall see later, Fn is a vector space; its elements are called vectors. When a vector in Fn is represented as
an n-dimensional column vector or row vector, its entries will be called the components of the vector.
More generally, the set of all m × n matrices over a field F is another example of a vector space
(see Section 1.3). The requirements that these matrices need to satisfy are certain properties of two
operations on such matrices, namely addition and scalar multiplication. We introduce these operations
now.
From now onwards, the set of all m × n matrices will be denoted by Mm×n (F); Mn (F) will denote
the set of all square matrices of order n over the field F.

Addition and Multiplication of Matrices


We can add two matrices over a field if they have the same size; the sum is again a matrix of the same
size and its entries are obtained by adding the corresponding entries of the two given matrices. In other
words, if A, B ∈ Mm×n (F) with A = [ai j ] and B = [bi j ], then the sum, denoted by A + B, is given by

A + B = [ai j + bi j ].

We can also define the sum by letting A + B = [ci j ], where

ci j = ai j + bi j for 1 ≤ i ≤ m, 1 ≤ j ≤ n. (1.1)

Note that A + B ∈ Mm×n (F).


In particular, if A, B ∈ Mn (F), (that is for m = n) then their sum A + B ∈ Mn (F). Similarly, the
sum a + b of two n-dimensional column (respectively, row) vectors in Fn , obtained by adding the
corresponding entries of a and b, is an n-dimensional column (respectively, row) vector in Fn as
Fn = Mn×1 (F).
The following illustrates the addition of two real 2 × 3 matrices:
$$\begin{bmatrix} 1 & 2 & 0 \\ 4 & 0 & 6 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 3 \\ 0 & 5 & 0 \end{bmatrix} = \begin{bmatrix} 1+1 & 2+0 & 0+3 \\ 4+0 & 0+5 & 6+0 \end{bmatrix} = \begin{bmatrix} 2 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}.$$

It should be clear that matrices of different sizes cannot be added; we say that such sums are not
defined.
We can multiply a matrix over a field F by an element of the field or a scalar; such an operation is
known as scalar multiplication.

Definition 1.2.2. For any A ∈ Mm×n (F) and any scalar c ∈ F the scalar multiple cA is the matrix in
Mm×n (F) obtained from A by multiplying each entry of A (which is a scalar too) by c. Thus, if A = [ai j ]
then
cA = [cai j ] for 1 ≤ i ≤ m, 1 ≤ j ≤ n. (1.2)

It is clear that for A ∈ Mn (F) or x ∈ Fn , the scalar multiple cA is in Mn (F) and cx is in Fn .



For example, if
$$A = \begin{bmatrix} -1 & 0 & i \\ 3 & 2 & 0 \end{bmatrix}$$
is a 2 × 3 matrix over C, then the following scalar multiples
$$2A = \begin{bmatrix} -2 & 0 & 2i \\ 6 & 4 & 0 \end{bmatrix} \qquad \text{and} \qquad iA = \begin{bmatrix} -i & 0 & -1 \\ 3i & 2i & 0 \end{bmatrix}$$

are also 2 × 3 matrices over C.


Recall that a scalar matrix is a diagonal matrix (that is, a square matrix all of whose off-diagonal
entries are zero) all of whose diagonal entries are equal. It is clear that a scalar matrix of order n all of
whose diagonal entries are equal to, say c, can be written as the scalar multiple cIn of the identity
matrix In of order n.
Note that if we replace the scalars ai j that appear as entries of a matrix A = [ai j ] ∈ Mm×n (F) by their
negatives −ai j in F, the resultant m × n matrix is just the scalar multiple of A by the scalar −1. For
convenience, we denote this new matrix in Mm×n (F) by −A. We think of this matrix as the negative (or
formally as the additive inverse) of the matrix A. It is clear that when a matrix A ∈ Mm×n (F) is added
to the matrix −A, the sum is the zero matrix 0 in Mm×n (F).
The notation of the negative of a matrix allows us to introduce subtraction of matrices of the same
size. If A = [ai j ] and B = [bi j ] are two matrices in Mm×n (F), then A − B will be the matrix in Mm×n (F)
given by

A − B = A + (−B) = [ai j − bi j ].

In other words, the (i, j)th entry of A − B is obtained by subtracting the scalar bi j from ai j in F. Thus,
for any given A ∈ Mm×n (F), all the entries of A − A will be zeros, that is, A − A = 0, the zero matrix in
Mm×n (F).
We illustrate some of the ideas discussed so far with an example. If
$$A = \begin{bmatrix} 2 & 0 \\ 1/2 & -1 \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} 1 & 0 \\ 1/4 & -1/2 \end{bmatrix}$$

are matrices over R, then

$$A - 2B = \begin{bmatrix} 2 & 0 \\ 1/2 & -1 \end{bmatrix} - 2\begin{bmatrix} 1 & 0 \\ 1/4 & -1/2 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 1/2 & -1 \end{bmatrix} - \begin{bmatrix} 2 & 0 \\ 1/2 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

So, A − 2B is the 2 × 2 zero matrix over R. Similarly, the reader can verify that
$$A + 2B = \begin{bmatrix} 4 & 0 \\ 1 & -2 \end{bmatrix}.$$

We also note that A = 2B or B = (1/2)A.
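Such computations can be verified mechanically; here is a quick Python sketch, assuming the NumPy library, with A and B as in the example above.

```python
# A quick check of the example above, assuming NumPy.
import numpy as np

A = np.array([[2.0, 0.0], [0.5, -1.0]])
B = np.array([[1.0, 0.0], [0.25, -0.5]])

print(A - 2 * B)              # the 2 x 2 zero matrix
print(A + 2 * B)              # [[4, 0], [1, -2]]
print(np.allclose(A, 2 * B))  # True, since A = 2B
```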



Like numbers, matrices can also be multiplied although matrix multiplication, unlike addition or
subtraction, is a more complicated operation. The product of two matrices cannot be computed sim-
ply by multiplying the corresponding entries of the matrices. We first look at the special case of the
multiplication of a row vector A by a column vector B of the same size over a field F. If
 
$$A = (a_1, a_2, \ldots, a_n) \qquad \text{and} \qquad B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix},$$

then the product AB is defined as the scalar (or the number)

AB = a1 b1 + a2 b2 + · · · + an bn .

This is usually referred to as the dot product or the scalar product of the two vectors. For example, the
dot product of
 
2
/ 0  
A= 5 0 −3 and B = 1
 
4

will be AB = 5.2 + 0.1 + (−3) · 4 = 10 + 0 − 12 = −2.


The sum expressing the product AB can be abbreviated by using the convenient notation for sum-
mation denoted by the Greek letter sigma (Σ):

$$a_1 b_1 + a_2 b_2 + \cdots + a_n b_n = \sum_{i=1}^{n} a_i b_i. \tag{1.3}$$
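As a quick computational check of the dot product just computed, the following is a minimal Python sketch, assuming the NumPy library.

```python
# A quick check of the dot product above, assuming NumPy.
import numpy as np

a = np.array([5.0, 0.0, -3.0])  # the row vector A of the example
b = np.array([2.0, 1.0, 4.0])   # the column vector B of the example
print(np.dot(a, b))             # 5*2 + 0*1 + (-3)*4 = -2.0
```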

In the general case, we can multiply a matrix A by another matrix B only when the number of
columns of A is the same as the number of rows of B, so that we can take the dot product of each
row vector of A with each column vector of B. To be more precise, if A is an m × n matrix (having n
columns) and B an n × p matrix (having n rows), both over the same field F, then the product AB is
an m × p matrix over F; the (i, j)th entry of AB is the scalar obtained by the dot product, given by the
Equation (1.3), of the ith row of A and the jth column of B.

Definition 1.2.3. For any m × n matrix A = [ai j ], 1 ≤ i ≤ m, 1 ≤ j ≤ n and an n × p matrix B =


[bi j ], 1 ≤ i ≤ n, 1 ≤ j ≤ p, the product AB is the m × p matrix AB = [ci j ] , where

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = \sum_{k} a_{ik} b_{kj}, \tag{1.4}$$

for any fixed i and j with 1 ≤ i ≤ m, 1 ≤ j ≤ p.



For example, the product of the following matrices


 
  0 1

1 2 3 
A =   and B = 2 −2

0 4 −1 
1 1
is a 2 × 2 matrix as A is a 2 × 3 and B a 3 × 2 matrix. If ci j denotes the (i, j)th entry of AB then

c11 = 1 · 0 + 2 · 2 + 3 · 1 = 7
c12 = 1 · 1 + 2 · (−2) + 3 · 1 = 0
c21 = 0 · 0 + 4 · 2 + (−1) · 1 = 7
c22 = 0 · 1 + 4 · (−2) + (−1) · 1 = −9

Hence,
 
7 0
AB =  .
7 −9
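The same product can be reproduced on a computer; the following Python sketch, assuming the NumPy library, computes the matrix AB obtained above.

```python
# A quick check of the product AB above, assuming NumPy.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, -1.0]])
B = np.array([[0.0, 1.0],
              [2.0, -2.0],
              [1.0, 1.0]])
print(A @ B)   # [[ 7.  0.]
               #  [ 7. -9.]]
```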
Sometimes, we say that matrices A and B are comparable for multiplication if the product AB is
defined. For example, if m ≠ n then for A, B ∈ Mm×n (F) the product AB is not defined.
However, it is clear that if A, B ∈ Mn (F) then the product AB is defined and AB ∈ Mn (F).
In particular, if A ∈ Mn (F) then A can be multiplied by itself. This product, which is denoted by A2 ,
is in Mn (F). Therefore, it is possible to define the integral power Ak for any positive integer k as a matrix
of order n by multiplying A by itself k times. But we will have to wait till Section 1.3 to see that
there is no ambiguity in defining Ak for k ≥ 3.
As an exercise involving matrix multiplication, we prove the following simple but useful result.

Proposition 1.2.4. The product of a finite number of lower triangular (respectively, upper tri-
angular) matrices in Mn (F) is a lower triangular (respectively, upper triangular) matrix in Mn (F).
Moreover, if the diagonal elements of the matrices in the product are all equal to 1, then the diagonal
elements of the product are also equal to 1.

Proof. We prove the result for lower triangular matrices, leaving the similar proof for upper triangular
matrices to the reader. Also it is clear that it suffices to prove the result for a product of two matrices.
So let A = [ai j ] and B = [bi j ] be two lower triangular matrices of order n over a field F. Since in both A
and B, the entries above the diagonal are all zero, it follows that ai j = bi j = 0 if i < j. Now, if we denote
AB = [ci j ], then by the definition of matrix multiplication, for any fixed i and j with 1 ≤ i, j ≤ n,
$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = \sum_{k=1}^{i} a_{ik} b_{kj} + \sum_{k=i+1}^{n} a_{ik} b_{kj}.$$

If i < j then each bk j in the first sum is zero as k ≤ i in this sum whereas each aik in the second sum is
zero as k ≥ i + 1 in the second sum. It therefore follows that ci j = 0 if i < j, proving that AB = [ci j ] is
lower triangular.

To verify the second assertion, note that by hypothesis, aii = bii = 1 for each i, 1 ≤ i ≤ n. Also any
diagonal element of AB is given by

$$c_{ii} = \sum_{k=1}^{n} a_{ik} b_{ki} = \sum_{k=1}^{i-1} a_{ik} b_{ki} + \sum_{k=i+1}^{n} a_{ik} b_{ki} + a_{ii} b_{ii}.$$

Each bki is zero in the first sum and each aik is zero in the second sum so the sum reduces to cii =
aii bii = 1 by hypothesis. !
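Proposition 1.2.4 can also be observed numerically; the following Python sketch, assuming the NumPy library and using two sample lower triangular matrices with 1's on the diagonal, checks both assertions.

```python
# A numerical illustration of Proposition 1.2.4, assuming NumPy;
# A and B below are sample lower triangular matrices with 1's on the diagonal.
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [4.0, 5.0, 1.0]])
B = np.array([[1.0, 0.0, 0.0],
              [-3.0, 1.0, 0.0],
              [0.0, 7.0, 1.0]])
P = A @ B
print(np.allclose(P, np.tril(P)))  # True: the product is lower triangular
print(np.diag(P))                  # [1. 1. 1.]: the diagonal entries are all 1
```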

Matrix Notation for Systems of Linear Equations


We now discuss how the definition of matrix multiplication allows us to describe systems of
linear equations in matrix notations. Consider, for example, the following system of two equations in
two variables:

2x − 3y = 5
x + 4y = −3.

Arranging the coefficients of the variables x and y in an array exactly the way they appear in the two
equations, we obtain the following matrix of order 2:
$$A = \begin{bmatrix} 2 & -3 \\ 1 & 4 \end{bmatrix},$$

which is known as the coefficient matrix of the given system of equations. Note that as the coefficients
are real numbers, A is a real matrix. If we set $x = \begin{bmatrix} x \\ y \end{bmatrix}$, a 2-dimensional column vector (a 2 × 1 matrix),
then it is a simple exercise of matrix multiplication to show that the product Ax is a 2 × 1 matrix given
by
$$Ax = \begin{bmatrix} 2x - 3y \\ x + 4y \end{bmatrix}.$$
Therefore, if $b = \begin{bmatrix} 5 \\ -3 \end{bmatrix}$, it follows, by the definition of the equality of two column vectors, that the
given system of equations can be described by the matrix equation

Ax = b.
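The following short Python sketch, assuming the NumPy library, sets up this matrix equation and verifies that the solution mentioned later in this section (x = 1, y = −1) indeed satisfies Ax = b.

```python
# A sketch of the matrix form of the small system above, assuming NumPy.
import numpy as np

A = np.array([[2.0, -3.0],
              [1.0,  4.0]])
b = np.array([5.0, -3.0])
x = np.array([1.0, -1.0])      # the solution mentioned later in this section

print(A @ x)                   # [ 5. -3.], which is the column vector b
print(np.allclose(A @ x, b))   # True: x solves Ax = b
```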

The procedure for expressing any system of linear equations over any field as a matrix equation is
a straight-forward generalization of the preceding example. The general system of m equations in n

variables (sometimes thought of as unknowns) over a field F is usually described as

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\
 & \ \ \vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m.
\end{aligned}$$

Here, the ai j and the bi stand for given (or known) scalars in F and the xi are the variables or the
unknowns.
For convenience, this system of linear equations can also be described briefly by using the summa-
tion notation:

$$\sum_{j=1}^{n} a_{ij} x_j = b_i \qquad \text{for } i = 1, 2, \ldots, m. \tag{1.5}$$

Here, the index i tells us which one of the m equations is being considered, whereas for each fixed i,
the index j indicates the position in the ith equation that we are looking at. So, ai j is the coefficient of
the jth variable in the ith equation.
The system of linear equations given in Equation (1.5) is said to be homogeneous, if bi = 0 for all i.
A solution of the system is a list s1 , s2 , ..., sn of scalars in F such that when the scalars s1 , s2 , ..., sn
are substituted for the variables x1 , x2 , ..., xn , respectively, each one of the m equations becomes an
equality. Clearly the order of the scalars in the list s1 , s2 , ..., sn is important. For example, the system
of two equations we had considered earlier has 1, −1 as a solution but not −1, 1. Thus, any solution
of the general system of m equations in n variables given by Equation (1.5) is an ordered n-tuple of
elements of F; usually such a solution will be considered a column vector in Fn . If each si is zero, the
solution is called the zero solution; a nonzero solution has at least one si nonzero.
Associated with the system given by Equation (1.5) of linear equations is an m × n matrix, known
as the coefficient matrix of the system. If the system of equations is over a field F, the coefficients
are scalars from F, and so the coefficient matrix is also over F. As the name suggests, this matrix has
for its jth column precisely the coefficients of the jth variable x j appearing the way they do vertically
downwards from the first equation. Thus, symbolically, the coefficient matrix of the system described
by Equation (1.5) is [ai j ]. For example, the coefficient matrix of the following system

x1 + 2x2 − 3x3 = 1
4x1 + 5x3 = 0
−x1 + x2 + x3 = 9

 
 1 2 −3
 
is the 3 × 3 real matrix A =  4 0 5. In this matrix, for example, a11 = 1, a12 = 2, a22 = 0 and
 
−1 1 1
a32 = 1.
Next, we show how the idea of matrix multiplication can be used to express the general system of
linear equations given by Equation (1.5) as a single matrix equation. Since the system consists of m

equations in n variables, the coefficient matrix A = [ai j ] will be an m × n matrix over F. Let x and b,
respectively, be the n × 1 and m × 1 column vectors given by
   
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \qquad \text{and} \qquad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.$$
Here, the components of x are the variables or the unknowns and those of the column vector b over
F are the scalars appearing in the right-hand side of Equation (1.5). Now observe that the product Ax is
an m-dimensional column vector whose ith component, by the rule of matrix multiplication, is the dot
product of the ith row of A and the column vector x and hence is precisely $\sum_{j=1}^{n} a_{ij} x_j$, the left hand
side of Equation (1.5). Therefore, the system given by Equation (1.5) is equivalent to the single matrix
equation
Ax = b (1.6)
 
by the definition of equality of matrices. Note that, as stated earlier, a column vector
$$s = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{bmatrix}$$
in Fn is
a solution of the matrix equation Ax = b if and only if the components s1 , s2 , . . . , sn , in that order,
form a solution of the system of equations (1.5).

Column-Row Expansion
The product Ax of a matrix and a column vector, which we have expanded using the row-column
multiplication, has another equally useful expansion, usually referred to as the column-row expansion.
To see how this works, consider an m × n matrix A = [ai j ] over a field F and denote its jth column for
 
1 ≤ j ≤ n, by
$$\gamma_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}.$$
Now for any scalar x j (for 1 ≤ j ≤ n),
$$x_j \gamma_j = \begin{bmatrix} a_{1j} x_j \\ a_{2j} x_j \\ \vdots \\ a_{mj} x_j \end{bmatrix}.$$
On the other hand, as we have noted already, the ith component of the m-dimensional column vector
Ax is the sum $\sum_{j=1}^{n} a_{ij} x_j$ and so, by the preceding expression for the m-dimensional column vector
x j γ j , also equals the ith component of the sum $\sum_{j=1}^{n} x_j \gamma_j$ of the m-dimensional column vectors. Thus,
Ax can be expressed as
 
$$A \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 \gamma_1 + x_2 \gamma_2 + \cdots + x_n \gamma_n, \tag{1.7}$$
a linear combination of the columns of A.

For example,
        
 1 2 3   2  1 2 3
 4 5        
6   −3  = 2 4 − 3 5 + 4 6 ,
        
7 8 9 4 7 8 9
which the reader should verify by computing the product on the left hand side by the usual row-column
multiplication and adding the column vectors (after performing the scalar multiplication) on the right
hand side.
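This verification is also easily done on a computer; the following Python sketch, assuming the NumPy library, compares the two sides of Equation (1.7) for the example above.

```python
# Checking Equation (1.7) for the example above, assuming NumPy.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
x = np.array([2.0, -3.0, 4.0])

lhs = A @ x                                     # the usual row-column multiplication
rhs = 2 * A[:, 0] - 3 * A[:, 1] + 4 * A[:, 2]   # a linear combination of the columns of A
print(lhs, rhs, np.allclose(lhs, rhs))
```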
The column-row expansion, as given in Equation (1.7), will be useful in several occasions later
in this book, primarily because it shows that the matrix equation Ax = b is equivalent to the vector
equation

x1 γ1 + x2 γ2 + · · · + xn γn = b, (1.8)

where γ1 , γ2 , . . . , γn are the vectors forming the columns of A and x1 , x2 , . . . , xn are the components
of the column vector x.
We shall further see (at the end of Section 1.6) that, in general, the product of two matrices can also
be computed by column-row expansion.
Before we end this introductory section, we consider another important aspect of matrices. Given
any m × n matrix A ∈ Mm×n (F) and any column vector a ∈ Fn , that is, an n × 1 matrix over F, the product
Aa = b is an m × 1 matrix over F, that is, a column vector in Fm . In other words, multiplication by A
produces a function or a mapping, say f , from Fn to Fm , given by f (a) = Aa. This function has the
following basic properties:

f (a + a' ) = f (a) + f (a' ) and f (ca) = c f (a)

for any a, a' ∈ Fn and c ∈ F. Due to these properties, f is said to be a linear map. Linear maps play a
most important role in linear algebra and its applications. As we shall see later, any linear map from
Fn into Fm can essentially be realized as multiplication by some suitable matrix in Mm×n (F).
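The two defining properties of this linear map are easy to check numerically; here is a minimal Python sketch, assuming the NumPy library, with the matrix A, the vectors a, a′ and the scalar c chosen arbitrarily for illustration.

```python
# Checking the two defining properties of f(a) = Aa, assuming NumPy;
# the matrix A, the vectors a, a' and the scalar c are arbitrary sample values.
import numpy as np

A = np.array([[1.0, -2.0, 0.0],
              [3.0,  1.0, 5.0]])
a = np.array([1.0, 0.0, 2.0])
a_prime = np.array([-1.0, 4.0, 0.5])
c = 3.0

print(np.allclose(A @ (a + a_prime), A @ a + A @ a_prime))  # f(a + a') = f(a) + f(a')
print(np.allclose(A @ (c * a), c * (A @ a)))                # f(ca) = c f(a)
```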
In the next section, we shall be studying systematically the properties of the operations on matrices
that we have introduced so far.

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. All ma-
trices are over an arbitrary field F unless otherwise specified.
(a) For any two matrices A and B of the same size, A + B = B + A.
(b) If, for matrices A and B the product AB is defined, then the product BA is also defined.
(c) For any two matrices A and B of order 2, AB = BA.
(d) If A, B and C are matrices of the same size, then

(A + B) + C = A + (B + C).

(e) If C is a scalar matrix of order n, then for any matrix A of order n, CA = AC.
(f) If A is a real matrix of order n, then A2 − 2A is also a real matrix of order n. (A2 = AA)
(g) For a matrix A of order n such that A ≠ In , A2 can never be equal to In .

(h) Every homogeneous system of linear equations with coefficients from a field F determines a
matrix equation Ax = 0 where 0 is a zero column vector over F.
(i) If the first two columns of a matrix B are equal, then the first two columns of AB are equal
whenever AB is defined.
(j) If the first two rows of a matrix B are equal, then the first two columns of AB are equal
whenever AB is defined.
(k) Every field has at least two distinct elements.
(l) If A is an m × n matrix and B is the n × p zero matrix, then AB is the m × p zero matrix.
2. If $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ over any field F, then show that A2 is the zero matrix of order 2 over F.
3. Compute AB and BA if
$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$
4. If $A = \begin{bmatrix} i & 1 \\ 1 & -i \end{bmatrix}$, then show that A2 is the zero matrix of order 2 over C.
5. If
 
1 2 3
 
A = 0 4 5,
 
0 0 6
then show that the product (A − I3 )(A − 4I3)(A − 6I3) is the zero matrix of order 3, where I3 is the
identity matrix of order 3 over R.
6. Show that A3 = I3 if
 
0 0 1
 
A = 1 0 0.
 
0 1 0

7. Show that A3 = I2 if
$$A = \begin{bmatrix} -1 & -1 \\ 1 & 0 \end{bmatrix}.$$
8. If for a real 2 × 2 matrix A, AB = BA for every real 2 × 2 matrix B, then prove that A must be a
scalar matrix.
9. Let
$$A = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} p & q \\ r & s \end{bmatrix}$$
be matrices over a field F. If q and r are non-zero scalars, find the conditions on the entries of A
so that AB = BA.
10. If the first two columns of a matrix B over a field F are equal, then show that the first two columns
of AB are equal for any matrix A over F such that AB is defined.
11. Show that it is possible to find two 2 × 2 matrices A and B over any field F having entries only 0
and 1 but such that AB ≠ BA. Generalize to the case of n × n matrices.

12. Consider a matrix A over a field F having a zero row. For any matrix B over F such that AB is
defined, show that AB must have a zero row.
13. Give an example of two real 2 × 2 matrices A and B such that

(A + B)2 ≠ A2 + 2AB + B2 .

14. Let A and B be two 2 × 2 complex matrices. Prove that (AB − BA)2 is a scalar matrix. Can this
result be generalized to n × n matrices?
15. For any complex numbers a, b, c and d, let
$$A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} c & d \\ -d & c \end{bmatrix}.$$

Prove that AB = BA.


16. For any real number θ, let $T(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Show that

T (θ1 )T (θ2 ) = T (θ2 )T (θ1 ) = T (θ1 + θ2 )

for any two real numbers θ1 and θ2 .


17. Let A and B be real matrices given by
$$A = \begin{bmatrix} \cos^2\theta & \cos\theta \sin\theta \\ \cos\theta \sin\theta & \sin^2\theta \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} \cos^2\phi & \cos\phi \sin\phi \\ \cos\phi \sin\phi & \sin^2\phi \end{bmatrix}.$$

Prove that AB is the zero matrix of order 2 if θ and φ differ by an odd multiple of π/2.
18. For the Pauli matrices,
$$\sigma_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad \sigma_2 = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \qquad \sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix},$$

prove the following relations:

$$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = -i\sigma_1 \sigma_2 \sigma_3 = I_2,$$

and

$$\sigma_1 \sigma_2 = i\sigma_3, \qquad \sigma_2 \sigma_1 = -i\sigma_3.$$

19. Prove Proposition (1.2.4) for upper triangular matrices.


20. Let A be an m × n, and B be an n × p matrix over R such that the sum of the entries of each row
of both the matrices equals 1. Prove that the sum of the entries of each row of the product AB
is also 1. Is the result true if rows are replaced by columns?
21. Determine real matrices A and B such that
 
$$A \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 3x_1 - x_2 + x_3 \\ -x_1 + 2x_3 \end{bmatrix},$$

and
   
$$B \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 + x_2 \\ x_2 - 3x_3 \\ 4x_1 - x_2 + x_3 \end{bmatrix}.$$

22. Write the following systems of equations with coefficients from R as matrix equations Ax = b:
(a) x1 − 2x2 + 3x3 − x4 = 0
x2 − x3 + 3x4 − 2x5 = 0.

(b) 2x1 + 7x2 = 1
−x1 − πx2 = −3.

One needs familiarity with finite fields for the following exercises.
23. The field of two elements is usually described as Z2 = {0, 1}, where 1 + 1 = 0. Write down the
distinct 2 × 2 matrices over Z2 .
24. Generalize the last exercise to compute the number of m × n matrices over the finite field of p
elements.

1.3 MATRIX OPERATIONS AND THEIR PROPERTIES


Addition of scalars from any field, say the field of real numbers, follows certain rules. These rules,
in turn, are key to verifications of basic properties of matrix addition. The rules governing addition,
denoted by +, in a field F are as follows:
(i) (Commutativity) For any a, b ∈ F, a + b = b + a.
(ii) (Associativity) For any a, b, c ∈ F, (a + b) + c = a + (b + c).
(iii) (Existence of Additive Identity) There is a unique element (the zero) 0 ∈ F such that

a+0 = a = 0+a

for any a ∈ F.
(iv) (Existence of Additive Inverse) For any a ∈ F there is a unique element −a ∈ F such that

a + (−a) = 0 = (−a) + a.

Because of these rules, a field F is said to be an abelian group with respect to its addition. Also
note the exact similarity of these rules to the rules of addition of integers which every reader must be
familiar with; so the set Z of integers is another example of an abelian group (for a formal definition
of groups and abelian groups, see Section 1.7 at the end of this chapter).
We shall shortly prove that addition of matrices in Mm×n (F) satisfies rules similar to that of addition
of integers or of field elements. As mentioned earlier, the reader can assume, if necessary, that either
F = R, the field of real numbers, or F = C, the field of complex numbers.
One identifies the set M1 (F) of all 1 × 1 matrices over F with F itself.
According to the definition of addition of matrices (see Equation 1.1), any two matrices in Mm×n (F)
(or, in Mn (F)) can be added to obtain another matrix in the same set. This fact is sometimes expressed
by saying that Mm×n (F) or Mn (F) is closed with respect to addition, or that matrix addition is a binary
operation in Mm×n (F) or in Mn (F). Here are the basic properties of matrix addition.

Proposition 1.3.1. Let Mm×n (F) be the set of all m × n matrices over a field F. Then,
(a) A + B = B + A for any A, B ∈ Mm×n (F);
(b) (A + B) + C = A + (B + C) for any A, B, C ∈ Mm×n (F);
(c) If 0 = 0m×n is the zero matrix in Mm×n (F), then,

A+0 = 0+A = A

for any A ∈ Mm×n (F);


(d) Given any A ∈ Mm×n (F), there is a B ∈ Mm×n (F) such that

A + B = 0m×n = B + A.

In fact, A determines B uniquely and B = −A.

We reiterate that these properties hold for matrices in Mn (F) too (for m = n) as well as for n-
dimensional row vectors or column vectors for any positive integer n.

Proof. It is clear that the matrices appearing on both sides of the equality in each of the assertions are
in Mm×n (F). Therefore, to verify these assertions, we just have to show that the general (i, j)th entries
of matrices of both sides of any equality are the same.
Let A = [ai j ], B = [bi j ] and C = [ci j ] be arbitrary matrices in Mm×n (F). Now, for any i, j, the (i, j)th
entry of A+ B is the scalar ai j +bi j whereas the (i, j)th entry of B + A is bi j +ai j . Since ai j +bi j = bi j +ai j
by commutativity of addition for scalars of F, it follows that A + B = B + A, which proves (a). Next,
by associativity of addition in F, (ai j + bi j ) + ci j = ai j + (bi j + ci j ) for any i, j. So the (i, j)th entry of
(A + B) +C is the same as the (i, j)th entry of A + (B +C) showing that (A + B) +C = A + (B +C) which
is the assertion (b) of the proposition. To prove the property of the zero matrix given in (c), note that
every entry of the zero matrix is the scalar 0 of F. Thus, for any A = [ai j ] ∈ Mm×n (F), the (i, j)th entry
of A + 0 is ai j + 0 = ai j = 0 + ai j , which is the (i, j)th entry of 0 + A. Thus property (c) holds in Mm×n (F).
Finally, for any A = [ai j ] in Mm×n (F), the matrix B = [−ai j ], where −ai j is the additive inverse of ai j in
F, is clearly in Mm×n (F). Since ai j + (−ai j ) = 0 = (−ai j ) + ai j for all i, j, it follows that A + B = 0 = B + A.
Thus (d) holds. !
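The four properties just proved can also be observed numerically; the following Python sketch, assuming the NumPy library and a few sample 2 × 3 real matrices, checks each of them.

```python
# A numerical check of properties (a)-(d) above, assuming NumPy;
# the 2 x 3 real matrices are sample values.
import numpy as np

A = np.array([[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]])
B = np.array([[5.0, 0.0, -2.0], [1.0, 1.0, 1.0]])
C = np.array([[0.0, 2.0, 2.0], [3.0, -3.0, 0.5]])
Z = np.zeros((2, 3))

print(np.allclose(A + B, B + A))              # (a) commutativity
print(np.allclose((A + B) + C, A + (B + C)))  # (b) associativity
print(np.allclose(A + Z, A))                  # (c) the zero matrix is the additive identity
print(np.allclose(A + (-A), Z))               # (d) -A is the additive inverse of A
```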

As with the addition in fields, matrix addition is commutative and associative because of properties
(a) and (b), respectively, of Mm×n (F) given in the proposition. 0 is the additive identity in Mm×n (F)
and −A is the additive inverse of any A ∈ Mm×n (F) by properties (c) and (d) respectively. Thus, just
like a field with addition, Mm×n (F) is an abelian group precisely because of these properties of matrix
addition (see Definition 1.7.1 in Section 1.7, the last section of this chapter). We record this important
fact.

Theorem 1.3.2. Mm×n (F) is an abelian group with respect to matrix addition with 0m×n acting as
the additive identity.

Since Mn (F) as well as Fn are closed with respect to matrix addition, it follows that both Mn (F) and
Fn are also abelian groups with respect to matrix addition.
Before we consider properties of scalar multiplication of matrices, we need to look at some prop-
erties of multiplication in a field. Recall the entries of a scalar multiple cA of a matrix A = [ai j ] over

a field F are the products cai j of scalars in F. Consequently, the verifications of various properties of
scalar multiplication requires, in turn, the following properties of multiplication in F:
(a) (Associativity) For any a, b, c ∈ F, a(bc) = (ab)c.
(b) (Existence of multiplicative Identity) There is a unique element 1 ∈ F such that

1a = a = a1

for any a ∈ F.
(c) (Left Distributive Law) For any a, b and c in F

a(b + c) = ab + ac.

(d) (Right Distributive Law) For any a, b and c in F

(a + b)c = ac + bc.

We now present the basic properties of scalar multiplication of matrices.

Proposition 1.3.3. Let Mm×n (F) be the set of all m×n matrices over a field F. Then, for any scalars
c, c' ∈ F
(a) (cc' )A = c(c' A) for any A ∈ Mm×n (F);
(b) (c + c')A = cA + c' A for any A ∈ Mm×n (F);
(c) c(A + B) = cA + cB for any A, B ∈ Mm×n (F);
(d) 1A = A for any A ∈ Mm×n (F), where 1 is the multiplicative identity of F.

It is clear that the results of the proposition are valid for matrices in Mn (F) as well as for row or
column vectors in Fn .

Proof. As in the case of matrix addition, these equalities are established by showing that the general
entries of the matrices on both sides of any of these equalities are the same as all the scalar multiples
of matrices involved are clearly in Mm×n (F).
If A = [ai j ] is an arbitrary matrix in Mm×n (F), then for any c, c' ∈ F, the (i, j)th entry of the scalar
multiple (cc' )A, for any i, j, is (cc' )ai j . Since c, c' and ai j are all elements of F, it follows, by the
associativity of field multiplication, that (cc' )ai j = c(c' ai j ) which is clearly the (i, j)th element of the
scalar multiple c(c' A). This proves assertion (a). On the other hand, the (i, j)th entry of (c + c' )A is
the scalar (c + c' )ai j . Now by the right distributive law in F, (c + c' )ai j = cai j + c' ai j , which is the (i, j)th
entry of the sum cA+c' A. Therefore assertion (b) holds. Now , if B = [bi j ] is another arbitrary matrix in
Mm×n (F), then ai j + bi j being the (i, j)th entry of A + B, the (i, j)th entry of the scalar multiple c(A + B)
is clearly c(ai j + bi j ). By the left distributive law in F, c(ai j + bi j ) = cai j + cbi j , the (i, j)th entry of the
sum cA + cB. So assertion (c) of the proposition holds. The verification of the last assertion of the
proposition is trivial as for the multiplicative identity 1 of F, 1ai j = ai j = ai j 1. !

Because the abelian group Mm×n (F) (respectively, Mn (F)) also satisfies the properties stated in this
proposition with respect to scalar multiplication by scalars in F, Mm×n (F) (respectively, Mn (F)) is said
to be a vector space over the field F (see Chapter 3 for the formal definition of a vector space). For the
same reason, Fn is also a vector space over F.
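These arithmetic rules are easy to test numerically. The following short Python sketch (an illustrative aside, not part of the formal development; it assumes the NumPy library is available) checks commutativity of matrix addition together with the scalar multiplication laws of Proposition (1.3.3) for randomly chosen real matrices.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(2, 3)).astype(float)   # a 2 x 3 real matrix
    B = rng.integers(-5, 5, size=(2, 3)).astype(float)
    c, d = 2.0, -3.0                                      # two scalars

    print(np.allclose(A + B, B + A))                  # A + B = B + A
    print(np.allclose((c + d) * A, c * A + d * A))    # (c + c')A = cA + c'A
    print(np.allclose(c * (A + B), c * A + c * B))    # c(A + B) = cA + cB
    print(np.allclose((c * d) * A, c * (d * A)))      # (cc')A = c(c'A)

All four tests print True; such experiments illustrate the identities but, of course, do not replace the proofs given above.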
Let us now consider matrix multiplication. We have seen in the last section that, unlike real numbers
or integers, two matrices in Mm×n (F), in general, cannot be multiplied. So we say that Mm×n (F) is not
closed under multiplication in general. Even then, as the following proposition shows, under suitable
restrictions, matrix multiplication does satisfy nice properties. For example, the first assertion of the
proposition shows that matrix multiplication is associative, whereas the second and the third together
show that it satisfies the distributive laws. The verifications of these results are nice exercises in
matrix multiplication, and so the reader is urged to follow the arguments closely.

Proposition 1.3.4. Let F be a field.

(a) If for matrices A, B and C over F the matrix products AB, BC, (AB)C and A(BC) are defined,
then

(AB)C = A(BC).

(b) If for matrices A, B and C over F the matrix products AB, AC and A(B + C) are defined, then

A(B + C) = AB + AC.

(c) If for matrices A, B and C over F the matrix products AC, BC and (A + B)C are defined, then

(A + B)C = AC + BC.

(d) If for matrices A and B over F the matrix product AB is defined, then for any c ∈ F

c(AB) = (cA)B = A(cB).

(e) If A is an m × n matrix over F and Im and In are identity matrices of order m and n respectively
over F, then

Im A = A = AIn .

Proof. For the hypothesis in (a) to be satisfied, if A is an m × n matrix, then B has to be an n × p matrix
and so C is a p × r matrix for some positive integers m, n, p and r. In that case, while AB is an m × p
matrix, BC is an n × r matrix and so (AB)C and A(BC) are both m × r matrices over F. Thus to prove
that these two matrices are equal, we need to show that their corresponding entries are the same. So
let A = [ai j ], B = [bi j ] and C = [ci j ]. Setting AB = [xi j ] and BC = [yi j ], we then see that (see Equation
1.4)
xi j = Σ_{k=1}^{n} aik bk j for any i, j (1 ≤ i ≤ m, 1 ≤ j ≤ p),

and
yi j = Σ_{k=1}^{p} bik ck j for any i, j (1 ≤ i ≤ n, 1 ≤ j ≤ r).

(In both these sums the index k is a dummy index used only to indicate that the summations will take
place for the integral values of k as stated; so any other letter or symbol can be used in place of k.)
Now fix positive integers i and j such that 1 ≤ i ≤ m and 1 ≤ j ≤ r. The (i, j)th entry of (AB)C for these
fixed i and j can be expressed as
Σ_{k=1}^{p} xik ck j = Σ_{k=1}^{p} ( Σ_{l=1}^{n} ail blk ) ck j .          (1.9)

Note that in the inner sum representing xik , we are forced to use l as the dummy index as k, for each
xik , is fixed. Now, again by the right distributive law in the field F, the product of the inner sum on
the right hand side of the preceding equation by each ck j (for k = 1, 2, . . . , p) can be expressed as a
sum of products of scalars of the type ail blk ck j for l = 1, 2, . . . , n. The resultant expression can then
be rearranged by collecting the coefficients of ail for each l (1 ≤ l ≤ n). Since the coefficient of ail in
the resultant expression, by the left distributive law in F, is clearly Σ_{k=1}^{p} blk ck j , it follows that the left
hand side of Equation (1.9) can be rewritten, using the formula for yl j , as
Σ_{l=1}^{n} ail ( Σ_{k=1}^{p} blk ck j ) = Σ_{l=1}^{n} ail yl j          (1.10)

which is precisely the (i, j)th entry of the product A(BC). This completes the verification of the asser-
tion in (a).
We sketch the proof of (b) leaving it to the reader to fill in the details. For (b), we may assume that
while A is an m × n matrix, B and C (and so B + C also) are n × p matrices. Thus the product A(B + C)
and the sum AB + AC are both m × p matrices so we need only to compare their corresponding entries.
Now, if we set A = [ai j ], B = [bi j ] and C = [ci j ], then the (i, j)th entry of A(B + C) is given by
Σ_{k=1}^{n} aik (bk j + ck j ),

which, by the left distributive law in F, can be split into two sums
Σ_{k=1}^{n} aik bk j + Σ_{k=1}^{n} aik ck j .

Since the first sum is the (i, j)th entry of AB and the second the (i, j)th entry of AC, their sum is the
(i, j)th entry of AB + AC. Hence assertion (b). A similar proof establishes assertion (c). Since for any
scalar c ∈ F, by the left distributive law in F again,
c Σ_{k=1}^{n} aik bk j = Σ_{k=1}^{n} (caik ) bk j = Σ_{k=1}^{n} aik (cbk j ),

the definition of scalar multiple of a matrix implies the assertion in (d).


The easy verification of (e) is left to the reader. !
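Again, a quick numerical experiment can reinforce the proposition. The sketch below (an illustrative aside assuming NumPy is available; the @ operator denotes matrix multiplication) checks associativity, a distributive law and the scalar rule of Proposition (1.3.4) for randomly chosen integer matrices of compatible sizes.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.integers(-3, 4, size=(2, 3))    # 2 x 3
    B = rng.integers(-3, 4, size=(3, 4))    # 3 x 4
    C = rng.integers(-3, 4, size=(4, 2))    # 4 x 2
    D = rng.integers(-3, 4, size=(3, 4))    # same size as B
    c = 5

    print(np.array_equal((A @ B) @ C, A @ (B @ C)))      # (AB)C = A(BC)
    print(np.array_equal(A @ (B + D), A @ B + A @ D))    # A(B + D) = AB + AD
    print(np.array_equal(c * (A @ B), (c * A) @ B))      # c(AB) = (cA)B
    print(np.array_equal((c * A) @ B, A @ (c * B)))      # (cA)B = A(cB)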

Note that if x and y are n-dimensional column vectors in Fn , that is, n × 1 matrices over F, then so is
the sum x + y. Therefore, for any m × n matrix A over F, the products Ax, Ay and A(x + y) are defined.
Proposition (1.3.4) then implies the following:
Corollary 1.3.5. Let F be a field and A ∈ Mm×n (F).


(a) For any column vectors x, y ∈ Fn ,

A(x + y) = Ax + Ay.

(b) For any column vector x ∈ Fn and a scalar c ∈ F,

A(cx) = cAx.

Earlier in the last section, we had remarked that the multiplication of n-dimensional column vectors
in Fn by an m × n matrix over F is a linear map from Fn into Fm ; this corollary provides the justification
of our remark (the formal definition of a linear map is given in Chapter 4).

Properties of Multiplication in Mn (F)


Recall that the set Mn (F) of all square matrices of order n is closed with respect to addition and
multiplication of matrices and so the hypotheses in all the assertions of the preceding proposition are
valid for matrices in Mn (F). Since we shall be dealing with square matrices most of the time, we
reiterate the assertions of the proposition specifically for matrices in Mn (F) next.

Proposition 1.3.6. Let F be a field. Then

(a) For any A, B, C ∈ Mn (F),

(AB)C = A(BC);

(b) For any A ∈ Mn (F),

AIn = In A = A,

where In is the identity matrix in Mn (F);


(c) For any A, B ∈ Mn (F) and c ∈ F,

c(AB) = (cA)B = A(cB);

(d) For any A, B, C ∈ Mn (F)

A(B + C) = AB + AC and (A + B)C = AC + BC.

In other words, matrix multiplication in Mn (F) is associative and obeys both the left and right
distributive laws; moreover, Mn (F) has the multiplicative identity. Since Mn (F) is an abelian group with
respect to addition, these properties of matrix multiplication make Mn (F) into a ring with identity.
Note that while matrix addition is commutative, AB ! BA, in general, for A, B ∈ Mn (F). What we mean
is that though for specific matrices A, B in Mn (F), AB may be equal to BA (for example, if one of them
is In ), we cannot set this as a rule applicable to any two matrices in Mn (F). (See Exercise 3 in Section
1.2.) Because AB ! BA, in general, for A, B ∈ Mn (F), we say that for n > 1, Mn (F) is a non-commutative
ring. (What happens for n = 1?)
Another property with respect to which matrix multiplication in Mn (F) differs from addition is
the existence of inverse. Any A ∈ Mn (F) has its additive inverse. However, given an A ∈ Mn (F), even if
A ! 0, there is no certainty that A has a multiplicative inverse, that is, a matrix B such that AB = In = BA.
For example, it is easy to verify that there can be no real numbers a, b, c and d such that
[1 0; 0 0] [a b; c d] = [1 0; 0 1].
It must be pointed out though that Mn (F), for any positive integer n, contains a large class of invertible
matrices, that is, matrices having multiplicative inverses.
The structural similarity between Mn (F) and the set Z of integers also deserves attention. Z, like
Mn (F), is a ring with identity (namely, the integer 1). However, unlike Mn (F), Z is a commutative ring
as ab = ba for any two integers a, b. On the other hand, as in Mn (F), not every non-zero integer has a
multiplicative inverse; in fact, 1 and −1 are the only invertible integers.
There is another important difference between the ring of integers Z and the ring of matrices Mn (F)
for n > 1. The product of two non-zero integers is always non-zero (so Z is called an integral domain).
But in Mn (F), for n > 1, one can find many non-zero matrices whose product is the zero matrix. For
example, in M2 (F) for any field F,
[1 0; 0 0] [0 0; 0 1] = [0 0; 0 0].
We end the discussion about multiplicative properties of Mn (F) by pointing out an important con-
sequence of the associativity of multiplication in Mn (F) which provides us with the unambiguous
meaning of any positive integral power Ak for a matrix A ∈ Mn (F). We have already noted that A2
can be defined as the product AA. Now, by associativity, (AA)A = A(AA) so A3 can be defined as any
one of the two equal products (A2 )A or A(A2 ). In general, we define Ak for any positive integer k ≥ 3
inductively as follows

Ak = (Ak−1 )A = A(Ak−1 ).

Thus, Ak is the product of k copies of A, where the product can be computed in any order because
of the associativity of multiplication. For k = 0, by convention, we let A0 = In , the identity matrix of
order n.
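The following small sketch (again in Python with NumPy, purely as an illustration) shows the convention in action: A^3 computed by grouping the product in the two possible ways agrees, and NumPy's matrix_power routine follows the same convention A^0 = In.

    import numpy as np

    A = np.array([[1, 1],
                  [0, 1]])

    # A^3 computed with the two possible groupings; associativity makes them equal
    print(np.array_equal((A @ A) @ A, A @ (A @ A)))   # True

    print(np.linalg.matrix_power(A, 0))   # the identity matrix I_2
    print(np.linalg.matrix_power(A, 5))   # [[1 5], [0 1]]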

Unit Matrices
It is time to introduce some very specific matrices in Mn (F) (for n > 1), which in some sense are the
building blocks of Mn (F). To understand these matrices, we first look at them in M2 (F), the ring of
2 × 2 matrices. We define the four unit matrices in M2 (F) as follows:
e11 = [1 0; 0 0],   e12 = [0 1; 0 0]
and
e21 = [0 0; 1 0],   e22 = [0 0; 0 1].
Here, following convention, we are using lower-case letters to denote unit matrices. Any ma-
trix in M2 (F) can be expressed in terms of these simple matrices. For example, given the matrix
A = [−1 3; 1/2 4], we see that A is the following sum of scalar multiples of the preceding unit matrices:

A = (−1)e11 + 3e12 + 1/2e21 + 4e22.

Once we understand how this example works (the (i, j)th entries of A combine with corresponding ei j
to produce A), it is easy to write down the general formula; given an arbitrary A = [ai j ] ∈ M2 (F),
A is the following sum of scalar multiples of the unit matrices:

A = a11 e11 + a12e12 + a21 e21 + a22 e22 .

Noting that the suffixes in the sum run independently through values 1 and 2, we can conveniently
write A as a double sum
A = Σ_{i, j=1}^{2} ai j ei j .

Technically speaking, we have just expressed A as a linear combination of the unit matrices, that is,
as a sum of scalar multiples of the unit matrices.
Keeping the 2 ×2 unit matrices in mind, we now consider the general case of unit matrices in Mn (F).
For any positive integer n, there are exactly n2 unit matrices ei j for 1 ≤ i, j ≤ n in Mn (F), where ei j is
the n × n matrix whose entries are all zeros, except for the entry at the (i, j)th place, which is 1. We
record the basic properties of these unit matrices now.

Proposition 1.3.7. Let Mn (F) be the set of all square matrices of order n over a field F. Then,

(a) Any matrix in Mn (F) is a linear combination of the ei j . If A = [ai j ], then

A = a11 e11 + a12 e12 + · · · + ann enn = Σ_{i, j=1}^{n} ai j ei j .

(b) If In is the identity matrix in Mn (F), then


In = Σ_{i=1}^{n} eii .

(c) Given two unit matrices ei j and ekl in Mn (F),

ei j ekl = 0 if j ! k
= eil if j = k.

Proof. Only property (c) may present some difficulty for the reader and so we leave the proof of the
first two to the reader. As for property (c), once we recall that every entry of the product ei j ekl is a
dot product of some row vector of ei j with some column vector of ekl , the verification will be quite
simple. Since every row vector of ei j except the ith one and every column vector of ekl except the lth
one are the zero vectors, it follows that the only possible non-zero entry in ei j ekl can result from the
dot product of the ith row of ei j and the lth column of ekl . In other words, the only possible non-zero
entry in the product will be the (i, l)th entry. Next, observe that this (i, l)th entry is zero, unless j = k in
which case it is 1. Thus, ei j ekl is the zero matrix, unless j = k in which case it must be the unit matrix
having 1 at the (i, l)th entry. This finishes the proof of (c). !
There is a useful notation, known as the Kronecker delta symbol which allows us to express the
two relations of property (c) of the preceding proposition as a single one. The Kronecker delta symbol,
denoted by δi j , is actually a function of two variables, depicted as the subscripts i and j, and its values
are given by


δi j = 1 if i = j, and δi j = 0 if i ≠ j.          (1.11)

We have not declared the range of the variables deliberately. They can range over any set of numbers,
finite or infinite. The only requirement is that both the variables must range over the same set. However,
in almost all our uses of the Kronecker delta symbol, the variables will range over some subset of the
set of positive integers.
Note that the property (c) of the preceding proposition can be put in the form

ei j ekl = δ jk eil

as the right-hand side equals eil if j = k, and is zero otherwise. In this case, the variables of the
Kronecker symbol vary over the set of positive integers from 1 to n.
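The relation ei j ekl = δ jk eil is easy to verify exhaustively for small n on a computer. The sketch below (an illustration in Python with NumPy; the helper names unit_matrix and delta are ours, not standard library functions) checks it for all indices when n = 3.

    import numpy as np

    n = 3

    def unit_matrix(i, j):
        # e_ij: the n x n matrix with 1 in the (i, j)th place and zeros elsewhere
        # (indices run from 1 to n, as in the text)
        e = np.zeros((n, n), dtype=int)
        e[i - 1, j - 1] = 1
        return e

    def delta(i, j):
        # the Kronecker delta symbol
        return 1 if i == j else 0

    indices = range(1, n + 1)
    ok = all(np.array_equal(unit_matrix(i, j) @ unit_matrix(k, l),
                            delta(j, k) * unit_matrix(i, l))
             for i in indices for j in indices for k in indices for l in indices)
    print(ok)   # True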
Unit matrices in Mn (F) are useful in establishing important properties of the ring Mn (F). For ex-
ample, these unit matrices are zero divisors: the product of the non-zero matrices ei j and ekl is the
zero matrix whenever j ≠ k. See Exercises 20 and 22 for other examples.
It is possible to consider rectangular unit matrices, too. An m×n matrix is a unit matrix in Mm×n (F),
if all its entries are zero, except one entry which is 1. Thus, in Mm×n (F), which consists of all matrices
with m rows and n columns, there are precisely mn unit matrices. The same notation ei j can be used
to describe the unit matrix in Mm×n (F) having 1 at the (i, j)th place and zeros elsewhere. Of course,
unless m = n, we cannot talk about multiplication of unit matrices in Mm×n (F). However, as in the case
of square matrices, it is easy to see that the unit matrices can be used to build arbitrary matrices in
Mm×n (F).

Proposition 1.3.8. Any matrix in Mm×n (F) is a linear combination of the unit matrices ei j . If
A = [ai j ], then,

A = a11 e11 + a12 e12 + · · · + amn emn = Σ_{i=1}^{m} Σ_{j=1}^{n} ai j ei j .

We sometimes need to multiply unit matrices of different sizes. If they are comparable for multi-
plication, then their product will either be the zero matrix, or a unit matrix again. We record this fact
using the Kronecker delta symbol in the following result; the proof is similar to that of Proposition
(1.3.7). Note that we keep using the same letter e to denote unit matrices of different sizes; in practice,
the sizes should be clear from the context.
Proposition 1.3.9. Let ei j and ekl be two unit matrices of sizes m × n and n × p, respectively, over
a field F. Then

ei j ekl = δ jk eil .

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. All given
matrices are over an arbitrary field F unless otherwise specified.
(a) The integers form an abelian group with respect to addition.
(b) If, for square matrices A, B and C of the same order, AB = AC, then B = C.
(c) Even if both A and B are non-zero matrices of the same order, the product AB can be the
zero matrix.
(d) For any m × n matrix A, AIm = A whereas Im A is not even defined if m ! n.
(e) For any square unit matrix ei j , the product ei j 2 is the zero matrix.
(f) No scalar multiple of a non-zero m × n matrix can be the zero matrix.
(g) If, for a 3 × 3 matrix A, A3 is the zero matrix, then A must be the zero matrix.
(h) If, for an m × n matrix A, both the products AB and BA are defined for some B, then m = n.
(i) For matrices A, B and C of the same size, A − (B − C) = (A − B) − C.
(j) For square matrices A, B and C of the same order, A(B − C) = (−A)(C − B).
2. Prove that for any A, B, C ∈ Mn (F),

(A + B)C = AC + BC.

3. For any A, B ∈ Mn (F), and any scalar a ∈ F, show that

(aA)B = a(AB) = A(aB).

4. Prove that for any A, B, C ∈ Mn (F),


(a) −(−A) = A;
(b) A(B − C) = AB − AC;
(c) (A − B)C = AC − BC.
5. For any m × n matrix A, n × p matrix B and p × 1 matrix γ over a field F, show that

(AB)γ = A(Bγ).

6. Let C be a matrix in Mn (F) whose column vectors are γ1 , γ2 , . . . , γn , so

C = [γ1 γ2 ··· γn ].

Prove that for any B ∈ Mn (F)

BC = [Bγ1 Bγ2 ··· Bγn ].


7. Use the preceding exercise to prove that

A(BC) = (AB)C

for any A, B, C ∈ Mn (F).


8. Prove that the following laws of indices hold for any A ∈ Mn (F) and for any non-negative inte-
gers k and l:
(a) Ak Al = Ak+l and
(b) (Ak )l = Akl .
9. Let A, B ∈ Mn (F) such that AB = BA. Show that for any positive integer k,
(a) ABk = Bk A and
(b) (AB)k = Ak Bk .
10. Let A, B ∈ Mn (F) such that AB = BA. Show that (A + B)2 = A2 + 2AB + B2, assuming that 2 ! 0
in F.
As in the last exercise, one can show that for matrices A, B ∈ Mn (F) such that AB = BA, the
Binomial Theorem holds for the expansion of (A + B)k , provided the base field F has character-
istic 0. The field R of real numbers or the field C of complex numbers is of characteristic 0.
11. Evaluate the following for any positive integer k:
[1 1; 0 1]^k ,    [0 0 0; 1 0 0; 0 0 1]^k .

12. Prove that the subsets of Mn (F) consisting of the following types of matrices are closed with
respect to addition, multiplication and scalar multiplication.
(a) Upper triangular matrices.
(b) Lower triangular matrices.
(c) Diagonal matrices.
(d) Scalar matrices.
Given a matrix A ∈ Mn (F), we define the trace of A to be the sum of the diagonal entries of A.
The trace of A, which is a scalar, is denoted by T r(A). Thus, if A = [ai j ], then T r(A) = Σ_{i=1}^{n} aii .
13. Let A, B ∈ Mn (F). Prove the following:
(a) T r(A + B) = T r(A) + T r(B).
(b) T r(cA) = cT r(A) for any scalar c ∈ F.
(c) T r(AB) = T r(BA).
14. Give an example of two matrices A and B, say, in M2 (R), such that T r(AB) ! T r(A)T r(B).
15. Use properties of traces of matrices to show that for any two matrices A, B ∈ Mn (F),

AB − BA ! In ,

where In is the identity matrix in Mn (F). (Here, F is a field in which n ! 0.)


16. Prove Proposition (1.3.9).


17. Let e11 , e12 , e21 and e22 be the unit matrices in M2 (F), and let
A = [a11 a12; a21 a22] = a11 e11 + a12 e12 + a21 e21 + a22 e22
be an arbitrary matrix in M2 (F). Use the formula for multiplication of unit matrices given in
Proposition (1.3.7) to compute the following matrices:

e11 A, e12 A, Ae21 and Ae22 .


18. Let A = Σ_{i, j} ai j ei j be an arbitrary matrix in Mn (F), where ei j are the unit matrices in Mn (F).
Write down the matrices elk A and Aelk as linear combinations of the unit matrices.
We say that two matrices A and B in Mn (F) commute if AB = BA.
19. Find all matrices B ∈ M2 (F) such that B commutes with
A = [0 0; 1 0].

20. Prove that a scalar matrix in Mn (F) commutes with every matrix in Mn (F).
21. Consider a matrix A = [a11 a12; a21 a22] in M2 (F) such that A commutes with e12 and e21 . Prove that
a11 = a22 and a12 = a21 = 0.

22. Let A be a matrix in Mn (F) such that A commutes with every matrix in Mn (F). Prove that A must
be a scalar matrix.
(Hint: A = Σ_{i, j} ai j ei j commutes with every unit matrix elk .)

Given an m × n matrix A = [ai j ] and a p × q matrix B, both over a field F, the Kronecker
Product A ⊗ B of A and B is the (mp) × (nq) matrix given by

A ⊗ B = [ai j B],

where ai j B is the scalar multiple of the matrix B by ai j . Thus, the Kronecker product of any two
matrices over a field is always defined.
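NumPy provides the Kronecker product directly as np.kron, which builds exactly the block matrix [ai j B] described above. The short sketch below (illustrative only, assuming NumPy is available) also tests the identity of Exercise 23(c) on small matrices.

    import numpy as np

    A = np.array([[1, 2], [0, 3]])
    B = np.array([[0, 1], [1, 1]])
    C = np.array([[2, 1], [1, 0]])
    D = np.array([[1, 3], [0, 1]])

    print(np.kron(A, B))          # the block matrix [a_ij * B]

    # (A (x) C)(B (x) D) = (AB) (x) (CD), as in Exercise 23(c)
    lhs = np.kron(A, C) @ np.kron(B, D)
    rhs = np.kron(A @ B, C @ D)
    print(np.array_equal(lhs, rhs))   # True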
23. Let A, B, C and D be matrices over a field F.
(a) If A and B are both m × n matrices, then show that for any C,

(A + B) ⊗ C = A ⊗ C + B ⊗ C.

(b) If B and C are both p × q matrices, then show that for any A,

A ⊗ (B + C) = A ⊗ B + A ⊗ C.

(c) If sizes of the matrices A, B, C and D are such that the products AB and CD are defined,
then verify that the product of A ⊗ C and B ⊗ D is also defined and

(A ⊗ C)(B ⊗ D) = (AB) ⊗ (CD).


24. For any two matrices A and B over a field F, show that

T r(A ⊗ B) = T r(A)T r(B).

A matrix A ∈ Mn (R) is called a stochastic or a transition matrix if all its entries are non-
negative reals and the sum of the entries in each column is 1.
25. Prove that the product of any two stochastic matrices of order n is a stochastic matrix of order n.

1.4 INVERTIBLE MATRICES


Definition 1.4.1. A matrix A ∈ Mn (F), that is, a square matrix A of order n over a field F, is said to
be invertible in Mn (F), if there is a matrix B ∈ Mn (F) such that

AB = BA = In ,

where In is the identity matrix of order n over F. The matrix B is said to be the inverse of A and denoted
by A−1 .

To take a trivial example, In itself is invertible and In−1 = In . On the other hand, the zero matrix
surely cannot be invertible. As we have seen in the last section, in general, a non-zero matrix need not
be invertible.
If a matrix A in Mn (F) is not invertible, then we sometimes say that A is a singular matrix.

EXAMPLE 1 The reader is invited to check that the following matrices are invertible by verifying
that AA−1 = A−1 A = In :
(a) A = [3 0; 0 2],   A−1 = [1/3 0; 0 1/2];
(b) A = [1 2; 0 3],   A−1 = [1 −2/3; 0 1/3];
(c) A = [1 0; 1 1],   A−1 = [1 0; −1 1];

(d) A = aIn (a ! 0), A−1 = a−1 In ;


The definition of the invertibility of a matrix is clearly not suitable for checking whether a given
matrix A is invertible. For, not only one has to guess what A−1 may be, but also compute AA−1 and
A−1 A. However, we will be able to develop some conditions for invertibility that will not require any
knowledge of the inverse. Moreover, there are more efficient ways (see, for example, Algorithm 2.5.6
in Chapter 2) to calculate the inverse of a matrix. Even then, we must point out that for theoretical
purposes, Definition (1.4.1) will continue to be useful throughout this book.
For now, we will be content with a slight simplification of Definition (1.4.1) whose justification
will be given later (see Proposition 2.5.7). We claim that if we can find a matrix B such that either of
the following conditions

AB = In or BA = In (1.12)
holds, then A is invertible with B as its inverse. A nice way of putting this is to say that a one-sided
inverse must be a two-sided inverse. This is clearly a non-trivial fact, as matrix multiplication is not
commutative in general and therefore requires proof. But we will have to wait till Proposition (2.5.7)
in Section 2.5 for the proof.
A careful reader must have noted that we are assuming the uniqueness of the inverse of an invertible
matrix. That is why it is possible to name the inverse A−1 and call it the inverse of A. It is easy to see
why a matrix cannot have two inverses. If possible, suppose that A has two inverses B and C, so that
AB = In = BA and AC = In = CA. Now C = CIn = C(AB). However, C(AB) = (CA)B and CA = In
showing that C = B.
Using the fact that the inverse of an invertible matrix is unique, one can easily deduce from the
relation

AA−1 = In = A−1 A

the following proposition.

Proposition 1.4.2. If A is invertible in Mn (F), then its inverse A−1 is also invertible and

(A−1 )−1 = A.

The sum of two invertible matrices need not be an invertible matrix. But we can say something
definite about their products.

Proposition 1.4.3. Let A and B be two invertible matrices in Mn (F). Then, the product AB is also
invertible and

(AB)−1 = B−1 A−1 .

Proof. We show that AB is invertible by verifying that B−1 A−1 is its inverse. Note that by one of our
preceding remarks, it is sufficient to verify that it is a one-sided inverse:

(AB)(B−1 A−1 ) = (A(BB−1))A−1 = (AIn )A−1 = AA−1 = In . !

Proposition (1.4.3) shows that the set of invertible matrices in Mn (F) is closed with respect to matrix
multiplication. It is also clear that the identity matrix is invertible. Moreover, by Proposition (1.4.2),
the inverse of any invertible matrix is again invertible. Denoting the set of invertible matrices in Mn (F)
by GLn (F), we have, therefore, the following result:

Theorem 1.4.4. The set GLn (F) of invertible matrices in Mn (F) forms a group with respect to
matrix multiplication.

We end this section by presenting a sufficient condition for a square matrix of order 2 to be invertible
as well as a formula for the inverse in case the condition is satisfied.
Proposition 1.4.5. Let A = [a b; c d] be a matrix over a field F. If δ = ad − bc ≠ 0 in F, then A is
invertible and
A−1 = (1/δ) [d −b; −c a].

Proof. A direct computation shows that


[a b; c d] [d −b; −c a] = [ad − bc 0; 0 ad − bc].

So if δ ≠ 0, then multiplying the preceding matrix equation by the scalar 1/δ reduces the right
hand side of the equation to I2 , which shows that A, the first matrix of the left hand side of the equation
is invertible. By the uniqueness of the inverse, the scalar multiple 1/δ times the second matrix has to be
inverse of A proving the assertion of the proposition (recall that for a scalar c, one has c(XY) = X(cY)
for matrices X, Y of the same order). !
The reader must have recognized that the scalar ad − bc is the determinant of the matrix [a b; c d]. As we
shall see later, every square matrix A over a field can be associated with a scalar det A, known as its
determinant, though the formula for det A for a general matrix A of order n ≥ 3 is not as simple as that
for a matrix of order 2. However, we shall also see that a matrix A ∈ Mn (F) is invertible if and only
if det A is non-zero; thus the preceding result is one half of this general result for the special case of
n = 2; see Exercise 7 of this section in this connection.
The general formula for the inverse of an invertible matrix using determinants, even for invertible
matrices of order 3, involves too many computations to be of much practical value; as we have
stated earlier, one uses instead the algorithm given in Section 5 of Chapter 2. In contrast, finding
the inverse of an invertible matrix A of order 2 by using the preceding proposition is quite simple:
interchange the diagonal entries of A, change the sign of the other two entries and then scalar
multiply the resultant matrix by the reciprocal of the determinant. For example, as the determinant of
the real matrix A = [2 2; 2 4] is 2 · 4 − 2 · 2 = 4, the inverse of A is

A−1 = (1/4) [4 −2; −2 2] = [1 −1/2; −1/2 1/2].

We will come back to invertible matrices in Section 5 of the next chapter.
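The 2 × 2 inverse formula is simple enough to code directly. The following Python sketch (an illustrative aside assuming NumPy; the function name inverse_2x2 is ours) implements Proposition (1.4.5) and checks it on the matrix A = [2 2; 2 4] used above.

    import numpy as np

    def inverse_2x2(A):
        # Proposition 1.4.5: swap the diagonal entries, negate the other two,
        # and divide by delta = ad - bc (which must be non-zero)
        a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
        delta = a * d - b * c
        if delta == 0:
            raise ValueError("the matrix is not invertible")
        return np.array([[d, -b], [-c, a]]) / delta

    A = np.array([[2.0, 2.0],
                  [2.0, 4.0]])
    print(inverse_2x2(A))                                # [[ 1.  -0.5], [-0.5  0.5]]
    print(np.allclose(A @ inverse_2x2(A), np.eye(2)))    # True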

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. All given
matrices are square and over an arbitrary field F unless otherwise specified.
(a) A matrix with a zero row or a zero column cannot be invertible.
(b) The sum of two invertible matrices of the same order is invertible.
(c) A diagonal matrix with all its diagonal entries non-zero, is necessarily invertible.
(d) If matrices A and B commute, then the invertibility of A implies the invertibility of B.
(e) If A is an invertible matrix, then the homogeneous system of equations Ax = 0 has a non-zero
solution.
(f) If A is an invertible matrix, then AB = AC implies that B = C.
(g) If for matrices A and B of order n, AB = In , then A and B are both invertible.
(h) The sum of two singular matrices of the same order cannot be invertible.
(i) The trace of an invertible matrix must be non-zero.
(j) The set of non-zero scalar matrices of order n is a subgroup of GLn (F).
2. Let A be an invertible matrix in Mn (F). Prove the following:
(a) For any non-zero scalar a ∈ F, the scalar multiple aA is invertible.
(b) For any positive integer k, Ak is invertible.
3. Let A ∈ Mn (F). Prove the following:
(a) If aA is invertible for some non-zero scalar a ∈ F, then A is also invertible;
(b) If for some positive integer k, Ak is invertible, then A is also invertible.
4. Let A and B be matrices in Mn (F).
(a) If the product AB is invertible, then prove that both A and B are invertible.
(b) If AB is invertible, then prove that BA is also invertible.
5. Let A be an invertible matrix in Mn (F). Show that for no non-zero matrix B ∈ Mn (F), AB or BA
can be the zero matrix.
6. Use the formula in Proposition (1.4.5) to compute the inverses of the following matrices:
A = [1 1; 0 1],   B = [1 2; 3 4],

C = [cos θ −sin θ; sin θ cos θ],   D = [1+i i; −i 1−i].

Note that while A, B, C are real matrices, D is a complex matrix with i2 = −1.
7. Verify that the condition δ ≠ 0 in Proposition (1.4.5) is necessary for the invertibility of A by
showing that if A = [1 1; 1 1], then there is no B ∈ M2 (R) such that AB = [1 0; 0 1].

8. Let A ∈ M2 (F) such that its second row is a scalar multiple of the first, that is, A is of the form
A = [a b; ca cb].

Prove that A is not invertible by showing that there can be no B ∈ M2 (F) such that AB = I2 .
Generalize to prove that if some row of a matrix A ∈ Mn (F), (n > 1) is a scalar multiple of
another row, then A cannot be invertible.
9. Use the preceding exercise to show that if A is an n × 1 and B an 1 × n matrix over a field F, then
the n × n matrix AB cannot be invertible (n > 1).
10. The following matrices A over R are invertible. Guess their inverses A−1 , and compute AA−1 to
confirm your guess:
     
1 0 0 1 2 −1 0 1 0
     
A = 1 1 0, A = 0 1 3, A = 1 0 0.
     
1 1 1 0 0 1 0 0 1

11. Find the inverses of the following matrices of order 3 over a field F:
 
A = [a1 0 0; 0 a2 0; 0 0 a3],  ai ≠ 0,

A = [0 0 a1; 0 a2 0; a3 0 0],  ai ≠ 0,

A = [a 0 0; 1 a 0; 0 1 a],  a ≠ 0.

Generalize to matrices of order n.


12. Consider the real matrices
A = [5 −2; −2 2]   and   P = [1 −2; 2 1].

(a) Compute P−1 .


(b) Verify that
A = P [1 0; 0 6] P−1 .

(c) Prove that, for any positive integer n,


An = (1/5) ( [1 2; 2 4] + 6^n [4 −2; −2 1] ).
13. Let A = [a b; c d] ∈ M2 (R), where a, b, c and d are non-negative real numbers such that a + c =
b + d = 1 and A ≠ I2 . Let P = [b 1; c −1].
(a) Prove that P is invertible and that

P−1 AP = [1 0; 0 a + d − 1].

(b) Compute An for any positive integer n.


The next exercise shows how to use matrix methods to solve systems of recurrence relations.
14. Consider the following system of recurrence relations:

xn+1 = 5xn − 2yn


yn+1 = −2xn + 2yn,

where the initial values are given by x0 = 0 and y0 = 1. Let


Xn = [xn ; yn ]

for any non-negative integer n.


(a) Writing the system of recurrence relations in matrix form, show that

Xn = A n X 0

for any positive integer n, where A is the matrix of Exercise 12.


(b) Use Exercise 12 to show that

xn = (1/5)(2 − 2 · 6^n)
yn = (1/5)(4 + 6^n)

for any positive integer n.


15. List the invertible matrices in Mn (F) if F = Z2 , for n = 2 and 3.
16. Let A and B be invertible matrices over a field F. Prove that the Kronecker product A ⊗ B is also
invertible and

(A ⊗ B)−1 = A−1 ⊗ B−1.

1.5 TRANSPOSE OF A MATRIX


Quite often, for various reasons, one has to consider a given matrix A with its rows and columns
interchanged. The new matrix so obtained by such an interchange is called the transpose of A, and
denoted by At . To be precise, if A = [ai j ] is an m × n matrix over a field F, then its transpose At = [bi j ]
is an n × m matrix over F such that

bi j = a ji for all i, j such that 1 ≤ i ≤ n, 1 ≤ j ≤ m.

Thus, for A ∈ Mm×n (F), we have At ∈ Mn×m (F). In particular, the transpose of an n-dimensional row
(respectively, column) vector is an n-dimensional column (respectively, row) vector. Note also that if
A ∈ Mn (F), then At is also in Mn (F).
We need to know how transposing a matrix works with various matrix operations. The following
result explains.

Proposition 1.5.1. For any matrices A and B over a field F and a ∈ F, the following hold:

(a) (At )t = A.
(b) (aA)t = aAt .
(c) If A and B can be added, then

(A + B)t = At + Bt .

(d) If A and B can be multiplied, then

(AB)t = Bt At .

(e) If A is invertible, then so is At , and

(At )−1 = (A−1 )t .

Proof. The proofs of the first three are straightforward and we leave them to the reader. For assertion
(d), we may assume that A is an m × n and B is an n × p matrix over F so the product AB is an m × p
and the transpose (AB)t a p × m matrix. Similarly, the product Bt At of the transposes is a p × m matrix
and thus both sides of (d) are matrices of the same size. To check the equality of the entries of these
matrices, we let A = [ai j ], B = [bi j ] and AB = [ci j ]. Then, a typical (i, j)th entry of (AB)t will be c ji ,
which, by the definition of AB will be given by
c ji = Σ_{k=1}^{n} a jk bki .

On the other hand, supposing At = [di j ] and Bt = [ fi j ], we see that the (i, j)th entry of the product Bt At
is given by
Σ_{k=1}^{n} fik dk j = Σ_{k=1}^{n} bki a jk ,

which is clearly c ji . The equality in (d) follows.


Finally, if A ∈ Mn (F) is invertible, then AA−1 = In = A−1 A. Taking transposes, and using the product
rule (d) for transposes, we obtain

(A−1 )t At = In t = At (A−1 )t .

Since In t = In , it follows from the displayed equation that At is invertible and its inverse is (A−1 )t . The
proof of assertion (e) is complete. !

Matrices known as symmetric matrices form an important class of matrices in Mn (F), and they are
defined in terms of transposes.

Definition 1.5.2. A matrix A ∈ Mn (F) is symmetric if At = A, and skew-symmetric if At = −A.

Thus, a symmetric matrix is a square matrix in which each off-diagonal entry equals its mirror image
or its reflection about the diagonal on the other side; however, there is no restriction on the diagonal
entries. So the identity matrix, the zero matrix, any scalar matrix and any diagonal matrix in Mn (F) are
trivially symmetric. Clearly both the following matrices


A = [2 2; 2 4],   A−1 = (1/4) [4 −2; −2 2]

are symmetric. There are several important facts about symmetric matrices one should know. We
record three here; others can be found in the exercises.

Proposition 1.5.3. Let F be a field.

(a) If A and B are symmetric matrices in Mn (F) then so are A + B and A − B.


(b) If A is a symmetric matrix in Mn (F) then so is cA for any scalar c ∈ F.
(c) If a symmetric matrix A ∈ Mn (F) is invertible, then A−1 is also symmetric.

The proofs are routine applications of properties of transposes and are left to the reader.
Before looking for examples of skew-symmetric matrices, the reader should note that all the diag-
onal entries of a skew-symmetric matrix are zeros.
For complex matrices, there is a related concept that merits attention. First, recall that for a complex
number z = a + ib ∈ C, its conjugate, denoted by z, is given by z = a − ib.
Given a matrix A = [ai j ] ∈ Mm×n (C), its conjugate transpose, denoted by A∗ is the matrix obtained
from the transpose At of A by replacing each entry of At by its conjugate. Thus, A∗ ∈ Mn×m (C), and if
A∗ = [bi j ], then bi j = a ji .
Thus, for example, if
 
A = [i 1; 3i 1+i; 0 −i],   then A∗ = [−i −3i 0; 1 1−i i].

Note that if A is a matrix with real entries, then its conjugate transpose A∗ coincides with At .
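In NumPy the transpose of an array A is A.T, and the conjugate transpose is obtained by combining it with complex conjugation, as in the brief sketch below (illustrative only), which reproduces the example just given.

    import numpy as np

    A = np.array([[1j, 1],
                  [3j, 1 + 1j],
                  [0, -1j]])

    print(A.T)           # the transpose A^t
    print(A.conj().T)    # the conjugate transpose A*, as computed above

    # for a real matrix the conjugate transpose coincides with the transpose
    B = np.array([[1.0, 2.0], [3.0, 4.0]])
    print(np.array_equal(B.T, B.conj().T))   # True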
There are results for conjugate transposes analogous to those for transposes given in the preceding
Proposition (1.5.1). They are listed in the next proposition for future reference. The proofs are similar
and left to the reader as easy exercises.

Proposition 1.5.4. Let A and B be matrices over F where F is either C or R. Then,

(a) (A∗ )∗ = A;
(b) (cA)∗ = cA∗ for any c ∈ F;
(c) If A and B can be added, then

(A + B)∗ = A∗ + B∗;

(d) If A and B can be multiplied, then

(AB)∗ = B∗ A∗ .
EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. All given
matrices are over an arbitrary field F unless otherwise specified.
(a) If a matrix A is symmetric, so is At .
(b) If a matrix A is skew-symmetric, so is At .
(c) A symmetric matrix is invertible.
(d) If an invertible matrix A is skew-symmetric, so is A−1 .
(e) If A and B are symmetric matrices of the same order, then the product AB is symmetric.
(f) If A and B are skew-symmetric matrices of the same order, then the product AB is symmetric.
(g) A square matrix cannot be both symmetric and skew-symmetric.
(h) For a square matrix A, (Ak )t = (At )k for any positive integer k.
(i) For any m × n matrix A, the product AAt is symmetric.
(j) If A and B are invertible matrices of the same order, then ((AB)−1)t = (At )−1 (Bt )−1 .
2. Let A = [1 3; 3 −1] and u = [1; 2] be matrices over R. Compute the matrices (Au)t , ut At as well as
ut u and uut .
3. For any m×n matrix A over a field F, show that the products AAt and At A are symmetric matrices
in Mm (F) and Mn (F), respectively.
4. For any field F, prove the following:
(a) For any A ∈ Mn (F), the matrices A + At and A − At are symmetric and skew-symmetric,
respectively.
(b) For any symmetric or skew-symmetric matrix A ∈ Mn (F), and for any scalar c ∈ F, the matrix
cA is symmetric or skew-symmetric, respectively.
(c) Every matrix in Mn (F) is a sum of a symmetric and a skew-symmetric matrix (provided,
division by 2 is allowed in the field F).
5. For any m × n matrix over R, show that each diagonal entry of the matrix At A is non-negative.
6. For a non-zero matrix A in Mn (F), where F = C or R, prove that AAt or At A cannot be the zero
matrix. Give a counter example over the field F of two elements.
7. Let A be either a symmetric or a skew-symmetric matrix in Mn (F). Show that A2 is a symmetric
matrix.
8. Let A, B ∈ Mn (F) be both either symmetric or skew-symmetric. Prove that AB is symmetric if
and only if A and B commute.
9. Let A, B ∈ Mn (F) be symmetric matrices. Prove that AB + BA is symmetric and AB − BA is
skew-symmetric.
10. Given any symmetric matrix A ∈ Mn (F), show that for any m × n matrix C over F, the matrix
CAC t is symmetric.
11. Prove the properties of conjugate transposes given in Proposition (1.5.4).
12. If A is invertible in Mn (C), show that A∗ is also invertible in Mn (C).
13. Let A be an upper triangular matrix in Mn (R) such that A commutes with At . Prove that A is
diagonal.
14. Give an example of matrices A, B ∈ Mn (C) such that
(a) At = A but A∗ ! A;
(b) Bt = −B but B∗ ! −B.
1.6 PARTITION OF MATRICES; BLOCK MULTIPLICATION


Though matrices are introduced as arrays of numbers displayed in rows and columns, other ways of
looking at them can be quite useful. For example, subdividing a matrix by vertical as well as horizontal
lines to produce what is known as a partitioned matrix turns out to be very convenient, especially
when calculations with large matrices are required. Many contemporary applications of linear algebra
appear more natural and much easier to handle, if we resort to partitioned matrices. However, we will
not give a formal definition of a partitioned matrix as it requires cumbersome notation. We can avoid
such a formal definition for the idea of a partitioned matrix is so simple and natural that examples
alone will be sufficient for its understanding.
To keep our presentation informal, we will assume throughout this chapter that matrices under
discussion at any given point are over a fixed field, and we will not mention the underlying field at all
except in the last proposition.
We begin our discussion by considering the following example:

 
1 2 3 4 6 0

 6.
A = 7 8 0 −3 1
 
0 2 9 1 2 4

As indicated by the vertical and horizontal lines, A is an example of a 2 × 3 partitioned, or 2 × 3 block


matrix which can be visualized as consisting of submatrices or blocks:

A = [A11 A12 A13; A21 A22 A23].

It is essential to regard the blocks or the submatrices Ai j of A virtually as the entries of A, and ma-
nipulate the blocks as if they are scalars; because of this viewpoint, A can be thought of as having 2
horizontal blocks and 3 vertical blocks. Note that in this example, the blocks or submatrices that form
the partition of A are of different sizes. This is typical, for there is no formal restriction on sizes of
blocks. We partition a matrix the way we deem fit, depending on our requirements. Sometimes, the
sizes of the blocks will be determined naturally. For example, when modelling a physical system such
as a communication or an electric network, or a large transportation system, or the Indian economy
by a matrix, it is essential to consider the matrix as a partitioned one whose blocks will be naturally
determined by mutual interactions of the different components of the system.
Another advantage of partitioned matrices lies in the fact that very large matrices, which appear
in many applications with the advent of high-speed computers, can be handled with relative ease if
we partition them properly. Such large matrices are partitioned into much smaller submatrices so that
computers can work with several submatrices at one time. Thus, the idea of partitioned matrix has
turned fruitful in tackling highly complex processes of today’s technical world.
Let us briefly discuss how the usual matrix operations work with partitioned matrices. Two matrices
of the same size can be added block wise if they are partitioned exactly in the same way; the sum will
be obviously a matrix with the same partition. To be more specific, given matrices A and B with
similar partitions, each block of the sum A + B is clearly the sum of the corresponding blocks of A and
B. Similarly, the scalar multiple of a partitioned matrix is obtained by multiplying each block of the
matrix by the scalar.
We now consider the following partitioned matrices of the same size to see how the operations
work in practice:
   
1 2 3 4 6 0 0 2 −1 0 −3 5
  
A = 7 8 0 −3 1 6, B = 1 −2 4 −2 2 −5,
   
0 2 9 1 2 4 4 −4 6 1 0 −4

 
6 5 4 3 2 1

C = 7
 8 0 −1 2 0.
 
0 2 9 1 2 4

Now, A+ B can be computed block wise as A and B are partitioned the same way with corresponding
blocks or submatrices having the same sizes. However, even though A and C are matrices of the same
size (so A + C can be computed adding elements entry wise) A + C cannot be computed block wise
as the blocks of A and C are not comparable. But such problems do not occur in practice, as we
partition matrices beforehand in such a way that block wise operations are possible. Coming back to
our example, if we write
' ( ' (
A A12 A13 B B12 B13
A = 11 , B = 11
A21 A22 A23 B21 B22 B23

in terms of their blocks, then A + B, being the sum of corresponding blocks of A and B, respectively,
can be visualized in the following partitioned form:
' (
A11 + B11 A12 + B12 A13 + B13
A+B = .
A21 + B21 A22 + B22 A23 + B23

This presentation of A + B validates the point we made earlier, that while combining matrices with
comparable blocks, we treat their blocks as if they are actual matrix entries. If the scalar entries of the
sum A + B are needed, one simply adds up the blocks as matrices. In A + B, for example, A12 + B12 is
the submatrix
[4 6; −3 1] + [0 −3; −2 2] = [4 3; −5 3].

We adopt the same point of view (that is, of treating blocks as entries) to see how partitioned
matrices can be multiplied. In fact, we multiply partitioned matrices A and B by our old row-column
method for entries now applied to blocks, if the column partition of A matches the row partition of B.
By this we mean two things:
• The number of horizontal blocks of A equals the number of vertical blocks of B.
• The number of columns in each block of A equals the number of rows in the corresponding block
of B.
However, the basic requirement that the number of columns of A is equal to the number of rows of
B must be satisfied for the product AB to make sense. We work through a few examples to understand
block multiplication.
Let A be a 3 × 5 and B be a 5 × 2 matrix (so that the product AB makes sense and is a 3 × 2 matrix)
partitioned into blocks by vertical and horizontal lines as shown below:

A = [1 2 3 4 0; 0 −1 2 1 1; 2 0 1 0 1],   B = [1 2; 3 4; 5 6; 0 1; −1 2],

with A partitioned after its second row and third column, and B partitioned after its third row.

As indicated, A is partitioned into two vertical blocks having 3 and 2 columns, respectively, whereas
B has two horizontal blocks with 3 and 2 rows, respectively; thus, the column partition of A matches
the row partition of B. So, block multiplication can be performed. If we represent A and B by their
submatrices
A = [A11 A12; A21 A22],   B = [B11; B21],

then AB will also appear as consisting of two submatrices:


AB = [A11 B11 + A12 B21 ; A21 B11 + A22 B21].

Note that it is as if we have multiplied a 2 × 2 matrix to a 2 × 1 matrix to obtain a 2 × 1 matrix. Of


course, we still have to compute the submatrices by the usual rules of matrix operations. For example,
 
A11 B11 + A12 B21 = [1 2 3; 0 −1 2] [1 2; 3 4; 5 6] + [4 0; 1 1] [0 1; −1 2]
                  = [22 28; 7 8] + [0 4; −1 3]
                  = [22 32; 6 11].

Similarly,
 
/ 0 1 2 /
 0' 0 1
(
A21 B11 + A22 B21 = 2 0 1 3 4 + 0 1
  −1 2
5 6
/ 0 / 0
= 7 10 + −1 2
/ 0
= 6 12 .

Thus, AB turns out to be

AB = [22 32; 6 11; 6 12].
It is clear that once the partitions of A and B match in this manner, all the products of the submatrices
making up the AB are automatically well-defined.
Also, note that compared to the calculations required in direct computation of product of matrices
from definition, those needed in block multiplication are much shorter, especially if the blocks are
chosen carefully. (see Exercise 5 of this section.) Therefore, block multiplication turns out to be more
efficient especially when working with large matrices.
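The simplest instance of block multiplication, a single vertical cut of A matched by a horizontal cut of B, is already enough to see the idea in code. The following Python/NumPy sketch (an illustrative aside, not the book's algorithm) checks that the blockwise product agrees with the ordinary product.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.integers(-3, 4, size=(3, 5))
    B = rng.integers(-3, 4, size=(5, 2))

    # partition A after its 3rd column and B after its 3rd row, so that
    # the column partition of A matches the row partition of B
    A1, A2 = A[:, :3], A[:, 3:]
    B1, B2 = B[:3, :], B[3:, :]

    # blockwise: AB = A1 B1 + A2 B2
    print(np.array_equal(A @ B, A1 @ B1 + A2 @ B2))   # True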
For most of the applications required in this book though, partitioning matrices into four blocks
will suffice. We discuss this important special case thoroughly. So, let A be an m × n and B be an n × p
matrix (so that AB is an m × p matrix) partitioned as follows:
A = [A11 A12; A21 A22],   B = [B11 B12; B21 B22],

where the number of columns of A11 equals the numbers of rows of B11. Observe that this single
condition ensures that the column partition of A matches the row partition of B, for there are just two
horizontal blocks of A and two vertical blocks of B. Now, we multiply these two matrices as if they
are both 2 × 2 matrices to obtain AB as follows:
AB = [A11 B11 + A12 B21   A11 B12 + A12 B22 ; A21 B11 + A22 B21   A21 B12 + A22 B22].

For example,
    
 1 2 3  0 1 −1 4 2  1 7 5 7 0
     
−1 0 1 −1 0 3 0 −1 =  1 1 1 −3 −2.
   
0 2 1 1 2 0 1 0 −1 2 6 1 −2

One should verify the details of the last example so as to gain some experience in dealing with block
matrices. We work out the upper-most right submatrix in the product AB:
A11 B12 + A12 B22 = [1 2; −1 0] [4 2; 0 −1] + [3; 1] [1 0]
                  = [4 0; −4 −2] + [3 0; 1 0]
                  = [7 0; −3 −2].

Let us discuss a couple of examples to illustrate the advantages of partitioning a matrix into four
blocks; these examples will be useful later.

EXAMPLE 2 Let us compute the square A2 of the following 5 × 5 matrix by block multiplication:
 
3 7 0 0 0

1 0 0 0 0
 
A = 0 0 1 −1 2.
0 0 0 
 1 0
0 0 1 3 −2
Since A has a number of zeros, we try to partition it in such a way that zero blocks
do result; for, without actually doing any calculations we know that a zero block
multiplied to any comparable block will produce a zero block only. Also, since we
have to multiply A to itself, we require that after partitioning A, its upper left-hand
block, that is, A11 , must have the same number of rows as columns. In other words,
it must be a square submatrix. Thus, the following seems to be the most convenient
partitioning of A:
 
 3 7 0 0 0
−1 
 0 0 0 0

A =  0 0 1 −1 2.
 
 0 0 0 1 0
0 0 1 3 −2

Now, because we treat the four blocks as entries of a matrix, it is clear that the zero
submatrices in A will produce zero submatrices in the same positions in A2 . We have
the complete calculation as follows:
  
 3 7 0 0 0  3 7 0 0 0
−1  
 0 0 0 0 −1 0 0 0 0
2   
A =  0 0 1 −1 2  0 0 1 −1 2 (1.13)
  
 0 0 0 1 0  0 0 0 1 0
 
0 0 1 3 −2 0 0 1 3 −2
 
 2 21 0 0 0

−3
 −7 0 0 0

=  0 0 3 4 −2.
 
 0 0 0 1 0

0 0 −1 −4 6

The special kind of matrix we had just considered is known as a block triangular
matrix. A square matrix A of order n is a block upper triangular matrix, if A can
be put in the form
A = [A11 A12; O A22],

where A11 is also a square matrix of order, say r where r < n. It is clear that A22 is a
square matrix of order (n − r), and that O is the zero matrix of size (n − r) × r.
Given such a matrix A, block multiplication of A with itself is permissible as A11
is a square submatrix. We leave it to the reader to verify that
A^2 = [A11^2   A1 ; O   A22^2],

where A1 is the r × (n − r) submatrix given by A11 A12 + A12 A22 . Since A2 is again
a block upper triangular matrix with its upper left corner submatrix A211 a square
matrix, it follows that we can continue the process of block multiplication of A with
its powers to obtain, for any positive integer k, the power Ak in the following form:
A^k = [A11^k   Ak ; O   A22^k].

EXAMPLE 3 Consider the block upper triangular matrix


A = [A11 A12; O A22]

of order n such that A is invertible. We seek to express A−1 also in a block triangular
form. We assume that A11 has order r and A22 has order s, so that r + s = n.
For the time being, let us denote A−1 as B. Partition B into four blocks as follows:
B = [B11 B12; B21 B22],

where B11 is a square block of order r. Then, the relation AB = In implies that
AB = [A11 A12; O A22] [B11 B12; B21 B22] = [Ir O; O Is]

so that by the rules of block multiplication, we obtain the following four matrix
equations after equating submatrices in the preceding equality:

A11 B11 + A12 B21 = Ir ,


A11 B12 + A12 B22 = O,
A22 B21 = O,
A22 B22 = I s .

Note that the last of the preceding equalities implies, according to condition (1.12),
that A22 is invertible and A22^{-1} = B22 . Now, multiplying the third of the equalities by
A22^{-1}, we see that B21 = O. It then follows from the first of the equalities that A11 is
invertible and A11^{-1} = B11 . Finally, multiplying the second of the equalities on the left
by A11^{-1} and using B22 = A22^{-1}, we see that B12 = −A11^{-1} A12 A22^{-1} . Hence, we can
conclude that

A−1 = B = [A11^{-1}   −A11^{-1} A12 A22^{-1} ; O   A22^{-1}].

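The block formula just derived is easy to test numerically. The sketch below (illustrative only, assuming NumPy is available; np.block assembles a matrix from its blocks) builds a random block upper triangular matrix and checks that the displayed formula does give its inverse.

    import numpy as np

    rng = np.random.default_rng(3)
    A11 = rng.standard_normal((2, 2)) + 2 * np.eye(2)   # almost surely invertible
    A22 = rng.standard_normal((3, 3)) + 2 * np.eye(3)
    A12 = rng.standard_normal((2, 3))
    O = np.zeros((3, 2))

    A = np.block([[A11, A12],
                  [O,   A22]])

    inv11, inv22 = np.linalg.inv(A11), np.linalg.inv(A22)
    B = np.block([[inv11, -inv11 @ A12 @ inv22],
                  [O,      inv22]])

    print(np.allclose(A @ B, np.eye(5)))   # True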
For future reference, we record a condition sufficient for block multiplication of


two comparable matrices with two horizontal and two vertical blocks each.

Proposition 1.6.1. Let


A = [A11 A12; A21 A22]   and   B = [B11 B12; B21 B22]
be two m × n and n × p partitioned matrices, respectively. If the number of columns in the submatrix
A11 is the same as the number of rows of B11, then block multiplication of A and B is possible, and

AB = [A11 B11 + A12 B21   A11 B12 + A12 B22 ; A21 B11 + A22 B21   A21 B12 + A22 B22].

Corollary 1.6.2. Let

A = [A11 A12; A21 A22]

be a partitioned matrix. If A11 is a square submatrix, then

A^2 = [A11^2 + A12 A21   A11 A12 + A12 A22 ; A21 A11 + A22 A21   A21 A12 + A22^2].

We end this section by presenting an alternative, but quite useful, way of looking at matrix multi-
plication, which we had anticipated in Section 2. For an m × n matrix A and n × p matrix B over any
field F, the entries of the product AB are usually obtained by the dot products of the rows of A with the
columns of B. However, by partitioning A into its columns, and B into its rows, and block multiplying
the partitioned matrices thus obtained, we can produce another description of the product AB. This
way of partitioning results in n blocks in A in a row, each block an m-dimensional column vector; and
similarly in n blocks in B in a column, each block a p-dimensional row-vector; multiplying the blocks
of A with the blocks of B by the row-column method (treating the blocks as if they are scalars), we
then obtain the required expression for AB as given in the following proposition.

Proposition 1.6.3. Let A be an m × n matrix, and B be an n × p matrix over a field F. If


γ1 , γ2 , . . . , γn are the columns of A (so that each γ j is an m × 1 matrix), and ρ1 , ρ2 , . . . , ρn are
the rows of B (so that each ρ j is an 1 × p matrix), then

AB = γ1 ρ1 + γ2 ρ2 + · · · + γn ρn .

Note that each product γ j ρ j is an m × p matrix and so their sum is also m × p matrix.
This way of multiplying two comparable matrices is known as the column–row multiplication of
matrices.
In particular, we see that if A is an m × n matrix with γ1 , γ2 , . . . , γn as its columns and x =
(x1 , x2 , . . . xn ) an n × 1 column vector, then as the jth row of x is x j ,

Ax = x1 γ1 + x2 γ2 + · · · + xn γn ,

a result which we have verified directly in Section 2 (see the discussion preceding Equation 1.7).
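Column–row multiplication is equally easy to check in code. In the sketch below (an illustration assuming NumPy is available), np.outer(γ, ρ) forms the m × p product of a column γ with a row ρ, and summing these outer products over j reproduces AB.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.integers(-3, 4, size=(3, 4))
    B = rng.integers(-3, 4, size=(4, 2))

    # AB as the sum of column-times-row products gamma_j rho_j
    outer_sum = sum(np.outer(A[:, j], B[j, :]) for j in range(A.shape[1]))
    print(np.array_equal(A @ B, outer_sum))   # True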
EXERCISES
1. In each of the following products of block matrices, assume that the blocks are such that block
multiplication is possible. Express each of the product as a single block matrix. I and O stand
for the identity matrix and zero matrix of suitable sizes, respectively.
[A O; O B] [C D; E F],   [A B; O D] [I E; O I],

[O I; I D] [A B; C O],   [A B; O I] [C D I; O O F].

2. In each of the following products of two matrices, one of the matrices is a partitioned one.
Partition the other matrix suitably so that block multiplication is possible. Express each product
as a single block matrix after performing block multiplication.

\begin{pmatrix} 0 & 1 & -1 & 2 & 0 \\ 2 & 0 & 3 & -1 & 1 \\ 1 & 2 & 3 & 0 & -2 \end{pmatrix} \begin{pmatrix} 1 & -1 & 2 \\ 0 & 1 & -2 \\ 3 & 0 & 1 \\ 1 & 2 & 1 \\ -1 & 2 & -2 \end{pmatrix},

\begin{pmatrix} 1 & 2 & 0 & 0 \\ -1 & 5 & 0 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 2 & -3 \end{pmatrix} \begin{pmatrix} 2 & -1 & 1 & 0 \\ 0 & 3 & -1 & 1 \\ -1 & 1 & -2 & 0 \\ 2 & -1 & 1 & 1 \end{pmatrix}.

3. Compute A^2 by block multiplication after suitably partitioning the following matrices A:

A = \begin{pmatrix} 2 & -1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}, \qquad A = \begin{pmatrix} 2 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 4 & -1 \\ 0 & 0 & 3 & 1 \end{pmatrix}.

4. Consider matrices
 
 3 3 
1
0 0 0 
4 9
2
  
A = 0
1 0 0 and B = 6 8.
3
 5 9
0
0 1 0  4 
7 0
Compute AB by block multiplication after partitioning A as [I3 0 x] and partitioning B in
such a way that block multiplication is possible.
5. Verify Proposition (1.6.1).
6. Verify Corollary (1.6.2).
7. Verify Proposition (1.6.3).

8. Suppose a matrix A ∈ Mn (F) can be partitioned as


A = \begin{pmatrix} I & O \\ B & I \end{pmatrix},

where the symbols I stand for identity matrices of possibly different orders. Prove that A is
invertible, and express A−1 as a block matrix.
Hint: Determine matrices C, D, E and F such that

A \begin{pmatrix} C & D \\ E & F \end{pmatrix} = \begin{pmatrix} I & O \\ O & I \end{pmatrix}.

9. Let A be an invertible matrix in Mn (F) in block upper triangular form:


A = \begin{pmatrix} A_{11} & A_{12} \\ O & A_{22} \end{pmatrix},

where A11 is a square matrix. Show that A^{-1} is also a block upper triangular matrix. Express the blocks of A^{-1} in terms of the blocks of A.
10. Let A = \begin{pmatrix} B & O \\ O & C \end{pmatrix}, where B and C are square matrices over a field F of possibly different orders.
Prove that A is invertible if and only if B and C are invertible.
11. Consider a matrix A over a field F which can be partitioned as follows:
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},

where A11 is invertible. Find matrices X and Y over F such that


A = \begin{pmatrix} I & O \\ X & I \end{pmatrix} \begin{pmatrix} A_{11} & O \\ O & S \end{pmatrix} \begin{pmatrix} I & Y \\ O & I \end{pmatrix},

where S = A_{22} - A_{21} A_{11}^{-1} A_{12}.


12. For any two m × n matrices A and B over a field F, show that
\begin{pmatrix} I_m & A \\ O & I_n \end{pmatrix} \begin{pmatrix} I_m & B \\ O & I_n \end{pmatrix} = \begin{pmatrix} I_m & A + B \\ O & I_n \end{pmatrix},

where Im and In are identity matrices of order m and n over F. Hence, show that for any m × n
matrix A over F,
\begin{pmatrix} I_m & A \\ O & I_n \end{pmatrix}^{-1} = \begin{pmatrix} I_m & -A \\ O & I_n \end{pmatrix}.

13. A lower triangular matrix in Mn (F) can be represented as


\begin{pmatrix} a & 0 \\ b & C \end{pmatrix},

where a ∈ F, b is an (n − 1) × 1 column vector over F, 0 is the 1 × (n − 1) zero row vector and C


is a lower triangular matrix in M(n−1) (F).
Prove that the product of two lower triangular matrices in Mn (F) is lower triangular by using
block multiplication.
Hint: Use induction on n.

1.7 GROUPS AND FIELDS


Earlier in this chapter, while discussing properties of addition and multiplication of matrices, certain
terms, such as fields and groups, were used to refer to specific algebraic structures; we also used
properties of fields for proving properties of matrix operations. In this brief section, we discuss these
structures and give some examples, mostly of those which appear frequently in linear algebra.
One of the basic concepts required for these definitions is that of a binary operation on a set. As the
term suggests (‘bi’ means two), a binary operation on a non-empty set S is a rule by which any pair of
elements of S is associated to a unique element of S ; one also says that S is closed with respect to the
binary operation. To describe the result of the application of the rule to a pair, various symbols such
as + (usually called addition) or · (called multiplication) or even ∗ are used. Thus when one says that
+ (or ·) is a binary operation on a set S , one means that every pair x, y of elements of S is associated
to the unique element x + y ∈ S (or to x · y; most of the time the juxtaposition xy is used to denote x · y).
Thus, the usual addition and multiplication of integers are binary operations in the set Z of integers;
for any pair of integers x, y, x + y and xy are well-defined integers. Similarly, the familiar operations of
addition and multiplication are binary operations in the set R of real numbers as well as in the set C of
complex numbers. The definition of addition of m × n matrices shows that it, too, is a binary operation on the set of such matrices.
With the notion of a binary operation in hand, we can now define a group.

Definition 1.7.1. A non-empty set G with a binary operation · is said to be a group (with respect
to ·) if the following axioms (rules) are satisfied.

(a) (Associativity) (x · y) · z = x · (y · z) for any x, y, z ∈ G.


(b) (Existence of Identity) There is an element e ∈ G such that

x · e = x = e · x for any x ∈ G.

(c) (Existence of Inverse) For every x ∈ G, there is an element y ∈ G such that

x · y = e = y · x.

A group G with a binary operation · is called a commutative group if


(d) (Commutativity) x · y = y · x for any x, y ∈ G.

In case a group G, with addition + as its binary operation, satisfies the condition for commutativity, that is, x + y = y + x for all x, y ∈ G, it is customary to call it an abelian group.
It can be easily shown that the identity of a group is unique and so is the inverse of any element of
a group.

Now for the examples. With respect to usual addition of numbers, each of the sets Z (integers), Q
(rational numbers), R (real numbers) or C (complex numbers) is an abelian group. For, (x + y) + z =
x + (y + z) and x + y = y + x for any numbers x, y and z in any of these four sets; the number 0, which
is common to all these four sets, clearly acts as the (additive) identity: x + 0 = x = 0 + x. Moreover,
the negative −x of a number x is its (additive) inverse. Thus, each of the four sets is an abelian group.
Regarding multiplication, in Z it is associative and 1 acts as the multiplicative identity. However, as we
have remarked earlier, even the non-zero integers do not form a group with respect to multiplication as,
except 1 and −1, no integer has an inverse. In contrast, each of the three sets Q∗ (non-zero rationals),
R∗ (non-zero reals) and C∗ (non-zero complex numbers) is a commutative group with respect to usual
multiplication. The number 1, common to these three sets, is clearly the identity. If r is a non-zero
rational or a real number, then the reciprocal 1/r is the inverse; for a non-zero complex number a + bi (so at least one of a and b must be a non-zero real), the complex number (a − bi)/(a² + b²) is the inverse.
There are numerous other examples of groups which can be found in any standard textbook of
algebra such as the classic Topics in Algebra by I.M. Herstein [3]. Some groups appear naturally in
linear algebra and will be considered as we go along. At this point, we want to introduce a family of
finite groups. A finite group has finitely many elements. Note that in all the examples we have given
so far, the groups have infinitely many elements; they are infinite groups. The simplest finite group has
two elements and is usually described as Z2 = {0, 1}. We can introduce a binary operation + in Z2 by
the following relations:

0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0.

Thus, every possible pair of elements in Z2 is associated to a unique element in Z2 and so + is indeed
a binary operation in Z2 , called addition in Z2 . It is a trivial verification that Z2 satisfies all the axioms
for an abelian group with respect to addition; the element 0 is the identity (zero) and 1 is the inverse
of itself. Similarly, multiplication in Z2 can be defined by

0 · 0 = 0, 0 · 1 = 0, 1 · 0 = 0, 1 · 1 = 1.

Note: It is clear that Z2 cannot be a group with respect to multiplication as the element 0 does not have
an inverse in Z2 . On the other hand, note that the singleton {1}, consisting of the only non-zero element
of Z2 , forms a commutative group with respect to multiplication as all the required axioms are satisfied
trivially because of the relation 1 · 1 = 1.
In general, for every prime number p, we can consider the set Z p = {0, 1, . . . , p − 1} of p elements.
As in the case of Z2 , addition and multiplication are defined in Z p as follows. For any a, b ∈ Z p , the
sum a + b and the product a · b in Z p are the least non-negative remainders of the sum a + b (as integers)
and the product a · b (as integers), respectively, when divided by p in Z; the operations are known as
addition and multiplication modulo p. For example, as 3 + 2 = 5 and 3 · 2 = 6 in Z, in Z5 , 3 + 2 = 0 and
3 · 2 = 1.
Since the least non-negative remainder of any integer when divided by p is an integer between 0
and p − 1, it follows that addition and multiplication in Z p , as defined in the preceding paragraph, are binary
operations on Z p . It can, further, be verified that

(a) Z p with its addition is an abelian group with the zero 0 ∈ Z p acting as the additive identity;
(b) Z p ∗ , the set of non-zero elements of Z p , is also a commutative group with respect to multiplica-
tion with 1 ∈ Z p acting as the multiplicative identity.
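The arithmetic of Z p is easy to experiment with. The following Python sketch is purely illustrative (the choice p = 5 and the sample values are not from the text):

# Arithmetic modulo a prime p, here p = 5 for illustration.
p = 5

def add_mod(a, b):
    # Addition in Z_p: the least non-negative remainder of the integer sum.
    return (a + b) % p

def mul_mod(a, b):
    # Multiplication in Z_p: the least non-negative remainder of the integer product.
    return (a * b) % p

print(add_mod(3, 2), mul_mod(3, 2))   # prints 0 1, as computed in Z_5 above
# For a prime p, every non-zero element has a multiplicative inverse:
print([b for b in range(1, p) if mul_mod(3, b) == 1])   # prints [2], so 3 has inverse 2 in Z_5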

In all the examples of abelian groups (with respect to whatever operation known as addition) that
we have discussed so far, each of the groups has another binary operation called multiplication such
that, with the exception of the additive group Z of integers, the non-zero elements of the additive group
form a commutative group with respect to multiplication. These examples, thus, lead naturally to the
definition of a field.
In the following definition, as in all the examples, the identity of a group, with respect to a binary
operation called addition, will be designated as the zero of the group and will be denoted by the
symbol 0.

Definition 1.7.2. Let F be a non-empty set with two binary operations, one called addition and the
other multiplication. F is a field if the following conditions hold.
(a) F is an abelian group with respect to addition.
(b) F∗ , the set of the non-zero elements of F, is a commutative group with respect to multiplication.
(c) The elements of F satisfy the distributive laws:
(x + y) · z = x · z + y · z for all x, y, z ∈ F,
and
x · (y + z) = x · y + x · z for all x, y, z ∈ F.

In general, the unique additive identity of a field F is called the zero of F and is denoted by 0,
whereas the unique multiplicative identity is denoted by 1; by convention, 0 ≠ 1 so any field has at
least two elements. The additive inverse of an element x in a field F is denoted by −x; the multiplicative
inverse of a non-zero element x ∈ F is denoted either by 1/x or by x−1 .
We have already noted that the number systems Q, R, C and Z p satisfy the first two axioms of the
definition of a field. Since the distributive laws also hold for them, we have the following list of fields:
(a) Q, R and C with respect to usual addition and multiplication of numbers.
(b) Z p with respect to addition and multiplication modulo p for any prime p (including p = 2).
There are many more important examples of fields. However, the fields relevant for our purpose are
the ones mentioned in the preceding example and their subfields. Before introducing the concept of a
subfield, we digress a bit with some remarks about our discussion so far in this section.
It must have been noticed that while presenting examples of fields of various numbers, we have
glossed over the verifications of the properties of addition and multiplication in these number systems.
So it must be pointed out that rigorous proofs of these properties can be given once these numbers and
their addition and multiplication are defined properly (see, for example, Chapter 13 of Basic Abstract
Algebra by Bhattacharya, Jain and Nagpaul).
We come back to subfields now. As the name suggests, a subfield K of a field F is a non-empty
subset of F such that K itself is a field with respect to the field operations of F. Thus, for elements x, y
of K, the sum x + y and the product xy in F, must be in K. It is also clear that the identities 0 and 1 of
F must belong to K. To be precise, the following conditions on a non-empty subset K of a field F are
sufficient for K to be a subfield:
(a) For any x, y ∈ K, x + y ∈ K.
(b) For any x ∈ K, −x ∈ K.
(c) For any non-zero x ∈ K, the inverse x−1 ∈ K.
(d) 1 ∈ K.

If K is a subfield of F, we also say that F is an extension field of K.


The following are some examples of subfields:
(a) The field R of real numbers is a subfield of the field C = {a + bi : a, b ∈ R} of complex numbers.
(b) The field Q of rational numbers is a subfield of R and also of C.
(c) The subset {a + b√2 : a, b ∈ Q} is a subfield of R.
(d) The subset {a + b√−2 : a, b ∈ R} is a subfield of C.
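Example (c) can also be explored computationally. The sketch below is only illustrative (it represents a + b√2 as a pair (a, b) of rationals; the helper functions are not part of the text):

from fractions import Fraction

def add(x, y):
    # (a + b*sqrt(2)) + (c + d*sqrt(2)) = (a + c) + (b + d)*sqrt(2)
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    # (a + b*sqrt(2))(c + d*sqrt(2)) = (ac + 2bd) + (ad + bc)*sqrt(2)
    return (x[0]*y[0] + 2*x[1]*y[1], x[0]*y[1] + x[1]*y[0])

def inverse(x):
    # 1/(a + b*sqrt(2)) = (a - b*sqrt(2))/(a^2 - 2b^2); the denominator is non-zero for (a, b) != (0, 0)
    d = x[0]*x[0] - 2*x[1]*x[1]
    return (x[0]/d, -x[1]/d)

x = (Fraction(1), Fraction(2))     # the element 1 + 2*sqrt(2)
print(mul(x, inverse(x)))          # (Fraction(1, 1), Fraction(0, 1)), that is, the identity 1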
We finally consider rings. Like a field, a ring also has two binary operations, called addition and
multiplication.

Definition 1.7.3. A non-empty set R, with two binary operations called addition (+) and multiplication (·), is a ring if the following hold.
(a) R with respect to addition is an abelian group.
(b) Multiplication in R is associative.
(c) The distributive laws (as in a field) hold in R.
A ring R is a commutative ring if
(d) the multiplication in R is commutative.
A ring R is a ring with identity if
(e) R has a multiplicative identity 1.

Note: A field is a commutative ring with identity in which every non-zero element has a multiplica-
tive inverse.
As we have noted earlier in Section 1.3, the set Mm×n (F) of all m × n matrices over a field F is
an abelian group with respect to matrix addition (see Theorem 1.3.2). Moreover, the set Mn (F) of
all square matrices of order n over a field F, with respect to matrix addition and multiplication, is a
non-commutative ring with identity. The other important example of a commutative ring is, as we have already noted in Section 1.3, the set Z of integers with respect to usual addition and multiplication.
These sets of matrices are also examples of vector spaces over a field. We shall be considering
vector spaces in detail in Chapter 3.
2 Systems of Linear Equations

2.1 INTRODUCTION
This chapter is a leisurely but exhaustive treatment of systems of linear equations and their solutions.
The chapter begins by looking at the well-known procedure of row reduction of systems of linear
equations for obtaining their solutions and thereby develops the important theoretical machinery of
row and column operations. We also initiate the study of linear equations by using matrix operations
in this chapter. The application of matrix methods yields useful insights into other entities such as
invertible matrices. This chapter also introduces the important idea of determinants and develops its
properties.

Note: from now onwards, we shall be writing a column vector x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} as the transpose (x_1, x_2, . . . , x_n)^t of the corresponding row vector as far as practicable.

2.2 GAUSSIAN ELIMINATION


We begin by recalling the elementary procedure for solving two linear equations in two variables. The
idea is to use one equation to eliminate one variable from the other equation; the resultant equation,
having only the remaining variable, readily yields the value of this variable. Using it in any of the
original equations, we obtain an equation which determines the value of the other variable. These
values constitute the required solution. The same idea, with slight modification, works in the more
general case of a system of, say, m equations in n variables x1 , x2 , . . . , xn over any field. In this
general case, we use the x1 term in the first equation, or in any other equation, to eliminate the x1
term in all the other (m − 1) equations. The resultant system of m equations will thus have exactly one
equation with x1 term. Next, we use the x2 term in any equation, which does not have the x1 term, to
eliminate the x2 term in the other (m − 1) equations. After the second round of eliminations, the x1 and the x2 terms will each be present in only a single equation of the new system. We now proceed with the x3 term in the same way, and so on with the other variables one by one, until we have a very simple equivalent system of equations from which the values of the variables, or the solution set of the system, can be obtained easily. Not only that, the shape of the final system can also tell us when the system does
not have a solution.


This method of solving a system of linear equations, by systematic elimination of as many variables
as possible, to arrive at simpler equivalent system is known as the method of Gaussian elimination. In
what follows, we refine this procedure by using matrices to turn it into a powerful tool for examining
systems of linear equations.
Before we proceed any further, we recall some of the terms associated with a system of linear
equations over a field F. A reader not familiar with the notion of a field can assume F to be either R,
the field of real numbers or C, the field of complex numbers. As seen in Chapter 1, the general system
of m equations in n variables (unknowns) over a field F is usually described as
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
⋮
am1 x1 + am2 x2 + · · · + amn xn = bm ,
where the coefficients ai j and the bi are given scalars in the field F. Such a system can also be described
briefly by using the summation sign:
\sum_{j=1}^{n} a_{ij} x_j = b_i   for i = 1, 2, . . . , m.    (2.1)
We have also seen that if A = [a_{ij}] denotes the m × n matrix of the coefficients of the system (2.1), then
the system can also be described as a single matrix equation:

Ax = b, (2.2)

where x is the column vector (x1 , x2 , . . . , xn )t of variables and b = (b1 , b2 , . . . , bm )t is the column
vector over F consisting of the scalars appearing on the right-hand side of the system of equations
(2.1). Recall that A is called the coefficient matrix of the given system. A solution of the system is an
n × 1 column vector s = (s1 , s2 , . . . , sn )T over F such that As = b becomes a true identity. In other
words, when the scalars s j replace the variables x j , each of the equations of the system (2.1) becomes
a true statement.
The collection of all the solutions of a given system of linear equations over any field is the solution
set of the system. It must be pointed out that a system of linear equations need not have a solution; in
that case, the solution set is empty.
Two systems of linear equations over the same field are said to be equivalent if they have the same
solution set. The point of subjecting a system of equations to Gaussian elimination is to reduce the
given system to a simpler equivalent system so that the solutions of the given system are obtained
by considering the equivalent system. It is clear that if any system equivalent to a given system of
equations fails to have a solution, then the original system too can have no solution.

Definition 2.2.1. A system of linear equations over a field F is said to be consistent if it has some
solution over F; it is inconsistent if it has no solution.

The preceding remark can now be restated: a system of equations is inconsistent if and only if any
equivalent system is inconsistent.
We now present a few simple examples to show various possibilities of solutions of a system of
linear equations.

EXAMPLE 1 Consider the system


2x1 − 3x2 = −3
x1 + x2 = 1
of equations over the reals. One verifies easily (for example, by eliminating x1 from
the first equation to begin with) that
x1 = 0
x2 = 1
is an equivalent system. Thus, we may conclude that s = (0, 1)^t is the only solution.
EXAMPLE 2 The system
x 1 + x2 = 3
2x1 + 2x2 = 4
is equivalent to
x1 + x 2 = 3
0 = −2

which is clearly inconsistent. No matter which real numbers replace x1 and x2 in the
system, one of the equations can never be a true statement. Thus, the given system is
also inconsistent.
EXAMPLE 3 However, a system equivalent to
x1 − 2x2 = 3
2x1 − 4x2 = 6
is
x1 − 2x2 = 3
which is obtained by subtracting the first equation from the second. It follows that (2a + 3, a)^t, for any real number a, is a solution of both the systems. Thus, the solution
set of the given system of equations is infinite.
In these examples, we have tacitly assumed that certain operations, which constitute the process of
Gaussian elimination, on a system of linear equations produce an equivalent system. To justify our
assumption, we need to study systematically the effects of Gaussian elimination on a system of lin-
ear equations. The procedure of Gaussian elimination applied to the equations of a system of linear
equations involves just three types of operations with the equations.
1. Multiplication of all the terms of an equation by a non-zero scalar (for simplicity, we call the
resultant a scalar multiple, or just a multiple of the equation.)
2. Interchanging two equations.
3. Replacing an equation by its sum with a multiple of another equation.

We will illustrate these operations shortly in an example. Moreover, for later use, we will also
examine how these operations change certain matrices associated with the systems of linear equations
at the same time. We have already introduced the idea of a coefficient matrix of a system of linear
equations. The next definition introduces another matrix known as the augmented matrix of a system
of equations.

Definition 2.2.2. Let A be the coefficient matrix of the following system of m linear equations in n
variables:
\sum_{j=1}^{n} a_{ij} x_j = b_i   for i = 1, 2, . . . , m.    (2.3)

Then, the augmented matrix of the system (2.3) is the m × (n + 1) matrix given in the block form as
[A  b],

where b is the column vector consisting of the scalars bi in the right-hand side of Equation (2.3).

In the following example, we shall write the augmented matrix of each system of equations along-
side it so that the changes that occur in these matrices as we go on eliminating variables in the equations
are visible.

EXAMPLE 4 We consider the following system of equations with the augmented matrix alongside
it:

x1 − x2 + 2x3 = 3
3x1 + 2x2 − x3 = 1
x2 + 4x3 = −1

\begin{pmatrix} 1 & -1 & 2 & 3 \\ 3 & 2 & -1 & 1 \\ 0 & 1 & 4 & -1 \end{pmatrix}.
To start the procedure of Gaussian elimination, we use the first equation to elim-
inate x1 from the other two equations. So, the first step will be to multiply the first
equation by −3 and add it to the second equation (or multiply it by 3 and subtract it
from the second) to eliminate x1 from the second equation. If the third equation had
an x1 term, we would have done a similar operation to eliminate it from that equation.
The new system and the corresponding augmented matrix will look like this

x1 − x2 + 2x3 = 3
5x2 − 7x3 = −8
x2 + 4x3 = −1

\begin{pmatrix} 1 & -1 & 2 & 3 \\ 0 & 5 & -7 & -8 \\ 0 & 1 & 4 & -1 \end{pmatrix}.
Now, we have two equations without the x1 term, and theoretically we can use
either one to eliminate the x2 term. However, as the third one has 1 as the coefficient
of x2 , we choose to work with it. For reasons which will be clear later, we interchange
the two equations so as to have the x2 term with coefficient 1 in the second row; note
that in the new augmented matrix, the first non-zero term in the first two rows is 1:

x1 − x2 + 2x3 = 3
x2 + 4x3 = −1
5x2 − 7x3 = −8

\begin{pmatrix} 1 & -1 & 2 & 3 \\ 0 & 1 & 4 & -1 \\ 0 & 5 & -7 & -8 \end{pmatrix}.

Now that we have only x2 in the second equation, we add suitable multiples of
the second row to the first as well as the third row to get rid of the x2 terms therein.
This will result in

x1 + 6x3 = 2
x2 + 4x3 = −1
−27x3 = −3

\begin{pmatrix} 1 & 0 & 6 & 2 \\ 0 & 1 & 4 & -1 \\ 0 & 0 & -27 & -3 \end{pmatrix}.

There is only x3 term present in the third equation, and multiplying it by −1/27
makes its coefficient 1. Now, suitable multiples of this new third equation can be used
to eliminate the x3 terms from the other two equations. Carrying out the necessary
computations, we finally arrive at

x1 = 12/9
x2 = −13/9
x3 = 1/9

\begin{pmatrix} 1 & 0 & 0 & 12/9 \\ 0 & 1 & 0 & -13/9 \\ 0 & 0 & 1 & 1/9 \end{pmatrix}.

We conclude that the given system has a unique solution, namely, (12/9, −13/9, 1/9)t .
We note the following points about the preceding procedure:
1. There is no single way of carrying out the elimination process. For example, while carrying the
process, the second step could have been that of multiplying the second equation by 1/5 instead
of interchanging the second and the third rows.
2. Whatever steps one may take to eliminate the variables, one tries to manipulate the equations in
such a way that the first surviving variable (with coefficient 1) in each equation appears from the
left to the right as we come down the equations of the system.
3. Finally, about the augmented matrices. Consider the augmented matrices of the systems of equa-
tions we obtained in each stage. It is clear that we could have obtained these matrices by per-
forming on the rows of the original matrix the same operations as the ones that were performed
on the equations. For example, the second of these matrices could have been obtained simply
by adding −3 times the first row to the second row of the first augmented matrix. Similarly, the
third of the augmented matrices could have been obtained by simply interchanging the second
and the third rows of the second augmented matrix.
Thus, we see that solving a system of equations can also be accomplished by performing a series
of row operations (which we will describe a little later) on the augmented matrix to bring the matrix
to the simplest possible form and then writing down the equations corresponding to the new matrix.
Then, one can read off the solutions from the new simpler system of equations.
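For readers who wish to experiment, the row reduction of Example 4 can be replayed on a computer; the sketch below is only illustrative and assumes NumPy is available (it simply uses NumPy as a calculator for the elementary row operations described above):

import numpy as np

# The augmented matrix of Example 4.
M = np.array([[1., -1.,  2.,  3.],
              [3.,  2., -1.,  1.],
              [0.,  1.,  4., -1.]])

M[1] -= 3 * M[0]        # add (-3) times row 1 to row 2
M[[1, 2]] = M[[2, 1]]   # interchange rows 2 and 3
M[2] -= 5 * M[1]        # add (-5) times row 2 to row 3
M[2] /= M[2, 2]         # multiply row 3 by -1/27
M[1] -= 4 * M[2]        # remove the x3 term from row 2
M[0] -= 2 * M[2]        # remove the x3 term from row 1
M[0] += M[1]            # remove the x2 term from row 1

print(M)                # the last column is (12/9, -13/9, 1/9), the solution found above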
Before we try the alternative method of tackling systems of equations, we need to explain the two
terms, namely ‘row operations’ and ‘simplest possible form of matrices’. We discuss these two ideas
in the next section.

EXERCISES
1. Determine whether the following statements are true or false giving brief justifications. Any
given matrix is over an arbitrary field.
(a) Any system of linear equations, given by the matrix equation Ax = 0, is consistent.
(b) Any system of linear equations, given by the matrix equation Ax = b for a fixed but arbitrary
column vector b, is consistent.

(c) Two systems of linear equations described by matrix equations, Ax = b and Bx = c, are
equivalent, if Bs = c for any solution s of Ax = b.
(d) Two equivalent systems of linear equations must have the same number of equations.
(e) The augmented matrix of a system of linear equations can never be a square matrix.
(f) A system of linear equations consisting of a single linear equation involving n variables is
always consistent if n > 1.
(g) If s1 and s2 are solutions of the system of linear equations Ax = 0, so is s1 + s2 .
(h) If s1 and s2 are solutions of the system of linear equations Ax = b for some non-zero column
vector b, then so are s1 + s2 .
(i) For a square matrix A, any system of linear equations Ax = b has a unique solution if A is
invertible.
2. Show that the following equation over R has infinitely many solutions; further show that
every solution can be expressed as a linear combination, that is, as a sum of scalar multiples of
three fixed 3 × 1 column vectors over R:

x1 − x2 − 2x3 = 1.

3. Solve the following system of equations over R by the method of Gaussian elimination:

x1 + x2 = 3
(a)
2x1 − x2 = 1,

x1 − 2x2 − x3 = 2
(b)
2x1 + x2 − 2x3 = 9,

3x1 + 6x2 + x3 = 16
(c) 2x1 + 4x2 + 3x3 = 13
x1 + 3x2 + 2x3 = 9.
4. Is the following system of equations over the field C of complex numbers consistent?

x1 + ix2 = 1 + i
ix1 − x2 = 1 − i.

5. Find the values of a for which the following system of equations over R is consistent. Determine
the solutions for each such a.

x1 + 3x2 − x3 = 1
2x1 + 7x2 + ax3 = 3
x1 + ax2 − 7x3 = 0.

6. Find the values of the real scalars a, b and c such that the following system of equations over the

real numbers has a unique solution, no solution and infinitely many solutions.

2x1 + 2x2 + 2x3 = a


x1 + 2x2 − 2x3 = b
x1 + x2 + x3 = c.

2.3 ELEMENTARY ROW OPERATIONS


Though the term ‘row operation’ has come up in the context of augmented matrix, it should be clear
that the following definition is valid for any matrix. In fact, we will see many applications of row
operations in areas other than that of systems of equations.

Definition 2.3.1. Let A be an m × n matrix over a field F, whose rows are considered n-dimensional
row vectors. An elementary row operation on A is any one of the following three types of operations
that can be performed on the rows of A.

1. Row scaling: multiplying any row of A by a non-zero scalar;


2. Row exchange: interchanging any two rows of A;
3. Row replacement: adding a scalar multiple of a row of A to another row.

These row operations are considered as scalar multiplication and addition of vectors in Fn . Note
that row operations do not change the size of a matrix.
If an m × n matrix A is changed to a matrix B by a sequence of row operations (or even by a single
row operation), we say that B is row equivalent to A. So, row equivalence is a relation in Mm×n (F),
which we will show, a little later, to be an equivalence relation.
We sometimes use the symbol ∼ to describe row equivalence; if B is row equivalent to A, then we
write B ∼ A.
We can also define elementary column operations of three types on a matrix in a similar manner.
It suffices to say that all one has to do is to replace the word ‘row’ by ‘column’ in Definition (2.3.1).
It is also clear what is meant by saying that B is column equivalent to A.

Elementary Matrices
The study of row equivalence (respectively, column equivalence) is made easier by the fact that the
effect of any elementary row operation (respectively, column operation) on a matrix can also be re-
alized by left multiplying (respectively, right multiplying) by certain matrices known as elementary
matrices. As the following definition shows, an elementary matrix corresponding to an elementary
row operation can be obtained by applying precisely the same row operation to an identity matrix of
suitable size; thus, there are three types of elementary matrices. Note that any elementary matrix has
to be a square one.

Definition 2.3.2. An elementary matrix of order m over a field F is a matrix obtained by applying
an elementary row operation to the identity matrix Im ∈ Mm (F).

Thus, corresponding to the three types of elementary row operations given in Definition (2.3.1),
there are three types of elementary matrices:

(a) An elementary matrix of order m, corresponding to a row scaling, is obtained by multiplying a row of Im by a non-zero scalar a and therefore is of the form:

\begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & & a & & \\ & & & & \ddots & \\ & & & & & 1 \end{pmatrix}

(b) An elementary matrix of order m, corresponding to a row exchange, is obtained by interchanging two rows of Im and therefore is of the form:

\begin{pmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & 0 & \cdots & 1 & & \\ & & \vdots & & \vdots & & \\ & & 1 & \cdots & 0 & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{pmatrix}

(c) An elementary matrix of order m, corresponding to a row replacement, is obtained by adding to a row of Im a scalar multiple of another of its rows and therefore is of the form

\begin{pmatrix} 1 & & & & \\ & 1 & \cdots & a & \\ & & \ddots & \vdots & \\ & & & 1 & \\ & & & & 1 \end{pmatrix}

It can be easily verified that each of these elementary matrices can be obtained from the identity
matrix by either an elementary row operation or the corresponding column operation. For example,
to get an elementary matrix of the row replacement type, either we add to the ith row a multiple of
the jth row of the identity matrix, or we add to the jth column the same multiple of the ith column.
Interchanging the ith and the jth rows in the identity matrix has the same effect as interchanging the
corresponding columns. Finally, multiplying the ith row of the identity matrix by a non-zero scalar a
is the same as doing the same to the ith column.

It is easy to write down the elementary matrices over any field F in practice. For example, the
following is a complete list of elementary matrices of order 2:
Row scaling: \begin{pmatrix} a & 0 \\ 0 & 1 \end{pmatrix} or \begin{pmatrix} 1 & 0 \\ 0 & a \end{pmatrix}

Row interchange: \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}

Row replacement: \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} or \begin{pmatrix} 1 & 0 \\ a & 1 \end{pmatrix}

Here, a stands for an arbitrary non-zero scalar from F. To take another example, let us list the pos-
sible elementary matrices of order 3 of row interchange type. Since the corresponding row operations
involve the interchange of two rows, the required matrices can be obtained by simply interchanging
two rows of the 3 × 3 identity matrix I3 . Thus, they are

\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.

As we have noted earlier, these can also be obtained by interchanging the columns of the identity
matrix I3 .
Then we examine what happens when we left-multiply an arbitrary 2 × 3 matrix over a field by the
elementary matrices of order 2 over the same field in the following example:
\begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} = \begin{pmatrix} a_{11} + aa_{21} & a_{12} + aa_{22} & a_{13} + aa_{23} \\ a_{21} & a_{22} & a_{23} \end{pmatrix};

\begin{pmatrix} 1 & 0 \\ a & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} + aa_{11} & a_{22} + aa_{12} & a_{23} + aa_{13} \end{pmatrix};

\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} = \begin{pmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \end{pmatrix};

\begin{pmatrix} a & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} = \begin{pmatrix} aa_{11} & aa_{12} & aa_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}.

These calculations show that the effect of applying an elementary row operation to a 2 × 3 (or for
that matter, to any matrix having two rows with any number of columns) matrix is the same as that
of left-multiplying the matrix by the corresponding elementary matrix of order 2. Stated differently,
left-multiplication of a 2 × n matrix by an elementary matrix of order 2 results in the corresponding
row operation in that matrix. Interestingly, the column operations can similarly be carried out with the
help of elementary matrices. However, we need to right-multiply by elementary matrices to effect the
column operations. Leaving the verification in some small cases to the reader (see Exercises 9 and 10),
we straightaway record the general result.
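Before stating it, here is a quick numerical illustration of the claim for order 2; the sketch is only illustrative and assumes NumPy is available (the matrix A and the scalar a below are arbitrary):

import numpy as np

a = 5
A = np.array([[1, 2, 3],
              [4, 5, 6]])

E_replace = np.array([[1, a], [0, 1]])    # add a times row 2 to row 1
E_swap    = np.array([[0, 1], [1, 0]])    # interchange the two rows
E_scale   = np.array([[a, 0], [0, 1]])    # multiply row 1 by a

print(E_replace @ A)   # first row becomes (1 + 4a, 2 + 5a, 3 + 6a)
print(E_swap @ A)      # the two rows are interchanged
print(E_scale @ A)     # the first row is scaled by a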

Proposition 2.3.3. Let A ∈ Mm×n (F). Suppose that e(A) and e' (A) are the matrices obtained from
A by some elementary row operation e and column operation e' , respectively. Let E and E ' be the
corresponding elementary matrices of order m and n, respectively, over F. Then,

e(A) = EA and e' (A) = AE ' .

Before beginning the proof, let us note that the sizes of the elementary matrices in the proposition
depend on whether we have to left-or right-multiply A by them. Also, we shall deal with row operations
e only; the proof with respect to column operations e' can be carried out in a similar manner.
Proof. This is one of those results where several cases have to be verified one by one. We choose to
verify only one case, leaving the rest to the reader.
Consider the row operation e which interchanges the ith and the jth rows of the m × n matrix A.
Here, i and j are fixed but arbitrary integers between 1 and m. We need to show that, for any k with
1 ≤ k ≤ m, the kth rows of e(A) and EA are the same. We first assume that k ≠ i, j. As A and e(A) differ
only in the ith and the jth rows, for our choice of k, the kth row of A is the same as the kth row of e(A).
On the other hand, a typical entry of the kth row, say, the (k, l)th entry, of EA is given by \sum_{r=1}^{m} c_{kr} a_{rl} if E = [c_{kl}] and A = [a_{kl}]. Since E is obtained from Im by interchanging its ith and jth rows, it follows that c_{kk} = 1 and c_{kr} = 0 if k ≠ r, so the (k, l)th entry of EA is just a_{kl}. Thus, the kth rows of A and EA,
and hence the kth rows of e(A) and EA coincide.
Next, we assume that k = i or k = j. Suppose that k = i (the case of k = j can be settled the same
way). Now, the ith row of e(A) is the jth row of A. On the other hand, as E = [ckl ] is obtained by
interchanging the ith and the jth rows of Im, c_{ij} = 1 and c_{ir} = 0 if r ≠ j. Therefore, a typical entry of the ith row of EA, say, the (i, l)th one, is \sum_{r=1}^{m} c_{ir} a_{rl} = a_{jl}. Thus, the ith row of EA is precisely the jth row of
A which, as we have already seen, is the ith row of e(A). This completes the proof of the case we had
chosen. !
Since row or column equivalent matrices are obtained by sequences of elementary row operations
or column operations, the preceding proposition implies the following.

Proposition 2.3.4. Let A ∈ Mm×n (F), and let B ∈ Mm×n (F) be row equivalent to A. Then, there are
elementary matrices E1 , E2 , . . . , Er of order m over F, such that

B = Er . . . E2 E1 A.

Similarly, if C is column equivalent to A in Mm×n (F), then there are elementary matrices
F1 , F2 , . . . , F s of order n over F, such that

C = AF1 F2 . . . F s .

To continue our discussion on row and column operations, we now note that every elementary row
or column operation has a reverse operation. To be precise, if A is changed to e(A) or e' (A), then there
is the reverse operation e1 or e'1 so that e1 (e(A)) = A and e'1 (e' (A)) = A. For example, if e(A) is obtained
from A by interchanging the ith and the jth rows, then interchanging the same rows of e(A), we get
back A from e(A). Similarly, corresponding to the operation of adding a times the jth row to the ith
row of A, the operation of adding of (−a) times the jth row of e(A) to its ith row will be the reverse
operation on e(A). These remarks form the basis of the proof of the following result about elementary
matrices.

Proposition 2.3.5. Any elementary matrix over a field F is invertible. Moreover, the inverse of an
elementary matrix over F is again an elementary matrix of the same order over F.

Before formally proving this proposition, we consider some examples. To find, for example, the
inverse of the elementary matrix
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},

we note that the corresponding elementary row operation interchanges the two rows of a 2 × n matrix.
Since to undo this change, we have to interchange the two rows of the resultant matrix, it follows that
the reverse of the row operation must be the same. Thus, the inverse of the given elementary matrix is
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}^{-1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.

Similarly, as the reverse of the process of adding 3 times the 3rd row to the 2nd row of a matrix is to
add (−3) times the 3rd row to the 2nd row, we see that

\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{pmatrix}.

With these examples in mind, we now proceed with the proof.


Proof. Consider an elementary matrix E of order m over a field F. If E is obtained by multiplying the
ith row of the identity matrix Im by a non-zero scalar d, then it is clear that the elementary matrix ob-
tained by multiplying the same row of Im by d−1 is the inverse of E. (Just multiply the two elementary
matrices which are diagonal matrices.)
In case E is obtained by adding d times the jth row of the identity matrix Im to its ith row, then
E = Im + dei j where ei j is the unit matrix of order m over F having 1 as the (i, j)th entry and zeros
elsewhere. Now, using the formula for multiplication of unit matrices given in Proposition (1.3.7), we
see that, as i ≠ j,

(Im + d e_{ij})(Im − d e_{ij}) = Im + d e_{ij} − d e_{ij} − d^2 e_{ij}e_{ij} = Im,

since e_{ij}e_{ij} = O.

Therefore, the elementary matrix Im − dei j is the inverse of E.


Finally, observe that if E is obtained by interchanging the ith and the jth row of Im , then
E = Im − eii − e j j + ei j + e ji . We leave it to the reader (see Exercise 8 of this section) to verify, by
using again the formula for multiplication of unit matrices, that E 2 = Im . Thus, E is its own inverse.
This completes the proof. !
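The inverses found in the proof can also be checked directly; the following sketch is only illustrative and assumes NumPy is available (the order m = 4, the indices and the scalar d are arbitrary):

import numpy as np

m, d = 4, 7
E_repl = np.eye(m); E_repl[0, 2] = d      # I_m + d*e_ij with (i, j) = (1, 3)
E_inv  = np.eye(m); E_inv[0, 2] = -d      # I_m - d*e_ij
print(np.array_equal(E_repl @ E_inv, np.eye(m)))    # True

E_swap = np.eye(m)[[2, 1, 0, 3]]          # interchange rows 1 and 3 of I_4
print(np.array_equal(E_swap @ E_swap, np.eye(m)))   # True: E is its own inverse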
We now record the implications of this result for row equivalent matrices.

Proposition 2.3.6. Let A, B ∈ Mm×n (F) such that B is row equivalent to A. Then, there is an invert-
ible matrix E of order m over F such that

B = EA.

Proof. If B is row equivalent to A, then by Proposition (2.3.3) there are elementary matrices
E1 , E2 , . . . , Er , each of order m, such that

B = Er Er−1 . . . E1 A.

By the preceding proposition, each Ek is invertible. However, a product of invertible matrices is in-
vertible (see Proposition 1.4.3), and hence E = Er Er−1 . . . E1 is invertible. !

The following corollary is now clear.

Corollary 2.3.7. Let A, B ∈ Mm×n (F) such that B is row equivalent to A. Then A is row equivalent
to B.

With these results, it is easy to prove that row equivalence is an equivalence relation in Mm×n (F);
we leave the proof to the reader. (See Exercise 16 of this section.)
It should be clear that we have analogous results for column equivalent matrices in Mm×n (F).
The following observation will be helpful later.

Proposition 2.3.8. Let A, B ∈ Mm×n (F) be row equivalent. Let A' be the matrix obtained from A by
removing certain columns, and let B' be the matrix obtained from B by removing the corresponding
columns. Then, A' and B' are row equivalent.

Since an elementary row operation changes the entries within each column of a matrix independently of the other columns, the proposition is clear.

Permutation Matrices
Let us end this section with a brief discussion of certain class of matrices known as permutation
matrices. A permutation matrix of order n (over a field F) is any matrix that can be produced by
rearranging, or permuting the rows of the identity matrix In of order n (over F). In other words, a
matrix in Mn (F) is a permutation matrix if and only if each row and each column has a single non-zero
entry equal to 1.
So any elementary matrix of order n, which is obtained by a single exchange of two rows of In , is
clearly a permutation matrix; we shall refer to such a matrix as an elementary permutation matrix.
While, in general, a permutation matrix need not be a symmetric matrix, it is clear that an elementary
permutation matrix is symmetric.
A more precise description of a permutation matrix depends on a mathematical interpretation of the
idea of a permutation. A permutation of n symbols can be thought of as a one-to-one map σ from the
set Xn = {1, 2, . . . , n} onto itself; if σ(i) = ki , for 1 ≤ i ≤ n, then one may write σ = {k1 , k2 , . . . , kn }.
The corresponding permutation of the rows of In produces a permutation matrix P; it is clear that the
ith row of P is the ki th row of In . If we denote the n-dimensional row vector which is the ith row of In
as ei (so ei has 1 at the ith place and zeros elsewhere), we can also describe P as the matrix of order n,
whose ith row is eki . For example, corresponding to the permutation {2, 3, 1} of X3 = {1, 2, 3}, we
have the permutation matrix

P = \begin{pmatrix} e_2 \\ e_3 \\ e_1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.

The identity matrix In can be considered a permutation matrix of order n. It corresponds to the
identity permutation, that is, the identity map on Xn . With this convention, it is an easy exercise in
counting to show that there are n! permutation matrices of order n over any field.
The simplest permutation matrix, other than the identity matrix, is the one that can be obtained
from the identity matrix by interchanging two rows in it. It corresponds to a permutation σ of Xn ,
which interchanges two symbols, say i and j, in Xn while fixing all the other symbols in Xn ; any such
permutation is known as a transposition and is usually denoted by the symbol (i, j). So while the
corresponding permutation matrix P has e j at the ith and ei at the jth row, all the other rows of P are
the same as the corresponding rows of In . As we have noted earlier, this particular permutation matrix,
which sometimes will be denoted by Pi j , is precisely the elementary matrix (of order n) corresponding
to the elementary row operation of interchanging the ith and the jth rows. Note that Pi j = P ji . Such a matrix, which we have called an elementary permutation matrix, is symmetric. Thus, any permutation
matrix in Mn (F), which is generated by a transposition, is symmetric.
Before proceeding any further, we consider an example. As we have already seen while discussing
elementary row operations, the elementary permutation matrices of order 3 are the following:

P12 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad P13 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad P23 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.

Apart from these three and the identity matrix, there are two more permutation matrices of order 3:

P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad Q = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.

It is clear that P can be obtained from I3 by first interchanging 1st and the 2nd rows, and then inter-
changing 2nd and 3rd rows in the resultant matrix. The elementary matrices corresponding to these
row interchanges are P12 and P23 . Now, it turns out that

P23 P12 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} = P.

On the other hand, if we first interchange the 2nd and the 3rd row of I3 and then interchange the 1st
and the 2nd rows of the resultant matrix, we end up with the other permutation matrix Q. We leave it
to the reader to verify that

P12 P23 = Q.

So, at least for permutation matrices of order 3 (over any field), we have verified that any such matrix
is a product of elementary permutation matrices; this product may include only one factor or even
no factors in order to cover the cases of elementary permutation matrices and of the identity matrix,
respectively.
In general, consider elementary permutation matrices P1 , P2 , . . . , Pk of order n over any field, and
set P = P1 P2 . . . Pk , the product of these given matrices. Now, each Pi corresponds to an elementary
row operation, namely a single interchange of rows. Hence, the product P = PIn , by Proposition (2.3.4),
can be obtained by applying the sequence of row interchanges, corresponding to the matrices Pi , to

In (note that this means P2 P1 In can be obtained by performing the row interchange given by P2 in
P1 In ). It follows that the rows of P have to be the rows of In permuted in some order determined by
the sequence of row interchanges and so P is a permutation matrix of order n. Thus, we have shown
that the product of a finite number of elementary permutation matrices is a permutation matrix.
Conversely, any rearrangement of the rows of In can be produced by subjecting In to a sequence
of suitable row interchanges. For example, as we have seen a short while ago, the rearrangement of
the rows of I3 , given by the permutation {2, 3, 1} of X3 = {1, 2, 3}, can also be produced by applying
the row interchanges corresponding to transpositions (12) and (23), in that order, to I3 . The general
assertion is a restatement of a basic fact in the theory of permutations: any permutation on n symbols
is a product of transpositions. (Since proving this needs a somewhat lengthy digression, we omit
the proof which can be found in any basic text book dealing with elementary group theory). Now,
applying a sequence of row interchanges to In , by Proposition (2.3.4), is equivalent to left-multiplying
In by the corresponding elementary matrices. But any such elementary matrix is a permutation matrix
corresponding to some transposition. It follows that if a permutation matrix P of order n, is obtained
from In by subjecting In to a sequence of row interchanges corresponding to elementary permutation
matrices, say, P1 , P2 , . . . , Pk , then P = Pk . . . P2 P1 In , showing that P = Pk . . . P2 P1 . Thus, we have
proved the first assertion of the following proposition.

Proposition 2.3.9. Let P be a permutation matrix of order n (over any field).

(a) P is a product of elementary matrices which are themselves permutation matrices of order n.
(b) If Q is another permutation matrix of order n, then PQ is a permutation matrix of order n.
(c) P is invertible and P−1 = Pt .

Proof. Coming to the proof of the second assertion, we note that, by the first assertion, PQ is a product
of elementary permutation matrices. Now, as we have shown earlier, any product of elementary permu-
tation matrices is a permutation matrix. It follows that PQ is a permutation matrix. To prove the final
assertion, we express P as a product of elementary permutation matrices, say P = Pk Pk−1 . . . P2 P1 .
Recall that any elementary matrix is invertible; in fact, the inverse of any elementary permutation
matrix is itself (see Proposition 2.3.5). It follows that P is invertible and

P^{-1} = P_1^{-1} P_2^{-1} · · · P_k^{-1} = P_1 P_2 · · · P_k .

Finally, observe that any elementary permutation matrix is symmetric, that is, Pi = Pti . Therefore, by
using the formula for the transpose of a product of matrices, one obtains the following:

P^{-1} = P_1^t P_2^t · · · P_k^t = (P_k P_{k−1} · · · P_1)^t = P^t

as required. !
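The assertions of the proposition are easy to test numerically; the sketch below is only illustrative and assumes NumPy is available (it uses the order-3 matrices of the example above):

import numpy as np

I3 = np.eye(3, dtype=int)
P12 = I3[[1, 0, 2]]          # interchange rows 1 and 2 of I3
P23 = I3[[0, 2, 1]]          # interchange rows 2 and 3 of I3

P = P23 @ P12                # the permutation matrix P of the example
print(P)                                     # its rows are e2, e3, e1
print(np.array_equal(P @ P.T, I3))           # True: P^{-1} = P^t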

EXERCISES
1. Determine whether the following statements are true or false giving brief justifications. All given
matrices are over an arbitrary field.
(a) Two row equivalent matrices have to be of the same size.
(b) Two row equivalent matrices are column equivalent.
(c) For any two elementary matrices E1 and E2 of order n, E1 E2 = E2 E1 .
(d) Every elementary matrix is invertible.
(e) The transpose of an elementary matrix is an elementary matrix.
(f) The product of two elementary matrices of the same order is an elementary matrix.
(g) The sum of two elementary matrices of the same order is an elementary matrix.
(h) Every non-zero square matrix of order n is row equivalent to In .
(i) The identity matrix In is an elementary matrix.
(j) The set of all permutation matrices of order n is a group with respect to matrix multiplication.
2. Determine whether the following matrices are elementary matrices; for each of the elementary
matrices, find a single elementary row operation that produces it from a suitable identity matrix.
\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},

\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix},

\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 4 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.

3. For each of the elementary matrices of Exercise 2, determine its inverse.


4. Let

A = \begin{pmatrix} 3 & -1 & 2 \\ -2 & -1 & -1 \\ 1 & 0 & -1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 1 & 1 \\ 3 & -1 & 2 \end{pmatrix}

and

C = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 1 & 0 \\ 4 & -1 & 3 \end{pmatrix}

be matrices over R. Find elementary row operations that change A to B, and then elementary column operations that change B to C. Hence, determine invertible matrices E and F in M3 (R)

such that

EA = B and BF = C.

5. Let

A = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 1 & 2 \\ 1 & -2 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 3 & 1 \\ 1 & -2 & 1 \end{pmatrix}

and

C = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 3 & 2 \\ 1 & -2 & 2 \end{pmatrix}

be matrices over R. Find an elementary row operation that changes A to B and an elementary col-
umn operation that changes B to C. Finally, show that a sequence of elementary row operations
will transform C to I3 .
6. Find invertible matrices E and F in M3 (R) such that EAF = I3 where A is the matrix of
Exercise 5.
7. Complete the proof of Proposition (2.3.3).
8. Let i ≠ j and m be positive integers such that 1 ≤ i, j ≤ m. Express the following products of unit matrices of order m over a field F as single matrices: e_{ii}^2, e_{jj}^2, e_{ij}e_{ji}, e_{ji}e_{ij}, e_{ij}e_{ij} and e_{ji}e_{ji}. Hence, show that if E = I_m − e_{ii} − e_{jj} + e_{ij} + e_{ji}, then E^2 = I_m.
9. Given the matrix A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} over R, and matrices

E_1 = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}, \quad E_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} and E_3 = \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix},

compute AE1 , AE2 and AE3 . Verify that there are elementary column operations which transform A to AE1 , AE2 and AE3 .
10. Let A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} be an arbitrary 2 × 3 matrix over a field F. Verify that:
(i) the interchange of the 2nd and the 3rd column of A can be produced by right-multiplying it
by

\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},

(ii) the addition of 2 times the 1st column of A to the second column can be produced by right-
multiplying it by

\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

11. Consider the matrix


 
A = \begin{pmatrix} a & 0 & 0 & 0 \\ b & a & 0 & 0 \\ c & b & a & 0 \\ d & c & b & a \end{pmatrix},

where the non-zero scalars a, b, c and d are from a field F. Let E be the elementary matrix
obtained by adding −b times row 2 of I4 to its row 4. Find the matrix EA by subjecting A to
suitable row operation.
12. Let A ∈ Mn (F). Show that if a finite sequence of elementary row operations reduces A to the
identity matrix In , then A is invertible.
In fact, the converse is also true. But we have to wait till Section 2.5 to see it.
13. Let

A = \begin{pmatrix} 1 & 3 & -1 \\ 0 & 1 & 2 \\ -1 & 0 & 8 \end{pmatrix} and b = \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix}

be real matrices. Perform elementary row operations to the 3 ×4 block matrix [A | b] to transform
it to the form [I3 | b' ]. Hence, deduce the solutions of the following system of equations:

x1 + 3x2 − x3 = 1
x2 + 2x3 = 1
−x1 + 8x3 = 3.

14. Express the matrix


 
0 1 0 0

1 0 0 0
A =  
0 0 0 1
0 0 1 0

as a product of elementary matrices of order 4. Find the inverse of A.


15. Prove Corollary (2.3.7).
16. Show that row equivalence is an equivalence relation in Mm×n (F) for any field F.
17. State and prove results analogous to Proposition (2.3.6) and Corollary (2.3.7) about column
equivalent matrices in Mm×n (F).
18. Prove that there are n! permutation matrices of order n over any field F.
19. Let P be a permutation matrix and D a diagonal matrix in Mn (F). Prove that PDP−1 is diagonal.

2.4 ROW REDUCTION


We now come back to the systems of linear equations and their analysis by means of elementary row
operations. We have seen in Section 2.2 that by applying suitable elementary row operations to the
augmented matrix of a system of linear equations, we can arrive at a simpler system from which the
solutions can be read off easily. It was tacitly assumed that any solution of this row-equivalent system
is also a solution of the original system. Recall that two systems of linear equations are equivalent
if they have the same solution set. In other words, we assumed that row-equivalent systems of linear

equations are equivalent. We begin this section by proving this assumption. Refer to Section 2.2,
especially the notations and terminologies introduced beginning with Equations (2.1) and (2.2), and
Definition (2.2.2).

Proposition 2.4.1. Consider the system of linear equations over a field F given by

Ax = b,

where A ∈ Mm×n (F) is the coefficient matrix and b is an m × 1 column vector of scalars from F. Suppose
that a series of elementary row operations on the augmented matrix [A | b] of this system reduces it to [A′ | b′]. Then, the solutions of Ax = b and A′ x = b′ are the same.
Proof. Let C = [A | b] and C′ = [A′ | b′] be the augmented matrices of the two systems. Since C′ is
obtained from C by a series of row operations, it follows that

C′ = E_r E_{r−1} · · · E_2 E_1 C = EC,

where E = E_r E_{r−1} · · · E_2 E_1 is the product of the elementary matrices E1 , E2 , . . . , Er of order


m corresponding to the successive elementary row operations applied to C. As was seen in the last
section, the matrix E, being a product of elementary matrices, is invertible (see Proposition 2.3.5).
Also, note that by virtue of properties of block multiplication, or by straightforward calculation with
the entries of E and C, we have EC = [EA | E b], so that
[EA | Eb] = C′ = [A′ | b′].

Now, suppose that an n × 1 column vector u over F is a solution of the system Ax = b. It means that
the matrix equation Au = b holds. It follows that

A' u = (EA)u = E(Au) = E b = b'

which shows that u is also a solution of the system A' x = b' . Since E is invertible, a similar argument
shows that any solution of A' x = b' is a solution of Ax = b. !
Observe that this proposition also implies that Ax = b is inconsistent, that is, has no solution, if
and only if A' x = b' is inconsistent.

Row Echelon Matrices


We next seek a standard form to which the augmented matrix of a system of equations can be row
reduced so that corresponding equivalent system of equations is simple enough. The row echelon
form of the augmented matrix is one such form. Such forms give rise to simpler systems as row
echelon matrices have rows in step-like fashion (echelon) and in general, look something like
 
1 ∗ ∗ ∗ 0 ∗ ∗ ∗ 0 ∗ ∗ 0 · · · 

 1 ∗ ∗ ∗ 0 ∗ ∗ 0 · · · 
 .
 1 ∗ ∗ 0 · · ·
 0
1···
We introduce some terminologies before we give the formal definition. By a non-zero row (re-
spectively, column) of a matrix we mean a row (respectively, column) in which at least one entry is
non-zero; by a zero row (respectively, column) we mean a row (respectively, column) having all zero
entries. A leading entry of a row means the first non-zero entry from the left of a non-zero row, that is,
the left-most non-zero entry of the row.

Definition 2.4.2. A matrix in Mm×n (F) is said to be a row echelon matrix if the following conditions
are satisfied:
(a) All non-zero rows are above any zero row.
(b) Each leading entry in a non-zero row is in a column to the right of the leading entry in any row
above it.
(c) All the entries in the column below a leading entry are zero.
A row echelon matrix is, further, said to be a reduced row echelon matrix provided it satis-
fies two more conditions:
(d) The leading entry in each non-zero row is 1.
(e) The leading entry is the only non-zero entry in the column containing the leading entry.

The leading entry of a non-zero row of a matrix in an echelon or a reduced echelon form is called
a pivot, the location of a pivot a pivot position and a column containing a pivot a pivot column.
Quite often, a column of a matrix is also referred to as a pivot column if the column corresponds to
a pivot column of any of its row echelon forms.
The difference between a row echelon matrix and a reduced row echelon matrix is that the leading
entry 1 is the only non-zero entry in its column in a row reduced echelon matrix, whereas there may
be non-zero entries above a leading entry in a row echelon matrix. Moreover, the leading entry in a
row echelon matrix may be any non-zero scalar, and not necessarily 1 as in the case of a row reduced
echelon matrix. Thus,
 
  0 1 2 3 0 4
3 2 0 4 0 0 0 −1 0 6 2

0 1 −1 6 2   
  and 0 0 0 0 5 −1
0 0 0 4 −3   
  0 0 0 0 0 0
0 0 0 0 0 
0 0 0 0 0 0
are examples of row echelon matrices, whereas
 
  0 1 0 3 0 4
1 0 0 0 0
 0 0 1 0 0 2

0 1 0 0 2  
  and 0 0 0 0 1 −1
0 0 0 1 −3 0 
  0 0 0 0 0
0 0 0 0 0
0 0 0 0 0 0
are examples of row reduced echelon matrices.
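For readers who like to experiment on a computer, the conditions of Definition (2.4.2) can be checked mechanically. The following Python/NumPy sketch is purely illustrative and not part of the text's development; the helper names (leading_index, is_row_echelon) and the numerical tolerance are our own choices.

import numpy as np

def leading_index(row, tol=1e-12):
    """Column index of the leading (left-most non-zero) entry; None for a zero row."""
    idx = np.where(np.abs(row) > tol)[0]
    return int(idx[0]) if idx.size else None

def is_row_echelon(A, reduced=False, tol=1e-12):
    """Check conditions (a)-(c) of Definition 2.4.2, and also (d)-(e) if reduced=True."""
    A = np.asarray(A, dtype=float)
    prev, seen_zero_row = -1, False
    for i, row in enumerate(A):
        lead = leading_index(row, tol)
        if lead is None:
            seen_zero_row = True                      # only zero rows may follow: (a)
            continue
        if seen_zero_row or lead <= prev:             # conditions (a) and (b)
            return False
        if np.any(np.abs(A[i + 1:, lead]) > tol):     # condition (c)
            return False
        if reduced:
            if abs(A[i, lead] - 1) > tol:             # condition (d): pivot equals 1
                return False
            if np.any(np.abs(np.delete(A[:, lead], i)) > tol):  # condition (e)
                return False
        prev = lead
    return True

M = np.array([[3, 2, 0, 4, 0], [0, 1, -1, 6, 2], [0, 0, 0, 4, -3], [0, 0, 0, 0, 0]])
print(is_row_echelon(M), is_row_echelon(M, reduced=True))      # True False

Applied to the first matrix displayed above, the sketch reports that it is a row echelon matrix but not a reduced one, as expected.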
One of the most crucial facts about matrices is that any matrix over a field can be row reduced to
a row echelon, or even to a reduced row echelon matrix, by appropriate elementary row operations.
This procedure for obtaining a row echelon or a reduced row echelon matrix row equivalent to a given
matrix is known as the row reduction of that matrix, and such row reduced matrices are known as row
echelon forms or the reduced row echelon form of the given matrix. It is clear that any echelon form
of the zero matrix is the zero matrix itself. We now give the algorithm for row reduction of any non-zero matrix.
Algorithm for Row Reduction 2.4.1


Consider any rectangular non-zero matrix over a field.

Step 1: Pick the first non-zero column, that is, the left-most non-zero column. If a non-zero entry
occurs in the first row of this column, go to Step 2; otherwise bring any non-zero entry in this
column to the top row by interchanging the first row and the row containing that non-zero
entry.
Step 2: Add or subtract suitable multiples of the first row (of the new matrix if some interchange has
taken place in Step 1) to other rows to make all the entries, except the first one of the first
non-zero column, zeros.
Step 3: Cross out the first row and the first non-zero column (which has a single non-zero entry at
the first place now) as well as all the zero columns, if any, to the left of the next non-zero
column in the matrix obtained after the first step. Repeat Steps 1 and 2 in the new submatrix
thus obtained. Note that the first column of this new matrix is a non-zero column.
It is clear that continuing this process will reduce the original matrix to a row echelon matrix. Note
that the executions of Steps 1 and 2 for the first time produce the first pivot column in the matrix, and
applying the same steps to the subsequent submatrices will yield the other pivot columns one by one.
To produce the row reduced echelon form of the matrix, we need to carry out the next steps.

Step 4: Multiply each row containing a pivot element by a suitable scalar to make each pivot equal
to 1.
Note that by repeating Step 2, all the entries below each pivot element were already made equal to
zero. The next step will ensure that all the entries above each pivot are made zeros.

Step 5: Beginning with the right-most pivot, make all the entries above each pivot zero by adding or
subtracting suitable multiples of the row containing the pivot to the rows above it.
Since any non-zero entry of a matrix over a field has a multiplicative inverse, it follows that the
preceding algorithm applied to a non-zero matrix will produce its row echelon form as well as its
reduced row echelon form.
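For those who wish to try the algorithm on a machine, the five steps translate into a short program. The following Python/NumPy sketch is only an illustration and not part of the text's development (the function name rref and the tolerance tol are our own choices); for brevity it scales each pivot and clears the entries above it as soon as the pivot is found, combining Steps 4 and 5 with the forward pass, which yields the same reduced row echelon form since that form is unique.

import numpy as np

def rref(A, tol=1e-12):
    """Reduce a copy of A to its reduced row echelon form following Steps 1-5."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    pivot_row = 0
    for col in range(n):                              # scan columns left to right
        rows = np.where(np.abs(R[pivot_row:, col]) > tol)[0]
        if rows.size == 0:                            # no usable entry in this column
            continue
        r = pivot_row + rows[0]
        R[[pivot_row, r]] = R[[r, pivot_row]]         # Step 1: row interchange if needed
        R[pivot_row] /= R[pivot_row, col]             # Step 4: make the pivot equal to 1
        for i in range(m):                            # Steps 2 and 5: clear the column
            if i != pivot_row:
                R[i] -= R[i, col] * R[pivot_row]
        pivot_row += 1
        if pivot_row == m:
            break
    return R

Applied to the augmented matrix of a system of linear equations, rref returns the matrix from which the solutions can be read off as described below.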
Two important remarks about row echelon and row reduced echelon forms of a matrix are in order:

• Row echelon forms of a matrix are not unique. Different sequences of row operations on a given
matrix may produce different echelon matrices. In other words, a given matrix may be row reduced
to a number of different row echelon matrices. For example, both
\begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}
are row echelon forms of \begin{pmatrix} 1 & 2 \\ 1 & 4 \end{pmatrix}.
• Fortunately, the reduced row echelon form of a given matrix is unique. That is, the reduced row
echelon form of a given matrix is independent of the sequence of row operations employed. Taking
the matrix of the preceding remark, we see that \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} is its only reduced row echelon form.
We will give a proof of the uniqueness of the reduced row echelon form of a matrix shortly.
Solutions of Systems of Linear Equations


Our first task is to see how a row echelon or the reduced row echelon form, say [A' | b' ], of the
augmented matrix [A | b] of a system of linear equations Ax = b helps us to find the solutions of the
system. We look at some examples to begin with.
For example, assume that the reduced row echelon form of an augmented matrix looks like

 
/ 01 −1 0 3 2

A '
b = 0
'
0 1 5 4 .

0 0 0 0 1

The corresponding system of equations (which is equivalent to some given system) is clearly the
following one:

x1 − x2 + 3x4 = 2
x3 + 5x4 = 4
0 = 1.

Because the last equation is an absurdity, this system, and therefore the original system it is equivalent
to, has no solution. Thus, we see that if the last column of the reduced matrix is a pivot column,
then the original system of equations is inconsistent. Note that the last column is the reduced column
vector b' of the system Ax = b.
Next, consider the reduced echelon matrix

 
1 0 3 0 −1 0
 
/ 0 0 1 −1 0 2 3
A' b = 
' .
0 0 0 1 1 4
 
0 0 0 0 0 0

The corresponding system of equations will be

x1 + 3x3 − x5 = 0
x2 − x3 + 2x5 = 3
x4 + x5 = 4.

Note that the variables in the pivot columns appear exactly once in the whole system; these are
called the basic variables of the original system of equations. The other variables are called the free
variables. Thus, in the last example, x1 , x2 and x4 are the basic variables, whereas x3 and x5 are the
free variables. We obtain the solutions of the system by assigning arbitrary values to the free variables,
if any, and then expressing the basic variables in terms of these arbitrary constants according to the
equations involving the basic variables. So, for arbitrary scalars a and b, we have

x5 = a
x3 = b
x4 = 4 − a
x2 = 3 − 2a + b
x1 = a − 3b,

which constitute the complete solution of the given system. Note that because of the presence of the
free variables, which can be assigned arbitrary values, there can be no unique solution of the system
of equations.
For every choice of scalars a and b for the variables x3 and x5 , respectively, we have one solution
of the system. The arbitrary choice of the free variables works, because the system of equations does
not put any constraints on the free variables.
Observe that any solution whose components were described above can also be expressed as a sum
of scalar multiples of fixed column vectors. For example,
\begin{pmatrix} a - 3b \\ 3 - 2a + b \\ b \\ 4 - a \\ a \end{pmatrix} = \begin{pmatrix} 0 \\ 3 \\ 0 \\ 4 \\ 0 \end{pmatrix} + a \begin{pmatrix} 1 \\ -2 \\ 0 \\ -1 \\ 1 \end{pmatrix} + b \begin{pmatrix} -3 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}.

This is known as the general solution of the given system in parametric form. The scalars a and b are
the parameters.
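A quick numerical check of this parametric description is possible. In the short Python/NumPy sketch below (illustrative only; the variable names are our own), A_red and b_red are the reduced coefficient matrix and right-hand side of the example above; the particular solution, obtained by taking a = b = 0, must satisfy A_red x = b_red, and the two direction vectors multiplying a and b must satisfy A_red x = 0.

import numpy as np

A_red = np.array([[1., 0, 3, 0, -1],
                  [0, 1, -1, 0, 2],
                  [0, 0, 0, 1, 1]])
b_red = np.array([0., 3, 4])

particular = np.array([0., 3, 0, 4, 0])      # the choice a = b = 0
dir_a = np.array([1., -2, 0, -1, 1])         # vector multiplying the parameter a
dir_b = np.array([-3., 1, 1, 0, 0])          # vector multiplying the parameter b

assert np.allclose(A_red @ particular, b_red)
assert np.allclose(A_red @ dir_a, 0)
assert np.allclose(A_red @ dir_b, 0)

Since every solution is the particular solution plus a combination of the two direction vectors, these three checks confirm the whole family of solutions at once.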
We now summarize the basic facts about the method of finding solutions of a system of equations
using the reduced row echelon form of the augmented matrix of the system of equations.
Theorem 2.4.3. Let [A' | b' ] be the reduced row echelon form of the augmented matrix [A | b] of a
system Ax = b of linear equations over a field F.
(a) If the last column of [A' | b' ] is a pivot column, then the system Ax = b is inconsistent.
(b) If the last column of [A' | b' ] is not a pivot column, then the system does have a solution. In this
case, two situations may arise:
• If every column of A' is a pivot column, then the system Ax = b has a unique solution.
• Otherwise, the solutions of Ax = b are obtained by expressing the basic variables, that is, the
variables corresponding to the pivot columns of A' , in terms of the remaining variables, known
as the free variables. The free variables can be assigned any arbitrary values.

The proof of the theorem is clear and left to the reader.


Recall that it is customary to call a column of an arbitrary matrix a pivot column if it
corresponds to a pivot column of the reduced row echelon form of that matrix. Therefore, it is clear
that the preceding theorem can also be stated in terms of pivot columns of [A | b].
One should note that if one is interested in deciding only the consistency of a given system of
equations, it is sufficient to find a row echelon form of the augmented matrix, and not necessarily the
reduced form. For, if the last column of any row echelon form is a pivot column, then we can conclude
that the system is inconsistent.
The following corollary is clear.

Corollary 2.4.4. A homogeneous system of linear equations Ax = 0 is always consistent. The num-
ber of free variables comprising the general solution of the system is the number of non-pivot columns
of A. Furthermore, this system will have only the zero solution if and only if every column of A is a
pivot column.

Note that while finding solutions of a homogeneous system Ax = 0 of linear equations, there is no
need to form the augmented matrix of the system; one row reduces A directly.
We apply the algorithm of row reduction to the augmented matrix of the following system of equa-
tions over R to find its solutions, if any.

EXAMPLE 5 Consider the system of equations:

3x2 + 9x3 + 6x5 = −3


−2x1 + 4x2 + 6x3 − 4x4 = −2
4x1 − 11x2 − 18x3 + 2x4 − 3x5 = 10.

The augmented matrix to which the algorithm will be applied is


 
 0 3 9 0 6 −3

−2 4 6 −4 0 −2.
 
4 −11 −18 2 −3 10

The very first column being non-zero, we interchange rows 1 and 2 as the first step
so that top-most left position of the matrix is non-zero. We could have interchanged
rows 1 and 3 to achieve the first goal; this will lead to an echelon form probably
different from the one we will be getting. The row equivalent matrix after this first
row operation is
 
−2 4 6 −4 0 −2
 0 
3 9 0 6 −3 R1 ↔ R2 .
 
4 −11 −18 2 −3 10

Note that for convenience, we are indicating briefly the operations to the right of
the matrix.
As the second step, we add twice the 1st row to row 3 so that all the entries in
the first column, except the first one, are zeros. Clearly, while dealing with a bigger
matrix, this step has to be repeated several times to yield the same shape in the matrix.
The new equivalent matrix is
 
−2 4 6 −4 0 −2
 0 
3 9 0 6 −3 R3 → R3 + 2R1.
 
0 −3 −6 −6 −3 6

For the third step, we cover up the first row and the first column, and repeat Steps
1 and 2 on the smaller submatrix. Since the submatrix already has a non-zero entry,
namely 3, in the top left-most position, we can go directly to the 2nd step. Adding
the second row to the third row, we get
 
−2 4 6 −4 0 −2
 0 3 9 0 6 −3 R3 → R3 − R2 .
 
0 0 3 −6 3 3
We have already reached a row echelon form, and as the last column is not a pivot
column, the system does have solutions. To produce the actual solutions, we have to
apply steps 4 and 5 to the last matrix. Step 4 gives us easily
 
1 −2 −3 2 0 1
0 
 1 3 0 2 −1 R1 → −1/2R1, R2 → 1/3R2, R3 → 1/3R3.

0 0 1 −2 1 1
Step 5 is about making all the entries above the pivot entries zero. We begin with
the right-most pivot entry, which is 1 in the 3rd column; adding 3 times the third
row to the first row, and subtracting the same from the second row, we can make all
the entries above this pivot zero:
 
1 −2 0 −4 3 4

0 1 0 6 −1 −4 R1 → R1 + 3R3 , R2 → R2 − 3R3.
 
0 0 1 −2 1 1
Finally, by adding suitable multiples of the second row to the first row, the fol-
lowing reduced row echelon form is produced:
 
1 0 0 8 1 −4

0 1 0 6 −1 −4 R1 → R1 + 2R2 .
 
0 0 1 −2 1 1
Observe that by Theorem (2.4.3), x4 and x5 are the free variables as the fourth and
fifth columns are not pivot columns. Also, note that the last column is not considered
for deciding about free variables; it is the column for the scalars. Now, the system of
equations corresponding to the reduced row echelon matrix is clearly
x1 + 8x4 + x5 = −4
x2 + 6x4 − x5 = −4
x3 − 2x4 + x5 = 1.
which is equivalent to the original system of equations. To write down the solu-
tions, we follow the procedure outlined in Theorem (2.4.3). We first assign arbitrary
constants to the free variables, and then express the basic variables in terms of these
constants according to the last set of equations. Thus, the components of any solution
will be given by
x5 = a
x4 = b
x3 = 1 − a + 2b
x2 = −4 + a − 6b
x1 = −4 − a − 8b.
where a and b are arbitrary reals. Thus, we see that for every choice of reals a and
b, the list (−4 − a − 8b, −4 + a − 6b, 1 − a + 2b, b, a) is a solution of the given sys-
tem of equations. We leave it to the reader to show that the general solution in the
parametric form of the given system of equations can be put in the form
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} -4 \\ -4 \\ 1 \\ 0 \\ 0 \end{pmatrix} + a \begin{pmatrix} -1 \\ 1 \\ -1 \\ 0 \\ 1 \end{pmatrix} + b \begin{pmatrix} -8 \\ -6 \\ 2 \\ 1 \\ 0 \end{pmatrix}.

In the next example, we will have the opportunity to consider inconsistent systems.
EXAMPLE 6 We find the values of a for which the following system of equations over R is incon-
sistent:
x1 + 5x2 − 3x3 = −4
−x1 − 4x2 + x3 = 3
−2x1 − 7x2 = a.
The augmented matrix is
 
 1 5 −3 −4

−1
 −4 1 3.

−2 −7 0 a
Adding the first row to the second row, and 2 times the first row to the third row, we
make the entries other than the pivot one in the 1st column zeros. The row equivalent
matrix now looks like
 
1 5 −3 −4

0 1 −2 −1.
 
0 3 −6 a − 8
Next, we subtract 3 times the second row from the third row to make the entry below the
pivot one in the second column zero. This produces
 
1 5 −3 −4

0 1 −2 −1
 
0 0 0 a−5
which is a row echelon form of the augmented matrix. Its last column is a pivot
column unless a = 5. In other words, if the real number a ≠ 5, then the given system
is inconsistent.
We now prove the uniqueness of the reduced row echelon form of a matrix; the simple proof we
give is due to Thomas Yuster.

Theorem 2.4.5. The reduced row echelon form of any matrix over a field F is unique.

Proof. Let A be an m × n matrix over a field F. We can assume that A is a non-zero matrix. The proof
is by induction on n. If n = 1, there is nothing to prove as (1, 0, 0, . . . , 0)t is the only possible reduced
row echelon form of A. So, we can assume that A has n columns, where n ≥ 2. Let A' be the matrix
obtained from A by deleting the last column. As elementary row operations on a matrix affect it column
wise, therefore any sequence of elementary row operations that reduces A to its reduced row echelon
form also transforms A' to its reduced row echelon form. However, by the induction hypothesis, the
reduced row echelon form of A' is unique. It follows that if B and C are two reduced row echelon forms
of A, then they can differ only in the nth column. Thus, if B ≠ C, then there is some j (1 ≤ j ≤ m)
such that the ( j, n)th entries of B and C are not equal, say, b_{jn} ≠ c_{jn} . Now, let v = (v_1 , v_2 , . . . , v_n )^t
be an n × 1 column vector over F such that Bv = 0. Then, as B and C are row equivalent, Cv = 0, so
(B − C)v = 0. But the first n − 1 columns of B − C are zero columns, so by considering the jth component
of (B − C)v, we can conclude that (b_{jn} − c_{jn})v_n = 0, or that v_n = 0 as b_{jn} ≠ c_{jn} .
Thus, we have shown that in case B ≠ C, any solution x = (x_1 , x_2 , . . . , x_n )^t of the system of
equations Bx = 0 or Cx = 0 must have xn = 0. It follows that xn cannot be a free variable for these
systems of equations, or equivalently, the nth column of both B and C are pivot columns. But the first
n − 1 columns of B and C are identical to the corresponding columns of the reduced row echelon form
of A' . Therefore, the rows containing the pivots in the nth column of B and C cannot be different as
these must correspond to the first zero row of A' . Finally, note that all the entries, except for the pivots,
in the nth columns of B and C are zeros. These two observations imply that B = C contradicting our
assumption about B and C. This establishes the theorem. !

We finally consider a square matrix A of order n whose reduced row echelon form R has no zero
rows. Then every row of R has a pivot. To accommodate these n pivots, each of the n columns of R
must have a pivot. This forces R = In , which proves the following useful result.

Proposition 2.4.6. If the reduced row echelon form R of a matrix A ∈ Mn (F) does not have a zero
row, then R = In .

EXERCISES
1. Determine whether the following statements are true or false giving brief justification. All given
matrices are over an arbitrary field F.
(a) Every matrix has only one row echelon form.
(b) The reduced row echelon form of a non-zero matrix cannot be the zero matrix.
(c) The reduced row echelon form of a non-zero square matrix is the identity matrix.
(d) Any row echelon form of an invertible matrix has to be invertible.
(e) If the reduced row echelon form of the augmented matrix of a system of linear equations has
no zero row, then the system has a unique solution.
(f) If the reduced row echelon form of the augmented matrix of a system of linear equations has
no zero column, then the system has no solution.
(g) If a square matrix A of order 3 has two pivot columns, then the system Ax = 0 has non-zero
solutions.
(h) If [A' | b' ] can be obtained from [A | b] by a finite number of elementary column opera-
tions, then Ax = b and A' x = b' are equivalent.
(i) If the reduced row echelon form R of a square matrix does not have a zero column, then R
must be an identity matrix.
(j) In any row echelon form of a matrix, the pivot in any non-zero row is the only non-zero entry
in that pivot column.
(k) If every column of a row echelon form of a matrix A is a pivot column, then the system of
equations Ax = b for any b has a solution.
(l) A basic variable in a system of equations Ax = b is a variable that corresponds to a pivot
column of A.
(m) The last column of the augmented matrix of a system of equations Ax = b can never be a
pivot column.
(n) If the number of variables in a system of equations Ax = b exceeds the number of equations,
then the system has to be consistent.
2. Determine whether the following matrices over R are in row echelon form; which ones are in
reduced row echelon form?
\begin{pmatrix} 1 & 0 & 0 & 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix},
\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.

3. Transform the following matrices over R to reduced row echelon forms by elementary row
operations:
   
1 2 3 4 1 2 0 0 4
   
2 3 4 5, 2 3 1 0 0,
 
3 4 5 6 0 2 0 1 1

 
0 0 2 1 −1
 
2 4 0 6 8.

0 1 −1 3 −2

What will be the reduced row echelon forms if the matrices are considered over the fields Q
of rational numbers and C of complex numbers?
4. Describe, in each case, all solutions of Ax = 0 over R in parametric form if A is row-equivalent
to
 
/ 0 ' ( 1 0 0 0 3 −1
1 0 −1  
2 −4 6 , , 0 2 −2 4 6 −2.
2 1 1  
0 0 0 2 0 −2

5. Find the parametric solutions of the following homogeneous system of equations:

x1 + 2x2 + x3 = 0,  x1 − x2 + 3x3 = 0;    and    2x1 − x3 = 0,  2x2 − x3 = 0.
6. Determine whether the systems of linear equations over R whose augmented matrices are given
below are consistent; if consistent, find the solutions.
   
1 1 2 1 0 1 2
 ,  ,
2 2 4 2 0 2 3

   
3 −1 2 1  1 −2 −1 3 0
2 1 1 0, −2 4 5 −5 3.
   
 
1 −3 0 2 3 −6 −6 8 2

7. Determine whether the following systems of linear equations over R are consistent by row re-
ducing the augmented matrices; if consistent, find the solutions:
x1 + 4x2 + 2x3 − 3x5 = −1
(a) 2x1 + 9x2 + 5x3 + 2x4 + x5 = 2.
x1 + 3x2 + x3 − 2x4 − 4x5 = 1

2x1 − x2 + 7x3 + 3x4 = 6


(b) x1 + 2x2 + x3 − x4 = −2.
−3x1 − x3 + 5x4 = 10

x1 + x2 = 1
x1 + x2 + x3 = 2
(c) .
x2 + x3 + x4 = 2
x3 + x4 = 1

x1 + x2 = 1
x1 + x2 + x3 = 4
(d) x2 + x3 + x4 = −3.
x3 + x4 + x5 = 2
x4 + x5 = −1
8. In the following system of linear equations, determine the values of the real number a for which
the system is consistent and then determine the solutions for such a:

x1 + x2 + ax3 = 3
x1 + ax2 + x3 = 0.
ax1 + x2 + x3 = 0

9. Prove Corollary (2.4.4).


10. Let A ∈ Mm×n (F). Prove that Ax = 0 has only the zero solution if and only if for any m × 1 matrix
b over F, Ax = b has a unique solution.
11. Consider a system of linear equations over a field F in which there are more variables than
equations. Suppose that F is infinite (e.g., F = R or C). Show that if the system is consistent, then
there will be infinitely many solutions.
12. Give an example of a consistent system of linear equations in which there are more equations
than variables.
13. Consider a consistent system of linear equations Ax = b where A ∈ Mm×n (F) and b is an m × 1
column vector over F. Prove the following:
(a) The homogeneous system Ax = 0 is consistent.
(b) If s is a fixed but arbitrary solution of Ax = b, then any solution of Ax = b can be written as
s + s0 for some solution s0 of Ax = 0.

2.5 INVERTIBLE MATRICES AGAIN


As we have mentioned earlier, row operations have useful applications in areas other than that of
solving systems of linear equations. In this section, we use elementary row operations to analyse
invertible matrices and compute their inverses.
Recall that a homogeneous system Ax = 0 of m linear equations in n variables over a field F (which
means that A ∈ Mm×n (F)) always has the zero solution; in fact, it is the only solution in case every
column of A is a pivot column. We now seek a condition on the size of A under which the system Ax = 0
has a non-zero solution. To do that assume that A has r pivot columns. It is clear that r ≤ min{m, n}. So,
if m < n, then n − r, the number of non-pivot columns of A, is strictly positive. Since non-pivot columns
give rise to free variables, there is at least one free variable in the solution of Ax = 0 (see Corollary
2.4.4). Thus, Ax = 0 will have non-zero solutions as free variables can be assigned any non-zero scalar.
We thus have the following result which will have important consequences later.

Proposition 2.5.1. Let

Ax = 0

be a homogeneous system of m equations in n variables over a field F. If m < n, then the system will
have non-zero solutions.

Let us now consider the system of equations Ax = 0, where the coefficient matrix A is a square
matrix over F. In that case, there is a connection between the invertibility of A and the solvability of
the corresponding system of equations.

Proposition 2.5.2. Let A ∈ Mn (F). Then, A is invertible if and only if the homogeneous system

Ax = 0

of linear equations over F has x = 0 as the only solution.

Proof. Suppose that A is invertible. If an n × 1 column vector v over F satisfies the system of equations
Ax = 0, then multiplying the matrix equation Av = 0 by the inverse A−1 of A, we see that v = 0 which
means that Ax = 0 cannot have a non-zero solution.
Conversely, assume that Ax = 0 does not have any non-zero solution (it always has the zero solution)
over F. Therefore, if R is the reduced row echelon form of A, then the equivalent system Rx = 0
also does not have a non-zero solution. Now, according to Proposition (2.4.6), R is either In or it
has at least one zero row. In the second case, R has to have a column without a pivot forcing free
variables in the system of equations leading to some non-zero solution. Thus, our hypothesis implies
that R = In . Since R is the reduced row echelon form of A, it follows by Proposition (2.3.6) that
there is an invertible matrix E such that R = EA. Hence, A = E −1 In = E −1 showing that A itself is
invertible. !

We can restate the last result as follows.

Corollary 2.5.3. A matrix in Mn (F) is invertible if and only if its reduced row echelon form is the
identity matrix.

The following is an easy consequence whose proof is left to the reader.

Corollary 2.5.4. A matrix in Mn (F) cannot be invertible if it has a zero row or a zero column.

We apply Corollary (2.5.3) to triangular matrices now. Observe that an upper triangular matrix A
(which, by definition, is a square matrix) with all its diagonal elements non-zero is already in row
echelon form, with the n diagonal elements as its n pivots; its reduced row echelon form is therefore In ,
so A is invertible. On the other hand, if some diagonal element of A is zero, say a_{kk} = 0 with k least,
then Ax = 0 has a non-zero solution: set x_j = 0 for j > k and x_k = 1, and determine x_{k-1} , . . . , x_1 by
back substitution, which is possible as a_{ii} ≠ 0 for i < k. Hence, by Proposition (2.5.2), A is not invertible
in this case. Thus A is invertible if and only if each of its diagonal elements is non-zero. To prove a similar
assertion for a lower triangular matrix, we first note that the transpose of such a matrix is upper triangular
with the same diagonal. Since a matrix is invertible if and only if its transpose is, we see that the following
result holds for a lower triangular matrix too.

Corollary 2.5.5. A triangular matrix in Mn (F) is invertible if and only if each of its diagonal ele-
ments is non-zero.

Finally, observe that the proof of Proposition (2.5.2) also contains the following useful fact.

Proposition 2.5.6. If a matrix in Mn (F) is invertible, then it can be expressed as a product of a finite
number of elementary matrices in Mn (F).

A convenient procedure to compute the inverse of an invertible matrix can now be formulated.
Since the reduced row echelon form of an invertible matrix A in Mn (F) is In , there are elementary
matrices E1 , E2 , . . . , Er in Mn (F) such that

E r . . . E 2 E 1 A = In .

It then follows that

A−1 = In A−1 = Er . . . E2 E1 In .

As each elementary matrix corresponds to an elementary row operation, we can interpret the pre-
ceding remark as follows: the elementary row operations which change A to In , when applied in the
same sequence to In , will produce A−1 . In practice, we use the following procedure:
Algorithm for finding Inverse 2.5.1


Let A ∈ Mn (F). Apply elementary row operations to the block matrix [A | In ] so as to reduce A to
its reduced row echelon form. If A is row-equivalent to In , then A is invertible, and [A | In ] is row-
equivalent to [In | A−1 ]. Otherwise A is not invertible.
EXAMPLE 7  We find the inverse of A = \begin{pmatrix} 2 & -2 \\ 4 & 7 \end{pmatrix} if it exists.
We apply elementary row operations to [A | I2 ] so as to bring A to its row reduced
form. The computation is as follows:
[A | I2 ] = \left[\begin{array}{cc|cc} 2 & -2 & 1 & 0 \\ 4 & 7 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{cc|cc} 2 & -2 & 1 & 0 \\ 0 & 11 & -2 & 1 \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & -1 & 1/2 & 0 \\ 0 & 1 & -2/11 & 1/11 \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 7/22 & 1/11 \\ 0 & 1 & -2/11 & 1/11 \end{array}\right].
Since A has been reduced to I2 , according to our Algorithm (2.5), A−1 exists, and
equals the second block in the last matrix.
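The same computation is easy to automate. The Python/NumPy sketch below is an illustration of the idea behind the algorithm rather than a definitive implementation; the function name and the tolerance used to decide whether a pivot is zero are our own choices. It row reduces the block matrix [A | In ] and returns the right-hand block when A turns out to be invertible.

import numpy as np

def inverse_by_row_reduction(A, tol=1e-12):
    """Row reduce [A | I] to [I | inverse of A]; raise an error if A is not invertible."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])                  # the block matrix [A | I]
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if abs(M[pivot, col]) < tol:
            raise ValueError("A is not invertible")
        M[[col, pivot]] = M[[pivot, col]]          # row interchange
        M[col] /= M[col, col]                      # make the pivot equal to 1
        for i in range(n):
            if i != col:
                M[i] -= M[i, col] * M[col]         # clear the rest of the column
    return M[:, n:]                                # the right block is the inverse

A = np.array([[2., -2], [4., 7.]])
print(inverse_by_row_reduction(A))                 # approximately [[7/22, 1/11], [-2/11, 1/11]]

For the matrix of Example 7, the output agrees with the inverse computed above by hand.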
The process of row reduction is useful not only in the practical matter of finding inverses, but also
as a theoretical tool. As an example, we prove that a one-sided inverse of a matrix is necessarily its
inverse, a result we had quoted in Chapter 1 (see Equation 1.12).

Proposition 2.5.7. Let A ∈ Mn (F). If A has a right inverse (respectively, left inverse) B ∈ Mn (F),
that is, if AB = In (respectively, BA = In ), then A is invertible with B as its inverse.

Proof. Suppose that AB = In . Let R be the reduced row echelon form of A, and let E be the invertible
matrix in Mn (F) such that
EA = R. (2.4)
It therefore follows, by left-multiplying the relation AB = In by E, that
RB = E. (2.5)
Now, by Proposition (2.4.6), R is either the identity matrix or has at least one zero row. The second
possibility cannot occur as in that case, the matrix E on the right-hand side of the Equation (2.5) will
also have a zero row. In that case, E cannot be invertible by Corollary (2.5.4). Thus, R has to be the
identity matrix. Therefore, Equation (2.5) reduces to B = E which implies that B itself is an invertible
matrix. Equation (2.4) now, in turn, reduces to BA = In showing that B is the inverse of A. If, on
the other hand, we assume that BA = In , the same argument with B in place of A shows us that B is
invertible with A as its inverse. But that also implies that A is invertible with B as the inverse. !
We end this section with a result that will be needed later.
Proposition 2.5.8. The inverse of an invertible lower triangular matrix in Mn (F) is lower
triangular.

Proof. According to Proposition (2.5.6), the inverse A−1 of an invertible matrix A ∈ Mn (F) is given by
the product A−1 = Ek Ek−1 , . . . , E2 E1 , where E1 , E2 , . . . , Ek are the elementary matrices in Mn (F)
corresponding (in order) to the elementary row operations which, when applied successively to A,
bring it to its reduced row echelon form, the identity matrix. We claim that each Ei is lower triangular.
To prove our claim, we first observe that, as A is lower triangular, no elementary operation of the
second type, that is, no row interchange, is required for the row reduction of A. An elementary matrix
corresponding to an elementary row operation of the first type, that is row scaling by dividing by a non-
zero pivot, is diagonal and so lower triangular. Again as A is lower triangular, any row replacement,
that is, any elementary row operation of the third type, used for the row reduction of A is the addition
of a suitable multiple of some row to a row below it. Therefore, the elementary matrix corresponding
to such a row operation is lower triangular. So our claim follows. Since the product of a finite number
of lower triangular matrices is lower triangular (see Proposition 1.2.4), our claim implies that A−1 is
lower triangular. !
The analogous result for an invertible upper triangular matrix is left as an exercise.

EXERCISES
1. Determine whether the following statements are true or false giving brief justification. All given
matrices are square matrices over an arbitrary field F.
(a) If, for a matrix A of order n, Ax = 0 has a solution, then A is row equivalent to In .
(b) If A is invertible, then Ax = b has a solution for any column vector b.
(c) If Ax = b has a unique solution for a column vector b, then Ax = c has a solution for any
column vector c.
(d) If Ax = b is inconsistent for some column vector b, then the reduced row echelon form of
A must have at least one zero row.
(e) If, for matrices A and B of order n, ABx = 0 only has the zero solution, then A is invertible.
(f) Two invertible matrices of the same order are row equivalent.
(g) If A is invertible, then At x = 0 has only the zero solution.
(h) If all the diagonal entries of an upper triangular matrix A are non-zero, then A is invertible.
(i) Suppose that the scalars a, b, c, d are all non-zero. Then the matrix \begin{pmatrix} a & b \\ c & d \end{pmatrix} is row equivalent to I2 .
(j) If all the entries of a square matrix are non-zero scalars, then the matrix must be invertible.
2. Prove Corollaries (2.5.3) and (2.5.4).
3. Determine whether the following matrices over R are invertible; find the inverses of the invert-
ible ones using Algorithm (2.5). Also, express each invertible matrix as a product of elementary
matrices.
     
' ( 0 0 1 1 1 1 1 1 2
1 2      
, 0 1 0, 1 0 1, 1 1 1,
3 5      
1 0 0 1 2 1 1 2 1
Saikia-Linear Algebra book1 February 25, 2014 0:8

Invertible Matrices Again 81


     
1 2 3 2
  2 2 2 0
 1 0 0 0

2 2 6 4 −1 2 1 0 2 3 0 0
 ,  ,  .
2
 4 6 0  1
 4 1 0 4
 5 6 0
2 4 6 2 0 0 0 2 2 3 6 1

4. Find the inverses of the following matrices over R:

       
1 2 3 1 0 0 1 2 3  1 1/2 1/3
       
0 1 2, 2 1 0, 2 1 2 1/2 1/3 1/4.
   
0 0 1 3 2 1 3 2 1 1/3 1/4 1/5

5. Let a, b be real numbers such that a is non-zero and a ≠ b. Determine whether the following
matrix is invertible over R and if so, find its inverse:

 
a b b
 
a a b.

a a a

6. Determine the inverse of the following matrices in case they are invertible:

   
 4 −1 −1 −1   2 −1 0 0 
 −1 4 −1 −1   −1 0 1 0 
 ,  .
 −1 −1 4 −1   0 1 0 −1 
  
−1 −1 −1 4 0 0 −1 1

7. Find the inverse of

 
 1 −1 1 −1 1 
 0 1 −1 1 −1 
 
A =  0 0 1 −1 1 .

 0 0 0 1 −1 

0 0 0 0 1

Hence find the inverse of the following without any row reduction:

 
 1 0 0 0 0 
 −1 1 0 0 0 
 
B =  1 −1 1 0 0 .

 −1 1 −1 1 0 

1 −1 1 −1 1
8. Let A ∈ Mn (F). Suppose that the column vectors of A are γ1 , γ2 , . . . , γn . Prove that A is
invertible if and only if the only scalars c1 , c2 , . . . , cn for which the relation
\sum_{i=1}^{n} c_i \gamma_i = 0

holds are c1 = c2 = · · · = cn = 0.
9. Let A and B ∈ Mn (F) such that AB is invertible. Prove that both A and B are invertible.
10. Prove that the inverse of an invertible upper triangular matrix in Mn (F) is upper triangular.
11. Let A ∈ Mn (F). Prove that A is invertible if and only if the matrix equation Ax = b has at least
one solution for each n × 1 column vector b over F.
12. The coefficient matrices of the following systems of linear equations over R are invertible. Find
the solutions of the systems of equations by computing the inverses of the coefficient matrices
by elementary row operations.
x1 + 2x2 = 3
2x1 + 3x2 = 1
and
x1 + x2 + x3 + 2x4 = 1
x1 + x2 + 2x3 + 3x4 = 2
x1 + 2x2 + 3x3 + 4x4 = 3
2x1 + 3x2 + 4x3 + 5x4 = 4.

2.6 LU FACTORIZATION
As we have seen in the section on Gaussian elimination, this procedure to solve a system of linear
equations, such as Ax = b, proceeds in two distinct stages. The first, which can be termed forward
elimination, consists of a sequence of elementary row operations on the augmented matrix [A | b],
to eliminate (that is, to make zero) as many entries of A as possible. In the second stage, which is
sometimes known as backward substitution, one uses the reduced row echelon form of the augmented
matrix, produced by operations of the first stage, to obtain the solutions of the original system of
equations.
In practice, especially in many computer programmes dealing with solutions of systems of equa-
tions, one replaces A with a factorization of A into a product of two or more simpler matrices. For
example, in case no row exchange (type two row operation) is needed for the row reduction of A, one
can express A as a product A = LU, where L is a lower triangular matrix with all its diagonal entries
equal to 1, and U a row echelon form of A obtained without any row scaling. In fact, L too can be
obtained as a by-product of the row reduction of A. Such a factorization A = LU is known as an LU
factorization or an LU decomposition of A. If A is an m × n matrix over a field F, then L is of order
m whereas U is an m × n matrix, both over F.

EXAMPLE 8 The following is an example of an LU factorization of a square matrix A of order 3:


    
 1 −1 −2   1 0 0   1 −1 −2 
 1   
0 −1  =  1 1 0   0 1

1 .
    
2 3 2 2 5 1 0 0 1
Once an LU factorization A = LU is obtained, the task of solving the system of
equations Ax = b reduces to that of solving a pair of simpler systems: first Ly = b for
y and then U x = y for x. Since Ax = L(U x), the solution x of U x = y is the required
solution of Ax = b. The matrix equations Ly = b and U x = y are easier to solve as L
is lower triangular with 1 on the diagonal and U is almost upper triangular. For a
square matrix A, U is, in fact, upper triangular.

EXAMPLE 9 As an example, we solve the system of equations Ax = b over R given explicitly as


    
 1 −1 −2   x1   2 
 1    
0 −1   x2  =  −1  ,
    
2 3 2 x3 1

using the LU factorization of the coefficient matrix A given earlier. We first solve
Ly = b, using the triangular factor L, that is, solve
    
 1 0 0   y1   2 
     
 1 1 0   y2  =  −1  ,
2 5 1 y3 1

which is equivalent to the system

y1 = 2
y1 + y2 = −1 .
2y1 + 5y2 + y3 = 1

It is an easy matter to solve this system by forward substitution as L is a lower


triangular matrix with 1 along the diagonal:
 
 2 
 
y =  −3  .
 
12

We next solve U x = y, that is,
\begin{pmatrix} 1 & -1 & -2 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ -3 \\ 12 \end{pmatrix},

which gives the system of equations

x1 − x2 − 2x3 = 2
x2 + x3 = −3 .
x3 = 12

Backward substitution then yields the solution x = (x1 , x2 , x3 )t of the original system
of equations:

x3 = 12
x2 = −15 .
x1 = 11
We remark that in this example, U is a square matrix of order 3 as A is so; moreover, it turns out
that U is upper triangular with non-zero entries along the diagonal as A is invertible.
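The two triangular solves in this example are just forward and backward substitution, and they are simple to program. The following Python/NumPy sketch is illustrative only (the function name is ours); it assumes, as in the example, that L is lower triangular with 1 along the diagonal and that U is upper triangular with non-zero diagonal entries.

import numpy as np

def solve_from_lu(L, U, b):
    """Solve Ax = b given A = LU: Ly = b by forward substitution, then Ux = y backward."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                               # forward substitution
        y[i] = b[i] - L[i, :i] @ y[:i]               # the diagonal of L is 1
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                   # backward substitution
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[1., 0, 0], [1, 1, 0], [2, 5, 1]])
U = np.array([[1., -1, -2], [0, 1, 1], [0, 0, 1]])
b = np.array([2., -1, 1])
print(solve_from_lu(L, U, b))                        # x1 = 11, x2 = -15, x3 = 12

With the factors of Example 9, the sketch reproduces the solution x = (11, −15, 12)t obtained above.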
Existence of LU Factorizations
We now verify that an LU factorization of any m × n matrix A over a field F can be obtained provided
no row interchange is required for row reducing A into some row echelon form. If no row scaling is
performed (that is, no division of a row by the pivot contained in that row) in the row reduction of
A, then only the elementary row operations of the third type (row replacements) are needed to bring
A to some row echelon form. Now observe that as these row replacements are always additions of
suitable multiples of a row to rows below it, the corresponding elementary matrices are necessarily
lower triangular. Thus, the process of row reduction of A can be described as follows: there are lower
triangular matrices E1 , E2 , . . . , Ek , all of order m over F, such that

Ek Ek−1 · · · E2 E1 A = U, (2.6)

where U is an m × n matrix over F in row echelon form. Note: as no elementary operation of type one
is used, that is, no row scaling is resorted to in order to make the pivots of U equal to 1, each Ei has
all its diagonal entries equal to 1. Thus, each Ei is invertible by Corollary (2.5.5). Setting

L = E1 −1 E2 −1 . . . Ek −1 = (Ek Ek−1 . . . E1 )−1 (2.7)

we see, because of Equation (2.6), that A = LU. Moreover, by Proposition (2.5.8), each Ei −1
is lower triangular with all its diagonal elements equal to 1; hence, by Proposition (1.2.4), L, being a
product of such Ei −1 , is itself a lower triangular matrix with all its diagonal elements equal to 1. This
completes the verification of the following result.

Proposition 2.6.1. Let F be a field and A ∈ Mm×n (F). Assume that A can be row reduced to a matrix
U in row echelon form without using any row interchange, that is, without using any elementary row
operation of the second type. Then there is a lower triangular matrix L of order m over F such that

A = LU.

Moreover, if no row scaling is done to make the pivots in U equal to 1, then all the diagonal entries of
L are equal to 1.

Why are LU factorizations important in applications? First of all, because of increasingly powerful
computing powers available these days, solving very large systems of linear equations and similar
large scale computations are routinely performed. Several computer programmes designed for such
computations rely on LU factorizations of matrices. It is because such factorizations are relatively
more efficient as well as reliable (for example, in minimizing round-off errors which are inherent in
computer programmes). Secondly, in many technical and business applications, series of systems of
linear equations Ax = bi with the same coefficient matrix A but with different bi are needed to be
solved. The advantage of an LU factorization in such situations is clear: one has to compute A = LU
only once. Forward substitutions with the same L for the different bi and subsequently backward
substitution with the same U then solve the systems one by one. Finally, another advantage of the LU
factorization is the ease with which L can be constructed; the very process of row reduction of A to
the echelon form U provides all the input for L.
Construction of L
We now explain the precise manner in which entries of L are to be determined by analysing the row
reduction of an arbitrary m × n matrix A (over a field F); the only assumption we are making is that
at no stage of the reduction, any row interchange is required. The row reduction begins by making
all the entries of the first column of A, below the first entry, zeros one by one, by adding suitable
multiples of the first row to the rows below it. Now from our discussions of elementary matrices in
Section 2.3, we know that if one has to subtract l j1 times the first row of A from the jth row (or,
equivalently, adding −l j1 times the first row to the jth row) to eliminate the ( j, 1)th entry of A, then
the corresponding elementary matrix of order m, say E j1 , is given by E j1 = Im − l j1 e j1 ; here Im is the
identity matrix of order m and e j1 denotes the unit matrix of order m having 1 at the ( j, 1)th position
and zeros elsewhere. We set, for convenience,

E1 = Em1 Em−1,1 . . . E21


= (Im − lm1 em1 )(Im − lm−1,1 em−1,1 ) . . . (Im − l21 e21 )

It is clear that A reduces to the matrix E1 A after all the elementary row operations dealing with the first
column of A are carried out. Now, from the proof of Proposition (2.3.5), we know that E −1 j1 = Im +l j1 e j1 .
It follows that

E_1^{-1} = E_{21}^{-1} E_{31}^{-1} \cdots E_{m1}^{-1} = \prod_{j>1} (I_m + l_{j1} e_{j1})

To get a more explicit description of E1−1 , first note that E_{21}^{-1} E_{31}^{-1} = (I_m + l_{21} e_{21})(I_m + l_{31} e_{31}) = I_m +
l_{21} e_{21} + l_{31} e_{31} by the rules governing the multiplications of unit matrices (see Proposition 1.3.7 of
Chapter 1). Continuing the multiplications of the factors in E1−1 in this way, we finally arrive at
E_1^{-1} = I_m + \sum_{j>1} l_{j1} e_{j1}, \qquad (2.8)

which, when written out explicitly, yields the following:


 
 1 0 0 . . . 0 
 l 
 21 1 0 . . . 0 
 l31 0 1 . . 0 


E1−1 =  . . . . . .
 
 . . . . . 
 . . . . . 
 
lm1 0 0 . . . 1

Note: E1−1 is a lower triangular matrix with 1 along the diagonal; most interestingly, the ( j, 1)th entry
of this matrix, for j ≥ 2, is precisely the multiplier l j1 used to make the ( j, 1)th entry of A zero. Also
all the entries of E1−1 , other than those along the diagonal and in the first column, are zeros.
The row reduction of A next moves to the next pivot column of E1 A; this pivot, whatever column it
may be in, has to be in the second row. This stage of the row reduction consists of subtracting multiples
of the second row by suitable multipliers, from each of the rows below it, one by one, so as to make
all the entries below the second pivot zeros. As in the preceding case, if we denote the multiplier, used
to make the entry in the jth row below the pivot zero, as l j2 then the elementary matrix corresponding
to the replacement of the jth row will be given by E j2 = I − l j2 e j2 . Therefore, the row replacements
associated with the second pivot reduces E1 A to E2 E1 A, where

E2 = Em2 Em−1,2 . . . E32


= (Im − lm2 em2 )(Im − lm−1,2 em−1,2 ) . . . (Im − l32 e32 )

As with the case for E1 , we then have
E_2^{-1} = E_{32}^{-1} \cdots E_{m2}^{-1} = \prod_{j>2} (I_m + l_{j2} e_{j2}) = I_m + \sum_{j>2} l_{j2} e_{j2}.

Explicitly,
 
 1 0 0 . . . . 0 
 0 1 0 . . . . 0 
 
 0 l32 1 . . . . 0 
 
 0 l42 0 . . . . 0 
E2−1 =  .
 . . . . . 
 . . . . . 
 

 . . . . . 
0 lm2 0 . . . . 1
Observe that
E_1^{-1} E_2^{-1} = \Bigl(I_m + \sum_{j>1} l_{j1} e_{j1}\Bigr)\Bigl(I_m + \sum_{j>2} l_{j2} e_{j2}\Bigr) = I_m + \sum_{j>1} l_{j1} e_{j1} + \sum_{j>2} l_{j2} e_{j2}
as e_{j1} e_{k2} is the zero matrix if k > 2. This shows that


 
 1 0 0 . . . . 0 
 l21 1 0 . . . . 0 

 
 l31 l32 1 . . . . 0 
 l l42 0 . . . . 0 
E1−1 E2−1 =  41 .
 . . . . . 
 . . . . . 
 
 . . . . . 
 
lm1 lm2 0 . . . . 1
Continuing in this manner, we see that if A has t pivots (t ≤ m), then the complete process of row
reduction of A can be effected by left-multiplying A by E t Et−1 . . . E1 with Ek given by

Ek = (Im − lmk emk )(Im − lm−1,k em−1,k ) . . . (Im − lk+1,k ek+1,k ), (2.9)
where l jk (for j > k) is the multiplier of the kth row used for row replacement of the jth row for making
the entry in the jth row below the kth pivot zero. If U is the row echelon matrix thus obtained from A,
it is clear that

Et Et−1 . . . E1 A = U.

Each Ek , being a product of elementary matrices, is invertible. Hence, the factorization A = LU holds
with
L = E1−1 E2−1 . . . Et−1 .
On the other hand, from Equation (2.9), it follows that
Ek−1 = (Im + lk+1,k ek+1,k )(Im + lk+2,k ek+2,k ) . . . (Im + lm,k em,k ).
Therefore,
L = \prod_{j>1} (I_m + l_{j1} e_{j1}) \prod_{j>2} (I_m + l_{j2} e_{j2}) \cdots \prod_{j>t} (I_m + l_{jt} e_{jt}) = I_m + \sum_{j>1} l_{j1} e_{j1} + \sum_{j>2} l_{j2} e_{j2} + \cdots + \sum_{j>t} l_{jt} e_{jt}
as e_{jk} e_{lq} is the zero matrix if k < l.


From this expression of L, we can give an explicit description of L. L is clearly a lower triangular
matrix of order m with 1 along the diagonal; moreover, the entries in its kth column (1 ≤ k ≤ t), below
the diagonal entry are the multipliers lk+1,k , lk+2,k , . . . , lmk . In case the number of pivots t < m then the
last m − t columns of L will be the same as the corresponding columns of the identity matrix of order
m. However, in case t = m, the last pivot of A occurs in the last row of A. Since there is no need for
row replacement in this case, we may take Et = Em = I, the identity matrix of order m. It follows that
the lower triangular matrix L of order m has all entries in the last column equal to zero except for the
bottommost one, which is the diagonal element 1.
This description of L allows us to write down L at the same time when row replacements are being
used to bring A to its row echelon form U. All we have to do is to keep track of the multipliers. We
start with the identity matrix of order m (if A is m × n) and place the multipliers associated with the
row replacements used to make the entries below the kth pivot in the kth column below the diagonal
entry 1 in the order indicated in the preceding description of L. If the number t of pivots is less than m,
the process will create entries below the diagonal in the first t columns of the identity matrix, resulting
in L. If t = m, all the positions below the diagonal will be filled up with the multipliers while the last
column of the identity matrix is left intact.
We present a couple of examples to illustrate the construction of L according to the method we have
just outlined. For convenience, as practised earlier, a row replacement in which we subtract l times the
ith row from the jth row, will be denoted by
R j → R j − lRi .

EXAMPLE 10 We go back to the matrices of Example 8 to show how row reduction of


 
1 −1 −2
 
A = 1 0 −1
 
2 3 2
helps us to construct L as given in the example. The first stage of row reduction of A
consists of the row replacements R2 → R2 − R1 and R3 → R3 − 2R1 , which reduce
A to  
1 −1 −2
 
A1 = 0 1 1.
 
0 5 6
Thus, the multipliers used to deal with the first column of A are respectively, 1 and
2. These will then fill the first column of L below the diagonal. Moving on to the
second pivot in A1 , which is the first non-zero entry in the second row, we see that
the row replacement R3 → R3 − 5R2 makes the entry below the second pivot zero.
This operation changes A1 to
 
1 −1 −2
A2 = 0 1 1.
 
0 0 1
So the second column of L has the multiplier 5 below the diagonal entry 1. Since A2
is already in a row echelon form, we take U = A2 . The corresponding L is therefore
given by
 
1 0 0
 
L = 1 1 0.
 
2 5 1
Note: U is an upper triangular matrix. It is left to the reader to check that A = LU.
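The bookkeeping just described is easy to mechanize. The Python/NumPy sketch below is an illustration under the section's standing assumption that no row interchange is ever needed (the function name and the tolerance are our own choices): it performs the row replacements on a copy of A and records each multiplier directly into L as it goes.

import numpy as np

def lu_no_pivoting(A, tol=1e-12):
    """Compute A = LU by recording the multiplier of every row replacement.
    Assumes no row interchange is required at any stage."""
    U = np.array(A, dtype=float)
    m, n = U.shape
    L = np.eye(m)
    row = 0
    for col in range(n):
        if row == m:
            break
        if abs(U[row, col]) < tol:
            if np.any(np.abs(U[row + 1:, col]) > tol):
                raise ValueError("a row interchange would be required here")
            continue                                 # no pivot in this column
        for j in range(row + 1, m):
            l = U[j, col] / U[row, col]              # the multiplier for row j
            U[j] -= l * U[row]                       # R_j -> R_j - l * R_row
            L[j, row] = l                            # store it below the diagonal of L
        row += 1
    return L, U

A = np.array([[1., -1, -2], [1, 0, -1], [2, 3, 2]])
L, U = lu_no_pivoting(A)
print(np.allclose(L @ U, A))                         # True

Applied to the matrix of this example, the sketch returns exactly the factors L and U displayed above, and the final line confirms that A = LU.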
EXAMPLE 11 We determine an LU factorization of the following matrix
 
 2 1 −1 0 −2

 4 2 1 1 0
A =  .
−2 −1 4 1 8
6 3 −3 0 −10
The row replacements R2 → R2 − 2R1 , R3 → R3 + R1 and R4 → R4 − 3R1 give us zeros
below the first pivot 2 of A, and reduce A to
 
2 1 −1 0 −2
0 0 3 1 4
A1 =  .
0 0 3 1 6
0 0 0 0 −4
Thus, the multipliers which will fill the first column of L below the diagonal entry
1 are respectively 2, −1 and 3 (note that as the second row operation was adding
R1 to R3 , the corresponding multiplier is −1). The next pivot is 3 appearing in the
second row of A1 but in the third column. We need a single row replacement, namely
R3 → R3 − R2 , to make the entries below this pivot zeros. This reduces A1 to
 
2 1 −1 0 −2
0 0 3 1 4
A2 =  .
0 0 0 0 2
0 0 0 0 −4
So the multipliers which will fill the second column of L are respectively 1 and 0.
Finally, R4 → R4 + 2R3 reduces A2 to a row echelon form of A:
 
2 1 −1 0 −2
0 0 3 1 4
U =  .
0 0 0 0 2
0 0 0 0 0
So the multiplier to fill in the third column of L is −2. Note: as L has to be matrix of
order 4 and there are only three pivots of A, the last column of L is the fourth column
of I4 . Hence, A = LU, where
 
 1 0 0 0

 2 1 0 0
L =  .
−1 1 1 0
3 0 −2 1
Continuing with the example, it can be easily shown that U can be further factored as
U = DV, where D is a diagonal matrix of order 4 and V obtained from U by replacing
each pivot of U by 1. To verify that such a factorization is possible, we first observe
that V can be obtained from U by performing the elementary row operations of divid-
ing out each row of U containing a pivot by the pivot itself. If we denote a diagonal
matrix having a1 , a2 , . . . , an as diagonal elements by diag[a1, a2 , . . . , an ], then it
is clear that the three elementary matrices corresponding to these row operations on
U which make the pivots equal to 1 are E1 = diag[1/2, 1, 1, 1], E2 = diag[1, 1/3, 1, 1]
and E3 = diag[1, 1, 1/2, 1]. Thus we see that E3 E2 E1 U = V, which implies that
U = E1−1 E2−1 E3−1 V. But as we have seen in our discussion about inverses of elemen-
tary matrices in Section 2.3, E1−1 = diag[2, 1, 1, 1], E2−1 = diag[1, 3, 1, 1] and E3−1 =
diag[1, 1, 2, 1] so their product equals the diagonal matrix D = diag[2, 3, 2, 1].
Thus, we obtain the factorization
U = DV = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1/2 & -1/2 & 0 & -1 \\ 0 & 0 & 1 & 1/3 & 4/3 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.

This example illustrates the following: suppose that an m × n matrix A has an LU factorization
A = LU. Construct a diagonal matrix D of order m by placing the pivots of U along the diagonal
in the same order in which they appear in U (from left to right) and, if necessary, filling the rest of
the diagonal by 1’s. Then we have another factorization A = LDV, where V is obtained from U by
replacing each of its pivots by 1. The easy verification is left to the reader as an exercise.
Any factorization of a matrix A ∈ Mm×n (F) in the form

A = LDV,

where L is a lower triangular matrix of order m with 1 along the diagonal, D a diagonal matrix of order
m and V an m × n matrix all of whose pivots are equal to 1, is called an LDV factorization or an LDV
decomposition of A.
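Under these assumptions the passage from U to the pair D, V is mechanical, and a reader may find the following Python/NumPy sketch useful (illustrative only; the helper name is ours): each row of U containing a pivot is divided by that pivot, the pivots are collected into D, and the remaining diagonal positions of D are filled with 1's.

import numpy as np

def dv_from_u(U, tol=1e-12):
    """Split a row echelon matrix U as U = DV, with the pivots in D and pivots 1 in V."""
    U = np.asarray(U, dtype=float)
    m = U.shape[0]
    d = np.ones(m)                                   # diagonal of D, padded with 1's
    V = U.copy()
    for i in range(m):
        nonzero = np.where(np.abs(U[i]) > tol)[0]
        if nonzero.size:                             # row i contains a pivot
            d[i] = U[i, nonzero[0]]
            V[i] /= d[i]
    return np.diag(d), V

U = np.array([[2., 1, -1, 0, -2],
              [0, 0, 3, 1, 4],
              [0, 0, 0, 0, 2],
              [0, 0, 0, 0, 0]])
D, V = dv_from_u(U)
print(np.allclose(D @ V, U))                         # True; here D = diag(2, 3, 2, 1)

For the echelon form U of Example 11, the sketch reproduces the matrices D and V obtained there, so that A = LDV with the same L as in the LU factorization.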
Computing Inverses by LU Factorization


One important application of an LU factorization provides an efficient method to work out the in-
verse of an invertible matrix A ∈ Mn (F). In fact, because of the efficiency, software programmes,
such as MATLAB, use this method to compute inverses. The procedure is simple. If A = LU, then
A−1 = U −1 L−1 , where U −1 and L−1 are still upper and lower triangular matrices respectively (see
Proposition 2.5.8) and so computation of A−1 as the product U −1 L−1 is relatively simple. Further
simplification is possible. Since A is invertible, it must have n pivots, that is, the diagonal entries of
the echelon form U, being the pivots, are non-zero. Therefore, the corresponding LDV factorization
A = LDV is such that D is a diagonal matrix of order n with the pivots along the diagonal, and L and V
are, respectively, lower and upper triangular matrices of order n with their diagonal entries all equal
to 1. It follows that the computation of V −1 and L−1 and hence of A−1 = V −1 D−1 L−1 , by row reduction
algorithm, is even simpler.

EXAMPLE 12 An easy calculation shows that for the matrix

 
 2 −3 −1
 
A =  6 0 6
 
−4 6 5

one has an LU factorization given by

    
 2 −3 −1  1 0 0 2 −3 −1
    
 6 0 6 =  3 1 0 0 9 9,
   
−4 6 5 −2 0 1 0 0 3

Hence an LDV factorization of A is


   
 1 0 0 2 0 0 1 −3/2 −1/2
   
 3 1 0 0 9 0 0 1 1 .
  
−2 0 1 0 0 3 0 0 1

The row reduction method of finding inverses readily gives us, in fact, in two steps
in both cases, that

 −1  
 1 0 0  1 0 0
   
 3 1 0 = −3 1 0
  
−2 0 1 2 0 1

and

 −1  
1 −3/2 −1/2 1 3/2 −1
   
0 1 1  = 0 1 −1.
  
0 0 1 0 0 1
We conclude that

    A−1 = [1  3/2  −1] [1/2   0    0 ] [ 1  0  0]
          [0   1   −1] [ 0   1/9   0 ] [−3  1  0]
          [0   0    1] [ 0    0   1/3] [ 2  0  1]

        = [−2/3  1/6  −1/3]
          [ −1   1/9  −1/3].
          [ 2/3   0    1/3]

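The same inverse can be checked numerically. The following sketch (SciPy assumed) factors A once with lu_factor and reuses the factorization for every column of the identity; SciPy's routine uses partial pivoting, so its internal factors may differ from the ones computed by hand above, but the inverse it produces is the same.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    A = np.array([[ 2., -3, -1],
                  [ 6.,  0,  6],
                  [-4.,  6,  5]])
    lu, piv = lu_factor(A)                    # one LU factorization ...
    A_inv = lu_solve((lu, piv), np.eye(3))    # ... reused for each column of the identity
    print(np.round(A_inv, 4))                 # agrees with the inverse computed above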
One interesting aspect of an LDV factorization of an invertible matrix A = LDV is that such a
factorization is unique, a fact which we shall shortly prove. However, for a singular matrix, we cannot
hope for uniqueness as the following example shows.
EXAMPLE 13 The LU factorization

    A = [2   1  −1]   [1   0  0] [2  1  −1]
        [4   4  −1] = [2   1  0] [0  2   1]
        [2  −1  −2]   [1  −1  1] [0  0   0]

of the real matrix A shows that for any real number x,

    A = [1   0  0] [2  0  0] [1  1/2  −1/2]
        [2   1  0] [0  2  0] [0   1    1/2].
        [1  −1  1] [0  0  x] [0   0     0 ]

Thus, A has infinitely many LDV factorizations.

Proposition 2.6.2. Let A ∈ Mn (F) be an invertible matrix such that no row interchange is required
for its row reduction. Then the LDV factorization of A is unique.

Proof. If possible, let A = L1 D1 V1 = L2 D2 V2 be two factorizations, where L1 , L2 and V1 , V2 are,
respectively, lower and upper triangular matrices of order n, having all their diagonal entries equal to 1,
and D1 , D2 diagonal matrices of order n. By Proposition (2.5.5), each of L1 , L2 , V1 and V2 is invertible
and so (as A is invertible) D1 and D2 are also invertible. Now, left-multiplying L1 D1 V1 = L2 D2 V2 by
L2−1 and right-multiplying by (D1 V1 )−1 = V1−1 D1−1 , one obtains

    L2−1 L1 = D2 V2 V1−1 D1−1 .    (2.10)

Note: Recall that (see Proposition 2.5.8) the inverse of a lower triangular (respectively, an upper trian-
gular) matrix is lower triangular (respectively, upper triangular). Moreover, by Proposition (1.2.4), a
product of lower triangular (respectively, upper triangular) matrices is lower triangular (respectively,
upper triangular). It follows that the left hand side of Equation (2.10) represents a lower triangular ma-
trix of order n whereas the right hand side is an upper triangular matrix of order n (considering D1 , D2
as upper triangular). Therefore, Equation (2.10) can hold only if both sides of the equation represent
the same diagonal matrix, say D, of order n. Since both L2−1 and L1 are lower triangular matrices of
order n, each having all diagonal entries equal to 1, by Proposition (1.2.4) again, the diagonal matrix
D = L2−1 L1 must have all its diagonal entries equal to 1, and so D = In . Thus, we may conclude that
L2−1 L1 = In and D2 V2 V1−1 D1−1 = In . The first implies that L1 = L2 while the second that D2 V2 = D1 V1 ,
which can be rewritten as V2 V1−1 = D2−1 D1 . As the left hand side of the last equation is an upper trian-
gular matrix of order n with all the diagonal entries equal to 1, whereas the right hand side is a diagonal
matrix of order n, we deduce that V2 V1−1 = In = D2−1 D1 . Hence, V1 = V2 and D1 = D2 , completing the
proof of the proposition. !

As a consequence of this uniqueness, the factors L and V of the LDV factorization of a symmetric
invertible matrix reflect the symmetry in a beautiful way.

Proposition 2.6.3. Let A ∈ Mn (F) be an invertible symmetric matrix such that its row reduction
requires no row interchange. If A = LDV is the unique LDV factorization of A, then V = Lt .

Proof. Since (XY)t = Y t X t for any X, Y ∈ Mn (F), and At = A as A is symmetric, it follows, from the
LDV factorization, that

A = At = (LDV)t = V t DLt ,

as D is diagonal. Observe that V t is a lower triangular matrix with all diagonal entries equal to 1
(taking transpose does not alter the diagonal) and Lt an upper triangular with 1 along the diagonal.
So, comparing this factorization of A with A = LDV, we conclude, by the uniqueness of the LDV
factorization, that L = V t and, equivalently, V = Lt . !
EXAMPLE 14 The elementary row operations R2 → R2 − 2R1 , R3 → R3 − 3R1 and R3 → R3 − R2 ,
applied successively, to the real symmetric matrix
 
1 2 3
 
A = 2 6 8
 
3 8 10

produces an echelon form


 
1 2 3
 
0 2 2,

0 0 −1

which tells us that


 
1 2 3
 
V = 0 1 1
 
0 0 1

and that the pivots, from the left to the right, are 1, 2 and −1. Since A is symmetric as
well as invertible (what are the pivots?), L = V t and we obtain the LDV factorization
of A as
   
1 0 0 1 0 0 1 2 3
   
A = 2 1 0 0 2 0 0 1 1.
   
3 1 1 0 0 −1 0 0 1
Permuted LU Factorization
We finally consider modifications in the procedure for obtaining an LU factorization of a matrix which
needs row interchanges for its row reduction. In practice, one cannot avoid such matrices; moreover, for
minimizing round-off errors in computations with matrices, row interchanges are necessary. In the
following example, we illustrate a permuted LU factorization of a matrix whose row reduction without
row interchange does not yield an LU factorization.

EXAMPLE 15 The elementary row operations R2 → R2 + 2R1 and R3 → R3 − 3R1 reduce

    A = [ 1   2   3]
        [−2  −4  −2]
        [ 3   9   4]

to

    [1  2   3]
    [0  0   4],
    [0  3  −5]

which is not in the form required for an LU factorization. However, it is clear that
the row interchange R2 ↔ R3 brings the preceding matrix to the required form. Now
if, instead of A, we start with the matrix obtained from A by applying the same row
interchange to A, that is, with

    P23 A = [ 1   2   3]
            [ 3   9   4],
            [−2  −4  −2]

where P23 is the elementary permutation matrix (see the subsection of Section 2.3 for a
discussion of permutation matrices) corresponding to the row interchange R2 ↔ R3 ,
then one easily verifies that P23 A has an LU factorization given by

    P23 A = [ 1  0  0] [1  2   3]
            [ 3  1  0] [0  3  −5].
            [−2  0  1] [0  0   4]
In general, suppose that A ∈ Mm×n (F) needs row interchanges, at various stages of its row reduc-
tion, to arrive at an echelon form. To produce a permuted LU factorization of A, one first ap-
plies all the required row interchanges to A before any other elementary row operation. Now re-
call that the elementary matrices corresponding to row interchanges are also permutation matrices.
Thus, the matrix that A changes to because of all the row exchanges can also be obtained by left-
multiplying A by a sequence of permutation matrices of order m. By Proposition (2.3.9), the product
of these permutation matrices is a permutation matrix, say P, of order m. It is clear that the
matrix PA does not need any row interchange for its row reduction and so PA has an LU factorization.
How does one use a permuted LU factorization of A to solve a system of equations Ax = b? If
PA = LU is such a factorization, then one first solves Ly = Pb for y by forward substitution followed
by solving U x = y for x by backward substitution; since P is a matrix of order m, the product Pb
is still an m-dimensional column vector. Observe that we are actually working with the augmented
matrix [PA|Pb]. This augmented matrix corresponds to a system of equations obtained by permuting
the equations in Ax = b according to the permutation described by P. So the systems Ax = b and
PAx = Pb are equivalent in the sense that they have the same solution sets.
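A rough sketch of this procedure in Python (SciPy assumed). Note that scipy.linalg.lu returns factors with A = pLU, so P = p^t gives the convention PA = LU used here, and SciPy's partial pivoting may pick different interchanges than the single one of Example 15; the right-hand side b below is arbitrary, chosen only for illustration.

    import numpy as np
    from scipy.linalg import lu, solve_triangular

    A = np.array([[ 1.,  2,  3],
                  [-2., -4, -2],
                  [ 3.,  9,  4]])
    b = np.array([1., 0., 2.])

    p, L, U = lu(A)                               # A = p L U
    P = p.T                                       # so that P A = L U
    y = solve_triangular(L, P @ b, lower=True)    # forward substitution: L y = P b
    x = solve_triangular(U, y, lower=False)       # backward substitution: U x = y
    print(np.allclose(A @ x, b))                  # True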
EXERCISES

1. Determine whether the following assertions are true or false giving brief justifications. All
matrices are over an arbitrary field F unless otherwise specified.
(a) Any matrix which requires no row interchange for its row reduction has a unique LU fac-
torization.
(b) If Pi j and P jk (k ≠ i) are elementary permutation matrices of the same order, then Pi j P jk =
P jk Pi j .
(c) If Pi j and Pkl are elementary permutation matrices of the same order such that none of i
and j equals any of k and l, then Pi j Pkl = Pkl Pi j .
(d) Any invertible matrix has an LU factorization.
(e) If the system of equations Ax = b has a solution, then A has an LU factorization.
(f) If an invertible matrix A has an LU factorization, then so does A−1 .
(g) If a square matrix A has no LU factorization, then neither does AB for any non-zero
matrix B of the same order.
(h) If A = LU is an LU factorization, then L is invertible.
(i) If A = LU is an LU factorization of a square matrix A, then U is invertible.
(j) A real symmetric matrix has an LDV factorization.
2. Find the LU factorizations of the following real matrices:

   
2 −1 2  5 2 1
   
4 0 7, −10 −2 3
 
6 −5 7 15 2 −3

3. Determine whether the following real matrix has an LU factorization:

 
0 2 −6 −2 −4
 
0 −1 3 3 2.

0 −1 3 7 10

4. Find LU and LDV factorizations of


 
2 2 0
 
2 5 3.

0 3 6

5. Find LU factorizations of the following matrices and determine their inverses if they exist:

   
 1 2 1 1 2 1
   
−1 1 2, 1 0 1
 
1 0 1 1 1 2
6. Find an LU factorization of
 
1 −1 −1 −1

1 2 −2 −2
A =  .
1 2 3 −3
1 2 3 4

Does the LU factorization of A indicate that A is invertible? If so, find the inverse of A.
7. Find an LU factorization of
 
2 1 1 1

1 2 1 1
A =  .
1 1 2 1
1 1 1 2

Use the factorization to solve the systems of equations Ax = b for


     
 1   −1   0 
 −1   1   1 
b =   , b =   , b =   .
 −1   −1   −1 
1 1 1

8. Use an LU factorization of the tridiagonal matrix


 
 1 −1 0 0 0

−1 2 −1 0 0
 
A =  0 −1 2 −1 0

 0 0 −1 2 −1

0 0 0 −1 2

to solve the system of equations Ax = b for


 
 1 
 −1 
 
b =  1  .
 −1 
 
1

Find A−1 after finding the LDV factorization of A.


9. Let a, b, c and d be real numbers such that a ≠ 0, b ≠ a, c ≠ b and d ≠ c. Find the LDV factor-
ization of
 
a a a a

a b b b
A =  ,
a b c c
a b c d

and hence find A−1 .


10. Is it possible to have an LU factorization of


 
1 2 −1
 
A = 2 4 1?
 
1 3 1

If not, find a permutation matrix P such that PA has an LU factorization.

2.7 DETERMINANT
With every square matrix over a field, we can associate a unique scalar called its determinant. Origi-
nally, the determinant of the coefficient matrix of a system of linear equations having the same number
of equations and variables was used to determine the existence of solutions. Since then, the concept
of determinant has evolved into a very useful theoretical tool in diverse areas of mathematics. In this
section, we present a treatment of the concept of determinants which makes the proofs of their basic
properties accessible at this level.
Because of its very nature, as the reader will find out a little later, the definition of determinants,
and verifications of their properties, tend to be tedious and cumbersome. One can develop the theory
in a very elegant manner, but such a treatment requires mathematical ideas way beyond the scope of
this book. We have decided to keep the definition simple, but then we have to sacrifice a certain amount
of simplicity in our proofs. Our definition of the determinant of a matrix will be a recursive one. We first
write out the determinant of a 1 × 1 and a 2 × 2 matrix explicitly. Then, for n ≥ 2, we will define the
determinant of an n × n matrix inductively in terms of determinants of certain (n − 1) × (n − 1) matrices.
This recursive definition allows us to express the original determinant in terms of determinants of
matrices of successively lower orders step by step. Thus, theoretically at least, it is possible to evaluate
the determinant of an n × n matrix explicitly as it can be expressed in terms of 2 × 2 determinants which
are already defined.
Note that we have already used an expression like n × n determinant to mean the determinant of an
n × n matrix. We let this abuse of language stand to make things easier for us.

Recursive Definition of Determinants


Now, for the definitions. Let det A denote the determinant of any matrix A ∈ Mn (F), where F is a field.
For any 1 × 1 matrix A = [a], which is just a scalar in F, we let det A = a, and for

    A = [a11  a12]
        [a21  a22],

where ai j ∈ F, we let

    det A = a11 a22 − a12 a21 ,    (2.11)

which is a well-defined scalar in F.


In general, given an n × n matrix A (n ≥ 2 ) over F, let Ai j be the (n − 1) × (n − 1) submatrix obtained
from A by deleting the ith row and the jth column of A for any i, j, 1 ≤ i, j ≤ n; such Ai j are called the
minors of A. The determinant of A is then defined in terms of the determinants of the minors from the
first row of A as follows:

    det A = a11 det A11 − a12 det A12 + · · · ± a1n det A1n
          = Σ_{j=1}^{n} (−1)^{1+ j} a1 j det A1 j .    (2.12)

This is known as the expansion of det A by minors along the first row. Note that for n = 2, the
definition given by Equation (2.12) is the same as the one given by Equation (2.11). Also, note that
the determinant of any matrix in Mn (F), is a scalar in F.
EXAMPLE 16 In the following example, we compute the 3 × 3 determinant first by expressing it
in terms of 2 × 2 determinants according to Equation (2.12) before using Equation
(2.11):
 
 2 0 −3 ' ( ' ( ' (
  1 0 −1 0 −1 1
det −1 1 0 = 2. det − 0. det + (−3). det
  −1 1 2 1 2 −1
2 −1 1
= 2.1 − 0.(−1) − 3.(−1)
= 2−0+3
= 5.
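The recursive definition translates almost verbatim into code. The sketch below (plain Python; det_expand is our own illustrative helper) expands along the first row exactly as in Equation (2.12); it mirrors the definition and is not meant to be an efficient way of computing determinants.

    def det_expand(A):
        # A is a square matrix given as a list of rows.
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            # the minor A_{1,j+1}: delete the first row and the (j+1)th column
            minor = [row[:j] + row[j + 1:] for row in A[1:]]
            # for the 1-based column index j+1, the sign (-1)**(1 + (j+1)) equals (-1)**j
            total += (-1) ** j * A[0][j] * det_expand(minor)
        return total

    print(det_expand([[2, 0, -3], [-1, 1, 0], [2, -1, 1]]))   # 5, as in Example 16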

Determinant of a Lower Triangular Matrix

EXAMPLE 17 We next compute the determinant of a lower triangular matrix in a similar manner:

    det [ 2  0  0]
        [−1  4  0] = 2 · det [4  0]
        [ 3  5  6]           [5  6]

                   = 2 · 4 · det [6]
                   = 48.

Clearly, one can generalize this example.

Proposition 2.7.1. The determinant of a lower triangular matrix over any field F, and in particular,
of a diagonal matrix, is the product of its diagonal entries.

Proof. Suppose, A = [ai j ] is a lower triangular matrix in Mn (F). The proof is by induction on n, the
order of A. There is nothing to prove if n = 1, so we can begin induction with n ≥ 2. Since all the
entries of the first row of A, except possibly a11 , are zeros, it follows from Equation (2.12) that

det A = a11 det A11 .

But as A is a lower triangular matrix, it is clear that A11 itself is a lower triangular matrix of order (n −1)
with diagonal entries a22 , a33 , . . . , ann . Therefore, the induction hypothesis implies that det A11 is the
product a22 a33 , . . . , ann . The preceding equality then proves the proposition. !
Corollary 2.7.2. If In is the identity matrix of order n (n ≥ 1) over any field F, then

det In = 1.

Row Operations and Determinants


One of our main concerns will be to see how elementary row operations on a square matrix change
its determinant. Before presenting the relevant results, we introduce a bit of notation which brings out
the dependence of the determinant of a matrix on its rows.

Definition 2.7.3. If A ∈ Mn (F) has the row vectors ρ1 , ρ2 , . . . , ρn as its rows, then we will write

    det A = Dn (ρ1 , ρ2 , . . . , ρn ).

For any j (1 ≤ j ≤ n), let ρi^j be the 1 × (n − 1) row vector obtained by deleting the jth entry of the row vector
ρi . Then, for the minor A1 j , which is obtained from A by deleting the 1st row and the jth column of A,
we have, in our new notation,

    det A1 j = Dn−1 (ρ2^j , . . . , ρn^j ).

Therefore, the definition of the determinant of A yields the following:

    det A = Σ_{j=1}^{n} (−1)^{1+ j} a1 j Dn−1 (ρ2^j , . . . , ρn^j ).    (2.13)

Also, note that as we view the rows of a matrix as row vectors, any addition of rows or scalar
multiplication of a row are done componentwise.
Now, we are ready to present the first of the basic properties of determinants.

Proposition 2.7.4. Let ρ j (1 ≤ j ≤ n) be n 1 × n row vectors over a field F. Then, the following
hold:

(a) For a fixed i, (1 ≤ i ≤ n), and any 1 × n row vector σ over F,

Dn (ρ1 , ρ2 , . . . , ρi + σ, . . . , ρn )
= Dn (ρ1 , ρ2 , . . . , ρi , . . . , ρn ) + Dn (ρ1 , ρ2 , . . . , σ, . . . , ρn ),

where in the second Dn , on the right-hand side of the equation, σ appears in the ith place;
(b) For any scalar c ∈ F and a fixed i, (1 ≤ i ≤ n),

Dn (ρ1 , ρ2 , . . . , cρi , . . . , ρn ) = cDn (ρ1 , ρ2 , . . . , ρi , . . . , ρn ).

A terse way of stating the proposition is to say that a determinant, viewed as a function on row
vectors, is linear, or that the determinant is linear on the rows of matrices. A word of caution, however:
it does not say, for example, that det (A + B) = det A + det B. To clarify what it actually means, we give
a couple of examples to illustrate the proposition. Consider the matrix

    [1  2  3]
    [3  2  0].
    [0  1  0]

Since the second row of the
matrix can be thought of as (3, 2, 0) + (0, 0, 0), the first relation of the proposition implies that
     
 1 2 3  1 2 3 1 2 3
     
det 3 + 0 2 + 0 0 + 0 = det 3 2 0 + det 0 0 0.
     
0 1 0 0 1 0 0 1 0

Similarly, the second relation of the proposition allows us to work out the following:

    det [1  2] = det [ 1    2 ]
        [2  4]       [2·1  2·2]

               = 2 · det [1  2].
                         [1  2]

We now come to the proof.

Proof. The proof of both the assertions will be carried out by induction on n, the number of rows (and
columns) of the matrices involved. Consider assertion (a) first. If n = 1, then i = 1 and the matrices
involved are just scalars. The corresponding determinants are the same scalars and so, there is nothing
to prove. So, assume that n ≥ 2 and the assertion holds for determinants of any (n −1) ×(n −1) matrices.
Let A be the n × n matrix whose rows are ρ1 , . . . , ρi + σ, . . . , ρn . Further, for any k, (1 ≤ k ≤ n), let
ρk = (ak1 , ak2 , . . . , akn ) and σ = (b1 , b2 , . . . , bn ) for scalars akl and bl . Consider the case when i = 1.
Then, the jth entry of first row of A is (a1 j + b j ), and so according to Equation (2.13), we see that
    Dn (ρ1 + σ, ρ2 , . . . , ρn ) = Σ_{j} (−1)^{1+ j} (a1 j + b j ) Dn−1 (ρ2^j , . . . , ρn^j ).

Since Dn−1 (ρ2^j , . . . , ρn^j ) is also a scalar like a1 j and b j , the sum on the right-hand side of the preceding
equality can be split into two sums as follows:

    Σ_{j} (−1)^{1+ j} a1 j Dn−1 (ρ2^j , . . . , ρn^j ) + Σ_{j} (−1)^{1+ j} b j Dn−1 (ρ2^j , . . . , ρn^j ).

Invoking Equation (2.13) again, and noting that ρ1 = (a11 , a12 , . . . , a1n ) and σ = (b1 , b2 , . . . , bn ), we
see that the preceding two sums can be expressed as

Dn (ρ1 , ρ2 , . . . , ρn ) + Dn(σ, ρ2 , . . . , ρn )

which proves the assertion in this case. Note that this straightforward case did not need the induction
hypothesis.
So, consider the case when 2 ≤ i ≤ n, that is, when the sum of the row vectors occurs in A in a row other
than the first one. In this case, any minor A1 j , where 1 ≤ j ≤ n, obtained by removing the 1st row and
the jth column of A, is an (n − 1) × (n − 1) matrix having the row vectors ρ2^j , . . . , ρi^j + σ^j , . . . , ρn^j as its rows
(see Definition (2.7.3) for the definition of ρk^j ); the obvious modifications needed for i = 2 or i = n are
clear. Thus, by Equation (2.13), we have

    Dn (ρ1 , . . . , ρi + σ, . . . , ρn ) = Σ_{j} (−1)^{1+ j} a1 j Dn−1 (ρ2^j , . . . , ρi^j + σ^j , . . . , ρn^j ).    (2.14)
However, by the induction hypothesis,

    Dn−1 (ρ2^j , . . . , ρi^j + σ^j , . . . , ρn^j )
        = Dn−1 (ρ2^j , . . . , ρi^j , . . . , ρn^j ) + Dn−1 (ρ2^j , . . . , σ^j , . . . , ρn^j ).

By substituting the preceding expression for Dn−1 (ρ2^j , . . . , ρi^j + σ^j , . . . , ρn^j ) in Equation (2.14), we
obtain the desired formula for Dn (ρ1 , . . . , ρi + σ, . . . , ρn ) by applying Equation (2.13).
The reader can provide the proof of assertion (b) along the same lines using induction. !

Since the zero row vector can be thought of as a scalar multiple of any row vector, say of (1, 1, . . . , 1),
by the scalar zero, assertion (b) of the preceding proposition immediately implies the following corol-
lary; note that the determinant of any matrix is a scalar, too.

Corollary 2.7.5. If a row of a matrix A ∈ Mn (F) is a zero row, then det A = 0.

An analogous result in the case of a zero column is also true. However, our way of treating deter-
minant means that we cannot give an analogous proof. But induction readily proves the result, and we
leave the proof of the following to the reader.

Proposition 2.7.6. If a column of a matrix A ∈ Mn (F) is a zero column, then det A = 0.

The next result is crucial in calculations with determinants.

Proposition 2.7.7. If two adjacent rows of a matrix A ∈ Mn (F) are equal, then det A = 0.

In terms of our alternative notation,

Dn (ρ1 , . . . , ρn ) = 0 if ρi = ρi+1 for some i (1 ≤ i ≤ n − 1)

Proof. There are two cases to be considered. The first is when the first two rows of A are equal, and
the second is when the adjacent equal rows occur after the first row. We take up the first case to begin
with.
For the hypothesis to make sense in the first case, n ≥ 2. If n = 2, then A has the form

    [a11  a12]
    [a11  a12].

Then, the formula for the determinant of a 2 × 2 matrix directly shows that det A = 0. So we may
assume that n ≥ 3. Since the first two rows of A are equal, for any j (1 ≤ j ≤ n), the first row of the
minor A1 j is the first row of A with the jth entry removed. Thus, if (a11 , a12 , . . . , a1n ) is the first row
of A, then the first row of A1 j is (a11 , a12 , . . . , a1, j−1 , a1, j+1 , . . . , a1n ). Thus, according to Equation
(2.12), we can express, for each fixed j, det A1 j as an alternate sum of terms of the type a1k det B jk
where B jk can be thought of as the (n − 2) × (n − 2) submatrix obtained from A by deleting the 1st and
the 2nd row as well as the jth and the kth columns of A; observe that k ! j.
Before continuing with our proof, we work out the details in case of a 4 × 4 matrix A with its first
two rows equal, to make our proposed line of reasoning clear. We have the formula for det A explicitly
as

det A = a11 det A11 − a12 det A12 + a13 det A13 − a14 det A14 ,

where
   
a12 a13 a14  a11 a13 a14 
   
A11 = a32 a33 a34 , A12 = a31 a33 a34 
   
a42 a43 a44 a41 a43 a44
   
a11 a12 a14  a11 a12 a13 
   
A13 = a31 a32 a34 , A14 = a31 a32 a33 .
   
a41 a42 a44 a41 a42 a43

Therefore,

det A = a11 a12 det B12 − a11 a13 det B13 + a11 a14 det B14
− a12a11 det B21 + a12a13 det B23 − a12 a14 det B24
+· · · ,

where

    B12 = [a33  a34] = B21 ,   B13 = [a32  a34] = B31 ,   etc.    (2.15)–(2.17)
          [a43  a44]                 [a42  a44]

The point of this example is to note that B jk and Bk j are the same matrix so that det A, being the
alternate sum of the B jk , is automatically zero. Let us come back to the proof. We have already noted,
as in the example, that det A is the sum of all the terms of type a1 j a1k det B jk (ignoring the signs of
the terms), where both j and k run through 1 to n with the stipulation that k ! j. To deal with this
sum, we pair off, for each pair ( j, k) with j < k, the term a1 j a1k det B jk with the term a1k a1 j det Bk j .
As we insist that j < k, this pairing will exhaust all the terms in the sum for det A. On the other hand,
B jk and Bk j are obtained from A by removing the same rows and columns, but in different sequence.
Therefore, it is clear that B jk and Bk j are equal matrices. (That is why in the example of the 4 × 4
matrix, B12 is equal to B21, etc.). Consequently, to prove that det A is zero as the proposition claims,
it suffices to show that the terms we have paired off in the sum for det A, for example, a1 j a1k det B jk
and a1k a1 j det Bk j , have different signs.
Assume that a1 j a1k det B jk has ‘−’ sign. This negative sign may arise in two different ways in the
expansion of det A, and we treat them as separate cases.
Suppose that the ‘−’ sign was picked up by a1 j during the row expansion of det A, and hence a1k
has the ‘+’ sign during the expansion of det A1 j . In that case, a careful look at Equation (2.12) for the
occurrence of signs shows that both j and k are even (note that, as the jth column was removed to get
A1 j and j < k, a1k in A1 j is at an odd-numbered place if k is even). But then, a1k in the expansion of
det A, and a1 j in the expansion of det A1k , will both pick up the ‘−’ sign. For, as j < k, the removal
of the kth column does not affect the sign of a1 j in the expansion of det A1k , as the term remains at
an even-numbered place in A1k . Thus, a1 j a1k det B jk and a1k a1 j det Bk j do have different signs in this
case.
In case a1 j picks up the ‘+’ sign in the expansion of det A, then in the expansion of det A1 j , a1k
has to have the ‘−’ sign. So, j has to be odd and, since a1k has to be in an even-numbered place in
the first row of A1 j , k must also be odd. Consequently, the term containing a1k in the expansion of
det A has the ‘+’ sign. However, as j < k, a1 j will still be in an odd-numbered place in the first row
of A1k , so a1k a1 j det Bk j has the ‘+’ sign in this case. This finishes the argument when a1 j a1k det B jk has
the ‘−’ sign.
When a1 j a1k det B jk has the ‘+’ sign, the desired conclusion can be arrived at in exactly the same
manner.
We now take up the case when the equal adjacent rows appear after the first row in A which is
possible only when n ≥ 3. We will use induction to settle this case. If n = 3, then the second and
the third row of A are equal. Therefore, the three minors A11 , A12 and A13 are all 2 × 2 matrices
having equal rows so their determinants are zeros by the first case. So, assume that n ≥ 4 and the
proposition holds for the determinant of any (n − 1) × (n − 1) matrix satisfying the hypothesis of the
proposition. Consider the expansion of det A by the minors of the first row. Now, each of these minors
is an (n − 1) × (n − 1) matrix whose two adjacent rows are equal by our assumption and hence their
determinants are zeros by the induction hypothesis. It follows that det A is also zero, proving the
proposition in this case. !

There are some more results about how determinants are affected by certain conditions on adjacent
rows of a matrix. These are presented now.

Proposition 2.7.8. Let A ∈ Mn (F).


(a) If matrix B is obtained by adding a scalar multiple of a row of A to an adjacent row, then
det B = det A.
(b) If matrix B is obtained by interchanging two adjacent rows of A, then det B = − det A.

Proof. Consider assertion (a) first. Assume that a scalar multiple of a row of A is added to a row above
it. To prove assertion (a) in this case, we have to show that if the rows of the matrix A are the row
vectors ρ1 , . . . , ρi , . . . , ρn , then for any i, (1 ≤ i ≤ n − 1) and any scalar a, det A = Dn (ρ1 , . . . , ρi +
aρi+1 , ρi+1 , . . . , ρn ). Now, as determinant is linear on rows according to Proposition (2.7.4), we have

Dn (ρ1 , . . . , ρi + aρi+1 , ρi+1 , . . . , ρn )


= Dn (ρ1 , . . . , ρi , ρi+1 , . . . , ρn )
+ a.Dn (ρ1 , . . . , ρi+1 , ρi+1 , . . . , ρn ).

The first term on the right-hand side of the equality is the determinant of A whereas the second term is
the determinant of a matrix with identical adjacent rows. So, by the preceding proposition, the second
term is zero and the assertion follows in this case. The case when a scalar multiple of a row of A is
added to a lower row can be settled the same way.
For proving assertion (b), we begin by noting, as particular cases of the first assertion, that we can
add or subtract a row from an adjacent row of a matrix without changing its determinant. We use this
observation repeatedly in the following calculations to prove (b):

det A
= Dn (ρ1 , . . . , ρi , ρi+1 , . . . , ρn )
= Dn (ρ1 , . . . , ρi − ρi+1 , ρi+1 , . . . ρn )
= Dn (ρ1 , . . . , ρi − ρi+1 , ρi , . . . , ρn )
= Dn (ρ1 , . . . , −ρi+1 , ρi , . . . , ρn )
= −Dn (ρ1 , . . . , ρi+1 , ρi , . . . , ρn )
= − det B,

where the last but one equality follows from Proposition (2.7.4). !
We can now quickly establish the analogous results for arbitrary rows.

Theorem 2.7.9. Let A ∈ Mn (F).


(a) If two rows of A are identical, then det A = 0.
(b) If the matrix B is obtained by adding a scalar multiple of a row of A to another row, then
det B = det A.
(c) If the matrix B is obtained by interchanging any two rows of A, then det B = − det A.

Proof. If two rows of A are equal, a few interchanges of adjacent rows will result in these rows becom-
ing adjacent in the matrix C to which A changes with these row operations. By the second assertion
of Proposition (2.7.8), det C = ± det A, and by the first, det C = 0, showing that det A = 0. This proves
assertion (a).
Consider assertion (b). Suppose that the matrix B is obtained by adding a scalar multiple of the jth
row of A to its ith row. The idea is to verify that the effect of this single row operation on A is equivalent
to the combined effect of certain row operations on adjacent rows of A and matrices obtained from
A by these operations. It is clear that by interchanging the jth row and the rows adjacent to it but in
between it and the ith row of A (to be precise, in A and the matrices obtained in succession by each
of these interchanges), we obtain a matrix C which has the ith row and the jth row of A as adjacent
rows. Observe that if k interchanges are needed to produce C from A, then det C = (−1)k det A by
assertion (b) of the preceding proposition. Next, we obtain matrix C ' by adding the required scalar
multiple of the row of C which is the jth row of A to its ith row (which is still the ith row of A). Since
these two rows are adjacent in C, by assertion (a) of the preceding proposition, det C ' = det C. Finally,
in C ' , we perform the same interchanges of adjacent rows that were done to rows of A but in reverse
order. It is clear that these interchanges will produce the matrix B from C ' , and det B = (−1)k det C ' .
It follows that det B = (−1)2k det A = det A. Hence assertion (b).
A similar argument will prove assertion (c), and we leave the proof to the reader. !

Determinants and Elementary Matrices


Our next task is to translate the foregoing results in terms of determinants of elementary matrices.
Recall that there are three types of elementary matrices (see Definitions 2.3.2), and applying an ele-
mentary row operation to a matrix is equivalent to left-multiplying it by the corresponding elementary
matrix (Proposition 2.3.3). The next proposition is, therefore, just a restatement of the last theorem.
Proposition 2.7.10. Let A ∈ Mn (F).


(a) If E is an elementary matrix of order n over F, corresponding to a row scaling by a scalar a
(a ≠ 0), then det (EA) = a det A.
(b) If E is an elementary matrix of order n over F, corresponding to a row exchange, then det (EA) =
− det A.
(c) If E is an elementary matrix of order n over F, corresponding to a row replacement, then det (EA) =
det A.

Taking A in the proposition to be the identity matrix In , whose determinant we have seen in Propo-
sition (2.7.2) to be 1, we derive the following corollary:

Corollary 2.7.11. Let E be an elementary matrix of order n over a field F.


(a) If E corresponds to a row scaling by a (a ≠ 0), then det E = a.
(b) If E corresponds to a row exchange, then det E = −1.
(c) If E corresponds to a row replacement, then det E = 1.

Substituting these values of the determinants of the elementary matrices in the preceding proposi-
tion, we get the following important result:

Corollary 2.7.12. Let A ∈ Mn (F). Then, for any elementary matrix E of order n over a field F, we
have

det (EA) = det E. det A.

Thus, if E1 , E2 , . . . , Er are elementary matrices of order n over F, then

det (E1 E2 · · · Er ) = det E1 det E2 · · · det Er .

Note that Corollary (2.7.11) also demonstrates that the determinant of any elementary matrix is non-
zero. Since any invertible matrix can be written as a product of elementary matrices by Proposition
(2.5.6), and since according to the preceding corollary, the determinant of a product of elementary
matrices is the product of their determinants, it then follows that the determinant of an invertible
matrix cannot be zero. Thus, we have proved one-half of the following important characterization of
invertible matrices.

Theorem 2.7.13. Let A ∈ Mn (F). A is invertible if and only if det A ≠ 0.

Proof. For the other half, assume that det A ≠ 0. Now, let R be the reduced row echelon form of
A, and let E1 , E2 , . . . , Er be the elementary matrices such that R = E1 E2 · · · Er A. Then, repeated
application of Corollary (2.7.12) shows that det R = det E1 det E2 · · · det Er det A. It follows by
hypothesis that det R ≠ 0. But R, being the reduced row echelon form of a square matrix A, either equals
the identity, or has a zero row (see Proposition 2.4.6). The second possibility cannot occur as det R is
non-zero. So, R is the identity matrix, which is possible only if A is invertible by Corollary (2.5.3). !

The corollary preceding this important result is, in fact, a special case of the so-called multiplicative
property of the determinant function:
Theorem 2.7.14. For any A, B ∈ Mn (F),

det (AB) = det A det B.

Proof. We first consider the case when A is invertible. Therefore, we can assume that A is a product
of elementary matrices, say, of E1 , E2 , . . . , Er over F. The last corollary then yields that

det A = det E1 det E2 · · · det Er .

Therefore, by repeated applications of Corollary (2.7.12), we obtain


    det (AB) = det (E1 E2 · · · Er B)
             = det (E1 (E2 · · · Er B))
             = det E1 det (E2 · · · Er B)
             = det E1 det E2 · · · det Er det B
             = det A det B.

Next assume that A is not invertible, or equivalently that det A = 0. Since A is not invertible, we know
that its reduced row echelon form, say, R must have a zero row. But then the product RB also has at
least one zero row, forcing det (RB) = 0. Note that R is obtained by left-multiplying A by elementary
matrices. Thus, by Corollary (2.7.12), det (RB) is the product of determinants of elementary matrices
and det (AB). As elementary matrices have non-zero determinants, we conclude that det (AB) = 0.
Since det A = 0, it follows that det A det B and det (AB) are equal, as both are equal to zero. !
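A quick numerical sanity check of the multiplicative property on randomly chosen matrices (a verification, not a proof; NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 6, size=(4, 4)).astype(float)
    B = rng.integers(-5, 6, size=(4, 4)).astype(float)
    # det(AB) agrees with det(A) det(B) up to floating-point error
    print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))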

This multiplicative property, combined with the fact that the determinant of the identity matrix is
1, gives us a formula of the determinant of the inverse of an invertible matrix.

Proposition 2.7.15. Let A be an invertible matrix in Mn (F). Then


    det A−1 = 1 / det A.
Proof. Note that the formula makes sense, as A is invertible and so det A is a non-zero scalar. For
the derivation of the formula, we take the determinant of both sides of the matrix equation AA−1 = In ,
and then use the multiplicative property of the det function to obtain det A det A−1 = 1. Since det A is
non-zero, the proposition follows. !

Another consequence of the multiplicative property of determinants is the following corollary


which we record for future reference:

Corollary 2.7.16. For A, B ∈ Mn (F) with B invertible,


det (B−1 AB) = det A.

Proof. We leave the proof to the reader. !

In this connection, we would like to remark that for matrices X and Y, det X det Y = det Y det X as
the determinants are scalars, even though the matrix product XY need not equal YX.
It is now time to examine whether our original definition of determinant can be modified. Recall
that the Definition (2.12) of determinant of a matrix A was in terms of expansion by minors along the
first row. We ask: can we use some other row in place of the first row, say the ith row? To answer
the question, given a matrix A with row vectors ρ1 , ρ2 , . . . , ρi , . . . , ρn as its successive rows from
the top, we consider the matrix B with row vectors ρi , ρ1 , . . . , ρi−1 , ρi+1 , . . . , ρn as its rows. Thus,
whereas the first row of B is the ith row ρi of A, the rest of the rows of B are the rows of A occurring in
the same order as in A starting with ρ1 but with the ith row missing. Observe that the minor obtained
from A by deleting the ith row and the jth column is the same as the one obtained from B by removing
the 1st row and the jth column. Therefore, if Ai j denotes the minor obtained from A by deleting the
ith row and the jth column of A, then according to Equation (2.12), we have

    det B = Σ_{j} (−1)^{ j+1} ai j det Ai j ,    (2.18)

provided ρi = (ai1 , ai2 , . . . , ain ).


Next, we find out how det A is related to det B. First note that we can obtain B from A by (i − 1)
successive interchanges of rows, in fact, of adjacent rows. Begin by interchanging ρi and ρi−1 in A so
that in the new matrix ρi is the (i − 1)th row and ρi−1 is the ith row; next interchange ρi and ρi−2 in
the new matrix to make ρi the (i − 2)nd row of the newer matrix, and continue in this manner. Since
i − 1 interchanges of adjacent rows produce B from A, and each interchange changes the determinant
of the matrix in which this row operation is taking place by a factor of (−1), we see that det B =
(−1)^{i−1} det A. But note that (−1)^{2k} = 1 for any positive integer k. So, the same relation can be put as
det A = (−1)^{i−1} det B. Using the value of det B from Equation (2.18), we thus obtain the formula

    det A = Σ_{j} (−1)^{i+ j} ai j det Ai j .    (2.19)

The preceding formula, known as Laplace’s Expansion by minors along the ith row, enables one
to expand the determinant of a matrix A by minors Ai j along any fixed ith row. Observe that for i = 1,
this gives our original formula (2.12).
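In code, formula (2.19) is a one-line change to a first-row expansion: fix a row i and sum (−1)^{i+j} ai j det Ai j over j. The helper below (our own, for illustration only) does exactly that and confirms that every row gives the same value.

    def det_along_row(A, i):
        # expansion by minors along row i (0-based), as in formula (2.19)
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            minor = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
            total += (-1) ** (i + j) * A[i][j] * det_along_row(minor, 0)
        return total

    M = [[2, 0, -3], [-1, 1, 0], [2, -1, 1]]
    print([det_along_row(M, i) for i in range(3)])   # [5, 5, 5]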
As we have shown the validity of expansion of a determinant by any row, naturally a question
arises: Can the determinant of a matrix be found by expanding along a column? The answer is yes.
Starting by defining the determinant of a matrix in terms of the minors of its first column, we can
develop all the properties of the determinant function analogous to all the ones we have developed so
far but in terms of columns. But a simpler way would be to show that the determinants of a matrix and
its transpose are equal. Since the rows of the transpose of a matrix are precisely its columns, it will
then be a straightforward task to obtain the properties of the determinant function in terms of columns.
We begin by showing that the determinants of an elementary matrix and its transpose are the same.

Lemma 2.7.17. Let E be an elementary matrix in Mn (F). Then det E = det E t .

Proof. If E is an elementary matrix, corresponding to either a row scaling or a row exchange then it
is clear that E t = E (see Definition 2.3.2) so there is nothing to prove. So assume that E is the matrix
obtained by adding c times the jth row of In to its ith row, then E = In + cei j , where ei j is the unit
matrix of order n having 1 at its (i, j)th place and zeros elsewhere. It is then clear that E t = In + ce ji ,
which corresponds to the row operation of adding c times the ith row to the jth row. Therefore, by the
third part of Corollary (2.7.11), det E = det E t . !
Recall that if E is an elementary matrix corresponding to some elementary column operation, then E
can also be produced from In by some elementary row operation (see the discussion after Definition 2.3.1
of elementary matrices), so the lemma is also valid for such an E.
The lemma is crucial for the proof of the next result.

Proposition 2.7.18. Let A ∈ Mn (F). Then,


det A = det (At ),
where At is the transpose of A.

Proof. Let R be the reduced row echelon form of A with r pivots, where 1 ≤ r ≤ n (if r = 0 then
R and so A must be the zero matrix and so there is nothing to prove). Thus, there are elementary
matrices E1 , E2 , . . . , El , corresponding to elementary row operations which reduce A to R such that
R = El · · · E2 E1 A. We claim that suitable column operations will further reduce R to

    S = [Ir  0]
        [ 0  0].

If r = 1, R itself has the form of S and so we may assume that r ≥ 2. It is clear that by adding suitable
multiples of pivot columns of R to non-pivot columns, we can make all the non-zero entries of R,
except the pivots, zeros. Therefore, the only non-zero entries of the matrix obtained after these column
operations are the pivots of R and so suitable column exchanges will reduce it to S . Hence our claim.
By our claim, there are elementary matrices F1 , F2 , . . . , Fm , corresponding to elementary column
operations, which reduce R to S , such that

    S = El · · · E2 E1 A F1 F2 · · · Fm .    (2.20)

Since S t = S , taking transposes of both sides of the preceding relation, one obtains

    S = Fm t · · · F1 t At E1 t · · · El t .    (2.21)

Therefore, taking determinants of the two matrix products expressing the same matrix S , given in
Equations (2.20) and (2.21), results in the following equality:

    (det El ) · · · (det E2 )(det E1 )(det A)(det F1 ) · · · (det Fm )
        = (det Fm t ) · · · (det F1 t )(det At )(det E1 t ) · · · (det El t )    (2.22)

as determinant is multiplicative. By Corollary (2.7.11), the determinant of an elementary matrix is


non-zero. Also we have just seen that the determinants of an elementary matrix and its transpose are
equal. Thus, equation (2.22) (note that all the determinants in the equality are scalars in F) implies
that det A = det At . !

Let us examine why the last result allows us to compute a determinant of a matrix A = [ai j ] by
expanding it by minors along any column. Denote the transpose At of A by B = bi j so bi j = a ji for all i
and j. Observe that the minor B ji obtained from B by crossing out the jth row and the ith column of B
is the transpose of the minor Ai j obtained by removing the ith row and the jth column of A. It follows
from the preceding proposition that det B ji = det Ai j . Now, according to Equation (2.19), we can find
the determinant of B by expanding along, say, the jth row:

    det B = Σ_{i} (−1)^{ j+i} b ji det B ji .

However, det B = det A, b ji = ai j and det B ji = det Ai j , so that the preceding equation can be re-written
as

    det A = Σ_{i} (−1)^{i+ j} ai j det Ai j .    (2.23)

Note that the last sum is over dummy index i, so that the sum runs over the entries of A along the jth
column as i varies. In other words, this gives us the expansion of the determinant of A in minors along
any fixed column of A.
Apart from helping to derive the explicit expansion of the determinant of a matrix along any of
its columns, the preceding proposition also allows us to translate all the properties of the determinant
function determined by conditions on rows of matrices (which have been derived so far) to proper-
ties determined by analogous conditions on columns. We gather all such properties in the following
theorem whose easy proof is left to the reader.

Theorem 2.7.19. Let A ∈ Mn (F).


(a) The determinant function is linear on columns of A.
(b) If any column of A is a zero column, then det A = 0.
(c) If two columns of A are identical, then det A = 0.
(d) If a multiple of a column of A is added to another column, the determinant is unchanged.
(e) If two columns of A are interchanged, the determinant changes by a factor of −1.

A final remark about evaluation of determinants of specific matrices. More often than not, elemen-
tary row or column operations are performed on a given matrix to bring it to a form where it is possible
to use some result to compute its determinant directly without using the expansion by minors. For ex-
ample, if it is possible to reduce the matrix to an upper triangular matrix, then its determinant can be
computed simply by multiplying its diagonal elements. In case expansion by minors along a row or
a column has to be employed, one makes sure that row or column has plenty of zeros by appropri-
ate row or column operations. However, if elementary row or column operations are used on a given
matrix to simplify the computation of its determinant, then one must keep track of the changes of the
determinant of the original matrix due to these operations.
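That remark is essentially an algorithm. The sketch below (plain Python; det_by_elimination is our own helper) reduces the matrix to upper triangular form using only row replacements and row interchanges, tracks the sign changes caused by interchanges as in Theorem 2.7.9, and finally multiplies the diagonal entries.

    def det_by_elimination(A):
        M = [list(map(float, row)) for row in A]   # work on a float copy of A
        n = len(M)
        sign = 1
        for k in range(n):
            # find a row at or below row k with a non-zero entry in column k
            pivot_row = next((r for r in range(k, n) if M[r][k] != 0), None)
            if pivot_row is None:
                return 0.0                          # no pivot in this column: det A = 0
            if pivot_row != k:
                M[k], M[pivot_row] = M[pivot_row], M[k]
                sign = -sign                        # a row interchange flips the sign
            for r in range(k + 1, n):
                factor = M[r][k] / M[k][k]
                # a row replacement leaves the determinant unchanged
                M[r] = [M[r][j] - factor * M[k][j] for j in range(n)]
        result = float(sign)
        for k in range(n):
            result *= M[k][k]
        return result

    print(det_by_elimination([[2, 0, -3], [-1, 1, 0], [2, -1, 1]]))   # 5.0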
Cramer’s Rule
To end this section, we apply the idea of determinants to give explicit solution of a system of equations
Ax = b in case it has a unique solution. This explicit formula is known as Cramer's Rule. It should
be mentioned that this rule has little practical use as it involves a prohibitive number of calculations.
However, its theoretical importance requires that we be familiar with the rule. We need a bit of no-
tation first. Given a system of n equations in n variables Ax = b over a field F, we let A( j), for any
j (1 ≤ j ≤ n), be the n × n matrix obtained from the coefficient matrix A by replacing its jth column
by the column vector b. In other words, if γ1 , γ2 , . . . , γn are the n column vectors of a matrix A, and
if we describe A as
    A = [γ1  . . .  γ j  . . .  γn ],
then for any fixed j, 1 ≤ j ≤ n,

    A( j) = [γ1  . . .  b  . . .  γn ].

Thus, for

    A = [ 3  1  −4]        [4]
        [ 2  0   6],   b = [1],
        [−1  3   5]        [0]

we have, for example,

    A(1) = [4  1  −4]          A(3) = [ 3  1  4]
           [1  0   6],                [ 2  0  1],   etc.
           [0  3   5]                 [−1  3  0]
Now, we can state the rule that explicitly gives the solution of a system of equations, provided the
determinant of the coefficient matrix is non-zero.

Theorem 2.7.20. (The Cramer's Rule) Let Ax = b be a system of n equations in n variables over
a field F. If det A ≠ 0, then a unique solution of the system of equations exists, whose components are
given by

    x j = det A( j) / det A

for any j, 1 ≤ j ≤ n.

Note that the standard convention we are following is that the components of the n×1 column vector
x are the variables x1 , x2 , . . . , xn . However, in the formula for the Cramer's Rule, the x j actually
stand for the components of the solution vector. We have already seen that the condition det A ≠ 0
is necessary and sufficient for the system Ax = b to have a unique solution, and by an abuse of language,
we will refer to the components of the solution (they are now actually scalars) by x j . This is more in
keeping with the conventional ways of stating Cramer's Rule.
Proof. Since, by Theorem (2.7.19), the determinant function is linear on columns, we get, for any
fixed j,
    x j det A = det [γ1  . . .  x j γ j  . . .  γn ],

where γi is the column vector consisting of the entries of the ith column of A. Also, by the same result
(2.7.19), adding a multiple of a column to a fixed column does not change a determinant. Therefore,
we may add to the jth column γ j of the preceding determinant the multiples x1 γ1 , x2 γ2 , . . . , xn γn of
all the other column vectors without changing the determinant. But in that case the jth column vector
of x j det A will be the sum Σ_{i} xi γi , where the sum runs from 1 to n. On the other hand, as we are
assuming that the xi are the components of the solution of Ax = b, interpreting this matrix equation
in terms of the column vectors using the column-row rule for multiplication of matrices, we obtain
    Σ_{i=1}^{n} xi γi = b.
Consequently, the jth column of the determinant x j det A can be replaced by the column vector b
whence
    x j det A = det [γ1  . . .  b  . . .  γn ].

Thus, x j det A is precisely the determinant of the matrix A( j) introduced at the beginning of this
section. Since det A ≠ 0, the formula in the theorem follows. !

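As remarked, the rule is mainly of theoretical interest, but it is simple to state in code. The sketch below (NumPy assumed; cramer_solve is our own helper) forms each A( j) by replacing the jth column of A by b and divides determinants; it is checked on the matrix A and vector b displayed before the theorem.

    import numpy as np

    def cramer_solve(A, b):
        d = np.linalg.det(A)
        if np.isclose(d, 0):
            raise ValueError("Cramer's Rule requires det A != 0")
        x = np.empty(len(b))
        for j in range(len(b)):
            Aj = A.copy()
            Aj[:, j] = b                 # A(j): the jth column of A replaced by b
            x[j] = np.linalg.det(Aj) / d
        return x

    A = np.array([[ 3., 1, -4],
                  [ 2., 0,  6],
                  [-1., 3,  5]])
    b = np.array([4., 1, 0])
    print(cramer_solve(A, b))
    print(np.linalg.solve(A, b))         # the two answers agree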
The Cramer’s Rule can also be derived from a certain explicit formula for the inverse of an invertible
matrix, although it must be admitted that like the Cramer’s Rule, this formula too is not very practical.
We need a definition before stating the formula.

Definition 2.7.21. Let A ∈ Mn (F). For any i and j, 1 ≤ i, j ≤ n, let Ai j be the minor produced by
deleting the ith row and the jth column of A, and let ∆i j = (−1)i+ j det Ai j . The (classical) adjoint of A,
denoted by ad j(A), is defined to be the n × n matrix, whose (i, j)th entry is ∆ ji . In other words,

ad j(A) = [ ∆i j ]t ,

where the superscript t denotes the transpose.


For example, for

    A = [1  2],
        [3  4]

we have

∆11 = 4, ∆12 = −3
∆21 = −2, ∆22 = 1

so that
    ad j(A) = [ 4  −2].
              [−3   1]

Observe that
    A · ad j(A) = [−2   0] = (−2) I2 .
                  [ 0  −2]

It is no accident that −2 is the determinant of the original matrix A. It follows that A−1 =
ad jA/ det A. This relation actually holds for any arbitrary invertible matrix and is a consequence
of the next theorem.

Theorem 2.7.22. Let A ∈ Mn (F). Then

ad j(A) · A = det A · In = A · ad j(A)

so that if A is invertible,

    A−1 = ad j(A) / det A.
Proof. Let [ci j ] be the product (ad j(A))A. Then, according to Definition (2.6.2), a typical entry of the
product is given by

    ci j = Σ_{k=1}^{n} ∆ki ak j = Σ_{k=1}^{n} (−1)^{k+i} ak j · det Aki .

It is clear from the form of the formula that ci j is the expansion by minors along the ith column of the
determinant of some matrix. Since the minors in the sum are Aki , where i is fixed and k varies, we see
that the matrix must be the one obtained from A by replacing the ith column by the jth column of A
itself. This new matrix, thus, has two identical columns if i ≠ j. If i = j, then the matrix is A itself, and
so cii = det A. Stated differently, the product (ad j(A))A is a diagonal matrix, as the off-diagonal entries ci j
are zero, with each diagonal entry cii equal to det A. It follows that (ad j(A))A is the scalar
matrix det A · In . A similar calculation shows that A · ad j(A) is the same scalar matrix. !
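A direct transcription of Definition 2.7.21 and Theorem 2.7.22 into code (NumPy assumed; adjugate is our own helper, sensible only for small matrices):

    import numpy as np

    def adjugate(A):
        # adj(A)[i, j] = Delta_{ji} = (-1)**(i+j) * det(A_{ji}),
        # where A_{ji} is A with row j and column i deleted.
        n = A.shape[0]
        adj = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
                adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return adj

    A = np.array([[1., 2], [3, 4]])
    print(adjugate(A))              # [[ 4, -2], [-3, 1]], as computed above
    print(adjugate(A) @ A)          # det(A) * I, that is, -2 * I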

EXERCISES
1. Determine whether the following statements are true or false giving brief justifications. All given
matrices are square matrices over an arbitrary field F.
(a) If two rows, or two columns, of a matrix A are identical, then det A = 0.
(b) If det A = 0, then at least two rows or two columns of A are identical.
(c) For any elementary matrix E, det E = − det E t .
(d) If B is row equivalent to A, then det A = det B.
(e) If A is an upper triangular matrix, then det A is the product of the diagonal entries of A.
(f) For any matrix A, det (−A) = − det A.
(g) The determinant of a matrix in reduced row echelon form is either 1 or 0.
(h) If det A of a matrix A in Mn (F) is non-zero, then the reduced row echelon form of A must be
In .
(i) For an invertible matrix A, det A = det (A−1 ).
(j) For any two matrices A and B of the same order, det (AB) = det (BA).
(k) For any two matrices A and B of the same order, det (A + B) = det B + det A.
(l) If A ∈ Mn (R) has integer entries, then det A is an integer.
(m) If, for A ∈ Mn (R), det A is an integer, then A has integer entries.
(n) If A ∈ Mn (R) has positive entries, then det A is also positive.
(o) If A = −At , then det A is zero.
(p) If det A = 0, then the system of equations Ax = 0 has more than one solution.
2. Prove assertion (b) of Proposition (2.7.4).
3. Prove, by induction, that if any column of a matrix A ∈ Mn (F) is a zero column, then det A = 0.
4. Prove assertion (c) of Theorem (2.7.9).
5. Prove Corollary (2.7.16).


6. Prove Theorem (2.7.19).
7. For an elementary matrix E over a field F, show directly that det E = det (E t ), that is, without
invoking Proposition (2.7.18).
8. Show that for any A ∈ Mn (F) and any scalar c ∈ F,

det (cA) = cn det A.

A matrix A ∈ Mn (F) is called a nilpotent matrix if Ak is the zero matrix for some positive
integer k.
9. For any nilpotent A ∈ Mn (F), show that det A = 0.
10. Let A ∈ Mn (F). Assume that 2 ≠ 0 in F.
(a) If A = −At , then show that A cannot be invertible if n is odd.
(b) Assume that F = R. If AAt = In , then show that det A = ±1.
11. Let F be a field. Show that det : Mn (F) → F is an onto function which is not one–one. Prove,
further, that det is a homomorphism from the multiplicative group GLn (F) of invertible matrices
in Mn (F) onto the multiplicative group of non-zero elements of F.
12. Evaluate the determinants of the following matrices over R.
     
3 0 −1 −1 2 3 1 a a2 
0 2
   
3,  3 4 0, 1 b b2 

     2 
1 −4 0 −2 1 0 1 c c

13. Find the determinant of the following real matrix:


 
1 1 0 0 0 0

a 1 0 0 0 0

0 0 a 0 0 0
 
0 0 1 a 0 0
0 
 0 0 0 a 1

0 0 0 0 1 1
.
14. Let A ∈ Mn (R) having a ∈ R in the diagonal, 1 along the subdiagonal as well as along the super-
diagonal and zeros everywhere else. Evaluate det A.
15. Use the Cramer’s Rule to find solutions of the following system of equations over any field F:

ax1 + x2 = b1 5 √ 6
x1 + ax2 + x3 = b2 a! 2
x2 + ax3 = b3 ,

where a is a non-zero and b1 , b2 and b3 are arbitrary elements of F. Hence or otherwise, find the
inverse of the matrix
 
a 1 0
1 a 1.
 
0 1 a
Saikia-Linear Algebra book1 February 25, 2014 0:8

Determinant 113

16. Solve the following system of equations over R by using the Cramer’s Rule:
    
 2 −1 −1  x1  b1 
    
−1 2 −1  x2  = b2 ,
   
−1 −1 2 x3 b3
where b1 , b2 and b3 are arbitrary real numbers.
17. Use the Cramer’s Rule to find solutions of the following system of equations over any field F:
    
0 a b  x1  0
a b a  x2  = a,
     
b a 0 x3 b
where a and b are non-zero elements of F.
Saikia-Linear Algebra book1 February 25, 2014 0:8

3 Vector Spaces

3.1 INTRODUCTION
The set of m × n matrices, say with real entries, is an example of a mathematical structure with two
operations, similar to addition and scalar multiplication of matrices, which satisfy the same basic
properties as the matrix operations do. Such structures are known as vector spaces. Examples of vector
spaces and their applications are found in every conceivable area of today’s theoretical and technical
disciplines. Because these structures occur in such diverse areas, it makes sense to study vector spaces
in abstract without referring to specific entities so that a single theory is available to deal with all of
them. Before stating the formal definition of a vector space, we have a brief look at a few examples of
such spaces.
One of the most important example is a special case of the matrices, namely, the set Rn of all ordered
n-tuples of real numbers. As we have seen in the very first chapter, elements in Rn can be viewed either
as n-dimensional row vectors or as n-dimensional column vectors depending on our convenience.
Vectors of Rn can be added and multiplied by a real number component-wise; the resultant vectors
will still be in Rn :

     
a1  b1  a1 + b1
a2  b2  a + b 
  +   =  2 2
. .
 .   .   ... 
 .   .   
an bn an + bn

   
a1  ca1 
a  ca 
 2   2
c  .  =  .  .
.
 .   .. 
   
an can

Rn is a vector space as Rn with respect to these operations, called addition and scalar multiplica-
tion, satisfy the properties given in Propositions (1.3.1) and (1.3.3). If, instead, we consider Rn as
the collection of n-dimensional row vectors, the vector space operations will be similarly performed

114
Saikia-Linear Algebra book1 February 25, 2014 0:8

Basic Concepts 115

component-wise as follows:

(a1 , a2 , . . . , an ) + (b1, b2 , . . . , bn ) = (a1 + b1 , . . . , an + bn )


c(a1 , a2 , . . . , an ) = (ca1 , ca2 , . . . , can ).

For the next example of vector spaces, we have chosen the collection of all directed line segments
in the plane (or in the three-dimensional space). These are probably the first vectors the reader had
encountered in college mathematics. As explained in geometry courses
• Two line segments are equal, if they have the same length and direction.
• Two line segments are added by the Parallelogram law.
• The scalar multiple of a line segment u by a real number a is the line segment whose length is
|a| times the length of the original one, and whose direction is the same as that of u if a > 0, and
opposite to that of u otherwise.
The reader may recall that the basic properties of the two operations of addition and scalar mul-
tiplication of line segments are exactly the same as those of matrix operations given in Propositions
(1.3.1) and (1.3.3), and can be verified by geometrical arguments. For example, a line segment having
length zero, that is, a point is clearly the additive identity, playing the role of the zero matrix here.
Again, given a line segment u, the line segment v having the same length as u, but being in the oppo-
site direction acts as the additive inverse of u, for u + v is clearly the segment of zero length, that is,
the zero segment. The set of directed line segments, thus, is a real vector space, as the scalars are real
numbers.
As our third example, we consider the set of all real-valued functions on some closed interval [a, b].
As we will see later, such functions can be added and their scalar multiples can be defined in a very
natural manner. For example, x2 + sin x and 3 exp x need no explanations. What is important is that
the properties satisfied by these operations on functions are the same as the ones satisfied by matrices,
or by ordered n-tuples of real numbers.
In view of these examples, it will be convenient to have the concept of a vector space in abstract
without specifying the nature of the elements in it. To be more specific, we will define a vector space
to be a set satisfying certain rules (axioms) with respect to certain operations and derive properties
satisfied by the space solely on the basis of these axioms.

3.2 BASIC CONCEPTS


As in the earlier chapters, a reader who is not familiar with the concept of a field, may take it as the
set of either the real numbers or the complex numbers with usual addition and multiplication.

Definition 3.2.1. Let F be a field and V a non-empty set with two operations:
(a) addition, which is a rule by which each pair of elements u, v in V is associated with a unique
element u + v ∈ V, called the sum,
(b) scalar multiplication, which is a rule by which each u ∈ V, for any a ∈ F, is associated with a unique
element au ∈ V, called the scalar multiple.
We say that V is a vector space over the field F if the following axioms hold:
(i) For all u, v and w in V,
(a) u + v = v + u;
(b) (u + v) + w = u + (v + w);
(c) There is an element 0 ∈ V such that

u + 0 = u = 0 + u;

(d) For each u ∈ V, there is an element −u ∈ V such that

u + (−u) = 0 = (−u) + u;

and
(ii) for all u, v in V and all scalars a, b in F,
(a) a(u + v) = au + av;
(b) (a + b)u = au + bu;
(c) (ab)u = a(bu);
(d) 1u = u, where 1 is the multiplicative identity in F.

The four axioms in (i) show that a vector space V is an abelian group under addition. In other
words, a vector space V over a field F is an abelian group with respect to addition satisfying addition-
ally the axioms in (ii) for multiplication by scalars in F.
The elements of a vector space are usually called vectors; the vector 0 whose existence is assured
in the third axiom for addition is called zero vector. The zero vector in a vector space is also known
as the additive identity, and the vector −u as the additive inverse of u.
Note our convention that vectors are denoted by bold lower case letters and scalars by plain lower
case letters.
If F = R, we say that V is a real vector space and if F = C, V is a complex vector space. In general,
if V is a vector space over a field F then the elements of F are called scalars.
It is useful to have the notion of subtraction in a vector space V; for any u, v ∈ V, we let

u − v = u + (−v). (3.1)

Thus v − v = 0. It follows that by subtracting v from both sides of the relation u + v = w + v, we obtain
u = w.
The following properties are easy consequences of the definition of a vector space.

Proposition 3.2.2. For a vector space V over a field F, the following assertions hold:
(a) The zero vector 0 of V is unique.
(b) The negative or the additive inverse −u in V of any vector u ∈ V is unique.
(c) a0 = 0 for any scalar a ∈ F.
(d) 0u = 0 for any vector u ∈ V.
(e) (−a)u = −(au) for any u ∈ V and any a ∈ F.
(f) For a non-zero vector u ∈ V, au = 0 for a scalar a implies that a = 0.

We should point out that the bold-faced zero denotes the zero vector, and the plain zero denotes the
scalar zero.
Proof. To prove (a), assume that there is another vector 0′ ∈ V such that 0′ + v = v = v + 0′ for all v ∈ V.
In particular then, taking v = 0 gives 0′ + 0 = 0. However, as 0 is an additive identity, 0′ + 0 = 0′ . It
follows that 0′ = 0, proving the uniqueness of 0. Similarly, to prove (b), assume that for a v ∈ V, there
is some u ∈ V such that v + u = 0 = u + v. Now as (−v) + v = 0, one has the following:
u = 0 + u = (−v + v) + u = −v + (v + u) = −v + 0 = −v.
This proves the uniqueness of the additive inverse of any v ∈ V.
Next observe that by the distributive law for scalar multiplication, for any u ∈ V and a ∈ F, one has
au = a(u+0) = au+a0. So by subtracting the vector au from both sides, we conclude that a0 = 0, which
is (c). On the other hand, by subtracting au from both sides of the relation au = (a + 0)u = au + 0u, we
obtain 0u = 0, which is (d).
Now for any a ∈ F, a + (−a) = a − a = 0 in F. Therefore, au + (−a)u = (a + (−a))u = 0u = 0 by (d).
By the uniqueness of the additive inverse, then it follows that −au = (−a)u, which proves (e).
For the final assertion, recall that any non-zero scalar a ∈ F has a multiplicative inverse a−1 in F
such that aa−1 = 1. Therefore, if au = 0 for a non-zero scalar a, then multiplying the equality by a−1
and using the last two axioms for scalar multiplication, we see that u = 0. Thus, if au = 0 holds for a
non-zero vector u, then a = 0 in F. This completes the proof.
Next observe that for any a ∈ F and v ∈ V, −(av) is the additive inverse of av. Therefore, by the
distributive law for scalar multiplication,
a(v − v) = av − av = 0.
It follows that a0 = a(v − v) = 0, which gives another argument for (c). !

Subspaces
Before discussing examples of vector spaces, let us introduce the important idea of a subspace. A
subspace W of a vector space V is a non-empty subset of V which is a vector space on its own with
respect to the operations of V. In other words, a subspace is actually a vector space within a larger
vector space with its laws of compositions the same as the ones in the larger one but restricted to it.
It is easy to see that a subset of a given vector space will form a vector space on its own if it satisfies
only the three conditions stated in the next result.

Proposition 3.2.3. A non-empty subset W of a vector space V is a subspace (with respect to


addition and scalar multiplication of V restricted to it) if
(a) W is closed with respect to addition, that is, for any u and v ∈ W, the sum u + v ∈ W;
(b) W is closed with respect to scalar multiplication, that is, for any u ∈ W, and a ∈ F, the scalar
multiple au ∈ W;
(c) the zero vector of V is in W.

Proof. Conditions (a) and (b) imply that the operations of V are also valid as operations within W. So
W, whose elements are vectors in V too, does satisfy all the axioms of a vector space except possibly
the ones that stipulate the existence of additive identity (zero) and additive inverse. However, condition
(c) ensures that W has the zero vector. Thus, the only point we have to check is that for any u ∈ W, its
inverse −u in V actually lies in W. But by condition (b), (−1)u ∈ W and (−1)u = −u as u is a vector in
a known vector space V.
Actually, the third condition is superfluous as W is non-empty. For any given vector in W, the sec-
ond condition ensures that its inverse is in W so that the first condition implies that the zero vector is
in W. !

Let us discuss some examples of vector spaces. Note that the subspaces of any vector space also
provide us with examples of vector spaces.

EXAMPLE 1 For any vector space V, there are always two subspaces, namely, the whole space
V itself, and the singleton {0} consisting of the zero vector of V, known as the zero
subspace of V. The three conditions defining a subspace in both the cases are trivially
satisfied. These two subspaces are known as the trivial or the improper subspaces of
V. Any subspace of V, other than these two, is known as a proper subspace of V.

EXAMPLE 2 Let K be a subfield of a field F (see Section 1.7 for the definition of subfields). F,
being a field, is an abelian group with respect to its addition. Multiplication of elements
of F by elements of K can be considered as scalar multiplication on F, which, by the
properties of multiplication in F, trivially satisfies all the axioms for scalar multipli-
cation. Thus, F is a vector space over any of its subfields. In particular, F is a vector
space over itself.
Since the field R of real numbers is a subfield of the field C of complex numbers,
C is a vector space over R. Similarly both R and C are vector spaces over the field Q
of rational numbers.

EXAMPLE 3 The basic model for a vector space for us, as we have mentioned earlier, is the real
vector space Rn , whose elements are n-dimensional column (or n-dimensional row)
vectors with real entries. As we have seen in the introduction to this chapter, the
addition as well as the scalar multiplication in Rn are performed (and equality de-
fined) component-wise. That Rn is an abelian group under addition, and that Rn is
actually is a vector space over R are consequences of the corresponding results in
Propositions (1.3.1) and (1.3.3) for matrices. Nevertheless, the reader should verify
the vector space axioms for Rn directly.
Two special cases of Rn , namely, the plane R2 and the three-dimensional space
3
R , are the easiest spaces to visualize and so will play an important role in the rest of
this book.

EXAMPLE 4 The subspaces of Rn , in particular of R2 and R3 , are important geometrical objects.


In fact, the subspaces of R2 and R3 are best described geometrically. Note that

R2 = {(x1 , x2 ) | xi ∈ R}

so that we can think of R2 as the plane with a fixed coordinate system whose points
are identified through their coordinates with the ordered pairs in R2 . Thus, the zero
subspace of R2 is the singleton formed by the origin (0, 0). Now, any line passing
through the origin, other than the y-axis, has an equation of the form y = mx, and so the subset {(x, mx) |
x ∈ R} of R2 does describe the line. It is easy to verify that {(x, mx) | x ∈ R} for a
fixed real number m is indeed a subspace of R2 , and so is the y-axis {(0, y) | y ∈ R}. We will show later that any proper
subspace of R2 must be a straight line in the plane through the origin.
Similarly, the proper subspaces of R3 are either the straight lines through the
origin or the planes containing the origin.
Sometimes for geometrical considerations, instead of identifying a row vector
x = (x1 , x2 ) (or the column vector x = (x1 , x2 )t ) in R2 with the point (x1 , x2 ) in
the plane, we think of x as representing the directed line segment from the origin
(0, 0) to the point (x1 , x2 ). This representation works effectively as vector addition
corresponds precisely to the addition of line segments by the Parallelogram law:
if L1 is the line segment from (0, 0) to (x1 , x2 ) and L2 the segment from (0, 0) to
(y1 , y2 ), then L1 + L2 is the line segment from (0, 0) to the point (x1 + y1 , x2 + y2 ).
Such identification of vectors with line segments enables geometrical problems to be
discussed in terms of vectors.

EXAMPLE 5 For any field F, we can consider the set Fn of all n-tuples of scalars from F. The
components being field elements, we can define equality and addition as well as
scalar multiplication component-wise in Fn in terms of addition and multiplication
in F exactly the way they were defined for Rn . With respect to these operations, Fn
becomes a vector space over F (see details of these operations in Section 3 of Chapter
1). An important example is the complex vector space Cn of all n-tuples of complex
numbers with respect to operations defined component-wise.
The elements of Fn (so of Cn too) can be considered either n-dimensional col-
umn or n-dimensional row vectors. Usually a column vector shall be written as the
transpose (a1 , a2 , . . . , an )t of a row vector.

EXAMPLE 6 Every homogeneous system of equations Ax = 0, where A is a fixed matrix in


Mm×n (F) and x is an n-dimensional column vector of variables, provides us with
a subspace of Fn (here, we consider Fn as the space of column vectors over F). We
claim that the n-dimensional column vectors in Fn , which are solutions of Ax = 0,
form a subspace of Fn , called the solution space of the given system of equations.
For, by the rules of matrix multiplication, A(y + z) = Ay + Az and A(cy) = c(Ay), so that
the sum of any two solutions and any scalar multiple of a solution are again solu-
tions of the system of equations Ax = 0. Since the zero vector is always a solution of
Ax = 0, our claim follows from Proposition (3.2.3).
Note that if A is an invertible matrix, then the solution space of Ax = 0 is none
other than the zero subspace of Fn .
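
As a small numerical illustration of this example (the matrix A and the particular solutions below are our own choices), the following Python/NumPy sketch confirms that the sum and a scalar multiple of solutions of Ax = 0 are again solutions.

    import numpy as np

    A = np.array([[1.0, -2.0, -3.0]])   # one homogeneous equation in three unknowns
    y = np.array([2.0, 1.0, 0.0])       # a solution: 2 - 2*1 - 3*0 = 0
    z = np.array([3.0, 0.0, 1.0])       # another solution: 3 - 0 - 3*1 = 0

    print(A @ y, A @ z)     # both [0.]
    print(A @ (y + z))      # still [0.]: the sum of solutions is a solution
    print(A @ (5.0 * y))    # still [0.]: scalar multiples of solutions are solutions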

EXAMPLE 7 It is easy to check that Propositions (1.3.1) and (1.3.3) of Chapter 1 imply that the
sets of matrices Mm×n (F) and Mn (F) are both vector spaces over the field F. There are
numerous subspaces of these two vector spaces. Some interesting ones of the vector
space of square matrices Mn (F) are listed below.
(a) The lower triangular matrices;
(b) The upper triangular matrices;
(c) The diagonal matrices;


(d) The scalar matrices;
(e) The set of all matrices in which some specified entries are zero. For example, if
all off-diagonal entries are zero, then the corresponding set is the subspace of
diagonal matrices. Also note, as another example, that the subset of all matrices
in Mn (F) whose first row is zero is a subspace of Mn (F).
EXAMPLE 8 Let X be a non-empty subset of either R or the set of natural numbers N, and let X R
be the set of all functions from X into R. Thus,

X R = { f | f : X → R}.

First, we define equality in X R . For any f, g ∈ X R , we let

f = g if and only if f (x) = g(x) for all x ∈ X.

We define the sum f + g of elements f, g ∈ X R and the scalar multiple a f for any
a ∈ R point-wise, that is, by giving their values at any x ∈ X as follows:

( f + g)(x) = f (x) + g(x)


(a f )(x) =a f (x),

where the sum f (x) + g(x) and the product a f (x) are computed as real numbers.
One easily verifies that X R with these laws of compositions is a real vector space.
Note that verification of the vector space axioms is straightforward because of the
properties of addition and multiplication of real numbers. The zero vector here is the
function z defined to satisfy z(x) = 0 for every x ∈ X. Consequently, the inverse of any
f ∈ X R is the function − f given by (− f )(x) = − f (x).
Consider the special case when X = [a, b], a closed interval in R. The set of all
continuous functions as well as the set of all differentiable functions from [a, b] into
R form two interesting subspaces of X R . We will denote the real vector space of all
real-valued continuous functions on a closed interval [a, b] as C[a, b].
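
For readers who like to experiment, the pointwise operations of this example are easy to realize in Python; the particular functions below are our own illustrative choices.

    import math

    # Two real-valued functions on [0, 1], thought of as vectors.
    f = lambda x: x**2 + math.sin(x)
    g = lambda x: 3.0 * math.exp(x)

    # Pointwise sum and scalar multiple, exactly as defined above.
    def add(f, g):
        return lambda x: f(x) + g(x)

    def smul(a, f):
        return lambda x: a * f(x)

    h = add(f, smul(2.0, g))                 # the function taking x to f(x) + 2 g(x)
    print(h(0.5), f(0.5) + 2.0 * g(0.5))     # the two values agree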

EXAMPLE 9 For any field F, a sequence in F can also be considered as a function f from the
natural numbers N into F. It is a convention to describe such a sequence f such
that f (n) = an for n = 1, 2, . . . by the symbol {an }. We can define operations on the
collection V of all sequences in F term-wise:

{an } + {bn } = {an + bn } and c{an } = {can }.

Since the terms of the sequences are field elements, these term-wise operations make
V into a vector space over F. Note that the zero of this vector space has to be the
constant sequence whose every term is the zero of the field.

EXAMPLE 10 Let R[x] be the collection of all polynomials with real coefficients. Take the laws of
compositions in R[x] to be the usual addition of polynomials and the multiplication
of a polynomial by a real number. With these operations, R[x] is a real vector space.
Observe that for any non-negative integer n, the set Rn [x] of all polynomials of
degree at most n is a subspace of R[x].
We point out that polynomials can be considered over any field; the coefficients
of such a polynomial are scalars from the field. It is clear that operations defined in
the same way as in R[x] will make the set F[x] of such polynomials a vector space
over F.

Sum of Subspaces
For the following result, which produces new subspaces from known ones, we need a new notation: if
A and B are non-empty subsets of a vector space V, then the sum

A + B = {a + b | a ∈ A, b ∈ B}

is the collection of all possible sums of the vectors of A and B.

Proposition 3.2.4. Let W1 and W2 be two subspaces of a vector space V over a field F. Then the
intersection W1 ∩ W2 and the sum W1 + W2 are subspaces of V.

Proof. Leaving the case of the intersection to the reader, we consider that of the sum. Since the zero
vector of V belongs to both the subspaces, it is in W1 + W2 too. Next, let u, v ∈ W1 + W2 . By definition,
there are elements wi , w'i ∈ Wi such that u = w1 + w2 and v = w'1 + w'2 . So, u + v = (w1 + w2 ) + (w'1 + w'2 ) =
(w1 + w'1 ) + (w2 + w'2 ), which is clearly in W1 + W2 . Now, if a ∈ F, then au = a(w1 + w2 ) = aw1 + aw2 is
in W1 + W2 as W1 and W2 are closed with respect to scalar multiplication. Thus, by Proposition (3.2.3)
W1 + W2 is a subspace of V. !

We must point out that the union W1 ∪ W2 need not be a subspace. See Exercise 9 of this section.
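
As a computational aside (with the subspaces chosen arbitrarily), generators of the sum W1 + W2 are obtained simply by pooling generators of W1 and of W2 , whereas the union W1 ∪ W2 is in general not closed under addition. A minimal Python/NumPy sketch:

    import numpy as np

    # W1 and W2 are the lines in R^3 spanned by w1 and w2 respectively.
    w1 = np.array([1.0, 1.0, 0.0])
    w2 = np.array([0.0, 1.0, 1.0])

    # Generators of W1 + W2: pool the generators; the rank measures the dimension.
    gens = np.column_stack([w1, w2])
    print(np.linalg.matrix_rank(gens))   # 2: the sum is a plane through the origin

    # The union is not a subspace: w1 + w2 lies on neither of the two lines.
    print(w1 + w2)                       # (1, 2, 1), not a multiple of w1 or of w2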
There is another natural way of constructing subspaces of a given vector space. We begin by the
simplest of such construction.

EXAMPLE 11 Fix any arbitrary vector v in a vector space V over a field F. Then, by Proposition
(3.2.3), one sees easily that the set {av | a ∈ F} of all scalar multiples of v is a subspace
of V. It is the subspace of V generated or spanned by v.
If W1 and W2 are subspaces of V generated by vectors v1 and v2 , respectively, then it is clear that the
sum W1 + W2 is the collection of all sums of the type a1 v1 + a2 v2 . Since such sums of scalar multiples
of vectors, known as linear combinations of vectors, are essential in examining vector space structures,
we discuss them now.
Let V be a vector space over a field F. Given a set of vectors v1 , v2 , . . . vm in V, a linear combina-
tion of the given vectors is a sum of the type
Σi ai vi = a1 v1 + a2 v2 + · · · + am vm ,

where a1 , a2 , . . . , am are arbitrary scalars from F. Choosing all the ai to be zeros, we see that the
zero vector can be realized as a linear combination of any set of vectors. A single vector v j can also
be thought of as a linear combination of v1 , v2 , . . . , vm by choosing a j = 1 and all the other ai to be
zeros.
We have already come across linear combinations of vectors in preceding chapters. For example,
in Propositions (1.3.7) and (1.3.8), we had expressed arbitrary matrices as linear combinations of unit
matrices.
Note: If W is a subspace of a vector space, any linear combination of a finite number of vectors of
W is a vector in W.
Linear combinations of vectors from a set of vectors give rise to a subspace known as the linear
span of the vectors.

Definition 3.2.5. Given a set S = {v1 , v2 , . . . , vm } of finitely many vectors of a vector space V over
a field F, the linear span ⟨S ⟩ of S (over F) is the set of all possible linear combinations of vectors of S ,
that is,

⟨S ⟩ = {a1 v1 + a2 v2 + · · · + am vm | ai ∈ F} = { Σi ai vi | ai ∈ F }.

If S has infinitely many vectors, then ⟨S ⟩ is defined to be the collection of all the linear combinations
of all possible finite subsets of S .
We may rephrase the definitions for the two cases as a single assertion as follows: ⟨S ⟩ is the set of
all finite linear combinations of vectors of S .

If S = {v}, then it is clear that ⟨S ⟩ is the set of all scalar multiples of v which includes v as well as
the zero vector, and as we had seen in the preceding example, is a subspace. Let us see some more
examples.

EXAMPLE 12 If S is a singleton {u} in R2 , then its linear span is the set {au} where a ranges over
R. Geometrically, if u is non-zero, the span is the straight line through the origin and the point u in
the plane R2 . For example, if we choose u = (1, 0), then its span is {(a, 0)} which is
the x-axis.
Similarly, the linear span of the unit vectors {(1, 0), (0, 1)} is the set {a(1, 0) +
b(0, 1)} = {(a, b)} for arbitrary reals a and b, and so must be the whole plane R2 .

EXAMPLE 13 In the same manner, we can see that if u and v are two vectors in R3 , neither of which is a
scalar multiple of the other, then their linear span will be the plane through the origin containing the two vectors. Thus, if
u = (1, 0, 0) and v = (0, 1, 0), then their span is the xy plane.

EXAMPLE 14 Consider the set {1, x, x2 } of vectors in the real vector space R[x]. The linear span of
this set is clearly
{a + bx + cx2 | a, b, c ∈ R},

and thus is the subspace of all polynomials of degree at most 2.


Consider next the infinite set S = {1, x, x2 , . . . , xn , . . . } consisting of all the non-
negative integral powers of x in R[x]. Any linear combination of finitely many pow-
ers of x will be in ⟨S ⟩. Moreover, we can choose these powers from S arbitrarily.
Therefore, any polynomial, irrespective of its degree and number of terms, can be
thought of as a linear combination of finitely many suitable powers from S , and so
belongs to ⟨S ⟩. It follows that ⟨S ⟩ = R[x].
In each of these examples, the linear span of a set of vectors turns out to be a subspace. This is true
in general. This and other basic facts about linear spans of vectors are collected in the next proposition.

Proposition 3.2.6. For a non-empty subset S of a vector space V, the following hold:
(a) The linear span ⟨S ⟩ of S is a subspace of V. In fact, ⟨S ⟩ is the smallest subspace of V containing
S.
(b) ⟨⟨S ⟩⟩ = ⟨S ⟩.
(c) If T is another non-empty subset of V such that T ⊂ S , then ⟨T ⟩ ⊂ ⟨S ⟩.
(d) ⟨S ∪ T ⟩ = ⟨S ⟩ + ⟨T ⟩ where the symbol + denotes the sum of subspaces.

Proof. We begin by noting that as any vector in ⟨S ⟩ is a linear combination of a finite number of vectors
in S , the sum of two such vectors is again a linear combination of only a finite number of vectors of S .
The zero vector can be realized as a linear combination of any set of vectors and so it is in ⟨S ⟩. Next,
it is clear that any scalar multiple of a linear combination in ⟨S ⟩ is again in ⟨S ⟩. Thus, by Proposition
(3.2.3), ⟨S ⟩ is a subspace of V.
For the other assertion in (a), observe that any finite linear combination of vectors of a subspace
must be in the subspace as a subspace is closed with respect to addition and scalar multiplication.
Thus, if W is a subspace of V containing S , then ⟨S ⟩, by definition, will be a subset of W. The proof
of (a) is complete.
In particular, it is shown that if W is a subspace of V, then ⟨W⟩, being the smallest subspace of
V containing W, is W itself. (b) therefore follows as ⟨S ⟩ itself is a subspace. (c) follows from the
definition of linear span. For the last assertion, it suffices to verify that the sum on the right-hand side
is the smallest subspace containing the union S ∪ T , and we leave the verification to the reader. !

We sometimes say that ⟨S ⟩ is the subspace generated by S , or spanned by S , and refer to S as
a generating set, or a spanning set of ⟨S ⟩. Note that there may be more than one generating set for
a subspace. In other words, there may be two different subsets S and T of a vector space such that
⟨S ⟩ = ⟨T ⟩. For example, it can be shown easily that R2 = ⟨(1, 0), (0, −1)⟩ even though we had seen that
R2 is the span of {(1, 0), (0, 1)}. Another point to note is that we can add certain vectors to a generating
set S without altering the linear span of S . For example, R2 is also the span of {(1, 0), (0, 1), (1, 1)} for
(1, 1) itself is a linear combination of the other two vectors. Thus, there is a need to identify minimal
generating sets, if any, of a linear span of vectors. We examine this and other aspects of generating
sets of a linear span in the following section.
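
Anticipating the matrix techniques used later in this chapter, membership in a linear span can be tested numerically: a vector v lies in the span of v1 , . . . , vk exactly when adjoining v does not increase the rank of the matrix having these vectors as columns. A minimal Python/NumPy sketch, with vectors chosen for illustration:

    import numpy as np

    s1 = np.array([1.0, 0.0, 2.0])
    s2 = np.array([0.0, 1.0, -1.0])
    v  = np.array([2.0, 3.0, 1.0])      # equals 2*s1 + 3*s2, so it lies in the span

    S  = np.column_stack([s1, s2])
    Sv = np.column_stack([s1, s2, v])
    # v is in the span of s1, s2 exactly when the two ranks coincide.
    print(np.linalg.matrix_rank(S) == np.linalg.matrix_rank(Sv))    # True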

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications:
(a) Any non-zero vector space over R has infinitely many distinct vectors.
(b) Any non-zero vector space over an infinite field has infinitely many distinct subspaces.
(c) The field Q of rationals is a real vector space.
(d) The field R is a vector space over Q.
(e) The set C[x] of all polynomials with complex coefficients is a real vector space.
(f) The set R[x] of all real polynomials is a complex vector space.
(g) The subset {a + bi | a, b ∈ Q} of C is a subspace of C over Q.
(h) The sum of the subspaces {(x, 0) ∈ R2 } and {(x, x) ∈ R2 } is R2 .
(i) W1 + W2 = W2 + W1 for any two subspaces W1 and W2 of a vector space V.
(j) If, for subspaces W1 , W2 and W3 of a vector space V, W1 + W2 = W1 + W3 , then W2 = W3 .
(k) If, for two subspaces W1 and W2 of a vector space V, W1 + W2 = V, then W1 ∩ W2 is the zero
subspace.
(l) Any two sets of vectors spanning a subspace of a vector space must have the same number
of vectors.
(m) If W is the linear span of a set of vectors S of a vector space, then no proper subset of S can
span W.
(n) The sum of two subspaces of a vector space contains each of the two subspaces.
(o) The set of all invertible matrices in Mn (F) is a subspace.
(p) The set of all non-invertible matrices in Mn (F) is a subspace.
(q) The empty set is a subspace of every vector space.
(r) For vectors v1 , v2 of a vector space,

⟨v1 , v2 ⟩ = ⟨v1 + v2 ⟩.

(s) R2 is a subspace of R3 .
(t) The set {(a, a) | a ∈ R} is a subspace of R2 .
(u) If U is a subspace of W and W is a subspace of V, then U is a subspace of V.
2. Prove the basic properties of vector space operations as stated in Proposition (3.2.2).
3. Let v be a fixed vector in a vector space V over a field F. Verify directly that the set {av | a ∈ F}
of all scalar multiples of v is a subspace of V.
4. Determine whether the following subsets S of the real vector spaces V form subspaces:
(a) S = {(x, mx) | x ∈ R} for any fixed real m; V = R2 ;
(b) S = {(x, y) | y = sin x , x ∈ R}; V = R2 ;
(c) S = {(x1 , x2 , x3 ) | x1 + x2 + x3 = 0, xi ∈ R}; V = R3 ;
(d) S = {(x1 , x2 , x3 ) | x1 + x2 + x3 = 1, xi ∈ R}; V = R3 ;
(e) S = {(x1 , x2 , x3 ) | x1 = x2 , x3 = 2x1 , xi ∈ R}; V = R3 ;
(f) S = {(x1 , x2 , x3 ) | x1 ≥ 0, xi ∈ R}; V = R3 ;
(g) S = {(x1 , x2 , x3 ) | x1 2 + x2 2 + x3 2 = 0, xi ∈ R}; V = R3 .
5. Verify that the subsets of the vector space Mn (F) consisting of the given matrices in each of the
following are subspaces:
(a) the symmetric matrices;
(b) the lower triangular matrices;
(c) the upper triangular matrices;
(d) the diagonal matrices;
(e) the scalar matrices;


(f) matrices whose all diagonal entries are zeros;
(g) matrices which commute with a fixed matrix A ∈ Mn (F);
(h) matrices whose first row has all zero entries.
6. In each of the following cases, determine whether the subset of the real vector space R[x] con-
sisting of the given polynomials forms a subspace:
(a) polynomials of degree at most n for any positive integer n;
(b) constant polynomials or polynomials of degree 0;
(c) polynomials whose constant term is zero;
(d) polynomials whose constant term is 1;
(e) polynomials whose derivatives vanish at x = 1.
Can this exercise be generalized to polynomials with complex coefficients?
7. Verify that the set I R of all real-valued functions defined on a closed real interval I = [a, b]
is a real vector space with respect to pointwise addition and pointwise scalar multiplication as
defined in Example 8. Show further that the subsets of all continuous and differentiable functions
on I, respectively, are subspaces.
8. Let V be the set of all real sequences {an }, that is, sequences in R. Verify that V is a real vector
space with respect to coordinate-wise addition and scalar multiplication as defined in Example
9. Prove further that, in each of the following, the subset of V consisting of the given sequences
forms a subspace:
(a) convergent sequences;
(b) sequences which converge to 0;
(c) sequences {an } such that Σn an 2 is finite.
9. Consider subspaces W1 = {(x, x) | x ∈ R} and W2 = {(x, 2x) | x ∈ R} of R2 . Verify that W1 ∪ W2 is
not a subspace of R2 .
10. Given subspaces W1 and W2 of a vector space V, show that W1 ∪ W2 is a subspace of V if and
only if W1 ⊂ W2 or W2 ⊂ W1 .
11. Prove the following variant of Proposition (3.2.3): A non-empty subset W of a vector space V
over a field F is a subspace if and only if aw1 + w2 ∈ W whenever a ∈ F, and w1 , w2 ∈ W.
12. Let W1 and W2 be subspaces of a vector space V. Prove that any subspace of V that contains
both W1 and W2 also contains the subspace W1 + W2 .
13. Let S and T be non-empty subsets of a vector space V. Prove that the linear span of S ∪ T is the
sum ⟨S ⟩ + ⟨T ⟩ of subspaces spanned by S and T .
14. Let S be a non-empty set of vectors in a vector space V, and let v ∈ S be such that v is a linear
combination of finitely many vectors of S , none of which is v. If S ' is the set obtained from S
by removing v from S , then show that ⟨S ⟩ = ⟨S ′ ⟩.
15. Prove that R2 is spanned by the vectors (1, 0) and (1, −1).
16. Let W be the subset of R3 given by

        W = { (x1 , x2 , x3 )t ∈ R3 | x1 − 2x2 − 3x3 = 0 }.

Verify that W is a subspace of R3 . Show that every vector of W can be expressed as

        (2a + 3b, a, b)t ,

for some real numbers a and b. Hence, find vectors v1 and v2 in R3 such that their span ⟨v1 , v2 ⟩
equals W.
17. Let W be the set of all vectors in R4 of the form

        (a − b, a − 2b, 2a − b, 3b)t

for arbitrary reals a, b. Show that W is a subspace of R4 by finding vectors in R4 whose span is
W.
18. Let V = R2 be the set of all ordered pairs of real numbers. In V, define addition component-wise
as usual, but define scalar multiplication in the following way:

a(x1 , x2 ) = (ax2 , ax1 )

for any a ∈ R. Is V, with these operations, a vector space over R?


19. Let V = Rn be the real vector space of all ordered n-tuples of real numbers with respect to
usual component-wise addition and scalar multiplication. Is V a vector space over the field C
of complex numbers with similar operations? Is V a vector space over the field Q of rational
numbers with similar operations?
20. Prove that a vector space over an infinite field cannot be the union of finitely many proper
subspaces.
The following three exercises are taken from the article ‘Generating Exotic-looking Vector
spaces’ by M. A. Carchidi, which appeared in the College Mathematics Journal, Vol. 29, No. 4
(Sept. 1998).
21. Consider the set of real numbers in the open interval (−1, 1). For any x, y ∈ (−1, 1) and for any
α ∈ R, define

        x ⊕ y = (x + y)/(1 + xy),      α ⊙ x = ((1 + x)^α − (1 − x)^α)/((1 + x)^α + (1 − x)^α).

Verify that for such x, y and α, x ⊕ y and α ⊙ x are indeed real numbers in (−1, 1). Then show
that (−1, 1) is a vector space over R with respect to addition ⊕ and scalar multiplication ⊙. (The
reader is expected to verify each of the vector space axioms.)
22. Let a be a fixed real number. For any real numbers x, y and α, define

        x ⊕ y = x + y − a,      α ⊙ x = αx + a(1 − α).

Show that R is a vector space over itself with respect to ⊕ and ⊙.


The preceding two exercises are special cases of the following general result.
23. Let f : R → V be a one–one map from the set R of real numbers onto a non-empty set V. Define
addition ⊕ and scalar multiplication ⊙ in V, for any x, y ∈ V and any α ∈ R, by

        x ⊕ y = f ( f −1 (x) + f −1 (y)),      α ⊙ x = f (α · f −1 (x)),

where f −1 : V → R is the inverse of f , and + and · are the usual addition and multiplication in R.
Verify that V is a real vector space with respect to ⊕ and ⊙. (A numerical illustration of this
construction, for one concrete choice of f , is sketched below.)
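
The Python sketch below illustrates the construction of Exercise 23 for one concrete choice of f , namely f = tanh; this choice is ours and is not prescribed by the exercise. With it, V is the interval (−1, 1), and the transported operations agree numerically with the explicit formulas of Exercise 21.

    import math

    def f(t):            # our assumed one-one map from R onto V = (-1, 1)
        return math.tanh(t)

    def f_inv(x):        # its inverse
        return math.atanh(x)

    def oplus(x, y):     # x (+) y = f(f^{-1}(x) + f^{-1}(y))
        return f(f_inv(x) + f_inv(y))

    def smul(alpha, x):  # alpha (.) x = f(alpha * f^{-1}(x))
        return f(alpha * f_inv(x))

    x, y, alpha = 0.3, -0.5, 2.7
    # Compare with the explicit formulas of Exercise 21: both prints give True.
    print(math.isclose(oplus(x, y), (x + y) / (1 + x * y)))
    print(math.isclose(smul(alpha, x),
                       ((1 + x)**alpha - (1 - x)**alpha) / ((1 + x)**alpha + (1 - x)**alpha)))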

3.3 LINEAR INDEPENDENCE


We have seen in the preceding section that a given set of vectors generating a subspace of a vector
space may not be the ideal one. This section deals with the important question of finding suitable
generating sets for vector spaces. It is reasonable to expect that an ideal generating set for a vector
space should be minimal in the sense that no proper subset of it can span the vector space. We will see
that such minimal generating sets for a large class of vector spaces will allow us to describe vectors in
these spaces in terms of coordinates. The use of coordinates, in turn, will permit matrix methods for
analysis of questions even in abstract vector spaces.
The search for minimal generating sets depends on another important idea, that of linear indepen-
dence of vectors.

Definition 3.3.1. A finite set S of vectors {v1 , v2 , . . . , vm } of a vector space V over a field F is said
to be linearly independent over F if the relation

a1 v1 + a2 v2 + · · · + am vm = 0, ai ∈ F,

holds only when all the scalars ai = 0 in F. We also say that S is linearly dependent over F if S is not
linearly independent. An arbitrary set S of vectors in V is linearly independent over F if every finite
subset of S is linearly independent. Otherwise, S is linearly dependent over F.

Note that {v1 , v2 , . . . , vm } is linearly dependent over F if and only if a relation

a1 v1 + a2 v2 + · · · + am vm = 0, ai ∈ F (3.2)

holds where not all scalar coefficients ai are zeros. Such a relation among a set of linearly dependent
vectors is sometimes referred to as a relation of linear dependence.
A linear combination, such as the preceding one, where at least one scalar coefficient is non-zero,
may be termed as non-trivial as opposed to a trivial one in which all the coefficients are zeros. Using
these terms, we may rephrase our definition of linear independence as follows: a finite set of vectors
is linearly independent if and only if no non-trivial linear combination of these vectors results in the
zero vector. Similarly, the set is linearly dependent if and only if some non-trivial linear combination
equals the zero vector.
A remark about the definition of linear independence is in order; the concept of linear independence
depends closely on the scalar field associated with the vector space. See Example 21 discussed later.
Even then, if there is no confusion about the underlying field, we may drop the words ‘ over F’ while
talking about linear independence.
The observations in the following result will prove quite useful.
Proposition 3.3.2. Let S be a non-empty subset of a vector space V.


(a) If S contains the zero vector of V, then S is linearly dependent. Also, if S has two equal vectors,
then S is linearly dependent.
(b) If S is linearly independent, then so is every subset of S .
(c) If S is linearly dependent, then so is every subset of V containing it.
(d) A single non-zero vector in V is linearly independent.
(e) Two vectors of V are linearly dependent if and only if one is a scalar multiple of the other.

The verifications of these assertions are straightforward, and left to the reader.
To make the idea of linear independence clear, let us look at some examples.

EXAMPLE 15 The vectors (1, 0) and (0, 1) of R2 are linearly independent over R. For, if x1 (1, 0) +
x2 (0, 1) = (0, 0) for real numbers x1 and x2 , then it follows that (x1 , x2 ) = (0, 0) or,
equivalently, that x1 = x2 = 0.
A similar verification shows that the m vectors e1 = (1, 0, 0, . . . , 0), e2 =
(0, 1, 0, . . . , 0), . . . , em = (0, 0, . . . , 0, 1) in Rm (or even in Fm for any field F),
where e j has 1 in the jth place and zeros elsewhere, are linearly independent over R
(over F).
Note that the relation of linear dependence for vectors of a vector space V has the
zero vector of V on the right-hand side, not the zero scalar.

EXAMPLE 16 Let us examine whether the vectors (a, b, c), (1, 1, 0) and (0, 1, 0) of R3 are linearly
independent over R, where none of the real numbers a, b and c is zero. The relation
of linear dependence now reads

x1 (a, b, c) + x2 (1, 1, 0) + x3(0, 1, 0) = (0, 0, 0),

which is equivalent to three equations

x1 a + x2 = 0
x1 b + x2 + x3 = 0
x1 c = 0.

Solving these for the scalars x1 , x2 and x3 , we see that x1 = 0 as c ≠ 0, whence the
first will force x2 = 0 and then the second implies that x3 = 0. Thus, the given vectors
are linearly independent over R.
This example illustrates an important technique for determining the linear independence of vectors
in R3 , or in general, of vectors in the vector space Fm over F. Recall that those three equations, because
of column-row multiplication (see Equation 1.7 in Section 1.2) amount to a single vector equation

           [ a ]      [ 1 ]      [ 0 ]   [ 0 ]
        x1 [ b ] + x2 [ 1 ] + x3 [ 1 ] = [ 0 ]
           [ c ]      [ 0 ]      [ 0 ]   [ 0 ]
as in Proposition (3.3.3). However, we go a step further and express the vector equation as the follow-
ing equivalent matrix equation:

        [ a  1  0 ] [ x1 ]   [ 0 ]
        [ b  1  1 ] [ x2 ] = [ 0 ] .
        [ c  0  0 ] [ x3 ]   [ 0 ]

It is, therefore, clear that the theory of matrix equations developed in previous chapters will be useful
in dealing with questions about linear independence. We will discuss this point in detail after the
following examples.

EXAMPLE 17 It is easy to see that the set {1, x, x2 } of vectors in R[x] is linearly independent,
whereas {1, x, x2 , 5 +2x−3x2 } is linearly dependent. For, a polynomial, in particular,
a polynomial in 1, x and x2 is the zero polynomial (the zero vector of R[x]) only when
all its coefficients are zeros. For the asserted dependence of 1, x, x2 and 5 + 2x − 3x2 ,
note that for scalars a1 = −5, a2 = −2, a3 = 3 and a4 = 1, we do have a relation of
linear dependence.

EXAMPLE 18 In fact, the infinite set {1, x, x2 , . . . , xn , . . . } of all the non-negative powers of x as
vectors of R[x] is linearly independent, as no non-trivial linear combination of any
finite set of powers of x can give us the zero polynomial.

EXAMPLE 19 Consider the real vector space Mm×n (R) of all m × n matrices with real entries. The
subset {ei j | 1 ≤ i ≤ m, 1 ≤ j ≤ n} of the unit matrices is linearly independent over R.
For, if the sum Σi, j ai j ei j equals the zero vector (which in this case is the zero matrix),
then the matrix [ai j ] is the zero matrix which implies that each scalar ai j = 0.

EXAMPLE 20 Consider the subset {sin t, cos t} of the real vector space C[−π/2, π /2] of all continu-
ous real-valued functions on the closed interval [−π/2, π /2]. These two functions are
linearly independent, for otherwise according to one of our observations in Proposi-
tion (3.3.2), one must be a scalar multiple of the other, say, sin t of cos t. Since these
two are functions, we conclude that

sin t = a cos t for all t ∈ [−π/2, π /2]

for some fixed scalar a. This is absurd since no single a can work for all such t: putting t = 0 forces
a = 0 (as sin 0 = 0 and cos 0 = 1), and then the relation fails at t = π/4. (Compare the graphs.) Our assertion follows.
EXAMPLE 21 The set {1, i} of C is linearly independent over R if we consider C as a vector space
over R, whereas as vectors of the complex vector space they are dependent as the
following relation of linear dependence shows:

(i)1 + (−1)i = 0.

Note that in this relation, the scalars a1 = i and a2 = −1 are from the base field C.
Testing Linear Independence by Using Matrices


Now, we discuss a systematic procedure of checking the linear independence or dependence of vec-
tors belonging to Fm by using matrices and their echelon forms. The key to the procedure is the
following observation: the scalars a1 , a2 , . . . , am in the relation for linear dependence of the vectors
v1 , v2 , . . . , vm form a non-zero solution (as an element of Fm ) of the vector equation:

x1 v1 + x2 v2 + · · · + xm vm = 0.

Hence, we have the following alternative definition of linear independence.

Definition 3.3.3. The vectors v1 , v2 , . . . , vm of a vector space V over a field F are linearly inde-
pendent over F if and only if the vector equation

x1 v1 + x2 v2 + · · · + xm vm = 0

has a unique solution in Fm , namely, the zero solution given by x1 = x2 = · · · = xm = 0.

Consider a set of n vectors v1 , v2 , . . . , vn in Fm . For our purpose, it will be convenient to think of


these vectors as m-dimensional column vectors. Let A be the m × n matrix over F, whose jth column
is the column vector v j :

A = [v1 v2 · · · vn ].

Also for a set of n scalars a1 , a2 , . . . , an in F, let a be the n-dimensional column vector given by
a = (a1 , a2 , . . . , an )t .

With this notation in place, the linear combination

a 1 v1 + a 2 v2 + · · · + a n vn

can be expressed, by column-row multiplication (see Equation 1.7), as the following matrix product:

Aa = a1 v1 + a2 v2 + · · · + an vn . (3.3)

Note that Aa ∈ Fm .
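
Equation (3.3) is easy to check numerically; in the Python/NumPy sketch below the column vectors and the scalars are chosen arbitrarily for illustration.

    import numpy as np

    v1 = np.array([1.0, 1.0, 0.0, 2.0])
    v2 = np.array([0.0, -1.0, 1.0, 0.0])
    v3 = np.array([-1.0, 0.0, -1.0, -2.0])
    A  = np.column_stack([v1, v2, v3])    # the columns of A are v1, v2, v3
    a  = np.array([2.0, -1.0, 3.0])

    # Aa is exactly the linear combination a1*v1 + a2*v2 + a3*v3 of the columns.
    print(np.allclose(A @ a, 2.0 * v1 - 1.0 * v2 + 3.0 * v3))    # True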
To derive a workable definition of linear independence of vectors in Fm in terms of solutions of
matrix equations, we next introduce x, the n-dimensional column vector of variables:

x = (x1 , x2 , . . . , xn )t

We are now ready with the characterization of linear independence of vectors in Fm in terms of matrix
equations.
Proposition 3.3.4. Let v1 , v2 , . . . , vn be vectors in Fm and let A ∈ Mm×n (F) be the matrix whose
jth column is the m-dimensional column vector v j . The vectors v1 , v2 , . . . , vn are linearly independent
over F if and only if the matrix equation

Ax = 0

has only the trivial solution x1 = x2 = · · · = xn = 0.

Note that 0 in the matrix equation is in Fm .

Proof. By Equation (3.3), Ax is the linear combination x1 v1 + x2 v2 + · · · + xn vn so the proposition


follows from Definition (3.3.3) of linear independence of vectors. !

This proposition, coupled with Corollary (2.4.4) from Chapter 2, provides us with the following
test for linear independence.

Corollary 3.3.5. Notation as in the last Proposition. Let R be either a row echelon form or the
reduced row echelon form of A. If every column of R is a pivot column, then the vectors are linearly
independent. Thus, if even a single column of R fails to be a pivot column, then the vectors are linearly
dependent.

As an application of the corollary, let us determine whether the vectors (1, 1, 0, 2), (0, −1, 1, 0) and
(−1, 0, −1, −2) in R4 are linearly independent over R. The matrix formed by these vectors as columns
is given by

        A = [ 1   0  −1 ]
            [ 1  −1   0 ]
            [ 0   1  −1 ]
            [ 2   0  −2 ] .

One easily sees that the reduced row echelon form of A is

        R = [ 1  0  −1 ]
            [ 0  1  −1 ]
            [ 0  0   0 ]
            [ 0  0   0 ] .

Since R has only two pivot columns, we conclude, by the preceding corollary, that the given vectors
are linearly dependent over R.
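
The pivot count used above can also be read off from the rank of A. The following Python/NumPy sketch (an informal check, not a substitute for the row reduction) confirms the conclusion for the same matrix.

    import numpy as np

    # Columns are the vectors (1, 1, 0, 2), (0, -1, 1, 0), (-1, 0, -1, -2) of R^4.
    A = np.array([[1.0,  0.0, -1.0],
                  [1.0, -1.0,  0.0],
                  [0.0,  1.0, -1.0],
                  [2.0,  0.0, -2.0]])

    # The columns are linearly independent exactly when the rank equals the
    # number of columns (every column of the echelon form is then a pivot column).
    print(np.linalg.matrix_rank(A))                   # 2
    print(np.linalg.matrix_rank(A) == A.shape[1])     # False: the vectors are dependent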

Some Important Results


Two consequences, which are of great theoretical importance, must be singled out.

Corollary 3.3.6. A set of m vectors in Fm are linearly independent if and only if the m × m matrix
over F, formed by the vectors as its columns, is invertible.

Corollary 3.3.7. Any set of n vectors in Fm is linearly dependent if n > m.


The first corollary follows from Corollary (2.5.3), which states that a square matrix is invertible if
and only if its reduced row echelon form is the identity matrix in which case clearly all the columns
of the reduced row echelon form are pivot columns. For the second corollary, we need to note that if
the number of columns in a matrix is more than the number of rows, then there will be columns in the
echelon form of the matrix which cannot be pivot columns.
The last corollary can be rephrased as follows:

Corollary 3.3.8. The number of vectors in any linearly independent set of vectors in Fm cannot
exceed m.

Thus, in Fm we have this concept of maximal linearly independent sets of vectors; these are the sets
of linearly independent vectors in Fm which, when extended by adding even a single vector, no longer
remain linearly independent. This is a concept as important as the one of minimal generating set. We
now examine how to arrive at maximal linearly independent set of vectors in an arbitrary vector space
V.
Let us begin by extending the smallest possible linearly independent set of vectors in V. Re-
call that any non-zero vector in a vector space is linearly independent. So, starting with a non-
zero vector v1 , how do we choose v2 such that {v1 , v2 } is still linearly independent? Now, by the
last observation in Proposition (3.3.2), {v1 , v2 } is linearly independent if and only if v2 is not a
scalar multiple of v1 , or in other words, if and only if v2 ∉ ⟨v1 ⟩, as the span of v1 is the set of all
its scalar multiples. Fix a v2 such that {v1 , v2 } is linearly independent. Now, if for another vector
v3 ,

a 1 v1 + a 2 v2 + a 3 v3 = 0

for scalars ai , not all of which are zeros, it follows that a3 ≠ 0. For, otherwise the linear independence
of {v1 , v2 } will force both a1 and a2 to be zeros. Therefore, we can divide the relation by a3 to express
v3 as a linear combination of v1 and v2 , which is another way of saying that v3 is in the span of {v1 , v2 }.
Thus, we can add v3 to the linearly independent set {v1 , v2 } to obtain a larger linearly independent set
if and only if v3 is not in the span of {v1 , v2 }. Continuing in the same vein, we can prove the general
case as given in the next proposition.

Proposition 3.3.9. Let S be a finite set of linearly independent vectors of a vector space V over
a field F. Then, for any v ∈ V, the extended set S ∪ {v} is linearly independent over F if and only if
v ∉ ⟨S ⟩, where ⟨S ⟩ is the linear span of S .

The connection between linear dependence and linear span that came out in the discussion preced-
ing the last proposition gives rise to the following explicit characterization of linearly dependent set of
vectors.

Proposition 3.3.10. A set {v1 , v2 , . . . , vm } of non-zero vectors of a vector space V is linearly


dependent over the base field F if and only if for some j > 1, the vector v j is a linear combination of
the preceding vectors v1 , v2 , . . . , v j−1 in the list.

Note that the proposition makes sense only when m ≥ 2.


Proof. Note that if, for some j > 1, v j is the linear combination

a1 v1 + a2v2 + · · · + a j−1 v j−1 ,

then not all the scalars ai are zeros (for, otherwise v j = 0). Therefore, we have the following relation
of linear dependence:

a1 v1 + · · · + a j−1 v j−1 + (−1)v j + 0v j+1 + · · · + 0vm = 0.

Conversely, suppose that the given vectors are linearly dependent. In any relation of linear dependence
for these vectors, such as Σi ai vi = 0, choose the largest subscript j for which a j ≠ 0. Thus, all ak = 0
for k > j so the relation actually does not contain the terms corresponding to the vectors vk for all k > j.
Dividing the relation by the non-zero scalar a j then allows us to express v j as a linear combination of
v1 , v2 , . . . , v j−1 . !
Note that if v j is a linear combination of v1 , v2 , . . . , v j−1 , then the span of v1 , v2 , . . . , v j−1 is the
same as the span of v1 , v2 , . . . , v j−1 , v j . (See Exercise 14 of the last Section). We exploit this simple
fact for the next corollary.

Corollary 3.3.11. Let the subspace W of V be spanned by S = {v1 , v2 , . . . , vm }. If for some
k, 1 ≤ k ≤ m, the set {v1 , v2 , . . . , vk } is linearly independent, then it can be extended to a subset S ′ of
S consisting of linearly independent vectors such that W = ⟨S ′ ⟩.

Proof. If S is linearly independent, then there is nothing to prove as we may choose S ' = S . Otherwise,
it follows from the preceding proposition that some v j in S is a linear combination of v1 , v2 , . . . , v j−1
where j − 1 ≥ k. If the set formed by deleting v j from S is linearly independent, it can be chosen as
S ' ; otherwise continue the process of weeding out any vector which is a linear combination of the
preceding vectors in the new set. This process must stop after a finite number of steps to yield S ' , as
k vectors of S are already linearly independent. The statement about the span of S ' follows from the
remark preceding this corollary. !
This corollary shows that it is possible to extract a linearly independent subset from among the
vectors of a finite generating set of a vector space V such that the linearly independent subset itself
will span V. This observation leads to the important concepts of bases and dimensions of vector spaces.
We deal with these two concepts in detail in the next section.
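
The weeding-out procedure in the proof of Corollary (3.3.11) is easy to imitate on a computer. The Python/NumPy sketch below is one convenient left-to-right variant: a vector is kept only if it does not lie in the span of the vectors already kept, which is detected by a rank computation. The spanning set used is our own illustrative choice.

    import numpy as np

    def independent_spanning_subset(vectors):
        # Keep a vector only if it is not already in the span of the kept ones.
        kept = []
        for v in vectors:
            candidate = np.column_stack(kept + [np.asarray(v, dtype=float)])
            if np.linalg.matrix_rank(candidate) == len(kept) + 1:
                kept.append(np.asarray(v, dtype=float))
        return kept

    # A spanning set of a plane in R^3 with an obvious redundancy.
    S = [np.array([1.0, 0.0, 1.0]),
         np.array([0.0, 1.0, 1.0]),
         np.array([1.0, 1.0, 2.0])]     # the sum of the first two
    for v in independent_spanning_subset(S):
        print(v)                        # only the first two vectors survive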

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications; all the
given vectors are in an arbitrary field unless otherwise mentioned.
(a) If v1 , v2 , . . . , vm are linearly dependent vectors, then vm is a linear combination of the other
m − 1 vectors.
(b) Every non-empty subset of a linearly independent set of vectors is linearly independent.
(c) Every set of vectors containing a linearly independent set is linearly independent.
(d) Every non-empty subset of a linearly dependent set of vectors is linearly dependent.
(e) Every set of vectors containing a linearly dependent set is linearly dependent.
(f) If v is in the span of a non-empty set S of vectors, then any u ∈ S is in the span of the set
obtained from S by replacing u by v.

(g) The vectors 1, √2 are linearly independent in R considered as a vector space over the field Q.
(h) The vectors 1, √2, √8 are linearly independent in R considered as a vector space over the field
Q.
(i) A subset of a vector space is linearly independent if none of its vectors is a linear combination
of the others.
(j) In R3 , there cannot be a set of four vectors such that any three of them are linearly indepen-
dent.
(k) The real polynomials 1 − 2x + 3x2 − 4x3 − x20 and 2 − 4x + 6x2 − 8x3 + x20 are linearly depen-
dent over R.
(l) The functions 1, cos t and sin t, as vectors of C[−π/2, π /2], are linearly dependent over R.
2. Prove the assertions in Proposition (3.3.2).
3. Prove Corollary (3.3.8).
4. Are the following vectors in R3 in the span of (1, 1, 1) and (1, 2, 3)?
(a) (1, 0, 2)
(b) (−1, −2, 3)
(c) (3, 4, 5)
5. Do the vectors (1, 3, 2, 4), (−2, 4, 7, −1), (0, 2, 7, −1) and (−2, 1, 0, −3) span R4 ?
6. Give a spanning set of five vectors of the vector space R3 . Also, find a linearly independent
subset of the spanning set.
7. Verify that S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} spans the vector space C3 of ordered triples of com-
plex numbers over the field C of complex numbers. If C3 is considered a vector space over the
field of real numbers, then show that S does not span C3 . Find a spanning set of C3 if it is
regarded as a real vector space.
8. Given that v1 , v2 and v3 are linearly independent in a vector space V over a field F, determine
whether the following sets are linearly independent:
(a) v1 , v1 + v2 , v1 + v2 + v3
(b) v1 − v2 , v2 − v3 , v3 − v1
9. Find three vectors in R3 which are linearly dependent but such that any two of them are linearly
independent.
10. In each of the following, determine whether the given vectors are linearly independent in the
indicated vector space. All the vector spaces are over R.
(a) (1, −2, 2), (2, −4, 2) in R3
(b) (1, −2, 2), (2, −4, 2), (4, −8, 2) in R3
(c) (0, −1, 2, 3), (1, 2, −1, 0), (−2, −8, 6, 6) in R4
(d) x2 − x, 2 + 3x − x2, −4 − 8x in R3 [x]
(e)  [ 1  −2 ]    [ 2  −4 ]    [ 3  −4 ]
     [ 0   1 ] ,  [ 1   2 ] ,  [ 2  −4 ]    in M2 (R)
(f)  [ 1   1 ]    [ 1  0 ]    [  0  1 ]    [  1  0 ]
     [ 0   0 ] ,  [ 0  1 ] ,  [  1  1 ] ,  [  1  0 ]    in M3×2 (R)
     [ 0  −1 ]    [ 0  1 ]    [ −1  0 ]    [ −1  1 ]
(g) sin 2t, cos 2t, sin t cos t in C[0, 1]
Recall that we are following the convention that vectors in Fn can be described as either row
vectors or column vectors.
11. Find the values of a for which the vector (3, 3, a)t in R3 is in the span of (1, −1, 1)t and (1, 2, −3)t .
12. Let A ∈ Mn (F). Show that A is invertible if and only if the rows of A, regarded as vectors in Fn ,
are linearly independent.

3.4 BASIS AND DIMENSION


We had observed, at the end of the preceding section, that any set, spanning a vector space, always
contains a linearly independent subset which too spans the vector space. On the other hand, Propo-
sition (3.3.9) implies that the process of extending linearly independent sets can be carried on till
the extended linearly independent set spans the vector space. The following definition, therefore, is a
natural one.

Definition 3.4.1. A linearly independent subset of a vector space is a basis of the vector space if it
spans the vector space.

Thus, it is clear that a basis of a vector space is a maximal linearly independent set of vectors in
that space.
Because both the concepts of span and independence depend on the scalar field F, a basis too
depends on F. However, as has been our practice in similar situations, we do not emphasize this
dependence if there is no occasion for confusion.
A fundamental fact is that any vector space does possess a basis. For a large class of vector spaces,
it is quite easy to see this. The following definition introduces these spaces.

Definition 3.4.2. A finite-dimensional vector space is one which has a finite generating set. A
vector space which cannot be generated by any finite subset is an infinite-dimensional vector space.

According to Corollary (3.3.11), any finite generating set of a vector space contains a linearly
independent subset which spans the vector space. Thus, a finite-dimensional vector space has a finite
basis.
Even if a vector space is not finite-dimensional, it has a basis. But any proof of this fact depends
on deep concepts of set theory, such as the axiom of choice, which are beyond the scope of this book.
Let us consider some finite-dimensional vector spaces and their bases.

EXAMPLE 22 Rm is a finite-dimensional vector space. The vectors e1 , e2 , . . . , em , where

e j = (0, . . . , 0, 1, 0, . . . , 0)
is the vector having 1 at the jth place and zeros everywhere else, span Rm . Indeed,
any vector (x1 , . . . , x j , . . . , xm ) ∈ Rm is the linear combination Σk xk ek of these e j .
Moreover, these vectors e j are linearly independent over R as the relation

Σk xk ek = 0

implies that (x1 , . . . , x j , . . . , xm ) is the zero vector whence one concludes that each
x j = 0. Thus, {e1 , e2 , · · · , em } is a basis of Rm . This basis is usually referred to as
the standard basis of Rm .
More generally, if 1 and 0 denote the identities of a field F, then one can verify
that the vectors e j of Fm , similarly defined, form the standard basis of Fm over the
field F.

EXAMPLE 23 An argument based on degrees of polynomials shows that R[x] cannot be finite-
dimensional. For, suppose that polynomials f1 (x), f2 (x), . . . , fk (x) span R[x]. Note:
For any two polynomials f (x) and g(x) and scalars a, b, the degree of the non-zero
polynomial a f (x) + bg(x) is the larger of the two degrees of f (x) and g(x). It follows
that if n is the largest of the degrees of the given polynomials f1 (x), f2 (x), . . . , fk (x),
then n must be the degree of any non-zero linear combination a1 f1 (x) + a2 f2 (x) +
· · · + ak fk (x). Thus the degree of any non-zero polynomial in the span of the given
polynomials cannot exceed n. Since R[x] has polynomials of arbitrarily large de-
grees, our supposition is absurd.

EXAMPLE 24 However, the real vector space Rn [x] is finite-dimensional. It is easy to see that the
n + 1 vectors 1, x, x2 , . . . , xn not only span Rn [x], but also are linearly independent
over R, so they form a basis of Rn [x]. We will call this basis the standard basis of
Rn [x].
Similarly, the space of all polynomials, with coefficients from a field F, whose
degrees do not exceed a fixed positive integer, is finite-dimensional considered a
vector space over F.

EXAMPLE 25 Consider the vector space Mm×n (F) of m × n matrices with entries from a field F. As
we have seen in Proposition (1.3.8) of Chapter 1, the mn unit matrices ei j span this
vector space. They form a basis of Mm×n (F) for, if for some scalars xi j
∑i, j xi j ei j = 0,

where the matrix 0 is the zero (vector) of Mm×n (F), then the m × n matrix [xi j ] itself
is the zero matrix 0 showing that each scalar xi j is zero.
Note that this basis of Mm×n (F) is similar to the standard basis of Rm .
EXAMPLE 26 The space Mn (F) of square matrices has a basis consisting of n2 unit matrices ei j , 1 ≤
i, j ≤ n.

We leave it to the reader to verify that the subspace of Mn (F) consisting of sym-
metric matrices is spanned by e11 , e22 , . . . , enn and the sums ei j + e ji for i ≠ j. Thus,
this subspace has n(n + 1)/2 matrices forming a basis.
EXAMPLE 27 Let us come back to R2 which has the standard basis {e1 , e2 }, where e1 = (1, 0) and
e2 = (0, 1). It is quite simple to show that the vectors (1, 0) and (1, 1) form another
basis of R2 . In fact, we can find infinitely many pairs of vectors in R2 which form
bases of R2 . Refer to Corollary (3.3.6) for an easy method to check whether a given
pair forms a basis or not.

Dimension
In fact, the last example is typical. In any vector space, there will be many choices for a basis. In later
chapters, a lot of effort will be spent in choosing a basis appropriate for any given application. So the
following result is fundamental.

Theorem 3.4.3. Any two bases of a finite-dimensional vector space have the same number of vec-
tors.

The proof is an easy consequence of the following lemma which implies that the number of vectors
in any basis cannot exceed the same for any other basis. Note that the lemma is a generalization of
Corollary (3.3.7).

Lemma 3.4.4. Let {v1 , v2 , . . . , vm } be a basis of a vector space V over a field F. If vectors
u1 , u2 , . . . , un are linearly independent in V, then n ≤ m.

Proof. It is sufficient to show that any set of n vectors in V is linearly dependent if n > m. In other
words, it is sufficient to show that given any arbitrary set of n vectors u1 , u2 , . . . , un where n > m, it is
possible to find scalars x1 , x2 , . . . , xn in F, not all zero, such that

x1 u1 + x2 u2 + · · · + xn un = 0.

To find such scalars, we first express each u j as a linear combination of the given basis vectors, which
is possible as the basis vectors span V. So, for each fixed j (1 ≤ j ≤ n), let a1 j , a2 j , . . . , am j be m
scalars such that
u j = a1 j v1 + a2 j v2 + · · · + am j vm = ∑i ai j vi .

Let A be the m × n matrix [ai j ] over F formed by these mn scalars in such a way that the coefficients for u j
form the jth column of A. Consider now the homogeneous system of m equations in n variables given
by the matrix equation:

Ax = 0.

Since by hypothesis n > m, Proposition (2.5.1) about such system implies that this system has a non-
zero solution in Fn . In other words, it is possible to choose x1 , x2 , . . . , xn in F, not all zero (constituting
the non-zero solution), such that


∑ j ai j x j = 0    for i = 1, 2, . . . , m.

It follows that
x1 u1 + x2 u2 + · · · + xn un = ∑ j x j u j = ∑ j x j (∑i ai j vi ) = ∑i (∑ j ai j x j ) vi = 0

as the coefficient of each vi is zero by our choice of x j . This completes the proof of the lemma. !
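The proof is constructive: the required dependence relation comes from any non-zero solution of Ax = 0. The following small sketch (an added illustration assuming the SymPy library; it is not part of the text) carries the computation out for m = 2 and n = 3.

from sympy import Matrix

# Illustration of the proof of Lemma (3.4.4): n = 3 vectors in a space of
# dimension m = 2 must be linearly dependent.  The columns of A hold the
# coefficients a_ij expressing u_1, u_2, u_3 in terms of a basis v_1, v_2.
A = Matrix([[1, 2, 3],
            [4, 5, 6]])

# Since n > m, the homogeneous system Ax = 0 has a non-zero solution.
x = A.nullspace()[0]     # one non-zero solution, e.g. (1, -2, 1)^t
print(x)

# The same scalars give x1*u1 + x2*u2 + x3*u3 = 0, because the coefficient
# of each v_i in that combination is the i-th entry of A*x:
print(A * x)             # the zero vector, exactly as in the proof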

Since the number of vectors in any linearly independent set in a vector space of dimension m cannot
exceed m, a basis is sometimes referred to as a maximal linearly independent set. But the main use
of the theorem is in enabling one to assign a number to a finite-dimensional vector space which is
intrinsic to the space and independent of the choice of a basis.

Definition 3.4.5. The dimension of a finite-dimensional vector space V, denoted by dim V, is the
number of vectors in any of its bases. The dimension of the zero vector space {0} is defined as zero.

The convention about the dimension of the zero space is necessary as it has no basis. Another
way around this difficulty is to declare that the zero space is spanned by the empty set of linearly
independent vectors, and thus its dimension is zero.
From our examples of bases of vector spaces, we immediately conclude that:
(a) for any field F, dim Fm = m, as the standard basis of Fm has m vectors;
(b) dim Mm×n (F) = mn as the unit matrices form a basis. In particular, the dimension of the vector space
Mn (F) is n2 ;
(c) the real vector space Rn [x] has dimension (n + 1) as 1, x, x2 , . . . , xn form a basis;
(d) the subspace R2 [x] of Rn [x] for n > 2 has dimension 3 as 1, x, x2 form a basis of it.

Basis of a Subspace
In the last example, we could find a finite generating set for a subspace of a finite-dimensional
vector space. However, it is not clear that an arbitrary subspace of a finite-dimensional vector space
must have a finite generating set. The next proposition uses a dimension argument to make this clear.

Proposition 3.4.6. A subspace W of a finite-dimensional vector space V is necessarily finite-


dimensional. In fact, dim W ≤ dim V.

Proof. If W is the zero space {0}, the result is trivial by our convention about the dimension of the
zero space. So, assume that W is non-zero. Then, W certainly has linearly independent vectors. If any
finite set of such linearly independent vectors in W does not span W, by Proposition (3.3.9), it can be
expanded to a larger, but still a finite, set of independent vectors. However, vectors independent in W
are also independent in V and so their number cannot exceed dim V. That shows that the expansion of
a finite set of linearly independent vectors in W to larger sets cannot be continued indefinitely, and so
after a finite number of steps must end resulting in a spanning set of W, that is, in a basis of W. Since
the basis vectors of W are linearly independent in V too, it follows that dim W ≤ dim V. !

We want to record the last point noted in the proof as a corollary.

Corollary 3.4.7. Any linearly independent set of vectors in a non-zero subspace W of a finite-
dimensional vector space V can be expanded to a basis of W.

Note that W can be chosen as V also in the corollary. In that case, the corollary yields the following
useful result.

Corollary 3.4.8. Any set of m linearly independent vectors in an m-dimensional vector space forms
a basis.

The last proposition comes in handy in describing subspaces of a vector space in terms of dimensions.
We classify the subspaces of R2 according to their dimensions as an example. Since the dimension of
R2 is two, any of its subspaces must have dimension at most two.

EXAMPLE 28 The following are the subspaces of R2 .


Zero-dimensional subspace: Only the zero space {0}.
One-dimensional subspaces: All the straight lines passing through the origin. Any such line will be
spanned by any non-zero vector (a point in this case) lying in the line.
Two-dimensional subspaces: Only the whole space R2 . Any two linearly independent vectors will
form a basis.

If we start with any non-zero vector (x1 , x2 ) (think of this as a point in the plane), it forms the
basis of the unique straight line passing through it and the origin. Now, if (y1 , y2 ) is not in the span of
the first vector, i.e. if (y1 , y2 ) does not lie on that line, then the set {(x1 , x2 ), (y1 , y2 )} is automatically
linearly independent, so forms a basis of R2 .

Tests for a Basis


For various calculations, we will frequently need to test sets of vectors to see whether they form a
basis. For vectors in Fm , the following restatement of Corollary (3.3.6) provides such a test.

Lemma 3.4.9. Let v1 , v2 , . . . , vm be m vectors in the m-dimensional vector space Fm . Let P be the
square matrix of order m whose jth column consists of the components of the vector v j . Then, these m
vectors form a basis of Fm if and only if P is invertible.

Thus, we know that (1, 0) and (1, 1) form a basis of R2 as


P = [1 1]
    [0 1]

and det P ≠ 0.
For vectors in arbitrary finite-dimensional vector spaces, analogous tests can be formulated once
coordinates of vectors are available. The idea of coordinates, which makes numerical calculations
possible in arbitrary vector spaces, follows from the following fundamental result.

Proposition 3.4.10. Let {v1 , v2 , . . . , vm } be a basis of a finite-dimensional vector space V. Then


every vector of V can be expressed uniquely as a linear combination of these basis vectors.

That every vector is a linear combination of the basis vectors is part of the definition of a basis. The
point of this result is that for a fixed basis, there is only one way of choosing the scalar coefficients in
the expression of a vector as a linear combination of the basis vectors.
Proof. It is sufficient to prove the assertion about the uniqueness. Assume that it is possible to express
a vector v ∈ V as ∑i xi vi as well as ∑i yi vi for scalars xi and yi . These two linear combinations being
equal, we can rewrite the equality as

∑i (xi − yi )vi = 0.

However, vi are basis vectors and so are linearly independent. We may, therefore, conclude from the
last vector equation that for each i, xi − yi = 0 or xi = yi . This proves the required uniqueness. !
Thus, every vector in an m-dimensional vector space determines a unique set of m scalars with
respect to a given basis. Let us formalize this association of vectors with unique sets of scalars first.

Coordinates and Coordinate Vectors


So let V be a finite-dimensional vector space over a field F, and let B = {v1 , v2 , . . . , vm } be a fixed basis
of V. Then, given a vector v ∈ V, there is one and only one way of writing v as a linear combination of
the basis vectors:

v = x1 v1 + x2 v2 + · · · + xm vm ,     xi ∈ F.

The m scalars xi , uniquely determined by the vector relative to the given basis, are called the coordi-
nates of v relative to the basis B, and the column vector
 
(x1 , x2 , . . . , xm )t                                  (3.4)

in Fm consisting of these scalars is called the coordinate vector of v with respect to the given basis of
V. If we want to indicate the basis with respect to which the coordinates are taken, we will refer to the
coordinate vector of v as [v]B .

Note: The association of a vector of an m-dimensional vector space with an ordered m-tuple in Fm
depends on the order of the basis vectors in the given basis. Thus, to remove any ambiguity in this
association, we need to fix the order of the basis vectors. Hence, we adopt the following convention:
a basis of V is an ordered set of linearly independent vectors which span V. Thus, for us a basis will
always mean an ordered basis.
However, we will continue to use curly brackets {} to describe sets of vectors forming a basis.
(Usually, ordered sets are enclosed in round brackets). Thus, for us, the basis {v1 , v2 , . . . , vm } is not
the same as the basis {v2 , v1 , . . . , vm } even though as sets they are the same.
Let us consider some examples.

EXAMPLE 29 Consider the m-dimensional vector space Fm over the field F equipped with the stan-
dard basis E = {e1 , e2 , . . . , em }, where
e j = (0, . . . , 0, 1, 0, . . . , 0), with 1 in the jth place.

Given any vector v = (x1 , x2 , . . . , xm ) in Fm , since

(x1 , x2 , . . . , xm ) = ∑k xk ek ,

it follows that the components of v themselves are the coordinates of v with respect
to the standard basis E.
For example, in R2 ,

[(1, 2)]E = (1, 2)t .

EXAMPLE 30 The vector space M2 (R) of 2 × 2 real matrices has the four unit matrices, e11 , e12 , e21
and e22 , forming the standard basis. The matrix

[a11 a12]
[a21 a22]

as a vector of M2 (R) has a11 , a12 , a21 and a22 as the coordinates relative to this standard basis. Thus, for
example, the coordinate vector of the matrix

[1 −3]
[0 −2]

with respect to the standard basis is (1, −3, 0, −2)t .

EXAMPLE 31 If V is the vector space of all real polynomials of degree at most 3, then as we
have seen earlier, {1, x, x2 , x3 } is the standard basis of V over R. The polynomial
a0 + a1 x + a2 x2 + a3 x3 clearly has a0 , a1 , a2 and a3 as its coordinates with respect to
this basis.

EXAMPLE 32 Consider the vector space R2 with basis (1, 2), (0, 4) (note one is not a scalar multiple
of the other, so they are linearly independent, and hence form a basis, as R2 has
dimension 2). To determine the coordinates of, say, (5, −3) with respect to this basis,
we have to solve the following vector equation

(5, −3) = x1 (1, 2) + x2 (0, 4)



for real numbers x1 and x2 . We express this vector equation as the matrix equation
[1 0] [x1]   [ 5]
[2 4] [x2] = [−3].

The 2 × 2 matrix here has the basis vectors as its columns, so is invertible
by Lemma (3.4.9). Multiplying the matrix equation from the left by the inverse

1/4 [ 4 0]
    [−2 1],

we obtain the coordinate vector of (5, −3) as

[x1]       [ 4 0] [ 5]   [    5]
[x2] = 1/4 [−2 1] [−3] = [−13/4].

Note that we have solved the matrix equation through multiplication by the in-
verse. This method makes sense only when the matrices are small in size. In fact, it
is easier to solve matrix equations by the method of row reduction of the augmented
matrix, which was discussed in the last chapter.
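The coordinates found above are easy to check numerically. The following sketch (assuming NumPy; an added illustration, not part of the text) solves the matrix equation directly instead of forming the inverse, which is the recommended practice for larger systems.

import numpy as np

# Example 32 revisited: coordinates of (5, -3) relative to the basis
# {(1, 2), (0, 4)} of R^2.  The basis vectors are the columns of P.
P = np.array([[1.0, 0.0],
              [2.0, 4.0]])
v = np.array([5.0, -3.0])

x = np.linalg.solve(P, v)   # solves P x = v without computing P^{-1}
print(x)                    # [ 5.   -3.25], that is, (5, -13/4)^t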

EXAMPLE 33 In general, let B = {v1 , v2 , . . . , vm } be an arbitrary basis of Fm . Given a vector v ∈ Fm ,


we wish to determine the coordinates x1 , x2 , . . . , xm of this column vector v with
respect to the basis B, determined by

v = x1 v1 + x2 v2 + · · · + xm vm .

Now, as in Equation (3.3), the sum on the right-hand side is precisely the matrix
product Px, where P is the square matrix whose jth column is the vector v j , and x
is the column vector formed by the scalars x1 , x2 , . . . xm we are seeking. Therefore,
the preceding vector equation can be put in the form

v = Px.

Observe that P, according to Lemma (3.4.9), is invertible as basis vectors form the
columns of P. Therefore, the required coordinates of v are given by

x = P−1 v.

Using the notation for coordinate vectors introduced earlier, we may write the last
equation as

[v]B = P−1 v (3.5)

for expressing the coordinate vector of v with respect to the basis B. Note that we
can interpret the column vector v as the coordinate vector [v]E of itself with respect
to the standard basis E.
To obtain a similar result about the way coordinates change due to change of bases in an arbitrary
vector space, we need the general form of Lemma (3.4.9) which makes it easier to check whether a set
of vectors form a basis or not.

Proposition 3.4.11. Let V be an m-dimensional vector space with basis B = {v1 , v2 , . . . , vm }.


Consider another set of m vectors u1 , u2 , . . . , um in V. Let p j be the coordinate vector of u j with
respect to the basis B, and let P be the m × m matrix whose jth column is the column vector p j . Then,
the m vectors u1 , u2 , . . . , um form a basis of V if and only if P is invertible.

Proof. Since dim V = m, it suffices to show that the m vectors u1 , u2 , . . . , um are linearly independent
if and only if P is invertible. But this is precisely what we had shown in the proof of Lemma (3.4.4)
(with A in place of P). !

This proposition provides us with a very efficient method to produce bases out of a given basis of a
finite-dimensional vector space.

Corollary 3.4.12. Let v1 , v2 , . . . , vm be a basis of a vector space V over a field F, and let P = [pi j ]
be an invertible matrix in Mm (F). Then, the m vectors u1 , u2 , . . . , um defined by the equations
u j = ∑i pi j vi    for j = 1, 2, . . . , m,

form a basis of V.

We now give a couple of examples to illustrate the uses of the preceding results.

EXAMPLE 34 Consider the set B = {1, 1 + x, 1 + x2 } of vectors in R2 [x], the real vector space of all
real polynomials of degree at most 2. Since E = {1, x, x2 } is the standard basis of V,
we may express the vectors of B in terms of the vectors of E to construct the matrix
P as in the preceding lemma. It is clear that
 
1 1 1
 
P = 0 1 0
 
0 0 1

is invertible, as det P = 1. Therefore, B is another basis of R2 [x].

EXAMPLE 35 Consider R2 with the usual standard basis. For any real θ, consider the vectors
v1 = [cos θ]   and   v2 = [−sin θ]
     [sin θ]              [ cos θ]

in R2 . Expressing these vectors in terms of the standard basis, we see that the matrix
P in this case is
P = [cos θ   −sin θ]
    [sin θ    cos θ],

from which it is clear that det P = 1. Thus, P is invertible and so v1 , v2 form a basis
of R2 .
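Proposition (3.4.11) reduces the basis test to an invertibility test, which is easy to automate. A short sketch (assuming NumPy; the angle is arbitrary) checks the vectors of Example 35:

import numpy as np

theta = 0.3                      # any real angle, as in Example 35
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# The columns of P are v1 and v2; they form a basis exactly when P is invertible.
print(np.linalg.det(P))          # 1.0 (up to rounding), so P is invertible
print(np.linalg.matrix_rank(P))  # 2: the two columns are linearly independent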

Change of Basis Matrix


It is time now to discuss the change of coordinates relative to a change of basis. First, a definition.

Definition 3.4.13. Let B = {v1 , v2 , . . . , vm } and B' = {u1 , u2 , . . . , um } be two ordered bases of an
m-dimensional vector space V. Consider the unique m2 scalars pi j obtained by expressing the u j as
linear combinations of the vectors of the basis B:
u j = ∑i pi j vi    for j = 1, 2, . . . , m.

The m × m matrix P = [pi j ] whose jth column is formed by the coefficients in the expression for u j (in
the same order) is called the transition matrix, or the change of basis matrix from the basis B' to the
basis B.

Note that such a transition matrix from one basis to another is invertible by Proposition (3.4.11).
For example, the matrices P in the preceding Examples 34 and 35 are the transition matrices from
the new bases to the usual standard bases of R2 [x] and R2 , respectively.
We now generalize Example 33 to give a formula for relating the coordinates of a vector with
respect to two different bases of a vector space:

Theorem 3.4.14. Let V be a finite-dimensional vector space with bases B and B' . For any v ∈ V,
let x and x' be the coordinate vectors of v with respect to bases B and B' , respectively. Then,

x' = P−1 x,

where P is the transition matrix from the basis B' to the basis B.

Proof. Let B = {v1 , v2 , . . . , vm } and B' = {v' 1 , v' 2 , . . . , v' m } be two bases of the vector space V.
Given any v ∈ V, let
v = ∑i xi vi    and    v = ∑ j x' j v' j                    (3.6)

be the expressions of v as a linear combination of the basis vectors, so that the scalars xi and x' j form
the coordinate vectors x and x' of v with respect to the two bases, respectively. Let P = [pi j ] be the
matrix of transition from the basis B' to B. Then, by Definition (3.4.13),

v' j = ∑i pi j vi    for j = 1, 2, . . . , m.

Substituting the expression for the v' j in the second of the Equations (3.6), we then see that

v = ∑ j x' j (∑i pi j vi ) = ∑i (∑ j pi j x' j ) vi .

Since the vectors vi are independent, comparing the preceding expression for v with the first of the
Equation (3.6), we conclude that
xi = ∑ j pi j x' j    for i = 1, 2, . . . , m.

But these m equations are equivalent to a single matrix equation

x = Px' .

The proof is complete as P is invertible. !
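The theorem can be verified on any concrete pair of bases. In the sketch below (assuming NumPy; the two bases of R2 are chosen only for illustration), the transition matrix is obtained column by column from Definition (3.4.13), and both x = Px' and x' = P−1 x are checked.

import numpy as np

B  = np.array([[1.0, 1.0],
               [0.0, 1.0]])     # basis B  = {(1, 0), (1, 1)} as columns
Bp = np.array([[1.0, 0.0],
               [2.0, 4.0]])     # basis B' = {(1, 2), (0, 4)} as columns

# Transition matrix from B' to B: its j-th column holds the B-coordinates of u_j.
P = np.linalg.solve(B, Bp)

v  = np.array([3.0, 2.0])
x  = np.linalg.solve(B, v)      # coordinate vector of v with respect to B
xp = np.linalg.solve(Bp, v)     # coordinate vector of v with respect to B'

print(np.allclose(x, P @ xp))                  # True: x = P x'
print(np.allclose(xp, np.linalg.solve(P, x)))  # True: x' = P^{-1} x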

EXERCISES
1. Determine whether the following assertions are true or false, giving brief justifications.
(a) Every vector space is finite-dimensional.
(b) If a finite subset S of a finite-dimensional vector space V spans V, then S must be a basis of
V.
(c) The zero vector space can have no basis.
(d) A finite-dimensional vector space over R can have only finitely many distinct bases.
(e) If B is a basis of a vector space V, then for any subspace W of V, there is a subset of B which
is a basis of W.
(f) In an m-dimensional vector space V, any set of m vectors is linearly independent if it spans
V.
(g) In an m-dimensional vector space V, any set of m vectors spans V if it is linearly independent.
(h) If a subspace W of an m-dimensional vector space V has m linearly independent vectors,
then W = V.
(i) Any transition matrix from one basis of a finite-dimensional vector space to another basis is
an invertible matrix.
(j) Any invertible matrix of order n over a field F is the transition matrix of some basis of Fn to
its standard basis.
(k) Any non-zero subspace of an infinite-dimensional vector space has to be infinite-
dimensional.
(l) Any proper subspace of an infinite-dimensional vector space has to be finite-dimensional.
(m) There is only one subspace of dimension m in an m-dimensional vector space.
(n) There are infinitely many distinct one-dimensional subspaces of R2 .
(o) If a vector space is spanned by an infinite set, then the vector space must be infinite-
dimensional.
(p) Any vector space over a finite field is necessarily finite-dimensional.
(q) Given an m-dimensional real vector space V and a basis B of V, any element of Rm is the
coordinate vector of some vector of V with respect to B.
2. Prove that there are infinitely many bases of R2 one of whose members is (1, 0).

3. Find three different bases of the real vector space R3 [x] of all polynomials with real coefficients
of degree at most 3.
4. Prove that there is a one–one correspondence between the collection of distinct bases of Fm and
GLm (F), the collection of all invertible matrices in Mm (F).
5. Let W be the subset of the complex vector space C4 consisting of all those vectors whose third
and the fourth components are the same. Verify that W is a subspace of C4 , and determine its
dimension.
6. Determine the dimension of the subspace W of Mn (F) consisting of the symmetric matrices by
actually exhibiting a basis of W.
Do the same for the subspace consisting of those matrices A ∈ Mn (F) such that At = −A.
7. Classify the subspaces of R3 in terms of their dimensions.
8. Let W be the subset of M2 (C) consisting of all matrices A = [ai j ] such that a11 + a22 = 0.
(a) Show that W is a subspace of M2 (C), and find a basis of W.
(b) Determine the dimension of W considering it as a vector space over the field of real numbers.
(c) Determine the dimension of the subspace U of the real vector space W consisting of all those
matrices A = [ai j ] in W such that a12 = −a22 .
9. Show that the field of real numbers, considered a vector space over the field of rational numbers,
cannot be finite-dimensional.
10. Show that the space C[a, b] of all continuous real-valued functions on a closed interval [a, b] of
the real line is infinite-dimensional.
11. Let B = {v1 , v2 , . . . , vn } and C = {u1 , u2 , . . . , un } be bases of a vector space V over a field F.
Let P be the n × n matrix over F whose jth column is the coordinate vector of v j with respect
to basis C for 1 ≤ j ≤ n. Let [v]B and [v]C be the coordinate vectors of an arbitrary vector v
with respect to bases B and C, respectively. Which of the following two equations is satisfied by
every vector v ∈ V?

[v]B = P[v]C or [v]C = P[v]B .

12. Let A = {v1 , v2 , v3 } and B = {u1 , u2 , u3 } be bases of a vector space V over R. Suppose that
v1 = u1 − u2 , v2 = −u1 + 2u2 + u3 and v3 = −2u1 + u2 + 4u3 . Determine the change of basis matrix
from B to A, and the coordinates of v = −3v1 + v2 − 2v3 with respect to the basis B.
13. Let A = {v1 , v2 , . . . , vn } and B = {u1 , u2 , . . . , un } be bases of Fn , and P be the change of basis
matrix from A to B. Consider the augmented matrix:

A = [u1 · · · un v1 · · · vn ].

Prove that A is row-equivalent to [In P], where In is the identity matrix of order n over F.
14. Determine the coordinates of (−3, 2, −1) ∈ R3 with respect to the basis {(1, 1, 1), (1, 0, 1), (1, 1, 2)}
of R3 .
15. Determine the coordinate vector of 1 − 3x + 2x2 ∈ R2 [x] with respect to the basis {x2 − x, x2 +
1, x − 1} of R2 [x].
16. Find a basis of the subspace W = {(x1 , x2 , x3 , x4 ) | x1 − 3x2 + x3 = 0} of R4 . Calculate the coor-
dinates of (1, 1, 2, −1) ∈ W with respect to this basis.

17. Consider the matrix


 
0 1 0 0 0

0 0 0 0 1
 
A = 0 0 1 0 0

1 0 0 0 0

0 0 0 1 0

over any field F. Prove that A is invertible by showing that A is a transition matrix from a certain
basis of F5 to the standard basis. Generalize.
18. Prove that a vector space V over an infinite field cannot be the union of a finite number of proper
subspaces.
19. Prove that the polynomials 1, 2x, −2 + 4x2 and −12x + 8x3 form a basis of R3 [x]. Find the coor-
dinate vector of −5 + 4x + x2 − 7x3 with respect to this basis.
These polynomials are the first four Hermite polynomials.
20. Prove that the polynomials 1, 1 − x, 2 − 4x + x2 and 6 − 18x + 9x2 − x3 form a basis of R3 [x]. Find
the coordinate vector of −5 + 4x + x2 − 7x3 with respect to this basis.
These polynomials are the first four Laguerre polynomials.

3.5 SUBSPACES AGAIN


In this section, we focus on some aspects of sums of subspaces of a vector space. Here is a summary
of the facts we have learnt about subspaces so far.
• A subspace W of a finite-dimensional vector space V is finite-dimensional; in fact, dim W ≤
dim V.
• Any generating set for a subspace W contains a basis of W.
• Any basis of a subspace W can be extended to a basis of V.
We can strengthen the first result by a simple observation whose proof is left as an exercise to the
reader.

Lemma 3.5.1. Let U ⊂ W be subspaces of a finite-dimensional vector space V. Then, U = W if and


only if dim U = dim W.

Note that any result about a vector space is applicable to a subspace, as a subspace is a vector
space on its own. Thus, for example, in the lemma we can take W = V. Similarly, any result proved for
general subspaces is valid for the whole space unless it is specifically for proper subspaces only.
Quite frequently, we need to discuss sums of subspaces. As we had seen in Section 3, given sub-
spaces W1 and W2 of a vector space V, their sum W1 + W2 , defined as

W1 + W2 = {w1 + w2 | wi ∈ Wi },

is again a subspace of V.
The following result gives a useful formula for the dimension of the sum of two subspaces.

Proposition 3.5.2. Let W1 and W2 be finite-dimensional subspaces of a vector space V. Then,

dim(W1 + W2 ) = dim W1 + dim W2 − dim(W1 ∩ W2 ).



Thus W1 ∩ W2 is the zero subspace if and only if


dim(W1 + W2 ) = dim W1 + dim W2 .

Proof. Consider the case when the subspace W1 ∩ W2 is non-zero to begin with, so we can suppose
that it has a basis. Let {v1 , v2 , . . . , vr } be a basis of W1 ∩ W2 so that r = dim(W1 ∩ W2 ). Now, W1 ∩ W2
being a subspace of both W1 and W2 , its basis can be extended to bases of these bigger subspaces. So,
we may assume that B1 = {v1 , v2 , . . . , vr , u1 , . . . , ut } and B2 = {v1 , v2 , . . . , vr , w1 , . . . , w s } are the
extended bases of W1 and W2 respectively. Note that dim W1 = r + t, and dim W2 = r + s.
We claim that B = {v1 , v2 , . . . , vr , u1 , . . . , ut , w1 , . . . , w s } is a basis of W1 + W2 . It is clear from
the definition of sum of subspaces that B spans W1 + W2 . So to prove the claim, we need only to prove
that B is a linearly independent set. If not, some vector in the list for B must be a linear combination
of the vectors preceding it. Since the first (r + t) vectors in the list for B are linearly independent, this
vector has to be one of the last s vectors of the list. In other words, for some k, (1 ≤ k ≤ s), the vector
wk is the following linear combination:
wk = a1 v1 + · · · + ar vr + b1 u1 + · · · + bt ut + c1 w1 + · · · + ck−1 wk−1 .
This relation can be rewritten as
wk − c1 w1 − · · · − ck−1 wk−1 = a1 v1 + · · · + ar vr + b1 u1 + · · · + bt ut .
Observe that the expression on the left-hand side is in W2 , whereas the one on the right-hand side
is in W1 so that the vector represented by these two equal expressions must be in W1 ∩ W2 . Thus,
wk − c1 w1 − · · · − ck−1 wk−1 is a linear combination of the basis vectors v1 , v2 , . . . , vr of W1 ∩ W2 .
This, however, contradicts the fact that the vectors of B2 are linearly independent. Our claim is thus
established showing that dim(W1 + W2 ) = r + s + t. The desired equality of dimensions follows as
dim W1 = r + t and dim W2 = r + s.
In case the intersection of W1 and W2 is the zero subspace, a similar argument shows that the union
of any two bases chosen for W1 and W2 is a basis of the sum W1 + W2 . Hence, the formula for the
dimension of W1 + W2 holds in this case, too. !

Note that the proposition is applicable even if V is infinite-dimensional.


We give some applications of the result in the following examples.

EXAMPLE 36 Let V = R2 , and let W1 and W2 be two distinct lines passing through the origin.
Thus, they are one-dimensional subspaces of R2 such that their intersection is the
zero space {0} which is the origin of R2 . Thus, dim(W1 ∩ W2 ) = 0. But then the
formula in Proposition (3.5.2) implies that dim(W1 + W2 ) must be 2. Since R2 itself
has dimension 2, it follows from the first result quoted at the beginning of this section
that W1 + W2 = R2 .
Our conclusion also proves that any two non-zero vectors, one each from these
distinct lines, will form a basis of R2 , for any non-zero vector in a one-dimensional
space forms a basis.
EXAMPLE 37 Consider subspaces in R3 next. Let W1 and W2 be two distinct planes in R3 passing
through the origin. Thus, they are two-dimensional subspaces of R3 . Since dim(W1 +
W2 ) cannot exceed 3, the dimension of R3 , it follows from Proposition (3.5.2) that
dim(W1 ∩ W2 ) ≠ 0. However, (W1 ∩ W2 ) ⊂ W1 so that its dimension cannot exceed

dim W1 = 2. If it equals 2, then by Lemma (3.5.1), (W1 ∩ W2 ) = W1 , a contradiction


as W1 and W2 are distinct planes. Thus, the only possibility is that dim(W1 ∩ W2 ) = 1.
We conclude that the given planes intersect at a straight line passing through the
origin.
It should be noted that in the last example, an argument based only on dimensions has helped us in
getting a geometrical insight.
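When subspaces are described by spanning vectors, every dimension appearing in Proposition (3.5.2) can be computed as a rank, since dim(W1 + W2 ) is the rank of the matrix whose columns are the two spanning sets taken together. The sketch below (assuming NumPy; the two planes are only an illustration) recovers the conclusion of the last example.

import numpy as np

# Two distinct planes through the origin in R^3, given by spanning columns.
W1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])     # the xy-plane
W2 = np.array([[1.0, 0.0],
               [0.0, 0.0],
               [0.0, 1.0]])     # the xz-plane

d1   = np.linalg.matrix_rank(W1)                   # dim W1 = 2
d2   = np.linalg.matrix_rank(W2)                   # dim W2 = 2
dsum = np.linalg.matrix_rank(np.hstack([W1, W2]))  # dim(W1 + W2) = 3

print(d1 + d2 - dsum)   # 1 = dim(W1 ∩ W2): the planes meet in a line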
We now discuss the general case of sums of finitely many subspaces of a vector space.

Definition 3.5.3. Let W1 , W2 , . . . , Wk be subspaces of a vector space V. The sum of these sub-
spaces, denoted by W1 + W2 + · · · + Wk , is defined as follows:

W1 + W2 + · · · + Wk = {w1 + w2 + · · · + wk | wi ∈ Wi }.

The subspaces Wi are known as summands of the sum.


It is clear that the sum, thus defined, is a subspace of V. Also, note that the definition is valid for
subspaces of an infinite-dimensional vector space, too.

Direct Sums of Subspaces


The assertions of the following proposition give the equivalent conditions, which when imposed on a
sum of subspaces, make such sums one of the most useful concepts of linear algebra.

Proposition 3.5.4. Let W1 , W2 , . . . , Wk be subspaces of a vector space V, and W = W1 + W2 +


· · · + Wk be their sum. Then, the following are equivalent.
(a) Every vector in the sum W can be expressed uniquely as a sum of vectors from the subspaces
W1 , W2 , . . . , Wk .
(b) For vectors wi ∈ Wi , the relation

w1 + w2 + · · · + wk = 0

implies that each wi = 0 in Wi .


(c) For each i, 1 ≤ i ≤ k,

Wi ∩ (W1 + W2 + · · · + Ŵi + · · · + Wk ) = {0}

is the zero subspace. (Here, the hat over Wi means that the term is missing in the sum.)

Proof. As every subspace contains the zero vector, the zero vector of W can be expressed as the sum
v1 + v2 + · · · + vk , where for each i, vi is the zero vector of the subspace Wi . Therefore, the uniqueness
assumed in condition (a) shows that (a) implies (b). Conversely, if for a vector v ∈ W,

v = w1 + w2 · · · + wk
= u1 + u2 · · · + uk ,

where for each i, wi , ui ∈ Wi , then we have

(w1 − u1 ) + (w2 − u2 ) + · · · + (wk − uk ) = 0.



Since Wi is a subspace, for each i, (wi − ui ) ∈ Wi . So, if (b) holds, wi = ui for each i. Thus, (b) implies
(a).
A relation of the type w1 + w2 + · · · + wk = 0 with vectors wi ∈ Wi implies that for each fixed i, wi
is the sum of vectors −w j for all j ≠ i, and therefore, is in the intersection of Wi with the sum of the
other subspaces W j with j ≠ i. Thus, conditions (b) and (c) are equivalent. !

Definition 3.5.5. Let W1 , W2 , . . . , Wk be subspaces of a vector space V. The subspace W = W1 +


W2 + · · · + Wk is called the internal direct sum, or simply the direct sum, of the subspaces if any one
and hence all of the three conditions of Proposition (3.5.4) are satisfied. In that case, we write

W = W1 ⊕ W2 ⊕ · · · ⊕ Wk .

The subspaces Wi are known as direct summands of the subspace W.

In case the subspaces are finite-dimensional, there is a useful characterization of their direct sum in
terms of dimensions.

Proposition 3.5.6. Let W1 , W2 , . . . , Wk be finite-dimensional subspaces of a vector space V, and


W = W1 + W2 + · · · + Wk be their sum. Then, the following are equivalent.

(a) W = W1 ⊕ W2 ⊕ · · · ⊕ Wk .
(b) The union of any bases B1 , B2 , . . . , Bk of the subspaces W1 , W2 , . . . , Wk , respectively, is a
basis of W.
(c) dim W = dim W1 + dim W2 + · · · + dim Wk .

Proof. We first note that the vectors of the union of any bases of the subspaces W1 , W2 , . . . , Wk span
W, whether the sum W is direct or not.
To prove that condition (a) implies (b), note that as the union of the bases of the subspaces Wi
clearly spans their sum W, it suffices to show that vectors in this union are linearly independent. Now,
given any linear combination of the vectors in the union of the bases which equals the zero vector, we
can group together, for each i, the scalar multiples of the members of Bi in that combination, and label
the sum of these multiples as, say, ai . Each ai being a linear combination of vectors of the basis Bi is
in Wi . Therefore, the relation we started with can be rewritten as

a1 + a2 + · · · + ak = 0, where ai ∈ Wi .

Since we are assuming that W is a direct sum, it follows, by virtue of the preceding proposition, that
each ai is the zero vector. On the other hand, the vectors in each Bi are linearly independent so the
scalars in the linear combination of the vectors of Bi that resulted in ai , must all be zeros. The argument
can be repeated for each i to show that all the scalar coefficients in the original relation involving the
basis vectors in the union are zeros. Thus, the vectors in the union are linearly independent proving
that condition (a) implies condition (b). A similar argument will prove that condition (b) implies (a).
That (b) implies (c) is trivial. So, we assume condition (c), and consider the union of given bases
B1 , B2 , . . . , Bk of the subspaces W1 , W2 , . . . , Wk , respectively. It is clear that the union is a spanning
set of the sum W. Thus, if the union is not a basis of W, then there is a proper subset of the union
which will be a basis of W. In that case, dim W will be strictly less than the sum of the dimensions of
W1 , W2 , . . . , Wk contradicting our assumption. This completes the proof of the proposition. !

A couple of remarks are in order.

(i) Recall that we have insisted that a basis of a vector space is an ordered one. Thus, the union of
the bases in condition (b) must be interpreted as the sequence of vectors obtained by stringing
together the vectors in the ordered bases of the subspaces one after the other in the same order.
(ii) The definition of a direct sum of two subspaces is particularly simple. We can rephrase it as
follows: the sum W = W1 + W2 is a direct sum if and only if W1 ∩ W2 = {0}, the zero subspace.
(iii) For finite-dimensional subspaces W1 and W2 of a vector space V, the sum W = W1 + W2 is direct
if and only if dim W = dim W1 + dim W2 .
EXAMPLE 38 Going back to Proposition (3.5.2), we see, for example, that the sum of any two
distinct one-dimensional subspaces of R2 is direct; in fact,

R2 = W1 ⊕ W2

for any two distinct lines W1 and W2 passing through the origin.
On the other hand, we had seen after Proposition (3.5.2) that the sum of two
distinct planes in R3 passing through the origin cannot be a direct sum.
EXAMPLE 39 Recall that a matrix A ∈ Mn (F) is called a symmetric matrix if At = A, and a skew-
symmetric matrix if At = −A. We had also seen that the symmetric and the skew-
symmetric matrices form subspaces of Mn (F), say W1 and W2 , respectively. Observe
that for any matrix A ∈ Mn (F), the matrix 1/2(A + At ) is symmetric, whereas 1/2(A −
At ) is skew-symmetric. (This assumes that the field F is such that division by 2 is
possible in F). It follows that Mn (F) = W1 + W2 . Next, note that a matrix in Mn (F)
is symmetric as well as skew-symmetric if and only if it is the zero matrix. In other
words, W1 ∩ W2 = {0}. We can, therefore, conclude that

Mn (F) = W1 ⊕ W2 .
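The decomposition of the last example is explicit and easy to compute; the sketch below (assuming NumPy, so that division by 2 is available over the reals) splits a sample matrix into its symmetric and skew-symmetric parts.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

S = (A + A.T) / 2    # symmetric part:       S^t =  S
K = (A - A.T) / 2    # skew-symmetric part:  K^t = -K

print(np.allclose(S, S.T), np.allclose(K, -K.T))  # True True
print(np.allclose(A, S + K))                      # True: A = S + K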

Given a subspace W of a vector space V, quite often we need to know whether W is a direct
summand of V, or equivalently, whether there is another subspace W1 such that V = W ⊕ W1 . Such
a subspace is sometimes referred to as a direct complement of W. If W is a subspace of a finite-
dimensional vector space V, it is easy to see that complementary subspaces for W exist. Choose a
basis of W, expand it to a basis of V and let W1 be the subspace spanned by those basis vectors which
are not in W. Then, it is clear that V = W ⊕ W1 . We record this observation now.

Proposition 3.5.7. Any subspace of a finite-dimensional vector space has direct complements.
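The construction just described can be carried out mechanically: append the standard basis vectors to a basis of W and keep the pivot columns. A sketch (assuming SymPy; the subspace W of R4 is an arbitrary illustration):

from sympy import Matrix, eye

W = Matrix([[1, 0],
            [1, 1],
            [0, 1],
            [0, 0]])            # columns form a basis of a subspace W of R^4

M = W.row_join(eye(4))          # the matrix [basis of W | e1 e2 e3 e4]
pivots = M.rref()[1]            # pivot columns of M give a basis of R^4
print(pivots)                   # (0, 1, 2, 5): w1, w2, e1 and e4

# The pivot columns beyond those of W span a direct complement W1 of W,
# so that R^4 = W ⊕ W1.
complement = [M.col(j) for j in pivots if j >= W.cols]
print(complement)               # [e1, e4] as column vectors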

EXERCISES

1. Determine whether the following assertions are true or false giving brief justifications. All given
vector spaces are finite-dimensional over arbitrary fields.
(a) Any basis of a proper subspace of a vector space V can be extended to a basis of V.
(b) Any linearly independent subset of a proper subspace of a vector space V can be extended
to a basis of V.

(c) For any two subspaces W1 and W2 of a vector space, dim(W1 + W2 ) is larger than the mini-
mum of the two dimensions of W1 and W2 .
(d) The direct complement of any proper subspace of a vector space is unique.
(e) If, for subspaces W1 , W2 , . . . , Wk in a vector space, Wi ∩ W j = {0} for all i ≠ j, then the sum
W1 + W2 + · · · + Wk is direct.
(f) If for two subspaces W1 and W2 of a vector space V, dim W1 + dim W2 = dim V, then V =
W1 ⊕ W2 .
(g) If dim V > 1, then there are always two subspaces W1 and W2 of vector space V such that
V = W1 ⊕ W2 .
(h) For subspaces W1 , W2 and W3 in a vector space, the dimension of the sum (W1 + W2 + W3 )
equals

dim W1 + dim W2 + dim W3 − dim(W1 ∩ W2 ∩ W3 ).

(i) In an n-dimensional vector space V, there are n distinct subspaces whose direct sum is V.
(j) An n-dimensional vector space cannot be a direct sum with more than n direct summands.
2. Prove Lemma (3.5.1).
3. Prove Proposition (3.5.2) in case W1 ∩ W2 = {0}.
4. Prove that condition (c) implies (a) in Proposition (3.5.4).
5. Let F be a field, and let e1 , e2 , . . . , en be the standard basis of the vector space Fn . Prove that

Fn = Fe1 ⊕ Fe2 ⊕ · · · ⊕ Fen .

6. Give an example of a vector space V and subspaces W1 , W2 and W3 such that W2 ≠ W3 and

V = W1 ⊕ W2 = W1 ⊕ W3 .

7. Let W1 , W2 and W3 be subspaces of a vector space. If W2 ⊂ W1 , then prove the Modular Law:

W1 ∩ (W2 + W3 ) = W2 + W1 ∩ W3 .

8. Let W1 , W2 , W3 and V1 be subspaces of a vector space V such that V1 = W2 ⊕ W3 . If V = W1 ⊕ V1 ,


then show that V = W1 ⊕ W2 ⊕ W3 .
9. Let W1 , W2 , . . . , Wk be subspaces of a vector space. Prove that the sum ∑i Wi of these sub-
spaces is direct if and only if every set of non-zero vectors w1 , w2 , . . . , wk , where w j is chosen
from W j for j = 1, 2, . . . , k, is linearly independent.
10. Let W1 , W2 and W3 be the following subsets of vectors in R3 :

W1 = {(x1 , x2 , x3 ) | x1 + x2 + x3 = 0},
W2 = {(x1 , x2 , x3 ) | x1 = x2 },
W3 = {(x1 , x2 , x3 ) | x1 = x2 = 0}.

Verify that W1 , W2 and W3 are subspaces of R3 such that

R3 = W1 + W2 = W1 + W3 .

Which of these two sums are direct?



11. Let W1 and W2 be subspaces of M2 (R) given by


       [a b]
W1 = { [c 0] | a, b, c ∈ R },

       [0 b]
W2 = { [c d] | b, c, d ∈ R }.
Show that M2 (R) = W1 + W2 but that the sum is not direct. Find a subspace W3 of W2 such that
M2 (R) = W1 ⊕ W3 .
12. Let V be the real vector space of all mappings from R into R. (See Example 8 of Section 3.2 for
the definition of space of mappings). Let W1 and W2 be the subsets of V consisting of even and
odd functions in V, respectively, given by
W1 = { f ∈ V | f (−x) = f (x) for all x ∈ R},
W2 = { f ∈ V | f (−x) = − f (x) for all x ∈ R}.

Verify that W1 and W2 be subspaces of V, and show that V = W1 ⊕ W2 .


13. Let W1 and W2 be the subspaces of Mn (F) consisting of symmetric and skew-symmetric matrices
of Mn (F), respectively. Argue in terms of the dimensions of these subspaces to show that
Mn (F) = W1 ⊕ W2 .
Recall that a scalar matrix is a diagonal matrix in Mn (F) such that all its diagonal entries are
equal.
14. Let W be the subspace of all scalar matrices in M2 (R). Find a basis of W in terms of the unit
matrices e11 , e12 , e21 and e22 . Hence, determine three distinct subspaces W1 , W2 and W3 of
M2 (R) by giving their bases such that
M2 (R) = W ⊕ W1 = W ⊕ W2 = W ⊕ W3 .

3.6 RANK OF A MATRIX


The ideas such as subspace and linear independence, developed in the context of vector spaces, also
help us in gaining useful insights about individual matrices. In this section, we use the ideas to examine
the important concept of the rank of a matrix.
Consider any A ∈ Mm×n (F). Each of the m rows of A, considered an n-dimensional row vector, is a
vector in the vector space Fn . Similarly, the m-dimensional column vectors of A can be considered as
vectors in Fm .

Definition 3.6.1. Given A ∈ Mm×n (F), the subspace of Fn spanned by the row vectors of A is called
the row space of A and the dimension of the row space of A is called the row rank of A.
Similarly, the subspace of Fm spanned by the column vectors of A is the column space of A, and
the dimension of the column space is the column rank of A.
We denote the row space and the column space of a matrix A by row(A) and col(A), respectively.

It is clear that the identity matrix In of order n over any field F has n as its row rank and column
rank, since its row vectors as well as its column vectors form the standard basis of Fn . On the other hand,
it is clear that the zero matrix in Mm×n (F) has both its ranks zero.
To avoid trivialities, we will deal with only non-zero matrices in this section.

We illustrate the definitions with the following example:

EXAMPLE 40 Consider real matrices


   
1 0 0 0 0
 0 1 0 0 0

0 1 0 0 0 0 0 1 0 0
A =  , B =  ,
0 0 1 0 0 0 0 0 1 0
0 0 0 0 1 0 0 0 0 1
   
0 0 1 0 0
 0 0 1 0 0

0 0 0 1 0 0 0 0 0 1
C =  , D =  .
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

For each of these four matrices, the rows are vectors in R5 whereas the columns are
vectors in R4 . Further, in each non-zero row, there is only one non-zero entry, and
more importantly, this is the only non-zero entry of the column containing it. It is
thus clear that the non-zero rows are linearly independent as vectors of R5 so that
the row rank of each of these matrices is the number of non-zero rows in the matrix.
Similarly, the non-zero columns are also linearly independent and thus their numbers
are the column ranks.
Observe that it is easy to find the row and the column ranks of these matrices precisely because they
are in reduced row echelon form in which every non-zero column is a pivot column. Thus, it makes
sense to relate the ranks of an arbitrary matrix to the ranks of its echelon forms.
It is easy to relate the row rank of a matrix to the row rank of any matrix row equivalent to it. Recall
that (see Definition (2.3.1)) there are three types of elementary row operations.

Lemma 3.6.2. Two row equivalent matrices in Mm×n (F) have the same row rank.

Proof. We show that two row equivalent matrices have the same row spaces. Therefore, it suffices to
show that if B is obtained from A ∈ Mm×n (F) by a single elementary row operation, then row(B) =
row(A). Considering the three types of elementary row operations, we see that any row vector of
B is either a scalar multiple of the corresponding row vector of A, or a linear combination of two
row vectors of A. It follows that the span row(B) of the row vectors of B is contained in row(A).
Since A can also be obtained from B by some suitable row operations, a similar argument shows that
row(A) ⊂ row(B). The lemma follows. !

We may thus conclude that the row rank of any matrix is the same as the row rank of its reduced
row echelon form or any of its echelon forms. Before we take up the general result about the row ranks
of such matrices, we note that the non-zero rows of the matrix
 
1 0 0 0 0
0 1 0 −1 0
 
R = 0 0 1 0 0,

0 0 0 0 1

0 0 0 0 0

which is the reduced row echelon form of some matrix of order 5, are linearly independent as row
vectors in R5 . To take another example, consider the following matrix in row echelon form:
 
1 2 3 4 6 5

0 1 7 9 8 −1
S =  .
0 0 0 0 2 3
0 0 0 0 0 0

In this matrix too, it is clear that the non-zero rows are linearly independent as vectors of R6 , as the
entries below any pivot are all zeros. The reader should verify the asserted linear independence by
working out the details, as it will make the argument of the following result clear.

Lemma 3.6.3. Let R ∈ Mm×n (F) be a row echelon or a reduced row echelon matrix. Assume that R
is non-zero. Then, the non-zero row vectors of R are linearly independent as vectors in Fn .

Proof. Consider the non-zero rows of R as row vectors in Fn , and number them as ρ1 , ρ2 , . . . , ρr
starting from the bottom-most non-zero row so that the top-most row of R is ρr . Now in the list
ρ1 , ρ2 , . . . , ρr , the rows preceding any row ρ j are actually the non-zero rows below it inside the
echelon matrix R. Thus, the components in ρ1 , ρ2 , . . . , ρ j−1 corresponding to the pivot in ρ j are
all zeros. It follows that ρ j cannot be a linear combination of the preceding rows ρ1 , ρ2 , . . . , ρ j−1 .
Hence, by the characterization of linearly dependent vectors given in Proposition (3.3.10), the lemma
follows. !

Corollary 3.6.4. The row rank of a row echelon, or a reduced row echelon matrix, is the number
of its non-zero rows.

This corollary, along with Lemma (3.6.2), implies the following easy characterization of the row
rank of a matrix.

Proposition 3.6.5. The row rank of a matrix A in Mm×n (F) is the number of the non-zero rows of
the reduced row echelon form of A.

We now begin our discussion of column rank. The first task is to relate the column rank of a reduced
row echelon matrix R to its pivot columns. Let γ be any non-zero non-pivot column of R. Consider the
pivot columns preceding γ as we go along the matrix R from the left to the right; we number them as
γ1 , γ2 , . . . , γ s , where γk is the column in which the pivot appears in the kth row. It should be clear
that γk need not be the kth column of R as we are numbering only the pivot columns of R. As γ s is the
pivot column just preceding γ (there may be non-pivot columns between γs and γ), all the entries in
the column γ below the sth entry are zeros. Therefore, the column vector γ looks like
 
γ = (a1 , . . . , a s , 0, . . . , 0)t ,

where some of the scalars ak may also be zeros. Observe that each of the pivot columns γ1 , γ2 , . . . , γ s
has pivot 1 so the column γ can be expressed as the linear combination

a 1 γ1 + a 2 γ2 + · · · + a s γ s .

We have thus shown that any non-zero non-pivot column of R is a linear combination of some of its
pivot columns. Therefore, the pivot columns of R are sufficient to span the column space. Note next
that the entries preceding a pivot in the row containing the pivot are all zeros. Therefore, no pivot
column of R can be a linear combination of the pivot columns preceding it. This proves that the pivot
columns of R are linearly independent and so form a basis of the column space of R. We record this
fact as the following result.

Lemma 3.6.6. Let R ∈ Mm×n (F) be a reduced row echelon matrix. Then, the pivot columns of R
form a basis of col(R), so that the column rank of R is the number of pivot columns of R.

To link up this lemma with a result about the column rank of an arbitrary matrix, we need to prove
the following.

Lemma 3.6.7. The column ranks of two row equivalent matrices in Mm×n (F) are equal.

Proof. Let A and B be row equivalent matrices in Mm×n (F). Since the column rank of a matrix is the
dimension of the space spanned by its column vectors, it suffices to prove that the numbers of linearly
independent columns of A and B are the same. Equivalently, it is sufficient to show that any relation
of linear dependence among the columns of A gives rise to an exactly similar relation among the
corresponding columns of B and vice versa. To prove the assertion, let γ1 , γ2 , . . . , γn be the column
vectors A, and σ1 , σ2 , . . . , σn be the column vectors of B. Consider the following relation of linear
dependence

c1 γ1 + c2 γ2 + · · · + cn γn = 0, ci ∈ F. (3.7)

It implies that (c1 , c2 , . . . , cn )t is a solution of the vector equation

x1 γ1 + x2 γ2 + · · · + xn γn = 0,

which can also be expressed as the matrix equation Ax = 0 (see Equation 3.3), where x is the column
vector (x1 , x2 , . . . , xn )t . It follows that (c1 , c2 , . . . , cn )t is a solution of the matrix equation Ax = 0.
However, A and B being row equivalent, we know that the systems Ax = 0 and Bx = 0 have the same
set of solutions. Working backwards with the columns of B now, we see that the relation (3.7) implies
the following relation among the columns of B:

c1 σ1 + c2 σ2 + · · · + cn σn = 0, ci ∈ F

Since row equivalence is symmetric, a similar assertion, obtained by interchanging A and B, holds.
This completes the proof. !

The preceding two lemmas together yield the following result about column ranks.

Proposition 3.6.8. Let R be the reduced row echelon form of a matrix A ∈ Mm×n (F). Then, the
linearly independent columns of A are precisely those corresponding to the pivot columns of R. Thus,
the column rank of A is the number of pivot columns of R.

We can now state the main result of this section.

Theorem 3.6.9. The row rank and the column rank of a matrix in Mm×n (F) are the same.

Proof. Let R be the reduced row echelon form of an arbitrary matrix A ∈ Mm×n (F). Since row equiva-
lent matrices have the same row rank as well as the same column rank, it suffices to show that the row
rank and the column rank of R are the same. We can assume that R is non-zero. By Lemma (3.6.3), the
row rank of R is the number of non-zero rows of R. However, each non-zero row of R contains a pivot,
and each pivot of R belongs to exactly one pivot column of R. It follows that the number of non-zero
rows of R is precisely the number of pivot columns of R. Thus, the row rank of R, according to Lemma
(3.6.6), is the column rank of R. !

Definition 3.6.10. Let A ∈ Mm×n (F). The rank of A is the common value of the row and the column
rank of A. We denote this common value as rank(A).

As we have already noted, for a matrix A ∈ Mm×n (F), the product Ax describes all possible linear
combinations of the column vectors of A as x ranges over Fn . It is also clear, from the properties of
matrix multiplication, that the set {Ax | x ∈ Fn } is a subspace of Fm . Thus, {Ax | x ∈ Fn } is the explicit
description of the column space of A. So, we have another characterization of the rank of a matrix,
which is useful for discussing ranks of products of matrices.

Proposition 3.6.11. Let A ∈ Mm×n (F). Then rank(A) is the dimension of the subspace

col(A) = {Ax | x ∈ Fn }.

The next proposition gathers some assorted results about ranks of matrices. All the matrices in the
proposition are over the same field F.

Proposition 3.6.12. Let A ∈ Mm×n (F).


(a) If At is the transpose of A, then rank(A) = rank(At ).
(b) For any n × p matrix B, rank(AB) ≤ rank(A).
(c) For any p × m matrix B, rank(BA) ≤ rank(A).
(d) For any invertible matrices P and Q of suitable orders,

rank(PA) = rank(AQ) = rank(A).

Proof. (a) is immediate from the definition of rank. For the rest, the preceding description of column
space helps us in deriving quick proofs. For example, for any y ∈ F p and n × p matrix B, let x = By.
Then, x ∈ Fn . Note that Ax = (AB)y. This shows that col(AB) ⊂ col(A), proving (b). We leave the proofs
of the other two to the reader. !
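The inequalities in parts (b) and (d) are easy to observe numerically; a brief sketch (assuming NumPy, with matrices chosen only for illustration):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])        # any 3 x 2 matrix
P = np.array([[2.0, 1.0],
              [1.0, 1.0]])        # invertible, det P = 1

print(np.linalg.matrix_rank(A))       # 1
print(np.linalg.matrix_rank(A @ B))   # 1: rank(AB) <= rank(A)
print(np.linalg.matrix_rank(P @ A))   # 1: unchanged by an invertible factor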

Sometimes, we need to find bases for the row and the column spaces of a matrix explicitly. We
summarize now our findings about such bases.
• A basis of the column space col(A) of a matrix A is formed precisely by those columns of A which
correspond to the pivot columns of the reduced row echelon form of A.
Note that the pivot columns of the reduced form of A itself need not form a basis of the column space
of A in general, as row operations may change the column space.
This is in sharp contrast to the situation obtained for the row space of a matrix.
• A basis of the row space row(A) of a matrix A is formed by the non-zero rows of the reduced row
echelon form of A.
The following examples illustrate the results we have developed in this section.

EXAMPLE 41 Consider the matrix


 
1 0 2 0 −3

4 1 0 −1 0
 
A = 0 3 1 0 −2.

8 2 0 −2 0

0 0 1 0 −3

Now, it can be verified that


 
1 0 0 0 0

0 1 0 −1 0
 
R = 0 0 1 0 0

0 0 0 0 1

0 0 0 0 0

is the reduced row echelon form of A. Since R has 4 non-zero rows, it follows that
the row rank of R as well as of A is 4. Note that 4 must be the column rank of A and
R too.
According to the assertions preceding this example, we see that the row space row(A) as well as
row(R) is spanned by (1, 0, 0, 0, 0), (0, 1, 0, −1, 0), (0, 0, 1, 0, 0) and (0, 0, 0, 0, 1).
Considering the columns of A corresponding to the pivot columns of R, we see that the column
space col(A) is spanned by
       
1 2 0  −3 
4 1 0  0 
 
       2 .
0, 3, 1 and  
8 2 0  0 
     
0 0 1 −3
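The reduction claimed here can be reproduced with a computer algebra system; the sketch below (assuming SymPy, which works in exact arithmetic) returns the reduced row echelon form of A together with the indices of its pivot columns.

from sympy import Matrix

A = Matrix([[1, 0, 2,  0, -3],
            [4, 1, 0, -1,  0],
            [0, 3, 1, -3, -2],
            [8, 2, 0, -2,  0],
            [0, 0, 1,  0, -3]])

R, pivots = A.rref()
print(R)         # the reduced row echelon form displayed above
print(pivots)    # (0, 1, 2, 4): the pivot columns are the 1st, 2nd, 3rd and 5th
print(A.rank())  # 4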

Null Space and Nullity


There is yet another subspace associated with an m × n matrix A, though unlike the row space and the
column space of A, this one has no direct relations with the entries of the matrix. Recall (see Example
6 in the examples of subspaces in Section 3.1) that the solutions of the matrix equation Ax = 0 form a
subspace of the vector space Fn .

Definition 3.6.13. Let A ∈ Mm×n (F). The null space of A, denoted by null(A), is the subspace of
Fn consisting of the solutions of the matrix equation Ax = 0. The dimension of the null space of A is
called the nullity of A and denoted by nullity(A).

For example, the nullity of an invertible matrix A is zero as the invertibility of A implies that the
zero vector is the only solution of Ax = 0 so the null space of such a matrix is the zero subspace.
We now establish a relation between the rank and the nullity of an arbitrary matrix. This relation is
one of the most useful results in linear algebra.

Theorem 3.6.14. Let A ∈ Mm×n (F). Then,

rank(A) + nullity(A) = n.

Proof. Let s = nullity(A). If s = 0, then the system of equations Ax = 0 has only the zero solution and
so every column of A is a pivot column (see discussion preceding Proposition 2.5.1). It follows that
the rank of A is n proving the theorem in this case. So, we assume that s ≥ 1. Let x1 , x2 , . . . , x s be
column vectors in Fn forming a basis of null(A); extend it to a basis x1 , x2 , . . . , x s , x s+1 , . . . , xn of
Fn . We claim that the column vectors Ax s+1 , . . . , Axn of Fm form a basis of col(A). Recall that col(A)
is the subspace consisting of all the vectors Ax as x ranges over Fn . So, it is clear that the vectors Ax j ,
for s + 1 ≤ j ≤ n, are in col(A). On the other hand, for any x ∈ Fn , we can express it as x = ∑_{j=1}^{n} b j x j for
some scalars b j . Therefore,

    Ax = ∑_{j=1}^{n} b j Ax j = ∑_{j=s+1}^{n} b j Ax j

as Ax j = 0 for 1 ≤ j ≤ s. Thus the vectors Ax s+1 , . . . , Axn span col(A). Next, we show that these vec-
tors are linearly independent. Suppose that for scalars c j , ∑_{j=s+1}^{n} c j Ax j = 0. Then, A(∑_{j=s+1}^{n} c j x j ) = 0
showing that ∑_{j=s+1}^{n} c j x j ∈ null(A). It follows that

    ∑_{j=s+1}^{n} c j x j = ∑_{i=1}^{s} bi xi ,

for some scalars b1 , b2 , . . . , b s . As the vectors x1 , x2 , . . . , x s , x s+1 , . . . , xn form a basis of Fn , it


follows that all the coefficients in the preceding equality and, in particular, all the c j are zero. The
claim is thus established. By our claim, the dimension of col(A), which is the rank of A, is precisely
n − s. The theorem follows. !

For the explicit determination of a basis of the null space of a matrix A, one can use the reduced row
echelon form R of A. Since the solution space of Ax = 0 is the solution space of Rx = 0, such a determi-
nation is easier working with R. The following example explains the method of finding a basis of the
null space of a matrix; since such a basis is a linearly independent set of solutions of Ax = 0, or equiv-
alently of Rx = 0, the reader is advised to review the material of Chapter 2, especially of Section 2.4.

EXAMPLE 42 Consider the 4 × 5 real matrix:


 
 2 0 1 0 4

−8 3 5 −6 −1
A =  .
 0 2 6 −4 0
0 4 −12 8 2

The null space of A, or of its reduced row echelon form R, is a subspace of R5 . The
reader can verify that
 
1 0 1/2 0 0

0 1 3 −2 0
R =  
0 0 0 0 1
0 0 0 0 0

is the reduced row echelon form of A. R has three pivot columns, namely, the 1st, 2nd
and 5th column so the nullity of A is 2. So, if x = (x1 , x2 , x3 , x4 , x5 )t is the column
vector of variables, then x1 , x2 and x5 are the basic variables, and the other the free
variables. Now the matrix equation Rx = 0, i.e.
   
  x 0
1 0 1/2 0 0  1   
0   x2  0
 1 3 −2 0    
  x  = 0.
0 0 0 0 1  3   
  x  0
0 0 0 0 0  4   
x5 0

can be written out explicitly as

x1 + (1/2)x3 = 0
x2 + 3x3 − 2x4 = 0
x5 = 0.

Thus, the non-trivial linear combinations of the free variables are given by

x1 = −1/2x3
x2 = −3x3 + x4 .

We think of the remaining basic variable x5 as the trivial or the zero linear combina-
tion of the free variables. Thus, the general solution of Rx = 0, and equivalently of
Ax = 0, is given by
   
 x1   −1/2x3 
 x2  −3x3 + x4 
   
 x3  =  x3 .
 x   x4 
 4   
x5 0

This general solution was obtained by following the steps outlined in the summary
preceding this example.

Now comes the crucial step of expressing the general solution as a linear combi-
nation of suitable column vectors, with the coefficients in the combination being the
free variables. In our example, this is done as follows:
     
    (x1 , x2 , x3 , x4 , x5 )t = x3 (−1/2, −3, 1, 0, 0)t + x4 (0, 2, 0, 1, 0)t .

It is now clear that

    (−1/2, −3, 1, 0, 0)t and (0, 2, 0, 1, 0)t

span the null space of A (as well as of R). They are also linearly independent as can
be seen by looking at their components corresponding to the free variables.

Corollary 3.6.15. For any A ∈ Mm×n (F), the nullity of A, or the dimension of the solution space of
the homogeneous system of linear equations Ax = 0, is the number of non-pivot columns of the reduced
row echelon form of A, and hence is the number of free variables of the system.

We now present a result which relates the invertibility of a square matrix to its rank. Recall that a
square matrix is invertible if and only if its row reduced echelon form is the identity matrix.

Proposition 3.6.16. The following are equivalent for a matrix A ∈ Mn (F).


(a) A is invertible.
(b) The rows of A form a basis of Fn .
(c) The rank of A is n.
(d) The columns of A form a basis of Fn .
(e) The nullity of A is 0.

The proof is left to the reader.

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. All given
matrices are over an arbitrary field.
(a) The rank of an m × n matrix cannot exceed either m or n.
(b) The rank of a 2 × n matrix, where n > 2, must be 2 if the second row is not a multiple of the
first row.
(c) The nullity of an m × n matrix cannot be zero if m ≠ n.
(d) The row space of a square matrix is the same as its column space.
(e) The row space of a matrix A is the same as the column space of its transpose At .

(f) If A is an m × 1 matrix and B a 1 × n matrix, then the product AB has rank 1.


(g) The sum of the dimensions of the null space and the row space of a matrix cannot exceed
the number of rows of the matrix.
(h) If R is any row echelon form of a matrix A, and R has four non-zero rows, then the first four
rows of A form a basis of the row space of A.
(i) If R is any row echelon form of a matrix A, then the pivot columns of R form a basis of the
column space of A.
(j) The intersection of the null space and the column space of any square matrix is the zero
subspace.
(k) The row space and the column space of any non-invertible square matrix cannot be the same
subspace.
(l) The nullity of a matrix A is the same as the nullity of its transpose At .
(m) The m × n zero matrix is the only matrix of rank zero in Mm×n (F).
(n) An n × n matrix having n linearly independent columns is invertible.
(o) If the rank of an m × n matrix is r, then the nullity of At is (m − r).
(p) If, for an m × n matrix A over F, the equation Ax = b is consistent for all b ∈ Fm , then the
column space of A is all of Fm .
(q) If the rank of an m × n matrix A is 2, then A is the sum of 2 m × n matrices each of whose
rank is 1.
2. Find the rank and the nullity of the following matrices over R:
     
1 2 1 1 1 0 0  1 −1 0 4
1 0 1, 0 0 1 1, −1 2 −4 7,

     
1 1 2 1 1 1 0 5 −6 4 9

 
   4 −7 3 7 −5
 0 3 −1 −2 6
  6 −8 5 12 −8

−2 1 2 1 −3  
 , −7 10 −8 −9 14.
 2 −3 0 4 1  
 3 −5 4 2 −6
1 −1 2 −2 3
−5 6 −6 −7 3

3. Find bases for the row space, the column space and the null space of each of the matrices in
Exercise 2.
4. Prove assertions (c) and (d) of Proposition (3.6.12).
5. Prove that the statements of Proposition (3.6.16) are equivalent.
6. For any A ∈ Mm×n (F) and any non-zero scalar c ∈ F, prove that rank(cA) = rank(A).
7. Let A be an m × n, and B be an n × p matrix over a field F. Show that product AB can be written
as a sum of n matrices each of which has rank at most 1.
Hint: Use column-row expansion for the product as given in Proposition (1.6.3).
8. Let A ∈ Mm×n (F) be of rank 1. Show that A can be written as a product BC where B is an m × 1
and C is an 1 × n matrix over F.
9. Let A ∈ Mm×n (F) be of rank r. Show that A can be written as the sum of r rank 1 matrices in
Mm×n (F).

10. How does the rank of a matrix in Mm×n (F) change if a single entry of the matrix is replaced by
another scalar?  
 1 
 
11. Determine whether v =  2  is in the column space of
 
−1
 
−4 −1 3
 
A =  5 1 2.
 
3 0 1

Does v belong to the null space A?


12. Let A ∈ Mm×n (F) with nullity k. If m + k = n, then show that for every column vector b ∈ Fm , the
system of equations Ax = b has a solution.
13. Let A ∈ Mm×n (F). Show that the system of equations Ax = b for any b ∈ Fm has a solution if and
only if the system of equations At x = 0 has only the trivial solution.
14. Let A be an m × n and B an n × m matrix over any field F. If m > n, then prove that AB cannot be
invertible.
15. Let A and B be matrices in Mn (F) such that A2 = A, B2 = B and In − A − B is invertible. Prove
that A and B have the same rank by computing A(In − A − B) and B(In − A − B).

3.7 ORTHOGONALITY IN Rn
There is some interesting interplay between the row space, the column space and the null space
of a real matrix which can only be understood in terms of the natural geometry of Rn . Geometrical
considerations in a real or a complex vector space can be introduced through the concept of inner
products; we shall be discussing such inner products in detail in Chapter 8. Here, our aim being
limited, we deal with only the standard inner product on the vector space Fm , where F is either R or
C; this product is usually known as the dot product. Even if one is dealing solely with real matrices,
sometimes the dot product on Cm cannot be avoided; for example, we shall be needing it to prove an
important result about real symmetric matrices in Section 5.3. We shall also be introducing orthogonal
matrices and QR-factorizations of real matrices in this section.
Throughout this section, F is either C or R. Also, we shall be treating elements of Fm as column
vectors; we shall denote a column vector as (x1 , x2 , . . . , xm )t .
First the notation. For any x = a + ib with a, b ∈ R, its conjugate is given by x̄ = a − ib; x̄ = x if and
only if x is real. Any real number can obviously be treated as a complex number with zero imaginary
part; x = a + i0. Also recall from Section 1.5, that for any x = (x1 , x2 , . . . , xm )t ∈ Fm , its conjugate
transpose is given by x∗ = ( x̄1 , x̄2 , . . . , x̄m ); in case F = R, x∗ = xt = (x1 , x2 , . . . , xm ). We also set
a∗ = ā for a scalar a ∈ F, considering a as a 1 × 1 matrix.

Definition 3.7.1. For any x = (x1 , x2 , . . . , xm )t and y = (y1 , y2 , . . . , ym )t in Fm , the dot product
or the standard inner product ⟨x, y⟩ is given by

    ⟨x, y⟩ = y∗ x
           = x1 ȳ1 + x2 ȳ2 + · · · + xm ȳm .

Note that ⟨x, y⟩ is a scalar in F.



In case F = R, the inner product is the usual dot product in Rm , which is a real number:

    ⟨x, y⟩ = yt x = x1 y1 + x2 y2 + · · · + xm ym .

It is easy to establish the basic properties of the dot product.

Proposition 3.7.2. For any x, y, z ∈ Fm and a ∈ F, the following hold:


(a) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.
(b) ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩.
(c) ⟨ax, y⟩ = a ⟨x, y⟩.
(d) ⟨x, y⟩ is the complex conjugate of ⟨y, x⟩.
(e) ⟨x, ay⟩ = ā ⟨x, y⟩.

Proof. We treat vectors in Fm as m × 1 and their conjugates as 1 × m matrices. Then by properties


of matrix multiplication, one has z∗ (x + y) = z∗ x + z∗ y and y∗ (ax) = a(y∗ x); also, as the conjugate of yi + zi is ȳi + z̄i ,
(y + z)∗ x = (y∗ + z∗ )x = y∗ x + z∗ x. These relations are the first three assertions of the proposition. Next,
applying the rule for the conjugate transpose of a product of two matrices given in Section 1.5, we
obtain (x∗ y)∗ = y∗ (x∗ )∗ = y∗ x, which implies the fourth assertion. It is clear that the third and the
fourth together give us the last assertion. !

The proposition clearly implies, for scalars c1 , c2 , . . . , cr and vectors x1 , x2 , . . . , xr in Fm , that


    ⟨ ∑_{i=1}^{r} ci xi , y ⟩ = ∑_{i=1}^{r} ci ⟨xi , y⟩                                        (3.8)

for any y ∈ Fm , a result whose verification is left to the reader.
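These rules are easy to test numerically. The following sketch is an aside (it assumes NumPy, which is not part of the text); note that np.vdot conjugates its first argument, so ⟨x, y⟩ = y∗ x corresponds to np.vdot(y, x).

    import numpy as np

    def ip(x, y):
        """Standard inner product <x, y> = y* x."""
        return np.vdot(y, x)          # np.vdot conjugates its first argument

    x = np.array([1 + 2j, 3 - 1j])
    y = np.array([2 - 1j, 1j])
    z = np.array([1j, 4 + 1j])
    a = 2 - 3j

    print(np.isclose(ip(x + y, z), ip(x, z) + ip(y, z)))        # (a)
    print(np.isclose(ip(a * x, y), a * ip(x, y)))               # (c)
    print(np.isclose(ip(x, y), np.conj(ip(y, x))))              # (d)
    print(np.isclose(ip(x, a * y), np.conj(a) * ip(x, y)))      # (e)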


For a complex number x = a + ib, x x̄ = a² + b² is a non-negative real number; it is clear that x x̄ = 0
if and only if x = 0. The modulus or the absolute value |x| is then defined as the non-negative square
root of x x̄; thus |x| = 0 if and only if x = 0. In case x is real, |x| = x if x is non-negative and |x| = −x if
x is negative. It follows that for any x = (x1 , x2 , . . . , xm )t ∈ Fm ,

    ⟨x, x⟩ = |x1 |² + |x2 |² + · · · + |xm |² ,

a non-negative real number with ⟨x, x⟩ = 0 if and only if x is the zero vector.

Definition 3.7.3. The length ‖x‖ of a vector x ∈ Fm is defined as the non-negative square root of
⟨x, x⟩.

Thus,

    ‖x‖² = |x1 |² + |x2 |² + · · · + |xm |² .

A vector x ∈ Fm is a unit vector if ‖x‖ = 1. The standard basis vectors e1 , e2 , . . . , em of Fm are clearly
unit vectors. Also note that for any non-zero vector x, (1/‖x‖) x is a unit vector.
4x4

Orthogonal Vectors
We now introduce the important idea of orthogonality in Fm .

Definition 3.7.4. A vector x ∈ Fm is orthogonal to y ∈ Fm , and we write x ⊥ y, if ⟨x, y⟩ = 0.

Note: ⟨x, y⟩ = 0 is equivalent to ⟨y, x⟩ = 0. So x ⊥ y if and only if y ⊥ x. Thus we usually use the
term orthogonal vectors.
The idea of orthogonal vectors in Fm generalizes that of perpendicular straight lines in R2 and R3 .
A vector (x1 , x2 )t in R2 represents a (directed) line segment in the plane from the origin to the point
with coordinates (x1 , x2 ); the length of the segment is the length of the vector as we have defined. Now
consider vectors x and y in R2 ; then they represent two sides of a triangle, which meet at the origin.
By the Parallelogram law, the sum x + y represents a line segment which is parallel to and equal in
length to the third side of the triangle. We calculate the length of x + y by using the properties of the
dot product in R2 :

    ‖x + y‖² = ⟨x + y, x + y⟩
             = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
             = ‖x‖² + ‖y‖² + ⟨x, y⟩ + ⟨y, x⟩
             = ‖x‖² + ‖y‖² + 2 ⟨x, y⟩ ,

as ⟨y, x⟩, being the conjugate of ⟨x, y⟩, equals ⟨x, y⟩ for the dot product over R. Now, if the two sides of the triangle represented
by the vectors x and y are perpendicular, then by Pythagoras' theorem ‖x + y‖² = ‖x‖² + ‖y‖² , which
holds, according to our calculation, if and only if ⟨x, y⟩ = ⟨y, x⟩ = 0. In other words, the condition for
the perpendicularity of two vectors in R2 is that their dot product in R2 is zero. Thus, the definition
of the orthogonality of vectors in Fm is indeed a generalization of the idea of perpendicularity in R2 .
We now record a few useful facts about orthogonality in Fm .

Proposition 3.7.5. Let F be either R or C.


(a) Any x ∈ Fm is orthogonal to the zero vector in Fm .
(b) The standard basis vectors e1 , e2 , . . . , em form a mutually orthogonal set of unit vectors in Fm .
(c) If x1 , x2 , . . . , xm are the coordinates of x ∈ Fm with respect to the standard basis
{e1 , e2 , . . . , em }, then xi = ⟨x, ei ⟩ for each i.
(d) The only vector in Fm , which is orthogonal to itself, is the zero vector.
(e) If x ∈ Fm is orthogonal to every vector in Fm , then x is the zero vector.

Proof. The first two assertions follow directly from the definition of orthogonality. The hypothesis of
4
the next assertion implies that x = ∑_{j=1}^{m} x j e j . Taking dot product of both sides of the relation by ei we
then obtain the required result as ⟨ei , ei ⟩ = 1 and ⟨e j , ei ⟩ = 0 for j ≠ i by (b). For the fourth assertion,
note that for complex numbers x1 , x2 , . . . , xm , the relation |x1 |² + |x2 |² + · · · + |xm |² = 0 holds if and
only if each xi = 0. Thus, for x = (x1 , x2 , . . . , xm )t ∈ Fm , the dot product ⟨x, x⟩ = 0 if and only if each
component of x is zero. Finally, the hypothesis in (e) implies, in particular, that x is orthogonal to each
of the standard basis vectors ei , for 1 ≤ i ≤ m. Since ⟨x, ei ⟩ is the ith component of x, the assertion in
(e) too follows. !

One of our goals of this section is to show that every vector in the row space of a matrix A ∈ Mm×n (R)
is orthogonal to any vector in its null space. In fact, a lot more is true. We need one more concept
to be able to describe the situation completely.
Consider the set of all vectors in Fm which are orthogonal to each vector of a given subspace W of
Fm ; denote it by W⊥ . So

    W⊥ = {x ∈ Fm | ⟨x, y⟩ = 0 for all y ∈ W}.

W ⊥ is clearly non-empty as the zero vector belongs to it. Now the relation

    ⟨ax + by, w⟩ = a ⟨x, w⟩ + b ⟨y, w⟩

implies that if x, y ∈ W⊥ , that is, ⟨x, w⟩ = 0 and ⟨y, w⟩ = 0 for any w ∈ W, then ax + by ∈ W⊥ . This
shows that W ⊥ is a subspace of Fm .

Definition 3.7.6. For any subspace W of Fm , the subspace W ⊥ is called the orthogonal comple-
ment of W.

We now proceed to prove the result about the relationship between the row space row(A) and the
null space null(A) of an m × n matrix A over R. For relevant definitions and results, see the discussion
on row and column spaces in Section 3.6.

Proposition 3.7.7. Let A ∈ Mm×n (R). Then row(A)⊥ = null(A).

Proof. Since row(A) = col(At ), any vector in row(A) can be expressed as At y for some y ∈ Rm . Now, for
any x ∈ null(A), the following calculation, using the definition of the dot product and some properties
of transposes of matrices, shows that x ∈ row(A)⊥ :

    ⟨x, At y⟩ = (At y)t x
              = yt Ax
              = ⟨Ax, y⟩
              = ⟨0, y⟩
              = 0.

Thus, null(A) ⊂ row(A)⊥ . Note that though we started with the dot product in Rn , the scalar yt Ax in
the middle of the calculation was expressed as a dot product in Rm .
To prove the reverse inclusion row(A)⊥ ⊂ null(A), consider any v ∈ row(A)⊥ . Then v is orthogonal
to every vector in row(A) = col(At ), and so for any arbitrary vector y ∈ Rm , ⟨v, At y⟩ = 0, where the dot
product is in Rn . A calculation, similar to the preceding one, then shows that ⟨Av, y⟩ = 0, where the dot
product takes place in Rm . Since y is an arbitrary vector in Rm , it follows from part (e) of Proposition
(3.7.5) that Av = 0 in Rm . Thus v ∈ null(A), which completes the proof of the first equality of sets in
the proposition. !
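The proposition is easy to test on a concrete matrix. The following sketch assumes SymPy and uses an arbitrary illustrative matrix that is not from the text.

    from sympy import Matrix

    A = Matrix([[1, 2, 0, -1],
                [2, 4, 1,  1],
                [0, 0, 1,  3]])

    row_basis = A.rowspace()      # basis of row(A), as 1 x n matrices
    null_basis = A.nullspace()    # basis of null(A), as n x 1 matrices

    # every row-space vector is orthogonal to every null-space vector
    print(all((r * v)[0] == 0 for r in row_basis for v in null_basis))
    # dim row(A) + dim null(A) = n, anticipating Corollary 3.7.20
    print(len(row_basis) + len(null_basis) == A.cols)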

Orthogonal and Orthonormal Bases


Definition 3.7.8. A set of vectors in Fm is said to be an orthogonal set of vectors if any two distinct
vectors in the set are orthogonal. An orthogonal set of vectors is said to be an orthonormal set if every
vector in the set is a unit vector.
Thus, vectors x1 , x2 , . . . , xr in Fm form an orthogonal set if ⟨xi , x j ⟩ = 0 for i ≠ j. In addition, if ⟨xi , xi ⟩ = 1
for each i, 1 ≤ i ≤ r, then the vectors form an orthonormal set.
Note: Any set of non-zero orthogonal vectors can be transformed into an orthonormal set by divid-
ing each vector by its length, which is a scalar.
The standard basis vectors e1 , e2 , . . . , em form an orthonormal set of vectors in Fm ; it is an example
of an orthonormal basis.
The following result shows that any set of m non-zero orthogonal vectors form a basis of Fm .

Proposition 3.7.9. Any set of non-zero orthogonal vectors in Fm is linearly independent.

Proof. Given any set x1 , x2 , . . . , xr of non-zero orthogonal vectors in Fm , consider any relation

c1 x1 + c2 x2 + · · · + cr xr = 0, (3.9)

where ci ∈ F. Taking the dot product of both sides of the preceding relation by x j , for a fixed j where
1 ≤ j ≤ r and then using Equation (3.8) to express the dot product of a sum of vectors as a sum of dot
products, we obtain

    c j ⟨x j , x j ⟩ = 0

as the given vectors are orthogonal. Since the given vectors are non-zero, the scalar ⟨x j , x j ⟩ is non-
zero and so we conclude that c j = 0. Thus, Equation (3.9) can hold only if all ci are zeros. This proves
the linear independence of the given set of vectors. !

Corollary 3.7.10. Any set of m orthonormal vectors in Fm forms an orthonormal basis.

Orthogonal and Unitary Matrices


Matrices whose columns are orthonormal have nice properties; moreover, they appear frequently in
applications.

Definition 3.7.11. A matrix Q ∈ Mn (R) is said to be orthogonal if its columns are orthonormal in
Rn . A matrix U ∈ Mn (C) is said to be unitary if its columns are orthonormal in Cn .

EXAMPLE 43 Recall that the columns of a permutation matrix P of order n are a rearrangement
of the columns of the identity matrix In . Since the columns of In are orthonormal,
it follows that the columns of any permutation matrix are orthonormal. Thus any
permutation matrix is an orthogonal matrix.

EXAMPLE 44 We leave to the reader the verification that the following complex matrix
    A = (1/√2) [ 1    1 ]
               [ i   −i ]
is unitary.
Let Q ∈ Mn (R) be an orthogonal matrix with orthonormal columns γ1 , γ2 , . . . , γn , each column
an n × 1 matrix. The columns being linearly independent, Q is invertible (see Proposition 3.6.16).
We claim that its inverse is the transpose Qt . Now Qt , a matrix of order n, has rows γt1 , γt2 , . . . , γtn ,
each row an 1 × n matrix. It is clear that the (i, j)th entry of Qt Q is the product γti γ j , a scalar. Since
{γ1 , γ2 , . . . , γn } is an orthonormal set, it follows, from the definition of dot product in Rn , that

    γti γ j = ⟨γ j , γi ⟩ = 0 if i ≠ j, and γti γi = ⟨γi , γi ⟩ = 1.

Thus, Qt Q = In showing that Qt is a left inverse of Q. But we have already seen (see Proposition
(2.5.7) in Section 2.5) that an one-sided inverse of a square matrix is necessarily the unique inverse of
the matrix. Hence, Qt is the inverse of Q.
Note that by a similar argument, the condition At A = In , for any A ∈ Mn (R), implies that the columns
of A are orthonormal. Thus, we have the following result, which is also sometimes taken as the defini-
tion of an orthogonal matrix.

Proposition 3.7.12. A matrix Q ∈ Mn (R) is orthogonal if and only if Qt Q = In = QQt .

It must be emphasized that we have not used any special property of the columns of Q to derive the
condition QQt = In from Qt Q = In ; rather it reflects a property of invertible matrices. So it is surprising
that a consequence of the condition QQt = In is the following, whose proof is left to the reader.

Corollary 3.7.13. If the columns of A ∈ Mn (R) are orthonormal, then the rows of A are also or-
thonormal.

We verify next some more properties of orthogonal matrices.

Proposition 3.7.14. Let Q be an orthogonal matrix in Mn (R).


(a) For any column vectors x and y in Rn ,

    ⟨Qx, Qy⟩ = ⟨x, y⟩ .

Thus, multiplication by Q preserves the dot product.


(b) For any column vector x in Rn ,

    ‖Qx‖ = ‖x‖.

Thus, multiplication by Q preserves lengths of vectors.


(c) det Q = ±1.

Proof. The definition of dot product proves the first assertion immediately:

    ⟨Qx, Qy⟩ = (Qy)t Qx
             = (yt Qt )(Qx)
             = yt (Qt Q)x
             = yt x
             = ⟨x, y⟩

as Qt Q = In and (XY)t = Y t X t for any two matrices X and Y which can be multiplied. The next assertion
follows from the first as ‖x‖² = ⟨x, x⟩. Finally, taking determinant of both sides of the relation Qt Q = In ,
one obtains (det Q)² = 1 as the determinants of a matrix and its transpose are the same. The final
assertion follows. !
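A quick numerical illustration of these properties, using a 2 × 2 rotation matrix, is sketched below; it is an aside that assumes NumPy and is not part of the text.

    import numpy as np

    theta = 0.7
    Q = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # columns are orthonormal

    x = np.array([3.0, -1.0])
    y = np.array([0.5,  2.0])

    print(np.allclose(Q.T @ Q, np.eye(2)))                       # Qt Q = I2
    print(np.isclose(np.dot(Q @ x, Q @ y), np.dot(x, y)))        # (a): dot product preserved
    print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # (b): lengths preserved
    print(np.isclose(abs(np.linalg.det(Q)), 1.0))                # (c): det Q = ±1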

A detailed discussion of orthogonal matrices in general inner product spaces can be found in Section
8.6. Still a point must be noted here: orthogonal matrices preserve dot product and lengths of vectors
so they can be used to represent geometric transformations in Rn , such as rotation and reflections,
which do not change lengths of line segments or angles between them. See relevant material in our
website for a thorough discussion of such transformations. Orthogonal matrices, in such contexts, can
be identified by their determinants.

Definition 3.7.15. An orthogonal matrix Q is a rotation matrix if det Q = 1 and a reflection matrix
if det Q = −1.

EXAMPLE 45 Consider the orthogonal matrices


    R = [ cos θ   − sin θ ]          S = [ 0   1 ]
        [ sin θ     cos θ ] ,            [ 1   0 ] .

R represents a rotation of the plane R2 through an angle θ, whereas S a reflection
of the plane about the straight line y = x, as can be verified by computing R(x1 , x2 )t
and S(x1 , x2 )t . Note: det R = 1 and det S = −1.
More examples of rotation and reflection matrices can be found in the exercises
that follow this section.

EXAMPLE 46 The rotation matrix

    R = [ cos θ   − sin θ ]
        [ sin θ     cos θ ]

has the inverse

    R−1 = Rt = [   cos θ   sin θ ]
               [ − sin θ   cos θ ] .

It can be easily verified that the columns of the following matrix

 
 1 1 1 1 

 1 −1 1 −1 
H =  
 1 1 −1 −1 
1 −1 −1 1

form an orthogonal basis of R4 . Converting the columns into orthonormal columns, one concludes that

1
H −1 = H.
4

Another advantage of orthonormal sets in calculations involving vectors in Fm is evident from the
following, whose simple verification is left to the reader.

Proposition 3.7.16. Let {v1 , v2 , . . . , vr } be an orthonormal basis of a subspace W of Fm . Then for


any v ∈ W,

    v = ∑_{j=1}^{r} ⟨v, v j ⟩ v j .

Thus, computing the coordinates of a vector with respect to an orthonormal basis is quite simple.
One does not have to solve systems of equations or use the change of basis matrix as we have done in
Section 3.4.
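The following sketch, an aside assuming NumPy and an illustrative orthonormal pair that is not taken from the text, recovers coordinates by dot products alone.

    import numpy as np

    # an orthonormal basis of a two-dimensional subspace W of R^3
    v1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
    v2 = np.array([1.0, -1.0, 1.0]) / np.sqrt(3)

    v = 2.0 * v1 - 5.0 * v2                   # a vector of W with known coordinates

    c1, c2 = np.dot(v, v1), np.dot(v, v2)     # coordinates via Proposition 3.7.16
    print(np.isclose(c1, 2.0), np.isclose(c2, -5.0))
    print(np.allclose(c1 * v1 + c2 * v2, v))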

Gram–Schmidt Process
Because of many advantages of orthonormal bases, the need frequently arises to replace a set of lin-
early independent vectors (such as a basis of a subspace) in Fm by an orthonormal set of vectors.
This can be done by following a procedure known as the Gram–Schmidt orthogonalization process,
which is applicable even in general inner product spaces. (A detailed discussion of the process is given
in Section 8.6). Here, by considering vectors in R2 , we try to give an intuitive feeling for the key idea
which makes the process work: finding a formula for the projection of a vector onto another. The
reader has, in fact, worked with this idea while computing the component of a vector along another
vector in coordinate geometry or basic vector algebra.
As before, we identify a vector x = (x1 , x2 )t in R2 with the directed line segment from the origin
(0, 0) of the plane R2 to the point (x1 , x2 ). Consider two vectors v = (x1 , x2 )t and u = (y1 , y2 )t in
R2 ; we assume that they are linearly independent so that they are not collinear. Let (z1 , z2 ) be the
foot of the perpendicular from the point (y1 , y2 ) to the line L joining the origin (0, 0) to (x1 , x2 );
then the line segment from (0, 0) to (z1 , z2 ), that is, the vector (z1 , z2 )t is the component (projection)
of u = (y1 , y2 )t along v. As the line containing v is one-dimensional, the component (z1 , z2 )t is av
for some real number a. Similarly, if L1 is the straight line through the origin perpendicular to L,
then dropping the perpendicular from (y1 , y2 ) onto L1 , we can determine the component, say w, of u
along L1 . Then by the Parallelogram law for addition of line segments again, u = av + w. Taking dot
product of both sides of this vector relation with v, one obtains ⟨u, v⟩ = a ⟨v, v⟩ as v is orthogonal to w.

Therefore, we may conclude that


    u − (⟨u, v⟩ / ⟨v, v⟩) v

is orthogonal to v.
To put our discussion in proper perspective as far as the Gram–Schmidt process is concerned, we
note that a direct calculation (without referring to coordinates) shows that, for any v, u ∈ Fm ,

    ⟨v, u − (⟨u, v⟩ / ‖v‖²) v⟩ = 0.

We shall refer to the vector (⟨u, v⟩ / ‖v‖²) v as the component (projection) of u along v.
In general, given a set of mutually orthogonal vectors v1 , v2 , . . . , vk−1 and another vector uk in Fm ,
subtracting from uk each of its components along v1 , v2 , . . . , vk−1 , respectively, produces a vector
vk orthogonal to each of v j for 1 ≤ j ≤ (k − 1). This procedure is the key to Gram–Schmidt process
outlined in the following proposition; for a proof see the discussion of the process in general inner
product spaces in Section 8.6.

Proposition 3.7.17. Let u1 , u2 , . . . , ur be a set of linearly independent vectors in Fm . We deter-


mine vectors v1 , v2 , . . . , vr recursively as follows: we set v1 = u1 , and after finding v1 , v2 , . . . , vk−1
for 2 ≤ k ≤ r, we set
    vk = uk − (⟨uk , v1 ⟩/‖v1 ‖²) v1 − (⟨uk , v2 ⟩/‖v2 ‖²) v2 − · · · − (⟨uk , vk−1 ⟩/‖vk−1 ‖²) vk−1 .        (3.10)

Then v1 , v2 , . . . , vr form a mutually orthogonal set of linearly independent vectors such that their
span is the same subspace spanned by u1 , u2 , . . . , ur .
Finally, if we set, for 1 ≤ k ≤ r,

    qk = (1/‖vk ‖) vk ,
then {q1 , q2 , . . . , qr } is an orthonormal set such that its span is the same as the span of
u1 , u2 , . . . , ur .

We shall need the following observation later:


    ⟨qk , uk ⟩ ≠ 0 for any k, 1 ≤ k ≤ r.                                                      (3.11)

Since vk is a scalar multiple of qk , it is sufficient to verify that ⟨vk , uk ⟩ ≠ 0, which is clear for k = 1.
For k ≥ 2, by Equation (3.10), vk − uk is a linear combination of v1 , v2 , . . . , vk−1 , each of which is
orthogonal to vk . Thus, if ⟨vk , uk ⟩ = 0, then ⟨vk , vk ⟩ = 0 (verify), which implies that vk is the zero
vector. This contradiction completes the verification of our observation.
Since any subspace W of Fm is finite-dimensional, W has a basis consisting of linearly independent
vectors. The Gram–Schmidt process then converts this basis into an orthonormal basis. Furthermore,
we know that any linearly independent set in Fm can be extended to a basis of Fm (see Section 3.4). It
follows that any orthonormal basis of W can be extended to an orthonormal basis of Fm .

Proposition 3.7.18. Any subspace of Fm has an orthonormal basis, which can be extended to an
orthonormal basis of Fm .

Given a subspace W of Fm , choose an orthonormal basis v1 , v2 , . . . , vr of W and then extend it


to an orthonormal basis v1 , v2 , . . . , vr , vr+1 , . . . , vm of Fm . If U is the subspace of Fm spanned by
vr+1 , . . . , vm , then it is clear that Fm = W ⊕ U. Also, as each basis vector of U is orthogonal to every
basis vector of W, U ⊂ W ⊥ . On the other hand, given any v ∈ W ⊥ , we may write, by Proposition
(3.7.16),

    v = ∑_{i=1}^{m} ⟨v, vi ⟩ vi ,

which implies that v is a linear combination of vr+1 , . . . , vm (as ⟨v, vi ⟩ = 0 for i = 1, 2, . . . , r). This
proves the first part of the following; the proof of the second part is left to the reader.

Proposition 3.7.19. Let W be a subspace of Fm . Then

Fm = W ⊕ W ⊥ .

Furthermore, (W ⊥ )⊥ = W.

The Proposition (3.7.7), about the row and null space of a matrix, now implies the following.

Corollary 3.7.20. Let A ∈ Mm×n (R).


(a) row(A) = null(A)⊥.
(b) Rn = row(A) ⊕ null(A).

Now for some numerical computations.

EXAMPLE 47 We apply the Gram–Schmidt process to the vectors


   
1 1
   
u1 = 1 and u2 = 0
   
0 2

in R3 . We set v1 = u1 = (1, 1, 0)t . Then

4v41 2 = /v1 , v1 0 = vt1 v1 = 1 + 1 + 0 = 2.

So the next vector will be given by

/u2 , v1 0
v2 = u2 − v1
4v41 2
1
= (1, 0, 2)t − (1, 1, 0)t
2
= (1/2, −1/2, 2)t .
Saikia-Linear Algebra book1 February 25, 2014 0:8

Orthogonality in Rn 173

Since 4v42 2 = 1/4 + 1/4 + 4 = 9/2, the length of v2 = 3/ 2. Thus the required or-
thonormal vectors are
 √   √ 
1/ 2  1/3 2 
 √   √ 
1/ 2 and  −1/3 .
√ 2 
0 2 2/3

EXAMPLE 48 In this example, we determine an orthonormal basis of the column space of the matrix
 
 1 0 2 
 
A =  1 1 1 .
 
−1 −1 0

Here we start with the vectors u1 = (1, 1, −1)t , u2 = (0, 1, −1)t and u3 = (2, 1, 0)t .
So v1 = (1, 1, −1)t , whose length squared is 1 + 1 + 1 = 3. Thus,

/u2 , v1 0
v2 = u2 − v1
4v41 2
0+1+1
= (0, 1, −1)t − (1, 1, −1)t
3
= (−2/3, 1/3, −1/3)t

and so 4v42 2 = 4/9 + 1/9 + 1/9 = 6/9 = 2/3. It follows that


/u3 , v1 0 2+1+0
2
= =1
4v41 3

/u3 , v2 0 −4/3 + 1/3 + 0


= = −3/2.
4v42 2 2/3

So, finally

/u3 , v1 0 /u3 , v2 0
v3 = u3 − 2
v1 − v2
4v41 4v42 2
= (2, 1, 0)t − (1, 1, −1)t + 3/2(−2/3, 1/3, −1/3)t
= (0, 1/2, 1/2)t ,
√ √
whose length is 1/4 + 1/4 = 1/ 2. Normalizing the vectors v1 , v2 and v3 , that is,
dividing them by their respective lengths, we obtain the following orthonormal basis
of the column space of A:
 √   √ √   
 1/ 3   − 2/ 3   √0 
 √   √   
 1/ √3 ,  1/ √6 ,  1/ √2 .
     
−1/ 3 −1/ 6 1/ 2

We reiterate that the Gram–Schmidt process guarantees that the final orthonormal
vectors spans the same subspace whose basis we started with. Note that if we start
with linearly dependent set of vectors then the Gram–Schmidt process breaks down
(see Exercise 13).
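A short Python implementation may help in checking such computations. The sketch below is an aside assuming NumPy (it is not part of the text); it follows the recipe of Proposition (3.7.17) and reproduces the orthonormal basis of Example 48. Since np.vdot conjugates its first argument, the same function also works for complex vectors.

    import numpy as np

    def gram_schmidt(vectors):
        """Orthonormalize linearly independent vectors as in Proposition 3.7.17."""
        vs = []
        for u in vectors:
            v = u
            for w in vs:
                # subtract the component of u along w: (<u, w>/||w||^2) w
                v = v - (np.vdot(w, u) / np.vdot(w, w)) * w
            vs.append(v)
        return [v / np.linalg.norm(v) for v in vs]

    u1 = np.array([1.0, 1.0, -1.0])
    u2 = np.array([0.0, 1.0, -1.0])
    u3 = np.array([2.0, 1.0, 0.0])

    q1, q2, q3 = gram_schmidt([u1, u2, u3])
    print(q1, q2, q3)   # compare with (1/√3, 1/√3, −1/√3), (−√2/√3, 1/√6, −1/√6), (0, 1/√2, 1/√2)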

EXAMPLE 49 We now consider an example involving complex vectors. Let us apply the Gram–
Schmidt process to vectors
    u1 = (1, i)t and u2 = (1 + i, −i)t .

One has to be a little careful because dot products of complex vectors are expressed
in terms of conjugate transposes and so the order of the vectors in dot products does
matter. To begin with, v1 = u1 and so ‖v1 ‖² = (1, −i)(1, i)t = 2. To find v2 , we first
compute

    ⟨u2 , v1 ⟩ = v∗1 u2
              = (1, −i)(1 + i, −i)t
              = 1 + i + i²
              = i.

Therefore, the next vector, given by the Gram–Schmidt process, is

    v2 = u2 − (⟨u2 , v1 ⟩/‖v1 ‖²) v1
       = (1 + i, −i)t − (i/2)(1, i)t
       = (1 + i/2, 1/2 − i)t .

The reader can now verify that ‖v2 ‖² = 5/2 and that the required orthonormal vectors
are

    (1/√2)(1, i)t and (1/√10)(2 + i, 1 − 2i)t .
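The same computation can be repeated numerically; the following minimal NumPy check of Example 49 is an aside and not part of the text.

    import numpy as np

    u1 = np.array([1.0, 1j])
    u2 = np.array([1 + 1j, -1j])

    v1 = u1
    v2 = u2 - (np.vdot(v1, u2) / np.vdot(v1, v1)) * v1   # <u2, v1>/||v1||^2 = i/2
    print(v2)                                            # [1 + 0.5j, 0.5 - 1j]
    print(v2 / np.linalg.norm(v2))                       # (2 + i, 1 − 2i)/√10, numerically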
QR Factorization
The Gram–Schmidt process gives rise to what is known as the QR factorization of real matrices.
Consider a real m × n matrix A with n linearly independent columns u1 , u2 , . . . , un in Rm . Note that,
as Rm is an m-dimensional vector space, n ≤ m. It is also clear that the rank of A is n.
Suppose that the Gram–Schmidt process, applied to the column vectors of A produce an orthonor-
mal set q1 , q2 , . . . , qn . Let Q be the m × n matrix over R having the vectors of this orthonormal set as
its columns. We claim that there is an invertible upper triangular matrix R of order n such that A = QR.
To prove our claim, we first recall that the Gram–Schmidt process is such that for any k, 1 ≤ k ≤ n,
span{u1 , u2 , . . . , uk } = span{q1 , q2 , . . . , qk }. Thus, for any j, 1 ≤ j ≤ n, the vector u j is a linear
combination of the orthonormal vectors q1 , q2 , . . . , q j . It then follows from Proposition (3.7.16) that
    u j = ⟨u j , q1 ⟩ q1 + ⟨u j , q2 ⟩ q2 + · · · + ⟨u j , q j ⟩ q j .                          (3.12)

For a fixed j, set ri j = ⟨u j , qi ⟩ for 1 ≤ i ≤ j and define an n-dimensional column vector r j as

    r j = (r1 j , r2 j , . . . , r j j , 0, . . . , 0)t .

Thus, if R is the matrix of order n having the column vectors r1 , r2 , . . . , rn as its n columns, then R
is clearly upper triangular.
We have already noted (see end of Section 1.6 about column-row multiplication as well as the de-
scription of column space in Section 3.6) that if M is an m×n matrix with columns γ1 , γ2 , . . . , γn and
x an n-dimensional column vector (x1 , x2 , . . . , xn )t , then Mx = x1 γ1 + x2 γ2 + · · · + xn γn . Applying
this formula to the matrix Q having columns q1 , q2 , . . . , qn and the column vector r j , we see that

Qr j = r1 j q1 + r2 j q2 + · · · + r j j q j
= u j,

by Equation(3.12). In other words, Q multiplied to the jth column of R produces u j , the jth column
of A. Therefore, QR = A. To complete the proof of our claim, we need to show that R is invertible.
Now R is an upper triangular matrix such that each of its diagonal entries r j j = ⟨u j , q j ⟩ is non-zero by
Equation (3.11). Therefore, R is invertible.
Thus, we have proved the following.

Proposition 3.7.21. Let A be a real m × n matrix such that the rank of A is n. Then there is an
m × n matrix Q with orthonormal columns and an invertible upper triangular matrix R of order n such
that

A = QR.

EXAMPLE 50 We determine the QR factorization of the following matrix


 
 1 −1 
 2 3 
A =  .

 −1 2 
−2 1

Clearly the column vectors u1 = (1, 2, −1, −2)t and u2 = (−1, 3, 2, −1)t are linearly
independent. We leave it to the reader to show that the Gram–Schmidt process, ap-
Saikia-Linear Algebra book1 February 25, 2014 0:8

176 Vector Spaces

plied to u1 , u2 gives us the orthonormal set q1 , q2 , where


 √   √ 
 −1/ √10   −3/5 2 
   √ 
 2/ √10   4/5 √2 
q1 =   , q2 =  ,
 −1/ √10    5/5 2 
 
−2/ 10 0

and that
H I √
u1 , q1 = 10
H I √
u1 , q2 = 10/2
H I √
u2 , q2 = 5/ 2

It follows that the QR factorization of A is given by


   √ √ 
 1 −1   −1/ √10 −3/5√ 2  ' √ √ (
 2 3   2/ √10 4/5 √2  10 10/2
 =  √ .
 −1 2   −1/ 10 5/5 2  0
 5/ 2
 √
−2 1 −2/ 10 0

Note: The matrix equation A = QR implies that Qt A = R as Qt Q is the identity matrix


of order 2.
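In practice one rarely carries out the Gram–Schmidt arithmetic by hand. As an aside (assuming NumPy, and not part of the text), NumPy's built-in QR routine returns the same factorization up to the signs of the columns of Q, a library convention rather than anything required by the proposition.

    import numpy as np

    A = np.array([[ 1.0, -1.0],
                  [ 2.0,  3.0],
                  [-1.0,  2.0],
                  [-2.0,  1.0]])

    Q, R = np.linalg.qr(A)        # reduced QR: Q is 4 x 2 with orthonormal columns, R is 2 x 2
    print(np.allclose(Q @ R, A))
    print(np.allclose(Q.T @ Q, np.eye(2)))
    print(R)                      # compare with [[√10, √10/2], [0, 5/√2]] up to signs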

EXERCISES
All matrices are over R unless otherwise specified. The field F is either R or C.

1. Determine whether the following statements are true or false giving brief justification.
(a) Any linearly independent set in Fm is an orthogonal set.
(b) A square matrix with orthogonal columns is an orthogonal matrix.
(c) If the subspace W of Fm is spanned by u1 , u2 and y ∈ Fm is orthogonal to both u1 and u2 ,
then y ∈ W ⊥ .
(d) If ‖x − y‖² = ‖x‖² + ‖y‖² for x, y ∈ Fm , then x ⊥ y.
(e) If an m × n matrix A has orthonormal columns, then AAt = Im .
(f) If ut v = 0 for vectors u, v ∈ Cm , then v ⊥ u.
(g) An orthogonal matrix has linearly independent rows.
(h) if a vector u ∈ Rm has zero component along a vector v ∈ Rm , then u and v are orthogonal.
(i) For any subspace W of Fm , W ∩ W⊥ = {0}.
(j) For an orthogonal matrix Q of order n, Qx ⊥ Qy if and only if x ⊥ y.
(k) Every non-zero m × n matrix over R has a QR factorization.
(l) If A = QR is a QR factorization, then the columns of Q form an orthonormal basis of the column
space of A.
(m) If A = QR is a QR factorization, then the diagonal entries of R are positive.

(n) For any m × n real matrix A, the null space of At is the orthogonal complement of the column
space of A.
(o) The null space of any orthogonal matrix is {0}.
(p) If A is a real symmetric matrix of order n such that A2 = In , then A is orthogonal.
(q) For a real matrix A, the equation Ax = b has a solution if and only if b is orthogonal to every
solution of At x = 0.
2. For x, y ∈ Rm , show that x + y is orthogonal to x − y if and only if ‖x‖ = ‖y‖.
3. Find two vectors x1 and x2 in R2 which are orthogonal to (1, −1)t . Can they be linearly independent?
4. Find two vectors x1 and x2 in R3 which are orthogonal to (1, 1, −1)t . Is it possible to choose x1 and
   x2 such that they are linearly independent or orthogonal?
5. Show that u1 = (1, 0, 1)t and u2 = (−2, 1, 2)t are orthogonal in R3 . Extend {u1 , u2 } to an orthogonal
   basis {u1 , u2 , u3 } of R3 .


6. Let W be the subspace of R3 represented by the plane x − 2y + z = 0. Determine W ⊥ .
7. Let
 
1 1 1
 
A = 2 3 3 .
 
344

Find vectors u, v ∈ R3 such that u is orthogonal to the row space of A and v orthogonal to the
column space of A.
8. Find an orthonormal basis of the column space of A, the matrix of the preceding Exercise.
9. Find an orthonormal basis of W ⊥ , where W is the subspace of R4 spanned by (1, 0, 0, 1)t .
10. Let A be real matrix of order 3, one of whose rows is (1, 1, 1). Is it possible that the vector
(−1, 0, 2)t is a solution of the system of equation Ax = 0?
11. Let A be a real m × n matrix. Prove that the orthogonal complement of null(At ) is the column
space of A.
12. Let A be a real m × n matrix. Prove that A and At A have the same null space (in Rn ). Show further
that if A has non-zero orthogonal columns, then At A is invertible.
13. Let u1 = (1, 1)t and u2 = (2, 2)t . Verify that
    u2 − (⟨u2 , u1 ⟩/‖u1 ‖²) u1 = 0.
14. Let L be the line segment from the origin (0, 0) to the point (x1 , x2 ) and let x = (x1 , x2 )t be the
vector representing L. If L makes an angle α with the positive direction of the x-axis, then show
that
    sin α = x2 /‖x‖ and cos α = x1 /‖x‖.

Further, assume that the line segment L1 representing y = (y1 , y2 )t makes an angle β with the
positive direction of the x-axis and that θ = β − α. Prove that
    cos θ = ⟨y, x⟩ / (‖y‖ ‖x‖).
15. Prove that the orthogonal matrices of order n form a group with respect to matrix multiplication.
16. Let u be a unit vector in Rm . Prove that Q = Im − 2uut is an orthogonal matrix.
17. Prove that the rows of an orthogonal matrix Q in Mm (R) form an orthonormal set.
18. Let Q ∈ Mm (R) such that ⟨Qx, Qy⟩ = ⟨x, y⟩ for all x, y ∈ Rm . Is Q an orthogonal matrix?
19. Verify that the columns of the following matrices
   
 1 1 1 1  1 1 1 1 
 1 −1  1 i i2 i3 

1 −1 
H =   , F =  
 1 1 −1 −1  1 i2 i4 i6 
 
1 −1 −1 1 1 i3 i6 i9

are orthogonal in R4 and C4 , respectively. Hence determine their inverses.


20. Verify that the following matrices
   
 2 −1 −2   1 √ 0 0 
1  
 
 
A = √  −1 2 −2  ,  0 3/2 −1/2
√ 
10  −2 −2 −1  
0 1/2 3/2

are orthogonal. Are they rotation or reflection matrices?


21. Let a, b and c be real numbers such that a² + b² + c² = 1. Prove that

    Q = [ 1 − 2a²    −2ab       −2ac    ]
        [  −2ab     1 − 2b²     −2bc    ]
        [  −2ac      −2bc      1 − 2c²  ]

is orthogonal and determine whether Q is a rotation or a reflection matrix.


22. Find QR factorizations of the following matrices:

    [  0   1 ]        [ 1   1   2 ]
    [  1   1 ]        [ 2   3   1 ] .
    [ −1   0 ] ,      [ 1   0   1 ]
    [  1   1 ]

23. Let A be a real symmetric matrix of order n. If A = QR is a QR-factorization, then prove that
A2 = Rt R is an LU factorization of A2 .

3.8 BASES OF SUBSPACES


We need to compute the dimensions as well as bases of specific subspaces of vector spaces quite
often. In this section, we discuss, with the help of a few examples, as to how these computations can
be performed efficiently with matrices and their echelon forms. The techniques are the same as the

ones adopted to find row and column ranks of matrices in the last section. Thus, familiarity with the
material of the last section will be useful in understanding the working of the examples here.
As vectors in an arbitrary finite-dimensional vector space can be visualized as row or column vec-
tors by introducing coordinates through a fixed basis, our examples will be in Fn for a field F, or more
specifically, in Rn .

EXAMPLE 51 Let W be the subspace of R5 spanned by the vectors (1, 1, 2, 1, −2), (2, 3, 8, 1, −1)
and (−1, 1, 6, −3, 8). We discuss the following questions.

(a) How to find a basis of W?


(b) What may be the general form of a vector in W?
(c) What are the coordinates of an arbitrary vector of W with respect to a basis of W?

In this example, we try to answer these questions by forming the 3 × 5 matrix A


listing the given vectors as the rows of A. Thus,
 
 1 1 2 1 −2
 
A =  2 3 8 1 −1.
 
−1 1 6 −3 8

The crucial point is that W is the same as the row space of the matrix A. Therefore,
any basis of the row space of A is a basis of W. Now recall [see the remarks after the
proof of Proposition (3.6.9) in the last section] that a basis of the row space of A is
formed by the non-zero rows of the reduced row echelon form R of A. Thus, we need
to perform row operations on A to reduce it to R. Recall that the symbol ∼ denotes
row equivalence.
 
 1 1 2 1 −2
 
A =  2 3 8 1 −1
 
−1 1 6 −3 8
 
1 1 2 1 −2
 
∼ 0 1 4 −1 3
 
0 2 8 −2 6
 
1 1 2 1 −2
 
∼ 0 1 4 −1 3
 
0 0 0 0 0
 
1 0 −2 2 −5
 
∼ 0 1 4 −1 3.
 
0 0 0 0 0

We see that
 
1 0 −2 2 −5
 
R = 0 1 4 −1 3.
 
0 0 0 0 0

It follows that the non-zero row vectors

v1 = (1, 0, −2, 2, −5) and v2 = (0, 1, 4, −1, 3)

of R form a basis of W. This answers the first question.


For the second question, note that any vector in W is of the form x1 v1 + x2 v2 for
some scalars x1 and x2 . But such a general description does not really throw any light
on the nature of vectors in W. What we would like to have is some specific relations
among the components of the vectors in W which distinguishes them from the rest of
vectors in R5 . To discover such relations, let y = (y1 , y2 , y3 , y4 , y5 ) = x1 v1 + x2 v2 be
an arbitrary vector in W. Note that (x1 , x2 )t is the coordinate vector of y with respect
to the basis {v1 , v2 } of W. Now, equating the components of both sides of the equality

y = x1 v1 + x2 v2 = x1 (1, 0, −2, 2, −5) + x2(0, 1, 4, −1, 3),

we can easily express yi in terms of the scalars x j :

y1 = x1 , y2 = x2 , y3 = −2x1 + 4x2, y4 = 2x1 − x2 , y5 = −5x1 + 3x2.

Eliminating the x j , we conclude that (y1 , y2 , y3 , y4 , y5 ) ∈ W if and only if

2y1 − 4y2 + y3 = 0
−2y1 + y2 + y4 = 0
5y1 − 3y2 + y5 = 0.

In other words, W is precisely the solution space in R5 of the following homogeneous


system of linear equations over R:

2x1 − 4x2 + x3 = 0
2x1 − x2 − x4 = 0
5x1 − 3x2 + x5 = 0.
Observe that the number of free variables of this system of equations is the dimension
of W by Corollary (3.6.15).
The fact that a subspace of Rn spanned by a certain set of vectors can be iden-
tified as the solution space of some homogeneous system of linear equations over
R will be useful in tackling some other problem as we will see shortly. Coming
back to our example, note that we have also found the coordinates x1 and x2 of
y = (y1 , y2 , y3 , y4 , y5 ) ∈ W with respect to the basis {v1 , v2 } of W:

x 1 = y1 and x2 = y2 .

For example, it is readily verified that the coordinates of the vectors spanning W we
started with, will be given as follows

(1, 1, 2, 1, −2) = 1 · v1 + 1 · v2
(2, 3, 8, 1, −1) = 2 · v1 + 3 · v2
(−1, 1, 6, −3, 8) = −1 · v1 + 1 · v2 .

It is now quite straightforward to obtain the coordinates of a vector in W relative


to an arbitrary basis of W by using Theorem (3.4.14). For example, the rank of the
matrix A being 2 (as the rank equals the rank of R, the echelon form of A), the first
two rows of A, say u1 = (1, 1, 2, 1, −2) and u2 = (2, 3, 8, 1, −1), also form a basis of
W. Expressing these vectors in terms of the basis vectors v1 , v2 of W, we see that
    P = [ 1   2 ]
        [ 1   3 ]

is the transition matrix from basis {u1 , u2 } to basis {v1 , v2 } of W. Since

    P−1 = [  3   −2 ]
          [ −1    1 ] ,

it follows that the coordinate vector of any y = (y1 , y2 , y3 , y4 , y5 ) in W with respect
to basis {u1 , u2 } will be given by

    P−1 (y1 , y2 )t = (3y1 − 2y2 , −y1 + y2 )t .
We next illustrate the method of finding a basis of the intersection of two
subspaces of Fn by computing a basis of W ∩ U, where W is the same sub-
space of R5 we have been considering and U is the subspace of R5 spanned by
(2, 1, 0, 3, −7), (1, 0, −1, 2, −6) and (4, 2, 3, 5, −9). We first find a basis of U the
same way as in the case for W by finding an echelon form of the matrix whose rows
are the vectors spanning U. It is an easy matter to see that the required echelon form is
 
1 0 −1 2 −6
0 1 2 −1

5,
 
0 0 0 0 0
so that (1, 0, −1, 2, −6) and (0, 1, 2, −1, 5) form a basis of U. Thus, if x1 , x2
are the coordinates of an arbitrary vector y = (y1 , y2 , y3 , y4 , y5 ) in U, then
(y1 , y2 , y3 , y4 , y5 ) = (x1 , x2 , −x1 + 2x2 , 2x1 − x2 , −6x1 + 5x2 ). Eliminating the xi , we
see that the components of y satisfy the relations

y1 − 2y2 + y3 = 0
2y1 − y2 − y4 = 0.
6y1 − 5y2 + y5 = 0

It follows that U is the solution space of the following homogeneous system of linear
equations over R:
x1 − 2x2 + x3 = 0
2x1 − x2 − x4 = 0.
6x1 − 5x2 + x5 = 0
The key to finding a basis of W ∩ U is the simple observation that the intersection
W ∩ U is the common solution space of the preceding two homogeneous systems of
equations whose solution spaces are W and U, respectively.

So, following the procedure outlined in Section 2.4 for finding solutions of sys-
tems of linear equations, we form the coefficient matrix

 
2 −4 1 0 0

2 −1 0 −1 0

5 −3 0 0 0
C =  
1 −2 1 0 0
2 
 −1 0 −1 0

6 −5 0 0 1

of the six equations which define subspaces W and U, and row reduce it. We leave it
to the reader to verify that the reduced row echelon form S of C is

 
1 −2 0 0 0

0 1 0 2 1

0 0 1 0 0
S =  .
0 0 0 7 3
0 
 0 0 0 0

0 0 0 0 0

Since S has just a single non-pivot column, the solutions satisfying the two systems
of equations simultaneously, i.e. the subspace W ∩U form an one-dimensional vector
space. The matrix S implies that the components of the vectors (x1 , x2 , x3 , x4 , x5 ) in
W ∩ U satisfy the following relations:

x1 − 2x2 = 0
x2 + 2x4 + x5 = 0 .
x3 = 0
7x4 + 3x5 = 0

A basis of W ∩ U, which consists of a single vector, can be determined by assigning


any arbitrary value to x5 , the free variable, and then expressing the others in terms of
x5 according to these relations. Thus, for example, if we choose x5 = 7, then we see
that x4 = −3, x2 = −1 and x1 = −2. We can, therefore, conclude that (−2, −1, 0, −3, 7)
forms a basis of W ∩ U.
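The whole computation of Example 51 can be replayed with SymPy (assumed available; this aside is not part of the text): the non-zero rows of the reduced echelon forms give the bases of W and U, and the null space of the combined coefficient matrix C gives W ∩ U.

    from sympy import Matrix

    W = Matrix([[ 1, 1, 2,  1, -2],
                [ 2, 3, 8,  1, -1],
                [-1, 1, 6, -3,  8]])
    U = Matrix([[2, 1,  0, 3, -7],
                [1, 0, -1, 2, -6],
                [4, 2,  3, 5, -9]])

    print(W.rref()[0])    # rows (1, 0, -2, 2, -5) and (0, 1, 4, -1, 3)
    print(U.rref()[0])    # rows (1, 0, -1, 2, -6) and (0, 1, 2, -1, 5)

    C = Matrix([[2, -4, 1,  0, 0],
                [2, -1, 0, -1, 0],
                [5, -3, 0,  0, 1],
                [1, -2, 1,  0, 0],
                [2, -1, 0, -1, 0],
                [6, -5, 0,  0, 1]])
    print(C.nullspace())  # one vector, a scalar multiple of (-2, -1, 0, -3, 7)^t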
In the next example, we find a basis of the sum of two subspaces whose spanning sets are given.

EXAMPLE 52 Let W be the subspace of R4 spanned by the vectors (1, −2, −1, 1) and (−2, 1, 3, −3),
and U be the subspace spanned by (1, −8, 1, −1) and (3, 2, 1, −1). We note that the
sum W + U is spanned by the four given vectors, and therefore a basis of W + U
can be determined from the reduced row echelon form of the 4 × 4 matrix A whose
rows are the vectors spanning W and U, respectively. It can be easily verified that the

reduced row echelon form of


 
 1 −2 −1 1

−2 1 3 −3
A =  
 1 −8 1 −1
3 2 1 −1

is
 
1 0 0 0

0 1 0 0
R =  .
0 0 1 −1
0 0 0 0

So, a basis of W + U is formed by v1 = (1, 0, 0, 0), v2 = (0, 1, 0, 0) and v3 =


(0, 0, 1, −1). As in the preceding example, we then see that if y = (y1 , y2 , y3 , y4 ) is an
arbitrary vector of W + U, then its coordinates with respect to the basis {v1 , v2 , v3 } of
W + U are just y1 , y2 and y3 . It can also be deduced easily that W + U is the solution
space of a single linear equation

x3 + x4 = 0.
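As a check on Example 52 (again an aside assuming SymPy), the reduced row echelon form of the matrix of spanning vectors gives the basis of W + U directly.

    from sympy import Matrix

    A = Matrix([[ 1, -2, -1,  1],
                [-2,  1,  3, -3],
                [ 1, -8,  1, -1],
                [ 3,  2,  1, -1]])

    R, pivots = A.rref()
    print(R)              # non-zero rows: (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, -1)
    print(A.rank())       # dim(W + U) = 3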

EXERCISES
1. Let W be the subspace of R4 spanned by the vectors (1, −2, 1, 4), (−2, 1, 3, −1), (0, 6, −10, −14)
and (−5, 1, 10, 1). Find a basis of W, and a homogeneous system of linear equations over R
whose solution space is W. Determine the coordinates of an arbitrary vector in W with respect
to the basis of W.
2. Let W be the subspace of R5 generated by the vectors (−2, 1, 3, 0, 5), (1, −3, 0, 2, −2), (1, 2, 3, 0, 5)
and (−1, −2, −1, 0, 6). Find a basis of W, and a homogeneous system of linear equations over R
whose solution space is W. Determine the coordinates of an arbitrary vector in W with respect
to the basis of W.
3. Let W be the subspace of R5 consisting of the vectors (x1 , x2 , x3 , x4 , x5 ) such that

    x1 − 2x2 + 3x3 − x4 = 0
    x2 − x3 + 3x4 − 2x5 = 0.

Find a basis of W and extend it to a basis of R5 . What are the coordinates of a vector in W with
respect to the basis just found?
4. Find a basis of the subspace W of the space R3 [x] of all real polynomials of degree at most 3
spanned by

2 − 3x2 + 4x3 , x + 2x2 − x3 , x + x2 − x3 and 4 + x − 5x2 + 2x3 .

Hint: Use the coordinates of the polynomials spanning W with respect to the standard basis
of R3 [x] to reduce the problem to finding a basis of a subspace of R4 .
The trace T r(A) of a matrix A ∈ Mn (F) is the sum of its diagonal entries.
5. Let W1 and W2 be the following subsets of M3 (F):
(a) W1 = {A | T r(A) = 0}.

(b) W2 = {A = [ai j ] | ∑_{j=1}^{3} a1 j = 0}.
Prove that W1 and W2 are subspaces of M3 (F). Further, determine the dimensions W1 , W2 , W1 ∩
W2 and W1 + W2 .
6. Let

W1 = {(x1 , x2 , x3 , x4 , x5 ) ∈ R5 | x1 − x2 − x3 = 0}

and

W2 = {(x1 , x2 , x3 , x4 , x5 ) ∈ R5 | x1 = x2 }

be subspaces of R5 . Determine the dimensions of the subspaces W1 , W2 , W1 + W2 and W1 ∩ W2 .


7. Find bases of W1 + W2 and W1 ∩ W2 , where W1 and W2 are the subspaces of R4 generated by

(1, 1, 1, 1), (1, 1, −1, −1), (1, −1, 1, −1)

and

(1, 1, 0, 0), (1, −1, 1, 0), (2, −1, 1, −1),

respectively.
8. Let W1 and W2 be the subspaces spanned by

(1, −2, 3, 4, 0), (1, 2, −3, 4, 1), (3, 2, −3, 12, 2)

and

(1, 1, 1, −1, −2), (0, 1, −4, 5, 3), (3, 5, −5, 7, 0),

respectively. Find two systems of homogeneous linear equations over R whose solution spaces
are W1 and W2 , respectively. Hence, find a basis of W1 ∩ W2 . What are the coordinates of an
arbitrary vector in W1 ∩ W2 with respect to this basis?
9. Given the subspace W1 of R4 spanned by

(1, 2, −1, 1), (2, 2, −1, 1) and (1, 1, 2, 3),

find bases for two subspaces W2 and W3 of R4 such that

R4 = W1 ⊕ W2 = W1 ⊕ W3 .

3.9 QUOTIENT SPACE


We have seen that the geometrical object of a straight line passing through the origin in R2 or R3 is
a one-dimensional subspace in the respective vector space. A question therefore arises: Is there any
interpretation of the straight lines not passing through the origin? Such an interpretation needs the
idea of a quotient space.

To introduce the idea of such a space, let us consider a straight line, say, y = x + 1 in R2 . The points
on this line form a subset L = {(x1 , x2 ) ∈ R2 | x2 = x1 + 1} of R2 . Note that this is not a subspace of
R2 . However, it is closely related to the subspace of R2 formed by the straight line passing through
the origin and parallel to the line y = x + 1. Let the subspace be denoted by W so that W = {(x1 , x2 ) ∈
R2 | x2 = x1 }. To relate L to W, we need the following notation, which is similar to that of the sum of two
subspaces.
For a vector v and a subset U of a vector space V, we let v + U denote the subset of V formed by
the sums v + u as u ranges over U, i.e.,

v + U = {v + u | u ∈ U}.

Note that if U is a subspace and v ∈ U, then v + U = U as subsets of V.


The notation just introduced can be extended to the addition of two arbitrary subsets of a vector
space which generalizes sum of two subspaces.

Definition 3.9.1. Let A and B be any two non-empty subsets of a vector space V. Then, the sum
A + B is the subset of V defined as

A + B = {a + b | a ∈ A, b ∈ B}.

Going back to the example with which we started, the notation of a sum of a vector and a subset
allows us to relate L and W as subsets of R2 as follows

L = (0, 1) + W.

For, if (a1 , a2 ) ∈ L, then a2 = a1 + 1, so that (a1 , a2 ) = (0, 1) + (a1, a1 ) which is clearly in (0, 1) + W.
Similarly, every vector of (0, 1) + W is in L. Hence, the equality between these two subsets of R2 .
Because of this equality, we say that L is a coset of the subspace W in R2 , and (0, 1) is a coset
representative of the coset L.
We list some of the obvious points related to this idea of a coset of a subspace of the vector space
R2 for they will have relevance in the general case also.
• There is no unique representative of the coset L. A moment’s thought will make it clear that any
vector in the subset L will be as good a representative as (0, 1) is.
• However, as the subspace W contains the zero vector, any coset representative has to be a vector
in the set L.
• There is nothing special about the line L which makes it a coset of W. In fact, any line parallel to
the line W can be thought of as a coset of the subspace W. For any line L' parallel to W, we may
choose a vector l in L' , and a routine calculation then will show that L' is the coset l + W.
• W is a coset of itself, for we may choose any vector in W, for example, the zero vector, as a
representative of the coset W.
• Consider the cosets or the straight lines L = (0, 1) + W and L' = (0, 2) + W. Since L = {(x1 , x2 ) ∈
R2 | x2 = x1 + 1}, and L' = {(x1 , x2 ) ∈ R2 | x2 = x1 + 2}, it follows that the sum of L and L' as subsets
of R2 is the set L'' = {(x1 , x2 ) ∈ R2 | x2 = x1 + 3}. See Definition (3.9.1). On the other hand, the
sum of the coset representatives of L and L' gives the vector (0, 3) which determines a third coset
(0, 3) + W. But note that this coset is precisely L'' , the sum of the first two cosets. The point is
that cosets can be added.

• In a similar manner, it can be verified that the scalar multiples of the vectors in a coset of W by a
scalar will again be a coset. For example, multiplying the vectors in L by the scalar 2, we obtain
the vectors in L' so that it makes sense to say that 2L = L' .
That these cosets can be added and scalar multiplied makes the idea of viewing straight lines in R2 as
cosets a rewarding one. For, then these straight lines themselves may be organized as a vector space. In fact,
even in the most general case, the cosets of any subspace of a vector space can, in a very natural
manner, be made into a vector space. Such spaces, whose vectors are cosets, are known as quotient
spaces. Let us look at the general case now.
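
A quick numerical check of the example above may be reassuring. The following short Python sketch (an illustration only; the sample points and the tolerance are arbitrary choices) verifies that sums of points taken from the cosets L = (0, 1) + W and L' = (0, 2) + W all lie on the line x2 = x1 + 3, that is, in the coset (0, 3) + W.

    # Points of L lie on x2 = x1 + 1, points of L' on x2 = x1 + 2; their sums
    # should lie on x2 = x1 + 3, the coset (0, 3) + W.
    def on_line(p, c, tol=1e-12):
        # is the point p = (p1, p2) on the line x2 = x1 + c ?
        return abs(p[1] - p[0] - c) < tol

    L  = [(t, t + 1) for t in (-2.0, 0.0, 3.5)]   # some points of (0, 1) + W
    Lp = [(s, s + 2) for s in (-1.0, 0.5, 4.0)]   # some points of (0, 2) + W
    sums = [(a1 + b1, a2 + b2) for (a1, a2) in L for (b1, b2) in Lp]
    assert all(on_line(p, 3) for p in sums)       # every sum lies on x2 = x1 + 3
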

Cosets of a Subspace
Let W be a subspace of a vector space V over a field F. (V may be infinite-dimensional also). For any
v ∈ V, the coset v + W of W represented by v is the subset of V defined as

v + W = {v + w | w ∈ W}. (3.13)

It is easy to verify that two cosets v + W and v' + W are equal as sets if and only if v − v' ∈ W. Thus, a
coset v + W can have any other representative v' from V as long as v − v' ∈ W.
The collection of all distinct cosets of W in V, denoted by the symbol V/W, is called the quotient
space of V by W. Thus,

V/W = {v + W | v ∈ V}.

There is a convenient notation for cosets in case there is no ambiguity about the underlying subspace
W (for example, in this present discussion there is only one subspace). If the role of W is understood,
we may let

v̄ = v + W.

In this notation,

V/W = {v̄ | v ∈ V} (3.14)

and

v̄1 = v̄2 if and only if v1 − v2 ∈ W. (3.15)

Addition and Scalar Multiplication of Cosets


In V/W, as we have mentioned earlier, addition and scalar multiplication of cosets are in terms of their
representatives as follows:

v̄1 + v̄2 = (v1 + v2 )‾ (3.16)
a v̄ = (av)‾,

where a is a scalar from the field F. Note that these formulae in our old notation for cosets, are

(v1 + W) + (v2 + W) = (v1 + v2 ) + W



and

a(v1 + W) = av1 + W.

Also, note that the operation of addition + and scalar multiplication as implied by juxtaposition on
the left-hand side of these formulae are defined for V/W in terms of similar operations of V on the
right-hand side.
Before we proceed any further, let us discuss a difficulty with these definitions. The difficulty arises
because the operations are defined in terms of the representatives of cosets, and we know that there is
no unique representative of a coset. Thus, for these definitions to be valid, we have to make sure that
they do not depend on the choices of vectors to represent cosets. The verification that the equations
in (3.16) are independent of the choice of coset representatives is described as establishing the well-
definedness of those operations.
Let us take the first of these two definitions, and choose different representatives for the cosets v̄1
and v̄2 . So, let u1 and u2 be vectors in V such that

v̄1 = ū1 and v̄2 = ū2 .

We have to show that

v̄1 + v̄2 = ū1 + ū2 ,

or, equivalently, that

(v1 + v2 ) + W = (u1 + u2 ) + W.

Now, v̄1 = ū1 and v̄2 = ū2 imply that v1 − u1 and v2 − u2 are in the subspace W. So, the sum of these
two vectors is again in W. Since the sum can be put in the form (v1 + v2 ) − (u1 + u2 ), the cosets of W
determined by (v1 + v2 ) and (u1 + u2 ) are the same, i.e., (v1 + v2 ) + W = (u1 + u2 ) + W, as desired. Hence, the sum of
cosets does not depend on our choice of representatives of the cosets. We invite the reader to verify
the well-definedness of the scalar multiplication of cosets in the same way.
It is routine now to prove the following theorem.

Theorem 3.9.2. Let W be a subspace of a vector space V over a field F. Then, the set V/W of all
distinct cosets of W in V is a vector space over the same field F with respect to addition and scalar
multiplication of cosets as defined by Equation (3.16).

Proof. The verification of the vector space axioms for V/W is straightforward, as the operations on
the cosets are defined in terms of the operations on vectors of V, and in V those axioms are already
satisfied. For example, the zero vector of V/W is 0̄ = 0 + W, the coset corresponding to the zero vector
0 of V. Similarly, as −v is the additive inverse of v in V, the inverse of the coset v̄ in V/W is the coset
determined by −v.
With these comments, we leave the detailed verification to the reader. !
EXAMPLE 53 For any vector space V, if we choose W = V, then the quotient space V/V clearly has
only one distinct coset, for v + V = 0 + V for any vector v in V. Thus, the quotient
space V/V is a one-element vector space, namely, the zero vector space.

On the other extreme, if we take W = {0}, the zero subspace of V, then V/W is
almost the same as V, for this time distinct vectors in V would determine distinct
cosets.

EXAMPLE 54 We take up the example at the beginning of the section. Here, V = R2 and W =
{(x1 , x2 ) | x1 = x2 }. As we showed there, the quotient space R2 /W is an example of a
new vector space, whose vectors are all the distinct straight lines in R2 parallel to
the line represented by W.
It is clear that, in general, if W is taken to be the one-dimensional subspace of
R2 given by any straight line L passing through the origin, then R2 /W is the vector
space of all lines in R2 parallel to L.
We will see in the next chapter that the quotient space R2 /W in all such cases is
essentially the same as the vector space R.

EXAMPLE 55 Consider the infinite-dimensional real vector space R[x] of all real polynomials. Con-
sider the set W of polynomials in R[x] which are multiples in R[x] of the fixed poly-
nomial x2 + 1. Thus,

W = {(x2 + 1)g(x) | g(x) ∈ R[x]}.

Note that W is not just the scalar multiples of x2 + 1, i.e. it is not the subspace
generated by x2 + 1 in R[x]. But then one verifies that W is still a subspace of R[x],
and so we may talk about the quotient space R[x]/W. An arbitrary coset in this
quotient space is an object like p(x) + W, where p(x) is a polynomial in R[x]. To get
a manageable description of such a coset, we divide the polynomial p(x) by x2 + 1.
It is clear that either p(x) is a multiple of x2 + 1 or the remainder is a polynomial of
degree less than that of x2 + 1. In other words, if q(x) is the quotient, then

p(x) = (x2 + 1)q(x) + r(x),

where r(x) is either the zero polynomial or a polynomial of degree at most one. Now,
the last relation implies that p(x) − r(x), being a multiple of x2 + 1, belongs to the
subspace W. By the definition of equality of cosets, one then finds that the cosets
p(x) + W and r(x) + W are the same. For example, in case p(x) is a multiple of x2 + 1,
the coset p(x) + W coincides with the coset W, the zero of R[x]/W. Every other coset
in R[x]/W has a representative polynomial of degree at most one. Thus, we may
describe R[x]/W as

R[x]/W = {a0 + a1 x + W | a0 , a1 ∈ R}.

This quotient space is essentially a copy of R2 .
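
The reduction of a coset to its degree-at-most-one representative can be carried out mechanically. The following Python sketch (an illustration only; the polynomial p(x) is an arbitrary choice, and NumPy's polydiv expects coefficients in decreasing powers) divides p(x) by x2 + 1 and recovers the representative r(x) of the coset p(x) + W.

    import numpy as np

    p = np.array([1., 2., 3., 4.])     # p(x) = x^3 + 2x^2 + 3x + 4
    w = np.array([1., 0., 1.])         # x^2 + 1
    q, r = np.polydiv(p, w)            # p = (x^2 + 1) q + r with deg r <= 1
    # p(x) - r(x) is a multiple of x^2 + 1, so p(x) + W = r(x) + W.
    print(q, r)                        # q = [1. 2.], r = [2. 2.], i.e. r(x) = 2x + 2
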

Dimensions of Quotient Spaces


One must have been struck by our comment about the quotient spaces being almost the same as some
known ones. But we must wait till the next chapter which deals with mappings between spaces before

these connections can be made precise. One of the results we need for that purpose is about dimensions
of quotient spaces.

Proposition 3.9.3. Let W be a subspace of a finite-dimensional vector space V over a field F. Then,
the quotient space V/W is also finite-dimensional and

dim V/W = dim V − dim W.

Proof. Let dim V = n and dim W = m. Note that as V is finite-dimensional, W has to be finite-
dimensional, and dim W ≤ dim V by Proposition (3.4.6). It is sufficient to exhibit a basis of V/W
consisting of n − m vectors.
Start with a basis w1 , w2 , . . . , wm for W over F. Now, these vectors are linearly independent in W,
hence automatically in V also. By Corollary (3.4.7), this linearly independent set can be extended to a
basis w1 , w2 , . . . , wm , wm+1 , . . . , wn of V. We claim that the cosets w̄m+1 , . . . , w̄n form a basis of
the quotient space V/W.
Consider linear independence of these cosets over the field F. Note that the zero of the vector space
V/W is the coset W = 0̄. Assume that for scalars am+1 , am+2 , . . . , an , we have a relation of linear
dependence for these vectors in V/W as follows

am+1 w̄m+1 + am+2 w̄m+2 + · · · + an w̄n = 0̄.

Applying the definition of scalar multiplication of a coset in each of the terms of the sum, and then
combining all the resultant cosets into a single coset by the rule of coset addition, we can rewrite the
last relation as an equality of two cosets:

(am+1 wm+1 + am+2 wm+2 + · · · + an wn ) + W = 0 + W,

which immediately places the vector am+1 wm+1 + am+2 wm+2 + · · · + an wn in W. Since W is spanned by
w1 , w2 , . . . , wm , the preceding vector can be written as a linear combination of these basis vectors of
W. Thus, we can find scalars, which we name as a1 , a2 , . . . , am so that

am+1 wm+1 + am+2 wm+2 + · · · + an wn = a1 w1 + a2 w2 + · · · + am wm .

But the vectors w1 , w2 , . . . , wm , wm+1 , . . . , wn in V form a basis of V. Hence, the last relation forces
all the scalars on both sides of the relation, and in particular, am+1 , am+2 , . . . , an to be zeros. This
establishes the linear independence of the vectors w̄m+1 , . . . , w̄n .
To complete the proof, we have to show that these cosets span V/W. So, let v + W be any
coset in V/W. If we express v, a vector of V, as a linear combination of the basis vectors
w1 , w2 , . . . , wm , wm+1 , . . . , wn of V, then the coset v + W will be the corresponding linear combi-
nation of the cosets w1 + W, w2 + W, . . . , wm + W, wm+1 + W, . . . , wn + W. But as w1 , w2 , . . . , wm
are in W, the corresponding cosets are the zero cosets in V/W. Therefore, v + W is actually a linear
combination of the remaining cosets wm+1 + W, . . . , wn + W. !

This theorem, for example, shows that if W is a straight line passing through the origin of R2 ,
then the dimension of the quotient space R2 /W is 1. Thus, any non-zero coset forms a basis. Put
differently, this says that every coset is a scalar multiple of a fixed non-zero coset.
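
The counting in the proof of Proposition (3.9.3) is easy to mimic numerically. The following Python sketch (using NumPy; the subspace W of R4 is an arbitrary choice) extends a basis of W to a basis of R4 by adjoining standard basis vectors; the leftover vectors, whose cosets form a basis of R4/W, are then counted.

    import numpy as np

    W = [np.array([1., 2., 0., 1.]), np.array([0., 1., 1., 1.])]   # a basis of W
    basis = list(W)
    extra = []                     # vectors whose cosets will form a basis of R^4/W
    for e in np.eye(4):            # try the standard basis vectors one by one
        if np.linalg.matrix_rank(np.vstack(basis + [e])) > len(basis):
            basis.append(e)
            extra.append(e)
    print(len(extra))              # 2 = dim R^4 - dim W
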

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. All vector
spaces are finite-dimensional over an arbitrary field unless otherwise specified.
(a) For any subspace W of an infinite-dimensional vector space V, the quotient space V/W is
infinite-dimensional.
(b) The set of all planes in R3 parallel to a plane containing the origin of R3 is a quotient space
of R3 .
(c) If W1 is a subspace of another subspace W2 of V, then V/W1 is a subspace of V/W2 .
(d) If W1 is a subspace of another subspace W2 of V, then W2 /W1 is a subspace of V/W1 .
(e) Two cosets v1 + W and v2 + W are equal if and only if v1 = v2 .
(f) If v1 , v2 , . . . , vn span a vector space V, then v1 + W, v2 + W, . . . , vn + W span V/W for any
subspace W of V.
(g) If the quotient space V/W has dimension m, then for any basis v1 , v2 , . . . , vn of V, v1 +
W, . . . , vm + W is a basis of V/W.
(h) The dimension of a quotient space V/W is strictly less than the dimension of V if and only if
W is non-zero.
(i) If W is the subspace of all those matrices in Mn (F) having trace zero, then any matrix in a
non-zero coset of W must have non-zero trace.
2. Let W be the subspace of R[x] consisting of all polynomial multiples of p(x) = x3 + 1. De-
scribe the elements of the quotient space R[x]/W. Determine a basis of R[x]/W, if it is finite-
dimensional.
3. If W1 ⊂ W2 are subspaces of a vector space V, then show that W2 /W1 is a subspace of V/W1 . Is
every subspace of V/W1 of the form W/W1 , where W is some subspace of V containing W1 ?
Justify your answer.
4. Let W be the subspace of R3 spanned by (1, 1, 0) and (1, 0, 1). Find a basis of the quotient space
R3 /W.
5. Let W be the subspace of R4 spanned by (1, −1, 0, 1) and (2, 0, −1, 2). Find a basis of R4 /W.
6. Let W be the subspace of V = M2 (R) consisting of the diagonal matrices in M2 (R). Find a basis
of V/W.

4 Linear Maps and Matrices

4.1 INTRODUCTION
In Chapter 1, we have noted that the multiplication by an m × n matrix over a field F transforms column
vectors of Fn to column vectors in Fm and so can be thought of as a function or a mapping from Fn to
Fm . To be precise, if we let T (x) = Ax for any x ∈ Fn , then T (x) ∈ Fm and so T is a mapping from Fn
to Fm .
The most important property of this mapping, given by the multiplication by A, is that it preserves
the vector space operations:

T (x + y) = T (x) + T (y)
T (ax) = aT (x),

since by the properties of matrix multiplication A(x + y) = Ax + Ay and A(ax) = aAx. The first of the
preceding equalities states that the vector in Fm , produced by applying T to the sum x + y in Fn can
also be obtained by adding the images T (x) and T (y) in Fm ; the second states that the image of the
scalar multiple ax in Fn under T can be obtained by scalar multiplying the image T (x) by a in Fm .
In other words, it is immaterial whether vector space operations are carried out before applying T or
after applying T . That is the reason T is said to preserve vector space operations.
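
As a quick sanity check (an illustration, not a proof), the following Python sketch verifies these two identities numerically for T (x) = Ax, with an arbitrarily chosen real matrix A, vectors x, y and scalar a.

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [3., -1., 4.]])                   # an arbitrary 2 x 3 real matrix
    x = np.array([1., 0., 2.])
    y = np.array([-1., 5., 3.])
    a = 7.0
    assert np.allclose(A @ (x + y), A @ x + A @ y)  # T(x + y) = T(x) + T(y)
    assert np.allclose(A @ (a * x), a * (A @ x))    # T(ax) = aT(x)
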
Functions or mappings between vector spaces that preserve vector space operations are called linear
maps. They are also known as linear transformations.
Linear maps, like matrices, are indispensable in diverse applications of linear algebra mainly be-
cause these maps are well-suited to describe physical phenomena and changes in physical objects;
linear maps are also a basic tool in exploring relations between vector spaces.
A mapping f from a set X to another set Y is usually denoted by f : X → Y; the element y = f (x) ∈ Y
for any x ∈ X is the image of x under f while x is called the pre-image of y.

4.2 BASIC CONCEPTS


Linear maps or linear transformations can be defined between any pair of vector spaces, finite or
infinite-dimensional, as long as their underlying scalar field is the same.

Definition 4.2.1. Let V and W be vector spaces over the same field F. A map T : V → W is a linear


map over F if

T (v1 + v2 ) = T v1 + T v2 for any v1 , v2 ∈ V,


T (av) = aT v for any v ∈ V and a ∈ F.

For such a linear map T : V → W, V is said to be the domain and W the range of T .

It should be clear that the vector addition and the scalar multiplication in the left-hand side of the
relations defining T are the operations of the domain V whereas the operations of the right-hand side
are of W, the range of T . The equality in both the conditions is in W.
Linear maps or linear transformations between vector spaces over a field F are also known as F-
homomorphisms or simply as vector space homomorphisms.
Linear maps from a vector space V into itself occur frequently. These are usually known as linear
operators on V. Some prefer to call them endomorphisms of V.
We will normally use the term linear transformation to describe linear maps between Rm and Rn
for various m and n.
Two linear maps T and S are equal if they are equal as functions, that is, T = S if they have the
same domain and range and T v = S v for all v ∈ V.
It is left to the reader to verify that the two conditions defining a linear map T can be combined to
give a single equivalent one:

T (av1 + v2 ) = aT v1 + T v2 for v1 , v2 ∈ V and a ∈ F.

The last result can be put in a more general form to show that linear combinations of vectors are
preserved by linear maps:
 
T (a1 v1 + a2 v2 + · · · + am vm ) = a1 T v1 + a2 T v2 + · · · + am T vm . (4.1)

Two simple but useful consequences of the definition of linear maps are as follows.

Proposition 4.2.2. Let T : V → W be a linear map. Then,


(a) T (0) = 0;
(b) T (−v) = −T v for any v ∈ V.

In the first identity, the zero in T (0) is clearly the zero vector of V whereas the zero on the right-
hand side is the zero vector of W. Since the same symbol may be used to denote different entities, their
usages have to be understood from the context.

Proof. For the first assertion, note that

T (0) = T (0 + 0) = T (0) + T (0)

by the linearity of T . Now, subtracting T (0) from both sides yields the result. The linearity of T again

shows that

T v + T (−v) = T (v − v) = T (0) = 0,

which implies that T (−v) must be the additive inverse of T v, giving us the second assertion. !
The first result says that the zero vector of the domain is always a pre-image of the zero vector of
the range of any linear map. In fact, the set of the pre-images of the zero vector (the additive identity)
of the range of a linear map is a very important subspace of the domain, called the kernel of the map.
As we will see later, closely linked to this kernel is the image of the map. For now, we introduce these
subspaces.

Kernel and Image of a Linear Map


Proposition 4.2.3. Let T : V → W be a linear map.
(a) The kernel of T , denoted by ker T , and defined as

ker T = {v ∈ V | T v = 0},

is a subspace of V.
(b) The image of T , denoted by Im(T ), and defined as

Im(T ) = {w ∈ W | w = T v for some v ∈ V},

is a subspace of W.

Proof. Note that ker T is non-empty as the zero vector of V is in it. So, to prove that it is a subspace,
it is sufficient to show that any linear combination of two vectors in ker T is again in ker T . So let
v1 , v2 ∈ ker T , and let a1 , a2 be scalars. Then, by the linearity of T , we have

T (a1 v1 + a2 v2 ) = a1 T v1 + a2T v2 .

But both T v1 and T v2 are zero in W, so the last relation shows that a1 v1 + a2 v2 is in ker T .
A similar application of the linearity of T shows that Im(T ) is a subspace of W. !
We now look at some examples of linear maps.

EXAMPLE 1 For any vector spaces V and W over the same field, the map T : V → W given by
T v = 0 for any v ∈ V is trivially a linear map. It is customary to call this map the zero
map from V into W. We will denote the zero map by z.
The kernel of the zero map is the whole of V, whereas the image is the zero
subspace of W.
By the zero operator of V, we will mean the linear operator z on V which maps
every vector of V to its zero vector.
EXAMPLE 2 The identity map of an arbitrary vector space V, that is, the map T : V → V such that
T v = v for all v ∈ V, is clearly a linear operator on V. We denote this identity operator
as IV , or simply as I.

In fact, if V is vector space over the field F, then for any fixed scalar a ∈ F, the
map T : V → V given by T v = av can easily be shown to be linear on V. We denote
this map as aIV or simply as aI.

EXAMPLE 3 Let P1 and P2 be defined from R2 to itself by

P1 (x1 , x2 ) = (x1 , 0)
P2 (x1 , x2 ) = (0, x2 ).

We leave it to the reader to verify that these are linear maps. These linear operators
of R2 are known as projections of R2 onto the x- and y-axis, respectively.
We can also think of these projections as linear maps from R2 into R in an obvious
manner.
It is clear that we can similarly define projections from, say, R3 to R2 or to R, or
for that matter from Fn to Fm for any field F provided n ≥ m.
EXAMPLE 4 On the other hand, if n ≥ m, the inclusion map from Fm to Fn given by

(x1 , x2 , . . . , xm ) ↦ (x1 , x2 , . . . , xm , 0, 0, . . . , 0)

is trivially a linear map.


In general, any direct sum decomposition of a vector space V gives rise to various
projections of V. For example, if V = W ⊕ W1 , then there is a linear map P of V onto
W defined as follows: for any v ∈ V, if v = w + w1 be the unique expression of v in
terms of vectors of the summands W and W1 , then we let Pv = w. P is clearly linear
and onto W, and has precisely W1 as its kernel. P is called a projection of V onto
W. Note that different complements of W in V will give rise to different projections
of V onto W. However, they can be identified by their kernels.
EXAMPLE 5 Consider the subspace W = {(x, 0) | x ∈ R} (the x-axis) of R2 . If W1 = {(0, y) | y ∈ R}
is the y-axis, then R2 = W ⊕ W1 and the projection P of R2 onto W determined by
this decomposition is the linear map we have already seen in Example 3: P(x1 , x2 ) =
(x1 , 0). The kernel of this projection is clearly the complement of W in this case.
Note that W2 = {(x, x) | x ∈ R} is another complement of the x-axis W in R2 . This
time, however, (x1 , x2 ) = (x1 − x2 , 0) + (x2 , x2 ) ∈ W ⊕ W2 so the projection P onto W,
determined by the decomposition R2 = W ⊕ W2 , will be the linear map given by

P(x1 , x2 ) = (x1 − x2 , 0).

We leave it to the reader to verify directly that the image of this P is W, and the
kernel is W2 .
EXAMPLE 6 Let V = Rn [x] be the real vector space of all real polynomials of degree at most n,
and let D : V → V be the differential map given by

D( f (x)) = f ' (x),



where f ' (x) is the formal derivative of f (x). In other words,

D(a0 + a1 x + a2 x2 + · · · + am xm ) = a1 + 2a2 x + · · · + mam xm−1 .

The familiar properties of differentiation show that D is indeed a linear operator on


V. The kernel of D is clearly the set of all scalar polynomials. However, we can think
of this set as the subspace of V generated by any non-zero scalar. How about the
image of D? Since indefinite integrals of polynomials can be considered, it follows
that the subspace of all polynomials of degree at most n − 1 is the image of D.
Note that D can be defined on the infinite-dimensional vector space R[x]. It can
be shown the same way that even on R[x], D is linear.

EXAMPLE 7 Let a, b, c and d be arbitrary real numbers, and let T : R2 → R2 be defined by

T (x1 , x2 ) = (ax1 + bx2 , cx1 + dx2 ).

Then, T is a linear map on R2 . The verification that ker T = {(0, 0)} if and only if
ad − bc ≠ 0 is left to the reader.

EXAMPLE 8 Consider the map Rθ : R2 → R2 defined by

Rθ (x1 , x2 ) = (x1 cos θ − x2 sin θ, x1 sin θ + x2 cos θ).

This map, which is the anticlockwise rotation of the plane R2 through an angle θ, is
a linear operator on R2 .
The verification that Rθ is a linear map can be made easier by observing that its
effect can be realized by matrix multiplication. For this point of view, we need to
think of the vectors of R2 as column vectors. Then, we let
Mθ = [ cos θ   −sin θ ]
     [ sin θ    cos θ ]
so that
Mθ [ x1 ]  =  [ cos θ   −sin θ ] [ x1 ]  =  [ x1 cos θ − x2 sin θ ]
   [ x2 ]     [ sin θ    cos θ ] [ x2 ]     [ x1 sin θ + x2 cos θ ] .

It is now clear by properties of matrix multiplication that Rθ is a linear operator on R2 .
From geometric considerations (as Rθ is an anticlockwise rotation of the plane through an angle θ),
we see that ker Rθ = {(0, 0)} and Im Rθ = R2 . We invite the reader to supply a direct proof of these
conclusions.

EXAMPLE 9 In a similar manner, we can describe the operation that reflects every vector of R2
about a straight line that makes an angle θ with the positive x-axis in terms of matrix

multiplication. We will show later that if


Hθ = [ cos 2θ    sin 2θ ]
     [ sin 2θ   −cos 2θ ] ,
then for any column vector v ∈ R2 , Hθ v is the vector obtained by reflecting v about
the line making an angle θ with the positive x-axis. As in the last example, the corre-
sponding map is a linear operator on R2 by properties of matrix multiplication.
Three important cases can be singled out as given below.
(a) θ = 0; Reflection about the x-axis. This can be described as the linear operator T
such that T (x, y) = (x, −y).
(b) θ = π/2; Reflection about the y-axis. This can be described as the linear operator T
such that T (x, y) = (−x, y).
(c) θ = π/4; Reflection about the line y = x. This can be described as the linear operator
T such that T (x, y) = (y, x).
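
The three cases above are easy to check numerically. The following Python sketch (using NumPy; the vector v is an arbitrary choice) builds Hθ and verifies the reflections about the x-axis, the y-axis and the line y = x.

    import numpy as np

    def H(theta):
        # reflection about the line through the origin making angle theta with the x-axis
        return np.array([[np.cos(2*theta),  np.sin(2*theta)],
                         [np.sin(2*theta), -np.cos(2*theta)]])

    v = np.array([3., 4.])
    assert np.allclose(H(0) @ v,        [ 3., -4.])   # theta = 0: about the x-axis
    assert np.allclose(H(np.pi/2) @ v,  [-3.,  4.])   # theta = pi/2: about the y-axis
    assert np.allclose(H(np.pi/4) @ v,  [ 4.,  3.])   # theta = pi/4: about y = x
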

EXAMPLE 10 In fact, matrix multiplication, in some sense, is the most important example of linear
maps, especially in the case of finite-dimensional spaces. We discuss the general
case. Consider, for any field F, the vector spaces Fn and Fm and let A be an arbitrary
but fixed m × n matrix over F. Write vectors in Fn and Fm as column vectors. Now,
the map

T A : Fn → Fm

given by

T A (x) = Ax for any x ∈ Fn ,

is a linear map by the rules of matrix multiplication.


It is interesting to examine the kernel and the image of the map T A. The kernel of T A is precisely
the solution space in Fn of the homogeneous system of equations Ax = 0, so ker T A is the null space
of A. On the other hand, the image of T A is the subspace of Fm consisting of all those vectors b for
which the system of equations Ax = b has a solution in Fn , so Im(T A) is the column space of A. (See
Section 3.6 for the definitions of the null space and the column space of a matrix.)
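
For readers who wish to compute these subspaces, the following Python sketch (using NumPy; the matrix A and the tolerance are arbitrary choices) extracts a basis of ker T A, the null space of A, and a basis of Im(T A), the column space of A, from the singular value decomposition. It is only an illustration of the preceding remarks.

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [2., 4., 6.]])          # rank 1, so the null space has dimension 2
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-10))            # numerical rank of A
    null_basis = Vt[r:].T                 # columns form a basis of ker(T_A)
    col_basis  = U[:, :r]                 # columns form a basis of Im(T_A)
    assert np.allclose(A @ null_basis, 0)
    print(null_basis.shape[1], col_basis.shape[1])   # 2 1
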

Linear Maps Defined on Basis Vectors


We now consider a useful way of describing linear maps whose domains are finite-dimensional. Ob-
serve that if T : V → W is a linear map, and v1 , v2 , . . . , vn is a basis of V, then T is completely
determined by its actions on the basis vectors vi . For, if v = a1 v1 + a2 v2 + · · · + an vn , then by Equation
(4.1),

T v = a 1 T v1 + a 2 T v2 + · · · + a n T vn .

Therefore once the vectors T vi in W are known, then T v can be determined for any v ∈ V. This obser-
vation leads to the following useful result.

Proposition 4.2.4. Let V and W be vector spaces over the same field F. Assume that V is finite-
dimensional. Let {v1 , v2 , . . . , vn } be a basis of V, and w1 , w2 , . . . , wn a set of n arbitrary vectors in
W. Then there is a unique linear map T : V → W such that T vi = wi for all i.

Proof. Clearly, the map T (a1 v1 + a2 v2 + · · ·+ an vn ) = a1 w1 + a2 w2 + · · · + an wn is the required one. !

The fact that a linear map on a finite-dimensional vector space is completely determined by its
action on the vectors of any basis also implies that two linear maps on a finite-dimensional vector space are
equal if and only if they agree on the vectors of any basis of the domain.

Lemma 4.2.5. Let V and W be vector spaces over the same field. Assume that V is finite-
dimensional. Let {v1 , v2 , . . . , vn } be any basis of V. Then for any two linear maps T and S from
V into W,

T =S if and only if Tvj = S vj for all j.

Proof. T = S if T v = S v for any v ∈ V. So if T v j = S v j for 1 ≤ j ≤ n, then by the remarks preceding


the proposition, we easily infer that T v = S v for any v ∈ V. The converse is trivial. !

The following remarks will be useful in applications of the preceding proposition.

(a) The range W of T in the proposition need not be finite-dimensional.


(b) The vectors wi in W can be chosen really arbitrarily. We may even choose them to be all equal.
For example, even for the choice w1 = w2 = · · · = wn = 0 in W, the proposition guarantees a
unique linear map from V to W (which in this case is the zero map).
(c) In practice, once wi s are given, it is customary to define the linear map guaranteed by this
proposition in the following manner: Define T by T vi = wi . Extend T linearly to all of V to get a
unique linear map from V to W.
We show how to use the proposition by deriving the differential map for the polynomials as an exam-
ple. Choose the standard basis 1, x, x2 , . . . , xn of the real vector space Rn [x], and let

D(1) = 0 and D(xm ) = mxm−1 for m = 1, 2, . . . , n.

Then, it is easy to see that once we extend D linearly to all of Rn [x], we recover the differentiation
map from Rn [x] to itself.
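
Anticipating the next section, the action of D on the basis 1, x, . . . , xn can be recorded as a matrix acting on coefficient vectors, so that extending D linearly amounts to matrix multiplication. The following Python sketch (an illustration only; n = 3 is an arbitrary choice) does exactly this.

    import numpy as np

    n = 3
    D = np.zeros((n + 1, n + 1))     # column m holds the coordinates of D(x^m) = m x^(m-1)
    for m in range(1, n + 1):
        D[m - 1, m] = m

    p = np.array([4., 3., 2., 1.])   # p(x) = 4 + 3x + 2x^2 + x^3 (increasing powers)
    print(D @ p)                     # [3. 4. 3. 0.], i.e. p'(x) = 3 + 4x + 3x^2
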
To take another example, let us define a map T from R2 to itself by letting it act on the standard
basis vectors of R2 as follows:
T e1 = T (1, 0)t = (cos 2θ, sin 2θ)t

and

T e2 = T (0, 1)t = (sin 2θ, −cos 2θ)t .

We can extend T to a linear operator on R2 by Proposition (4.2.4). Then, for any vector (x1 , x2 )t in R2 ,
T (x1 , x2 )t = T (x1 e1 + x2 e2 )
             = x1 T e1 + x2 T e2
             = x1 (cos 2θ, sin 2θ)t + x2 (sin 2θ, −cos 2θ)t
             = (x1 cos 2θ + x2 sin 2θ, x1 sin 2θ − x2 cos 2θ)t
             = Hθ (x1 , x2 )t ,

where Hθ is the matrix we had introduced in Example 9. We leave it to the reader now to verify,
by geometrical consideration, that T (e1 ) and T (e2 ) are indeed, the vectors obtained by reflecting the
standard basis vectors about the line making an angle θ with the positive x-axis.
We now come back to our general discussion about linear maps.
Recall that any map T from V into W is onto or surjective if the image of T is the whole of the
range W, and is one–one or injective if distinct elements of V are mapped by T to distinct elements
of W. A map which is both one–one and onto is a bijection.
As the next result shows, the kernel is a very convenient tool in determining whether a linear
map is one–one or not.

Proposition 4.2.6. Let T : V → W be a linear map. T is one–one if and only if ker T = {0}.

Proof. The linearity of T shows that the equality T v1 = T v2 is equivalent to T (v1 − v2 ) = 0. By the
definition of kernel, this equality is equivalent to the inclusion (v1 − v2 ) ∈ ker T . Hence the result. !

Dimension Formula
But the linearity of a map gives rise to a much more fundamental relation between the kernel and the
image, in case the domain of the map is finite-dimensional. More precisely, we have the following
theorem.

Theorem 4.2.7 (Dimension Formula). Let V and W be vector spaces over the same field F. Assume
that V is finite-dimensional. For a linear map T : V → W, the following holds:

dim V = dim ker T + dim Im(T ).

Proof. Let dim V = n and dim ker T = k. If k ≠ 0, let v1 , v2 , . . . , vk be a basis of the subspace
ker T of V. Then, by Corollary (3.4.7), we can find vectors vk+1 , vk+2 , . . . , vn in V such that
v1 , v2 , . . . , vk , vk+1 , . . . , vn is a basis of V. In case k = 0, ker T = {0} and we choose any basis
v1 , v2 , . . . , vn of V to begin with.
We claim that in both the cases, whether k = 0 or not, the vectors T vk+1 , T vk+2 , . . . , T vn form a
basis of the subspace Im(T ) of W.

We sketch a proof of the claim in case k ≠ 0. Since Im(T ) consists of vectors T v as v ranges over
V, it follows that for suitable scalars a1 , a2 , . . . , an in F, any vector of Im(T ) can be expressed as
 
T (a1 v1 + a2 v2 + · · · + an vn ) = a1 T v1 + a2 T v2 + · · · + an T vn .

However, as v1 , v2 , . . . , vk are in ker T , we see that the above sum is actually a linear combination
of T vk+1 , T vk+2 , . . . , T vn only. In other words, the vectors T vk+1 , T vk+2 , . . . , T vn span Im(T ).
Next, we need to verify that the vectors T vk+1 , T vk+2 , . . . , T vn are linearly independent. Now, if

bk+1 T vk+1 + bk+2 T vk+2 + · · · + bn T vn = 0

for some scalars b j , then using the linearity of T , we can rewrite the relation as

T (bk+1 vk+1 + bk+2 vk+2 + · · · + bnvn ) = 0.

This implies that the vector bk+1 vk+1 + bk+2 vk+2 + · · · + bn vn of V is actually in ker T and so is some
linear combination of the basis vectors v1 , v2 , . . . , vk of ker T . It follows that we have a linear com-
bination of all the vectors v1 , . . . , vk , . . . , vn which equals the zero vector of V. But these vectors are
linearly independent. Therefore, all the coefficients in that linear combination and, in particular, the
scalars bk+1 , bk+2 , . . . , bn must be zeros. This completes the proof of the claim in case k ≠ 0.
The slight modification needed in the proof for the case k = 0 is left to the reader. !
Note that the proof of the preceding theorem is exactly the same as that of Theorem (3.6.14) of
Chapter 3. To see why that is so, consider an arbitrary matrix A ∈ Mm×n (F) and the corresponding linear
map T A from Fn to Fm . As we have noted in Example 10, ker T A is the null space of A whereas the
image of T A is the column space of A. Since the dimensions of the null space and the column space of A
are the nullity and the rank of A, respectively, the preceding theorem implies that rank(A) +nullity(A) =
n, which is Theorem (3.6.14).
In view of this discussion and anticipating later development, we make the following definitions:

Definition 4.2.8. Let V and W be finite-dimensional vector spaces over the same field and T a
linear map from V into W.
(a) The nullity of T is the dimension of the kernel of T and denoted by nullity(T ).
(b) The rank of T is the dimension of the image of T and denoted by rank(T ).

We can then restate the dimension formula as follows.

Corollary 4.2.9. Let T be a linear map of V into W, where V and W are finite-dimensional vector
spaces over the same field. Then,

dim V = nullity(T ) + rank(T ).

The following is a useful result.

Corollary 4.2.10. Let T : V → W be a linear map. Assume that dim V = dim W. Then, T is one–one
if and only if T is onto.

Proof. Let T be one–one. Then, by Proposition (4.2.6), dim ker T = 0. In that case,
the dimension formula of the preceding theorem implies that dim V = dim Im(T ). Since by hypothesis
dim V = dim W, we conclude that dim Im(T ) = dim W. But Im(T ) is a subspace of W so the equality
of the dimensions means that Im(T ) = W, which implies that T is onto W.
The converse can be proved in exactly the same way. !

Thus, a linear operator on a finite-dimensional vector space is one–one if and only if it is onto.
The preceding corollary as well as the dimension formula are immensely useful. We give some
instances of their uses.
When we considered the map Rθ from R2 into itself, we remarked that ker(Rθ ) is the zero subspace
(after all, which point in R2 can be rotated through an angle θ to end up at the origin?). The preceding
corollary then shows immediately that Rθ is onto. Similarly, the linear operator T (x, y) = (ax + by, cx +
dy) on R2 is one–one if and only if ad − bc ≠ 0, hence must be onto under the same condition. Finally,
consider the differential map D defined on the real vector space of all polynomials of degree at most
n. As soon as we find that its kernel is one-dimensional, consisting of the scalars only, we can deduce
from the dimension formula that its image must be of dimension n, as V has dimension n + 1.
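
The dimension formula itself is easy to test numerically. The following Python sketch (using NumPy; the matrix, chosen so that its third row is the sum of the first two, and the tolerance are arbitrary) computes rank(T A) and nullity(T A) for T A : R4 → R3 and confirms that they add up to dim R4.

    import numpy as np

    A = np.array([[1., 0., 2., -1.],
                  [2., 1., 0.,  3.],
                  [3., 1., 2.,  2.]])            # row 3 = row 1 + row 2
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-10))                # dim Im(T_A)
    kernel = Vt[rank:]                           # rows form a basis of ker(T_A)
    assert np.allclose(A @ kernel.T, 0)
    print(rank, kernel.shape[0], rank + kernel.shape[0])   # 2 2 4 = dim R^4
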

Projections and Direct Sums


We now consider a general class of linear maps, some special cases of which we have already encoun-
tered in earlier examples. In the following discussion, we anticipate products or composites of linear
maps which will be considered in detail later. For a linear operator T on a vector space, we let T 2
be the map on V given by T 2 v = T (T v) for any v ∈ V; it will be shown later that T 2 is also a linear
operator on V. However, it will be a good exercise for the reader to verify the linearity of T 2 now.

Definition 4.2.11. A projection P on a vector space V is a linear operator on V such that P2 = P.

Note that the projections determined by a direct sum decomposition of a vector space V, as dis-
cussed in some of the preceding examples, are projections in the sense of this definition too. For, given
a direct sum V = W1 ⊕ W2 , every v in V can be expressed uniquely as a sum w1 + w2 with wi ∈ Wi . In
that case, the map P on V given by Pv = w1 does satisfy the condition that P2 = P as

P2 v = Pw1 = w1 = Pv.

It is easy to find properties that characterize arbitrary projections.

Proposition 4.2.12. Let P be a projection on a vector space V with W as its image and K as its
kernel.
(a) w ∈ W if and only if Pw = w.
(b) Any v ∈ V can be uniquely expressed as a sum Pv + (v − Pv) of vectors in W and K.
(c) V = W ⊕ K.

Proof. If Pv = w is in the range W of P, then Pw = P2 v = Pv = w. Assertion (a) follows. Since for any
v ∈ V, P(v − Pv) = Pv − P2 v = 0, it follows that (v − Pv) is in K, the kernel of P. So by (a), Pv + (v − Pv)
is indeed an expression of v as a sum of vectors from W and K. If v = w1 + w2 is another such sum,
then Pv = Pw1 = w1 , as w1 is in the range of P. Uniqueness in (b) follows. Finally, for assertion (c),

note that if w ∈ W ∩ K, then 0 = Pw. On the other hand, w being in W, by (a), equals Pw. Thus, W ∩ K
is the zero subspace so that assertion (b) implies (c). !

In general, any direct sum decomposition of a vector space V into k summands can be described
completely by k projections. If

V = W1 ⊕ · · · ⊕ Wk ,

then any v can be uniquely written as a sum v = v1 + · · · + vk , where v j ∈ W j . We define, for each j,
1 ≤ j ≤ k, a map P j : V → V by P j v = v j . It is easy to see that each P j is a projection of V onto W j
with kernel the subspace ⊕i≠j Wi . These projections, further, enjoy certain properties that are closely
connected to the decomposition of V. The following proposition spells out these properties.

Proposition 4.2.13. If a vector space V can be decomposed as a direct sum

V = W1 ⊕ · · · ⊕ Wk

of k subspaces, then there are k projections P1 , P2 , . . . , Pk on V satisfying the following conditions.


(a) The image of P j is W j for any j, 1 ≤ j ≤ k.
(b) P j Pi is the zero map on V if j ≠ i.
(c) P1 + P2 + · · · + Pk = I, the identity map on V.
Conversely, if P1 , P2 , . . . , Pk are projections on a vector space V satisfying conditions (b) and (c),
then V is the direct sum of the images of these projections.

The proof is a routine verification, and left as an exercise.
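
As a concrete check of these conditions, the following Python sketch (an illustration only) takes the decomposition R2 = W ⊕ W2 of Example 5, in which the projection onto W along W2 is P(x1 , x2 ) = (x1 − x2 , 0), and verifies conditions (b) and (c) for P and Q = I − P.

    import numpy as np

    P = np.array([[1., -1.],
                  [0.,  0.]])                    # projection onto the x-axis along y = x
    Q = np.eye(2) - P                            # the complementary projection onto W2
    assert np.allclose(P @ P, P) and np.allclose(Q @ Q, Q)   # P^2 = P, Q^2 = Q
    assert np.allclose(P @ Q, 0) and np.allclose(Q @ P, 0)   # P_j P_i = 0 for j != i
    assert np.allclose(P + Q, np.eye(2))                     # P_1 + P_2 = I
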

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. In the
following, V and W are vector spaces over the same field.
(a) If for a linear map T from V to W, T v = 0 only for v = 0, then T is one–one.
(b) Any linear map from V to W carries a linearly independent subset of V to a linearly inde-
pendent subset of W.
(c) Any linear map from V to W carries a linearly dependent subset of V to a linearly dependent
subset of W.
(d) If T : R → R is defined by T (a) = a + 2, then T is linear.
(e) There is no linear map T from R3 into R2 such that T (1, 1, 1) = (1, 0) and T (1, 2, 1) = (0, 1).
(f) There is no linear map from R2 to R3 such that T (1, −1) = (1, 1, 1) and T (−1, 1) = (1, 2, 1).
(g) If W is a subspace of a finite-dimensional vector space V, then there is a linear operator on
V whose kernel is W.
(h) If W is a subspace of a finite-dimensional vector space V, then there is a linear operator on
V whose image is W.
(i) Given vectors v ∈ V and w ∈ W with v non-zero, there is a unique linear map T : V → W such
that T v = w.

(j) There can be no linear map from R2 onto R.


(k) The column space of an m × n matrix over a field F is the image of some linear map from Fn
into Fm .
(l) There is no linear operator on R2 such that the kernel of T coincides with the image of T .
(m) There is no linear operator on R3 such that the kernel of T coincides with the image of T .
(n) There is a linear operator on R4 having its kernel equal to its image.
2. Determine whether the following maps between the indicated real vector spaces are linear. For
each linear map, determine the kernel and the image and deduce whether the map is one–one or
onto.
(a) T : R3 → R2 ; T (x1 , x2 , x3 ) = (x1 + x2 , −x3 ).
(b) T : R2 → R3 ; T (x1 , x2 ) = (0, x1 − 2x2 , x1 + 3x2 ).
(c) T : R2 → M2 (R); T (x1 , x2 ) = [ x1   −x2      ]
                                    [ x2   x1 + x2 ] .
(d) T : M2 (R) → M2×3 (R); T [ x11   x12 ]  =  [  x11   0   −x21 ]
                             [ x21   x22 ]     [ −x12   0    x22 ] .
(e) T : R3 [x] → R3 [x]; T (p(x)) = p' (x) + p(x).

(f) T : Rn [x] → R; T (p(x)) = p(0).


(g) T : Mn (F) → Mn (F); T (A) = A + At .
3. Explain why the following maps from R2 to R2 are not linear.
(a) T (x1 , x2 ) = (x1 , x2 2 ).
(b) T (x1 , x2 ) = (x1 + x2 + 1, x1 − x2 ).
(c) T (x1 , x2 ) = (0, sin x2 ).
(d) T (x1 , x2 ) = (1, x2 ).
4. Let V and W be vector spaces over a field F. Show that a map T : V → W is linear if and only if

T (av1 + v2 ) = aT v1 + T v2

for any v1 , v2 ∈ V and a ∈ F.


5. Let T : R3 → R3 be defined as

T (x1 , x2 , x3 ) = (x1 + x2 − 2x3, −x1 + 2x2 − x3 , 4x2 − 5x3).

Verify that T is a linear operator on R3 . Find all (a, b, c) ∈ R3 such that (a, b, c) ∈ ker T . Also,
find the vectors (a, b, c) in the range of T .
6. Let T be the translation operator on the vector space R[x] of all real polynomials given by

T (p(x)) = p(x + 1) for any p(x) ∈ R[x].

Prove that T is a linear operator on R[x].



7. Let R[x] be the vector space of all real polynomials. Prove that the differential operator D on
R[x] is onto but not one–one. Let T : R[x] → R[x] be defined by
T (p(x)) = ∫_0^x p(t) dt.

Prove that T is a linear operator on R[x] which is one–one but not onto.
8. Let B be a fixed matrix in Mn (F), where F is an arbitrary field. Define T : Mn (F) → Mn (F) by

T (A) = AB − BA.

Prove that T is a linear operator on Mn (F). Describe the kernel of T if B is a scalar matrix. Find
the range of T in case B is a diagonal matrix.
9. Consider the set of complex numbers C as a vector space over R. Is the map T : R2 → C given
by T (a, b) = a + ib a linear map?
10. Let T be a linear operator on the vector space Fm . Show that if the kernel of T coincides with
the image of T , then m has to be even.
11. Give an example of a linear operator T on R4 whose kernel and image are the same subspace of
R4 .
(Hint: Use Proposition (4.2.4) to define T in terms of the standard basis.)
12. Give an example of two distinct linear operators on the same vector space which have the same
kernel and image.
13. Let T be a linear map from a vector space V to a vector space W, both over the same field F. If
V1 is a subspace of V, then show that T (V1 ) = {T v | v ∈ V1 } is a subspace of W. Similarly, show
that if W1 is a subspace of W, then T −1 (W1 ) = {v ∈ V | T v ∈ W1 } is a subspace of V.
14. Let V and W be vector spaces of the same dimension over a field F. For a linear map
T : V → W, show that T is onto if and only if for every basis v1 , v2 , . . . , vm of V, the vectors
T v1 , T v2 , . . . , T vm form a basis of W.
15. Let a, b, c and d be fixed real numbers and let T : R2 → R2 be the map

T (x1 , x2 ) = (ax1 + bx2 , cx1 + dx2 ).

Prove that T is linear. Show further that T is one–one if and only if ad − bc ≠ 0.


16. Prove directly that the rotation Rθ of the plane R2 through an angle θ is a one–one, onto linear
operator on R2 .
17. Let T be the linear operator on R2 such that T e1 = (1, −1) and T e2 = (2, 3), where e1 and e2
form the standard basis of R2 . Find an expression for T (x1 , x2 ) in terms of x1 and x2 for any
(x1 , x2 ) ∈ R2 . Is T one–one? Is it possible to find (x1 , x2 ) ∈ R2 such that T (x1 , x2 ) = (a1 , a2 ) for
any (a1 , a2 ) ∈ R2 ?
18. Find bases for the kernel and images of the following linear transformations and hence determine
their rank and nullity.
(a) T : R3 → R3 ;
T (x1 , x2 , x3 ) = (2x1 − x2 + 3x3, −x1 + 2x3 , x1 + 2x2 − x3 ).
(b) T : R3 → R2 ;
T (x1 , x2 , x3 ) = (x1 − x2 + x3 , x1 + 2x2 − 3x3 ).
(Hint: If F is linear on V, then the image of F is spanned by F(v1 ), F(v2 ), . . . , F(vm ) for any
basis v1 , v2 , . . . , vm of V.)

19. Find a linear operator T on R3 whose image is spanned by (1, −1, 1), (1, 2, 3) and (0, 0, −1).
(The expected answer should be a formula for T (x1 , x2 , x3 ) in terms of x1 , x2 and x3 .)
20. Find a linear transformation from R2 to R3 whose image is spanned by (1, 1, 1).
21. Find a linear transformation from R3 to R2 whose kernel is spanned by (1, 1, 1).

4.3 ALGEBRA OF LINEAR MAPS


The collection of all linear maps between two vector spaces over a field can be given a vector space
structure. More interestingly, the vector space of linear operators of a vector space carries the ad-
ditional structure of a ring. We discuss these structures in detail in this section. We first fix some
notations.

Definition 4.3.1. Let V and W be vector spaces over a field F. The collection of all linear maps
(that is, F-homomorphisms) from V into W is denoted by HomF (V, W) or simply by Hom(V, W). The
collection of all linear operators (that is, F-endomorphisms) on a vector space V over a field F is
denoted by EndF (V) or simply by End(V).

We have already noted in examples of Section 4.2 that Hom(V, W) always contains the zero map
(the map which takes every vector of V to the zero vector of W), and End(V) the zero operator as well
as the identity operator of V. Though it is not clear now, there are innumerable elements in Hom(V, W)
if V and W are non-zero spaces. In fact, we will see shortly that for finite-dimensional spaces V and
W, once a pair of bases are fixed, any matrix of a suitable size gives rise to a map in Hom(V, W).
Our immediate task is to put in place a vector space structure on Hom(V, W). Since the elements of
Hom(V, W) are maps or functions, the usual definitions of sums and scalar multiples of maps give us
the required operations for Hom(V, W). We have already seen that with such operations, real-valued
functions on a closed interval or real polynomials form vector spaces. (See Examples 8 and 10 of
Section 3.2).

Definition 4.3.2. For linear maps T and S in Hom(V, W), we define the sum T + S to be the map
from V → W whose action on a vector of V is as follows: for any v ∈ V,

(T + S )v = T v + S v.

Similarly, the scalar multiple aT of T ∈ Hom(V, W) for a scalar a ∈ F is defined by what it does to a
vector of V: for any v ∈ V,

(aT )v = aT v.

The crucial fact is that both T + S and aT are again linear maps from V into W.

Proposition 4.3.3. Let V and W be vector spaces over the same field F.
(a) For any T, S ∈ Hom(V, W), the sum T + S ∈ Hom(V, W).
(b) For any T ∈ Hom(V, W) and a ∈ F, the scalar multiple aT ∈ Hom(V, W).
In particular, if T and S are linear operators on V, then T + S and aT for any a ∈ F are again linear
operators on V.

Proof. For any v1 , v2 ∈ V, we have

(T + S )(v1 + v2 ) = T (v1 + v2 ) + S (v1 + v2 )


= T v1 + T v2 + S v1 + S v2 ,

where the first equality follows from the definition of the sum T + S , and the second from the linearity
of T and S . Rearranging the terms in the last expression and using the definition of the sum T + S once
again, we obtain

(T + S )(v1 + v2 ) = (T + S )v1 + (T + S )v2 .

Thus, T + S preserves addition of vectors. Similar calculation shows that T + S preserves scalar mul-
tiplication:
(T + S )(av) = T (av) + S (av)
= aT v + aS v
= a(T v + S v)
= a(T + S )v. (4.2)

Thus we have shown that T + S is a linear map from V to W and so is in Hom(V, W).
For the second assertion, one needs to show that, for any a ∈ F and any linear map T from V to W,
the scalar multiple aT is also a linear map from V to W or, to be precise, to verify that

(aT )(v1 + v2 ) = (aT )v1 + (aT )v2


(aT )(bv) = b(aT )v.

for any v1 , v2 and v in V and b ∈ F. This routine verification is left to the reader.
The last assertion of the proposition can be obtained by taking W = V. !

Vector Space of Linear Maps


The preceding proposition confirms that Hom(V, W) is closed with respect to addition and scalar
multiplication of linear maps. Another round of routine verifications, this time of the vector space
axioms, establishes that Hom(V, W) is itself a vector space.

Theorem 4.3.4. Let V and W be vector spaces over a field F. The collection Hom(V, W) of linear
maps of V into W forms a vector space over F with respect to addition and scalar multiplication of
maps. The zero map z from V to W is the zero vector (additive identity) of Hom(V, W).

Observe that verifying the vector space axioms for Hom(V, W) amounts to checking equalities of
maps. For example, to verify that addition of elements in Hom(V, W) is commutative, one has to show
that the maps T + S and S + T are equal, which, in turn, is equivalent to showing that

(T + S )v = (S + T )v

for all v ∈ V. Similarly, the trivial fact that T v + 0 = T v implies, according to the definition of equality
of functions, that T + z = T , so that one may conclude that z acts as the zero vector in Hom(V, W).
With these remarks, we leave the verification of the vector space axioms for Hom(V, W) to the
reader.
A special case of the preceding theorem, obtained by letting W = V, must be singled out.

Corollary 4.3.5. For any vector space V over a field F, the collection End(V) = EndF (V) of all
linear operators on V is a vector space over F with respect to addition and scalar multiplication of
linear operators. The zero operator on V acts as the additive identity in End(V).

The following fact is a useful one: a non-zero element of Hom(V, W) is a linear map T from V into
W for which there is at least one v ∈ V with T v ≠ 0, the zero vector of W. Equivalently, the kernel
of T cannot be all of V.

Dimensions of Spaces of Linear Maps


Till now, we have been looking at linear maps between arbitrary vector spaces. However, for finite-
dimensional vector spaces, there is a striking formula, given in the following theorem, relating the
dimension of Hom(V, W) with those of V and W.

Theorem 4.3.6. If V and W are finite-dimensional vector spaces over a field F with dim V = n and
dim W = m, then Hom(V, W) is also finite-dimensional over F and

dim Hom(V, W) = nm.

In particular, for an n-dimensional vector space V, End(V) has dimension n2 .

The usual proof of this result is a constructive one, in the sense that once two bases of V and W are
fixed, then by using the given basis vectors one can actually write down nm specific linear maps from
V to W which form a basis of Hom(V, W). However, at this point it is difficult to see the motivation
for constructing these specific maps. So we postpone the proof till the next section where we will be
exploring the connection between Hom(V, W) and Mm×n (F). Recall that Mm×n (F) is an mn-dimensional
vector space; thus, our theorem will be a by-product of the main result of that section.
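
Anticipating that connection, the following Python sketch (an illustration only; n = 3 and m = 2 are arbitrary choices) identifies Hom(Fn , Fm ) with m × n matrices and exhibits the nm matrix units Ei j, the matrices with a single entry 1 and all other entries 0, which will turn out to correspond to a basis of Hom(V, W).

    import numpy as np

    n, m = 3, 2
    units = [np.zeros((m, n)) for _ in range(m * n)]
    for k, E in enumerate(units):
        E[k // n, k % n] = 1.0                    # the matrix unit E_ij
    # Every m x n matrix is a linear combination of the nm matrix units:
    A = np.array([[1., 2., 3.],
                  [4., 5., 6.]])
    assert np.allclose(sum(A[k // n, k % n] * E for k, E in enumerate(units)), A)
    print(len(units))                             # 6 = nm
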
Another special case of Hom(V, W) is introduced in the following definition. Recall that a field F
can be considered a vector space over itself.

Definition 4.3.7. For any vector space V over a field F, the vector space Hom(V, F) is known as
the dual space of V. It is denoted by V̂. The elements of V̂, which are linear maps from V to F, are
known as linear functionals on V.

The preceding theorem then implies the following corollary.

Corollary 4.3.8. For an n-dimensional vector space V over a field F,

dim V̂ = n.

Composition of Linear Maps


We now introduce the important concept of the composition of linear maps. The reader must have
come across instances of functions produced by combining two functions (for example, the function
sin x2 is the composite of the functions f (x) = sin x and g(x) = x2 ); the composite of two linear maps
is obtained in the same manner.

Definition 4.3.9. Let T : V → W and S : W → U be linear maps where V, W and U are vector
spaces over the same field F. The composite S T of T and S is the map S T : V → U defined by

(S T )v = S (T v) for any v ∈ V.

Now for any v1 , v2 ∈ V and a ∈ F, using the linearity of T and S separately, we see that

S T (av1 + v2 ) = S (T (av1 + v2 )) = S (aT v1 + T v2 ) = S (aT v1 ) + S (T v2 )


= aS (T v1 ) + S (T v2 ) = a(S T )(v1 ) + (S T )(v2 ).

Thus, S T is a linear map.


Note that the definition of composite is valid even when two or all of the vector spaces V, W and
U are the same. Thus if T and S are linear operators on a vector space V, then their composite S T ,
which we shall also call their product, is again a linear operator on V. In particular, given a linear
operator T on V, the composite T 2 = T T , defined by T 2 v = T (T v) for any v ∈ V, is a linear operator.
Letting T 0 = IV , the identity operator on V, we can then inductively define, for any positive integer n,
the linear operator T n by

T n v = T (T n−1 v)

for any v ∈ V.

EXAMPLE 11 Let T and S be the reflections in R2 about the x-axis and y-axis respectively. Thus,
T (x, y) = (x, −y) and S (x, y) = (−x, y) for any (x, y) ∈ R2 . Then

S (T (x, y)) = S (x, −y) = (−x, −y),

showing that the product S T is the reflection in R2 about the origin. Note that S T =
T S . However, as we shall see a little later, composition of linear operators is not commutative in general.

EXAMPLE 12 Geometrically, it is obvious that if the rotation through an angle θ in R2 is followed


by another rotation through the same angle, then the combined result is the rotation
through angle 2θ. In other words, if T is the linear operator on R2 representing a
rotation through angle θ, then the operator T 2 is the rotation through an angle 2θ. We
verify this assertion now. Recall from an earlier example that T can be described,
using a matrix, as
T [ x ]  =  [ x' ]  =  [ cos θ   −sin θ ] [ x ] ,
  [ y ]     [ y' ]     [ sin θ    cos θ ] [ y ]

where we are writing elements of R2 as column vectors, since matrix multiplication is


used for describing T . Then
T 2 [ x ]  =  T [ x' ]
    [ y ]       [ y' ]

           =  [ cos θ   −sin θ ] [ x' ]
              [ sin θ    cos θ ] [ y' ]

           =  [ cos θ   −sin θ ] [ cos θ   −sin θ ] [ x ] .
              [ sin θ    cos θ ] [ sin θ    cos θ ] [ y ]

However, because of the familiar trigonometrical identities cos2 θ−sin2 θ = cos 2θ and
sin 2θ = 2 cos θ sin θ, the matrix product in the preceding equation can be simplified
as follows:
[ cos θ   −sin θ ] [ cos θ   −sin θ ]  =  [ cos2 θ − sin2 θ    −2 cos θ sin θ   ]
[ sin θ    cos θ ] [ sin θ    cos θ ]     [ 2 cos θ sin θ       cos2 θ − sin2 θ ]

                                        =  [ cos 2θ   −sin 2θ ] .
                                           [ sin 2θ    cos 2θ ]

Therefore,
T 2 [ x ]  =  [ cos 2θ   −sin 2θ ] [ x ] ,
    [ y ]     [ sin 2θ    cos 2θ ] [ y ]

which shows that T 2 is indeed a rotation through an angle 2θ.
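
A quick numerical check of this calculation: the following Python sketch (using NumPy; θ = 0.7 is an arbitrary choice) verifies that the matrix of the rotation through θ squares to the matrix of the rotation through 2θ.

    import numpy as np

    def M(theta):
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    theta = 0.7
    assert np.allclose(M(theta) @ M(theta), M(2 * theta))   # T^2 is the rotation by 2*theta
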

Ring of Linear Operators


We have already seen that the composite or the product of two linear operators on a vector space V
over a field F is again a linear operator on V. Thus, the collection EndF (V) of all linear operators on V,
apart from addition of operators, has composition of two operators as another binary operation. What
is of importance to us is that with respect to these two binary operations EndF (V) is a ring. A reader
not familiar with the concept of a ring should go through Section 1.7 in the first chapter for relevant
definitions.

Theorem 4.3.10. Let V be a vector space over a field F. Then, the vector space End(V) = EndF (V)
formed by the linear operators on V is a ring with respect to addition and composition of maps. The
identity operator on V acts as the (multiplicative) identity of the ring End(V).

Proof. It has been noted that End(V) is a vector space and so, in particular, it is an abelian group with
respect to addition of operators on V. To complete the proof that End(V) is a ring we therefore need
to verify the following identities involving arbitrary operators on V:

T 1 (T 2 T 3 ) = (T 1 T 2 )T 3 , (4.3)
T 1 (T 2 + T 3 ) = T 1 T 2 + T 1 T 3 , (4.4)
(T 1 + T 2 )T 3 = T 1 T 3 + T 2 T 3 , (4.5)

for any T 1 , T 2 and T 3 in End(V). Since End(V) is closed with respect to addition and composition of
linear operators on V, it follows that both sides of all the three identities we are trying to prove are
operators on V and so have the same domain and the same range. Thus to prove the equalities, we
just have to show that both sides in each equality agree on arbitrary vectors of V. We take the second
relation first. Now for any vector v ∈ V, by the definitions of composition and addition of operators,
we see that

T 1 (T 2 + T 3 )v = T 1 ((T 2 + T 3 )v)
= T 1 (T 2 v + T 3 v)

which, by the linearity of T 1 , equals

= T 1 (T 2 v) + T 1(T 3 v)
= (T 1 T 2 )v + (T 1 T 3 )v
= (T 1 T 2 + T 1T 3 )v.

By looking at the beginning and the end of this chain of equalities, we infer, as the relation holds for
an arbitrary vector v in V, that T 1 (T 2 + T 3 ) = T 1 T 2 + T 1 T 3 as required (recall the definition of equality
of two operators). Note the use of the definitions of composition and addition of operators, respectively,
in deriving the last two equalities.
We leave the verifications of the other two equalities, which can be done in a similar manner, to the
reader.
It is also easy to see that the identity operator I on V acts as the multiplicative identity in End(V),
that is, T I = IT = T for any linear operator T on V. For, given any v ∈ V,

(T I)v = T (Iv) = T v

and

(IT )v = I(T v) = T v. !

Because of Equation (4.3), the composition of linear operators is said to be associative; the two
Equations (4.4) and (4.5) show that it is also distributive. The distributive laws tell us how composition
of linear operators combines with addition. Similarly, there is a useful property that specifies how
composition of linear operators on a vector space V over a field F combines with respect to scalar
multiplication:

a(S T ) = (aS )T = S (aT ), (4.6)

for any S , T in EndF (V) and any a ∈ F. The verification is standard and left to the reader.
The ring End(V), like its counterpart Mn (F), is a very interesting one. To understand some of the
features that make it interesting, consider the special case of V = R2 . Fix any basis v1 , v2 of R2 . Recall
from Proposition (4.2.4) that we can get unique linear operators of R2 , that is, elements of End(R2 ),
by simply specifying their values at the basis vectors v1 , v2 and then extending them linearly to all

of R2 . Let us define four elements T 11 , T 12 , T 21 , T 22 of End(R2 ) in this manner:

T 11 v1 = v1 , T 11 v2 = 0,
T 12 v1 = 0, T 12 v2 = v1 ,
T 21 v1 = v2 , T 21 v2 = 0,
T 22 v1 = 0, T 22 v2 = v2 .
Therefore,
(T 11 + T 22)v1 = v1
(T 11 + T 22)v2 = v2

which show that T 11 + T 22 agrees with the identity map I on R2 on each basis vector. It follows that
T 11 + T 22 = I on R2 .
It is equally instructive to interpret the product T 11 T 22 . Since

(T 11T 22 )v1 = T 11 (T 22 v1 ) = T 11 0 = 0
and
(T 11 T 22 )v2 = T 11 (T 22 v2 ) = T 11 v2 = 0,

it follows that T 11 T 22 is the zero map z of End(R2 ). Thus, we have shown that the product of two
non-zero elements of End(R2 ) can be the zero element.
Similar calculations show that T 11 T 21 = z whereas T 21 T 11 ≠ z, proving that the product T 11 T 21 is
not the same map as the product T 21 T 11 . In other words, End(R2 ) is not a commutative ring. With this
example in mind, the reader should have no difficulty in proving the following general result.

Proposition 4.3.11. Let V be a finite-dimensional vector space over a field F such that dim V > 1.
(a) The ring End(V) is not commutative.
(b) End(V) has non-zero zero divisors, that is, End(V) has non-zero operators whose product is the
zero operator.

Compare this with similar properties of Mn (F), which we discussed after Proposition (1.3.6). The
similarity of these two results is yet another pointer to the connection between End(V) and Mn (F).
This connection will be made precise in the next section.
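With respect to the basis v1 , v2 , the operators T 11 , T 21 and T 22 are represented, in the sense made precise later in this chapter, by the unit matrices e11 , e21 and e22 of M2 (R), so the same phenomena can be seen directly among matrices. The following Python sketch, assuming the NumPy library, is an optional numerical illustration.

    import numpy as np

    # Unit matrices of M_2(R) corresponding to T11, T22 and T21 above.
    E11 = np.array([[1, 0], [0, 0]])
    E22 = np.array([[0, 0], [0, 1]])
    E21 = np.array([[0, 0], [1, 0]])

    print(E11 @ E22)   # the zero matrix: a product of two non-zero elements
    print(E11 @ E21)   # again the zero matrix
    print(E21 @ E11)   # not the zero matrix, so E11 and E21 do not commute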

Invertible Linear Operators


Analogous to the invertible matrices in Mn (F), there are invertible linear operators in End(V).

Definition 4.3.12. Let V be a vector space over a field F. An operator T ∈ EndF (V) is invertible in
EndF (V) if there is an operator S in End(V) such that

S T = T S = IV ,

where IV is the identity operator on V. If such an operator S on V exists, one says that S is the inverse
of T and is usually denoted by T −1 .

An alert reader must have noticed our tacit assumption that if an inverse of an operator exists,
then it must be unique. Uniqueness of inverses is a general fact and one can verify it by a standard
argument. Suppose that for a linear operator T on V, there are two operators S 1 and S 2 on V such
that S 1 T = T S 1 = I and S 2 T = T S 2 = I, where I stands for the identity operator on V. Now, as
S 1 = S 1 I = S 1 (T S 2 ), by associativity of composition of maps, S 1 = (S 1 T )S 2 = IS 2 = S 2 proving our
contention that if an inverse of T exists it must be unique.
In case a linear operator T on V has a set-theoretic inverse, then it can be shown that the inverse has
to be automatically a linear operator and so has to be the inverse of T in the ring End(V). Recall that a
function f : X → Y has a set-theoretic inverse if and only if f is one–one and onto. In that case for any
y ∈ Y, there is a unique x ∈ X such that f (x) = y; the set-theoretic inverse of f , denoted by f −1 , is then
the function from Y to X given by f −1 (y) = x. One easily verifies that the composite f −1 f = IX , the
identity map on X and f f −1 = IY , the identity map on Y; if X = Y, then of course, f −1 f = f f −1 = IX .
We are now ready to prove the following.

Proposition 4.3.13. Let T be a linear operator on a vector space V over a field F. Then T is
invertible in EndF (V) if and only if T is one–one and onto.

Proof. Assume first that T : V → V is one–one and onto. Then the inverse function T −1 : V → V exists
and T T −1 = T −1 T = IV , the identity map on V. So it suffices to show that T −1 is a linear map. Let v, w
be arbitrary vectors in V and set v1 = T −1 v and w1 = T −1 w. Then by definition, T v1 = v and T w1 = w.
Since T is linear, for any a ∈ F,
T (av1 + w1 ) = aT v1 + T w1 = av + w
and so applying the definition of T −1 again, we obtain
T −1 (av + w) = av1 + w1
= aT −1 v + T −1 w,

which proves that T −1 is linear.


Conversely, we assume that T is invertible and S is its inverse. If v ∈ ker T , then applying S to both
sides of T v = 0, we see that S (T v) = S 0 = 0 as S is linear. As S T is the identity map on V, it follows that
v = 0 and so ker T = {0} showing that T is one–one. On the other hand, for any w ∈ V, if v = S w, then
applying T to both sides, we obtain T v = T (S w) = (T S )w = IV w = w. This implies that T is onto. !
The proof also shows that the inverse of an invertible operator is a one–one and onto map.
Recall that a linear operator on a finite-dimensional vector space is one–one if and only if it is onto.
So we have the following useful corollary.

Corollary 4.3.14. A linear operator on a finite-dimensional vector space is invertible if and only if
it is either one–one or onto.

EXAMPLE 13 Consider the linear operator T on R2 given by T (x, y) = (−x, −y). Since for any x, y ∈
R, −(−x) = x and −(−y) = y, it follows that T 2 (x, y) = T (T (x, y)) = T (−x, −y) = (x, y)
for all (x, y) ∈ R2 . Thus T 2 acts as the identity operator on R2 . By definition then
T −1 = T and so T is invertible. Geometrically, T reflects any point in R2 about the
origin and so it is clear that T 2 (which is T followed by itself) will bring any point
back to itself.

In general, any linear operator P on a vector space satisfying P2 = I is invertible with P itself as its inverse.

EXAMPLE 14 The operator representing an anticlockwise rotation of R2 through an angle θ is


clearly invertible as a clockwise rotation through −θ will correspond to its inverse.

Nilpotent Operators
We next introduce nilpotent elements in the ring End(V) for any vector space V of dimension at least 2.

Definition 4.3.15. A linear operator T ∈ End(V) is said to be nilpotent if T k = z for some positive
integer k. For a nilpotent operator T , the positive integer k is said to be the index of nilpotency if
T k = z, but T k−1 ≠ z.

Let V be a vector space of dimension n over a field F, where n > 1. Choose any basis v1 , v2 , . . . , vn of
V. Consider the linear operator T on V defined by

T v j = v j+1 for j = 1, 2, . . . , n − 1, (4.7)


T vn = 0.

It is clear that for the product or composite T 2 , we have

T 2 v j = v j+2 for j = 1, 2, . . . , n − 2,
2
T vj = 0 for j = n − 1, n.

Similar calculations for higher powers of T show that whereas T k , for k ≤ n − 1, cannot be the zero
map (as T k v1 ≠ 0), T n carries every basis vector to the zero vector, hence must be the zero map. Thus,
we have found a nilpotent linear operator of index n. By modifying this example, nilpotent operators
of index less than n can be produced easily.
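One possible modification, phrased here in terms of matrices that mimic the action of such an operator on the chosen basis (the precise matrix description appears in Section 4.5), is sketched below in Python, assuming the NumPy library; the function name is ours. Sending v_j to v_{j+1} only for j < k, and the remaining basis vectors to zero, yields index of nilpotency exactly k.

    import numpy as np

    def truncated_shift(n, k):
        # Matrix of the operator with T v_j = v_{j+1} for j < k and T v_j = 0 for j >= k.
        A = np.zeros((n, n))
        for j in range(k - 1):
            A[j + 1, j] = 1
        return A

    A = truncated_shift(5, 3)
    print(np.linalg.matrix_power(A, 2))   # non-zero
    print(np.linalg.matrix_power(A, 3))   # the zero matrix, so the index is 3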

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. In the
following, V is a vector space over an arbitrary field.
(a) The sum and the product of two linear operators on V are again linear operators on V.
(b) The vector space of all linear transformations of R3 to R2 is of dimension 5.
(c) If, for a linear operator T on Rn , T 2 is the zero map, then T itself is the zero map.
(d) If for a linear operator T on V, T 2 = I, then either T = I or T = −I.
(e) If T 1 , T 2 and T 3 are linear operators on V such that T 1 + T 2 = T 1 + T 3 , then T 2 = T 3 .
(f) If T 1 , T 2 and T 3 are linear operators on V such that T 1 T 2 = T 1 T 3 , then T 2 = T 3 .
(g) If T 1 , T 2 and T 3 are linear operators on V, then T 1 (T 2 + T 3 ) = T 1 T 2 + T 1 T 3 .
(h) For non-zero operators T 1 and T 2 on V, the product T 1 T 2 cannot be the zero operator.
(i) For a linear operator T on V, T 2 is the zero operator on V if and only if Im(T ) ⊂ ker T .
(j) If, for a non-zero linear operator T on V, T 2 = T , then T must be the identity operator on V.

(k) The dual of a finite-dimensional vector space V is isomorphic to V.


(l) The trace map on Mn (F) is a linear functional.
(m) An invertible operator on V cannot be nilpotent.
2. Complete the verification of the vector space axioms for HomF (V, W) in Theorem (4.3.4).
3. Prove Theorem (4.3.10).
4. Determine, in each of the following, whether the given map is a linear functional on the indicated
vector space:
(a) T on Mm×n (F), T ([ai j ]) = a11 .
(b) T on Mn (F), T (A) = det A.
(c) T on Rn [x], T (g(x)) = g(0).
(d) T on C[a, b], T (g(x)) = \int_a^b g(t) dt.
(e) T on R2 , T ((x1 , x2 )) = x1 + x2 .
5. Let T and S be invertible operators on a vector space V.
(a) Prove that T −1 is invertible and (T −1 )−1 = T .
(b) Prove that S T is invertible and (S T )−1 = T −1 S −1 .
6. Let V and W be vector spaces over a field F and T : V → W be a linear map which is one–one
and onto. Prove that the set-theoretic inverse T −1 is a one–one, onto linear map from W to V.
7. Let T : R3 → R2 and S : R2 → R3 be the linear transformations given by
T (x1 , x2 , x3 ) = (x1 + x2 , x2 + x3 )
and
S (x1 , x2 ) = (x1 + x2 , x1 − x2 , x2 ).
Give similar formulae for S T and T S .
8. Let Rθ and Rφ be the linear operators on R2 representing anticlockwise rotations of R2 through
angles θ and φ, respectively. Prove, using matrices, that their composite or product is the rotation
through the angle θ + φ.
9. Let R be the linear operator on R2 representing the rotation of R2 through the angle π/4. Prove
that the linear operator R7 is the multiplicative inverse of R.
10. Let T : R3 → R2 and S : R2 → R3 be the maps given by formulas

T (x1 , x2 , x3 ) = (x1 + x2 , x2 + x3 ) and S (x1 , x2 ) = (x1 − x2 , x1 + x2 , x1 )

respectively. Prove that T and S are linear maps. Describe the composites T S and S T in
terms of similar formulas.
11. Let T be the linear operator on R3 such that

T e1 = e1 − e2 , T e2 = e1 + e2 + e3 and T e3 = e3 − e1

where e1 , e2 and e3 form the standard basis of R3 . Compute T (x1 , x2 , x3 ) for any (x1 , x2 , x3 ) in
R3 . Also, find all the vectors (x1 , x2 , x3 ) ∈ R3 such that T (x1 , x2 , x3 ) = (0, 0, 0). Is T invertible?
12. Let T be the linear operator on the complex vector space C3 such that

T e1 = (i, 0, −1), T e2 = (0, 1, 1) and T e3 = (1, 1 + i, 2)



where e1 , e2 and e3 form the standard basis of C3 . Compute T (x1 , x2 , x3 ) for any (x1 , x2 , x3 ) ∈ C3 .
Also find all the vectors (x1 , x2 , x3 ) ∈ C3 such that T (x1 , x2 , x3 ) = (0, 0, 0). Is T invertible?
13. Let T 1 and T 2 be linear operators on R3 given by

T 1 (x1 , x2 , x3 ) = (0, x1 , x2 ), T 2 (x1 , x2 , x3 ) = (x3 , x2 , x1 ).

(a) Give similar formulae for the operators T 1 + T 2 , T 1 T 2 , T 2 T 1 , T 1^2 and T 2^2 .


(b) Determine which of the two operators is invertible, and give a formula for the inverse.
(c) Find non-zero linear operators S 1 and S 2 such that S 1 T 1 and T 1 S 2 are zero operators.
14. Let T be the linear operator on M2 (R) given by
T \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 0 & a \\ b & c \end{pmatrix}.

Compute the powers T 2 , T 3 and T 4 by giving their actions on an arbitrary matrix of M2 (R).
Also, find a linear operator S on M2 (R) such that S T ≠ T S .
(For the second part, you should be able to produce a specific A ∈ M2 (R) such that (S T )(A) ≠
(T S )(A) for whatever operator S you have found.)
15. Let T and S be linear operators on R2 [x] given by
T (a0 + a1 x + a2 x2 ) = a0 + a1 (x + 1) + a2(x + 1)2
and
S (a0 + a1 x + a2 x2 ) = a1 + 2a2 x.
Give similar formulae for the operators T 2 , S 2 , S T and T S . Which of the two operators T and
S is nilpotent?
16. Let T and S be linear operators on a vector space V. Prove that T S is nilpotent if and only if S T
is nilpotent.
In Exercises 13 through 16, the required linear maps should be described in terms of their
actions on some chosen bases of their domains.
17. Let V be an m-dimensional vector space over a field F. Prove that for any positive integer k,
1 ≤ k ≤ m, there is a nilpotent operator on V whose index of nilpotency is k.
18. Let V be an m-dimensional vector space over a field F. Prove that if m > 1, then there are non-
zero linear operators T 1 and T 2 on V such that T 1 T 2 ≠ T 2 T 1 .
19. Determine linear operators T and S on R3 such that S T is the zero operator on R3 whereas T S
is not.
20. Find two non-zero operators T 1 and T 2 on R2 such that T 1 T 2 is the zero operator on R2 .
Can your example be generalized to an arbitrary vector space of dimension larger than 1?
21. Let T be the translation operator on the real vector space R[x] of all real polynomials defined
by

T (p(x)) = p(x + 1) for any p(x) ∈ R[x].

Is T invertible?
22. Let T be the linear operator on the vector space R[x] of all real polynomials defined by

T ( f (x)) = x f (x) for any f (x) ∈ R[x].



Is T one–one? Is T onto? Also, if D is the usual differential operator on R[x], then prove that
DT − T D is the identity operator on R[x].
23. Let T be a linear operator on a finite-dimensional vector space V such that rank(T 2 ) = rank(T ).
Prove that

Im(T ) ∩ ker T = {0}.

24. Let V, W and U be vector spaces over a field F. If T ∈ HomF (V, W) and S ∈ HomF (W, U), then
show that the composite S T is in HomF (V, U).
25. Let T ∈ HomF (V, W) and S ∈ HomF (W, U) be linear maps, where V, W and U are finite-
dimensional vector spaces over a field F.
(a) Show that T (ker S T ) is a subspace of ker S .
(b) Hence prove that

dim ker(S T ) ≤ dim ker S + dim ker T.

26. Let T and S be linear operators on a finite-dimensional vector space V over a field F such that
T 2 = S 2 is the zero operator and T S + S T is the identity operator on V. Prove that

ker T = T ker S , ker S = S ker T and V = ker T ⊕ ker S .

4.4 ISOMORPHISM
We have, quite often, come across instances of two vector spaces which we claimed are essentially
the same. We can now make this vague idea precise by introducing the idea of isomorphism of vector
spaces.

Definition 4.4.1. Let V and W be vector spaces over the same field F. A linear map T : V → W
is an isomorphism of V onto W if T is both one–one and onto. In that case, we say V and W are
isomorphic as vector spaces.
If V is isomorphic to W, then sometimes the notation V ≅ W is used.

The existence of an isomorphism between V and W means that every vector of W is associated with
a unique vector of V in such a way that this association respects the vector space operations. In other
words, the vectors of isomorphic spaces differ in names only; the isomorphism allows one to rename
the vectors of one vector space as vectors of the other space in a way compatible with the respective
operations of the two spaces. In that sense, two isomorphic vector spaces are the same.
If T is an isomorphism of V onto W, then T has a set theoretic inverse T −1 (why?) from W to V. In
an exercise in the last section, the reader was asked to show that T −1 is also a one–one, onto linear map.
So T −1 is an isomorphism of W onto V.
Similarly, it is a routine exercise to show that if T : V → W and S : W → U are isomorphisms, then
the composite S T is an isomorphism of V onto U.
The last two assertions are the main ingredients of the proof of the following proposition which we
leave to the reader.

Proposition 4.4.2. Isomorphism is an equivalence relation in the collection of all vector spaces
over a fixed field.

Here are some examples of isomorphic spaces.

EXAMPLE 15 The set of complex numbers C, as a vector space over R, is isomorphic to R2 . The
isomorphism is clearly the map given by

T (a + ib) = (a, b).

EXAMPLE 16 When we say that R2 is isomorphic to itself, we probably think of the identity map
(the one that maps v to itself) as the isomorphism. However, there are infinitely many
ways in which R2 can be conceived of as an isomorphic copy of itself. For example,
as we have seen in the last section, for every choice of reals a, b, c and d such that
ad − bc ≠ 0, the map

(x1 , x2 ) ↦ (ax1 + bx2 , cx1 + dx2 )

sets up an isomorphism of R2 with itself. Another one will be the rotation Rθ where
θ ≠ 2nπ.
This abundance of isomorphisms is not limited to R2 only. We will show presently
that we have many choices for isomorphism for any arbitrary vector space.

EXAMPLE 17 The vector space M2 (R) is isomorphic to R4 . The map which sends
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto (a, b, c, d)

is an isomorphism.
It is equally easy to show that Mm×n (F) ≅ Fmn for any field F.

EXAMPLE 18 The map

a0 + a1 x + a2 x2 ↦ (a0 , a1 , a2 )

establishes an isomorphism from the vector space R2 [x] of all real polynomials of
degree at most two with R3 .
The general case of isomorphism between the real vector space Rn [x] of polyno-
mials of degree at most n, and Rn+1 is left to the reader, as an exercise, to formulate
as well as to prove.

EXAMPLE 19 However, as the most important example, we show that any n-dimensional vector
space V over a field F is isomorphic to Fn . To see this, choose a basis v1 , v2 , . . . , vn
of V. If, for an arbitrary vector v ∈ V,

v = a 1 v1 + a 2 v2 + · · · + a n vn

for scalars ai , then by setting

T v = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix},
we get a well-defined map T : V → Fn (see discussion following Equation 3.4). Basic
properties of addition and scalar multiplication in vector spaces show that T is a
linear map. As the basis vectors are linearly independent, T is one–one. Therefore,
T is onto as the dimensions of V and Fn are the same.
Though this example shows that there is only one vector space of dimension n over F up to isomor-
phism, that does not mean that we should restrict ourselves to studying Fn only. For, the identification
of V with Fn depends on our choice of the basis of V; every choice of a basis determines a rule for
identifying vectors of V with n-tuples of Fn . Thus, there is no natural way of associating the vectors
of a general n-dimensional space V with vectors of Fn . For this reason, the isomorphism outlined in the example
has a very limited use. However, for future reference, we state the following proposition.

Proposition 4.4.3. Let V be an n-dimensional vector space over a field F. Fix a basis of V, so
that every vector is assigned a unique coordinate vector in Fn . This assignment is a vector space
isomorphism of V onto Fn .
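Concretely, the coordinate vector of a given v is found by solving a system of linear equations. The following Python sketch, assuming the NumPy library and using an arbitrarily chosen basis of R3 for illustration, computes such a coordinate vector and checks that it reconstructs v.

    import numpy as np

    # Columns of 'basis' are the chosen basis vectors (1,1,0), (0,1,1), (0,0,1) of R^3.
    basis = np.array([[1.0, 0.0, 0.0],
                      [1.0, 1.0, 0.0],
                      [0.0, 1.0, 1.0]])
    v = np.array([2.0, 3.0, 5.0])

    coords = np.linalg.solve(basis, v)     # coordinate vector of v with respect to the basis
    print(coords)                          # expected: [2. 1. 4.]
    print(np.allclose(basis @ coords, v))  # reconstructing v from its coordinates: True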

Note that in order to set up the isomorphism in the last example, we could have started by letting
T vi = ei , where ei form the standard basis of Fn . Then, according to Proposition (4.2.4), T could have
been extended to a linear map on V. In that case, the following result would have directly shown that T
is an isomorphism.

Proposition 4.4.4. Let V be a finite-dimensional vector space, and let v1 , v2 , . . . , vn be any basis
of V. Suppose that W is another vector space over the same field F. Then a linear map T : V → W is
an isomorphism if and only if the images T v1 , T v2 , . . . , T vn form a basis of W.

Proof. The proof depends on the following familiar fact: if v = a1 v1 + a2 v2 + · · · + an vn , then T v =


a1 T v1 + a2 T v2 + · · · + an T vn . Therefore, if the images T vi span W, then T is onto; if they are linearly
independent, then ker T must be zero, that is, T is one–one. Thus, the proposition is proved in one
direction. A simple modification yields the proof in the other direction. !

The preceding proposition implies that the dimension of a vector space is a crucial number as far
as isomorphism is concerned.

Corollary 4.4.5. Two finite-dimensional vector spaces over a field are isomorphic as vector spaces
if and only if they have the same dimension.

Proof. If V and W are isomorphic then any basis of V is mapped by the isomorphism to a basis of W.
So they have the same dimension.
Conversely, assume that both V and W have dimension n, and choose bases v1 , v2 , . . . , vn and
w1 , w2 , . . . , wn of V and W, respectively. Now, Proposition (4.2.4) ensures that there is a linear map
T : V → W which maps vi to wi for each i. But then the preceding proposition implies that T is an
isomorphism. !

The rest of this section is devoted to a brief discussion of results known as the Isomorphism theo-
rems. This portion can be left out by the reader at the first reading.
We begin by looking at a very natural mapping from a vector space to any of its quotient spaces.
Let V be a vector space over a field F, and W any subspace of V. Define η : V → V/W as follows
ηv = v + W.
The rules for addition and scalar multiplication of cosets show that
η(v1 + v2 ) = ηv1 + ηv2
η(av) = aηv

for all vectors v1 , v2 , v ∈ V and scalars a ∈ F. Hence, η is a linear map. It is clear that η is onto V/W. We
claim that the kernel of η is W. To establish the claim, first recall that the zero vector of the quotient
space V/W is the coset W itself. Hence, if v ∈ ker η, then ηv = W and so v + W = W by the definition
of η. The equality of cosets then implies that v ∈ W, which, in turn, shows that ker η ⊂ W. To complete
the proof of our claim, we have to verify the inclusion in the other direction which can be easily done
in a similar manner.
The map η is customarily described as the canonical homomorphism of V onto V/W. Thus, we
have shown that the canonical homomorphism η : V → V/W is an onto linear map whose kernel is
precisely W.
The idea of a quotient space is crucial in establishing isomorphisms between vector spaces. The
next theorem is an example of how it can be done.

Theorem 4.4.6. Let V and W be vector spaces over a field F, and let T : V → W be a linear map.
Then,
V/ker T ≅ Im(T ).
In particular, if T : V → W is an onto linear map, then
V/ker T ≅ W.

Proof. Put K = ker T , and define S : V/K → W as follows:

S (v + K) = T v.

Note that as S is defined in terms of a representative of a coset, we have to be careful as to whether


the definition of S is independent of the choice of representative of cosets. The process of checking
this independence is known as verifying that S is well defined; in practice, we take two arbitrary
representatives, say, v1 and v2 , of the same coset of K and show that T v1 = T v2 . But our choice of v1
and v2 implies that v1 − v2 ∈ K = ker T . Therefore, T (v1 − v2 ) = 0 in W so the linearity of T gives us
the required equality.
To complete the proof of the theorem, we need to show that
(a) S is linear,
(b) S is one–one and
(c) S is onto Im(T ).
But these depend on routine and by now familiar verifications and are therefore left to the reader. !

Observe that, in case of a finite-dimensional vector space V, the isomorphism of the theorem allows
us to deduce the following formula:

dim(V/ ker T ) = dim Im(T ).

It follows, from the expression of the dimension of a quotient space (see Proposition 3.9.3), that

dim V = dim ker T + dim Im(T ),

which is the dimension formula of Theorem (4.2.7).


The proofs of the next two Isomorphism theorems are omitted; however, we indicate the necessary
steps needed for the proofs in Exercises 6 and 8.

Theorem 4.4.7. Let T : V → W be an onto linear map with kernel K. Then there is a one–one
correspondence between the subspaces of W and the subspaces of V containing K.

Corollary 4.4.8. Let U be a subspace of V. Then every subspace of the quotient space V/U is of
the form L/U for some subspace L of V containing U.

Theorem 4.4.9. Let W1 and W2 be subspaces of a vector space V. Then

(W1 + W2 )/W1 ≅ W2 /(W1 ∩ W2 ).

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. In the
following, given vector spaces are over an arbitrary field unless otherwise mentioned.
(a) Rm is not isomorphic to Rn if m ≠ n.
(b) Two proper, distinct subspaces of a finite-dimensional vector space can never be isomorphic.
(c) Every vector space is isomorphic to itself.
(d) An infinite-dimensional vector space V cannot be isomorphic to a proper subspace of V.
(e) Rn is isomorphic to a unique subspace of Rm if m > n.
(f) There is a one–one correspondence between the subspaces of two isomorphic vector spaces.
(g) Any quotient space of a vector space V is isomorphic to a subspace of V.
(h) Every subspace of a vector space V is isomorphic to a quotient space of V.
(i) Every pair of bases of two vector spaces over a field having the same dimension determines
an isomorphism between them.
(j) If, for a finite-dimensional vector space V,

V = W1 ⊕ W2 = W1 ⊕ W3

for subspaces W1 , W2 and W3 , then W2 ≅ W3 .


(k) For any field F, Mm×n (F) ≅ Fm+n .
(l) The null space of a matrix A ∈ Mm×n (F) cannot be isomorphic to its column space.

2. Let V, W and U be vector spaces over a field F.


(a) If T is a linear map of V onto W such that the inverse map T −1 : W → V exists, then show
that T −1 is also linear.
(b) If T : V → W and S : W → U are linear maps, then show that the composite map S ◦ T is a
linear map from V to U.
(c) Complete the proof of Proposition (4.4.2).
3. Give a detailed proof of Proposition (4.4.3).
4. Complete the proof of Proposition (4.4.4).
5. Carry out the verifications needed to complete the proof of Theorem (4.4.6).
The following exercise provides a proof of Theorem (4.4.7).
6. Let T be a linear map from a vector space V onto another space W, both over the same field. For
any subspace W1 of W, let
V1 = {v ∈ V | T v ∈ W1 }.
Show that V1 is a subspace of V containing K, the kernel of T . Show further that W1 ↦ V1 is a
one–one map from the set of all subspaces of W onto the set of all subspaces of V containing K.
7. Prove Corollary (4.4.8).
8. Prove Theorem (4.4.9) by using Theorem (4.4.6) after carrying out the following steps:
For any subspaces W1 and W2 , define T from the sum W1 + W2 to the quotient space
W2 /(W1 ∩ W2 ) by
T (w1 + w2 ) = w2 + (W1 ∩ W2 ).
Show that
(a) T is well-defined, i.e., if w1 + w2 = w' 1 + w' 2 ∈ W1 + W2 , then T (w1 + w2 ) = T (w' 1 + w' 2 ).
(b) T is linear.
(c) T is onto W2 /(W1 ∩ W2 ).
(d) Kernel of T is precisely W1 .
9. Let F be any field, and A a fixed matrix in Mn (F). Prove that the map T A : Fn → Fn defined by

T A (x) = Ax

is a vector space isomorphism if and only if A is invertible.


10. Let F be any field, and B a fixed invertible matrix in Mn (F). Prove that the map φ : Mn (F) →
Mn (F) defined by
φ(A) = B−1 AB

is a vector space isomorphism.


11. Consider M2 (R) with the standard basis consisting of the four unit matrices. Use Exercise 10
and the invertible matrix B = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}

to produce another basis of M2 (R).


12. Is the map T : Mn (F) → Mn (F) given by T (A) = A^t an isomorphism?

13. Give examples of vector spaces over a field and linear maps T and S between them such that (i)
T is one–one but not onto and (ii) S is onto but not one–one.

4.5 MATRICES OF LINEAR MAPS


One of the reasons for the utility of linear maps between finite-dimensional vector spaces is that there
is a simple way of representing them as matrices. This representation of linear maps as matrices is
a very effective one, for any algebraic manipulation of linear maps corresponds to exactly the same
manipulation of the matrices representing the maps. That explains, as the reader must have guessed by
now, the similarity between the algebraic structures of Hom(V, W) and Mm×n (F). We now discuss how
this representation works.
Let V and W be finite-dimensional vector spaces over a field F with dim V = n and dim W = m,
and let T : V → W be a linear map. Fix a pair of ordered bases B = {v1 , v2 , . . . , vn } and C =
{w1 , w2 , . . . , wm } of V and W, respectively. The image T v j , being a vector in W, is a linear com-
bination of the basis vectors wi . Thus, for each fixed j, 1 ≤ j ≤ n, we can find m unique scalars
a1 j , a2 j , . . . , am j in F such that
T v_j = \sum_{i=1}^{m} a_{ij} w_i .    (4.8)

Once T v j is expressed in this manner for all j, we have a set of mn scalars ai j for 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Definition 4.5.1. The matrix of T , with respect to the bases B and C of V and W, respectively, is
the matrix

m(T ) = [ai j ]

in Mm×n (F), whose jth column consists of precisely the scalars determined by T v j in Equation (4.8).

Thus, this construction allows us, after fixing bases, to associate with any linear map T : V → W
a unique matrix m(T ) in Mm×n (F). Some of the important features of the definition are pointed out in
the following remarks:

(a) It should be clear that different pairs of bases will yield different matrix representations of the
same linear map. So, it is reasonable to expect that this dependence on bases should be reflected
in the notation m(T ) for the matrix of T . However, that will make the notation really clumsy.
We prefer to keep our notation as simple as m(T ), but that means we have to keep in mind the
bases used for a given matrix representation.
(b) Even when we consider maps from V into itself, that is, with W = V in the definition, it is not
necessary to take B = C. (See examples after these remarks.)
(c) However, if for the linear operator T : V → V, we use the same basis B for both the domain V
and the range V, then we will refer to the matrix m(T ) of T as the matrix with respect to the
basis B. This will be the case in most of our important examples.

We now present some examples.



EXAMPLE 20 Consider the identity map I = IV on V, where V is an n-dimensional vector space


over a field F. Pick any basis B of V. Since I fixes any vector of V, in particular, the
basis vectors of B, it follows that the entries of the jth column of m(I) will be all
zeros, except the entry at the jth place, which is 1. Thus, the matrix of I with respect
to any basis B will be the identity matrix In of order n in Mn (F).
However, it is not hard to see that the matrix of I with respect to two distinct
bases B and C of V need not be the identity matrix, for the jth column of the matrix
now will consist of the scalars which are the coefficients in the linear combination
expressing the jth vector of B in terms of the vectors in the basis C.

EXAMPLE 21 Consider the zero map z from V to W, both vector spaces over a field F. Assume
dim V = n and dim W = m. Since z takes every vector of V, and in particular the basis
vectors of any basis of V, to the zero vector in W, it follows that all the entries of any
of the n columns of m(z) must be zero. Thus, no matter which bases are chosen, m(z)
will be the zero matrix in Mm×n (F).

EXAMPLE 22 Let P1 : R2 → R2 be the linear operator given by P1 (x1 , x2 ) = (x1 , 0). Take the stan-
dard basis {e1 , e2 } of R2 where
e1 = (1, 0), e2 = (0, 1).

We express the images of the basis vectors under P1 as combinations of the same
basis vectors:
P1 e1 = P1 (1, 0) = (1, 0) = 1.e1 + 0.e2 ,
P1 e2 = P1 (0, 1) = (0, 0) = 0.e1 + 0.e2 .

Therefore, the matrix of P1 with respect to the standard basis of R2 is


\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.
To appreciate the importance of ordered basis in matrix representation, consider
the matrix of P1 with respect to the basis {e2 , e1 } of R2 . We repeat that this is not the
standard basis of R2 , though as a set it is the same as the standard basis. The reader
should have no difficulty in showing that the matrix of P1 with respect to the new
basis is
\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}

which is not the matrix we had found for P1 with the first ordered basis.
EXAMPLE 23 We find the matrix of P1 with respect to another basis B of R2 , consisting of vectors
v1 = (1, 1), v2 = (2, −1). To do that we have to determine scalars a, b, c and d such
that
P1 v1 = (1, 0) = av1 + bv2 = a(1, 1) + b(2, −1) = (a + 2b, a − b),
P1 v2 = (2, 0) = cv1 + dv2 = c(1, 1) + d(2, −1) = (c + 2d, c − d).

The two equations we have to solve for a and b are a + 2b = 1 and a − b = 0, which
give a = b = 1/3. Similarly, equating the components of both sides of the second
equation, and solving them for c and d, we see that c = d = 2/3. Thus, the required
matrix of P1 relative to basis B is given by
\begin{pmatrix} 1/3 & 2/3 \\ 1/3 & 2/3 \end{pmatrix}.
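The two small linear systems solved above can, of course, be handled mechanically. The following Python sketch, assuming the NumPy library, recomputes the matrix of P1 relative to the basis B by solving for the B-coordinates of P1 v1 and P1 v2 .

    import numpy as np

    B = np.array([[1.0,  2.0],
                  [1.0, -1.0]])            # columns are v1 = (1, 1) and v2 = (2, -1)

    def P1(v):
        # The operator P1(x1, x2) = (x1, 0).
        return np.array([v[0], 0.0])

    # Column j of m(P1) = coordinates of P1(v_j) with respect to the basis B.
    m_P1 = np.column_stack([np.linalg.solve(B, P1(B[:, j])) for j in range(2)])
    print(m_P1)                             # expected: [[1/3, 2/3], [1/3, 2/3]]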

EXAMPLE 24 Let T : R2 → R3 be defined by

T (x1 , x2 ) = (x1 + x2 , x2 , x2 ).

To find the matrix of T with respect to the standard bases of R2 and R3 , respectively,
note that

T (1, 0) = (1, 0, 0) = 1.e1 + 0.e2 + 0.e3

T (0, 1) = (1, 1, 1) = 1.e1 + 1.e2 + 1.e3 .

Note that to avoid confusion, we have used the symbols ei to mean the standard
basis vectors of R3 whereas no such symbols are used for the basis vectors in R2 .
Thus, the preceding relations show that the matrix of T with respect to the standard
bases of R2 and R3 is a 3 × 2 one, given by
\begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}.

EXAMPLE 25 Let T : R3 → R be the linear map given by T (x1 , x2 , x3 ) = x1 + x2 + x3 . The matrix


of T , with respect to the standard basis {e1 , e2 , e3 } of R3 and the basis {1} of R, is
clearly the following 1 × 3 matrix
[1  1  1].

Note that any non-zero real a can form a basis of the vector space R over itself.
Keeping the same basis for R3 but changing the basis of R to {a}, we see that the
matrix of T with respect to the new pair of bases is

[a^{-1}  a^{-1}  a^{-1}].

EXAMPLE 26 Consider the differential map D on R3 [x], the real vector space of all real polynomials
of degree at most 3. Thus,

D(a0 + a1 x + a2 x2 + a3 x3 ) = a1 + 2a2 x + 3a3 x2 .

Take the standard basis {1, x, x2 , x3 } of R3 [x]. To obtain the matrix of D with respect
to this basis, we have to express D of each of these basis vectors as linear combi-
nations of the same basis vectors. The coefficients in these combinations will form the

columns of the required matrix. Since

D(1) = 0, D(x) = 1, D(x2 ) = 2x, D(x3 ) = 3x2 ,

it follows that the required matrix is

\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}.

However, with respect to the basis {1, 1 + x, 1 + x2, 1 + x3 }, the matrix of the same
differential map D will be the following one:
\begin{pmatrix} 0 & 1 & -2 & -3 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
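Both matrices of D can be obtained mechanically once polynomials are identified with their coordinate vectors. The following Python sketch, assuming the NumPy library, recovers the second matrix by expressing D of each new basis polynomial in terms of the basis {1, 1 + x, 1 + x2 , 1 + x3 }.

    import numpy as np

    # Matrix of D with respect to the standard basis {1, x, x^2, x^3}.
    D_std = np.array([[0., 1., 0., 0.],
                      [0., 0., 2., 0.],
                      [0., 0., 0., 3.],
                      [0., 0., 0., 0.]])

    # Columns: the polynomials 1, 1 + x, 1 + x^2, 1 + x^3 in standard coordinates.
    C = np.array([[1., 1., 1., 1.],
                  [0., 1., 0., 0.],
                  [0., 0., 1., 0.],
                  [0., 0., 0., 1.]])

    # Column j of the new matrix solves C * column = D applied to the j-th new basis polynomial.
    print(np.linalg.solve(C, D_std @ C))
    # expected: [[0, 1, -2, -3], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]]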
Going back to the general discussion now, we seek to relate the vectors appearing as images of
a linear map T : V → W between two finite-dimensional vector spaces to its matrix representation.
Choose B = {v1 , . . . , vn } and C = {w1 , . . . , wm } as bases of V and W, respectively, and let A = [ai j ]
be the matrix in Mm×n (F) representing T with respect to these bases. Recall the idea of coordinates of
vectors in finite-dimensional vector spaces. For any v ∈ V, if v = b1 v1 + · · · + bn vn = \sum_{j=1}^{n} b_j v_j is the
expression of v in terms of the vectors of the basis B, then the coordinate vector of v with respect to
basis B is the n × 1 column matrix (b1 , . . . , bn )t . Similarly, if T v = w, then writing w in terms of the
basis vectors of C we get the coordinate vector (c1 , . . . , cm )t of w, where T v = w = \sum_{i=1}^{m} c_i w_i . There
is a very natural and useful relation between the coordinate vector of v and the coordinate vector of
T v = w through the matrix A of T . To find this relation, recall that as A = [ai j ] is the matrix of T with
respect to the bases B and C, we have, from Equation (4.8)
T v_j = \sum_{i=1}^{m} a_{ij} w_i .

It follows that

w = T v = T\Big(\sum_{j=1}^{n} b_j v_j\Big)
= \sum_{j=1}^{n} b_j T v_j    (as T is linear)
= \sum_{j=1}^{n} b_j \Big(\sum_{i=1}^{m} a_{ij} w_i\Big)    (by the formula for T v_j)
= \sum_{i=1}^{m} \Big(\sum_{j=1}^{n} a_{ij} b_j\Big) w_i    (by interchanging the sums).

Observe that changing the order of the two summations and then rearranging the terms to obtain the
last equality are allowed as the sums are finite. Comparing this expression for w with the earlier one
w = \sum_{i=1}^{m} c_i w_i , we conclude that

c_i = \sum_{j=1}^{n} a_{ij} b_j    for i = 1, 2, . . . , m,

by the uniqueness of linear combinations of basis vectors. However, these m equations are equivalent
to a single matrix equation
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix} = A \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.

Thus, we have shown that the vector equation

Tv = w is equivalent to the matrix equation Ax = y (4.9)

where x and y are the coordinate vectors of v and w with respect to the given bases of V and W,
respectively. Observe that this nice formula works only when the matrix representation of T and the
coordinate vectors of v and w are computed with respect to the same bases of V and W.
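For instance, with the map T of Example 24 and the standard bases, the equivalence (4.9) can be checked numerically. The following Python sketch assumes the NumPy library; the test vector is an arbitrary choice.

    import numpy as np

    A = np.array([[1, 1],
                  [0, 1],
                  [0, 1]])                 # matrix of T(x1, x2) = (x1 + x2, x2, x2)

    def T(x1, x2):
        return np.array([x1 + x2, x2, x2])

    x = np.array([4, -7])                  # coordinate vector of v in the standard basis
    print(np.array_equal(A @ x, T(*x)))    # True: A x is the coordinate vector of T v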

Ranks and Nullities of Matrices Representing Linear Maps


It is time now to examine whether the rank and nullity of a linear map, as introduced in Definition
(4.2.8) in Section 4.2, are related to more familiar numbers known as the rank and nullity of any matrix
that represents it. By definition, the rank of a linear map T : V → W is the dimension of the subspace
Im(T ) of W whereas the nullity of T is the dimension of ker T . Let A be the matrix of T relative to
some fixed bases of V and W. Assume that the dimensions of V and W are n and m, respectively.
Observe that choosing a basis for the n-dimensional vector space V over a field F means setting up an
isomorphism of V onto Fn under which a vector v in V corresponds to its coordinate vector x in Fn
(see Proposition 4.4.3). Similarly, W is isomorphic to Fm . Since T v = 0 if and only if Ax = 0, it follows
that ker T and the nullspace of A are isomorphic under the same correspondence between vectors of
V and their coordinate vectors in Fn . Thus, the dimensions of these two subspaces are equal showing
that the nullity of the linear map T and the nullity of the matrix A are the same.
Under the same correspondence, the image of T , that is, the subspace {w | T v = w for some v ∈ V}
of W is isomorphic to the subspace {y | Ax = y for some x ∈ Fn } of Fm . But this subspace of Fm is the
column space of A whose dimension is the rank of the matrix A. Thus, the ranks of T and A are the
same.
We record our observations as the following proposition.

Proposition 4.5.2. Let V and W be finite-dimensional vector spaces over a field F. Let T : V → W
be a linear map and let A be its matrix with respect to some fixed bases of V and W. Then, rank(T ) =
rank(A) and nullity(T ) = nullity(A).
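As a small illustration (a Python sketch, assuming the NumPy library), consider the operator P1 of Example 22 and its matrix with respect to the standard basis; by the proposition, the rank and nullity of the matrix are also the rank and nullity of P1 .

    import numpy as np

    A = np.array([[1., 0.],
                  [0., 0.]])               # matrix of P1(x1, x2) = (x1, 0)

    rank = np.linalg.matrix_rank(A)        # dimension of the column space of A
    nullity = A.shape[1] - rank            # dimension formula for matrices
    print(rank, nullity)                   # expected: 1 1
    # So rank(P1) = 1 (its image is the x1-axis) and nullity(P1) = 1 (its kernel is the x2-axis).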

The relations proved in Proposition (4.5.2) allow us to settle certain questions about linear maps by
looking at corresponding matrices. For example, let us try to see the implication of T being one–one.
We keep the same notation. Now, T is one–one if and only if T v = 0 implies v = 0. By the equivalence
given in Equation (4.9), this condition holds if and only if

Ax = 0 implies x = 0,

which is another way of stating that the matrix equation Ax = 0 has only the zero solution.
We now specialize to the case when dim V = dim W = n so that the matrix A is now a square
matrix of order n. Now, a square matrix A is invertible if and only if the equation Ax = 0 has only the
zero solution. Our discussion in this case then implies that T : V → W is one–one if and only if the
corresponding matrix A of T is invertible. But the fact that A is invertible also means that the equation
Ax = y has a solution for any n-dimensional column vector y (W has dimension n). In other words, the
equation T v = w for any w ∈ W has a solution v ∈ V, that is, T is onto.
Note that the conclusion that T is onto under the hypothesis that T is one–one was arrived at earlier
as Corollary (4.2.9) of the dimension formula. But we have reached the same conclusion by using
the matrix equation, as the equivalence (4.9) is now available.

Singular Maps and Matrices


Linear operators which are not invertible are also useful. We take this opportunity to have a very brief
discussion about such maps. First a definition.

Definition 4.5.3. Let T be a linear operator on a vector space V. We say T is singular if T is not
invertible. Similarly, a square matrix A is said to be singular if A is not invertible.

Now, let T be a linear operator on an n-dimensional vector space V over a field F, and A the matrix
of T with respect to any fixed but arbitrary basis of V. We leave the proof of the following to the reader.

Proposition 4.5.4. The following are equivalent:


(a) T is singular.
(b) ker T is non-zero.
(c) The matrix equation Ax = 0 has a non-zero solution.
(d) det A = 0.
(e) A is singular.
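A quick numerical illustration of conditions (c), (d) and (e) is given by the following Python sketch, assuming the NumPy library and using an arbitrarily chosen singular matrix.

    import numpy as np

    A = np.array([[1., 2.],
                  [2., 4.]])               # its rows are proportional, so A is singular

    print(np.linalg.det(A))                # (numerically) 0
    x = np.array([2., -1.])                # a non-zero solution of A x = 0
    print(A @ x)                           # the zero vector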

Linear Maps Representing Matrices


We now show that every matrix determines a linear map in exactly the same manner as a linear map
determines a matrix, so that the association of linear maps and matrices works in both ways.

Lemma 4.5.5. Let V and W be finite-dimensional vector spaces over a field F with dim V = n
and dim W = m. Let B ∈ Mm×n (F) be an arbitrary matrix. Fix any bases B = {v1 , . . . , vn } and C =
{w1 , . . . , wm } for V and W, respectively. Then there is a unique linear map S : V → W such that the
matrix of S with respect to these bases is precisely B.

Proof. If B = [bi j ], according to Proposition (4.2.4), there is a unique linear map S from V into W
such that
S v_j = \sum_{i=1}^{m} b_{ij} w_i    for j = 1, 2, . . . , n.
It is clear, by Definition (4.5.1), that the matrix of the linear map S with respect to the bases B and C
of V and W, respectively, is precisely B. !
Let us discuss a couple of examples to illustrate the preceding lemma.
EXAMPLE 27 Let A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, and let V = W = R2 . Fix the standard basis {e1 , e2 } of R2 . Following
the procedure outlined in the lemma, we see that A determines a unique linear oper-
ator T : R2 → R2 (with respect to the chosen basis) such that T e1 = 0.e1 + 1.e2 = e2
and T e2 = 1.e1 + 0.e2 = e1 . Thus, for a typical vector (x1 , x2 ) in R2 ,
T (x1 , x2 ) = T (x1 e1 + x2 e2 ) = x1 e2 + x2 e1 = (x2 , x1 ).
We can also use Equation (4.9) to determine the coordinate vector of T (x1 , x2 ), which
is given by
A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_2 \\ x_1 \end{pmatrix}.
Observe that x1 and x2 are the coordinates of the vector (x1 , x2 ) in R2 only with
respect to the standard basis. It should be clear that in calculations like the preceding
matrix calculation, the coordinate vectors must be interpreted in terms of the bases
chosen.
EXAMPLE 28 Consider the example of the same matrix A. This time keep the standard basis {e1 , e2 }
for the domain R2 , but let {v1 , v2 } be the basis of the range space R2 , where

v1 = (2, 0) and v2 = (1, 1).

Let S be the linear operator on R2 determined by the same matrix A with respect to
the new bases, so that
S e1 = v2 and S e2 = v1 .

Note that if (x1 , x2 )t is the coordinate vector of v, then the coordinate vector of S v
in W will still be the column vector (x2 , x1 )t , but the coordinates this time must be
interpreted in terms of the new basis of W. Hence, the vector of W = R2 whose coordinate
vector is (x2 , x1 )t is given by

x2 v1 + x1 v2 = x2 (2, 0) + x1(1, 1) = (x1 + 2x2 , x1 ).

We leave it to the reader to verify directly that S (x1 , x2 ) is indeed (x1 + 2x2 , x1 ), by
computing S (x1 e1 + x2 e2 ).
EXAMPLE 29 Consider now the matrix A = \begin{pmatrix} 2 & 0 & -1 \\ -1 & 1 & 1 \end{pmatrix}. Being a 2 × 3 real matrix, A determines
a unique linear map from any three-dimensional real vector space V into a two-

dimensional real vector space W, once bases for V and W are fixed. Let us choose
V = R2 [x] with standard basis {1, x, x2 } and W = R1 [x] with standard basis {1, x}.
Recall that the coefficients of a polynomial in Rn [x] themselves are the coordinates
of that polynomial with respect to the standard basis {1, x, · · · , xn }. Now
A \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 2 & 0 & -1 \\ -1 & 1 & 1 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 2a_0 - a_2 \\ -a_0 + a_1 + a_2 \end{pmatrix}.

Since we are considering usual standard bases for both the polynomial spaces, it
follows that the linear map determined by A is T : R2 [x] → R1 [x], where

T (a0 + a1 x + a2 x2 ) = (2a0 − a2 ) + (−a0 + a1 + a2 )x.

Observe that the same matrix determines the linear map S : R3 → R2 with respect to
the usual standard bases, where S is given by

S (x1 , x2 , x3 ) = (2x1 − x3 , −x1 + x2 + x3 ).

HomF (V, W) is Isomorphic to Mm×n (F)


The preceding examples will help in getting a feeling for the correspondence described in the next
theorem. Recall that a one–one correspondence between two sets is a one–one and onto map. We
shall also require the fact (for the proof of the theorem) that a map is one–one and onto if and only if
the inverse of the map exists.

Theorem 4.5.6. Let V and W be finite-dimensional vector spaces over a field F with dimensions
n and m, respectively. Choose any bases B={v1 , v2 , . . . , vn } and C={w1 , w2 , . . . , wm } of V and W,
respectively. For any linear map T : V → W, let m(T ) be the matrix of T with respect to the bases B
and C. Then, the map

T ↦ m(T )

from HomF (V, W) to Mm×n (F) is a one–one correspondence which establishes a vector space isomor-
phism from HomF (V, W) onto Mm×n (F).

Proof. By our remark preceding the statement of the theorem, to prove that the map T ↦ m(T ) is a
one–one correspondence, we need to exhibit an inverse of this map. This inverse is provided by the
map in Lemma (4.5.5) which assigns every matrix in Mm×n (F) a unique linear map from V into W
with respect to a pair of fixed bases. Denote this map by m∗ . Now, if B ∈ Mm×n (F) determines S ∈
HomF (V, W), it was shown in that lemma that m(S ) is precisely B. In other words, m(m∗ (B)) = m(S ) =
B. Moreover, the uniqueness of the linear map m∗ (B) assigned to a matrix B, as given by the same lemma, shows that m∗ (m(T )) = T
for any T ∈ HomF (V, W). Thus, m∗ is indeed the inverse of m.
To complete the proof, we thus need to show further that m preserves the vector space operations,
that is, to show that
m(T 1 + T 2) = m(T 1 ) + m(T 2) and m(aT 1 ) = am(T 1)
for any T 1 , T 2 ∈ Hom(V, W) and a ∈ F.

To verify the first equality, let m(T 1 ) = [ai j ] and m(T 2 ) = [bi j ] be the m × n matrices of T 1
and T 2 with respect to the bases B and C, respectively. According to Equation (4.8), one then has
T_1 v_j = \sum_{i=1}^{m} a_{ij} w_i   and   T_2 v_j = \sum_{i=1}^{m} b_{ij} w_i .

It follows that
(T_1 + T_2) v_j = \sum_{i=1}^{m} (a_{ij} + b_{ij}) w_i ,

showing that the jth column of the matrix of T 1 + T 2 with respect to the given bases is the sum of the
jth columns of m(T 1 ) and m(T 2 ). Since j is arbitrary, it follows that m(T 1 +T 2 ) is indeed m(T 1 ) +m(T 2).
We leave a similar verification of the second equality to the reader. !

A consequence of the preceding theorem is that the vector space End(V) of all linear operators
of an n-dimensional vector space over a field F is isomorphic to Mn (F) as vector spaces. More in-
terestingly, the same isomorphism also preserves the product in these spaces, which we verify in the
following corollary. That means, for example, that invertible matrices correspond to one–one, onto
linear operators. Some other consequences are listed in the exercises.

Corollary 4.5.7. Let V be an n-dimensional vector space over a field F. Let B = {v1 , v2 , . . . , vn }
be a fixed basis of V. For any T ∈ EndF (V), let m(T ) be the matrix of T with respect to the basis B.
Then, the map m : EndF (V) → Mn (F) is a vector space isomorphism which preserves the product also.

This corollary is also stated as follows: Mn (F) and EndF (V) are isomorphic as F-algebras.

Proof. That m : EndF (V) → Mn (F) is a vector space isomorphism follows from the preceding theorem.
Thus, the only verification left is to check that m(T 1 T 2 ) = m(T 1 )m(T 2 ) for any T 1 , T 2 ∈ EndF (V). As
in the proof of the theorem, if we let m(T 1 ) = A = [ai j ] and m(T 2 ) = B = [bi j ] be the matrices, of order
n, of T 1 and T 2 , respectively, with respect to the basis B, then for each j, (1 ≤ j ≤ n),
T_1 v_j = \sum_{i=1}^{n} a_{ij} v_i   and   T_2 v_j = \sum_{i=1}^{n} b_{ij} v_i .

Therefore,
(T_1 T_2) v_j = T_1 (T_2 v_j)    (by definition of the product)
= T_1 \Big(\sum_{k=1}^{n} b_{kj} v_k\Big)    (by the formula for T_2)
= \sum_{k=1}^{n} b_{kj} T_1 v_k    (as T_1 is linear)
= \sum_{k=1}^{n} b_{kj} \Big(\sum_{i=1}^{n} a_{ik} v_i\Big)    (by the formula for T_1)
= \sum_{i=1}^{n} \Big(\sum_{k=1}^{n} a_{ik} b_{kj}\Big) v_i .

It follows that the (i, j)th entry of the matrix m(T 1 T 2 ), which is the coefficient of vi in the sum
of the last equality, is \sum_{k=1}^{n} a_{ik} b_{kj} . But this sum is also the (i, j)th entry of the matrix product AB.
We, therefore, conclude that the matrix of the product T 1 T 2 is the product AB of the matrices of T 1
and T 2 . !

We give a couple of applications of these results.


Since isomorphic vector spaces have the same dimension, and since the unit matrices ei j for 1 ≤ i ≤
m, 1 ≤ j ≤ n form a basis of Mm×n (F), we have the following corollary.

Corollary 4.5.8. Let dim V = n and dim W = m. Then,

dim HomF (V, W) = nm.

In particular, dim EndF (V) = n2 .

Note that this corollary is Theorem (4.3.6) which we stated without proof in the preceding section.
As another application of the isomorphism between matrices and linear maps, we show how to pro-
duce basis vectors of Hom(V, W), or End(V). Observe that basis vectors correspond to basis vectors
under any isomorphism between vector spaces. Therefore, if we write down the linear maps corre-
sponding to the unit matrices ei j , then Theorem (4.5.6) guarantees that these maps form a basis of
Hom(V, W). So let us fix any two bases, say B = {v1 , v2 , . . . , vn } and C = {w1 , w2 , . . . , wm } of V and
W, respectively. The linear maps we are looking for, say fi j , must be such that their matrices with
respect to the bases B and C are precisely the ei j . Thus, we must define these maps in such a way that
the kth column of the matrix of fi j will be the kth column of ei j .(Here, we are assuming that i, j are
fixed but arbitrary, and k is a positive integer between 1 and n.) This gives us the clue as to how fi j
acts on the kth basis vector of B. As the jth column of ei j consists of all zeros except for the entry at
the ith row which is 1, and as the kth column, for k ! j, is the zero column, we must have

fi j (vk ) = wi if k = j
= 0 if k ≠ j.

These mn linear maps fi j from V to W then form a basis of Hom(V, W). We leave to the reader the
slight modification needed to obtain a basis of End(V).
Next, we present two examples where linear maps will be used to deduce specific as well as general
facts about matrices.
The first example is about the existence of nilpotent matrices. Recall that a matrix A ∈ Mn (F) is said
to be nilpotent if for some positive integer k, Ak is the zero matrix. The smallest k for which Ak is the
zero matrix, but Ak−1 is not, is said to be the index of nilpotency of A. We ask the question: Can we
find nilpotent matrices in Mn (F) of any index k ≤ n? (We will see later that the index cannot be greater
than n.)
It turns out that the nilpotent operators, constructed after Definition (4.3.15), can be used to derive
nilpotent matrices. So, as in that example, we fix a basis of Fn , say, the standard basis {e1 , e2 , . . . , en }.
Then, the linear operator T : Fn → Fn determined by the formulae

T e j = e j+1 for j = 1, 2, . . . , n − 1
T en = 0

has the property that T n is the zero map whereas T n−1 is not. Therefore, by the isomorphism between
Mn (F) and End(Fn ) as F-algebras, the matrix of T with respect to the standard basis must have the
same property. In other words, it must be a nilpotent matrix of index n. This n × n matrix has a very
special but simple form, and we denote it by Jn (0). Considering the action of T on the basis vectors,
we see that
J_n(0) = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots &  & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1 & 0
\end{pmatrix}.

Definition 4.5.9. Jn (0) is called the elementary Jordan block over F of order n with zero diagonal.

In general, by the matrix Jn (a) ∈ Mn (F) for a scalar a ∈ F, we will mean an n ×n matrix over F having
all the diagonal entries equal to a, all the subdiagonal entries equal to 1 and having zeros elsewhere.
Observe that, unless a = 0, the matrix Jn (a) is not nilpotent.
To understand better the ideas presented just now, we write down the matrix Jn (0) and its powers
in the case of n = 4:
J_4(0) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad
J_4^2(0) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \quad
J_4^3(0) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \quad
J_4^4(0) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.

This example clearly suggests the way nilpotent matrices of different indices can be formed.
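The powers displayed above are easy to check by machine. The following Python sketch, assuming the NumPy library, builds J4 (0) and confirms that its index of nilpotency is 4.

    import numpy as np

    J = np.eye(4, k=-1)                    # ones on the subdiagonal: the matrix J_4(0)

    for k in range(1, 5):
        power = np.linalg.matrix_power(J, k)
        print(k, not power.any())          # prints False for k = 1, 2, 3 and True for k = 4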
The following result shows how difficult questions about matrices may be settled with ease by
examining the corresponding linear maps.

Lemma 4.5.10. Let A, B ∈ Mn (F), and In the identity matrix in Mn (F). The matrix In − AB is invert-
ible if and only if In − BA is invertible.

Proof. Consider the n-dimensional vector space Fn over F, and fix any basis, say the standard basis,
of Fn . Once the basis is fixed, we have an isomorphism between Mn (F) and End(Fn ). The point to note
is that invertible matrices in Mn (F) correspond to invertible operators in End(Fn) and vice-versa under
this isomorphism.
Let T and S be the linear operators on Fn corresponding to the matrices A and B. If I denotes the
identity map on Fn , it follows that I − T S and I − S T will correspond to matrices I − AB and I − BA,
respectively. Assume that I − AB is invertible, but I − BA is not. This assumption implies that the
operator I − T S is invertible, but I − S T is not. Therefore, ker(I − T S ) is the zero subspace, whereas

there is some non-zero vector v ∈ Fn such that (I − S T )v = 0. It follows that S (T v) = Iv = v. Put w = T v,


so we have S w = v. Observe that this relation implies that w ≠ 0, for otherwise v = 0. Applying T to
both sides of the relation we finally obtain T (S w) = T v = w which shows that w ∈ ker(I − T S ). This
contradicts our assumption that I − T S is invertible as w is non-zero. This completes the proof. !
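Although the lemma is a statement about matrices over an arbitrary field, a numerical spot-check over R may still be instructive. The following Python sketch, assuming the NumPy library and using randomly generated real matrices, verifies that In − AB and In − BA are invertible (that is, of full rank) together.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    I = np.eye(n)

    inv_AB = np.linalg.matrix_rank(I - A @ B) == n
    inv_BA = np.linalg.matrix_rank(I - B @ A) == n
    print(inv_AB, inv_BA)                  # the two truth values agree, as the lemma predicts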
We have already noted that different choices of bases produce different matrices of the same linear
map. We now discuss the precise relation between two different matrix representations of a single
linear map with respect to different choices of bases. For simplicity, we take up the special case of a
linear operator on a vector space. This is the case one usually encounters in practice. For the general
case of a linear map between two vector spaces, see the result at the end of this section.
So let V be a finite-dimensional vector space over a field F, and T is a linear operator on V. We
choose two bases B and B' of V. Let A and B be the matrices of T with respect to bases B and B' ,
respectively. Now, by Equation (4.9), the vector equation T v = w can be written in terms of A and B
as follows:

Ax = y and Bx' = y'

where x, x' are the coordinate vectors of v, and y, y' of w with respect to bases B and B' , respectively.
Let P be the change of basis matrix from the basis B' to B. Then, as we have shown in the discussion
about change of coordinates (see Theorem 3.4.14 in the last chapter)

Px' = x and Py' = y.

Therefore,

(P−1 AP)x' = (P−1 A)x = P−1 y = y' .

We can compare this with Bx' = y' . Since x' can be chosen arbitrarily, this comparison yields the
equality

P−1 AP = B.

Thus we have proved the following important result.

Proposition 4.5.11. If A and B are matrices of a linear operator on a finite-dimensional vector


space V with respect to bases B and B' , respectively, then

B = P−1 AP,

where P is the change of basis matrix from B' to B.

This proposition gives rise to the important concept of similar matrices.

Definition 4.5.12. Given two matrices A, B ∈ Mn (F), we say B is similar (or conjugate) to A over
F, if there is an invertible matrix P ∈ Mn (F) such that

B = P−1 AP.

We leave it to the reader to verify that similarity is an equivalence relation in the set Mn (F). The
similarity relation partitions Mn (F) into equivalence classes, called similarity classes of similar matrices. Note that similarity depends on the base field. For example, two matrices of order n over R may
not be similar, even though considered as matrices in Mn (C), they may be similar.
The discussion preceding Definition (4.5.12) proves the following proposition.

Proposition 4.5.13. Let V be an n-dimensional vector space over a field F, and let T be a linear
operator on V. If A and B are matrices in Mn (F) representing T with respect to two different bases of
V, then A and B are similar in Mn (F).

Going the other way, we may ask the following question: Suppose, A and B are similar matrices in
Mn (F). Given any n-dimensional vector space V over F, is it possible to find a linear operator T on V,
and a pair of bases of V such that A and B are matrices of T with respect to these bases? We claim
that it is indeed possible, and we now sketch a proof of our claim. Consider similar matrices A = [ai j ]
and B = [bi j ] in Mn (F) as well as an invertible matrix P = [pi j ] such that B = P−1 AP. Choose any basis
B = {v1 , v2 , . . . , vn } of V, and let T be the unique linear operator on V determined by the matrix A
with respect to this basis. Thus,
$$T v_j = \sum_{i=1}^{n} a_{ij}\, v_i \qquad \text{for each } j = 1, 2, \ldots, n.$$

We seek the other basis of V with the help of the invertible matrix P. Let S : V → V be the unique
linear operator determined this time by P the usual way:
$$S v_j = \sum_{i=1}^{n} p_{ij}\, v_i \qquad \text{for each } j = 1, 2, \ldots, n.$$

Let S v j = u j . Since P is invertible, S is an isomorphism, and consequently the vectors


{u1 , u2 , . . . , un }, being the images of basis vectors under S , will form a basis of V. We leave it to
the reader to show that B is precisely the matrix of T with respect to this new basis of V, which
establishes our claim.
We record our claim as the following proposition.

Proposition 4.5.14. Given two similar matrices in Mn (F) and any n-dimensional vector space V
over F, there is a linear operator T on V and a pair of bases of V such that the given matrices are the
matrix representations of T with respect to these bases.

It is sometimes useful to consider similar linear operators, which can be defined analogously to
similar matrices.

Definition 4.5.15. Two linear operators T, S ∈ EndF (V) are said to be similar, if there is an invert-
ible linear operator R ∈ EndF (V) such that S = R−1 T R.

It should be clear that the matrices representing two similar operators on a finite-dimensional vector
space with respect to any basis must be similar.
As in Mn (F), similarity is an equivalence relation in EndF (V).
For a simple criterion for the similarity of two linear operators, see Exercise 19 of this section.
The relation between the matrices of a linear map between two vector spaces with respect to two
pairs of bases is described in the following result. The proof, which is similar to the proof of Proposi-
tion (4.5.13), is also left to the reader.
Proposition 4.5.16. Let V and W be vector spaces over a field F, having dimensions n and m,
respectively. Let T ∈ HomF (V, W), and let A ∈ Mm×n (F) be the matrix of T with respect to bases B and
C of V and W, respectively. Given another pair of bases B' and C' of V and W, respectively, if A' is
the matrix of T with respect to the bases B' and C' , then there exist invertible matrices P ∈ Mn (F) and
Q ∈ Mm (F) such that

A' = Q−1 AP.

In fact, P and Q are the matrices of change of bases from B' to B and C' to C, respectively.

We conclude this section by presenting an example of two matrices which can be shown to be sim-
ilar by actually representing them as matrices of a single linear operator with respect to two different
bases.

EXAMPLE 30 Consider matrices


   
0 0 0 1 1 1
   
A = 1 0 1 and B = 1 0 −1
   
0 1 1 0 0 0

in M3 (R). Our aim is to find an operator on the three-dimensional vector space R3


such that the given matrices are its representations with respect to some suitable
bases. Usually, we take one of the bases to be the standard basis e1 , e2 , e3 , and the
operator T to be the unique one determined by one of the matrices, say A, with
respect to this basis. Thus, we have

T e1 = e2 , T e2 = e3 and T e3 = e2 + e3 .

Next, we try to define another basis, whose vectors are suitable linear combinations
of the vectors of the first basis, such that the matrix of T with respect to this new
basis is B. The entries of B suggest that we try the vectors given by

u1 = e2 + e3
u2 = e3
u3 = e1 .

We leave to the reader the verification that the matrix of T is indeed, B with respect
to the basis formed by u1 , u2 and u3 . This then confirms that A and B are similar.
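The verification left to the reader can also be carried out by matrix arithmetic: if P is the matrix whose columns are u1 , u2 , u3 written in the standard basis, then by Proposition (4.5.11) the matrix of T relative to the new basis is P−1 AP, which should equal B. A short NumPy check of this (our own illustration, not part of the text):

```python
import numpy as np

A = np.array([[0, 0, 0],
              [1, 0, 1],
              [0, 1, 1]])
B = np.array([[1, 1, 1],
              [1, 0, -1],
              [0, 0, 0]])
# Columns of P: u1 = e2 + e3, u2 = e3, u3 = e1 in standard coordinates
P = np.array([[0, 0, 1],
              [1, 0, 0],
              [1, 1, 0]])
print(np.allclose(np.linalg.inv(P) @ A @ P, B))   # True
```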

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. All given
vector spaces are finite-dimensional and over an arbitrary field.
(a) Any m × n matrix over a field F determines a unique linear map from Fm to Fn with respect
to their standard bases.
(b) If A and B are, respectively, the matrices of linear maps T and S of a vector space V to
another space W (both over the same field) with respect to some fixed bases of V and W,
then A + B is the matrix of T + S with respect to the same bases.
(c) HomF (V, W) ≅ HomF (W, V) for any vector spaces V and W over the field F.
(d) If T is an invertible linear operator on a vector space V, then its matrix with respect to any
basis of V is an invertible matrix.
(e) If a matrix in Mn (F) determines two linear operators T and S on Fn with respect to two bases,
then T = S .
(f) If an invertible matrix A is similar to another matrix B in Mn (F), then B is invertible.
(g) If a matrix A is similar to B in Mn (F), then A2 is similar to B2 in Mn (F).
(h) For any positive integer n > 1, there are similar matrices in Mn (F) having different ranks.
(i) If A is the matrix of an invertible linear operator on Fn with respect to any fixed basis, then
the columns of A form a basis of Fn .
(j) There is some basis of the real vector space C of complex numbers with respect to which the
matrix of the operator T given by T (z) = z is singular.
(k) The nullity of the elementary Jordan block Jn (0) for n ≥ 1 is 1.
(l) The nullity of any non-zero nilpotent matrix in Mn (F) is 1.
2. Let T be the linear transformation from R3 → R2 given by

T (x1 , x2 , x3 ) = (x1 − 2x2, x2 − 2x3).

Find the matrix of T with respect to the standard bases of R3 and R2 . What will be the matrix of
T if the basis of R2 is changed to {v1 , v2 }, where v1 = (1, −2) and v2 = (0, 1)?
3. Compute the matrix of the linear operator T on R4 given by

T (x1 , x2 , x3 , x4 ) = (x1 , 2x1 + x2 , 3x1 + 2x2 + x3 , 4x1 + 3x2 + 2x3 + x4 )

with respect to the standard basis of R4 . Determine the invertibility of T by considering the
matrix of T .
4. Let T be the linear map from R2 [x] to R3 [x] given by

T (p(x)) = (x + 1)p(x) for any p(x) ∈ R2 [x].

Find the matrix of T with respect to the bases {1, x, x2 } and {1, x, x2 , x3 } of R2 [x] and R3 [x],
respectively.
5. Consider bases B = {1, x, x2 } and B' = {1, 1 + x, 1 + x2 } of R2 [x].
(a) Find the matrix of the translation operator T on R2 [x] given by T (p(x)) = p(x + 1) for any
p(x) ∈ R2 [x] with respect to the bases B and B' .
(b) Let B'' = {1} be a basis of the real vector space R. Find the matrix of the linear map
S : R2 [x] → R given by S (p(x)) = p(0) with respect to the bases B and B'' .
6. Let T be the linear operator on M2 (R) given by T (A) = At for any A ∈ M2 (R). Find the matrix
of T with respect to the basis of M2 (R) consisting of the unit matrices e11 , e12 , e21 and e22 , and
deduce that T is an invertible operator.
7. Compute the ranks and the nullities of the following linear operators by considering their matri-
ces with respect to the standard basis:
(a) T on R3 ; T (x1 , x2 , x3 ) = (x1 − x2 + 2x3 , 3x2 − x3 , 3x1 + 5x3 ).
(b) T on R4 ; T (x1 , x2 , x3 , x4 ) = (x2 + x3 − x4 , 2x1 − x3 + x4 , x1 + x2 − 2x4, x1 − 2x2 − 3x3 ).
8. Prove Proposition (4.5.4).


9. Complete the proof of Theorem (4.5.6).
The following establishes the dimension of HomF (V, W) directly.
10. Let {v1 , v2 , . . . , vn } and {w1 , w2 , . . . , wm } be two bases of vector spaces V and W over a field
F respectively. For 1 ≤ i ≤ m and 1 ≤ j ≤ n, let fi j be the linear map from V to W given by
fi j (vk ) = δ jk wi for k = 1, 2, . . . , n,
where δ jk is the Kronecker delta.
(a) Show that { fi j } are mn linearly independent elements of HomF (V, W).
(b) Show that { fi j } span HomF (V, W).
(Hint: For (b), given any f ∈ HomF (V, W), first write f (v j ) as a linear combination of
w1 , w2 , . . . , wm .)
11. Complete the proof of Proposition (4.5.14).
12. Prove Proposition (4.5.16).
13. Show that the following matrices are similar over C:
$$
A = \begin{pmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{pmatrix}
\quad \text{and} \quad
B = \begin{pmatrix} e^{i\theta} & 0\\ 0 & e^{-i\theta} \end{pmatrix}.
$$
14. Let V be an n-dimensional vector space over a field F where n ≥ 2, and let T be a nilpotent
operator of index of nilpotency n. Exhibit a basis of V with respect to which the matrix of T is
precisely Jn (0), the elementary Jordan block of order n over F.
15. Let F be a field, and let A and B be nilpotent matrices in Mn (F) such that both have index of
nilpotency n. Use the preceding exercise to show that A and B are similar over F.
16. Let
   
1 1 1 3 0 0
   
A = 1 1 1 and B = 0 0 0
   
1 1 1 0 0 0
be matrices in M3 (R). Prove that they are similar over R by showing that if T is the linear
operator on R3 represented by A with respect to the standard basis of R3 , then there is a basis of
R3 relative to which the matrix of T is B.
17. Prove that over any field F, the elementary Jordan block Jn (0) is similar to its transpose Jn (0)t
for any positive integer n.
18. Let A and B be similar matrices in Mn (F). Determine whether A and B have
(a) The same rank,
(b) The same nullity,
(c) The same trace,
(d) The same determinant.
19. Let T and S be linear operators on a finite-dimensional vector space V. If the matrix of T with
respect to some basis of V is the same as the matrix of S with respect to another basis of V, then
show that T and S are similar.
[Hint: If {v1 , v2 , . . . , vn } and {u1 , u2 , . . . , un } are the bases, consider the linear operator R on
V defined by Rv j = u j .]

5 Linear Operators

5.1 INTRODUCTION
The advantage of representing a linear operator on a finite-dimensional vector space by a matrix lies in
the freedom to choose suitable bases of the vector space. An appropriate basis will result in a relatively
simple matrix of the linear operator which will enable us to understand the operator better. Ideally, one
would like such a matrix to be as simple as a diagonal one, such as:
 
$$
\mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n] =
\begin{pmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 & 0\\
0 & \lambda_2 & 0 & \cdots & 0 & 0\\
0 & 0 & \lambda_3 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & & \vdots & \vdots\\
0 & 0 & 0 & \cdots & \lambda_{n-1} & 0\\
0 & 0 & 0 & \cdots & 0 & \lambda_n
\end{pmatrix}.
$$

If a linear operator T on an n-dimensional vector space V can be represented by such a diagonal


matrix, then just by counting the number of non-zero entries along the diagonal, one would know the
rank as well as the nullity of T ; in fact, determining the bases for the image and the kernel of T will
be equally easy. Also note that if D is a diagonal matrix, then solving the system of equations Dx = 0
or the system Dx = b is trivial.
Now, for T to be represented by a diagonal matrix like the preceding one, there must be a basis
v1 , v2 , . . . , vn of V, such that

$$T v_j = \lambda_j v_j \qquad \text{for } j = 1, 2, \ldots, n.$$

Non-zero vectors, such as v j , which T changes into scalar multiples of themselves are crucial in
understanding T , and so are given a special name: they are eigenvectors of T corresponding to the
eigenvalue λ j .
Thus, the ideal situation will be the one in which T has enough eigenvectors to form a basis of V; we
will then say that T is diagonalizable. So resolving the diagonalization problem for a linear operator
depends on finding its eigenvectors. However, it turns out that it is far easier to find the eigenvalues
as there is a systematic procedure for determining them. Once eigenvalues are found, simple matrix
equations lead to the corresponding eigenvectors.

The ideas of eigenvalues and eigenvectors of a linear operator are intimately related to certain poly-
nomials determined by the operator. A study of these polynomials helps us not only in developing
alternate ways of looking at diagonalizable operators but also in analysing non-diagonalizable opera-
tors later.
This chapter thus explores several key concepts of linear algebra. However, the focus will be on
diagonalizable operators throughout. We begin though with a brief discussion of polynomials with
coefficients from a field, such as the field of real numbers or complex numbers, as polynomials will
play a crucial role in the material that follows.

5.2 POLYNOMIALS OVER FIELDS


This section is a brief review of the nomenclature, notation and results about polynomials that we will
be needing in this as well as in later chapters. Most of the results are without proofs. Readers looking
for more comprehensive treatment of the material should look up standard textbooks in algebra such
as Topics in Algebra [3] by I. N. Herstein.
A polynomial f (x) with coefficients from a field F is an expression of the form

f (x) = a0 + a1 x + · · · + an xn ,

where n is a non-negative integer, and the field elements a j are the coefficients of f (x). By a non-
zero polynomial, we mean a polynomial having at least one non-zero coefficient. The degree of a
non-zero polynomial f (x) is the largest exponent of x with corresponding coefficient non-zero; the
zero polynomial is assigned the degree −1 as a convention. We denote the zero polynomial by 0;
it is sometimes convenient to think of the zero polynomial as one of indeterminate degree, having
zeros for all of its coefficients. Polynomials of degree zero are called constants or scalars. The leading
coefficient of a non-zero f (x) of degree n is the coefficient an ; a monic polynomial is a non-zero
polynomial with leading coefficient 1.
The set of all polynomials with coefficients from a field F is denoted by F[x]. Thus, R[x] is the set
of all real polynomials which we have already treated as an example of a vector space over R.
Two polynomials in F[x] are equal if they have the same degree and their corresponding coefficients
are equal.
Polynomials f (x) and g(x) in F[x] of degree m and n, respectively, can be added naturally by adding
the corresponding coefficients to produce the sum polynomial f (x) + g(x); any coefficient missing in
one of the polynomials corresponding to a non-zero coefficient of the other polynomial is assumed to
be zero for this purpose. So the degree of the sum f (x) + g(x) is at most max(m, n); it can drop below max(m, n) only when m = n and the leading coefficients cancel.
Scalar multiplication of a polynomial in F[x] by an element of F is straightforward: if f (x) =
a0 + a1 x + · · · + am xm , then for any c ∈ F, the scalar multiple c f (x) is the polynomial obtained by
multiplying each coefficient of f (x) by c; for non-zero c, it has the same degree as f (x). So,

c f (x) = ca0 + ca1 x + · · · + cam xm .

EXAMPLE 1 If f (x) = 1 + 2x and g(x) = −3x + 4x2 + x3 are two real polynomials, then

f (x) + g(x) = 1 − x + 4x2 + x3

is a polynomial of degree 3 in R[x]. Similarly, for f (x) = 2 + ix + (1 + i)x2 and g(x) =


−i+ x2 +3x4 in C[x], the sum f (x)+g(x), a polynomial of degree 4 in C[x], is given by

f (x) + g(x) = (2 − i) + ix + (2 + i)x2 + 3x4 .

EXAMPLE 2 The scalar multiple of f (x) = 2 − 4x + 6x3 in R[x] by c = 1/2 is the polynomial
1/2 f (x) = 1 − 2x + 3x3. For f (x) = 1 − ix + (1 + i)x2 in C[x], the scalar multiple i f (x)
is i + x + (−1 + i)x2.
EXAMPLE 3 Given polynomials f (x) = a0 + a1 x + · · · + am xm and g(x) = b0 + b1 x + · · · + bn xn with
m ≥ n, the linear combination c f (x) + dg(x) equals the zero polynomial for some
scalars c and d if and only if the following equalities hold in F:

ca j + db j = 0 for j = 0, 1, . . . , n and ca j = 0 for n < j ≤ m.

It follows that given the polynomials f1 (x) = 1, f2 (x) = x, f3 (x) = x2 , . . . , fn+1 (x) =
xn , the linear combination c1 f1 (x)+c2 f2 (x)+· · ·+cn+1 fn+1 (x) = 0 if and only if ci = 0
for all i.
Since the addition and scalar multiplication of polynomials in F[x] are basically in terms of the
corresponding operations in the field F, it is a routine matter to verify that the set of polynomials F[x]
is not only an additive group with respect to addition of polynomials but also a vector space over F.

Proposition 5.2.1. The set F[x] of polynomials with coefficients from a field F is an additive group
with respect to addition of polynomials with the zero polynomial acting as the additive identity. If
scalar multiplication of polynomials by scalars from F is also taken into account, then F[x] becomes a
vector space over F. F[x] is an infinite-dimensional vector space. The subset Fn [x] of all polynomials
over F of degree at most n is an (n + 1)-dimensional subspace of F[x] with {1, x, x2 , . . . , xn } as its
standard basis.

Now, we want to focus on the multiplicative structure of F[x]. Unlike addition or scalar multiplica-
tion, which are defined component-wise, multiplication of polynomials is performed in the following
manner. If f (x) = a0 + a1 x + · · · + am xm and g(x) = b0 + b1 x + · · · + bn xn are two polynomials in F[x],
then their product f (x)g(x) is defined to be the polynomial c0 + c1 x + · · · + cm+n xm+n , where
$$c_k = \sum_{i+j=k} a_i b_j \qquad \text{for all } k = 0, 1, \ldots, m+n,$$

the sum being taken over all 0 ≤ i ≤ m and 0 ≤ j ≤ n. Note that f (x)g(x) = g(x) f (x). It is also clear that
if the degrees of f (x) and g(x) are m and n, respectively, then f (x)g(x) is of degree m + n.
It is again a routine verification that the additive group F[x] with this multiplication is a commuta-
tive ring with the constant polynomial 1 as the identity of the ring. In fact, if scalar multiplication is
taken into account, then F[x], like the matrix algebra Mn (F), turns out to be an F-algebra with identity.

Divisibility Properties of Polynomials over a Field


However, it is the divisibility properties of F[x], akin to those of the ring Z of integers, that will play a
crucial role for us. As usual, given polynomials f (x), g(x) ∈ F[x] with g(x) non-zero, we say that g(x)
divides f (x), if there is some polynomial h(x) ∈ F[x] such that f (x) = g(x)h(x). In that case, g(x) is a
divisor of f (x), or f (x) is a multiple of g(x). All the familiar properties of the division process in the
integers carry over to F[x] as detailed in the following proposition.

Proposition 5.2.2. The following hold for polynomials in F[x].


(a) f (x) divides itself.
(b) If f (x) divides g(x), and g(x) divides k(x), then f (x) divides k(x).
(c) If f (x) divides g(x) and h(x), then f (x) divides g(x) + h(x).
(d) If f (x) divides g(x), then f (x) divides any multiple g(x)h(x).
(e) Every non-zero constant divides any polynomial in F[x].
(f) The non-zero constants are the only invertible elements in F[x].

As in Z, F[x] has a division algorithm which essentially says that if a non-zero polynomial g(x) is
not a divisor of f (x), then we can employ division by g(x) to obtain a remainder of degree less than
that of g(x).

Proposition 5.2.3. Given polynomials f (x), g(x) ∈ F[x] such that g(x) is non-zero, there are poly-
nomials q(x), r(x) such that

f (x) = g(x)q(x) + r(x),

where either r(x) = 0 or deg r(x) < deg g(x).
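For concrete polynomials, the quotient q(x) and the remainder r(x) can be produced symbolically; the sketch below uses SymPy's div on polynomials of our own choosing (over the rationals), purely as an illustration of the division algorithm.

```python
from sympy import symbols, div, expand

x = symbols('x')
f = x**4 + 2*x**2 + x + 1
g = x**2 + 1
q, r = div(f, g, x)            # f = g*q + r with deg r < deg g
print(q, r)                    # x**2 + 1  and  x
print(expand(g*q + r - f))     # 0, confirming the identity
```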

Recall now that an ideal I of a commutative ring R is an additive subgroup of R which is closed
with respect to multiplication by elements of R. For example, the set mZ of all multiples of an integer m is an
ideal of the ring Z. The division algorithm in F[x] implies that any polynomial in a non-zero ideal of
the ring F[x] is a multiple of a fixed polynomial called a generator of that ideal; in fact, the generator
can be chosen to be a monic polynomial.

Proposition 5.2.4. Every ideal of F[x] has a generator. In case of a non-zero ideal I of F[x], there
is a monic polynomial which generates I.

Thus, for any non-zero ideal I of F[x], we can find a monic polynomial d(x) such that every poly-
nomial in I can be expressed as a product d(x)q(x) for some polynomial q(x); so,

I = {d(x)q(x) | q(x) ∈ F[x]}.

Also, by an argument using degrees of polynomials, one can easily see that the product of two non-zero
polynomials in F[x] is non-zero. In other words, F[x] is an integral domain.
As every ideal of this integral domain F[x] is generated by a single element, it is called a principal
ideal domain, or a PID. PIDs possess important divisibility properties such as existence of greatest
common divisors and unique factorizations. We briefly consider the implications of these concepts for
F[x] now.
We begin by noting that given polynomials f1 (x), f2 (x), . . . , fn (x) in F[x], not all of which are zero
polynomials, the collection I of all possible linear combinations:

f1 (x)h1 (x) + f2 (x)h2(x) + · · · + fn (x)hn(x),


where h1 (x), h2 (x), . . . , hn (x) are arbitrary polynomials in F[x], is an ideal of F[x]; it is called the
ideal generated by the polynomials f1 (x), f2 (x), . . . , fn (x). It is clear that each fi (x) ∈ I. Since F[x] is
a PID, it follows that I is generated by a single monic polynomial, say d(x), in I. In other words, every
polynomial in I is a multiple of d(x). One can therefore draw the following conclusions
• d(x) = f1 (x)q1 (x) + f2 (x)q2 (x) + · · · + fn (x)qn (x) for some polynomials q1 (x), q2 (x), . . . , qn (x) in
F[x].
• d(x) divides every fi (x).
• If f (x) divides every fi (x), that is, if f (x) is a common divisor of the fi (x), then f (x) must divide
d(x).
We have just shown that the monic polynomial d(x) is the greatest common divisor of
f1 (x), f2 (x), . . . , fn (x). It is commonplace to use the term gcd for a greatest common divisor.
It is clear that any non-zero scalar multiple of d(x) will have the same divisibility properties with
respect to polynomials f1 (x), f2 (x), . . . , fn (x); however, d(x) is the unique monic polynomial with
these properties. For future reference, we record our observations in the following.

Proposition 5.2.5. Let f1 (x), f2 (x), . . . , fn (x) be polynomials in F[x], not all of which are the zero
polynomials. Then, their greatest common divisor exists; it is the unique monic polynomial which also
generates the ideal generated by the polynomials f1 (x), f2 (x), . . . , fn (x). Thus,

d(x) = f1 (x)q1 (x) + f2(x)q2 (x) + · · · + fn (x)qn (x)

for some polynomials q1 (x), q2 (x), . . . , qn (x) in F[x].

An important case occurs when non-zero polynomials f1 (x), f2 (x), . . . , fn (x) have no common
divisors other than the constants. In that case we say that the polynomials are relatively prime. It is
clear that the gcd of relatively prime polynomials is the constant 1.

Corollary 5.2.6. If non-zero polynomials f1 (x), f2 (x), . . . , fn (x) in F[x] are relatively prime, then
there exist polynomials q1 (x), q2 (x), . . . , qn (x) in F[x] such that

f1 (x)q1 (x) + f2(x)q2 (x) + · · · + fn (x)qn (x) = 1.
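For two polynomials, both the monic gcd and the polynomials qi (x) in such an identity can be computed symbolically; SymPy's gcdex returns s, t with s f + t g = gcd( f, g). The polynomials below are our own illustrative choices.

```python
from sympy import symbols, gcd, gcdex, simplify

x = symbols('x')
# A common factor: the monic gcd is x - 1
print(gcd((x - 1)*(x**2 + 1), (x - 1)*(x + 2)))   # x - 1

# Relatively prime polynomials: a relation s*f + t*g = 1
f, g = x**2 + 1, x + 2
s, t, h = gcdex(f, g, x)
print(h, simplify(s*f + t*g))                      # 1  and  1
```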

Irreducible Polynomials
Finite sets of what are known as irreducible polynomials over F provide us with examples of rela-
tively prime polynomials. An irreducible polynomial over F is a non-zero, non-constant polynomial
in F[x] whose only divisors are the non-zero constants or the scalar multiples of itself. So except for
non-zero constants, an irreducible polynomial can have no divisor of degree less than its own degree.
A polynomial which is not irreducible is a reducible polynomial.

EXAMPLE 4 Any linear polynomial ax + b where a, b ∈ F with a ≠ 0 is an irreducible polynomial


over F.

EXAMPLE 5 The polynomial x2 + 1 is reducible over C as it can be factored as (x + i)(x − i) over


C which shows that it has divisors of degree 1.
Quite often, one uses the factor theorem to identify divisors of degree 1 of given polynomials. Before
stating this result, we note that a polynomial f (x) over a field F can be considered as a mapping from
F into F. For this interpretation, the value of the polynomial f (x) = a0 + a1 x + · · · + am xm at a scalar
c ∈ F is defined naturally as

f (c) = a0 + a1 c + · · · + am cm ,

obtained by substituting the indeterminate x by c in the expression for f (x). Note that one can draw
the graph of a real polynomial as it can be considered as a map from R to R.
We say that a scalar c ∈ F is a root of f (x) in F if f (c) = 0. Thus, c is a root of f (x) if c is a
solution of the functional equation f (x) = 0.
Now, we can state the factor theorem which is an easy consequence of the division algorithm.

Corollary 5.2.7. For a polynomial f (x) over F, a scalar c ∈ F is a root of f (x) in F if and only if
(x − c) is a divisor of f (x) in F[x].

It follows that if f (x) has a root in F and the degree of f (x) > 1, then f (x) is reducible over F.
We also say that c ∈ F is a root of f (x) of multiplicity r, if (x − c)r divides f (x) but (x − c)r+1 does
not.
By induction, we can also establish that a polynomial of degree n over a field F can have at most n
roots in F even if the roots are counted according to their multiplicities.

EXAMPLE 6 The polynomial x2 + 1 is irreducible over R. If not, it will have a root, say c ∈ R, by
the factor theorem. But that is absurd as, for a real c, c2 ≠ −1.
Similarly, x2 + x + 1 is irreducible over R, as by the quadratic formula, the two
roots of x2 + x + 1 are non real complex numbers.

EXAMPLE 7 Any polynomial f (x) of odd degree over R must have a root in R, so such a polynomial
of degree > 1 cannot be irreducible over R. The existence of a real root of such a
polynomial can be verified by considering the graph of y = f (x).
The case for polynomials over C is rather simple because of the celebrated Fundamental Theorem
of Algebra.

Theorem 5.2.8. Every non-constant polynomial over C has a root in C.

This implies the following characterization of irreducible polynomials over C.

Corollary 5.2.9. The only irreducible polynomials over C are the linear ones.

In general, a field F is said to be algebraically closed if every non-constant polynomial over F has
a root in F, or equivalently, the irreducible polynomials over F are precisely the linear polynomials.
A basic result (which we quote without proof) states that any field can be considered a subfield of an
algebraically closed field. For example, the field R of real numbers is a subfield of the algebraically
closed field C.
To put the preceding theorem in perspective, we now state the Unique Factorization Theorem for
the PID F[x] which can be proved exactly the way it is proved that any positive integer greater than 1
can be expressed as a product of primes.

Theorem 5.2.10. Let F be a field. Any non-constant (monic) polynomial in F[x] can be factored
as a product of (monic) irreducible polynomials over F. Such a factorization is unique except for the
order of the factors.

Thus, given a non-constant polynomial f (x) over F, we can express it as a product

$$f(x) = p_1(x)^{r_1}\, p_2(x)^{r_2} \cdots p_t(x)^{r_t}$$

of distinct irreducible polynomials pi (x) over F. Note that the irreducible polynomials pi (x) are rela-
tively prime in pairs.
In particular, a non-constant monic polynomial f (x) over an algebraically closed field F such as C,
can be uniquely expressed as a product of linear factors:

$$f(x) = (x - a_1)^{r_1} (x - a_2)^{r_2} \cdots (x - a_t)^{r_t},$$

where ai ∈ F. Observe that each ai in F is a root of f (x) occurring ri times; ri is called the multiplicity
of the root ai . It follows that a polynomial of degree n over an algebraically closed field F has n roots
in F if the roots are counted according to their multiplicities.
Note the following implication: since a polynomial f (x) over R can be considered a polynomial
over C, even if f (x) cannot be expressed as a product of linear factors over R, it can be expressed so
over C. Therefore, a polynomial of degree n over R has precisely n roots in C if the roots are counted
according to their multiplicities.
We end the section by pointing out another interesting property of real polynomials. If a real poly-
nomial f (x) has a non-real complex root a (as f (x) can be considered a polynomial over C, too),
conjugating the relation f (a) = 0, we see that the conjugate ā must also be a root of f (x). It is then
clear that the real polynomial x2 − (a + ā)x + |a|2 is a divisor of f (x). One can thus conclude that, apart
from the linear ones, the only other irreducible polynomials over R must be of degree 2.
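These factorization patterns are easy to see on small examples. The SymPy sketch below (our own illustration) factors x3 − 1 over the rationals, where an irreducible quadratic remains, and factors x2 + 1 over the Gaussian field, where it splits into linear factors; the roots of x3 − 1 are the three cube roots of unity.

```python
from sympy import symbols, factor, roots

x = symbols('x')
print(factor(x**3 - 1))                  # (x - 1)*(x**2 + x + 1): quadratic factor irreducible over R
print(factor(x**2 + 1, gaussian=True))   # (x - I)*(x + I): linear factors once i is adjoined
print(roots(x**3 - 1))                   # the three cube roots of unity, each of multiplicity 1
```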

5.3 CHARACTERISTIC POLYNOMIALS AND EIGENVALUES


We begin by introducing eigenvalues and eigenvectors of a linear operator formally.

Definition 5.3.1. Let T be a linear operator on a vector space V over a field F. A scalar λ ∈ F is
an eigenvalue of T if there is a non-zero vector v in V such that T v = λv. Such a non-zero vector v ∈ V
is called an eigenvector of T belonging to the eigenvalue λ.

Note that v = 0 always satisfies T v = λv for any λ ∈ F. So the point of this definition is the existence
of a non-zero vector v satisfying the given condition.
One way of looking at an eigenvalue of T is to examine the set of vectors {v ∈ V | T v = λv}. It is
easy to verify that for any λ ∈ F, this set is actually a subspace of V. Thus λ is an eigenvalue of T if
and only if the subspace {v ∈ V | T v = λv} is non-zero. This non-zero subspace is called the eigenspace
of T corresponding to the eigenvalue λ.
Observe that any non-zero vector of the eigenspace corresponding to an eigenvalue is an eigenvector
belonging to the eigenvalue.
There is yet another useful interpretation of the idea of an eigenvalue which can be seen as follows:
T v = λv if and only if (T − λIV )v = 0. Therefore, a necessary and sufficient condition for λ to be an
eigenvalue of T is that the kernel of the linear operator (T − λIV ) is non-zero. But by Proposition
(4.5.4), the kernel of an operator is non-zero if and only if the operator is singular. Recall that singular
means ‘not invertible’.
The following proposition lists some equivalent conditions for a scalar to be an eigenvalue.

Proposition 5.3.2. Let T be a linear operator on a vector space V over a field F. For a scalar
λ ∈ F, the following are equivalent.

(a) λ is an eigenvalue of T .
(b) The kernel of (T − λIV ) ∈ End(V) is non-zero.
(c) The map (T − λIV ) ∈ End(V) is singular.

Let us look at some examples to understand the concepts introduced. Note that eigenvalues and
eigenvectors of a linear operator on the zero space are not defined; so whenever we discuss eigen-
values or eigenvectors of a linear operator, we will tacitly assume that the underlying vector space is
non-zero.

EXAMPLE 8 The scalar 0 is the only eigenvalue of z, the zero operator, on any vector space V.

EXAMPLE 9 For the identity map IV of any vector space V, the scalar 1 is the only eigenvalue
with every non-zero vector of V an eigenvector for the eigenvalue.

EXAMPLE 10 Consider the projection P1 on R2 . Since P1 (x, y) = (x, 0), 1 is an eigenvalue of P1


with any (x, 0) with x ≠ 0 as an eigenvector. Similarly, the scalar 0 is another eigen-
value of P1 with (0, y) as an eigenvector for any non-zero y.

EXAMPLE 11 In general, if P is an arbitrary projection on a vector space V with range W and kernel
K, then by the properties of such projections as given in Proposition (4.2.12), a vector
w lies in the image W if and only if Pw = w. Thus, any non-zero w in W is an eigenvector
of P belonging to the eigenvalue 1. Note that if W ≠ V, then 0 is another eigenvalue
of P with every non-zero vector in K being an eigenvector for this eigenvalue.
We will see shortly that 1 and 0 are the only eigenvalues of a projection.

EXAMPLE 12 Let Rθ be the linear operator on R2 which is the rotation of the plane counterclock-
wise through an angle θ. Assume that θ is not an integral multiple of π (that means
θ ≠ 0 too). Note that any scalar multiple of a non-zero vector (x1 , x2 ) in R2 must lie
on the straight line passing through the origin and the point (x1 , x2 ). Since Rθ moves
any non-zero point along a circular arc through an angle θ, it follows that no non-zero
vector can possibly be an eigenvector. So, Rθ has no eigenvalue in R.
EXAMPLE 13 Let λ ∈ F be an eigenvalue of some linear operator T on a vector space V over a field
F. Let v be a non-zero vector in V such that T v = λv. But then,

T 2 v = T (λv) = λT v = λ2 v,

showing that the map T 2 has λ2 as an eigenvalue. An easy induction shows that T k
has eigenvalue λk for any positive integer k.
For any polynomial f (x) = a0 + a1 x + a2 x2 + · · · + am xm in F[x], let the symbol
f (T ) denote the operator a0 IV + a1 T + a2 T 2 + · · · + am T m on V. The preceding dis-
cussion shows that the operator f (T ) in End(V) has the scalar f (λ) ∈ F as an eigen-
value.

EXAMPLE 14 Let T be a nilpotent operator on a vector space V over a field F. If the index of
nilpotency of T is r, then T r = z but T k ≠ z for any positive integer k < r. Here,
z is the zero operator on V. Now, if λ ∈ F is an eigenvalue of T , then as in the last
example, λr is an eigenvalue of T r . Since T r is the zero map on V, it follows that
λr = 0 which implies that λ = 0. On the other hand, as T r−1 ≠ z, there is some non-
zero v ∈ V such that T r−1 v ≠ 0. If we set w = T r−1 v, then it is a non-zero vector such
that T w = T (T r−1 v) = 0 as T r is the zero map on V. Thus, 0 ∈ F is the only eigenvalue
of the nilpotent operator T with any non-zero T r−1 v as an eigenvector.
If we take the more specific case of the differential map D on Rn [x], a nilpotent
operator of index (n + 1), we can easily see that no non-zero real number can be an
eigenvalue of D. For, D applied to any non-zero polynomial lowers its degree by 1
whereas the degree of a scalar multiple of a polynomial remains the same.

Method for Finding Eigenvalues


Observe that in all these examples, we had to utilize specific properties of individual linear operators
to obtain some information about their eigenvalues. We now discuss a procedure (which was hinted at
in the introduction to this chapter) which enables us, at least in principle, to determine the eigenvalues
of any linear operator on a finite-dimensional vector space.
Let T be a linear operator on an n-dimensional vector space V over a field F. Assume that n ≠ 0.
Fix any basis of V and let A ∈ Mn (F) be the matrix representation of T with respect to the chosen
basis. Observe that for any scalar λ in F, the matrix (A − λIn ) represents the linear operator (T − λIV )
with respect to the same basis. Now, according to the alternative characterization given in Proposition
(5.3.2), λ ∈ F is an eigenvalue of T if and only if the map (T − λIV ) is singular. However, invertibility
is preserved by the isomorphism between EndF (V) and Mn (F) (see Corollary 4.5.7). It follows that λ
is an eigenvalue of T if and only if the matrix (A − λIn) is singular, which is equivalent to the condition
that det(λIn − A) = 0. We thus have the following matrix analogue of Proposition (5.3.2).

Proposition 5.3.3. Let T be a linear operator on an n-dimensional vector space V, and let A ∈
Mn (F) be the matrix of T with respect to any arbitrary basis of V. Then, λ ∈ F is an eigenvalue of T if
and only if det(λIn − A) = 0.
Observe that the equality det(λIn − A) = 0 is equivalent to the statement that the function det(xIn − A)
of x vanishes at x = λ. This equivalence turns out to be useful once it is realized that det(xIn − A) is
actually a monic polynomial in x over F of degree n.
For example, if n = 3 and A = [ai j ], then the matrix xI3 − A looks like
 
$$
xI_3 - A = \begin{pmatrix}
x - a_{11} & -a_{12} & -a_{13}\\
-a_{21} & x - a_{22} & -a_{23}\\
-a_{31} & -a_{32} & x - a_{33}
\end{pmatrix}
$$

so that by expanding the corresponding determinant, we see that det(xI3 − A) is the polynomial
x³ − x²(a11 + a22 + a33 ) + x s1 − det A, where s1 is the sum of the principal 2 × 2 minors of A (a certain
sum of products of the ai j ’s taken two at a time). Note that the constant term of the polynomial is − det A.
Thus, the constant term as well as the other coefficients of the polynomial are certain sums of products
of the entries of A, hence are scalars in F.
It is not difficult to see, in general too, that if A is a matrix over a field F of order n, then det(xIn − A)
is a monic polynomial of degree n with coefficients from F.
Summarizing, we see that λ ∈ F is an eigenvalue of T if and only if λ is root of the monic polynomial
det(xIn − A) for any matrix representation A of T .
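For a specific matrix, det(xIn − A) can be expanded symbolically. The sketch below, on a 3 × 3 matrix of our own choosing, computes det(xI3 − A) directly and compares it with SymPy's built-in charpoly; note that the coefficient of x² is −(a11 + a22 + a33 ) and the constant term is − det A, as in the discussion above.

```python
from sympy import Matrix, symbols, eye

x = symbols('x')
A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])
p = (x*eye(3) - A).det().expand()   # det(xI - A), computed directly
print(p)                            # x**3 - 7*x**2 + 16*x - 12
print(A.charpoly(x).as_expr())      # the same monic cubic
# Here trace(A) = 7 and det(A) = 12, matching the coefficients -7 and -12.
```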

Characteristic Polynomial
Definition 5.3.4. Let A ∈ Mn (F). The monic polynomial det(xIn − A) of degree n over F is called
the characteristic polynomial of A. We sometimes denote it by ch(A).

The usefulness of the idea of characteristic polynomial is largely due to the following fact.

Proposition 5.3.5. Similar matrices in Mn (F) have the same characteristic polynomial.

Proof. Recall that if A and B in Mn (F) are similar, then there is an invertible matrix P ∈ Mn (F) such
that B = P−1 AP. Now

det(xIn − B) = det(xIn − P−1 AP)


= det(P−1 xIn P − P−1 AP)
= det(P−1 (xIn − A)P)
= det P−1 (det(xIn − A)) det P
= det(xIn − A). !
Since any two matrices representing a linear operator relative to two bases are similar (see Propo-
sition 4.5.13), it follows that the last proposition enables us to define the characteristic polynomial of
a linear operator without any ambiguity.

Definition 5.3.6. Let T be a linear operator on a finite-dimensional vector space. The character-
istic polynomial of T is defined as the characteristic polynomial of any matrix representing T .

We can rephrase the conclusion of the earlier discussion about eigenvalues of T in terms of its
characteristic polynomial as follows.
Proposition 5.3.7. Let T be a linear operator on an n-dimensional vector space V over a field F.
Let A be the matrix of T with respect to any fixed but arbitrary basis of V. Then,
(a) λ ∈ F is an eigenvalue of T if and only if λ is a root in F of the characteristic polynomial of A.
(b) The eigenvalues of T are precisely the roots in F of its characteristic polynomial.

We can introduce eigenvalues and eigenvectors of any matrix A ∈ Mn (F) in an obvious manner: a
scalar λ ∈ F is an eigenvalue of A if we can find a non-zero column vector x in Fn such that Ax = λx;
the vector x is an eigenvector of A belonging to the eigenvalue λ. We leave it to the reader to verify
that λ is an eigenvalue of A if and only if λ is a root of the characteristic polynomial of A.
As with linear operators, the eigenspace of a matrix belonging to an eigenvalue is the subspace
consisting of non-zero eigenvectors along with the zero vector. For future reference, we make this idea
precise.

Proposition 5.3.8. The eigenspace for an eigenvalue λ of a matrix A ∈ Mn (F) is the solution space
in Fn of the matrix equation (A − λIn)x = 0.

Proof. A vector x ∈ Fn is a non-zero solution of (A − λIn )x = 0 if and only if x is an eigenvector for A


belonging to the eigenvalue λ. The proposition is thus clear. !

It is clear that linear operators on finite-dimensional vector spaces and their matrices have the same
eigenvalues. In fact, by Proposition (4.5.4) and the discussion preceding it, we can have the following
precise formulation of the relationship between the eigenvalues and corresponding eigenvectors of a
linear operator and those of any of its matrix representations.

Proposition 5.3.9. Let T be a linear operator on an n-dimensional vector space V over a field
F, and A ∈ Mn (F) be its matrix with respect to a fixed but arbitrary basis B of V. For any vector
v ∈ V, let x ∈ Fn be its coordinate vector with respect to B. Then, v is an eigenvector of T belonging
to an eigenvalue λ ∈ F if and only if x is an eigenvector of A belonging to the same eigenvalue λ ∈ F
of A. Thus, the vectors comprising the eigenspace of T belonging to the eigenvalue λ correspond
precisely to the column vectors in Fn comprising the solution space of the system of equations given
by (A − λIn)x = 0.

The preceding proposition coupled with Proposition (5.3.8) then yields the following.

Corollary 5.3.10. Notations same as above. Then the eigenspaces belonging to the same eigen-
value of T and its matrix A are isomorphic.

We need to introduce one more terminology related to the concept of an eigenvalue.

Definition 5.3.11. If an eigenvalue occurs r times as a root of the characteristic polynomial, we


then say that the eigenvalue has algebraic multiplicity r.

We now list the characteristic polynomials and eigenvalues of some special linear operators and
matrices. The characteristic polynomial of a matrix A is usually worked out directly by evaluating the
determinant det(xIn − A); note that, by definition, the characteristic polynomial of a linear operator is
the characteristic polynomial of any matrix representing it.
EXAMPLE 15 The zero matrix 0n of order n over any field F clearly has xn as the characteristic
polynomial. The zero map z on an n-dimensional vector space V over F thus has xn
as its characteristic polynomial as the matrix of z with respect to any basis of V is the
zero matrix of order n. So 0 is the only eigenvalue, with algebraic multiplicity n, of
such a zero operator or of the zero matrix of order n.

EXAMPLE 16 The scalar matrix aIn of order n over a field F, with a ∈ F, has (x − a)n as its charac-
teristic polynomial. For, the matrix xIn − aIn , whose determinant gives us the char-
acteristic polynomial, is also a scalar matrix of order n having (x − a) as the diagonal
entry. So, aIn has a single distinct eigenvalue a, again with algebraic multiplicity n.
Thus, for any n-dimensional vector space V over a field F, the linear operator aIV
has (x − a)n as its characteristic polynomial, as aIn is the matrix of aIV with respect to
any basis of V. In particular, the identity operator on an n-dimensional vector space
over any field has (x − 1)n as its characteristic polynomial.

EXAMPLE 17 The characteristic polynomial of any diagonal matrix

diag[λ1 , λ2 , . . . , λn ]

is clearly (x − λ1 )(x − λ2 ) . . . (x − λn ). Similarly, the characteristic polynomial of a


lower triangular (or of an upper triangular) matrix, with scalars λ1 , λ2 , . . . , λn along
the diagonal, is (x − λ1 )(x − λ2 ) . . . (x − λn ). In both these cases, the eigenvalues
are the distinct entries appearing on the diagonal; the algebraic multiplicity of each
eigenvalue is the number of times it appears on the diagonal.
The fact that the characteristic polynomial of a linear operator on an n-dimensional vector space, or
equivalently of an n × n matrix, is a polynomial of degree n, and that eigenvalues are roots of this polynomial,
implies that the number of distinct eigenvalues can be at most n. See Section 5.2 for a discussion of
roots of polynomials. Of course, if the underlying field is C or any other algebraically closed field,
then the number of eigenvalues (not necessarily distinct) will be the same as the degree of the charac-
teristic polynomial. On the other hand, if the characteristic polynomial turns out to be an irreducible
polynomial of degree ≥ 2 over the underlying field, then there can be no eigenvalue. We present a few
examples to illustrate the various possibilities.

EXAMPLE 18 Consider the operator Rθ on R2 , whose matrix with respect to the standard basis is
$$
A = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix}.
$$

A direct calculation shows that det(xI2 − A) equals x2 − 2(cos θ)x + 1, so this polyno-
mial is the characteristic polynomial of A as well as of Rθ . Note that the discriminant
of this polynomial is −4 sin² θ, a negative real number unless sin θ = 0. It follows,
from the formula for the solutions of a quadratic equation, that if θ is not a multiple
of π, then there can be no real root of the characteristic polynomial. Hence, for such
values of θ, A or Rθ has no real eigenvalues.
However, as the polynomial can be factored into two distinct linear factors over
C, A considered a matrix over C, has two distinct eigenvalues.

EXAMPLE 19 Consider the linear operator T on the real space R3 whose matrix A with respect to
the standard basis of R3 is the permutation matrix
 
0 0 1
 
A = 1 0 0.
 
0 1 0

Evaluating det(xI3 − A), we see that A and T have x3 − 1 as the characteristic poly-
nomial. Since x3 − 1 = (x − 1)(x2 + x + 1) and since x2 + x + 1 is irreducible over R, it
follows that 1 is the only eigenvalue of T and A in R.
If A is considered a matrix over C (for example, as the matrix of a linear operator
on C3 with respect to the standard basis of C3 ), then A has three distinct eigenvalues
1, ω and ω2 , where ω is a non-real cube root of unity.

Computing Eigenvectors

EXAMPLE 20 Consider the real matrix


$$
A = \begin{pmatrix} -8 & 6\\ -15 & 11 \end{pmatrix}.
$$

The characteristic polynomial of A, which the reader can easily work out, is x2 − 3x +
2 whose roots in R are 1 and 2. Thus, A has two distinct eigenvalues. We work out
the eigenvectors of A in R2 corresponding to these eigenvalues. Recall that a column
vector x is an eigenvector for A corresponding to the eigenvalue λ if it is a non-zero
solution of the matrix equation Ax = λx, or equivalently, of (A − λI2 )x = 0. Thus, to
find the eigenvectors of A for the eigenvalue λ = 1, we have to solve
$$
(A - I_2)\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
= \begin{pmatrix} -9 & 6\\ -15 & 10 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
= \begin{pmatrix} 0\\ 0 \end{pmatrix},
$$

which is easily seen to be equivalent to


$$
\begin{pmatrix} -3 & 2\\ 0 & 0 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
= \begin{pmatrix} 0\\ 0 \end{pmatrix}.
$$

It follows that any solution (x1 , x2 )t is given by the condition 3x1 = 2x2 . We may
choose any vector (x1 , x2 )t satisfying this condition to be an eigenvector; for exam-
ple, (2, 3)t is an eigenvector for the eigenvalue λ = 1. A similar argument shows that
the general solution of the equation (A − 2I2)x = 0 is given by 5x1 = 3x2 , so we may
take (3, 5)t as an eigenvector of A for the eigenvalue λ = 2. These two vectors are
clearly linearly independent and hence form a basis of R2 .
Note that the choices of the eigenvectors are arbitrary; we could have taken any
vectors as eigenvectors as long as they satisfied the matrix equations or the equivalent
conditions giving the general solution. As we will prove a little later, any arbitrary
choice would have still resulted in a basis, for eigenvectors belonging to distinct
eigenvalues are automatically linearly independent.
We continue with the matrix of the previous example to see the effect of the existence of a basis
consisting of eigenvectors.

EXAMPLE 21 Let T be the linear operator on R2 whose matrix with respect to the standard basis
{e1 , e2 } of R2 is A. Thus, by the definition of matrix representation (see Section 4.5),
$$
T\begin{pmatrix} 1\\ 0 \end{pmatrix} = \begin{pmatrix} -8\\ -15 \end{pmatrix}
\quad \text{and} \quad
T\begin{pmatrix} 0\\ 1 \end{pmatrix} = \begin{pmatrix} 6\\ 11 \end{pmatrix}.
$$
Note that not only T and A have the same eigenvalues, they also have the same
eigenvectors by Proposition (5.3.9) as T acts on column vectors of R2 . Therefore, the
basis {v1 = (2, 3)t , v2 = (3, 5)t } of R2 consists of eigenvectors of T corresponding,
respectively, to eigenvalues 1, 2 of T . Therefore T v1 = 1v1 and T v2 = 2v2 and so
the matrix D of T with respect to the basis {v1 = (2, 3)t , v2 = (3, 5)t } is the diagonal
matrix
$$
D = \begin{pmatrix} 1 & 0\\ 0 & 2 \end{pmatrix},
$$
where the diagonal entries are the eigenvalues of T .
Since the matrices A and D represent the same linear operator T , they are similar
matrices. In fact, as we have seen in Section 4.5 during the discussion of similar
matrices, if P is the transition matrix from the new basis of R2 consisting of eigenvectors
to the original standard basis, then P−1 AP = D. In this example,
$$
P = \begin{pmatrix} 2 & 3\\ 3 & 5 \end{pmatrix},
$$
and it can be easily verified that
$$
\begin{pmatrix} 2 & 3\\ 3 & 5 \end{pmatrix}^{-1}
\begin{pmatrix} -8 & 6\\ -15 & 11 \end{pmatrix}
\begin{pmatrix} 2 & 3\\ 3 & 5 \end{pmatrix}
= \begin{pmatrix} 1 & 0\\ 0 & 2 \end{pmatrix}.
$$
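All of the arithmetic in Examples 20 and 21 can be confirmed numerically; the NumPy sketch below (our own check, not part of the text) recovers the eigenvalues 1 and 2 and verifies that conjugating A by the matrix of eigenvectors produces the diagonal matrix D.

```python
import numpy as np

A = np.array([[-8, 6],
              [-15, 11]])
print(np.linalg.eigvals(A))       # approximately [1., 2.] (in some order)

P = np.array([[2, 3],
              [3, 5]])            # columns: the eigenvectors (2, 3)^t and (3, 5)^t
print(np.linalg.inv(P) @ A @ P)   # approximately diag(1, 2)
```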

EXAMPLE 22 As another exercise in calculating eigenvectors, let us find the eigenvectors of the
permutation matrix
 
0 0 1
 
A = 1 0 0
 
0 1 0
considering it a matrix over C. Thus, A can be considered the matrix of a linear
operator T on C3 , the three-dimensional vector space of ordered triples of complex
numbers, with respect to the standard basis of C3 . We have already seen that 1, ω
and ω2 are the three eigenvalues of A over C. The matrices A − λI3 for these three
eigenvalues are
     
$$
\begin{pmatrix} -1 & 0 & 1\\ 1 & -1 & 0\\ 0 & 1 & -1 \end{pmatrix}, \quad
\begin{pmatrix} -\omega & 0 & 1\\ 1 & -\omega & 0\\ 0 & 1 & -\omega \end{pmatrix}, \quad
\begin{pmatrix} -\omega^2 & 0 & 1\\ 1 & -\omega^2 & 0\\ 0 & 1 & -\omega^2 \end{pmatrix},
$$
respectively. Since they are row equivalent to


     2 
1 0 −1 ω 0 −1 ω 0 −1
     
0 1 −1 ,  0 1 −ω ,  0 1 −ω2 ,

0 0 0 0 0 0 0 0 0

it follows that the solutions x = (x1 , x2 , x3 )t of the matrix equation (A − λI3 )x = 0 for
λ = 1, ω and ω2 , respectively, will be given by

$$
\begin{cases} x_1 - x_3 = 0\\ x_2 - x_3 = 0 \end{cases}, \qquad
\begin{cases} \omega x_1 - x_3 = 0\\ x_2 - \omega x_3 = 0 \end{cases}, \qquad
\begin{cases} \omega^2 x_1 - x_3 = 0\\ x_2 - \omega^2 x_3 = 0 \end{cases}.
$$

As in the previous example, we can pick any column vector (x1 , x2 , x3 )t satisfy-
ing these three sets of equations to obtain the required eigenvectors. Thus, we may
choose (1, 1, 1)t , (1, ω2 , ω)t and (1, ω, ω2 )t as eigenvectors corresponding to eigen-
values 1, ω and ω2 , respectively. We leave to the reader to verify that these are lin-
early independent over C, and hence form a basis of C3 over C. It is clear that relative
to this new basis, the matrix of T is the diagonal matrix:
 
1 0 0 
0 ω 
0 .
 2 
0 0 ω

Our computation also shows that


   
0 0 1 1 0 0 
1 0 0  
  and 0 ω 0 
2 
0 1 0 0 0 ω

are similar over C.
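The same conclusion can be checked numerically over C; in the NumPy sketch below (our own check) the computed eigenvalues are the three cube roots of unity, and each column returned by eig is indeed an eigenvector of the permutation matrix.

```python
import numpy as np

A = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=complex)
w, V = np.linalg.eig(A)
print(np.round(w, 6))            # 1, omega, omega^2 in some order
print(np.allclose(w**3, 1))      # True: each eigenvalue is a cube root of unity

# Each column of V is an eigenvector: A v = lambda v
for lam, v in zip(w, V.T):
    print(np.allclose(A @ v, lam * v))   # True, True, True
```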


With these examples in mind, we introduce one of the most important concepts of linear algebra.

Definition 5.3.12. Let T be a linear operator on a finite-dimensional vector space V over a field
F. T is said to be diagonalizable (over F) if there is a basis of V consisting of eigenvectors of T .

Note that relative to a basis of eigenvectors, the matrix of a diagonalizable operator will be a diag-
onal one.
An analogous definition can be given for matrices.

Definition 5.3.13. A matrix A ∈ Mn (F) is said to be diagonalizable (over F) if there is a basis of Fn


consisting of eigenvectors of A.

We will see shortly that A is diagonalizable if and only if A is similar to a diagonal matrix over F.
It is clear that a linear operator on a finite-dimensional vector space is diagonalizable if and only if
any matrix representing the operator is diagonalizable.
Given a diagonalizable matrix A in Mn (F), consider the unique linear operator T on Fn determined
by A with respect to the standard basis of Fn . Since T and A have the same eigenvalues, it follows that
with respect to the basis of Fn consisting of eigenvectors of A, the matrix of T is a diagonal matrix
D, whose diagonal entries are the eigenvalues of A. Therefore, if P is the transition matrix from the
basis consisting of the eigenvectors of A to the standard basis of Fn , then, by Proposition (4.5.11), D =
P−1 AP. Note that by the definition of transition matrix, the columns of P consist of the eigenvectors
of A forming the basis of Fn . Thus, we have the following useful result about diagonalizable matrices.

Proposition 5.3.14. Let A ∈ Mn (F) be diagonalizable with eigenvalues λ1 , λ2 , . . . , λn , not neces-


sarily distinct. Let v1 , v2 , . . . , vn be eigenvectors of A forming a basis of Fn such that v j , for each j,
is an eigenvector belonging to the eigenvalue λ j . If

$$
P = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}
$$

denotes the matrix in Mn (F) whose jth column is formed by the components of v j , then P is invertible
and

P−1 AP = diag[λ1, λ2 , . . . , λn ].

We say that P diagonalizes A. Note that as P depends on our choice of the eigenvectors belonging
to the eigenvalues of A, the matrix diagonalizing A is not unique.

Corollary 5.3.15. A matrix A in Mn (F) is diagonalizable if and only if A is similar to a diagonal


matrix D over F; in that case, the diagonal entries are precisely the eigenvalues of A.

We want to draw the reader’s attention to another important point contained in the preceding dis-
cussion: any matrix in Mn (F) determines a unique linear operator on the n-dimensional vector space
Fn over F, say, with respect to the standard basis. They are equivalent in the sense that they share the
same characteristic polynomial, the same eigenvalues as well as the same eigenvectors. Thus, all the
results about linear operators can be translated to results about their corresponding matrices, too. We
will do that kind of translation without any comments from now onwards.
Let us now consider some linear operators and matrices and check whether they are diagonalizable.

EXAMPLE 23 The projection P1 : R2 → R2 given by P1 (x1 , x2 ) = (x1 , 0) is diagonalizable over R.


For, it has two eigenvalues 1 and 0, and it is clear that the standard basis vectors e1
and e2 are the eigenvectors of P1 corresponding to those eigenvalues, respectively.
EXAMPLE 24 In fact, any projection P, that is, a linear operator P such that P2 = P, on a finite-
dimensional vector space V is trivially diagonalizable. For, if W and K are the image
and the kernel of P respectively, then by Proposition (4.2.12), we have

(a) Pw = w for any w ∈ W.


(b) V = W ⊕ K.

Thus, non-zero vectors of W and K are eigenvectors of P belonging to the eigenval-


ues 1 and 0, respectively. Also, by (b), the union of any bases of W and K yields
a basis of V. It is clear that with respect to such a basis of V, the matrix of P is a
diagonal one with 1 and 0 as the diagonal entries.
EXAMPLE 25 Consider the matrix
$$
A = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}
$$
over R. The eigenvalues of A are 1 and −1. The eigenvectors of A are column vectors
in R2 . Just by looking at A, we can conclude that e1 + e2 = (1, 1)t is an eigenvector
of A for the eigenvalue 1. We leave it to the reader to verify that e1 − e2 = (1, −1)t is an
eigenvector belonging to the eigenvalue −1 and
$$
\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}^{-1}
A
\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}
= \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix}.
$$
EXAMPLE 26 The rotation Rθ of R2 is not a diagonalizable operator of R2 over R unless θ is a
multiple of π. For, as we had seen in Example 12, Rθ has no real eigenvalue for other
values of θ.
The matrix of Example 18 is not diagonalizable over R but is so over C.
Quite often, one may be interested only in the dimension of the eigenspace, that is, the number of
linearly independent eigenvectors (but not actual eigenvectors) for a given eigenvalue. Recall that the
dimension of the solution space of the matrix equation Bx = 0 is precisely the nullity of the matrix
B (see Definition 3.6.13). Therefore, by appealing to Proposition (5.3.8), we can derive the following
convenient way of finding dimensions of eigenspaces.

Lemma 5.3.16. For any eigenvalue λ of a matrix A of order n, the number of linearly independent
eigenvectors belonging to λ is the nullity of the matrix A − λIn.

Recall that the nullity of a matrix B of order n is n − r, where r is the rank of B.

EXAMPLE 27 Consider the nilpotent matrix


 
0 0 0 0

1 0 0 0
J4 (0) =  
0 1 0 0
0 0 1 0
over any field F. (Any field, even a finite one, must have the multiplicative and the
additive identity, which are usually denoted by 1 and 0 respectively.) It is clear that
J4 (0), being a lower triangular matrix having zeros along the diagonal, has x4 as its
characteristic polynomial so that 0 is the only eigenvalue. J4 (0) is thus diagonalizable
only if it has four linearly independent eigenvectors in F4 belonging to that single
eigenvalue 0. However, the nullity of J4 (0) is 1 as its rank is clearly 3. Thus, there
cannot be more than one linearly independent eigenvector for J4 (0), showing that
J4 (0) is not diagonalizable over any field.
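
This computation is easy to check numerically; a minimal sketch (assuming the NumPy library) computes the nullity of J4 (0) as n − rank(J4 (0)).

    import numpy as np

    # J_4(0): ones on the subdiagonal, zeros elsewhere.
    J4 = np.diag([1.0, 1.0, 1.0], k=-1)

    # Geometric multiplicity of the eigenvalue 0 = nullity of J_4(0) - 0*I.
    nullity = 4 - np.linalg.matrix_rank(J4)
    print(nullity)              # 1, so J_4(0) cannot have four independent eigenvectors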
As some of these examples (including the last one) show, linear operators may fail to be diagonal-
izable if they do not have enough eigenvalues. On the other hand, the following result implies that a
linear operator on an n-dimensional vector space or a matrix of order n having n distinct eigenvalues
is necessarily diagonalizable.

Proposition 5.3.17. Eigenvectors belonging to distinct eigenvalues are linearly independent.

Proof. We have left the statement of the theorem deliberately vague, so that it can be used for operators
as well as matrices. We prove it for linear operators. The obvious modifications needed for matrices
are left to the reader.
So let T be a linear operator on a finite-dimensional vector space V over a field F. Let v1 , v2 , . . . , vk
be eigenvectors of T in V corresponding to distinct eigenvalues λ1 , λ2 , . . . , λk . We can assume that k ≥
2. Now, if these eigenvectors are not linearly independent, some vector in the list is a linear combina-
tion of the preceding ones of the list (see Proposition 3.3.10). Let v j ( j ≥ 2) be the first such vector in the
list. Thus, v1 , v2 , . . . , v j−1 are linearly independent. Also, we have scalars c1 , c2 , . . . , c j−1 such that

v j = c1 v1 + c2 v2 + · · · + c j−1 v j−1 . (5.1)

Applying T to both sides of Equation (5.1) and noting that T vi = λi vi , we obtain

λ j v j = c1 λ1 v1 + c2 λ2 v2 + · · · + c j−1 λ j−1 v j−1 . (5.2)

On the other hand, multiplying Equation (5.1) by λ j yields another relation:

λ j v j = c1 λ j v1 + c2 λ j v2 + · · · + c j−1 λ j v j−1 . (5.3)

Finally, we subtract Equation (5.3) from Equation (5.2) to arrive at

c1 (λ1 − λ j )v1 + c2 (λ2 − λ j )v2 + · · · + c j−1 (λ j−1 − λ j )v j−1 = 0.

However, the vectors v1 , v2 , . . . , v j−1 being linearly independent, the coefficients ci (λi − λ j ), for
1 ≤ i ≤ j − 1, in the preceding relation are all zeros. It follows that c1 = c2 = · · · = c j−1 = 0 since the
eigenvalues λi , for 1 ≤ i ≤ j, are distinct. Equation (5.1) then shows that the eigenvector v j is the zero
vector, an absurdity. As the assumption that the eigenvectors are dependent led us to this absurdity, the
proposition follows. !
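
As a small numerical illustration of the proposition (a sketch assuming NumPy), the eigenvectors of the matrix of Example 25 for the distinct eigenvalues 1 and −1 are indeed linearly independent.

    import numpy as np

    A = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    # Columns of V: an eigenvector for the eigenvalue 1 and one for the eigenvalue -1.
    V = np.array([[1.0,  1.0],
                  [1.0, -1.0]])
    print(np.allclose(A @ V[:, 0], V[:, 0]))     # True: eigenvector for 1
    print(np.allclose(A @ V[:, 1], -V[:, 1]))    # True: eigenvector for -1
    print(np.linalg.matrix_rank(V))              # 2: the eigenvectors are independent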

In view of the proposition, if a linear operator on an n-dimensional vector space, or a matrix of order n over a field F, has n distinct eigenvalues, then by choosing an arbitrary eigenvector for each of the eigenvalues, we can produce a set of n linearly independent eigenvectors which will form a basis of the vector
space (or of Fn in case of the matrix A). The following useful corollary results.

Corollary 5.3.18. A linear operator on an n-dimensional vector space or a matrix of order n over
a field F, having n distinct eigenvalues is diagonalizable over F.

It should be clear that even if an operator on an n-dimensional space or a matrix of order n does not
have n distinct eigenvalues, it may still be diagonalizable as the next example shows.

EXAMPLE 28 The reader can check that the real matrix

A = [ 10   0   72
      −3   1  −24
      −1   0   −7 ]

has two distinct eigenvalues 1 and 2, with 1 having multiplicity 2. A will be diagonalizable if it has two linearly independent eigenvectors corresponding to the repeated eigenvalue 1. Begin with the eigenvalue λ = 1. Since in the matrix

A − I3 = [  9   0   72
           −3   0  −24
           −1   0   −8 ],

the first as well as the second row is a multiple of the third row, it follows that the
rank of the matrix is 1 and its nullity is 2. Thus, it is possible to find two linearly
independent eigenvectors belonging to the eigenvalue λ = 1. Now, there will be at
least one eigenvector of A for the eigenvalue λ = 2, and this together with the two
eigenvectors chosen already, form a basis of R3 by Proposition (5.3.17). Thus A is
diagonalizable over R. We leave the task of actually producing the basis to the reader.
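
The claim can also be checked numerically; the sketch below (assuming NumPy) computes the nullities of A − I3 and A − 2I3 and confirms that the geometric multiplicities add up to 3.

    import numpy as np

    A = np.array([[10.0,  0.0,  72.0],
                  [-3.0,  1.0, -24.0],
                  [-1.0,  0.0,  -7.0]])

    for lam in (1.0, 2.0):
        nullity = 3 - np.linalg.matrix_rank(A - lam * np.eye(3))
        print(lam, nullity)     # eigenvalue 1 gives 2, eigenvalue 2 gives 1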
In contrast to this diagonalizable 3 × 3 matrix with two distinct eigenvalues, the 3 × 3 matrix of the
next example is not diagonalizable, even though it too has two distinct eigenvalues.

EXAMPLE 29 Consider the real matrix

A = [ 15  −12  −16
       4   −2   −5
       9   −8   −9 ] .

It is easy to see that there are two distinct eigenvalues of A, namely, 1 and 2, with 1
having multiplicity 2. To find the number of linearly independent eigenvectors of A
belonging to the eigenvalue 1, we need to find the nullity of the matrix

A − I3 = [ 14  −12  −16
            4   −3   −5
            9   −8  −10 ].

To do that we seek the reduced form of the matrix:

[ 14  −12  −16 ]     [ 1  −1  −1 ]     [ 1  −1  −1 ]     [ 1   0  −2 ]
[  4   −3   −5 ]  ∼  [ 4  −3  −5 ]  ∼  [ 0   1  −1 ]  ∼  [ 0   1  −1 ].
[  9   −8  −10 ]     [ 9  −8 −10 ]     [ 0   1  −1 ]     [ 0   0   0 ]

The last matrix, which is the row reduced form of (A − I3 ), has two pivot columns
so its nullity is 1. Thus, the nullity of (A − I3 ), or equivalently, the dimension of the
eigenspace of A belonging to eigenvalue 1 is 1. Similar calculation for the eigenvalue
λ = 2 shows that the corresponding eigenspace also has dimension 1. Thus, we see
that for each of the eigenvalues, there cannot be more than one linearly independent
eigenvector. Consequently, there cannot be a basis of R3 consisting of eigenvectors
of A, showing that A is not diagonalizable.
Note that A is not diagonalizable even if we consider it as a matrix over the larger
field C.

Eigenspaces
The last two examples suggest that in case the number of eigenvalues is not equal to the order of a
matrix, a simpler criterion for deciding diagonalizability is needed. For deriving any such criterion,
the idea of an eigenspace needs deeper examination. Without any loss of generality, we consider
eigenspaces of operators. We record some general observations about eigenspaces in the following
remarks.

(a) If λ is an eigenvalue of the operator T , then the eigenspace ker(T − λIV ) is non-zero and so has
dimension at least one.
(b) Every non-zero vector in the eigenspace ker(T − λIV ) is an eigenvector of T belonging to the
eigenvalue λ. Consequently, any non-zero linear combination of finitely many eigenvectors of T
belonging to the eigenvalue λ is again an eigenvector for the same λ.
(c) Let W1 , W2 , . . . , Wk be the eigenspaces of T corresponding to distinct eigenvalues, and let
W = W1 + W2 + · · · + Wk be the sum of these subspaces. We claim that this sum of subspaces
is direct (see Section 3.5 for a discussion of direct sums). To prove the claim assume that v1 +
v2 + · · · + vk = 0 in V, where vi ∈ Wi . We need to show that each vi is the zero vector. Now, if
some vi are non-zero, clearly some other v j ( j ≠ i) must also be non-zero. Thus the given sum
is a relation of linear dependence for the non-zero vectors among v1 , v2 , . . . , vk . However, any
non-zero vector in an eigenspace is an eigenvector. So we have a relation of linear dependence
for eigenvectors belonging to distinct eigenvalues, contradicting Proposition (5.3.17). Thus each
vi must be the zero vector and our claim follows. We may therefore write

W = W1 ⊕ W2 ⊕ · · · ⊕ Wk .

As opposed to the algebraic multiplicity of an eigenvalue, which is the multiplicity of the eigenvalue
as a root of the characteristic polynomial, we introduce now its geometrical multiplicity.

Definition 5.3.19. The geometrical multiplicity of an eigenvalue is the dimension of the corre-
sponding eigenspace.

So the last remark implies that the sum of the geometrical multiplicities of the distinct eigenvalues
of an operator is the dimension of the subspace spanned by the eigenvectors belonging to all possible
eigenvalues.
We now look at the case of a diagonalizable operator closely. Let T be a diagonalizable linear
operator on an n-dimensional vector space V over a field F. Therefore, there is a basis of eigenvectors
of V with respect to which the matrix of T is a diagonal one, say

D = diag[λ1 , λ2 , . . . , λn ],

where the λi are eigenvalues of T . In general, the scalars λ1 , λ2 , . . . , λn need not be distinct. Suppose
that exactly k (1 ≤ k ≤ n) distinct values appear on the diagonal of D. Without loss of generality, we
may further assume (if necessary by reordering the basis of eigenvectors) that the first n1 of the scalars
appearing on the diagonal are equal to λ1 , the next n2 equal to λ2 and so on, and finally the last nk
are equal to λk . In other words, there are k distinct eigenvalues λ1 , λ2 , . . . , λk of T which appear as
entries of D. Note that for any j corresponding to the eigenvalue λ j , there are n j linearly independent
eigenvectors belonging to this particular eigenvalue in the chosen basis of V. Therefore, with respect
to this basis, the matrix D can be expressed in blocks as follows:

D = [ λ1 In1     0      . . .     0
        0      λ2 In2   . . .     0
        .         .                .
        .         .                .
        0         0     . . .   λk Ink ] ,                    (5.4)
with λ j appearing exactly n j times in the diagonal of D. The characteristic polynomial of D is easily
seen to be
(x − λ1)n1 (x − λ2 )n2 . . . (x − λk )nk , (5.5)
where n1 + n2 + · · · + nk = n, the dimension of V. This must also be the characteristic polynomial of T
since the characteristic polynomial of a linear operator is the same as that of any matrix representing
it. Examining Equations (5.4) and (5.5), we can arrive at the following conclusions.
• The characteristic polynomial of T factors completely into a product of linear factors, some of
which may be repeated, over F.
• The distinct diagonal entries λ1 , λ2 , . . . , λk of D are all the possible eigenvalues of T as these are
the only roots of the characteristic polynomial as given by Equation (5.5).
• n j , the number of times λ j appears on the diagonal of D for any j, is precisely the algebraic
multiplicity of the eigenvalue λ j .
We are now ready to state one of the main theorems of this section.

Theorem 5.3.20. Let T be a linear operator on a finite-dimensional vector space V over a field
F. Let λ1 , λ2 , . . . , λk be the distinct eigenvalues of T , and let W1 , W2 , . . . , Wk be the corresponding
eigenspaces of T . Then, the following are equivalent.
(a) T is diagonalizable over F.
(b) The characteristic polynomial of T factors completely into linear factors over F as
(x − λ1 )d1 (x − λ2)d2 . . . (x − λk )dk ,
where d j = dim W j for j = 1, 2, . . . , k.
(c) The sum of the geometric multiplicities of the distinct eigenvalues of T equals dim V.
(d) If W1 , W2 , . . . , Wk are the distinct eigenspaces of T , then
dim V = dim W1 + dim W2 + · · · + dim Wk .
(e) If W1 , W2 , . . . , Wk are the distinct eigenspaces of T , then
V = W1 ⊕ W2 ⊕ · · · ⊕ Wk .

The matrix version is obvious and we leave it to the reader to formulate and prove it.

Proof. If T is diagonalizable with distinct eigenvalues λ1 , λ2 , . . . , λk , then we have seen that with
respect to a suitably ordered basis of eigenvectors, the matrix of T can be expressed as a block matrix

D = [ λ1 In1     0      . . .     0
        0      λ2 In2   . . .     0
        .         .                .
        .         .                .
        0         0     . . .   λk Ink ] ,

where for each j, n j is the algebraic multiplicity of the eigenvalue λ j , which is its multiplicity as a root
of the characteristic polynomial of T .
Therefore, to prove that (a) implies (b), it suffices to show that for each j, n j equals dim W j = d j ,
the geometric multiplicity of λ j . Now, there are exactly n j zero rows in the diagonal matrix (D − λ j In ) as λi ≠ λ j for i ≠ j. Hence the nullity of (D − λ j In ), and hence the nullity of the corresponding linear operator T − λ j IV , is n j . That is another way of saying that n j is the dimension of the eigenspace W j .
That (b) implies (c) is trivial, as the degree of the characteristic polynomial of T is dim V. (d) is just
a restatement of (c). For the next implication, note that in the last of the preceding remarks, we have
seen that the sum W of the eigenspaces of T is a direct sum. Since the dimension of a direct sum is
the sum of the dimensions of the summands, it follows from the hypothesis in (d) that dim W = dim V
forcing W = V. Thus, (e) follows.
Finally, (e) implies that the eigenvectors of T span V. Therefore, we can choose a basis of V from
among these eigenvectors. In other words, T is diagonalizable proving (a). !

One of the advantages of this theorem is that, in some cases, without actually finding eigenvectors,
we can decide whether an operator or a matrix is diagonalizable. For example, as soon as we found
that the characteristic polynomial of the permutation matrix

A = [ 0  0  1
      1  0  0
      0  1  0 ]

was (x−1)(x2 + x+1), we could have concluded that A is not diagonalizable over R, as x2 + x+1 cannot
be factored into linear factors over R. However, a word of caution: even if the characteristic polynomial
of a matrix or an operator factors into linear factors, it does not necessarily mean diagonalizability.
To cite an instance, the characteristic polynomial of the matrix in Example 29 is easily seen to be (x − 1)2 (x − 2), so it does factor into linear factors over R (or even over C). However, as we had seen in that example, the matrix is not diagonalizable even over C.
There is another characterization of diagonalizable operator which is simpler to verify, but it needs
the concept of the minimal polynomial of an operator. We devote the next section to a detailed study
of this concept for it is as important for a linear operator as its characteristic polynomial.

Power of a Diagonalizable Matrix


In many practical problems, one needs to compute powers of a given matrix. For a diagonalizable ma-
trix, there is a simple but useful method for calculating such powers. Let A ∈ Mn (F) be a diagonalizable
matrix. Suppose P ∈ Mn (F) is an invertible matrix such that

P−1 AP = D,

where D = diag[λ1, λ2 , . . . , λn ] whose diagonal entries λ1 , λ2 , . . . , λn are the eigenvalues of A, not


necessarily distinct. It is an easy exercise to show that for any positive integer k,

Ak = PDk P−1 , (5.6)

where clearly, Dk = diag[λ1k , λ2 k , . . . , λn k ]. Since P and P−1 are already known, the formula in Equa-
tion (5.6) is quite efficient.
For a large class of diagonalizable matrices that appear in applications, P can be so chosen that
P−1 = Pt , the transpose. In that case, this method for computing powers becomes even more efficient.
Real symmetric matrices form such a class of matrices and we devote the rest of this section to a
discussion of diagonalizability of these matrices.
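
A short numerical sketch of Equation (5.6) (assuming NumPy, and using the matrix of Example 25 merely as a sample):

    import numpy as np

    A = np.array([[0.0, 1.0],
                  [1.0, 0.0]])                    # the matrix of Example 25
    w, P = np.linalg.eig(A)                       # A = P diag(w) P^{-1}

    k = 11
    Ak = P @ np.diag(w ** k) @ np.linalg.inv(P)   # A^k = P D^k P^{-1}
    print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True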

Eigenvalues and Eigenvectors of Real Symmetric Matrices


We have seen earlier, in quite a few examples, real matrices which cannot be diagonalized over R
because either they lack eigenvalues in R or there are not enough eigenvectors. Therefore, it is truly
remarkable that any real symmetric matrix can be diagonalized over R. The proof of this important
result, which we shall be presenting shortly, is simple but makes essential use of concepts related to
the usual dot product in Rn and in Cn (see Section 3.7 for details about dot products). We need a few
basic properties of dot product for proving this, which we recall now.

Definition 5.3.21. Given any two column vectors x = (x1 , x2 , . . . , xn )t and y = (y1 , y2 , . . . , yn )t in Cn , the usual dot product ⟨x, y⟩ is defined by

⟨x, y⟩ = x1 ȳ1 + x2 ȳ2 + · · · + xn ȳn ,

where ȳi denotes the conjugate of the complex number yi . Recall that y ∈ C is real if and only if ȳ = y. Thus for x and y in Rn , the dot product reduces to

⟨x, y⟩ = x1 y1 + x2 y2 + · · · + xn yn .

Letting the row vector y∗ = (ȳ1 , ȳ2 , . . . , ȳn ) denote the conjugate transpose of the column vector y = (y1 , y2 , . . . , yn )t (in the real case y∗ is simply the transpose yt ), the dot product of x and y can be expressed as the following matrix product

⟨x, y⟩ = y∗ x,

or as

⟨x, y⟩ = yt x,

in case x and y are in Rn .
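
Numerically (a sketch assuming NumPy), this dot product can be computed directly from the definition, or with numpy.vdot, which conjugates its first argument:

    import numpy as np

    x = np.array([1.0 + 2.0j, 3.0 + 0.0j])
    y = np.array([2.0 - 1.0j, 0.0 + 1.0j])

    dot_xy = np.sum(x * np.conj(y))   # x1*conj(y1) + x2*conj(y2), i.e. <x, y> = y* x
    print(dot_xy)
    print(np.vdot(y, x))              # the same value: vdot conjugates its first argument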
The following observation will be useful later.

Lemma 5.3.22. Let X and Y be real matrices of order n. If ρ1 , ρ2 , . . . , ρn are the row vectors of X and γ1 , γ2 , . . . , γn the column vectors of Y, then the (i, j)th entry of the product XY is the dot product ⟨ρi t , γ j ⟩ in Rn .

Proof. Note that if ρi = (ai1 , ai2 , . . . , ain ) and γ j = (b1 j , b2 j , . . . , bn j )t , then the (i, j)th entry of the
product XY, by the usual rules of matrix multiplication, is given by the sum ai1 b1 j +ai2 b2 j +· · ·+ain bn j
which is precisely the dot product of the column vectors ρti and γ j by the definition of dot product. !
Two basic properties that we shall also need are

⟨λx, y⟩ = λ ⟨x, y⟩

and

⟨x, λy⟩ = λ̄ ⟨x, y⟩

for any λ ∈ C; these properties were verified in Section 3.7. We are now ready to prove the following
remarkable result.

Proposition 5.3.23. Let A be a real symmetric matrix of order n. Then A has n real eigenvalues,
counting multiplicities. Moreover, for each such real eigenvalue, A has real eigenvectors.

Proof. Consider A a matrix over C. Then its characteristic polynomial over C has n complex roots
in C as C is algebraically closed (see Section 5.2 for roots of polynomials over C). Thus, A has n
complex eigenvalues. Let λ ∈ C be any such eigenvalue of A and x ∈ Cn an eigenvector belonging to
λ, so Ax = λx. Then by properties of the dot product:

λ ⟨x, x⟩ = ⟨λx, x⟩ = ⟨Ax, x⟩ = x∗ (Ax).                    (5.7)

On the other hand, according to properties of conjugate transposes (see Section 1.5 for these proper-
ties),

x∗ (Ax) = (x∗ A)x = (A∗ x)∗ x = (Ax)∗ x

as A∗ = At = A, A being a real symmetric matrix. Since (Ax)∗ x = ⟨x, Ax⟩, it then follows from Equation (5.7) that

λ ⟨x, x⟩ = ⟨x, Ax⟩ = ⟨x, λx⟩ = λ̄ ⟨x, x⟩.

These equalities imply that

(λ − λ̄) ⟨x, x⟩ = 0.                    (5.8)

However, for any x = (x1 , x2 , . . . , xn )t ∈ Cn ,

⟨x, x⟩ = |x1 |2 + |x2 |2 + · · · + |xn |2 = 0

if and only if each xi = 0, that is, if and only if x = 0. The vector x in Equation (5.8), being an
eigenvector, is non-zero and so the preceding equation can hold only if λ = λ̄. This proves that λ is a
real number. Since λ is an arbitrary eigenvalue of A, it follows that every eigenvalue of A is real.

Now, expressing each component x j of x (for 1 ≤ j ≤ n) as x j = a j + ib j for real numbers a j and


b j , we may write x = a + ib, where both a = (a1 , a2 , . . . , an )t and b = (b1 , b2 , . . . , bn )t are column
vectors in Rn . Now for the real number λ, λx = λ(a + ib) = λa + iλb and by properties of matrix
multiplication Ax = A(a + ib) = Aa + iAb. Therefore, the relation Ax = λx can be restated as

Aa + iAb = λa + iλb. (5.9)

Since A is a real matrix, a, b are vectors with real components and λ a real number, comparing both sides (which are n-dimensional column vectors) of Equation (5.9), we conclude that Aa = λa and Ab = λb. Since x is non-zero, at least one of a and b is non-zero, and any non-zero one of them is a real eigenvector of A for the eigenvalue λ. The proof is complete. !

The same proof, with minor modification, also shows that if H ∈ Mn (C) is a hermitian matrix, that
is, H ∗ = H, then H has n real eigenvalues.
We cannot avoid using some other concepts about orthogonality in Rn to prove our main result. We
recall the relevant definitions and results; for details, the reader can go through the material in Section
3.7.
Two vectors x and y in Rn (or in Cn ) are orthogonal if ⟨x, y⟩ = 0. A set of mutually orthogonal vectors in Rn (or in Cn ) is an orthogonal set; so any two vectors in the set are orthogonal. Any orthogonal set of non-zero vectors is a linearly independent set. For x = (x1 , x2 , . . . , xn )t in Rn , ⟨x, x⟩ = x1 2 + x2 2 + · · · + xn 2 is clearly non-negative; it is 0 if and only if x = 0. The length ‖x‖ of a non-zero x is defined as the positive square root of ⟨x, x⟩; the length of the zero vector is 0. Thus ‖x‖2 = ⟨x, x⟩. A vector in Rn is a unit vector if its length is 1; note that any non-zero vector can be made into a unit vector by dividing it by its length. An orthonormal set of vectors in Rn is a set of mutually orthogonal vectors such that each vector is a unit vector. Thus vectors {v1 , v2 , . . . , vr } in Rn form an orthonormal set in Rn , if

⟨vi , v j ⟩ = v j t vi = δi j ,

where the Kronecker delta symbol δi j = 0 if i ≠ j and δii = 1 for all i. Observe that if Q is an orthogonal
matrix of order n, that is, a matrix whose columns form an orthonormal set, then by the preceding
formula for dot product of orthonormal vectors, Qt Q = In , which shows that such a Q is invertible and
that Q−1 = Qt .
A well-known procedure, called the Gram–Schmidt process, converts any linearly independent set
of vectors in Rn into an orthonormal set. In particular, any orthonormal set of vectors in Rn , being
linearly independent, can be extended to an orthonormal basis of Rn .
We can now prove the following remarkable result which states that a real symmetric matrix is
similar to a diagonal matrix over R.

Proposition 5.3.24. For any real symmetric matrix A of order n, there is an orthogonal matrix Q
of order n such that Q−1 AQ = Qt AQ is diagonal.

Proof. The proof is by induction on n. If n = 1, then there is nothing to prove. So assume that n >
1 and that (by induction hypothesis) the result holds for any real symmetric matrix of order n − 1.
Let A be a real symmetric matrix of order n and λ any eigenvalue of A. By Proposition (5.3.23),
λ is real and there is an eigenvector, say v1 ∈ Rn , of A belonging to the eigenvalue λ. Since a scalar
multiple of an eigenvector is still an eigenvector for the same eigenvalue, we can assume that v1 is a
unit vector. Therefore, by the Gram–Schmidt process, {v1 } can be extended to an orthonormal basis
{v1 , v2 , . . . , vn } of Rn . Let P be the real matrix whose columns are these basis vectors; as we have
remarked a while ago, that makes P an orthogonal matrix and so P−1 = Pt , the transpose of P. Note
that the jth row of Pt is the transpose vtj of the jth column v j of P. We also observe, by the row-column
multiplication rule (see Exercise 8 of Section 1.3), that the jth column of the product AP is the column vector Av j and so we may express AP = [Av1  Av2  · · ·  Avn ]. Therefore one has

                    [ v1 t ]
                    [ v2 t ]
P−1 AP = Pt AP =    [  .   ]  [ Av1  Av2  · · ·  Avn ] .
                    [  .   ]
                    [ vn t ]

From this expression, one concludes, by Lemma (5.3.22), that the (i, j)th entry of P−1 AP is ⟨vi , Av j ⟩ as (vi t )t = vi ; this enables us to compute the first column of P−1 AP by taking j = 1. Since {v1 , v2 , . . . , vn } is an orthonormal set and v1 is an eigenvector of A for the real eigenvalue λ, one obtains

⟨vi , Av1 ⟩ = ⟨vi , λv1 ⟩ = λ ⟨vi , v1 ⟩ = λ if i = 1, and 0 if i ≠ 1.

Thus the first column of Pt AP is (λ, 0, . . . , 0)t . Now, observe that as A is symmetric, Pt AP is also
symmetric: (Pt AP)t = Pt At (Pt )t = Pt AP. We therefore conclude that there is a real symmetric matrix B of order n − 1 such that

Pt AP = [ λ  0
          0  B ] ,

where the two symbols 0 denote the (n − 1)-dimensional zero row vector and zero column vector,
respectively. As B is a real symmetric matrix of order n − 1, by the induction hypothesis, there is an
orthogonal matrix U of order n − 1 such that U −1 BU = U t BU = D1 , where D1 is a diagonal matrix of
order (n − 1). Set
Q = P [ 1  0
        0  U ] .

Then, as Pt P = In and U t U = In−1 ,


Qt Q = [ 1   0  ] Pt P [ 1  0 ]
       [ 0   Ut ]      [ 0  U ]

     = [ 1   0  ] [ 1  0 ]
       [ 0   Ut ] [ 0  U ]

     = In ,

which shows that Q is an orthogonal matrix. Finally,


Qt AQ = [ 1   0  ] Pt AP [ 1  0 ]
        [ 0   Ut ]       [ 0  U ]

      = [ 1   0  ] [ λ  0 ] [ 1  0 ]
        [ 0   Ut ] [ 0  B ] [ 0  U ]

      = [ λ  0
          0  D1 ] ,

where D1 = U t BU is a real diagonal matrix. As Q is orthogonal, this calculation shows that Q−1 AQ =
Qt AQ is a diagonal matrix. The proof is complete. !
Observe that if Q is an orthogonal matrix and A a real symmetric matrix, both of order n, such that
Qt AQ is a diagonal matrix D, then the diagonal entries of D are necessarily the eigenvalues of A.
It is interesting that the converse of the preceding result holds: an orthogonally diagonalizable
matrix must be a symmetric one.

Proposition 5.3.25. Let A be a real matrix of order n. If there is an orthogonal matrix P of order
n such that P−1 AP is diagonal, then A must be symmetric.

The proof is straightforward and left to the reader.


For diagonalizing specific real symmetric matrices the following result will be needed.

Proposition 5.3.26. Let λ1 and λ2 be two distinct eigenvalues of a real symmetric matrix A. If v1
and v2 are eigenvectors belonging to λ1 and λ2 , respectively, then v1 and v2 are orthogonal.

Proof. By properties of dot products for real vectors

λ1 ⟨v1 , v2 ⟩ = ⟨λ1 v1 , v2 ⟩ = ⟨Av1 , v2 ⟩ = v2 t Av1 = (At v2 )t v1 = (Av2 )t v1 ,

as A is symmetric. Again

(Av2 )t v1 = ⟨v1 , Av2 ⟩ = ⟨v1 , λ2 v2 ⟩ = λ2 ⟨v1 , v2 ⟩

as λ2 is real. These computations show that (λ1 − λ2 ) ⟨v1 , v2 ⟩ = 0. Since λ1 and λ2 are distinct, it
follows that ⟨v1 , v2 ⟩ = 0. !
Now we can outline a procedure for diagonalizing a real symmetric matrix A; this procedure will
also produce the orthogonal matrix Q such that Qt AQ is diagonal. The first step will be to find the
eigenvalues of A and determine the basis vectors of each eigenspace by the general methods used ear-
lier. The next step is to apply Gram–Schmidt process, as discussed in Section 3.7, to the basis vectors
of each eigenspace to find an orthonormal basis of each one of them separately. Since eigenvectors
belonging to distinct eigenvalues are orthogonal by Proposition (5.3.26), the union of the orthonormal
bases of the eigenspaces yields a basis of Rn . Finally, we form the matrix whose columns are the mem-
bers of the orthonormal bases of the eigenspaces. This is the orthogonal matrix Q such that Qt AQ is
diagonal, the diagonal entries of which are the eigenvalues of A appearing in the same order as their
corresponding eigenvectors in Q.
We illustrate the procedure in the following example.

EXAMPLE 30 Consider the 3 × 3 symmetric matrix

A = [ 2  1  1
      1  2  1
      1  1  2 ] .

It is easy to verify that λ = 1, 4 are the eigenvalues of A with the eigenvalue 1 having
multiplicity 2. Now, for the eigenvalue λ = 1, the matrix

A − λI3 = [ 1  1  1
            1  1  1
            1  1  1 ]

clearly has nullity 2 so that the corresponding eigenspace W1 has dimension 2. We may


choose

u1 = (1, −1, 0)t and u2 = (1, 0, −1)t

as a basis of this eigenspace. For the eigenvalue λ = 4, the matrix

A − λI3 = [ −2   1   1            [ 1  0  −1
             1  −2   1      ∼      0  1  −1
             1   1  −2 ]           0  0   0 ]

and so the dimension of the eigenspace W2 is 1. We choose

u3 = (1, 1, 1)t

as a basis for this eigenspace. The Gram–Schmidt process applied to the basis
{u1 , u2 } yields the following orthonormal basis
v1 = (1/√2, −1/√2, 0)t ,     v2 = (1/√6, 1/√6, −√2/√3)t

of W1 . Finally, normalizing u3 , we obtain


v3 = (1/√3, 1/√3, 1/√3)t .

Thus, the required orthogonal matrix Q will then be given by

Q = [  1/√2     1/√6    1/√3
      −1/√2     1/√6    1/√3
        0     −√2/√3    1/√3 ] .

We leave it to the reader to verify directly that

Qt [ 2  1  1 ]        [ 1  0  0 ]
   [ 1  2  1 ]  Q  =  [ 0  1  0 ] .
   [ 1  1  2 ]        [ 0  0  4 ]

While working out an orthonormal basis of eigenvectors of a real symmetric matrix, the reader
should remember that if the multiplicity of an eigenvalue of such a matrix is r, then it is guaranteed
that there will be r linearly independent eigenvectors for that eigenvalue.
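
For numerical work, the whole procedure is packaged in routines such as numpy.linalg.eigh, which, for a real symmetric matrix, returns an orthonormal basis of eigenvectors directly. The sketch below (assuming the NumPy library) checks Example 30 in this way.

    import numpy as np

    A = np.array([[2.0, 1.0, 1.0],
                  [1.0, 2.0, 1.0],
                  [1.0, 1.0, 2.0]])

    w, Q = np.linalg.eigh(A)    # columns of Q form an orthonormal set of eigenvectors
    print(w)                                        # eigenvalues 1, 1, 4
    print(np.allclose(Q.T @ Q, np.eye(3)))          # True: Q is orthogonal
    print(np.allclose(Q.T @ A @ Q, np.diag(w)))     # True: Q^t A Q is diagonal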

EXERCISES

1. Determine whether the following assertions are true or false giving brief justifications.
(a) No non-zero scalar can be an eigenvalue of the zero operator on a vector space.
(b) Every linear operator on a finite-dimensional vector space has eigenvectors.
(c) Every linear operator on a complex vector space has eigenvalues.
(d) Any two eigenvectors of a linear operator on a vector space are linearly independent.
(e) The sum of two eigenvalues of a linear operator is again its eigenvalue.
(f) If A ∈ Mn (F) is diagonalizable, so is Ak for any positive integer k.
(g) If Ak is diagonalizable for some integer k ≥ 2, then A is diagonalizable.
(h) Any projection on a finite-dimensional vector space is diagonalizable.
(i) A linear operator on a finite-dimensional vector space whose characteristic polynomial fac-
tors into linear factors must be diagonalizable.
(j) A diagonalizable linear operator on an n-dimensional vector space has n distinct eigenvalues.
(k) Two similar matrices have the same eigenspace for a common eigenvalue.
(l) A matrix in Mn (F) is similar to a diagonal matrix only if Fn has a basis of eigenvectors of
the matrix.
(m) The number of linearly independent eigenvectors of a matrix A ∈ Mn (F) belonging to an
eigenvalue λ equals the number of zero rows of the row reduced form of (A − λIn ).
(n) A linear operator on R3 must have a real eigenvalue.
(o) If A ∈ Mn (F) is diagonalizable, so is A + aIn for any a ∈ F.
(p) For matrices A, B ∈ Mn (F), every eigenvector of AB is an eigenvector of BA.
(q) If A ∈ Mn (F) is diagonalizable, then the rank of A is the number of non-zero eigenvalues,
counted according to their multiplicities.
(r) A non-zero real symmetric matrix is invertible.
(s) An orthogonal matrix cannot be a symmetric one.
(t) Any permutation matrix is an orthogonal matrix.
2. In each of the following cases, let T be the linear operator on R2 represented by the given matrix
A with respect to the standard basis of R2 . Find the characteristic polynomial, eigenvalues and
eigenvectors spanning the eigenspaces for each eigenvalue of T :

A = [ 0  1          A = [ 1  1
      0  0 ] ,            1  1 ] ,

A = [ 5  1          A = [ 1 −1
     −7 −3 ] ,            0  1 ] .

3. For each of the following matrices A over the field F, find the eigenvalues and the eigenvectors
spanning the eigenspace for each of the eigenvalue. Also, determine whether any of the matrices
A is diagonalizable; if so, find an invertible matrix P and a diagonal matrix D such that P−1 AP =
D.
(a) A = [ i  1
          2 −i ]   for F = C.

(b) A = [ 0  1
          1  0 ]   for F = R.

(c) A = [ 0  0 −2
          1  2  1
          1  0  3 ]   for F = R.

(d) A = [ −3  1
          −7  3 ]   for F = C.

(e) A = [ 1  2  1
          2  0 −2
          1 −2  3 ]   for F = R.

(f) A = [ −13 −60 −60
           10  42  40
           −5 −20 −18 ]   for F = R.

(g) A = [ 1  2  3  4
          2  4  6  8
          3  6  9 12
          4  8 12 16 ]   for F = R.
4. For each of the following real matrices A, find the eigenvalues and the eigenvectors spanning
the eigenspace for each of the eigenvalue. Also, determine whether any of the matrices A is
diagonalizable; if so, find an invertible matrix P and a diagonal matrix D such that P−1 AP = D.
 
1 0 1
 
(a) A = 0 2 0 .
 
1 0 1
 
 1 0 1
 
(b) A = −1 2 −1 .
 
1 0 1


 
 2 0 1
 
(c) A = −1 1 −1 .
 
1 0 2
5. Prove Proposition (5.3.25). Give an example of a diagonalizable real matrix of order 3 which
cannot have a full set of 3 orthonormal eigenvectors.
6. For each of the following 3×3 real matrices A, determine orthogonal matrices P such that P−1 AP
is diagonal; use the diagonal form of A to compute Ak for any positive integer k:
 
1 1 0
 
(a) A = 1 0 1 .
 
0 1 1
 
1 1 3
 
(b) A = 1 3 1 .
 
3 1 1
 
1/2 1/2 1/4
 
(c) A = 1/4 1/4 1/2 .
 
1/4 1/4 1/2
7. For each of the following permutation matrices, compute the eigenvalues in C:

[ 1  0  0                [ 1  0  0  0
  0  0  1   ,              0  0  0  1
  0  1  0 ]                0  0  1  0
                           0  1  0  0 ] .

8. For real numbers a, b and c, let C be the circulant matrix


 
a b c
 
C =  c a b .
 
b c a

Show that if f (x) = a + bx + cx2 and P the permutation matrix

 
0 1 0
 
P = 0 0 1 ,
 
1 0 0

then C = f (P). Find the eigenvalues of P and hence the eigenvalues of C.


9. Compute the eigenvalues of the permutation matrix

 
0 1 0 0

0 0 1 0
P =  .
0 0 0 1
1 0 0 0

Hence find the eigenvalues of the circulant matrix


 
a b c d 

d a b c 
C =  ,
 c d a b 
b c d a

where a, b, c and d are real numbers. Note that C = f (P), where f (x) = a + bx + cx2 + dx3 .
10. Let A = [ai j ] be the permutation matrix of order n over R such that ai j = 1 if i + j = 2 or n + 2, and ai j = 0 otherwise. Show that the eigenvalues of A are 1 and −1 with multiplicities [n/2] + 1 and [(n − 1)/2], respectively. Here, [x] denotes the largest integer ≤ x.
[Hint: As in Exercise 16 of Section 4.5, show that if T is the linear operator on Rn determined
by A with respect to the standard basis of Rn , then there is another basis of Rn relative to which
the matrix of T is a diagonal one with 1 and −1 as the only diagonal entries.]
11. Determine the trace and the determinant of the matrix in the preceding exercise for any positive
integer n.
12. Show that ±1 and ±i are the possible eigenvalues of the following matrix A over C by considering the eigenvalues of A2 :

A = (1/√3) [ 1   1    1
             1   ω    ω2
             1   ω2   ω4 ] .

Here, ω is a non-real cube root of unity.


13. Generalize the preceding exercise as follows: For any positive integer n ≥ 2, let ω = e2πi/n and A
the Fourier matrix of order n over C given by

A = (1/√n) [ a jk ],    where a jk = ω( j−1)(k−1) .

Show that the possible eigenvalues of A are ±1 and ±i.


[Hint: Use the identity 1 + ω + ω2 + · · · + ωn−1 = 0.]
14. Find the eigenvalues of the linear operator T on R2 which takes the circle {(x1 , x2 ) | x1 2 + x2 2 = 1}
to the ellipse {(x1 , x2 ) | x1 2 /a2 + x2 2 /b2 = 1}.
15. Find the eigenvalues and the corresponding eigenvectors of the linear operator T , on the vector
space R2 [x] of all real polynomials of degree at most 2, given by

T (a0 + a1 x + a2 x2 ) = (3a0 − 2a1) − (2a0 − 3a1)x + 4a2 x2 .

16. Find the eigenvalues and the corresponding eigenvectors of the differentiation operator D on
R3 [x], the real vector space of all real polynomials of degree at most 3.
17. Find the eigenvalues and the corresponding eigenvectors of the following linear operators on
M2 [R], the vector space of the 2 × 2 real matrices:
T [ a  b ]   [   2c     a + c ]         S [ a  b ]   [ c  d ]
  [ c  d ] = [ b − 2c     d   ] ,         [ c  d ] = [ a  b ] .

18. Let A be a diagonalizable matrix of order n over a field F having 0 and 1 as its only eigenvalues.
If the null space of A has dimension m, which of the following are correct assertions?
(a) The characteristic polynomial of A is xm (x − 1)n−m.
(b) Ar = Ar+1 for any positive integer r.
(c) The trace and determinant of A are n − m and 0, respectively.
(d) The rank of A is n − m.
19. For any matrix A ∈ Mn (F) for a field F, show that A and its transpose At have the same set of
eigenvalues.
20. Let T be a linear operator on a vector space V over a field F. Show that T is invertible if and
only if zero is not an eigenvalue of T . Show further that a ∈ F is an eigenvalue of an invertible
operator T if and only if a−1 is an eigenvalue of T −1 .
21. Show that any matrix in Mn (R), with n odd, has at least one real eigenvalue.
22. Let A ∈ Mn (F) be a matrix such that the sum of the entries in each row (column) is a scalar a.
Show that a is an eigenvalue of A.
23. Let A, B ∈ Mn (F). Prove that AB and BA have the same eigenvalues.
(Hint: Consider two separate cases for non-zero eigenvalues and zero as an eigenvalue.)
24. Let x and y be two n × 1 column vectors over a field F. Use Exercise 23 to find the eigenvalues
of xyt .
(Hint: Find two suitable matrices A and B of order n over F such that AB = xyt .)
25. Let ai and bi , for i = 1, 2, 3, be arbitrary elements of a field F. Find the eigenvalues of the matrix

 
a1 b1 a1 b2 a1 b3 
 
a2 b1 a2 b2 a2 b3  .
 
a3 b1 a3 b2 a3 b3

26. Let A and B be matrices in Mn (F) such that a non-zero a ∈ F is a common eigenvalue of AB
and BA. Prove that corresponding eigenspaces of AB and BA have the same dimension.
27. Let A and B be matrices in Mn (F). If any one of A or B is invertible, then prove that AB and BA
have the same characteristic polynomial by showing that they are similar over F.
28. Let A and B be matrices in Mn (F) such that none of them is invertible. Prove that AB and BA
have the same characteristic polynomials by carrying out the following computations:
(a) If A is of rank r, then show that there are P and Q in Mn (F) such that

PAQ = [ Ir  0
        0   0 ] ,

where Ir is the r × r identity matrix and 0 denote zero matrices of appropriate sizes. Let
Q−1 BP−1 be partitioned as follows:

Q−1 BP−1 = [ C  D
             E  F ] ,

where C is an r × r matrix. Show that


PABP−1 = [ C  D
           0  0 ]

and
Q−1 BAQ = [ C  0
            E  0 ] .

(b) Noting that AB and BA are similar to PABP−1 and Q−1 BAQ, respectively, deduce from the
last two matrix equations in (a) that the characteristic polynomials of both AB and BA are
given by

xn−r det(xIr − C).

29. Give an example of two square matrices A and B over any field such that AB and BA are not
similar.
30. Let A be a fixed matrix in Mn (F). If T be the linear operator on Mn (F) given by

T (B) = AB for any B ∈ Mn (F),

then show that T and the matrix A have the same eigenvalues.
31. Let A, B ∈ Mn (F) such that each has n distinct eigenvalues. Prove that AB = BA if and only if
they have the same eigenvectors.
32. Let T be the linear operator on Mn (R) given by T (A) = At , the transpose of A. Find the eigen-
values of T . Show further that there is a basis of Mn (R) with respect to which the matrix of T is
diagonal.
33. If a is an eigenvalue of a linear operator T on a vector space V over a field F, then show that
for any polynomial f (x) over F, the scalar f (a) is an eigenvalue of the operator f (T ). [If f (x) =
a0 + a1 x + · · · + am xm , then f (T ) is the linear operator given by f (T ) = a0 I + a1 T + · · · + am T m ,
where I is the identity map on V]
34. Use the preceding exercise to prove that if f (x) is the characteristic polynomial of a diagonal-
izable operator T on a finite-dimensional vector space V, then f (T ) = z, the zero operator on
V.
35. For any diagonalizable operator T on a finite-dimensional vector space V, prove that V = Im(T )⊕
Ker(T ).
36. Let T be a linear operator on a finite-dimensional vector space V over a field F such that all the
roots of the characteristic polynomial of T are in F. Prove that T is diagonalizable if and only if
for every eigenvalue a of T

Ker(T − aI) = Ker(T − aI)k

for every integer k ≥ 2, where I is the identity operator on V.


37. Let A be an invertible matrix in M2 (R) such that A is similar to A2 . Prove that the characteristic
polynomial of A is either x2 + x + 1 or x2 − 2x + 1.
(Hint: Consider A as a matrix over C and then relate the coefficients of the characteristic poly-
nomial of A to the sums and the products of the eigenvalues of A and A2 .)

38. Let V be the vector space of all the differentiable, real-valued functions on R, and let D be the
function on V given by D( f (x)) = f ' (x), the derivative of f (x). Prove that D is a linear operator
on V and any a ∈ R is an eigenvalue of D.
(Hint: Consider the exponential function on R.)
39. Let A ∈ Mn (F) be diagonalizable with characteristic polynomial

a 0 + a 1 x + a 2 x2 + · · · + xn .

Show that the rank of A is the largest integer i such that an−i ≠ 0.

5.4 MINIMAL POLYNOMIAL


We have seen that the vector space End(V) = EndF (V) of all the linear operators on a finite-dimensional
vector space V over a field F is also a ring. However, to be able to utilize the multiplicative structure
of End(V) in analysing a linear operator T , we need to find some ways to relate different powers of T .
The idea of polynomials in T provides a convenient way to do that.
Consider a vector space V (no restriction on dimension) over a field F and let T be a linear operator
on V. Given a polynomial f (x) over F, say

f (x) = a0 + a1 x + a2 x2 + · · · + am xm ,

where the coefficients a j are scalars from F, we define the symbol f (T ) as follows:

f (T ) = a0 IV + a1 T + a2 T 2 + · · · + am T m , (5.10)

where IV stands for the identity operator on V. Note that being a linear combination of powers of T ,
f (T ) is actually a linear operator on V, that is, f (T ) ∈ End(V).
It is clear from Equation (5.10) that if f (x) = g(x) as polynomials in F[x], then f (T ) = g(T ) as maps
on V for any T ∈ EndF (V). Similarly, if h(x) = f (x) + g(x) or h(x) = f (x)g(x), then h(T ) = f (T ) + g(T )
or h(T ) = f (T )g(T ), respectively.
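
In concrete computations, f (T ) — or its matrix analogue f (A), made precise later in this section — is easily evaluated. The following sketch (in Python, assuming NumPy; the helper name poly_at_matrix is ours) does so by Horner's rule.

    import numpy as np

    def poly_at_matrix(coeffs, A):
        """Return a0*I + a1*A + ... + am*A^m for coeffs = [a0, a1, ..., am]."""
        n = A.shape[0]
        result = np.zeros((n, n))
        for a in reversed(coeffs):   # Horner: ((am*A + a_{m-1})*A + ...) + a0*I
            result = result @ A + a * np.eye(n)
        return result

    P1 = np.array([[1.0, 0.0],
                   [0.0, 0.0]])                    # a projection, so P1^2 = P1
    print(poly_at_matrix([0.0, -1.0, 1.0], P1))    # f(x) = x^2 - x gives the zero matrix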

Definition 5.4.1. For a linear operator T on a vector space V over a field F, one says that T
satisfies the polynomial f (x) ∈ F[x] if f (T ) is the zero operator z of V, that is, if

f (T )v = a0 v + a1 T v + a2 T 2 v + · · · + am T m v = 0 (5.11)

for every v ∈ V (0 denotes the zero vector of V).

Recall that the only linear operator on the zero space over any field F is the zero operator which
clearly satisfies any polynomial over F. Since this will cause technical difficulties, our discussion about
linear operators satisfying polynomials has to be restricted to operators on non-zero vector spaces. So
we assume, for the rest of this section, that all vector spaces are non-zero.
We present some examples of polynomials satisfied by various linear operators.

EXAMPLE 31 Let z be the zero operator on a vector space V over a field F and let f (x) = x. Now
for any v ∈ V, f (z)(v) = z(v) = 0 by the definition of the zero operator. Thus z satisfies
f (x) = x.

EXAMPLE 32 The identity operator IV on any vector space V over a field F satisfies a simple rela-
tion: IV v = v for any v ∈ V. Taking f (x) = x − 1 and T = IV in Equation (5.11), we see
that IV satisfies the polynomial x − 1 over F.
EXAMPLE 33 Any projection P on a vector space V over a field F is a linear operator on V such that
P2 = P. Thus, by definition, a projection P satisfies the polynomial x2 − x = x(x − 1)
over the field F.
In particular, the special projections P1 and P2 on R2 do satisfy the polynomial x2 − x over R.
EXAMPLE 34 Let T : Rn → Rn be the linear transformation defined by its action on the standard
basis of Rn as follows:

T e j = e j+1 for 1 ≤ j ≤ n − 1 and T en = 0.

As we have seen earlier, T n = z, the zero map on Rn , whereas T k is not z


for any k < n. We can, therefore, say that T satisfies the polynomial xn , but not the
polynomials xk for any k < n.
EXAMPLE 35 If T is the linear operator on R3 represented by the diagonal matrix A = diag[c1 , c2 , c3 ] with respect to the standard basis, then the matrix A − c j I3 , for 1 ≤ j ≤ 3, has its jth row equal to the zero row. Therefore, it follows that (A − c1 I3 )(A − c2 I3 )(A − c3 I3 )
is the zero matrix. In terms of the operator T , we thus see that T satisfies the polyno-
mial (x − c1 )(x − c2)(x − c3) over R.
As the last example suggests, we can also talk about matrices satisfying polynomials in complete
analogy to linear operators. Given A ∈ Mn (F) and any polynomial f (x) = a0 + a1 x + a2 x2 + · · · + am xm
over F, we define

f (A) = a0 In + a1 A + a2 A2 + · · · + am Am , (5.12)

where In is the identity matrix of order n. It is clear that f (A) is a matrix in Mn (F). We say that A
satisfies the polynomial f (x) over F if f (A) is the zero matrix in Mn (F).
For example, if A is the diagonal matrix of Example 35, then we have seen that A does satisfy the polynomial (x − c1 )(x − c2 )(x − c3 ) over R. Similarly, the matrix

A = [ 0  0
      1  0 ]

satisfies the polynomial x2 over any field F, but not x.

Annihilators
To facilitate our discussion about polynomials satisfied by an operator, we introduce a new notation.

Definition 5.4.2. Given any linear operator T on a vector space V over a field F, the annihilator
ann(T ) of T is the collection of all polynomials over F satisfied by T . Thus

ann(T ) = { f (x) ∈ F[x] | f (T ) = z},

where z is the zero operator on V.



Thus, f (x) ∈ ann(T ) if and only if the operator f (T ) is such that f (T )v = 0 for all v ∈ V.
The basic properties of ann(T ) are listed in the following proposition.

Proposition 5.4.3. For any linear operator T on a vector space V over a field F, the following
hold. All polynomials are over F.
(a) The zero polynomial is in ann(T ).
(b) The polynomial x ∈ ann(T ) if and only if T is the zero operator on V.
(c) If f (x), g(x) ∈ ann(T ), then f (x) ± g(x) ∈ ann(T ).
(d) If f (x) ∈ ann(T ), then f (x)g(x) ∈ ann(T ) for any g(x).
(e) If ann(T ) has a non-zero polynomial f (x), then ann(T ) has a monic polynomial of degree same
as that of f (x).

We leave the verifications of these properties to the reader as only straightforward applications
of the definition are involved. Property (d) additionally requires the fact that polynomials in T
commute. This follows from the obvious property of multiplication of polynomials over a field:
f (x)g(x) = g(x) f (x).

Proposition 5.4.4. For any linear operator T on a vector space V over a field F and polynomials
f (x) and g(x) over F,
f (T )g(T ) = g(T ) f (T ).

In an exactly analogous manner we can define the annihilator ann(A) of a matrix A ∈ Mn (F):

ann(A) = { f (x) ∈ F[x] | f (A) = 0},                    (5.13)

where 0 is the zero matrix in Mn (F).


It is obvious that all the assertions in the preceding proposition are valid if T is replaced by A.
Now we restrict ourselves to finite-dimensional vector spaces. So let T be a linear operator on a
vector space V of dimension n over F and A ∈ Mn (F) be the matrix of T with respect to a fixed but
arbitrary basis of V. Then, the isomorphism between EndF (V) and Mn (F) induced by the fixed basis
implies that T satisfies a polynomial f (x) over F if and only if A satisfies f (x). Similarly, if A ∈ Mn (F)
satisfies a polynomial over F, and A determines the linear operator T on V with respect to any basis
of V, then T also satisfies the same polynomial. See Section 4.4 for details of the isomorphism which
enables such correspondence between operators and matrices.
It is clear, therefore, that any result about polynomials satisfied by linear operators on a finite-
dimensional vector space has a counterpart for matrices.
It is not clear at this moment whether, given an arbitrary non-zero linear operator T on a vector
space, ann(T ) has any non-zero polynomial. Thus the following result is theoretically important as it
ensures the existence of such polynomials.

Theorem 5.4.5. Let V be a finite-dimensional non-zero vector space over a field F and let T be a
linear operator on V. Then T satisfies at least one polynomial of positive degree over F.

Proof. Assume that dim V = n ≠ 0. Now the vector space EndF (V) of all linear operators has dimension n2 over F (see Theorem 4.3.6). It follows that any set of (n2 + 1) vectors in EndF (V), and in particular, the vectors T 0 = I, T, T 2 , . . . , T n2 , are linearly dependent over the field F. Therefore we
can find scalars a0 , a1 , . . . , an2 in F, not all zero, such that


a0 I + a1 T + a2 T 2 + · · · + an2 T n2 = z,

where z is the zero operator in EndF (V). Note that it cannot be the case that only a0 is non-zero and the rest of the scalars are zeros, for then a0 I = z, which is not possible. Thus, T satisfies the polynomial a0 + a1 x + a2 x2 + · · · + an2 xn2 of degree at most n2 over F, where at least one ai , for i ≥ 1, is non-zero. The last condition implies that the polynomial has positive degree. !
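
The dimension count in the proof can be mimicked numerically: flattening each power of a matrix A of order n into a vector of length n2, the n2 + 1 powers I, A, . . . , A to the power n2 must be linearly dependent, and a dependence relation gives a polynomial satisfied by A. A sketch (assuming NumPy, with a randomly chosen sample A):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    A = rng.standard_normal((n, n))

    # Rows of M are the flattened powers I, A, A^2, ..., A^{n^2}.
    powers = [np.linalg.matrix_power(A, k).flatten() for k in range(n * n + 1)]
    M = np.vstack(powers)

    # A vector in the null space of M^t gives coefficients a_0, ..., a_{n^2}
    # with a_0 I + a_1 A + ... + a_{n^2} A^{n^2} = 0.
    coeffs = np.linalg.svd(M.T)[2][-1]
    f_of_A = sum(c * np.linalg.matrix_power(A, k) for k, c in enumerate(coeffs))
    print(np.allclose(f_of_A, 0))   # True: A satisfies a non-trivial polynomial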

Thus for any linear operator T on a finite-dimensional vector space, ann(T ) has polynomials of
positive degree. In fact, a reader familiar with the concept of ideals in a ring will notice that because
of the third and fourth assertions of Proposition (5.4.3), the following result holds.

Proposition 5.4.6. For any linear operator T on a finite-dimensional vector space over a field F,
ann(T ) is a non-zero ideal in the ring F[x] of polynomials over F.

Note that a non-zero constant polynomial cannot belong to ann(T ).


The fundamental fact about ann(T ) is that it has a special polynomial m(x) with the property that
every other polynomial in ann(T ) is a multiple of m(x); such a polynomial is called a generator of the
ideal ann(T ). The existence of such a polynomial can be inferred from the general theory of ideals in
the ring of polynomials over a field; see, for example, Proposition (5.2.4) in our brief discussion on
polynomials, which states that every non-zero ideal in F[x] has a monic polynomial as a generator.
However, as we shall see presently, the existence of such generators can be proved easily without
invoking ideal theory; our aim in mentioning such theories is to make the reader aware of the general
theoretical framework.

Proposition 5.4.7. Let V be a finite-dimensional vector space over a field F and T a linear oper-
ator on V. Let m(x) be a monic polynomial of the least positive degree in ann(T ).
(a) If f (x) ∈ ann(T ), then m(x) divides f (x) in F[x].
(b) m(x) is unique.

Proof. For any f (x) ∈ ann(T ), by the division algorithm in F[x] or by usual long division, we can find
polynomials q(x) and r(x) in F[x] such that

f (x) = m(x)q(x) + r(x),

where either r(x) = 0 or deg r(x) < deg m(x). Since m(x) and f (x) are in ann(T ), it follows, by sub-
stituting T for x in the last polynomial equation and working inside the ring End(V), that r(T ) = z.
This shows that r(x) ∈ ann(T ). Therefore, if r(x) is non-zero, a suitable scalar multiple of r(x) yields
a monic polynomial in ann(T ) whose degree is strictly smaller than that of m(x). This contradicts our
choice of m(x). This contradiction forces r(x) to be the zero polynomial. Hence m(x) divides f (x).
For the uniqueness part, assume that m1 (x) is another monic polynomial of the least positive degree
in ann(T ). Then, assertion (a) implies that m(x) and m1 (x) are such that each one is a multiple of
the other in F[x]. Since they are both monic polynomials of the same degree, it follows that they are
equal. !

This result leads us to the following important definition.



Definition 5.4.8. Let T be a linear operator on a non-zero vector space V over a field F. The unique
monic polynomial of the least positive degree in ann(T ) is called the minimal polynomial of T over F.

The definition is applicable to infinite-dimensional vector spaces also. However, note that min-
imal polynomials are defined for linear operators on non-zero vector spaces only. One of the rea-
sons for excluding the zero vector space (over any field) is that the zero operator, which is the
only linear operator on such a space, satisfies every polynomial over the field. Thus there can
be no unique monic polynomial satisfied by the operator on the zero space; even if the field is
the finite field of two elements, x and x − 1 are two distinct monic polynomials of least positive
degree.
We restate the proposition preceding the definition in terms of minimal polynomials for future
reference.

Proposition 5.4.9. Given a linear operator T on a vector space over a field F, the minimal polynomial m(x) of T is the unique monic polynomial of least positive degree in F[x] such that m(T ) = z. Further, for any polynomial g(x) such that g(T ) = z, m(x) divides g(x) in F[x].

A similar analysis shows that for any matrix A ∈ Mn (F), there is a unique monic polynomial over F, say m(x), such that

(a) m(A) is the zero matrix in Mn (F);


(b) if f (A) is also the zero matrix for some f (x) in F[x], then m(x) divides f (x) in F[x].

We say that m(x) is the minimal polynomial of A.


It is clear that linear operators and their corresponding matrices have the same minimal polynomi-
als.
We consider some examples next.

EXAMPLE 36 The zero linear operator on a vector space V, and similarly the zero matrix in Mn (F),
has the minimal polynomial x.

EXAMPLE 37 The minimal polynomial of the identity operator on any vector space, and similarly
of the identity matrix in Mn (F), is x − 1.

EXAMPLE 38 Consider the projection P1 : R2 → R2 given by P1 (x1 , x2 ) = (x1 , 0). As we have seen
earlier, P1 satisfies the polynomial x2 − x = x(x − 1) over R. The minimal polynomial
for P1 therefore, is a monic divisor of x(x − 1), and has to be one of three: x, x − 1 or
x(x − 1). If it is x, then P1 has to be the zero map; if it is x − 1, then P1 has to be the
identity map on R2 . It follows that x(x − 1) is the minimal polynomial of P1 .
Similarly, the minimal polynomial of

A = [ 1  0
      0  0 ]

is x(x − 1).
It is clear that for a general projection P on a vector space V, the minimal polyno-
mial has to be x(x − 1) unless P is the identity map (projection onto all of V) or the
zero map (projection onto the zero subspace).

EXAMPLE 39 Recall that, for any field F, the linear operator T on Fn , given by the following action
on the standard basis

T e j = e j+1 for 1 ≤ j ≤ n − 1 and T en = 0,

satisfies the polynomial xn . So the minimal polynomial of T must be a divisor of xn .


Since we also know that T n−1 cannot be the zero operator, we can conclude that the
minimal polynomial of T must be xn .
It follows that the special nilpotent matrix Jn (0), introduced in Definition (4.5.3)
as the matrix of the nilpotent map T , has minimal polynomial xn .
Thus the minimal polynomials of the matrices

J4 (0) = [ 0  0  0  0           J4 (0)2 = [ 0  0  0  0
           1  0  0  0                       0  0  0  0
           0  1  0  0                       1  0  0  0
           0  0  1  0 ]                     0  1  0  0 ]

are x4 and x2 , respectively.

EXAMPLE 40 Let A = diag[c1, c2 , c3 ] be a diagonal matrix over any field F. As we had seen in
Example 35, A satisfies the polynomial (x − c1 )(x − c2 )(x − c3 ) and so the minimal
polynomial is a divisor of this polynomial. Note that if c1 , c2 and c3 are distinct real
numbers, each of the following three matrices:

[ 0      0        0                [ c1 − c2    0        0
  0   c2 − c1     0         ,         0         0        0
  0      0     c3 − c1 ]              0         0     c3 − c2 ]

and

[ c1 − c3     0        0
     0     c2 − c3     0
     0        0        0 ]

has exactly one zero row. Thus for these diagonal matrices A − c1 I, A − c2 I and A − c3 I,
the product of any two cannot be the zero matrix, whereas the product of all the three
is. We, therefore, can conclude that (x − c1 )(x − c2 )(x − c3 ) is the minimal polynomial
of A.
If c1 = c2 ≠ c3 , the minimal polynomial for A is (x − c1 )(x − c3 ), and if c1 = c2 = c3 ,
it is just (x − c1).
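
A quick numerical confirmation of the case c1 = c2 ≠ c3 (a sketch assuming NumPy, with the sample values c1 = c2 = 1 and c3 = 2):

    import numpy as np

    A = np.diag([1.0, 1.0, 2.0])      # c1 = c2 = 1, c3 = 2
    I = np.eye(3)

    print(np.allclose((A - I) @ (A - 2 * I), 0))             # True: A satisfies (x - 1)(x - 2)
    print(np.allclose(A - I, 0), np.allclose(A - 2 * I, 0))  # False False: neither factor alone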

EXAMPLE 41 Consider the real matrix

A = [ 0  0  −1
      1  0   0
      0  1   0 ] .

The minimal polynomial of A is the minimal polynomial of any linear operator that A determines on, say, R3 . Fix a basis v1 , v2 , v3 of R3 . Then, A determines an operator
T on R3 given by

T v1 = v2 , T v2 = v3 , T v3 = −v1 .

To get some idea as to how the powers of T behave, we compute the images of v1
under successive powers of T :

T v1 = v2 , T 2 v1 = v3 , T 3 v1 = −v1 .

Thus, (T 3 + I)v1 = 0. Computing in the same manner, we see that (T 3 + I)v2 = 0 =


(T 3 + I)v3 . Thus, T 3 + I is the zero map on R3 , showing that T satisfies x3 + 1 over R.
Since over R, (x + 1)(x2 − x + 1) is a factorization of x3 + 1 as a product of irreducible
polynomials, apart from x3 + 1 the possible candidates for the minimal polynomial
of T are x + 1 and x2 − x + 1. But it is clear that neither T + I nor T 2 − T + I is the zero
operator; for example, T v1 + Iv1 = v2 + v1 ≠ 0 and T 2 v1 − T v1 + Iv1 = v3 − v2 + v1 ≠ 0
as v1 , v2 and v3 are linearly independent. It follows that the minimal polynomial of
T , and therefore of A is x3 + 1.
We may treat A as a complex matrix, so that the linear operator T determined by A
acts on the three-dimensional vector space C3 over C. As in the case of the real
matrix, we can show that A and T do not satisfy x + 1 or x2 − x + 1.
Therefore even though over C, the polynomial x2 − x + 1 can be factored into linear
factors, none of these linear factors can be the minimal polynomial of T or A. Thus,
even over C, x3 + 1 is the minimal polynomial of both T and A.
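A quick symbolic check of this example (a sketch assuming SymPy; it is not part of the text's argument) verifies that A satisfies x3 + 1 while neither irreducible real factor of x3 + 1 annihilates A, so x3 + 1 is indeed the minimal polynomial.

import sympy as sp

A = sp.Matrix([[0, 0, -1],
               [1, 0, 0],
               [0, 1, 0]])
I3 = sp.eye(3)

print(A**3 + I3 == sp.zeros(3, 3))        # True: A satisfies x^3 + 1
print(A + I3 == sp.zeros(3, 3))           # False: x + 1 is not satisfied
print(A**2 - A + I3 == sp.zeros(3, 3))    # False: x^2 - x + 1 is not satisfied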

Applications of Minimal Polynomials


The minimal polynomial of an operator contains important information about the operator. It is, there-
fore, not surprising to come across non-trivial results about linear operators which are proved using
the idea of minimal polynomial. The rest of the section is devoted to some such results.

Proposition 5.4.10. Let T be a linear operator on a finite-dimensional vector space V over a


field F. Then, T is invertible in EndF (V) if and only if the constant term of its minimal polynomial is
non-zero.

Proof. Let m(x) = a0 + a1 x + a2 x2 + · · · + xk be the minimal polynomial of T .


Assume that the constant term a0 ≠ 0. Now, we may rewrite the equality m(T ) = z as

T (a1 I + a2T + · · · + T k−1 ) = −a0 I,

where I stands for the identity map in EndF (V). Since a0 ≠ 0, the expression −a_0^{-1}(a1 I + a2 T + · · · +
T k−1 ) gives us a well-defined operator in EndF (V). Call it S . In that case, the last equation can be
simply put as

TS = S T = I

showing that T is invertible with S ∈ EndF (V) as its inverse.


Conversely, assume that T is invertible, and if possible, let a0 = 0. Multiplying the relation m(T ) = z
by T −1 , we then obtain

a1 I + a2 T + · · · + ak−1 T k−2 + T k−1 = z

as T T −1 = T −1 T = I. This, however, shows that T satisfies a polynomial of degree less than that of
m(x), the minimal polynomial of T . This contradiction proves that a0 is non-zero. !
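The proof is constructive: the inverse of T is a polynomial in T . Here is a small sketch of the same computation for a matrix (assuming SymPy; the matrix and the code are ours, not the text's) whose minimal polynomial x2 − 6x + 8 has non-zero constant term.

import sympy as sp

A = sp.Matrix([[3, 1],
               [1, 3]])
x = sp.symbols('x')

# One checks directly that A^2 - 6A + 8I = 0, and A is not a scalar matrix, so
# x^2 - 6x + 8 is the minimal polynomial of A; its coefficients are a0 = 8, a1 = -6.
m = A.charpoly(x)
a0, a1 = m.all_coeffs()[-1], m.all_coeffs()[-2]

S = -(a1 * sp.eye(2) + A) / a0            # S = -(a1*I + A)/a0, the operator from the proof
print(S == A.inv())                        # True: T S = S T = I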
Recall that T is singular if it is not invertible (see Definition (4.5.2)).

Corollary 5.4.11. A linear operator T on a finite-dimensional vector space V is singular if and


only if there is non-zero S ∈ EndF (V) such that

T S = S T = z.

Proof. For an invertible T , the relation T S = S T = z, after multiplication by T −1 , implies that S is the
zero map. Thus if the relation holds for some non-zero S ∈ End(V), T cannot be invertible.
Conversely, assume that T is singular. If m(x) is the minimal polynomial of T , by the preceding
proposition, m(x) has no constant term. Therefore, the equality m(T ) = z in EndF (V) has the form

a1 T + a2 T 2 + · · · + ak T k = z

for some scalars a1 , a2 , . . . , ak . Note that k ≥ 1 as the degree of a minimal polynomial is positive. Let
S = a1 I + a2 T + · · · + ak T k−1 . Then, S is a linear operator on V such that S T = T S = z. S is non-zero,
for otherwise T will satisfy a polynomial of degree less than k, contradicting the fact that m(x) has
degree k. The proof is complete. !
Recall that a T ∈ EndF (V) is invertible if and only if ker T = {0}. Thus, the only way T can fail to
be invertible is if there is some non-zero vector v ∈ V such that T v = 0. The last corollary tells us more
about such v; it can be chosen to be S w for any w not in the kernel of S .
There is also an unexpected implication of the same corollary.

Corollary 5.4.12. If T ∈ EndF (V) is right-invertible, then T is invertible.

Proof. Let T ' be the right inverse of T in EndF (V) so that T T ' = I, where I is the identity map in
EndF (V). If T is not invertible, then Corollary (5.4.11) provides a non-zero S in EndF (V) such that
S T = z. But then,

z = (S T )T ' = S (T T ' ) = S I = S ,

a contradiction. !
One can similarly show that left-invertibility implies invertibility.
It is clear that the preceding proposition and its corollaries imply analogous results about matrices
in Mn (F), one of which we have already proved in Chapter 2.

Proposition 5.4.13. Let A ∈ Mn (F).


(a) A is invertible if and only if the constant term of its minimal polynomial is non-zero.
(b) A is singular if and only if there is a non-zero matrix B such that AB = BA = 0n .


(c) If A is either right-invertible or left-invertible, then A is invertible.

We conclude this section by showing that the minimal polynomial is an invariant of a similarity
class of operators or matrices; in other words, the objects within a similarity class have the same
minimal polynomial.

Proposition 5.4.14. Let V be a finite-dimensional vector space over a field F.


(a) If T and S are similar in EndF (V), then they have the same minimal polynomial.
(b) If A and B are similar matrices in Mn (F), then they have the same minimal polynomial.

Proof. We prove the result for matrices. Let B = P−1 AP for some invertible matrix P ∈ Mn (F). We
claim that B satisfies any polynomial that A satisfies. First observe that (P−1 AP)2 = (P−1 AP)(P−1 AP) =
P−1 A(PP−1 )AP = P−1 A2 P. An easy induction then shows that

P−1 Ak P = (P−1 AP)k = Bk

for any positive integer k. Now, if A satisfies a polynomial f (x) = a0 + a1 x + · · · + am xm , after left-
multiplying both sides of the equation f (A) = 0n by P−1 , and right-multiplying by P, we can rewrite
the equation as

a0 In + a1 P−1 AP + a2 P−1 A2 P + · · · + am P−1 Am P = 0n .

Since the products of matrices in each term of the sum can be replaced by appropriate powers of B
according to our preceding observation, it follows that B does satisfy the polynomial f (x). Hence, our
claim.
In particular, B will satisfy the minimal polynomial of A. Hence, according to the matrix analogue of
Proposition (5.4.9), the minimal polynomial of B divides the minimal polynomial of A.
Similarly, as A = Q−1 BQ with Q = P−1 , the minimal polynomial of A divides the minimal polyno-
mial of B. Since minimal polynomials are monic, each dividing the other one implies that they are the
same.
It is clear that the result for linear operators can be proved exactly the same way. !

Recall that given any square matrix of order n or a linear operator on an n-dimensional vector space
(n > 0), the degree of its minimal polynomial does not exceed n2 . However, even for a 2 × 2 matrix, it
may be difficult to find, or guess a polynomial of degree not exceeding 4 which may be a candidate
for its minimal polynomial. For example, consider the linear operator Rθ of rotation of R2 through an
angle θ = π/6. We work with the matrix A of Rθ with respect to the standard basis of R2 . The matrix A
and its powers are listed below:
A = \begin{pmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{pmatrix}, \qquad A^2 = \begin{pmatrix} 1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & 1/2 \end{pmatrix},

A^3 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad A^4 = \begin{pmatrix} -1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & -1/2 \end{pmatrix}.
The reader will agree that it is difficult even to guess a relation among these powers. Fortunately,
certain results we will derive in Section 5.6 will help us in finding the minimal polynomial of an
operator by considering the factors of its characteristic polynomial. One such result states that the
minimal polynomial of an operator or a matrix divides its characteristic polynomial. We have cited
this result, without any proof, so that the reader can use it for the exercises at the end of the section.
The derivation of these results depends on the idea of invariant subspaces which we take up in the
next section.
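To illustrate how the cited result removes the guesswork for Rθ, here is a short sketch (assuming SymPy; not part of the text): the characteristic polynomial of A is x2 − √3 x + 1, which is irreducible over R, and A is not a scalar matrix, so it must also be the minimal polynomial; the computation confirms that A satisfies it.

import sympy as sp

half = sp.Rational(1, 2)
A = sp.Matrix([[sp.sqrt(3) * half, -half],
               [half, sp.sqrt(3) * half]])     # rotation through pi/6
x = sp.symbols('x')

print(A.charpoly(x).as_expr())                 # x**2 - sqrt(3)*x + 1
Z = (A**2 - sp.sqrt(3) * A + sp.eye(2)).applyfunc(sp.simplify)
print(Z == sp.zeros(2, 2))                     # True: A satisfies its characteristic polynomial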

EXERCISES
1. Determine whether the following statements are true or false giving brief justifications. All given
operators are on non-zero finite-dimensional vector spaces.
(a) The minimal polynomial of no non-zero operator can be x.
(b) For every positive integer k, 1 ≤ k ≤ n, there is a matrix A ∈ Mn (F) such that the minimal
polynomial of A is xk .
(c) Every linear operator has a unique minimal polynomial.
(d) If two matrices in Mn (F) have the same minimal polynomial, then they must be similar
over F.
(e) If f (x) and g(x) are the minimal polynomials of linear operators T and S , respectively, then
f (x)g(x) is the minimal polynomial of the composite T S .
(f) If f (x) and g(x) are, respectively, the minimal polynomials of matrices A and B in Mn (F),
then f (x) + g(x) is the minimal polynomial of the sum A + B.
(g) For a linear operator T , the minimal polynomial of T 2 divides the minimal polynomial of T .
(h) For a linear operator T , the minimal polynomial of T divides the minimal polynomial of T 2 .
(i) The characteristic polynomial and the minimal polynomial of a diagonalizable operator are
the same.
(j) The minimal polynomials of a matrix in Mn (F) and its transpose are the same.
(k) If a linear operator on an n-dimensional vector space has n distinct eigenvalues, then its
minimal polynomial has degree n.
(l) The minimal polynomial of the differential operator on Rn [x] is xn+1 .
(m) If f (x) is the minimal polynomial of a linear operator T on a vector space V, then f (x − 1) is
the minimal polynomial of T − IV .
(n) For A, B ∈ Mn (F) the matrices AB and BA have the same minimal polynomial.
2. In each of the following cases, find the minimal polynomial of the linear operator T on the
indicated vector space V:
(a) T (x1 , x2 ) = (x1 , x1 + x2 ) on V = R2 .
(b) T (x1 , x2 , x3 ) = (−x3 , x1 − ix3 , x2 + ix3 ) on V = C3 .
(c) T ( f (x)) = f ' (x) + f (x) on V = R3 [x].
(d) T ( f (x)) = x f ' (x) + f (x) on V = R4 [x].
(e) T (A) = At on V = Mn (F).
(f) T ( f (x)) = f (x + 1) on V = R3 [x].
3. Find the characteristic polynomials and therefore, the minimal polynomials of the following
matrices over the indicated field F:

(i) A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} for F = R.

(ii) A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} for F = R.

(iii) A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} for F = C.

(iv) A = \begin{pmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{pmatrix} for F = R.

(v) A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} for F = R.

(vi) A = \begin{pmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \\ 1 & 1 & 1 \end{pmatrix} for F = C.
4. Can there be a 3 × 3 real non-diagonal matrix whose minimal polynomial is x2 − 5x + 4?
5. Let T be a nilpotent operator of index n on an n-dimensional vector space V (that is, T n is the
zero operator whereas T n−1 is not). Show that, if n > 1, then there is no linear operator S on V
such that S 2 = T .
6. Show that the minimal polynomial of the real matrix
\begin{pmatrix} 0 & -1 \\ 1 & -1 \end{pmatrix}
is x2 + x + 1. Also, find a real matrix of order 4 whose minimal polynomial is x2 + x + 1.


7. Compute the minimal polynomial of the matrix
 
 0 0 1 
 
A =  1 0 1 

−1 1 0

considered a matrix over C. Verify that A is diagonalizable over C but not so if it is considered
a real matrix.
8. Let A = [ai j ] be an n × n matrix over R such that a11 is non-zero. Suppose that the kth row of A is
k times the first row of A for 2 ≤ k ≤ n. Compute the characteristic polynomial and the minimal
polynomial of A.
(Hint: Use suitable properties of determinant to simplify the computation of the characteristic
polynomial.)
9. Give examples of diagonalizable operators on R3
(a) whose characteristic and minimal polynomials are equal;
(b) whose characteristic and minimal polynomials are not equal.
10. Let A and B be two matrices in Mn (F) having the same trace and the same minimal polynomial
of degree n − 1. Prove that the characteristic polynomials of A and B are the same.
11. For each of the following permutation matrices, compute the minimal polynomial over C:
 

1 0 0
 1 0 0 0

  0 0 0 1
0 0 1,  .
 0 0 1 0
0 1 0 
0 1 0 0

Are the minimal polynomials over R the same as those over C?


12. Compute the minimal polynomials of the following permutation matrices over C and over
R:
 

0 1 0
 0 1 0 0

  0 0 1 0
0 0 1,  .
 0 0 0 1
1 0 0 
1 0 0 0
13. Find the minimal polynomial of the real permutation matrix A = [ai j ], where ai j = 1 if i + j = 2
or n + 2, and ai j = 0 otherwise.
14. Give an example of two matrices A and B over R, such that the products AB and BA have
different minimal polynomials.
15. Let A ∈ Mn (F) be an upper triangular matrix with a1 , a2 , . . . , an as its diagonal entries. Find the
conditions on the ai which force the minimal polynomial of A to have degree n and degree 1,
respectively.
16. Let A ∈ M3 (F) be the matrix
 
A = \begin{pmatrix} 0 & 0 & -a_0 \\ 1 & 0 & -a_1 \\ 0 & 1 & -a_2 \end{pmatrix},

where ai ∈ F. Prove that the characteristic polynomial as well as the minimal polynomial of A
over F is x3 + a2 x2 + a1 x + a0.
17. For a fixed matrix A ∈ Mn (F), let T be the linear operator on Mn (F) given by

T (B) = AB for any B ∈ Mn (F).

Show that the minimal polynomial of T is the same as the minimal polynomial of the matrix A.
18. Prove that the matrices
   
0 0 1 0 1 0
   
A = 1 0 0 and B = 0 0 1
   
0 1 0 1 0 0

have the same minimal polynomial over C by showing that they are similar over C. Are they similar
over R?
19. Show that


   
1 1 1 3 0 0
   
A = 1 1 1 and B = 0 0 0
   
1 1 1 0 0 0

are similar over R by exhibiting two bases of R3 with respect to which A and B are the matrices
of a linear operator on R3 . Hence find the minimal polynomial of A.
(Hint: Consider the linear operator on R3 which is represented by A with respect to the standard
basis of R3 .)
20. Let A = [ai j ] be an n × n real matrix such that ai j = 1 for all i, j. Find the characteristic and the
minimal polynomial of A by showing that A is similar to B = [bi j ], where b11 = n and bi j = 0 if
either i or j is different from 1.
21. Compute the minimal polynomial of the permutation matrix
 
0 0 1
 
A = 1 0 0
 
0 1 0

over R. Further, if C = A2 + A + I3, then find the minimal polynomial of C, too.


22. Show that the following matrices over any field F have the same characteristic and minimal
polynomial:
   
0 0 0 0 0 0 0 0
1  
0 0 0 1 0 0 0
A =   and B =  .
0 0 0 0 0 0 0 0
 
0 0 1 0 0 0 0 0

Also, verify that A and B are not similar over F by showing that it is not possible to find an
invertible matrix P over F such that

AP = PB.
(Hint: For easier computation, write P as a block matrix \begin{pmatrix} P_1 & P_2 \\ P_3 & P_4 \end{pmatrix}.)
23. Let D be the differential operator on the polynomial ring R[x]. Show that D can have no minimal
polynomial over R by proving that there is no polynomial f (x) over R such that f (D) is the zero
operator on R[x].

5.5 INVARIANT SUBSPACES


A deeper analysis of linear operators on a vector space depends on subspaces on which they act as
linear operators again. The eigenspaces of a linear operator are examples of such spaces. If W is an
eigenspace of a linear operator T belonging to an eigenvalue λ, then for any w ∈ W, the image T w is
again a vector in W, for T (T w) = T (λw) = λT w. Thus, T maps W into W, and we express this property
of W with respect to T by saying that W is invariant under T .
Definition 5.5.1. Let T be a linear operator on a vector space V. A subspace W of V is said to be


T -invariant if for any w ∈ W, T w ∈ W. In other words, W is T -invariant if T (W) ⊂ W.

This definition is valid for infinite-dimensional vector spaces too. It is clear that if W is a T -invariant
subspace, then T can be considered a linear operator on W which is itself a vector space on its own.
First, a few standard examples.

EXAMPLE 42 Any vector space V and its zero subspace {0} are T -invariant for any operator T on V.

EXAMPLE 43 For the identity operator IV on any vector space V, every subspace of V is invariant.

EXAMPLE 44 For any operator T on a vector space V, consider v ∈ ker T . Then T (T v) = T (0) = 0
showing that ker T is T -invariant. Similarly, the subspace Im(T ) is T -invariant.

EXAMPLE 45 Suppose that for an operator T on a vector space V over a field F, there is a vector
v ∈ V such that T n v = 0 for some positive integer n ≥ 2. Let w be any vector in W, the
subspace spanned by the vectors v, T v, T 2 v, . . . , T n−1 v. Since w = \sum_{j=0}^{n-1} a_j T^j v for
a_j ∈ F, it follows that T w = \sum_{j=0}^{n-1} a_j T^{j+1} v = \sum_{j=0}^{n-2} a_j T^{j+1} v as T n v = 0. Thus T w ∈ W,
proving that W is T -invariant.


The following proposition yields a large number of T -invariant subspaces for a
linear operator T .

Proposition 5.5.2. Let T be a linear operator on a vector space V. If S is a linear operator on V


such that T and S commute, that is, T S = S T , then the subspaces ker S and Im(S ) are T -invariant.

Proof. If v ∈ ker S , then

S (T v) = (S T )v = (T S )v = T (S v) = T (0) = 0,

which implies that T v ∈ ker S . It follows that ker S is T -invariant.


On the other hand, for any v ∈ V, T (S v) = S (T v) is clearly in Im(S ). So Im(S ) is T -invariant too. !

Recall that if f (x) = a0 + a1 x + · · · + an xn is a polynomial over a field F, then for any linear operator
T on a vector space V over the field F, the symbol f (T ) = a0 IV + a1 T + · · · + an T n is again a linear
operator on V. Since T commutes with any power of T , it follows that T commutes with f (T ) for any
polynomial f (x). Therefore, the following corollary results.

Corollary 5.5.3. Let V be a vector space over a field F, and T is a linear operator on V. For any
polynomial f (x) ∈ F[x], the subspaces ker f (T ) and Im( f (T )) are T -invariant. In particular, for any
eigenvalue λ of T , the eigenspace W = ker(T − λIV ) is T -invariant.

For deriving the second assertion, take f (x) = x − λ ∈ F[x].


As we have remarked at the outset, a linear operator T can be considered a map on any T -invariant
subspace. We now discuss this idea in detail. Let W be a T -invariant subspace for a linear operator T
on a vector space V over a field F and so T (W) ⊂ W. It is convenient to think of the linear map T from
W to W as different from T ; it is, in fact, different in the sense that its domain and range are restricted
to W.

Definition 5.5.4. If W is a T -invariant subspace of V, then the restriction T W of T to W is defined


as

T W (w) = T w for any w ∈ W.

It is clear that T W is a linear operator on W; note that T W (v) is not defined for v ∉ W.
We now specialize to the case when W is a T -invariant subspace of a finite-dimensional vector
space V. We wish to obtain a matrix representation of T which will reflect the fact that W is a T -
invariant subspace of V. To do so, we first fix a basis, say v1 , v2 , . . . , vm of W, and then extend it to a
basis v1 , v2 , . . . , vm , vm+1 , . . . , vn of V. First consider the matrix of T W on W: if, for 1 ≤ j ≤ m,
T W (v j ) = \sum_{i=1}^{m} b_{i j} v_i

for scalars bi j ∈ F, then the matrix B of T W with respect to the basis of W is in Mm (F) given by B = [bi j ].
Let A be the n × n matrix of T with respect to the extended basis v1 , v2 , . . . , vm , vm+1 , . . . , vn of V.
Since T v j = T W (v j ) for 1 ≤ j ≤ m, it follows that
T v_j = \sum_{i=1}^{m} b_{i j} v_i \quad for 1 ≤ j ≤ m,

for the same scalars bi j which appeared in the matrix B for T W . Therefore, the relations that will yield
the entries of the matrix A relative to the basis v1 , v2 , . . . , vm , vm+1 , . . . , vn of V will look like
T v_j = \sum_{i=1}^{m} b_{i j} v_i + 0 \cdot v_{m+1} + · · · + 0 \cdot v_n \quad for 1 ≤ j ≤ m

and
T v_j = \sum_{i=1}^{n} c_{i j} v_i \quad for m + 1 ≤ j ≤ n, \qquad (5.14)

for some scalars ci j . It follows that the first m columns of the matrix A have all zero entries below the
mth row and that if we ignore these zero entries, the first m columns of A are exactly the same as the
columns of B. In other words, A can be written in terms of blocks of submatrices as
A = \begin{pmatrix} B & D \\ O & C \end{pmatrix}, \qquad (5.15)

where B is the m×m matrix representing T W , D and C are, respectively, m×(n −m) and (n −m) ×(n −m)
matrices with entries determined by Equation (5.14), and finally, O is the (n − m) × m zero matrix. We
illustrate this construction by the following example.

EXAMPLE 46 Suppose that the linear operator T : R4 → R4 has an eigenvalue λ = 2 such that the
eigenspace W = ker(T − 2I) belonging to the eigenvalue 2 has dimension 2. Choose
a basis v1 , v2 of W; they are necessarily eigenvectors of T with eigenvalue 2. Since


W is an eigenspace, it is T -invariant and so the restriction T W is defined and

T W (v1 ) = T v1 = 2v1
T W (v2 ) = T v2 = 2v2 .

Thus, the matrix of T W with respect to the given basis of W of eigenvectors is the
2 × 2 matrix

B = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.

Next, we extend the basis v1 , v2 of W to a basis v1 , v2 , v3 , v4 of R4 . Now, even if we


do not know anything specific about v3 and v4 , it is clear that the shape of the matrix
A of T will be determined by the following relations:

T v1 = 2v1 + 0.v2 + 0.v3 + 0.v4


T v2 = 0.v1 + 2.v2 + 0.v3 + 0.v4
T v3 = c13 v1 + c23 v2 + c33 v3 + c43 v4
T v4 = c14 v1 + c24 v2 + c34 v3 + c44 v4

for some scalars ci j in R. Using the notations of Equation (5.15), we have D, C and
O as 2 × 2 matrices with
D = \begin{pmatrix} c_{13} & c_{14} \\ c_{23} & c_{24} \end{pmatrix}, \qquad C = \begin{pmatrix} c_{33} & c_{34} \\ c_{43} & c_{44} \end{pmatrix} \qquad and \qquad O = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.

Thus, the matrix of T relative to the basis of V will be the block matrix
A = \begin{pmatrix} B & D \\ O & C \end{pmatrix}.
Note that if we had known that v3 and v4 are both eigenvectors of T for
another eigenvalue λ, then D would have been the zero matrix, and C would have
been the diagonal matrix diag[λ, λ], so that A would have been the diagonal matrix
diag[2, 2, λ, λ]. In that case, A will look like

A = \begin{pmatrix} B & O \\ O & C \end{pmatrix},

where both B and C are diagonal matrices.

Direct Sums of Matrices


Keeping in mind the preceding discussion, we now consider the general case of such matrix descrip-
tions of operators when the vector space on which the operator is acting can be expressed as a direct
sum of invariant subspaces. So assume that for a linear operator T on a finite-dimensional vector space
V, we have

V = W1 ⊕ W2 ⊕ · · · ⊕ Wk
where W1 , W2 , . . . , Wk are T -invariant subspaces. Denote the restriction of T to Wi as T i . Assume


further that we have chosen basis Bi for each Wi , and the matrix of the restriction T i with respect to
this basis of Wi is Ai . Now, by the properties of direct sum (see Proposition 3.5.4), the union B of the
bases Bi is a basis of V. Recall that for us any basis is an ordered basis. Thus the ordered basis B is
more than just a union of the ordered bases Bi ; we require that in B, the vectors of Bi will precede
those of Bi+1 , and will appear exactly in the same order in B as in Bi . It is as if we are stringing
together the bases Bi to obtain B. Let the matrix of T relative to the basis B be A. We want to relate A
to the various Ai .
Note that, by the definition of the matrix A, non-zero entries in the columns of A determined by the
image of any vector in Bi under T can occur only in the rows corresponding to the vectors in Bi , as
these vectors span the T -invariant subspace Wi . Since the action of T on the basis vectors of Bi is the
same as the action of T i , it follows that the entries in the rows and columns in A corresponding to the
vectors of Bi form precisely the submatrix Ai . Therefore, we can represent A as the following block
diagonal matrix:
 
A = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{pmatrix},

where the presence of the symbols 0 reflects the fact that the entries in the rows and columns corre-
sponding to each Ai other than its own entries are all zeros.
It will be convenient to call such a block diagonal matrix a direct sum of the matrices
A1 , A2 , . . . , Ak . We will sometimes write such a direct sum as

A1 ⊕ A2 ⊕ · · · ⊕ Ak . (5.16)

We have proved the following proposition which will be quite useful in determining the simplest
matrix form of linear operators.

Proposition 5.5.5. Suppose that a finite-dimensional vector space V can be decomposed as a


direct sum of T -invariant subspaces W1 , W2 , . . . , Wk . Let A1 , A2 , . . . , Ak be the matrices of the re-
strictions T i of T to Wi with respect to some bases of Wi . If we string together the chosen bases of the
Wi to get a basis of V, then with respect to that basis of V, the matrix of T is the direct sum of the
matrices A1 , A2 , . . . , Ak .

We note that this proposition generalizes the situation obtained for diagonalizable operators. For, a
linear operator T on a finite-dimensional vector space V is diagonalizable if and only if V is a direct
sum of distinct eigenspaces. Each of these eigenspaces is T -invariant, and the matrix of the restriction
of T on such an eigenspace for eigenvalue λ j with respect to a basis of eigenvectors is clearly the scalar
matrix λ j Id j , where d j is the dimension of the eigenspace. The matrix of T , with respect to the basis of
V formed by stringing together the bases of the distinct eigenspaces is thus the diagonal matrix, which
is the direct sum of the scalar matrices λ1 Id1 , λ2 Id2 , . . . , λk Idk .
Two remarks about direct sum of matrices are in order.
(i) The converse of Proposition (5.5.5) holds. If the matrix A of a linear operator T on a finite-
dimensional vector space V can be expressed as a direct sum of submatrices, then V can be
decomposed as a direct sum of T -invariant subspaces in such a way that the submatrices are
the matrices of the restrictions of T to these subspaces.
(ii) If
 
A = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{pmatrix}

then, by block multiplication one can see easily that


A^m = \begin{pmatrix} A_1^m & 0 & \cdots & 0 \\ 0 & A_2^m & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_k^m \end{pmatrix} \qquad (5.17)

for any positive integer m.


Using the notation we had introduced in Equation (5.16), we can give a brief description of the
preceding expression for the power of a direct sum of matrices as follows: if

A = A1 ⊕ A2 ⊕ · · · ⊕ Ak ,

then for any positive integer m,

A^m = A_1^m \oplus A_2^m \oplus \cdots \oplus A_k^m.
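A quick symbolic check of Equation (5.17) for two 2 × 2 blocks (a sketch assuming SymPy; the blocks are arbitrary choices of ours):

import sympy as sp

A1 = sp.Matrix([[2, 1], [0, 2]])
A2 = sp.Matrix([[0, -1], [1, 0]])

A = sp.Matrix(sp.BlockDiagMatrix(A1, A2).as_explicit())    # A = A1 (+) A2 as a 4 x 4 matrix
m = 5
lhs = A**m
rhs = sp.Matrix(sp.BlockDiagMatrix(A1**m, A2**m).as_explicit())
print(lhs == rhs)                                          # True: A^m = A1^m (+) A2^m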

Let us now go back to the general discussion about T -invariant subspaces. The first task is to relate
the minimal and the characteristic polynomials of a linear operator to those of its restriction to an
invariant subspace.

Proposition 5.5.6. Let T be a linear operator on a finite-dimensional vector space V over a field
F and W a T -invariant subspace of V. If T W is the restriction of T to W, then
(a) the minimal polynomial of T W divides the minimal polynomial of T ;
(b) the characteristic polynomial of T W divides the characteristic polynomial of T .

Proof. Let m(x) and mW (x) denote the minimal polynomials of T and T W , respectively. Consider any
polynomial f (x) ∈ F[x] such that f (T ) = z, the zero map on V. This implies that f (T )v = 0 for any
v ∈ V. Thus f (T ) takes every vector of W also to the zero vector of W (the zero vector of W is the
same as that of V), which means that f (T W ) acts as the zero map on W. In other words, T W satisfies
any polynomial that T satisfies. In particular, T W satisfies m(x). By properties (see Proposition 5.4.9)
of minimal polynomials, it follows that mW (x) divides m(x) proving assertion (a).
We deal with assertion (b) now. Let dim V = n and dim W = m. Fix a basis of W, and let B be the
m × m matrix of T W with respect to the fixed basis of W. Extend that basis to a basis of V, and let A be
the matrix of T relative to that basis of V. Then, as in Equation (5.15), we have
A = \begin{pmatrix} B & D \\ O & C \end{pmatrix},

where C is some (n − m) × (n − m) square matrix. It then follows that the characteristic polynomial of
A is the product of the characteristic polynomials of matrices B and C (see Exercise 13). Since the
characteristic polynomials of matrices A and B are the characteristic polynomials of operators T and
T W , respectively, the second assertion follows. !
One can say more about the connection between the characteristic polynomial of T and that of
T W . For that we need to introduce the linear operator induced by T on the quotient space V/W for a
T -invariant subspace W of V.
We have discussed quotient spaces in detail in Section 3.9; here we recall the main points briefly.
For any subspace W of V over a field F, the quotient space V̄ = V/W is the collection of all cosets
v̄ = v + W for v ∈ V; v̄ = ū if and only if v − u ∈ W. V/W can be made into a vector space over F by
defining addition and scalar multiplication of cosets as follows: for any v, u ∈ V and a ∈ F,

v̄ + ū = (v + W) + (u + W) = (v + u) + W = \overline{v + u},

and

a v̄ = a(v + W) = (av) + W = \overline{av}.

Clearly 0̄ = 0 + W = w + W = w̄ (for any w ∈ W) is the additive identity in V/W.


Now let T be a linear operator on V and W be a T -invariant subspace of V. We define T̄ : V̄ → V̄ by

T̄ (v̄) = T̄ (v + W) = T v + W = \overline{T v}.

Since any coset can be represented by different vectors and the definition of T̄ depends on the vectors
representing cosets, we need to verify that T̄ is well-defined, that is, the image of a coset under T̄ is
independent of the choice of the vector representing the coset.
So suppose for vectors v1 and v2 in V, the cosets v̄1 = v̄2 . Thus v1 − v2 ∈ W and so T v1 − T v2 =
T (v1 − v2 ) ∈ W as W is T -invariant. Therefore, by the definition of equality in V̄, we infer that T v1 + W =
T v2 + W, which is another way of saying that T̄ (v̄1 ) = T̄ (v̄2 ). This proves that T̄ is well-defined.
It is an easy exercise to show, using the linearity of T , that T̄ is a linear operator on the quotient
space V̄; one says that T̄ is the operator induced by T on V̄.
Now assume that V is finite-dimensional. In this case, we are interested in relating a matrix
representation of T to that of T̄ . As we have seen in Proposition (3.9.3), if we extend a
basis {w1 , w2 , . . . , wm } of W to a basis {w1 , w2 , . . . , wm , wm+1 , . . . , wn } of V, then the cosets
w̄m+1 , . . . , w̄n form a basis of V̄. Suppose that the matrix of T with respect to this basis of V is A = [ai j ],
whereas the matrix of the restriction T W with respect to the basis {w1 , w2 , . . . , wm } of W is B. Since
W is T -invariant, T w j , for 1 ≤ j ≤ m, is a linear combination of only the basis vectors of W. It follows,
as in the proof of the preceding proposition, that
A = \begin{pmatrix} B & D \\ O & C \end{pmatrix}, \qquad (5.18)
where O represents all the zeros below the entries of B. Next, we observe that, for m + 1 ≤ j ≤ n, the
relation

T w j = a1 j w1 + a2 j w2 + · · · + an j wn

implies that

T̄ (w̄ j ) = am+1, j w̄m+1 + am+2, j w̄m+2 + · · · + an j w̄n .

Thus the matrix of T̄ with respect to the basis {w̄m+1 , w̄m+2 , . . . , w̄n } of V̄ is precisely C, the submatrix
of A in Equation (5.18). Since the same equation shows that the characteristic polynomial of A is the
product of the characteristic polynomials of B and C, we have just proved the first assertion of the
following proposition.

Proposition 5.5.7. Let T be a linear operator on a finite-dimensional vector space V over a field
F and W a T -invariant subspace of V. Let T W be the restriction of T to W and T̄ the linear operator
on the quotient space V̄ = V/W induced by T . Then the following hold.
(a) The characteristic polynomial of T is the product of the characteristic polynomial of T W and
that of T̄ .
(b) The minimal polynomial of T̄ divides the minimal polynomial of T .

Proof. To prove the second assertion, it suffices to show that if T satisfies a polynomial f (x) ∈ F[x],
then T̄ also satisfies f (x). So suppose that T satisfies f (x) = a0 + a1 x + · · · + an xn over F. Thus

a0 v + a1 T v + · · · + an T n v = 0, \qquad (5.19)

for any v ∈ V. Now note that for any v1 , v2 and v in V and scalars a, b ∈ F, by properties of operations
in the quotient space V̄, one has

\overline{a v_1 + b v_2} = a v̄1 + b v̄2 \quad and \quad T̄^k (v̄) = \overline{T^k v}.

Using these properties, one can deduce from Equation (5.19) that, for any v̄ ∈ V̄,

a0 v̄ + a1 T̄ (v̄) + · · · + an T̄^n (v̄) = 0̄.

Thus a0 + a1 T̄ + a2 T̄^2 + · · · + an T̄^n acts as the zero operator on V̄, which shows that T̄ too satisfies the
polynomial f (x).
In particular, T̄ satisfies the minimal polynomial of T and so, by the definition of the minimal
polynomial of T̄ , it divides the minimal polynomial of T . !

Cyclic Subspaces
We now introduce a special invariant subspace that will come up time and again.

Definition 5.5.8. Let T be a linear operator on a vector space V over a field F, and let v be a vector
in V. The subspace of V spanned by the sequence of vectors

v, T v, T 2 v, . . . , T k v, . . .

is known as the T -cyclic subspace generated by v, and denoted by Z(v, T ).


If Z(v, T ) = V, then sometimes it is said that v is a T -cyclic vector for V, or simply that T has a
cyclic vector.

We now observe that for any polynomial f (x) ∈ F[x], the operator f (T ) on V can be thought of as a
linear combination (with coefficients from F) of some finitely many of the powers T k for k ≥ 0. Thus
we obtain the following description of Z(v, T ).

Z(v, T ) = { f (T )v | f (x) ∈ F[x] }. (5.20)

The following gives yet another description of a T -cyclic subspace.

Proposition 5.5.9. Let T be a linear operator on a vector space V over a field F. For any v ∈ V,
the T -cyclic subspace Z(v, T ) is the smallest T -invariant subspace of V containing v.

Proof. For any polynomial f (x) ∈ F[x], it is trivial that T f (T ) is again a polynomial in T . Therefore,
for any f (T )v ∈ Z(v, T ), where f (x) is some polynomial over F, T ( f (T )v) = (T f (T ))v is clearly in
Z(v, T ) which shows that Z(v, T ) is T -invariant.
On the other hand, if W is a T -invariant subspace of V containing v, then T v is in W and so
T 2 v = T (T v) is also in W. Continuing in this manner, it can be shown that T k v ∈ W for any integer
k ≥ 0. Since such vectors span Z(v, T ), one concludes that Z(v, T ) is contained in W. The proof of the
proposition is complete. !
Now consider Z(v, T ) for a linear operator T on a finite-dimensional vector space V and for a non-
zero v ∈ V. So the vectors T k v spanning Z(v, T ) cannot all be linearly independent. On the other hand,
as v is non-zero, the singleton {v} is linearly independent. Thus, it is possible to find the largest positive
integer m such that
S = {v, T v, . . . , T m−1 v}

is linearly independent. In that case, S ∪ {T m v} is linearly dependent and so T m v is in the span of the
vectors in S. Suppose that

T m v = c0 v + c1 T v + c2 T 2 v + · · · + cm−1 T m−1 v, (5.21)

for some scalars c0 , c1 , . . . , cm−1 . Then applying T to both sides of the preceding relation, we see that

T m+1 v = c0 T v + c1 T 2 v + · · · + cm−2 T m−1 v + cm−1 T m v, (5.22)

the right hand side of which can be expressed again as a linear combination of vectors in S by replacing
T m v by its expression in Equation (5.21).
It is clear that continuing in the same manner, one can show that T k v for every k ≥ (m + 1) is in the
span of vectors in S. This proves the following.

Proposition 5.5.10. Let T be a linear operator on a finite-dimensional vector space V, and let
Z(v, T ) be the T -cyclic subspace generated by a non-zero vector v in V. Suppose that m is the largest
positive integer such that
S = {v, T v, . . . , T m−1 v}

is linearly independent. Then dim Z(v, T ) = m.


Note that m is also the least positive integer such that T m v is a linear combination of the vectors in
the sequence

v, T v, T 2 v, . . . , T k v, . . .

preceding it.
Thus, a T -cyclic subspace of dimension m has a special basis which can be expressed in terms of
T as well as the generating vector v. It is convenient to name this basis.

Definition 5.5.11. Let T be an operator on a finite-dimensional vector space V, and let v be any
non-zero vector in V. Assume that the dimension of the T -cyclic subspace Z(v, T ) generated by v is m.
Then the basis of Z(v, T ) given by

v, T v, T 2 v, . . . , T m−1 v

is called a T -cyclic basis of Z(v, T ).


In case T m v = 0, we will sometimes refer to the preceding basis of Z(v, T ) as a T -nilcyclic basis of
Z(v, T ), and Z(v, T ) as a T -nilcyclic subspace.

The scalars appearing in Equation (5.21) are important too, and we incorporate them in a polyno-
mial that plays a crucial role in what follows.

Definition 5.5.12. For a linear operator T on a finite-dimensional vector space V and a non-zero
vector v ∈ V, suppose that the dimension of dim Z(v, T ) is m. Suppose further that

T m v = −am−1 T m−1 v − · · · − a1T v − a0v

for some scalars ai (note that ai = −ci of Equation (5.21), for notational convenience). Then the poly-
nomial

fv (x) = a0 + a1 x + · · · + am−1 xm−1 + xm

is called the T -annihilator of v. Sometimes, it is also called the T -annihilator of the subspace Z(v, T ).

It is clear that the T -annihilator fv (x) is the unique monic polynomial of least degree such that
fv (T )v = 0. Note that
• Every non-zero vector in a finite-dimensional space V has a unique T -annihilator for any linear
operator T on V.
• The degree of the T -annihilator of v is m if and only if the dimension of the T -cyclic subspace
Z(v, T ) is m.
Let us look at some examples.

EXAMPLE 47 If v is an eigenvector of T belonging to an eigenvalue λ, then T v = λv, T 2 v = λ2 v, . . .,


so it is clear that Z(v, T ) is one-dimensional with {v} as a basis. Note that the T -
annihilator of the eigenvector v is x − λ, which is the minimal polynomial of T re-
stricted to the eigenspace corresponding to λ.
EXAMPLE 48 If dim V > 1, then the identity map I on V can have no cyclic vector as I k v = v for all
non-negative integers k.

EXAMPLE 49 If T : R3 → R3 is represented by
 
0 0 0
 
1 0 0

0 1 0

with respect to the standard basis, then T has a cyclic vector. In fact, e1 = (1, 0, 0) is
a T -cyclic vector, that is, Z(e1 , T ) = R3 as e1 , T e1 = e2 , T 2 e1 = e3 form a basis of R3 ;
also T 3 e1 = 0. It is also easy to verify that the T -annihilator of e1 is x3 , the minimal
polynomial of T .
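For concrete matrices, the dimension of Z(v, T ) can be computed as the rank of the matrix whose columns are v, T v, T 2 v, . . .; the following sketch (assuming SymPy; not part of the text) does this for the matrix and the vector e1 of Example 49.

import sympy as sp

A = sp.Matrix([[0, 0, 0],
               [1, 0, 0],
               [0, 1, 0]])
v = sp.Matrix([1, 0, 0])                              # e1
n = A.shape[0]

K = sp.Matrix.hstack(*[A**j * v for j in range(n)])   # columns v, Av, A^2 v
print(K.rank())                                       # 3, so Z(e1, T) is all of R^3
print(A**3 * v == sp.zeros(3, 1))                     # True, consistent with the T-annihilator x^3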
These examples suggest an interesting connection between the T -annihilator of a
vector v and the minimal polynomial of the restriction of T to the invariant subspace
Z(v, T ).

Proposition 5.5.13. Let T be a linear operator on a finite-dimensional vector space V, and v be


any non-zero vector in V. Let T v be the restriction of T to the T -invariant, T -cyclic subspace Z(v, T )
generated by v. Then the minimal as well as the characteristic polynomial of T v is precisely the T -
annihilator fv (x) of v.

Note that the degrees of these two polynomials equal the dimension of Z(v, T ).

Proof. Let

fv (x) = a0 + a1 x + · · · + am−1 xm−1 + xm

be the T -annihilator of v. Therefore, the vectors v, T v, . . . , T m−1 v form a T -cyclic basis of Z(v, T ).
Since by the definition of T -annihilator, fv (T )v = 0, and since fv (T ) commutes with every power of
T , it follows that fv (T ) takes every vector in the cyclic basis, and hence every vector of Z(v, T ), to the
zero vector. But the action of T on Z(v, T ) is the same as the action of T v . Thus, we may conclude
that fv (T v ) is the zero map on Z(v, T ). In other words, T v satisfies the polynomial fv (x) on Z(v, T ).
On the other hand, if T v satisfies any polynomial of degree k < m, then it is easy to see that T k v is a
linear combination of the vectors v, T v, . . . , T k−1 v. As k < m, this contradicts the choice of fv (x) as
the T -annihilator of v. It follows that fv (x) has to be the minimal polynomial of T v .
To establish the assertion about the characteristic polynomial of T v , we first work out the matrix of
T v with respect to the T -cyclic basis of Z(v, T ). Note that T and therefore T v applied to any of these
basis vectors in that list v, T v, . . . , T m−1 v except the last one, produces the next vector in the list. On
the other hand, T applied to the last vector in the list produces T m v which can be expressed by using
the T -annihilator fv (x) as the linear combination

T m v = −a0 v − a1 T v − · · · − am−1 T m−1 v


of the basis vectors. Thus the matrix of T v with respect to the T -cyclic basis is the following m × m
matrix:
 
0 0 0 . . . 0 0 −a0 
1 0 
 0 . . . 0 0 −a1 

0 1
 0 . . . 0 0 −a2 
. . . . . .
C =  . (5.23)
 . . . . . .
 . . 
 . . . .

0 0
 0 0 0 0 1 0 −am−2 
0 0 0 0 0 0 0 1 −am−1

Expanding the determinant det(xIm − C) by the first column, and then applying induction, one can
easily show that the characteristic polynomial of C and therefore of T v is

a0 + a1 x + · · · + am−1 xm−1 + xm ,

which is the T -annihilator of v. !

The special type of matrix we have just considered is quite useful and so deserves a name.

Definition 5.5.14. The m × m matrix defined in Equation (5.23) is called the companion matrix of
the monic polynomial a0 + a1 x + a2 x2 + · · · + am−1 xm−1 + xm . If we denote the polynomial as f (x), then
its companion matrix is denoted by C( f (x)).

Note that if A is the companion matrix of the polynomial f (x) = a0 + a1 x + a2 x2 + · · · + am−1 xm−1 +
xm , then
• the order of A equals the degree of f (x);
• all the subdiagonal entries of A are equal to 1;
• the negatives of the coefficients (except the leading one) of f (x) appear on the last column of A;
• every entry off the subdiagonal or the last column is zero;
• the characteristic as well as the minimal polynomial of A are f (x) itself.
Thus, for example, if a matrix is to be produced whose minimal polynomial is a given monic polyno-
mial, one needs only to consider its companion matrix.
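As a small illustration (a sketch assuming SymPy; the polynomial is an arbitrary choice of ours), the following code builds the companion matrix of a concrete monic polynomial following Equation (5.23) and confirms that its characteristic polynomial is the polynomial we started from.

import sympy as sp

x = sp.symbols('x')
f = x**3 - 2*x**2 + 3*x - 4                   # any monic polynomial will do
a = sp.Poly(f, x).all_coeffs()[1:][::-1]      # [a0, a1, a2] = [-4, 3, -2]

n = len(a)
C = sp.zeros(n, n)
for i in range(1, n):
    C[i, i - 1] = 1                           # 1's on the subdiagonal
for i in range(n):
    C[i, n - 1] = -a[i]                       # last column holds -a0, ..., -a_{n-1}

print(sp.expand(C.charpoly(x).as_expr() - f) == 0)   # True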

Cyclic Subspaces for Nilpotent Operator


Nilpotent operators provide nice examples of T -cyclic, T -invariant subspaces. Recall that an operator
T on V is nilpotent of index r, if r is the smallest positive integer such that T r is the zero map on V.
This means that T r−1 is not zero and so we may choose a vector v in V such that T r−1 v is non-zero. It
is now easy to verify that the vectors
v, T v, T 2 v, . . . , T r−1 v (5.24)

are linearly independent. Since the zero vector T r v is a trivial linear combination of the vectors in the
list (5.24), according to Definition (5.5.11), these vectors do form a nilcyclic basis of the T -invariant
T -cyclic subspace W = Z(v, T ) of dimension r.
As T r v is the zero vector, it follows from Equation (5.23) that the matrix of T W , the restriction of
T to W, is the r × r matrix having 1 on the subdiagonal and zeros everywhere else, which is Jr (0), the
elementary Jordan block of order r with eigenvalue 0.

Proposition 5.5.15. Let T be a nilpotent operator on a finite-dimensional vector space V. If the


index of nilpotency of T is r, then there is a T -cyclic, T -invariant subspace W of dimension r such that
the matrix of T W with respect to the corresponding T -nilcyclic basis of W is the elementary Jordan
block Jr (0).
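A short sketch of the proposition in action (assuming SymPy; the nilpotent matrix is an arbitrary choice of ours): for a nilpotent A of index 3, the nilcyclic basis v, Av, A2 v turns A into the elementary Jordan block J3 (0).

import sympy as sp

A = sp.Matrix([[0, 1, 2],
               [0, 0, 3],
               [0, 0, 0]])                    # nilpotent of index 3: A**3 = 0, A**2 != 0
v = sp.Matrix([0, 0, 1])                      # chosen so that A**2 * v is non-zero

P = sp.Matrix.hstack(v, A*v, A**2*v)          # columns form a nilcyclic basis
J3 = sp.Matrix([[0, 0, 0],
                [1, 0, 0],
                [0, 1, 0]])
print(P.inv() * A * P == J3)                  # True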

We end this section by pointing out an interesting relationship between a linear operator T on a vec-
tor space V and the projection on V onto a T -invariant subspace. Recall that (see Proposition (4.2.12))
given a projection P, V can be expressed as W ⊕ K where W is the image of P and K the kernel.

Lemma 5.5.16. Let T be a linear operator on a vector space V, and let P be a projection on V
with range W and kernel K. Then, W and K are T -invariant subspaces of V if and only if T and P
commute.

Proof. The assertion in one direction follows from a general result about commuting operators. If T
commutes with P, then by Proposition (5.5.2), the range and kernel of P are T -invariant. Thus the
special properties of projections are needed only when we prove the lemma in the other direction. So
assume that W and K are T -invariant. Now, as V = W ⊕ K, any v ∈ V can be written as v = v1 + v2 =
Pv1 +v2 . Here, we have used the fact that w ∈ W if and only if Pw = w as W is the range of the projection
P. But K is T -invariant so P(T v2 ) = 0. It follows that P(T v) = P(T (Pv1 )) = T (Pv1 ) as T (Pv1 ) is in W.
Using the fact that Pv2 is the zero vector, we then see that
(PT )v = T (Pv1 ) = T (Pv1 ) + T (Pv2 ) = T P(v1 + v2 )
which is (T P)v. This completes the proof. !
This result can be generalized to the situation when a vector space has a direct sum decomposition
of finitely many subspaces. We follow the notation of Proposition (4.2.13) which asserts the following:
if V = W1 ⊕ · · · ⊕ Wk , and P1 , . . . , Pk are the associated projections with the image of P j as W j , then
Pi P j for i ≠ j is the zero map and P1 + · · · + Pk is the identity map on V.

Proposition 5.5.17. Let T be a linear operator on a vector space V, and let


V = W1 ⊕ W2 ⊕ · · · ⊕ Wk
be a direct sum decomposition of V with associated projections P1 , P2 , . . . , Pk . Then, W j , for each j,
is T -invariant if and only if T commutes with P j .

The proof needs a slight modification of the proof of the preceding lemma and is left to the reader.

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. All given
vector spaces are finite-dimensional.
(a) If every subspace of a vector space is invariant under a linear operator T on V, then T = aI
for some scalar a, where I is the identity operator on V.
(b) If V = W1 ⊕ W2 , where W1 is T -invariant for some linear operator T on V, then W2 is also


T -invariant.
(c) Every linear operator T on R2 has a one-dimensional T -invariant subspace.
(d) If the T -cyclic subspace generated by some non-zero v ∈ V is one-dimensional, then v is an
eigenvector of T for some eigenvalue.
(e) If D is the differential operator on R3 [x], then R3 [x] is D-cyclic.
(f) For a linear operator T on a finite-dimensional vector space V, the T -cyclic subspace gener-
ated by any v ∈ V is the same as the T -cyclic subspace generated by T v.
(g) If P is a projection of a finite-dimensional vector space V, then V = Im(P) ⊕ ker P is a direct
sum of T -invariant subspaces.
(h) If T is a nilpotent operator of index of nilpotency r on a vector space V where r < dim V,
then T has no cyclic vector.
(i) If a linear operator T on a finite-dimensional vector space has a cyclic vector, then so
does T 2 .
(j) If for a linear operator T on a finite-dimensional vector space, T 2 has a cyclic vector, then
so does T .
(k) If (x + 1) is a divisor of the minimal polynomial of a linear operator T on a vector space V,
then there is a vector v ∈ V whose T -annihilator is precisely (x + 1).
(l) If the minimal polynomial of the restriction T W of a linear operator T to a T -invariant sub-
space W is x − λ, then λ is an eigenvalue of T .
(m) If the characteristic polynomial of the restriction T W of a linear operator T to a T -invariant
subspace W is (x − λ)2 , then λ cannot be an eigenvalue of T .
2. Find all T -invariant subspaces of R2 for the linear operator T whose matrix with respect to the
standard basis of R2 is
A = \begin{pmatrix} -1 & 2 \\ -1 & 2 \end{pmatrix}.
3. In each of the following, determine whether the given subspace W of the vector space V is
T -invariant for the given linear operator:
(a) V = R[x], T ( f (x)) = x f ' (x); W = R2 [x];
(b) V = R3 , T (x1 , x2 , x3 ) = (x1 + x2 , x2 + x3 , x3 + x1 ); W = {(a, a, a) | a ∈ R};
(c) V = R3 , T (x1 , x2 , x3 ) = (−x3 , x1 − x3 , x2 − x3 ); W = {(x1 , x2 , x3 ) | x1 + x2 + x3 = 0};
(d) V = Mn (R), T (A) = At ; W = {A | A = At };
(e) V = M2 (R), T (A) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} A; W = {A | A = At }.
4. Let T be the linear operator on the vector space V of all real-valued continuous functions on the
interval [0, 1] given by
(T f )(x) = \int_0^x f (t)\, dt \quad for 0 ≤ x ≤ 1.

Which of the following subspaces of V are T -invariant?


(a) The subspace of all differentiable functions on [0, 1];


(b) The subspace of all f ∈ V such that f (0) = 0;
(c) The subspace of all polynomials of degree at most n;
(d) The subspace spanned by sin x and cos x;
(e) The subspace spanned by {1, sin x, cos x}.
5. Let T be a linear operator on a finite-dimensional vector space V. For any v ∈ V, show that the
T -annihilator of v is a divisor of the minimal polynomial of T .
6. Let T be a linear operator on a finite-dimensional vector space V. Prove that T has a cyclic vector
if and only if there is some basis of V relative to which T is represented by the companion matrix
of the minimal polynomial of T .
7. Let T be a diagonalizable linear operator on an n-dimensional vector space V.
(a) If T has a cyclic vector, show that T has n distinct eigenvalues.
(b) If T has n distinct eigenvalues, and if eigenvectors v1 , v2 , . . . , vn of T form a basis of V,
then show that v = v1 + v2 + · · · + vn is a cyclic vector of T .
8. Let T and S be linear operators on a vector space of dimension n. If S commutes with T , then
show that every eigenspace of T is S -invariant. Hence, prove that if T is diagonalizable with n
distinct eigenvalues and S commutes with T then S itself is diagonalizable.
9. Let T be the linear operator on R3 such that its matrix with respect to the standard basis is
 
1 0 0
 
0 1 0.

0 0 −1

(a) Show that T has no cyclic vector.


(b) Determine the T -cyclic subspaces W1 and W2 generated by v1 = (1, 1, 0)t and v2 =
(1, 1, −1)t , respectively.
(c) Determine the T -annihilators of v1 and v2 .
10. Let T be the linear operator on C3 such that its matrix with respect to the standard basis is
 
\begin{pmatrix} 0 & i & 0 \\ 1 & -1 & -i \\ 0 & 1 & 1 \end{pmatrix}.

Find the T -cyclic subspaces generated by e1 = (1, 0, 0)t and v = (1, 0, i)t , respectively. What are
the T -annihilators of these two vectors?
11. Prove Proposition (5.5.17).
12. Prove in detail that if a matrix A ∈ Mn (F) can be expressed as a block matrix
A = \begin{pmatrix} B & D \\ O & C \end{pmatrix}

then the characteristic polynomial of A is the product of the characteristic polynomials of B and
C.
13. Let T be a linear operator on a finite-dimensional vector space V. Suppose that V can be decom-
posed as a direct sum
V = W1 ⊕ W2 ⊕ · · · ⊕ Wk

of T -invariant subspaces. Let T i be the restriction of T to Wi . Prove that the characteristic polyno-
mial of T is the product of the characteristic polynomials of all the restrictions T 1 , T 2 , . . . , T k .
14. Let T be the linear operator on the real vector space R3 [x] of all real polynomials of degree at
most 3 given by T ( f (x)) = f '' (x), the second derivative of f (x). Let W1 and W2 be the T -cyclic
subspaces generated by v1 = x3 and v2 = x2 , respectively. If T 1 and T 2 are the restrictions of T to
W1 and W2 , respectively, compute the characteristic polynomials of T 1 and T 2 . Hence, find the
characteristic polynomial of T .
15. Let T be the linear operator on R4 whose matrix with respect to the standard basis is
 
1 1 0 0

0 1 −1 0
 .
1
 0 1 0
1 0 0 1

Find the characteristic and minimal polynomials of the restriction T 1 of T to the T -cyclic sub-
space generated by e1 = (1, 0, 0, 0)t . Determine the characteristic and the minimal polynomials
of T without evaluating any determinant.
16. Show that any linear operator on R3 has invariant subspaces of all possible dimensions.
17. For any linear operator T on Rn (n ≥ 2), show that there is a two-dimensional T -invariant sub-
space of Rn .
18. Let T and S be diagonalizable linear operators on a finite-dimensional vector space V such that
T and S commute.
(a) If W is the eigenspace of S for some eigenvalue, then show that W is T -invariant.
(b) Prove that the restriction T W of T to W is diagonalizable and W has a basis consisting of
common eigenvectors of T and S .
(c) Hence prove that there is a basis of V with respect to which matrices of both T and S are
diagonal.
19. Let A and B be diagonalizable matrices in Mn (F) such that A and B commute. Prove that there
is an invertible matrix U ∈ Mn (F) such that both U −1 AU and U −1 BU are diagonal.
The preceding two exercises show that two commuting diagonalizable linear operators (or
two commuting diagonalizable matrices) are simultaneously diagonalizable.
20. Let A and B be diagonalizable matrices in Mn (F) such that A and B commute. Prove that A + B
and AB are also diagonalizable.

5.6 SOME BASIC RESULTS


In this section, we derive some major theoretical results about linear operators by using appropriate
subspaces invariant under the operators. We also explore the relationship between characteristic and
minimal polynomials of linear operators.
We begin by proving the classical result we had referred to earlier, which states that the minimal
polynomial of a linear operator divides its characteristic polynomial. We have chosen, from among
various proofs of the result, one which uses the idea of invariant subspaces; it is remarkable how the
power of the concepts related to invariant subspaces makes the proof truly simple.

Theorem 5.6.1. (Cayley–Hamilton Theorem) Let T be a linear operator on a finite-dimensional


vector space V over a field F. Then T satisfies its characteristic polynomial, that is, if ch(x) is the
characteristic polynomial of T over F, then ch(T ) = z, the zero map on V.

Proof. It needs to be shown that ch(T )v = 0 for any v ∈ V. We can assume that v is non-zero. Let
W = Z(v, T ) be the T -cyclic subspace generated by v, and let fv (x) ∈ F[x] be the T -annihilator of v.
Then, by Proposition (5.5.13), fv (x) is also the characteristic polynomial of the restriction T W of T
to the T -invariant subspace W. Therefore, fv (x) divides ch(x), the characteristic polynomial of T (see
Proposition 5.5.6). Suppose that ch(x) = q(x) fv (x) for some q(x) ∈ F[x]. Then,

ch(T )v = q(T ) fv (T )v = q(T )( fv (T )v) = q(T )(0) = 0,

as the T -annihilator of v takes it to the zero vector. The theorem follows. !

Since the minimal polynomial of T divides any polynomial satisfied by T , we have the following
important result.

Corollary 5.6.2. Let T be a linear operator on a finite-dimensional vector space over a field F.
Then, the minimal polynomial of T divides its characteristic polynomial in F[x].

We reiterate the matrix version of the Cayley–Hamilton theorem next.

Theorem 5.6.3. A square matrix over a field F satisfies its characteristic polynomial. Thus, the
minimal polynomial of a square matrix divides its characteristic polynomial in F[x].

Since the characteristic polynomial of a linear operator on an n-dimensional vector space or of an


n × n matrix has degree precisely n, the following corollary results:

Corollary 5.6.4. The degree of the minimal polynomial of a linear operator on an n-dimensional
vector space, or a matrix of order n, cannot exceed n.
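A quick symbolic check of the matrix version of the theorem for one concrete 3 × 3 matrix (a sketch assuming SymPy; the matrix is an arbitrary choice of ours). The loop evaluates ch(A) by Horner's scheme.

import sympy as sp

A = sp.Matrix([[1, 2, 0],
               [0, 1, -1],
               [3, 0, 2]])
x = sp.symbols('x')
n = A.shape[0]

coeffs = A.charpoly(x).all_coeffs()       # leading coefficient first
chA = sp.zeros(n, n)
for c in coeffs:
    chA = chA * A + c * sp.eye(n)         # Horner evaluation of ch at the matrix A
print(chA == sp.zeros(n, n))              # True: A satisfies its characteristic polynomial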

Eigenvalues and Minimal Polynomials

Recall that a polynomial f (x) in F[x] has a linear factor x − λ (λ ∈ F) if and only if λ is a root of f (x)
in F. Therefore, it follows from the preceding results that any root in F of the minimal polynomial
of a linear operator T on a finite-dimensional vector space V over a field F must be a root of the
characteristic polynomial of T in F, that is, an eigenvalue of T . Conversely, assume that λ ∈ F is an
eigenvalue of T . Thus, for some non-zero vector v ∈ V, T v = λv. Then, it is easy to see that for any
scalar c ∈ F and any positive integer k, (cT k )v = cλk v. It follows that given any polynomial f (x) ∈ F[x],

f (T )v = f (λ)v.
This shows that f (λ) is an eigenvalue of the operator f (T ) on V with v as an eigenvector. In particular,
if m(x) is the minimal polynomial of T , then

0 = m(T )v = m(λ)v

as m(T ) is the zero operator on V. Since v is non-zero, we conclude from the preceding equality that
m(λ) = 0. Thus, we have proved the following proposition.

Proposition 5.6.5. For a linear operator T on a finite-dimensional vector space over a field F, the
eigenvalues of T are precisely the roots of its minimal polynomial in F.

Thus, the roots of the characteristic polynomial and the minimal polynomial of T in the underlying
field F are the same, apart from their multiplicities.
The matrix version of the proposition is clear and we leave it to the reader to formulate and verify
such a version.
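The minimal polynomial itself can be computed directly from the definition: one looks for the first power of the matrix that is a linear combination of the lower powers. The following Python/sympy sketch (an illustration we have added, not a procedure from the text) does exactly this and then compares the roots of the computed minimal polynomial with the eigenvalues, in the spirit of Proposition (5.6.5).

    import sympy as sp

    def minimal_polynomial_of(A, x):
        # Find the first power A^k that depends linearly on I, A, ..., A^(k-1);
        # that dependency yields the monic minimal polynomial of A.
        n = A.shape[0]
        powers = [sp.eye(n)]
        for k in range(1, n + 1):
            powers.append(powers[-1] * A)
            M = sp.Matrix.hstack(*[P.reshape(n * n, 1) for P in powers])
            null = M.nullspace()
            if null:
                c = null[0] / null[0][k]   # normalise so that A^k has coefficient 1
                return sp.Poly(sum(c[i] * x**i for i in range(k + 1)), x)

    x = sp.symbols('x')
    A = sp.Matrix([[0, 0, 0], [1, 0, 0], [0, 1, 0]])   # nilpotent; minimal polynomial x^3
    m = minimal_polynomial_of(A, x)
    print(m.as_expr())                      # x**3
    print(A.eigenvals(), sp.roots(m))       # the eigenvalue 0 is precisely the root of m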
As nonconstant polynomials over F, the characteristic and the minimal polynomial of T can be
factorized uniquely as products of irreducible polynomials over F. Since linear polynomials, that is,
polynomials of degree 1, are irreducible, it follows from the preceding discussion that the characteristic
and the minimal polynomial share the same irreducible factors of degree 1 (the number of times such
factors appear in the respective factorizations of the two polynomials may be different; see Section 5.2
for relevant results).
We now prove the remarkable fact that the characteristic polynomial and the minimal polynomial
share even the irreducible factors of degree greater than 1; again, in general, the number of times each
such irreducible factor appears in the factorizations of these two polynomials need not be the same. We
shall require the following observation, which is a consequence of the uniqueness of factorizations of
polynomials into irreducible factors: if p(x) is an irreducible polynomial over a field F such that p(x)
divides the product f (x)g(x) of polynomials f (x) and g(x) over F, then p(x) divides either f (x) or
g(x).

Proposition 5.6.6. Let V be a finite-dimensional vector space over a field F and T a linear opera-
tor on V. Let ch(x) and m(x) be the characteristic and the minimal polynomial of T over F, respectively.
Then the irreducible factors of ch(x) and m(x) in F[x] are the same.

Proof. The proof is by induction on dim V. If dim V = 1, then the characteristic polynomial of T is
necessarily a linear one, say x − λ for some λ ∈ F. Since the minimal polynomial of T is a divisor of
the characteristic polynomial and has to be a monic polynomial of positive degree, it must also be x − λ.
So the result holds in this case. Let us assume then that dim V > 1. By the induction hypothesis, we
can also assume that the proposition holds for any linear operator on a vector space whose dimension
is less than that of V. We choose any non-zero vector v ∈ V and let W = Z(v, T ), the T -cyclic subspace
of V generated by v; also let V̄ be the quotient space V/W. Note that as dim W ≥ 1, by Proposition
(3.9.3), dim V̄ = dim V − dim W ≤ dim V − 1 < dim V. Let T W denote the restriction of T to W and T̄
the operator induced by T on the quotient space V̄. Further, let ch1 (x) and ch2 (x) be the characteristic
polynomials of T W and of T̄ , respectively; then ch(x) = ch1 (x)ch2 (x) by the first assertion of Proposition
(5.5.7).
Coming back to the proof proper, let p(x) be an irreducible factor of ch(x), the characteristic poly-
nomial of T . Since ch(x) = ch1 (x)ch2 (x), it follows from our remark about an irreducible factor of a
product of two polynomials that either p(x) divides ch1 (x) or it divides ch2 (x). In the first case, as W
is a T -cyclic subspace, ch1 (x) is the same as the minimal polynomial m1 (x) of T W (see Proposition
5.5.13) and so p(x) divides m1 (x). On the other hand, m1 (x) divides m(x), the minimal polynomial of
T , by Proposition (5.5.6). Thus we conclude that in this case p(x) divides m(x). Suppose now that the
second case holds. Then, as T̄ is a linear operator on a vector space of dimension less than dim V,
by the induction hypothesis, p(x) divides the minimal polynomial of T̄ . Since Proposition (5.5.7) also
asserts that the minimal polynomial of T̄ divides m(x), we are done in this case too. The proof is
complete by induction. !

The preceding proof is a modified version of a proof due to Prof. M. Leeuwen.


Recall from Theorem (5.3.20) that the characteristic polynomial of a diagonalizable operator T on
a finite-dimensional vector space V over a field F factors completely into linear factors. Proposition
(5.6.6) then implies the following result for diagonalizable operators, which will be sharpened shortly.

Corollary 5.6.7. Let T be a diagonalizable linear operator on a finite-dimensional vector space V


over a field F. Then its minimal polynomial factors completely into linear factors over F.

We now discuss a couple of examples which illustrate some of the results of this section. We point
out that all the results about the relationships between minimal polynomials, characteristic polynomi-
als and eigenvalues for operators have obvious counterparts for matrices, and we will use them without
any comments.

EXAMPLE 50 If a linear operator on a finite-dimensional vector space or a square matrix has

                (x − λ1)^n1 (x − λ2)^n2 · · · (x − λk)^nk

             as its characteristic polynomial for distinct λi , then its minimal polynomial has to be

                (x − λ1)^r1 (x − λ2)^r2 · · · (x − λk)^rk ,

             where 1 ≤ ri ≤ ni for each i = 1, 2, . . . , k. This follows because any linear factor is an
             irreducible polynomial over any field.
EXAMPLE 51 Let T be a linear operator on R4 having two eigenvalues 1 and 2. Then we cannot
conclude that the characteristic polynomial of T is a product of factors (x − 1) and
(x − 2); there may be an irreducible factor (over R) of degree 2. But if we are given
that T is diagonalizable, then both the characteristic as well as the minimal polyno-
mial of T must be products of only these factors. To determine the multiplicities of
these factors, we need more information about T .
However, if T is an arbitrary operator on C4 having 1 and 2 as the only eigen-
values, then we can conclude that the characteristic polynomial has to be a product
of the factors (x − 1) and (x − 2) as the characteristic polynomial factors completely
into a product of linear factors over C and every such factor (x − λ) will give rise to
an eigenvalue λ. However, as in the preceding case, we cannot say how many times
each factor repeats without knowing more about T . Note that the minimal polyno-
mial of T is also a product of the same linear factors whose multiplicities cannot
exceed the corresponding ones for the characteristic polynomial.

EXAMPLE 52 Consider the linear operator T on R4 whose matrix relative to the standard basis is
 
0 0 0 1

1 0 0 0
A =  .
0 1 0 0
0 0 1 0

Note that A is a companion matrix so the characteristic as well as the minimal


polynomial of A, and hence of T , are x4 − 1. Now the irreducible factors of the mini-
mal as well as the characteristic polynomials of T over R are (x − 1), (x + 1) and (x2 +
1). Therefore, T has two eigenvalues, 1 and −1. It is also clear that T is not diagonal-
izable over R as the characteristic polynomial is not a product of linear factors over R.
However, if T is considered an operator on C4 with the same matrix with respect
to the standard basis, then it has four distinct eigenvalues, namely 1, −1, i and −i. T
is diagonalizable over C and its minimal and the characteristic polynomial have the
same factorization (x − 1)(x + 1)(x − i)(x + i) over C.
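The computations of Example 52 can be replicated mechanically; the short sympy sketch below (our illustration, with the library assumed to be available) recovers the characteristic polynomial x^4 − 1, its factorization over the rationals, and the four complex eigenvalues.

    import sympy as sp

    x = sp.symbols('x')
    A = sp.Matrix([[0, 0, 0, 1],
                   [1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 1, 0]])
    print(A.charpoly(x).as_expr())   # x**4 - 1
    print(sp.factor(x**4 - 1))       # (x - 1)*(x + 1)*(x**2 + 1): the factorization over Q (and R)
    print(A.eigenvals())             # {1: 1, -1: 1, I: 1, -I: 1}: four distinct eigenvalues over C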
Our next goal is to derive a simple but useful necessary and sufficient condition
for an operator to be diagonalizable in terms of its minimal polynomial. Recall from
Corollary (5.6.7) that the minimal polynomial of a diagonalizable operator factors
completely into linear factors over the base field. However, as the following theorem
shows there can be no repetition of factors; in fact, even the converse holds.

Theorem 5.6.8. Let T be a linear operator on a finite-dimensional vector space V over a field F.
T is diagonalizable if and only if the minimal polynomial of T is a product

(x − λ1)(x − λ2 ) . . . (x − λk )

of distinct linear factors over F, that is, λi ≠ λ j for i ≠ j.

Proof. We tackle the easier half first. Assume that T is diagonalizable with distinct eigenvalues
λ1 , λ2 , . . . , λk . Thus, the characteristic polynomial of T must be ∏_{j=1}^{k} (x − λ j )^n j for some positive
integers n j . By hypothesis, V has a basis consisting of eigenvectors of T . But any such basis vector
must be in the kernel of one of the operators T − λ1 I, T − λ2 I, . . . , T − λk I (I denotes the identity map
IV ). Note that these operators are polynomials in T and therefore commute. It follows that

(T − λ1 I)(T − λ2 I) . . . (T − λk I)(v) = 0

for each basis vector v, and so this product of linear operators takes every vector in V to the
zero vector. In other words (T − λ1 I)(T − λ2 I) . . . (T − λk I) = z on V and therefore T satisfies the
polynomial (x − λ1 )(x − λ2 ) . . . (x − λk ). Consequently, the minimal polynomial of T must divide
(x − λ1 )(x − λ2 ) . . . (x − λk ). On the other hand, by Proposition (5.6.6), each of these linear factors,
being an irreducible factor of the characteristic polynomial of T , must be a factor of the minimal
polynomial of T too. We conclude that the minimal polynomial must be (x − λ1 )(x − λ2 ) . . . (x − λk )
proving one part of the theorem.
To prove the converse, assume that the minimal polynomial of T is (x − λ1)(x − λ2 ) . . . (x − λk ), a
product of distinct linear factors. Thus, the composite (T − λ1 I)(T − λ2 I) . . . (T − λk I) is the zero map
on V. Therefore,

V = ker((T − λ1 I)(T − λ2 I) . . . (T − λk I)). (5.25)

Now, by Exercise 21 of Section 4.4, for the composite S R of two linear operators S and R on V, we
have

dim ker(S R) ≤ dim ker S + dim ker R.

So, extending this inequality to the composite of the k operators T − λ1 I, T − λ2 I, . . . , T − λk I on V


(by induction, for example), we see that

dim ker((T − λ1 I)(T − λ2 I) . . . (T − λk I))


≤ dim ker(T − λ1 I) + dim ker(T − λ2 I) + · · · + dim ker(T − λk I)
= dim(ker(T − λ1 I) ⊕ ker(T − λ2 I) ⊕ · · · ⊕ ker(T − λk I)),

where the last equality follows as the sum of distinct eigenspaces is a direct sum (see the remarks
following Example 23). Equation (5.25) then shows that

dim V ≤ dim(ker(T − λ1 I) ⊕ ker(T − λ2 I) ⊕ · · · ⊕ ker(T − λk I)).

Since the direct sum of the eigenspaces is a subspace of V, it follows that the preceding inequality is
an equality and so by properties of direct sums of subspaces (see Proposition 3.5.6), V is the direct
sum of the distinct eigenspaces of T . This completes the proof. !
It is easy to formulate and prove the following matrix version of the theorem.

Theorem 5.6.9. Let A ∈ Mn (F). A is similar to a diagonal matrix over F if and only if the minimal
polynomial of A is a product of distinct linear factors over F.

Theorems (5.6.8) and (5.6.9) are extremely useful because of their simplicity. We give a few ex-
amples illustrating their uses. The reader should compare these with similar examples that we worked
out previously about diagonalizability using Theorem (5.3.20).

EXAMPLE 53 Consider the 4 × 4 matrix


 
0 0 0 0

1 0 0 0
A =  .
0 1 0 0
0 0 1 0

We have seen, in Section 5.5, that the minimal polynomial of A is x4 (it is its charac-
teristic polynomial too). The linear factor x = (x−0) repeats four times in the minimal
polynomial and so by Theorem (5.6.9), A cannot be similar to a diagonal matrix over
any field.
Similarly the nilpotent matrix of order n represented by the Jordan block Jn (0)
has minimal polynomial xn , and so, cannot be similar to a diagonal matrix over any
field for n > 1.

EXAMPLE 54 Consider the real matrix of Example 23 of Section 5.3 given by


 
15 −12 −16
 
A =  4 −2 −5.
 
9 −8 −9
             Since A was seen to have the two eigenvalues 1 and 2, the results of this section
             imply that A will be similar to a diagonal matrix only when its minimal polynomial is
(x − 1)(x − 2). Now,
  
14 −12 −16 13 −12 −16
  
(A − I3)(A − 2I3) =  4 −3 −5   4 −4 −5 
  
9 −8 −10 9 −8 −11
is clearly a non-zero matrix, so that A does not satisfy (x − 1)(x − 2). It follows that A
cannot be diagonalizable.
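A quick floating-point check of Example 54 (our addition, using numpy) confirms that (A − I3)(A − 2I3) is not the zero matrix, so A cannot satisfy (x − 1)(x − 2):

    import numpy as np

    A = np.array([[15., -12., -16.],
                  [4., -2., -5.],
                  [9., -8., -9.]])
    I = np.eye(3)
    print(np.allclose((A - I) @ (A - 2 * I), 0))   # False: A does not satisfy (x - 1)(x - 2)
    print(np.round(np.linalg.eigvals(A), 6))       # approximately 1, 1, 2: the only eigenvalues are 1 and 2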

EXAMPLE 55 Let A be the following 4 × 4 real matrix


 
1 0 0 0

1 1 0 0
A =  .
0 0 2 0
0 0 0 2
Since this is a lower triangular matrix, the entries along the diagonal are the eigen-
values. So, 1 and 2 are the eigenvalues, each repeating. The shape of the matrix itself
suggests that A cannot satisfy the polynomial (x − 1)(x − 2). We leave it to the reader
to verify that the product (A − I4 )(A − 2I4 ) is not the zero matrix, and so (x − 1)(x − 2)
cannot be the minimal polynomial of A. Thus, A is not diagonalizable.
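The same kind of check works for Example 55; the sympy sketch below (added for illustration; is_diagonalizable is a standard sympy method) reaches the same conclusion.

    import sympy as sp

    A = sp.Matrix([[1, 0, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 2, 0],
                   [0, 0, 0, 2]])
    print((A - sp.eye(4)) * (A - 2 * sp.eye(4)) == sp.zeros(4))   # False: (x - 1)(x - 2) is not satisfied
    print(A.is_diagonalizable())                                  # False, in agreement with Theorem 5.6.9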

Upper Triangular Matrix Representations


Linear operators, which cannot be diagonalized but whose minimal polynomials are products of linear
factors, can still have fairly useful matrix representations. A celebrated result due to Schur states that
for such a linear operator T on a finite-dimensional vector space V, one can find a basis with respect
to which the matrix of T is upper triangular. We shall need the following lemma about eigenvalues of
an induced operator for the proof of Schur’s theorem.

Lemma 5.6.10. Let T be a linear operator on a finite-dimensional vector space V such that its
minimal polynomial is a product of linear factors. Let W be a proper T -invariant subspace of V and
T̄ the operator on the quotient space V̄ = V/W induced by T . Then some eigenvalue of T̄ is also an
eigenvalue of T .

Proof. Since W is a proper subspace of V, the quotient space V̄ = V/W is a non-zero finite-dimensional
vector space and so the minimal polynomial m1 (x) of the induced operator T̄ on V̄ has degree at least
1. On the other hand, by Proposition (5.5.7), the minimal polynomial m1 (x) of T̄ divides the minimal
polynomial m(x) of T . As m(x) is a product of only linear factors, at least one such linear factor, say
x − λ, is also a factor of m1 (x). Then λ is clearly an eigenvalue of T̄ as well as of T . !

Note that if λ is the eigenvalue of T̄ as postulated in the lemma, then there is a non-zero v̄ ∈ V̄ such
that (T̄ − λĪ)v̄ = 0̄ (the zero vector of V̄), where Ī is the identity operator of V̄ induced by the identity
operator I of V. Since v̄ is non-zero in V̄ = V/W, it follows that v ∉ W whereas (T − λI)v ∈ W. This
conclusion, which follows from the lemma, will be key in the proof of the following result.

Proposition 5.6.11. Let T be a linear operator on a finite-dimensional vector space V over a field
F. If the minimal polynomial of T is a product of linear factors over F, then there is a basis of V with
respect to which the matrix of T is upper triangular.

Proof. We show how to construct the required basis {v1 , v2 , . . . , vn } of V. Since the matrix of T has to
be upper triangular, it follows that the vectors of the basis have to be chosen so as to satisfy the condition
that for any k ≥ 1, T vk is in the span of v1 , v2 , . . . , vk . We begin our construction by choosing v1 to be
any eigenvector of an eigenvalue, say λ1 , of T (note that T has at least one eigenvalue as its minimal
polynomial is a product of linear factors); since T v1 = λ1 v1 , the required condition is satisfied. Let W1
be the subspace spanned by v1 ; W1 is trivially T -invariant. If W1 = V, we are clearly done. Otherwise
W1 is a proper subspace of V and so we can apply the lemma to find a vector v2 ∉ W1 such that for some
eigenvalue λ2 of T , (T − λ2 I)v2 ∈ W1 . The choice of v2 implies that {v1 , v2 } are linearly independent
and that
T v2 = a12 v1 + λ2 v2 , (5.26)
for some a12 ∈ F. If W2 is the subspace spanned by v1 , v2 , then Equation (5.26) shows that W2 is
T -invariant. Now if W2 = V, we are done as {v1 , v2 } is the required basis. Otherwise we continue in
a similar manner. To be precise, suppose that we have been able to find linearly independent vectors
v1 , v2 , . . . , vk−1 such that their span Wk−1 is T -invariant and T vk−1 ∈ Wk−1 . If Wk−1 is a proper sub-
space of V, then by the preceding lemma, we can find a vector vk ∉ Wk−1 and an eigenvalue λk of T
such that T vk − λk vk is in Wk−1 . It is clear then that v1 , v2 , . . . , vk−1 , vk are linearly independent and
if Wk is the span of these vectors, then T vk is in Wk . Since V is finite-dimensional, this process must
stop after finitely many steps producing a basis of V with the required property. !

Definition 5.6.12. A linear operator T on a finite-dimensional vector space V over a field F is


called triangulable if there is a basis of V with respect to which the matrix of T is upper triangular.
Similarly, a matrix A ∈ Mn (F) is said to be triangulable if A is similar to an upper triangular matrix in
Mn (F).

It can be easily shown that if U is an upper triangular matrix of order n, whose diagonal elements
are a11 , a22 , . . . , ann (not necessarily distinct), then the characteristic polynomial of U is
ch(U) = (x − a11)(x − a22) · · · (x − ann).
Since the minimal polynomial of U divides ch(U), it follows that the minimal polynomial of U,
and of any matrix similar to U, has to be a product of linear factors. This proves that the converse of the
preceding proposition holds.
We can now present Schur’s complete theorem.

Theorem 5.6.13. (Schur’s Theorem) A linear operator T on a finite-dimensional vector space V


over a field F is triangulable if and only if the characteristic polynomial (or the minimal polynomial)
of T factors completely into a product of linear factors over F.

For algebraically closed fields such as the field C of complex numbers, Schur’s theorem takes a
particularly simple form as any non-constant polynomial (and so characteristic polynomial of any
operator) factors into a product of linear factors.

Corollary 5.6.14. Any linear operator on a finite-dimensional vector space over an algebraically
closed field is triangulable.

The matrix versions are straightforward.

Corollary 5.6.15. (Schur's Theorem) Let A ∈ Mn (F). A is similar to an upper triangular matrix in
Mn (F) if and only if the characteristic polynomial (or the minimal polynomial) of A factors completely
into a product of linear factors over F.

Corollary 5.6.16. If F is an algebraically closed field, then any A ∈ Mn (F) is triangulable.

For some applications, see Exercises 23, 27 and 28.
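Numerically, the triangularization promised by Corollary (5.6.16) is provided by the Schur decomposition. The following sketch (our addition; it assumes numpy and scipy are available) triangularizes the matrix of Example 52 over C.

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[0., 0., 0., 1.],
                  [1., 0., 0., 0.],
                  [0., 1., 0., 0.],
                  [0., 0., 1., 0.]])            # the matrix of Example 52
    T, Z = schur(A, output='complex')           # A = Z T Z^H with Z unitary
    print(np.allclose(Z @ T @ Z.conj().T, A))   # True: A is similar to T
    print(np.allclose(np.tril(T, -1), 0))       # True: T is upper triangular
    print(np.round(np.diag(T), 6))              # the eigenvalues 1, -1, i, -i appear on the diagonal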

EXERCISES

1. Determine whether the following assertions are true or false giving brief justifications. All
given vector spaces are finite-dimensional.
(a) If a triangular matrix A is similar to a diagonal matrix over a field F, then A is already
diagonal.
(b) Any matrix A in Mn (F) such that A2 = A is diagonalizable over F.
(c) Any square matrix over C is diagonalizable.
(d) The restriction of a non-diagonalizable operator T on a finite-dimensional vector space to
a T -invariant subspace can never be diagonalizable.
(e) If every one-dimensional subspace of a finite-dimensional vector space is invariant under a
linear operator T , then T is diagonalizable.
(f) The only nilpotent diagonalizable operator on a non-zero finite-dimensional vector space
is the zero operator.
(g) If zero is the only eigenvalue of a linear operator on a finite-dimensional vector space, then
it must be nilpotent.
(h) If the characteristic polynomial of a linear operator is a product of distinct linear factors,
then it coincides with the minimal polynomial.
(i) The roots in F of the minimal polynomial of a matrix in Mn (F) are precisely its eigenvalues.
(j) Any matrix in Mn (C) is similar to a lower triangular matrix.
(k) If T̄ is the operator on V/W induced by a linear operator T on V, then any eigenvalue of T̄
is an eigenvalue of T .
2. Give an example of each of the following:
(a) A non-zero matrix in M2 (R) which is diagonalizable but not invertible.
(b) A matrix in M2 (R) which is invertible but not diagonalizable.
(c) Diagonalizable matrices A and B in M2 (R) such that A + B is not diagonalizable.


(d) Diagonalizable matrices A and B in M2 (R) such that AB is not diagonalizable.
3. Find the minimal polynomial of
 
0 1 0 1

1 0 1 0
A =  .
0 1 0 1
1 0 1 0
Is A diagonalizable over R? Over C?
4. Let A be a real 6 × 6 matrix which has x^4 (x − 1)^2 as its characteristic polynomial and x(x − 1) as
its minimal polynomial. What are the dimensions of the eigenspaces of A?
5. Can x2 + x + 1 be the minimal polynomial of a real 3 × 3 diagonalizable matrix or a complex
3 × 3 diagonalizable matrix?
6. If A is a diagonalizable matrix in Mn (R) such that Ak = In for some positive integer k ≥ 1, then
show that A2 = In .
7. Let A be a matrix in M3 (R). If A is not similar to a lower triangular matrix over R, then show
that A is similar to a diagonal matrix over C.
8. If zero is the only eigenvalue of a linear operator T on a finite-dimensional complex vector
space, then show that T is nilpotent.
9. Let T be a linear operator on an n-dimensional vector space V such that T k = z, the zero operator
on V, for some positive integer k. Prove that T n = z.
10. Let T be a linear operator on a finite-dimensional vector space V over a field F with minimal
polynomial m(x). For any polynomial f (x) over F, let r(x) be the gcd of f (x) and m(x). Prove
that ker f (T ) = ker r(T ).
11. Let T be a linear operator on a finite-dimensional vector space V over a field F with minimal
polynomial m(x). Prove that for any irreducible polynomial r(x) over F, r(x) and m(x) are
relatively prime if and only if the operator r(T ) is invertible on V.
12. Let T be a diagonalizable operator on a finite-dimensional real vector space V. Prove that there
is no non-zero v ∈ V such that (T 2 + T + I)(v) = 0, the zero vector of V. (I is the identity map
on V).
13. Let T be a diagonalizable operator on a finite-dimensional vector space V over a field F. Show
that for any polynomial f (x) over F such that f (a) is non-zero for any eigenvalue a of T , the
operator f (T ) on V is not only diagonalizable, but also invertible.
14. Let T be a nilpotent operator on a finite-dimensional vector space V over a field F. Show that
for any polynomial f (x) over F such that the constant term of f (x) is non-zero, the operator
f (T ) is invertible, hence cannot be nilpotent.
15. Let T be a linear operator on a finite-dimensional vector space V. Suppose that V can be
decomposed as a direct sum

V = W1 ⊕ W2 ⊕ · · · ⊕ Wk

of T -invariant subspaces. Let T i be the restriction of T to the T -invariant subspace Wi . Prove


that T is diagonalizable if and only if each T i is diagonalizable.
16. Let A be a lower triangular matrix in Mn (F) having distinct eigenvalues a1 , a2 , . . . , ak . Sup-
pose that the algebraic multiplicity of the eigenvalue ai is di . Verify directly that A satisfies its
characteristic polynomial

(x − a1 )d1 (x − a2)d2 · · · (x − ak )dk .

17. Use the preceding exercise to give another proof of Cayley–Hamilton Theorem for matrices in
Mn (C).
18. Prove that any permutation matrix in Mn (C) is diagonalizable.
19. Give an example of a 4 × 4 real permutation matrix P ≠ I4 which is diagonalizable over R.
20. Give an example of a 4 × 4 real permutation matrix which is not diagonalizable over R.
21. Let T be a linear operator on a vector space V, and W be a T -invariant subspace of V. Define T̄
on the quotient space V̄ = V/W by T̄ (v̄) = T (v) + W for any v ∈ V. Verify that T̄ is well-defined,
and that it is a linear operator on the quotient space V/W.
We sketch an alternative proof of Proposition (5.6.6) in the following exercise.
22. Let A ∈ Mn (F), where F is an arbitrary field. If the minimal polynomial m(x) of A is

m(x) = a0 + a1 x + a2 x2 + · · · + ar−1 xr−1 + xr ,

where ai ∈ F, define matrices B j for j = 0, 1, 2, . . . , r − 1, as follows:

B0 = I
B j = A j + ar−1 A j−1 + · · · + ar− j+1 A + ar− j I,

where I is the n × n identity matrix over F.


(a) Show that

B j − AB j−1 = ar− j I for j = 1, 2, . . . , r − 1

and

−ABr−1 = a0 I.

(b) If B(x) is the polynomial with matrix coefficients given by

B(x) = Br−1 + Br−2 x + · · · + B1 xr−2 + B0 xr−1 ,

then prove that (xI − A)B(x) = m(x)I.


(c) Hence prove that the characteristic polynomial of A divides m(x)n .
(d) Finally, use properties of irreducible polynomials over fields to prove that any irreducible
factor of the characteristic polynomial of A divides m(x).
23. Let A ∈ Mn (F) whose characteristic polynomial ch(x) factors over F as follows:

ch(x) = (x − λ1 )d1 (x − λ2 )d2 . . . (x − λk )dk .

Prove that

T r(A) = d1 λ1 + d2 λ2 + · · · + dk λk

and

det(A) = λ1^d1 λ2^d2 . . . λk^dk .

24. Let A be the following matrix in M3 (C):


 
1 0 0 
 
A = 0 ω 0  .
 2 
0 0 ω

Prove that T r(A) = T r(A2 ) = 0, but that A is not nilpotent.


25. Give an example of non-nilpotent matrix A ∈ Mn (C) such that

T r(A) = T r(A2 ) = · · · = T r(An−1 ) = 0.

Can you find a non-nilpotent matrix in Mn (R) with similar properties?


26. Let A ∈ Mn (C) be an invertible matrix such that Ak is diagonalizable for some positive integer
k. Prove that A itself is diagonalizable.
We need Newton’s identities for the following exercises. These identities relate power
sums pk (x1 , x2 , . . . xn ) = x1 k + x2 k + · · · + xn k of n variables x1 , x2 , . . . , xn to elementary
symmetric polynomials ek (x1 , x2 , . . . , xn ) given by

e0 (x1 , x2 , . . . xn ) = 1,
1n
e1 (x1 , x2 , . . . xn ) = xi ,
i=1
1
e2 (x1 , x2 , . . . xn ) = xi x j etc.
i≤ j

In general, e j (x1 , x2 , . . . , xn ) for 1 ≤ j ≤ n is the sum of the products of x1 , x2 , . . . , xn taken


j at a time. Thus, for example, en (x1 , x2 , . . . , xn ) = x1 x2 . . . xn . Then, the Newton’s identities
can be stated as
k
1
kek (x1 , x2 , . . . , xn ) = (−1)i−1ek−i (x1 , x2 , . . . , xn )pk (x1 , x2 , . . . , xn ),
i=1

valid for any positive integer k, (1 ≤ k ≤ n).


27. Let A ∈ Mn (F). If

T r(A) = T r(A2 ) = · · · = T r(An ) = 0,

then show that A is nilpotent.


(Hint: Consider A as a matrix over an extension of F, where the characteristic polynomial
of A factors completely into a product of linear factors and then use Schur’s theorem to reduce
to the case of an upper triangular matrix.)

28. Let A, B ∈ Mn (F). If T r(Ak ) = T r(Bk ) for each positive integer k, (1 ≤ k ≤ n), then show that A
and B have the same characteristic polynomial. Hence show that if A, B ∈ Mn (C) with T r(Ak ) =
T r(Bk ) for each positive integer k, (1 ≤ k ≤ n), then A and B have the same set of n eigenvalues.
29. Let T be a linear operator on a finite-dimensional vector space V such that its minimal polyno-
mial is a product of linear factors. If a proper subspace W of V is T -invariant, then show that
there is some v ∉ W and an eigenvalue λ of T such that (T − λI)v ∈ W.
(Hint: Use Lemma (5.6.10).)

5.7 REAL QUADRATIC FORMS


Real quadratic forms appear naturally in various branches of mathematics (such as co-ordinate geom-
etry of R2 and R3 and applications of linear algebra) as well as in physics, statistics and economics.
Special types of real quadratic forms play an important role in number theory. Thus it is a topic that
every student of mathematics should be familiar with. We are now in a good position to introduce the
reader to real quadratic forms as such forms are closely related to real symmetric matrices which we
have already studied in detail; our aim is to develop enough theory to be able to classify conics in R2
and quadric surfaces in R3 .
We give a couple of examples to show how real symmetric matrices give rise to real quadratic
forms. Consider first the real symmetric matrix
    A = [ 1 2 ]
        [ 2 1 ].

If x denotes the column vector (x1 , x2 )t , then


    xt Ax = (x1 , x2 ) [ 1 2 ] [ x1 ]
                       [ 2 1 ] [ x2 ]

          = (x1 + 2x2 , 2x1 + x2 ) [ x1 ]
                                   [ x2 ]

          = x1^2 + 4x1 x2 + x2^2 .

Thus xt Ax is an expression where the degree of each term is two (the degree of 2x1 x2 is the sum of the
degrees of x1 and x2 ) and so is a homogeneous polynomial of degree 2 in the variables x1 and x2 . Note
that the expression has the form x^2 + 4xy + y^2 in the variables x and y, the kind of second-degree
expression that occurs in the general equation of a conic in R2 .
Similarly, for the symmetric matrix

    A = [  4    1   −1/2 ]
        [  1    2    1/2 ]
        [ −1/2  1/2   1  ],
and x = (x1 , x2 , x3 )t , we see that

    xt Ax = (x1 , x2 , x3 ) [  4    1   −1/2 ] [ x1 ]
                            [  1    2    1/2 ] [ x2 ]
                            [ −1/2  1/2   1  ] [ x3 ]

          = 4x1^2 + 2x2^2 + x3^2 + 2x1 x2 − x1 x3 + x2 x3 ,

which, as in the previous example, is a homogeneous polynomial of degree 2 in variables x1 , x2 and x3 .


Such homogeneous polynomials of degree 2 in a number of variables are also known as real quadratic
forms.
Keeping these examples in mind, we present the general definition of a real quadratic form.

Definition 5.7.1. A real quadratic form q(x1 , x2 , . . . , xn ) in n variables x1 , x2 , . . . , xn is a ho-


mogeneous polynomial of the type

    q(x1 , x2 , . . . , xn ) = Σ_{i=1}^{n} cii xi^2 + Σ_{i<j} ci j xi x j ,        (5.27)

where the coefficients ci j are real numbers.

If one thinks of x1 , x2 , . . . , xn as the components of the column vector x ∈ Rn , then the quadratic
form q can be considered a function on the real vector space Rn ; in that case we denote the quadratic
form as q(x).
Given a quadratic form q, as in Equation (5.27), we can associate a real symmetric matrix A = [ai j ]
with q in the following manner: for 1 ≤ i ≤ j ≤ n, we let

    ai j = cii            if i = j,
    ai j = (1/2) ci j     if i ≠ j,
and set a ji = ai j . Then A = [ai j ] is a real symmetric matrix of order n; we shall call A the matrix of
the quadratic form q. Now, if x is the column vector (x1 , x2 , . . . , xn )t , then the product Ax is also a
column vector whose i-th component is Σ_{j=1}^{n} ai j x j . Thus xt Ax can be expressed as the double sum
    Σ_{i=1}^{n} Σ_{j=1}^{n} ai j xi x j ,        (5.28)

which can be rearranged by grouping the terms for which i = j first. For the rest of the terms, for which
i ≠ j, note that xi x j = x j xi and ai j = a ji . Therefore the double sum in Equation (5.28) can be expressed
as

    Σ_{i=1}^{n} aii xi^2 + 2 Σ_{i<j} ai j xi x j .

We have thus shown that

q(x) = xt Ax.

Conversely, the same argument shows that for any real symmetric matrix A = [ai j ], the product xt Ax
is a real quadratic form in n variables, or equivalently on Rn .
The following result then gives a working definition of a real quadratic form.

Proposition 5.7.2. A real quadratic form in n variables x1 , x2 , . . . , xn is xt Ax for some real


symmetric matrix A of order n and the column vector x = (x1 , x2 , . . . , xn )t ∈ Rn .
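As a quick symbolic check of Proposition (5.7.2), the following Python/sympy sketch (our illustration, not part of the text) expands xt Ax for the 2 × 2 matrix of the first example of this section and recovers the quadratic form computed there.

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    A = sp.Matrix([[1, 2],
                   [2, 1]])
    xvec = sp.Matrix([x1, x2])
    print(sp.expand((xvec.T * A * xvec)[0, 0]))   # x1**2 + 4*x1*x2 + x2**2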

Principal Axes Theorem


Consider a quadratic form q with associated symmetric matrix A ∈ Mn (R). Then we know, by Proposi-
tion (5.3.23), that A has n real eigenvalues, say λ1 , λ2 , . . . , λn , not necessarily distinct. Moreover, by
Proposition (5.3.24), there is an orthogonal matrix Q ∈ Mn (R) such that Q−1 AQ = Qt AQ is the diag-
onal matrix D = diag[λ1, λ2 , . . . , λn ]. Consider the change of coordinates given by x = Qy, where
y = (y1 , y2 , . . . , yn )t . Then

xt Ax = yt Qt AQy = yt Dy,

showing that the quadratic form q can be expressed as

λ1 y21 + λ2 y22 + · · · + λn y2n .

One says that the orthogonal change of coordinates has removed the cross-product terms of q; cross-
product terms refer to those involving any product xi x j for i ≠ j in the expression for q. Note that
the columns of the orthogonal matrix Q form an orthonormal basis of Rn consisting of orthonormal
eigenvectors of A; the ith column of Q is the unit eigenvector belonging to the eigenvalue λi .
Thus, we have the following important result about real quadratic forms.

Theorem 5.7.3. (Principal Axes Theorem) A real quadratic form

    q(x) = q(x1 , x2 , . . . , xn ) = Σ_{i=1}^{n} aii xi^2 + 2 Σ_{i<j} ai j xi x j

on Rn can be reduced to its diagonal form

    q(y) = q(y1 , y2 , . . . , yn ) = Σ_{i=1}^{n} λi yi^2

by an orthogonal change of coordinates x = Qy, where λ1 , λ2 , . . . , λn are the eigenvalues of the real
symmetric matrix A = [ai j ] associated with form q(x1 , x2 , . . . , xn ) and Q is an orthogonal matrix
whose columns form an orthonormal basis of eigenvectors of A.

It must be pointed out that the orthogonal change of co-ordinates x = Qy means simply this: if
a vector in Rn has x = (x1 , x2 , . . . , xn )t as its co-ordinate vector with respect to the standard basis
of Rn , then y = (y1 , y2 , . . . , yn )t is its co-ordinate vector with respect to the orthonormal basis of
Rn constituted by the columns of the orthogonal matrix Q. The important point about an orthogonal
change of coordinates is that such a change does not alter the distances between points of Rn (see part
(b) of Proposition 3.7.14 about properties of orthogonal matrices).
To apply the Principal Axes theorem to a specific real quadratic form, in practice, we adopt the
same procedure used in diagonalizing a real symmetric matrix.

EXAMPLE 56 Consider the real quadratic form

q(x1 , x2 , x3 ) = 2x21 + 2x22 + 2x23 + 2x1 x2 + 2x1 x3 + 2x2 x3 . (5.29)

It is clear that the matrix associated to q is


 
2 1 1
 
A = 1 2 1,
 
1 1 2

the real symmetric matrix which we diagonalized in Example 30 of Section 5.3. We


had shown that A has two eigenvalues λ = 1, 4 and that an orthonormal basis for R3
             can be chosen whose vectors are respectively

                [ 1/√2  ]   [ 1/√6   ]   [ 1/√3 ]
                [ −1/√2 ],  [ 1/√6   ],  [ 1/√3 ],
                [ 0     ]   [ −√2/√3 ]   [ 1/√3 ]

where the first two are eigenvectors for λ = 1 and the third an eigenvector for λ = 4.
Therefore if Q is the orthogonal matrix whose columns are these basis vectors, then
   
2 1 1 1 0 0 
t    
Q 1 2 1 Q = 0 1 0 .
  
1 1 2 0 0 4

It then follows from the proof of the Principal Axes theorem that the following
change of co-ordinates

   √ √ √  
 x1   1/ √2 1/ √6 1/ √3 y1 

 x2  = −1/ 2   
y ,
   √1/ √6 1/ √3  2 
x3 0 − 2/ 3 1/ 3 y3

transforms the given quadratic form to

q(y1 , y2 , y3 ) = y21 + y22 + 4y23 . (5.30)



Observe that the preceding matrix equation gives the following explicit relations
between the original and the new co-ordinates:
                x1 = (1/√2) y1 + (1/√6) y2 + (1/√3) y3
                x2 = −(1/√2) y1 + (1/√6) y2 + (1/√3) y3
                x3 = −(√2/√3) y2 + (1/√3) y3 .
             It should be clear to the reader that the preceding transformation of the quadratic form
             by the matrix method amounts, in practical terms, to showing that substituting each
             xi by its formula in terms of the yi in the expression for q in Equation (5.29) yields
             Equation (5.30). One cannot but marvel at the way the matrix method finds the correct
             substitution, the one that removes the cross-product terms in q.
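In floating-point arithmetic, the orthogonal change of coordinates of Example 56 can be produced by any routine that orthogonally diagonalizes a symmetric matrix. The sketch below (our addition, using numpy's eigh) finds an orthonormal eigenbasis of the matrix A of Example 56; the basis returned need not be the one chosen above, but it removes the cross-product terms just the same.

    import numpy as np

    A = np.array([[2., 1., 1.],
                  [1., 2., 1.],
                  [1., 1., 2.]])
    eigvals, Q = np.linalg.eigh(A)                     # orthonormal eigenbasis in the columns of Q
    print(np.round(eigvals, 6))                        # [1. 1. 4.]
    print(np.allclose(Q.T @ A @ Q, np.diag(eigvals)))  # True: no cross-product terms in the new coordinates
    print(np.allclose(Q.T @ Q, np.eye(3)))             # True: the change of coordinates is orthogonal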
As our main application of real quadratic forms, we now classify conics in R2 and R3 .
Conic Sections
From analytic geometry, we know that a quadratic equation of the form

ax2 + 2bxy + cy2 + dx + ey + f = 0, (5.31)

where the coefficients are all real and in which a, b and c are not all zeros, represents a conic section.
It means that for suitable choices of the coefficients in Equation (5.31), the resultant equation repre-
sents a circle, an ellipse, a hyperbola or a parabola in general; however, equation (5.31) also includes
degenerate cases as well as cases with empty solutions such as x2 + y2 + 1 = 0. We now present a brief
discussion of the classifications of the conics represented by Equation (5.31).
The nature of the conic represented by Equation (5.31) is determined largely by the quadratic form

ax2 + 2bxy + cy2

associated with the expression in Equation (5.31) and so by the corresponding real symmetric matrix
    A = [ a b ]
        [ b c ].
Equation (5.31) then can be represented as
    X t AX + Bt X + f = 0    for X = (x, y)t ∈ R2 and B = (d, e)t .

Note that, as in analytic geometry, we are denoting the components of a vector as x and y and not as
x1 and x2 .
As A is a real symmetric matrix, it has two real eigenvalues, say λ1 and λ2 . Now, by the Principal
Axes Theorem (5.7.3), there is an orthogonal matrix Q of order 2 such that the change of coordinates
X = QX ' , where X ' = (x' , y' )t , diagonalizes the quadratic form X t AX to the form λ1 x' 2 +
λ2 y' 2 . Thus the expression in Equation (5.31) reduces in the x' y' -plane to

λ1 x' 2 + λ2 y' 2 + d1 x' + e1 y' + f = 0, (5.32)


where Bt QX ' = d1 x' + e1 y' . Next, observe that as Q is an orthogonal matrix, QQt = I2 and so det Q =
±1. If det Q = −1, consider the matrix P obtained from Q by interchanging its columns. It is easy
to see that P is an orthogonal matrix with det P = 1 and the change of coordinates X = PX ' reduces
the quadratic form X t AX to the diagonal form X ' t diag[λ2, λ1 ]X ' . In other words, without any loss of
generality, we can assume that the orthogonal matrix Q, for the change of coordinates X = QX ' , has
determinant 1, that is, Q is a rotation in R2 (see Definition 3.7.15).
We now proceed to classify the curves given by Equation (5.31) using the equivalent Equation
(5.32).

Case 1: Both the eigenvalues λ1 and λ2 of A are non-zero.


In case the eigenvalues have the same sign, then completing the squares in Equation (5.32), we can
put it in the form
λ1 (x' + d2 )2 + λ2 (y' + e2 )2 = f2 , (5.33)

for some real numbers d2 , e2 and f2 .


If f2 is non-zero and has the same sign as λ1 and λ2 , then the translation x'' = x' +d2 and y'' = y' +e2
reduces Equation (5.33) (after replacing x'' and y'' by x and y) to the standard form of an ellipse

    x^2/α^2 + y^2/β^2 = 1,

where α = √( f2 /λ1 ) and β = √( f2 /λ2 ) are non-zero positive reals. 2α and 2β are the lengths of the axes
of the ellipse along the new x-axis and y-axis, respectively; the larger of these two is the length of the
major axis whereas the other one is the length of the minor axis. Note that all we have done is to rotate
the coordinate axes of the original xy-plane (through the rotation Q) and then shift the origin (by a
translation) so that the axes of the new xy-plane are the axes of the ellipse.
The degenerate subcases will occur if non-zero f2 has the sign opposite to that of the eigenvalues in
which case Equation (5.33) has no graph (imaginary ellipse) or if f2 = 0 in which case Equation (5.33)
represents a pair of straight lines given by x' + d2 = 0 = y' + e2 .
If the non-zero eigenvalues have opposite signs, then as in the preceding case, after completing the
square we can put Equation (5.32) in either of the two forms

    x^2/α^2 − y^2/β^2 = ±1,
which are the standard forms of hyperbolas, or in the degenerate case in the form

λ1 (x' + d2 )2 − λ2 (y' + e2 )2 = 0,

which again represents a pair of straight lines.

Case 2: One of the eigenvalues is zero.


Without any loss of generality, we assume that λ1 = 0. In this case, Equation (5.32) can be rewritten
as

λ2 (y' + e2 )2 = d2 x' + f2 , (5.34)

where e2 , d2 and f2 are real numbers. The degenerate cases will arise for d2 = 0; if f2 = 0 too, then
Equation (5.34) is a pair of coincident lines; otherwise it will represent a pair of parallel lines (in case
λ2 and f2 have the same sign) or will not have a solution (in case λ2 and f2 have opposite signs). On
the other hand, if d2 is non-zero, then Equation (5.34) can be rewritten as the standard equation of a
parabola after performing suitable translation:

y2 = 4ax.

Case 3: Both the eigenvalues are zero.


This case occurs only when the matrix A is the zero matrix. So, Equation (5.31) represents the
straight line dx + ey + f = 0.
We now illustrate the classification provided by the preceding discussion in a couple of examples.

EXAMPLE 57 The symmetric matrix


                A = [ 2 1 ]
                    [ 1 2 ]

has eigenvalues 3 and 1. Therefore, the equation

2x2 + 2xy + 2y2 − 4 = 0

represents an ellipse, as the eigenvalues of the associated matrix A are both positive
and f2 = 4. In fact, following the procedure of orthogonal diagonalization of real
symmetric matrices, we see that the change of coordinates X = QX ' effected by the
rotation
                Q = [ 1/√2   1/√2 ]
                    [ 1/√2  −1/√2 ]

transforms the given equation to

3x' 2 + y' 2 = 4,

which is the equation of an ellipse with axes of lengths 4/√3 and 4. To determine
the required rotation, we note that Q is the orthogonal matrix which diagonalizes the
real symmetric matrix A. So the columns of Q are the unit eigenvectors of A forming
an orthonormal basis of R2 and therefore can be computed by the usual methods; one
has also to make sure that det Q = 1, if necessary, by interchanging the columns of
Q.
It is also clear that if instead we consider the equation

2x2 + 2xy + 2y2 + 4 = 0,

then it cannot represent any graph as f2 now has sign opposite to that of the eigen-
values. This absence of graph can be explained by the fact that there are no real x
and y which can satisfy the equation.
Now, consider the equation

2x2 + 2xy + 2y2 + 2x − 2y − 4 = 0. (5.35)



The associated symmetric matrix is the same matrix A so that the same orthogonal
matrix diagonalizes A. However, we have to take into account the effect of Q on the
x and y terms of the equation. Note that if X = (x, y)t and X ' = (x' , y' )t , then writing
out the transformation X = QX ' explicitly, we obtain
                x = (1/√2) x' + (1/√2) y'
                y = (1/√2) x' − (1/√2) y' .
             It follows that Equation (5.35) is transformed by Q into

                3x'^2 + y'^2 + (4/√2) y' − 4 = 0,

             which, after completing the square, can be rewritten as

                3x'^2 + (y' + √2)^2 = 6

             or as

                x^2/2 + y^2/6 = 1

             after effecting the translation x = x' and y = y' + √2. Thus Equation (5.35) represents
yet another ellipse whose major axis still lies along the new y-axis.

EXAMPLE 58 Consider the equation

2x2 − 4xy − y2 + 4 = 0.

In the notation we have adopted, this can be expressed as X t AX + 4 = 0, where


                A = [  2  −2 ]
                    [ −2  −1 ].
As the eigenvalues of A are 3 and −2, our discussion of the case of eigenvalues of
different signs shows that the given equation represents a hyperbola. The standard
equation of this hyperbola is easily seen to be

                y'^2/2 − x'^2/(4/3) = 1.
Similarly, the equation

2x2 − 4xy − y2 − 4 = 0

represents the hyperbola with its standard equation as


                x'^2/(4/3) − y'^2/2 = 1.
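The eigenvalue-sign analysis of this subsection is easy to automate. The following rough sketch (our addition; it uses numpy and ignores the finer degenerate subcases) classifies the conics of Examples 57 and 58 from the signs of the eigenvalues of the associated symmetric matrix.

    import numpy as np

    def classify_conic(a, b, c):
        # Classify a*x^2 + 2*b*x*y + c*y^2 + (linear terms) = 0 by eigenvalue signs;
        # the degenerate subcases discussed above are not distinguished here.
        lam1, lam2 = np.linalg.eigvalsh(np.array([[a, b], [b, c]], dtype=float))
        if lam1 * lam2 > 0:
            return "ellipse (possibly imaginary or degenerate)"
        if lam1 * lam2 < 0:
            return "hyperbola (or a pair of lines)"
        return "parabola (or parallel/coincident lines)"

    print(classify_conic(2, 1, 2))     # Example 57: ellipse
    print(classify_conic(2, -2, -1))   # Example 58: hyperbola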

Classification of Quadrics
The treatment of conic sections in R2 can be extended to what are known as quadrics or quadratic
surfaces in R3 . We present a brief review of the classification of these surfaces now. The general
equation of a quadric is

ax2 + by2 + cz2 + 2dxy + 2exz + 2 f yz + 2px + 2qy + 2rz + s = 0. (5.36)

The relevant quadratic form is

ax2 + by2 + cz2 + 2dxy + 2exz + 2 f yz

with the associated real symmetric matrix


 
    [ a d e ]
    [ d b f ]
    [ e f c ].

As in the case of conic sections in R2 , one can find, by the Principal Axes theorem (5.7.3), an orthog-
onal change of coordinates X = QX ' which reduces the quadratic form to

λ1 x' 2 + λ2 y' 2 + λ3 z' 2 ,

where λ1 , λ2 and λ3 are the eigenvalues of A (verify). Observe that Q can be chosen such that det Q = 1
so Q can be assumed to be a rotation. Now, suppose that Equation (5.36) of the quadric changes to

λ1 x' 2 + λ2 y' 2 + λ3 z' 2 + δ1 x' + δ2 y' + δ3 z' + δ = 0. (5.37)

As in the case of quadratic conics, the nature of the surface represented by Equation (5.37) depends on
the signs of the eigenvalues of A and the constant term. If, for example, all the three eigenvalues λ1 , λ2
and λ3 are non-zero, then an obvious translation of the type x' = x'' + µ1 , y' = y'' + µ2 , z' = z'' + µ3 can
be used to reduce Equation (5.37) to the following form:

λ1 x'' 2 + λ2 y'' 2 + λ3 z'' 2 = µ, (5.38)

which then can be put in the standard form as follows. Consider the case when all the eigenvalues
have the same sign. In case µ is non-zero, Equation (5.38) can be rewritten (after dropping the primes)
as
    x^2/α^2 + y^2/β^2 + z^2/γ^2 = 1

or as

    x^2/α^2 + y^2/β^2 + z^2/γ^2 = −1,
depending on whether µ has the same sign or the sign opposite to that of the eigenvalues. The first
equation represents an ellipsoid and the second an imaginary ellipsoid. When µ = 0, we have the
equation of the zero ellipsoid:

    x^2/α^2 + y^2/β^2 + z^2/γ^2 = 0.
Similarly, if the eigenvalues are not of the same sign, then we can put Equation (5.38), interchanging
x, y and z if necessary, in one of the following three forms:

    x^2/α^2 + y^2/β^2 − z^2/γ^2 = 0, 1 or −1,
which represents an elliptic cone, a hyperboloid of one sheet or a hyperboloid of two sheets,
respectively.
Note that the preceding surfaces are possible only when the rank of the matrix A is 3, as the diagonal
form of A, consisting of its eigenvalues as diagonal entries, has three non-zero diagonal entries.
If the rank of A is 2, then we can assume, without any loss of generality, that λ3 = 0. In that case,
depending on whether δ3 (the coefficient of the z'-term) is non-zero or zero in Equation (5.37), one
can perform suitable translations to eliminate the x'- and y'-terms, and the z'-term if possible, so as to
represent it in either of the forms

    x^2/α^2 ± y^2/β^2 = ±z,
which represents an elliptic or a hyperbolic paraboloid, or in one of the forms

    x^2/α^2 ± y^2/β^2 = 0, 1 or −1,
which represents a pair of planes, an elliptic cylinder or a hyperbolic cylinder, respectively.
Finally, we consider the case when the rank of A is 1. So, we may assume that λ2 = λ3 = 0. As in
the preceding cases, suitable translations then allow us to reduce Equation (5.37) to either

x2 + α = 0,

which represents a pair of parallel planes, real or imaginary, or

x2 + 2αy = 0,

a parabolic cylinder.
Finally, it should be pointed out that in both R2 and R3 , in general, the reduction of the general
equation to the standard form is accomplished by a rotation followed by a translation, a combination
which is known as an isometry (see website for a discussion of isometries).
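The first step of the reduction, namely computing the eigenvalues (and hence the rank and the signs needed for the classification) of the matrix of the quadratic part, can be delegated to a computer. The sketch below (our addition, using numpy; the helper name is ours) does this for the quadratic part of Exercise 3(b) below.

    import numpy as np

    def quadric_eigen_data(a, b, c, d, e, f):
        # Eigenvalues and rank of the matrix [[a, d, e], [d, b, f], [e, f, c]]
        # attached to the quadratic part of Equation (5.36).
        A = np.array([[a, d, e], [d, b, f], [e, f, c]], dtype=float)
        lam = np.linalg.eigvalsh(A)
        rank = int(np.sum(np.abs(lam) > 1e-9))
        return lam, rank

    # quadratic part of Exercise 3(b): x^2 + 3y^2 + z^2 + 2xy + 2xz + 6yz
    lam, rank = quadric_eigen_data(1, 3, 1, 1, 1, 3)
    print(np.round(lam, 6), rank)   # the eigenvalue signs and the rank determine the surface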

EXERCISES
1. For each of the following quadratic forms, determine an orthogonal matrix Q which diagonal-
izes the quadratic form; give the diagonalized form in each case:
a. 2x1 x3 + x2 x3 .
b. 3x21 + 2x22 + 2x23 + 2x1 x2 + 2x1 x3 + 2x2 x3 .
c. −9x21 − 7x22 − 11x23 + 8x1 x2 − 8x1 x3 .


d. 4x21 + x22 − 8x23 + 4x1 x2 − 4x1 x3 + 8x2 x3 .
2. Classify the conics in R2 , represented by the following equations, by reducing the equations
to the standard forms; in each case, specify the rotation Q and any translation, if necessary,
required to reduce the equation to the standard form (make sure that det Q = 1):
a. 7x2 + 4xy + 4y2 − 6 = 0.
b. 2x2 + 10xy + 2y2 + 21 = 0.
c. 5x2 − 4xy + 5y2 − 14x − 28y + 10 = 0.
d. 8x2 − 8xy + 2y2 − 4√5 x + 12√5 y + 6 = 0.
3. Classify the quadrics in R3 , represented by the following equations, by reducing the equations
to the standard forms; in each case, specify the rotation P and any translation, if necessary,
required to reduce the equation to the standard form (make sure that det P = 1):
a. x2 + yz = 0.
b. x2 + 3y2 + z2 + 2xy + 2xz + 6yz − 10 = 0.
c. x2 + 3y2 + z2 + 2xy + 2xz + 6yz + 18x − 12z − 10 = 0.
d. x2 − z2 − 4xy + 4yz + 6 = 0.
e. x2 − z2 − 4xy + 4yz + 6x − 3y − 12z − 6 = 0.
f. 9x2 − 4xy + 6y2 + 3z2 + 10x − 20y + 12z + 32 = 0.

6 Canonical Forms

6.1 INTRODUCTION
In Chapter 5, we saw that a linear operator on a finite-dimensional vector space can be diagonalized
only under some strict conditions on its minimal or characteristic polynomial. So we seek other simple
forms of matrix representations of linear operators. An upper or a lower triangular matrix is an exam-
ple of such simple forms, and we have already seen that over C, any operator can be represented as
a triangular matrix. But there are other matrix representations that reflect intrinsic properties of linear
operators. This chapter deals with some such representations usually known as the canonical forms.
To motivate our approach, recall that (see discussion after Proposition 5.5.5) a linear operator T on
a finite-dimensional vector space is diagonalizable precisely because the vector space can be decom-
posed as a direct sum of T -invariant subspaces such that the restrictions of T to these subspaces act
as some scalar times the identity operators of these subspaces. Our search for canonical forms for a
general linear operator T generalizes this approach. The underlying vector space is decomposed into
a direct sum of suitable T -invariant subspaces on which the restrictions of T act in ways that reflect
important features of T . Then, Proposition (5.5.5) allows us to have a matrix representation of T as
a block diagonal matrix. The decomposition of the vector space into direct sum of suitable invariant
subspaces is accomplished by a powerful result known as the primary decomposition theorem. We
begin by discussing this theorem in the next section.

6.2 PRIMARY DECOMPOSITION THEOREM


The proof of the following theorem relies heavily on properties of polynomials over fields; for relevant
details, see Section 5.2.

Theorem 6.2.1. Let T be a linear operator on a finite-dimensional vector space V over a field F,
whose minimal polynomial m(x) can be expressed as the following product

    m(x) = (p1 (x))^r1 (p2 (x))^r2 · · · (pt (x))^rt

of powers of distinct irreducible polynomials p j (x) over F, for 1 ≤ j ≤ t, where the r j are positive integers. Define, for each
j, 1 ≤ j ≤ t, the following subspaces of V:

    W j = ker (p j (T ))^r j .


Then the following assertions hold:


(a) Each W j is T -invariant.
(b) If T j = T W j is the restriction of T to W j , then the minimal polynomial of T j is (p j (x))r j .
(c) V can be expressed as the direct sum

    V = W1 ⊕ W2 ⊕ · · · ⊕ Wt

of the T -invariant subspaces W j .

Proof. According to Proposition (5.5.2), the kernel of any operator on V which commutes with T has
to be T -invariant. Since every polynomial in T and, in particular p j (T )r j , commutes with T , assertion
(a) of the theorem follows. We take up assertion (c) next. The direct sum decomposition in (c) will be
proved by showing that each W j is the image of some projection on V (for conditions on projections
inducing direct sum decompositions, see Proposition 4.2.13).
The required projections P j will be defined as certain polynomials in T . To begin with we introduce
polynomials f1 (x), f2 (x), . . . , ft (x) using the factorization of m(x) as follows: for each j, 1 ≤ j ≤ t, let
    f j (x) = m(x)/(p j (x))^r j = ∏_{k≠j} (pk (x))^rk .

From the uniqueness of factorization into irreducible factors, it is clear that these polynomials f j (x)
can have no common divisors other than scalar polynomials. Thus, they are relatively prime. It follows
(see Corollary 5.2.6) that there are polynomials q1 (x), q2 (x), . . . , qt (x) over F, such that

f1 (x)q1 (x) + f2(x)q2 (x) + · · · + ft (x)qt (x) = 1. (6.1)

We also note that the product f j (x) fk (x), for j ≠ k, is divisible by m(x), so that by properties of the
minimal polynomial,

f j (T ) fk (T ) = z, (6.2)

which is the zero operator on V.


Next, we define certain operators P1 , P2 , . . . , Pt on V as polynomials in T as follows. For each j,
1 ≤ j ≤ t, let P j = f j (T )q j (T ). Then by Equations (6.1) and (6.2),

P1 + P2 + · · · + Pt = I, (6.3)

which is the identity operator on V, and for j ≠ k,

P j Pk = z, (6.4)

the zero operator, as polynomials in T commute. Note that multiplying Equation (6.3) by a fixed P j
yields P j 2 = P j because of equation (6.4). Thus each P j is a projection on V.
As the projections P1 , P2 , . . . , Pt on V satisfy the conditions given in Equations (6.3) and (6.4),
it follows from Proposition (4.2.13) that V is the direct sum of the ranges of these projections. So
assertion (c) of the theorem will be established once we show that Im(P j ) = W j for each j.
Now, for any v ∈ Im(P j ), as P j is a projection, v = P j v, and so

(p j (T ))r j v = (p j (T ))r j (P j v) = (p j (T ))r j ( f j (T )q j (T )v),


by definition of P j . Since by construction f j (x)(p j (x))r j = m(x), it follows that f j (T )(p j (T ))r j is the
zero operator on V. The preceding equation then shows that (p j (T ))r j v = 0 placing v ∈ ker(p j (T ))r j .
One concludes that Im(P j ) ⊂ W j .
To prove the reverse inclusion, assume that v ∈ W j . Thus (p j (T ))r j v = 0, which implies that Pi v = 0
for any i ! j as (p j (x))r j is a divisor of fi (x) by definition. It follows then by Equation (6.3) that

v = Iv = (P1 + P2 + · · · + Pt )v = P j v,

which shows, P j being a projection, that v ∈ Im(P j ). This completes the verification that Im(P j ) = W j
and so (c) is proved.
We finally prove assertion (b), that is, show that the restriction T j of T to W j has minimal polyno-
mial (p j (x))r j . Since W j is the kernel of (p j (T ))r j , it follows that (p j (T j ))r j w = 0 for any w ∈ W j . In
other words, T j satisfies the polynomial (p j (x))r j , which implies that the minimal polynomial h(x) of
T j divides (p j (x))r j .
On the other hand, h(x) being the minimal polynomial of T j , h(T j ), and hence h(T ), is the zero
operator on W j . Since f j (T ) acts as the zero operator on Wi for any i ! j and V is the direct sum of
the Wi , it follows that h(T ) f j (T ) is the zero operator on V as h(T ) and f j (T ) commute. Therefore,
the minimal polynomial m(x) = f j (x)(p j (x))r j of T divides h(x) f j (x), and so (p j (x))r j is a divisor of
h(x), which, together with the conclusion of the preceding paragraph, implies that h(x) equals (p j (x))r j . This
proves assertion (b) of the theorem. !

We must point out that the proof of the primary decomposition theorem does not involve at any stage
the finite-dimensionality of the underlying vector space. Thus the theorem holds even for operators on
infinite-dimensional vector spaces, provided their minimal polynomials can be expressed as products
of finite numbers of irreducible polynomials.
The primary decomposition theorem is a very useful theoretical tool. However, in general, explicit
computations (for example, to find suitable bases for the summands W j , which may determine nice
matrix forms for the operator) based on the theorem are not practicable.
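In the special case of a matrix whose minimal polynomial is known and has few irreducible factors, the projections P j of the proof can be written down explicitly. The following Python/sympy sketch (our illustration; the matrix and all names are chosen by us) carries this out for a 3 × 3 matrix with minimal polynomial (x − 1)^2 (x − 2), using the extended Euclidean algorithm to produce the polynomials q j (x) of Equation (6.1).

    import sympy as sp

    x = sp.symbols('x')
    A = sp.Matrix([[1, 1, 0],
                   [0, 1, 0],
                   [0, 0, 2]])          # minimal polynomial m(x) = (x - 1)^2 (x - 2)
    f1 = x - 2                          # m(x)/(x - 1)^2
    f2 = (x - 1)**2                     # m(x)/(x - 2)
    q1, q2, g = sp.gcdex(f1, f2, x)     # q1*f1 + q2*f2 = g = 1, as f1 and f2 are relatively prime

    def evaluate(poly, M):
        # Horner evaluation of a polynomial at the matrix M.
        coeffs = sp.Poly(poly, x).all_coeffs()
        out = sp.zeros(*M.shape)
        for c in coeffs:
            out = out * M + c * sp.eye(M.shape[0])
        return out

    P1 = evaluate(f1 * q1, A)           # projection onto W1 = ker (A - I)^2
    P2 = evaluate(f2 * q2, A)           # projection onto W2 = ker (A - 2I)
    print(P1 + P2 == sp.eye(3), P1 * P1 == P1, P1 * P2 == sp.zeros(3))   # True True True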
One of the cases where the primary decomposition theorem does yield significant results is when
the irreducible factors of the minimal polynomial of an operator are all linear. This, for example,
will be the case for any operator on a complex vector space as the only irreducible polynomials over
the field of C of complex numbers are the linear ones. In such cases, one has the following useful
implication of the primary decomposition theorem.

Corollary 6.2.2. Suppose that the minimal polynomial m(x) of a linear operator T on a finite-
dimensional vector space V over a field F is given by

m(x) = (x − λ1 )r1 (x − λ2)r2 . . . (x − λt )rt .


If T j is the restriction of T to the T -invariant subspace W j = ker (T − λ j I)^r j , then S j = T j − λ j I j , where
I j is the identity operator on W j , is a nilpotent operator on W j of index r j .

Proof. This is clear since, by the primary decomposition theorem, the minimal polynomial of T j is
(x − λ j )r j and so that of S j is xr j . !

Observe that for a diagonalizable operator T on a finite-dimensional vector space V with distinct
eigenvalues λ1 , λ2 , . . . , λt , its minimal polynomial m(x) = ∏_{j=1}^{t} (x − λ j ) is a product of distinct linear
factors. The T -invariant subspaces W j of the primary decomposition theorem, in this case, are the
eigenspaces of T , for, by definition, W j = ker(T − λ j I). Thus the decomposition of V as the direct sum
of W j , as in the theorem, is precisely the direct sum of the distinct eigenspaces of T , a result we proved
so laboriously earlier.
The projections associated with the decomposition also provide a nice characterization of the
diagonalizable operator T . To derive it, we consider the operator D = λ1 P1 + λ2 P2 + · · · + λt Pt on V,
where P1 , P2 , . . . , Pt are the projections associated with the decomposition of V as the direct sum of
the eigenspaces W j , where W j = Im(P j ). Recall that for any j, P j acts as the identity on W j whereas
it acts as the zero operator on W_i for i ≠ j. Thus, for v = ∑_j w_j in V, where w_j ∈ W_j , it is an easy
computation to show that Dv = D(∑_j w_j) = ∑_j λ_j w_j .
On the other hand, T w_j = λ_j w_j for any w_j ∈ W_j . Thus, for v = ∑_j w_j , one has T v = ∑_j λ_j w_j . As v
is arbitrary in V, T and D are equal implying that

T = λ1 P1 + λ2 P2 + · · · + λt Pt .                                                  (6.5)
Conversely, we claim that if P j are non-zero projections on a finite-dimensional vector space V
over a field F such that
(i) P j Pk , for j ! k, is the zero operator on V;
(ii) P1 + P2 + · · · + Pt = I, the identity operator on V,
then a linear operator T on V, satisfying Equation (6.5), is diagonalizable with the scalars
λ1 , λ2 , . . . , λt as its eigenvalues.
We first show that each λ j is an eigenvalue of T . Multiplying both sides of Equation (6.5) by P j ,
one obtains T P j = λ j P j by Condition (i) and the fact that each P j is a projection. Thus, if W j = ImP j (which means that w j ∈ W j if and only
if P j w j = w j ), then W j = ker(T − λ j I). But W j has a non-zero vector as P j is non-zero, and so it is clear
that λ j is an eigenvalue of T with W j as the corresponding eigenspace. Next note that for any scalar λ,
Equation (6.5) coupled with Condition (ii), implies that

T − λI = (λ1 − λ)P1 + (λ2 − λ)P2 + · · · + (λt − λ)Pt .

Therefore, if (T − λI)v = 0 holds for some non-zero v ∈ V, then (λk − λ)Pk v = 0, for any k, as P j Pk is
the zero operator for k ≠ j. However, v is non-zero if and only if P j v is non-zero for some j. Thus for
any such j, the preceding equality implies that λ j − λ = 0. Thus, T can have no eigenvalue other than
λ1 , λ2 , . . . , λt .
Finally, we note that for any v ∈ V, by Condition (ii), v = ∑_j P_j v. Since every non-zero vector
in Im(P j ) is an eigenvector of T , it then follows that V is spanned by eigenvectors of T . Thus, T is
diagonalizable as claimed.
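For a concrete diagonalizable matrix the projections P j can even be written down explicitly: when the eigenvalues λ1 , . . . , λt are distinct, the Lagrange-interpolation products ∏_{k≠j} (T − λ_k I)/(λ_j − λ_k) are projections satisfying Conditions (i) and (ii) and Equation (6.5). This explicit formula is a standard construction not stated in the text; the following Python (SymPy) sketch, with a 2 × 2 matrix of our own choosing, spot-checks these facts as an optional computational aside.

import sympy as sp

# An illustrative diagonalizable matrix with distinct eigenvalues 1 and 3.
A = sp.Matrix([[2, 1],
               [1, 2]])
I = sp.eye(2)
eigs = sorted(A.eigenvals().keys())          # [1, 3]

# Spectral projections as polynomials in A (Lagrange interpolation at the eigenvalues).
P = []
for lam in eigs:
    Q = I
    for mu in eigs:
        if mu != lam:
            Q = Q * (A - mu * I) / (lam - mu)
    P.append(Q)

assert all(Pj * Pj == Pj for Pj in P)                  # each P_j is a projection
assert P[0] * P[1] == sp.zeros(2, 2)                   # Condition (i)
assert P[0] + P[1] == I                                # Condition (ii)
assert eigs[0] * P[0] + eigs[1] * P[1] == A            # Equation (6.5)
print(P)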
Now we consider the general case of a linear operator whose minimal polynomial is a product of
linear factors, not necessarily distinct. In that case, the primary decomposition theorem (with a little
bit of help from Equation 6.5) provides a nice description of such operators, known as the Jordan–
Chevalley or the SN decomposition.

Proposition 6.2.3. Let T be a linear operator on a finite-dimensional vector space V, whose min-
imal polynomial m(x) is a product of linear factors. Then there is a diagonalizable operator S and a
nilpotent operator N, both on V, such that

T = S +N and S N = NS . (6.6)

Moreover, T determines the diagonalizable operator S and the nilpotent operator N uniquely.

The letter S stands for semi-simple; for, sometimes diagonalizable operators are also known as
semi-simple operators.
Proof. Let m(x) = ∏_{j=1}^{t} (x − λ_j)^{r_j} be the minimal polynomial of T . Then according to the primary
decomposition theorem (see Equations 6.3 and 6.4), there are non-zero projections P1 , P2 , . . . , Pt on
V, each a polynomial in T , such that (i) Pk P j , for k ! j, is the zero operator on V and (ii) P1 + P2 +
· · · + Pt = I, the identity operator on V. Therefore, by the discussion following Equation (6.5), the
operator

S = λ1 P1 + λ2 P2 + · · · + λt Pt . (6.7)

is a diagonalizable operator on V. We next set

N = T − S. (6.8)

We claim that the operator N on V is a nilpotent one. We first note that, by multiplying the relation
I = P1 + P2 + · · · + Pt by T , one obtains

T = T P1 + T P2 + · · · + T Pt .

This relation, combined with the Definition (6.7) of S , implies the following formula for N = T − S :
N = ∑_{j=1}^{t} (T − λ_j I) P_j .                                                  (6.9)

In fact, we shall show that, for any positive integer r, the relation
N^r = ∑_{j=1}^{t} (T − λ_j I)^r P_j                                                (6.10)

holds, by induction on r. Equation (6.9) starts the induction for r = 1. So we assume that (6.10) holds
for any r ≥ 1. Now, multiplying the expression for N^r by N = ∑_k (T − λ_k I) P_k , we can simplify the
product by noting that any Pk , being a polynomial in T , commutes with the polynomial (T − λk I).
Since P_j P_k , for j ≠ k, is the zero operator and P_k^2 = P_k , our simplification yields the relation
N^{r+1} = ∑_{k=1}^{t} (T − λ_k I)^{r+1} P_k . This shows that, by induction, the relation (6.10) holds for any positive integer r.
Recall that in the proof of the primary decomposition theorem it was shown that Im(P j ) = ker(T −
λ j I)r j for any j. Therefore, if we choose a positive integer r such that r > max{r j } for all j, then
for any v ∈ V, (T − λ j I)r P j v = 0 for any j. It then follows, from Equation (6.10), that for such an r,
N^r v = ∑_{j=1}^{t} (T − λ_j I)^r P_j v = 0. Thus N^r is the zero operator and so our claim that N is nilpotent is
established.
Since, by the definition of N, T = S + N, where S and N, being polynomials in T , commute, the
first part of the proposition is proved. Thus to complete the proof, we need to prove the uniqueness
part. So let T = S ' + N ' be another decomposition, where S ' is diagonalizable and N ' nilpotent such
that they commute. It is then trivial that each of S ' and N ' commutes with T and hence with any
polynomial in T . In particular, each commutes with S and N as these were constructed as polynomials
in T . Now, the commuting diagonalizable operators S and S ' are simultaneously diagonalizable (see
Exercise 19 in Section 5.5) and so S − S ' is a diagonalizable operator on V. On the other hand, as
the nilpotent operators N and N ' commute, the operator N ' − N is a nilpotent operator on V. Since
T = S + N = S ' + N ' , we have just shown that the operator S − S ' = N ' − N on V is diagonalizable as
well as nilpotent, which implies that S − S ' has to be the zero operator. Thus, we may conclude that
S = S ' and N ' = N, which proves the required uniqueness. !

It is customary to call S the diagonalizable part and N the nilpotent part of T .


The matrix version of Proposition (6.2.3) is clear, and we leave it to the reader to formulate it.
However, for a proper appreciation of the matrix version, we have to wait till we develop the theory of
Jordan forms later in the chapter. To illustrate the difficulties of actually computing the diagonalizable
and nilpotent parts of a matrix, we present a simple example.
Consider the real matrix A given by
 
0 0 0 −1

1 0 0 0
A =  ,
0 1 0 2
0 0 1 0

whose minimal polynomial (as well as the characteristic polynomial) is clearly x4 − 2x2 + 1 (what’s so
clear about it?). Note that x4 − 2x2 + 1 = (x − 1)2(x + 1)2 , where x − 1 and x + 1 are irreducible over R.
Therefore, if T is the linear operator on R4 , represented by A, say with respect to the standard basis
of R4 , then the primary decomposition theorem implies that R4 = W1 ⊕ W2 , where the T -invariant
subspaces Wi are given by

W1 = ker(T − I)2 and W2 = ker(T + I)2 , (6.11)

where I is the identity operator on R4 . Let T j and I j denote the restriction of T to W j and the identity
of W j , respectively. Then it is clear that N1 = (T 1 − I1 ) and N2 = (T 2 − I2 ) are nilpotent of index 2 on
W1 and W2 , respectively.
Our first goal is to find suitable bases for the subspaces W j using the nilpotent operators N j . Fortu-
nately for us, each of the subspaces W j has dimension 2 (by checking the characteristic polynomial)
whereas the nilpotent operators are also of index 2; so the nilcyclic bases determined by N1 and N2
are ideal for our purpose (see, for example, Proposition 5.5.15).
We take the subspace W1 first. The basis we are looking for is {v1 , N1 v1 }, where v1 is so chosen
in W1 such that N1 v1 is non-zero. Since W1 = ker(T − I)2 , we can determine the vectors in W1 by
computing the solution space of (A − I4)2 x = 0 in R4 .
We leave it to the reader to verify that the row reduced form of
 
 1 0 −1 2

−2 1 0 −1
(A − I4)2 =  
 1 −2 3 −4
0 1 −2 3

is
 
1 0 −1 2

0 1 −2 3
 ,
0
 0 0 0
0 0 0 0
which shows that (x1 , x2 , x3 , x4 )t ∈ W1 if and only if the coordinates satisfy the system of equations

x1 − x3 + 2x4 = 0
x2 − 2x3 + 3x4 = 0 .

Now it is easy to check that v1 = (1, 2, 1, 0)t ∈ W1 but (A − I4)(1, 2, 1, 0)t is non-zero. This implies
that for this choice of v1 , {v1 , N1 v1 } is the basis of W1 with respect to which the matrix of N1 is the
elementary Jordan block J2 (0). So the matrix B1 of T 1 with respect to the same basis is given by
             1    0
     B1  =   1    1  .

Similarly, one can find a basis {v2 , N2 v2 } of W2 with respect to which the matrix of T 2 is
            −1    0
     B2  =   1   −1  .

Since R4 is the direct sum of W1 and W2 , the union of the bases we have found for W1 and W2
provides a basis for R4 . It also follows that the matrix B of T with respect to this basis is the direct
sum of matrices B1 and B2:
 
1 0 0 0 
1 1 0 0 
B =  
0 0 -1 0 
0 0 1 -1

We can rewrite B as B = D + J, where D is the diagonal matrix diag[1, 1, −1, −1] and J is the nilpotent
matrix J = J2 (0) ⊕ J2 (0). Thus, we have shown that the original matrix A is similar to the sum of a
diagonal and a nilpotent matrix.
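Readers who want to check the computations of this example by machine can do so with a short Python (SymPy) sketch like the one below. It uses the choice v1 = (1, 2, 1, 0)t made above, lets the computer pick some admissible v2 in W2 (so its v2 may differ from the one a reader finds by hand), and verifies that the resulting change of basis conjugates A into B = D + J. The sketch is offered only as an illustration of the procedure, not as part of the text.

import sympy as sp

A = sp.Matrix([[0, 0, 0, -1],
               [1, 0, 0,  0],
               [0, 1, 0,  2],
               [0, 0, 1,  0]])
I4 = sp.eye(4)

# The vector v1 = (1, 2, 1, 0)^t chosen above: it lies in W1 = ker(A - I4)^2 but not in ker(A - I4).
v1 = sp.Matrix([1, 2, 1, 0])
assert ((A - I4)**2) * v1 == sp.zeros(4, 1) and (A - I4) * v1 != sp.zeros(4, 1)

# For W2 = ker(A + I4)^2, pick any basis vector of the kernel not killed by (A + I4).
v2 = next(w for w in ((A + I4)**2).nullspace() if (A + I4) * w != sp.zeros(4, 1))

# String together the nilcyclic bases {v1, (A - I4)v1} of W1 and {v2, (A + I4)v2} of W2.
P = sp.Matrix.hstack(v1, (A - I4) * v1, v2, (A + I4) * v2)
print(P.inv() * A * P)   # the direct sum of [[1, 0], [1, 1]] and [[-1, 0], [1, -1]], i.e. B = D + J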
It must be pointed out that the example worked precisely because the dimensions of W1 and W2
are equal to the indices of the corresponding nilpotent operators N1 and N2 . In general, the canonical
forms of nilpotent operators (that is, the simplest possible matrix forms for such operators) will be
required. We take up the derivation of such canonical forms in the next section.

EXERCISES
1. Determine whether the following statements are true or false giving brief justifications:
(a) The primary decomposition theorem is not valid for a linear operator whose minimal poly-
nomial is a power of a single irreducible polynomial.
(b) The primary decomposition theorem is not valid for a linear operator on an infinite-
dimensional vector space.
(c) Every matrix over C is similar to a sum of a diagonal and a nilpotent matrix.
(d) Every matrix over C is a sum of a diagonalizable and a nilpotent matrix.
(e) The SN decomposition of A is given by
     
1 2 3 1 0 0 0 2 3
     
A = 0 1 4 = 0 1 0 + 0 0 4 .
     
0 0 1 0 0 1 0 0 0
(f) If the minimal polynomial of a linear operator is a product of distinct linear factors, then the
image of each projection given by the primary decomposition theorem has dimension 1.
(g) If the minimal polynomial of a linear operator is a product of distinct linear factors, then the
nilpotent operator in the SN decomposition of the operator is the zero operator.
(h) If the minimal polynomial of a linear operator is xr , then the diagonalizable part of the
operator is the zero operator.
(i) If the minimal polynomial of an operator is a product of linear factors, then the operator is
diagonalizable if and only if the nilpotent part of the operator is zero.
2. For each of the following matrices A over the indicated fields F, find the SN decomposition if it
exists: ' (
1 1
(a) A = over F = R.
1 1
 
0 0 1
(b) A = 0 0 0 over F = R.
 
1 0 0
3. Let T be the linear operator on R3 whose matrix with respect to the standard basis is
 
0 0 −1
 
A = 1 0 0.
 
0 1 0

Express the minimal polynomial of T as a product p1 (x)p2 (x) of monic irreducible polynomials
over R. Find bases of the kernels of p1 (T ) and p2 (T ), and compute the matrix of T with respect
to the basis of R3 thus found.
4. Let T be the linear operator on R4 whose matrix with respect to the standard basis is
 
0 0 1 −1

1 0 0 0
A =  .
1 −1 −5 4
1 −1 −8 6

Express the minimal polynomial of T as a product p1 (x)p2 (x) of monic irreducible polynomials
over R. Find bases of the kernels of p1 (T ) and p2 (T ), and compute the matrix of T with respect
to the basis of R4 thus found.
5. Let T be the linear operator on R3 whose matrix with respect to the standard basis is
 
−10 −7 23 
 
A =  −2 −3 6 .
 
−5 −4 12

Show that there exist diagonalizable operator S and nilpotent operator N on R3 such that T =
S + N. What are the matrices of S and N with respect to the standard basis of R3 ?
6. Let T be a linear operator on a finite-dimensional vector space V over a field F with characteristic
polynomial

ch(x) = (x − a1)d1 (x − a2)d2 · · · (x − at )dt


and minimal polynomial

m(x) = (x − a1)r1 (x − a2)r2 · · · (x − at )rt

over F. Let Wi = ker(T − ai )ri . Prove that dim Wi = di .


7. Prove the following variant of the primary decomposition theorem: Let T be a linear oper-
ator on a vector space V over a field F such that its minimal polynomial m(x) factorizes as
p1 (x)p2 (x) · · · pt (x), where the pi are pairwise relatively prime monic polynomials over F. Let
Wi = ker pi (T ). Prove the following assertions.
(a) The subspaces Wi are T -invariant.
(b) V = W1 ⊕ W2 ⊕ · · · ⊕ Wt .
(c) The minimal polynomial of T restricted to Wi is pi (x).
8. For a linear operator T on a vector space V whose minimal polynomial factors into a prod-
uct of powers of distinct irreducible polynomials, let V = W1 ⊕ W2 ⊕ · · · ⊕ Wt be the primary
decomposition of V. Prove that for any T -invariant subspace W of V

W = (W ∩ W1 ) ⊕ (W ∩ W2 ) ⊕ · · · ⊕ (W ∩ Wt ).

9. Deduce from the primary decomposition theorem the result that if the minimal polynomial of a
linear operator is a product of distinct linear factors, then it must be diagonalizable.
10. Let T be a linear operator on a finite-dimensional vector space over C, and let S be the diago-
nalizable part of T . Show that for any polynomial f (x) over C, f (S ) is the diagonalizable part of
f (T ).
11. Let the minimal polynomial of a linear operator T on a vector space V be m(x). Prove that there
is a vector in V whose T -annihilator is precisely m(x).
12. Let T be a linear operator on a finite-dimensional vector space V over a field F. Prove that the
characteristic polynomial of T is irreducible over F if and only if T has no T -invariant subspace
other than V and the zero subspace.
13. Let T be a linear operator on a finite-dimensional vector space over C with minimal polynomial
m(x), and let f (x) be any polynomial over C. Prove that ker( f (T )n ) = ker f (T ) for all n ≥ 1 if
and only if m(x) has no repeated factors.

6.3 JORDAN FORMS


We begin this section by first deriving the canonical form or the Jordan form of a nilpotent operator
on a finite-dimensional vector space and then go on to derive the Jordan form of a general operator
whose minimal polynomial splits into a product of linear factors.
We have already taken the first step in determining the canonical form of a nilpotent operator in
the last chapter. In Proposition (5.5.15), we had seen that if T is a nilpotent operator of index of
nilpotency r on a vector space V, then there is a T -invariant subspace W1 of dimension r and a T -
nilcyclic basis of W1 , relative to which the matrix of the restriction T 1 is the elementary Jordan block
Jr (0) of eigenvalue 0. In fact, if v ∈ V is a vector such that T r−1 v ! 0, then v, T v, . . . , T r−1 v is a
nilcyclic basis with respect to which the matrix of the restriction T 1 is clearly Jr (0). The idea behind
the derivation of the canonical form of T is to decompose V as a direct sum V = W1 ⊕ W2 ⊕ · · · ⊕ Wk of
T -invariant, T -cyclic subspaces such that the restriction T i of T to the subspace Wi for i ≥ 2 is nilpotent
on Wi but whose index of nilpotency does not exceed that of T i−1 . The existence of nilcyclic bases for
these subspaces then ensures that the matrix of T is a direct sum of elementary Jordan matrices whose
sizes are non-increasing.
The crucial step is to determine the nilcyclic bases for the T -invariant subspaces which will yield
the necessary Jordan forms. We consider a hypothetical canonical form of a nilpotent operator which is
a direct sum of elementary Jordan blocks of non-increasing sizes, to clarify this crucial step. Suppose
that T is nilpotent of index 6 on a vector space V of dimension 22 having its Jordan form as

J6 (0) ⊕ J6(0) ⊕ J5(0) ⊕ J3(0) ⊕ J2(0).

This corresponds to nilcyclic bases of the T -invariant subspaces Wi which we arrange in columns as
follows:
W1          W2          W3          W4          W5
v1          v2
T v1        T v2        v3
T^2 v1      T^2 v2      T v3
T^3 v1      T^3 v2      T^2 v3      v4
T^4 v1      T^4 v2      T^3 v3      T v4        v5
T^5 v1      T^5 v2      T^4 v3      T^2 v4      T v5
However, such nilcyclic bases will be constructed (for the proof of their existence) by determining the
vectors in each row, starting with the first row and then determining the vectors in each succeeding
row, by a well-defined procedure.
Before proving that such a procedure works, let us make some preliminary remarks and fix the
necessary notation. Suppose that T is nilpotent of index r on a finite-dimensional vector space V. For
any positive integer j, let
K j = ker T j .
Note that K1 , the kernel of T , is the usual eigenspace of T belonging to its sole eigenvalue 0. The
rest of the subspaces K j are sometimes called generalized eigenspaces. Since T r is the zero operator
whereas T r−1 is not, it follows that

Kr = V but Kr−1 ≠ V.
Hence, there exists some v ∈ V such that T r v = 0, but T r−1 v ≠ 0. Therefore, T r− j v ∈ K j , but T r− j v ∉
K j−1 . Thus, there is the following strict inclusion of generalized eigenspaces:
K1 ⊂ K2 ⊂ · · · ⊂ Kr−1 ⊂ Kr = V.
Put K0 = {0}.
Next, we define integers qi = qi (T ) for i = 1, 2, . . . , r by
qi = dim(Ki /Ki−1 ) = dim Ki − dim Ki−1 . (6.12)
Note that q1 = dim K1 = nullity(T ) and

dim Ki = q1 + q2 + · · · + qi

for any i = 1, 2, . . . , r. These, in turn, determine another set of integers si = si (T ) as follows

sr = qr and si = qi − qi+1 for i = 1, 2, . . . , r − 1. (6.13)



The following result implies that the integers si are non-negative.

Lemma 6.3.1. For i = 1, 2, . . . , r − 1,

dim(Ki /Ki−1 ) ≥ dim(Ki+1 /Ki ).

Proof. Let u1 + Ki , u2 + Ki , . . . , ul + Ki be linearly independent in the quotient space Ki+1 /Ki . It is


clear that the vectors T u1 , T u2 , . . . , T ul are in Ki . To prove the lemma, it suffices to show that the
cosets T u1 + Ki−1 , T u2 + Ki−1 , . . . , T ul + Ki−1 are linearly independent in Ki /Ki−1 . Now, if a1 (T u1 +
Ki−1 ) + a2(T u2 + Ki−1 ) + · · · + al (T ul + Ki−1 ) = Ki−1 , the zero vector in Ki /Ki−1 , then a1 T u1 + a2 T u2
+ · · · + al T ul ∈ Ki−1 forcing a1 u1 + a2 u2 + · · · + al ul ∈ Ki . This, however, implies that a1 (u1 + Ki ) +
a2 (u2 + Ki ) + · · · + al (ul + Ki ) = Ki , the zero of Ki+1 /Ki . The linear independence of these cosets then
implies that a1 = a2 = · · · = al = 0 which, in turn, proves the linear independence of the cosets T u1 +
Ki−1 , T u2 + Ki−1 , . . . , T ul + Ki−1 in Ki /Ki−1 . !
We also note that

s1 (T ) + s2 (T ) + · · · + sr (T ) = q1 (T ) = dim ker T. (6.14)

We remark that even though the integers qi = qi (T ) and si = si (T ) are determined by the gener-
alized eigenspaces of the nilpotent operator T of index r, we sometimes suppress the symbol T for
convenience.
We can now state the main result about nilpotent operator.

Proposition 6.3.2. Let T be a nilpotent operator of index of nilpotency r on a finite-dimensional


vector space V with K j = ker T j for 0 ≤ j ≤ r. Let si = si (T ) be the integers defined by Equation (6.13).
Also set

m = m(T ) = nullity(T ).

Then, there exist m vectors v1 , v2 , . . . , vm in V such that


(a) exactly s j of these vectors are in the difference K j − K j−1 for j = 1, 2, . . . , r;
(b) non-zero vectors of the form T k vi , for k ≥ 0 and 1 ≤ i ≤ m, form a basis of Kr = V.
Furthermore, any set of vectors u1 , u2 , . . . , ul in Kr such that the cosets u1 + Kr−1 , u2 +
Kr−1 , . . . , ul + Kr−1 are linearly independent in the quotient space Kr /Kr−1 , can be included among
v1 , v 2 , . . . , v m .

Proof. The proof is by induction on r. If r = 1, then T is the zero operator on V and vectors from
any basis of V will be the required ones trivially. So, we may assume that r > 1. Choose a basis
v1 + Kr−1 , v2 + Kr−1 , . . . , v sr + Kr−1 of the quotient space Kr /Kr−1 . (Note that sr = qr is the dimension
of Kr /Kr−1 ). The vectors T v1 , T v2 , . . . , T v sr are in Kr−1 , and the preceding Lemma (6.3.1) then
shows that the cosets T v1 + Kr−2 , T v2 + Kr−2 , . . . , T v sr + Kr−2 are linearly independent in Kr−1 /Kr−2 .
Now T 1 , the restriction of T to Kr−1 , is clearly a nilpotent operator of index r − 1, whose sequence
of generalized eigenspaces can be obtained from that of T by just excluding the last one, namely Kr .
It follows that

sr−1 (T 1 ) = qr−1 = qr−1 (T )


and

si (T 1 ) = si = si (T ) for i = 1, 2, . . . , r − 2.

Note that sr (T 1 ) is undefined. Observe also that m(T 1 ) = nullity(T 1 ) = m, as the kernel of T as well as
of T 1 is the same, namely K1 . Therefore, by the induction hypothesis, there is a set S1 of m vectors
u1 , u2 , . . . , um in Kr−1 , which includes T v1 , T v2 , . . . , T v sr by our claim in the preceding paragraph,
and is such that
(a) exactly s j (T 1 ) of the vectors are from K j − K j−1 for j = 1, 2, . . . , r − 1;
(b) non-zero vectors of the form T 1 k ui for i = 1, 2, . . . , m and k ≥ 0 form a basis of Kr−1 .
To obtain the required vectors in V, we replace the sr vectors T v1 , T v2 , . . . , T v sr in S1 by
v1 , v2 , . . . , v sr and rename the rest of the vectors in S1 as v sr +1 , . . . , vm to obtain a set S of m vectors
in Kr . Note that S1 had exactly sr−1 (T 1 ) vectors from Kr−1 . Since

sr−1 (T 1 ) = qr−1 (T )

and

sr−1 (T ) = qr−1 (T ) − qr (T ) = qr−1 (T ) − sr ,

it follows by the construction of S that it has exactly sr vectors from Kr − Kr−1 and sr−1 = sr−1 (T )
vectors from Kr−1 − Kr−2 . Furthermore, we have already noted that si (T 1 ) = si (T ) = si for i =
1, 2, . . . , r − 2, so S has exactly si vectors from Ki − Ki−1 . Finally, T and T 1 coincide on Kr−1 . Therefore,
the properties of S1 , noted earlier, prove that S is the required set in Kr = V. !

This technical result provides us with all the information about canonical forms of nilpotent opera-
tors.

Proposition 6.3.3. Let T be a nilpotent operator of index of nilpotency r on a finite-dimensional


vector space V. Then, there is a basis of V with respect to which the matrix of T is a direct sum of
copies of elementary Jordan matrices Jr (0), Jr−1 (0), . . . , J1 (0) with Jl (0) appearing in the direct sum
exactly sl times. The total number of elementary Jordan matrices in the sum is nullity(T ). The sizes
and the number of elementary Jordan matrices are independent of the choice of the basis and uniquely
determined by T .

Proof. Given the nilpotent operator T of index r on V, consider the set of vectors v1 , v2 , . . . , vm
of the preceding proposition, where m = nullity(T ). For each i, 1 ≤ i ≤ m, let Vi be the T -nilcyclic
subspace of V spanned by the vectors T k vi for k ≥ 0; then dim Vi = l if and only if vi ∈ Kl − Kl−1 . In
that case, vi , T vi , . . . , T l−1 vi form a basis of Vi , with respect to which the matrix of the restriction
of T to Vi is precisely Jl (0), the l × l elementary Jordan matrix of eigenvalue 0. The proposition also
implies that the number of such T -cyclic subspaces is sl for any 1 ≤ l ≤ r. It is also clear that dim V =
dim V1 + dim V2 + · · · + dim Vm . Therefore if B is the union of the nilcyclic bases of all the Vi , then B
is a basis of V and the matrix of T with respect to B is clearly the required direct sum of elementary
Jordan blocks.
Note that the number sl of l × l Jordan blocks appearing in the direct sum is given by

sr = qr and sl = ql − ql+1 for l < r,

where ql = dim(ker T l / ker T l−1 ) for any l. Thus, sl is determined uniquely by T and so independent
of the basis chosen for V. Similarly, the total number of Jordan blocks appearing in the direct sum is
uniquely determined by T , as it equals nullity(T ). !
We restate this result about the existence of a special form of matrix representation of a nilpotent
operator in a more convenient form by changing the notation slightly.

Theorem 6.3.4. Let T be a nilpotent operator of index of nilpotency r on a finite-dimensional


vector space V. Then, T determines a unique set of positive integers n1 ≥ n2 ≥ · · · ≥ nm with n1 = r,
and n1 + n2 + · · · + nm = dim V, such that there is a basis of V relative to which the matrix of T is a
direct sum of the following elementary Jordan blocks of eigenvalue 0, namely

Jn1 (0), Jn2 (0), . . . , Jnm (0).

Equivalently,

V = Wn1 ⊕ Wn2 ⊕ · · · ⊕ Wnm , (6.15)


where the subspace Wni is a T -nilcyclic subspace of dimension ni . As far as the change of notation
is concerned, note that, for example, our new integers n_1 = n_2 = · · · = n_{s_r} are all equal to r, and if
s_{r−1} ≠ 0, then the corresponding integers are n_{s_r + 1} = · · · = n_{s_r + s_{r−1}} = r − 1, and so on.
It is clear from Proposition (6.3.3) that T determines these integers ni uniquely, subject to the
following two conditions: (i) the ni are non-increasing, and (ii) the sum of the ni equals dim V.
These integers are called the invariants of the nilpotent operator T . Note that there has to be at
least one invariant of T , namely n1 = r.
As usual, we have the matrix analogue of the results proved just now, and we frame them leaving
the derivation to the reader.

Theorem 6.3.5. Let A ∈ Mn (F) be nilpotent of index r. Then, there is a set of positive integers
n1 ≥ n2 ≥ · · · ≥ nm with n1 = r, 1 ≤ ni ≤ r, and n1 + n2 + · · · + nm = n, such that A is similar to a
matrix which is the direct sum of elementary Jordan blocks Jn1 (0), Jn2 (0), . . . , Jnm (0). The number m
of Jordan blocks equals nullity(A).

As in the case of operators, these integers are called the invariants of the nilpotent matrix A.
Also, the matrix which is the direct sum of the elementary Jordan blocks determined by A is called
the Jordan form or the canonical form of the nilpotent matrix A. Note that the number sl (A) of
elementary Jordan blocks of size l for any 1 ≤ l ≤ r, in the Jordan form of a matrix A is given by

sr (A) = qr (A) and sl (A) = ql (A) − ql+1(A) for l < r, (6.16)

where ql (A) = nullity(Al ) − nullity(Al−1).


We remark here that the invariants of a nilpotent operator T and any matrix representation A of
T are exactly the same, for, as the discussion preceding Theorem (6.3.4) shows, the invariants are
determined completely by the dimensions of ker T l for various l’s and hence by the nullities of Al .
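Formula (6.16) is easy to turn into a computation: for a nilpotent matrix, the sizes and multiplicities of its Jordan blocks can be read off from the nullities of its successive powers. The Python (SymPy) sketch below illustrates this on a 6 × 6 matrix which we build ourselves as a conjugate of J3(0) ⊕ J2(0) ⊕ J1(0); both the matrix and the conjugating matrix are arbitrary illustrative choices, not data from the text.

import sympy as sp

# A 6 x 6 nilpotent matrix similar to J3(0) + J2(0) + J1(0) (direct sum), built by conjugating
# the block-diagonal model by an invertible matrix Q.
J = sp.zeros(6, 6)
J[1, 0] = J[2, 1] = 1          # J3(0) in the first three rows and columns
J[4, 3] = 1                    # J2(0) in the next two; J1(0) is the last row and column
Q = sp.eye(6)
for i in range(5):
    Q[i, i + 1] = 1            # unit upper bidiagonal, hence invertible
A = Q * J * Q.inv()

n = A.rows
nullity = [n - (A**l).rank() for l in range(n + 1)]      # nullity(A^0), ..., nullity(A^n)
r = next(l for l in range(1, n + 1) if nullity[l] == n)  # index of nilpotency
q = [None] + [nullity[l] - nullity[l - 1] for l in range(1, r + 1)]
s = {r: q[r]}
for l in range(r - 1, 0, -1):
    s[l] = q[l] - q[l + 1]                               # Equation (6.16)
print(r, s)    # expected: 3 {3: 1, 2: 1, 1: 1}, i.e. invariants 3, 2, 1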

The uniqueness of the invariants of a nilpotent operator implies the following useful result.

Proposition 6.3.6. Let A, B ∈ Mn (F) be nilpotent matrices having invariants n1 ≥ n2 ≥ · · · ≥ nm


and t1 ≥ t2 ≥ · · · ≥ tq , respectively. Then, A and B are similar if and only if m = q and ni = ti for all i.

Proof. If A and B have the same set of invariants, then they will be similar to the same Jordan form,
and so must themselves be similar.
If A and B are similar, then they are the representations of some nilpotent operator T on Fn with
respect to two bases of Fn . Since a nilpotent operator and any of its matrix representation share the
same invariants, it follows that A and B have the same invariants. !

Similarly, the following result about similar nilpotent operators holds.

Proposition 6.3.7. Two nilpotent operators on a finite-dimensional vector space are similar if and
only if they have the same invariants.

We consider some examples now. Recall that a nilpotent operator on an n-dimensional vector
space, or an n × n matrix with index of nilpotency r, has minimal polynomial xr and characteristic
polynomial xn .
EXAMPLE 1 Suppose that the invariants of a nilpotent operator T (or of a nilpotent matrix A)
are 3, 3, 2, 1, 1. Then, we have the following information about T : T is nilpotent of
index 3, acting on a vector space of dimension 3 + 3 + 2 + 1 + 1 = 10, or A is a 10 × 10
matrix of index of nilpotency 3. The Jordan form of T or A is the direct sum of
the elementary Jordan blocks J3 (0), J3 (0), J2 (0), J1 (0), J1 (0). Explicitly, the Jordan
form will be
 
 0 0 0 0 0 0 0 0 0 0 
 1 0 0 0 0 0 0 0 0 0 
 
 0 1 0 0 0 0 0 0 0 0 
 
 0 0 0 0 0 0 0 0 0 0 
 0 0 0 1 0 0 0 0 0 0 
 .
 0 0 0 0 1 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 
 
 0 0 0 0 0 0 1 0 0 0 
 
 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0
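Conversely, a matrix in the Jordan form prescribed by a list of invariants can be assembled mechanically, which is convenient for experiments. In the Python (SymPy) sketch below the helper names jordan_block_lower and direct_sum are our own illustrative choices; the block convention (1s on the subdiagonal) follows the text.

import sympy as sp

def jordan_block_lower(n, lam):
    # n x n elementary Jordan block J_n(lam) with 1s on the subdiagonal (the text's convention).
    J = lam * sp.eye(n)
    for i in range(1, n):
        J[i, i - 1] = 1
    return J

def direct_sum(blocks):
    # Block-diagonal matrix having the given blocks along the diagonal.
    return sp.diag(*blocks)

invariants = [3, 3, 2, 1, 1]          # the invariants of this example
J = direct_sum([jordan_block_lower(n, 0) for n in invariants])
print(J.shape)                        # (10, 10): the 10 x 10 Jordan form displayed above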
EXAMPLE 2 Assume that T is a nilpotent operator with minimal polynomial x3 and characteristic
polynomial x7 . This information is not enough to determine the Jordan form of T or
the similarity class of T . All that we can do is to specify the possible Jordan forms
of T . Note that T is acting on a vector space of dimension 7 (as the characteristic
polynomial has degree 7) and has index of nilpotency 3. The Jordan form of T can
be the direct sum of the elementary Jordan blocks listed in any one of the following:
• J3 (0), J3 (0), J1 (0)
• J3 (0), J2 (0), J2 (0)
• J3 (0), J2 (0), J1 (0), J1 (0)
• J3 (0), J1 (0), J1 (0), J1 (0), J1 (0).
These possibilities are determined by finding positive integers n1 ≥ n2 ≥ · · · ≥ nt ,


where n1 = 3 and n1 + n2 + · · · + nt = 7.
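Listing the possibilities in Examples 2 and 3 is just an enumeration of all non-increasing sequences of positive integers with prescribed first term (the index of nilpotency) and prescribed sum (the dimension). A small Python sketch of such an enumeration (the function name is ours) follows; run on the data of Example 2 it reproduces the four possibilities listed above.

def possible_invariants(r, n):
    # All non-increasing sequences of positive integers with first term r and sum n.
    def partitions(total, largest):
        if total == 0:
            yield ()
            return
        for part in range(min(total, largest), 0, -1):
            for rest in partitions(total - part, part):
                yield (part,) + rest
    return [(r,) + rest for rest in partitions(n - r, r)]

# Example 2: index of nilpotency 3, dimension 7.
for invariants in possible_invariants(3, 7):
    print(invariants)
# (3, 3, 1), (3, 2, 2), (3, 2, 1, 1), (3, 1, 1, 1, 1) -- the four possibilities listed above.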
EXAMPLE 3 Consider two 3 × 3 nilpotent matrices over any field F having the same minimal
polynomial. Note that if the minimal polynomial is x3 , then the Jordan form of both
will be J3 (0), and so they are similar. For the other possibility, note that if a 3 × 3
nilpotent matrix has minimal polynomial x2 , its invariants have to be 2 and 1. Thus,
by Proposition (6.3.6), any two nilpotent matrices having minimal polynomial x2
will also be similar.
It is left as an exercise to the reader to find two 4 × 4 nilpotent matrices having the
same minimal polynomial but which are not similar over any given field.
EXAMPLE 4 Let T be a linear operator on R4 such that it is represented with respect to the standard
basis by the matrix:
 
0 0 0 0

1 0 0 0
A =  .
2 3 0 0
4 5 6 0

Computing higher powers of A, we see that


   
 0 0 0 0
  0 0 0 0

 0 0 0 0  0 0 0 0
A2 =   and A3 =  
 3 0 0 0  0 0 0 0
17 18 0 0 18 0 0 0

and A4 the zero matrix. Thus, A and T are nilpotent of index 4, and it is possible to
find a T -nilcyclic basis of R4 . We seek first a vector v in R4 such that T 3 v is non-
zero. Equivalently, we seek v = (x1 , x2 , x3 , x4 )t ∈ R4 such that A3 (x1 , x2 , x3 , x4 )t ≠
(0, 0, 0, 0)t ; from the explicit description of A3 , it is clear that (x1 , x2 , x3 , x4 )t
satisfies the condition if and only if x1 ≠ 0. Thus, we may choose the required v in many ways.
For simplicity, let us choose v = (1, 0, 0, 0)t = e1 . Then simple matrix multiplica-
tions show that Av = (0, 1, 2, 4)t , A2 v = (0, 0, 3, 17)t and A3 v = (0, 0, 0, 18)t . It fol-
lows, therefore, by Proposition (5.5.15) that v = (1, 0, 0, 0)t , T v = (0, 1, 2, 4)t , T 2 v =
(0, 0, 3, 17)t and T 3 v = (0, 0, 0, 18)t form a T -nilcyclic basis of R4 , relative to which
the matrix of T is the elementary Jordan block:
 
0 0 0 0

1 0 0 0
J4 (0) =  .
0 1 0 0
0 0 1 0

We leave it to the reader to determine the matrix P such that P−1 AP = J4 (0).
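The matrix P asked for above simply has the nilcyclic basis vectors v, Av, A2 v, A3 v as its columns. The following Python (SymPy) sketch, offered only as an optional computational check, carries this out and prints P−1 AP, which is J4 (0).

import sympy as sp

A = sp.Matrix([[0, 0, 0, 0],
               [1, 0, 0, 0],
               [2, 3, 0, 0],
               [4, 5, 6, 0]])
v = sp.Matrix([1, 0, 0, 0])                          # the vector e1 chosen in the example

P = sp.Matrix.hstack(v, A * v, A**2 * v, A**3 * v)   # nilcyclic basis as columns
print(P.inv() * A * P)                               # the elementary Jordan block J4(0)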
We now discuss the Jordan forms for general linear operators. To make the ideas clear, we first
look at a special case of a linear operator T on an n-dimensional vector space V whose characteristic
polynomial is of the form, say (x − λ)n for some scalar λ, so that T has a single eigenvalue. Thus the
minimal polynomial of T is (x − λ)r for some r ≤ n. To analyse T , we consider instead the operator S
on V given by S = T − λI, where I is the identity map on V. Then, by Corollary (6.2.2), S is nilpotent
of index r (as the minimal polynomial of S has to be xr ). Let r = n1 ≥ n2 ≥ · · · ≥ nm be the invariants
of S . Thus there is a basis {vi } of V such that the matrix J of S relative to this basis is a direct sum
of the Jordan blocks Jn1 (0), Jn2 (0), . . . , Jnm (0). Since T vi = S vi + λvi , it follows that the matrix of T
with respect to the same basis will be the sum of J, the matrix of S , and diag[λ, λ, . . . , λ], the matrix
of λI. In other words, the matrix of T will differ from J only in the diagonal; instead of 0 all along
the diagonal as in J, the matrix of T will have λ along the diagonal. Thus, the matrix of T will be the
direct sum of elementary Jordan blocks with eigenvalue λ. Recall that an elementary Jordan block
Jl (λ), of eigenvalue λ, is a matrix of order l having λ along the diagonal, 1 along the subdiagonal and
zeros everywhere else.
Because of the properties of the Jordan form of the nilpotent operator S (determined earlier in the
section), we can therefore conclude that if T ∈ EndF (V) has characteristic polynomial (x − λ)n and
minimal polynomial (x − λ)r , then there is a unique set of positive integers n1 ≥ n2 ≥ · · · ≥ nt with
n1 = r and ∑_{j=1}^{t} n_j = n such that there is a basis of V, relative to which the matrix A of T is the direct
sum of Jordan blocks Jn1 (λ), Jn2 (λ), . . . , Jnm (λ), that is,

A = Jn1 (λ) ⊕ Jn2 (λ) ⊕ · · · ⊕ Jnm (λ).

We say, in general, that a matrix is in Jordan form if it can be expressed as a direct sum of elemen-
tary Jordan blocks of possibly different eigenvalues. Thus, we have shown that if the characteristic
polynomial of a linear operator on a finite-dimensional vector space of the form (x − λ)d , then it can
be represented by a matrix in Jordan form.
But given such a T , how does one determine the unique integers ni ? Since T has been shown to
be a sum of a nilpotent operator and a scalar multiple of the identity operator, it follows that these
integers for T are the same as the ones for the nilpotent part, which, by our discussion of invariants of
a nilpotent operator, can be described as follows:

(a) n1 = r is the degree of the minimal polynomial of T .


(b) The number of Jordan blocks of size r, that is, the number of ni such that ni = n1 = r, is
precisely sr = qr = dim ker(T − λI)r − dim ker(T − λI)r−1 .
(c) For l = r − 1, r − 2, . . . , 2, 1, there will be exactly sl = (ql − ql+1 ) number of Jordan blocks of
size l where, for l > 1,
ql = dim ker(T − λI)l − dim ker(T − λI)l−1 , which implies that
ql = rank(T − λI)l−1 − rank(T − λI)l .
Moreover, one also has q1 = dim ker(T − λI) = n − rank(T − λI).
Note the use of the dimension formula of Theorem (4.2.7) in simplifying the expressions for q’s.
(d) The total number of Jordan blocks is

(q1 − q2 ) + (q2 − q3 ) + · · · + (qr−1 − qr ) + qr = q1 = nullity(T − λI).

In practice, these integers are best determined by forming the matrix (T − λI) with respect to any
convenient basis of V, and computing the ranks of successive powers of that matrix. We consider a
simple example to illustrate these remarks.

EXAMPLE 5 Consider an operator T whose matrix with respect to the standard basis of F4 is
 
              λ    1    0     −1
              0   λ−1   0      1
     A  =     0    0    λ      0   ,
              1    0    0    λ+1

where λ is an arbitrary but fixed scalar from the field F. It is easy to see that the
characteristic polynomial of A, and therefore of T , is (x − λ)4 . Thus, T − λI is nilpotent, where I
is the identity operator on F4 . We compute the powers of A − λI4 , the matrix of T − λI
with respect to the standard basis:
 
0 1 0 −1

0 −1 0 1
A − λI4 =  ; rank(A − λI4) = 2;
0 0 0 0
1 0 0 1
 
−1 −1 0 0
 1 
2 1 0 0
(A − λI4) =  ; rank(A − λI4)2 = 1;
 0 0 0 0
1 1 0 0

Finally, (A − λI4 )3 is the zero matrix, showing that x3 is the minimal polynomial
of A − λI4 and of T − λI. It follows from the formulae enumerated in the preceding
remarks that

s3 = q3 = nullity(A − λI4)3 − nullity(A − λI4)2 = 4 − 3 = 1.

Similarly, as q2 = nullity(A − λI4)2 − nullity(A − λI4) = 3 − 2 = 1, one obtains s2 =
q2 − q3 = 0. Finally, s1 = q1 − q2 = 2 − 1 = 1.
So the Jordan form of the nilpotent operator T − λI will have 2 elementary blocks,
and the order of the first block must be 3. We can, therefore, conclude that the
Jordan form of T has to be the following direct sum of J3 (λ) and J1 (λ):
 
              λ    0    0    0
              1    λ    0    0
              0    1    λ    0  .
              0    0    0    λ
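The rank computations of this example are easily mechanized. Since λ cancels in A − λI4, the matrix whose powers we need is constant, and the following Python (SymPy) sketch (an optional check, not part of the text) recovers s3 = 1, s2 = 0 and s1 = 1, confirming the Jordan form J3 (λ) ⊕ J1 (λ) found above.

import sympy as sp

# A - lambda*I4 from this example; the entries, and hence all ranks, do not involve lambda.
N = sp.Matrix([[0,  1, 0, -1],
               [0, -1, 0,  1],
               [0,  0, 0,  0],
               [1,  0, 0,  1]])

nullity = [4 - (N**l).rank() for l in range(4)]                  # [0, 2, 3, 4] for l = 0, 1, 2, 3
q1, q2, q3 = (nullity[l] - nullity[l - 1] for l in (1, 2, 3))    # 2, 1, 1
s3, s2, s1 = q3, q2 - q3, q1 - q2
print(s3, s2, s1)        # 1 0 1: one block J3(lambda) and one block J1(lambda)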

We can now put all the pieces together (one of the pieces being the primary decomposition Theorem
(6.2.1)) to state the definitive result about the existence and uniqueness of the Jordan form of a linear
operator whose characteristic polynomial, and therefore the minimal polynomial, factors completely
into a product of linear factors over the base field. We first make precise the idea of a matrix in Jordan
form.

Definition 6.3.8. A matrix A in Mn (F), whose characteristic polynomial factors into linear factors
over F and which has m distinct eigenvalues λ1 , λ2 , . . . , λm , is said to be a matrix in Jordan form if
(a) A is a direct sum of m submatrices, say, A1 , A2 , . . . , Am ;


(b) for each j, 1 ≤ j ≤ m, the submatrix A j is a direct sum of elementary Jordan blocks of non-
increasing orders, each with eigenvalue λ j ; the number of such blocks inside A j is nullity(A j −
λ j I j ), where I j is the identity matrix of order equal to that of A j .

Note that if m = 1, then A = A1 . Also, note that if A is in Jordan form then A is a lower triangular
matrix.

Theorem 6.3.9. Let T be a linear operator on a finite-dimensional vector space over a field F.
Suppose that the characteristic polynomial of T factors into a product of linear factors over F and T
has m distinct eigenvalues λ1 , λ2 , . . . , λm . Then, there is a basis of V with respect to which the matrix
A of T is in Jordan form with

A = A1 ⊕ A2 ⊕ · · · ⊕ Am ,

where, for each j, (1 ≤ j ≤ m), A j itself is a direct sum of elementary Jordan blocks of the type Jl (λ j ).

Proof. We sketch the proof as the ideas involved have already been encountered. By hypothesis, we can
assume that the characteristic polynomial of T factors over F as ∏_{j=1}^{m} (x − λ_j)^{d_j} , where d j are positive
integers. Therefore, there are positive integers r j , where r j ≤ d j for each j, such that the minimal
polynomial of T is of the form
∏_{j=1}^{m} (x − λ_j)^{r_j} .

For each j, let W j = ker(T − λ j I)r j . Then the primary decomposition theorem implies that

V = W1 ⊕ W2 ⊕ · · · ⊕ Wm ,

where each W j is T -invariant, and the restriction T j of T to W j has minimal polynomial (x − λ j )r j .


Therefore, if I j denotes the identity map on the subspace W j , then S j = T j − λ j I j is nilpotent on W j of
index r j . Observe that on W j , T j acts like S j + λ j I j .
The discussion preceding the example is applicable to each T j , and therefore, we may choose a
basis B j of W j with respect to which the matrix A j of T j is the direct sum of Jordan blocks:

Jn1, j (λ j ), Jn2, j (λ j ), . . . , Jnm j , j (λ j ),

where the positive integers n1, j ≥ n2, j ≥ · · · ≥ nm j , j are the invariants of the nilpotent operator S j on
W j . Note that the sum of these integers equals dim W j .
As V = W1 ⊕ W2 ⊕ · · · ⊕ Wm , stringing together the bases B1 , B2 , . . . , Bm , we get a basis of V, with respect to
which the matrix J of T has the required form. !
Like the canonical form of a nilpotent matrix, the Jordan form of an operator or a matrix, if it exists,
is essentially unique as shown in the next proposition.

Proposition 6.3.10. Let T be a linear operator on a finite-dimensional vector space V over a field
F such that the characteristic polynomial of T factors into a product of linear factors over F. Then T
determines its Jordan form A uniquely up to the order in which the eigenvalues of T appear inside A.

Proof. Let A = A1 ⊕ A2 ⊕ · · · ⊕ Am be a matrix in Jordan form representing T , corresponding to the


distinct eigenvalues λ1 , λ2 , . . . , λm of T . Thus, each A j itself is a direct sum of certain number of
elementary Jordan blocks with eigenvalue λ j and so is a lower triangular matrix with λ j appearing
along the diagonal. That T determines A uniquely (up to the order in which the matrices A j appear
inside A) is clear from the following observations.
(a) The number of submatrices A j is the number of distinct eigenvalues of T .
(b) For each fixed j, the submatrix A j alone, among the submatrices A1 , A2 , . . . , Am , has λ j along
its diagonal. Therefore, the number of times λ j appears along the diagonal of the lower triangular
matrix A j is clearly the multiplicity of the eigenvalue λ j as a root of the characteristic polynomial
of T . It follows that the order of A j is determined by the characteristic polynomial of T .
(c) Finally, we claim that the number, say q j , of elementary Jordan blocks inside A j is also deter-
mined by T . It is clear that q j is the nullity of the matrix A j − λ j I j , where I j is the identity matrix
of order the same as that of A j . We prove that

Nullity(A j − λ j I j ) = dim ker(T − λ j I), (6.17)

where I is the identity map on V; once proven, this relation establishes our claim. To prove
(6.17), we begin by considering the decomposition

V = V1 ⊕ V2 ⊕ · · · ⊕ Vm

of V as a direct sum of T -invariant subspaces corresponding to the matrix representation A =


A1 ⊕ A2 ⊕ · · · ⊕ Am of T . Let B j be the basis of V j for 1 ≤ j ≤ m such that the matrix of T with
respect to the basis B of V, obtained by taking the union of the B j , is A. Then, the matrix of T j ,
the restriction of T to the T -invariant subspace V j , is A j with respect to the basis B j . Observe
that, for k ≠ j, the matrix of T k − λ j Ik (where we let Ik also denote the identity map of Vk ) with
respect to the basis Bk is lower triangular with non-zero diagonal entries, each equal to λk − λ j .
Therefore, the operator T k − λ j Ik on Vk is invertible and so ker(T k − λ j Ik ) is the zero subspace of
Vk . Now, expressing any v = ∑_{k=1}^{m} v_k as the unique sum of vectors from the T -invariant subspaces
Vk , we see that
(T − λ_j I)v = ∑_{k=1}^{m} (T_k − λ_j I_k) v_k .                                   (6.18)
Now recall that by properties of direct sum decomposition (see Proposition 3.5.4), ∑_{k=1}^{m} w_k = 0
for wk ∈ Vk if and only if each wk = 0. It then follows from Equation (6.18) (as a consequence of
our observation preceding the equation), that v ∈ ker(T − λ j I) if and only if v j ∈ ker(T j − λ j I j ).
Since dim ker(T j − λ j I j ) is the nullity of A j − λ j I j , Equation (6.17) follows.
The proof of the proposition is complete. !

We must point out, even at the risk of being repetitive, the basic features of the Jordan form of a
general linear operator.
(a) Each A j is a d j × d j matrix with a single eigenvalue λ j .
(b) Each A j , being the Jordan form of T j , is itself a direct sum of elementary Jordan blocks with
eigenvalue λ j . The first of these Jordan blocks will be Jr j (λ j ), r j being the multiplicity of λ j as
a root of the minimal polynomial of T j . The sizes of these Jordan blocks within A j from left to
right are non-increasing.
The type and frequency of these Jordan blocks within A j are determined by the procedure
outlined just before the Theorem (6.3.9).
(c) The number of elementary Jordan blocks in A j equals dim ker(T − λ j I). The sum of the sizes of
the blocks in A j must be d j .
(d) Thus, A j is a lower triangular matrix having the eigenvalue λ j along the diagonal (d j times), and
having either 1 or 0 along the subdiagonal.

As an illustration, we derive the Jordan form of a diagonalizable operator T on a finite-dimensional


vector space over a field F. Recall that the characteristic polynomial of T is a product ∏_{j=1}^{m} (x − λ_j)^{d_j}
of linear factors over F, where the m scalars λ j are the distinct eigenvalues of T . Thus, T does have
a Jordan form J which is the direct sum of m matrices A j . We examine these A j now. As T is diag-
onalizable the minimal polynomial of T is the product of distinct linear factors and so the minimal
polynomial of T is ∏_{j=1}^{m} (x − λ_j).
Thus, the first, and therefore each of the elementary Jordan blocks with eigenvalue λ j that comprises
A j must be of order 1, that is, each block is the scalar λ j . We conclude that each A j is a d j × d j diagonal
matrix having the eigenvalue λ j on the diagonal. In other words, the Jordan form of T is the diagonal
matrix having the eigenvalues appearing along the diagonal as many times as their multiplicities as
roots of the characteristic polynomial. So, the Jordan form of the diagonalizable operator is nothing
but the diagonal matrix of T relative to a basis of eigenvectors of T that we derived earlier.
The matrix analogue of Theorem (6.3.9) is clear.

Proposition 6.3.11. Let A be a matrix in Mn (F) such that its characteristic polynomial factors into
linear factors over F. Then, A is similar to a matrix in Jordan form in Mn (F). Further, the Jordan form
of A is unique up to the order of appearance of its eigenvalues.

The uniqueness of the Jordan form of a matrix allows us to settle questions regarding similarity of
matrices having Jordan forms.

Corollary 6.3.12. Let A and B be two matrices in Mn (F) having Jordan forms. Then they are similar
if and only if their Jordan forms (up to a rearrangement of eigenvalues) are the same.
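Similarity questions of this kind can also be settled with a computer algebra system. SymPy, for instance, provides Matrix.jordan_form(), which returns a pair (P, J) with A = P J P−1; note that SymPy places the 1s on the superdiagonal, so its J is the transpose of the Jordan form in the convention of this text, which does not affect similarity questions. The sketch below, included only as an aside, applies this to the pair of 3 × 3 matrices that will appear in Example 8.

import sympy as sp

A = sp.Matrix([[3, 1, 1],
               [0, 3, 0],
               [0, 0, 3]])
B = sp.Matrix([[3, 0, 0],
               [1, 3, 0],
               [0, 0, 3]])

PA, JA = A.jordan_form()
PB, JB = B.jordan_form()
print(JA)            # a J2(3) and a J1(3) block, written with 1s on the superdiagonal
print(JA == JB)      # should print True: same Jordan form, so A and B are similar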

Let us discuss a few examples of Jordan forms. Recall that in Examples 4 and 5, we had already
discussed the Jordan form of an operator by relating it to some nilpotent one. As will be shown in
the next set of examples, in simple cases we can do away with this intermediate step of referring to
nilpotent operators, and compute the required Jordan forms by determining the relevant invariants by
examining the restrictions on them.

EXAMPLE 6 We find the possible Jordan forms of a linear operator T on an n-dimensional vector
space having the minimal polynomial (x − 1)2 for n = 3, 4 or 5. Since T has a single
eigenvalue 1, the required Jordan form for any n is the direct sum of elementary
Jordan blocks with eigenvalue 1 alone. (In our notation, J has a single A j .) Recalling
the basic properties of A j , we therefore see that our task is to determine integers
n1 ≥ n2 ≥ n3 ≥ · · · such that

n1 = 2 and n1 + n2 + n3 + · · · = n                                                (6.19)
for, the characteristic polynomial of T has to be (x − 1)n . (In terms of our notation,
d1 = n.)
For n = 3: The only possible choice we have in this case is n1 = 2 and n2 = 1 so
that T has a unique Jordan form J which is the direct sum of J2 (1) and J1 (1):
 
1 0 0
 
J = 1 1 0.
 
0 0 1

For n = 4: This time there are two choices for the set of integers satisfying Equa-
tion (6.19): n1 = 2, n2 = 2 or n1 = 2, n2 = 1, n3 = 1. Correspondingly, there are two
possibilities for the Jordan form of T :
   
1 0 0 0 1 0 0 0
 
1 1 0 0 1 1 0 0
J =   or J =  .
0 0 1 0 0 0 1 0

0 0 1 1 0 0 0 1

For n = 5: It is easy to check that this time too, there are two sets of integers
satisfying Equation (6.19). The required Jordan form is thus either the direct sum of
J2 (1), J2 (1) and J1 (1) or of J2 (1) and three copies of J1 (1):

              1  0  0  0  0                  1  0  0  0  0
              1  1  0  0  0                  1  1  0  0  0
     J  =     0  0  1  0  0     or    J  =   0  0  1  0  0  .
              0  0  1  1  0                  0  0  0  1  0
              0  0  0  0  1                  0  0  0  0  1

EXAMPLE 7 Let us find the possible Jordan forms of a linear operator T on an eight-dimensional
vector space whose minimal polynomial is (x−1)3(x+1)4. Any of these Jordan forms
will be the direct sum of two matrices A1 and A2 , with eigenvalues 1 and −1, respec-
tively. The compositions of these two matrices will be determined by the algebraic
multiplicities of the corresponding eigenvalues, and therefore by the characteristic
polynomial of T . Note that the characteristic polynomial of T is of degree 8. More-
over, as the minimal polynomial and the characteristic polynomial have the same
linear factors, and as the minimal polynomial divides the characteristic polynomial,
we can determine possible characteristic polynomials of T easily. Accordingly, we
have the following cases to deal with.
Case 1: Characteristic polynomial (x − 1)3 (x + 1)5 . The point to note is that
once the multiplicities of an eigenvalue in the characteristic and the minimal poly-
nomial are known, the determination of the corresponding A j is completely in-
dependent of the other eigenvalues; one just follows the procedure of Example
6 for each eigenvalue separately. Thus, in this case, A1 is the Jordan form of
an operator having (x − 1)3 as the minimal as well as the characteristic polyno-
mial, whereas A2 is the Jordan form of one with minimal polynomial (x + 1)4
and characteristic polynomial (x + 1)5 . Recall that the degree of the minimal poly-
nomial is the size of the first block, and the sizes of the subsequent blocks are
non-increasing. Thus, A1 has to be J3 (1), and A2 has to be the direct sum of J4 (−1)
and J1 (−1):
 
  −1 0 0 0 0
 1 
1 0 0  −1 0 0 0
  
A1 = 1 1 0 and A2 =  0 1 −1 0 0.

   0
0 1 1  0 1 −1 0
0 0 0 0 −1

Case 2: Characteristic polynomial (x − 1)4(x + 1)4. This time we can assume that
A1 is the Jordan form of an operator having (x − 1)3 as the minimal polynomial but
(x − 1)4 as its characteristic polynomial, and A2 is the Jordan form of one having
(x + 1)4 as the minimal as well as the characteristic polynomial. As in the first case,
there can only be one choice for both A1 and A2 .
   
1 0 0 0 −1 0 0 0
1  
1 0 0  1 −1 0 0
A1 =   and A2 =  .
0 1 1 0  0 1 −1 0
0 0 0 1 0 0 1 −1

EXAMPLE 8 Let A be a 3 × 3 matrix over a field F having a single eigenvalue λ ∈ F. If the charac-
teristic polynomial of A factors completely into linear factors over F (in which case,
it must be (x − λ)3), then A is similar to a matrix in Jordan form. It is now easy to
see that depending on the minimal polynomial of A, it will be similar to one of the
following three matrices:
     
λ 0 0 λ 0 0 λ 0 0
0 λ 0, 1 λ 0, 1 λ 0.
     
0 0 λ 0 0 λ 0 1 λ

For example, the matrix


   
3 1 1 3 0 0
   
0 3 0 is similar to 1 3 0
 
0 0 3 0 0 3

which is in Jordan form, as they have the same minimal polynomial (x − 3)2.
Lest the reader think that if two matrices having the same characteristic polyno-
mial and the same minimal polynomial are similar to the same Jordan form as this
example suggests, let us remind her/him once more that assertion is not true in most
of the cases. Though the assertion is true for n × n matrices for n ≤ 3, it fails for
matrices for even n = 4. The following two different Jordan forms:
   
3 0 0 0
 3 0 0 0

1 3 0 0 1 3 0 0
  and  
0
 0 3 0 0
 0 3 0
0 0 1 3 0 0 0 3
provide us with a counterexample, as these two have (x − 3)4 and (x − 3)2 as the
characteristic and the minimal polynomial, respectively.
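This counterexample can be confirmed by direct computation: both matrices have characteristic polynomial (x − 3)4 and satisfy (M − 3I4)2 = 0, yet the ranks of M − 3I4 differ, so the numbers of Jordan blocks, and hence the Jordan forms, differ. A Python (SymPy) sketch of the check, included only as an aside, follows.

import sympy as sp

x = sp.symbols('x')
I4 = sp.eye(4)
A = sp.Matrix([[3, 0, 0, 0], [1, 3, 0, 0], [0, 0, 3, 0], [0, 0, 1, 3]])
B = sp.Matrix([[3, 0, 0, 0], [1, 3, 0, 0], [0, 0, 3, 0], [0, 0, 0, 3]])

# Same characteristic polynomial (x - 3)^4 ...
print(sp.factor(A.charpoly(x).as_expr()), sp.factor(B.charpoly(x).as_expr()))
# ... and the same minimal polynomial (x - 3)^2, since neither A - 3I nor B - 3I is zero:
print((A - 3*I4)**2 == sp.zeros(4, 4), (B - 3*I4)**2 == sp.zeros(4, 4))   # True True
# ... but different numbers of Jordan blocks: 4 - rank(M - 3I) is 2 for A and 3 for B.
print((A - 3*I4).rank(), (B - 3*I4).rank())                               # 2 1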

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. Assume
that the characteristic polynomials of the given linear operators or matrices are products of linear
factors.
(a) Any two 3 × 3 nilpotent matrices are similar if and only if they have the same minimal
polynomial.
(b) Any two 3 × 3 matrices are similar if they have the same minimal polynomial.
(c) If a nilpotent operator has invariants 5, 4, 3 and 2, then its nullity is 2.
(d) If a nilpotent operator has invariants 5, 4, 3 and 2, then its rank is 5.
(e) If a nilpotent operator has minimal polynomial x5 , then its Jordan form has at least one
elementary Jordan block of order 5.
(f) Two matrices having the same Jordan form are similar.
(g) Any two linear operators (or matrices) having the same characteristic polynomial and the
same minimal polynomial are similar.
(h) Two matrices having the same characteristic polynomial, the same minimal polynomial and
the same rank are similar.
(i) Two matrices having the same characteristic polynomial, the same minimal polynomial, the
same rank and the same trace are similar.
(j) The Jordan form of a diagonal matrix is itself.
(k) The Jordan form of an upper triangular matrix is a diagonal one.
(l) The basis with respect to which the matrix of a linear operator is its Jordan form is unique.
(m) For a linear operator T , if rank(T k ) = rank(T k+1 ), then nullity(T k ) = nullity(T k+1 ).
(n) For a linear operator T on an n-dimensional vector space with an eigenvalue λ,

ker(T − λI)n = ker(T − λI)k

for any positive integer k.


(o) The Jordan forms of any two 4 × 4 matrices of rank 2 are the same.
(p) If the minimal polynomial of a linear operator having characteristic polynomial x3 (x − 1)2 is
x(x − 1), then its nullity is 5.
2. Find a matrix over C whose characteristic polynomial is x2 (x + 1)4 , the minimal polynomial
x2 (x + 1)2 and whose rank is 4.
3. Is there any complex matrix A of order 3, such that
 
0 0 1

 
A3 = 0 0 0?
 
0 0 0
4. Suppose that a matrix A ∈ M14 (C) has 0 as its only eigenvalue. If rank(A) = 9, rank(A2 ) =
5, rank(A3 ) = 3, rank(A4 ) = 1 and rank(A5 ) = 0, determine the Jordan form of A.
5. Find all possible Jordan forms of matrices in M7 (C) having 1 as their only eigenvalue.
6. Find the Jordan form of a matrix A ∈ M7 (R) having characteristic polynomial (x − 1)3 (x − 2)4
with nullity(A − I7) = 2 and nullity(A − 2I7) = 3.
7. Suppose that A ∈ M13 (R) has the Jordan form

J5 (a) ⊕ J4(a) ⊕ J3(a) ⊕ J1(a)

for some real number a. If I is the identity matrix in M13 (R), compute nullity(A − aI)k for k =
1, 2, 3, . . . .
8. Find all possible Jordan forms of 6 × 6 nilpotent matrices over a field F.
9. Let A and B be 6 × 6 nilpotent matrices over a field F having the same minimal polynomial and
the same nullity. Prove that A and B are similar. Give an example to show that the same is not
true for 7 × 7 matrices.
10. Let A and B be matrices over a field F having the same characteristic and the same minimal
polynomial (x − a1 )d1 (x − a2 )d2 · · · (x − am )dm . If di ≤ 3 for every i, then prove that A and B are
similar.
11. Let T be a linear operator on R5 whose matrix with respect to the standard basis is
 
0 1 0 0 a

0 0 0 0 b
 
A = 0 0 0 1 c 

0 0 0 0 d

0 0 0 0 0

for some real a, b, c and d. Find a basis of R5 , with respect to which the matrix of T is its Jordan
form.
12. Find the Jordan forms of the following matrices over R:
   
 1 1 1
  1 1 1

 
−1 1 1 −1 −1 −1
   
1 0 2 1 1 0

 
  1 2 3 4

 1 1 1 
  2 4 6 8
−1 1 1  
  3 6 9 12
1 0 2  
4 8 12 16

   
1 2 3 4
  1 1 0 0

 
0 5 6 7 −2 0 1 0
   .
0 0 8 9  2 0 0 0
   
0 0 0 10 −2 −1 −1 −1
13. Let T be the linear operator on C3 [x], the complex vector space of all polynomials with complex
coefficients of degree at most 3, given by T ( f (x)) = f (x − 1). Find the Jordan form, the trace and
the determinant of T .
14. Let A be a complex matrix with characteristic polynomial ch(x) and minimal polynomial m(x).
If ch(x) = m(x)(x + i) and m(x)2 = ch(x)(x2 + 1), find the possible Jordan forms of A.
15. Prove that two projections on a finite-dimensional vector space having the same rank have the
same Jordan form.
16. Let A be a nilpotent matrix of order n over a field F with index of nilpotency n. Prove that there is
no matrix B over F such that B^2 = A.
17. Let A be a nilpotent matrix of order n over a field F. If B = A − In , then compute the determinant
of B.
18. Let A be an r × r nilpotent matrix of index of nilpotency r over a field F. Prove that A is similar
to its transpose A^t over F.
19. Prove that any matrix over C is similar to its transpose over C.
20. Let A be a matrix in Mn (C) with trace zero. Prove that A is similar to a matrix with all zeros
along the diagonal.
21. Let T be a linear operator on a finite-dimensional vector space V over a field F such that its
minimal polynomial is a product of linear factors over F. Prove that T is diagonalizable if and
only if ker(T − λI)^k = ker(T − λI) for any positive integer k and any eigenvalue λ. (I is the identity
operator on V.)
22. Let A, B ∈ Mn (C) such that AB − BA = A. Show that A^k B − BA^k = kA^k for every positive integer
k. Deduce that A is nilpotent.
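For Exercises 4 and 7 it may help to recall the relations between Jordan block sizes and the ranks of powers: when a is the only eigenvalue of A, nullity((A − aI)^k) is the sum of min(k, s) over the block sizes s, and the number of blocks of size exactly k is rank((A − aI)^{k−1}) − 2 rank((A − aI)^k) + rank((A − aI)^{k+1}). The following Python sketch is only illustrative (the helper names are ad hoc, and a small made-up example is used so as not to give away the exercises):

def nullities_from_blocks(block_sizes):
    # nullity((A - aI)^k) = sum of min(k, s) over the Jordan block sizes s
    kmax = max(block_sizes)
    return [sum(min(k, s) for s in block_sizes) for k in range(1, kmax + 1)]

def blocks_from_ranks(n, ranks):
    # ranks[k-1] = rank((A - aI)^k); assumes a is the only eigenvalue, so that
    # rank((A - aI)^0) = n and the listed ranks eventually reach 0
    r = [n] + list(ranks)
    counts = {}
    for k in range(1, len(ranks) + 1):
        nxt = r[k + 1] if k + 1 < len(r) else 0
        c = r[k - 1] - 2 * r[k] + nxt      # number of blocks of size exactly k
        if c:
            counts[k] = c
    return counts

# a made-up example: one block of size 3, two of size 2, one of size 1 (n = 8)
print(nullities_from_blocks([3, 2, 2, 1]))   # [4, 7, 8]
print(blocks_from_ranks(8, [4, 1, 0]))       # {1: 1, 2: 2, 3: 1}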
7 Bilinear Forms

7.1 INTRODUCTION
As we have seen in Chapter 5, the idea of orthogonality in R^n or C^n, introduced through dot products, was the key to establishing an important result about real symmetric matrices (that they can always be
diagonalized). The concept of bilinear forms generalizes dot products and provides a way for intro-
ducing the idea of orthogonality in arbitrary vector spaces. As such, these forms are basic in various
areas of advanced mathematics as well as in many applications. Along with bilinear forms, we also
introduce alternating bilinear forms in this chapter; such forms and associated symplectic groups are
also important in diverse areas of mathematics and physical sciences.

7.2 BASIC CONCEPTS


Recall that the dot product f (x, y) = ⟨x, y⟩ in R^n (or in C^n) associates a scalar with any two vectors.
The usefulness of the dot product is mainly due to the nice properties it has; for example, in R^n, the
dot product f (x, y) = ⟨x, y⟩ is linear in both the variables. In generalizing the dot product to arbitrary
vector spaces, this bilinearity is the condition that is carried over.

Definition 7.2.1. Let V be a vector space over a field F. A map f : V × V → F is a bilinear form on
V if f satisfies the following conditions:
f (v + v' , w) = f (v, w) + f (v' , w)
f (v, w + w' ) = f (v, w) + f (v, w' )
f (av, w) = a f (v, w)
= f (v, aw)
for all v, v' , w, w' ∈ V and for all a ∈ F.

Thus, if f is bilinear on V, then f (v, w) is a scalar in F for each pair (v, w) of vectors. The reader
should note that the same symbols are used for denoting the operations in V and in F; the context,
however, always makes clear which is meant, so there should be no confusion in this regard.
Note also that a bilinear form is linear in each of the variables so if we fix one variable, the resultant
function is just a linear map from V into F. The next proposition states this property more precisely
and lists some other simple ones.

Proposition 7.2.2. Let f be a bilinear form on a vector space V over a field F.


(a) For a fixed v ∈ V, the map Lv , given by Lv (w) = f (v, w) for any w ∈ V, is linear on V.
(b) For a fixed w ∈ V, the map Rw , given by Rw (v) = f (v, w) for all v ∈ V, is linear on V.
(c) For all v ∈ V, f (0, v) = f (v, 0) = 0.
(d) If g(v, w) = f (w, v) for v, w ∈ V, then g is a bilinear form on V.

Proof. We leave the easy verification to the reader as an exercise. □

Before we consider examples of bilinear forms, we introduce some important classes of such forms.

Definition 7.2.3. A bilinear form f on a vector space V over a field F is symmetric if f (v, w) =
f (w, v) and skew-symmetric if f (v, w) = − f (w, v) for all v, w ∈ V. f is said to be alternating if f (v, v) =
0 for all v ∈ V.

Any alternating form f on a vector space V is skew-symmetric, as can be seen by expanding f (v +
w, v + w) and using the defining relations f (v, v) = 0 = f (w, w). Conversely, a skew-symmetric form f
on V is alternating if in the underlying field F, division by 2 is allowed (that is, ch F ≠ 2).

EXAMPLE 1 In any vector space V over a field F, there is a trivial symmetric bilinear form f given
by f (v, w) = 0, where 0 is the zero of the field F. We will refer to this form as the
zero bilinear form on V, or as the bilinear form on V which is identically zero.

EXAMPLE 2 If V = F^n, the vector space of column vectors or n × 1 matrices over F, then we can
get a symmetric bilinear form f on V just like the dot product on R^n by declaring that
f (x, y) = x^t y for any x, y ∈ V. Note that the matrix product x^t y is a scalar in F. By
properties of matrix multiplication, one verifies easily that f is a symmetric bilinear
form on F^n.
It must be pointed out that what is usually known as the dot product (see Definition
5.3.21 in Section 3.7) in C^n differs in a significant way from the bilinear form given
in the preceding example; if g denotes the usual dot product in C^n, then g(x, ay) =
ā g(x, y) (ā being the complex conjugate of a), which means that g is not linear in the
second variable. Even so, a far-reaching generalization of the usual dot product in C^n,
known as a hermitian form on a complex vector space, is equally important and has
many applications. We study hermitian forms in the next chapter.

EXAMPLE 3 The preceding example is a special case of this one. For the vector space V = F^n, any
fixed matrix A ∈ Mn (F) gives rise to a bilinear form fA on V if we let fA (x, y) = x^t Ay
for any x, y ∈ V.

EXAMPLE 4 We calculate the bilinear form fA on R^2 if

    A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}.
Given arbitrary vectors x = (x1 , x2 )^t and y = (y1 , y2 )^t in R^2, the product x^t A = (x1 +
2x2 , 2x1 + 3x2 ) and so fA (x, y) = x^t Ay = x1 y1 + 2x2 y1 + 2x1 y2 + 3x2 y2 .
This formula, like the one for the dot product on R^2, completely describes the
form fA in terms of the components of vectors of R^2.
Note that in this example, fA (y, x) is the same scalar as fA (x, y) so fA is a sym-
metric form. As we will see in the following general case, this is no accident as A
was a symmetric matrix to begin with.
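Readers who like to experiment may check this formula numerically; the short Python sketch below (the NumPy library is assumed and is not part of the text) evaluates fA for the matrix A of this example and confirms its symmetry:

import numpy as np

A = np.array([[1, 2],
              [2, 3]])

def f_A(x, y):
    # f_A(x, y) = x^t A y, returned as a scalar
    return float(x @ A @ y)

x = np.array([1.0, -2.0])
y = np.array([3.0, 1.0])
# agrees with x1*y1 + 2*x2*y1 + 2*x1*y2 + 3*x2*y2
print(f_A(x, y), x[0]*y[0] + 2*x[1]*y[0] + 2*x[0]*y[1] + 3*x[1]*y[1])
# symmetry of f_A, reflecting the symmetry of A
print(f_A(x, y) == f_A(y, x))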
EXAMPLE 5 For a symmetric matrix A = A^t in Mn (F), one has

    (x^t Ay)^t = y^t A^t (x^t)^t = y^t Ax = fA (y, x),

by properties of transposes of matrices. On the other hand, x^t Ay being a scalar,

    (x^t Ay)^t = x^t Ay = fA (x, y).

It follows that fA (x, y) = fA (y, x) for all x, y in F^n. Thus, the symmetric matrix A
forces the bilinear form fA on F^n to be a symmetric form.
Conversely, assume that fA (x, y) = x^t Ay is a symmetric bilinear form on F^n. Then
for the standard basis (considered as column vectors) e1 , e2 , . . . , en of F^n, we have
fA (ei , e j ) = fA (e j , ei ) for all i, j. But, by the definition of fA , if A = [ai j ], then it is
clear that fA (ei , e j ) = ei^t [ai j ]e j = ai j . A similar calculation shows that fA (e j , ei ) = a ji .
As fA is assumed to be symmetric, we conclude that the matrix A is symmetric.
In a similar manner, the bilinear form fA is an alternating one if A = [ai j ] is an
alternating matrix, that is, A^t = −A and a j j = 0.
These examples show the close relationship between bilinear forms and matrices,
which we will discuss shortly.
For the next example, we need the idea of the trace of a matrix in Mn (F). Recall
that for any A = [ai j ] ∈ Mn (F), the trace of A, denoted by Tr(A), is the sum of the
diagonal elements of A. Tr, as a function from Mn (F) to F, is a linear one: Tr(A +
B) = Tr(A) + Tr(B) and Tr(cA) = c Tr(A) for any A, B ∈ Mn (F) and any c ∈ F.
EXAMPLE 6 For a field F, let V = Mm×n (F) be the vector space of all m × n matrices over F. Let
A ∈ Mm (F) be a fixed square matrix of order m over F and fA the function defined on
V × V as follows:

    fA (X, Y) = Tr(X^t AY)  for X, Y ∈ V.

Since for any X, Y and Z in V and c ∈ F, (cX + Y)^t = cX^t + Y^t by properties of trans-
poses, the linearity of the trace function shows that

    Tr((cX + Y)^t AZ) = c Tr(X^t AZ) + Tr(Y^t AZ).

Thus fA is linear in the first variable. Similarly, linearity of Tr alone shows that fA
is linear in the second variable and so we have verified that fA is a bilinear form on
Mm×n (F).
In particular, by choosing A = Im , the identity matrix in Mm (F), we see that
Tr(X^t Y) defines a bilinear form f on Mm×n (F). Note that Tr(Y^t X) = Tr((X^t Y)^t) =
Tr(X^t Y) as the traces of a matrix and its transpose are the same. Thus f is a symmetric
bilinear form.
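The bilinearity just verified, and the symmetry in the special case A = Im , can be spot-checked numerically; the sketch below is only illustrative and assumes NumPy:

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2
A = rng.standard_normal((m, m))
X, Y, Z = (rng.standard_normal((m, n)) for _ in range(3))
c = 2.5

f = lambda X, Y: np.trace(X.T @ A @ Y)

# linearity in the first variable (up to floating-point rounding)
print(np.isclose(f(c * X + Z, Y), c * f(X, Y) + f(Z, Y)))
# symmetry of the special case A = I_m, i.e. Tr(X^t Y) = Tr(Y^t X)
print(np.isclose(np.trace(X.T @ Y), np.trace(Y.T @ X)))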
EXAMPLE 7 Let V = Rn [x] be the real vector space of all real polynomials of degree at most n for
any fixed positive integer n. Define

    F( f (x), g(x)) = \int_0^1 f (x)g(x)\,dx

for any two polynomials f (x) and g(x) in V, where the integral is the usual Riemann
integral. The familiar properties of integrals show that F is a symmetric bilinear form
on V.
In fact, if f (x) and g(x) are continuous real-valued functions in the real vec-
tor space C[0, 1] of continuous real-valued functions on the closed interval [0, 1],
the same formula for F( f (x), g(x)) provides us with a symmetric bilinear form on
C[0, 1].
Note that, in the same way, \int_a^b f (x)g(x)\,dx defines symmetric bilinear forms on
Rn [x] and C[a, b].
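As an illustration, this integral form can be evaluated exactly with a computer algebra system; the following sketch assumes SymPy and uses two sample polynomials chosen only for illustration:

import sympy as sp

x = sp.symbols('x')
f = 1 + 2*x            # sample polynomials of low degree
g = x**2 - x

# F(f, g) = integral of f(x) g(x) over [0, 1]
F = lambda p, q: sp.integrate(p * q, (x, 0, 1))

print(F(f, g))                                # a rational number, here -1/3
print(sp.simplify(F(f, g) - F(g, f)) == 0)    # symmetry of the form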
The study of bilinear forms on finite-dimensional vector spaces can be facilitated
by associating bilinear forms with matrices. This association, however, depends on
bases chosen for the vector space similar to the way matrix representations of linear
operators depend on bases.

Definition 7.2.4. Let V be an arbitrary n-dimensional vector space over a field F. Given a bilinear
form f on V and a basis B = {v1 , v2 , . . . , vn } of V, the matrix of f with respect to the basis B is the
matrix A = [ai j ] in Mn (F), where ai j = f (vi , v j ) for any i, j such that 1 ≤ i, j ≤ n.

On the other hand, any A = [ai j ] ∈ Mn (F) determines a bilinear form fA on V, with respect to B, as
follows: for vectors v = Σ_j x j v j and w = Σ_j y j v j , let

    fA (v, w) = Σ_{i, j} ai j xi y j .

Calculations similar to the ones in earlier examples show that if x = (x1 , x2 , . . . , xn )^t and y =
(y1 , y2 , . . . , yn )^t denote the coordinate vectors of v and w, respectively, with respect to B, then fA (v, w) =
x^t Ay. Thus, fA is a bilinear form on V (see Example 5). Also, note that fA (vi , v j ) = ai j for any basis vec-
tors vi and v j . Therefore, by Definition (7.2.4), the matrix of fA with respect to the basis B is A itself.
EXAMPLE 8 As an example, we calculate the matrix of the bilinear form fA of Example 4 with
respect to two different bases of R^2. Since fA was given by fA ((x1 , x2 )^t , (y1 , y2 )^t ) =
x1 y1 + 2x2 y1 + 2x1 y2 + 3x2 y2 in that example and the standard basis of R^2 is given
by e1 = (1, 0)^t , e2 = (0, 1)^t , one has fA (e1 , e1 ) = 1, fA (e1 , e2 ) = 2, fA (e2 , e1 ) = 2 and
fA (e2 , e2 ) = 3. It follows that the matrix of fA with respect to the standard basis is A
itself.
Next, consider the basis B = {(1, 1)^t , (1, −1)^t } of R^2. Substituting the coordinates
of these basis vectors in the expression for fA , one finds fA ((1, 1)^t , (1, 1)^t ) = 8,
fA ((1, 1)^t , (1, −1)^t ) = fA ((1, −1)^t , (1, 1)^t ) = −2 and fA ((1, −1)^t , (1, −1)^t ) = 0, so
that the matrix of fA with respect to the new basis is

    \begin{pmatrix} 8 & -2 \\ -2 & 0 \end{pmatrix}.

Not unexpectedly, this matrix is also symmetric.
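The entries of this matrix can be recomputed directly from Definition (7.2.4); the short sketch below (NumPy assumed, purely illustrative) evaluates fA on the vectors of the basis B and reproduces the matrix above:

import numpy as np

A = np.array([[1, 2],
              [2, 3]])
basis = [np.array([1, 1]), np.array([1, -1])]

# the (i, j) entry of the matrix of f_A with respect to the basis is f_A(v_i, v_j)
M = np.array([[vi @ A @ vj for vj in basis] for vi in basis])
print(M)    # [[ 8 -2] [-2  0]]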


EXAMPLE 9 We consider the matrix of the bilinear form f (X, Y) = Tr(X^t Y) on V = M2 (F) with
respect to the standard basis of V consisting of the four unit matrices e11 , e12 , e21
and e22 ; here ei j is the matrix of order 2 whose (i, j)th entry is 1 and all of whose other
entries are zero. The matrix of f with respect to this basis is clearly a matrix of order
4 (with 16 entries) over F. So we compute only a few of the entries as an example. For convenience,
we rename the matrices in the standard basis: v1 = e11 , v2 = e12 , v3 = e21 and v4 = e22 .
The (i, j)th entry ai j of the required matrix, by Definition (7.2.4), is then given by
ai j = f (vi , v j ) = Tr(vi^t v j ). Using formulas for multiplication of unit matrices given
in Equation 1.11 in Chapter 1, we find, for example, a11 = Tr(e11^t e11 ) = Tr(e11 e11 ) =
Tr(e11 ) = 1, whereas a23 = Tr(e12^t e21 ) = Tr(e21 e21 ) = 0, as e21 e21 is the zero matrix.
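If desired, the full matrix of f can be generated by a few lines of code; the following sketch (NumPy assumed, not part of the text) computes all sixteen entries at once:

import numpy as np

def unit(i, j):
    # the 2 x 2 unit matrix e_{ij}
    E = np.zeros((2, 2))
    E[i, j] = 1
    return E

basis = [unit(0, 0), unit(0, 1), unit(1, 0), unit(1, 1)]
M = np.array([[np.trace(v.T @ w) for w in basis] for v in basis])
print(M)    # the 4 x 4 identity matrix: Tr(e_ij^t e_kl) = 1 exactly when (i, j) = (k, l)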

For any vector space V, let Bil(V) denote the collection of all bilinear forms on V.
One can impose an algebraic structure on Bil(V) by defining addition and scalar multiplication of
bilinear forms on V.

Definition 7.2.5. Let V be a vector space, not necessarily finite-dimensional, over a field F. The
sum f + g, for any f, g ∈ Bil(V), is the map from V × V → F given, for any v, w ∈ V, by

( f + g)(v, w) = f (v, w) + g(v, w).

The scalar multiple c f , for any f ∈ Bil(V) and c ∈ F, is the map from V × V → F given , for any v and
w in V, by

(c f )(v, w) = c f (v, w).

Routine verifications show that the sum of two bilinear forms and a scalar multiple of a bilinear
form are again bilinear forms.
Thus, Bil(V) is closed with respect to the operations defined just now. In fact, Bil(V) is a vector
space with the zero form as the zero vector.

Proposition 7.2.6. Let V be a vector space over a field F. Then, the set Bil(V) of all bilinear forms
on V is itself a vector space over F. Moreover, Sym(V), the subset of all symmetric bilinear forms on
V, and Alt(V), the subset of all alternating forms on V, are subspaces of Bil(V).

The routine verifications needed for the proof of the proposition are left to the reader.
In case V is n-dimensional over a field F, the following result shows that, as vector spaces over F,
Bil(V) and Mn (F) are isomorphic.

Proposition 7.2.7. Let V be an n-dimensional vector space over a field F. Fix a basis B of V. For
f ∈ Bil(V), let T ( f ) ∈ Mn (F) be the matrix of f with respect to B as in Definition (7.2.4). Then T is
a vector space isomorphism from Bil(V) onto Mn (F). Moreover, T carries the subspaces Sym(V) and
Alt(V) of Bil(V) onto the subspaces of symmetric matrices and of alternating matrices, respectively,
in Mn (F).
Proof. Let B = {v1 , v2 , . . . , vn } be the fixed basis of V. For arbitrary bilinear forms f, g ∈ Bil(V), let
A = [ai j ] and B = [bi j ] be the matrices in Mn (F) representing f and g, respectively, with respect to B.
Thus, for fixed i, j with 1 ≤ i, j ≤ n, one has
f (vi , v j ) = ai j and g(vi , v j ) = bi j .
Since, for the sum f + g and the scalar multiple c f , we have by definition
( f + g)(vi , v j ) = f (vi , v j ) + g(vi , v j ) = ai j + bi j
and
(c f )(vi , v j ) = c f (vi , v j ) = cai j ,
it follows that the (i, j)th entries of the matrices T ( f + g) and T (c f ) are the (i, j)th entries of the
matrices A + B and cA, respectively. In other words,
T ( f + g) = A + B = T ( f ) + T (g)
T (c f ) = cA = cT ( f ),
which show that T is linear.
Next, note that the discussion following Definition (7.2.4) implies that the linear map T on Bil(V)
is one–one and onto Mn (F). Thus, T is an isomorphism as required.
The other assertions of the proposition are clear. □
Since the vector space Mn (F) has dimension n^2 over F, we obtain the following corollary.

Corollary 7.2.8. The vector space Bil(V), of bilinear forms on an n-dimensional vector space over
a field F, is of dimension n^2 over F.

Summarizing, a bilinear form f on an n-dimensional vector space V over a field F and the unique
matrix A ∈ Mn (F) that f determines with respect to a basis B of V are associated through the equation

    f (v, w) = x^t Ay,                                  (7.1)

where x and y are the coordinate vectors of v and w in V, respectively, with respect to the basis B.
The preceding equation also helps us establish the relationship between the matrices representing
the same bilinear form with respect to different bases. So suppose that B ∈ Mn (F) is the matrix of f with
respect to another basis B' of V, and let x' and y' be the coordinate vectors of the same vectors v and w
with respect to B'. Now, if P is the change of basis matrix from B' to B, then x = Px' and y = Py' (see
Proposition 4.5.11 in Section 4.5). Thus, by Equation (7.1),

    f (v, w) = x^t Ay = (Px' )^t A(Py' ) = x'^t (P^t AP)y' .

Since B is the unique matrix f determines with respect to B' , it follows from Equation (7.1) (this time
for the basis B' ) that

    B = P^t AP.                                         (7.2)
This calls for a definition. Recall that any change of basis matrix, and in particular P, is an invertible
matrix.
Definition 7.2.9. For matrices A, B ∈ Mn (F), B is said to be congruent to A (over F) if there is an
invertible matrix P ∈ Mn (F) such that B = P^t AP.

It is an easy verification, using properties of transposes of matrices, that the relation of being
congruent is an equivalence relation in Mn (F).
The conclusion of Equation (7.2) can now be stated as follows: any two matrices representing a
bilinear form on a finite-dimensional vector space with respect to two bases are congruent.
EXAMPLE 10 Consider the matrix A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} ∈ M2 (R). We compute the bilinear form f on R^2 de-
termined by A relative to the standard basis E of R^2. If v = (x1 , x2 )^t and w = (y1 , y2 )^t
are two arbitrary vectors in R^2, then, as E is the standard basis, (x1 , x2 )^t and (y1 , y2 )^t
themselves are the coordinate vectors of v and w, respectively. Therefore, by Equa-
tion (7.1), we obtain

    f (v, w) = (x1 , x2 ) \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} y1 \\ y2 \end{pmatrix} = (−x2 , x1 ) \begin{pmatrix} y1 \\ y2 \end{pmatrix} = x1 y2 − x2 y1 .

It is clear that A itself is the matrix of f with respect to the standard basis of R^2. To
compute the matrix B of f with respect to the basis B = {(1, −1)^t , (1, 2)^t }, we note
that the change of basis matrix is P = \begin{pmatrix} 1 & 1 \\ -1 & 2 \end{pmatrix} (see Proposition 3.4.14). Therefore,
by Equation (7.2),

    B = P^t AP = \begin{pmatrix} 1 & -1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 3 \\ -3 & 0 \end{pmatrix}.
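The change of basis computation of this example is easy to verify numerically; the sketch below (NumPy assumed, illustrative only) computes P^t AP and also evaluates f directly on the new basis vectors:

import numpy as np

A = np.array([[0, 1],
              [-1, 0]])
P = np.array([[1, 1],
              [-1, 2]])       # columns are the new basis vectors
print(P.T @ A @ P)            # [[0, 3], [-3, 0]]

# sanity check against direct evaluation f(v, w) = v^t A w on the basis vectors
v1, v2 = np.array([1, -1]), np.array([1, 2])
print(np.array([[u @ A @ w for w in (v1, v2)] for u in (v1, v2)]))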

The concept of congruence will help us in finding simpler matrix representations of a bilinear form,
exactly in the manner the concept of similarity helps in finding simpler matrix representations of a
linear operator. The ideal situation will be when it is possible to have a diagonal matrix representing
a given bilinear form. As was the case with linear operators, therefore, it is important to identify
diagonalizable bilinear forms.

Definition 7.2.10. A bilinear form f on a finite-dimensional vector space V is said to be diagonaliz-
able if there is a basis of V with respect to which the matrix of f is a diagonal one.

We examine the problem of diagonalizing a bilinear form in the next two sections.
EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. Assume
that all underlying vector spaces are finite-dimensional and all given matrices are square.
(a) Given a vector space V of dimension n over a field F, any matrix of order n over F is the
matrix of some bilinear form on V with respect to any given basis of V.
(b) If dim V = n, then the dimension of the space of all symmetric bilinear forms on V is 2n.
(c) Two distinct bilinear forms on a vector space V cannot have the same matrix representation
with respect to a fixed basis of V.
(d) Two matrices representing a bilinear form on Fn have the same characteristic polynomial.
(e) The difference of two symmetric bilinear forms on a vector space need not be a symmetric
bilinear form.
(f) If for a bilinear form f on a vector space V, f (vi , vi ) ≠ 0 for any basis vector vi in a given
basis of V, then f (v, v) ≠ 0 for any non-zero v ∈ V.
(g) Two similar matrices in Mn (F) need not be congruent in Mn (F).
(h) Two congruent matrices in Mn (F) have the same eigenvalues.
(i) Two congruent matrices in Mn (F) have the same determinant.
(j) Two congruent matrices in Mn (F) have the same rank.
(k) The bilinear form f on M2 (F) given by f (A, B) = tr(AB) is symmetric.
(l) If charF = 2, then any alternating form on a vector space V over F is a symmetric one.
2. Verify the elementary properties of bilinear forms given in Proposition (7.2.2).
3. Let V be a vector space over a field F. Verify that the sum of two bilinear forms on V and a scalar
multiple of a bilinear form on V are bilinear forms on V. Furthermore, carry out the verifications
needed to complete the proof of Proposition (7.2.6).
4. Verify that the following functions given in the examples of this section are bilinear forms. F
stands for an arbitrary field.
(a) fA on F^n if fA (x, y) = x^t Ay for any x, y ∈ F^n, where A ∈ Mn (F).
(b) F on V = Rn [x], or on C[0, 1], if F( f (x), g(x)) = \int_0^1 f (t)g(t)\,dt for f (x), g(x) ∈ V.
5. Determine which of the following functions from R^2 × R^2 → R are bilinear forms. Assume
v = (x1 , x2 )^t and w = (y1 , y2 )^t are arbitrary vectors in R^2.
(a) f (v, w) = 1.
(b) f (v, w) = x1 x2 + y1 y2 .
(c) f (v, w) = x1 y2 − x2 y1 .
(d) f (v, w) = (x1 + y1 )^2 − x2 y2 .
(e) f (v, w) = −x1 y2 − 2x2 y1 − 3x1 y2 − 4x2 y2 .
6. Which of the following are bilinear forms? Justify your answer.
(a) f : R × R → R given by f (x1 , x2 ) = ax1 + bx2 , where a and b are fixed non-zero real numbers.
(b) For a vector space V over a field F, f : V × V → F given by f (v, w) = (F(v, w))2 for a fixed
bilinear form F on V.
(c) f : V × V → F given by f (A, B) = T r(A)T r(B), where V = Mn (F).
(d) f : C × C → R given by f (z1 , z2 ) = |z1 − z2 |, where C is the vector space of complex numbers
over the field R, and |z| denotes the modulus, or absolute value of the complex number z.
7. Let V be a vector space over a field F and L and R linear maps from V into F (F considered a
vector space over itself). Is the map f : V × V → F given by f (v, w) = L(v)R(w) for any v, w ∈ V,
a bilinear form on V? Justify your answer.
8. Let f be the bilinear form on the real vector space R^2 given by

    f ((x1 , x2 )^t , (y1 , y2 )^t ) = x1 y1 − 2x1 y2 − 2x2 y1 + 3x2 y2 .

Find the matrices A and B of f with respect to (i) the standard basis E = {(1, 0)^t , (0, 1)^t } and (ii)
the basis B = {(1, 1)^t , (−1, 1)^t }, respectively.
9. Consider the real vector space V = M2 (R), and let f be the bilinear form on V defined by
f (X, Y) = Tr(X^t AY), where

    A = \begin{pmatrix} 1 & -1 \\ -1 & 0 \end{pmatrix}.

Compute the diagonal entries of the matrix of f with respect to the standard basis
{e11 , e12 , e21 , e22 }, where ei j is the unit matrix of order 2 whose only non-zero entry is a 1
at the (i, j)th place.
10. Consider the bilinear forms on the real vector space R^2 determined by the matrix \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} with
respect to (i) the standard basis E and (ii) the basis B = {(1, 1)^t , (1, 2)^t }, respectively. Compute
the formulae for these bilinear forms in terms of components of arbitrary vectors of R^2.
11. Let f be the bilinear form on the real vector space R^3 whose matrix with respect to the standard
basis of R^3 is given by

    \begin{pmatrix} 1 & -1 & 2 \\ -1 & 0 & 3 \\ 2 & -1 & -1 \end{pmatrix}.

Find the matrix of f with respect to the basis {(1, 0, 1)^t , (1, 1, 0)^t , (1, 1, −1)^t } of R^3.
12. Verify that the bilinear form f on R^2 determined by the matrix A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, with respect to the
standard basis of R^2, is alternating by computing x^t Ax for any x ∈ R^2.
13. For any positive integer n, let p and q be non-negative integers such that p + q = n and let

    A = \begin{pmatrix} I_p & 0 \\ 0 & -I_q \end{pmatrix}

be a real matrix of order n (here I p and Iq are the identity matrices of orders p and q, respec-
tively, and 0 denotes zero matrices of suitable sizes). If f p,q is the bilinear form on R^n given by
f p,q (x, y) = x^t Ay, find a formula for the form in terms of coordinates of x and y and verify that
it is a symmetric one.
14. Prove that congruence of matrices is an equivalence relation in Mn (F) for any field F.
Exercises 12 and 13 are valid if the underlying field R is replaced by any field F whose
characteristic is ≠ 2, that is, a field in which division by 2 is allowed.
15. Let f be a bilinear form on a vector space V over R. Prove that f can be expressed uniquely as a
sum f1 + f2 where f1 is a symmetric bilinear form on V and f2 an alternating bilinear form on V.
16. Show that, if a bilinear form f on a vector space over R is both symmetric and alternating, then
f is the zero form.
17. Let f be a symmetric and g an alternating bilinear form on a complex vector space V. If f + g is
the zero form on V, then show that both f and g are zero on V.
18. Let V be a vector space over a field F. Assume that ch F ≠ 2, so that division by 2 in F is allowed.
On the vector space Bil(V) of all bilinear forms on V, define a function T such that for any
f ∈ Bil(V), T ( f ) is given by T ( f )(v, w) = (1/2) f (v, w) − (1/2) f (w, v) for any v, w ∈ V. Show
that (i) T ( f ) ∈ Bil(V) and (ii) T is a linear operator on Bil(V).
19. Assume that dim V = n. Determine the dimensions of the subspaces consisting of the symmetric
and the skew-symmetric bilinear forms in Bil(V).
20. Let V be a vector space over a field F such that V = V1 ⊕ V2 for subspaces V1 and V2 of V. If f1
and f2 are bilinear forms on V1 and V2 , respectively, prove that there is a unique bilinear form f
on V whose restrictions to V1 and V2 are f1 and f2 , respectively.
21. Define f on C^n by f ((z1 , z2 , . . . , zn ), (w1 , w2 , . . . , wn )) = Σ_{i=1}^{n} zi w̄i , the bar denoting complex
conjugation. Show that f is a bilinear form on C^n if it is considered a vector space over R but not
bilinear if considered over C.

7.3 LINEAR FUNCTIONALS AND DUAL SPACE


This section is devoted to developing tools that are needed to resolve the diagonalization problem for
bilinear forms in general. The notion of linear functional will be quite useful in this development, and
so we give a brief review of the theory of linear functionals.

Definition 7.3.1. Let V be a vector space over a field F. A linear map f from V to F, where F is
considered a vector space over itself, is called a linear functional on V.

The set of all linear functionals on V forms a vector space HomF (V, F) as a special case of Theorem
(4.3.4).

Definition 7.3.2. For a vector space V over a field F, its dual space V ∗ is the vector space of all
linear functionals on V.

Observe that for an n-dimensional vector space V, dim V ∗ = n by Corollary (4.3.8), and so V ∗ is
isomorphic to V as vector spaces over F (see Corollary 4.4.5 in Section 4.4).
We now explore the connection between bilinear forms and linear functionals.
Let f be a bilinear form on a vector space V over a field F. Recall that (see Proposition (7.2.2)) for
a fixed v ∈ V, the map Lv : V → F given by Lv (w) = f (v, w) is a linear map, which means that Lv is a
linear functional on V. This allows us to define a map L : V → V ∗ by the formula

L(v) = Lv . (7.3)

Since f (v1 + v2 , w) = f (v1 , w) + f (v2 , w), it follows that L(v1 + v2 ) = L(v1 ) + L(v2 ). On the other hand,
for a scalar a, f (av, w) = a f (v, w), so one can conclude that L(av) = aL(v). Thus, L is a linear map of
V into V ∗ .
Similarly, there is a linear map R from V into V ∗ such that

R(w) = Rw , (7.4)

where Rw , for a fixed w ∈ V, is the linear map on V given by Rw (v) = f (v, w).
We use these maps to define certain subspaces of V important to the studies of a bilinear form f .
Definition 7.3.3. For a vector space V and a bilinear form f on V, let L and R be the linear maps
defined in Equations (7.3) and (7.4). We let

V ⊥L = ker L = {v ∈ V | Lv : V → F is the zero map}.


V ⊥R = ker R = {u ∈ V | Ru : V → F is the zero map}.

Being kernels of linear maps, V ⊥L and V ⊥R are subspaces of V. There are simpler descriptions of
these subspaces.

Proposition 7.3.4. For a bilinear form f on a vector space V,

V ⊥L = {v ∈ V | f (v, w) = 0 ∈ F for all w ∈ V};


V ⊥R = {w ∈ V | f (v, w) = 0 ∈ F for all v ∈ V}.
Proof. Left as an exercise to the reader. □

One may like to think of the condition f (v, w) = 0 as generalizing the concept of perpendicularity in
terms of the dot product in R^2 or R^3. However, unless f is symmetric, one does not have the right
generalization. Note that if f is symmetric, then V ⊥L = V ⊥R .

Definition 7.3.5. Let f be a bilinear form on a vector space V. If V ⊥L = V ⊥R , then we let

V ⊥ = V ⊥ L = V ⊥R ,

and call V ⊥ the radical or the null space of f .

For the notion of perpendicularity to be meaningful, we need to exclude pathological cases where
a non-zero vector can be perpendicular to every vector of the space. It is clear that such exclusion can
be ensured by insisting on the condition that V ⊥L = V ⊥R = {0}. The next result will help in finding a
more practical interpretation of this condition.

Proposition 7.3.6. Let f be a bilinear form on a finite-dimensional vector space V over a field
F. Then, the dimensions of the subspaces V ⊥R and V ⊥L are the same, and equal to the nullity of the
matrix of f with respect to any basis of V.

Recall that (see Definition 3.6.13) the nullity of any n × n matrix C over F is n − rank(C); it is also
the dimension of the solution space in Fn of the matrix equation Cx = 0.

Proof. Fix a basis B = {v1 , v2 , . . . , vn } of V and let A = [ai j ] be the matrix in Mn (F) of the bilinear
form f with respect to B so that ai j = f (vi , v j ) for all i, j, 1 ≤ i, j ≤ n. It follows from the definition
of V ⊥L that a vector v = Σ_j c j v j ∈ V ⊥L if and only if f (v, vi ) = 0 for i = 1, 2, . . . , n. Expanding
f (v, vi ) = f (Σ_j c j v j , vi ), we see that v = Σ_j c j v j ∈ V ⊥L if and only if

    Σ_{j=1}^{n} c j a ji = 0   for i = 1, 2, . . . , n.

Equivalently, v = Σ_j c j v j ∈ V ⊥L if and only if the components of the coordinate vector c =
(c1 , c2 , . . . , cn )^t of v with respect to B satisfy the system of equations

    Σ_{j=1}^{n} bi j x j = 0   for i = 1, 2, . . . , n,

where bi j = a ji for all 1 ≤ i, j ≤ n. We may thus conclude that the dimension of V ⊥L is precisely the
dimension of the solution space in F^n of the matrix equation A^t x = 0, where A^t is the transpose of
A = [ai j ] and x = (x1 , x2 , . . . , xn )^t . Since the ranks of a matrix and its transpose are the same, it follows
that the dimension of V ⊥L is the nullity of A as claimed.
A similar analysis shows that the dimension of V ⊥R also is the nullity of A. □
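The proposition also suggests a concrete way of computing these subspaces from the matrix A of the form: in coordinates, V ⊥L is the null space of A^t and V ⊥R is the null space of A. The following sketch (SymPy assumed, with a sample matrix chosen only for illustration) carries this out:

import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [2, 4, 6],
               [0, 1, 1]])          # a sample matrix of a bilinear form on F^3

left_kernel = A.T.nullspace()       # coordinate vectors spanning V^{⊥L}
right_kernel = A.nullspace()        # coordinate vectors spanning V^{⊥R}
print(A.rank(), len(left_kernel), len(right_kernel))
# both kernels have dimension 3 - rank(A) = 1, as the proposition asserts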
It is now easy to derive the required interpretation of the condition V ⊥L = {0} = V ⊥R .

Corollary 7.3.7. Let f be a bilinear form on a finite-dimensional vector space V over a field. Then,
the following are equivalent.
(a) V ⊥L = {0}.
(b) V ⊥R = {0}.
(c) The matrix of f with respect to any basis of V is invertible.

Proof. An n × n matrix over a field is invertible if and only if its rank is n, or equivalently, its nullity is
zero. □
Another consequence of the last proposition is that the rank and nullity of the matrix of a bilinear
form on a finite-dimensional vector space with respect to any basis are independent of the choice of the
basis. Note that this independence could also have been inferred from the fact that matrices representing
a bilinear form with respect to different bases are congruent, and congruent matrices have the same
rank.

Definition 7.3.8. The rank of a bilinear form on a finite-dimensional vector space is the rank of the
matrix of the form with respect to any basis of the space.

We now introduce an important class of bilinear forms.

Definition 7.3.9. A bilinear form f on a vector space V is said to be non-degenerate if V ⊥L = {0}


or equivalently, V ⊥R = {0}.

Thus, for a non-degenerate bilinear form, its radical V ⊥ is defined and is the zero subspace. The
definition implies that for a symmetric or a skew-symmetric non-degenerate bilinear form f on V, for
any non-zero vector u ∈ V, there is a vector v ∈ V such that f (u, v) ≠ 0.
It is easy to give examples of non-degenerate bilinear forms thanks to Corollary (7.3.7). For any
invertible matrix A ∈ Mn (F), the bilinear form fA on F^n defined by fA (x, y) = x^t Ay has A as its matrix
with respect to the standard basis and so, by the said corollary, is a non-degenerate form.
One of the most important implications of the non-degeneracy of a bilinear form f is that every
linear functional on V is determined by a unique vector in V through such a bilinear form.
Let f be a non-degenerate bilinear form on a finite-dimensional vector space V over a field F. So,
by definition, V ⊥L = {0}. It follows, by Definition (7.3.3), that the kernel of the linear map L : V → V ∗
is the zero subspace and so the linear map L is one–one. However, dim V = dim V ∗ is finite, so L
must be onto V ∗ (see Corollary 4.2.10). But then the surjectivity of L implies that any linear functional
in V ∗ must be Lv for some v ∈ V. Conversely, if L is onto, that is, if L(V) = V ∗ , then by tracing the
argument backwards, we see that f must be non-degenerate. In a similar manner, we can show that f
is non-degenerate if and only if R(V) = V ∗ , where R is the map from V to V ∗ given by Equation (7.4).
These conclusions are put in a slightly different format in the following proposition.

Proposition 7.3.10. For a non-degenerate bilinear form f on a finite-dimensional vector space V


over a field F, the following hold.
(a) For any linear functional φ ∈ V ∗ , there is a unique vector v ∈ V such that φ(w) = f (v, w) for all
w in V.
(b) For any linear functional φ ∈ V ∗ , there is a unique vector w ∈ V such that φ(v) = f (v, w) for all
v in V.
Proof. Use the definitions of Lv and Ru in conjunction with the conclusions of the preceding discus-
sion. □
The preceding proposition is important because it provides the theoretical basis for the material to
be considered in the next section.

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. Assume
that all underlying vector spaces are finite-dimensional over an arbitrary field F.
(a) For any vector space V, the dual space V ∗ is isomorphic to V.
(b) For a linear functional f on an n-dimensional vector space, the rank of f is n − 1.
(c) For any invertible matrix A in Mn (F), the bilinear form f on Fn given by f (x, y) = xt Ay is a
non-degenerate form.
(d) If f is a non-degenerate bilinear form on a vector space V, then every element of the dual V ∗
is of the form Rv for some v ∈ V.
(e) For any non-zero vector v in a vector space V, there is some f ∈ V ∗ such that f (v) is non-zero.
(f) On an n-dimensional vector space V, there is a bilinear form on V of rank m for any integer
m, 0 ≤ m ≤ n.
(g) On an n-dimensional vector space V, there is a non-degenerate bilinear form on V of rank m
for any integer m, 0 ≤ m ≤ n.
(h) Given a non-degenerate bilinear form f on a vector space V, the restriction of f to any
non-zero subspace of V is non-degenerate.
(i) The sum of two non-degenerate bilinear forms on a vector space is always non-degenerate.
2. Give examples of bilinear forms f on R3 having rank 1, 2 and 3, respectively, by giving formulae
for f ((x1 , x2 , x3 )t , (y1 , y2 , y3 )t ) in each case.
3. Give examples of non-degenerate bilinear forms, one each, on R2 and R3 .
4. Give an example of a symmetric bilinear form f on R3 such that f is degenerate, that is, not
non-degenerate.
5. Prove Proposition (7.3.4).
6. Complete the proof of Proposition (7.3.6) by showing that the dimension of V ⊥R is the nullity of
the matrix A.
7. Consider the bilinear form f on R^4 given by f (x, y) = x^t Ay, where A is the matrix

    \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}.

Find a vector v ∈ R^4 such that f (v, v) = 0.


8. Let f be a bilinear form on a finite-dimensional vector space V and W a subspace of V such that
the restriction fW of f to W is a non-degenerate bilinear form on W. Show that the dimension of
W cannot exceed the rank of the bilinear form f .
9. Let V be a vector space over a field F, and f1 and f2 linear functionals on V. Show that the map
f : V × V → F given by f (v, w) = f1 (v) f2 (w) for v, w in V, defines a bilinear form on V. Is f
symmetric?
For a finite-dimensional vector space V over a field F with basis {v1 , v2 , . . . , vn }, the dual
basis of the dual space V ∗ is the collection { f1 , f2 , . . . , fn } of linear functionals on V such that
fi (v j ) = δi j , where δi j is the Kronecker delta for all i, j. We had seen that these fi form a basis of
the dual space V ∗ .
10. Let V be a finite-dimensional vector space over a field F with basis B = {v1 , . . . , vn } and dual
basis B∗ = { f1 , . . . , fn }. Prove that the bilinear forms T i j given by T i j (v, w) = fi (v) f j (w) for any
v, w in V, for 1 ≤ i, j ≤ n, form a basis of the vector space Bil(V) over F.
11. Let V be a finite-dimensional vector space over a field F and f a bilinear form on V. Prove that
f can be expressed as the product of two linear functionals f1 and f2 on V in the sense that
f (v, w) = f1 (v) f2 (w) for v, w in V if and only if f has rank 1.
12. Let V be a vector space over a field F and V ∗ be its dual space. Form the vector space W =
V × V ∗ whose vector space operations are performed component-wise in terms of the vector
space operations of V and V ∗ , respectively. Define ω : W × W → F by

ω((v1 , f1 ), (v2 , f2 )) = f2 (v1 ) − f1 (v2 )

for all vi ∈ V and fi ∈ V ∗ . Prove that ω is a skew-symmetric, non-degenerate bilinear form on W.


For any linear operator T on a vector space V with a bilinear form f , a linear map T ∗ on V is
said to be the adjoint of T relative to the form f if f (T v, w) = f (v, T ∗ w) for all v, w in V. The
next exercise proves the existence of the adjoint of any linear operator on a finite-dimensional
vector space with a non-degenerate bilinear form.
13. Let f be a non-degenerate bilinear form on a finite-dimensional vector space V over a field F
and T a linear operator on V.
(a) For a fixed vector w ∈ V, show that the map φw defined by

φw (v) = f (T v, w) for any v ∈ V,

is a linear functional on V.
(b) Use Proposition (7.3.10) to show that any fixed w ∈ V determines a unique vector w' ∈ V
such that

f (T v, w) = f (v, w' ) for all v ∈ V.


(c) Define a map T ∗ : V → V by the rule that for any w ∈ V, T ∗ (w) = w' , where w' is the vector
uniquely determined by w as in (b), so that

f (T v, w) = f (v, T ∗ w).

(d) For any w1 , w2 ∈ V and a ∈ F, let T ∗ (w1 + w2 ) = u1,2 and T ∗ (aw1 ) = u1 , so that by part
(c), f (T v, w1 + w2 ) = f (v, u1,2 ) and f (T v, aw1 ) = f (v, u1 ). Use the bilinearity of f in these
relations to deduce that

f (v, T ∗ w1 + T ∗ w2 ) = f (v, u1,2 )


f (v, aT ∗ w1 ) = f (v, u1 )

for all v ∈ V.
(e) Finally use the non-degeneracy of f to conclude, from the relations proved in part (d), that
T ∗ is linear on V.
14. Let T and S be linear operators on a finite-dimensional vector space V over a field F, and let
a ∈ F. Assume that V has a non-degenerate bilinear form f . Prove that (i) (T + S )∗ = T ∗ + S ∗ ,
(ii) (T S )∗ = S ∗ T ∗ , (iii) (aT )∗ = aT ∗ and (iv) (T ∗ )∗ = T , where the adjoints are taken with
respect to f .

7.4 SYMMETRIC BILINEAR FORMS


In this section, we study symmetric bilinear forms in some detail; the main result in the section deals
with diagonalization of such forms so as to determine their canonical matrix forms. The implications
for congruence classes of real symmetric matrices are discussed, which naturally leads to positive
definite and positive semi-definite matrices.
The basic notion that we need for the main result is that of orthogonality which we introduce now.

Definition 7.4.1. Let f be a symmetric bilinear form on a vector space V over any field. Given two
vectors v, w ∈ V, we say v is orthogonal to w (with respect to f ) if the scalar f (v, w) = 0. Sometimes
the notation v ⊥ w is used to mean that v is orthogonal to w.

A few remarks are in order.

(i) It is clear that the notion of orthogonality depends completely on the bilinear form. Two vectors
may be orthogonal with respect to one symmetric bilinear form on a vector space, but need not
be so with respect to another.
(ii) Orthogonality is a symmetric relation. If v ⊥ w, then by the symmetry of the underlying bilinear
form, w ⊥ v.
(iii) As with the dot product on Rn , any vector is orthogonal to the zero vector with respect to any
given symmetric bilinear form f ; f (v, 0) = f (v, v − v) = f (v, v) − f (v, v) = 0.
(iv) However, in contrast to dot product, there may be self-orthogonal non-zero vectors with respect
to a non-trivial symmetric bilinear form. For an example, see Exercise 7 of the last section.
(v) On the other hand, with respect to a non-degenerate symmetric bilinear form, by Definition
(7.3.9), the only vector which is orthogonal to every vector of the space has to be the zero vector.
(vi) The notion of orthogonality, as given in the preceding definition, makes sense in a vector space
equipped with an alternating form f as for such a form f (v, w) = 0 if and only if f (w, v) = 0.
But here in this section, we focus on symmetric bilinear forms as these forms, unlike alternating
ones, can be diagonalized. For canonical matrix forms of alternating forms, see the last section
of this chapter.
We introduce two related notions; these are generalizations of similar ones for the dot products on
Rn , which have been used extensively earlier in discussing real symmetric matrices.

Definition 7.4.2. Let f be a symmetric bilinear form on a vector space V. For a subspace W of V,
the orthogonal complement of W, denoted by W ⊥ , is defined as

W ⊥ = {v ∈ V | f (v, w) = 0 for all w ∈ W}.


Thus W ⊥ is the set of all vectors in V which are orthogonal to every vector in W.
By properties of bilinear forms, W ⊥ itself is a subspace of V. Also, note that, if W = V, then the
preceding definition coincides with our earlier definition of V ⊥ .

Definition 7.4.3. A basis B = {v1 , v2 , . . . , vn } of a vector space V with a symmetric bilinear form
f is called an orthogonal basis of V with respect to f if

    f (vi , v j ) = 0 for i ≠ j (1 ≤ i, j ≤ n).


It is clear that the matrix of a symmetric bilinear form f , with respect to an orthogonal basis, is a
diagonal one.
Before coming to the main result, we make an observation: if f is a bilinear form on a vector
space V over a field F and W is a subspace of V, then f can be restricted to W, that is, f can be
considered a map from W × W into F. It is clear that the restriction of f to W is a bilinear form on
W, which is denoted by fW . If f is symmetric, then fW will be symmetric on W. However, fW can be
non-degenerate even if f is not so on V (and conversely).

Proposition 7.4.4. Let V be a finite-dimensional vector space with a symmetric bilinear form f . If
W is a subspace of V such that the restriction fW is non-degenerate, then

    V = W ⊕ W ⊥.

Proof. We first claim that W ∩ W ⊥ = {0}. To prove the claim, note that the non-degeneracy of fW
implies that the only vector in W, which is orthogonal to every vector in W, must be the zero vector.
Now, if w ∈ W ∩ W ⊥ , then by the definition of W ⊥ , w is orthogonal to all of W, and so must be the zero
vector. Hence the claim.
Thus, it remains to show that V = W + W ⊥ , or equivalently, dim V = dim W + dim W ⊥ as W ∩ W ⊥ =
{0} (see Proposition 3.5.6).
Define h : V → W ∗ , the space of linear functionals on W, by letting

h(v) = φv for v ∈ V, (7.5)

where φv is the restriction of Lv to W (see Equation 7.3 for the definition of Lv ) and thus is a linear
functional on W. Since φv (w) = Lv (w) = f (v, w) for any w ∈ W, it follows that, for any v1 , v2 ∈ V and
a ∈ F,

φav1 +v2 (w) = f (av1 + v2 , w)


= a f (v1 , w) + f (v2 , w)
= aφv1 (w) + φv2 (w)
= (aφv1 + φv2 )(w).

Thus, as maps on W, φav1 +v2 and aφv1 + φv2 are equal. By the definition of h, this equality implies that
h is linear on V.
We determine the kernel of h now. Note that, by the definition of h, v ∈ ker h if and only if φv (w) = 0
for any w ∈ W, which is equivalent to the relation f (v, w) = 0 holding for any w ∈ W. This shows that
v ∈ ker h if and only if v ∈ W ⊥ . Thus ker h = W ⊥ .
On the other hand, the non-degeneracy of fW implies, according to Proposition (7.3.10), that any
φ ∈ W ∗ is uniquely determined by some w' ∈ W in the sense that φ(w) = f (w, w' ) = f (w' , w) for all
w ∈ W. Comparing φ with the definitions of the maps h and φw' , we then see that φ = φw' = h(w' ).
Thus we have shown that h is onto W ∗ , that is, Im(h) = W ∗ . As dim W = dim W ∗ , it follows that
dim Im(h) = dim W. Therefore the dimension formula of Theorem (4.2.7) yields

dim V = dim Im(h) + dim ker h = dim W + dim W ⊥ ,

as desired. □

The idea of the characteristic of a field will now be required. By the characteristic ch F of a field
F, we mean the smallest positive integer n such that n.1 = 0, where 1 is the multiplicative identity of
F; if there is no such integer, we say that F has characteristic zero. It can be easily shown that if the
characteristic of a field is non-zero, then it is necessarily a prime. Our familiar fields R, Q or C have
all characteristic zero whereas the finite field of p elements has characteristic p. One of the difficulties
with a field of characteristic p is that the field element p = p.1 behaves as the zero of the field and so is
not invertible; thus division by p is not allowed. Thus if a field, finite or infinite, contains Z2 , the field
of two elements, then it has characteristic 2 and so division by 2 is not allowed in such a field.

Lemma 7.4.5. Let f be a symmetric bilinear form on a vector space V over a field F such that
ch F ≠ 2. If f is not identically zero, then there is some v ∈ V such that f (v, v) ≠ 0.

Proof. Since f is not identically zero, there is a pair of non-zero vectors v, w ∈ V such that f (v, w) ≠ 0.
If either f (v, v) ≠ 0 or f (w, w) ≠ 0, we are done. Otherwise, we let u = v + w. Then, as both f (v, v)
and f (w, w) are zeros, an easy calculation shows that f (u, u) = f (v + w, v + w) = 2 f (v, w) ≠ 0 and so
u is the desired vector in V. Note the use of the hypothesis about the characteristic of the scalar field
in deducing the conclusion. □

Now we can prove the main theorem.

Theorem 7.4.6. Let f be a symmetric bilinear form on a finite-dimensional vector space V over a
field F, where ch F ≠ 2. Then there is an orthogonal basis of V relative to the symmetric form f .
Proof. If f is identically zero, then any basis of V is trivially an orthogonal basis. So we may assume
that f is not identically zero. In that case, the preceding lemma provides a non-zero v1 ∈ V such that
f (v1 , v1 ) ≠ 0.
We take v1 as the first vector of the orthogonal basis we are seeking, and construct the rest of the
basis vectors inductively as follows: suppose we have chosen a set of linearly independent vectors
v1 , . . . , vk for k ≥ 1 such that f (vi , vi ) ≠ 0 but f (vi , v j ) = 0 if i ≠ j. We now find the next vector of the
required basis.
Let W be the subspace of V spanned by the vectors v1 , . . . , vk . The matrix of the restriction fW
with respect to this basis of W is clearly diag[ f (v1 , v1 ), f (v2 , v2 ), . . . , f (vk , vk )]. By our choice of
the vectors vi , these diagonal entries are all non-zero so the matrix of fW is invertible. It follows from
Corollary (7.3.7) that fW is non-degenerate. We can then conclude that

V = W ⊕ W⊥

by Proposition (7.4.4). Now, if f restricted to W ⊥ is identically zero, we choose any basis of W ⊥ and
label the vectors in the basis as vk+1 , . . . , vn . Since f is identically zero, any two vectors in this list
are orthogonal. On the other hand, by the definition of W ⊥ , every vi for 1 ≤ i ≤ k is orthogonal to any
v j for k + 1 ≤ j ≤ n. Finally, as V is the direct sum of W and W ⊥ , the union v1 , . . . , vk , vk+1 , . . . , vn
is a basis of V which, by the observations we have made, is orthogonal.
Consider next the case in which f restricted to W ⊥ is not identically zero. Then, according to
Lemma (7.4.5), we can find a non-zero vector vk+1 in W ⊥ such that f (vk+1 , vk+1 ) ≠ 0. Since every
vector in W is orthogonal to each vector in W ⊥ , the vectors in the list v1 , . . . , vk , vk+1 satisfy the
following properties:
(a) f (vi , v j ) = 0 if i ≠ j for 1 ≤ i, j ≤ k + 1.
(b) f (vi , vi ) ≠ 0 for 1 ≤ i ≤ k + 1.
We claim that v1 , . . . , vk+1 are linearly independent. Suppose that

    a1 v1 + · · · + ak vk + ak+1 vk+1 = 0

for scalars ai ∈ F. Then f (Σ_{j=1}^{k+1} a j v j , vk+1 ) = f (0, vk+1 ) = 0, which implies, upon expanding the sum,
that Σ_{j=1}^{k+1} a j f (v j , vk+1 ) = 0. Since distinct vectors of the list v1 , . . . , vk , vk+1 are orthogonal, it follows
that the sum reduces to ak+1 f (vk+1 , vk+1 ) = 0, forcing ak+1 = 0. But then the relation of linear depen-
dence becomes a1 v1 + · · · + ak vk = 0, whence we conclude that the rest of the scalars a j too are zero,
as v1 , . . . , vk are linearly independent. So the claim is established.
Thus we have been able to extend the linearly independent set v1 , . . . , vk to a larger linearly inde-
pendent set v1 , . . . , vk , vk+1 such that the distinct vectors are orthogonal, and f (v j , v j ) ≠ 0 for each v j
in the set.
Hence, by induction, we can produce an orthogonal basis of the finite-dimensional vector
space V. □
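The inductive construction in this proof can be carried out quite mechanically when the form is given by a symmetric matrix A over, say, the rationals. The following Python sketch (SymPy is assumed, and the helper name is ad hoc rather than standard) mirrors the argument: it repeatedly picks a vector v with f(v, v) ≠ 0, falling back on Lemma (7.4.5) when every candidate is self-orthogonal, and then replaces the remaining vectors by their components orthogonal to v:

import sympy as sp

def orthogonal_basis(A):
    # A is the symmetric matrix of a bilinear form f(x, y) = x^t A y over Q
    n = A.shape[0]
    f = lambda x, y: (x.T * A * y)[0, 0]
    remaining = [sp.eye(n).col(i) for i in range(n)]   # start with the standard basis
    basis = []
    while remaining:
        v = next((u for u in remaining if f(u, u) != 0), None)
        if v is not None:
            others = [w for w in remaining if w is not v]
        else:
            pair = next(((u, w) for u in remaining for w in remaining
                         if u is not w and f(u, w) != 0), None)
            if pair is None:              # f vanishes identically on the remaining span
                basis.extend(remaining)
                break
            u, w = pair
            v = u + w                     # Lemma 7.4.5: f(v, v) = 2 f(u, w) != 0
            others = [r for r in remaining if r is not u]   # v takes u's place
        basis.append(v)
        # replace each remaining vector by its component f-orthogonal to v
        remaining = [r - (f(v, r) / f(v, v)) * v for r in others]
    return basis

A = sp.Matrix([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
P = sp.Matrix.hstack(*orthogonal_basis(A))
print(P.T * A * P)        # a diagonal matrix congruent to A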

For a converse to this theorem, which holds for any scalar field, see Exercise 7 of this section.
Since symmetric matrices correspond to symmetric bilinear forms by Proposition (7.2.7), the diag-
onalizability of symmetric bilinear forms implies that any symmetric matrix is congruent to some
diagonal matrix.
Corollary 7.4.7. Let F be a field such that ch F ≠ 2. Given any symmetric matrix A ∈ Mn (F), there
is an invertible matrix P such that P^t AP is diagonal.

For real symmetric matrices, we have already obtained a stronger result in Section 5.3: a real sym-
metric matrix A is similar to a diagonal matrix whose diagonal entries are the eigenvalues of A. We
have seen that two matrices in a similarity class share the same eigenvalues; so in some sense the
eigenvalues are the invariants of the similarity class. Moreover, the similarity class of a diagonalizable
matrix A contains a unique diagonal matrix (up to the order of the diagonal entries), whose diagonal
entries are precisely the eigenvalues of A; this unique diagonal matrix is sometimes called the canoni-
cal form of A in its similarity class. On the other hand, two congruent matrices, even over R, need not
have the same eigenvalues, traces or determinants (but they do have the same rank). Thus it is reasonable
to ask whether the congruence class of a diagonalizable matrix has a distinguished diagonal matrix
(the canonical form in the congruence class) containing information about some invariants of the con-
gruence class. For a real symmetric matrix, the answer is provided by the following classical result
due to Sylvester.


Theorem 7.4.8. (Sylvester’s Law of Inertia) Let f be a symmetric bilinear form on a real n-
dimensional vector space V. In any diagonal matrix representing f with respect to some orthogonal
basis of V, the numbers, respectively, of positive, negative and zero diagonal entries are independent
of the chosen basis and are uniquely determined by f .

Proof. It suffices to show that if D and D' are diagonal matrices in Mn (R) representing f with respect to
two orthogonal bases, say B and B' , of V, then the numbers of positive and negative diagonal entries
of D are, respectively, the same as the numbers of positive and negative diagonal entries of D' . We
first note that the rank of a diagonal matrix is the number of non-zero diagonal entries. Since, being
congruent matrices, D and D' have the same rank, say r, it follows that both D and D' have r non-zero
diagonal entries.
Suppose now that the numbers of positive entries of D and D' , respectively, are p and s. Note that
the ith entry of D is f (ui , ui ), where ui is the ith vector of the ordered basis B (similarly for the ith entry
of D' ). Therefore permuting the basis vectors if necessary, we may assume that the first p diagonal
entries of D and the first s diagonal entries of D' are positive. Thus we may assume that orthogonal
bases B = {v1 , . . . , vr , w1 , . . . , wn−r } and B' = {v'1 , . . . , v'r , w'1 , . . . , w'n−r } can be chosen such that
the matrices of f with respect to B and B' are, respectively, D = diag[d1, d2 , . . . , dr , 0, . . . , 0] and
D' = diag[d1' , d2' , . . . , dr' , 0, . . . , 0] such that di and d 'j are non-zero; moreover

f (vi , vi ) = di > 0 for 1 ≤ i ≤ p, (7.6)


and
f (v'j , v'j ) = d 'j < 0 for s + 1 ≤ j ≤ r. (7.7)

Let W1 be the subspace of V spanned by v1 , v2 , . . . , v p . Then, for any non-zero v ∈ W1 written as
v = Σ_{i=1}^{p} xi vi , one obtains, by Equation (7.6), f (v, v) = Σ_{i=1}^{p} xi^2 di > 0 as the xi are real.
Similarly, for any v ∈ W2 , where W2 is the subspace of V spanned by v's+1 , . . . , v'r , w'1 , . . . , w'n−r ,
one has f (v, v) ≤ 0 by Equation (7.7) as f (w'j , w'j ) = 0.
It follows from the preceding two observations about f (v, v) that W1 ∩ W2 contains only the zero vector
of V. We can then conclude from the formula dim(W1 + W2 ) = dim W1 + dim W2 − dim(W1 ∩ W2 ) (see
Proposition 3.5.2) that p + (n − s) ≤ dim V = n, that is, p ≤ s. A similar argument, with the roles of D
and D' interchanged, shows that s ≤ p.
Thus the number of positive diagonal entries of D and D' is the same. Since the number of non-zero
diagonal entries of D and D' is also the same, the other two assertions of the theorem are now clear. □

The matrix version of the Law of Inertia is the following.

Corollary 7.4.9. Let A be a real symmetric matrix. Then the numbers of positive, negative and zero
diagonal entries in any diagonal matrix congruent to A are independent of the choice of the diagonal
matrix in the congruence class of A and are uniquely determined by A.

Thus, the numbers of positive and negative diagonal entries of any diagonal matrix congruent to a
real symmetric matrix A, apart from its rank, are the invariants of the congruence class of A.

Definition 7.4.10. Let p and q be the numbers of positive and negative diagonal entries, respec-
tively, of any diagonal matrix representing a symmetric bilinear form f on a real vector space. The
number p is called the index, S = p − q the signature and r = p + q the rank of f .

These three numbers are invariants of the form f as they are independent of the choice of the
diagonal matrix representing the bilinear form. Note that the rank of f is actually the matrix rank of
any diagonal matrix representing f . Also, it is clear that any two of these three invariants determine
the third.
We can define analogous invariants for any real symmetric matrix because of the matrix version of
Sylvester’s Law of Inertia.

Definition 7.4.11. Let A be a real symmetric matrix and D be a diagonal matrix congruent to A.
Let p and q be the numbers of positive and negative diagonal entries, respectively, of D. Then, p is the
index, and S = p − q is the signature of the real symmetric matrix A.

The index, the signature and the rank of A are invariants of A as they are independent of the choice
of the diagonal matrix D congruent to A.
Recall that for a real symmetric matrix A, there is a real orthogonal matrix Q such that Q^{−1}AQ is
a diagonal matrix D whose diagonal entries are the eigenvalues of A, all of which are real. Since Q is
orthogonal, Q^{−1} = Q^t and so it follows that D is congruent to A. Using the invariants of A, we then
obtain the following significant result.

Corollary 7.4.12. If p is the index and S the signature of a real symmetric matrix A, then the
numbers of positive and negative eigenvalues of A are, respectively, p and q = p − S .
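In particular, the index and signature of a real symmetric matrix can be read off from its eigenvalues; the following sketch (NumPy assumed, with a sample matrix chosen only for illustration) counts them numerically:

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 0.0, 3.0],
              [0.0, 3.0, -1.0]])
eig = np.linalg.eigvalsh(A)           # real eigenvalues of a symmetric matrix
p = int(np.sum(eig > 1e-12))          # index
q = int(np.sum(eig < -1e-12))
print(p, q, p - q, p + q)             # index, q, signature, rank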

The invariants of a real symmetric matrix A determine A uniquely upto congruence as the next
proposition shows.

Proposition 7.4.13. Two real symmetric matrices in Mn (R) are congruent if and only if they have
the same invariants.

Proof. It is easy to see that if two symmetric matrices in Mn (R) are congruent, then they must be
congruent to the same diagonal matrix. So they have the same invariants.
To prove the converse, we first show that if A ∈ Mn (R) is a real symmetric matrix with index p and
rank r, then A is congruent to the following diagonal matrix:
 
    D_{p,r} = \begin{pmatrix} I_p & 0 & 0 \\ 0 & -I_{r-p} & 0 \\ 0 & 0 & 0 \end{pmatrix},

where Ik denotes the identity matrix of order k, and the symbols 0 denote zero matrices of suitable
orders.
By hypothesis, the symmetric matrix A is congruent to a diagonal matrix D ∈ Mn (R), whose first
p diagonal entries are positive, the next (r − p) diagonal entries negative and the rest of the diagonal
entries zeros. Let di be the ith non-zero diagonal entry of D; note we have not defined di for (r + 1) ≤
i ≤ n. Let P = diag[a1, a2 , . . . , an ] be the diagonal matrix in Mn (R) defined by
 √


 1/ di , if 1 ≤ i ≤ p,


 √
ai =  1/ −di , if (p + 1) ≤ i ≤ r,



1 if (r + 1) ≤ i ≤ n.

Since the transpose Pt is the same matrix as P, a simple calculation shows that Pt DP is precisely D p,r .
In other words, D is congruent to D p,r and so, by the transitivity of the congruence relation, A is
congruent to D p,r as claimed.
Therefore, if A and B are two real symmetric matrices having the same invariants, then each of them
is congruent to the same diagonal matrix D p,r for some p and r. It follows that A and B are themselves
congruent. !

Consider the case when a symmetric bilinear form f on a real n-dimensional vector space V is
non-degenerate. Then V ⊥ is the zero subspace and so the rank of f is n. The proof of the preceding
proposition then implies the following.

Corollary 7.4.14. Let f be a symmetric non-degenerate bilinear form on a real n-dimensional
vector space V. If the index of f is p and q = n − p, then there is an orthogonal basis of V with respect to
which the matrix of f is the following diagonal one:
$$I_{p,q} = \begin{pmatrix} I_p & 0 \\ 0 & -I_q \end{pmatrix},$$

where I p and Iq are identity matrices of orders p and q, respectively, and the matrices 0 are zero
matrices of suitable sizes.

The following matrix version of the two preceding results answers the question we raised, before
introducing Sylvester’s law of inertia, about the existence of canonical forms in congruence classes of
real symmetric matrices.

Corollary 7.4.15. Let A be a real symmetric matrix of order n with rank r and index p. Then A is
congruent to D p,r of order n. In case A is invertible, then A is congruent to the matrix I p,q of order n,
where q = n − p.

For details about such canonical forms for symmetric matrices over fields other than the real field,
the reader can look up Chapter 6 of Jacobson’s Basic Algebra, vol. I [5].
There is a simple method for computing the canonical form (or the invariants) of a real symmetric
matrix. The following lemma describes the method.

Lemma 7.4.16. Let A ∈ Mn (F) be a symmetric matrix (char F ≠ 2). Then A can be diagonalized by
performing a sequence of pairs of operations on A, each pair consisting of an elementary column
operation and the corresponding row operation.

Proof. We begin by recalling the way multiplications of A by elementary matrices produce the effects
of corresponding elementary row (column) operations on A (see Section 2.2 for the relevant concepts).
If E ∈ Mn (F) is an elementary matrix, then the product AE is the matrix produced by the column
operation corresponding to E on A. The crucial point is that the corresponding row operation on AE
will result precisely in the product E t AE. Thus, E t AE can be thought of as the result of first performing
a certain column operation on A, and then performing the corresponding row operation on AE. This
explains the use of the word pairs in the statement of the lemma.
Now suppose that for the symmetric matrix A ∈ Mn (F), one can find an invertible matrix P ∈
Mn (F) such that Pt AP is diagonal. By Theorem (2.5.6), the invertible matrix P can be expressed
as a product of elementary matrices, say, $P = E_1 E_2 \cdots E_m$. Taking transposes, we obtain
$P^t = E_m^t E_{m-1}^t \cdots E_1^t$. It follows that the diagonal form D of A is given by

$$D = P^t A P = E_m^t E_{m-1}^t \cdots E_1^t \, A \, E_1 E_2 \cdots E_m.$$

The associativity of matrix products and the observations of the preceding paragraph then establish
the lemma. !

In view of this lemma, it is clear that the procedure for diagonalizing a symmetric matrix A consists
of applying a suitable column operation to A and then the corresponding row operation to AE to begin
with, and then repeating with suitable pairs of such column and row operations to the resultant matrix
at each stage. The column and the row operations, at each stage, are chosen to make each pair of
symmetrically placed off-diagonal entries of the matrices obtained at that stage zeros.
Note that the chosen column operations, if applied to the identity matrix of suitable size in proper
sequence, will yield the matrix P also.
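For readers who wish to experiment, the following is a minimal computational sketch of this procedure in Python with NumPy (an illustration only, not part of the text); it assumes that every pivot encountered is non-zero, so that no interchanges are needed, and it reproduces the data of Example 11 below.

import numpy as np

def congruence_diagonalize(A):
    # Reduce a symmetric matrix to diagonal form D = P^t A P by pairs of
    # elementary column and row operations; this sketch assumes every
    # pivot A[i, i] encountered is non-zero, so no interchanges are needed.
    A = A.astype(float).copy()
    n = A.shape[0]
    P = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            if A[i, j] != 0:
                c = -A[i, j] / A[i, i]
                A[:, j] += c * A[:, i]   # column operation C_j -> C_j + c C_i
                A[j, :] += c * A[i, :]   # the corresponding row operation
                P[:, j] += c * P[:, i]   # record the column operation on I
    return A, P

A = np.array([[1., -2., 3.], [-2., 5., 4.], [3., 4., -1.]])
D, P = congruence_diagonalize(A)
print(np.round(D))                  # diag(1, 1, -110), as found in Example 11
print(np.allclose(P.T @ A @ P, D))  # True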

EXAMPLE 11 We diagonalize the following symmetric matrix in M3 (R):


 
 1 −2 3
 
A = −2 5 4.
 
3 4 −1

We begin with the block matrix [A|I3 ], and subject A to pairs of elementary column
and corresponding row operations till it reduces to a diagonal matrix D. Note that
only the column operations will be applied to I3 so that finally I3 will change to P
for which Pt AP = D. We outline the procedure leaving the details to the reader. For
convenience, we denote the column (row) operation of adding a times the column
C j (row R j ) to column Ci (row Ri ) as Ci → Ci + aC j (Ri → Ri + aR j ). Choosing the
first pair of operations to be C2 → C2 + 2C1 and R2 → R2 + 2R1 , one sees that [A|I3 ]

changes to
 
1 0 3 1 2 0
 
0 1 10 0 1 0.
 
3 10 −1 0 0 1

Applying C3 → C3 − 3C1 and R3 → R3 − 3R1 to the preceding block matrix, one then
obtains
 
1 0 0 1 2 −3
 
0 1 10 0 1 0.
 
0 10 −10 0 0 1

Finally, the pair C3 → C3 − 10C2 and R3 → R3 − 10R2 reduces the preceding block
matrix to [D|P]:
 
1 0 0 1 2 −23


0 1 0 0 1 −10.
 
0 0 −110 0 0 1

Thus, we have shown that for


 
1 2 −23
 
P = 0 1 −10,
 
0 0 1

one has
   
1 0 0  1 −2 3
   
0 1 0 = Pt −2 5 4 P.
 
0 0 −110 3 4 −1

By Corollary (7.4.15), A is also congruent to


 
 1 0 0 
 
I2,1 =  0 1 0 .

0 0 −1

We can therefore conclude that A has rank 3 and index 2. More importantly, we
now know that any diagonal matrix congruent to A has the same index and rank. In
particular, if D is the diagonal matrix Qt AQ, where Q is an orthogonal matrix, then
as the diagonal entries of D are the eigenvalues of A, we can conclude that A has two
positive and one negative eigenvalue.
It must be pointed out that there is no unique way of reducing a symmetric matrix A to a diago-
nal one by such pairs of operations; one can choose any pair of operations which renders the non-zero
symmetrically placed off-diagonal entries of A, and of the symmetric matrices obtained at each stage, zero.
There is a very useful classification of real symmetric matrices in terms of the signs of their eigen-
values; equivalently, real quadratic forms can be classified in the same manner. Recall that a real

symmetric matrix A of order n is associated to a real quadratic form q on Rn given by q(x) = xt Ax and
conversely.

Definition 7.4.17. Let A be a real symmetric matrix and q the associated quadratic form. A and
q are said to be positive definite (positive semi-definite) if all the eigenvalues of A are positive (non-
negative). They are called indefinite if A has both positive and negative eigenvalues, and negative
definite if all the eigenvalues are negative.

It is usual to refer to positive definite and positive semi-definite matrices simply as pd and psd matrices.
For practical purposes, real quadratic forms are also classified in terms of the nature of their ranges.
The following discussion depends on the results about diagonalization of real symmetric matrices; see
Section 5.3.

Proposition 7.4.18. Let q be a quadratic form on Rn given by q(x) = xt Ax, where A is the associ-
ated real symmetric matrix of order n. Then the following hold.
(a) q is positive definite if and only if xt Ax > 0 for all non-zero x ∈ Rn .
(b) q is positive semi-definite if and only if xt Ax ≥ 0 for all non-zero x ∈ Rn .
(c) q is negative definite if and only if xt Ax < 0 for all non-zero x ∈ Rn .
(d) q is indefinite if and only if xt Ax assumes both positive and negative real values.

Proof. Let Q be the orthogonal matrix of order n which diagonalizes A. Thus Qt AQ = D, where D is
the diagonal matrix having the n eigenvalues λ1 , λ2 , . . . , λn of A as its diagonal entries. As we have
seen earlier, as Q−1 = Qt , the change of coordinates Qy = x allows us to compute xt Ax as follows:

xt Ax = xt (QDQt )x
= (Qt x)t D(Qt x)
= yt Dy
= λ1 |y1 |2 + λ2 |y2 |2 + · · · + λn |yn |2 , (7.8)

where y = (y1 , y2 , . . . , yn )t . Observe that y = Qt x = 0 if and only if x = 0. So, if x is non-zero, then |yi |2 > 0
for at least one i. Thus, an examination of the right-hand side of Equation (7.8) yields all the assertions of the
proposition. !

It is clear that the assertions of the proposition hold if we replace q by A.


The concepts of pd (positive definite) and psd (positive semi-definite) matrices are important as
they appear in numerous applications. Here we can give only a very brief introduction to the theory of
these matrices. The following example shows the power of the ideas we have developed so far.

EXAMPLE 12 Let A and B be real symmetric matrices both having all their eigenvalues non-
negative (positive) and so are psd (pd) matrices. Thus, by the preceding proposition
(as applied to symmetric matrices), for any non-zero x ∈ Rn , both the real numbers
xt Ax and xt Bx are non-negative (positive). It follows that xt (A + B)x = xt Ax + xt Bx
too is non-negative (positive). By the same proposition then, A + B too is psd (pd).
We conclude that the eigenvalues of the symmetric matrix A + B are all non-negative
(positive).

EXAMPLE 13 Consider the quadratic form q(x) = 3x1^2 + 4x1 x2 on R2 . Though it is clear that q
assumes positive values, we do not see it assuming negative values unless we make
a proper choice of x1 and x2 . However, the associated real symmetric matrix is

$$\begin{pmatrix} 3 & 2 \\ 2 & 0 \end{pmatrix},$$

which has eigenvalues 4 and −1. So Proposition (7.4.18) confirms that q is an indef-
inite quadratic form, which means that q does assume negative values, too.
EXAMPLE 14 The eigenvalues of

 
 3 −1 0
 
A = −1 2 −1
 
0 −1 3

are 1, 3 and 4 (verify). Thus, A is a positive definite matrix, and the corresponding
quadratic form 3x1^2 + 2x2^2 + 3x3^2 − 2x1 x2 − 2x2 x3 on R3 assumes only positive values,
a fact which is not apparent from its formula.
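A quick numerical check of Example 14, using NumPy's eigvalsh (a minimal sketch, assuming NumPy is available; eigvalsh is designed for symmetric matrices and returns real eigenvalues in ascending order):

import numpy as np

A = np.array([[3., -1., 0.], [-1., 2., -1.], [0., -1., 3.]])
print(np.linalg.eigvalsh(A))                      # approximately [1., 3., 4.]
print(bool(np.all(np.linalg.eigvalsh(A) > 0)))    # True: A is positive definite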
One of the interesting aspects of psd (pd) matrices is that they behave like non-negative (positive)
real numbers. For example, the following result shows that a psd matrix has a unique square root.

Proposition 7.4.19. Let A be a real symmetric matrix. Then A is psd (pd) if and only if there is a
unique psd (pd) matrix B such that A = B2 .

Proof. Suppose A = B2 for some psd matrix B. Since eigenvalues of B are non-negative and the eigen-
values of B2 are the squares of the eigenvalues of B, it follows that A is psd. To prove the converse,
let Q be the orthogonal matrix which diagonalizes the psd matrix A. Thus A = QDQt , where D is the
diagonal matrix whose diagonal entries λi are the eigenvalues of A, all of which are non-negative. Let
E be the diagonal matrix whose ith diagonal entry is the non-negative square root √λi . Then D = E2 .
Since Q−1 = Qt , it follows that A = QDQt = QE2 Qt = (QEQt )(QEQt ) = B2 , where B = QEQt . It is
clear that B is psd as it is a symmetric matrix having the same eigenvalues √λi as E does.
It remains to prove the uniqueness of B. So let C be another psd matrix such that A = C 2 . Let
γ1 , γ2 , . . . , γn be the orthonormal columns of the orthogonal matrix Q which diagonalizes A; thus
the ith column vector γi is a unit eigenvector for the eigenvalue λi of A, that is, (A − λi In )γi = 0 for
each i (assuming A is of order n). One has then, for any i, 1 ≤ i ≤ n,

$$0 = (C^2 - \lambda_i I_n)\gamma_i = (C - \sqrt{\lambda_i}\, I_n)(C + \sqrt{\lambda_i}\, I_n)\gamma_i.$$

If λi is positive for some i, then it is clear that C + √λi In is positive definite (see Example 12) and
so, invertible. It then follows from the preceding relation that (C − √λi In )γi = 0 and so √λi is an
eigenvalue of C with γi as an eigenvector. In case λi = 0, Aγi = 0 and so, by properties of the usual dot

product in Rn , one obtains


$$\|C\gamma_i\|^2 = \langle C\gamma_i, C\gamma_i\rangle = \langle C^tC\gamma_i, \gamma_i\rangle = \langle C^2\gamma_i, \gamma_i\rangle = \langle A\gamma_i, \gamma_i\rangle = 0.$$

Thus Cγi = 0, which shows that for the eigenvalue λi = 0, γi is an eigenvector of C.
It follows that C is diagonalizable and Qt CQ = E, the diagonal matrix having √λ1 , √λ2 , . . . , √λn
as its diagonal entries. However, by definition B = QEQt and so we conclude that C = B. This proves
the uniqueness part. !

The following is clear now.

Corollary 7.4.20. If, for psd (pd) matrices B and C, B2 = C 2 , then B = C.
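The construction in the proof of Proposition (7.4.19) is easy to carry out numerically; the following NumPy sketch (an illustration, not part of the text) builds the psd square root B = QEQt from an orthogonal diagonalization of A:

import numpy as np

def psd_sqrt(A):
    # Orthogonally diagonalize A = Q D Q^t and return B = Q E Q^t, where E is
    # the diagonal matrix of non-negative square roots of the eigenvalues.
    eigvals, Q = np.linalg.eigh(A)
    eigvals = np.clip(eigvals, 0.0, None)   # guard against round-off
    return Q @ np.diag(np.sqrt(eigvals)) @ Q.T

A = np.array([[3., -1., 0.], [-1., 2., -1.], [0., -1., 3.]])   # pd, by Example 14
B = psd_sqrt(A)
print(np.allclose(B @ B, A))                      # True
print(bool(np.all(np.linalg.eigvalsh(B) >= 0)))   # True: B is itself psd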

Suppose now that for a real symmetric matrix A of order n, there is an invertible matrix C ∈ Mn (R)
such that C t C = A. Then for any x ∈ Rn ,

xt Ax = xt (C t C)x = (Cx)t (Cx). (7.9)

Now by properties of the dot product in Rn ,

(Cx)t (Cx) = ⟨Cx, Cx⟩ = ‖Cx‖2 .

Therefore Equation (7.9) implies that xt Ax = 0 if and only if Cx = 0. However, as C is invertible,


Cx = 0 if and only if x = 0. We conclude that xt Ax = ‖Cx‖2 > 0 for any non-zero x in Rn , that is, A is
positive definite.
This discussion coupled with Proposition (7.4.19) implies the following characterization of positive
definite matrices.

Proposition 7.4.21. For a real symmetric matrix A of order n, the following are equivalent.
(a) A is positive definite.
(b) There is a positive definite matrix B such that A = B2.
(c) There is an invertible matrix C of order n such that A = C t C.
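As an aside, criterion (c) can be tested numerically: for a positive definite A, NumPy's Cholesky factorization gives A = LLt, so C = Lt is an invertible matrix with A = CtC, and the factorization fails for a symmetric matrix that is not positive definite. A small sketch (an illustration only):

import numpy as np

def factor_as_CtC(A):
    # For a positive definite A, Cholesky gives A = L L^t, so C = L^t works;
    # for a symmetric matrix that is not positive definite it raises an error.
    try:
        return np.linalg.cholesky(A).T
    except np.linalg.LinAlgError:
        return None

A = np.array([[3., -1., 0.], [-1., 2., -1.], [0., -1., 3.]])
C = factor_as_CtC(A)
print(C is not None and np.allclose(C.T @ C, A))      # True: A is positive definite
print(factor_as_CtC(np.array([[3., 2.], [2., 0.]])))  # None: the indefinite matrix of Example 13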

EXERCISES
1. Determine whether the following assertions are true or false, giving brief justifications. All given
vector spaces are finite-dimensional over fields F whose characteristic is different from 2.
(a) For any symmetric bilinear form f on V, there is a basis of V which is orthogonal relative to
f.
(b) If a symmetric matrix A ∈ Mn (F) is congruent to a diagonal matrix D, then the diagonal
entries of D are the eigenvalues of A.

(c) Any symmetric bilinear form on R2 of rank 2 can be represented as


$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

with respect to some suitable basis of R2 .


(d) Any matrix in Mn (F) which is congruent to a symmetric matrix is itself symmetric.
(e) A symmetric matrix in Mn (Q) (Q is the field of rational numbers) is congruent to a diagonal
matrix in Mn (Q) whose non-zero entries are ±1.
(f) A symmetric matrix in Mn (C) is congruent to a diagonal matrix in Mn (C) all of whose non-
zero diagonal entries are 1.
(g) There is no symmetric matrix in Mn (R) whose index equals its signature.
(h) There is no symmetric matrix in Mn (R) whose rank equals its signature.
(i) Two symmetric matrices in Mn (F) are congruent if and only if they represent the same sym-
metric bilinear form on the vector space Fn .
(j) The determinants of two matrices in Mn (F) representing the same symmetric bilinear form
on an n-dimensional vector space over F differ by a multiple of a square of an element of F.
(k) For any subspace W of a vector space V with a symmetric bilinear form f , (W ⊥ )⊥ = W.
(l) Any two congruent matrices have the same determinant.
(m) Any two congruent matrices have the same trace.
(n) A real symmetric matrix with negative entries is negative definite.
(o) A positive definite matrix is invertible.
(p) A negative definite matrix is not invertible.
(q) The diagonal entries of a positive definite matrix are positive reals.
(r) If the diagonal entries of a real symmetric matrix are positive, then the matrix is positive
definite.
2. Verify that 1 and −1 are not eigenvalues of the matrix in Example 11.
3. For each of the following real symmetric matrices A, determine an invertible matrix P and a
diagonal matrix D such that Pt AP = D, and hence determine the invariants, and the number of
positive and negative eigenvalues of A:
   
$$\text{(a) } \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad \text{(b) } \begin{pmatrix} 0 & 1 & -1 \\ 1 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}, \qquad \text{(c) } \begin{pmatrix} 1 & -2 & 3 \\ -2 & 3 & 2 \\ 3 & 2 & 4 \end{pmatrix}.$$

4. Determine an orthogonal basis of R3 relative to each of the following symmetric bilinear forms
f on R3 and then compute the invariants of f :
(a) f ((x1 , x2 , x3 )t ) = x1^2 + x2^2 − x3^2 − 2x1 x2 + 4x2 x3 ;
(b) f ((x1 , x2 , x3 )t ) = −x1^2 + x3^2 + 2x1 x2 + 2x1 x3 .
5. Determine an orthogonal basis of C3 relative to the symmetric bilinear form f on C3 given by

f ((x1 , x2 , x3 )t ) = (1 + i)x1^2 + x3^2 + 4x1 x2 + 2ix1 x3 + 2(1 − i)x2 x3 .



6. Prove the matrix version of Sylvester’s Law of Inertia given in Corollary (7.4.9).
7. Let f be a bilinear form on a finite-dimensional vector space V. If there is a basis of V with
respect to which the matrix of f is diagonal, that is, if f is diagonalizable, then show that f must
be symmetric.
8. Let f be a symmetric bilinear form on V, and let W be a subspace of V. Show that

f (v + W, u + W) = f (v, u)

defines a symmetric bilinear form on the quotient space V/W. Show further that f is non-
degenerate if and only if W = V ⊥ , where V ⊥ is the orthogonal complement of V with respect to
f.
9. Let D be a diagonal matrix in Mn (F) for a field F, and let D' be the matrix in Mn (F) obtained
by some arbitrary permutation of the diagonal entries of D. Prove that D' is congruent to D in
Mn (F).
10. Prove that the number of distinct congruence classes of symmetric matrices in Mn (R) is
(n + 1)(n + 2)/2.
11. Let f be a non-degenerate symmetric bilinear form on a vector space V over a field F, and let
T be a linear operator on V. Show that the map g : V × V → F given by g(v, w) = f (v, T w) is
a bilinear form on V. Show further that g is symmetric if and only if T = T ∗ , where T ∗ is the
adjoint of T with respect to f .
(Hint: See Exercise 12 of the preceding section for the definition of the adjoint of a linear
operator.)
12. Let V be a finite-dimensional vector space with a symmetric bilinear form f . If W1 and W2 are
subspaces of V such that dim W1 > dim W2 , then prove that W1 contains a non-zero vector
which is orthogonal to every vector of W2 .
The following exercises are based on the material in Prof. R. Bhatia's article Min Matrices and
Mean Matrices, which appeared in Mathematical Intelligencer, Vol. 33, No. 2, 2011.
13. If A1 and A2 are psd (respectively, pd) matrices, then show that the direct sum A1 ⊕ A2 is psd
(respectively, pd).
14. Consider the real matrix Fn (flat matrix) of order n all of whose entries are equal to 1. Compute
xt Fn x for x = (x1 , x2 , . . . , xn )t ∈ Rn and verify that while Fn is positive semi-definite, it cannot
be positive definite.
15. Find a positive semi-definite matrix R of order 4 such that the flat matrix F4 = Rt R. Can any
choice of R be invertible?
Generalise the construction of R to the case of Fn .
16. Let

 
1 1 1 1

1 2 2 2
M =  .
1 2 3 3
1 2 3 4

a. Find a positive definite matrix R of order 4 such that M = Rt R.


b. Compute xt Mx for x = (x1 , x2 , x3 , x4 )t ∈ R4 and verify that M is positive definite.

17. Let M = [mi j ] be the min matrix of order n, where mi j = min{i, j}:

$$M = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 & 1 \\ 1 & 2 & 2 & \cdots & 2 & 2 \\ 1 & 2 & 3 & \cdots & 3 & 3 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & 2 & 3 & \cdots & n-1 & n \end{pmatrix}.$$

Prove that M is positive definite by finding an invertible lower triangular matrix R (through the
LU factorization of M) such that M = Rt R.
The next exercise shows that the restriction on the characteristic of the scalar field is necessary in
Theorem (7.4.6) for the diagonalization of a symmetric bilinear form. Knowledge of the finite field
of two elements is necessary for this exercise.
18. Let F be the field Z2 = {0, 1} of two elements, and let V = F2 be the vector space of 2-dimensional
column vectors over F. Consider the map f on V × V given by f ((x1 , x2 )t , (y1 , y2 )t ) = x1 y2 + x2 y1 .
Show that f is a symmetric bilinear form on V, compute the matrix A of f relative to the
standard basis of V and hence show that rank( f ) = 2. Next prove that if f is diagonalizable, then
there is a basis of V with respect to which the matrix of f is

$$D = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Finally, if $P = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_2(F)$ is an invertible matrix such that Pt AP = D, then show, by com-
paring the diagonal entries of both sides of this matrix equation, that 1 = 0 in F, a contradiction.
(Hint: One needs to use the fact that 2x = 0 for any x ∈ F.)

7.5 GROUPS PRESERVING BILINEAR FORMS


Recall that if Q is a real orthogonal matrix of order n then Qt Q = In = QQt . Therefore for any x, y ∈ Rn ,
one has (Qx)t Qy = xt Qt Qy = xt y. In other words, f (Qx, Qy) = f (x, y), where f is the symmetric
bilinear form representing the usual dot product on Rn . One says that Q preserves the dot product.
In general, one can introduce linear operators (or matrices) which preserve bilinear forms. Operators
or matrices preserving non-degenerate forms furnish interesting examples of groups, such as general
orthogonal and symplectic groups, having important properties.
If f is a bilinear form on a vector space V, then a linear operator T on V is said to preserve f if

f (T v, T w) = f (v, w) for all v, w ∈ V.

It is clear that (i) the identity operator on V preserves any bilinear form f , and (ii) if operators T
and S on V preserve f , then so does the composite S T . In case f is a non-degenerate form on a
finite-dimensional vector space V, any linear operator T preserving f is necessarily invertible; for, if
v ∈ ker T , then as f is non-degenerate, the condition f (v, w) = f (T v, T w) = f (0, w) = 0 for all w ∈ V
implies that v = 0. Thus T is one-one and so invertible as V is finite-dimensional. Moreover,

f (T −1 v, T −1 w) = f (T (T −1 v), T (T −1 w)) = f (v, w),

which proves that T −1 also preserves f . Thus we have verified the following proposition.

Proposition 7.5.1. Let f be a non-degenerate bilinear form on a finite-dimensional vector space


V over an arbitrary field F. Then the set O( f ) of linear operators on V which preserve f is a group.

Definition 7.5.2. The group O( f ) is called the orthogonal group or the group of isometries of the
non-degenerate bilinear form f .

One can identify the group O( f ), for a non-degenerate bilinear form f on an n-dimensional vector
space V over a field F, with a group of invertible matrices of order n over the field F by fixing a basis
of V. Let A be the matrix of f with respect to a fixed but arbitrary basis {v1 , v2 , . . . vn } of V; as f is
non-degenerate, A is invertible. Recall that if x and y are the coordinate vectors in Fn of vectors v and
w in V respectively, then f (v, w) = xt Ay. Therefore, if a linear operator T on V, which preserves f , is
represented by the matrix M of order n with respect to the same basis, then M is invertible (as T is)
and

f (T v, T w) = f (Mx, My) = (Mx)t A(My) = xt M t AMy.

It follows that T preserves f if and only if

M t AM = A. (7.10)

One also says that the matrix M preserves the form f if M t AM = A, where A is the matrix representing
f with respect to some basis of V. The following result, whose easy verification is left to the reader, can
be termed as the matrix version of Proposition (7.5.1). Recall that GLn (F) is the group of all invertible
matrices of order n over a field F.

Proposition 7.5.3. For any A ∈ GLn (F), the set O(A) of matrices M ∈ GLn (F) such that M t AM = A
is a group.

Moreover, using Equation (7.10), one can easily show that the group of operators and the group of
matrices preserving a non-degenerate form are essentially the same.

Proposition 7.5.4. Let f be a non-degenerate bilinear form on an n-dimensional vector space V


over a field F and let A ∈ GLn (F) be the matrix of f with respect to some basis B of V. If for any
T ∈ O( f ), θ(T ) is the matrix of T with respect to B, then θ is an isomorphism of the group O( f ) onto
O(A).

For example, the group of matrices M ∈ GLn (F) preserving the dot product f (x, y) = xt y on Fn
consists of precisely those matrices M such that M t M = In , since with respect to the standard basis of
Fn , the matrix of the dot product is the identity matrix In . This group is the general orthogonal group
On (F); the matrices in this group are the orthogonal matrices of order n over the field F. The real
orthogonal group On (R) is, thus, just one example of bilinear form preserving group of matrices.
Over F = R, one also has pseudo-orthogonal groups On (p, q) for non-negative integers p and q
such that p + q = n; On (p, q) is the group of all real matrices M of order n such that

M t I p,q M = I p,q ,

where
$$I_{p,q} = \begin{pmatrix} I_p & 0 \\ 0 & -I_q \end{pmatrix}.$$

Note that On (0, n) = On (n, 0) = On (R). Because of Corollary (7.4.14), it is clear that On (p, q) is the
set of real matrices M of order n which preserves any non-degenerate symmetric bilinear form on an
n-dimensional real vector space having rank n and index p.
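The defining condition M t AM = A in Equation (7.10) is easy to check numerically. The following NumPy sketch (an illustration, not part of the text) verifies that a plane rotation lies in O2(R) while a hyperbolic "boost" lies in O(1, 1) but not in O2(R):

import numpy as np

def preserves(M, A):
    # M preserves the form with matrix A exactly when M^t A M = A.
    return bool(np.allclose(M.T @ A @ M, A))

t = 0.7
I11 = np.diag([1.0, -1.0])
rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
boost = np.array([[np.cosh(t), np.sinh(t)], [np.sinh(t), np.cosh(t)]])

print(preserves(rot, np.eye(2)))    # True: rot is in O_2(R)
print(preserves(boost, I11))        # True: boost is in O(1, 1)
print(preserves(boost, np.eye(2)))  # False: boost is not an orthogonal matrix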
In case F = C (or some other algebraically closed field), Theorem (7.4.6) implies that any non-
degenerate symmetric bilinear form on an n-dimensional vector space determines an orthogonal basis
with respect to which the matrix of the form is the identity matrix, and so the group of matrices preserving
any such form is the full orthogonal group On (C).
In all these cases, clear descriptions of the groups of matrices (or of linear operators) preserving non-
degenerate symmetric bilinear forms on finite-dimensional vector spaces over a field F of characteristic
different from 2 depended on the existence of nice canonical forms of diagonal matrices representing
the form. In general, however, for arbitrary fields F (except for finite fields), it may not be possible to
determine such canonical forms; for details, one may refer to Basic Algebra, vol. I by Jacobson [5].
It is therefore surprising that an alternating non-degenerate bilinear form on a finite-dimensional
vector space over any field admits a nice canonical matrix representation. Recall that a bilinear form
f on a vector space V over a field F is called alternating if f (v, v) = 0 for all v ∈ V. Observe that an
alternating form is skew-symmetric: f (v, w) = − f (w, v) for any v, w ∈ V. If char F ≠ 2, then a skew-
symmetric form f is also alternating; in case char F = 2, skew-symmetric forms are the same as the
symmetric ones as a = −a for any a ∈ F. The following is the fundamental result about alternating
forms.

Proposition 7.5.5. Let f be an alternating bilinear form on a finite-dimensional vector space V


over a field F. Then there is a basis of V with respect to which the matrix of f is a direct sum of a zero
matrix and k copies of the matrix
$$J_1 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

Furthermore, the rank r of f is even and r = 2k.

Proof. If f is the zero form, there is nothing to prove. So we may assume that f is not the zero form.
Then there are vectors u1 and v1 in V such that f (u1 , v1 ) ≠ 0. Replacing v1 by a scalar multiple, if
necessary, we can assume that f (u1 , v1 ) = 1. Using properties of the alternating form f , one can easily
see that u1 and v1 are linearly independent (see Exercise 9). Let W1 be the subspace of V spanned
by u1 and v1 ; the key to finding the required basis of V is to show that V is a direct sum of such
two-dimensional subspaces (and possibly a subspace on which f is the zero form). To do that, we first
prove that V = W1 ⊕ W1⊥ , where W1⊥ is the subspace (prove it) consisting of the vectors w ∈ V such that
f (u1 , w) = 0 and f (v1 , w) = 0. Now, for any v ∈ V, let v' = f (v, v1 )u1 − f (v, u1 )v1 and set w' = v − v' .
It is clear that v' is in W1 . Moreover, as f (u1 , u1 ) = 0 and f (u1 , v1 ) = 1, we see that

f (u1 , w' ) = f (u1 , v) − f (u1 , v' )


= f (u1 , v) + f (v, u1 ) f (u1 , v1 )
= 0.

A similar calculation shows that f (v1 , w' ) = 0. So w' ∈ W1⊥ . Since v = v' + w' , it follows that v is
in W1 + W1⊥ . Next we verify that W1 ∩ W1⊥ = {0}. So let v = au1 + bv1 be in W1⊥ , which implies that
f (u1 , au1 +bv1 ) = 0 = f (v1 , au1 +bv1 ). Properties of the alternating form f and the fact that f (u1 , v1 ) =
1 then show that b = 0 = −a. Thus v = 0, completing our verification. We therefore can conclude that
V = W1 ⊕ W1⊥ . Now the restriction of f to W1⊥ is clearly alternating. So if f is the zero form on W1⊥ ,
then we are done. Otherwise we can find two vectors u2 and v2 in W1⊥ such that f (u2 , v2 ) = 1. As
in the case of the vectors u1 and v1 , one can show that u2 and v2 are linearly independent. Letting
W2 to be the subspace spanned by u2 and v2 and W2⊥ the subspace of all vectors w ∈ W1⊥ such that
f (u2 , w) = 0 = f (v2 , w), one can show, as in the preceding case, that W1⊥ = W2 ⊕ W2⊥ .
Continuing in this manner, as V is finite-dimensional, we obtain vectors u1 , v1 , . . . , uk , vk such
that

f (u j , v j ) = 1 for 1 ≤ j ≤ k;
f (ui , v j ) = f (ui , u j ) = f (vi , v j ) = 0 for i ≠ j.

Moreover, if W j is the subspace spanned by the linearly independent vectors u j and v j , then

V = W1 ⊕ W2 ⊕ · · · ⊕ Wk ⊕ W0 ,

where W0 is the subspace (possibly the zero subspace) such that the restriction of f to W0 is the
zero form. Now, choosing any basis w1 , w2 , . . . , wn−2k of W0 , we see that if dim V = n, then
u1 , v1 , . . . , uk , vk , w1 , w2 , . . . , wn−2k is a basis of V and that the matrix of f with respect to this
basis is the required one.
The assertion about the rank of f is now clear. !
The pair u j , v j is sometimes called a hyperbolic pair and the subspace W j spanned by these two
linearly independent vectors a hyperbolic plane.
Recall that for a non-degenerate bilinear form f on a vector space V, for every non-zero u ∈ V,
there is some v ∈ V such that f (u, v) ! 0. Thus, for such a form, the subspace W0 of the preceding
proposition has to be the zero subspace and so can be omitted from the direct sum decomposition of
V. In that case, reordering the basis of the non-zero part of V as u1 , u2 , . . . , un , v1 , v2 , . . . , vn , we
deduce the following result.

Corollary 7.5.6. Let f be a non-degenerate alternating form on a finite-dimensional vector space


V over a field F. Then the dimension of V is necessarily even. Further, if dim V = 2n, then there is a
basis of V with respect to which the matrix of f is
$$J_n = \begin{pmatrix} 0_n & I_n \\ -I_n & 0_n \end{pmatrix},$$

where 0n and In are respectively the zero matrix and the identity matrix, both of order n, over F.

The matrix Jn gives rise to a group which has deep group-theoretic properties.

Definition 7.5.7. Let F be a field of characteristic different from 2. The symplectic group S p2n (F)
is the group of matrices

{S ∈ GL2n (F) | S t Jn S = Jn }. (7.11)



Proposition (7.5.3) shows that S p2n (F) is indeed a group. Elements of S p2n (F) are called symplec-
tic matrices. An easy calculation shows that Jn itself is a symplectic matrix of order 2n.
As in the case of non-degenerate symmetric bilinear forms, which we had discussed earlier, the
linear operators on a vector space V, which preserve a non-degenerate alternating bilinear form f on
V, do form a group, say S p( f ). If V is finite-dimensional of dimension 2n, then by choosing a basis of
V, consisting of n hyperbolic pairs, one can set up a group isomorphism of S p( f ) onto S p2n (F). Thus,
the symplectic matrices in S p2n (F) are also said to preserve the form f .
To obtain some examples of symplectic matrices S satisfying the relation S t Jn S = Jn , we express
S as a partitioned matrix

$$S = \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$
where the blocks A, B, C and D are all matrices over F of order n and then express the product S t Jn S
in blocks by block multiplication. Equating these four blocks with the corresponding blocks of Jn , we
find the following conditions on the blocks of S :
(i) At C and Bt D are symmetric and
(ii) At D − C t B = In .
Thus, for example, for any A ∈ GLn (F) of order n and any symmetric matrix B of order n (over F), the
blocks of the matrices

$$\begin{pmatrix} A^{-1} & 0_n \\ 0_n & A^t \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} I_n & B \\ 0_n & I_n \end{pmatrix},$$
where 0n is the zero matrix of order n, satisfy the preceding conditions and so the matrices are sym-
plectic.
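A quick numerical confirmation (an illustration only, not part of the text) that Jn and the two block matrices above satisfy S t Jn S = Jn, for a randomly chosen invertible A and symmetric B:

import numpy as np

n = 3
rng = np.random.default_rng(0)
Jn = np.block([[np.zeros((n, n)), np.eye(n)],
               [-np.eye(n), np.zeros((n, n))]])

A = rng.standard_normal((n, n)) + n * np.eye(n)   # almost surely invertible
B = rng.standard_normal((n, n)); B = B + B.T      # symmetric

S1 = np.block([[np.linalg.inv(A), np.zeros((n, n))],
               [np.zeros((n, n)), A.T]])
S2 = np.block([[np.eye(n), B],
               [np.zeros((n, n)), np.eye(n)]])

for S in (Jn, S1, S2):
    print(bool(np.allclose(S.T @ Jn @ S, Jn)))    # True, True, True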
EXERCISES
1. Determine whether the following assertions are true or false giving brief justification. All vector
spaces are finite-dimensional.
a. A matrix A ∈ Mn (F) is skew-symmetric if and only if xt Ax = 0 for all x ∈ Fn .
b. A skew-symmetric matrix A ∈ Mn (F) cannot be invertible if n is odd.
c. If A and B are matrices of a non-degenerate bilinear form on a vector space, then the groups
O(A) and O(B) are isomorphic.
d. Any matrix preserving a quadratic form is invertible.
e. The group O(1, 1) preserves the form x2 − y2 .
f. If M ∈ Mn (C) preserves the dot product on Cn , then the columns of M form an orthonormal
set.
g. If A is a matrix of a non-degenerate bilinear form on a complex vector space of dimension
n, then A is congruent to In .
h. The matrix Jn ∈ S p2n (F).
i. If S ∈ S p2n (F), then S t ∈ S p2n (F).
j. The determinant of the symplectic matrix Jn is 1.
k. Any S ∈ S p2n (F) is invertible.
l. Every real invertible matrix of order 2n is in S p2n (F).

2. If A is a real skew-symmetric matrix of order n, then show that In + A is invertible and (In − A)(In +
A)−1 is orthogonal.
3. Prove that the eigenvalues of a real skew-symmetric matrix are purely imaginary.
4. Verify that the real matrix $\begin{pmatrix} \cosh t & \sinh t \\ \sinh t & \cosh t \end{pmatrix}$ is in O(1, 1).
5. Verify that $A = \begin{pmatrix} \cos z & -\sin z \\ \sin z & \cos z \end{pmatrix}$ is in O2 (C) but is not unitary.
6. Let A and B be real matrices of order p and q, respectively. Find conditions on A and B such that
$M = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$ is in the group O(3, 3).
7. Find a matrix M ∈ O3 (C) whose first column is (1, 0, i)t .
8. Let n be a positive integer and p and q non-negative integers such that p + q = n. Prove that the
group O(p, q) is isomorphic to O(q, p).
9. Let f be an alternating form on a vector space V over a field F, and let u1 , v1 ∈ V such that
f (u1 , v1 ) = 1. If w = a1 u1 + b1 v1 for a1 , b1 ∈ F, then prove that a1 = f (w, v1 ) and b1 = − f (w, u1 ).
Hence, prove that u1 and v1 are linearly independent over F.
10. Verify that for any A ∈ GLn (F), $\begin{pmatrix} A^{-1} & 0 \\ 0 & A^t \end{pmatrix}$ is in S p2n (F).
11. Verify that for any symmetric matrix B of order n over a field F,
$$\begin{pmatrix} I_n & B \\ 0 & I_n \end{pmatrix} \in S p_{2n}(F),$$

where In denote the identity matrix and 0 the zero matrix, both of order n, over F.
12. If S ∈ S p2n (F), then prove that S −1 ∈ S p2n (F).
13. Let S ∈ S p2n (C). Use the fact that S is similar to S t to prove that S is similar to S −1 .
14. Let A be a real matrix of order 2n such that the product Jn A is symmetric. Prove that the trace of
A is zero and that Jn At is also symmetric.

8 Inner Product Spaces

8.1 INTRODUCTION
Bilinear forms, as we saw in the last chapter, are natural generalizations of the dot products of
R2 and R3 ; the important geometric concept of perpendicularity can also be introduced for symmetric
bilinear forms. For a symmetric bilinear form f , even the notion of the length of a vector can be
introduced in case f (v, v) is a positive real for any non-zero v; such a form is known as positive definite.
However, the theory of positive definite symmetric bilinear forms will be developed in the frame-
work of hermitian forms in this chapter. Hermitian forms, in some sense, generalize real symmetric
forms to the complex case. Most of the chapter will be devoted to positive definite hermitian forms,
otherwise known as inner products. One of the main goals of this chapter is to develop the ideas
necessary for completely classifying the operators which are diagonalizable with respect to such inner
products.
Throughout this chapter, the field F is either R or C.

8.2 HERMITIAN FORMS


Definition 8.2.1. Let V be a vector space over a field F. A map H : V × V → F is called a hermitian
form on V, if for all v, v' , w ∈ V and a ∈ F,

(i) H(v + v' , w) = H(v, w) + H(v' , w);


(ii) H(av, w) = aH(v, w);
(iii) H(v, w) = $\overline{H(w, v)}$.

Here, bar denotes the complex conjugate.


Thus, a hermitian form is linear in the first variable but not exactly linear in the second vari-
able. In fact, by using the last condition to interchange the variables, we can deduce from these
conditions that:

(iv) H(v, w + w' ) = H(v, w) + H(v, w' );


(v) H(v, aw) = ā H(v, w);
(vi) H(v, v) is real.


As in the case of bilinear forms, one easily verifies that H(v, 0) = 0 for any v ∈ V.
We may describe the condition on the second variable as conjugate linearity. Therefore, sometimes
a hermitian form is referred to as one and a half linear (sesqui-linear) form. Condition (iii) is referred
to as hermitian symmetry; observe that if F = R, then a hermitian form is nothing but a symmetric
bilinear form.
The reader should be cautioned that hermitian forms can also be defined by insisting on linearity
in the second variable and conjugate linearity in the first variable. Thus, it is necessary to find out the
definition adopted for hermitian forms while checking other sources.
We give some examples of hermitian forms now.

EXAMPLE 1 Let V = Fn , the vector space of n-dimensional column vectors over F. For x, y ∈ Fn ,
we let H(x, y) = y∗ x, where for the column vector y = (y1 , y2 , . . . , yn )t , y∗ denotes
the row vector $(\overline{y_1}, \overline{y_2}, \ldots, \overline{y_n})$. Writing out the product, we then see that

$$H(x, y) = x_1\overline{y_1} + x_2\overline{y_2} + \cdots + x_n\overline{y_n}.$$

It is easy to verify that H is a hermitian form on Fn . It is the standard hermitian


product or the standard inner product on Fn .
If F = R, then this standard product is clearly the standard dot product on Rn as
in this case

H(x, y) = yt x = x1 y1 + x2 y2 + · · · + xn yn .

Note that the standard dot product xt y on Cn is not a hermitian form.


It is customary to denote this standard hermitian form H(x, y) by ⟨x, y⟩.
Thus, in C3 , for example, if x = (1, 1 + i, 1 − i)t and y = (√2, 2 + 2i, −i)t , then

$$\langle x, y\rangle = 1(\sqrt{2}) + (1 + i)(2 - 2i) + (1 - i)(i) = (5 + \sqrt{2}) + i.$$
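In NumPy, the standard hermitian product with the convention used here (linear in the first variable) is np.vdot(y, x); the following sketch (an illustration only) reproduces the computation just made:

import numpy as np

x = np.array([1, 1 + 1j, 1 - 1j])
y = np.array([np.sqrt(2), 2 + 2j, -1j])
# np.vdot(a, b) computes sum conj(a_i) b_i, so np.vdot(y, x) is y* x = <x, y>.
print(np.vdot(y, x))                                          # (5 + sqrt(2)) + 1j
print(bool(np.isclose(np.vdot(y, x), 5 + np.sqrt(2) + 1j)))   # True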

To get more examples of hermitian forms on Fn , we need to establish an association of such hermi-
tian forms with certain matrices that will reflect the special properties of hermitian forms. We introduce
these matrices now. Recall that for an m × n matrix A over F, its adjoint or conjugate transpose A∗
is the n × m matrix [bi j ], where $b_{ij} = \overline{a_{ji}}$; we have already used the conjugate transpose y∗ of a column
vector y in Example 1.

Definition 8.2.2. A matrix A ∈ Mn (F) is a hermitian, or a self-adjoint matrix if A = A∗ .

Thus a real hermitian matrix is simply a symmetric matrix. Note that the diagonal entries of a
hermitian matrix are all real.
The following properties of adjoints will be useful (see Section 1.5): For A, B ∈ Mn (F), one has
(i) (A + B)∗ = A∗ + B∗,
(ii) (AB)∗ = B∗ A∗ ,

(iii) (aA)∗ = ā A∗ for any a ∈ F. Thus, for an invertible matrix A ∈ Mn (F), (A∗ )−1 = (A−1 )∗ .

Let H be a hermitian form on an n-dimensional vector space V over F. Fix a basis B =


{v1 , v2 , . . . , vn } of V. The matrix A of the form H with respect to the basis B is defined as

A = [ai j ], where ai j = H(vi , v j ) for all i, j.

Since H(vi , v j ) = $\overline{H(v_j , v_i)}$, it follows that $a_{ij} = \overline{a_{ji}}$. Thus, the matrix A of the hermitian form H relative
to any basis of V is a hermitian or self-adjoint matrix in Mn (F).
Conversely, a hermitian matrix of order n determines a hermitian form on any n-dimensional vector
space over F. To verify this assertion, let A be a hermitian matrix in Mn (F). Fix a basis B of an n-
dimensional vector space V over a field F. If x and y are the coordinate vectors of any two arbitrary
vectors v and w of V with respect to the basis B, then we define

H(v, w) = y∗ Ax. (8.1)

The properties of matrix multiplication and of adjoints of matrices that have been just listed, readily
imply that H is a hermitian form on V. For example, (y + y' )∗ Ax = (y∗ + y' ∗ )Ax = y∗ Ax + y' ∗ Ax.
Similarly, (ay)∗ Ax = ā y∗ Ax. These show that H is conjugate linear in the second variable.
To verify the other properties of H, we need to find the matrix of H with respect to the chosen basis
B. Observe that if B = {v1 , v2 , . . . , vn }, then the coordinate vector of vi for any i is the n × 1 column
vector ei , having 1 at the ith place and zeros everywhere else. Therefore,

H(vi , v j ) = e j ∗ Aei = (a j1 , a j2 , . . . , a jn )ei = a ji

for all i, j. Since $a_{ji} = \overline{a_{ij}}$, it follows that H(vi , v j ) = $\overline{H(v_j , v_i)}$ for all basis vectors vi and v j , and so H
does have hermitian symmetry for all vectors in V.
We have just shown that hermitian forms on an n-dimensional vector space V over a field F are in
one–one correspondence with hermitian matrices of order n over F. Also note that hermitian matrices
A ∈ Mn (F) supply us with examples of hermitian forms on Fn through the formula y∗ Ax given in
Equation (8.1); the standard inner product can be obtained by taking A to be the identity matrix.

EXAMPLE 2 Consider the hermitian matrix


$$A = \begin{pmatrix} 1 & i \\ -i & 2 \end{pmatrix}$$

over C. For x = (x1 , x2 )t and y = (y1 , y2 )t in C2 , the form given by


$$H(x, y) = y^*Ax = (\overline{y_1}, \overline{y_2})\begin{pmatrix} 1 & i \\ -i & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1\overline{y_1} + ix_2\overline{y_1} - ix_1\overline{y_2} + 2x_2\overline{y_2}$$

is then a hermitian form H on C2 with respect to the standard basis of C2 . Note that
$H(x, x) = x_1\overline{x_1} + ix_2\overline{x_1} - ix_1\overline{x_2} + 2x_2\overline{x_2} = |x_1|^2 + 2\,\mathrm{Im}(x_1\overline{x_2}) + 2|x_2|^2$ is a real number.

EXAMPLE 3 If, for any A, B ∈ Mn (F), we define

H(A, B) = T r(AB∗ ),

where B∗ is the conjugate transpose of B, then one can verify that H is a hermi-
tian form on Mn (F) by appealing to the properties of the trace function and ma-
trix multiplication. There is also another way of verifying the same. If A = [ai j ]
and B = [bi j ], then the general diagonal entry cii of the product AB∗ is given by
$c_{ii} = \sum_k a_{ik}\overline{b_{ik}}$, so by the definition of the trace of a matrix we have

$$H(A, B) = \sum_i \sum_k a_{ik}\overline{b_{ik}},$$

which is nothing but the standard product of Example 1 on $F^{n^2}$. Note that Mn (F) as a
vector space is naturally isomorphic to $F^{n^2}$.
It is also clear, in the light of the discussions preceding Example 2, that for any
hermitian matrix P ∈ Mn (F), the formula H(A, B) = T r(APB∗) will define a hermitian
form on Mn (F).
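A small NumPy sketch (an illustration only) confirming that Tr(AB∗) agrees with the entrywise sum above, and that H(A, A) is real:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

lhs = np.trace(A @ B.conj().T)      # Tr(A B*)
rhs = np.sum(A * B.conj())          # sum over i, k of a_ik conj(b_ik)
print(bool(np.isclose(lhs, rhs)))                            # True
print(bool(np.isclose(np.trace(A @ A.conj().T).imag, 0.0)))  # True: H(A, A) is real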
For an example of a different type, we introduce a hermitian form on an infinite-dimensional vector
space.

EXAMPLE 4 Let V be the complex vector space of all continuous complex-valued functions on
the interval [0, 2π]. Then
$$H(f, g) = \frac{1}{2\pi}\int_0^{2\pi} f(t)\overline{g(t)}\, dt$$

defines a hermitian form on V. We leave the verification, which will depend on the
properties of the Riemann integral, to the reader. We point out that any h(t) ∈ V can
be written as h(t) = a(t) + ib(t), where a(t) = Re h(t) and b(t) = Im h(t) are real-valued
functions on [0, 2π] so that $\overline{h(t)} = a(t) - ib(t)$. Therefore, the integral of $\overline{h(t)}$ can be
expressed in terms of real integrals as

$$\int_0^{2\pi} a(t)\, dt - i \int_0^{2\pi} b(t)\, dt.$$

EXAMPLE 5 Let V be the complex vector space of all polynomials with complex coefficients of
degree at most 3. As in the previous example,
$$H(f(x), g(x)) = \int_0^1 f(t)\overline{g(t)}\, dt$$

is a hermitian form on V. We compute the 4 × 4 matrix A = [ai j ] of H with respect to


the basis B = {v1 = 1, v2 = x, v3 = x2 , v4 = x3 } of V. By definition,
$$a_{ij} = H(v_i, v_j) = \int_0^1 t^{i+j-2}\, dt.$$

Evaluating the integral, we see that ai j = 1/(i + j − 1).



Thus, the matrix A will be the following real hermitian, that is, the real symmetric
matrix

 
 1 1/2 1/3 1/4

1/2 1/3 1/4 1/5
 .
1/3
 1/4 1/5 1/6
1/4 1/5 1/6 1/7
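The matrix of Example 5 is easy to generate and inspect numerically; a short NumPy sketch (the observation about positive eigenvalues is a by-product of the sketch, not a claim made in the text, and scipy.linalg.hilbert(4), if available, produces the same matrix):

import numpy as np

# a_ij = 1/(i + j - 1) with 1-based indices
A = np.array([[1.0 / (i + j - 1) for j in range(1, 5)] for i in range(1, 5)])
print(A)
print(np.linalg.eigvalsh(A) > 0)    # all True: the form of Example 5 is positive definite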

The concept of perpendicularity, which is commonly known as orthogonality, can be introduced


in a vector space V equipped with a hermitian form H exactly in the same manner as it was done in
the spaces with symmetric bilinear forms (see Definition (7.4.1)). Two vectors v, u ∈ V are orthog-
onal (with respect to H), if H(v, u) = 0. By hermitian symmetry, the relation of being orthogonal is
symmetric.
Our discussion about orthogonality with respect to a symmetric bilinear form in Section 4 of the
last chapter can be repeated without much change in the case of a space with hermitian form. For
example, the orthogonal complement of a subspace with respect to a hermitian form can be introduced
and a result analogous to Theorem (7.4.4) can be proved.
This theorem as well as the fact that orthogonal bases exist for hermitian forms hold under the
condition that the hermitian form is positive definite. In fact, a lot can be said about the structures
of spaces with positive definite forms and we make a detailed study of such forms in the following
section.

EXERCISES

1. Determine whether the following assertions are true or false, giving brief justifications. All
underlying vector spaces are finite-dimensional over the field of complex numbers or the field
of real numbers.
(a) If a hermitian form on a complex vector space is symmetric, then it must be identically zero.
(b) A symmetric bilinear form on a real vector space is hermitian.
(c) A non-zero hermitian form on a real vector space is positive definite.
(d) The determinant of any matrix representation of a positive definite hermitian form is a posi-
tive real number.
(e) There can be no positive definite hermitian form on an infinite-dimensional complex vector
space.
(f) If f is a hermitian form on a real vector space is not positive definite, then f cannot be
non-degenerate.
(g) The restriction of a positive definite hermitian form to a non-zero subspace need not be
positive definite.
(h) The sum of two hermitian matrices of the same order is again a hermitian matrix.
(i) The diagonal elements of any hermitian matrix are real.
2. Show that H(x, y) = xt y for all x, y ∈ Fn defines a hermitian form on Fn . Is H positive definite?
3. Show that H(A, B) = T r(AB∗ ) for A, B ∈ Mn (F) defines a hermitian form on Mn (F) directly by
using properties of trace and adjoints of matrices. Is H positive definite?

4. Let P be a fixed hermitian matrix in Mn (F). Does H(A, B) = Tr(APB∗) for A, B ∈ Mn (F) define
a hermitian form on Mn (F)?
5. Let V be the complex vector space of all continuous complex-valued functions on the interval
[0, 2π]. Prove that the formula $H(f, g) = \frac{1}{2\pi}\int_0^{2\pi} f(t)\overline{g(t)}\, dt$ defines a hermitian form H on V.
6. Let V be the complex vector space of all polynomials with complex coefficients of degree at
most n. Verify that $H(f(x), g(x)) = \int_0^1 f(t)\overline{g(t)}\, dt$ defines a hermitian form on V.
7. Verify all the properties of adjoints listed in Proposition (8.5.6).
8. Prove or disprove: for A ∈ Mn (C),

det(A∗ ) = det(A).

8.3 INNER PRODUCT SPACE


In this section, we consider a positive definite hermitian form on a vector space over a field F, where
F is either C or R. Such a form is called an inner product, and a space with such a form is known
as an inner product space. One usually denotes an inner product by ⟨ , ⟩. Thus, by the properties of
a hermitian form (see Definition (8.2.1)), an inner product ⟨v, w⟩ of two vectors in an inner product
space V over a field F is a scalar satisfying the following conditions:

(i) ⟨v, w + w'⟩ = ⟨v, w⟩ + ⟨v, w'⟩,
(ii) ⟨v + v', w⟩ = ⟨v, w⟩ + ⟨v', w⟩,
(iii) ⟨av, w⟩ = a⟨v, w⟩,
(iv) ⟨v, aw⟩ = ā⟨v, w⟩,
(v) $\langle v, w\rangle = \overline{\langle w, v\rangle}$

for all v, v', w, w' ∈ V and a ∈ F. It also satisfies the all-important property of positive definiteness:

(vi) ⟨v, v⟩ is a positive real for all non-zero v ∈ V.

Note that our definition of inner product makes it linear in the first variable but conjugate linear in
the second variable.
If F = R, then the conjugates are superfluous and so an inner product over R is linear in both the
variables and simply a positive definite real symmetric bilinear form.
We can expand the inner product of sums of vectors using repeatedly the properties satisfied by an
inner product:
$$\Big\langle \sum_i a_i v_i, \sum_j b_j w_j \Big\rangle = \sum_{i, j} a_i \overline{b_j}\, \langle v_i, w_j\rangle. \qquad (8.2)$$

For example,

$$\langle v_1 + av_2, bw_1\rangle = \bar{b}\langle v_1, w_1\rangle + a\bar{b}\langle v_2, w_1\rangle.$$

A few observations that follow easily from the definition of an inner product are listed in the fol-
lowing proposition. They will be needed frequently later.

Proposition 8.3.1. For any vectors v, w and u in V,

(i) ⟨v, 0⟩ = ⟨0, v⟩ = 0;
(ii) ⟨v, v⟩ = 0 if and only if v = 0;
(iii) ⟨v, w⟩ = ⟨v, u⟩ for all v ∈ V implies that w = u;
(iv) ⟨v, w⟩ + ⟨w, v⟩ = 2 Re⟨v, w⟩.
Here, Re a denotes the real part of a complex number a.

Most of the examples of hermitian forms of the previous section are also inner products as can be
verified easily. We recall these examples for the sake of completeness.

EXAMPLE 6 For x, y ∈ Cn , the form given by ⟨x, y⟩ = y∗ x is the standard inner product on Cn
or in Rn . For x = (x1 , x2 , . . . , xn )t , we see that ⟨x, x⟩ = |x1|2 + |x2|2 + · · · + |xn|2 and
so ⟨x, x⟩ is a positive real number if x is non-zero.
The standard inner product on Rn is similarly given by ⟨x, y⟩ = yt x, the usual dot
product. Thus ⟨x, x⟩ = x1^2 + x2^2 + · · · + xn^2 for x = (x1 , x2 , . . . , xn )t and so ⟨x, x⟩ is
a positive real number for a non-zero x.
EXAMPLE 7 However, the hermitian form H on Fn defined by a hermitian matrix A as H(x, y) =
y∗ Ax is not an inner product in general. For example, if $A = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$, then xt Ax
clearly cannot be positive for all non-zero vectors x in R2 .
EXAMPLE 8 Since for A = [ai j ] ∈ Mn (F), $Tr(AA^*) = \sum_{i,j} |a_{ij}|^2$, the corresponding hermitian form
H on Mn (F) of Example 3 given by H(A, B) = T r(AB∗ ) is an inner product.
EXAMPLE 9 For functions f and g in the space V of all continuous complex-valued functions on
the closed interval [0, 2π],
$$\langle f, g\rangle = \int_0^{2\pi} f(t)\overline{g(t)}\, dt$$

is an inner product on V as the given integral defines a hermitian form on V (see
Example 4 of the preceding section) and as $\int_0^{2\pi} f(t)\overline{f(t)}\, dt = \int_0^{2\pi} |f(t)|^2\, dt$ is a positive
real for any non-zero function f in V.
EXAMPLE 10 Similarly, the vector space V of all polynomials with complex coefficients is an inner
product space with the inner product defined as $\langle f(x), g(x)\rangle = \int_0^1 f(t)\overline{g(t)}\, dt$.
Observe that any subspace of an inner product space is trivially an inner product space. This obser-
vation provides various examples of inner product spaces. For instance, the space R[x] of all polynomi-
als with real coefficients or the space Rn (F) of all polynomials with real coefficients of degree at most
n, being subspaces of the inner product space of the last example, are themselves inner product spaces
with the inherited inner product $\langle f(x), g(x)\rangle = \int_0^1 f(t)g(t)\, dt$. Note that as we are considering polyno-
mials with real coefficients, we can dispense with the conjugate in the definition of the inner product.
Notions of perpendicularity and lengths of vectors can be naturally introduced in an inner product
space V.
Two vectors v and w in an inner product space V are orthogonal if ⟨v, w⟩ = 0. Note that symmetry
is in-built in the definition because of the hermitian symmetry property of an inner product. Properties

of the inner product show that (i) every v is orthogonal to the zero vector and (ii) the only self-orthogonal
vector in an inner product space is the zero vector.
The fact that the inner product of a vector with itself is a non-negative real number enables us to
define the length of a vector.

Definition 8.3.2. The length ‖v‖ of a non-zero vector v in an inner product space is defined as the
positive square root $\sqrt{\langle v, v\rangle}$; if v = 0, then ⟨v, v⟩ = 0 and we set ‖0‖ = 0. A unit vector is one whose
length is one.

In practice, it is easier to deal with squares of lengths of vectors.


One of the reasons why inner product spaces turn out to be useful is that the notion of length of
vectors in such a space satisfies the basic properties of usual length of vectors in R2 or R3 .

Proposition 8.3.3. Let V be an inner product space over a field F. The following hold for all
v, w ∈ V and a ∈ F.
(i) ‖v‖ ≥ 0; ‖v‖ = 0 if and only if v is the zero vector.
(ii) ‖av‖ = |a| ‖v‖.
(iii) For any non-zero vector v, the scalar multiple v/‖v‖ is a unit vector.
(iv) If v ⊥ w, then ‖v + w‖2 = ‖v‖2 + ‖w‖2 .
(v) |⟨v, w⟩| ≤ ‖v‖ ‖w‖.
(vi) ‖v + w‖ ≤ ‖v‖ + ‖w‖.

Whereas (iv) and (v) are known respectively as Pythagoras’ identity and Cauchy–Schwarz in-
equality, the last one is the well-known Triangle inequality.

Proof. The first three are immediate from the definition of length, and their verifications are left to the
reader as exercises.
The verification of (iv) is also straightforward:

$$\|v + w\|^2 = \langle v + w, v + w\rangle = \langle v, v\rangle + \langle w, w\rangle = \|v\|^2 + \|w\|^2$$

as orthogonality of v and w implies that ⟨v, w⟩ = ⟨w, v⟩ = 0.


For (v), note that the result is trivial if w = 0. So we may assume that w is non-zero. Let e be the
unit vector defined by e = w/‖w‖. For any v ∈ V, consider the scalar a = ⟨v, e⟩ ∈ F. Then

$$\langle v - ae, e\rangle = \langle v, e\rangle - \langle ae, e\rangle = \langle v, e\rangle - a\langle e, e\rangle = a - a = 0$$

so the vectors v − ae and e are orthogonal. Applying Pythagoras' Identity to orthogonal vectors v − ae
and ae, we obtain

$$\|v\|^2 = \|v - ae\|^2 + \|ae\|^2 = \|v - ae\|^2 + |a|^2.$$

The preceding equality implies that |a|2 ≤ ‖v‖2 , or equivalently, |a| ≤ ‖v‖ as the length of a vector is
non-negative. Since |a| = |⟨v, e⟩| = |⟨v, w/‖w‖⟩| = (1/‖w‖)|⟨v, w⟩|, it follows from the inequality |a| ≤ ‖v‖

that

$$(1/\|w\|)\,|\langle v, w\rangle| \le \|v\|,$$

which, when multiplied by the positive real ‖w‖, yields the required inequality (v).
Before we present the proof of the Triangle inequality, we recall the following elementary fact
about complex numbers: if x is a complex number with $\bar{x}$ as its conjugate, then $x + \bar{x} = 2\,\mathrm{Re}(x) \le 2|x|$,
where Re(x) denotes the real part of x and |x| the absolute value of x. In particular, for vectors v, w ∈ V

$$\langle v, w\rangle + \langle w, v\rangle = \langle v, w\rangle + \overline{\langle v, w\rangle} \le 2|\langle v, w\rangle|.$$

Therefore,

$$\|v + w\|^2 = \langle v, v\rangle + \langle v, w\rangle + \langle w, v\rangle + \langle w, w\rangle \le \langle v, v\rangle + 2|\langle v, w\rangle| + \langle w, w\rangle = \|v\|^2 + 2|\langle v, w\rangle| + \|w\|^2.$$

Using the Cauchy–Schwarz inequality, we then obtain

$$\|v + w\|^2 \le \|v\|^2 + 2\|v\|\,\|w\| + \|w\|^2.$$

Taking positive square roots of both sides of this inequality of non-negative reals yields the required
triangle inequality. !

Cauchy–Schwarz inequality and the triangle inequality, when applied to Fn with the standard inner
product, give us some unexpected inequalities of numbers; given 2n numbers x1 , . . . , xn , y1 , . . . , yn ,
complex or real, form column vectors x = (x1 , . . . , xn )t and y = (y1 , . . . , yn )t in Fn with these num-
bers. With the standard inner product in Fn , we then have $|\langle x, y\rangle| = |\sum_i x_i\overline{y_i}|$, and $\|x\|^2 = \sum_i |x_i|^2$ and
$\|y\|^2 = \sum_i |y_i|^2$. Therefore Cauchy–Schwarz inequality and the triangle inequality can be interpreted as
the following inequalities for the arbitrary 2n numbers:

$$\Big|\sum_i x_i\overline{y_i}\Big| \le \sqrt{\sum_i |x_i|^2}\, \sqrt{\sum_i |y_i|^2}$$

and

$$\sqrt{\sum_i |x_i + y_i|^2} \le \sqrt{\sum_i |x_i|^2} + \sqrt{\sum_i |y_i|^2},$$

respectively.
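A numerical spot check of the two inequalities for randomly chosen complex vectors (an illustration only, with NumPy):

import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)

inner = np.vdot(y, x)                                # <x, y> with the convention used here
norm = lambda v: float(np.sqrt(np.vdot(v, v).real))

print(abs(inner) <= norm(x) * norm(y) + 1e-12)       # True (Cauchy-Schwarz)
print(norm(x + y) <= norm(x) + norm(y) + 1e-12)      # True (triangle inequality)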

EXERCISES
1. Determine whether the following statements are true or false giving brief justifications.
(a) An inner product is linear in both the variables.
(b) An inner product can be defined only on a vector space over R or C.
(c) There is a unique inner product on R2 .

(d) The sum or the difference of two inner products on a vector space V is an inner product on
V.
(e) If for a linear operator T on an inner product space V, ⟨T w, v⟩ = 0 for all w, v ∈ V, then T is
the zero operator.
(f) If a vector v is orthogonal to every vector of a basis of an inner product space, then v is the
zero vector.
(g) There is no inner product in R2 such that ⟨e1 , e2 ⟩ = −1, where e1 and e2 are the standard
basis vectors of R2 .
(h) In an inner product space V, |⟨v, w⟩| = ‖v‖ ‖w‖ if and only if vectors v and w are linearly
independent in V.
(i) For a complex inner product space V, there can be no v ∈ V such that ‖v‖ = −i.
(j) The restriction of an inner product to a non-zero subspace need not be an inner product.
2. Let x and y be the coordinate vectors, respectively, of v and u with respect to a fixed basis of an
n-dimensional vector space V over a field F, and let A be an arbitrary hermitian matrix in Mn (F).
Verify that H(v, u) = y∗ Ax defines a hermitian form H on V.
3. Show that the hermitian form H on Mn (F), given by H(A, B) = T r(AB∗ ) for A, B ∈ Mn (F), is an
inner product on Mn (F).
4. In each of the following cases, determine whether the given formula provides an inner product
for the given vector space:
(a) V = R2; ⟨(x1, x2), (y1, y2)⟩ = x1y1 − x2y2.
(b) V = R2; ⟨(x1, x2), (y1, y2)⟩ = x1y1 − x2y1 − x1y2 + 4x2y2.
(c) V = C2; ⟨x, y⟩ = xAy∗, where

        A = [ 1   i ]
            [ −i  0 ]

(d) V = M2(R); ⟨A, B⟩ = Tr(AB).
(e) V = R2; ⟨x, y⟩ = yt Ax, where

        A = [ 1  2 ]
            [ 3  4 ]

(f) V = R[x]; ⟨f(x), g(x)⟩ = ∫_0^1 f′(t)g(t) dt, where f′(x) is the formal derivative of f(x).
5. Find the conditions on a symmetric 2 × 2 matrix A such that ⟨x, y⟩ = yt Ax defines an inner
product on R2.
6. Compute the lengths of the given vectors in each of the following cases, and verify both Cauchy–
Schwarz inequality and the triangle inequality:
(a) v = (1, i), u = (−2, 1 + i) in C2 with the standard inner product.
(b) f(x) = eˣ, g(x) = x² in the space V of all real-valued continuous functions on the interval
[0, 1] with the inner product given by

⟨f, g⟩ = ∫_0^1 f(t)g(t) dt.

(c) A = [ 1   i ],  B = [ 1+i   0 ]  in M2(C) with inner product ⟨A, B⟩ = Tr(AB∗).
        [ −i  2 ]       [  i   −i ]
7. Let T be the linear operator on R2 given by T(x1, x2) = (−x2, x1). Prove that ⟨v, T v⟩ = 0 for all
v ∈ R2 if ⟨ , ⟩ is the standard inner product on R2.
8. Prove that there can be no linear operator T on C2 with the standard inner product such that
⟨v, T v⟩ = 0 for all v ∈ C2.
9. For any A, B ∈ Mn(F), show that |Tr(A∗B)|² ≤ Tr(A∗A)Tr(B∗B).
10. Let V be the set of all real sequences f = ( f (1), f (2), . . .) such that only finitely many terms of
f are non-zero.
(a) Show that V is a real vector space with addition and scalar multiplication defined component-
wise.
(b) If for f, g ∈ V, ⟨f, g⟩ = ∑_{n=1}^{∞} f(n)g(n), then show that ⟨ , ⟩ is an inner product on V. Note that
the sum is actually a finite sum.

8.4 GRAM–SCHMIDT ORTHOGONALIZATION PROCESS


The standard basis of Fn plays a very important role even when Fn is considered an inner product
space with the standard inner product. The reason for the importance stems from the fact that the vectors
in the standard basis are unit vectors which are mutually orthogonal with respect to the standard
inner product. Such bases of inner product spaces, known as orthonormal bases, make computations
with vectors and linear operators extremely simple. In fact, in Section 5.3, we have already used
orthonormal bases with respect to the dot product in Rn for proving a significant result about real
symmetric matrices; moreover, the reader will recall that one can use a procedure (the Gram–Schmidt
process) to convert linearly independent vectors in Rn to orthonormal sets. In this section, we discuss
orthonormal sets as well as the Gram–Schmidt process in arbitrary inner product spaces and explore
the key idea of orthogonal projection. First we give the relevant definitions.

Definition 8.4.1. A non-empty set S of vectors in an inner product space V is called orthogonal if
any two distinct vectors in S are orthogonal. An orthogonal set S of vectors is called orthonormal if
each vector in S is a unit vector. A basis B of an inner product space is an orthonormal basis if B is
an orthonormal set.

We point out that these definitions hold even if the set S is infinite. On the other extreme, a sin-
gle non-zero vector in an inner product space forms an orthogonal set and a single unit vector an
orthonormal set.
Briefly, a set {vi } of vectors in an inner product space, indexed by i in an indexing set Λ, is or-
thonormal if and only if, for i, j ∈ Λ,

⟨vi, vj⟩ = δij,

where δij is the Kronecker delta symbol.


Note that as scalar multiples of two orthogonal vectors remain orthogonal, any orthogonal set of
non-zero vectors can be transformed into a set of orthonormal vectors by multiplying each vector by
the reciprocal of its length. This process is described as normalizing the vectors.

EXAMPLE 11 The standard basis E = {e1, e2, . . . , en} for Cn, or for Rn, is an orthonormal basis
with respect to the standard inner product given by ⟨x, y⟩ = ∑_{j=1}^{n} xj ȳj.

EXAMPLE 12 The vectors (1, 1)t and (1, −1)t form an orthogonal set in R2 with respect to the
standard inner product. We can normalize this orthogonal set to an orthonormal
set by dividing each vector by its length. Thus, the vectors (1/√2, 1/√2)t and
(1/√2, −1/√2)t form an orthonormal set of vectors in R2, which is clearly an or-
thonormal basis.
EXAMPLE 13 Consider the inner product space Mn(F) with the inner product ⟨A, B⟩ = Tr(AB∗).
For 1 ≤ i, j ≤ n, let eij be the unit matrix having 1 at the (i, j)th place and zeros
elsewhere. Then it is easy to see that the n² unit matrices eij, for 1 ≤ i, j ≤ n, form
a basis of Mn(F). Observe that ⟨eij, ekl⟩ = Tr(eij ekl∗) = Tr(eij elk) = Tr(δjl eik) and so
⟨eij, ekl⟩ ≠ 0 if and only if j = l and i = k. Thus any two distinct unit matrices must
be orthogonal, as at least one subscript has to be different for two unit matrices to be
distinct. Our calculation also shows that ⟨eij, eij⟩ = Tr(eii) = 1 and each unit matrix
is a unit vector. It follows that the unit matrices eij form an orthonormal set in Mn(F)
with respect to the given inner product.
EXAMPLE 14 Consider the infinite set S = {einx | n = 0, ±1, ±2, . . . } of functions in the inner product
space V of complex-valued continuous functions on [0, 2π] as in Example 4 of the
preceding section. Here, einx = cos nx + i sin nx, where cos and sin are the real cosine
and sine functions. Now,

⟨einx, eimx⟩ = (1/2π) ∫_0^{2π} eint e−imt dt      (the conjugate of eimt being e−imt)
            = (1/2π) ∫_0^{2π} ei(n−m)t dt.

If n ≠ m, then an easy calculation shows that the value of the last integral is zero,
whereas for n = m it is trivially 2π. Thus, while two distinct functions of the set S are
orthogonal, the length of each is one (because of the factor of 1/2π), showing that S
is an orthonormal set in V.
If we denote the function einx for any integer n by fn, then the preceding calcula-
tion has shown that ⟨fn, fm⟩ = δmn, where δmn is the Kronecker delta symbol.
EXAMPLE 15 The basis {1, x, x²} of the space of all real polynomials of degree at most 2, equipped
with the inner product ⟨f(x), g(x)⟩ = ∫_0^1 f(x)g(x) dx, is not an orthogonal set. For
example, ⟨1, x⟩ = ∫_0^1 1·x dx = 1/2, showing that 1 and x are not orthogonal.
One of the advantages of an orthonormal basis is that it simplifies the expressions of vectors as
linear combinations of basis vectors as shown in the next proposition; such a linear combination is
sometimes known as a Fourier expansion.

Proposition 8.4.2. Let S = {v1, v2, . . . , vr} be an orthogonal set of non-zero vectors in an inner
product space V with inner product ⟨ , ⟩. If v is in the span of S, then

v = ∑_{i=1}^{r} (⟨v, vi⟩/‖vi‖²) vi.

In case S is an orthonormal set, one has

v = ∑_{i=1}^{r} ⟨v, vi⟩ vi.                              (8.3)

Proof. Write v = ∑_j aj vj as a linear combination of the vectors in S. For each i, 1 ≤ i ≤ r, taking the
inner product of v with vi, we get

⟨v, vi⟩ = ⟨∑_j aj vj, vi⟩ = ∑_j aj ⟨vj, vi⟩ = ai ⟨vi, vi⟩,

as distinct vectors in S are orthogonal. It follows that ai = ⟨v, vi⟩/‖vi‖² for each i. ∎
Thus, coefficients in linear combinations in inner product spaces can be determined by straight-
forward computations of inner products, instead of by solving systems of linear equations as done
earlier.
EXAMPLE 16 In an earlier example, we had seen that the vectors v1 = (1/√2)(1, 1)t, v2 =
(1/√2)(1, −1)t are orthonormal in R2 with respect to the standard inner product.
They clearly form a basis of R2 (as one is not a scalar multiple of the other).
To express an arbitrary vector (a, b)t in terms of these basis vectors, we need not
write out the equations for x and y from the relation (a, b)t = xv1 + yv2, and then
solve them for x and y as we had done earlier. For, the last corollary implies that
x = ⟨(a, b)t, v1⟩ = ⟨(a, b)t, (1/√2)(1, 1)t⟩ = (1/√2)(a + b) and y = ⟨(a, b)t, v2⟩ =
⟨(a, b)t, (1/√2)(1, −1)t⟩ = (1/√2)(a − b).
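A small numerical check of this computation (a Python/NumPy sketch, not part of the text; the values a = 2, b = 3 are an arbitrary choice of the sketch):

    import numpy as np

    v1 = np.array([1.0, 1.0]) / np.sqrt(2)
    v2 = np.array([1.0, -1.0]) / np.sqrt(2)
    a, b = 2.0, 3.0
    v = np.array([a, b])

    # the coefficients in v = x*v1 + y*v2 are plain inner products (Proposition 8.4.2)
    x = v @ v1          # equals (a + b)/sqrt(2)
    y = v @ v2          # equals (a - b)/sqrt(2)
    print(np.allclose(x * v1 + y * v2, v))   # True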
This technique of evaluating coefficients by computing inner products leads us to the following
important result.

Proposition 8.4.3. Let S be an orthogonal (or orthonormal) set of non-zero vectors in an inner
product space V over a field F. Then S is linearly independent over F.

Proof. If S is infinite, then we need to show that every finite subset, say {v1, v2, . . . , vr}, of S is lin-
early independent. If S is finite, we choose this subset to be S itself. Consider any linear combination
a1v1 + a2v2 + · · · + arvr = 0 for ai ∈ F. As for each fixed i, ⟨vi, vj⟩ = 0 for j ≠ i, taking the inner product
of both sides of the relation with vi yields ai⟨vi, vi⟩ = 0. Since ⟨vi, vi⟩ is a non-zero scalar for any i, it
follows from the preceding relation that ai = 0. Thus, the required condition for linear independence
is established. ∎

Corollary 8.4.4. The number of non-zero vectors in an orthogonal set in a finite-dimensional inner
product space cannot exceed the dimension of the space.

EXAMPLE 17 It is easy to verify that the vectors v1 = (1, 1, 0)t, v2 = (1, −1, 0)t and v3 = (0, 0, 2)t
are orthogonal in R3 with respect to the standard inner product. Thus, our last Propo-
sition (8.4.3) implies that these vectors form an orthogonal basis of R3. We wish to
express a vector v ∈ R3, for example, v = (1, 2, 3)t, in terms of these basis vectors.
We have ⟨v, v1⟩ = ⟨(1, 2, 3)t, (1, 1, 0)t⟩ = 3, ⟨v, v2⟩ = ⟨(1, 2, 3)t, (1, −1, 0)t⟩ = −1
and ⟨v, v3⟩ = ⟨(1, 2, 3)t, (0, 0, 2)t⟩ = 6. Similar calculations show that ‖v1‖² = 2,
‖v2‖² = 2 and ‖v3‖² = 4. Hence, by the formula in (8.4.2), one obtains (1, 2, 3)t =
(3/2)(1, 1, 0)t − (1/2)(1, −1, 0)t + (3/2)(0, 0, 2)t.
Note that normalizing the given orthogonal basis vectors produces an orthonormal
basis of R3 consisting of u1 = (1/√2)(1, 1, 0)t, u2 = (1/√2)(1, −1, 0)t and u3 =
(1/2)(0, 0, 2)t. In that case it is clear that the vector v = (1, 2, 3)t can be expressed as
(3/√2)u1 − (1/√2)u2 + 3u3.
EXAMPLE 18 Let V be the vector space of all complex-valued continuous functions on the interval
[0, 2π] with the inner product ⟨f(x), g(x)⟩ = (1/2π) ∫_0^{2π} f(t)ḡ(t) dt. It was shown in Ex-
ample 14 that any two members of the infinite family {fn(x) = einx | n any integer} are
orthogonal with respect to the given inner product. It follows that V cannot be finite-
dimensional for, by Proposition (8.4.3), V has an infinite set of linearly independent
vectors.
Now that we have seen the advantages of an orthogonal or an orthonormal basis, it is time to
discuss the existence of such bases, at least for finite-dimensional inner product spaces. As in the
case of Rn with dot product (see Section 3.7), in such an inner product space, the Gram–Schmidt
orthogonalization process is available, which transforms any given basis of an inner product space
into an orthogonal basis. At the heart of this procedure is the construction of orthogonal projection of
vectors.
The motivation for such a construction was highlighted in our discussion in Section 3.7 about
Gram–Schmidt process in Rn in terms of the decomposition of a vector in R2 into two perpendicular
directions; it may be worthwhile for the reader to look up that discussion for a clue as to how the
following definition is arrived at. Such a decomposition of a vector in R2 is essentially in terms of
the projection of the vector onto a line (a one-dimensional subspace) giving one of the directions.
In the general case, we need to project a vector orthogonally onto subspaces having finite bases of
orthogonal or orthonormal vectors; the following definition spells out the mechanism for doing so.

Definition 8.4.5. Let V be an inner product space and W be a subspace having an orthogonal
basis {v1 , v2 , . . . , vm }. For any v ∈ V, the orthogonal projection of v on W, denoted by PW (v), is the
following vector of W:
PW(v) = (⟨v, v1⟩/‖v1‖²) v1 + · · · + (⟨v, vm⟩/‖vm‖²) vm = ∑_{i=1}^{m} (⟨v, vi⟩/‖vi‖²) vi.

In case the basis {v1, v2, . . . , vm} is orthonormal, it is clear that the expression for PW(v) has to be
modified by replacing each of the lengths ‖vk‖² by 1.

Since the coefficients ⟨v, vi⟩/‖vi‖² of vi in the expression for PW(v) are scalars and ⟨vi, vj⟩ = 0 for i ≠ j, it
follows that ⟨PW(v), vj⟩ = ⟨v, vj⟩, for any j, by the linearity of the inner product, a result which will
be useful later.

As in the case of symmetric bilinear forms (see Section 7.4), vectors orthogonal to each vector of a
given subspace, or even of a non-empty subset, of an inner product space need to be considered.

Definition 8.4.6. Let S be any non-empty set of vectors in an inner product space V. The orthog-
onal complement of S , denoted by S ⊥ , is the set of all vectors in V which are orthogonal to every
vector in S . Thus
S⊥ = {v ∈ V | ⟨v, w⟩ = 0 for every w ∈ S}.

Standard arguments show that (i) for any non-empty subset S of V, S ⊥ is a subspace of V and
(ii) V ⊥ = {0} and {0}⊥ = V.
One can say more about orthogonal complements of subspaces.

Proposition 8.4.7. Let V be an inner product space. For any subspace W of V, W ∩ W ⊥ = {0}
and (W ⊥ )⊥ = W. Moreover, if W is finite-dimensional, then any vector v, which is orthogonal to each
vector of a basis of W, is in W ⊥ .

The proof is straightforward and so left to the reader. The following lemma explains the name of
orthogonal projections of vectors.

Lemma 8.4.8. Let V be an inner product space and W be a subspace which has some finite or-
thogonal basis. If PW (v) is the orthogonal projection of a vector v ∈ V on the subspace W, then
v − PW (v) ∈ W ⊥ .

Proof. Let v1, v2, . . . , vk be an orthogonal basis of W. By the preceding proposition, it suffices
to show that v − PW(v) is orthogonal to any basis vector vj of W. Now ⟨v − PW(v), vj⟩ = ⟨v, vj⟩ −
⟨PW(v), vj⟩ by linearity of the inner product. However, as the given basis of W is orthogonal, by
the remark following the definition of orthogonal projection, we have ⟨PW(v), vj⟩ = ⟨v, vj⟩. Thus,
⟨v − PW(v), vj⟩ = 0 as required. ∎

We are now ready to introduce the Gram–Schmidt orthogonalization process in an arbitrary


inner product space. This process consists of repeated applications of orthogonal projections to arrive
at orthogonal bases for finite-dimensional subspaces of an inner product space.

Theorem 8.4.9. Let V be an inner product space, and let S = {u1, u2, . . . , un} be a linearly indepen-
dent set of vectors of V. Then there is an orthogonal set S′ = {v1, v2, . . . , vn} of vectors in V such that
for each k = 1, 2, . . . , n, the set {v1 , . . . , vk } is a basis for the subspace Wk spanned by {u1 , . . . , uk }.

Proof. We begin by letting v1 = u1 ≠ 0. The desired vectors v2, . . . , vn are then defined inductively
as follows: Suppose that for any m, 1 ≤ m < n, vectors v1 , v2 , . . . , vm have been chosen such that
for each k, 1 ≤ k ≤ m, the vectors v1 , . . . , vk form an orthogonal basis of the subspace Wk spanned
by the linearly independent vectors u1 , . . . , uk . Then, the next vector required, that is, vm+1 , will
be constructed by projecting um+1 orthogonally on the subspace Wm , now considered spanned by
v1 , . . . , vm . More precisely, we define
vm+1 = um+1 − PWm(um+1)
     = um+1 − ∑_{i=1}^{m} (⟨um+1, vi⟩/⟨vi, vi⟩) vi.

We need to show that v1 , . . . , vm , vm+1 form an orthogonal basis of Wm+1 .


First, we observe that vm+1 is non-zero, for, if vm+1 = 0, then um+1 will be in the span of v1 , . . . , vm
which, by our inductive construction, is Wm. It then follows that um+1 will be a linear combination of
u1 , . . . , um (as these span Wm ), which contradicts the linear independence of the set S.
Next we claim that the non-zero vectors v1 , . . . , vm , vm+1 are mutually orthogonal. Since by the
induction hypothesis, the vectors v1 , . . . , vm form an orthogonal set, it suffices to show that vm+1 is
orthogonal to v j for j = 1, 2, . . . , m. However, by Lemma (8.4.8), vm+1 = um+1 − PWm (um+1 ) is in
Wm ⊥ , and so is orthogonal to each of the vectors v1 , v2 , . . . , vm that span Wm . Hence our claim.
Thus, v1 , v2 , . . . , vm+1 is an orthogonal set of non-zero vectors in the subspace Wm+1 , which is
spanned by m + 1 linearly independent vectors u1 , u2 , . . . , um+1 . On the other hand, by Proposition
(8.4.3), vectors v1 , v2 , . . . , vm+1 too are linearly independent, and so they form a basis of Wm+1 .
By induction, the proof is complete. ∎

Corollary 8.4.10. Every finite-dimensional inner product space has an orthonormal basis.

Proof. Any basis {u1, . . . , un} of a finite-dimensional inner product space can be replaced by an or-
thogonal basis {v1, . . . , vn} by the Gram–Schmidt orthogonalization process. Further, an orthonormal
basis can be obtained by replacing each vj by the normalized vector vj/‖vj‖. ∎

Corollary 8.4.11. (Orthogonal Decomposition) Let W be a finite-dimensional subspace of an in-


ner product space V. Then V = W ⊕ W ⊥ .

Proof. Since, by Proposition (8.4.7), W ∩ W⊥ = {0}, we need only show that V is
the sum of W and W⊥. To show this, we first choose an orthonormal basis of W so that for any v ∈ V,
the orthogonal projection PW(v), as defined in Definition (8.4.5), is a vector in W. Now, observe that
v = PW(v) + (v − PW(v)), where v − PW(v) ∈ W⊥ by Lemma (8.4.8). The corollary follows. ∎

Recall from Proposition (3.4.7) that any set of linearly independent vectors in a finite-dimensional
vector space can be extended to a basis. A similar result holds about orthonormal sets of vectors.

Corollary 8.4.12. Let S be an orthonormal set of vectors of a finite-dimensional inner product


space V. Then S can be extended to an orthonormal basis of V.

Proof. Let W be the subspace of V spanned by S and W ⊥ be its orthogonal complement. Now, W ⊥
is also finite-dimensional so by Corollary (8.4.10) it has an orthonormal basis, say S' . It is clear that
S ∪ S′ is an orthonormal basis of V as V = W ⊕ W⊥. ∎

Quite frequently, we start with a single unit vector e of a finite-dimensional inner product space V.
Since e forms an orthonormal set, e along with any orthonormal basis of its orthogonal complement
will form an orthonormal basis of V.
See Section 3.7 for some detailed numerical examples illustrating the Gram–Schmidt orthogonal-
ization process. Nonetheless we give a couple of examples here; in the first one, the reader is expected
to work out the details.

EXAMPLE 19 Consider the subspace W of R4 spanned by the vectors u1 = (1, 0, 2, 0)t, u2 =
(1, 1, 7, 0)t and u3 = (2, 6, 4, 1)t. Assume that R4 has the standard inner product.
We proceed to find an orthonormal basis of W by the Gram–Schmidt orthogo-
nalization process. So, we begin by letting v1 = u1 = (1, 0, 2, 0)t. Now ⟨u2, v1⟩ =
⟨(1, 1, 7, 0)t, (1, 0, 2, 0)t⟩ = 15 and ‖v1‖² = ⟨(1, 0, 2, 0)t, (1, 0, 2, 0)t⟩ = 5. Thus, the
second required vector is given by v2 = (1, 1, 7, 0)t − (15/5)(1, 0, 2, 0)t = (−2, 1, 1, 0)t.
Similar calculation shows that ⟨u3, v1⟩ = 10, ⟨u3, v2⟩ = 6 and ‖v2‖² = 6. Thus, the
third vector turns out to be v3 = (2, 5, −1, 1)t.
Normalizing these three vj, we arrive at an orthonormal basis of W consisting of
the vectors (1/√5)(1, 0, 2, 0)t, (1/√6)(−2, 1, 1, 0)t and (1/√31)(2, 5, −1, 1)t.
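The computations in this example are easy to reproduce. The following sketch (Python with NumPy, not part of the text; the function name gram_schmidt is an invention of the sketch) implements the construction of Theorem 8.4.9 for the standard inner product and applies it to the three vectors above.

    import numpy as np

    def gram_schmidt(vectors):
        """Orthogonalize a linearly independent list of vectors (standard inner product)."""
        orthogonal = []
        for u in vectors:
            v = u.astype(float)
            for w in orthogonal:
                v -= (np.dot(u, w) / np.dot(w, w)) * w   # subtract the projection of u on w
            orthogonal.append(v)
        return orthogonal

    u1 = np.array([1, 0, 2, 0])
    u2 = np.array([1, 1, 7, 0])
    u3 = np.array([2, 6, 4, 1])

    v1, v2, v3 = gram_schmidt([u1, u2, u3])
    print(v1, v2, v3)          # v2 = (-2, 1, 1, 0) and v3 = (2, 5, -1, 1), as in the text
    # normalizing gives the orthonormal basis of W
    q = [v / np.linalg.norm(v) for v in (v1, v2, v3)]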

EXAMPLE 20 Let us obtain an orthonormal basis of R2[x], the space of all real polynomials of
degree at most 2, equipped with the inner product ⟨f(x), g(x)⟩ = ∫_0^1 f(t)g(t) dt. Con-
sider the basis {u1 = 1, u2 = x, u3 = x²} of R2[x], and let v1 = u1 = 1. Then the next
required vector is given by

v2 = u2 − (⟨u2, v1⟩/‖v1‖²) v1 = x − (∫_0^1 t dt)·1 = x − 1/2.

Now ‖v2‖² = ‖x − 1/2‖² = ∫_0^1 (t − 1/2)² dt = 1/12. Therefore,

v3 = u3 − (⟨u3, v1⟩/‖v1‖²) v1 − (⟨u3, v2⟩/‖v2‖²) v2
   = x² − (∫_0^1 t² dt)·1 − 12 (∫_0^1 t²(t − 1/2) dt)·(x − 1/2)
   = x² − 1/3 − 12(1/4 − 1/6)(x − 1/2)
   = x² − x + 1/6.

Note that ‖v3‖² = ∫_0^1 (t² − t + 1/6)² dt = 1/180. Thus the polynomials 1, x − 1/2, x² −
x + 1/6 form an orthogonal basis of R2[x]. Normalizing these vectors, that is, dividing
each by its length, we obtain an orthonormal basis {1, √12(x − 1/2), 6√5(x² − x + 1/6)}.
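The integrals in this example can also be checked symbolically. Here is a small sketch using Python with SymPy (an assumption of the sketch, not part of the text) that runs the same Gram–Schmidt steps with the inner product ⟨f, g⟩ = ∫_0^1 f(t)g(t) dt and then normalizes the resulting polynomials.

    import sympy as sp

    x = sp.symbols('x')
    inner = lambda f, g: sp.integrate(f * g, (x, 0, 1))   # <f, g> on polynomials

    basis = [sp.Integer(1), x, x**2]
    ortho = []
    for u in basis:
        v = u
        for w in ortho:
            v = sp.expand(v - inner(u, w) / inner(w, w) * w)   # subtract projection on w
        ortho.append(v)

    print(ortho)      # [1, x - 1/2, x**2 - x + 1/6]
    normalized = [sp.simplify(v / sp.sqrt(inner(v, v))) for v in ortho]
    print(normalized) # essentially 2*sqrt(3)*(x - 1/2) and 6*sqrt(5)*(x**2 - x + 1/6)

Note that √12 = 2√3, so the normalized second vector printed by the sketch agrees with the orthonormal basis found above.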
Another advantage of working with an orthonormal basis of an inner product space lies in the ease
with which we can compute the matrix of a linear operator on that space. The precise formulation is
given in the following result.

Proposition 8.4.13. Let T be a linear operator on a finite-dimensional inner product space V with
an orthonormal basis B = {v1, v2, . . . , vn}. If A = [aij] is the matrix of T with respect to B, then
aij = ⟨T vj, vi⟩.

Proof. Since B is an orthonormal basis of V, any v ∈ V can be expressed, by Equation (8.3), as
v = ∑_{i=1}^{n} ⟨v, vi⟩vi. In particular, T vj = ∑_{i=1}^{n} ⟨T vj, vi⟩vi, for j = 1, 2, . . . , n. On the other hand, by the
definition of the matrix A of T, the vector T vj determines the jth column of A uniquely as follows:
T vj = ∑_{i=1}^{n} aij vi for each j. Comparing these two expressions (by using the linear independence of the
vectors vi), we arrive at the formula for aij. ∎
The point of this proposition is that instead of solving systems of equations to determine the entries
of the matrix of an operator on an inner product space, one has the easier option of computing these
entries as inner products.

EXAMPLE 21 Consider R2 with the standard inner product, and let T : R2 → R2 be the linear opera-
tor given by T(x1, x2)t = (x1, x1 + x2)t. Suppose we want to find the matrix A = [aij]
of T with respect to the basis {(1/√2)(1, 1)t, (1/√2)(1, −1)t} of R2. Once we note that
this basis is orthonormal relative to the given inner product, computing the entries of
A is straightforward: for example, ⟨T v1, v1⟩ = ⟨T(1/√2, 1/√2)t, (1/√2, 1/√2)t⟩ =
⟨(1/√2, 2/√2)t, (1/√2, 1/√2)t⟩, and so by the formula in the proposition a11 =
1/2 + 2/2 = 3/2. Similar calculations show that a12 = ⟨T v2, v1⟩ = 1/2, a21 = −1/2
and a22 = 1/2.
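A numerical version of this calculation (a NumPy sketch, not from the text): with aij = ⟨T vj, vi⟩, the matrix of T in the orthonormal basis above can be assembled directly from inner products.

    import numpy as np

    T = np.array([[1.0, 0.0],      # T(x1, x2) = (x1, x1 + x2) written as a matrix
                  [1.0, 1.0]])
    B = [np.array([1.0, 1.0]) / np.sqrt(2),
         np.array([1.0, -1.0]) / np.sqrt(2)]

    # a_ij = <T v_j, v_i>, rows indexed by v_i, columns by v_j
    A = np.array([[np.dot(T @ vj, vi) for vj in B] for vi in B])
    print(A)        # [[ 1.5  0.5]
                    #  [-0.5  0.5]]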
We now examine another aspect of the concept of orthogonal projection. Though the notation PW v
for any v ∈ V gives the impression that PW is a map on V, the expression for PW (v), in terms of a given
orthonormal or orthogonal basis of W, does not allow us to consider it as a map on V. Fortunately, there
is an alternative description of an orthogonal projection onto a subspace, which will not only allow
us to treat such a projection as a map from V onto W but also extend the idea to infinite-dimensional
cases.
The key is to think of the orthogonal projection of a vector on a subspace as that vector of the
subspace which is, in some sense, closest to the given vector. Indeed, in R2 the point Q, on a line W (a
one-dimensional subspace), which is closest to a given point P in R2 , corresponds to the orthogonal
projection on W of the vector v that represents P. We interpret this situation in terms of an inequality
involving distances: ‖v − PW(v)‖ is the least among all possible distances ‖v − w‖ as w ranges over
W. This inequality suggests that the idea of orthogonal projection can be extended in the following
manner.

Definition 8.4.14. Let W be a subspace of an inner product space V. For any v ∈ V, a best approx-
imation to v by vectors in W is a vector u ∈ W such that

‖v − u‖ ≤ ‖v − w‖  for all w ∈ W.

Because of our familiarity with R2 or R3 , we intuitively feel that such an approximation should be
unique and be such that v − u is orthogonal to all of W. This is indeed the case even in an arbitrary
inner product space, as shown in the following proposition.

Proposition 8.4.15. Let W be a subspace of an inner product space V. For any v ∈ V, a vector
u ∈ W is a best approximation to v by vectors in W if and only if v − u ∈ W ⊥ .

Proof. Suppose that for some vector u ∈ W, v − u ∈ W ⊥ . Then, for any w ∈ W, u − w is orthogonal to
v − u. Hence, by Pythagoras’ identity (see Proposition 8.3.3), we see that

‖v − w‖² = ‖(v − u) + (u − w)‖²
         = ‖v − u‖² + ‖u − w‖²
         ≥ ‖v − u‖²

for any w ∈ W, where the equality holds only when u = w. Thus, u satisfies the condition in Definition
(8.4.14) for being a best approximation to v.

To prove the proposition in the other direction, assume that u ∈ W is such that ‖v − w‖ ≥ ‖v − u‖ for
any w ∈ W. Note that in general,

‖v − w‖² = ‖(v − u) + (u − w)‖²
         = ‖v − u‖² + 2Re⟨v − u, u − w⟩ + ‖u − w‖².

Our assumption then implies that

2Re⟨v − u, u − w⟩ + ‖u − w‖² ≥ 0                                        (8.4)

for any w ∈ W. Next, observe that as u ∈ W is fixed, every vector in W can be expressed as u − w by
choosing an appropriate w, so that Inequality (8.4) holds with u − w replaced by an arbitrary vector of
W. In particular, for any w ∈ W such that w ≠ u, we may replace u − w in Inequality (8.4) by a(u − w),
where a is the scalar given by

a = − ⟨v − u, u − w⟩ / ‖u − w‖².

A short calculation involving basic properties of the inner product then shows that Inequality (8.4) reduces
to

−2 |⟨v − u, u − w⟩|²/‖u − w‖² + |⟨v − u, u − w⟩|²/‖u − w‖² ≥ 0,

which clearly holds if and only if ⟨v − u, u − w⟩ = 0. This implies, because of our earlier observation
about vectors in W, that every non-zero vector of W is orthogonal to v − u. Hence the converse follows.
∎

Corollary 8.4.16. If, for a vector v in an inner product space V, a best approximation to v by
vectors of W exists, then it must be unique.

Proof. If u1 and u2 in W are two best approximations to v by vectors of W, then by the preceding
proposition ⟨v − u1, w⟩ = ⟨v − u2, w⟩ = 0 for every w ∈ W. It follows that ⟨u2 − u1, w⟩ = ⟨(v − u1) − (v −
u2), w⟩ = 0, showing that the vector u2 − u1 of W is actually in W⊥. But the only vector of a subspace
which is also in its orthogonal complement is the zero vector (see Proposition 8.4.7). The corollary
follows. ∎

Recall from Lemma (8.4.8) that if W is a finite-dimensional subspace of an inner product space V,
then for any v ∈ V, the vector v − PW (v) is in W ⊥ . Therefore the preceding proposition also implies that
best approximations always exist in the case of finite-dimensional subspaces.

Corollary 8.4.17. If W is a finite-dimensional subspace of an inner product space V, then the


vector PW (v) ∈ W, for any v ∈ V, is the best approximation to v by vectors of W.

Taking a cue from the idea of best approximations, we now introduce orthogonal projections as
operators on arbitrary inner product spaces.

Definition 8.4.18. Let W be a subspace of an inner product space V such that every vector of V
has best approximation by vectors in W. The orthogonal projection of V on W is the map PW : V → W
such that PW(v) = vx, where vx is the best approximation of v by vectors in W.

We reiterate that if W is a finite-dimensional subspace of an inner product space V, then the orthog-
onal projection does exist as a map on V; in fact, if {v1 , v2 , . . . , vm } is an orthonormal basis of W,
then for any v ∈ V,
PW(v) = ∑_{i=1}^{m} ⟨v, vi⟩vi.                                          (8.5)

We consider an example to clarify the ideas involved in orthogonal projections. We assume that PW
is a linear map, a fact we will prove shortly.

EXAMPLE 22 Consider the subspace W of R3 with the standard inner product, spanned by
(−1, 0, 1)t. For any v = (x1, x2, x3)t in R3, we have

PW(v) = (⟨(x1, x2, x3)t, (−1, 0, 1)t⟩ / ‖(−1, 0, 1)t‖²) (−1, 0, 1)t
      = ((−x1 + x3)/2) (−1, 0, 1)t
      = (1/2)(x1 − x3, 0, −x1 + x3)t.

It follows that ker PW = {(y1, y2, y3)t ∈ R3 | y1 = y3}. On the other hand,

v − PW(v) = (x1, x2, x3)t − (1/2)(x1 − x3, 0, −x1 + x3)t
          = (1/2)(x1 + x3, 2x2, x1 + x3)t.

It is, therefore, clear that W⊥ coincides with ker PW.
To continue with the example, we consider the orthonormal basis of R3 formed
by

v1 = (1/√2)(−1, 0, 1)t,  v2 = (1/√2)(1, 0, 1)t  and  v3 = (0, 1, 0)t.

The choice of the basis will be clear a little later. Let A = [aij] be the matrix of PW; it
is an easy exercise to compute the entries of A, by the formula given in Proposition
(8.4.13), to show that

A = [ 1  0  0 ]
    [ 0  0  0 ]                                                         (8.6)
    [ 0  0  0 ]

That almost all the entries are zeros is not unexpected, as two of the basis vectors are in ker PW.
Also note that A² = A, so we can conclude that PW is a projection map as PW² = PW.
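The matrix in (8.6) can be confirmed numerically. The following NumPy sketch (not part of the text) builds the standard-basis matrix of PW for the one-dimensional subspace W, changes to the orthonormal basis v1, v2, v3 used above, and checks idempotence.

    import numpy as np

    w = np.array([-1.0, 0.0, 1.0])
    P = np.outer(w, w) / np.dot(w, w)       # matrix of P_W in the standard basis

    v1 = w / np.sqrt(2)
    v2 = np.array([1.0, 0.0, 1.0]) / np.sqrt(2)
    v3 = np.array([0.0, 1.0, 0.0])
    B = np.column_stack([v1, v2, v3])       # change-of-basis matrix (orthogonal)

    A = B.T @ P @ B                         # matrix of P_W in the basis {v1, v2, v3}
    print(np.round(A, 10))                  # diag(1, 0, 0), as in (8.6)
    print(np.allclose(P @ P, P))            # P_W is idempotent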
Our next result shows, as anticipated in the preceding example, that the map we have just intro-
duced as an orthogonal projection is indeed a projection in our earlier sense (see 4.2.12 for relevant
properties).

Proposition 8.4.19. Let W be a finite-dimensional subspace of an inner product space V, and let
PW be the orthogonal projection of V on W. Then, PW is a linear operator on V such that PW² = PW.
Moreover, Im(PW) = W and ker PW = W⊥.

Proof. Since W is finite-dimensional, we may assume that there is an orthonormal basis
{v1, v2, . . . , vm} of W. In that case, for any v ∈ V we have

PW(v) = ∑_{i=1}^{m} ⟨v, vi⟩vi

by Equation (8.5). This formula shows that PW is a linear map as an inner product is linear in the first
variable; that the range of PW is contained in W is also clear from the formula.
We can also interpret PW(v) as the best approximation to v by vectors of W. Thus, for w ∈ W, PW(w)
has to be w itself. But for any v ∈ V, PW(v) ∈ W, so that PW(PW(v)) = PW(v), which proves that PW is
onto W and PW² = PW.
We have already seen that for any v ∈ V, the vector PW(v) is the unique vector in W such that
v − PW(v) ∈ W⊥. It follows that PW(v) = 0 if and only if v ∈ W⊥. In other words, ker PW = W⊥. This
completes the proof. ∎

This result now explains our choice of the orthonormal basis for R3 in Example 22 with reference
to the projection PW . We took v1 as an orthonormal basis of the subspace W, and chose v2 and v3 as an
orthonormal basis of ker PW . Since ker PW = W ⊥ and R3 = W ⊕ W ⊥ , it follows that v1 , v2 and v3 form
an orthonormal basis of R3 . Now PW acts as the identity on W, and is the zero map on W ⊥ . Therefore,
the chosen basis reflects the essential character of the projection PW as shown by its matrix A relative
to that basis.
We end this section by deriving an important inequality which is valid in any inner product space.

Corollary 8.4.20. (Bessel’s inequality) Let v1 , v2 , . . . , vn be an orthogonal set of non-zero vectors


in an inner product space V. Then for any v ∈ V,
∑_{i=1}^{n} |⟨v, vi⟩|²/‖vi‖² ≤ ‖v‖².

Proof. Let W be the subspace of V spanned by the given vectors v1 , v2 , . . . , vn . Observe that by
Proposition (8.4.3), these non-zero vectors are linearly independent and so form an orthogonal basis
of W. On the other hand, by Proposition (8.4.19), any vector v ∈ V can be decomposed as v = PW (v) +u,
where PW is the orthogonal projection of V onto W and so PW (v) ∈ W whereas u = v − PW (v) ∈ W ⊥ .
Since PW(v) and u are orthogonal, it follows, by Pythagoras' Identity, that ‖v‖² = ‖PW(v)‖² + ‖u‖²,
which implies the inequality

‖PW(v)‖² ≤ ‖v‖².                                                        (8.7)

On the other hand, as v1, v2, . . . , vn are orthogonal, the following expression (see Definition 8.4.5)

PW(v) = ∑_{i=1}^{n} (⟨v, vi⟩/‖vi‖²) vi

readily yields the following formula:


‖PW(v)‖² = ⟨PW(v), PW(v)⟩ = ∑_{i=1}^{n} |⟨v, vi⟩|²/‖vi‖².

Bessel's inequality then follows from the inequality in (8.7). ∎

Two points should be noted. Firstly, if v1 , v2 , . . . , vn is an orthonormal set, then Bessel’s Inequality
reads
∑_{i=1}^{n} |⟨v, vi⟩|² ≤ ‖v‖².

Secondly, going through the proof of Bessel’s inequality, we see that equality holds in Bessel’s in-
equality if and only if v = PW (v), that is, if and only if v is in the span of the given vectors.
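As a quick numerical illustration (a NumPy sketch, not part of the text; the vectors chosen are arbitrary), one can check Bessel's inequality for an orthonormal set that does not span the whole space, and see equality appear exactly when the vector lies in the span.

    import numpy as np

    # orthonormal set in R^3 spanning the x-y plane
    v1 = np.array([1.0, 0.0, 0.0])
    v2 = np.array([0.0, 1.0, 0.0])

    def bessel_sum(v):
        return sum(np.dot(v, vi) ** 2 for vi in (v1, v2))

    v = np.array([1.0, 2.0, 3.0])
    print(bessel_sum(v), np.dot(v, v))    # 5.0 <= 14.0, strict inequality
    u = np.array([1.0, 2.0, 0.0])         # u lies in the span of v1, v2
    print(bessel_sum(u), np.dot(u, u))    # 5.0 == 5.0, equality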
Coming back to the Gram–Schmidt process, it can be used to derive a factorization of an m × n real
matrix into a product of an m × n matrix having orthonormal columns and an invertible upper triangular
matrix of order n. Such a factorization is known as a QR factorization. QR factorizations of real
matrices have practical applications. For details, the reader is referred to Section 3.7, where we have
discussed properties of Rn with the usual dot product; the reader shall also find examples and exercises
related to the Gram–Schmidt process in Rn in that section.
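For readers who want to see a QR factorization computed, here is a minimal NumPy sketch (not part of the text); np.linalg.qr produces the same kind of factorization that orthonormalizing the columns by the Gram–Schmidt process describes, possibly with different signs on the columns of Q.

    import numpy as np

    M = np.array([[1.0, 1.0, 2.0],
                  [0.0, 1.0, 6.0],
                  [2.0, 7.0, 4.0],
                  [0.0, 0.0, 1.0]])         # columns are u1, u2, u3 from Example 19

    Q, R = np.linalg.qr(M)                  # Q has orthonormal columns, R is upper triangular
    print(np.allclose(Q @ R, M))            # True
    print(np.allclose(Q.T @ Q, np.eye(3)))  # columns of Q are orthonormal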

EXERCISES
1. Determine whether the following statements are true or false giving brief justifications. Assume
that V is a finite-dimensional inner product space.
(a) Every orthogonal set in V is linearly independent.
(b) Every linearly independent set in V is orthogonal.
(c) Every finite set of vectors in V, containing the zero vector, is orthogonal.
(d) A unit vector in V forms an orthonormal set in V.
(e) The Gram–Schmidt process changes an arbitrary set of vectors in V into an orthonormal
set.
(f) If for two subspaces W1 and W2 of V, W1 ⊥ = W2 ⊥ , then W1 = W2 .
(g) If for two non-empty subsets S 1 and S 2 of V, S 1 ⊥ = S 2 ⊥ , then S 1 = S 2 .
(h) If a subset {v1 , v2 , . . . , vk } of a basis {v1 , v2 , . . . , vn } of V spans a subspace W, then
{vk+1 , vk+2 , . . . , vn } spans W ⊥ .
(i) If the dimension of a subspace W of V is m, then any orthogonal set of m vectors in W is
an orthogonal basis of W.
(j) An orthogonal projection of V onto a subspace W is completely determined by its image of
W.
(k) Any projection of V onto a subspace is an orthogonal projection.
(l) If a vector v ∈ V is orthogonal to each basis vector of subspace W, then PW (v) is the zero
vector, where PW is the orthogonal projection of V onto W.
(m) For any v ∈ V, the best approximation to v by vectors of a subspace W is precisely v− PW (v).

(n) For a linear operator T on V, the matrix [ak j ] of T with respect to an orthonormal basis
{v1, v2, . . . , vn} is given by akj = ⟨T vk, vj⟩.
(o) Every finite-dimensional inner product space has an orthonormal basis.
(p) There is an inner product on the vector space Mn (F) with respect to which the n2 unit
matrices ei j form an orthonormal basis.
2. In each part of the following exercise, apply the Gram–Schmidt process to the given set S in
the indicated inner product space V to obtain an orthonormal basis of the subspace spanned by
S ; also find the coordinates of the given vector with respect to the orthonormal basis found:
(a) V = R2 with standard inner product, S = {(1, 1)t , (−1, 2)t } and v = (2, 3)t .
(b) V = R3 with standard inner product, S = {(1, 1, 0)t , (−1, 2, 1)t } and v = (1, 4, 1)t .
(c) V = C3 with standard inner product, S = {(1, i, 0)t , (1 + i, −i, −1)t } and v = (1 − i, 2 − i, −1)t .

(d) V = ⟨S⟩ with inner product ⟨f, g⟩ = ∫_0 f(t)g(t) dt with S = {sin t, cos t, 1, t} and h(t) = t + 1.
(e) V = M2(R) with inner product ⟨A, B⟩ = Tr(AB∗),

    S = { [ 1   1 ],  [ −1  2 ] }  and  C = [ 0   3 ].
          [ −1  1 ]   [  0  3 ]             [ −1  4 ]

3. Determine an orthonormal basis for V = R2[x] of all real polynomials of degree at most 2 if the
inner product on V is given by ⟨f(x), g(x)⟩ = ∫_{-1}^{1} f(t)g(t) dt.
4. Compute S ⊥ for the subset S = {(1, 1, 0, −2)t , (−1, 1, 3, 0)t } of R4 with the standard inner
product.
5. Prove that any two distinct functions in the set {einx | n ∈ Z} are orthogonal in the inner product
space of all continuous complex-valued functions on the interval [0, 2π] with respect to the
inner product ⟨f(x), g(x)⟩ = (1/2π) ∫_0^{2π} f(t)ḡ(t) dt.
6. Consider R4 with the standard inner product and let W be the subspace spanned by (1, 1, 0, 0)t
and (1, 0, −1, 0)t . Find an orthonormal basis of W ⊥ .
7. Find the orthogonal projections of the following inner product spaces V on subspaces W
spanned by the given vectors, and determine their matrices with respect to some suitable bases.
(a) V = R2 with the standard inner product, W spanned by (1, 1)t .
(b) V = R3 with the standard inner product, and W spanned by (1, 0, 0)t and (−1, 1, 0)t .
(c) V = R2 with the inner product given by

⟨(x1, x2)t, (y1, y2)t⟩ = x1y1 − x2y1 − x1y2 + 4x2y2,

and W = ⟨(1, 1)t⟩.


8. Compute PW(h(x)) for h(x) = x³ ∈ R[x] on the subspace W spanned by {1, x, x²} if the inner
product on R[x] is given by ⟨f(x), g(x)⟩ = ∫_0^1 f(t)g(t) dt.
9. Determine the orthogonal complements of the following subspaces W of the given inner prod-
uct spaces V.
(a) V = R2[x] with inner product ⟨f(x), g(x)⟩ = ∫_0^1 f(t)g(t) dt, and W the subspace of constant
polynomials.
(b) V = C3 with the standard inner product, and W the subspace spanned by (1, i, 0)t .

10. Let W be a finite-dimensional subspace of an inner product space V. If v ∉ W, show that there
is some u ∈ V such that u ∈ W⊥ but ⟨v, u⟩ ≠ 0.
11. If W1 and W2 are subspaces of a finite-dimensional inner product space, then show that

(W1 + W2 )⊥ = W1 ⊥ ∩ W2 ⊥ and (W1 ∩ W2 )⊥ = W1 ⊥ + W2 ⊥ .

12. Let W be a finite-dimensional subspace of an inner product space V, and let PW be the orthog-
onal projection of V on W. Prove that ⟨PW(v), u⟩ = ⟨v, PW(u)⟩ for all v, u ∈ V.
13. Let V be the real inner product space consisting of all real-valued continuous functions defined
on the closed interval [−1, 1], with the inner product given by ⟨f, g⟩ = ∫_{-1}^{1} f(t)g(t) dt. Let Wo =
{f ∈ V | f(−x) = −f(x)} and We = {f ∈ V | f(−x) = f(x)} be the subspaces consisting of odd and
even functions in V, respectively. Show that Wo⊥ = We.
14. Let V be the inner product space of Exercise 13. Consider the subspace W = R2[x] of all real
polynomials of degree at most 2. Find an orthonormal basis of W, and use it to compute the
best approximation of h(x) = e x ∈ V by polynomials in W = R2 [x].
15. Let V be the set of all real sequences { f } such that f (n) is non-zero for only finitely many
positive integers n. It was seen in Exercise 10 of Section 8.3 that V is an inner product space
with respect to the product given by ⟨f, g⟩ = ∑_{n=1}^{∞} f(n)g(n). Now define, for each k, a sequence
ek ∈ V, where ek (n) = δk,n , the Kronecker delta. Show that
(a) {e1 , e2 , . . .} is an orthonormal basis of V.
(b) If W is the subspace generated by all the sequences fk for k = 1, 2, . . . , where fk = e1 + ek,
then W ≠ V.
(c) W ⊥ = {0}.
This example shows that if W is not finite-dimensional, then it is not necessary that (W ⊥ )⊥
equals W.

8.5 ADJOINTS
In this section, we discuss adjoints of linear operators on inner product spaces.

Definition 8.5.1. Let V be an inner product space and T a linear operator on V. We say T has an
adjoint T∗ if T∗ is a linear map on V such that

⟨T v, w⟩ = ⟨v, T∗w⟩  for all v, w ∈ V.

Note that if T∗ exists, then for any v, w ∈ V,

⟨v, T w⟩ = \overline{⟨T w, v⟩} = \overline{⟨w, T∗v⟩} = ⟨T∗v, w⟩.        (8.8)

If V is a finite-dimensional inner product space, the following result, which characterizes linear
functionals on V, enables one to prove the existence of adjoints of operators on V. Observe that any
vector w in V determines a linear functional fw : V → F defined by fw(v) = ⟨v, w⟩ as the inner product
is linear in the first variable. Conversely, given any linear functional f on a finite-dimensional inner
product space V, we choose an orthonormal basis {v1, v2, . . . , vn} of V and define w = ∑_{i=1}^{n} \overline{f(vi)} vi.
Then for the linear functional fw, determined by the newly defined w, one has fw(vj) = ⟨vj, w⟩ =
⟨vj, ∑_{i=1}^{n} \overline{f(vi)} vi⟩ = f(vj) for every basis vector vj, as the inner product is conjugate linear in the

second variable and the v j form an orthonormal basis. Since fw and f agree on the basis vectors of V,
it follows that f = fw . We thus have the following.

Proposition 8.5.2. Any linear functional f, on a finite-dimensional inner product space V over a
field F, determines a unique vector w ∈ V such that f(v) = ⟨v, w⟩ for all v ∈ V.

Proof. Only the uniqueness remains to be proved, which follows from the observation that if ⟨v, w⟩ =
⟨v, w′⟩ for every v ∈ V, then by properties of an inner product w = w′. ∎

We can now prove the existence of the adjoint of any linear operator on a finite-dimensional inner
product space.

Theorem 8.5.3. For any linear operator T on a finite-dimensional inner product space V over F,
there is a unique linear operator T∗ on V such that

⟨T v, w⟩ = ⟨v, T∗w⟩  for any v, w ∈ V.

Proof. First note that given a vector w ∈ V, the map Rw : V → F, given by Rw(v) = ⟨T v, w⟩, is a linear
functional on V as T is linear and an inner product is also linear in the first variable. Therefore, by the
preceding proposition, there is a unique vector w′ ∈ V such that Rw(v) = ⟨v, w′⟩ for any v in V. In other
words, given a vector w ∈ V, there is a unique vector w′ ∈ V such that ⟨T v, w⟩ = ⟨v, w′⟩ for any v in
V. The uniqueness of w′, on the other hand, shows that the association w → w′ defines a well-defined
map T∗ : V → V given by T∗(w) = w′. In terms of this map T∗, the preceding equality can be rewritten
as the defining equation for T∗:

⟨T v, w⟩ = ⟨v, T∗w⟩                                                     (8.9)

for any v and w in V. We show that T∗ is linear. Let w1, w2 ∈ V and a ∈ F. Then, by using the definition
of T∗ and the properties of an inner product, we see that for any v ∈ V,

⟨v, T∗(aw1 + w2)⟩ = ⟨T v, aw1 + w2⟩ = ā⟨T v, w1⟩ + ⟨T v, w2⟩
                  = ā⟨v, T∗w1⟩ + ⟨v, T∗w2⟩
                  = ⟨v, aT∗w1⟩ + ⟨v, T∗w2⟩
                  = ⟨v, aT∗w1 + T∗w2⟩.

Since the preceding chain of equalities holds for any v ∈ V, it follows that T∗(aw1 + w2) = aT∗w1 + T∗w2,
which proves the linearity of T∗.
Now we prove the uniqueness of T∗. If there is another linear operator S on V such that ⟨T v, w⟩ =
⟨v, S w⟩ for all v, w ∈ V, then by the definition of T∗, it follows that ⟨v, T∗w⟩ = ⟨v, S w⟩ for all v ∈ V.
We conclude, as in the last paragraph, that T∗w = S w for all w ∈ V, so T∗ = S. ∎

Now fix an orthonormal basis {v1, v2, . . . , vn} of an inner product space V. Then, for any linear
operator T on V, since ⟨T vj, vi⟩ = ⟨vj, T∗vi⟩ = \overline{⟨T∗vi, vj⟩}, by the definition of the matrices of linear
operators with respect to an orthonormal basis (see Proposition 8.4.13), the following results.

Proposition 8.5.4. Let T be a linear operator on a finite-dimensional inner product space V. With
respect to any orthonormal basis of V, the matrix of the adjoint T ∗ is the conjugate transpose of the
matrix of T .

Thus if T is a linear operator on a real finite-dimensional inner product space V, then the matrix of
T ∗ with respect to any orthonormal basis of V is the transpose of the matrix of T .
The following is clear.

Corollary 8.5.5. Let A ∈ Mn (F), and T be the linear operator on Fn defined by A with respect to
the standard basis E of Fn . Then the linear operator on Fn determined by the conjugate transpose A∗
with respect to E, is precisely the adjoint T ∗ of T .

In case A is a real matrix, then the corollary holds with A∗ replaced by At .


The last proposition is useful in determining adjoints of linear operators as the next example shows.

EXAMPLE 23 Consider the operator T : C3 → C3 given by

T(x1, x2, x3) = (x1 + ix2 − x3, ix1 − x2 + x3, −x1 + x2 + ix3).

Recall that the standard basis of C3 over C is orthonormal with respect to the standard
inner product on C3. The matrix A of T with respect to the standard basis is

A = [  1   i  −1 ]
    [  i  −1   1 ]
    [ −1   1   i ]

The conjugate transpose A∗ is, therefore, given by

A∗ = [  1  −i  −1 ]
     [ −i  −1   1 ]
     [ −1   1  −i ]

Therefore T∗ is determined by these relations: T∗e1 = e1 − ie2 − e3, T∗e2 = −ie1 −
e2 + e3 and T∗e3 = −e1 + e2 − ie3. Using these formulas, we can express the action of
T∗ on a typical vector of C3 as

T∗(x1, x2, x3) = (x1 − ix2 − x3, −ix1 − x2 + x3, −x1 + x2 − ix3).

We leave it to the reader to verify that ⟨T v, u⟩ = ⟨v, T∗u⟩ by using these formulae.
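The verification left to the reader can also be automated. Here is a NumPy sketch (not part of the text; the random vectors are an arbitrary choice) checking ⟨T v, u⟩ = ⟨v, T∗u⟩ with T and T∗ given by the matrices A and A∗ above.

    import numpy as np

    A = np.array([[1, 1j, -1],
                  [1j, -1, 1],
                  [-1, 1, 1j]])
    A_star = A.conj().T                 # conjugate transpose

    rng = np.random.default_rng(1)
    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    u = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    lhs = np.vdot(u, A @ v)             # <T v, u> = sum_i (Av)_i * conj(u_i)
    rhs = np.vdot(A_star @ u, v)        # <v, T* u>
    print(np.isclose(lhs, rhs))         # True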


EXAMPLE 24 In this example, we compute the adjoint of the linear operator TA on Mn(F) given by
the left-multiplication by a fixed matrix A ∈ Mn(F). Here, the inner product on Mn(F)
is the standard one given by ⟨X, Y⟩ = Tr(XY∗). Using the well-known property of the
trace function, namely that Tr(CD) = Tr(DC) for any two matrices C, D, one ob-
tains Tr(AXY∗) = Tr(XY∗A) = Tr(X(A∗Y)∗). Thus ⟨TA(X), Y⟩ = ⟨X, A∗Y⟩. However,
⟨TA(X), Y⟩ = ⟨X, TA∗(Y)⟩ by definition, so the uniqueness of the adjoint implies that

TA∗(Y) = A∗Y  for all Y ∈ Mn(F).

It follows that the adjoint of the left-multiplication by A is the left-multiplication by
A∗, the adjoint of A.
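The trace manipulation used here is easy to test numerically; a NumPy sketch (an illustration, not part of the text) with random complex matrices:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4
    rand = lambda: rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    A, X, Y = rand(), rand(), rand()

    inner = lambda P, Q: np.trace(P @ Q.conj().T)     # <P, Q> = Tr(P Q*)

    lhs = inner(A @ X, Y)                             # <T_A(X), Y>
    rhs = inner(X, A.conj().T @ Y)                    # <X, A* Y>
    print(np.isclose(lhs, rhs))                       # True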

EXAMPLE 25 In an infinite-dimensional inner product space, a linear operator need not have an
adjoint. Here, we present the standard example. Let V be the space of all polynomials
with real coefficients equipped with the inner product ⟨f(x), g(x)⟩ = ∫_0^1 f(t)g(t) dt.
We claim that the differentiation operator D on V does not have an adjoint. To
prove our claim, we first note that by integration by parts

∫_0^1 f′(t)g(t) dt = f(1)g(1) − f(0)g(0) − ∫_0^1 f(t)g′(t) dt,

where the symbol h′(x), as usual, denotes the derivative D(h(x)) for any polynomial
h(x). In terms of the inner product, the relation can be rewritten as

⟨D(f(x)), g(x)⟩ = f(1)g(1) − f(0)g(0) − ⟨f(x), D(g(x))⟩.

If the adjoint D∗ exists, then by the definition of the adjoint, the last relation implies
that

⟨f(x), D∗(g(x))⟩ = f(1)g(1) − f(0)g(0) − ⟨f(x), D(g(x))⟩

for all f(x), g(x) ∈ V. It follows that

⟨f(x), (D + D∗)(g(x))⟩ = f(1)g(1) − f(0)g(0).                           (8.10)

Now, it is a routine exercise to verify that for a fixed g(x) ∈ V, the map Lg defined by
Lg(f(x)) = f(1)g(1) − f(0)g(0) is a linear functional on V. Therefore, Equation (8.10)
shows that for each f(x) ∈ V, Lg(f(x)) = ⟨f(x), h(x)⟩, where h(x) = (D + D∗)(g(x)).
We leave it to the reader to show (Exercise 13) that Lg(f(x)) can be the inner product
of f(x) with a fixed h(x) for all f(x) if and only if Lg is the zero functional. In other
words, assuming the existence of D∗(g(x)) for a fixed but arbitrary polynomial g(x),
we arrive at the following relation:

f(1)g(1) − f(0)g(0) = 0  for all polynomials f(x) ∈ V.

But then choosing f(x) suitably, we see that g(1) = g(0) = 0. This is absurd as g(x)
can be chosen arbitrarily.
The following gives an example of a linear operator on an infinite-dimensional inner product space
which does have an adjoint.

EXAMPLE 26 Consider the infinite-dimensional inner product space W (see Example 14 of Section
8.4) of complex-valued continuous functions on [0, 2π] spanned by the orthonormal
family { fn }, where { fn (x) = einx | n = 0, ±1, ±2, . . . }. Let T and S be the operators
on W given by

T ( f )(x) = eix f (x) and S ( f )(x) = e−ix f (x).

We leave to the reader the routine verification that T and S are linear operators on
W. Note also that T(eimx) = ei(m+1)x and S(eimx) = ei(m−1)x. Recall that as the functions
eimx form an orthonormal basis of W, we have ⟨eikx, eilx⟩ = δk,l and so

⟨T(eimx), einx⟩ = ⟨ei(m+1)x, einx⟩ = δ(m+1),n



and

⟨eimx, S(einx)⟩ = ⟨eimx, ei(n−1)x⟩ = δm,n−1.

But m + 1 = n if and only if m = n − 1, so that the definition of Kronecker's delta
function implies that δm+1,n = δm,n−1. The preceding relations then show that

⟨T(eimx), einx⟩ = ⟨eimx, S(einx)⟩.

Since this relation holds for arbitrary basis vectors of W, the uniqueness of adjoints
allows us to conclude that T ∗ = S .
The following lists some basic properties of adjoints of operators; their proofs are left to the reader.

Proposition 8.5.6. For linear operators T and S (which have adjoints) on an inner product space
V over a field F and any a ∈ F, the following hold.
(a) (T + S )∗ = T ∗ + S ∗ .
(b) (aT)∗ = āT∗ for any a ∈ F.
(c) (T S )∗ = S ∗ T ∗ .
(d) T ∗∗ = T .
(e) I ∗ = I, where I is the identity operator on V.

We end this section by introducing an important class of operators on an inner product space. They
will be studied in detail in a later section.

Definition 8.5.7. A linear operator T on an inner product space is called self-adjoint, or hermitian
if T = T ∗ .

It is clear from Proposition (8.5.4) that an operator on a finite-dimensional inner product space over
F is self-adjoint if and only if its matrix A, relative to any orthonormal basis, is hermitian (A∗ = A) in
case F = C and symmetric (At = A) in case F = R.
Hermitian matrices and, in particular, real symmetric matrices play important roles in many appli-
cations; one of the reasons for their importance is that eigenvalues of such matrices are always real.
We have proved this fact by using dot products in Rn . The proof that any self-adjoint operator on an ar-
bitrary finite-dimensional inner product space also has all its eigenvalues real can be found in Section
8.7 later.

EXERCISES
1. Determine whether the following statements are true or false giving brief justifications. Assume
that all linear operators are on finite-dimensional inner product spaces.
(a) The adjoint T ∗ of a linear operator T cannot be equal to T if T is not the identity map.
(b) If linear operators T and S commute, then T ∗ and S ∗ commute too.
(c) The adjoint T ∗ of an invertible linear operator T is invertible.
(d) The differentiation operator D on the inner product space of all real polynomials of degree
at most 2 has no adjoint.

(e) The matrix of T ∗ is the conjugate transpose of the matrix of T on V with respect to any basis
of V.
(f) If T is a non-zero operator, then T ∗ T cannot be the zero operator.
(g) For any linear operator T , ker T ∗ = ker T .
(h) For any operator T , the adjoint of T + T ∗ is itself.
(i) For any operator T , the adjoint of T T ∗ is itself.
(j) The adjoint of a diagonalizable operator is diagonalizable.
(k) For a linear operator T on a real inner product space, the characteristic polynomials of T and
T ∗ are the same.
(l) If the columns of an m × n matrix A form an orthonormal basis of the column space of A,
then A∗ A = In .
For the following exercises, we remind the readers the convention about vectors in Fn . While
we may continue to express such vectors as row vectors as arguments for linear operators, for
explicit computations of inner products, they must be written as column vectors.
2. For each of the following linear operators T , evaluate T ∗ (v) for the given vectors v of the indi-
cated inner product spaces V:
(a) V = R2 with standard inner product, T (x1 , x2 ) = (x1 + 2x2 , x1 − x2 ) and v = (1, −1).
(b) V = R2 with standard inner product, T (x1 , x2 ) = (x1 cos θ − x2 sin θ, x1 sin θ + x2 cos θ) and
v = (1, 1).
(c) V = C2 with standard inner product, T (z1 , z2 ) = (z1 + iz2 , (1 − i)z1 + z2 ) and v = (1 + i, 1 − i).
(d) V = R2[x] with inner product ⟨f, g⟩ = ∫_0^1 f(t)g(t) dt, T(p(x)) = p′(x) − p(x) and v = 1 + x.
3. Describe the adjoints T ∗ of the linear operators T on indicated inner product spaces V in each
of the following cases. Verify also that ⟨T v, u⟩ = ⟨v, T∗u⟩ for all v, u ∈ V.
(a) V = R2 with standard inner product, T (x1 , x2 ) = (x1 , 0);
(b) V = C2 with standard inner product, T (z1 , z2 ) = (z1 + iz2 , (1 − i)z2);
(c) V = R2[x] with inner product ⟨f(x), g(x)⟩ = ∫_0^1 f(t)g(t) dt,

T(a0 + a1x + a2x²) = a1 + 2a2x.
4. Prove Proposition (8.5.6).
5. Verify that for the linear operator T on C3 given in Example 23 of this section, ⟨T ei, ej⟩ =
⟨ei, T∗ej⟩ for the standard basis vectors e1, e2 and e3 of C3.
6. Let T be a linear operator on a finite-dimensional inner product space. Prove the following
assertions:
(i) ker T is the orthogonal complement of Im(T ∗ ).
(ii) Im(T ∗ ) is the orthogonal complement of ker T .
7. Let T be a linear operator on a finite-dimensional inner product space. If T has an eigenvector,
prove that so does T ∗ .
8. Give an example of a linear operator T on an inner product space such that ker T ≠ ker T∗.
9. Let W be a subspace of a finite-dimensional inner product space V. Prove that the adjoint of the
projection PW of V on W is PW itself.
10. Let T be a linear operator on a finite-dimensional inner product space V. Prove the following:
(i) ker(T ∗ T ) = ker T .

(ii) The rank of T is the same as the rank of T ∗ .


(iii) The rank of T is the same as the ranks of T T ∗ and T ∗ T .
11. For any A ∈ Mn (F), prove that rank(AA∗ ) = rank(A∗ A) = rank(A).
12. Fix vectors u and w in an inner product space V, and define Tu,w : V → V by Tu,w(v) = ⟨v, u⟩w
for all v ∈ V. Prove the following:
(i) Tu,w is a linear operator on V.
(ii) Tu,w∗ = Tw,u.
(iii) Tu,w Tx,y = Tu,⟨w,x⟩y.
Under what conditions is T u,w self-adjoint?
13. Let V be the real vector space of all polynomials over R equipped with the inner product
⟨f(x), g(x)⟩ = ∫_0^1 f(t)g(t) dt. Define, for a fixed g(x) ∈ V, a map Lg on V by

Lg ( f (x)) = f (1)g(1) − f (0)g(0) for any f (x) ∈ V.

(i) Prove that Lg is a linear functional on V.


(ii) Prove that Lg(f(x)) = ⟨f(x), h(x)⟩ for a fixed h(x) ∈ V and all f(x) ∈ V if and only if
Lg is the zero functional on V.
14. Let T be a linear operator on a finite-dimensional complex inner product space. Let T 1 and T 2
be the operators on V defined by

T1 = (1/2)(T + T∗)  and  T2 = (1/2i)(T − T∗).

Verify that T 1 and T 2 are self-adjoint operators such that

T = T 1 + iT 2 .

Thus, the self-adjoint operators T 1 and T 2 behave like the real and imaginary parts of the operator
T.
15. Let V be a complex inner product space, and T be a self-adjoint operator on V. Prove the fol-
lowing assertions.
(i) ‖v + iT v‖ = ‖v − iT v‖ for any v ∈ V.
(ii) v + iT v = w + iT w if and only if v = w.
(iii) If I is the identity map on V, then I + iT and I − iT are invertible.
16. Let P be a projection on a finite-dimensional inner product space V. Prove that P is self-adjoint
if and only if PP∗ = P∗ P.

8.6 UNITARY AND ORTHOGONAL OPERATORS


Linear maps between two inner product spaces, more specifically, linear operators on an inner product
space, which preserve inner products are important in the study of such spaces. We discuss such
operators in this section. Recall that in Section 7.5, linear operators (and matrices) preserving non-
degenerate bilinear forms were discussed.

Definition 8.6.1. Let V and W be two inner product spaces over the same field, and let T be a
linear map from V into W. T is said to preserve inner products if

⟨T v, T u⟩ = ⟨v, u⟩  for all v, u ∈ V.

An isomorphism of V onto W is a one–one, onto linear map which preserves inner products.

Note that we have used the same notation to describe inner products in two different spaces. How-
ever, we do not expect any confusion because of this convention as the images of vectors under T
inside an inner product symbol clearly indicate the space in which the inner product is being taken.
An important concept in an inner product space is the length of a vector. The following result shows
that the preservation of inner products by a linear map is equivalent to the preservation of lengths.

Proposition 8.6.2. Let V and W be inner product spaces over the same field, and T be a linear
map of V into W. Then T preserves the inner products if and only if ‖T v‖ = ‖v‖ for any v ∈ V.

Proof. If T preserves inner products, then for any v ∈ V, ‖T v‖² = ⟨T v, T v⟩ = ⟨v, v⟩ = ‖v‖², which shows
that T preserves lengths.
For the converse, we need the Polarization identity (see Exercise 2 of this section), which ex-
presses the inner product of two vectors in terms of lengths of certain specific linear combinations of
these vectors. The identity states, for F = R, that

⟨v, u⟩ = (1/4)‖v + u‖² − (1/4)‖v − u‖²,

and for F = C, that

⟨v, u⟩ = (1/4)‖v + u‖² − (1/4)‖v − u‖² + (i/4)‖v + iu‖² − (i/4)‖v − iu‖².

It is clear, because of these identities, that if a linear map T preserves lengths, it will preserve the
corresponding inner products too. ∎
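The complex Polarization identity used here can be checked directly; a NumPy sketch (not part of the text; the dimension and random seed are arbitrary), with the standard inner product on complex n-space:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 4
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    u = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    inner = lambda a, b: np.vdot(b, a)        # <a, b> = sum_i a_i * conj(b_i)
    nsq = lambda a: np.vdot(a, a).real        # ||a||^2

    rhs = (nsq(v + u) - nsq(v - u)) / 4 + 1j * (nsq(v + 1j*u) - nsq(v - 1j*u)) / 4
    print(np.isclose(inner(v, u), rhs))       # True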
This proposition yields the following useful fact which is not apparent from the definition.

Corollary 8.6.3. Let V and W be inner product spaces over the same field. If a linear map T of V
into W preserves inner products, then T is a one–one map.

Proof. A linear map T is one-one if and only if ker T = {0}, that is, T v = 0 if and only if v = 0 (see
Proposition 4.2.6). Since in an inner product space the only vector of zero length is the zero vector,
the corollary follows from the preceding proposition. ∎
Recall that a linear map between two vector spaces over a field, having the same dimension, is
one–one if and only if it is onto. Therefore, the following important result is clear.

Corollary 8.6.4. Let T be a linear operator on a finite-dimensional inner product space V which
preserves the inner product of V. Then T is an inner product space isomorphism on V.

Definition 8.6.5. An isomorphism of an inner product space onto itself is said to be a unitary
operator if F = C and an orthogonal operator if F = R.

The preceding corollary then can be restated.

Proposition 8.6.6. Let V be a finite-dimensional inner product space over a field F. A linear
operator on V which preserves the inner product is unitary if F = C, and orthogonal if F = R.

We have already seen the importance of orthonormal bases of inner product spaces. It is, therefore,
natural to inquire about the way such bases behave under inner product-preserving linear maps.

Proposition 8.6.7. Let V and W be finite-dimensional inner product spaces over the same field F
such that dim V = dim W. For a linear map T of V into W, the following are equivalent.
(a) T preserves inner products.
(b) T carries every orthonormal basis of V onto an orthonormal basis of W.
(c) T carries some orthonormal basis of V onto an orthonormal basis of W.

Proof. We show that (a) implies (b). Let {v1 , v2 , . . . , vn } be an orthonormal basis of V. By hy-
pothesis, T is a linear inner product-preserving map and so is one–one. Since dim V = dim W, it
follows that T must be onto and hence is a vector space isomorphism of V onto W. Therefore,
{T v1 , T v2 , . . . , T vn } is a basis of W. However, as T preserves inner products, ⟨T vi , T v j ⟩ = ⟨vi , v j ⟩ = δi j ,
and so {T v1 , T v2 , . . . , T vn } is an orthonormal basis of W.
That (b) implies (c) needs no proof, and so we show that (c) implies (a). Assume that
{v1 , v2 , . . . , vn } is an orthonormal basis of V such that {T v1 , T v2 , . . . , T vn } forms an orthonormal
basis of W. Now consider arbitrary vectors v, w ∈ V such that v = Σi xi vi and w = Σ j y j v j for
some scalars xi , y j ∈ F. Thus T v = Σi xi T vi and T w = Σ j y j T v j . Since the T vi form an orthonormal
basis of W, it follows that for any i, j, ⟨T vi , T v j ⟩ = δi j . Therefore, by properties of an inner product,
we obtain

⟨T v, T w⟩ = Σi, j xi ȳ j δi j = Σi xi ȳi = ⟨v, w⟩,

which proves that T preserves inner products. ∎


The proposition shows that a linear operator on a finite-dimensional inner product space is unitary
or orthogonal if and only if the operator carries some or any orthonormal basis of V to an orthonormal
basis of V.
The following useful fact follows straight from the definition of unitary operators.

Lemma 8.6.8. Any eigenvalue of a unitary operator or an orthogonal operator is of absolute
value 1.

Proof. If λ is an eigenvalue of a unitary operator T with eigenvector v, then

⟨T v, T v⟩ = ⟨λv, λv⟩ = λλ̄⟨v, v⟩,

which implies that ‖T v‖² = |λ|²‖v‖². One concludes that |λ|² = 1 as the unitary operator T preserves
lengths (and T v ≠ 0 as v is an eigenvector). A similar argument works for an orthogonal operator. ∎
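As a quick numerical aside (ours, not the book's), the 2 × 2 rotation matrix appearing in Example 27 below has the complex eigenvalues eiθ and e−iθ, and NumPy confirms that both have absolute value 1.

```python
import numpy as np

theta = 0.7                                   # any angle with 0 < theta < pi
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# The (complex) eigenvalues of the rotation are e^{i theta} and e^{-i theta}.
eigenvalues = np.linalg.eigvals(R)
assert np.allclose(np.abs(eigenvalues), 1.0)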
Now, we turn our attention to the adjoints of unitary and orthogonal operators. The major property
enjoyed by such adjoints, as stated in the next proposition, will be crucial in the rest of this chapter.

Proposition 8.6.9. Let T be a linear operator on an inner product space V over C. Then, T is
unitary if and only if the adjoint T ∗ of T exists and
T T ∗ = T ∗ T = I,

where I is the identity map on V.

Proof. Assume that T is unitary. Then, T is one–one and onto, so its inverse T −1 exists. Since we can
express the inner product ⟨T v, w⟩ as ⟨T v, T T −1 w⟩, it follows that ⟨T v, w⟩ = ⟨T v, T T −1 w⟩ = ⟨v, T −1 w⟩
for all v and w in V as T preserves inner products. Therefore, by the uniqueness of the adjoint of T
(see Theorem 8.5.3), we conclude that T −1 = T ∗ . The desired relation follows.
Conversely, we assume that T ∗ exists and T T ∗ = T ∗ T = I. This time, by the uniqueness of the
inverse of a linear operator it follows that T −1 exists and T −1 = T ∗ . The existence of T −1 also implies
that T is a vector space isomorphism. So to complete the proof, we need only show that T preserves
inner products. Now, for any v and w in V, by the definition of adjoint, ⟨T v, T w⟩ = ⟨v, T ∗ T w⟩, which
clearly equals ⟨v, w⟩ as T ∗ = T −1 . The proof is complete. ∎

The real analogue of the preceding proposition says that an operator T on a real inner product
space is orthogonal if and only if T ∗ exists and T T ∗ = T ∗ T = I.
We now look at some examples of unitary and orthogonal operators. In these, and in later examples,
the inner product space Fn is equipped with the standard inner product unless otherwise mentioned. We must
point out that some of these examples have already been presented in Section 5.3, but we include them
here for the sake of completeness.

EXAMPLE 27 Consider Rθ , the rotation of the plane R2 through an angle θ. As we had seen earlier,
Rθ is a linear operator on R2 . It is clear geometrically that the rotation Rθ preserves
length (between any two points in the plane), and therefore, by Proposition (8.6.2),
is an orthogonal operator on R2 . Recall that the matrix A of Rθ with respect to the
standard basis {e1 , e2 } is
A = [ cos θ   −sin θ ]
    [ sin θ    cos θ ].

Now note that {e1 , e2 } is an orthonormal basis of R2 , and so by Proposition (8.6.7),


Rθ (e1 ) = (cos θ, sin θ)t and Rθ (e2 ) = (− sin θ, cos θ)t again form an orthonormal ba-
sis of R2 . Of course, this can be verified directly by calculating the inner products of
Rθ (e1 ) and Rθ (e2 ).
Another fact that is clear from the matrix representation of Rθ is that it is not
self-adjoint, that is, Rθ ∗ ≠ Rθ for 0 < θ < π.
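A short numerical check of this example (an aside added here, assuming NumPy is available): AᵗA = I2 , and the images of e1 , e2 , which are the columns of A, again form an orthonormal pair.

```python
import numpy as np

theta = np.pi / 6
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# A^t A = I_2, so A is an orthogonal matrix.
assert np.allclose(A.T @ A, np.eye(2))

# The images of the standard basis vectors are the columns of A.
Re1, Re2 = A[:, 0], A[:, 1]
assert np.isclose(np.dot(Re1, Re1), 1.0) and np.isclose(np.dot(Re2, Re2), 1.0)
assert np.isclose(np.dot(Re1, Re2), 0.0)
```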
EXAMPLE 28 Let L be a one-dimensional subspace of Rn , and W = L⊥ . A linear operator RW on
Rn is called a reflection of Rn about W if

RW (v) = −v for all v in L,


RW (v) = v for all v in W.

Consider the orthogonal decomposition Rn = L ⊕ L⊥ = L ⊕ W. Fix a unit vector e ∈ L,


so that L = Re. If we express an arbitrary v in Rn as ae + w for w ∈ W, then by taking

inner product of v with e, we see that a = ⟨v, e⟩. It follows that the formula for RW is
given by

RW (v) = v − 2⟨v, e⟩e.

The linearity of the inner product shows that RW is a linear map. We now show
that RW is an orthogonal operator on Rn by producing an orthonormal basis of Rn
which is mapped by RW onto another orthonormal basis. Pick any orthonormal ba-
sis {u1 , u2 , . . . , un−1 } of W. It is clear that {e, u1 , u2 , . . . , un−1 } is an orthonormal
basis of Rn (as L ⊕ W is an orthogonal decomposition). By definition, RW carries this
orthonormal basis to {−e, u1 , u2 , . . . , un−1 }, which is trivially an orthonormal
basis again. Thus, RW is an orthogonal operator of Rn .
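The formula RW (v) = v − 2⟨v, e⟩e is easy to test numerically. The sketch below is ours (the function name reflect is an assumption, not the book's notation); it checks, for a unit vector e in R³, that RW negates L, fixes W and preserves lengths.

```python
import numpy as np

def reflect(v, e):
    """R_W(v) = v - 2<v, e>e, the reflection about W = (Re)^perp; e must be a unit vector."""
    return v - 2 * np.dot(v, e) * e

e = np.array([1.0, 2.0, 2.0]) / 3.0          # unit vector spanning L

rng = np.random.default_rng(1)
v = rng.standard_normal(3)

assert np.allclose(reflect(e, e), -e)                                  # R_W negates L = Re
w = np.array([2.0, -1.0, 0.0]) / np.sqrt(5.0)                          # w is orthogonal to e, so w lies in W
assert np.allclose(reflect(w, e), w)                                   # R_W fixes W
assert np.isclose(np.linalg.norm(reflect(v, e)), np.linalg.norm(v))    # lengths are preserved
```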
We now introduce the matrix equivalents of unitary and orthogonal operators.

Definition 8.6.10. A matrix A ∈ Mn (C) is called a unitary matrix if A∗ A = I, where A∗ is the
conjugate transpose of A and I is the identity matrix of Mn (C).
For F = C or R, a matrix A ∈ Mn (F) is called an orthogonal matrix if At A = I, where At is the
transpose of A and I is the identity matrix of Mn (F).

Thus, a real orthogonal matrix is unitary whereas a unitary matrix is orthogonal if and only if all its
entries are real.
Since, by Proposition (8.5.4), the conjugate transpose of the matrix of an operator with respect to
an orthonormal basis of an inner product space is the matrix of the adjoint of the operator, the next
result can be easily verified.

Proposition 8.6.11. Let T be a linear operator on a finite-dimensional inner product space V and
A the matrix of T with respect to some orthonormal basis of V.

(a) Let V be a complex inner product space. Then T is a unitary operator if and only if A is a unitary
matrix.
(b) Let V be a real inner product space. Then T is an orthogonal operator if and only if A is an
orthogonal matrix.

We now point out a special property of unitary matrices which makes identification of such
matrices much simpler. To derive this property, consider a complex unitary matrix A = [ai j ]; then
A∗ A = I implies that for any 1 ≤ i, j ≤ n, Σk āki ak j = δi j . Therefore, if we think of the columns
γi = (a1i , a2i , . . . , ani )t of A as vectors of Cn , then the last relation implies that with respect to the
standard inner product of Cn , we have ⟨γ j , γi ⟩ = δi j , which is equivalent to saying that the columns of
A form an orthonormal set in Cn . However, since A∗ A = I if and only if AA∗ = I (in Section 1.4 we
have noted that a one-sided inverse of a matrix is the inverse), it follows that the rows of A too form
an orthonormal set in Cn .
We have an analogous result by considering real orthogonal matrices: the columns of an n × n real
matrix form an orthonormal set in Rn if and only if the rows too form an orthonormal set.

Proposition 8.6.12. Let A ∈ Mn (F) where F = C (respectively, R). Then A is unitary (respectively,
orthogonal) if and only if the columns as well as the rows of A form an orthonormal basis of Fn .
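As a numerical illustration of Proposition 8.6.12 (an aside, assuming NumPy is available), one can obtain a unitary matrix as the Q-factor of a QR factorization of a random complex matrix and check that its columns and rows are orthonormal.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(M)                       # the Q-factor of a QR factorization is unitary

# U*U = I = UU*, so both the columns and the rows of U form orthonormal bases of C^4.
assert np.allclose(U.conj().T @ U, np.eye(4))
assert np.allclose(U @ U.conj().T, np.eye(4))
```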

Recall that the matrix of the change of an orthonormal basis of a finite-dimensional inner product
space over F to another orthonormal basis is unitary if F = C or orthogonal if F = R (see the remark
after Proposition 8.6.7). Therefore, if A and B are the matrices of a linear operator with respect to two
orthonormal bases B and B' of a finite-dimensional complex inner product space V, then

B = U −1 AU = U ∗ AU,

where U is the change of basis matrix from B' to B. In that case, we say that A is unitarily similar
to B.
In a similar manner, matrices A and B representing a linear operator on a real inner product space
with respect to two orthonormal bases are orthogonally similar, that is,

B = P−1 AP = Pt AP,

where P is the orthogonal matrix of the change of bases.


The central problem now is to determine the conditions on linear operators on finite-dimensional
inner product spaces which ensure the existence of orthonormal bases relative to which the matrices
of the operators are diagonal ones. As was the case with operators on plain vector spaces, this problem
is related to finding the conditions on matrices under which they are unitarily or orthogonally similar
to diagonal matrices. We take up this problem in the following section.
We finally point out that, as in the case of operators preserving non-degenerate bilinear forms,
the unitary (respectively, orthogonal) operators on a complex (respectively, real) inner product space
form a group with composition of maps as the binary operation (see Exercise 16 of this section); the
operators in the group preserve the respective inner product. In particular, the unitary (respectively,
orthogonal) matrices of order n, which preserve the standard inner product on Cn (respectively, Rn ),
form a group, which is denoted by Un (C) (respectively, On (R)).

EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. All linear
operators are on finite-dimensional inner product spaces and matrices are over C or R as the case
may be.
(a) A unitary matrix is orthogonal.
(b) A complex orthogonal matrix is unitary.
(c) An orthogonal operator is invertible.
(d) An orthogonal operator is diagonalizable.
(e) The determinant of an orthogonal matrix is ±1.
(f) The determinant of a unitary matrix is ±1.
(g) Any invertible operator on a complex inner product space is unitary.
(h) Any invertible operator on a real inner product space is orthogonal.
(i) The adjoint of a unitary operator is unitary.
(j) The product of two orthogonal operators is orthogonal.
(k) The sum of two orthogonal operators is orthogonal.
(l) If the rows of an n × n real matrix A are orthogonal in Rn , then A is orthogonal.

(m) The matrices of a linear operator with respect to two orthonormal bases are congruent.
(n) A unitary operator cannot be self-adjoint.
(o) If a linear operator has all its eigenvalues of absolute value 1, then it must be unitary.
(p) If a is an eigenvalue of a real orthogonal matrix A, then 1/a is also an eigenvalue of A.
2. Prove the Polarization identities:
(a) For any v, u in a real inner product space,

⟨v, u⟩ = (1/4)‖v + u‖² − (1/4)‖v − u‖².

(b) For any v, u in a complex inner product space,

⟨v, u⟩ = (1/4)‖v + u‖² − (1/4)‖v − u‖² + (i/4)‖v + iu‖² − (i/4)‖v − iu‖².
3. Verify that the reflection RW of Example 28 satisfies

RW (v) = v − 2⟨v, e⟩e.

4. Verify that the reflection RW of Example 28 is an orthogonal operator.


5. Consider V = C as a real inner product space. For z ∈ C, let T z be the linear operator on V given by T z (x) = zx.
Determine those z for which T z is self-adjoint. Also find the z for which it is unitary.
6. Let V = M2 (C) with inner product given by ⟨A, B⟩ = Tr(AB∗ ). For an A ∈ V, let T A be the linear
operator on V given by T A (M) = AM. Show that T A is unitary if and only if A is a unitary matrix.
7. Prove that a 2 × 2 orthogonal matrix is of one of the following two forms:

[ a    b ]        [ a    b ]
[ −b   a ]   or   [ b   −a ].

8. Prove that a unitary matrix in M2 (C) is of the form

[ a          b      ]
[ −eiθ b̄     eiθ ā  ]

where θ is a real number, and a and b are complex numbers such that |a|² + |b|² = 1.

9. Let

A = [ a + b    b − a ]
    [ a − b    b + a ],

where a, b ∈ R. Determine when A is orthogonal.
10. Show that for every a ∈ R, the matrix

A = 1/(1 + 2a²) [ 1       −2a        2a²  ]
                [ 2a      1 − 2a²    −2a  ]
                [ 2a²     2a         1    ]

is orthogonal.
11. Let V be a finite-dimensional inner product space, and let W be a subspace. As V = W ⊕ W ⊥ ,
every vector v can be uniquely written as a sum w + w' with w ∈ W and w' ∈ W ⊥ . Define a
map T : V → V by T (v) = w − w' . Prove that T is a linear operator which is unitary as well as
self-adjoint.

12. Let V be the inner product space of all complex-valued continuous functions on [0, 1] with the
inner product ⟨ f, g⟩ = ∫₀¹ f (t)ḡ(t) dt. For a fixed h ∈ V, let T be the linear operator on V defined
by T ( f ) = h f , where (h f )(x) = h(x) f (x). Prove that T is a unitary operator if and only if |h(x)| = 1
for 0 ≤ x ≤ 1.
13. Find an orthogonal matrix whose first row is (1/3, 2/3, 2/3).
14. Let T be a self-adjoint operator on a finite-dimensional complex inner product space V. In Ex-
ercise 15 of Section 8.5, we have seen that I + iT , where I is the identity map on V, is invertible.
Show that

S = (I − iT )(I + iT )−1

is a unitary operator on V.
15. Let W be a finite-dimensional T -invariant subspace of an inner product space V, where T is a
unitary operator on V. Prove that the restriction T W is a unitary operator on W.
16. Prove that the unitary (respectively, orthogonal) operators on a complex (respectively, real) inner
product space form a group.
17. Let V be the real vector space of all real 3 × 3 skew-symmetric matrices A (i.e. At = −A). Make
V into a real inner product space with the inner product ⟨A, B⟩ = (1/2)Tr(ABt ). Prove that the
map T : R3 → V given by

T (x1 , x2 , x3 )t = [  0      −x3     x2 ]
                    [  x3      0     −x1 ]
                    [ −x2     x1      0  ]

is an inner product space isomorphism of R3 onto V. Find an orthonormal basis of V.


18. Let {ai j } be a square array of n² real numbers. Show that

Σ j ai j² = 1 for i = 1, 2, . . . , n and Σ j ai j ak j = 0 for i ≠ k

if and only if

Σi ai j² = 1 for j = 1, 2, . . . , n and Σi ai j aik = 0 for j ≠ k

(all sums running from 1 to n).
i=1 i=1

8.7 NORMAL OPERATORS


The fact that a real symmetric matrix is diagonalizable (see Section 5.3) implies that if a linear operator
T on Rn is represented by a symmetric matrix with respect to the standard basis of Rn , then there is
an orthonormal basis of Rn , consisting of eigenvectors of T , with respect to which the matrix of T
is diagonal; here we are assuming that Rn is equipped with the usual dot product. In this section, we
seek conditions on a given linear operator on a finite-dimensional inner product space which ensure
the existence of an orthonormal basis consisting of eigenvectors of the operator.
Observe that if a finite-dimensional inner product space V has an orthonormal basis B consisting of
eigenvectors of a linear operator T , then not only is the matrix A of T with respect to B diagonal, but
the matrix A∗ of the adjoint T ∗ relative to the same basis must also be diagonal (see Proposition 8.5.4).
Since diagonal matrices always commute, it follows that AA∗ = A∗ A. In other words, for V to have an

orthonormal basis of eigenvectors of T , it is necessary that T and its adjoint T ∗ commute. One of our
objectives in this section is to show that this necessary condition is also sufficient if V is a complex
inner product space. We begin with the following natural definition.

Definition 8.7.1. A linear operator T on an inner product space is normal if its adjoint T ∗ exists
and T T ∗ = T ∗ T . A square matrix A is said to be normal if AA∗ = A∗ A for A ∈ Mn (C) and AAt = At A
for A ∈ Mn (R).

It follows that a linear operator on a finite-dimensional inner product space is normal if and only if
its matrix relative to any orthonormal basis is normal.
We now look at some examples of normal operators and matrices.

EXAMPLE 29 The rotation Rθ of the plane R2 through an angle θ, where 0 < θ < π has
A = [ cos θ   −sin θ ]
    [ sin θ    cos θ ]

as its matrix relative to the standard orthonormal basis of R2 . Now, by the trigono-
metrical identity cos²θ + sin²θ = 1, we see that AAt = I2 = At A. So, as A is a real
matrix, Rθ is normal.
EXAMPLE 30 The real matrix A of the preceding example is not symmetric. However, any real
symmetric matrix is clearly normal. So is any real skew-symmetric matrix. Similarly,
a complex hermitian matrix is normal.
In fact, any self-adjoint operator T is trivially normal as T = T ∗ .
EXAMPLE 31 A unitary operator T is normal as T ∗ = T −1 , so T satisfies the condition for normality
trivially. Similarly an orthogonal operator on a real inner product space is normal,
too.
Similarly, a unitary or a real orthogonal matrix is a normal matrix.
EXAMPLE 32 Consider the infinite-dimensional inner product space W of complex-valued contin-
uous functions on [0, 2π] spanned by the orthonormal family { fn } where
{ fn (x) = einx | n = 0, ±1, ±2, . . . }.
We have seen in Example 27 of Section 8.5 that if T and S are the linear operators
on W given by T ( f )(x) = eix f (x) and S ( f )(x) = e−ix f (x), then the adjoint of T exists
and T ∗ = S . However, the definitions of T and S clearly show that their composite
(as maps) is the identity map I of W. So T T ∗ = T S = I = S T = T ∗ T , which proves
that both T and S are indeed normal operators on W.
For the main result about eigenvalues and eigenvectors of normal operators, we require two basic
facts.
The first is the following relation:
‖T v‖ = ‖T ∗ v‖ (8.11)
for any normal operator T on an inner product space V and any v ∈ V. To derive this relation, note that,
as T T ∗ = T ∗ T , one has ⟨T v, T v⟩ = ⟨(T ∗ T )v, v⟩ = ⟨(T T ∗ )v, v⟩ = ⟨T ∗ v, T ∗ v⟩ and so ‖T v‖² = ‖T ∗ v‖². This
proves (8.11).
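Relation (8.11) can be checked numerically for a concrete normal matrix; the following sketch is ours (the matrix chosen below is an assumption, not an example from the text).

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  1.0]])                  # A A^t = 2I = A^t A, so A is normal (but not symmetric)
assert np.allclose(A @ A.T, A.T @ A)

rng = np.random.default_rng(3)
v = rng.standard_normal(2)

# Relation (8.11): ||A v|| = ||A^* v||; here A^* = A^t since A is real.
assert np.isclose(np.linalg.norm(A @ v), np.linalg.norm(A.T @ v))
```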

The second one is stated in the following lemma.

Lemma 8.7.2. Let T be a normal operator on an inner product space V over a field F. If I is the
identity map on V, then for any λ ∈ F, the operator T − λI is also normal.

Proof. By properties of adjoints of operators given in Proposition (8.5.6), one has (T − λI)∗ = T ∗ −
λ̄I ∗ = T ∗ − λ̄I. A direct computation then establishes the equality of (T − λI)(T − λI)∗ and (T − λI)∗ (T −
λI), as T T ∗ = T ∗ T . ∎
Now the main result.

Proposition 8.7.3. Let T be a normal operator on an inner product space V over a field F.
(a) If T v = λv for some λ ∈ F and some v ∈ V, then T ∗ v = λ̄v. Thus, if v is an eigenvector of
T belonging to some eigenvalue λ, then v is also an eigenvector of T ∗ , but belonging to the
eigenvalue λ̄.
(b) If v1 and v2 in V are eigenvectors of T belonging to distinct eigenvalues λ1 and λ2 , then v1 and
v2 are orthogonal.

Proof. If T v = λv for some λ ∈ F and v ∈ V, then (T − λI)v = 0. Now, by the preceding lemma,
the operator T − λI is a normal operator, and so by Equation (8.11), applied to T − λI, we see
that ‖(T − λI)∗ v‖ = 0. Since only the zero vector in an inner product space has length zero, it then
follows that (T − λI)∗ v = 0. Now, using properties of adjoints, the last relation can be rewritten as
T ∗ v − (λ̄I ∗ )v = T ∗ v − λ̄v = 0, which proves the first assertion in (a). The second assertion in (a) follows
by the definition of an eigenvector.
Now, if v2 is an eigenvector of T belonging to the eigenvalue λ2 , then by (a) it follows that T ∗ v2 =
λ̄2 v2 . Therefore,

λ1 ⟨v1 , v2 ⟩ = ⟨λ1 v1 , v2 ⟩ = ⟨T v1 , v2 ⟩
            = ⟨v1 , T ∗ v2 ⟩
            = ⟨v1 , λ̄2 v2 ⟩
            = λ2 ⟨v1 , v2 ⟩.

As λ1 ≠ λ2 , the scalar ⟨v1 , v2 ⟩ = 0, which proves (b). ∎
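Both parts of the proposition can be observed numerically for a concrete normal matrix. The sketch below is an editorial aside; the matrix is our choice, with the two distinct eigenvalues 1 + i and 1 − i.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  1.0]])                  # normal, with eigenvalues 1 + i and 1 - i
assert np.allclose(A @ A.T, A.T @ A)

w, V = np.linalg.eig(A)
v1, v2 = V[:, 0], V[:, 1]

# (a) an eigenvector of A for lambda is an eigenvector of A^* for the conjugate of lambda
assert np.allclose(A.T @ v1, np.conj(w[0]) * v1)

# (b) eigenvectors belonging to the two distinct eigenvalues are orthogonal
assert np.isclose(np.vdot(v1, v2), 0.0)
```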


However, the question of the existence of eigenvalues of normal operators in general is quite tricky.
For example, the rotation through an angle θ on R2 is a normal operator (see Example 29 of this
section), but we have already seen that the rotation operator cannot have an eigenvalue if θ ≠ 0, π
(see Example 14 of Section 5.3). On the other extreme, the normal operator T or S of Example 32 of
this section, on an infinite-dimensional complex inner product space, also cannot have an eigenvalue,
a fact the reader can verify easily.
However, any linear operator on a finite-dimensional vector space over the field C of complex
numbers has at least one eigenvalue, as the characteristic polynomial of the operator factors into a
product of linear factors. In particular, a normal operator on a complex finite-dimensional inner product
space does have eigenvalues. But the remarkable fact about normal operators on such inner product
spaces is that there are enough eigenvectors to render them diagonalizable. The proof of this property
of normal operators depends on the following result on general operators; it is the inner product space
analogue of Schur’s Theorem (5.6.13) for algebraically closed fields.

Theorem 8.7.4. (Schur’s Theorem) Let T be a linear operator on a finite-dimensional inner
product space V over the field F, where F is either C or R. If the characteristic polynomial of T factors
over F into a product of linear factors, then there is an orthonormal basis of V with respect to which
the matrix of T is upper triangular.

Note that if V is a complex inner product space, that is, F = C, then the hypothesis about the
characteristic polynomial of T holds anyway as C is an algebraically closed field. (See Section 5.2).
Also note that as V is assumed to be finite dimensional, the adjoint of T exists.
Proof. The hypothesis about the characteristic polynomial of T implies that T has at least one eigen-
value, say, λ ∈ F. We begin our proof by claiming that T ∗ has an eigenvector because of the existence
of the eigenvalue λ of T . Note that we cannot apply Proposition (8.7.3) as we are considering a gen-
eral operator T and not a normal operator. In any case, we establish the claim by showing that λ̄
is an eigenvalue of T ∗ . To do that, choose any eigenvector v ∈ V of T belonging to the eigenvalue λ.
Then (T − λI)v = 0 and so for any u ∈ V, the inner product ⟨(T − λI)v, u⟩ = 0. Since, by properties of
adjoints,

⟨(T − λI)v, u⟩ = ⟨v, (T − λI)∗ u⟩ = ⟨v, (T ∗ − λ̄I)u⟩,

it then follows that

⟨v, (T ∗ − λ̄I)u⟩ = 0. (8.12)

Let U be the image of the operator T ∗ − λ̄I, that is, the subspace {(T ∗ − λ̄I)u | u ∈ V} of V. Therefore,
Equation (8.12) implies, according to the definition of the orthogonal complement U ⊥ of U, that v ∈
U ⊥ . However v, being an eigenvector, is non-zero and so U ⊥ is a non-zero subspace of V. Therefore,
the orthogonal decomposition V = U ⊕ U ⊥ of V (see Corollary 8.4.11) shows that U has to be a proper
subspace of V. In other words, the linear operator T ∗ − λ̄I is not onto, hence cannot be one–one as
V is finite-dimensional. Thus, the kernel of the operator T ∗ − λ̄I contains a non-zero vector. This proves our
claim, as any non-zero vector in the kernel is an eigenvector of T ∗ belonging to the eigenvalue λ̄.
Now we come to the proof of the theorem proper, which will be by induction on n = dim V. As
usual, if n = 1, the result is trivially true. So, let n > 1. By our claim in the preceding paragraph, we
can assume that T ∗ has an eigenvector. Since any non-zero scalar multiple of an eigenvector remains
an eigenvector, we may normalize the eigenvector. Thus, without any loss of generality, we assume
that the eigenvector v of T ∗ is a unit vector. Let W be the subspace of V spanned by v. If W ⊥ is the
orthogonal complement of W, then the decomposition V = W ⊕ W ⊥ shows that dim W ⊥ = n − 1. Now
observe that W is T ∗ -invariant as it is spanned by an eigenvector of T ∗ . It is then easy to see that W ⊥
is T -invariant (see Exercise 4 of this section). Let S be the restriction of T to the T -invariant subspace
W ⊥ . By a general result about invariant subspaces, we know that the characteristic polynomial of S
divides the characteristic polynomial of T (see Proposition 5.5.6). Therefore S , as a linear operator
on the (n − 1)-dimensional inner product space W ⊥ , satisfies the hypothesis of the theorem about its
characteristic polynomial. It follows, by the induction hypothesis, that there is an orthonormal basis
{v1 , v2 , . . . , vn−1 } of W ⊥ with respect to which the matrix of S is upper triangular.
As V is a direct sum of W ⊥ and W and as the unit basis vector v of W is orthogonal to each basis
vector vi of W ⊥ , we conclude that {v1 , v2 , . . . , vn−1 , vn = v} is an orthonormal basis of V; clearly the
matrix of T is upper triangular with respect to this basis. ∎
Now, we are ready to prove the main result about diagonalizability of normal operators.

Theorem 8.7.5. Let T be a linear operator on a finite-dimensional complex inner product space
V. Then T is normal if and only if V has an orthonormal basis consisting of eigenvectors of T .

Proof. To prove the theorem in one direction, let us assume that T is normal. Then, by Schur’s theo-
rem, there is an orthonormal basis B of V with respect to which the matrix of T is upper triangular. We
claim that the additional condition of normality of T forces the matrix A to be diagonal, so the basis
B itself is the required basis of eigenvectors of T .
To establish the claim, we let A = [ai j ] be the upper triangular matrix of T with respect to the
orthonormal basis B = {v1 , v2 , . . . , vn } of V. Then, by Proposition (8.5.2), the matrix of the adjoint T ∗
of T with respect to the same basis B is precisely the conjugate transpose A∗ of A. Let A∗ = [bi j ], so
that bi j = ā ji . Now, as A is upper triangular, T v1 = a11 v1 by the definition of A, so v1 is an eigenvector
of T with eigenvalue a11 . The first part of Proposition (8.7.3) then implies that T ∗ v1 = ā11 v1 . On the
other hand, by the definition of A∗ , we have

T ∗ v1 = Σi bi1 vi = Σi ā1i vi .

Comparing the two expressions for T ∗ v1 , we conclude that ā1i = 0 for all i ≥ 2. Thus, a1i = 0 for i ≥ 2.
In particular, a12 = 0. Then, going back to the second column of the upper triangular matrix A, we see
that T v2 = a22 v2 . This forces T ∗ v2 = ā22 v2 , so arguing the same way as in the preceding paragraph, we
can conclude that a2i = 0 for all i ≥ 3.
It is now clear that by continuing in a similar manner, we will be able to show that all the off-
diagonal entries of A are zeros. Hence the claim, which proves one half of the theorem.
To prove the converse, we assume that V has an orthonormal basis B consisting of eigenvectors
of T . Then the matrix A of T with respect to B is clearly diagonal. Consequently, the matrix A∗ of
the adjoint T ∗ , being the conjugate transpose of A, is also diagonal. Since any two diagonal matrices
commute, it follows that AA∗ = A∗ A, which implies that T T ∗ = T ∗ T . Thus, T is normal and the proof
is complete. ∎

We now consider the matrix versions of the preceding results. Recall that a matrix A ∈ Mn (F) is
normal if AA∗ = A∗ A. Given such a normal matrix A, we can associate a normal operator T on Fn the
usual way so that A is precisely the matrix of T with respect to the standard basis E of Fn . Of course,
Fn has to be considered an inner product space with respect to the standard inner product. Now if F = C,
according to the theorem we have just proved, there is an orthonormal basis B of Fn relative to which
the matrix of T is a diagonal one. Since the standard basis of Fn is an orthonormal basis with respect
to the standard inner product, the change of basis matrix from B to E is a unitary matrix, say U. Thus,
U −1 AU is a diagonal matrix.

Corollary 8.7.6. For any normal matrix A ∈ Mn (C), there is a unitary matrix U such that U ∗ AU is
diagonal.

This corollary is sometimes stated as: a complex normal matrix is unitarily diagonalizable.
We can similarly obtain the matrix version of Schur’s theorem.

Corollary 8.7.7. Given any matrix A ∈ Mn (C), there is a unitary matrix U such that U ∗ AU is upper
triangular.
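As a computational aside (not part of the text), SciPy's schur routine produces exactly such a factorization A = UTU∗, and for a normal matrix the triangular factor comes out diagonal, matching Corollary 8.7.6. The sketch below assumes SciPy and NumPy are available.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# Complex Schur factorization A = U Tri U*, with U unitary and Tri upper triangular (Corollary 8.7.7).
Tri, U = schur(A, output='complex')
assert np.allclose(U @ Tri @ U.conj().T, A)
assert np.allclose(U.conj().T @ U, np.eye(3))
assert np.allclose(Tri, np.triu(Tri))

# For a normal matrix the triangular factor is in fact diagonal (Corollary 8.7.6).
N = A + A.conj().T                           # hermitian, hence normal
D, V = schur(N, output='complex')
assert np.allclose(D, np.diag(np.diagonal(D)), atol=1e-8)
```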

As we have already seen, a normal operator on a real inner product space, in general, cannot be
diagonalized. Fortunately, any self-adjoint operator on a finite-dimensional real inner product space
is diagonalizable. Recall an operator T is self-adjoint if T = T ∗ . So the matrix A of T , with respect
to any orthonormal basis of the underlying finite-dimensional inner product space, is hermitian, that
is, A = A∗ . Since a real hermitian matrix is just a symmetric matrix, the discussion about self-adjoint
operators that follows also covers real symmetric matrices.
We begin with a result which highlights two crucial aspects of self-adjoint operators. The first is that
any eigenvalue of a self-adjoint operator is real even in the complex case. The second is that any such
operator must have eigenvectors even in the real case.

Lemma 8.7.8. Let T be a self-adjoint operator on a finite-dimensional inner product space V over
a field F.
(a) Any eigenvalue λ ∈ F of T is real.
(b) Eigenvectors of T belonging to distinct eigenvalues are orthogonal.
(c) The characteristic polynomial of T factors into linear factors over F.

Proof. If λ ∈ F is an eigenvalue of T , and v is a corresponding eigenvector, then, by Proposition (8.7.3),
T ∗ v = λ̄v so that the following holds:

λv = T v = T ∗ v = λ̄v.

Since v is a non-zero vector, it follows that λ = λ̄, which proves (a).


We leave the proof of (b) to the reader, and come to the proof of the crucial part (c). The result is
obvious if F = C, so we assume that V is a real inner product space. Let dim V = n and let B be an or-
thonormal basis of V. If A is the matrix of T relative to B, then we know that A itself is hermitian. Let
W = Cn be the space of all n × 1 column vectors over C, which is a complex inner product space with
respect to the standard inner product. Consider the linear operator L on W given by Lx = Ax. As the ma-
trix of L with respect to the standard orthonormal basis of W is clearly the hermitian matrix A, L itself
is a self-adjoint operator on W. On the other hand, the characteristic polynomial of L factors into linear
factors, say of the type x − λ over C (as W is a complex space and C is algebraically closed). But then
each λ appearing in this factorization must be an eigenvalue of the self-adjoint operator L and so must
be real. It follows that the characteristic polynomial of L, and therefore of A, factors into linear factors
over R. However, the characteristic polynomial of T is the same as that of A, and (c) follows. ∎

We reiterate the most important implication of the last lemma: a self-adjoint operator on a finite-
dimensional inner product space must have eigenvectors even if the space is over the field of real
numbers.
We have, by now, established all the crucial points needed for the proof of the main result about
self-adjoint operators on real inner product spaces. The complex case has already been tackled in
Theorem (8.7.5) as self-adjoint operators are also normal.

Theorem 8.7.9. Let T be a linear operator on a finite-dimensional real inner product space V.
Then, T is self-adjoint if and only if there is an orthonormal basis of V relative to which the matrix of
T is diagonal.

Proof. The characteristic polynomial of a self-adjoint operator T on the real inner product space V, by
the preceding lemma, factors into linear factors over R. Therefore, by Schur’s Theorem (8.7.4), there
is an orthonormal basis of V with respect to which the matrix A of T is upper triangular. The matrix of
the adjoint T ∗ , relative to the same orthonormal basis, is the conjugate transpose A∗ of A. However, as
T is self-adjoint, A = A∗ . Since A is upper triangular while A∗ is lower triangular, A must be diagonal.
Conversely, suppose the matrix A of T with respect to an orthonormal basis of V is diagonal. Since
the diagonal entries are real (the base field is R), it follows that A = A∗ . So T is self-adjoint. ∎

The matrix versions of the theorems are clear.

Corollary 8.7.10. A self-adjoint or hermitian matrix A ∈ Mn (C) is unitarily diagonalizable, that is,
there is a unitary matrix U ∈ Mn (C) such that U ∗ AU = U −1 AU is diagonal.

In particular, we see that a symmetric matrix A ∈ Mn (R) is orthogonally diagonalizable, that is,
there is an orthogonal matrix P ∈ Mn (R) such that P−1 AP = Pt AP is diagonal, a result we have already
proved in Section 5.3.
Since the diagonal entries of P−1 AP or U −1 AU are precisely the eigenvalues of A, the following is
clear, which we record anyway for reference.

Corollary 8.7.11. A hermitian or a real symmetric matrix of order n has n real eigenvalues if they
are counted according to their multiplicities.

For a real symmetric matrix A, it is trivial that for any x ∈ Rn , ⟨Ax, x⟩ is real. A similar result holds
for complex hermitian matrices too, whose proof is left to the reader.

Corollary 8.7.12. For a hermitian matrix A of order n, ⟨Ax, x⟩ is real for any x ∈ Cn with respect
to any inner product on Cn .

The actual procedure for obtaining an orthogonal matrix P that will diagonalize a given real sym-
metric matrix was illustrated in examples at the end of section 5.3. We apply the same procedure to
diagonalize a complex hermitian matrix.
EXAMPLE 33 The characteristic polynomial of the following hermitian matrix

A = [ 1    i ]
    [ −i   0 ]

is easily seen to be x² − x − 1. The discriminant is positive, and so, as predicted by
Lemma (8.7.8), the eigenvalues of A are real. Solving x² − x − 1 = 0, we see that
λ1 = (1/2)(1 + √5) and λ2 = (1/2)(1 − √5) are the distinct eigenvalues of A. Note
that λ1 + λ2 = 1 and λ1 λ2 = −1 so that the matrix A − λ1 I2 can be put in the form

[ λ2    i   ]
[ −i   −λ1  ].

Therefore, multiplying the first row of the matrix A − λ1 I2 by iλ1 and subtracting
the multiple from the second row, we see that A − λ1 I2 is row equivalent to

[ λ2   i ]
[ 0    0 ].

A routine calculation then shows that we may choose v1 = (i, −λ2 )t as an eigenvector
of A belonging to the eigenvalue λ1 . Similarly, we can choose v2 = (i, −λ1 )t as an
eigenvector corresponding to the eigenvalue λ2 . Let l1 and l2 be the lengths of v1 and
v2 respectively; note that l1² = 1 + λ2² = 2 + λ2 whereas l2² = 1 + λ1² = 2 + λ1 as both
λ1 and λ2 satisfy the relation x² = x + 1. It is clear that normalizing v1 and v2 , we get
an orthonormal basis of C2 . These orthonormal basis vectors form the columns of
the unitary matrix U which diagonalizes the given matrix. In other words, if

U = [ i/l1           i/l2       ]
    [ −(λ2 /l1 )    −(λ1 /l2 )  ]

then

U −1 AU = U ∗ AU = [ λ1    0  ]
                   [ 0     λ2 ].

A remark is in order. Note that in the example, we did not bother to verify that v1 and v2 are
orthogonal before normalizing them, as eigenvectors belonging to distinct eigenvalues of a normal
operator and hence of a hermitian matrix are orthogonal by Proposition (8.7.3). In case the eigenspace
belonging to an eigenvalue of a hermitian or a real symmetric matrix has dimension greater than 1,
one has to apply the Gram–Schmidt process to the eigenvectors chosen for the basis of the eigenspace
to produce an orthonormal basis; the union of such orthonormal bases for the distinct eigenvalues then
automatically becomes an orthonormal basis of the underlying inner product space.
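Example 33 can also be confirmed numerically (an aside, assuming NumPy): eigh returns the real eigenvalues (1 ± √5)/2 of the hermitian matrix together with a unitary U such that U∗AU is diagonal.

```python
import numpy as np

A = np.array([[1, 1j],
              [-1j, 0]])

w, U = np.linalg.eigh(A)                     # real eigenvalues, orthonormal eigenvectors in the columns of U

golden = (1 + np.sqrt(5)) / 2                # lambda_1 of the example; lambda_2 = 1 - golden
assert np.allclose(np.sort(w), [1 - golden, golden])
assert np.allclose(U.conj().T @ A @ U, np.diag(w))      # U is unitary and U*AU is diagonal
```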
Unitary operators as well as self-adjoint operators on a complex inner product space are normal. We
now present a result which allows us to distinguish between these two examples of normal operators
on a finite-dimensional space.

Proposition 8.7.13. Let T be a normal operator on a finite-dimensional complex inner product
space V. Then
(a) T is self-adjoint if and only if all its eigenvalues are real.
(b) T is unitary if and only if all its eigenvalues have absolute value 1.

Proof. We have already seen the nature of eigenvalues of self-adjoint and unitary operators in Lemmas
(8.6.8) and (8.7.8) so both the assertions need proofs only in one direction. Suppose T is a normal
operator on V such that all its eigenvalues are real. Fix any orthonormal basis of V and let A be the
matrix of T with respect to this basis. Since A is a normal matrix, it follows that we can find a unitary
matrix U such that U −1 AU = U ∗ AU = D, a diagonal matrix. The diagonal entries of the diagonal
matrix D are precisely the eigenvalues of A as well as of T , so are real by hypothesis. It follows that
D∗ = D. Using properties of conjugate transposes, we then see that

A∗ = (UDU −1 )∗ = (UDU ∗ )∗ = (U ∗ )∗ D∗ U ∗ = UDU ∗ = A.

The proof of the other assertion is similar, and left to the reader. ∎

The diagonalizability of a normal operator on a finite-dimensional complex inner product space or
of a self-adjoint operator on a finite-dimensional real inner product space enables us to express such
an operator in a particularly satisfying form in terms of its eigenvalues. To be precise, if λ1 , λ2 , . . . , λk
are the distinct eigenvalues of such an operator T , and P1 , P2 , . . . , Pk are the orthogonal projections
onto the corresponding eigenspaces, then T can be expressed as the sum λ1 P1 +λ2 P2 +· · ·+λk Pk . This
elegant formulation of a normal operator is known as the Spectral Theorem. We end this section by
considering this theorem and its consequences.
Recall from our discussion in Section 4 that the orthogonal projection P of a finite-dimensional
inner product space V on a given subspace W is the unique projection map of V having W as the range
and the orthogonal complement W ⊥ as the kernel. In fact, the orthogonal projection P and I − P are the
projections associated with the direct sum decomposition V = W ⊕ W ⊥ . Note that if v ∈ V is expressed
as a sum v1 + v2 with v1 ∈ W and v2 ∈ W ⊥ , then Pv = Pv1 = v1 and Pv2 = 0 (see Proposition 8.4.19).
Now we are ready to present the Spectral theorem.

Theorem 8.7.14. (The Spectral Theorem) Let T be a linear operator on a finite-dimensional inner
product space V over a field F. Suppose that T is normal if V is a complex inner product space, or that
T is self-adjoint if V is a real space. Assume further that λ1 , λ2 , . . . , λk are the distinct eigenvalues
of T . Let W j be the eigenspace of T corresponding to the eigenvalue λ j for 1 ≤ j ≤ k, and P j the
orthogonal projection of V on W j . Then, the following hold.
(a) V = W1 ⊕ W2 ⊕ · · · ⊕ Wk .
(b) For each j, let Ŵ j be the direct sum of all the eigenspaces Wi with i ≠ j. Then

W j ⊥ = Ŵ j .

(c) P j Pi = δi j Pi for 1 ≤ i, j ≤ k.
(d) If I is the identity map on V, then

I = P1 + P2 + · · · + Pk .

(e) T = λ1 P1 + λ2 P2 + · · · + λk Pk .

Proof.
(a) By hypothesis, T is diagonalizable whether V is a complex or a real inner product space. There-
fore V has to be the direct sum of the eigenspaces belonging to distinct eigenvalues by the
fundamental result about diagonalizable operators (see Theorem 5.3.20).
(b) We had seen in Proposition (8.7.3) and Lemma (8.7.8) that the eigenvectors of a normal or a
self-adjoint operator belonging to distinct eigenvalues are orthogonal. Thus, any eigenvector in
W j is orthogonal to any sum of vectors in Ŵ j = Σi≠ j Wi . It follows that Ŵ j ⊂ W j ⊥ . However,
the dimension of W j ⊥ equals dim V − dim W j as V = W j ⊕ W j ⊥ . On the other hand, by (a), the
dimension of Ŵ j also equals dim V − dim W j . Thus, the dimensions of W j ⊥ and Ŵ j are the same.
The inclusion Ŵ j ⊂ W j ⊥ therefore implies that W j ⊥ = Ŵ j .
(c) The preceding assertion implies that if i ≠ j, then every Wi is contained in the kernel of the
projection P j as W j is the range of P j . Moreover, Pi (V) ⊂ Wi . Thus, P j Pi must be the zero
operator on V if j ≠ i. Also note that P j ² = P j .
(d) Since V is the direct sum of the eigenspaces, given any v ∈ V, we have v = v1 + v2 + · · · + vk ,
where vi ∈ Wi . However, we have already noted that the kernel of P j contains all the eigenspaces

Wi for i ≠ j. It follows that P j v = P j v j = v j and so

v = P1 v + P2 v + · · · + Pk v

for every v ∈ V. (d) follows.


(e) Finally, we note that for any v j ∈ W j ,

T v j = λ jv j = λ j P jv j.

On the other hand, if v = v1 + v2 + · · · + vk with v j ∈ W j , then as P j v = P j v j , the last displayed


equation implies that

T v = λ1 P1 v + λ2 P2 v + · · · + λk Pk v

for any v ∈ V. This establishes (e).


∎

There are some terminologies associated with the assertions of the Spectral theorem. The set of
distinct eigenvalues of T is called the spectrum of T , the sum I = P1 + P2 + · · · + Pk is called the
resolution of the identity induced by T , and the sum T = λ1 P1 + λ2 P2 + · · · + λk Pk is called the
spectral resolution of T .
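The spectral resolution is straightforward to compute numerically for a real symmetric matrix. The sketch below is ours (the grouping of numerically equal eigenvalues is an implementation choice); it builds the orthogonal projections P j from the orthonormal eigenvectors returned by eigh and verifies assertions (d) and (e) of the theorem.

```python
import numpy as np

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 4.0, 1.0],
              [1.0, 1.0, 4.0]])              # real symmetric; eigenvalues 3, 3 and 6

w, Q = np.linalg.eigh(A)                     # orthonormal eigenvectors in the columns of Q

# Group equal eigenvalues and build the orthogonal projection onto each eigenspace W_j.
distinct = np.unique(np.round(w, 8))
projections = [Q[:, np.isclose(w, lam)] @ Q[:, np.isclose(w, lam)].T for lam in distinct]

# Resolution of the identity and spectral resolution of A.
assert np.allclose(sum(projections), np.eye(3))
assert np.allclose(sum(lam * P for lam, P in zip(distinct, projections)), A)
```

Here each P j is Q j Q jᵗ, where Q j collects an orthonormal basis of the eigenspace belonging to the j-th distinct eigenvalue.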
The following lemma is useful for deducing consequences of the spectral resolution of T .

Lemma 8.7.15. Let T be a normal operator on a complex inner product space V, or a self-adjoint
operator on a real inner product space V. If

T = λ1 P1 + λ2 P2 + · · · + λk Pk

is the spectral decomposition of T , then for any polynomial f (x) ∈ F[x],

f (T ) = f (λ1 )P1 + f (λ2 )P2 + · · · + f (λk )Pk .

Proof. We can compute the powers of T by using the properties of the orthogonal projections asserted
in the Spectral theorem. For example,

T ² = (Σi λi Pi )(Σ j λ j P j ) = Σi, j λi λ j Pi P j = Σ j λ j² P j

as Pi P j = δi j P j . So induction shows that for any positive integer k,

T ᵏ = Σ j λ jᵏ P j .

Therefore, if

f (x) = c0 + c1 x + · · · + cn xⁿ = Σk ck xᵏ ,

then an easy computation shows that f (T ) = Σk ck T ᵏ = Σ j f (λ j )P j , as required. ∎

We also need the following basic form of the widely used Lagrange’s interpolation formula:
given two sets of n + 1 complex numbers a0 , a1 , . . . , an and b0 , b1 , . . . , bn with the ai distinct, there is a
unique polynomial g(x), of degree at most n, over C such that

g(ai ) = bi for all i = 0, 1, 2, . . . , n.

To construct such a polynomial, begin by defining polynomials f0 , f1 , . . . , fn by

f j (x) = Πi≠ j (x − ai ).

It is clear that each f j is a polynomial of degree n such that f j (a j ) ≠ 0 and f j (ai ) = 0 whenever i ≠ j.
Therefore, if we set

g(x) = Σ j b j f j (x)/ f j (a j ),

then it follows that g(x) is a polynomial (over C) of degree at most n (and of degree n if all the b j are
non-zero) such that g(ai) = bi for 0 ≤ i ≤ n.
If q(x) is another polynomial over C of degree at most n such that q(ai ) = bi for all i, then the
polynomial g(x) − q(x) has n + 1 distinct roots a0 , a1 , . . . , an , contradicting the fact that a non-zero
polynomial over C of degree at most n can have at most n roots. This contradiction forces g(x) − q(x)
to be the zero polynomial, proving the uniqueness of the polynomial g(x).
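The construction above translates directly into code. The following sketch is ours (the helper name lagrange_poly is an assumption); it builds g from the f j exactly as described and checks the interpolation conditions.

```python
import numpy as np

def lagrange_poly(a, b):
    """Coefficients (highest power first) of the unique g with deg g <= n and g(a[i]) = b[i],
    built as in the text: g(x) = sum_j b[j] * f_j(x) / f_j(a[j]), f_j(x) = prod_{i != j}(x - a[i])."""
    m = len(a)                               # m = n + 1 interpolation points
    g = np.zeros(m)
    for j in range(m):
        f_j = np.poly([a[i] for i in range(m) if i != j])   # coefficients of f_j, degree m - 1
        g = g + b[j] * f_j / np.polyval(f_j, a[j])
    return g

a = [0.0, 1.0, 2.0]
b = [1.0, 3.0, 2.0]
g = lagrange_poly(a, b)
assert np.allclose(np.polyval(g, a), b)      # g(a_i) = b_i for every i
```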

Corollary 8.7.16. Let T be a linear operator on a finite-dimensional complex inner product space
V. Then T is normal if and only if T ∗ = g(T ) for some polynomial g(x) over C.

Proof. If T is normal, let T = λ1 P1 + λ2 P2 + · · · + λk Pk be the spectral resolution of T . Since any
orthogonal projection is self-adjoint (prove it), taking the adjoint of the spectral resolution of T , we see that

T ∗ = λ̄1 P1 + λ̄2 P2 + · · · + λ̄k Pk . (8.13)

Also, by the preceding lemma, for any polynomial g(x) over C, the spectral resolution of T gives

g(T ) = g(λ1 )P1 + g(λ2 )P2 + · · · + g(λk )Pk .

Now, since the λi are distinct, it is possible to choose, by Lagrange’s interpolation formula, a polynomial
g(x) over C such that g(λi ) = λ̄i for 1 ≤ i ≤ k. For this choice of g(x), one has

g(T ) = λ̄1 P1 + λ̄2 P2 + · · · + λ̄k Pk ,

which, by Equation (8.13), is precisely T ∗ .
Conversely, if T ∗ is a polynomial in T , then T commutes with T ∗ as T commutes with any polyno-
mial in T . Normality of T is thus trivial. ∎

In the proof of the last corollary, we had noted that if T = λ1 P1 + λ2 P2 + · · · + λk Pk is the spectral
resolution of T , then the adjoint too has a resolution T ∗ = λ̄1 P1 + λ̄2 P2 + · · · + λ̄k Pk . It follows

that

T T ∗ = (λ1 P1 + λ2 P2 + · · · + λk Pk )(λ̄1 P1 + λ̄2 P2 + · · · + λ̄k Pk )
= |λ1 |²P1 + |λ2 |²P2 + · · · + |λk |²Pk

by using the properties of the P j stated in the Spectral theorem. Thus, in case the eigenvalues of T
have absolute value 1, it follows from the resolution of the identity that T T ∗ = I, or T is unitary. Thus,
we have been able to derive Proposition (8.7.13) again.

EXERCISES

1. Determine whether the following assertions are true or false giving brief justifications. Assume
that the underlying inner product spaces are finite-dimensional.
(a) A real symmetric matrix is normal.
(b) A complex symmetric matrix is normal.
(c) A real symmetric matrix of order n has n real eigenvalues.
(d) The sum of two normal operators is normal.
(e) The product of two normal matrices is normal.
(f) If A is a normal matrix, then so is A2 .
(g) A scalar multiple of a normal operator is normal.
(h) For a self-adjoint operator T , a subspace W is T -invariant if and only if W ⊥ is T -invariant.
(i) Every self-adjoint operator is diagonalizable.
(j) Eigenvectors of an operator and its adjoints are the same.
(k) Eigenvalues of an orthogonal operator must have absolute value 1.
(l) If a normal operator is nilpotent, then it must be the zero operator.
(m) The dimension of any eigenspace of a real symmetric matrix equals the algebraic multiplic-
ity of the corresponding eigenvalue.
(n) Eigenspaces belonging to distinct eigenvalues of a hermitian matrix are orthogonal.
(o) Any projection is a self-adjoint operator.
(p) A self-adjoint operator is a linear combination of orthogonal projections.
(q) An orthogonal projection is uniquely determined by its range.
2. Prove that the eigenvectors of a normal operator on a finite-dimensional inner product space
belonging to distinct eigenvalues are orthogonal.
3. Prove that if two normal operators on an inner product space commute, then their product is
normal. Is the assertion true if “normal” is replaced by “self-adjoint”?
4. Let T be a linear operator on an inner product space V having adjoint T ∗ . If W is a T ∗ -invariant
subspace of V, prove that W ⊥ is T -invariant.
5. Let T be a normal operator on a finite-dimensional complex inner product space V. Use the diagonal-
izability property of T to show that any T -invariant subspace of V is T ∗ -invariant.
6. For a normal operator T on a finite-dimensional inner product space, prove that ker T = ker T ∗ ,
and Im(T ) = Im(T ∗ ).

7. Let T be a linear operator on an inner product space V, W a T -invariant subspace of V and T W


the restriction of T to W. Prove the following assertions:
(a) If T is self-adjoint, so is T W .
(b) If W is also T ∗ -invariant, then (T W )∗ = (T ∗ )W .
(c) If W is also T ∗ -invariant, and T is normal, then T W is normal.
8. Determine whether the following operators on indicated inner product spaces are normal or
self-adjoint:
(a) R2 with standard inner product; T (x1 , x2 ) = (x1 − 2x2 , 3x1 + 2x2 ),
(b) C2 with standard inner product; T (z1 , z2 ) = (z1 + iz2 , z1 + z2 ),
(c) M2 (R) with inner product ⟨A, B⟩ = Tr(AB∗ ); T (A) = At ,
(d) R2 [x] with inner product ⟨ f, g⟩ = ∫₋₁¹ f (t)g(t) dt; T ( f (x)) = f ′ (x).
9. If a normal operator on a finite-dimensional inner product space is a projection, show that it
must be an orthogonal projection.
10. Let A be a real symmetric matrix of order n having eigenvalues λ1 , λ2 , . . . , λn . Prove that A can
be expressed as a sum:

A = λ1 u1 u1 t + λ2 u2 u2 t + · · · + λn un un t

of n rank-1 matrices, where u1 , u2 , . . . , un form an orthonormal basis of Rn .


11. Each of the following real symmetric matrices A represents a linear operator T on R2 or R3 , as
the case may be, with respect to the standard basis. Find the spectral resolution of each T :

(a) A = [  0  −1 ]
        [ −1   0 ],

(b) A = [ 2  1  0 ]
        [ 1  2  1 ]
        [ 0  1  2 ],

(c) A = [  1  −2   0 ]
        [ −2   0   1 ]
        [  0   1  −1 ].
12. For each of the following hermitian matrices A, find a unitary matrix U such that U ∗ AU is
diagonal.
(a) A = [  1   i ]
        [ −i   2 ],

(b) A = [ 2        1 − i ]
        [ 1 + i    1     ].
−i 2 1+i 1
Also, find the spectral resolutions of the linear operators on C2 represented by these matrices
with respect to the standard basis of C2 .
13. For each of the following normal matrices A, find a unitary matrix U such that U ∗ AU is diag-
onal. Also, determine the spectral decompositions of the linear operators on C2 represented by
the matrices with respect to the standard basis.
 
' ( 0 0 i
1 1+i 1−i  
14. (a) A = ; (b) A = 0 i 0.
2 1−i 1+i  
i 0 0
15. Consider a normal operator T on a finite-dimensional inner product space V with T = Σi λi Pi
as its spectral resolution. Prove the following assertions:
(a) Each Pi is a polynomial in T .
(b) If T n is the zero operator on V for some positive integer n, then T itself is the zero operator.
(c) If a linear operator S on V commutes with T , then S commutes with each Pi .
(d) If S commutes with T , then S commutes with T ∗ .
(e) T is invertible if and only if λi ! 0 for all i.


(f) T is a projection if and only if every eigenvalue λi of T is 1 or 0.
(g) T = −T ∗ if and only if every eigenvalue λi of T is imaginary.
16. Prove Corollary (8.7.5).
17. Prove Corollary (8.7.12).
18. Show that there is no hermitian or real symmetric matrix A of order n such that A⁴ + A³ + A² + A + In
is the zero matrix.



Index
A orthogonal complement of, 361
orthogonality of, 360–361
Abelian group, 15, 45–46
preserve, 374
Addition of matrices, 5, 34
radical of, 356
associative property, 16
scalar multiple, 350–351
basic properties of, 16–24
skew-symmetric, 347–348
commutative property, 16
Sylvester’s Law of Inertia, 364–365
m × n matrices, 45
symmetric, 347, 360–371
rules governing, 15
symmetric non-degenerate, 366
Additive inverse of matrix, 6
Binary operation, 15, 45–48
Adjacent rows of a matrix, 102–103
Block matrix, 36
Adjoints of operators, 403–407
Block multiplication, 37, 39–42
Algebraic multiplicity, 247
Blocks, 36–38
Algorithm for finding inverse, 79
as entries of a matrix, 40
Arbitrary non-zero scalar, 57
Block upper triangular matrix, 40–41

B C
Backward substitution, 83 Canonical forms:
Basic variables of original system primary decomposition theorem, 321–327
of equations, 69 Canonical homomorphism, 218
Bessel’s inequality, 400–401 Cauchy–Schwarz inequality, 387–388
Bilinear forms: Cayley–Hamilton theorem, 299
adjoint, 359 Characteristic polynomials, 246–249
alternating, 347–348 Circulant matrix, 267–268
characteristic of a field, 362 Coefficient matrix, 9–11, 50, 52, 66
congruent, 352 Column partition, 38
diagonalization of symmetric matrices, Column rank of a matrix, 153, 156
369–370 Column-row expansion, 11–12
dual space, 355–358 Column–row multiplication of matrices, 42
groups preserving, 374–378 Column space of a matrix, 153
invertible matrix, 364 Column vector, 2
linear functionals, 355–358 Commutative group, 45
linearity, 346 Commutative ring, 48
matrix of f with respect to the basis B, Complex matrix, 2
349–350 Components of the vector, 5
on n-dimensional vector space, 351 Conjugate, 34, 163–164
non-degenerate, 357–358 Conjugate transpose, 34, 381
null space of, 356 Coordinates, 140–141
orthogonal basis of, 361–363 Coordinate vector, 140–141


Coset of subspace, 185–186 of order m, 55–57


addition and scalar multiplication, 186–188 permutation matrix, 60
linear independence of, 189 Elementary row operation, 55–62
Cramer’s rule, 108–111 Elementary symmetric
polynomials, 309
D Ellipsoid, 318
Elliptic cone, 319
Determinant of matrix, 29 Elliptic cylinder, 319
of arbitrary invertible matrix, 105, 110–111 Elliptic paraboloid, 319
Cramer’s rule, 108–111 Endomorphisms, 192
elementary matrices and, 103–108 Equivalence relation, 55
expansion of det A by minors along Extension field, 48
first row, 97
Laplace’s expansion, 106
F
of lower triangular matrix, 97–98
multiplicative, 107 Factor theorem, 242
n × n, 96 Fields, 2, 47
recursive definitions, 96–97 Finite-dimensional vector
row operations and, 98–103 space, 135–136
of transpose, 106–107 dimension of a, 138
Diagonalizable matrix, 251–252, 258–259 subspaces of, 147–149
Diagonal matrix, 3, 33 transition matrix, 144
Differential map, 194–195 Finite groups, 46
Direct sum of matrices, 286–287 Fixed column vectors, 70
of elementary Jordan blocks with Forward substitution, 83
eigenvalue, 336 Fourier expansion, 391
Direct sum of vector space, 200–201 Fourier matrix, 268
Distributive laws, 18–19, 47, 209 Free variables of original system
for scalar multiplication, 117 of equations, 69
Division algorithm, 240 Fundamental Theorem of Algebra, 242
Divisor, 240
Dot product, 7, 163–164, 169, 374 G
Dual space, 206, 355–358
Gaussian elimination, 49–53
Generalized eigenspaces, 330
E
Generating set, 123
Eigenspaces, 256–258 Generator, 240
for an eigenvalue, 247 Geometrical multiplicity of an eigenvalue,
Eigenvalues, 243–265 256–257
eigenspace for an, 247 Gram–Schmidt orthogonalization process,
geometrical multiplicity of, 256–257 170–174, 390–401, 423
method for finding, 245–246 in an arbitrary inner product space,
minimal polynomials and, 299–304 394–395
of real symmetric matrices, 259–265 best approximation to v by vectors,
Eigenvectors, 243–265 397–398
computing, 249–256 orthogonal complements of subspaces, 394
of real symmetric matrices, 259–265 orthogonal decomposition, 395–396
Elementary column operations, 55 orthogonal projection, 393–394, 399–400
Elementary Jordan block, 231 Greatest common divisor (gcd), 241
Elementary matrices: Group, 45
and determinants, 103–108 Group of isometries, 375

H

Hermitian forms:
  on an infinite-dimensional vector space, 383–384
  conjugate transpose, 381
  self-adjoint matrix, 381
  standard hermitian product, 381
  standard inner product, 381
  symmetry, 381
Homogeneous equation, 10
Homogeneous system of linear equations, 71
  invertible of, 77–78
Homomorphisms, 192
  canonical, 218
Horizontal blocks, 39
Hyperbolic cylinder, 319
Hyperbolic pair, 37
Hyperbolic paraboloid, 319
Hyperbolic plane, 37
Hyperboloid of one sheet, 319
Hyperboloid of two sheets, 319

I

Identity matrix, 3, 33, 59
Imaginary ellipsoid, 318
Improper subspace, 118
Inclusion map, 194
Index, 365
Infinite-dimensional vector space, 135–136
Inner product space:
  Cauchy–Schwarz inequality, 387–388
  length of vectors in, 387
  orthogonal, 386–387
  orthonormal basis, 390
  positive definiteness, 385
  Pythagoras’ identity, 387–388
  standard inner product, 381, 386
  triangle inequality, 387
  unit vectors in, 387
Invariants of nilpotent matrix, 333
Invariant subspaces, 283–295
Inverses:
  of an invertible lower triangular matrix, 80
  of an invertible matrix using determinant, 29
  by LU factorization, 90–92
Invertibility of a square matrix, 161
Invertible lower triangular matrix, inverse of an, 80
Invertible matrix, 27–29, 77–80
  LDV factorization of, 91
  linear independence of, 143
  nullity of, 159
  propositions, 28–29
Invertible operator, 210–212
Irreducible polynomial, 241–243
Isomorphism, 215–219
Isomorphism theorems, 218

J

Jordan–Chevalley decomposition, 324
Jordan forms:
  as a direct sum of elementary Jordan blocks, 336
  generalized eigenspaces, 330
  of a linear operator, 337–340
  nilcyclic bases, 330
  of a nilpotent operator, 329–335
  similarity of matrices, 340–343

K

Kronecker delta symbol, 23, 390

L

Lagrange’s interpolation formula, 426
Laplace’s expansion by minors along the ith row, 106
LDV factorizations, 91–92
Leading entry of a row, 67
Length of a vector, 164
Linear functionals, 206, 355–358
Linear independence of vectors, 127–133
  basis of vector space, 135–137
  column-row multiplication, 130
  definition of, 127
  examples, 128–129
  Gram–Schmidt process, 171
  maximal, 132, 135
  of non-empty subset of a vector space, 128
  non-trivial, 127
  of preceding vectors, 132–133
  testing of, 130–131
  theoretical importance, 131–133
  trivial, 127
Linearly dependent vectors, 127
Linear maps, 12, 191
  algebra of, 204–212
  as basis vectors, 196–198
  bijection, 198
  composition of, 207–208
  differential map, 194–195
  dimension formula, 198–200
  dimension of spaces, 206
  direct sums, 200–201
  domain, 192
  dual space, 206
  of finite-dimensional vector spaces, 199
  image of, 191, 193–196
  inclusion map, 194
  injective, 198
  kernel, 193–196
  linear functionals, 206
  matrices of, 221–234
  pre-image of, 191
  projections, 194, 200–201
  range, 192
  ranks and nullities of matrices representing, 225–226
  singular maps, 226
  surjective, 198
  vector spaces of, 205–206
  zero map, 193
Linear operators, 222
  on an n-dimensional vector space, 226, 247
  eigenvalues and eigenvectors of, 243–265
  invertible, 210–212
  nilpotent, 212
  ring of, 208–210
Linear transformations, 191–192
Lower triangular matrix, 4, 84
  determinants of, 97–98
  inverse of an invertible, 80
  proposition, 8–9
LU factorization, 82–83
  computing inverses, 90–92
  construction of L, 85–87
  existence of, 84
  permuted, 93–94
  square matrix, 82–83

M

Matrix. See also Addition of matrices; Determinant of matrix; Elementary matrices; Invertible matrix; Symmetric matrix
  addition of, 5
  additive inverse of, 6
  adjacent rows of a, 102–103
  augmented, 52–53, 66
  block, 36
  block upper triangular, 40–41
  circulant, 267–268
  coefficient, 9–11, 50, 52, 66
  column rank of, 153, 156
  column–row multiplication of, 42
  column space of, 153
  complex, 2
  diagonal, 3, 33
  diagonalizable, 251–252, 258–259
  direct sum of, 286–287
  equality, 4–5
  Fourier, 268
  identity, 3, 33, 59
  lower triangular, 4
  min, 374
  multiplication of, 5, 10, 20–21, 34, 195–196
  nilpotent, 112
  notation for systems of linear equations, 9–11
  nullity of a, 158–161
  operations, 15–24
  orthogonal, 163–176
  orthogonally similar, 414
  partitioned, 36–42
  permutation, 60–62, 267
  real, 2
  rectangular, 2
  reflection, 169
  rotation, 169
  row echelon, 66–67, 87
  row equivalent, 55, 59–60
  row reduction of, 65–74, 79, 84–85, 90
  scalar, 3, 5, 6, 33
  singular, 27
  skew-symmetric, 33–34
  square, of order n, 3
  stochastic, 27
  subtraction of, 6
  symplectic, 378
  transition, 27, 144
  triangular, 78
  unit, 21–23
  unitary, 167–170
  upper triangular, 4
  zero, 4, 33, 86
Maximal linearly independent sets of vectors, 132, 135
M-dimensional column vectors, 3, 11, 41, 93, 130–131, 153
M equations in n variables, general system of, 9–10
Minimal polynomials, 271–280
  annihilators, 272–274
  applications of, 277–280
  eigenvalues and, 299–304
  least positive degree in, 275
  satisfied by linear operators, 271–272
  unique monic polynomial, 275–276
Min matrix, 374
Minors, 96–97
M × n matrix, 2–3, 7–8, 11, 23, 58, 130, 136, 158
Monic polynomial, 238
Multiple, 240
Multiplication of matrix, 5, 34, 195–196
  properties, 20–21
Multiplicative identity, 46
Multiplier, 85

N

N-dimensional column vectors, 4–5, 19–20, 114, 130, 175, 226, 261, 381
N-dimensional row vectors, 114
Newton’s identities, 309
Nilpotent matrix, 112
Nilpotent operator, 212
  cyclic subspaces for, 294–295
  invariants of, 333
  Jordan forms of, 329–335
Non-negative remainder of integer, 46
Non-negative square root, 164
Non-zero complex number, 46
Non-zero divisors of zero, 23
Non-zero entry of a row, 67
Non-zero integers, 46
Non-zero orthogonal vectors, 167
Non-zero polynomial, degree of, 238
Non-zero row of a matrix, 67
Non-zero solution of the vector equation, 130
Non-zero subspace, 243
Normal operators, 416–427
Notation:
  for cosets, 186
  of negative of a matrix, 6
  for summation, 7
  for systems of linear equations, 9–11
  unit matrix, 23
Nullity of a matrix, 158–161
  invertible matrix, 159
  rank and, 159
Null space, 158–161

O

One-dimensional subspace, 139
One-sided inverse, 28
Ordered n-tuple, 4
Ordered pair, 4
Ordered triple, 4
Orthogonal group, 375
Orthogonally similar matrix, 414
Orthogonal matrices, 163–176, 375
  multiplication, 168–169
Orthogonal operators, 409–414
Orthogonal set of vectors, 167
Orthogonal vectors, 165–166
Orthonormal basis, 167
  of the column space of the matrix, 173
Orthonormal set of vectors, 167

P

Pair of planes, 319
Parabolic cylinder, 319
Parallelogram law, 115, 165, 170
Partitioned matrix, 36–42
Permutation matrix, 60–62, 267
Permuted LU factorization, 93–94
Pivot, 67
Pivot column, 67, 131, 157–158, 160
Pivot position, 67
Polarization identity, 410
Polynomials:
  addition of, 239
  characteristic, 246–249
  divisibility properties of, 239–241
  irreducible, 241–243
  minimal, 271–280
  monic, 238
  multiplication of, 239
  scalar multiplication of, 238–239
Polynomials of degree zero, 238
Positive definite matrix, 369
Positive semi-definite matrices, 369
Power sums, 309
Preserve inner products, 410
Primary decomposition theorem, 321–327
Principal axes theorem, 312–313
Principal ideal domain (PID), 240
Proper subspace, 118
Pseudo-orthogonal groups, 375
Pythagoras’ identity, 387–388
Pythagoras’ theorem, 165

Q

QR factorization, 163, 174–176, 401
Quadrics, 318–319
Quotient space, 184–189
  dimensions of, 188–189

R

Rank of a matrix:
  column, 153, 156
  non-zero row vectors, 155
  row, 153–155
Real matrix, 2
Real quadratic forms, 310–312
  conic section, 314
  ellipse, 315
  parabola, 316
  principal axes theorem, 312–313
Rectangular matrix, 2
Rectangular non-zero matrix, 68
Reduced column vector, 69
Reduced row echelon, 77–79, 131
Reduced row echelon form of a matrix, 157, 159–160
Reflection matrix, 169
Reflection of Rn about W, 412
Resolution of the identity induced by T, 425
Ring, 48
  of linear operators, 208–210
Ring with identity, 48
Rotation matrix, 169
Row echelon form of augmented matrix, 70–71
Row echelon matrix, 66–67, 87
Row equivalent matrix, 55, 59–60
Row exchange, 55
Row partition, 38
Row rank of a matrix, 153
  reduced row echelon form of, 154–156
  of a row echelon, 155
Row reduction of matrices, 65–74, 79, 84–85, 90
Row replacement, 55
Row scaling, 55
Row space of a matrix, 153
Row vector, 2

S

Scalar matrix, 3, 33
Scalar multiple, 5
Scalar multiplication, 5
  of matrices, 16–17
  of a polynomial, 238
Scalar product, 7
Scalars, 2, 116
Schur’s theorem, 305–306, 419
Self-adjoint matrix, 381
Signature, 365
Similarity classes, 232–233
Single matrix equation, 11
Singular matrix, 27
Skew-symmetric matrix, 33–34
SN decomposition, 324
Solution of the system, 10, 50
Solution set of system, 50
Solutions of systems of linear equations, 69–74
Solution space, 119
Spanning set, 123
Spectral resolution of T, 425
Spectral theorem, 424–425
Spectrum of T, 425
Square matrix, 78, 136
  LU factorization of a, 82–83
Square matrix of order n, 3
Square submatrix, 42
Standard hermitian product, 381
Standard inner product, 163, 381, 386
Stochastic matrix, 27
Subfields, 47
Submatrices, 37
Subspaces, 117–121
  bases of, 178–183
  basis of, 138–139
  coset of, 185–186
  cyclic, 290–292
  direct complement of, 151
  direct summands of, 150
  direct sums of, 149–151
  of a finite-dimensional vector space, 147–149
  generated, 121, 123
  internal direct sum of, 150
  invariant, 283–295
  linear combination of, 121
  linear span, 122
  one-dimensional, 139
  ordered bases of, 151
  orthogonal complement of, 166
  orthonormal basis of a, 170
  rank of, 157
  spanned, 121, 123
  sum of, 121–123
  T-annihilator, 292
  T-cyclic basis of, 291–292
  T-invariant, 284
  T-nilcyclic, 292
  two-dimensional, 139
  union of bases of, 150
  zero-dimensional, 139
Subtraction of matrices, 6
Sum, 5
Sylvester’s Law of Inertia, 364–365
Symmetric matrix, 33–34, 151, 347, 360–371
  aspects of psd (pd) matrices, 370–371
  characteristic of a field, 362
  diagonalization of, 369–370
  index, 365
  invertible, 364
  orthogonal basis of, 361–363
  orthogonal complement of, 361
  orthogonality of, 360–361
  positive definite matrix, 369
  positive semi-definite matrices, 369
  signature, 365
  Sylvester’s Law of Inertia, 364–365
Symmetric permutation matrix, 60
Symplectic matrices, 378
System of linear equations, 9–11
Systems of linear equations:
  augmented matrix of a system, 52–53, 66
  coefficient matrix, 50, 52, 66
  consistent, 50–51
  equivalent, 50, 65–66
  Gaussian elimination, 49–53
  inconsistent, 50–51
  multiplication, 51–52
  possibilities of solutions, 50–51
  solution of, 50
  solution set of, 50
  solutions of, 69–74

T

T-annihilator, 292
T-cyclic subspace, 290
T-cyclic vector, 291
T-invariant subspace, 284, 322
T-nilcyclic subspace, 292
Transition matrix, 27, 144
Translation operator, 202
Transpose of a matrix, 32–34
Transposition, 61
Triangle inequality, 387
Triangular matrix, 78
Two-dimensional subspace, 139
Two-sided inverse, 28

U

Unique Factorization Theorem, 243
Unitarily similar matrix, 414
Unitary matrices, 167–170
Unitary operators, 409–414
Unit matrices, 21–22
  properties of, 22–23
Unit vector, 164
Upper triangular matrix, 4, 78
  representations, 304–306

V

Vectors, 116
Vector spaces, 48
  addition and multiplication of polynomials, 120–121
  additive inverse of u, 116–117
  axioms of, 117–118
  basis as an ordered basis, 141
  basis of subspace, 138–139
  change of basis, 144–145
  coordinates, 140–141
  coordinate vector, 140–141
  dimension of, 137–145
  direct sum of, 200–201
  distributive law for scalar multiplication in, 117
  examples of, 118–121
  finite-dimensional, 135–136
  homomorphisms, 192
  infinite-dimensional, 135–136, 195
  of invertible matrix, 143
  lemma, 137–138
  of linear maps, 205–206
  of m-dimensional vector space forms, 139
  n-dimensional column vectors, 114
  n-dimensional row vectors, 114
  non-empty subset of, 123
  non-empty subsets of, 185
  in a non-zero subspace, 139
  over field F, 115–116
  standard basis of, 136
  subspaces, 117–121
  subtraction in, 116
  sum of subspaces, 121–123
  test for basis, 139–140
  transition matrix, 144
  vectors in arbitrary, 139–140
Vertical blocks, 39

Z

Zero blocks, 40
Zero-dimensional subspace, 139
Zero ellipsoid, 319
Zero map, 193
Zero matrix, 4, 33, 86
Zero subspace, 118
Zero vector, 116
