
Advanced Linear Algebra with Applications

Mohammad Ashraf · Vincenzo De Filippis · Mohammad Aslam Siddeeque

Mohammad Ashraf
Department of Mathematics
Aligarh Muslim University
Aligarh, Uttar Pradesh, India

Vincenzo De Filippis
Department of Engineering
University of Messina
Messina, Italy

Mohammad Aslam Siddeeque


Department of Mathematics
Aligarh Muslim University
Aligarh, India

ISBN 978-981-16-2166-6 ISBN 978-981-16-2167-3 (eBook)


https://doi.org/10.1007/978-981-16-2167-3

Mathematics Subject Classification: 15A03, 15A04, 15A18, 15A20, 15A21, 15A23, 15A42, 15A63

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface

Linear Algebra is a fundamental branch of mathematics and is regarded as one of the most powerful mathematical tools, having acquired enormous applications in diverse fields of study, viz. the basic sciences, engineering, computer science, economics, ecology, demography, genetics, etc. The present book has been designed for advanced undergraduate students of linear algebra. The subject matter has been developed in such a manner that it is accessible to students who have very little background in linear algebra. A rich collection of examples and exercises has been added at the end of each section to help readers attain a conceptual understanding. Basic knowledge of various notions, e.g., sets, relations and mappings, is assumed.
This book comprises twelve chapters. It is expected that students will already be familiar with most of the material presented in the first chapter, which starts with a quick review of basic literature on groups, rings and fields, essential components for defining a vector space over a field. It then goes into the details of matrices and determinants. Besides discussing elementary row operations and the rank of matrices, methods for finding solutions of systems of linear equations are given. Chapter 2 opens with the introduction of vector spaces and their various examples. Apart from discussing linearly independent sets, linearly dependent sets, subspaces and their algebra, linear span, quotient spaces, bases and dimension, several exercises have been included on these topics. After a brief geometrical interpretation in the case of vector spaces over the real field, change of basis is treated. Chapter 3 introduces linear transformations on vector spaces, the kernel and range of a linear transformation, and nonsingular, singular and invertible linear transformations. Further, the basic isomorphism theorems, the algebra of linear transformations, the matrix of a linear transformation and vice versa, and the effect of a change of basis on the matrix representation of a linear transformation are also discussed in this chapter. Chapter 4 treats linear functionals and dual spaces. Some results related to the annihilator of a subset of a vector space and to hyperspaces (or hyperplanes) have been included. Finally, the dual (or transpose) of a linear transformation and its matrix representation relative to ordered dual bases are discussed.


Chapter 5 is based on inner product spaces. Orthogonality and orthonormality of a subset of a vector space are discussed. It also contains applications of the Gram-Schmidt orthonormalization process in obtaining orthonormal bases, orthogonal complements, and the Riesz representation theorem in finite-dimensional inner product spaces. Finally, operators on inner product spaces, viz. adjoint operators, self-adjoint operators and isometries, are studied. Chapter 6 is devoted to the study of canonical forms of an operator. This chapter opens with eigenvalues and eigenvectors of a linear transformation. Further, triangularizable and diagonalizable operators are highlighted. We conclude this chapter with a study of the Jordan canonical form of an operator, the Cayley-Hamilton theorem, the minimal polynomial and normal operators on inner product spaces. Chapter 7 deals with the study of bilinear and quadratic forms on Euclidean spaces. Proofs of the main theorems regarding symmetric, skew-symmetric and quadratic forms have been included. Chapter 8 is devoted to the study of sesquilinear and Hermitian forms. Besides the matrix of a sesquilinear form, the effect of a change of basis has also been included. Chapter 9, which is novel in a book of this kind, illustrates a more advanced view of the tensor product of vector spaces and the tensor product of linear transformations. A unified procedure is presented to establish the existence and uniqueness of the tensor product of two vector spaces. Furthermore, we provide a fairly detailed introduction to tensor algebra. The final section gives an introduction to exterior algebra, or Grassmann algebra.
The last three chapters focus on empowering readers to pursue interdisciplinary applications of linear algebra in numerical methods, analytical geometry and the solution of linear systems of differential equations. In detail, Chap. 10 initially provides an introduction to the LU and PLU decompositions of matrices and their applications in solving systems of linear equations. Furthermore, we briefly discuss the dominant eigenvalue and the corresponding dominant eigenvector of a square matrix and apply the power method, which gives an approximation to the eigenvalue of greatest absolute value and a corresponding eigenvector. This chapter ends with the singular value decomposition, which is a powerful tool in machine learning, data mining, telecommunications, digital image processing, spectral decomposition, discrete optimization, etc. We demonstrate how SVD-based compression works on images. Chapter 11 is dedicated to estimating distances between points, lines and planes, as well as angles between lines and planes, in Euclidean spaces by using methods from the theory of inner product spaces. The chapter concludes with an application of eigenvalues, eigenvectors, canonical forms of linear operators and quadratic forms to the study and classification of conic curves and quadric surfaces in affine, Euclidean and projective spaces. Chapter 12 is devoted to methods for solving linear systems of ordinary differential equations, using techniques associated with the calculation of eigenvalues, eigenvectors and generalized eigenvectors of matrices. Further, it describes how to represent a system of differential equations in matrix form and then, using the Jordan canonical form and, whenever possible, the diagonal canonical form of matrices, solve linear systems of differential equations in a very efficient way. Finally, it is shown that the method linked with the solution of such systems also provides a way of dealing with the problem of solving differential equations of order n.

This book is self-contained. Throughout the book, stress is laid on understanding of the subject. We hope that this book will help readers gain enough mathematical maturity to understand and pursue any advanced course in linear algebra with greater ease and understanding. We welcome comments and suggestions from the readers.
The authors gratefully express their deep gratitude to all those whose textbooks on linear algebra they have used in learning the subject and writing this text. We are also thankful to our colleagues and research scholars for their valuable comments and suggestions.

Aligarh, India Mohammad Ashraf


Messina, Italy Vincenzo De Filippis
Aligarh, India Mohammad Aslam Siddeeque
Contents

1 Algebraic Structures and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


1.1 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Fields with Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.5 System of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.1 Definitions and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.2 Linear Dependence, Independence and Basis . . . . . . . . . . . . . . . . . 55
2.3 Geometrical Interpretations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.4 Change of Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.1 Kernel and Range of a Linear Transformation . . . . . . . . . . . . . . . . 80
3.2 Basic Isomorphism Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.3 Algebra of Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.4 Nonsingular, Singular and Invertible Linear
Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.5 Matrix of a Linear Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.6 Effect of Change of Bases on a Matrix Representation
of a Linear Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4 Dual Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.1 Linear Functionals and the Dual Space . . . . . . . . . . . . . . . . . . . . . . 111
4.2 Second Dual Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.3 Annihilators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.4 Hyperspaces or Hyperplanes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.5 Dual (or Transpose) of Linear Transformation . . . . . . . . . . . . . . . . 126


5 Inner Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131


5.1 Inner Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.2 The Length of a Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.3 Orthogonality and Orthonormality . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.4 Operators on Inner Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6 Canonical Forms of an Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
6.1 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.2 Triangularizable Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6.3 Diagonalizable Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
6.4 Jordan Canonical Form of an Operator . . . . . . . . . . . . . . . . . . . . . . 191
6.5 Cayley-Hamilton Theorem and Minimal Polynomial . . . . . . . . . . 212
6.6 Normal Operators on Inner Product Spaces . . . . . . . . . . . . . . . . . . 218
7 Bilinear and Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
7.1 Bilinear Forms and Their Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 239
7.2 The Effect of the Change of Bases . . . . . . . . . . . . . . . . . . . . . . . . . . 243
7.3 Symmetric, Skew-Symmetric and Alternating Bilinear
Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
7.4 Orthogonality and Reflexive Forms . . . . . . . . . . . . . . . . . . . . . . . . . 247
7.5 The Restriction of a Bilinear Form . . . . . . . . . . . . . . . . . . . . . . . . . . 249
7.6 Non-degenerate Bilinear Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
7.7 Diagonalization of Symmetric Forms . . . . . . . . . . . . . . . . . . . . . . . . 254
7.8 The Orthogonalization Process for Nonisotropic
Symmetric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
7.9 The Orthogonalization Process for Isotropic Symmetric
Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
7.10 Quadratic Forms Associated with Bilinear Forms . . . . . . . . . . . . . 266
7.11 The Matrix of a Quadratic Form and the Change of Basis . . . . . . 269
7.12 Diagonalization of a Quadratic Form . . . . . . . . . . . . . . . . . . . . . . . . 270
7.13 Definiteness of a Real Quadratic Form . . . . . . . . . . . . . . . . . . . . . . 272
8 Sesquilinear and Hermitian Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
8.1 Sesquilinear Forms and Their Matrices . . . . . . . . . . . . . . . . . . . . . . 285
8.2 The Effect of the Change of Bases . . . . . . . . . . . . . . . . . . . . . . . . . . 290
8.3 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
9 Tensors and Their Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
9.1 The Tensor Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
9.2 Tensor Product of Linear Transformations . . . . . . . . . . . . . . . . . . . 315
9.3 Tensor Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
9.4 Exterior Algebra or Grassmann Algebra . . . . . . . . . . . . . . . . . . . . . 325
10 Applications of Linear Algebra to Numerical Methods . . . . . . . . . . . . 333
10.1 LU Decomposition of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
10.2 The P LU Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
10.3 Eigenvalue Approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

10.4 Singular Value Decompositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349


10.5 Applications of Singular Value Decomposition . . . . . . . . . . . . . . . 358
11 Affine and Euclidean Spaces and Applications of Linear
Algebra to Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
11.1 Affine and Euclidean Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
11.2 Affine Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
11.3 Isometries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
11.4 A Natural Application: Coordinate Transformation in RE2 . . . . . 397
11.5 Affine and Metric Classification of Quadrics . . . . . . . . . . . . . . . . . 399
11.6 Projective Classification of Conic Curves and Quadric
Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
12 Ordinary Differential Equations and Linear Systems
of Ordinary Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
12.1 A Brief Overview of Basic Concepts of Ordinary
Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
12.2 System of Linear Homogeneous Ordinary Differential
Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
12.3 Real-Valued Solutions for Systems with Complex
Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
12.4 Homogeneous Differential Equations of nth Order . . . . . . . . . . . . 468

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
About the Authors

Mohammad Ashraf is Professor at the Department of Mathematics, Aligarh Muslim


University, India. He completed his Ph.D. in Mathematics from Aligarh Muslim
University, India, in the year 1986. After completing his Ph.D., he started his teaching career as a Lecturer at the Department of Mathematics, Aligarh Muslim University, was elevated to the post of Reader in 1987 and then became Professor in 2005. He also
served as Associate Professor at the Department of Mathematics, King Abdulaziz
University, KSA, from 1998 to 2004.
His research interests include ring theory/commutativity and structure of rings and
near-rings, derivations on rings, near-rings & Banach algebras, differential identities
in rings and algebras, applied linear algebra, algebraic coding theory and cryptog-
raphy. With a teaching experience of around 35 years, Prof. Ashraf has supervised
the Ph.D. thesis of 13 students and is currently guiding 6 more. He has published
around 225 research articles in international journals and conference proceedings
of repute. He received the Young Scientist’s Award from Indian Science Congress
Association in the year 1988 and the I.M.S. Prize from Indian Mathematical Society
for the year 1995. He has completed many major research projects from the UGC,
DST and NBHM. He is also Editor/ Managing Editor of many reputed international
mathematical journals.

Vincenzo De Filippis is Associate Professor of Algebra at the University of Messina,


Italy. He completed his Ph.D. in Mathematics from the University of Messina, Italy,
in 1999. He is a member of the Italian Mathematical Society (UMI) and the National
Society of Algebraic and Geometric Structures and their Applications (GNSAGA).
He has published around 100 research articles in reputed journals and conference
proceedings.

Mohammad Aslam Siddeeque is Associate Professor at the Department of Math-


ematics, Aligarh Muslim University, India. He completed his Ph.D. in Mathematics
from Aligarh Muslim University, India, in 2014 with the thesis entitled “On deriva-
tions and related mappings in rings and near-rings”. His research interest lies in
derivations and its various generalizations on rings and near-rings, on which he has
published articles in reputed journals.
Symbols

N The set of natural numbers


Z The set of integers
Q The set of rational numbers
R The set of real numbers
C The set of complex numbers
a∈S a is an element of a set S
a∈ /S a is not an element of a set S
S ⊆ T, T ⊇ S S is a subset of a set T
S=T Sets S and T are equal (have same elements)
∅ The empty set
A ∪ B Union of sets A and B
A ∩ B Intersection of sets A and B
A\B Difference of sets A and B
A Complement of A
(a, b) Ordered pair consisting of a, b
A×B Cartesian product of A and B
f :S→T Function f from a set S to a set T
f (s) Image of an element s under a function f
f ◦ g, f g Composition or product of functions f and g
Sn Symmetric group of degree n
m|n m divides n
m ∤ n m does not divide n
|G| or o(G) Order of a group G
Z (G) Center of a group G
a∼b a is equivalent to b in a specified sense
a ≡ b (mod n) a is congruent to b modulo n
Ker φ Kernel of a homomorphism φ
G/N Quotient of a group G by a subgroup N
F[x] Polynomial ring over a field F
deg p(x) Degree of a polynomial p(x)
g(x)| f (x) Polynomial g(x) divides f (x)


R[x] Polynomial ring over a ring R


Fn [x] Set of all polynomials of at most degree n over a field F
C [a, b] Set of all real valued continuous functions defined on [a, b]
v∈V Vector v in a vector space V
αv Scalar α times a vector v
α1 v1 + · · · + αn vn Linear combination of vectors v1 , . . . , vn

⟨v1 , v2 , . . . , vn ⟩ Subspace spanned by v1 , v2 , . . . , vn
L(S) Linear span of a set S of vectors
V ⊕W Direct sum of vector spaces V, W
dim(V ) Dimension of a vector space V
U +W Sum of subspaces U, W of V
R(T ) Range of a linear transformation T
N (T ) or K er T Kernel of a linear transformation T
r (T ) Rank of a linear transformation T
n(T ) Nullity of a linear transformation T
At Transpose of a matrix A
r (A) Rank of a matrix A
f ′(x) Formal derivative of a polynomial f (x)
m(T ) Matrix associated with a linear transformation T
V ≅ W Isomorphism of two vector spaces V and W
V

Dual of a vector space V


V Second dual or bidual of a vector space V


S⊥ Orthogonal complement of a subset S of an inner product
space V
S◦ Annihilator of a subset S of an inner product space V
det (A) or |A| Determinant of a matrix A
trace(A) or tr(A) Trace of a matrix A
d(a, b) Distance between two vectors a and b
T∗ Adjoint of a linear transformation T
||v|| Norm (or length) of a vector v
L(U, V ) Set of all linear transformations from a vector space U to
V
A (V ) Set of all linear transformations from a vector space V to
itself
Mm×n (F) Set of all m × n matrices over a field F
Mn (F) Set of all n × n matrices over a field F

m(T )_{(B,B′)} or [T ]_B^{B′} Matrix of a linear transformation T relative to ordered bases B and B′
[v]_B Coordinate vector of v relative to an ordered basis B
T Dual of a linear transformation T
W1  W2  · · ·  Wn Orthogonal direct sum of subspaces W1 , W2 , . . . , Wn
⊗ Tensor product
∧ Exterior product
⊕ext External direct sum
Chapter 1
Algebraic Structures and Matrices

The present chapter is aimed at providing background material in order to make the book as self-contained as possible. However, basic information about sets, relations, mappings, etc. has been pre-assumed. Further, appropriate training in the basics of matrix theory is certainly the right approach to studying linear algebra. In handling essentially any problem relating to linear transformations, eigenvalues, normal forms of operators, quadratic forms or applications to geometry, sooner or later the focal point will be solving a linear system of equations. That is why matrix theory is so important. Matrices give us both the opportunity to represent the data of problems and methods to solve any associated linear system.

1.1 Groups

Definition 1.1 Let S be a nonempty set. A mapping ∗ : S × S → S is called a binary


operation on S.

Remark 1.2 (i) In the above definition ∗(a, b) is denoted by a ∗ b. It can be easily
seen that usual addition and multiplication are binary operations on the set of
natural numbers N, the set of integers Z, the set of rational numbers Q, the set
of real numbers R and on the set of complex numbers C.
(ii) If we consider the set V of all vectors in three-dimensional space, the vector product is a binary operation on V , while the scalar product on V is not a binary operation, since its value is a scalar rather than a vector.
(iii) Let S be a nonempty set equipped with a binary operation ∗, i.e., ∗ : S × S → S is a mapping; then (S, ∗) is called a groupoid. For example, (Z, +) and (Z, −) are groupoids, while (N, −) is not a groupoid.


(iv) A nonempty set G equipped with a binary operation ∗, say (G, ∗), is said to be
a semigroup if ∗ is an associative binary operation, i.e., an associative groupoid
is called a semigroup. For example, (Z, + ), (Q, +), (R, +), (C, +), (N, ·),
(Z, ·), (Q, ·), (R, ·), (C, ·), are semigroups but (Z, −), (R∗ = R\{0}, ÷) are
not semigroups.

Definition 1.3 A nonempty set G equipped with a binary operation ∗, say (G, ∗),
is said to be a group if
(1) For every a, b, c ∈ G, a ∗ (b ∗ c) = (a ∗ b) ∗ c, i.e., ∗ is an associative binary
operation on G.
(2) For every a ∈ G, there exists an element e in G such that a ∗ e = e ∗ a = a.
Such an element e in G is called the identity element in G.
(3) For every a ∈ G, there exists an element b ∈ G such that a ∗ b = b ∗ a = e.
Such an element b ∈ G is said to be the inverse of a in G.

Remark 1.4 (i) In the above definition, G is a group under the binary operation ∗. Throughout this chapter, by a group G we mean a group under multiplication unless otherwise mentioned. For the sake of convenience the product of any two elements a, b of a multiplicative group G will be denoted by ab instead of a · b.
(ii) It can be easily seen that the identity element e in a group G is unique. For if e, e′ are two identities in a group G, then e = ee′ = e′.
(iii) The inverse of each element in a group G is unique. If a′ and a′′ are two inverses of an element a in a group G, then aa′ = a′a = e and aa′′ = a′′a = e, where e is the identity element in G. Then it can be easily seen that a′ = a′e = a′(aa′′) = (a′a)a′′ = ea′′ = a′′.
(iv) If a −1 denotes the inverse of a in G, then in view of axiom (3), one can write a = b−1 and b = a −1 . It is also easy to see that (ab)−1 = b−1 a −1 for all a, b ∈ G (a short verification is given after this remark).
(v) A group G is said to be abelian (or commutative) if ab = ba holds for all a, b ∈ G. Otherwise, G is said to be a nonabelian group.
(vi) If a group G contains a finite number of elements, then G is said to be a finite group. Otherwise the group G is said to be infinite. The number of elements in a finite group G is called the order of G and is generally denoted by ◦(G) or |G|.
(vii) For any a ∈ G and any positive integer n, a n = a · a · · · a (n times), a 0 = e, the identity of the group G, and a −n = (a −1 )n ; hence it is straightforward to see that a n a m = a n+m and (a n )m = a nm .
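The last identity in (iv) admits a one-line verification, using only associativity, the identity axiom and the uniqueness of inverses noted in (iii):

\[
(ab)(b^{-1}a^{-1}) = a(bb^{-1})a^{-1} = aea^{-1} = aa^{-1} = e,
\qquad
(b^{-1}a^{-1})(ab) = b^{-1}(a^{-1}a)b = b^{-1}eb = b^{-1}b = e,
\]

so b−1 a −1 is an inverse of ab and hence, by uniqueness, (ab)−1 = b−1 a −1 .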

Example 1.5 (1) It can be easily seen that the groupoids (Z, +), (Q, +), (R, +)
and (C, +) form abelian groups under addition, while (Q∗ = Q\{0}, ·), (R∗ =
R\{0}, ·) and (C∗ = C\{0}, ·) form abelian groups under multiplication.
(2) The set G = {1, −1} forms an abelian group under multiplication.
(3) Consider the set G = {1, −1, i, −i} the set of fourth roots of unity. This is an
abelian group of order 4 under multiplication.

(4) The set of all positive rational numbers Q+ forms an abelian group under the binary operation ∗ defined by a ∗ b = ab/2. Note that e = 2 is the identity element in Q+ , while for any a ∈ Q+ , 4/a is the inverse of a in Q+ (a verification of the group axioms is given after this example).
(5) The set Z of all integers forms an abelian group with respect to the binary
operation  ∗ defined by a ∗ b = a + b + 1, for all a, b ∈ Z. Note that e = −1
is the identity element in Z, while any a ∈ Z has the inverse −2 − a.
(6) The set {1, ω, ω2 }, where ω is the cube root of unity, forms an abelian group
of order 3 under multiplication of complex numbers.
(7) The set of matrices
M = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \right\}
forms an abelian group of order 4 under matrix multiplication.


(8) The set of quaternions {±1, ±i, ± j, ±k} forms a nonabelian group of order
8 under multiplication i 2 = j 2 = k 2 = −1, i j = − ji = k, jk = −k j =
i, ki = −ik = j.
a a 
(9) The set of matrices M = | 0 = a ∈ R forms an abelian group under
aa
1 1
matrix multiplication. The identity of the group M is 21 21 , and for any
   1 1  2 2
aa
nonzero a ∈ R the inverse of ∈ M is 1 1 .
4a 4a
aa 4a 4a
(10) The set K = {e, a, b, c} forms an abelian group under the binary operation
defined on K as ab = ba = c, bc = cb = a, ca = ac = b, a 2 = b2 = c2 = e.
This group is known as Klein 4-group.
(11) The set of all permutations defined on n symbols (n ≥ 3) forms a nonabelian
group of order n! and is denoted by Sn . For example, S3 = {I, (12), (13), (23),
(123), (132)} is a nonabelian group of order 6.
(12) The set of all n × n invertible matrices over R (or C) forms a group under matrix multiplication, generally known as the general linear group and denoted by G L n (R) (or G L n (C)). Similarly, the set of all n × n real matrices with determinant one is a group under multiplication, denoted by S L(n, R) and known as the special linear group. The set of all n × n orthogonal matrices over the reals forms a group, called the orthogonal group and denoted by O(n, R).
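As promised in Example 1.5(4), here is a sample verification of the group axioms for Q+ under a ∗ b = ab/2; the computations below are routine and included only as an illustration.

\[
(a \ast b) \ast c = \frac{ab}{2} \ast c = \frac{abc}{4} = a \ast \frac{bc}{2} = a \ast (b \ast c),
\qquad
a \ast 2 = \frac{2a}{2} = a = 2 \ast a,
\qquad
a \ast \frac{4}{a} = \frac{a \cdot \frac{4}{a}}{2} = 2 .
\]

Since a ∗ b = b ∗ a is obvious, (Q+ , ∗) is an abelian group with identity 2, in which 4/a is the inverse of a.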

Definition 1.6 If G is a group, then the order of an element a ∈ G is the least positive
integer n such that a n = e, the identity of group G. If no such positive integer exists
we say that a is of infinite order. We use the notation ◦(a) for the order of a.

Proposition 1.7 If G is a group, then the following hold:


(i) For any a, g ∈ G, ◦(a) = ◦(gag −1 ).
(ii) For any a, b ∈ G, ◦(ab) = ◦(ba).
(iii) If ◦(a) is finite, then for any integer m, a m = e implies that ◦(a) divides m.

Proof (i) For any positive integer n, a n = e if and only if (gag −1 )n = ga n g −1 = e


for all g ∈ G. Hence ◦(gag −1 ) = ◦(a).
(ii) Since ba = b(ab)b−1 , we find that ◦(ab) = ◦(ba), by (i).
(iii) Suppose that ◦(a) = n and let m be a positive integer such that a m = e. Application of Euclid's algorithm yields that m = nq + r for some integers q and r with 0 ≤ r < n. Hence, in this case e = a m = (a n )q a r = a r . But since n is the least positive integer such that a n = e, we arrive at r = 0, and hence ◦(a)|m.
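The argument in part (iii) may be illustrated by a small numerical instance; the values ◦(a) = 4 and m = 14 are chosen only for illustration.

\[
14 = 4 \cdot 3 + 2, \qquad a^{14} = (a^{4})^{3}\,a^{2} = e^{3}a^{2} = a^{2} \neq e,
\]

since 2 < 4 = ◦(a). Thus a m = e can hold only when the remainder r of m upon division by ◦(a) is zero, i.e., only when ◦(a) divides m.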

Proposition 1.8 Let G be a group and n be a positive integer. Then the following
hold:
(i) G is abelian if and only if (ab)2 = a 2 b2 , for all a, b ∈ G.
(ii) G is abelian if and only if (ab)k = a k bk , for all a, b ∈ G, where k = n, n +
1, n + 2.

Proof (i) Suppose that G is abelian. Then

(ab)2 = (ab)(ab) = a(b(ab)) = a((ba)b) = a(ab)b = a 2 b2 .

Conversely, if (ab)2 = a 2 b2 , then a −1 ((ab)(ab)) = a −1 (a 2 b2 ), and this implies that


b(ab) = ab2 . Similarly, operating by b−1 from the right side in the latter equation,
we find that ba = ab, and hence G is abelian.

(ii) We have (ab)n = a n bn , (ab)n+1 = a n+1 bn+1 and (ab)n+2 = a n+2 bn+2 . Using
the first two conditions, we get (a n bn )(ab) = (ab)n+1 = a n+1 bn+1 . Now using can-
cellation laws we arrive at bn a = abn . Similarly, using the latter two conditions we
find that (a n+1 bn+1 )(ab) = (ab)n+2 = a n+2 bn+2 , and hence bn+1 a = abn+1 . But in
this case abn+1 = bn+1 a = b(bn a) = b(abn ) and using cancellation laws we find
that G is abelian.

Definition 1.9 A nonempty subset H of a group (G, ∗) is said to be a subgroup of


G if
(i) H is closed under the binary operation  ∗ ,
(ii) H is a group under the same binary operation  ∗ .

Example 1.10 (1) For any group G, G is a subgroup of G. Similarly, the trivial
group {e} is a subgroup of G. Any subgroup of G which is different from G and
{e} is said to be a proper (nontrivial) subgroup of G.
(2) The additive group (Z, +) is a subgroup of additive group (Q, +), the additive
group (Q, +) is a subgroup of (R, +) and the additive group (R, +) is a subgroup
of the additive group (C, +).
(3) The multiplicative group {1, −1, i, −i}, i 2 = −1, is a subgroup of the group of all nonzero complex numbers under multiplication. It is straightforward to see that S L(n, R) and O(n, R) are subgroups of G L(n, R).
(4) The subsets {e, a}, {e, b} and {e, c} are subgroups of Klein 4-group K =
{e, a, b, c}.

(5) Let G be the multiplicative group of all nonsingular 2 × 2 matrices over the set of complex numbers. Consider the subset
H = \left\{ \pm\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \pm\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \pm\begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}, \pm\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \right\}
of the group G. It can be easily seen that H is a subgroup of G.
(6) Consider the subset H ′ = \left\{ \pm\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \pm\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} \right\} of H in the above example. It can be seen that H ′ is a subgroup of H .

Proposition 1.11 A nonempty subset H of a group G is a subgroup of G if and only


if ab−1 ∈ H for any a, b ∈ H .

Proof If H is a subgroup of G, then for any a, b ∈ H , b−1 ∈ H and consequently


ab−1 ∈ H .
Conversely, assume that for any a, b ∈ H, ab−1 ∈ H . For any a ∈ H , e = aa −1 ∈
H , and H contains identity. Now for any a ∈ H , and e ∈ H , a −1 = ea −1 ∈ H , i.e.,
every element in H has its inverse in H . Also for a, b ∈ H , b−1 ∈ H , and hence
ab = a(b−1 )−1 ∈ H . Consequently, H is closed. Since binary composition on G is
associative, the induced binary composition is also associative on H . Thus H is a
subgroup of G.
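Proposition 1.11 often provides the quickest way to recognize a subgroup. For instance, for S L(n, R) sitting inside G L(n, R) (Example 1.5(12)), the criterion reduces to a determinant computation:

\[
A, B \in SL(n, \mathbb{R}) \;\Longrightarrow\; \det(AB^{-1}) = \det(A)\,\det(B)^{-1} = 1 \cdot 1^{-1} = 1 \;\Longrightarrow\; AB^{-1} \in SL(n, \mathbb{R}),
\]

so S L(n, R) is a subgroup of G L(n, R), as stated in Example 1.10(3).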

Definition 1.12 Let G be a group and H a subgroup of G. For any a ∈ G, the set
H a = {ha | h ∈ H } (resp. a H = {ah | h ∈ H }) is called a right (resp. left) coset of
H determined by a in G.

Example 1.13 Let S3 = {I, ϕ, ξ, ξ 2 , ϕξ, ϕξ 2 }, where ϕ = (23), ξ = (123), ϕ 2 = I, ξ 3 = I , be the symmetric group on three symbols. Consider the subset H = {I, ϕ}, which is a subgroup of S3 . The left coset ξ H = {ξ, ξ ϕ} and the right coset H ξ = {ξ, ϕξ } of H in S3 are distinct.

Remark 1.14 (i) If the binary composition in G is addition and H a subgroup


of G, then the left (resp. right) coset of H in G determined by a is defined as
a + H = {a + h | h ∈ H } (resp. H + a = {h + a | h ∈ H }).
(ii) If a group G is abelian and H is a subgroup of G, then every left coset of H in
G is always the same as the right coset of H in G.
(iii) If H is a subgroup of a group G, then it can be seen that there is one-to-
one correspondence between any two left cosets of H in G. If a H , bH are any
two left cosets of H in G, define a map θ : a H → bH such that θ (ah) = bh. If
θ (ah 1 ) = θ (ah 2 ), then bh 1 = bh 2 implies that h 1 = h 2 and hence ah 1 = ah 2 , i.e.,
θ is one-to-one. Clearly, θ is onto and hence there is one-to-one correspondence
between a H and bH .

Definition 1.15 A subgroup H of G is said to be a normal subgroup of G if a H =


H a for all a ∈ G.

Proposition 1.16 A subgroup H of a group G is normal in G if and only if ghg −1 ∈


H for any g ∈ G, h ∈ H .

Proof Suppose first that H is normal in G. Then for any g ∈ G and h ∈ H , gh ∈ g H = H g, and hence gh = h 1 g for some h 1 ∈ H . Thus, ghg −1 = h 1 ∈ H , for any g ∈ G, h ∈ H .
Conversely, assume that ghg −1 ∈ H for any g ∈ G, h ∈ H . Let a be an arbitrary element in G. Then aha −1 ∈ H for any h ∈ H . Hence ha = a(a −1 h(a −1 )−1 ) ∈ a H , and consequently, H a ⊆ a H . Similarly, since aha −1 ∈ H , ah = (aha −1 )a ∈ H a, i.e., a H ⊆ H a, which shows that a H = H a.

Proposition 1.17 If H is a normal subgroup of G, then the set of left (right) cosets
of H in G forms a group under the binary operation a H bH = abH .

Proof First, we show that the given binary operation is well-defined, i.e., for
any a1 , a2 , b1 , b2 ∈ G, a1 H = a2 H , b1 H = b2 H implies that a1 b1 H = a2 b2 H .
If a1 H = a2 H , b1 H = b2 H , then a1−1 a2 ∈ H and b1−1 b2 ∈ H . This yields that
(a1 b1 )−1 a2 b2 = b1−1 a1−1 a2 b2 = b1−1 (a1−1 a2 )b1 (b1−1 b2 ). Since H is normal in G, b1−1
(a1−1 a2 )b1 ∈ H . But b1−1 b2 ∈ H yields that (a1 b1 )−1 a2 b2 ∈ H , and hence a1 b1 H =
a2 b2 H . This operation is also associative. In fact, for any a, b, c ∈ G, (a H bH )cH =
(ab)H cH = (ab)cH = a(bc)H = a H (bcH ) = a H (bH cH ). Now if e is the iden-
tity element in G, then eH = H acts as the identity element, i.e., eH a H = (ea)H =
a H . For any a ∈ G, a −1 H is the inverse of a H . Hence the set of cosets of H in G
forms a group.

Remark 1.18 If H is a normal subgroup of G, then the set of left(right) cosets of


H in G forms a group which is known as the quotient group or factor group with
respect to H and generally denoted as G/H .

Lemma 1.19 Let G be a group and H a subgroup of G. Then G is the union of all
left cosets of H in G and any two distinct left cosets of H in G are disjoint.

Proof Obviously, union of all left cosets of H in G is a subset of G. Conversely, for


any a ∈ G, a = ae ∈ a H , and hence G is the union of all left cosets of H in G. Now
assume that a H, bH are any two distinct cosets of H in G which are not disjoint,
i.e., a H ∩ bH = ∅. If x ∈ a H ∩ bH , then x = ah 1 = bh 2 for some h 1 , h 2 ∈ H . Let
y be an arbitrary element in a H so that y = ah, h ∈ H . Then

y = ah = (xh₁⁻¹)h = x(h₁⁻¹h) = bh₂(h₁⁻¹h) = b(h₂h₁⁻¹h),

where h₂h₁⁻¹h ∈ H . Hence y ∈ bH and a H ⊆ bH . Similarly, it can be shown that
bH ⊆ a H . Consequently, a H = bH , i.e., a H and bH are not distinct. This leads to
a contradiction.

Theorem 1.20 (Lagrange’s Theorem) Let G be a finite group and H a subgroup of G. Then ◦(H ) divides ◦(G).

Proof Since G is a finite group, the number of left cosets of H in G is finite. If the distinct left cosets of H in G are denoted by H = a1 H, a2 H, . . . , at H , then by Lemma 1.19, G is the union of all left cosets of H in G, i.e., G = a1 H ∪ a2 H ∪ · · · ∪ at H . But each left coset has the same number of elements, which is equal to the number of elements in H = a1 H , and hence ◦(G) = t ◦ (H ). Thus ◦(H ) divides ◦(G).
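As a small illustration of Lemma 1.19 and Lagrange's theorem together, take G = S3 and the subgroup H = {I, ϕ} of Example 1.13.

\[
t = \frac{\circ(S_3)}{\circ(H)} = \frac{6}{2} = 3,
\qquad
S_3 = H \cup \xi H \cup \xi^{2}H = \{I, \varphi\} \cup \{\xi, \xi\varphi\} \cup \{\xi^{2}, \xi^{2}\varphi\},
\]

a disjoint union of three left cosets, each containing ◦(H ) = 2 elements, so ◦(H ) divides ◦(S3 ) = 6.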

Definition 1.21 Let G and G′ be any two groups. A mapping θ : G → G′ is said to be a homomorphism if θ (ab) = θ (a)θ (b) holds for all a, b ∈ G.

Remark 1.22 (i) Note that if the binary operations in the groups G and G′ are different, say ∗ and ◦, respectively, then θ satisfies the property θ (a ∗ b) = θ (a) ◦ θ (b) for all a, b ∈ G. For the sake of convenience, it has been assumed that both G and G′ are multiplicative groups.
(ii) A homomorphism of a group which is also onto is called an epimorphism. A homomorphism of a group which is also one-to-one is called a monomorphism. Further, a homomorphism of a group G into itself is called an endomorphism.
(iii) A group homomorphism which is one-to-one and onto is said to be an isomorphism. An isomorphism of a group G onto itself is called an automorphism of G.
(iv) If θ : G → G′ is a homomorphism, then θ (G) is a subgroup of G′.
(v) Let θ : G → G′ be a group homomorphism of G onto G′. If G is abelian, then G′ is also abelian.
(vi) If θ : G → G′ is a group homomorphism of G onto G′, where e and e′ are the identities of G and G′, respectively, then θ (e) = e′ and θ (a −1 ) = (θ (a))−1 for all a ∈ G.

Example 1.23 (1) For any group G, the identity mapping IG : G → G is a group homomorphism.
(2) For any two groups G and G′, the mapping θ : G → G′ such that θ (a) = e′, the identity of G′, is a group homomorphism and is called the trivial homomorphism.
(3) The mapping θ : R∗ → R∗ defined on the multiplicative group R∗ = R\{0} such that θ (a) = |a| is a homomorphism, but θ : R → R such that θ (a) = |a| is not a homomorphism on the additive group R.
(4) The map θ : R2 → R such that θ (a, b) = a is a homomorphism.
(5) If θ : R2 → C is such that θ (a, b) = a + ib, then θ is an isomorphism.
(6) Let a be a fixed element of G and θa : G → G be such that θa (g) = aga −1 for all g ∈ G. Then θa is an automorphism. In fact, for any x, y ∈ G, θa (x y) = a(x y)a −1 = axa −1 aya −1 = θa (x)θa (y), i.e., θa is a homomorphism. Further, θa (x) = θa (y) implies that axa −1 = aya −1 , i.e., x = y, and hence θa is one-to-one. For any x ∈ G, there exists a −1 xa ∈ G such that θa (a −1 xa) = a(a −1 xa)a −1 = x and hence θa is onto. Thus θa is an automorphism of G.

Definition 1.24 Let θ : G → G′ be a homomorphism of a group G to G′. Then the set {a ∈ G | θ (a) = e′, the identity of G′} is called the kernel of θ and is denoted by K er θ .

Proposition 1.25 Let θ : G → G′ be a homomorphism of a group G to G′. Then

(i) K er θ is a subgroup of G,
(ii) θ is a one-to-one mapping if and only if K er θ = {e}, where e is the identity of G.

Proof (i) Since θ (e) = e′, where e′ is the identity of G′, we have e ∈ K er θ , i.e., K er θ ≠ ∅. Now let a, b ∈ K er θ . Then θ (a) = e′, θ (b) = e′. Now

θ (ab−1 ) = θ (a)θ (b−1 ) = θ (a)(θ (b))−1 = e′(e′)−1 = e′,

and hence ab−1 ∈ K er θ , i.e., K er θ is a subgroup of G.

(ii) Suppose that θ is a one-to-one mapping and a ∈ K er θ . Then θ (a) = e′ = θ (e), which yields that a = e, and hence K er θ = {e}.
Conversely, assume that K er θ = {e} and θ (a) = θ (b). Then e′ = θ (a)(θ (b))−1 = θ (a)θ (b−1 ) = θ (ab−1 ), and hence ab−1 ∈ K er θ = {e}. Consequently, ab−1 = e and hence a = b.
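A typical example of a kernel, built from the groups of Example 1.5(12), is provided by the determinant map; the following is a routine check rather than a new result.

\[
\det : GL_n(\mathbb{R}) \to \mathbb{R}^{*}, \qquad \det(AB) = \det(A)\det(B),
\qquad
\operatorname{Ker}(\det) = \{A \in GL_n(\mathbb{R}) \mid \det(A) = 1\} = SL(n, \mathbb{R}),
\]

so by Proposition 1.25(i), S L(n, R) is again seen to be a subgroup of G L n (R), in agreement with Example 1.10(3).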

Exercises

1. Show that the set of all nth complex roots of unity forms a group with respect
to ordinary multiplication.
2. If G is a group, then show that the set Z (G) = {a ∈ G | ab = ba, for all b ∈ G} (called the center of G) is a normal subgroup of G.
3. If a is a fixed element of a group G, then show that the set N (a) = {x ∈ G | ax = xa} (called the normalizer of a in G) is a subgroup of G.
4. If every element in a group G is self-inverse, i.e., a −1 = a for all a ∈ G, then
show that G is abelian.
5. In a group G, for all a, b ∈ G show that (aba −1 )n = aba −1 if and only if b = bn ,
where n is any integer.
6. Let G be a group and m, n be two relatively prime positive integers such that
(ab)m = a m bm and (ab)n = a n bn hold for all a, b ∈ G. Then show that G is
abelian.
7. Let G be a group such that (ab)2 = (ba)2 for all a, b ∈ G. Suppose G also has
the property that c2 = e implies that c = e, c ∈ G. Then show that G is abelian.
8. Show that a group G in which a m bm = bm a m and a n bn = bn a n hold for all
a, b ∈ G, where m, n are any two relatively prime positive integers, is abelian.
9. Show that intersection of two subgroups of a group G is a subgroup of G. More
generally, show that intersection of any arbitrary family of subgroups of G is a
subgroup of G.
10. If H is a finite subset of a group G such that ab ∈ H for all a, b ∈ H , then show
that H is a subgroup of G.
11. Show that a group cannot be written as a set theoretic union of two of its proper
subgroups.

12. If K is a subgroup of H and H is a subgroup of G, then show that K is a subgroup


of G.
13. If a is an element of a group G, and n is any nonzero integer, then show that
◦(a) = ◦(a n ) if and only if n is relatively prime to ◦(a).

1.2 Rings

This section is devoted to the study of algebraic structures equipped with two binary operations, namely rings. Several basic properties of rings and subrings are given.

Definition 1.26 A nonempty set R equipped with two binary operations, say addi-
tion  + and multiplication  · , is said to be a ring if it satisfies the following axioms:
(1) (R, +) is an abelian group.
(2) (R, ·) is a semigroup.
(3) For any a, b, c ∈ R, a · (b + c) = a · b + a · c, (b + c) · a = b · a + c · a.

Remark 1.27 (i) The binary operations  + and  · are not necessarily usual addition
and multiplication. Moreover, these are only symbols used to represent both
binary operations of R. For the convenience we write ab instead of a · b. A
ring R is said to be commutative if ab = ba holds for all a, b ∈ R. Otherwise,
R is said to be noncommutative. If a ring R contains an element e such that
ae = ea = a for all a ∈ R, we say that R is a ring with identity, and the identity
e in a ring R is usually denoted by 1. In general, a ring R may or may not have
identity. But if the ring R has identity 1, then it is unique.
(ii) It can be easily seen that a0 = 0a = 0 for all a ∈ R.
(iii) For any a, b ∈ R, a(−b) = (−a)b = −ab, (−a)(−b) = ab.
(iv) In a ring R, for a fixed positive integer n, na = a + a + · · · + a and a =
n

n− times
a · a· · · a.
n− times

Example 1.28 (1) (Z, +, ·), (Q, +, ·), (R, +, ·) and (C, +, ·) are examples of com-
mutative rings.
(2) Every additive abelian group G is a ring if we define multiplication in G as
ab = 0 for all a, b ∈ G, where 0 is the additive identity. This ring is called zero
ring.
(3) The set Mn×n (Z) of all n × n matrices over integers forms a ring under the
binary operations of matrix addition and matrix multiplication, which is an
example of a noncommutative ring.
(4) Let Q = {a + bi + cj + dk | a, b, c, d ∈ R}. Define addition in Q as (a1 +
b1 i + c1 j + d1 k) + (a2 + b2 i + c2 j + d2 k) = (a1 + a2 ) + (b1 + b2 )i +
(c1 + c2 ) j + (d1 + d2 )k under which Q forms an abelian group. Now multi-
ply any two members of Q as multiplication of polynomials by using the rules

i j = − ji = k, jk = −k j = i, ki = −ik = j. Then Q forms a ring known as


ring of real quaternions which is noncommutative.
(5) Rn = {(a1 , a2 , . . . , an ) | a1 , a2 , . . . , an ∈ R} forms a ring under the addition
and multiplication defined as follows:

(a1 , a2 , . . . , an ) + (b1 , b2 , . . . , bn ) = (a1 + b1 , a2 + b2 , . . . , an + bn )

(a1 , a2 , . . . , an )(b1 , b2 , . . . , bn ) = (a1 b1 , a2 b2 , . . . , an bn ).

This is a commutative ring.


(6) Let R be any ring. Consider the set R[x] of all symbols a0 + a1 x + · · · + an x n
where n is any nonnegative integer and the coefficients a0 , a1 , . . . , an ∈ R. If
f (x) = a0 + a1 x + · · · + an x n , g(x) = b0 + b1 x + · · · + bm x m , then f (x) =
g(x) if and only if ai = bi , i ≥ 0. Define addition and multiplication in R[x]
as follows:

f (x) + g(x) = (a0 + b0 ) + (a1 + b1 )x + · · · + (ak + bk )x k

f (x)g(x) = c0 + c1 x + · · · + ck x k ,

where ck = a0 bk + a1 bk−1 + · · · + ak b0 . It can be seen that R[x] is a ring under


the above operations of addition and multiplication, known as the polynomial ring in the indeterminate x.
(7) The set M = \left\{ \begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix} \,\middle|\, a, b ∈ Z2 \right\} forms a noncommutative ring with four elements.
(8) The set Z[i] = {a + ib|a, b ∈ Z} is a ring under usual addition and multipli-
cation, which is√known as the ring of Gaussian integers.
(9) The set {a + b 2 | a, b ∈ Z} forms a ring under usual addition and multipli-
cation.
(10) Let S be any set. Consider P(S), the power set of S and define binary opera-
tions in P(S) as X + Y = (X ∪ Y )\(X ∩ Y ), X Y = X ∩ Y . Then P(S) is
a commutative ring.
(11) For any fixed positive integer n ≥ 2, let Zn = {0, 1, 2, . . . , n − 1} and define addition modulo n, denoted ⊕n , by a ⊕n b = r , where r is the remainder obtained on dividing the ordinary sum of a and b by n, for a, b ∈ Zn . Similarly, the multiplication modulo n is defined by a ⊗n b = r , where r is the remainder obtained on dividing the ordinary product of a and b by n. It can be seen that (Zn , ⊕n , ⊗n ) forms a commutative ring with identity 1 under addition and multiplication modulo n (sample computations in Z6 are given after this example).
(12) The set R[x] of all polynomials with real coefficients under addition and mul-
tiplication of polynomials forms a commutative ring with identity.
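As indicated in Example 1.28(11), here are a few sample computations in (Z6 , ⊕6 , ⊗6 ), included only to fix the definitions.

\[
4 \oplus_6 5 = 3 \ (\text{since } 9 = 1\cdot 6 + 3),
\qquad
4 \otimes_6 5 = 2 \ (\text{since } 20 = 3\cdot 6 + 2),
\qquad
2 \otimes_6 3 = 0 \ (\text{since } 6 = 1\cdot 6 + 0).
\]

In particular 2 ⊗6 3 = 0 although 2 and 3 are both nonzero in Z6 , a phenomenon taken up again in the discussion of zero divisors (Definition 1.41).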

In the previous section, we have defined the notion of subgroup H of a group G, i.e.,
a nonempty subset H of G which is itself a group under the same binary operation.
Analogously, the notion of a subring has been introduced in the case of a ring also.

Definition 1.29 A nonempty subset S of a ring (R, +, ·) is said to be a subring of


(R, +, ·), if S is itself a ring under the same binary operations.

Example 1.30 (1) (Z, +, ·) is a subring of (Q, +, ·), (Q, +, ·) is a subring of


(R, +, ·) and (R, +, ·) is a subring of (C, +, ·).
(2) S = \left\{ \begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix} \,\middle|\, a, b ∈ Z \right\} is a subring of M2 (Z) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \,\middle|\, a, b, c, d ∈ Z \right\}.
(3) S = \left\{ \begin{pmatrix} a & a \\ a & a \end{pmatrix} \,\middle|\, a ∈ R \right\} is a subring of M2 (R) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \,\middle|\, a, b, c, d ∈ R \right\}.
(4) (nZ, +, ·) is a subring of the ring (Z, +, ·) for any fixed integer n ∈ Z.

Proposition 1.31 A nonempty subset S of a ring R is a subring of R if and only if


for any a, b ∈ S, a − b ∈ S and ab ∈ S.

Proof Let S be a subring of R and a, b ∈ S. Then by definition (S, +, ·) is a ring and


for any a, b ∈ S, a, −b ∈ S and hence a − b ∈ S and ab ∈ S. Conversely, assume
that S is a nonempty subset of R such that for any a, b ∈ S, a − b ∈ S and ab ∈ S.
If a − b ∈ S for any a, b ∈ S, then it can be seen that (S, +) is an additive subgroup
of (R, +). Also since S is closed under multiplication, (S, ·) is a sub-semigroup of
(R, ·). Further, since the laws of distributivity hold in R, these also hold in S. Hence
(S, +, ·) is a ring under the induced operation of addition and multiplication. Hence
S is a subring of R.

Remark 1.32 (i) A subring need not contain the identity of a ring. For example, (Z, +, ·) has the identity 1 while its subring (2Z, +, ·) has no identity. (Z6 , ⊕6 , ⊗6 ) has the identity 1, but its subring ({0, 3}, ⊕6 , ⊗6 ) has the identity 3. It may also happen that a ring has no identity element while one of its subrings does. For example, Z × 3Z has no identity while its subring Z × {0} has the identity (1, 0).
(ii) Trivially, a subring of a commutative ring is commutative. But a noncommutative ring may have a commutative subring. For example,

M2 (Z) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \,\middle|\, a, b, c, d ∈ Z \right\}

is a noncommutative ring while its subring

S = \left\{ \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \,\middle|\, a, d ∈ Z \right\}

is commutative.
(iii) Let R = \left\{ \begin{pmatrix} a & b \\ 0 & q \end{pmatrix} \,\middle|\, a, b ∈ R, q ∈ Q \right\}. It can be seen that R is a ring with identity \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. But it has a subring S = \left\{ \begin{pmatrix} a & b \\ 0 & 2m \end{pmatrix} \,\middle|\, a, b ∈ R, m ∈ Z \right\} without identity.

Definition 1.33 A nonempty subset I of a ring (R, +, ·) is said to be a left (resp.


right) ideal of R if for all a, b ∈ I, r ∈ R, a − b ∈ I, ra ∈ I (resp. a − b ∈ I, ar ∈
I ). If I is both a left as well as a right ideal of R, I is said to be an ideal of R.

Example 1.34 In the ring M2 (Z) of all 2 × 2 matrices over Z, the ring of integers, consider the subsets A = \left\{ \begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix} \,\middle|\, a, b ∈ Z \right\} and B = \left\{ \begin{pmatrix} c & 0 \\ d & 0 \end{pmatrix} \,\middle|\, c, d ∈ Z \right\} of M2 (Z). It can be easily seen that A is a right ideal of M2 (Z) which is not a left ideal of M2 (Z), while B is a left ideal of M2 (Z) which is not a right ideal of M2 (Z). If we consider the subset C = \left\{ \begin{pmatrix} a & 0 \\ b & c \end{pmatrix} \,\middle|\, a, b, c ∈ Z \right\} of M2 (Z), then it can be seen that C is a subring of M2 (Z), but neither a left nor a right ideal of M2 (Z).
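The assertions of Example 1.34 about the set A can be checked by direct matrix multiplication; the two computations below are representative of the general argument.

\[
\begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} p & q \\ r & s \end{pmatrix}
=
\begin{pmatrix} ap+br & aq+bs \\ 0 & 0 \end{pmatrix} \in A,
\qquad
\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
=
\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \notin A,
\]

so A absorbs multiplication by arbitrary elements of M2 (Z) on the right but not on the left, i.e., A is a right ideal which is not a left ideal.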

Definition 1.35 Let (R, +, ·) be a ring and I be an ideal of R. Then the set R/I =
{a + I | a ∈ R} forms a ring under the operations: (a + I ) + (b + I ) = (a + b) + I
and (a + I )(b + I ) = ab + I . This is known as quotient ring.

Definition 1.36 Let R1 , R2 be any two rings. A mapping θ : R1 → R2 is said to


be a homomorphism if θ (a + b) = θ (a) + θ (b) and θ (ab) = θ (a)θ (b) hold for all
a, b ∈ R1 .
A ring homomorphism which is also one-to-one and onto is called a ring isomor-
phism. Moreover, if R1 , R2 have the identity elements e1 and e2 , respectively, then
θ (e1 ) = e2 .

Example 1.37 (1) A map θ : C → C such that θ (z) = z̄, the complex conjugate of
z, is a ring homomorphism which is both one-to-one and onto.
(2) Let θ : C → R, where R = \left\{ \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \,\middle|\, a, b ∈ R \right\} is a ring under matrix addition and multiplication. For any z = x + i y ∈ C, define θ (z) = \begin{pmatrix} x & -y \\ y & x \end{pmatrix}. It can be easily verified that θ is a homomorphism which is both one-to-one and onto.

Definition 1.38 Let θ : R1 → R2 be a homomorphism of a ring R1 to a ring R2 .


Then the kernel of θ , denoted as ker θ , is the collection of all those elements of R1
which map to the zero of R2 , i.e., K er θ = {a ∈ R1 | θ (a) = 0}.

Remark 1.39 (i) It can be easily seen that K er θ is an ideal of R1 . In fact,


θ (0) = 0, so 0 ∈ K er θ and hence K er θ ≠ ∅. If a, b ∈ K er θ , then θ (a) = 0, θ (b) = 0.
Hence θ (a − b) = θ (a) − θ (b) = 0, i.e., a − b ∈ K er θ . Again if a ∈ K er θ and r ∈
R1 , θ (ar ) = θ (a)θ (r ) = 0, and θ (ra) = θ (r )θ (a) = 0. Hence ar and ra ∈ K er θ .

This shows that ker θ is an ideal of R1 .

(ii) If θ : R1 → R2 is a homomorphism and I is an ideal of R1 , then θ (I ) is an


ideal of θ (R1 ). Note that θ (I ) = {x ∈ R2 | there exists a ∈ I such that θ (a) = x}.
Since 0 = θ (0), 0 ∈ θ (I ), i.e., θ (I ) ≠ ∅. If x, y ∈ θ (I ), then there exist a, b ∈ I such
that θ (a) = x, θ (b) = y. Hence x − y = θ (a) − θ (b) = θ (a − b). Since a − b ∈ I ,
x − y ∈ θ (I ). If r1 ∈ θ (R1 ), then there exists r ∈ R1 such that r1 = θ (r ). Hence
r1 x = θ (r )θ (a) = θ (ra). Since ra ∈ I , r1 x ∈ θ (I ). In a similar way, it can be seen
that xr1 ∈ θ (I ), and hence θ (I ) is an ideal of θ (R1 ).

Proposition 1.40 Let θ : R1 → R2 be a ring homomorphism. Then θ is one-to-one


if and only if K er θ = {0}.

Proof Suppose that K er θ = {0}, and θ (a) = θ (b), a, b ∈ R1 . Then θ (a − b) =


θ (a) − θ (b) = 0, and hence a − b ∈ K er θ = {0}, i.e., a − b = 0 and θ is one-to-
one.
Conversely, assume that θ is one-to-one. If a ∈ K er θ , then θ (a) = 0 = θ (0), and
since θ is one-to-one, we find that a = 0, i.e., K er θ = {0}.
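A standard example of a ring homomorphism and its kernel, built from the ring of Example 1.28(11), is reduction modulo n:

\[
\theta : \mathbb{Z} \to \mathbb{Z}_n, \qquad \theta(a) = \text{the remainder of } a \text{ upon division by } n, \qquad
\operatorname{Ker}\theta = \{a \in \mathbb{Z} \mid n \text{ divides } a\} = n\mathbb{Z}.
\]

Thus K er θ = nZ is an ideal of Z, in agreement with Remark 1.39(i), and since K er θ ≠ {0}, Proposition 1.40 confirms that θ is not one-to-one.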

Definition 1.41 A nonzero element a ∈ R is called a zero divisor if there exists a


nonzero element b ∈ R such that ab = 0 and ba = 0. A commutative ring R is said
to be an integral domain if R contains no zero divisors.
It is obvious that Z, the ring of integers, is free from zero divisors and hence is an integral domain.

Definition 1.42 A ring R with identity 1, in which each nonzero element has mul-
tiplicative inverse, i.e., if a ∈ R is a nonzero element, then there exists b ∈ R such
that ab = ba = 1, is called a division ring.

Definition 1.43 The characteristic of a ring R, denoted as char (R), is the smallest
positive integer n such that na = 0 for all a ∈ R. If no such integer exists, the
characteristic of R is said to be zero.

Proposition 1.44 If R is an integral domain, then char (R) = 0 or char (R) = p,


where p is a prime.

Proof Suppose that char (R) = n ≠ 0 and n is not prime. Then n = n 1 n 2 , where n 1 and n 2 are proper divisors of n. For any 0 ≠ a ∈ R, 0 = na 2 = n 1 n 2 a 2 = (n 1 a)(n 2 a). Since R is an integral domain, n 1 a = 0 or n 2 a = 0. If n 1 a = 0, then it can be seen that n 1r = 0 for any r ∈ R. In fact, (n 1r )a = r (n 1 a) = 0, and since a ≠ 0, we arrive at n 1r = 0 for all r ∈ R, where n 1 < n. This contradicts the minimality of n, and hence n is a prime.
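A few familiar rings illustrate Definition 1.43 and Proposition 1.44; the computations are immediate.

\[
\operatorname{char}(\mathbb{Z}) = \operatorname{char}(\mathbb{Q}) = 0,
\qquad
\operatorname{char}(\mathbb{Z}_n) = n \quad (\text{since } n\cdot 1 \equiv 0 \ (\mathrm{mod}\ n) \text{ while } k\cdot 1 = k \neq 0 \text{ for } 0 < k < n).
\]

The ring Z6 , whose characteristic 6 is not prime, is accordingly not an integral domain: 2 ⊗6 3 = 0, as noted earlier.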

Definition 1.45 A ring is called a simple ring if it has no proper ideals.


Example 1.46 (1) The ring R = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \,\middle|\, a, b, c, d ∈ Q \right\} is a simple ring.

(2) Every division ring is a simple ring.

Polynomial Ring Over a Ring


Let S be a ring and R be a subring of S. Then S is called a ring extension of R. An element α ∈ S is called an algebraic element over R if there exists a nonzero polynomial f (x) = α0 + α1 x + α2 x 2 + · · · + αn x n over R such that f (α) = 0, where we define f (α) = α0 + α1 α + α2 α 2 + · · · + αn α n . Clearly R and Q are extensions of Z. √3 ∈ R is algebraic over Z because it satisfies the polynomial x 2 − 3 over Z. It is to be noted that e, π ∈ R are transcendental over Z.
Let S be an extension of R. Let x ∈ S be a transcendental element over R. Further, suppose that x commutes with all the elements of R. Then it can be easily verified that T = {α0 + α1 x + α2 x 2 + · · · + αn x n | α0 , α1 , α2 , . . . , αn ∈ R, n any nonnegative integer} is a subring of S. This ring is known as a polynomial ring in x over the ring R. Each element of T is known as a polynomial in x over R. It is usually denoted by R[x]. Obviously, R ⊆ T = R[x], but x need not belong to R[x]. It can be easily verified that each element of R[x] has a unique representation. Moreover, if R is commutative, then R[x] is also commutative, and if R is an integral domain, then R[x] is also an integral domain.
In particular, if the ring S and its subring R have the same identity 1, then it can also be shown that R[x] = [R ∪ {x}], the smallest subring of S containing R ∪ {x}. Moreover, in this case R[x] is a ring with the identity 1 and x ∈ R[x]. Let f (x) = α0 + α1 x + α2 x 2 + · · · + αn x n ∈ R[x]. Here α0 , α1 , α2 , . . . , αn are known as the coefficients of f (x), and n is called the degree of f (x) provided αn ≠ 0. If f (x) = 0 + 0x + 0x 2 + · · · + 0x n , then it is called the zero polynomial. Its degree is undefined.

Remark 1.47 It can be easily seen that the definition of a polynomial ring given
above is equivalent to the definition of a polynomial ring via Example 1.28(6).
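As a sample of the multiplication rule ck = a0 bk + a1 bk−1 + · · · + ak b0 of Example 1.28(6), consider the product of two small polynomials over Z, chosen here only for illustration.

\[
(1 + 2x)(3 + x + x^{2}): \quad
c_0 = 1\cdot 3 = 3, \quad
c_1 = 1\cdot 1 + 2\cdot 3 = 7, \quad
c_2 = 1\cdot 1 + 2\cdot 1 = 3, \quad
c_3 = 2\cdot 1 = 2,
\]

so (1 + 2x)(3 + x + x 2 ) = 3 + 7x + 3x 2 + 2x 3 in Z[x], and the degree of the product is the sum of the degrees, as expected for an integral domain.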

Exercises

1. Show that a ring R with identity is commutative if and only if (ab)2 = a 2 b2 for
all a, b ∈ R.
2. Justify the existence of the identity in the above result.
3. Prove that a ring R is commutative if it satisfies the property a 2 = a for all
a ∈ R.
4. Show that a ring R is commutative if and only if (a + b)2 = a 2 + 2ab + b2 for
all a, b ∈ R.
5. If (R, +, ·) is a system satisfying all the axioms of a ring with unity except
a + b = b + a, for all a, b ∈ R, then show that (R, +, ·) is a ring.
6. Let R be a ring (may be without unity 1) satisfying any one of the following
identities:
(a) (xy)^2 = xy, for all x, y ∈ R,
(b) (xy)^2 = yx, for all x, y ∈ R,
(c) (xy − yx)^2 = xy − yx, for all x, y ∈ R.
Then show that R is commutative.
7. Find all rings up to isomorphism with two elements (three elements).
8. Show that it is impossible to find a homomorphism of Z4 onto Z5 .
9. Show that the rings Z and 2Z are not isomorphic.
10. Prove or disprove that homomorphic image of an integral domain is an integral
domain.
11. If R is a commutative ring with char(R) = p, p a prime, then show that (a + b)^p = a^p + b^p for all a, b ∈ R.
12. If R is a ring satisfying a 2 = a for all a ∈ R, then show that char (R) = 2.
13. Show that every ring containing six elements is necessarily commutative.
14. Show that it is impossible to get an integral domain with six elements.
15. Let R be a ring (may be without unity 1) satisfying any one of the following
identities:
(a) (x y)n = x y, for all x, y ∈ R, where n > 1 is a fixed positive integer,
(b) (x y)n = yx, for all x, y ∈ R, where n ≥ 1 is a fixed positive integer.
Then show that R is commutative.
16. Let n ≥ 1, be a fixed positive integer and R a ring with the identity 1 satisfy-
ing (ab)k = a k bk , for all a, b ∈ R, where k = n, n + 1, n + 2. Prove that R is
commutative. Moreover, justify the existence of the identity 1 in this result.

1.3 Fields with Basic Properties
This section is devoted to the study of fields, their various examples and some basic properties, which are used freely in the subsequent chapters.

Definition 1.48 A commutative division ring is called a field.

The sets Q, R and C under usual addition and multiplication are well-known examples of fields.

Example 1.49 (1) (Z_6, ⊕_6, ⊗_6) is neither an integral domain nor a field.
(2) (Z_p, ⊕_p, ⊗_p), p a prime, is a field.
(3) Consider the ring of real quaternions Q (see Example 1.23(4)), which is non-commutative and with the identity 1. For any nonzero element x = a + bi + cj + dk ∈ Q, it can be seen that there exists an element y = (a − bi − cj − dk)/(a^2 + b^2 + c^2 + d^2) ∈ Q such that xy = yx = 1, i.e., every nonzero element has its multiplicative inverse. This is an example of a division ring which is not a field.
(4) Define addition and multiplication in R^2 = {(a, b) | a, b ∈ R} as (a, b) + (c, d) = (a + c, b + d), (a, b)(c, d) = (ac − bd, ad + bc). It can be seen that (R^2, +, ·) is a field. In fact, for any nonzero (a, b) ∈ R^2, there exists (a, b)^{−1} = (a/(a^2 + b^2), −b/(a^2 + b^2)).
(5) Consider the set M = \left\{ \begin{pmatrix} a & b \\ −\bar{b} & \bar{a} \end{pmatrix} \mid a, b \in C \right\}. Then M is a ring under matrix addition and matrix multiplication with identity \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. For any nonzero matrix X = \begin{pmatrix} x + iy & a + ib \\ −a + ib & x − iy \end{pmatrix} ∈ M, there exists Y = \begin{pmatrix} \frac{x − iy}{x^2 + y^2 + a^2 + b^2} & \frac{−(a + ib)}{x^2 + y^2 + a^2 + b^2} \\ \frac{a − ib}{x^2 + y^2 + a^2 + b^2} & \frac{x + iy}{x^2 + y^2 + a^2 + b^2} \end{pmatrix} ∈ M such that XY = YX = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. Hence M is a division ring which is not a field.
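A quick numerical check, as a small Python sketch, of Example 1.49(4): multiplying a nonzero pair (a, b) by the claimed inverse returns the identity (1, 0).

```python
# Sketch: the field structure on R^2 with (a,b)(c,d) = (ac - bd, ad + bc).

def mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

def inv(p):
    a, b = p
    n = a * a + b * b          # nonzero whenever (a, b) != (0, 0)
    return (a / n, -b / n)

if __name__ == "__main__":
    x = (3.0, -4.0)
    print(mul(x, inv(x)))      # (1.0, 0.0), the multiplicative identity
```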
Proposition 1.50 Every field is an integral domain.
Proof Suppose that R is a field, and that there exist a ≠ 0, b ≠ 0 in R such that ab = 0.
Since b ≠ 0 is an element of the field R, there exists c ∈ R such that bc = cb = 1.
Now 0 = (ab)c = a(bc) = a1 = a, a contradiction. Hence ab ≠ 0, and R is an integral domain.
Remark 1.51 An integral domain need not be a field. For example, Z, the ring
of integers, is an integral domain which is not a field. But in the case of a finite integral
domain, the converse of the above result is true.
Proposition 1.52 Every finite integral domain is a field.
Proof Let R = {a_1, a_2, . . . , a_n} be a finite integral domain consisting of distinct elements
a_1, a_2, . . . , a_n, and let a be a nonzero element of R. Then aa_1, aa_2, . . . , aa_n
are all distinct and belong to R. In fact, if aa_i = aa_j for some i ≠ j, then a(a_i − a_j) = 0,
which yields that a_i = a_j, a contradiction. Hence the set
{aa_1, aa_2, . . . , aa_n} coincides with R. In particular, aa_k = a for some 1 ≤ k ≤ n. It
can be seen that a_k is the multiplicative identity of R. If a′ is an arbitrary element
of R, then a′ = aa_i for some 1 ≤ i ≤ n. But a′a_k = a_k a′ = a_k(aa_i) = (a_k a)a_i =
aa_i = a′ shows that a_k is the identity of R; denote it by 1. Hence if 0 ≠ a ∈ R, we
find that aa_m = a_m a = 1 for some 1 ≤ m ≤ n. Thus every nonzero element in R
has its multiplicative inverse, and R is a field.
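The argument above can be carried out concretely for R = Z_p. The following Python sketch finds the inverse of a nonzero a exactly as in the proof: the products a·x, x ∈ Z_p, exhaust Z_p, so one of them equals 1.

```python
# Sketch of Proposition 1.52 for Z_p: the map x -> a*x is injective, hence onto, so a has an inverse.

def inverse_mod_p(a: int, p: int) -> int:
    """Find the multiplicative inverse of a (mod p) by scanning the products a*x, x in Z_p."""
    products = {(a * x) % p: x for x in range(p)}
    assert len(products) == p          # the products are pairwise distinct
    return products[1]                 # the x with a*x ≡ 1 (mod p)

if __name__ == "__main__":
    p = 11
    for a in range(1, p):
        assert (a * inverse_mod_p(a, p)) % p == 1
    print("every nonzero element of Z_11 has a multiplicative inverse")
```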
Remark 1.53 (i) We have earlier shown that the characteristic of an integral
domain is either 0 or p, where p is a positive prime integer. But we have also
earlier proved that every field is an integral domain. Thus we conclude that
characteristic of a field is either 0 or p, where p is a positive prime integer.
(ii) The identity of a subfield is the same as the identity of the field.
Proposition 1.54 A field has no proper ideals. In particular, every field is a simple
ring.
Proof Let F be a field and suppose that I is an ideal of F. If I = {0}, there is nothing
to prove. If not, there exists a nonzero element a ∈ I. As F is a field, there exists
the element a −1 ∈ F such that a −1 a = 1. Since I is an ideal of F, it follows that
a −1 a = 1 ∈ I. As a result, I = F. Hence F has no proper ideal. Finally, it can be
said that every field F is a simple ring.
Proposition 1.55 Let F be a field and let f : F −→ R be any ring homomorphism, where R is an arbitrary ring. Then f is either an injective ring homomorphism or the zero homomorphism.
Proof We know that Ker f is an ideal of F. But, as proved above, a field has no proper
ideals. Thus we have only two possibilities for Ker f: either Ker f = {0} or
Ker f = F. In the first case f is an injective ring homomorphism, and in the second case f is the zero homomorphism.
Polynomial Ring Over a Field
Having described a polynomial ring over a ring R in the previous section, in the
present section we focus only on the polynomial ring over a field F and some of its
important properties, which are used in different chapters of this book. First,
we define some basic terms analogously to those of the previous section.
Let F be a subfield of K . Then K is called a field extension of F. An ele-
ment α ∈ K is called an algebraic element over F if there exists a nonzero poly-
nomial f (x) = α0 + α1 x + α2 x 2 + · · · + αn x n ∈ F[x] such that f (α) = 0, where
we define f (α) = α0 + α1 α + α2 α 2 + · · · + αn α n . Otherwise, α ∈ K is known as a
transcendental element
√ over F. Clearly, C is an extension of R and R is an extension
of Q. Obviously, 3 ∈ R is algebraic over Q because it is a root of x 2 − 3 over Q. It
is to be noted that e, π ∈ R are transcendental over Q. An extension K of a field F is
called an algebraic extension of F if each of its elements is algebraic over F. Observe that
C is an algebraic extension of R because any α + iβ ∈ C is a root of the polynomial
(x − α)2 + β 2 ∈ R[x]. It is also clear that R is not an algebraic extension of Q.
Let K be an extension of F. Let x ∈ K be a transcendental element over F.
Similarly, as discussed in preceding section, one concludes that the polynomial
ring F[x] is given as F[x] = {α0 + α1 x + α2 x 2 + · · · + αn x n |α0 , α1 , α2 . . . αn ∈
F, n any nonnegative integer}. This ring F[x] is an integral domain with identity. The
polynomial ring F[x] has many important properties. Division algorithm, Euclidean
algorithm and Factor theorem are some of the important and well-known results
in the ring F[x]. Here, we prove only Division algorithm, which has been used in
proving so many results of this book.
Proposition 1.56 (Division Algorithm) Let f(x), 0 ≠ g(x) ∈ F[x]. There exist
unique polynomials q(x), r(x) ∈ F[x] such that f(x) = g(x)q(x) + r(x), where
either r(x) = 0 or deg r(x) < deg g(x). Here q(x) and r(x) are called the quotient and the
remainder, respectively, when f(x) is divided by g(x).
Proof First, we prove the existence of such polynomials q(x), r(x) ∈ F[x]. If
f(x) = 0 or a constant polynomial, then it is obvious. On the other hand, if g(x)
is a constant polynomial, then existence is also trivial. Now consider the case when
deg f(x) = m ≥ 1 and deg g(x) = n ≥ 1. If m ≤ n, it is easy to get the existence
of such polynomials. Finally, we tackle the case when m > n. In this case the proof
follows by strong induction on deg f(x). The result is obvious when m = 1.
Suppose, as induction hypothesis, that the result holds for all polynomials belonging
to F[x] whose degree is less than the degree of f(x). Let f(x) = α_0 + α_1 x +
α_2 x^2 + · · · + α_m x^m and g(x) = β_0 + β_1 x + β_2 x^2 + · · · + β_n x^n. Now construct the
polynomial h(x) = f(x) − α_m β_n^{−1} x^{m−n} g(x). Clearly, h(x) ∈ F[x] and deg h(x) <
deg f(x). Hence by the induction hypothesis there exist polynomials q_1(x), r_1(x) ∈ F[x]
such that h(x) = g(x)q_1(x) + r_1(x). This implies that f(x) − α_m β_n^{−1} x^{m−n} g(x) =
g(x)q_1(x) + r_1(x), i.e., f(x) = g(x)(α_m β_n^{−1} x^{m−n} + q_1(x)) + r_1(x). Thus the existence
of such polynomials for f(x) stands proved. Hence the result holds for any f(x), 0 ≠ g(x) ∈
F[x].
Now, we prove the uniqueness of these polynomials. If possible,
suppose that there exist polynomials q_1(x) ≠ q_2(x) and r_1(x) ≠ r_2(x) in F[x] such
that f(x) = g(x)q_1(x) + r_1(x), where either r_1(x) = 0 or deg r_1(x) < deg g(x),
and f(x) = g(x)q_2(x) + r_2(x), where either r_2(x) = 0 or deg r_2(x) < deg g(x).
Equating the two expressions for f(x), we obtain g(x)(q_1(x) − q_2(x)) = r_1(x) − r_2(x).
As F[x] is an integral domain and q_1(x) − q_2(x) ≠ 0, we conclude that
r_1(x) − r_2(x) ≠ 0. Thus deg(r_1(x) − r_2(x)) < n, because deg r_1(x) < n and deg
r_2(x) < n. On the other hand, deg(r_1(x) − r_2(x)) = deg[g(x)(q_1(x) − q_2(x))] =
deg g(x) + deg(q_1(x) − q_2(x)) ≥ n, which leads to a contradiction. This completes
the proof.
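The following Python sketch (coefficients taken in Q and represented exactly with Fraction; polynomials stored as coefficient lists, lowest degree first) mimics the existence part of the proof: at each step the leading term is cancelled by subtracting α_m β_n^{−1} x^{m−n} g(x).

```python
# Sketch of the division algorithm over Q: returns q, r with f = g*q + r, r = 0 or deg r < deg g.
from fractions import Fraction

def poly_divmod(f, g):
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f[:]
    while len(r) >= len(g) and any(r):
        while r and r[-1] == 0:          # strip trailing zeros so the leading term is genuine
            r.pop()
        if len(r) < len(g):
            break
        shift = len(r) - len(g)
        coeff = r[-1] / g[-1]            # alpha_m * beta_n^{-1}, as in the proof
        q[shift] = coeff
        for i, gc in enumerate(g):       # subtract coeff * x^shift * g(x)
            r[i + shift] -= coeff * gc
    return q, r

if __name__ == "__main__":
    # f(x) = x^3 - 2x + 1 divided by g(x) = x - 1 gives q(x) = x^2 + x - 1 and r(x) = 0
    q, r = poly_divmod([1, -2, 0, 1], [-1, 1])
    print(q, r)                          # r consists only of zero coefficients
```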

Remark 1.57 (i) Let f (x) ∈ F[x]. Let α ∈ F be a root of f (x). Divide f (x)
by (x − α) according to Division algorithm, there exist unique polynomials
q(x) and r (x) such that f (x) = (x − α)q(x) + r (x), either r (x) = 0 or deg
r (x) < deg (x − α) = 1. This implies that r (x) = 0 or λ, a nonzero constant
polynomial. If r (x) = λ, then we get f (x) = (x − α)q(x) + λ. Using the fact
that α is a root of f (x), we arrive at λ = 0, i.e., f (x) = (x − α)q(x). This
shows that (x − α) is a factor of f (x). It is known as factor theorem.
(ii) Let f(x), g(x) ∈ F[x]. We say that a nonzero polynomial f(x) divides a
polynomial g(x) in F[x], symbolically written as f(x) | g(x), if there exists
h(x) ∈ F[x] such that g(x) = f(x)h(x). Here f(x) and h(x) are called
factors of g(x). In particular, if deg f (x) < deg g(x) and deg h(x) <deg
g(x), then f (x) and h(x) are known as proper or nontrivial factors of g(x).
Otherwise, f (x) and h(x) are known as improper or trivial factors of g(x).
(iii) Let f (x) ∈ F[x] be any nonconstant polynomial. If f (x) has no proper fac-
tors then f (x) is known as irreducible polynomial. Otherwise, it is known as
reducible polynomial.
(iv) Let f (x) ∈ F[x] be any nonconstant polynomial. Then f (x) = f 1 (x) f 2 (x) · · ·
f n (x), where f 1 (x), f 2 (x), . . . , f n (x) are some irreducible polynomials over
F.
(v) Let K be an extension of F. An element α ∈ K is called a root of f (x) = α0 +
α1 x + α2 x 2 + · · · + αn x n ∈ F[x] if f (α) = 0, where f (α) = α0 + α1 α +
α2 α 2 + · · · + αn α n .
(vi) Let f (x) and g(x) be any two nonzero polynomials over F. Then deg ( f (x) +
g(x)) ≤ max ( deg f (x), deg g(x)), deg ( f (x)g(x)) = deg f (x) + deg g(x).
(vii) Let f (x) ∈ F[x] be any polynomial of degree n ≥ 1. Then total number of
roots of f (x) = n.
(viii) Let K be an extension of F. Let α ∈ K be algebraic over F. Let T = [F ∪ {α}]


be the subfield of K generated by F ∪ {α}. Then T is an algebraic extension
of F.

Definition 1.58 Let F be a field. A subset S of F with at least two elements is called
a subfield of F if it forms a field with regard to the operations of F.

Example 1.59 (1) Q is a subfield of R.


(2) Q and R are subfields of C.
(3) Every field is a subfield of itself, known as the trivial subfield.
(4) We know that the set X = {a + b√2 | a, b ∈ Q}, consisting of a special type of
real numbers, is a field with regard to usual addition and usual multiplication. It
is obvious to observe that Q is a subfield of X.

Proposition 1.60 Let F be a field. A subset S of F with at least two elements is a
subfield of F if and only if
(i) a, b ∈ S ⇒ a − b ∈ S,
(ii) a ∈ S, 0 ≠ b ∈ S ⇒ ab^{−1} ∈ S.
Proof Suppose that F is a field and S is a subfield of F. This implies that (S, +) is
a subgroup of (F, +). Hence a, b ∈ S ⇒ a + (−b) = a − b ∈ S. Similarly, (S^∗ =
S∖{0}, ·) is a subgroup of (F^∗ = F∖{0}, ·). This implies that a, b ∈ S^∗ ⇒ ab^{−1} ∈ S^∗.
In turn, we have ab^{−1} ∈ S. On the other hand, if a = 0 and b ∈ S^∗, then ab^{−1} = 0 ∈
S. Combining the previous two statements, we have shown that a ∈ S, 0 ≠ b ∈ S ⇒
ab^{−1} ∈ S.
Conversely, suppose that a subset S of F with at least two elements satisfies
(i) a, b ∈ S ⇒ a − b ∈ S; (ii) a ∈ S, 0 ≠ b ∈ S ⇒ ab^{−1} ∈ S. To show that (S, +, ·) is
a field, we prove that (S, +, ·) is a commutative ring with identity in which each
nonzero element is a unit. As |S| ≥ 2, there exists an element 0 ≠ x ∈ S. Now using
(i), we have x − x = 0 ∈ S, and using (ii) we have xx^{−1} = 1 ∈ S. Next suppose
that a, b ∈ S. If at least one of them is zero, then ab = 0 ∈ S. Now suppose that
a and b are both nonzero. Using (ii), we have 1b^{−1} = b^{−1} ∈ S. Obviously
b^{−1} ≠ 0. Now using (ii) again, we have a(b^{−1})^{−1} = ab ∈ S. Thus, we have shown
that a, b ∈ S ⇒ ab ∈ S. Now utilizing (i) and the previous conclusion, we conclude
that S is a subring of F with identity. As F is a commutative ring, S is a
commutative ring with the identity 1 in which each nonzero element is a unit.
Thus S is a field contained in F; hence S is a subfield of F.

Definition 1.61 A field F is called a prime field if it has no proper subfield, that is, the only subfield of F is F itself.

Proposition 1.62 The fields Q and Z p , where p is a positive prime integer, are prime
fields.
Proof Let S be a subfield of Q. It is obvious that 1 ∈ S. Let q ∈ Q. If q = 0, then
it obviously belongs to S. On the other hand, if q ≠ 0, then it is of the form q = a/b,
where a ≠ 0, b ≠ 0 ∈ Z. Without loss of generality, we can assume that b > 0. As
1 ∈ S, 1 + 1 + · · · + 1 (b times) = b ∈ S. But the characteristic of Q is 0; as a result, b ≠ 0, and
S is a field. Hence b^{−1} ∈ S. If a > 0, then by the previous argument a ∈ S. On
the other hand, if a < 0, then −a > 0 and by the previous argument −a ∈ S; but as
S is a field, −(−a) = a ∈ S. At this stage, we have proved that a, b^{−1} ∈ S.
Using the fact that S is a field, we conclude that ab^{−1} = a/b ∈ S. It now follows that
Q ⊆ S. Finally, we have shown that S = Q. As a result, Q has no proper subfield;
thus Q is a prime field.
Consider Z_p = {0̄, 1̄, . . . , \overline{p − 1}}, the field of residue classes modulo p, where
p is a positive prime integer. Let S be a subfield of Z_p. Clearly, 1̄ ∈ S. Let x̄ ∈ Z_p.
If x̄ = 0̄, then obviously x̄ ∈ S. If x̄ ≠ 0̄, then clearly x is an integer such that
1 ≤ x ≤ p − 1, and x̄ = 1̄ + 1̄ + · · · + 1̄ (x times). We obtain that x̄ ∈ S due to the fact that
S is a field. Thus we have shown that S = Z_p. Hence Z_p is a prime field.

Proposition 1.63 The intersection of all subfields of a field F is a prime field.
Proof Let {S_i}_{i∈Λ} be the family of all subfields of the field F, where Λ is an indexing
set. We have to prove that ∩_{i∈Λ} S_i is a prime field. We know that the intersection of a
family of subfields is a subfield; thus ∩_{i∈Λ} S_i is a subfield of F. Next we prove that
∩_{i∈Λ} S_i has no proper subfield. For this, let S be a subfield of ∩_{i∈Λ} S_i. This implies
that S is obviously a subfield of F. But {S_i}_{i∈Λ} is the family of all subfields of the
field F; thus there exists i_0 ∈ Λ such that S = S_{i_0}. Obviously, ∩_{i∈Λ} S_i ⊆ S_{i_0} = S.
Thus we conclude that S = ∩_{i∈Λ} S_i. Therefore ∩_{i∈Λ} S_i has no proper subfield.
Hence ∩_{i∈Λ} S_i is a prime field.

Proposition 1.64 Let F be a field. If char(F) = 0, then there exists a subfield S_1 of
F such that S_1 ≅ Q. Further, if char(F) = p, where p is a positive prime integer,
then there exists a subfield S_2 of F such that S_2 ≅ Z_p.
Proof Let us suppose that char(F) = 0. Define a map f : Q −→ F by f(m/n) =
(m1)(n1)^{−1}, where 1 is the identity of the field F. As char(F) = 0, we have (n1) ≠ 0 for
any positive integer n; therefore (n1)^{−1} exists. It can be easily verified that f is
a well-defined map and also that it is an injective ring homomorphism. Thus Q ≅ f(Q).
But we know that an isomorphic image of a field is a field. This shows that f(Q) is
a field. Now our required subfield S_1 of F is f(Q), i.e., S_1 = f(Q) ⊆ F.
Now let us suppose that char(F) = p, and recall that Z_p = {0̄, 1̄, . . . , \overline{p − 1}}. Define
a map f : Z_p −→ F by f(x̄) = x1, where 1 is the identity of the field
F and x is an integer such that 0 ≤ x ≤ p − 1. It is easy to verify that f is a well-defined
map and also that it is an injective ring homomorphism. Hence Z_p ≅ f(Z_p).
But f(Z_p) is a field, because an isomorphic image of a field is a field. Thus our desired
subfield S_2 of F is f(Z_p), i.e., S_2 = f(Z_p) ⊆ F.
Definition 1.65 A field containing a finite number of elements is called a finite field
or a Galois field.

Example 1.66 Z p is a finite field.

Remark 1.67 (i) The characteristic of a finite field F cannot be 0. For otherwise,
i.e., if char(F) = 0, then 1 + 1, 1 + 1 + 1, 1 + 1 + 1 + 1, . . . all belong to F,
as 1 ∈ F, and all these elements are distinct. But in such a situation
F would contain an infinite number of elements, a contradiction. Thus
the characteristic of a finite field F is a positive prime integer p.
It is to be noted that the converse of this statement is not true: there exist
infinite fields of characteristic p. We construct an example of such a field.
Let Z_p denote the field of residue classes modulo p, let x be a transcendental
element over Z_p, and let Z_p[x] denote the polynomial ring over the
field Z_p. Now define the set T = { f(x)/g(x) | f(x), g(x) ∈ Z_p[x], g(x) ≠ 0 }. It can be
easily verified that T is a field with regard to addition and multiplication of
rational functions. It is also obvious to observe that T is an infinite field, but
char(T) = p.
(ii) A field F is finite if and only if its multiplicative group is cyclic.
(iii) The number of elements in a finite field will be of the form p n , where p is a
positive prime integer, n is any positive integer and char (F) = p. A finite field
or a Galois field containing p n elements is usually denoted by G F( p n ).
(iv) Given a positive prime p and a positive integer n, there exists a finite field
containing p n elements.
(v) Any two finite fields having the same number of elements are isomorphic.

Definition 1.68 A field F is called an algebraically closed field if every nonconstant polynomial over F has a root in F.

Example 1.69 (1) The field of complex numbers C is algebraically closed: by the
Fundamental Theorem of Algebra, every nonconstant polynomial over the field
of complex numbers has all its roots in C.
(2) The field of real numbers R is not algebraically closed. Since x 2 + 1 is a poly-
nomial over field of real numbers which has roots ±i but these roots do not lie
in R.
(3) The field of rational numbers Q is not algebraically closed.√Since x 2 − 3 is a
polynomial over field of rational numbers which has roots ± 3 but these roots
do not lie in Q.
(4) Any finite field F cannot be algebraically closed. Let F = {a1 , a2 , . . . , am }. Now
construct a polynomial P(x) = 1 + (x − a1 )(x − a2 ) · · · (x − am ) over F. It is
obvious to observe that no element of F is a root of P(x). Thus F is not alge-
braically closed.
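A numerical illustration (a Python sketch assuming the NumPy library): the roots of the real polynomial x^2 + 1 are the non-real numbers ±i, while the roots of x^2 − 3 are irrational, matching items (2) and (3) above.

```python
# Sketch: numerical roots of polynomials given by their coefficients (highest degree first).
import numpy as np

print(np.roots([1, 0, 1]))    # x^2 + 1 -> approximately [ i, -i ], not in R
print(np.roots([1, 0, -3]))   # x^2 - 3 -> approximately [ 1.732..., -1.732... ], not in Q
```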

Proposition 1.70 Let F be a field. Then the following statements are equivalent for F:
(i) F is algebraically closed.
(ii) F has no proper algebraic extension.
(iii) Every irreducible polynomial over F is of degree 1.
(iv) Every nonconstant polynomial over F has a root in F.
(v) Every nonconstant polynomial over F has all its roots in F.
(vi) Every nonconstant polynomial over F breaks into linear factors over F.

Proof (i) ⇒ (ii) Let F be algebraically closed. We have to prove that F has no proper
algebraic extension. Suppose on contrary, i.e., there exists a proper algebraic exten-
sion K of F. This implies that there exists α ∈ K such that α ∈ / F and α is a root of a
polynomial f (x) over F. Let f (x) = α0 + α1 x + α2 x 2 + · · · + αn x n , where n ≥ 1
be a polynomial over the field F. As F is algebraically closed, hence f (x) will have
a root in F. Let this root be β1 ∈ F. Hence by factor theorem f (x) = (x − β1 )g(x),
where g(x) ∈ F[x]. Using hypothesis again g(x) will have a root β2 ∈ F. Now
using again the factor theorem, we arrive at f (x) = (x − β1 )(x − β2 )g1 (x), where
g1 (x) ∈ F[x]. Now proceeding inductively, after finite number of steps and using
the fact that any polynomial over F of degree n has n roots, we obtain that
f (x) = λ(x − β1 )(x − β2 ) · · · (x − βn ), where λ, βn ∈ F. Now we conclude that
β1 , β2 , . . . , βn are precisely the roots of f (x). But α is also a root of f (x). Thus,
we get that α = βi for some i : 1 ≤ i ≤ n. This implies that α ∈ F, which leads to
a contradiction. Thus F has no proper algebraic extension.

(ii) ⇒ (iii) Suppose that F has no proper algebraic extension. We have to show that every
irreducible polynomial over F is of degree 1. Suppose on the contrary that there exists
an irreducible polynomial f(x) over F of degree n ≥ 2. It is obvious to observe that
no root of f(x) lies in F. Let α be a root of f(x) lying outside F. Clearly α is
algebraic over F. Let K = [F ∪ {α}] be the subfield generated by F ∪ {α}. As α is
algebraic over F and K ≠ F, K is a proper algebraic extension of F. This leads
to a contradiction. Hence every irreducible polynomial over F is of degree 1.

(iii) ⇒ (iv) Suppose that every irreducible polynomial over F is of degree 1, and let
f(x) be any nonconstant polynomial over F. We know that f(x) = f_1(x) f_2(x) · · · f_n(x),
where f 1 (x), f 2 (x), . . . , f n (x) are irreducible polynomials over F. But according
to hypothesis, we get f 1 (x) = a1 x + b1 , f 2 (x) = a2 x + b2 , . . . , f n (x) = an x + bn .
This shows that f 1 (x), f 2 (x), . . . , f n (x) have their roots in F. As a result, f (x) has
all its roots in F. This completes our proof.

(iv) ⇒ (v) Suppose that every polynomial over F has a root in F. Let f (x) ∈ F[x]
of degree n ≥ 1. Hence f (x) will have a root in F. Let this root be β1 ∈ F. Hence by
factor theorem f (x) = (x − β1 )g(x), where g(x) ∈ F[x]. Using hypothesis again
g(x) will have a root β2 ∈ F. Now using again the factor theorem, we arrive at
f (x) = (x − β1 )(x − β2 )g1 (x), where g1 (x) ∈ F[x]. Now proceeding inductively,
after a finite number of steps and using the fact that the degree of f(x) is n, we obtain
that f (x) = λ(x − β1 )(x − β2 ) · · · (x − βn ), where λ, βn ∈ F. Now we conclude
that β1 , β2 , . . . , βn are precisely the roots of f (x) and these lie in F. We are done.
(v) ⇒ (vi) Let every polynomial over F have all its roots in F. Suppose that
f (x) ∈ F[x] of degree n ≥ 1. By hypothesis f (x) will have all roots in F. But
we know that the total number of all roots of f (x) is equal to n, the degree
of the polynomial. Thus, let these roots be α1 , α2 , . . . , αn ∈ F. Thus, by factor
theorem (x − α1 ), (x − α2 ), . . . , (x − αn ) will be factors of f (x). Thus, we can
write f (x) = (x − α1 )(x − α2 ) · · · (x − αn )g(x), where g(x) ∈ F[x]. Now compar-
ing the degrees of both sides in the previous relation, we conclude that g(x) = λ ∈ F.
Hence f (x) = λ(x − α1 )(x − α2 ) · · · (x − αn ). Now proof is completed.

(vi) ⇒ (i) Suppose that every polynomial over F breaks into linear factors over F
and let f (x) = λ(x − α1 )(x − α2 ) · · · (x − αn ). Clearly α1 is a root of f (x) which
lies in F. This shows that F is algebraically closed.

Exercises

1. Prove that the only isomorphism of Q (resp. R) onto Q (resp. R) is the identity
mapping I Q (resp. I R ).
2. Show that the only isomorphism of C onto itself which maps reals to reals is the
identity mapping IC or the conjugation mapping.
3. Let R be a commutative ring with unity. Prove that R is a field if R has no proper
ideals.
4. Show that a finite field cannot be of characteristic zero.
5. Prove that for a fixed prime p, Q(√p) = {a + b√p | a, b ∈ Q} forms a field under usual addition and usual multiplication.
6. Let F be a field of characteristic p > 0 and a ∈ F be such that there exists no b ∈ F with b^p = a (i.e., \sqrt[p]{a} ∉ F). Then show that x^p − a is irreducible over F.
7. If f is a function from a field F to itself such that f(x) = x^{−1} if x ≠ 0 and
f(0) = 0, then show that f is an automorphism if and only if F has either 2, 3, or 4 elements.
8. Find all roots of x 5 + 3̄x 3 + x 2 + 2̄x ∈ Z5 [x] in Z5 .
9. Show that a factor ring of a field is either the trivial ring of one element or is
isomorphic to the field.
10. Show that for a field F, the set of all matrices of the form \begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix}, a, b ∈ F, is a right ideal of M_2(F) but not a left ideal of
M_2(F). Moreover, find a subset of M_2(F) which is a left ideal of M_2(F), but not
a right ideal of M_2(F).
11. Let R be a ring with identity. If char (R) = 0, then show that there exists a
subring S1 of R such that S1 ∼ = Z, the ring of integers. Further, if char (R) = n,
where n is a positive integer, then prove that there exists a subring S2 of R such
that S2 ∼
= Zn , the ring of residue classes (mod n).
1.4 Matrices

Let us start by introducing the basic definitions.
Let F be a field. Most of the results we are going to discuss hold when F
is an arbitrary field; nevertheless, in all that follows we always assume that the characteristic of F
is different from 2.
A matrix over F of size m × n is a rectangular array with m rows and n columns
of the form
A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}

where ai j ∈ F, for i = 1, . . . , m and j = 1, . . . , n. The indices i, j corresponding to


any ai j indicate that the element appears in the ith row and jth column of the matrix.
A matrix is said to be square of order n if it has the same number n of rows and
columns. Any matrix A can be represented in the compact form A = [a_{ij}], where the
number of rows and columns is understood from the context. Sometimes it will be useful
to look at a matrix A as the sequence of its rows or a sequence of its columns. In
those cases, denoting by
R_i = [a_{i1} \; a_{i2} \; \dots \; a_{in}] \quad \text{and} \quad C_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix}

respectively, the ith row and the jth column of A, we may represent A just listing
its rows or columns, that is as

A = [R1 , R2 , . . . , Rm ] or A = [C1 , C2 , . . . , Cn ].

Definition 1.71 Let A be a matrix over F of order m × n. A submatrix of the matrix


A is a rectangular array with s ≤ m rows and t ≤ n columns obtained from A by
deleting some of its m − s rows and n − t columns. This is equivalent to say that, if
we pick s rows Ri1 , . . . , Ris and t columns C j1 , . . . , C jt of A, we obtain a submatrix
B of A by collecting the elements belonging to those rows and columns
B = \begin{pmatrix} a_{i_1 j_1} & a_{i_1 j_2} & \dots & a_{i_1 j_t} \\ a_{i_2 j_1} & a_{i_2 j_2} & \dots & a_{i_2 j_t} \\ \vdots & \vdots & & \vdots \\ a_{i_s j_1} & a_{i_s j_2} & \dots & a_{i_s j_t} \end{pmatrix}.
A principal submatrix is a square submatrix obtained by picking the same number of


rows and columns in A, i.e., s = t and i 1 = j1 , . . . , i s = js . The leading principal
submatrix of size k × k is a principal submatrix which arises by picking precisely
the first k rows and the first k columns of A.
Two matrices are said to be equal if they have the same orders and the correspond-
ing entries are all equal to each other. The m × n zero matrix is the matrix whose all
entries ai j are 0, for any i = 1, . . . , m and j = 1, . . . , n. We denote such a matrix
by 0, or 0m×n in case its order needs to be recalled.
In literature, the set of all m × n matrices over a field F is denoted by Mmn (F) and
in particular, for m = n, the set of all square matrices of order n over F is usually
denoted by Mn (F).
In case of a square matrix A = ai j ∈ Mn (F), we give the following additional
definitions:
(1) The set consisting of all elements with the same indices aii , for i = 1, . . . , n is
said to be the main diagonal of the square matrix.
(2) A is said to be a diagonal matrix if all its off-diagonal terms are 0.
(3) A is called the identity matrix of size n if all its diagonal terms are equal to 1, while
the remaining entries are 0, i.e., a_{ii} = 1 and a_{ij} = 0 for i ≠ j. We denote such
a matrix by I, or In if it is important to keep track of its size.
(4) A is said to be a scalar matrix if it is of the form α In , for some α ∈ F. Then it
is a diagonal matrix whose all diagonal terms are equal to α.
(5) A is called an upper triangular if every element below the main diagonal is 0,
that is ai j = 0 for any i > j.
(6) A is called a lower triangular if every element above the main diagonal is 0,
that is ai j = 0 for any j > i.
Matrix Operations

Let Mmn (F) be the set of all m × n matrices over F. If A, B ∈ Mmn (F), then the
sum A + B is the matrix in Mmn (F) obtained by adding
  together the corresponding
entries in the two matrices. In other
  words, if A = ai j and B = bi j then A + B =
C ∈ Mmn (F) is the matrix C = ci j such that ci j = ai j + bi j , for any i = 1, . . . , m
and j = 1, . . . , n :
C = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \dots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \dots & a_{2n} + b_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \dots & a_{mn} + b_{mn} \end{pmatrix}.
 
Let now α ∈ F and A = ai j ∈ Mmn (F). The scalar multiplication of A by α is the
matrix in Mmn (F) obtained
 by multiplying each entry of A by the scalar α, that is,
the matrix α A = αai j :
αA = \begin{pmatrix} \alpha a_{11} & \alpha a_{12} & \dots & \alpha a_{1n} \\ \alpha a_{21} & \alpha a_{22} & \dots & \alpha a_{2n} \\ \vdots & \vdots & & \vdots \\ \alpha a_{m1} & \alpha a_{m2} & \dots & \alpha a_{mn} \end{pmatrix}.

Appealing to the field axioms and focusing the attention to an entry-wise, it is easy
to see that addition and scalar multiplication satisfy the following properties:
(1) For any A, B ∈ Mmn (F), A + B = B + A.
(2) For any A, B, C ∈ Mmn (F), A + (B + C) = (A + B) + C.
(3) For any A ∈ Mmn (F), A + 0m×n = 0m×n + A = A.
(4) For any A = (ai j ) ∈ Mmn (F), there exists the matrix B = (−ai j ) ∈ Mmn (F)
such that A + B = 0m×n . Usually such a matrix B is denoted by −A.
(5) For any α ∈ F and A, B ∈ Mmn (F), α(A + B) = α A + α B.
(6) For any α, β ∈ F and A ∈ Mmn (F), (α + β)A = α A + β A.
(7) If 1F is the identity element of F, then 1F A = A, for any A ∈ Mmn (F).
(8) For any α, β ∈ F and A ∈ Mmn (F), α(β A) = (αβ)A.
In particular, note that Mmn (F) is a commutative group with respect to the addition
between matrices. Moreover, it is an example of a vector space over F. The concept
of vector space is discussed in the next chapter.

Definition 1.72 Let A ∈ Mn (F). The sum of all entries on the main diagonal of A
is called the trace of A and denoted by tr (A).

The following properties satisfied by the trace are easy consequences of its definition.
(1) If A, B ∈ Mn (F), then tr (A + B) = tr (A) + tr (B).
(2) If A ∈ Mn (F) and α ∈ F, then tr (α A) = αtr (A).

Definition 1.73 Let A ∈ Mmn (F). The transpose of A, usually denoted by At , is the
n × m matrix obtained by interchanging rows and columns of A in such a way that
the ordered elements in the ith row (column) of A are exactly the ordered elements of
the ith column (resp. row) of At . In other words, if α ∈ F appears in the (r, s)-entry
of A, then it appears in the (s, r )-entry of At .

Here we also list a number of easy properties, whose proofs depend only on the
definition of the transpose.
(1) For any A ∈ Mmn (F), (At )t = A.
(2) For any A, B ∈ Mmn (F), (A + B)t = At + B t .
(3) For any A ∈ Mn (F) and α ∈ F, (α A)t = α At .
(4) For any A ∈ Mn (F), tr (At ) = tr (A).
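A short NumPy sketch (Python; the numerical values are arbitrary choices) spot-checking the operations introduced so far: entrywise addition, scalar multiplication, the trace rule tr(A + B) = tr(A) + tr(B), and the transpose rule (A^t)^t = A.

```python
# Sketch: matrix addition, scalar multiplication, trace and transpose with NumPy.
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [1., 0.]])
alpha = 5.0

print(A + B)                                          # entrywise sum
print(alpha * A)                                      # scalar multiplication
print(np.trace(A + B), np.trace(A) + np.trace(B))     # equal values: tr(A+B) = tr(A) + tr(B)
print(np.array_equal(A.T.T, A))                       # True: (A^t)^t = A
```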
A particular class of matrices, highly significant in further discussions, consists of
those which satisfy the condition of coinciding with their corresponding transposes.

Definition 1.74 If a square matrix A satisfies the condition A = At , (resp. A =


−At ) we say that it is a symmetric (resp. skew-symmetric ) matrix.
Generalizing the previous concepts, we introduce the following:


 
Definition 1.75 The conjugate of a complex matrix A = ai j ∈ Mmn (C) is the
matrix A = (ai j ) ∈ Mmn (C) such that ai j = ai j for any i = 1, . . . , m and j =
1, . . . , n. The conjugate transpose of A is the matrix A∗ = (bi j ) ∈ Mnm (C) such
that bi j = a ji for any i = 1, . . . , n and j = 1, . . . , m. A square matrix that is equal
to its conjugate transpose is called hermitian. If A = −A∗ , then A is said to be
skew-hermitian.
If A is real, then its conjugate transpose is same as its transpose, and hence hermitian
is same as symmetric.  
Now
 we describe matrix multiplication as follows: Let A = ai j ∈ Mmn (F)  and

B = bi j ∈ Mnq (F). The matrix product AB is defined as the matrix C = ci j ∈
Mmq (F) such that the entry cr s of C is computed by


c_{rs} = \sum_{k=1}^{n} a_{rk} b_{ks} = a_{r1} b_{1s} + a_{r2} b_{2s} + a_{r3} b_{3s} + · · · + a_{rn} b_{ns}.

It is clear that the product AB does not make sense if A ∈ Mmn (F) and B ∈ Mtq (F)
with n = t. Hence, the products AB and B A are simultaneously possible only if A
and B are matrices of the orders m × n and n × m, respectively. Nevertheless, even
when possible, it is not generally true that AB = B A. Thus the matrix product is not
commutative. Associativity of matrix multiplication and distributive properties are
given as below:
     
Proposition 1.76 (i) If A = ai j ∈ Mmn (F), B = bi j ∈ Mnt (F), C = ci j ∈
Mtq (F) then (AB)C = A(BC), that is, the matrix product is associative.
(ii) For any A ∈ Mmn (F), B ∈ Mnt (F), C ∈ Mnt (F), we have A(B + C) = AB +
AC, i.e., the matrix product is distributive over the matrix addition.

Note 1.77 The earlier mentioned identity matrix of order n is the identity element
for the matrix product in Mn (F). All the properties we have discussed lead us to
the conclusion that, for any n ≥ 1, the set Mn (F), equipped by matrix addition and
multiplication is a (noncommutative) ring having unity.

Two relevant additional properties for the matrix product are the following:
(i) If A = (ai j ) ∈ Mmn (F) and B = (bi j ) ∈ Mnr (F), then (AB)t = B t At .
(ii) If A = (ai j ) ∈ Mn (F) and B = (bi j ) ∈ Mn (F), then tr (AB) = tr (B A).
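A NumPy spot-check (sketch only, with arbitrarily chosen matrices) of the product rules just listed: AB and BA differ in general, while (AB)^t = B^t A^t and tr(AB) = tr(BA) hold.

```python
# Sketch: non-commutativity of the product, the transpose rule and the trace rule.
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [2., 5.]])

print(np.array_equal(A @ B, B @ A))                    # False: the product is not commutative
print(np.allclose((A @ B).T, B.T @ A.T))               # True: (AB)^t = B^t A^t
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))    # True: tr(AB) = tr(BA)
```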
The Determinant of a Square Matrix

Definition 1.78 Let A = (a_{ij}) ∈ Mn(F). Then the determinant of A, written as det A
or |A|, is the element \sum_{σ∈S_n} Sign(σ) a_{1σ(1)} a_{2σ(2)} · · · a_{nσ(n)} ∈ F, where Sign(σ) = 1
if σ is an even permutation and Sign(σ) = −1 if σ is an odd permutation, and S_n is
the permutation group on n symbols, i.e., 1, 2, . . . , n. We will also use the notation
\begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix}
for the determinant of the matrix
\begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}.

Remark 1.79 (i) For n = 1, i.e., if A = (a) is a 1 × 1 matrix, then we have |A| = a.
(ii) For n = 2, i.e., if A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} is a 2 × 2 matrix, then we have |A| = a_{11}a_{22} − a_{12}a_{21}.
(iii) If A is a diagonal matrix with diagonal entries λ1 , λ2 , . . . , λn , then |A| =
λ1 λ2 · · · λn .
Definition 1.80 Any square matrix having determinant equal to zero (respectively
different from zero) is said to be singular (resp. nonsingular).
Determinants have the following well-known basic properties:
Theorem 1.81 Let A ∈ Mn (F). Then
(i)interchanging two rows of A changes the sign of det A,
(ii)det A= det At ,
(iii)for any B ∈ Mn (F), det (AB) =det A det B,
(iv) the determinant of an upper triangular or lower triangular matrix is the product
of the entries on its main diagonal,
(v) if A has the block diagonal form, i.e., A = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & B_m \end{pmatrix}, where B_1,
B_2, \dots, B_m are square matrices, then det A = det B_1 det B_2 · · · det B_m.
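A NumPy sketch (with arbitrarily chosen matrices) verifying numerically some of the properties stated in Theorem 1.81.

```python
# Sketch: det A = det A^t, det(AB) = det A * det B, triangular determinant = product of diagonal.
import numpy as np

A = np.array([[2., 1.], [5., 3.]])
B = np.array([[0., 4.], [1., 7.]])
T = np.array([[2., 9., 1.], [0., 3., 8.], [0., 0., 5.]])   # upper triangular

print(np.isclose(np.linalg.det(A), np.linalg.det(A.T)))                           # True
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))      # True
print(np.isclose(np.linalg.det(T), 2. * 3. * 5.))                                 # True
```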
Elementary Array Operations in Fn

Let us consider a set of arrays R1 , . . . , Rk ∈ Fn . For any α1 , . . . , αk ∈ F, the element

α1 R1 + α2 R2 + · · · + αk Rk ∈ Fn (1.1)

is said to be a linear combination of R1 , . . . , Rk .


Definition 1.82 Let R1 , . . . , Rk ∈ Fn and 0 be the zero element in Fn . It seems clear
that any Ri can be regarded as a matrix with exactly one row and n columns, that
is Ri ∈ M1n (F). Therefore, addition Ri + R j and scalar product α Ri , by an element


α ∈ F, are just particular cases of the more general ones related to addition of m × n
matrices and scalar product of a m × n matrix by an element α ∈ F.
We say that R_1, . . . , R_k are linearly dependent if there exists a nontrivial choice of
scalars α_1, . . . , α_k ∈ F such that

α1 R1 + α2 R2 + · · · + αk Rk = 0.

Conversely, we say that R1 , . . . , Rk are linearly independent in the case

α1 R1 + α2 R2 + · · · + αk Rk = 0 if and only if α1 = α2 = · · · = αk = 0.
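As a computational aside (a Python sketch assuming NumPy; the notion of rank used here is introduced formally later in this chapter), linear dependence of a family of arrays can be tested by stacking them as the rows of a matrix and comparing the number of independent rows with the number of arrays.

```python
# Sketch: R_1, ..., R_k are linearly independent iff the stacked matrix has k independent rows.
import numpy as np

def linearly_independent(rows) -> bool:
    M = np.array(rows, dtype=float)
    return np.linalg.matrix_rank(M) == len(rows)

print(linearly_independent([[1, 0, 2], [0, 1, 1]]))             # True
print(linearly_independent([[1, 0, 2], [2, 0, 4], [0, 1, 1]]))  # False: second row = 2 * first row
```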

Remark 1.83 Suppose that R1 , . . . , Rk ∈ Fn are linearly dependent and let


α1 , . . . , αk ∈ F such that

α1 R1 + α2 R2 + · · · + αk Rk = 0,

where α_i ≠ 0 for some i = 1, . . . , k. Then

αi Ri = −α1 R1 − α2 R2 − · · · − αi−1 Ri−1 − αi+1 Ri+1 − · · · − αk Rk

so that
R_i = -\frac{\alpha_1}{\alpha_i} R_1 - \frac{\alpha_2}{\alpha_i} R_2 - \cdots - \frac{\alpha_{i-1}}{\alpha_i} R_{i-1} - \frac{\alpha_{i+1}}{\alpha_i} R_{i+1} - \cdots - \frac{\alpha_k}{\alpha_i} R_k,
where \frac{1}{\alpha_i} = \alpha_i^{-1} is the inverse of α_i as an element of the field F, implying that R_i
can be expressed as a linear combination of {R_1, . . . , R_{i−1}, R_{i+1}, . . . , R_k}.

Remark 1.84 Let {R1 , . . . , Rk } be a set of elements in Fn . If one of them is the zero
element 0 ∈ Fn , then R1 , . . . , Rk are linearly dependent.

For the sake of clarity, if R1 , . . . , Rk are linearly dependent (or independent) arrays,
we sometimes say that the set {R1 , . . . , Rk } is a linearly dependent set (resp. linearly
independent set).
Given a set S of elements of Fn , one of the most important questions in linear
algebra, if not the most important one, is how to recognize the (largest) number of
linearly independent arrays in S. To answer this question, we first make some
remarks.
We first assume that the set

{R1 , . . . , Rk } ⊂ Fn (1.2)

is linearly dependent and there are α1 , . . . , αk ∈ F such that

α1 R1 + α2 R2 + . . . + αk Rk = 0 (1.3)
for some α_i ≠ 0.
A first and obvious remark is that the order of comparison of the arrays does
not affect the fact that the set is linearly dependent. Thus, any other set obtained by
permuting R1 , . . . , Rk , is yet linearly dependent.
Let now 0 = β ∈ F and consider the following set of elements in Fn :

{R1 , . . . , Ri−1 , β Ri , Ri+1 , . . . , Rk } ⊂ Fn .

From the fact that

γ1 R1 + γ2 R2 + · · · + γi (β Ri ) + · · · + γk Rk = 0,

where γ_h = βα_h for any h ≠ i, and γ_i = α_i, it follows that

{R1 , . . . , Ri−1 , β Ri , Ri+1 , . . . , Rk }

is linearly dependent. Hence, scalar product of an array by an element from F does


not affect the fact that the new set of elements in Fn is linearly dependent, as the
original set was.
We now fix 0 ≠ β ∈ F and two distinct arrays R_i, R_j ∈ {R_1, . . . , R_k}. Consider
the following set of elements in F^n:

{R_1, . . . , R_{i−1}, R_i + βR_j, R_{i+1}, . . . , R_k}. (1.4)

Notice that

γ_1 R_1 + γ_2 R_2 + · · · + γ_i(R_i + βR_j) + · · · + γ_k R_k = 0,

where γ_h = α_h for any h ≠ j, and γ_j = α_j − βα_i. We then conclude that the set
described in (1.4) is linearly dependent.
Suppose now that the set in (1.2) is linearly independent. Of course, any other set
obtained by permuting (1.2) is again linearly independent.
Moreover, if we assume that there exists some 0 ≠ β ∈ F such that

{R_1, . . . , R_{i−1}, βR_i, R_{i+1}, . . . , R_k}

is linearly dependent, we would conclude that

γ_1 R_1 + γ_2 R_2 + · · · + γ_i(βR_i) + · · · + γ_k R_k = 0

for some nontrivial choice of scalars γ_1, . . . , γ_k ∈ F, contradicting the fact that (1.2) is linearly independent. Also
in this case, the scalar product by a nonzero scalar preserves the linear independence
between elements of the new set.
Finally, for 0 ≠ β ∈ F and two distinct rows R_i, R_j ∈ {R_1, . . . , R_k}, assuming that

{R_1, . . . , R_{i−1}, R_i + βR_j, R_{i+1}, . . . , R_k} (1.5)

is linearly dependent, it would follow that

γ_1 R_1 + γ_2 R_2 + · · · + γ_i R_i + · · · + (γ_j + γ_i β)R_j + · · · + γ_k R_k = 0,

where γ_h ≠ 0 for some 1 ≤ h ≤ k, once again a contradiction. We then conclude
that the set described in (1.5) is linearly independent.
In summary, starting from a given set of arrays (1.2), we have obtained three
different sets of elements in Fn , by performing the following three operations on the
elements of (1.2):
(1) Interchanging two arrays.
(2) Multiplying an array Ri by a nonzero scalar α in such a way that α Ri replaces
Ri in the new set of arrays.
(3) Adding a constant multiple of an array R j (namely α R j ) to another array Ri , in
such a way that Ri + α R j replaces Ri in the new set of arrays.
The listed operations are called elementary operations. We have in particular proved
that, even if elementary operations generate subsets that are different from the starting
one, they preserve the property of linear dependence (or independence) between the
elements constituting those sets. Hence, when we determine the largest number
of linearly independent arrays in a given set, it could be useful to perform some
elementary operations to obtain a new set of arrays, for which the answer to the
question can be more easily achieved.

Definition 1.85 Two sets of arrays S = {R_1, . . . , R_k} ⊂ F^n and S′ = {R′_1, . . . , R′_k}
⊂ F^n are said to be equivalent if S can be transformed into S′ by performing a finite
sequence of elementary operations on it.

Consider then the set S = {R1 , . . . , Rk } ⊂ Fn and assume that it is linearly depen-
dent. Without loss of generality, we may assume that Rk is linearly depending from
R1 , . . . , Rk−1 (if not, it would be enough to permute the order of arrays). By follow-
ing the argument in Remark 1.83, Rk being a linear combination of R1 , . . . , Rk−1 ,
there exist α1 , . . . , αk ∈ F such that
α1 α2 αk−1
Rk = − R1 − R2 − · · · − Rk−1 .
αk αk αk

At this point, replacing Rk by


α1 α2 αk−1
Rk + R1 + R2 + · · · + Rk−1
αk αk αk

we obtain a set S′ of arrays which is equivalent to S. In particular, the kth array of
S′ is now precisely 0:
S′ = {R_1, . . . , R_{k−1}, 0}.
Starting from this, we consider the subset {R1 , . . . , Rk−1 }. If it is linearly independent,
we have no chance to replace any array by 0. But, in case it is linearly dependent,
by repeating the process, we are able to replace, for instance, R_{k−1} by 0 and obtain
again a new set of arrays S′′ which is equivalent to S′:
S′′ = {R_1, . . . , R_{k−2}, 0, 0}.

Therefore, if we suppose that, after t steps, the set of arrays is
S^{(t)} = {R_1, . . . , R_{k−t}, \underbrace{0, . . . , 0}_{t\text{ times}}}
and the subset {R_1, . . . , R_{k−t}} is linearly independent, then we stop the process.
Having in mind the above, it becomes clear that the largest number k − t of linearly
independent arrays in S (t) is equal to the largest number of linearly independent
arrays in the starting set S. Moreover, if the original set S consists of all linearly
independent elements, then there is no sequence of elementary operations that can
transform it into a set S  having some zero element.
Equivalent Matrices
 
We have previously remarked how any matrix A = ai j ∈ Mmn (F) can be repre-
sented just listing its rows R1 , . . . , Rm .
In light of what we said above, k rows R1 , . . . , Rk extracted from the matrix
are linearly dependent (or independent) if they are dependent (or independent) as
elements of Fn . In particular, to recognize the largest number of linearly independent
rows in the set S = {R_1, . . . , R_k}, we may perform some appropriate elementary
operations on S and obtain an equivalent set S′ of arrays in F^n. The elements of S′
are not necessarily rows of A; nevertheless, the relationships of linear dependence
(or independence) between the elements of S′ are the same as those between the
elements of S.
Note 1.86 We introduce the following notations for elementary row operations on
a matrix A :
(I) Interchanging two rows Ri and R j will be denoted Ri ↔ R j .
(II) Multiplying a row Ri by a nonzero scalar α, in such a way that α Ri replaces
Ri in the new set of arrays, will be denoted by Ri → α Ri .
(III) Adding a constant multiple of a row R j (namely α R j ) to another row Ri , in
such a way that Ri + α R j replaces Ri in the new set of arrays, will be denoted
by Ri → Ri + α R j .
Definition 1.87 Two matrices A, A′ ∈ Mmn(F) are called equivalent if A can be
transformed into A′ by performing a finite sequence of elementary row operations
on it.
Remark 1.88 Let A, A′ ∈ Mn(F) be equivalent matrices. Thus A′ is obtained from
A by performing a finite sequence of elementary row operations. This means that
any submatrix B′ of A′ is obtained from a suitable submatrix B of A by performing
the same sequence of row operations. In particular, if B′ and B are equivalent square
submatrices respectively of A′ and A, then B′ is singular (resp. nonsingular) if and
only if B is.

Reduced Row Form of a Matrix


 
Let A = ai j ∈ Mmn (F) and A = [R1 , . . . , Rm ] be its compact representation,
where any Ri ∈ Fn is a row of A.

Definition 1.89 An element ai j ∈ Ri is said to be the leading entry of ith row if it


is the first nonzero entry of the row.
 
Definition 1.90 The matrix A = ai j ∈ Mmn (F) is said to be in reduced row form
or echelon form if it satisfies the following conditions:
(1) All the rows that consist entirely of zeros are below any row that contains nonzero
entries, that is the nonzero rows of A precede the zero rows.
(2) The column numbers c1 , . . . , ck of the leading entries of the nonzero rows
R1 , . . . , Rk form an increasing sequence of numbers, that is c1 < c2 < · · · < ck .
In other words, the leading element of each row is in a column to the right of the
leading elements of preceding rows.

Hence, we may, in general, say that the reduced row form of a matrix is the following:
\begin{pmatrix}
0 & \cdots & 0 & a_{1j_1} & \cdots & \cdots & \cdots & \cdots & \cdots & a_{1n} \\
0 & \cdots & \cdots & 0 & a_{2j_2} & \cdots & \cdots & \cdots & \cdots & a_{2n} \\
0 & \cdots & \cdots & \cdots & 0 & a_{3j_3} & \cdots & \cdots & \cdots & a_{3n} \\
\vdots & & & & & & \ddots & & & \vdots \\
0 & \cdots & \cdots & \cdots & \cdots & \cdots & 0 & a_{mj_m} & \cdots & a_{mn}
\end{pmatrix}
in case there is no row consisting entirely of zeros, or also
\begin{pmatrix}
0 & \cdots & 0 & a_{1j_1} & \cdots & \cdots & \cdots & \cdots & \cdots & a_{1n} \\
0 & \cdots & \cdots & 0 & a_{2j_2} & \cdots & \cdots & \cdots & \cdots & a_{2n} \\
\vdots & & & & & \ddots & & & & \vdots \\
0 & \cdots & \cdots & \cdots & \cdots & 0 & a_{kj_k} & \cdots & \cdots & a_{kn} \\
0 & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & 0 \\
\vdots & & & & & & & & & \vdots \\
0 & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & 0
\end{pmatrix}
if there are precisely m − k zero rows in the matrix.
Theorem 1.91 (Gaussian elimination method) By performing a finite sequence of
elementary row operations, any matrix A ∈ Mmn(F) can be transformed into a matrix
having a reduced row form.
Definition 1.92 A leading entry of a matrix used to zero out entries below it by
means of elementary row operations is called a pivot.
We have the following important property when a matrix is transformed into a row-
reduced echelon form:
Proposition 1.93 If A ∈ Mmn (F) has the reduced row form, then the m − r nonzero
rows R1 , . . . , Rm−r are linearly independent.
The following results are important in explaining the relationships between linearly
independent rows and linearly independent columns in matrices:
Theorem 1.94 (i) Let A ∈ Mn (F). The set of all rows {R1 , . . . , Rn } from A is
linearly independent if and only if A is nonsingular.
(ii) Let A ∈ Mmn (F). The largest number of linearly independent rows from A is
equal to the largest number of linearly independent columns from A.
The Rank of a Matrix

Definition 1.95 Let A ∈ Mmn (F). The rank of A, denoted by r (A), is the largest
order of any nonsingular square submatrix in A.
According to Theorem 1.94, the above definition of rank of a matrix can be reworded
as follows:
Definition 1.96 Let A ∈ Mmn (F). The rank of A is the largest number of linearly
independent rows ( or columns) in A.
Theorem 1.97 Let A, B ∈ Mmn (F) be two distinct matrices. If A, B are equivalent
then r (A) = r (B).
Proof We proved earlier that row operations do not affect the linear relationships
among rows. Also, since the rank is the largest number of linearly independent rows,
we may assert that two equivalent matrices have the same rank.
Therefore, in order to compute the rank of a matrix, we may change it into a reduced
row form by appropriate elementary operations, so that we can directly see how
large its rank is: the number of nonzero rows in its reduced row form
is precisely the rank of the matrix.
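A Python sketch (over the reals, assuming NumPy; the helper name rank_by_row_reduction is only illustrative) of exactly this procedure: reduce the matrix with elementary row operations and count the nonzero rows; the result agrees with NumPy's built-in rank computation.

```python
# Sketch: rank via reduction to row echelon form and counting nonzero rows.
import numpy as np

def rank_by_row_reduction(A, tol=1e-12):
    M = np.array(A, dtype=float)
    m, n = M.shape
    row = 0
    for col in range(n):
        # find a pivot in the current column at or below `row`
        pivot = next((r for r in range(row, m) if abs(M[r, col]) > tol), None)
        if pivot is None:
            continue
        M[[row, pivot]] = M[[pivot, row]]                    # type (1): interchange two rows
        for r in range(row + 1, m):
            M[r] -= (M[r, col] / M[row, col]) * M[row]       # type (3): R_r -> R_r + alpha * R_row
        row += 1
    return row                                               # nonzero rows of the echelon form

if __name__ == "__main__":
    A = [[1, 2, 1, 1], [2, 4, 2, 2], [0, 1, 1, 0]]
    print(rank_by_row_reduction(A), np.linalg.matrix_rank(np.array(A, dtype=float)))   # 2 2
```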
Definition 1.98 A square matrix A ∈ Mn (F) is said to be invertible if there exists a
square matrix B ∈ Mn (F), called the inverse of A, such that AB = B A = In .
Since matrix multiplication is not commutative, one can give the notions of both
right and left inverse for a right invertible and left invertible matrix, respectively. We
have the following important elementary properties of invertible matrices:
Proposition 1.99 (i) A ∈ Mn(F) is invertible if and only if det A ≠ 0.
(ii) Also, det A^{−1} = (det A)^{−1}.
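A NumPy spot-check (sketch only) of Proposition 1.99 on an arbitrarily chosen matrix.

```python
# Sketch: invertibility corresponds to a nonzero determinant, and det(A^{-1}) = (det A)^{-1}.
import numpy as np

A = np.array([[2., 1.], [7., 4.]])        # det A = 1, so A is invertible
A_inv = np.linalg.inv(A)

print(np.allclose(A @ A_inv, np.eye(2)))                          # True
print(np.isclose(np.linalg.det(A_inv), 1.0 / np.linalg.det(A)))   # True

S = np.array([[1., 2.], [2., 4.]])        # det S = 0: singular, hence not invertible
print(np.isclose(np.linalg.det(S), 0.0))  # True
```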
Elementary Matrices

Definition 1.100 A square matrix of order n is said to be an elementary matrix if


it is obtained by performing one of the three types of elementary operations on the
identity matrix of order n.
Hence we may obtain three different types of elementary matrices E having order
n:
(1) Interchanging two rows Ri and R j in the identity matrix In . In this case, |E| =
−1.
(2) Multiplying a row Ri of In by a nonzero scalar α. In this case |E| = α.
(3) Adding a constant multiple of a row R j from In to another row Ri of In . In this
case |E| = 1.
Notice that since any elementary matrix of order n is equivalent to the identity matrix
In , its rank is precisely equal to n; therefore it is not singular and so invertible. In
particular, easy computations show that:
(1) If E is the elementary matrix obtained from In by interchanging two different
rows, then E −1 = E.
(2) If E is the elementary matrix obtained from In by multiplying a row Ri by
a nonzero scalar α, then E −1 is the elementary matrix obtained from In by
multiplying the same row Ri by the nonzero scalar α −1 .
(3) If E is the elementary matrix obtained from In by adding a constant multiple
α R j of a row R j to another row Ri , then E −1 is the elementary matrix obtained
from In by adding the constant multiple −α R j of the row R j to the row Ri .
As a simple exercise in matrix multiplication, one may verify that if A ∈ Mmn (F),
then the matrix resulting by applying one of the three types of elementary row opera-
tions to A can also be obtained by left multiplying the matrix A by the corresponding
same type of elementary matrix of order m. In fact, if E is the elementary matrix
obtained from Im by interchanging rows Ri and R j , then E A is the matrix obtained
from A by interchanging the same rows i and j. If E is the elementary matrix
obtained from Im by multiplying a row R_i by a nonzero scalar α, then EA is the
matrix obtained from A by multiplying the same row R_i by α ≠ 0. Finally, if E is
the elementary matrix obtained from Im adding the constant multiple α R j to the row
Ri , then E A is the matrix obtained from A by adding its jth row multiplied by α to
its ith row.

At this point it seems clear that if E 1 , E 2 , . . . , E k are elementary matrices, then


(E 1 E 2 . . . E k )A is a matrix obtained by performing, one by one, all the row operations
represented by E 1 , E 2 , . . . , E k , according to the order of matrix multiplication:

A → E k A −→ E k−1 E k A −→ · · · −→ E 1 E 2 · · · E k A.
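A small NumPy sketch of the observation above: building E by performing a row operation on the identity and left-multiplying A by E produces the same matrix as performing that row operation directly on A.

```python
# Sketch: an elementary matrix acting by left multiplication reproduces the row operation.
import numpy as np

A = np.array([[1., 2., 0.],
              [3., 1., 5.]])

E = np.eye(2)
E[1] += 4.0 * E[0]            # elementary matrix for R_2 -> R_2 + 4 R_1

B = A.copy()
B[1] += 4.0 * B[0]            # the same row operation applied directly to A

print(np.allclose(E @ A, B))  # True
```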
Remark 1.101 (i) If the elementary matrix E results from performing a row oper-
ation on Im and if A is an m × n matrix, then the product E A is the matrix that
results when the same row operation is performed on A.
(ii) Every nonsingular square matrix A of order n is equivalent to the identity matrix
I_n, and there exist elementary matrices E_1, . . . , E_t such that

I_n = E_t E_{t−1} · · · E_1 A.

Moreover,
A = (E_t E_{t−1} · · · E_1)^{−1} = E_1^{−1} E_2^{−1} · · · E_t^{−1}.

Recalling that the inverse of any elementary matrix is again an elementary matrix,
we also may assert that any nonsingular square matrix can be expressed by the
product of a finite number of elementary matrices.

Exercises

1. For what values of k ∈ R, is the rank of the following matrix equal to 1, 2 or 3?


\begin{pmatrix} 1 & 2 & 1 & 1 \\ k & 1 & k & 1 \\ k & 1 & k & k \end{pmatrix}.

2. Determine the rank of the following matrix


\begin{pmatrix} 3 & 1 & 2 & 1 \\ 1 & 2 & 1 & 2 \\ 1 & 1 & 1 & 1 \\ 4 & 4 & 3 & 4 \\ 1 & −2 & 0 & −2 \end{pmatrix}.

3. Determine a reduced row form of the following matrix


\begin{pmatrix} 1 & 1 & 1 & 0 & 1 \\ 2 & 1 & 3 & 1 & 2 \\ 2 & 2 & 2 & 1 & 1 \\ 3 & 2 & 4 & 2 & 2 \end{pmatrix}.

4. Determine a reduced row form of the following matrix


\begin{pmatrix} k−1 & 2 & 1 & 1 \\ 1 & 3 & 2 & k \\ 1 & 1 & 1 & 0 \end{pmatrix}
where k ∈ R.
5. Let A ∈ Mn (C) be a Hermitian matrix. Show that its determinant is a real number.
6. Let A ∈ Mn (R) be such that At = −A. Show that, if the order n is odd, then
|A| = 0.
7. Compute the determinant of the following matrix
\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \alpha & \alpha^2 & \cdots & \alpha^{n−1} \\ 1 & \alpha^2 & \alpha^4 & \cdots & \alpha^{2n−2} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & \alpha^{n−1} & \alpha^{2(n−1)} & \cdots & \alpha^{(n−1)^2} \end{pmatrix},
where α = cos(2π/n) + i sin(2π/n)
.
8. Let Z7 = {0̄, 1̄, 2̄, 3̄, 4̄, 5̄, 6̄} be the field of residue classes (mod 7). Using ele-
mentary row operations, find the inverse of the matrix A ∈ M3 (Z7 ), where
⎡ ⎤
2̄ 3̄ 1̄
A = ⎣ 0̄ 4̄ 5̄ ⎦ .
6̄ 3̄ 4̄

1.5 System of Linear Equations

One of the most important problems in mathematics is that of solving a system of


linear equations. Actually, solving a linear equation can be considered the first step
for studying problems in mathematics. In fact, linear equations occur, at some stage,
in the process of solution for the great part of mathematical problems encountered
in any area of scientific research.
A linear equation in n unknowns is an equation that can be written in the form

a1 x 1 + a2 x 2 + · · · + an x n = b (1.6)

where a1 , . . . , an , b are known scalars and x1 , . . . , xn denote the unknowns. Usually


the scalars a1 , . . . , an are called the coefficients and b is called the constant of the
equation. A solution to Eq. (1.6) is any sequence of scalars c1 , . . . , cn such that

a1 c1 + a2 c2 + · · · + an cn = b

that is such that the substitution x1 = c1 , x2 = c2 , . . . , xn = cn satisfies Eq. (1.6).


An m × n system of linear equations is a set of m equations of the form (1.6).
So, it can be put in the form
a11 x1 + a12 x2 + · · · + a1n xn = b1


a21 x1 + a22 x2 + · · · + a2n xn = b2
(1.7)
.........
am1 x1 + am2 x2 + · · · + amn xn = bm ,

where ai1 , . . . , ain are the coefficients and bi is the constant term of the ith equation
in the system. A solution to the system (1.7) is a sequence of scalars c1 , . . . , cn that
is simultaneously a solution to any equation in the system, that is

a11 c1 + a12 c2 + · · · + a1n cn = b1


a21 c1 + a22 c2 + · · · + a2n cn = b2
.........
am1 c1 + am2 c2 + · · · + amn cn = bm .

A system of linear equations may have either infinitely many solutions, no solution
or a unique solution. The system is called consistent if it has at least one solution,
and it is called inconsistent if it has no solution. Among consistent systems, those
having infinitely many solutions are said to be indefinite, and those having a unique
solution are said to be definite.

Definition 1.102 Two system of linear equations involving the same variables
x1 , . . . , xn are said to be equivalent if they have the same set of solutions.

To obtain the solutions of a linear system, we may then decide to study an equivalent
system, having the same set of solutions as the original one. The goal is to simplify
the given system in order to obtain an equivalent one, whose solutions are easier to
get. In this regard, we will emphasize the role played by three types of operations
that can be used on a system to obtain an equivalent system.
(1) If we interchange the order in which two equations of a system occur, this will
have no effect on the solution set.
(2) If one equation of a system is multiplied by a nonzero scalar, this will have no
effect on the solution set.
(3) By adding one equation to a multiple of another one, we create a new equation
certainly satisfied by any solution of the original equations. Thus, the system
consisting of both the original equations and the resulting new one is equivalent
to the original system.
In short, it seems natural that the main questions we must ask ourselves are
(1) how to recognize if a system is consistent;
(2) if it is consistent, how to describe the set of its solutions, by simply relying on
an equivalent system.
To answer these questions, we will refer to the matrix theory previously developed.
In doing so, the first step is to collect all coefficients from the equations of the system
(1.7), in order to present them in tabular form
$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}.$$

The matrix A is called the coefficient matrix for the system (1.7). Moreover, if we
augment the coefficient matrix with the extra column
$$B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$

whose entries are the constants of the equations of the system, we obtain an m × (n + 1) matrix
$$C = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & | & b_1\\
a_{21} & a_{22} & \cdots & a_{2n} & | & b_2\\
\vdots & \vdots & & \vdots & | & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn} & | & b_m
\end{bmatrix}$$

that is called the augmented matrix for the system (1.7). The augmented matrix C is usually denoted by [A|B]. Hence, displaying the coefficients and constants for the system in matrix form and introducing the array
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

whose entries are the n unknowns used in the equations, we may express compactly
the system (1.7) as follows:
AX = B. (1.8)

From this point of view, it is clear that the elementary row operations for the
augmented matrix representing a system of linear equations exactly duplicate
the above described operations for equations in the system. This means that, if
C = [A|B] ∈ M_{m×(n+1)}(F) is the augmented matrix for the original system (1.8) and C' = [A'|B'] ∈ M_{m×(n+1)}(F) is row equivalent to C, then C' is the augmented matrix
for a system that is equivalent to (1.8). Therefore, to reduce (1.8) to an equivalent and
easier system, we just perform elementary row operations on the augmented matrix
of the system, in order to obtain its reduced row form. Then consider the system
associated with this last reduced matrix to recognize if it is consistent and, in case of

positive answer, to describe its set of solutions. At this point we need a criterion that decides unequivocally whether or not a system is consistent.
Theorem 1.103 An m × n system of linear equations AX = B is consistent if and only if rank(A) = rank([A|B]).
Therefore we conclude that, for a consistent system whose rank r is less than the number n of unknowns, the general solution of the system can be found as follows:
(i) Recognize the r leading unknowns: the ones whose coefficients are precisely the pivots of the augmented matrix in its reduced row form.
(ii) Assign arbitrary values to n − r free unknowns.
(iii) For any given set of values for the free unknowns, the values of the leading ones
are determined uniquely from the equations in the system, by using the back
substitution.
Hence the system admits infinitely many solutions depending on n − r free param-
eters.
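The rank criterion of Theorem 1.103 and the procedure (i)–(iii) just described can also be carried out by machine. The following is a small illustrative sketch (not part of the text) using the Python library SymPy; the particular 3 × 4 system is an arbitrary example chosen only for demonstration.

```python
# Illustrative sketch (not from the text): consistency test and general solution of AX = B.
from sympy import Matrix, symbols, linsolve

A = Matrix([[1, 1, 1, 0],
            [2, 1, 3, 1],
            [3, 2, 4, 1]])            # example coefficient matrix
B = Matrix([1, 2, 3])                 # example column of constants

aug = A.row_join(B)                   # the augmented matrix [A|B]
print(A.rank(), aug.rank())           # Theorem 1.103: consistent iff the two ranks agree
                                      # (here both ranks equal 2)

# rank r = 2 is less than n = 4 unknowns, so the general solution
# depends on n - r = 2 free parameters (steps (i)-(iii) above).
x1, x2, x3, x4 = symbols('x1 x2 x3 x4')
print(linsolve((A, B), x1, x2, x3, x4))
```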
Homogeneous Systems Let us focus our attention on an important special case of
systems of linear equations, more precisely when the constant terms are all zero, i.e.,
bi = 0 for any i = 1, . . . , m in (1.7). Such a system is called homogeneous and, of
course, always admits at least the trivial solution xi = 0, for all i = 1, . . . , n. In fact, there is no doubt that homogeneous systems are consistent: the augmented matrix is obtained from the coefficient matrix by adjoining a zero column, which does not change its rank. Hence, the only real question is whether
a homogeneous system admits other solutions besides the trivial one. To answer
this question it is sufficient to recall the arguments presented as a consequence of
Theorem 1.103 so that we may assert the following:

Theorem 1.104 An m × n homogeneous system of linear equations admits only the trivial solution if and only if the rank of its coefficient matrix equals the number of unknowns. Moreover, if the rank r of the coefficient matrix is less than the number n of unknowns, the system admits infinitely many solutions (besides the trivial one)
depending on n − r free parameters.
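As a companion to Theorem 1.104, the solutions of a homogeneous system are exactly the vectors in the null space of the coefficient matrix, and a computer algebra system returns n − r independent solutions directly. A minimal sketch (illustrative only; SymPy is assumed, and the matrix below is just an example):

```python
# Illustrative sketch (not from the text): solutions of a homogeneous system AX = 0.
from sympy import Matrix

A = Matrix([[1, 2, -1, 1],
            [2, 4, -2, -1],
            [1, -1, 0, 2]])           # example coefficient matrix: 3 equations, 4 unknowns

r, n = A.rank(), A.cols
print(r, n)                           # r = 3 < n = 4, so nontrivial solutions exist (Theorem 1.104)
for v in A.nullspace():               # a set of n - r = 1 independent nontrivial solutions
    print(v.T)
```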

Exercises

1. Consider the system


2x1 + kx2 = 2 − k
kx1 + 2x2 = k
kx2 + kx3 = k − 1

in three unknowns and one real parameter k. Determine, in case there exist, the
values of the real parameter k for which the system is consistent. Then, in those
cases, determine all the solutions of the system.

2. Determine, in case there exist, the solutions of system

(3 − k)x1 − 2x2 + (k − 2)x3 = 4


2x1 − 6x2 − 3x3 =0
kx1 + 4x2 + 2x3 =7

in three unknowns and one real parameter k.


3. Determine, in case there exist, the nontrivial solutions of the following homoge-
neous system:
x1 + 2x2 − x3 + x4 = 0
2x1 + 4x2 − 2x3 − x4 = 0
x1 − x2 + 2x4 = 0
2x1 + x2 − x3 = 0.

4. Let C be the field of complex numbers. Are the following two systems of linear
equations equivalent? If so, express each equation in each system as a linear
combination of the equations in the other system.

2x1 + (−1 + i)x2 + x4 = 0


3x2 − 2i x3 + 5x4 = 0

and
(1 + 2i)x1 + 8x2 − i x3 − x4 = 0
(2/3)x1 − (1/2)x2 + x3 + 7x4 = 0.

5. Determine, in case there exist, the solutions of system

x1 + 2x2 − x3 + kx4 = 1
2x1 + x2 + x4 = 1
3x1 + 3x2 − x3 + (3 − k)x4 = −1

in four unknowns and one real parameter k.


6. Determine, in case there exist, the solutions of system

x1 + x2 + 2x4 = 2
−x1 − x2 + x3 + x4 = k 2
3x1 + 2x3 + 3x4 = k − 2

in four unknowns and one real parameter k.


7. Determine, in case there exist, the solutions of system

x1 + 2x2 − kx3 =k
x1 + x2 − kx3 =1
3x1 + (2 + k)x2 − (2 + k)x3 = 0

in three unknowns and one real parameter k.


8. Find all solutions to the system of equations

(1 − i)x1 − i x2 = 0
2x1 + (1 − i)x2 = 0.
Chapter 2
Vector Spaces

If we consider the set V of all vectors in a plane (or in a 3-dimensional Euclidean


space) it can be easily seen that the sum of two vectors is again a vector, and under the binary operation of vector addition '+', V forms an additive abelian group. One
can also multiply a vector with a scalar (real numbers) and it is also straightforward
to see that the product of a scalar with a vector is again a vector in the plane (or in
a 3-dimensional Euclidean space); that is, there exists an external binary operation
(or scalar multiplication) R × V → V satisfying certain properties. Motivated by
these two basic operations on V , the notion of an algebraic structure, viz., vector
space was introduced. It is a very important notion not only in algebra but also plays a significant role in the study of various notions in analysis. Throughout this chapter
F will denote a field.

2.1 Definitions and Examples

Definition 2.1 A nonempty set V equipped with a binary operation, say '+', and an external binary operation F × V → V such that (α, v) → α.v is said to be a vector space over the field F if it satisfies the following axioms:
(1) (V, +) is an abelian group;
(2) α.(v + w) = α.v + α.w;
(3) (α + β).v = α.v + β.v;
(4) (αβ).v = α.(β.v);
(5) 1.v = v for all v ∈ V , where 1 is the identity of F;
for all α, β ∈ F and v, w ∈ V .


Remark 2.2 (i) The elements of V are called vectors while the elements of F are
said to be scalars. Throughout, the product between scalar α and vector v will
be denoted as αv instead of α.v. In axiom (3) the sum α + β is defined between
two scalars, the elements of the field F while the sum αv+βv is defined between
two vectors, the elements of V . There should be no confusion; the context will make the intention clear. For the sake of convenience the symbol + will stand for the
addition of two vectors as well as scalars.
(ii) If F = R then V is said to be a real vector space, while if F = C, then V is said
to be a complex vector space. For any α, β ∈ F the difference α − β represents
α + (−β), where (−β) is the additive inverse of β ∈ F, considered as an element
of additive group (F, +), while for any u, v ∈ V the difference u − v represents
the vector u + (−v), where −v is the additive inverse of v ∈ V in the additive
group (V, +).

The following lemma is a direct consequence of the axioms of a vector space:

Lemma 2.3 Let V be a vector space over a field F. Then for any α, β ∈ F and
v, w ∈ V
(i) α0 = 0.
(ii) 0v = 0.
(iii) −(αv) = α(−v) = (−α)v.
(iv) αv = 0 if and only if α = 0 or v = 0.
(v) (α − β)v = αv − βv.
(vi) α(v − w) = αv − αw.

There should be no confusion between the two uses of the symbol 0. In the above result, 0 denotes the additive identity of the field F when it appears as a scalar and the additive identity of the additive group V when it appears as a vector. Henceforth both will be denoted by the same symbol 0, and in each instance it can be easily understood whether 0 denotes the additive identity of the field or the additive identity of the group V.
Example 2.4 (1) Every field F is a vector space over its subfield. If E is a subfield
of F, then F is a vector space over E, under the usual addition of F and the
scalar multiplication αv (or the multiplication of F), for any α ∈ E, v ∈ F. In
particular, every field F is a vector space over itself, the field of complex numbers
C is a vector space over the field of reals R and finally the field R of reals is a
vector space over the field Q of rational numbers.
By (1), every field of characteristic 0 can be regarded as a vector space over
the field Q of rational numbers because each field of characteristic 0 contains a
subfield isomorphic to the field Q of rational numbers. Similarly every field of
characteristic p, where p is a positive prime integer, can be regarded as a vector
space over the field Zp of residue classes modulo p because each
field of characteristic p contains a subfield isomorphic to the field Z p of residue
classes modulo p.

(2) The set
$$\left\{\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix} \;\middle|\; a_{ij}\in F,\ i=1,2,\ldots,m,\ j=1,2,\ldots,n\right\}$$
of all m × n matrices over a field F is a vector space over F under matrix addition and scalar multiplication.
(3) Consider the set Fn = {(a1 , a2 , . . . , an ) | ai ∈ F, i = 1, 2, . . . , n}. Any two elements x = (x1 , x2 , . . . , xn ), y = (y1 , y2 , . . . , yn ) ∈ Fn are said to be equal if and only if xi = yi for each i = 1, 2, . . . , n. Now for any α ∈ F, define
addition and scalar multiplication in Fn as follows:

x + y = (x1 + y1 , x2 + y2 , . . . , xn + yn ),

αx = (αx1 , αx2 , . . . , αxn ).

It can be easily seen that Fn is a vector space over F. For F = R and n = 2,


there is a correspondence between any vector in the plane and a unique ordered pair of real numbers, and conversely. Under this correspondence, coordinate-wise
addition and coordinate-wise scalar multiplication correspond, respectively, to
vector addition and scalar multiplication, and hence R2 represents the space of all
vectors in a plane. Note also that the real vector space R2 has special geometric
significance and is called a Euclidean plane.
(4) Let A be a nonempty set, F be a field and let V = { f | f : A → F}. For any
f, g ∈ V and α ∈ F define f + g, α f : A → F such that for any x ∈ A

( f + g)(x) = f (x) + g(x),

(α f )(x) = α f (x).

It can be easily verified that the set V of all functions from A into F is a vector
space over F.
(5) Let F[x] be the polynomial ring in indeterminate x over the field F. In the abelian
group (F[x], +) define scalar multiplication as follows:

α(α0 + α1 x + · · · + αn x n ) = αα0 + αα1 x + · · · + ααn x n .

It can be easily seen that F[x] is a vector space over F.


(6) In the above example, consider the set Fn [x] of all polynomials of degree less than or equal to n in F[x]. Using the natural addition and scalar multiplication of polynomials, as defined above, it can be seen that Fn [x] is a vector space over the field F.

(7) Consider the set of all real-valued continuous functions C [0, 1] defined on the
interval [0, 1], i.e., f : [0, 1] → R. Define addition and scalar multiplication in
C [0, 1] as follows:
( f + g)(x) = f (x) + g(x),

(α f )(x) = α f (x)

for any α ∈ R, f, g ∈ C [0, 1], x ∈ [0, 1]. Then C [0, 1] is a vector space over
R.
(8) Let F[[x]] be the set of all formal power series in indeterminate x over a field F, that is, the collection of expressions of the form $f(x)=\sum_{i=0}^{\infty} a_i x^i$, where $a_i \in F$. Any two power series $f(x)=\sum_{i=0}^{\infty} a_i x^i$ and $g(x)=\sum_{i=0}^{\infty} b_i x^i$ are equal if and only if $a_i = b_i$ for all i. For any α ∈ F define
$$f(x)+g(x)=\sum_{i=0}^{\infty}(a_i+b_i)x^i, \qquad \alpha f(x)=\sum_{i=0}^{\infty}(\alpha a_i)x^i.$$
It can be seen that F[[x]] is a vector space over the field F.

Definition 2.5 A nonempty subset W of a vector space V over a field F is said to


be a subspace of V if W is itself a vector space over the same field F with regard to
induced binary operations.

Lemma 2.6 A nonempty subset W of a vector space V over a field F is a subspace


of V if and only if for any w, w1 , w2 ∈ W and α ∈ F
(i) w1 + w2 ∈ W ,
(ii) αw ∈ W .

Proof If W is a subspace of V , then W is itself a vector space over the same field
F and hence for any α ∈ F and w ∈ W, αw ∈ W , as W is closed with regard to
scalar multiplication. Being an additive group W is nonempty and closed under the
operation of addition, that is, for any w1 , w2 ∈ W, w1 + w2 ∈ W .
Conversely, if the conditions (i) and (ii) hold, then by (i) W is closed under
the addition and by (ii) W is closed under the scalar multiplication. Since W is
nonempty, there exists w ∈ W. Now by condition (ii), 0w = 0 ∈ W and also for
any w ∈ W, −w = (−1)w ∈ W . The operation of addition in V being associative
and commutative is also associative and commutative in W , and thus (W, +) is an
abelian group. The axioms (2)–(5) in the Definition 2.1 hold in W , as they hold in
V . Hence W is itself a vector space over the field F with regard to induced binary
operations and therefore W is a subspace of V .

Remark 2.7 Let V be any vector space. Then {0} and V are always subspaces of
V . These two subspaces are called trivial or improper subspaces of V . Any subspace
W of V other than {0} and V is called nontrivial or proper subspace of V .
Example 2.8 (1) In the vector space R3 , consider the subset W = {(x, y, z) ∈
R3 | αx + βy + γ z = 0}, where α, β, γ ∈ R. Since (0, 0, 0) ∈ W, W ≠ ∅. Let
(x1 , y1 , z 1 ), (x2 , y2 , z 2 ) ∈ W . Then

α(x1 + x2 ) + β(y1 + y2 ) + γ (z 1 + z 2 ) = (αx1 + βy1 + γ z 1 ) + (αx2 + βy2 + γ z 2 ) = 0

and hence (x1 , y1 , z 1 ) + (x2 , y2 , z 2 ) ∈ W . Also since for any δ ∈ R and (x1 ,
y1 , z 1 ) ∈ W , α(δx1 ) + β(δy1 ) + γ (δz 1 ) = δ(αx1 + βy1 + γ z 1 ) = δ0 = 0
implies that δ(x1 , y1 , z 1 ) ∈ W . Hence W is a subspace of R3 .
(2) Consider the subsets W1 = {(x, 0) | x ∈ R} and W2 = {(0, y) | y ∈ R} of R2 . It
can be easily seen that W1 and W2 are subspaces of the vector space R2 over
the field R. Note that W1 and W2 are the X and Y axes, respectively. Similarly,
if we consider the X Y -plane W3 = {(x, y, 0) | x, y ∈ R} in R3 , it can be easily
verified that W3 is a subspace of the vector space R3 over R.
(3) In Example 2.4(6), the subset Fn [x] of F[x] is a subspace of F[x] over the field
F.
(4) Let V = C [0, 1] be the vector space of all real-valued continuous functions on
[0, 1]. Then W , the subset of V , consisting of all differentiable functions is a
subspace of V .
(5) Let Mn (R) denote the vector space of all n × n matrices with real entries over the
field of real numbers. The subsets W1 and W2 of V , consisting of all symmetric
and skew symmetric matrices respectively, are subspaces of the vector space
Mn (R).
Lemma 2.9 A nonempty subset W of a vector space V over a field F is a subspace
of V if and only if for any w1 , w2 ∈ W and α, β ∈ F, αw1 + βw2 ∈ W .
Proof If W is a subspace of V , then for any w1 , w2 ∈ W and α, β ∈ F, αw1 , βw2 ∈
W by Lemma 2.6, αw1 + βw2 ∈ W . Conversely, let W be a nonempty subset of V
such that for any w1 , w2 ∈ W and α, β ∈ F, αw1 + βw2 ∈ W . Since 1 ∈ F, w1 +
w2 = 1w1 + 1w2 ∈ W . Also since 0 ∈ F, αw1 = αw1 + 0w2 ∈ W , for any α ∈ F
and w1 ∈ W . Hence by Lemma 2.6, W is a subspace of V .
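The closure condition of Lemma 2.9 can also be checked numerically on concrete subsets of R3. The sketch below is illustrative only (not part of the text) and assumes NumPy is available; it samples random elements of the plane x + y + z = 0 (Example 2.8(1) with α = β = γ = 1) and verifies that αw1 + βw2 stays in W.

```python
# Illustrative sketch (not from the text): numerical check of the criterion in Lemma 2.9
# for W = {(x, y, z) in R^3 : x + y + z = 0}.
import numpy as np

rng = np.random.default_rng(0)

def in_W(v):
    return np.isclose(v.sum(), 0.0)   # membership test for the plane x + y + z = 0

for _ in range(1000):
    w1 = rng.standard_normal(3); w1[2] = -w1[0] - w1[1]   # a random element of W
    w2 = rng.standard_normal(3); w2[2] = -w2[0] - w2[1]   # another random element of W
    a, b = rng.standard_normal(2)                          # random scalars
    assert in_W(a * w1 + b * w2)      # a*w1 + b*w2 lies in W, as Lemma 2.9 requires
```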
Definition 2.10 Let W1 , W2 , . . . , Wn be subspaces of a vector space V over F. Then the sum of W1 , W2 , . . . , Wn is defined as $\sum_{i=1}^{n} W_i = W_1 + W_2 + \cdots + W_n = \{w_1 + w_2 + \cdots + w_n \mid w_i \in W_i,\ i = 1, 2, \ldots, n\}$.

Remark 2.11 It can be easily seen that $\sum_{i=1}^{n} W_i$ is a subspace of V . In fact, $0 \in \sum_{i=1}^{n} W_i$, so $\sum_{i=1}^{n} W_i \neq \emptyset$, and if $x, y \in \sum_{i=1}^{n} W_i$, then $x = w_1 + w_2 + \cdots + w_n$ and $y = w'_1 + w'_2 + \cdots + w'_n$, where $w_i, w'_i \in W_i$ for each i = 1, 2, . . . , n. Since each Wi is a subspace, we find that $x + y = (w_1 + w'_1) + (w_2 + w'_2) + \cdots + (w_n + w'_n) \in \sum_{i=1}^{n} W_i$ and, for any α ∈ F, $\alpha x = \alpha w_1 + \alpha w_2 + \cdots + \alpha w_n \in \sum_{i=1}^{n} W_i$. Hence by Lemma 2.6, we find that $\sum_{i=1}^{n} W_i$ is a subspace of V .

Lemma 2.12 If V is a vector space over F and $\{W_i\}_{i\in I}$, where I is an index set, is a collection of subspaces of V , then $W = \bigcap_{i\in I} W_i$ is a subspace of V .

Proof Obviously 0 ∈ W , so W ≠ ∅. Let w, w' ∈ W . Then w, w' ∈ Wi for each i ∈ I . Since each Wi is a subspace, for any α ∈ F both w + w' ∈ W and αw ∈ W , and hence W is a subspace of V .

The above result shows that the intersection of subspaces of a vector space over a field F is again a subspace, but it is to be noted that the union of two subspaces need not be a subspace.

Example 2.13 (1) R2 is a vector space over R. Consider two subspaces of R2 ,


namely, W1 = {(x, 0) | x ∈ R} and W2 = {(0, y) | y ∈ R}. If we take (1, 0) ∈
W1 and (0, 1) ∈ W2 , then (1, 0), (0, 1) ∈ W1 ∪ W2 but their sum (1, 0) + (0, 1) = (1, 1) ∉ W1 ∪ W2 , and hence W1 ∪ W2 is not a subspace of R2 .
(2) In the vector space R3 , W1 = {(x, y, 0) | x, y ∈ R}, i.e., X Y -plane and W2 =
{(0, y, z) | y, z ∈ R}, i.e., Y Z -plane are subspaces of R3 . Then it can be easily
seen that R3 = W1 + W2 . In fact, for any (x, y, z) ∈ R3 , (x, y, z) = (x, y, 0) +
(0, 0, z) ∈ W1 + W2 , as (x, y, 0) ∈ W1 , (0, 0, z) ∈ W2 . This shows that R3 ⊆
W1 + W2 . Also W1 + W2 , being a subspace of R3 , is a subset of R3 and hence
R3 = W1 + W2 .

Definition 2.14 Let W1 , W2 be two subspaces of a vector space V over F. Then V is said to be a direct sum of W1 and W2 if V = W1 + W2 and W1 ∩ W2 = {0}, denoted as V = W1 ⊕ W2 . Each of the Wi 's is called a direct summand of V . Moreover, W2 is also known as a complement of W1 , and W1 as a complement of W2 .

Lemma 2.15 Let V be a vector space over F, and W1 , W2 be two subspaces of V .


Then V = W1 ⊕ W2 if and only if every v ∈ V can be uniquely written as w1 +
w2 , w1 ∈ W 1 , w2 ∈ W 2 .

Proof Assume that V = W1 ⊕ W2 and v ∈ V . Then v = w1 + w2 , w1 ∈ W1 , w2 ∈ W2 . If v has another representation, say v = w1' + w2' , w1' ∈ W1 , w2' ∈ W2 , then w1 − w1' = w2' − w2 ∈ W1 ∩ W2 = {0}. This implies that w1 = w1' and w2 = w2' , and hence every element v ∈ V can be uniquely written as v = w1 + w2 , where w1 ∈ W1 , w2 ∈ W2 .
Conversely, assume that every v ∈ V can be uniquely written as v = w1 + w2 , where w1 ∈ W1 , w2 ∈ W2 . Then obviously V = W1 + W2 . Now let w ∈ W1 ∩ W2 . Then w ∈ W1 and w ∈ W2 . Hence w = w + 0, w ∈ W1 , 0 ∈ W2 and also w = 0 + w, 0 ∈ W1 , w ∈ W2 . By the uniqueness of the expression for w, we find that w = 0, and hence W1 ∩ W2 = {0}, i.e., V = W1 ⊕ W2 .
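A concrete instance of a direct sum, which also anticipates Example 2.8(5), is Mn(R) = W1 ⊕ W2 with W1 the symmetric and W2 the skew-symmetric matrices: every A decomposes uniquely as A = (A + Aᵗ)/2 + (A − Aᵗ)/2. The following is a small numerical sketch only (not part of the text), assuming NumPy is available.

```python
# Illustrative sketch (not from the text): M_n(R) as the direct sum of the subspaces of
# symmetric and skew-symmetric matrices (cf. Example 2.8(5)).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))       # an arbitrary 4 x 4 real matrix

S = (A + A.T) / 2                     # symmetric part, an element of W1
K = (A - A.T) / 2                     # skew-symmetric part, an element of W2

assert np.allclose(S, S.T) and np.allclose(K, -K.T)
assert np.allclose(A, S + K)          # A = S + K; by Lemma 2.15 this decomposition is unique,
                                      # since the only matrix that is both symmetric and
                                      # skew-symmetric is the zero matrix (W1 ∩ W2 = {0})
```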

In view of the above result one can generalize the definition of direct sum to a finite
number of subspaces as follows:

Definition 2.16 Let V be a vector space and W1 , W2 , . . . , Wn be subspaces of V .


Then V is said to be a direct sum of W1 , W2 , . . . , Wn if every v ∈ V can be uniquely
expressed as v = w1 + w2 + · · · + wn , where wi ∈ Wi for each i = 1, 2, . . . , n.
Note that the above direct sum is in fact internal direct sum of W1 , W2 , . . . , Wn .
Consider any finite number of vector spaces V1 , V2 , . . . , Vn over the same field F and
let V = {(v1 , v2 , . . . , vn ) | vi ∈ Vi , i = 1, 2, . . . , n}. Any two elements (v1 , v2 , . . . , vn ), (v1', v2', . . . , vn') ∈ V are said to be equal if and only if vi = vi' for each i = 1, 2, . . . , n. For any (v1 , v2 , . . . , vn ), (v1', v2', . . . , vn') ∈ V and α ∈ F define addition and scalar multiplication in V as follows:

(v1 , v2 , . . . , vn ) + (v1', v2', . . . , vn') = (v1 + v1', v2 + v2', . . . , vn + vn'),

α(v1 , v2 , . . . , vn ) = (αv1 , αv2 , . . . , αvn ).

It can be easily seen that V is a vector space over the field F with regard to the
above operations. We call V as the external direct sum of V1 , V2 , . . . , Vn , denoted as
V = V1 ⊕ V2 ⊕ · · · ⊕ Vn .

Lemma 2.17 Let W1 , W2 , . . . , Wn be subspaces of a vector space over F. Then


V is an internal direct sum of W1 , W2 , . . . , Wn , i.e., V = W1 ⊕ W2 ⊕ · · · ⊕ Wn if
and only if V = W1 + W2 + · · · + Wn and Wi ∩ (W1 + W2 + · · · + Wi−1 + Wi+1
+ · · · + Wn ) = {0} for every i = 1, 2, 3, . . . , n.
Proof Suppose that V = W1 ⊕ W2 ⊕ · · · ⊕ Wn . Then it is obvious to observe
that V = W1 + W2 + · · · + Wn . Let w ∈ Wi ∩ (W1 + W2 + · · · + Wi−1 + Wi+1 +
· · · + Wn ). This implies that w ∈ Wi and w = w1 + w2 + · · · + wi−1 + wi+1 +
· · · + wn for some w1 ∈ W1 , w2 ∈ W2 , . . . , wi−1 ∈ Wi−1 , wi+1 ∈ Wi+1 , . . . , wn ∈
Wn . Here w ∈ V has two representations as: w = 0 + 0 + · · · + 0 + w + 0 + · · · +
0 and w = w1 + w2 + · · · + wi−1 + 0 + wi+1 + · · · + wn . But as each element
of V has unique representation, we are forced to conclude that w = 0. Hence
it follows that Wi ∩ (W1 + W2 + · · · + Wi−1 + Wi+1 + · · · + Wn ) = {0} for every
i = 1, 2, 3, . . . , n.
Conversely, suppose that V = W1 + W2 + · · · + Wn and Wi ∩ (W1 + W2 + · · · + Wi−1 + Wi+1 + · · · + Wn ) = {0} for every i = 1, 2, 3, . . . , n. It is clear that if v ∈ V , then there exist w1 ∈ W1 , w2 ∈ W2 , . . . , wn ∈ Wn such that v = w1 + w2 + · · · + wi−1 + wi + wi+1 + · · · + wn . To prove the uniqueness of this representation, let us suppose that v = w1' + w2' + · · · + w'i−1 + wi' + w'i+1 + · · · + wn' also, where w1' ∈ W1 , w2' ∈ W2 , . . . , wn' ∈ Wn . This implies that wi − wi' = (w1' − w1 ) + (w2' − w2 ) + · · · + (w'i−1 − wi−1 ) + (w'i+1 − wi+1 ) + · · · + (wn' − wn ) ∈ Wi ∩ (W1 + W2 + · · · + Wi−1 + Wi+1 + · · · + Wn ). Using our hypothesis we are forced to conclude that wi = wi' for each i = 1, 2, . . . , n. This shows that the representation is unique and therefore V is an internal direct sum of W1 , W2 , . . . , Wn .
Theorem 2.18 Let W1 , W2 , . . . , Wn be any n ≥ 1 subspaces of vector space V over
a field F. Then V = W1 ⊕ W2 ⊕ · · · ⊕ Wn if and only if V = W1 + W2 + · · · + Wn

and for any vi ∈ Wi ; 1 ≤ i ≤ n, v1 + v2 + · · · + vn = 0 implies that v1 = v2 =


· · · = vn = 0.

Proof For n = 1, V = W1 and the result is obvious; hence assume that n ≥ 2. Suppose that V = W1 ⊕ W2 ⊕ · · · ⊕ Wn . Hence by Lemma 2.17, V = W1 + W2 + · · · + Wn and Wi ∩ (W1 + W2 + · · · + Wi−1 + Wi+1 + · · · + Wn ) = {0}. Now for some v j ∈ W j , let v1 + v2 + · · · + vn = 0. Then $v_i = -\sum_{j=1, j\neq i}^{n} v_j \in W_i \cap (W_1 + W_2 + \cdots + W_{i-1} + W_{i+1} + \cdots + W_n)$. This yields that vi = 0 for every i.
Conversely, assume that V = W1 + W2 + · · · + Wn and, for any vi ∈ Wi , 1 ≤ i ≤ n, v1 + v2 + · · · + vn = 0 implies that v1 = v2 = · · · = vn = 0. For some i let v ∈ Wi ∩ (W1 + W2 + · · · + Wi−1 + Wi+1 + · · · + Wn ). Then v = wi ∈ Wi and v ∈ W1 + W2 + · · · + Wi−1 + Wi+1 + · · · + Wn implies that v = v1 + v2 + · · · + vi−1 + vi+1 + · · · + vn , and consequently wi = v1 + v2 + · · · + vi−1 + vi+1 + · · · + vn . Setting vi = −wi , this yields $\sum_{j=1}^{n} v_j = 0$. By our hypothesis v j = 0 for every j, and in particular vi = 0; hence v = −vi = 0. Therefore Wi ∩ (W1 + W2 + · · · + Wi−1 + Wi+1 + · · · + Wn ) = {0} and hence V = W1 ⊕ W2 ⊕ · · · ⊕ Wn .

Definition 2.19 Let V be a vector space over a field F and v1 , v2 , . . . , vn ∈ V . Then


any element of the form α1 v1 + α2 v2 + · · · + αn vn , where αi ∈ F, i = 1, 2, . . . , n,
is called a linear combination of v1 , v2 , . . . , vn over F.

Definition 2.20 Let S be a non-empty subset of a vector space V over a field F, then
the linear span of S, denoted as L(S) is the set of all linear combinations of finite
sets of elements of S, i.e.,
$$L(S) = \left\{ \sum_{i=1}^{k} \alpha_i v_i \;\middle|\; \alpha_i \in F,\ v_i \in S,\ k \in \mathbb{N} \right\}.$$

Note that in the above definition αi , vi , k are chosen from their respective domains.
Since α0 = 0 for all α ∈ F, for S = {0}, L(S) = {0}. Moreover, if S = {v} for some
v ∈ V , then L(S) = {αv | α ∈ F}.
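Whether a particular vector belongs to L(S) can be decided computationally: v ∈ L(S) exactly when adjoining v to the vectors of S does not increase the rank of the matrix whose columns are those vectors. A short sketch follows (illustrative only, not part of the text; SymPy assumed, vectors chosen arbitrarily).

```python
# Illustrative sketch (not from the text): testing membership in the linear span L(S).
from sympy import Matrix

S = [Matrix([1, 2, 3]), Matrix([0, 1, 2])]      # the vectors of S, as columns
v = Matrix([3, 2, 1])                           # does v belong to L(S)?

M = Matrix.hstack(*S)
in_span = M.rank() == Matrix.hstack(M, v).rank()
print(in_span)                                  # True here: v = 3*(1,2,3) - 4*(0,1,2)
```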

Theorem 2.21 Let S, T be nonempty subsets of a vector space V over a field F,


then
(i) L(S) is the smallest subspace of V containing S,
(ii) If S ⊆ T, then L(S) ⊆ L(T ),
(iii) L(S ∪ T ) = L(S) + L(T ),
(iv) L(L(S)) = L(S),
(v) L(S) = S if and only if S is a subspace of V .

Proof (i) First we shall show that L(S) is a subspace of V . Since 0 = 0v for any v ∈ S, we have 0 ∈ L(S) and hence L(S) ≠ ∅. Let v, v' ∈ L(S). Then $v = \sum_{i=1}^{k} \alpha_i v_i$, $\alpha_i \in F$, $v_i \in S$, and $v' = \sum_{i=1}^{m} \beta_i v'_i$, $\beta_i \in F$, $v'_i \in S$; hence $v + v' = \sum_{i=1}^{k} \alpha_i v_i + \sum_{i=1}^{m} \beta_i v'_i \in L(S)$ and, for any γ ∈ F, $\gamma v = \sum_{i=1}^{k} (\gamma\alpha_i) v_i \in L(S)$. This shows that L(S) is a subspace of V . For any v ∈ S, v = 1v ∈ L(S), and hence L(S) is a subspace of V containing S. To show that L(S) is the smallest subspace of V containing S, assume that W is a subspace of V which contains S. Then for any v ∈ L(S), $v = \sum_{i=1}^{k} \alpha_i v_i$, $v_i \in S$. Since S is contained in W , we find that each vi ∈ W and, W being a subspace of V , it contains v; hence L(S) ⊆ W for any subspace W containing S. This proves our assertion.
(ii) Assume that S ⊆ T . Then by the above case we find that S ⊆ T ⊆ L(T )
and hence S ⊆ L(T ). But by (i), L(S) is the smallest subspace of V containing S, so L(S) ⊆ L(T ).
(iii) Consider any two subsets S, T of V and assume that W = L(S) + L(T ).
But since S ⊆ L(S) ⊆ W and T ⊆ L(T ) ⊆ W , we find that S, T ⊆ W and hence
S ∪ T ⊆ W . Therefore using (i) L(S ∪ T ) ⊆ W. Now assume that w ∈ W and hence
w = w1 + w2 such that w1 ∈ L(S), w2 ∈ L(T ). Since S, T ⊆ S ∪ T, L(S), L(T ) ⊆
L(S ∪ T ). This implies that w = w1 + w2 ∈ L(S ∪ T ) and hence W ⊆ L(S ∪ T ).
Combining the results so obtained we find that L(S ∪ T ) = L(S) + L(T ).
(iv) Obviously L(S) ⊆ L(L(S)). Now let v ∈ L(L(S)). Then $v = \sum_{i=1}^{k} \alpha_i v_i$, $\alpha_i \in F$, $v_i \in L(S)$. Each vi ∈ L(S) can be written as $v_i = \sum_{j=1}^{m} \beta_{ij} v_{ij}$, $\beta_{ij} \in F$, $v_{ij} \in S$. This yields that $v = \sum_{i=1}^{k} \sum_{j=1}^{m} \alpha_i \beta_{ij} v_{ij}$, where $\alpha_i\beta_{ij} \in F$, $v_{ij} \in S$. This shows that L(L(S)) ⊆ L(S). Thus the result stands proved.
(v) Assume that L(S) = S. By (i), we know that L(S) is a subspace of V . As a
result, S is a subspace of V . Conversely, suppose that S is a subspace of V . As by
(i), L(S) is the smallest subspace of V containing S, i.e., S ⊆ L(S). But S is also
a subspace of V containing S as a subset, thus we have L(S) ⊆ S. Finally, we have
proved that L(S) = S.

In view of Theorem 2.21(iii) and (v), the following result follows directly:
Corollary 2.22 If W1 , W2 are any two subspaces of a vector space V over F, then
W1 + W2 is a subspace of V spanned by W1 ∪ W2 .

Definition 2.23 Let X be a subset of a vector space V over a field F. A subspace W of V is said to be generated by X , denoted as ⟨X⟩, if X ⊆ W and, for any subspace W' of V , X ⊆ W' implies that W ⊆ W'.

Theorem 2.24 Let S be a nonempty subset of a vector space V over a field F, then
(i) ⟨S⟩ = L(S),
(ii) ⟨S⟩ = $\bigcap_{i\in I} W_i$, where I is an index set and each Wi is a subspace of V containing S as a subset.

Proof (i) By definition, it is clear that ⟨S⟩ is a subspace of V containing S as a subset. But L(S) is the smallest subspace of V containing S as a subset. Thus we conclude that L(S) ⊆ ⟨S⟩. Again, by the definition of ⟨S⟩, it is obvious that ⟨S⟩ ⊆ L(S). Finally, it follows that ⟨S⟩ = L(S).
(ii) For each i ∈ I , Wi is a subspace of V containing S as a subset. This shows that $\bigcap_{i\in I} W_i$ is also a subspace of V containing S as a subset. By the definition of ⟨S⟩, we obtain that ⟨S⟩ ⊆ $\bigcap_{i\in I} W_i$. Conversely, let w ∈ $\bigcap_{i\in I} W_i$; then w ∈ Wi for each i ∈ I . As ⟨S⟩ is also a subspace of V containing S as a subset, we find that ⟨S⟩ = Wi0 for some i 0 ∈ I . Now we conclude that w ∈ Wi0 = ⟨S⟩. This gives us $\bigcap_{i\in I} W_i$ ⊆ ⟨S⟩. Hence, we get ⟨S⟩ = $\bigcap_{i\in I} W_i$.

Remark 2.25 If S is a subset of a vector space V with S = ∅, then we can still talk about ⟨S⟩ and L(S). Here ⟨S⟩ = $\bigcap_{i\in I} W_i$, where the Wi are the subspaces of V containing S = ∅ as a subset. But every subspace of V contains ∅ as a subset, so the Wi , i ∈ I , are precisely all the subspaces of V ; in particular Wi = {0} for some i ∈ I . Thus we conclude that in this case ⟨S⟩ = $\bigcap_{i\in I} W_i$ = {0}. On the other hand, L(S) = {0} holds vacuously. Finally, we can say that the above theorem holds if S is any arbitrary subset of V .

Theorem 2.26 Let {v1 , v2 , . . . , vn } be a set of vectors which spans a subspace W


of a vector space V . If for some j; 1 ≤ j ≤ n, v j is a linear combination of the
remaining n − 1 vectors, then {v1 , v2 , . . . , v j−1 , v j+1 , . . . , vn } also spans W .

Proof Since v j is a linear combination of v1 , v2 , . . . , v j−1 , v j+1 , . . . , vn , there exist scalars αi such that $v_j = \sum_{i=1}^{j-1} \alpha_i v_i + \sum_{i=j+1}^{n} \alpha_i v_i$. Now let v be an arbitrary vector in W ; then
$$v = \sum_{\ell=1}^{n} \beta_\ell v_\ell
   = \sum_{\ell=1,\,\ell\neq j}^{n} \beta_\ell v_\ell + \beta_j v_j
   = \sum_{\ell=1,\,\ell\neq j}^{n} \beta_\ell v_\ell + \beta_j \left( \sum_{i=1}^{j-1} \alpha_i v_i + \sum_{i=j+1}^{n} \alpha_i v_i \right).$$
This shows that {v1 , v2 , . . . , v j−1 , v j+1 , . . . , vn } spans W .

Definition 2.27 Let V be a vector space over a field F and W a subspace of V . Let

V /W = {v + W | v ∈ V }.

Define addition and scalar multiplication in V /W as follows:


(i) (v1 + W ) + (v2 + W ) = v1 + v2 + W , for any v1 + W, v2 + W ∈ V /W .
(ii) α(v1 + W ) = αv1 + W for any α ∈ F, v1 + W ∈ V /W.
It can be easily seen that addition and scalar multiplication both are well-defined and
(1) (V /W, +) is an abelian group.
 α, β ∈ F and v1 + W,
(2) For any  v2 + W ∈ V /W
(i) α (v1 + W ) + (v2 + W ) = α(v1 + W ) + α(v2 + W )
(ii) (α + β)(v1 + W ) = α(v1 + W ) + β(v1 + W )
(iii) αβ(v1 + W ) = (αβ)v1 + W = α(βv1 ) + W = α(βv1 + W )
(iv) 1(v1 + W ) = 1v1 + W = v1 + W.
Thus V /W is a vector space over F and is called the quotient space of V relative to W .

Remark 2.28 Since (V /W, +) is the quotient group of (V, +) with regard to its
normal subgroup (W, +), it is obvious to observe the following for all v, v1 , v2 ∈ V :
(i) v + W ≠ ∅.
(ii) v + W = W if and only if v ∈ W .
(iii) v1 + W = v2 + W if and only if (v1 − v2 ) ∈ W .
(iv) Any two elements of V /W are either equal or mutually disjoint.
(v) Union of all the elements of V /W equals V . Thus the quotient space V /W gives
a partition of the vector space V .

Example 2.29 Let V be the vector space of all polynomials of degree less than or
equal to n over the field R of real numbers, where n ≥ 2 is a fixed integer. Assume
that W is a subspace of V , consisting of all polynomials of degree less than or equal
to (n − 2). Hence

V /W = {(α0 + α1 x + α2 x 2 + · · · + αn−2 x n−2 + αn−1 x n−1 + αn x n ) + W |


α0 , α1 , α2 , . . . , αn−2 , αn−1 , αn ∈ R}
= {((α0 + α1 x + α2 x 2 + · · · + αn−2 x n−2 ) + W
+(αn−1 x n−1 + αn x n ) + W )|α0 , α1 , α2 , . . . , αn−2 , αn−1 , αn ∈ R}
= {(αn−1 x n−1 + αn x n ) + W |αn−1 , αn ∈ R}

It is easy to notice that V /W = {αn−1 (x n−1 + W ) + αn (x n + W )|αn−1 , αn ∈ R}.


This implies that V /W = L(S), where S = {(x n−1 + W ), (x n + W )} ⊆ V /W.

Exercises

1. Let U and W be vector spaces over a field F. Let V be the set of ordered pairs
(u, w) where u ∈ U and w ∈ W . Show that V is a vector space over F with regard to addition in V and scalar multiplication on V defined by (u, w) + (u', w') = (u + u', w + w') and α(u, w) = (αu, αw), where α ∈ F.
2. Let AX = B be a nonhomogeneous system of linear equations in n unknowns,
that is, B ≠ 0. Show that the solution set is not a subspace of F n .
3. Let V be the vector space of all functions from the real field R into R. Prove that
W is a subspace of V if W consists of all bounded functions.
4. Suppose U, W1 , W2 are subspaces of a vector space V over a field F. Show that
(U ∩ W1 ) + (U ∩ W2 ) ⊆ U ∩ (W1 + W2 ).
5. Give examples of three subspaces U, W1 , W2 of a vector space V such that (U ∩ W1 ) + (U ∩ W2 ) ≠ U ∩ (W1 + W2 ).
6. Let S = {(xn ) | xn ∈ R} be the set of all real sequences. Then S is a vector space
over R under the following operations:

(xn ) + (yn ) = (xn + yn ),



α(xn ) = (αxn ), α ∈ R.

Let C be the set of all convergent sequences and C0 be the set of all null sequences.
Then show that C and C0 are vector subspaces of S.
7. Why does a vector space V over F(= C, R or Q) contain either one element or
infinitely many elements? Given v ∈ V, is it possible to have two distinct vectors
u, w ∈ V such that u + v = 0 and w + v = 0?  
8. Let H be the collection of all complex 2 × 2 matrices of the form $\begin{bmatrix} a & b \\ -\bar{b} & \bar{a} \end{bmatrix}$. Show
that H is a vector space under the usual matrix addition and scalar multiplication
over R. Is H also a vector space over C?
9. Show that if W1 is a subspace of a vector space V , and if there is a unique
subspace W2 such that V = W1 ⊕ W2 , then W1 = V .
10. Let U = {(a, b, c) | a = b = c, a, b, c ∈ R} and W = {(0, b, c) | b, c ∈ R} be
subspaces of R3 . Show that R3 = U ⊕ W .
11. Let U1 = {(a, b, c) | a = c, a, b, c ∈ R}, U2 = {(a, b, c) | a + b + c = 0, a, b,
c ∈ R} and U3 = {(0, 0, c) | c ∈ R} be subspaces of R3 . Show that
(a) R3 = U1 + U2 ,
(b) R3 = U2 + U3 ,
(c) R3 = U1 + U3 .
When is the sum a direct sum?
12. Let W1 , W2 and W3 be subspaces of a vector space V . Show that W1 + W2 + W3
is not necessarily a direct sum even though

W1 ∩ W2 = W1 ∩ W3 = W2 ∩ W3 = {0}.

13. Show that M2 (R) = W1 ⊕ W2 , where
$$W_1 = \left\{ \begin{bmatrix} a & b \\ -b & a \end{bmatrix} \;\middle|\; a, b \in \mathbb{R} \right\}, \qquad
W_2 = \left\{ \begin{bmatrix} c & d \\ d & -c \end{bmatrix} \;\middle|\; c, d \in \mathbb{R} \right\}.$$

14. Let C(R) be the vector space of all real-valued functions over R, and let W1 and
W2 be the collections of even and odd continuous functions on R, respectively.
Show that W1 and W2 are subspaces of C(R). Show further that C(R) = W1 ⊕
W2 .
15. Give an example of a vector space V having any three different nonzero subspaces
W1 , W2 , W3 such that V = W1 ⊕ W2 = W2 ⊕ W3 = W3 ⊕ W1 .

2.2 Linear Dependence, Independence and Basis

Definition 2.30 Let S = {v1 , v2 , . . . , vn } be a finite set of n vectors in a vector space V over a field F. Then S is said to be linearly independent if, for any α1 , α2 , . . . , αn ∈ F, $\sum_{i=1}^{n} \alpha_i v_i = 0$ implies that αi = 0 for each i = 1, 2, . . . , n. If S is not linearly independent then S is said to be linearly dependent, that is, if there exist scalars α1 , α2 , . . . , αn ∈ F, not all zero, such that $\sum_{i=1}^{n} \alpha_i v_i = 0$.
On the other hand, an infinite subset S of V is said to be linearly independent if
every finite subset of S is linearly independent. In this case also if S is not linearly
independent then S is said to be linearly dependent, i.e., if there exists a finite subset
of S, which is linearly dependent.
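For a finite set of vectors in F^m this condition is easy to test mechanically: {v1 , . . . , vn } is linearly independent exactly when the m × n matrix having the vi as columns has rank n. A minimal sketch follows (illustrative only, not part of the text; SymPy assumed, with an arbitrary example).

```python
# Illustrative sketch (not from the text): linear independence via the rank of a column matrix.
from sympy import Matrix

v1, v2, v3 = Matrix([1, 0, 0]), Matrix([1, 1, 0]), Matrix([2, 1, 0])

M = Matrix.hstack(v1, v2, v3)
print(M.rank() == M.cols)     # False: the set is linearly dependent, since v3 = v1 + v2
print(M.nullspace())          # a nonzero (a1, a2, a3) with a1*v1 + a2*v2 + a3*v3 = 0
```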

Example 2.31 (1) Let V = {(a, b, c) | a, b, c ∈ F}. This is a vector space over F
in which vectors e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1) are linearly inde-
pendent. In fact, if there exist scalars α1 , α2 , α3 ∈ F such that α1 (1, 0, 0) +
α2 (0, 1, 0) + α3 (0, 0, 1) = (α1 , α2 , α3 ) = (0, 0, 0) implies that α1 = α2 = α3 =
0.
(2) Let V denote the vector space of all polynomials in x over the field R of
real numbers, i.e., V = {α0 + α1 x + α2 x 2 + · · · + αn x n | α0 , α1 , α2 , . . . , αn ∈
R, n ∈ N ∪ {0}}. Assume S is an infinite subset of V , where S = {1, x, x 2 , x 3 ,
x 4 , . . . , x n , . . .}. The set S is a linearly independent subset of V . This is due
to the fact that any finite subset of S is of the form $\{x^{i_1}, x^{i_2}, x^{i_3}, \ldots, x^{i_m}\}$, where $i_1, i_2, i_3, \ldots, i_m$ are some distinct nonnegative integers, and $\sum_{k=1}^{m} \lambda_{i_k} x^{i_k} = 0$ implies that $\lambda_{i_k} = 0$ for each k = 1, 2, . . . , m; hence $\{x^{i_1}, x^{i_2}, x^{i_3}, \ldots, x^{i_m}\}$ is a linearly independent subset of V .


i1 i2 i3

Remark 2.32 (i) If 0 ∈ S, then S is linearly dependent.
(ii) {v}, where v ∈ V , is linearly independent if and only if v ≠ 0.
(iii) If any two vectors u, v are linearly independent in V , then u + v and u − v are also linearly independent.
(iv) Any two nonzero vectors are linearly dependent in V if and only if one is a scalar multiple of the other.
(v) Every superset of a linearly dependent subset of V is linearly dependent.
(vi) Every subset of a linearly independent subset of V is linearly independent.
(vii) The empty subset ∅ of V is linearly independent, as the condition of linear independence holds vacuously for ∅.

Theorem 2.33 Let V be a vector space over a field F. A subset S = {v1 , v2 , . . . , vn } of nonzero vectors of V is linearly dependent if and only if $v_i = \sum_{j=1, j\neq i}^{n} \alpha_j v_j$, $\alpha_j \in F$, for some i.

Proof If S is linearly dependent, then there exist scalars βi ∈ F, not all zero, such that $\sum_{i=1}^{n} \beta_i v_i = 0$. Suppose that βi ≠ 0 for some i. Then the above expression can be written as $v_i = \sum_{j=1, j\neq i}^{n} (-\beta_i^{-1}\beta_j) v_j$, i.e., $v_i = \sum_{j=1, j\neq i}^{n} \alpha_j v_j$, where $\alpha_j = -\beta_i^{-1}\beta_j \in F$.
Conversely, if for some i, vi can be expressed as a linear combination of the v j with j ≠ i, i.e., $v_i = \sum_{j=1, j\neq i}^{n} \alpha_j v_j$ where α j ∈ F, then this yields that α1 v1 + α2 v2 + · · · + αi−1 vi−1 + (−1)vi + αi+1 vi+1 + · · · + αn vn = 0. This shows that there exist scalars α1 , α2 , . . . , αn with αi = −1 ≠ 0 such that $\sum_{j=1}^{n} \alpha_j v_j = 0$, and hence S is linearly dependent.
Theorem 2.34 Let V be a vector space over a field F. An ordered subset S =
{v1 , v2 , . . . , vn } of nonzero vectors of V is linearly dependent if and only if there
exists some vector vk , 2 ≤ k ≤ n, which is a linear combination of preceding ones.
Proof Assume that the ordered set S is linearly dependent. Consider the set S1 = {v1 }, which is obviously a linearly independent set. Now consider the set S2 = {v1 , v2 }. If S2 is a linearly dependent set, then there exist scalars α1 , α2 ∈ F, not both zero, such that α1 v1 + α2 v2 = 0. Here we claim that α2 ≠ 0, for otherwise S1 becomes a linearly dependent set, leading to a contradiction. Now multiplying both sides of the previous relation by $\alpha_2^{-1}$, we arrive at v2 = λv1 , where $\lambda = -\alpha_2^{-1}\alpha_1$, i.e., vk is a linear combination of the preceding vector, with k = 2. If S2 is a linearly independent set, then we consider the set S3 = {v1 , v2 , v3 }. If S3 is a linearly dependent set, then there exist scalars β1 , β2 , β3 ∈ F, not all zero, such that β1 v1 + β2 v2 + β3 v3 = 0. Using a similar argument as above, it can be easily shown that β3 ≠ 0 and v3 = εv1 + δv2 for some ε, δ ∈ F, i.e., vk is a linear combination of the preceding vectors, with k = 3. Continuing, let m be the least positive integer such that the set Sm = {v1 , v2 , v3 , . . . , vm } is linearly dependent. Using a similar argument as above, it can be shown that vm = δ1 v1 + δ2 v2 + · · · + δm−1 vm−1 , i.e., vk is a linear combination of the preceding vectors, with k = m. In the worst case k = n will do our job. It is obvious that 2 ≤ k ≤ n.
Conversely, suppose that vk , 2 ≤ k ≤ n, is a linear combination of the preceding vectors, i.e., there exist scalars ε1 , ε2 , . . . , εk−1 such that vk = ε1 v1 + ε2 v2 + · · · + εk−1 vk−1 . This shows that ε1 v1 + ε2 v2 + · · · + εk−1 vk−1 + (−1)vk + 0vk+1 + 0vk+2 + · · · + 0vn = 0, and thus the ordered set S = {v1 , v2 , . . . , vn } is linearly dependent.
Definition 2.35 A subset B of a vector space V over F is called a basis of V if
(i) B is linearly independent.
(ii) B spans V , i.e., L(B) = V .

Remark 2.36 (i) A basis of a vector space need not be unique.
(ii) Every field F is a vector space over itself. If 0 ≠ α ∈ F, then {α} is a basis of F.
(iii) Let B = {v1 , v2 , . . . , vn } be a subset of a vector space V over F. Then B is a basis of V if and only if every vector v ∈ V can be expressed uniquely as $v = \sum_{i=1}^{n} \alpha_i v_i$ where αi ∈ F, i.e., if $v = \sum_{i=1}^{n} \alpha_i v_i = \sum_{i=1}^{n} \beta_i v_i$, αi , βi ∈ F, then αi = βi for each i = 1, 2, . . . , n.
Example 2.37 (1) Consider the vector space V = Fn over a field F, where n is a positive integer, and the n-tuples e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, 0, . . . , 1) ∈ V . Any v = (v1 , v2 , . . . , vn ) ∈ V can be written as v = v1 e1 + v2 e2 + · · · + vn en , and hence the set {e1 , e2 , . . . , en } spans V . Moreover, {e1 , e2 , . . . , en } is a linearly independent subset of V . Hence {e1 , e2 , . . . , en } is a basis of V , which is called the standard basis of V = Fn .

(2) Let V = Fn [x], the set of all polynomials in indeterminate x of degree less than or
equal to n. V is a vector space over the field F. Now let B = {1, x, x 2 , . . . , x n } ⊆
V . It can be easily seen that any f (x) ∈ V can be written as f (x) = α0 1 + α1 x +
· · · + αn x n with αi ∈ F. This shows that B spans V . Moreover, α0 1 + α1 x +
· · · + αn x n = 0 yields that α0 = α1 = · · · = αn = 0 and hence B is linearly
independent. Thus B = {1, x, x 2 , . . . , x n } is a basis of V , which is called the
standard basis of V .
(3) The set M2 (R) of all 2 × 2 matrices over R forms a vector space over R. The set
$$B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$$
is the standard basis of M2 (R).
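In Fn , deciding whether n given vectors form a basis reduces to a single matrix computation: they do precisely when the matrix having them as columns is invertible (nonzero determinant, equivalently rank n), and the coordinates of any vector with respect to that basis are obtained by solving one linear system. A small sketch follows (illustrative only, not part of the text; SymPy assumed, with arbitrary example vectors).

```python
# Illustrative sketch (not from the text): checking that three vectors form a basis of Q^3
# and computing coordinates with respect to that basis.
from sympy import Matrix

B = Matrix([[1, 1, 0],
            [0, 1, 1],
            [0, 0, 1]])       # columns are the candidate basis vectors

print(B.det() != 0)           # True, so the columns form a basis (rank 3, invertible)

v = Matrix([2, 3, 1])
print(B.solve(v))             # the coordinate column c with B*c = v
```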
Definition 2.38 Let V be a vector space over a field F. A subset S of V is called a maximal linearly independent set if
(i) S is linearly independent;
(ii) S ⊂ S', where S' ⊆ V , implies that S' is a linearly dependent set.

Definition 2.39 Let V be a vector space over a field F. A subset S of V is called a minimal set of generators if
(i) ⟨S⟩ = V ;
(ii) S' ⊂ S, where S' ⊆ V , implies that ⟨S'⟩ ≠ V .

Theorem 2.40 Let V be a vector space over a field F and let S ⊆ V . Then following
statements are equivalent.
(i) S is a maximal linearly independent set of V .
(ii) S is a minimal set of generators of V .
(iii) S is a basis of V .
(iv) Every element of V can be uniquely written as a linear combination of finitely many elements of S, i.e., for any v ∈ V if v = α1 v1 + α2 v2 + · · · +
αn vn and v = β1 v1 + β2 v2 + · · · + βn vn , where αi , βi ∈ F and vi ∈ S, i =
1, 2, 3, . . . , n then αi = βi for all i = 1, 2, 3, . . . , n.

Proof (i) ⇒ (ii) We know that ⟨S⟩ = L(S). First we prove that L(S) = V . L(S) ⊆ V holds obviously. Let v ∈ V . If v ∈ S, then obviously v = 1v, which shows that v ∈ L(S). On the other hand, if v ∈ V \ S, then S ∪ {v} is a linearly dependent set, since S is a maximal linearly independent set. Hence there exists a finite subset T of S ∪ {v} containing the element v which is a linearly dependent set. Let T = {v1 , v2 , . . . , vn , v} with v1 , . . . , vn ∈ S. There exist scalars α1 , α2 , . . . , αn , α, not all zero, such that α1 v1 + α2 v2 + · · · + αn vn + αv = 0. Here we claim that α ≠ 0, for otherwise the set {v1 , v2 , . . . , vn } becomes a linearly dependent set, leading to a contradiction. Now we get $v = (-\alpha^{-1}\alpha_1)v_1 + (-\alpha^{-1}\alpha_2)v_2 + \cdots + (-\alpha^{-1}\alpha_n)v_n$. This implies that v ∈ L(S) and thus V ⊆ L(S). Finally we have proved that L(S) = V .
Now we prove that S is a minimal set such that ⟨S⟩ = V . Suppose on the contrary that there exists P ⊊ S such that ⟨P⟩ = V , i.e., L(P) = V . Let w ∈ S − P. There exist scalars β1 , β2 , . . . , βm such that w = β1 w1 + β2 w2 + · · · + βm wm for some w1 , w2 , . . . , wm ∈ P. This shows that the set {w1 , w2 , . . . , wm , w} ⊆ S is a linearly dependent set, which is a contradiction. Combining the above arguments, we have shown that S is a minimal set of generators.
(ii) ⇒ (iii) Suppose that S is a minimal set of generators. Then we have to prove that S is a basis of V . By hypothesis ⟨S⟩ = V , i.e., L(S) = V . We have only to prove that S is a linearly independent set. Suppose on the contrary that S is a linearly dependent set. Then there exists a set {v1 , v2 , . . . , vn } ⊆ S which is linearly dependent. Hence there exist scalars α1 , α2 , . . . , αn , not all zero, such that α1 v1 + α2 v2 + · · · + αi vi + · · · + αn vn = 0. Let us suppose that αi ≠ 0. Now the preceding relation can be written as $v_i = (-\alpha_i^{-1}\alpha_1)v_1 + (-\alpha_i^{-1}\alpha_2)v_2 + \cdots + (-\alpha_i^{-1}\alpha_{i-1})v_{i-1} + (-\alpha_i^{-1}\alpha_{i+1})v_{i+1} + \cdots + (-\alpha_i^{-1}\alpha_n)v_n$. This shows that ⟨S − {vi }⟩ = V , which is a contradiction. Hence S is a linearly independent set. Thus S is a basis of V .
(iii) ⇒ (iv) Let S be a basis of V . This implies that L(S) = V . For any v ∈ V if
v = α1 v1 + α2 v2 + · · · + αn vn and v = β1 v1 + β2 v2 + · · · + βn vn , where αi , βi ∈
F and vi ∈ S, i = 1, 2, 3, . . . , n. This implies that (α1 − β1 )v1 + (α2 − β2 )v2 +
· · · + (αn − βn )vn = 0. As S is a linearly independent set, we are forced to conclude
that (αi − βi ) = 0 for all i = 1, 2, 3, . . . , n, i.e., αi = βi for all i = 1, 2, 3, . . . , n.
Hence each element of V is uniquely expressible as a linear combination of finitely
number of elements of S.
(iv) ⇒ (i) First we prove that S is a linearly independent set. Let {v1 , v2 , . . . , vn }
⊆ S and α1 v1 + α2 v2 + · · · + αn vn = 0 for some scalars α1 , α2 , . . . , αn . But we also
have 0 = 0v1 + 0v2 + · · · + 0vn . Using hypothesis we are forced to conclude that
αi = 0 for all i = 1, 2, 3, . . . , n. This implies that S is a linearly independent set.
Next we show that S is a maximal linearly independent set. Let S ⊊ T ⊆ V .
Suppose that v ∈ T \ S. Hence there exist unique scalars β1 , β2 , . . . , βm such that
v = β1 v1 + β2 v2 + · · · + βm vm where v1 , v2 , . . . , vm ∈ S. This shows that the set
{v, v1 , v2 , . . . , vm } is a linearly dependent set. Hence T is a linearly dependent set.
Thus we have proved that S is a maximal linearly independent set.
Lemma 2.41 Let S be a linearly independent subset of a vector space V and let
S = S1 ∪ S2 with S1 ∩ S2 = ∅. If W, W1 , W2 are subspaces of V spanned by S, S1 , S2
respectively, then W = W1 + W2 and W1 ∩ W2 = {0}.
Proof In fact, L(S1 ∪ S2 ) = L(S1 ) + L(S2 ), i.e., W = W1 + W2 . Moreover, if 0 ≠ v ∈ W1 ∩ W2 , then $v = \sum_{i=1}^{m} \alpha_i v_i = \sum_{j=1}^{n} \beta_j v'_j$; $v_i \in S_1$, $v'_j \in S_2$ for some m ≥ 1, n ≥ 1. This yields that $\sum_{i=1}^{m} \alpha_i v_i - \sum_{j=1}^{n} \beta_j v'_j = 0$. Since v ≠ 0, there exist some αi ≠ 0 and some β j ≠ 0. As S1 ∩ S2 = ∅, B = {v1 , v2 , . . . , vm , v1', v2', . . . , vn'} is a subset of S with m + n vectors. But the latter relation shows that B is linearly dependent. This contradicts the fact that S is linearly independent, and hence W1 ∩ W2 = {0}.
Example 2.42 Let V be a vector space over a field F with dimV = k ≥ 2, and let
B = {v1 , v2 , . . . , vk } be a basis of V . For each 1 ≤ i ≤ k, let Wi = ⟨vi ⟩ = {αvi | α ∈
F}, then it can be seen that V is internal direct sum of k subspaces W1 , W2 , . . . , Wk
each of dimension one. Consider v ∈ V . Then there exist scalars α1 , α2 , . . . , αk ∈ F

such that v = α1 v1 + α2 v2 + · · · + αk vk , i.e., v = β1 + β2 + · · · + βk with βi = αi vi ∈ Wi . Suppose also that v = β1' + β2' + · · · + βk' with βi' = αi' vi ∈ Wi . Then v = α1' v1 + α2' v2 + · · · + αk' vk , which implies that $\sum_{i=1}^{k} \alpha_i v_i = \sum_{i=1}^{k} \alpha'_i v_i$, i.e., $\sum_{i=1}^{k} (\alpha_i - \alpha'_i)v_i = 0$. But since B is linearly independent, we find that αi = αi' and therefore βi = αi vi = αi' vi = βi' , so v has a unique representation. Hence V = W1 ⊕ W2 ⊕ · · · ⊕ Wk .

Theorem 2.43 If a vector space V over a field F has a basis containing m vectors, where m is a positive integer, then any set containing n vectors, with n > m, in V is linearly dependent.

Proof Suppose that {v1 , v2 , . . . , vm } is a basis of V and let {w1 , w2 , . . . , wn } be any


subset of V containing n vectors (n > m). Since {v1 , v2 , . . . , vm } spans V , there exist
scalars αi j ∈ F such that
$$w_j = \sum_{i=1}^{m} \alpha_{ij} v_i, \qquad \text{for each } j \text{ such that } 1 \le j \le n.$$
In order to show that {w1 , w2 , . . . , wn } is linearly dependent, we have to find β1 , β2 , . . . , βn ∈ F, not all zero, such that
$$\sum_{j=1}^{n} \beta_j w_j = 0.$$
This yields that $\sum_{j=1}^{n} \beta_j \bigl( \sum_{i=1}^{m} \alpha_{ij} v_i \bigr) = 0$, i.e., $\sum_{i=1}^{m} \bigl( \sum_{j=1}^{n} \alpha_{ij}\beta_j \bigr) v_i = 0$.
But since {v1 , v2 , . . . , vm } is a basis of V , 0 = 0v1 + 0v2 + · · · + 0vm is the unique representation of the 0 vector. Hence the above expression yields that $\sum_{j=1}^{n} \alpha_{ij}\beta_j = 0$ for each i such that 1 ≤ i ≤ m. This is a system of m homogeneous linear equations in n unknowns. Thus by Theorem 1.104, there exists a nontrivial solution, say β1 , β2 , . . . , βn . This ensures that there exist scalars β1 , β2 , . . . , βn , not all zero, such that $\sum_{j=1}^{n} \beta_j w_j = 0$, and hence the set {w1 , w2 , . . . , wn } is linearly dependent.

Definition 2.44 A vector space V over a field F is said to be finite dimensional (resp.
infinite dimensional) if there exists a finite (resp. infinite) subset S of V which spans
V , i.e., L(S) = V .

Remark 2.45 If a vector space V has a basis with a finite (resp. an infinite) number
of vectors, then it is finite (resp. infinite) dimensional. The number of vectors of
a basis of V is called the dimension of V denoted as dimV . If V = {0}, then its
dimension is taken to be zero.

Theorem 2.46 Let V be a vector space over a field F. If it has a finite basis, then
any two bases of V have the same number of elements.

Proof Let B = {v1 , v2 , . . . , vn } and B' = {w1 , w2 , . . . , wm } be two bases of V . As B is a basis and the set B' is a linearly independent set in V , by Theorem 2.43 we arrive at m ≤ n. Similarly, as B' is a basis of V and the set B is a linearly independent set in V , again by Theorem 2.43 we conclude that n ≤ m. This yields that m = n.

Theorem 2.47 Let V be an n-dimensional vector space over a field F. Then any linearly independent subset of V consisting of n elements is a basis of V .

Proof Let S = {v1 , v2 , . . . , vn } be a linearly independent subset of V containing n


vectors. Then in view of Theorem 2.43, for any v ∈ V , the set {v, v1 , v2 , . . . , vn }
consisting of n + 1 vectors is linearly dependent. Therefore, there exist scalars
α, α1 , α2 , . . . , αn ∈ F not all zero such that αv + α1 v1 + α2 v2 + · · · + αn vn = 0.
Our claim is that α ≠ 0; if α = 0, then this contradicts the fact that S is linearly independent. Hence $v = (-\alpha^{-1}\alpha_1)v_1 + (-\alpha^{-1}\alpha_2)v_2 + \cdots + (-\alpha^{-1}\alpha_n)v_n$. This shows that S spans V and is hence a basis of V .

Theorem 2.48 Let V be an n-dimensional vector space over a field F. Then any linearly independent subset {v1 , v2 , . . . , vm }, m ≤ n, of V can be extended to a basis of V .

Proof Given that {v1 , v2 , . . . , vm } is a linearly independent subset of V , let W be the subspace of V generated by {v1 , v2 , . . . , vm }. If W = V , then {v1 , v2 , . . . , vm } is a basis of V . If W ⊂ V , then there exists vm+1 ∈ V such that vm+1 ∉ W , that is, vm+1 ∈ V \ W . We claim that the set {v1 , v2 , . . . , vm , vm+1 } is linearly independent. If it is not, then there exist scalars α1 , α2 , . . . , αm , αm+1 , not all zero, such that α1 v1 + α2 v2 + · · · + αm vm + αm+1 vm+1 = 0. If αm+1 = 0, this gives a nontrivial dependence relation among v1 , . . . , vm , a contradiction; hence αm+1 ≠ 0 and $v_{m+1} = \sum_{i=1}^{m} (-\alpha_{m+1}^{-1}\alpha_i)v_i \in W$, a contradiction to the choice of vm+1 . Now let W' be the subspace generated by {v1 , v2 , . . . , vm , vm+1 }. If W' = V , then {v1 , v2 , . . . , vm , vm+1 } is a basis of V . Otherwise choose vm+2 ∈ V \ W' and continue this process. Since V is finite dimensional, a linearly independent subset of V has at most n elements. After at most n − m steps, we arrive at a linearly independent subset {v1 , v2 , . . . , vm , vm+1 , . . . , vn } of V such that the subspace generated by {v1 , v2 , . . . , vm , vm+1 , . . . , vn } coincides with V . Hence there exist vectors vm+1 , vm+2 , . . . , vn ∈ V such that {v1 , v2 , . . . , vm , vm+1 , . . . , vn } is a basis of V .
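In Fn the extension process in the above proof can be carried out mechanically: adjoin the standard basis vectors e1 , . . . , en to the given linearly independent vectors and keep the pivot columns of the resulting matrix. A sketch of this idea follows (illustrative only, not part of the text; SymPy assumed, with an arbitrary example).

```python
# Illustrative sketch (not from the text): extending a linearly independent set in Q^4
# to a basis, in the spirit of Theorem 2.48.
from sympy import Matrix, eye

v1 = Matrix([1, 2, 0, 1])
v2 = Matrix([0, 1, 1, 0])               # {v1, v2} is linearly independent

M = Matrix.hstack(v1, v2, eye(4))       # adjoin the standard basis vectors e1, ..., e4
pivots = M.rref()[1]                    # pivot columns of the reduced row form of M
basis = [M.col(j) for j in pivots]      # these 4 columns are independent and include v1, v2
print(pivots)
```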

Theorem 2.49 Let W be a subspace of a finite dimensional vector space V . Then there exists a subspace W' of V such that V = W ⊕ W'.

Proof Let dimV = n and dimW = m. If W = {0}, then W' = V and, on the other hand, if W = V , then W' = {0}. Hence assume that neither W = {0} nor W = V . In this case 1 ≤ m < n. Let B1 = {v1 , v2 , . . . , vm } be a basis of W . Since B1 is linearly independent, by the above theorem B1 can be extended to a basis of V , say B = {v1 , v2 , . . . , vm , vm+1 , . . . , vn }. Now B = B1 ∪ B2 with B2 = {vm+1 , vm+2 , . . . , vn } and B1 ∩ B2 = ∅. Let W' be the subspace spanned by B2 . Since V and W are spanned by B and B1 , respectively, by Lemma 2.41, V = W + W' and W ∩ W' = {0}, i.e., V = W ⊕ W'.

Theorem 2.50 Let V be a finite dimensional vector space over a field F and U a
subspace of V . Then dimU ≤ dimV . Equality holds only when U = V .

Proof Let B = {v1 , v2 , . . . , vn } be a basis of the vector space V , and let B1 be a basis of the subspace U . Then B1 is a linearly independent set in U and hence B1 is also a linearly independent set in V . Using Theorem 2.43, we conclude that the number of elements in B1 is at most n, i.e., dimU ≤ dimV .
Now suppose dimU = dimV . Then B1 , a basis of U , is also a linearly independent subset of V containing n elements. But dimV = n forces us to conclude that B1 is a basis of V as well. Hence L(B1 ) = U = V , which implies that U = V .
Remark 2.51 It is to be noted that if V is an infinite dimensional vector space, then it has subspaces of both finite and infinite dimensions. For justification, let V denote the vector space of all polynomials in x with real coefficients over the field of real numbers. It is easy to observe that V is an infinite dimensional vector space; one of its bases is {1, x, x 2 , x 3 , . . . , x n , . . .}. It is easy to observe that W1 = ⟨1⟩, W2 = ⟨1, x⟩, W3 = ⟨1, x, x 2 ⟩, . . . are subspaces of V of dimensions 1, 2, 3, . . ., respectively. If we take W = ⟨x 4 , x 5 , x 6 , x 7 , . . .⟩, then it is easy to observe that W is a subspace of V of infinite dimension.
Theorem 2.52 Let V be a finite dimensional vector space over a field F and W1 , W2
be any two subspaces of V , then dim(W1 + W2 ) = dimW1 + dimW2 − dim(W1 ∩
W2 ).
Proof W1 ∩ W2 is a subspace of V , which is a finite dimensional vector space. Therefore, using Theorem 2.50, we have dim(W1 ∩ W2 ) ≤ dimV . Now assume that
{v1 , v2 , . . . , vk } is a basis of W1 ∩ W2 . But as W1 ∩ W2 is a subset of both W1 and
W2 , the set {v1 , v2 , . . . , vk } is a subset of both W1 and W2 . It is obvious to see that
{v1 , v2 , . . . , vk } is a linearly independent subset of both W1 and W2 . Thus the set
{v1 , v2 , . . . , vk } can be extended to a basis {v1 , v2 , . . . , vk , w1 , w2 , . . . , wr } of W1
and a basis {v1 , v2 , . . . , vk , u 1 , u 2 , . . . , u s } of W2 . Now our claim is that the set B =
{v1 , v2 , . . . , vk , w1 , w2 , . . . , wr , u 1 , u 2 , . . . , u s } spans W1 + W2 . Let x ∈ W1 + W2 . Then x = y1 + y2 where y1 ∈ W1 , y2 ∈ W2 . Hence $y_1 = \sum_{i=1}^{k} \alpha_i v_i + \sum_{j=1}^{r} \beta_j w_j$ and $y_2 = \sum_{i=1}^{k} \gamma_i v_i + \sum_{\ell=1}^{s} \delta_\ell u_\ell$, for some $\alpha_i, \beta_j, \gamma_i, \delta_\ell \in F$, i = 1, 2, . . . , k; j = 1, 2, . . . , r; ℓ = 1, 2, . . . , s. This implies that $x = y_1 + y_2 = \sum_{i=1}^{k} (\alpha_i + \gamma_i)v_i + \sum_{j=1}^{r} \beta_j w_j + \sum_{\ell=1}^{s} \delta_\ell u_\ell$. This shows that the set B spans W1 + W2 . Now to show
that B is linearly independent, let
$$\sum_{i=1}^{k} \alpha_i v_i + \sum_{j=1}^{r} \beta_j w_j + \sum_{\ell=1}^{s} \delta_\ell u_\ell = 0.$$

This implies that
$$\sum_{i=1}^{k} \alpha_i v_i + \sum_{j=1}^{r} \beta_j w_j = -\sum_{\ell=1}^{s} \delta_\ell u_\ell.$$
The expression on the right side lies in W2 and that on the left side lies in W1 . Therefore, $-\sum_{\ell=1}^{s} \delta_\ell u_\ell \in W_1 \cap W_2$, and hence it can be written as $-\sum_{\ell=1}^{s} \delta_\ell u_\ell = \sum_{j=1}^{k} \gamma_j v_j$ for

some γ j ∈ F, j = 1, 2, . . . , k. But since {v1 , v2 , . . . , vk , u 1 , u 2 , . . . , u s } is a basis of W2 , we find that δℓ = 0, γ j = 0 for all ℓ and j. Hence in particular,
$$\sum_{i=1}^{k} \alpha_i v_i + \sum_{j=1}^{r} \beta_j w_j = -\sum_{\ell=1}^{s} \delta_\ell u_\ell = 0.$$
Since the set {v1 , v2 , . . . , vk , w1 , w2 , . . . , wr } being a basis of W1 is linearly inde-


pendent, the latter relation yields that αi = 0, β j = 0 for all i, j. Thus αi = 0, β j =
0, δ = 0 for all i, j,  and the set B is linearly independent. Thus B forms a basis for
the subspace W1 + W2 . Hence dim(W1 + W2 ) = k + r + s = (k + r ) + (k + s) −
k = dimW1 + dimW2 − dim(W1 ∩ W2 ).
If V is a finite dimensional vector space over a field F and W1 , W2 , W3 are any
three subspaces of V , then by replacing W1 with W1 + W2 and W2 with W3 in the
above theorem and using the fact that W1 ∩ (W1 + W2 ) ⊇ (W1 ∩ W2 ) + (W1 ∩ W3 ),
we arrive at the following corollary:
Corollary 2.53 Let V be a finite dimensional vector space over a field F, and
W1 , W2 , W3 be any three subspaces of V . Then dim(W1 + W2 + W3 ) ≤ dimW1 +
dimW2 + dimW3 − dim(W1 ∩ W2 ) − dim(W2 ∩ W3 )−dim(W3 ∩ W1 ) + dim(W1
∩ W2 ∩ W3 ).
Remark 2.54 If V is a direct sum of subspaces W1 and W2 , i.e., V = W1 ⊕ W2 ,
then W1 ∩ W2 = {0} and the above theorem gives dimV = dimW1 + dimW2 .
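For readers who wish to experiment, the dimension formula can be checked numerically by comparing ranks of spanning sets. The following is a small sketch (not part of the text) in Python with numpy, using two hypothetical subspaces of R3 given by spanning rows.

```python
import numpy as np

# Hypothetical spanning sets: W1 = xy-plane, W2 = yz-plane in R^3.
W1 = np.array([[1., 0., 0.], [0., 1., 0.]])            # rows span W1
W2 = np.array([[0., 1., 0.], [0., 0., 1.]])            # rows span W2

dim_W1 = np.linalg.matrix_rank(W1)                      # 2
dim_W2 = np.linalg.matrix_rank(W2)                      # 2
dim_sum = np.linalg.matrix_rank(np.vstack([W1, W2]))    # dim(W1 + W2) = 3

# Theorem 2.52 then predicts dim(W1 ∩ W2) = 2 + 2 - 3 = 1 (here, the y-axis).
print(dim_W1 + dim_W2 - dim_sum)                        # 1
```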
Theorem 2.55 Let W be a subspace of a finite dimensional vector space V over a
field F. Then dim(V /W ) = dimV − dimW .
Proof Let dimV = n and dimW = m, and let {v1 , v2 , . . . , vm } be a basis of W .
Since {v1 , v2 , . . . , vm } is a linearly independent subset of V , the set {v1 , v2 , . . . , vm }
can be extended to a basis of V , say

{v1 , v2 , . . . , vm , vm+1 , . . . , vn }.

Now consider the n − m vectors vm+1 + W, vm+2 + W, . . . , vn + W in the quotient


space V /W . Our claim is that the set S = {vm+1 + W, vm+2 + W, . . . , vn + W }
is a basis of V /W and then dim(V /W ) = n − m = dimV − dimW . First we
show that S spans V /W . Let v + W ∈ V /W . Then v ∈ V and there exist scalars
α1 , α2 , . . . , αn ∈ F such that v = α1 v1 + α2 v2 + · · · + αn vn . Therefore

v + W = (α1 v1 + α2 v2 + · · · + αm vm ) + W + (αm+1 vm+1 + αm+2 vm+2 + · · · + αn vn ) + W
= (αm+1 vm+1 + αm+2 vm+2 + · · · + αn vn ) + W (since α1 v1 + · · · + αm vm ∈ W )
= αm+1 (vm+1 + W ) + αm+2 (vm+2 + W ) + · · · + αn (vn + W ).

This shows that S spans V /W . Further, we show that S is linearly independent. Let
βm+1 , βm+2 , . . . , βn ∈ F such that

βm+1 (vm+1 + W ) + βm+2 (vm+2 + W ) + · · · + βn (vn + W ) = W , i.e.,
(βm+1 vm+1 + βm+2 vm+2 + · · · + βn vn ) + W = W.

This implies that βm+1 vm+1 + βm+2 vm+2 + · · · + βn vn ∈ W . Therefore there exist
δ1 , δ2 , . . . , δm ∈ F such that
βm+1 vm+1 + βm+2 vm+2 + · · · + βn vn = δ1 v1 + δ2 v2 + · · · + δm vm , i.e.,

δ1 v1 + δ2 v2 + · · · + δm vm + (−βm+1 )vm+1 + (−βm+2 )vm+2 + · · · + (−βn )vn = 0.

Since {v1 , v2 , . . . , vm , vm+1 , . . . , vn } is a basis of V , we find that δ1 = δ2 = · · · =


δm = −βm+1 = −βm+2 = · · · = −βn = 0 and hence in particular βm+1 = βm+2 =
· · · = βn = 0 and S is linearly independent. This completes the proof of our theorem.

Exercises

1. Let K be a subfield of a field L and L be a subfield of a field M. Suppose M


is of dimension n over L and L is of dimension m over K . Prove that M is of
dimension mn over K .
2. Suppose that S1 , S2 , . . . are linearly independent sets of vectors and S1 ⊆ S2 ⊆
· · · . Show that the union S = S1 ∪ S2 ∪ · · · is also linearly independent.
3. Let C, R and Q denote the field of complex numbers, real numbers and rational
numbers, respectively. Show that
(a) C is an infinite dimensional vector space over Q.
(b) R is an infinite dimensional vector space over Q.
(c) The set {α + iβ, γ + iδ} is a basis of C over R if and only if (αδ − βγ ) ≠ 0;
hence C is a vector space of dimension 2 over R.
4. Construct an example of an infinite dimensional vector space V and its subspace
W such that there exists a subspace W1 of V with V = W ⊕ W1 .
5. Construct an example of an infinite dimensional vector space V over F with a
subspace W such that V /W is a finite dimensional vector space.
6. Let U = Span{(1, 2, 3), (0, 1, 2), (3, 2, 1)} and W = Span{(1, −2, 3), (−1, 1,
−2), (1, −3, 4)} be two subspaces of R3 . Determine the dimension and a basis
for U + W, and U ∩ W.
7. Consider the following subspaces of R5 :

U = Span{(1, 3, −2, 2, 3), (1, 4, −3, 4, 2), (2, 3, −1, −2, 9)},

W = Span{(1, 3, 0, 2, 1), (1, 5, −6, 6, 3), (2, 5, 3, 2, 1)}.

Find a basis and the dimension of U + W, and U ∩ W.



8. Let U = {(x1 , x2 , x3 , x4 ) ∈ R4 | x2 + x3 + x4 = 0} and W = {(x1 , x2 , x3 , x4 ) ∈


R4 | x1 + x2 = 0, x3 = 2x4 } be subspaces of R4 . Find bases and dimensions for
U , W , U ∩ W and U + W.
9. Let U and W be two distinct (n − 1)-dimensional subspaces of an n-dimensional
vector space V . Then prove that dim(U ∩ W ) = n − 2.
10. Suppose U and W are distinct 4-dimensional subspaces of a vector space V ,
where dim V = 6. Find the possible dimensions of U ∩ W .
11. Let W1 be a subspace of Mn×n (R) consisting of all n × n symmetric matrices.
Find a basis and the dimension of W1 . Further, find a subspace W2 of Mn×n (R)
such that Mn×n (R) = W1 ⊕ W2 .
12. Let W = {(x, y) ∈ R2 | ax + by = 0} for a fixed (a, b) ∈ R2 \ (0, 0). Show that
W is a 1-dimensional subspace of V = R2 and that the cosets of W in V are
lines parallel to the line ax + by = c for c ∈ R.
13. Let C (R) be the vector space of all real-valued continuous functions over
R with addition ( f + g)(x) = f (x) + g(x) and scalar multiplication (α f )(x) =
α f (x), α ∈ R. Show that sin x and cos x are linearly independent and that the vector
space generated by sin x and cos x, i.e., Span{sin x, cos x} = {a sin x + b cos x |
a, b ∈ R}, is contained in the solution set of the differential equation y′′ + y = 0.
Are sin²x and cos²x linearly independent? What about 1, sin²x and cos²x? Find
R ∩ Span{sin x, cos x} and R ∩ Span{sin²x, cos²x}.
14. Let V = M3 (R), W be the set of symmetric matrices and W ′ be the set of skew-symmetric
matrices. Prove that W and W ′ are subspaces of V and find dimW
and dimW ′ . Moreover, prove that V = W ⊕ W ′ .
15. Let V = W1 + W2 for some finite dimensional subspaces W1 and W2 of V . If
dimV = dimW1 + dimW2 , show that V = W1 ⊕ W2 .
16. Let u ∈ R be a transcendental number. Let W be the set of real numbers which
are of the type α0 + α1 u + · · · + αk u k ; αi ∈ Q, k ≥ 0. Prove that W is an
infinite dimensional subspace of R over Q.
17. Prove that the vector space C of all continuous functions from R to R is infinite
dimensional.
18. If W1 , W2 , . . . , Wn are finite dimensional subspaces of a vector space V , then
show that W1 + W2 + · · · + Wn is finite dimensional and dim(W1 + W2 + · · · +
Wn ) ≤ dimW1 + dimW2 + · · · + dimWn .
19. Suppose that W1 , W2 , . . . , Wn are finite dimensional subspaces of a vector space
V such that W1 + W2 + · · · + Wn is a direct sum. Then show that W1 ⊕ W2 ⊕
· · · ⊕ Wn is finite dimensional and dim(W1 ⊕ W2 ⊕ · · · ⊕ Wn ) = dimW1 +
dimW2 + · · · + dimWn .

2.3 Geometrical Interpretations

Vector spaces and notions involved there can be interpreted geometrically in some
cases. We have chosen the vector spaces R2 and R3 over the field R of real numbers
for geometrical interpretation.

Geometrically R2 is the Euclidean plane or 2-dimensional cartesian plane. R2 is a
vector space of dimension 2. {(0, 0)} and R2 are the trivial subspaces of R2 , of dimensions
0 and 2, respectively. Let W be a nontrivial subspace of R2 . Then dimW = 1. Let {v}
be a basis of W . It is obvious that v ≠ 0; write v = (α, β) ≠ (0, 0) ∈ R2 . Therefore
W = L({v}) = {λ(α, β) | λ ∈ R}. If (x, y) ∈ W , then x = λα and y = λβ, i.e.,
(x − 0)/α = (y − 0)/β, which is obviously an equation of a straight line passing through
the origin and having slope β/α. This shows that each nontrivial subspace of R2 represents
a straight line passing through the origin. Conversely, suppose a straight line passing
through the origin is given by (x − 0)/δ = (y − 0)/η, where η/δ is the slope of the line. If W ′ denotes
the set of all the points lying on this line, then W ′ = {(μδ, μη) | μ ∈ R}, or W ′ =
{μ(δ, η) | μ ∈ R} ⊂ R2 . It can be easily verified that W ′ is a nontrivial subspace
of R2 . Thus all the straight lines passing through the origin are precisely the nontrivial
subspaces of R2 .
Let W be a nontrivial subspace of R2 . Then clearly W = {λ(α, β) | λ ∈ R}, which
is geometrically a straight line passing through the origin and having slope β/α. Let R2 /W
denote the quotient space of R2 with regard to the subspace W ; it is given by
R2 /W = {v + W | v ∈ R2 }. Consider a coset (v + W ) ∈ R2 /W , where v = (a, b). Here (v +
W ) = {(a + λα, b + λβ) | λ ∈ R}. If (x, y) is any arbitrary element of (v + W ),
then we get (x − a)/α = (y − b)/β. Geometrically the latter equation gives a straight line passing
through (a, b) and parallel to the straight line represented by W . This shows that
geometrically each coset (v + W ) is a straight line passing through the point v and
parallel to the line represented by the subspace W . Thus geometrically the quotient
space R2 /W is the collection of all the straight lines which are parallel to the straight
line represented by the subspace W .


Consider R3 , the vector space over the field of real numbers. It is a 3-dimensional
vector space. Geometrically R3 is the 3-dimensional Euclidean space. Its nontrivial
subspaces will be either of dimension 1 or of dimension 2. Let W1 be a subspace
of R3 of dimension 1. Let {v} be a basis of W1 . It is obvious that v ≠ 0; write
v = (α, β, γ ) ≠ (0, 0, 0) ∈ R3 . Therefore W1 = L({v}) = {λ(α, β, γ ) | λ ∈ R}. If
(x, y, z) ∈ W1 , then x = λα, y = λβ and z = λγ , i.e., (x − 0)/α = (y − 0)/β =
(z − 0)/γ , which is obviously an equation of a straight line passing through the origin and
having direction ratios α, β and γ . This shows that the nontrivial subspace W1 of R3
represents a straight line passing through the origin. On the other hand, let a straight line
passing through the origin be given by (x − 0)/δ = (y − 0)/η = (z − 0)/ψ, where δ, η, ψ are direction
ratios of the line. If W ′ denotes the set of all the points lying on this line, then
W ′ = {(μδ, μη, μψ) | μ ∈ R}, or W ′ = {μ(δ, η, ψ) | μ ∈ R} ⊂ R3 . It can be easily
seen that W ′ is a nontrivial 1-dimensional subspace of R3 . Thus all the straight lines
passing through the origin are precisely the 1-dimensional subspaces of R3 .
Let W2 be a subspace of R3 of dimension 2. Let {v1 , v2 } be a basis of W2 , say v1 =
(α1 , β1 , γ1 ), v2 = (α2 , β2 , γ2 ). Therefore W2 = L({v1 , v2 }) = {λ(α1 , β1 , γ1 ) + μ
(α2 , β2 , γ2 ) | λ, μ ∈ R}. If (x, y, z) ∈ W2 , then x = λα1 + μα2 ,
y = λβ1 + μβ2 and z = λγ1 + μγ2 . Eliminating λ, μ from the latter relations we get
(β1 γ2 − β2 γ1 )x + (γ1 α2 − γ2 α1 )y + (α1 β2 − α2 β1 )z = 0, which is an equation of a
plane passing through the origin, v1 and v2 . This shows that geometrically W2 represents
a plane passing through the origin. Conversely, suppose a plane passing through the origin
is given by px + qy + r z = 0, where ( p, q, r ) ≠ (0, 0, 0) ∈ R3 ; without loss of generality
assume p ≠ 0. If W ′′ denotes the set of all the points lying on this plane, then
W ′′ = {((−qλ − r μ)/ p, λ, μ) | λ, μ ∈ R}, or W ′′ = {λ(−q/ p, 1, 0) + μ(−r/ p, 0, 1) | λ, μ ∈ R} ⊂ R3 .
It can be easily verified that W ′′ is a subspace of R3 of dimension 2. Thus all the planes passing through the origin
are precisely the 2-dimensional subspaces of R3 . Hence we conclude that the lines passing
through the origin and the planes passing through the origin are precisely the nontrivial subspaces of R3 .
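As a quick numerical illustration (a sketch, not part of the text), the coefficients (β1 γ2 − β2 γ1 , γ1 α2 − γ2 α1 , α1 β2 − α2 β1 ) of the plane spanned by v1 and v2 are exactly the cross product v1 × v2 ; the following Python snippet computes them for two hypothetical basis vectors.

```python
import numpy as np

v1 = np.array([1.0, 2.0, 3.0])    # hypothetical basis vectors of a 2-dimensional W2
v2 = np.array([0.0, 1.0, 1.0])

n = np.cross(v1, v2)              # normal vector (p, q, r) of the plane px + qy + rz = 0
print(n)                          # [-1. -1.  1.]

w = 2.5 * v1 - 1.0 * v2           # any vector of W2 satisfies the plane equation
print(np.isclose(n @ w, 0.0))     # True
```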
Consider the quotient spaces R3 /W1 and R3 /W2 , where W1 and W2 are as above. Then
it can be easily seen that geometrically R3 /W1 represents all the straight lines which are
parallel to the line represented by the subspace W1 , whereas the quotient space R3 /W2
represents all the planes which are parallel to the plane represented by the subspace W2 .

Exercises

1. Let R2 be the vector space over the field R of real numbers. Let S be any subset
of R2 as given below. Then find the subspace generated by S, i.e., L(S). Also find
out the equation of the curve represented by this subspace.
(a) S = {(3, 5)}.
(b) S = {(2, −3), (4, −6)}. √ √
(c) S = {(−3, −8), (3, 8), (3 5, 8 5)}.
2. Let R3 be the vector space over the field R of real numbers. Let S be any subset
of R3 as given below. Then find the subspace generated by S, i.e., L(S). Also find
out the equations of the curves or surfaces represented by these subspaces.
(a) S = {(−5, 11, 3)}.
(b) S = {(5, −3, 17), (3, −8, −11)}.
(c) S = {(3, 5, −3), (6, 10, −6), (7, −8, −6)}.

2.4 Change of Basis

Let V be a finite dimensional vector space over a field F and let B = {v1 , v2 , . . . , vn }
be a basis of V . Fix the order of elements in the basis B. If v ∈ V then there exist
unique scalars α1 , α2 , . . . , αn ∈ F such that v = α1 v1 + α2 v2 + · · · + αn vn . Thus
v ∈ V determines a unique n × 1 column matrix (α1 , α2 , . . . , αn )t . This n × 1 matrix is known as the
coordinate vector of v relative to the ordered basis B, denoted as [v] B . Notice that the

basis B and the order of the elements in B play a very important role in determining
the coordinate vector of any arbitrary element in V . Throughout this section we shall
consider the ordered basis.
Definition 2.56 Let V be a vector space over a field F and let
B1 = {u 1 , u 2 , . . . , u n } and B2 = {v1 , v2 , . . . , vn } be two ordered bases of V . If
we consider B2 as a basis of V , then each vector in V can be uniquely written as
a linear combination of v1 , v2 , . . . , vn ; in particular, u i = Σ_{j=1}^{n} α ji v j for each i, 1 ≤ i ≤ n,
where α ji ∈ F. Then the n × n matrix P = (α ji ) is called the transition matrix of
B1 relative to the basis B2 . Similarly, if we consider B1 as a basis of V , then we can
write v j = Σ_{i=1}^{n} βi j u i for each j, 1 ≤ j ≤ n, and the n × n matrix Q = (βi j ) is
known as the transition matrix of B2 relative to the basis B1 .
Remark 2.57 (i) Notice that in the above definition the coordinate vector of u i
relative to the basis B2 , i.e., [u i ] B2 = (α1i , α2i , . . . , αni )t , is the ith column vector in the matrix
P relative to the basis B2 .
(ii) In Fn , the standard ordered basis is {e1 , e2 , . . . , en }, where ei is the n-tuple
with ith component 1, and all the other components are zero. For example, in R3 ,
B1 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is the standard ordered basis, but the ordered
basis B2 = {(0, 1, 0), (1, 0, 0), (0, 0, 1)} is not the standard ordered basis of R3 .
Example 2.58 Let B1 = {e1 , e2 , e3 } and B2 = {(1, 1, 0), (1, −1, 0), (0, 1, 1)} be
two ordered bases of the vector space R3 . Then

(1, 1, 0) = 1e1 + 1e2 + 0e3
(1, −1, 0) = 1e1 + (−1)e2 + 0e3
(0, 1, 1) = 0e1 + 1e2 + 1e3 .

Thus the transition matrix of B2 relative to the basis B1 is

P = [ 1   1   0
      1  −1   1
      0   0   1 ].

Similarly

e1 = (1/2)(1, 1, 0) + (1/2)(1, −1, 0) + 0(0, 1, 1)
e2 = (1/2)(1, 1, 0) + (−1/2)(1, −1, 0) + 0(0, 1, 1)
e3 = (−1/2)(1, 1, 0) + (1/2)(1, −1, 0) + 1(0, 1, 1).

Hence the transition matrix of B1 relative to the basis B2 is

Q = [ 1/2   1/2  −1/2
      1/2  −1/2   1/2
      0     0     1   ].

It is easy to see that P Q = Q P = I and hence P and Q are inverses of each other.
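The computation above is easy to confirm numerically; the following sketch (not part of the text) uses numpy to recover Q as the inverse of P and to check P Q = Q P = I .

```python
import numpy as np

# Columns of P are the B1-coordinates of the vectors of B2 (Example 2.58).
P = np.array([[1.0,  1.0, 0.0],
              [1.0, -1.0, 1.0],
              [0.0,  0.0, 1.0]])

Q = np.linalg.inv(P)    # transition matrix of B1 relative to B2
print(Q)                # approximately [[0.5, 0.5, -0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
print(np.allclose(P @ Q, np.eye(3)), np.allclose(Q @ P, np.eye(3)))   # True True
```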
Theorem 2.59 Let B1 = {u 1 , u 2 , . . . , u n } and B2 = {v1 , v2 , . . . , vn } be two ordered
bases of a vector space V over a field F. If P is the transition matrix of B1 relative

to the basis B2 , then P is nonsingular and P −1 is the transition matrix of B2 relative


to the basis B1 .

Proof Let P = (αi j ) and Q = (βi j ) be the transition matrices of B1 relative to the
basis B2 and of B2 relative to the basis B1 , respectively. Hence we find that u j =
Σ_{i=1}^{n} αi j vi for each j, 1 ≤ j ≤ n, where αi j ∈ F, and vi = Σ_{k=1}^{n} βki u k for each
i, 1 ≤ i ≤ n. This shows that

u j = Σ_{i=1}^{n} αi j ( Σ_{k=1}^{n} βki u k )
    = Σ_{k=1}^{n} ( Σ_{i=1}^{n} βki αi j ) u k
    = Σ_{k=1}^{n} δk j u k ,

where δk j = Σ_{i=1}^{n} βki αi j is the (k, j)-th entry of the product Q P. But since B1 is
linearly independent, we find that δk j = 1 for k = j and δk j = 0 for k ≠ j. This
yields that Q P = I . In a similar manner, it can be shown that P Q = I . Hence P is
nonsingular and Q = P −1 .

Theorem 2.60 Let B = {u 1 , u 2 , . . . , u n } be an ordered basis of a nonzero finite
dimensional vector space V over a field F and let P = (αi j ) be any invertible n × n
matrix over F. For each j, 1 ≤ j ≤ n, if v j = Σ_{i=1}^{n} αi j u i , then B ′ = {v1 , v2 , . . . , vn }
is a basis of V .

Proof Since dimV = n, it is enough to prove that B ′ spans V . Let W = L(B ′ ) and
P −1 = (βi j ). Then for each t, 1 ≤ t ≤ n, we find that

Σ_{j=1}^{n} β jt v j = Σ_{j=1}^{n} β jt ( Σ_{i=1}^{n} αi j u i )
                = Σ_{i=1}^{n} ( Σ_{j=1}^{n} αi j β jt ) u i
                = Σ_{i=1}^{n} δit u i , where δit = 1 ∈ F for i = t and δit = 0 ∈ F for i ≠ t
                = u t .

This yields that u t ∈ W and hence B ⊆ W . Since W is a subspace of V , V ⊆ W ⊆ V
and V is spanned by B ′ .
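A small numerical sketch of Theorem 2.60 (not from the text): applying a hypothetical invertible matrix P to the standard basis of R3 produces three vectors which are again a basis, as their rank confirms.

```python
import numpy as np

U = np.eye(3)                          # columns u_1, u_2, u_3: the standard basis of R^3
P = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])           # hypothetical invertible matrix (det = 3)

V = U @ P                              # column j is v_j = sum_i alpha_{ij} u_i
print(np.linalg.det(P))                # approximately 3.0 (nonzero, so P is invertible)
print(np.linalg.matrix_rank(V) == 3)   # True: v_1, v_2, v_3 form a basis of R^3
```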

Theorem 2.61 Let V be a vector space of dimension n over a field F. If P = (αi j )
is the transition matrix of the ordered basis { f 1 , f 2 , . . . , f n } relative to the ordered
basis {e1 , e2 , . . . , en }, and Q = (βi j ) is the transition matrix of the ordered basis
{g1 , g2 , . . . , gn } relative to the ordered basis { f 1 , f 2 , . . . , f n }, then the transition
matrix of {g1 , g2 , . . . , gn } relative to the basis {e1 , e2 , . . . , en } is P Q.

 
Proof We find that f k = Σ_{j=1}^{n} α jk e j for each k, 1 ≤ k ≤ n, and gi = Σ_{k=1}^{n} βki f k
for each i, 1 ≤ i ≤ n. This yields that

gi = Σ_{k=1}^{n} βki ( Σ_{j=1}^{n} α jk e j )
   = Σ_{j=1}^{n} ( Σ_{k=1}^{n} α jk βki ) e j
   = Σ_{j=1}^{n} γ ji e j ,

where γ ji = Σ_{k=1}^{n} α jk βki . Hence the transition matrix of {g1 , g2 , . . . , gn } relative
to the ordered basis {e1 , e2 , . . . , en } is (γi j ) = (αi j )(βi j ) = P Q. This completes the
proof.

Theorem 2.62 Let B1 = {u 1 , u 2 , . . . , u n } and B2 = {v1 , v2 . . . , vn } be two ordered


bases of a vector space V over a field F. Then for any v ∈ V, [v] B1 = Q[v] B2 where
Q is the transition matrix of B2 relative to the basis B1 .

Proof Let Q = (βi j ) be the transition matrix of B2 relative to the basis B1 . Then
v j = Σ_{k=1}^{n} βk j u k for each j, 1 ≤ j ≤ n. Suppose that [v] B1 = (a1 , a2 , . . . , an )t and
[v] B2 = (b1 , b2 , . . . , bn )t . Since v ∈ V , v = Σ_{j=1}^{n} b j v j = Σ_{j=1}^{n} b j ( Σ_{k=1}^{n} βk j u k ). This implies that
v = Σ_{k=1}^{n} ( Σ_{j=1}^{n} βk j b j ) u k . Also, since [v] B1 = (a1 , a2 , . . . , an )t , we find that v = Σ_{k=1}^{n} ak u k .
Now comparing the coefficients of u k in the latter two relations for v, we arrive at
ak = Σ_{j=1}^{n} βk j b j for each k, 1 ≤ k ≤ n. This is precisely the matrix relation

(a1 , a2 , . . . , an )t = (βi j )(b1 , b2 , . . . , bn )t ,

that is, [v] B1 = Q[v] B2 . This completes the proof of the result.
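The coordinate-change rule [v] B1 = Q[v] B2 can be tested numerically; the sketch below (not part of the text) uses the bases of Example 2.58, where B1 is the standard basis, and a hypothetical coordinate vector relative to B2 .

```python
import numpy as np

B2 = np.array([[1.0,  1.0, 0.0],      # the vectors of B2 written as rows
               [1.0, -1.0, 0.0],
               [0.0,  1.0, 1.0]])
Q = B2.T                              # columns are [v_j]_{B1}: transition matrix of B2 rel. B1

v_B2 = np.array([2.0, -1.0, 3.0])     # hypothetical coordinates of some v relative to B2
v_B1 = Q @ v_B2                       # coordinates of the same vector relative to B1
print(v_B1)                           # [1. 6. 3.]

# cross-check by expanding v directly as a combination of the B2 vectors
print(np.allclose(v_B1, 2.0*B2[0] - 1.0*B2[1] + 3.0*B2[2]))   # True
```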

Theorem 2.63 Let B = {u 1 , u 2 , . . . , u n } be an ordered basis of a vector space V


over a field F. Then the map ξ : V → Mn×1 (F) such that ξ(v) = [v] B , v ∈ V satis-
fies the following:

(i) ξ(αv1 + βv2 ) = αξ(v1 ) + βξ(v2 ), for any α, β ∈ F and v1 , v2 ∈ V


(ii) ξ is one-to-one and onto.
Proof (i) Let [v1 ] B = (a1 , a2 , . . . , an )t and [v2 ] B = (b1 , b2 , . . . , bn )t . Since B is an ordered basis of V ,
we find that αv1 = Σ_{i=1}^{n} αai u i and βv2 = Σ_{i=1}^{n} βbi u i . This yields that αv1 + βv2 =
Σ_{i=1}^{n} (αai + βbi )u i . Hence we find that

[αv1 + βv2 ] B = (αa1 + βb1 , αa2 + βb2 , . . . , αan + βbn )t = α(a1 , a2 , . . . , an )t + β(b1 , b2 , . . . , bn )t .

This shows that ξ(αv1 + βv2 ) = αξ(v1 ) + βξ(v2 ), for any α, β ∈ F and v1 , v2 ∈ V .
(ii) For any v1 , v2 ∈ V , ξ(v1 ) = ξ(v2 ) implies that [v1 ] B = [v2 ] B , which yields
that ai = bi for each i, 1 ≤ i ≤ n. Hence v1 = v2 and the map ξ is one-to-one. Also,
for any c = (c1 , c2 , . . . , cn )t ∈ Mn×1 (F) there exists v = Σ_{i=1}^{n} ci u i ∈ V such that ξ(v) = c,
and hence ξ is onto.

Example 2.64 Let V = R2 [x] be the vector space of all polynomials with real coef-
ficients of degree less than or equal to 2. Consider the following three ordered bases
of V , B1 = {1, 1 + x, 1 + x + x 2 }, B2 = {x, 1, 1 + x 2 } and B3 = {x, 1 + x 2 , 1}.
(1) For p(x) = 1 − 2x + 2x 2 , find [ p(x)] B1 , [ p(x)] B2 , [ p(x)] B3 .
(2) Find the matrix M1 of B3 relative to B2 .
(3) Find the matrix M2 of B2 relative to B1 .
(4) Find the matrix M of B3 relative to B1 and verify that M = M2 M1 .
(1) Express p(x) as a linear combination of the ordered bases B1 , B2 and B3 , as given
below:

1 − 2x + 2x 2 = 3(1) + (−4)(1 + x) + 2(1 + x + x 2 )
1 − 2x + 2x 2 = (−2)x + (−1)1 + 2(1 + x 2 )
1 − 2x + 2x 2 = (−2)x + 2(1 + x 2 ) + (−1)1.

Thus we get [ p(x)] B1 = (3, −4, 2)t , [ p(x)] B2 = (−2, −1, 2)t , [ p(x)] B3 = (−2, 2, −1)t .
(2) As we have x = 1x + 01 + 0(1 + x 2 ),
1 + x 2 = 0x + 01 + 1(1 + x 2 )
1 = 0x + 1(1) + 0(1 + x 2 ).
This shows that the matrix of B3 relative to the ordered basis B2 will be given by

M1 = [ 1  0  0
       0  0  1
       0  1  0 ].

(3) Similarly, x = (−1)(1) + 1(1 + x) + 0(1 + x + x 2 ),


1 = 1(1) + 0(1 + x) + 0(1 + x + x 2 ),
1 + x 2 = 1(1) + (−1)(1 + x) + 1(1 + x + x 2 ).
Thus the matrix of B2 relative to the ordered basis B1 will be given by

M2 = [ −1   1   1
        1   0  −1
        0   0   1 ].

(4) As above, we can write x = −1(1) + 1(1 + x) + 0(1 + x + x 2 ),


1 + x 2 = 1(1) + (−1)(1 + x) + 1(1 + x + x 2 ) ,
1 = 1(1) + 0(1 + x) + 0(1 + x + x 2 ). Hence the matrix of B3 relative to the ordered
basis B1 will be given by

M = [ −1   1   1
       1  −1   0
       0   1   0 ].

It can be easily verified that M = M2 M1 .
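The verification M = M2 M1 (and the corresponding relation between coordinate vectors) can also be done numerically; here is a small sketch, not part of the text.

```python
import numpy as np

M1 = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])      # B3 relative to B2
M2 = np.array([[-1, 1, 1], [1, 0, -1], [0, 0, 1]])    # B2 relative to B1
M  = np.array([[-1, 1, 1], [1, -1, 0], [0, 1, 0]])    # B3 relative to B1

print(np.array_equal(M, M2 @ M1))     # True

# the coordinate vectors of p(x) = 1 - 2x + 2x^2 are related the same way:
p_B3 = np.array([-2, 2, -1])
print(M @ p_B3)                       # [ 3 -4  2]  =  [p(x)]_{B1}
```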

Exercises

1. In the vector space R3 , find the transition matrix of the ordered basis
{(1, cosx, sinx), (1, 0, 0), (1, −sinx, cosx)} relative to the standard ordered
basis {(1, 0, 0), (0, 1, 0), (0, 0, 1)} of R3 .
2. In the vector space R3 , find the transition matrix of the ordered basis
{(2, 1, 0), (0, 2, 1), (0, 1, 2)} relative to the standard ordered basis
{(1, 0, 0), (0, 1, 0), (0, 0, 1)} of R3 .

3. Let B1 = {(1, 0), (0, 1)} and B2 = {(2, 3), (3, 2)} be two ordered bases of R2 .
Then find the transition matrix P of B2 relative to the basis B1 and the transition
matrix Q of B1 relative to the basis B2 and show that P Q = Q P = I2 .
4. In the vector space R3 , let B1 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} and B2 =
{(1, 1, 0), (1, −1, 0), (0, 1, 1)} be two ordered bases. Then find the transition
matrix P of B2 relative to the basis B1 and transition matrix Q of B1 relative to
the basis B2 and show that P Q = Q P = I3 .
5. Suppose the X and Y axes in the plane R2 are rotated counterclockwise through 45°
so that the new X and Y axes are along the line y = x and the line y = −x,
respectively.
(a) Find the change of basis matrix.
(b) Find the coordinates of the point P(5, 6) under the given rotation.
6. Let B = {u 1 , u 2 , . . . , u n } be an ordered basis of a nonzero finite dimensional
vector space V over a field F. For each j, 1 ≤ j ≤ n, define v j = Σ_{i=1}^{n} αi j u i ,
where αi j ∈ F. If the ordered set B ′ = {v1 , v2 , . . . , vn } is a basis of V , then prove
that P = (αi j ) is an invertible matrix over F.
7. Let W be the subspace of C3 over C spanned by α1 = (1, 0, i), α2 = (i, 0, 1),
where C is the field of complex numbers. Prove the following:
(a) The set B = {α1 , α2 } is a basis of W .

(b) The set B ′ = {β1 , β2 }, where β1 = (1 + i, 0, 1 + i), β2 = (1 − i, 0, i − 1),
is also a basis of W .
(c) Find the matrix of the ordered basis B ′ = {β1 , β2 } relative to the ordered
basis B = {α1 , α2 }.
8. Find the total number of ordered bases of the vector space V = Fn , where F is
a finite field containing p elements.
9. Let {α1 , α2 , . . . , αn } be a basis of an n-dimensional vector space V . Show that
{λ1 α1 , λ2 α2 , . . . , λn αn } is also a basis of V for any nonzero scalars λ1 , λ2 , . . . , λn .
If the coordinate of a vector v under the basis {α1 , α2 , . . . , αn } is
x = (x1 , x2 , . . . , xn ), then find the coordinate of v under {λ1 α1 , λ2 α2 , . . . , λn αn }.
What are the coordinates of w = α1 + α2 + · · · + αn with respect to the bases
{α1 , α2 , . .
. , αn } and {λ1 α1 , λ2 α2 , . . . , λn αn } ?
10. Let W = { [ a  b ; b  c ] | a, b, c ∈ R }, where [ a  b ; c  d ] denotes the 2 × 2 matrix
with rows (a, b) and (c, d). Show that W is a subspace of M2×2 (R) over R and that the
matrices [ 1  0 ; 0  0 ], [ 0  1 ; 1  0 ], [ 0  0 ; 0  1 ] form a basis of W . Find the coordinate
vector of the matrix [ 1  −2 ; −2  3 ] under this basis.
11. Consider the vector space Rn over R with usual operations. Consider the bases
B1 = {e1 , e2 , . . . , en } where

e1 = (1, 0, . . . , 0), e2 = (0, 1, . . . , 0), . . . , en = (0, 0, . . . , 1)



and B2 = { f 1 , f 2 , . . . , f n } where

f 1 = (1, 0, . . . , 0), f 2 = (1, 1, . . . , 0), . . . , f n = (1, 1, . . . , 1).

(a) Find the transition matrix of B2 relative to the basis B1 .


(b) Find the transition matrix of B1 relative to the basis B2 .
(c) If v ∈ Rn such that [v] B1 = (1, 2, . . . , n)t , then find [v] B2 .
Chapter 3
Linear Transformations

A map between any two algebraic structures (say groups, rings, fields, modules
or algebras) of the same kind is said to be an isomorphism if it is one-to-one, onto
and a homomorphism; roughly speaking, it preserves the operations in the underlying
algebraic structures. If any two vector spaces over the same field are given, then one
can study the relationship between them. In this chapter, we will define
the notion of a linear transformation between two vector spaces U and V which are
defined over the same field and discuss the basic properties of linear transformations.
Throughout, the vector spaces are considered over a field F unless otherwise stated.
Definition 3.1 Let U and V be vector spaces over the same field F. A map T : U →
V is said to be a linear transformation (a vector space homomorphism or a linear
map) if it satisfies the following:
(1) T (u 1 + u 2 ) = T (u 1 ) + T (u 2 )
(2) T (αu 1 ) = αT (u 1 )
for all u 1 , u 2 ∈ U and α ∈ F.

Remark 3.2 (i) If T : U → V is a linear transformation, then it is obvious to see


that T (0) = 0, where 0 on the left side denotes the zero vector in U while the 0
on the right side represents zero vector of V . Moreover, T (−u) = −T (u), holds
for all u ∈ U .
(ii) For any u 1 , u 2 ∈ U , u 1 − u 2 = u 1 + (−u 2 ) and hence if T : U → V is a lin-
ear transformation, then T (u 1 − u 2 ) = T (u 1 + (−u 2 )) = T (u 1 ) + T (−u 2 ) =
T (u 1 ) − T (u 2 ).
(iii) Note that the vector spaces U and V are defined over the same field F to make
the condition (2) in the above definition meaningful. A linear transformation
T : U → V is said to be an epimorphism if T is onto. If T is one-to-one, it is said


to be a monomorphism. A linear transformation T : U → V which is both one-to-one
and onto is called an isomorphism; if this is the case, then the vector
spaces U and V are said to be isomorphic and we write U ≅ V .
(iv) In particular, a linear transformation T : U −→ U is called a linear operator
on U .
(v) It can be easily seen that a map T : U → V is a linear transformation if and
only if T (αu 1 + βu 2 ) = αT (u 1 ) + βT (u 2 ) for any α, β ∈ F, u 1 , u 2 ∈ U .
(vi) If T : U → V is a linear transformation, then using the above remark and
induction on n, it can be seen that T ( Σ_{i=1}^{n} αi u i ) = Σ_{i=1}^{n} αi T (u i ) for any u i ∈
U, αi ∈ F, i = 1, 2, . . . , n.
U, αi ∈ F, i = 1, 2, . . . , n.

Example 3.3 (1) Consider the vector space V = R[x] of polynomials over the field
R. Define a map T : V → V such that T ( f (x)) = f ′ (x), the usual derivative
of polynomial f (x). It can be easily seen that T ( f (x) + g(x)) = T ( f (x)) +
T (g(x)) and T (α f (x)) = αT ( f (x)), for any f (x), g(x) ∈ V and α ∈ R. Hence
T is a linear mapping which is onto but not one-to-one.
(2) Let T : R2 → R2 such that T (a, b) = (a + b, b) for all a, b ∈ R. It can be easily
seen that T is a linear transformation which is also an isomorphism.
(3) Let T : R3 → R3 such that T (a, b, c) = (0, b, c) for any (a, b, c) ∈ R3 . One
can easily verify that T is a linear transformation which is neither one-to-one
nor onto.
(4) Consider V = C [a, b], the vector space of all continuous real-valued functions
on the closed interval [a, b]. Define a map T : V → R such that T ( f (x)) =
∫_a^b f (x)d x. For any α, β ∈ R and f (x), g(x) ∈ C [a, b], T (α f (x) + βg(x)) =
∫_a^b (α f (x) + βg(x))d x = α ∫_a^b f (x)d x + β ∫_a^b g(x)d x = αT ( f (x)) +
βT (g(x)). This shows that T is a linear transformation which is onto but not
one-to-one.
(5) Let R2 [x] denote the vector space of all polynomials of degree less than or equal
to two, over the field R. Then there exists a natural map T : R2 [x] → R3 defined
by T (α0 + α1 x + α2 x 2 ) = (α0 , α1 , α2 ), where α0 , α1 , α2 ∈ R. This is a linear
transformation which is one-to-one and onto.
(6) The map T : R2 → R2 defined by T (a, b) = (a + 1, b) (or, T (a, b) = (|a|,
|b|)) is not a linear transformation.
(7) The map T : R4 → R2 defined by T (a1 , a2 , a3 , a4 ) = (a1 + a2 , a3 + a4 ) is a
linear transformation. This is onto but not one-to-one.
(8) The map T : Mm×n (R) → Mn×m (R) defined by T (A) = At , the transpose of
the matrix A, is a linear transformation. It can be easily shown that this is an
isomorphism.
(9) Let T : Rn → Rn+1 such that T (a1 , a2 , . . . , an ) = (a1 , a2 , . . . , an , 0). Then T
is a linear transformation called natural inclusion. It is an injective linear trans-
formation but not surjective.
(10) Let C be the vector space over the field of real numbers R. Let T : C → C
be a map defined by T (z) = z̄, where z̄ is the conjugate of complex num-

ber z. Obviously T is a linear transformation. It can be verified that T is an


isomorphism.
(11) Let V = Mn (F) denote the vector space of all n × n matrices with entries from
F. Define a map T : V → F such that T (A) = tr (A), where tr (A) denotes the
trace of the matrix A. It can be easily proved that T is a linear transformation
which is onto but not one-to-one in general. However, this homomorphism T
will be an isomorphism if and only if n = 1.
(12) Consider the vector space V = F[x] of all polynomials in x with coefficients
from the field F, where the characteristic of F is 0. Define a map T : V → V
such that T ( f (x)) = α0 x + α1 x 2 /2 + · · · + αn x n+1 /(n + 1) if f (x) = α0 + α1 x + · · · +
αn x n . It can be easily seen that T ( f (x) + g(x)) = T ( f (x)) + T (g(x)) and
T (α f (x)) = αT ( f (x)), for any f (x), g(x) ∈ V and α ∈ F. Hence T is a linear
mapping which is one-to-one but not onto. This linear transformation is called
the integration transformation.
(13) Let αi j ∈ F for each i, j such that 1 ≤ i ≤ m, 1 ≤ j ≤ n. Define a map T :
Fm → Fn such that

T (a1 , a2 , . . . , am ) = ( Σ_{i=1}^{m} αi1 ai , Σ_{i=1}^{m} αi2 ai , . . . , Σ_{i=1}^{m} αin ai ).

It can be easily verified that T is a linear transformation.


(14) Let A = (αi j )n×m be an n × m matrix over F. Define a map T : Fm → Fn such
that if X = (x1 , x2 , . . . , xm ) ∈ Fm , then T (X ) = AX = (αi j )n×m (x j )m×1 =
(βi )n×1 , where βi = Σ_{j=1}^{m} αi j x j . Here X is being treated as a column matrix.
Thus T (X ) = AX is an n × 1 matrix which may be considered as an n-tuple
belonging to Fn . It can be proved that T is a linear transformation.

Some Standard Linear Transformations

(1) Let T : U → V be the mapping which assigns the zero vector of V to every
vector u ∈ U , i.e., T (u) = 0 for all u ∈ U . Then it can be verified that T is a
linear transformation, which is known as the zero linear transformation , usually
denoted by 0.
(2) The identity mapping I : U → U such that I (u) = u for all u ∈ U is a linear
transformation and is known as the identity linear transformation, denoted as
IU . It is an isomorphism also.
(3) Let V be any vector space and W be a subspace of V . The inclusion mapping
i : W → V defined as i(w) = w for all w ∈ W is a linear transformation. This
is known as inclusion linear transformation, which is injective also. It is an
isomorphism if and only if W = V .
(4) Let V be a vector space and W a subspace of V . Let T : V → V /W be the
map defined by T (v) = v + W for every v ∈ V . It is easy to see that T is a linear
transformation which is known as the quotient linear transformation. It is a

surjective linear transformation which is not an injective linear transformation


in general, but it is an isomorphism if and only if W = {0}.
(5) The map Ti : Fn → F given by Ti (α1 , α2 , . . . , αi , . . . , αn ) = αi is a linear trans-
formation for every i; 1 ≤ i ≤ n. This is known as ith projection of Fn to F,
which is surjective linear transformation but not injective linear transformation
in general. This is an isomorphism if and only if n = 1.

Remark 3.4 (i) If T : U → V is a linear transformation, it can be easily seen that


T may not map a linearly independent subset of U into a linearly independent
subset of V . For example, consider the linear transformation T : R2 → R2 such
that T (a1 , a2 ) = (0, a2 ). The set {(1, 0), (0, 1)} is linearly independent in R2
while {T (1, 0), T (0, 1)} is linearly dependent.
(ii) If T : U → V is a linear transformation, then T maps a linearly dependent subset
of U into a linearly dependent subset of V . In fact, if {u 1 , u 2 , . . . , u n } is linearly
dependent in U , then there exist scalars α1 , α2 , . . . , αn (not all zero) such that
α1 u 1 + α2 u 2 + · · · + αn u n = 0. This yields that

α1 T (u 1 ) + α2 T (u 2 ) + · · · + αn T (u n ) = T (α1 u 1 + α2 u 2 + · · · + αn u n ) = 0,

which shows that {T (u 1 ), T (u 2 ), . . . , T (u n )} is linearly dependent in V .


(iii) In view of Definition 2.16, notice that there should be no confusion between internal
direct sum and external direct sum. In fact, it can be easily seen that if V is an
internal direct sum of V1 , V2 , . . . , Vn , then V is isomorphic to the external direct
sum of V1 , V2 , . . . , Vn . Each v ∈ V can be uniquely written as v = v1 + v2 +
· · · + vn where vi ∈ Vi . Define a map f : V → V1 ⊕ V2 ⊕ · · · ⊕ Vn such that
f (v) = (v1 , v2 , . . . , vn ). The map f is well defined because of the uniqueness
of the representation of v. One can easily verify that f is one-to-one, onto and
homomorphism.

The following example shows that there exists no nonzero linear transformation T :
R2 −→ R2 which maps the straight line ax + by + c = 0, where c ≠ 0, to (0, 0) ∈
R2 .
Example 3.5 It is clear that out of a and b at least one must be nonzero; let it be a, i.e.,
a ≠ 0. The straight line can also be represented as L = {((−bt − c)/a, t) | t ∈ R} ⊆ R2 .
But T (l) = (0, 0) for all l ∈ L. This implies that T ((−bt − c)/a, t) = (0, 0) for all t ∈ R. In
particular T ((−c)/a, 0) = (0, 0), i.e., ((−c)/a)T (1, 0) = (0, 0). Since c ≠ 0, we arrive at T (1, 0) =
(0, 0). The relation T ((−bt − c)/a, t) = (0, 0) for all t ∈ R also gives us T ((−bt)/a, t) + T ((−c)/a, 0) = (0, 0),
i.e., T ((−bt)/a, t) = (0, 0). In particular, putting t = 1, we get T ((−b)/a, 1) = (0, 0), i.e.,
((−b)/a)T (1, 0) + T (0, 1) = (0, 0). This implies that T (0, 1) = (0, 0). Finally we have
T (x, y) = T (x, 0) + T (0, y) = x T (1, 0) + yT (0, 1) = (0, 0) for all (x, y) ∈ R2 ,
i.e., T = 0.
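The key point of the example is that, because c ≠ 0, the line does not pass through the origin, so any two distinct points on it are linearly independent vectors of R2 ; a linear map vanishing on them vanishes on a basis and hence is zero. A short numerical sketch (not from the text, with a hypothetical line) makes this visible.

```python
import numpy as np

a, b, c = 1.0, 2.0, 3.0                    # hypothetical line x + 2y + 3 = 0 (c != 0)
p1 = np.array([-c / a, 0.0])               # the point of L with t = 0
p2 = np.array([(-b - c) / a, 1.0])         # the point of L with t = 1

# nonzero determinant: p1 and p2 form a basis of R^2
print(np.linalg.det(np.column_stack([p1, p2])))   # approximately -3.0, nonzero
```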

Exercises

1. Let V be the vector space of all continuous functions f : R −→ R over the field
of reals and define a mapping φ : V −→ V by [φ( f )](x) = ∫_0^x f (t)dt. Prove
that φ is a linear transformation.
2. Let V be the vector space over R of polynomials with real coefficients. Define ψ :
V −→ V by ψ( Σ_{i=0}^{k} ai x i ) = Σ_{i=1}^{k} i ai x i−1 . Prove that ψ is a linear transformation.
3. Let Vn = { p(x) ∈ F[x] | degp(x) < n}, where n is any positive integer. Define
T : Vn −→ Vn by T ( p(x)) = p(x + 1). Show that T is an automorphism of Vn .
4. Let U = Fn+1 and V = { p(x) ∈ F[x] | degp(x) ≤ n}, where n is any positive
integer, be the vector spaces over F. Define T : U −→ V by T (α0 , α1 , . . . , αn ) =
α0 + α1 x + · · · + αn x n . Then prove that T is an isomorphism from U to V.
5. Let V = R2 be the 2-dimensional Euclidean space. Show that rotation through
an angle θ is a linear transformation on V .
6. Let V be the vector space of all twice differentiable functions on [0, 1]. Show
that the mappings T1 : V −→ V and T2 : V −→ V defined by T1 ( f ) = d f /d x and
T2 ( f ) = x f are linear transformations.
7. Let T : V −→ V be a linear transformation on a finite dimensional vector space V
which is not onto. Show that there exists some v ∈ V , v ≠ 0, such that T (v) = 0.
8. Show that the following mappings are linear transformations:
(a) T : R2 −→ R2 defined by T (x, y) = (2x − 3y, y).
(b) T : R3 −→ R2 defined by T (x, y, z) = (x + 2y + z, 3x − 4y − 2z).
(c) T : R4 −→ R3 defined by T (x, y, z, t) = (x − y + 2z + t, x − y − 2z +
3t, x + 3y + z − 2t).
9. Show that the following mappings are not linear mappings:
(a) T : R2 −→ R2 defined by T (x, y) = (x y, y 2 ).
(b) T : R2 −→ R3 defined by T (x, y) = (x + 3, 3y, 2x − y).
(c) T : R3 −→ R3 defined by T (x, y, z) = (|x + y|, 2x − 3y + z, x + y + 2).
10. Let V be the vector space of n-square real matrices. Let M be an arbitrary but
fixed matrix in V . Let T1 , T2 : V −→ V be defined by T1 (A) = AM + M A,
T2 (A) = AM − M A, where A is any matrix in V . Show that T1 and T2 both are
linear transformations on V .
11. Prove that a mapping T : U −→ V is a linear transformation if and only if
T (x + αy) = T (x) + αT (y) for all x, y ∈ U and α ∈ F.
12. Prove that any linear functional f : Rn → R is a continuous function.
13. Let f : Rn → R be a continuous function which is also additive. Prove that f
is a linear functional on Rn .

3.1 Kernel and Range of a Linear Transformation

Definition 3.6 Let T : U → V be a linear transformation. The kernel or null space


of T , denoted as N (T ) or K er T , is the set of all those elements of U which are
mapped to the zero of V , that is,

N (T ) = {u ∈ U | T (u) = 0}.

The range of T or the image of T or the rank space of T , denoted as R(T ), or T (U )


or Im (T ) is the set of images in V , that is

R(T ) = {T (u) | u ∈ U }.

Remark 3.7 (i) If T : U → V is a linear transformation, then T (0) = 0 and hence


0 ∈ N (T ) and N (T ) ≠ ∅. Now if u 1 , u 2 ∈ N (T ) and α, β ∈ F, then T (αu 1 +
βu 2 ) = αT (u 1 ) + βT (u 2 ) = α0 + β0 = 0 and hence αu 1 + βu 2 ∈ N (T ). This
shows that N (T ) is a subspace of U . This is also an easy exercise that R(T ) is
a subspace of V .
(ii) If T : U → V is a linear transformation and the set {u 1 , u 2 , . . . , u n } spans U ,
then the set {T (u 1 ), T (u 2 ), . . . , T (u n )} spans R(T ). Let v ∈ R(T ). Then there
exists u ∈ U such that T (u) = v. In fact, if {u 1 , u 2 , . . . , u n } spans U , then for
any u ∈ U there exist scalars α1 , α2 , . . . , αn ∈ F such that u = α1 u 1 + α2 u 2 +
· · · + αn u n . Hence, we find that v = T (u) = T (α1 u 1 + α2 u 2 + · · · + αn u n ) =
α1 T (u 1 ) + α2 T (u 2 ) + · · · + αn T (u n ). Therefore R(T ) is spanned by the set
{T (u 1 ), T (u 2 ), . . . , T (u n )}.
(iii) If T : U → V is a linear transformation, then T is one-to-one if and only
if N (T ) = {0}. In fact, if T is one-to-one and u ∈ N (T ), then T (u) = 0 =
T (0) implies that u = 0 and hence N (T ) = {0}. Conversely, assume that
N (T ) = {0} and for some u 1 , u 2 ∈ U, T (u 1 ) = T (u 2 ). Then T (u 1 − u 2 ) =
T (u 1 ) − T (u 2 ) = 0 and hence u 1 − u 2 ∈ N (T ) = {0} and therefore u 1 = u 2 .
This shows that T is one-to-one.
Definition 3.8 Let T : U → V be a linear transformation. If N (T ) and R(T ) are
finite dimensional, then dimension of N (T ) is called nullity of T , denoted as n(T )
while the dimension of R(T ) (or the range of T ) is called the rank of T denoted as
r (T ).
Theorem 3.9 (Rank-Nullity Theorem or Sylvester Law) Let U, V be vector spaces
over a field F and T : U → V be a linear transformation. If U is finite dimensional
then n(T ) + r (T ) = dimU .
Proof Let dimU = n and dim N (T ) = k. If {u 1 , u 2 , . . . , u k } is a basis of N (T ),
then being a linearly independent subset of U , it can be extended to a basis of U , say
{u 1 , u 2 , . . . , u k , u k+1 , . . . , u n }. Now our claim is that the set

B = {T (u k+1 ), T (u k+2 ), . . . , T (u n )}

is a basis of R(T ). First we show that B spans R(T ). Since {u 1 , u 2 , . . . , u k , u k+1 , . . . ,


u n } spans U and T (u i ) = 0 for each i, 1 ≤ i ≤ k, by Remark 3.7(ii)

R(T ) = span{T (u 1 ), T (u 2 ), . . . , T (u k ), T (u k+1 ), . . . , T (u n )}
= span{T (u k+1 ), . . . , T (u n )}
= span B.

In order to show that B is linearly independent, suppose there exist scalars αk+1 , αk+2 ,
. . . , αn ∈ F such that Σ_{i=k+1}^{n} αi T (u i ) = 0. This implies that T ( Σ_{i=k+1}^{n} αi u i ) = 0, and
hence Σ_{i=k+1}^{n} αi u i ∈ N (T ). But since {u 1 , u 2 , . . . , u k } spans N (T ), there exist β1 , β2 ,
. . . , βk ∈ F such that Σ_{i=k+1}^{n} αi u i = Σ_{i=1}^{k} βi u i , which yields that Σ_{i=k+1}^{n} αi u i + Σ_{i=1}^{k} (−βi )u i
= 0. But since {u 1 , u 2 , . . . , u k , u k+1 , . . . , u n } is a basis of U , we find that each
αi = 0 (and each βi = 0). Hence the set B is linearly independent and forms a basis of
R(T ). This also shows that the elements in B are distinct and r (T ) = n − k, that is,
dimU = r (T ) + n(T ).
= 0. But since {u 1 , u 2 , . . . , u k , u k+1 , . . . , u n } is a basis of U , we find that each
of αi = 0 and hence the set B is linearly independent and forms a basis of
R(T ). This also shows that elements in B are distinct and r (T ) = n − k, that is
dimU = r (T ) + n(T ).

Theorem 3.10 Let U, V be vector spaces over a field F and T : U → V be a linear


transformation. If U  is a finite dimensional subspace of U such that U  ∩ N (T ) =
{0}, then the following hold:

(i) If {u 1 , u 2 , . . . , u n } is a basis of U  , then {T (u 1 ), T (u 2 ), . . . , T (u n )} is a basis


of T (U  ),
(ii) dimU  = dimT (U  ),
(iii) If T is one-to-one and U is finite dimensional, then r (T ) = dimU .

Proof (i) By Remark 3.7 (ii), if {u 1 , u 2 , . . . , u n } spans U ′ , then {T (u 1 ), T (u 2 ), . . . ,
T (u n )} spans T (U ′ ). In order to show that {T (u 1 ), T (u 2 ), . . . , T (u n )} is a basis of
T (U ′ ), suppose there exist scalars α1 , α2 , . . . , αn ∈ F such that Σ_{i=1}^{n} αi T (u i ) = 0. This
implies that T ( Σ_{i=1}^{n} αi u i ) = 0, so that Σ_{i=1}^{n} αi u i ∈ U ′ ∩ N (T ) = {0}, and hence
Σ_{i=1}^{n} αi u i = 0. Given that the set {u 1 , u 2 , . . . , u n }, being a basis of U ′ , is linearly
independent, we find that αi = 0 for each i = 1, 2, . . . , n. This shows that
{T (u 1 ), T (u 2 ), . . . , T (u n )} is a basis of T (U ′ ) if {u 1 , u 2 , . . . , u n } is a basis of U ′ .
(ii) In view of the above we find that dimT (U ′ ) = dimU ′ .
(iii) If T is one-to-one, then N (T ) = {0}. But if N (T ) = {0}, then U ∩ N (T ) =
{0}, and by (ii) we find that dimU = dimT (U ). This implies that r (T ) = dimU .

Theorem 3.11 Let U and V be any two finite dimensional vector spaces over the
same field F. If f : U → V is an isomorphism and B = {u 1 , u 2 , . . . , u n } is a basis
of U , then B ′ = { f (u 1 ), f (u 2 ), . . . , f (u n )} is a basis of V .
of U , then B = { f (u 1 ), f (u 2 ), . . . , f (u n )} is a basis of V .

Proof We shall show that B ′ is linearly independent. Suppose there exist scalars
α1 , α2 , . . . , αn such that Σ_{i=1}^{n} αi f (u i ) = 0. This implies that f ( Σ_{i=1}^{n} αi u i ) = 0.
But since f is a one-to-one map, we arrive at Σ_{i=1}^{n} αi u i = 0. Since B is a basis of U ,
we find that αi = 0 for each i = 1, 2, . . . , n. This shows that B ′ is linearly independent.
Now we show that B ′ spans V . Let v ∈ V . Since f is onto, for v ∈ V there exists
u ∈ U such that f (u) = v. Every vector u ∈ U can be written as u = Σ_{i=1}^{n} βi u i , where
βi ∈ F. This shows that f (u) = f ( Σ_{i=1}^{n} βi u i ) = Σ_{i=1}^{n} βi f (u i ), i.e., v = Σ_{i=1}^{n} βi f (u i ).
Therefore, B ′ spans V , and hence is a basis of V .

Remark 3.12 The above theorem holds even if U and V are infinite dimensional
vector spaces over the same field F. Accordingly, the theorem can be stated as: Let
U and V be any two infinite dimensional vector spaces over the same field F. If
f : U → V is an isomorphism and B = {u 1 , u 2 , . . . , u n , . . .} is a basis of U , then
B ′ = { f (u 1 ), f (u 2 ), . . . , f (u n ), . . .} is a basis of V . The proof of this fact follows the
same pattern as above.

Theorem 3.13 Let U, V be vector spaces over a field F and {u 1 , u 2 , . . . , u n } be a


basis of U . Let {v1 , v2 , . . . , vn } be any set of vectors (not necessarily distinct) in V .
Then there exists a unique linear transformation T : U → V such that T (u i ) = vi ,
for each i, 1 ≤ i ≤ n.

Proof Let u be an arbitrary element in U . Then since {u 1 , u 2 , . . . , u n } is a basis of
U , there exist α1 , α2 , . . . , αn ∈ F such that u = Σ_{i=1}^{n} αi u i . Define a map T : U → V
such that T (u) = Σ_{i=1}^{n} αi vi .
First we show that T is linear. Suppose that x, y ∈ U and α, β ∈ F. Then x =
Σ_{i=1}^{n} αi u i and y = Σ_{i=1}^{n} βi u i for some αi , βi ∈ F, and

αx + βy = Σ_{i=1}^{n} (ααi )u i + Σ_{i=1}^{n} (ββi )u i = Σ_{i=1}^{n} (ααi + ββi )u i .

This implies that

T (αx + βy) = Σ_{i=1}^{n} (ααi + ββi )vi = α ( Σ_{i=1}^{n} αi vi ) + β ( Σ_{i=1}^{n} βi vi ) = αT (x) + βT (y).

Clearly T (u i ) = vi , for each i, 1 ≤ i ≤ n. To prove the uniqueness, suppose there
exists a linear transformation T ′ : U → V such that T ′ (u i ) = vi , for each i, 1 ≤ i ≤
n. Then any x ∈ U can be written as x = Σ_{i=1}^{n} αi u i and hence

T ′ (x) = Σ_{i=1}^{n} αi T ′ (u i ) = Σ_{i=1}^{n} αi vi = T (x),

and therefore T = T ′ .

The following corollary shows that if any two linear transformations agree on the
basis of a vector space, then they will be the same.

Corollary 3.14 Let U, V be vector spaces over a field F and {u 1 , u 2 , . . . , u n } be


a basis of U . If T, T ′ : U → V are linear transformations and T (u i ) = T ′ (u i ) for
each i, 1 ≤ i ≤ n, then T = T ′ .

Remark 3.15 (i) Theorem 3.13 can be restated as: Let U, V be vector spaces over
a field F and {u 1 , u 2 , . . . , u n } be a basis of U . A map f : {u 1 , u 2 , . . . , u n } −→
V can be uniquely extended to a linear map T : U −→ V , such that T (u i ) =
f (u i ) for each i, 1 ≤ i ≤ n.
(ii) Any map f from a basis of U to V will determine a unique linear map T :
U −→ V , which is called extension of f by linearity.
(iii) Thus maps from different bases of U to V or different maps from the same
basis of U to V will give different linear maps from U −→ V . Thus Theorem
3.13 gives us a method for determining linear maps from a finite dimensional
vector space U to a vector space V .

Theorem 3.16 Two finite dimensional vector spaces U and V over a field F are
isomorphic if and only if they are of the same dimension.

Proof Let U ≅ V . This implies that there exists a bijective linear map T : U −→ V .
Suppose that dimU = n. We have to prove that dimV = n. Let {u 1 , u 2 , . . . , u n } be
a basis of U . We claim that the set B = {T (u 1 ), T (u 2 ), . . . , T (u n )} is a basis of V .
First we prove that B spans V . For this let v ∈ V . Since T is onto, there exits u ∈ U ,
such that v = T (u). As u ∈ U , this shows that there exist scalars α1 , α2 , . . . , αn ∈
F such that u = α1 u 1 + α2 u 2 + · · · + αn u n . As a result, we get v = T ( Σ_{i=1}^{n} αi u i ) =
Σ_{i=1}^{n} αi T (u i ) and hence B spans V . Next, to prove that B is a linearly independent set,
let Σ_{i=1}^{n} βi T (u i ) = 0 for some scalars β1 , β2 , . . . , βn ∈ F. This gives us T ( Σ_{i=1}^{n} βi u i ) =
0, i.e., Σ_{i=1}^{n} βi u i ∈ K er T . Since T is injective, we conclude that Σ_{i=1}^{n} βi u i = 0. This
implies that βi = 0 for all i, 1 ≤ i ≤ n. It proves that B is a linearly independent set
and thus B also contains n elements. Thus B is a basis of V and dimV = n.
Conversely, suppose that dimU = dimV = n. Then we have to show that U ≅ V .
Let {u 1 , u 2 , . . . , u n } and {v1 , v2 , . . . , vn } be bases of U and V , respectively. Define
a map f : {u 1 , u 2 , . . . , u n } −→ V such that f (u i ) = vi for each i, 1 ≤ i ≤ n. By
Remark 3.15 (i), this map f can be uniquely extended to a linear map T : U −→ V
such that T (u i ) = f (u i ) = vi for each i, 1 ≤ i ≤ n. We show that the linear map T is
bijective. Let x, y ∈ U . There exist scalars α1 , α2 , . . . , αn , β1 , β2 , . . . , βn ∈ F such
that x = Σ_{i=1}^{n} αi u i and y = Σ_{i=1}^{n} βi u i . Then T (x) = T (y) implies that T ( Σ_{i=1}^{n} αi u i ) =
T ( Σ_{i=1}^{n} βi u i ). Now we obtain that Σ_{i=1}^{n} αi T (u i ) = Σ_{i=1}^{n} βi T (u i ), i.e., Σ_{i=1}^{n} αi vi = Σ_{i=1}^{n} βi vi .
This shows that Σ_{i=1}^{n} (αi − βi )vi = 0. Since {v1 , v2 , . . . , vn } is a basis of V , we conclude
that αi = βi for all i, 1 ≤ i ≤ n. This implies that Σ_{i=1}^{n} αi u i = Σ_{i=1}^{n} βi u i , i.e., x = y.
These arguments prove that T is one-to-one. To prove the ontoness of T , let v ∈ V ;
then there exist scalars γ1 , γ2 , . . . , γn ∈ F such that v = Σ_{i=1}^{n} γi vi . This shows that
v = Σ_{i=1}^{n} γi T (u i ). Since T is linear, we have v = T ( Σ_{i=1}^{n} γi u i ) = T (u), where u =
Σ_{i=1}^{n} γi u i ∈ U . Thus there exists u ∈ U such that T (u) = v, so T is onto and therefore
T is an isomorphism, i.e., U ≅ V .

Exercises

1. Find out Range, Kernel, Rank and Nullity of all the linear transformations given
in the Problems 1–6, Problem 8 and 10 of the preceding section.
2. Let T : V1 −→ V2 be a linear map between finite dimensional vector spaces.
Prove that T is an isomorphism if and only if n(T ) = 0 and r (T )= dimV2 .
3. If T : U −→ V is a linear map, where U is finite dimensional, prove that
(a) n(T ) ≤ dimU ,
(b) r (T ) ≤ min(dimU , dimV ).
4. Let Z be subspace of a finite-dimensional vector space U , and V a finite-
dimensional vector space. Then prove that Z will be the kernel of a linear map
T : U −→ V if and only if dim Z ≥ dimU − dimV .
5. Let T : R4 −→ R3 be a linear map defined by T (e1 ) = (1, 1, 1), T (e2 ) =
(1, −1, 1), T (e3 ) = (1, 0, 0), T (e4 ) = (1, 0, 1). Then verify Rank-Nullity
Theorem, where {e1 , e2 , e3 , e4 } is the standard basis of R4 .
6. Let T be a nonzero linear transformation from R5 to R2 such that T is not onto.
Find r (T ) and n(T ).

7. Let T : P3 (R) −→ P2 (R) be a map defined by T (α0 + α1 x + α2 x 2 + α3 x 3 ) =


α1 + 2α2 x + 3α3 x 2 , where P3 (R) is the vector space of all real polynomials of
degree less than or equal to 3 over field of real numbers.
(a) Prove that T is a linear transformation.
(b) Find N(T) and R(T ).
(c) Verify that r (T ) + n(T ) = 4.
8. Let P2 (x, y) be the vector space of all real polynomials of degree less than or
equal to 2 in x and y over the field of real numbers. Let L : P2 (x, y) −→ P2 (x, y)
be defined by L(P(x, y)) = ∂(P(x, y))/∂ x + ∂(P(x, y))/∂ y, P(x, y) ∈ P2 (x, y).
Find a basis for kernel of L.
9. Let Pn [x] be the space of all real polynomials of degree at most n over the field of
real numbers. Consider the derivative operator d/d x. Find the dimensions of the kernel
and image of d/d x.
10. Consider Pn [x] over R. Define T (P(x)) = x P ′ (x) − P(x), P(x) ∈ Pn [x].
(a) Show that T is a linear transformation on Pn [x].
(b) Find N (T ) and R(T ).
11. Let V be a finite dimensional vector space and T be a linear transformation on
V . Show that there exists an integer k ≥ 0 such that V = R(T k ) ⊕ N (T k ).
12. Let T be a linear transformation on a finite dimensional vector space V . Show
that dim(I mT 2 ) = dim(I mT ) if and only if V = I mT ⊕ K er T . Specifically
if T 2 = T then V = I mT ⊕ K er T . Is the converse true?
13. Let T : R5 → R2 be a nonzero linear transformation such that T is not onto.
Find r (T ) and n(T ).
14. Let V1 , V2 , . . . , Vn ; n ≥ 2 be any finite dimensional vector spaces over a field
F. Let Ti : Vi → Vi+1 ; (1 ≤ i ≤ n − 1) be linear transformations such that
(a) K er T1 = {0},
(b) K er Ti+1 = R(Ti ), for 1 ≤ i ≤ n − 2,
(c) R(Tn−1 ) = Vn ,

then show that Σ_{i=1}^{n} (−1)i dimVi = 0.

3.2 Basic Isomorphism Theorems

In this section we prove some isomorphism theorems, which have vast applications
in linear algebra.

Theorem 3.17 (Fundamental Theorem of Vector Space Homomorphisms or First
Isomorphism Theorem) Let T : U −→ V be a linear transformation. Then the quotient
space of U with regard to its subspace K , where K = K er T , is isomorphic to
the homomorphic image of U under T , i.e., U/K ≅ T (U ).

Proof Define a map f : U/K −→ T (U ) such that f (u + K ) = T (u). The map f is
well defined because if we take u 1 + K = u 2 + K , then it implies that u 1 − u 2 ∈ K ,
i.e., T (u 1 − u 2 ) = 0 and hence T (u 1 ) = T (u 2 ), i.e., f (u 1 + K ) = f (u 2 + K ).
Also, we have f (α(u 1 + K ) + β(u 2 + K )) = f ((αu 1 + βu 2 ) + K ) = T (αu 1 +
βu 2 ) = αT (u 1 ) + βT (u 2 ) = α f (u 1 + K ) + β f (u 2 + K ) for every u 1 + K , u 2 +
K ∈ U/K , α, β ∈ F. This shows that f is a linear transformation. Let f (u 1 + K ) =
f (u 2 + K ). It shows that T (u 1 ) = T (u 2 ), i.e., T (u 1 − u 2 ) = 0. As a result, (u 1 −
u 2 ) ∈ K = K er T , i.e., u 1 + K = u 2 + K and therefore f is one-to-one. Obviously
f is onto. Hence f is an isomorphism and we have shown that U/K ≅ T (U ).
Corollary 3.18 Let T : U −→ V be an onto linear transformation. Then the quotient
space of U with regard to its subspace K , where K = K er T , is isomorphic to
V , i.e., U/K ≅ V .
Theorem 3.19 (Second Isomorphism Theorem) Let V1 and V2 be subspaces of the
vector space V . Then (V1 + V2 )/V2 ≅ V1 /(V1 ∩ V2 ).

Proof Clearly V2 is a subspace of V1 + V2 and V1 ∩ V2 is a subspace of V1 . Therefore
(V1 + V2 )/V2 and V1 /(V1 ∩ V2 ) are quotient spaces. Define a map f : (V1 + V2 )/V2 −→ V1 /(V1 ∩ V2 )
such that f ((v1 + v2 ) + V2 ) = v1 + (V1 ∩ V2 ) for every (v1 + v2 ) + V2 ∈ (V1 + V2 )/V2 .
The map f is well defined because if we take (v1 + v2 ) + V2 = (v1 ′ + v2 ′ ) + V2 , then
it implies that (v1 − v1 ′ ) + (v2 − v2 ′ ) ∈ V2 , i.e., (v1 − v1 ′ ) ∈ V2 . As a result, we obtain
that (v1 − v1 ′ ) ∈ (V1 ∩ V2 ) and thus v1 + (V1 ∩ V2 ) = v1 ′ + (V1 ∩ V2 ), i.e., f ((v1 +
v2 ) + V2 ) = f ((v1 ′ + v2 ′ ) + V2 ). Now for every α, β ∈ F, we have f (α((v1 + v2 ) +
V2 ) + β((v1 ′ + v2 ′ ) + V2 )) = f ((αv1 + βv1 ′ ) + (αv2 + βv2 ′ ) + V2 ) = (αv1 + βv1 ′ ) +
(V1 ∩ V2 ) = (αv1 + (V1 ∩ V2 )) + (βv1 ′ + (V1 ∩ V2 )) = α(v1 + (V1 ∩ V2 )) + β(v1 ′
+ (V1 ∩ V2 )) = α f ((v1 + v2 ) + V2 ) + β f ((v1 ′ + v2 ′ ) + V2 ). The previous arguments
show that f is a linear transformation. To prove that f is one-to-one, let f ((v1 +
v2 ) + V2 ) = f ((v1 ′ + v2 ′ ) + V2 ). This implies that v1 + (V1 ∩ V2 ) = v1 ′ + (V1 ∩ V2 ),
i.e., (v1 − v1 ′ ) ∈ (V1 ∩ V2 ). This shows that (v1 − v1 ′ ) + V2 = (−v2 + v2 ′ ) + V2 , i.e.,
(v1 + v2 ) − (v1 ′ + v2 ′ ) ∈ V2 . This implies that (v1 + v2 ) + V2 = (v1 ′ + v2 ′ ) + V2 . For
ontoness of f , let v1 ′ + (V1 ∩ V2 ) ∈ V1 /(V1 ∩ V2 ). Clearly f ((v1 ′ + 0) + V2 ) = v1 ′ +
(V1 ∩ V2 ), and hence f is onto. This shows that f is an isomorphism, i.e., (V1 + V2 )/V2 ≅
V1 /(V1 ∩ V2 ).

Theorem 3.20 (Third Isomorphism Theorem) Let V1 and V2 be subspaces of the
vector space V such that V2 ⊆ V1 . Then (V /V2 )/(V1 /V2 ) ≅ V /V1 .

Proof It is easy to observe that V /V1 , V /V2 , V1 /V2 are quotient spaces and V1 /V2 is
a subspace of V /V2 . As a result, (V /V2 )/(V1 /V2 ) is a quotient space. Define a map f : (V /V2 )/(V1 /V2 ) −→
V /V1 such that f ((v + V2 ) + V1 /V2 ) = v + V1 for every (v + V2 ) + V1 /V2 ∈ (V /V2 )/(V1 /V2 ).
We prove that f is well defined. For this, let (v + V2 ) + V1 /V2 = (v′ + V2 ) +
V1 /V2 . It implies that (v + V2 ) − (v′ + V2 ) ∈ V1 /V2 , i.e., (v − v′ ) + V2 ∈ V1 /V2 .
Now we conclude that (v − v′ ) ∈ V1 , i.e., v + V1 = v′ + V1 , showing that f ((v +
V2 ) + V1 /V2 ) = f ((v′ + V2 ) + V1 /V2 ) and hence f is well defined. Also we have
f (α((v + V2 ) + V1 /V2 ) + β((v′ + V2 ) + V1 /V2 )) = f ((α(v + V2 ) + V1 /V2 ) +
(β(v′ + V2 ) + V1 /V2 )) = f (((αv + βv′ ) + V2 ) + V1 /V2 ) = (αv + βv′ ) + V1 =
(αv + V1 ) + (βv′ + V1 ) = α(v + V1 ) + β(v′ + V1 ) = α f ((v + V2 ) + V1 /V2 ) +
β f ((v′ + V2 ) + V1 /V2 ) for every α, β ∈ F, v, v′ ∈ V . The previous arguments
show that f is a linear transformation. To prove that f is one-to-one, let f ((v +
V2 ) + V1 /V2 ) = f ((v′ + V2 ) + V1 /V2 ). This implies that v + V1 = v′ + V1 , i.e.,
(v − v′ ) ∈ V1 . Now we have (v − v′ ) + V2 ∈ V1 /V2 , i.e., (v + V2 ) − (v′ + V2 ) ∈
V1 /V2 . This shows that (v + V2 ) + V1 /V2 = (v′ + V2 ) + V1 /V2 , since v + V2 , v′ +
V2 are members of the vector space V /V2 and V1 /V2 is a subspace of V /V2 .
Thus f is one-to-one. To show the ontoness of f , let v + V1 ∈ V /V1 . Obviously
f ((v + V2 ) + V1 /V2 ) = v + V1 , which shows that f is onto. Finally we conclude
that f is an isomorphism and (V /V2 )/(V1 /V2 ) ≅ V /V1 .

Example 3.21 Let U and W be vector spaces over a field F and let V = U × W = {(u, w) | u ∈
U, w ∈ W }. Show that V forms a vector space with regard to component-wise operations.
Using the first isomorphism theorem, prove that V /({0} × W ) ≅ U and V /(U × {0}) ≅ W .
Further, if both the vector spaces U and W are finite dimensional, then also prove
that V is finite dimensional and dimV = dimU + dimW .
Here V = U × W = {(u, w) | u ∈ U, w ∈ W }. It can be easily verified that V
forms a vector space with regard to component-wise operations. Define a map
T : V −→ U such that T (u, w) = u for every (u, w) ∈ V . It is easy to observe that
T (α(u 1 , w1 ) + β(u 2 , w2 )) = T (αu 1 + βu 2 , αw1 + βw2 ) = αu 1 + βu 2 = αT (u 1 ,
w1 ) + βT (u 2 , w2 ). This shows that T is a linear transformation, which is obviously
onto. Here we also have K er T = {(0, w) | w ∈ W } = {0} × W . Now using the
first isomorphism theorem, we get V /({0} × W ) ≅ U . Similarly, we can also show that
V /(U × {0}) ≅ W . If both U and W are finite dimensional vector spaces, then let dimU = m
and dimW = n. Next suppose that B1 = {u 1 , u 2 , . . . , u m } and B2 = {v1 , v2 , . . . , vn }
are bases of U and W , respectively. Then it can be easily verified that the set
(B1 × {0}) ∪ ({0} × B2 ) is a basis of the vector space V . This proves that V is finite
dimensional. Since we have V /({0} × W ) ≅ U , and U is a finite dimensional vector space,
dim(V /({0} × W )) = dimU . Now using the fact that V is a finite dimensional vector space,
we arrive at dimV − dim({0} × W ) = dimU . But ({0} × W ) ≅ W , so we also get
dim({0} × W ) = dimW . Finally, we conclude that dimV − dimW = dimU , i.e.,
dimV = dimU + dimW .

Exercises

1. Using the first isomorphism theorem, prove the second and third isomorphism theorems.
2. Let Vn = { p(x) ∈ Q[x] | deg p(x) < n} and Vn−1 = { p(x) ∈ Q[x] | deg p(x) < n − 1}, where n > 1. Define T : Vn −→ Vn−1 by T ( p(x)) = (d/dx) p(x). Show that T is a linear transformation and, using the first isomorphism theorem, also prove that Vn /Q ≅ Vn−1 .
3. Let V = V1 ⊕ V2 be the direct sum of its subspaces V1 and V2 . Show that the mappings p1 : V −→ V1 and p2 : V −→ V2 defined by p1 (v) = v1 , p2 (v) = v2 , where v = v1 + v2 , v1 ∈ V1 , v2 ∈ V2 , are linear transformations from V to V1 and V2 , respectively. With the help of the first isomorphism theorem prove that V /V1 ≅ V2 and V /V2 ≅ V1 .
4. Using first isomorphism theorem, prove the Rank-Nullity Theorem.
5. If V1 and V2 are finite dimensional subspaces of a vector space V , then using the second isomorphism theorem prove that V1 + V2 is finite dimensional and dim (V1 + V2 ) = dim V1 + dim V2 − dim (V1 ∩ V2 ).
6. If V1 , V2 , . . . , Vn are vector spaces over the field F, where n is any integer greater than 1, let V = V1 × V2 × · · · × Vn = {(v1 , v2 , . . . , vn ) | v1 ∈ V1 , v2 ∈ V2 , . . . , vn ∈ Vn }. Show that V forms a vector space over F with regard to component-wise operations. Using the first isomorphism theorem, prove that V /(V1 × V2 × · · · × Vi−1 × {0} × Vi+1 × · · · × Vn ) ≅ Vi . Further, if all the vector spaces V1 , V2 , . . . , Vn are finite dimensional, then also prove that V is finite dimensional and dimV = dimV1 + dimV2 + · · · + dimVn .
7. Prove that M2 (R) ∼ = R4 . Give two different isomorphisms of M2 (R) onto R4 .

3.3 Algebra of Linear Transformations

In the set of linear transformations one can combine any two linear transformations in various ways in order to obtain a new linear transformation. The study of this set is important because it carries a natural vector space structure. It is even more important when we consider the set of linear transformations from a vector space into itself, because in that case it is also possible to define the composition of two linear mappings.
Let T1 , T2 : U → V be linear transformations from a vector space U to a vector
space V over the same field F. The sum T1 + T2 and the scalar product kT1 , where
k ∈ F, are defined to be the following mappings from U into V .

(T1 + T2 )(u) = T1 (u) + T2 (u) and (kT1 )(u) = k T1 (u).

It can be easily seen that T1 + T2 and kT1 are also linear. In fact, for any u 1 , u 2 ∈ U
and α, β ∈ F,

(T1 + T2 )(αu 1 + βu 2 ) = T1 (αu 1 + βu 2 ) + T2 (αu 1 + βu 2 )


= αT1 (u 1 ) + βT1 (u 2 ) + αT2 (u 1 ) + βT2 (u 2 )
= α{T1 (u 1 ) + T2 (u 1 )} + β{T1 (u 2 ) + T2 (u 2 )}
= α(T1 + T2 )(u 1 ) + β(T1 + T2 )(u 2 ).

Similarly, it can be seen that for any k ∈ F,



kT1 (αu 1 + βu 2 ) = k{T1 (αu 1 + βu 2 )}


= k{αT1 (u 1 ) + βT1 (u 2 )}
= (kα)T1 (u 1 ) + (kβ)T1 (u 2 )
= (αk)T1 (u 1 ) + (βk)T1 (u 2 )
= α(kT1 (u 1 )) + β(kT1 (u 2 ))
= α(kT1 )(u 1 ) + β(kT1 )(u 2 ).

This shows that T1 + T2 and kT1 are also linear.


Example 3.22 Let f, g : R3 → R2 such that f (a, b, c) = (2a, b − c) and g(a, b, c)
= (a, b + c) for all a, b, c ∈ R. Then it can be easily seen that f, g are linear trans-
formations. It is also easy to see that

( f + g)(a, b, c) = f (a, b, c) + g(a, b, c)


= (2a, b − c) + (a, b + c)
= (3a, 2b)

This is again a linear transformation.
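This computation is easy to check numerically. The following sketch (Python with NumPy; the test vector is chosen arbitrarily for illustration) represents f and g by their standard matrices and confirms that f + g sends (a, b, c) to (3a, 2b).

```python
import numpy as np

# Standard matrices of f(a,b,c) = (2a, b - c) and g(a,b,c) = (a, b + c)
F = np.array([[2, 0, 0],
              [0, 1, -1]])
G = np.array([[1, 0, 0],
              [0, 1, 1]])

v = np.array([1.0, 2.0, 3.0])      # an arbitrary vector (a, b, c)
lhs = (F + G) @ v                  # matrix of f + g applied to v
rhs = F @ v + G @ v                # f(v) + g(v)
print(lhs, rhs)                    # both equal (3a, 2b) = [3. 4.]
assert np.allclose(lhs, rhs)
```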


Remark 3.23 For any two vector spaces U and V over the same field F, the set
H om(U, V ) will denote the set of all linear transformations from U to V . There
should be no confusion concerning the underlying field of scalars. In fact, under the
above addition and scalar multiplication the set H om(U, V ) forms a vector space.
Theorem 3.24 Let U and V be vector spaces over the same field F. Then H om
(U, V ), the set of all linear transformations from U into V , is a vector space over F under the operations of addition and scalar multiplication defined as follows:
For any T1 , T2 ∈ H om(U, V ) and k ∈ F

(T1 + T2 )(u) = T1 (u) + T2 (u), (kT1 )(u) = kT1 (u) for any u ∈ U.

Proof We have already seen that for any T1 , T2 ∈ H om(U, V ), T1 + T2 ∈ H om


(U, V ). Define a map 0 : U → V such that 0(u) = 0, the zero vector in V . Then it
is easy to see that 0 is a linear transformation called the zero linear transformation
satisfying T1 + 0 = 0 + T1 = T1 . Also for any T1 ∈ H om(U, V ), define (−T1 )(u) =
−(T1 (u)). It can also be seen that (−T1 ) ∈ H om(U, V ) and T1 + (−T1 ) = (−T1 ) +
T1 = 0. Commutativity and associativity of addition in H om(U, V ) is obvious and
hence H om(U, V ) forms an abelian group under the addition. Further, for any T1 ∈
H om(U, V ) and k ∈ F, we have already seen that kT1 ∈ H om(U, V ). It is an easy
exercise to see that for any T1 , T2 ∈ H om(U, V ) and α, β ∈ F

α(T1 + T2 ) = αT1 + αT2 , (α + β)T1 = αT1 + βT1 , (αβ)T1 = α(βT1 ), 1T1 = T1 .

This shows that H om(U, V ) is a vector space over the field F.


Theorem 3.25 If U and V are vector spaces over a field F of dimension m and n,
respectively, then the dimension of the vector space H om(U, V ) over F is mn.


Proof Let B = {u 1 , u 2 , . . . , u m } and B′ = {v1 , v2 , . . . , vn } be bases of U and V , respectively. Define mn mappings f_ij : B −→ V , for each i = 1, 2, . . . , m and j = 1, 2, . . . , n, such that f_ij (u k ) = 0 for all k ≠ i and f_ij (u k ) = v j for k = i, where k = 1, 2, . . . , m. Now by Theorem 3.13, one can find mn linear transformations T_ij : U → V such that T_ij | B = f_ij , where i = 1, 2, . . . , m and j = 1, 2, . . . , n. Our result follows if we can show that the set {T_ij | i = 1, 2, . . . , m, j = 1, 2, . . . , n} is a basis of Hom(U, V ). Let T be an arbitrary member of Hom(U, V ). Since for each u i ∈ U, T (u i ) ∈ V and {v1 , v2 , . . . , vn } is a basis of V , corresponding to each T (u i ) we can find n scalars α_ij ∈ F, j = 1, 2, . . . , n, such that T (u i ) = Σ_{j=1}^{n} α_ij v j . Now it is clear that for every i and j, T_ij (u k ) = f_ij (u k ) = v j for k = i and T_ij (u k ) = f_ij (u k ) = 0 for all k ≠ i. Next we claim that T = Σ_{i=1}^{m} Σ_{j=1}^{n} α_ij T_ij . Since

(Σ_{i=1}^{m} Σ_{j=1}^{n} α_ij T_ij )(u k ) = Σ_{i=1}^{m} Σ_{j=1}^{n} (α_ij T_ij )(u k )
                                        = Σ_{i=1}^{m} (Σ_{j=1}^{n} α_ij T_ij (u k ))
                                        = Σ_{j=1}^{n} α_kj T_kj (u k )
                                        = Σ_{j=1}^{n} α_kj v j
                                        = T (u k ),

Σ_{i=1}^{m} Σ_{j=1}^{n} α_ij T_ij ∈ Hom(U, V ) and both T and Σ_{i=1}^{m} Σ_{j=1}^{n} α_ij T_ij agree on all basis elements of U . This shows that T = Σ_{i=1}^{m} Σ_{j=1}^{n} α_ij T_ij , i.e., T ∈ Hom(U, V ) is a linear combination of the T_ij 's. Now, in order to show that the set {T_ij | i = 1, 2, . . . , m, j = 1, 2, . . . , n} is linearly independent, suppose that there exist scalars β_ij ∈ F such that Σ_{i=1}^{m} Σ_{j=1}^{n} β_ij T_ij = 0, where 0 stands for the zero linear transformation from U to V . This implies that (Σ_{i=1}^{m} Σ_{j=1}^{n} β_ij T_ij )(u k ) = 0(u k ) for all k = 1, 2, . . . , m. This yields that Σ_{j=1}^{n} (Σ_{i=1}^{m} β_ij T_ij (u k )) = 0. But since T_ij (u k ) = v j for k = i and T_ij (u k ) = 0 for all k ≠ i, the latter expression reduces to Σ_{j=1}^{n} β_kj v j = 0 for all k = 1, 2, . . . , m. Since {v1 , v2 , . . . , vn } is a basis of V , we find that β_kj = 0 for all k and j. This yields that the set {T_ij | i = 1, 2, . . . , m, j = 1, 2, . . . , n} is linearly independent and thus forms a basis of Hom(U, V ). Finally, we get dim Hom(U, V ) = mn = dimU dimV .

Remark 3.26 Let U be a vector space over a field F such that dimU = m; then dim Hom(U, U ) = m^2. Since every field F is a vector space over F of dimension one, dim Hom(U, F) = m.

Example 3.27 Let U = R3 and V = R2 and let {e1 , e2 , e3 } and { f 1 , f 2 } be stan-


dard bases of U and V , respectively. This yields that dim H om(U, V ) = 6. In fact,
we find the set {Ti j | i = 1, 2, 3, j = 1, 2} as a basis of H om(U, V ). Any u =
(a, b, c) ∈ R3 can be written as u = ae1 + be2 + ce3 . Now T11 (e1 ) = f 1 , T11 (e2 ) =
(0, 0), T11 (e3 ) = (0, 0). This yields that T11 (u) = aT11 (e1 ) + bT11 (e2 ) + cT11 (e3 ) =
a f 1 = (a, 0). Similarly T12 (u) = a f 2 = (0, a), T21 (u) = b f 1 = (b, 0), T22 (u) =
b f 2 = (0, b), T31 (u) = c f 1 = (c, 0), T32 (u) = c f 2 = (0, c).
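For readers who like to experiment, the six maps Tij of this example can be realized concretely as 2 × 3 matrices (a Python/NumPy sketch; identifying Tij with the matrix having a single 1 in row j and column i is an assumption of the illustration, valid relative to the standard bases). Every linear map R3 → R2 is then a unique linear combination of these six matrices, in line with Theorem 3.25.

```python
import numpy as np

m, n = 3, 2                        # dim U = 3, dim V = 2
basis = {}                         # the six maps T_ij of Example 3.27 as matrices
for i in range(m):                 # T_ij sends e_i to f_j and the other e_k to 0
    for j in range(n):
        E = np.zeros((n, m))
        E[j, i] = 1.0
        basis[(i + 1, j + 1)] = E

A = np.array([[1.0, 4.0, -2.0],    # an arbitrary linear map T : R^3 -> R^2
              [0.0, 3.0,  5.0]])

# T = sum of alpha_ij * T_ij, where alpha_ij is the j-th coordinate of T(e_i)
recombined = sum(A[j - 1, i - 1] * basis[(i, j)] for (i, j) in basis)
assert np.allclose(recombined, A)
print(len(basis))                  # 6 = dim Hom(R^3, R^2) = 3 * 2
```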

Lemma 3.28 Let U, V, W, be vector spaces over a field F . Then the following hold:
(i) For any linear transformations T1 : U → V and T2 : V → W the composite
map T2 T1 : U → W is again a linear transformation.
(ii) For any linear transformations T1 : U → V, T2 : V → W and α ∈ F, α(T2 T1 )
= (αT2 )T1 = T2 (αT1 ).
(iii) For any linear transformations T2 , T3 : U → V and T1 : V → W, T1 (T2 +
T3 ) = T1 T2 + T1 T3 .
(iv) For any linear transformations T1 , T2 : V → W and T3 : U → V (T1 + T2 )T3
= T1 T3 + T2 T3 .

Proof (i) For any u 1 , u 2 ∈ U and α, β ∈ F

T2 T1 (αu 1 + βu 2 ) = T2 (T1 (αu 1 + βu 2 ))


= T2 (αT1 (u 1 ) + βT1 (u 2 ))
= αT2 (T1 (u 1 )) + βT2 (T1 (u 2 ))
= α(T2 T1 )(u 1 ) + β(T2 T1 )(u 2 )

This completes the proof.


(ii) Since T2 T1 : U → W , for any u ∈ U,

(α(T2 T1 ))(u) = α((T2 T1 )(u)).
((αT2 )T1 )(u) = (αT2 )(T1 (u)) = α(T2 (T1 (u))) = α((T2 T1 )(u)).
(T2 (αT1 ))(u) = T2 ((αT1 )(u)) = T2 (α(T1 (u))) = α((T2 T1 )(u)).

The above three expressions ensure the validity of (ii).


(iii) Since T1 (T2 + T3 ), T1 T2 + T1 T3 : U → W are linear transformations, for any u ∈ U, (T1 (T2 + T3 ))(u) = T1 ((T2 + T3 )(u)) = T1 (T2 (u) + T3 (u)) = T1 (T2 (u)) + T1 (T3 (u)) = (T1 T2 )(u) + (T1 T3 )(u) = (T1 T2 + T1 T3 )(u). This yields the required result.
(iv) Proof is similar as (iii).

Remark 3.29 (i) The set of all linear transformations of V into itself, i.e., H om
(V, V ) is usually denoted by A (V ).
(ii) A linear operator T on the vector space U is called an idempotent linear operator
on U if T 2 = T .
(iii) A linear operator T on the vector space U is called a nilpotent linear operator
on U if T k = 0 for some integer k ≥ 1.

Theorem 3.30 If V is a vector space over a field F, then A (V ) is an algebra with


identity over the field F, which is not commutative in general.

Proof (i) By Theorem 3.24, A (V ) is a vector space over F under the addition and
scalar multiplication operations on linear mappings.
(ii) A (V ) is a ring. In fact, A (V ) is an abelian group under the operation addition of
linear transformations. Also by Lemma 3.28 (i), A (V ) is closed with respect to the
operation of product (composition). Since the composition of functions is associative
in general, the product in A (V ) is associative. The distributivity of product on the
operation of addition of linear transformations follows by Lemma 3.28(iii) and (iv).
Hence A (V ) forms a ring under the operations of addition and composition of linear
transformations.
(iii) By Lemma 3.28 (ii), for any α ∈ F and T1 , T2 ∈ A (V )

α(T1 T2 ) = (αT1 )T2 = T1 (αT2 ).

This shows that A (V ) is an algebra over the field F.


(iv) We have I V T = T I V = T for all T ∈ A (V ), where I V is the identity linear map on V , so A (V ) has the multiplicative identity I V . As the composition of functions is not commutative in general, the product in A (V ) is not commutative. These facts prove that A (V ) is an algebra with identity I V over the field F, which is not commutative in general.
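The non-commutativity is easy to see numerically once linear operators are identified with matrices (a Python/NumPy sketch; the two operators on R2 are chosen arbitrarily for illustration): composition of operators corresponds to matrix multiplication, and in general the two orders of composition differ.

```python
import numpy as np

T1 = np.array([[0.0, 1.0],     # T1(x, y) = (y, x), reflection about the line y = x
               [1.0, 0.0]])
T2 = np.array([[1.0, 0.0],     # T2(x, y) = (x, 0), projection onto the x-axis
               [0.0, 0.0]])

print(T2 @ T1)                 # matrix of T2 T1: sends (x, y) to (y, 0)
print(T1 @ T2)                 # matrix of T1 T2: sends (x, y) to (0, x)
print(np.allclose(T2 @ T1, T1 @ T2))   # False: A(V) is not commutative
```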

Exercises

1. Show that the set {T1 , T2 , T3 } in the corresponding vector space is linearly inde-
pendent, where
(a) T1 , T2 , T3 ∈ A (R2 ) defined by T1 (x, y) = (x, 2y), T2 (x, y) = (y, x + y),
T3 (x, y) = (0, x),
(b) T1 , T2 , T3 ∈ H om(R3 , R) defined by T1 (x, y, z) = x + y + z, T2 (x, y, z) =
y + z, T3 (x, y, z) = x − z.
2. Find the condition under which dim Hom(V, U ) = dimV .
3. Suppose V = U ⊕ W , where U and W are subspaces of V . Let T1 and T2 be
linear operators on V defined by T1 (v) = u, T2 (v) = w, where v = u + w, u ∈
U, w ∈ W . Show that
(a) T12 = T1 and T22 = T2 , i.e., T1 and T2 are projections.
(b) T1 + T2 = I , the identity mapping.
(c) T1 T2 = 0 and T2 T1 = 0.

4. Let T1 and T2 be linear operators on V satisfying parts (a), (b), (c) of the above
Problem 3. Prove that V = T1 (V ) ⊕ T2 (V ).
5. Give an example to show that the set of all nonzero elements of A (V ) is not a
group under composition of linear operators.
6. Determine the group of units of the ring A (V ).
7. Let T : V −→ W be an isomorphism, where V and W are vector spaces over
the field F. Prove that the mapping f : A (V ) −→ A (W ) defined by f (S) =
T ST −1 is an isomorphism.
8. If T ∈ A (V ), prove that the set of all linear transformations S on V such that
T S = 0 is a subspace and a right ideal of A (V ).
9. If dimA (V ) > 1, then prove that A (V ) is not commutative.
10. Let T1 , T2 be two linear maps on V such that T1 T2 = T2 T1 . Then prove that
(a) (T1 + T2 )^2 = T1^2 + 2(T1 T2 ) + T2^2 ;
(b) (T1 + T2 )^n = nC0 T1^n + nC1 T1^{n−1} T2 + · · · + nCn T2^n .
11. Let V1 be a subspace of a vector space V . Then prove that the set of all linear
transformations from V to V that vanish on V1 is a subspace of A (V ).
12. Let T be an idempotent linear operator on any vector space U over F. Then
prove that U = R(T ) ⊕ N (T ) and for any v ∈ R(T ), T (v) = v.
13. Let V be an n-dimensional vector space, n ≥ 1, and T : V → V be a linear transformation such that T^n = 0 but T^{n−1}(v) ≠ 0 for some v ∈ V . Show that the set {v, T (v), T^2 (v), . . . , T^{n−1}(v)} forms a basis of V .
14. Let T : V → V be a nilpotent linear operator. Show that I + T is invertible. (Hint: A linear transformation T is said to be nilpotent if T^k = 0 for some integer k ≥ 1. If T^k = 0, then I − T + T^2 − T^3 + · · · + (−1)^{k−1} T^{k−1} is the inverse of I + T .)
15. Let B = {v1 , v2 , . . . , vn } be a basis of V . Suppose that for each 1 ≤ i, j ≤ n we define
    f_{i,j}(vk ) = vk ,        for k ≠ i,
    f_{i,j}(vk ) = vi + v j ,  for k = i.
Prove that the f_{i,j}'s are invertible and form a basis of A (V ).

3.4 Nonsingular, Singular and Invertible Linear


Transformations

Definition 3.31 A linear transformation T : V −→ U is called nonsingular if T (v) = 0, where v ∈ V , implies that v = 0, i.e., if K er T = {0}. If, on the other hand, there exists a nonzero vector v ∈ V such that T (v) = 0, i.e., K er T ≠ {0}, then T is called singular.

Remark 3.32 (i) From Remark 3.7(iii), T is nonsingular if and only if T is injective or one-to-one.
(ii) If V is a finite dimensional vector space, then T : V −→ U is nonsingular if and only if r (T ) = dimV ; this follows from Theorem 3.9.
Theorem 3.33 A linear transformation T : V −→ U is nonsingular if and only if
the image of any linearly independent set is linearly independent.
Proof Let T : V −→ U be nonsingular. Let S be any linearly independent subset
of V . Now we have two cases.

Case I: Let S be finite, i.e., S = {v1 , v2 , . . . , vn }. We have to prove that {T (v1 ), T (v2 ), . . . , T (vn )} is linearly independent. For this, let α1 T (v1 ) + α2 T (v2 ) + · · · + αn T (vn ) = 0, where α1 , α2 , . . . , αn ∈ F. This implies that T (Σ_{i=1}^{n} αi vi ) = 0, i.e., Σ_{i=1}^{n} αi vi ∈ K er T . As T is nonsingular, we have K er T = {0}. Now we conclude that Σ_{i=1}^{n} αi vi = 0, but as S is linearly independent, we get αi = 0 for all i, 1 ≤ i ≤ n. Thus the image of S is linearly independent.

Case II: Let S = {w1 , w2 , . . . , wm , . . .} be infinite. We have to show that the


image set {T (w1 ), T (w2 ), . . . , T (wm ), . . .} is linearly independent. For this choose
an arbitrary finite subset P of {T (w1 ), T (w2 ), . . . , T (wm ), . . .}, then clearly P =
{T (wi1 ), T (wi2 ), . . . , T (wim )} for some i 1 , i 2 , . . . , i m ∈ N. The set {wi1 , wi2 , . . . ,
wim }, being a subset of S, is a linearly independent set in V . Now using Case I,
it follows that P is a linearly independent subset of U . As a result, the image set
{T (w1 ), T (w2 ), . . . , T (wm ), . . .} is linearly independent.
Conversely, suppose that the image of every linearly independent set is linearly independent. We have to prove that T is nonsingular; equivalently, we have to prove that K er T = {0}. Let v ∈ K er T . We claim that v = 0, for otherwise, if v ≠ 0, then {v} will be a linearly independent set. By our hypothesis {T (v)} will be a linearly independent set, leading to a contradiction, because T (v) = 0.
Theorem 3.34 Let U , V , W be finite dimensional vector spaces over the same field
F. Let T1 : U −→ V and T2 : V −→ W be linear transformations. Then
(i) r (T2 T1 ) ≤ Min [r (T2 ), r (T1 )],
(ii) n(T2 T1 ) ≥ n(T1 ).
Proof (i) T2 T1 : U −→ W is a linear transformation. We have T1 (U ) ⊆ V ; this
implies that T2 (T1 (U )) ⊆ T2 (V ), i.e., T2 T1 (U ) ⊆ T2 (V ). Since T2 T1 (U ) and T2 (V )
are subspaces of W , T2 T1 (U ) is a subspace of T2 (V ) and hence dimT2 T1 (U ) ≤
dimT2 (V ), i.e., r (T2 T1 ) ≤ r (T2 ). Again T1 (U ) is a subspace of V . Hence the
restriction of T2 on the subspace T1 (U ) will be a linear transformation, i.e.,
T2 |T1 (U ) : T1 (U ) −→ W. Hence by Theorem 3.9, dimT2 T1 (U ) ≤ dimT1 (U ), i.e.,
r (T2 T1 ) ≤ r (T1 ). Combining the previous two inequalities, we conclude that r (T2 T1 ) ≤ Min [r (T2 ), r (T1 )].

(ii) Let u ∈ K er T1 . This implies that T1 (u) = 0, i.e., T2 (T1 (u)) = T2 (0). Thus
we get T2 T1 (u) = 0, i.e., u ∈ K er T2 T1 . Therefore, K er T1 ⊆ K er T2 T1 . Since both
K er T1 and K er T2 T1 are subspaces of U , K er T1 is a subspace of K er T2 T1 . Now we
get dim K er T1 ≤ dim K er T2 T1 , i.e., n(T1 ) ≤ n(T2 T1 ).
Theorem 3.35 Let V be a finite dimensional vector space and T, S : V −→ V be
linear transformations, where S is nonsingular. Then r (ST ) = r (T S) = r (T ).
Proof Clearly, ST : V −→ V is a linear transformation. I m(ST ) = (ST )(V ) =
S(T (V )). Therefore, r (ST ) = dim S(T (V )). Since T (V ) is a subspace of V , the
restriction of S on the subspace T (V ) is also a nonsingular linear transformation, i.e.,
S |T (V ) : T (V ) −→ V . Hence r (S |T (V ) ) = dimT (V ) follows from Remark 3.32(ii).
Now this implies that dim S(T (V )) = dimT (V ), and hence r (ST ) = r (T ). Again
since V is finite dimensional and S : V −→ V is injective, S is surjective also.
Hence S(V ) = V , i.e., T (S(V )) = T (V ). This relation shows that (T S)(V ) = T (V ),
i.e., I m(T S) = I m(T ). As a result, dim I m(T S) = dim I mT , i.e., r (T S) = r (T ).
Finally we have proved that r (ST ) = r (T S) = r (T ).
Definition 3.36 A linear transformation T : U −→ V is called invertible if T is
bijective as a map. In other words, if T is an isomorphism, then T is said to be
invertible.
Remark 3.37 (i) If a linear transformation T : U −→ V is invertible, then T is
a bijective map and hence the inverse of the map T , which is usually denoted by
T −1 , exists and which is a map from V to U . Now we prove that T −1 (αv1 + βv2 ) =
αT −1 (v1 ) + βT −1 (v2 ) for all α, β ∈ F, v1 , v2 ∈ V . Let us suppose that T −1 (αv1 +
βv2 ) = u ∈ U and T −1 (v1 ) = u 1 ∈ U, T −1 (v2 ) = u 2 ∈ U . This implies that T (u) =
αv1 + βv2 and T (u 1 ) = v1 , T (u 2 ) = v2 . Now we have T (u) = αT (u 1 ) + βT (u 2 ).
But as T is a linear transformation, we have T (u) = T (αu 1 + βu 2 ). Since T is
one-to-one, we get u = αu 1 + βu 2 , i.e., T −1 (αv1 + βv2 ) = αT −1 (v1 ) + βT −1 (v2 ).
Thus T −1 is also linear transformation, which is known as the inverse of linear
transformation T . We know that if T is a bijective map, then T −1 is also a bijective
map. This shows that if T is invertible, i.e., T is an isomorphism, then its inverse
T −1 is also an isomorphism and hence invertible also.
(ii) By the properties of invertible maps, we can say that the linear transformation
T : U −→ V is invertible if and only if there exists the linear transformation T −1 :
V −→ U such that T T −1 = I V and T −1 T = IU , where IU and I V are identity linear
transformations on vector spaces U and V , respectively. In particular, if T : U −→ U is a linear operator on U , then T is invertible if and only if there exists the linear operator T −1 on U , i.e., T −1 : U −→ U , such that T T −1 = IU = T −1 T.
(iii) Let U be a finite dimensional vector space and T : U −→ U be an invertible
linear operator on U . This shows that T is bijective, i.e., K er T = {0} or T is nonsin-
gular. Hence n(T ) = 0. Using Rank-Nullity theorem, we have r (T ) = dimU . This
implies that T is surjective. It is easy to observe that if U is a finite dimensional vector
space, then the linear map T : U −→ U is invertible if and only if T is nonsingular,
i.e., injective or alternatively T is surjective. Hence nonsingular and invertible linear
transformations on a vector space U are synonyms when U is finite dimensional.

(iv) If U is not finite dimensional, then a linear transformation on U may be


nonsingular, i.e., injective and yet not invertible as illustrated by the following
example: Consider the map T : F[x] −→ F[x], given by T (α0 + α1 x + α2 x 2 +
· · · + αn x n ) = α0 x + α1 x 2 + α2 x 3 + · · · + αn x n+1 . We know that F[x] is an infinite
dimensional vector space and it can be easily verified that T is linear. Now we prove
that T is nonsingular or injective. Let α0 + α1 x + α2 x 2 + · · · + αn x n ∈ Ker T . Now
we get T (α0 + α1 x + α2 x 2 + · · · + αn x n ) = 0 (the zero polynomial over F). This
implies that α0 x + α1 x 2 + α2 x 3 + · · · + αn x n+1 = 0. But {x, x 2 , x 3 , . . . , x n+1 } is a
linearly independent set in the vector space F[x], thus we get α0 = α1 = · · · = αn =
0. Hence Ker T = {0}. This shows that T is injective. But T is not surjective because
the polynomials of zero degree, i.e., scalars are not the image of any polynomial
under T . Hence T is not invertible.
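This example can also be checked on coefficient lists. In the sketch below (Python; encoding a polynomial α0 + α1 x + · · · + αn x^n as the list [α0, α1, ..., αn] is an assumption of the illustration), the map clearly has zero kernel, yet no polynomial with a nonzero constant term lies in its image.

```python
def T(coeffs):
    """Multiplication by x: a0 + a1 x + ... + an x^n  ->  a0 x + ... + an x^(n+1)."""
    return [0] + list(coeffs)

p = [3, 0, 5]                 # the polynomial 3 + 5x^2
print(T(p))                   # [0, 3, 0, 5], i.e. 3x + 5x^3

# Injective: T(q) is the zero polynomial exactly when q is, since T only
# shifts the coefficients one place to the right.
assert T([0, 0]) == [0, 0, 0]
assert any(c != 0 for c in T(p))

# Not surjective: every polynomial in the image has constant term 0, so the
# constant polynomial 1 (coefficient list [1]) is not T(q) for any q.
assert T(p)[0] == 0
```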

Theorem 3.38 Let U and V be finite dimensional vector spaces over the same field
F such that dim U = dim V . If T : U −→ V is a linear transformation, then the
following statements are equivalent:
(i) T is invertible.
(ii) T is nonsingular.
(iii) The range of T is V .
(iv) If {u 1 , u 2 , . . . , u n } is any basis of U , then {T (u 1 ), T (u 2 ), . . . , T (u n )} is a basis
of V .
(v) There is a basis {u 1 , u 2 , . . . , u n } for U such that {T (u 1 ), T (u 2 ), . . . , T (u n )} is
a basis of V .

Proof (i) =⇒ (ii) Since T is invertible, T is injective. Thus K er T = {0}. This


shows that T is nonsingular.
(ii) =⇒ (iii) T is nonsingular, it implies that K er T = {0}, i.e., n(T ) = 0. Using
Rank-Nullity theorem, we get r (T ) = dimU . But it is given that dimU = dimV ,
thus we get r (T ) = dimV , i.e., dimT (U ) = dimV . Since T (U ) is a subspace of V ,
we conclude that T (U ) = V , i.e., the range of T is V .
(iii) =⇒ (iv) It is given that T (U ) = V . This shows that dimT (U ) = dimV ,
i.e., r (T ) = dimU . Using again Rank-Nullity Theorem, we get n(T ) = 0, i.e.,
T is nonsingular. Let us suppose that {u 1 , u 2 , . . . , u n } is any basis of U , i.e.,
{u 1 , u 2 , . . . , u n } is a linearly independent subset of U . By Theorem 3.33, it follows
that {T (u 1 ), T (u 2 ), . . . , T (u n )} is a linearly independent set of V . Now we show
that the set {T (u 1 ), T (u 2 ), . . . , T (u n )} spans V . For this let v ∈ V . Thus there exists
u ∈ U such that T (u) = v. Since {u 1 , u 2 , . . . , u n } is a basis of U , there exist scalars
α1 , α2 , . . . , αn ∈ F such that α1 u 1 + α2 u 2 + · · · + αn u n = u. Now we can write
v = T (α1 u 1 + α2 u 2 + · · · + αn u n ) = α1 T (u 1 ) + α2 T (u 2 ) + · · · + αn T (u n ). These
facts prove that the set {T (u 1 ), T (u 2 ), . . . , T (u n )} is a basis of V .

(iv) =⇒ (v) Since U is a finite dimensional vector space, it has a finite basis. Let
{u 1 , u 2 , . . . , u n } be a basis of U . Now by hypothesis {T (u 1 ), T (u 2 ), . . . , T (u n )} is
a basis of V .

(v) =⇒ (i) Suppose that there is a basis {u 1 , u 2 , . . . , u n } for U such that


{T (u 1 ), T (u 2 ), . . . , T (u n )} is a basis of V . It is clear that Im (T ) is generated by
{T (u 1 ), T (u 2 ), . . . , T (u n )} and also V is generated by {T (u 1 ), T (u 2 ), . . . , T (u n )}.
Now we conclude that T (U ) = V , i.e., T is onto. This implies that r (T ) = dim V ,
i.e., r (T ) = dim U . Using Rank-Nullity theorem, we conclude that n(T ) = 0, i.e.,
T is one-to-one. This proves the fact that T is invertible.
Theorem 3.39 Let U , V and W be vector spaces over the same field F and T1 :
U −→ V , T2 : V −→ W be invertible linear transformations. Then the following
hold:
(i) T2 T1 is invertible and (T2 T1 )−1 = T1−1 T2−1 .
(ii) If α (≠ 0) ∈ F, then αT1 is invertible and (αT1 )−1 = α −1 T1−1 .
Proof (i) Since the composition of two linear transformations is again a linear transformation, T2 T1 : U −→ W is a linear transformation. Both the maps T1
and T2 are bijective, and hence T2 T1 is also a bijective map. As a result, T2 T1 is
invertible. Since T1 and T2 are invertible, it implies that there exist linear transfor-
mations T1−1 : V −→ U , T2−1 : W −→ V . Thus we have a linear transformation
T1−1 T2−1 : W −→ U . Now we claim that (T2 T1 )−1 = T1−1 T2−1 . To prove the claim,
we have to show that (T2 T1 )T1−1 T2−1 = IW , T1−1 T2−1 (T2 T1 ) = IU , where IU and IW
are identity linear transformations on vector spaces U and W , respectively. Consider
(T2 T1 )T1−1 T2−1 = T2 (T1 T1−1 )T2−1 = T2 I V T2−1 = IW . Similarly, it can be shown that
T1−1 T2−1 (T2 T1 ) = IU .

(ii) We know that αT1 : U −→ V is a linear transformation. We prove that


αT1 is one-to-one and onto. Let (αT1 )(u 1 ) = (αT1 )(u 2 ), where u 1 , u 2 ∈ U . This
implies that α(T1 (u 1 )) = α(T1 (u 2 )). Since α (≠ 0) ∈ F, multiplying both sides of
the previous relation by α −1 , we obtain that T1 (u 1 ) = T1 (u 2 ), which implies that
u 1 = u 2 because T1 is one-to-one. In this way the above arguments prove that αT1
is one-to-one. For ontoness of αT1 , let v ∈ V . Due to ontoness of T1 , there exists
u ∈ U , such that T1 (u) = v. It can be easily verified that (αT1 )(α −1 u) = v. This
shows that αT1 is onto. Thus αT1 is invertible. Since T1 is invertible, it implies that
there exists a linear transformation T1−1 : V −→ U . Thus we have a linear trans-
formation α −1 T1−1 : V −→ U . Now we claim that (αT1 )−1 = α −1 T1−1 . To prove
the claim, we have to show that (αT1 )(α −1 T1−1 ) = I V , (α −1 T1−1 )(αT1 ) = IU , where
IU and I V are identity linear transformations on vector spaces U and V , respec-
tively. Consider (αT1 )α −1 T1−1 = (αα −1 )(T1 T1−1 ) = I V . Similarly, it can be shown
that α −1 T1−1 (αT1 ) = IU .
Theorem 3.40 If T, T1 , T2 are linear operators on a vector space U such that T T1 =
T2 T = IU , where IU is the identity linear operator on U , then T is invertible and
T −1 = T1 = T2 .
Proof We prove that T is a bijective map. For one-to-oneness of T , let us suppose
that T (u 1 ) = T (u 2 ). This implies that T2 (T (u 1 )) = T2 (T (u 2 )), i.e., (T2 T )(u 1 ) =
(T2 T )(u 2 ). Using hypothesis we get IU (u 1 ) = IU (u 2 ), i.e., u 1 = u 2 . Hence T is

injective. For ontoness of T , let u ∈ U , clearly T1 (u) ∈ U and obviously we have


T (T1 (u)) = IU (u) = u. Hence T is onto. Thus we conclude that T is invertible. Now
let the inverse of T be T −1 , i.e., T T −1 = T −1 T = IU . But it is given that T T1 = IU .
This implies that T −1 (T T1 ) = T −1 IU , i.e., T −1 = T1 . Similarly, it can be proved
that T −1 = T2 .

Theorem 3.41 If U is a finite dimensional vector space, then a linear operator T


on U is invertible if and only if there exists a linear operator T1 on U such that (i)
T T1 = IU or alternatively (ii) T1 T = IU .

Proof Suppose that T is invertible, i.e., T −1 exists. Obviously, T T −1 = T −1 T = IU .


This implies that if we take T1 = T −1 , then we get our required result.
Conversely, suppose that (i) holds. We show that T is onto. For this, let y ∈
U so that T1 (y) ∈ U . Suppose that T1 (y) = x ∈ U . Consider T (x) = T (T1 (y)) =
(T T1 )(y) = IU (y) = y. This shows that T is onto. Since T is defined on a finite
dimensional vector space, T is one-to-one also. Thus T is invertible.
If (ii) holds, then we have T1 T = IU . We show that T is one-to-one. For this let
T (x) = T (y). This implies that T1 (T (x)) = T1 (T (y)), i.e., (T1 T )(x) = (T1 T )(y).
This shows that IU (x) = IU (y), i.e., x = y. Thus T is injective. But U is finite
dimensional therefore T will be onto also. As a result, T becomes invertible.

Exercises

1. Determine all nonsingular linear transformations T : R4 −→ R3 .


2. Let C2 be a vector space over C, the field of complex numbers. Let the linear transformation T : C2 −→ C2 be defined by T (z 1 , z 2 ) = (αz 1 + βz 2 , γ z 1 + δz 2 ), where z 1 , z 2 ∈ C and α, β, γ , δ are fixed scalars. Prove that T is nonsingular if and only if αδ − βγ ≠ 0.
3. If α is a nonzero scalar, then prove that the linear transformation T : V −→ U
is singular if and only if αT is singular.
4. Suppose that V is a finite dimensional vector space. Let T be a linear transfor-
mation on V such that r (T^2 ) = r (T ). Show that Ker T ∩ Im T = {0}.
5. Let S : U −→ V and T : V −→ U be linear maps. Show that if S and T are
nonsingular, then T S is also nonsingular. Give an example such that T S is
nonsingular but T is singular.
6. Let S, T : U −→ V be linear transformations, where V is a finite dimensional
vector space. Then prove that
(a) |r (S) − r (T )| ≤ r (S + T ) ≤ r (S) + r (T );
(b) r (αT ) = r (T ) for each nonzero α ∈ F.
7. Draw the image of unit square in R2 under the linear transformation T : R2 −→
R2 defined by T (x, y) = (x − y, 2x − 3y).
8. Let T ∈ A(V ) be such that T 2 − T + I = 0, where I is the identity linear oper-
ator on V . Then show that T is invertible, also find the following:

(a) T −1
(b) 2T − T −1 .
9. Let S and T be linear transformations on a finite dimensional vector space. Show
that if ST = I , then T S = I . Is this true for infinite dimensional vector spaces?
10. Let A, B, C, D ∈ Mn×n (C). Define T on Mn×n (C) by T (X ) = AX B + C X +
X D, X ∈ Mn×n (C). Show that T is a linear transformation on Mn×n (C) and that
when C = D = 0, T has an inverse if and only if A and B are invertible.

3.5 Matrix of a Linear Transformation

Throughout this section all the vector spaces will be finite dimensional. Consider an ordered basis B = {u 1 , u 2 , . . . , u m } of a vector space U over a field F. For any vector u ∈ U , there exist unique scalars α1 , α2 , . . . , αm such that u = α1 u 1 + α2 u 2 + · · · + αm u m . The coordinate vector of u relative to the ordered basis B, i.e., [u]_B , is the column vector [u]_B = [α1 , α2 , . . . , αm ]^T . Now, let V be an n-dimensional vector space with an ordered basis B′ = {v1 , v2 , . . . , vn } and T : U → V be a linear transformation. Then, for any u i ∈ U , T (u i ) ∈ V . Therefore, for each 1 ≤ i ≤ m, if T (u i ) = Σ_{j=1}^{n} α_ji v j , α_ji ∈ F, then this can be expressed as

T (u 1 ) = α11 v1 + α21 v2 + · · · + αn1 vn
T (u 2 ) = α12 v1 + α22 v2 + · · · + αn2 vn
  ⋮
T (u m ) = α1m v1 + α2m v2 + · · · + αnm vn .

The transpose of the coefficient matrix of the above equations, which is of order n × m, is called the matrix of T relative to the ordered bases B and B′ of the vector spaces U and V , respectively, and is denoted as m(T )_{(B,B′)} or [T ]_B^{B′}. The i-th column of this matrix consists of the coefficients of T (u i ) when it is expressed as a linear combination of v j , j = 1, 2, 3, . . . , n.

Example 3.42 Let T : R3 → R2 be a linear transformation such that T (a, b, c) =


(a + b, b − c) for all a, b, c ∈ R. We shall find the matrix of T relative to the ordered
standard bases of R3 and R2 , respectively. Now we can write

T (1, 0, 0) = 1(1, 0) + 0(0, 1)


T (0, 1, 0) = 1(1, 0) + 1(0, 1)
T (0, 0, 1) = 0(1, 0) + (−1)(0, 1).

Thus the matrix of T relative to the ordered standard bases of R3 and R2 is

[T ] = ⎡ 1  1   0 ⎤
       ⎣ 0  1  −1 ⎦ .

Observe that T (a, b, c) = (a + b, b − c) is exactly the product of this matrix with the column vector (a, b, c)^T . Hence if we take the ordered standard bases of both R3 and R2 , then it is clear that the matrix of T relative to the ordered standard bases is the matrix in the above equation. Now we shall find the matrix of T relative to another pair of bases, say {(1, 1, 0), (1, 0, 1), (0, 1, 1)} and {(1, 2), (2, 1)} of R3 and R2 , respectively. In fact,

T (1, 1, 0) = 0(1, 2) + 1(2, 1)
T (1, 0, 1) = −1(1, 2) + 1(2, 1)
T (0, 1, 1) = (−1/3)(1, 2) + (2/3)(2, 1).

Thus [T ] = ⎡ 0  −1  −1/3 ⎤
            ⎣ 1   1   2/3 ⎦ .
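The second matrix of Example 3.42 can be recomputed mechanically (a Python/NumPy sketch using the same map and bases as above): the i-th column of [T ] consists of the coordinates of T (u i ) relative to {(1, 2), (2, 1)}, obtained by solving a small linear system.

```python
import numpy as np

T = lambda a, b, c: np.array([a + b, b - c])      # T(a,b,c) = (a+b, b-c)

B  = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]            # ordered basis of R^3
Bp = np.array([[1, 2], [2, 1]], dtype=float).T    # columns: (1,2) and (2,1)

# Column i of the matrix of T = coordinates of T(u_i) relative to {(1,2),(2,1)}
cols = [np.linalg.solve(Bp, T(*u)) for u in B]
M = np.column_stack(cols)
print(M)        # [[ 0.  -1.  -1/3]
                #  [ 1.   1.   2/3]]  -- matches the matrix computed above
```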

Remark 3.43 (i) It is straightforward to see that the matrix of a linear transformation changes according to the choice of a basis of the underlying vector space. In fact, if we change the order of the elements in a given basis, even then we get a different matrix of the linear transformation. Since the basis of a vector space is not unique, the order of the elements and the basis both play an important role in finding the matrix of a linear transformation.
(ii) The matrix of the zero linear transformation is always the zero matrix for any choice of bases. However, the matrix of the identity linear transformation is the identity matrix only when the same ordered basis is used for the domain and the codomain; otherwise it may be different from the identity matrix.
Theorem 3.44 Let A = (δ_ji ) be an n × m matrix. Fix bases B = {u 1 , u 2 , . . . , u m } and B′ = {v1 , v2 , . . . , vn } of U and V , respectively. Let u ∈ U be such that u = Σ_{i=1}^{m} αi u i . Define a map T : U → V such that T (u) = Σ_{j=1}^{n} β j v j , where each β j is defined as β j = Σ_{k=1}^{m} δ_jk αk . Then T : U → V is a linear transformation such that [T ]_B^{B′} = A.

 
Proof First, we show that T is well defined. For this let u = u′, where u = Σ_{k=1}^{m} αk u k and u′ = Σ_{k=1}^{m} αk ′ u k . Thus we have T (u) = Σ_{j=1}^{n} β j v j , where each β j is defined as β j = Σ_{k=1}^{m} δ_jk αk , and T (u′) = Σ_{j=1}^{n} β j ′ v j , where each β j ′ is defined as β j ′ = Σ_{k=1}^{m} δ_jk αk ′. As B is a basis of U and u = u′, we conclude that αk = αk ′ for each k = 1, 2, 3, . . . , m. As a result, β j = β j ′ for each j = 1, 2, 3, . . . , n. Thus T (u) = T (u′) stands proved.

 
Let α, β ∈ F. Then αu + βu′ = Σ_{k=1}^{m} (ααk + βαk ′)u k and let T (αu + βu′) = Σ_{j=1}^{n} γ j v j , where each γ j is defined as γ j = Σ_{k=1}^{m} δ_jk (ααk + βαk ′).

T (αu + βu′) = Σ_{j=1}^{n} γ j v j
             = Σ_{j=1}^{n} (Σ_{k=1}^{m} δ_jk (ααk + βαk ′)) v j
             = α Σ_{j=1}^{n} (Σ_{k=1}^{m} δ_jk αk ) v j + β Σ_{j=1}^{n} (Σ_{k=1}^{m} δ_jk αk ′) v j
             = αT (u) + βT (u′).

Thus T is a linear transformation.


Here we also have

T (u 1 ) = δ11 v1 + δ21 v2 + · · · + δn1 vn


T (u 2 ) = δ12 v1 + δ22 v2 + · · · + δn2 vn

..
.

T (u m ) = δ1m v1 + δ2m v2 + · · · + δnm vn .



Obviously [T ]_B^{B′} = (δ_ji ) = A. This completes the proof of the theorem.

Theorem 3.45 Let T1 , T2 : U → V be linear transformations, where U and V are vector spaces over F of dimensions m and n, respectively, with ordered bases B and B′. Then the following hold:
(i) If T1 = T2 , then [T1 ]_B^{B′} = [T2 ]_B^{B′}.
(ii) [T1 + T2 ]_B^{B′} = [T1 ]_B^{B′} + [T2 ]_B^{B′}.
(iii) [αT1 ]_B^{B′} = α[T1 ]_B^{B′}.
  
Proof (i) Let B = {x1 , x2 , . . . , xm }, B′ = {y1 , y2 , . . . , yn } and [T1 ]_B^{B′} = (α_ji ), [T2 ]_B^{B′} = (β_ji ). We have

T1 = T2 ⇔ T1 (xi ) = T2 (xi ), for i = 1, 2, 3, . . . , m
        ⇔ Σ_{j=1}^{n} α_ji y j = Σ_{j=1}^{n} β_ji y j
        ⇔ Σ_{j=1}^{n} (α_ji − β_ji ) y j = 0
        ⇔ α_ji = β_ji for each i and each j
        ⇔ (α_ji ) = (β_ji )
        ⇔ [T1 ]_B^{B′} = [T2 ]_B^{B′}.

(ii) Since T1 + T2 : U → V is also a linear transformation and

(T1 + T2 )(xi ) = T1 (xi ) + T2 (xi ), for i = 1, 2, 3, . . . , m
              = Σ_{j=1}^{n} α_ji y j + Σ_{j=1}^{n} β_ji y j
              = Σ_{j=1}^{n} (α_ji + β_ji ) y j ,

therefore [T1 + T2 ]_B^{B′} = ((α_ji + β_ji )) = (α_ji ) + (β_ji ) = [T1 ]_B^{B′} + [T2 ]_B^{B′}.

(iii) As αT1 : U → V is also a linear transformation and

(αT1 )(xi ) = αT1 (xi ), for i = 1, 2, 3, . . . , m
           = α (Σ_{j=1}^{n} α_ji y j )
           = Σ_{j=1}^{n} (αα_ji ) y j ,

thus [αT1 ]_B^{B′} = ((αα_ji )) = α(α_ji ) = α[T1 ]_B^{B′}.

Theorem 3.46 Let T1 : U → V and T2 : V → W be linear transformations, where U , V and W are vector spaces over F of dimensions m, n and p, respectively, with ordered bases B, B′ and B″. Then [T2 T1 ]_B^{B″} = [T2 ]_{B′}^{B″} [T1 ]_B^{B′}.

Proof Let B = {x1 , x2 , . . . , xm }, B′ = {y1 , y2 , . . . , yn }, B″ = {z 1 , z 2 , . . . , z p } and [T1 ]_B^{B′} = (α_ji ), [T2 ]_{B′}^{B″} = (β_kj ). Clearly, T2 T1 : U → W is a linear transformation. Also, we have

(T2 T1 )(xi ) = T2 (T1 (xi )), for i = 1, 2, 3, . . . , m
            = T2 (Σ_{j=1}^{n} α_ji y j )
            = Σ_{j=1}^{n} α_ji T2 (y j )
            = Σ_{j=1}^{n} α_ji (Σ_{k=1}^{p} β_kj z k )
            = Σ_{k=1}^{p} (Σ_{j=1}^{n} β_kj α_ji ) z k
            = Σ_{k=1}^{p} γ_ki z k , where γ_ki = Σ_{j=1}^{n} β_kj α_ji .

Thus [T2 T1 ]_B^{B″} = (γ_ki ) = (β_kj )(α_ji ) = [T2 ]_{B′}^{B″} [T1 ]_B^{B′}.

Remark 3.47 If in the above theorem we take U = V = W and B = B′ = B″, then it is obvious to observe the following fact: [T2 T1 ]_B^{B} = [T2 ]_B^{B} [T1 ]_B^{B}.
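Theorem 3.46 and the above remark are easy to verify numerically: relative to fixed bases, composing linear maps corresponds to multiplying their matrices. The following sketch (Python with NumPy; the two maps are chosen arbitrarily and the standard bases are used throughout) illustrates this.

```python
import numpy as np

A1 = np.array([[1.0, 2.0, 0.0],    # matrix of T1 : R^3 -> R^2 (standard bases)
               [0.0, 1.0, -1.0]])
A2 = np.array([[2.0, 1.0],         # matrix of T2 : R^2 -> R^3 (standard bases)
               [0.0, 1.0],
               [1.0, 3.0]])

u = np.array([1.0, -2.0, 4.0])
# Applying T2 after T1 agrees with multiplying by A2 @ A1.
assert np.allclose(A2 @ (A1 @ u), (A2 @ A1) @ u)
print(A2 @ A1)                     # the 3 x 3 matrix of T2 T1
```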

Theorem 3.48 If U and V are vector spaces over F of dimensions n and m, respectively, then Hom(U, V ) ≅ Mm×n (F), where Mm×n (F) denotes the vector space of all m × n matrices with entries from F, and hence the dimension of the vector space Mm×n (F) is mn.

Proof Let B and B′ be ordered bases of U and V , respectively. Define a map f : Hom(U, V ) → Mm×n (F) such that f (T ) = [T ]_B^{B′}. In the light of Theorem 3.45, f is well defined and one-to-one. We also have f (αT1 + βT2 ) = [αT1 + βT2 ]_B^{B′} = [αT1 ]_B^{B′} + [βT2 ]_B^{B′} = α[T1 ]_B^{B′} + β[T2 ]_B^{B′} = α f (T1 ) + β f (T2 ) for every α, β ∈ F and T1 , T2 ∈ Hom(U, V ). Thus f is a linear transformation. Using Theorem 3.44, it is easy to observe that f is onto. Now we conclude that f is an isomorphism and hence Hom(U, V ) ≅ Mm×n (F). Therefore, dim Mm×n (F) = dim Hom(U, V ) = dimU dimV = mn.

Theorem 3.49 Let V be an n-dimensional vector space over F. Then the algebras
A (V ) and Mn×n (F) are isomorphic.

Proof Let B be an ordered basis of V. Define a map f : A (V ) → Mn×n (F) such that f (T ) = [T ]_B^{B}. Using Theorems 3.44, 3.45 and Remark 3.47, one can easily verify the following. For any T1 , T2 ∈ A (V ), α, β ∈ F: (i) f is well defined, (ii) f (αT1 + βT2 ) = α f (T1 ) + β f (T2 ), (iii) f (T1 T2 ) = f (T1 ) f (T2 ), (iv) f is a bijective map. Thus f is an algebra isomorphism and hence the algebras A (V ) and Mn×n (F) are isomorphic.

3.6 Effect of Change of Bases on a Matrix Representation of a Linear Transformation

Theorem 3.50 Let T : U −→ V be a linear transformation, where U and V are vector spaces over F of dimensions n and m, respectively. Let B and B′ be two ordered bases of U and B1 , B1 ′ be two ordered bases of V . Then [T ]_B^{B1} and [T ]_{B′}^{B1′} are equivalent matrices and [T ]_{B′}^{B1′} = P[T ]_B^{B1} S −1 , where P is the transition matrix of B1 relative to B1 ′ and S is the transition matrix of B relative to B′.

Proof Let IU and I V denote the identity linear operators on the vector spaces U and V , respectively. It is clear that T = I V T IU . Using Theorem 3.46, we have [T ]_{B′}^{B1′} = [I V T IU ]_{B′}^{B1′} = [I V ]_{B1}^{B1′} [T ]_B^{B1} [IU ]_{B′}^{B}. Now let us put P = [I V ]_{B1}^{B1′} and Q = [IU ]_{B′}^{B}. Thus we can write [T ]_{B′}^{B1′} = P[T ]_B^{B1} Q, where clearly P and Q are matrices of orders m × m and n × n, respectively. Now we claim that P and Q are invertible matrices. Since I V = I V I V , we find that [I V ]_{B1}^{B1} = [I V I V ]_{B1}^{B1} = [I V ]_{B1′}^{B1} [I V ]_{B1}^{B1′}. Similarly, we also have [I V ]_{B1′}^{B1′} = [I V I V ]_{B1′}^{B1′} = [I V ]_{B1}^{B1′} [I V ]_{B1′}^{B1}. But we know that [I V ]_{B1}^{B1} = [I V ]_{B1′}^{B1′} = I_{m×m} , the identity matrix of order m × m. Hence we conclude that [I V ]_{B1′}^{B1} P = P [I V ]_{B1′}^{B1} = I_{m×m} . This shows that P is an invertible matrix. Along similar lines one can show that Q is also an invertible matrix. This proves that [T ]_B^{B1} and [T ]_{B′}^{B1′} are equivalent matrices.
As we have proved, [T ]_{B′}^{B1′} = P[T ]_B^{B1} Q, where P = [I V ]_{B1}^{B1′}, i.e., the transition matrix of B1 relative to B1 ′, and Q = [IU ]_{B′}^{B}. If we put S = Q −1 , then [T ]_{B′}^{B1′} = P[T ]_B^{B1} S −1 , where S = Q −1 = [IU ]_B^{B′}, i.e., the transition matrix of B relative to B′. This completes the proof.
Corollary 3.51 Let T be a linear operator on an n-dimensional vector space U. If B and B′ are ordered bases of U , then [T ]_B^{B} and [T ]_{B′}^{B′} are similar matrices and also [T ]_{B′}^{B′} = P[T ]_B^{B} P −1 , where P is the transition matrix of B relative to B′.

Proof Let IU be the identity linear operator on the vector space U . If, in particular, in the above theorem we put V = U , B = B1 and B′ = B1 ′, then we get [T ]_{B′}^{B′} = P[T ]_B^{B} Q, where P = [IU ]_B^{B′} and Q = [IU ]_{B′}^{B} are invertible matrices of order n × n, i.e., [T ]_B^{B} and [T ]_{B′}^{B′} are equivalent matrices. Now we claim that Q = P −1 . Since IU = IU IU , we find that [IU ]_B^{B} = [IU IU ]_B^{B} = [IU ]_{B′}^{B} [IU ]_B^{B′}. In a similar way, [IU ]_{B′}^{B′} = [IU IU ]_{B′}^{B′} = [IU ]_B^{B′} [IU ]_{B′}^{B}. But [IU ]_B^{B} = [IU ]_{B′}^{B′} = I_{n×n} , the identity matrix of order n × n. Hence we conclude that [IU ]_{B′}^{B} [IU ]_B^{B′} = [IU ]_B^{B′} [IU ]_{B′}^{B} = I_{n×n} , i.e., P Q = Q P = I_{n×n} . This implies that Q = P −1 . Finally we have shown that [T ]_{B′}^{B′} = P[T ]_B^{B} P −1 . This implies that [T ]_B^{B} and [T ]_{B′}^{B′} are similar matrices.
As we have obtained, [T ]_{B′}^{B′} = P[T ]_B^{B} P −1 , where P = [IU ]_B^{B′}, which is obviously the transition matrix of B relative to B′.
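Corollary 3.51 is easy to test numerically. The sketch below (Python with NumPy; the operator and the basis B′ are chosen arbitrarily, and P is taken as the inverse of the matrix whose columns are the B′ vectors, which is one concrete way to realize the transition matrix of B relative to B′) confirms that the two matrix representations are similar.

```python
import numpy as np

A = np.array([[2.0, 1.0],              # [T]_B for T(x, y) = (2x + y, x - 3y),
              [1.0, -3.0]])            # B the standard ordered basis of R^2

C = np.array([[1.0, 1.0],              # columns: the vectors of B' = {(1,1), (1,-1)}
              [1.0, -1.0]])            # written in standard coordinates
P = np.linalg.inv(C)                   # transition matrix of B relative to B'

# Direct computation: column j of [T]_{B'} = B'-coordinates of T(b'_j)
direct = np.column_stack([np.linalg.solve(C, A @ C[:, j]) for j in range(2)])

assert np.allclose(direct, P @ A @ np.linalg.inv(P))   # [T]_{B'} = P [T]_B P^{-1}
print(direct)
```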
Let T : U −→ V be a linear transformation, where U and V are finite dimensional
vector spaces over F. Theorem 3.50 and Corollary 3.51 show that matrices of T
relative to two different sets of bases are equivalent or similar. The next two results
prove their converse.
Theorem 3.52 Let T : U −→ V be a linear transformation, where U and V are vector spaces over F of dimensions n and m, respectively. Let B and B1 be ordered bases of U and V , respectively, and let C = [T ]_B^{B1}. If C is equivalent to D, then there exist ordered bases B′, B1 ′ of U and V , respectively, such that D = [T ]_{B′}^{B1′}.

Proof As C is equivalent to D, there exist a nonsingular m × m matrix P and a nonsingular n × n matrix Q such that D = PC Q. Let S = Q −1 . Then clearly D = PC S −1 . By Theorem 3.50, there exists an ordered basis B1 ′ of V such that the transition matrix of B1 ′ relative to B1 is P −1 , i.e., [I V ]_{B1′}^{B1} = P −1 , and there exists an ordered basis B′ of U such that Q is the transition matrix of B′ relative to B, i.e., [IU ]_{B′}^{B} = Q. As a result, P is the transition matrix of B1 relative to B1 ′ and Q −1 = S is the transition matrix of B relative to B′. But by Theorem 3.50, [T ]_{B′}^{B1′} = P[T ]_B^{B1} S −1 = PC Q = D. This completes the proof.
Corollary 3.53 Let T be a linear operator on an n-dimensional vector space U. For some ordered basis B of U , let A = [T ]_B^{B}. Let A be similar to D. Then there exists an ordered basis B′ of U such that [T ]_{B′}^{B′} = D.

Proof Since A is similar to D, there exists an invertible matrix P such that D = P A P −1 . By Theorem 3.50, there exists an ordered basis B′ of U such that the transition matrix of B′ relative to B is P −1 , i.e., [IU ]_{B′}^{B} = P −1 . This implies that [IU ]_B^{B′} = P, the transition matrix of B relative to B′. But Corollary 3.51 provides us with [T ]_{B′}^{B′} = P[T ]_B^{B} P −1 = P A P −1 = D.

Theorem 3.54 Let A be any m × n matrix over F, and let B and B′ be the standard bases of F^{n×1} and F^{m×1}, respectively. Then, for T : F^{n×1} −→ F^{m×1} given by T (X ) = AX , [T ]_B^{B′} = A. For m = n, [T ]_B^{B} = A.

Proof Let B = {e1 , e2 , . . . , en }, B′ = {e1 ′, e2 ′, . . . , em ′}, where e j = (αt1 )_{n×1} , j = 1, 2, . . . , n, with αt1 = 1 if t = j and αt1 = 0 otherwise, and ei ′ = (αl1 ′)_{m×1} , i = 1, 2, . . . , m, with αl1 ′ = 1 if l = i and αl1 ′ = 0 otherwise. Then, for any m × n matrix A = (aij ) over F,

T (e j ) = Ae j = A (0, . . . , 0, 1, 0, . . . , 0)^T = (a1j , a2j , . . . , amj )^T = Σ_{i=1}^{m} aij ei ′,

where the single 1 occurs in the j-th place and j = 1, 2, . . . , n. Hence [T ]_B^{B′} = (aij ) = A. Obviously for m = n, B′ = B, so A = (aij ) = [T ]_B^{B}.

Theorem 3.55 Let T : U −→ V be a linear transformation, where U and V are vector spaces over F of dimensions n and m, respectively. Let B and B′ be ordered bases of U and V , respectively, and let A = [T ]_B^{B′}. Then
(i) for any u ∈ U , [T (u)]_{B′} = A[u]_B ,
(ii) rank (T ) = rank (A).
 
Proof (i) Let B = {x1 , x2 , . . . , xn }, B′ = {y1 , y2 , . . . , ym } and A = [T ]_B^{B′} = (aij ). By definition T (x j ) = Σ_{i=1}^{m} aij yi , j = 1, 2, . . . , n. Consider u ∈ U and let [u]_B = [α1 , α2 , . . . , αn ]^T . Then u = Σ_{j=1}^{n} α j x j . This implies that T (u) = Σ_{j=1}^{n} α j T (x j ) = Σ_{j=1}^{n} α j (Σ_{i=1}^{m} aij yi ), i.e., T (u) = Σ_{i=1}^{m} (Σ_{j=1}^{n} aij α j ) yi . Thus, in [T (u)]_{B′}, the (i, 1)-th entry is Σ_{j=1}^{n} aij α j . It can be easily seen that the (i, 1)-th entry of A[u]_B is also Σ_{j=1}^{n} aij α j . This implies that the (i, 1)-th entry of [T (u)]_{B′} equals the (i, 1)-th entry of A[u]_B for each i = 1, 2, . . . , m. Thus [T (u)]_{B′} = A[u]_B . This completes the proof of (i).

(ii) Consider the homogeneous system of linear equations, given in matrix form as AX = O, where X = [x1 , x2 , . . . , xn ]^T . Let W denote the solution space of this system. Next suppose that dimW = n − r , where r = rank(A), because the above system contains (n − r ) linearly independent solutions. Now define a map f : U −→ F^{n×1} such that f (u) = [u]_B ; it can be easily verified that f is an isomorphism. Now u ∈ K er T ⇐⇒ T (u) = 0 ⇐⇒ [T (u)]_{B′} = 0 ⇐⇒ A[u]_B = 0 ⇐⇒ [u]_B ∈ W. But f (u) = [u]_B , so we conclude that u ∈ K er T ⇐⇒ f (u) ∈ W. Therefore N (T ) = f −1 (W ), i.e., n(T ) = dim[ f −1 (W )]. Since f −1 : F^{n×1} −→ U is also an isomorphism, its restriction to the subspace W , i.e., f −1 |_W : W −→ f −1 (W ), is also an isomorphism. Thus, by the Rank-Nullity theorem, dimW = dim f −1 (W ). Now we conclude that n(T ) = dimW , i.e., dim(U ) − r (T ) = n − r = dim(U ) − r (A). This implies that r (T ) = r (A), i.e., rank(T ) = rank(A).
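A small numerical check of part (ii) (Python with NumPy; the matrix A is chosen arbitrarily): the rank of A equals r (T ), and together with the Rank-Nullity theorem it also gives the nullity.

```python
import numpy as np

# T : R^4 -> R^3 with matrix A relative to the standard bases
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 1.0],
              [1.0, 3.0, 1.0, 2.0]])   # third row = first + second, so rank 2

r = np.linalg.matrix_rank(A)
print(r)                               # 2 = rank(A) = rank(T)
print(A.shape[1] - r)                  # 2 = n(T), by the Rank-Nullity theorem

# A vector in Ker T, found by hand: x = (2, -1, 1, 0)
assert np.allclose(A @ np.array([2.0, -1.0, 1.0, 0.0]), 0)
```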
Corollary 3.56 Let T be a linear operator on the vector space U of dimension n and B be an ordered basis of U . Then T is an automorphism of U if and only if A = [T ]_B^{B} is nonsingular.

Proof We know that T is an automorphism if and only if Rank T = dimU = n. Using the above theorem, we also have Rank T = Rank A. Hence T is an automorphism if and only if Rank A = n. But we know that A is nonsingular if and only if Rank A = n. Thus it follows that T is an automorphism of U if and only if A = [T ]_B^{B} is nonsingular.
Example 3.57 Let T be the linear operator on C2 defined by T (x1 , x2 ) = (x1 , 0). Let B be the standard ordered basis for C2 and let B′ = {α1 , α2 } be the ordered basis defined by α1 = (1, i), α2 = (−i, 2).
(1) The matrix of T relative to the pair of bases B, B′.
We have B = {(1, 0), (0, 1)}. Since

T (1, 0) = (1, 0) = 2(1, i) + (−i)(−i, 2),   T (0, 1) = (0, 0) = 0(1, i) + 0(−i, 2),

we find that [T ]_B^{B′} = ⎡ 2   0 ⎤
                           ⎣ −i  0 ⎦ .

(2) The matrix of T relative to the pair of bases B′, B.
We have

T (1, i) = (1, 0) = 1(1, 0) + 0(0, 1),   T (−i, 2) = (−i, 0) = −i(1, 0) + 0(0, 1).

Thus we get [T ]_{B′}^{B} = ⎡ 1  −i ⎤
                             ⎣ 0   0 ⎦ .

(3) The matrix of T relative to the ordered basis B′.
Since

T (1, i) = (1, 0) = 2(1, i) + (−i)(−i, 2),
T (−i, 2) = (−i, 0) = (−2i)(1, i) + (−1)(−i, 2),

we find that [T ]_{B′}^{B′} = ⎡ 2   −2i ⎤
                               ⎣ −i  −1 ⎦ .

(4) The matrix of T relative to the ordered basis {α2 , α1 }.
Since

T (−i, 2) = (−i, 0) = (−1)(−i, 2) + (−2i)(1, i),
T (1, i) = (1, 0) = (−i)(−i, 2) + 2(1, i),

the matrix of T relative to the ordered basis {α2 , α1 } is [T ] = ⎡ −1   −i ⎤
                                                                   ⎣ −2i   2 ⎦ .
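The computations of Example 3.57 can be verified with complex arithmetic; the following sketch (Python with NumPy, where 1j denotes the imaginary unit i) reproduces the matrix of part (3) by solving for the coordinates of T (α1 ) and T (α2 ) relative to B′.

```python
import numpy as np

T = lambda z: np.array([z[0], 0.0])          # T(x1, x2) = (x1, 0) on C^2

Bp = np.array([[1, -1j],                     # columns: alpha_1 = (1, i), alpha_2 = (-i, 2)
               [1j,  2]], dtype=complex)

# Column j of [T]_{B'} = coordinates of T(alpha_j) relative to B'
M = np.column_stack([np.linalg.solve(Bp, T(Bp[:, j])) for j in range(2)])
print(np.round(M, 10))
# [[ 2.+0.j  0.-2.j]
#  [ 0.-1.j -1.+0.j]]   i.e. the matrix [[2, -2i], [-i, -1]] found in part (3)
```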

Exercises
1. Let V = R3 and suppose that
   ⎡  1   0  3 ⎤
   ⎢ −1  −4  3 ⎥
   ⎣  6   2  1 ⎦
   is the matrix of T ∈ A (V ) relative to the basis {e1 , e2 , e3 }, where e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1). Find the matrix of T relative to the basis {e1 ′, e2 ′, e3 ′}, where e1 ′ = (1, 1, 0), e2 ′ = (1, 0, 1) and e3 ′ = (0, 1, 1).
2. Let V be the vector space of all polynomials of degree less than or equal to
3 over R. In A (V ), define T by T (α0 + α1 x + α2 x 2 + α3 x 3 ) = α0 + α1 (x +
1) + α2 (x + 1)2 + α3 (x + 1)3 . Compute the matrix of T relative to bases
(i) {1, x, x 2 , x 3 };
(ii) {1, 1 + x, 1 + x 2 , 1 + x 3 }.
If the matrix in part (i) is A and that in part (ii) is B, then prove that A and B
are similar matrices.
3. Let T : R2 −→ R2 be the linear operator such that T (x, y) = (x − y, 2x + y). Let B = {(1, 0), (0, 1)} and B′ = {(1, 2), (2, 1)} be ordered bases of R2 .
   (a) Find [T ]_B^{B}, [T ]_{B′}^{B′}, [T ]_B^{B′}, [T ]_{B′}^{B}.
   (b) Find the transition matrix P of B relative to B′ and verify that [T ]_{B′}^{B′} = P[T ]_B^{B} P −1 .
   (c) Find the formula for T −1 , and find [T −1 ]_{B′}^{B′}. Also verify that [T −1 ]_{B′}^{B′} = ([T ]_{B′}^{B′})−1 .
   (d) Find [T −1 ]_B^{B} and also verify that [T −1 ]_B^{B} = ([T ]_B^{B})−1 .
4. Let P2 (R) be the vector space of all polynomials of degree less than or equal to 2 over R and let T : R2 −→ P2 (R) be given by T (α, β) = βx + αx^2 . Consider the following bases: B = {(1, −2), (−3, 0)} of R2 and B′ = {1, x, x^2 } of P2 (R).
   (a) Find [T ]_B^{B′}.
   (b) Verify that [T (3, −4)]_{B′} = [T ]_B^{B′} [(3, −4)]_B .
   (c) Verify that Rank T = Rank [T ]_B^{B′}.

5. Let T : R3 −→ R3 be the linear operator whose matrix relative to the standard ordered basis B = {e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1)} is
   ⎡  1   3  1 ⎤
   ⎢ −1  −3  1 ⎥
   ⎣  0   2  4 ⎦ .
   (a) Find T (e1 ), T (e2 ), T (e3 ) and determine T . Prove that T is invertible and determine T −1 .
   (b) Find the matrix of each of the following relative to the standard ordered basis: T^2 , T^2 + T , T^2 + I , (−2T )^3 + 6T^2 − I .
   (c) Find [T ]_{B′}^{B′}, where B′ = {e1 ′ = (1, 1, 1), e2 ′ = (1, 1, 0), e3 ′ = (1, 0, 0)}.
   (d) Find a basis of K er T and a basis of Range T .
   (e) Show that both matrices [T ]_B^{B} and [T ]_{B′}^{B′} have the same rank.
   (f) Prove that [T ]_B^{B} and [T ]_{B′}^{B′} are similar matrices.
6. Let P3 (R) be the vector space of all polynomials of degree less than or equal to 3 over R and let D : P3 (R) −→ P3 (R) be the differentiation operator. Let B be the standard basis {1, x, x^2 , x^3 } and B′ = {1, 1 + x, (1 + x)^2 , (1 + x)^3 }.
   (a) Find [D]_B^{B}, [D]_{B′}^{B′}, [D]_B^{B′}, [D]_{B′}^{B}.
   (b) For A = [D]_{B′}^{B′}, verify that A^4 = 0 but A^3 ≠ 0.
   (c) For any α ≠ 0 in R, prove that αI + D is invertible.
7. Let V be the 2-dimensional vector space of solutions of the differential equation y″ − 3y′ + 2y = 0 over C, let B = {y1 = e^x , y2 = e^{2x}} be a basis of V and let D : V −→ V be the differentiation operator. Find [D]_B^{B}.
8. Let V be the vector space of all 2 × 2 matrices with real entries and let T : V −→ R be the map defined by T (A) = the trace of A.
   (a) Show that the set B = { [1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1] } is a basis for V .
   (b) Prove that T is a linear transformation and determine [T ]_B^{B′}, where B′ = {5} is a basis of R. Determine the dimension and a basis for the kernel of T .
9. Let T be a linear operator on R3 defined by T (x, y, z) = (2y + z, x − 4z, 3x −
6z).
(a) Find [T ] BB , where B = {(1, 1, 0), (1, 0, 1), (0, 1, 1)}.
(b) Verify that [T ] BB [v] B = [T (v)] B for any v ∈ R3 .
10. For each of the following linear operators T on R2 , find the matrix, that is
represented by T relative to the standard basis B = {(1, 0), (0, 1)} of R2 .
(a) T is defined by T (1, 0) = (2, 3), T (0, 1) = (3, −4).
(b) T is the rotation in R2 counterclockwise by 90°.
(c) T is the reflection in R2 about the line y = x.
11. The set B = {e^{3t}, te^{3t}, t^2 e^{3t}} is a basis of a vector space V of functions f : R −→ R. Let D be the differential operator on V , i.e., D( f ) = df/dt. Find [D]_B^{B}.

12. Let A be the matrix representation of a linear operator T on a vector space V


over the field F. Prove that, for any polynomial f (x) over F, we have that f (A)
is the matrix representation of f (T ).
13. Let T : Rn −→ Rm be the linear mapping defined by T (x1 , x2 , . . . , xn ) =
(α11 x1 + · · · + α1n xn , α21 x1 + · · · + α2n xn , . . . , αm1 x1 + · · · + αmn xn ), where
αi j ∈ R, i = 1, 2, . . . , m, j = 1, 2, . . . , n. Show that the rows of the matrix [T ]
representing T relative to the usual bases of Rn and Rm are the coefficients of
xi in the components of T (x1 , x2 , . . . , xn ).
14. Let T : U −→ V be a linear transformation, where U and V are vector spaces of dimensions m and n, respectively. Prove that there exist bases of U and V such that the matrix representation of T relative to these bases has the form
    ⎡ Ir  0 ⎤
    ⎣ 0   0 ⎦ ,
    where Ir is the r-square identity matrix.
15. Let V be the vector space of all 2 × 2 matrices with real entries. Consider the following matrix M and the usual basis B of V :
    M = ⎡ a  b ⎤
        ⎣ c  d ⎦    and    B = { [1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1] }.
    Find [T ]_B^{B} for each of the following linear operators T on V .
(a) T (A) = M A,
(b) T (A) = AM,
(c) T (A) = AM − M A.
16. Suppose T : V −→ V is linear. A subspace W of V is called invariant under T if T (W ) ⊆ W . Suppose W is invariant under T and dimW = r . Show that T has a block triangular matrix representation
    M = ⎡ A  B ⎤
        ⎣ 0  C ⎦ ,
    where A is an r × r submatrix.
17. Let U = V ⊕ W , where V and W are subspaces of U which are invariant under a linear operator T : U −→ U . Also suppose that dimV = r and dimW = s. Show that T has a block diagonal representation
    M = ⎡ A  0 ⎤
        ⎣ 0  B ⎦ ,
    where A and B are r × r and s × s submatrices.
18. Consider C as a vector space over R. For a ∈ C, let Ta : C −→ C be a linear
operator given by Ta (x) = xa for all x ∈ C. Find [Ta ] BB , where B = {1, i} and
so get an isomorphic representation of the complex numbers as 2 × 2 matrices
over real field.
19. Consider Q as a vector space over R, where Q is the division ring of real
quaternions. For a ∈ Q, let Ta : Q −→ Q be a linear operator defined as Ta (x) =
xa for all x ∈ Q. Find [Ta ] BB , where B = {1, i, j, k} and so get an isomorphic
representation of Q as 4 × 4 matrices over real field.
20. Suppose that the x- and y-axes in the plane R2 are rotated counterclockwise by 30°
to yield new X - and Y -axes for the plane. Consider this rotation as a map T :
R2 −→ R2 and prove that T is a linear transformation, also find the following:

(a) The change of basis matrix or transition matrix for the new coordinate sys-
tem.
(b) T (2, 3), T (−6, 8), T (−2, 6), T (3, 5), T (a, b).
21. Let T be a linear operator on R3 defined by T (x, y, z) = (2x − y + z, x − y +
2z, 2x − 3y + z).

(a) Find [T ]_B^{B} and [T ]_{B′}^{B′}, where B = {(1, 1, 1), (1, 1, 0), (1, 0, 0)} and B′ = {(1, 1, 0), (1, 2, 3), (1, 3, 5)}.
(b) Verify that Determinant [T ]_B^{B} = Determinant [T ]_{B′}^{B′}.
22. Let T : R3 −→ R2 be a linear transformation given by T (x, y, z) = (2x − y + z, −3x + 2y − z). Let B = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, B′ = {(1, 1, 0), (0, 1, 2), (0, 1, 1)} and B1 = {(1, 1), (1, 2)}, B1 ′ = {(1, 0), (0, 2)} be ordered bases of R3 and R2 , respectively. Then
    (a) Find [T ]_B^{B1} and [T ]_{B′}^{B1′}.
    (b) Verify that [T ]_B^{B1} and [T ]_{B′}^{B1′} are equivalent matrices.
    (c) Verify that Rank [T ]_B^{B1} = Rank [T ]_{B′}^{B1′}.
    (d) Find nonsingular matrices P and Q such that [T ]_{B′}^{B1′} = P[T ]_B^{B1} Q.
    (e) Verify that for any v ∈ R3 , [T (v)]_{B1} = [T ]_B^{B1} [v]_B and [T (v)]_{B1′} = [T ]_{B′}^{B1′} [v]_{B′}.
23. Let T be a linear transformation on an n-dimensional vector space V . If T^{n−1}(v) ≠ 0 but T^n (v) = 0 for some v ∈ V , then v, T (v), . . . , T^{n−1}(v) are linearly independent, and thus form a basis of V . Find the matrix representation of T under this basis.
24. Let C∞ (R) be the vector space of real valued functions on R having derivatives of all orders. Consider the differential operator D(y) = y″ + ay′ + by, y ∈ C∞ (R), where a and b are real constants. Show that y = e^{λx} lies in K er D if and only if λ is a root of the quadratic equation t^2 + at + b = 0.
25. Let T be a linear transformation on R2 associated with the matrix
    ⎡ 2  1 ⎤
    ⎣ 0  2 ⎦
    under the basis {α1 = (1, 0), α2 = (0, 1)}. Let W1 be the subspace of R2 spanned by α1 . Show that W1 is invariant under T and that there does not exist a subspace W2 invariant under T such that R2 = W1 ⊕ W2 .
26. Let S and T be linear transformations on R2 . Suppose that the matrix representation of S under the basis {α1 = (1, 2), α2 = (2, 1)} is
    ⎡ 1  2 ⎤
    ⎣ 2  3 ⎦
    and the matrix representation of T under the basis {β1 = (1, 1), β2 = (1, 2)} is
    ⎡ 3  3 ⎤
    ⎣ 2  4 ⎦ .
    Let u = (3, 3) ∈ R2 . Find
(a) The matrix of S + T under the basis {β1 , β2 }.
(b) The matrix of ST under the basis {α1 , α2 }.
(c) The coordinate vector of S(u) under the basis {α1 , α2 }.
(d) The coordinate vector of T (u) under the basis {β1 , β2 }.
Chapter 4
Dual Spaces

Duality is a very important tool in mathematics. In this chapter, we explore some


instances of duality. Let V be a vector space over a field F. Since every field is a vector
space over itself, one can consider the set of all linear transformations H om(V, F).
The set Hom(V, F) forms a vector space known as the dual space to V or the conjugate space of V and is denoted as V̂ = Hom(V, F). It is also denoted by V ∗ . In the present
chapter, we shall study the properties of this vector space. Throughout, we assume
that the underlying vector spaces are finite dimensional unless otherwise mentioned.

4.1 Linear Functionals and the Dual Space

Definition 4.1 Let V be a vector space over a field F. A linear transformation T :


V → F is called a linear functional and the set of all linear functionals on V denoted
 = H om(V, F) forms a vector space over the field F.
as V

Example 4.2 (1) Let V be a vector space over a field F. Define a map f : V −→ F
given by f (x) = 0 for all x ∈ V. It is obvious to observe that f is a linear
functional on V, which is known as the zero linear functional on V.
(2) Consider the vector space Fn over the field F. Define the map f : Fn −→ F by
f (x1 , x2 , . . . , xn ) = a1 x1 + a2 x2 + · · · + an xn , where x1 , x2 , . . . , xn ∈ F and
a1 , a2 , . . . , an are n fixed scalars. It can be easily verified that f is a linear
functional on Fn .
(3) Consider the vector space Fn over the field F. Let πi : Fn −→ F be the ith
projection, i.e., πi (x1 , x2 , . . . , xn ) = xi . It can be easily shown that πi is a linear
functional on Fn .

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 111
M. Ashraf et al., Advanced Linear Algebra with Applications,
https://doi.org/10.1007/978-981-16-2167-3_4
112 4 Dual Spaces

(4) Let R[x] be the vector space of all polynomials in x over the real field R. Define
1
the map I : R[x] −→ R by I ( f (x)) = 0 f (x)d x. It can be easily proved that
I is a linear functional on R[x].
(5) Let Mn×n (F) be the vector space of all n × n matrices over the field F. Consider
the map f : Mn×n (F) −→ F, defined as f (A) = trace(A) = a11 + a22 + · · · +
ann , where A = (ai j )n×n . It can be easily seen that f is a linear functional on
Mn×n (F).
(6) Let F[x] be the vector space of all polynomials in x over the field F. Define
the map L : F[x] −→ F by L( f (x)) = the value of f (x) at some fixed element
t ∈ F, i.e., f (t). It can be easily seen that L is a linear functional on F[x].
(7) Let C [a, b] be the vector space of all real-valued continuous functions over
the field R of real numbers. Define the map L : C [a, b] −→ R by L( f (x)) =
b
a f (x)d x. It is obvious that L is a linear functional on C [a, b].

Definition 4.3 Let V be a vector space over F and let B = {v1 , v2 , . . . , vn } be a basis
of V . Then a subset 
B = {  is said to be the dual basis of V
v1 , v2 , . . . , vn } of V  with
 
respect to the basis B if B is a basis of V and vi (vi ) = 1, vi (v j ) = 0 for i = j.

Remark 4.4 (i) Since F is a vector space of dimension one over itself, dim H om
(V, F) = dimV .
(ii) It is clear that the dual basis with respect to any basis of a vector space V is
always unique while the change of basis of V will always produce a different
dual basis.
 has a dual basis.
Theorem 4.5 Let V be a vector space over a field F. Then V

Proof Let B = {v1 , v2 , . . . , vn } be a basis of V . If v ∈ V , then there exist unique


α1 , α2 , . . . , αn ∈ F such that v = α1 v1 + α2 v2 + · · · + αn vn . Now, for each i; 1 ≤
i ≤ n, define a mapping vi : V → F such that vi (v) = αi . Clearly, vi for each i; 1 ≤
i ≤ n, is well-defined. It can be easily seen that vi , for each i; 1 ≤ i ≤ n, is linear,
i.e., vi ∈ V for each i; 1 ≤ i ≤ n. Indeed, if a, b ∈ V , then there exist unique scalars

n 
n
β1 , β2 , . . . , βn and δ1 , δ2 , . . . , δn such that a = βi vi and b = δi vi . This shows
i=1 i=1
that for each i; 1 ≤ i ≤ n,

n 
vi (a + b) = vi (βi + δi )vi = βi + δi = vi (a) + vi (b)
i=1

 n
and for any α ∈ F, vi (αa) = vi i=1 αβi vi = αβi = α vi (a). Hence, vi ∈ V  for
each i; 1 ≤ i ≤ n and it can be seen that for each i; 1 ≤ i ≤ n, vi (vi ) = 1, vi (v j )=0
for i = j; j = 1, 2, . . . , n. Since dimV =dim V , in order to show that { v1 , v2 , . . . , vn }

is a basis of V , it remains only to show that it is linearly independent. For any γi ∈ F,
n n
where i = 1, 2, . . . , n whenever i=1 γi vi = 0, then i=1 γi vi (v j ) = 0, for each
j = 1, 2, . . . , n. This implies that γi = 0 for each i = 1, 2, . . . , n. Hence we con-
clude that { .
v1 , v2 , . . . , vn } is a basis of V
4.1 Linear Functionals and the Dual Space 113

Theorem 4.6 Let B = {v1 , v2 , . . . , vn } be an ordered basis of V and  v1 , v2 , . . . ,


B={
 dual to the basis B. Then
vn } be the ordered dual basis of V
(i) any v ∈ V can be uniquely written as v = v1 (v)v1 + v2 (v)v2 + · · · + vn (v)vn ,
(ii) any 
v∈V  can be uniquely written as 
v =  1 )
v(v v1 +v(v2 ) v2 + · · · +
v(vn )vn ,
 → Fn such that μ(
(iii) the mapping μ : V v) =  v(v1 ),
v(v2 ), . . . ,
v(vn ) is a linear
transformation which is both one-to-one and onto.
Proof (i) Since B is a basis of V, v ∈ V can be uniquely written as v = α1 v1 +
α2 v2 + · · · + αn vn . Now for any 1 ≤ i ≤ n, vi (v) = α1 vi (v1 ) + α2 vi (v2 ) + · · · +
αn vi (vn ) = αi . This shows that v = v1 (v)v1 + v2 (v)v2 + · · · + vn (v)vn .

(ii) Since
v∈V  and  B is a basis of V , there exist unique scalars α1 , α2 , . . . , αn such
v = α1 v1 + α2 v2 + · · · + αn vn . Now for any v j ∈ V, 1 ≤ j ≤ n;
that 

v(v j ) = α1 v1 (v j ) + α2 v2 (v j ) + · · · + αn vn (v j ) = α j ; 1 ≤ j ≤ n.




This yields that  v(v1 )


v = v1 +
v(v2 )
v2 + · · · +
v(vn )
vn .

 and
(iii) It can be easily seen that μ is a linear transformation. Indeed, for φ, ξ ∈ V
α, β ∈ F,

αφ + βξ = (αφ + βξ )(v1 )
v1 +
(αφ + βξ )(v2 )
v2 + · · · + (αφ + βξ )(vn )
vn
= αφ(v1 ) + βξ(v1 ) v1 + αφ(v2 ) + βξ(v2 ) v2 + · · · +
αφ(vn ) + βξ(vn ) vn .

Therefore,

μ(αφ + βξ ) = (αφ(v1 ) + βξ(v1 ), αφ(v2 ) + βξ(v2 ), . . . , αφ(vn ) + βξ(vn ))


= (αφ(v1 ), αφ(v2 ), . . . , αφ(vn )) + (βξ(v1 ), βξ(v2 ), . . . , βξ(vn ))
= α(φ(v1 ), φ(v2 ), . . . , φ(vn )) + β(ξ(v1 ), ξ(v2 ), . . . , ξ(vn ))
= αμ(φ) + βμ(ξ ).

. Then f = 0
This shows that μ is linear. Given that B is a basis of V and let f ∈ V
if and only if f (vi ) = 0 for every i with 1 ≤ i ≤ n. Therefore

μ( f ) = 0 ⇔ ( f (v1 ), f (v2 ), . . . , f (vn )) = 0


⇔ f (vi ) = 0 for each 1 ≤ i ≤ n
⇔ f = 0.

 = dimFn , μ
Hence, K er μ = {0} and therefore μ is one-to-one. Also since dim V
is onto. This completes the proof.
Theorem 4.7 Let V be a vector space over F. Then the following hold:
(i) For any nonzero vector v ∈ V , there exists a linear functional   such that
v∈V

v(v) = 0.
114 4 Dual Spaces

(ii) A vector v ∈ V is zero if and only if  v(v) = 0 for all 


v∈V .
(iii) If v1 , v2 ∈ V such that  v(v1 ) = 
v(v2 ) for all  , then {v1 , v2 } is linearly
v∈V
dependent.
(iv) Let v∈V . If 
v(x) = 0, then V = x ⊕ K er v.
(v) Two nonzero linear functionals v1 , v2 ∈ V  have the same kernel if and only if
there is a nonzero scalar λ such that v1 = λ v2 .

Proof (i) Since v = 0 in V , one can find a basis B = {v = v1 , v2 , . . . , vn } of V .


Hence, there exists a dual basis {  relative to the basis B of
v = v1 , v2 , . . . , vn } of V
V such that  v(v) = v1 (v1 ) = 1 = 0.
(ii) In view of (i), it is obvious.
(iii) If {v1 , v2 } is linearly independent, then {v1 , v2 } can be extended to a basis of V
say B = {v1 , v2 , v3 , . . . , vn }. Let {
v1 , v2 , . . . , vn } be the dual basis of V  relative to
B. Then v2 (v1 ) = 0, but it does not imply that v2 (v2 ) = 0( v2 (v2 ) = 1). Thus {v1 , v2 }
is linearly dependent.
(iv) If v ∈ x ∩ K er v, then  v(v) = 0 and v = αx for some nonzero α, whence

v(v) = α v(x) = 0 implies that  v(x) = 0, a contradiction. Hence x ∩ K er v = {0}.
Now for any v ∈ V ,


v(v)
v(v)

v= x+ v− x ∈ x + K er
v

v(x) 
v(x)

which yields that V = x + K er


v and hence V = x ⊕ K er
v.

(v) If v1 = λ v2 for λ = 0, then K er v1 = K er v2 . Conversely, if K = K er v1 =


K er v2 , then by (iv), for any x ∈ / K , V = x ⊕ K and hence, v1 | K = λ v2 | K for any
scalar λ. Further, if λ = vv 1 (x)
2 (x)
, it follows that λ
v2 (x) = v1 (x) for all x ∈
/ K . Therefore
v1 = λ v2 .

Example 4.8 R3 = {(x, y, z) | x, y, z ∈ R} is a vector space over R. If B = {(−1,


1, 1), (1, −1, 1), (1, 1, −1)} is a basis of R3 , then we shall find the dual basis of B.
Any (x, y, z) ∈ R3 can be written as

y+z x+z x+y


(x, y, z) = (−1, 1, 1) + (1, −1, 1) + (1, 1, −1).
2 2 2
Define
y+z x+z x+y
φ1 (x, y, z) = , φ2 (x, y, z) = , φ3 (x, y, z) = .
2 2 2
3 , and φ1 (−1, 1, 1) = 1, φ1 (1, −1, 1) = 0, φ1 (1, 1, −1) =
Obviously, φ1 , φ2 , φ3 ∈ R
0, φ2 (−1, 1, 1) = 0, φ2 (1, −1, 1) = 1, φ2 (1, 1, −1)=0, φ3 (−1, 1, 1)=0, φ3 (1, −1,
1) = 0, φ3 (1, 1, −1) = 1. Hence, {φ1 , φ2 , φ3 } is the dual basis of B.
4.1 Linear Functionals and the Dual Space 115


n
Remark 4.9 Define a function f : Fn → F such that f (x1 , x2 , . . . , xn ) = ai xi ,
i=1
then it is a linear functional defined on the vector space Fn , which is deter-
mined uniquely by (a1 , a2 , . . . , an ) ∈ Fn relative to the standard ordered basis
B = {e1 , e2 , . . . , en }; f (ei ) = ai , 1 ≤ i ≤ n. Every linear functional on Fn is of this
form for some scalars a1 , a2 , . . . , an , because if B is the standard basis of Fn , f ∈ Fn

n
and f (ei ) = ai , then for any (x1 , x2 , . . . , xn ) ∈ Fn , (x1 , x2 , . . . , xn ) = xi ei yields
i=1
that

n 
n
f (x1 , x2 , . . . , xn ) = xi f (ei ) = ai xi .
i=1 i=1

Example 4.10 If B = {(1, 1, 0), (1, 0, 1), (0, 1, 1)} is a basis of R3 , then in order
to find the dual basis of B, define φ1 , φ2 , φ3 : R3 → R such that

φ1 (x, y, z) = a1 x + b1 y + c1 z,

φ2 (x, y, z) = a2 x + b2 y + c2 z,

φ3 (x, y, z) = a3 x + b3 y + c3 z.

In order to find the dual basis we, need

φ1 (1, 1, 0) = 1, φ1 (1, 0, 1) = 0, φ1 (0, 1, 1) = 0,

φ2 (1, 1, 0) = 0, φ2 (1, 0, 1) = 1, φ2 (0, 1, 1) = 0,

φ3 (1, 1, 0) = 0, φ3 (1, 0, 1) = 0, φ3 (0, 1, 1) = 1.

Now solving the equations φ1 (1, 1, 0) = a1 + b1 = 1, φ1 (1, 0, 1) = a1 + c1 =


0, φ1 (0, 1, 1) = b1 + c1 = 0, we find that a1 = 21 , b1 = 21 , c1 = −1
2
, which shows
that φ1 (x, y, z) = x+y−z
2
. Similarly by solving the equations

φ2 (1, 1, 0) = a2 + b2 = 0, φ2 (1, 0, 1) = a2 + c2 = 1, φ2 (0, 1, 1) = b2 + c2 = 0

and

φ3 (1, 1, 0) = a3 + b3 = 0, φ3 (1, 0, 1) = a3 + c3 = 0, φ3 (0, 1, 1) = b3 + c3 = 1,

we find that a2 = 21 , b2 = −1 2
, c2 = 21 and a3 = −1
2
, b3 = 21 , c3 = 21 , respectively.
−x+y+z
This yields that φ2 (x, y, z) = 2 , φ3 (x, y, z) =
x−y+z
2
. Hence, {φ1 , φ2 , φ3 } is a
basis of R3 dual to the basis B.
116 4 Dual Spaces

Exercises

1. If V is finite dimensional and v1 , v2 ∈ V , where v1 = v2 , then prove that there


exists f ∈ V  such that f (v1 ) = f (v2 ).
2. If u, v ∈ V such that f (u) = 0 implies f (v) = 0, for all f ∈ V  then prove that
v = λu for some λ ∈ F.
3. For f, g ∈ V  if f (v) = 0 implies that g(v) = 0 for all v ∈ V , then show that
{ f, g} is linearly dependent.
4. Let W be a proper subspace of a finite dimensional vector space V and let
v ∈ V \ W . Show that there is a linear functional f ∈ V  for which f (v) = 1
and f (w) = 0 for all w ∈ W .
5. Find out the dual basis of each of the following basis of R3 :
(a) {(1, 1, 1), (1, 1, 0), (1, 0, 0)},
(b) {(1, 2, 3), (2, 3, 1), (3, 2, 1)} and
(c) {(1, −2, 7), (−3, 4, 5), (6, −1, 3)}.
6. Let R2 [x] be the vector space of all real polynomials of degree less than or equal

to two. Find the dual basis of R 2 [x] induced by the following bases of R2 [x]:

(a) {1, x, x 2 },
(b) {1 + x, 1 − x, 1 + x 2 } and
(c) {x, 2 + x, 1 − x − x 2 }.
7. Let V be the vector space of all polynomials over R of degree less than or
equal to 2. Let φ1 , φ2 , φ3 be the linear functionals on V defined by φ1 ( f (t)) =
1
0 f (t)dt, φ2 ( f (t)) = f (1), φ3 ( f (t)) = f (0). Here f (t) denotes the deriva-
tive of f (t). Find a basis { f 1 (t), f 2 (t), f 3 (t)} of V such that its dual is
{φ1 , φ2 , φ3 }.
8. Let V be the vector space of all polynomials over F of degree less than or equal
to 2. Let a, b, c ∈ F be distinct scalars. Let φa , φb , φc be the linear functionals on
V defined by φa ( f (t)) = f (a), φb ( f (t)) = f (b), φc ( f (t)) = f (c). Show that
{φa , φb , φc } is a basis of V and also find its dual basis.
9. Let {e1 , e2 , . . . , en } be the usual basis of Fn . Show that its dual basis is
{π1 , π2 , . . . , πn }, where πi is the ith projection mapping: πi (a1 , a2 , . . . , an ) =
ai .
10. Let W be a subspace of V . For any linear functional φ on W , show that there is
a linear functional f on V such that f (w) = φ(w) for any w ∈ W ; that is, φ is
the restriction of f to W.
11. Let V be a vector space over R. Let φ1 , φ2 ∈ V  and suppose f : V −→ R,
defined by f (v) = φ1 (v)φ2 (v), also belongs to V . Show that either φ1 = 0 or
φ2 = 0.
12. Let V be the vector space of all polynomials over R of degree less than or equal
to 3. Find the dual basis of a basis B = {1, 1 + x, (1 + x)2 , (1 + x)3 } of V.
4.1 Linear Functionals and the Dual Space 117

13. Consider R2 as a vector space over R. Find the formulae for linear functionals f
and g on R2 such that for a fixed θ , f (cosθ, sinθ ) = 1, f (−sinθ, cosθ ) = 2,
g(cosθ, sinθ ) = 2 and g(−sinθ, cosθ ) = 1.

2 .
(a) Prove that B = { f, g} is a basis of R

(b) Find an ordered basis B = {u, v} of R2 such that B is the dual of B.
14. Let n and m be any two positive integers.
(a) For any m linear functionals f 1 , f 2 , . . . , f m on Fn , prove that σ : Fn −→ Fm
given by σ (u) = ( f 1 (u), f 2 (u), . . . , f m (u)) is a linear transformation.
(b) Given any linear transformation T : Fn −→ Fm . Prove that there exist
uniquely determined linear functionals g1 , g2 , . . . , gm depending upon T
such that T (u) = (g1 (u), g2 (u), . . . , gm (u)).
15. Let V be the vector space of all polynomials over R of degree less than or equal to
 
2. Define φ1 , φ2 , φ3 in V such that φ1 ( f (t)) = 1 f (t)dt, φ2 ( f (t)) = 2 f (t)dt
 −1 0 0

and φ3 ( f (t)) = 0 f (t)dt. Show that B = {φ1 , φ2 , φ3 } is a basis of V . Find

a basis B of V , of which B is dual.
16. Let F be a field of characteristic zero and let V be a finite dimensional vector
space over F. If v1 , v2 , . . . , vm are finitely many vectors in V , each different
from the zero vector, prove that there is a linear functional f on V such that
f (vi ) = 0, i = 1, 2, . . . , m.
17. In R3 , let v1 = (1, 0, 1); v2 = (0, 1, −2); v3 = (−1, −1, 0).
(a) If f is a linear functional on R3 such that f (v1 ) = 1, f (v2 ) = −1, f (v3 ) = 3
and if u = (x, y, z), then find f (u).
(b) Describe explicitly a linear functional f on R3 such that f (v1 ) = f (v2 ) = 0
but f (v3 ) = 0.
(c) If f is any linear functional such that f (v1 ) = f (v2 ) = 0 but f (v3 ) = 0 and
if u = (2, 3, −1), then show that f (u) = 0.
18. Let B = {v1 , v2 , v3 } be a basis of C3 defined by v1 = (1, 0, −1), v2 = (1, 1, 1),
v3 = (2, 2, 0). Find the dual basis of B.

4.2 Second Dual Space

For any vector space V , one can consider its dual space V  which contains all the
linear functionals on V. Since V  is also a vector space, consider  ,
V the dual of V
. If V is finite dimensional then V
which contains all the linear functionals on V  and

V are also finite dimensional and dimV = dim V  = dim  V.

Theorem 4.11 (Principal of duality) If V is a vector space over F, then there exists
a canonical isomorphism from V onto  V.
118 4 Dual Spaces

Proof We shall show that each v ∈ V determines a specific element v ∈ 


V . Define
v:V → F such that v( f ) = f (v) for all f ∈ V
. We show that this map is linear.
, we have
For scalars α, β ∈ F, f, g ∈ V

v(α f + βg) = (α f + βg)(v) = α f (v) + βg(v) = αv( f ) + βv(g).

Thus, v is a linear functional on V and hence v ∈  V . Now define the canonical map

σ : V → V such that σ (v) = v. For any α, β ∈ F and v1 , v2 ∈ V, σ (αv1 + βv2 ) =
αv1 + βv2 . It can be easily seen that for any f ∈ V , αv1 + βv2 ( f ) = f (αv1 +
βv2 ) = α f (v1 ) + β f (v2 ) = αv1 ( f ) + βv2 ( f ) = (αv1 + βv2 )( f ).
This shows that αv1 + βv2 = αv1 + βv2 . Hence, using this relation in the pre-
ceding definition of σ , it can be seen that σ (αv1 + βv2 ) = ασ (v1 ) + βσ (v2 ) and σ
is a linear transformation from V to  V . Now if v ∈ K er σ , then σ (v) = v = 0 or
v( f ) = 0 for all f ∈ V . This shows that f (v) = 0 for all f ∈ V . By Theorem 4.7(i),
v = 0 and consequently K er σ = {0} and σ is a monomorphism. This yields that
dimV = dimσ (V ). However, dimV = dim V  = dim  V so that σ (V ) is a subspace
of V such that dimσ (V ) = dim V . This can happen only if σ (V ) = 
  V and hence σ
is onto. This completes the proof of our result.

Remark 4.12 (i) It is to be noted that in the above theorem σ : V −→  V does not
depend upon any particular choice of basis of V , that is why it is called canonical
isomorphism.
(ii) If V is any arbitrary vector space (need not be finite dimensional), even then
σ : V −→  V will exist which is an injective homomorphism, but need not be
onto.

Theorem 4.13 Let V be a vector space over a field F and let  B = { f1 , f2 , . . . , fn }


. Then there exists a basis B = {v1 , v2 , . . . , vn } of V such that 
be a basis of V B is a
dual basis of B.

Proof Since  , by Theorem 4.5, there exists a basis


B = { f 1 , f 2 , . . . , f n } is a basis of V
 
B = {g1 , g2 , . . . , gn } of V dual to  B. Hence gi ( f j ) = 0 for i = j and gi ( f j ) = 1 for
i = j, 1 ≤ i, j ≤ n. Now let σ : V →  V be a canonical isomorphism. Then for any
gj ∈ V , 1 ≤ j ≤ n, there exists v j ∈ V such that σ (v j ) = g j . Then {v1 , v2 , . . . , vn }
is a basis of V. Now for 1 ≤ i, j ≤ n,

f i (v j ) = σ (v j ) ( f i ) = g j ( f i ) = δi j ,
1 if i = j
where δi j is the Kronecker delta, i.e., δi j = . This yields that 
B is the
0 i f i = j
dual basis of B.

Example 4.14 Consider the vector space R2 [x], i.e., the vector space of all poly-
nomials of degree less than or equal to two over R and let ψ1 , ψ2 , ψ3 : R2 [x] → R
4.2 Second Dual Space 119
1
such that ψ1 ( f (x)) = 0 f (x)d x, ψ2 ( f (x)) = f (x), ψ3 ( f (x)) = f (0). Now we
find the basis of R2 [x] dual to {ψ1 , ψ2 , ψ3 }.
Let {a1 + b1 x + c1 x 2 , a2 + b2 x + c2 x 2 , a3 + b3 x + c3 x 2 } be the required basis
of R2 [x], then
b1 c1
ψ1 (a1 + b1 x + c1 x 2 ) = a1 + + = 1,
2 3
b2 c2
ψ1 (a2 + b2 x + c2 x 2 ) = a2 + + = 0,
2 3
b3 c3
ψ1 (a3 + b3 x + c3 x 2 ) = a3 + + = 0,
2 3

ψ2 (a1 + b1 x + c1 x 2 ) = b1 + 2c1 = 0,

ψ2 (a2 + b2 x + c2 x 2 ) = b2 + 2c2 = 1,

ψ2 (a3 + b3 x + c3 x 2 ) = b3 + 2c3 = 0,

ψ3 (a1 + b1 x + c1 x 2 ) = b1 = 0,

ψ3 (a2 + b2 x + c2 x 2 ) = b2 = 0,

ψ3 (a3 + b3 x + c3 x 2 ) = b3 = 1.

This yields that b1 = 0, c1 = 0, a1 = 1, b2 = 0, c2 = 21 , a2 = −1


6
, b3 = 1, c3 =
−1 −b3 x2
2
, a3 = 2 − 2 = − 2 + 6 = − 3 . Hence, {1, − 6 + 2 , − 3 + x − 21 x 2 } is the
c3 1 1 1 1 1

required basis of R2 [x].

4.3 Annihilators

Definition 4.15 Let V be a vector space over a field F and S be a subset of V. Then
 is defined as the collection of all f ∈ V
annihilator S ◦ of S in V  such that f (s) = 0
◦ 
for all s ∈ S, i.e., S = { f ∈ V | f (s) = 0 for all s ∈ S}.

Remark 4.16 (i) For a subset S of V , the annihilator S ◦ of S is defined as the set
of all v ∈ V such that f (v) = 0 for all f ∈ S.
(ii) In a finite dimensional vector space V if 0 = v ∈ V, then we have seen that
there exists f ∈ V such that f (v) = 0. This shows that V ◦ contains no nonzero
functional, i.e., V ◦ = {0}. It is also clear that if a subset S of V contains the
zero vector alone, then S ◦ = V .
120 4 Dual Spaces

(iii) For any subset S of a vector space V, (S ◦ )◦ can be viewed as a subspace of V


under the identification of V and 
V , i.e., (S ◦ )◦ = {v ∈ V | f (v) = 0 for every

f ∈ S }.

Lemma 4.17 Let V be a vector space over a field F. Then


(i) For any subsets S1 , S2 of V if S1 ⊆ S2 , then S2◦ ⊆ S1◦ .
(ii) For any subset S of V , S ◦ is a subspace of V and S ⊆ (S ◦ )◦ .
◦ ◦
(iii) For any subset S of V , S = (L(S)) .

Proof (i) Suppose that S1 ⊆ S2 and f ∈ S2◦ . Then for any v ∈ S1 , f (v) = 0 and
consequently, f ∈ S1◦ which completes the required proof.

(ii) Since for any v ∈ S, 0(v) = 0, we find that 0 ∈ S ◦ and therefore S ◦ = φ. Let
f, g ∈ S ◦ and α, β ∈ F. Then for every v ∈ S

(α f + βg)v = α f (v) + βg(v) = α0 + β0 = 0,

which shows that S ◦ is a subspace of V . Now let v ∈ S. Then for every linear func-
tional f ∈ S ◦ , v ∈ (S ◦ )◦ and under the identification of V
v( f ) = f (v) = 0. Hence 
 ◦ ◦
and V , v ∈ (S ) .

(iii) Since S ⊆ L(S), we find that (L(S))◦ ⊆ S ◦ . Conversely, suppose that f ∈


S ◦ , i.e., f (s) = 0 for all s ∈ S. Now for any α1 , α2 , . . . , αn ∈ F, α1 v1 + α2 v2 +
· · · + αn vn ∈ L(S) for all v1 , v2 , . . . , vn ∈ S. Then f (α1 v1 + α2 v2 + · · · + αn vn ) =
α1 f (v1 ) + α2 f (v2 ) + · · · + αn f (vn ) = 0 and hence f ∈ (L(S))◦ . This completes
the proof.

Theorem 4.18 Let V be a vector space over a field F and W a subspace of V . Then
(i) dimW + dimW ◦ = dimV,
∼ 
(ii) W = WV◦ and
(iii) (W ◦ )◦ = W.

Proof (i) Let {w1 , w2 , . . . , wm } be a basis of W . Then it can be extended to a basis of


V say B = {w1 , w2 , . . . , wm , wm+1 , . . . , wn }. Now let 
B = { w1 , 
w2 , . . . , 
wm , 
wm+1 ,
..., wn } be a basis of V  dual to B. Then for m + 1 ≤ k ≤ n and 1 ≤ j ≤ m,
wk (w j ) = 0. This shows that 
 wk (w) = 0 for all w ∈ W and m + 1 ≤ k ≤ n, and
hence { wn } ⊆ W ◦ . Now we show that {
wm+1 , . . . ,  wn } is a basis of W ◦ .
wm+1 , . . . , 
This is a linearly independent subset with n − m elements. Suppose that  w is an
arbitrary member of W ◦ . Since  w∈V , and  B is a basis of V , we find that  w=
n n

w(wi ) wi . But since w j ∈ W for all 1 ≤ j ≤ m, we arrive at  w= 
w(wi )wi .
i=1 i=m+1
wn } spans W ◦ . Hence dimW ◦ = n − m = n − dimW ,
wm+1 , . . . , 
This yields that {
i.e., dimW + dimW ◦ = dimV.
4.3 Annihilators 121

(ii) Given that W is a subspace of V. Suppose that f ∈ V  and f |W , the restriction of f


to W. Then it is straightforward to see that f |W ∈ W  . Now define a map ψ : V →W 
such that ψ( f ) = f |W . It is clear that for any α, β ∈ F and f, g ∈ V ,

ψ(α f + βg) = (α f + βg)|W = α f |W + βg|W = αψ( f ) + βψ(g).

This shows that ψ is a vector space homomorphism. Now if f ∈ K er ψ, then the


restriction of f to W must be zero, i.e., f (w) = 0 for all w ∈ W or f ∈ W ◦ . Con-
versely, if f ∈ W ◦ , i.e., f (w) = 0 for all w ∈ W , then f |W = 0 and f ∈ K er ψ.
Hence K er ψ = W ◦ . Now we show that ψ is onto. Then we show that any given
h∈W  is the restriction of some f ∈ V . Let {w1 , w2 , . . . , wm } be a basis of W .
Then it can be extended to a basis of V say {w1 , w2 , . . . , wm , u 1 , u 2 , . . . , u r }, where
m + r =dimV . Hence, we can write V = W ⊕ U , where U is a subspace of V
 , define ξ ∈ V
spanned by {u 1 , u 2 , . . . , u r }. Now for any h ∈ W  such that for any
v ∈ V, v = w + u and ξ(v) = h(w), where w ∈ W, u ∈ U . Let v = v and suppose
that v = w + u , v = w + u where w , w ∈ W and u , u ∈ U . This implies
that w = w and u = u . As h is a map, we get h(w ) = h(w ). This implies that
ξ(v ) = ξ(v ) and hence ξ is well defined. It can be easily seen that ξ is a linear
functional whose restriction on W is h, i.e., ψ(ξ ) = ξ |W = h. Hence, ψ is onto and
∼  ∼ 
by fundamental theorem of vector space homomorphism W = K er
V
ψ
, i.e., W = WV◦ .

(iii) Suppose that dimW = m and dimV = n. Then by the above (i), we find that
dimW ◦ = n − m. Therefore,

 − dimW ◦ = dimV − dimW ◦ = n − (n − m) = m = dimW.


dim(W ◦ )◦ = dim V

Since by Lemma 4.17 W ⊆ (W ◦ )◦ , we find that W = (W ◦ )◦ .


Proposition 4.19 If W1 , W2 are subspaces of a finite dimensional vector space V ,
then
(i) (W1 + W2 )◦ = W1◦ ∩ W2◦ ;
(ii) (W1 ∩ W2 )◦ = W1◦ + W2◦ .
Proof (i) Since W1 ⊆ W1 + W2 and W2 ⊆ W1 + W2 , by Lemma 4.17, (W1 +
W2 )◦ ⊆ W1◦ ∩ W2◦ . Now, on the other hand, suppose that ϕ ∈ (W1◦ ∩ W2◦ ). Then ϕ
annihilates both W1 and W2 . If v ∈ W1 + W2 , then v = w1 + w2 , where w1 ∈ W1
and w2 ∈ W2 . Now ϕ(v) = ϕ(w1 ) + ϕ(w2 ) = 0. This shows that ϕ annihilates
W1 + W2 , i.e., ϕ ∈ (W1 + W2 )◦ . Therefore, (W1◦ ∩ W2◦ ) ⊆ (W1 + W2 )◦ and hence
(W1 + W2 )◦ = W1◦ ∩ W2◦ .

(ii) Replacing W1 by W1◦ and W2 by W2◦ in (i) and using Theorem 4.18(iii),
we get (W1◦ + W2◦ )◦ = (W1◦ )◦ ∩ (W2◦ )◦ = W1 ∩ W2 and hence ((W1◦ + W2◦ )◦ )◦ =
(W1 ∩ W2 )◦ . This implies that (W1 ∩ W2 )◦ = W1◦ + W2◦ .
Remark 4.20 Observe that no dimension argument is employed in the proof (i),
hence the above result (i) holds for vector spaces of finite or infinite dimensions.
122 4 Dual Spaces

4.4 Hyperspaces or Hyperplanes

Definition 4.21 Let V be a vector space over F. Then a maximal proper subspace
of V is called a hyperspace or hyperplane of V .

Example 4.22 (1) Consider Rn , n ≥ 2, as a vector space over R. Then the sub-
spaces W1 = {(α1 , α2 , . . . , αn−1 , 0) | αi ∈ R, i = 1, 2, . . . , n − 1} and W2 =
{(0, β1 , . . . , βn−1 ) | βi ∈ R, i = 1, 2, . . . , n − 1} are hyperspaces of Rn .
(2) Consider Pn (x), n ≥ 1, as the vector space of all real polynomials of degree
at most of degree n over the field R. Then the subspaces W1 = {α1 x + α2 x 2 +
· · · + αn x n | αi ∈ R, i = 1, 2, . . . , n} and W2 = {β1 + β2 x + · · · + βn−1 x n−1 |
βi ∈ R, i = 1, 2, . . . , n − 1} are hyperspaces of Pn (x).

Theorem 4.23 Let V be an n-dimensional vector space over a field F, where n ≥ 2.


Then a subspace W of V will be a hyperspace of V if and only if dim W = n − 1.

Proof Suppose that the subspace W is a hyperspace of V . As W is a proper subspace


of V , hence 0 < dim W < dim V =n. We claim that dim W = n − 1. Suppose
on contrary, 0 < dim W = m < n − 1. Since W is a proper subspace of V , there
exists an element v ∈ V \ W . Consider W1 , the subspace of V generated by v, i.e.,
W1 = [v]. It is obvious that W ⊂ W + W1 ⊆ V . Since W + W1 is a subspace of V
and W ∩ W1 = {0}, we have dim W + W1 = dim W +dim W1 − dim W ∩ W1 = dim
W + dim W1 = m + 1 < n. But W is a maximal subspace of V also; this forces us
to conclude that W + W1 = V and hence dim W + W1 = dim V = n, which leads
to a contradiction. Thus we conclude that dim W = n − 1.
Conversely suppose that dim W = n − 1. We have to show that W is a hyperspace
of V . By our hypothesis, it is clear that 0 < dim W < n. This shows that W is a proper
subspace of V . Next we prove that W is also a maximal subspace of V . For this, let
W2 be a subspace of V such that W ⊆ W2 ⊆ V . Then we have to prove that either
W2 = W or W2 = V . If W2 = W , then there is nothing to do. If W2 = W , then there
exists v ∈ W2 \ W . Consider W3 , the subspace of V generated by v , i.e., W3 = [v ]
and arguing in the similar way as above one can show that dim W + W3 = n − 1 + 1
= n. But as W + W3 is a subspace of V , we conclude that W + W3 = V. We also
have W + W3 ⊂ W2 . Hence, we conclude that V ⊂ W2 . Finally we get V = W2 .
Thus W is a hyperspace of V.

Theorem 4.24 If f is a nonzero linear functional on a vector space V , then the null
space of f is a hyperspace of V . Conversely, every hyperspace of V is the null space
of a (not unique) nonzero linear functional on V .

Proof Let f be a nonzero linear functional on the vector space V and W the null
space of f. We have to show that W is a hyperspace of V. It is obvious that W = V.
Also W = {0} as dim V > 1 if V is finite dimensional. This shows that W is a
proper subspace of V. To prove that W is also a maximal subspace of V , let W1 be a
subspace of V such that W ⊆ W1 ⊆ V . Then we have to prove that either W1 = W
or W1 = V. If W1 = W , then there is nothing to do. If W1 = W , then there exists
4.4 Hyperspaces or Hyperplanes 123

v ∈ W1 \ W. Now consider the subspace W + [v] = {w + λv | w ∈ W, λ ∈ F}. It


is clear that 0 = f (v) ∈ F. Let x ∈ V. We can write x = (x − f (x) f (v)−1 v) +
f (x) f (v)−1 v. We also have f (x − f (x) f (v)−1 v) = f (x) − f (x) f (v)−1 f (v) = 0.
Thus (x − f (x) f (v)−1 v) ∈ null space of f = W. This ensures that x ∈ W + [v], i.e.,
V ⊆ W + [v]. But it is given that W + [v] ⊆ W1 . Hence, we conclude that V ⊆ W1 ,
i.e., W1 = V. Thus W is a hyperspace of V.
Conversely, suppose that W is a hyperspace of V . Then we have to construct a
nonzero linear functional g on V such that null space of g = W , as {0} = W = V .
Thus there exists v ∈ V \ W. Since W + [v ] is a subspace of V and W ⊂ W +
[v ], therefore using the maximality of W, we conclude that V = W + [v ]. Let
z ∈ V. Then z = w + αv for some w ∈ W, α ∈ F. We claim that for any z ∈ V,
corresponding w ∈ W and α ∈ F are unique. To prove this claim, let us suppose that
z = w1 + α1 v and z = w2 + α2 v , where w1 , w2 ∈ W and α1 , α2 ∈ F. This implies
that w1 + α1 v = w2 + α2 v , i.e., w1 − w2 = (α2 − α1 )v . From previous relations,
we conclude that α1 = α2 , for otherwise we have v = (α2 − α1 )−1 (w1 − w2 ) ∈ W ,
leading to a contradiction. It is now obvious that w1 = w2 . Now define a map g :
V −→ F such that g(z) = α, which is obviously well defined. Now we show that g is
a linear functional on V and null space of g = W. Let z 1 = w1 + α1 v and z 2 = w2 +
α2 v , where w1 , w2 ∈ W and α1 , α2 ∈ F. Then λz 1 + μz 2 = (λw1 + μw2 ) + (λα1 +
μα2 )v , for any λ, μ ∈ F. Hence g(λz 1 + μz 2 ) = λα1 + μα2 = λg(z 1 ) + λg(z 2 ).
This proves that g is a linear functional on V . It is obvious to observe that null space
of g is W and g = 0.
Lemma 4.25 If f and g are linear functionals on a vector space V , then g is a
scalar multiple of f if and only if the null space of g contains the null space of f ,
i.e., if and only if f (v) = 0 implies g(v) = 0, where v ∈ V .
Proof We divide the proof in two cases.
Case I: If at least one of f and g is the zero linear functional, then lemma holds
trivially.
Case II: f = 0 and g = 0. Suppose that g = α f , for some 0 = α ∈ F. Let f (v) = 0
for some v ∈ V , as we are given g(v) = α f (v). This shows that g(v) = α0 = 0.
Thus we obtain g(v) = 0. Conversely, suppose that the null space of g contains
the null space of f . Since f = 0, there exists v ∈ V such that f (v ) = 0. Also
assume that the null space of f is N and therefore v ∈ V \ N . By the previous
theorem N will be a maximal subspace of V. Let β = g(v )( f (v ))−1 . Now define
the linear functional h on V , given by h = g − β f. Next we show that h is the
zero linear functional on V. Consider the subspace W of V , which is spanned by
N ∪ {v }, i.e., [N ∪ {v }]. Since N is a maximal subspace of V and v ∈ V − N ,
therefore we conclude that V = [N ∪ {v }] = N + [v ] = {x + λv | x ∈ N , λ ∈ F}.
Thus h(x + λv ) = (g − β f )(x + λv ) = g(x + λv ) − β f (x + λv ). Now using the
fact that the null space of g contains the null space of f , we have h(x + λv ) =
λg(v ) − βλ f (v ) = λg(v ) − λ( f (v ))−1 f (v )g(v ) = 0 for all x ∈ N and for all
λ ∈ F. This implies that h equals the zero linear functional on V . Finally, we conclude
that g − β f = 0, i.e., g = β f. Hence, we have shown that g is a scalar multiple
of f.
124 4 Dual Spaces

Theorem 4.26 Let f, f 1 , f 2 , . . . , fr be linear functionals on a vector space V


with respective null spaces N , N1 , N2 , . . . , Nr . Then f is a linear combination of
f 1 , f 2 , . . . , fr if and only if N1 ∩ N2 ∩ · · · ∩ Nr ⊆ N .

Proof Let f = α1 f 1 + α2 f 2 + · · · + αr fr , for some α1 , α2 , . . . , αr ∈ F. For any v ∈


N1 ∩ N2 ∩ · · · ∩ Nr , we have f (v) = (α1 f 1 + α2 f 2 + · · · + αr fr )(v) = α1 f 1 (v) +
α2 f 2 (v) + · · · + αr fr (v) = 0. This implies that v ∈ N . Thus we proved N1 ∩ N2 ∩
· · · ∩ Nr ⊆ N .
We shall prove the converse by induction on the number r. If r = 1, then
the result holds by the previous lemma. Suppose the result holds for r = k −
1, and let f 1 , f 2 , . . . , f k be linear functionals with null spaces N1 , N2 , . . . , Nk
such that N1 ∩ N2 ∩ · · · ∩ Nk ⊆ N . Let f , f 1 , f 2 , . . . , f k−1
be the restrictions of

f, f 1 , f 2 , . . . , f k−1 to the subspace Nk . Then f , f 1 , f 2 , . . . , f k−1 are linear func-
tionals on the vector space Nk . Furthermore, if v ∈ Nk and v is an element of
the null space of f i , i = 1, 2, . . . , k − 1, i.e., f i (v) = 0, i = 1, 2, . . . , k − 1, then
v ∈ N1 ∩ N2 ∩ · · · ∩ Nk because the null space of f i ⊆ Ni , i = 1, 2, . . . , k − 1 and
thus by our hypothesis v ∈ N , i.e., f (v) = 0. We also get f (v) = f (v) = 0. Now
by the induction hypothesis, there exist scalars βi , i = 1, 2, . . . , k − 1 such that
f = β1 f 1 + β2 f 2 + · · · + βk−1 f k−1
. Now define a map h : V −→ F such that

k−1
h(x) = ( f − βi f i )(x) for all x ∈ V . It is easy to verify that h is a linear func-
i=1
tional on V . Let u be an element of the null space of f k . Thus h(u) = ( f −

k−1 
k−1
βi f i )(u) = f (u) − βi f i (u). Now using the facts that u ∈ Nk , f (u) = f (u),
i=1 i=1

k−1
f 1 (u) = f 1 (u), . . . ,
f k−1 (u) = f k−1 (u), we have h(u) = f (u) − βi f i (u) = 0.
i=1
This proves that u is an element of the null space of h also. By the preceding lemma,

k−1
h is a scalar multiple of f k . If h = βk f k for some βk ∈ F, then f − βi f i = βk f k ,
i=1

k
i.e., f = βi f i .
i=1

Example 4.27 Let n be a positive integer and F a field. Suppose that W is the set of all
vectors (x1 , x2 , . . . , xn ) ∈ Fn such that x1 + x2 + · · · + xn = 0. Then it can be seen

n
that W ◦ consists of all linear functionals f of the form f (x1 , x2 , . . . , xn ) = c xi .
i=1

CaseI : Suppose that char (F) = 2. If f ∈ W ◦ , then clearly f ∈ Fn . But we



n
know that f (x1 , x2 , . . . , xn ) = ci xi , for some fixed c1 , c2 , . . . , cn ∈ F precisely.
i=1
It is obvious that (1, −1, 0, 0, . . . , 0, 0) ∈ W. Hence f (1, −1, 0, 0, . . . , 0, 0) = 0. It
implies that c1 = c2 . Similarly (0, 1, −1, 0, . . . , 0, 0) ∈ W. As a result f (0, 1, −1, 0,
. . . , 0, 0) = 0, which implies that c2 = c3 . Arguing in the same way, we conclude
that c1 = c2 = · · · = cn . Let us say each of the previous ci to be c ∈ F. Thus, we
4.4 Hyperspaces or Hyperplanes 125


n
have proved that f (x1 , x2 , . . . , xn ) = c xi .
i=1

CaseII : Suppose that char (F) = 2. Now in this case x = −x for all x ∈ F. It is
easy to observe that the proof given in Case I holds in Case II also.

Exercises

1. If F is the field of real numbers, find W ◦ and its dimension, where


(a) W is the subspace of R3 spanned by (1, 2, −1) and (1, 1, 0).
(b) W is the subspace of R4 spanned by (0, 0, 1, 1), (−1, −2, 3, 1) and (1, 0,
0, 3).
  ◦ 
2. If V = W1  = W1◦
W2 , then prove that V W2 , and hence deduce that WV◦ ∼
=
1

W2 .
3. Let W1 , W2 be subspaces of a vector space V . Then show that W 1 ⊕ W2 =


W1 ⊕ W2 .
V ∼
4. Let W be a subspace of V . Prove that W = W ◦.
5. Let V = R and W be the subspace spanned by v1 = (2, −2, 3, 2), v2 =
4

(3, −3, 1, 1).


(a) Find dim W and dim W ◦ , by explicitly giving their respective basis. Verify
that 4 = dim W + dim W ◦ .
(b) Let f ∈ V be given as f (x1 , x2 , x3 , x4 ) = x1 + x2 + x3 − x4 . Is f ∈ W ◦ ?

6. Let V be the vector space of all polynomials over R of degree less than or equal
to 3 and W be the subspace of V , consisting of those polynomials p(x) ∈ V
such that p(1) = 0, p(−1) = 0. Find dim W and dim W ◦ . ⎡ ⎤
122 1
⎢1 2 2 2 ⎥
7. Let W1 and W2 be the row space and the column space of A = ⎢ ⎣ 2 4 3 3 ⎦,

0 0 1 −1
respectively.

4 that belong to W ◦ .
(a) Find the general formula for those f ∈ R 1
4 that belong to W ◦ .
(b) Find the general formula for those f ∈ R 2

8. For any A, B ∈ Mn (F), the vector space of n × n matrices over a field F, prove
that
(a) tr (AB) = tr (B A),
(b) if A and B are similar, then tr (A) = tr (B),
(c) there exist no two matrices A and B in M2 (R), such that AB − B A = I2 ,
where I2 is the identity matrix of order 2.
126 4 Dual Spaces

9. Let V = M2 (R) be the vector space of all 2 × 2 matrices with real entries and
W is the subspace
 of V consisting of those A ∈ V such that AB = B A, where
13  such
B= . Find dim W and dim W ◦ . Does there exist a nonzero f ∈ V
26


00 
that f (I2 ) = 0, f = 0 and f ∈ W ◦ .
01
10. Find a basis of the annihilator W ◦ of the subspace W of R4 spanned by
(1, 2, −3, 4) and (0, 1, 4, −1).
11. Let W be the subspace of R5 , which is spanned by the vectors v1 = e1 + 2e2 +
e3 , v2 = e2 + 3e3 + 3e4 + e5 , v3 =e1 + 4e2 + 6e3 + 4e4 + e5 , where {e1 , e2 , e3 ,
e4 , e5 } is the standard basis of R5 . Find a basis for W ◦ .
12. Let V =M2 (R) be  the vector space of all 2 × 2 matrices with real entries and
2 −2
let B = . Let W be the subspace of V consisting of those A ∈ V such
−1 1
that AB = 02×2 . Let f be a linear functional on V which is in the annihilator
of W. Suppose that f (I  2 ) = 0 and f (C) = 3, where I2 is the identity matrix of
00
order 2 × 2 and C = . Find f (B).
01
13. Let S be a set, F a field and V (S; F) the vector space of all functions from S
into F, where operations are defined as follows: ( f + g)(x) = f (x) + g(x);
(α f )(x) = α f (x). Let W be any n-dimensional subspace of V (S; F). Show that
there exist points x1 , x2 , . . . , xn in S and functions f 1 , f 2 , . . . , f n in W such that
f i (x j ) = δi j .
14 If W is a subspace of a finite dimensional vector space V and if { f 1 , f 2 , . . . , fr }
is any basis for W ◦ , then prove that W = ∩ri=1 Ni , where Ni is the null space of
f i , i = 1, 2, . . . , r.

4.5 Dual (or Transpose) of Linear Transformation

. For a
For a given vector space V over a field F, one can always find its dual space V

given linear transformation T : V → W , is it possible to find a linear map T : W→

V ? In the present section, we shall discuss properties of such linear maps. In fact, if
T : V → W is a linear transformation, then for any f ∈ W  , the composition f ◦ T :
V → F given by f ◦ T (v) = f (T (v)) for all v ∈ V defines a linear transformation.
In fact, for any u, v ∈ V and α, β ∈ F, f ◦ T (αu + βv) = α f ◦ T (u) + β f ◦ T (v)
and hence f ◦ T ∈ V .

Definition 4.28 If V, W are vector spaces over F and T : V → W is a linear trans-


formation, and f ∈ W  , then a map T : W→V  given by T
( f ) = f ◦ T is called
dual or transpose of T ; it is also denoted by T .
t

 , α, β ∈ F, and v ∈ V
For any f, g ∈ W
4.5 Dual (or Transpose) of Linear Transformation 127

(α f + βg))(v) =
(T ((α f + βg) ◦ T )(v)
= (α f + βg)(T (v))
= α f (T (v)) + βg(T (v))
= α( f ◦ T )(v) + β(g ◦ T )(v)
= (α( f ◦ T ) + β(g ◦ T ))(v)
= ( f ) + β T
(α T (g))(v).

(α f + βg) = α T
This shows that T ( f ) + β T
(g), and hence T
: W
→V  is a linear
transformation. This implies that if T ∈ H om(V, W ), then T ∈ H om(W
, V
).

Example 4.29 Let f be the linear functional on R3 defined by f (x, y, z) = x −


( f ), in each case if T : R2 −→ R3 is a linear transformation
2y + 2z. Evaluate T
defined as
(1) T (x, y) = (3x − y, x + 2y, x − y),
(2) T (x, y) = (−2x + y, 3y, x + 6y) and
(3) T (x, y) = (2x − y, x − 2y, −x).
Solution:
( f )(x, y) = ( f ◦ T )(x, y) = f (T (x, y)) = f (3x − y, x + 2y, x − y)
(1) T
= 3x − y − 2x − 4y + 2x − 2y = 3x − 7y.
( f )(x, y) = ( f ◦ T )(x, y) = f (T (x, y)) = f (−2x + y, 3y, x + 6y)
(2) T
= −2x + y − 6y + 2x + 12y = 7y.
( f )(x, y) = ( f ◦ T )(x, y) = f (T (x, y)) = f (2x − y, x − 2y, −x)
(3) T
= 2x − y − 2x + 4y − 2x = −2x + 3y.

Lemma 4.30 If U, V are vector spaces over F and T1 , T2 ∈ H om(U, V ); α ∈ F,


then
(i) 
0 = 0, where 0 stands for the zero linear transformation.
(ii) IU = IU , where IU and IU stand for the identity linear operators on the vector
spaces U and U , respectively.

(iii) (T1 + T2 ) = T1 + T2 .


(iv) (αT 
1 ) = α T1 .

Proof (i) Since 0 : U −→ V, hence  0:V −→ U . If f ∈ V, and u ∈ U, then


(
0( f ))(u) = ( f ◦ 0)(u) = f (0(u)) = f (0) = 0 = 0(u). This shows that  0( f ) = 0,
i.e., 0 = 0.
(ii) We know that IU : U −→ U , thus IU : U  −→ U . For any f ∈ U
 and u ∈ U ,
we have ( IU ( f ))(u) = ( f ◦ IU )(u) = f (u). This implies that ( IU ( f )) = f , i.e.,
( IU ( f )) = IU ( f ). Finally, we have obtained IU = IU .

(iii) If T1 + T2 : U −→ V , then T   
1 + T2 : V −→ U . If f ∈ V , and u ∈ U , then

((T1 + T2 )( f ))(u)=( f ◦ (T1 + T2 ))(u) = f ((T1 + T2 )(u)) = f (T1 (u) + T2 (u)) =
( f ◦ T1 + f ◦ T2 )(u). This implies that (T 1 + T2 )( f ) = ( f ◦ T1 + f ◦ T2 ), i.e.,
128 4 Dual Spaces

(T     
1 + T2 )( f ) = T1 ( f ) + T2 ( f ). Finally, we get (T1 + T2 )( f ) = ( T1 + T2 )( f ), i.e.,
 
(T1 + T2 ) = T1 + T2 . 
(iv) Since αT1 : U −→ V , hence αT 1 : V −→ U . If f ∈ V , and u ∈ U , then

((αT 1 )( f ))(u) = ( f ◦ (αT1 ))(u) = f (αT1 )(u) = f (α(T1 (u))) = α( f (T1 (u)). This
gives us ((αT  
1 )( f ))(u) = (α( f ◦ T1 ))(u), i.e., αT1 ( f ) = (α( f ◦ T1 )). Thus, we
 
conclude that (αT1 )( f ) = (α(T1 ( f ))), i.e., (αT1 )( f ) = (α T1 )( f ). This shows that


(αT 
1 ) = α T1 .

Theorem 4.31 Let U, V, W be vector spaces over F. If T : U −→ V and S : V −→



W are linear transformations, then (S ◦
◦ T) = T S.
Proof Since S ◦ T : U −→ W , S ◦T :W  −→ U . We also have T: V −→ U  and

S : W −→ V , as a result we get T ◦ S : W −→ U . Thus, both S ◦ T and T ◦ 
        S are
linear transformations from W  to U. To prove the equality of these two maps, let

w∈W  . Then (S ◦ T )(
w) =  w ◦ S) ◦ T = (
w ◦ (S ◦ T ) = ( w)) ◦ T. If we assume
S(

S(w) = v∈V , then we arrive at (S  ◦ T )(
w) = 
v◦T =T (
v) = T(S(w)) = (T◦

S)( 
w). Finally, we conclude that (S ◦ T) = T◦ S.
Lemma 4.32 If U, V are vector spaces over F and T : U −→ V is an invertible
 is also an invertible linear transformation and (T
linear transformation, then T )−1 =

(T −1 ).

Proof Since T : U −→ V is an invertible linear transformation, there exists the


invertible linear transformation T −1 : V −→ U such that T −1 T = IU , T T −1 = I V ,
where IU and I V are the identity linear operators on the vector spaces U and V ,
respectively. We also have T : V −→ U  and (T  −→ V
−1 ) : U . Using Theorem
4.31 and Lemma 4.30, we have T 
(T 
−1 ) = I ; (T
  = I V . Previous relations
−1 ) T
U
show that T  is invertible and (T −1 
) = (T ). −1

Theorem 4.33 Let U and V be vector spaces over a field F and let T be a linear
transformation from U to V. Then null space or kernel of T  is the annihilator of
range of T. Further if U and V are finite dimensional, then
) = r (T ).
(i) r (T
(ii) the range of T is the annihilator of the null space of T.

Proof We first prove that N (T : V


) = (T (U ))◦ . Since T  −→ U
, we find that

) ⇔
f ∈ N (T ( f ) = 0, the zero linear functional on U
T
⇔ f ◦T =0
⇔ ( f ◦ T )(u) = 0 ∈ F, for all u ∈ U
⇔ f (T (u)) = 0, for all T (u) ∈ T (U )
⇔ f ∈ (T (U ))◦ .

) = (T (U ))◦ .
Hence N (T
4.5 Dual (or Transpose) of Linear Transformation 129

(i) Now let dim U = n, dim V = m and r (T ) = dim (T (U )) = p. Since T (U )


is a subspace of V , using Theorem 4.18, we find that dim (T (U ))◦ = dim V –
dim (T (U )) = m − p. But since N (T ) = dim
) = (T (U ))◦ , we arrive at dim N (T
◦ ) = m − p. Now, using the relation r (T
(T (U )) , i.e., n(T ) + n(T) = dim V, we
) =dimV
obtain r (T  − n(T
) = m − (m − p) = p = r (T ).

(ii) We have to show that T (V (V


) = (N (T ))◦ . For this, let f ∈ T ). This implies that
f =T (g), for some g ∈ V , i.e., f = g ◦ T. If u ∈ N (T ), then f (u) = (T
(g))(u) =
(g ◦ T )(u) = g(T (u)) = g(0) = 0. This shows that f ∈ (N (T ))◦ and hence T (V) ⊆
(N (T ))◦ . But since dim T(V
) = r (T) = r (T ) = p and N (T ) is a subspace of U , by
Theorem 4.18, we find that dim (N (T ))◦ = dim U – dim N (T ) = n − (n − p) = p.
(V
Finally, it follows that T ) = (N (T ))◦ .

Theorem 4.34 Let U and V be finite dimensional vector spaces over a field F. Let
B1 be an ordered basis for U with dual basis
B1 , and let B2 be an ordered basis for
V with dual basis
B2 . Let T : U −→ V be a linear transformation, let [T ] BB21 be the

]
matrix of T relative to B1 ,B2 and let [T B2
be the matrix of T relative to B1 ,
B2 .
B1

]
Then [T B2
= the transpose of [T ] BB21 .
B 1

Proof Let B1 = {u 1 , u 2 , . . . , u n }, B2 = {v1 , v2 , . . . , vm } and B1 = { f 1 , f 2 , . . . , f n };



B2
B2 = {g1 , g2 , . . . , gm }. Suppose that [T ] B1 = (αi j )m×n and [T ]
B2
= (βi j )m×n . By
B1
m m
definition, we have T (u j ) = (g j ) =
αi j vi and T βi j f i , j = 1, 2, . . . , n. On the
i=1 i=1

m m 
m
(gs ))(u r ) = gs (T (u r )) = gs
other hand, (T αkr vk = αkr gs (vk ) = αkr δsk
k=1 k=1 k=1
= αsr where r = 1, 2, . . . , n; s = 1, 2, . . . , m. For any linear functional f on U , we
m
have f = (gs ) and use
f (u r )( f s ). If we apply this formula to the functional f = T
s=1

m
the fact that (T (gs ) =
(gs ))(u r ) = αsr , we have βr s = αsr , because T βr s f s also,
r =1
where r = 1, 2, . . . , n; s = 1, 2, . . . , m. This implies that (βi j )m×n = the transpose

]
of (αi j )m×n , i.e., [T B2
= the transpose of [T ] BB21 .
B 1

Exercises

1. Let f be the linear functional on R3 defined by f (x, y, z) = 2x − 3y + z. For


( f ))(x, y, z) :
each of the following linear operators on R3 , find (T
(a) T (x, y, z) = (x − 3y + z, −2x + y + z, 2x − z),
(b) T (x, y, z) = (−2x + y + z, x − 2y, x + z) and
(c) T (x, y, z) = (−2x + y, x − 2z, y + z).
130 4 Dual Spaces

2. Let V be a finite dimensional vector space. Then prove that the linear transfor-
mation T : V −→ V is nonsingular if and only if its transpose T : V −→ V  is
nonsingular.
3. Let V be the vector space of all polynomial functions over the field of real numbers.
Let a and b be fixed real numbers and let f be the linear functional on V defined
b
by f ( p(x)) = a p(x)d x, where p(x) ∈ V . If D is the differentiation operator
 f ).
on V , then find D(
4. Let V be the vector space of all n × n matrices over a field F and let B be a fixed
n × n matrix. If T is the linear operator on V defined by T (A) = AB − B A, and
( f ).
if f is the trace function, then find T
5. Let V be a finite dimensional vector space over the field F and let T be a linear
operator on V. Let α be a scalar and suppose there is a nonzero vector v ∈ V such
that T (v) = αv. Prove that there is a nonzero linear functional f on V such that
( f ) = α f.
T
6. Let n be a positive integer and let V be the vector space of all polynomial functions
over the field of real numbers which have degree atmost n, i.e., functions of the
form f (x) = αo + α1 x + · · · + αn x n . Let D be the differential operator on V.
Find a basis for the null space of the transpose operator D. 
7. Let V be a finite dimensional vector space over a field F. Show that T −→ T  is
an isomorphism from A (V ) to A (V ).
Chapter 5
Inner Product Spaces

In the previous chapters, we have considered vector space V over an arbitrary field
F. In the present chapter, we shall restrict ourselves over the field of reals R or the
complex field C. One can see that the concept of “length” and “orthogonality” did
not appear in the investigation of vector space over arbitrary field. In this chapter, we
place an additional structure on a vector space V to obtain an inner product space.
If V is a vector space over R then V is called real vector space. On the other hand,
if V is a vector space over C then V is called complex vector space.

5.1 Inner Products

Definition 5.1 A vector space V over F is said to be an inner product space if there
exists a function ,  : V × V → F satisfying the following axioms:
(1) u, v = v, u for all u, v ∈ V .
(2) u, u ≥ 0 and u, u = 0 ⇔ u = 0 for all u ∈ V .
(3) αu + βv, w = αu, w + βv, w for all u, v, w ∈ V and α, β ∈ F.

Remark 5.2 (i) The function ,  satisfying the axioms (1), (2) and (3) is called
inner product on V .
(ii) If F = R, then the complex conjugate v, u = v, u, and hence the axiom (1)
can be written as u, v = v, u.
(iii) u, v is generally denoted as (u, v), u.v or u|v. Throughout we shall denote
it by u, v.
(iv) If F = C, the field of complex numbers, then axiom (1) implies that u, u is
real and hence the axiom (2) makes sense. For any α, β ∈ C and u, v, w ∈ V ,
applying (1) and (2) we see that

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 131
M. Ashraf et al., Advanced Linear Algebra with Applications,
https://doi.org/10.1007/978-981-16-2167-3_5
132 5 Inner Product Spaces

u, αv + βw = αv + βw, u


= αv, u + βw, u
= αv, u + βw, u
= αv, u + βw, u
= αu, v + βu, w.

Example 5.3 (1) In the vector space V = F2 , for any u = (α1 , α2 ), v = (β1 , β2 ) ∈
V define
u, v = 2α1 β1 + α1 β2 + α2 β1 + α2 β2 .

It can be easily seen that

u, v = 2α1 β1 + α1 β2 + α2 β1 + α2 β2
= 2α1 β1 + α1 β2 + α2 β1 + α2 β2
= 2β1 α1 + β1 α2 + β2 α1 + β2 α2
= v, u,

u, u = 2α1 α1 + α1 α2 + α2 α1 + α2 α2
= α1 α1 + α1 (α1 + α2 ) + α2 (α1 + α2 )
= |α1 |2 + (α1 + α2 )(α1 + α2 )
= |α1 |2 + |α1 + α2 |2 ≥ 0.

This shows that u, u = 0 if and only if α1 = α2 = 0, i.e., u = 0.


For any u = (α1 , α2 ), v = (β1 , β2 ), w = (γ1 , γ2 ) ∈ V and δ, λ ∈ F, it is also
straightforward to see that

δu + λv, w = δu, w + λv, w.

This shows that the above product defines an inner product on V .


(2) In the vector space V = Fn , for any u = (α1 , α2 , . . . , αn ), v = (β1 , β2 , . . . , βn )
∈ V define
u, v = α1 β1 + α2 β2 + · · · + αn βn .

It can be easily seen that the above product defines an inner product on V .
The above inner product is called standard inner product in Rn and Cn , and
the resulting inner product space is called Euclidean and Unitary space, respec-
tively.
(3) Let V = C [a, b], the vector space of all continuous complex valued functions
defined on the closed interval [a, b]. For any f (t), g(t) ∈ V define
 b
 f (t), g(t) = f (t)g(t)dt.
a

This defines an inner product on V . For any f (t), g(t) ∈ V , it can be seen that
5.1 Inner Products 133

b b
g(t) f (t)dt = a g(t) f (t)dt
a b
= a f (t)g(t)dt.

This shows that  f (t), g(t) = g(t), f (t). Also


 b  b
f (t) f (t)dt = | f (t)|2 dt ≥ 0
a a

and equality holds if and only if f (t) = 0.


For any α, β ∈ C, f (t), g(t), h(t) ∈ V
b
α f (t) + βg(t), h(t) = a {α f (t) + βg(t)}h(t)dt
b b
= α a f (t)h(t)dt + β a g(t)h(t)dt
= α f (t), h(t) + βg(t), h(t).
⎡ ⎤
λ1 0 0
(4) For any diagonal matrix A = ⎣ 0 λ2 0 ⎦ with all reals λi ≥ 0, 1 ≤ i ≤ 3. For
0 0 λ3
any u, v ∈ R3 , let u = (u 1 , u 2 , u 3 ), v = (v1 , v2 , v3 )
⎡ ⎤⎡ ⎤
 λ1 0 0
v1
u, v = u 1 u 2 u 3 ⎣ 0 λ2 0 ⎦ ⎣ v2 ⎦
0 0 λ3 v3
= λ1 u 1 v1 + λ2 u 2 v2 + λ3 u 3 v3 .

Then the above product defines an inner product on R3 . Hence there are infinitely
many inner product that can be defined on R3 . Readers are advised to show that
the above product u Avt cannot be an inner product if A has a negative diagonal
entry λi < 0.

a11 a12
(5) Consider the vector space V = M2 (R). For any A = and B =
a21 a22


b11 b12
in V , define
b21 b22

A, B = a11 b11 + a12 b12 + a21 b21 + a22 b22 .

Then it can be easily seen that the above product defines an inner product on
V . In fact A, A = a11
2
+ a12
2
+ a21
2
+ a22
2
≥ 0
and A, A = 0 if and only if
00
a11 = 0, a12 = 0, a21 = 0, a22 = 0, i.e., A = . Moreover, since all the
00
entries of the matrix are real, A, B = B, A. Further for any α, β ∈ R and
A, B, C ∈ V ,
134 5 Inner Product Spaces



αa11 + βb11 αa12 + βb12 c c
α A + β B, C = , 11 12
αa21 + βb21 αa22 + βb22 c21 c22
= (αa11 + βb11 )c11 + (αa12 + βb12 )c12
+(αa21 + βb21 )c21 + (αa22 + βb22 )c22
= α(a11 c11 + a12 c12 + a21 c21 + a22 c22 )+
β(b11 c11 + b12 c12 + b21 c21 + b22 c22 )
= αA, C + βB, C.

This shows that the above product ,  is an inner product on V.


(6) Let A, B ∈M2 (R). Define
 A, B = tr (At B).
a11 a12
(i) If A = , then A, A = tr (At A) = a112
+ a12
2
+ a21
2
+ a22
2
≥ 0,
a21 a22
and A, A = 0 ⇔ a11 = 0, a12 = 0, a21 = 0, a22 = 0 or A = 0.
(ii) A, B = tr (At B) = tr (At B)t = tr (B t A) = B, A.
(iii) For any scalars α, β and A, B, C ∈ M2×2 (R)

α A + β B, C = tr ((α A + β B)t C)
= tr ((α At + β B t )C)
= tr (α At C) + tr (β B t C)
= α tr (At C) + β tr ((B t C)
= αA, C + βB, C.

This shows that ,  is an inner product on M2 (R). Moreover, if A and B are


matrices of order n × n, even then A, B = tr (At B) defines an inner product
on Mn (R).
Lemma 5.4 Let V be an inner product space and let u, v, w, x ∈ V and α, β, γ , δ ∈
F. Then the following hold:
(i) 0, v = 0 = v, 0, for all v ∈ V .
(ii) αu + βv, γ w + δx = αγ u, w + αδu, x + βγ v, w + βδv, x.
(iii) If for any fixed u ∈ V, u, v = 0 for all v ∈ V , then u = 0.
(iv) For any v1 , v2 ∈ V if u, v1  = u, v2  for all u ∈ V , then v1 = v2 .
Proof (i) Obvious.

(ii) For any u, v, w, x ∈ V and α, β, γ , δ ∈ F

αu + βv, γ w + δx = αu, γ w + δx + βv, γ w + δx


= αγ u, w + αδu, x + βγ v, w + βδv, x.

(iii) Suppose that u, v = 0 for all v ∈ V . Then in particular u, u = 0 and hence
u = 0.

(iv) Since v1 , u = u, v1  = u, v2  = v2 , u, we find that v1 , u = v2 , u for
all u ∈ V . This implies that v1 − v2 , u = v1 , u − v2 , u = 0 for all u ∈ V , and
hence by (iii), we get the required result.
5.1 Inner Products 135

Exercises

1. Let R2 be the vector space over the real field. Find all 4-tuples of real numbers
(a, b, c, d) such that for u = (α1 , α2 ), v = (β1 , β2 ) ∈ R2 , u, v = aα1 β1 +
bα2 β2 + cα1 β2 + dα2 β1 defines an inner product on R2 .
2. Let V be the vector space of all real functions y = f (x) over the field of reals,
2 0
satisfying dd xy2 + 4y = 0. In V define u, v = π uvd x. Prove that this defines
an inner product on V .
3. Let V be the vector space of all real functions y = f (x) over the field of reals, sat-
3 2 0
isfying dd xy3 − 12 dd xy2 + 44 ddyx − 48y = 0. In V if u, v = −∞ uvd x, then prove
that this defines an inner product on V .
∞
4. Let V = {(a1 , a2 , a3 , . . .), ai ∈ R | ai2 is convergent }. Then V is a vector
i=1
space over R with addition and scalar multiplication defined component- wise.
Prove that the map ,  : V × V −→ R, given by (a1 , a2 , . . .), (b1 ,
b2 , . . .) = a1 b1 + a2 b2 + · · · is well defined and also prove that it is an inner
product on V .
5. Let V be a finite dimensional vector space over F, and {e1 , e2 , . . . , en } be a
basis of V . If u, v ∈ V , then u = a1 e1 + a2 e2 + · · · + an en , v = b1 e1 + b2 e2 +
· · · + bn en , where ai , bi are uniquely determined scalars. Define u, v = a1 b1 +
a2 b2 + · · · + an bn . Prove that this map is an inner product on V .
6. Let R2 be the vector space over the real field. For u = (α1 , α2 ), v = (β1 , β2 ) ∈
R2 , prove that u, v = (α1 −α2 )(β 4
1 −β2 )
+ (α1 +α2 )(β
4
1 +β2 )
defines an inner product
on R2 .
7. Let V be a n-dimensional vector space over the field of complex numbers C.
Let B be a basis of V . Define, for arbitrary u, v ∈ V , u, v = [u] B [v] B , where
the inner product of the coordinate vectors on the right hand side is the natural
inner product of the vector space Cn over C. Prove that ,  is an inner product
on V .
8. Let V be the vector space of m × n matrices over R. Prove that A, B = tr (B t A)
defines an inner product in V .
9. Suppose f (u, v) and g(u, v) are inner products on a vector space V over R.
Prove that
(a) The sum f + g is an inner product on V , where ( f + g)(u, v) = f (u, v) +
g(u, v).
(b) The scalar product k f , for k > 0, is an inner product on V , where (k f )(u, v)
= k f (u, v).
10. Find the values of k so that the following is an inner product on R2 , where
u = (x1 , x2 ) and v = (y1
, y2 ); u,
v = x1 y1 −
3xa1jybk2 − 3x2 y1 + kx2 y2 .
11. Show that the formula  a j x j , bk x k  = j+k+1
defines an inner product
j k j,k
on the space R[x] of polynomials over the real field R.
12. Let ,  be the standard inner product on R2 and let T be the linear operator
T (x1 , x2 ) = (−x2 , x1 ). Now T is “rotation through 900 ” anti-clockwise and has
136 5 Inner Product Spaces

the property that u, T (u) = 0 for all u ∈ R2 . Find all the inner products ,  on
R2 such that u, T (u) = 0 for all u.
13. Consider any u, v ∈ R2 . Prove that
 
 u, u u, v 
 
 u, v v, v  = 0

if and only if u and v are linearly independent.

5.2 The Length of a Vector

Definition 5.5 Let V be an inner product space. If√v ∈ V , then the length of v (or
norm of v) denoted as
v
and is defined as
v
= v, v.

Lemma 5.6 If V is an inner product space, then for any u, v ∈ V, α ∈ F


(i)
u
≥ 0,
u
= 0 ⇔ u = 0,
(ii)
αu
= |α|
u
,
(iii)
u + v
2 =
u
2 +
v
2 + 2Reu, v,
(iv)
u + v
2 +
u − v
2 = 2(
u
2 +
v
2 ),
(v) for any u, v ∈ V

1

u + v
2 − 41
u − v
2 , if F = R
u, v = 4
1
4

u + v
2 − 41
u − v
2 + 4i
u + iv
2 − 4i
u − iv
2 , if F = C.

Proof (i) Clear from the definition of the inner product space.

(ii) For any α ∈ F, u ∈ V


αu
2 = αu, αu = ααu, u = αα
u
2 = |α|2
u
2 = (|α|
u
)2 .

This implies that


αu
= |α|
u
.

(iii) For any u, v ∈ V


u + v
2 = u + v, u + v
= u, u + u, v + v, u + v, v
=
u
2 +
v
2 + u, v + u, v
=
u
2 +
v
2 + 2Reu, v.

(iv) In view of (iii), we find that


u − v
2 =
u
2 +
v
2 − 2Reu, v. This yields
that
u + v
2 +
u − v
2 = 2(
u
2 +
v
2 ). This equality is also known as paral-
5.2 The Length of a Vector 137

lelogram equality.

(v) In view of (iii) we have


u + v
2 −
u − v
2 = 4Reu, v. If F = R, then
Reu, v = u, v and hence u, v = 41
u + v
2 − 41
u − v
2 . On the other hand
if F = C, then replacing v by iv we arrive at 4Reu, iv =
u + iv
2 −
u − iv
2 .
But since u, iv = −iu, v and Reu, iv = I mu, v, the above relation yields
that 4I mu, v =
u + iv
2 −
u − iv
2 . But

u, v = Reu, v + i I mu, v


= 41
u + v
2 − 41
u − v
2 + 4i
u + iv
2 − 4i
u − iv
2 .

The above identities (iii), (iv) and (v) are known as Polarization identities.

Definition 5.7 Let V be a vector space over F. A function


.
: V → F is said to
be a norm on V if it satisfies the following axioms:
(1)
v
≥ 0 and
v
= 0 ⇔ v = 0 for all v ∈ V ,
(2)
αv
= |α|
v
, for any α ∈ F and v ∈ V ,
(3)
u + v

u
+
v
for all u, v ∈ V .
A vector space equipped with a norm is said to be a normed space.

Theorem 5.8 (Cauchy-Schwarz inequality) If V is an inner product space and
u, v ∈ V, then |⟨u, v⟩| ≤ ‖u‖ ‖v‖.

Proof If u = 0 or v = 0, in both cases we find that |⟨u, v⟩| ≤ ‖u‖ ‖v‖. Therefore,
we assume that neither u = 0 nor v = 0. Then ‖v‖ ≠ 0; for if ‖v‖ = 0, then
⟨v, v⟩ = 0 and hence v = 0, a contradiction. Now for any scalar λ

    ⟨u − λv, u − λv⟩ = ⟨u, u⟩ − λ⟨v, u⟩ − λ̄{⟨u, v⟩ − λ⟨v, v⟩} ≥ 0.

Now if we choose λ = ⟨u, v⟩/⟨v, v⟩, then we find that ‖u‖² − (⟨u, v⟩/⟨v, v⟩)⟨v, u⟩ ≥ 0.
This implies that ‖u‖² − |⟨u, v⟩|²/‖v‖² ≥ 0, i.e., |⟨u, v⟩|² ≤ ‖u‖² ‖v‖². This yields
that |⟨u, v⟩| ≤ ‖u‖ ‖v‖.

Remark 5.9 (i) If we consider Example 5.3(2), the above theorem gives that
for any u = (α₁, α₂, ..., αₙ), v = (β₁, β₂, ..., βₙ) ∈ V

    |Σᵢ₌₁ⁿ αᵢβᵢ| ≤ √(Σᵢ₌₁ⁿ |αᵢ|²) √(Σᵢ₌₁ⁿ |βᵢ|²).

(ii) In the case of the inner product given in Example 5.3(3) we find that for any
f(t), g(t) ∈ C[a, b]

    |∫ₐᵇ f(t)g(t)dt| ≤ √(∫ₐᵇ |f(t)|²dt) √(∫ₐᵇ |g(t)|²dt).

(iii) From the Cauchy-Schwarz inequality, in a real inner product space we have
−1 ≤ ⟨u, v⟩/(‖u‖ ‖v‖) ≤ 1 for any two nonzero vectors u and v. This ensures that
there exists a unique θ ∈ [0, π] such that cos θ = ⟨u, v⟩/(‖u‖ ‖v‖), or ⟨u, v⟩ =
‖u‖ ‖v‖ cos θ. This angle θ is called the angle between u and v.
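As a small numerical illustration (editorial addition, not part of the original text; it assumes NumPy and uses two arbitrary sample vectors), the angle formula of (iii) can be evaluated directly; the Cauchy-Schwarz inequality guarantees the quotient lies in [−1, 1].

import numpy as np

u = np.array([1.0, -2.0, -4.0, 3.0])
v = np.array([3.0, -2.0, 1.0, -4.0])
c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.arccos(np.clip(c, -1.0, 1.0))   # clip only guards against rounding error
print(theta)                               # the unique angle in [0, pi]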

Lemma 5.10 Let V be an inner product space. Then for any u, v ∈ V, |⟨u, v⟩| =
‖u‖ ‖v‖ if and only if u, v are linearly dependent.

Proof If any one of u, v is zero, then the result follows. Hence, we assume that
neither u nor v is zero. If |⟨u, v⟩| = ‖u‖ ‖v‖, then for the scalar λ = ⟨u, v⟩/⟨v, v⟩

    ⟨u − λv, u − λv⟩ = ⟨u, u⟩ − λ̄⟨u, v⟩ − λ⟨v, u⟩ + λλ̄⟨v, v⟩
                     = ‖u‖² − ⟨u, v⟩⟨v, u⟩/⟨v, v⟩ − λ̄{⟨u, v⟩ − (⟨u, v⟩/⟨v, v⟩)⟨v, v⟩}
                     = ‖u‖² − |⟨u, v⟩|²/‖v‖²
                     = 0.

This yields that u − λv = 0, i.e., u, v are linearly dependent.

Conversely, assume that {u, v} is linearly dependent. Then u = αv for some scalar
α, and hence

    ‖u‖ ‖v‖ = |α| ‖v‖² = |α|⟨v, v⟩ = |⟨αv, v⟩| = |⟨u, v⟩|.
Theorem 5.11 Every inner product space is a normed space with the norm
‖u‖ = √⟨u, u⟩.

Proof Suppose that V is an inner product space.
(i) Since ‖u‖ = √⟨u, u⟩, we find that ‖u‖² = ⟨u, u⟩ ≥ 0. This implies that ‖u‖ ≥ 0.
Furthermore,

    ‖u‖ = 0 ⇔ √⟨u, u⟩ = 0 ⇔ u = 0.

Hence ‖u‖ = 0 ⇔ u = 0.

(ii) For any α ∈ F and u ∈ V

    ‖αu‖² = ⟨αu, αu⟩ = αᾱ⟨u, u⟩ = (|α| ‖u‖)².

This implies that ‖αu‖ = |α| ‖u‖.

(iii) For any u, v ∈ V

    ‖u + v‖² = ⟨u + v, u + v⟩
             = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
             = ‖u‖² + 2Re⟨u, v⟩ + ‖v‖²
             ≤ ‖u‖² + 2|⟨u, v⟩| + ‖v‖²        (since Re α ≤ |α| for any α ∈ F)
             ≤ ‖u‖² + 2‖u‖ ‖v‖ + ‖v‖²         (by the Cauchy-Schwarz inequality)
             = (‖u‖ + ‖v‖)².

This shows that ‖u + v‖ ≤ ‖u‖ + ‖v‖ for all u, v ∈ V and hence V is a normed
space.
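The induced norm and the triangle inequality just proved are easy to experiment with numerically. The sketch below is an editorial illustration (not part of the original text); it assumes NumPy and uses the standard inner product on Cⁿ with two arbitrary sample vectors.

import numpy as np

def ip(u, v):
    return np.vdot(v, u)            # <u, v>, conjugate-linear in the second argument

def norm(u):
    return np.sqrt(ip(u, u).real)   # ||u|| = sqrt(<u, u>)

u = np.array([1.0 + 1j, 2.0, -1j])
v = np.array([0.5, -1.0 + 2j, 3.0])
print(norm(u + v) <= norm(u) + norm(v))   # True: triangle inequality
print(np.isclose(norm(2j * u), 2 * norm(u)))  # True: ||alpha u|| = |alpha| ||u||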

Definition 5.12 If S is any set, a function d : S × S → R is said to be a metric on
S, if for any a, b, c ∈ S it satisfies the following:
(1) d(a, b) ≥ 0, d(a, b) = 0 ⇔ a = b,
(2) d(a, b) = d(b, a),
(3) d(a, c) ≤ d(a, b) + d(b, c).
The set S equipped with a metric d is said to be a metric space, generally denoted as
(S, d).

Theorem 5.13 Let V be an inner product space over R. Then the function d : V ×
V → R such that d(u, v) = ‖u − v‖, where u, v ∈ V, is a metric on V.

Proof Obviously, d(u, v) = ‖u − v‖ ≥ 0, and d(u, v) = 0 if and only if u = v. It
can also be seen that d(u, v) = ‖u − v‖ = ‖−(v − u)‖ = ‖v − u‖ = d(v, u) for
all u, v ∈ V. For any u, v, w ∈ V

    d(u, v) = ‖u − v‖ = ‖(u − w) + (w − v)‖ ≤ ‖u − w‖ + ‖w − v‖.

This shows that d(u, v) ≤ d(u, w) + d(w, v) for all u, v, w ∈ V and hence d is a
metric on V.

Exercises
1. Consider f(x) = 4x³ − 6x + 5 and g(x) = −2x² + 7 in the polynomial space
   P(x) with inner product ⟨f, g⟩ = ∫₀¹ f(x)g(x)dx. Find ‖f‖ and ‖g‖.
2. Let V be a real inner product space. Show that
   (a) ‖u‖ = ‖v‖ if and only if ⟨u + v, u − v⟩ = 0,
   (b) ‖u + v‖² = ‖u‖² + ‖v‖² if and only if ⟨u, v⟩ = 0. Show by counterexamples
   that the above statements are not true for C².
3. Let Rⁿ and Cⁿ be the vector spaces over R and C, respectively. Prove that
   (a) ‖(a₁, a₂, ..., aₙ)‖∞ = max(|aᵢ|),
   (b) ‖(a₁, a₂, ..., aₙ)‖₁ = |a₁| + |a₂| + ⋯ + |aₙ|,
   (c) ‖(a₁, a₂, ..., aₙ)‖₂ = √(|a₁|² + |a₂|² + ⋯ + |aₙ|²)
   are norms on Rⁿ and Cⁿ. These are known as the infinity-norm, one-norm and
   two-norm, respectively.
4. Solve the above problem 3., for u = (1 + i, −2i, 1 − 6i) and v = (1 − i, 2 +
3i, −3i) in C3 .
5. Consider vectors u = (1, −2, −4, 3, −6) and v = (3, −2, 1, −4, −1) in R⁵.
   Find
   (a) ‖u‖∞ and ‖v‖∞,
   (b) ‖u‖₁ and ‖v‖₁,
   (c) ‖u‖₂ and ‖v‖₂,
   (d) d∞(u, v), d₁(u, v), d₂(u, v),
   where the norms ‖·‖∞, ‖·‖₁ and ‖·‖₂ are the infinity-norm, one-norm and two-
   norm, respectively, on R⁵ and d∞, d₁, d₂ are the metric functions induced by
   these norms, respectively.
6. Let C[a, b] be the vector space of real continuous functions on [a, b] over
   R. Prove that (i) ‖f‖₁ = ∫ₐᵇ |f(t)|dt, (ii) ‖f‖₂ = √(∫ₐᵇ [f(t)]²dt), (iii) ‖f‖∞ =
   max(|f(t)|) are norms on C[a, b]. Further consider the functions f(t) = 2t² −
   6t and g(t) = t³ + 6t² in C[1, 3] and hence find
   (a) d₁(f, g),
   (b) d₂(f, g),
   (c) d∞(f, g),
   where d₁, d₂, d∞ are the metric functions induced by the above norms, respectively.
7. Find out norms and metrics induced by the inner products defined in the problems
1 − 8 and 10 of the preceding section.
8. Prove the Apollonius identity

   ‖w − u‖² + ‖w − v‖² = (1/2)‖u − v‖² + 2‖w − (1/2)(u + v)‖².

9. Let u = (r₁, r₂, ..., rₙ) and v = (s₁, s₂, ..., sₙ) be in Rⁿ. The Cauchy-Schwarz
   inequality states that
   |r₁s₁ + r₂s₂ + ⋯ + rₙsₙ|² ≤ (r₁² + r₂² + ⋯ + rₙ²)(s₁² + s₂² + ⋯ + sₙ²).
   Prove that

   (|r₁s₁| + |r₂s₂| + ⋯ + |rₙsₙ|)² ≤ (r₁² + r₂² + ⋯ + rₙ²)(s₁² + s₂² + ⋯ + sₙ²).

5.3 Orthogonality and Orthonormality

Definition 5.14 Let V be an inner product space over F. Let u, v ∈ V. Then u is
said to be orthogonal to v if ⟨u, v⟩ = 0.

Remark 5.15 (i) A vector v is said to be orthogonal to a subset S of an inner


product space V if v is orthogonal to each vector in S. Also any two subspaces
are called orthogonal if every vector in one subspace is orthogonal to each
vector in other.
(ii) A subset S of an inner product space V is said to be an orthogonal set if any
two distinct vectors in S are orthogonal.
(iii) The zero vector is the only vector which is orthogonal to itself. Moreover, since
for any v ∈ V , 0, v = 0v, v = 0v, v = 0, we find that 0 ∈ V is orthogonal
to every v ∈ V.
(iv) The relation of orthogonality is in fact symmetric, i.e., if u is orthogonal to v,
then u, v = 0. This implies that v, u = 0, which shows that v, u = 0, and
hence v is orthogonal to u.
(v) If u is orthogonal to v, then it can be easily seen that every scalar multiple of u
is orthogonal to v. In fact, for any scalar α, αu, v = αu, v = α0 = 0.

Theorem 5.16 Let S = {u₁, u₂, ..., uₙ} be a pairwise orthogonal set of nonzero
vectors in an inner product space V. Then the following hold:
(i) S is linearly independent.
(ii) If v ∈ V is in the linear span of S, then v = Σₖ₌₁ⁿ (⟨v, uₖ⟩/‖uₖ‖²) uₖ.
(iii) ‖Σᵢ₌₁ⁿ uᵢ‖² = Σᵢ₌₁ⁿ ‖uᵢ‖² (Pythagoras theorem).
Proof (i) Suppose there exist scalars α₁, α₂, ..., αₙ such that Σᵢ₌₁ⁿ αᵢuᵢ = 0. Thus for
each 1 ≤ k ≤ n

    0 = ⟨0, uₖ⟩ = ⟨Σᵢ₌₁ⁿ αᵢuᵢ, uₖ⟩ = Σᵢ₌₁ⁿ αᵢ⟨uᵢ, uₖ⟩ = αₖ⟨uₖ, uₖ⟩ = αₖ‖uₖ‖².

But since ‖uₖ‖² ≠ 0 we find that αₖ = 0 for each 1 ≤ k ≤ n, and hence S is linearly
independent.
(ii) Since v ∈ L(S), there exist scalars α₁, α₂, ..., αₙ such that v = Σᵢ₌₁ⁿ αᵢuᵢ. Thus for
each 1 ≤ k ≤ n, ⟨v, uₖ⟩ = ⟨Σᵢ₌₁ⁿ αᵢuᵢ, uₖ⟩ = Σᵢ₌₁ⁿ αᵢ⟨uᵢ, uₖ⟩. But since S is orthogonal,
we find that ⟨v, uₖ⟩ = αₖ⟨uₖ, uₖ⟩. Now since uₖ ≠ 0, ‖uₖ‖² ≠ 0, the latter relation
yields that αₖ = ⟨v, uₖ⟩/‖uₖ‖², 1 ≤ k ≤ n, and hence v = Σₖ₌₁ⁿ (⟨v, uₖ⟩/‖uₖ‖²) uₖ.
(iii) For any uᵢ ∈ S, 1 ≤ i ≤ n

    ‖Σᵢ₌₁ⁿ uᵢ‖² = ⟨Σᵢ₌₁ⁿ uᵢ, Σⱼ₌₁ⁿ uⱼ⟩ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ ⟨uᵢ, uⱼ⟩ = Σᵢ₌₁ⁿ ⟨uᵢ, uᵢ⟩ = Σᵢ₌₁ⁿ ‖uᵢ‖².
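The expansion coefficients of part (ii) and the Pythagoras identity of part (iii) can be verified directly on a small example. The sketch below is an editorial illustration (not part of the original text); it assumes NumPy, and the pairwise orthogonal vectors are a sample chosen for the demonstration.

import numpy as np

u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, -1.0, 0.0])
u3 = np.array([0.0, 0.0, 2.0])            # pairwise orthogonal, nonzero
v = 2.0 * u1 - 3.0 * u2 + 0.5 * u3        # a vector in their span

coeffs = [np.dot(v, u) / np.dot(u, u) for u in (u1, u2, u3)]
print(np.allclose(coeffs, [2.0, -3.0, 0.5]))   # coefficients recovered as <v,u_k>/||u_k||^2

s = u1 + u2 + u3
print(np.isclose(np.dot(s, s), sum(np.dot(u, u) for u in (u1, u2, u3))))   # Pythagoras: True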
Remark 5.17 Any two vectors u, v in a Euclidean space are orthogonal if and only
if ‖u + v‖² = ‖u‖² + ‖v‖². This result is not true in a unitary space. In fact, for any
u = (u₁, u₂), v = (v₁, v₂) ∈ C², ⟨u, v⟩ = u₁v̄₁ + u₂v̄₂ defines an inner product on
C². If we consider u = (0, i), v = (0, 1) ∈ C², then ⟨u, v⟩ = 0·0̄ + i·1̄ = i ≠ 0, and
hence u and v are not orthogonal. But

    ‖u + v‖² = ‖(0, i) + (0, 1)‖² = ‖(0, 1 + i)‖² = ⟨(0, 1 + i), (0, 1 + i)⟩
             = 0·0̄ + (1 + i)(1 − i) = 2,

while ‖u‖² = ⟨(0, i), (0, i)⟩ = 1 and ‖v‖² = ⟨(0, 1), (0, 1)⟩ = 1, which yield that
‖u + v‖² = ‖u‖² + ‖v‖².

Theorem 5.18 Let {v₁, v₂, ..., vₙ} be any set of linearly independent vectors in
an inner product space V. Define u₁, u₂, ..., uₙ ∈ V inductively as u₁ = v₁ and

    uₖ = vₖ − Σⱼ₌₁ᵏ⁻¹ (⟨vₖ, uⱼ⟩/‖uⱼ‖²) uⱼ,  for 2 ≤ k ≤ n.

Then {u₁, u₂, ..., uₙ} is a pairwise orthogonal subset of V, and L({v₁, v₂, ..., vₙ}) =
L({u₁, u₂, ..., uₙ}).

Proof We apply induction on n. For n = 1, u₁ = v₁ and the result holds trivially.
Now assume that n > 1 and the result holds for all m < n linearly independent
vectors in V. By applying the induction hypothesis to the set {v₁, v₂, ..., vₙ₋₁},
we find that {u₁, u₂, ..., uₙ₋₁} is pairwise orthogonal and L({v₁, v₂, ..., vₙ₋₁}) =
L({u₁, u₂, ..., uₙ₋₁}). Since

    uₙ = vₙ − Σⱼ₌₁ⁿ⁻¹ (⟨vₙ, uⱼ⟩/‖uⱼ‖²) uⱼ,

if uₙ = 0, we find that vₙ = Σⱼ₌₁ⁿ⁻¹ (⟨vₙ, uⱼ⟩/‖uⱼ‖²) uⱼ, which lies in the span of
{v₁, v₂, ..., vₙ₋₁}. This shows that {v₁, v₂, ..., vₙ} is linearly dependent, a contradiction
to our assumption. Hence uₙ ≠ 0. Further, for 1 ≤ i ≤ n − 1

    ⟨uₙ, uᵢ⟩ = ⟨vₙ, uᵢ⟩ − Σⱼ₌₁ⁿ⁻¹ (⟨vₙ, uⱼ⟩/‖uⱼ‖²) ⟨uⱼ, uᵢ⟩.

But by the induction hypothesis ⟨uⱼ, uᵢ⟩ = 0 for i ≠ j, 1 ≤ j ≤ n − 1. Hence the
above yields that ⟨uₙ, uᵢ⟩ = ⟨vₙ, uᵢ⟩ − (⟨vₙ, uᵢ⟩/‖uᵢ‖²)⟨uᵢ, uᵢ⟩ = 0. This shows that the
set {u₁, u₂, ..., uₙ} is pairwise orthogonal and, by Theorem 5.16, it is linearly
independent. Hence the subspace W spanned by {u₁, u₂, ..., uₙ} has dimension n, and
the above relation for uₙ ensures that vₙ ∈ W. Consequently, the n linearly independent
vectors v₁, v₂, ..., vₙ lie in W and hence W = L({v₁, v₂, ..., vₙ}). This completes the
proof of our theorem.
Lemma 5.19 If u and v are any two vectors in a real inner product space, then u + v
and u − v are orthogonal if and only if ‖u‖ = ‖v‖.

Proof Suppose that u + v and u − v are orthogonal. Then we find that ⟨u + v, u −
v⟩ = 0. This implies that ‖u‖² = ‖v‖², and hence ‖u‖ = ‖v‖.
Conversely, assume that ‖u‖ = ‖v‖. Then

    ⟨u + v, u − v⟩ = ⟨u, u⟩ − ⟨u, v⟩ + ⟨v, u⟩ − ⟨v, v⟩
                  = ⟨u, u⟩ − ⟨u, v⟩ + ⟨u, v⟩ − ⟨v, v⟩
                  = ‖u‖² − ‖v‖²
                  = 0,

and hence u + v and u − v are orthogonal.


Remark 5.20 If vectors u and v represent the adjacent sides of a parallelogram,
then u + v and u − v represent the diagonals of the parallelogram. The above lemma
shows that the diagonals are perpendicular if and only if the lengths of the sides are
the same. In other words, the above lemma ensures that a parallelogram is a rhombus
if and only if its diagonals are perpendicular.
Consider R², geometrically as the cartesian plane and algebraically as a vector
space over R. Let u, v ∈ R² be such that ‖u‖ = 1. Next suppose that the vectors v and
u represent the position vectors of the points P and Q, respectively, in the plane. Thus
OP = v, OQ = u, where O is the origin. If θ is the angle between the vectors OP
and OQ, then it is obvious that the orthogonal projection of the vector OP along the
vector OQ is given by the vector OR = (|OP| cos θ)u, where R is the foot of the
perpendicular drawn from P on the vector OQ. Thus OR = (v·u)u. Since the scalar
product v·u in R² coincides with an inner product on R², OR = ⟨u, v⟩u.
The above observation suggests the following:

Definition 5.21 Let V be an inner product space and let v ∈ V, and u ∈ V such that
‖u‖ = 1. Then the projection Pᵤ(v) of v along u is defined as Pᵤ(v) = ⟨u, v⟩u. If
u ∈ V is any nonzero vector, then

    Pᵤ(v) = ⟨u/‖u‖, v⟩ (u/‖u‖) = (⟨u, v⟩/⟨u, u⟩) u.
Remark 5.22 We observe that for any u, v in a real inner product space

    ⟨v − Pᵤ(v), u⟩ = ⟨v, u⟩ − (⟨u, v⟩/⟨u, u⟩)⟨u, u⟩ = ⟨v, u⟩ − ⟨u, v⟩ = ⟨v, u⟩ − ⟨v, u⟩ = 0,

since ⟨u, v⟩ = ⟨v, u⟩ ∈ R. This shows that v − Pᵤ(v) is orthogonal to u.
Lemma 5.23 Let V be an inner product space defined over R. For a unit vector
u and any vector v ∈ V, let Pᵤ(v) = ⟨u, v⟩u. Then d(Pᵤ(v), v) ≤ d(αu, v) for any
α ∈ R.

Proof By the above remark, it is clear that v − Pᵤ(v) is orthogonal to u. Therefore,
for any α ∈ R, v − Pᵤ(v) is orthogonal to αu as well as to Pᵤ(v), and hence v − Pᵤ(v)
is orthogonal to Pᵤ(v) − αu. Thus by Theorem 5.16(iii)

    ‖v − αu‖² = ‖v − Pᵤ(v) + Pᵤ(v) − αu‖² = ‖v − Pᵤ(v)‖² + ‖Pᵤ(v) − αu‖².

This yields that d(v, αu)² ≥ d(v, Pᵤ(v))², and hence we find that d(Pᵤ(v), v) ≤
d(αu, v) for any α ∈ R.
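The projection of Definition 5.21 and the orthogonality of the residual noted in Remark 5.22 are easy to compute. The following sketch is an editorial illustration (not part of the original text); it assumes NumPy, and it reuses the vectors u₁ and v appearing in Exercise 6 of the next exercise set purely as sample data.

import numpy as np

def proj(u, v):
    # projection of v along a nonzero vector u: (<u, v>/<u, u>) u  (real case)
    return (np.dot(u, v) / np.dot(u, u)) * u

u = np.array([1.0, 2.0, 3.0, 1.0])
v = np.array([1.0, -3.0, 5.0, -6.0])
p = proj(u, v)
print(np.isclose(np.dot(v - p, u), 0.0))   # the residual v - P_u(v) is orthogonal to u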

Definition 5.24 Let S be a set of vectors in an inner product space V. Then S is said
to be an orthonormal set if
(1) each vector u in S has unit norm, i.e., ‖u‖ = 1,
(2) for any u, v ∈ S with u ≠ v, ⟨u, v⟩ = 0.
In addition, if S is a basis of V, then S is called an orthonormal basis of V.

Remark 5.25 (i) If S is a finite orthonormal subset of an inner product space V ,


then it is clear that S is linearly independent.
(ii) A set consisting of mutually orthogonal unit vectors is an orthonormal set.

Theorem 5.26 Let {v₁, v₂, ..., vₙ} be an orthonormal subset of an inner product
space V. If v = Σᵢ₌₁ⁿ αᵢvᵢ, then αᵢ = ⟨v, vᵢ⟩ and ‖v‖² = Σᵢ₌₁ⁿ |αᵢ|² = Σᵢ₌₁ⁿ |⟨v, vᵢ⟩|².

Proof Since the set {v₁, v₂, ..., vₙ} is orthonormal,

    ⟨v, vᵢ⟩ = ⟨Σⱼ₌₁ⁿ αⱼvⱼ, vᵢ⟩ = Σⱼ₌₁ⁿ αⱼ⟨vⱼ, vᵢ⟩ = αᵢ.

Hence αᵢᾱᵢ = |αᵢ|² = |⟨v, vᵢ⟩|². Therefore

    ‖v‖² = ⟨v, v⟩ = ⟨Σᵢ₌₁ⁿ αᵢvᵢ, Σⱼ₌₁ⁿ αⱼvⱼ⟩ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ αᵢᾱⱼ⟨vᵢ, vⱼ⟩ = Σᵢ₌₁ⁿ αᵢᾱᵢ = Σᵢ₌₁ⁿ |⟨v, vᵢ⟩|².

Definition 5.27 Let S be any subset of an inner product space V . The orthogonal
complement of S denoted as S ⊥ consists of those vectors in V which are orthogonal
to every vector of S, i.e.,

S ⊥ = {v ∈ V | v, w = 0 for all w ∈ S}.

Remark 5.28 (i) For any given vector u ∈ V, u⊥ consists of all vectors in V which
are orthogonal to u, i.e., u⊥ = {v ∈ V | ⟨v, u⟩ = 0}. Also for any subset S of
V, (S⊥)⊥ = {v ∈ V | ⟨v, w⟩ = 0 for all w ∈ S⊥} will be denoted by S⊥⊥.
(ii) Since for any 0 ≠ v ∈ V, ⟨v, v⟩ ≠ 0, we find that v ∉ V⊥, and hence V⊥ = {0}.
Obviously {0}⊥ = V.
(iii) Since the zero vector in V is orthogonal to every vector in V, clearly 0 ∈ S⊥ and
hence S⊥ ≠ ∅. For any u, v ∈ S⊥, α, β ∈ F and any vector w ∈ S

    ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩ = α·0 + β·0 = 0,

and hence S⊥ is a subspace of V. This shows that if S is a nonempty subset of
an inner product space V, then S⊥ is a subspace of V. Moreover, if v ∈ S ∩ S⊥,
then ⟨v, v⟩ = 0 yields that v = 0. Hence S ∩ S⊥ = {0}.
(iv) Suppose that S₁, S₂ are any two subsets of V such that S₁ ⊆ S₂. Then it can be
easily seen that S₂⊥ ⊆ S₁⊥. In fact, if v ∈ S₂⊥, then we find that ⟨v, w⟩ = 0 for all
w ∈ S₂, which yields that ⟨v, w⟩ = 0 for all w ∈ S₁, i.e., v ∈ S₁⊥.
Theorem 5.29 Let V be a finite dimensional inner product space and W a subspace
of V. Then V = W ⊕ W⊥, and W = W⊥⊥.

Proof If W = {0}, then obviously W⊥ = V. Hence V = {0} ⊕ V = W ⊕ W⊥.
Henceforth assume that W ≠ {0}. Since W is a subspace of an inner product
space V, W is also an inner product space which has an orthonormal basis, say
B = {w₁, w₂, ..., wₘ}. Let v ∈ V, and w = Σᵢ₌₁ᵐ ⟨v, wᵢ⟩wᵢ ∈ W. Then it can be seen
that ⟨v − w, wⱼ⟩ = 0 for each 1 ≤ j ≤ m. But since B spans W, we find that
v − w ∈ W⊥. Now v = w + (v − w) ∈ W + W⊥, and hence V = W + W⊥. By the
above Remark 5.28(iii), W ∩ W⊥ = {0} and consequently V = W ⊕ W⊥.
Now if w ∈ W, then for any v ∈ W⊥, ⟨v, w⟩ = 0 implies that ⟨w, v⟩ = 0, i.e., w is
orthogonal to every vector of W⊥. This implies that w ∈ W⊥⊥ and hence W ⊆ W⊥⊥.
Since V = W ⊕ W⊥ and V = W⊥ ⊕ W⊥⊥, if dim V = n and dim W = m, we find
that dim W⊥ = dim V − dim W = n − m. This yields that dim W⊥⊥ = dim V −
dim W⊥ = n − (n − m) = m, and hence dim W = dim W⊥⊥; therefore W = W⊥⊥.
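Orthogonal complements in Rⁿ can be computed as null spaces: a vector is orthogonal to every row of a matrix exactly when the matrix sends it to zero. The sketch below is an editorial illustration (not part of the original text); it assumes NumPy and SciPy, and it uses the two spanning vectors that appear later in Exercise 14 purely as sample data.

import numpy as np
from scipy.linalg import null_space

W = np.array([[2.0, -3.0, 5.0],
              [2.0,  4.0, -7.0]])       # rows span a subspace W of R^3
W_perp = null_space(W)                  # columns form an orthonormal basis of W-perp
print(W_perp.shape[1])                  # dim W-perp = 3 - dim W = 1
print(np.allclose(W @ W_perp, 0.0))     # each basis vector is orthogonal to every row of W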
Corollary 5.30 If V is a finite dimensional inner product space and W1 , W2 are
subspaces of V , then
(i) if W1 ⊆ W2 , then W2⊥ ⊆ W1⊥ ,
(ii) (W₁ + W₂)⊥ = W₁⊥ ∩ W₂⊥,
(iii) (W1 ∩ W2 )⊥ = W1⊥ + W2⊥ ,
(iv) if V = W1 ⊕ W2 , then V = W1⊥ ⊕ W2⊥ .
Proof (i) Obvious in view of the above Remark 5.28(iv).

(ii) Since W1 and W2 both are subsets of the subspace W1 + W2 , we find that (W1 +
W2 )⊥ ⊆ W1⊥ and (W1 + W2 )⊥ ⊆ W2⊥ . This yields that (W1 + W2 )⊥ ⊆ W1⊥ ∩ W2⊥ .
Now conversely, assume that u ∈ W1⊥ ∩ W2⊥ . Then u ∈ W1⊥ and u ∈ W2⊥ . There-
fore u is orthogonal to every vector in W1 and also to every vector in W2 . Let
w be any vector in W1 + W2 , then w = w1 + w2 , where w1 ∈ W1 , w2 ∈ W2 . It
can be easily seen that u, w = u, w1 + w2  = u, w1  + u, w2  = 0. Therefore,
u ∈ (W1 + W2 )⊥ and hence W1⊥ ∩ W2⊥ ⊆ (W1 + W2 )⊥ . Combining this with the
above fact we find that (W1 + W2 )⊥ = W1⊥ ∩ W2⊥ .

(iii) Since W₁⊥ and W₂⊥ are also subspaces of V, taking W₁⊥ in place of W₁ and W₂⊥
in place of W₂ in (ii), we find that (W₁⊥ + W₂⊥)⊥ = W₁⊥⊥ ∩ W₂⊥⊥ = W₁ ∩ W₂. This
implies that (W₁⊥ + W₂⊥)⊥⊥ = (W₁ ∩ W₂)⊥ and hence (W₁ ∩ W₂)⊥ = W₁⊥ + W₂⊥.
(iv) Since V = W1 ⊕ W2 , we find that {0} = V ⊥ = (W1 + W2 )⊥ = W1⊥ ∩ W2⊥ . But


W1 ∩ W2 = 0 implies that V = {0}⊥ = (W1 ∩ W2 )⊥ = W1⊥ + W2⊥ . This yields the
required result.
Suppose that V is an inner product space over F. If u ∈ V, then the map that sends
v to ⟨v, u⟩ is a linear functional on V. The following result (Theorem 5.33) shows that
every linear functional on V is of this form.
Definition 5.31 Let V be an inner product space and let W₁, W₂, ..., Wₙ be
subspaces of V. Then V is the orthogonal direct sum of W₁, W₂, ..., Wₙ, written as
V = W₁ ⊞ W₂ ⊞ ⋯ ⊞ Wₙ, if
(1) V = W₁ ⊕ W₂ ⊕ ⋯ ⊕ Wₙ,
(2) Wᵢ ⊥ Wⱼ for i ≠ j; 1 ≤ i, j ≤ n.
Theorem 5.32 Let V be an inner product space. Then the following are equivalent:
(i) V = U ⊞ W,
(ii) V = U ⊕ W and W = U⊥,
(iii) V = U ⊕ W and W ⊆ U⊥.

Proof (i) =⇒ (ii) If (i) holds, then V = U ⊕ W and U ⊥ W, which implies that
W ⊆ U⊥. But if v ∈ U⊥, then v = u + w for some u ∈ U, w ∈ W. Hence

    0 = ⟨u, v⟩ = ⟨u, u⟩ + ⟨u, w⟩ = ⟨u, u⟩.

This yields that u = 0 and v ∈ W, which implies that U⊥ ⊆ W. Hence U⊥ = W.
(ii) =⇒ (iii) Obvious.
(iii) =⇒ (i) If (iii) holds, then W ⊆ U⊥, which implies that U ⊥ W and (i) holds.
Theorem 5.33 (Riesz Representation Theorem) Let V be a finite dimensional inner
product space over F and ϕ : V → F be a linear functional on V. Then there exists
a unique vector u ∈ V such that ϕ(v) = ⟨v, u⟩ for every v ∈ V.

Proof First we show there exists a vector u ∈ V such that ϕ(v) = ⟨v, u⟩ for every
v ∈ V. Let {v₁, v₂, ..., vₙ} be an orthonormal basis of V. Then by Theorem 5.16(ii),
v = Σᵢ₌₁ⁿ ⟨v, vᵢ⟩vᵢ, and hence

    ϕ(v) = ϕ(⟨v, v₁⟩v₁ + ⟨v, v₂⟩v₂ + ⋯ + ⟨v, vₙ⟩vₙ)
         = ⟨v, v₁⟩ϕ(v₁) + ⟨v, v₂⟩ϕ(v₂) + ⋯ + ⟨v, vₙ⟩ϕ(vₙ)
         = ⟨v, ϕ(v₁)‾v₁⟩ + ⟨v, ϕ(v₂)‾v₂⟩ + ⋯ + ⟨v, ϕ(vₙ)‾vₙ⟩
         = ⟨v, ϕ(v₁)‾v₁ + ϕ(v₂)‾v₂ + ⋯ + ϕ(vₙ)‾vₙ⟩

for every v ∈ V, where the overline denotes complex conjugation. Now setting
u = ϕ(v₁)‾v₁ + ϕ(v₂)‾v₂ + ⋯ + ϕ(vₙ)‾vₙ, we arrive at ϕ(v) = ⟨v, u⟩ for every v ∈ V.
Now in order to show that there exists a unique vector u ∈ V with the desired
behavior, suppose that there exist u₁, u₂ ∈ V such that ϕ(v) = ⟨v, u₁⟩ = ⟨v, u₂⟩ for
every v ∈ V. Then we find that

    ⟨v, u₁⟩ − ⟨v, u₂⟩ = ⟨v, u₁ − u₂⟩ = 0 for every v ∈ V.

Now in particular, for v = u₁ − u₂ we find that u₁ − u₂ = 0 or u₁ = u₂, which proves
the uniqueness of u.
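The construction in the proof can be carried out numerically: for Cⁿ with the standard orthonormal basis, the Riesz vector is obtained by conjugating the values of the functional on the basis vectors. The sketch below is an editorial illustration (not part of the original text); it assumes NumPy, and the coefficient vector defining the functional is randomly generated sample data.

import numpy as np

n = 4
rng = np.random.default_rng(1)
c = rng.normal(size=n) + 1j * rng.normal(size=n)
phi = lambda v: c @ v                       # a linear functional on C^n

E = np.eye(n)
u = sum(np.conj(phi(E[:, i])) * E[:, i] for i in range(n))   # u = sum conj(phi(e_i)) e_i

v = rng.normal(size=n) + 1j * rng.normal(size=n)
print(np.isclose(phi(v), np.vdot(u, v)))    # phi(v) = <v, u> = sum v_i conj(u_i): True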

Definition 5.34 Let V and W be any two inner product spaces over the same field F.
Then V is said to be isomorphic to W if there exists a vector isomorphism ψ : V →
W such that for any v1 , v2 ∈ V, v1 , v2  = ψ(v1 ), ψ(v2 ). Such an isomorphism is
called inner product space isomorphism.

Theorem 5.35 Any inner product space V over the field F of dimension n is
isomorphic to Rⁿ or Cⁿ according as F = R or C.

Proof Since V is finite dimensional, it has an orthonormal basis, say B = {v₁,
v₂, ..., vₙ}. Define a map ψ : Fⁿ → V such that

    ψ(α₁, α₂, ..., αₙ) = Σᵢ₌₁ⁿ αᵢvᵢ for every (α₁, α₂, ..., αₙ) ∈ Fⁿ.

Clearly ψ is a vector space isomorphism. For any α = (α₁, α₂, ..., αₙ), β = (β₁,
β₂, ..., βₙ) ∈ Fⁿ

    ⟨α, β⟩ = Σᵢ₌₁ⁿ αᵢβ̄ᵢ.

Also

    ⟨ψ(α), ψ(β)⟩ = ⟨Σᵢ₌₁ⁿ αᵢvᵢ, Σⱼ₌₁ⁿ βⱼvⱼ⟩ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ αᵢβ̄ⱼ⟨vᵢ, vⱼ⟩ = Σᵢ₌₁ⁿ αᵢβ̄ᵢ = ⟨α, β⟩.

This shows that V is isomorphic to Fⁿ.

Lemma 5.36 If S is a subset of an inner product space V , then


(i) S ⊥ = (L(S))⊥ ,
(ii) L(S) ⊆ S ⊥⊥ ,
(iii) L(S) = S ⊥⊥ , if V is finite dimensional.
Proof (i) It is clear that S ⊆ L(S), and hence (L(S))⊥ ⊆ S⊥. Now suppose that
v ∈ S⊥. Then v is orthogonal to each vector in S. Let w ∈ L(S); then w is a linear
combination of a finite number of vectors in S, i.e., w = Σᵢ₌₁ⁿ αᵢvᵢ, where vᵢ ∈ S. Thus

    ⟨v, w⟩ = ⟨v, Σᵢ₌₁ⁿ αᵢvᵢ⟩ = Σᵢ₌₁ⁿ ᾱᵢ⟨v, vᵢ⟩ = 0 (since v is orthogonal to each vector in S).

Therefore v is orthogonal to every vector in L(S), i.e., v ∈ (L(S))⊥. This shows that
S⊥ ⊆ (L(S))⊥ and hence S⊥ = (L(S))⊥.

(ii) Let v ∈ L(S). If w is any vector in S⊥, then w is orthogonal to each vector in S.
Thus w is orthogonal to v, which is a linear combination of a finite number of vectors
in S. This ensures that v is orthogonal to every vector w in S⊥ and hence v ∈ S⊥⊥.
This implies that L(S) ⊆ S⊥⊥.

(iii) From (i), we have S⊥ = (L(S))⊥, which implies that S⊥⊥ = (L(S))⊥⊥. But
since L(S) is a subspace of V, by Theorem 5.29, we find that (L(S))⊥⊥ = L(S).
This shows that S⊥⊥ = L(S).

Theorem 5.37 (Bessel’s inequality) Let S = {v1 , v2 , . . . , vn } be any nonempty orthonor-


mal finite subset of an inner product space V . Then for any v ∈ V ,


n
|v, vi |2 ≤
v
2 .
i=1

Proof For any v ∈ V , observe that


 

n 
n 
n

v − v, vi vi
= v −
2
v, vi vi , v − v, v j v j
i=1 i=1 j=1

n
= v, v − v, vi vi , v
i=1

n 
n 
n
− v, v j v, v j  + v, vi v, v j vi , v j .
j=1 j=1 i=1

But since S is orthonormal set, i.e., vi , v j  = 0 if i = j and vi , v j  = 1 if i = j,


the above relation yields that
150 5 Inner Product Spaces


n 
n 
n

v
2 − |v, vi |2 − |v, vi |2 + |v, vi |2 ≥ 0,
i=1 i=1 i=1


n
which implies that |v, vi |2 ≤
v
2 .
i=1
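Bessel's inequality is strict whenever v has a component outside the span of the orthonormal set. The sketch below is an editorial illustration (not part of the original text); it assumes NumPy, and the two orthonormal vectors and the test vector are sample data.

import numpy as np

v1 = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)   # an orthonormal pair in R^4
v2 = np.array([0.0, 0.0, 1.0, 0.0])

v = np.array([3.0, -1.0, 2.0, 5.0])
bessel_sum = np.dot(v, v1)**2 + np.dot(v, v2)**2
print(bessel_sum <= np.dot(v, v))   # True; strict here since v is not in span{v1, v2}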

Theorem 5.38 If S = {v₁, v₂, ..., vₙ} is an orthonormal subset of a finite dimensional
inner product space V, then for any vector v ∈ V, v − Σᵢ₌₁ⁿ ⟨v, vᵢ⟩vᵢ is orthogonal
to each of v₁, v₂, ..., vₙ. Moreover, if u ∉ L(S), then for w = u − Σᵢ₌₁ⁿ ⟨u, vᵢ⟩vᵢ,
‖w‖ ≠ 0, and for vₙ₊₁ = w/‖w‖, the set {v₁, v₂, ..., vₙ, vₙ₊₁} is an orthonormal
subset of V.

Proof For any 1 ≤ j ≤ n we observe that

    ⟨v − Σᵢ₌₁ⁿ ⟨v, vᵢ⟩vᵢ, vⱼ⟩ = ⟨v, vⱼ⟩ − Σᵢ₌₁ⁿ ⟨v, vᵢ⟩⟨vᵢ, vⱼ⟩
                             = ⟨v, vⱼ⟩ − ⟨v, vⱼ⟩⟨vⱼ, vⱼ⟩
                             = ⟨v, vⱼ⟩ − ⟨v, vⱼ⟩
                             = 0.

This shows that v − Σᵢ₌₁ⁿ ⟨v, vᵢ⟩vᵢ is orthogonal to each of v₁, v₂, ..., vₙ. Now assume
that u ∉ L(S). Then ‖w‖ ≠ 0, where w = u − Σᵢ₌₁ⁿ ⟨u, vᵢ⟩vᵢ; because if ‖w‖ = 0 then
w = 0 and hence the above relation yields that u = Σᵢ₌₁ⁿ ⟨u, vᵢ⟩vᵢ, i.e., u is a linear
combination of v₁, v₂, ..., vₙ, so that u ∈ L(S), a contradiction. Hence ‖w‖ ≠ 0.
Further, we see that {v₁, v₂, ..., vₙ, vₙ₊₁} is also an orthonormal subset of V. For
each 1 ≤ i ≤ n,

    ⟨vₙ₊₁, vᵢ⟩ = ⟨w/‖w‖, vᵢ⟩
              = (1/‖w‖)⟨u − Σⱼ₌₁ⁿ ⟨u, vⱼ⟩vⱼ, vᵢ⟩
              = (1/‖w‖){⟨u, vᵢ⟩ − Σⱼ₌₁ⁿ ⟨u, vⱼ⟩⟨vⱼ, vᵢ⟩}
              = (1/‖w‖){⟨u, vᵢ⟩ − ⟨u, vᵢ⟩}
              = 0.

Finally we see that ⟨vₙ₊₁, vₙ₊₁⟩ = ⟨w/‖w‖, w/‖w‖⟩ = (1/‖w‖²)⟨w, w⟩ = 1, which completes
the proof.
If we consider the standard basis B = {e₁, e₂, ..., eₙ} of Rⁿ (or Cⁿ), it can be easily
observed that the basis B is an orthonormal basis, i.e., the vectors in B are pairwise
orthogonal and each vector in B has unit length. For n ≥ 2 this vector space has
infinitely many bases whose members are pairwise orthogonal and have length 1. We
shall now discuss a method for obtaining an orthonormal basis for a finite dimensional
inner product space.

Gram-Schmidt orthonormalization process

Consider any sequence of linearly independent vectors, say (v₁, v₂, ..., vₙ, ...), in an
inner product space V. If Wᵢ is the subspace spanned by v₁, v₂, ..., vᵢ and W is the
subspace spanned by all the given vectors, then obviously W is the union of all Wᵢ.
By Theorem 5.18, if we define u₁ = v₁ and

    uₖ = vₖ − Σⱼ₌₁ᵏ⁻¹ (⟨vₖ, uⱼ⟩/‖uⱼ‖²) uⱼ,  for k ≥ 2,

then {u₁, u₂, ..., uᵢ} forms a basis of Wᵢ. This yields that (u₁, u₂, ..., uₙ, ...) forms
an orthogonal basis of W. Now for any k ≥ 1, define wₖ = uₖ/‖uₖ‖. Then ‖wₖ‖ = 1, and
(w₁, w₂, ..., wₙ, ...) forms an orthonormal basis of W. This process of constructing
an orthonormal basis, starting from a basis of a countably generated subspace of V,
is called the Gram-Schmidt orthonormalization process.
Example 5.39 (1) Apply the Gram-Schmidt orthonormalization process to find an
orthonormal basis of the subspace W of C³ generated by {(1, i, 0), (1, 2, 1 − i)}.

Suppose that v₁ = (1, i, 0), v₂ = (1, 2, 1 − i). Put u₁ = v₁ = (1, i, 0),

    u₂ = v₂ − (⟨v₂, v₁⟩/⟨v₁, v₁⟩) v₁
       = (1, 2, 1 − i) − [(1·1 + 2·(−i) + (1 − i)·0)/(1·1 + i·(−i) + 0·0)] (1, i, 0)
       = ((1 + 2i)/2, (2 − i)/2, (2 − 2i)/2).

Take

    w₁ = u₁/‖u₁‖ = (1/√2, i/√2, 0),
    w₂ = u₂/‖u₂‖ = ((1 + 2i)/2, (2 − i)/2, (2 − 2i)/2)/(√18/2)
       = ((1 + 2i)/√18, (2 − i)/√18, (2 − 2i)/√18).

Thus the required orthonormal basis of W is given by

    {w₁, w₂} = {(1/√2, i/√2, 0), ((1 + 2i)/√18, (2 − i)/√18, (2 − 2i)/√18)}.
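A direct transcription of the Gram-Schmidt process into code reproduces the computation of part (1). The sketch below is an editorial illustration (not part of the original text); it assumes NumPy, uses the standard inner product on Cⁿ (linear in the first argument, conjugate-linear in the second), normalizes at each step, and assumes the input vectors are linearly independent.

import numpy as np

def gram_schmidt(vectors):
    # returns an orthonormal list w_1, w_2, ... spanning the same subspace
    ortho = []
    for v in vectors:
        u = np.asarray(v, dtype=complex)
        for w in ortho:
            u = u - np.vdot(w, u) * w          # subtract the projection <v, w> w (w is a unit vector)
        ortho.append(u / np.sqrt(np.vdot(u, u).real))
    return ortho

# the generators of Example 5.39(1)
w1, w2 = gram_schmidt([[1, 1j, 0], [1, 2, 1 - 1j]])
print(np.allclose(w1, np.array([1, 1j, 0]) / np.sqrt(2)))                  # True
print(np.allclose(w2, np.array([1 + 2j, 2 - 1j, 2 - 2j]) / np.sqrt(18)))   # True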

(2) Let R[x] be the vector space of all real polynomials, with inner product on R[x]
given by

    ⟨p(x), q(x)⟩ = ∫₋₁¹ p(x)q(x)dx.

Apply the Gram-Schmidt orthonormalization process to the sequence B = (1, x,
x², x³, ...): take v₁ = 1, v₂ = x, v₃ = x², v₄ = x³ and so on. Put

    u₁ = v₁ = 1,
    u₂ = v₂ − (⟨v₂, u₁⟩/⟨u₁, u₁⟩)u₁ = x − (∫₋₁¹ x dx)/(∫₋₁¹ 1 dx) = x,
    u₃ = v₃ − (⟨v₃, u₁⟩/⟨u₁, u₁⟩)u₁ − (⟨v₃, u₂⟩/⟨u₂, u₂⟩)u₂
       = x² − (∫₋₁¹ x² dx)/(∫₋₁¹ 1 dx) − [(∫₋₁¹ x³ dx)/(∫₋₁¹ x² dx)]x = x² − 1/3,
    u₄ = v₄ − (⟨v₄, u₁⟩/⟨u₁, u₁⟩)u₁ − (⟨v₄, u₂⟩/⟨u₂, u₂⟩)u₂ − (⟨v₄, u₃⟩/⟨u₃, u₃⟩)u₃
       = x³ − (∫₋₁¹ x³ dx)/(∫₋₁¹ 1 dx) − [(∫₋₁¹ x⁴ dx)/(∫₋₁¹ x² dx)]x
         − [(∫₋₁¹ x³(x² − 1/3) dx)/(∫₋₁¹ (x² − 1/3)² dx)](x² − 1/3) = x³ − (3/5)x,

and so on. Now let

    w₁ = u₁/‖u₁‖ = 1/√(∫₋₁¹ dx) = 1/√2,
    w₂ = u₂/‖u₂‖ = x/√(∫₋₁¹ x² dx) = (√3/√2)x,
    w₃ = u₃/‖u₃‖ = (x² − 1/3)/√(∫₋₁¹ (x² − 1/3)² dx) = (√10/4)(3x² − 1),
    w₄ = u₄/‖u₄‖ = (x³ − (3/5)x)/√(∫₋₁¹ (x³ − (3/5)x)² dx) = (5√7/(2√2))(x³ − (3/5)x),

and so on. Hence the required orthonormal sequence in R[x] is given by

    (1/√2, (√3/√2)x, (√10/4)(3x² − 1), (5√7/(2√2))(x³ − (3/5)x), ...).
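The same computation can be carried out symbolically, with the inner product realized as integration over [−1, 1]. The sketch below is an editorial illustration (not part of the original text); it assumes SymPy is available.

import sympy as sp

x = sp.symbols('x')
ip = lambda p, q: sp.integrate(p * q, (x, -1, 1))   # <p, q> = integral of p q over [-1, 1]

basis = [1, x, x**2, x**3]
ortho = []
for v in basis:
    u = v - sum(ip(v, w) / ip(w, w) * w for w in ortho)   # Gram-Schmidt step
    ortho.append(sp.expand(u))

print(ortho)   # [1, x, x**2 - 1/3, x**3 - 3*x/5], matching u1, ..., u4 above
normalized = [sp.simplify(u / sp.sqrt(ip(u, u))) for u in ortho]
print(normalized[2])   # simplifies to (sqrt(10)/4)*(3*x**2 - 1), matching w3 above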
Lemma 5.40 Let {v₁, v₂, ..., vₙ} be a set of nonzero pairwise orthogonal vectors
in an inner product space V. Then the vectors u₁, u₂, ..., uₙ with u₁ = v₁ and

    uₖ = vₖ − Σⱼ₌₁ᵏ⁻¹ (⟨vₖ, uⱼ⟩/‖uⱼ‖²) uⱼ,  for 2 ≤ k ≤ n,

are such that vᵢ = uᵢ, 1 ≤ i ≤ n.

Proof Given that u₁ = v₁, the result can be proved easily by using induction.
Assume that for some k, 1 ≤ k < n, uᵢ = vᵢ for 1 ≤ i ≤ k. Then for 1 ≤ j ≤ k,
⟨vₖ₊₁, uⱼ⟩ = ⟨vₖ₊₁, vⱼ⟩ = 0 and hence

    uₖ₊₁ = vₖ₊₁ − Σⱼ₌₁ᵏ (⟨vₖ₊₁, uⱼ⟩/‖uⱼ‖²) uⱼ = vₖ₊₁,

and the result follows.


It is well-known that in a finite dimensional vector space every linearly independent
subset can be extended to a basis of the vector space. The following theorem shows
that the similar result follows for orthonormal subset of a finite dimensional inner
product space:
Theorem 5.41 Let V be a finite dimensional inner product space with dimV = n ≥
1. Then every orthonormal subset of V can be extended to an orthonormal basis of
V.
Proof Suppose S = {v1 , v2 , . . . , vk } is an orthonormal subset of V . Since S is
orthonormal, by Theorem 5.16, it is linearly independent. Hence S can be extended
to a basis of V say {v1 , v2 , . . . , vn } such that vi = vi , 1 ≤ i ≤ k. Now define u 1 = v1
 vi ,u j 
i−1
and u i = vi −
u j
2 j
u ; for 2 ≤ i ≤ n. Hence by Lemma 5.40, u i = vi = vi for
j=1
u
1 ≤ i ≤ k. Now let w j =
u jj
for 1 ≤ j ≤ k. But since the set S is orthonormal,

v j
= 1, w j = v j for 1 ≤ j ≤ k, and by using the Gram-Schmidt process we find
that B = {w1 , w2 , . . . , wn } is an orthonormal basis of V .
Exercises
1. Let B = {e1 , e2 , . . . , en } be an orthonormal basis of V . Prove that
(a) For any v ∈ V , we have v = v, e1 e1 + v, e2 e2 + · · · + v, en en ,
(b) a1 e1 + a2 e2 + · · · + an en , b1 e1 + b2 e2 + · · · + bn en =a1 b1 + a2 b2 + · · ·
+ an bn ,
(c) For any u, v ∈ V , we have u, v = u, e1 v, e1  + · · · + u, en v, en .
2. Let B = {e₁, e₂, ..., eₙ} be an orthogonal basis of V. Then prove that for any
   v ∈ V, v = (⟨v, e₁⟩/⟨e₁, e₁⟩)e₁ + (⟨v, e₂⟩/⟨e₂, e₂⟩)e₂ + ⋯ + (⟨v, eₙ⟩/⟨eₙ, eₙ⟩)eₙ.
3. Consider the subspace U of R4 spanned by the vectors: v1 = (1, 1, 1, 1), v2 =
(1, 1, 2, 4), v3 = (1, 2, −4, −3), using Gram-Schmidt algorithm, find
(a) an orthogonal basis of U ,
(b) an orthonormal basis of U .
4. Let R₃[x] be the inner product space of all polynomials of degree at most 3, under
   the inner product
   ⟨p(x), q(x)⟩ = ∫ p(x)q(x)e^(−x²) dx,  the integral being taken over (−∞, ∞).
   Apply the Gram-Schmidt process to the basis {1, x, x², x³}, thereby computing the
   first four Hermite polynomials (at least up to a multiplicative constant).
5. Let w ≠ 0 and suppose that v is any vector in V. Prove that c = ⟨v, w⟩/⟨w, w⟩ =
   ⟨v, w⟩/‖w‖² is the unique scalar such that v′ = v − cw is orthogonal to w.
6. Suppose v = (1, −3, 5, −6). Find the projection of v onto W or, in other words,
   find w ∈ W that minimizes ‖v − w‖, where W is the subspace of R⁴ spanned by
(a) u 1 = (1, 2, 3, 1) and u 2 = (1, −3, 4, −6),
(b) v1 = (−2, 0, 4, 5) and v2 = (−3, −6, 1, 0).
7. Suppose B = {e1 , e2 , . . . , er } is an orthogonal basis for a subspace W of a
finite dimensional inner product space V . Prove that B can be extended to an
orthogonal basis for V , that is one may find vectors er +1 , er +2 , . . . , en such that
B = {e1 , e2 , . . . , en } is an orthogonal basis of V .
8. Using Gram-Schmidt algorithm, find an orthonormal basis of the subspace W
of C3 spanned by v1 = (1, i, i + 3), v2 = (1, −i + 2, i + 4).
9. Let U be a subspace of R4 spanned by the vectors: v1 = (1, 1, 1, 1), v2 =
(1, −1, 2, 2), v3 = (1, 2, −3, −4). Apply Gram-Schmidt algorithm, to find
(a) an orthogonal basis of U and an orthonormal basis of U ,
(b) the projection of v = (−1, 2, 3, −4) onto U .
10. Show that an orthonormal subset {v₁, v₂, ..., vₘ} of an inner product space V
    is an orthonormal basis of V if and only if Σᵢ₌₁ᵐ |⟨v, vᵢ⟩|² = ‖v‖² for every v ∈ V.
11. Let the set {v1 , v2 , . . . , vn } be linearly dependent. What happens when the Gram-
Schmidt process of orthogonalization is applied to it?
12. Apply the Gram-Schmidt algorithm to orthonormalize the set of linearly independent
    vectors {(1, 0, 1, 1), (−1, 0, −1, 1), (0, −1, 1, 1)} of R⁴.
13. Find k so that u = (−1, 2, −3, 4) and v = (−3, 5, k, 6) in R4 are orthogonal.
14. Let W be the subspace of R3 spanned by u = (2, −3, 5) and v = (2, 4, −7).
Find a basis of the orthogonal complement W ⊥ of W .
15. If W is a subspace of C3 spanned by (1, i, 1 − i) and (i, −1, 0), then under the
standard inner product, find W ⊥ .
16. Let u = (−2, 3, 1, 5, 0) be a vector in R5 . Find an orthogonal basis for u ⊥ .
17. Let S consist of the following vectors in R4 : u 1 = (1, 1, 0, −1), u 2 = (1, 2, 1, 3),
u 3 = (1, 1, −9, 2), u 4 = (16, −13, 1, 3).
(a) Show that S is orthogonal and a basis of R4 .
(b) Find the coordinates of an arbitrary vector v = (a, b, c, d) in R4 relative to
the basis S.
18. Suppose that S, S1 , S2 are the subsets of an inner product space V . Prove the
following:
(a) S ⊆ S ⊥⊥ ,
(b) if S1 ⊆ S2 , then S1⊥⊥ ⊆ S2⊥⊥ .
19. Let V be a complex inner product space and let W be a subspace of V . Suppose
that v ∈ V is a vector for which v, w + w, v ≤ w, w for all w ∈ W . Prove
that v ∈ W ⊥ .
20. Suppose w₁, w₂, ..., wₙ form an orthogonal set of nonzero vectors in V. Let
    v ∈ V. Define v′ = v − (c₁w₁ + c₂w₂ + ⋯ + cₙwₙ), where cᵢ = ⟨v, wᵢ⟩/⟨wᵢ, wᵢ⟩. Then
    prove that v′ is orthogonal to w₁, w₂, ..., wₙ.
21. Suppose w₁, w₂, ..., wₙ form an orthogonal set of nonzero vectors in V. Let v be
    any vector of V and let cᵢ be the component of v along wᵢ. Then prove that for any
    scalars a₁, a₂, ..., aₙ, we have ‖v − Σₖ₌₁ⁿ cₖwₖ‖ ≤ ‖v − Σₖ₌₁ⁿ aₖwₖ‖, i.e., Σₖ₌₁ⁿ cₖwₖ
    is the closest approximation to v as a linear combination of w₁, w₂, ..., wₙ.
22. Let V be the vector space of polynomials over R of degree ≤ 3 with inner product
1
defined by  f, g = 0 f (t)g(t)dt. Find a basis of the subspace W orthogonal
to h(t) = −3 + 2t + 5t 2 + 6t 3 .
23. Let M2 (R) be the vector space with inner product A, B = tr (B t A). Find an
orthogonal basis for the orthogonal complement of
(a) diagonal matrices,
(b) symmetric matrices,
(c) skew symmetric matrices.
24. Suppose {u 1 , u 2 , . . . , u n } is an orthogonal set of vectors. Show that
{k1 u 1 , k2 u 2 , . . . , kn u n } is an orthogonal set for any scalars k1 , k2 , . . . , kn .
25. Let V be the real inner product space consisting of the real valued continuous
    functions on [−1, 1] with the inner product ⟨f, g⟩ = ∫₋₁¹ f(t)g(t)dt. Let W be
    the subspace of odd functions, i.e., functions satisfying f(−t) = −f(t). Find the
    orthogonal complement of W.
26. Suppose V = W₁ ⊕ W₂ and that f₁ and f₂ are inner products on W₁ and W₂,
    respectively. Show that there is a unique inner product f on V such that
    (a) W₂ = W₁⊥,
    (b) f(u, v) = fₖ(u, v) when u, v ∈ Wₖ, k = 1, 2.

5.4 Operators on Inner Product Spaces

Let U, V be inner product spaces over F and T : V → U be a linear transformation.


Fix u ∈ U and consider a linear functional ϕ on V such that ϕ(v) = T (v), u, v ∈ V .
This linear functional depends on T and u. By Riesz representation theorem there
exists a unique vector in V such that this linear functional is given by taking the inner
product with it. If T ∗ (u) is the unique vector in V , then T (v), u = v, T ∗ (u).

Definition 5.42 Let U and V be inner product spaces over the field F and T :
U → V be a linear transformation. The adjoint of T , denoted as T ∗ , is the function
T ∗ : V → U such that T (u), v = u, T ∗ (v), for every u ∈ U and v ∈ V .

Example 5.43 Let T : R2 → R3 such that T (a, b) = (a + b, a − b, b) be a linear


transformation. Find the adjoint of T .

Proof Adjoint of T , i.e., T ∗ is a function from R3 to R2 . Now let (x, y, z) ∈ R3 be


an arbitrary point in R3 . Then for every (a, b) ∈ R2

(a, b), T ∗ (x, y, z) = T (a, b), (x, y, z)


= (a + b, a − b, b), (x, y, z)
= ax + bx + ay − by + bz
= (a, b), (x + y, x − y + z).

This shows that T ∗ (x, y, z) = (x + y, x − y + z), which is the required adjoint of


T.

Remark 5.44 The above example shows that if T : R2 → R3 is a linear transforma-


tion, then T ∗ : R3 → R2 is also a linear transformation. The following result shows
that this is indeed true in general.

Theorem 5.45 If U and V are inner product spaces over the same field F and
T : U → V is a linear transformation, then T ∗ is a linear transformation from V to
U.

Proof Given that T : U → V is a linear transformation, let v₁, v₂ ∈ V and α, β ∈
F. If u ∈ U, then

    ⟨u, T∗(αv₁ + βv₂)⟩ = ⟨T(u), αv₁ + βv₂⟩
                       = ⟨T(u), αv₁⟩ + ⟨T(u), βv₂⟩
                       = ᾱ⟨T(u), v₁⟩ + β̄⟨T(u), v₂⟩
                       = ᾱ⟨u, T∗(v₁)⟩ + β̄⟨u, T∗(v₂)⟩
                       = ⟨u, αT∗(v₁)⟩ + ⟨u, βT∗(v₂)⟩
                       = ⟨u, αT∗(v₁) + βT∗(v₂)⟩.

Hence the above yields that T∗(αv₁ + βv₂) = αT∗(v₁) + βT∗(v₂) for all v₁, v₂ ∈ V
and α, β ∈ F.

Theorem 5.46 Let U, V, W be inner product spaces over the same field F. Then
(i)(S + T )∗ = S ∗ + T ∗ for all S, T ∈ H om(U, V ),
(ii) (αT)∗ = ᾱT∗ for all α ∈ F, T ∈ Hom(U, V),
(iii)(ST )∗ = T ∗ S ∗ for all T ∈ H om(U, V ), S ∈ H om(V, W ),
(iv) (T ∗ )∗ = T for all T ∈ H om(U, V ),
(v) I ∗ = I (resp. O ∗ = O), where I (rep. O) is the identity (resp. zero) linear
transformation on U
(vi) if T ∈ H om(U, V ) and T is invertible, then (T ∗ )−1 = (T −1 )∗ .

Proof (i) Let S, T ∈ H om(U, V ). If u ∈ U and v ∈ V, then

u, (S + T )∗ (v) = (S + T )(u), v


= S(u), v + T (u), v
= u, S ∗ (v) + u, T ∗ (v)
= u, S ∗ (v) + T ∗ (v)

for all u ∈ U, v ∈ V . This shows that (S + T )∗ = S ∗ + T ∗ for all S, T ∈ H om


(U, V ).

(ii) For any α ∈ F and u ∈ U, v ∈ V, we find that

    ⟨u, (αT)∗(v)⟩ = ⟨(αT)(u), v⟩ = α⟨T(u), v⟩ = α⟨u, T∗(v)⟩ = ⟨u, ᾱT∗(v)⟩.

This shows that (αT)∗ = ᾱT∗ for all α ∈ F, T ∈ Hom(U, V).

(iii) Assume that T ∈ H om(U, V ), S ∈ H om(V, W ). Then for any u ∈ U, w ∈ W

u, (ST )∗ (w) = (ST )(u), w = S(T (u)), w = T (u), S ∗ (w) = u, T ∗ (S ∗ (w)).

This implies that (ST )∗ = T ∗ S ∗ for all T ∈ H om(U, V ), S ∈ H om(V, W ).

(iv) Let T ∈ H om(U, V ). For any u ∈ U, v ∈ V , we have T ∗ (u), v=u, (T ∗ )∗ (v).


It is also easy to see that T ∗ (u), v = v, T ∗ (u) = T (v), u = u, T (v). Combin-
ing this with the above relation, we find that (T ∗ )∗ = T for all T ∈ H om(U, V ).
(v) For any u, v ∈ U we have u, v = I (u), v = u, I ∗ (v). This shows that
I ∗ = I . Similarly one can prove that 0∗ = 0.

(vi) If T ∈ H om(U, V ) and is invertible, then T T −1 = T −1 T = I . This implies that


(T T −1 )∗ = I ∗ , i.e., (T T −1 )∗ = I . This yields that (T −1 )∗ T ∗ = I . Similarly, it can
be seen that T ∗ (T −1 )∗ = I which shows that (T ∗ )−1 = (T −1 )∗ .

Remark 5.47 (i) One can find the relationship between N (T ), the null space and
R(T ), the range of a linear map T and its adjoint T ∗ . If T ∈ H om(U, V ), then
for any v ∈ V , v ∈ N (T ∗ ) if and only if T ∗ (v) = 0, and T ∗ (v) = 0 if and only
if u, T ∗ (v) = 0 for all u ∈ U or T (u), v = 0 for all u ∈ U . This shows that
v ∈ N (T ∗ ) if and only if v ∈ (R(T ))⊥ and hence N (T ∗ ) = (R(T ))⊥ . By taking
the orthogonal complement both the sides one can find that R(T ) = N (T ∗ )⊥ ,
and hence replacing T by T ∗ we find that R(T ∗ ) = (N (T ))⊥ .
(ii) If T is a linear transformation from U to V , then the following theorem shows
that it is easy to find T ∗ if we can find the relation between m(T ) and m(T ∗ ).

Theorem 5.48 Let U and V be nonzero finite dimensional inner product spaces
over the same field F, and let B1 = {u 1 , u 2 , . . . , u m } and B2 = {v1 , v2 , . . . , vn } be
ordered orthonormal basis of U and V , respectively. If T is a linear transformation
from U into V , and m(T ) = (α ji ) be the matrix of T of order n × m relative to the
basis B1 of U , then the matrix of T ∗ relative to the basis B2 is the conjugate transpose
of m(T ) of order m × n.

Proof Given that U and V are finite dimensional inner product spaces with orthonor-
mal basis B1 = {u 1 , u 2 , . . . , u m } and B2 = {v1 , v2 , . . . , vn }, respectively. Suppose
that m(T ∗ ) = (βi j ) represents the matrix of T ∗ relative to the basis B2 . Then for
1 ≤ i ≤ m, 1 ≤ j ≤ n,

    T(uᵢ) = Σⱼ₌₁ⁿ αⱼᵢvⱼ  and  T∗(vⱼ) = Σᵢ₌₁ᵐ βᵢⱼuᵢ.

Since B₁ and B₂ are orthonormal bases, we find that

    ⟨T(uᵢ), vⱼ⟩ = ⟨Σₖ₌₁ⁿ αₖᵢvₖ, vⱼ⟩ = αⱼᵢ.

Similarly,

    ⟨T∗(vⱼ), uᵢ⟩ = ⟨Σₖ₌₁ᵐ βₖⱼuₖ, uᵢ⟩ = βᵢⱼ.

But the latter relations show that

    βᵢⱼ = ⟨T∗(vⱼ), uᵢ⟩ = ⟨uᵢ, T∗(vⱼ)⟩‾ = ⟨T(uᵢ), vⱼ⟩‾ = ᾱⱼᵢ,

where the overline denotes complex conjugation, and hence m(T∗) is the conjugate
transpose of m(T).
This relation is very useful in determining T ∗ if T is known.

Example 5.49 Let T : R2 → R3 be a linear transformation given by T (a, b) =


(a + b, a − b, b). Then the matrix of T relative to the standard bases is
⎡ ⎤
1 1
m(T ) = ⎣ 1 −1 ⎦ .
0 1

Thus the matrix of T ∗ relative to the standard bases is




∗ 1 1 0
m(T ) = .
1 −1 1

Hence, T ∗ (a, b, c) = (a + b, a − b + c).

Example 5.50 Let T : C3 → C3 be a linear transformation defined by T (a, b, c) =


(2a + (1 − i)b, (4 + i)a − 5ic, 3ia + (2 + i)b + 3c). Then the matrix of T with
respect to the standard basis of C3 is given by
⎡ ⎤
2 1−i 0
m(T ) = ⎣ 4 + i 0 −5i ⎦ .
3i 2 + i 3

Thus the matrix of T ∗ relative to the standard basis of C3 is


⎡ ⎤
2 4 − i −3i
m(T ∗ ) = ⎣ 1 + i 0 2 − i ⎦ ,
0 5i 3

and T ∗ (a, b, c) = (2a + (4 − i)b − 3ic, (1 + i)a + (2 − i)c, 5ib + 3c).
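Numerically, the relation m(T∗) = conjugate transpose of m(T) from Theorem 5.48 can be checked together with the defining identity ⟨T(u), v⟩ = ⟨u, T∗(v)⟩. The sketch below is an editorial illustration (not part of the original text); it assumes NumPy and uses the matrix of Example 5.50 with randomly generated test vectors.

import numpy as np

A = np.array([[2, 1 - 1j, 0],
              [4 + 1j, 0, -5j],
              [3j, 2 + 1j, 3]])          # m(T) w.r.t. the standard basis of C^3
A_star = A.conj().T                       # m(T*), the conjugate transpose

rng = np.random.default_rng(2)
u = rng.normal(size=3) + 1j * rng.normal(size=3)
v = rng.normal(size=3) + 1j * rng.normal(size=3)
# <T(u), v> = <u, T*(v)>, with <a, b> = sum a_i conj(b_i)
print(np.isclose(np.vdot(v, A @ u), np.vdot(A_star @ v, u)))   # True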

Now we consider linear operator T on a finite dimensional inner product space V ,


i.e., T ∈ A (V ) instead of T ∈ H om(V, V ).

Definition 5.51 An operator T ∈ A (V ) is called self-adjoint if T = T ∗ , and if


T ∗ = −T , then T is called skew-adjoint.

Remark 5.52 (i) T ∈ A (V ) is self-adjoint if and only if T (u), v = u, T (v),


for all u, v ∈ V .
(ii) In an Euclidean space, the self-adjoint linear transformation is called symmetric
and in case of unitary space it is called Hermitian.
(iii) If T ∈ A (V ) is a self-adjoint operator on an inner product space V , then
(S ∗ T S)∗ = S ∗ T ∗ (S ∗ )∗ = S ∗ T S, i.e., S ∗ T S is self-adjoint for all S ∈ A (V ).
On the other hand, if S is invertible and S ∗ T S is self-adjoint, then it can be seen
that T is self-adjoint. In fact, if S ∗ T S is self-adjoint, then (S ∗ T S)∗ = S ∗ T S.


This implies that S ∗ T ∗ S = S ∗ T S. Since S invertible, S ∗ is also invertible and the
latter relation shows that T ∗ S = T S, and hence T ∗ = T i.e., T is self-adjoint.
Example 5.53 Let T : C3 → C3 be a linear transformation such that T (a, b, c) =
(5a + (2 + 3i)b + (3 − 2i)c, (2 − 3i)a + 2b + (1 + i)c, (3 + 2i)a + (1 − i)b +
7c). Then the matrix of T with respect to the standard basis of C3 is given by
⎡ ⎤
5 2 + 3i 3 − 2i
m(T ) = ⎣ 2 − 3i 2 1 +i ⎦.
3 + 2i 1 − i 7

Thus ⎡ ⎤
5 2 + 3i 3 − 2i
m(T ∗ ) = ⎣ 2 − 3i 2 1 +i ⎦.
3 + 2i 1 − i 7

This shows that T = T ∗ and T is a self-adjoint operator.


Example 5.54 Let T : C² → C² be such that T(a, b) = (2ia + (4 + i)b, (−4 + i)a +
3ib). Then

    m(T) =
    ⎡  2i      4 + i ⎤
    ⎣ −4 + i    3i   ⎦

and hence

    m(T∗) =
    ⎡ −2i     −4 − i ⎤
    ⎣  4 − i   −3i   ⎦  = −m(T).

Thus T∗ = −T and T is skew-adjoint.

If S, T ∈ A (V ) are self-adjoint operators on a finite dimensional inner product space


V , then S ∗ = S and T ∗ = T . Now using the results given in Theorem 5.46, one can
easily prove the following:

Theorem 5.55 Let S, T ∈ A (V ) be self-adjoint operators on a finite dimensional


inner product space V . Then
(i) S + T is self-adjoint,
(ii) if T is invertible, then T ∗ is invertible and (T ∗ )−1 = (T −1 )∗ ,
(iii) T S is self-adjoint if and only if T S = ST ,
(iv) for any real number α, αT is self-adjoint.
Definition 5.56 An operator T ∈ A (V ) on an inner product space V is called nor-
mal if T T ∗ = T ∗ T .
Remark 5.57 If T ∈ A (V ) is self-adjoint, then T ∗ = T and hence every self-
adjoint operator is normal, but there exist normal operators which are not self-adjoint.
Example 5.58 Let T : R² → R² be such that T(a, b) = (2a − 3b, 3a + 2b). Then
with respect to the standard basis,

    m(T) =
    ⎡ 2  −3 ⎤
    ⎣ 3   2 ⎦.

Thus

    m(T∗) =
    ⎡  2   3 ⎤
    ⎣ −3   2 ⎦,

hence T∗(a, b) = (2a + 3b, −3a + 2b). It can be easily verified that TT∗ = T∗T, i.e.,
T is normal but not self-adjoint.
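Normality and self-adjointness are both immediate to test at the matrix level. The sketch below is an editorial illustration (not part of the original text); it assumes NumPy and uses the matrix of Example 5.58.

import numpy as np

A = np.array([[2.0, -3.0],
              [3.0,  2.0]])             # m(T) for T(a, b) = (2a - 3b, 3a + 2b)
A_star = A.conj().T
print(np.allclose(A @ A_star, A_star @ A))   # True: T is normal
print(np.allclose(A, A_star))                # False: T is not self-adjoint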
Remark 5.59 (i) If T ∈ A(V) is self-adjoint, then ⟨T(v), v⟩ = ⟨v, T∗(v)⟩ =
⟨v, T(v)⟩ = ⟨T(v), v⟩‾ for all v ∈ V, so ⟨T(v), v⟩ equals its own conjugate, i.e.,
⟨T(v), v⟩ is real. Similarly it can also be seen that if T ∈ A(V) is skew-adjoint,
then ⟨T(v), v⟩ is purely imaginary for all v ∈ V.
(ii) T ∈ A(V) is normal if and only if (TT∗ − T∗T)(v) = 0 for all v ∈ V. In fact,

    T is normal ⇐⇒ ⟨TT∗(v), v⟩ − ⟨T∗T(v), v⟩ = 0
                ⇐⇒ ⟨T∗(v), T∗(v)⟩ − ⟨T(v), T(v)⟩ = 0

for all v ∈ V. This shows that T is normal if and only if ‖T(v)‖² = ‖T∗(v)‖²,
i.e., ‖T(v)‖ = ‖T∗(v)‖.
(iii) If T ∈ A (V ) is normal, then it can be easily seen that N (T ) = N (T ∗ ). In fact,
if v ∈ N (T ), then T (v) = 0. This implies that T ∗ (v), T ∗ (v) = v, T T ∗ (v) =
v, T ∗ T (v) = v, 0 = 0, and hence T ∗ (v) = 0, i.e., v ∈ N (T ∗ ). This yields
that N (T ) ⊆ N (T ∗ ). In a similar manner it can be seen that if u ∈ N (T ∗ ),
then T (u), T (u) = 0, which shows that T (u) = 0 and hence u ∈ N (T ), i.e.,
N (T ∗ ) ⊆ N (T ). This gives that N (T ) = N (T ∗ ).
(iv) If V is a complex inner product space and T ∈ A (V ) such that T (v), v = 0
for all v ∈ V , then T = 0. In fact, T (v), v = 0 for all v ∈ V yields that

0 = 41 {T (u + w), u + w − T (u − w), u − w + i(T (u + iw), u + iw


−T (u − iw), u − iw)}
= 41 {2T (u), w + 2T (w), u + i(−2iT (u), w + 2iT (w), u)}
= T (u), w,

that is, T (u), w = 0, for all u, w ∈ V . Hence, in particular T (u), T (u) = 0,


i.e., T (u) = 0 for all u ∈ V .
(v) If V is a real inner product space, a nonzero operator T might satisfy T (v), v =
0 for all v ∈ V . However, this cannot happen for a self-adjoint operator, i.e., if
T = T ∗ and T (v), v = 0 for all v ∈ V , then T = 0. In fact, if T is self-adjoint,
then for all u, w ∈ V , T (w), u = w, T (u) = T (u), w. This yields that

0 = 14 {T (u + w), u + w − T (u − w), u − w}


= 14 {2T (u), w + 2T (w), u}
= T (u), w,

for all u, w ∈ V and hence again T = 0.


(vi) If T ∈ A (V ) is self-adjoint and T = 0, then it can be seen that T cannot
be nilpotent. In fact, if T is nilpotent, then there exists a positive integer n
such that T n = 0, but T n−1 = 0. Thus there exists u ∈ V such that T n−1 (u) =
0. Now T n−1 (u), v = u, (T n−1 )∗ (v) for all v ∈ V . Since T is self-adjoint
T n−1 (u), T n−1 (u) = u, T 2n−2 (u). But since 2n − 2 ≥ n for n > 1, we find
that T n−1 (u), T n−1 (u) = 0 and hence T n−1 (u) = 0, a contradiction.
Recall that every inner product space is a normed space and every normed space is
a metric space. In fact, if V is an inner product space then V is a normed space under
the norm ‖v‖ = √⟨v, v⟩, and it has also been seen in the beginning of this chapter
that every normed space is a metric space with respect to the metric d : V × V → R
defined by d(u, v) = ‖u − v‖. In the case of an isomorphism, it is also known that a
linear map preserves the operations of the vector space. Now we define linear
transformations on V which do not change the distance, or do not change the length of
a vector, in an inner product space V.

Definition 5.60 Let V be an inner product space over a field F, d : V × V → R


be a metric on V and T ∈ A (V ). Then T is said to be an isometry if for all u, v ∈
V, d(u, v) = d(T (u), T (v)).

Example 5.61 Consider the linear transformation T : R² → R² such that T(a, b) =
(b, a), where R² is an inner product space with the standard inner product. Let z =
(x, y) ∈ R². Then ‖z‖ = √(x² + y²). Moreover, T(z) = (y, x) yields that ‖T(z)‖ =
√(y² + x²). Thus ‖T(z)‖ = ‖z‖ and T preserves the length of a vector. In this case let
d : R² × R² → R such that d(z₁, z₂) = ‖z₁ − z₂‖ be a metric on R². Then for z₁ =
(a₁, b₁), z₂ = (a₂, b₂) ∈ R²

    d(z₁, z₂) = √((a₁ − a₂)² + (b₁ − b₂)²),
    d(T(z₁), T(z₂)) = √((b₁ − b₂)² + (a₁ − a₂)²).

This shows that d(z₁, z₂) = d(T(z₁), T(z₂)), i.e., T preserves the distance. However,
if we take S : R² → R² such that S(a, b) = (0, b), then it can be easily seen that
‖z‖ ≠ ‖S(z)‖ in general; for instance, z = (1, 0) gives ‖z‖ = 1 but ‖S(z)‖ = 0.

Remark 5.62 (i) T ∈ A(V) is an isometry if and only if ‖T(x)‖ = ‖x‖ for all
x ∈ V. If T is an isometry, then for any x, y ∈ V, ‖x − y‖ = ‖T(x) − T(y)‖ =
‖T(x − y)‖, and hence in particular ‖x‖ = ‖T(x)‖. Conversely, if ‖T(x)‖ =
‖x‖ for all x ∈ V, then for any x, y ∈ V, ‖x − y‖ = ‖T(x − y)‖ = ‖T(x) −
T(y)‖, i.e., T is an isometry.
(ii) The sum of two isometries need not be an isometry. For example, let S, T :
R² → R² such that S(a, b) = (b, a), T(a, b) = (−b, −a) for any a, b ∈ R.
Then S and T are isometries on R², but S + T = 0, which is not an isometry.
(iii) In a finite dimensional inner product space V, isometries are non-singular. In
fact, if T ∈ A(V) is an isometry, then a ∈ N(T) if and only if T(a) = 0, i.e.,
‖a‖ = 0 or a = 0. Hence T is one-to-one and, since V is finite dimensional, T
is also onto; thus T is non-singular.

Theorem 5.63 Let V be an inner product space over F. Then T ∈ A(V) is an
isometry if and only if ⟨T(u), T(v)⟩ = ⟨u, v⟩ for all u, v ∈ V.

Proof Let T be an isometry. We separate the proof into two cases:
Case I. Assume that V is a Euclidean space. Then ⟨u, v⟩ = ⟨v, u⟩. Since
‖u + v‖² = ‖u‖² + ‖v‖² + ⟨u, v⟩ + ⟨v, u⟩, we find that

    ⟨u, v⟩ = ‖u + v‖²/2 − ‖u‖²/2 − ‖v‖²/2

for all u, v ∈ V. Now since T ∈ A(V), the latter relation yields that

    ⟨T(u), T(v)⟩ = ‖T(u) + T(v)‖²/2 − ‖T(u)‖²/2 − ‖T(v)‖²/2
                 = ‖T(u + v)‖²/2 − ‖T(u)‖²/2 − ‖T(v)‖²/2
                 = ‖u + v‖²/2 − ‖u‖²/2 − ‖v‖²/2
                 = ⟨u, v⟩

for all u, v ∈ V.
Case II. Suppose that V is a unitary space. In this case ⟨v, u⟩ = ⟨u, v⟩‾ implies that

    ⟨u, v⟩ + ⟨u, v⟩‾ = ‖u + v‖² − ‖u‖² − ‖v‖².

It can be easily seen that

    ‖u + iv‖² = ⟨u + iv, u + iv⟩ = ⟨u, u⟩ − i⟨u, v⟩ + i⟨v, u⟩ − i²⟨v, v⟩
              = ‖u‖² − i⟨u, v⟩ + i⟨u, v⟩‾ + ‖v‖².

The above relation yields that

    ⟨u, v⟩ − ⟨u, v⟩‾ = i{‖u + iv‖² − ‖u‖² − ‖v‖²}.

The above two relations reduce to

    2⟨u, v⟩ = ‖u + v‖² + i‖u + iv‖² − (1 + i)(‖u‖² + ‖v‖²) for all u, v ∈ V.

Replacing u by T(u), v by T(v), and using the fact that T is an isometry with
T(iv) = iT(v), we find that

    ⟨T(u), T(v)⟩ = (1/2){‖T(u) + T(v)‖² + i‖T(u) + iT(v)‖² − (1 + i)(‖T(u)‖² + ‖T(v)‖²)}
                 = (1/2){‖T(u + v)‖² + i‖T(u + iv)‖² − (1 + i)(‖T(u)‖² + ‖T(v)‖²)}
                 = (1/2){‖u + v‖² + i‖u + iv‖² − (1 + i)(‖u‖² + ‖v‖²)}
                 = ⟨u, v⟩.

Conversely, if ⟨T(u), T(v)⟩ = ⟨u, v⟩ for all u, v ∈ V, then in particular ⟨T(u), T(u)⟩
= ⟨u, u⟩ for all u ∈ V, that is, ‖T(u)‖² = ‖u‖². Hence ‖T(u)‖ = ‖u‖ and T is an
isometry.
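A rotation of the plane is the standard example of an isometry, and the preservation of inner products asserted by Theorem 5.63 can be confirmed numerically. The sketch below is an editorial illustration (not part of the original text); it assumes NumPy, and the angle and test vectors are sample data.

import numpy as np

theta = 0.73
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # rotation of R^2, an isometry

rng = np.random.default_rng(3)
u, v = rng.normal(size=2), rng.normal(size=2)
print(np.isclose((Q @ u) @ (Q @ v), u @ v))                    # inner products preserved
print(np.isclose(np.linalg.norm(Q @ u), np.linalg.norm(u)))    # lengths preserved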

One can also find another kind of linear transformation on an inner product space V
which preserves all the structure of V .
Definition 5.64 Let V be an inner product space over the field F. Then T ∈ A(V)
is said to be unitary if

    ⟨T(u), T(v)⟩ = ⟨u, v⟩ for all u, v ∈ V.

Remark 5.65 If T ∈ A (V ) satisfies T (u), T (u) = u, u for all u ∈ V , then it


can be seen that T is unitary. In fact, if T (u), T (u) = u, u, then by linearizing
this relation on u we find that

T (u), T (v) + T (v), T (u) = u, v + v, u for all u, v ∈ V.

Since the above relation holds for all v ∈ V , replacing v by iv, where i 2 = −1
we arrive at −iT (u), T (v) + iT (v), T (u) = −iu, v + iv, u for all u, v ∈
V. Now multiplying by i yields that

T (u), T (v) − T (v), T (u) = u, v − v, u for all u, v ∈ V.

Combining the above two relations we find that T (u), T (v) = u, v for all u, v ∈
V , and therefore T is unitary.
Example 5.66 Let V be a finite dimensional inner product space and λ₁, λ₂, ..., λₙ
be scalars with absolute value 1. If T ∈ A(V) satisfies T(vⱼ) = λⱼvⱼ for some
orthonormal basis {v₁, v₂, ..., vₙ} of V, then it can be seen that T is an isometry.
Suppose that v ∈ V. Then there exist scalars α₁, α₂, ..., αₙ such that v = Σᵢ₌₁ⁿ αᵢvᵢ.
Since the given basis is orthonormal, ⟨v, vᵢ⟩ = αᵢ. Hence v = Σᵢ₌₁ⁿ ⟨v, vᵢ⟩vᵢ and

    ‖v‖² = ⟨Σᵢ₌₁ⁿ αᵢvᵢ, Σⱼ₌₁ⁿ αⱼvⱼ⟩ = Σᵢ₌₁ⁿ αᵢᾱᵢ⟨vᵢ, vᵢ⟩ = Σᵢ₌₁ⁿ |αᵢ|² = Σᵢ₌₁ⁿ |⟨v, vᵢ⟩|².

Since v = Σᵢ₌₁ⁿ ⟨v, vᵢ⟩vᵢ, we find that T(v) = Σᵢ₌₁ⁿ ⟨v, vᵢ⟩T(vᵢ) = Σᵢ₌₁ⁿ ⟨v, vᵢ⟩λᵢvᵢ. Now

    ‖T(v)‖² = ⟨Σᵢ₌₁ⁿ ⟨v, vᵢ⟩λᵢvᵢ, Σⱼ₌₁ⁿ ⟨v, vⱼ⟩λⱼvⱼ⟩ = Σᵢ₌₁ⁿ |⟨v, vᵢ⟩|²|λᵢ|² = Σᵢ₌₁ⁿ |⟨v, vᵢ⟩|².

The above two relations yield that ‖T(v)‖² = ‖v‖², i.e., ‖T(v)‖ = ‖v‖, and T is an
isometry.

The following theorem characterizes an isometry in terms of orthonormal bases of
an inner product space.

Theorem 5.67 Let V be a finite dimensional inner product space over F. T ∈ A(V)
is an isometry if and only if T maps an orthonormal basis of V into an orthonormal
basis of V.

Proof Let {v₁, v₂, ..., vₙ} be an orthonormal basis of V. Hence ⟨vᵢ, vⱼ⟩ = 0 or 1
according as i ≠ j or i = j. But since T is an isometry, ⟨T(vᵢ), T(vⱼ)⟩ = ⟨vᵢ, vⱼ⟩,
1 ≤ i, j ≤ n. This yields that ⟨T(vᵢ), T(vⱼ)⟩ = 0 or 1 according as i ≠ j or i = j.
This shows that {T(v₁), T(v₂), ..., T(vₙ)} is an orthonormal set and, since an
orthonormal set is linearly independent, {T(v₁), T(v₂), ..., T(vₙ)} is a basis of V.
Conversely, assume that T maps an orthonormal basis of V into an orthonormal basis
of V. If {v₁, v₂, ..., vₙ} is an orthonormal basis of V, then for any v ∈ V, v = Σᵢ₌₁ⁿ αᵢvᵢ,
and

    ⟨v, v⟩ = ⟨Σᵢ₌₁ⁿ αᵢvᵢ, Σⱼ₌₁ⁿ αⱼvⱼ⟩ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ αᵢᾱⱼ⟨vᵢ, vⱼ⟩ = Σᵢ₌₁ⁿ |αᵢ|².

Similarly,

    ⟨T(v), T(v)⟩ = ⟨Σᵢ₌₁ⁿ αᵢT(vᵢ), Σⱼ₌₁ⁿ αⱼT(vⱼ)⟩ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ αᵢᾱⱼ⟨T(vᵢ), T(vⱼ)⟩ = Σᵢ₌₁ⁿ |αᵢ|².

This shows that ⟨v, v⟩ = ⟨T(v), T(v)⟩ for all v ∈ V, i.e., ‖T(v)‖ = ‖v‖, and T is an
isometry.

Theorem 5.68 Let V be a finite dimensional inner product space. Then T ∈ A (V )


is isometry if there exists a unique isometry S ∈ A (V ) such that T S = ST = I .

Proof Let v ∈ V and T ∈ A (V ). Then define a map ϕv : V → F, such that ϕv (u) =


T (u), v. For any u 1 , u 2 ∈ V, α, β ∈ F

ϕv (αu 1 + βu 2 ) = T (αu 1 + βu 2 ), v
= αT (u 1 ), v + βT (u 2 ), v
= αϕv (u 1 ) + βϕv (u 2 ).
Thus ϕv is a linear functional on V and hence by Riesz representation theorem


for each u ∈ V there exists a unique v ∈ V such that ϕv (u) = u, v . Now define
S : V → V such that S(v) = v for all v ∈ V . Hence T (u), v = u, S(v) for all
u, v ∈ V . Now

T (u), αv1 + βv2  = αT (u), v1  + βT (u), v2 


= αu, S(v1 ) + βu, S(v2 )
= u, αS(v1 ) + β S(v2 ).

But since T (u), αv1 + βv2  = u, S(αv1 + βv2 ), the above relation yields that
S(αv1 + βv2 ) = αS(v1 ) + β S(v2 ) and S ∈ A (V ). Now our claim is that S is
unique. Suppose there exists S  ∈ A (V ) such that T (u), v = u, S  (v). Then
u, (S − S  )(v) = 0 for all v ∈ V . This shows that S − S  = 0. It can also be seen
that S is one-to-one. If S(v) = 0, then T (u), v = u, S(v) = 0, for all u ∈ V .
Since T is an isometry, T −1 exists, and hence T (T −1 )(v), v = 0 which yields that
v, v = 0 or v = 0. Since V is finite dimensional, S is non-singular.
As T (u), v = u, S(v) for all u, v ∈ V , we find that T (u), T (v) = u, S(T (v))
for all u, v ∈ V . But since T is an isometry, we have u, v = u, ST (v). This reduces
to u, (ST − I )(v) = 0, for all u, v ∈ V . Hence, in particular (ST − I )(v), (ST −
I )(v) = 0, and ST = I . Similarly, it can also be seen that T S = I and hence S is an
isometry. In fact, for any isometry T there exists S ∈ A(V) such that ST = TS = I.
Now for any v ∈ V, ‖TS(v)‖ = ‖T(S(v))‖ = ‖S(v)‖, i.e., ‖v‖ = ‖I(v)‖ = ‖S(v)‖
for all v ∈ V, so that S is an isometry.


Theorem 5.69 Let V be a finite dimensional inner product space and T ∈ A (V ).
Then there exists a unique linear operator T ∗ on V such that T (u), v = u, T ∗ (v),
for all u, v ∈ V .
Proof Let v ∈ V. Define a map ϕv : V → F such that ϕv (u) = T (u), v. It is
straightforward to show that ϕv is a linear functional on V . Thus by Riesz represen-
tation theorem there exists a unique v ∈ V depending on v such that ϕv (u) = u, v .
Now let T ∗ (v) = v so that T (u), v = u, T ∗ (v). For any u, v1 , v2 ∈ V, α, β ∈ F

u, T ∗ (αv1 + βv2 ) = T (u), αv1 + βv2 


= αT (u), v1  + βT (u), v2 
= αu, T ∗ (v1 ) + βu, T ∗ (v2 )
= u, αT ∗ (v1 ) + βT ∗ (v2 ).

This shows that T ∗ (αv1 + βv2 ) = αT ∗ (v1 ) + βT ∗ (v2 ) and hence T ∗ is a linear
operator on V . For uniqueness, let there exist another linear operator on V such
that T (u), v = u, T  (v). This yields that u, T ∗ (v) = u, T  (v) or u, T ∗ (v) −
T  (v) = 0 and hence in particular we find that T ∗ (u) − T  (u) = 0 for all u ∈ V i.e.,
T ∗ = T .
Exercises
1. If T : R3 → R2 is a linear transformation, then find the adjoint of T .
2. Let V, W be inner product spaces over F. For a fix u ∈ V and x ∈ W let T :


V → W be linear transformation such that T (v) = v, ux for all v ∈ V . Find
the adjoint T ∗ . (Hint: v, T ∗ (w) = v, w, xu for w ∈ W, v ∈ V ).
3. Let T : R3 → R3 be a linear transformation such that T (a, b, c)=(a + b, b, a +
b + c). Find T ∗ .
4. Let T : C3 → C3 be a linear transformation such that T (a, b, c) = (2a −
ib, b − 5ic, a + (1 − i)b + 3c). Find T ∗ .
5. If T ∈ C2 , then find T ∗ in each of the following cases:
(a) T (a, b) = (ia + (1 − i)b, (1 + i)a − ib),
(b) T (a, b) = (2a + ib, (i − 1)a + b),
(c) T (a, b) = ((i − 1)a + b, a − b).
6. If V is an inner product space over the F, and T ∈ A (V ), then show that the
following conditions are equivalent:
(a) T (u), T (v) = u, v,
(b) T ∗ T = I,
(c) T is an isometry, i.e., ‖T(u)‖ = ‖u‖.
7. In the above problem if T ∈ A (V ) is one-to-one and onto, then show that (a)
and (c) are equivalent to (b) T ∗ T = T T ∗ = I .
8. If {v1 , v2 , . . . , vn } is an orthonormal basis of an inner product space V , then for
n
any u, v ∈ V show that u, v = u, vi vi , v.
i=1
9. Show that the set of all isometries on an inner product space forms a group.
10. If T ∈ A(V), then show that T = S + K, where S, K ∈ A(V) are such that S is
    self-adjoint and K is skew-adjoint, and the above decomposition is unique. (Hint:
    consider S = (T + T∗)/2 and K = (T − T∗)/2.)
11. Let T : C3 → C3 be a linear transformation such that T (a, b, c) = (a + ib +
(1 + i)c, ia − b, 2ia + 3b + (2i − 1)c). Find T ∗ .
Chapter 6
Canonical Forms of an Operator

In this chapter, we study the structure of linear operators. In all that follows, V will be
a finite dimensional vector space and T : V → V a linear operator from V to itself.
We recall that the kernel N (T ) and the image R(T ) of T are both subspaces of V and,
in the light of the rank-nullity theorem, the following conditions are equivalent: (i)
T is bijective, (ii) N (T ) = {0}, (iii) R(T ) = V. A linear operator which satisfies
these equivalent conditions is said to be invertible; its inverse is also a linear operator, and its matrix with respect to an arbitrary basis of V is an invertible matrix.
An operator that is not invertible is said to be a singular operator and its matrix with
respect to an arbitrary basis of V is a singular matrix.
We already have discussed that if V is a vector space of dimension n over the
field F and B is a basis for V , then we can associate a matrix A ∈ Mn (F) to T .
More precisely, the columns of A are the coordinate vectors of the images of the elements of the basis B. Thus any linear operator on V is represented by an appropriate matrix,
whose scalar entries depend precisely on the choice of the basis for V.
We remind the reader of the basic effect of the change of basis in V. If B and
B  are two different bases for V and A and A are the matrices of T relative to the
bases B and B  , respectively, then A and A are similar to each other. In particular
A = P −1 A P, where P is the transition matrix. Hence A and A represent the same
operator T and A is obtained from A by conjugation. In other words, any matrix
which is similar to A represents the linear operator T.
The first question one might ask is how to determine a basis of V with
respect to which the linear operator is represented by a particularly simple matrix.
To answer this question, we first introduce the concepts of eigenvalue, eigenvector
and eigenspace of a linear operator T. They will be the main tools for analyzing
the structure of the matrix of T. Then we describe the most relevant classes of
linear operators, in accordance with the study of their representing matrices. More
precisely, we focus our attention on operators which are represented in terms of some

suitable bases for V, by triangular, diagonal or block diagonal matrices (usually called
canonical forms of a matrix). Then, we conclude this chapter by studying the class
of normal operators.

6.1 Eigenvalues and Eigenvectors

Definition 6.1 An eigenvector of T is a nonzero vector v ∈ V such that T (v) = λv,
for some λ ∈ F. Analogously, let B = {e1 , . . . , en } be a basis of V, A the matrix
of T with respect to the basis B and v a nonzero vector of V. If X is the nonzero
coordinates column vector of v in terms of B, then v is said to be an eigenvector of
A if AX = λX for some λ ∈ F. The scalar λ ∈ F is called the eigenvalue associated
with the eigenvector v.

To determine whether λ ∈ F is an eigenvalue of the linear operator T : V → V, we must determine whether there are any nonzero solutions to the matrix equation AX = λX, where A ∈ Mn(F) is the n × n matrix of T. If we denote X = [x1, x2, . . . , xn]ᵀ and A = (aij), the above matrix equation reduces to the linear system

a11 x1 + a12 x2 + · · · + a1n xn = λx1
a21 x1 + a22 x2 + · · · + a2n xn = λx2
. . .      . . .      . . .      . . .
an1 x1 + an2 x2 + · · · + ann xn = λxn,

that is, the homogeneous linear system (A − λI)X = 0, where I denotes the n × n identity matrix. We know that a homogeneous linear system admits nontrivial solutions if and only if the rank of its coefficient matrix is less than the number of indeterminates. In particular, to determine the nonzero solutions of (A − λI)X = 0 we need that |A − λI| = 0. The polynomial p(λ) = |A − λI| is said to be the characteristic polynomial of T (of A). Here

p(λ) = | a11 − λ   a12    . . .   a1n     |
       |   .                        .     |
       |   .                        .     |
       | an1       an2    . . .   ann − λ |
     = (−1)^n λ^n + α_{n−1} λ^{n−1} + · · · + α1 λ + α0,

where αi ∈ F, for any i = 0, . . . , n − 1. Of course, its degree is equal to n. Moreover, p(λ) has exactly n roots in the algebraic closure of F. Any of these roots which belongs to F is an eigenvalue of T (of A). Analogously, the equation p(λ) = 0 is said to be the characteristic equation of T (of A) and has n solutions in the algebraic closure of F. Any of these solutions which belongs to F is an eigenvalue of T (of A). Without loss of generality, in all that follows we refer to p(λ) as the characteristic polynomial of the matrix A.
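The computation of p(λ) and of its roots can also be carried out by machine; the following minimal sketch (an illustration assuming the sympy library) does so for the matrix of Example 6.2 below.

    import sympy as sp

    lam = sp.symbols('lambda')
    A = sp.Matrix([[1, 2, 2],
                   [1, 3, 1],
                   [2, 2, 1]])                # the matrix of Example 6.2
    p = (A - lam * sp.eye(3)).det()           # characteristic polynomial |A - lambda*I|
    print(sp.factor(p))                       # -(lambda - 5)*(lambda - 1)*(lambda + 1)
    print(sp.solve(sp.Eq(p, 0), lam))         # eigenvalues [-1, 1, 5]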
Now we assume that λ0 ∈ F is an eigenvalue of A. To find an eigenvector of A associated with λ0, we have to solve the matrix equation AX = λ0X. Since |A − λ0I| = 0, the rank of the matrix A − λ0I is equal to some r < n. Therefore, the above homogeneous linear system admits nontrivial solutions. More precisely, we can describe the set of all solutions as follows:

V0 = {0 ≠ X ∈ V | (A − λ0I)X = 0}.

The vector subspace of V defined as W0 = V0 ∪ {0} is said to be the eigenspace of A associated with the eigenvalue λ0. The dimension of W0 is equal to n − r.

Example 6.2 Let T : R³ → R³ be the linear operator with associated matrix

A = ⎡ 1 2 2 ⎤
    ⎢ 1 3 1 ⎥
    ⎣ 2 2 1 ⎦

with respect to the canonical basis of R³. The characteristic polynomial is the following determinant:

p(λ) = | 1 − λ   2       2     |
       | 1       3 − λ   1     |  = (λ − 1)(−λ² + 4λ + 5).
       | 2       2       1 − λ |

There are three distinct roots of p(λ), that is, three distinct eigenvalues λ1 = −1, λ2 = 5, λ3 = 1 of T. Let us find the corresponding eigenspaces.
For λ1 = −1, we have

A + I = ⎡ 2 2 2 ⎤
        ⎢ 1 4 1 ⎥
        ⎣ 2 2 2 ⎦

which has rank equal to 2. Thus, the associated homogeneous linear system

2x1 + 2x2 + 2x3 = 0
x1 + 4x2 + x3 = 0
2x1 + 2x2 + 2x3 = 0

has solutions X = (α, 0, −α), for any α ∈ R. Hence, the eigenspace corresponding to λ1 has dimension 1 and is generated by the eigenvector (1, 0, −1).
For λ2 = 5, we have

A − 5I = ⎡ −4  2  2 ⎤
         ⎢  1 −2  1 ⎥
         ⎣  2  2 −4 ⎦

having rank equal to 2, so that the associated homogeneous linear system

−4x1 + 2x2 + 2x3 = 0
x1 − 2x2 + x3 = 0
2x1 + 2x2 − 4x3 = 0

has solutions X = (α, α, α), for any α ∈ R. In this case, the eigenspace corresponding to λ2 is generated by the eigenvector (1, 1, 1).
Finally, for λ3 = 1,

A − I = ⎡ 0 2 2 ⎤
        ⎢ 1 2 1 ⎥
        ⎣ 2 2 0 ⎦

has rank equal to 2 and the associated homogeneous linear system

2x2 + 2x3 = 0
x1 + 2x2 + x3 = 0
2x1 + 2x2 = 0

has solutions X = (α, −α, α), for any α ∈ R. The eigenspace corresponding to λ3 is generated by the eigenvector (1, −1, 1).
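As a quick cross-check (an illustration assuming the numpy library, not part of the worked example), the eigenvalues and eigenvectors of A can be recomputed numerically; the eigenvectors returned are scalar multiples of (1, 0, −1), (1, 1, 1) and (1, −1, 1) found above.

    import numpy as np

    A = np.array([[1., 2., 2.],
                  [1., 3., 1.],
                  [2., 2., 1.]])
    eigvals, eigvecs = np.linalg.eig(A)
    print(sorted(np.round(eigvals, 6)))            # [-1.0, 1.0, 5.0]
    for lam, v in zip(eigvals, eigvecs.T):
        print(np.allclose(A @ v, lam * v))         # each column satisfies A v = lambda v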
Example 6.3 Let T : R³ → R³ be the linear operator with associated matrix

A = ⎡ 0 −1 1 ⎤
    ⎢ 1  0 2 ⎥
    ⎣ 0  0 2 ⎦

with respect to the canonical basis of R³. The characteristic polynomial is

p(λ) = | −λ  −1   1     |
       |  1  −λ   2     |  = (2 − λ)(λ² + 1)
       |  0   0   2 − λ |

having three distinct roots λ1 = 2, λ2 = i, λ3 = −i. In the light of our definition, we say that λ1 = 2 is an eigenvalue of T, but λ2 = i, λ3 = −i are not, since i, −i ∉ R.
Example 6.4 Let now T : C³ → C³ be the linear operator with associated matrix

A = ⎡ 0 −1 1 ⎤
    ⎢ 1  0 2 ⎥
    ⎣ 0  0 2 ⎦

with respect to the canonical basis of C³. We repeat the same computations as above and find that λ1 = 2, λ2 = i and λ3 = −i are the eigenvalues of T. For λ1 = 2, we have

A − 2I = ⎡ −2 −1 1 ⎤
         ⎢  1 −2 2 ⎥
         ⎣  0  0 0 ⎦

which has rank equal to 2. Thus, the associated homogeneous linear system

−2x1 − x2 + x3 = 0
x1 − 2x2 + 2x3 = 0

has solutions X = (0, α, α), for any α ∈ C. Hence, the eigenspace corresponding to λ1 has dimension 1 and is generated by the eigenvector (0, 1, 1).
For λ2 = i, we have

A − iI = ⎡ −i −1  1     ⎤
         ⎢  1 −i  2     ⎥
         ⎣  0  0  2 − i ⎦

having rank equal to 2, so that the associated homogeneous linear system

−ix1 − x2 + x3 = 0
x1 − ix2 + 2x3 = 0
(2 − i)x3 = 0

has solutions X = (iα, α, 0), for any α ∈ C. In this case, the eigenspace corresponding to λ2 is generated by the eigenvector (i, 1, 0).
Finally, for λ3 = −i,

A + iI = ⎡ i −1  1     ⎤
         ⎢ 1  i  2     ⎥
         ⎣ 0  0  2 + i ⎦

has rank equal to 2 and the associated homogeneous linear system has solutions X = (−iα, α, 0), for any α ∈ C. The eigenspace corresponding to λ3 is generated by the eigenvector (−i, 1, 0).
Lemma 6.5 Let A ∈ Mn (F). If λ is an eigenvalue of A, then λm is an eigenvalue
of Am , for any m ≥ 2. Moreover, any eigenvector of A associated with λ is an
eigenvector of Am associated with λm .
Proof It follows directly from the definition of eigenvalue and associated eigenvec-
tor. In fact, multiplying on the left by A in the matrix equation AX = λX, we get
A2 X = λAX = λ2 X, that is A2 admits the eigenvalue λ2 , moreover X is an eigen-
vector associated with λ2 . In a similar way A3 X = λ3 X, and continuing this process
one has that Am X = λm X.
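A small numerical illustration of Lemma 6.5 (assuming the numpy library), using the eigenvector (1, 1, 1) of the matrix of Example 6.2 for the eigenvalue 5:

    import numpy as np

    A = np.array([[1., 2., 2.], [1., 3., 1.], [2., 2., 1.]])
    X = np.array([1., 1., 1.])          # A X = 5 X
    m = 3
    print(np.allclose(np.linalg.matrix_power(A, m) @ X, 5**m * X))   # True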
Lemma 6.6 Let A ∈ Mn(C). If λ0 is an eigenvalue of A, then the conjugate element λ̄0 is an eigenvalue of the adjoint A*.

Proof It is known that, for any matrix B ∈ Mn(C), |B*| equals the complex conjugate of |B|. Thus |A* − λ̄I| = |(A − λI)*| is the complex conjugate of |A − λI|, so that |A* − λ̄0I| = 0 follows from |A − λ0I| = 0.


Lemma 6.7 Let A ∈ Mn(F). If λ0 is an eigenvalue of A, then λ0 is an eigenvalue of the transpose Aᵀ.

Proof It follows from

|A − λI | = |(A − λI )T | = |A T − λI |.

Lemma 6.8 Let T : V → V be a linear operator on a finite dimensional vector


space V. Let A ∈ Mn (F) be the matrix of T with respect to a basis of V, λ0 an
eigenvalue of A and X ∈ V an eigenvector of T corresponding to λ0 . Then, for any
α0 ∈ F, the scalar λ0 + α0 is an eigenvalue of the matrix A + α0 I. Moreover, X is
eigenvector of T + α0 I corresponding to the eigenvalue λ0 + α0 .

Proof Let X ∈ V be an eigenvector of A associated with λ0 . One can easily see that

(A + α0 I )X = AX + α0 X = λ0 X + α0 X = (λ0 + α0 )X

as required.

Example 6.9 Let T : C3 → C3 be the linear operator as in Example 6.4 and intro-
duce the following operator G : C3 → C3 defined as G(X ) = T (X ) + 3X, for any
X ∈ C3 . Thus, the matrix of G with respect to the canonical basis of C3 is
A = ⎡ 3 −1 1 ⎤
    ⎢ 1  3 2 ⎥
    ⎣ 0  0 5 ⎦ .

The characteristic polynomial is

p(λ) = | 3 − λ  −1      1     |
       | 1       3 − λ  2     |  = (5 − λ)[(3 − λ)² + 1]
       | 0       0      5 − λ |

having three distinct roots λ1 = 5, λ2 = 3 + i, λ3 = 3 − i. Easy computations show


that
(i) The eigenspace corresponding to λ1 = 5 has dimension 1 and is generated by
the eigenvector (0, 1, 1).
(ii) The eigenspace corresponding to λ2 = 3 + i has dimension 1 and is generated
by the eigenvector (i, 1, 0).
(iii) The eigenspace corresponding to λ3 = 3 − i has dimension 1 and is generated
by the eigenvector (−i, 1, 0).

Theorem 6.10 Let A ∈ Mn (F). Suppose λ1 , λ2 are distinct eigenvalues of A and


X 1 and X 2 are nonzero eigenvectors corresponding to λ1 and λ2 , respectively. Then
{X 1 , X 2 } is linearly independent.
Proof On the contrary, we assume that {X1, X2} is linearly dependent, so that there exists a nonzero α ∈ F such that X1 = αX2. By the hypothesis, we have that AX1 = λ1X1 and AX2 = λ2X2. The facts that AX1 = λ1X1 and X1 = αX2 imply that αAX2 = λ1αX2, that is, αλ2X2 = λ1αX2. Hence α(λ2 − λ1)X2 = 0, which is a contradiction, since α ≠ 0, λ1 ≠ λ2 and X2 ≠ 0.

Remark 6.11 The above result implies easily that if {λ1 , . . . , λt } are all the dis-
tinct eigenvalues of A and X 1 , X 2 , . . . , X t are nonzero eigenvectors corresponding
to λ1 , λ2 , . . . , λt , respectively, then {X 1 , . . . , X t } is linearly independent. Moreover,
if W1 , . . . , Wt are the eigenspaces associated with λ1 , . . . , λt , respectively, the sub-
space W = W1 + · · · + Wt is a direct sum and we write W = W1 ⊕ · · · ⊕ Wt .

Example 6.12 Let T : R³ → R³ be the linear operator having matrix

A = ⎡ 2 0 4 ⎤
    ⎢ 0 4 1 ⎥
    ⎣ 1 0 2 ⎦

with respect to the canonical basis of R³. The characteristic polynomial is

p(λ) = −λ(4 − λ)²

having two distinct roots λ1 = 0, λ2 = 4. The eigenspace W1 corresponding to λ1 = 0 has dimension 1 and is generated by the eigenvector (8, 1, −4). The eigenspace W2 corresponding to λ2 = 4 has dimension 1 and is generated by the eigenvector (0, 1, 0). It is easy to see that W1 ∩ W2 = {0}, so that the sum W1 + W2 is direct.

Remark 6.13 Let A ∈ Mn(F) be an upper-triangular matrix. Then, the eigenvalues of A consist precisely of the entries on the main diagonal of A.
In fact, if

A = ⎡ a11  a12  . . .  a1n ⎤
    ⎢      a22  . . .  a2n ⎥
    ⎢           . . .      ⎥
    ⎣                  ann ⎦

then

A − λI = ⎡ a11 − λ  a12      . . .  a1n     ⎤
         ⎢          a22 − λ  . . .  a2n     ⎥
         ⎢                   . . .          ⎥
         ⎣                          ann − λ ⎦

and |A − λI| = (a11 − λ)(a22 − λ) · · · (ann − λ), whose roots are precisely

λ1 = a11, λ2 = a22, . . . , λn = ann.


Theorem 6.14 Let A, B ∈ Mn (F) be similar matrices, that is, there exists an invert-
ible matrix P ∈ Mn (F) such that B = P −1 A P. Then A and B have the same eigen-
values. Moreover, if Y is an eigenvector of B associated with the eigenvalue λ, then
X = PY is an eigenvector of A associated with λ.

Proof Let λ be an eigenvalue of A and X one of its associated eigenvectors. Since A = PBP⁻¹, it follows that (PBP⁻¹)X = λX, that is, B(P⁻¹X) = λ(P⁻¹X). Hence λ is also an eigenvalue of B, with eigenvector Y = P⁻¹X, and conversely X = PY, as required.

Example 6.15 Let T : R² → R² be the linear operator having the following matrix, with respect to the canonical basis B of R²,

A = ⎡  1 1 ⎤
    ⎣ −2 4 ⎦ .

Let B′ = {(1, 0), (1, 1)} be a basis for R². The matrix of T with respect to B′ is P⁻¹AP = A′, where

P = ⎡ 1 1 ⎤
    ⎣ 0 1 ⎦

is the transition matrix having the vectors of B′ in the columns. By computation, we have that

A′ = P⁻¹AP = ⎡ 1 −1 ⎤ ⎡  1 1 ⎤ ⎡ 1 1 ⎤ = ⎡  3 0 ⎤
             ⎣ 0  1 ⎦ ⎣ −2 4 ⎦ ⎣ 0 1 ⎦   ⎣ −2 2 ⎦ .

The matrices A, A′ have the same eigenvalues λ1 = 3, λ2 = 2. The eigenspace of A corresponding to λ1 = 3 is generated by X1 = (1, 2)ᵗ; the eigenspace of A corresponding to λ2 = 2 is generated by X2 = (1, 1)ᵗ.
Moreover, the eigenspace of A′ corresponding to λ1 = 3 is generated by the coordinate vector Y1 = (−1, 2)ᵗ in terms of the basis B′; the eigenspace of A′ corresponding to λ2 = 2 is generated by the coordinate vector Y2 = (0, 1)ᵗ in terms of the basis B′. It is easy to see that X1 = PY1 and X2 = PY2.
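The statements of Theorem 6.14, as instantiated in this example, can be verified numerically; the following minimal sketch (assuming the numpy library) is only an illustration.

    import numpy as np

    A = np.array([[1., 1.], [-2., 4.]])
    P = np.array([[1., 1.], [0., 1.]])
    A_prime = np.linalg.inv(P) @ A @ P
    print(np.round(A_prime))                       # [[ 3.  0.] [-2.  2.]]
    print(sorted(np.linalg.eigvals(A)),
          sorted(np.linalg.eigvals(A_prime)))      # both approximately [2.0, 3.0]
    Y1 = np.array([-1., 2.])            # eigenvector of A' for lambda = 3
    X1 = P @ Y1                         # equals (1, 2), an eigenvector of A for lambda = 3
    print(np.allclose(A @ X1, 3 * X1))  # True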

Now, we define algebraic and geometric multiplicity of an eigenvalue as follows:

Definition 6.16 Let A ∈ Mn (F) and p(λ) its characteristic polynomial. Assume that
F contains the splitting field of p(λ) and let S = {λ1 , . . . , λt } be the set of all distinct
roots of p(λ). Hence, we get the following decomposition

p(λ) = (λ1 − λ)^{a1} (λ2 − λ)^{a2} · · · (λt − λ)^{at},

where Σ_{i=1}^{t} ai = n. For any eigenvalue λk ∈ S, we say that

(i) ak is its algebraic multiplicity, i.e., it is the number of times λk occurs as a root of p(λ).
(ii) The dimension of the eigenspace associated with λk is the geometric multiplicity
of λk .

The algebraic multiplicity and geometric multiplicity of an eigenvalue can differ.


However, the geometric multiplicity can never exceed the algebraic multiplicity, as
proved in the following:

Theorem 6.17 Let V be a vector space of dimension n and T : V → V the linear


transformation from V to itself, B a basis of V and A ∈ Mn (F) the matrix of T
with respect to the basis B. Assume that λ0 is an eigenvalue of A having algebraic
multiplicity equal to a and geometric multiplicity equal to g. Then g ≤ a.

Proof Let p(λ) be the characteristic polynomial of A and V0 the eigenspace of


A associated with the eigenvalue λ0 . By the hypothesis dim(V0 ) = g and let
{X 1 , . . . , X g } be a basis for V0 , where each X i is an eigenvector corresponding
to λ0 .
Extending {X1, . . . , Xg} to a basis B′ = {X1, . . . , Xg, Y1, . . . , Yn−g} of V, we can now determine the matrix A′ of T with respect to the basis B′. As with any linear transformation, the columns of A′ are the coordinate vectors of the images T(X1), . . . , T(Yn−g) in terms of the basis B′. Since T(Xi) = λ0Xi, for any i = 1, . . . , g, we have that

A′ = ⎡ D        A1 ⎤
     ⎣ 0n−g,g   A2 ⎦

where 0n−g,g ∈ Mn−g,g(F), A1 ∈ Mg,n−g(F), A2 ∈ Mn−g(F) and D ∈ Mg(F) is the diagonal block

D = ⎡ λ0          ⎤
    ⎢    λ0       ⎥
    ⎢       ⋱     ⎥
    ⎣          λ0 ⎦ .

Since A and A′ are similar matrices, they have the same eigenvalues. In particular, A and A′ have the same characteristic polynomial. On the other hand, the characteristic polynomial of A′ is p(λ) = (λ − λ0)^g q(λ), where q(λ) is the characteristic polynomial of the matrix A2. Therefore λ0 occurs, as a root of p(λ), at least g times. As a consequence, the algebraic multiplicity of λ0 is a ≥ g.

Example 6.18 Let T : R⁴ → R⁴ be the linear operator having matrix

A = ⎡  2 1 0  0 ⎤
    ⎢ −1 0 1  0 ⎥
    ⎢  1 3 1  0 ⎥
    ⎣  0 0 0 −1 ⎦

with respect to the canonical basis of R⁴. The characteristic polynomial is p(λ) = (−1 − λ)²(2 − λ)², having two distinct roots λ1 = 2 (with algebraic multiplicity equal to 2) and λ2 = −1 (with algebraic multiplicity equal to 2).
For λ1 = 2, we have

A − 2I = ⎡  0  1  0  0 ⎤
         ⎢ −1 −2  1  0 ⎥
         ⎢  1  3 −1  0 ⎥
         ⎣  0  0  0 −3 ⎦

having rank equal to 3, so that the associated homogeneous linear system

x2 = 0
−x1 − 2x2 + x3 = 0
x1 + 3x2 − x3 = 0
−3x4 = 0

has solutions X = (α, 0, α, 0), for any α ∈ R. The eigenspace corresponding to λ1 is generated by the eigenvector (1, 0, 1, 0) and the geometric multiplicity of λ1 is equal to 1.
For λ2 = −1, we have

A + I = ⎡  3 1 0 0 ⎤
        ⎢ −1 1 1 0 ⎥
        ⎢  1 3 2 0 ⎥
        ⎣  0 0 0 0 ⎦

having rank equal to 2, so that the associated homogeneous linear system

3x1 + x2 = 0
−x1 + x2 + x3 = 0
x1 + 3x2 + 2x3 = 0

has solutions X = (α, −3α, 4α, β), for any α, β ∈ R. In this case, the eigenspace corresponding to λ2 is generated by the eigenvectors {(1, −3, 4, 0), (0, 0, 0, 1)} and the geometric multiplicity of λ2 is equal to 2.
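Both multiplicities can be computed mechanically: the algebraic multiplicity is read off the factored characteristic polynomial, while the geometric multiplicity is n minus the rank of A − λI. A minimal sketch (assuming the sympy library) for the matrix of Example 6.18:

    import sympy as sp

    A = sp.Matrix([[2, 1, 0, 0],
                   [-1, 0, 1, 0],
                   [1, 3, 1, 0],
                   [0, 0, 0, -1]])
    lam = sp.symbols('lambda')
    print(sp.factor(A.charpoly(lam).as_expr()))       # (lambda - 2)**2*(lambda + 1)**2
    for ev in (2, -1):
        geo = 4 - (A - ev * sp.eye(4)).rank()         # geometric multiplicity
        print(ev, geo)                                # prints 2 1 and -1 2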

Exercises

1. Let F be a field and A ∈ Mn (F) be such that its characteristic polynomial is

p(λ) = (−1)n λn + αn−1 λn−1 + · · · + α1 λ + α0 , αi ∈ F,

where i = 0, 1, . . . , n − 1. Prove that


(a) α0 = |A|.
(b) αn−1 = (−1)n−1 tr (A).
2. Let T : V → V be a nonsingular linear operator on an n-dimensional vector space V, and A the matrix of T with respect to a basis for V. Let λ be an eigenvalue of T and X ∈ V an eigenvector of T corresponding to λ. Prove that λ ≠ 0 (trivial), that λ⁻¹ is an eigenvalue of A⁻¹, and that X is an eigenvector of T⁻¹ corresponding to λ⁻¹.
3. Let T : V → V be a linear operator on an n-dimensional vector space V and let A, B ∈ Mn(F) be two matrices of T with respect to two different bases for V (i.e., A and B are similar). Let λ ∈ F be an eigenvalue of A and B. Prove that:
(a) The geometric multiplicity of λ as eigenvalue of A coincides with its geo-
metric multiplicity as eigenvalue of B.
(b) The trace, the determinant and the rank of A are, respectively, equal to the
trace, the determinant and the rank of B.
4. Let T : R³ → R³ be the linear operator with associated matrix

A = ⎡ 0 1 0 ⎤
    ⎢ 0 0 1 ⎥
    ⎣ 9 9 1 ⎦ .

Determine eigenvalues, eigenvectors and a basis for each eigenspace of T.


5. Redo Exercise 4, in the case the linear operator T : R3 → R3 is represented by
the following matrix

A = ⎡ 0 0  0 ⎤
    ⎢ 1 0 −1 ⎥
    ⎣ 3 1  2 ⎦ .

6. Redo Exercise 4, in the case the linear operator T : C4 → C4 is represented by


the following matrix

A = ⎡ 1 −2 0  0 ⎤
    ⎢ 2  1 0  0 ⎥
    ⎢ 0  0 2 −2 ⎥
    ⎣ 0  0 3 −2 ⎦ .

7. Let T : R³ → R³ be the linear operator with associated matrix

A = ⎡ −1  7 4 ⎤
    ⎢  3 −3 4 ⎥
    ⎣  2  1 4 ⎦ .

Determine det(e^A) and trace(e^A).
8. Let A be a 3 × 3 matrix with real entries such that det (A) = 6 and the trace
of A is 0. If det (A + I ) = 0, where I denotes the 3 × 3 identity matrix, then
determine the eigenvalues of A.
9. Let J denote the 101 × 101 matrix with all entries equal to 1 and let I denote the identity matrix of order 101. Then find det(J − I) and trace(J − I).
10. Let A be a 4 × 4 matrix with real entries such that −1, 1, 2, −2 are its eigenvalues. If B = A⁴ − 5A² + 5I, then find:
(a) det (A + B)
(b) det (B)
(c) trace (A − B)
(d) trace (A + B).
11. Show that all eigenvalues of a real skew-symmetric orthogonal matrix are of unit modulus.
12. Let α, β be two distinct eigenvalues of a square matrix A, and W1 , W2 be the
corresponding eigenspaces associated with α, β, respectively. Then show that
W1 ∩ W2 = {0}.

6.2 Triangularizable Operators

Definition 6.19 Let V be a vector space of dimension n. A linear operator T : V → V is said to be triangularizable if there exists a basis B of V such that the matrix A of T with respect to B is upper triangular, i.e.,

A = ⎡ λ1  a12  . . .  a1n ⎤
    ⎢     λ2   . . .  a2n ⎥
    ⎢          . . .      ⎥
    ⎣                 λn  ⎦

where λ1, . . . , λn ∈ F are precisely the eigenvalues of T (of A), see Remark 6.13. Analogously, we say that a matrix is triangularizable if it is similar to an upper triangular matrix.

Theorem 6.20 (Schur’s Theorem) Let V be an inner product space of dimension n over a field F, where F = R or C, and T : V → V a linear transformation from V to itself. The characteristic polynomial p(λ) of T splits over F, that is,

p(λ) = (λ1 − λ)^{a1} (λ2 − λ)^{a2} · · · (λt − λ)^{at},

where Σ_{i=1}^{t} ai = n and every λi ∈ F, if and only if T has an upper-triangular matrix with respect to some orthonormal basis of V.
Proof Let A ∈ Mn(F) be the matrix of T. The fact that T has an upper-triangular matrix with respect to some orthonormal basis of V is equivalent to saying that A is unitarily similar to an upper triangular matrix A′, namely H⁻¹AH = A′, where H is a unitary matrix.
Firstly, we assume that the characteristic polynomial p(λ) of T splits over F and prove that T is unitarily triangularizable. We prove the result by induction on the dimension of V. Of course, it is trivial in the case dim(V) = 1. Thus, we assume that the result holds for any inner product space of dimension less than n. Let λ0 ∈ F be an eigenvalue of T and X1 an eigenvector corresponding to λ0 having unit norm, that is, ‖X1‖ = 1. Extending {X1} to an orthonormal basis B = {X1, . . . , Xn} of V, we may compute the matrix A′ of T with respect to B. If we denote by P the transition matrix, then it follows that

A′ = P⁻¹AP,

where P = [X1, . . . , Xn]. On the other hand, since X1 is an eigenvector associated with λ0, A′ is a block matrix of the form

A′ = ⎡ λ0       uᵗ ⎤
     ⎣ 0n−1,1   A1 ⎦

for 0n−1,1 = [0, . . . , 0]ᵗ with n − 1 zero entries, u ∈ Fⁿ⁻¹ and A1 ∈ Mn−1(F). Since A and A′ have the same characteristic polynomial, a fortiori any eigenvalue of A1 is an eigenvalue of A. Therefore, the characteristic polynomial of A1 splits over F. By the induction hypothesis, there exists a unitary matrix Q ∈ Mn−1(F) such that Q⁻¹A1Q = A2 is an upper triangular matrix. Hence, if

R = ⎡ 1        01,n−1 ⎤
    ⎣ 0n−1,1   Q      ⎦

for 01,n−1 = [0, . . . , 0] with n − 1 zero entries, then R is invertible and

R⁻¹A′R = ⎡ 1        01,n−1 ⎤ ⎡ λ0       uᵗ ⎤ ⎡ 1        01,n−1 ⎤ = ⎡ λ0       uᵗQ ⎤
         ⎣ 0n−1,1   Q⁻¹    ⎦ ⎣ 0n−1,1   A1 ⎦ ⎣ 0n−1,1   Q      ⎦   ⎣ 0n−1,1   A2  ⎦

is upper triangular. Moreover,

R⁻¹A′R = R⁻¹P⁻¹APR = (PR)⁻¹A(PR),

where both P and R are unitary matrices. Thus, the product PR is a unitary matrix, and its columns are the coordinate vectors of an orthonormal basis with respect to which the linear operator T is represented by the upper triangular matrix
⎡ λ0       uᵗQ ⎤
⎣ 0n−1,1   A2  ⎦ .
Conversely, assume that T has an upper-triangular matrix with respect to some orthonormal basis of V, that is, there exists a unitary matrix U ∈ Mn(F) such that U⁻¹AU = A′ is an upper triangular matrix having entries from F. By the fact that the eigenvalues of A′ consist precisely of the entries on its main diagonal (see Remark 6.13) and since A, A′ have the same eigenvalues (they are similar to each other), we obtain the required conclusion.

Corollary 6.21 Let T : V → V be a linear operator on a finite dimensional complex vector space V. There is an orthonormal basis B of V such that the matrix of T with respect to B is upper-triangular. This is equivalent to saying that every complex matrix is unitarily similar to an upper triangular matrix.

Example 6.22 Let T : R⁴ → R⁴ be the linear operator having matrix

A = ⎡  0 1 1 1 ⎤
    ⎢ −2 1 0 2 ⎥
    ⎢ −1 0 2 1 ⎥
    ⎣ −2 1 1 3 ⎦

with respect to the canonical basis of R⁴. The characteristic polynomial is p(λ) = (2 − λ)²(1 − λ)², having two distinct roots λ1 = 1 and λ2 = 2.
For λ1 = 1, we have

A − I = ⎡ −1 1 1 1 ⎤
        ⎢ −2 0 0 2 ⎥
        ⎢ −1 0 1 1 ⎥
        ⎣ −2 1 1 2 ⎦

so that the associated homogeneous linear system has solutions X = (α, 0, 0, α), for any α ∈ R. The corresponding eigenspace is N1 = ⟨(1, 0, 0, 1)⟩ = ⟨(1/√2, 0, 0, 1/√2)⟩. The orthogonal complement is

N1⊥ = ⟨(1/√2, 0, 0, −1/√2), (0, 1, 0, 0), (0, 0, 1, 0)⟩,

so that an orthonormal basis for R⁴ is

B1 = {(1/√2, 0, 0, 1/√2), (1/√2, 0, 0, −1/√2), (0, 1, 0, 0), (0, 0, 1, 0)}.

The vectors in B1 are the columns of the orthonormal transition matrix

Q1 = ⎡ 1/√2   1/√2  0 0 ⎤
     ⎢ 0      0     1 0 ⎥
     ⎢ 0      0     0 1 ⎥
     ⎣ 1/√2  −1/√2  0 0 ⎦ .

Then, the matrix of T with respect to B1 is

A1 = Q1ᵀ A Q1 = ⎡ 1  −3     1 1 ⎤
                ⎢ 0   2     0 0 ⎥
                ⎢ 0  −4/√2  1 0 ⎥
                ⎣ 0  −2/√2  0 2 ⎦ .

Since the method is by induction, this leads to a recursive algorithm. At this point, we delete the first row and column of A1 and consider the following 3 × 3 submatrix:

C1 = ⎡  2     0 0 ⎤
     ⎢ −4/√2  1 0 ⎥
     ⎣ −2/√2  0 2 ⎦

having μ = 1 as an eigenvalue, with corresponding unit eigenvector (0, 1, 0) ∈ R³. We extend the set {(0, 1, 0)}, by adding the vectors (1, 0, 0), (0, 0, 1), to obtain an orthonormal basis B2 = {(0, 1, 0), (1, 0, 0), (0, 0, 1)} for R³. We use the elements of B2 to construct the orthonormal matrix

Q2 = ⎡ 0 1 0 ⎤
     ⎢ 1 0 0 ⎥
     ⎣ 0 0 1 ⎦ .

Hence we compute

A2 = Q2ᵀ C1 Q2 = ⎡ 1  −4/√2  0 ⎤
                 ⎢ 0   2     0 ⎥
                 ⎣ 0  −2/√2  2 ⎦ .

Once again, we delete the first row and column of A2 and consider the following 2 × 2 submatrix:

C2 = ⎡  2     0 ⎤
     ⎣ −2/√2  2 ⎦

having eigenvalue ν = 2 with corresponding unit eigenvector (0, 1) ∈ R². Starting from this eigenvector we obtain the orthonormal basis B3 = {(0, 1), (1, 0)} for R². Using the vectors in B3, we have the orthonormal matrix

Q3 = ⎡ 0 1 ⎤
     ⎣ 1 0 ⎦

and compute

A3 = Q3ᵀ C2 Q3 = ⎡ 2  −2/√2 ⎤
                 ⎣ 0   2    ⎦ .

Actually the process is finished. We now have to construct the final upper-triangular matrix which is similar to A. In other words, we have to find the orthonormal matrix U ∈ M4(R) such that UᵀAU is an upper-triangular matrix. This matrix is the product of three orthonormal matrices U1, U2, U3 ∈ M4(R), each corresponding to one step of the above process. More precisely:
(i) The first matrix is U1 = Q1.
(ii) Let U2 be the matrix

⎡ 1   0  ⎤   ⎡ 1 0 0 0 ⎤
⎣ 0   Q2 ⎦ = ⎢ 0 0 1 0 ⎥
             ⎢ 0 1 0 0 ⎥
             ⎣ 0 0 0 1 ⎦ .

(iii) Analogously, we construct U3 as

⎡ I2  0  ⎤   ⎡ 1 0 0 0 ⎤
⎣ 0   Q3 ⎦ = ⎢ 0 1 0 0 ⎥
             ⎢ 0 0 0 1 ⎥
             ⎣ 0 0 1 0 ⎦ .

Thus

U = U1 U2 U3 = ⎡ 1/√2  0 0  1/√2 ⎤
               ⎢ 0     1 0  0    ⎥
               ⎢ 0     0 1  0    ⎥
               ⎣ 1/√2  0 0 −1/√2 ⎦

and

Uᵗ A U = ⎡ 1  2/√2  2/√2  −3    ⎤
         ⎢ 0  1     0     −4/√2 ⎥
         ⎢ 0  0     2     −2/√2 ⎥
         ⎣ 0  0     0      2    ⎦ .
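The recursive procedure carried out above is exactly what a Schur decomposition routine automates. A minimal sketch (assuming the numpy and scipy libraries), applied to the same matrix A:

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[0., 1., 1., 1.],
                  [-2., 1., 0., 2.],
                  [-1., 0., 2., 1.],
                  [-2., 1., 1., 3.]])
    T_mat, U = schur(A)                        # real Schur form: A = U T_mat U^T
    print(np.allclose(np.tril(T_mat, -1), 0))  # all eigenvalues are real, so T_mat is upper triangular
    print(np.allclose(U @ T_mat @ U.T, A))     # True
    print(np.round(np.diag(T_mat), 6))         # the eigenvalues 1, 1, 2, 2 in some order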
Exercises

1. Let T : R⁴ → R⁴ be the linear operator having matrix

A = ⎡ 1 2 0 1 ⎤
    ⎢ 2 1 1 1 ⎥
    ⎢ 0 0 1 2 ⎥
    ⎣ 0 0 2 1 ⎦

with respect to the canonical basis of R4 . Determine an orthonormal basis for


R4 with respect to which the matrix of T is upper-triangular.
2. Repeat Exercise 1 for the linear operator T : R4 → R4 having matrix
A = ⎡ 0 0 1 0 ⎤
    ⎢ 0 1 0 0 ⎥
    ⎢ 2 2 1 0 ⎥
    ⎣ 2 0 0 0 ⎦

with respect to the canonical basis of R4 .


3. Repeat Exercise 1 for the linear operator T : R4 → R4 having matrix
A = ⎡ 2  2 0  0 ⎤
    ⎢ 1 −2 1  0 ⎥
    ⎢ 0  0 2 −6 ⎥
    ⎣ 1  0 1 −2 ⎦

with respect to the canonical basis of R4 .


4. Repeat Exercise 1 for the linear operator T : C4 → C4 having matrix
A = ⎡ 2 −3 0  0 ⎤
    ⎢ 2 −2 0  0 ⎥
    ⎢ 0  0 2 −3 ⎥
    ⎣ 0  0 3 −2 ⎦

with respect to the canonical basis of C4 .


5. Repeat Exercise 1 for the linear operator T : C3 → C3 having matrix
A = ⎡  0  1 1 ⎤
    ⎢ −1  0 2 ⎥
    ⎣ −1 −2 0 ⎦

with respect to the canonical basis of C3 .


6.3 Diagonalizable Operators

Definition 6.23 Let V be a vector space of dimension n. A linear operator T : V → V is said to be diagonalizable if there exists a basis B of V such that the matrix A of T with respect to B is diagonal, i.e.,

A = ⎡ λ1          ⎤
    ⎢    λ2       ⎥
    ⎢       ⋱     ⎥
    ⎣          λn ⎦

where λ1, . . . , λn ∈ F are precisely the eigenvalues of T (of A) (see Remark 6.13). Usually, B is said to be a diagonalizing basis of T. Analogously, we say that a matrix is diagonalizable if it is similar to a diagonal matrix.
Assume that V has dimension n. One may note that, in light of the definition of the matrix associated with a linear operator, the fact that B = {e1, . . . , en} is a diagonalizing basis of V amounts to saying that T(e1) = λ1e1, . . . , T(en) = λnen, for suitable elements λi ∈ F.
Nevertheless, not all matrices can be diagonalized, in the sense that in some cases it is not possible to find a basis of V such that the matrix of T with respect to that basis is diagonal. Here, we give an answer to the question of which matrices are similar to diagonal matrices or, analogously, which linear operators are represented by diagonal matrices. The diagonalizable linear operators are characterized by the following:
Theorem 6.24 Let V be a vector space of dimension n and T : V → V a linear
transformation from V to itself. If the characteristic polynomial p(λ) of T splits
over F, i.e., p(λ) breaks over F in linear factors, then the following conditions are
equivalent:
(i) T is diagonalizable.
(ii) For any eigenvalue of T, its algebraic multiplicity matches with its geometric
multiplicity.
(iii) There exists a basis for V that consists entirely of eigenvectors of T.
Proof Let A ∈ Mn(F) be the matrix of T. The fact that T has a diagonal matrix with respect to some basis of V is equivalent to saying that A is similar to a diagonal matrix A′, namely P⁻¹AP = A′, where the columns of P are the coordinate vectors of a basis with respect to which the linear operator T is represented by the diagonal matrix A′. Thus, we actually show that the following conditions are equivalent:
(i) A is diagonalizable.
(ii) For any eigenvalue of A, its algebraic multiplicity matches with its geometric
multiplicity.
(iii) There exists a basis for V that consists entirely of eigenvectors of A.
Let λ1, . . . , λm be the list of the distinct eigenvalues of A. Further suppose that a1, a2, . . . , am and g1, . . . , gm, respectively, are the algebraic and geometric multiplicities of λ1, . . . , λm. We recall that a1 + a2 + · · · + am = n.
(i) ⇒ (ii) Since A ∈ Mn(F) is a diagonalizable matrix, there exists an invertible matrix P ∈ Mn(F) such that

A′ = P⁻¹AP = ⎡ λ1          ⎤
             ⎢    λ2       ⎥
             ⎢       ⋱     ⎥
             ⎣          λm ⎦ ,

where the diagonal entries λi are not necessarily distinct and any eigenvalue λi occurs on the main diagonal as many times as it occurs as a root of the characteristic polynomial of A′. Let λk be any eigenvalue of A′. Since the rank of A′ − λkI is equal to n − ak, the dimension of the eigenspace Vk associated with λk is n − (n − ak) = ak, as required.
(ii) ⇒ (iii) We now assume that, for any eigenvalue λk of A, its algebraic multi-
plicity ak matches with its geometric multiplicity gk . Thus, the dimension of any
eigenspace Vk associated with λk is equal to ak . Hence we have

dimV1 + dimV2 + · · · + dimVm = n = dimV

that is
V1 ⊕ V2 ⊕ · · · ⊕ Vm = V.

Therefore, the union of the bases of any Vk is a basis for V that consists entirely
of eigenvectors of A.
(iii) ⇒ (i) We finally assume that B = {X 1 , . . . , X n } is a basis for V that con-
sists entirely of eigenvectors of A. Even if the eigenvalues of A are not necessarily
distinct, we can list them in the set {λ1 , . . . , λn }, such that any eigenvalue λi repeat-
edly occurs as many times as it occurs as a root of the characteristic polynomial
of A and

AX 1 = λ1 X 1 , AX 2 = λ2 X 2 , AX 3 = λ3 X 3 , . . . , AX n = λn X n .

Here, we compute the coordinate vectors of the images T(Xi) in terms of B:

T(X1) = AX1 = λ1X1 = (λ1, 0, 0, 0, . . . , 0)_B
T(X2) = AX2 = λ2X2 = (0, λ2, 0, 0, . . . , 0)_B
T(X3) = AX3 = λ3X3 = (0, 0, λ3, 0, . . . , 0)_B
. . . . . . . . . . . .
T(Xn) = AXn = λnXn = (0, 0, 0, 0, . . . , λn)_B.

If we denote by A′ the matrix of T with respect to the basis B, then the columns of A′ are precisely the coordinate vectors of the images T(Xi), that is,

A′ = ⎡ λ1          ⎤
     ⎢    λ2       ⎥
     ⎢       ⋱     ⎥
     ⎣          λn ⎦ .

Moreover, A′ = P⁻¹AP, where P = [X1 . . . Xn] is the transition matrix having in its ith column the coordinates of the eigenvector Xi.

Example 6.25 Let T : R³ → R³ be the linear operator having matrix

A = ⎡ 2 −3 3 ⎤
    ⎢ 0  1 1 ⎥
    ⎣ 0 −2 4 ⎦

with respect to the canonical basis of R³. The characteristic polynomial is p(λ) = (2 − λ)²(3 − λ), having two distinct roots λ1 = 2 (with algebraic multiplicity equal to 2) and λ2 = 3 (with algebraic multiplicity equal to 1).
For λ1 = 2, we have

A − 2I = ⎡ 0 −3 3 ⎤
         ⎢ 0 −1 1 ⎥
         ⎣ 0 −2 2 ⎦

having rank equal to 1, so that the associated homogeneous linear system reduces to x2 − x3 = 0 and has solutions X = (α, β, β), for any α, β ∈ R. The eigenspace corresponding to λ1 is generated by the eigenvectors {(1, 0, 0), (0, 1, 1)} and the geometric multiplicity of λ1 is equal to 2.
For λ2 = 3, we have

A − 3I = ⎡ −1 −3 3 ⎤
         ⎢  0 −2 1 ⎥
         ⎣  0 −2 1 ⎦

having rank equal to 2, so that the associated homogeneous linear system

−x1 − 3x2 + 3x3 = 0
−2x2 + x3 = 0

has solutions X = (3α, α, 2α), for any α ∈ R. In this case, the eigenspace corresponding to λ2 is generated by the eigenvector (3, 1, 2) and the geometric multiplicity of λ2 is equal to 1.
Now consider the basis B = {(1, 0, 0), (0, 1, 1), (3, 1, 2)} for R³, consisting entirely of eigenvectors of T. With respect to this basis, T has a diagonal form. The matrix of T with respect to B can be computed as A′ = P⁻¹AP, where

P = ⎡ 1 0 3 ⎤
    ⎢ 0 1 1 ⎥
    ⎣ 0 1 2 ⎦

is the transition matrix having the vectors of B in the columns. Hence,

A′ = ⎡ 1 0 3 ⎤⁻¹ ⎡ 2 −3 3 ⎤ ⎡ 1 0 3 ⎤
     ⎢ 0 1 1 ⎥   ⎢ 0  1 1 ⎥ ⎢ 0 1 1 ⎥
     ⎣ 0 1 2 ⎦   ⎣ 0 −2 4 ⎦ ⎣ 0 1 2 ⎦

   = ⎡ 1  3 −3 ⎤ ⎡ 2 −3 3 ⎤ ⎡ 1 0 3 ⎤
     ⎢ 0  2 −1 ⎥ ⎢ 0  1 1 ⎥ ⎢ 0 1 1 ⎥
     ⎣ 0 −1  1 ⎦ ⎣ 0 −2 4 ⎦ ⎣ 0 1 2 ⎦

   = ⎡ 2 0 0 ⎤
     ⎢ 0 2 0 ⎥
     ⎣ 0 0 3 ⎦ .
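A quick numerical check of this diagonalization (an illustration assuming the numpy library):

    import numpy as np

    A = np.array([[2., -3., 3.],
                  [0., 1., 1.],
                  [0., -2., 4.]])
    P = np.array([[1., 0., 3.],
                  [0., 1., 1.],
                  [0., 1., 2.]])                   # columns: the eigenvector basis B
    print(np.round(np.linalg.inv(P) @ A @ P, 6))   # diag(2, 2, 3)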

Example 6.26 Let T : C⁴ → C⁴ be the linear operator having matrix

A = ⎡ 1 −1 1  0 ⎤
    ⎢ 1  1 1  1 ⎥
    ⎢ 0  0 1 −1 ⎥
    ⎣ 0  0 1  1 ⎦

with respect to the canonical basis of C⁴. The characteristic polynomial is p(λ) = (λ − 1 − i)²(λ − 1 + i)², having two distinct roots λ1 = 1 + i (with algebraic multiplicity equal to 2) and λ2 = 1 − i (with algebraic multiplicity equal to 2).
For λ1 = 1 + i, we have

A − (1 + i)I = ⎡ −i −1  1  0 ⎤
               ⎢  1 −i  1  1 ⎥
               ⎢  0  0 −i −1 ⎥
               ⎣  0  0  1 −i ⎦

having rank equal to 3, so that the associated homogeneous linear system

−ix1 − x2 + x3 = 0
x1 − ix2 + x3 + x4 = 0
−ix3 − x4 = 0
x3 − ix4 = 0

has solutions X = (α, −iα, 0, 0), for any α ∈ C. The eigenspace corresponding to λ1 is generated by the eigenvector (1, −i, 0, 0) and the geometric multiplicity of λ1 is equal to 1. We may conclude that, in this case, T is not diagonalizable.

Example 6.27 Let T : R⁵ → R⁵ be the linear operator having matrix

A = ⎡ 1 2 −1  1 0 ⎤
    ⎢ 2 1  1 −1 0 ⎥
    ⎢ 0 0  2  1 0 ⎥
    ⎢ 0 0  1  2 0 ⎥
    ⎣ 0 0  0  0 3 ⎦

with respect to the canonical basis of R⁵. The characteristic polynomial is p(λ) = (3 − λ)³(1 − λ)(−1 − λ), having three distinct roots λ1 = 3 (with algebraic multiplicity equal to 3), λ2 = 1 (with algebraic multiplicity equal to 1), λ3 = −1 (with algebraic multiplicity equal to 1).
For λ1 = 3, we have

A − 3I = ⎡ −2  2 −1  1 0 ⎤
         ⎢  2 −2  1 −1 0 ⎥
         ⎢  0  0 −1  1 0 ⎥
         ⎢  0  0  1 −1 0 ⎥
         ⎣  0  0  0  0 0 ⎦

having rank equal to 2, so that the associated homogeneous linear system

2x1 − 2x2 + x3 − x4 = 0
x3 − x4 = 0

has solutions X = (α, α, β, β, γ), for any α, β, γ ∈ R. The eigenspace corresponding to λ1 is generated by the set of eigenvectors {(1, 1, 0, 0, 0), (0, 0, 1, 1, 0), (0, 0, 0, 0, 1)} and the geometric multiplicity of λ1 is equal to 3.
For λ2 = 1, we have

A − I = ⎡ 0 2 −1  1 0 ⎤
        ⎢ 2 0  1 −1 0 ⎥
        ⎢ 0 0  1  1 0 ⎥
        ⎢ 0 0  1  1 0 ⎥
        ⎣ 0 0  0  0 2 ⎦

having rank equal to 4, so that the associated homogeneous linear system

2x2 − x3 + x4 = 0
2x1 + x3 − x4 = 0
x3 + x4 = 0
x5 = 0

has solutions X = (−α, α, α, −α, 0), for any α ∈ R. The eigenspace corresponding to λ2 is generated by the eigenvector (−1, 1, 1, −1, 0) and the geometric multiplicity of λ2 is equal to 1. For λ3 = −1, we have

A + I = ⎡ 2 2 −1  1 0 ⎤
        ⎢ 2 2  1 −1 0 ⎥
        ⎢ 0 0  3  1 0 ⎥
        ⎢ 0 0  1  3 0 ⎥
        ⎣ 0 0  0  0 4 ⎦

having rank equal to 4, so that the associated homogeneous linear system

2x1 + 2x2 − x3 + x4 = 0
2x1 + 2x2 + x3 − x4 = 0
3x3 + x4 = 0
x3 + 3x4 = 0
x5 = 0

has solutions X = (−α, α, 0, 0, 0), for any α ∈ R. The eigenspace corresponding to λ3 is generated by the eigenvector (−1, 1, 0, 0, 0) and the geometric multiplicity of λ3 is equal to 1.
Therefore, there exists a basis B for R⁵ that consists entirely of eigenvectors of T, i.e.,

B = {(1, 1, 0, 0, 0), (0, 0, 1, 1, 0), (0, 0, 0, 0, 1), (−1, 1, 1, −1, 0), (−1, 1, 0, 0, 0)}.

The matrix A′ of T with respect to B has the following diagonal form:

A′ = ⎡ 3 0 0 0  0 ⎤
     ⎢ 0 3 0 0  0 ⎥
     ⎢ 0 0 3 0  0 ⎥
     ⎢ 0 0 0 1  0 ⎥
     ⎣ 0 0 0 0 −1 ⎦ .
Exercises

1. Let A ∈ Mn (F) be a block diagonal matrix, having the form



A = ⎡ A1  0  ⎤
    ⎣ 0   A2 ⎦ .
Prove that A is diagonalizable if and only if both A1 and A2 are diagonalizable.


2. Let T : V → V be a linear operator on a finite dimensional vector space V.
Prove that if there exists k ≥ 1 such that T k is the identity map on V, then T is
diagonalizable.
3. Let T : R3 → R3 be the linear operator having matrix
A = ⎡ 0 0  1 ⎤
    ⎢ 0 0 −1 ⎥
    ⎣ 1 1  2 ⎦

in terms of the canonical basis for R3 . Determine, if possible, the basis for R3
with respect to which the matrix of T is diagonal.
4. Redo Exercise 3 for the linear operator T : R4 → R4 having matrix
A = ⎡  0 −1  1 1 ⎤
    ⎢ −1  0  1 1 ⎥
    ⎢  1  1 −2 1 ⎥
    ⎣  0  0  0 1 ⎦ .

5. For which values of the constants α, β, γ ∈ R are the matrices

A = ⎡ 3 α β ⎤      B = ⎡ 2 α β ⎤      C = ⎡ 2 1 α ⎤
    ⎢ 0 2 γ ⎥ ,        ⎢ 0 2 γ ⎥ ,        ⎢ 0 β γ ⎥
    ⎣ 0 0 2 ⎦          ⎣ 0 0 3 ⎦          ⎣ 0 0 3 ⎦

diagonalizable?
6. Suppose that A ∈ Mn (F) has two distinct eigenvalues λ and μ such that dim(E λ )
= (n − 1), where E λ is the eigenspace associated with λ. Prove that A is diago-
nalizable.
7. For each of the following linear operators T on a vector space V, test for diag-
onalizability and if T is diagonalizable find a basis B for V such that [T ] B is a
diagonal matrix.
 
(a) V = P3(R) and T is defined by T(f(x)) = f′(x) + f″(x).
(b) V = P2 (R) and T is defined by T (ax 2 + bx + c) = cx 2 + bx + a.
(c) V = R³ and T is defined by T(a1, a2, a3) = (a2, −a1, 2a3).

(d) V = P2(R) and T is defined by T(f(x)) = f(0) + f(1)(x + x²).
(e) V = C² and T is defined by T(z, w) = (z + iw, iz + w).
(f) V = M2(R) and T is defined by T(A) = Aᵗ.

6.4 Jordan Canonical Form of an Operator

We know that T is diagonalizable if and only if there exists a basis for V that consists
entirely of eigenvectors of T. On the other hand, not all linear operators can be diagonalized. Here, we analyze what can be done in the case T is not diagonalizable.
In this case and under suitable assumptions, we shall see that there exists a basis of
V with respect to which the matrix of T assumes a fairly simple form, called the
Jordan canonical form.
More precisely, the p × p Jordan block associated with the scalar α ∈ F is defined as the following p × p matrix:

Jp(α) = ⎡ α 1        ⎤
        ⎢   α ⋱      ⎥
        ⎢     ⋱ 1    ⎥
        ⎣        α   ⎦

that is, any entry aij of Jp(α) is equal to

aij = α   if i = j,
aij = 1   if j = i + 1, 1 ≤ i ≤ p − 1,
aij = 0   elsewhere.

Notice that the characteristic polynomial of the Jordan block J p (α) is (α − λ) p .


Therefore, J p (α) has an unique eigenvalue, i.e., α ∈ F, having algebraic multiplicity
p. Moreover, since ⎡ ⎤
0 1
⎢ .. .. ⎥
⎢ . . ⎥
J p (α) − α I p = ⎢ ⎥
⎣ 0 1⎦
0

has rank p − 1, the geometric multiplicity of α is 1. This means that J p (α) is not
diagonalizable, unless the trivial case p = 1 and J1 (α) = [α] ∈ M1 (F).
We say that A ∈ Mn(F) has Jordan form if A is made up of diagonal Jordan blocks, that is,

A = ⎡ Jn1(α1)                 ⎤
    ⎢          ⋱              ⎥
    ⎣              Jnr(αr)    ⎦

where any Jni(αi) is an ni × ni Jordan block associated with the scalar αi and Σi ni = n. It is easy to see that α1, . . . , αr are the eigenvalues of A, which are not necessarily distinct, and the characteristic polynomial of A is

pA(λ) = (λ − α1)^{n1} (λ − α2)^{n2} · · · (λ − αr)^{nr}.

Example 6.28 The matrix

A1 = ⎡ 3 1 0 0 0 ⎤
     ⎢ 0 3 1 0 0 ⎥   ⎡ J3(3)   0     ⎤
     ⎢ 0 0 3 0 0 ⎥ = ⎣ 0       J2(2) ⎦
     ⎢ 0 0 0 2 1 ⎥
     ⎣ 0 0 0 0 2 ⎦

has Jordan form; in fact, it consists of 2 diagonal Jordan blocks associated with the eigenvalues 3 and 2, respectively.

Example 6.29 The matrix

A2 = ⎡ 2 1 0 0 0 0 ⎤
     ⎢ 0 2 1 0 0 0 ⎥   ⎡ J3(2)  0      0     ⎤
     ⎢ 0 0 2 0 0 0 ⎥ = ⎢ 0      J2(4)  0     ⎥
     ⎢ 0 0 0 4 1 0 ⎥   ⎣ 0      0      J1(3) ⎦
     ⎢ 0 0 0 0 4 0 ⎥
     ⎣ 0 0 0 0 0 3 ⎦

has Jordan form; in fact, it consists of 3 diagonal Jordan blocks associated with the eigenvalues 2, 4 and 3, respectively.

Example 6.30 The matrix

A3 = ⎡ 3 0 0 0 ⎤   ⎡ J1(3)  0      0      0     ⎤
     ⎢ 0 2 0 0 ⎥ = ⎢ 0      J1(2)  0      0     ⎥
     ⎢ 0 0 4 0 ⎥   ⎢ 0      0      J1(4)  0     ⎥
     ⎣ 0 0 0 5 ⎦   ⎣ 0      0      0      J1(5) ⎦

has Jordan form; in fact, it consists of 4 diagonal 1 × 1 Jordan blocks associated with the eigenvalues 3, 2, 4 and 5, respectively.
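Jordan forms can also be computed mechanically; the following minimal sketch (assuming the sympy library) recovers the block structure of the matrix A1 of Example 6.28.

    import sympy as sp

    A = sp.Matrix([[3, 1, 0, 0, 0],
                   [0, 3, 1, 0, 0],
                   [0, 0, 3, 0, 0],
                   [0, 0, 0, 2, 1],
                   [0, 0, 0, 0, 2]])      # the matrix A1, already in Jordan form
    P, J = A.jordan_form()                # A = P * J * P**(-1)
    print(J)                              # one J3(3) block and one J2(2) block
    print(sp.simplify(P * J * P.inv() - A) == sp.zeros(5, 5))   # True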
The operator T is said to be Jordanizable if there exists a basis of V with respect


to which the matrix of T has a Jordan form. In other words, the matrix A ∈ Mn (F)
is said to be Jordanizable if it is similar to a matrix having Jordan form. Example
6.30 shows that the diagonalizable matrices (operators) represent a special case of
the more general class of matrices which are similar to some Jordan canonical form.

Definition 6.31 Let W be a subspace of the vector space V and T : V → V be a


linear operator. We say that W is invariant under T (T -invariant) if T (W ) ⊂ W.
In this case, if T|W denotes the restriction of T to the smaller domain W, then
T|W : W → W is a linear operator on W.

Remark 6.32 Any eigenspace of a linear operator T : V → V is a T -invariant sub-


space of V.

Remark 6.33 It is easy to see that, if T : V → V is a linear operator and W1 , W2


are T -invariant subspaces of V, then both W1 + W2 and W1 ∩ W2 are T -invariants.

Remark 6.34 Let T : V → V be a linear operator, I : V → V the identity map on


V, W a T -invariant subspace of V and λ ∈ F. Then W is (T − λI )-invariant.

Lemma 6.35 Let V be a vector space of dimension n, T : V → V a linear operator


and U, W are T -invariant subspaces of V such that V = U ⊕ W. If we denote
by pT (λ), pT|U (λ) and pT|W (λ) the characteristic polynomials of T, T|U and T|W ,
respectively, the following condition holds:

pT (λ) = pT|U (λ) pT|W (λ).

Proof Let BU = {u 1 , . . . , u k } and BW = {w1 , . . . , wn−k } be bases of U and W ,


respectively. Thus BU ∪ BW = B is a basis of V. Since T (U ) ⊂ U and T (W ) ⊂ W,
the matrix A of T with respect to B has the following block diagonal form

A = ⎡ AU       0k,n−k ⎤
    ⎣ 0n−k,k   AW     ⎦ ,

where 0k,n−k is the zero matrix in Mk,n−k (F), 0n−k,k is the zero matrix in Mn−k,k (F),
AU is the matrix of the restriction T|U with respect to the basis BU , A W is the matrix
of the restriction T|W with respect to the basis BW . Therefore,

pT (λ) = |A − λIn | = |AU − λIk ||A W − λIn−k | = pT|U (λ) pT|W (λ)

as required.

Definition 6.36 Let V be a vector space of dimension n. A linear operator T :


V → V is called nilpotent if there exists a positive integer k such that T k is zero,
i.e., T k (v) = 0, for any v ∈ V. The smallest such integer k is called the index of
nilpotency of T. In other words, k is the index of nilpotency if T^h ≠ 0 for any positive integer h < k. Analogously, a matrix A ∈ Mn(F) is called nilpotent if there exists a positive integer k such that A^k = 0. The smallest such k is called the index of nilpotency of A, in the sense that A^h ≠ 0 for any positive integer h < k. Of course,
the linear operator T is nilpotent if and only if its matrix A is nilpotent.
Lemma 6.37 Let V be a vector space of dimension n, T : V → V a linear operator
and A ∈ Mn (F) the matrix of T. If A is nilpotent, then its characteristic polynomial
is p(λ) = (−1)n λn .
Proof Let k ≥ 1 be the smallest integer such that A^k = 0. Suppose that 0 ≠ λ ∈ F is an eigenvalue of A. Then there exists an eigenvector 0 ≠ v ∈ V such that Av = λv. By Lemma 6.5, it follows that A^k v = λ^k v ≠ 0, which contradicts the nilpotency of A. Therefore λ = 0 is the only eigenvalue of A, so that its characteristic polynomial is precisely p(λ) = (−1)^n λ^n.
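The index of nilpotency can be found by computing successive powers of A; a minimal sketch (assuming the numpy library):

    import numpy as np

    def nilpotency_index(A):
        # returns the smallest k with A^k = 0, or None if A is not nilpotent;
        # a nilpotent n x n matrix always satisfies A^n = 0
        n = A.shape[0]
        B = np.eye(n)
        for k in range(1, n + 1):
            B = B @ A
            if np.allclose(B, 0):
                return k
        return None

    J3 = np.array([[0., 1., 0.], [0., 0., 1.], [0., 0., 0.]])   # the Jordan block J3(0)
    print(nilpotency_index(J3))                                  # 3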
Definition 6.38 A generalized eigenvector corresponding to an eigenvalue λ of a
linear operator T : V → V is a nonzero vector v ∈ V such that (T − λIV )k v = 0,
for some integer k ≥ 1. The exponent of the generalized eigenvector v is the smallest
integer h ≥ 1 such that (T − λIV )h v = 0.
Let now A ∈ Mn (F) be the matrix of T : V → V, and λ an eigenvalue of T. Denote
(T − λI )0 = I, (T − λI )1 = T − λI, (T − λI )2 = (T − λI )(T − λI ), . . . , (T −
λI )h = (T − λI )(T − λI )h−1 and consider both kernel and image of any (T −
λI )h : V → V, respectively:
 
Nh,λ = N((T − λI)^h) = {v ∈ V | (T − λI)^h(v) = 0},
Rh,λ = R((T − λI)^h) = {(T − λI)^h(v) | v ∈ V}.

It is easy to see that:

(0) = N0,λ ⊂ N1,λ ⊂ N2,λ ⊂ . . . . . . ⊂ Ns,λ ⊂ . . .

is a chain of subspaces of V. Since V is finite dimensional, this chain cannot be


infinite. In a similar way, the chain

V = R0,λ ⊃ R1,λ ⊃ R2,λ ⊃ . . . . . . ⊃ Rs,λ ⊃ . . .

cannot be infinite. Since both chains terminate after finitely many steps, we can find
a minimum integer m ≥ 1 such that

Nm,λ = Nm+1,λ = Nm+2,λ = · · · · · · = Nt,λ for all t ≥ m

and
Rm,λ = Rm+1,λ = Rm+2,λ = · · · · · · = Rt,λ for all t ≥ m.

Notice that if v ∈ Nm,λ then 0 = (T − λI )m (v) = (T − λI )m−1 (T − λI )(v),


which means that (T − λI )(v) ∈ Nm−1,λ ⊂ Nm,λ . Since v ∈ Nm,λ , then also T (v) ∈
Nm,λ . Moreover, if w ∈ Rm,λ then w = (T − λI )m (u), for some u ∈ V. Hence


(T − λI )(w) = (T − λI )m+1 (u) ∈ Rm+1,λ ⊂ Rm,λ . As above, since w ∈ Rm,λ , we
also have T (w) ∈ Rm,λ . Hence both Nm,λ and Rm,λ are T -invariant subspaces of V.

Remark 6.39 The linear operator T : V → V is nilpotent if and only if there exists
k ≥ 1 (which is precisely the index of nilpotency of T ) such that Nk,0 = V and
Rk,0 = {0}.

Definition 6.40 Let T : V → V be a linear operator, λ an eigenvalue of T and s ≥ 1


the smallest integer such that Ns,λ = Nt,λ , for any t ≥ s. The subspace

Ns,λ = {X ∈ V | there exists i ≤ s, (T − λI )i (X ) = 0}

is called generalized eigenspace of T corresponding to the eigenvalue λ. Any element


of Ns,λ is a generalized eigenvector corresponding to the eigenvalue λ. The integer
s is called index of the eigenvalue λ. It is the greatest integer among the exponents
of the generalized eigenvectors corresponding to the eigenvalue λ.

Lemma 6.41 Let T : V → V be a linear operator of the finite dimensional vector


space V, λ an eigenvalue of T, t the index of λ. Then Nt,λ is invariant under the
action of the linear operator T − λI.

Proof Let X ∈ Nt,λ be a generalized eigenvector corresponding to λ, that is 0 =


(T − λI )t X = (T − λI )t−1 (T − λI )X. Hence (T − λI )X ∈ Nt−1,λ ⊂ Nt,λ , which
means that Nt,λ is invariant under the action of the operator T − λI.

Lemma 6.42 Let T : V → V be a linear operator of the finite dimensional vector


space V, λ = μ distinct eigenvalues of T, t the index of λ. Then Nt,λ is invariant
under the action of the linear operator T − μI.

Proof Let X ∈ Nt,λ be a generalized eigenvector corresponding to λ, that is, (T − λI)^t X = 0. Moreover, by μ ≠ λ, we have (T − μI)X = T(X) − μX ≠ 0. Since, by the previous lemma, (T − λI)X ∈ Nt,λ, there is Y ∈ Nt,λ such that T(X) = λX + Y ∈ Nt,λ, which implies that (T − μI)X = T(X) − μX ∈ Nt,λ, as required.

Lemma 6.43 Let A ∈ Mn (F) be the matrix of a linear operator T : V → V, λ an


eigenvalue of T, s be the minimum integer such that Ns,λ = Nt,λ and Rs,λ = Rt,λ ,
for any t ≥ s. Then dim(Ns,λ ) is equal to the algebraic multiplicity of the eigenvalue
λ.

Proof We divide the proof into two steps. Firstly, we consider the case when T
has the eigenvalue λ = 0. Let G = T|Ns,0 : Ns,0 → Ns,0 be the restriction of T to
Ns,0 , H = T|Rs,0 : Rs,0 → Rs,0 the restriction of T to Rs,0 . Denote by pT (t), pG (t)
and p H (t) the characteristic polynomials of T, G and H , respectively. Since V =
Ns,0 ⊕ Rs,0 and by Lemma 6.35, we have

pT (t) = pG (t) p H (t).


On the other hand, since G is nilpotent, pG (t) = (−1)k t k , where k = dim(Ns,0 ).


Moreover, H is bijective, so that t = 0 cannot be an eigenvalue of H; that is, t = 0 is not a root of the polynomial pH(t). Hence the algebraic multiplicity of λ = 0 is precisely the dimension of its generalized eigenspace.
Let now λ ≠ 0 be an eigenvalue of T and let h ≥ 1 be its algebraic multiplicity.
It is clear that 0 is an eigenvalue of the linear operator T − λI : V → V, having
algebraic multiplicity equal to h. By the previous argument, the dimension of the
generalized eigenspace of 0 (as eigenvalue of T − λI ) is precisely h. On the other
hand, the generalized eigenspace of 0 as eigenvalue of T − λI coincides with the
generalized eigenspace of λ as eigenvalue of T, and the proof is complete.

Theorem 6.44 Let T : V → V be a linear operator of a finite dimensional vector


space V. Then V is the direct sum of all generalized eigenspaces corresponding to
the distinct eigenvalues of T.

Proof Let λ1 , . . . , λr be the distinct eigenvalues of T, t1 , . . . , tr the indices and


Nt1 ,λ1 , . . . , Ntr ,λr the generalized eigenspaces of λ1 , . . . , λr , respectively. Firstly, we
prove that Nt1 ,λ1 + · · · + Ntr ,λr is a direct sum. To do this, we show that, if

X 1 + · · · + X r = 0, X i ∈ Nti ,λi (6.1)

then X i = 0 for any i = 1, . . . , r.


Since (T − λr I )tr (X 1 + · · · + X r ) = 0 and (T − λr I )tr X r = 0, then (T − λr I )tr
(X 1 + · · · + X r −1 ) = 0, that is Y1 + · · · + Yr −1 = 0, where Yi = (T − λr I )tr X i , for
i = 1, . . . , r − 1. By the previous lemma, Yi ∈ Nti ,λi .
Analogously, (T − λr −1 I )tr −1 (Y1 + · · · + Yr −1 )=0 implies (T − λr −1 I )tr −1 (Y1 +
· · · + Yr −2 ) = 0, that is Z 1 + · · · + Z r −2 = 0, where Z i = (T − λr −1 I )tr −1 Yi =
(T − λr −1 I )tr −1 (T − λr I )tr X i ∈ Nti ,λi , for any i = 1, . . . , r − 2. Continuing this
process, we finally get

(T − λ2I)^{t2} (T − λ3I)^{t3} · · · (T − λrI)^{tr} X1 = 0. (6.2)

On the other hand, (T − λkI)^{tk} · · · (T − λrI)^{tr} X1 ∈ Nt1,λ1 \ Nts,λs, for any k = 2, . . . , r and s ≠ 1. Therefore, relation (6.2) is true only if X1 = 0. In this case, (6.1) reduces to
X 2 + · · · + X r = 0, X i ∈ Nti ,λi . (6.3)

So, starting from (6.3) and repeating the same above process r − 1 times, one may
prove that X i = 0, for any i = 1, . . . , r.
Let now W = Nt1 ,λ1 ⊕ · · · ⊕ Ntr ,λr . By Lemma 6.43, we know that dim(Nti ,λi ) is
equal to the algebraic multiplicity of the eigenvalue λi , that is dim F (W ) = dim F (V )
and V = W.

Lemma 6.45 Let A ∈ Mn (F) be the matrix of a linear operator T : V → V, λ an


eigenvalue of T, s be the minimum integer such that Ns,λ = Nt,λ and Rs,λ = Rt,λ ,
for any t ≥ s. Then:
(i) The restriction of T − λI to Ns,λ is nilpotent.


(ii) The restriction of T − λI to Rs,λ is an isomorphism.
(iii) V = Ns,λ ⊕ Rs,λ .

Proof Let G = (T − λI )|Ns,λ : Ns,λ → Ns,λ be the restriction of T − λI to Ns,λ . For


any X ∈ Ns,λ , we have G s (X ) = (T − λI )s (X ) = 0. Thus G is nilpotent.
Now let H = (T − λI)|Rs,λ : Rs,λ → Rs,λ be the restriction of T − λI to Rs,λ, and let X ∈ N(H) = {v ∈ Rs,λ | H(v) = 0}, the kernel of H. Since X ∈ Rs,λ, there exists Y ∈ V such that X = (T − λI)^s(Y). As a consequence, 0 = H(X) = H((T − λI)^s(Y)) = (T − λI)^{s+1}(Y), that is, Y ∈ Ns+1,λ. On the other hand Ns,λ = Ns+1,λ, so that Y ∈ Ns,λ, that is, X = (T − λI)^s(Y) = 0. This proves that N(H) = {0}, so that the image of H is equal to Rs,λ and the endomorphism H is bijective.
We finally show that V = Ns,λ ⊕ Rs,λ. Firstly, we prove that Ns,λ ∩ Rs,λ = {0}. On the contrary, we suppose that there exists 0 ≠ X ∈ Ns,λ ∩ Rs,λ. Since X ∈ Ns,λ, we have (T − λI)^s(X) = 0. On the other hand, since X ∈ Rs,λ, there is some 0 ≠ Y ∈ V such that X = (T − λI)^s(Y). Therefore (T − λI)^{2s}(Y) = 0, that is, Y ∈ N2s,λ = Ns,λ. This implies the contradiction 0 = (T − λI)^s(Y) = X ≠ 0.
To complete the proof, we recall that (T − λI )s : V → V is a linear operator of
V, so that dim(Ns,λ ) + dim(Rs,λ ) = dim(V ) and we are done.

Remark 6.46 Suppose that λ ≠ 0 is an eigenvalue of a linear operator T : V → V,


A the matrix of T with respect to a basis B of V, and replace the operator T by
T − λI. The matrix of T − λI with respect to the basis B is A − λI. Moreover, if
one of A or A − λI is in Jordan form, so is the other. More precisely, A contains the
Jordan block
⎡ ⎤
λ 1
⎢ .. .. ⎥
⎢ . . ⎥
J p (λ) = ⎢ ⎥
⎣ λ 1⎦
λ

if and only if A − λI contains the Jordan block


⎡ ⎤
0 1
⎢ .. .. ⎥
⎢ . . ⎥
J p (0) = ⎢ ⎥.
⎣ 0 1⎦
0

Thus, in order to analyze the Jordan form of T, we replace T by T − λI and reduce


to the case of singular operators (that is, operators having at least one eigenvalue
λ = 0).
To simplify the notation, we will write Ns instead of Ns,0 and Rs instead of Rs,0 .

Definition 6.47 Let A ∈ Mn(F) be the matrix of a singular linear operator T : V → V, and s be the minimum integer such that Ns = Nt and Rs = Rt, for any t ≥ s. Let {e1, . . . , et} be a basis for Ns. Since V = Ns ⊕ Rs, there exist vectors et+1, . . . , en ∈ Rs such that B = {e1, . . . , en} is a basis for V. With respect to B, the matrix A of T has the following form:

A = ⎡ A1       0t,n−t ⎤
    ⎣ 0n−t,t   A2     ⎦                     (6.4)

where:
(i) A1 is a nilpotent matrix of index s, corresponding to the restriction of T to Ns;
(ii) A2 is an invertible matrix, corresponding to the restriction of T to Rs.
More precisely:
(iii) For i = 1, . . . , t, the coordinate column vectors of the image T(ei) define the submatrix
⎡ A1     ⎤
⎣ 0n−t,t ⎦ .
(iv) For i = t + 1, . . . , n, the coordinate column vectors of the image T(ei) define the submatrix
⎡ 0t,n−t ⎤
⎣ A2     ⎦ .
Thus, according to the decomposition V = Ns ⊕ Rs we have the decomposition (6.4). It is called the Fitting decomposition of A (usually V = Ns ⊕ Rs is also called the Fitting decomposition of T).

Now, we can definitively describe the characteristic polynomial of a nilpotent oper-


ator:

Theorem 6.48 Let V be a vector space of dimension n, T : V → V a linear oper-


ator and A ∈ Mn (F) the matrix of T. A is nilpotent if and only if its characteristic
polynomial is p(λ) = (−1)n λn .

Proof In case A is nilpotent, p(λ) = (−1)^n λ^n is proved in Lemma 6.37. Hence, we assume p(λ) = (−1)^n λ^n, that is, 0 is the only eigenvalue of A. By contradiction, suppose that A is not nilpotent, that is, T is not nilpotent. Then there is some vector v ∈ V such that T^s(v) ≠ 0; in particular Rs ≠ {0} (see Remark 6.39).
Thus, by the Fitting decomposition rule, V = Ns ⊕ Rs and

A = ⎡ A1       0t,n−t ⎤
    ⎣ 0n−t,t   A2     ⎦ ,

where A2 is an invertible matrix. On the other hand, since we have assumed that 0 is the only eigenvalue of A, a fortiori it is the only eigenvalue of the nonsingular matrix A2, which is a contradiction.

We are now able to construct a basis of V with respect to which the matrix of T has
Jordan form. It will be an inductive process and consists of several steps. We start
with the following:
Theorem 6.49 Let V be a vector space of dimension n, T : V → V a singular


linear operator. Let X r ∈ V be such that X r ∈ Nr \ Nr −1 , in other words let X r
be a generalized eigenvector of T and r be the exponent of X r . If we consider the
following chain of vectors X r −i = T (X r −i+1 ) = T i (X r ), for i = 1, . . . , r − 1, then
the following properties hold:
(i) X r −i ∈ Nr −i \ Nr −i−1 , for any i = 1, . . . , r − 1.
(ii) {X 1 , X 2 , . . . , X r } is a linearly independent set.
(iii) If W = Span{X 1 , X 2 , . . . , X r } is the subspace of V generated by
{X 1 , X 2 , . . . , X r }, then T (W ) ⊆ W.

Proof Consider the vector X r −1 = T (X r ). Hence T r −1 (X r −1 ) = T r −1 T (X r ) =


T^r(Xr) = 0, because Xr ∈ Nr. Then Xr−1 ∈ Nr−1. Now, suppose that Xr−1 ∈ Nr−2. This would imply that 0 = T^{r−2}(Xr−1) = T^{r−2}T(Xr) = T^{r−1}(Xr), which contradicts the assumption that Xr ∉ Nr−1. Thus we have proved that Xr−1 ∈
Nr −1 \ Nr −2 . The inductive construction of the chain of vectors X r −i = T (X r −i+1 ) =
T i (X r ), indicates that we may repeat the previous argument in order to obtain the
required conclusion X r −i ∈ Nr −i \ Nr −i−1 , for any i = 1, . . . , r − 1.
Now assume
α1 X 1 + α2 X 2 + · · · + αr X r = 0 (6.5)

for some elements α1 , . . . , αr ∈ F. By applying T r −1 to (6.5), we have

0 = α1 T r −1 (X 1 ) + α2 T r −1 (X 2 ) + · · · + αr T r −1 (X r ). (6.6)

Since Xj ∈ Nj, for any j = 1, . . . , r − 1, and Xr ∉ Nr−1, by (6.6) it follows that 0 = αr T^{r−1}(Xr), implying αr = 0. Thus, we may rewrite (6.5) as follows:

α1 X 1 + α2 X 2 + · · · + αr −1 X r −1 = 0. (6.7)

Hence, application of T r −2 to (6.7), and since X r −1 ∈ / Nr −2 , implies αr −1 = 0.


By repeating this process we may prove that αi = 0, for any i = 1, . . . , r , that is
{X 1 , X 2 , . . . , X r } is a linearly independent set.
Finally, we prove that W = Span{X 1 , X 2 , . . . , X r } is T -invariant. At first we
notice that T (X 1 ) = T T r −1 (X r ) = T r (X r ) = 0 ∈ W. We now focus our atten-
tion on the image T (X j ), when 1 < j ≤ r. In this case T (X j ) = T T r − j (X r ) =
T r +1− j (X r ) = X j−1 ∈ W. Hence, the image of any linear combination α1 X 1 +
α2 X 2 + · · · + αr X r is an element of W, as desired.

Remark 6.50 Let X r ∈ V be such that X r ∈ Nr \ Nr −1 . The chain of vectors


X r −i = T (X r −i+1 ) = T i (X r ), for i = 1, . . . , r − 1, as in Theorem 6.49, can be
defined as follows:
(i) X k−1 = T (X k ), for k = 2, . . . , r ;
(ii) 0 = T (X 1 ).

Remark 6.51 Let T : V → V be a nilpotent operator (that is, 0 is the only eigen-
value of T ), X 1 , . . . , X k generalized eigenvectors of T such that ri is the exponent of
X i , for any i = 1, . . . , k. Starting from any X i we now construct the chain of vectors
as in Theorem 6.49:
(i) X i = X i,ri ;
(ii) X i,ri − j = T j (X i,ri ), for j = 1, . . . , ri − 1.
The set Bi = {X i,1 , X i,2 , . . . , X i,ri } is linearly independent and the subspace Wi =
Span(Bi ) is T -invariant. Moreover, in light of Remark 6.50:
(iii) X i,k−1 = T (X i,k ), for k = 2, . . . , ri ;
(iv) T (X i,1 ) = 0, that is, X i,1 is an eigenvector of T.
If V = ⊕_{i=1}^{k} Wi, where the Wi are subspaces of V having bases Bi respectively, then the set B = ∪_{i=1}^{k} Bi is a basis for V. Write

B = {X 1,1 , . . . , X 1,r1 ; X 2,1 , . . . , X 2,r2 ; . . . , . . . ; X k,1 , . . . , X k,rk }

and let A be the matrix of T with respect to B. The column coordinate vectors of A
are the images of the elements of the basis B. By computing these images, we have
that
(v) T (X i, j ) = X i, j−1 in case X i, j is not the first vector in the chain Bi ;
(vi) T (X i, j ) = 0 in case X i, j is the first vector in the chain Bi .
Therefore, the coordinate vector of T (Xi,j) with respect to B is either (0, 0, . . . , 0, 1, 0, . . . , 0), where the 1 is the (∑_{h=1}^{i−1} r_h + (j − 1))-th entry, or (0, 0, . . . , 0), respectively. Therefore, the matrix A has the following block diagonal form
$$\begin{pmatrix} J_1(0) & & \\ & \ddots & \\ & & J_k(0) \end{pmatrix}$$

where
$$J_i(0) = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \\ & & & 0 \end{pmatrix}$$

is a ri × ri Jordan block. In this case {X 1 , . . . , X k } is called a set of Jordan generators


and B is called a Jordan basis for V.
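The chain construction of Theorem 6.49 and Remark 6.51 can be carried out mechanically. The sketch below is not from the text; it assumes SymPy and a small hand-picked nilpotent matrix, builds the chain X_{k−1} = T(X_k) starting from a generalized eigenvector of maximal exponent, and checks that the resulting basis produces a single Jordan block.

```python
# Sketch (assuming SymPy): building one Jordan chain for a nilpotent matrix.
import sympy as sp

A = sp.Matrix([[0, 1, 0], [0, 0, 1], [0, 0, 0]])   # nilpotent of index 3
Xr = sp.Matrix([1, 1, 1])                          # lies in N_3 \ N_2, exponent 3

chain = [Xr]
while A * chain[-1] != sp.zeros(3, 1):             # X_{k-1} = T(X_k) until it vanishes
    chain.append(A * chain[-1])
chain.reverse()                                    # now chain = [X_1, X_2, X_3]

W = sp.Matrix.hstack(*chain)
print(W.rank())          # 3: the chain is linearly independent
print(W.inv() * A * W)   # the Jordan block J_3(0) with respect to this basis
```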
Theorem 6.52 Let T : V → V be a linear operator of a finite dimensional vector
space V. If T is nilpotent, then there exists a basis B for V such that the matrix of T
with respect to B has Jordan form.

Proof Let dim F V = n. We prove the result by induction on the dimension of V. It


is trivial, in case dim F V = 1.
Let N and R be the kernel and the image of T , respectively. Since T is nilpotent,
N ≠ {0}, dim F R < dim F V, and every nonzero vector of V is a generalized eigen-
vector with eigenvalue 0. By our induction assumption, the result is true for T|R , the
restriction of T to R. So there exists a set of Jordan generators {X 1 , . . . , X k } for
T|R . Let ti be the exponent of any X i and let Ui be the subspace of R spanned by
{X i,1 , . . . , X i,ti }, where
(i) X i,ti = X i
(ii) X i,ti − j = T|R^j (X i,ti ), for j = 1, . . . , ti − 1.
By Theorem 6.49, {X i,1 , . . . , X i,ti } is a linearly independent set, then it is a basis for
Ui . Moreover Ui is T|R -invariant and R = U1 ⊕ · · · ⊕ Uk by the induction assump-
tion.
Let now Yi ∈ V be such that T (Yi ) = X i . Hence the exponent of Yi is equal to
ti + 1. Denote by Vi the subspace of V spanned by {Yi,1 , . . . , Yi,ti +1 }, where
(i) Yi,ti +1 = Yi
(ii) Yi,ti +1− j = T j (Yi,ti +1 ), for j = 1, . . . , ti
and W = V1 + · · · + Vk . Since each Vi is T -invariant, W is T -invariant. Moreover
T (Vi ) = Ui , and hence T (W ) = R.
If X ∈ Vi ∩ N , then T (X ) = 0 and there exist α1 , . . . , αti +1 ∈ F such that

X = α1 Yi,1 + · · · + αti +1 Yi,ti +1 . (6.8)

By applying T to relation (6.8), one has

0 = T (X ) = α1 T (Yi,1 ) + · · · + αti +1 T (Yi,ti +1 ). (6.9)

Notice that  
T (Yi,1 ) = T T ti (Yi,ti +1 ) = T ti +1 (Yi,ti +1 ) = 0.

Moreover
Yi,m = T^{ti+1−m}(Yi,ti+1) = T^{ti−m}(X i,ti)
implies
T (Yi,m) = T^{ti+1−m}(X i,ti) = X i,m−1, for all 1 < m ≤ ti + 1.

Therefore, (6.9) reduces to

$$0 = \sum_{m=2}^{t_i+1} \alpha_m T^{t_i+1-m}(X_{i,t_i}) = \sum_{m=2}^{t_i+1} \alpha_m X_{i,m-1}. \qquad (6.10)$$

Since {X i,1 , . . . , X i,ti } is a linearly independent set, relation (6.10) implies αm =


0, for any m = 2, . . . , ti + 1. Thus, by (6.8), it follows

X = α1 Yi,1 = α1 T ti (Yi,ti +1 ) = α1 T ti (Yi ) = α1 T ti −1 (X i )

that is, X is a nonzero element of Ui . In other words, we have showed that Vi ∩ N ⊆


Ui .
Let now v1 ∈ V1 , . . . , vk ∈ Vk be such that v1 + · · · + vk = 0. Hence

0 = T (v1 + · · · + vk ) = T (v1 ) + · · · + T (vk ). (6.11)

Moreover, since T (Vi ) = Ui , we also have T (vi ) ∈ Ui , for all i = 1, . . . , k.


Therefore, by the fact that U1 , . . . , Uk are independent and by (6.11), it fol-
lows T (vi ) = 0, thus vi ∈ Vi ∩ N ⊆ Ui , for any i = 1, . . . , k. Once again since
U1 , . . . , Uk are independent and by the assumption that v1 + · · · + vk = 0, we
conclude that vi = 0, for any i = 1, . . . , k. In other words we have proved that
W = V1 ⊕ · · · ⊕ Vk , so that {Y1 , . . . , Yk } is a set of Jordan generators for T|W .
Finally, let v1 , v2 ∈ V be such that T (v1 ) = v2 . Since T (W ) = R, there exists an
element u ∈ W such that T (u) = v2 = T (v1 ), that is T (u − v1 ) = 0 and u − v1 ∈
N . Hence, there is x ∈ N such that v1 = u + x. Repeating this process for any
element of V, we have that V = W + N . Thus, we can extend a basis of W to a basis
of V by adding elements of N . In this sense, let x1 , . . . , xm ∈ N be such that

V = W ⊕ Span{x1 , . . . , xm } = V1 ⊕ · · · ⊕ Vk ⊕ Span{x1 , . . . , xm }.

Notice that Span{x1 , . . . , xm } is trivially T -invariant, moreover the restriction of


T to Span{x1 , . . . , xm } is the zero operator and its matrix is the zero matrix.
Therefore, since {Y1 , . . . , Yk } is a set of Jordan generators for T|W ,
{Y1 , . . . , Yk , x1 , . . . , xm } is a set of Jordan generators for T.

Theorem 6.53 Let T : V → V be a linear operator of a finite dimensional vector


space V. If the characteristic polynomial p(λ) of T splits over F, then there exists a
basis B for V such that the matrix of T with respect to B has Jordan form.

Proof We prove the theorem by induction on the dimension of V. The result is trivial
if dim F V = 1. Assume dim F V = n ≥ 2 and the theorem holds for any T -invariant
proper subspace of V.
We firstly consider the case p(λ) = (λ − λ1 )n , that is there exists a unique eigen-
value λ1 of T, having algebraic multiplicity equal to n. Let A be the matrix of T with
respect to a basis B of V. Now replace T by T − λ1 I, so that A − λ1 I is the matrix
of T − λ1 I with respect to the same basis B. Moreover, if A − λ1 I is similar to a
Jordan diagonal blocks matrix, so is A. However, T − λ1 I is a nilpotent operator
and we can conclude by Theorem 6.52.
Hence, we may assume that there exist at least two distinct eigenvalues λ1 , λ2 of
T. The linear operator T − λ1 I is singular, having eigenvalue zero, and A − λ1 I is
its matrix. As above, if A − λ1 I admits a Jordan canonical form, so does A.
Let s ≥ 1 be the index of λ1 . That is, s is the minimum integer such that Ns,0 = Nt,0
and Rs,0 = Rt,0 , for any t ≥ s. We write Ns instead of Ns,0 and Rs instead of Rs,0 .

Notice that Rs ≠ {0}, since (T − λ1 I )X ≠ 0 for any eigenvector X corresponding to λ2, and Ns ≠ {0}, since T − λ1 I is singular. Thus, we can consider the Fitting
decomposition of (T − λ1 I ), i.e., V = Ns ⊕ Rs . Both the restrictions (T − λ1 I )|Ns
and (T − λ1 I )|Rs are linear operators over proper vector subspace of V . Hence, by
our induction assumption, there exist B1 = {X 1 , . . . , X k } set of Jordan generators for
(T − λ1 I )|Ns and B2 = {X k+1 , . . . , X n } set of Jordan generators for (T − λ1 I )|Rs .
So the matrix A1 of (T − λ1 I )|Ns with respect to the basis B1 has Jordan form, as
well as the matrix A2 of (T − λ1 I )|Rs in terms of the basis B2 . As in Lemma 6.35,
the matrix of T − λ1 I with respect to the basis B1 ∪ B2 of V has the following block
diagonal form
$$\begin{pmatrix} A_1 & 0_{k,n-k} \\ 0_{n-k,k} & A_2 \end{pmatrix}$$

and the proof is complete.

Lemma 6.54 Let T : V → V be a singular linear operator of the finite dimensional


vector space V , and assume that there exists a basis B for V such that the matrix
A of T with respect to B has Jordan form. If Jk (0) is a k × k Jordan block of A
corresponding to the eigenvalue zero, then
$$\operatorname{rank}\big(J_k(0)^d\big) = \begin{cases} k-d, & 0 \le d < k,\\ 0, & d \ge k. \end{cases}$$

Proof Consider the linear operator H : Fk → Fk whose matrix is


$$J_k(0) = \begin{pmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix}.$$

The fact that Jk (0)k = the zero matrix, is trivial, so we can consider the case d < k.
We denote R, R 2 , . . . , R d the images of the operators H, H 2 , . . . , H d respectively.
Let X 1 , . . . , X k be generalized eigenvectors of T corresponding to the eigenvalue
zero, such that Jk (0) is the matrix of H with respect to the basis C = {X 1 , . . . , X k } for
Fk . Since the columns of Jk (0) are the coordinate vectors of H (X 1 ), . . . , H (X k ) with
respect to the basis C, then H (X 1 ) = 0 and H (X i ) = X i−1 , for any i = 2, . . . , k.
Since {H (X 1 ), . . . , H (X k )} is a generating set for R, {H (X 2 ), . . . , H (X k )} = {X 1 , . . . , X k−1 } is a basis for R. Here we also notice that H²(X 1 ) = 0, H²(X 2 ) = H (H (X 2 )) = H (X 1 ) = 0 and H²(X i ) = H (X i−1 ) = X i−2 , for any i = 3, . . . , k. As above, this means that {H²(X 3 ), . . . , H²(X k )} = {X 1 , . . . , X k−2 } is a basis for R².
Repeating this process, we can see that H^d(X i ) = 0 for any i ≤ d and H^d(X i ) = X i−d , for any d < i ≤ k, so that {H^d(X d+1 ), . . . , H^d(X k )} = {X 1 , . . . , X k−d } is a basis for R^d, that is dim F (R^d) = k − d. Moreover Jk (0)^d is the matrix of H^d with respect to C; in other words the rank of Jk (0)^d is equal to the dimension of R^d,
and we are done.
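A quick numerical illustration of Lemma 6.54 (a sketch assuming NumPy, not part of the text): for the k × k nilpotent Jordan block, the rank of its d-th power drops by one at each step until it reaches zero.

```python
# Sketch (assuming NumPy): rank(J_k(0)^d) = k - d for 0 <= d < k, and 0 for d >= k.
import numpy as np

k = 5
J = np.eye(k, k, 1)                       # J_k(0): ones on the superdiagonal
for d in range(k + 2):
    r = np.linalg.matrix_rank(np.linalg.matrix_power(J, d))
    print(d, r, max(k - d, 0))            # computed rank vs. the predicted value
```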
Corollary 6.55 Let T : V → V be a singular linear operator of the finite dimen-
sional vector space V and assume that there exists a basis B for V such that the
matrix A of T with respect to B has Jordan form. If Jk (0) is a k × k Jordan block of
A corresponding to the eigenvalue zero, then
$$\operatorname{rank}\big(J_k(0)^{d+1}\big) - 2\operatorname{rank}\big(J_k(0)^{d}\big) + \operatorname{rank}\big(J_k(0)^{d-1}\big) = \begin{cases} 1, & d = k,\\ 0, & 1 \le d \ne k. \end{cases}$$

Proof It follows by easy computations, as a consequence of Lemma 6.54.


Lemma 6.56 Let A, B ∈ Mn (F) be similar and λ an eigenvalue of A and B. Then rank((A − λI)^k) = rank((B − λI)^k) for any k = 1, . . . , n.
Proof Let C ∈ Mn (F) be the nonsingular matrix such that B = C^{−1}AC. For any k ≥ 1,
$$C^{-1}(A-\lambda I)^{k}C = \big(C^{-1}(A-\lambda I)C\big)^{k} = (B-\lambda I)^{k},$$

that is, (A − λI )k and (B − λI )k are similar. Thus, rank(A − λI )k = rank(B −


λI )k .
Corollary 6.57 A, B ∈ Mn (F) are similar if and only if (A − λI ) and (B − λI )
are similar.
Theorem 6.58 Let T : V → V be a linear operator of a finite dimensional vector
space V over the field F, A ∈ Mn (F) be the matrix of T with respect to a basis for
V. Assume that there exists a basis B for V such that the matrix A of T with respect
to B has Jordan form. If Jk (λ) is a k × k Jordan block of A corresponding to the
eigenvalue λ, then the number of times Jk (λ) occurs as a diagonal block in A is
equal to
     
rank (A − λI )k+1 − 2rank (A − λI )k + rank (A − λI )k−1 .

Proof Let B = {X 1 , . . . , X n } be the Jordan basis for V, with respect to which the matrix A′ of T has Jordan form. Hence A′ = C^{−1}AC, where C is the transition matrix. By Lemma 6.56, rank((A′ − λI)^k) = rank((A − λI)^k), for any k = 1, . . . , n. Hence, in order to prove our formula, we may replace A by A′. Let
$$A' = \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_t \end{pmatrix}$$

be the Jordan block diagonal form, where any n i × n i block Ai is associated with an
eigenvalue λi of T. Thus
$$A' - \lambda I = \begin{pmatrix} A_1 - \lambda I_{n_1} & & \\ & \ddots & \\ & & A_t - \lambda I_{n_t} \end{pmatrix}$$

and
$$(A' - \lambda I)^{k} = \begin{pmatrix} (A_1 - \lambda I_{n_1})^{k} & & \\ & \ddots & \\ & & (A_t - \lambda I_{n_t})^{k} \end{pmatrix}$$

so that
$$\operatorname{rank}\big((A'-\lambda I)^{k}\big) = \sum_{i=1}^{t}\operatorname{rank}\big((A_i-\lambda I_{n_i})^{k}\big). \qquad (6.12)$$
To simplify the notation we denote r_k(A) = rank((A − λI)^k) and r_k(A_i) = rank((A_i − λI_{n_i})^k). By (6.12) it follows that
$$r_{k+1}(A) - 2r_k(A) + r_{k-1}(A) = \sum_{i=1}^{t}\big(r_{k+1}(A_i) - 2r_k(A_i) + r_{k-1}(A_i)\big). \qquad (6.13)$$

If λi ≠ λ, then λ is not the eigenvalue corresponding to Ai, so Ai − λI_{ni} is an invertible ni × ni matrix, having rank equal to ni. Hence any power of Ai − λI_{ni} has rank equal to ni. In this case
$$r_{k+1}(A_i) - 2r_k(A_i) + r_{k-1}(A_i) = 0.$$

On the other hand, for λi = λ, the matrix Ai − λI_{ni} is an ni × ni Jordan block having zero on the main diagonal (its form is J_{ni}(0)). In this case, by Corollary 6.55,
$$r_{k+1}(A_i) - 2r_k(A_i) + r_{k-1}(A_i) = \begin{cases} 1, & k = n_i,\\ 0, & k \ne n_i. \end{cases}$$

Hence, in light of relation (6.13), we obtain that rk+1 (A) − 2rk (A) + rk−1 (A) rep-
resents precisely the number of Jordan blocks Ai having dimension k and associated
to the eigenvalue λ.
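The counting formula of Theorem 6.58 is easy to apply numerically. The sketch below is not from the text; it assumes NumPy and uses the 5 × 5 matrix that appears in Example 6.61 below, where λ = 3 contributes one block of size 1 and one of size 2, and λ = 2 contributes one block of size 2.

```python
# Sketch (assuming NumPy): number of k x k Jordan blocks for eigenvalue lam
# equals rank(B^{k+1}) - 2 rank(B^k) + rank(B^{k-1}), with B = A - lam*I.
import numpy as np

def jordan_block_count(A, lam, k):
    B = A - lam * np.eye(A.shape[0])
    r = lambda d: np.linalg.matrix_rank(np.linalg.matrix_power(B, d))
    return r(k + 1) - 2 * r(k) + r(k - 1)

A = np.array([[3, 1, 2, 1, 2],
              [0, 3, 0, 2, -2],
              [0, 0, 3, -1, 1],
              [0, 0, 0, 2, 1],
              [0, 0, 0, 0, 2]], dtype=float)

print(jordan_block_count(A, 3, 1), jordan_block_count(A, 3, 2))  # 1, 1
print(jordan_block_count(A, 2, 1), jordan_block_count(A, 2, 2))  # 0, 1
```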

Theorem 6.59 Let T : V → V be a linear operator of a finite dimensional vector


space V, A ∈ Mn (F) be the matrix of T with respect to a basis for V. Assume that
there exists a basis B for V such that the matrix A of T with respect to B has Jordan
form. If λ is an eigenvalue of T having index k, then there exists at least one k × k

Jordan block in A corresponding to the eigenvalue λ. Moreover, every Jordan block


in A associated with λ has order less than or equal to k.

Proof Let Nh = K er (A − λI )h , for any h ≥ 1. In particular Nk is the generalized


eigenspace of λ. For any i < k,we know that Ni ⊂ Nk , and hence dim F (Ni ) <
dim F (Nk ) and rank (A − λI )i > rank (A − λI )k . Moreover, for any t ≥ k,
Nk = Nt, that is rank((A − λI)^t) = rank((A − λI)^k). Hence, if Jk (λ) is a k × k
Jordan block corresponding to the eigenvalue λ, we can compute the number m of
times Jk (λ) occurs as a block in A by using the result in Theorem 6.58, that is
     
m = rank (A − λI )k+1 − 2rank (A − λI )k + rank (A − λI )k−1 .

In light of the previous comments, it is easy to see that m ≥ 1. Now let Jh (λ) be an h × h Jordan block corresponding to λ, with h ≠ k. If we suppose h > k, then h − 1 ≥ k,
thus Nh−1 = Nk and a fortiori Nh+1 = Nh = Nk . Thus
     
rank (A − λI )h+1 = rank (A − λI )h = rank (A − λI )h−1

and an application of Theorem 6.58 shows that the number of times Jh (λ) occurs as
a block in A is
     
rank (A − λI )h+1 − 2rank (A − λI )h + rank (A − λI )h−1 = 0,

which leads to a contradiction.

Theorem 6.60 Let T : V → V be a linear operator of the finite dimensional vector


space V over the field F, A ∈ Mn (F) be the matrix of T with respect to a basis for V.
Assume that there exists a basis B for V such that the matrix A of T with respect to
B has Jordan form. If λ is an eigenvalue of T having index k, then the total number
of Jordan blocks in A corresponding to the eigenvalue λ is equal to the geometric
multiplicity of λ.

Proof As above, let Nh = K er (A − λI )h , for any h ≥ 1, and Nk be the generalized


eigenspace of λ (that is k is the index of λ). To obtain the total number t of blocks
corresponding to λ, one can apply the formula in Theorem 6.58. Moreover, the
greatest order for a block corresponding to λ is equal to k, therefore
$$t = \sum_{i=1}^{k}\Big[\operatorname{rank}\big((A-\lambda I)^{i+1}\big) - 2\operatorname{rank}\big((A-\lambda I)^{i}\big) + \operatorname{rank}\big((A-\lambda I)^{i-1}\big)\Big].$$

By computation (the sum telescopes), it follows that
$$t = \operatorname{rank}\big((A-\lambda I)^{0}\big) - \operatorname{rank}\big(A-\lambda I\big) + \operatorname{rank}\big((A-\lambda I)^{k+1}\big) - \operatorname{rank}\big((A-\lambda I)^{k}\big).$$

Since Nk+1 = Nk, we have that
$$\operatorname{rank}\big((A-\lambda I)^{k+1}\big) = \operatorname{rank}\big((A-\lambda I)^{k}\big),$$
so that
$$t = \operatorname{rank}\big((A-\lambda I)^{0}\big) - \operatorname{rank}\big(A-\lambda I\big) = n - \operatorname{rank}\big(A-\lambda I\big),$$

which is precisely the geometric multiplicity of λ.

Example 6.61 Let T : R5 → R5 be the linear operator having matrix


$$A = \begin{pmatrix} 3 & 1 & 2 & 1 & 2 \\ 0 & 3 & 0 & 2 & -2 \\ 0 & 0 & 3 & -1 & 1 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix}$$

with respect to the canonical basis of R5 . The characteristic polynomial is p(λ) =


(3 − λ)3 (2 − λ)2 , having two distinct roots λ1 = 3 (with algebraic multiplicity equal
to 3), λ2 = 2 (with algebraic multiplicity equal to 2).
For λ1 = 3, we have
$$A - 3I = \begin{pmatrix} 0 & 1 & 2 & 1 & 2 \\ 0 & 0 & 0 & 2 & -2 \\ 0 & 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 0 & -1 \end{pmatrix}$$

having rank r1 = 3, so that the associated homogeneous linear system has solutions
X = (α, −2β, β, 0, 0), for any α, β ∈ R. The eigenspace corresponding to λ1 is
N1,λ1 = (1, 0, 0, 0, 0), (0, −2, 1, 0, 0) and the geometric multiplicity of λ1 is equal
to 2. Therefore, there are 2 Jordan blocks corresponding to the eigenvalue λ1 = 3.
More precisely, one block has dimension 2 and one block has dimension 1. Now
$$(A - 3I)^{2} = \begin{pmatrix} 0 & 0 & 0 & -1 & -1 \\ 0 & 0 & 0 & -2 & 4 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

has rank r2 = 2, that is the associated homogeneous linear system has solutions
(α, β, γ , 0, 0), for any α, β, γ ∈ R. Thus N2,λ1 has dimension 3 and N2,λ1 =
(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0). Notice that the dimension of N2,λ1 coin-
cides with the algebraic multiplicity of λ1 , so that N2,λ1 is the generalized eigenspace

of λ1 . In fact, it is easy to see that Nk,λ1 = N2,λ1 , for any k ≥ 3. To obtain a


set of Jordan generators corresponding to λ1 , we start from X 2 ∈ N2,λ1 \ N1,λ1 .
We choose X 2 = (0, 1, 0, 0, 0). Then compute X 1 = (A − 3I )X 2 = (1, 0, 0, 0, 0).
Hence B1 = {X 1 , X 2 } is a set of Jordan generators associated with the block of
dimension 2. The set of Jordan generators associated with the block of dimension 1
must be B2 = {X 3 }, where X 3 ∈ N1,λ1 \ X 1 , X 2 , that is X 3 = (0, −2, 1, 0, 0),
For λ2 = 2, we have
$$A - 2I = \begin{pmatrix} 1 & 1 & 2 & 1 & 2 \\ 0 & 1 & 0 & 2 & -2 \\ 0 & 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

having rank equal to 4, so that the associated homogeneous linear system has solu-
tions (−α, −2α, α, α, 0), for any α ∈ R. The eigenspace corresponding to λ2 is
N1,λ2 = (−1, −2, 1, 1, 0) and the geometric multiplicity of λ2 is equal to 1. There-
fore, there is 1 Jordan block corresponding to the eigenvalue λ2 = 2, having dimen-
sion 2. Since
$$(A - 2I)^{2} = \begin{pmatrix} 1 & 2 & 4 & 1 & 3 \\ 0 & 1 & 0 & 2 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

has rank r2 = 3, then the associated homogeneous linear system has solutions
(−α − 3β, −2α, α, α, β), for any α, β ∈ R. Thus, the generalized eigenspace N2,λ2
has dimension 2 and N2,λ2 = (−1, −2, 1, 1, 0), (−3, 0, 0, 0, 1). To obtain a set of
Jordan generators corresponding to λ2 , we start from X 5 ∈ N2,λ2 \ N1,λ2 . Of course
X 5 = (−3, 0, 0, 0, 1). Then compute X 4 = (A − 2I )X 5 = (−1, −2, 1, 1, 0). Hence
B3 = {X 4 , X 5 } is a set of Jordan generators associated with the block of dimension
2, corresponding to λ2 .
Finally, we can write a Jordan basis for T :

B = B1 ∪ B2 ∪ B3
= {X 1 , X 2 , X 3 , X 4 , X 5 }
= {(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, −2, 1, 0, 0), (−1, −2, 1, 1, 0), (−3, 0, 0, 0, 1)}.

Let A be the matrix of T with respect to B. The column coordinate vectors of A


are the images of the elements of B, more precisely:

T (X 1 ) = AX 1
= (3, 0, 0, 0, 0), with coordinates in terms of B as (3, 0, 0, 0, 0);

T (X 2 ) = AX 2
= (1, 3, 0, 0, 0), with coordinates in terms of B as (1, 3, 0, 0, 0);

T (X 3 ) = AX 3
= (0, −6, 3, 0, 0), with coordinates in terms of B as (0, 0, 3, 0, 0);

T (X 4 ) = AX 4
= (−2, −4, 2, 2, 0), with coordinates in terms of B as (0, 0, 0, 2, 0);

T (X 5 ) = AX 5
= (−7, −2, 1, 1, 2), with coordinates in terms of B as (0, 0, 0, 1, 2).

Hence, if P is the following transition matrix


$$P = \begin{pmatrix} 1 & 0 & 0 & -1 & -3 \\ 0 & 1 & -2 & -2 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

then
$$A' = P^{-1}AP = \begin{pmatrix} 3 & 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix}$$

is the canonical Jordan form of A.
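The computation of Example 6.61 can be cross-checked by machine. The following sketch (assuming SymPy; it is an illustration, not part of the text) verifies that the transition matrix P built above conjugates A into the stated Jordan form.

```python
# Sketch (assuming SymPy): verification of Example 6.61.
import sympy as sp

A = sp.Matrix([[3, 1, 2, 1, 2], [0, 3, 0, 2, -2], [0, 0, 3, -1, 1],
               [0, 0, 0, 2, 1], [0, 0, 0, 0, 2]])
P = sp.Matrix([[1, 0, 0, -1, -3], [0, 1, -2, -2, 0], [0, 0, 1, 1, 0],
               [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]])
print(P.inv() * A * P)   # diag(J_2(3), J_1(3), J_2(2)), the Jordan form above
```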

Example 6.62 Let T : R5 → R5 be the linear operator having matrix


$$A = \begin{pmatrix} 0 & 0 & 0 & -1 & 1 \\ -1 & 1 & 1 & -1 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

with respect to the canonical basis of R5 . The characteristic polynomial is p(λ) =


(1 − λ)5 , having one root λ = 1 (with algebraic multiplicity equal to 5).
We have
$$A - I = \begin{pmatrix} -1 & 0 & 0 & -1 & 1 \\ -1 & 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
$$(A - I)^{2} = \begin{pmatrix} 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

and (A − I )3 = 0. In particular, this means that there exists a Jordan block corre-
sponding to λ of dimension 3 and any other Jordan block of λ has dimension less
than 3.
Therefore, A − I has rank r1 = 3, so that the associated homogeneous linear
system has solutions X = (α, β, 0, −α, 0), for any α, β ∈ R. So
N1,λ = (1, 0, 0, −1, 0), (0, 1, 0, 0, 0), the geometric multiplicity of λ is equal to
2 and there are 2 Jordan blocks corresponding to the eigenvalue λ. Since (A −
I )2 has rank r2 = 1, the associated homogeneous linear system has solutions X =
(α, β, γ , δ, 0), for any α, β, γ , δ ∈ R. So

N2,λ = (1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0), (0, 0, 0, 1, 0).

Moreover, (A − I )3 = 0 implies N3,λ = R5 . We also remark that, the number


of blocks corresponding to λ, having dimension 1 is equal to n − 2r1 + r2 = 0.
Analogously, the number of blocks corresponding to λ, having dimension 2 is equal
to r1 − 2r2 + r3 = 1, as expected. In other words, we have one block of dimension
2 and one block of dimension 3.
To obtain a set of Jordan generators {X 1 , X 2 , X 3 } corresponding to λ and asso-
ciated with the block of dimension 3, we start from X 3 ∈ N3,λ \ N2,λ . We choose
X 3 = (0, 0, 0, 0, 1). Then compute X 2 = (A − I )X 3 = (1, 1, 0, 0, 0), X 1 = (A −
I )X 2 = (−1, −1, 0, 1, 0). Hence B1 = {X 1 , X 2 , X 3 } is a set of Jordan generators
associated with the block of dimension 3. The set of Jordan generators associated with
the block of dimension 2 must be B2 = {X 4 , X 5 }, where X 5 ∈ N2,λ \ N1,λ and X 5 ∈ /
X 1 , X 2 , X 3 . We choose X 5 = (0, 0, 1, 0, 0), so X 4 = (A − I )X 5 = (0, 1, 0, 0, 0).
Hence, we can write a Jordan basis for T :

B = B1 ∪ B2
= {X 1 , X 2 , X 3 , X 4 , X 5 }
= {(−1, −1, 0, 1, 0), (1, 1, 0, 0, 0), (0, 0, 0, 0, 1), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0)}.

Let A be the matrix of T with respect to B. If P is the following transition matrix


$$P = \begin{pmatrix} -1 & 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}$$

then
$$A' = P^{-1}AP = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

is the canonical Jordan form of A.


As in the previous example, one can also compute A by using the fact that the
column coordinate vectors of A are the images of the element of B (the reader can
easily verify it).
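An automatic cross-check of Example 6.62 is possible as well. The sketch below is not from the text; it assumes SymPy, whose jordan_form method returns a transition matrix and a Jordan form (the particular Jordan basis it chooses may differ from the one constructed by hand above, but the block structure agrees).

```python
# Sketch (assuming SymPy): Jordan form of the matrix of Example 6.62.
import sympy as sp

A = sp.Matrix([[0, 0, 0, -1, 1], [-1, 1, 1, -1, 1], [0, 0, 1, 0, 0],
               [1, 0, 0, 2, 0], [0, 0, 0, 0, 1]])
P, J = A.jordan_form()
print(J)   # diag(J_3(1), J_2(1)), up to the order of the blocks
```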

Exercises

1. For each of the following linear operators T on the vector space V, determine
whether the given subspace W is a T -invariant subspace of V :

(a) V = P3 (R), T ( f (x)) = f ′(x), and W = P2 (R).
(b) V = P(R), T ( f (x)) = x f (x), and W = P2 (R).
(c) V = R3 , T (a, b, c) = (a + b + c, a + b + c, a + b + c), and W = {(t, t, t) : t ∈ R}.
(d) V = C([0, 1]), T ( f (t)) = [∫₀¹ f (x) dx] t, and W = { f ∈ V : f (t) = at + b for some a and b}.
(e) V = M2 (R), T (A) = $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ A, and W = {A ∈ V : A^t = A}.
2. Let T : R4 → R4 be the linear operator having matrix
$$A = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 1 & 3 & 1 & 0 \\ 0 & -1 & 1 & 1 \\ 0 & 1 & 2 & 1 \end{pmatrix}$$

with respect to the canonical basis of R4 . Determine the (diagonal or Jordan)


canonical form A of A and the basis for R4 with respect to which the matrix of
T is A .
3. Repeat Exercise 2, for the linear operator T : R4 → R4 having matrix
$$A = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & -2 & 0 & -4 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 3 \end{pmatrix}$$

with respect to the canonical basis of R4 .


4. Repeat Exercise 2, for the linear operator T : R5 → R5 having matrix
$$A = \begin{pmatrix} 1 & 4 & 1 & 1 & 0 \\ 1 & 1 & 2 & 1 & 0 \\ 0 & 0 & 1 & 4 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 3 \end{pmatrix}$$

with respect to the canonical basis of R5 .


5. Repeat Exercise 2, for the linear operator T : R6 → R6 having matrix
$$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & -1 \\ -1 & 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & -1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 \end{pmatrix}$$

with respect to the canonical basis of R6 .


6. Let T : C3 → C3 be the linear operator having matrix
$$A = \begin{pmatrix} i & 1 & 0 \\ 1 & i & i \\ 0 & 1 & i \end{pmatrix}$$

with respect to the canonical basis of C3 . Determine the (diagonal or Jordan)


canonical form A of A and the basis for C3 with respect to which the matrix of
T is A .
7. Let T : V → V be a linear operator on a finite dimensional complex vector space
V. Prove that T is diagonalizable if and only if every generalized eigenvector of
T is an eigenvector of T.
8. Let F be a field and A, B ∈ Mn (F) be such that their characteristic polynomials
split over F. Prove that A and B are similar if and only if they have the same
Jordan canonical form.

6.5 Cayley-Hamilton Theorem and Minimal Polynomial

Let F be a field, p(X ) ∈ F[X ] a polynomial of degree d ≥ 1, A ∈ Mn (F) a n × n


matrix. If we write

p(X ) = a0 + a1 X + a2 X 2 + · · · + ad X d , a0 , a1 , . . . , ad ∈ F,

we can substitute the matrix A for the indeterminate X and obtain a matrix p(A) ∈
Mn (F), more precisely

p(A) = a0 I + a1 A + a2 A2 + · · · + ad Ad ∈ Mn (F).

In particular, we say that A is a root for the polynomial p(X ), in case p(A) is the
zero matrix in Mn (F) and write p(A) = 0. Starting from these comments, we may
now state the following:

Theorem 6.63 (Cayley-Hamilton Theorem) Let T : V → V be a linear operator of


the finite dimensional vector space V, A ∈ Mn (F) be the matrix of T with respect to
a basis for V. If p(λ) ∈ F[λ] is the characteristic polynomial of T, then p(A) = 0.

Proof Let λ1 , . . . , λk be the distinct eigenvalues of T, ti the index of λi , for


i = 1, . . . , k. Let V = N1 ⊕ · · · ⊕ Nk , where Ni = K er (A − λi I )ti = Nti ,λi is the
generalized eigenspace corresponding to the eigenvalue λi . For any X ∈ V, there
exist X 1 ∈ N1 , X 2 ∈ N2 , . . . , X k ∈ Nk such that

X = X1 + · · · + Xk. (6.14)

Recall that, if ai is the algebraic multiplicity of λi , then ai ≥ ti , so (A −


λi I )ai X i = 0, for any X i ∈ Ni . Therefore, multiplying on the left the relation (6.14)
by the matrix p(A), we have

p(A)X = p(A)X 1 + · · · + p(A)X k . (6.15)

On the other hand, p(λ) = (−1)n (λ − λ1 )a1 · · · (λ − λk )ak , that is

p(A) = (−1)n (A − λ1 I )a1 · · · (A − λk I )ak . (6.16)

Since (A − λi I ) and (A − λ j I ) commute for any λi ≠ λ j , and starting from


(6.16), it is easy to see that

p(A)X i = (−1)n (A − λ1 I )a1 · · · (A − λk I )ak X i = 0, ∀X i ∈ Ni , ∀i = 1, . . . , k.


(6.17)
Hence combining (6.15) and (6.17) we get p(A)X = 0, for any X ∈ V. Thus
p(A) is the matrix which represents the zero operator over V, that is, p(A) is the
zero matrix in Mn (F).

In the following example, we show how Cayley-Hamilton Theorem can be used in


order to solve some problems related to the computation of the inverse of a matrix,
the power of a matrix and their eigenvalues.

Example 6.64 Let T : R2 → R2 be the linear operator having matrix



$$A = \begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix}$$

with respect to the canonical basis of R2 . The characteristic polynomial of T is p(λ) =


λ2 − 4λ − 5. The eigenvalues of T are −1, 5 and, by Cayley-Hamilton Theorem,
A2 − 4 A − 5I = 0, where I is the identity matrix of order 2.
We firstly use the Cayley-Hamilton Theorem to compute the power A6 . Starting
from A2 = 4 A + 5I, we get

A3 = 4 A2 + 5 A = 4(4 A + 5I ) + 5 A = 21 A + 20I.

Then
A6 = (A3 )2 = (21A + 20I )2 = 441A2 + 840 A + 400I = 441(4 A + 5I ) + 840 A
+ 400I = 2604 A + 2605I. This implies that

$$A^{6} = \begin{pmatrix} 7813 & 7812 \\ 7812 & 7813 \end{pmatrix}.$$

We now obtain the inverse of A, starting again from A2 − 4 A = 5I. Multiplying


by A−1 we get A − 4I = 5A−1 , so that

$$A^{-1} = \frac{1}{5}(A - 4I) = \frac{1}{5}\begin{pmatrix} -2 & 3 \\ 3 & -2 \end{pmatrix}.$$

Finally, we compute the eigenvalues for the matrix B = A4 − 3A3 + 2 A2 − A +


I.
If we divide x 4 − 3x 3 + 2x 2 − x + 1 by the characteristic polynomial x 2 − 4x −
5, we get

x 4 − 3x 3 + 2x 2 − x + 1 = (x 2 − 4x − 5)(x 2 + x + 11) + (48x + 56).

In particular, since A2 − 4 A − 5I = 0, one has

B = A4 − 3 A3 + 2 A2 − A + I = (A2 − 4 A − 5I )(A2 + A + 11I ) + (48 A + 56I ) = 48 A + 56I

that is
$$B = \begin{pmatrix} 152 & 144 \\ 144 & 152 \end{pmatrix}$$

having eigenvalues 8, 296.
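The manipulations of Example 6.64 are easily verified numerically. The sketch below is not from the text; it assumes NumPy and checks that A satisfies its own characteristic polynomial, and that A^6 and A^{-1} agree with the expressions obtained from A² − 4A − 5I = 0.

```python
# Sketch (assuming NumPy): Cayley-Hamilton computations of Example 6.64.
import numpy as np

A = np.array([[2, 3], [3, 2]], dtype=float)
I = np.eye(2)

print(A @ A - 4 * A - 5 * I)           # the zero matrix (Cayley-Hamilton)
print(2604 * A + 2605 * I)             # equals A^6 ...
print(np.linalg.matrix_power(A, 6))    # ... as a direct check
print((A - 4 * I) / 5)                 # equals A^{-1}
```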

Definition 6.65 Let A be the matrix of a linear operator T : V → V, p(λ) ∈ F[λ]


the characteristic polynomial of T. A polynomial m(λ) ∈ F[λ] is said to be the
minimal polynomial of T if it is the monic polynomial of smallest degree such that
m(A) is the zero matrix.

Theorem 6.66 Let A be the matrix of a linear operator T : V → V, f (λ) ∈ F[λ] a


polynomial. Then f (A) = 0 if and only if the minimal polynomial m(λ) of T divides
f (λ).

Proof Let f (A) = 0. By the division algorithm for polynomials, there exist polyno-
mials q(λ), r (λ) ∈ F[λ] such that f (λ) = q(λ)m(λ) + r (λ), where r (λ) = 0 or
deg r (λ) < deg m(λ). Thus 0 = f (A) = q(A)m(A) + r (A) = r (A). Since m(λ) is
the minimal polynomial of T and deg r (λ) < deg m(λ), then the polynomial r (λ)
must be identically zero. Thus m(λ) divides f (λ).
Assume now that f (λ) = m(λ)q(λ), for some q(λ) ∈ F[λ]. Then f (A) = m(A)
q(A) = 0, as required.

Corollary 6.67 The minimal polynomial of a linear operator T divides the charac-
teristic polynomial of T.

Theorem 6.68 Let A be the matrix of a linear operator T : V → V, m(λ) ∈ F[λ]


the minimal polynomial of T. λ0 ∈ F is a root of m(λ) if and only if λ0 is an eigenvalue
of T.

Proof Let p(λ) ∈ F[λ] be the characteristic polynomial of T. The result is trivial in
case m(λ) = p(λ). Thus we can consider deg(m) < deg( p) and p(λ) = m(λ)q(λ)
for some q(λ) ∈ F[λ].
Assume first that λ0 is a root of m(λ). In this case it is easy to see that p(λ0 ) =
m(λ0 )q(λ0 ) = 0, that is λ0 is a root of the characteristic polynomial, i.e., λ0 is an
eigenvalue of T.
Suppose, now p(λ0 ) = 0 and suppose by contradiction that (λ − λ0 ) does not
divide m(λ). Hence, there exists a polynomial q(λ) ∈ F[λ] such that m(λ) =
(λ − λ0 )q(λ) + r, where deg(q) = deg(m) − 1 and 0 ≠ r ∈ F. Hence 0 = m(A) =
(A − λ0 I )q(A) + r I, that is −r I = (A − λ0 I )q(A). By computing the determi-
nants of the matrices in the last identity, we get | − r I | = |A − λ0 I ||q(A)|, which
means (−r )n = |A − λ0 I ||q(A)|. Since λ0 is an eigenvalue of T, |A − λ0 I | =
p(λ0 ) = 0 and (−r )n = 0, which is a contradiction.

Theorem 6.69 The minimal polynomial of a linear operator T : V → V has the


following form
m(λ) = (λ − λ1 )t1 · · · (λ − λk )tk ,

where λ1 , . . . , λk are the distinct eigenvalues of T and ti is the index of λi , for any
i = 1, . . . , k.

Proof Let N1 , . . . , Nk be the generalized eigenspace of λ1 , . . . , λk respectively.


Then V = N1 ⊕ · · · ⊕ Nk and, for any X ∈ V there are X 1 ∈ N1 , . . . , X k ∈ Nk such
that X = X 1 + · · · + X k . Let A be the matrix of the linear operator T, with regard
to a basis B of V.
Since (A − λi I ) and (A − λ j I ) commute for any λi ≠ λ j , we see that

(A − λ1 I )t1 · · · (A − λk I )tk X i = 0, for all X i ∈ Ni , ∀i = 1, . . . , k

that is
(A − λ1 I )t1 · · · (A − λk I )tk X = 0, for all X ∈ V.

Therefore, the polynomial m(λ) represents the zero operator on V, so that m(A) =
0. Moreover, the polynomial m(λ) divides the characteristic polynomial p(λ), since
any ti ≤ ai , for any i = 1, . . . , k, where ai is the algebraic multiplicity of λi .
To conclude the proof, it is sufficient to show that m(λ) is the monic polynomial
of smallest degree such that m(A) is the zero matrix.
Denote
f (λ) = (λ − λ1 )s1 · · · (λ − λk )sk ,

the minimal polynomial of T and suppose by contradiction that f (λ) ≠ m(λ). Since
m(A) = 0, by Theorem 6.66, f (λ) divides m(λ). In other words, there is at least
one s j ∈ {s1 , . . . , sk } such that s j < t j . Since t j is the index of λ j , then there exists
X ∈ N j such that 0 ≠ (A − λ j I )^{s_j} X = Y ∈ N j . On the other hand, for any nonzero Z ∈ N j and l ≠ j, we know that Z ∉ Nl and, by Lemma 6.42, 0 ≠ (A − λl I )^{s_l} Z ∈ N j .
Moreover, since (A − λi I ) and (A − λ j I ) commute for any λi ≠ λ j , we can write
the factorization of f (A) as follows:

f (A) = (A − λ1 I )^{s_1} · · · (A − λ_{j−1} I )^{s_{j−1}} (A − λ_{j+1} I )^{s_{j+1}} · · · (A − λk I )^{s_k} (A − λ j I )^{s_j}.

It is obvious that f (A) = 0; therefore f (A)(X ) = 0 and f (A)(Y ) = 0 for all X, Y ∈ N j , j = 1, 2, . . . , k. On the other hand, we arrive at

0 = f (A)(X )
= (A − λ1 I )^{s_1} · · · (A − λ_{j−1} I )^{s_{j−1}} (A − λ_{j+1} I )^{s_{j+1}} · · · (A − λk I )^{s_k} (A − λ j I )^{s_j} X
= (A − λ1 I )^{s_1} · · · (A − λ_{j−1} I )^{s_{j−1}} (A − λ_{j+1} I )^{s_{j+1}} · · · (A − λk I )^{s_k} Y
≠ 0,

which leads to a contradiction. Thus, we conclude that si = ti for all i = 1, . . . , k


that is, m(λ) is precisely the minimal polynomial of T.

Example 6.70 Let T : R5 → R5 be the linear operator as in the Example 6.61. The
characteristic polynomial of T is p(λ) = (3 − λ)3 (2 − λ)2 and the Jordan canonical
form of T is represented by the matrix
$$A' = \begin{pmatrix} 3 & 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 2 \end{pmatrix}$$

with respect to the basis

B = {(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, −2, 1, 0, 0), (−1, −2, 1, 1, 0), (−3, 0, 0, 0, 1)}

for R5 . In this case, the minimal polynomial of T is m(λ) = (3 − λ)2 (2 − λ)2 .

Example 6.71 Let T : R5 → R5 be the linear operator as in Example 6.62. The


characteristic polynomial of T is p(λ) = (1 − λ)5 and the Jordan canonical form of
T is represented by the matrix
$$A' = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

with respect to the basis

B = {(−1, −1, 0, 1, 0), (1, 1, 0, 0, 0), (0, 0, 0, 0, 1), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0)}

for R5 . The minimal polynomial of T is m(λ) = (1 − λ)3 .
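By Theorem 6.69, the minimal polynomial can be found by testing, in increasing degree, the candidate exponents of the factors of the characteristic polynomial. The sketch below is not from the text; it assumes SymPy and recovers the minimal polynomial of Example 6.70.

```python
# Sketch (assuming SymPy): minimal polynomial of Example 6.70 by direct testing.
import sympy as sp
from itertools import product

A = sp.Matrix([[3, 1, 2, 1, 2], [0, 3, 0, 2, -2], [0, 0, 3, -1, 1],
               [0, 0, 0, 2, 1], [0, 0, 0, 0, 2]])
I = sp.eye(5)

for i, j in sorted(product(range(1, 4), range(1, 3)), key=sum):  # exponents for 3 and 2
    if (A - 3 * I)**i * (A - 2 * I)**j == sp.zeros(5, 5):
        print(f"m(x) = (x - 3)^{i} (x - 2)^{j}")   # (x - 3)^2 (x - 2)^2
        break
```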

Exercises

1. Let T : V → V be a linear operator on a vector space V of dimension 3 over


the field F, such that the characteristic polynomial of T splits over F. Describe
all possibilities for the Jordan canonical form and for the minimal polynomial
of T.
2. Repeat Exercise 1, in case T : V → V is a linear operator on a vector space V
of dimension 4 over the field F.
3. Let T : R3 → R3 be the linear operator having matrix
$$A = \begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 2 \\ 0 & 0 & 1 \end{pmatrix}$$

with respect to the canonical basis for R3 . Applying the Cayley-Hamilton The-
orem, find the eigenvalues and the eigenvectors of the matrix B = A5 − 5A4 +
8A3 − 8A2 + 8A − 7I.
4. Let T : R2 → R2 be the linear operator having matrix

$$A = \begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix}$$

with respect to the canonical basis for R2 . Compute A10 using the Cayley-
Hamilton Theorem.
5. Let T : R3 → R3 be the linear operator having matrix
$$A = \begin{pmatrix} 3 & 1 & 1 \\ 1 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}$$

with respect to the canonical basis for R3 . Find the inverse matrix of A using the
Cayley-Hamilton Theorem.
6. Let T : R3 → R3 be the linear operator having matrix
$$A = \begin{pmatrix} 3 & 1 & 1 \\ 0 & -1 & 1 \\ 0 & 0 & 2 \end{pmatrix}$$

with respect to the canonical basis for R3 . Applying the Cayley-Hamilton The-
orem,
(a) Find the eigenvalues and the eigenvectors of the matrix B = A5 − 2 A4 −
7A3 + 8A2 + 11A − 2I.
(b) Find the inverse matrix of A.
(c) Compute the matrix A8 .
7. Let T : R9 → R9 be the linear operator having characteristic polynomial equal to
p(λ) = (3 − λ)4 (2 − λ)2 (−1 − λ)3 and minimal polynomial equal to m(λ) =
(3 − λ)2 (2 − λ)(−1 − λ)2 . Describe all possibilities for the Jordan canonical
form of T.

6.6 Normal Operators on Inner Product Spaces

In this final Section, V is an inner product space over F, T : V → V a linear operator.


Here, we always assume that T : V → V is normal (T T ∗ = T ∗ T ) and F denote either
R or C. We remind the reader of some operators which belong to the more general
class of normal operators:
(i) self-adjoint operator: T = T ∗ and T (u), v = u, T (v), for any u, v ∈ V ;
(ii) skew-adjoint operator: T = −T ∗ and T (u), v = − u, T (v), for any u, v ∈
V;
(iii) unitary (orthogonal in case F = R) operator: T ∗ = T −1 and T (u), T (v) =
u, v, for any u, v ∈ V.
Let B be a basis of V with respect to which A ∈ Mn (F) is the matrix of T and A∗ is
the matrix of T ∗ . Thus, for any X ∈ V, T T ∗ (X ) = T ∗ T (X ) and A A∗ X = A∗ AX,
that is (A A∗ − A∗ A)X = 0. Hence A A∗ = A∗ A, which means that the matrix A of
a normal operator is also normal, in the sense that it commutes with its conjugate
transpose A∗ .

Remark 6.72 Symmetric, Hermitian, skew-symmetric, skew-Hermitian, orthogo-


nal and unitary operators are normal operators. As a consequence,
(i) the matrix of a symmetric (Hermitian) operator is symmetric (Hermitian);
(ii) the matrix of a skew-symmetric (skew-Hermitian) operator is skew-
symmetric (skew-Hermitian);
(iii) the matrix of an orthogonal (unitary) operator is orthogonal (unitary).

The above remark is justified as follows. Let T : V → V be the operator and A, A∗ the matrices of T and T ∗, respectively.
(i) If T = T ∗ then T (X ) = T ∗ (X ), for any X ∈ V. Thus AX = A∗ X and (A −
A∗ )X = 0 for any X ∈ V, implying that A = A∗ .
(ii) By using the same above argument, one can prove that T = −T ∗ implies
A = −A∗ .
(iii) If T ∗ = T −1 then T T ∗ (X ) = X, that is A A∗ X = X, for any X ∈ V. Thus
A A∗ is the identity in Mn (F) so that A∗ = A−1 .
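These defining identities are straightforward to test on concrete matrices. The sketch below is not from the text; it assumes NumPy and checks normality (AA∗ = A∗A) for a Hermitian, a unitary, and a non-normal example.

```python
# Sketch (assuming NumPy): checking the identity A A* = A* A numerically.
import numpy as np

def is_normal(A, tol=1e-10):
    return np.allclose(A @ A.conj().T, A.conj().T @ A, atol=tol)

H = np.array([[1, -1j], [1j, 0]])                # Hermitian, hence normal
U = np.array([[0, 1], [1, 0]], dtype=complex)    # orthogonal/unitary, hence normal
N = np.array([[1, 1], [0, 1]], dtype=complex)    # not normal

print(is_normal(H), is_normal(U), is_normal(N))  # True True False
```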
Here, we develop a detailed description of the class of normal operators (matrices).
We start with the following:

Lemma 6.73 Let T : V → V be normal and let X ∈ V be an eigenvector of T


corresponding to the eigenvalue λ. Then X is an eigenvector of T ∗ corresponding to the eigenvalue λ̄ (the complex conjugate of λ).

Proof Since T is normal, so is T − λI. We are also given T (X ) = λX , i.e., (T −


λI )X = 0. This implies that
$$0 = \|(T - \lambda I)X\| = \|(T - \lambda I)^{*}X\| = \|(T^{*} - \overline{\lambda} I)X\|,$$
that is, (T ∗ − λ̄I )X = 0, as desired.

Lemma 6.74 Let T : V → V be normal. Then, eigenvectors of T corresponding to


distinct eigenvalues are orthogonal.

Proof Let λ, μ be distinct eigenvalues of T and X, Y ∈ V be such that T (X ) = λX ,


T (Y ) = μY. Then
T (X ), Y  = λX, Y  = λ X, Y 

and also
T (X ), Y  = X, T ∗ (Y ) = X, μY  = μ X, Y .

Hence λ⟨X, Y⟩ = μ⟨X, Y⟩, so that ⟨X, Y⟩ = 0 follows from λ ≠ μ.

Lemma 6.75 Let T : V → V be a self-adjoint operator on a inner product space


V over the field F. Every eigenvalue of T is real.

Proof Let λ be an eigenvalue of T and X ∈ V be an eigenvector of T corresponding


to λ, i.e., T (X ) = λX. Then
$$\lambda\|X\|^{2} = \lambda\langle X,X\rangle = \langle \lambda X, X\rangle = \langle T(X), X\rangle = \langle X, T(X)\rangle = \langle X, \lambda X\rangle = \overline{\lambda}\langle X,X\rangle = \overline{\lambda}\|X\|^{2}.$$
Hence, λ = λ̄, that is, λ ∈ R.

Lemma 6.76 Let T : V → V be a skew-adjoint operator on a inner product space


V over the field F. Every nonzero eigenvalue of T is purely imaginary.

Proof Assume there exists a nonzero eigenvalue λ of T and X ∈ V be an eigen


vector corresponding to the eigen value λ. Thus T (X ) = λX. In this case

λ X 2 = λ X, X  = λX, X  = T (X ), X  =
−X, T (X ) = −X, λX  = −λ X, X  = −λ X 2 .

Hence λ = −λ, that is the real part of λ is zero.

Remark 6.77 Immediate consequences are that:


(i) If T is a symmetric operator on a real inner product space, or T is Hermitian
on a complex inner product space, then every eigenvalue of T is real.
(ii) If T is a skew-symmetric operator on a real inner product space, or T is skew-
Hermitian on a complex inner product space, then every eigenvalue of T is
purely imaginary.

Moreover, we have the following:


Remark 6.78 If T is unitary (orthogonal), then every eigenvalue has modulus (abso-
lute value) equal to 1. In fact, if λ ∈ C is an eigenvalue of T and X ∈ V is an
eigenvector of T corresponding to λ, such that ‖X‖ = 1, then
$$1 = \langle X,X\rangle = \langle T(X), T(X)\rangle = \langle \lambda X, \lambda X\rangle = \lambda\overline{\lambda}\langle X,X\rangle = \lambda\overline{\lambda},$$

as desired.

Theorem 6.79 (Spectral Theorem) Let T : V → V be an operator on the inner


product space V over a field F, A the matrix of T. Then A is orthogonally similar
to a diagonal matrix if and only if T is self-adjoint. In other words, there exists an
orthonormal basis of V consisting of eigenvectors of T if and only if T is self-adjoint.

Proof Suppose that V has an orthonormal basis B consisting of eigenvectors of T.


Then the matrix A of T with respect to B is diagonal. Since A = (A )∗ , then A is
self-adjoint as well as T.
We now prove the other direction of the theorem and assume that T is self-
adjoint. By Lemma 6.75, every eigenvalue of T is real, in particular the characteristic

polynomial of T splits over F. By Theorem 6.20, there exists an orthonormal basis B


of V with respect to which the matrix A of T is upper-triangular, that is there exists
an orthonormal matrix P ∈ Mn (F) such that P ∗ A P = A . Since A∗ = A, (A )∗ =
P ∗ A P = A . Therefore, A is simultaneously self-adjoint and upper-triangular. In
this case, it is easy to see that A must be a diagonal matrix.

Remark 6.80 The Spectral Theorem can be stated from two different points of view:
(i) If V is a real inner product space, then the matrix A of T is orthogonally similar
to a diagonal matrix if and only if A is symmetric.
(ii) If V is a complex inner product space, then the matrix A of T is unitarily similar
to a diagonal matrix if and only if A is Hermitian.

Example 6.81 Let T : R4 → R4 be the symmetric linear operator having matrix


$$A = \begin{pmatrix} 2 & 1 & 1 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 1 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$$

with respect to the canonical basis of R4 . The characteristic polynomial is


p(λ) = (1 − λ)2 (2 − λ)(4 − λ).
For λ1 = 1, we have
$$A - I = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

so that the associated homogeneous linear system has solutions:


X = (−α − β, α, β, 0), for any α, β ∈ R. So the corresponding eigenspace is N1,λ1 =
(−1, 1, 0, 0), (−1, 0, 1, 0). Starting from vectors (−1, 1, 0, 0), (−1, 0, 1, 0), we
can easily construct the following orthonormal basis for N1,λ1 :
$$N_{1,\lambda_1} = \Big\langle \Big(-\tfrac{1}{\sqrt 2}, \tfrac{1}{\sqrt 2}, 0, 0\Big),\ \Big(\tfrac{1}{\sqrt 6}, \tfrac{1}{\sqrt 6}, -\tfrac{2}{\sqrt 6}, 0\Big) \Big\rangle.$$

For λ2 = 2, we have
$$A - 2I = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

so that the associated homogeneous linear system has solutions: X = (0, 0, 0, α),
for any α ∈ R. So the corresponding eigenspace is N1,λ2 = (0, 0, 0, 1). Finally, for
λ3 = 4, we have
$$A - 4I = \begin{pmatrix} -2 & 1 & 1 & 0 \\ 1 & -2 & 1 & 0 \\ 1 & 1 & -2 & 0 \\ 0 & 0 & 0 & -2 \end{pmatrix}$$

so that the associated homogeneous linear system has solutions: X = (α, α, α, 0),
for any α ∈ R. The corresponding eigenspace is
N1,λ3 = ⟨(1, 1, 1, 0)⟩ = ⟨(1/√3, 1/√3, 1/√3, 0)⟩.
Therefore, with respect to the orthonormal basis
$$\Big\{ \Big(-\tfrac{1}{\sqrt 2}, \tfrac{1}{\sqrt 2}, 0, 0\Big),\ \Big(\tfrac{1}{\sqrt 6}, \tfrac{1}{\sqrt 6}, -\tfrac{2}{\sqrt 6}, 0\Big),\ (0, 0, 0, 1),\ \Big(\tfrac{1}{\sqrt 3}, \tfrac{1}{\sqrt 3}, \tfrac{1}{\sqrt 3}, 0\Big) \Big\}$$

the matrix of T is
$$A' = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}.$$
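The Spectral Theorem is exactly what numerical eigensolvers for symmetric matrices implement. The sketch below is not from the text; it assumes NumPy, applies eigh to the matrix of Example 6.81, and checks that the resulting eigenvector matrix is orthogonal and diagonalizes A.

```python
# Sketch (assuming NumPy): orthogonal diagonalization of the symmetric matrix above.
import numpy as np

A = np.array([[2, 1, 1, 0], [1, 2, 1, 0], [1, 1, 2, 0], [0, 0, 0, 2]], dtype=float)

w, Q = np.linalg.eigh(A)                    # eigenvalues 1, 1, 2, 4 (ascending)
print(w)
print(np.allclose(Q.T @ Q, np.eye(4)))      # Q is orthogonal
print(np.round(Q.T @ A @ Q, 10))            # diag(1, 1, 2, 4)
```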

Example 6.82 Let T : C4 → C4 be the Hermitian linear operator having matrix


$$A = \begin{pmatrix} 1 & -i & 0 & 0 \\ i & 0 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$$

with respect to the canonical basis of C4 . The characteristic polynomial is


p(λ) = (2 − λ)2 (1 − λ)(−1 − λ).
For λ1 = 1, we have
$$A - I = \begin{pmatrix} 0 & -i & 0 & 0 \\ i & -1 & -1 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

so that the associated homogeneous linear system has solutions: X = (α, 0, iα, 0),
for any α ∈ R. The corresponding eigenspace is N1,λ1 = ⟨(1, 0, i, 0)⟩ = ⟨(1/√2, 0, i/√2, 0)⟩. For λ2 = −1, we have

$$A + I = \begin{pmatrix} 2 & -i & 0 & 0 \\ i & 1 & -1 & 0 \\ 0 & -1 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}$$

so that the associated homogeneous linear system has solutions:


X = (iα, 2α, α, 0), for any α ∈ R. The corresponding eigenspace is
N1,λ2 = ⟨(i, 2, 1, 0)⟩ = ⟨(i/√6, 2/√6, 1/√6, 0)⟩.
Finally, for λ3 = 2 we have
$$A - 2I = \begin{pmatrix} -1 & -i & 0 & 0 \\ i & -2 & -1 & 0 \\ 0 & -1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

so that the associated homogeneous linear system has solutions:


X = (−iα, α, −α, β), for any α, β ∈ R. The corresponding eigenspace is
N1,λ3 = ⟨(−i, 1, −1, 0), (0, 0, 0, 1)⟩ = ⟨(−i/√3, 1/√3, −1/√3, 0), (0, 0, 0, 1)⟩.
Therefore, with respect to the unitary basis
$$\Big\{ \Big(-\tfrac{i}{\sqrt 3}, \tfrac{1}{\sqrt 3}, -\tfrac{1}{\sqrt 3}, 0\Big),\ (0, 0, 0, 1),\ \Big(\tfrac{1}{\sqrt 2}, 0, \tfrac{i}{\sqrt 2}, 0\Big),\ \Big(\tfrac{i}{\sqrt 6}, \tfrac{2}{\sqrt 6}, \tfrac{1}{\sqrt 6}, 0\Big) \Big\}$$

the matrix of T is ⎡ ⎤
2 0 0 0
⎢ 0 2 0 0 ⎥
A = ⎢
⎣0
⎥.
0 1 0 ⎦
0 0 0 −1

Theorem 6.83 Let T : V → V be an operator on a inner product space V over a


field F, A the matrix of T and assume that the characteristic polynomial p(λ) of
T splits over F. Then T is normal (A is normal) if and only if T is diagonalizable
(resp. A is diagonalizable). More precisely, if F = R then A is orthogonally similar
to a diagonal matrix; if F = C then A is unitarily similar to a diagonal matrix.

Proof If V has an orthonormal basis B consisting of eigenvectors of T, then the


matrix A of T with respect to B is diagonal and so it is normal, as well as T.
Now assume that T is normal. Since the characteristic polynomial of T splits over
F, again by Theorem 6.20, there exists an orthonormal basis B of V with respect to
which the matrix A of T is upper-triangular.
In particular, if F = R, then there is an orthogonal matrix C ∈ Mn (R) such that
A = C T AC is upper triangular; if F = C, then there is an unitary matrix P ∈ Mn (C)
such that A = P ∗ A P is upper triangular.
Thus A is either orthogonally or unitarily similar to the upper-triangular A . More-
over A is normal, as A is. We prove that A is diagonal by induction on the dimension
of V.
If dim F V = 1, it is trivial. Now assume dim F V = n ≥ 2 and suppose that the
desired result holds for any vector space of smaller dimension. Let B = {e1 , . . . , en }
be the basis of V with respect to which A is the matrix of T. Since A is upper-
triangular, we can write its block diagonal form as follows:

$$A' = \begin{pmatrix} \alpha & u^{T} \\ 0_{n-1,1} & E \end{pmatrix}$$
for 0_{n−1,1} = [0, . . . , 0]^T (a column of n − 1 zeros), u ∈ F^{n−1} and E ∈ M_{n−1}(F). Moreover (A′)∗A′ = A′(A′)∗, so that, for any X ∈ V, we have
$$\|T(X)\|^{2} = (A'X)^{*}(A'X) = X^{*}(A')^{*}A'X = X^{*}A'(A')^{*}X = \big((A')^{*}X\big)^{*}(A')^{*}X = \|T^{*}(X)\|^{2}. \qquad (6.18)$$

We notice that
$$T(e_1) = \begin{pmatrix} \alpha & u^{T} \\ 0_{n-1,1} & E \end{pmatrix}\begin{pmatrix} 1 \\ 0_{n-1,1} \end{pmatrix} = \begin{pmatrix} \alpha \\ 0_{n-1,1} \end{pmatrix}
\quad\text{and}\quad
T^{*}(e_1) = \begin{pmatrix} \overline{\alpha} & 0_{1,n-1} \\ \overline{u} & E^{*} \end{pmatrix}\begin{pmatrix} 1 \\ 0_{n-1,1} \end{pmatrix} = \begin{pmatrix} \overline{\alpha} \\ \overline{u} \end{pmatrix},$$
that is
$$\|T(e_1)\|^{2} = (A'e_1)^{*}(A'e_1) = \overline{\alpha}\alpha
\quad\text{and}\quad
\|T^{*}(e_1)\|^{2} = \big((A')^{*}e_1\big)^{*}(A')^{*}e_1 = \alpha\overline{\alpha} + u^{*}u.$$
u

By relation (6.18), T (e1 ) 2 = T ∗ (e1 ) 2 so that 0 = u ∗ u = u = 0, that is


u = 0. Hence
 α 01,n−1
A =
0n−1,1 E

and
$$A'(A')^{*} = \begin{pmatrix} \alpha & 0_{1,n-1} \\ 0_{n-1,1} & E \end{pmatrix}\begin{pmatrix} \overline{\alpha} & 0_{1,n-1} \\ 0_{n-1,1} & E^{*} \end{pmatrix} = \begin{pmatrix} \alpha\overline{\alpha} & 0_{1,n-1} \\ 0_{n-1,1} & EE^{*} \end{pmatrix},
\qquad
(A')^{*}A' = \begin{pmatrix} \overline{\alpha} & 0_{1,n-1} \\ 0_{n-1,1} & E^{*} \end{pmatrix}\begin{pmatrix} \alpha & 0_{1,n-1} \\ 0_{n-1,1} & E \end{pmatrix} = \begin{pmatrix} \overline{\alpha}\alpha & 0_{1,n-1} \\ 0_{n-1,1} & E^{*}E \end{pmatrix}.$$

Since A (A )∗ = (A )∗ A , we get E E ∗ = E ∗ E, i.e., E is a (n − 1) × (n − 1)


normal matrix. Moreover, any eigenvalue of E is an eigenvalue of A , and hence
the characteristic polynomial of E splits over F. By the induction hypothesis, E
is orthogonally (resp. unitary) similar to a diagonal matrix, that is there exists an
orthogonal (resp. unitary) matrix Q ∈ Mn−1 (F) such that Q −1 E Q = D is a diagonal
matrix. Hence, for

$$U = \begin{pmatrix} 1 & 0 \\ 0 & Q \end{pmatrix}$$
we have
$$U^{-1}A'U = \begin{pmatrix} \alpha & 0 \\ 0 & D \end{pmatrix},$$

where U is either orthogonal or unitary, according to whether Q is orthogonal or unitary.
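One way to see Theorem 6.83 in practice is through the complex Schur decomposition, which triangularizes any matrix by a unitary similarity; for a normal matrix the triangular factor is in fact diagonal. The sketch below is not from the text and assumes SciPy/NumPy; it uses a skew-symmetric (hence normal) matrix, the same one that reappears in Example 6.86.

```python
# Sketch (assuming SciPy/NumPy): complex Schur form of a normal matrix is diagonal.
import numpy as np
from scipy.linalg import schur

A = np.array([[0, -2, 2], [2, 0, -1], [-2, 1, 0]], dtype=float)  # skew-symmetric, normal
T, Z = schur(A, output='complex')        # A = Z T Z*, with Z unitary
print(np.round(T, 10))                   # diagonal entries 0, 3i, -3i (up to ordering)
print(np.allclose(Z @ T @ Z.conj().T, A))
```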

Corollary 6.84 (Complex Spectral Theorem) Suppose that V is a complex inner


product space, T : V → V a linear operator and A the matrix of T. Then A is
unitarily similar to a diagonal matrix if and only if T is normal.

Corollary 6.85 Suppose that V is a complex inner product space, T : V → V and


A the matrix of T.
(i) If T is skew-Hermitian, then A is unitarily similar to a diagonal matrix.
(ii) If T is unitary, then A is unitarily similar to a diagonal matrix.

We cannot expect the same conclusion in the case V is a real inner product space:

Example 6.86 Let T : R3 → R3 be the normal (skew-symmetric) linear operator


having matrix ⎡ ⎤
0 −2 2
A = ⎣ 2 0 −1 ⎦
−2 1 0

with respect to the canonical basis of R3 . The characteristic polynomial is p(λ) =


−λ3 − 9λ, having roots 0, 3i, −3i.
For λ1 = 0, the homogeneous linear system associated with A has solutions:
X = (α, 2α, 2α), for any α ∈ R. The corresponding eigenspace is
N1,λ1 = (1, 2, 2). For λ2 = 3i, we have
$$A - (3i)I = \begin{pmatrix} -3i & -2 & 2 \\ 2 & -3i & -1 \\ -2 & 1 & -3i \end{pmatrix}$$

so that the associated homogeneous linear system has solutions:


X = ((−6i − 2)α, (3i − 4)α, 5α), for any α ∈ R. The corresponding eigenspace is
N1,λ2 = (−6i − 2, 3i − 4, 5).
Finally, for λ3 = −3i, we have
$$A + (3i)I = \begin{pmatrix} 3i & -2 & 2 \\ 2 & 3i & -1 \\ -2 & 1 & 3i \end{pmatrix}$$

so that the associated homogeneous linear system has solutions:


X = ((6i − 2)α, (−3i − 4)α, 5α), for any α ∈ R. The corresponding eigenspace is
N1,λ3 = (6i − 2, −3i − 4, 5).
There is no basis for R3 consisting of real eigenvectors of T, and hence A is not
similar to a diagonal matrix.

It is clear from the above discussion that one has to ask oneself what could be the
canonical form of real normal operators, in the case it is not diagonal. To answer this
question, we need to fix some results on invariant subspaces of V.

Lemma 6.87 Let T : V → V be a normal operator on the inner product space V


over the field F, U an T -invariant subspace of V. Then,
(i) U ⊥ is T -invariant.
(ii) U is T ∗ -invariant.

(iii) (T|U )∗ = (T ∗)|U .
(iv) T|U is normal.
(v) T|U ⊥ is normal.

Proof
(i) Let V = U ⊕ U ⊥ and B1 = {e1 , . . . , ek }, B2 = {c1 , . . . , cn−k } be bases of U
and U ⊥ , respectively. Since T (ei ) ∈ U , for any i = 1, . . . , k, then the matrix
A of T with respect to the basis B1 ∪ B2 for V is

$$A = \begin{pmatrix} A_1 & E \\ 0_{n-k,k} & A_2 \end{pmatrix},$$

where 0n−k,k is the zero matrix in Mn−k,k (F), A1 ∈ Mk (F), A2 ∈ Mn−k (F),
E ∈ Mk,n−k (F). Here, we denote αi j ∈ F and ηi j ∈ F the elements of A1 and
E, respectively. For i = 1, . . . k, T (ei ) is the coordinate vector of the entries
in the ith column of A.
The matrix A∗ of T ∗ with respect to the same basis B1 ∪ B2 is

$$A^{*} = \begin{pmatrix} A_1^{*} & 0_{k,n-k} \\ E^{*} & A_2^{*} \end{pmatrix},$$

where 0k,n−k is the zero matrix in Mk,n−k (F). For i = 1, . . . , k, T ∗ (ei ) is the
coordinate vector of the entries in the i-th column of (A1 )∗ and E ∗ , that is the
conjugates of the entries in the i-th row of A1 and E.
Since T is normal, ‖T (ei)‖² = ‖T ∗(ei)‖² for any i, thus
$$\sum_{i=1}^{k}\|T(e_i)\|^{2} = \sum_{i=1}^{k}\|T^{*}(e_i)\|^{2},$$

that is
$$\sum_{i=1}^{k}\sum_{j=1}^{k}\alpha_{ji}\overline{\alpha_{ji}} = \sum_{i=1}^{k}\sum_{j=1}^{k}\alpha_{ij}\overline{\alpha_{ij}} + \sum_{i=1}^{k}\sum_{j=k+1}^{n}\eta_{ij}\overline{\eta_{ij}}.$$
This implies that $\sum_{i=1}^{k}\sum_{j=k+1}^{n}\eta_{ij}\overline{\eta_{ij}} = 0$, that is, each η_{ij} must be zero. Therefore E = 0 and
$$A = \begin{pmatrix} A_1 & 0_{k,n-k} \\ 0_{n-k,k} & A_2 \end{pmatrix}.$$

It is easy to see that AX ∈ U ⊥ , for any X ∈ U ⊥ , that is U ⊥ is T -invariant.


(ii) It follows from the first result, since the matrix of T ∗ is
$$A^{*} = \begin{pmatrix} A_1^{*} & 0_{k,n-k} \\ 0_{n-k,k} & A_2^{*} \end{pmatrix}.$$

(iii) Denote G = T|U and let X, Y ∈ U. Then
$$\langle X, G^{*}(Y)\rangle = \langle G(X), Y\rangle = \langle T(X), Y\rangle = \langle X, T^{*}(Y)\rangle,$$
that is, G∗(Y ) − T ∗(Y ) ∈ U ∩ U⊥ = {0}, for any Y ∈ U. Thus G∗(Y ) = T ∗(Y ), where T ∗(Y ) ∈ U by (ii). Hence G∗ = (T ∗)|U , as desired.
(iv) Since T T ∗ = T ∗T, in particular T|U (T ∗)|U = (T ∗)|U T|U and, by (iii), T|U (T|U )∗ = (T|U )∗ T|U .
(v) It is a consequence of the previous result, since U⊥ is T -invariant.

Lemma 6.88 Let T : V → V be a normal operator on a real inner product space


V . Then V has an invariant subspace U of dimension 1 or 2.

Proof Suppose that T has a real eigenvalue λ. Let X ∈ V be an eigenvector cor-


responding to λ and U = Span{X }. Then T (X ) = λX ∈ U and U is trivially T -
invariant.
Assume now that any eigenvalue of T is complex (not real). Of course the
characteristic polynomial p(λ) of T has coefficients in the real field. Hence, if
λ0 ∈ C is a root of p(λ) then its complex conjugate λ0 is also a root of p(λ). Thus
p(λ) has n = 2m roots, say λk = αk + iβk and λ̄k = αk − iβk, for αk, βk ∈ R and k = 1, . . . , m. The decomposition of p(λ) is
$$p(\lambda) = \prod_{k=1}^{m}\big(\lambda - (\alpha_k + i\beta_k)\big)\big(\lambda - (\alpha_k - i\beta_k)\big) = \prod_{k=1}^{m}\big(\lambda^{2} + a_k\lambda + b_k\big),$$


where $\alpha_k = -\frac{a_k}{2}$ and $\beta_k = \pm\frac{\sqrt{4b_k - a_k^{2}}}{2}$. If A is the matrix of T, then by the Cayley-Hamilton Theorem,
$$0 = p(A) = \prod_{k=1}^{m}\big(A^{2} + a_k A + b_k I\big).$$

In particular, there is at least one j ∈ {1, . . . , m} such that the matrix A2 + a j A +


b j I is not invertible, that is the operator T 2 + a j T + b j I is not injective on V. Thus
there exists a nonzero vector X ∈ V such that (T 2 + a j T + b j I )X = 0. Consider
now the 2-dimensional subspace W of V spanned by the set {X, T (X )}. For any
element Y ∈ W, there exist μ1 , μ2 ∈ R such that Y = μ1 X + μ2 T (X ). Hence

T (Y ) = μ1 T (X ) + μ2 T 2 (X ) = μ1 T (X ) + μ2 (−a j T (X ) − b j X ) ∈ W

that is W is T -invariant, as required.

We are now ready to prove the following:

Theorem 6.89 Suppose that V is a real inner product space, T : V → V a linear


operator, A the matrix of T with respect to a basis for V. T is normal if and only if
A is orthogonally similar to a block diagonal matrix
$$\begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_k \end{pmatrix} \qquad (6.19)$$

where each Ai is either a real number or a block of dimension 2, having the form

$$\begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix} \qquad (6.20)$$

where α, β ∈ R and β > 0.

Proof Firstly, we suppose there exists an orthogonal matrix U ∈ Mn (F) such that
$$A' = U^{-1}AU = \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_k \end{pmatrix},$$

where each Ai has the form described in the statement of the theorem. In this case,
we can see that each Ai commutes with its transpose, that is each Ai is normal. Hence
the fact that A commutes with its transpose follows from easy computations. Thus
A is normal, as well as A.
Assume now that A is a normal matrix. In case n = 1 it is trivial.
We now prove the result for n = 2. Write

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

the matrix of T with respect to some orthonormal basis {X 1 , X 2 } for V. Thus
$$\|T(X_1)\|^{2} = (AX_1)^{T}(AX_1) = \left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right)^{T}\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = a^{2} + c^{2}$$
and
$$\|T^{*}(X_1)\|^{2} = (A^{T}X_1)^{T}(A^{T}X_1) = \left(\begin{pmatrix} a & c \\ b & d \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right)^{T}\begin{pmatrix} a & c \\ b & d \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = a^{2} + b^{2}.$$

Since A is normal, T (X 1 ) 2 = T ∗ (X 1 ) 2 , that is b2 = c2 .


If b = c, then A is a symmetric matrix. By the Spectral Theorem, A is orthogonally
similar to a diagonal matrix and we are done.
Let now c = −b, then
$$A = \begin{pmatrix} a & b \\ -b & d \end{pmatrix}.$$

Of course, we can assume b ≠ 0, otherwise there is nothing to prove. Since A T A =


A A T , by computing the (1, 2)-entries of both A T A and A A T , we have −ab + bd =
ab − bd, that is ab = bd. Hence a = d and

$$A = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}.$$

If b < 0 the result is proved. If b > 0, we compute the matrix A′ of T with respect to the orthonormal basis {X 1 , −X 2 }. It is
$$A' = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$$

as required.
Hence we can suppose in what follows n ≥ 3 and prove the result by induction
on n. Assume that the theorem holds for any normal matrix having order less than n.
Let U be a T -invariant subspace of V having dimension 1 or 2. If dim R U = 1,
then any vector in U with norm 1 is an orthonormal basis of U and the matrix A1
of T|U has order 1. If dim R U = 2, then T|U is a normal operator on U (see Lemma
6.87) and the matrix A1 of T|U has the form (6.20) with respect to an orthonormal
basis C1 of U. Since T|U ⊥ is also a normal operator on U ⊥ (see again Lemma 6.87),
by induction hypothesis there exists an orthonormal basis C2 of U ⊥ with respect to
which the matrix A2 of T|U ⊥ has the desired form.
Therefore, considering the basis C1 ∪ C2 for V , the matrix A of T with respect
to C1 ∪ C2 is

$$\begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}$$

having the form (6.19).
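The block-diagonal form of Theorem 6.89 can be constructed explicitly from a complex eigendecomposition, as the method at the end of this section describes. The sketch below is not from the text; it assumes NumPy and uses the skew-symmetric matrix of Example 6.86: for each conjugate pair a ± ib (b > 0), the scaled real and imaginary parts of an eigenvector for a − ib contribute one 2 × 2 block of the form (6.20).

```python
# Sketch (assuming NumPy): real canonical form of a normal real matrix.
import numpy as np

A = np.array([[0, -2, 2], [2, 0, -1], [-2, 1, 0]], dtype=float)
w, V = np.linalg.eig(A)                      # eigenvalues 0, 3i, -3i

cols = []
for lam, v in zip(w, V.T):
    if abs(lam.imag) < 1e-10:                # real eigenvalue: unit real eigenvector
        cols.append(np.real(v) / np.linalg.norm(np.real(v)))
    elif lam.imag < 0:                       # take one eigenvalue a - ib of each pair
        cols.extend([np.sqrt(2) * v.real, np.sqrt(2) * v.imag])

U = np.column_stack(cols)                    # orthogonal change of basis
print(np.round(U.T @ U, 10))                 # identity
print(np.round(U.T @ A @ U, 10))             # blocks [[0, -3], [3, 0]] and [0], up to ordering
```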

Remark 6.90 Let
$$\begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_k \end{pmatrix}$$

be the block diagonal form of the matrix of a normal operator T as in Theorem 6.89.
Then each 1 × 1 block Ai is precisely a real eigenvalue of T and any 2 × 2 block
having form (6.20) is corresponding to the pair of complex conjugate eigenvalues
α + iβ, α − iβ of T.

Now we give a method to construct canonical form of a real normal operator:


Suppose that V is a n-dimensional real inner product space, T : V → V an oper-
ator, A the matrix of T with respect to a basis for V. Here, we describe the way to
obtain an orthonormal basis of V with respect to which the matrix of T is similar to
a block diagonal matrix of the form (6.19). To do this, we firstly prove the following
facts:
(a) Let λ = a + ib and λ = a − ib (b = 0) be a pair of complex (not real) eigen-
values of T, X ∈ Cn a complex eigenvector corresponding to λ. Then X is an
eigenvector of T corresponding to λ.

Proof From AX = λX it follows
$$\overline{AX} = \overline{\lambda X} = \overline{\lambda}\,\overline{X}.$$
On the other hand, since A is real,
$$\overline{AX} = \overline{A}\,\overline{X} = A\overline{X}.$$
Thus, A X̄ = λ̄ X̄, as desired.

(b) Let λ = a + ib be a complex (not real) eigenvalue of T, X ∈ Cn a complex


eigenvector corresponding to λ. Then X and X̄ are orthogonal.

Proof It follows from the fact that λ ≠ λ̄ (since λ is not real). Hence, X and X̄ correspond to the distinct eigenvalues λ and λ̄, respectively.
(c) Let X ∈ Cn be an eigenvector of T. Since X is a complex vector, it is defined as a
combination of two real vectors x ∈ Rn the real part, and y ∈ Rn the imaginary
part of X , that is X = x + i y. Then x, y are orthogonal vectors and x 2 = y 2 .


Proof Since X and X̄ are orthogonal, we have 0 = X^T X = (x + iy)^T (x + iy) = x^T x − y^T y + 2i x^T y = ‖x‖² − ‖y‖² + 2i x^T y, which implies both ‖x‖² = ‖y‖² and x^T y = 0, as required.

(d) Let Z , Y ∈ Cn be such that both Z ∗ Y = 0 and Z Y = 0. If Z = a + ib and
Y = c + id, for a, b, c, d ∈ V, then a T c = a T d = b T c = b T d = 0, that is the
real and imaginary parts of Z are orthogonal to both the real and imaginary parts
of Y.

Proof By Z ∗ Y = 0 we get

0 = (a + ib)∗ (c + id) = (a − ib)T (c + id) = a T c + b T d + i(a T d − b T c)

that is
a T c + b T d = 0, a T d − b T c = 0. (6.21)

Analogously, by Z Y = 0, it follows

0 = (a − ib)∗ (c + id) = (a + ib)T (c + id) = a T c − b T d + i(a T d + b T c)

that is
a T c − b T d = 0, a T d + b T c = 0. (6.22)

Comparing (6.21) with (6.22), we have the required conclusion.

(e) Let λ = a + ib be a complex, not real, eigenvalue of T, X = x + i y a complex


eigenvector of T corresponding to λ, where x, y ∈ V. Then T (x) = ax − by
and T (y) = bx + ay.

Proof It is sufficient to compute the image of X :

T (X ) = AX = λX = (a + ib)(x + i y) = (ax − by) + i(bx + ay).

On the other hand, AX = A(x + i y) = Ax + i Ay, hence Ax = ax − by and


Ay = bx + ay.

( f ) If
√X = x + √ i y is a complex eigenvector of T having length equal to 1, then both
2x and 2y have length equal to 1.

Proof It follows from:

1 = X = x 2 + y 2 = 2 x 2 = 2 y 2 .

We are now ready to construct the required orthonormal basis for V. Let α1 , . . . , αr
be the real eigenvalues of T, w1 , . . . , wr real eigenvectors of T corresponding
to α1 , . . . , αr , respectively, and let W = w1 , . . . , wr . By standard computations,
we obtain an orthonormal basis for W. Let {z 1 , . . . , zr } be such a basis. Let
232 6 Canonical Forms of an Operator

λ1 , λ1 , . . . , λk , λk be the complex, not real, eigenvalues of T and X 1 , X 1 , . . . , X k , X k


complex eigenvectors corresponding to λ1 , λ1 , . . . , λk , λk respectively. For any
j = 1, . . . , k, we choose λ j = a j − ib j , for a j , b j ∈ R and b j > 0, and write
X j = x j + i y j , for x j , y j ∈ V.

Consider the following set of real vectors:


√ √ √ √
B = { 2x1 , 2y1 , . . . , 2xk , 2yk , z 1 , . . . , zr }.

In light of our previous comments, we see that B is a set of n orthonormal real


vectors of V, that is B is an orthonormal basis for V. Let A be the matrix of T
with respect to B. Then, the column coordinate vectors of A are the images of the
√ of B : √
elements √
T ( 2x1 ) = A( 2x1 ) = 2(a1 x1 + b1 y1 ), having coordinates in terms of B :

[a1 , b1 , 0 . . . , 0 ],
  
(n−2)−times

√ √ √
T ( 2y1 ) = A( 2y1 ) = 2(−b1 x1 + a1 y1 ), having coordinates in terms of B :

[−b1 , a1 , 0 . . . , 0 ],
  
(n−2)−times

√ √ √
T ( 2x2 ) = A( 2x2 ) = 2(a2 x2 + b2 y2 ), having coordinates in terms of B :

[0, 0, a2 , b2 , 0 . . . , 0 ],
  
(n−4)−times

√ √ √
T ( 2y2 ) = A( 2y2 ) = 2(−b2 x2 + a2 y2 ), having coordinates in terms of B :

[0, 0, −b2 , a2 , 0 . . . , 0 ].
  
(n−4)−times

√ √ √
More generally, for any j = 1, . . . , k : T ( 2x j ) = A( 2x j ) = 2(a j x j + b j y j ),
having coordinates in terms of B :

[ 0 . . . , 0 , a j , b j , 0 . . . , 0 ],
     
(2 j−2)−times (n−2 j)−times

√ √ √
T ( 2y j ) = A( 2y j ) = 2(−b j x j + a j y j ), having coordinates in terms of B :

[ 0 . . . , 0 , −b j , a j , 0 . . . , 0 ].
     
(2 j−2)−times (n−2 j)−times
6.6 Normal Operators on Inner Product Spaces 233

Moreover, for any h = 1, . . . , r


T (z h ) = αh z h , having coordinates in terms of B :

[ 0...,0 , αh , 0...,0 ].
     
(2k+h−1)−times (n−2k−h)−times

Therefore, A has precisely the form


⎡ ⎤
A1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ Ak ⎥
⎢ ⎥
⎢ α1 ⎥
⎢ ⎥
⎢ .. ⎥
⎣ . ⎦
αr

where each Ai has the form



a −b
b a

where α, a, b ∈ R and b > 0.


Two special cases of real normal operators are described in the following:

Corollary 6.91 Let A ∈ Mn (R) be an orthonormal matrix. Then, A is orthogonally


similar to a block diagonal matrix having the form
⎡ ⎤
λ1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ λk ⎥
⎢ ⎥,
⎢ A1 ⎥
⎢ ⎥
⎢ . . ⎥
⎣ . ⎦
At

where λh = ±1, for any h = 1, . . . , k and



cos(θ j ) −sin(θ j )
Aj =
sin(θ j ) cos(θ j )

for suitable θ j ∈ [0, 2π ), and for j = 1, . . . , t.

Proof Since A is orthogonal, it is normal. The real eigenvalues of an orthonormal


matrix are precisely ±1. The complex, not real, eigenvalues are of the form eiθ =
cos(θ ) + isin(θ ), with sin(θ ) = 0. Hence the result follows from Theorem 6.89
and Remark 6.90.
234 6 Canonical Forms of an Operator

Corollary 6.92 Let A ∈ Mn (R) be a skew-symmetric matrix. Then A is orthogonally


similar to a block diagonal matrix having the form
⎡ ⎤
A1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ A ⎥
⎢ k ⎥,
⎢ 0 ⎥
⎢ ⎥
⎢ . .. ⎥
⎣ ⎦
0

where
0 −α j
Aj =
αj 0

for suitable α j ∈ R, and for j = 1, . . . , k.

Proof Since the eigenvalues of a skew-symmetric matrix are precisely imaginary


numbers, every eigenvalue of A is either zero or ±iα, for some α ∈ R. Once again
the result follows from Theorem 6.89 and Remark 6.90.

Example 6.93 Let T : R3 → R3 be the normal (skew-symmetric) linear operator


as in Example 6.86. The matrix of T with respect to the canonical basis of R3 is
⎡ ⎤
0 −2 2
A = ⎣ 2 0 −1 ⎦ .
−2 1 0

The eigenvalues are 3i, −3i, 0. For λ = 3i, the corresponding eigenspace is W1 =
(−6i − 2, 3i − 4, 5). So we obtain an eigenvector X 1 ∈ C3 having length equal to
1, such that W1 = X 1  :
     
−6i − 2 3i − 4 5 −2 −4 5 −6 3
X1 = √ , √ ,√ = √ ,√ ,√ +i √ , √ ,0 .
90 90 90 90 90 90 90 90

Analogously, for λ = −3i, the corresponding eigenspace is W2 = (6i − 2, −3i −


4, 5) and we obtain X 2 ∈ C3 having length equal to 1, such that W2 = X 2 :
     
6i − 2 −3i − 4 5 −2 −4 5 6 −3
X2 = √ , √ ,√ = √ ,√ ,√ +i √ , √ ,0 .
90 90 90 90 90 90 90 90

Finally, for λ = 0, the eigenspace is generated by the following eigenvector of


length 1 :  
1 2 2
X3 = , , .
3 3 3
6.6 Normal Operators on Inner Product Spaces 235

Notice that X 1 = X 2 . In particular, we choose the vector X 2 (corresponding to


the eigenvalue whose imaginary part is negative) and write X 2 = x2 + i y2 , where
   
−2 −4 5 6 −3
x2 = √ ,√ ,√ , y2 = √ , √ ,0 .
90 90 90 90 90
√ √
Adding 2x2 and 2y2 to X 3 , we construct the following orthonormal basis for
R3 :      &
−2 −4 5 6 −3 1 2 2
B= √ ,√ ,√ , √ , √ ,0 , , , .
45 45 45 45 45 3 3 3

Let A be the matrix of T with respect to B. Then the column coordinate vectors
of A are the images of the elements of B. By computations, we get
√ √
T ( 2x2 ) = 3 2y2
√ √
T ( 2y2 ) = −3 2x2
T (X 3 ) = 0.

Therefore, A has precisely the form


⎡ ⎤
0 −3 0
⎣3 0 0⎦
0 0 0

as expected.

Exercises

1. Let T : R4 → R4 be the symmetric operator having matrix


⎡ ⎤
1 1 0 0
⎢1 1 0 0⎥
A=⎢
⎣0

0 2 3⎦
0 0 3 2

with respect to the canonical basis of R4 . Determine an orthonormal basis for R4


with respect to which the matrix of T has diagonal form.
2. Repeat Exercise 1, for the symmetric linear operator T : R4 → R4 having matrix
⎡ ⎤
1 1 1 1
⎢1 1 1 1⎥
A=⎢
⎣1

1 1 1⎦
1 1 1 1
236 6 Canonical Forms of an Operator

with respect to the canonical basis of R4 .


3. Let T : C4 → C4 be the Hermitian operator having matrix
⎡ ⎤
1 i 0 0
⎢ −i 1 0 0⎥
A=⎢
⎣ 0

0 1 i⎦
0 0 −i 1

with respect to the canonical basis of C4 . Determine an unitary basis for C4 with
respect to which the matrix of T has diagonal form.
4. Let T : R4 → R4 be the normal (skew-symmetric) operator having matrix
⎡ ⎤
0 −2 0 2
⎢ 2 0 0 2⎥
A=⎢
⎣ 0

0 0 0⎦
−2 −2 0 0

with respect to the canonical basis of R4 . Determine an orthogonal basis for R4


with respect to which the matrix of T has block diagonal form.
5. Let T : V → V be a normal operator on the n-dimensional vector space V over
the field F, A the matrix of T with respect to a basis for V. Let λ be an eigenvalue of
T and X ∈ V an eigenvector of T corresponding to λ. Prove that λλ is eigenvalue
of the operator T ∗ T = T T ∗ and X is eigenvector of T ∗ T = T T ∗ corresponding
to λλ.
6. For each linear operator T on an inner product space V, determine whether T
is normal, self-adjoint or neither. If possible, produce an orthonormal basis of
eigenvectors of T for V and list the corresponding eigenvalues.
(a) V = R2 and T is defined by T (a, b) = (2a − 2b, −2a + 5b).
(b) V = R3 and T is defined by T (a, b, c) = (−a + b, 5b, 4a − 2b + 5c).
(c) V = C2 and T is defined by T (a, b) = (2a + ib, a + 2b).
 "1
(d) V = P2 (R) and T is defined by T ( f ) = f , where f, g = 0 f (t)g(t)dt.
(e) V = M2 (R) and T is defined by T (A)
 =A . 
t

ab cd
( f ) V = M2 (R) and T is defined by T = .
cd ab
7. Let V be a complex inner product space and let T be a linear operator on V .
Define T1 = 21 (T + T ∗ ) and T2 = 2i1 (T − T ∗ ).
(a) Prove that T1 and T2 are self-adjoint and that T = T1 + i T2 .
(b) Suppose also that T = U1 + iU2 , where U1 and U2 are self-adjoint. Prove
that U1 = T1 and U2 = T2 .
(c) Prove that T is normal if and only if T1 T2 = T2 T1 .
6.6 Normal Operators on Inner Product Spaces 237

8. Prove that if T is a unitary operator on a finite dimensional inner product space


V, then T has a unitary square root; that is there exists a unitary operator U such
that T = U 2 .
9. Let A be an n × n real symmetric or complex normal matrix. Prove that tr (A) =
n n
λi and tr (A∗ A) = |λi |2 , where λi s are the (not necessarily distinct) eigen-
i=1 i=1
values of A.
Chapter 7
Bilinear and Quadratic Forms

This chapter is devoted to the study of the properties of bilinear and quadratic forms,
defined on a vector space V over a field F. The main goal will be the construction of
appropriate methods aimed at obtaining the canonical expression of the functions,
in terms of suitable bases for V. To do this, we will introduce the concept of orthog-
onality with respect to a bilinear form and mostly make use of the orthogonalization
Gram-Schmidt process. Unless otherwise stated, here any vector space V is a finite
dimensional vector space over F.

7.1 Bilinear Forms and Their Matrices

Let F be a field, V, W vector spaces over F and V × W the cartesian product of V


and W (as sets). A function f : V × W → F is called bilinear if it is linear in each
variable separately, that is,

f (α1 v1 + α2 v2 , w) = α1 f (v1 , w) + α2 f (v2 , w)

f (v, β1 w1 + β2 w2 ) = β1 f (v, w1 ) + β2 f (v, w2 )

for any α1 , α2 , β1 , β2 ∈ F, v1 , v2 ∈ V and w1 , w2 ∈ W. A bilinear function f : V ×


W → F is usually called a bilinear form on V × W.

Example 7.1 The inner product f : Rn × Rn −→ R is a bilinear form on Rn × Rn .

Example 7.2 Let B = {b1 , b2 } be a basis for R2 and C = {c1 , c2 , c3 } a basis for R3 .
Let f : R2 × R3 −→ R be the function defined by
 
f (x1 , x2 ), (y1 , y2 , y3 ) = x1 (y1 + y2 ) + x2 (y1 − y3 ),

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 239
M. Ashraf et al., Advanced Linear Algebra with Applications,
https://doi.org/10.1007/978-981-16-2167-3_7
240 7 Bilinear and Quadratic Forms

where (x1 , x2 ) and (y1 , y2 , y3 ) are the coordinate vectors of any v ∈ R2 and w ∈ R3
in terms of B and C, respectively. Then f is a bilinear form on R2 × R3 . In fact,
for any α, β ∈ R, v1 = (x1 , x2 ), v2 = (x1 , x2 ) ∈ R2 and w1 = (y1 , y2 , y3 ), w2 =
(y1 , y2 , y3 ) ∈ R3 , it is easy to see that
   
f α(x1 , x2 ) + β(x1 , x2 ), (y1 , y2 , y3 ) = α f (x1 , x2 ), (y1 , y2 , y3 )
 
+β f (x1 , x2 ), (y1 , y2 , y3 )

and
   
f (x1 , x2 ), α(y1 , y2 , y3 ) + β(y1 , y2 , y3 ) = α f (x1 , x2 ), (y1 , y2 , y3 )
 
+β f (x1 , x2 ), (y1 , y2 , y3 ) .

Let f : V × W → F be a bilinear form on V × W, where V and W are finite


dimensional vector spaces over F. Let B = {b1 , . . . , bn } and C = {c1 , . . . , cm } be
ordered bases for V and W , respectively. Let [v] B and [w]C be the coordinate vec-
tors of v ∈ V in terms of B and w ∈ W in terms of C, respectively. Say [v] B =

n 
m
[x1 , . . . , xn ]t and [w]C = [y1 , . . . , ym ]t , that is, v = xi bi and w = y j c j . Since
i=1 j=1
f is compatible with linear combinations in each variable, we have
⎛ ⎞

n 
m 
f (v, w) = f ⎝ xi bi , yjcj⎠ = xi y j f (bi , c j ).
i=1 j=1 i, j

 
If we consider the coefficients matrix A = ai j , where ai j = f (bi , c j ), for any
1 ≤ i, j ≤ n then it is easy to see that

f (v, w) = [v]tB A[w]C .


 
The n × m matrix A = ai j is said to be the matrix of the bilinear form with respect
to the ordered bases B and C.
Conversely, let A ∈ Mnm (F) and let, as above, [v] B , [w]C be the coordinate vectors
of v ∈ V, in terms of B, and w ∈ W, in terms of C, respectively. If we define a
function f : V × W → F such that f (v, w) = [v]tB A[w]C , then by computations
it follows that f (v, w) = xi y j ai j . Of course, this map is compatible with linear
i, j
combinations in each variable, that is, f is a bilinear map on the set V × W.
Moreover, if we assume that there exist two n × m matrices A and A of f, in terms
of the same ordered bases B for V and C for W, then, for any v ∈ V and w ∈ W, it
follows that [v]tB A[w]C = f (v, w) = [v]tB A [w]C , that is, [v]tB (A − A )[w]C = 0.
Denote by αi j the coefficient entries of the matrix A − A . But since v ∈ V and
w ∈ W are arbitrary, for all i = 1, . . . , n and j = 1, . . . , m, we get
7.1 Bilinear Forms and Their Matrices 241
 
[ 0, . . . , 0 , 1, 0, . . . , 0 ] B A − A [ 0, . . . , 0 , 1, 0, . . . , 0 ]Ct = 0 =⇒ αi j = 0.





(i−1)−times (n−i)−times ( j−1)−times (m− j)−times

This means that A − A is the zero matrix, that is A = A .


Thus, we have proved that there exists an unique n × m matrix having entries
in F, which represents the bilinear form with respect to the given ordered bases B
and C. So, it is clear that the matrix of a bilinear form on the set V × W changes
according to the choice of the bases of the underlying vector spaces.
Example 7.3 Let f : R2 × R3 → R be a bilinear form on R2 × R3 defined by
 
f (x1 , x2 ), (y1 , y2 , y3 ) = x1 (y1 + y2 ) + x2 (y1 − y3 )
 
with respect to the canonical bases in R2 and R3 . The matrix A = ai j of f is
obtained by the following computations:
   
a11 = f (1, 0), (1, 0, 0) = 1, a12 = f (1, 0), (0, 1, 0) = 1,
   
a13 = f (1, 0), (0, 0, 1) = 0; a21 = f (0, 1), (1, 0, 0) = 1,
   
a22 = f (0, 1), (0, 1, 0) = 0, a23 = f (0, 1), (0, 0, 1) = −1.

Thus  
11 0
A= .
1 0 −1

Now consider two different ordered bases for R2 and R3 , precisely:

D = {d1 , d2 } = {(1, 1), (2, 1)} for R2

and E = {e1 , e2 , e3 } = {(1, 1, 0), (0, 0, 1), (2, 0, 1)} for R3 .

Since
f (d1 , e1 ) = 3, f (d1 , e2 ) = −1, f (d1 , e3 ) = 3;

f (d2 , e1 ) = 5, f (d2 , e2 ) = −1, f (d2 , e3 ) = 5

the matrix A of f with respect to the bases D and E is the following:


 
 3 −1 3
A =
5 −1 5

and hence f can be represented by


242 7 Bilinear and Quadratic Forms
⎡ ⎤
  y1
    3 −1 3
f (x1 , x2 ), (y1 , y2 , y3 ) = x1 x2 ⎣ y2 ⎦
5 −1 5
y3
= x1 (3y1 − y2 + 3y3 ) + x2 (5y1 − y2 + 5y3 )

with respect to the bases D and E.

Example 7.4 Let f : R3 × R2 → R be a bilinear form on R3 × R2 , having the


matrix ⎡ ⎤
11
⎣1 0⎦
21

in terms of the bases B = {b1 , b2 , b3 } = {(1, 1, 0), (0, 0, 1), (0, 1, 1)} for R3 and
C = {c1 , c2 } = {(1, 1), (0, 1)} for R2 . Then f can be expressed as follows
⎡ ⎤
 
    11 y1
f (x1 , x2 , x3 ), (y1 , y2 ) = x1 x2 x3 ⎣ 10 ⎦
y2
21
= x1 (y1 + y2 ) + x2 y1 + x3 (2y1 + y2 ).

We now obtain the matrix of f in terms of different ordered bases for V and W.
Let D = {d1 , d2 , d3 } = {(0, 1, 0), (0, 1, 1), (2, 0, 1)} be a basis for R3 and E =
{e1 , e2 } = {(1, 1), (2, 1)} be a basis for R2 . In order to obtain the matrix relative
to the bases D and E, one has to compute the coefficients f (di , e j ). The first
step is to determine the coordinate vectors of d1 , d2 , d3 in terms of B, and e1 , e2
in terms of C : d1 = [0, 1, 0]t = [0, −1, 1]tB , d2 = [0, 1, 1]t = [0, 0, 1]tB , d3 =
[2, 0, 1]t = [2, 3, −2]tB

e1 = [1, 1]t = [1, 0]Ct , e2 = [2, 1]t = [2, −1]Ct .

Thus ⎡ ⎤
 1
 1  
1
f (d1 , e1 ) = 0 −1 1 ⎣ 1 0⎦ = 1,
0
2 1
⎡ ⎤
 1 1  
2
f (d1 , e2 ) = 0 −1 1 ⎣ 1 0 ⎦ = 1,
−1
2 1

f (d2 , e1 ) = 2, f (d2 , e2 ) = 3, f (d3 , e1 ) = 1, and f (d3 , e2 ) = 2.

Therefore, with respect to the bases D and E, f can be expressed as


7.1 Bilinear Forms and Their Matrices 243
⎡ ⎤
 
    11 y1
f (x1 , x2 , x3 ), (y1 , y2 ) = x1 x2 x3 ⎣ 23 ⎦
y2
12
= x1 (y1 + y2 ) + x2 (2y1 + 3y2 ) + x3 (y1 + 2y2 ).

7.2 The Effect of the Change of Bases

We now consider two different ordered bases B = {b1 , . . . , bn } and B  = {b1 , . . . , bn }
for V, as well as two different ordered bases C = {c1 , . . . , cm } and C  = {c1 , . . . , cm }
for W. In view of the above, the bilinear form f : V × W → F can be represented
by different matrices, in connection with the choice of a basis for V and W. For
instance, let A be the matrix of f with respect to the ordered bases B for V and C
for W, and A the matrix of f with respect to the ordered bases B  for V and C  for
W. Now let us describe the relationship between the matrices A and A .
Let P ∈ Mn (F) be the transition matrix of B  relative to B, whose i-th column is
the coordinates vector [bi ] B , and Q ∈ Mm (F) be the transition matrix of C  relative to
C, whose i-th column is the coordinates vector [ci ]C . We recall that, for any vectors
v ∈ V and w ∈ W, the following hold:

[v] B = P[v] B  and [w]C = Q[w]C  .

Thus
f (v, w) = [v]tB A[w]C
 t  
= P[v] B  A Q[w]C 
= [v]tB  P t AQ [w]C  .

On the other hand, f (v, w) = [v]tB  A [w]C  and, by the uniqueness of A in terms of
the ordered bases B  and C  , we get A = P t AQ.

Example 7.5 Let f : R3 × R2 → R be the bilinear form as in Example 7.4, having


the matrix ⎡ ⎤
11
A = ⎣1 0⎦
21

in terms of the ordered bases B = {b1 , b2 , b3 } = {(1, 1, 0), (0, 0, 1), (0, 1, 1)} for
R3 and C = {c1 , c2 } = {(1, 1), (0, 1)} for R2 .
We introduce D = {d1 , d2 , d3 } = {(0, 1, 0), (0, 1, 1), (2, 0, 1)} a basis for R3 and
E = {e1 , e2 } = {(1, 1), (2, 1)} a basis for R2 .
The transition matrices P, Q of D relative to B and that of E relative to C,
respectively, are ⎡ ⎤
0 0 2  
1 2
P = ⎣ −1 0 3 ⎦ , Q =
0 −1
1 1 −2
244 7 Bilinear and Quadratic Forms

so that, the matrix A of f, in terms of the ordered bases D for R3 and E for R2 , is

A = P t AQ
⎡ ⎤⎡ ⎤
0 −1 1 1 1  
1 2
= ⎣0 0 1 ⎦⎣1 0⎦
0 −1
2 3 −2 2 1
⎡ ⎤
1 1
= ⎣2 3⎦.
1 2

Let us now investigate the special case when V = W, in other words, we consider the
bilinear form f : V × V → F. Under this assumption, we may always consider only
one ordered basis B = {e1 , . . . , en } for V, so that the coefficients matrix A = ai j
of f is obtained by the computations ai j = f (ei , e j ), for any i, j = 1, . . . , n.
Therefore, if B and B  are two different bases for V, then f can be represented
by two different matrices: A, the matrix of f with respect to the ordered basis B,
and A , the matrix of f with respect to the ordered basis B  . In light of the above
argument, A = P t A P, where P is the transition matrix of B  relative to B. Notice
that A, A , P are n × n square matrices, and, in particular, P is an invertible matrix.
At this point, we would like to recall the following:

Definition 7.6 Let A, A be two n × n matrices having coefficients in some field


F. A, A are called congruent matrices if there exists an invertible n × n matrix
P with coefficients in F, such that A = P t A P. The relationship A = P t A P is
usually called congruence. It is easy to see that the congruence between matrices is
an equivalence relation.

More precisely, we have:

Theorem 7.7 Let f : V × V → F be a bilinear form on V × V (equivalently we


say that f is a bilinear form on V ). Two matrices A, A represent f, in terms of two
different ordered bases for V, if and only if they are congruent.

Proof We already have proved one direction: if both A and A represent the same
bilinear form f with respect to the ordered bases B and B  , respectively, then there
exists a nonsingular matrix P (which is precisely the transition matrix of B  relative
to B) such that A = P t A P.
In order to prove the other direction, we assume that A is the matrix of f in
terms of the ordered basis B = {b1 , . . . , bn }. Here suppose that A = P t A P for
some nonsingular n × n matrix P. Of course, the i-th column vector [u 1i , . . . , u ni ]t
in P can be viewed as a coordinate vector with respect to the ordered basis B. Let

n
ui = u ji b j be the vector of V, having coordinates [u 1i , . . . , u ni ]t in terms of B.
j=1
Hence, for any i = 1, . . . , n, we obtain a sequence of n linearly independent vectors
7.2 The Effect of the Change of Bases 245

u 1 , . . . , u n . Hence, the set B  = {u 1 , . . . , u n } is an ordered basis for V and P is the


transition matrix of B  relative to B. Therefore, A is precisely the matrix of f with
respect to the ordered basis B  .

7.3 Symmetric, Skew-Symmetric and Alternating Bilinear


Forms

We focus our attention on three different types of bilinear forms:


Definition 7.8 Let f : V × V → F be a bilinear form on V.
(1) f is called symmetric if f (u, v) = f (v, u) for all u, v ∈ V.
(2) f is called skew-symmetric if f (u, v) = − f (v, u) for all u, v ∈ V.
(3) f is called alternating if f (u, u) = 0 for all u ∈ V.
Nevertheless, here we prove that any skew-symmetric form is precisely a symmetric
or an alternating form, according to the fact that the characteristic of F is 2 or not,
respectively.

Theorem 7.9 Let f : V × V → F be a bilinear form on V.


(i) If f is alternating form, then it is skew-symmetric.
(ii) If char (F) = 2, then f is skew-symmetric if and only if it is alternating.
(iii) If char (F) = 2, then f is skew-symmetric if and only if it is symmetric.

Proof (i) We firstly assume that f is alternating. Thus, by expanding the relation
f (u + v, u + v) = 0 for any u, v ∈ V, we get

0 = f (u, u) + f (u, v) + f (v, u) + f (v, v) = f (u, v) + f (v, u)

so that f (u, v) = − f (v, u) for any u, v ∈ V, i.e., f is skew-symmetric.

(ii) Consider now the case char (F) = 2 and suppose f is skew-symmetric. There-
fore, f (u, u) = − f (u, u) for any u ∈ V, which implies 2 f (u, u) = 0, for any
u ∈ V, that is, f is alternating.

(iii) Finally, if char (F) = 2, then f is symmetric if and only if f (u, v) = f (v, u) =
− f (v, u) for any u, v ∈ V (since 1 = −1). This last relation holds if and only if f
is skew-symmetric, as desired.

Theorem 7.10 Let f : V × V → F be a bilinear form on V. Then


(i) f is symmetric if and only if the matrix A of f is symmetric (At = A), whatever
the choice of ordered basis for V with respect to which A is related.
246 7 Bilinear and Quadratic Forms

(ii) f is alternating if and only if the matrix A of f is skew-symmetric (At = −A)


and the diagonal entries of A are zero, whatever the choice of ordered basis for
V with respect to which A is related.

Proof We remark that the same result holds for bilinear skew-symmetric forms, i.e.,
f is skew-symmetric if and only if the matrix A of f is skew-symmetric. Never-
theless, in light of the previous theorem, it is known that a skew-symmetric form is
either symmetric or alternating. Therefore, it is sufficient to prove the result in these
last two cases.  
Let B = {e1 , . . . , en } be any ordered basis for V and A = ai j be the matrix of
f in terms of B.

(i) Firstly, we assume that f is symmetric. Thus, ai j = f (ei , e j ) = f (e j , ei ) = a ji


for all i = j, i.e., A is a symmetric matrix.
Suppose now At = A. Hence, for any u, v, ∈ V, f (u, v) = u t Av and f (v, u) =
v Au. Here, we have identified u by [u] B and v by [v] B , respectively. On the other
t

hand, since u t Av ∈ F is a scalar element, we have

f (u, v) = u t Av = (u t Av)t = v t At u = v t Au = f (v, u),

as required.

(ii) Let now f be alternating. Thus, aii = f (ei , ei ) = 0 for any i = 1, . . . , n. More-
over, for any i = j, f (ei + e j , ei + e j ) = 0 implies f (ei , e j ) + f (e j , ei ) = 0, i.e.,
ai j = f (ei , e j ) = − f (e j , ei ) = −a ji and hence A is skew-symmetric.
Conversely, let At = −A be such that aii = 0 for any i = 1, . . . , n. Hence, for
any u ∈ V, f (u, u) = u t Au. As above, since u t Au ∈ F is a scalar element,

f (u, u) = u t Au = (u t Au)t = u t At u = −u t Au = − f (u, u).

Hence, in case of char (F) = 2, it follows f (u, u) = 0 for any u ∈ V and we are
done.
n
Finally, let char (F) = 2 and u = αi ei be any vector of V. Since f (ei , ei ) =
i=1
aii = 0 for any i, and f (ei , e j ) = ai j = −a ji = − f (e j , ei ) = f (e j , ei ) for any i =
j, it follows that


n 
n
 
f (u, u) = f αi ei , αi ei = 2 αi α j f (ei , e j ) = 0,
i=1 i=1 i= j

as required.

Definition 7.11 Let f : V × V −→  F be either symmetric, skew-symmetric or


alternating on V. We will refer to V, f as a metric vector space.
7.4 Orthogonality and Reflexive Forms 247

7.4 Orthogonality and Reflexive Forms

A bilinear form is a generalization of the inner product, so the condition f (u, v) = 0


for some u, v ∈ V can be viewed as a generalization of the orthogonality. More
precisely, we give the following:
 
Definition 7.12 Let V, f be a metric vector space and u, v ∈ V. Then u is said to
be f -orthogonal to v if f (u, v) = 0. In this case, we write u ⊥ v.
In general, the relation ⊥ of orthogonality might not be symmetric, in the sense that
we can have u ⊥ v ( f (u, v) = 0) but v ⊥ u ( f (v, u) = 0).
Example 7.13 Let f : R2 × R2 → R be a bilinear form defined as
  
    3 2 y1
f ( x1 , x2 ), (y1 , y2 ) = x1 x2
−1 1 y2

= 3x1 y1 − x2 y1 + 2x1 y2 + x2 y2 .

For u = [1, 1]t and v = [3, −2]t , we have f (u, v) = 0 but f (v, u) = 0.
Remark 7.14 Of course, if f is either symmetric or alternating, then the orthogo-
nality relation ⊥ is symmetric. In fact, if f (u, v) = f (v, u) (or f (u, v) = − f (v, u))
for any u, v ∈ V, then u ⊥ v if and only if v ⊥ u.
When the orthogonality relation is symmetric, that is, f (u, v) = 0 if and only if
f (v, u) = 0 for all u, v ∈ V, we say that f is reflexive.
Theorem 7.15 Let f : V × V → F be a bilinear form on the vector space V. Then
f is reflexive if and only if f is either symmetric or alternating.
Proof In light of Remark 7.14, we now assume that f is reflexive and prove that
it is either symmetric or alternating. Let x, y, z ∈ V. Of course, f (x, y) f (x, z) =
f (x, z) f (x, y). Thus, we have
 
0 = f (x, y) f (x, z) − f (x, z) f (x, y) = f x, f (x, y)z − f (x, z)y . (7.1)

Since f is reflexive, (7.1) implies


 
0 = f f (x, y)z − f (x, z)y, x = f (x, y) f (z, x) − f (x, z) f (y, x) for all x, y, z ∈ V.
(7.2)
In particular, for x = z in (7.2), one has

f (x, y) f (x, x) − f (x, x) f (y, x) = 0 for all x, y ∈ V. (7.3)

In other words, for any x, y ∈ V

either f (x, x) = f (y, y) = 0 or f (y, x) = f (x, y). (7.4)


248 7 Bilinear and Quadratic Forms

Here, we suppose that f is neither symmetric nor alternating and show that a con-
tradiction follows. In light of our last assumption, there exist u, v, w ∈ V such that
f (v, w) = f (w, v) and f (u, u) = 0. By (7.4) and f (v, w) = f (w, v), it follows
that f (w, w) = f (v, v) = 0. Moreover, by (7.3) and f (u, u) = 0, we also have both
f (u, v) = f (v, u) and f (u, w) = f (w, u).
On the other hand, for x = v, y = w and z = u in (7.2), it follows
 
0 = f (v, w) f (u, v) − f (v, u) f (w, v) = f (u, v) f (v, w) − f (w, v) (7.5)

and analogously, for x = w, y = v and z = u in (7.2),


 
0 = f (w, v) f (u, w) − f (w, u) f (v, w) = f (u, w) f (w, v) − f (v, w) . (7.6)

Therefore, since f (v, w) = f (w, v), relations (7.5) and (7.6) say that f (u, v) =
f (v, u) = 0 and f (u, w) = f (w, u) = 0. Hence

f (u + v, w) = f (v, w) and f (w, u + v) = f (w, v),

that is, f (u + v, w) = f (w, u + v). By using (7.4), we get f (u + v, u + v) = 0.


This gives the contradiction

0 = f (u + v, u + v) = f (u, u) + f (u, v) + f (v, u) + f (v, v) = f (u, u).

If f is reflexive on V and W is a subspace of V, we set

W ⊥ = {v ∈ V | f (v, w) = 0 for all w ∈ W } = {v ∈ V | f (w, v) = 0 for all w ∈ W }

and call W ⊥ the f -orthogonal space of W. One may notice that this definition is
equivalent to the one of orthogonal complement in an inner product space. In this
sense, we prefer to use the term f -orthogonal space, and not orthogonal complement
for the set W ⊥ , in order to distinguish the case of metric spaces and the other one of
inner product spaces.
In fact, if W ⊥ is the orthogonal complement of the subspace W of an inner product
space V , then W ⊕ W ⊥ = V. On the other hand, if W ⊥ is simply the f -orthogonal
space of W in the metric space V (i.e., V is equipped by a bilinear form that is not
an inner product), then it may happen that W + W ⊥ = V.

Example 7.16 Let f : R3 × R3 → R be a (skew-symmetric) bilinear form defined


as ⎡ ⎤⎡ ⎤
    0 1 0 y1
f ( x1 , x2 , x3 ), (y1 , y2 , y3 ) = x1 x2 x3 ⎣ −1 0 1 ⎦ ⎣ y2 ⎦
0 −1 0 y3

= x1 y2 − x2 y1 + x2 y3 − x3 y2 .

If W =
(1, −1, 0) , then W ⊥ =
(1, −1, 0), (1, 0, 1) and W + W ⊥ = W ⊥ = R3 .
7.5 The Restriction of a Bilinear Form 249

7.5 The Restriction of a Bilinear Form

If (V, f ) is a metric vector space and W is a subspace of V, then we may introduce


the restriction f |W of f to W, so we get the metric space (W, f |W ). It is clear that if
f is either symmetric, skew-symmetric or alternating on V then so is the restriction
f |W on W.
Example 7.17 Let f : R3 × R3 → R be a (symmetric) bilinear form defined as
⎡ ⎤⎡ ⎤
    011 y1
f ( x1 , x2 , x3 ), (y1 , y2 , y3 ) = x1 x2 x3 ⎣ 1 0 1 ⎦ ⎣ y2 ⎦
110 y3

= x1 y2 + x2 y1 + x1 y3 + x3 y1 + x2 y3 + x3 y2 .

If W =
(1, 1, 0), (0, 1, 1) , then any pair of vectors u, v ∈ W can be written as
u = α1 (1, 1, 0) + α2 (0, 1, 1), v = β1 (1, 1, 0) + β2 (0, 1, 1), α1 , α2 , β1 , β2 ∈ R.
Hence
   
f (u, v) = α1 β1 f (1, 1, 0), (1, 1, 0) + α1 β2 f (1, 1, 0), (0, 1, 1)
   
+α2 β1 f (0, 1, 1), (1, 1, 0) + α2 β2 f (0, 1, 1), (0, 1, 1)
= 2α1 β1 + 3α1 β2 + 3α2 β1 + 2α2 β2
  
  23 β1
= α1 α2 .
32 β2
 
23
Thus, the matrix represents the restriction of f to W.
32
For instance, let u ∈ W and the coordinate vector of u for V be [2, 1, −1]t . Thus
u = 2(1, 1, 0) + (−1)(0, 1, 1). Similarly, if v ∈ W and the coordinate vector of v
for V be [1, 4, 3]t then v = 1(1, 1, 0) + 3(0, 1, 1). Thus, the coordinate vector of u
with respect to the basis B = {(1, 1, 0), (0, 1, 1)} for W is [2, −1]t . Analogously,
the coordinate vector of v with respect to B is [1, 3]t .
As vectors of V, we get
⎡ ⎤⎡ ⎤
  011 1
f (u, v) = 2 1 −1 ⎣ 1 0 1 ⎦ ⎣ 4 ⎦ = 13.
110 3

As vectors of W,
  
  23 1
f (u, v) = 2 −1 = 13.
32 3
250 7 Bilinear and Quadratic Forms

7.6 Non-degenerate Bilinear Forms

Given a vector space V and a reflexive bilinear form f on V, we define the radical of
f as the subspace: Rad( f ) = V ⊥ = {u ∈ V | f (u, v) = 0 for all v ∈ V } = {u ∈
V | f (v, u) = 0 for all v ∈ V }. The bilinear form f is said to be non-degenerate if
V ⊥ = {0}. This is equivalent to say that f (v, u) = 0, for any v ∈ V, implies u = 0.
In case V ⊥ = {0}, we refer to f as a degenerate form.
Similarly, we may define the radical of the restriction f |W , where W is a subspace
of V,

Rad( f |W ) = W ∩ W ⊥ = {u ∈ W | f (u, w) = 0 for all w ∈ W }

and we say that f |W is non-degenerate on W if W ∩ W ⊥ = {0}.


Lemma 7.18 The reflexive bilinear form f : V × V → F is non-degenerate if and
only if the matrix of f is invertible.
Proof Let A ∈ Mn (F) be the matrix of f with respect to the ordered basis B =
{e1 , . . . , en }. It is clear that if u ∈ V ⊥ , then f (u, ei ) = 0 for any i =1, . . . , n.
On the other hand, if f (u, ei ) = 0 for any vector ei ∈ B, then f (u, αi ei ) =
 i
αi f (u, ei ) = 0 for any scalar elements α1 , . . . , αn . Thus f (u, v) = 0 for any
i
v ∈ V, that is, u ∈ V ⊥ . In other words, we have proved that v ∈ V ⊥ if and only if
f (v, ei ) = 0 for any vector ei of the basis for V.
Now, let X ∈ V ⊥ and [x1 , . . . , xn ]tB be the coordinate vector of X with respect to
B. Hence

X ∈ V⊥ ⇐⇒ X t Aei = 0 for all i = 1, . . . , n


⇐⇒ eit AX = 0 for all i = 1, . . . , n
⇐⇒ [ 0, . . . , 0 , 1, 0, . . . , 0 ] B A[x1 , . . . , xn ]tB = 0 (7.7)



(i−1)−times (n−i)−times
for all i = 1, . . . , n.
 
If ai j are the coefficient entries of the matrix A, then the relation (7.7) means that

a11 x1 + a12 x2 + · · · + a1n xn = 0


a21 x1 + a22 x2 + · · · + a2n xn = 0
... ... ... ...
an1 x1 + an2 x2 + · · · + ann xn = 0

in other words, X ∈ V ⊥ if and only if its coordinate vector [x1 , . . . , xn ]tB , in terms
of B, is a solution of the homogeneous linear system associated with the matrix A.
Thus V ⊥ = {0} if and only if the homogeneous linear system associated with the
matrix A has only the trivial solution. This happens if and only if the rank of A is
equal to n, that is, A is invertible, as required.
7.6 Non-degenerate Bilinear Forms 251

Remark 7.19 A reflexive non-degenerate bilinear form on a vector space V might


restrict to a degenerate bilinear form on a subspace W of V. For example, let f :
R3 × R3 → R be the (symmetric) bilinear form having matrix
⎡ ⎤
1 1 1
A = ⎣ 1 0 −1 ⎦ .
1 −1 0

It is trivially a non-degenerate form, since A is invertible. Nevertheless, if


W =
(1, 1, 0), (0, 0, 1) , then the matrix of f |W is
 
30
00

so that f |W isa degenerate


 reflexive form on W, in fact, by easy computations we
find that Rad f |W = {(0, 0, α)|α ∈ R} =
(0, 0, 1) .

Theorem 7.20 Let f : V × V → F be a reflexive form, W a subspace of V. Then


f |W is non-degenerate on W if and only if V = W ⊕ W ⊥ .

Proof Of course, we may assume that W is a proper subspace of V, if not there is


nothing to prove.  
Trivially, the directness of the sum V = W ⊕ W ⊥ implies Rad f |W = W ∩
W ⊥ = {0} and f |W is non-degenerate on W.
Conversely, assume that f |W is non-degenerate
  on W, that is, no nonzero element
of W lies in W ⊥ and W ∩ W ⊥ = Rad f |W = {0}. This means that W and W ⊥ are
in direct sum. Let dim(V ) = n, dim(W ) = k < n and B = {e1 , . . . , ek } be a basis
for W. Extending B to a basis B  for V, we find ek+1 , . . . , en vectors of V such that
B  = {e1 , . . . , en }.
Now, let A be the matrix of f in terms of B  , X ∈ W ⊥ and [x1 , . . . , xn ]tB  be the
coordinate vector of X with respect to B  . Hence:

X ∈ W⊥ ⇐⇒ X t Aei = 0 for all i = 1, . . . , k


⇐⇒ eit AX = 0 for all i = 1, . . . , k
⇐⇒ [ 0, . . . , 0 , 1, 0, . . . , 0 ] B  A[x1 , . . . , xn ]tB  = 0 (7.8)



(i−1)−times (n−i)−times
for all i = 1, . . . , k.
 
If ai j are the coefficient entries of the matrix A, then the relation (7.8) means that

a11 x1 + a12 x2 + · · · + a1n xn = 0


a21 x1 + a22 x2 + · · · + a2n xn = 0
.
... ... ... ...
ak1 x1 + ak2 x2 + · · · + akn xn = 0
252 7 Bilinear and Quadratic Forms

Therefore, X ∈ W ⊥ if and only if its coordinate vector [x1 , . . . , xn ]tB  is a solu-


tion of the homogeneous linear system associated with the submatrix A of A
consisting in the first top k rows of A. Since the rank of A is ≤ k, the null
space of A has dimension ≥ n − k, that is, dim(W ⊥ ) ≥ n − k. On the other
hand, dim(V ) = n ≥ dim(W ⊕ W ⊥ ) = dim(W ) + dim(W ⊥ ) = k + dim(W ⊥ ) ≥
k + n − k = n, implying that dim(W ⊕ W ⊥ ) = n and W ⊕ W ⊥ is precisely equal
to V.

Remark 7.21 Using the above argument one may prove that if f : V × V → F
is a reflexive non-degenerate form and W is any subspace of V, then dim(W ) +
dim(W ⊥ ) = dim(V ).

Moreover, it is clear that the following holds:

Corollary 7.22 Let f : V × V → F be a reflexive non-degenerate form and W a


subspace of V. Then f |W is non-degenerate on W if and only if f |W ⊥ is non-degenerate
on W ⊥ .

Example 7.23 Let f : R3 × R3 → R be a (symmetric) bilinear form defined as


⎡ 1 1
⎤ ⎡ ⎤
0 y1
   ⎢ 2 2

f ( x1 , x2 , x3 ), (y1 , y2 , y3 ) = x1 x2 x3 ⎣ 1
2
0 21 ⎦ ⎣ y2 ⎦
1 1
0 y3
2 2

= 21 x1 y2 + 21 x2 y1 + 21 x1 y3 + 21 x3 y1 + 21 x2 y3 + 21 x3 y2 .

Since the matrix of f is invertible, f is non-degenerate. Consider now the following


decomposition of R3
  
1 1
R =
(1, 1, 0) ⊕ − , , 0 , (−1, −1, 1) .
3
2 2

If we denote W =
(1, 1, 0) , it is easy to see that
(− 21 , 21 , 0), (−1, −1, 1) is the
f -orthogonal space of W. So we may write W ⊥ =
(− 21 , 21 , 0), (−1, −1, 1) . Thus
R3 = W ⊕ W ⊥ .
Moreover, any pair of vectors u, v ∈ W, can be written as

u = α1 (1, 1, 0), v = β1 (1, 1, 0), α1 , β1 ∈ R.

Hence, by the definition of f, f (u, v) = α1 β1 . Thus, the 1 × 1 matrix [1] represents


the restriction f |W of f to W. This restriction is clearly non-degenerate.
Similarly, any pair of vectors u  , v  ∈ W ⊥ can be written as

α1 α1 β1 β1
u  = (− − α2 , − α2 , α2 ), v  = (− − β2 , − β2 , β2 )
2 2 2 2
7.6 Non-degenerate Bilinear Forms 253

where (α1 , α2 ) and (β1 , β2 ) are the coordinates of u  and v  , respectively (in terms
of the above fixed basis for W ⊥ ). By computations, we have that

1
f (u  , v  ) = − α1 β1 − α2 β2 .
4
Therefore, the matrix  
− 14 0
0 −1

represents the restriction f |W ⊥ of f to W ⊥ and it is also non-degenerate.

Definition 7.24 Let f : V × V → F be a reflexive form and B = {e1 , . . . , en } a


basis for V. We say that B is an f -orthogonal basis if f (ei , e j ) = 0 for all i = j.

Definition 7.25 Let f : V × V → F be a reflexive form and B = {e1 , . . . , en } an


f -orthogonal basis for V. We say that B is an f -orthonormal basis if f (ek , ek ) = 1
for all k = 1, . . . , n.

Example 7.26 Consider the symmetric form f defined in Example 7.23 and the
basis  
1 1
B = (1, 1, 0), (− , , 0), (−1, −1, 1)
2 2

for R3 . In light of Definition 7.24, B is an f -orthogonal basis for R3 .

Definition 7.27 Let f : V × V → F be a bilinear form. A nonzero vector v ∈ V is


called f -isotropic if f (v, v) = 0; otherwise v is called f -nonisotropic. The vector
space V is called f -isotropic if it contains at least one f -isotropic vector; otherwise
V is called f -nonisotropic. V is said to be totally f -isotropic (or also f -symplectic)
if every vector of V is isotropic.

Example 7.28 Let f : R3 × R3 → R be a (symmetric) bilinear form defined by the


matrix ⎡ ⎤
121
A = ⎣2 2 0⎦
100

in terms of the canonical basis for R3 . Notice that f is non-degenerate. The vector
X = [1, 0, − 12 ]t is f -isotropic, in fact f (X, X ) = 0. Hence R3 is f -isotropic. The
vector Y = [1, 2, 1]t is f -nonisotropic, in fact f (Y, Y ) = 19.

Example 7.29 Let f : R3 × R3 → R be a (symmetric) bilinear form defined by the


matrix ⎡ ⎤
110
A = ⎣1 2 1⎦
013
254 7 Bilinear and Quadratic Forms

in terms of the canonical basis for R3 . If we denote by X = [x1 , x2 , x3 ]t , any nonzero


vector of R3 in terms of the canonical basis for R3 , then

f (X, X ) = x12 + 2x1 x2 + 2x22 + 2x2 x3 + 3x32


= (x1 + x2 )2 + (x2 + x3 )2 + 2x32 > 0.

Therefore, there is no nonzero vector in R3 which is f -isotropic. Hence, in the sense


of Definition 7.27, R3 is a f -nonisotropic real vector space.
Example 7.30 Let f : R3 × R3 → R be a (skew-symmetric) bilinear form defined
by the matrix ⎡ ⎤
0 1 2
A = ⎣ −1 0 1 ⎦
−2 −1 0

in terms of the canonical basis for R3 . If we denote by X = [x1 , x2 , x3 ]t , any vector


of R3 in terms of the canonical basis for R3 , then it is clear that f (X, X ) = 0, i.e.,
R3 is a totally f -isotropic real vector space, in the sense of Definition 7.27.

7.7 Diagonalization of Symmetric Forms

Let V be equipped with the symmetric bilinear form f. Notice that any inner product
on a real vector space (according to the definition in Chap. 5) is a symmetric bilinear
form. The converse is not generally true. For instance, let f : R2 × R2 → R be a
bilinear form having matrix  
−4 2
A=
2 −2

in terms of the canonical basis for R2 . One can verify that f is symmetric. On the
other hand, for u = [1, 1]t ∈ R2 , we see that f (u, u) = u t Au = −2. Thus, f is not
an inner product (in the sense of our definition in Chap. 5).
In general, as statedin the previous section, if the bilinear form f is reflexive,
 
one may refer to V, f as a metric space. In particular, if f is symmetric, V, f
is called symmetric space. Nevertheless, any symmetric form having the additional
property f (v, v) > 0for all 0 = v ∈ V , is an inner product. Of course, in this case,
we may also refer to V, f as an inner product space.
Lemma 7.31 Let F be a field of characteristic different from 2 and (V, f ) a sym-
metric space. If f = 0 (that is f is not identically zero on V × V ), then there is
w0 ∈ V such that f (w0 , w0 ) = 0.
Proof Assume that f (v, v) = 0 for all v ∈ V. Therefore, for any x0 , y0 ∈ V , it
follows that f (x0 , x0 ) = 0, f (y0 , y0 ) = 0 and f (x0 + y0 , x0 + y0 ) = 0. Therefore,
since the form is symmetric, we get the contradiction
7.7 Diagonalization of Symmetric Forms 255

0 = f (x0 + y0 , x0 + y0 ) = f (x0 , y0 ) + f (y0 , x0 ) = 2 f (x0 , y0 ).

Theorem 7.32 Let F be a field of characteristic different from 2 and (V, f ) a sym-
metric space. Then there is an f -orthogonal basis for V.

Proof Firstly, we remark that, in case either V is 1-dimensional over F or f = 0,


any basis is f -orthogonal. Thus, we assume that dim(V ) = n ≥ 2 and there exist
x0 , y0 ∈ V such that f (x0 , y0 ) = 0. By Lemma 7.31, there is w0 ∈ V such that
f (w0 , w0 ) = 0.
Now, we prove our result by induction on the dimension n ≥ 2, so we assume
that the theorem is true for any symmetric space of smaller dimension.
Since f (w0 , w0 ) = 0, the restriction of f to the subspace W =
w0 of V is non-
degenerate. Hence, by Theorem 7.20, V = W ⊕ W ⊥ . Moreover, W ⊥ is a symmetric
space having dimension n − 1 over F. So, by the induction assumption, there is
an f -orthogonal basis {e1 , . . . , en−1 } for W ⊥ and then {e1 , . . . , en−1 , w0 } is an f -
orthogonal basis for V.
 
Remark 7.33 It is clear that if A = ai j is the matrix representing a bilinear form
in terms of an f -orthogonal basis, then A is a diagonal matrix, because of ai j =
f (ei , e j ) = 0 for all i = j. So we may give another version of Theorem 7.32, from
the matrix theory point of view: Any symmetric matrix, over a field of characteristic
different from 2, is congruent to a diagonal matrix.

7.8 The Orthogonalization Process for Nonisotropic


Symmetric Spaces

Here, we would like to describe the classical method for finding an f -orthogonal
basis for a symmetric space (V, f ), under the assumption that f (v, v) = 0 for any
nonzero v ∈ V.
Actually, the present method does not differ from the one we have already outlined
in the case of inner product spaces and is usually called the Gram-Schmidt process.
Let B = {b1 , . . . , bn } be a basis of the symmetric space V and A0 the matrix of
f with respect to B. Using the Gram-Schmidt process, we may compute the basis
E = {e1 , . . . , en } for V as follows:

e1 = b1


k−1
f (bk ,ei )
ek = bk − e
f (ei ,ei ) i
for all k = 2, . . . , n;
i=1

that is,
256 7 Bilinear and Quadratic Forms

e1 = b1

f (b2 ,e1 )
e2 = b2 − e
f (e1 ,e1 ) 1

f (b3 ,e1 ) f (b3 ,e2 ) (7.9)


e3 = b3 − e
f (e1 ,e1 ) 1
− e
f (e2 ,e2 ) 2
... ... ...

f (bn ,e1 ) f (bn ,e2 ) f (bn ,en−1 )


en = bn − e
f (e1 ,e1 ) 1
− e
f (e2 ,e2 ) 2
− ··· − e .
f (en−1 ,en−1 ) n−1

By easy computations, one can see that ek ⊥ ei , for any k = i. Moreover, since f
is nonisotropic, f (ek , ek ) = 0 for any k = 1, . . . , n. Hence, E is an f -orthogonal
basis for V. Let U be the transition matrix of E relative to B such that U t A0 U is
diagonal. Then, the column vectors of U are the coordinates of the elements of E in
terms of B. In particular, U has the following form:
⎡ ⎤
1 α12 α13 · · · α1n
⎢0 1 α23 · · · α2n⎥
⎢ ⎥
⎢ .. ⎥
..
⎢0 0 . ⎥
.
⎢ ⎥
U =⎢. .. ..⎥ , αi j ∈ F. (7.10)
⎢ .. . .⎥
⎢ ⎥
⎢. .. ⎥
⎣ .. . αn−1,n ⎦
0 0 0 1

We recall that such a matrix is usually called upper unitriangular, in the sense that it
is an upper triangular matrix having all diagonal coefficients equal to 1.
The matrix A of f in terms of E is diagonal, namely
⎡ ⎤
a11
⎢ a22 ⎥
⎢ ⎥
A = U t A0 U = ⎢ .. ⎥ (7.11)
⎣ . ⎦
ann

where aii = f (ei , ei ) = 0.


At this point, we have to split our argument into two different cases.
Firstly, we assume that the field F is algebraically closed. Then we know that there
exist α1 , . . . , αn ∈ F such that αi2 = aii , for any i = 1, . . . , n. Hence, by using the
vectors from the basis B, we compute a different basis B̃ = {e˜1 , . . . , e˜n } such that
e˜i = αi−1 ei . It is clear that B  is an f -orthogonal basis for V. Moreover, we have
 
f (e˜k , e˜k ) = f αk−1 ek , αk−1 ek = αk−2 f (ek , ek ) = akk
−1
akk = 1 for all k = 1, . . . , n.

Hence, the matrix à of f in terms of the basis B̃ has the following diagonal form:
7.8 The Orthogonalization Process for Nonisotropic Symmetric Spaces 257
⎡ ⎤
1
⎢ 1 ⎥
⎢ ⎥
à = ⎢ .. ⎥.
⎣ . ⎦
1

If P̃ is the transition matrix having in any i-th column the coordinates of the vector
e˜i , then P̃ t A P̃ = Ã.
Of course, if F is not algebraically closed, one cannot expect the same final result.
In particular, here we describe the case F = R.
Starting from B, we obtain a basis C = {c1 , . . . , cn } for V, as follows:

ck = √ 1 e
| f (ek ,ek )| k
= √ 1 ek
|akk |
for all k = 1, . . . , n

where f (ck , ci ) = 0 for all k = i,


 
1 1 1
f (ck , ck ) = f √ ek , √ ek = f (ek , ek ) = ±1 for all k = 1, . . . , n.
|akk | |akk | |akk |

Hence, the matrix A of f in terms of the basis C has the following diagonal form:
⎡ ⎤
±1
⎢ ±1 ⎥
⎢ ⎥
A = ⎢ .. ⎥,
⎣ . ⎦
±1

where any diagonal (i, i)-entry is equal to +1 or −1 according to the fact that aii is
positive or negative, respectively. Finally, by reordering the vectors in C, we get an
f -orthogonal basis D = {w1 , . . . , wn } for V, with respect to which the matrix A
of f is ⎡ ⎤
1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ 1 ⎥
A = ⎢

⎥.

⎢ −1 ⎥
⎢ .. ⎥
⎣ . ⎦
−1

Moreover, if P is the transition matrix having in any i-th column the coordinates of
the vector wi , then P t A P = A .
The Gram-Schmidt process can be essentially viewed as a recursive algorithm,
divided into several steps, which should allow us to obtain a sequence of bases
for V. In any single step of the process, the basis of the sequence consists of vec-
tors obtained as linear combinations of vectors from the basis given at the previous
step. More precisely, at the i-th step, we get a basis Bi = {e1(i) , . . . , en(i) } such that
258 7 Bilinear and Quadratic Forms

{e1(i) , . . . , ei(i) , ek(i) } is an f -orthogonal set for any k ≥ i + 1. We stop the process
whenever the basis consists of all f -orthogonal vectors.  
Denote by B0 = {e1(0) , . . . , en(0) } the starting basis for V and A0 = ai j the coef-
ficients matrix of f with respect to B0 . Let us describe in detail any single step:
(i) Step 1 :
Set B1 = {e1(1) , . . . , en(1) }, where

e1(1) = e1(0)

and
f (ek(0) , e1(1) )
ek(1) = ek(0) − (1) (1) 1
e(1) , k = 2, . . . , n.
f (e1 , e1 )

Then ek(1) ⊥ e1(1) for any k = 1, and V =


e1(1) ⊕
e1(1) ⊥ , where
e1(1) ⊥ =

e2(1) , . . . , en(1) . The matrix A1 of f in terms of B1 has any (k, 1)-entry and
(1, k)-entry equal to zero for k = 1. Moreover, if P1 is the transition matrix
having in any i-th column the coordinates of the vector ei(1) in terms of B0 ,
then P1t A P1 = A1 .
(ii) Step 2 :
Set B2 = {e1(2) , . . . , en(2) }, where

e1(2) = e1(1)

and
e2(2) = e2(1)

so that, by the previous step,


e1(2) ⊥ e2(2)

and
f (ek(1) , e2(2) )
ek(2) = ek(1) − e2(2) , k = 3, . . . , n.
f (e2(2) , e2(2) )

Then ek(2) ⊥ e1(2) and ek(2) ⊥ e2(2) , for any k = 1, 2 and V =


e1(2) , e2(2) ⊕
e1(2) ,
e2(2) ⊥ , where
e1(2) , e2(2) ⊥ =
e3(2) , . . . , en(2) . The matrix A2 of f in terms of
B2 has any (k, j)-entry and ( j, k)-entry equal to zero, for j = 1, 2 and k = j.
Moreover, if P2 is the transition matrix having in any i-th column the coordi-
nates of the vector ei(2) in terms of B1 , then P2t A1 P2 = A2 .
(iii) Step m : for m ≥ 2.
We set Bm = {e1(m) , . . . , en(m) }, where

e(m)
j = e(m−1)
j , j = 1, . . . , m
7.8 The Orthogonalization Process for Nonisotropic Symmetric Spaces 259

so that, thanks to the previous steps,

e(m)
j ⊥ ei(m) for all i = j and i, j = 1, . . . , m

and
f (ek(m−1) , em(m) )
ek(m) = ek(m−1) − em(m) , k = m + 1, . . . , n.
f (em(m) , em(m) )

Then ek(m) ⊥ e(m) j , for any j = 1, . . . , m and k = m + 1, . . . , n, and


V =
e1 , . . . , em(m) ⊕
e1(m) , . . . , em(m) ⊥ , where
e1(m) , . . . , em(m) ⊥ =
em+1
(m) (m)
,
. . . , en(m) . Thus, the matrix Am of f in terms of Bm has any (k, j)-entry and
( j, k)-entry equal to zero, for j = 1, . . . , m and k = j. Once again, if Pm is
the transition matrix having in any i-th column the coordinates of the vector
ei(m) in terms of Bm−1 = {e1(m−1) , . . . , en(m−1) }, then Pmt Am−1 Pm = Am .
In any step of the process, we may define the transition matrix Pi , associated with
the transition from the basis Bi to Bi−1 : the column vectors of Pi are the elements
of Bi in terms of Bi−1 .
Assume for example that the process consists of n steps: at the end, we obtain the
matrix P = P1 P2 . . . Pn , which is the transition matrix of Bn relative to B0 , such that
P t A0 P is diagonal. The column vectors of P are the elements of the f -orthogonal
basis for V.

Example 7.34 Let f : R3 × R3 → R be the symmetric bilinear form defined by the


matrix ⎡ ⎤
−2 1 0
A = ⎣ 1 −2 1 ⎦
0 1 −3

in terms of the canonical basis for R3 . If we denote by X = [x1 , x2 , x3 ]t any nonzero


vector of R3 in terms of the canonical basis B = {e1 , e2 , e3 } for R3 , then

f (X, X ) = −2x
 1 + 2x1 x2 − 2x2 + 2x2 x3 − 3x3
2 2 2

= − x12 + (x1 − x2 )2 + (x2 − x3 )2 + 2x32 < 0 for all X ∈ R3 .

Therefore, there is no nonzero vector in R3 which is f -isotropic. Hence, in the sense


of Definition 7.27, R3 is a f -nonisotropic real vector space.
Starting from B = {e1 , e2 , e3 }, we construct the new basis B  = {e1 , e2 , e3 } for
R , defined as follows:
3

f (e1 , e2 ) f (e1 , e3 )
e1 = e1 , e2 = e2 − e1 , e3 = e3 − e1 ,
f (e1 , e1 ) f (e1 , e1 )

that is,
260 7 Bilinear and Quadratic Forms

1
e1 = e1 , e2 = e2 + e1 , e3 = e3 .
2
Therefore, the transition matrix is
⎡ ⎤
1 21 0
C1 = ⎣ 0 1 0 ⎦
001

and the matrix of f in terms of the basis B  is


⎡ ⎤
−2 0 0
A = C1t AC1 = ⎣ 0 −3
2
1 ⎦.
0 1 −3

In the second step, we define the following new basis B  = {e1 , e2 , e3 } for R3 :

f (e2 , e3 ) 
e1 = e1 , e2 = e2 , e3 = e3 − e ,
f (e2 , e2 ) 2

that is,
2
e1 = e1 , e2 = e2 , e3 = e3 + e2 .
3
The transition matrix is now ⎡ ⎤
100
C2 = ⎣ 0 1 23 ⎦
001

and the matrix of f in terms of the basis B  is


⎡ ⎤
−2 0 0
A = C2t A C2 = ⎣ 0 − 23 0 ⎦ .
0 0 − 73

In order to determine the coordinates of vectors e1 , e2 , e3 , we make the composition
of both changes of basis. So we obtain the transition matrix C of the final basis B 
relative to the starting canonical basis:
⎡ 1 1

1 2 3
⎢ ⎥
C = C1 C2 = ⎣ 0 ⎦. 1 23
001

Hence, the coordinates of e1 , e2 , e3 are precisely the columns of C, that is
7.8 The Orthogonalization Process for Nonisotropic Symmetric Spaces 261
 
1 1 2
B  = (1, 0, 0), ( , 1, 0), ( , , 1) .
2 3 3

A is the matrix of f in terms of B  . Finally, we construct the basis for R3 with
respect to which the matrix of f has all nonzero entries equal to ±1. To do this, we
e
replace each ei by √ i   . In particular, we have
| f (ei ,ei )|

  √ √
 

  3    7
| f (e1 , e1 )| = 2, | f (e2 , e2 )| = √ , | f (e3 , e3 )| = √
2 3

and obtain the basis


 
1 1 2 1 2 3
B̃ = ( √ , 0, 0), ( √ , √ , 0), ( √ , √ , √ ) .
2 6 6 21 21 21

The transition matrix of B̃ relative to the starting basis B is


⎡ ⎤
√1 √1 √1
2 6 21
⎢ 0 √26 √221 ⎥
C̃ = ⎣ ⎦
0 0 √321

and the matrix of f in terms of B̃ is


⎡ ⎤
−1 0 0
à = C̃ t AC̃ = ⎣ 0 −1 0 ⎦ .
0 0 −1

Here, we would like to repeat the previous example, in this case, the bilinear form is
defined on a complex vector space. More precisely:
Example 7.35 Let f : C3 × C3 → C be the symmetric bilinear form defined by the
matrix ⎡ ⎤
−2 1 0
A = ⎣ 1 −2 1 ⎦
0 1 −3

in terms of the canonical basis B for C3 .


By using the above argument, we find a basis
 
1 1 2
B  = (1, 0, 0), ( , 1, 0), ( , , 1)
2 3 3

in terms of which the matrix of f is the following diagonal one:


262 7 Bilinear and Quadratic Forms
⎡ ⎤
−2 0 0
A = ⎣ 0 − 23 0 ⎦ .
0 0 − 73

A is the matrix of f in terms of B  . Since the vector space is defined over a
algebraically closed field, here we may construct a basis for C3 with respect to
which the matrix of f has all nonzero entries equal to 1. To do this, we replace each
e
ei by √ i   . Thus, we have
f (ei ,ei )

  √ √
√ 3  7
f (e1 , e1 ) = i 2,  
f (e2 , e2 ) = i √ ,  
f (e3 , e3 ) = i √
2 3

and obtain the basis


 
1 1 2 1 2 3
B̃ = ( √ , 0, 0), ( √ , √ , 0), ( √ , √ , √ ) .
i 2 i 6 i 6 i 21 i 21 i 21

The transition matrix of B̃ relative to the starting basis B is


⎡ 1 1

√ √ √1
i 2 i 6 i 21
⎢ 2
0 i √6 i √221 ⎥
C̃ = ⎣ ⎦
0 0 i √321

and the matrix of f in terms of B̃ is


⎡ ⎤
100
à = C̃ t AC̃ = ⎣ 0 1 0 ⎦ .
001

7.9 The Orthogonalization Process for Isotropic Symmetric


Spaces

It is clear that the application of Gram-Schmidt process is strictly connected to the


following needed condition: in order to compute the i-th step, the vector ei(i−1) ∈ Bi−1
must be nonisotropic. In other words, the matrix Ai−1 of f, in terms of the basis Bi−1 ,
must have a nonzero diagonal (i, i)-entry.
Hence, if we assume that (V, f ) is isotropic, this condition could be not necessarily
verified. Here, we would like to describe a simple method for taking forward the
process, usually called Lagrange orthogonalization process.
7.9 The Orthogonalization Process for Isotropic Symmetric Spaces 263

So, after m steps, we suppose that Bm = {e1(m) , . . . , en(m) } is the basis for V
such that {e1(m) , . . . , em(m) , ek(m) } is an f -orthogonal set, for any k ≥ m + 1, but
(m) (m)
f (em+1 , em+1 ) = 0. Nevertheless, if there exists j ≥ m + 2 such that f (e(m) (m)
j , e j ) =
(m) (m)
0, we may obtain a new basis Bm where em+1 switches places with e j . The outcome
achieved enables us to apply the Gram-Schmidt method starting from Bm .
Therefore, we have to consider the hardest case when f (e(m) (m)
j , e j ) = 0, for
any j ≥ m + 1. Firstly, we notice that if f (e(m) (m)
j , ek ) = 0, for any j ≥ m + 1 and
k ≥ j + 1, then Bm is already an f -orthogonal basis, and we are done. Thus, we
suppose there are j ≥ m + 1 and k ≥ j + 1 such that f (e(m) (m)
j , ek ) = α  = 0. Easy
computations show that

f (e(m)
j + ek(m) , e(m)
j + ek(m) ) = 2α = 0.

Once again, we may obtain a new basis Bm for V, by replacing e(m) with e(m) + ek(m)

j j
in the basis Bm . As above, we can apply the Gram-Schmidt algorithm, by using the

vectors from Bm . By repeating this argument, we will finally find an f -orthogonal
basis for V.

Example 7.36 Let f : R3 × R3 → R be the symmetric bilinear form defined by the


matrix ⎡ 1 1⎤
0 2 2
⎢1 1⎥
A=⎣2 0 2⎦
1 1
2 2
0

in terms of the canonical basis B = {e1 , e2 , e3 } for R3 . Notice that, in this case, R3 is
f -isotropic. We firstly proceed to construct a basis B  = {e1 , e2 , e3 } for R3 in terms
of which the matrix of f has some nonzero element on the main diagonal. Since
f (e1 , e1 ) = 0 and f (e1 , e2 ) = 0, we may define

e1 = e1 + αe2 , e2 = e2 , e3 = e3 ,

where α ∈ R and f (e1 , e1 ) = α. It is easy to see that for α = 1 we get the required
condition f (e1 , e1 ) = 0. Thus, the transition matrix is
⎡ ⎤
100
C1 = ⎣ 1 1 0 ⎦
001

so that the matrix of f in terms of the new basis is


⎡ 1

1 2
1
A = C1t AC1 = ⎣ 21 0 21 ⎦ .
1 21 0
264 7 Bilinear and Quadratic Forms

The next change of basis will be the following:

f (e1 , e2 )  f (e1 , e3 ) 


e1 = e1 , e2 = e2 − 
 
 e1 , e3 = e3 − e,
f (e1 , e1 ) f (e1 , e1 ) 1

that is,
1
e1 = e1 , e2 = e2 − e1 , e3 = e3 − e1
2
and the corresponding transition matrix is
⎡ ⎤
1 − 21 −1
C2 = ⎣ 0 1 0 ⎦ .
0 0 1

Hence, the new expression of the matrix of f is


⎡ ⎤
1 0 0
A = C2t A C2 = ⎣ 0 − 41 0 ⎦ .
0 0 −1

The basis B  = {e1 , e2 , e3 } in terms of which the matrix of f is precisely A , is
obtained by the computation
⎡⎤
1 − 21 −1
C = C1 · C2 = ⎣ 1 21 −1 ⎦ ,
0 0 1

where C is the transition matrix from B  to the starting basis B. Looking at the
columns of C, we have
 
1 1
B  = (1, 1, 0), (− , , 0), (−1, −1, 1) .
2 2

Moreover, C t AC = A .
Finally, we construct the basis for R3 , with respect to which the matrix of f has
e
all nonzero entries equal to ±1. To do this, we replace each ei by √ i   . In
| f (ei ,ei )|
particular, we have
  1 
| f (e1 , e1 )| = 1, | f (e2 , e2 )| = , | f (e3 , e3 )| = 1
2
and obtain the basis
7.9 The Orthogonalization Process for Isotropic Symmetric Spaces 265
 
B̃ = (1, 1, 0), (−1, 1, 0), (−1, −1, 1) .

The transition matrix from B̃ to the starting basis B is


⎡ ⎤
1 −1 −1
C̃ = ⎣ 1 1 −1 ⎦
0 0 1

and the matrix of f in terms of B̃ is


⎤⎡
1 0 0
à = C̃ t AC̃ = ⎣ 0 −1 0 ⎦ .
0 0 −1

Putting Theorem 7.32 and the above orthogonalization process together, we are now
able to state the following:

Theorem 7.37 Let F be an algebraically closed field of characteristic different from


2 and (V, f ) a symmetric vector space. Then there is an f -orthogonal basis B for
V such that the matrix of f in terms of B has the following form:
⎡ ⎤
1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ 1 ⎥
⎢ ⎥.
⎢ 0 ⎥
⎢ ⎥
⎢ .. ⎥
⎣ . ⎦
0

(Notice that we cannot say that B is orthonormal, because there is the chance that
some diagonal element is zero, that is, f (ei , ei ) = 0 for some ei ∈ B).

Theorem 7.38 Let (V, f ) be a real symmetric vector space. Then there is an f -
orthogonal basis B for V such that the matrix of f in terms of B has the following
form:
266 7 Bilinear and Quadratic Forms
⎡ ⎤
1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ 1 ⎥
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ .. ⎥
⎢ . ⎥.
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ 0 ⎥
⎢ ⎥
⎢ .. ⎥
⎣ . ⎦
0

7.10 Quadratic Forms Associated with Bilinear Forms

Let F be a field, V a finite dimensional vector space over F and B = {e1 , . . . , en } a


basis for V. As usual, we denote by X = [x1 , . . . , xn ]t the coordinate vector of any
v ∈ V in terms of B.
n
A quadratic form on V is any map q : V → F such that q(X ) = αi j xi x j ,
i, j=1
that is, q(X ) is a homogeneous polynomial of degree 2, involving the indeterminates
{x1 , . . . , xn }. This definition is strictly connected with the coordinates x1 , . . . , xn of
vectors of V, with respect to the given basis.
Here, we give an equivalent definition of a quadratic form, which is coordinate-
free:

Definition 7.39 Let q : V → F. Define a map f : V × V → F such that


 
1
q(u + v) − q(u) − q(v) when char (F) = 2
f (u, v) = 2 (7.12)
q(u + v) − q(u) − q(v) when char (F) = 2.

The map q is called a quadratic form on V if both the following conditions hold:
(1) the map (7.12) is bilinear;
(2) for any α ∈ F and v ∈ V, q(αv) = α 2 q(v).
It is easy to see that f is symmetric. The map f is called the symmetric bilinear form
associated with q.

Remark 7.40 For the rest of this section, we always assume that char (F) = 2.

Proposition 7.41 The map q : V → F is a quadratic form on V if and only if there


exists a bilinear form ϕ : V × V → F such that, for any v ∈ V , q(v) = ϕ(v, v).

Proof If q is a quadratic form on V, then ϕ is precisely the bilinear symmetric map


f of Definition 7.39. Next, we show that q(v) = f (v, v). Putting α = 0 and α =
7.10 Quadratic Forms Associated with Bilinear Forms 267

−1 in q(αv) = α 2 q(v), respectively, we get q(0) = 0 and q(−v) = q(v). Taking


u = −v in f (u, v), we obtain f (−v, v) = 21 (q(−v + v) − q(−v) − q(v)). As f
is bilinear, we get − f (v, v) = 21 (q(0) − q(v) − q(v)). After simplification, we get
q(v) = f (v, v).
Conversely, assume there is a bilinear form ϕ : V × V → F such that, for any
v ∈ V, q(v) = ϕ(v, v). Then, for any α ∈ F and v ∈ V, we have

q(αv) = ϕ(αv, αv) = α 2 ϕ(v, v) = α 2 q(v). (7.13)

Moreover, for any u, v ∈ V,


   
1
2
q(u + v) − q(u) − q(v) = 2 ϕ(u + v, u + v) − ϕ(u, u) − ϕ(v, v)
1


= ϕ(u, u) + ϕ(u, v) + ϕ(v, u) + ϕ(v, v)
1
2


−ϕ(u, u) − ϕ(v, v)

 
= ϕ(u, v) + ϕ(v, u) .
1
2

 (7.14)

Let now f : V × V → F be defined as f (u, v) = q(u + v) − q(u) − q(v) , for
1
2
any u, v ∈ V. By relation (7.14) it is clear that f (u, v) = f (v, u), that is, f is
symmetric, moreover, since ϕ is bilinear, so is also f . Thus, in light of Definition 7.39,
we conclude that q is a quadratic form on V, having f as its associated symmetric
bilinear form.
Remark 7.42 The previous result outlines the fact that the quadratic form q can be
expressed both in terms of its associated symmetric bilinear form f , and in terms of
any bilinear form ϕ having the property ϕ(v, v) = q(v) for all v ∈ V. Nevertheless,
ϕ is not required to be necessarily symmetric. On the other hand, in case we assume
ϕ symmetric, by (7.14) it follows that ϕ = f.
In other words, there is an unique symmetric bilinear form associated with q.
Definition 7.43 Let q : V → F be a quadratic form on V, B = {e1 , . . . , en } a basis
for V and ϕ be any bilinear form associated with q, that is, ϕ(v, v) = q(v) for any
v ∈ V. If C is the matrix of ϕ in terms of the basis B, then q(v) = ϕ(v, v) = v t Cv.
We say that C is a matrix associated with q in terms of the basis B.
In light of the previous remark, there is only one symmetric matrix associated
with a quadratic form in terms of a fixed basis.
Now the question that arises is how we can compute the symmetric form associated
with a given quadratic form. To provide an answer to this question, we prove the
following:
268 7 Bilinear and Quadratic Forms

Theorem 7.44 Let q : V → F be a quadratic form on V and B = {e1 , . . . , en } be


an ordered basis for V and A the matrix of q in terms of B. Then the (unique)
symmetric bilinear form associated with q is represented by the matrix 21 C + C t
with respect to the basis B, where C is the matrix of any bilinear form associated
with q.

Proof Let ϕ be any bilinear form associated with q, that is, ϕ(v, v) = q(v), for any
v ∈ V. If C is the matrix of ϕ in terms of the basis B, then q(v) = ϕ(v, v) = v t Cv.
Now we define the bilinear form ϕ̃ having matrix C t , that is, ϕ̃(v, w) = v t C t w for
any v, w ∈ V. So we may introduce the quadratic form associated with ϕ̃ as follows:
q̃(v) = v t C t v for any v ∈ V. Since both v t C t v and v t Cv are scalar elements of F,
it is clear that each of them coincides with its transpose. On the other hand, the
transpose of v t C t v is precisely v t Cv (and viceversa). Hence v t C t v = v t Cv, i.e.,
q(v) = q̃(v), for any v ∈ V and

1 t 1
2q(v) = q(v) + q̃(v) = v t (C + C t )v =⇒ q(v) = v (C + C t )v = v t (C + C t ) v,
2 2

where C + C t is a symmetric matrix which represents q with respect to the basis B.


Moreover, if we assume there is another symmetric matrix D associated with q,
then we have
1
q(v) = v t (C + C t ) v and q(v) = v t Dv for all v ∈ V,
2
 
which means v t 21 (C + C t ) − D v = 0, for any v ∈ V. For clearness, we write
 
1
2
(C + C t ) − D = E = αi j . Since D and 21 (C + C t ) are symmetric matrices, then
so E is. Thus, for any ei ∈ B, one has eit Eei = 0, implying αii = 0. Moreover, for
any i = j and ei , e j ∈ B, we also get

0 = (ei + e j )t E(ei + e j ) = eit Ee j + etj Eei = αi j + α ji = 2αi j .

Therefore, E = 0 and 21 (C + C t ) = D. Thus, the uniqueness of the symmetric form


associated with q is proved.
 
Example 7.45 Let q : R2 → R defined by q (x1 , x2 ) = x12 + 5x1 x2 + 2x22 in terms
of the canonical basis for R2 . By easy computations, we can see that each of the fol-
lowing bilinear forms can be associated with q :
 
ϑ : R2 × R2 → R, ϑ (x1 , x2 ), (y1 , y2 ) = x1 y1 + 2x1 y2 + 3x2 y1 + 2x2 y2 ;
 
η : R2 × R2 → R, η (x1 , x2 ), (y1 , y2 ) = x1 y1 + x1 y2 + 4x2 y1 + 2x2 y2 ;
 
ψ : R2 × R2 → R, ψ (x1 , x2 ), (y1 , y2 ) = x1 y1 + 5x1 y2 + 2x2 y2 .
7.10 Quadratic Forms Associated with Bilinear Forms 269

None of the above bilinear maps is symmetric. In order to obtain the symmetric form
associated with q, we may choose arbitrarily one of the above bilinear form and
compute its matrix in terms of the canonical basis for R2 . For instance, the matrix of
ϑ is    
12 13
A= , so that At =
32 22

and the only symmetric bilinear map ϕ associated with q has matrix
 5

1 1
(A + At ) = 5
2 ,
2 2
2

that is,
  5 5
ϕ (x1 , x2 ), (y1 , y2 ) = x1 y1 + x1 y2 + x2 y1 + 2x2 y2 .
2 2
Of course, we get the same result starting from the matrix of η or ψ.

7.11 The Matrix of a Quadratic Form and the Change of


Basis

Let q : V → F be a quadratic form, f : V × V → F abilinear  form associated with


q, B = {e1 , . . . , en } an ordered basis for V, and A = ai j the matrix of f in terms
of B. Thus, for any vectors X, Y ∈ V, we know that f (X, Y ) = X t AY. In particular,

q(X ) = f (X, X ) = X t AX,

where X is the column coordinate vector with respect to B. We say that A is the
matrix of q in terms of the basis B.
Moreover, if B  is another basis for V, different from B, it is known that f can be
represented also by the matrix A = P t A P in terms of B  , where P is the transition
matrix of B  relative to B.
As above, we say that A is the matrix of q with respect to the basis B  and

q(X  ) = X t A X  ,

where X  is the column coordinate vector in terms of the basis B  .


In summary: the quadratic form q can be represented by any matrix, which repre-
sents a bilinear form associated with q. In particular, two matrices A, A represent
q, in terms of two different bases for V, if and only if they are congruent.
As pointed out in the Remark 7.42, we may associate different bilinear forms to
the same quadratic form q. Hence, in general, once a basis for V has been established,
270 7 Bilinear and Quadratic Forms

we may represent q by different matrices, each of which is relative to a bilinear form


associated with q. Only one of these forms is symmetric.
Hence, there is a one-to-one correspondence between symmetric bilinear forms
on V and quadratic forms on V and knowing the quadratic form is equivalent to
knowing the corresponding bilinear form.
In particular, the matrix of a quadratic form coincides with the matrix of the
corresponding symmetric bilinear form and it can be obtained as follows: let B be a
basis for V, X = [x1 , . . . , xn ]t the generic column coordinate vector with respect to
n 
B and q : V → F a quadratic form defined as q(X ) = αi xi2 + αi j xi x j . Then
i=1 i< j
the matrix of q is ⎡ ⎤
α1 α12
2
· · · α21n
⎢ α12 α2 · · · α22n ⎥
⎢ 2 ⎥
A=⎢ . .. . . .. ⎥ .
⎣ .. . . . ⎦
α1n α2n
2 2
· · · αn

7.12 Diagonalization of a Quadratic Form

In light of the equivalence between quadratic and symmetric bilinear forms, let us
rephrase Theorems 7.37 and 7.38 as follows:

Theorem 7.46 Let F be an algebraically closed field of characteristic different from


2 and q : V → F a quadratic form on V. Then there is an f -orthogonal basis B =
{e1 , . . . , en } for V such that the matrix of q in terms of B has the following form:
⎡ ⎤
1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ 1 ⎥
A=⎢

⎥.

⎢ 0 ⎥
⎢ .. ⎥
⎣ . ⎦
0

In other words, if r is the rank of the matrix of q, then it follows that, for any X ∈ V,
having coordinate vector [x1 , . . . , xn ]t in terms of B,
7.12 Diagonalization of a Quadratic Form 271

q(X ) = X t AX
⎡ ⎤
1
⎢ .. ⎥⎡ ⎤
⎢ . ⎥
⎢ ⎥ x1
⎢ 1 ⎥⎢ . ⎥
= [x1 , . . . , xn ] ⎢

⎥⎣ . ⎦
⎥ .
⎢ 0 ⎥
⎢ .. ⎥ xn
⎣ . ⎦
0

= x12 + · · · + xr2 .

Theorem 7.47 Let q : V → R be a real quadratic form on V. Then there is an f -


orthogonal basis B = {e1 , . . . , en } for V such that the matrix of q in terms of B has
the following form: ⎡ ⎤
1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ 1 ⎥
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ . .. ⎥
A=⎢ ⎥.
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ 0 ⎥
⎢ ⎥
⎢ .. ⎥
⎣ . ⎦
0

In other words, if r is the rank of the matrix of q, there exists an integer 0 ≤ p ≤ r


such that, for any X ∈ V, having coordinate vector [x1 , . . . , xn ]t in terms of B :

q(X ) = X t AX
⎡ ⎤
1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ 1 ⎥
⎢ ⎥⎡ ⎤
⎢ −1 ⎥ x1
⎢ ⎥
⎢ .. ⎥ ⎢ .. ⎥
= [x1 , . . . , xn ] ⎢ . ⎥⎣ . ⎦
⎢ ⎥
⎢ −1 ⎥ xn
⎢ ⎥
⎢ 0 ⎥
⎢ ⎥
⎢ .. ⎥
⎣ . ⎦
0

= x12 + · · · + x 2p − x 2p+1 − · · · − xr2 .


272 7 Bilinear and Quadratic Forms

7.13 Definiteness of a Real Quadratic Form

Here, we shall still process some properties of congruent matrices. To do this, we


need the following lemmas:
Lemma 7.48 Let A, P be two n × n matrices having coefficients in a field F. If P
is invertible, then A and A P have the same rank.

Proof Each of A, P, A P represents a linear operator on the vector space Fn . We


will discuss the range of any such operator. Without loss of generality, we will refer
to these ranges as R(A), R(P), R(A P).
Let X ∈ Fn be such that X ∈ R(A P). Hence, there is Y ∈ Fn such that X =
(A P)Y = A(PY ), implying that X ∈ R(A). Conversely, let now X ∈ Fn be such
that X ∈ R(A), that is, X = AZ for some Z ∈ Fn . Since P is invertible, the linear
operator represented by P is an isomorphism. Therefore, there is precisely one vector
Y ∈ Fn such that Z = PY, so X = AZ = A PY and X ∈ R(A P).
Hence, the image of A is equal to the image of A P. Both of them are generated
by the linearly F-independent columns in A and A P, respectively. Since the number
of linearly independent columns in a matrix is precisely its rank, we get the required
conclusion.

Lemma 7.49 Let A, P be two n × n matrices having coefficients in a field F. If P


is invertible, then A and P A have the same rank.

Proof As above, we consider the linear operators represented by A, P, P A. Now,


we will discuss also the kernel of any such operator and refer to these kernels as
N (A), N (P), N (P A).
Let X ∈ Fn be such that X ∈ N (A), that is, AX = 0 and a fortiori P AX = 0,
which implies X ∈ N (P A). Conversely, let X ∈ Fn be such that X ∈ N (P A), that
is (P A)X = 0. Since P is invertible, we may left multiply the previous relation by
P −1 , having AX
 = 0, that is, X ∈ N(A). Therefore N (A)
 = N (P  A). On the other

hand n = dim N (A)  + dim R(A)
 and also
 n = dim N (P A) + dim R(P A) ,
implying that dim R(A) = dim R(P A) . As in the previous lemma, we conclude
that the rank of A is equal to the rank of P A.

Lemma 7.50 Let A, A be two n × n matrices having coefficients in a field F. If


A, A are congruent, then they have the same rank.

Proof By our hypothesis, there exists an invertible n × n matrix P with coefficients


in F, such that A = P t A P. Since P t is invertible, and by Lemma 7.49, the matrices
A and A P have the same rank. Moreover, using Lemma 7.48, the rank of A P is
equal to the rank of A , as desired.

Definition 7.51 Let q : V → R be a real quadratic form, associated with the


symmetric bilinear form f. The ordered pair ( p, r − p) such that, for any X =
(x1 , . . . , xn ) ∈ Rn , q(X ) = x12 + · · · + x 2p − x 2p+1 − · · · − xr2 with respect to an
appropriate f -orthogonal basis for V, is called the signature of q.
7.13 Definiteness of a Real Quadratic Form 273

In a similar way, we give the following:

Definition 7.52 Let A be a real n × n symmetric matrix, Q a real n × n invertible


matrix such that Q t AQ = D is a diagonal matrix of the form
⎡ ⎤
1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ 1 ⎥
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ .. ⎥
D=⎢ . ⎥
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ 0 ⎥
⎢ ⎥
⎢ .. ⎥
⎣ . ⎦
0

having p ones and r − p negative ones on the main diagonal, where r is the rank of
A (recall that A and D have the same rank, since they are congruent). The ordered
pair ( p, r − p) is called the signature of A.

Remark 7.53 Let D be a real n × n diagonal matrix having the form


⎡ ⎤
α12
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ α 2p ⎥
⎢ ⎥
⎢ −α 2p+1 ⎥
⎢ ⎥
⎢ .. ⎥
D=⎢ . ⎥,
⎢ ⎥
⎢ −αr2 ⎥
⎢ ⎥
⎢ 0 ⎥
⎢ ⎥
⎢ .. ⎥
⎣ . ⎦
0

where αi ∈ R, for any i = 1, . . . , r. The signature of D is precisely ( p, r − p).


In fact, if q : V → R is the real quadratic form represented by D in terms of a
basis B, then, by Theorem 7.47, there is an orthogonal basis B  = {e1 , . . . , en } for
V such that the matrix of q in terms of B  has the following form:
274 7 Bilinear and Quadratic Forms
⎡ ⎤
1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ 1 ⎥
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
 ⎢ .. ⎥
D =⎢ . ⎥
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ 0 ⎥
⎢ ⎥
⎢ .. ⎥
⎣ . ⎦
0

having p positive ones and r − p negative ones on the main diagonal. In other words,
there exists an invertible matrix P, that is, the transition matrix of B  relative to B,
such that P t D P = D  .

Definition 7.54 Let q : V → R be a real quadratic form, associated with the sym-
metric bilinear form f. The quadratic form q is called:
(1) Positive definite if q(X ) > 0 for all nonzero vectors X ∈ V ; in this case, the
signature is (n, 0).
(2) Negative definite if q(X ) < 0 for all nonzero vectors X ∈ V ; in this case, the
signature is (0, n).
(3) Indefinite if it is neither positive-definite nor negative-definite, in the sense that
q takes on V both positive and negative values; in this case, the signature is
( p, r − p) for p = r.
(4) Positive semi-definite if q(X ) ≥ 0 for all X ∈ V, but there is some nonzero vector
X 0 ∈ V so that q(X 0 ) = 0; in this case, the signature is (r, 0).
(5) Negative semi-definite if q(X ) ≤ 0 for all X ∈ V, but there is some nonzero
vector X 0 ∈ V so that q(X 0 ) = 0; in this case, the signature is (0, r ).

Analogously, we may introduce the following:


Definition 7.55 A symmetric matrix A ∈ Mn (R) is called:
(1) Positive definite if X t AX > 0 for all nonzero vectors X ∈ Rn ; in this case, the
signature of A is (n, 0).
(2) Negative definite if X t AX < 0 for all nonzero vectors X ∈ Rn ; in this case, the
signature is (0, n).
(3) Indefinite if it is neither positive-definite nor negative-definite, in the sense that
X t AX takes both positive and negative values, depending on the choice of X ∈
Rn ; in this case, the signature is ( p, r − p), for p = r.
(4) Positive semi-definite if X t AX ≥ 0 for all X ∈ Rn , but there is some nonzero
vector X 0 ∈ Rn so that X 0t AX 0 = 0; in this case, the signature is (r, 0).
(5) Negative semi-definite if X t AX ≤ 0 for all X ∈ Rn , but there is some nonzero
vector X 0 ∈ Rn so that X 0t AX 0 = 0; in this case, the signature is (0, r ).
7.13 Definiteness of a Real Quadratic Form 275

Remark 7.56 Let (V, f ) be a finite dimensional metric space over the field R. If f is
a symmetric bilinear form on V, having the additional property that f (v, v) ≥ 0, for
any v ∈ V and f (v, v) = 0 if and only if v = 0, then we say that (V, f ) is an inner
product vector space. This is equivalent to say that the quadratic form q associated
with f is positive definite.

Lemma 7.57 Let D, E be two real n × n diagonal matrices. If D, E are congruent,


then they have the same rank and signature.

Proof Since D and E are congruent, they represent the same real quadratic form
q : V → R, in terms of two different bases for V. The fact that D and E have the
same rank is proved in Lemma 7.50, namely, rank(D) = rank(E) = r ≤ n. Thus,
there exists a basis B = {e1 , . . . , en } for V such that the matrix of q in terms of B is
⎡ ⎤
α12
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ α 2p ⎥
⎢ ⎥
⎢ −α 2p+1 ⎥
⎢ ⎥
⎢ .. ⎥
D=⎢ . ⎥,
⎢ ⎥
⎢ −αr2 ⎥
⎢ ⎥
⎢ 0 ⎥
⎢ ⎥
⎢ .. ⎥
⎣ . ⎦
0

where αi ∈ R, for any i = 1, . . . , r. Simultaneously, there exists a basis B  =


{e1 , . . . , en } for V such that the matrix of q in terms of B  is
⎡ ⎤
β12
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ βs2 ⎥
⎢ ⎥
⎢ −βs+1
2 ⎥
⎢ ⎥
⎢ .. ⎥
E =⎢ . ⎥,
⎢ ⎥
⎢ −βr2 ⎥
⎢ ⎥
⎢ 0 ⎥
⎢ ⎥
⎢ .. ⎥
⎣ . ⎦
0

where βi ∈ R, for any i = 1, . . . , r. Hence, the pair ( p, r − p) is the signature of D


and the pair (s, r − s) is the signature of E (see Remark 7.53). Our aim is to prove
that p = s. By our assumptions, we have
276 7 Bilinear and Quadratic Forms

q(ei ) > 0 for all i = 1, . . . , p;


q(ei ) < 0 for all i = p + 1, . . . , r ;
q(ei ) = 0 for all i = r + 1, . . . , n;
q(ei ) > 0 for all i = 1, . . . , s;
q(ei ) < 0 for all i = s + 1, . . . , r ;
q(ei ) = 0 for all i = r + 1, . . . , n.


Consider now the set S={e1 . . . , e p , es+1 , . . . , en } and suppose there exist λ1 , . . . , λ p ,
λ1 , . . . , λn−s ∈ R such that

λ1 e1 + · · · + λ p e p + λ1 es+1

+ · · · + λn−s en = 0. (7.15)

If we denote u = λ1 e1 + · · · + λ p e p = −(λ1 es+1



+ · · · + λn−s en ) ∈ V , then we
have both
  
p

p
q(u) = q λ1 e1 + · · · + λ p e p = λi2 q(ei ) = λi2 αi2 ≥ 0
i=1 i=1

and
  
n−s r −s

q(u) = q −λ1 es+1

− ··· − λn−s en = 
λi2 q(es+i )=− λi2 βs+i
2
≤ 0.
i=1 i=1

This implies that q(u) = 0. Therefore,


p r −s

λi2 αi2 = λi2 βs+i
2
= 0.
i=1 i=1

That is,

λi = 0 for all i = 1, . . . , p and λj = 0 for all j = 1, . . . , r − s.

Thus, the relation (7.15) reduces to

λ1 es+1

+ · · · + λn−s en = 0

implying
λj = 0 for all j = 1, . . . , n − s.

Hence, S is a linearly independent set, the vector subspaces


e1 , . . . , e p and


es+1 , . . . , en of V are direct summands, so that the dimension of the vector subspace
spanned by S is equal to p + n − s ≤ n, i.e., p ≤ s.
7.13 Definiteness of a Real Quadratic Form 277

Symmetrically, let S  = {e1 . . . , es , e p+1 , . . . , en }. By the above arguments, one


may prove that S  is a linearly independent set, the vector subspaces
e1 , . . . , es
and
e p+1 , . . . , en of V are direct summands, so that the dimension of the vector
subspace spanned by S  is equal to s + n − p ≤ n, i.e., s ≤ p. In summary, p = s,
as required.

Theorem 7.58 Let A, A be two real n × n symmetric matrices. A, A are congru-


ent if and only if they have the same signature.

Proof We firstly recall that A, A are congruent if and only if they represent the
same real quadratic form q : V → R, in terms of two different bases for V (see
Theorem 7.7). Moreover, Remark 7.33 assures that any symmetric matrix, over a
field of characteristic different from 2, is congruent to a diagonal matrix.
In light of these comments, there exist invertible n × n real matrices M, P, Q and
diagonal n × n real matrices D, D  such that the following relations hold simultane-
ously:
A = M t A M, A = P t D  P, A = Q t D Q.

Of course, by Remark 7.53, we may assume that D and D  represent the signature
of A and A , respectively. Therefore, by
 
Q t D Q = A = M t A M = M t P t D  P M,

it follows
 t  
D = (Q t )−1 M t P t D  P M Q −1 = P M Q −1 D  P M Q −1 .

Hence, the diagonal matrices D, D  are congruent and, by Lemma 7.57, they have
the same signature.
We now prove the other direction of this theorem and suppose that A, A have the
same signature. Thus, there exists a diagonal n × n real matrix D (representing the
common signature of A and A ) and two invertible n × n real matrices P, Q such
that D = P t A P = Q t A Q. Hence
 t  
A = (Q t )−1 P t A P Q −1 = P Q −1 A P Q −1 ,

i.e., A and A are congruent.

Theorem 7.59 Let A be a real n × n symmetric matrix having signature ( p, r − p).


Then
(i) The number of positive eigenvalues of A is equal to p.
(ii) The number of negative eigenvalues of A is equal to r − p.

Proof Since the signature of A is ( p, r − p), A is congruent to a diagonal matrix D


having the form
278 7 Bilinear and Quadratic Forms
⎡ ⎤
1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ 1 ⎥
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ .. ⎥
D=⎢ . ⎥
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ 0 ⎥
⎢ ⎥
⎢ .. ⎥
⎣ . ⎦
0

with p positive ones and r − p negative ones on the main diagonal. More precisely,
there exists an invertible n × n real matrix Q such that D = Q t AQ.
On the other hand, since A is symmetric, it is orthogonally similar to a diagonal
matrix D  having the eigenvalues {λ1 , . . . , λn } of A on the main diagonal, that is,
there exists an orthogonal real n × n matrix P such that
⎡ ⎤
λ1
⎢ .. ⎥
⎢ . ⎥
P t A P = D = ⎢

⎥.

⎣ .. ⎦
.
λn

By using the eigenvalues {λ1 , . . . , λn } of A, we now construct the following diagonal


matrix ⎡ ⎤
μ1
⎢ .. ⎥
⎢ . ⎥
C =⎢ ⎢ ⎥,
. .. ⎥
⎣ ⎦
μn

where
 √1
λi
if λi > 0
μi = √1
−λi
if λi < 0
1 if λi = 0.

Easy computations show that


⎡ ⎤
η1
⎢ .. ⎥
⎢ . ⎥
CD C = CD C = ⎢


 t ⎥ = D  ,

⎣ .. ⎦
.
ηn
7.13 Definiteness of a Real Quadratic Form 279

where ⎧ ⎫
⎨ 1 if λi > 0 ⎬
ηi = μi2 λi = −1 if λi < 0 .
⎩ ⎭
0 if λi = 0

Hence, if we assume that A has rank equal to t ≤ n, s positive eigenvalues and t − s


negative eigenvalues, the signature of D  is precisely (s, t − s). On the other hand,
since P is orthogonal and C is diagonal invertible, we also have A = P D  P t and
D = C −1 D  (Ct )−1 , so that D = Q t AQ = Q t P D  P t Q = Q t PC −1 D  (C t )−1 P t Q
t
= (C t )−1 P t Q D  (C t )−1 P t Q . Therefore, D, D  are congruent, having the same
rank and signature (see Lemma 7.57), that is, r = t, p = s and r − p = t − s, as
required.

Corollary 7.60 A real n × n symmetric matrix A is positive definite if and only if


any eigenvalue of A is positive. Analogously, let q : V → R be a real quadratic form
and A be the matrix of q, in terms of a fixed basis for V. Then q is positive definite
if and only if any eigenvalue of A is positive.

We conclude this chapter by studying the relationship between the principal subma-
trices of a matrix associated with a quadratic form q and the definiteness of q. Here,
we recall the following well-known objects:
Definition 7.61 Let A be a n × n matrix over F, namely
⎡ ⎤
a11 a12 · · · a1n
⎢ .. .. .. .. ⎥
⎢ . . . . ⎥
A=⎢
⎢ .. ..

.. .. ⎥ .
⎣ . . . . ⎦
an1 an2 · · · ann

The k × k principal submatrix of A is the square submatrix of A that is obtained by


taking the elements common to the first k rows and columns of A. In other words, a
matrix of this form arises from A by omission of the last n − k rows and columns.
A principal minor of A is the determinant of any principal submatrix of A. In other
words: ⎡ ⎤
⎡ ⎤ a11 a12 · · · a1k
  a11 a12 a13 ⎢ .. .. .. .. ⎥
  a11 a12 ⎢ . . . . ⎥
a11 , , a21 a22 a23 , . . . , ⎢
⎣ ⎦
⎢ .. .. .. .. ⎥

a21 a22 ⎣
a31 a32 a33 . . . . ⎦
ak1 ak2 · · · akk

for any k ≤ n; are the principal submatrices of A, and


280 7 Bilinear and Quadratic Forms
' '
' a11 a12 · · · a1k ''
' ' '
' ' ' a11 a12 a13 ' ' .. .. .. .. '
'a a12 '' '' ' ' . . . . ''
a11 , '' 11 , a a a , . . . , '' .
'
a21 a22 ' '' 21 22 23 '' ' ..
.. .. .. '
a31 a32 a33 ' . . . ''
' ak1 ak2 · · · akk '

for any k ≤ n; are its principal minors.

Theorem 7.62 Let A be the symmetric n × n matrix of the quadratic form q : V →


R. Then q is positive definite if and only if any principal minor of A is positive.

Proof Firstly, we assume that any principal minor of A is positive and show, by
induction on the dimension n, that q is positive definite. Remind that this is equivalent
to say that any eigenvalue of A is positive.
Of course, in case n = 1, the conclusion is trivial, because A = (a11 ), where
a11 > 0 and q(X ) = a11 x12 > 0, for any X ∈ V, which has its coordinate vector
[X ], where [X ] = [x1 ]t .
Thus, we assume that the result holds for any quadratic form q  : W → R, where
W is a real space of dimension m < n. In other words, we suppose that if any principal
minor of the symmetric real matrix associated with q  is positive, then q  is positive
definite.
Since a11 is the principal minor of order 1, then a11 > 0. By using this element,
we may apply the orthogonalization process for symmetric spaces. We compute the
following coefficients:
a12 a13 a1i
β12 = β13 = . . . . . . β1i = for all i ≥ 2
a11 a11 a11

and obtain the matrix


⎡ ⎤
1 −β12 −β13 ··· −β1n
⎢0 1 0 ··· 0 ⎥
⎢ ⎥
⎢ . .. .. ⎥
⎢0 0 . ⎥
⎢ ⎥
U =⎢. .. .. ⎥
⎢ .. . . ⎥
⎢ ⎥
⎢. .. ⎥
⎣ .. . 0 ⎦
0 0 0 1

such that  
a11 01,n−1
U t AU = (7.16)
0n−1,1 B

where

01,n−1 = (0, . . . , 0), 0n−1,1 = 01,n−1


t
and B ∈ Mn−1 (R) is symmetric.


(n−1)−times
7.13 Definiteness of a Real Quadratic Form 281

Notice that, since U is upper unitriangular, U t is lower unitriangular, in the sense


that it is a lower triangular matrix having all diagonal coefficients equal to 1 :
⎡ ⎤
1 0 0 ··· 0
⎢ −β12 1 0 · · · 0⎥
⎢ ⎥
⎢ .. ⎥
⎢ −β13 0 . . . .⎥
⎢ ⎥
U =⎢ .
t
.. .. ⎥ .
⎢ .. . .⎥
⎢ ⎥
⎢ . .. ⎥
⎣ .. . 0⎦
−β1n 0 0 1

This matrix represents a finite number of elementary row operations. If we denote


by Ri = [ai1 , . . . , ain ] the row coordinate vector consisting of the elements from the
i-th row of A, we may write ⎡ ⎤
R1
⎢ R2 ⎥
⎢ ⎥
⎢ .. ⎥
A=⎢ ⎢ . ⎥.

⎢ .. ⎥
⎣ . ⎦
Rn

Analogously, we denote by Ri the row coordinate vector consisting of the elements
from the i-th row of the product U t A. By easy computations, we see that
⎡ ⎤
R1
⎢ R2 − β12 R1 ⎥
⎢ ⎥
⎢ ⎥
U A = ⎢ R3 − β13 R1 ⎥ .
t
⎢ .. ⎥
⎣ . ⎦
Rn − β1n R1

In other words, R1 = R1 and, for any k ≥ 2, Rk = Rk − β1k R1 . Therefore, any k × k
principal submatrix of U t A is obtained by the corresponding k × k principal subma-
trix of A after using a finite number of elementary row operations. In particular, the
operation is the addition of a multiple of a row to another. It is well known that this
type of basic operation has no effect on the determinant. Hence any k × k principal
minor of U t A is equal to the corresponding k × k principal minor of A.
By using the same argument, one can see that the matrix U t (AU ) has the same
principal minors of AU. On the other hand, since any matrix has the same principal
minors of its transpose matrix, AU and U t At have the same principal minors. In
summary, U t AU, AU, U t At , At and A have the same principal minors.
Looking at (7.16), it is clear that any principal minor of U t AU is the product of
a11 > 0 with a principal minor of B. Therefore, any principal minor of B should be
positive. On the other hand, B is a symmetric matrix of dimension n − 1, therefore,
282 7 Bilinear and Quadratic Forms

by induction hypothesis, the eigenvalues {λ1 , . . . , λn−1 } of B are positive. Thus, the
eigenvalues {a11 , λ1 , . . . , λn−1 } of U t AU are positive, so that q is positive definite.
Suppose now that q is positive definite. Let Ak be any principal submatrix of
A of dimension k and Y = (α1 , . . . , αk ) ∈ Rk . Hence, the coordinate vector of Y
is [Y ] = [α1 , . . . , αk ]t . Further, let X ∈ V such that its coordinate vector is [X ] =
[α1 , . . . , αk , 0, . . . , 0 ]t ∈ Rn . Since q is positive definite, X t AX > 0. That is,


(n−k)−times

⎡ ⎤
α1
⎢ .. ⎥
⎢ ⎥
 ⎢ . ⎥
Ak Bk,n−k ⎢ ⎥
⎢ αk ⎥
X AX = [α1 , . . . , αk , 0, . . . , 0]
t
Cn−k,k E n−k,n−k ⎢ 0
⎢ ⎥

⎢ . ⎥
⎣ .. ⎦
0
⎡ ⎤
α1
⎢ ⎥
= [α1 , . . . , αk ]Ak ⎣ ... ⎦ > 0,
αk

where Bk,n−k ∈ Mk,n−k (R), Cn−k,k ∈ Mn−k,k (R) and E n−k,n−k ∈ Mn−k,n−k (R).
By the arbitrariness of Y = (α1 , . . . , αk ), it follows that Y t Ak Y > 0 for any vector
Y ∈ Rk , that is, Ak ∈ Mk (R) is the matrix of a positive definite quadratic form on
Rk . Therefore, any eigenvalue of Ak is positive, and hence the determinant of Ak is
positive because the determinant of Ak is equal to the product of all eigenvalues of
Ak . Thus, any principal minor of A is positive.

Exercises

1. Let f : R4 × R4 → R be a symmetric form associated with the matrix


⎡ ⎤
0 1 1 1
⎢1 0 1 1⎥
⎢ ⎥
⎣1 1 0 1⎦
1 1 1 0

in terms of the canonical basis for R4 . Determine an orthogonal basis for R4 in


terms of which the matrix of f is diagonal.
2. Let f : R4 × R4 → R be the symmetric form defined as
7.13 Definiteness of a Real Quadratic Form 283
 
f (x1 , x2 , x3 , x4 ), (y1 , y2 , y3 , y4 ) = 2x1 y1 − 4x1 y3 + x1 y4 − 2x2 y2 − 2x2 y3
−4x2 y4 − 4x3 y1 − 2x3 y2 + x4 y1 − 4x4 y2
+x4 y4 .

in terms of the canonical basis for R4 . Determine an orthogonal basis for R4 in


terms of which the matrix of f is diagonal.
3. Let f : C3 × C3 → C be a symmetric form associated with the matrix
⎡ ⎤
112
⎣1 2 3⎦
233

in terms of the canonical basis for C3 . Determine an orthogonal basis for C3 in


terms of which the matrix of f is the 3 × 3 identity matrix.
4. Let f : Rn × Rn → R be a non-degenerate bilinear form. Prove that there exist
nonzero f -isotropic vectors if and only if f is neither positive nor negative
definite.
5. Let A be the symmetric n × n matrix of the quadratic form q : V → R. Let Ak
be the k × k principal submatrix of A, obtained by taking the elements common
to the first k rows and columns of A. Prove that q is negative definite if and only
if, for any k = 1, . . . , n, (−1)k det (Ak ) > 0.
6. Which of the following functions f on C2 × C2 are bilinear forms? For α =
(x1 , x2 ), β = (y1 , y2 ) in C2 ;
(a) f (α, β) = 1;
(b) f (α, β) = x1 + x2 + y1 + y2 ;
(c) f (α, β) = x1 y1 + x2 y2 ;
(d) f (α, β) = x1 y1 + x2 y2 ;
(e) f (α, β) = x1 y1 + x2 y2 + 2x2 y1 + 3x2 y2 ;
( f ) f (α, β) = x1 2 + y1 2 + x1 y1 .
7. For each of the following symmetric bilinear forms over R, find the associated
quadratic form:
(a) f (X, Y ) = x1 y1 + x1 y2 + x2 y1 on R2 , where X = (x1 , x2 ), Y = (y1 , y2 );
(b) f (X, Y ) = −2x1 y2 + x2 y2 − x3 y3 + 4x1 y2 + 4x2 y1 on R3 , where X = (x1 ,
x2 , x3 ), Y = (y1 , y2 , y3 );
(c) f (X, Y ) = x1 y2 + x2 y1 on R2 , where X = (x1 , x2 ), Y = (y1 , y2 ).
8. For each of the following quadratic forms, find the corresponding symmetric
bilinear form f :
(a) q(X ) = x12 + x22 − x32 on R3 , where X = (x1 , x2 , x3 );
(b) q(X ) = ax1 x2 + bx1 x3 + cx32 on R3 , where X = (x1 , x2 , x3 ); a, b, c ∈ R;
(c) q(X ) = x22 on R4 , where X = (x1 , x2 , x3 , x4 );
(d) q(X ) = 2x1 x2 − x22 on R2 , where X = (x1 , x2 );
(e) q(X ) = x1 x2 + x2 x3 + x3 x4 + x4 x1 on R4 , where X = (x1 , x2 , x3 , x4 ).
9. Let V be the real vector space of all 2 × 2 (complex) Hermitian matrices, i.e.,
2 × 2 complex matrices A = (ai j ) which satisfy ai j = a ji for all i, j = 1, 2.
284 7 Bilinear and Quadratic Forms

(a) Show that the equation q(A) = det A defines a quadratic form q on V.
(b) Let W be the subspace of V of matrices of trace 0. Show that the bilinear
form f determined by q is negative
 definite on the subspace W.
0b
10. Prove that matrix A = over a field F with char (F) = 2, is congruent to
b0
 
 2b 0 
A = . Further, find a nonsingular matrix P over F such that A =
0 − b2
P t A P.
Chapter 8
Sesquilinear and Hermitian Forms

In the previous chapter, we have studied and characterized the maps V × V → F,


having the property of being linear in each of their arguments, where V is a vector
space over the field F. However, not all forms of interest are bilinear. For example, on
complex vector spaces, one may introduce functions that provide an extra complex
conjugation. In this regard, we now consider a somewhat larger class of functions
than the bilinear one.

8.1 Sesquilinear Forms and Their Matrices

Let C be the complex field, V, W complex vector spaces and V × W the cartesian
product of V and W (as sets). A function f : V × W → C is called sesquilinear
if it is conjugate-linear (complex semi-linear) in the first variable and linear in the
second one, that is

f (α1 v1 + α2 v2 , w) = α1 f (v1 , w) + α2 f (v2 , w) (8.1)

and
f (v, β1 w1 + β2 w2 ) = β1 f (v, w1 ) + β2 f (v, w2 ) (8.2)

for any α1 , α2 , β1 , β2 ∈ C, v1 , v2 ∈ V and w1 , w2 ∈ W. A sesquilinear function


f : V × W → C is usually called a sesquilinear form on V × W.

Example 8.1 The inner product f : Cn × Cn −→ C defined as


  n
f (x1 , . . . , xn ), (y1 , . . . , yn ) = xi yi
i=1

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 285
M. Ashraf et al., Advanced Linear Algebra with Applications,
https://doi.org/10.1007/978-981-16-2167-3_8
286 8 Sesquilinear and Hermitian Forms

is a sesquilinear form on Cn × Cn .
Example 8.2 Let A be a complex n × n matrix. The function f : Cn × Cn −→ C
defined as f (X, Y ) = X ∗ AY is a sesquilinear form on Cn × Cn , where X, Y ∈ Cn
and X ∗ denotes the transpose conjugate of the vector X , i.e., X ∗ = (X )t . Here the
vectors X and Y have been identified by column matrices with complex entries.
Let f : V × W → C be a sesquilinear form on V × W, where V and W are finite
dimensional complex vector spaces. Let B = {b1 , . . . , bn } and C = {c1 , . . . , cm }
be ordered bases for V and W, respectively. Let [v] B and [w]C be the coordinate
vectors of v ∈ V in terms of B and w ∈ W in terms of C, respectively. Say [v] B =

n 
m
[x1 , . . . , xn ]t and [w]C = [y1 , . . . , ym ]t , i.e., v = xi bi and w = y j c j . We have
i=1 j=1

n 
m 
f (v, w) = f ( xi bi , yjcj) = xi y j f (bi , c j ).
i=1 j=1 i, j

 
If we consider the coefficients matrix A = ai j , where ai j = f (bi , c j ), for any
i = 1, . . . , n and j = 1, . . . , m, then it is easy to see that

f (v, w) = [v]∗B A[w]C .


 
The n × m matrix A = ai j is said to be the matrix of the sesquilinear form f with
respect to the ordered bases B and C.
Conversely, let A ∈ Mnm (C) and let, as above, [v] B , [w]C be the coordinate vectors
of v ∈ V, in terms of B, and w ∈ W, in terms of C, respectively. If we define the
function f : V × W → F such that f (v, w) = [v]∗B A[w]C , then by computations it
follows that f (v, w) = xi y j ai j ,, i.e., f is a sesquilinear map on the set V × W.
i, j
Moreover, using an analogous argument to that of the matrix of a bilinear form,
one can see that there is an unique n × m complex matrix, which represents the
sesquilinear form with respect to the given ordered bases B and C.
Example 8.3 Let f : C3 × C4 → C be a sesquilinear form on C3 × C4 . Let B =
{b1 , b2 , b3 } and C = {c1 , c2 , c3 , c4 } be bases for C3 and C4 , respectively. Let [X ] B
and [Y ]C be the coordinate vectors of v ∈ V in terms of B and w ∈ W in terms of C,

3
respectively. Say [X ] B = [x1 , x2 , x3 ]t and [Y ]C = [y1 , y2 , y3 , y4 ]t , i.e., v = xi bi
i=1

4
and w = y j c j . Let
j=1


3 
4
f (v, w) = f ( xi bi , yjcj)
i=1 j=1
= i x1 y1 + x1 y2 + (2 + i)x1 y3 + (1 + i)x1 y4 − i x2 y1 + x2 y2 + x2 y3
+(1 − i)x2 y4 + 3i x3 y1 + 2x3 y2 + 2x3 y3 + 3x3 y4 .
8.1 Sesquilinear Forms and Their Matrices 287

This implies that f (b1 , c1 ) = i, f (b1 , c2 ) = 1, . . . , f (b3 , c4 ) = 3. In light of the


above comments, the matrix of f in terms of bases B and C for C3 and C4 respectively,
is the following: ⎡ ⎤
i 1 2+i 1+i
⎣ −i 1 1 1 − i ⎦ .
3i 2 2 3

As in the case of bilinear forms, we introduce the concept of the quadratic form
associated with a sesquilinear form f. More precisely:
Definition 8.4 Let f : V × V → C be a sesquilinear form. The corresponding
quadratic function Q : V → C is defined as Q(v) = f (v, v), for any v ∈ V.
By relations (8.1) and (8.2), it follows that the corresponding quadratic form Q(v) =
f (v, v) satisfies the following:

Q(u + v) + Q(u − v) = 2Q(u) + 2Q(v) (8.3)

and
Q(λu) = |λ|2 Q(u) (8.4)

for any λ ∈ C, u ∈ V.
Moreover, relation (8.3) yields

Q(u + v) = Q(u) + Q(v) + f (u, v) + f (v, u) (8.5)

for any u, v ∈ V.
Unlike the situation, we have previously described in the case of bilinear forms,
here we may prove the following:
Proposition 8.5 Let f : V × V → C be a sesquilinear form and Q : V → C the
quadratic form defined as Q(v) = f (v, v), for any v ∈ V. Then f is the unique
sesquilinear form corresponding to Q.
Proof In (8.5) we replace v by iv, so it follows that

Q(u + iv) = Q(u) + Q(iv) + f (u, iv) + f (iv, u) = Q(u) + Q(v) + i f (u, v) − i f (v, u).
(8.6)
Multiplying (8.6) by i we get

i Q(u + iv) = i Q(u) + i Q(v) − f (u, v) + f (v, u). (8.7)

Finally, a comparison of (8.5) and (8.7) yields

i Q(u + iv) − Q(u + v) = (i − 1)Q(u) + (i − 1)Q(v) − 2 f (u, v)

that is
288 8 Sesquilinear and Hermitian Forms

2 f (u, v) = Q(u + v) − i Q(u + iv) + (i − 1)Q(u) + (i − 1)Q(v)

so that

1 i
f (u, v) = Q(u + v) − Q(u) − Q(v) − Q(u + iv) − Q(u) − Q(v) .
2 2
(8.8)
Therefore, f is uniquely determined by the function Q.
Remark 8.6 To underline the difference between bilinear and sesquilinear cases,
we recall that if Q is a quadratic form associated with a bilinear function f , then f
is uniquely determined only if it is symmetric.
In order to extend the concept of symmetric forms to complex vector spaces, we
introduce the following:
Definition 8.7 Let C be the complex field and V a complex vector space. A sesquilin-
ear form f : V × V → C is called Hermitian if

f (v, w) = f (w, v)

for any v, w ∈ V.
Example 8.8 Let f : C3 × C3 → C be a sesquilinear form on C3 × C3 . Let B =
{b1 , b2 , b3 } and C = {c1 , c2 , c3 } be ordered bases for C3 . Next suppose that [X ] B
and [Y ]C be the coordinate vectors of v ∈ V and w ∈ V in terms of B and C,

3
respectively. Say [X ] B = [x1 , x2 , x3 ]t and [Y ]C = [y1 , y2 , y3 ]t , i.e., v = xi bi and
i=1

3
w= y j c j . Let
j=1


3 
3
f (v, w) = f ( xi bi , yjcj)
i=1 j=1
= x1 y1 + i x1 y2 + (2 + i)x1 y3 − i x2 y1 + x2 y2 + (1 + i)x2 y3
+(2 − i)x3 y1 + (1 − i)x3 y2 + 2x3 y3 .

Hence, f (b1 , c1 ) = 1, f (b1 , c2 ) = i, . . . , f (b3 , c3 ) = 2. The matrix of f in terms


of ordered bases B and C for C3 is the following:
⎡ ⎤
1 i 2+i
⎣ −i 1 1 +i ⎦.
2−i 1−i 2

Easy computations show that f is Hermitian (as well as its associated matrix).
In light of the above comments regarding the matrix of a sesquilinear form, it is clear
that any Hermitian form on a complex vector space V is represented by a complex
8.1 Sesquilinear Forms and Their Matrices 289

Hermitian matrix, depending on the choice of the basis for V. Conversely, for any
Hermitian matrix A, the sesquilinear form f : V × V → C associated with A is
Hermitian.
Theorem 8.9 Let f : V × V → C be a sesquilinear form on the complex vector
space V. The function f is a Hermitian form if and only if f (v, v) ∈ R, for any
v ∈ V.
Proof If we assume that f is Hermitian, then it is easy to see how the fact f (v, v) =
f (v, v), for any v ∈ V, implies that f (v, v) ∈ R.
Conversely, suppose that f (v, v) ∈ R, for any v ∈ V. Thus, for any u, v ∈ V we
also have

f (u + v, u + v) = f (u, u) + f (u, v) + f (v, u) + f (v, v) ∈ R

implying that
f (u, v) + f (v, u) = α ∈ R. (8.9)

In particular,
f (iu, v) + f (v, iu) = β ∈ R

that is
−i f (u, v) + i f (v, u) = β

and multiplying by i,
f (u, v) − f (v, u) = iβ. (8.10)

By comparing relations (8.9) and (8.10), we get both

α + iβ
f (u, v) =
2
and
α − iβ
f (v, u) = ,
2

which means f (u, v) = f (v, u), for any u, v ∈ V, as required.


Remark 8.10 In what follows, we will substantially go over the outlines of the
presentation given for bilinear and symmetric forms. The most part of arguments
and proofs also apply to Hermitian forms, so we omit them, unless when they need
to be deeply altered.
Definition 8.11 Let f : V × V → C be a Hermitian form. The quadratic form asso-
ciated with f is called h-quadratic form.
Notice that, in light of Theorem 8.9, if f is Hermitian then the corresponding
quadratic function Q is real valued. For this reason, we may also catalog any h-
quadratic form in relation to whether it assumes positive or negative values.
290 8 Sesquilinear and Hermitian Forms

Definition 8.12 Let f : V × V → C be a Hermitian form and Q its associated h-


quadratic form. The h-quadratic form Q is called:
(1) Positive definite if Q(X ) > 0 for all nonzero vectors X ∈ V.
(2) Negative definite if Q(X ) < 0 for all nonzero vectors X ∈ V.
(3) Indefinite if it is neither positive-definite nor negative-definite, in the sense that
Q takes on V both positive and negative values.
(4) Positive semi-definite if Q(X ) ≥ 0 for all X ∈ V, but there is some nonzero
vector X 0 ∈ V so that Q(X 0 ) = 0.
(5) Negative semi-definite if Q(X ) ≤ 0 for all X ∈ V, but there is some nonzero
vector X 0 ∈ V so that Q(X 0 ) = 0.

8.2 The Effect of the Change of Bases

If B = {b1 , . . . , bn } and B  = {b1 , . . . , bn } are two different ordered bases for V,
and C = {c1 , . . . , cm }, C  = {c1 , . . . , cm } are ordered bases for W, we know that
the sesquilinear form f : V × W → C can be represented by different matrices,
according to the choice of a basis for V and W. Let A be the matrix of f in terms of
the bases B for V and C for W, and A the matrix of f in terms of the bases B  for
V and C  for W.
By following the same procedure as in the case of bilinear forms, we may establish
the following relationship between A and A :

A = P ∗ AQ

where P ∈ Mn (C) is the transition matrix of B  relative to B, and Q ∈ Mm (C) is


the transition matrix of C  relative to C. In particular:
Definition 8.13 Let A, A be two n × n complex matrices. A, A are called H-
congruent matrices if there exists an invertible n × n complex matrix P, such that
A = P ∗ A P. The relationship A = P ∗ A P is usually called H-congruence and it is
an equivalence relation.
Just as in the case of bilinear forms, the following holds:

Theorem 8.14 Let f : V × V → C be a sesquilinear form on V . Two matrices


A, A represent f , in terms of two different bases for V , if and only if they are
H -congruent.

Proof The proof is unchanged with respect to the one of Theorem 7.7.
8.3 Orthogonality 291

8.3 Orthogonality

The concepts of f -orthogonality, f -orthogonal, f -orthonormal vectors and f -ortho-


gonal complement, as well as the definitions of isotropic vectors, nonisotropic vec-
tors, degenerate and non-degenerate forms are unchanged from the previously studied
case of bilinear and symmetric forms.
Of course, if f : V × V → C is a Hermitian form on V, the orthogonality relation
is symmetric, that is f (u, v) = 0 if and only if f (v, u) = 0, for all u, v ∈ V.
We would like to focus our attention just on the proof of the following:

Lemma 8.15 Let f : V × V → C be a Hermitian form on V. If f = 0 (in the sense


that it is not identically zero on V × V ), then there exists a vector v ∈ V such that
f (v, v) = 0.

Proof Since we assume that f = 0, there are x0 , y0 ∈ V such that 0 = f (x0 , y0 ) =


α ∈ C. Let 0 = β = α, so that 0 = β f (x0 , y0 ) = βα ∈ R. Thus

0 = β f (x0 , y0 ) = f (x0 , βy0 ) ∈ R.

This means that, for z 0 = βy0 ,

0 = f (x0 , z 0 ) = f (z 0 , x0 ) = f (z 0 , x0 ).

To prove our result, we suppose on the contrary f (v, v) = 0 for all v ∈ V. Hence, it
follows that f (x0 , x0 ) = 0, f (z 0 , z 0 ) = 0 and f (x0 + z 0 , x0 + z 0 ) = 0. Therefore,
we get the contradiction

0 = f (x0 + z 0 , x0 + z 0 ) = f (x0 , z 0 ) + f (z 0 , x0 ) = 2 f (x0 , z 0 ).

Theorem 8.16 Let f : V × V → C be a Hermitian form on V . Then there is an


f -orthogonal basis for V .

Proof An inspection of the proof of Theorem 7.32 reveals that it applies unchanged.
Here we just apply Lemma 8.15 in place of Lemma 7.31. As above mentioned,
the rest of the proof is unchanged.
Remark 8.17 The Gram-Schmidt orthonormalization process, introduced in order
to construct an orthonormal basis for a symmetric space, also applies to the case
of a complex space equipped with a Hermitian form f. No change is needed in
the procedure previously described for symmetric spaces, if not the replacement of
the transpose matrix U t by the conjugate-transpose (adjoint) U ∗ , where U is the
transition matrix.
Moreover, we also recall that, in the case of a complex space V equipped with
a Hermitian form f, if we change to a new orthonormal basis for V, the transition
matrix U will be unitary. In fact, since the matrix A associated with f is Hermitian,
it is unitarily similar to a diagonal matrix. This means that there is an f -orthogonal
292 8 Sesquilinear and Hermitian Forms

basis for V in terms of which the real diagonal matrix A representing f is obtained
by A = U ∗ AU = U −1 AU, where U is an unitary matrix.

Theorem 8.16 and Remark 8.17 will allow us to state the ‘Hermitian’ version of
Theorem 7.38:

Theorem 8.18 Let f : V × V → C be a Hermitian form on V. Then there is an


f -orthogonal basis B for V such that the matrix of f in terms of B has the following
form: ⎡ ⎤
1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ 1 ⎥
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ . .. ⎥
⎢ ⎥.
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ 0 ⎥
⎢ ⎥
⎢ . ⎥
⎣ . . ⎦
0

Example 8.19 Let f : C3 × C3 → C be the Hermitian form defined by the matrix


⎡ ⎤
2 1 + 2i i
A = ⎣ 1 − 2i 0 2 − i ⎦
−i 2 + i 1

in terms of the canonical basis B = {e1 , e2 , e3 } for C3 . In order to apply the orthonor-
malization process, we firstly introduce the following new basis B  = {e1 , e2 , e3 } for
C3 :
f (e1 , e2 ) f (e1 , e3 )
e1 = e1 , e2 = e2 − e1 , e3 = e3 − e1
f (e1 , e1 ) f (e1 , e1 )

that is
1 + 2i i
e1 = e1 , e2 = e2 − e1 , e3 = e3 − e1
2 2
and the corresponding transition matrix is
⎡ ⎤
1 − 1+2i
2
− 2i
C1 = ⎣ 0 1 0 ⎦.
0 0 1

Hence, the new expression of the matrix of f is


8.3 Orthogonality 293
⎡ ⎤
2 0 0
A = C1∗ AC1 = ⎣ 0 − 25 2−3i
2
⎦.
0 2+3i
2
1
2

The next change of basis will be the following

f (e2 , e3 ) 
e1 = e1 , e2 = e2 , e3 = e3 − e .
f (e2 , e2 ) 2

We introduce the basis B  = {e1 , e2 , e3 } such that

2 − 3i 
e1 = e1 , e2 = e2 , e3 = e3 + e2
5
and the corresponding transition matrix is
⎡ ⎤
10 0
C2 = ⎣ 0 1 2−3i
5
⎦.
00 1

Thus, the matrix of f assumes the following form:


⎡ ⎤
2 0 0
A = C2∗ A C2 = ⎣ 0 − 25 0 ⎦ .
0 0 95

The basis B  = {e1 , e2 , e3 } in terms of which the matrix of f is precisely A is
obtained by the computation
⎡ ⎤
1 − 1+2i
2
− 4+3i
5
C = C1 C2 = ⎣ 0 1 2−3i ⎦
5
0 0 1

where C is the transition matrix of B  relative to the starting basis B. Looking at the
columns of C and reordering the vectors, we have the following basis for C3

 4 + 3i 2 − 3i 1 + 2i
B = (1, 0, 0), (− , , 1), (− , 1, 0)
5 5 2

in terms of which the matrix of f is


⎡ ⎤
20 0
A = ⎣ 0 95 0 ⎦ .
0 0 − 25
294 8 Sesquilinear and Hermitian Forms

Finally, we construct the basis for C3 , with respect to which the matrix of f has all
nonzero entries equal to ±1. To do this, we replace any vector

4 + 3i 2 − 3i 1 + 2i
e1 = (1, 0, 0), e2 = (− , , 1), e3 = (− , 1, 0)
5 5 2
ei
by the corresponding √ . In particular, we have
| f (ei ,ei )|

   
√ 3 5
| f (e1 , e1 )| = 2,  
| f (e2 , e2 )| = √ ,  
| f (e3 , e3 )| = .
5 2

By all replacements, we obtain the new basis



1 4 + 3i 2 − 3i 5 1 + 2i 2
B̃ = ( √ , 0, 0), (− √ , √ , √ ), (− √ , √ , 0) .
2 3 5 3 5 3 5 10 10

The transition matrix of B̃ relative to the starting basis B is


⎡ ⎤
√1
2
− 4+3i
√ − 1+2i
3 5

10
⎢ 2−3i √2 ⎥
C̃ = ⎣ 0 √
3 5 10 ⎦
5

0 3 5
0

and the matrix of f in terms of B̃ is


⎤ ⎡
10 0
à = C̃ ∗ AC̃ = ⎣ 0 1 0 ⎦ .
0 0 −1

Finally, we may extend the concept of signature to any Hermitian form:

Definition 8.20 Let f : V × V → C be a complex Hermitian form. The ordered


pair ( p, r − p) such that, for any X, Y ∈ V having coordinate vectors [X ] =
[x1 , . . . , xn ]t and [Y ] = [y1 , . . . , yn ]t , f (X, Y ) = x1 y1 + · · · + x p y p − x p+1 y p+1 −
· · · − xr yr with respect to an appropriate f -orthogonal basis for V, is called the sig-
nature of f.

Of course, in parallel, we give the following:

Definition 8.21 Let A be a complex n × n Hermitian matrix, U a complex n × n


invertible matrix such that U ∗ AU = D is a real diagonal matrix of the form
8.3 Orthogonality 295
⎡ ⎤
1
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎢ 1 ⎥
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ .. ⎥
D=⎢ . ⎥
⎢ ⎥
⎢ −1 ⎥
⎢ ⎥
⎢ 0 ⎥
⎢ ⎥
⎢ .. ⎥
⎣ . ⎦
0

having p ones and r − p negative ones on the main diagonal, where r is the rank of
A (recall that A and D have the same rank, since they are congruent). The ordered
pair ( p, r − p) is called the signature of A. Moreover, according to Definition 8.12,
f is:
(1) Positive definite if its signature is (n, 0).
(2) Negative definite if its signature is (0, n).
(3) Indefinite if its signature is ( p, r − p), for p = r.
(4) Positive semi-definite if its signature is (r, 0).
(5) Negative semi-definite if its signature is (0, r ).

Definition 8.22 Let V be a complex


 space equipped by the Hermitian form f. In
general, one may refer to V, f as a Hermitian space. In particular, if f has the
additional property f(v, v)> 0, for all 0 = v ∈ V, then it is an inner product, and
we may also refer to V, f as an inner product complex space.

We conclude the chapter by providing the ‘Hermitian’ versions of Theorems 7.58


and 7.59:

Theorem 8.23 Let A, A be two complex n × n Hermitian matrices. A, A are


H -congruent if and only if they have the same signature.

Theorem 8.24 Let A be a real n × n symmetric matrix having signature ( p, r − p).


Then
(i) The number of positive eigenvalues of A is equal to p.
(ii) The number of negative eigenvalues of A is equal to r − p.

To prove Theorems 8.23 and 8.24, it is sufficient to recall that the eigenvalues of any
Hermitian matrix are real numbers. Therefore, we can use the argument contained in
Theorems 7.58 and 7.59 without any change, but first replace the role of transpose
matrices with one of adjoint matrices in all the proofs.
296 8 Sesquilinear and Hermitian Forms

Exercises

1. Let f : V × V → C be a sesquilinear form. Prove that f is Hermitian if and only


if the matrix A of f is Hermitian, whatever the choice of basis for V with respect
to which A is related.
2. Let f : C3 × C3 → C be a Hermitian form associated with the matrix
⎡ ⎤
2 i 2−i
⎣ −i 1 1+i⎦
2+i 1−i 1

in terms of the canonical basis for C3 . Determine an orthogonal basis for C3 in


terms of which the matrix of f is diagonal.
3. Let f : C4 × C4 → C be a Hermitian form associated with the matrix
⎡ ⎤
1 2+i 0 i
⎢2 − i 1 i −i ⎥
⎢ ⎥
⎣ 0 −i 2 −i ⎦
−i i i 2

in terms of the canonical basis for C4 . Determine an orthogonal basis for C4 in


terms of which the matrix of f is diagonal.
4. Let V be the vector space of all n × n matrices over C and f : V × V → C the
map defined as f (A, B) = trace(A∗ B), for any A, B ∈ V. Prove that f is a
positive definite Hermitian form on V × V.
5. Let V be a complex vector space of dimension n, f : V × V → C a positive
definite Hermitian form, H the n × n complex matrix of f in terms of a suitable
basis B for V. Prove that the sesquilinear form associated with the matrix H −1
is a positive definite Hermitian form.
6. Let V be a complex vector space of dimension n, f : V × V → C a positive
definite Hermitian form, H the n × n complex matrix of f in terms of a suitable
basis B for V. Let N be the n × n complex matrix such that N 2 = H. Prove that
the sesquilinear form associated with the matrix N is a positive definite Hermitian
form on V⎡× V. ⎤
1 1+i 2i
7. Let H = ⎣ 1 − i 4 2 − 3i ⎦ be a Hermitian matrix. Find a nonsingular
−2i 2 + 3i 7
matrix P such that D = P ∗ H P is diagonal. Also, find the signature of H.
8. Show that the relation “Hermitian congruent" is an equivalence relation.
9. Let f be a Hermitian form on V over C. Then prove that there exists a basis of
V in which f is represented by a diagonal matrix. Also, show that every other
diagonal matrix representation of f has the same number of positive entries and
negative entries.
Chapter 9
Tensors and Their Algebras

In Chap. 7 bilinear and quadratic forms with various ramifications have been dis-
cussed. In the present chapter we address an aesthetic concern raised by bilinear
forms and, as a part of this study, the tensor product of vector spaces has been intro-
duced. Further, besides the study of tensor product of linear transformations, in the
subsequent sections, a tensor algebra will be developed, and the chapter concludes
with the study of exterior algebra viz. Grassmann algebra.

9.1 The Tensor Product

Definition 9.1 Let U and V be vector spaces over the same field F. Then tensor
product of U and V is a pair (W, ψ) consisting of a vector space W over F together
with a bilinear map ψ : U × V → W satisfying the following universal property:
for any vector space X over F and any bilinear map f : U × V → X , there exists a
unique linear map h : W → X such that h ◦ ψ = f , that is, the following diagram

ψ
U ×V W

h
f

commutes.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 297
M. Ashraf et al., Advanced Linear Algebra with Applications,
https://doi.org/10.1007/978-981-16-2167-3_9
298 9 Tensors and Their Algebras

Remark 9.2 (i) W is generally denoted by U ⊗ V and is called tensor product of


U and V (if exists) and we shall see below that it is unique up to isomorphism.
(ii) Let B1 = {u i | i ∈ I } and B2 = {v j | j ∈ J } be bases of U and V, respectively.
The bilinear mapping ψ on U × V can be uniquely determined by assigning
arbitrary values to the pair (u i , v j ). For each ordered pair (u i , v j ), we can write
a formal symbol u i ⊗ v j and define W to be a vector space with basis B =
{u i ⊗ v j | u i ∈ B1 , v j ∈ B2 }. Now define a mapping ψ on U × V by setting
ψ(u i , v j ) = u i ⊗ v j . This uniquely defines a bilinear map ψ which is universal.
If f : U × V → X is bilinear, then the condition f = h ◦ ψ is equivalent to
h(u i ⊗ v j ) = f (u i , v j ) which uniquely defines a linear map h : W → X and
the pair (W, ψ) has the universal property for bilinearity. Any element w of W
m n
is a finite linear combination of u i ⊗ v j , i.e., w = αi j (u ki ⊗ vk j ), and if
i=1 j=1

m 
n
u= αi u i , v = β j v j , then
i=1 j=1


m 
n  
m 
n
u ⊗ v = ψ(u, v) = ψ αi u i , βjvj = αi β j u i ⊗ v j .
i=1 j=1 i=1 j=1

(iii) As indicated above, for any u ∈ U and v ∈ V, the image of (u, v) under the
universal bilinear pairing into U ⊗ V shall be denoted u ⊗ v ∈ U ⊗ V . In view
of the bilinearity of the pairing (u, v) → u ⊗ v, we find the relations (α1 u 1 +
α2 u 2 ) ⊗ v = α1 (u 1 ⊗ v) + α2 (u 2 ⊗ v) and u ⊗ (β1 v1 + β2 v2 )=β1 (u ⊗ v1 ) +
β2 (u ⊗ v2 ) in U ⊗ V for any α1 , α2 , β1 , β2 ∈ F, u, u 1 , u 2 ∈ U, v, v1 , v2 ∈ V .
(iv) Note that if U = {0} or V = {0} , then U ⊗ V = {0}. Indeed the only bilinear
pairing ψ : U × V → W is the zero pairing and hence the pairing U × V →
{0} yields the tensor product. Thus {0} ⊗ V = {0} and U ⊗ {0} = {0}.

Example 9.3 Let R[x] and R[y] be vector spaces over R. Then, R[x] ⊗ R[y] =
R[x, y].

Define a map f : R[x] × R[y] → R[x, y], given by, f ( p(x), q(y)) = p(x)q(y).
For any α, β ∈ R, p1 (x), p2 (x) ∈ R[x], q(y) ∈ R[y] we have,

f (αp1 (x) + βp2 (x), q(y)) = (αp1 (x) + βp2 (x))q(y)


= αp1 (x)q(y) + βp2 (x)q(y)
= α f ( p1 (x), q(y)) + β f ( p2 (x), q(y)).

Thus f is linear in first slot. Similarly, one can show that f is linear in second
slot and hence f is a bilinear map. Now we claim that, (R[x, y], f ) is the tensor
product of R[x] and R[y]. For this let X be any arbitrary vector space over R and
g : R[x] × R[y] → X be any arbitrary bilinear map. Then we have to construct a
unique homomorphism h : R[x, y] → X such that the following diagram commutes,
i.e., h ◦ f = g.
9.1 The Tensor Product 299

f
R[x] × R[y] R[x, y]

h
g

Define h : R[x, y] → X such that,


 f
inite  
f inite
h αi j x i y j = αi j g(x i , y j ).
i, j≥0 i, j≥0

Obviously, h is well defined. Next we prove that h is a linear map.

  f
inite   f
inite 
h α αi j x y + β
i j
αi j x y
i j

i, j≥0 i, j≥0
 f
inite 
=h (ααi j + ββi j )x i y j
i, j≥0


f inite
= (ααi j + ββi j )g(x i , y j )
i, j≥0


f inite

f inite
= ααi j g(x , y ) +i j
ββi j g(x i , y j )
i, j≥0 i, j≥0
 f
inite   f
inite 
= αh αi j x i y j + βh βi j x i y j .
i, j≥0 i, j≥0

Hence, h is a linear map. Also

(h ◦ f )( p(x), q(y)) = h( f ( p(x), q(y))) = h( p(x)q(y)).


m 
n 
m 
n
Let p(x)= αi x i and q(y)= β j y j . This implies that p(x)q(y) = δi j x i y j
i=0 j=0 i=0 j=0
and δi j = αi β j , 0 ≤ i ≤ m  , 0 ≤ j ≤ n  .
Therefore,
300 9 Tensors and Their Algebras


m 
n 
m 
n
(h ◦ f )( p(x), q(y)) = δi j g(x i , y j ) = αi β j g(x i , y j )
i=0 j=0 i=0 j=0
m n
= g(αi x i , β j y j )
 m
i=0 j=0

 
n
=g αi x i , βj y j
i=0 j=0
= g( p(x), q(y)).

This implies that h ◦ f = g. Finally, we show the uniqueness of h. If possible,


suppose that h 1 : R[x, y] → X be another linear map such that h 1 ◦ f = g, then
 f
inite

h(r (x, y)) = h γi j x y
i j
where r (x, y) ∈ R[x, y]
i, j≥0
f
inite
= γi j g(x i , y j )
i, j≥0
finite
= γi j h 1 f (x i , y j )

i, j≥0
f
inite

= h1 γi j f (x i , y j )
 f inite
i, j≥0


= h1 γi j x i y j
i, j≥0
= h 1 (r (x, y)).

This implies that, h = h 1 . Therefore, h is unique and finally, we have R[x] ⊗ R[y] =
R[x, y]. Here it is to be noted that x and y do not commute.
Existence and Uniqueness of Tensor Product of Two Vector Spaces
To prove the existence of tensor product of two vector spaces, we need the notion of
“free vector space over a given set”. Let S be any set and F, a field. A pair (V, f )
is called a free vector space over F on the set S, where V is a vector space over F
and f : S → V is a function if for any arbitrary vector space X over F and for any
arbitrary function g : S → X , there exists a unique linear map h : V → X such that
the following diagram commutes, i.e., h ◦ f = g.

f
S V

h
g

X
9.1 The Tensor Product 301

It can be easily shown that given any set S and F, a field, there always exists a free
vector space over F on S. If (V, f ) is a free vector space on S, then f is one-to-one
and
f (S) = V. Moreover, every vector space can be realized as a quotient space of
a free vector space. If (V, f ) is a free vector space over F on S, then as a convention,
we say that V is a free vector space on S. The associated map f and the field F are
assumed to be understood.
Theorem 9.4 Let U and V be vector spaces over a field F. Then their tensor product
U ⊗ V exists and it is unique upto isomorphism.

Proof Let (W, f ) be a free vector space over the set U × V . Now consider T ,
the subspace of W , generated by the elements f (α1 u 1 + α2 u 2 , v) − α1 f (u 1 , v) −
α2 f (u 2 , v), f (u, β1 v1 + β2 v2 ) − β1 f (u, v1 ) − β2 f (u, v2 ), for all elements u, u 1 ,
u 2 ∈ U, v, v1 , v2 ∈ V and α1 , α2 , β1 , β2 ∈ F. Now construct the quotient space WT .
It is obvious that there exists the quotient homomorphism q : W → WT . Clearly
f  = q ◦ f : U × V → WT . We show that f  is a bilinear map from U × V to WT ;

f  (α1 u 1 + α2 u 2 , v) − α1 f  (u 1 , v) − α2 f  (u 2 , v)
= q[ f (α1 u 1 + α2 u 2 , v)] − α1 q[ f (u 1 , v)] − α2 q[ f (u 2 , v)],
= q[ f (α1 u 1 + α2 u 2 , v) − α1 f (u 1 , v) − α2 f (u 2 , v)],
= 0.

Hence
f  (α1 u 1 + α2 u 2 , v) = α1 f  (u 1 , v) + α2 f  (u 2 , v),

for all α1 , α2 ∈ F, u 1 , u 2 ∈ U, v ∈ V.
This shows that f  is linear in the first coordinate. Similarly, it can be seen that f 
is also linear in the second coordinate. Thus, f  is a bilinear map. Now, we claim that
( WT , f  ) is a tensor product of U and V . For this let g : U × V → X be any arbitrary
bilinear map, where X is any vector space. Then we have to produce a unique linear
map h : WT → X such that the following diagram commutes, i.e., h ◦ f  = g.

f
U ×V W
T

h
g

X
302 9 Tensors and Their Algebras

Since W is a free vector space over U × V . Hence for the map g : U × V → X , there
exists a unique linear map h  : W → X such that the following diagram commutes,
i.e., h  ◦ f = g.
f
U ×V W

h
g

Now we show that T ⊆ K er h  . For this, consider arbitrary elements α1 , α2 ∈ F,


u, u 1 , u 2 ∈ U, v, v1 , v2 ∈ V. Then, we have

h  ( f (α1 u 1 + α2 u 2 , v) − α1 f (u 1 , v) − α2 f (u 2 , v))
= g(α1 u 1 + α2 u 2 , v) − α1 g(u 1 , v) − α2 g(u 2 , v)
= 0.

Hence, the element f (α1 u 1 + α2 u 2 , v) − α1 f (u 1 , v) − α2 f (u 2 , v) ∈ K er h  . Simi-


larly, we can show that the element f (u, β1 v1 + β2 v2 ) − β1 f (u, v1 ) − β2 f (u, v2 ) ∈
K er h  . But T is generated by these elements. Thus, we conclude that T ⊆ K er h  .
Now define a map h : WT → X such that h(w + T ) = h  (w). Assume that w1 +
T = w2 + T . This implies that w1 − w2 ∈ T, i.e., w1 − w2 ∈ K er h  . As a result
h  (w1 − w2 ) = 0, i.e., h  (w1 ) = h  (w2 ). Finally, we have shown that h(w1 + T ) =
h(w2 + T ). Thus h is well defined. Obviously, h is a linear map. Next we show
that h f  = g. Consider (h f  )(u, v)=h( f  (u, v))=h(q f (u, v)) = h(q( f (u, v))) =
h( f (u, v) + T ) = h  ( f (u, v)) = (h  f )(u, v) = g(u, v) for every element (u, v) ∈
U × V . Hence h f  = g stands proved.
To prove our claim, it only remains to be proved that h is unique. If possible,
suppose that there exists another linear map k : WT → X such that k ◦ f  = g. As
(W, f ) is a free vector space over U × V , we have < f (U × V ) >= W.
Let w + T ∈ WT . Thus w ∈ W and we have αi j ∈ F, 1 ≤ i ≤ m, 1 ≤ j ≤ n such
m n
that w = αi j f (u i , v j ); where u i ∈ U, v j ∈ V, 1 ≤ i ≤ m, 1 ≤ j ≤ n.
i=1 j=1
Hence,
9.1 The Tensor Product 303
  

m 
n
h(w + T ) = h αi j f (u i , v j ) + T
 i=1 j=1 

m  n
=h αi j f (u i , v j )
i=1 j=1

m 
n
= αi j h  f (u i , v j )
i=1 j=1
m  n
= αi j g(u i , v j )
i=1 j=1
m  n
= αi j (k ◦ f  )(u i , v j )
i=1 j=1  
m  n
= αi j k f  (u i , v j )
 j=1
i=1 

m  n

=k αi j f (u i , v j )
 i=1 j=1 

m  n
=k αi j (q f )(u i , v j )
 i=1
 mj=1 n 

=k q αi j f (u i , v j )
 mi=1 nj=1  

=k αi j f (u i , v j ) + T
i=1 j=1
= k(w + T ).

This shows that h = k. Hence, h is unique linear map. Thus ( WT , f  ) is a required


tensor product of U and V . Thus, existence stands proved.
To prove the uniqueness of tensor product of U and V , let us assume that (W, ψ)
and (W  , φ) are two tensor products of U and V , then it can be easily seen that there
exists a unique linear isomorphism W ∼ = W  carrying ψ and φ (and vice-versa). In
fact, if we factor φ through universal property of W,

ψ
U ×V W

h
φ

W

then h ◦ ψ = φ.
In a similar way, now factor ψ through universal property of W  ,
304 9 Tensors and Their Algebras

φ
U ×V W

h
ψ

we arrive at h  ◦ φ = ψ. Now consider the identity map IW on W and composition


h  ◦ h:

ψ ψ
U ×V W U ×V W

φ
IW ψ h
ψ

W W W
h

Both IW and h  ◦ h factor ψ through ψ as in the universal property of W and hence


by the uniqueness of the universal property, we arrive at IW = h  ◦ h. In a similar
way, one can arrive at IW  = h ◦ h  and hence we find that W ∼ = W  which ensures
that there is only one tensor product.
In this sense, the tensor product of U and V (equipped with its “universal" bilinear
pairing from U × V ) is unique up to isomorphism, and hence we may speak of the
tensor product of U and V and write U ⊗ V for the tensor product of U and V .
Theorem 9.5 Let U , V be vector spaces over a field F. The following statements
are equivalent:
(i) (U ⊗ V, ψ) is the tensor product of U and V .
(ii) Let ψ : U × V → U ⊗ V be a bilinear function such that
ψ(U × V ) = U ⊗
V . For any bilinear mapping f from U × V to another vector space X over F
there exists a linear map h : U ⊗ V → X such that h ◦ ψ = f.

ψ
U ×V U ⊗V

h
f

Proof (i) ⇒ (ii) If (U ⊗ V, ψ) is the tensor product of U and V , ψ : U × V →


U ⊗ V is a bilinear mapping and
ψ(U × V ) is a subspace of U ⊗ V . This
9.1 The Tensor Product 305

shows that the inclusion homomorphism exists, i.e., i :


ψ(U × V ) → U ⊗ V is
a vector space homomorphism. Define a map f : U × V →
ψ(U × V ) such that
f (u, v) = ψ(u, v). Obviously f is a bilinear mapping and since U ⊗ V is the tensor
product, there exists a unique homomorphism h : U ⊗ V →
ψ(U × V ) such that
the following diagram

ψ
U ×V U ⊗V

h
f


ψ(U × V )

is commutative, i.e., h ◦ ψ = f . Using the definition of tensor product again, there


exists a unique homomorphism (the identity homomorphism) I on U ⊗ V such that
the following diagram is commutative, i.e., I ◦ ψ = ψ.

ψ
U ×V U ⊗V

I
ψ

U ⊗V

But we have
h i
U ⊗ V −→
ψ(U × V ) −→ U ⊗ V

Clearly i ◦ h is a homomorphism (the identity homomorphism I ), i.e., i ◦ h : U ⊗


V → U ⊗ V. Since f (u, v) = ψ(u, v), it is clear that

(i ◦ h) ◦ ψ = i ◦ (h ◦ ψ)
=i◦ f
= f
= ψ.

This shows that homomorphism i ◦ h also makes the latter diagram commutative.
Hence, the uniqueness of I in the same guarantees that i ◦ h = I . On using ontoness
of i and I is bijective, i :
ψ(U × V ) → U ⊗ V becomes surjective, i.e., i(
ψ(U ×
V ) ) = U ⊗ V . But since i is the inclusion map, the later relation yields that
ψ(U ×
V ) = U ⊗ V and hence ψ(U × V ) generates U ⊗ V .
(ii) ⇒ (i) Let ψ : U × V → U ⊗ V be a bilinear function such that
ψ(U ×
V ) = U ⊗ V and for any vector space X over F and any bilinear map f : U × V →
306 9 Tensors and Their Algebras

X there exists a homomorphism h : U ⊗ V → X such that h ◦ ψ = f . To prove the


uniqueness of h, if possible suppose that h  : U ⊗ V → X is another homomorphism
which makes the above diagram commutative, i.e., h  ◦ ψ = f . Since U ⊗ V is
spanned by ψ(U × V ), for any x ∈ U ⊗ V we find that
 

m 
n
h(x) = h αi j ψ(u i , v j )
i=1 j=1

m n 
= αi j h ◦ ψ(u i , v j )
i=1 j=1
m  n 
= αi j f (u i , v j )
i=1 j=1
m  n 
= αi j h  ◦ ψ(u i , v j )
i=1 j=1 


m  n
=h αi j ψ(u i , v j )
i=1 j=1
= h  (x).

This implies that h = h  and the uniqueness of h stands proved.

Theorem 9.6 Let U , V be vector spaces over a field F with the ordered bases
{u 1 , u 2 , . . . , u m } and {v1 , v2 , . . . , vn }, respectively. Then any bilinear mapping f :
U × V → W is uniquely determined by the pairings of f (u i , v j ) ∈ W ; 1 ≤ i ≤
m; 1 ≤ j ≤ n, and conversely if f (u i , v j ) ∈ W are arbitrarily given then there exists
a unique bilinear mapping f : U × V → W satisfying f (u i , v j ) = wi j .

Proof Suppose that we are given a bilinear mapping f : U × V → W . Any u ∈ U


m 
n
and v ∈ V can be uniquely written as u = αi u i , v = β j v j ; αi , β j ∈ F. Hence
i=1 j=1
using bilinearity of f , we arrive at


m 
n
f (u, v) = αi β j f (u i , v j ).
i=1 j=1

This shows that f is uniquely determined by the pairing f (u i , vi ) ∈ W .


Conversely, given any wi j ∈ W define the map f : U × V → W such that

m n 
m 
n
f (u, v) = αi β j wi j ; where u = αi u i , and v = β j v j are the unique rep-
i=1 j=1 i=1 j=1
resentation of u and v with respect to the ordered bases of U and V respectively. The
existence and uniqueness of such expressions for u and v ensure that f is well defined
and obviously f (u i , v j ) = wi j . Hence, it remains only to show that f is bilinear. We
check bilinearity on u ∈ U for a fixed v ∈ V , and the other way around similarly.

m 
m  
n
For any u, u  ∈ U, and α, α  ∈ F with u = αi u i , u  = αi u i , and v = βjvj,
i=1 i=1 j=1
9.1 The Tensor Product 307

 
m  
we find that αu + α u  = (ααi + α αi )u i and
i=1

 



m   
n
f (αu + α u , v) = f (ααi + α αi )u i , βjvj
i=1 j=1

m   
n
= (ααi + α αi ) β j f (u i , v j )
i=1 j=1
m n  
= (ααi + α αi )β j wi j
i=1 j=1
m  n  
m 
n 
=α αi β j wi j + α αi β j wi j
i=1 j=1 i=1 j=1

= α f (u, v) + α f (u  , v).

This completes the proof.

Lemma 9.7 Let U , V be vector spaces over a field F, and let {u 1 , u 2 , . . . , u n }


be linearly independent subset of U . Then for arbitrary vectors v1 , v2 , . . . , vn ∈ V
n
u i ⊗ vi = 0 ⇒ vi = 0 for all i. In particular, u ⊗ v = 0 if and only if u = 0 or
i=1
v = 0.

n
Proof Let us suppose that u i ⊗ vi = 0 where it may be assumed that none of
i=1
the vectors vi are zero. The universal property of tensor product ensures that for any
bilinear function f : U × V → X , there exists a unique linear map h : U ⊗ V → X
such that h ◦ ψ = f . This implies that

n  
n 
n
0=h u i ⊗ vi = (h ◦ ψ)(u i , vi ) = f (u i , vi ).
i=1 i=1 i=1

The above relation holds for any bilinear function f : U × V → X . In particu-


lar one may choose two linear functionals φ ∈ U ∗ , ξ ∈ V ∗ such that f (u, v) =
φ(u)ξ(v), u ∈ U, v ∈ V . Since the set of vectors {u 1 , u 2 , . . . , u n } is linearly inde-
pendent, one can consider the dual vector u i∗ ∈ U ∗ such that u i∗ (u j ) = δi j . Now,

n
if φ = u ∗k ; 1 ≤ k ≤ n, then we arrive at 0 = u ∗k (u i )ξ(vi ) = ξ(vk ) for all linear
i=1
functionals ξ ∈ V ∗ . Hence, we easily obtain that vk = 0.

Theorem 9.8 Let U and V be vector spaces over a field F and let B1 = {u i | i ∈ I }
and B2 = {v j | j ∈ J } be bases of U and V, respectively. Then the set B = {u i ⊗
v j |i ∈ I, j ∈ J } is a basis for U ⊗ V.

m 
n
Proof To show that B is linearly independent, suppose that αi j (u i ⊗ v j ) = 0.
i=1 j=1
This can be rewritten
308 9 Tensors and Their Algebras


m 
n 
ui ⊗ αi j v j =0
i=1 j=1


n
and hence application of Lemma 9.7 yields that αi j v j = 0 for all i and hence
j=1
αi j = 0 for all i and j. Now to show that B spans U ⊗ V , let u ⊗ v be an arbitrary

m 
n
element of U ⊗ V . Then since u = αi u i and v = β j v j , we find that
i=1 j=1


m n
u⊗v = αi u i ⊗ βjvj
i=1  j=1 
m  n
= αi u i ⊗ βjvj
i=1  j=1 
m 
n
= αi β j (u i ⊗ v j )
i=1 j=1
m n
= αi β j (u i ⊗ v j ).
i=1 j=1

Hence, any sum of elements of the form u ⊗ v is a linear combination of the vectors
u i ⊗ v j , as desired.

Corollary 9.9 If U and V are finite dimensional vector spaces over a field F, then
dim(U ⊗ V ) = dim(U )dim(V ).

If U and V are finite dimensional vector spaces, then one can see the nature
of linear functionals which exist on U ⊗ V . The proof of the following theorem
illustrates that any linear functional on the tensor product is nothing but a tensor
product of linear functionals in isomorphic sense.
Theorem 9.10 Let U and V be finite dimensional vector spaces. Then

U


⊗V = U
⊗V


⊗V
via the isomorphism h : U
→ U ⊗ V given by h(σ ⊗ τ )(u ⊗ v) = σ (u)τ (v),

⊗V
for every σ ⊗ τ ∈ U
and for every u ⊗ v ∈ U ⊗ V.

Proof Let us choose fixed elements σ ∈ U


and τ ∈ V
. Define a map g  : U × V →
F such that g (u, v) = σ (u)τ (v). It is obvious to observe that g  is a bilinear map. Let


(U ⊗ V, f  ) be the tensor product of U and V . Hence, there exists a unique linear


map h σ,τ : U ⊗ V → F such that the following diagram commutes:
9.1 The Tensor Product 309

f
U ×V U ⊗V

h σ,τ
g

i.e., h σ,τ ( f  (u, v)) = g  (u, v) or h σ,τ (u ⊗ v) = g  (u, v), and hence h σ,τ (u ⊗ v) =
σ (u)τ (v). This shows that h σ,τ ∈ U ⊗ V.
As for each fixed σ ∈ U
and for each fixed τ ∈ V
, we have unique linear functional
h σ,τ on U ⊗ V . Thus, we define a map g : U × V

→ U ⊗ V such that g(σ, τ ) =
h σ,τ . Noting the above arguments, g is well defined. Here, for any α, β ∈ F, σ1 , σ2 ∈

,
U
g(ασ1 + βσ2 , τ )(u ⊗ v) = (h ασ1 +βσ2 ,τ )(u ⊗ v)
= (ασ1 + βσ2 )(u)τ (v)
= (ασ1 (u) + βσ2 (u))τ (v)
= ασ1 (u)τ (v) + βσ2 (u)τ (v)
= α(h σ1 ,τ (u ⊗ v)) + β(h σ2 ,τ (u ⊗ v))
= (αh σ1 ,τ )(u ⊗ v) + (βh σ2 ,τ )(u ⊗ v)
= (αh σ1 ,τ + βh σ2 ,τ )(u ⊗ v)

= αg(σ1 , τ ) + βg(σ2 , τ ) (u ⊗ v)

implies that g(ασ1 + βσ2 , τ )(u ⊗ v) = αg(σ1 , τ ) + βg(σ2 , τ ).


Hence g is linear in the first coordinate. Similarly, it can be seen that g is also
linear in the second coordinate. Thus, g is a bilinear map.
Let (U
⊗V
, f ) be the tensor product of U

and V
. Hence, corresponding to the
above bilinear map g, there exists a unique linear map h : U
⊗V
→ U ⊗ V such
that the following diagram commutes:

f

×V
U

⊗V
U

h
g

U
⊗V

i.e, h ◦ f = g, i.e., h( f (σ, τ )) = g(σ, τ ) or h(σ ⊗ τ ) = h σ,τ . This implies that


h(σ ⊗ τ )(u ⊗ v) = h σ,τ (u ⊗ v), i.e., h(σ ⊗ τ )(u ⊗ v) = σ (u)τ (v).
Next, we prove that the above homomorphism h is an isomorphism. For this,
let {u 1 , u 2 , . . . , u m } be a basis for U , with dual basis {u
1 , u
2 , . . . , u
m } and let
{v1 , v2 , . . . , vn } be a basis for V , with dual basis {

v1 , v
2 , . . . , v
n }. We know that
310 9 Tensors and Their Algebras

{u i ⊗ v j , 1 ≤ i ≤ m, 1 ≤ j ≤ n} and {
u i ⊗ v
j , 1 ≤ i ≤ m, 1 ≤ j ≤ n} are bases
of U ⊗ V and U
⊗V
respectively. Then

u i ⊗ v
j )(u ⊗ vμ ) = u
i (u )

h(
v j (vμ ) = δi, δ j,μ = δ(i, j),( ,μ) .

Here δi, , etc. are Kronecker’s deltas.


And, thus {h(
u i ⊗ v
j )|1 ≤ i ≤ m, 1 ≤ j ≤ n} ⊆ U⊗ V is the dual basis to the
basis {u i ⊗ v j |1 ≤ i ≤ m, 1 ≤ j ≤ n} for U ⊗ V . In the other words we have h(

ui ⊗
v
j ) = ui ⊗ vj.
In this way, we have proved the linear map h sends a basis of U
⊗V
to a basis
of U ⊗ V and hence h is a bijective map. Thus h becomes an isomorphism and we
have

⊗V
U
∼= U⊗ V.

Theorem 9.11 Let V1 , V2 and W be vector spaces over a field F. Let μ : V1 ×


V2 −→ W be a bilinear map. Suppose, there exist bases B1 and B2 of V1 and V2
respectively such that, μ(B1 × B2 ) is a basis for W. Then, for any choice of bases
B1 and B2 for V1 and V2 respectively, μ(B1 × B2 ) is a basis for W .

Proof Let B1 and B2 be bases for V1 and V2 , respectively. We first show that
μ(B1 × B2 ) spans W . Let y ∈ W . Since μ(B1 × B2 ) spans W , we can write
r s
y= a jk μ(z 1 j , z 2k ), where z 1 j ∈ B1 , z 2k ∈ B2 . But since B1 is a basis for
j=1 k=1
t 
p
V1 , z 1 j = b j x1 , where x1 ∈ B1 . Similarly, z 2k = ckm x2m , where x2m ∈ B2 .
=1 m=1
Thus,  t 

r 
s  
p
y= a jk μ b j x1 , ckm x2m
j=1 k=1 =1 m=1
r  s t 
p
= a jk b j ckm μ(x1 , x2m ).
j=1 k=1 =1 m=1

This implies that, y ∈


μ(B1 × B2 ) . Now, we need to show that μ(B1 × B2 ) is
linearly independent. If V1 and V2 are both finite dimensional, this follows from the
fact that |μ(B1 × B2 )| = |μ(B1 × B2 )| and both span W .
For infinite dimensions, a more sophisticated change of basis argument is needed.
 p
t 
Suppose d m μ(x1 , x2m ) = 0, where x1 ∈ B1 , x2m ∈ B2 . Then by change
=1 m=1

r 
s
of basis, x1 = e j z 1 j , where z 1 j ∈ B1 and x2m = f mk z 2k , where z 2k ∈ B2 .
j=1 k=1
Note that, the e j form an inverse matrix to the matrix formed by, b j above, then
r 
s
e j b j  = δ  and similarly, f mk ckm  = δmm  (where δ refers to the Kronecker’s
j=1 k=1
delta). Thus we have,
9.1 The Tensor Product 311



t 
p
0= d m μ(x1 , x2m )
=1 m=1
p
 r 
t   
s
= d m μ e j z 1 j , f mk z 2k
=1 m=1 j=1 k=1

r s 
t 
p
= d m e j f mk μ(z 1 j , z 2k ).
j=1 k=1 =1 m=1



t 
p
Since μ(B1 × B2 ) is linearly independent, d m e j f mk = 0, for all j, k. But
=1 m=1
now,


t 
p
d  m  = d m δ  δmm 
=1 m=1
p
 r 
t   s
= d m b j  e j )( ckm  f mk
=1 m=1 j=1 k=1
 p

r s t 
= b j  ckm  d m e j f mk
j=1 k=1 =1 m=1
= 0, for all  , m  .

Thus, μ(B1 × B2 ) is linearly independent. Hence μ(B1 × B2 ) is a basis for W .

Theorem 9.12 Let U, V, W be vector spaces over a field F. Then


(i) U ⊗ V ∼
= V ⊗ U,
(ii) U ⊗ F ∼
= F⊗U ∼ = U,
(iii) (U ⊗ V ) ⊗ W ∼
= U ⊗ (V ⊗ W ).

Proof (i) Assume that (U ⊗ V, ψ) is a tensor product of U and V . Define a bilinear


mapping f : U × V → V ⊗ U such that f (u, v) = v ⊗ u. Using the definition of
tensor product, we find that there exists a unique linear mapping h : U ⊗ V →
V ⊗ U such that the diagram

ψ
U ×V U ⊗V

h
f

V ⊗U

is commutative, i.e., h ◦ ψ = f . Similarly, define g : V × U → U ⊗ V such that


g(v, u) = u ⊗ v. This leads to a unique linear mapping h  : V ⊗ U → U ⊗ V such
that the diagram
312 9 Tensors and Their Algebras

ψ
V ×U V ⊗U

h
g

U ⊗V

is commutative, i.e., h  ◦ ψ = g. Now for any x ∈ U ⊗ V , we find that
 m n 

h  ◦ h(x) = h  ◦ h αi j (u i ⊗ v j )
 m i=1 j=1 
 n
= h αi j h(u i ⊗ v j )
 i=1 j=1 

m  n
=h αi j (v j ⊗ u i )
i=1 j=1

m 
n
= αi j h  (v j ⊗ u i )
i=1 j=1
m  n
= αi j (u i ⊗ v j )
i=1 j=1
= x.

Similarly, it can be seen that h ◦ h  (y) = y for all y ∈ V ⊗ U . This shows that h ◦
h  = I on V ⊗ U and h  ◦ h = I on U ⊗ V . Thus h is an isomorphism whose inverse
is h  and hence U ⊗ V ∼= V ⊗ U.
(ii) The above proof shows that the correspondence u ⊗ v ←→ v ⊗ u has been
used to establish the isomorphism U ⊗ V ∼ = V ⊗ U . Similarly, using the corre-
spondence u ⊗ α ←→ α ⊗ u ←→ αu, where α ∈ F, u ∈ U it can be seen that
U ⊗F∼ = F⊗U ∼ = U.

(iii) For any fixed u ∈ U , define a map f u : V × W → (U ⊗ V ) ⊗ W such that


f u (v, w) = (u ⊗ v) ⊗ w, which is bilinear. Hence, there exists a unique linear map-
ping L u : V ⊗ W → (U ⊗ V ) ⊗ W . Further, for a fixed v ∈ V and w ∈ W , the
map f (v,w) : U → (U ⊗ V ) ⊗ W such that f (v,w) (u) = (u ⊗ v) ⊗ w is linear. Now
define g : U × (V ⊗ W ) → (U ⊗ V ) ⊗ W such that g(u, v ⊗ w) = L u (v ⊗ w) =
f (v,w) (u). Then it can be easily seen that g is bilinear, that is, linear in both the
arguments. For any u 1 , u 2 ∈ U, α, β ∈ F

g αu 1 + βu 2 , v ⊗ w = f (v,w) (αu 1 + βu 2 )
= α f (v,w) (u 1 ) + β f (v,w) (u 2 )
= αg(u 1 , v ⊗ w) + βg(u 2 , v ⊗ w)

and
9.1 The Tensor Product 313
 
g u, α(v1 ⊗ w1 ) + β(v2 ⊗ w2 ) = L u α(v1 ⊗ w1 ) + β(v2 ⊗ w2 )
= αL u (v1 ⊗ w1 ) + β L u (v2 ⊗ w2 )
= αg(u, v1 ⊗ w1 ) + βg(u, v2 ⊗ w2 ).

Thus, there exists a linear mapping L g : U ⊗ (V ⊗ W ) → (U ⊗ V ) ⊗ W such that


L g (u ⊗ (v ⊗ w)) = (u ⊗ v) ⊗ w. Using similar arguments a linear map L g : (U ⊗
V ) ⊗ W → U ⊗ (V ⊗ W ) can be obtained and it is easy to see that L g L g = I on
U ⊗ (V ⊗ W ) and L g L g = I on (U ⊗ V ) ⊗ W, i.e., L g and L g are inverses of each
other and hence L g is an isomorphism, i.e., (U ⊗ V ) ⊗ W ∼= U ⊗ (V ⊗ W ).

Using the isomorphism given in (iii) as identification, define the tensor product
of three vector spaces as follows:

U ⊗ V ⊗ W = (U ⊗ V ) ⊗ W = U ⊗ (V ⊗ W ).

More generally, when an ordered set of vector spaces {V1 , V2 , . . . , Vn } is given, it


can be seen that their tensor product does not depend on the associativity and is
uniquely determined up to a (canonical) isomorphism and in general by dropping
the parenthesis it can be simply written as


n
Vi = V1 ⊗ V2 ⊗ · · · ⊗ Vn .
i=1

By definition, there exists a canonical map


n
(v1 , v2 · · · , vn ) → v1 ⊗ v2 ⊗ · · · ⊗ vn ∈ Vi
i=1

and the following induction formula holds:

V1 ⊗ V2 ⊗, . . . ⊗ Vk = (V1 ⊗ V2 ⊗ · · · ⊗ Vk−1 ) ⊗ Vk

with correspondence v1 ⊗ v2 ⊗ · · · ⊗ vk = (v1 ⊗ v2 ⊗ · · · ⊗ vk−1 ) ⊗ vk .

If V1 , V2 , . . . , Vn , W are vector spaces over a field F, a function ψ : V1 × V2 ×


· · · × Vn → W is said to be multilinear(n-linear) if it is linear in each argument
(when the others are kept fixed), that is,

ψ(v1 , v2 , . . . , αvi + βvi , . . . , vn ) = αψ(v1 , v2 , . . . , vi , . . . , vn )

+βψ(v1 , v2 , . . . , vi , . . . , vn )

for all v1 ∈ V1 , v2 ∈ V2 , . . . , vi , vi ∈ Vi , . . . , vn ∈ Vn for all 1 ≤ i ≤ n, α, β ∈ F.


The collection of all such functions is a vector space over F.
314 9 Tensors and Their Algebras

A bilinear mapping is a special case of n-linear mapping for n = 2. The above


defined tensor product v1 ⊗ v2 ⊗ · · · ⊗ vn is n-linear and let us state the universal
property for n-linear mappings.
In particular, a n-linear map f : V × V × · · · × V → W is called symmetric if
interchanging any two coordinate positions, nothing changes in f, i.e.,

f (v1 , v2 , . . . , vi , . . . , v j , . . . , vn ) = f (v1 , v2 , . . . , v j , . . . , vi , . . . , vn )

for any i = j.
A n-linear map f : V × V × · · · × V → W is called antisymmetric or skew sym-
metric if interchanging any two coordinate positions introduces a factor of (−1), i.e.,

f (v1 , v2 , . . . , vi , . . . , v j , . . . , vn ) = − f (v1 , v2 , . . . , v j , . . . , vi , . . . , vn )

for any i = j.
A n-linear map f : V × V × · · · × V → W is called alternate or alternating if

f (v1 , v2 , . . . , vn ) = 0

whenever any two of the vectors vi are equal.


It is to be noted that if char (F) = 2, then alternate =⇒ symmetric⇐⇒ skew
symmetric. But if char (F) = 2, then alternate ⇐⇒skew symmetric.
The pair (V1 ⊗ V2 ⊗ · · · ⊗ Vn , ψ), where ψ : V1 × V2 × · · · × Vn → V1 ⊗ V2 ⊗
· · · ⊗ Vn is multilinear mapping defined by ψ(v1 , v2 , . . . , vn ) = v1 ⊗ v2 ⊗ · · · ⊗ vn
has the following universal property: If f : V1 × V2 × · · · × Vn → X is any multi-
linear function from V1 × V2 × · · · × Vn to a vector space X over F then there exists
a unique linear transformation h : V1 ⊗ V2 ⊗ · · · ⊗ Vn → X such that h ◦ ψ = f ,
that is, the following diagram commutes:

ψ
V1 × V2 × · · · × Vn V1 ⊗ V2 ⊗ · · · ⊗ Vn
h
f
X

Exercises

1. If U and V are finite dimensional vector spaces over the same field, then show that
U ⊗ V can be represented as the dual of the space of bilinear forms on U ⊕ V .
2. Let U and V be vector spaces over the same field. If V = V1 ⊕ V2 , then show
that U ⊗ V = (U ⊗ V1 ) ⊕ (U ⊗ V2 ).
9.1 The Tensor Product 315

3. Show that the tensor product of P3 and R2 is M2×4 (R).


4. If u = (1, i) and v = (i, 1, −1) are vectors in C2 and C3 , respectively, then find
the representation of u ⊗ v in terms of the basis ei ⊗ e j for C2 ⊗ C3 .
5. Let U1 , U2 be subspaces of a vector space U . Then show that

(U1 ⊗ U ) ∩ (U2 ⊗ U ) ∼
= (U1 ∩ U2 ) ⊗ U.

6. Let U1 and V1 be subspaces of vector space U and V, respectively. Show that

(U1 ⊗ V ) ∩ (U ⊗ V1 ) ∼
= U1 ⊗ V1 .

7. Let U1 , U2 ⊆ U and V1 , V2 ⊆ V be subspaces of vector spaces U and V respec-


tively. Show that

(U1 ⊗ V1 ) ∩ (U2 ⊗ V2 ) ∼
= (U1 ∩ U2 ) ⊗ (V1 ∩ V2 ).

8. Find examples of two vector spaces U and V and a nonzero vector x ∈ U ⊗ V


such that at least two distinct (not including order of the terms) representation of
n
the form x = u i ⊗ vi , where the u i ’s are linearly independent and so are vi ’s.
i=1
9. Let R3 and R2 be vector spaces over R. Show that P5 = {α0 + α1 x + α2 x 2 +
α3 x 3 + α4 x 4 + α5 x 5 |α0 , α1 , α2 , α3 , α4 , α5 ∈ R} is a tensor product of R3 and
R2 .
(Hint: Consider associated bilinear map f : R3 × R2 → P5 such that

f (α1 , α2 , α3 ), (β1 , β2 ) = α1 β1 + α1 β2 x + α2 β1 x 2 + α2 β2 x 3
+α3 β1 x 4 + α3 β2 x 5 ,

for every (α1 , α2 , α3 ), (β1 , β2 ) ∈ R3 × R2 ).

9.2 Tensor Product of Linear Transformations

Let T1 : U1 → V1 and T2 : U2 → V2 be any two arbitrarily given linear maps and


consider the tensor products U1 ⊗ U2 and V1 ⊗ V2 together with their tensor maps τ1
and τ2 . Define a map g : U1 × U2 → V1 × V2 such that g(u 1 , u 2 ) = (T1 (u 1 ), T2 (u 2 ))
for every (u 1 , u 2 ) ∈ U1 × U2 . As τ2 : V1 × V2 → V1 ⊗ V2 is a bilinear map, it fol-
lows that τ2 ◦ g is obviously a bilinear map. By definition of tensor product U1 ⊗ U2 ,
there exists a unique linear map f : U1 ⊗ U2 → V1 ⊗ V2 such that f ◦ τ1 = τ2 ◦ g
holds in the following rectangle.
316 9 Tensors and Their Algebras

τ1
U1 × U2 U1 ⊗ U2

τ2 ◦g
g f

V1 × V2 τ2
V1 ⊗ V2

The unique map f is called the tensor product of linear maps T1 and T2 and
usually f is represented by f = T1 ⊗ T2 . Here, we have ( f ◦ τ1 )(u 1 , u 2 ) = (τ2 ◦
g)(u 1 , u 2 ), i.e., f (u 1 ⊗ u 2 ) = T1 (u 1 ) ⊗ T2 (u 2 ). We also write as (T1 ⊗ T2 )(u 1 ⊗
u 2 ) = T1 (u 1 ) ⊗ T2 (u 2 ).

Remark 9.13 (i) Let IU : U → U and I V : V → V be the identity linear maps


on U and V respectively. Then IU ⊗ I V = IU ⊗V . Since (IU ⊗ I V )(u ⊗ v) =
IU (u) ⊗ I V (v) = u ⊗ v = IU ⊗V (u ⊗ v).

(ii) Let 0 : U1 → V1 be the zero linear map and T : U2 → V2 be any arbitrary linear
map then 0 = 0 ⊗ T , T ⊗ 0 = 0 , where 0 : U1 ⊗ U2 → V1 ⊗ V2 is the zero
linear map and 0 : U2 ⊗ U1 → V2 ⊗ V1 is the zero linear map. (0 ⊗ T )(u 1 ⊗
u 2 ) = 0(u 1 ) ⊗ T (u 2 ) = 0 ⊗ T (u 2 ) = 0 = 0 (u 1 ⊗ u 2 ) gives the clue.

Example
9.14 Let D : R[x] → R[x] be the linear map, which is called derivation
and : R[y] → R[y] be the linear map, which is known as integration operator. We
shall determine D ⊗ .

Since, we know
that R[x] ⊗ R[y]=R[x, y], D ⊗ : R[x] ⊗ R[y] → R[x] ⊗ R[y],
D ⊗ : R[x, y] → R[x, y], given by (D ⊗ )( p(x) ⊗ q(y)) = (Dp(x)) ⊗
i.e.,
( q(y)).
In other words,
 
f
inite f
inite
(D ⊗ ) αi j x y
i j
= αi j D(x i ) yj
i≥0, j≥0 i≥0, j≥0
f
inite j+1
= αi j i x i−1 yj+1
i≥0, j≥0
f
inite
= i
α x i−1 y j+1 .
j+1 i j
i≥0, j≥0

Here, it is to be noted that x and y do not commute.

Proposition 9.15 Let T1 : U1 → V1 , T2 : V1 → W1 , S1 : U2 → V2 and S2 : V2 →


W2 be linear maps. Then (T2 ◦ T1 ) ⊗ (S2 ◦ S1 ) = (T2 ⊗ S2 ) ◦ (T1 ⊗ S1 ).

Proof Clearly, T2 ◦ T1 : U1 → W1 and S2 ◦ S1 : U2 → W2 are linear maps. Hence


by definition of tensor product of linear maps, we have (T2 ◦ T1 ) ⊗ (S2 ◦ S1 ) : U1 ⊗
9.2 Tensor Product of Linear Transformations 317

U2 → W1 ⊗ W2 . On the other hand, we also have T1 ⊗ S1 : U1 ⊗ U2 → V1 ⊗ V2 and


T2 ⊗ S2 : V1 ⊗ V2 → W1 ⊗ W2 as linear maps. As a result (T2 ⊗ S2 ) ◦ (T1 ⊗ S1 ) :
U1 ⊗ U2 → W1 ⊗ W2 is also a linear map. Consider, for any u 1 ∈ U1 , u 2 ∈ U2 ,
 
(T2 ◦ T1 ) ⊗ (S2 ◦ S1 ) (u 1 ⊗ u 2 ) = (T2 ◦ T1 )(u 1 ) ⊗ (S2 ◦ S1 )(u 2 )
= T2 (T1 (u 1 )) ⊗ S2 (S1 (u 2 ))
= (T2 ⊗ S2 )T1 (u 1 ) ⊗ S1 (u 2 )
= (T
 2 ⊗ S2 ) (T1 ⊗ S1 )(u 1 ⊗ u 2 )
= (T2 ⊗ S2 ) ◦ (T1 ⊗ S1 ) (u 1 ⊗ u 2 ).

This implies that

(T2 ◦ T1 ) ⊗ (S2 ◦ S1 ) = (T2 ⊗ S2 ) ◦ (T1 ⊗ S1 ).

Theorem 9.16 Let U1 , U2 , V1 , V2 be vector spaces over the same field F. If U1 ∼


= U2
and V1 ∼
= V2 , then U1 ⊗ V1 ∼
= U2 ⊗ V2 .

Proof Given that U1 ∼ = U2 and V1 ∼ = V2 . This confirms that there exist isomor-
phisms T1 : U1 → U2 and T2 : V1 → V2 . Using the definition of tensor product
of linear maps, we have T1 ⊗ T2 : U1 ⊗ V1 → U2 ⊗ V2 , which is a linear map
given as (T1 ⊗ T2 )(u 1 ⊗ v1 ) = T1 (u 1 ) ⊗ T2 (v1 ). We claim that T1 ⊗ T2 is a bijec-
tive map. For ontoness of T1 ⊗ T2 , let u 2 ⊗ v2 ∈ U2 ⊗ V2 . Since T1 and T2 are onto,
there exist u 1 ∈ U1 and v1 ∈ V1 such that T1 (u 1 ) = u 2 and T2 (v1 ) = v2 . Obviously
(T1 ⊗ T2 )(u 1 ⊗ v1 ) = T1 (u 1 ) ⊗ T2 (v1 ) = u 2 ⊗ v2 . This shows that T1 ⊗ T2 is onto.
Before proving that T1 ⊗ T2 is injective, we will prove a fact, i.e., kernel of T1 ⊗ T2
is the subspace K of U1 ⊗ V1 generated by the elements x ⊗ y of U1 ⊗ V1 with
x ∈ K er (T1 ) or y ∈ K er (T2 ). In otherwords,

K er (T1 ⊗ T2 ) = K =
{x ⊗ y ∈ U1 ⊗ V1 |x ∈ K er (T1 ) or y ∈ K er (T2 )} .

Let L = {x ⊗ y ∈ U1 ⊗ V1 |x ∈ K er (T1 ) or y ∈ K er (T2 )}. For each x ⊗ y ∈ L, we


have (T1 ⊗ T2 )(x ⊗ y) = T1 (x) ⊗ T2 (y) but as x ∈ K er (T1 ) or y ∈ K er (T2 ), we
conclude that (T1 ⊗ T2 )(x ⊗ y) = 0. This implies that x ⊗ y ∈ K er (T1 ⊗ T2 ), i.e.,
L ⊆ K er (T1 ⊗ T2 ). Since K is the smallest subspace of U1 ⊗ V1 , containing L, we
arrive at K ⊆ K er (T1 ⊗ T2 ). Next our target is to show that K er (T1 ⊗ T2 ) ⊆ K .
Consider the quotient homomorphism q : U1 ⊗ V1 → U1 K⊗V1 . This homomorphism

q induces another homomorphism q ∗ : U1 K⊗V1 → U2 ⊗ V2 , such that q ∗ (u ⊗ v) +

K = T1 (u) ⊗ T2 (v). Here, we have the following commutative diagram, i.e., q ∗ q =
T1 ⊗ T2 .
318 9 Tensors and Their Algebras

U1 ⊗ V1

T1 ⊗T2
q

U1 ⊗V1
U2 ⊗ V2
K q∗

Now, we show that q ∗ is an injective homomorphism. For this purpose, we con-


struct a function g : U2 × V2 → U1 K⊗V1 such that g(x2 , y2 ) = (x1 ⊗ y1 ) + K ; where
T1 (x1 ) = x2 , T2 (y1 ) = y2 . The existence of such elements x1 ∈ U1 , y1 ∈ V1 is guar-
anteed because both T1 and T2 are onto. We claim that this association is a map. We
show that choice of x1 ∈ U1 , y1 ∈ V1 does not effect the value of (x1 ⊗ y1 ) + K . For
this let, we have another x1 ∈ U1 , y1 ∈ V1 such that T1 (x1 ) = x2 and T2 (y1 ) = y2 .
This shows that T1 (x1 − x1 ) = 0 and T2 (y1 − y1 ) = 0, i.e., x1 − x1 ∈ K er T1 and
y1 − y1 ∈ K er T2 . Suppose that x1 − x1 = η and y1 − y1 = δ, for some η ∈ K er (T1 )
and δ ∈ K er (T2 ). This implies that x1 = x1 + η and y1 = y1 + δ. In this case, since,
η ∈ K er (T1 ) and δ ∈ K er (T2 )

x1 ⊗ y1 − x1 ⊗ y1 = (x1 + η) ⊗ (y1 + δ) − x1 ⊗ y1


= x1 ⊗ y1 + x1 ⊗ δ + η ⊗ y1 + η ⊗ δ − x1 ⊗ y1
= x1 ⊗ δ + η ⊗ y1 + η ⊗ δ.

Hence, we conclude that x1 ⊗ δ + η ⊗ y1 + η ⊗ δ ∈ K and hence x1 ⊗ y1 − x1 ⊗


y1 ∈ K . This forces us to conclude that (x1 ⊗ y1 + K ) = (x1 ⊗ y1 ) + K . Thus above
arguments are sufficient to say that g is a well-defined map. Next we prove that g
is a bilinear map. For this, let α1 , α2 ∈ F, x3 , x4 ∈ U2 , y3 ∈ V2 . Since, both T1
and T2 are onto, there exists x3 , x4 ∈ U1 , y3 ∈ V1 such that T1 (x3 ) = x3 , T1 (x4 ) =
x4 , T2 (y3 ) = y3 . Also, we have T1 (α1 x3 + α2 x4 ) = α1 x3 + α2 x4 . Thus,

g(α1 x3 + α2 x4 , y3 ) = (α1 x3 + α2 x4 ) ⊗ y3 + K
= α1 (x3 ⊗ y3 ) + α2 (x4 ⊗ y3 ) + K
= α1 (x3 ⊗ y3 ) + K + α2 (x4 ⊗ y3 ) + K
= α1 g(x3 , y3 ) + α2 g(x4 , y3 ).

Hence, g is linear in the first coordinate. Similarly, we can show that g is linear in
the second coordinate also. Thus g is a bilinear map. By the definition of U2 ⊗ V2 ,
there exists a unique linear map h : U2 ⊗ V2 → U1 K⊗V1 such that following diagram
commutes, i.e., h(u 2 ⊗ v2 ) = (u 1 ⊗ v1 ) + K , where T1 (u 1 ) = u 2 and T2 (v1 ) = v2 .
9.2 Tensor Product of Linear Transformations 319

U2 × V2 U2 ⊗ V2

h
g

U1 ⊗V1
K

U1 ⊗V1 U1 ⊗V1
Here, we observe that h ◦ q ∗ : K
→ K
is a homomorphism such that
 
(h ◦ q ∗ ) u 1 ⊗ v1 + K = h q ∗ (u 1 ⊗ v1 + K )
= h T1 (u 1 ) ⊗ T2 (v1 )
= (u 1 ⊗ v1 ) + K .

This shows that h ◦ q ∗ is the identity map. Hence, it is bijective map as a result
q is one-to-one. Thus K erq ∗ = {K }. Let us suppose that z ⊗ t ∈ K er (T1 ⊗ T2 ).


implies that (T1 ⊗ T2 )(z ⊗ t) =∗0 =⇒ T1 (z) ⊗ T2 (t)∗ = 0. Hence, q (z ⊗ t) +
This
K = 0, i.e., (z ⊗ t) + K ∈ K er (q ). But since K er (q ) = {K }, this implies that
(z ⊗ t) + K = K , i.e., z ⊗ t ∈ K . Thus, we have proved that K er (T1 ⊗ T2 ) ⊆ K .
Finally, we have shown that K er (T1 ⊗ T2 ) = K . Given that T1 and T2 are injective,
i.e., K er T1 = {0} and K er T2 = {0}. As a result

K er (T1 ⊗ T2 ) =
x ⊗ y ∈ U1 ⊗ V1 |x ∈ {0} or y ∈ {0} .

Therefore, K er (T1 ⊗ T2 ) = {0}. This implies that T1 ⊗ T2 is injective. Finally, we


have shown that T1 ⊗ T2 is an isomorphism and thus U1 ⊗ V1 ∼ = U2 ⊗ V2 .

Exercises

1. Let U1 , V1 , U2 , V2 be vector spaces of finite dimensions. If T1 : U1 → V1 and


T2 : U2 → V2 be linear maps, then prove that rank(T1 ⊗ T2 ) = rankT1rankT2 .

2. Let U1 , V1 , U2 , V2 be vector spaces of finite dimension n. Let T1 : U1 → V1 and


T2 : U2 → V2 be nonsingular linear maps. Prove that T1 ⊗ T2 is also nonsingular
and (T1 ⊗ T2 )−1 = T1−1 ⊗ T2−1 .

3. Let d
dx
: R[x] → R[x] and d
dy
: R[y] → R[y] be linear maps, known as deriva-
∂2
tions. Prove that ddx ⊗ dy
d
≡ ∂ x∂ y
. Here, R[x] and R[y] represent the vector spaces
of all real polynomials in x and y, respectively. Here, x and y do not commute.

4. Let P2 (x) be the vector space of all polynomials over R in x of degree less
than or equal to 2 and P3 (y) be the vector space of all polynomials over R in y
320 9 Tensors and Their Algebras

of degree less than or equal to 3. Let B1 = {1, x, x 2 } and B2 = {1, y, y 2 , y 3 } be


bases of P2 (x) and P3 (y), respectively. Further, suppose ddx : P2 (x) → P2 (x) and
d
dy
: P2 (y) → P2 (y) be linear maps. Determine the matrix of linear map ddx ⊗ dy d

with respect to ordered basis B1 ⊗ B2 , where order is your choice.

5. Is the tensor product of linear transformations associative? Justify your claim.

6. Is the tensor product of linear transformations commutative? Justify your claim.

7. Let U and V be vector spaces over F, having dimensions m and n, respectively,


where F is an algebraically closed field. Let T1 : U → U be a linear map, having
eigen values λ1 , λ2 , . . . , λm and T2 : V → V be another linear map, having eigen
values μ1 , μ2 , . . . , μn . Show that all the eigen values of T1 ⊗ T2 will be given by
λi μ j ; 1 ≤ i ≤ m, 1 ≤ j ≤ n.

9.3 Tensor Algebra

In the previous section, we studied how to obtain the tensor product of finite number
of vector spaces over the same field F. Given a vector space V, we want to construct
an algebra over F. Before, doing that we want to give a brief idea of external direct
product and external direct sum of an arbitrarily set of vector spaces over the same
field F. Let F = {Vi |i ∈ I } be an arbitrarily given set of vector spaces Vi , where I
is an indexing set. Let P denotes the set of all maps that can be defined from I to
the union M of the sets Vi such that f (i) ∈ Vi holds for every i ∈ I, i.e., P = { f :
I −→ M = ∪i∈I Vi and f (i) ∈ Vi holds for every i ∈ I }. Define addition and scalar
multiplication in P as: for any f, g ∈ P, α ∈ F, the functions f + g : I −→ M and
α f : I −→ M are defined by ( f + g)(i) = f (i) + g(i) and (α f )(i) = α( f (i)) for
each i ∈ I. One can easily verify that P is a vector space over F with regard to these
operations. P is known
 as the external direct product of vector spaces Vi , i ∈ I. It is
usually denoted by , i.e., P = i∈I Vi .
Now, we construct a special type of subspace of the vector space P. Let us consider
a subset S of P, which consists of all f ∈ P such that f (i) = 0 holds for all except
finite number of indices i ∈ I. It is obvious to observe that S is a subspace of P.
This subspace is known as the external direct sum of given set F of vector spaces.
 ext
Usually, we denote it by ext , i.e., S = i∈I Vi . It is to be remarked here that
if the indexing set I is finite, then P = S. And, in this case, each of them can be
called either the external direct product or the external direct sum of the given set F
of vector spaces.
For each index j ∈ I, we define a map f j : V j −→ S such that ( f j )(v) ∈ S for
every v ∈ V j , where ( f j )(v) : I −→ M is a map defined by (( f j )(v))(i) = v if i = j
and 0, otherwise. It can be easily shown that f j is an injective linear transforma-
tion. Thus, for each index j ∈ I, we can identify V j with its image f j (V j ) in S
9.3 Tensor Algebra 321

and in this sense, we can say that V j is a subspace of S for each index j ∈ I.
Let f ∈ S and suppose that 0 = f (i 1 ) = v1 ∈ Vi1 , 0 = f (i 2 ) = v2 ∈ Vi2 , . . . , 0 =
f (ir ) = vr ∈ Vir but f (i) = 0 for each i ∈ I − {i 1 , i 2 , . . . , ir }, where r is any non-
negative integer. Now, we can write f = ( f i1 )(v1 ) + ( f i2 )(v2 ) + · · · + ( f ir )(vr ).
Here, the vectors ( f i1 )(v1 ), ( f i2 )(v2 ), . . . , ( f ir )(vr ) can be identified by the vectors
v1 , v2 , . . . , vr respectively. Thus, finally we can write f ∈ S as f = v1 + v2 + · · · +
r
vr , i.e., f = vi in identified sense. Now, we start constructing an algebra over F.
i=1
p p
Let V be a vector space over F. Define the symbol Vq by Vq = V1 ⊗ V2 ⊗
· · · ⊗ Vp ⊗ V
1 ⊗ V
2 ⊗ · · · ⊗ V
q where Vi = V, i = 1, 2, . . . , p and V
j = V
, j =
p
1, 2, . . . , q. Also, define V00 =F, V0 = V1 ⊗ V2 ⊗ · · · ⊗ V p , Vq0 = V
1 ⊗ V
2 ⊗ · · · ⊗

q , where Vi = V, i = 1, 2, . . . , p, V
V
j = V
, j = 1, 2, . . . , q. Thus, the symbols
p
Vq have been defined for each nonnegative integers p and q. Obviously, F =
p
{Vq , where ( p, q) ∈ N ∪ {0} × N ∪ {0}} is a set of vector spaces, where N ∪ {0} ×
N ∪ {0} is an indexing set. Let T p(V ) be the external direct sum of this set of vector
spaces, i.e., T (V ) = ext ( p,q)∈I Vq , where I = N ∪ {0} × N ∪ {0}. The vector space
T (V ) is known as the tensor space of V and an element of T (V ) is called a tensor.
p
By the previous arguments, it is clear that Vq are subspaces of T (V ) for each non-
p
negative integers p and q. If p and q are positive integers, then an element of V0
is called a contravariant tensor, an element of Vq0 is called a covariant tensor and an
p
element of Vq is called as p times contravariant and q times covariant tensor or a
mixed tensor of the type ( p, q).
r  s
p
Define a multiplication in T (V ) as: let x, y ∈ T (V ), clearly x = vq ji , y =
i=1 j=1

m 
n
p p p p p
vq i , where vq ji ∈ Vq j i , vq i ∈ Vq  i then
j j j
i=1 j=1

  
pr + pm qs +qn
 p

xy = vqpji ⊗ vq i ,
j
a=0 b=0 pi + pi =a q j +q j =b

  
pi + p
where vq ji ⊗ vq i ∈ Vq j i ⊗ Vq  i ∼
p p p p
j j
= Vq j +q j i . This can be easily seen that this multipli-
cation is a binary operation in T (V ). Actually, this multiplication is a bilinear map
from T (V ) × T (V ) to T (V ). It can be verified that the vector space T (V ) forms an
algebra over F with regard to the multiplication defined above, which is in general
noncommutative, infinite dimensional and with the identity V00 . This algebra con-
structed above is known as the tensor algebra of V and is usually denoted by T (V ).
We will denote the set of all contravariant and covariant tensors in T (V ) by T0 (V )
and T 0 (V ), respectively.
Theorem 9.17 T0 (V ) and T 0 (V ) form subalgebras of the tensor algebra T (V ).
 p p p
Proof Clearly T0 (V ) = { v0 |v0 ∈ V0 , p = 1, 2, 3, . . .}. Let x, y ∈ T0 (V ).
f inite

r
p 
m
p 
r
p
Thus x = v0 i and y = v0 i . For any α, β ∈ F, we have αx + βy = αv0 i +
i=1 i=1 i=1
322 9 Tensors and Their Algebras


m
p p p p p
βv0 i . As V0 i and V0 i are subspaces of T (V ), we have, say w0 i = αv0 i ∈
i=1
p p p p 
r
p 
m
p
V0 i and w0 i = βv0 i ∈ V0 i . Thus αx + βy = w0 i + w0 i . Which shows that
i=1 i=1
αx + βy ∈ T0 (V ), as a result T0 (V ) is a subspace of T (V ). Taking α = 1, β = −1,
+ pm  
pr
p p p p
we have x − y ∈ T0 (V ). Here x y = v0 i ⊗ v0 i ), where v0 i ⊗ v0 i ∈
a=0 pi + pi =a
p p
∼ pi + p 
V0 i⊗ = V0 i . In turn, we conclude that x y ∈ T0 (V ) and hence T0 (V ) is a
V0 i
subring of T (V ). Only little observation shows that T0 (V ) is a subalgebra of T (V ).
Using the similar arguments, it can be proved that T 0 (V ) is also a subalgebra of
T (V ).

Noting the definition of multiplication in T (V ), it is clear that the product of any


p
two arbitrary tensors is completely determined by the product of tensors x ∈ Vq and

p
y ∈ Vq  . Now, we see the calculus and the properties involved in the product of such
tensors x and y. Let V be a finite dimensional vector space having dimension n. Let
{ei }i=1
n
be a basis for V and {ei }i=1
n
be a dual basis for V
. By the previous section, we
have ei1 ⊗ ei2 ⊗ · · · ei p ⊗ e ⊗ e ⊗ · · · ⊗ e , where i k and jm run independently
j1 j2 jq
p
over the integer 1 through n, is a basis for Vq . For the sake of conciseness, we write
j j ··· j p
these tensors used in the basis as: ei11i22···i pq . In general, any element x ∈ Vq is of the
form

x = x1 ⊗ x2 ⊗ · · · ⊗ x p ⊗ y1 ⊗ y2 ⊗ · · · ⊗ yq , (9.1)


n 
n
where xk = xkik eik , k = 1, 2, . . . , p; ym = y mjm e jm , m = 1, 2, . . . , q. If we
i k =1 jm =1
put the values of xk and ym in the above expression for x, then a vast set of summa-
tions occurs. Thus to avoid all these summation signs, we separate these signs but it
is understood. Now using the multilinearity, we obtain
i q j j ··· j
x = x1i1 x2i2 · · · x pp y 1j1 y 2j2 · · · y jq ei11i22···i pq .

i i ···i j j ··· j
Using short notation, we can write the above expression as x = α j11 j22 ··· jpq ei11i22···i pq . Here,
it is understood that summation is carried out on each index. Thus, each tensor in
p k1 k2 ···k  h 1 h 2 ···h 
Vq can be written in this form. Let y = βh 1 h 2 ···hpq  ek1 k2 ···k pq . As a result

i i ···i k1 k2 ···k  j1 j2 ··· jq h 1 h 2 ···h 


x ⊗ y = α j11 j22 ··· jpq βh 1 h 2 ···hpq  ei1 i2 ···i p k1 k2 ···k pq .

A tensor x that can be expressed in the form (9.1) is known as decomposable


tensor. Otherwise, tensor is known as indecomposable. Recalling the definition of
a tensor, we conclude that, actually, each tensor is a finite sum of decomposable
tensors. For example, the tensors
9.3 Tensor Algebra 323

3e1 ⊗ e1 ⊗ e2 ∈ V21 , −3e1 ⊗ e1 + 10e2 ⊗ e1 − 6e1 ⊗ e2 + 20e2 ⊗ e2

= (−3e1 + 10e2 ) ⊗ (e1 + 2e2 ) ∈ V11

etc. are decomposable tensors but tensors 5e1 ⊗ e1 ⊗ e2 + e1 ⊗ e1 , e1 ⊗ e1 + e1 ⊗


e2 etc. are indecomposable. Here, we notice that the product of any two tensors
in T (V ) becomes a tensor of higher order, i.e., multiplication in T (V ) is an order
enhancing phenomenon in tensors. Now, we introduce another phenomenon in T (V ),
which reduces the orders of the tensors, that is known as contraction.
Contraction
Let V be any vector space and T (V ) be the tensor algebra of V. Define a map-
ping f : V1 × V2 × · · · × V p × V
1 × V

2 × · · · × V

q −→ Vq−1 p−1
, where Vi = V, i =

1, 2, . . . , p, V j = V , j = 1, 2, . . . , q by
f (x1 , x2 , . . . , x p , y 1 , y 2 , . . . , y q )
= y k (x h )x1 ⊗ x2 ⊗ · · · ⊗ x
h ⊗ x p ⊗ y 1 ⊗ y 2 ⊗ · · · ⊗ y
k ⊗ · · · ⊗ y q , where
x
h and y
k indicate that these vectors are deleted from the tensor. Here, y k (x h ) repre-
sents the image of x h ∈ V under the linear functional y k ∈ V
. It can be verified that
f is a ( p + q)-linear mapping. Thus, using the definition of tensor product of finite
p p−1
number of vector spaces, there exists a unique linear map say Ckh : Vq −→ Vq−1
such that
Ckh (x1 ⊗ x2 ⊗ · · · ⊗ x p ⊗ y 1 ⊗ y 2 ⊗ · · · ⊗ y q )
= y k (x h )x1 ⊗ x2 ⊗ · · · ⊗ x
h ⊗ x p ⊗ y 1 ⊗ y 2 ⊗ · · · ⊗ y
k ⊗ · · · ⊗ y q .
p
Clearly, under this map, the orders of each tensor in Vq are being reduced. The map-
ping Ckh is called a contraction of the hth contravariant index and the kth covariant
p
index. Now, we examine some behaviors of Ckh on the tensors belonging to Vq . For
this, let V be of finite dimension n and further we will use some notations related
with V and T (V ), which have the same meaning as they have in previous para-
j j ··· j p j j ··· j j j ···

j ··· j
graphs. Consider the tensor ei11i22···i pq ∈ Vq , then Ckh (ei11i22···i pq ) = e jk (eih )ei 1i 2···i
k···i q =
1 2 h p
j j ···

j ··· j i i ···i j j ··· j


δih jk ei 1i 2···i
k···i q , where δih jk stands for Kronecker’s delta. If x = α j11 j22 ··· jpq ei11i22···i pq , then
1 2 h p
i i ···i ···i j j ···

j ··· j
using the linearity of Ckh , we have Ckh (x) = α j11 j22 ··· hjk−1 iph jk+1 ··· jq ei 1i 2···i
k···i q . We consider
1 2 h p
some examples to clear these complicated symbols.
Let x ∈ T (V ) such that x = e1 ⊗ e2 − 5e2 ⊗ e2 + e1 ⊗ e2 − 7e2 ⊗ e1 . Then x ∈
V1 and C11 (x) = −5. Let y ∈ T (V ) such that
1

y = 5e1 ⊗ e2 ⊗ e1 ⊗ e1 ⊗ e3 − 7e1 ⊗ e2 ⊗ e1 ⊗ e1 ⊗ e3 + e1 ⊗ e1 ⊗ e2 ⊗ e3 ⊗
e . Clearly y ∈ V32 and C11 (y) = 5e2 ⊗ e1 ⊗ e3 − 7e1 ⊗ e2 ⊗ e3 , C21 (y) = 5e2 ⊗
2

e1 ⊗ e3 − 7e2 ⊗ e1 ⊗ e3 , C32 (y) = 0 but C23 (y), C41 (y) etc. are undefined.
Example 9.18 Let V be a finite dimensional vector space with dimension n. Then V11
is isomorphic to A (V ) via the map f, x ⊗ y −→ f (x ⊗ y) such that f (x ⊗ y)(v) =
y(v)x for all v ∈ V, where y(v) is the image of v under the linear functional y.
Moreover, the contraction C11 of any (x ⊗ y) ∈ V11 is precisely the trace of the linear
operator f (x ⊗ y), where trace of a linear operator is defined as the trace of any
matrix of the linear operator with regard to any ordered basis B of V.
324 9 Tensors and Their Algebras

Define a map η : V × V
−→ A (V ) such that η(x, y) = g(x, y), where g(x, y)
(v) = y(v)x for all v ∈ V. Obviously, for any fixed x and y, g(x, y) is a linear operator
on V. It can be also easily verified that η is a bilinear map. Using the definition of
tensor product, there exists a unique linear map f : V11 to A (V ) such that f (x ⊗
y)(v) = g(x, y)(v) = y(v)x. Here, we observe that V11 and A (V ) have the same
finite dimension. To prove that f is an isomorphism, it is only left to show that f is

n 
n
j
one-to-one. For this, let z ∈ K er f, i.e., z = αi (ei ⊗ e j ) because ei ⊗ e j , 1 ≤
i=1 j=1
j 
n 
n
j
i ≤ n, 1 ≤ j ≤ n is a basis for V11 , where αi ∈ F. This implies that f ( αi (ei ⊗
i=1 j=1

n 
n
j
e j ) = 0. Due to linearity of f, we have αi f (ei ⊗ e j ) = 0. It follows that
i=1 j=1

n 
n
j 
n 
n
j
[ αi f (ei ⊗ e )](v) = 0 for all v ∈ V, i.e.,
j
αi ((e j (v))(ei )) = 0 for all
i=1 j=1 i=1 j=1
v ∈ V. Using the fact that {e1 , e2 , . . . , en } is linearly independent set in V and varying
j
v through all the vectors in the set {e1 , e2 , . . . , en }, one concludes that αi = 0 for all
i, j. In turn, we get z = 0 and hence f becomes injective. Hence V1 is isomorphic
1

to Hom (V, V ). This shows that tensors in V11 can be regarded as elements of A (V ).
In a similar fashion, one can show that tensors in V21 can be regarded as elements of
A (V
).
The contraction of x ⊗ y ∈ V11 is given by C11 (x ⊗ y) = y(x).
Case I: If at least one out of x and y is zero, then C11 (x ⊗ y) = 0 and the operator
f (x ⊗ y) will be the zero operator. As the trace of the zero operator is zero, result is
obvious for this case.
Case II: Suppose that x = 0 and y = 0. As y is a nonzero linear functional, Ker y = V
and rank y = 1. Using rank nullity theorem, we have Nullity y = n − 1. Thus V =
Ker y < u > for some nonzero u ∈ V. Now using the previous conclusion, one
can show that the matrix of linear operator f (x ⊗ y) with regard to any ordered basis
will have each of its diagonal entries equal to 0 except one that equals y(x). Thus,
trace of the linear operator equals C11 (x ⊗ y). This proves our result.

Using the same idea as in the previous example, each tensor in V31 can be
regarded as a linear transformation from V03 into V. It can be easily seen that
j j j
contraction need not be commutative. For example, let x = α ij11ij22i3j3 ei11i22i33 be an element
j j j j j
of V33 . Then C12 (x) = αii11 ij22ij33 ei12i33 , C21 (x) = α ij11ii21ij33 ei21i33 , C12 ◦ C21 (x) = αii21ii12 ij33 ei31 , C21 ◦
j
C12 (x) = αii11ii12 ij33 ei32 . Clearly, this shows that C12 ◦ C21 = C21 ◦ C12 . It is to be noted
that product of contractions in one order may be defined but it may not be defined
in the reverse order. The contractions C11 are special in nature. Notice that if we
j j j j j j
take x = α ij11ij22i3j3 ei11i22i33 , then C11 (x) = αii11 ij22ij33 ei22i33 , C11 ◦ C11 (x) = αii11ii22 ij33 ei33 , C11 ◦ C11 ◦
C11 (x) = αii11ii22ii33 . Thus C11 ◦ C11 ◦ C11 maps V33 into the scalars. Similarly, it follows
p
that p copies of C11 composed with each other map V p into F.
9.3 Tensor Algebra 325

Exercises

1. If dimension of V is greater than 1, then prove that T (V ) is a ring, which is free


from zero divisors.
2. Using only properties of tensor product, show that Vq ⊗ Vsr ∼
p p+r
= Vq+s .
3. Find an isomorphism to show that V21 is isomorphic to Hom (V
, V
). Also deter-
mine the matrix of the elements of Hom (V
, V
).
4. Let V be an n-dimensional vector space. Let X = {xi }i=1 n
be a basis of V together

with X = {x i }i=1 as its dual basis. If B = {ei }i=1 is another basis of V together
n n
 j
with B = {ei }i=1
n
as its dual basis such that ei = αih and e j = βk x k , then obtain
p j j ··· j
the coefficients of an element of Vq for its expansion in the basis xi11i22···i pq .
5. Let x, y ∈ V11 be decomposable. Find the conditions such that x + y is also
decomposable.
6. Prove that tensor algebra T (V ) is a graded algebra.

9.4 Exterior Algebra or Grassmann Algebra

Throughout this section, V represents a finite dimensional vector space over a field
F, where char (F) = 0 and dim (V ) = n. In this section, we define symmetric and
p
antisymmetric or alternating tensors in V0 . Later, the constructions of symmetric
tensor algebras ST (V ) and antisymmetric tensor algebras AT (V ) of the vector space
V with their properties are interpreted. We have concluded this section by introducing
 of exterior product or wedge product ∧ and exterior algebra or Grassmann
the notions
algebra V of V.
Let S p be the permutation group of p integers 1, 2, . . . , p. Let σ ∈ S p . Define a
p
map g : V1 × V2 × · · · × V p −→ V0 , where Vi = V for each i such that 1 ≤ i ≤ p
and g(x1 , x2 , . . . , x p ) = xσ (1) ⊗ xσ (2) ⊗ · · · ⊗ xσ ( p) . It can be shown that g is a p-
linear mapping. Thus using the definition of tensor product, there exists a unique lin-
p p
ear map, say Sσ : V0 −→ V0 , such that Sσ (x1 ⊗ x2 ⊗ · · · ⊗ x p ) = xσ (1) ⊗ xσ (2) ⊗
· · · ⊗ xσ ( p) . With the help of Sσ , we define symmetric and antisymmetric tensors in
p p
V0 . A tensor x ∈ V0 is said to be symmetric if Sσ (x) = x for every σ. If Sσ (x) = x
for some particular σ, then x is said to be symmetric with regard to σ. On the other
p
hand, a tensor x ∈ V0 is said to be antisymmetric or alternating if Sσ (x) =(Sign
σ )x for every σ. If Sσ (x) =(Sign σ ) x for some particular σ, then x is said to be
antisymmetric or alternating with regard to σ, where Sign σ is 1 if σ is even and
Sign σ is −1 if σ is odd.
p
Theorem 9.19 Let x ∈ V0 . Then x is symmetric if and only if Sτ (x) = x for every
transposition τ. The tensor x is antisymmetric if and only if Sτ (x) = −x for every
transposition τ.
Proof Let x be symmetric. Thus Sσ (x) = x for every σ. But any transposition is also
a particular permutation, hence Sτ (x) = x. Conversely, suppose that Sτ (x) = x for
326 9 Tensors and Their Algebras

every transposition τ. Let σ ∈ S p . We know that each permutation can be written as


product of transpositions. Thus σ = τ1 τ2 · · · τm and hence Sσ (x) = S(τ1 τ2 ···τm ) (x) =
x, using the fact that Sσ = S(τ1 τ2 ···τm ) = Sτ1 oSτ2 o · · · oSτm , we get as desired.
Let x be antisymmetric. Thus Sσ (x) = (Sign σ )x for every σ and Sign σ is 1 if
σ is even and Sign σ is −1 if σ is odd. If σ = τ, then Sign σ = −1. We arrive at
Sτ (x) = −x. Conversely, let Sτ (x) = −x for every transposition τ. Let σ ∈ S p . By
using the above arguments we have Sσ = S(τ1 τ2 ···τm ) = Sτ1 oSτ2 o · · · oSτm and hence
Sσ (x) = (−1)m x. If σ is an odd permutation, then m will be an odd integer and
hence Sσ (x) = −x. On the other hand if m is an even permutation, then m will be
an even integer and hence Sσ (x) = x. Thus our result stands proved.
p
The set of all symmetric tensors of V0 is usually denoted by ST p (V ), i.e.,
p
ST p (V ) = {x ∈ V0 |Sσ (x) = x, for all σ ∈ S p }. Let x, y ∈ ST p (V ), α, β ∈ F. Due
to linearity of Sσ , we have Sσ (αx + βy) = αSσ (x) + β Sσ (y). The symmetry of x
and y forces us to conclude that αx + βy ∈ ST p (V ). Thus ST p (V ) is a subspace
p p
of V0 . The set of all antisymmetric tensors of V0 is usually denoted by AT p (V ).
p
The set of all antisymmetric tensors of V0 is usually denoted by AT p (V ), i.e.,
p
AT p (V ) = {x ∈ V0 |Sσ (x) = (Sign σ ) x, for all σ ∈ S p }, where Sign σ is 1 if σ is
even and Sign σ is −1 if σ is odd. On the similar lines, one can show that AT p (V )
p
is a subspace of V0 . The vector space ST p (V ) is called the symmetric tensor space
of V. On the other hand, the vector space AT p (V ) is called the antisymmetric tensor
space or exterior product space of V.
p
Theorem 9.20 Let S and A be linear operators defined on V0 such that S =
1 
p!
S σ , and A = 1
p!
(Sign σ )Sσ , where the sum in both the operators is over all
the elements of S p . Then both S and A are projections. S is a projection on ST p (V )
and A is a projection on AT p (V ).
p 
Proof Let x ∈ V0 . Then for any ξ ∈ S p , Sξ S(x) = p!1 Sξ σ (x). But S p is a group.

As
 σ runs over all the elements of S p , so does ξ σ. This implies that Sξ σ (x) =
p
Sσ (x). Hence Sξ S(x) = S(x). This implies that for any x ∈ V0 , S(x) is sym-
metric. Furthermore, if x is symmetric, then Sσ (x) = x for all σ ∈ S p and S(x) =
1  1 
p!
S σ (x) = p!
x = x, because order of S p = p!. Finally, we have shown that
p
S (x) = S(x) for every x ∈ V0 . Thus S is projection and it is projection on ST p (V ).
2

Let τ be a transposition being an element of S p . Then τ σ is odd if σ is even  and


τ σ is even if σ is odd. Thus sign (τ σ ) = − sign (σ ). Thus Sτ (A(x)) = p!1 (Sign
 p
σ )Sτ σ (x) = − p!1 (Sign τ σ )Sτ σ (x) = −A(x). Hence, for any x ∈ V0 , A(x) is an
alternating tensor. Also, if x is alternating, then (sign σ )Sσ (x) =(sign σ )2 x = x.
p
Hence A(x) = x. Thus, since A(x) is alternating, for any x ∈ V0 , A2 (x) = A(x).
Now we have proved that A is projection and it is projection on AT p (V ).
p
Theorem 9.21 If I p denotes the linear span of all tensors in V0 , that are symmetric
with respect to a transposition, then kernel of A is I p .
p
Proof Let x ∈ V0 , which is symmetric
 with regard to atransposition τ. Thus
Sτ (x) = x, and A(Sτ (x)) = p!1 (Sign σ )Sσ τ (x) = − p!1 (Sign σ τ )Sσ τ (x) =
9.4 Exterior Algebra or Grassmann Algebra 327

−A(x). Thus, we get A(x) = −A(x), which implies that A(x) = 0 because char
(F) = 0. Hence x ∈ Kernel (A). This implies that  I p ⊆ Kernel A. Let x ∈ Ker-
nel A. Thus A(x) = 0. As we have A(x) = p!1 (Sign σ )Sσ (x). Now adding x

on both sides of the preceding relation, we arrive at x = p!1 (x−Sign σ Sσ (x)).
To show that x ∈ I p , it is sufficient to prove that x−Sign σ Sσ (x) ∈ I p for any
σ ∈ S p . We prove this statement by applying induction on the number of trans-
positions in which σ is expressible. The minimum number of transpositions in
which σ is expressible will be 1 and this is the case, when σ is itself transposition.
p
Thus let σ = (i j). But x ∈ V0 , we have x = x1 ⊗ x2 ⊗ · · · ⊗ x p . This shows that
y = x−Sign (i j)S(i j) (x) = x1 ⊗ x2 ⊗ · · · ⊗ xi ⊗ · · · ⊗ x j ⊗ · · · ⊗ x p + x1 ⊗ x2 ⊗
· · · ⊗ x j ⊗ · · · ⊗ xi ⊗ · · · ⊗ x p = x1 ⊗ x2 ⊗ · · · ⊗ (xi + x j ) ⊗ · · · ⊗ (xi + x j ) ⊗
· · · ⊗ x p − x1 ⊗ x2 ⊗ · · · ⊗ xi ⊗ · · · ⊗ xi ⊗ · · · ⊗ x p − x1 ⊗ x2 ⊗ · · · ⊗ x j ⊗ · · · ⊗
x j ⊗ · · · ⊗ x p . Obviously y ∈ I p . In fact, we have shown that if τ is any transpo-
sition then x + Sτ (x) ∈ I p . Assume the induction hypothesis, i.e., the statement is
true for all permutations σ which have been expressible as product of r transposi-
tions. Let σ1 ∈ S p , which is expressible as product of r + 1 transpositions. We can
  
write σ1 = σ τ , where σ is a permutation expressible as a product of r transpositions
  
and τ is a transposition. Now x−Sign σ1 Sσ1 (x) = x−Sign (σ τ )Sσ  τ  (x) = x+Sign
 
(σ )Sσ  τ  (x) − Sτ  (x) + Sτ  (x) = x + Sτ  (x)+Sign (σ )Sσ  τ  (x) − Sτ  (x) = x− 
  
Signτ Sτ  (x) − [Sτ  (x)−Sign (σ )Sσ  (Sτ  (x))]. Using the fact that A(τ (x)) = p!1
 
(Sign σ )Sσ τ  (x) = − p!1 (Sign σ τ )Sσ τ  (x) = −A(x) = 0 for any σ ∈ S p and the
induction hypothesis we conclude that x−Sign σ1 Sσ1 (x) ∈ I p . Thus result follows
by induction.

Let F = {ST^p(V) | p ∈ I = {0, 1, 2, . . .}} be the set of symmetric tensor spaces, where I is an indexing set. Let ST(V) be the external direct sum of this set of vector spaces, i.e., ST(V) = ⊕ext_{p∈I} ST^p(V). ST(V) is an infinite dimensional subspace of T_0(V), but ST(V) is not closed with regard to ⊗. We therefore define a new multiplication in ST(V) in the following way. Let x, y ∈ ST(V); clearly x = Σ_{i=1}^r v_i and y = Σ_{j=1}^s v'_j, where v_i ∈ ST^i(V) and v'_j ∈ ST^j(V). Then

xy = Σ_{a=0}^{r+s} ( Σ_{i+j=a} S(v_i ⊗ v'_j) ).

It is easy to check that ST(V) forms a commutative algebra with identity with regard to the multiplication defined above. This algebra is known as the symmetric tensor algebra of V.

Theorem 9.22 Let V be any vector space. Then the symmetric tensor algebra ST (V )
is isomorphic to the algebra of polynomials F[e1 , e2 , . . . , en ], where {e1 , e2 , . . . , en }
is a basis of V.

Proof We know that each element u of symmetric tensor algebra ST (V ) will be



r
identified as u = vi , where vi ∈ ST i (V ). Thus let x ∈ ST i (V ), as x is a sym-
i=1

n
metric element of V0i . Hence x = α j1 , j2 ,..., ji e j1 ⊗ e j2 ⊗ · · · ⊗ e ji , where
j1 , j2 ,..., ji =1
α j1 , j2 ,..., ji ∈ F. Define a map f : ST (V ) −→ F[e1 , e2 , . . . , en ], such that
 

n
f α j1 , j2 ,..., ji e j1 ⊗e j2 ⊗ · · · ⊗ e ji
j1 , j2 ,..., ji =1

n   
= α j1 , j2 ,..., ji (e j1 e j2 ··· e ji ),
j1 , j2 ,..., ji =1


where represents the multiplication of algebra F[e1 , e2 , . . . , en ]. It can be easily
verified that f is an algebra isomorphism. Thus the symmetric tensor algebra ST (V )
is isomorphic to the algebra of polynomials F[e1 , e2 , . . . , en ].

Theorem 9.23 Let V be any vector space. Then the vector space ST^p(V) of symmetric tensors is isomorphic to the vector space F_p[e_1, e_2, . . . , e_n] of homogeneous polynomials of degree p, for p ≥ 1.

Proof Let x ∈ ST^p(V); as x is a symmetric element of V_0^p, we have x = Σ_{j_1,...,j_p=1}^n α_{j_1,...,j_p} e_{j_1} ⊗ e_{j_2} ⊗ · · · ⊗ e_{j_p}, where α_{j_1,...,j_p} ∈ F. Define a map f : ST^p(V) → F_p[e_1, e_2, . . . , e_n] such that

f( Σ_{j_1,...,j_p=1}^n α_{j_1,...,j_p} e_{j_1} ⊗ e_{j_2} ⊗ · · · ⊗ e_{j_p} ) = Σ_{j_1,...,j_p=1}^n α_{j_1,...,j_p} (e_{j_1} · e_{j_2} · · · e_{j_p}),

where · represents the multiplication of the algebra F[e_1, e_2, . . . , e_n]. It can be easily verified that f is a vector space isomorphism. Thus the vector space ST^p(V) of symmetric tensors is isomorphic to the vector space F_p[e_1, e_2, . . . , e_n].

Theorem 9.24 Let V be any vector space. Then the vector space F_p[e_1, e_2, . . . , e_n] of homogeneous polynomials of degree p is isomorphic to a quotient space of the vector space V_0^p, for p ≥ 1.

Proof Define a map f : V_0^p → F_p[e_1, e_2, . . . , e_n] such that

f( Σ_{j_1,...,j_p=1}^n α_{j_1,...,j_p} e_{j_1} ⊗ e_{j_2} ⊗ · · · ⊗ e_{j_p} ) = Σ_{j_1,...,j_p=1}^n α_{j_1,...,j_p} (e_{j_1} · e_{j_2} · · · e_{j_p}).

Obviously, f is a surjective linear map, and its kernel is I_p = <{S_σ(x) − x | x ∈ V_0^p, σ ∈ S_p}>. By the fundamental theorem of vector space homomorphisms, we have V_0^p / I_p ≅ F_p[e_1, e_2, . . . , e_n], which is the required result. Using Theorem 9.23, we deduce that V_0^p / I_p ≅ ST^p(V). Owing to this isomorphism, the vector space V_0^p / I_p is also referred to as the symmetric tensor space of V.

Finally, at the end of this section we want to explore the notions of exterior product and exterior algebra of a vector space V. Consider the quotient vector space V_0^p / I_p, and let q : V_0^p → V_0^p / I_p be the quotient homomorphism, i.e., q(x) = x + I_p for any x ∈ V_0^p. The map f : V_1 × V_2 × · · · × V_p → V_0^p, where V_i = V for i = 1, 2, . . . , p, given by f(x_1, x_2, . . . , x_p) = x_1 ⊗ x_2 ⊗ · · · ⊗ x_p, is a p-linear map. Thus, the composite map q ◦ f : V_1 × V_2 × · · · × V_p → V_0^p / I_p is also a p-linear map. The image of an element (x_1, x_2, . . . , x_p) under q ◦ f is denoted by x_1 ∧ x_2 ∧ · · · ∧ x_p. The quotient space V_0^p / I_p is denoted by ∧^p V, and the elements of ∧^p V are called p-vectors over V. A p-vector is called decomposable, or pure, if it is of the form x_1 ∧ x_2 ∧ · · · ∧ x_p. We define ∧^1 V = V and ∧^0 V = F. Now, we are interested in dealing with these p-vectors for p = 0, 1, 2, . . . , which will give rise to an algebra. For this we prove the following.

Theorem 9.25 Let U be an arbitrary vector space. If h : ∧^p V → U is a linear mapping, then h ◦ q ◦ f : V_1 × V_2 × · · · × V_p → U, where V_i = V for i = 1, 2, . . . , p, is an alternating p-linear mapping, where q and f are as described in the above paragraph. Conversely, if g : V_1 × V_2 × · · · × V_p → U, where V_i = V for i = 1, 2, . . . , p, is an alternating p-linear mapping, then there exists a unique linear mapping h : ∧^p V → U such that g(x_1, x_2, . . . , x_p) = h(x_1 ∧ x_2 ∧ · · · ∧ x_p).

Proof Obviously, h ◦ q ◦ f is a p-linear map. Let (x_1, x_2, . . . , x_p) ∈ V_1 × V_2 × · · · × V_p with x_i = x_j for some i ≠ j; clearly x_1 ⊗ x_2 ⊗ · · · ⊗ x_p is a symmetric tensor with regard to the transposition (i j). By Theorem 9.21, x_1 ⊗ x_2 ⊗ · · · ⊗ x_p ∈ I_p. As a result, q(x_1 ⊗ x_2 ⊗ · · · ⊗ x_p) = I_p, the zero element of the quotient space. Finally, h ◦ q ◦ f(x_1, x_2, . . . , x_p) = h ◦ q(x_1 ⊗ x_2 ⊗ · · · ⊗ x_p) = h(I_p) = 0, because h is linear. This proves the first statement.
Conversely, suppose that g : V_1 × V_2 × · · · × V_p → U is an alternating p-linear mapping. Thus g(x_1, x_2, . . . , x_p) = 0 if x_i = x_j for some i ≠ j. Also, by the above arguments, if x_i = x_j with i ≠ j, then x_1 ∧ x_2 ∧ · · · ∧ x_p = 0, i.e., h(x_1 ∧ x_2 ∧ · · · ∧ x_p) = 0. Using these facts, we obtain that the association (y_1 ∧ y_2 ∧ · · · ∧ y_p) ↦ g(y_1, y_2, . . . , y_p) is a well-defined map. Extending this map by linearity on ∧^p V, we obtain a linear map h : ∧^p V → U, because the p-vectors y_1 ∧ y_2 ∧ · · · ∧ y_p span ∧^p(V). Further, for the uniqueness of h, let h_1 : ∧^p V → U be any other linear mapping such that h_1(x_1 ∧ x_2 ∧ · · · ∧ x_p) = g(x_1, x_2, . . . , x_p). Then h and h_1 agree on a spanning set for ∧^p(V). Hence h = h_1.

Now, we define a product of p-vectors and q-vectors. Consider a vector x_1 ∧ x_2 ∧ · · · ∧ x_p ∧ y_1 ∧ y_2 ∧ · · · ∧ y_q ∈ ∧^{p+q}(V). Define a map g_1 : V_1 × V_2 × · · · × V_p → ∧^{p+q}(V) such that g_1(x_1, x_2, . . . , x_p) = x_1 ∧ x_2 ∧ · · · ∧ x_p ∧ y_1 ∧ y_2 ∧ · · · ∧ y_q. Clearly g_1 is an alternating p-linear map. Hence by Theorem 9.25, there exists a unique linear map h_1 : ∧^p(V) → ∧^{p+q}(V) such that g_1(x_1, x_2, . . . , x_p) = h_1(x_1 ∧ x_2 ∧ · · · ∧ x_p) = x_1 ∧ x_2 ∧ · · · ∧ x_p ∧ y_1 ∧ y_2 ∧ · · · ∧ y_q. Similarly, define a map g_2 : V_1 × V_2 × · · · × V_q → ∧^{p+q}(V) such that g_2(y_1, y_2, . . . , y_q) = x_1 ∧ x_2 ∧ · · · ∧ x_p ∧ y_1 ∧ y_2 ∧ · · · ∧ y_q. Clearly g_2 is an alternating q-linear map, and there exists a unique linear map h_2 : ∧^q(V) → ∧^{p+q}(V) such that g_2(y_1, y_2, . . . , y_q) = h_2(y_1 ∧ y_2 ∧ · · · ∧ y_q) = x_1 ∧ x_2 ∧ · · · ∧ x_p ∧ y_1 ∧ y_2 ∧ · · · ∧ y_q. Now define a map f : ∧^p(V) × ∧^q(V) → ∧^{p+q}(V) such that f(x_1 ∧ x_2 ∧ · · · ∧ x_p, y_1 ∧ y_2 ∧ · · · ∧ y_q) = x_1 ∧ x_2 ∧ · · · ∧ x_p ∧ y_1 ∧ y_2 ∧ · · · ∧ y_q. As h_1 and h_2 are linear maps, f is a bilinear map. Let x = x_1 ∧ x_2 ∧ · · · ∧ x_p and y = y_1 ∧ y_2 ∧ · · · ∧ y_q. Then the image of (x, y) under f, i.e., f(x, y) = x ∧ y = x_1 ∧ x_2 ∧ · · · ∧ x_p ∧ y_1 ∧ y_2 ∧ · · · ∧ y_q, is called the exterior product, the wedge product, or the Grassmann product. If p = 0 or q = 0, then we define α ∧ y to be αy and x ∧ α to be αx. Since the above map f is bilinear, the following are evident: (x + y) ∧ (u + v) = x ∧ u + x ∧ v + y ∧ u + y ∧ v, and (αx) ∧ y = x ∧ (αy) = α(x ∧ y). For a pure p-vector x, q-vector y and r-vector z, we have (x ∧ y) ∧ z = x ∧ (y ∧ z).
Let F = {∧^p(V) | p ∈ I = {0, 1, 2, . . .}} be the set of vector spaces of p-vectors, where I is an indexing set. Let ∧(V) be the external direct sum of this set of vector spaces, i.e., ∧(V) = ⊕ext_{p∈I} ∧^p(V). We define a new multiplication in ∧(V) in the following way. Let x, y ∈ ∧(V); clearly x = Σ_{i=1}^r v_i and y = Σ_{j=1}^s v'_j, where v_i ∈ ∧^i(V) and v'_j ∈ ∧^j(V). Then

xy = Σ_{a=0}^{r+s} ( Σ_{i+j=a} (v_i ∧ v'_j) ).

It is easy to check that ∧(V) forms a noncommutative algebra with identity with regard to the exterior (wedge) product defined above. This algebra is known as the antisymmetric tensor algebra of V, the exterior algebra of V, or the Grassmann algebra of V. It is usually denoted by ∧(V) or AT(V).

Exercises
1. Find the dimensions of the subspaces ST^p(V) and AT^p(V) of V_0^p.
2. Prove that the symmetric tensor algebra ST(V) and the exterior algebra ∧(V) are graded algebras.
3. Let the dimension of V be n. Prove that dim ∧^p(V) equals the binomial coefficient C(n, p).
4. If x ∈ ∧^p(V) and y ∈ ∧^q(V), then show that x ∧ y = (−1)^{pq} y ∧ x.
5. Let x_1 ∧ x_2 ∧ · · · ∧ x_p ∈ ∧^p(V). Then prove that x_1 ∧ x_2 ∧ · · · ∧ x_p = (sign σ) x_{σ(1)} ∧ x_{σ(2)} ∧ · · · ∧ x_{σ(p)}, for any σ ∈ S_p.
6. Show that x_1 ∧ x_2 ∧ · · · ∧ x_m = 0, m ≤ n, if and only if {x_i}_{i=1}^m is linearly dependent, where dim V = n.
7. Let {e_i}_{i=1}^n be a basis for V. Prove that the set of C(n, p) vectors of the form e_{i_1} ∧ e_{i_2} ∧ · · · ∧ e_{i_p}, where i_1 < i_2 < · · · < i_p ranges over the combinations of p distinct integers taken from the set {1, 2, . . . , n}, is a basis for ∧^p(V).
Chapter 10
Applications of Linear Algebra
to Numerical Methods

In this chapter, we shall study common problems in numerical linear algebra, which include the LU and P LU decompositions together with their applications in solving linear systems of equations. Further, we shall briefly discuss the power method, which gives an approximation to the eigenvalue of greatest absolute value and a corresponding eigenvector. Finally, the singular value decomposition (SVD) of matrices, together with its properties and applications in diverse fields of study, is included.

10.1 LU Decomposition of Matrices

The Gauss elimination method reduces a system of equations to an equivalent upper triangular system, which can be solved by back substitution. A different approach to solving the system of equations AX = B is to decompose (or factor) the coefficient matrix A into a product of lower and upper triangular matrices. The motivation for a triangular decomposition is the observation that systems of equations involving triangular coefficient matrices are easier to deal with. Using the same arguments as in Remark 1.101(ii) of Chap. 1, the coefficient matrix A can be factored in this way.
Let A be an m × n matrix which can be reduced to a row echelon form U by using a sequence of elementary row operations of type (III) (see Note 1.86), each of which adds a constant multiple of a row R_j (namely αR_j) to another row R_i, so that R_i + αR_j replaces R_i in the new array; this is usually denoted by R_i → R_i + αR_j, i > j, and no two rows are interchanged. We zero out the entries below the pivot entry in the first column, then in the next column, and so on, scanning from the left. By Remark 1.101(i), each of these operations can be accomplished by multiplying on the left by an appropriate elementary matrix. Hence, we can find elementary matrices E_1, E_2, . . . , E_r such that


Er · · · E 2 E 1 A = U.

Since the reduction of A to row echelon form can be achieved without interchanging
any two rows, we can assume that the required elementary matrices E k represent
operations of the form Ri → Ri + α R j designed to add multiples α R j of row R j
to the row Ri below. This means i > j in all the cases. Each E k is, therefore, a
lower triangular elementary matrix with 1 s on the diagonal. The inverse of operation
Ri → Ri + α R j is Ri → Ri − α R j , again with i > j. Hence, the inverse E k−1 is also
a lower triangular matrix with 1 s on its main diagonal. Since elementary matrices
E 1 , E 2 , . . . , Er are nonsingular, multiplying both the sides of the above relation on
the left successively by Er−1 , . . . , E 2−1 , E 1−1 , we get

A = E 1−1 E 2−1 · · · Er−1 U.

The matrix L = E 1−1 E 2−1 · · · Er−1 is a lower triangular matrix with 1 s on the main
diagonal provided that no two rows are interchanged in reducing A to U , and the
above yields that A = LU .
The following theorem summarizes the above result:
Theorem 10.1 (LU decomposition theorem) Suppose that A is an m × n matrix
that can be reduced to echelon form without interchanging any two rows. Then there
exist an m × m lower triangular matrix L with 1 s on the main diagonal and an
m × n row echelon matrix U such that A = LU .
Definition 10.2 A factorization of a matrix A as A = LU , where L is a lower tri-
angular and U is an upper triangular matrix, is called an LU decomposition of A.
There is a convenient procedure for finding an LU decomposition. In fact, it is only necessary to keep track of the multipliers used to reduce the matrix to row echelon form. This procedure is described in the following example and is called the multiplier method or Doolittle's method.
Example 10.3 In order to find an LU decomposition of the matrix A = [1 2 4; 3 8 14; 2 6 13], write the identity matrix on the left, i.e.,

[1 0 0; 0 1 0; 0 0 1] [1 2 4; 3 8 14; 2 6 13].

The procedure involves doing row operations to the matrix on the right while simultaneously updating the columns of the matrix on the left. First, we perform the row operation R_2 → R_2 − 3R_1 to produce a zero below the 1 in the first column and second row. Note that 3 is entered in the second row, first column of the left matrix because −3 times the first row of A was added to the second row:

[1 0 0; 3 1 0; 0 0 1] [1 2 4; 0 2 2; 2 6 13].

We see that E_1 = [1 0 0; −3 1 0; 0 0 1] and hence E_1^{-1} = [1 0 0; 3 1 0; 0 0 1]. We carry out the same procedure for the third row and find that

[1 0 0; 3 1 0; 2 0 1] [1 2 4; 0 2 2; 0 2 5].

Note that E_2 = [1 0 0; 0 1 0; −2 0 1] and hence E_2^{-1} = [1 0 0; 0 1 0; 2 0 1]. Finally, similar arguments for the second column and third row yield

A = [1 0 0; 3 1 0; 2 1 1] [1 2 4; 0 2 2; 0 0 3].

Thus, we have found an LU decomposition of the matrix A. We see that E_3 = [1 0 0; 0 1 0; 0 −1 1] and hence E_3^{-1} = [1 0 0; 0 1 0; 0 1 1]. It can be seen that U = E_3 E_2 E_1 A and L = E_1^{-1} E_2^{-1} E_3^{-1}. Notice that in each position below the main diagonal of L, the entry is the negative of the multiplier used in the operation that introduced the zero in that position of U.
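The bookkeeping of the multiplier method is easy to mechanize. The following Python sketch is a minimal illustration, assuming the numpy library; the helper name lu_doolittle is our own, and the routine presumes that no row interchanges are needed.

```python
import numpy as np

def lu_doolittle(A):
    """LU decomposition without row interchanges (multiplier / Doolittle method).

    Returns (L, U) with L unit lower triangular and U upper triangular,
    assuming no zero pivot is encountered.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for j in range(n - 1):                 # eliminate below the pivot U[j, j]
        if U[j, j] == 0:
            raise ValueError("zero pivot: LU without row interchanges does not exist")
        for i in range(j + 1, n):
            m = U[i, j] / U[j, j]          # multiplier of the operation R_i -> R_i - m R_j
            U[i, :] -= m * U[j, :]
            L[i, j] = m                    # negative of the multiplier used in E_k
    return L, U

A = np.array([[1, 2, 4], [3, 8, 14], [2, 6, 13]], dtype=float)
L, U = lu_doolittle(A)
print(L)                       # [[1,0,0],[3,1,0],[2,1,1]]
print(U)                       # [[1,2,4],[0,2,2],[0,0,3]]
print(np.allclose(L @ U, A))   # True
```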

Remark 10.4 (i) It is natural to ask whether every matrix has an LU decomposition. Sometimes it is impossible to write a given matrix in this form. In fact, if a square matrix A can be reduced to row echelon form without using row interchanges, then A has an LU decomposition. More generally, an invertible matrix A has an LU decomposition provided that all its leading submatrices have nonzero determinants. The kth leading submatrix of A, denoted by A_k, is the k × k matrix obtained by retaining only the top k rows and the leftmost k columns. For example, if A = [a_11 a_12 · · · a_1n; a_21 a_22 · · · a_2n; . . . ; a_n1 a_n2 · · · a_nn], then A_1 = [a_11], A_2 = [a_11 a_12; a_21 a_22], . . . , A_k = [a_11 a_12 · · · a_1k; a_21 a_22 · · · a_2k; . . . ; a_k1 a_k2 · · · a_kk], and A has an LU decomposition if |A_i| ≠ 0 for all 1 ≤ i ≤ n.
(ii) It is also interesting to ask whether a square matrix has more than one LU decomposition. In the absence of additional restrictions, LU decompositions are not unique. For example, if

A = LU = [ℓ_11 0 · · · 0; ℓ_21 ℓ_22 · · · 0; . . . ; ℓ_n1 ℓ_n2 · · · ℓ_nn] [1 u_12 · · · u_1n; 0 1 · · · u_2n; . . . ; 0 0 · · · 1]

and L has nonzero entries on the main diagonal, then the diagonal entries can be shifted from the left factor to the right factor as follows:

A = [1 0 · · · 0; ℓ_21/ℓ_11 1 · · · 0; . . . ; ℓ_n1/ℓ_11 ℓ_n2/ℓ_22 · · · 1] [ℓ_11 0 · · · 0; 0 ℓ_22 · · · 0; . . . ; 0 0 · · · ℓ_nn] [1 u_12 · · · u_1n; 0 1 · · · u_2n; . . . ; 0 0 · · · 1]
  = [1 0 · · · 0; ℓ_21/ℓ_11 1 · · · 0; . . . ; ℓ_n1/ℓ_11 ℓ_n2/ℓ_22 · · · 1] [ℓ_11 ℓ_11 u_12 · · · ℓ_11 u_1n; 0 ℓ_22 · · · ℓ_22 u_2n; . . . ; 0 0 · · · ℓ_nn].

This is another triangular decomposition of A.


(iii) If A is an invertible matrix of order n that can be reduced to row echelon form without interchanging any two rows, then the LU decomposition of A with L unit lower triangular is unique. Suppose A has two such decompositions, say A = L_1 U_1 and A = L_2 U_2. Since L_1, L_2, U_1, U_2 are nonsingular,

L_2^{-1} L_1 = L_2^{-1} I_n L_1 = L_2^{-1} (A A^{-1}) L_1 = L_2^{-1} (L_2 U_2)(L_1 U_1)^{-1} L_1 = L_2^{-1} L_2 (U_2 U_1^{-1}) L_1^{-1} L_1 = U_2 U_1^{-1}.

Now L_2^{-1} is a lower triangular matrix with 1's on its main diagonal, and hence L_2^{-1} L_1 is also a lower triangular matrix with 1's on its main diagonal. By the same reasoning, U_2 U_1^{-1} is an upper triangular matrix. Since L_2^{-1} L_1 = U_2 U_1^{-1}, their common value is a matrix that is both lower triangular with 1's on its main diagonal and upper triangular. The only matrix that meets these requirements is the identity matrix, and hence L_2^{-1} L_1 = U_2 U_1^{-1} = I_n, which implies that L_1 = L_2 and U_1 = U_2. This establishes the uniqueness of the decomposition of A.

The procedure adopted in Example 10.3 for finding LU decomposition of a matrix


is also useful for obtaining an LU decomposition of a rectangular matrix, which can
be seen by the following example.
Example 10.5 In order to find an LU decomposition of A = [1 2 1 2 1; 2 0 2 1 1; 2 3 1 3 2; 1 0 1 1 2], we write the identity matrix on the left, i.e.,

A = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1] [1 2 1 2 1; 2 0 2 1 1; 2 3 1 3 2; 1 0 1 1 2].

Now applying the row operations R_4 → R_4 − R_1, R_2 → R_2 − 2R_1 and R_3 → R_3 − 2R_1 to the matrix on the right and updating the columns of the identity matrix on the left, we find that

[1 0 0 0; 2 1 0 0; 2 0 1 0; 1 0 0 1] [1 2 1 2 1; 0 −4 0 −3 −1; 0 −1 −1 −1 0; 0 −2 0 −1 1].

Further, performing the row operations R_3 → R_3 − (1/4)R_2 followed by R_4 → R_4 − (1/2)R_2, we get

[1 0 0 0; 2 1 0 0; 2 1/4 1 0; 1 1/2 0 1] [1 2 1 2 1; 0 −4 0 −3 −1; 0 0 −1 −1/4 1/4; 0 0 0 1/2 3/2].

This is an LU decomposition of A.
It can be seen that E_1 = [1 0 0 0; −2 1 0 0; 0 0 1 0; 0 0 0 1], E_2 = [1 0 0 0; 0 1 0 0; −2 0 1 0; 0 0 0 1], E_3 = [1 0 0 0; 0 1 0 0; 0 0 1 0; −1 0 0 1], E_4 = [1 0 0 0; 0 1 0 0; 0 −1/4 1 0; 0 0 0 1] and E_5 = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 −1/2 0 1], with

U = E_5 E_4 E_3 E_2 E_1 A = [1 2 1 2 1; 0 −4 0 −3 −1; 0 0 −1 −1/4 1/4; 0 0 0 1/2 3/2] and
L = E_1^{-1} E_2^{-1} E_3^{-1} E_4^{-1} E_5^{-1} = [1 0 0 0; 2 1 0 0; 2 1/4 1 0; 1 1/2 0 1].

Solution of the System of Equations Using LU Decomposition

The LU decomposition is an approach designed to exploit triangular systems and is useful for solving a system of equations. In fact, this method is well suited for digital computers and is the basis of many practical computer programs.
Any given system can be solved in two stages. First, we show how a linear system AX = B can be readily solved once A is decomposed into a product of lower and upper triangular matrices. Once A is factored, the system AX = B can be solved using the following steps:
(1) Write the system AX = B as LU X = B.
(2) Define an n × 1 matrix Y by U X = Y.
(3) Rewrite the system in (1) as LY = B, where L = [ℓ_{ij}] with ℓ_{ij} = 0 for i < j, 1 ≤ i, j ≤ n, Y = (y_1, y_2, . . . , y_n)^t and B = (b_1, b_2, . . . , b_n)^t, and solve this system for Y:

ℓ_{11} y_1 = b_1
ℓ_{21} y_1 + ℓ_{22} y_2 = b_2
ℓ_{31} y_1 + ℓ_{32} y_2 + ℓ_{33} y_3 = b_3
. . .
ℓ_{n1} y_1 + ℓ_{n2} y_2 + ℓ_{n3} y_3 + · · · + ℓ_{nn} y_n = b_n.

The first equation yields the value of y_1, and using the successive equations one finds y_2, y_3, . . . , y_n.
(4) Once Y has been determined, solve the upper triangular system U X = Y to find the solution X of the system.

Remark 10.6 (i) If any of the diagonal elements ℓ_{ii} is zero, then the system is singular and cannot be solved.
(ii) If all the diagonal elements ℓ_{ii} are nonzero, then the system has a unique solution.

Although the above procedure replaces the problem of solving a single system by that of solving two systems LY = B and U X = Y, the latter are easy to solve because the coefficient matrices involved are triangular.
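As a concrete illustration of steps (1)–(4), the following Python sketch (assuming numpy; the helper names are ours) performs forward substitution for LY = B and back substitution for U X = Y, using the square factors found in Example 10.3.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L y = b for lower triangular L with nonzero diagonal."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_substitution(U, y):
    """Solve U x = y for upper triangular U with nonzero diagonal."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Solve A x = b with A = L U from the multiplier method of Example 10.3.
L = np.array([[1.0, 0, 0], [3, 1, 0], [2, 1, 1]])
U = np.array([[1.0, 2, 4], [0, 2, 2], [0, 0, 3]])
b = np.array([1.0, 2, 3])
y = forward_substitution(L, b)
x = back_substitution(U, y)
print(np.allclose(L @ U @ x, b))   # True
```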

Example 10.7 Solve the system of equations involving five variables and four equations:

[1 2 1 2 1; 2 0 2 1 1; 2 3 1 3 2; 1 0 1 1 2] [x; y; z; w; t] = [1; 2; 3; 4].

By the above example, we have the following LU decomposition of the coefficient matrix:

[1 2 1 2 1; 2 0 2 1 1; 2 3 1 3 2; 1 0 1 1 2] = [1 0 0 0; 2 1 0 0; 2 1/4 1 0; 1 1/2 0 1] [1 2 1 2 1; 0 −4 0 −3 −1; 0 0 −1 −1/4 1/4; 0 0 0 1/2 3/2].

Represent the system of equations as AX = B and let U X = Y. Consider LY = B, i.e.,

[1 0 0 0; 2 1 0 0; 2 1/4 1 0; 1 1/2 0 1] [y_1; y_2; y_3; y_4] = [1; 2; 3; 4].

Forward substitution yields Y = (y_1, y_2, y_3, y_4)^t = (1, 0, 1, 3)^t. Now, solving U X = Y, i.e.,

[1 2 1 2 1; 0 −4 0 −3 −1; 0 0 −1 −1/4 1/4; 0 0 0 1/2 3/2] [x; y; z; w; t] = [1; 0; 1; 3],

by back substitution with t as a free parameter, we obtain

X = ( 1/2, (4t − 9)/2, (2t − 5)/2, 6 − 3t, t )^t,  t ∈ R.

These may look like trivial operations, but the approach is advantageous because it reduces the number of arithmetic operations involved in finding a solution and makes a real difference for large systems.

10.2 The P LU Decomposition

The LU decomposition is a useful tool for finding the solution of a system of equations, but it does not work for every matrix. For example, if we consider A = [0 1; 1 0], then it can be seen that there do not exist a lower triangular matrix L and an upper triangular matrix U such that A = LU. The motivation behind the P LU decomposition, where P is a permutation matrix, is Gaussian elimination: while reducing a matrix to echelon form we need to perform eliminations of type (III) (see Note 1.86), i.e., replacing a row R_k by R_k + αR_ℓ, as well as operations of type (I), i.e., interchanging two rows R_k ↔ R_ℓ. Instead of using only operations of type (III), we shall permute two rows (an operation of type (I)) first and then use operations of type (III) afterwards.
Definition 10.8 Let σ ∈ S_n be a permutation defined on the n symbols {1, 2, . . . , n}. Then the permutation matrix P_σ associated with the permutation σ is defined as P_σ = (a_{ij}), where a_{ij} = 1 if j = σ(i), and a_{ij} = 0 otherwise.
For example, if σ is the permutation on four symbols with σ(1) = 2, σ(2) = 4, σ(3) = 3 and σ(4) = 1, then the permutation matrix associated with σ is P_σ = [0 1 0 0; 0 0 0 1; 0 0 1 0; 1 0 0 0].
Since the matrix P_σ consists mostly of zeros, it is easy to work with. It is called a permutation matrix because it is obtained from the identity matrix by permuting its rows.
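The definition of P_σ is easy to realize in code. The following Python sketch (assuming numpy; the helper name is ours) builds the permutation matrix from the list of images σ(1), . . . , σ(n) and checks that left multiplication permutes the rows accordingly.

```python
import numpy as np

def permutation_matrix(sigma):
    """Permutation matrix P with P[i, j] = 1 iff j = sigma(i).

    sigma is given 1-based as the list [sigma(1), ..., sigma(n)].
    """
    n = len(sigma)
    P = np.zeros((n, n))
    for i, j in enumerate(sigma):
        P[i, j - 1] = 1.0
    return P

P = permutation_matrix([2, 4, 3, 1])   # the permutation of the example above
A = np.arange(16.0).reshape(4, 4)
print(P)                                           # rows of I_4 rearranged
print(np.allclose(P @ A, A[[1, 3, 2, 0], :]))      # row i of PA is row sigma(i) of A
```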
Remark 10.9 (i) Let σ, τ ∈ S_n. Then their composition σ ◦ τ ∈ S_n.
(ii) If P_σ and P_τ are the permutation matrices associated with σ and τ, respectively, then P_σ P_τ = P_{τ◦σ}. In fact, if a_{ij}, b_{ij} are the (i, j)th entries of P_σ and P_τ, respectively, and c_{ij} is the (i, j)th entry of P_σ P_τ, then for each i, j ∈ {1, 2, . . . , n}, c_{ij} = Σ_{k=1}^n a_{ik} b_{kj}. Obviously, a_{ik} = 1 if and only if k = σ(i), and b_{kj} = 1 if and only if j = τ(k). Therefore, a_{ik} b_{kj} = 1 if and only if j = τ(k) = τ(σ(i)). Hence, the product P_σ P_τ is the matrix of the permutation τ ◦ σ, and P_σ P_τ = P_{τ◦σ}.
(iii) The elementary matrix associated with the elementary operation of switching two rows is a permutation matrix. Therefore, performing a series of row switches may be represented by a permutation matrix, since it is a product of permutation matrices.
(iv) If E ∈ M_n(F) is an elementary matrix that represents the action R_k → R_k + αR_ℓ and P_σ is the permutation matrix of σ ∈ S_n, then E P_σ = P_σ E', where E' is the elementary matrix that represents the action R_{σ(k)} → R_{σ(k)} + αR_{σ(ℓ)}.
Applying the procedure of LU decomposition, we can say that when no interchanges are needed, a matrix A ∈ M_n(C) can be factored as A = LU, where L is lower triangular and U is upper triangular. When row interchanges are needed, let P be the permutation matrix that creates these row interchanges; then the LU decomposition can be carried out for the matrix P A, i.e., P A = LU. This decomposition is known as the P LU decomposition.

Theorem 10.10 (P LU decomposition theorem) For any m × n matrix A, there exist a permutation matrix P, an m × m lower triangular matrix L and an m × n row echelon matrix U such that P A = LU.

Proof It is clear that a permutation of two rows has no bearing on the use of the elementary row operation R_i → R_i + αR_j in the reduction of A to row echelon form. Thus, for any matrix A, there exists a permutation matrix P such that P A can be reduced to row echelon form without requiring any further permutation of rows. Hence, Theorem 10.1 guarantees that there exist suitable matrices L and U such that P A = LU.
Example 10.11 Find a P LU decomposition of the matrix A = [2 1 0 1; 2 1 2 3; 0 0 1 2; −4 −1 0 −2]. Since an LU decomposition of A is not possible, let P = [1 0 0 0; 0 0 0 1; 0 0 1 0; 0 1 0 0] be a permutation matrix and let

A' = P A = [1 0 0 0; 0 0 0 1; 0 0 1 0; 0 1 0 0] [2 1 0 1; 2 1 2 3; 0 0 1 2; −4 −1 0 −2] = [2 1 0 1; −4 −1 0 −2; 0 0 1 2; 2 1 2 3].

Now we shall find an LU decomposition of A':

[2 1 0 1; −4 −1 0 −2; 0 0 1 2; 2 1 2 3] = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1] [2 1 0 1; −4 −1 0 −2; 0 0 1 2; 2 1 2 3].

Applying the row operations R_2 → R_2 + 2R_1 and R_4 → R_4 − R_1 to the matrix on the right side and updating the columns of the identity matrix on the left, we get

[1 0 0 0; −2 1 0 0; 0 0 1 0; 1 0 0 1] [2 1 0 1; 0 1 0 0; 0 0 1 2; 0 0 2 2].

Further, using the row operation R_4 → R_4 − 2R_3, we find that

[1 0 0 0; −2 1 0 0; 0 0 1 0; 1 0 2 1] [2 1 0 1; 0 1 0 0; 0 0 1 2; 0 0 0 −2].

Thus, A' = P A = LU, where L = [1 0 0 0; −2 1 0 0; 0 0 1 0; 1 0 2 1] and U = [2 1 0 1; 0 1 0 0; 0 0 1 2; 0 0 0 −2].
Hence, A = P²A = P(P A) = P LU, which yields a P LU decomposition of A, i.e.,

[2 1 0 1; 2 1 2 3; 0 0 1 2; −4 −1 0 −2] = [1 0 0 0; 0 0 0 1; 0 0 1 0; 0 1 0 0] [1 0 0 0; −2 1 0 0; 0 0 1 0; 1 0 2 1] [2 1 0 1; 0 1 0 0; 0 0 1 2; 0 0 0 −2].

Solution of the System of Equations Using PLU Decomposition

The P LU decomposition of a matrix can be used to solve a system of equations. Let AX = B, where A ∈ M_n(C), be a system of equations. Find a permutation matrix P such that P A = LU, where L = [ℓ_{ij}] with ℓ_{ij} = 0 for i < j is lower triangular and U = [u_{ij}] with u_{ij} = 0 for i > j is upper triangular. Therefore, we have P AX = P B and hence LU X = P B. Now solve the systems

LY = P B and U X = Y.

Then LU X = LY = P B, and hence X is a solution of the system. This has an advantage over direct Gaussian elimination because the systems LY = P B and U X = Y are triangular and easy to solve.
For the first of the systems, LY = P B, let P B = (b_1, b_2, . . . , b_n)^t. Then forward substitution can be used to determine Y, i.e., we have the recursive relations

y_1 = b_1/ℓ_{11},
y_2 = (b_2 − ℓ_{21} y_1)/ℓ_{22},
. . .
y_n = (b_n − Σ_{i=1}^{n−1} ℓ_{ni} y_i)/ℓ_{nn}.

A similar procedure can be adopted to solve U X = Y. The upper triangular system U X = Y can be written as the set of linear equations

u_{11} x_1 + u_{12} x_2 + · · · + u_{1n} x_n = y_1
u_{22} x_2 + · · · + u_{2n} x_n = y_2
. . .
u_{nn} x_n = y_n.

The back substitution is x_n = y_n/u_{nn}, x_{n−1} = (y_{n−1} − u_{n−1,n} x_n)/u_{n−1,n−1}, . . . , x_1 = (y_1 − Σ_{k=2}^n u_{1k} x_k)/u_{11}.
Remark 10.12 (i) In practice, the step of determining and then multiplying by the permutation matrix is not actually carried out. Rather, an index array is generated while the elimination is performed, which effectively records the row interchanges through a pointer. This saves considerable time in solving potentially very large systems.
(ii) If any of the diagonal elements u_{ii} is zero, then the system is singular and cannot be solved.
(iii) If all diagonal elements of U are nonzero, then the system has a unique solution.
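The whole procedure can be sketched in a few lines of Python (assuming numpy; the helper names are ours, and the sketch presumes a square nonsingular A). It chooses pivots by partial pivoting, which is one way of generating the row interchanges recorded by P.

```python
import numpy as np

def plu(A):
    """Return P, L, U with P A = L U, using row interchanges when needed."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    P, L, U = np.eye(n), np.eye(n), A.copy()
    for j in range(n - 1):
        pivot = j + np.argmax(np.abs(U[j:, j]))   # partial pivoting: pick the usable pivot row
        if pivot != j:                            # operation of type (I)
            U[[j, pivot]] = U[[pivot, j]]
            P[[j, pivot]] = P[[pivot, j]]
            L[[j, pivot], :j] = L[[pivot, j], :j]
        for i in range(j + 1, n):                 # operations of type (III)
            m = U[i, j] / U[j, j]
            U[i, :] -= m * U[j, :]
            L[i, j] = m
    return P, L, U

def solve_plu(A, b):
    P, L, U = plu(A)
    y = np.linalg.solve(L, P @ b)     # forward substitution (generic solver used for brevity)
    return np.linalg.solve(U, y)      # back substitution

A = np.array([[1.0, 2, 3], [1, 2, 5], [5, 3, 1]])
b = np.array([1.0, 2, 3])
x = solve_plu(A, b)
print(np.allclose(A @ x, b))          # True
```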
Example 10.13 Use a P LU factorization of A = [1 2 3 4; 1 2 3 0; 5 3 1 1] to solve the system of equations AX = B, where B = [1; 2; 3].
We proceed to find a row echelon form of the matrix A. First add −1 times the first row to the second row and then add −5 times the first row to the third row of A to get

[1 0 0; 1 1 0; 5 0 1] [1 2 3 4; 0 0 0 −4; 0 −7 −14 −19].

Now there is no way to obtain a row echelon matrix from the second factor by using only the row operation of replacing a row with itself plus a multiple of another row (without interchanging any two rows). So consider the matrix A' obtained by switching the last two rows of A:

A' = [1 2 3 4; 5 3 1 1; 1 2 3 0] = [1 0 0; 0 1 0; 0 0 1] [1 2 3 4; 5 3 1 1; 1 2 3 0].

Now add −5 times the first row to the second row and then add −1 times the first row to the third row to get

[1 0 0; 5 1 0; 1 0 1] [1 2 3 4; 0 −7 −14 −19; 0 0 0 −4].

The first matrix is lower triangular while the second matrix is in row echelon form, and hence A' has an LU decomposition. Thus, A' = P A = LU, where L and U are given above and P = [1 0 0; 0 0 1; 0 1 0]. Hence, A = P²A = P(P A) = P LU and

[1 2 3 4; 1 2 3 0; 5 3 1 1] = [1 0 0; 0 0 1; 0 1 0] [1 0 0; 5 1 0; 1 0 1] [1 2 3 4; 0 −7 −14 −19; 0 0 0 −4].

To solve the system of equations AX = B, let U X = Y and consider P LY = B, i.e., solve

[1 0 0; 0 0 1; 0 1 0] [1 0 0; 5 1 0; 1 0 1] [y_1; y_2; y_3] = [1; 2; 3].

Multiplying both sides by P, we find that [1 0 0; 5 1 0; 1 0 1] [y_1; y_2; y_3] = [1; 3; 2], and hence Y = (y_1, y_2, y_3)^t = (1, −2, 1)^t. Now solve the system U X = Y, i.e.,

[1 2 3 4; 0 −7 −14 −19; 0 0 0 −4] [x_1; x_2; x_3; x_4] = [1; −2; 1].

Back substitution, with x_3 = t as a free parameter, yields

(x_1, x_2, x_3, x_4)^t = ( (14t + 1)/14, (27 − 56t)/28, t, −1/4 )^t,  t ∈ R.

Exercises
1. If A is any n × n matrix, then show that A can be factored as A = P LU, where L is lower triangular, U is upper triangular and P is a permutation matrix obtained by interchanging rows of I_n appropriately.
2. Show that the product of finitely many lower triangular matrices is a lower triangular matrix, and apply this result to show that the product of finitely many upper triangular matrices is upper triangular.
3. Let A = [p q; r s]. Then
(a) prove that if p ≠ 0, then A has a unique LU decomposition with 1's along the main diagonal of L;
(b) find the LU decomposition described in (a).
4. Show that A = [1 6 2; 2 12 5; −1 −3 −1] does not have an LU decomposition. Moreover, reorder the rows of A, find an LU decomposition of the new matrix, and hence solve the system of equations:

x_1 + 6x_2 + 2x_3 = 9,
2x_1 + 12x_2 + 5x_3 = 7,
−x_1 − 3x_2 − x_3 = 17.

5. Let A be an n × n matrix with triangular factorization LU. Show that det(A) = u_{11} u_{22} · · · u_{nn}.
6. Find an LU factorization of A = [1 1 1; 2 4 1; −3 1 −2], where L is a lower triangular matrix with 1's along the main diagonal and U is an upper triangular matrix. Moreover, solve the system AX = B for B =
(a) [4; 3; −13];  (b) [7; 23; 0].
7. Use LU decomposition together with forward and back substitution to solve the system:

[1 −3 2 −2; 3 −2 −1 0; 2 36 −28 27; 1 −3 22 5] [x_1; x_2; x_3; x_4] = [−11; −4; 155; 10].

8. Factor A = [3 −1 0; 3 −1 1; 0 2 1] as A = P LU, where P is obtained from I_3 by interchanging rows appropriately, L is lower triangular and U is an upper triangular matrix.

10.3 Eigenvalue Approximations

The eigenvalues of a square matrix can be obtained by solving its characteristic equation. In practical problems, especially those involving matrices of large order, this method of calculating eigenvalues runs into many computational difficulties; therefore, other methods for finding eigenvalues are needed. In this section, we study a simple algorithm, called the power method, that gives an approximation to the eigenvalue of greatest absolute value and a corresponding eigenvector. In many applications, only the dominant eigenvalue and a corresponding eigenvector of a matrix are needed, and there the power method can be tried. However, if additional eigenvalues and eigenvectors are needed, then other methods are required. These methods do not involve the characteristic polynomial. To see that there are advantages in working directly with the matrix, we must determine the effect that small changes in the entries of A have upon the eigenvalues. We first prove a result related to this.
Definition 10.14 Let A be a square matrix. An eigenvalue of A is called the dominant
eigenvalue of A if its absolute value is larger than the absolute values of the remaining
eigenvalues. An eigenvector corresponding to the dominant eigenvalue is called a
dominant eigenvector of A.
Theorem 10.15 Let A be a matrix of order n × n with a complete set of eigenvectors and let X be a matrix that diagonalizes A, i.e.,

X^{-1} A X = D = diag(λ_1, λ_2, . . . , λ_n).

If A' = A + E and λ' is an eigenvalue of A', then min_{1≤i≤n} |λ' − λ_i| ≤ cond_2(X) ||E||_2.

Proof If λ' is equal to one of the λ_i, there is nothing to prove. So suppose that λ' is not equal to any of the λ_i. If we set D_1 = D − λ'I, then D_1 is a nonsingular matrix. As λ' is an eigenvalue of A', it is also an eigenvalue of X^{-1}A'X. Therefore, X^{-1}A'X − λ'I is singular, and hence D_1^{-1}(X^{-1}A'X − λ'I) is also singular. On the other hand, D_1^{-1}(X^{-1}A'X − λ'I) = D_1^{-1}X^{-1}(A + E − λ'I)X = D_1^{-1}X^{-1}EX + D_1^{-1}X^{-1}(A − λ'I)X = D_1^{-1}X^{-1}EX + D_1^{-1}X^{-1}(XDX^{-1} − λ'I)X = D_1^{-1}X^{-1}EX + D_1^{-1}(D − λ'I). Using the fact that D_1 = D − λ'I, we conclude that D_1^{-1}(X^{-1}A'X − λ'I) = D_1^{-1}X^{-1}EX + I. This implies that |D_1^{-1}X^{-1}EX − (−1)I| = 0, i.e., −1 is an eigenvalue of D_1^{-1}X^{-1}EX. It follows that 1 = |−1| ≤ ||D_1^{-1}X^{-1}EX||_2 ≤ ||D_1^{-1}||_2 cond_2(X)||E||_2. The two-norm of D_1^{-1} is given by ||D_1^{-1}||_2 = max_{1≤i≤n} |λ' − λ_i|^{-1}. The index i that maximizes |λ' − λ_i|^{-1} is the same index that minimizes |λ' − λ_i|. Thus, min_{1≤i≤n} |λ' − λ_i| ≤ cond_2(X)||E||_2.
1≤i≤n

Note 10.16 If the matrix A is symmetric, we can choose an orthogonal diagonalizing matrix. In general, if Q is any orthogonal matrix, then cond_2(Q) = ||Q||_2 ||Q^{-1}||_2 = 1, and hence the conclusion of Theorem 10.15 simplifies to min_{1≤i≤n} |λ' − λ_i| ≤ ||E||_2. Thus, if A is symmetric and ||E||_2 is small, the eigenvalues of A' = A + E will be close to the eigenvalues of A.
Power Method
Let A be an n × n matrix with eigenvalues λ1 , λ2 , . . . , λn such that |λ1 | > |λ2 | ≥
· · · ≥ |λn |. These eigenvalues are unknown to us. This method determines the dom-
inant eigenvalue, i.e., λ1 and a corresponding eigenvector. To see the idea behind the

method, let us assume that A has n linearly independent eigenvectors X_1, X_2, . . . , X_n corresponding to the eigenvalues λ_1, λ_2, . . . , λ_n, respectively. Given an arbitrary vector X_0 in R^n, we can write X_0 = α_1 X_1 + α_2 X_2 + · · · + α_n X_n. This implies that AX_0 = α_1 λ_1 X_1 + α_2 λ_2 X_2 + · · · + α_n λ_n X_n, and further that A²X_0 = α_1 λ_1² X_1 + α_2 λ_2² X_2 + · · · + α_n λ_n² X_n. Continuing in this way, at the kth step we arrive at A^k X_0 = α_1 λ_1^k X_1 + α_2 λ_2^k X_2 + · · · + α_n λ_n^k X_n. If we define X_k = A^k X_0 for k = 1, 2, . . . , then

(1/λ_1^k) X_k = α_1 X_1 + α_2 (λ_2/λ_1)^k X_2 + · · · + α_n (λ_n/λ_1)^k X_n.

Since |λ_i/λ_1| < 1 for i = 2, 3, . . . , n, it follows that (1/λ_1^k) X_k → α_1 X_1 as k → ∞. Thus, if α_1 ≠ 0, the sequence {(1/λ_1^k) X_k} converges to an eigenvector α_1 X_1 of A. Since α_1 X_1 is a dominant eigenvector of A, the limiting value of the sequence {(1/λ_1^k) X_k} gives us a dominant eigenvector of A. Let us denote this limiting value, i.e., the dominant eigenvector, by a vector Y. Thus, up to this stage we have determined a dominant eigenvector Y of A. We now show how to approximate the dominant eigenvalue once an approximation Y to a dominant eigenvector is known. Let λ be an eigenvalue of A and X a corresponding eigenvector. If ⟨ , ⟩ denotes the Euclidean inner product, then

⟨X, AX⟩/⟨X, X⟩ = ⟨X, λX⟩/⟨X, X⟩ = λ⟨X, X⟩/⟨X, X⟩ = λ.

Thus, if Y is an approximation to a dominant eigenvector, the dominant eigenvalue λ_1 can be approximated by

λ_1 ≈ ⟨Y, AY⟩/⟨Y, Y⟩.

It is to be noted that we have not scaled the sequence {(1/λ_1^k) X_k} in this process. If, on the other hand, we scale the sequence {X_k}, then one obtains unit vectors at each step and the sequence converges to a unit vector in the direction of X_1. The eigenvalue λ_1 can be computed at the same time.
We now summarize the steps in the power method with scaling as follows:
(1) Pick an arbitrary nonzero vector X_0.
(2) Compute AX_0 and scale it down to obtain the first approximation to a dominant eigenvector; call it X_1.
(3) Compute AX_1 and scale it down to obtain the second approximation X_2.
(4) Compute AX_2 and scale it down to obtain the third approximation X_3.
Continuing in this way, a succession X_0, X_1, X_2, . . . of better and better approximations to a dominant eigenvector is obtained, and at each step the dominant eigenvalue λ_1 is approximated by ⟨X_i, AX_i⟩/⟨X_i, X_i⟩, i = 1, 2, . . . .
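A direct transcription of these steps into code might look as follows. This Python sketch assumes numpy; the function name is ours, the scaling divides each iterate by its entry of largest absolute value, and λ_1 is estimated by the quotient ⟨X, AX⟩/⟨X, X⟩.

```python
import numpy as np

def power_method(A, x0, iterations=10):
    """Approximate the dominant eigenvalue and a dominant eigenvector by scaled iteration."""
    x = np.array(x0, dtype=float)
    eigval = 0.0
    for _ in range(iterations):
        y = A @ x
        eigval = (x @ y) / (x @ x)       # approximation <X, AX> / <X, X>
        x = y / np.max(np.abs(y))        # scale down so the largest entry is +-1
    return eigval, x

A = np.array([[1.0, 1.0], [1.0, 3.0]])
lam, v = power_method(A, [1.0, 1.0], iterations=3)
print(round(lam, 3), np.round(v, 3))     # about 3.414 and (0.417, 1)
```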
Example 10.17 Approximate a dominant eigenvector and the dominant eigenvalue of the matrix A = [1 1; 1 3] by using the power method with scaling.
We arbitrarily choose X_0 = (1, 1)^t as an initial approximation to a dominant eigenvector. Multiplying X_0 by A and scaling down yield

AX_0 = [1 1; 1 3](1, 1)^t = (2, 4)^t,  X_1 = (1/4)(2, 4)^t = (.5, 1)^t.

Multiplying X_1 by A and scaling down yield

AX_1 = [1 1; 1 3](.5, 1)^t = (1.5, 3.5)^t,  X_2 = (1/3.5)(1.5, 3.5)^t = (.429, 1)^t.

Hence, the first approximation of the dominant eigenvalue is

λ_1 ≈ ⟨X_1, AX_1⟩/⟨X_1, X_1⟩ = ((.5)(1.5) + 3.5)/((.5)(.5) + 1) = 3.4.

Multiplying X_2 by A and scaling down yield

AX_2 = [1 1; 1 3](.429, 1)^t = (1.429, 3.429)^t,  X_3 = (1/3.429)(1.429, 3.429)^t = (.417, 1)^t.

Hence, the second approximation of the dominant eigenvalue is

λ_1 ≈ ⟨X_2, AX_2⟩/⟨X_2, X_2⟩ = ((.429)(1.429) + 3.429)/((.429)(.429) + 1) = 3.414.

Multiplying X_3 by A and scaling down yield

AX_3 = [1 1; 1 3](.417, 1)^t = (1.417, 3.417)^t,  X_4 = (1/3.417)(1.417, 3.417)^t = (.4147, 1)^t.

Hence, the third approximation of the dominant eigenvalue is

λ_1 ≈ ⟨X_3, AX_3⟩/⟨X_3, X_3⟩ = ((.417)(1.417) + 3.417)/((.417)(.417) + 1) = 3.414.

Continuing in this way, we generate a succession of approximations to a dominant eigenvector and the dominant eigenvalue. Thus, the approximate dominant eigenvalue is 3.414 and the corresponding dominant eigenvector is (.414, 1)^t.

Exercises
1. Let A = [2 1; 1 2]. Apply three iterations of the power method with any nonzero starting vector and obtain approximate values of the dominant eigenvalue and a dominant eigenvector of A. Determine the exact eigenvalues of A by solving the characteristic equation and determine the eigenspace corresponding to the largest eigenvalue. Compare the answers obtained in these two ways.
2. Find the dominant eigenvalue and a dominant eigenvector, if they exist, of the following matrices:
[−1 4; 1 −1],  [0 1; 4 0],  [4 2 1; 0 −5 3; 0 0 6],  [1 −12 0; 1 0 0; 0 0 2].
3. Let A = [1 2; −1 −1] and X_0 = (1, 1)^t. Compute X_1, X_2, X_3 and X_4 using the power method. Explain why the power method fails to converge in this case.
4. Let A = [18 17; 2 3]. Use the power method with scaling to approximate the dominant eigenvalue and a dominant eigenvector of A. Start with X_0 = (1, 1)^t. Round off all computations to three significant digits and stop after three iterations. Also, find the exact value of the dominant eigenvalue and eigenvector.
5. Let A = [2 1 0; 1 2 0; 0 0 10]. Use the power method with scaling to approximate the dominant eigenvalue and a dominant eigenvector of A. Start with X_0 = (1, 1, 1)^t. Round off all computations to three significant digits and stop after three iterations. Also, find the exact value of the dominant eigenvalue and eigenvector.
6. Let X = (x_1, x_2, . . . , x_n)^t be an eigenvector of A corresponding to the eigenvalue λ. If |x_i| = ||X||_∞, then show that Σ_{j=1}^n a_{ij} x_j = λ x_i and |λ − a_{ii}| ≤ Σ_{j=1, j≠i}^n |a_{ij}|.
7. Let A be a matrix with eigenvalues λ_1, λ_2, . . . , λ_n and let λ be an eigenvalue of A + E. Let X be a matrix that diagonalizes A, and let C = X^{-1}EX. Prove that for some i, |λ − λ_i| ≤ Σ_{j=1}^n |c_{ij}|, and that min_{1≤j≤n} |λ − λ_j| ≤ cond_∞(X)||E||_∞.

10.4 Singular Value Decompositions

The first two sections of this chapter dealt with the LU and P LU decompositions of matrices. In this section, we shall discuss a decomposition for rectangular rather than square matrices. This decomposition is known as the Singular Value Decomposition (SVD) and is fundamental to numerical analysis and linear algebra. We will describe its properties and discuss its applications, which are many and growing. Throughout the section, A is an m × n matrix, where we have assumed m ≥ n. This assumption is made for convenience only; all the results also hold if m < n.
Definition 10.18 Let A be a matrix of order m × n. The real numbers σ_1, σ_2, . . . , σ_n are called the singular values of A if σ_i = √λ_i for each i = 1, 2, 3, . . . , n, where the λ_i are the eigenvalues of A^t A. The corresponding eigenvectors are called the singular vectors of A. It is to be noted that A^t A is positive semi-definite. As a result, all the eigenvalues of A^t A are nonnegative, i.e., λ_i ≥ 0 for each i = 1, 2, 3, . . . , n, and thus all the singular values of A are nonnegative.

Definition 10.19 Let A be any matrix of order m × n. Then A can be factored as A = U D V^t, where U is an m × m orthogonal matrix, V is an n × n orthogonal matrix, and D is an m × n matrix whose off-diagonal entries are all 0's and whose diagonal elements are

D = diag(σ_1, . . . , σ_r, 0, . . . , 0),

where σ_1, σ_2, . . . , σ_r are the positive singular values of A with σ_1 ≥ σ_2 ≥ · · · ≥ σ_r. Such a factorization of A is known as a singular value decomposition of A.

We begin by showing that such a decomposition always exists. Before moving ahead, we give a remark describing the different types of matrix norms used throughout this section.

Remark 10.20 (i) Let A be any matrix of order m × n with real or complex entries. The Frobenius norm, sometimes also called the Euclidean norm, of A is defined as the square root of the sum of the absolute squares of its elements, i.e., ||A||_F = (Σ_{i=1}^m Σ_{j=1}^n |a_{ij}|²)^{1/2}, or equivalently ||A||_F = (trace(A A*))^{1/2}, where A* is the tranjugate (conjugate transpose) of A.
(ii) Let A be any matrix of order m × n with real or complex entries. Then
||A||_1 = max_{1≤j≤n} Σ_{i=1}^m |a_{ij}|, which is simply the maximum absolute column sum of the matrix A;
||A||_∞ = max_{1≤i≤m} Σ_{j=1}^n |a_{ij}|, which is simply the maximum absolute row sum of the matrix A;
||A||_2 = [dominant eigenvalue of A* A]^{1/2}, where the dominant eigenvalue means the eigenvalue of greatest absolute value.
(iii) The different matrix norms are related as follows. Let A be any matrix of order m × n with real or complex entries. Then ||A||_2 ≤ (||A||_1 ||A||_∞)^{1/2}. Also, we have ||A||_2 ≤ ||A||_F, with equality if and only if A is of rank 1 or the zero matrix.
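For numerical experimentation, these norms can be computed directly. The following Python sketch (assuming numpy) evaluates them for a small 3 × 2 matrix and checks the inequalities stated in (iii).

```python
import numpy as np

A = np.array([[0.0, 2.0], [2.0, -1.0], [1.0, 0.0]])

fro = np.sqrt(np.sum(np.abs(A) ** 2))               # Frobenius norm
one = np.max(np.sum(np.abs(A), axis=0))             # maximum absolute column sum
inf = np.max(np.sum(np.abs(A), axis=1))             # maximum absolute row sum
two = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))  # largest singular value

print(fro, one, inf, two)
print(two <= np.sqrt(one * inf) + 1e-12)            # True
print(two <= fro + 1e-12)                           # True
```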

Theorem 10.21 Every matrix of order m × n has a singular value decomposition.

Proof Let A be any matrix of order m × n. We have X t At AX = ||AX ||22 ≥ 0 for all
X ∈ Rn . This shows that the matrix At A is positive semi-definite. As a result, all the
eigenvalues of the matrix (At A) will be nonnegative.
√ We order these eigenvalues, so
that λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0. Define σi = λi . Without loss of generality, one may
suppose that exactly r of the σi s are nonzero, so that σ1 ≥ σ2 ≥ · · · ≥ σr > 0 and
σr +1 = σr +2 = · · · = σn = 0.
Set
D = diag(σ_1, . . . , σ_r, 0, . . . , 0) = [D_r O; O O],

where Dr is an r × r diagonal matrix whose diagonal entries are σ1 , . . . , σr . Since


At A is symmetric, there is an orthogonal matrix V that diagonalizes (At A). This
implies that V t At AV = D t D. Let V1 , V2 , . . . , Vn be the column vectors of V.
Hence, Vi is an eigenvector corresponding to eigenvalue σi2 for i = 1, . . . , r and
Vr +1 , . . . , Vn are eigenvectors corresponding to eigenvalue 0. Now using the column
vectors V1 , V2 , . . . , Vr , Vr +1 , . . . , Vn , we construct two matrices of orders n × r
and n × (n − r ) denoted by P and Q, respectively, such that P = [V1 , V2 , . . . , Vr ]
and Q = [Vr +1 , Vr +2 , . . . , Vn ]. Since At AVi = 0 for i = r + 1, . . . , n, it follows
that (AQ)t AQ = Q t At AQ = O and hence AQ = O. As At A P = P Dr2 , it follows
that Dr−1 P t At A P Dr−1 = Dr−1 P t P Dr2 Dr−1 = Dr−1 P t P Dr = Ir , where Ir denotes
the identity matrix of order r × r. If we put S = A P Dr−1 , then S t S = Ir and, there-
fore, S is an m × r matrix with orthonormal column vectors U1 , . . . , Ur . Now the
set {U1 , . . . , Ur } can be extended to form an orthonormal basis for Rm . Let the
extended basis be {U1 , . . . , Ur , Ur +1 , . . . , Um }. Now set T = [Ur +1 , . . . , Um ] and
U = [U1 , . . . , Ur , Ur +1 , . . . , Um ] = [S T ]. It is obvious to observe that T and U
are matrices of orders m × (m − r ) and m × m, respectively. This implies that
U^t A V = [S  T]^t A [P  Q] = [S^t; T^t] [A P  O] = [S^t A P  O; T^t A P  O].

But we get S t A P = Dr−1 P t At A P = Dr and T t A P = T t S Dr = O. Therefore,


U t AV = D and thus A = U DV t . Here U and V are orthogonal matrices and D
is a matrix in block form.
It is to be noted that A and D are equivalent matrices because U and V are
nonsingular matrices. But we know that equivalent matrices have the same rank.
Thus, we conclude that rank A = rank Dr = r, the number of nonzero singular
values of A (provided counting should be done according to multiplicity).
In this proof, we have supposed that m ≥ n. But if m < n, then using this proof
we can also find the singular value decomposition described as follows: Let A be a
matrix of order m × n, where m < n. Then At will be a matrix of order n × m, where
n > m, now using the above proof we can get a singular value decomposition of At .
This implies that there exist orthogonal matrices U and V such that At = U DV t ,
where U is of order n × n, V is of order m × m and D of order n × m, respectively.
 
Finally, we get A = V D^t U^t = V D' U^t, where D' = D^t. Clearly, A has a singular value decomposition, and here D' is the m × n block matrix whose off-diagonal entries are zero and whose diagonal elements are

D' = diag(σ_1, σ_2, . . . , σ_r, 0, . . . , 0),

where σ_1, σ_2, . . . , σ_r are the positive singular values of A^t. But we know that A and A^t have the same set of positive singular values. Therefore, A possesses a singular value decomposition.

Remark 10.22 Let A be an m × n matrix having a singular value decomposition


U DV t . Then
(i) Since A = U D V^t, we have A^t A = V D^t U^t U D V^t, i.e., A^t A = V (D^t D) V^t, where

D^t D = diag(σ_1², σ_2², . . . , σ_n²).

It follows that the eigenvalues of A^t A are σ_1², σ_2², . . . , σ_n². As the σ_i are the nonnegative square roots of the eigenvalues of A^t A, they are unique.
(ii) Singular value decomposition of A is not unique. If we notice in the proof of the
above theorem, then we come across the extension of the set {U1 , U2 , . . . , Ur }
to an orthogonal basis of Rm . With the help of this basis, we obtain U. But
we know that this extension is not unique in general. As a result, U is also not
unique. Also ,it is obvious to observe that V is also not unique. Thus, we have
shown that singular value decomposition of A is not unique.
We can also justify this fact by giving a counter example as follows: Let A =
In . If we set D = In , and take U = V to be any arbitrary n × n orthogonal
matrix, then A = U DV t will hold. Thus, uniqueness of decomposition stands
disproved.
(iii) Since V diagonalizes At A, it shows that the Vis : i = 1, 2, . . . , n are eigen-
vectors of At A. Also, as A At = U DV t V D t U t = U D D t U t , it follows that U
diagonalizes A At and the Uis : i = 1, 2, . . . , m are eigenvectors of A At .
(iv) It is also easy to observe that AV = U D. Now comparing ith columns of
each side of the previous equation, we have AVi = σi Ui ; i = 1, 2, . . . , n.
Similarly, we also have At U = V D t and we conclude that At Ui = σi Vi ;
for i = 1, 2, . . . , n; and At Ui = O; for i = n + 1, n + 2, . . . , m. The Vis are
called the right singular vectors of A and the Uis are called the left singular
vectors of A.
Note 10.23 Every complex matrix has a singular value decomposition.
Example 10.24 Find the singular values and a singular value decomposition of A, where A = [0 2; 2 −1; 1 0].
The matrix

A^t A = [5 −2; −2 5]

has eigenvalues λ_1 = 7 and λ_2 = 3. As a result, the singular values of A are σ_1 = √7 and σ_2 = √3. Thus, D = [√7 0; 0 √3; 0 0]. The eigenvectors corresponding to the eigenvalues λ_1 and λ_2 are of the form α(1, 1)^t and β(1, −1)^t, respectively, where α, β are nonzero real numbers. Therefore, the orthogonal matrix V = (1/√2)[1 1; 1 −1] diagonalizes A^t A. As σ_1 and σ_2 are both nonzero, using (iv) of the above remark, U_1 and U_2 are given by

U_1 = (1/σ_1) A V_1 = (1/√7) [0 2; 2 −1; 1 0] (1/√2, 1/√2)^t = (2/√14, 1/√14, 1/√14)^t;

U_2 = (1/σ_2) A V_2 = (1/√3) [0 2; 2 −1; 1 0] (1/√2, −1/√2)^t = (−2/√6, 3/√6, 1/√6)^t.

The set {U_1, U_2} can be extended to an orthonormal basis {U_1, U_2, U_3} of R^3, where U_3 = (1/√21, 2/√21, −4/√21)^t. This gives us

A = U D V^t = [2/√14 −2/√6 1/√21; 1/√14 3/√6 2/√21; 1/√14 1/√6 −4/√21] [√7 0; 0 √3; 0 0] [1/√2 1/√2; 1/√2 −1/√2],

which is a singular value decomposition of A.
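The same decomposition can be obtained numerically. The following Python sketch (assuming numpy) computes an SVD of the matrix of Example 10.24 and verifies A = U D V^t; note that numpy returns V^t directly and that the signs of the singular vectors may differ from the hand computation.

```python
import numpy as np

A = np.array([[0.0, 2.0], [2.0, -1.0], [1.0, 0.0]])

U, s, Vt = np.linalg.svd(A)          # s holds the singular values in decreasing order
print(np.round(s ** 2, 6))           # [7. 3.] -- the eigenvalues of A^t A

D = np.zeros_like(A)
D[:2, :2] = np.diag(s)
print(np.allclose(U @ D @ Vt, A))                            # True: A = U D V^t
print(np.allclose(A.T @ A, Vt.T @ np.diag(s ** 2) @ Vt))     # A^t A = V (D^t D) V^t
```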


Note 10.25 Let A be a matrix of order m × n, let r be the rank of A and let 0 < k < r. Let Λ be the set of m × n matrices of rank k or less. It can be proved that there exists a matrix X ∈ Λ such that ||A − X||_F = min_{S∈Λ} ||A − S||_F, i.e., the minimum is achieved in Λ with regard to the Frobenius norm.
The singular value decomposition of A can be used to find a matrix of order m × n and rank k which is closest to A with regard to the Frobenius norm. Assuming the above note, we will discuss how such a matrix X can be derived from the singular value decomposition of A.
Lemma 10.26 Let A be a matrix of order m × n. If Q is an orthogonal matrix of
order m × m, then ||Q A|| F = ||A|| F .
Proof Let Q = [q_{ij}]_{m×m} and A = [a_{jk}]_{m×n}. Suppose that Q A = [δ_{ik}]_{m×n}, where δ_{ik} = Σ_{j=1}^m q_{ij} a_{jk}. As Q is an orthogonal matrix, we have Σ_{i=1}^m q_{ij} q_{il} = 1 if j = l, and 0 otherwise. Hence,

||Q A||_F² = Σ_{i=1}^m Σ_{k=1}^n ( Σ_{j=1}^m q_{ij} a_{jk} )².

Using the relation just given above, we obtain that

||Q A||_F² = Σ_{j=1}^m Σ_{k=1}^n a_{jk}² = ||A||_F².

Note 10.27 If A has a singular value decomposition U D V^t, then using the above lemma we have ||A||_F = ||U D V^t||_F = ||D V^t||_F = ||(D V^t)^t||_F = ||V D^t||_F = ||D^t||_F = ||D||_F. It follows that ||A||_F = (σ_1² + σ_2² + · · · + σ_n²)^{1/2}, where σ_1, σ_2, . . . , σ_n are the singular values of A.

Theorem 10.28 Let A be a matrix of order m × n of rank r and let 0 < k < r. Let A = U D V^t be a singular value decomposition and let Λ denote the set of all matrices of order m × n of rank k or less. Assuming that the minimum is achieved in Λ, i.e., there exists a matrix X ∈ Λ such that ||A − X||_F = min_{S∈Λ} ||A − S||_F, then ||A − X||_F = (σ_{k+1}² + σ_{k+2}² + · · · + σ_n²)^{1/2}. In particular, if A' = U D' V^t, where

D' = diag(σ_1, σ_2, . . . , σ_k, 0, . . . , 0) = [D_k O; O O],

then ||A − A'||_F = (σ_{k+1}² + σ_{k+2}² + · · · + σ_n²)^{1/2} = min_{S∈Λ} ||A − S||_F.

Proof Let X be a matrix in Λ satisfying ||A − X||_F = min_{S∈Λ} ||A − S||_F. Since A' has rank k, we have A' ∈ Λ. This implies that ||A − X||_F ≤ ||A − A'||_F = ||U(D − D')V^t||_F = ||D − D'||_F = (σ_{k+1}² + σ_{k+2}² + · · · + σ_n²)^{1/2}. Next, we prove that ||A − X||_F ≥ (σ_{k+1}² + σ_{k+2}² + · · · + σ_n²)^{1/2}. Let X = U_1 D_1 V_1^t be a singular value decomposition, where

D_1 = diag(ω_1, ω_2, . . . , ω_k, 0, . . . , 0) = [D_{1k} O; O O],

and ω_1, ω_2, . . . , ω_k are the at most k nonzero singular values of X. If we put B = U_1^t A V_1, then A = U_1 B V_1^t, and it gives us

||A − X || F = ||U1 (B − D1 )V1t || F = ||(B − D1 )|| F .



Let us partition B in the same way as D_1, i.e., B = [B_11 B_12; B_21 B_22], where B_11, B_12, B_21 and B_22 are matrices of orders k × k, k × (n − k), (m − k) × k and (m − k) × (n − k), respectively. This implies that

||A − X||_F² = ||B_11 − D_{1k}||_F² + ||B_12||_F² + ||B_21||_F² + ||B_22||_F².

We claim that B_12 = O. For otherwise, define Y = U_1 [B_11 B_12; O O] V_1^t. Obviously, Y ∈ Λ and

||A − Y||_F² = ||B_21||_F² + ||B_22||_F² < ||A − X||_F²,

which contradicts the definition of X. Hence, B_12 = O. Similarly, it can be proved that B_21 = O. If we put Z = U_1 [B_11 O; O O] V_1^t, then Z ∈ Λ and we have

||A − Z||_F² = ||B_22||_F² ≤ ||B_11 − D_{1k}||_F² + ||B_22||_F² = ||A − X||_F².

By the definition of X, equality must hold here, and we conclude that B_11 = D_{1k}. If B_22 has a singular value decomposition U_2 D_2 V_2^t, then

||A − X||_F = ||B_22||_F = ||D_2||_F.

Let U_3 = [I_k O; O U_2] and V_3 = [I_k O; O V_2]. Now U_3^t U_1^t A V_1 V_3 = [D_{1k} O; O D_2]. This implies that A = (U_1 U_3) [D_{1k} O; O D_2] (V_1 V_3)^t, and here it is clear that the diagonal elements of D_2 are singular values of A. Thus, it follows that

||A − X||_F = ||D_2||_F ≥ (σ_{k+1}² + σ_{k+2}² + · · · + σ_n²)^{1/2}.

Finally, we conclude that

||A − X||_F = ||D_2||_F = (σ_{k+1}² + σ_{k+2}² + · · · + σ_n²)^{1/2} = ||A − A'||_F.
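Computationally, the nearest rank-k matrix is obtained by zeroing out all but the k largest singular values. The following Python sketch (assuming numpy; the helper name is ours) forms the truncated sum and checks the error formula of Theorem 10.28 on a random matrix.

```python
import numpy as np

def best_rank_k(A, k):
    """Nearest rank-k matrix to A in the Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :], s

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
k = 2
A_k, s = best_rank_k(A, k)

err = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))   # error = sqrt(sum of trailing sigma^2)
print(np.linalg.matrix_rank(A_k))                     # k
```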

Remark 10.29 (i) Let A be an m × n matrix having a singular value decomposition U D V^t. We define E_j = U_j V_j^t, j = 1, 2, . . . , n, where U_j and V_j are the column vectors discussed in Theorem 10.21. Then each E_j is of rank 1, and we claim that A = σ_1 E_1 + σ_2 E_2 + · · · + σ_n E_n. To prove the claim, notice that (V_1 V_1^t + · · · + V_n V_n^t)X = (V_1^t X)V_1 + · · · + (V_n^t X)V_n = X holds for every X ∈ R^n, due to the fact that the column vectors of V form an orthonormal basis for R^n. This implies that the null space N(V_1 V_1^t + · · · + V_n V_n^t − I) = R^n. Thus, we conclude that V_1 V_1^t + · · · + V_n V_n^t = I. This implies that

A = AI = A(V_1 V_1^t + · · · + V_n V_n^t) = σ_1 U_1 V_1^t + · · · + σ_n U_n V_n^t = σ_1 E_1 + · · · + σ_n E_n.

If A is of rank n, then

A' = U diag(σ_1, σ_2, . . . , σ_{n−1}, 0, . . . , 0) V^t = σ_1 E_1 + · · · + σ_{n−1} E_{n−1}

is a matrix of rank n − 1 which is nearest to A with respect to the Frobenius norm. Similarly, A'' = σ_1 E_1 + · · · + σ_{n−2} E_{n−2} is the nearest matrix of rank n − 2, and so on. In particular, if A is a nonsingular n × n matrix, then A' is singular and ||A − A'||_F = σ_n. Thus, σ_n can be taken as a measure of how close a matrix is to being singular.
(ii) We should not use the value of det A as a measure of how close A is to being singular. If, for example, A is the 100 × 100 diagonal matrix whose diagonal entries are all 1/3, then det A = 3^{−100}; however, σ_{100} = 1/3. On the other hand, the following matrix is very close to being singular even though its determinant is 1 and all of its eigenvalues are equal to 1. Let A be the upper triangular matrix whose diagonal entries are all 1 and whose entries above the main diagonal are all −1, i.e.,

A = [1 −1 −1 · · · −1 −1; 0 1 −1 · · · −1 −1; 0 0 1 · · · −1 −1; · · · ; 0 0 0 · · · 1 −1; 0 0 0 · · · 0 1].

It is to be noted that det A = det(A^{-1}) = 1 and all the eigenvalues of A are 1. However, if n is large, then A is close to being singular. To observe this, let

B = [1 −1 −1 · · · −1 −1; 0 1 −1 · · · −1 −1; 0 0 1 · · · −1 −1; · · · ; 0 0 0 · · · 1 −1; −1/2^{n−2} 0 0 · · · 0 1].

Clearly, B is singular, since the system BX = O has the nontrivial solution X = [2^{n−2}, 2^{n−3}, . . . , 2^0, 1]^t. The matrices A and B differ only in the nth row and first column. Hence, ||A − B||_F = 1/2^{n−2}. Using Theorem 10.28, σ_n = min_{X singular} ||A − X||_F ≤ ||A − B||_F = 1/2^{n−2}. Thus, if n = 100, then σ_n ≤ 1/2^{98}, and as a result, A is very close to being singular.

Theorem 10.30 Let A be a matrix of order m × n with singular value decomposition U D V^t. Then ||A||_2 = σ_1, where σ_1 is the largest singular value of A.

Proof Since U and V are orthogonal, we have ||A||_2 = ||U D V^t||_2 = ||D||_2. Now

||D||_2 = max_{X ≠ 0} ||DX||_2 / ||X||_2,

and for any X ≠ 0,

||DX||_2 / ||X||_2 = ( Σ_{i=1}^{n} (σ_i x_i)^2 )^{1/2} / ( Σ_{i=1}^{n} x_i^2 )^{1/2} ≤ σ_1.

In particular, if we choose X = (1, 0, . . . , 0)^t, then ||DX||_2 / ||X||_2 = σ_1. Therefore, it follows that ||A||_2 = ||D||_2 = σ_1.

Corollary 10.31 Let A be a nonsingular matrix with singular value decomposition U D V^t. Then cond_2(A) = σ_1/σ_n.

Proof The singular values of A^{−1} = V D^{−1} U^t, arranged in decreasing order, are 1/σ_n ≥ 1/σ_{n−1} ≥ · · · ≥ 1/σ_1. Therefore, ||A^{−1}||_2 = 1/σ_n and cond_2(A) = ||A||_2 ||A^{−1}||_2 = σ_1/σ_n.
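A quick numerical check of Theorem 10.30 and Corollary 10.31 can be carried out with NumPy; in the sketch below the nonsingular matrix A is an arbitrary example chosen only for the illustration.

    import numpy as np

    A = np.array([[3., 1., 2.],
                  [0., 4., 1.],
                  [1., 0., 5.]])
    s = np.linalg.svd(A, compute_uv=False)      # singular values in decreasing order

    print(np.linalg.norm(A, 2), s[0])           # ||A||_2 equals the largest singular value
    print(np.linalg.cond(A, 2), s[0] / s[-1])   # cond_2(A) equals sigma_1 / sigma_n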

10.5 Applications of Singular Value Decomposition

The singular value decomposition (SVD) is not only a classical theory in matrix com-
putation and analysis but also is a powerful tool in machine learning and modern data
analysis. Today, singular value decomposition has spread through many branches of
sciences, in particular psychology and sociology, climate and atmospheric sciences,
astronomy and descriptive and predictive statistics. It is also used in important topics such as digital image processing, spectral decomposition, polar factorization of matrices, compression algorithms, ranking of documents, discrete optimization, clustering a mixture of spherical Gaussians, principal component analysis and low-rank approximation of matrices. At the end of this section, we demonstrate how SVD compression works on images.
SVD in Spectral Decomposition
Suppose B is a square matrix. If the vector X and scalar λ are such that B X = λX,
then X is an eigenvector of the matrix B and λ is the corresponding eigenvalue. We
present here a spectral decomposition theorem for the special case where B is of
the form B = AA^t for some matrix A, which may possibly be rectangular. If A is a real-valued matrix, then B is symmetric and positive semidefinite, that is, X^t B X ≥ 0 for all vectors X. The spectral decomposition theorem holds more generally.
The theorem runs as follows:
If B = AA^t, then B = Σ_i σ_i^2 u_i u_i^t, where A = Σ_i σ_i u_i v_i^t is the singular value decomposition of A. The proof of this theorem is described as follows:

B = AA^t = (Σ_i σ_i u_i v_i^t)(Σ_j σ_j u_j v_j^t)^t = Σ_i Σ_j σ_i σ_j u_i v_i^t v_j u_j^t = Σ_i σ_i^2 u_i u_i^t,

since v_i^t v_j = 1 if i = j and 0 otherwise.

When the σ_i are all distinct, the u_i are the eigenvectors of B and the σ_i^2 are the corresponding eigenvalues. If the σ_i are not all distinct, then any linear combination of those u_i that share the same eigenvalue σ_i^2 is again an eigenvector of B.
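The identity B = AA^t = Σ_i σ_i^2 u_i u_i^t can be verified numerically. In the following sketch (Python with NumPy, using a random rectangular matrix purely as an example) the eigenvalues of B are compared with the squared singular values of A.

    import numpy as np

    A = np.random.default_rng(0).normal(size=(4, 6))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    B = A @ A.T
    B_from_svd = sum(s[i] ** 2 * np.outer(U[:, i], U[:, i]) for i in range(len(s)))
    print(np.allclose(B, B_from_svd))        # True: B = sum_i sigma_i^2 u_i u_i^t

    eigvals = np.sort(np.linalg.eigvalsh(B))[::-1]
    print(np.allclose(eigvals, s ** 2))      # eigenvalues of B are the sigma_i^2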
SVD in Principal Component Analysis
SVD is used in Principal Component Analysis (PCA). PCA is illustrated by an
example: customer-product data where there are n customers buying d products.
Let A be the matrix whose entry a_ij represents the probability of customer i purchasing product j. One hypothesizes that there are really only k underlying basic factors
like age, income, family size, etc. that determine a customer’s purchase behavior.
An individual customer’s behavior is determined by some weighted combination
of these underlying factors. This implies that a customer’s purchase behavior can
be characterized by a k-dimensional vector where k is much smaller than n and d.
The components of the vector are weights for each of the basic factors. Associated
with each basic factor is a vector of probabilities, each component of which is the
probability of purchasing a given product by someone whose nature depends only on
that factor. More abstractly, A is an n × d matrix that can be expressed as the product
of two matrices U and V , where U is an n × k matrix expressing the factor weights
for each customer and V is a k × d matrix expressing the purchase probabilities of
products that correspond to that factor. It is possible that A may not be exactly equal
to U V , but close to it since there may be noise or random perturbations.
As discussed in the previous section, we take the best rank-k approximation A_k obtained from the SVD and, as a result, we get such a pair U, V. In this usual setting, one assumes that A is available completely and we wish to find U and V to identify the basic factors, or, in some applications, to denoise A if we think of A − UV as noise. Now suppose that n and d are very large, of the order of thousands or even millions; then there is probably little one can do to estimate or even store A. In this setting, we may assume that we are given just a few entries of A and wish to estimate A. If A were
an arbitrary matrix of size n × d, this would require Ω(nd) pieces of information and cannot be done with a few entries. But again we hypothesize that A is a small-rank matrix with added noise. If we also assume that the given entries are randomly drawn according to some known distribution, then there is a possibility that SVD can be used to estimate the whole of A. This area is called collaborative filtering, and one of its uses is to target an ad to a customer based on one or two purchases.
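The customer–product scenario can be imitated with synthetic data. In the sketch below (Python with NumPy; the dimensions, the number k of hidden factors and the noise level are arbitrary choices), a rank-k matrix UV is perturbed by noise and the best rank-k approximation A_k recovered from the SVD is compared with the noisy matrix.

    import numpy as np

    rng = np.random.default_rng(1)
    n, d, k = 200, 50, 3                           # customers, products, hidden factors
    A = rng.random((n, k)) @ rng.random((k, d))    # exact rank-k part "UV"
    A += 0.01 * rng.normal(size=(n, d))            # small random perturbation (noise)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # best rank-k approximation

    print(np.linalg.norm(A - A_k, 'fro') / np.linalg.norm(A, 'fro'))  # of the order of the noise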
SVD in Solving Discrete Optimization Problems
The use of SVD in solving discrete optimization problems is a relatively new subject
with many applications. We start with an important Non-deterministic polynomial
time hard problem, the Maximum Cut Problem for directed graph G(V, E).
The Maximum Cut Problem is to partition the node set V of a directed graph
into two subsets S and S̄ so that the number of edges from S to S̄ is maximized. Let
A be the adjacency matrix of the graph. With each vertex i, associate an indicator
variable xi . The variable xi will be set 1 for i ∈ S and 0 for i ∈ S̄. The vector
X = {x1 , x2 , . . . , xn } is unknown and we are trying to find it (or equivalently the
cut), so as to maximize the number of edges across the cut. The number of edges across the cut is precisely Σ_{i,j} x_i (1 − x_j) a_ij. Thus, the Maximum Cut Problem can be posed as the optimization problem

Maximize Σ_{i,j} x_i (1 − x_j) a_ij subject to x_i ∈ {0, 1}.

This can also be written in matrix notation as



Σ_{i,j} x_i (1 − x_j) a_ij = X^t A(I − X),

where I denotes the vector of all 1's. So the problem can be restated as

Maximize X t A(I − X ) subject to xi ∈ {0, 1}. (10.1)

SVD is used to solve this problem approximately by computing the SVD of A and replacing A by A_k = Σ_{i=1}^{k} σ_i U_i V_i^t in the previous relation to get

Maximize X t Ak (I − X ) subject to xi ∈ {0, 1}. (10.2)

It is to be noted that the matrix A_k is no longer a 0-1 adjacency matrix. We will prove that (i) for each 0-1 vector X, X^t A_k(I − X) and X^t A(I − X) differ by at most n^2/√(k+1); thus, the maxima in (10.1) and (10.2) differ by at most this amount. Also, we will show that (ii) a near-optimal X for (10.2) can be found by exploiting the low rank of A_k, which by (i) is near-optimal for (10.1), where near-optimal means with additive error of at most n^2/√(k+1).
First, we prove (i). Since X and I − X are 0-1 n-vectors, each has length at most √n. Using the definition of the two-norm of a matrix, ||(A − A_k)(I − X)|| ≤ √n ||A − A_k||_2. Now since X^t (A − A_k)(I − X) is the dot product of the vector X with the vector (A − A_k)(I − X), we get X^t (A − A_k)(I − X) ≤ n ||A − A_k||_2. But we know that ||A − A_k||_2 = σ_{k+1}, the (k + 1)st singular value of A. The inequalities

(k + 1) σ_{k+1}^2 ≤ σ_1^2 + σ_2^2 + · · · + σ_{k+1}^2 ≤ ||A||_F^2 = Σ_{i,j} a_ij^2 ≤ n^2

imply that σ_{k+1}^2 ≤ n^2/(k + 1), and hence ||A − A_k||_2 ≤ n/√(k + 1), showing (i).
Now we prove (ii) as given above. Let us look at the special case when k = 1 and
A is approximated by the rank 1 matrix A1 . An even more special case when the left
and the right singular vectors U and V are required to be identical is already hard
to solve exactly because it subsumes the problem of whether for a set of n vectors
{a1 , a2 , . . . , an }, there is a partition into two subsets whose sums are equal. So, we
look for algorithms that solve the Maximum Cut Problem approximately.
For (ii), we have to maximize Σ_{i=1}^{k} σ_i (X^t U_i)(V_i^t (I − X)) over 0-1 vectors X. For any S ⊆ {1, 2, . . . , n}, write U_i(S) for the sum of the coordinates of the vector U_i corresponding to elements in the set S, and similarly for V_i; that is, U_i(S) = Σ_{j∈S} u_ij. We will maximize Σ_{i=1}^{k} σ_i U_i(S) V_i(S̄) using dynamic programming.
For a subset S of {1, 2, . . . , n}, define the 2k-dimensional vector W(S) = (U_1(S), V_1(S̄), U_2(S), V_2(S̄), . . . , U_k(S), V_k(S̄)). If we had the list of all such vectors, we could find Σ_{i=1}^{k} σ_i U_i(S) V_i(S̄) for each of them and take the maximum. There are 2^n subsets S, but several S could have the same W(S), and in that case it suffices to list just one of them. Round each coordinate of each U_i to the nearest integer multiple of 1/(nk^2). Call the rounded vector Ū_i. Similarly obtain V̄_i. Let W̄(S) denote the vector (Ū_1(S), V̄_1(S̄), Ū_2(S), V̄_2(S̄), . . . , Ū_k(S), V̄_k(S̄)). We will construct a list of all possible values of the vector W̄(S). Again, if several different S's lead to the same vector W̄(S), we will keep only one copy on the list. The list will be constructed by dynamic programming. For the recursive step of the dynamic programming, assume we already have a list of all such vectors for S ⊆ {1, 2, . . . , i} and wish to construct the list for subsets of {1, 2, . . . , i + 1}. Each S ⊆ {1, 2, . . . , i} leads to two possible subsets of {1, 2, . . . , i + 1}, namely S ∪ {i + 1} and S itself. In the first case, the vector W̄(S ∪ {i + 1}) = (Ū_1(S) + u_{1,i+1}, V̄_1(S̄), Ū_2(S) + u_{2,i+1}, V̄_2(S̄), . . .); in the second case, W̄(S) = (Ū_1(S), V̄_1(S̄) + v_{1,i+1}, Ū_2(S), V̄_2(S̄) + v_{2,i+1}, . . .). We put these two vectors for each vector in the previous list. Then, crucially, eliminate duplicates.
Assume that k is constant. Now, we show that the error is at most n^2/√(k+1), as claimed. Since U_i and V_i are unit length vectors, |U_i(S)|, |V_i(S̄)| ≤ √n. Also |Ū_i(S) − U_i(S)| ≤ n · 1/(nk^2) = 1/k^2, and similarly for V_i. To bound the error, we use an elementary fact: if a, b are reals with |a|, |b| ≤ M and we estimate a by a′ and b by b′ so that |a − a′|, |b − b′| ≤ δ ≤ M, then |ab − a′b′| = |a(b − b′) + b′(a − a′)| ≤ |a| |b − b′| + |b′| |a − a′| ≤ 3Mδ. Using this, we get that

| Σ_{i=1}^{k} σ_i Ū_i(S) V̄_i(S̄) − Σ_{i=1}^{k} σ_i U_i(S) V_i(S̄) | ≤ 3kσ_1 √n/k^2 ≤ 3n^2/k,

and this meets the claimed error bound.


Next, we prove that the running time is polynomially bounded. Each |Ū_i(S)|, |V̄_i(S̄)| is at most 2√n. Since the Ū_i(S), V̄_i(S̄) are all integer multiples of 1/(nk^2), each of them can take at most O(n^{3/2} k^2) distinct values, from which it follows that the list of vectors W̄(S) never gets larger than (c n^{3/2} k^2)^{2k} for a constant c, which for fixed k is polynomially bounded. Finally, we have the following conclusion:
“Given a directed graph G(V, E), a cut of size at least the maximum cut minus O(n^2/√k) can be computed in time polynomial in n, for any fixed k.”
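The comparison between the objectives X^t A(I − X) and X^t A_k(I − X) in (10.1) and (10.2) can be tested by brute force on a very small graph. The sketch below (Python with NumPy; the random directed graph, the size n and the rank k are arbitrary choices, and the exhaustive search over all 2^n vectors X is feasible only because n is tiny) computes both maxima.

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(2)
    n, k = 10, 2
    A = (rng.random((n, n)) < 0.4).astype(float)    # adjacency matrix of a random digraph
    np.fill_diagonal(A, 0)

    U, s, Vt = np.linalg.svd(A)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # best rank-k approximation of A

    def best_cut(M):
        # brute-force maximum of x^t M (1 - x) over all 0-1 vectors x
        return max(x @ M @ (1 - x)
                   for x in (np.array(bits, dtype=float) for bits in product([0, 1], repeat=n)))

    print(best_cut(A), best_cut(A_k))   # the two maxima differ by at most n^2 / sqrt(k + 1)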
SVD in Image Processing
Suppose A is the pixel intensity matrix of a large image. The entry ai j gives the
intensity of the i jth pixel. If A is n × n, the transmission of A requires transmitting
O(n 2 ) real numbers. Instead, one could send Ak , that is, the top k singular values
σ1 , σ2 , . . . , σk along with the left and right singular vectors U1 , U2 , . . . , Uk and
V1 , V2 , . . . , Vk . This would require sending O(kn) real numbers instead of O(n 2 )
real numbers. If k is much smaller than n, this results in saving. For many images, a
k much smaller than n can be used to reconstruct the image provided that a very low-
resolution version of the image is sufficient. Thus, one can use SV D as a compression
method.
For an illustration, suppose a satellite takes a picture and wants to send it to Earth.
The picture may contain 1000 × 1000 “pixels”, a million little squares, each with a
definite color. We can code the colors and send back 1000000 numbers. It is better
to find the essential information inside the 1000 × 1000 matrix and send only that.
Suppose we know the SV D. The key is in the singular values (in D used in the
previous section). Typically, some σ  s are significant and others are extremely small.
If we keep 20 and throw away 980, then we send only the corresponding 20 columns
of U and V (if A = U DV t , as in the previous section). The other 980 columns are
multiplied in U DV t by the small σ  s that are being ignored. We can do the matrix
multiplication as columns times rows:

A = U DV t = U1 σ1 V1t + U2 σ2 V2t + · · · + Ur σr Vrt .

Any matrix is the sum of r matrices of rank 1. If only 20 terms are kept, we send 20
times 2000 numbers instead of a million (25 to 1).
The pictures are really striking, as more and more singular values are included.
At first, you see nothing and suddenly you recognize everything. The cost is in
computing the SV D, and this has become much more efficient, but it is expensive
for a big matrix.
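The compression scheme just described can be written in a few lines. The sketch below (Python with NumPy) uses a synthetic grayscale image so that it runs without any external file, whereas the experiment reported in Fig. 10.1 uses the 512 × 512 colour Airplane image processed channel by channel; the PSNR function is the standard definition with peak value 255.

    import numpy as np

    # a synthetic 256 x 256 grayscale "image" with smooth structure plus a little noise
    x = np.linspace(0, 1, 256)
    img = 255 * np.outer(np.sin(4 * np.pi * x), np.cos(3 * np.pi * x)) ** 2
    img += 10 * np.random.default_rng(3).normal(size=img.shape)

    U, s, Vt = np.linalg.svd(img, full_matrices=False)

    def compress(k):
        # keep only the first k singular values: O(kn) numbers instead of O(n^2)
        return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    def psnr(original, recovered):
        mse = np.mean((original - recovered) ** 2)
        return 10 * np.log10(255.0 ** 2 / mse)

    for k in (10, 30, 200):
        print(k, round(psnr(img, compress(k)), 2))   # the PSNR increases with k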
An example of SVD compression in image is demonstrated in Fig. 10.1, where
Fig. 10.1(a) is the original image of Airplane of size 3 × (512 × 512) which is con-
sidered as the test image. In order to apply SVD compression on color image, firstly
the color image is decomposed into three channels, i.e., Red, Green and Blue.
After that, each color channel passes through SVD compression. Finally, all the
three compressed channels are combined to generate the SVD compressed image.
Figure 10.1(b) shows the recovered image using the first 10 singular values having
Peak Signal-to-Noise Ratio (PSNR) value 22.62746 dB and Fig. 10.1(c) shows the

Fig. 10.1 SVD-based compression of Airplane image having size 3 × (512 × 512): (a) Original
image; (b) Compressed image using the first 10 singular values; (c) Compressed image using the
first 30 singular values

7
10
Error between compress and original image

14

12

10

-2
50 100 150 200 250 300
Number of Singular Values

Fig. 10.2 Error of recovered image using different numbers of singular values

recovered image using the first 30 singular values having PSNR value 27.202 dB,
where the visual quality of the recovered image is evaluated by PSNR, and a high
value of PSNR shows that the reconstructed image has high visual quality. Moreover, from Fig. 10.2 we can see that using approximately the first 200 singular values yields approximately zero error in the reconstructed image.

Exercises

1. Find the singular value decomposition of each of the following matrices:
   [ 1 −1 ; 0 2 ],  [ 2 −1 ; 1 2 ],  [ 2 3 ; 3 1 ; 0 0 ; 0 0 ],  [ −3 0 0 ; 0 1 1 ; 0 0 2 ; 0 0 0 ].
2. Show that A and At have the same nonzero singular values. Are their singular value decompositions related? Justify your answer.
3. Let A be a symmetric matrix with eigenvalues λ1 , λ2 , . . . , λn . Then show that
|λ1 |, |λ2 |, . . . , |λn | are singular values of A.
4. Let A be an m × n matrix with a singular value decomposition U DV t . Prove that

min_{X ≠ 0} ||AX||_2 / ||X||_2 = σ_n,

where σn is the minimum positive singular value of A.


5. Let A be an m × n matrix with a singular value decomposition U DV t . Show that
for any vector X ∈ Rn ,

σn ||X ||2 ≤ ||AX ||2 ≤ σ1 ||X ||2 ,

where σ1 is the maximum positive singular value of A.


6. If σ is a singular value of A, then prove that there exists a nonzero vector X such
that σ = ||AX||_2 / ||X||_2.
7. Let A be an m × n matrix with a singular value decomposition U DV t . Show that
if A has rank r, where r < n, then A = Ur Dr Vrt , where Vr = (v1 , v2 · · · , vr ),
Ur = (u 1 , u 2 · · · , u r ) and Dr is an r × r diagonal matrix with diagonal entries
σ1, σ2, . . . , σr.
8. If A = [ 5 3 6 ; 1 2 −1 ; −3 4 5 ], then find ||A||_1, ||A||_2, ||A||_F, ||A||_∞.
9. (a) If A changes to α A, what is the change in the SVD?
(b) What is the SVD for At and for A−1 ?
Chapter 11
Affine and Euclidean Spaces and
Applications of Linear Algebra to
Geometry

This chapter is initially devoted to the study of subspaces of an affine space, by


applying the theory of vector spaces, matrices and system of linear equations. By
using methods involved in the theory of inner product spaces, we then stress prac-
tical computation of distances between points, lines and planes, as well as angles
between lines and planes in Euclidean spaces. Finally, we apply the development
of eigenvalues, eigenvectors, diagonalization and allied concepts to the study and
classification of the conic curves and quadric surfaces in the affine, Euclidean and
projective spaces.

11.1 Affine and Euclidean Spaces

Initially, we shall describe briefly the framework within which we are about to set to
work.
Definition 11.1 Let F be a field, V a vector space of dimension n over F and S a
nonempty set. The pair (S, V ) is said to be an affine space if there exists an application

ϕ:S×S→V

defined whereby two elements A, B ∈ S are associated with a vector of V, which is


denoted by ϕ(A, B) = AB, such that the following conditions are satisfied:
(1) For any fixed element O ∈ S and for any vector v ∈ V, there is a unique element
P ∈ S such that OP = v.
(2) For any P1 , P2 , P3 ∈ S, P1 P3 = P1 P2 + P2 P3 .
For brevity, we shall denote the affine space (S, V ) by A. Any element of S is called a
point of A. The definition of an affine space A = (S, V ) induces naturally a bijection


between the elements of S and vectors of V. In fact, after the choice of the origin
O ∈ S, any vector v ∈ V can be represented as v = OP, where P ∈ S is uniquely
determined. This bijection S → V depends only on the choice of the origin point. Hence, if dim_F V = n, then A ≅ V ≅ Fn. In this sense, we have the following:

Definition 11.2 An affine coordinate system of the affine space A consists of a fixed
point O ∈ S, called origin, and a basis {e1 , . . . , en } of V. It is usually denoted by
{O, e1 , . . . , en }. In terms of this system, the coordinates of any point P ∈ A are
defined as the coordinates of the vector ϕ(O, P) = OP with respect to the basis
{e1 , . . . , en } of V. Hence, if

OP = x1 e1 + x2 e2 + · · · + xn en ,

we say that x1 , . . . , xn are the coordinates of P and write P ≡ (x1 , . . . , xn ). The


coordinate system {O, e1 , . . . , en } is also called frame of reference in the space A.
If dim F (V ) = n ≥ 1, then we say that the affine space A has dimension n. Moreover,
if P, Q ∈ A have respectively coordinates (x1 , . . . , xn ) and (y1 , . . . , yn ) with respect
to {e1 , . . . , en }, then the vector PQ has coordinates (y1 − x1 , . . . , yn − xn ) in terms
of the frame of reference {O, e1 , . . . , en }.
The introduction of an affine space (S, V ) can be interpreted intuitively as saying
that S and V are two different ways of looking at the same object. In some sense,
A is the way of defining a vector space structure on the set S of points. However,
let us recall that the addition of points does not make sense: points are not vectors.
Nevertheless, once an origin point O ∈ A is fixed, the coordinates of any point P ∈ A in terms of a frame of reference in A can be treated as the coordinates of the vector OP with respect to the basis of V, and so can be used in vector operations.

Remark 11.3 Let V be a vector space, n-dimensional over the field F. If we define
the map ϕ : V × V → V by ϕ(v, w) = w − v, for any vectors v, w ∈ V, then the
pair (V, V ) is an affine space of dimension n. Indeed, the conditions introduced in
Definition 11.1 are satisfied:
(i) For any fixed vector u ∈ V and for any vector v ∈ V, there is a unique vector
w ∈ V such that
ϕ(u, v) = v − u = w.

(ii) For any vectors


 v1, v2, v3 ∈ V, ϕ(v1 , v3 ) = ϕ(v1 , v2 ) + ϕ(v2 , v3 ), because
v3 − v1 = v2 − v1 + v3 − v2 .
As a particular case, we consider V = Fn . The affine space (Fn , Fn ) is usually called
n-dimensional affine space over F and denoted by FAn .

Definition 11.4 Let A = (S, V ) be an affine space of dimension n, associated with


the n-dimensional real vector space V. We say that A is an affine Euclidean space
of dimension n if there is a fixed symmetric bilinear form f : V × V → R whose

associated quadratic form is positive definite. For brevity, we shall denote the affine
Euclidean space (S, V ) by E.

Hence, an affine Euclidean space E is associated with a real vector space V in which to any pair of vectors v1, v2 ∈ V there corresponds a real number f(v1, v2) such that the following conditions are satisfied:
(1) For any vectors v1 , v2 , w ∈ V, f (v1 + v2 , w) = f (v1 , w) + f (v2 , w).
(2) For any vectors v1 , v2 ∈ V, f (v1 , v2 ) = f (v2 , v1 ).
(3) For any vectors v1 , v2 ∈ V and scalar λ ∈ R, f (λv1 , v2 ) = f (v1 , λv2 ) =
λ f (v1 , v2 ).
(4) For any vector 0 = v ∈ V, f (v, v) > 0.
A Euclidean coordinate system of E (also called Euclidean frame of reference in
E) is an affine coordinate system {O, e1 , . . . , en } of E, where vectors e1 , . . . , en are
pairwise f -orthonormal. For V = Rn , the affine Euclidean space (Rn , Rn ) is usually
called n-dimensional affine Euclidean space over R and denoted by REn .

Example 11.5 The real affine Euclidean plane RE2 in which f (v1 , v2 ) = v1 · v2
(the usual dot product) is a Euclidean space of dimension 2 associated with the
vector space R2 . It is well known how to introduce a coordinate system in the 2-
dimensional Euclidean plane RE2 (it is usually the 2-dimensional O X Y coordinate
system adopted for the Euclidean geometry in the plane). Choose an origin point
O in the plane and draw two perpendicular axes through it, one horizontal and one
vertical, respectively, called X and Y. Any point P can be uniquely represented by
the pair of real numbers (x, y), called the coordinates of the point P in terms of
the coordinate system O X Y. If i, j are the unit vectors of X and Y , respectively, the
point P has coordinates (x, y) if and only if

OP = xi + yj.

Hence, if v1 = α1 i + α2 j and v2 = β1 i + β2 j are two vectors in the real Euclidean


plane, then v1 · v2 = α1 β1 + α2 β2 .

Example 11.6 Analogously, one may introduce a coordinate system in the 3-


dimensional affine Euclidean space RE3 (it is usually the 3-dimensional O X Y Z
coordinate system adopted for the Euclidean geometry in the space). One may fix
an origin point O in the space and draw three pairwise perpendicular axes through
it, respectively, called X, Y and Z . Any point P can be uniquely represented by the
triplet of real numbers (x, y, z), called the coordinates of the point P in terms of the
coordinate system O X Y Z . If i, j, k are the unit vectors of X, Y and Z , respectively,
the point P has coordinates (x, y, z) if and only if

OP = xi + yj + zk.

Hence, if v1 = α1 i + α2 j + α3 k and v2 = β1 i + β2 j + β3 k are two vectors in the real


Euclidean space, then v1 · v2 = α1 β1 + α2 β2 + α3 β3 .

Example 11.7 More generally, if REn is the affine Euclidean space of dimension n
over R, then there exists an orthonormal basis B of Rn in terms of which the inner
product of vectors is defined by

v1 · v2 = α1 β1 + · · · + αn βn

where (α1 , . . . , αn ) and (β1 , . . . , βn ) are the coordinates of v1 and v2 , respectively,


with respect to the basis B.
Definition 11.8 Let A be an affine space over the vector space V. A subset A′ ⊂ A is called an affine subspace of A if the set of all vectors PQ, for any points P, Q ∈ A′, forms a vector subspace V′ of V. The dimension of V′ as a vector space is called the dimension of A′ as an affine subspace of A.
In particular, we recall that
(1) If dim(A′) = 1, A′ is called a line in A.
(2) If dim(A′) = 2, A′ is called a plane in A.
(3) If dim(A′) = dim(A) − 1, A′ is called a hyperplane in A.
The vector subspace V′ of V associated with an affine subspace A′ of A is called the direction of A′.
Example 11.9 Let A = RA2 be the real 2-dimensional affine plane. The affine sub-
spaces of A are
(1) The points in A: any point is an affine subspace of dimension 0.
(2) The lines in A: any line is an affine subspace of dimension 1.
Example 11.10 Let A = RA3 be the real 3-dimensional affine space. The affine
subspaces of A are
(1) The points in A: any point is an affine subspace of dimension 0.
(2) The lines in A: any line is an affine subspace of dimension 1.
(3) The planes in A: any plane is an affine subspace of dimension 2.
Example 11.11 Let A = Fn be the affine space over the vector space V = Fn , for
F a field. Consider the following system of linear equations in n unknowns:

a11 x1 + a12 x2 + · · · + a1n xn = b1


a21 x1 + a22 x2 + · · · + a2n xn = b2
(11.1)
.........
am1 x1 + am2 x2 + · · · + amn xn = bm

where coefficients ai j and constants bi lie in F. We recall that if we set


A = [ a_11 a_12 . . . a_1n ; a_21 a_22 . . . a_2n ; . . . ; a_m1 a_m2 . . . a_mn ],   X = [ x_1 ; x_2 ; . . . ; x_n ],   B = [ b_1 ; b_2 ; . . . ; b_m ],

the system (11.1) can be written in the compact form AX = B. The set of solutions
A is a subset of A = Fn ; more precisely, it is an affine subspace of A. To prove this
fact, we show that if Y1 , Y2 ∈ A are solutions of (11.1), then the vector Y1 Y2 lies in
the solution space of homogeneous system associated with (11.1), which is a vector
subspace of V = Fn . For any P, Q ∈ A, we set PQ = Q − P. By Remark 11.3, we
know that this definition induces a structure of affine space. From AY1 = B and
AY2 = B, it follows A(Y2 − Y1 ) = AY2 − AY1 = 0, that is, Y1 Y2 = Y2 − Y1 lies in
the solution subspace of the homogeneous system associated with (11.1), as asserted.

Every affine subspace A of A = (S, Fn ) can be defined by a system of linear


equations. In fact, the direction V  of A is a vector subspace of Fn , so that it
can be described by a system of homogeneous linear equations (r equations, if
dim(V  ) = n − r ):
f 1 (x1 , . . . , xn ) = 0
..
.
.. (11.2)
.
fr (x1 , . . . , xn ) = 0.

Now, fix an arbitrary point P ∈ A having coordinates (η1 , . . . , ηn ) in terms of a


frame of reference in A, and set

f 1 (η1 , . . . , ηn ) = b1 , . . . , fr (η1 , . . . , ηn ) = br .

Let Q ∈ A be any other arbitrary point whose coordinate vector is equal to


(ϑ1 , . . . , ϑn ). By the fact that PQ ∈ V  , it follows that the vector (ϑ1 − η1 , . . . , ϑn −
ηn ) is a solution of system (11.2), that is,

f 1 (η1 , . . . , ηn ) = f 1 (ϑ1 , . . . , ϑn ), . . . , fr (η1 , . . . , ηn ) = fr (ϑ1 , . . . , ϑn ).

This means that the coordinate vector of an arbitrary point Q ∈ A is a solution of


the linear system
f 1 (x1 , . . . , xn ) = b1
..
.
..
.
fr (x1 , . . . , xn ) = br .

Remark 11.12 Let A = Fn be the affine space over the vector space V = Fn , for F
a field. Fix a point p ∈ A and let W be a vector subspace of V. The set

p + W = { p + w|w ∈ W }

is an affine subspace of A associated with vector space W. In fact, for q1 = p + w1 ∈


p + W and q2 = p + w2 ∈ p + W, we have q1 q2 = q2 − q1 = w2 − w1 ∈ W.

Let A = Fn be the affine space over the vector space V = Fn , for F a field. Under the
assumption that PQ = Q − P, for any P, Q ∈ A, any affine subspace of A has the
form Q + W, for some fixed point Q ∈ A and vector subspace W of V. Moreover,
any affine subspace A of A can be represented by the associated vector subspace W
and by any of its points Q. To prove this, assume A = Q + W, let P ∈ A be any
other point of A and set A = P + W. If R ∈ A , then

PR = PQ + QR = −QP + QR ∈ W

implying that R ∈ A . Conversely, if S ∈ A , then

QS = QP + PS = −PQ + PS ∈ W,

that is, S ∈ A . Thus we have proved that A = A .


Starting from the above remark, let us now show how any affine subspace
of FAn can be represented. Let V be the vector space associated with FAn and
B = {e1 , . . . , en } be a basis of V. Set A = P + W, where P ≡ (b1 , . . . , bn ) ∈ FAn
and W is a subspace of V. Moreover, let {w1 , . . . , wk } be a basis of W, and
wi ≡ (w1i , . . . , wni ) in terms of the basis B.
For any point Q ∈ A , PQ ∈ W implies that there exist suitable t1 , . . . , tk ∈ F
such that
PQ = t1 w1 + · · · + tk wk .

If we define PQ by its coordinate vector (x1 , . . . , xn ) with respect to basis B, we


have that
x1 = b1 + t1 w11 + · · · + tk w1k
x2 = b2 + t1 w21 + · · · + tk w2k
..
. (11.3)
..
.
xn = bn + t1 wn1 + · · · + tk wnk

called the parametric equations of A .

Example 11.13 A straight line r in RA3 is an affine subspace of dimension 1. Hence,


it can be represented by the coordinates of one of its point P0 ≡ (x0 , y0 , z 0 ) and the
coordinates of a vector v ≡ (l, m, n) which is parallel to the line. In other words, if
we denote by P ≡ (x, y, z) an arbitrary point of r, then the vector P0 P is parallel
to v, that is, there exists some t ∈ R such that P0P = tv, where t depends on the choice of P ∈ r. Therefore, one may describe the coordinates of every point of the straight line r by letting the parameter t vary in R. Hence, r can
be represented by the following relations:

x = x0 + tl
y = y0 + tm (11.4)
z = z 0 + tn

which are called parametric equations of the straight line r. Thus, for any point
P ∈ r, if we denote by X and X0 the coordinate vectors of P and P0 , respectively,
we may write X = X0 + tv.
The real numbers (l, m, n) are called direction ratios of the straight line r. There-
fore, two straight lines r and r  are parallel if and only if their direction ratios are,
respectively, (l, m, n) and ρ(l, m, n), for a suitable ρ ∈ R.
Moreover, if we consider two straight lines r and r  in RE3 , having respectively
direction ratios (l, m, n) and (l  , m  , n  ), we see that they are perpendicular if and
only if the vectors v ≡ (l, m, n) and v ≡ (l  , m  , n  ) are orthogonal, that is, the inner
product v · v is zero. This means that ll  + mm  + nn  = 0.

Example 11.14 Let now π be a plane of RA3 . It is an affine subspace of dimension


2 and could be uniquely represented by any of its points P0 ≡ (x0 , y0 , z 0 ) and by
a pair of vectors v1 , v2 that are parallel to the plane. Suppose that v1 ≡ (l1 , m 1 , n 1 )
and v2 ≡ (l2 , m 2 , n 2 ). The fact that π is a variety of dimension 2 implies that we
may consider the plane as the set of all points P ≡ (x, y, z) in RA3 such that the
displacement vector from P0 to P is a linear combination of v1, v2. Thus, P0P = t v1 + t′ v2, for t, t′ ∈ R, and any point of π is determined by an appropriate pair (t, t′) of real parameters. On the other hand, P0P ≡ (x − x0, y − y0, z − z0), that is,

x − x0 = t l1 + t′ l2
y − y0 = t m1 + t′ m2
z − z0 = t n1 + t′ n2

so that the set of all points (x, y, z) belonging to the plane is described by

x = x0 + t l1 + t′ l2
y = y0 + t m1 + t′ m2
z = z0 + t n1 + t′ n2

which are called parametric equations of the plane.


Moreover, since the vectors of coordinates (x − x0 , y − y0 , z − z 0 ), (l1 , m 1 , n 1 )
and (l2 , m 2 , n 2 ) are linearly dependent, it follows that the matrix
⎡ ⎤
x − x0 y − y0 z − z 0
⎣ l1 m1 n1 ⎦
l2 m2 n2

has rank ≤ 2, in particular, its determinant is zero. By the easy computation of this
determinant, it follows that there exist suitable scalar coefficients a, b, c, d such that

the coordinates (x, y, z) of any point of the plane should satisfy the following relation
ax + by + cz + d = 0. It is called Cartesian equation of the plane.

Definition 11.15 Let A be an affine space over the vector space V and A′, A′′ two affine subspaces of A, having directions V′ and V′′, respectively. A′ and A′′ are said to be parallel if either V′ ⊆ V′′ or V′′ ⊆ V′. In particular, if A′ and A′′ have the same dimension, we say that they are parallel if V′ = V′′.

Example 11.16 Consider the following affine subspaces in RA3 :

A′ having direction V′ = ⟨(1, −2, 2)⟩,

A′′ having direction V′′ = ⟨(0, 1, −1), (1, 1, −1)⟩

and notice that V′ ⊂ V′′ as vector spaces. Then A′ and A′′ are parallel subspaces in RA3.
In particular, we may look at that from the geometrical point of view, saying that
any line having direction ratios (1, −2, 2) and any plane containing vectors whose
coordinates are of the form α(0, 1, −1) + β(1, 1, −1), for any α, β ∈ R, are parallel
in the classical affine 3-dimensional space.

Example 11.17 Consider the following two linear systems of equations in 4


unknowns:
x1 − x2 + 2x3 − x4 = 1,
2x1 + x2 + x3 + x4 = 1, (11.5)
2x1 + x2 − x3 − x4 = 1,

x2 − x3 + x4 = 1,
(11.6)
x3 + x4 = 3.

The set A′ of solutions of (11.5) and the set A′′ of solutions of (11.6) are parallel affine subspaces in RA4. To prove it, we first compute the direction V′ of A′. It consists of the solutions of the homogeneous linear system

x1 − x2 + 2x3 − x4 = 0,
2x1 + x2 + x3 + x4 = 0,
2x1 + x2 − x3 − x4 = 0,

that is, V′ = ⟨(1, −2, −1, 1)⟩. We may also notice that the point P′ = (2/3, −1/3, 0, 0) is a solution of system (11.5). Hence, A′ = P′ + V′ can be represented by the relations:

x1 = 2/3 + α
x2 = −1/3 − 2α
x3 = −α
x4 = α,    α ∈ R.

On the other hand, the direction V′′ of A′′ consists of the solutions of the homogeneous linear system

x2 − x3 + x4 = 0,
x3 + x4 = 0,

that is, V′′ = ⟨(1, 0, 0, 0), (0, −2, −1, 1)⟩. Since the point P′′ = (0, 4, 3, 0) is a solution of system (11.6), we write A′′ = P′′ + V′′ and it can be represented by

x1 = β
x2 = 4 − 2γ
x3 = 3 − γ
x4 = γ,    β, γ ∈ R.

Then V′ ⊂ V′′ and the assertion is proved.

Proposition 11.18 Every hyperplane H in REn has the form

Hu,λ = {P ∈ REn having coordinate vector Y ∈ Rn : Y · u = λ} (11.7)

for some nonzero vector u ∈ Rn that is orthogonal to the hyperplane and some λ ∈ R.

Proof Fix any point Q ∈ H ; let X be its coordinate vector. Then H − Q is a hyper-
plane containing the origin of the frame of reference and parallel to H. If u ∈ Rn is
orthogonal to H, then a point P, having coordinate vector Y, lies in H if and only if
the vector Y − X is orthogonal to u, that is, Y · u = X · u. Thus H has the form (11.7)
for λ = X · u. More precisely, if u = [a1 , . . . , an ]t ∈ Rn , then H is represented by
the linear equation
a1 x1 + · · · + an xn − λ = 0.

Notice that, following the argument presented in the above proposition, the hyper-
plane H contains the origin point of the frame of reference if and only if λ = 0.

Example 11.19 Let u = [1, 2, −1]t ∈ R3 . Then by the symbol Hu,λ we mean all
hyperplanes in RE3 that are orthogonal to vector u. The different hyperplanes Hu,λ
are parallel to each other, as λ varies. Thus, for instance,
(1) for λ = 0, Hu,0 is represented by equation x + 2y − z = 0;
(2) for λ = −1, Hu,−1 is represented by equation x + 2y − z + 1 = 0;
(3) for λ = 2, Hu,2 is represented by equation x + 2y − z − 2 = 0.

By using the dot product of inner product vector space R3 , we are in a position to
compile an overview of the main formulae and techniques enabling the computation
of angles and distances in the 3-dimensional real Euclidean space.

Projection of Vectors in RE3


One important use of dot product is in projections. Here, we describe two different
kinds of projections of vectors: onto a line and onto a plane in the affine Euclidean
space RE3 .
Of course, the concept of orthogonal projection covers the more general situation
related to inner product n-dimensional spaces. Here, we just would like to show how
those arguments may be applied to the classical 3-dimensional analytic geometry.
More precisely, given a vector v ∈ R3 and a subspace W of R3 generated by vectors
{e1, . . . , ek} (k = 1, 2), we say that v′ is the orthogonal projection vector of v onto W if there exists w ∈ R3, orthogonal to every vector in W, such that v = v′ + w. We know that, in case {e1, . . . , ek} is an orthogonal basis for W, we may obtain the orthogonal projection as

v′ = Σ_{i=1}^{k} (v · e_i / e_i · e_i) e_i.   (11.8)

Example 11.20 Let r be the line in RE3 defined by equations

x = 1 + 2t
y = 2−t , t ∈R
z = 3t

and v be the vector of coordinates (2, −1, 1) in terms of the standard frame of
reference in O X Y Z . In order to get the projection of v onto r, we may compute
the projection onto r of a vector which is equipollent to v (that is, it has the same
length, direction and sense of v) but has its tail on r. We obtain this vector by a
simple translation of v. Actually, without loss of generality, we may assume that v
is precisely applied to r. Notice that the direction of r is represented by the vector
e = (2, −1, 3). At this point, performing the formula (11.8), we get

v′ = (v · e / e · e) e = (8/7, −4/7, 12/7).

Example 11.21 Let π be the plane in RE3 defined by equations

x = 1 + t − t
y = 2 + 2t − t  , t, t  ∈ R
z = 1 − t + t

and v be the vector of coordinates (1, 1, 2) in terms of the standard frame of reference
in O X Y Z . As above, we may assume that v is precisely applied to π. Starting from
the definition of π and using the classical orthogonalization process, we arrive at
the conclusion that the vector space representing the direction of π is generated by

orthogonal vectors e1 = (1, 2, −1), e2 = (−1, 1, 1). At this point, performing the
formula (11.8), we get

v′ = (v · e1 / e1 · e1) e1 + (v · e2 / e2 · e2) e2 = (−1/2, 1, 1/2).
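Formula (11.8) and the two examples above can be reproduced with a short Python (NumPy) sketch; the function below assumes, as in (11.8), that the given basis vectors are pairwise orthogonal.

    import numpy as np

    def project(v, basis):
        # orthogonal projection of v onto span(basis); the basis vectors must be pairwise orthogonal
        return sum((v @ e) / (e @ e) * e for e in basis)

    v = np.array([2., -1., 1.])
    e = np.array([2., -1., 3.])                  # direction of the line r in Example 11.20
    print(project(v, [e]))                       # (8/7, -4/7, 12/7)

    v = np.array([1., 1., 2.])
    e1 = np.array([1., 2., -1.])
    e2 = np.array([-1., 1., 1.])                 # orthogonal directions of the plane in Example 11.21
    print(project(v, [e1, e2]))                  # (-1/2, 1, 1/2)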

The Angle Enclosed Between Two Straight Lines


Let (l, m, n) be the direction ratios of a straight line r. The direction cosines of r are
the cosines of the angles between the unit vector of r having coordinates

l m n
√ ,√ ,√
l 2 + m 2 + n2 l 2 + m 2 + n2 l 2 + m 2 + n2

and the positive X, Y and Z axes, respectively. Therefore, the direction cosines of r
can be represented by

α = cos(r, X) ∈ { +l/√(l^2 + m^2 + n^2), −l/√(l^2 + m^2 + n^2) },
β = cos(r, Y) ∈ { +m/√(l^2 + m^2 + n^2), −m/√(l^2 + m^2 + n^2) },
γ = cos(r, Z) ∈ { +n/√(l^2 + m^2 + n^2), −n/√(l^2 + m^2 + n^2) }.

Let now (l, m, n) and (l′, m′, n′) be the direction ratios of the straight lines r and r′, respectively. The cosine of the angle enclosed between r and r′ is equal to the cosine of the angle enclosed between the unit vectors of r and r′, having coordinates

( l/√(l^2 + m^2 + n^2), m/√(l^2 + m^2 + n^2), n/√(l^2 + m^2 + n^2) )

and

( l′/√(l′^2 + m′^2 + n′^2), m′/√(l′^2 + m′^2 + n′^2), n′/√(l′^2 + m′^2 + n′^2) ),

respectively, that is,

cos(r, r′) = ± (ll′ + mm′ + nn′) / ( √(l^2 + m^2 + n^2) √(l′^2 + m′^2 + n′^2) ).

Example 11.22 Consider the lines



r :  x = 1 + 2t,  y = 1 − t,  z = 1 + t;      r′ :  x = 2 + 3t′,  y = 1 + 2t′,  z = 1 + t′,      t, t′ ∈ R.

The direction ratios of r and r′ are (2, −1, 1) and (3, 2, 1), respectively. Their direction cosines are

cos(r, X) = ±2/√6,   cos(r, Y) = ∓1/√6,   cos(r, Z) = ±1/√6,

cos(r′, X) = ±3/√14,   cos(r′, Y) = ±2/√14,   cos(r′, Z) = ±1/√14.

The cosine of the angle enclosed between r and r′ is equal to

cos(r, r′) = ± 5/(√6 √14).

Distance Between a Point and a Hyperplane


Let H be a hyperplane in REn , V the direction of H and u = (α1 , . . . , αn ) an orthog-
onal vector to H. As remarked above, there exists λ ∈ R such that H is represented
by equation

f(x_1, . . . , x_n) = 0, where f(x_1, . . . , x_n) = Σ_{i=1}^{n} α_i x_i − λ.   (11.9)

Let now P0 be a point of REn , having coordinate vector X0 = (β1 , . . . , βn ) in terms


of a fixed frame of reference in REn . The distance from P0 to H is defined as the
length of vector P0 Q0 , where Q 0 is the orthogonal projection of P0 onto H. To obtain
Q 0 , we have to construct the affine subspace H ⊥ = P0 + V ⊥ , which is orthogonal
to H and containing P0 . Then compute Q 0 = H ∩ H ⊥ . If we describe H ⊥ in its
parametric form

H^⊥ :  x_1 = β_1 + α_1 t,  . . . ,  x_n = β_n + α_n t,    t ∈ R,   (11.10)

and substitute (11.10) in (11.9), we get

Σ_{i=1}^{n} (α_i β_i + α_i^2 t) − λ = 0,

that is,

f(X_0) + t ||u||^2 = 0.

Thus, the point Q_0 is obtained from (11.10) for t = −f(X_0)/||u||^2. The coordinate vector representing P_0Q_0 is then

( −(f(X_0)/||u||^2) α_1,  −(f(X_0)/||u||^2) α_2,  . . . ,  −(f(X_0)/||u||^2) α_n ),

whose length is

√( (f(X_0)^2/||u||^4)(α_1^2 + · · · + α_n^2) ) = |f(X_0)| / ||u||.

Example 11.23 Let now P0 ≡ (x0 , y0 , z 0 ) be a point of RE3 and ax + by + cz +


d = 0 the equation of a plane π. The distance from P0 to π is equal to the length of
the vector P0 Q0 , where Q 0 is the orthogonal projection point of P0 on π.
Since

u ≡ ( a/√(a^2 + b^2 + c^2), b/√(a^2 + b^2 + c^2), c/√(a^2 + b^2 + c^2) )

represents the normal direction to π, then

δ(P_0, π) = |a x_0 + b y_0 + c z_0 + d| / √(a^2 + b^2 + c^2).

Example 11.24 As a consequence, we may also obtain the distance between two
parallel planes. To do this, without loss of generality, we consider the planes π and
π′ having equations

π : ax + by + cz + d = 0,    π′ : ax + by + cz + d′ = 0,    d ≠ d′.

If P_0 ≡ (x_0, y_0, z_0) ∈ π, then the distance δ(π, π′) between π and π′ is equal to the distance from P_0 to π′, that is,

δ(π, π′) = δ(P_0, π′) = |a x_0 + b y_0 + c z_0 + d′| / √(a^2 + b^2 + c^2) = |d′ − d| / √(a^2 + b^2 + c^2),

because a x_0 + b y_0 + c z_0 = −d.
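The two distance formulas of Examples 11.23 and 11.24 can be evaluated directly; in the sketch below (Python with NumPy) the plane coefficients and the point are arbitrary numerical choices.

    import numpy as np

    def point_plane_distance(p, a, b, c, d):
        # distance from the point p = (x0, y0, z0) to the plane ax + by + cz + d = 0
        return abs(a * p[0] + b * p[1] + c * p[2] + d) / np.sqrt(a**2 + b**2 + c**2)

    print(point_plane_distance((1., -1., 2.), 2., -1., 2., 3.))   # |2 + 1 + 4 + 3| / 3 = 10/3

    # distance between the parallel planes 2x - y + 2z + 3 = 0 and 2x - y + 2z + 5 = 0:
    print(abs(5. - 3.) / np.sqrt(2.**2 + (-1.)**2 + 2.**2))       # |d' - d| / ||(a, b, c)|| = 2/3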

Distance Between Two Skew Lines


Consider now two skew lines (that is, non-parallel non-intersecting lines) r and r  .
This means that there exist two parallel planes π and π  , containing r and r  , respec-
tively. One of the more important problems in Geometry is finding the minimum

distance between the two lines, which is the distance between π and π  . It is nat-
urally the shortest distance between the lines, i.e., the length of the orthogonal line
segment to both lines. The solution of the problem is very simple through the use of
both the scalar product · and the cross product ∧ of vectors. In detail, let v = (l, m, n) and v′ = (l′, m′, n′) be two vectors representing the direction ratios of r and r′, respectively. The vector v ∧ v′ is orthogonal to v and v′, that is, to r and r′. Hence, for any choice of two points P ∈ r and Q ∈ r′, the absolute value of the scalar projection of PQ in the direction of v ∧ v′ is the minimum distance δ(r, r′) between r and r′:

δ(r, r′) = | PQ · (v ∧ v′)/||v ∧ v′|| |.

Example 11.25 Consider the following lines in RE3 :

r :  x = 0,  y = t,  z = 1 + t;      r′ :  x = −2 + t′,  y = −1 − 3t′,  z = −2t′,      t, t′ ∈ R.

We may describe r as an affine subspace of the form P + V, where P is the point of r having coordinates (0, 0, 1) and V = ⟨(0, 1, 1)⟩ is its direction. Analogously, r′ is an affine subspace of the form Q + W, where Q is the point of r′ having coordinates (−2, −1, 0) and W = ⟨(1, −3, −2)⟩ is its direction. The conclusion that r and r′ are skew lines stems from the fact that the vectors PQ ≡ (−2, −1, −1), (0, 1, 1) and (1, −3, −2) are linearly independent. The cross product of the directions (0, 1, 1) and (1, −3, −2) is equal to the vector

| i   j   k ;  0   1   1 ;  1  −3  −2 |  =  i + j − k

and its length is equal to √3. Then, by performing the above-discussed formula, we obtain the minimum distance between r and r′ as follows:

δ(r, r′) = | PQ · (i + j − k)/√3 | = 2/√3.
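The computation of Example 11.25 can be repeated numerically with the cross product; the points and directions in the Python (NumPy) sketch below are exactly those of the example.

    import numpy as np

    P = np.array([0., 0., 1.]);  v = np.array([0., 1., 1.])      # point and direction of r
    Q = np.array([-2., -1., 0.]); w = np.array([1., -3., -2.])   # point and direction of r'

    n = np.cross(v, w)                            # = (1, 1, -1), orthogonal to both lines
    print(abs((Q - P) @ n) / np.linalg.norm(n))   # 2/sqrt(3) ~ 1.1547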

Distance Between a Point and a Line


Let r be a line having direction ratios (l, m, n), P1 a point of coordinates (x1 , y1 , z 1 ),
which does not belong to r. To obtain the minimum distance δ(P1 , r ) from P1 to r,
we may construct the plane π, containing P1 and orthogonal to r. If Q 1 = π ∩ r is
the intersection point of π and r, then the length of P1 Q1 is the minimum distance
between P1 and r. In order to get the vector P1 Q1 , one may also proceed as follows.
Let Q 0 ≡ (x0 , y0 , z 0 ) be any point of r and consider the parallelogram P having
height P1 Q1 and base Q1 Q0 . Hence, the area of P is equal to

P1 Q1  Q1 Q0  .

On the other hand, since Q1 Q0 and Q0 P1 are the sides of P, it can be also obtained
as the norm of Q1 Q0 ∧ Q0 P1 . Therefore,

P1 Q1  Q1 Q0  = Q1 Q0 ∧ Q0 P1  .

Moreover, Q1 Q0 is a line segment of r, thus there exists a suitable α ∈ R such that


Q1 Q0 ≡ α(l, m, n). From this we get
 
 i j k 
 

Q1 Q0 ∧ Q0 P1 =  αl αm αn 
 (x1 − x0 ) (y1 − y0 ) (z 1 − z 0 ) 

and the norm of Q1 Q0 ∧ Q0 P1 is equal to


     
 x1 − x0 y1 − y0 2  x1 − x0 z 1 − z 0 2  y1 − y0 z 1 − z 0 2

α   
+  
+  .
l m  l n  m n 

Since Q1 Q0  is precisely α l 2 + m 2 + n 2 , we conclude that δ(P1 , r )

= P1 Q1 

Q1 Q0 ∧Q0 P1 
= Q1 Q0 

 2  2  2

  x1 − x0 y1 − y0   x1 − x0 z 1 − z 0   y1 − y0 z 1 − z 0 
 + +
 l m   l n   m n 
= l 2 +m 2 +n 2
.

Example 11.26 Let P0 ∈ RE3 be the point of coordinates (1, −1, 2) and r the line
represented by the parametric form

r :  x = 1 + 2t,  y = 1 − t,  z = 1 + t,    t ∈ R.

In order to obtain the minimum distance from P_0 to r, we first compute the coordinates of the vector P_0Q, where Q is any point of r. For instance, if we choose Q as the point of coordinates (1, 1, 1), it follows that P_0Q ≡ (0, 2, −1). Then the requested distance is

δ(P_0, r) = √( |−1 1; 2 −1|^2 + |2 1; 0 −1|^2 + |2 −1; 0 2|^2 ) / √(4 + 1 + 1) = √21/√6 = √7/√2.
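Similarly, the distance of Example 11.26 follows from one cross product; in the Python (NumPy) sketch below the data are those of the example.

    import numpy as np

    P0 = np.array([1., -1., 2.])               # the point
    Q = np.array([1., 1., 1.])                 # any point of the line r
    d = np.array([2., -1., 1.])                # direction ratios of r

    print(np.linalg.norm(np.cross(Q - P0, d)) / np.linalg.norm(d))   # sqrt(7/2) ~ 1.8708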

Exercises

1. Let A = RA5 be equipped with the standard frame of reference. Determine para-
metric and linear equations representing the affine subspace of A having direc-
tion V = (0, 1, 1, 0, 0), (0, 0, 0, 1, 0), (1, 1, 0, 0, −1) and containing the point
P ≡ (1, −1, 0, 0, 0).
2. In the affine space RA6 equipped with the standard frame of reference, represent
the affine subspace containing the following points:

P1 ≡ (2, 2, 0, 0, 0, 0), P2 ≡ (1, 0, 1, 0, 0, 0),

P3 ≡ (0, −2, 2, 0, 0, 0), P4 ≡ (2, 1, −1, 1, 0, 0).

3. In the Euclidean space RE3 equipped with the standard frame of reference, deter-
mine the projection of vector v ≡ (1, 2, −1) onto the hyperplane containing the
origin point and having direction V = (2, 1, 0), (−1, 1, 1) .
4. In the Euclidean space RE3 , consider the following lines: r1 containing the point
P ≡ (1, 1, 2) and having direction V = (1, 1, 0) , r2 as intersection of planes
having equations x + y − z + 1 = 0 and 2x + y + z − 1 = 0. Prove that r1 and
r2 are skew lines and determine their minimum distance.
5. Let P1 , P2 , P3 be three points in the affine space RA2 equipped with the standard
frame of reference. Letting (α1 , α2 ), (β1 , β2 ) and (γ1 , γ2 ) be their coordinates,
prove that P1 , P2 , P3 are collinear if and only if the matrix
⎡ ⎤
α1 α2 1
⎣ β1 β2 1 ⎦
γ1 γ2 1

has determinant equal to zero.


6. Let P1 , P2 , P3 , P4 be four points in the affine space RA3 equipped by the standard
frame of reference. Letting (α1 , α2 , α3 ), (β1 , β2 , β3 , ) (γ1 , γ2 , γ3 ) and (δ1 , δ2 , δ3 )
be their coordinates, prove that P1 , P2 , P3 , P4 are coplanar if and only if the matrix
⎡ ⎤
α1 α2 α3 1
⎢ β1 β2 β3 1⎥
⎢ ⎥
⎣ γ1 γ2 γ3 1⎦
δ1 δ2 δ3 1

has determinant equal to zero.



11.2 Affine Transformations

Definition 11.27 Let F be a field, V, V′ two vector spaces over F, S, S′ two nonempty sets and A = (S, V), A′ = (S′, V′) the corresponding affine spaces. An affine transformation of A into A′ is a map f : A → A′ such that there exists a linear transformation ϕ : V → V′ satisfying the following condition:

f(P)f(Q) = ϕ(PQ), for all P, Q ∈ A.

The homomorphism ϕ : V → V′ is called the linear part of f (or also the homomorphism associated with f). An affine transformation of A is an isomorphism of A into itself and its linear part is an isomorphism ϕ : V → V.
The set of all affine transformations of A is usually denoted by A f f (A). The fact
that it is a group can be easily checked.
Remark 11.28 In fact,
(i) the identity map η : A → A is the affine transformation having the identity
map on V as associated automorphism,
(ii) the composition of two affine transformations g ◦ f of A (having associated
automorphisms χ , ϕ : V → V , respectively) is the affine transformation asso-
ciated with the automorphism χ ◦ ϕ of V,
(iii) the inverse of the affine transformation f : A → A, having associated auto-
morphism ϕ : V → V, is the affine transformation f −1 : A → A having
ϕ −1 : V → V as associated automorphism.
Example 11.29 Let A = (S, V ) be an affine space, where V is a vector space over
the field F, and let v ∈ V be a fixed vector. Consider the map f v : A → A defined
as follows: for any P ∈ A,

f v (P) = Q ∈ A ⇐⇒ PQ = v.

First, we notice that f v (P1 ) = f v (P2 ) = Q ∈ A if and only if P1 Q = P2 Q, i.e., if


and only if P1 = P2 ∈ A. Hence f v is injective. Moreover, for any Q ∈ A and taking
P = f −v (Q), one has that QP = −v, that is, PQ = v. Thus Q = f v (P), i.e., f v
is surjective. Let now P, Q ∈ A, such that P1 = f v (P) and P2 = f v (Q), that is,
PP1 = QP2 = v. Thus, we have

fv (P)fv (Q) = P1 P2
= P1 P + PQ + QP2
= −v + PQ + v
= PQ.

Therefore, f v is an affine transformation of A having the identity map of V as


associated automorphism. The above- described map is usually called translation of
A defined by v.

Remark 11.30 Let A = (S, V ) be an affine space and f : A → A any affine trans-
formation associated with the identity map of V. It follows that, for any P, Q ∈ A,
both the following hold:
f(P)f(Q) = PQ

and
Pf(P) = PQ + Qf(Q) + f(Q)f(P)
= PQ + Qf(Q) − PQ
= Qf(Q).

Therefore, the vector v = Pf(P) is not depending on the choice of point P ∈ A.


Thus, f = f v is a translation in the sense of previous definition.
In other words, translations of A cover completely the set of affine transformations
having the identity map of V as associated automorphism.

Remark 11.31 One can easily see that


(i) the identity map η : A → A is the translation f 0 of A defined by the zero vector
of V.
(ii) the composition of two translations f w ◦ f v (defined by vectors w, v ∈ V ) of
A is the translation of A defined by the vector w + v.
(iii) the inverse of the translation f v (defined by vector v ∈ V ) of A is the translation
of A defined by vector −v.
Hence, the set TA of all translations of A is a group. Moreover, the map χ : TA → V
defined by χ ( f v ) = v ∈ V, for any f v ∈ TA , is bijective and

χ ( f w ◦ f v ) = w + v = χ ( f w ) + χ ( f v ),

that is, χ is an isomorphism.

Example 11.32 Let A = (S, V ) be an affine space, where V is a vector space over
the field F. Let P0 ∈ A be a fixed point and 0 = λ ∈ F. Consider the map f P0 ,λ :
A → A defined as follows: for any P ∈ A,

f P0 ,λ (P) = Q ∈ A ⇐⇒ P0 Q = λP0 P.

The point P0 is called the center of f P0 ,λ and the nonzero scalar λ is called its
ratio. If we assume f P0 ,λ (P1 ) = f P0 ,λ (P2 ) = Q ∈ A, then λP0 P1 = λP0 P2 , imply-
ing P1 = P2 ∈ A. Hence, f P0 ,λ is injective. Moreover, for any Q ∈ A and tak-
ing P = f P0 ,λ−1 (Q), one has that P0 P = λ−1 P0 Q, that is, P0 Q = λP0 P. Thus
Q = f P0 ,λ (P), i.e., f v is surjective.
Let now P, Q ∈ A, such that P1 = f P0 ,λ (P) and P2 = f P0 ,λ (Q), that is, P0 P1 =
λP0 P and P0 P2 = λP0 Q. It follows that

fP0 ,λ (P)fP0 ,λ (Q) = P1 P2


= P1 P0 + P0 P2
= −λP0 P + λP0 Q
= λPQ.

Therefore, f P0 ,λ is an affine transformation of A. The associated automorphism to


f P0 ,λ is the map ϕ : V → V defined by ϕ(v) = λv, for any v ∈ V. The above-
described affine transformation is called homothety of A centered at P0 with scale
factor (or ratio) λ.
Remark 11.33 Fix any point P0 ∈ A, the set of all homotheties of A centered at P0
is usually denoted by H om(A) P0 . It is easy to see that H om(A) P0 is a group. In fact
(i) the identity map η : A → A is the homothety f P0 ,1 of A.
(ii) the composition of two homotheties f P0 ,μ ◦ f P0 ,λ (having ratios μ, λ, respec-
tively) of A is the homothety f P0 ,μλ of A.
(iii) the inverse of the homothety f P0 ,μ of A is the homothety f P0 ,μ−1 of A.
Example 11.34 Let A = (S, V ) be an affine space, where V is a vector space of
dimension n over the field F, O ∈ A be a fixed point and consider the map f O : A →
A defined by
P ∈ A, f_O(P) = P′ ⟺ OP′ = −OP.

It is easy to see that f O is an affine transformation. It is usually called central symmetry


with respect to the point O. The linear part of the central symmetry is represented by
the matrix −In , where In is the identity matrix in Mn (F). A subset S ⊆ A is called
centrally symmetric with respect to the point O, if f O (S) = S. In this case, O is
usually called the center of S.
We remark that any affine subspace S of an affine space A is symmetric with respect
to any of its points. In fact, let P0 ∈ S be any point of S and W the vector subspace
associated with S. Then, for any point P ∈ S, P0 P ∈ W. The central symmetry f P0 :
A → A with respect to P0 induces the identity P0 fP0 (P) = −P0 P. Thus, P0 fP0 (P) ∈
W and f P0 (P) ∈ S. By the definition of central symmetry, it follows that
(1) f : A → A is a central symmetry with respect to the point P if and only if
f (P) = P.
(2) Let f : A → A be a central symmetry of the n-dimensional affine space A with
respect to the point P having coordinates (γ1 , . . . , γn ) in terms of a fixed frame
of reference. If Q ∈ A has coordinates (α1 , . . . , αn ), then its image f (Q) has
coordinates (β1 , . . . , βn ) if and only if βi = 2γi − αi , for any i = 1, . . . , n.
Definition 11.35 Let A′ ⊂ A be two affine spaces. An affine transformation f : A → A is called a projection of A onto A′ if f(A) = A′.

Proposition 11.36 Let f : A → A be a projection of A onto its subspace A′. Then, for any point P′ ∈ A′, the pre-image f^{−1}(P′) ⊆ A is an affine subspace of A. In particular, if f : A → A is a projection of A onto its subspace A′, then for distinct points P′, P′′ ∈ A′, the affine subspaces f^{−1}(P′) and f^{−1}(P′′) are parallel.

Proof Let V and V  be the respective vector spaces of A and A , and ϕ : V → V  the
linear transformation associated with f. For any points Q 1 , Q 2 ∈ f −1 (P  ), we see
that ϕ(Q1 Q2 ) = f(Q1 )f(Q2 ) = 0 (since f (Q 1 ) = f (Q 2 ) = P  ). Hence, the vector
Q1 Q2 lies in the null space of ϕ (which is a subspace of V ).
Consider now P ∈ f −1 (P  ) ⊆ A and any vector u ∈ K er (ϕ), the null space of
ϕ. Then, there exists Q ∈ A such that PQ = u, that is, 0 = ϕ(PQ) = f(P)f(Q), and
f (P) = f (Q) follows. Thus, Q ∈ f −1 (P  ) and K er (ϕ) is precisely the vector space
associated with the affine subspace f −1 (P  ).
Definition 11.37 Let f : A → A be a projection of A onto its subspace A′ and S ⊆ A′ (not necessarily a subspace of A′). The set

f^{−1}(S) = ∪_{P∈S} f^{−1}(P)

is called a cylinder in A.
Let us now write down the action of affine transformations of affine spaces in coor-
dinate form. To do this, let A be an affine n-dimensional space associated with the
vector space V over the field F, B = {O, e1 , . . . , en } a frame of reference of A. We
prove the following.
Theorem 11.38 The map f : A → A is an affine transformation if and only if there
exist a nonsingular matrix A ∈ Mn (F) and a fixed point c ∈ A such that, for any
point P ∈ A having coordinate vector X = [x1 , . . . , xn ]t in terms of B, the point
f (P) ∈ A has coordinate vector Y = [y1 , . . . , yn ]t in terms of B, satisfying the
following relation:
Y = AX + c. (11.11)

Proof Assume that f : A → A is an affine transformation associated with the auto-


morphism ϕ : V → V , and let A ∈ Mn (F) be the matrix of ϕ with respect to the
basis {e1 , . . . , en } of V. Set c = f (O) and let [c1 , . . . , cn ]t be the coordinate vector
of f (O) in terms of B. Hence, we have that

OP = x1 e1 + · · · + xn en
Of(P) = y1 e1 + · · · + yn en
Of(O) = c1 e1 + · · · + cn en

and ϕ(ei ) = Aei , for each i = 1, . . . , n, treating ϕ(ei ) and ei as column vectors.
Thus,
ϕ(OP) = ϕ(x1 e1 + · · · + xn en ) = A(x1 e1 + · · · + xn en )

and
f(O)f(P) = (y1 − c1 )e1 + · · · + (yn − cn )en .

On the other hand, ϕ(OP) = f(O)f(P), hence



A(x1 e1 + · · · + xn en ) = (y1 − c1 )e1 + · · · + (yn − cn )en ,

that is,

A [ x_1 ; . . . ; x_n ] = [ y_1 ; . . . ; y_n ] − [ c_1 ; . . . ; c_n ],

as required.
Conversely, for any invertible matrix A ∈ Mn (F) associated with an automor-
phism of V with respect to the basis {e1 , . . . , en }, and for any point c ∈ A having
coordinate vector [c1 , . . . , cn ]t in terms of B, let f A,c : A → A be the map defined
by relation (11.11). This map is an affine transformation. In fact, for any P, Q ∈ A,
having respectively coordinate vectors X = [x1 , . . . , xn ]t and Y = [y1 , . . . , yn ]t in
terms of B, the following holds:

fA,c(P)fA,c(Q) = f_{A,c}(Q) − f_{A,c}(P)
= AY + c − (AX + c)
= A(Y − X)
= ϕ(PQ).

Remark 11.39 Translations of an n-dimensional affine space A are precisely all the
affine transformations of the form f In ,c , where c ∈ A is a fixed point and In ∈ Mn (F)
is the identity matrix.
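As an illustrative aside (Python with NumPy, not part of the original text; the data below are arbitrary), relation (11.11) can be realized directly in coordinates, and a translation is then recovered as the special case A = In of Remark 11.39:

import numpy as np

def affine_map(A, c):
    # Return the coordinate map X -> A X + c of relation (11.11).
    A, c = np.asarray(A, dtype=float), np.asarray(c, dtype=float)
    return lambda X: A @ np.asarray(X, dtype=float) + c

f = affine_map([[2., 1.], [0., 1.]], [1., -3.])   # a generic affine transformation of the real plane
t = affine_map(np.eye(2), [1., -3.])              # the translation f_{I2,c} of Remark 11.39

P = [2., 5.]
print(f(P))   # [10.  2.]  since A P = (9, 5) and c = (1, -3)
print(t(P))   # [ 3.  2.]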
One of the most relevant aspects of the affine transformations is that some properties
are invariant under the action of such transformations. If a subset S ⊂ A possesses a
property that is invariant under the action of f, then f (S) ⊂ A is a subset having the
same property. Later, we describe in detail the application to the case of geometric
figures having properties that are invariant under the action of affine transformations.
Here, we firstly would like to fix some useful results:
Theorem 11.40 Let A be the affine space FAn and f : A → A an affine transfor-
mation. Then
(i) f maps an affine subspace A' to an affine subspace having the same dimension as A'.
(ii) f preserves the property of parallelism among affine subspaces.
In particular, for n = 2, if A is the affine plane FA2 and f : A → A is an affine transformation, then f maps a line to a line and it also preserves the property of parallelism among lines. Further, for n = 3, if A is the affine 3-dimensional space FA3 and f : A → A is an affine transformation, then f maps a plane to a plane; it preserves the property of parallelism among planes and the property of parallelism between a plane and a line.
Proof (i) To show these properties, without loss of generality, we may consider A' = P0 + W, for a fixed point P0 ∈ A and a vector subspace W of Fn . Here, we have identified V with Fn because V ≅ Fn , where V is the vector space associated with the affine space A. Assume that {w1 , . . . , wk } is a basis of W. Thus, for any point P ∈ A',
P0 P = t1 w1 + · · · + tk wk

for suitable t1 , . . . , tk ∈ F. If we denote by X and X 0 the coordinate vector of P and


P0 , respectively, we may write

X = X 0 + t1 w1 + · · · + tk wk .

The coordinates of the point f (P) may be computed using the expression f (X ) =
AX + c, where A is an invertible matrix of Mn (F) and c is a fixed point of A . Thus

f (P) = f (X 0 + t1 w1 + · · · + tk wk )
= A(X 0 + t1 w1 + · · · + tk wk ) + c
= (AX 0 + c) + t1 Aw1 + · · · + tk Awk
= P1 + w

where P1 is the point having coordinate vector AX 0 + c and w = t1 Aw1 + · · · + tk Awk . Hence, all points of the form f (P) describe an affine subspace whose associated vector space W' has dimension k and is generated by the vectors {Aw1 , . . . , Awk }.
Moreover, since A represents a nonsingular operator of Fn , {Aw1 , . . . , Awk } is an
independent set of vectors, so it is a basis of ϕ(W ), where ϕ is the linear operator
of Fn defined by A. Hence, we may write f (A') = f (P0 ) + ϕ(W ). As required, the dimension of f (A') coincides with that of A'.
(ii) Consider now a subspace A'' which is parallel to A'. As above, A'' = P1 + U, for a fixed point P1 ∈ A'' and a vector subspace U of Fn . Without loss of generality, we may assume that U is a vector subspace of W. Hence, if {u1 , . . . , uh } is a basis of U, we may complete it to a basis of W by adding appropriate k − h vectors w1 , . . . , wk−h . By the above argument, {Au1 , . . . , Auh } is a basis of ϕ(U ), {Au1 , . . . , Auh , Aw1 , . . . , Awk−h } is a basis of ϕ(W ) and we may represent the images of the affine subspaces as

f (A') = f (P0 ) + ϕ(W ),   f (A'') = f (P1 ) + ϕ(U ).

Since ϕ(U ) is a vector subspace of ϕ(W ), we conclude that f (A') and f (A'') are parallel.
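The dimension and parallelism statements just proved can be checked numerically. The following Python/NumPy sketch (our own illustrative data, not taken from the text) verifies that the direction span{w1 , w2} of a plane in R3 keeps rank 2 after applying the linear part M of an affine transformation:

import numpy as np

M = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])                         # linear part of f(X) = M X + c
W = np.column_stack([[1., 0., 0.], [0., 1., -1.]])   # basis w1, w2 of the direction of A'

print(np.linalg.det(M))                 # 3.0, so M is nonsingular
print(np.linalg.matrix_rank(W))         # 2
print(np.linalg.matrix_rank(M @ W))     # 2: {Mw1, Mw2} remains independent
# A subspace parallel to A' has the same direction W, so its image has direction
# span{Mw1, Mw2} as well; the two images are therefore parallel.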

As a consequence of Theorem 11.40, we have the following.


Corollary 11.41 Let A be the affine space FAn and f : A → A a translation. Then f maps an affine subspace A' to an affine subspace that is parallel to A' and has the same dimension as A'.
Now, we spend a few lines in order to introduce the first step of the study of conics
in affine spaces. We’ll come back to a deep analysis of conics in the sequel; here, we
just would like to remark some properties which are strictly related to the previous
results.
A conic section or a conic is the locus of a point which moves in a plane so that its
distance from a fixed point is in a constant ratio to its perpendicular distance from a
fixed straight line. The fixed point is called the focus, the fixed straight line is called
the directrix and the constant ratio is called eccentricity usually denoted by e. The
line passing through the focus and perpendicular to the directrix is called axis, and
the point of intersection of a conic with its axis is called a vertex.

[Figure: a conic as the locus of a moving point P whose distance from the focus S is in a constant ratio to its distance P M from the directrix.]

If S is the point ( p, q) and the directrix is the line ℓx + my + n = 0, then

P S = √( (x − p)2 + (y − q)2 ) ,   P M = |ℓx + my + n| / √(ℓ2 + m 2 ) .

Then P S/P M = e implies that

(ℓ2 + m 2 ){(x − p)2 + (y − q)2 } = e2 (ℓx + my + n)2 ,

which is a particular form of

ax 2 + 2hx y + by 2 + 2gx + 2 f y + c = 0, (11.12)

the general equation of the second degree. Conversely, it can be proved that the
general equation of the second degree (11.12) also represents a conic section. Hence,
a conic section can be defined in the following way also:
A conic section is the set of all points in a plane whose coordinates satisfy the general equation of the second degree given by (11.12). Now let

Δ = det ⎡ a h g ⎤
        ⎢ h b f ⎥
        ⎣ g f c ⎦ .

Then the conic section represented by (11.12) is called
• a parabola if Δ ≠ 0, h 2 = ab;
• an ellipse if Δ ≠ 0, h 2 < ab, (either a ≠ b or h ≠ 0);
• a circle if Δ ≠ 0, h 2 < ab, a = b, h = 0;
• a hyperbola if Δ ≠ 0, h 2 > ab.

Remark 11.42 (i) If Δ ≠ 0, then the corresponding conic section is known as non-degenerate.
(ii) If Δ = 0, then the corresponding conic section is known as degenerate.
• If Δ = 0, h 2 > ab, then (11.12) represents a pair of distinct real lines.
• If Δ = 0, h 2 = ab, (either g 2 = ac or f 2 = bc), then (11.12) represents a pair of coincident lines.
• If Δ = 0, h 2 < ab, then (11.12) represents two imaginary lines/point.
Motivated by the above definition of a conic section, we have given a more general
definition of a conic in the forthcoming Sect. 11.5.
Theorem 11.43 Let A be the affine plane FA2 and f : A → A an affine transfor-
mation. Then f maps a conic curve to a conic curve and in particular
(i) f maps an ellipse to an ellipse.
(ii) f maps a parabola to a parabola.
(iii) f maps a hyperbola to a hyperbola.
(iv) f maps a degenerate conic (split in the union of secant, parallel or merged
lines) to a degenerate conic of the same type.
Proof To represent a conic in FA2 , we firstly recall that a conic is a geometric locus
in FA2 consisting of all points whose coordinates are the solution of a quadratic
equation of the form g(x, y) = 0, where

g(x, y) = a11 x 2 + 2a12 x y + a22 y 2 + 2a13 x + 2a23 y + a33 (11.13)

and ai j ∈ F, for any i, j = 1, 2, 3. It is the standard representing equation of a conic.


The quadratic part of the polynomial g(x, y) is the quadratic form

q(x, y) = a11 x 2 + 2a12 x y + a22 y 2 (11.14)

and can be represented by the symmetric matrix

A = ⎡ a11 a12 ⎤
    ⎣ a12 a22 ⎦

so that q(x, y) can be written as q(x, y) = X t AX, where X = [x, y]t . Hence, if
we denote B = [2a13 , 2a23 ]t , then the equation g(x, y) = 0 can be written in the
following matrix notation:

X t AX + B t X + a33 = 0 (11.15)

or, also, in the following very compact form:

Y t CY = 0,   where   Y = ⎡ x ⎤            ⎡ a11 a12 a13 ⎤
                          ⎢ y ⎥   and  C = ⎢ a12 a22 a23 ⎥ .    (11.16)
                          ⎣ 1 ⎦            ⎣ a13 a23 a33 ⎦

The general classification of conics is well known.


Case I: If |C| ≠ 0 and |A| > 0, the conic is a non-degenerate (real or imaginary) ellipse.
Case II: If |C| ≠ 0 and |A| < 0, the conic is a non-degenerate hyperbola.
Case III: If |C| ≠ 0 and |A| = 0, the conic is a non-degenerate parabola.
Case IV: If |C| = 0, the conic is degenerate, that is, it consists of either the union of two (secant, parallel or merged) real lines or the union of two conjugate (secant or parallel) imaginary lines. In all these subcases, the rank of C is equal to 2 if the conic is a union of distinct lines; it is equal to 1 in the case of merged lines.
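This case distinction translates directly into a small computation on the matrices C and A. The sketch below (Python/NumPy; the helper name classify_conic and the sample conics are ours, not from the text) is one possible reading of Cases I–IV:

import numpy as np

def classify_conic(C, tol=1e-12):
    # C is the 3x3 symmetric matrix of (11.16); A is its upper-left 2x2 block.
    C = np.asarray(C, dtype=float)
    A = C[:2, :2]
    if abs(np.linalg.det(C)) > tol:           # Cases I-III: non-degenerate
        dA = np.linalg.det(A)
        return "ellipse" if dA > tol else ("hyperbola" if dA < -tol else "parabola")
    return "degenerate, rank C = %d" % np.linalg.matrix_rank(C)   # Case IV

print(classify_conic([[0.25, 0, 0], [0, 1, 0], [0, 0, -1]]))   # ellipse:   x^2/4 + y^2 - 1 = 0
print(classify_conic([[0, 0.5, 0], [0.5, 0, 0], [0, 0, -1]]))  # hyperbola: x y - 1 = 0
print(classify_conic([[1, 0, 0], [0, -1, 0], [0, 0, 0]]))      # degenerate, rank C = 2: x^2 - y^2 = 0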

We now consider the affine transformation by its expression f (X ) = M X + c, where


M is an invertible matrix of M2 (F) and c is a fixed point of FA2 . Thus, the image of
the conic Γ : X t AX + B t X + a33 = 0 is computed by the substitution

X = M −1 ( f (X ) − c) = H X' + d    (11.17)

where

H = M −1 = ⎡ β11 β12 ⎤ ,   X' = f (X ) = [x', y']t ,   d = −M −1 c = ⎡ γ1 ⎤ .
           ⎣ β21 β22 ⎦                                              ⎣ γ2 ⎦

Hence, by using relation (11.17) in (11.15), we have that the geometrical locus f (Γ ) consists of all points whose coordinates X' are the solution of

(H X  + d)t A(H X  + d) + B t (H X  + d) + a33 = 0,

that is,

X't (H t AH )X' + X't H t Ad + d t AH X' + d t Ad + B t H X' + B t d + a33 = 0.    (11.18)

Moreover, by the facts X't H t Ad = d t At H X' and At = A, (11.18) reduces to

X't (H t AH )X' + (d t AH + d t AH + B t H )X' + (d t Ad + B t d + a33 ) = 0.    (11.19)

Therefore, under the notation A' = H t AH, B' = (d t AH + d t AH + B t H )t and a'33 = d t Ad + B t d + a33 in (11.19), f (Γ ) is represented by

X't A' X' + B't X' + a'33 = 0,

that is, a quadratic equation h(x', y') = 0. Then f (Γ ) is a conic curve. Notice that relation (11.17) also induces the following one:

⎡ x ⎤   ⎡ β11 β12 γ1 ⎤ ⎡ x' ⎤
⎢ y ⎥ = ⎢ β21 β22 γ2 ⎥ ⎢ y' ⎥ .    (11.20)
⎣ 1 ⎦   ⎣  0   0   1 ⎦ ⎣ 1  ⎦

Thus, by using (11.20) in (11.16), it follows that the conic f (Γ ) is represented by the equation

⎛⎡ β11 β12 γ1 ⎤ ⎡ x' ⎤⎞t     ⎛⎡ β11 β12 γ1 ⎤ ⎡ x' ⎤⎞
⎜⎢ β21 β22 γ2 ⎥ ⎢ y' ⎥⎟   C  ⎜⎢ β21 β22 γ2 ⎥ ⎢ y' ⎥⎟ = 0,
⎝⎣  0   0   1 ⎦ ⎣ 1  ⎦⎠     ⎝⎣  0   0   1 ⎦ ⎣ 1  ⎦⎠

that is,

      ⎡ β11 β21 0 ⎤     ⎡ β11 β12 γ1 ⎤
Y't   ⎢ β12 β22 0 ⎥  C  ⎢ β21 β22 γ2 ⎥  Y' = 0.    (11.21)
      ⎣ γ1  γ2  1 ⎦     ⎣  0   0   1 ⎦

So, the matrix associated with f (Γ ) is

      ⎡ β11 β21 0 ⎤     ⎡ β11 β12 γ1 ⎤   ⎡ β11 β12 γ1 ⎤t     ⎡ β11 β12 γ1 ⎤
C' =  ⎢ β12 β22 0 ⎥  C  ⎢ β21 β22 γ2 ⎥ = ⎢ β21 β22 γ2 ⎥   C  ⎢ β21 β22 γ2 ⎥ .
      ⎣ γ1  γ2  1 ⎦     ⎣  0   0   1 ⎦   ⎣  0   0   1 ⎦      ⎣  0   0   1 ⎦

Since C and C' are congruent, they have the same rank. Then we may conclude that Γ is non-degenerate (respectively, degenerate) if and only if f (Γ ) is. Moreover, if Γ were a degenerate conic, then it would be a pair of secant, parallel or merged lines. In this case, since f maps a line to a line and preserves the property of parallelism among lines, the image f (Γ ) would again be a pair of secant, parallel or merged lines, respectively.
Finally, we fix our attention on the case of a non-degenerate conic Γ (ellipse, hyperbola or parabola). The matrix A' represents the quadratic part of the polynomial h(x', y'), which describes the image f (Γ ). Since A' = H t AH, then |A'| = |H |2 |A|, that is, the determinant of A' is zero, positive or negative according as the determinant of A is zero, positive or negative. Thus, the type of the conic is unchanged.
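The congruence argument of this proof can also be checked numerically. In the Python/NumPy sketch below (illustrative data of our own choosing), an affine substitution is applied to the unit circle and the rank of C and the sign of |A| are seen to be unchanged, in accordance with (11.21) and with |A'| = |H|2 |A|:

import numpy as np

C = np.diag([1., 1., -1.])                  # the ellipse x^2 + y^2 - 1 = 0
A = C[:2, :2]

M = np.array([[2., 1.], [0., 3.]])          # f(X) = M X + c, an affine transformation
c = np.array([5., -2.])
H = np.linalg.inv(M)
d = -H @ c
Q = np.block([[H, d[:, None]], [np.zeros((1, 2)), np.ones((1, 1))]])

Cp = Q.T @ C @ Q                            # matrix of f(Gamma), as in (11.21)
Ap = H.T @ A @ H
print(np.linalg.matrix_rank(C), np.linalg.matrix_rank(Cp))      # 3 3
print(np.sign(np.linalg.det(A)), np.sign(np.linalg.det(Ap)))    # 1.0 1.0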

Let us describe now one class of affine transformations of particular interest. Let us
assume that f : A → A has a fixed point, that is, O ∈ A such that f (O) = O. In
light of Theorem 11.38, f can be represented, with respect to a frame of reference
{O, e1 , . . . , en } of A, by the action

f A,O (X ) = AX,

where A ∈ Mn (F) is the invertible matrix of the automorphism ϕ associated with


f A,O in terms of the basis {e1 , . . . , en } of V. Then any affine transformation f A,O ,
fixing the point O, can be identified with its linear part. Let us denote by A f f (A) O ,
the set of all affine transformations of the form f A,O . In this sense, there is a one-
to-one correspondence between A f f (A) O and the matrices A ∈ G L n (F). For this
reason, such affine transformations are usually called linear.
Consider then an affine transformation f : A → A and let O be any point of A.
If we denote v = Of(O) ∈ V and f v is the translation of A defined by v, then the
composition g = f v−1 ◦ f (that is, g = f −v ◦ f ) is clearly a linear affine transfor-
mation of A because g fixes the point O as discussed below. Hence, f = f v ◦ g is
a representation of f precisely in terms of composition of one translation and one
linear affine transformation.
Notice that if v' = −Of −1 (O) ∈ V and h = f ◦ f v'−1 , where f v' is the translation of A defined by v', then f = h ◦ f v' is again a representation of f. Moreover, since

f v−1 ◦ f (O) = T ⇐⇒ f(O)T = −v


⇐⇒ Tf(O) = v
⇐⇒ T = O

and

f v'−1 (O) = T ⇐⇒ f −v' (O) = T
           ⇐⇒ OT = −v'
           ⇐⇒ OT = Of −1 (O)
           ⇐⇒ T = f −1 (O),

one has that

g(O) = f v−1 ◦ f (O) = O

and

h(O) = f ◦ f v'−1 (O) = f ( f −1 (O)) = O.

Therefore, g and h are linear affine transformations fixing the same point O ∈ A
and associated with the same automorphism of V. Thus, in light of the previously
mentioned bijection between A f f (A) O and G L n (F), g and h are represented by
the same matrix A ∈ G L n (F) (by the choice of a basis of V ), that is, g = h. As a
conclusion, we have that
f = f v ◦ g = g ◦ f v' .    (11.22)

The meaning of this result is summarized in the following.


Theorem 11.44 Let A be an affine space over the vector space V. For any affine transformation f : A → A and for any point O ∈ A, there exist unique v, v' ∈ V and a linear affine transformation g : A → A fixing O, such that relation (11.22) holds.
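A numerical check of this decomposition is immediate. The Python/NumPy sketch below (our own illustrative data, not from the text) computes v = Of(O) and v' = −Of −1 (O), and verifies that g = f v −1 ◦ f fixes O and that f = f v ◦ g = g ◦ f v' :

import numpy as np

A = np.array([[0., -1.], [1., 0.]])          # linear part of f
c = np.array([3., 1.])                       # f(X) = A X + c
O = np.array([2., -1.])                      # an arbitrary point O

f = lambda X: A @ X + c
finv = lambda Y: np.linalg.solve(A, Y - c)

v = f(O) - O                                 # v  = O f(O)
vp = O - finv(O)                             # v' = -O f^{-1}(O)
g = lambda X: f(X) - v                       # g = f_v^{-1} o f

X = np.array([4., 7.])
print(np.allclose(g(O), O))                  # True: g fixes O
print(np.allclose(f(X), g(X) + v))           # True: f = f_v o g
print(np.allclose(f(X), g(X + vp)))          # True: f = g o f_{v'}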
11.3 Isometries

Definition 11.45 Let E be an affine Euclidean space over the vector space V. An
affine transformation f : E → E is called an isometry of E if the associated auto-
morphism ϕ : V → V is an orthogonal operator (an isometry of V ).

It is easy to check that


(1) The identity map 1E : E → E is an isometry and here the associated orthogonal
operator is the identity map on V.
(2) The composition of two isometries f and g is an isometry and here the associated
orthogonal operator is the composition of the orthogonal operators associated
with f and g, respectively.
(3) The inverse of an isometry g is again an isometry and here the associated orthogonal
operator is the inverse of the orthogonal operator associated with g.
Thus, the set of all isometries of an affine Euclidean space E is a group, usually
denoted by I so(E). A subgroup of I so(E) of particular interest is the one consist-
ing of all isometries of E associated with automorphisms which are represented by
matrices having determinant precisely equal to 1. These isometries are called direct
isometries and the mentioned subgroup of I so(E) is called group of direct isometries
and is denoted by I so+ (E).

Example 11.46 Any translation of E is a direct isometry of E.

Example 11.47 Let H be a hyperplane in the n-dimensional affine Euclidean space E, V' the direction of H and f : E → E the map satisfying the following rules:
(1) for any P ∈ E, Pf(P) ∈ V'⊥ ;
(2) for any P ∈ E, the distance from P to H is equal to the distance from f (P) to
H.
So, f fixes all points in H and interchanges the position of any other point in E along
the orthogonal line to H, at equal distance from H. In the literature, such a map is
usually called reflection across H.
In order to investigate the behavior of this kind of transformation, we firstly consider the case in which H contains the origin point of the frame of reference in E. If V is the associated vector space of the affine Euclidean space E, then we identify V with Rn . Similarly, we also identify the direction V' of H with the vector space Rn−1 and V'⊥ with the vector space R. Since Rn = V' ⊕ V'⊥ , any vector v ∈ Rn can be uniquely expressed as w + w', where w ∈ V' and w' ∈ V'⊥ . More precisely, since dim(V') = n − 1 and dim(V'⊥ ) = 1, if we fix a vector 0 ≠ u ∈ V'⊥ , we may write

v = w + αu    (11.23)

for α ∈ R such that w' = αu. By the fact that w · u = 0 and performing the dot product by u in the identity (11.23), it follows that v · u = αu · u.
Let now P ∈ E have coordinate vector v ∈ Rn . Since f fixes w and acts like −1 on αu, one has that

f (P) = f (w + αu)
      = w − αu
      = w − ((v · u)/(u · u)) u
      = v − αu − ((v · u)/(u · u)) u
      = v − 2 ((v · u)/(u · u)) u,

which is the coordinate vector of the image f (P). In particular, since w and αu are orthogonal,

‖ f (v)‖ = ‖w + αu‖ = ‖v‖.

Therefore, if P, Q ∈ E are represented respectively by the coordinate vectors v and v', it follows that

‖f(Q)f(P)‖ = ‖ f (v') − f (v)‖
           = ‖ f (v' − v)‖
           = ‖v' − v‖
           = ‖QP‖,

which proves that f is an isometry. To write the matrix representing f, pick an orthogonal basis {e1 , . . . , en−1 } for V' and complete it to a basis of Rn by adding a vector en ∈ V'⊥ . Hence, for any vector v ∈ Rn , there exist suitable α1 , . . . , αn ∈ R such that v = Σ_{i=1}^{n} αi ei and f (v) = Σ_{i=1}^{n−1} αi ei − αn en , that is

⎡ α1   ⎤   ⎡ 1              ⎤ ⎡ α1   ⎤
⎢  ⋮   ⎥   ⎢   ⋱            ⎥ ⎢  ⋮   ⎥
⎢ αn−1 ⎥ = ⎢       1        ⎥ ⎢ αn−1 ⎥ .
⎣ −αn  ⎦   ⎣          −1    ⎦ ⎣ αn   ⎦

This shows that every reflection across a hyperplane containing the origin consists
only of its linear part, moreover, it is represented by a matrix having determinant
equal to −1.
To get a formula for reflections across any hyperplane, not necessarily containing
the origin, we describe the hyperplane in terms of coordinate vectors of its points:

H = {P ∈ REn having coordinate vector Y ∈ Rn : Y · u = λ}

for some nonzero vector u ∈ Rn that is orthogonal to the hyperplane and some λ ∈ R.
Fix a point Q ∈ E having coordinate vector w ∈ Rn . To reflect a point P across


H, whose coordinate vector is v ∈ Rn , we firstly perform a translation by subtracting
the coordinates of Q; in this sense, we obtain a point P  having coordinate vector
v − w and a hyperplane (−Q) + H which is parallel to H and contains the origin of
the frame of reference. At this point, following the previous argument, we compute
the reflection of the point P  across (−Q) + H. Finally, adding the coordinates of
w back, we have the reflection of P across H.
More in detail, if we denote by g the reflection across the hyperplane (−Q) + H,
we have

f (P) = g(v − w) + w = (v − w) − 2 (((v − w) · u)/(u · u)) u + w    (11.24)

for any vector u orthogonal to the direction of H. Moreover, since Q ∈ H then


w · u = λ and (11.24) can be written as

f (v) = v − 2 ((v · u − λ)/(u · u)) u,

which represents the coordinate vector of the reflection of P across H.
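For computations it is convenient to implement this last formula directly. The following Python/NumPy sketch (the function name reflect is ours; the hyperplane used in the check is an arbitrary example) returns the reflection of a point across {Y : Y · u = λ}; reflecting twice gives the point back, and points of the hyperplane are fixed:

import numpy as np

def reflect(v, u, lam):
    # f(v) = v - 2 (v.u - lam)/(u.u) u, the reflection across {Y : Y . u = lam}
    v, u = np.asarray(v, dtype=float), np.asarray(u, dtype=float)
    return v - 2.0 * (v @ u - lam) / (u @ u) * u

u, lam = np.array([1., 1., 0.]), 3.0          # the plane x + y = 3 in R^3
P = np.array([4., -2., 5.])

Q = reflect(P, u, lam)
print(np.allclose(reflect(Q, u, lam), P))     # True: a reflection is an involution
print(reflect([1., 2., 7.], u, lam))          # [1. 2. 7.]: a point of the plane is fixed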

To clarify the previous example further, we illustrate some specific cases.
Example 11.48 Let r be the straight line in RE2 defined by the equation x − 2y + 1 = 0. Its direction is V' = ⟨(2, 1)⟩, whose orthogonal complement is ⟨u⟩, where u ≡ (1, −2). Any point of r possesses coordinate vector (y1 , y2 ) satisfying the relation y1 − 2y2 = −1 (λ = −1). Hence, for any vector (x1 , x2 ) representing a point of RE2 , its reflection across r is obtained by

f (x1 , x2 ) = ( x1 − (2x1 − 4x2 + 2)/5 , x2 + (4x1 − 8x2 + 4)/5 )
            = ( (3x1 + 4x2 − 2)/5 , (4x1 − 3x2 + 4)/5 )                    (11.25)
            = ⎡ 3/5   4/5 ⎤ ⎡ x1 ⎤ + ⎡ −2/5 ⎤ .
              ⎣ 4/5  −3/5 ⎦ ⎣ x2 ⎦   ⎣  4/5 ⎦

Example 11.49 Let π be the plane in RE3 defined by the equation x − 2y + 2z − 2 = 0. Its direction is V' = ⟨(2, 0, −1), (2, 1, 0)⟩, whose orthogonal complement is V'⊥ = ⟨u⟩, where u ≡ (1, −2, 2). The coordinates (y1 , y2 , y3 ) of any point of π satisfy the relation y1 − 2y2 + 2y3 = 2 (λ = 2). Hence, for any vector (x1 , x2 , x3 ) representing a point of RE3 , its reflection across π is obtained by

f (x1 , x2 , x3 ) = ( x1 − (2x1 − 4x2 + 4x3 − 4)/9 , x2 − (−4x1 + 8x2 − 8x3 + 8)/9 , x3 − (4x1 − 8x2 + 8x3 − 8)/9 )
                 = ( (7x1 + 4x2 − 4x3 + 4)/9 , (4x1 + x2 + 8x3 − 8)/9 , (−4x1 + 8x2 + x3 + 8)/9 )        (11.26)
                 = ⎡  7/9  4/9 −4/9 ⎤ ⎡ x1 ⎤   ⎡  4/9 ⎤
                   ⎢  4/9  1/9  8/9 ⎥ ⎢ x2 ⎥ + ⎢ −8/9 ⎥ .
                   ⎣ −4/9  8/9  1/9 ⎦ ⎣ x3 ⎦   ⎣  8/9 ⎦
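The matrix form (11.26) can be checked against the general reflection formula. The short Python/NumPy test below (random points; the code is ours, not from the text) confirms that f (v) = v − 2(v · u − λ)/(u · u) u and the affine expression above agree for the plane x − 2y + 2z − 2 = 0:

import numpy as np

u, lam = np.array([1., -2., 2.]), 2.0
A = np.array([[7., 4., -4.],
              [4., 1., 8.],
              [-4., 8., 1.]]) / 9.0
b = np.array([4., -8., 8.]) / 9.0

for P in np.random.default_rng(0).normal(size=(4, 3)):
    direct = P - 2.0 * (P @ u - lam) / (u @ u) * u
    print(np.allclose(direct, A @ P + b))      # True each time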

Definition 11.50 A direct isometry f : E → E of the affine Euclidean space E, for


which a point P ∈ E is a fixed point, i.e., f (P) = P, is called rotation centered at
P.

By Theorem 11.38 and since any orthogonal operator of a vector space is represented
by an orthogonal matrix, the following directly follows.

Theorem 11.51 The map f : E → E is an isometry of the affine Euclidean space E if and only if there exist an orthogonal matrix A ∈ Mn (F) and a fixed point c ∈ E such that, for any point P ∈ E having coordinate vector X = [x1 , . . . , xn ]t in terms of B, the point f (P) ∈ E has coordinate vector Y = [y1 , . . . , yn ]t in terms of B, satisfying the following relation:

Y = AX + c.    (11.27)

Let us note that we are now able to reformulate the result contained in Theorem 11.44.

Theorem 11.52 Let E be an affine Euclidean space over the vector space V. For any isometry f : E → E and for any point O ∈ E, there exist unique v ∈ V and an isometry g : E → E fixing O, such that

f = f v ◦ g    (11.28)

where f v is the translation of E defined by v.

Remark 11.53 We know that, for any points P, Q of an affine Euclidean space E over the Euclidean vector space V, the vector PQ is determined. Hence, we may define the map

δ : E × E → R

such that δ(P, Q) = ‖PQ‖, usually called the distance between P and Q. As a metric map on a vector space, one may see that the distance δ satisfies the following properties:
(i) δ(P, Q) > 0, for any points Q ≠ P of E.
(ii) δ(P, P) = 0, for any P ∈ E.
(iii) δ(P, Q) = δ(Q, P), for any P, Q ∈ E.
(iv) δ(P, Q) ≤ δ(P, H ) + δ(H, Q), for any P, H, Q ∈ E.
In other words, a distance satisfies the same conditions as a metric on a vector space; thus, any affine Euclidean space is a metric space.

Theorem 11.54 Let E be an affine Euclidean space over the vector space V (over
the field R). The map f : E → E is an isometry if and only if there exists a distance
δ : E × E → R such that

 
δ( f (P), f (Q)) = δ(P, Q)    (11.29)

for any points P, Q ∈ E.

Proof If f : E → E is an isometry and ϕ : V → V is its associated automorphism


(an isometry of V ), then

δ( f (P), f (Q)) = ‖f(P)f(Q)‖
                 = ‖ϕ(PQ)‖
                 = ‖PQ‖
                 = δ(P, Q).

Conversely, assume that condition (11.29) holds. We now fix a point O ∈ E and
introduce the map σ : E → V defined by σ (P) = OP, for any point P ∈ E. In light
of the definition of affine spaces, it is clear that σ is a bijection between E and V,
i.e., for any point P ∈ E, there is a unique vector v ∈ V such that v = OP.
Consider the following function ϕ : V → V defined by ϕ(v) = f(O)f(P), for any
v = OP ∈ V. Notice that

ϕ(0) = ϕ(OO) = f(O)f(O) = 0.

Moreover, for any v = OP and w = OQ, we have

‖ϕ(v) − ϕ(w)‖ = ‖ϕ(OP) − ϕ(OQ)‖
              = ‖f(O)f(P) − f(O)f(Q)‖
              = ‖f(Q)f(P)‖
              = ‖QP‖
              = ‖v − w‖,
that is, ϕ is an isometry of V. Since f is an affine transformation of E associated


with ϕ, f is an isometry of E.

Thus, an isometry of an affine Euclidean space preserves distances between points.


In the literature, if the mapping f : E → E, of an affine Euclidean space E into itself,
is an isometry of E as a metric space, then it is called motion.
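In coordinates, Theorems 11.51 and 11.54 give a practical test: f (X ) = AX + c is a motion exactly when At A = I. A minimal Python/NumPy sketch (illustrative data of our own choosing, not from the text):

import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])     # an orthogonal linear part
c = np.array([2., -5.])
f = lambda X: A @ X + c

print(np.allclose(A.T @ A, np.eye(2)))              # True: A is orthogonal, so f is an isometry
P, Q = np.array([1., 4.]), np.array([-3., 0.5])
print(np.isclose(np.linalg.norm(f(P) - f(Q)),
                 np.linalg.norm(P - Q)))            # True: distances are preserved, as in (11.29)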

11.4 A Natural Application: Coordinate Transformation in RE2

A coordinate transformation in the affine Euclidean space RE2 is the transition from
a frame of reference of RE2 to another one, corresponding to a transition from a
basis of R2 to another one, so that any point of RE2 (and so any vector of R2 ) is
represented by different coordinates with respect to different systems.
We fix our attention to translations and rotations of the unit vectors i and j of a
Cartesian coordinate system O X Y. More precisely,
Translation: Introduce a second coordinate system O' X' Y' , having origin O' and unit vectors i' and j', such that the coordinates of O' with respect to the system O X Y are (x0 , y0 ) and the base vectors i', j' are parallel to i and j, respectively.

Rotation: Introduce a second coordinate system O X' Y' , having the same origin and unit vectors i' and j', such that the new coordinate system is obtained from the first one by a rotation of the base vectors through a certain angle ϑ. We recall that the sign convention for rotations is positive counterclockwise.
Let (x, y) be the coordinates of the point P in terms of O X Y. In order to determine
the coordinates (x  , y  ) of P with respect to the new coordinate system, we firstly
remark that they are the coordinates of the vector OP, where O is the tail and P is
the head. We express these coordinates both in terms of {i, j} and in terms of {i , j }:

OP = xi + yj ,    OP = x'i' + y'j' .

In particular, in the case of the above-mentioned translation from the system O X Y to O' X' Y' , the coordinate vectors of i' and j' relative to the basis {i, j} are (1, 0) and (0, 1), respectively. Hence, the following relation:

⎡ x − x0 ⎤   ⎡ 1 0 ⎤ ⎡ x' ⎤
⎣ y − y0 ⎦ = ⎣ 0 1 ⎦ ⎣ y' ⎦

represents the translation, that is,

x  = x − x0
y  = y − y0
and conversely
x = x  + x0
y = y  + y0 .

Moreover, one can see that rotations are nothing more than changes of basis in the
vector space R2 , then they can be represented by a transition matrix of the ordered
basis {i , j } relative to the ordered basis {i, j}. Therefore, the coordinate vector of i
relative to the basis {i, j} is the first column vector in the transition matrix relative to
{i, j}; analogously, the coordinate vector of j relative to the basis {i, j} is the second
column vector in the transition matrix relative to {i, j}.
Hence, in the case the new coordinate system is obtained from the first one by a counterclockwise rotation of a certain angle ϑ, the coordinate vectors of i' and j' relative to the basis {i, j} are (cos(ϑ), sin(ϑ)) and (−sin(ϑ), cos(ϑ)), respectively. Thus, the rotation is represented by

⎡ x ⎤   ⎡ cos(ϑ)  −sin(ϑ) ⎤ ⎡ x' ⎤
⎣ y ⎦ = ⎣ sin(ϑ)   cos(ϑ) ⎦ ⎣ y' ⎦ .

Conversely, since the previous transition matrix is orthogonal, we also have

⎡ x' ⎤   ⎡  cos(ϑ)  sin(ϑ) ⎤ ⎡ x ⎤
⎣ y' ⎦ = ⎣ −sin(ϑ)  cos(ϑ) ⎦ ⎣ y ⎦ .

Hence, the equations representing a rotation are

x = x  cos(ϑ) − y  sin(ϑ)
y = x  sin(ϑ) + y  cos(ϑ)

and
x  = xcos(ϑ) + ysin(ϑ)
.
y  = −xsin(ϑ) + ycos(ϑ)

We finally assume that the new coordinate system O  X  Y  has been counterclockwise
rotated of a certain angle ϑ with respect to O X Y and then translated. As above, we
denote as O  ≡ (x0 , y0 ) the coordinates of the translated origin. In this case, the
roto-translation is represented by the relation

⎡ x − x0 ⎤   ⎡ cos(ϑ)  −sin(ϑ) ⎤ ⎡ x' ⎤
⎣ y − y0 ⎦ = ⎣ sin(ϑ)   cos(ϑ) ⎦ ⎣ y' ⎦

and conversely

⎡ x' ⎤   ⎡  cos(ϑ)  sin(ϑ) ⎤ ⎡ x − x0 ⎤
⎣ y' ⎦ = ⎣ −sin(ϑ)  cos(ϑ) ⎦ ⎣ y − y0 ⎦ .    (11.30)
Thus, we have

x = x  cos(ϑ) − y  sin(ϑ) + x0
(11.31)
y = x  sin(ϑ) + y  cos(ϑ) + y0

and
x  = (x − x0 )cos(ϑ) + (y − y0 )sin(ϑ)
y  = −(x − x0 )sin(ϑ) + (y − y0 )cos(ϑ).

Example 11.55 Let P ≡ (2, −1) and r : x + y − 2 = 0 be, respectively, the coor-
dinates of the point P and the equation of the line r with respect to the system O X Y.
Introduce a new coordinate system O  X  Y  , where O  ≡ (1, 4) and the base vectors
{i , j } are counterclockwise rotated by π4 .
We now determine the coordinates of P and the equation of r in terms of the new
coordinate system O  X  Y  . By using relation (11.30), we obtain the new coordinates
of P:

⎡ x' ⎤   ⎡  √2/2  √2/2 ⎤ ⎡  1 ⎤   ⎡ −2√2 ⎤
⎣ y' ⎦ = ⎣ −√2/2  √2/2 ⎦ ⎣ −5 ⎦ = ⎣ −3√2 ⎦ .

Moreover, by relation (11.31), we may replace indeterminates x, y in the equation


of r, in order to get the equation of r with respect to the new system O' X' Y' :

( (√2/2) x' − (√2/2) y' + 1 ) + ( (√2/2) x' + (√2/2) y' + 4 ) − 2 = 0,

that is,

√2 x' + 3 = 0.
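The computation of Example 11.55 is easy to reproduce numerically. The Python/NumPy sketch below (our own code, not from the text) applies relation (11.30) to P and checks the transformed equation of r on one of its points:

import numpy as np

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
O1 = np.array([1., 4.])                        # the new origin O'

P = np.array([2., -1.])
print(R.T @ (P - O1))                          # approx. [-2.828, -4.243] = (-2*sqrt(2), -3*sqrt(2))

Pr = np.array([0.5, 1.5])                      # a point of r : x + y - 2 = 0
xp, _ = R.T @ (Pr - O1)
print(np.isclose(np.sqrt(2) * xp + 3, 0.0))    # True: Pr satisfies sqrt(2) x' + 3 = 0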

11.5 Affine and Metric Classification of Quadrics

We dedicate the present section to the study of quadrics in affine and Euclidean
spaces. More precisely, we approach the study of the action of affine transformations
and isometries on quadrics. After a general analysis and classification of all the
potential cases, we will focus the attention on conic curves in FA2 and FE2 and
quadric surfaces in FA3 and FE3 from a purely geometrical point of view.
In order to head in the right direction, the first step must therefore be to introduce the framework within which the strategy for the classification of geometric loci is usually implemented. We start with some definitions:

Definition 11.56 Let S, S  be two subsets of an affine space A. We say that S and S 
are affinely equivalent if there exists a nonsingular affine transformation f : A → A
such that f (S) = S  .
The significance of the previously defined equivalence (sometimes called affinity) lies in the fact that a frame of reference {O, e1 , . . . , en } for A is mapped to a new frame of reference {O', e1', . . . , en'}. More in detail, if ei = OPi , then O' = f (O), Pi' = f (Pi ) and ei' = O'Pi' . Thus, any point P ∈ S is mapped to a point P' ∈ S' in such a way that P is represented in terms of {O, e1 , . . . , en } by the same coordinates which represent P' in terms of {O', e1', . . . , en'}.
Typically, one is interested in geometric properties invariant under certain transfor-
mations. In the case of affine transformations, we know that parallelism between lines,
planes and, more generally, between affine subspaces is preserved. Nevertheless, to
ensure the invariance of distances (and angles) between objects, the transformation
is required to be an isometry.

Definition 11.57 Let S, S  be two subsets of an affine Euclidean space E. We say


that S and S  are metrically equivalent or also congruent if there exists an isometry
f : E → E such that f (S) = S  .

On the basis of the definitions we have just mentioned, we now investigate the
question about how a subset S ⊂ A can be transformed into another S  ⊂ A by an
affine transformation in such a way that S  could represent the simplest form among
all the possible subsets of A that are affinely equivalent to S.
More specifically, here we examine the case when such a subset is a quadric.

Definition 11.58 Let A be an affine space associated with the vector space V over
the field F and assume dim F V = n. A quadric in A is a nonempty set Q of points
whose coordinates satisfy the equation p(x1 , . . . , xn ) = 0, where p is a quadratic
polynomial in the variables x1 , . . . , xn and with coefficients in F.

Note 11.59 In all that follows, we always assume that the characteristic of the men-
tioned field F is different from 2.

Starting from the definition of a quadric and collecting terms of the second, first and
zeroth degrees, the polynomial p can be written as

p(x1 , . . . , xn ) = Σ_{i=1}^{n} Σ_{j=1}^{n} ai j xi x j + 2 Σ_{k=1}^{n} ak,n+1 xk + an+1,n+1    (11.32)

for ai j = a ji ∈ F, for any i, j = 1, . . . , n + 1. The quadratic part of the polynomial


p(x1 , . . . , xn ) is the quadratic form

q(x1 , . . . , xn ) = Σ_{i=1}^{n} Σ_{j=1}^{n} ai j xi x j    (11.33)

and can be represented by the symmetric matrix

A = ⎡ a11 · · · a1n ⎤
    ⎢  ⋮    ⋱    ⋮  ⎥ .
    ⎣ a1n · · · ann ⎦

Hence, q(x1 , . . . , xn ) = X t AX , where X = [x1 , . . . , xn ]t .


If B = [2a1,n+1 , . . . , 2an,n+1 ]t then the equation p(x1 , . . . , xn ) = 0 can be written
in the following matrix notation:

X t AX + B t X + an+1,n+1 = 0. (11.34)

It is also easy to see that, if we set


Y = [x1 , . . . , xn , 1]t ,   v = (1/2) B   and   C = ⎡ A    v        ⎤
                                                       ⎣ vt   an+1,n+1 ⎦ ,

we may express Eq. (11.34) in the compact form

Y t CY = 0. (11.35)

A classical classification of quadrics is made in relation to the determinant of the matrix C:
Definition 11.60 A quadric of Eq. (11.35) is called non-degenerate if C is not sin-
gular, otherwise it is said to be degenerate.
However, the first step in order to observe quadrics from the right point of view is the
distinction between quadrics having a center and those that don’t. More precisely,
Definition 11.61 Let P ∈ A be a point and Q ⊂ A a quadric. Then P is called center
of the quadric if the set Q is invariant under the action of the central symmetry with
respect to the point P (see Example 11.34).
In this sense, we have the following result.

Theorem 11.62 Let Q be a quadric of Eq. (11.35), P a point of FAn having coordinate vector u ≡ [γ1 , . . . , γn ] in terms of a fixed coordinate system. The point P is a center of Q if and only if, for any i = 1, . . . , n, Σ_{j=1}^{n} ai j γ j + ai,n+1 = 0, that is, Au + v = 0.

Proof Let f : FAn → FAn be the central symmetry with respect to point P. By
applying f to the polynomial (11.32) representing Q, we obtain the polynomial
representing the set of points f (Q), that is,

f (p(x1 , . . . , xn )) = Σ_{i, j=1}^{n} ai j (2γi − xi )(2γ j − x j ) + 2 Σ_{i=1}^{n} ai,n+1 (2γi − xi ) + an+1,n+1
                        = 4 Σ_{i, j=1}^{n} ai j γi γ j − 4 Σ_{i, j=1}^{n} ai j xi γ j + Σ_{i, j=1}^{n} ai j xi x j
                          + 4 Σ_{i=1}^{n} ai,n+1 γi − 2 Σ_{i=1}^{n} ai,n+1 xi + an+1,n+1 .        (11.36)

Since P is a center of Q if and only if f (Q) = Q, in an equivalent manner, P is a center of Q if and only if f (p(x1 , . . . , xn )) = p(x1 , . . . , xn ). On the other hand, by comparing relations (11.36) and (11.32) we see that f (p(x1 , . . . , xn )) = p(x1 , . . . , xn ) holds only if

Σ_{i=1}^{n} ( Σ_{j=1}^{n} ai j γ j + ai,n+1 ) (xi − γi ) = 0.

Since the last identity holds only if any coefficient is identically zero, we get the
required conclusion.

Corollary 11.63 A quadric Q of Eq. (11.35) has a center if and only if rank(C) ≤
rank(A) + 1.

Proof If we assume that Q has a center, then Σ_{j=1}^{n} ai j γ j + ai,n+1 = 0 for each i = 1, . . . , n, where [γ1 , . . . , γn ] is the coordinate vector of the center. Hence, the system of linear equations
coming from the identity AX = −v has solutions, that is, rank(A) = rank(A|v).
Since rank(C) ≤ rank(A|v) + 1, the conclusion follows.
Conversely, if rank(C) ≤ rank(A) + 1 but we assume that Q has no center, then
we have to assert that AX = −v has no solution, that is, rank(A|v) = rank(A) + 1.
This should imply that the (n + 1)th column of C is linearly independent of the first
n column vectors of C. On the other hand, since C is symmetric, the last assertion
means that the (n + 1)th row of C is linearly independent of the first n row vectors
of C. Thus rank(C) = rank(A|v) + 1 = rank(A) + 2, which is a contradiction.

Theorem 11.64 Let Q be a quadric of A represented by Eq. (11.34). If the matrix


A is nonsingular then Q has a center. Moreover, it is unique.

Proof By Theorem 11.62, a point P ∈ A is a center of Q if and only if its coordinate vector X0 ≡ [γ1 , . . . , γn ] is a solution of the linear system AX + v = 0, where X = [x1 , . . . , xn ]t . However, since by our assumption the matrix A has rank equal to n, the system admits one and only one solution, that is, the center exists and it is unique.
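Computationally, Theorems 11.62 and 11.64 say that a center is any solution of the linear system Au = −v, unique when A is nonsingular. A minimal Python/NumPy sketch (the sample conic x2 + 2y2 − 2x + 8y + 1 = 0 is our own example, not from the text):

import numpy as np

# x^2 + 2y^2 - 2x + 8y + 1 = 0: quadratic part A, and v = (1/2) B from (11.34)
A = np.array([[1., 0.],
              [0., 2.]])
v = np.array([-1., 4.])

if np.linalg.matrix_rank(A) == A.shape[0]:
    u = np.linalg.solve(A, -v)      # A u + v = 0, the condition of Theorem 11.62
    print(u)                        # [ 1. -2.]: the unique center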
These fruitful discussions allow us to gain the first overview of the distinction between
different categories of quadrics in relation to a possible existence of a center.
Let us now investigate the following question: into what simplest form can the
equation of a quadric be written in terms of a suitable choice of a frame of reference
of the n-dimensional affine space An ? The answer to this question is a consequence
of the solution of the related problem regarding the establishment of appropriate
conditions under which two quadrics can be transformed into each other by an affine
transformation. Since a quadric is represented by a polynomial of degree 2, we may
consider whether two quadratic polynomials are affinely (or metrically) equivalent,
in the sense of the following definition:
Definition 11.65 Let p1 (x1 , . . . , xn ) and p2 (x1 , . . . , xn ) be two distinct polynomials with coefficients in a field F. We say that p1 and p2 are affinely (or metrically) equivalent if there exists an affine transformation (respectively, an isometry) f : FAn → FAn such that f (p1 (x1 , . . . , xn )) = p2 (x1 , . . . , xn ).
The quadrics Q1 and Q2 , represented by two affinely (or metrically) equivalent polynomials, are said to be affinely (or metrically) equivalent.
Here we extend the argument previously presented in Theorem 11.43 to the n-
dimensional case. Hence, consider the affine transformation f of A, described by
f (X ) = M X + c, where M is an invertible matrix of Mn (F) and c is a fixed point
of A. The image of Q is computed by the substitution

X = M −1 ( f (X ) − c) = H X' + d    (11.37)

in the relation (11.34), where H = M −1 and d = −M −1 c = −H c. Therefore, f (Q)


consists of all points whose coordinates X  are the solution of the equation

(H X  + d)t A(H X  + d) + B t (H X  + d) + an+1,n+1 = 0,

that is,

X't (H t AH )X' + (d t AH + d t AH + B t H )X' + (d t Ad + B t d + an+1,n+1 ) = 0.    (11.38)

Therefore, under the notation A' = H t AH , B' = (d t AH + d t AH + B t H )t and a'n+1,n+1 = d t Ad + B t d + an+1,n+1 in (11.38), f (Q) is represented by the equation

X't A' X' + B't X' + a'n+1,n+1 = 0

implying that f (Q) is again a quadric of A. Since the relation (11.37) can also be obtained by

⎡ x1 ⎤   ⎡        ⎤ ⎡ x1' ⎤
⎢  ⋮ ⎥   ⎢ H    d ⎥ ⎢  ⋮  ⎥
⎢ xn ⎥ = ⎢        ⎥ ⎢ xn' ⎥ ,   for 0 = [0, . . . , 0]t (n times),    (11.39)
⎣ 1  ⎦   ⎣ 0t   1 ⎦ ⎣ 1   ⎦
the substitution (11.39) in (11.35) gives

⎛ ⎡ H  d ⎤    ⎞t     ⎡ H  d ⎤
⎜ ⎣ 0t 1 ⎦ Y' ⎟   C  ⎣ 0t 1 ⎦ Y' = 0,
⎝             ⎠

that is,

Y't ⎡ Ht 0 ⎤ C ⎡ H  d ⎤ Y' = 0.
    ⎣ dt 1 ⎦   ⎣ 0t 1 ⎦

This means that the matrix representing f (Q) is

C' = ⎡ Ht 0 ⎤ C ⎡ H  d ⎤ = ⎡ H  d ⎤t C ⎡ H  d ⎤ .
     ⎣ dt 1 ⎦   ⎣ 0t 1 ⎦   ⎣ 0t 1 ⎦    ⎣ 0t 1 ⎦

Thus, using the terminology introduced in Definition 11.56, we may assert that
(1) The matrices A, A , representing the quadratic parts of polynomials defining
two affinely equivalent quadrics, are congruent.
(2) The matrices C, C  , representing the entire polynomials defining two affinely
equivalent quadrics, are congruent.
One of the main properties of congruent matrices is that they have the same rank.
For this reason, in the further discussions we refer to the ranks of A and C in order
to indicate at the same time the ranks of A' and C' , respectively.

If we consider the case when Q is a quadric of an affine Euclidean space E, then the
affine transformation acting on the points of Q can be described by f (X ) = M X + c,
where M is an invertible orthogonal matrix of Mn (F) and c is a fixed point of E. So,
following Definition 11.57, we say that
(1) The matrices A, A , representing the quadratic parts of polynomials defining
two metrically equivalent quadrics, are congruent.
(2) The matrices C, C  , representing the entire polynomials defining two metrically
equivalent quadrics, are congruent.
Fixing a quadric Q, we pay special attention to establishing what is a suitable choice
of a frame of reference of the n-dimensional affine space A, in terms of which the
equation of Q can be written in the simplest form. We divide our argument into two
main cases.

The Affine Classification for Quadrics with Center


We start from Eq. (11.34) for v = (1/2) B, that is,

X t AX + 2vt X + an+1,n+1 = 0. (11.40)

By assuming that Q has a center, and by Theorem 11.62, there exists a vector u ∈ Fn such that Au + v = 0. Looking at relation (11.37), and since A is symmetric, we may choose H ∈ Mn (F) such that A' = H t AH is a diagonal matrix. Hence, for X = H X' + u in (11.40), the polynomial representing Q in the new coordinate system is equal to

(H X' + u)t A(H X' + u) + 2vt (H X' + u) + an+1,n+1 = 0,

that is,

X't A' X' + e = 0    (11.41)

where e = an+1,n+1 + vt u. The complete matrix associated with the quadric is

C' = ⎡ A'  0 ⎤
     ⎣ 0t  e ⎦

and the polynomial p of Q may assume two different forms.


In case e ≠ 0, then r = rank(A) < rank(C) (so that r = rank(A') < rank(C')), meaning that rank(C) = r + 1. The polynomial is

p(x1 , . . . , xn ) = Σ_{i=1}^{r} αi xi2 + e    (11.42)

and, by the following final substitution (affine transformation)

xi = √(|e| / |αi |) xi'   for i = 1, . . . , r,      xi = xi'   for i = r + 1, . . . , n,    (11.43)

Equation (11.42) reduces to one of

p(x1 , . . . , xn ) = Σ_{i=1}^{h} xi2 − Σ_{j=h+1}^{r} x j2 + 1    (11.44)

or

p(x1 , . . . , xn ) = Σ_{i=1}^{h} xi2 − Σ_{j=h+1}^{r} x j2 − 1.    (11.45)
i=1 j=h+1

If e = 0, then r = rank(A) = rank(C) (so that r = rank(A') = rank(C')) and the polynomial is

p(x1 , . . . , xn ) = Σ_{i=1}^{r} αi xi2 .    (11.46)

Here, we apply the substitution

xi = (1/√|αi |) xi'   for i = 1, . . . , r,      xi = xi'   for i = r + 1, . . . , n,

and (11.46) reduces to

p(x1 , . . . , xn ) = Σ_{i=1}^{h} xi2 − Σ_{j=h+1}^{r} x j2 .    (11.47)

Notice that we have renumbered the coordinates in the polynomial p in such a way that the first h squares appear with positive sign and the remaining r − h with negative sign. In further discussions, we will do this again without explicit mention. These replacements are just substitutions of variables, that is, affine transformations.
We then conclude that any quadric with center in the n-dimensional affine space
A is affinely equivalent to one of the form (11.44), (11.45) or (11.47). In detail, we
have
(i) For r = n, polynomials of the form (11.44) or (11.45) represent a non-
degenerate quadric. We say that Q is an Ellipsoid in the cases h = 0 or h = r ;
we say that Q is a Hyperboloid in case 1 ≤ h ≤ r − 1.
(ii) For r ≤ n − 1, polynomials of the form (11.44) or (11.45) represent a degen-
erate quadric. We say that Q is a non-parabolic Cylinder.
(iii) Polynomials of the form (11.47) represent a degenerate quadric. We say that
Q is a Cone.
Remark 11.66 Some particular cases are the following:
(i) In case F = R and for h = 0 in (11.45), Q is the empty set.
(ii) In case F = R and for h = r in (11.44), Q is the empty set.
(iii) For h = 0 or h = r in (11.47), Q is an affine subspace of An .
Example 11.67 In the 3-dimensional affine space RA3 , let

x 2 + 4x y + 2x + y 2 + 2yz + 2z + 1 = 0

be the equation representing a quadric Q. The matrices associated with Q are

A = ⎡ 1 2 0 ⎤        C = ⎡ 1 2 0 1 ⎤
    ⎢ 2 1 1 ⎥            ⎢ 2 1 1 0 ⎥
    ⎣ 0 1 0 ⎦            ⎢ 0 1 0 1 ⎥ .
                         ⎣ 1 0 1 1 ⎦

Hence, Q can be represented in the following matrix notations:

[x y z] ⎡ 1 2 0 ⎤ ⎡ x ⎤ + 2 [1 0 1] ⎡ x ⎤ + 1 = 0
        ⎢ 2 1 1 ⎥ ⎢ y ⎥            ⎢ y ⎥
        ⎣ 0 1 0 ⎦ ⎣ z ⎦            ⎣ z ⎦

and

[x y z 1] ⎡ 1 2 0 1 ⎤ ⎡ x ⎤
          ⎢ 2 1 1 0 ⎥ ⎢ y ⎥ = 0.
          ⎢ 0 1 0 1 ⎥ ⎢ z ⎥
          ⎣ 1 0 1 1 ⎦ ⎣ 1 ⎦

Notice that |C| ≠ 0 and AX + v = 0 has a solution, thus Q is a non-degenerate quadric having exactly one center of symmetry. Solving the system

⎡ 1 2 0 ⎤ ⎡ x ⎤   ⎡ 1 ⎤   ⎡ 0 ⎤
⎢ 2 1 1 ⎥ ⎢ y ⎥ + ⎢ 0 ⎥ = ⎢ 0 ⎥ ,
⎣ 0 1 0 ⎦ ⎣ z ⎦   ⎣ 1 ⎦   ⎣ 0 ⎦

we find the coordinates of the center (1, −1, −1). Looking at the symmetric matrix A, by the process of diagonalization of a bilinear symmetric form, we may obtain H ∈ Mn (R) such that A' = H t AH is a diagonal matrix. The implementation of the standard process leads us to

H = ⎡ 1 −2 −2/3 ⎤
    ⎢ 0  1  1/3 ⎥ .
    ⎣ 0  0   1  ⎦

Performing the transformation

⎡ x ⎤   ⎡ 1 −2 −2/3 ⎤ ⎡ x' ⎤   ⎡  1 ⎤
⎢ y ⎥ = ⎢ 0  1  1/3 ⎥ ⎢ y' ⎥ + ⎢ −1 ⎥ ,
⎣ z ⎦   ⎣ 0  0   1  ⎦ ⎣ z' ⎦   ⎣ −1 ⎦

we have that the equation representing Q in the new coordinate system is

x'2 − 3y'2 + (1/3) z'2 + 1 = 0.
Finally, for

x' = X ,    y' = (1/√3) Y ,    z' = √3 Z ,

we obtain the following affine canonical form of Q :

X 2 − Y 2 + Z 2 + 1 = 0,

which is a hyperboloid.
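The reduction of Example 11.67 can be verified numerically. The Python/NumPy check below (our own code, not from the text) recomputes the center, the diagonal matrix H t A H and the constant e = a44 + vt u:

import numpy as np

A = np.array([[1., 2., 0.],
              [2., 1., 1.],
              [0., 1., 0.]])
v = np.array([1., 0., 1.])
a44 = 1.0
H = np.array([[1., -2., -2/3],
              [0.,  1.,  1/3],
              [0.,  0.,  1. ]])

u = np.linalg.solve(A, -v)
print(u)                         # [ 1. -1. -1.]: the center
print(H.T @ A @ H)               # diag(1, -3, 1/3)
print(a44 + v @ u)               # e = 1.0, so Q reduces to x'^2 - 3y'^2 + (1/3)z'^2 + 1 = 0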
The Affine Classification for Quadrics Without Center


Starting again from Eq. (11.40), we now assume rank(A) < rank(A|v), that is,
AX + v = 0 has no solution. Here, we denote by X the generic coordinate vector in
terms of a fixed basis for Fn . In this case, the quadric Q has no center. Consider the
following subspace of Fn

N = {X ∈ Fn : vt X = 0}.

The dimension of N is equal to n − 1. Let {e1 , . . . , en−1 } be a basis for N and


en ∈ Fn be a vector such that etn Aei = 0, for any i = 1, . . . , n − 1. Let us extend
{e1 , . . . , en−1 } to the basis {e1 , . . . , en } for Fn . In order to get the equation of Q in
the frame of reference {O, e1 , . . . , en }, we have to perform the affine transformation
X = E X  , where E is the transition matrix of {e1 , . . . , en } relative to the fixed basis
and obviously whose column vectors are precisely e1 , . . . , en . Substitution of X =
E X  in (11.40) gives

X t E t AE X  + 2vt E X  + an+1,n+1 = 0 (11.48)

where
(i) the nth column and nth row of the matrix E t AE are zero, except possibly the (n, n)-entry;
(ii) the product vt E X' is equal to αn xn' , for some αn ∈ F.
Thus, the polynomial representing Q with respect to the basis {e1 , . . . , en } can be written as

p(x1 , . . . , xn ) = X't A' X' + 2αn xn' + an+1,n+1    (11.49)

where

A' = ⎡ Ã   0    ⎤
     ⎣ 0t  a'nn ⎦ .

Notice that, since Q has no center, the coefficient αn in (11.49) is not zero. Moreover
à ∈ Mn−1 (F) is symmetric, so that there exists a suitable basis {c1 , . . . , cn−1 } for N
in terms of which the bilinear symmetric form associated with à is represented by
a diagonal matrix; in other words, there exists a nonsingular matrix D ∈ Mn−1 (F)
such that A'' = D t ÃD is a diagonal matrix, i.e.,

A'' = D t ÃD = diag(α1 , . . . , αr , 0, . . . , 0),    0 ≠ αi ∈ F,   r = rank( Ã).
We now observe that introducing the matrix

H = ⎡ D   0 ⎤
    ⎣ 0t  1 ⎦ ,

an easy computation gives

H t A' H = ⎡ A''  0    ⎤
           ⎣ 0t   a'nn ⎦ .

Hence, applying the affine transformation X' = H X'' to the polynomial (11.49), we get

p(x1 , . . . , xn ) = X t (H t A' H )X + 2αn xn + an+1,n+1
                    = Σ_{i=1}^{r} αi xi2 + a'nn xn2 + 2αn xn + an+1,n+1 .    (11.50)


Looking at (11.50), assume firstly that a'nn ≠ 0. In this case, and since

a'nn xn2 + 2αn xn + an+1,n+1 = a'nn ( xn + αn /a'nn )2 + an+1,n+1 − αn2 /a'nn ,

we may write (11.50) as

p(x1 , . . . , xn ) = Σ_{i=1}^{r} αi xi2 + a'nn ( xn + αn /a'nn )2 + a'    (11.51)

for a' = an+1,n+1 − αn2 /a'nn .
If we now apply to (11.51) the translation

xi = xi'   for i = 1, . . . , n − 1,      xn = xn' − αn /a'nn ,

the polynomial representing Q assumes the form

Σ_{i=1}^{r} αi xi'2 + a'nn xn'2 + a' ,

which would give a quadric with center. This contradiction proves that the coefficient a'nn of polynomial (11.50) must be zero. Hence,

p(x1 , . . . , xn ) = Σ_{i=1}^{r} αi xi2 + 2αn xn + an+1,n+1 .    (11.52)

Finally, starting from (11.52) and by performing the substitution

xi = (1/√|αi |) xi'   for i = 1, . . . , r,      xn = (2αn )−1 xn' − an+1,n+1 /(2αn ) ,

the polynomial of Q is

p(x1 , . . . , xn ) = Σ_{i=1}^{h} xi2 − Σ_{j=h+1}^{r} x j2 + xn .    (11.53)

Hence, any quadric without a center in the n-dimensional affine space A is affinely
equivalent to one of the form (11.53).
Then, for a quadric without center, we have that rank(A) = r ≤ n − 1 and
rank(C) = r + 2. In particular,
(i) for r = n − 1, the quadric is non-degenerate and here we say that Q is a
Paraboloid;
(ii) for r ≤ n − 2, the quadric is degenerate and here we say that Q is a parabolic
Cylinder.
We can sum up what we have done saying that for any quadric Q in the n-dimensional
affine space An , over an arbitrary field F of characteristic different from 2, there is a
suitable frame of reference for An , in terms of which Q is specified by a particularly
simple equation, called canonical equation or canonical form for Q. Any possible
canonical equation of Q is associated with a polynomial of the form (11.44), (11.45),
(11.47) or (11.53). In particular,

(i) polynomials Σ_{i=1}^{n} xi2 + 1 and Σ_{i=1}^{n} xi2 − 1 represent the canonical form of an Ellipsoid (it is a non-degenerate quadric with exactly one center);
(ii) polynomials Σ_{i=1}^{h} xi2 − Σ_{i=h+1}^{n} xi2 + 1 and Σ_{i=1}^{h} xi2 − Σ_{i=h+1}^{n} xi2 − 1 (h ≠ 0, n) represent the canonical form of a Hyperboloid (once again, it is a non-degenerate quadric with exactly one center);
(iii) polynomials Σ_{i=1}^{h} xi2 − Σ_{i=h+1}^{r} xi2 + 1 and Σ_{i=1}^{h} xi2 − Σ_{i=h+1}^{r} xi2 − 1 (r ≤ n − 1) represent the canonical form of a non-parabolic Cylinder (it is a degenerate quadric with an infinite number of centers);
(iv) polynomials Σ_{i=1}^{h} xi2 − Σ_{i=h+1}^{r} xi2 (r ≤ n) represent a Cone (it is a degenerate quadric having exactly one center);
(v) polynomial Σ_{i=1}^{n−1} xi2 + xn represents a Paraboloid (it is a non-degenerate quadric without center);
(vi) polynomials Σ_{i=1}^{r} xi2 + xn (r ≤ n − 2) represent a parabolic Cylinder (it is a degenerate quadric without center).
Example 11.68 Let Q be the quadric in RA3 having equation

x 2 + 2x y + 2x z + 4x + 2yz + z 2 + 2z + 1 = 0.

The matrices associated with Q are

A = ⎡ 1 1 1 ⎤        C = ⎡ 1 1 1 2 ⎤
    ⎢ 1 0 1 ⎥            ⎢ 1 0 1 0 ⎥
    ⎣ 1 1 1 ⎦            ⎢ 1 1 1 1 ⎥ .
                         ⎣ 2 0 1 1 ⎦

Hence, the compact forms representing Q are

[x y z] ⎡ 1 1 1 ⎤ ⎡ x ⎤ + 2 [2 0 1] ⎡ x ⎤ + 1 = 0
        ⎢ 1 0 1 ⎥ ⎢ y ⎥            ⎢ y ⎥
        ⎣ 1 1 1 ⎦ ⎣ z ⎦            ⎣ z ⎦

and

[x y z 1] ⎡ 1 1 1 2 ⎤ ⎡ x ⎤
          ⎢ 1 0 1 0 ⎥ ⎢ y ⎥ = 0.
          ⎢ 1 1 1 1 ⎥ ⎢ z ⎥
          ⎣ 2 0 1 1 ⎦ ⎣ 1 ⎦

Notice that |C| ≠ 0 and AX + v = 0 has no solution. In particular, rank(A) = 2 and rank(C) = 4, so that Q is a non-degenerate quadric without any center of symmetry. Consider the subspace

N = {X ∈ R3 : [2, 0, 1]X = 0}.

The dimension of N is equal to 2 and N = ⟨(1, 0, −2), (0, 1, 0)⟩. We extend this basis of N to one of R3 , by adding the vector (1, 0, −1), which is A-orthogonal to both vectors generating N .
We now perform the affine transformation X = E X  , where E is the transition
matrix of {(1, 0, −2), (0, 1, 0), (1, 0, −1)} relative to the standard basis of R3 and
obviously whose column vectors are precisely (1, 0, −2), (0, 1, 0), (1, 0, −1). As a
result, we have
x = x' + z' ,    y = y' ,    z = −2x' − z' .

Substitution of X = E X' leads to the following transformation of the matrices associated with the quadric:

A' = E t AE = ⎡ 1 0 −2 ⎤     ⎡  1 0  1 ⎤   ⎡  1 −1 0 ⎤
              ⎢ 0 1  0 ⎥  A  ⎢  0 1  0 ⎥ = ⎢ −1  0 0 ⎥
              ⎣ 1 0 −1 ⎦     ⎣ −2 0 −1 ⎦   ⎣  0  0 0 ⎦

and

C' = ⎡ Et 0 ⎤ C ⎡ E  0 ⎤ = ⎡  1 −1 0 0 ⎤
     ⎣ 0t 1 ⎦   ⎣ 0t 1 ⎦   ⎢ −1  0 0 0 ⎥   (its upper-left 3 × 3 block being A' ).
                           ⎢  0  0 0 1 ⎥
                           ⎣  0  0 1 1 ⎦

By the standard process of diagonalization for bilinear forms, we find the basis {(1, 0, 0), (1, 1, 0), (0, 0, 1)} in terms of which the bilinear symmetric form associated with A' is represented by a diagonal matrix. More precisely, if we denote by D the transition matrix of {(1, 0, 0), (1, 1, 0), (0, 0, 1)} relative to the standard basis of R3 , which has column vectors (1, 0, 0), (1, 1, 0), (0, 0, 1), then we get

A'' = D t A' D = ⎡ 1 0 0 ⎤      ⎡ 1 1 0 ⎤   ⎡ 1  0 0 ⎤
                 ⎢ 1 1 0 ⎥  A'  ⎢ 0 1 0 ⎥ = ⎢ 0 −1 0 ⎥ .
                 ⎣ 0 0 1 ⎦      ⎣ 0 0 1 ⎦   ⎣ 0  0 0 ⎦

Now, introducing the matrix

H = ⎡ D   0 ⎤
    ⎣ 0t  1 ⎦ ,

we get

H t C' H = ⎡ 1  0 0 0 ⎤
           ⎢ 0 −1 0 0 ⎥ .
           ⎢ 0  0 0 1 ⎥
           ⎣ 0  0 1 1 ⎦

The quadric is then represented by the equation x'2 − y'2 + 2z' + 1 = 0, which is a paraboloid. Finally, by the transformation

x' = X ,    y' = Y ,    z' = Z/2 − 1/2 ,

we obtain the affine canonical form of the equation representing Q, that is, X 2 − Y 2 + Z = 0, which is a paraboloid.
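Again, the main steps of Example 11.68 can be confirmed numerically. The Python/NumPy sketch below (our own code, not from the text) checks the ranks that identify a paraboloid and the two congruence transformations used above:

import numpy as np

A = np.array([[1., 1., 1.],
              [1., 0., 1.],
              [1., 1., 1.]])
C = np.array([[1., 1., 1., 2.],
              [1., 0., 1., 0.],
              [1., 1., 1., 1.],
              [2., 0., 1., 1.]])
E = np.array([[ 1., 0.,  1.],
              [ 0., 1.,  0.],
              [-2., 0., -1.]])
D = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])

print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(C))   # 2 4: no center, non-degenerate
print(E.T @ A @ E)                   # [[1,-1,0],[-1,0,0],[0,0,0]]
print(D.T @ E.T @ A @ E @ D)         # diag(1,-1,0): the equation becomes x'^2 - y'^2 + 2z' + 1 = 0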

At this point, we wonder what we can say in case F is an algebraically closed field (for instance, F = C). We are going to solve the same problem as above by means of successive affine transformations; after each one, the equation of the quadric is reformulated, simplifying the previous one. One may simply repeat the arguments previously presented, with appropriate amendments in some steps. More precisely, for
quadrics with center, we may start from the polynomial of the form (11.42):

p(x1 , . . . , xn ) = Σ_{i=1}^{r} αi xi2 + e.

If e ≠ 0, we perform the substitution

xi = √(e/αi ) xi'   for i = 1, . . . , r,      xi = xi'   for i = r + 1, . . . , n,

so that (11.42) reduces to

p(x1 , . . . , xn ) = Σ_{i=1}^{r} xi2 + 1.    (11.54)

If e = 0, we apply the following change of coordinates:

xi = (1/√αi ) xi'   for i = 1, . . . , r,      xi = xi'   for i = r + 1, . . . , n,

and the polynomial reduces to

p(x1 , . . . , xn ) = Σ_{i=1}^{r} xi2 .    (11.55)

In the case of quadrics without center, we look at the polynomial of the form (11.52):

p(x1 , . . . , xn ) = Σ_{i=1}^{r} αi xi2 + 2αn xn + an+1,n+1 ,

and by performing the substitution

xi = (1/√αi ) xi'   for i = 1, . . . , r,      xn = (2αn )−1 xn' − an+1,n+1 /(2αn ) ,

the polynomial of Q is

p(x1 , . . . , xn ) = Σ_{i=1}^{r} xi2 + xn .    (11.56)

Therefore, we conclude that an arbitrary set in an n-dimensional affine space over an algebraically closed field, given by equating a second-degree polynomial in n variables to zero, is affinely equivalent to one of the sets defined by equating to zero one of the polynomials (11.54), (11.55) and (11.56). In particular,

(i) polynomial Σ_{i=1}^{n} xi2 + 1 represents the canonical form of an Ellipsoid;
(ii) polynomials Σ_{i=1}^{r} xi2 + 1 (r ≤ n − 1) represent the canonical form of a non-parabolic Cylinder;
(iii) polynomials Σ_{i=1}^{r} xi2 (r ≤ n) represent a Cone;
(iv) polynomial Σ_{i=1}^{n−1} xi2 + xn represents a Paraboloid;
(v) polynomials Σ_{i=1}^{r} xi2 + xn (r ≤ n − 2) represent a parabolic Cylinder.

Remark 11.69 We notice that the canonical equation of a degenerate quadric con-
tains less than n variables.

We would like to dedicate a brief interlude to a notable aspect of the theory of quadrics, due to a natural pathology of quadratic polynomials in n indeterminates: in some cases, they can split as a product of two linear polynomials. When this occurs, it is worth understanding what happens to the quadrics represented by such polynomials.

Definition 11.70 A quadric Q ⊂ An is said to be reducible if its representing


polynomial p(x1 , . . . , xn ) can be reduced as a product of two linear polynomials
h 1 (x1 , . . . , xn ) and h 2 (x1 , . . . , xn ). Each of such polynomials defines a hyperplane
in An , namely

H1 : h 1 (x1 , . . . , xn ) = 0 H2 : h 2 (x1 , . . . , xn ) = 0.

We write Q = H1 ∪ H2 , meaning that the subset Q consists of all points from H1


and all points from H2 .

In order to characterize reducible quadrics we fix the following.


Theorem 11.71 Let Q ⊂ An be a quadric represented by equation Y t CY = 0, as
specified in (11.35). If Q is not the empty set, then it is reducible if and only if
rank(C) ≤ 2.

Proof Assume firstly that Q is reducible and hyperplanes H1 and H2 are represented
by the following equations:

H1 : α1 x1 + · · · + αn xn + αn+1 = 0 H2 : β1 x1 + · · · + βn xn + βn+1 = 0,

for some αi , βi ∈ F. By an appropriate affinity f : An → An , we may map H1 in


such a way that the equation representing its image f (H1 ) is exactly x1 = 0. After
performing the same affinity on H2 , we denote by γ1 x1 + · · · + γn xn + γn+1 = 0 the equation representing f (H2 ).
The image f (Q) consists precisely of all points from f (H1 ) and all points from f (H2 ). Then its representing polynomial comes from the product of the polynomials representing the hyperplanes f (H1 ) and f (H2 ), that is,

γ1 x12 + γ2 x1 x2 + · · · + γn x1 xn + γn+1 x1 = 0.

Hence, the matrix associated with f (Q) is

C'' = ⎡ γ1       γ2 /2  · · ·  γn /2   γn+1 /2 ⎤
      ⎢ γ2 /2      0    · · ·    0        0    ⎥
      ⎢   ⋮        ⋮             ⋮        ⋮    ⎥
      ⎢ γn /2      0    · · ·    0        0    ⎥
      ⎣ γn+1 /2    0    · · ·    0        0    ⎦

whose rank is ≤ 2, and so is the rank of C, which is congruent to C'' .


Conversely, suppose that rank(C) ≤ 2 and consider now the canonical form of
quadric Q. In its simplest form, the quadric is described by a matrix C  having the
same rank of C. Taking a look at all possible canonical forms of Q, that is,
(i) Equations (11.44), (11.45), (11.47) and (11.53) in case F is not algebraically
closed and
(ii) Equations (11.54), (11.55) and (11.56), in case F is algebraically closed,
it is clear that the only ones admitting rank ≤ 2 are
(i) polynomials xi2 + 1 and xi2 − 1, for some index 1 ≤ i ≤ n;
(ii) polynomials xi2 − x 2j and xi2 + x 2j , for some indices i = j and 1 ≤ i, j ≤ n;
(iii) polynomials xi2 , for some index 1 ≤ i ≤ n.
Initially, we consider polynomials xi2 + 1 and xi2 + x 2j when F is not algebraically
closed. In both cases, f (Q) should be the empty set, which contradicts the fact that
Q is not.
In any other case, the corresponding polynomial is always reducible in the product
of two linear polynomials, which are distinct, merged, real or also, in case F is alge-
braically closed, complex conjugate. These polynomials represent two hyperplanes
K 1 , K 2 in An , whose union is f (Q), namely f (Q) = K 1 ∪ K 2 . Therefore, f (Q)
consists of all points from K 1 and all points from K 2 . Since any affine transformation
maps hyperplanes to hyperplanes (preserving the direction), we conclude that Q is
the union of the hyperplanes f −1 (K 1 ) and f −1 (K 2 ).

All that has just been said can be applied to the case of quadrics in A2 and A3 , in
order to obtain the well known affine classification of conic curves in the affine plane
and the affine classification of quadric surfaces in the 3-dimensional space. These
are nothing more than reduced cases of the ones previously discussed.
Thus, in the case of conics in the plane, we may simply assert the following.
Theorem 11.72 Let RA2 be the 2-dimensional affine space over the vector space
R2 . Any conic curve is affinely equivalent to one of the sets defined by the following
equations:
(i) x 2 + y 2 − 1 = 0, non-degenerate real ellipse;
(ii) x 2 − y 2 − 1 = 0, non-degenerate hyperbola;
(iii) x 2 − y = 0, non-degenerate parabola;
(iv) x 2 + y 2 + 1 = 0, empty set (non-degenerate imaginary ellipse);
(v) x 2 − y 2 = 0, two secant real lines;
(vi) x 2 + y 2 = 0, one real point (two complex conjugate lines);
(vii) x 2 − 1 = 0, two real parallel lines;
(viii) x 2 + 1 = 0, empty set (two complex parallel lines);
(i x) x 2 = 0, two real merged lines.
In the case of complex spaces, we have the following.

Theorem 11.73 Let CA2 be the 2-dimensional affine space over the vector space
C2 . Any conic curve is affinely equivalent to one of the sets defined by the following
equations:
(i) x 2 + y 2 + 1 = 0, non-degenerate ellipse;
(ii) x 2 − y = 0, non-degenerate parabola;
(iii) x 2 + y 2 = 0, two complex conjugate secant lines;
(iv) x 2 + 1 = 0, two complex parallel lines;
(v) x 2 = 0, two real merged lines.

Shifting the focus to quadric surfaces in real affine space, we have the following.
Theorem 11.74 Let RA3 be the 3-dimensional affine space over the vector space R3 .
Any quadric surface is affinely equivalent to one of the sets defined by the following
equations:
(i) x 2 + y 2 + z 2 − 1 = 0, real ellipsoid;
(ii) x 2 + y 2 + z 2 + 1 = 0, empty set (non-degenerate imaginary ellipsoid);
(iii) x 2 + y 2 − z 2 + 1 = 0, elliptic hyperboloid;
(iv) x 2 + y 2 − z 2 − 1 = 0, hyperbolic hyperboloid;
(v) x 2 + y 2 − z = 0, elliptic paraboloid;
(vi) x 2 − y 2 − z = 0, hyperbolic paraboloid;
(vii) x 2 + y 2 + z 2 = 0, one real point (imaginary cone);
(viii) x 2 + y 2 − z 2 = 0, real cone;
(i x) x 2 + y 2 + 1 = 0, empty set (imaginary cylinder);
(x) x 2 + y 2 − 1 = 0, right circular cylinder;
(xi) x 2 − y = 0, parabolic cylinder;
(xii) x 2 − y 2 − 1 = 0, hyperbolic cylinder;
(xiii) x 2 + y 2 = 0, a line (two secant complex planes);


(xiv) x 2 − y 2 = 0, two secant real planes;
(xv) x 2 + 1 = 0, empty set (two complex parallel planes);
(xvi) x 2 − 1 = 0, two real parallel planes;
(xvii) x 2 = 0, two merged real planes.
And finally, in the case of complex spaces, we have the following.

Theorem 11.75 Let CA3 be the 3-dimensional affine space over the vector space C3 .
Any quadric surface is affinely equivalent to one of the sets defined by the following
equations:
(i) x 2 + y 2 + z 2 − 1 = 0, ellipsoid;
(ii) x 2 + y 2 − z = 0, paraboloid;
(iii) x 2 + y 2 + z 2 = 0, cone;
(iv) x 2 + y 2 + 1 = 0, elliptic cylinder;
(v) x 2 − y = 0, parabolic cylinder;
(vi) x 2 + y 2 = 0, two secant planes;
(vii) x 2 + 1 = 0, two parallel planes;
(viii) x 2 = 0, two merged planes.

The classification of quadrics in an affine Euclidean space REn up to metric equiva-


lence uses precisely the same arguments as those used in the case of the affine space
FEn . Since isometries are affine transformations, we may apply the results obtained
in the affine space to the Euclidean case. We start by considering quadrics given by
Eq. (11.34) for v = 21 B, that is,

X t AX + 2vt X + an+1,n+1 = 0. (11.57)

The Metric Classification for Quadrics with Center


This is the case when rank(A) = rank(A|v). Let u ∈ Fn be such that Au + v = 0.
We use the same transformation X = HX′ + u as in the affine case. But in the present case H ∈ Mn(F) is an orthogonal matrix, such that A′ = HᵗAH is a diagonal matrix.
Thus, we reduce to the equation

X′ᵗ A′ X′ + e = 0                    (11.58)

where A′ has its eigenvalues on the main diagonal. The polynomial associated with the quadric is

p(x₁, . . . , xₙ) = Σᵢ₌₁ʳ αᵢxᵢ² + e                    (11.59)

where α₁, . . . , αᵣ are the nonzero eigenvalues of A. In case e ≠ 0, we divide p(x₁, . . . , xₙ) by ±e, according as e > 0 or e < 0, and obtain the form

p′(x₁, . . . , xₙ) = Σᵢ₌₁ʳ αᵢ′xᵢ² + 1                    (11.60)

where αᵢ′ = αᵢ/e, for any i = 1, . . . , r. Notice that the last affine transformation (11.43), previously used in the affine case, cannot be applied in the present Euclidean space, because it is not orthogonal.
We then conclude that any quadric with center in the n-dimensional affine
Euclidean space REn is metrically equivalent to one of the following forms:

Σᵢ₌₁ʳ αᵢxᵢ² + 1                    (11.61)

Σᵢ₌₁ʳ αᵢxᵢ².                       (11.62)

Example 11.76 Let Q be the quadric in FE3 , where F = R or C having equation

2x 2 + 2x y + 2x + 2y 2 + 4y + z 2 + 2 = 0.

The matrices associated with Q are


        ⎡ 2 1 0 ⎤          ⎡ 2 1 0 1 ⎤
   A =  ⎢ 1 2 0 ⎥     C =  ⎢ 1 2 0 2 ⎥ .
        ⎣ 0 0 1 ⎦          ⎢ 0 0 1 0 ⎥
                           ⎣ 1 2 0 2 ⎦

Then rank(A) = rank(C) = 3 and Q is a degenerate quadric having exactly one


center of symmetry and which is obviously a cone.
Solving the system
⎡ ⎤⎡ ⎤ ⎡ ⎤ ⎡ ⎤
210 x1 1 0
⎣ 1 2 0 ⎦ ⎣ x2 ⎦ + ⎣ 2 ⎦ = ⎣ 0 ⎦ ,
001 x3 0 0

we find the coordinates (0, −1, 0) of the center. To implement the transformation X =
H X  + u, where H ∈ M3 (R) is an orthogonal matrix, we determine the eigenvalues
λ1 , λ2 , λ3 of A. We get
(1) λ₁ = λ₂ = 1, having associated eigenspace generated by the orthogonal vectors (1/√2, −1/√2, 0) and (0, 0, 1);
(2) λ₃ = 3, having associated eigenspace generated by the vector (1/√2, 1/√2, 0).

Thus, the transformation of variables is


⎡ x ⎤   ⎡  1/√2   0   1/√2 ⎤ ⎡ X ⎤   ⎡  0 ⎤
⎢ y ⎥ = ⎢ −1/√2   0   1/√2 ⎥ ⎢ Y ⎥ + ⎢ −1 ⎥ .
⎣ z ⎦   ⎣   0     1    0   ⎦ ⎣ Z ⎦   ⎣  0 ⎦

Substitution of variables leads to the canonical equation

X 2 + Y 2 + 3Z 2 = 0.

Notice that, for F = R, the quadric consists of exactly one point. For F = C, the quadric is a degenerate irreducible surface.
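
For readers who want to recompute Example 11.76, the following Python sketch (an illustration we add here, not part of the text) finds the center, the constant term e and the eigenvalues of A with numpy.

    # Numerical check of Example 11.76 (illustrative code, not from the text).
    import numpy as np

    A = np.array([[2., 1., 0.],
                  [1., 2., 0.],
                  [0., 0., 1.]])
    B = np.array([2., 4., 0.])        # coefficients of the linear part 2x + 4y
    const = 2.0

    u = np.linalg.solve(A, -B / 2)    # center: A u + v = 0 with v = B/2, giving (0, -1, 0)
    e = u @ A @ u + B @ u + const     # constant term after translating to the center, giving 0
    lam = np.linalg.eigvalsh(A)       # eigenvalues [1, 1, 3]
    print(u, e, lam)                  # canonical equation: X^2 + Y^2 + 3Z^2 = 0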

The Metric Classification for Quadrics Without Center


Starting again from Eq. (11.57), we now assume rank(A) < rank(A|v), that is, AX + v = 0 has no solution. Although we could repeat the same argument used in the case of affine spaces, here we take a shortcut that exploits the resources provided by the Euclidean structure of the space REn.
Denoting by r the rank of the matrix A, the null space N (A) of A has dimension n −
r. Moreover, since A is symmetric, N (A) coincides with the orthogonal complement
of the column space of A (that is I m(A), the image of A). Hence, we may write
v = v1 + v2 , where v1 ∈ I m(A) and v2 ∈ N (A). Hence, Av2 = 0 and there exists
u ∈ Rn such that Au = v1 . Of course, the first stage of the process is to guarantee the
diagonalization of A. Once again, we introduce the transformation X = H X  − u,
where H ∈ Mn (F) is an orthogonal matrix, whose column vectors are precisely the
eigenvectors of A, that is, A = H t AH = H −1 AH is a diagonal matrix, having its
eigenvalues on the main diagonal.
The column vectors of H define an orthonormal basis {e1 , . . . , en } for Rn , con-
sisting of r eigenvectors corresponding to nonzero eigenvalues and n − r eigenvec-
tors corresponding to the eigenvalue zero. We may reorder the element in the basis
and assume that {e1 , . . . , er } are precisely the eigenvectors associated with nonzero
eigenvalues of A. Hence, since v2 is an eigenvector corresponding to eigenvalue
zero, we may choose e_{r+1} = v₂/‖v₂‖.
Thus, by substitution X = H X  − u in Eq. (11.57), we have

p(x₁, . . . , xₙ) = X′ᵗHᵗAHX′ + 2v₂ᵗHX′ + (−2v₂ᵗu − v₁ᵗu + a_{n+1,n+1})
                 = X′ᵗHᵗAHX′ + 2v₂ᵗHX′ + e                              (11.63)

where e = −2v₂ᵗu − v₁ᵗu + a_{n+1,n+1}. Notice that e_{r+1}ᵗH is precisely the (r + 1)th row of HᵗH = Iₙ, that is, (v₂ᵗ/‖v₂‖)H is the (r + 1)th standard unit row vector. Hence, v₂ᵗHX′ = ‖v₂‖ x′_{r+1}. Therefore,
the polynomial (11.63) reduces to

p(x₁, . . . , xₙ) = X′ᵗHᵗAHX′ + 2‖v₂‖ x′_{r+1} + e
                 = Σᵢ₌₁ʳ αᵢx′ᵢ² + β_{r+1} x′_{r+1} + e                    (11.64)

where α₁, . . . , αᵣ are the nonzero eigenvalues of A and β_{r+1} = 2‖v₂‖. Finally, by performing the translation

x′ᵢ = xᵢ,  i = 1, . . . , r,
x′_{r+1} = x_{r+1} − e/β_{r+1},

we may write the polynomial associated with the quadric as

Σᵢ₌₁ʳ αᵢxᵢ² + β_{r+1} x_{r+1}.                    (11.65)

Example 11.77 Let Q be the quadric in RE4 having equation

x12 + 2x1 x2 + 4x1 + x22 + 2x2 + x32 + 2x3 x4 + x42 + 3 = 0.

The matrices associated with Q are


        ⎡ 1 1 0 0 ⎤          ⎡ 1 1 0 0 2 ⎤
   A =  ⎢ 1 1 0 0 ⎥     C =  ⎢ 1 1 0 0 1 ⎥ .
        ⎢ 0 0 1 1 ⎥          ⎢ 0 0 1 1 0 ⎥
        ⎣ 0 0 1 1 ⎦          ⎢ 0 0 1 1 0 ⎥
                             ⎣ 2 1 0 0 3 ⎦

Then rank(A) = 2, rank(C) = 4 and Q is a degenerate quadric having no center


of symmetry which is obviously a parabolic cylinder. The first step is finding the
projection v1 of vector v, having components (2, 1, 0, 0), onto the subspace I m(A).
To do this, we easily see that an orthogonal basis for I m(A) consists of vectors
c1 = (1, 1, 0, 0) and c2 = (0, 0, 1, 1). Then,

v₁ = ((v·c₁)/(c₁·c₁)) c₁ + ((v·c₂)/(c₂·c₂)) c₂ = (3/2, 3/2, 0, 0).
c1 · c1 c2 · c2 2 2

Therefore, since v1 ∈ I m(A), the system


⎡ 1 1 0 0 ⎤ ⎡ x₁ ⎤   ⎡ 3/2 ⎤   ⎡ 0 ⎤
⎢ 1 1 0 0 ⎥ ⎢ x₂ ⎥ + ⎢ 3/2 ⎥ = ⎢ 0 ⎥
⎢ 0 0 1 1 ⎥ ⎢ x₃ ⎥   ⎢  0  ⎥   ⎢ 0 ⎥
⎣ 0 0 1 1 ⎦ ⎣ x₄ ⎦   ⎣  0  ⎦   ⎣ 0 ⎦
has solutions and their general form is (−α − 3/2, α, β, −β), for any α, β ∈ R. We choose one of them, for instance, α = β = 0, and denote it as u = (−3/2, 0, 0, 0).
This is the vector we’ll use for translation. We now determine the null space N (A)
of A: one of its orthonormal bases is

(1/√2, −1/√2, 0, 0),   (0, 0, 1/√2, −1/√2).

Then compose an orthonormal basis for R4 by the union of bases from I m(A) and
N (A), that is,

B = { (1/√2, 1/√2, 0, 0), (0, 0, 1/√2, 1/√2), (1/√2, −1/√2, 0, 0), (0, 0, 1/√2, −1/√2) }.

The transformation of variables is then the composition of


(1) translation by u and
(2) rotation by the matrix having elements from B as column vectors,
that is,
⎡ x₁ ⎤   ⎡ 1/√2    0    1/√2    0   ⎤ ⎡ x₁′ ⎤   ⎡ −3/2 ⎤
⎢ x₂ ⎥ = ⎢ 1/√2    0   −1/√2    0   ⎥ ⎢ x₂′ ⎥ + ⎢   0  ⎥ .
⎢ x₃ ⎥   ⎢  0    1/√2    0     1/√2 ⎥ ⎢ x₃′ ⎥   ⎢   0  ⎥
⎣ x₄ ⎦   ⎣  0    1/√2    0    −1/√2 ⎦ ⎣ x₄′ ⎦   ⎣   0  ⎦

Hence, we perform the following substitution of variables in the equation of the


quadric:
x₁ = (1/√2)(x₁′ + x₃′) − 3/2,
x₂ = (1/√2)(x₁′ − x₃′),
x₃ = (1/√2)(x₂′ + x₄′),
x₄ = (1/√2)(x₂′ − x₄′).

This leads us to arrive at the equation


2x₁′² + 2x₂′² + √2 x₃′ − 3/4 = 0.
Finally, for
x₁′ = X₁,
x₂′ = X₂,
x₃′ = X₃ + 3/(4√2),
x₄′ = X₄,

we have the metric canonical equation of the cylinder


2X₁² + 2X₂² + √2 X₃ = 0.
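
A short numpy check of the data used in Example 11.77 (our own sketch, not from the text): the ranks of A and C, the projection v₁ and the norm of v₂, whose double gives the coefficient β = 2‖v₂‖ = √2 of x₃′ found above.

    # Numerical check of Example 11.77 (illustrative code, not from the text).
    import numpy as np

    A = np.array([[1., 1., 0., 0.],
                  [1., 1., 0., 0.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.]])
    C = np.array([[1., 1., 0., 0., 2.],
                  [1., 1., 0., 0., 1.],
                  [0., 0., 1., 1., 0.],
                  [0., 0., 1., 1., 0.],
                  [2., 1., 0., 0., 3.]])
    v = np.array([2., 1., 0., 0.])

    print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(C))   # 2, 4: no center

    c1, c2 = np.array([1., 1., 0., 0.]), np.array([0., 0., 1., 1.])  # orthogonal basis of Im(A)
    v1 = (v @ c1) / (c1 @ c1) * c1 + (v @ c2) / (c2 @ c2) * c2
    v2 = v - v1
    print(v1, 2 * np.linalg.norm(v2))   # (1.5, 1.5, 0, 0) and beta = 2*||v2|| = sqrt(2)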

Summarizing, we conclude that every quadric Q of the affine Euclidean space REn
having equation
X t AX + B t X + an+1,n+1 = 0

and in its more compact form


Y t CY = 0

can be given in some coordinate system by an equation of the form (11.61) in the
case Q has a center, or by an equation of the form (11.65) if Q doesn’t have any
center. So, we have proved the following.
Theorem 11.78 Every quadric of REn is metrically equivalent to one of the follow-
ing:

(i)   Σᵢ₌₁ᵖ αᵢxᵢ² − Σⱼ₌ₚ₊₁ʳ αⱼxⱼ² = 0,   if rank(C) = rank(A);
(ii)  Σᵢ₌₁ᵖ αᵢxᵢ² − Σⱼ₌ₚ₊₁ʳ αⱼxⱼ² = 1,   if rank(C) = rank(A) + 1;
(iii) Σᵢ₌₁ᵖ αᵢxᵢ² − Σⱼ₌ₚ₊₁ʳ αⱼxⱼ² − x_{r+1} = 0,   if rank(C) = rank(A) + 2;

where p ≤ r ≤ n and αₖ > 0, for any k = 1, . . . , r.


At first sight, we may remark that there are infinitely many congruence classes of quadrics in REn: each class is represented by one of the polynomials in Theorem 11.78, which depend on the coefficients αᵢ ∈ R. On the contrary, the number of affine classes of quadrics in RAn is finite, since it depends only on the number n of indeterminates and on the rank r of the matrix A.
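
The case distinction of Theorem 11.78 depends only on rank(A) and rank(C), so it can be read off mechanically; the small sketch below (our own illustration, not the book's procedure) does exactly this.

    # Sketch: which of the three metric forms of Theorem 11.78 applies,
    # judging from rank(A) and rank(C).  Illustrative code only.
    import numpy as np

    def metric_case(A, C):
        rA, rC = np.linalg.matrix_rank(A), np.linalg.matrix_rank(C)
        if rC == rA:
            return "(i)   signed sum of alpha_i x_i^2 = 0"
        if rC == rA + 1:
            return "(ii)  signed sum of alpha_i x_i^2 = 1"
        return "(iii) signed sum of alpha_i x_i^2 - x_{r+1} = 0"   # rC == rA + 2

    # the cone of Example 11.76: rank(C) = rank(A) = 3, hence case (i)
    A = np.array([[2., 1., 0.], [1., 2., 0.], [0., 0., 1.]])
    C = np.array([[2., 1., 0., 1.], [1., 2., 0., 2.], [0., 0., 1., 0.], [1., 2., 0., 2.]])
    print(metric_case(A, C))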
Once again let us establish the natural connection between the results obtained earlier and the more familiar objects from analytic geometry: conic curves in the real plane and quadric surfaces in the real 3-dimensional space. In the light of Theorem 11.78, we have the following.

Theorem 11.79 Let RE2 be the 2-dimensional affine Euclidean space over the vec-
tor space R2 . Any conic curve is metrically equivalent to one of the sets defined by
the following equations:
(i)    x²/a² + y²/b² − 1 = 0, real ellipse;
(ii)   x²/a² − y²/b² − 1 = 0, hyperbola;
(iii)  x² + ay = 0, parabola;
(iv)   x²/a² + y²/b² + 1 = 0, empty set (imaginary ellipse);
(v)    x²/a² − y²/b² = 0, two secant real lines;
(vi)   x²/a² + y²/b² = 0, one real point (two complex conjugate lines);
(vii)  x²/a² − 1 = 0, two real parallel lines;
(viii) x²/a² + 1 = 0, empty set (two complex parallel lines);
(ix)   x² = 0, two real merged lines.

For quadric surfaces in real Euclidean space, we have the following.


Theorem 11.80 Let RE3 be the 3-dimensional affine Euclidean space over the vec-
tor space R3 . Any quadric surface is metrically equivalent to one of the sets defined
by the following equations:
(i)    x²/a² + y²/b² + z²/c² − 1 = 0, real ellipsoid;
(ii)   x²/a² + y²/b² + z²/c² + 1 = 0, empty set (imaginary ellipsoid);
(iii)  x²/a² + y²/b² − z²/c² + 1 = 0, elliptic hyperboloid;
(iv)   x²/a² + y²/b² − z²/c² − 1 = 0, hyperbolic hyperboloid;
(v)    x²/a² + y²/b² − z = 0, elliptic paraboloid;
(vi)   x²/a² − y²/b² − z = 0, hyperbolic paraboloid;
(vii)  x²/a² + y²/b² + z²/c² = 0, one real point (imaginary cone);
(viii) x²/a² + y²/b² − z²/c² = 0, real cone;
(ix)   x²/a² + y²/b² + 1 = 0, empty set (imaginary cylinder);
(x)    x²/a² + y²/b² − 1 = 0, elliptic cylinder;
(xi)   x²/a² − y = 0, parabolic cylinder;
(xii)  x²/a² − y²/b² − 1 = 0, hyperbolic cylinder;
(xiii) x²/a² + y²/b² = 0, a line (two secant complex planes);
(xiv)  x²/a² − y²/b² = 0, two secant real planes;
(xv)   x²/a² + 1 = 0, empty set (two complex parallel planes);
(xvi)  x²/a² − 1 = 0, two real parallel planes;
(xvii) x² = 0, two merged real planes.

11.6 Projective Classification of Conic Curves and Quadric Surfaces

We now aim at exploiting the algebraic properties of the representations of geo-


metric entities, recalling the concept of homogeneous coordinates and projective
spaces. As affine and Euclidean coordinates represent geometric entities in an Affine
and a Euclidean space, respectively, homogeneous coordinates represent geomet-
ric elements in a projective space. Throughout any classical course of Geometry, the
usefulness of homogeneous coordinates for constructions and transformations is usu-
ally well highlighted. Projective geometry focuses essentially on the study of properties which are invariant under the action of projective transformations: a char-
acteristic feature of projective geometry is the symmetry of relationships between
points and lines, called duality.
Nevertheless, we will restrict ourselves here to briefly mentioning the main defi-
nitions, which will contribute to achieving the primary objective of this chapter: the
study of conic curves and quadric surfaces from a projective point of view.
Let F be a field and consider the vector space Fn+1 of dimension n + 1. Let
0 ≠ v ∈ Fn+1 be a nonzero vector, then the set

v = {λv | λ ∈ F}

is called a ray of Fn+1 .

Definition 11.81 The projective space FPn , of dimension n, associated with Fn+1 ,
is the set of rays of Fn+1 . Any element of FPn is called a point.

Therefore, a point of the projective space FPn is represented by a vector of coordinates


X = [x1 , . . . , xn+1 ]t ∈ Fn+1 , where at least one of xi ’s is not zero.
Definition 11.82 Let X = [x1 , . . . , xn+1 ]t ∈ Fn+1 be the coordinates representing a
point P ∈ FPn . The scalars {x1 , . . . , xn+1 } are called projective (or homogeneous)
coordinates of P.
Two vectors X, Y ∈ Fn+1 represent the same point of FPn when there exists λ ∈
F \ {0} such that X = λY. Hence, the projective coordinates of a point are defined
up to a scale factor. This relates to the following.

Definition 11.83 Let Mn+1 (F) be the ring of (n + 1) × (n + 1) matrices over F. Any
nonsingular matrix C ∈ Mn+1 (F) defines a linear transformation F : FPn → FPn
which is called projective transformation and transforms a point having projective
coordinates X into a point having projective coordinates X  via X  ≈ C X , where ≈
indicates equality up to a scale factor.

Hence, a projective transformation F of FPn is an automorphism of FPn induced by


an automorphism of Fn+1 . The matrix representing this automorphism is not unique:
two different matrices A, B ∈ Mn+1 (F) induce the same projective transformation
if and only if there exists λ ∈ F such that A = λB.

Definition 11.84 Let S and S  be two subsets of the projective space FPn . We say
that S is projectively equivalent to S  if there exists a projective transformation
F : FPn → FPn such that F(S) = S  .

Projective Coordinates in 2-Dimensional Projective Space


It is well known how to introduce a Cartesian coordinate system in the 2-dimensional
affine plane FA2 . Any point P of FA2 can be uniquely represented by the pair of
scalars (x, y), called the coordinates of the point P in terms of the coordinate system
O X Y, we say P ≡ (x, y). The origin itself has coordinates O ≡ (0, 0).
If P ≡ (x, y) is a point in the affine plane FA2 , we may represent it in the Pro-
jective plane FP2 by its homogeneous coordinates. We simply add a third coordinate
equal to 1. Thus, P ≡ (x, y) is represented by its homogeneous coordinates (x, y, 1).
More generally, the homogeneous coordinates of a point P are (x₁, x₂, x₃) iff x = x₁/x₃ and y = x₂/x₃ are its Euclidean coordinates. Thus, homogeneous coordinates are invariant up to scaling: (x₁, x₂, x₃) and (αx₁, αx₂, αx₃) represent the same point, for any 0 ≠ α ∈ F.
To represent a line in the projective plane, we start from the standard formula ax +
by + c = 0 and introduce the homogeneous coordinates to arrive at the equation
ax1 + bx2 + cx3 = 0.
The coordinates (x1 , x2 , 0) cannot represent any point of the form (x, y, 1) as
they don’t share the same third coordinates. In fact, (x1 , x2 , 0) is a point representing
the slope of a line: parallel lines r and r  in FP2 meet at the line x3 = 0 (line at
infinity). More precisely, if r : ax1 + bx2 + cx3 = 0 and r  : ax1 + bx2 + c x3 = 0,
then their intersection point has homogeneous coordinates equal to (b, −a, 0). This
point is called point at infinity of r (and r  ) and represents the class of all parallel
lines having the same direction.
Hence, the general idea is to let every couple of lines in the projective plane have
an intersection point. By this approach, the projective plane can be defined as

FP2 = FA2 ∪ {the set of all directions in FA2 }.
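
As a small computational aside (our own illustration, not from the text), the point at infinity where two parallel lines meet can be computed directly: in homogeneous coordinates the intersection of two lines is proportional to the cross product of their coefficient vectors.

    # Sketch: meeting point of two lines of FP^2 via the cross product of their
    # homogeneous coefficient vectors (illustrative code, not from the text).
    import numpy as np

    def meet(line1, line2):
        return np.cross(line1, line2)      # homogeneous coordinates of the intersection

    r     = np.array([1., -2., 3.])        # x1 - 2x2 + 3x3 = 0
    r_par = np.array([1., -2., 7.])        # a line parallel to r
    print(meet(r, r_par))                  # (-8, -4, 0), proportional to (b, -a, 0) = (-2, -1, 0)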

Projective Classification of Conics in FP2


As above, to represent a conic in the projective plane, we start from the standard
representing equation. Hence, we firstly recall that a conic is a locus in FA2 (F = R
or F = C) consisting of all points whose coordinates are solution of a quadratic
equation of the form f (x, y) = 0, where

f (x, y) = a11 x 2 + 2a12 x y + a22 y 2 + 2a13 x + 2a23 y + a33 (11.66)

and ai j ∈ F, for any i, j = 1, 2, 3. Introducing the homogeneous coordinates, we


define a conic Γ as a locus in the Projective plane FP2 consisting of all points
whose homogeneous coordinates are solution of a quadratic equation of the form
f (x1 , x2 , x3 ) = 0, where

f (x1 , x2 , x3 ) = a11 x12 + 2a12 x1 x2 + a22 x22 + 2a13 x1 x3 + 2a23 x2 x3 + a33 x32
(11.67)
and ai j ∈ F, for any i, j = 1, 2, 3.
The quadratic polynomial f (x1 , x2 , x3 ) can be represented by the symmetric
matrix ⎡ ⎤
a11 a12 a13
A = ⎣ a12 a22 a23 ⎦
a13 a23 a33
so that

                              ⎡ a₁₁ a₁₂ a₁₃ ⎤ ⎡ x₁ ⎤
f(x₁, x₂, x₃) = [x₁ x₂ x₃]    ⎢ a₁₂ a₂₂ a₂₃ ⎥ ⎢ x₂ ⎥ .
                              ⎣ a₁₃ a₂₃ a₃₃ ⎦ ⎣ x₃ ⎦

Hence, if we denote X = [x₁ x₂ x₃]ᵗ, then the equation f(x₁, x₂, x₃) = 0 can be written in the following matrix notation: XᵗAX = 0.
We say that A is the matrix associated with the conic Γ, or also that the conic Γ
is determined by the matrix A.
Following Definition 11.84, we say that two conics Γ and Γ  of the projective
space FP2 are projectively equivalent if there exists a projective transformation F :
FP2 → FP2 such that F(Γ ) = Γ  .

Theorem 11.85 Let Γ be a conic of the projective space FP2, determined by the 3 × 3 matrix A with coefficients in F. If F : FP2 → FP2 is a projective transformation and C is the matrix representing F, then F(Γ) is a conic determined by the matrix (C⁻¹)ᵗAC⁻¹.

Proof By the definition of projective transformations, we have that F(X) = CX, for any X ∈ FP2. We now denote by Γ′ the conic determined by the matrix (C⁻¹)ᵗAC⁻¹ and prove that Γ′ = F(Γ).
Firstly, we notice that if X ∈ Γ, then obviously C⁻¹CX ∈ Γ. Hence,

(C⁻¹CX)ᵗ A (C⁻¹CX) = 0,

that is,

(CX)ᵗ (C⁻¹)ᵗ A C⁻¹ (CX) = 0,

therefore, C X = F(X ) ∈ Γ  , so that F(Γ ) ⊆ Γ  . Similarly, by using the fact that


F −1 (X ) = C −1 X, for any X ∈ FP2 , one may prove that F −1 (Γ  ) ⊆ Γ, concluding
that Γ  = F(Γ ), as required.

As a consequence of the previous theorem, we have the following.

Theorem 11.86 Let Γ and Γ  be two projective conics, represented by the symmetric
matrices A ∈ M3 (F) and A ∈ M3 (F), respectively. Then Γ is projectively equivalent
to Γ  if and only if there exists a nonsingular matrix C ∈ M3 (F) such that A =
C t AC.

Proof Firstly, we assume that Γ is projectively equivalent to Γ  , that is, there exists
a projective transformation F : FP2 → FP2 such that F(Γ ) = Γ  . Let C ∈ M3 (F)
be the nonsingular matrix associated with F. Hence, C X 0 ∈ Γ  , for any point X 0 ∈
Γ. The replacement of X by C X in the matrix notation X t AX = 0 of Γ leads
to the relation (C X )t A(C X ) = 0, that is, X t (C t AC)X = 0, which represents Γ  .
Therefore, A = C t AC as required.
Conversely, let C be any nonsingular 3 × 3 matrix with coefficients in F, and let


F : FP2 → FP2 be the projective transformation represented by the matrix C −1 . By
Theorem 11.85, the conic F(Γ ) is determined by the matrix C t AC. Hence, F(Γ ) =
Γ  , that is, Γ is projectively equivalent to Γ  .

In other words, two projective conics Γ and Γ  are projectively equivalent if and
only if their associated matrices are congruent.
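
Over R, congruence of symmetric matrices is governed by rank and signature (Sylvester's law of inertia), so projective equivalence of real conics can be tested numerically. The sketch below is our own illustration under this assumption, not a procedure from the text.

    # Sketch: test projective equivalence of two real conics by comparing the
    # inertia of their symmetric matrices (illustrative code, not from the text).
    import numpy as np

    def inertia(A, tol=1e-10):
        lam = np.linalg.eigvalsh(A)
        return (int(np.sum(lam > tol)), int(np.sum(lam < -tol)))

    def projectively_equivalent(A1, A2):
        i1, i2 = inertia(A1), inertia(A2)
        # X^t A X = 0 and X^t (-A) X = 0 define the same conic, so allow a sign flip
        return i1 == i2 or i1 == i2[::-1]

    A_circle    = np.diag([1., 1., -1.])    # x1^2 + x2^2 - x3^2 = 0
    A_hyperbola = np.diag([1., -1., -1.])   # x1^2 - x2^2 - x3^2 = 0
    print(projectively_equivalent(A_circle, A_hyperbola))   # True: both ordinary real conics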

Definition 11.87 Let Γ be a conic of FP2 , represented by the matrix notation


X t AX = 0, where A ∈ M3 (F). The rank of the matrix A is also called the rank
of Γ.

In light of Definition 11.87 and Lemma 7.50, we are now able to state the following.
Theorem 11.88 If two conics Γ and Γ  of FP2 are projectively equivalent, then
they have the same rank.
In particular,
Theorem 11.89 Let F be an algebraically closed field. Two conics Γ and Γ  of FP2
are projectively equivalent if and only if they have the same rank.
The fact that the matrix associated with a conic is symmetric makes it possible for us to
transform it to a diagonal matrix. In this way, we determine the number of congruence
classes for matrices in M3 (F), which correspond to congruence classes of conics. Any
class is represented by a diagonal matrix notation, usually called projective canonical
form of a conic, such that any conic of FP2 is projectively equivalent to one (and
only one) of them. To do this, we refer to the results contained in Theorems 7.46 and
7.47, devoted to the description of the diagonal forms of quadratic functions. More
precisely, in the case F is algebraically closed and using Theorem 7.46, we have the
following.

Theorem 11.90 Let F be an algebraically closed field. Any conic Γ of FP2 is pro-
jectively equivalent to one (and only one) of the following:
(i) x12 + x22 + x32 = 0 (non-degenerate or ordinary conic);
(ii) x12 + x22 = 0 (degenerate conic of rank 2);
(iii) x12 = 0 (degenerate conic of rank 1).

Proof By Theorem 11.86, any projective transformation associated with a nonsingu-


lar matrix C ∈ M3 (F) transforms Γ to the projectively equivalent Γ  . In particular, if
A is the matrix representing Γ, then C t AC is the matrix representing Γ  . Hence, as
an application of Theorem 7.46, it follows that there exists an appropriate C ∈ M3 (F)
such that C t AC has one of the following forms:
⎡ 1 0 0 ⎤   ⎡ 1 0 0 ⎤   ⎡ 1 0 0 ⎤
⎢ 0 1 0 ⎥ , ⎢ 0 1 0 ⎥ , ⎢ 0 0 0 ⎥ .
⎣ 0 0 1 ⎦   ⎣ 0 0 0 ⎦   ⎣ 0 0 0 ⎦
Therefore, in case F is algebraically closed, there exist precisely 3 congruence classes, each of which is determined by the rank of the conics it contains.
In the real case F = R, the congruence classes are 5, as the next theorem shows.
Theorem 11.91 Any conic Γ of RP2 is projectively equivalent to one (and only one)
of the following:
(i) x12 + x22 − x32 = 0 (non-degenerate or ordinary conic);
(ii) x12 + x22 + x32 = 0 (non-degenerate or ordinary conic, containing no real
points);
(iii) x12 − x22 = 0 (degenerate conic of rank 2);
(iv) x12 + x22 = 0 (degenerate conic of rank 2);
(v) x12 = 0 (degenerate conic of rank 1).

Proof By using the same argument as in Theorem 11.90 and applying Theorem 7.47,
one has that any matrix representing a conic of RP2 is congruent to one of the
following:
⎡ 1 0  0 ⎤   ⎡ 1 0 0 ⎤   ⎡ 1  0 0 ⎤   ⎡ 1 0 0 ⎤   ⎡ 1 0 0 ⎤
⎢ 0 1  0 ⎥ , ⎢ 0 1 0 ⎥ , ⎢ 0 −1 0 ⎥ , ⎢ 0 1 0 ⎥ , ⎢ 0 0 0 ⎥ .
⎣ 0 0 −1 ⎦   ⎣ 0 0 1 ⎦   ⎣ 0  0 0 ⎦   ⎣ 0 0 0 ⎦   ⎣ 0 0 0 ⎦

Remark 11.92 We notice that, in case F is algebraically closed, a degenerate conic


of FP2 having rank 2 is projectively equivalent to the union of two distinct lines. In
fact, the polynomial defining the conic factors into a product of linear polynomials.
If F = R and Γ is a degenerate conic having rank 2, we have two different cases:
the congruence class x12 − x22 = 0, that is again the union of two distinct lines; and
the congruence class x12 + x22 = 0 which represents a real point (intersection of two
conjugate complex lines).
When Γ is a degenerate conic having rank 1, there is no difference between
the case F = R and F is algebraically closed: in any case, the conic is projectively
equivalent to two superposed lines, that is, the polynomial defining the curve is a
square of a linear polynomial.
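
The canonical forms above are reached by diagonalizing the symmetric matrix by congruence. The following Python sketch (our own simplified illustration; it assumes every pivot met along the way is nonzero, which suffices for the examples of this section) performs the simultaneous row and column eliminations and returns the transition matrix.

    # Sketch: diagonalization by congruence (illustrative code, not from the text).
    import numpy as np

    def congruence_diagonalize(A):
        # Returns (D, E) with D = E^t A E diagonal, assuming nonzero pivots.
        D = np.array(A, dtype=float)
        n = D.shape[0]
        E = np.eye(n)
        for k in range(n):
            if abs(D[k, k]) < 1e-12:
                continue                      # simplification: skip zero pivots
            for i in range(k + 1, n):
                t = -D[i, k] / D[k, k]
                D[i, :] += t * D[k, :]        # row operation
                D[:, i] += t * D[:, k]        # matching column operation
                E[:, i] += t * E[:, k]        # record the column operations in E
        return D, E

    A = np.array([[1., 2., 0.],
                  [2., 0., 0.],
                  [0., 0., 1.]])              # an arbitrary real conic
    D, E = congruence_diagonalize(A)
    print(np.round(D, 10))                    # diag(1, -4, 1): signature (2, 1)
    print(np.allclose(E.T @ A @ E, D))        # True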

Projective Classification of Quadrics in FP3


Before proceeding with the description of the quadric surfaces in FP3 , we need to
recall the definition of the projective coordinates in 3-dimensional projective space.
As in the 2-dimensional Euclidean space, any point P of the 3-dimensional
Euclidean Space FE3 can be uniquely represented by the triplet of scalars (x, y, z),
called the coordinates of the point P with respect to the coordinate system O X Y Z ,
we say P ≡ (x, y, z). The origin itself has coordinates O ≡ (0, 0, 0). As above we
recall that if P ≡ (x, y, z) is a point of FE3 , we may represent it in the projective
space FP3 by its homogeneous coordinates, adding a fourth coordinate equal to 1.
Thus, P ≡ (x, y, z) is represented by its homogeneous coordinates (x, y, z, 1).
The homogeneous coordinates of a point P are (x₁, x₂, x₃, x₄) iff x = x₁/x₄, y = x₂/x₄ and z = x₃/x₄ are its Euclidean coordinates. Since homogeneous coordinates are invariant up to scaling, (x₁, x₂, x₃, x₄) and (αx₁, αx₂, αx₃, αx₄) represent the same point, for any 0 ≠ α ∈ F.
To represent a plane in the projective space FP3 , we homogenize the general
equation for planes ax + by + cz + d = 0 by introducing the homogeneous coor-
dinates, so that each term has the same degree. Hence, the general equation of a
plane in FE3 has the form ax1 + bx2 + cx3 + d x4 = 0. Therefore, in order to rep-
resent a line r in FE3 , we just recall that it is the intersection of two non-parallel
planes π : ax1 + bx2 + cx3 + d x4 = 0 and π  : a  x1 + b x2 + c x3 + d  x4 = 0, thus
the line r can be described as

r : ax1 + bx2 + cx3 + d x4 = 0, a  x1 + b x2 + c x3 + d  x4 = 0.

Let now P be a point having projective coordinates (l, m, n, 0). The first three coor-
dinates (l, m, n) represent the direction vector of a line: parallel lines r and r  in FP3
meet at the plane x4 = 0 (plane at infinity). More precisely, if r and r  are parallel
lines in FP3 , then their intersection point has homogeneous coordinates equal to
(l, m, n, 0), where (l, m, n) are the direction vector of r and r  . The point (l, m, n, 0)
is called point at infinity of r (and r  ) and represents the class of all parallel lines
having the same direction vector.
To represent a quadric of FP3 , we start from the standard representing equation. A
quadric is a locus in F3 (F = R or F = C) consisting of all points whose coordinates
are a solution of a quadratic equation of the form f (x, y, z) = 0, where

f (x, y, z) = a11 x 2 + 2a12 x y + 2a13 x z + 2a14 x + a22 y 2

+2a23 yz + 2a24 y + a33 z 2 + 2a34 z + a44

and ai j ∈ F, for any i, j = 1, 2, 3, 4. Introducing the homogeneous coordinates, we


define a quadric Σ as a locus in FP3 consisting of all points whose homogeneous
coordinates are solution of a quadratic equation of the form
f (x1 , x2 , x3 , x4 ) = 0, where

f (x1 , x2 , x3 , x4 ) = a11 x12 + 2a12 x1 x2 + 2a13 x1 x3 + 2a14 x1 x4 + a22 x22

+2a23 x2 x3 + 2a24 x2 x4 + a33 x32 + 2a34 x3 x4 + a44 x42 ,

and ai j ∈ F, for any i, j = 1, 2, 3, 4.


Hence, the quadratic polynomial f (x1 , x2 , x3 , x4 ) can be determined by the sym-
metric matrix ⎡ ⎤
a11 a12 a13 a14
⎢ a12 a22 a23 a24 ⎥
A=⎢ ⎥
⎣ a13 a23 a33 a34 ⎦ .
a14 a24 a34 a44
Therefore, if we denote X = [x₁ x₂ x₃ x₄]ᵗ, we have that f(x₁, x₂, x₃, x₄) = XᵗAX. In this way, we obtain the following matrix notation for the equation of a quadric: XᵗAX = 0.
We say that A is the matrix associated with the quadric Σ, or also that the quadric
Σ is determined by the matrix A.

Definition 11.93 Let Σ and Σ  be two quadrics of the projective space FP3 . We
say that Σ is projectively equivalent to Σ  if there exists a projective transformation
F : FP3 → FP3 such that F(Σ) = Σ  .

At this point, we may remind the arguments previously developed in the section
devoted to the classification of conics in the projective space FP2 . Following the
same line we are able to state the following.

Theorem 11.94 Let Σ be a quadric of the projective space FP3 , determined by the
4 × 4 matrix A with coefficients in F. If F : FP3 → FP3 is a projective transforma-
tion and C is the matrix representing F, then F(Σ) is a quadric determined by the
matrix (C⁻¹)ᵗAC⁻¹.

Proof See the proof of Theorem 11.85.

Theorem 11.95 Let Σ and Σ  be two projective quadrics, represented by the sym-
metric matrices A ∈ M4 (F) and A ∈ M4 (F), respectively. Then Σ is projectively
equivalent to Σ  if and only if there exists a nonsingular matrix C ∈ M4 (F) such that
A = C t AC.

Proof See the proof of Theorem 11.86.

Hence, two projective quadrics Σ and Σ  are projectively equivalent if and only if
their associated matrices are congruent.

Definition 11.96 Let Σ be a quadric of FP3 , represented by the matrix notation


X t AX = 0, where A ∈ M4 (F). The rank of the matrix A is also called the rank of
Σ.

Moreover:
Theorem 11.97 If two quadrics of FP3 are projectively equivalent, then they have
the same rank. In particular, two quadrics of CP3 are projectively equivalent if and
only if they have the same rank. (C is an algebraically closed field.)
Therefore, also in the case of classification of quadrics, we may determine a number
of congruence classes for matrices in M4 (F), which correspond to congruence classes
of quadrics. The classes are usually called projective canonical forms of a quadric,
such that any quadric of FP3 is projectively equivalent to one (and only one) of them.
By using again Theorems 7.46 and 7.47, we have the following.

Theorem 11.98 Let F be an algebraically closed field. Any quadric of FP3 is pro-
jectively equivalent to one (and only one) of the following:
(i) x12 + x22 + x32 + x42 = 0 (non-degenerate or ordinary quadric);


(ii) x12 + x22 + x32 = 0 (degenerate quadric of rank 3);
(iii) x12 + x22 = 0 (degenerate quadric of rank 2);
(iv) x12 = 0 (degenerate quadric of rank 1).

Proof By Theorem 11.95, any projective transformation associated with a nonsingu-


lar matrix C ∈ M4 (F) transforms Σ to the projectively equivalent Σ  . In particular,
if A is the matrix representing Σ, then C t AC is the matrix representing Σ  . By
Theorem 7.46, it follows that there exists an appropriate C ∈ M4 (F) such that C t AC
has one of the following forms:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
⎢0 1 0 0⎥ ⎢0 1 0 0⎥ ⎢0 1 0 0⎥ ⎢0 0 0 0⎥
⎢ ⎥, ⎢ ⎥, ⎢ ⎥, ⎢ ⎥.
⎣0 0 1 0⎦ ⎣0 0 1 0⎦ ⎣0 0 0 0⎦ ⎣0 0 0 0⎦
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

Theorem 11.99 Any quadric Σ of RP3 is projectively equivalent to one (and only
one) of the following:
(i) x12 + x22 + x32 + x42 = 0 (non-degenerate or ordinary quadric, containing no
real points);
(ii) x12 + x22 + x32 − x42 = 0 (non-degenerate or ordinary quadric);
(iii) x12 + x22 − x32 − x42 = 0 (non-degenerate or ordinary quadric);
(iv) x12 + x22 + x32 = 0 (degenerate quadric of rank 3, containing only one real
point);
(v) x12 + x22 − x32 = 0 (degenerate quadric of rank 3);
(vi) x12 + x22 = 0 (degenerate quadric of rank 2);
(vii) x12 − x22 = 0 (degenerate quadric of rank 2);
(viii) x12 = 0 (degenerate quadric of rank 1).

Proof As in Theorem 11.98 and applying Theorem 7.47, one has that any matrix
representing a quadric of RP3 is congruent to one of the following:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
⎢0 1 0 0⎥ ⎢0 1 0 0 ⎥ ⎢0 1 0 0 ⎥ ⎢0 1 0 0⎥
⎢ ⎥, ⎢ ⎥, ⎢ ⎥, ⎢ ⎥
⎣0 0 1 0⎦ ⎣0 0 1 0 ⎦ ⎣0 0 −1 0 ⎦ ⎣0 0 1 0⎦
0 0 0 1 0 0 0 −1 0 0 0 −1 0 0 0 0
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
⎢0 1 0 0⎥ ⎢0 1 0 0⎥ ⎢0 −1 0 0⎥ ⎢0 0 0 0⎥
⎢ ⎥, ⎢ ⎥, ⎢ ⎥, ⎢ ⎥.
⎣0 0 −1 0⎦ ⎣0 0 0 0⎦ ⎣0 0 0 0⎦ ⎣0 0 0 0⎦
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Example 11.100 Let Q be the projective quadric of FP3 having equation x12 + x22 −
x32 + x1 x2 + x1 x4 − 3x42 = 0. The matrix associated with Q is
        ⎡  1   1/2   0   1/2 ⎤
   A =  ⎢ 1/2   1    0    0  ⎥ .
        ⎢  0    0   −1    0  ⎥
        ⎣ 1/2   0    0   −3  ⎦

By performing the standard process for diagonalization of quadratic forms, we may


transform the matrix A into a diagonal matrix, which is congruent with A. After
computations, we see that two steps are needed for diagonalization of A.
In the first step, the transition matrix needed to annihilate the entries on the first
row and column in A is ⎡ ⎤
1 − 21 0 − 21
⎢0 1 0 0 ⎥
E1 = ⎢
⎣0 0 1 0 ⎦

0 0 0 1

so that

                  ⎡ 1   0     0     0   ⎤
  A′ = E₁ᵗAE₁ =   ⎢ 0  3/4    0   −1/4  ⎥ .
                  ⎢ 0   0    −1     0   ⎥
                  ⎣ 0 −1/4    0   −13/4 ⎦

Starting from this last one, the second transition of basis is represented by the matrix
        ⎡ 1 0 0  0  ⎤
  E₂ =  ⎢ 0 1 0 1/3 ⎥
        ⎢ 0 0 1  0  ⎥
        ⎣ 0 0 0  1  ⎦

and we obtain

                    ⎡ 1   0    0     0   ⎤
  A″ = E₂ᵗA′E₂ =    ⎢ 0  3/4   0     0   ⎥ .
                    ⎢ 0   0   −1     0   ⎥
                    ⎣ 0   0    0   −10/3 ⎦

In case F = R and by substitutions

x₁ = X₁,
x₂ = (2/√3) X₂,
x₃ = X₃,
x₄ = √(3/10) X₄,
we get the projective canonical equation X 12 + X 22 − X 32 − X 42 = 0.


If F = C, by substitutions

x₁ = X₁,
x₂ = (2/√3) X₂,
x₃ = i X₃,
x₄ = i √(3/10) X₄,

we get the projective canonical equation X 12 + X 22 + X 32 + X 42 = 0.
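
The two congruence steps of Example 11.100 can be checked with a few matrix products; the snippet below (our own illustration, not part of the text) also recovers the real signature (2, 2), which identifies the canonical form X₁² + X₂² − X₃² − X₄² = 0.

    # Numerical check of Example 11.100 (illustrative code, not from the text).
    import numpy as np

    A  = np.array([[1.,  .5, 0.,  .5],
                   [ .5, 1., 0.,  0.],
                   [0.,  0., -1., 0.],
                   [ .5, 0., 0., -3.]])
    E1 = np.array([[1., -.5, 0., -.5],
                   [0.,  1., 0.,  0.],
                   [0.,  0., 1.,  0.],
                   [0.,  0., 0.,  1.]])
    E2 = np.array([[1., 0., 0., 0.],
                   [0., 1., 0., 1/3],
                   [0., 0., 1., 0.],
                   [0., 0., 0., 1.]])

    D = E2.T @ (E1.T @ A @ E1) @ E2
    print(np.round(D, 10))                               # diag(1, 3/4, -1, -10/3)
    lam = np.linalg.eigvalsh(A)
    print(int(np.sum(lam > 0)), int(np.sum(lam < 0)))    # 2 2: signature (2, 2) over R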

Exercises

1. In the affine space RA2 , consider the transformation f : RA2 → RA2 defined by
f(x, y) = ( −(1/4)x − (√3/4)y + 3/4 ,  (√3/4)x − (1/4)y + √3/4 ).

Prove that f is an affinity on RA2 and describe geometrically what its action is.
(translation, rotation, reflection, a composition of some of them, etc.)
2. In the affine space RA2 , consider the transformation f : RA2 → RA2 such that

f (0, 0) = (2, 0) f (2, 0) = (4, 0) f (1, 1) = (3, −1)

where coordinates of points are referred to the standard frame of reference. Prove
that f is an affinity on RA2 , determine its representation and describe geometri-
cally what its action is.
3. Represent the reflection in RE4 across the hyperplane of equation x1 + 2x2 −
x3 + x4 + 1 = 0.
4. Given a hyperbola Γ in space RE2 , prove that there exists an affine transformation
mapping Γ to the hyperbola represented by equation x y = 1.
5. Let S1 , S2 be two ordered sets of RA2 consisting of 3 non-collinear points each.
Prove that there exists a unique affine transformation f : RA2 → RA2 such that
f (S1 ) = S2 .
6. Determine the affine and metric classification of the following quadrics and the
transformations needed to obtain them, in both cases F = R and F = C:
(a) 2x 2 + y 2 + z 2 − x − 2y + 1 = 0 in FA3 and FE3 .
(b) x 2 + y 2 + 2x y + 2x z + 2yz − 2x + 2 = 0 in FA4 and FE4 .
(c) 2x 2 − 8x y + 2y 2 − 2x + 1 = 0 in FA2 and FE2 .


(d) x 2 + y 2 + 2x y + 2yz − 2x + y + 1 = 0 in FA3 and FE3 .
(e) x 2 + y 2 + 2x y − 2x + y + 1 = 0 in FA4 and FE4 .
7. Determine the projective classification of the following quadrics, in both cases
F = R and F = C:

(a) 2x12 + x22 + x32 − x1 x4 − 2x2 x4 + x42 = 0 in FP3 .


(b) x12 − 2x1 x2 + x22 − x1 x3 = 0 in FP2 .
(c) x12 − x22 + 2x1 x2 + 2x1 x3 + 4x2 x3 − 2x1 x4 + x42 = 0 in FP3 .
(d) x12 + x22 + x1 x2 + 2x2 x3 − 4x1 x4 + 2x2 x4 + x42 = 0 in FP3 .
(e) 3x12 − 2x1 x2 − 2x1 x4 + x2 x4 + x32 + 2x42 = 0 in FP3 .
Chapter 12
Ordinary Differential Equations and
Linear Systems of Ordinary Differential
Equations

In this chapter, we provide a method for solving systems of linear ordinary differ-
ential equations by using techniques associated with the calculation of eigenvalues,
eigenvectors and generalized eigenvectors of matrices. We learn in calculus how to
solve differential equations and the system of differential equations. Here, we firstly
show how to represent a system of differential equations in a matrix formulation.
Then, using the Jordan canonical form and, whenever possible, the diagonal canon-
ical form of matrices, we will describe a process aimed at solving systems of linear
differential equations in a very efficient way. To do this, we also give a short descrip-
tion of the so-called vector-valued functions. Finally, as a further application, in the
last part of the chapter, we show that the method developed for systems also provides a way to solve differential equations of order n.

12.1 A Brief Overview of Basic Concepts of Ordinary


Differential Equations

A differential equation is an equation which involves one or more independent vari-


ables, one or more dependent variables and derivatives of the dependent variables
with respect to some or all of the independent variables. If there is just one inde-
pendent variable, then the derivatives are all ordinary derivatives, and the equation
is an ordinary differential equation. Hence, an ordinary differential equation is an
equation relating to one unknown function y(x) of one variable x and some of its
derivatives. The domain of the unknown function is an interval of the real line. Such
an equation is then of the form

F(x, y(x), y  (x), . . . , y (n) (x)) = g(x), (12.1)

where x is a scalar parameter, g(x) is a function of the variable x which is defined and
continuous in an interval I ⊆ R, y(x) is the unknown function of x, y  (x) is the first
derivative of y(x) and, for any i, y (i) is the ith derivative of y(x). We recall that in
literature, the functions y(x), y  (x), . . . , y (n) (x) of the variable x, that are involved
in an ordinary differential equation, are commonly replaced by the y, y  , . . . , y (n) so
that Eq. (12.1) can be written as

F(x, y, y  , . . . , y (n) ) = g(x).

We say that the differential equation is homogeneous if the function g(x) is just the
constant function 0, that is, the differential equation is precisely

F(x, y(x), y  (x), . . . , y (n) (x)) = 0. (12.2)

The order of a differential equation is the order of the highest derivative which
appears in the equation. A solution (or integral) of an ordinary differential equation
of order n of the form (12.1) is a function y(x) defined in the interval I ⊆ R and
satisfying the following:
(1) It is n-times differentiable in the domain of definition I .
(2) F(x0 , y(x0 ), y  (x0 ), . . . , y (n) (x0 )) = g(x0 ), for any x0 ∈ I .

Example 12.1 Any solution of the first-order differential equation y′ = ay, where a ∈ R is a fixed constant, has the form y(x) = ce^{ax} (c ∈ R). In fact, assuming that
y(x) is a solution, we notice that the product y(x)e−ax has derivative equal to zero;
hence, it is a constant, say y(x)e−ax = c, for some c ∈ R. Moreover, the function
ceax is differentiable in the whole R.

Example 12.2 Any solution of the second-order differential equation y″ = a²y, where a ∈ R is a fixed constant, has the form y(x) = c₁e^{ax} + c₂e^{−ax} (c₁, c₂ ∈ R). In fact,

y′(x) = ac₁e^{ax} − ac₂e^{−ax}   and   y″(x) = a²c₁e^{ax} + a²c₂e^{−ax} = a²y(x).

Moreover, the function c1 eax + c2 e−ax is two times differentiable in the whole R.

Example 12.3 Any solution of the first-order differential equation y′ = ay + be^{ax}, where a, b ∈ R are fixed constants, has the form y(x) = ce^{ax} + bxe^{ax} (c ∈ R). More in general, any solution of the differential equation y′ = f(x)y + g(x) has the form

y(x) = e^{∫ f(x)dx} ( ∫ g(x) e^{−∫ f(x)dx} dx + c )    for any constant c ∈ R.
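
The closed form just given is easy to double-check symbolically. The snippet below (our own illustration using sympy, not part of the text) verifies the solution y(x) = ce^{ax} + bxe^{ax} of y′ = ay + be^{ax}.

    # Symbolic check of Example 12.3 (illustrative sympy code, not from the text).
    import sympy as sp

    x = sp.symbols('x')
    a, b, c = sp.symbols('a b c')
    y = c * sp.exp(a * x) + b * x * sp.exp(a * x)

    # the residual y' - a*y - b*exp(a*x) must vanish identically
    print(sp.simplify(sp.diff(y, x) - a * y - b * sp.exp(a * x)))   # 0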

Note that the constants of integration are included in each of the previous examples.
They are always included in the general solution of differential equations. This means
that the solution of a differential equation represents a family of infinitely many curves (∞¹ and ∞² curves for Examples 12.1 and 12.2, respectively). More generally, the solution of a differential equation of order n is a function depending on n arbitrary constants and represents a family of ∞ⁿ curves. The set of solutions

y = y(x, c1 , . . . , cn ) for any c1 , . . . , cn ∈ R

is called the general solution of the differential equation. In speaking of ordinary


differential equations, we say that we have an initial value problem if all the specified
values of the solution and its derivatives are given at one point of the domain of
definition I . Any standard text on differential equations discusses the initial value
problem, that is, whether we can construct the unique function y(x), among those
representing the general solution, such that


⎪ y(x0 ) = y0


⎨ y  (x0 ) = y1
F(x, y, y  , . . . , y (n) ) = g(x) and ··· (12.3)



⎪ ···
⎩ (n−1)
y (x0 ) = yn−1

for a specific element x0 ∈ I , the domain of definition of y(x). This procedure is in


line with the assignment of specific values to the arbitrary constants c1 , . . . , cn . The
function satisfying (12.3) is usually said to be a particular solution of the differential
equation.

Example 12.4 The general solution of the first-order differential equation 2yy′ − x(y′)² − 4x = 0 has the form y(x) = (x² + c²)/c (0 ≠ c ∈ R).
If we consider the following initial value problem:
If we consider the following initial value problem:

2yy′ − x(y′)² − 4x = 0   and   y(1) = 2,

we get the particular solution correlating to the value of the constant c which satisfies

2 = y(1)  ⟹  (1 + c²)/c = 2  ⟹  (c − 1)² = 0  ⟹  c = 1,

that is, c = 1 and y(x) = x 2 + 1.

However, sometimes, solutions of the differential equation that cannot be regarded as


part of general solution exist. The characteristic of any such solution (usually called
singular solution) is such that it is not derived from the general solution by assigning
particular values to the constants.

Example 12.5 Let us return to Example 12.4. Notice that both y = 2x and y = −2x are solutions of the first-order differential equation 2yy′ − x(y′)² − 4x = 0.
Nevertheless, none of them can be deducted by the general solution by assigning
particular values to the constant c.
Example 12.6 The general solution of the first-order differential equation 2y′ − (y − 1)²x = 0 has the form y(x) = −2/(x²/2 + c) + 1 (c ∈ R).
Even if y = 1 is a solution of the equation, it cannot be derived from the general one.

12.2 System of Linear Homogeneous Ordinary Differential


Equations

An ordinary differential equation of the general form

F(x, y(x), y  (x), . . . , y (n) (x)) = g(x) (12.4)

is linear if it is linear in the unknown function y(x) and its derivatives.


If g(x) is not the zero function, the equation is said to be nonhomogeneous. In this
case, the equation
F(x, y(x), y  (x), . . . , y (n) (x)) = 0 (12.5)

is called the associated homogeneous equation. It is known that, if s(x) is the gen-
eral solution of the associated homogeneous equation (12.5), (s(x) is usually called
a complementary solution) and s0 (x) is any solution of the nonhomogeneous equa-
tion (12.4), then y(x) = cs(x) + s0 (x) is a general solution of the nonhomogeneous
equation, for an arbitrary constant c. In other words, the general solution to (12.4) is,
thus, obtained by adding all possible homogeneous solutions to one fixed particular
solution.
From this family, we can select one which satisfies the initial condition y(x0 ) = y0 .
In any course on ordinary differential equations, all of us probably encountered the
general method known as variation of parameters for constructing particular solu-
tions of nonhomogeneous ordinary differential equations with constant coefficients.
Here, it is not our intention to discuss this aspect. Actually, we would like to reiterate
that, from the point of view of the search for solutions, the study of the associated
homogeneous equation plays a crucial role. Thus, in this section, we shall concen-
trate our attention exactly on the case of linear homogeneous first-order ordinary
differential equations in which one or more than one unknown function occurs.
Let y1 (x), . . . , yn (x) be differentiable functions of the scalar parameter x, ai j
given scalars (for i, j = 1, . . . , n) and f 1 (x), . . . , f n (x) arbitrary functions of x.
Assume that f 1 (x), . . . , f n (x) are defined and continuous in an interval I ⊆ R.
A system of n linear first-order ordinary differential equations in n unknowns
with constant coefficients (an n × n system of linear equations) has the general form

y1 (x) = a11 y1 (x) + a12 y2 (x) + · · · + a1n yn (x) + f 1 (x)


y2 (x) = a21 y1 (x) + a22 y2 (x) + · · · + a2n yn (x) + f 2 (x)
(12.6)
.........
yn (x) = an1 y1 (x) + an2 y2 (x) + · · · + ann yn (x) + f n (x).
We may write the system as a matrix differential equation

y′(x) = Ay(x) + f(x),                    (12.7)

where

y′(x) = [y₁′(x), y₂′(x), . . . , yₙ′(x)]ᵗ,    y(x) = [y₁(x), y₂(x), . . . , yₙ(x)]ᵗ,
f(x) = [f₁(x), f₂(x), . . . , fₙ(x)]ᵗ,

        ⎡ a₁₁ a₁₂ . . . a₁ₙ ⎤
  A  =  ⎢ a₂₁ a₂₂ . . . a₂ₙ ⎥ .
        ⎢  ⋮   ⋮         ⋮  ⎥
        ⎣ aₙ₁ aₙ₂ . . . aₙₙ ⎦

With the aim of developing our discussion of systems of differential equations, we


need to introduce the following terminology:
Definition 12.7 A vector-valued function is a vector whose entries are functions of
x. To be more specific, let g1 (x), . . . , gn (x) be functions of the independent variable
x. The vector-valued function having component g1 (x), . . . , gn (x) is the vector
⎡ ⎤
g1 (x)
⎢ g2 (x) ⎥
⎢ ⎥
g(x) = ⎢
⎢ ... ⎥

⎣ ... ⎦
gn (x)

whose domain of definition is the largest possible interval for which all components
are defined, that is, the intersection of their domains of definition.
The calculus processes of taking limits, differentiating and integrating are extended
to vector-valued functions by evaluating the limit (derivative or integral, respectively)
of each entry gi (x) separately, that is,
lim_{x→x₀} g(x) = [ lim_{x→x₀} g₁(x), . . . , lim_{x→x₀} gₙ(x) ]ᵗ,
g′(x) = [ g₁′(x), . . . , gₙ′(x) ]ᵗ,
∫ g(x) dx = [ ∫ g₁(x) dx, . . . , ∫ gₙ(x) dx ]ᵗ.

More precisely, for any point x0 in the domain of definition of g(x), the following
hold:
(1) The limit lim g(x) exists if and only if the limits of all components exist. If any
x→x0
of the limits lim gi (x) fail to exist, then lim g(x) does not exist.
x→x0 x→x0
(2) g(x) is continuous at x = x0 if and only if all components are continuous at
x = x0 .
(3) g(x) is differentiable at x = x0 if and only if all components are differentiable
at x = x0 .
The objects y(x), y′(x) and f(x) in relation (12.7) are examples of vector-valued functions. In this sense, we say that a solution of (12.7) is a vector-valued function y(x) = [y₁(x), . . . , yₙ(x)]ᵗ satisfying the following conditions:
(1) Each function yᵢ(x) is defined and differentiable in the domain of definition I.
(2) y(x) satisfies (12.7) at every point of I, that is,

y′(x₀) = Ay(x₀) + f(x₀),   for any x₀ ∈ I.

Just as in the case of a single differential equation, we are usually interested


in solving an initial value problem; that is, we seek the vector function y(x) =
[y1 (x), y2 (x), . . . , yn (x)]T that satisfy not only the differential equations given by
(12.4) but also a set of initial conditions of the form
⎡ ⎤
y1 (x0 )
⎢ y2 (x0 ) ⎥
⎢ ⎥
y0 = y(x0 ) = ⎢
⎢ ... ⎥,

⎣ ... ⎦
yn (x0 )

where x0 , y1 (x0 ), . . . , yn (x0 ) are known constants.


The system is said to be homogeneous if each of the functions f 1 (x), . . . , f n (x) is
precisely equal to the constant function 0, that is,

y (x) = Ay(x). (12.8)

Since linear homogeneous systems have linear structure, if s1 (x), . . . , sn (x) are solu-
tions of system (12.8), then

s(x) = c1 s1 (x) + · · · · · · + cn sn (x)

is also a solution of (12.8) for any c1 , . . . , cn ∈ R. In fact,


 
(c₁s₁(x) + · · · + cₙsₙ(x))′ = c₁s₁′(x) + · · · + cₙsₙ′(x)
                             = c₁A·s₁(x) + · · · + cₙA·sₙ(x)
                             = A·(c₁s₁(x) + · · · + cₙsₙ(x)).

In other words, given a homogeneous system, any linear combination of a finite


number of its solutions is also a solution. Moreover, we notice that the vector function
0 is a solution for (12.8). Hence, the set S of all solutions for (12.8) has a natural
structure of vector space.
As in every vector space, we need to know whether a set of vectors is linearly
dependent or independent. This leads to the following definition:
Definition 12.8 Let s1 (x), . . . , sk (x) ∈ S. We say that s1 (x), . . . , sk (x) are linearly
dependent vector-valued functions if there exist constants c1 , . . . , ck not all zero such
that c1 s1 (x) + · · · · · · + ck sk (x) = 0, for any x ∈ I .
Practically speaking, s1 (x), . . . , sk (x) ∈ S are linearly dependent vector-valued
functions if and only if s1 (x0 ), . . . , sk (x0 ) are linearly dependent vectors of Rn , for
any x0 ∈ I . In this sense, we say that s1 (x), . . . , sk (x) ∈ S are linearly independent
vector-valued functions if there exists x0 ∈ I such that s1 (x0 ), . . . , sk (x0 ) are linearly
independent vectors of Rn .
Now, if we fix a constant vector c ∈ Rn , there exists an unique solution s(x) of
system (12.8) satisfying the condition s(0) = c. In this way, we may define the
map Φ : Rn → S and, for any c ∈ Rn , there exists an unique s(x) ∈ S such that
Φ(c) = s(x). The map Φ is linear and bijective, implying that S ∼ = Rn . This means
that S, viewed as a vector space, has dimension n. Therefore, any solution of (12.8)
can be obtained as a linear combination of n linearly independent elements of S, that
is, n linearly independent solutions {s1 , . . . . . . , sn } of the system. Such a set is called
a fundamental system of solutions. To formalize the previous definition, we say that
{s1 , . . . . . . , sn } is a fundamental system of solutions if the following two conditions
are satisfied:
(1) There exists x0 ∈ I such that s1 (x0 ), . . . . . . , sn (x0 ) are linearly independent vec-
tors.
(2) Any vector function solution of the system is a linear combination of
s1 (x), . . . . . . , sn (x).
A simple test for the linear independence of a set of vector functions s1 (x), . . . . . . , sn
(x) that are real-valued and defined on an interval I ⊆ R can be formulated as follows:
(1) Put the components of these vector functions into the columns of a square matrix
M(x) of order n:  
M(x) = s1 (x)| · · · · · · |sn (x) .
(2) Compute the determinant of the matrix M(x): it is called the Wronskian of the
n vector functions s1 (x), . . . . . . , sn (x) and usually denoted by W (x).
(3) If the Wronskian is different from zero for any x ∈ I , then s1 (x), . . . . . . , sn (x)
are linearly independent.
As a consequence of well-known Liouville’s theorem (also called Abel’s formula),
we observe that if W (x0 ) = 0 for any given x0 ∈ I , then W (x) = 0 for any x ∈ I .

Remark 12.9 In general, if {yi (x)} is a linearly dependent set of functions, then the
Wronskian must vanish. However, the converse is not necessarily true, as one can find
examples in which the Wronskian vanishes without the functions being dependent.
Nevertheless, if {yi (x)} are solutions for a linear system of ordinary differential
equations, then the converse does hold. In other words, if {yi (x)} are solutions for
a linear system of ordinary differential equations and the Wronskian of the {yi (x)}
vanishes, then {yi (x)} is a linearly dependent set of functions.
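
The Wronskian test is straightforward to carry out with a computer algebra system. The sketch below (our own illustration, not from the text) computes the Wronskian of the two vector functions (cos x, −sin x) and (sin x, cos x), which are solutions of the system y₁′ = y₂, y₂′ = −y₁.

    # Sketch: computing a Wronskian with sympy (illustrative, not from the text).
    import sympy as sp

    x = sp.symbols('x')
    s1 = sp.Matrix([sp.cos(x), -sp.sin(x)])
    s2 = sp.Matrix([sp.sin(x),  sp.cos(x)])

    M = sp.Matrix.hstack(s1, s2)     # put the vector functions into the columns
    W = sp.simplify(M.det())
    print(W)                         # 1, never zero: s1 and s2 are linearly independent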

Example 12.10 The vector functions

s₁(x) = ⎡  e³ˣ ⎤ ,    s₂(x) = ⎡ e⁵ˣ ⎤
        ⎣ −e³ˣ ⎦              ⎣ e⁵ˣ ⎦

are solutions of the system

y₁′ = 4y₁ + y₂,
y₂′ = y₁ + 4y₂.

Those are independent solutions; in fact,

W(x) = │  e³ˣ  e⁵ˣ │ ≠ 0.
       │ −e³ˣ  e⁵ˣ │

Thus, a general solution of the system is given by

y(x) = c₁ ⎡  1 ⎤ e³ˣ + c₂ ⎡ 1 ⎤ e⁵ˣ .
          ⎣ −1 ⎦          ⎣ 1 ⎦
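
The solutions of Example 12.10 are exactly what the eigenvalue method developed below in Theorem 12.12 produces; a quick numpy check of the coefficient matrix (our own illustration, not from the text):

    # Eigenvalues and eigenvectors of the coefficient matrix of Example 12.10
    # (illustrative code, not from the text).
    import numpy as np

    A = np.array([[4., 1.],
                  [1., 4.]])
    lam, P = np.linalg.eig(A)
    print(lam)    # 5 and 3 (the order may vary)
    print(P)      # columns proportional to (1, 1) and (1, -1)
    # hence (1, -1) e^{3x} and (1, 1) e^{5x} span the solution space, as above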

Example 12.11 Solve the system

y₁′ = 4y₁,
y₂′ = −2y₂,
y₃′ = 3y₃,

and find a solution that satisfies the initial conditions

y1 (2) = 0, y2 (2) = −1, y3 (2) = 3.

Solving each equation separately, we easily find



y1 = c1 e4x ,

y2 = c2 e−2x ,

y3 = c3 e3x .

Thus, the vector functions


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 0
s1 (x) = ⎣ 0 ⎦ e4x , s2 (x) = ⎣ 1 ⎦ e−2x , s3 (x) = ⎣ 0 ⎦ e3x
0 0 1

are solutions of the system. Moreover, the determinant of the matrix


⎡ ⎤
e4x 0 0
⎣ 0 e−2x 0 ⎦ ,
0 0 e3x

whose columns are precisely the coordinates of solutions s1 , s2 , s3 , is equal to e5x .


The determinant is then different from zero, for any x ∈ R. Therefore, vectors
s1 (x), s2 (x), s3 (x) are linearly independent (for any x ∈ R) and form a fundamental
system of solutions. Hence, a general solution for the system is the vector function
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 0 c1 e4x
y(x) = c1 ⎣ 0 ⎦ e4x + c2 ⎣ 1 ⎦ e−2x + c3 ⎣ 0 ⎦ e3x = ⎣ c2 e−2x ⎦ .
0 0 1 c3 e3x

To achieve the same general solution, we now introduce a different approach. The
system has the matrix form
⎡ ⎤ ⎡ ⎤⎡ ⎤
y1 (x) 4 0 0 y1 (x)
y (x) = ⎣ y2 (x) ⎦ = ⎣ 0 −2 0 ⎦ ⎣ y2 (x) ⎦ .


y3 (x) 0 0 3 y3 (x)

The coefficient matrix of the system


⎡ ⎤
4 0 0
⎣ 0 −2 0 ⎦
0 0 3

is diagonal and has three distinct real eigenvalues {4, −2, 3}. The corresponding
eigenvectors are as follows:
• X 1 = [1, 0, 0] for λ1 = 4.
• X 2 = [0, 1, 0] for λ2 = −2.
• X 3 = [0, 0, 1] for λ3 = 3.
At this point, we notice that s1 (x) = e4x X 1 , s2 (x) = e−2x X 2 , s3 (x) = e3x X 3 , where
s1 , s2 , s3 are precisely the solutions previously deducted from the system.
Thus, the general solution is given by
⎡ ⎤
c1 e4x
y(x) = ⎣ c2 e−2x ⎦ = c1 e4x X 1 + c2 e−2x X 2 + c3 e3x X 3
c3 e3x

for X 1 , X 2 , X 3 linearly independent eigenvectors associated with the coefficient


matrix of the system.
From the given initial conditions, we have

0 = y1 (2) = c1 e8 ,
−1 = y2 (2) = c2 e−4 ,
3 = y3 (2) = c3 e6 .

Thus, c1 = 0, c2 = −e4 , c3 = 3e−6 and the solution satisfying the initial conditions
is
y1 = 0,

y2 = −e4 e−2x ,

y3 = 3e−6 e3x .
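
The same answer can be reproduced mechanically: write y(x) = Σᵢ cᵢXᵢe^{λᵢx} and fit the constants to the initial data. The sketch below (our own illustration with numpy, not from the text) does this for the diagonal system of Example 12.11.

    # Sketch: solving the initial value problem of Example 12.11 numerically
    # (illustrative code, not from the text).
    import numpy as np

    A = np.diag([4., -2., 3.])
    x0, y0 = 2.0, np.array([0., -1., 3.])

    lam, P = np.linalg.eig(A)                       # eigenvalues and eigenvectors of A
    c = np.linalg.solve(P * np.exp(lam * x0), y0)   # fit y(x0) = sum_i c_i X_i e^{lam_i x0}

    def y(x):
        return P @ (c * np.exp(lam * x))

    print(np.allclose(y(x0), y0))                   # True
    print(y(2.5))                                   # the solution at another point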

The previous example is easy to solve, thanks to the fact that each equation in
the system involves one and only one unknown function and its derivative. More
precisely, the ith equation involves precisely yi , yi , so that the matrix associated
with the system is diagonal. This allows us to easily find the solution of any equation
separately. It is clear that the simplest systems are those in which the associated
matrix is diagonal.
Here, we'll give an answer to the question of how we might solve a homogeneous system

y′(x) = Ay(x)                    (12.9)

whose coefficient matrix A is not diagonal. The first case we analyze is related to
systems whose associated coefficient matrix is diagonalizable. To work this out, we
prove the following:

Theorem 12.12 Let y₁(x), . . . , yₙ(x) be differentiable functions of the scalar parameter x, and let A = (aᵢⱼ)_{n×n} be a given diagonalizable scalar matrix such that

⎡ y₁′(x) ⎤   ⎡ a₁₁ a₁₂ . . . a₁ₙ ⎤ ⎡ y₁(x) ⎤
⎢ y₂′(x) ⎥ = ⎢ a₂₁ a₂₂ . . . a₂ₙ ⎥ ⎢ y₂(x) ⎥ .          (12.10)
⎢   ⋮    ⎥   ⎢  ⋮   ⋮         ⋮ ⎥ ⎢   ⋮   ⎥
⎣ yₙ′(x) ⎦   ⎣ aₙ₁ aₙ₂ . . . aₙₙ ⎦ ⎣ yₙ(x) ⎦

If {λ1 , . . . , λn } is the spectrum of A (the eigenvalues are not necessarily distinct) and
{X 1 , . . . , X n } is a linearly independent set of eigenvectors of A, such that AX i =
λi X i , for any i = 1, . . . , n, then {X 1 eλ1 x , . . . . . . , X n eλn x } is a fundamental system
of solutions and any solution of system (12.10) is a vector-valued function having
the form
y(x) = c1 X 1 eλ1 x + · · · + cn X n eλn x (12.11)

for c1 , . . . , cn arbitrary constants.

Proof Since A is a diagonalizable matrix, there exists an n × n invertible matrix P = [pij]_{n×n} such that

P^{−1} A P = diag(λ1 , . . . , λn),

where the diagonal entries λi are not necessarily distinct and any eigenvalue λi
repeatedly occurs on the main diagonal as many times as it occurs as a root of the
characteristic polynomial of A. We recall that the columns of P coincide with the
eigenvectors of A. In this regard, we may write

P = [X 1 X 2 . . . Xn]

with ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
p11 p12 p1n
⎢ p21 ⎥ ⎢ p22 ⎥ ⎢ p2n ⎥
X1 = ⎢ ⎥
⎣ ... ⎦ X2 = ⎢ ⎥
⎣ ... ⎦ ... Xn = ⎢ ⎥
⎣ ... ⎦.
pn1 pn2 pnn

Introduce a new vector-valued function


⎡ ⎤
g1 (x)
⎢ g2 (x) ⎥
⎢ ⎥
g(x) = ⎢
⎢ ... ⎥,

⎣ ... ⎦
gn (x)

where g1 , . . . , gn are differentiable functions of the variable x such that g(x) is related
to the unknown function y(x) by the equation g(x) = P −1 · y(x).
Hence,
y(x) = Pg(x) (12.12)

which means that


⎡ ⎤ ⎡ ⎤
y1 (x) ⎡ ⎤ g (x)
⎢ y2 (x) ⎥ p11 p12 ... p1n ⎢ 1 ⎥
⎢ ⎥ ⎢ p21 p22 ... p2n ⎥ ⎢ g2 (x) ⎥
⎢ ... ⎥ = ⎢ ⎥⎢ ... ⎥. (12.13)
⎢ ⎥ ⎣ ... ... ... ... ⎦⎢ ⎥
⎣ ... ⎦ ⎣ ... ⎦
pn1 pn2 ... pnn
yn (x) gn (x)

Since P is a constant coefficient matrix, differentiation of both sides of (12.13) leads


to ⎡  ⎤ ⎡ ⎤
y1 (x) ⎡ ⎤ g1 (x)
⎢ y  (x) ⎥ p p . . . p ⎢  ⎥
⎥ ⎢ g2 (x) ⎥
11 12 1n
⎢ 2 ⎥ ⎢
⎢ . . . ⎥ = ⎢ p21 p22 . . . p2n ⎥ ⎢ . . . ⎥ , (12.14)
⎢ ⎥ ⎣ ... ... ... ... ⎦⎢ ⎥
⎣ ... ⎦ ⎣ ... ⎦
pn1 pn2 . . . pnn
yn (x) gn (x)

that is,
y (x) = Pg (x). (12.15)

Substitution of (12.12) and (12.15) in (12.10) leads us to Pg (x) = A Pg(x). Thus,
g (x) = P −1 A Pg(x), which means
⎡ ⎤ ⎡ ⎤⎡ ⎤
g1 (x) λ1 g1 (x)
⎢ g  (x) ⎥ ⎢ . . ⎥ ⎢ g2 (x) ⎥
⎢ 2 ⎥ ⎢ . ⎥⎢ ⎥
⎢ ... ⎥ = ⎢ ⎥⎢ ... ⎥ (12.16)
⎢ ⎥ ⎢ .. ⎥ ⎢ ⎥
⎣ ... ⎦ ⎣ . ⎦⎣ ... ⎦
gn (x) λn gn (x)

so that
gi (x) = λi gi (x) ∀i = 1, . . . , n. (12.17)

We can solve equations (12.17) individually and obtain

gi (x) = ci eλi x ∀i = 1, . . . , n. (12.18)

From the given relation (12.13), for any i = 1, . . . , n, we get



n
yi (x) = pi j g j (x)
j=1


n
= pi j c j eλ j x ,
j=1

that is,
⎡ ⎤
y1 (x)
⎢ y2 (x) ⎥
⎢ ⎥
y(x) = ⎢
⎢ ... ⎥

⎣ ... ⎦
yn (x)
⎡ ⎤
p11 c1 eλ1 x + · · · + p1n cn eλn x
⎢ p21 c1 eλ1 x + · · · + p2n cn eλn x ⎥
⎢ ⎥
=⎢
⎢ ... ⎥

⎣ ... ⎦
λ1 x λn x
pn1 c1 e + · · · + pnn cn e

= c1 X 1 eλ1 x + · · · + cn X n eλn x ,

where the coefficients c1 , . . . , cn are arbitrary and can be determined by assigning


initial conditions in terms of the value of y1 (x), . . . , yn (x) at some particular x = x0 .
In particular, {X 1 eλ1 x , . . . . . . , X n eλn x } is a set of n solutions. Moreover, the matrix
whose columns are the coordinates of these solutions is
⎡ λx ⎤
e 1 p11 eλ2 x p12 . . . eλn x p1n
⎢ eλ1 x p21 eλ2 x p22 . . . eλn x p2n ⎥
⎢ ⎥.
⎣ ... ... ... ... ⎦
eλ1 x pn1 eλ2 x pn2 . . . eλn x pnn

Its determinant is equal to e(λ1 +······+λn )x · det (P) and is different from zero, for any
x ∈ R. Therefore, we conclude that {X 1 eλ1 x , . . . . . . , X n eλn x } is a fundamental system
of solutions.
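The statement of Theorem 12.12 translates directly into a few lines of code. The sketch below is illustrative only; it assumes NumPy is available and that A really is diagonalizable (which np.linalg.eig does not verify), and it evaluates y(x) = c1 X1 e^{λ1 x} + · · · + cn Xn e^{λn x} after fitting the constants to an initial value.

import numpy as np

def solve_diagonalizable_system(A, x0, y0, x):
    """Value at x of the solution of y' = A y with y(x0) = y0 (A assumed diagonalizable)."""
    lam, P = np.linalg.eig(A)                       # columns of P are the eigenvectors X_1, ..., X_n
    c = np.linalg.solve(P, y0) / np.exp(lam * x0)   # constants c_i fitted to the initial data
    return P @ (c * np.exp(lam * x))

# the system of Example 12.10: eigenvalues 3 and 5 with eigenvectors (1, -1) and (1, 1)
A = np.array([[4.0, 1.0], [1.0, 4.0]])
print(np.linalg.eig(A)[0])                          # [5. 3.] (the ordering is not guaranteed)
print(solve_diagonalizable_system(A, 0.0, np.array([1.0, 0.0]), 0.2))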

Example 12.13 Solve the system

y1′ = 3y1 − y2 + y3 − 2y4 ,
y2′ = −y1 + 3y2 − 2y3 + y4 ,
y3′ = 2y3 + 2y4 ,
y4′ = 8y3 + 2y4 ,

and find a solution that satisfies the initial conditions

y1 (0) = 1, y2 (0) = 1, y3 (0) = 2, y4 (0) = −1.

First we write the system in its matrix form


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
y1 (x) 3 −1 1 −2 y1 (x)
⎢ y  (x) ⎥ ⎢ −1 −2 1 ⎥ ⎢ ⎥
 ⎢
y (x) = ⎣ 2 ⎥ = ⎢ 3 ⎥ · ⎢ y2 (x) ⎥ .
y3 (x) ⎦ ⎣ 0 0 2 2 ⎦ ⎣ y3 (x) ⎦
y4 (x) 0 0 8 2 y4 (x)
448 12 Ordinary Differential Equations and Linear …

The coefficient matrix of the system


⎡ ⎤
3 −1 1 −2
⎢ −1 3 −2 1 ⎥
⎢ ⎥
⎣ 0 0 2 2 ⎦
0 0 8 2

has four distinct real eigenvalues {−2, 2, 4, 6}; thus, it is diagonalizable. The corre-
sponding eigenvectors are as follows:
• X1 = [−7, 5, 8, −16]T for λ1 = −2.
• X2 = [1, 1, 0, 0]T for λ2 = 2.
• X3 = [1, −1, 0, 0]T for λ3 = 4.
• X4 = [−9, 3, 8, 16]T for λ4 = 6.
Hence,
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−7 1 1 −9
⎢ 5 ⎥ −2x ⎢ 1 ⎥ 2x ⎢ −1 ⎥ 4x ⎢ 3 ⎥ 6x
y(x) = c1 ⎢
⎣ 8 ⎦e
⎥ + c2 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎣ 0 ⎦ e + c3 ⎣ 0 ⎦ e + c4 ⎣ 8 ⎦ e ,
−16 0 0 16

that is,
y1 = −7c1 e−2x + c2 e2x + c3 e4x − 9c4 e6x ,

y2 = 5c1 e−2x + c2 e2x − c3 e4x + 3c4 e6x ,

y3 = 8c1 e−2x + 8c4 e6x ,

y4 = −16c1 e−2x + 16c4 e6x .

From the given initial conditions, we have

1 = y1 (0) = −7c1 + c2 + c3 − 9c4 ,

1 = y2 (0) = 5c1 + c2 − c3 + 3c4 ,

2 = y3 (0) = 8c1 + 8c4 ,

−1 = y4 (0) = −16c1 + 16c4 .

Solving the above linear system, one has c1 = 5/32 , c2 = 23/16 , c3 = 3/2 , c4 = 3/32 . Hence,
the solution satisfying the initial conditions is

y1 = −(35/32) e^{−2x} + (23/16) e^{2x} + (3/2) e^{4x} − (27/32) e^{6x} ,
y2 = (25/32) e^{−2x} + (23/16) e^{2x} − (3/2) e^{4x} + (9/32) e^{6x} ,
y3 = (5/4) e^{−2x} + (3/4) e^{6x} ,
y4 = −(5/2) e^{−2x} + (3/2) e^{6x} .

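These constants can also be checked numerically; the following sketch (assuming NumPy is available) simply solves the 4 × 4 linear system written above.

import numpy as np

M = np.array([[ -7.0, 1.0,  1.0, -9.0],
              [  5.0, 1.0, -1.0,  3.0],
              [  8.0, 0.0,  0.0,  8.0],
              [-16.0, 0.0,  0.0, 16.0]])
b = np.array([1.0, 1.0, 2.0, -1.0])
print(np.linalg.solve(M, b))   # [0.15625  1.4375  1.5  0.09375], i.e. 5/32, 23/16, 3/2, 3/32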
Example 12.14 Find the general solution of the system

y1′ = y1 + 2y2 ,
y2′ = 3y1 + 2y2 ,
y3′ = 3y1 − 2y2 + 4y3 .

We write the system in its matrix form


⎡ ⎤ ⎡ ⎤⎡ ⎤
y1 (x) 1 2 0 y1 (x)
y (x) = ⎣ y2 (x) ⎦ = ⎣ 3 2 0 ⎦ ⎣ y2 (x) ⎦ .
y3 (x) 3 −2 4 y3 (x)

The coefficient matrix of the system


⎡ ⎤
1 2 0
⎣3 2 0⎦
3 −2 4

has two distinct real eigenvalues {−1, 4}, in particular the algebraic multiplicity of
λ = 4 is equal to 2. The corresponding eigenvectors are as follows:
• X 1 = [2, 3, 0]t and X 2 = [0, 0, 1]t for λ = 4.
• X 3 = [1, −1, −1]t for λ2 = −1.
Thus, the geometric multiplicity of λ = 4 is equal to the algebraic one, that is, the
matrix is diagonalizable. So, the general solution has the following form:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 0 −1
y(x) = c1 ⎣ 3 ⎦ e4x + c2 ⎣ 0 ⎦ e4x + c3 ⎣ 1 ⎦ e−x ,
0 1 1

that is,
y1 = 2c1 e4x − c3 e−x ,

y2 = 3c1 e4x + c3 e−x ,

y3 = c2 e4x + c3 e−x .
450 12 Ordinary Differential Equations and Linear …

Example 12.15 Find the general solution of the system

y1′ = 3y1 + y2 − 4y3 ,
y2′ = 3y1 + 4y2 − 3y3 ,
y3′ = −y1 + 4y2 .

The coefficient matrix of the system is


⎡ ⎤
3 1 −4
⎣ 3 4 −3 ⎦ .
−1 4 0

It has one real eigenvalue λ = −1 and two complex conjugate eigenvalues μ =


4 + 3i, μ = 4 − 3i. It is clear that the matrix is diagonalizable.
The eigenspace corresponding to λ = −1 is generated by the eigenvector X 1 =
[1, 0, 1]t .
To find the part of the general solution associated with the pair of complex conju-
gate eigenvalues, it is sufficient to take only one of them and find the associated
eigenvector. For example, if we consider μ = 4 + 3i and solve the system
⎡ ⎤⎡ ⎤ ⎡ ⎤
3 − μ 1 −4 x1 0
⎣ 3 4 − μ −3 ⎦ ⎣ x2 ⎦ = ⎣ 0 ⎦ ,
−1 4 −μ x3 0

we get the complex eigenvector X 2 = [i, 1 + i, 1]t . Thus, the complex eigenvector
corresponding to μ = 4 − 3i is X 2 = [−i, 1 − i, 1]t . So, the general solution has the
following form:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 i −i
y(x) = c1 ⎣ 0 ⎦ e−x + c2 ⎣ 1 + i ⎦ e(4+3i)x + c3 ⎣ 1 − i ⎦ e(4−3i)x ,
1 1 1

that is,
y1 = c1 e−x + ic2 e(4+3i)x − ic3 e(4−3i)x ,

y2 = (1 + i)c2 e(4+3i)x + (1 − i)c3 e(4−3i)x ,

y3 = c1 e−x + c2 e(4+3i)x + c3 e(4−3i)x .


 
Recalling that, for a, b ∈ R, ea+ib = ea cos b + i sin b , we may write the above
solution as
  
y1 = c1 e−x + ic2 e4x cos 3x + i sin 3x − ic3 e4x (cos 3x − i sin 3x

= c1 e−x + ic2 e4x cos 3x − c2 e4x sin 3x − ic3 e4x cos 3x − c3 e4x sin 3x
   
= c1 e−x − (c2 + c3 )e4x sin 3x + i (c2 − c3 )e4x cos 3x ,

y2 = (1 + i)c2 e(4+3i)x + (1 − i)c3 e(4−3i)x


  
= (1 + i)c2 e4x cos 3x + i sin 3x + (1 − i)c3 e4x (cos 3x − i sin 3x

= c2 e4x cos 3x + ic2 e4x cos 3x + ic2 e4x sin 3x − c2 e4x sin 3x+

c3 e4x cos 3x − ic3 e4x cos 3x − ic3 e4x sin 3x − c3 e4x sin 3x
   
= (c2 + c3 )e cos 3x − sin 3x + i(c2 − c3 )e cos 3x + sin 3x ,
4x 4x

  
y3 = c1 e−x + c2 e4x cos 3x + i sin 3x + c3 e4x (cos 3x − i sin 3x
   
= c1 e−x + (c2 + c3 )e4x cos 3x + i (c2 − c3 )e4x sin 3x .

They give us the complex solution


y(x) = [ c1 e^{−x} − (c2 + c3) e^{4x} sin 3x + i (c2 − c3) e^{4x} cos 3x ,
         (c2 + c3) e^{4x} (cos 3x − sin 3x) + i (c2 − c3) e^{4x} (cos 3x + sin 3x) ,
         c1 e^{−x} + (c2 + c3) e^{4x} cos 3x + i (c2 − c3) e^{4x} sin 3x ]^T .

Notice that every solution of the form (12.11) will be a linear combination of the
special solutions
X 1 e λ1 x , . . . , X n e λn x

each of which must be interpreted as the product of the variable scalar eλi x and the
constant vector X i , for any eigenvalue λi and corresponding eigenvector X i . Actually,
any particular solution X i eλi x can be obtained from the general one, assigning the
values ci = 1 and c j = 0 for any j = i.
The previously mentioned method allows us to solve the system (12.10) whenever
the associated coefficient matrix A is diagonalizable. The difficulty arises when there
exist some eigenvalues of A whose geometric multiplicity is strictly less than the
algebraic one. In this case, A has fewer than n linearly independent eigenvectors and

it is not diagonalizable. Nevertheless, when the matrix A has real eigenvalues, the
eigenvectors and generalized eigenvectors form a basis for Rn . From this basis, we
may construct the complete solution of system (12.10), by using a similar argument
as in Theorem 12.12.

Theorem 12.16 Let y1(x), . . . , yn(x) be differentiable functions of the scalar parameter x and A = [aij]_{n×n} a given Jordanizable scalar matrix such that
⎡ ⎤ ⎡ ⎤
y1 (x) ⎡ ⎤ y (x)
⎢ y  (x) ⎥ a11 a12 . . . a1n ⎢ 1 ⎥
⎢ 2 ⎥ ⎢ a21 a22 . . . a2n ⎥ ⎢ y2 (x) ⎥
⎢ ... ⎥ = ⎢ ⎥⎢ ... ⎥. (12.19)
⎢ ⎥ ⎣... ... ... ... ⎦⎢ ⎥
⎣ ... ⎦ ⎣ ... ⎦
an1 an2 . . . ann
yn (x) yn (x)

Assume that ⎡ ⎤
Jn 1 (λ1 )
⎢ .. ⎥
⎢ . ⎥
⎢ ⎥
⎣ ⎦
Jnr (λr )

is the Jordan canonical form of matrix A, where eigenvalues λ1 , . . . , λr are not


necessarily distinct and
• Jni (λi ) is the Jordan block associated with eigenvalue λi (for any i = 1, . . . , r );
• n 1 , . . . , n r are the orders of Jordan blocks Jn 1 (λ1 ), . . . , Jnr (λr ), respectively, with
r
n i = n;
i=1
• {X i,1 , . . . , X i,ni } is the chain of generalized eigenvectors generating the n i × n i
Jordan block associated with λi (for any i = 1, . . . , r ).
Then, the set
 
{ Σ_{j=1}^{k} (x^{k−j}/(k−j)!) e^{λi x} X_{i,j}  :  i = 1, . . . , r,  k = 1, . . . , n_i }

is a fundamental system of solutions and any solution of system (12.19) is a vector-


valued function having the form


y(x) = Σ_{i=1}^{r} Σ_{k=1}^{n_i} c_{i,k} Σ_{j=1}^{k} (x^{k−j}/(k−j)!) e^{λi x} X_{i,j}          (12.20)

for n arbitrary constants ci,k (i = 1, . . . , r and k = 1, . . . , n i ).

Proof By our main assumptions, there exists an n × n invertible matrix P =


pi j n×n such that
⎡ ⎤
Jn 1 (λ1 )
⎢ .. ⎥
⎢ . ⎥
P −1 A P = ⎢ ⎥.
⎣ ⎦
Jnr (λr )

We recall that column vectors from P coincide with a Jordan basis for Rn and
represent a set of n linearly independent generalized eigenvectors of A. Any Jordan
block Jn k (λk ) of size n k is associated with a subset of n k Jordan generators, that
is, a chain of n k linearly independent generalized eigenvectors corresponding to the
eigenvalue λk . Moreover, the first vector in the chain is exactly an eigenvector of A
corresponding to λk . We may write

P = [ X 1,1 , . . . , X 1,n 1 | . . . . . . | X r,1 , . . . , X r,nr ].


     
chain related to Jn1 chain related to Jnr

Introducing the vector-valued function


⎡ ⎤
g1,1 (x)
⎢ ... ⎥
⎢ ⎥
⎢ g1,n 1 (x) ⎥
⎢ ⎥
⎢ ... ⎥

g(x) = ⎢ ⎥

⎢ ... ⎥
⎢ gr,1 (x) ⎥
⎢ ⎥
⎣ ... ⎦
gr,nr (x)

for gi, j differentiable functions of the variable x and defined by g(x) = P −1 y(x),
one can see that solution of (12.19) is given by

y(x) = Pg(x)

= X 1,1 g1,1 (x) + · · · + X 1,n 1 g1,n 1 (x)+

······
(12.21)
+X r,1 gr,1 (x) + · · · + X r,nr gr,nr (x)


r 
ni
= X i, j gi, j (x).
i=1 j=1

As in the proof of Theorem 12.12, we arrive at

g (x) = P −1 A Pg(x),

that is, ⎡ ⎤ ⎡ ⎤

g1,1 (x) g1,1 (x)
⎢ ... ⎥ ⎡ ⎤⎢ ⎥
⎢  ⎥ ⎢ ... ⎥
⎢ g (x) ⎥ Jn 1 (λ1 ) ⎢ g1,n 1 (x) ⎥
⎢ 1,n 1 ⎥ ⎥⎢ ⎥
⎢ ... ⎥ ⎢ .. ⎥⎢ ... ⎥
⎢ ⎥=⎢ . ⎢ ⎥.
⎢ ... ⎥ ⎢ ⎥⎢
⎦⎢ ... ⎥
(12.22)
⎢  ⎥ ⎣ ⎥
⎢ g (x) ⎥ ⎢ ⎥
⎢ r,1 ⎥ Jnr (λr ) ⎢ gr,1 (x) ⎥
⎣ ... ⎦ ⎣ ... ⎦
 gr,nr (x)
gr,n r
(x)

Relation (12.22) produces r different systems of linear first-order ordinary differential


equations. More precisely, we get one and only one system for every diagonal Jordan
block. Notice that, given any diagonal Jordan block Jn k (λk ) of size n k , the n k unknown
functions gi, j (and their first derivatives gi, j ) that are involved in the associated linear
system do not occur in any other system related to any other Jordan block. Thus, the
above-mentioned systems can be solved independently of each other. By virtue of
this independence, we’ll now proceed to determine the solutions of any one of them.
To simplify the exposition, we denote by

Jk any Jordan block of order k corresponding to eigenvalue λ;


{X 1 , . . . , X k } the chain of corresponding eigenvectors; (12.23)
{g1 , . . . , gk } the unknown functions involved in the associated system.

Solving the system


⎡ ⎤ ⎡ ⎤⎡ ⎤
g1 (x) λ 1 g1 (x)
⎢ g  (x) ⎥ ⎢ · · · ⎥ ⎢ g2 (x) ⎥
⎢ 2 ⎥ ⎢ ⎥⎢ ⎥,
⎣ ... ⎦ = ⎣ λ 1⎦⎣ ... ⎦
(12.24)
gk (x) λ gk (x)

we get
gi = λgi + gi+1 for i = 1, . . . , k − 1;
(12.25)
gk = λgk .

We integrate starting from the last equation (see Example 12.1)

gk (x) = ck eλx for arbitrary constant ck .

Then, substitution of gk leads to



gk−1 (x) = λgk−1 (x) + gk (x) = λgk−1 (x) + ck eλx

having solution (see Example 12.3)



gk−1(x) = ck−1 e^{λx} + ck x e^{λx} for arbitrary constants ck−1 , ck .

Analogously,

gk−2 (x) = λgk−2 (x) + gk−1 (x) = λgk−2 (x) + ck−1 eλx + ck xeλx

whose integral is (see again Example 12.3)

x 2 λx
gk−2 (x) = ck−2 eλx + ck−1 xeλx + ck e for arbitrary constants ck−2 , ck−1 , ck .
2
Continuing this backward substitution process, we arrive at the solution of system
(12.25), more precisely,


g_j(x) = Σ_{h=j}^{k} c_h (x^{h−j}/(h−j)!) e^{λx} ,   j = 1, . . . , k,   for arbitrary constants c1 , . . . , ck .          (12.26)
Going back to the general case, the unknown functions {g_{i,1} , . . . , g_{i,n_i}} involved in the system associated with J_{n_i} and related to the eigenvalue λi (for any i = 1, . . . , r) can be determined as follows:


g_{i,j}(x) = Σ_{h=j}^{n_i} c_{i,h} (x^{h−j}/(h−j)!) e^{λi x} ,   j = 1, . . . , n_i,   for arbitrary constants c_{i,h} .          (12.27)
Substitution of (12.27) in (12.21) gives


y(x) = Σ_{i=1}^{r} Σ_{j=1}^{n_i} X_{i,j} g_{i,j}(x)
     = Σ_{i=1}^{r} Σ_{j=1}^{n_i} X_{i,j} ( Σ_{h=j}^{n_i} c_{i,h} (x^{h−j}/(h−j)!) e^{λi x} )          (12.28)
     = Σ_{i=1}^{r} Σ_{h=1}^{n_i} c_{i,h} e^{λi x} ( Σ_{j=1}^{h} (x^{h−j}/(h−j)!) X_{i,j} ) ,

where
 
{ Σ_{j=1}^{h} (x^{h−j}/(h−j)!) e^{λi x} X_{i,j}  :  i = 1, . . . , r,  h = 1, . . . , n_i }          (12.29)

is a set of n solutions to the initial system. We may label the elements of this set as follows:

s_{i,h}(x) = Σ_{j=1}^{h} (x^{h−j}/(h−j)!) e^{λi x} X_{i,j} ,   i = 1, . . . , r,   h = 1, . . . , n_i ,

and denote by S the n × n matrix-valued function whose columns are precisely the
coordinates of any si,h (x):
 
S(x) = s1,1 (x) . . . s1,n1 (x) . . . . . . sr,1 (x) . . . sr,nr (x) .

To show that (12.29) is a fundamental system of solutions, we now prove that it is a


linearly independent set of solutions.
To do this, we see that S(x) = PB(x), where B(x) is matrix-valued function having
the following block diagonal form:
⎡ ⎤
Bn1 (x)
⎢ .. ⎥
⎢ . ⎥
B(x) = ⎢ ⎥
⎣ ⎦
Bnr (x)

for
B_{n_i}(x) =
[ e^{λi x}   x e^{λi x}   (x^2/2) e^{λi x}   (x^3/3!) e^{λi x}   . . .   (x^{n_i−1}/(n_i−1)!) e^{λi x} ]
[    0         e^{λi x}      x e^{λi x}       (x^2/2) e^{λi x}   . . .   (x^{n_i−2}/(n_i−2)!) e^{λi x} ]
[    0            0            e^{λi x}          x e^{λi x}      . . .   (x^{n_i−3}/(n_i−3)!) e^{λi x} ]
[   ...          ...             ...                ...          . . .              ...                ]
[    0            0               0                  0           . . .            e^{λi x}            ]

i.e., the upper triangular n_i × n_i matrix whose (p, q) entry, for q ≥ p, equals (x^{q−p}/(q−p)!) e^{λi x}.

By the fact that P is the constant invertible matrix whose columns are the coordinates
of linearly independent generalized eigenvectors of A, and since the determinant of
B(x) is a function of the variable x which is trivially nowhere zero, we arrive at
the conclusion that S(x) is an invertible matrix-valued function, for any x ∈ R. This
means that its rank is equal to n and its columns are linearly independent, as required.

Remark 12.17 Returning to the simplified case (12.23), we can assert that, for any
Jordan block Jk of length k and related to the eigenvalue λ, a set of k linearly
independent solutions is given by
s1(x) = e^{λx} X1 ;      s2(x) = e^{λx} ( x X1 + X2 ) ;
s3(x) = e^{λx} ( (x^2/2) X1 + x X2 + X3 ) ;      s4(x) = e^{λx} ( (x^3/3!) X1 + (x^2/2) X2 + x X3 + X4 ) ;
. . .      . . .      . . .      . . .
s_{k−1}(x) = e^{λx} ( (x^{k−2}/(k−2)!) X1 + · · · + x X_{k−2} + X_{k−1} ) ;
s_k(x) = e^{λx} ( (x^{k−1}/(k−1)!) X1 + · · · + (x^2/2) X_{k−2} + x X_{k−1} + X_k ) ;          (12.30)
where {X 1 , . . . , X k } is the chain of generalized eigenvectors associated with eigen-
value λ and generating Jk . Thus, the contribution to the general solution (12.21) of
system (12.19) has the form


Σ_{t=1}^{k} c_t s_t(x) = Σ_{t=1}^{k} c_t e^{λx} Σ_{j=1}^{t} (x^{t−j}/(t−j)!) X_j ,          (12.31)

where c1 , . . . , ck are arbitrary constants. So, a fundamental system of solutions can


be obtained by the union of sets of linearly independent solutions of the form (12.30),
corresponding to all Jordan blocks. The general solution (12.21) is then given by the
total sum of particular solutions of the form (12.31), each of which is obtained from
a different Jordan block of the matrix P −1 A P.

Let us finally summarize the method for solving a system of the form (12.19):
• Find eigenvalues {λ1 , . . . , λr } of A.
• For any Jordan block Jk (λ) of order k (related to some eigenvalue λ), find the
corresponding chain of k generalized eigenvectors {X 1 , . . . , X k }.
• Construct the particular solution (12.31) corresponding with Jk , that is,

c1 eλx X 1 +
 
λx
c2 e x X1 + X2 +
 2 
x
c3 eλx X1 + x X2 + X3 +
2
(12.32)
··· ··· ··· ··· +
 k−2 
x
ck−1 eλx X 1 + · · · + x X k−2 + X k−1 +
(k − 2)!
 k−1 
x x2
ck eλx X1 + · · · + X k−2 + x X k−1 + X k .
(k − 1)! 2

• Finally, add up all the particular solutions of the form (12.32).
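The whole recipe can also be carried out symbolically. The sketch below (assuming SymPy is available) uses jordan_form to obtain P and J with A = P J P^{−1}; the matrix e^{Jx} is block diagonal with exactly the blocks B_{n_i}(x) appearing in the proof of Theorem 12.16, so P e^{Jx} P^{−1} y(0) is the solution with the prescribed initial value. The matrix chosen here is the one of Example 12.18 below.

import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[2, -1, 1, 1],
               [1,  4, 0, 1],
               [0,  0, 4, 1],
               [0,  0, 1, 4]])
P, J = A.jordan_form()                 # columns of P: eigenvectors and generalized eigenvectors
y0 = sp.Matrix([1, 0, 0, 1])
y = sp.simplify(P * (J * x).exp() * P.inv() * y0)   # solution of y' = A y with y(0) = y0
print(y)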


458 12 Ordinary Differential Equations and Linear …

Example 12.18 Solve the system

y1′ = 2y1 − y2 + y3 + y4 ,
y2′ = y1 + 4y2 + y4 ,
y3′ = 4y3 + y4 ,
y4′ = y3 + 4y4 ,

and find a solution that satisfies the initial conditions

y1 (0) = 1, y2 (0) = 0, y3 (0) = 0, y4 (0) = 1.

The coefficient matrix of the system


⎡ ⎤
2 −1 1 1
⎢1 4 0 1⎥
A=⎢
⎣0

0 4 1⎦
0 0 1 4

has two distinct real eigenvalues λ1 = 3 and λ2 = 5. The eigenvalue λ1 = 3 has


algebraic multiplicity equal to 3, but its geometric multiplicity is equal to 1. Thus,
the matrix is not diagonalizable. However, we may obtain its Jordan canonical form,
that is, ⎡ ⎤
3100
⎢0 3 1 0⎥
A = ⎢ ⎥
⎣0 0 3 0⎦.
0005

To arrive at the general solution of the system, we now find the eigenvectors and
generalized eigenvectors of matrix A.
Since there exists only one Jordan block of order 3 with eigenvalue λ1 = 3, we need
to find generalized eigenvectors corresponding to λ1 and having exponents 1, 2, 3.
Hence, we must solve the homogeneous linear systems associated with matrices
⎡ ⎤
−1 −1 1 1
⎢ 1 1 0 1⎥
(A − 3I ) = ⎢
⎣ 0 0 1 1⎦,

0 0 11
⎡ ⎤
0 0 1 0
⎢0 0 2 3⎥

(A − 3I ) = ⎣
2 ⎥
0 0 2 2⎦
0 0 2 2

and
12.2 System of Linear Homogeneous Ordinary Differential Equations 459
⎡ ⎤
0 0 1 1
⎢0 0 5 5⎥
(A − 3I ) = ⎢
3
⎣0
⎥.
0 4 4⎦
0 0 4 4

Starting with (A − 3I ), we see that solutions of


⎡ ⎤⎡ ⎤ ⎡ ⎤
−1 −1 1 1 x1 0
⎢ 1 1 0 1⎥ ⎢ x2 ⎥ ⎢ 0 ⎥
⎢ ⎥⎢ ⎥ = ⎢ ⎥
⎣ 0 0 1 1 ⎦ ⎣ x3 ⎦ ⎣ 0 ⎦
0 0 1 1 x4 0

are vectors from the space N1,λ1 = (1, −1, 0, 0).


For the generalized eigenvectors of exponent 2, we solve the system
⎡ ⎤⎡ ⎤ ⎡ ⎤
0 0 1 0 x1 0
⎢0 0 2 3⎥ ⎢ x2 ⎥ ⎢ 0 ⎥
⎢ ⎥⎢ ⎥ = ⎢ ⎥
⎣0 0 2 2 ⎦ ⎣ x3 ⎦ ⎣ 0 ⎦
0 0 2 2 x4 0

and find N2,λ1 = (1, 0, 0, 0), (0, 1, 0, 0).


Then, for generalized eigenvectors of exponent 3, by the system
⎡ ⎤⎡ ⎤ ⎡ ⎤
0 0 1 1 x1 0
⎢0 0 5 5⎥ ⎢ x2 ⎥ ⎢ 0 ⎥
⎢ ⎥⎢ ⎥ = ⎢ ⎥,
⎣0 0 4 4 ⎦ ⎣ x3 ⎦ ⎣ 0 ⎦
0 0 4 4 x4 0

we get N3,λ1 = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, −1).


To construct the chain of generalized eigenvectors {X 1 , X 2 , X 3 }, we start from X 3 =
(0, 0, 1, −1) ∈ N3,λ1 \ N2,λ1 . Thus,
X2 = (A − 3I )X3 = [0, −1, 0, 0]^T

and

X1 = (A − 3I )X2 = [1, −1, 0, 0]^T .

Finally, we compute the eigenvector corresponding to λ2 = 5, whose multiplicity is


equal to one. The homogeneous system associated with (A − 5I ) is
⎡ ⎤⎡ ⎤ ⎡ ⎤
−3 −1 1 1 x1 0
⎢ 1 −1 0 1 ⎥ ⎢ x2 ⎥ ⎢ 0 ⎥
⎢ ⎥⎢ ⎥ = ⎢ ⎥
⎣ 0 0 −1 1 ⎦ ⎣ x3 ⎦ ⎣ 0 ⎦
0 0 1 −1 x4 0

having solution Y1 = α(1, 5, 4, 4), for any α ∈ R. Therefore, the Jordan basis with
respect to which we have the Jordan canonical form A is {X 1 , X 2 , X 3 , Y1 } and the
general solution of the original system of differential equations is
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1  1 0 
⎢ −1 ⎥ ⎢ −1 ⎥ ⎢ −1 ⎥
c1 e3x ⎢ ⎥ ⎢ ⎥ ⎢
⎣ 0 ⎦ + c2 e x ⎣ 0 ⎦ + ⎣ 0 ⎦ +
3x ⎥

0 0 0
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
 2 1 0 0  1
x ⎢ −1 ⎥ ⎢ −1 ⎥ ⎢ 0 ⎥ ⎢ ⎥
c3 e3x ⎢ ⎥+x⎢ ⎥+⎢ ⎥ + c4 e5x ⎢ 5 ⎥ ,
2 ⎣ 0 ⎦ ⎣ 0 ⎦ ⎣ 1 ⎦ ⎣ 4 ⎦
0 0 −1 4

that is,
2
y1 (x) = c1 e3x + c2 xe3x + c3 x2 e3x + c4 e5x ,
2
y2 (x) = −c1 e3x − c2 (xe3x + e3x ) − c3 ( x2 e3x + xe3x ) + 5c4 e5x ,

y3 (x) = c3 e3x + 4c4 e5x ,

y4 (x) = −c3 e3x + 4c4 e5x .

From the given initial conditions, we have

1 = y1 (0) = c1 + c4 ,

0 = y2 (0) = −c1 − c2 + 5c4 ,

0 = y3 (0) = c3 + 4c4 ,

1 = y4 (0) = −c3 + 4c4 ,

whose solutions are


c1 = 7/8 ,   c2 = −1/4 ,   c3 = −1/2 ,   c4 = 1/8 .
So the solution satisfying the initial conditions is

y1(x) = (7/8) e^{3x} − (1/4) x e^{3x} − (1/2)(x^2/2) e^{3x} + (1/8) e^{5x} ,
y2(x) = −(7/8) e^{3x} + (1/4)(x e^{3x} + e^{3x}) + (1/2)((x^2/2) e^{3x} + x e^{3x}) + (5/8) e^{5x} ,
y3(x) = −(1/2) e^{3x} + (1/2) e^{5x} ,
y4(x) = (1/2) e^{3x} + (1/2) e^{5x} .

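The chain just constructed is easy to double-check numerically (a small sketch assuming NumPy is available): applying A − 3I should move X3 to X2, X2 to X1 and X1 to the zero vector.

import numpy as np

A = np.array([[2.0, -1.0, 1.0, 1.0],
              [1.0,  4.0, 0.0, 1.0],
              [0.0,  0.0, 4.0, 1.0],
              [0.0,  0.0, 1.0, 4.0]])
N = A - 3.0 * np.eye(4)
X3 = np.array([0.0, 0.0, 1.0, -1.0])
X2 = N @ X3                       # (0, -1, 0, 0)
X1 = N @ X2                       # (1, -1, 0, 0)
print(X2, X1, N @ X1)             # N @ X1 is zero, so X1 is a genuine eigenvector for lambda = 3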
Example 12.19 Find the general solution of the system

y1′ = 3y1 + 2y2 − y3 − y4 ,
y2′ = 2y2 + y4 ,
y3′ = y1 + y2 + y3 + y4 ,
y4′ = 2y4 .

The coefficient matrix of the system


⎡ ⎤
3 2 −1 −1
⎢0 2 0 1 ⎥
A=⎢
⎣1

1 1 1 ⎦
0 0 0 2

has only one eigenvalue λ = 2, with algebraic multiplicity equal to 4, but its geometric
multiplicity is equal to 1. Thus, the matrix has the following Jordan canonical form:
⎡ ⎤
2 1 0 0
⎢0 2 1 0⎥
 ⎢
A =⎣ ⎥.
0 0 2 1⎦
0 0 0 2

To arrive at the general solution of the system, we now find the eigenvectors and
generalized eigenvectors corresponding to λ and having exponents 1, 2, 3, 4.
Starting with (A − 2I ), we must solve the system
⎡ ⎤⎡ ⎤ ⎡ ⎤
1 2 −1 −1 x1 0
⎢0 0 0 1 ⎥ ⎢ x2 ⎥ ⎢ 0 ⎥
⎢ ⎥⎢ ⎥ = ⎢ ⎥.
⎣1 1 −1 1 ⎦ ⎣ x3 ⎦ ⎣ 0 ⎦
0 0 0 0 x4 0

It is easy to see that N1,λ = (1, 0, 1, 0).


462 12 Ordinary Differential Equations and Linear …

For the generalized eigenvectors of exponent 2, we look at the homogeneous system


associated with the matrix (A − 2I )2 , that is,
⎡ ⎤⎡ ⎤ ⎡ ⎤
0 1 0 0 x1 0
⎢0 0 0 0 ⎥ ⎢ x2 ⎥ ⎢ 0 ⎥
⎢ ⎥⎢ ⎥ = ⎢ ⎥
⎣0 1 0 −1 ⎦ ⎣ x3 ⎦ ⎣ 0 ⎦
0 0 0 0 x4 0

and find N2,λ = (1, 0, 0, 0), (0, 0, 1, 0).


Then, for generalized eigenvectors of exponent 3, we solve the homogeneous system
associated with (A − 2I )3 , that is,
⎡ ⎤⎡ ⎤ ⎡ ⎤
0 0 0 1 x1 0
⎢0 0 0 0⎥ ⎢ x2 ⎥ ⎢ 0 ⎥
⎢ ⎥⎢ ⎥ = ⎢ ⎥
⎣0 0 0 1 ⎦ ⎣ x3 ⎦ ⎣ 0 ⎦
0 0 0 0 x4 0

and obtain N3,λ = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0).


Finally, since (A − 2I )4 = 0, N4,λ = R4 . To complete a chain of generalized eigen-
vectors {X 1 , X 2 , X 3 , X 4 }, we start from X 4 = (0, 0, 0, 1) ∈ N4,λ \ N3,λ . Thus,
⎤ ⎡
−1
⎢ 1 ⎥
X 3 = (A − 2I )X 4 = ⎢ ⎥
⎣ 1 ⎦,
0
⎤⎡
0
⎢ 0 ⎥
X 2 = (A − 2I )X 3 = ⎢ ⎥
⎣ −1 ⎦
0

and

X1 = (A − 2I )X2 = [1, 0, 1, 0]^T .

Therefore, the Jordan basis with respect to which we have the Jordan canonical form
A is {X 1 , X 2 , X 3 , X 4 } and the general solution of the original system of differential
equations is
12.2 System of Linear Homogeneous Ordinary Differential Equations 463
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1  1 0 
⎢0⎥ ⎢0⎥ ⎢ 0 ⎥
c1 e2x ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎣ 1 ⎦ + c2 e x ⎣ 1 ⎦ + ⎣ −1 ⎦ +
2x

0 0 0
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
 2 1 0 −1 
x ⎢ ⎥ ⎢ ⎥ ⎢
⎢0⎥ + x ⎢ 0 ⎥ + ⎢ 1 ⎥ +

c3 e2x ⎣ ⎦ ⎣ ⎦ ⎣
2 1 −1 1 ⎦
0 0 0
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
 3 1 0 −1 0 
⎢ ⎥ x2 ⎢ 0 ⎥
2x x ⎢ 0 ⎥
⎢ 1 ⎥ ⎢0⎥
+ ⎢ ⎥+x⎢ ⎥ ⎢ ⎥
c4 e
6 ⎣1⎦ 2 ⎣ −1 ⎦ ⎣ 1 ⎦ + ⎣0⎦ ,
0 0 0 1

that is,
y1(x) = ( c1 + c2 x + c3 (x^2/2 − 1) + c4 (x^3/6 − x) ) e^{2x} ,
y2(x) = ( c3 + c4 x ) e^{2x} ,
y3(x) = ( c1 + c2 (x − 1) + c3 (x^2/2 − x + 1) + c4 (x^3/6 − x^2/2 + x) ) e^{2x} ,
y4(x) = c4 e^{2x} .

Example 12.20 Find the general solution of the system

y1′ = 2y1 + 4y2 − 8y3 ,
y2′ = 4y3 ,
y3′ = −y2 + 4y3 .

The coefficient matrix of the system


⎡ ⎤
2 4 −8
A = ⎣0 0 4 ⎦
0 −1 4

has only one eigenvalue λ = 2, with algebraic multiplicity equal to 3, but its geometric
multiplicity is equal to 2. Thus, the matrix has the following Jordan canonical form:
⎡ ⎤
210
A = ⎣ 0 2 0 ⎦ .
002

To arrive at the general solution of the system, we firstly find the generalized eigenvec-
tors corresponding to generating the Jordan block of order 2. Starting with (A − 2I ),
464 12 Ordinary Differential Equations and Linear …

we must solve the system


⎡ ⎤⎡ ⎤ ⎡ ⎤
0 4 −8 x1 0
⎣ 0 −2 4 ⎦ ⎣ x2 ⎦ = ⎣ 0 ⎦ .
0 −1 2 x3 0

Then N1,λ = (1, 0, 0), (0, 2, 1).


Since (A − 2I )2 = 0, we may choose X 2 = (0, 0, 1) ∈ N2,λ \ N1,λ as generalized
eigenvector of exponent 2. To get a chain of generalized eigenvectors {X 1 , X 2 }, we
start from X 2 = (0, 0, 1). Thus,
⎤ ⎡
−8
X 1 = (A − 2I )X 2 = ⎣ 4 ⎦ .
2

For the chain of order 1, we can easily choose Y1 = (1, 0, 0) ∈ N1,λ . Therefore, the Jordan
basis with respect to which we have the Jordan canonical form A is {X 1 , X 2 , Y1 }
and the general solution of the original system of differential equations is
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−8  −8 0  1
c1 e2x ⎣ 4 ⎦ + c2 e2x x ⎣ 4 ⎦ + ⎣ 0 ⎦ + c3 e2x ⎣ 0 ⎦ ,
2 2 1 0

that is, 
y1 (x) = −8c1 − 8c2 x + c3 e2x ,

y2 (x) = 4c1 + 4c2 x e2x ,

y3 (x) = 2c1 + c2 (2x + 1) e2x .

Example 12.21 Find the general solution of the system

y1′ = −y3 − y4 ,
y2′ = −y4 ,
y3′ = y1 + y2 ,
y4′ = y2 .

The coefficient matrix of the system


⎡ ⎤
0 0 −1 −1
⎢0 0 0 −1 ⎥
A=⎢
⎣1

1 0 0 ⎦
0 1 0 0
12.2 System of Linear Homogeneous Ordinary Differential Equations 465

has two complex conjugate eigenvalues λ = ±i, each of which has algebraic mul-
tiplicity equal to 2, but geometric multiplicity equal to 1. Thus, the matrix has the
following Jordan canonical form
⎡ ⎤
i 1 0 0
⎢ 0 i 0 0 ⎥
A = ⎢
⎣0
⎥.
0 −i 1 ⎦
0 0 0 −i

To determine eigenvectors and generalized eigenvectors corresponding to both


λ = ±i, it is sufficient to take just one of them. So, for λ = −i, we have that the
associated eigenspace is generated by X 1 = [−1, 0, −i, 0]t and the chain of general-
ized eigenvectors is completed by X 2 = [0, −i, 0, 1]t . Thus, for λ = i, we obtain the
chain X 1 = [−1, 0, i, 0]t , X 2 = [0, i, 0, 1]t . Therefore, the general complex solution
of the original system of differential equations is
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−1  −1 0 

−i x ⎢ 0 ⎥
⎥ ⎢ 0 ⎥ ⎢ −i ⎥
c1 e ⎣ + c2 e −i x ⎢
x⎣ ⎥ + ⎢ ⎥ +
−i ⎦ −i ⎦ ⎣ 0 ⎦
0 0 1
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−1  −1 0 

ix ⎢ 0 ⎥
⎥ ⎢ 0 ⎥ ⎢i ⎥
c3 e ⎣ + c4 e x ⎣
ix ⎢ ⎥ + ⎢ ⎥ ,
i ⎦ i ⎦ ⎣0⎦
0 0 1

that is,
⎡⎤ ⎡ ⎤ ⎡ ⎤
−1  −1 0 
 ⎢ 0 ⎥   ⎢ 0 ⎥ ⎢ −i ⎥
c1 cos x − i sin x ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎣ −i ⎦ + c2 cos x − i sin x x ⎣ −i ⎦ + ⎣ 0 ⎦ +
0 0 1
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−1  −1 0 
 ⎢ 0 ⎥   ⎢ 0 ⎥ ⎢i ⎥
c3 cos x + i sin x ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎣ i ⎦ + c4 cos x + i sin x x ⎣ i ⎦ + ⎣ 0 ⎦
0 0 1

and

y1 (x) = −(c1 + c3 ) cos x − (c2 + c4 )x cos x + i(c1 − c3 ) sin x + i(c2 − c4 )x sin x,

y2 (x) = −(c2 + c4 ) sin x − i(c2 − c4 ) cos x,

y3 (x) = −(c1 + c3 ) sin x − (c2 + c4 )x sin x − i(c1 − c3 ) cos x − i(c2 − c4 )x cos x,

y4 (x) = (c2 + c4 ) cos x − i(c2 − c4 ) sin x.


466 12 Ordinary Differential Equations and Linear …

12.3 Real-Valued Solutions for Systems with Complex


Eigenvalues

Summarizing what is seen in the above brief presentation, when solving

y (x) = Ay(x), (12.33)

we know what has to be done. After determining eigenvalues, we have to compute


the corresponding eigenvectors and generalized eigenvectors, then the solution is in
the form of the linear combination (12.11) or (12.20) according to whether the coef-
ficient matrix is diagonalizable or Jordanizable, respectively. In case the coefficient
matrix A has some complex eigenvalues, we find a complex vector-valued function
that is the general complex solution of system (see Examples 12.15 and 12.21).
Even if complex functions find applications in several sectors, from engineering to
telecommunications and surgery, real functions are more appropriate for many other
purposes.
Now, assume that the real matrix A has two complex conjugate eigenvalues λ, λ, with
associated complex conjugate (generalized) eigenvectors X 1 , . . . , X k and
X 1 , . . . , X k , respectively. Corresponding to both the first sequence of vectors and
to the second one, we may construct two solutions of system s1 (x) and s2 (x). In
particular, these solutions come in a conjugate pair, that is, s2 (x) = s1 (x). Thus, the
real and imaginary parts of s1 (x) are precisely
   
  1   1
Re s1 (x) = s1 (x) + s1 (x) , I m s1 (x) = s1 (x) − s1 (x) .
2 2i

Therefore, the real and imaginary parts of s1 (x) are real solutions of the system,
because they are linear combinations of solutions. The arguments that were put
forward are all we need for obtaining the real solutions for systems with complex
eigenvalues.

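In code the construction is immediate; the helper below (an illustrative sketch assuming NumPy is available, with a name of our choosing) returns the two real solutions obtained from one complex eigenpair.

import numpy as np

def real_solutions_from_complex_pair(lam, X, x):
    """Real and imaginary parts of e^{lam x} X, each of which solves y' = A y when (lam, X)
    is a complex eigenpair of the real matrix A."""
    s = np.exp(lam * x) * X
    return s.real, s.imag

# the pair mu = 4 + 3i, X = (i, 1 + i, 1) used in Examples 12.15 and 12.22
re_part, im_part = real_solutions_from_complex_pair(4 + 3j, np.array([1j, 1 + 1j, 1 + 0j]), 0.1)
print(re_part, im_part)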
Example 12.22 Let’s get back to Example 12.15. We now would like to find the
general real solution of the system

y1′ = 3y1 + y2 − 4y3 ,
y2′ = 3y1 + 4y2 − 3y3 ,
y3′ = −y1 + 4y2 .

The coefficient matrix of the system has one real eigenvalue λ = −1 and two complex
conjugate eigenvalues μ = 4 + 3i, μ = 4 − 3i. It is diagonalizable.
The eigenspace corresponding to λ = −1 is generated by the eigenvector X 1 =
[1, 0, 1]t . The particular solution associated with λ = −1 is then
12.3 Real-Valued Solutions for Systems with Complex Eigenvalues 467
⎡ ⎤
1
s1 (x) = ⎣ 0 ⎦ e−x .
1

To find the part of the general real solution associated with the pair of complex
conjugate eigenvalues, it is sufficient to take only the real and imaginary parts of
eigenvector corresponding to one of them. For example, the complex eigenvector
corresponding to μ = 4 + 3i is X 2 = [i, 1 + i, 1]t = [0, 1, 1]t + i[1, 1, 0]t . Thus,
the real and imaginary parts of X 2 are, respectively,
   
Re X 2 = [0, 1, 1]t , I m X 2 = [1, 1, 0]t ,

and the associated particular real solution is


⎡ ⎤ ⎡ ⎤
 0 1 
s2 (x) = e(4+3i)x ⎣ 1 ⎦ + i ⎣ 1 ⎦ ,
1 0

that is, ⎡ ⎤ ⎡ ⎤
 0 1 
 
s2 (x) =e cos 3x + i sin 3x
4x ⎣ 1 + i 1⎦
⎦ ⎣
1 0
⎡ ⎤ ⎡ ⎤
− sin 3x cos 3x
=e4x ⎣ cos 3x − sin 3x ⎦ + ie4x ⎣ cos 3x + sin 3x ⎦ .
cos 3x sin 3x
 
Hence, the linear combination of s1(x), Re(s2(x)) and Im(s2(x)) gives us the general
real solution of the system, that is,
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 − sin 3x cos 3x
y(x) = c1 e−x ⎣ 0 ⎦ + c2 e4x ⎣ cos 3x − sin 3x ⎦ + c3 e4x ⎣ cos 3x + sin 3x ⎦
1 cos 3x sin 3x

for c1 , c2 , c3 arbitrary constants.

Example 12.23 In conclusion, let us return to Example 12.21 and find the general
real solution of the system
y1′ = −y3 − y4 ,
y2′ = −y4 ,
y3′ = y1 + y2 ,
y4′ = y2 .

The coefficient matrix of the system has two complex conjugate eigenvalues λ =
±i, each of which has algebraic multiplicity equal to 2, but geometric multiplicity
equal to 1. As pointed out above, to find the general real solution associated with
468 12 Ordinary Differential Equations and Linear …

the pair of complex conjugate eigenvalues, we take only the real and imaginary
parts of generalized eigenvectors corresponding to one of them, more precisely we
choose λ = −i. The chain of generalized eigenvectors associated with λ = −i is X1 =
[−1, 0, −i, 0]^t and X2 = [0, −i, 0, 1]^t . Therefore, the particular solution corresponding to
λ = −i is
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−1  −1 0 
⎢ 0 ⎥ ⎢ 0 ⎥ ⎢ −i ⎥
s(x) = e−i x ⎢ ⎥
⎣ −i ⎦ + e
−i x
x⎢ ⎥ ⎢ ⎥
⎣ −i ⎦ + ⎣ 0 ⎦ ,
0 0 1

that is,
⎤ ⎡ ⎡ ⎤ ⎡ ⎤
−1  −1 0 
 ⎢ 0 ⎥   ⎢ 0 ⎥ ⎢ −i ⎥
s(x) = cos x − i sin x ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎣ −i ⎦ + cos x − i sin x x ⎣ −i ⎦ + ⎣ 0 ⎦
0 0 1
⎡ ⎤ ⎡ ⎤
− cos x − x cos x sin x + x sin x
⎢ − sin x ⎥ ⎢ − cos x ⎥
=⎢ ⎥ ⎢ ⎥
⎣ − sin x − x sin x ⎦ + i ⎣ − cos x − x cos x ⎦ .
cos x − sin x

The linear combination of the real and imaginary parts of s(x) is then a real-valued solution
⎡ ⎤ ⎡ ⎤
− cos x − x cos x sin x + x sin x
⎢ − sin x ⎥ ⎢ − cos x ⎥
y(x) = c1 ⎢ ⎥ ⎢ ⎥
⎣ − sin x − x sin x ⎦ + c2 ⎣ − cos x − x cos x ⎦
cos x − sin x

for c1 , c2 arbitrary constants.

12.4 Homogeneous Differential Equations of nth Order

We now consider the linear, homogeneous differential equation with constant coef-
ficients of order n


y^{(n)}(x) + Σ_{i=1}^{n} a_i y^{(n−i)}(x) = 0,   where a1 , . . . , an are fixed real constants.          (12.34)
Any function z(x) defined in an interval I ⊆ R is a solution of (12.34) if
(1) z(x) is at least n-times differentiable in I ;
(2) z^{(n)}(x) + Σ_{i=1}^{n} a_i z^{(n−i)}(x) = 0, for any x ∈ I .

Let L be the differential operator of order n with constant coefficients defined by


L y = y^{(n)} + Σ_{i=1}^{n} a_i y^{(n−i)}          (12.35)

for any function y = y(x) at least n-times differentiable over I . Thus, L defines a
map C n (I ) → C 0 (I ) such that L y is a continuous function, for any complex-valued
function y = y(x) at least n-times differentiable over I . We trivially note that, for
any c1 , c2 ∈ C and y1 , y2 ∈ C n (I ),

L(c1 y1 + c2 y2 ) = c1 L y1 + c2 L y2 ,

that is, L is a linear operator. In particular, this means that if y1 , . . . , yk are k solutions
of (12.34), then any linear combination of them is a solution of (12.34). Therefore,
the set
V = {y ∈ C n (I ) : L y = 0}

is a complex vector space.

Theorem 12.24 The dimension of the vector space V is precisely equal to n.

Proof Consider the following set of initial values problems:

(P1 ) L y = 0, y(x0 ) = 1, y  (x0 ) = 0, . . . , y (n−1) (x0 ) = 0

(P2 ) L y = 0, y(x0 ) = 0, y  (x0 ) = 1, . . . , y (n−1) (x0 ) = 0


(12.36)
··· ······

(Pn ) L y = 0, y(x0 ) = 0, y  (x0 ) = 0, . . . , y (n−1) (x0 ) = 1

and assume that functions y1 (x), . . . , yn (x) are solutions of the problems (P1 ), . . . ,
(Pn ), respectively. Thus, y = c1 y1 + · · · + cn yn (ci ∈ R) is solution of the initial
value problem

L y = 0, y(x0 ) = c1 , y  (x0 ) = c2 , . . . , y (n−1) (x0 ) = cn .

Hence, if we suppose that c1 y1 + · · · + cn yn = 0, i.e., y = 0, it follows c1 = c2 =


· · · = cn = 0, that is, y1 , . . . , yn ∈ V are linearly independent functions.
Let now w(x) ∈ V be any solution of (12.34) and define the following function:

y(x) = w(x0 )y1 (x) + w  (x0 )y2 (x) + · · · + w (n−1) (x0 )yn (x),
470 12 Ordinary Differential Equations and Linear …

where x0 ∈ I is precisely the previously fixed point in the initial values problems
(P1 ), . . . , (Pn ). By the definition of y1 , . . . , yn , we observe that L y = 0 and also

y(x0 ) = w(x0 ), y  (x0 ) = w  (x0 ), . . . , y (n−1)(x0 ) = w (n−1)(x0 ) .

Therefore, y and w are solutions of the same initial value problem and this implies that
y(x) = w(x), for any x ∈ I . Therefore, the arbitrary solution w(x) ∈ V is a linear
combination of the linearly independent solutions y1 , . . . , yn . The arbitrariness of w
allows us to conclude that {y1 , . . . , yn } is a set of linearly independent generators for
the vector space V , that is, dim V = n.

At this point, before proceeding with the determination of methods to obtain the
solutions of (12.34), we recall how to check whether a set of known solutions is linearly
independent or dependent. To do this, we suppose {y1 , . . . , yn } is a set of solutions
for (12.34).
Definition 12.25 The Wronskian of {y1 , . . . , yn } is defined on the interval I to be
the determinant
 
           | y1(x)          y2(x)          · · ·   yn(x)          |
           | y1′(x)         y2′(x)         · · ·   yn′(x)         |
W(x) =     | y1″(x)         y2″(x)         · · ·   yn″(x)         | .
           | · · ·          · · ·          · · ·   · · ·          |
           | y1^{(n−1)}(x)  y2^{(n−1)}(x)  · · ·   yn^{(n−1)}(x)  |

Solutions {y1 , . . . , yn } are linearly independent if and only if W(x0) ≠ 0 at some


point x0 ∈ I .
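The Wronskian is easy to evaluate symbolically; the sketch below (assuming SymPy is available, and with an arbitrarily chosen triple of functions) builds the determinant of Definition 12.25 directly.

import sympy as sp

x = sp.symbols('x')
fs = [sp.exp(x), sp.exp(2 * x), x * sp.exp(2 * x)]
W = sp.Matrix(3, 3, lambda i, j: sp.diff(fs[j], x, i)).det().simplify()
print(W)   # exp(5*x): nonzero for every x, so the three functions are linearly independent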

The Reduction to a Linear System


A first approach for solving (12.34) is to show how we can reduce it to a system
of linear ordinary differential equations and solve it by using the above-discussed
method.
In fact, setting

y1 = y, y2 = y′ , y3 = y″ , . . . , yn = y^{(n−1)} , yn′ = y^{(n)} ,

we obtain the system of the form

y1′ = y2
y2′ = y3
. . .   . . .   . . .          (12.37)
yn−1′ = yn
yn′ = −an y1 − an−1 y2 − an−2 y3 − · · · − a1 yn


12.4 Homogeneous Differential Equations of nth Order 471

whose coefficient matrix is


⎡ ⎤
0 1 0 ... 0
⎢ 0 0 1 ... 0 ⎥
⎢ ⎥
A=⎢
⎢ ... ... ... ... ... ⎥
⎥. (12.38)
⎣ 0 0 0 ... 1 ⎦
−an −an−1 ··· −a1

Hence, if the vector-valued function f(x) = [y1 (x), . . . , yn (x)]T is the solution of
system (12.37), the solution of the original differential equation (12.34) is precisely
the first component y1 (x) of f(x).
The above matrix A is usually called the Frobenius companion matrix of the monic
polynomial


p(x) = x^n + a1 x^{n−1} + a2 x^{n−2} + · · · + an−1 x + an = x^n + Σ_{i=1}^{n} ai x^{n−i} .

Note that, if n is the degree of a monic polynomial, its companion matrix has order
n. For example, the 4 × 4 companion matrix of x 4 + 2x 3 − 3x 2 + x + 1 is
⎡ ⎤
0 1 0 0
⎢ 0 0 1 0 ⎥
⎢ ⎥.
⎣ 0 0 0 1 ⎦
−1 −1 3 −2

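Building the companion matrix from the coefficients is mechanical; the following sketch (assuming NumPy is available; the function name is ours) reproduces the 4 × 4 example above and recovers the roots of the polynomial as its eigenvalues.

import numpy as np

def companion(a):
    """Companion matrix (12.38) of the monic polynomial x^n + a[0] x^(n-1) + ... + a[n-1]."""
    n = len(a)
    C = np.zeros((n, n))
    C[:-1, 1:] = np.eye(n - 1)          # superdiagonal of ones
    C[-1, :] = -np.array(a[::-1])       # last row: -a_n, -a_{n-1}, ..., -a_1
    return C

C = companion([2.0, -3.0, 1.0, 1.0])    # x^4 + 2x^3 - 3x^2 + x + 1
print(C)
print(np.linalg.eigvals(C))             # the eigenvalues are precisely the roots of the polynomial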
The above discussions clearly show that it is important, from an analytic viewpoint, to
describe some properties of companion matrices. In particular, we focus our attention
in order to compute eigenvalues and eigenvectors of a companion matrix. To do this,
we prove some easy known results.

Theorem 12.26 Let A be the companion matrix of the polynomial


p(x) = x^n + a1 x^{n−1} + a2 x^{n−2} + · · · + an−1 x + an = x^n + Σ_{i=1}^{n} ai x^{n−i} .

Then the characteristic polynomial of A is (−1)n p(x). Moreover, the minimal poly-
nomial of A coincides with the characteristic one.

Proof We prove the first part of the theorem by induction on the order n of the matrix
(the degree of p(x)).
For n = 1, p(x) = x + a and A = [−a]. Hence, the characteristic polynomial of A
is −a − λ = (−1)1 (λ + a) = (−1)1 p(λ).
Suppose the assertion is true for any (n − 1) × (n − 1) companion matrix (with
n ≥ 2). Since A has order n, its characteristic polynomial is equal to the determinant
 
 −λ 1 0 ... 0 
 
 0 −λ 1 ... 0 
 
|A − λI | =  . . . . . . ... ... . . .  .
 0 0 ... −λ 1 
 
 −an −an−1 ... . . . −a1 − λ 

We compute |A − λI | by using the cofactor expansion with respect to the first col-
umn:  
 −λ 1 0 ... 0 
 
 0 −λ 1 . . . 0 
 

|A − λI | = (−λ)  . . . ... ... ... . . . 
 0 0 . . . −λ 1 
 
 −an−1 −an−2 . . . . . . −a1 − λ 

 
 1 0 0 ... 0 

 −λ 1 0 ... 0 

+(−1)n+1 (−an )  . . . ... ... ... . . .  .
 0 ... −λ 1 0 

 0 0 ... −λ 1 

Notice that the first determinant is equal to the characteristic polynomial of the
companion matrix of the polynomial

x n−1 + a1 x n−2 + · · · + an−2 x + an−1 .

So, by induction hypothesis, the first determinant is


 
(−1)n−1 λn−1 + a1 λn−2 + · · · + an−2 λ + an−1 .

Moreover, the second determinant is trivially equal to (−1)n+1 (−an ) = (−1)n (an ).
Hence,
 
|A − λI | = (−1)n−1 (−λ) λn−1 + a1 λn−2 + · · · + an−2 λ + an−1 + (−1)n (an )

 
= (−1)n λn + a1 λn−1 + · · · + an−2 λ2 + an−1 λ + an

as desired.
Let now λ0 be any eigenvalue of A and consider the matrix A − λ0 I . Since its
determinant must be zero, its rank is less than or equal to n − 1. On the other hand,
writing
12.4 Homogeneous Differential Equations of nth Order 473
⎡ ⎤
−λ0 1 0 ... 0
⎢ 0 −λ0 1 ... 0 ⎥
⎢ ⎥
A − λ0 I = ⎢
⎢ ... ... ... ... ... ⎥ ⎥
⎣ 0 0 ... −λ0 1 ⎦
−an −an−1 ... . . . −a1 − λ0

and deleting the first column and the last row, we obtain the (n − 1) × (n − 1) lower
triangular submatrix ⎡ ⎤
1 0 ... 0
⎢ −λ0 1 . . . 0 ⎥
⎢ ⎥
⎣ ... ... ... ...⎦
0 . . . −λ0 1

whose determinant is equal to 1. Thus, the rank of A − λ0 I is precisely equal to


n − 1. This is enough to conclude that the dimension of the eigenspace associated
with λ0 is equal to 1, that is, there will be only one Jordan block corresponding to λ0
(when we consider the canonical form of the matrix), and its dimension is equal to the
algebraic multiplicity of λ0 as the root of the characteristic polynomial. Repeating
this discussion for any eigenvalue of A, we can affirm that the minimal polynomial
of A coincides with its characteristic one.

Theorem 12.27 Let A be the companion matrix of the polynomial


p(x) = x^n + a1 x^{n−1} + a2 x^{n−2} + · · · + an−1 x + an = x^n + Σ_{i=1}^{n} ai x^{n−i} .

If λ is an eigenvalue of A, then its corresponding eigenspace is generated by the


eigenvector [1, λ, λ2 , . . . , λn−1 ]T .

Proof Assume that X = [x1 , . . . , xn ]t is the eigenvector corresponding to λ. Thus, X


is the solution of the homogeneous linear system whose coefficient matrix is A − λI :
⎡ ⎤⎡ ⎤
−λ 1 0 ... 0 x1
⎢ 0 −λ 1 ... 0 ⎥ ⎢ x2 ⎥
⎢ ⎥⎢ ⎥
⎢ ... ... ... ... ... ⎥ ⎢ ⎥
⎢ ⎥ ⎢ . . . ⎥ = 0.
⎣ 0 0 ... −λ 1 ⎦ ⎣...⎦
−an −an−1 ... . . . −a1 − λ xn

We have already proved that the rank of A − λI is equal to n − 1; more precisely,


the first n − 1 rows of the matrix are linearly independent. Hence, we may construct
the homogeneous linear system by using exactly these lines:

−λx1 + x2 = 0

−λx2 + x3 = 0
(12.39)
... ... ...

−λxn−1 + xn = 0.

The easy solution of (12.39) gives the required eigenvector.

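Theorem 12.27 can also be observed numerically (a brief sketch assuming NumPy is available): for any root λ of the polynomial, the vector (1, λ, λ², λ³) is mapped by the companion matrix to λ times itself.

import numpy as np

C = np.array([[ 0.0,  1.0, 0.0,  0.0],
              [ 0.0,  0.0, 1.0,  0.0],
              [ 0.0,  0.0, 0.0,  1.0],
              [-1.0, -1.0, 3.0, -2.0]])        # companion matrix of x^4 + 2x^3 - 3x^2 + x + 1
lam = np.roots([1.0, 2.0, -3.0, 1.0, 1.0])[0]  # one root of the polynomial
v = lam ** np.arange(4)                        # the vector (1, lam, lam^2, lam^3)
print(np.allclose(C @ v, lam * v))             # True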
Example 12.28 Solve the following initial value problem:


 
y‴ − 3y″ + 4y = 0,
y(0) = 1,   y′(0) = 0,   y″(0) = −1.

We firstly set

y1 = y, y2 = y′ , y3 = y″ , y3′ = y‴ .

Then we have the system


y1′ = y2 ,
y2′ = y3 ,          (12.40)
y3′ = −4y1 + 3y3 ,

whose coefficient matrix is ⎡ ⎤


0 10
A = ⎣ 0 0 1⎦.
−4 0 3

The matrix A has two distinct real eigenvalues: λ1 = −1 having algebraic multiplicity
equal to 1; λ2 = 2 having algebraic multiplicity equal to 2.
The eigenvector generating the null space of A + I is X 1 = (1, −1, 1) (that is, the
eigenvector associated with λ1 = −1).
Notice that, since λ2 = 2 has geometric multiplicity equal to 1, there exists one Jordan
block of order 2 generated by the generalized eigenvectors of λ2 . By easy computa-
tion, we get N1,λ2 = (1, 2, 4) and N2,λ2 = (1, 0, −4), (0, 1, 4). The correspond-
ing chain of generalized eigenvectors of order 2 is X 1 = (1, 2, 4), X 2 = (0, 1, 4).
Hence, the Jordan canonical form of A is
⎡ ⎤
21 0
A = ⎣ 0 2 0 ⎦
0 0 −1
12.4 Homogeneous Differential Equations of nth Order 475

relative to the Jordan basis {(1, 2, 4), (0, 1, 4), (1, −1, 1)}. Thus, a general solution
for the system (12.40) is
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1  1 0  1
c1 e2x ⎣ 2 ⎦ + c2 e2x x ⎣ 2 ⎦ + ⎣ 1 ⎦ + c3 e−x ⎣ −1 ⎦ ,
4 4 4 1

that is, 
y1 (x) = c1 + c2 x e2x + c3 e−x ,

y2 (x) = 2c1 + c2 (2x + 1) e2x − c3 e−x ,

y3 (x) = 4c1 + c2 (4x + 4) e2x + c3 e−x .

In particular, 
y(x) = y1 (x) = c1 + c2 x e2x + c3 e−x

is the general solution of the original differential equation, so that

y  (x) = 2c1 e2x + 2c2 xe2x + c2 e2x − c3 e−x ,

y  (x) = 4c1 e2x + 4c2 xe2x + 4c2 e2x + c3 e−x .

From the given initial conditions, we have

1 = y(0) = c1 + c3 ,

0 = y  (0) = 2c1 + c2 − c3 ,

−1 = y  (0) = 4c1 + 4c2 + c3 .

Solving the above linear system, one has c1 = 2/3 , c2 = −1, c3 = 1/3 . So the solution
satisfying the initial conditions is

y(x) = (2/3) e^{2x} − x e^{2x} + (1/3) e^{−x} .

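The same initial value problem can be checked with a computer algebra system; the sketch below (assuming SymPy is available) hands the scalar equation directly to dsolve.

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
eq = y(x).diff(x, 3) - 3 * y(x).diff(x, 2) + 4 * y(x)
ics = {y(0): 1, y(x).diff(x).subs(x, 0): 0, y(x).diff(x, 2).subs(x, 0): -1}
sol = sp.dsolve(eq, y(x), ics=ics)
print(sp.simplify(sol.rhs))   # agrees with (2/3) e^{2x} - x e^{2x} + (1/3) e^{-x}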
The Characteristic Polynomial of a Differential Equation


Here we look at a different method for solving homogeneous differential equations
of nth order. To do this, we observe that the exponential function eλx (λ ∈ C) has
appeared several times in the solutions of differential equations of the form (12.34).
We then consider the function y(x) = eλx and ask ourselves when it is possible that
this is a solution for (12.34).
476 12 Ordinary Differential Equations and Linear …

Since y ( j) (x) = λ j eλx for any j ≥ 0, it follows that y(x) is a solution for (12.34) if
and only if
e^{λx} { λ^n + Σ_{i=1}^{n} ai λ^{n−i} } = 0   ∀x ∈ I.

This will be zero only if


λ^n + Σ_{i=1}^{n} ai λ^{n−i} = 0   ∀x ∈ I

since the exponential is never zero. We then observe that the complex exponential eλx
can be a solution for the given differential equation. More precisely, it is a solution
for (12.34) if and only if λ is a root of the polynomial


p(t) = t^n + Σ_{i=1}^{n} ai t^{n−i} .

This polynomial is typically called the characteristic polynomial of the differential


equation. Factoring the polynomial over C, we obtain the n roots of p(t), namely
λ1 , . . . , λn ∈ C. These roots are not necessarily all distinct. If each of them is counted
with its multiplicity, then the polynomial factors as

p(t) = (t − λ1 )m 1 (t − λ2 )m 2 · · · (t − λk )m k ,

where λ1 , . . . , λk are the distinct roots of p(t).


Let L be the linear differential operator of order n with constant coefficients defined
by (12.35). We notice that L can be written as


L = D^{(n)} + Σ_{i=1}^{n} ai D^{(n−i)} = p(D),          (12.41)

where p(t) is precisely the characteristic polynomial associated with the differential
equation and D^k = (d/dx)^k (k = 1, . . . , n), denoting by d/dx the differentiation operator.
Recalling that Eq. (12.34) can also be written L y = 0, we see that any solution y(x)
of (12.34) actually is an element of the null space of L, that is, y(x) ∈ K er (L).
Remark 12.29 Suppose there exist L 1 , . . . , L k linear differential operators with
constant coefficients mapping C n (I ) → C 0 (I ), such that L = L 1 L 2 · · · L k . Then
K er (L i ) ⊆ K er (L), for any i = 1, . . . , k.
The idea is now to discuss the characteristic polynomial associated with the differ-
ential equation, in order to obtain the linearly independent functions generating the
null space of L, that is, a basis for K er (L). The roots of p(t) will have three possible
forms:

(1) They are all real and distinct.


(2) They are all real but not all distinct (at least one of them has algebraic multiplicity
greater than 1).
(3) There exist at least two complex roots λ = α + iβ and λ = α − iβ, for α, β ∈
R and β = 0. Also in this case, each complex root must be counted with its
multiplicity.
Hence, we look at each of these cases to get the general solution for Eq. (12.34). To
do this, it is important to recall the following well-known fact:
Remark 12.30 Suppose there exist L 1 , . . . , L k linear differential operators with
constant coefficients mapping C n (I ) → C 0 (I ), such that L = L 1 L 2 · · · L k . Then
K er (L i ) ⊆ K er (L), for any i = 1, . . . , k.

Case (1): The roots of p(t) are all real and distinct.
In this case, the characteristic polynomial factors as

p(t) = (t − λ1 )(t − λ2 ) · · · (t − λn ),

where λ1 , . . . , λn are the distinct roots of p(t), as well as the differential operator L
similarly factors as

L = (D − λ1 )(D − λ2 ) · · · (D − λn ),

where
(D − λi )y = y  − λi y ∀i = 1, . . . , n. (12.42)

Solving (12.42) for any i = 1, . . . , n, we have the solutions

y1 (x) = eλ1 x , . . . , yn (x) = eλn x

and, by Remark 12.30, y1 , . . . , yn ∈ K er (L), that is, {y1 , . . . , yn } is a set of n solu-


tions for (12.34). Moreover, computing the Wronskian, we get
 
 eλ1 x eλ2 x · · · eλn x 
 
 λ1 eλ1 x λ2 eλ2 x · · · λn eλn x 
 2
W (x) =  λ1 e 1 λ x λ2 e 2 · · · λn e n 
2 λ x 2 λ x
 · · · ··· ··· ··· 
 
 λn−1 eλ1 x λn−1 eλ2 x · · · λn−1 eλn x 
1 2 n
 
 1 1 ··· 1 
 
 λ1 λ2 · · · λn 
 2
=e 1 (λ +···+λ n )x  λ λ2 · · · λn 
2 2
 1
 ··· ··· ··· ··· 
 
 λn−1 λn−1 · · · λn−1 
1 2 n
= e^{(λ1 +···+λn )x} ∏_{i>j} (λi − λj) ≠ 0   ∀x ∈ I.

Hence, {y1 , . . . , yn } is a linearly independent set, that is, a basis for the vector space
of solutions. The general solution can be written as

y(x) = c1 eλ1 x + · · · + cn eλn x c1 , . . . , cn ∈ R.

Example 12.31 Solve the differential equation

y‴ − (9/2) y″ + 5y′ − (3/2) y = 0.
The characteristic polynomial associated with the equation is

p(t) = t^3 − (9/2) t^2 + 5t − 3/2

having three distinct real roots: λ1 = 1, λ2 = 1/2 and λ3 = 3, all of them of algebraic


multiplicity equal to 1.
Thus, the general solution of the differential equation is
y(x) = c1 e^x + c2 e^{x/2} + c3 e^{3x} .

Case (2): The roots of p(t) are all real but not all distinct.
Here we assume

p(t) = (t − λ1 )m 1 (t − λ2 )m 2 · · · (t − λk )m k ,

where λ1 , . . . , λk are the distinct roots of p(t) and m i is the algebraic multiplicity of
λi , for i = 1, . . . , k. In parallel, we can factor L as

L = (D − λ1 )m 1 (D − λ2 )m 2 · · · (D − λk )m k .

To fully describe the present case, we need to premise some results. More precisely,

Proposition 12.32 Let λ be a root of p(t), having algebraic multiplicity equal to m.


Then the functions
eλx , xeλx , . . . , x m−1 eλx

form a basis for the null space K er (D − λ)m .

Proof For m = 1, it is clear that (D − λ)eλx = 0. By induction, assume that

eλx , xeλx , . . . , x m−2 eλx ∈ K er (D − λ)m−1 .

Hence,

(D − λ)m x m−1 eλx = (D − λ)m−1 (D − λ)x m−1 eλx

= (D − λ)m−1 (m − 1)x m−2 eλx

= 0 ∀x ∈ I

proving that
eλx , xeλx , . . . , x m−1 eλx ∈ K er (D − λ)m .

To complete the proof, we then prove that those functions are linearly independent.
Let c1 , . . . , cm ∈ R be such that

c1 eλx + c2 xeλx + · · · + cm x m−1 eλx = 0 (12.43)

and, by contradiction, assume there exists at least one index i ∈ {1, . . . , m} such that
ci ≠ 0. Since e^{λx} is never zero, (12.43) says that

c1 + c2 x + · · · + cm x m−1 = 0.

On the other hand, functions 1, x, x 2 , . . . , x m−1 are clearly linearly independent,


since their Wronskian is
 
| 1    x     x^2    · · ·   x^{m−1}               |
| 0    1     2x     · · ·   (m − 1)x^{m−2}        |
| 0    0     2      · · ·   (m − 2)(m − 1)x^{m−3} |   ≠ 0   ∀x ∈ R.
| ···  ···   ···    · · ·   · · ·                 |
| 0    0     0      · · ·   (m − 1)!              |

Of course, this is not possible, since ci ≠ 0.

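For a concrete feeling of Proposition 12.32, one can let a computer algebra system apply the operator D − λ repeatedly (an illustrative sketch assuming SymPy is available; here m = 3).

import sympy as sp

x, lam = sp.symbols('x lambda')
f = x**2 * sp.exp(lam * x)                 # the last basis function for m = 3
D_minus_lam = lambda g: sp.diff(g, x) - lam * g
print(sp.simplify(D_minus_lam(D_minus_lam(f))))                 # 2*exp(lambda*x): not yet zero
print(sp.simplify(D_minus_lam(D_minus_lam(D_minus_lam(f)))))    # 0: (D - lambda)^3 annihilates f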
Proposition 12.33 Let λ1 , . . . , λk be real distinct numbers and f 1 (x), . . . , f k (x)


polynomials. If each f i (x) (i = 1, . . . , k) is not identically zero in R, then the func-
tions f 1 (x)eλ1 x , . . . , f k (x)eλk x are linearly independent.

Proof We prove the result by induction. Assume firstly k = 2 and let c1 , c2 ∈ R be


such that
c1 f 1 (x)eλ1 x + c2 f 2 (x)eλ2 x = 0 ∀x ∈ R. (12.44)

Since λ1 ≠ λ2 , without loss of generality, we may assume λ2 = 0. Hence, by (12.44),


we get
c1 f 1 (x)e(λ1 −λ2 )x + c2 f 2 (x) = 0 ∀x ∈ R. (12.45)

If c2 ≠ 0, relation (12.45) implies

c1 c2−1 f 1 (x)e(λ1 −λ2 )x = − f 2 (x) ∀x ∈ R


which cannot occur, due to the fact that λ1 − λ2 ≠ 0 and both f1(x) and f2(x) are not
identically zero. Hence, we may assert that c2 = 0 and, by (12.45), c1 = 0 follows
trivially.
Suppose now that the result is true for the k − 1 functions

f 1 (x)eλ1 x , . . . , f k−1 (x)eλk−1 x .

Our final aim is to show that it holds again for the k functions

f 1 (x)eλ1 x , . . . , f k (x)eλk x .

In this sense, we suppose there are c1 , . . . , ck ∈ R such that


k
ci f i (x)eλi x = 0 ∀x ∈ R. (12.46)
i=1

As above, we may assume λk = 0 and reduce (12.46) to


k−1
ci f i (x)e(λi −λk )x = −ck f k (x) ∀x ∈ R. (12.47)
i=1

If we denote n = degree( fk ) and suppose ck ≠ 0, by (12.47), it follows


(d/dx)^{n+1} ( Σ_{i=1}^{k−1} ci fi(x) e^{(λi −λk )x} ) = 0   ∀x ∈ R

which is a contradiction, since λi − λk ≠ 0 and fi(x) is not identically zero, for any
i = 1, . . . , k − 1.
Thus, ck = 0 and relation (12.47) reduces to


k−1
ci f i (x)eμi x = 0 ∀x ∈ R, (12.48)
i=1

where
μ1 = (λ1 − λk ), μ2 = (λ2 − λk ), . . . , μk−1 = (λk−1 − λk )

are all distinct. Then, by induction and relation (12.48), c1 = · · · = ck−1 = 0, as


required.
12.4 Homogeneous Differential Equations of nth Order 481

Proposition 12.34 Let

p(t) = (t − λ1 )m 1 (t − λ2 )m 2 · · · (t − λk )m k

be the characteristic polynomial for the differential equation (12.34), where λ1 , . . . ,


λk are the distinct roots of p(t) and m i is the algebraic multiplicity of λi , for i =
1, . . . , k. Then functions

yis (x) = x s−1 eλi x ∀1 ≤ i ≤ k ∀1 ≤ s ≤ m i (12.49)

are linearly independent.


Proof For i = 1, . . . , k and s = 1, . . . , m i , let cis ∈ R be such that


k 
mi
cis x s−1 eλi x = 0 ∀x ∈ R. (12.50)
i=1 s=1


mi
For any i ∈ {1, . . . , k}, here we denote f i (x) = cis x s−1 .
s=1
If every function fi(x) is identically zero in R, it follows trivially that cis = 0, for
any i ∈ {1, . . . , k} and for any s ∈ {1, . . . , m i }.
Then we may assume that
• there exist some i 1 , . . . , i h (1 ≤ h ≤ k) such that the polynomials f i1 , . . . , f ih are
not identically zero;
• fr (x) = 0, for any x ∈ R, if r ∉ {i 1 , . . . , i h }.
Hence, we reduce relation (12.50) to


h
f i j (x)eλi j x = 0 ∀x ∈ R. (12.51)
j=1

But, in light of Proposition 12.33 and since f i1 , . . . , f ih are not identically zero, the
relation (12.51) represents a contradiction.
At this point, we are ready to describe the general solution of Eq. (12.34) in the case
its characteristic polynomial is

p(t) = (t − λ1 )(t − λ2 ) · · · (t − λn ),

where λ1 , . . . , λn are the distinct roots of p(t).


Using the results contained in Propositions 12.32, 12.33 and 12.34, we firstly list the
n linearly independent solutions, that is, the functions of the basis for the null space
K er (L). Those functions are

eλ1 x , xeλ1 x , . . . , x m 1 −1 eλ1 x



eλ2 x , xeλ2 x , . . . , x m 2 −1 eλ2 x

...,...,...,...

eλk x , xeλk x , . . . , x m k −1 eλk x .

Then we write the general solution


k 
mi
y(x) = cis x s−1 eλi x . (12.52)
i=1 s=1

Example 12.35 Repeat Example 12.28 and solve the differential equation
 
y‴ − 3y″ + 4y = 0.

The characteristic polynomial associated with the equation is

p(t) = t 3 − 3t 2 + 4

having two distinct real roots: λ1 = −1 of algebraic multiplicity equal to 1; λ2 = 2


of algebraic multiplicity equal to 2.
Thus, the general solution of the differential equation is

y(x) = c1 e−x + c2 e2x + c3 xe2x .

Example 12.36 Solve the differential equation


  
y^{(iv)} − 2y‴ − 3y″ + 8y′ − 4y = 0.

The characteristic polynomial associated with the equation is

p(t) = t 4 − 2t 3 − 3t 2 + 8t − 4

having one real root λ1 = 1 of algebraic multiplicity equal to 2 and two simple real roots
λ2 = 2 and λ3 = −2, since p(t) = (t − 1)^2 (t − 2)(t + 2).
Thus, the general solution of the differential equation is

y(x) = c1 e^x + c2 x e^x + c3 e^{2x} + c4 e^{−2x} .

Example 12.37 Solve the differential equation


 
y^{(iv)} − 8y‴ + 18y″ − 27y = 0.

The characteristic polynomial associated with the equation is



p(t) = t 4 − 8t 3 + 18t 2 − 27

having two distinct real roots: λ1 = −1 of multiplicity equal to 1 and λ2 = 3 of


multiplicity equal to 3.
Thus, the general solution of the differential equation is

y(x) = c1 e−x + c2 e3x + c3 xe3x + c4 x 2 e3x .

Case (3): p(t) has some complex roots.


We finally need to deal with complex roots. Assuming that λ = α + iβ (0 = β ∈ R)
occurs m-times in the list of roots (i.e., λ has a multiplicity of m), we have that
λ = α − iβ is again a root for p(t), also having multiplicity equal to m. In this case,
we can use the work from the repeated roots above to get the following set of 2m
complex-valued solutions:

eαx (cos βx + i sin βx), xeαx (cos βx + i sin βx), . . . , x m−1 eαx (cos βx + i sin βx)

which represent a basis for the null space K er (D − λ)m , and

eαx (cos βx − i sin βx), xeαx (cos βx − i sin βx), . . . , x m−1 eαx (cos βx − i sin βx)

which represent a basis for the null space K er (D − λ)m . Exactly as seen in the pre-
vious cases, those functions give their contribution to the constitution of the whole
basis for K er (L).

If we need to express the real-valued solutions of (12.34), we use Euler’s formula on


the first set of complex-valued solutions above. Then we split each complex solution
into its real and imaginary parts to arrive at the following set of 2m real-valued
solutions:
eαx cos βx, xeαx cos βx, . . . , x m−1 eαx cos βx,

eαx sin βx, xeαx sin βx, . . . , x m−1 eαx sin βx.

Those functions contribute to the general real solution, together with any other solu-
tions corresponding to any real roots of p(t).

Example 12.38 Solve the differential equation

y iv − 16y = 0.

The characteristic polynomial associated with the equation is

p(t) = t 4 − 16

having two distinct real roots λ1 = 2 and λ2 = −2, both of which of multiplicity
equal to 1, and two complex roots λ3 = 2i and λ3 = −2i.
Thus, the general solution of the differential equation is

y(x) = c1 e2x + c2 e−2x + c3 e2i x + c4 e−2i x

= c1 e2x + c2 e−2x + c3 (cos 2x + i sin 2x) + c4 (cos 2x − i sin 2x)

= c1 e2x + c2 e−2x + (c3 + c4 ) cos 2x + i(c3 − c4 ) sin 2x,

and the general real solution is

y(x) = c1 e^{2x} + c2 e^{−2x} + c3 cos 2x + c4 sin 2x.
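As a quick cross-check (illustrative only, not part of the text's development), SymPy's general-purpose ODE solver returns the same four-dimensional real solution space for Example 12.38, with its own labelling of the constants.

```python
# Cross-check of Example 12.38 with SymPy's ODE solver (illustrative sketch).
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x, 4) - 16 * y(x), 0)
print(sp.dsolve(ode, y(x)))
# Expected, up to the labelling of C1..C4:
# Eq(y(x), C1*exp(-2*x) + C2*exp(2*x) + C3*sin(2*x) + C4*cos(2*x))
```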

Example 12.39 Find the real solution of the following initial value problem:
  
y''' − y'' + 4y' − 4y = 0,
 
y(0) = 2, y'(0) = 1, y''(0) = 1.

The characteristic polynomial associated with the equation is

p(t) = t^3 − t^2 + 4t − 4

having one real root λ1 = 1 of multiplicity equal to 1, and two complex roots λ2 = 2i and λ̄2 = −2i.
Thus, the general real solution of the differential equation is

y(x) = c1 e^x + c2 cos 2x + c3 sin 2x,

so that its first and second derivatives are



y'(x) = c1 e^x − 2c2 sin 2x + 2c3 cos 2x,
y''(x) = c1 e^x − 4c2 cos 2x − 4c3 sin 2x.

The initial conditions give


y(0) = c1 + c2 = 2,
y'(0) = c1 + 2c3 = 1,
y''(0) = c1 − 4c2 = 1,

so that c1 = 9/5, c2 = 1/5, c3 = −2/5. Hence, the solution of the initial value problem is the function
y(x) = (9/5) e^x + (1/5) cos 2x − (2/5) sin 2x.
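The last step of Example 12.39 is just a 3 × 3 linear system in (c1, c2, c3). A small NumPy sketch of that step (illustrative only; the matrix rows simply restate the three conditions above):

```python
# Solving for the constants in Example 12.39 (illustrative sketch).
import numpy as np

# Rows: y(0) = c1 + c2 = 2,  y'(0) = c1 + 2*c3 = 1,  y''(0) = c1 - 4*c2 = 1.
A = np.array([[1.0,  1.0, 0.0],
              [1.0,  0.0, 2.0],
              [1.0, -4.0, 0.0]])
b = np.array([2.0, 1.0, 1.0])

print(np.linalg.solve(A, b))   # [ 1.8  0.2 -0.4], i.e. c1 = 9/5, c2 = 1/5, c3 = -2/5
```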

Example 12.40 Solve the differential equation


 
y^{vi} + 4y^{v} + 4y^{iv} + 18y''' + 36y'' + 81y = 0.

The characteristic polynomial associated with the equation is

p(t) = t^6 + 4t^5 + 4t^4 + 18t^3 + 36t^2 + 81

having the following roots:


• λ1 = −3, of multiplicity equal to 2;
• λ2 = 1/2 + (√11/2) i, of multiplicity equal to 2;
• λ̄2 = 1/2 − (√11/2) i, of multiplicity equal to 2.
Thus, the general solution of the differential equation is

y(x) = c1 e^{−3x} + c2 x e^{−3x}
     + c3 e^{x/2} (cos((√11/2)x) + i sin((√11/2)x))
     + c4 x e^{x/2} (cos((√11/2)x) + i sin((√11/2)x))
     + c5 e^{x/2} (cos((√11/2)x) − i sin((√11/2)x))
     + c6 x e^{x/2} (cos((√11/2)x) − i sin((√11/2)x)),

and the general real solution is

y(x) = c1 e^{−3x} + c2 x e^{−3x}
     + c3 e^{x/2} cos((√11/2)x) + c4 x e^{x/2} cos((√11/2)x)
     + c5 e^{x/2} sin((√11/2)x) + c6 x e^{x/2} sin((√11/2)x).
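For a characteristic polynomial of degree six such as the one above, the roots can also be located numerically. A small NumPy sketch (illustrative only), with the caveat that a numerical root finder reports repeated roots as nearly equal values rather than with an explicit multiplicity:

```python
# Numerical roots of p(t) = t^6 + 4t^5 + 4t^4 + 18t^3 + 36t^2 + 0t + 81 (illustrative).
import numpy as np

coeffs = [1, 4, 4, 18, 36, 0, 81]
roots = np.roots(coeffs)
print(np.round(roots, 6))
# Expected, up to rounding: -3 (twice) and 0.5 +/- 1.658312i (twice each),
# where 1.658312 is approximately sqrt(11)/2.
```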

Exercises

1. Solve the system


y1' = y2,
y2' = 4y1,
y3' = y4,
y4' = 4y3,

and find a solution that satisfies the initial conditions

y1 (0) = 1, y2 (0) = 0, y3 (0) = 1, y4 (0) = 0.

2. Find the general solution of the system

y1' = −5y1 + 2y2 + 2y4,
y2' = 2y1 − 2y2 − y4,
y3' = −5y3 + 2y4,
y4' = 2y3 − 2y4.

3. Find the general solution of the system

y1' = y1,
y2' = −y2 − 4y3 + 2y4,
y3' = 3y2 + y3 − 2y4,
y4' = y2 − 4y3 + y4.

4. Find the general solution of the system

y1' = 2y1 + y2 − y3 + y4,
y2' = 2y2 + y3 + 2y4,
y3' = 2y3 + y4,
y4' = 2y4.

5. Find the general solution of the system

y1' = y1 + 2y2 − y3 − y4,
y2' = −2y1 + y2 + y3 + y4,
y3' = y3 + 2y4,
y4' = −2y3 + y4.

6. Find the real-valued solution of the system

y1' = y1 + 2y2,
y2' = −2y1 + y2.

7. Find the real-valued solution of the system

y1' = y2,
y2' = −3y1 − 2y2.

8. Find the real-valued solution of the system

y1' = y1 + y2 + y3,
y2' = −y1 + y2 − y4,
y3' = y3 + y4,
y4' = −y3 + y4.

9. Find the real-valued solution of the system

y1' = 3y2 − y4,
y2' = −3y1 + y3,
y3' = 2y4,
y4' = −2y3.

10. Find the real-valued solution of the system

y1' = 2y1 − 5y2 + y4,
y2' = y1 − 2y2 − y3 − y4,
y3' = −y3 − 6y4,
y4' = 3y3 + 5y4.

11. Find the solution of the following initial value problem:

y'' − 2y' + 2y = 0,
y(0) = 1/3, y'(0) = 1.
12. Solve the following initial value problem:
 
y'' − 2y' − 8y = 0,
y(1) = 1, y'(1) = −1.

13. Solve the following initial value problem:


  
y''' − 7y'' + 16y' − 12y = 0,
y(0) = 0, y'(0) = 1, y''(0) = 1.

14. Solve the following initial value problem:


  
y''' − y'' + 9y' − 9y = 0,
y(0) = 1, y'(0) = 0, y''(0) = 0.

15. Find the general solution of the differential equation


  
y^{v} − 5y^{iv} + 14y''' − 22y'' + 17y' − 5y = 0.

16. Find the solution of the following equation:



y^{iv} − 2y''' + 2y'' − 2y' + y = 0.

17. Find the general solution of the differential equation



y^{iv} + y = 0.

18. Find the real solution of the differential equation



y^{iv} + 8y'' + 16y = 0.

19. Describe all possible solutions of the differential equation


 
y''' + (1 − k)y'' − ky' = 0

as the parameter k varies over the real numbers.


20. Describe all possible solutions of the differential equation
 
y''' − ky'' + k^2 y' − k^3 y = 0

as the parameter k varies over the positive real numbers.



