Mathematics Subject Classification: 15A03, 15A04, 15A18, 15A20, 15A21, 15A23, 15A42, 15A63
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
This book is self-contained. Throughout the book, stress is laid on understanding the subject. We hope that this book will help readers gain enough mathematical maturity to understand and pursue any advanced course in linear algebra with greater ease and understanding. We welcome comments and suggestions from the readers.
The authors express their deep gratitude to all those whose textbooks on linear algebra they have used in learning the subject and writing this textbook. We are also thankful to our colleagues and research scholars for their valuable comments and suggestions.
Symbols
⟨v1, v2, . . . , vn⟩   Subspace spanned by v1, v2, . . . , vn
L(S)   Linear span of a set S of vectors
V ⊕ W   Direct sum of vector spaces V, W
dim(V)   Dimension of a vector space V
U + W   Sum of subspaces U, W of V
R(T)   Range of a linear transformation T
N(T) or Ker T   Kernel of a linear transformation T
r(T)   Rank of a linear transformation T
n(T)   Nullity of a linear transformation T
Aᵗ   Transpose of a matrix A
r(A)   Rank of a matrix A
f′(x)   Formal derivative of a polynomial f(x)
m(T)   Matrix associated with a linear transformation T
V ≅ W   Isomorphism of two vector spaces V and W
V∗   Dual of a vector space V
1.1 Groups
Remark 1.2 (i) In the above definition ∗(a, b) is denoted by a ∗ b. It can be easily
seen that usual addition and multiplication are binary operations on the set of
natural numbers N, the set of integers Z, the set of rational numbers Q, the set
of real numbers R and on the set of complex numbers C.
(ii) If we consider the set of vectors V in a plane, vector product is a binary operation
on V while the scalar product on the set V is not a binary operation.
(iii) Let S be a nonempty set equipped with a binary operation ∗, i.e., ∗ : S × S → S is a mapping; then (S, ∗) is called a groupoid. For example, (Z, +), (Z, −) are groupoids, while (N, −) is not a groupoid, as illustrated in the sketch below.
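To make the closure requirement in (iii) concrete, here is a small computational sanity check (a Python sketch, not part of the original text; it tests only finitely many pairs, so it illustrates rather than proves the claim):

    # A binary operation on S must send every pair of elements of S back into S.
    def closed_under(op, S, samples):
        return all(op(a, b) in S for a in samples for b in samples)

    Z = set(range(-2000, 2001))   # a finite window of the integers
    N = set(range(1, 2001))       # a finite window of the natural numbers
    samples = range(1, 30)

    print(closed_under(lambda a, b: a - b, Z, samples))  # True: (Z, -) is a groupoid
    print(closed_under(lambda a, b: a - b, N, samples))  # False: 1 - 2 = -1 is not in N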
(iv) A nonempty set G equipped with a binary operation ∗, say (G, ∗), is said to be
a semigroup if ∗ is an associative binary operation, i.e., an associative groupoid
is called a semigroup. For example, (Z, +), (Q, +), (R, +), (C, +), (N, ·), (Z, ·), (Q, ·), (R, ·), (C, ·) are semigroups, but (Z, −) and (R∗ = R\{0}, ÷) are not semigroups.
Definition 1.3 A nonempty set G equipped with a binary operation ∗, say (G, ∗),
is said to be a group if
(1) For every a, b, c ∈ G, a ∗ (b ∗ c) = (a ∗ b) ∗ c, i.e., ∗ is an associative binary
operation on G.
(2) For every a ∈ G, there exists an element e in G such that a ∗ e = e ∗ a = a.
Such an element e in G is called the identity element in G.
(3) For every a ∈ G, there exists an element b ∈ G such that a ∗ b = b ∗ a = e.
Such an element b ∈ G is said to be the inverse of a in G.
Remark 1.4 (i) In the above definition, G is a group under the binary operation ∗. Throughout this chapter, by a group G we mean a group under multiplication unless otherwise mentioned. For the sake of convenience, the product between any two elements a, b of a multiplicative group G will be denoted by ab instead of a · b.
(ii) It can be easily seen that the identity element e in a group G is unique. For if e, e′ are two identities in a group G, then e = ee′ = e′.
(iii) The inverse of each element in a group G is unique. If a′ and a″ are two inverses of an element a in a group G, then aa′ = a′a = e and aa″ = a″a = e, where e is the identity element in G. Then it can be easily seen that a′ = a′e = a′(aa″) = (a′a)a″ = ea″ = a″.
(iv) If a⁻¹ denotes the inverse of a in G, then in view of axiom (3), one can write a = b⁻¹ and b = a⁻¹. It is also easy to see that (ab)⁻¹ = b⁻¹a⁻¹ for all a, b ∈ G.
(v) A group G is said to be abelian (or commutative) if ab = ba holds for all
a, b ∈ G. Otherwise, G is said to be a nonabelian group.
(vi) If a group G contains a finite number of elements, then G is said to be a finite group. Otherwise the group G is said to be infinite. The number of elements in a finite group G is called the order of G and is generally denoted by ◦(G) or |G|.
(vii) For any a ∈ G and n a positive integer, aⁿ = a · a · · · a (n times), a⁰ = e, the identity of the group G, and a⁻ⁿ = (a⁻¹)ⁿ; hence it is straightforward to see that aⁿaᵐ = aⁿ⁺ᵐ and (aⁿ)ᵐ = aⁿᵐ.
Example 1.5 (1) It can be easily seen that the groupoids (Z, +), (Q, +), (R, +)
and (C, +) form abelian groups under addition, while (Q∗ = Q\{0}, ·), (R∗ =
R\{0}, ·) and (C∗ = C\{0}, ·) form abelian groups under multiplication.
(2) The set G = {1, −1} forms an abelian group under multiplication.
(3) Consider the set G = {1, −1, i, −i} of fourth roots of unity. This is an abelian group of order 4 under multiplication.
(4) The set of all positive rational numbers Q⁺ forms an abelian group under the binary operation ∗ defined as a ∗ b = ab/2. Note that e = 2 is the identity element in Q⁺, while for any a ∈ Q⁺, 4/a is the inverse of a in Q⁺.
(5) The set Z of all integers forms an abelian group with respect to the binary
operation ∗ defined by a ∗ b = a + b + 1, for all a, b ∈ Z. Note that e = −1
is the identity element in Z, while any a ∈ Z has the inverse −2 − a.
(6) The set {1, ω, ω²}, where ω is a primitive cube root of unity, forms an abelian group of order 3 under multiplication of complex numbers.
(7) The set of matrices

    M = { ⎡1 0⎤  ⎡−1 0⎤  ⎡1  0⎤  ⎡−1  0⎤ }
          ⎣0 1⎦, ⎣ 0 1⎦, ⎣0 −1⎦, ⎣ 0 −1⎦

forms an abelian group under matrix multiplication.
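As a sanity check of Example 1.5(3), the following Python sketch (illustrative, not part of the original text) verifies all the group axioms for G = {1, −1, i, −i} under multiplication by brute force over the finitely many cases:

    # Brute-force verification of the group axioms for G = {1, -1, i, -i}.
    G = [1, -1, 1j, -1j]

    closure  = all(a * b in G for a in G for b in G)
    assoc    = all((a * b) * c == a * (b * c) for a in G for b in G for c in G)
    identity = all(1 * a == a * 1 == a for a in G)
    inverses = all(any(a * b == 1 for b in G) for a in G)
    abelian  = all(a * b == b * a for a in G for b in G)

    print(closure, assoc, identity, inverses, abelian)
    # True True True True True: an abelian group of order 4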
Definition 1.6 If G is a group, then the order of an element a ∈ G is the least positive integer n such that aⁿ = e, the identity of the group G. If no such positive integer exists, we say that a is of infinite order. We use the notation ◦(a) for the order of a.
Proposition 1.8 Let G be a group and n be a positive integer. Then the following hold:
(i) G is abelian if and only if (ab)² = a²b², for all a, b ∈ G.
(ii) G is abelian if and only if (ab)ᵏ = aᵏbᵏ, for all a, b ∈ G, where k = n, n + 1, n + 2.
Proof of (ii): We have (ab)ⁿ = aⁿbⁿ, (ab)ⁿ⁺¹ = aⁿ⁺¹bⁿ⁺¹ and (ab)ⁿ⁺² = aⁿ⁺²bⁿ⁺². Using the first two conditions, we get (aⁿbⁿ)(ab) = (ab)ⁿ⁺¹ = aⁿ⁺¹bⁿ⁺¹. Now using cancellation laws we arrive at bⁿa = abⁿ. Similarly, using the latter two conditions we find that (aⁿ⁺¹bⁿ⁺¹)(ab) = (ab)ⁿ⁺² = aⁿ⁺²bⁿ⁺², and hence bⁿ⁺¹a = abⁿ⁺¹. But in this case abⁿ⁺¹ = bⁿ⁺¹a = b(bⁿa) = b(abⁿ) = (ba)bⁿ, while also abⁿ⁺¹ = (ab)bⁿ; using cancellation laws we find that ab = ba, i.e., G is abelian.
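For contrast, the sketch below (Python, with a hypothetical pair of elements of the nonabelian group GL(2, Z)) shows how (ab)² = a²b² fails as soon as ab ≠ ba, in line with Proposition 1.8(i):

    # In a nonabelian group (ab)^2 need not equal a^2 b^2.
    def mul(X, Y):
        # 2x2 matrix product, matrices stored as tuples of rows
        return ((X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]),
                (X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]))

    a = ((1, 1), (0, 1))          # invertible over Z (determinant 1)
    b = ((1, 0), (1, 1))

    ab = mul(a, b)
    print(mul(ab, ab))                    # (ab)^2 = ((5, 3), (3, 2))
    print(mul(mul(a, a), mul(b, b)))      # a^2 b^2 = ((5, 2), (2, 1)), different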
Example 1.10 (1) For any group G, G is a subgroup of G. Similarly, the trivial
group {e} is a subgroup of G. Any subgroup of G which is different from G and
{e} is said to be a proper (nontrivial) subgroup of G.
(2) The additive group (Z, +) is a subgroup of additive group (Q, +), the additive
group (Q, +) is a subgroup of (R, +) and the additive group (R, +) is a subgroup
of the additive group (C, +).
(3) The multiplicative group {1, −1, i, −i}, i² = −1, is a subgroup of the group of all nonzero complex numbers under multiplication. It is straightforward to see that SL(n, R) and O(n, R) are subgroups of GL(n, R).
(4) The subsets {e, a}, {e, b} and {e, c} are subgroups of Klein 4-group K =
{e, a, b, c}.
(5) Let G be the multiplicative group of all nonsingular 2 × 2 matrices over the set
of complex numbers. Consider the subset

    H = { ±⎡1 0⎤  ±⎡i  0⎤  ±⎡0 i⎤  ±⎡ 0 1⎤ }
           ⎣0 1⎦,  ⎣0 −i⎦,  ⎣i 0⎦,  ⎣−1 0⎦

of G. Then H is a subgroup of G of order 8 (the quaternion group).
Definition 1.12 Let G be a group and H a subgroup of G. For any a ∈ G, the set
H a = {ha | h ∈ H } (resp. a H = {ah | h ∈ H }) is called a right (resp. left) coset of
H determined by a in G.
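For an additive illustration of Definition 1.12 (a Python sketch, not part of the original text): in the group (Z₆, ⊕₆) the cosets of the subgroup H = {0, 3} can be listed by brute force, and they partition Z₆, anticipating Lemma 1.19 below.

    # Cosets a + H of H = {0, 3} in the additive group Z_6.
    n = 6
    H = {0, 3}
    cosets = {frozenset((a + h) % n for h in H) for a in range(n)}
    print(sorted(sorted(c) for c in cosets))
    # [[0, 3], [1, 4], [2, 5]]: three pairwise disjoint cosets covering Z_6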
Proposition 1.17 If H is a normal subgroup of G, then the set of left (right) cosets of H in G forms a group under the binary operation aH · bH = abH.
Proof First, we show that the given binary operation is well-defined, i.e., for any a₁, a₂, b₁, b₂ ∈ G, a₁H = a₂H, b₁H = b₂H implies that a₁b₁H = a₂b₂H. If a₁H = a₂H, b₁H = b₂H, then a₁⁻¹a₂ ∈ H and b₁⁻¹b₂ ∈ H. This yields that (a₁b₁)⁻¹a₂b₂ = b₁⁻¹a₁⁻¹a₂b₂ = b₁⁻¹(a₁⁻¹a₂)b₁(b₁⁻¹b₂). Since H is normal in G, b₁⁻¹(a₁⁻¹a₂)b₁ ∈ H. But b₁⁻¹b₂ ∈ H then yields that (a₁b₁)⁻¹a₂b₂ ∈ H, and hence a₁b₁H = a₂b₂H. This operation is also associative. In fact, for any a, b, c ∈ G, (aH · bH)cH = (ab)H · cH = (ab)cH = a(bc)H = aH(bcH) = aH(bH · cH). Now if e is the identity element in G, then eH = H acts as the identity element, i.e., eH · aH = (ea)H = aH. For any a ∈ G, a⁻¹H is the inverse of aH. Hence the set of cosets of H in G forms a group.
Lemma 1.19 Let G be a group and H a subgroup of G. Then G is the union of all left cosets of H in G and any two distinct left cosets of H in G are disjoint.
Proof (of disjointness) Suppose, on the contrary, that two distinct left cosets aH and bH have a common element x, say x = ah₁ = bh₂ with h₁, h₂ ∈ H. Let y ∈ aH, say y = ah for some h ∈ H. Since a = xh₁⁻¹, we find that

    y = ah = (xh₁⁻¹)h = x(h₁⁻¹h) = bh₂(h₁⁻¹h) = b(h₂h₁⁻¹h),

where h₂h₁⁻¹h ∈ H. Hence y ∈ bH and aH ⊆ bH. Similarly, it can be shown that bH ⊆ aH. Consequently, aH = bH, i.e., aH and bH are not distinct. This leads to a contradiction.
Remark 1.22 (i) Note that if the binary operations in groups G and G′ are different, say ∗ and ∘, respectively, then θ satisfies the property θ(a ∗ b) = θ(a) ∘ θ(b) for all a, b ∈ G. For the sake of convenience, it has been assumed that both G and G′ are multiplicative groups.
(ii) A homomorphism of a group which is also onto is called an epimorphism. A homomorphism of a group which is also one-to-one is called a monomorphism. Further, a homomorphism of a group G into itself is called an endomorphism.
(iii) A group homomorphism which is one-to-one and onto is said to be an isomorphism. An isomorphism of a group G onto itself is called an automorphism of G.
(iv) If θ : G → G′ is a homomorphism, then θ(G) is a subgroup of G′.
(v) Let θ : G → G′ be a group homomorphism of G onto G′. If G is abelian, then G′ is also abelian.
(vi) If θ : G → G′ is a group homomorphism of G onto G′, where e and e′ are the identities of G and G′, respectively, then θ(e) = e′ and θ(a⁻¹) = (θ(a))⁻¹ for all a ∈ G.
Example 1.23 (1) For any group G, the identity mapping I_G : G → G is a group homomorphism.
(2) For any two groups G and G′, the mapping θ : G → G′ such that θ(a) = e′, the identity of G′, is a group homomorphism and is called the trivial homomorphism.
(3) The mapping θ : R∗ → R∗ defined on the multiplicative group R∗ = R\{0} such that θ(a) = |a| is a homomorphism, but θ : R → R such that θ(a) = |a| is not a homomorphism on the additive group R.
(4) The map θ : R² → R such that θ(a, b) = a is a homomorphism.
(5) If θ : R² → C is such that θ(a, b) = a + ib, then θ is an isomorphism.
(6) Let a ∈ G be a fixed element of G and let θₐ : G → G be such that θₐ(g) = aga⁻¹, for all g ∈ G. Then it can be seen that θₐ is an automorphism. In fact, for any x, y ∈ G, θₐ(xy) = a(xy)a⁻¹ = axa⁻¹aya⁻¹ = θₐ(x)θₐ(y), i.e., θₐ is a homomorphism. Further, θₐ(x) = θₐ(y) implies that axa⁻¹ = aya⁻¹, i.e., x = y, and hence θₐ is one-to-one. For any x ∈ G, there exists a⁻¹xa ∈ G such that θₐ(a⁻¹xa) = a(a⁻¹xa)a⁻¹ = x, and hence θₐ is onto. Thus θₐ is an automorphism of G.
Exercises
1. Show that the set of all nth complex roots of unity forms a group with respect
to ordinary multiplication.
2. If G is a group, then show that the set Z(G) = {a ∈ G | ab = ba, for all b ∈ G} (called the center of G) is a normal subgroup of G.
3. If a is a fixed element of a group G, then show that the set N(a) = {x ∈ G | ax = xa} (called the normalizer of a in G) is a subgroup of G.
4. If every element in a group G is self-inverse, i.e., a⁻¹ = a for all a ∈ G, then show that G is abelian.
5. In a group G, for all a, b ∈ G show that (aba⁻¹)ⁿ = aba⁻¹ if and only if b = bⁿ, where n is any integer.
6. Let G be a group and m, n be two relatively prime positive integers such that (ab)ᵐ = aᵐbᵐ and (ab)ⁿ = aⁿbⁿ hold for all a, b ∈ G. Then show that G is abelian.
7. Let G be a group such that (ab)² = (ba)² for all a, b ∈ G. Suppose G also has the property that c² = e implies that c = e, c ∈ G. Then show that G is abelian.
8. Show that a group G in which aᵐbᵐ = bᵐaᵐ and aⁿbⁿ = bⁿaⁿ hold for all a, b ∈ G, where m, n are any two relatively prime positive integers, is abelian.
9. Show that the intersection of two subgroups of a group G is a subgroup of G. More generally, show that the intersection of any arbitrary family of subgroups of G is a subgroup of G.
10. If H is a finite subset of a group G such that ab ∈ H for all a, b ∈ H , then show
that H is a subgroup of G.
11. Show that a group cannot be written as a set theoretic union of two of its proper
subgroups.
1.2 Rings
This section is devoted to the study of algebraic structures equipped with two binary operations, namely rings. Several basic properties of rings and subrings are given.
Definition 1.26 A nonempty set R equipped with two binary operations, say addi-
tion + and multiplication · , is said to be a ring if it satisfies the following axioms:
(1) (R, +) is an abelian group.
(2) (R, ·) is a semigroup.
(3) For any a, b, c ∈ R, a · (b + c) = a · b + a · c, (b + c) · a = b · a + c · a.
Remark 1.27 (i) The binary operations + and · are not necessarily usual addition and multiplication. Moreover, these are only symbols used to represent the two binary operations of R. For convenience we write ab instead of a · b. A ring R is said to be commutative if ab = ba holds for all a, b ∈ R. Otherwise, R is said to be noncommutative. If a ring R contains an element e such that ae = ea = a for all a ∈ R, we say that R is a ring with identity, and the identity e in a ring R is usually denoted by 1. In general, a ring R may or may not have identity. But if the ring R has identity 1, then it is unique.
(ii) It can be easily seen that a0 = 0a = 0 for all a ∈ R.
(iii) For any a, b ∈ R, a(−b) = (−a)b = −ab, (−a)(−b) = ab.
(iv) In a ring R, for a fixed positive integer n, na = a + a + · · · + a (n times) and aⁿ = a · a · · · a (n times).
Example 1.28 (1) (Z, +, ·), (Q, +, ·), (R, +, ·) and (C, +, ·) are examples of com-
mutative rings.
(2) Every additive abelian group G is a ring if we define multiplication in G as ab = 0 for all a, b ∈ G, where 0 is the additive identity. This ring is called the zero ring.
(3) The set Mn×n (Z) of all n × n matrices over integers forms a ring under the
binary operations of matrix addition and matrix multiplication, which is an
example of a noncommutative ring.
(4) Let Q = {a + bi + cj + dk | a, b, c, d ∈ R}. Define addition in Q as (a₁ + b₁i + c₁j + d₁k) + (a₂ + b₂i + c₂j + d₂k) = (a₁ + a₂) + (b₁ + b₂)i + (c₁ + c₂)j + (d₁ + d₂)k, under which Q forms an abelian group. Now multiply any two members of Q as multiplication of polynomials by using the rules i² = j² = k² = −1, ij = −ji = k, jk = −kj = i, ki = −ik = j.

    f(x)g(x) = c₀ + c₁x + · · · + cₖxᵏ,
In the previous section, we have defined the notion of subgroup H of a group G, i.e.,
a nonempty subset H of G which is itself a group under the same binary operation.
Analogously, the notion of a subring has been introduced in the case of a ring also.
Remark 1.32 (i) A subring need not contain the identity of a ring. For example, (Z, +, ·) has the identity 1 while its subring (2Z, +, ·) has no identity. (Z₆, ⊕₆, ⊗₆) has the identity 1, but its subring ({0, 3}, ⊕₆, ⊗₆) has the identity 3. It may also happen that a ring has no identity element while it has a subring which contains an identity element. For example, Z × 3Z has no identity while its subring Z × {0} has the identity (1, 0).
(ii) Trivially, a subring of a commutative ring is commutative. But a noncommutative ring may have a commutative subring. For example, the ring M₂(Z) of all matrices

    ⎡a b⎤
    ⎣c d⎦ ,  a, b, c, d ∈ Z,

is noncommutative, while its subring consisting of the matrices

    ⎡a 0⎤
    ⎣0 a⎦ ,  a ∈ Z,

is commutative.
(iii) Let R be the set of all matrices

    ⎡a b⎤
    ⎣0 q⎦ ,  a, b ∈ R, q ∈ Q.

It can be seen that R is a ring with identity

    ⎡1 0⎤
    ⎣0 1⎦ .

But it has a subring S consisting of the matrices

    ⎡a  b ⎤
    ⎣0 2m⎦ ,  a, b ∈ R, m ∈ Z,

without identity.
Example 1.34 In the ring M₂(Z) of all 2 × 2 matrices over Z, the ring of integers, consider the subsets A and B of M₂(Z) consisting of all matrices

    ⎡a b⎤                    ⎡c 0⎤
    ⎣0 0⎦ , a, b ∈ Z,   and  ⎣d 0⎦ , c, d ∈ Z,

respectively. It can be easily seen that A is a right ideal of M₂(Z) which is not a left ideal of M₂(Z), while B is a left ideal of M₂(Z) which is not a right ideal of M₂(Z). If we consider the subset C of M₂(Z) consisting of all matrices

    ⎡a 0⎤
    ⎣b c⎦ , a, b, c ∈ Z,

then it can be seen that C is a subring of M₂(Z), but neither a left nor a right ideal of M₂(Z).
Definition 1.35 Let (R, +, ·) be a ring and I be an ideal of R. Then the set R/I = {a + I | a ∈ R} forms a ring under the operations (a + I) + (b + I) = (a + b) + I and (a + I)(b + I) = ab + I. This ring is known as the quotient ring.
Example 1.37 (1) A map θ : C → C such that θ(z) = z̄, the complex conjugate of z, is a ring homomorphism which is both one-to-one and onto.
(2) Let θ : C → R, where R is the ring, under matrix addition and multiplication, of all matrices

    ⎡a −b⎤
    ⎣b  a⎦ , a, b ∈ R.

For any z = x + iy ∈ C, define

    θ(z) = ⎡x −y⎤
           ⎣y  x⎦ .

It can be easily verified that θ is a homomorphism which is both one-to-one and onto.
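A small numerical check of Example 1.37(2) (a Python sketch with illustrative values, not part of the original text): under θ, complex multiplication corresponds exactly to matrix multiplication.

    # theta(z) represents z = x + iy as the 2x2 real matrix [[x, -y], [y, x]].
    def theta(z):
        return ((z.real, -z.imag), (z.imag, z.real))

    def mat_mul(X, Y):
        return ((X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]),
                (X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]))

    z, w = 2 + 3j, -1 + 4j
    print(theta(z * w))                 # ((-14.0, -5.0), (5.0, -14.0))
    print(mat_mul(theta(z), theta(w)))  # the same matrix: theta(zw) = theta(z)theta(w)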
Definition 1.42 A ring R with identity 1, in which each nonzero element has mul-
tiplicative inverse, i.e., if a ∈ R is a nonzero element, then there exists b ∈ R such
that ab = ba = 1, is called a division ring.
Definition 1.43 The characteristic of a ring R, denoted as char (R), is the smallest
positive integer n such that na = 0 for all a ∈ R. If no such integer exists, the
characteristic of R is said to be zero.
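As an illustration of Definition 1.43 (a Python sketch, not part of the original text), the characteristic of the ring Zₘ of residue classes can be found by direct search; it comes out as m itself:

    # char(Z_m): the smallest positive k with k*a = 0 (mod m) for all a in Z_m.
    def characteristic(m):
        for k in range(1, m + 1):
            if all((k * a) % m == 0 for a in range(m)):
                return k

    print(characteristic(6), characteristic(7))   # 6 7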
Proof Suppose that char(R) = n ≠ 0, and n is not prime. Then n = n₁n₂, where n₁ and n₂ are proper divisors of n. For any 0 ≠ a ∈ R, 0 = na² = n₁n₂a² = (n₁a)(n₂a). Since R is an integral domain, n₁a = 0 or n₂a = 0. If n₁a = 0, then it can be seen that n₁r = 0 for any r ∈ R. In fact, (n₁r)a = r(n₁a) = 0, and since a ≠ 0, we arrive at n₁r = 0 for all r ∈ R, where n₁ < n. This contradicts the minimality of n, and hence n is a prime.
Remark 1.47 It can be easily seen that the definition of a polynomial ring given
above is equivalent to the definition of a polynomial ring via Example 1.28(6).
Exercises
1. Show that a ring R with identity is commutative if and only if (ab)2 = a 2 b2 for
all a, b ∈ R.
2. Justify the existence of the identity in the above result.
3. Prove that a ring R is commutative if it satisfies the property a 2 = a for all
a ∈ R.
4. Show that a ring R is commutative if and only if (a + b)2 = a 2 + 2ab + b2 for
all a, b ∈ R.
5. If (R, +, ·) is a system satisfying all the axioms of a ring with unity except
a + b = b + a, for all a, b ∈ R, then show that (R, +, ·) is a ring.
6. Let R be a ring (possibly without unity 1) satisfying any one of the following identities:
1.3 Fields with Basic Properties

This section is devoted to the study of fields, their various examples and some basic properties which will be used freely in the subsequent chapters.
such that x y = yx = 1, i.e., every nonzero element has its multiplicative inverse.
This is an example of a division ring which is not a field.
(4) Define addition and multiplication in R² = {(a, b) | a, b ∈ R} as (a, b) + (c, d) = (a + c, b + d), (a, b)(c, d) = (ac − bd, ad + bc). It can be seen that (R², +, ·) is a field. In fact, for any nonzero (a, b) ∈ R², there exists (a, b)⁻¹ = (a/(a² + b²), −b/(a² + b²)).
(5) Consider the set M of all matrices

    ⎡ a b⎤
    ⎣−b̄ ā⎦ , a, b ∈ C.

Then M is a ring under matrix addition and matrix multiplication with identity

    ⎡1 0⎤
    ⎣0 1⎦ .

For any nonzero matrix

    X = ⎡ x + iy  a + ib⎤
        ⎣−a + ib  x − iy⎦ ∈ M,

there exists

    Y = (1/(x² + y² + a² + b²)) ⎡x − iy  −(a + ib)⎤
                                ⎣a − ib   x + iy  ⎦ ∈ M

such that

    XY = YX = ⎡1 0⎤
              ⎣0 1⎦ .

Hence M is a division ring which is not a field.
Proposition 1.50 Every field is an integral domain.
Proof Suppose that R is a field, and there exist a ≠ 0, b ≠ 0 in R such that ab = 0. Since b ≠ 0 is an element of the field R, there exists c ∈ R such that bc = cb = 1. Now 0 = (ab)c = a(bc) = a1 = a, a contradiction. Hence ab ≠ 0, and R is an integral domain.
Remark 1.51 An integral domain need not be a field. For example, Z, the ring of integers, is an integral domain which is not a field. But in the case of a finite integral domain, the converse of the above result is true.
Proposition 1.52 Every finite integral domain is a field.
Proof Let R = {a₁, a₂, . . . , aₙ} be a finite integral domain consisting of distinct elements a₁, a₂, . . . , aₙ, and let a be a nonzero element of R. Then aa₁, aa₂, . . . , aaₙ are all distinct and belong to R. In fact, if aaᵢ = aaⱼ for some i ≠ j, then a(aᵢ − aⱼ) = 0, which yields that aᵢ = aⱼ, a contradiction. Hence the set {aa₁, aa₂, . . . , aaₙ} coincides with R. In particular, aaₖ = a for some 1 ≤ k ≤ n. It can be seen that aₖ is the multiplicative identity of R. If a′ is an arbitrary element of R, then a′ = aaᵢ for some 1 ≤ i ≤ n. But a′aₖ = aₖa′ = aₖ(aaᵢ) = (aₖa)aᵢ = aaᵢ = a′ shows that aₖ is the identity of R; denote it by 1. Hence if 0 ≠ a ∈ R, we find that aaₘ = aₘa = 1 for some 1 ≤ m ≤ n. Thus every nonzero element in R has its multiplicative inverse, and R is a field.
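The proof of Proposition 1.52 is constructive in spirit: for a nonzero a the products aa₁, . . . , aaₙ exhaust R, so one of them equals 1. The following sketch (Python, illustrative only) finds the inverse of every nonzero element of Z₇ in exactly this brute-force way:

    # In the finite integral domain Z_7, every nonzero element has an inverse.
    p = 7
    for a in range(1, p):
        b = next(b for b in range(1, p) if (a * b) % p == 1)
        print(f"{a}^(-1) = {b} (mod {p})")
    # 1^(-1) = 1, 2^(-1) = 4, 3^(-1) = 5, 4^(-1) = 2, 5^(-1) = 3, 6^(-1) = 6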
Remark 1.53 (i) We have earlier shown that the characteristic of an integral domain is either 0 or p, where p is a positive prime integer. We have also proved that every field is an integral domain. Thus we conclude that the characteristic of a field is either 0 or p, where p is a positive prime integer.
(ii) The identity of a subfield is the same as the identity of the field.
Proposition 1.54 A field has no proper ideals. In particular, every field is a simple
ring.
Proof Let F be a field and suppose that I is an ideal of F. If I = {0}, there is nothing to do. If not, there exists a nonzero element a ∈ I. As F is a field, there exists the element a⁻¹ ∈ F such that a⁻¹a = 1. Since I is an ideal of F, it follows that a⁻¹a = 1 ∈ I. As a result, I = F. Hence F has no proper ideal. Finally, it follows that every field F is a simple ring.
Remark 1.57 (i) Let f(x) ∈ F[x] and let α ∈ F be a root of f(x). Dividing f(x) by (x − α) according to the division algorithm, there exist unique polynomials q(x) and r(x) such that f(x) = (x − α)q(x) + r(x), where either r(x) = 0 or deg r(x) < deg(x − α) = 1. This implies that r(x) = 0 or r(x) = λ, a nonzero constant polynomial. If r(x) = λ, then we get f(x) = (x − α)q(x) + λ. Using the fact that α is a root of f(x), we arrive at λ = 0, i.e., f(x) = (x − α)q(x). This shows that (x − α) is a factor of f(x). This is known as the factor theorem.
(ii) Let f(x), g(x) ∈ F[x]. We say that a nonzero polynomial f(x) divides a polynomial g(x) in F[x], symbolically written as f(x) | g(x), if there exists h(x) ∈ F[x] such that g(x) = f(x)h(x). Here f(x) and h(x) are called factors of g(x). In particular, if deg f(x) < deg g(x) and deg h(x) < deg g(x), then f(x) and h(x) are known as proper or nontrivial factors of g(x). Otherwise, f(x) and h(x) are known as improper or trivial factors of g(x).
(iii) Let f(x) ∈ F[x] be any nonconstant polynomial. If f(x) has no proper factors, then f(x) is known as an irreducible polynomial. Otherwise, it is known as a reducible polynomial.
(iv) Let f(x) ∈ F[x] be any nonconstant polynomial. Then f(x) = f₁(x)f₂(x) · · · fₙ(x), where f₁(x), f₂(x), . . . , fₙ(x) are some irreducible polynomials over F.
(v) Let K be an extension of F. An element α ∈ K is called a root of f(x) = α₀ + α₁x + α₂x² + · · · + αₙxⁿ ∈ F[x] if f(α) = 0, where f(α) = α₀ + α₁α + α₂α² + · · · + αₙαⁿ.
(vi) Let f(x) and g(x) be any two nonzero polynomials over F. Then deg(f(x) + g(x)) ≤ max(deg f(x), deg g(x)) and deg(f(x)g(x)) = deg f(x) + deg g(x).
(vii) Let f(x) ∈ F[x] be any polynomial of degree n ≥ 1. Then the total number of roots of f(x), counted with multiplicity in a splitting field, is n.
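The factor theorem of part (i) is easy to watch in action. The sketch below (using the SymPy library; the cubic chosen is a hypothetical illustration, not from the text) divides f(x) by (x − α) at a root α and exhibits the zero remainder:

    from sympy import symbols, div, Poly

    x = symbols('x')
    f = Poly(x**3 - 6*x**2 + 11*x - 6, x)   # f(x) = (x - 1)(x - 2)(x - 3)
    q, r = div(f, Poly(x - 2, x))           # divide by (x - alpha), alpha = 2
    print(q.as_expr(), '|', r.as_expr())    # x**2 - 4*x + 3 | 0
    # Since f(2) = 0, the remainder vanishes and (x - 2) is a factor of f(x).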
Proof Suppose that F is a field and S is a subfield of F. This implies that (S, +) is a subgroup of (F, +). Hence a, b ∈ S ⇒ a + (−b) = a − b ∈ S. Similarly, (S∗ = S\{0}, ·) is a subgroup of (F∗ = F\{0}, ·). This implies that a, b ∈ S∗ ⇒ ab⁻¹ ∈ S∗. In turn, we have ab⁻¹ ∈ S. On the other hand, if a = 0 and b ∈ S∗, then ab⁻¹ = 0 ∈ S. Combining the previous two statements, we have shown that a, 0 ≠ b ∈ S ⇒ ab⁻¹ ∈ S.
Conversely, suppose that a subset S of F with at least two elements satisfies (i) a, b ∈ S ⇒ a − b ∈ S; (ii) a, 0 ≠ b ∈ S ⇒ ab⁻¹ ∈ S. To show that (S, +, ·) is a field, we prove that (S, +, ·) is a commutative ring with identity in which each nonzero element is a unit. As |S| ≥ 2, there exists an element 0 ≠ x ∈ S. Now using (i), we have x − x = 0 ∈ S, and using (ii) we have xx⁻¹ = 1 ∈ S. Next suppose that a, b ∈ S. If at least one of them is zero, then ab = 0 ∈ S. Now suppose that a and b are both nonzero. Using (ii), we have 1b⁻¹ = b⁻¹ ∈ S. Obviously b⁻¹ ≠ 0. Now using (ii) again, we have a(b⁻¹)⁻¹ = ab ∈ S. Thus, we have shown that a, b ∈ S ⇒ ab ∈ S. Now utilizing (i) and the previous conclusion, we conclude that S is a subring of F with identity. As F is a commutative ring, S is a commutative ring with the identity 1 in which each nonzero element is a unit. Thus S is a field contained in F. Hence S is a subfield of F.
Definition 1.61 A field F is called a prime field if it has no proper subfield. This says that the only subfield of F is F itself.
Proposition 1.62 The fields Q and Z p , where p is a positive prime integer, are prime
fields.
Proposition 1.64 Let F be a field. If char(F) = 0, then there exists a subfield S₁ of F such that S₁ ≅ Q. Further, if char(F) = p, where p is a positive prime integer, then there exists a subfield S₂ of F such that S₂ ≅ Zₚ.
Proof Let us suppose that char(F) = 0. Define a map f : Q → F such that f(m/n) = (m1)(n1)⁻¹, where 1 is the identity of the field F. As char(F) = 0, we have (n1) ≠ 0 for any positive integer n. Therefore (n1)⁻¹ exists. It can be easily verified that f is a well-defined map and that it is an injective ring homomorphism. Thus Q ≅ f(Q). But we know that an isomorphic image of a field is a field. This shows that f(Q) is a field. Now our required subfield S₁ of F is f(Q), i.e., S₁ = f(Q) ⊆ F.
Now let us suppose that char(F) = p. Note that Zₚ = {0̄, 1̄, . . . , (p − 1)}. Now define a map f : Zₚ → F such that f(x̄) = x1, where 1 is the identity of the field F and x is an integer such that 0 ≤ x ≤ p − 1. It is easy to verify that f is a well-defined map and that it is an injective ring homomorphism. Hence Zₚ ≅ f(Zₚ). But f(Zₚ) is a field because an isomorphic image of a field is a field. Thus our desired subfield S₂ of F is f(Zₚ), i.e., S₂ = f(Zₚ) ⊆ F.
Definition 1.65 A field containing a finite number of elements is called a finite field
or a Galois field.
Remark 1.67 (i) The characteristic of a finite field F cannot be 0. For otherwise, i.e., if char(F) = 0, then 1 + 1, 1 + 1 + 1, 1 + 1 + 1 + 1, . . . will belong to F, as 1 ∈ F, and all these elements will be distinct. But in such a situation F will contain an infinite number of elements. This leads to a contradiction. Thus the characteristic of a finite field F will be a positive prime integer p.
Here it is to be noted that the converse of this statement is not true: there exist infinite fields of characteristic p. We construct an example of such a field. Let Zₚ denote the field of residue classes modulo p, let x be a transcendental element over the field Zₚ, and let Zₚ[x] denote the polynomial ring over the field Zₚ. Now define the set T = {f(x)/g(x) | f(x), g(x) ∈ Zₚ[x], g(x) ≠ 0}. It can be easily verified that T is a field with regard to addition and multiplication of rational functions. It is also obvious that T is an infinite field but char(T) = p.
(ii) A field F is finite if and only if its multiplicative group is cyclic.
(iii) The number of elements in a finite field will be of the form p n , where p is a
positive prime integer, n is any positive integer and char (F) = p. A finite field
or a Galois field containing p n elements is usually denoted by G F( p n ).
(iv) Given a positive prime p and a positive integer n, there exists a finite field
containing p n elements.
(v) Any two finite fields having the same number of elements are isomorphic.
Example 1.69 (1) The field of complex numbers C is algebraically closed, since by the Fundamental Theorem of Algebra every nonconstant polynomial over the field of complex numbers has all its roots in C.
(2) The field of real numbers R is not algebraically closed, since x² + 1 is a polynomial over the field of real numbers which has roots ±i, but these roots do not lie in R.
(3) The field of rational numbers Q is not algebraically closed, since x² − 3 is a polynomial over the field of rational numbers which has roots ±√3, but these roots do not lie in Q.
(4) A finite field F cannot be algebraically closed. Let F = {a₁, a₂, . . . , aₘ}. Now construct the polynomial P(x) = 1 + (x − a₁)(x − a₂) · · · (x − aₘ) over F. It is obvious that no element of F is a root of P(x). Thus F is not algebraically closed.
Proposition 1.70 Let F be a field. Then the following statements are equivalent for F:
(i) F is algebraically closed.
(ii) F has no proper algebraic extension.
(iii) Every irreducible polynomial over F is of degree 1.
(iv) Every nonconstant polynomial over F has a root in F.
(v) Every nonconstant polynomial over F has all its roots in F.
(vi) Every nonconstant polynomial over F breaks into linear factors over F.
Proof (i) ⇒ (ii) Let F be algebraically closed. We have to prove that F has no proper algebraic extension. Suppose, on the contrary, that there exists a proper algebraic extension K of F. This implies that there exists α ∈ K such that α ∉ F and α is a root of a polynomial f(x) over F. Let f(x) = α₀ + α₁x + α₂x² + · · · + αₙxⁿ, where n ≥ 1, be a polynomial over the field F. As F is algebraically closed, f(x) will have a root in F. Let this root be β₁ ∈ F. Hence by the factor theorem f(x) = (x − β₁)g(x), where g(x) ∈ F[x]. Using the hypothesis again, g(x) will have a root β₂ ∈ F. Now using the factor theorem again, we arrive at f(x) = (x − β₁)(x − β₂)g₁(x), where g₁(x) ∈ F[x]. Proceeding inductively, after a finite number of steps and using the fact that any polynomial over F of degree n has n roots, we obtain that f(x) = λ(x − β₁)(x − β₂) · · · (x − βₙ), where λ, β₁, . . . , βₙ ∈ F. Now we conclude that β₁, β₂, . . . , βₙ are precisely the roots of f(x). But α is also a root of f(x). Thus we get that α = βᵢ for some i, 1 ≤ i ≤ n. This implies that α ∈ F, which leads to a contradiction. Thus F has no proper algebraic extension.
(ii) ⇒ (iii) Suppose F has no proper algebraic extension. We have to show that every irreducible polynomial over F is of degree 1. Suppose, on the contrary, that there exists an irreducible polynomial f(x) over F of degree n ≥ 2. It is obvious that no root of f(x) lies in F. Let α be a root of f(x) lying outside F. Clearly α is algebraic over F. Let K = [F ∪ {α}] be the subfield generated by F ∪ {α}. As α is algebraic over F and K ≠ F, K is a proper algebraic extension of F. This leads to a contradiction. Hence every irreducible polynomial over F is of degree 1.
(iv) ⇒ (v) Suppose that every nonconstant polynomial over F has a root in F. Let f(x) ∈ F[x] be of degree n ≥ 1. Then f(x) will have a root in F. Let this root be β₁ ∈ F. Hence by the factor theorem f(x) = (x − β₁)g(x), where g(x) ∈ F[x]. Using the hypothesis again, g(x) will have a root β₂ ∈ F. Now using the factor theorem again, we arrive at f(x) = (x − β₁)(x − β₂)g₁(x), where g₁(x) ∈ F[x]. Proceeding inductively, after a finite number of steps and using the fact that the degree of f(x) is n, we obtain that f(x) = λ(x − β₁)(x − β₂) · · · (x − βₙ), where λ, β₁, . . . , βₙ ∈ F. Now we conclude that β₁, β₂, . . . , βₙ are precisely the roots of f(x) and that these lie in F. We are done.
(v) ⇒ (vi) Let every nonconstant polynomial over F have all its roots in F. Suppose that f(x) ∈ F[x] is of degree n ≥ 1. By hypothesis f(x) has all its roots in F. But we know that the total number of roots of f(x) is equal to n, the degree of the polynomial. Thus, let these roots be α₁, α₂, . . . , αₙ ∈ F. By the factor theorem, (x − α₁), (x − α₂), . . . , (x − αₙ) are factors of f(x). Thus, we can write f(x) = (x − α₁)(x − α₂) · · · (x − αₙ)g(x), where g(x) ∈ F[x]. Now comparing the degrees of both sides in the previous relation, we conclude that g(x) = λ ∈ F. Hence f(x) = λ(x − α₁)(x − α₂) · · · (x − αₙ). This completes the step.
(vi) ⇒ (i) Suppose that every nonconstant polynomial over F breaks into linear factors over F, and let f(x) = λ(x − α₁)(x − α₂) · · · (x − αₙ). Clearly α₁ is a root of f(x) which lies in F. This shows that F is algebraically closed.
Exercises
1. Prove that the only isomorphism of Q (resp. R) onto Q (resp. R) is the identity
mapping I Q (resp. I R ).
2. Show that the only isomorphism of C onto itself which maps reals to reals is the
identity mapping IC or the conjugation mapping.
3. Let R be a commutative ring with unity. Prove that R is a field if R has no proper
ideals.
4. Show that a finite field cannot be of characteristic zero.
5. Prove that for a fixed prime p, Q(√p) = {a + b√p | a, b ∈ Q} forms a field under usual addition and usual multiplication.
6. Let F be a field of characteristic p > 0 and a ∈ F be such that there exists no b ∈ F with bᵖ = a (i.e., ᵖ√a ∉ F). Then show that xᵖ − a is irreducible over F.
7. If f is a function from a field F to itself such that f(x) = x⁻¹ if x ≠ 0 and f(0) = 0, then show that f is an automorphism if and only if F has either 2, 3, or 4 elements.
8. Find all roots of x⁵ + 3̄x³ + x² + 2̄x ∈ Z₅[x] in Z₅.
9. Show that a factor ring of a field is either the trivial ring of one element or is
isomorphic to the field.
10. Show that for a field F, the set of all matrices of the form

    ⎡a b⎤
    ⎣0 0⎦ , a, b ∈ F,

is a right ideal of M₂(F) but not a left ideal of M₂(F). Moreover, find a subset of M₂(F) which is a left ideal of M₂(F), but not a right ideal of M₂(F).
11. Let R be a ring with identity. If char(R) = 0, then show that there exists a subring S₁ of R such that S₁ ≅ Z, the ring of integers. Further, if char(R) = n, where n is a positive integer, then prove that there exists a subring S₂ of R such that S₂ ≅ Zₙ, the ring of residue classes (mod n).
1.4 Matrices
Let F be a field. Most of the results we are going to discuss hold when F is an arbitrary field; nevertheless, in all that follows we always assume that the characteristic of F is different from 2.
A matrix over F of size m × n is a rectangular array with m rows and n columns of the form

    A = ⎡a₁₁ a₁₂ . . . a₁ₙ⎤
        ⎢a₂₁ a₂₂ . . . a₂ₙ⎥
        ⎢. . . . . . . . . ⎥
        ⎣aₘ₁ aₘ₂ . . . aₘₙ⎦

Denoting by Rᵢ and Cⱼ, respectively, the ith row and the jth column of A, we may represent A just by listing its rows or columns, that is, as

    A = [R₁, R₂, . . . , Rₘ] or A = [C₁, C₂, . . . , Cₙ].
Let Mmn(F) be the set of all m × n matrices over F. If A, B ∈ Mmn(F), then the sum A + B is the matrix in Mmn(F) obtained by adding together the corresponding entries of the two matrices. In other words, if A = [aᵢⱼ] and B = [bᵢⱼ], then A + B = C ∈ Mmn(F) is the matrix C = [cᵢⱼ] such that cᵢⱼ = aᵢⱼ + bᵢⱼ, for any i = 1, . . . , m and j = 1, . . . , n:

    C = ⎡a₁₁ + b₁₁  a₁₂ + b₁₂  . . .  a₁ₙ + b₁ₙ⎤
        ⎢a₂₁ + b₂₁  a₂₂ + b₂₂  . . .  a₂ₙ + b₂ₙ⎥
        ⎢. . .                        . . .     ⎥
        ⎣aₘ₁ + bₘ₁  aₘ₂ + bₘ₂  . . .  aₘₙ + bₘₙ⎦
Let now α ∈ F and A = [aᵢⱼ] ∈ Mmn(F). The scalar multiplication of A by α is the matrix in Mmn(F) obtained by multiplying each entry of A by the scalar α, that is, the matrix αA = [αaᵢⱼ]:

    αA = ⎡αa₁₁  αa₁₂  . . .  αa₁ₙ⎤
         ⎢αa₂₁  αa₂₂  . . .  αa₂ₙ⎥
         ⎢. . .               . . .⎥
         ⎣αaₘ₁  αaₘ₂  . . .  αaₘₙ⎦
Appealing to the field axioms and arguing entry-wise, it is easy to see that addition and scalar multiplication satisfy the following properties:
(1) For any A, B ∈ Mmn (F), A + B = B + A.
(2) For any A, B, C ∈ Mmn (F), A + (B + C) = (A + B) + C.
(3) For any A ∈ Mmn (F), A + 0m×n = 0m×n + A = A.
(4) For any A = (ai j ) ∈ Mmn (F), there exists the matrix B = (−ai j ) ∈ Mmn (F)
such that A + B = 0m×n . Usually such a matrix B is denoted by −A.
(5) For any α ∈ F and A, B ∈ Mmn (F), α(A + B) = α A + α B.
(6) For any α, β ∈ F and A ∈ Mmn (F), (α + β)A = α A + β A.
(7) If 1F is the identity element of F, then 1F A = A, for any A ∈ Mmn (F).
(8) For any α, β ∈ F and A ∈ Mmn (F), α(β A) = (αβ)A.
In particular, note that Mmn (F) is a commutative group with respect to the addition
between matrices. Moreover, it is an example of a vector space over F. The concept
of vector space is discussed in the next chapter.
Definition 1.72 Let A ∈ Mn (F). The sum of all entries on the main diagonal of A
is called the trace of A and denoted by tr (A).
The following properties satisfied by the trace are easy consequences of its definition.
(1) If A, B ∈ Mn (F), then tr (A + B) = tr (A) + tr (B).
(2) If A ∈ Mn (F) and α ∈ F, then tr (α A) = αtr (A).
Definition 1.73 Let A ∈ Mmn (F). The transpose of A, usually denoted by At , is the
n × m matrix obtained by interchanging rows and columns of A in such a way that
the ordered elements in the ith row (column) of A are exactly the ordered elements of
the ith column (resp. row) of At . In other words, if α ∈ F appears in the (r, s)-entry
of A, then it appears in the (s, r )-entry of At .
Here we also list a number of easy properties, whose proofs depend only on the
definition of the transpose.
(1) For any A ∈ Mmn (F), (At )t = A.
(2) For any A, B ∈ Mmn (F), (A + B)t = At + B t .
(3) For any A ∈ Mn (F) and α ∈ F, (α A)t = α At .
(4) For any A ∈ Mn (F), tr (At ) = tr (A).
A particular class of matrices, highly significant in further discussions, consists of those which coincide with their own transposes.
    cᵣₛ = ∑ₖ₌₁ⁿ aᵣₖbₖₛ = aᵣ₁b₁ₛ + aᵣ₂b₂ₛ + aᵣ₃b₃ₛ + · · · + aᵣₙbₙₛ.
It is clear that the product AB does not make sense if A ∈ Mmn (F) and B ∈ Mtq (F)
with n = t. Hence, the products AB and B A are simultaneously possible only if A
and B are matrices of the orders m × n and n × m, respectively. Nevertheless, even
when possible, it is not generally true that AB = B A. Thus the matrix product is not
commutative. Associativity of matrix multiplication and distributive properties are
given as below:
Proposition 1.76 (i) If A = [aᵢⱼ] ∈ Mmn(F), B = [bᵢⱼ] ∈ Mnt(F), C = [cᵢⱼ] ∈ Mtq(F), then (AB)C = A(BC), that is, the matrix product is associative.
(ii) For any A ∈ Mmn(F), B ∈ Mnt(F), C ∈ Mnt(F), we have A(B + C) = AB + AC, i.e., the matrix product is distributive over matrix addition.
Note 1.77 The earlier mentioned identity matrix of order n is the identity element for the matrix product in Mn(F). All the properties we have discussed lead us to the conclusion that, for any n ≥ 1, the set Mn(F), equipped with matrix addition and multiplication, is a (noncommutative) ring with unity.
Two relevant additional properties for the matrix product are the following:
(i) If A = (ai j ) ∈ Mmn (F) and B = (bi j ) ∈ Mnr (F), then (AB)t = B t At .
(ii) If A = (ai j ) ∈ Mn (F) and B = (bi j ) ∈ Mn (F), then tr (AB) = tr (B A).
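Both properties are easy to test numerically. Here is a NumPy sketch (with hypothetical random integer matrices, not part of the original text); note that tr(AB) = tr(BA) holds even for rectangular A and B of compatible sizes:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 6, size=(2, 3))
    B = rng.integers(-5, 6, size=(3, 2))

    print(np.array_equal((A @ B).T, B.T @ A.T))   # True: (AB)^t = B^t A^t
    print(np.trace(A @ B) == np.trace(B @ A))     # True: tr(AB) = tr(BA)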
The Determinant of a Square Matrix
Definition 1.78 Let A = (aᵢⱼ) ∈ Mn(F). Then the determinant of A, written as det A or |A|, is the element

    det A = ∑ sign(σ) a₁σ(1) a₂σ(2) · · · aₙσ(n) ∈ F,

the sum running over all σ ∈ Sn, where sign(σ) = 1 if σ is an even permutation, sign(σ) = −1 if σ is an odd permutation, and Sn is the permutation group on the n symbols 1, 2, . . . , n. We will also use the notation

    | a₁₁ . . . a₁ₙ |
    |  ⋮    ⋱    ⋮  |
    | aₙ₁ . . . aₙₙ |
Remark 1.79 (i) For n = 1, i.e., if A = (a) is a 1 × 1 matrix, then we have |A| = a.
(ii) For n = 2, i.e., if

    A = ⎡a₁₁ a₁₂⎤
        ⎣a₂₁ a₂₂⎦

is a 2 × 2 matrix, then we have |A| = a₁₁a₂₂ − a₁₂a₂₁.
(iii) If A is a diagonal matrix with diagonal entries λ₁, λ₂, . . . , λₙ, then |A| = λ₁λ₂ · · · λₙ.
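Definition 1.78 can be implemented verbatim by summing over all n! permutations. The following Python sketch (illustrative only, and hopelessly slow for large n) does exactly that and reproduces the 2 × 2 formula of (ii):

    from itertools import permutations

    def sign(sigma):
        # +1 for an even permutation, -1 for an odd one (counting inversions)
        inversions = sum(1 for i in range(len(sigma))
                           for j in range(i + 1, len(sigma)) if sigma[i] > sigma[j])
        return 1 if inversions % 2 == 0 else -1

    def det(A):
        n = len(A)
        total = 0
        for p in permutations(range(n)):
            term = sign(p)
            for i in range(n):
                term *= A[i][p[i]]   # the product a_{1,p(1)} ... a_{n,p(n)}
            total += term
        return total

    print(det([[1, 2], [3, 4]]))   # -2 = 1*4 - 2*3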
Definition 1.80 Any square matrix having determinant equal to zero (respectively
different from zero) is said to be singular (resp. nonsingular).
Determinants have the following well-known basic properties:
Theorem 1.81 Let A ∈ Mn(F). Then
(i) interchanging two rows of A changes the sign of det A,
(ii) det A = det Aᵗ,
(iii) for any B ∈ Mn(F), det(AB) = det A det B,
(iv) the determinant of an upper triangular or lower triangular matrix is the product of the entries on its main diagonal,
(v) if A has the block diagonal form

    A = ⎡B₁ 0  · · · 0 ⎤
        ⎢0  B₂ · · · 0 ⎥
        ⎢⋮       ⋱   ⋮ ⎥
        ⎣0  · · · 0  Bₘ⎦ ,

where B₁, B₂, . . . , Bₘ are square matrices, then det A = det B₁ det B₂ · · · det Bₘ.
Elementary Array Operations in Fⁿ
Given arrays R₁, R₂, . . . , Rₖ ∈ Fⁿ and scalars α₁, α₂, . . . , αₖ ∈ F, any expression of the form

    α₁R₁ + α₂R₂ + · · · + αₖRₖ ∈ Fⁿ    (1.1)

is called a linear combination of R₁, . . . , Rₖ. The arrays R₁, . . . , Rₖ are said to be linearly dependent if there exist scalars α₁, . . . , αₖ ∈ F, not all zero, such that

    α₁R₁ + α₂R₂ + · · · + αₖRₖ = 0.

They are said to be linearly independent if

    α₁R₁ + α₂R₂ + · · · + αₖRₖ = 0 if and only if α₁ = α₂ = · · · = αₖ = 0.

If R₁, . . . , Rₖ are linearly dependent, say

    α₁R₁ + α₂R₂ + · · · + αₖRₖ = 0

with αᵢ ≠ 0, then

    Rᵢ = −(α₁/αᵢ)R₁ − (α₂/αᵢ)R₂ − · · · − (αᵢ₋₁/αᵢ)Rᵢ₋₁ − (αᵢ₊₁/αᵢ)Rᵢ₊₁ − · · · − (αₖ/αᵢ)Rₖ,

where 1/αᵢ = αᵢ⁻¹ is the inverse of αᵢ as an element of the field F, implying that Rᵢ can be expressed as a linear combination of {R₁, . . . , Rᵢ₋₁, Rᵢ₊₁, . . . , Rₖ}.
Remark 1.84 Let {R1 , . . . , Rk } be a set of elements in Fn . If one of them is the zero
element 0 ∈ Fn , then R1 , . . . , Rk are linearly dependent.
For the sake of clarity, if R1 , . . . , Rk are linearly dependent (or independent) arrays,
we sometimes say that the set {R1 , . . . , Rk } is a linearly dependent set (resp. linearly
independent set).
Given a set S of elements of Fⁿ, one of the most important questions in linear algebra, if not the most important one, is how to recognize the (largest) number of linearly independent arrays in S. To answer this question we need to premise some remarks.
We first assume that the set

    {R₁, . . . , Rₖ} ⊂ Fⁿ    (1.2)

is linearly dependent, so that

    α₁R₁ + α₂R₂ + · · · + αₖRₖ = 0    (1.3)

for some αᵢ ≠ 0.
A first and obvious remark is that the order of comparison of the arrays does
not affect the fact that the set is linearly dependent. Thus, any other set obtained by
permuting R1 , . . . , Rk , is yet linearly dependent.
Let now 0 ≠ β ∈ F and consider the set of elements in Fⁿ obtained from (1.2) by replacing Rᵢ by βRᵢ. Taking γₕ = αₕ for any h ≠ i and γᵢ = αᵢβ⁻¹, we get

    γ₁R₁ + γ₂R₂ + · · · + γᵢ(βRᵢ) + · · · + γₖRₖ = 0,

so that the new set is again linearly dependent. Similarly, for 0 ≠ β ∈ F and two distinct rows Rᵢ and Rⱼ, consider the set

    {R₁, . . . , Rᵢ + βRⱼ, . . . , Rₖ}.    (1.4)

Notice that

    γ₁R₁ + γ₂R₂ + · · · + γᵢ(Rᵢ + βRⱼ) + · · · + γₖRₖ = 0,

where γₕ = αₕ, for any h ≠ j, and γⱼ = αⱼ − βαᵢ. We then conclude that the set described in (1.4) is linearly dependent.
Suppose now that the set (1.2) is linearly independent. Of course, any other set obtained by permuting (1.2) is again linearly independent.
Moreover, if we assume there exists some 0 ≠ β ∈ F such that

    γ₁R₁ + γ₂R₂ + · · · + γᵢ(βRᵢ) + · · · + γₖRₖ = 0

for some 0 ≠ γᵢ ∈ F, then taking αₕ = γₕ for h ≠ i and αᵢ = γᵢβ ≠ 0 would give a nontrivial linear combination of R₁, . . . , Rₖ equal to zero, contradicting the fact that (1.2) is linearly independent. Also in this case, multiplication by a nonzero scalar preserves the linear independence of the elements of the new set.
Finally, for 0 ≠ β ∈ F and two distinct rows Rᵢ, Rⱼ ∈ {R₁, . . . , Rₖ}, assuming that the set obtained by replacing Rᵢ by Rᵢ + βRⱼ is linearly dependent, we would have

    γ₁R₁ + γ₂R₂ + · · · + γᵢRᵢ + · · · + (γⱼ + γᵢβ)Rⱼ + · · · + γₖRₖ = 0,

with the γₕ not all zero, again contradicting the linear independence of (1.2). Hence operations of this type also preserve linear independence.
Consider then the set S = {R₁, . . . , Rₖ} ⊂ Fⁿ and assume that it is linearly dependent. Without loss of generality, we may assume that Rₖ is linearly dependent on R₁, . . . , Rₖ₋₁ (if not, it would be enough to permute the order of the arrays). By following the argument in Remark 1.83, Rₖ being a linear combination of R₁, . . . , Rₖ₋₁, there exist α₁, . . . , αₖ ∈ F such that

    Rₖ = −(α₁/αₖ)R₁ − (α₂/αₖ)R₂ − · · · − (αₖ₋₁/αₖ)Rₖ₋₁.

Hence, by a finite sequence of elementary operations, Rₖ may be replaced by the zero array, yielding a new set S′ = {R₁, . . . , Rₖ₋₁, 0} equivalent to S. Starting from this, we consider the subset {R₁, . . . , Rₖ₋₁}. If it is linearly independent, we have no chance to replace any further array by 0. But, in case it is linearly dependent, by repeating the process, we are able to replace, for instance, Rₖ₋₁ by 0 and obtain again a new set of arrays equivalent to S. If, after t steps, we arrive at a set

    S⁽ᵗ⁾ = {R₁, . . . , Rₖ₋ₜ, 0, . . . , 0}

and the subset {R₁, . . . , Rₖ₋ₜ} is linearly independent, then we stop the process.
Having in mind the above, it becomes clear that the largest number k − t of linearly independent arrays in S⁽ᵗ⁾ is equal to the largest number of linearly independent arrays in the starting set S. Moreover, if the original set S consists of all linearly independent elements, then there is no sequence of elementary operations that can transform it into a set having some zero element.
Equivalent Matrices
We have previously remarked how any matrix A = [aᵢⱼ] ∈ Mmn(F) can be represented just by listing its rows R₁, . . . , Rₘ.
In light of what we said above, k rows R₁, . . . , Rₖ extracted from the matrix are linearly dependent (or independent) if they are dependent (or independent) as elements of Fⁿ. In particular, to recognize the largest number of linearly independent rows in the set S = {R₁, . . . , Rₖ}, we may perform some appropriate elementary operations on S and obtain an equivalent set S′ of arrays in Fⁿ. The elements of S′ are not necessarily rows of A; nevertheless, the relationship of linear dependence (or independence) between the elements of S′ is the same as that between the elements of S.
Note 1.86 We introduce the following notations for elementary row operations on
a matrix A :
(I) Interchanging two rows Ri and R j will be denoted Ri ↔ R j .
(II) Multiplying a row Ri by a nonzero scalar α, in such a way that α Ri replaces
Ri in the new set of arrays, will be denoted by Ri → α Ri .
(III) Adding a constant multiple of a row R j (namely α R j ) to another row Ri , in
such a way that Ri + α R j replaces Ri in the new set of arrays, will be denoted
by Ri → Ri + α R j .
Definition 1.87 Two matrices A, A′ ∈ Mmn(F) are called equivalent if A can be transformed into A′ by performing a finite sequence of elementary row operations on it.
Remark 1.88 Let A, A′ ∈ Mn(F) be equivalent matrices. Thus A′ is obtained from A by performing a finite sequence of elementary row operations.
Hence, we may, in general, say that the reduced row form of a matrix is the following:

    ⎡0 · · 0 a₁ⱼ₁ · · · · · · · · · · · · · · a₁ₙ⎤
    ⎢0 · · · · · 0 a₂ⱼ₂ · · · · · · · · · ·  a₂ₙ⎥
    ⎢0 · · · · · · · · · 0 a₃ⱼ₃ · · · · · ·  a₃ₙ⎥
    ⎢·                                         · ⎥
    ⎣0 · · · · · · · · · · · · · 0 aₘⱼₘ · ·  aₘₙ⎦

where j₁ < j₂ < · · · < jₘ and the leading entries a₁ⱼ₁, a₂ⱼ₂, . . . , aₘⱼₘ (the pivots) are nonzero.
Definition 1.95 Let A ∈ Mmn (F). The rank of A, denoted by r (A), is the largest
order of any nonsingular square submatrix in A.
According to Theorem 1.94, the above definition of rank of a matrix can be reworded
as follows:
Definition 1.96 Let A ∈ Mmn (F). The rank of A is the largest number of linearly
independent rows ( or columns) in A.
Theorem 1.97 Let A, B ∈ Mmn (F) be two distinct matrices. If A, B are equivalent
then r (A) = r (B).
Proof We proved earlier that row operations do not affect the linear relationships
among rows. Also, since the rank is the largest number of linearly independent rows,
we may assert that two equivalent matrices have the same rank.
Therefore, in order to compute the rank of a matrix, we may change it into a reduced row form by appropriate elementary operations, from which we can directly see how large its rank is: the number of nonzero rows in its reduced row form is precisely the rank of the matrix.
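In practice this computation can be delegated to a computer algebra system. A SymPy sketch (with a hypothetical matrix whose second row is twice the first, not from the text):

    from sympy import Matrix

    A = Matrix([[1, 2, 3],
                [2, 4, 6],     # linearly dependent on the first row
                [1, 0, 1]])

    R, pivots = A.rref()       # reduced row form and the pivot columns
    print(R)
    print(len(pivots), A.rank())   # 2 2: the rank equals the number of nonzero rows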
Definition 1.98 A square matrix A ∈ Mn (F) is said to be invertible if there exists a
square matrix B ∈ Mn (F), called the inverse of A, such that AB = B A = In .
Since matrix multiplication is not commutative, one can give the notions of both
right and left inverse for a right invertible and left invertible matrix, respectively. We
have the following important elementary properties of invertible matrices:
    A → EₖA → Eₖ₋₁EₖA → · · · → E₁E₂ · · · EₖA.
Remark 1.101 (i) If the elementary matrix E results from performing a certain row operation on Im and if A is an m × n matrix, then the product EA is the matrix that results when the same row operation is performed on A.
(ii) Every nonsingular square matrix A of order n is equivalent to the identity matrix In, and there exist elementary matrices E₁, . . . , Eₜ such that

    In = EₜEₜ₋₁ · · · E₁A.

Moreover,

    A = (EₜEₜ₋₁ · · · E₁)⁻¹ = E₁⁻¹E₂⁻¹ · · · Eₜ⁻¹.

Recalling that the inverse of any elementary matrix is again an elementary matrix, we may also assert that any nonsingular square matrix can be expressed as the product of a finite number of elementary matrices.
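This factorization underlies the usual Gauss–Jordan recipe for inverting a matrix: the row operations that carry A to Iₙ carry Iₙ to A⁻¹, so row-reducing the augmented matrix [A | Iₙ] produces [Iₙ | A⁻¹]. A SymPy sketch with an illustrative 2 × 2 matrix (not from the text):

    from sympy import Matrix, eye

    A = Matrix([[2, 1],
                [1, 1]])
    aug = A.row_join(eye(2))     # the augmented matrix [A | I]
    R, _ = aug.rref()            # row operations turn the left block into I
    A_inv = R[:, 2:]             # ... and the right block into A^(-1)
    print(A_inv)                 # Matrix([[1, -1], [-1, 2]])
    print(A * A_inv == eye(2))   # True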
Exercises
where k ∈ R.
5. Let A ∈ Mn (C) be a Hermitian matrix. Show that its determinant is a real number.
6. Let A ∈ Mn(R) be such that Aᵗ = −A. Show that, if the order n is odd, then |A| = 0.
7. Compute the determinant of the following matrix

    ⎡1  1     1        · · ·  1        ⎤
    ⎢1  α     α²       · · ·  αⁿ⁻¹     ⎥
    ⎢1  α²    α⁴       · · ·  α²ⁿ⁻²    ⎥
    ⎢· · ·    · · ·    · · ·  · · ·    ⎥
    ⎣1  αⁿ⁻¹  α²⁽ⁿ⁻¹⁾  · · ·  α⁽ⁿ⁻¹⁾² ⎦ ,

where α = cos(2π/n) + i sin(2π/n).
8. Let Z7 = {0̄, 1̄, 2̄, 3̄, 4̄, 5̄, 6̄} be the field of residue classes (mod 7). Using ele-
mentary row operations, find the inverse of the matrix A ∈ M3 (Z7 ), where
        ⎡2̄ 3̄ 1̄⎤
    A = ⎢0̄ 4̄ 5̄⎥ .
        ⎣6̄ 3̄ 4̄⎦
1.5 System of Linear Equations

A linear equation over F in the unknowns x₁, . . . , xₙ is an equation of the form

    a₁x₁ + a₂x₂ + · · · + aₙxₙ = b.    (1.6)

A solution of (1.6) is a sequence of scalars c₁, . . . , cₙ such that

    a₁c₁ + a₂c₂ + · · · + aₙcₙ = b.

More generally, a system of m linear equations in the n unknowns x₁, . . . , xₙ is a collection of equations

    aᵢ₁x₁ + aᵢ₂x₂ + · · · + aᵢₙxₙ = bᵢ,  i = 1, . . . , m,    (1.7)

where aᵢ₁, . . . , aᵢₙ are the coefficients and bᵢ is the constant term of the ith equation in the system. A solution to the system (1.7) is a sequence of scalars c₁, . . . , cₙ that is simultaneously a solution to every equation in the system, that is,

    aᵢ₁c₁ + aᵢ₂c₂ + · · · + aᵢₙcₙ = bᵢ,  for all i = 1, . . . , m.
A system of linear equations may have infinitely many solutions, no solution, or a unique solution. The system is called consistent if it has at least one solution, and it is called inconsistent if it has no solution. Among consistent systems, those having infinitely many solutions are said to be indefinite, and those having a unique solution are said to be definite.
Definition 1.102 Two systems of linear equations involving the same variables x₁, . . . , xₙ are said to be equivalent if they have the same set of solutions.
To obtain the solutions of a linear system, we may then decide to study an equivalent
system, having the same set of solutions as the original one. The goal is to simplify
the given system in order to obtain an equivalent one, whose solutions are easier to
get. In this regard, we will emphasize the role played by three types of operations
that can be used on a system to obtain an equivalent system.
(1) If we interchange the order in which two equations of a system occur, this will
have no effect on the solution set.
(2) If one equation of a system is multiplied by a nonzero scalar, this will have no
effect on the solution set.
(3) By adding one equation to a multiple of another one, we create a new equation
certainly satisfied by any solution of the original equations. Thus, the system
consisting of both the original equations and the resulting new one is equivalent
to the original system.
In short, it seems natural that the main questions we must ask ourselves are:
(1) how to recognize whether a system is consistent;
(2) if it is consistent, how to describe the set of its solutions, by simply relying on an equivalent system.
To answer these questions, we will refer to the matrix theory previously developed. In doing so, the first step is to collect all coefficients from the equations of the system (1.7), in order to present them in tabular form
    A = ⎡a₁₁ a₁₂ . . . a₁ₙ⎤
        ⎢a₂₁ a₂₂ . . . a₂ₙ⎥
        ⎢. . .        . . .⎥
        ⎣aₘ₁ aₘ₂ . . . aₘₙ⎦

The matrix A is called the coefficient matrix for the system (1.7). Moreover, if we augment the coefficient matrix with the extra column

    B = ⎡b₁⎤
        ⎢b₂⎥
        ⎢ ⋮ ⎥
        ⎣bₘ⎦

whose entries are the constant terms of the equations of the system, we obtain an m × (n + 1) matrix

    C = ⎡a₁₁ a₁₂ . . . a₁ₙ | b₁⎤
        ⎢a₂₁ a₂₂ . . . a₂ₙ | b₂⎥
        ⎢. . .        . . . |  ⋮⎥
        ⎣aₘ₁ aₘ₂ . . . aₘₙ | bₘ⎦

that is called the augmented matrix for the system (1.7). The augmented matrix C is usually denoted by [A|B]. Hence, if we display the coefficients and constants of the system in matrix form, and introduce the array

    X = ⎡x₁⎤
        ⎢x₂⎥
        ⎢ ⋮ ⎥
        ⎣xₙ⎦

whose entries are the n unknowns used in the equations, we may express the system (1.7) compactly as follows:

    AX = B.    (1.8)
From this point of view, it is clear that the elementary row operations on the augmented matrix representing a system of linear equations exactly duplicate the above described operations on the equations of the system. This means that, if C = [A|B] ∈ Mm,n+1(F) is the augmented matrix for the original system (1.8) and C′ = [A′|B′] ∈ Mm,n+1(F) is row equivalent to C, then C′ is the augmented matrix for a system that is equivalent to (1.8). Therefore, to reduce (1.8) to an equivalent and easier system, we just perform elementary row operations on the augmented matrix of the system, in order to obtain its reduced row form. Then we consider the system associated with this last reduced matrix, to recognize whether it is consistent and, in case of a positive answer, to describe its set of solutions. At this point we need to fix a method that states unequivocally when a system is consistent or not.
Theorem 1.103 An m × n system of linear equations AX = B is consistent if and only if rank(A) = rank([A|B]).
Therefore we conclude that, in the case of a consistent system having rank r less than the number n of unknowns, the general solution of the system can be found as follows:
(i) Recognize the r leading unknowns: the ones whose coefficients are precisely the pivots of the augmented matrix in its reduced row form.
(ii) Assign arbitrary values to the n − r free unknowns.
(iii) For any given set of values for the free unknowns, the values of the leading ones are determined uniquely from the equations in the system, by using back substitution.
Hence the system admits infinitely many solutions depending on n − r free parameters.
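The recipe can be followed with a computer algebra system. The sketch below (SymPy, with a hypothetical consistent system of rank 2 in three unknowns, not from the text) performs the rank test of Theorem 1.103 and returns a general solution depending on n − r = 1 free parameter:

    from sympy import Matrix, symbols, linsolve

    x1, x2, x3 = symbols('x1 x2 x3')
    A = Matrix([[1, 1, 1],
                [1, 2, 3],
                [2, 3, 4]])      # third row = first row + second row
    B = Matrix([6, 10, 16])

    print(A.rank(), A.row_join(B).rank())   # 2 2: consistent (Theorem 1.103)
    print(linsolve((A, B), x1, x2, x3))     # {(x3 + 2, 4 - 2*x3, x3)}: x3 is free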
Homogeneous Systems Let us focus our attention on an important special case of systems of linear equations, namely when the constant terms are all zero, i.e., bᵢ = 0 for any i = 1, . . . , m in (1.7). Such a system is called homogeneous and, of course, always admits at least the trivial solution xᵢ = 0, for all i = 1, . . . , n. In fact, there is no doubt that homogeneous systems are consistent: the augmented matrix is obtained from the coefficient matrix by adjoining a zero column, and this does not change the rank of the matrix. Hence, the only real question is whether a homogeneous system admits other solutions besides the trivial one. To answer this question it is sufficient to recall the arguments presented as a consequence of Theorem 1.103, so that we may assert the following:
Exercises
in three unknowns and one real parameter k. Determine, in case they exist, the values of the real parameter k for which the system is consistent. Then, in those cases, determine all the solutions of the system.
4. Let C be the field of complex numbers. Are the following two systems of linear
equations equivalent? If so, express each equation in each system as a linear
combination of the equations in the other system.
and
(1 + 2i )x1 + 8x2 − i x3 − x4 = 0
2
x − 21 x2 + x3 + 7x4 = 0.
3 1
x1 + 2x2 − x3 + kx4 = 1
2x1 + x2 + x4 = 1
3x1 + 3x2 − x3 + (3 − k)x4 = −1
x1 + x2 + 2x4 = 2
−x1 − x2 + x3 + x4 = k 2
3x1 + 2x3 + 3x4 = k − 2
x1 + 2x2 − kx3 =k
x1 + x2 − kx3 =1
3x1 + (2 + k)x2 − (2 + k)x3 = 0
42 1 Algebraic Structures and Matrices
(1 − i)x1 − i x2 = 0
2x1 + (1 − i)x2 = 0.
Chapter 2
Vector Spaces
Definition 2.1 A nonempty set V equipped with a binary operation, say +, and an
external binary operation F × V → V such that (α, v) → α.v is said to be a vector
space over the field F if it satisfies the following axioms:
(1) (V, +) is an abelian group
(2) α.(v + w) = α.v + α.w
(3) (α + β).v = α.v + β.v
(4) (αβ).v = α.(β.v)
(5) 1.v = v for all v ∈ V , where 1 is the identity of F
for all α, β ∈ F and v, w ∈ V .
Remark 2.2 (i) The elements of V are called vectors while the elements of F are
said to be scalars. Throughout, the product between scalar α and vector v will
be denoted as αv instead of α.v. In axiom (3) the sum α + β is defined between
two scalars, the elements of the field F while the sum αv+βv is defined between
two vectors, the elements of V . There should be no confusion: the context will
make the intention clear. For the sake of convenience the symbol + will stand for
the addition of two vectors as well as of two scalars.
(ii) If F = R, then V is said to be a real vector space, while if F = C, then V is said
to be a complex vector space. For any α, β ∈ F the difference α − β represents
α + (−β), where (−β) is the additive inverse of β ∈ F, considered as an element
of additive group (F, +), while for any u, v ∈ V the difference u − v represents
the vector u + (−v), where −v is the additive inverse of v ∈ V in the additive
group (V, +).
Lemma 2.3 Let V be a vector space over a field F. Then for any α, β ∈ F and
v, w ∈ V
(i) α0 = 0.
(ii) 0v = 0.
(iii) −(αv) = α(−v) = (−α)v.
(iv) αv = 0 if and only if α = 0 or v = 0.
(v) (α − β)v = αv − βv.
(vi) α(v − w) = αv − αw.
The symbol 0 is used in two senses and there should be no confusion between them:
in the above result it denotes in some places the additive identity of the field F and in
others the additive identity of the additive group V . Henceforth we shall denote both
identities by the same symbol 0; in use it can be easily understood whether 0 denotes
the additive identity of a field or the additive identity of the group V .
Example 2.4 (1) Every field F is a vector space over its subfield. If E is a subfield
of F, then F is a vector space over E, under the usual addition of F and the
scalar multiplication αv (or the multiplication of F), for any α ∈ E, v ∈ F. In
particular, every field F is a vector space over itself, the field of complex numbers
C is a vector space over the field of reals R and finally the field R of reals is a
vector space over the field Q of rational numbers.
By (1), every field of characteristic 0 can be regarded as a vector space over
the field Q of rational numbers because each field of characteristic 0 contains a
subfield isomorphic to the field Q of rational numbers. Similarly every field of
characteristic p, where p is a positive prime integer, can be regarded as a vector
space over the field Z p of residue classes modulo p, because each field of
characteristic p contains a subfield isomorphic to the field Z p of residue classes
modulo p.
(2) The set Mm×n (F) of all m × n matrices over a field F is a vector space over F
under the matrix addition and scalar multiplication.
(3) Consider the set Fn = {(a1 , a2 , . . . , an ) | ai ∈ F, i = 1, 2, . . . , n}. Any two
elements x = (x1 , x2 , . . . , xn ), y = (y1 , y2 , . . . , yn ) ∈ Fn are said to be equal
if and only if xi = yi for each i = 1, 2, . . . , n. Now for any α ∈ F, define
addition and scalar multiplication in Fn as follows:
x + y = (x1 + y1 , x2 + y2 , . . . , xn + yn ),
αx = (αx1 , αx2 , . . . , αxn ).
It can be easily verified that Fn is a vector space over F.
(4) Let A be a nonempty set and let V be the set of all functions from A into F.
For any f, g ∈ V , α ∈ F and x ∈ A define
( f + g)(x) = f (x) + g(x),
(α f )(x) = α f (x).
It can be easily verified that the set V of all functions from A into F is a vector
space over F.
(5) Let F[x] be the polynomial ring in indeterminate x over the field F. In the abelian
group (F[x], +) define scalar multiplication as follows: for any α ∈ F and
f (x) = a0 + a1 x + · · · + an x^n ∈ F[x], α f (x) = αa0 + αa1 x + · · · + αan x^n .
Then F[x] is a vector space over F.
(6) Let Fn [x] denote the subset of F[x] consisting of all polynomials of degree less
than or equal to n. With the operations defined in (5), Fn [x] is a vector space
over F.
(7) Consider the set of all real-valued continuous functions C [0, 1] defined on the
interval [0, 1], i.e., f : [0, 1] → R. Define addition and scalar multiplication in
C [0, 1] as follows:
( f + g)(x) = f (x) + g(x),
(α f )(x) = α f (x)
for any α ∈ R, f, g ∈ C [0, 1], x ∈ [0, 1]. Then C [0, 1] is a vector space over
R.
(8) Let F[[x]] be the set of all formal power series in indeterminate x over a field F,
that is, the collection of expressions of the form f (x) = ∑_{i=0}^{∞} ai x^i , where
ai ∈ F. Any two power series f (x) = ∑_{i=0}^{∞} ai x^i and g(x) = ∑_{i=0}^{∞} bi x^i
are equal if and only if ai = bi for all i. For any α ∈ F define
f (x) + g(x) = ∑_{i=0}^{∞} (ai + bi )x^i ,
α f (x) = ∑_{i=0}^{∞} (αai )x^i .
Then F[[x]] is a vector space over F.
Lemma 2.6 A nonempty subset W of a vector space V over a field F is a subspace
of V if and only if (i) w1 + w2 ∈ W for all w1 , w2 ∈ W and (ii) αw ∈ W for all
α ∈ F, w ∈ W .
Proof If W is a subspace of V , then W is itself a vector space over the same field
F and hence for any α ∈ F and w ∈ W, αw ∈ W , as W is closed with regard to
scalar multiplication. Being an additive group, W is nonempty and closed under the
operation of addition, that is, for any w1 , w2 ∈ W, w1 + w2 ∈ W .
Conversely, if the conditions (i) and (ii) hold, then by (i) W is closed under
the addition and by (ii) W is closed under the scalar multiplication. Since W is
nonempty, there exists w ∈ W. Now by condition (ii), 0w = 0 ∈ W and also for
any w ∈ W, −w = (−1)w ∈ W . The operation of addition in V being associative
and commutative is also associative and commutative in W , and thus (W, +) is an
abelian group. The axioms (2)–(5) in the Definition 2.1 hold in W , as they hold in
V . Hence W is itself a vector space over the field F with regard to induced binary
operations and therefore W is a subspace of V .
Remark 2.7 Let V be any vector space. Then {0} and V are always subspaces of
V . These two subspaces are called trivial or improper subspaces of V . Any subspace
W of V other than {0} and V is called nontrivial or proper subspace of V .
Example 2.8 (1) In the vector space R3 , consider the subset W = {(x, y, z) ∈
R3 | αx + βy + γ z = 0}, where α, β, γ ∈ R. Since (0, 0, 0) ∈ W , W ≠ ∅. Let
(x1 , y1 , z 1 ), (x2 , y2 , z 2 ) ∈ W . Then α(x1 + x2 ) + β(y1 + y2 ) + γ (z 1 + z 2 ) =
(αx1 + βy1 + γ z 1 ) + (αx2 + βy2 + γ z 2 ) = 0 + 0 = 0,
and hence (x1 , y1 , z 1 ) + (x2 , y2 , z 2 ) ∈ W . Also since for any δ ∈ R and (x1 ,
y1 , z 1 ) ∈ W , α(δx1 ) + β(δy1 ) + γ (δz 1 ) = δ(αx1 + βy1 + γ z 1 ) = δ0 = 0
implies that δ(x1 , y1 , z 1 ) ∈ W . Hence W is a subspace of R3 .
(2) Consider the subsets W1 = {(x, 0) | x ∈ R} and W2 = {(0, y) | y ∈ R} of R2 . It
can be easily seen that W1 and W2 are subspaces of the vector space R2 over
the field R. Note that W1 and W2 are the X and Y axes, respectively. Similarly,
if we consider the X Y -plane W3 = {(x, y, 0) | x, y ∈ R} in R3 , it can be easily
verified that W3 is a subspace of the vector space R3 over R.
(3) In Example 2.4(6), the subset Fn [x] of F[x] is a subspace of F[x] over the field
F.
(4) Let V = C [0, 1] be the vector space of all real-valued continuous functions on
[0, 1]. Then W , the subset of V , consisting of all differentiable functions is a
subspace of V .
(5) Let Mn (R) denote the vector space of all n × n matrices with real entries over the
field of real numbers. The subsets W1 and W2 of V , consisting of all symmetric
and skew symmetric matrices respectively, are subspaces of the vector space
Mn (R).
Lemma 2.9 A nonempty subset W of a vector space V over a field F is a subspace
of V if and only if for any w1 , w2 ∈ W and α, β ∈ F, αw1 + βw2 ∈ W .
Proof If W is a subspace of V , then for any w1 , w2 ∈ W and α, β ∈ F, αw1 , βw2 ∈
W and hence, by Lemma 2.6, αw1 + βw2 ∈ W . Conversely, let W be a nonempty subset of V
such that for any w1 , w2 ∈ W and α, β ∈ F, αw1 + βw2 ∈ W . Since 1 ∈ F, w1 +
w2 = 1w1 + 1w2 ∈ W . Also since 0 ∈ F, αw1 = αw1 + 0w2 ∈ W , for any α ∈ F
and w1 ∈ W . Hence by Lemma 2.6, W is a subspace of V .
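The criterion of Lemma 2.9 can be checked symbolically for the subspace W of Example 2.8(1). The sketch below is an added illustration (the plane coefficients are renamed a, b, c to avoid clashing with the scalars alpha, beta).

```python
# Symbolic check of Lemma 2.9 for W = {(x, y, z) : a*x + b*y + c*z = 0}.
import sympy as sp

a, b, c, alpha, beta = sp.symbols('a b c alpha beta')
x1, y1, z1, x2, y2, z2 = sp.symbols('x1 y1 z1 x2 y2 z2')

def defect(p):
    """Value of the defining form; it vanishes exactly on W."""
    return a*p[0] + b*p[1] + c*p[2]

w1 = sp.Matrix([x1, y1, z1])
w2 = sp.Matrix([x2, y2, z2])

# defect(alpha*w1 + beta*w2) is the same combination of the two defects,
# hence it is zero whenever w1 and w2 lie in W.
diff = defect(alpha*w1 + beta*w2) - (alpha*defect(w1) + beta*defect(w2))
assert sp.expand(diff) == 0
```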
Definition 2.10 Let W1 , W2 , . . . , Wn be subspaces of a vector space V over F. Then
the sum of W1 , W2 , . . . , Wn is defined as ∑_{i=1}^{n} Wi = W1 + W2 + · · · + Wn = {w1 +
w2 + · · · + wn | wi ∈ Wi , i = 1, 2, . . . , n}.
Remark 2.11 It can be easily seen that ∑_{i=1}^{n} Wi is a subspace of V . In fact, 0 ∈
∑_{i=1}^{n} Wi , so ∑_{i=1}^{n} Wi ≠ ∅, and if x, y ∈ ∑_{i=1}^{n} Wi , then x = w1 + w2 + · · · + wn , y =
w′1 + w′2 + · · · + w′n , where wi , w′i ∈ Wi for each i = 1, 2, . . . , n. Since each Wi
is a subspace we find that x + y = (w1 + w′1 ) + (w2 + w′2 ) + · · · + (wn + w′n ) ∈
∑_{i=1}^{n} Wi and, for any α ∈ F, αx = αw1 + αw2 + · · · + αwn ∈ ∑_{i=1}^{n} Wi . Hence by Lemma
2.6, we find that ∑_{i=1}^{n} Wi is a subspace of V .
Lemma 2.12 If V is a vector space over F and {Wi }i∈I , where I is an index set, is
a collection of subspaces of V , then W = ∩i∈I Wi is a subspace of V .
The above result shows that the intersection of two subspaces over a field F is a
subspace over F, but it is to be noted that the union of two subspaces need not be a
subspace.
In view of the above result one can generalize the definition of direct sum to a finite
number of subspaces as follows: let V1 , V2 , . . . , Vn be vector spaces over the same
field F and let V = {(v1 , v2 , . . . , vn ) | vi ∈ Vi }, with the operations
(v1 , v2 , . . . , vn ) + (v′1 , v′2 , . . . , v′n ) = (v1 + v′1 , v2 + v′2 , . . . , vn + v′n ),
α(v1 , v2 , . . . , vn ) = (αv1 , αv2 , . . . , αvn ).
It can be easily seen that V is a vector space over the field F with regard to the
above operations. We call V the external direct sum of V1 , V2 , . . . , Vn , denoted as
V = V1 ⊕ V2 ⊕ · · · ⊕ Vn .
Proof For n = 1, V = W1 and the result is obvious; hence assume that n ≥ 2.
Suppose that V = W1 ⊕ W2 ⊕ · · · ⊕ Wn . Hence by Lemma 2.17, V = W1 + W2 +
· · · + Wn and Wi ∩ (W1 + W2 + · · · + Wi−1 + Wi+1 + · · · + Wn ) = {0}. Now for
some v j ∈ W j , let v1 + v2 + · · · + vn = 0. Then vi = −∑_{j=1, j≠i}^{n} v j ∈ Wi ∩ (W1 +
W2 + · · · + Wi−1 + Wi+1 + · · · + Wn ). This yields that vi = 0 for every i.
Conversely, assume that V = W1 + W2 + · · · + Wn and that for any vi ∈ Wi , 1 ≤ i ≤
n, v1 + v2 + · · · + vn = 0 implies v1 = v2 = · · · = vn = 0. For some i let v ∈
Wi ∩ (W1 + W2 + · · · + Wi−1 + Wi+1 + · · · + Wn ). Then v = wi ∈ Wi and v ∈
W1 + W2 + · · · + Wi−1 + Wi+1 + · · · + Wn implies that v = v1 + v2 + · · · + vi−1
+ vi+1 + · · · + vn , and consequently wi = v1 + v2 + · · · + vi−1 + vi+1 + · · · + vn .
This implies that ∑_{j=1}^{n} v j = 0, where vi = −wi . By our hypothesis v j = 0 for every j,
and in particular vi = 0 and hence v = −vi = 0. Therefore Wi ∩ (W1 + W2 + · · · +
Wi−1 + Wi+1 + · · · + Wn ) = {0} and hence V = W1 ⊕ W2 ⊕ · · · ⊕ Wn .
Definition 2.20 Let S be a non-empty subset of a vector space V over a field F, then
the linear span of S, denoted as L(S) is the set of all linear combinations of finite
sets of elements of S, i.e.,
L(S) = { ∑_{i=1}^{k} αi vi | αi ∈ F, vi ∈ S, k ∈ N }.
Note that in the above definition αi , vi , k are chosen from their respective domains.
Since α0 = 0 for all α ∈ F, for S = {0}, L(S) = {0}. Moreover, if S = {v} for some
v ∈ V , then L(S) = {αv | α ∈ F}.
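Membership in L(S) can be tested by a rank computation: v is a linear combination of the vectors of a finite set S exactly when adjoining v as an extra column does not raise the rank. A small sketch with hypothetical vectors:

```python
# Rank test for membership in the linear span L(S).
import sympy as sp

S = [sp.Matrix([1, 0, 1]), sp.Matrix([0, 1, 1])]
v = sp.Matrix([2, 3, 5])   # equals 2*S[0] + 3*S[1], so v is in L(S)
w = sp.Matrix([1, 1, 1])   # easily seen not to be a combination of S

M = sp.Matrix.hstack(*S)
assert M.col_insert(2, v).rank() == M.rank()       # v lies in L(S)
assert M.col_insert(2, w).rank() == M.rank() + 1   # w does not lie in L(S)
```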
In view of Theorem 2.21(iii) and (v), the following result follows directly:
Corollary 2.22 If W1 , W2 are any two subspaces of a vector space V over F, then
W1 + W2 is a subspace of V spanned by W1 ∪ W2 .
Theorem 2.24 Let S be a nonempty subset of a vector space V over a field F, then
(i) ⟨S⟩ = L(S),
(ii) ⟨S⟩ = ∩i∈I Wi , where I is an index set and each Wi is a subspace of V containing
S as a subset.
Definition 2.27 Let V be a vector space over a field F and W a subspace of V . Let
V /W = {v + W | v ∈ V }.
Remark 2.28 Since (V /W, +) is the quotient group of (V, +) with regard to its
normal subgroup (W, +), it is obvious to observe the following for all v, v1 , v2 ∈ V :
(i) v + W ≠ ∅.
(ii) v + W = W if and only if v ∈ W .
(iii) v1 + W = v2 + W if and only if (v1 − v2 ) ∈ W .
(iv) Any two elements of V /W are either equal or mutually disjoint.
(v) Union of all the elements of V /W equals V . Thus the quotient space V /W gives
a partition of the vector space V .
Example 2.29 Let V be the vector space of all polynomials of degree less than or
equal to n over the field R of real numbers, where n ≥ 2 is a fixed integer. Assume
that W is a subspace of V , consisting of all polynomials of degree less than or equal
to (n − 2).
Exercises
1. Let U and W be vector spaces over a field F. Let V be the set of ordered pairs
(u, w) where u ∈ U and w ∈ W . Show that V is a vector space over F with regard
to addition in V and scalar multiplication on V defined by (u, w) + (u′, w′) =
(u + u′, w + w′) and α(u, w) = (αu, αw), where α ∈ F.
2. Let AX = B be a nonhomogeneous system of linear equations in n unknowns,
that is, B ≠ 0. Show that the solution set is not a subspace of Fn .
3. Let V be the vector space of all functions from the real field R into R. Prove that
W is a subspace of V if W consists of all bounded functions.
4. Suppose U, W1 , W2 are subspaces of a vector space V over a field F. Show that
(U ∩ W1 ) + (U ∩ W2 ) ⊆ U ∩ (W1 + W2 ).
5. Give examples of three subspaces U, W1 , W2 of a vector space V such that
(U ∩ W1 ) + (U ∩ W2 ) ≠ U ∩ (W1 + W2 ).
6. Let S = {(xn ) | xn ∈ R} be the set of all real sequences. Then S is a vector space
over R under the following operations:
(xn ) + (yn ) = (xn + yn ), α(xn ) = (αxn ), α ∈ R.
Let C be the set of all convergent sequences and C0 be the set of all null sequences.
Then show that C and C0 are vector subspaces of S.
7. Why does a vector space V over F(= C, R or Q) contain either one element or
infinitely many elements? Given v ∈ V, is it possible to have two distinct vectors
u, w ∈ V such that u + v = 0 and w + v = 0?
8. Let H be the collection of all complex 2 × 2 matrices of the form ⎡ a b ⎤ .
                                                                    ⎣ −b̄ ā ⎦
   Show that H is a vector space under the usual matrix addition and scalar
   multiplication over R. Is H also a vector space over C?
9. Show that if W1 is a subspace of a vector space V , and if there is a unique
subspace W2 such that V = W1 ⊕ W2 , then W1 = V .
10. Let U = {(a, b, c) | a = b = c, a, b, c ∈ R} and W = {(0, b, c) | b, c ∈ R} be
subspaces of R3 . Show that R3 = U ⊕ W .
11. Let U1 = {(a, b, c) | a = c, a, b, c ∈ R}, U2 = {(a, b, c) | a + b + c = 0, a, b,
c ∈ R} and U3 = {(0, 0, c) | c ∈ R} be subspaces of R3 . Show that
(a) R3 = U1 + U2 ,
(b) R3 = U2 + U3 ,
(c) R3 = U1 + U3 .
When is the sum a direct sum?
12. Let W1 , W2 and W3 be subspaces of a vector space V . Show that W1 + W2 + W3
is not necessarily a direct sum even though
W1 ∩ W2 = W1 ∩ W3 = W2 ∩ W3 = {0}.
14. Let C(R) be the vector space of all real-valued functions over R, and let W1 and
W2 be the collections of even and odd continuous functions on R, respectively.
Show that W1 and W2 are subspaces of C(R). Show further that C(R) = W1 ⊕
W2 .
15. Give an example of a vector space V having any three different nonzero subspaces
W1 , W2 , W3 such that V = W1 ⊕ W2 = W2 ⊕ W3 = W3 ⊕ W1 .
Example 2.31 (1) Let V = {(a, b, c) | a, b, c ∈ F}. This is a vector space over F
in which vectors e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1) are linearly inde-
pendent. In fact, if there exist scalars α1 , α2 , α3 ∈ F such that α1 (1, 0, 0) +
α2 (0, 1, 0) + α3 (0, 0, 1) = (α1 , α2 , α3 ) = (0, 0, 0), then α1 = α2 = α3 = 0.
(2) Let V denote the vector space of all polynomials in x over the field R of
real numbers, i.e., V = {α0 + α1 x + α2 x 2 + · · · + αn x n | α0 , α1 , α2 , . . . , αn ∈
R, n ∈ N ∪ {0}}. Assume S is an infinite subset of V , where S = {1, x, x 2 , x 3 ,
x 4 , . . . , x n , . . .}. The set S is a linearly independent subset of V . This is due
to the fact that if we take any finite subset of S, then it will be of the form
{x^{i1} , x^{i2} , x^{i3} , . . . , x^{im} }, where i 1 , i 2 , i 3 , . . . , i m are some nonnegative integers,
and if ∑_{k=1}^{m} λ_{ik} x^{ik} = 0, this shows that λ_{ik} = 0 for each k = 1, 2, . . . , m and
hence S is linearly independent.
Proof If S is linearly dependent, then there exist scalars βi ∈ F, not all zero, such
that ∑_{i=1}^{n} βi vi = 0. Suppose that βi ≠ 0 for some i. Then the above expression
can be written as vi = ∑_{j=1, j≠i}^{n} (−βi^{−1} β j )v j , i.e., vi = ∑_{j=1, j≠i}^{n} α j v j , where α j =
−βi^{−1} β j ∈ F.
(2) Let V = Fn [x], the set of all polynomials in indeterminate x of degree less than or
equal to n. V is a vector space over the field F. Now let B = {1, x, x 2 , . . . , x n } ⊆
V . It can be easily seen that any f (x) ∈ V can be written as f (x) = α0 1 + α1 x +
· · · + αn x n with αi ∈ F. This shows that B spans V . Moreover, α0 1 + α1 x +
· · · + αn x n = 0 yields that α0 = α1 = · · · = αn = 0 and hence B is linearly
independent. Thus B = {1, x, x 2 , . . . , x n } is a basis of V , which is called the
standard basis of V .
(3) The set M2 of all 2 × 2 matrices over R forms a vector space over R. The set
    B = { ⎡ 1 0 ⎤ , ⎡ 0 1 ⎤ , ⎡ 0 0 ⎤ , ⎡ 0 0 ⎤ }
          ⎣ 0 0 ⎦   ⎣ 0 0 ⎦   ⎣ 1 0 ⎦   ⎣ 0 1 ⎦
is a standard basis of M2 .
Definition 2.38 Let V be a vector space over a field F. A subset S of V is called a
maximal linearly independent set if
(i) S is linearly independent.
(ii) S ⊂ S′, where S′ ⊆ V , implies that S′ is a linearly dependent set.
Theorem 2.40 Let V be a vector space over a field F and let S ⊆ V . Then the
following statements are equivalent.
(i) S is a maximal linearly independent set of V .
(ii) S is a minimal set of generators of V .
(iii) S is a basis of V .
(iv) Every element of V can be uniquely written as a linear combination of finitely
many elements of S, i.e., for any v ∈ V , if v = α1 v1 + α2 v2 + · · · +
αn vn and v = β1 v1 + β2 v2 + · · · + βn vn , where αi , βi ∈ F and vi ∈ S, i =
1, 2, 3, . . . , n, then αi = βi for all i = 1, 2, 3, . . . , n.
Proof (i) ⇒ (ii) We know that ⟨S⟩ = L(S). First we prove that L(S) = V . L(S) ⊆
V holds obviously. Let v ∈ V . If v ∈ S, then obviously v = 1v, which shows that
v ∈ L(S). On the other hand, if v ∈ V \ S, then S ∪ {v} is a linearly dependent
set since S is a maximal linearly independent set. There exists a finite subset T
of S ∪ {v} containing the element v, which is a linearly dependent set. Let
T = {v1 , v2 , . . . , vn , v}. There exist scalars α1 , α2 , . . . , αn , α, not all zero, such that
α1 v1 + α2 v2 + · · · + αn vn + αv = 0. Here we claim that α ≠ 0, for otherwise the
set {v1 , v2 , . . . , vn } becomes a linearly dependent set, leading to a contradiction. Now
we get v = (−(α −1 α1 ))v1 + (−(α −1 α2 ))v2 + · · · + (−(α −1 αn ))vn . This implies that
v ∈ L(S) and thus V ⊆ L(S). Finally we have proved that L(S) = V .
Now we prove that S is a minimal set such that ⟨S⟩ = V . Suppose on the contrary
that there exists P ⊊ S such that ⟨P⟩ = V , i.e., L(P) = V . Let w ∈ S − P. There
exist scalars β1 , β2 , . . . , βm such that w = β1 w1 + β2 w2 + · · · + βm wm for some
w1 , w2 , . . . , wm ∈ P.
Theorem 2.43 If a vector space V over a field F has a basis containing m vectors,
where m is a positive integer, then any set containing n vectors, where n > m, in V
is linearly dependent.
Proof Let {v1 , v2 , . . . , vm } be a basis of V and let {w1 , w2 , . . . , wn } be any set of
n vectors in V , where n > m. For each j, 1 ≤ j ≤ n, write
w j = ∑_{i=1}^{m} αi j vi ,
where αi j ∈ F, and consider a relation
∑_{j=1}^{n} β j w j = 0.
This yields that ∑_{j=1}^{n} β j ( ∑_{i=1}^{m} αi j vi ) = 0, i.e., ∑_{i=1}^{m} ( ∑_{j=1}^{n} αi j β j )vi = 0.
But since {v1 , v2 , . . . , vm } is a basis of V , 0 = 0v1 + 0v2 + · · · + 0vm is the
unique representation of the 0 vector. Hence the above expression yields that ∑_{j=1}^{n} αi j β j
= 0, for each i such that 1 ≤ i ≤ m. This is a system of m homogeneous linear equa-
tions in n unknowns. Thus by Theorem 1.104, there exists a nontrivial solution, say
β1 , β2 , . . . , βn . This ensures that there exist scalars β1 , β2 , . . . , βn , not all zero, such
that ∑_{j=1}^{n} β j w j = 0 and hence the set {w1 , w2 , . . . , wn } is linearly dependent.
Definition 2.44 A vector space V over a field F is said to be finite dimensional (resp.
infinite dimensional) if there exists a finite (resp. infinite) subset S of V which spans
V , i.e., L(S) = V .
Remark 2.45 If a vector space V has a basis with a finite (resp. an infinite) number
of vectors, then it is finite (resp. infinite) dimensional. The number of vectors of
a basis of V is called the dimension of V denoted as dimV . If V = {0}, then its
dimension is taken to be zero.
Theorem 2.46 Let V be a vector space over a field F. If it has a finite basis, then
any two bases of V have the same number of elements.
Proof Let B = {v1 , v2 , . . . , vn } and B′ = {w1 , w2 , . . . , wm } be two bases of V . As
B is a basis and the set B′ is a linearly independent set of V , by Theorem 2.43, we
arrive at m ≤ n. Similarly, as B′ is a basis of V and the set B is a linearly independent
set of V , again by Theorem 2.43, we conclude that n ≤ m. This yields that m = n.
Theorem 2.47 Let V be an n-dimensional vector space over a field F. Then any
linearly independent subset of V consisting of n elements is a basis of V .
Theorem 2.48 Let V be an n-dimensional vector space over a field F. Then any
linearly independent subset {v1 , v2 , . . . , vm }, m ≤ n, of V can be extended to a
basis of V .
Proof Let dimV = n and dimW = m. If W = {0}, then W′ = V and, on the other
hand, if W = V , then W′ = {0}. Hence assume that neither W = {0} nor W = V . In
this case 1 ≤ m < n. Let B1 = {v1 , v2 , . . . , vm } be a basis of W . Since B1 is linearly
independent, by the above theorem B1 can be extended to a basis of V , say B =
{v1 , v2 , . . . , vm , vm+1 , . . . , vn }. Now B = B1 ∪ B2 with B2 = {vm+1 , vm+2 , . . . , vn }
and B1 ∩ B2 = ∅. Let W′ be the subspace spanned by B2 . Since V and W are spanned by B and B1 ,
respectively, by Lemma 2.41, V = W + W′ and W ∩ W′ = {0}, i.e., V = W ⊕ W′.
Theorem 2.50 Let V be a finite dimensional vector space over a field F and U a
subspace of V . Then dimU ≤ dimV . Equality holds only when U = V .
∑_{i=1}^{k} αi vi + ∑_{j=1}^{r} β j w j + ∑_{ℓ=1}^{s} δℓ u ℓ = 0,
so that ∑_{i=1}^{k} αi vi + ∑_{j=1}^{r} β j w j = −∑_{ℓ=1}^{s} δℓ u ℓ . The expression on the right side
is in W2 and that on the left side is in W1 . Therefore −∑_{ℓ=1}^{s} δℓ u ℓ ∈ W1 ∩ W2 , and
hence it can be written as −∑_{ℓ=1}^{s} δℓ u ℓ = ∑_{j=1}^{k} γ j v j for some γ1 , γ2 , . . . , γk ∈ F.
It follows that
∑_{i=1}^{k} αi vi + ∑_{j=1}^{r} β j w j = −∑_{ℓ=1}^{s} δℓ u ℓ = 0.
{v1 , v2 , . . . , vm , vm+1 , . . . , vn }.
This shows that S spans V /W . Further, we show that S is linearly independent. Let
βm+1 , βm+2 , . . . , βn ∈ F such that
βm+1 (vm+1 + W ) + βm+2 (vm+2 + W ) + · · · + βn (vn + W ) = W.
This implies that βm+1 vm+1 + βm+2 vm+2 + · · · + βn vn ∈ W . Therefore there exist
δ1 , δ2 , . . . , δm ∈ F such that
βm+1 vm+1 + βm+2 vm+2 + · · · + βn vn = δ1 v1 + δ2 v2 + · · · + δm vm ,
i.e., δ1 v1 + · · · + δm vm + (−βm+1 )vm+1 + · · · + (−βn )vn = 0. Since {v1 , v2 , . . . , vn }
is a basis of V , it is linearly independent, and hence βm+1 = βm+2 = · · · = βn = 0.
This shows that S is linearly independent and therefore a basis of V /W .
Exercises
U = Span{(1, 3, −2, 2, 3), (1, 4, −3, 4, 2), (2, 3, −1, −2, 9)},
Vector spaces and notions involved there can be interpreted geometrically in some
cases. We have chosen the vector spaces R2 and R3 over the field R of real numbers
for geometrical interpretation.
denote the quotient space of R2 with regard to the subspace W ; it is given by
R2 /W = {v + W | v ∈ R2 }. Consider a coset (v + W ) ∈ R2 /W , where v = (a, b).
Here (v + W ) = {(a + λα, b + λβ) | λ ∈ R}. If (x, y) is any arbitrary element of
(v + W ), then we get (x − a)/α = (y − b)/β. Geometrically the latter equation gives a
straight line passing through (a, b) and parallel to the straight line represented by W .
This shows that geometrically each coset (v + W ) is a straight line passing through
the point v and parallel to the line represented by the subspace W . Thus geometrically
the quotient space R2 /W is the collection of all the straight lines which are parallel
to the straight line represented by W .
Similarly, in R3 let W2 be the subspace spanned by two linearly independent vectors
v1 and v2 ; then W2 consists of all the points of the plane passing through the origin,
v1 and v2 . This shows that geometrically W2 represents a plane passing through the
origin. Conversely, suppose a plane passing through the origin is given by
px + qy + r z = 0, where (0, 0, 0) ≠ ( p, q, r ) ∈ R3 and, say, p ≠ 0. If W denotes
the set of all the points lying on this plane, then W = {((−qλ − r μ)/ p, λ, μ) | λ, μ ∈ R}
or W = {λ(−q/ p, 1, 0) + μ(−r/ p, 0, 1) | λ, μ ∈ R} ⊂ R3 . It can be easily verified
that W is a subspace of R3 .
Exercises
1. Let R2 be the vector space over the field R of real numbers. Let S be any subset
of R2 as given below. Then find the subspace generated by S, i.e., ⟨S⟩. Also find
out the equation of the curve represented by this subspace.
(a) S = {(3, 5)}.
(b) S = {(2, −3), (4, −6)}. √ √
(c) S = {(−3, −8), (3, 8), (3 5, 8 5)}.
2. Let R3 be the vector space over the field R of real numbers. Let S be any subset
of R3 as given below. Then find the subspace generated by S, i.e., ⟨S⟩. Also find
out the equations of the curves or surfaces represented by these subspaces.
(a) S = {(−5, 11, 3)}.
(b) S = {(5, −3, 17), (3, −8, −11)}.
(c) S = {(3, 5, −3), (6, 10, −6), (7, −8, −6)}.
Let V be a finite dimensional vector space over a field F and let B = {v1 , v2 , . . . , vn }
be a basis of V . Fix the order of elements in the basis B. If v ∈ V , then there exist
unique scalars α1 , α2 , . . . , αn ∈ F such that v = α1 v1 + α2 v2 + · · · + αn vn . Thus
v ∈ V determines a unique n × 1 matrix
⎡ α1 ⎤
⎢ α2 ⎥
⎢ ...⎥
⎣ αn ⎦ .
This n × 1 matrix is known as the coordinate vector of v relative to the ordered basis
B, denoted as [v] B . Notice that the
basis B and the order of the elements in B play a very important role in determining
the coordinate vector of any arbitrary element in V . Throughout this section we shall
consider the ordered basis.
Definition 2.56 Let V be a vector space over a field F and let
B1 = {u 1 , u 2 , . . . , u n } and B2 = {v1 , v2 , . . . , vn } be two ordered bases of V . Now
if we consider B2 as a basis of V , then each vector in V can be uniquely written as
a linear combination of v1 , v2 , . . . , vn ; in particular, u i = ∑_{j=1}^{n} α ji v j for each i,
1 ≤ i ≤ n, where α ji ∈ F. Then the n × n matrix P = (α ji ) is called the transition
matrix of B1 relative to the basis B2 . Similarly, if we consider B1 as a basis of V ,
then we can write v j = ∑_{i=1}^{n} βi j u i for each j, 1 ≤ j ≤ n, and the n × n matrix
Q = (βi j ) is known as the transition matrix of B2 relative to the basis B1 .
Remark 2.57 (i) Notice that in the above definition the coordinate vector of u i
relative to the basis B2 ,
⎡ α1i ⎤
⎢ α2i ⎥
[u i ] B2 = ⎢ ... ⎥ ,
⎣ αni ⎦
is the ith column vector of the matrix P.
(ii) In Fn , the standard ordered basis is {e1 , e2 , . . . , en }, where ei is the n-tuple
with ith component 1, and all the other components are zero. For example, in R3 ,
B1 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is the standard ordered basis, but the ordered
basis B2 = {(0, 1, 0), (1, 0, 0), (0, 0, 1)} is not the standard ordered basis of R3 .
Example 2.58 Let B1 = {e1 , e2 , e3 } and B2 = {(1, 1, 0), (1, −1, 0), (0, 1, 1)} be
two ordered bases of the vector space R3 . Writing each ei as a linear combination
of the vectors of B2 , one finds that the transition matrix of B1 relative to B2 is
    ⎡ 1/2  1/2 −1/2 ⎤
P = ⎢ 1/2 −1/2  1/2 ⎥ ,
    ⎣  0    0    1  ⎦
while the transition matrix of B2 relative to B1 is
    ⎡ 1  1 0 ⎤
Q = ⎢ 1 −1 1 ⎥ ,
    ⎣ 0  0 1 ⎦
and a direct computation shows that Q P = P Q = I3 .
Proof Let P = (αi j ) and Q = (βi j ) be the transition matrices of B1 relative to the
basis B2 and of B2 relative to the basis B1 , respectively. Hence we find that u j =
∑_{i=1}^{n} αi j vi for each j, 1 ≤ j ≤ n, where αi j ∈ F, and vi = ∑_{k=1}^{n} βki u k for each
i, 1 ≤ i ≤ n. This shows that
u j = ∑_{i=1}^{n} αi j ( ∑_{k=1}^{n} βki u k ) = ∑_{k=1}^{n} ( ∑_{i=1}^{n} βki αi j ) u k = ∑_{k=1}^{n} δk j u k ,
where δk j = ∑_{i=1}^{n} βki αi j is the (k, j)-th entry of the product Q P. But since B1 is
linearly independent, we find that δk j = 1 for k = j and δk j = 0 for k ≠ j. This
yields that Q P = I . In a similar manner, it can be shown that P Q = I . Hence P is
nonsingular and Q = P −1 .
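For the bases of Example 2.58 the relation Q = P −1 can be confirmed numerically; the sympy sketch below is an added check, not part of the text.

```python
# Verifying Q = P^{-1} for Example 2.58: B1 = {e1, e2, e3},
# B2 = {(1,1,0), (1,-1,0), (0,1,1)}.
import sympy as sp

# Columns of Q are the B2 vectors written in the standard basis B1.
Q = sp.Matrix([[1,  1, 0],
               [1, -1, 1],
               [0,  0, 1]])

# Column j of P solves Q * p = e_j, i.e. P = Q^{-1}.
P = Q.inv()

assert Q * P == sp.eye(3) and P * Q == sp.eye(3)
print(P)  # Matrix([[1/2, 1/2, -1/2], [1/2, -1/2, 1/2], [0, 0, 1]])
```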
Proof Since dimV = n, it is enough to prove that B′ spans V . Let W = L(B′) and
P −1 = (βi j ). Then for each t, 1 ≤ t ≤ n, we find that
∑_{j=1}^{n} β jt v j = ∑_{j=1}^{n} β jt ( ∑_{i=1}^{n} αi j u i ) = ∑_{i=1}^{n} ( ∑_{j=1}^{n} αi j β jt ) u i = ∑_{i=1}^{n} δit u i = u t ,
where δit = 1 ∈ F for i = t and δit = 0 ∈ F for i ≠ t. Hence u t ∈ W for every t,
and consequently B′ spans V .
Proof We find that f k = ∑_{j=1}^{n} α jk e j for each k, 1 ≤ k ≤ n, and gi = ∑_{k=1}^{n} βki f k
for each i, 1 ≤ i ≤ n. This yields that
gi = ∑_{k=1}^{n} βki ( ∑_{j=1}^{n} α jk e j ) = ∑_{j=1}^{n} ( ∑_{k=1}^{n} α jk βki ) e j = ∑_{j=1}^{n} γ ji e j ,
where γ ji = ∑_{k=1}^{n} α jk βki . Hence the transition matrix of {g1 , g2 , . . . , gn } relative
to the ordered basis {e1 , e2 , . . . , en } is (γi j ) = (αi j )(βi j ) = P Q. This completes the
proof.
Writing ai = ∑_{j=1}^{n} βi j b j for each i, 1 ≤ i ≤ n, in matrix form we obtain
[v] B1 = (a1 , a2 , . . . , an )^t = Q [v] B2 .
This completes the proof of the result.
This shows that ξ(αv1 + βv2 ) = αξ(v1 ) + βξ(v2 ), for any α, β ∈ F and v1 , v2 ∈ V .
(ii) For any v1 , v2 ∈ V , ξ(v1 ) = ξ(v2 ) implies that [v1 ] B = [v2 ] B , which yields
that ai = bi for each i, 1 ≤ i ≤ n. Hence v1 = v2 and the map ξ is one-to-one. Also,
for any c = (c1 , c2 , . . . , cn )^t ∈ Mn×1 (F) there exists v = ∑_{i=1}^{n} ci u i ∈ V such that
ξ(v) = c, and hence ξ is onto.
Example 2.64 Let V = R2 [x] be the vector space of all polynomials with real coef-
ficients of degree less than or equal to 2. Consider the following three ordered bases
of V , B1 = {1, 1 + x, 1 + x + x 2 }, B2 = {x, 1, 1 + x 2 } and B3 = {x, 1 + x 2 , 1}.
(1) For p(x) = 1 − 2x + 2x 2 , find [ p(x)] B1 , [ p(x)] B2 , [ p(x)] B3 .
(2) Find the matrix M1 of B3 relative to B2 .
(3) Find the matrix M2 of B2 relative to B1 .
(4) Find the matrix M of B3 relative to B1 and verify that M = M2 M1 .
(1) Express p(x) as a linear combination of ordered bases B1 , B2 and B3 , given
as below:
1 − 2x + 2x 2 = 3(1) + (−4)(1 + x) + 2(1 + x + x 2 )
1 − 2x + 2x 2 = (−2)x + (−1)1 + 2(1 + x 2 )
1 − 2x + 2x 2 = (−2)x + 2(1 + x 2 ) + (−1)(1),
so that [ p(x)] B1 = (3, −4, 2)^t , [ p(x)] B2 = (−2, −1, 2)^t and [ p(x)] B3 = (−2, 2, −1)^t .
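Parts (2)–(4) of Example 2.64 can be worked out mechanically. The following sympy sketch is an added illustration of the method, not the book's worked solution: it assembles each transition matrix column by column and confirms the composition rule M = M2 M1.

```python
# Transition matrices for the bases B1, B2, B3 of Example 2.64.
import sympy as sp

x = sp.symbols('x')
B1 = [sp.Integer(1), 1 + x, 1 + x + x**2]
B2 = [x, sp.Integer(1), 1 + x**2]
B3 = [x, 1 + x**2, sp.Integer(1)]

def coords(p, basis):
    """Coordinate vector of the polynomial p relative to an ordered basis."""
    a = sp.symbols('a0:3')
    combo = sum(ai * b for ai, b in zip(a, basis))
    sol = sp.solve(sp.Poly(p - combo, x).coeffs(), a)
    return sp.Matrix([sol[ai] for ai in a])

def transition(B_from, B_to):
    """Transition matrix of B_from relative to B_to (columns = coordinates)."""
    return sp.Matrix.hstack(*[coords(u, B_to) for u in B_from])

M1 = transition(B3, B2)   # matrix of B3 relative to B2
M2 = transition(B2, B1)   # matrix of B2 relative to B1
M  = transition(B3, B1)   # matrix of B3 relative to B1
assert M == M2 * M1
```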
Exercises
1. In the vector space R3 , find the transition matrix of the ordered basis
{(1, cos x, sin x), (1, 0, 0), (1, −sin x, cos x)} relative to the standard ordered
basis {(1, 0, 0), (0, 1, 0), (0, 0, 1)} of R3 .
2. In the vector space R3 , find the transition matrix of the ordered basis
{(2, 1, 0), (0, 2, 1), (0, 1, 2)} relative to the standard ordered basis
{(1, 0, 0), (0, 1, 0), (0, 0, 1)} of R3 .
3. Let B1 = {(1, 0), (0, 1)} and B2 = {(2, 3), (3, 2)} be two ordered bases of R2 .
Then find the transition matrix P of B2 relative to the basis B1 and the transition
matrix Q of B1 relative to the basis B2 and show that P Q = Q P = I2 .
4. In the vector space R3 , let B1 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} and B2 =
{(1, 1, 0), (1, −1, 0), (0, 1, 1)} be two ordered bases. Then find the transition
matrix P of B2 relative to the basis B1 and transition matrix Q of B1 relative to
the basis B2 and show that P Q = Q P = I3 .
5. Suppose the X and Y axes in the plane R2 are rotated counterclockwise 45°
so that the new X and Y axes are along the line y = x and the line y = −x,
respectively.
(a) Find the change of basis matrix.
(b) Find the coordinates of the point P(5, 6) under the given rotation.
6. Let B = {u 1 , u 2 , . . . , u n } be an ordered basis of a nonzero finite dimensional
vector space V over a field F. For each j, 1 ≤ j ≤ n, define v j = ∑_{i=1}^{n} αi j u i ,
where αi j ∈ F. If the ordered set B′ = {v1 , v2 , . . . , vn } is a basis of V , then prove
that P = (αi j ) is an invertible matrix over F.
7. Let W be the subspace of C3 over C spanned by α1 = (1, 0, i), α2 = (i, 0, 1),
where C is the field of complex numbers. Prove the following:
(a) The set B = {α1 , α2 } is a basis of W .
(b) The set B′ = {β1 , β2 }, where β1 = (1 + i, 0, 1 + i), β2 = (1 − i, 0, i − 1),
is also a basis of W .
(c) Find the matrix of the ordered basis B′ = {β1 , β2 } relative to the ordered
basis B = {α1 , α2 }.
8. Find the total number of ordered bases of the vector space V = Fn , where F is
a finite field containing p elements.
9. Let {α1 , α2 , . . . , αn } be a basis of an n-dimensional vector space V . Show that
{λ1 α1 , λ2 α2 , . . . , λn αn } is also a basis of V for any nonzero scalars λ1 , λ2 , . . . , λn .
If the coordinate vector of a vector v under the basis {α1 , α2 , . . . , αn } is
x = (x1 , x2 , . . . , xn ), then find the coordinate vector of v under {λ1 α1 , λ2 α2 , . . . , λn αn }.
What are the coordinates of w = α1 + α2 + · · · + αn with respect to the bases
{α1 , α2 , . . . , αn } and {λ1 α1 , λ2 α2 , . . . , λn αn }?
10. Let W = { ⎡ a b ⎤ | a, b, c ∈ R }. Show that W is a subspace of M2×2 (R) over
              ⎣ b c ⎦
    R and that { ⎡ 1 0 ⎤ , ⎡ 0 1 ⎤ , ⎡ 0 0 ⎤ } forms a basis of W . Find the
                 ⎣ 0 0 ⎦   ⎣ 1 0 ⎦   ⎣ 0 1 ⎦
    coordinates of the matrix ⎡ 1 −2 ⎤ under this basis.
                              ⎣ −2 3 ⎦
11. Consider the vector space Rn over R with usual operations, and consider the
bases B1 = {e1 , e2 , . . . , en } and B2 = { f 1 , f 2 , . . . , f n }.
Chapter 3
Linear Transformations
A map between any two algebraic structures (say groups, rings, fields, modules
or algebras) of the same kind is said to be an isomorphism if it is one-to-one, onto
and a homomorphism; roughly speaking, it preserves the operations in the underlying
algebraic structures. If any two vector spaces over the same field are given, then one
can study the relationship between them. In this chapter, we will define
the notion of a linear transformation between two vector spaces U and V which are
defined over the same field and discuss the basic properties of linear transformations.
Throughout, the vector spaces are considered over a field F unless otherwise stated.
Definition 3.1 Let U and V be vector spaces over the same field F. A map T : U →
V is said to be a linear transformation (a vector space homomorphism or a linear
map) if it satisfies the following:
(1) T (u 1 + u 2 ) = T (u 1 ) + T (u 2 )
(2) T (αu 1 ) = αT (u 1 )
for all u 1 , u 2 ∈ U and α ∈ F.
Example 3.3 (1) Consider the vector space V = R[x] of polynomials over the field
R. Define a map T : V → V such that T ( f (x)) = f ′(x), the usual derivative
of the polynomial f (x). It can be easily seen that T ( f (x) + g(x)) = T ( f (x)) +
T (g(x)) and T (α f (x)) = αT ( f (x)), for any f (x), g(x) ∈ V and α ∈ R. Hence
T is a linear mapping which is onto but not one-to-one.
(2) Let T : R2 → R2 such that T (a, b) = (a + b, b) for all a, b ∈ R. It can be easily
seen that T is a linear transformation which is also an isomorphism.
(3) Let T : R3 → R3 such that T (a, b, c) = (0, b, c) for any (a, b, c) ∈ R3 . One
can easily verify that T is a linear transformation which is neither one-to-one
nor onto.
(4) Consider V = C [a, b], the vector space of all continuous real-valued func-
tions on the closed interval [a, b]. Define the map T : V → R such that T ( f (x)) =
∫_a^b f (x) d x. For any α, β ∈ R and f (x), g(x) ∈ C [a, b], T (α f (x) + βg(x)) =
∫_a^b (α f (x) + βg(x)) d x = α ∫_a^b f (x) d x + β ∫_a^b g(x) d x = αT ( f (x)) +
βT (g(x)). This shows that T is a linear transformation which is onto but not
one-to-one.
(5) Let R2 [x] denote the vector space of all polynomials of degree less than or equal
to two, over the field R. Then there exists a natural map T : R2 [x] → R3 defined
by T (α0 + α1 x + α2 x 2 ) = (α0 , α1 , α2 ), where α0 , α1 , α2 ∈ R. This is a linear
transformation which is one-to-one and onto.
(6) The map T : R2 → R2 defined by T (a, b) = (a + 1, b) (or, T (a, b) = (|a|,
|b|)) is not a linear transformation.
(7) The map T : R4 → R2 defined by T (a1 , a2 , a3 , a4 ) = (a1 + a2 , a3 + a4 ) is a
linear transformation. This is onto but not one-to-one.
(8) The map T : Mm×n (R) → Mn×m (R) defined by T (A) = At , the transpose of
the matrix A, is a linear transformation. It can be easily shown that this is an
isomorphism.
(9) Let T : Rn → Rn+1 such that T (a1 , a2 , . . . , an ) = (a1 , a2 , . . . , an , 0). Then T
is a linear transformation called natural inclusion. It is an injective linear trans-
formation but not surjective.
(10) Let C be the vector space over the field of real numbers R. Let T : C → C
be a map defined by T (z) = z̄, where z̄ is the conjugate of the complex
number z. Then T is a linear transformation.
(12) ∫( f (x) + g(x)) = ∫( f (x)) + ∫(g(x)) and
∫(α f (x)) = α∫( f (x)), for any f (x), g(x) ∈ V and α ∈ F. Hence ∫ is a linear
mapping which is one-to-one but not onto. This linear transformation is called
the integration transformation.
(13) Let αi j ∈ F for each i, j such that 1 ≤ i ≤ m, 1 ≤ j ≤ n. Define a map T :
Fm → Fn such that
T (a1 , a2 , . . . , am ) = ( ∑_{i=1}^{m} αi1 ai , ∑_{i=1}^{m} αi2 ai , . . . , ∑_{i=1}^{m} αin ai ).
Then T is a linear transformation.
(1) Let T : U → V be the mapping which assigns the zero vector of V to every
vector u ∈ U , i.e., T (u) = 0 for all u ∈ U . Then it can be verified that T is a
linear transformation, which is known as the zero linear transformation , usually
denoted by 0.
(2) The identity mapping I : U → U such that I (u) = u for all u ∈ U is a linear
transformation and is known as the identity linear transformation, denoted as
IU . It is an isomorphism also.
(3) Let V be any vector space and W be a subspace of V . The inclusion mapping
i : W → V defined as i(w) = w for all w ∈ W is a linear transformation. This
is known as inclusion linear transformation, which is injective also. It is an
isomorphism if and only if W = V .
(4) Let V be a vector space and W a subspace of V . Let T : V → V /W be the
map defined by T (v) = v + W for every v ∈ V . It is easy to see that T is a linear
transformation, which is known as the quotient linear transformation. It is a
surjective linear transformation.
α1 T (u 1 ) + α2 T (u 2 ) + · · · + αn T (u n ) = T (α1 u 1 + α2 u 2 + · · · + αn u n ) = 0,
The following example shows that there exists no nonzero linear transformation T :
R2 −→ R2 which maps the straight line ax + by + c = 0, where c ≠ 0, to (0, 0) ∈
R2 .
Example 3.5 It is clear that out of a and b at least one must be nonzero; let it be a, i.e.,
a ≠ 0. The straight line can also be represented as L = {((−bt − c)/a, t) | t ∈ R} ⊆ R2 .
But T (l) = (0, 0) for all l ∈ L. This implies that T ((−bt − c)/a, t) = (0, 0) for all t ∈ R. In
particular T (−c/a, 0) = (0, 0), i.e., (−c/a)T (1, 0) = (0, 0). Thus we arrive at T (1, 0) =
(0, 0). T ((−bt − c)/a, t) = (0, 0) for all t ∈ R also gives us T (−bt/a, t) + T (−c/a, 0) = (0, 0),
i.e., T (−bt/a, t) = (0, 0). In particular, putting t = 1, we get T (−b/a, 1) = (0, 0), i.e.,
(−b/a)T (1, 0) + T (0, 1) = (0, 0). This implies that T (0, 1) = (0, 0). Finally we have
T (x, y) = T (x, 0) + T (0, y) = x T (1, 0) + yT (0, 1) = (0, 0) for all (x, y) ∈ R2 ,
i.e., T = 0.
Exercises
1. Let V be the vector space of all continuous functions f : R −→ R over the field
of reals and define a mapping φ : V −→ V by [φ( f )](x) = ∫_0^x f (t) dt. Prove
that φ is a linear transformation.
2. Let V be the vector space over R of polynomials with real coefficients. Define ψ :
V −→ V by ψ( ∑_{i=0}^{k} ai x^i ) = ∑_{i=1}^{k} iai x^{i−1} . Prove that ψ is a linear transformation.
3. Let Vn = { p(x) ∈ F[x] | degp(x) < n}, where n is any positive integer. Define
T : Vn −→ Vn by T ( p(x)) = p(x + 1). Show that T is an automorphism of Vn .
4. Let U = Fn+1 and V = { p(x) ∈ F[x] | degp(x) ≤ n}, where n is any positive
integer, be the vector spaces over F. Define T : U −→ V by T (α0 , α1 , . . . , αn ) =
α0 + α1 x + · · · + αn x n . Then prove that T is an isomorphism from U to V.
5. Let V = R2 be the 2-dimensional Euclidean space. Show that rotation through
an angle θ about the origin is a linear transformation on V .
6. Let V be the vector space of all twice differentiable functions in [0, 1]. Show
that the mappings T1 : V −→ V and T2 : V −→ V defined by T1 ( f ) = dd xf and
T2 ( f ) = x f are linear transformations.
7. Let T : V −→ V be a linear transformation which is not onto. Show that there exists
some v ∈ V , v = 0, such that T (v) = 0.
8. Show that the following mappings are linear transformations:
(a) T : R2 −→ R2 defined by T (x, y) = (2x − 3y, y).
(b) T : R3 −→ R2 defined by T (x, y, z) = (x + 2y + z, 3x − 4y − 2z).
(c) T : R4 −→ R3 defined by T (x, y, z, t) = (x − y + 2z + t, x − y − 2z +
3t, x + 3y + z − 2t).
9. Show that the following mappings are not linear mappings:
(a) T : R2 −→ R2 defined by T (x, y) = (x y, y 2 ).
(b) T : R2 −→ R3 defined by T (x, y) = (x + 3, 3y, 2x − y).
(c) T : R3 −→ R3 defined by T (x, y, z) = (|x + y|, 2x − 3y + z, x + y + 2).
10. Let V be the vector space of n-square real matrices. Let M be an arbitrary but
fixed matrix in V . Let T1 , T2 : V −→ V be defined by T1 (A) = AM + M A,
T2 (A) = AM − M A, where A is any matrix in V . Show that T1 and T2 both are
linear transformations on V .
11. Prove that a mapping T : U −→ V is a linear transformation if and only if
T (x + αy) = T (x) + αT (y) for all x, y ∈ U and α ∈ F.
12. Prove that any linear functional f : Rn → R is a continuous function.
13. Let f : Rn → R be a continuous function which is also additive. Prove that f
is a linear functional on Rn .
N (T ) = {u ∈ U | T (u) = 0}.
R(T ) = {T (u) | u ∈ U }.
B = {T (u k+1 ), T (u k+2 ), . . . , T (u n )}
In order to show that B is linearly independent, suppose there exist scalars αk+1 , αk+2 ,
. . . , αn ∈ F such that ∑_{i=k+1}^{n} αi T (u i ) = 0. This implies that T ( ∑_{i=k+1}^{n} αi u i ) = 0, and
hence ∑_{i=k+1}^{n} αi u i ∈ N (T ). But since {u 1 , u 2 , . . . , u k } spans N (T ), there exist β1 , β2 ,
. . . , βk ∈ F such that ∑_{i=k+1}^{n} αi u i = ∑_{i=1}^{k} βi u i , which yields that ∑_{i=k+1}^{n} αi u i + ∑_{i=1}^{k} (−βi )u i
= 0. But since {u 1 , u 2 , . . . , u k , u k+1 , . . . , u n } is a basis of U , we find that each
of αi = 0 and hence the set B is linearly independent and forms a basis of
R(T ). This also shows that the elements in B are distinct and r (T ) = n − k, that is,
dimU = r (T ) + n(T ).
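The Rank–Nullity relation dimU = r(T) + n(T) just proved is easy to verify computationally; the sketch below (an added illustration with hypothetical data) treats a 3 × 4 matrix as a linear map T : R4 → R3.

```python
# Numerical check of dim U = r(T) + n(T) for a hypothetical matrix.
import sympy as sp

A = sp.Matrix([[1, 2, 0, 1],
               [0, 1, 1, 0],
               [1, 3, 1, 1]])   # third row = first + second, so rank 2

rank = A.rank()                  # r(T)
nullity = len(A.nullspace())     # n(T) = dim Ker T
assert rank + nullity == A.cols == 4
```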
Theorem 3.11 Let U and V be any two finite dimensional vector spaces over the
same field F. If f : U → V is an isomorphism and B = {u 1 , u 2 , . . . , u n } is a basis
of U , then B′ = { f (u 1 ), f (u 2 ), . . . , f (u n )} is a basis of V .
Proof We shall show that B′ is linearly independent. Suppose there exist scalars
α1 , α2 , . . . , αn such that ∑_{i=1}^{n} αi f (u i ) = 0. This implies that ∑_{i=1}^{n} f (αi u i ) = 0, i.e.,
f ( ∑_{i=1}^{n} αi u i ) = 0. But since f is a one-to-one map, we arrive at ∑_{i=1}^{n} αi u i = 0. Since B
is a basis of U , we find that αi = 0 for each i = 1, 2, . . . , n. This shows that B′ is
linearly independent.
Now we show that B′ spans V . Let v ∈ V . Since f is onto, for v ∈ V there exists
u ∈ U such that f (u) = v. Every vector u ∈ U can be written as u = ∑_{i=1}^{n} βi u i , where
βi ∈ F. This shows that f (u) = ∑_{i=1}^{n} f (βi u i ) = ∑_{i=1}^{n} βi f (u i ), i.e., v = ∑_{i=1}^{n} βi f (u i ).
Therefore B′ spans V , and hence is a basis of V .
Remark 3.12 The above theorem holds even if U and V are infinite dimensional
vector spaces over the same field F. Accordingly, the theorem can be stated as: Let
U and V be any two infinite dimensional vector spaces over the same field F. If
f : U → V is an isomorphism and B = {u 1 , u 2 , . . . , u n , . . .} is a basis of U , then
B′ = { f (u 1 ), f (u 2 ), . . . , f (u n ), . . .} is a basis of V . The proof of this fact follows the
same pattern as above.
αx + βy = ∑_{i=1}^{n} (ααi )u i + ∑_{i=1}^{n} (ββi )u i = ∑_{i=1}^{n} (ααi + ββi )u i .
T ′(x) = ∑_{i=1}^{n} αi T ′(u i ) = ∑_{i=1}^{n} αi vi = T (x),
and therefore T = T ′.
The following corollary shows that if any two linear transformations agree on the
basis of a vector space, then they will be the same.
Remark 3.15 (i) Theorem 3.13 can be restated as: Let U, V be vector spaces over
a field F and {u 1 , u 2 , . . . , u n } be a basis of U . A map f : {u 1 , u 2 , . . . , u n } −→
V can be uniquely extended to a linear map T : U −→ V , such that T (u i ) =
f (u i ) for each i, 1 ≤ i ≤ n.
(ii) Any map f from a basis of U to V will determine a unique linear map T :
U −→ V , which is called extension of f by linearity.
(iii) Thus maps from different bases of U to V or different maps from the same
basis of U to V will give different linear maps from U −→ V . Thus Theorem
3.13 gives us a method for determining linear maps from a finite dimensional
vector space U to a vector space V .
Theorem 3.16 Two finite dimensional vector spaces U and V over a field F are
isomorphic if and only if they are of the same dimension.
Proof Let U ≅ V . This implies that there exists a bijective linear map T : U −→ V .
Suppose that dimU = n. We have to prove that dimV = n. Let {u 1 , u 2 , . . . , u n } be
a basis of U . We claim that the set B = {T (u 1 ), T (u 2 ), . . . , T (u n )} is a basis of V .
First we prove that B spans V . For this let v ∈ V . Since T is onto, there exists u ∈ U ,
such that v = T (u). As u ∈ U , there exist scalars α1 , α2 , . . . , αn ∈
F such that u = α1 u 1 + α2 u 2 + · · · + αn u n . As a result, we get v = T ( ∑_{i=1}^{n} αi u i ) =
∑_{i=1}^{n} αi T (u i ) and hence B spans V . Next, to prove that B is a linearly independent set,
let ∑_{i=1}^{n} βi T (u i ) = 0 for some scalars β1 , β2 , . . . , βn ∈ F. This gives us T ( ∑_{i=1}^{n} βi u i ) =
0, i.e., ∑_{i=1}^{n} βi u i ∈ K er T . Since T is injective, we conclude that ∑_{i=1}^{n} βi u i = 0. This
implies that βi = 0 for all i, 1 ≤ i ≤ n. It proves that B is a linearly independent set,
and thus B contains n elements. Thus B is a basis of V and dimV = n.
Conversely, suppose that dimU = dimV = n. Then we have to show that U ≅ V .
Let {u 1 , u 2 , . . . , u n } and {v1 , v2 , . . . , vn } be bases of U and V , respectively. Define
a map f : {u 1 , u 2 , . . . , u n } −→ V such that f (u i ) = vi for each i, 1 ≤ i ≤ n. By
Remark 3.7(i), this map f can be uniquely extended to a linear map T : U −→ V
such that T (u i ) = f (u i ) = vi for each i, 1 ≤ i ≤ n. We show that the linear map T is
bijective. Let x, y ∈ U . There exist scalars α1 , α2 , . . . , αn , β1 , β2 , . . . , βn ∈ F such
that x = ∑_{i=1}^{n} αi u i and y = ∑_{i=1}^{n} βi u i . Then T (x) = T (y) implies that T ( ∑_{i=1}^{n} αi u i ) =
T ( ∑_{i=1}^{n} βi u i ). Now we obtain that ∑_{i=1}^{n} αi T (u i ) = ∑_{i=1}^{n} βi T (u i ), i.e., ∑_{i=1}^{n} αi vi = ∑_{i=1}^{n} βi vi .
This shows that ∑_{i=1}^{n} (αi − βi )vi = 0. Since {v1 , v2 , . . . , vn } is a basis of V , we conclude
that αi = βi for all i, 1 ≤ i ≤ n. This implies that ∑_{i=1}^{n} αi u i = ∑_{i=1}^{n} βi u i , i.e., x = y.
These arguments prove that T is one-to-one. To prove the ontoness of T , let v ∈ V ;
then there exist scalars γ1 , γ2 , . . . , γn ∈ F such that v = ∑_{i=1}^{n} γi vi . This shows that
v = ∑_{i=1}^{n} γi T (u i ). Since T is linear, we have v = T ( ∑_{i=1}^{n} γi u i ) = T (u), where u =
∑_{i=1}^{n} γi u i ∈ U . Thus there exists u ∈ U such that T (u) = v, so T is onto and therefore
T is an isomorphism, i.e., U ≅ V .
Exercises
1. Find out Range, Kernel, Rank and Nullity of all the linear transformations given
in the Problems 1–6, Problem 8 and 10 of the preceding section.
2. Let T : V1 −→ V2 be a linear map between finite dimensional vector spaces.
Prove that T is an isomorphism if and only if n(T ) = 0 and r (T )= dimV2 .
3. If T : U −→ V is a linear map, where U is finite dimensional, prove that
(a) n(T ) ≤ dimU ,
(b) r (T ) ≤ min(dimU , dimV ).
4. Let Z be subspace of a finite-dimensional vector space U , and V a finite-
dimensional vector space. Then prove that Z will be the kernel of a linear map
T : U −→ V if and only if dim Z ≥ dimU − dimV .
5. Let T : R4 −→ R3 be a linear map defined by T (e1 ) = (1, 1, 1), T (e2 ) =
(1, −1, 1), T (e3 ) = (1, 0, 0), T (e4 ) = (1, 0, 1). Then verify Rank-Nullity
Theorem, where {e1 , e2 , e3 , e4 } is the standard basis of R4 .
6. Let T be a nonzero linear transformation from R5 to R2 such that T is not onto.
Find r (T ) and n(T ).
In this section we prove some isomorphism theorems, which have vast applications
in linear algebra.
(V1 ∩ V2 ), and hence f is onto. This shows that f is an isomorphism, i.e.,
(V1 + V2 )/V2 ≅ V1 /(V1 ∩ V2 ).
f (α((v + V2 ) + V1 /V2 ) + β((v′ + V2 ) + V1 /V2 )) = f ((α(v + V2 ) + V1 /V2 ) +
(β(v′ + V2 ) + V1 /V2 )) = f (((αv + βv′) + V2 ) + V1 /V2 ) = (αv + βv′) + V1 =
(αv + V1 ) + (βv′ + V1 ) = α(v + V1 ) + β(v′ + V1 ) = α f ((v + V2 ) + V1 /V2 ) +
β f ((v′ + V2 ) + V1 /V2 ) for every α, β ∈ F, v, v′ ∈ V . The previous arguments
show that f is a linear transformation. To prove that f is one-to-one, let f ((v +
V2 ) + V1 /V2 ) = f ((v′ + V2 ) + V1 /V2 ). This implies that v + V1 = v′ + V1 , i.e.,
(v − v′) ∈ V1 . Now we have (v − v′) + V2 ∈ V1 /V2 , i.e., (v + V2 ) − (v′ + V2 ) ∈
V1 /V2 . This shows that (v + V2 ) + V1 /V2 = (v′ + V2 ) + V1 /V2 , since v + V2 , v′ +
V2 are members of the vector space V /V2 and V1 /V2 is a subspace of V /V2 .
Thus f is one-to-one. To show the ontoness of f , let v + V1 ∈ V /V1 . Obviously
f ((v + V2 ) + V1 /V2 ) = v + V1 , which shows that f is onto. Finally we conclude
that f is an isomorphism and (V /V2 )/(V1 /V2 ) ≅ V /V1 .
Exercises
1. Using first isomorphism theorem, prove the second and third isomorphism
Theorems.
2. Let Vn = { p(x) ∈ Q[x] | deg p(x) < n} and Vn−1 = { p(x) ∈ Q[x] | deg p(x) <
(n − 1)}, where n > 1. Define T : Vn −→ Vn−1 by T ( p(x)) = d/d x p(x). Show
that T is a linear transformation and, using the first isomorphism theorem, also prove
that Vn /Q ≅ Vn−1 .
3. Let V = V1 ⊕ V2 be the direct sum of its subspaces V1 and V2 . Show that the
mappings p1 : V −→ V1 and p2 : V −→ V2 defined by p1 (v) = v1 , p2 (v) = v2 ,
where v = v1 + v2 , v1 ∈ V1 , v2 ∈ V2 , are linear transformations.
6. If V = V1 ⊕ V2 ⊕ · · · ⊕ Vn and V1 , V2 , . . . , Vn are finite dimensional, then also
prove that V is finite dimensional and dimV = dimV1 + dimV2 + · · · + dimVn .
7. Prove that M2 (R) ≅ R4 . Give two different isomorphisms of M2 (R) onto R4 .
In the set of linear transformations one can combine any two linear transformations
in various ways in order to obtain a new linear transformation. The study of this set
is important because it carries a natural vector space structure. It is all the more
important when we consider the set of linear transformations from a vector space into
itself, because in that case it is also possible to define the composition of two linear
mappings.
Let T1 , T2 : U → V be linear transformations from a vector space U to a vector
space V over the same field F. The sum T1 + T2 and the scalar product kT1 , where
k ∈ F, are defined to be the following mappings from U into V :
(T1 + T2 )(u) = T1 (u) + T2 (u), (kT1 )(u) = kT1 (u) for any u ∈ U.
It can be easily seen that T1 + T2 and kT1 are also linear. In fact, for any u 1 , u 2 ∈ U
and α, β ∈ F, (T1 + T2 )(αu 1 + βu 2 ) = T1 (αu 1 + βu 2 ) + T2 (αu 1 + βu 2 ) =
α(T1 + T2 )(u 1 ) + β(T1 + T2 )(u 2 ), and similarly (kT1 )(αu 1 + βu 2 ) =
α(kT1 )(u 1 ) + β(kT1 )(u 2 ).
Theorem 3.25 Let U and V be finite dimensional vector spaces over a field F with
dimU = m and dimV = n. Then H om(U, V ), the set of all linear transformations
from U to V , is a vector space over F of dimension mn.
Proof Let B = {u 1 , u 2 , . . . , u m } and B′ = {v1 , v2 , . . . , vn } be bases of U and V ,
respectively. Define mn mappings f i j : B −→ V , for each i = 1, 2, . . . , m and j =
1, 2, . . . , n, such that f i j (u k ) = 0 for all k ≠ i and f i j (u k ) = v j for k = i, where
k = 1, 2, . . . , m. Now by Theorem 3.13, one can find mn linear transformations
Ti j : U → V , such that Ti j | B = f i j , where i = 1, 2, . . . , m and j = 1, 2, . . . , n. Our
result follows if we can show that the set {Ti j | i = 1, 2, . . . , m, j = 1, 2, . . . , n} is
a basis of H om(U, V ). Let T be an arbitrary member of H om(U, V ). Since for
each u i ∈ U, T (u i ) ∈ V and {v1 , v2 , . . . , vn } is a basis of V , corresponding to each
T (u i ) we can find n scalars αi j ∈ F, j = 1, 2, . . . , n, such that T (u i ) = ∑_{j=1}^{n} αi j v j .
Now it is clear that for every i and j, Ti j (u k ) = f i j (u k ) = v j for k = i and
Ti j (u k ) = f i j (u k ) = 0 for all k ≠ i. Next we claim that T = ∑_{i=1}^{m} ∑_{j=1}^{n} αi j Ti j . Since
( ∑_{i=1}^{m} ∑_{j=1}^{n} αi j Ti j )(u k ) = ∑_{i=1}^{m} ∑_{j=1}^{n} αi j Ti j (u k ) = ∑_{j=1}^{n} αk j Tk j (u k ) = ∑_{j=1}^{n} αk j v j = T (u k ),
∑_{i=1}^{m} ∑_{j=1}^{n} αi j Ti j ∈ H om(U, V ) and both T and ∑_{i=1}^{m} ∑_{j=1}^{n} αi j Ti j agree on all basis elements
of U . This shows that T = ∑_{i=1}^{m} ∑_{j=1}^{n} αi j Ti j , i.e., every T ∈ H om(U, V ) is a linear com-
bination of the Ti j s. Now, in order to show that the set {Ti j | i = 1, 2, . . . , m, j =
1, 2, . . . , n} is linearly independent, suppose that there exist scalars βi j ∈ F such
that ∑_{i=1}^{m} ∑_{j=1}^{n} βi j Ti j = 0, where 0 stands for the zero linear transformation from U to
V . This implies that ( ∑_{i=1}^{m} ∑_{j=1}^{n} βi j Ti j )(u k ) = 0(u k ) for all k = 1, 2, . . . , m. This yields
that ∑_{j=1}^{n} ∑_{i=1}^{m} βi j Ti j (u k ) = 0. But since Ti j (u k ) = v j for k = i and Ti j (u k ) = 0
for all k ≠ i, the latter expression reduces to ∑_{j=1}^{n} βk j v j = 0 for all k = 1, 2, . . . , m.
Since {v1 , v2 , . . . , vn } is a basis of V , we find that βk j = 0 for all k and j. This yields that the
set {Ti j | i = 1, 2, . . . , m, j = 1, 2, . . . , n} is linearly independent and thus forms a
basis of H om(U, V ). Finally, we get dim H om(U, V ) = mn = dimU dimV .
Remark 3.26 Let U be a vector space over a field F such that dimU = m; then
dim H om(U, U ) = m 2 . Since every field F is a vector space over itself of dimension
one, dim H om(U, F) = m.
Lemma 3.28 Let U, V, W be vector spaces over a field F. Then the following hold:
(i) For any linear transformations T1 : U → V and T2 : V → W the composite
map T2 T1 : U → W is again a linear transformation.
(ii) For any linear transformations T1 : U → V, T2 : V → W and α ∈ F, α(T2 T1 )
= (αT2 )T1 = T2 (αT1 ).
(iii) For any linear transformations T2 , T3 : U → V and T1 : V → W, T1 (T2 +
T3 ) = T1 T2 + T1 T3 .
(iv) For any linear transformations T1 , T2 : V → W and T3 : U → V , (T1 + T2 )T3
= T1 T3 + T2 T3 .
Remark 3.29 (i) The set of all linear transformations of V into itself, i.e., H om
(V, V ) is usually denoted by A (V ).
(ii) A linear operator T on the vector space U is called an idempotent linear operator
on U if T 2 = T .
(iii) A linear operator T on the vector space U is called a nilpotent linear operator
on U if T k = 0 for some integer k ≥ 1.
Proof (i) By Theorem 3.24, A (V ) is a vector space over F under the addition and
scalar multiplication operations on linear mappings.
(ii) A (V ) is a ring. In fact, A (V ) is an abelian group under the operation addition of
linear transformations. Also by Lemma 3.28 (i), A (V ) is closed with respect to the
operation of product (composition). Since the composition of functions is associative
in general, the product in A (V ) is associative. The distributivity of product on the
operation of addition of linear transformations follows by Lemma 3.28(iii) and (iv).
Hence A (V ) forms a ring under the operations of addition and composition of linear
transformations.
(iii) By Lemma 3.28 (ii), for any α ∈ F and T1 , T2 ∈ A (V )
Exercises
1. Show that the set {T1 , T2 , T3 } in the corresponding vector space is linearly inde-
pendent, where
(a) T1 , T2 , T3 ∈ A (R2 ) defined by T1 (x, y) = (x, 2y), T2 (x, y) = (y, x + y),
T3 (x, y) = (0, x),
(b) T1 , T2 , T3 ∈ H om(R3 , R) defined by T1 (x, y, z) = x + y + z, T2 (x, y, z) =
y + z, T3 (x, y, z) = x − z.
2. Find the condition under which dim H om(V, U ) = dimV .
3. Suppose V = U ⊕ W , where U and W are subspaces of V . Let T1 and T2 be
linear operators on V defined by T1 (v) = u, T2 (v) = w, where v = u + w, u ∈
U, w ∈ W . Show that
(a) T12 = T1 and T22 = T2 , i.e., T1 and T2 are projections.
(b) T1 + T2 = I , the identity mapping.
(c) T1 T2 = 0 and T2 T1 = 0.
4. Let T1 and T2 be linear operators on V satisfying parts (a), (b), (c) of the above
Problem 3. Prove that V = T1 (V ) ⊕ T2 (V ).
5. Give an example to show that the set of all nonzero elements of A (V ) is not a
group under composition of linear operators.
6. Determine the group of units of the ring A (V ).
7. Let T : V −→ W be an isomorphism, where V and W are vector spaces over
the field F. Prove that the mapping f : A (V ) −→ A (W ) defined by f (S) =
T ST −1 is an isomorphism.
8. If T ∈ A (V ), prove that the set of all linear transformations S on V such that
T S = 0 is a subspace and a right ideal of A (V ).
9. If dimA (V ) > 1, then prove that A (V ) is not commutative.
10. Let T1 , T2 be two linear maps on V such that T1 T2 = T2 T1 . Then prove that
(a) (T1 + T2 )2 = T12 + 2(T1 T2 ) + T22 ;
(b) (T1 + T2 )^n = ⁿC0 T1^n + ⁿC1 T1^{n−1} T2 + · · · + ⁿCn T2^n .
11. Let V1 be a subspace of a vector space V . Then prove that the set of all linear
transformations from V to V that vanish on V1 is a subspace of A (V ).
12. Let T be an idempotent linear operator on any vector space U over F. Then
prove that U = R(T ) ⊕ N (T ) and for any v ∈ R(T ), T (v) = v.
13. Let V be an n-dimensional vector space, n ≥ 1, and T : V → V be a linear
transformation such that T n = 0 but for some v ∈ V, T n−1 (v) = 0. Show that
the set {v, T (v), T 2 (v), . . . , T n−1 (v)} forms a basis of V .
14. Let T : V → V be a nilpotent linear operator. Show that I + T is invertible.
(Hint: A linear transformation T is said to be nilpotent if T k = 0 for some integer
k ≥ 1. If T k = 0, then I − T + T 2 − T 3 + · · · + (−1)^{k−1} T^{k−1} is the inverse of
(I + T ).)
15. Let B = {v1 , v2 , . . . , vn } be a basis of V . Suppose that for each 1 ≤ i, j ≤ n we
define
f i, j (vk ) = vk for k ≠ i, and f i, j (vk ) = vi + v j for k = i.
Remark 3.32 (i) From the Remark 3.7(iii), T is nonsingular if and only if T is
injective or one-to-one.
(ii) Let u ∈ K er T1 . This implies that T1 (u) = 0, i.e., T2 (T1 (u)) = T2 (0). Thus
we get T2 T1 (u) = 0, i.e., u ∈ K er T2 T1 . Therefore, K er T1 ⊆ K er T2 T1 . Since both
K er T1 and K er T2 T1 are subspaces of U , K er T1 is a subspace of K er T2 T1 . Now we
get dim K er T1 ≤ dim K er T2 T1 , i.e., n(T1 ) ≤ n(T2 T1 ).
Theorem 3.35 Let V be a finite dimensional vector space and T, S : V −→ V be linear transformations, where S is nonsingular. Then r(ST) = r(TS) = r(T).
Proof Clearly, ST : V −→ V is a linear transformation and Im(ST) = (ST)(V) = S(T(V)). Therefore, r(ST) = dim S(T(V)). Since T(V) is a subspace of V, the restriction of S to the subspace T(V) is also a nonsingular linear transformation, i.e., S|_{T(V)} : T(V) −→ V. Hence r(S|_{T(V)}) = dim T(V) follows from Remark 3.32(ii). This implies that dim S(T(V)) = dim T(V), and hence r(ST) = r(T). Again, since V is finite dimensional and S : V −→ V is injective, S is also surjective. Hence S(V) = V, i.e., T(S(V)) = T(V). This relation shows that (TS)(V) = T(V), i.e., Im(TS) = Im(T). As a result, dim Im(TS) = dim Im(T), i.e., r(TS) = r(T). Finally, we have proved that r(ST) = r(TS) = r(T).
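For readers who wish to experiment, the following Python sketch (an illustration under arbitrary randomly chosen data, not part of the formal development; the matrices, seed and helper names are our own choices) checks Theorem 3.35 numerically on R⁵.

```python
# A numerical check of Theorem 3.35: for a singular T and a nonsingular S,
# rank(ST) = rank(TS) = rank(T).
import numpy as np

rng = np.random.default_rng(0)
n = 5
T = rng.standard_normal((n, n))
T[:, -1] = T[:, 0]                                   # duplicate a column, so T is singular
S = np.eye(n) + 0.1 * rng.standard_normal((n, n))    # near-identity, hence (almost surely) nonsingular

r = np.linalg.matrix_rank
assert r(S @ T) == r(T @ S) == r(T)
print(r(T), r(S @ T), r(T @ S))
```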
Definition 3.36 A linear transformation T : U −→ V is called invertible if T is
bijective as a map. In other words, if T is an isomorphism, then T is said to be
invertible.
Remark 3.37 (i) If a linear transformation T : U −→ V is invertible, then T is a bijective map and hence the inverse of the map T, usually denoted by T^{-1}, exists and is a map from V to U. Now we prove that T^{-1}(αv1 + βv2) = αT^{-1}(v1) + βT^{-1}(v2) for all α, β ∈ F, v1, v2 ∈ V. Let us suppose that T^{-1}(αv1 + βv2) = u ∈ U and T^{-1}(v1) = u1 ∈ U, T^{-1}(v2) = u2 ∈ U. This implies that T(u) = αv1 + βv2 and T(u1) = v1, T(u2) = v2. Now we have T(u) = αT(u1) + βT(u2) = T(αu1 + βu2), since T is a linear transformation. As T is one-to-one, we get u = αu1 + βu2, i.e., T^{-1}(αv1 + βv2) = αT^{-1}(v1) + βT^{-1}(v2). Thus T^{-1} is also a linear transformation, known as the inverse of the linear transformation T. We know that if T is a bijective map, then T^{-1} is also a bijective map. This shows that if T is invertible, i.e., T is an isomorphism, then its inverse T^{-1} is also an isomorphism and hence invertible as well.
(ii) By the properties of invertible maps, we can say that the linear transformation T : U −→ V is invertible if and only if there exists a linear transformation T^{-1} : V −→ U such that T T^{-1} = I_V and T^{-1} T = I_U, where I_U and I_V are the identity linear transformations on the vector spaces U and V, respectively. In particular, if T : U −→ U is a linear operator on U, then T is invertible if and only if there exists a linear operator T^{-1} on U, i.e., T^{-1} : U −→ U, such that T T^{-1} = I_U = T^{-1} T.
(iii) Let U be a finite dimensional vector space and T : U −→ U be an invertible linear operator on U. Then T is bijective, i.e., Ker T = {0}, which is to say T is nonsingular. Hence n(T) = 0. Using the Rank-Nullity theorem, we have r(T) = dim U. This implies that T is surjective. It is easy to observe that if U is a finite dimensional vector space, then the linear map T : U −→ U is invertible if and only if T is nonsingular, i.e., injective, or alternatively, if and only if T is surjective. Hence nonsingular and invertible linear transformations on a vector space U are synonymous when U is finite dimensional.
Theorem 3.38 Let U and V be finite dimensional vector spaces over the same field
F such that dim U = dim V . If T : U −→ V is a linear transformation, then the
following statements are equivalent:
(i) T is invertible.
(ii) T is nonsingular.
(iii) The range of T is V.
(iv) If {u 1 , u 2 , . . . , u n } is any basis of U , then {T (u 1 ), T (u 2 ), . . . , T (u n )} is a basis
of V .
(v) There is a basis {u 1 , u 2 , . . . , u n } for U such that {T (u 1 ), T (u 2 ), . . . , T (u n )} is
a basis of V .
(iv) =⇒ (v) Since U is a finite dimensional vector space, it has a finite basis. Let
{u 1 , u 2 , . . . , u n } be a basis of U . Now by hypothesis {T (u 1 ), T (u 2 ), . . . , T (u n )} is
a basis of V .
Exercises
(a) T −1
(b) 2T − T −1 .
9. Let S and T be linear transformations on a finite dimensional vector space. Show
that if ST = I , then T S = I . Is this true for infinite dimensional vector spaces?
10. Let A, B, C, D ∈ Mn×n (C). Define T on Mn×n (C) by T (X ) = AX B + C X +
X D, X ∈ Mn×n (C). Show that T is a linear transformation on Mn×n (C) and that
when C = D = 0, T has an inverse if and only if A and B are invertible.
Throughout this section all the vector spaces will be finite dimensional. Consider an ordered basis B = {u1, u2, . . . , um} of a vector space U over a field F. For any vector u ∈ U, there exist unique scalars α1, α2, . . . , αm such that u = α1u1 + α2u2 + · · · + αmum. The coordinate vector of u relative to the ordered basis B, i.e., [u]_B, is the column vector [u]_B = [α1, α2, . . . , αm]^t. Now, let V be an n-dimensional vector space with an ordered basis B′ = {v1, v2, . . . , vn} and T : U → V be a linear transformation. Then, for any u_i ∈ U, T(u_i) ∈ V. Therefore, for any 1 ≤ i ≤ m, if T(u_i) = Σ_{j=1}^{n} α_{ji} v_j, α_{ji} ∈ F, then this can be expressed as

T(u1) = α11 v1 + α21 v2 + · · · + αn1 vn
T(u2) = α12 v1 + α22 v2 + · · · + αn2 vn
. . . . . . . . .
T(um) = α1m v1 + α2m v2 + · · · + αnm vn.

The transpose of the coefficient matrix of the above equations, a matrix of order n × m, is called the matrix of T relative to the ordered bases B and B′ of the vector spaces U and V, respectively, and is denoted by m(T)_{(B,B′)} or [T]_B^{B′}. The ith column of this matrix consists of the coefficients of T(u_i) when it is expressed as a linear combination of v_j, j = 1, 2, . . . , n.
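The following Python sketch (our illustration, not the book's; the map T and the bases are borrowed from a later exercise purely as an example) computes such a matrix numerically: the ith column is obtained by solving for the B′-coordinates of T(u_i).

```python
# Compute [T] relative to bases B (domain) and B' (codomain) for
# T(x, y) = (x - y, 2x + y) on R^2, B = standard basis, B' = {(1,2), (2,1)}.
import numpy as np

def T(v):
    x, y = v
    return np.array([x - y, 2 * x + y])

B  = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
Bp = np.column_stack([[1.0, 2.0], [2.0, 1.0]])   # basis vectors of B' as columns

# Solve B' c = T(u_i) for each basis vector u_i of B; stack the solutions as columns.
M = np.column_stack([np.linalg.solve(Bp, T(u)) for u in B])
print(M)   # the matrix of T with respect to B and B'
```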
Remark 3.43 (i) It is straightforward to see that the matrix of a linear transformation changes with the choice of bases of the underlying vector spaces. In fact, even if we only change the order of the elements in a given basis, we get a different matrix of the linear transformation. Since the basis of a vector space is not unique, both the basis itself and the order of its elements play an important role in determining the matrix of a linear transformation.
(ii) The matrix of the zero linear transformation is the zero matrix for any choice of bases. However, the matrix of the identity linear transformation is the identity matrix only when the same basis is taken on both sides; otherwise it may be different from the identity matrix.
Theorem 3.44 Let A = (δ_{ji}) be an n × m matrix over F. Fix bases B = {u1, u2, . . . , um} and B′ = {v1, v2, . . . , vn} of U and V, respectively. Let u ∈ U be such that u = Σ_{i=1}^{m} α_i u_i. Define a map T : U → V by T(u) = Σ_{j=1}^{n} β_j v_j, where each β_j is defined as β_j = Σ_{k=1}^{m} δ_{jk} α_k. Then T : U → V is a linear transformation such that [T]_B^{B′} = A.
Proof First, we show that T is well defined. For this, let u = u′, where u = Σ_{k=1}^{m} α_k u_k and u′ = Σ_{k=1}^{m} α′_k u_k. Thus we have T(u) = Σ_{j=1}^{n} β_j v_j, where β_j = Σ_{k=1}^{m} δ_{jk} α_k, and T(u′) = Σ_{j=1}^{n} β′_j v_j, where β′_j = Σ_{k=1}^{m} δ_{jk} α′_k. As B is a basis of U and u = u′, we conclude that α_k = α′_k for each k = 1, 2, . . . , m. As a result, β_j = β′_j for each j = 1, 2, . . . , n. Thus T(u) = T(u′), as desired.
Let α, β ∈ F. Then αu + βu′ = Σ_{k=1}^{m} (αα_k + βα′_k) u_k, and let T(αu + βu′) = Σ_{j=1}^{n} γ_j v_j, where each γ_j is given by γ_j = Σ_{k=1}^{m} δ_{jk}(αα_k + βα′_k). Then

T(αu + βu′) = Σ_{j=1}^{n} γ_j v_j
= Σ_{j=1}^{n} ( Σ_{k=1}^{m} δ_{jk}(αα_k + βα′_k) ) v_j
= α Σ_{j=1}^{n} ( Σ_{k=1}^{m} δ_{jk} α_k ) v_j + β Σ_{j=1}^{n} ( Σ_{k=1}^{m} δ_{jk} α′_k ) v_j
= αT(u) + βT(u′).
Therefore [T1 + T2]_B^{B′} = (α_{ji} + β_{ji}) = (α_{ji}) + (β_{ji}) = [T1]_B^{B′} + [T2]_B^{B′}.
Thus [αT1]_B^{B′} = (αα_{ji}) = α(α_{ji}) = α[T1]_B^{B′}.
Thus [T2 T1]_B^{B″} = (γ_{ki}) = (β_{kj})(α_{ji}) = [T2]_{B′}^{B″} [T1]_B^{B′}.
Theorem 3.48 If U and V are vector spaces over F of dimensions n and m, respectively, then Hom(U, V) ≅ M_{m×n}(F), where M_{m×n}(F) denotes the vector space of all m × n matrices with entries from F; hence the dimension of the vector space M_{m×n}(F) is mn.
Proof Let B and B′ be ordered bases of U and V, respectively. Define a map f : Hom(U, V) → M_{m×n}(F) by f(T) = [T]_B^{B′}. In the light of Theorem 3.45, f is well defined and one-to-one. We also have f(αT1 + βT2) = [αT1 + βT2]_B^{B′} = [αT1]_B^{B′} + [βT2]_B^{B′} = α[T1]_B^{B′} + β[T2]_B^{B′} = α f(T1) + β f(T2) for every α, β ∈ F and T1, T2 ∈ Hom(U, V). Thus f is a linear transformation. Using Theorem 3.44, it is easy to observe that f is onto. Now we conclude that f is an isomorphism and hence Hom(U, V) ≅ M_{m×n}(F). Therefore, dim M_{m×n}(F) = dim Hom(U, V) = dim U · dim V = mn.
Theorem 3.49 Let V be an n-dimensional vector space over F. Then the algebras
A (V ) and Mn×n (F) are isomorphic.
Theorem 3.50 Let T : U −→ V be a linear transformation, where U and V are vector spaces over F of dimensions n and m, respectively. Let B, B′ be ordered bases of U and let B1, B′1 be ordered bases of V. Then the matrices [T]_B^{B1} and [T]_{B′}^{B′1} are equivalent.
Proof Let I_U and I_V denote the identity linear operators on the vector spaces U and V, respectively. It is clear that T = I_V T I_U. Using Theorem 3.46, we have [T]_{B′}^{B′1} = [I_V T I_U]_{B′}^{B′1} = [I_V]_{B1}^{B′1} [T]_B^{B1} [I_U]_{B′}^{B}. Now let us put P = [I_V]_{B1}^{B′1} and Q = [I_U]_{B′}^{B}. Thus we can write [T]_{B′}^{B′1} = P [T]_B^{B1} Q, where clearly P and Q are matrices of orders m × m and n × n, respectively. Now we claim that P and Q are invertible matrices. Since I_V = I_V I_V, we find that [I_V]_{B1}^{B1} = [I_V I_V]_{B1}^{B1} = [I_V]_{B′1}^{B1} [I_V]_{B1}^{B′1}. Similarly, we also have [I_V]_{B′1}^{B′1} = [I_V I_V]_{B′1}^{B′1} = [I_V]_{B1}^{B′1} [I_V]_{B′1}^{B1}. But we know that [I_V]_{B1}^{B1} = [I_V]_{B′1}^{B′1} = I_{m×m}, the identity matrix of order m × m. Hence we conclude that [I_V]_{B′1}^{B1} [I_V]_{B1}^{B′1} = [I_V]_{B1}^{B′1} [I_V]_{B′1}^{B1} = I_{m×m}, i.e., [I_V]_{B′1}^{B1} P = P [I_V]_{B′1}^{B1} = I_{m×m}. This shows that P is an invertible matrix. Along the same lines one can show that Q is also an invertible matrix. This proves that [T]_B^{B1} and [T]_{B′}^{B′1} are equivalent matrices.
As we have proved that [T]_{B′}^{B′1} = P [T]_B^{B1} Q, where P = [I_V]_{B1}^{B′1} is the transition matrix of B1 relative to B′1 and Q = [I_U]_{B′}^{B}, if we put S = Q^{-1}, then [T]_{B′}^{B′1} = P [T]_B^{B1} S^{-1}, where S = Q^{-1} = [I_U]_B^{B′} is the transition matrix of B relative to B′. This completes the proof.
Corollary 3.51 Let T be a linear operator on an n-dimensional vector space U. If B and B′ are ordered bases of U, then [T]_B^{B} and [T]_{B′}^{B′} are similar matrices; in fact, [T]_{B′}^{B′} = P [T]_B^{B} P^{-1}, where P is the transition matrix of B relative to B′.
Proof Let I_U be the identity linear operator on the vector space U. If, in particular, in the above theorem we put V = U, B1 = B and B′1 = B′, then we get [T]_{B′}^{B′} = P [T]_B^{B} Q, where P = [I_U]_B^{B′} and Q = [I_U]_{B′}^{B} are invertible matrices of order n × n, i.e., [T]_B^{B} and [T]_{B′}^{B′} are equivalent matrices. Now we claim that Q = P^{-1}. Since I_U = I_U I_U, we find that [I_U]_B^{B} = [I_U I_U]_B^{B} = [I_U]_{B′}^{B} [I_U]_B^{B′}. In the same way, [I_U]_{B′}^{B′} = [I_U I_U]_{B′}^{B′} = [I_U]_B^{B′} [I_U]_{B′}^{B}. But [I_U]_B^{B} = [I_U]_{B′}^{B′} = I_{n×n}, the identity matrix of order n × n. Hence we conclude that [I_U]_{B′}^{B} [I_U]_B^{B′} = [I_U]_B^{B′} [I_U]_{B′}^{B} = I_{n×n}, i.e., P Q = Q P = I_{n×n}. This implies that Q = P^{-1}. Finally, we have shown that [T]_{B′}^{B′} = P [T]_B^{B} P^{-1}, so [T]_B^{B} and [T]_{B′}^{B′} are similar matrices, where P = [I_U]_B^{B′} is obviously the transition matrix of B relative to B′.
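The next Python sketch (an illustration of Corollary 3.51 under data of our own choosing, not the book's code) verifies the similarity relation numerically for a particular operator and basis.

```python
# Verify [T]_{B'} = P [T]_B P^{-1}, with P the transition matrix of B relative to B'.
import numpy as np

A  = np.array([[1.0, -1.0], [2.0, 1.0]])        # [T]_B^B, B the standard basis of R^2
Bp = np.column_stack([[1.0, 2.0], [2.0, 1.0]])  # new basis B' as columns

P = np.linalg.inv(Bp)               # maps standard coordinates to B'-coordinates
A_new = P @ A @ np.linalg.inv(P)    # = [T]_{B'}^{B'}

# Check on a sample vector: B'-coordinates of T(v) computed two ways agree.
v = np.array([3.0, -4.0])
assert np.allclose(A_new @ (P @ v), P @ (A @ v))
print(A_new)
```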
Let T : U −→ V be a linear transformation, where U and V are finite dimensional vector spaces over F. Theorem 3.50 and Corollary 3.51 show that the matrices of T relative to two different pairs of bases are equivalent or, in the case of a linear operator, similar. The next two results prove the converses.
Theorem 3.52 Let T : U −→ V be a linear transformation, where U and V are vector spaces over F of dimensions n and m, respectively. Let B and B1 be ordered bases of U and V, respectively, and let C = [T]_B^{B1}. If C is equivalent to D, then there exist ordered bases B′, B′1 of U and V, respectively, such that D = [T]_{B′}^{B′1}.
Proof As C is equivalent to D, there exist a nonsingular m × m matrix P and a nonsingular n × n matrix Q such that D = P C Q. Let S = Q^{-1}. Then clearly D = P C S^{-1}. By Theorem 3.50, there exists an ordered basis B′1 of V such that the transition matrix of B′1 relative to B1 is P^{-1}, i.e., [I_V]_{B′1}^{B1} = P^{-1}, and there exists an ordered basis B′ of U such that Q is the transition matrix of B′ relative to B, i.e., [I_U]_{B′}^{B} = Q. As a result, P is the transition matrix of B1 relative to B′1 and Q^{-1} = S is the transition matrix of B relative to B′. But then, by Theorem 3.50, [T]_{B′}^{B′1} = P [T]_B^{B1} S^{-1} = P C Q = D. This completes the proof.
Corollary 3.53 Let T be a linear operator on an n-dimensional vector space U. For some ordered basis B of U, let A = [T]_B^{B}. Let A be similar to D. Then there exists an ordered basis B′ of U such that [T]_{B′}^{B′} = D.
Theorem 3.54 Let A be any m × n matrix over F, and let B and B′ be the standard bases of F^{n×1} and F^{m×1}, respectively. Then, for T : F^{n×1} −→ F^{m×1} given by T(X) = AX, we have [T]_B^{B′} = A. For m = n, [T]_B^{B} = A.
Let

[u]_B = [α1, α2, . . . , αn]^t.

Then u = Σ_{j=1}^{n} α_j x_j. This implies that T(u) = Σ_{j=1}^{n} α_j T(x_j) = Σ_{j=1}^{n} α_j ( Σ_{i=1}^{m} a_{ij} y_i ), i.e., T(u) = Σ_{i=1}^{m} ( Σ_{j=1}^{n} a_{ij} α_j ) y_i. Thus, in [T(u)]_{B′}, the (i, 1)th entry is Σ_{j=1}^{n} a_{ij} α_j. It can be easily seen that the (i, 1)th entry of A[u]_B is also Σ_{j=1}^{n} a_{ij} α_j. This implies that the (i, 1)th entry of [T(u)]_{B′} equals the (i, 1)th entry of A[u]_B for each i = 1, 2, . . . , m. Thus [T(u)]_{B′} = A[u]_B. This completes the proof of (i).
T(1, 0) = (1, 0) = 2(1, i) + (−i)(−i, 2),   T(0, 1) = (0, 0) = 0(1, i) + 0(−i, 2),

we find that

[T]_B^{B′} = ( 2   0
              −i  0 ).

(2) The matrix of T relative to the pair of bases B′, B. We have

T(1, i) = (1, 0) = 1(1, 0) + 0(0, 1),   T(−i, 2) = (−i, 0) = −i(1, 0) + 0(0, 1).

Thus we get

[T]_{B′}^{B} = ( 1  −i
                0   0 ).

(3) The matrix of T relative to the ordered basis B′. Since
Exercises
1. Let V = R³ and suppose that

( 1   0   3
  −1  −4  3
  6   2   1 )

is the matrix of T ∈ A(V) relative to the basis {e1, e2, e3}, where e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1). Find the matrix of T relative to the basis {e′1, e′2, e′3}, where e′1 = (1, 1, 0), e′2 = (1, 0, 1) and e′3 = (0, 1, 1).
2. Let V be the vector space of all polynomials of degree less than or equal to
3 over R. In A (V ), define T by T (α0 + α1 x + α2 x 2 + α3 x 3 ) = α0 + α1 (x +
1) + α2 (x + 1)2 + α3 (x + 1)3 . Compute the matrix of T relative to bases
(i) {1, x, x 2 , x 3 };
(ii) {1, 1 + x, 1 + x 2 , 1 + x 3 }.
If the matrix in part (i) is A and that in part (ii) is B, then prove that A and B
are similar matrices.
3. Let T : R² −→ R² be the linear operator such that T(x, y) = (x − y, 2x + y). Let B = {(1, 0), (0, 1)} and B′ = {(1, 2), (2, 1)} be ordered bases of R².
(a) Find [T]_B^{B}, [T]_{B′}^{B′}, [T]_B^{B′}, [T]_{B′}^{B}.
(b) Find the transition matrix P of B relative to B′ and verify that [T]_{B′}^{B′} = P [T]_B^{B} P^{-1}.
(c) Find the formula for T^{-1}, and find [T^{-1}]_B^{B}. Also verify that [T^{-1}]_B^{B} = ([T]_B^{B})^{-1}.
(d) Find [T^{-1}]_{B′}^{B′} and also verify that [T^{-1}]_{B′}^{B′} = ([T]_{B′}^{B′})^{-1}.
4. Let P2(R) be the vector space of all polynomials of degree less than or equal to 2 over R and let T : R² −→ P2(R) be given by T(α, β) = βx + αx². Consider the bases B = {(1, −2), (−3, 0)} of R² and B′ = {1, x, x²} of P2(R).
(a) Find [T]_B^{B′}.
(b) Verify that [T(3, −4)]_{B′} = [T]_B^{B′} [(3, −4)]_B.
(c) Verify that Rank T = Rank [T]_B^{B′}.
(a) Find T(e1), T(e2), T(e3) and determine T. Prove that T is invertible and determine T^{-1}.
(b) Find the matrix of each of the following relative to the standard ordered basis: T², T² + T, T² + I, (−2T)³ + 6T² − I.
(c) Find [T]_{B′}^{B′}, where B′ = {e′1 = (1, 1, 1), e′2 = (1, 1, 0), e′3 = (1, 0, 0)}.
(d) Find a basis of Ker T and a basis of Range T.
(e) Show that the matrices [T]_B^{B} and [T]_{B′}^{B′} have the same rank.
(f) Prove that [T]_B^{B} and [T]_{B′}^{B′} are similar matrices.
6. Let P3(R) be the vector space of all polynomials of degree less than or equal to 3 over R and let D : P3(R) −→ P3(R) be the differentiation operator. Let B be the standard basis {1, x, x², x³} and B′ = {1, 1 + x, (1 + x)², (1 + x)³}.
(a) Find [D]_B^{B}, [D]_{B′}^{B′}, [D]_B^{B′}, [D]_{B′}^{B}.
(b) For A = [D]_B^{B}, verify that A⁴ = 0 but A³ ≠ 0.
(c) For any α ≠ 0 in R, prove that αI + D is invertible.
7. Let V be the 2-dimensional vector space of solutions of the differential equation y″ − 3y′ + 2y = 0 over C, let B = {y1 = e^x, y2 = e^{2x}} be a basis of V and let D : V −→ V be the differentiation operator. Find [D]_B^{B}.
8. Let V be the vector space of all 2 × 2 matrices with real entries and let T : V −→ R be the map defined by T(A) = the trace of A.
(a) Show that the set B = { ( 1 0 ; 0 0 ), ( 0 1 ; 0 0 ), ( 0 0 ; 1 0 ), ( 0 0 ; 0 1 ) } is a basis for V.
(b) Prove that T is a linear transformation and determine [T]_B^{B′}, where B′ = {5}. Determine the dimension and a basis for the kernel of T.
9. Let T be a linear operator on R3 defined by T (x, y, z) = (2y + z, x − 4z, 3x −
6z).
(a) Find [T]_B^{B}, where B = {(1, 1, 0), (1, 0, 1), (0, 1, 1)}.
(b) Verify that [T]_B^{B} [v]_B = [T(v)]_B for any v ∈ R³.
10. For each of the following linear operators T on R², find the matrix that represents T relative to the standard basis B = {(1, 0), (0, 1)} of R².
(a) T is defined by T(1, 0) = (2, 3), T(0, 1) = (3, −4).
(b) T is the rotation in R² counterclockwise by 90°.
(c) T is the reflection in R² about the line y = x.
11. The set B = {e^{3t}, te^{3t}, t²e^{3t}} is a basis of a vector space V of functions f : R −→ R. Let D be the differential operator on V, i.e., D(f) = df/dt. Find [D]_B^{B}.
(a) The change of basis matrix or transition matrix for the new coordinate sys-
tem.
(b) T (2, 3), T (−6, 8), T (−2, 6), T (3, 5), T (a, b).
21. Let T be a linear operator on R³ defined by T(x, y, z) = (2x − y + z, x − y + 2z, 2x − 3y + z).
(a) Find [T]_B^{B} and [T]_{B′}^{B′}, where B = {(1, 1, 1), (1, 1, 0), (1, 0, 0)} and B′ = {(1, 1, 0), (1, 2, 3), (1, 3, 5)}.
(b) Verify that Determinant [T]_B^{B} = Determinant [T]_{B′}^{B′}.
22. Let T : R³ −→ R² be the linear transformation given by T(x, y, z) = (2x − y + z, −3x + 2y − z). Let B = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, B′ = {(1, 1, 0), (0, 1, 2), (0, 1, 1)} and B1 = {(1, 1), (1, 2)}, B′1 = {(1, 0), (0, 2)} be ordered bases of R³ and R², respectively. Then
(a) Find [T]_B^{B1} and [T]_{B′}^{B′1}.
(b) Verify that [T]_B^{B1} and [T]_{B′}^{B′1} are equivalent matrices.
(c) Verify that Rank [T]_B^{B1} = Rank [T]_{B′}^{B′1}.
(d) Find nonsingular matrices P and Q such that [T]_{B′}^{B′1} = P [T]_B^{B1} Q.
(e) Verify that for any v ∈ R³, [T(v)]_{B1} = [T]_B^{B1} [v]_B and [T(v)]_{B′1} = [T]_{B′}^{B′1} [v]_{B′}.
23. Let T be a linear transformation on an n-dimensional vector space V. If T^{n−1}(v) ≠ 0 but T^{n}(v) = 0 for some v ∈ V, then v, T(v), . . . , T^{n−1}(v) are linearly independent, and thus form a basis of V. Find the matrix representation of T under this basis.
24. Let C^∞(R) be the vector space of real valued functions on R having derivatives of all orders. Consider the differential operator D(y) = y″ + ay′ + by, y ∈ C^∞(R), where a and b are real constants. Show that y = e^{λx} lies in Ker D if and only if λ is a root of the quadratic equation t² + at + b = 0.
25. Let T be a linear operator on R² associated with the matrix

( 2  1
  0  2 )

under the basis {α1 = (1, 0), α2 = (0, 1)}. Let W1 be the subspace of R² spanned by α1. Show that W1 is invariant under T and that there does not exist a subspace W2 invariant under T such that R² = W1 ⊕ W2.
26. Let S and T be linear transformations on R². Given that the matrix representation of S under the basis {α1 = (1, 2), α2 = (2, 1)} is ( 1 2 ; 2 3 ), and the matrix representation of T under the basis {β1 = (1, 1), β2 = (1, 2)} is ( 3 3 ; 2 4 ). Let u = (3, 3) ∈ R². Find
(a) The matrix of S + T under the basis {β1 , β2 }.
(b) The matrix of ST under the basis {α1 , α2 }.
(c) The coordinate vector of S(u) under the basis {α1 , α2 }.
(d) The coordinate vector of T (u) under the basis {β1 , β2 }.
Chapter 4
Dual Spaces
Example 4.2 (1) Let V be a vector space over a field F. Define a map f : V −→ F
given by f(x) = 0 for all x ∈ V. It is easy to see that f is a linear functional on V, which is known as the zero linear functional on V.
(2) Consider the vector space Fn over the field F. Define the map f : Fn −→ F by
f (x1 , x2 , . . . , xn ) = a1 x1 + a2 x2 + · · · + an xn , where x1 , x2 , . . . , xn ∈ F and
a1 , a2 , . . . , an are n fixed scalars. It can be easily verified that f is a linear
functional on Fn .
(3) Consider the vector space Fn over the field F. Let πi : Fn −→ F be the ith
projection, i.e., πi (x1 , x2 , . . . , xn ) = xi . It can be easily shown that πi is a linear
functional on Fn .
(4) Let R[x] be the vector space of all polynomials in x over the real field R. Define the map I : R[x] −→ R by I(f(x)) = ∫_0^1 f(x)dx. It can be easily proved that I is a linear functional on R[x].
(5) Let Mn×n (F) be the vector space of all n × n matrices over the field F. Consider
the map f : Mn×n (F) −→ F, defined as f (A) = trace(A) = a11 + a22 + · · · +
ann , where A = (ai j )n×n . It can be easily seen that f is a linear functional on
Mn×n (F).
(6) Let F[x] be the vector space of all polynomials in x over the field F. Define
the map L : F[x] −→ F by L( f (x)) = the value of f (x) at some fixed element
t ∈ F, i.e., f (t). It can be easily seen that L is a linear functional on F[x].
(7) Let C[a, b] be the vector space of all real-valued continuous functions over the field R of real numbers. Define the map L : C[a, b] −→ R by L(f(x)) = ∫_a^b f(x)dx. It is obvious that L is a linear functional on C[a, b].
Definition 4.3 Let V be a vector space over F and let B = {v1, v2, . . . , vn} be a basis of V. Then a subset B̂ = {v̂1, v̂2, . . . , v̂n} of V̂ is said to be the dual basis of V̂ with respect to the basis B if B̂ is a basis of V̂ and v̂i(vi) = 1, v̂i(vj) = 0 for i ≠ j.
Remark 4.4 (i) Since F is a vector space of dimension one over itself, dim Hom(V, F) = dim V.
(ii) It is clear that the dual basis with respect to any given basis of a vector space V is unique, while a change of basis of V will in general produce a different dual basis.
Theorem 4.5 Let V be a finite dimensional vector space over a field F. Then V̂ has a dual basis.
Proof [fragment] ... and for any α ∈ F, v̂i(αa) = v̂i( Σ_{k=1}^{n} αβk vk ) = αβi = α v̂i(a). Hence, v̂i ∈ V̂ for each i, 1 ≤ i ≤ n, and it can be seen that for each i, 1 ≤ i ≤ n, v̂i(vi) = 1 and v̂i(vj) = 0 for i ≠ j, j = 1, 2, . . . , n. Since dim V = dim V̂, in order to show that {v̂1, v̂2, . . . , v̂n} is a basis of V̂, it remains only to show that it is linearly independent. For scalars γi ∈ F, i = 1, 2, . . . , n, whenever Σ_{i=1}^{n} γi v̂i = 0, then ( Σ_{i=1}^{n} γi v̂i )(vj) = 0 for each j = 1, 2, . . . , n. This implies that γj = 0 for each j = 1, 2, . . . , n. Hence we conclude that {v̂1, v̂2, . . . , v̂n} is a basis of V̂.
(ii) Since v ∈ V and B is a basis of V, there exist unique scalars α1, α2, . . . , αn such that v = α1v1 + α2v2 + · · · + αnvn. Now for any v̂j ∈ V̂, 1 ≤ j ≤ n, we have v̂j(v) = αj.
(iii) It can be easily seen that μ is a linear transformation. Indeed, for φ, ξ ∈ V̂ and α, β ∈ F,

αφ + βξ = (αφ + βξ)(v1) v̂1 + (αφ + βξ)(v2) v̂2 + · · · + (αφ + βξ)(vn) v̂n
= (αφ(v1) + βξ(v1)) v̂1 + (αφ(v2) + βξ(v2)) v̂2 + · · · + (αφ(vn) + βξ(vn)) v̂n.

Therefore, μ(αφ + βξ) = αμ(φ) + βμ(ξ). This shows that μ is linear. Given that B̂ is a basis of V̂, let f ∈ V̂. Then f = 0 if and only if f(vi) = 0 for every i with 1 ≤ i ≤ n. Therefore Ker μ = {0}, and hence μ is one-to-one. Also, since dim V̂ = dim Fⁿ, μ is onto. This completes the proof.
Theorem 4.7 Let V be a vector space over F. Then the following hold:
(i) For any nonzero vector v ∈ V, there exists a linear functional v̂ ∈ V̂ such that v̂(v) ≠ 0.
v = (v̂(v)/v̂(x)) x + ( v − (v̂(v)/v̂(x)) x ) ∈ ⟨x⟩ + Ker v̂.
Remark 4.9 Define a function f : Fⁿ → F by f(x1, x2, . . . , xn) = Σ_{i=1}^{n} ai xi; then f is a linear functional on the vector space Fⁿ, which is determined uniquely by (a1, a2, . . . , an) ∈ Fⁿ relative to the standard ordered basis B = {e1, e2, . . . , en}: f(ei) = ai, 1 ≤ i ≤ n. Every linear functional on Fⁿ is of this form for some scalars a1, a2, . . . , an, because if B is the standard basis of Fⁿ, f is a linear functional on Fⁿ and f(ei) = ai, then for any (x1, x2, . . . , xn) ∈ Fⁿ, writing (x1, x2, . . . , xn) = Σ_{i=1}^{n} xi ei yields

f(x1, x2, . . . , xn) = Σ_{i=1}^{n} xi f(ei) = Σ_{i=1}^{n} ai xi.
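The observation of Remark 4.9 can be checked numerically. The following Python sketch (our own minimal illustration; the functional f and the vector x are arbitrary choices, not from the text) recovers the coefficients ai from the values f(ei).

```python
# Every linear functional on R^n is f(x) = a . x, with a_i = f(e_i).
import numpy as np

f = lambda x: 2 * x[0] - x[1] + 5 * x[2]     # an arbitrary linear functional on R^3
a = np.array([f(e) for e in np.eye(3)])      # a_i = f(e_i)
x = np.array([1.0, 4.0, -2.0])
assert np.isclose(f(x), a @ x)
print(a, f(x))
```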
Example 4.10 If B = {(1, 1, 0), (1, 0, 1), (0, 1, 1)} is a basis of R³, then in order to find the dual basis of B, define φ1, φ2, φ3 : R³ → R such that

φ1(x, y, z) = a1x + b1y + c1z,
φ2(x, y, z) = a2x + b2y + c2z,
φ3(x, y, z) = a3x + b3y + c3z.

Imposing the dual basis conditions on the basis vectors of B, we find that a1 = 1/2, b1 = 1/2, c1 = −1/2; a2 = 1/2, b2 = −1/2, c2 = 1/2; and a3 = −1/2, b3 = 1/2, c3 = 1/2, respectively. This yields that φ1(x, y, z) = (x + y − z)/2, φ2(x, y, z) = (x − y + z)/2 and φ3(x, y, z) = (−x + y + z)/2. Hence, {φ1, φ2, φ3} is the basis of the dual space of R³ dual to the basis B.
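The computation of Example 4.10 can be automated. In the following Python sketch (our illustration, not the book's code), the basis vectors form the columns of a matrix M; the coefficient rows of the dual functionals are then exactly the rows of M^{-1}.

```python
# Dual basis of Example 4.10: rows of M^{-1} give the coefficients of phi_i.
import numpy as np

M = np.column_stack([(1, 1, 0), (1, 0, 1), (0, 1, 1)]).astype(float)
Phi = np.linalg.inv(M)        # row i holds the coefficients of phi_i
print(Phi)                    # [[ .5  .5 -.5], [ .5 -.5  .5], [-.5  .5  .5]]
assert np.allclose(Phi @ M, np.eye(3))   # phi_i(v_j) = delta_ij
```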
Exercises
(a) {1, x, x 2 },
(b) {1 + x, 1 − x, 1 + x 2 } and
(c) {x, 2 + x, 1 − x − x 2 }.
7. Let V be the vector space of all polynomials over R of degree less than or equal to 2. Let φ1, φ2, φ3 be the linear functionals on V defined by φ1(f(t)) = ∫_0^1 f(t)dt, φ2(f(t)) = f′(1), φ3(f(t)) = f(0), where f′(t) denotes the derivative of f(t). Find a basis {f1(t), f2(t), f3(t)} of V such that its dual is {φ1, φ2, φ3}.
8. Let V be the vector space of all polynomials over F of degree less than or equal to 2. Let a, b, c ∈ F be distinct scalars. Let φa, φb, φc be the linear functionals on V defined by φa(f(t)) = f(a), φb(f(t)) = f(b), φc(f(t)) = f(c). Show that {φa, φb, φc} is a basis of V̂ and also find the basis of V of which it is the dual.
9. Let {e1 , e2 , . . . , en } be the usual basis of Fn . Show that its dual basis is
{π1 , π2 , . . . , πn }, where πi is the ith projection mapping: πi (a1 , a2 , . . . , an ) =
ai .
10. Let W be a subspace of V . For any linear functional φ on W , show that there is
a linear functional f on V such that f (w) = φ(w) for any w ∈ W ; that is, φ is
the restriction of f to W.
11. Let V be a vector space over R. Let φ1, φ2 ∈ V̂ and suppose f : V −→ R, defined by f(v) = φ1(v)φ2(v), also belongs to V̂. Show that either φ1 = 0 or φ2 = 0.
12. Let V be the vector space of all polynomials over R of degree less than or equal
to 3. Find the dual basis of a basis B = {1, 1 + x, (1 + x)2 , (1 + x)3 } of V.
13. Consider R² as a vector space over R. Find the formulae for linear functionals f and g on R² such that, for a fixed θ, f(cos θ, sin θ) = 1, f(−sin θ, cos θ) = 2, g(cos θ, sin θ) = 2 and g(−sin θ, cos θ) = 1.
(a) Prove that B̂ = {f, g} is a basis of the dual space of R².
(b) Find an ordered basis B = {u, v} of R² such that B̂ is the dual of B.
14. Let n and m be any two positive integers.
(a) For any m linear functionals f 1 , f 2 , . . . , f m on Fn , prove that σ : Fn −→ Fm
given by σ (u) = ( f 1 (u), f 2 (u), . . . , f m (u)) is a linear transformation.
(b) Given any linear transformation T : Fn −→ Fm . Prove that there exist
uniquely determined linear functionals g1 , g2 , . . . , gm depending upon T
such that T (u) = (g1 (u), g2 (u), . . . , gm (u)).
15. Let V be the vector space of all polynomials over R of degree less than or equal to 2. Define φ1, φ2, φ3 in V̂ such that φ1(f(t)) = ∫_{−1}^{1} f(t)dt, φ2(f(t)) = ∫_0^2 f(t)dt and φ3(f(t)) = ∫_0 f(t)dt. Show that B̂ = {φ1, φ2, φ3} is a basis of V̂. Find a basis B of V of which B̂ is the dual.
16. Let F be a field of characteristic zero and let V be a finite dimensional vector space over F. If v1, v2, . . . , vm are finitely many vectors in V, each different from the zero vector, prove that there is a linear functional f on V such that f(vi) ≠ 0, i = 1, 2, . . . , m.
17. In R³, let v1 = (1, 0, 1), v2 = (0, 1, −2), v3 = (−1, −1, 0).
(a) If f is a linear functional on R³ such that f(v1) = 1, f(v2) = −1, f(v3) = 3, and if u = (x, y, z), then find f(u).
(b) Describe explicitly a linear functional f on R³ such that f(v1) = f(v2) = 0 but f(v3) ≠ 0.
(c) If f is any linear functional such that f(v1) = f(v2) = 0 but f(v3) ≠ 0, and if u = (2, 3, −1), then show that f(u) ≠ 0.
18. Let B = {v1 , v2 , v3 } be a basis of C3 defined by v1 = (1, 0, −1), v2 = (1, 1, 1),
v3 = (2, 2, 0). Find the dual basis of B.
For any vector space V, one can consider its dual space V̂, which contains all the linear functionals on V. Since V̂ is also a vector space, one can consider the dual of V̂, denoted V̂̂, which contains all the linear functionals on V̂. If V is finite dimensional, then V̂ and V̂̂ are also finite dimensional and dim V = dim V̂ = dim V̂̂.
Theorem 4.11 (Principle of duality) If V is a finite dimensional vector space over F, then there exists a canonical isomorphism from V onto V̂̂.
Thus, v̂ is a linear functional on V̂ and hence v̂ ∈ V̂̂. Now define the canonical map σ : V → V̂̂ such that σ(v) = v̂. For any α, β ∈ F and v1, v2 ∈ V, σ(αv1 + βv2) = (αv1 + βv2)^. It can be easily seen that for any f ∈ V̂,

(αv1 + βv2)^(f) = f(αv1 + βv2) = αf(v1) + βf(v2) = αv̂1(f) + βv̂2(f) = (αv̂1 + βv̂2)(f).

This shows that (αv1 + βv2)^ = αv̂1 + βv̂2. Hence, using this relation in the preceding definition of σ, it can be seen that σ(αv1 + βv2) = ασ(v1) + βσ(v2), and σ is a linear transformation from V to V̂̂. Now if v ∈ Ker σ, then σ(v) = v̂ = 0, i.e., v̂(f) = 0 for all f ∈ V̂. This shows that f(v) = 0 for all f ∈ V̂. By Theorem 4.7(i), v = 0, and consequently Ker σ = {0} and σ is a monomorphism. This yields that dim V = dim σ(V). However, dim V = dim V̂ = dim V̂̂, so that σ(V) is a subspace of V̂̂ with dim σ(V) = dim V̂̂. This can happen only if σ(V) = V̂̂, and hence σ is onto. This completes the proof of our result.
Remark 4.12 (i) It is to be noted that in the above theorem σ : V −→ V̂̂ does not depend upon any particular choice of basis of V; that is why it is called a canonical isomorphism.
(ii) If V is an arbitrary vector space (not necessarily finite dimensional), the map σ : V −→ V̂̂ still exists and is an injective homomorphism, but it need not be onto.
Example 4.14 Consider the vector space R2[x], i.e., the vector space of all polynomials of degree less than or equal to two over R, and let ψ1, ψ2, ψ3 : R2[x] → R be such that ψ1(f(x)) = ∫_0^1 f(x)dx, ψ2(f(x)) = f′(1), ψ3(f(x)) = f′(0). Now we find the basis of R2[x] dual to {ψ1, ψ2, ψ3}.
Let {a1 + b1x + c1x², a2 + b2x + c2x², a3 + b3x + c3x²} be the required basis of R2[x]. Then
ψ1(a1 + b1x + c1x²) = a1 + b1/2 + c1/3 = 1,
ψ1(a2 + b2x + c2x²) = a2 + b2/2 + c2/3 = 0,
ψ1(a3 + b3x + c3x²) = a3 + b3/2 + c3/3 = 0,
ψ2(a1 + b1x + c1x²) = b1 + 2c1 = 0,
ψ2(a2 + b2x + c2x²) = b2 + 2c2 = 1,
ψ2(a3 + b3x + c3x²) = b3 + 2c3 = 0,
ψ3(a1 + b1x + c1x²) = b1 = 0,
ψ3(a2 + b2x + c2x²) = b2 = 0,
ψ3(a3 + b3x + c3x²) = b3 = 1.

Solving these equations, we obtain a1 = 1, b1 = c1 = 0; a2 = −1/6, b2 = 0, c2 = 1/2; and a3 = −1/3, b3 = 1, c3 = −1/2. Hence the required dual basis is {1, −1/6 + x²/2, −1/3 + x − x²/2}.
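The linear system above can also be solved mechanically. In the Python sketch below (our own illustration of Example 4.14, not the book's code), K[i][j] records ψ_i applied to the monomial x^j; the coefficient vectors of the dual polynomials are then the columns of K^{-1}.

```python
# Dual basis of {psi_1, psi_2, psi_3} on R_2[x], in the monomial basis {1, x, x^2}.
import numpy as np

K = np.array([[1.0, 0.5, 1/3],   # psi_1(p) = integral of p over [0, 1]
              [0.0, 1.0, 2.0],   # psi_2(p) = p'(1)
              [0.0, 1.0, 0.0]])  # psi_3(p) = p'(0)
P = np.linalg.inv(K)
print(P)   # column j: coefficients (a, b, c) of p_j = a + b x + c x^2
assert np.allclose(K @ P, np.eye(3))
```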
4.3 Annihilators
Definition 4.15 Let V be a vector space over a field F and S a subset of V. Then the annihilator S° of S in V̂ is defined as the collection of all f ∈ V̂ such that f(s) = 0 for all s ∈ S, i.e., S° = {f ∈ V̂ | f(s) = 0 for all s ∈ S}.
Remark 4.16 (i) For a subset S of V̂, the annihilator S° of S is defined as the set of all v ∈ V such that f(v) = 0 for all f ∈ S.
(ii) In a finite dimensional vector space V, if 0 ≠ v ∈ V, then we have seen that there exists f ∈ V̂ such that f(v) ≠ 0. This shows that V° contains no nonzero functional, i.e., V° = {0}. It is also clear that if a subset S of V consists of the zero vector alone, then S° = V̂.
Proof (i) Suppose that S1 ⊆ S2 and f ∈ S2°. Then for any v ∈ S1, f(v) = 0 and consequently f ∈ S1°, which completes the required proof.
(ii) Since for any v ∈ S, 0(v) = 0, we find that 0 ∈ S° and therefore S° ≠ ∅. Let f, g ∈ S° and α, β ∈ F. Then for every v ∈ S, (αf + βg)(v) = αf(v) + βg(v) = 0, which shows that S° is a subspace of V̂. Now let v ∈ S. Then for every linear functional f ∈ S°, v̂(f) = f(v) = 0. Hence v̂ ∈ (S°)°, and under the identification of V and V̂̂, v ∈ (S°)°.
Theorem 4.18 Let V be a finite dimensional vector space over a field F and W a subspace of V. Then
(i) dim W + dim W° = dim V,
(ii) Ŵ ≅ V̂ / W°, and
(iii) (W°)° = W.
(iii) Suppose that dim W = m and dim V = n. Then by (i) above, we find that dim W° = n − m. Therefore, dim (W°)° = n − (n − m) = m = dim W. Since W ⊆ (W°)° (as observed above), it follows that (W°)° = W.
(ii) Replacing W1 by W1° and W2 by W2° in (i) and using Theorem 4.18(iii), we get (W1° + W2°)° = (W1°)° ∩ (W2°)° = W1 ∩ W2, and hence ((W1° + W2°)°)° = (W1 ∩ W2)°. This implies that (W1 ∩ W2)° = W1° + W2°.
Remark 4.20 Observe that no dimension argument is employed in the proof of (i); hence result (i) holds for vector spaces of finite or infinite dimension.
Definition 4.21 Let V be a vector space over F. Then a maximal proper subspace
of V is called a hyperspace or hyperplane of V .
Example 4.22 (1) Consider Rn , n ≥ 2, as a vector space over R. Then the sub-
spaces W1 = {(α1 , α2 , . . . , αn−1 , 0) | αi ∈ R, i = 1, 2, . . . , n − 1} and W2 =
{(0, β1 , . . . , βn−1 ) | βi ∈ R, i = 1, 2, . . . , n − 1} are hyperspaces of Rn .
(2) Consider Pn(x), n ≥ 1, the vector space of all real polynomials of degree at most n over the field R. Then the subspaces W1 = {α1x + α2x² + · · · + αnx^n | αi ∈ R, i = 1, 2, . . . , n} and W2 = {β0 + β1x + · · · + βn−1x^{n−1} | βi ∈ R, i = 0, 1, . . . , n − 1} are hyperspaces of Pn(x).
Theorem 4.24 If f is a nonzero linear functional on a vector space V , then the null
space of f is a hyperspace of V . Conversely, every hyperspace of V is the null space
of a (not unique) nonzero linear functional on V .
Proof Let f be a nonzero linear functional on the vector space V and W the null space of f. We have to show that W is a hyperspace of V. Since f ≠ 0, it is obvious that W ≠ V; thus W is a proper subspace of V. To prove that W is also a maximal subspace of V, let W1 be a subspace of V such that W ⊆ W1 ⊆ V. Then we have to prove that either W1 = W or W1 = V. If W1 = W, then there is nothing to do. If W1 ≠ W, then there exists
Example 4.27 Let n be a positive integer and F a field. Suppose that W is the set of all vectors (x1, x2, . . . , xn) ∈ Fⁿ such that x1 + x2 + · · · + xn = 0. Then it can be seen that W° consists of all linear functionals f of the form f(x1, x2, . . . , xn) = c Σ_{i=1}^{n} xi.
Thus, in Case I, we have proved that f(x1, x2, . . . , xn) = c Σ_{i=1}^{n} xi.
Case II: Suppose that char(F) = 2. In this case x = −x for all x ∈ F. It is easy to observe that the proof given in Case I holds in Case II also.
Exercises
6. Let V be the vector space of all polynomials over R of degree less than or equal to 3 and W the subspace of V consisting of those polynomials p(x) ∈ V such that p(1) = 0, p(−1) = 0. Find dim W and dim W°.
7. Let W1 and W2 be the row space and the column space, respectively, of

A = ( 1  2  2  1
      1  2  2  2
      2  4  3  3
      0  0  1  −1 ).

(a) Find the general formula for those f in the dual space of R⁴ that belong to W1°.
(b) Find the general formula for those f in the dual space of R⁴ that belong to W2°.
8. For any A, B ∈ Mn (F), the vector space of n × n matrices over a field F, prove
that
(a) tr (AB) = tr (B A),
(b) if A and B are similar, then tr (A) = tr (B),
(c) there exist no two matrices A and B in M2 (R), such that AB − B A = I2 ,
where I2 is the identity matrix of order 2.
9. Let V = M2(R) be the vector space of all 2 × 2 matrices with real entries and W the subspace of V consisting of those A ∈ V such that AB = BA, where B = ( 1 3 ; 2 6 ). Find dim W and dim W°. Does there exist a nonzero f ∈ V̂ such that f(I2) = 0, f( ( 0 0 ; 0 1 ) ) ≠ 0 and f ∈ W°?
10. Find a basis of the annihilator W ◦ of the subspace W of R4 spanned by
(1, 2, −3, 4) and (0, 1, 4, −1).
11. Let W be the subspace of R⁵ which is spanned by the vectors v1 = e1 + 2e2 + e3, v2 = e2 + 3e3 + 3e4 + e5, v3 = e1 + 4e2 + 6e3 + 4e4 + e5, where {e1, e2, e3, e4, e5} is the standard basis of R⁵. Find a basis for W°.
12. Let V = M2(R) be the vector space of all 2 × 2 matrices with real entries and let B = ( 2 −2 ; −1 1 ). Let W be the subspace of V consisting of those A ∈ V such that AB = 0_{2×2}. Let f be a linear functional on V which is in the annihilator of W. Suppose that f(I2) = 0 and f(C) = 3, where I2 is the identity matrix of order 2 × 2 and C = ( 0 0 ; 0 1 ). Find f(B).
13. Let S be a set, F a field and V (S; F) the vector space of all functions from S
into F, where operations are defined as follows: ( f + g)(x) = f (x) + g(x);
(α f )(x) = α f (x). Let W be any n-dimensional subspace of V (S; F). Show that
there exist points x1 , x2 , . . . , xn in S and functions f 1 , f 2 , . . . , f n in W such that
f i (x j ) = δi j .
14. If W is a subspace of a finite dimensional vector space V and if {f1, f2, . . . , fr} is any basis for W°, then prove that W = ∩_{i=1}^{r} Ni, where Ni is the null space of fi, i = 1, 2, . . . , r.
For a given vector space V over a field F, one can always find its dual space V̂. For a given linear transformation T : V → W, is it possible to find a linear map T̂ : Ŵ → V̂? In the present section, we shall discuss properties of such linear maps. In fact, if T : V → W is a linear transformation, then for any f ∈ Ŵ, the composition f ∘ T : V → F given by (f ∘ T)(v) = f(T(v)) for all v ∈ V defines a linear transformation. Indeed, for any u, v ∈ V and α, β ∈ F, (f ∘ T)(αu + βv) = α(f ∘ T)(u) + β(f ∘ T)(v), and hence f ∘ T ∈ V̂.
For any f, g ∈ Ŵ, α, β ∈ F, and v ∈ V,
(T̂(αf + βg))(v) = ((αf + βg) ∘ T)(v)
= (αf + βg)(T(v))
= αf(T(v)) + βg(T(v))
= α(f ∘ T)(v) + β(g ∘ T)(v)
= (α(f ∘ T) + β(g ∘ T))(v)
= (αT̂(f) + βT̂(g))(v).

This shows that T̂(αf + βg) = αT̂(f) + βT̂(g), and hence T̂ : Ŵ → V̂ is a linear transformation. This implies that if T ∈ Hom(V, W), then T̂ ∈ Hom(Ŵ, V̂).
(iii) If T1 + T2 : U −→ V, then (T1 + T2)^ : V̂ −→ Û. If f ∈ V̂ and u ∈ U, then ((T1 + T2)^(f))(u) = (f ∘ (T1 + T2))(u) = f((T1 + T2)(u)) = f(T1(u) + T2(u)) = (f ∘ T1 + f ∘ T2)(u). This implies that (T1 + T2)^(f) = f ∘ T1 + f ∘ T2, i.e., (T1 + T2)^(f) = T̂1(f) + T̂2(f). Finally, we get (T1 + T2)^(f) = (T̂1 + T̂2)(f), i.e., (T1 + T2)^ = T̂1 + T̂2.
(iv) Since αT1 : U −→ V, we have (αT1)^ : V̂ −→ Û. If f ∈ V̂ and u ∈ U, then ((αT1)^(f))(u) = (f ∘ (αT1))(u) = f((αT1)(u)) = f(α(T1(u))) = α(f(T1(u))). This gives us ((αT1)^(f))(u) = (α(f ∘ T1))(u), i.e., (αT1)^(f) = α(f ∘ T1). Thus, we conclude that (αT1)^(f) = α(T̂1(f)) = (αT̂1)(f). This shows that (αT1)^ = αT̂1.
Theorem 4.33 Let U and V be vector spaces over a field F and let T be a linear transformation from U to V. Then the null space (kernel) of T̂ is the annihilator of the range of T. Further, if U and V are finite dimensional, then
(i) r(T̂) = r(T);
(ii) the range of T̂ is the annihilator of the null space of T.
Proof We have

f ∈ N(T̂) ⇔ T̂(f) = 0, the zero linear functional on U
⇔ f ∘ T = 0
⇔ (f ∘ T)(u) = 0 ∈ F, for all u ∈ U
⇔ f(T(u)) = 0, for all T(u) ∈ T(U)
⇔ f ∈ (T(U))°.

Hence N(T̂) = (T(U))°.
Theorem 4.34 Let U and V be finite dimensional vector spaces over a field F. Let B1 be an ordered basis for U with dual basis B̂1, and let B2 be an ordered basis for V with dual basis B̂2. Let T : U −→ V be a linear transformation, let [T]_{B1}^{B2} be the matrix of T relative to B1, B2 and let [T̂]_{B̂2}^{B̂1} be the matrix of T̂ relative to B̂2, B̂1. Then [T̂]_{B̂2}^{B̂1} is the transpose of [T]_{B1}^{B2}.
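Theorem 4.34 can be made concrete on Fⁿ with standard bases, where the dual map acts by the transposed matrix. The following Python sketch (our illustration under that identification, not the book's code) checks the defining identity f(T(x)) = (Aᵗa) · x.

```python
# If T(x) = A x and f(y) = a . y, then (T^ f)(x) = f(T x) = (A^t a) . x,
# so the matrix of the dual map in the dual standard bases is A^t.
import numpy as np

A = np.array([[2.0, -1.0, 1.0], [-3.0, 2.0, -1.0]])   # T : R^3 -> R^2
a = np.array([1.0, 4.0])                              # a functional on R^2
x = np.array([0.5, -2.0, 3.0])
assert np.isclose(a @ (A @ x), (A.T @ a) @ x)
print(A.T @ a)    # coefficient vector of the pulled-back functional
```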
Exercises
2. Let V be a finite dimensional vector space. Then prove that the linear transformation T : V −→ V is nonsingular if and only if its transpose T̂ : V̂ −→ V̂ is nonsingular.
3. Let V be the vector space of all polynomial functions over the field of real numbers. Let a and b be fixed real numbers and let f be the linear functional on V defined by f(p(x)) = ∫_a^b p(x)dx, where p(x) ∈ V. If D is the differentiation operator on V, then find D̂(f).
4. Let V be the vector space of all n × n matrices over a field F and let B be a fixed n × n matrix. If T is the linear operator on V defined by T(A) = AB − BA, and if f is the trace function, then find T̂(f).
5. Let V be a finite dimensional vector space over the field F and let T be a linear operator on V. Let α be a scalar and suppose there is a nonzero vector v ∈ V such that T(v) = αv. Prove that there is a nonzero linear functional f on V such that T̂(f) = αf.
6. Let n be a positive integer and let V be the vector space of all polynomial functions over the field of real numbers which have degree at most n, i.e., functions of the form f(x) = α0 + α1x + · · · + αnx^n. Let D be the differentiation operator on V. Find a basis for the null space of the transpose operator D̂.
7. Let V be a finite dimensional vector space over a field F. Show that T −→ T̂ is an isomorphism from A(V) to A(V̂).
Chapter 5
Inner Product Spaces
In the previous chapters, we have considered vector spaces V over an arbitrary field F. In the present chapter, we shall restrict ourselves to the field of reals R or the complex field C. Notice that the concepts of “length” and “orthogonality” do not appear in the study of vector spaces over an arbitrary field. In this chapter, we place an additional structure on a vector space V to obtain an inner product space. If V is a vector space over R, then V is called a real vector space; on the other hand, if V is a vector space over C, then V is called a complex vector space.
Definition 5.1 A vector space V over F is said to be an inner product space if there exists a function ⟨ , ⟩ : V × V → F satisfying the following axioms:
(1) ⟨u, v⟩ = \overline{⟨v, u⟩} for all u, v ∈ V.
(2) ⟨u, u⟩ ≥ 0, and ⟨u, u⟩ = 0 ⇔ u = 0, for all u ∈ V.
(3) ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩ for all u, v, w ∈ V and α, β ∈ F.
Remark 5.2 (i) The function ⟨ , ⟩ satisfying the axioms (1), (2) and (3) is called an inner product on V.
(ii) If F = R, then \overline{⟨v, u⟩} = ⟨v, u⟩, and hence axiom (1) can be written as ⟨u, v⟩ = ⟨v, u⟩.
(iii) ⟨u, v⟩ is also denoted as (u, v), u · v or ⟨u|v⟩. Throughout, we shall denote it by ⟨u, v⟩.
(iv) If F = C, the field of complex numbers, then axiom (1) implies that ⟨u, u⟩ is real and hence axiom (2) makes sense. For any α, β ∈ C and u, v, w ∈ V, applying (1) and (3) we see that ⟨u, αv + βw⟩ = \overline{α}⟨u, v⟩ + \overline{β}⟨u, w⟩.
Example 5.3 (1) In the vector space V = F², for any u = (α1, α2), v = (β1, β2) ∈ V define

⟨u, v⟩ = 2α1\overline{β1} + α1\overline{β2} + α2\overline{β1} + α2\overline{β2}.

Then

\overline{⟨u, v⟩} = 2β1\overline{α1} + β1\overline{α2} + β2\overline{α1} + β2\overline{α2} = ⟨v, u⟩,

and

⟨u, u⟩ = 2α1\overline{α1} + α1\overline{α2} + α2\overline{α1} + α2\overline{α2}
= α1\overline{α1} + α1(\overline{α1 + α2}) + α2(\overline{α1 + α2})
= |α1|² + (α1 + α2)(\overline{α1 + α2})
= |α1|² + |α1 + α2|² ≥ 0.

It can be easily seen that the above product defines an inner product on V.
(2) In the vector space V = Fⁿ, for u = (α1, α2, . . . , αn), v = (β1, β2, . . . , βn) ∈ V, define ⟨u, v⟩ = α1\overline{β1} + α2\overline{β2} + · · · + αn\overline{βn}. The above inner product is called the standard inner product on Rⁿ and Cⁿ, and the resulting inner product space is called Euclidean and unitary space, respectively.
(3) Let V = C[a, b], the vector space of all continuous complex valued functions defined on the closed interval [a, b]. For any f(t), g(t) ∈ V define

⟨f(t), g(t)⟩ = ∫_a^b f(t)\overline{g(t)}dt.

This defines an inner product on V. For any f(t), g(t) ∈ V, it can be seen that

\overline{⟨g, f⟩} = \overline{∫_a^b g(t)\overline{f(t)}dt} = ∫_a^b \overline{g(t)}f(t)dt = ⟨f, g⟩.
Then the above product defines an inner product on R³. Hence there are infinitely many inner products that can be defined on R³. Readers are advised to show that the above product uAvᵗ cannot be an inner product if A has a negative diagonal entry λi < 0.
(5) Consider the vector space V = M2(R). For any A = ( a11 a12 ; a21 a22 ) and B = ( b11 b12 ; b21 b22 ) in V, define

⟨A, B⟩ = a11b11 + a12b12 + a21b21 + a22b22.

Then it can be easily seen that the above product defines an inner product on V. In fact, ⟨A, A⟩ = a11² + a12² + a21² + a22² ≥ 0, and ⟨A, A⟩ = 0 if and only if a11 = 0, a12 = 0, a21 = 0, a22 = 0, i.e., A = ( 0 0 ; 0 0 ). Moreover, since all the entries of the matrices are real, ⟨A, B⟩ = ⟨B, A⟩. Further, for any α, β ∈ R and A, B, C ∈ V,
⟨αA + βB, C⟩ = ⟨( αa11 + βb11  αa12 + βb12 ; αa21 + βb21  αa22 + βb22 ), ( c11 c12 ; c21 c22 )⟩
= (αa11 + βb11)c11 + (αa12 + βb12)c12 + (αa21 + βb21)c21 + (αa22 + βb22)c22
= α(a11c11 + a12c12 + a21c21 + a22c22) + β(b11c11 + b12c12 + b21c21 + b22c22)
= α⟨A, C⟩ + β⟨B, C⟩.

Alternatively, since ⟨A, B⟩ = tr(BᵗA) = tr(AᵗB), we also have

⟨αA + βB, C⟩ = tr((αA + βB)ᵗC) = tr((αAᵗ + βBᵗ)C) = tr(αAᵗC) + tr(βBᵗC) = α tr(AᵗC) + β tr(BᵗC) = α⟨A, C⟩ + β⟨B, C⟩.
(iii) Suppose that ⟨u, v⟩ = 0 for all v ∈ V. Then in particular ⟨u, u⟩ = 0 and hence u = 0.
(iv) Since ⟨v1, u⟩ = \overline{⟨u, v1⟩} = \overline{⟨u, v2⟩} = ⟨v2, u⟩, we find that ⟨v1, u⟩ = ⟨v2, u⟩ for all u ∈ V. This implies that ⟨v1 − v2, u⟩ = ⟨v1, u⟩ − ⟨v2, u⟩ = 0 for all u ∈ V, and hence by (iii), we get the required result.
Exercises
1. Let R² be the vector space over the real field. Find all 4-tuples of real numbers (a, b, c, d) such that for u = (α1, α2), v = (β1, β2) ∈ R², ⟨u, v⟩ = aα1β1 + bα2β2 + cα1β2 + dα2β1 defines an inner product on R².
2. Let V be the vector space of all real functions y = f(x) over the field of reals satisfying d²y/dx² + 4y = 0. In V define ⟨u, v⟩ = ∫_0^π uv dx. Prove that this defines an inner product on V.
3. Let V be the vector space of all real functions y = f(x) over the field of reals satisfying d³y/dx³ − 12 d²y/dx² + 44 dy/dx − 48y = 0. In V, if ⟨u, v⟩ = ∫_{−∞}^{0} uv dx, then prove that this defines an inner product on V.
4. Let V = {(a1, a2, a3, . . .), ai ∈ R | Σ_{i=1}^{∞} ai² is convergent}. Then V is a vector space over R with addition and scalar multiplication defined componentwise. Prove that the map ⟨ , ⟩ : V × V −→ R given by ⟨(a1, a2, . . .), (b1, b2, . . .)⟩ = a1b1 + a2b2 + · · · is well defined, and also prove that it is an inner product on V.
5. Let V be a finite dimensional vector space over F, and {e1, e2, . . . , en} be a basis of V. If u, v ∈ V, then u = a1e1 + a2e2 + · · · + anen, v = b1e1 + b2e2 + · · · + bnen, where ai, bi are uniquely determined scalars. Define ⟨u, v⟩ = a1\overline{b1} + a2\overline{b2} + · · · + an\overline{bn}. Prove that this map is an inner product on V.
6. Let R² be the vector space over the real field. For u = (α1, α2), v = (β1, β2) ∈ R², prove that ⟨u, v⟩ = (α1 − α2)(β1 − β2)/4 + (α1 + α2)(β1 + β2)/4 defines an inner product on R².
7. Let V be an n-dimensional vector space over the field of complex numbers C. Let B be a basis of V. Define, for arbitrary u, v ∈ V, ⟨u, v⟩ = ⟨[u]_B, [v]_B⟩, where the inner product of the coordinate vectors on the right hand side is the natural inner product of the vector space Cⁿ over C. Prove that ⟨ , ⟩ is an inner product on V.
8. Let V be the vector space of m × n matrices over R. Prove that ⟨A, B⟩ = tr(BᵗA) defines an inner product on V.
9. Suppose f(u, v) and g(u, v) are inner products on a vector space V over R. Prove that
(a) the sum f + g is an inner product on V, where (f + g)(u, v) = f(u, v) + g(u, v);
(b) the scalar product kf, for k > 0, is an inner product on V, where (kf)(u, v) = kf(u, v).
10. Find the values of k so that the following is an inner product on R², where u = (x1, x2) and v = (y1, y2): ⟨u, v⟩ = x1y1 − 3x1y2 − 3x2y1 + kx2y2.
11. Show that the formula ⟨Σ_j aj x^j, Σ_k bk x^k⟩ = Σ_{j,k} (aj bk)/(j + k + 1) defines an inner product on the space R[x] of polynomials over the real field R.
12. Let ⟨ , ⟩ be the standard inner product on R² and let T be the linear operator T(x1, x2) = (−x2, x1). Now T is “rotation through 90°” anti-clockwise and has the property that ⟨u, T(u)⟩ = 0 for all u ∈ R². Find all the inner products ⟨ , ⟩ on R² such that ⟨u, T(u)⟩ = 0 for all u.
13. Consider any u, v ∈ R². Prove that

⟨u, u⟩⟨v, v⟩ − ⟨u, v⟩² ≥ 0,

that is, the determinant of the matrix ( ⟨u, u⟩ ⟨u, v⟩ ; ⟨u, v⟩ ⟨v, v⟩ ) is nonnegative.
Definition 5.5 Let V be an inner product space. If v ∈ V, then the length of v (or norm of v), denoted ‖v‖, is defined as ‖v‖ = √⟨v, v⟩.
Proof (i) Clear from the definition of the inner product space.

‖αu‖² = ⟨αu, αu⟩ = α\overline{α}⟨u, u⟩ = |α|²‖u‖² = (|α|‖u‖)².

‖u + v‖² = ⟨u + v, u + v⟩
= ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
= ‖u‖² + ‖v‖² + ⟨u, v⟩ + \overline{⟨u, v⟩}
= ‖u‖² + ‖v‖² + 2Re⟨u, v⟩.

A similar expansion of ‖u − v‖², added to the one above, yields the parallelogram equality. The identities (iii), (iv) and (v) above are known as polarization identities.
|⟨u, v⟩| ≤ ‖u‖‖v‖. Therefore, we assume that neither u = 0 nor v = 0. Then in this case ‖u‖ ≠ 0; if ‖u‖ = 0, then ⟨u, u⟩ = 0 and hence u = 0, a contradiction. Now for any scalar λ we have 0 ≤ ‖v − λu‖², and the choice λ = ⟨v, u⟩/‖u‖² yields |⟨u, v⟩| ≤ ‖u‖‖v‖.
Remark 5.9 (i) If we consider Example 5.3(2), the above theorem gives that for any u = (α1, α2, . . . , αn), v = (β1, β2, . . . , βn) ∈ V,

| Σ_{i=1}^{n} αi\overline{βi} | ≤ √( Σ_{i=1}^{n} |αi|² ) √( Σ_{i=1}^{n} |βi|² ).

(ii) In case of the inner product given in Example 5.3(3), we find that for any f(t), g(t) ∈ C[a, b],

| ∫_a^b f(t)\overline{g(t)}dt | ≤ √( ∫_a^b |f(t)|²dt ) √( ∫_a^b |g(t)|²dt ).
(iii) From the Cauchy-Schwarz inequality, we have −1 ≤ ⟨u, v⟩/(‖u‖‖v‖) ≤ 1 for any two nonzero vectors u and v in a real inner product space. This ensures that there exists a unique θ ∈ [0, π] such that cos θ = ⟨u, v⟩/(‖u‖‖v‖), or ⟨u, v⟩ = ‖u‖‖v‖ cos θ. This angle θ is called the angle between u and v.
Lemma 5.10 Let V be an inner product space. Then for any u, v ∈ V, |⟨u, v⟩| = ‖u‖‖v‖ if and only if u, v are linearly dependent.
Proof If any one of u, v is zero, then the result follows trivially. Hence, we assume that neither u nor v is zero. If u = αv for some scalar α, then |⟨u, v⟩| = |α⟨v, v⟩| = |α|‖v‖² = ‖u‖‖v‖. Conversely, if |⟨u, v⟩| = ‖u‖‖v‖, then for the scalar λ = ⟨u, v⟩/⟨v, v⟩ one verifies that ‖u − λv‖ = 0, so that u = λv, and u, v are linearly dependent.
Theorem 5.11 Every inner product space is a normed space together with the norm ‖u‖ = √⟨u, u⟩.
Proof Clearly ‖u‖ ≥ 0, and ‖u‖ = 0 ⇔ ⟨u, u⟩ = 0 ⇔ u = 0. Moreover,

‖αu‖² = ⟨αu, αu⟩ = α\overline{α}⟨u, u⟩ = (|α|‖u‖)².

For the triangle inequality,

‖u + v‖² = ⟨u + v, u + v⟩
= ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
= ⟨u, u⟩ + ⟨u, v⟩ + \overline{⟨u, v⟩} + ⟨v, v⟩
= ‖u‖² + 2Re⟨u, v⟩ + ‖v‖²
≤ ‖u‖² + 2|⟨u, v⟩| + ‖v‖² (since Re α ≤ |α| for α ∈ F)
≤ ‖u‖² + 2‖u‖‖v‖ + ‖v‖² (by the Cauchy-Schwarz inequality)
= (‖u‖ + ‖v‖)².
Theorem 5.13 Let V be an inner product space over R. Then the function d : V × V → R defined by d(u, v) = ‖u − v‖, u, v ∈ V, is a metric on V.
Proof Since ‖u − v‖ ≥ 0 with equality if and only if u = v, and d(u, v) = ‖u − v‖ = ‖v − u‖ = d(v, u), it remains to verify the triangle inequality. For any u, v, w ∈ V,

d(u, v) = ‖u − v‖ = ‖(u − w) + (w − v)‖ ≤ ‖u − w‖ + ‖w − v‖.

This shows that d(u, v) ≤ d(u, w) + d(w, v) for all u, v, w ∈ V, and hence d is a metric on V.
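A quick numerical sanity check of Theorems 5.11 and 5.13 is given below (a Python sketch of our own, under randomly sampled vectors; it is not a proof, only an illustration that the induced norm behaves as claimed).

```python
# Check the triangle inequality for the norm induced by the standard
# inner product on R^4, so that d(u, v) = ||u - v|| is a metric.
import numpy as np

rng = np.random.default_rng(1)
norm = lambda u: np.sqrt(u @ u)
for _ in range(1000):
    u, v = rng.standard_normal(4), rng.standard_normal(4)
    assert norm(u + v) <= norm(u) + norm(v) + 1e-12   # small tolerance for rounding
print("triangle inequality holds on all samples")
```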
Exercises
1. Consider f(x) = 4x³ − 6x + 5 and g(x) = −2x² + 7 in the polynomial space P(x) with inner product ⟨f, g⟩ = ∫_0^1 f(x)g(x)dx. Find ‖f‖ and ‖g‖.
2. Let V be a real inner product space. Show that
(a) ‖u‖ = ‖v‖ if and only if ⟨u + v, u − v⟩ = 0;
(b) ‖u + v‖² = ‖u‖² + ‖v‖² if and only if ⟨u, v⟩ = 0.
Show by counterexamples that the above statements are not true for C².
3. Let Rⁿ and Cⁿ be the vector spaces over R and C, respectively. Prove that
(a) ‖(a1, a2, . . . , an)‖∞ = max(|ai|),
(b) ‖(a1, a2, . . . , an)‖₁ = |a1| + |a2| + · · · + |an|,
(c) ‖(a1, a2, . . . , an)‖₂ = √(|a1|² + |a2|² + · · · + |an|²)
are norms on Rⁿ and Cⁿ. These are known as the infinity-norm, one-norm and two-norm, respectively.
4. Solve the above Problem 3 for u = (1 + i, −2i, 1 − 6i) and v = (1 − i, 2 + 3i, −3i) in C³.
5. Consider the vectors u = (1, −2, −4, 3, −6) and v = (3, −2, 1, −4, −1) in R⁵. Find
(a) ‖u‖∞ and ‖v‖∞,
(b) ‖u‖₁ and ‖v‖₁,
(c) ‖u‖₂ and ‖v‖₂,
(d) d∞(u, v), d₁(u, v), d₂(u, v),
where the norms ‖·‖∞, ‖·‖₁ and ‖·‖₂ are the infinity-norm, one-norm and two-norm, respectively, on R⁵ and d∞, d₁, d₂ are the metric functions induced by these norms, respectively.
6. Let C[a, b] be the vector space of real continuous functions on [a, b] over R. Prove that (i) ‖f‖₁ = ∫_a^b |f(t)|dt, (ii) ‖f‖₂ = √( ∫_a^b [f(t)]²dt ), (iii) ‖f‖∞ = max(|f(t)|) are norms on C[a, b]. Further, consider the functions f(t) = 2t² − 6t and g(t) = t³ + 6t² in C[1, 3] and hence find
(a) d₁(f, g),
(b) d₂(f, g),
(c) d∞(f, g),
where d₁, d₂, d∞ are the metric functions induced by the above norms, respectively.
7. Find the norms and metrics induced by the inner products defined in Problems 1-8 and 10 of the preceding section.
8. Prove the Apollonius identity:

‖w − u‖² + ‖w − v‖² = (1/2)‖u − v‖² + 2‖w − (1/2)(u + v)‖².
9. Let u = (r1, r2, . . . , rn) and v = (s1, s2, . . . , sn) be in Rⁿ. The Cauchy-Schwarz inequality states that
|r1s1 + r2s2 + · · · + rnsn|² ≤ (r1² + r2² + · · · + rn²)(s1² + s2² + · · · + sn²).
Prove that
(|r1s1| + |r2s2| + · · · + |rnsn|)² ≤ (r1² + r2² + · · · + rn²)(s1² + s2² + · · · + sn²).
Proof (i) Suppose there exist scalars α1, α2, . . . , αn such that Σ_{i=1}^{n} αi ui = 0. Then for each 1 ≤ k ≤ n,

0 = ⟨0, uk⟩ = ⟨ Σ_{i=1}^{n} αi ui, uk ⟩ = Σ_{i=1}^{n} αi ⟨ui, uk⟩ = αk ⟨uk, uk⟩ = αk ‖uk‖².

But since ‖uk‖² ≠ 0, we find that αk = 0 for each 1 ≤ k ≤ n, and hence S is linearly independent.
(ii) Since v ∈ L(S), there exist scalars α1, α2, . . . , αn such that v = Σ_{i=1}^{n} αi ui. Thus for each 1 ≤ k ≤ n, ⟨v, uk⟩ = ⟨ Σ_{i=1}^{n} αi ui, uk ⟩ = Σ_{i=1}^{n} αi ⟨ui, uk⟩. But since S is orthogonal, we find that ⟨v, uk⟩ = αk ⟨uk, uk⟩. Now since uk ≠ 0, ‖uk‖² ≠ 0, the latter relation yields that αk = ⟨v, uk⟩/‖uk‖², 1 ≤ k ≤ n, and hence

v = Σ_{k=1}^{n} (⟨v, uk⟩/‖uk‖²) uk.
(iii) For any ui ∈ S, 1 ≤ i ≤ n,

‖ Σ_{i=1}^{n} ui ‖² = ⟨ Σ_{i=1}^{n} ui, Σ_{j=1}^{n} uj ⟩ = Σ_{1≤i,j≤n} ⟨ui, uj⟩ = Σ_{i=1}^{n} ⟨ui, ui⟩ = Σ_{i=1}^{n} ‖ui‖².
Remark 5.17 Any two vectors u, v in a Euclidean space are orthogonal if and only if ‖u + v‖² = ‖u‖² + ‖v‖². This result is not true in a unitary space. In fact, for any u = (u1, u2), v = (v1, v2) ∈ C², ⟨u, v⟩ = u1\overline{v1} + u2\overline{v2} defines an inner product on C². If we consider u = (0, i), v = (0, 1) ∈ C², then ⟨u, v⟩ = 0·\overline{0} + i·\overline{1} = i ≠ 0, and hence u and v are not orthogonal. But

‖u + v‖² = ‖(0, i) + (0, 1)‖² = ‖(0, 1 + i)‖² = ⟨(0, 1 + i), (0, 1 + i)⟩ = 0·\overline{0} + (1 + i)(\overline{1 + i}) = (1 + i)(1 − i) = 2,

while ‖u‖² = ⟨(0, i), (0, i)⟩ = 1 and ‖v‖² = ⟨(0, 1), (0, 1)⟩ = 1 yield that ‖u + v‖² = ‖u‖² + ‖v‖².
u_k = v_k − Σ_{j=1}^{k−1} (⟨v_k, u_j⟩/‖u_j‖²) u_j, for 2 ≤ k ≤ n.

In particular,

u_n = v_n − Σ_{j=1}^{n−1} (⟨v_n, u_j⟩/‖u_j‖²) u_j,
and if u_n = 0, we find that v_n = Σ_{j=1}^{n−1} (⟨v_n, u_j⟩/‖u_j‖²) u_j, which lies in the subspace spanned by {v1, v2, . . . , v_{n−1}}. This shows that {v1, v2, . . . , vn} is linearly dependent, a contradiction to our assumption. Hence u_n ≠ 0. Further, for 1 ≤ i ≤ n − 1,

⟨u_n, u_i⟩ = ⟨v_n, u_i⟩ − Σ_{j=1}^{n−1} (⟨v_n, u_j⟩/‖u_j‖²) ⟨u_j, u_i⟩ = ⟨v_n, u_i⟩ − (⟨v_n, u_i⟩/‖u_i‖²) ⟨u_i, u_i⟩ = 0.

This shows that the set {u1, u2, . . . , un} is orthogonal.
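The formulas above translate directly into a short program. The following Python sketch (our transcription of the Gram-Schmidt recursion stated above; the function name and the sample vectors are our own choices) orthogonalizes a list of linearly independent vectors.

```python
# Gram-Schmidt: u_1 = v_1, u_k = v_k - sum_{j<k} (<v_k, u_j>/||u_j||^2) u_j.
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal list spanning the same subspace of C^n."""
    us = []
    for v in vectors:
        u = v.astype(complex)
        for w in us:
            # np.vdot(w, v) conjugates its first argument, i.e., it equals <v, w>
            u = u - (np.vdot(w, v) / np.vdot(w, w)) * w
        us.append(u)
    return us

vs = [np.array([1, 1, 0]), np.array([1, 0, 1]), np.array([0, 1, 1])]
us = gram_schmidt(vs)
assert all(abs(np.vdot(us[i], us[j])) < 1e-12 for i in range(3) for j in range(i))
print(np.round(np.array(us).real, 3))
```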
Let V be an inner product space and let u, v ∈ V with ‖u‖ = 1. Then the projection Pu(v) of v along u is defined as Pu(v) = ⟨u, v⟩u. If u ∈ V is any nonzero vector, then

Pu(v) = ⟨u/‖u‖, v⟩ u/‖u‖ = (⟨u, v⟩/⟨u, u⟩) u.
Remark 5.22 We observe that for any u, v in a real inner product space,

⟨v − Pu(v), u⟩ = ⟨v, u⟩ − (⟨u, v⟩/⟨u, u⟩)⟨u, u⟩ = ⟨v, u⟩ − ⟨u, v⟩ = ⟨v, u⟩ − ⟨v, u⟩ = 0,

since ⟨u, v⟩ = ⟨v, u⟩ ∈ R.
Lemma 5.23 Let V be an inner product space defined over R. For a unit vector u and any vector v ∈ V, let Pu(v) = ⟨u, v⟩u. Then d(Pu(v), v) ≤ d(αu, v) for any α ∈ R.
Proof Since v − Pu(v) is orthogonal to u (Remark 5.22), and Pu(v) − αu is a scalar multiple of u, we have

‖v − αu‖² = ‖v − Pu(v) + Pu(v) − αu‖² = ‖v − Pu(v)‖² + ‖Pu(v) − αu‖².

This yields that d(v, αu)² ≥ d(v, Pu(v))², and hence we find that d(Pu(v), v) ≤ d(αu, v) for any α ∈ R.
Definition 5.24 Let S be a set of vectors in an inner product space V. Then S is said to be an orthonormal set if any two distinct vectors in S are orthogonal and every vector in S has unit length, i.e., ⟨u, v⟩ = 0 for all u ≠ v in S and ‖u‖ = 1 for all u ∈ S.
If {v1, v2, . . . , vn} is an orthonormal basis of V and v = Σ_{i=1}^{n} αi vi, then

‖v‖² = ⟨v, v⟩ = ⟨ Σ_{i=1}^{n} αi vi, Σ_{j=1}^{n} αj vj ⟩ = Σ_{i=1}^{n} Σ_{j=1}^{n} αi \overline{αj} ⟨vi, vj⟩ = Σ_{i=1}^{n} αi \overline{αi} = Σ_{i=1}^{n} |⟨v, vi⟩|².
Definition 5.27 Let S be any subset of an inner product space V. The orthogonal complement of S, denoted S⊥, consists of those vectors in V which are orthogonal to every vector of S, i.e., S⊥ = {v ∈ V | ⟨v, u⟩ = 0 for all u ∈ S}.
Remark 5.28 (i) For any given vector u ∈ V, u⊥ consists of all vectors in V which are orthogonal to u, i.e., u⊥ = {v ∈ V | ⟨v, u⟩ = 0}. Also, for any subset S of V, (S⊥)⊥ = {v ∈ V | ⟨v, w⟩ = 0 for all w ∈ S⊥} will be denoted by S⊥⊥.
(ii) Since for any 0 ≠ v ∈ V, ⟨v, v⟩ ≠ 0, we find that v ∉ V⊥, and hence V⊥ = {0}. Obviously {0}⊥ = V.
(iii) Since the zero vector in V is orthogonal to every vector in V, clearly 0 ∈ S⊥ and hence S⊥ ≠ ∅. For any u, v ∈ S⊥, α, β ∈ F and any vector w ∈ S, ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩ = 0, so that S⊥ is a subspace of V.
(ii) Since W1 and W2 both are subsets of the subspace W1 + W2 , we find that (W1 +
W2 )⊥ ⊆ W1⊥ and (W1 + W2 )⊥ ⊆ W2⊥ . This yields that (W1 + W2 )⊥ ⊆ W1⊥ ∩ W2⊥ .
Now conversely, assume that u ∈ W1⊥ ∩ W2⊥ . Then u ∈ W1⊥ and u ∈ W2⊥ . There-
fore u is orthogonal to every vector in W1 and also to every vector in W2 . Let
w be any vector in W1 + W2 , then w = w1 + w2 , where w1 ∈ W1 , w2 ∈ W2 . It
can be easily seen that u, w = u, w1 + w2 = u, w1 + u, w2 = 0. Therefore,
u ∈ (W1 + W2 )⊥ and hence W1⊥ ∩ W2⊥ ⊆ (W1 + W2 )⊥ . Combining this with the
above fact we find that (W1 + W2 )⊥ = W1⊥ ∩ W2⊥ .
(iii) Since W1⊥ and W2⊥ are also subspaces of V, taking W1⊥ in place of W1 and W2⊥ in place of W2 in (ii), we find that (W1⊥ + W2⊥)⊥ = W1⊥⊥ ∩ W2⊥⊥ = W1 ∩ W2. This implies that (W1⊥ + W2⊥)⊥⊥ = (W1 ∩ W2)⊥, and hence (W1 ∩ W2)⊥ = W1⊥ + W2⊥.
(iii) =⇒ (i) If (iii) holds then W ⊆ U ⊥ , which implies that U ⊥ W and (i) holds.
Theorem 5.33 (Riesz Representation Theorem) Let V be a finite dimensional inner
product space over F and ϕ : V → F be a linear functional on V . Then there exists
a unique vector u ∈ V such that ϕ(v) = v, u for every v ∈ V .
Proof First we show there exists a vector u ∈ V such that ϕ(v) = v, u for every
v ∈ V . Let {v1 , v2 , . . . , vn } be an orthonormal basis of V . Then by Theorem 5.16(ii),
v = Σ_{i=1}^{n} ⟨v, vᵢ⟩vᵢ, and hence

ϕ(v) = ϕ(⟨v, v₁⟩v₁) + ϕ(⟨v, v₂⟩v₂) + ⋯ + ϕ(⟨v, vₙ⟩vₙ)
     = ⟨v, v₁⟩ϕ(v₁) + ⟨v, v₂⟩ϕ(v₂) + ⋯ + ⟨v, vₙ⟩ϕ(vₙ)
     = ⟨v, \overline{ϕ(v₁)}v₁⟩ + ⟨v, \overline{ϕ(v₂)}v₂⟩ + ⋯ + ⟨v, \overline{ϕ(vₙ)}vₙ⟩
     = ⟨v, \overline{ϕ(v₁)}v₁ + \overline{ϕ(v₂)}v₂ + ⋯ + \overline{ϕ(vₙ)}vₙ⟩

for every v ∈ V (the bars denote complex conjugation and disappear when F = R). Now setting
u = \overline{ϕ(v₁)}v₁ + \overline{ϕ(v₂)}v₂ + ⋯ + \overline{ϕ(vₙ)}vₙ, we arrive at
ϕ(v) = ⟨v, u⟩ for every v ∈ V . Now in order to show that there exists a unique
vector u ∈ V with the desired behavior, suppose that there exist u₁, u₂ ∈ V such that
ϕ(v) = ⟨v, u₁⟩ = ⟨v, u₂⟩ for every v ∈ V . Then we find that ⟨v, u₁ − u₂⟩ = 0 for every
v ∈ V; in particular, taking v = u₁ − u₂ gives ⟨u₁ − u₂, u₁ − u₂⟩ = 0, and hence u₁ = u₂.
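Computationally, the proof is a recipe: read off the Riesz vector from any orthonormal basis. Below is a minimal numpy sketch of ours, using the standard basis of C³ and an arbitrary functional chosen purely for illustration.

```python
import numpy as np

n = 3
phi = lambda v: 2 * v[0] + 1j * v[1] - v[2]    # an arbitrary linear functional

e = np.eye(n, dtype=complex)                   # standard (orthonormal) basis
u = np.conj(np.array([phi(e[i]) for i in range(n)]))   # u = sum conj(phi(v_i)) v_i

v = np.array([1.0, 2.0 - 1j, 3j])
print(phi(v))                                  # phi(v)
print(np.sum(v * np.conj(u)))                  # <v, u>: the same value
```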
Definition 5.34 Let V and W be any two inner product spaces over the same field F.
Then V is said to be isomorphic to W if there exists a vector space isomorphism ψ : V →
W such that for any v₁, v₂ ∈ V, ⟨v₁, v₂⟩ = ⟨ψ(v₁), ψ(v₂)⟩. Such an isomorphism is
called an inner product space isomorphism.
Theorem 5.35 Any inner product space V over the field F of dimension n is iso-
morphic to Rn or Cn according as F = R or C.
ψ(α₁, α₂, ..., αₙ) = Σ_{i=1}^{n} αᵢvᵢ  for every (α₁, α₂, ..., αₙ) ∈ Fⁿ.

Also

⟨ψ(α), ψ(β)⟩ = ⟨Σ_{i=1}^{n} αᵢvᵢ, Σ_{j=1}^{n} βⱼvⱼ⟩
            = Σ_{i=1}^{n} Σ_{j=1}^{n} αᵢβ̄ⱼ ⟨vᵢ, vⱼ⟩
            = Σ_{i=1}^{n} αᵢβ̄ᵢ
            = ⟨α, β⟩.
Proof (i) It is clear that S ⊆ L(S), and hence (L(S))⊥ ⊆ S⊥. Now suppose that
v ∈ S⊥. Then v is orthogonal to each vector in S. Let w ∈ L(S); then w is a linear
combination of a finite number of vectors in S, i.e., w = Σ_{i=1}^{n} αᵢvᵢ, where vᵢ ∈ S. Thus

⟨v, w⟩ = ⟨v, Σ_{i=1}^{n} αᵢvᵢ⟩ = Σ_{i=1}^{n} ᾱᵢ ⟨v, vᵢ⟩ = 0  (since v is orthogonal to each vector in S).

Therefore v is orthogonal to every vector in L(S), i.e., v ∈ (L(S))⊥. This shows that
S⊥ ⊆ (L(S))⊥ and hence S⊥ = (L(S))⊥.
(iii) From (i), we have S ⊥ = (L(S))⊥ , which implies that S ⊥⊥ = (L(S))⊥⊥ . But
since L(S) is a subspace of V , by Theorem 5.29, we find that (L(S))⊥⊥ = L(S).
This shows that S ⊥⊥ = L(S).
Σ_{i=1}^{n} |⟨v, vᵢ⟩|² ≤ ‖v‖².

Expanding 0 ≤ ‖v − Σ_{i=1}^{n} ⟨v, vᵢ⟩vᵢ‖², we obtain

‖v‖² − Σ_{i=1}^{n} |⟨v, vᵢ⟩|² − Σ_{i=1}^{n} |⟨v, vᵢ⟩|² + Σ_{i=1}^{n} |⟨v, vᵢ⟩|² ≥ 0,

which implies that Σ_{i=1}^{n} |⟨v, vᵢ⟩|² ≤ ‖v‖².
Then ‖w‖ ≠ 0, and for v_{n+1} = w/‖w‖ the enlarged set is still orthonormal, as we now verify. First, for each 1 ≤ j ≤ n,

⟨v − Σ_{i=1}^{n} ⟨v, vᵢ⟩vᵢ, vⱼ⟩ = ⟨v, vⱼ⟩ − Σ_{i=1}^{n} ⟨v, vᵢ⟩⟨vᵢ, vⱼ⟩
                            = ⟨v, vⱼ⟩ − ⟨v, vⱼ⟩⟨vⱼ, vⱼ⟩
                            = ⟨v, vⱼ⟩ − ⟨v, vⱼ⟩
                            = 0.

This shows that v − Σ_{i=1}^{n} ⟨v, vᵢ⟩vᵢ is orthogonal to each of v₁, v₂, ..., vₙ. Now assume
that u ∉ L(S); then ‖w‖ ≠ 0, where w = u − Σ_{i=1}^{n} ⟨u, vᵢ⟩vᵢ. Indeed, if ‖w‖ = 0 then
w = 0, and hence the above relation yields that u = Σ_{i=1}^{n} ⟨u, vᵢ⟩vᵢ, i.e., u is a linear
combination of v₁, v₂, ..., vₙ, i.e., u ∈ L(S), a contradiction; hence ‖w‖ ≠ 0.
Further, we see that {v₁, v₂, ..., vₙ, v_{n+1}} is also an orthonormal subset of V . For
each 1 ≤ i ≤ n,

⟨v_{n+1}, vᵢ⟩ = ⟨w/‖w‖, vᵢ⟩ = (1/‖w‖) ⟨u − Σ_{j=1}^{n} ⟨u, vⱼ⟩vⱼ, vᵢ⟩ = (1/‖w‖) {⟨u, vᵢ⟩ − ⟨u, vᵢ⟩} = 0.

Moreover,

⟨v_{n+1}, v_{n+1}⟩ = ⟨w/‖w‖, w/‖w‖⟩ = (1/‖w‖²) ⟨w, w⟩ = 1,

which completes the proof.
If we consider the standard basis B = {e₁, e₂, ..., eₙ} of Rⁿ (or Cⁿ), it can be easily
observed that B is an orthonormal basis, i.e., the vectors in B are pairwise
orthogonal and each vector in B has unit length. For n ≥ 2 this vector space has
infinitely many bases whose members are pairwise orthogonal and of length 1. We
shall now discuss a method for obtaining an orthonormal basis for a finite dimensional
inner product space.
u_k = v_k − Σ_{j=1}^{k−1} (⟨v_k, u_j⟩ / ‖u_j‖²) u_j,  for k ≥ 2,
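The recursion above translates directly into code. The following numpy sketch (ours, not from the text) orthonormalizes a list of linearly independent vectors; note that np.vdot(w, v) computes ⟨v, w⟩ in the convention used here, since it conjugates its first argument.

```python
import numpy as np

def gram_schmidt(vectors):
    """u_k = v_k - sum_{j<k} <v_k, u_j>/||u_j||^2 u_j, then normalize each u_k."""
    us = []
    for v in vectors:
        u = v.astype(complex)
        for w in us:
            u = u - (np.vdot(w, v) / np.vdot(w, w)) * w   # subtract projection on w
        us.append(u)
    return [u / np.linalg.norm(u) for u in us]

ws = gram_schmidt([np.array([1.0, 1j, 0.0]), np.array([1.0, 2.0, 1j])])
print(np.round([[np.vdot(a, b) for b in ws] for a in ws], 10))  # identity: orthonormal
```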
w₂ = u₂/‖u₂‖ = ((1 + 2i)/2, (2 − i)/2, (2 − 2i)/2) / (√18/2) = ((1 + 2i)/√18, (2 − i)/√18, (2 − 2i)/√18).

Thus the required orthonormal basis of W is given by

{w₁, w₂} = {(1/√2, i/√2, 0), ((1 + 2i)/√18, (2 − i)/√18, (2 − 2i)/√18)}.
(2) Let R[x] be the vector space of all real polynomials, with inner product on R[x]
given by

⟨p(x), q(x)⟩ = ∫_{−1}^{1} p(x)q(x) dx.
w₂ = u₂/‖u₂‖ = x / √(∫_{−1}^{1} x² dx) = √(3/2) x,

w₃ = u₃/‖u₃‖ = (x³ − (3/5)x) / √(∫_{−1}^{1} (x³ − (3/5)x)² dx) = (5√7 / 2√2) (x³ − (3/5)x),

and so on. Hence the required orthonormal sequence in R[x] is given by

(1/√2, √(3/2) x, (√10/4)(3x² − 1), (5√7 / 2√2)(x³ − (3/5)x), ...).
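The same recursion can also be run symbolically, which is a convenient way to check the coefficients above; here is a small sympy sketch of ours applied to the monomials 1, x, x², x³:

```python
import sympy as sp

x = sp.symbols('x')
ip = lambda p, q: sp.integrate(p * q, (x, -1, 1))   # <p, q> on [-1, 1]

ortho = []
for v in [sp.Integer(1), x, x**2, x**3]:
    u = v
    for w in ortho:
        u = u - ip(v, w) / ip(w, w) * w             # Gram-Schmidt step
    ortho.append(sp.expand(u))

print([sp.simplify(u / sp.sqrt(ip(u, u))) for u in ortho])
# [sqrt(2)/2, sqrt(6)*x/2, ...]: the orthonormal sequence above
```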
Lemma 5.40 Let {v₁, v₂, ..., vₙ} be a set of nonzero pairwise orthogonal vectors
in an inner product space V . Then the vectors u₁, u₂, ..., uₙ with u₁ = v₁ and

u_k = v_k − Σ_{j=1}^{k−1} (⟨v_k, u_j⟩ / ‖u_j‖²) u_j,  for 2 ≤ k ≤ n,

are such that vᵢ = uᵢ, 1 ≤ i ≤ n.
Proof Given that u₁ = v₁. The result can be proved easily by using induction.
Assume that for some k, 1 ≤ k < n, we have uᵢ = vᵢ for 1 ≤ i ≤ k. Then for
1 ≤ j ≤ k, ⟨v_{k+1}, u_j⟩ = ⟨v_{k+1}, v_j⟩ = 0 and hence

u_{k+1} = v_{k+1} − Σ_{j=1}^{k} (⟨v_{k+1}, u_j⟩ / ‖u_j‖²) u_j = v_{k+1},
‖vⱼ‖ = 1, wⱼ = vⱼ for 1 ≤ j ≤ k, and by using the Gram-Schmidt process we find
that B = {w₁, w₂, ..., wₙ} is an orthonormal basis of V .
Exercises
1. Let B = {e1 , e2 , . . . , en } be an orthonormal basis of V . Prove that
(a) For any v ∈ V , we have v = ⟨v, e₁⟩e₁ + ⟨v, e₂⟩e₂ + ⋯ + ⟨v, eₙ⟩eₙ,
(b) ⟨a₁e₁ + a₂e₂ + ⋯ + aₙeₙ, b₁e₁ + b₂e₂ + ⋯ + bₙeₙ⟩ = a₁b₁ + a₂b₂ + ⋯ + aₙbₙ,
(c) For any u, v ∈ V , we have ⟨u, v⟩ = ⟨u, e₁⟩⟨v, e₁⟩ + ⋯ + ⟨u, eₙ⟩⟨v, eₙ⟩.
2. Let B = {e₁, e₂, ..., eₙ} be an orthogonal basis of V . Then prove that for any
v ∈ V ,

v = (⟨v, e₁⟩/⟨e₁, e₁⟩) e₁ + (⟨v, e₂⟩/⟨e₂, e₂⟩) e₂ + ⋯ + (⟨v, eₙ⟩/⟨eₙ, eₙ⟩) eₙ.
3. Consider the subspace U of R⁴ spanned by the vectors v₁ = (1, 1, 1, 1), v₂ =
(1, 1, 2, 4), v₃ = (1, 2, −4, −3). Using the Gram-Schmidt algorithm, find
(a) an orthogonal basis of U ,
(b) an orthonormal basis of U .
4. Let R₃[x] be the inner product space of all polynomials of degree at most 3, under
the inner product

⟨p(x), q(x)⟩ = ∫_{−∞}^{∞} p(x)q(x) e^{−x²} dx.

the subspace of odd functions, i.e., functions satisfying f(−t) = −f(t). Find the
orthogonal complement of W .
26. Suppose V = W₁ ⊕ W₂ and that f₁ and f₂ are inner products on W₁ and W₂,
respectively. Show that there is a unique inner product f on V such that
(a) W₂ = W₁⊥,
(b) f(u, v) = fₖ(u, v) when u, v ∈ Wₖ, k = 1, 2.
Definition 5.42 Let U and V be inner product spaces over the field F and T :
U → V be a linear transformation. The adjoint of T , denoted as T ∗ , is the function
T* : V → U such that ⟨T(u), v⟩ = ⟨u, T*(v)⟩, for every u ∈ U and v ∈ V .
Theorem 5.45 If U and V are inner product spaces over the same field F and
T : U → V is a linear transformation, then T ∗ is a linear transformation from V to
U.
Hence the above yields that T ∗ (αv1 + βv2 ) = αT ∗ (v1 ) + βT ∗ (v2 ) for all v1 , v2 ∈ V
and α, β ∈ F.
Theorem 5.46 Let U, V, W be inner product spaces over the same field F. Then
(i) (S + T)* = S* + T* for all S, T ∈ Hom(U, V),
(ii) (αT)* = ᾱT* for all α ∈ F, T ∈ Hom(U, V),
(iii) (ST)* = T*S* for all T ∈ Hom(U, V), S ∈ Hom(V, W),
(iv) (T*)* = T for all T ∈ Hom(U, V),
(v) I* = I (resp. O* = O), where I (resp. O) is the identity (resp. zero) linear
transformation on U,
(vi) if T ∈ Hom(U, V) and T is invertible, then (T*)⁻¹ = (T⁻¹)*.
⟨u, (αT)*(v)⟩ = ⟨(αT)(u), v⟩ = α⟨T(u), v⟩ = α⟨u, T*(v)⟩ = ⟨u, ᾱT*(v)⟩.
⟨u, (ST)*(w)⟩ = ⟨(ST)(u), w⟩ = ⟨S(T(u)), w⟩ = ⟨T(u), S*(w)⟩ = ⟨u, T*(S*(w))⟩.
(v) For any u, v ∈ U we have ⟨u, v⟩ = ⟨I(u), v⟩ = ⟨u, I*(v)⟩. This shows that
I* = I . Similarly one can prove that O* = O.
Remark 5.47 (i) One can find the relationship between N (T ), the null space and
R(T ), the range of a linear map T and its adjoint T ∗ . If T ∈ H om(U, V ), then
for any v ∈ V , v ∈ N(T*) if and only if T*(v) = 0, and T*(v) = 0 if and only
if ⟨u, T*(v)⟩ = 0 for all u ∈ U, i.e., ⟨T(u), v⟩ = 0 for all u ∈ U. This shows that
v ∈ N (T ∗ ) if and only if v ∈ (R(T ))⊥ and hence N (T ∗ ) = (R(T ))⊥ . By taking
the orthogonal complement both the sides one can find that R(T ) = N (T ∗ )⊥ ,
and hence replacing T by T ∗ we find that R(T ∗ ) = (N (T ))⊥ .
(ii) If T is a linear transformation from U to V , then the following theorem shows
that it is easy to find T ∗ if we can find the relation between m(T ) and m(T ∗ ).
Theorem 5.48 Let U and V be nonzero finite dimensional inner product spaces
over the same field F, and let B₁ = {u₁, u₂, ..., uₘ} and B₂ = {v₁, v₂, ..., vₙ} be
ordered orthonormal bases of U and V , respectively. If T is a linear transformation
from U into V , and m(T) = (αⱼᵢ) is the matrix of T of order n × m relative to the
bases B₁ and B₂, then the matrix of T* relative to these bases is the conjugate
transpose of m(T), of order m × n.

Proof Given that U and V are finite dimensional inner product spaces with orthonor-
mal bases B₁ = {u₁, u₂, ..., uₘ} and B₂ = {v₁, v₂, ..., vₙ}, respectively. Suppose
that m(T*) = (βᵢⱼ) represents the matrix of T* relative to these bases. Then for
1 ≤ i ≤ m, 1 ≤ j ≤ n,

T(uᵢ) = Σ_{j=1}^{n} αⱼᵢvⱼ  and  T*(vⱼ) = Σ_{i=1}^{m} βᵢⱼuᵢ.

Similarly,

⟨T*(vⱼ), uᵢ⟩ = ⟨Σ_{k=1}^{m} βₖⱼuₖ, uᵢ⟩ = βᵢⱼ⟨uᵢ, uᵢ⟩ = βᵢⱼ.

On the other hand,

β̄ᵢⱼ = ⟨uᵢ, T*(vⱼ)⟩ = ⟨T(uᵢ), vⱼ⟩ = αⱼᵢ,

and hence m(T*) = (\overline{m(T)})ᵗ, i.e., m(T*) is the conjugate transpose of m(T).
This relation is very useful in determining T* if T is known.
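A quick numerical illustration of ours: with respect to standard bases, which are orthonormal, the defining identity ⟨T(u), v⟩ = ⟨u, T*(v)⟩ holds with T* represented by the conjugate transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))   # T : C^2 -> C^3
A_star = A.conj().T                                          # matrix of T*

u = rng.normal(size=2) + 1j * rng.normal(size=2)
v = rng.normal(size=3) + 1j * rng.normal(size=3)

lhs = np.vdot(v, A @ u)        # <T(u), v> = sum (Au)_k conj(v_k)
rhs = np.vdot(A_star @ v, u)   # <u, T*(v)>
print(np.isclose(lhs, rhs))    # True
```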
Thus ⎡ ⎤
5 2 + 3i 3 − 2i
m(T ∗ ) = ⎣ 2 − 3i 2 1 +i ⎦.
3 + 2i 1 − i 7
Recall that every inner product space is a normed space and every normed space is
a metric space. In fact, if V is an inner product space then V is a normed space under
the norm ‖v‖ = √⟨v, v⟩, and it has also been seen in the beginning of this chapter
that every normed space is a metric space with respect to the metric d : V × V → R
defined by d(u, v) = ‖u − v‖. In the case of an isomorphism, it is also known that the
linear map preserves the operations of the vector space. Now we define a linear
transformation on V which does not change the distance, or the length of a vector,
in an inner product space V .
This shows that d(z₁, z₂) = d(T(z₁), T(z₂)), i.e., T preserves the distance. However,
if we take S : R² → R² such that S(a, b) = (0, b), then it can be easily seen that
‖z‖ ≠ ‖S(z)‖ for some z ∈ R², so S is not an isometry.
‖u + v‖² = ‖u‖² + ‖v‖² + ⟨u, v⟩ + ⟨v, u⟩, we find that

⟨u, v⟩ = (‖u + v‖² − ‖u‖² − ‖v‖²) / 2

for all u, v ∈ V . Now since T ∈ A(V), the latter relation yields that

⟨T(u), T(v)⟩ = (‖T(u) + T(v)‖² − ‖T(u)‖² − ‖T(v)‖²) / 2
            = (‖T(u + v)‖² − ‖T(u)‖² − ‖T(v)‖²) / 2
            = (‖u + v‖² − ‖u‖² − ‖v‖²) / 2
            = ⟨u, v⟩

for all u, v ∈ V .
Case II. Suppose that V is a unitary space. In this case ⟨v, u⟩ = \overline{⟨u, v⟩} implies that

⟨u, v⟩ + \overline{⟨u, v⟩} = ‖u + v‖² − ‖u‖² − ‖v‖².

Also

‖u + iv‖² = ⟨u + iv, u + iv⟩
          = ⟨u, u⟩ − i⟨u, v⟩ + i⟨v, u⟩ − i²⟨v, v⟩
          = ‖u‖² − i⟨u, v⟩ + i\overline{⟨u, v⟩} + ‖v‖²,

so that

⟨u, v⟩ − \overline{⟨u, v⟩} = i{‖u + iv‖² − ‖u‖² − ‖v‖²}.

Adding the last two relations, we obtain

2⟨u, v⟩ = ‖u + v‖² + i‖u + iv‖² − (1 + i)(‖u‖² + ‖v‖²)  for all u, v ∈ V.
Replacing u by T(u), v by T(v), and using the fact that T is an isometry with
T(iu) = iT(u), we find that

⟨T(u), T(v)⟩ = ½{‖T(u) + T(v)‖² + i‖T(u) + iT(v)‖² − (1 + i)(‖T(u)‖² + ‖T(v)‖²)}
            = ½{‖T(u + v)‖² + i‖T(u + iv)‖² − (1 + i)(‖T(u)‖² + ‖T(v)‖²)}
            = ½{‖u + v‖² + i‖u + iv‖² − (1 + i)(‖u‖² + ‖v‖²)}
            = ⟨u, v⟩.
Conversely if ⟨T(u), T(v)⟩ = ⟨u, v⟩ for all u, v ∈ V , then in particular ⟨T(u), T(u)⟩
= ⟨u, u⟩ for all u ∈ V , that is, ‖T(u)‖² = ‖u‖². Hence ‖T(u)‖ = ‖u‖ and T is an
isometry.
One can also find another kind of linear transformation on an inner product space V
which preserves all the structure of V .

Definition 5.64 Let V be an inner product space over the field F. Then T ∈ A(V)
is said to be unitary if ⟨T(u), T(v)⟩ = ⟨u, v⟩ for all u, v ∈ V .
Since the above relation holds for all v ∈ V , replacing v by iv, where i² = −1,
we arrive at −i⟨T(u), T(v)⟩ + i⟨T(v), T(u)⟩ = −i⟨u, v⟩ + i⟨v, u⟩ for all u, v ∈
V. Now multiplying by i yields that ⟨T(u), T(v)⟩ − ⟨T(v), T(u)⟩ = ⟨u, v⟩ − ⟨v, u⟩.
Combining the above two relations we find that ⟨T(u), T(v)⟩ = ⟨u, v⟩ for all u, v ∈
V , and therefore T is unitary.
Example 5.66 Let V be a finite dimensional inner product space and λ1 , λ2 , . . . , λn
be scalars with absolute value 1. If T ∈ A (V ) satisfies T (v j ) = λ j v j for some
orthonormal basis {v1 , v2 , . . . , vn } of V , then it can be seen that T is an isometry.
Suppose that v ∈ V . Then there exist scalars α₁, α₂, ..., αₙ such that v = Σ_{i=1}^{n} αᵢvᵢ.
Since the given basis is orthonormal, ⟨v, vᵢ⟩ = αᵢ. Hence v = Σ_{i=1}^{n} ⟨v, vᵢ⟩vᵢ.
‖v‖² = ⟨Σ_{i=1}^{n} αᵢvᵢ, Σ_{i=1}^{n} αᵢvᵢ⟩
     = Σ_{i=1}^{n} αᵢᾱᵢ ⟨vᵢ, vᵢ⟩
     = Σ_{i=1}^{n} |αᵢ|²
     = Σ_{i=1}^{n} |⟨v, vᵢ⟩|².
Since v = Σ_{i=1}^{n} ⟨v, vᵢ⟩vᵢ, we find that T(v) = Σ_{i=1}^{n} ⟨v, vᵢ⟩T(vᵢ) = Σ_{i=1}^{n} ⟨v, vᵢ⟩λᵢvᵢ. Now

‖T(v)‖² = ⟨Σ_{i=1}^{n} ⟨v, vᵢ⟩λᵢvᵢ, Σ_{i=1}^{n} ⟨v, vᵢ⟩λᵢvᵢ⟩
        = Σ_{i=1}^{n} |⟨v, vᵢ⟩|² |λᵢ|²
        = Σ_{i=1}^{n} |⟨v, vᵢ⟩|².

Similarly,

⟨T(v), T(v)⟩ = ⟨Σ_{i=1}^{n} αᵢT(vᵢ), Σ_{j=1}^{n} αⱼT(vⱼ)⟩
            = Σ_{i=1}^{n} Σ_{j=1}^{n} αᵢᾱⱼ ⟨T(vᵢ), T(vⱼ)⟩
            = Σ_{i=1}^{n} |αᵢ|².
ϕᵥ(αu₁ + βu₂) = ⟨T(αu₁ + βu₂), v⟩
              = α⟨T(u₁), v⟩ + β⟨T(u₂), v⟩
              = αϕᵥ(u₁) + βϕᵥ(u₂).
But since ⟨T(u), αv₁ + βv₂⟩ = ⟨u, S(αv₁ + βv₂)⟩, the above relation yields that
S(αv₁ + βv₂) = αS(v₁) + βS(v₂) and S ∈ A(V). Now our claim is that S is
unique. Suppose there exists S′ ∈ A(V) such that ⟨T(u), v⟩ = ⟨u, S′(v)⟩. Then
⟨u, (S − S′)(v)⟩ = 0 for all u, v ∈ V . This shows that S − S′ = 0. It can also be seen
that S is one-to-one. If S(v) = 0, then ⟨T(u), v⟩ = ⟨u, S(v)⟩ = 0, for all u ∈ V .
Since T is an isometry, T⁻¹ exists, and hence ⟨T(T⁻¹(v)), v⟩ = 0, which yields that
⟨v, v⟩ = 0 or v = 0. Since V is finite dimensional, S is non-singular.
As ⟨T(u), v⟩ = ⟨u, S(v)⟩ for all u, v ∈ V , we find that ⟨T(u), T(v)⟩ = ⟨u, S(T(v))⟩
for all u, v ∈ V . But since T is an isometry, we have ⟨u, v⟩ = ⟨u, ST(v)⟩. This reduces
to ⟨u, (ST − I)(v)⟩ = 0, for all u, v ∈ V . Hence, in particular ⟨(ST − I)(v), (ST −
I)(v)⟩ = 0, and ST = I . Similarly, it can also be seen that TS = I and hence S is an
isometry. In fact, for any isometry T there exists S ∈ A(V) such that ST = TS = I .
Now for any v ∈ V , ‖TS(v)‖ = ‖T(S(v))‖ = ‖S(v)‖, i.e., ‖v‖ = ‖I(v)‖ = ‖S(v)‖.
This shows that T*(αv₁ + βv₂) = αT*(v₁) + βT*(v₂) and hence T* is a linear
operator on V . For uniqueness, let there exist another linear operator T′ on V such
that ⟨T(u), v⟩ = ⟨u, T′(v)⟩. This yields that ⟨u, T*(v)⟩ = ⟨u, T′(v)⟩, i.e., ⟨u, T*(v) −
T′(v)⟩ = 0, and hence in particular we find that T*(v) − T′(v) = 0 for all v ∈ V, i.e.,
T* = T′.
Exercises
1. If T : R3 → R2 is a linear transformation, then find the adjoint of T .
In this chapter, we study the structure of linear operators. In all that follows, V will be
a finite dimensional vector space and T : V → V a linear operator from V to itself.
We recall that the kernel N (T ) and the image R(T ) of T are both subspaces of V and,
in the light of the rank-nullity theorem, the following conditions are equivalent: (i)
T is bijective, (ii) N (T ) = {0}, (iii) R(T ) = V. A linear operator which satisfies
these equivalent conditions is said to be invertible, its inverse function is also an
operator and its matrix with respect to an arbitrary basis of V is an invertible matrix.
An operator that is not invertible is said to be a singular operator and its matrix with
respect to an arbitrary basis of V is a singular matrix.
We already have discussed that if V is a vector space of dimension n over the
field F and B is a basis for V , then we can associate a matrix A ∈ Mn (F) to T .
More precisely, the column coordinate vectors of A are the images of the elements of
the basis B. Thus any linear operator on V is represented by an appropriate matrix,
whose scalar entries depend precisely on the choice of the basis for V.
We remind the reader of the basic effect of the change of basis in V. If B and
B′ are two different bases for V and A and A′ are the matrices of T relative to the
bases B and B′, respectively, then A and A′ are similar to each other. In particular
A′ = P⁻¹AP, where P is the transition matrix. Hence A and A′ represent the same
operator T and A′ is obtained from A by conjugation. In other words, any matrix
which is similar to A represents the linear operator T.
The first question one might ask oneself is, how to determine a basis of V with
respect to which the linear operator is represented by a particularly simple matrix.
To answer this question, we first introduce the concepts of eigenvalue, eigenvector
and eigenspace of a linear operator T. They will be the main tools for analyzing
the structure of the matrix of T. Then we describe the most relevant classes of
linear operators, in accordance with the study of their representing matrices. More
precisely, we focus our attention on operators which are represented in terms of some
suitable bases for V, by triangular, diagonal or block diagonal matrices (usually called
canonical forms of a matrix). Then, we conclude this chapter by studying the class
of normal operators.
V₀ = {0 ≠ X ∈ V | (A − λ₀I)X = 0}.
with respect to the canonical basis of R³. The characteristic polynomial is the fol-
lowing determinant:

       | 1 − λ    2      2   |
p(λ) = |   1    3 − λ    1   | = (λ − 1)(−λ² + 4λ + 5).
       |   2      2    1 − λ |
There are three distinct roots of p(λ), that is, three distinct eigenvalues λ1 = −1,
λ2 = 5, λ3 = 1 of T. Let us find the corresponding eigenspaces.
For λ1 = −1, we have ⎡ ⎤
222
A + I = ⎣1 4 1⎦
222
which has rank equal to 2. Thus, the associated homogeneous linear system
has solutions X = (α, 0, −α), for any α ∈ R. Hence, the eigenspace corresponding
to λ1 has dimension 1 and is generated by the eigenvector (1, 0, −1).
For λ2 = 5, we have ⎡ ⎤
−4 2 2
A − 5I = ⎣ 1 −2 1 ⎦
2 2 −4
has solutions X = (α, α, α), for any α ∈ R. In this case, the eigenspace correspond-
ing to λ2 is generated by the eigenvector (1, 1, 1).
Finally, for λ₃ = 1,

        ⎡0 2 2⎤
A − I = ⎢1 2 1⎥
        ⎣2 2 0⎦
2x2 + 2x3 = 0
x1 + 2x2 + x3 = 0
2x1 + 2x2 = 0
has solutions X = (α, −α, α), for any α ∈ R. The eigenspace corresponding to λ3
is generated by the eigenvector (1, −1, 1).
Example 6.3 Let T : C³ → C³ be the linear operator with associated matrix
⎡ ⎤
0 −1 1
A = ⎣1 0 2⎦
0 0 2
with respect to the canonical basis of C3 . We repeat the same computations as above
and find that λ1 = 2, λ2 = i and λ3 = −i are the eigenvalues of T. For λ1 = 2, we
have ⎡ ⎤
−2 −1 1
A − 2I = ⎣ 1 −2 2 ⎦
0 0 0
which has rank equal to 2. Thus, the associated homogeneous linear system
−2x1 − x2 + x3 = 0
x1 − 2x2 + 2x3 = 0
has solutions X = (0, α, α), for any α ∈ C. Hence, the eigenspace corresponding to
λ₁ has dimension 1 and is generated by the eigenvector (0, 1, 1).
For λ2 = i, we have ⎡ ⎤
−i −1 1
A − i I = ⎣ 1 −i 2 ⎦
0 0 2−i
−i x1 − x2 + x3 = 0
x1 − i x2 + 2x3 = 0
(2 − i)x3 =0
has solutions X = (iα, α, 0), for any α ∈ C. In this case, the eigenspace correspond-
ing to λ₂ is generated by the eigenvector (i, 1, 0).
Finally, for λ3 = −i, ⎡ ⎤
i −1 1
A + iI = ⎣1 i 2 ⎦
0 0 2+i
has rank equal to 2 and the associated homogeneous linear system has solutions
X = (−iα, α, 0), for any α ∈ C. The eigenspace corresponding to λ₃ is generated
by the eigenvector (−i, 1, 0).
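In practice such computations are delegated to a numerical routine; e.g., with numpy (a sketch of ours, reproducing this example up to scaling and ordering of the eigenpairs):

```python
import numpy as np

A = np.array([[0., -1., 1.],
              [1.,  0., 2.],
              [0.,  0., 2.]])

evals, evecs = np.linalg.eig(A.astype(complex))
print(np.round(evals, 10))               # 2, i, -i (in some order)
for k in range(3):
    lam, x = evals[k], evecs[:, k]
    print(np.allclose(A @ x, lam * x))   # True: A x = lambda x
```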
Lemma 6.5 Let A ∈ Mn (F). If λ is an eigenvalue of A, then λm is an eigenvalue
of Am , for any m ≥ 2. Moreover, any eigenvector of A associated with λ is an
eigenvector of Am associated with λm .
Proof It follows directly from the definition of eigenvalue and associated eigenvec-
tor. In fact, multiplying on the left by A in the matrix equation AX = λX, we get
A2 X = λAX = λ2 X, that is A2 admits the eigenvalue λ2 , moreover X is an eigen-
vector associated with λ2 . In a similar way A3 X = λ3 X, and continuing this process
one has that Am X = λm X.
Lemma 6.6 Let A ∈ Mₙ(C). If λ₀ is an eigenvalue of A, then the conjugate element
λ̄₀ is an eigenvalue of the adjoint A*.

Proof It is known that, for any matrix B ∈ Mₙ(C), |B*| = \overline{|B|}. Thus

|A* − λ̄₀I| = |(A − λ₀I)*| = \overline{|A − λ₀I|} = 0.

Similarly, over any field,

|A − λI| = |(A − λI)ᵀ| = |Aᵀ − λI|.
Proof Let X ∈ V be an eigenvector of A associated with λ0 . One can easily see that
(A + α0 I )X = AX + α0 X = λ0 X + α0 X = (λ0 + α0 )X
as required.
Example 6.9 Let T : C3 → C3 be the linear operator as in Example 6.4 and intro-
duce the following operator G : C3 → C3 defined as G(X ) = T (X ) + 3X, for any
X ∈ C3 . Thus, the matrix of G with respect to the canonical basis of C3 is
⎡ ⎤
3 −1 1
A = ⎣1 3 2⎦.
0 0 5
Proof On the contrary we assume that there exists a nonzero α ∈ F, such that X 1 =
α X 2 . By the hypothesis, we have that AX 1 = λ1 X 1 and AX 2 = λ2 X 2 . The facts that
AX 1 = λ1 X 1 and X 1 = α X 2 imply that α AX 2 = λ1 α X 2 , that is, αλ2 X 2 = λ1 α X 2 .
Hence α(λ2 − λ1 )X 2 = 0, which is a contradiction.
Remark 6.11 The above result implies easily that if {λ1 , . . . , λt } are all the dis-
tinct eigenvalues of A and X 1 , X 2 , . . . , X t are nonzero eigenvectors corresponding
to λ1 , λ2 , . . . , λt , respectively, then {X 1 , . . . , X t } is linearly independent. Moreover,
if W1 , . . . , Wt are the eigenspaces associated with λ1 , . . . , λt , respectively, the sub-
space W = W1 + · · · + Wt is a direct sum and we write W = W1 ⊕ · · · ⊕ Wt .
then

         ⎡a₁₁ − λ   a₁₂     ...    a₁ₙ   ⎤
         ⎢         a₂₂ − λ  ...    a₂ₙ   ⎥
A − λI = ⎢                   ⋱           ⎥
         ⎣                        aₙₙ − λ⎦

(all entries below the main diagonal being zero)
Theorem 6.14 Let A, B ∈ Mn (F) be similar matrices, that is, there exists an invert-
ible matrix P ∈ Mn (F) such that B = P −1 A P. Then A and B have the same eigen-
values. Moreover, if Y is an eigenvector of B associated with the eigenvalue λ, then
X = PY is an eigenvector of A associated with λ.
Example 6.15 Let T : R² → R² be the linear operator having the following matrix,
with respect to the canonical basis B of R²,

A = ⎡ 1 1⎤
    ⎣−2 4⎦.

Let B′ = {(1, 0), (1, 1)} be a basis for R². The matrix of T with respect to B′ is
P⁻¹AP = A′, where

P = ⎡1 1⎤
    ⎣0 1⎦
Definition 6.16 Let A ∈ Mₙ(F) and p(λ) its characteristic polynomial. Assume that
F contains the splitting field of p(λ) and let S = {λ₁, ..., λₜ} be the set of all distinct
roots of p(λ). Hence, we get the following decomposition

p(λ) = (λ₁ − λ)^{a₁}(λ₂ − λ)^{a₂} ⋯ (λₜ − λ)^{aₜ},

where Σ_{i=1}^{t} aᵢ = n. For any eigenvalue λₖ ∈ S, we say that
(i) aₖ is its algebraic multiplicity, i.e., it is the number of times λₖ occurs as a root
of p(λ).
(ii) The dimension of the eigenspace associated with λk is the geometric multiplicity
of λk .
where 0_{n−g,g} ∈ M_{n−g,g}(F), A₁ ∈ M_{g,n−g}(F), A₂ ∈ M_{n−g}(F) and D ∈ M_g(F) is a
diagonal block defined as:

    ⎡λ₀           ⎤
    ⎢   λ₀        ⎥
D = ⎢       ⋱     ⎥.
    ⎣           λ₀⎦
Since A and A are similar matrices, they have the same eigenvalues. In particular,
A and A have the same characteristic polynomial. On the other hand, the charac-
teristic polynomial of A is p(λ) = (λ − λ0 )g q(λ), where q(λ) is the characteristic
polynomial of the matrix A2 . Therefore λ0 occurs, as a root of p(λ), at least g-times.
As a consequence, the algebraic multiplicity of λ0 is a ≥ g.
x2 = 0
−x1 − 2x2 + x3 = 0
x1 + 3x2 − x3 = 0
−3x4 = 0
3x1 + x2 =0
−x1 + x2 + x3 = 0
x1 + 3x2 + 2x3 = 0
has solutions X = (α, −3α, 4α, β), for any α, β ∈ R. In this case, the eigenspace
corresponding to λ2 is generated by the eigenvectors {(1, −3, 4, 0), (0, 0, 0, 1)} and
the geometric multiplicity of λ2 is equal to 2.
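The geometric multiplicity used in these computations is just a rank deficiency, which is easy to automate. A small numpy helper of ours, with an illustrative matrix of our own choosing:

```python
import numpy as np

def geometric_multiplicity(A, lam, tol=1e-10):
    """Dimension of the eigenspace of lam: n - rank(A - lam*I)."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)

# lam = 2 has algebraic multiplicity 2 but geometric multiplicity 1 here:
A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 3.]])
print(geometric_multiplicity(A, 2.0))   # 1
print(geometric_multiplicity(A, 3.0))   # 1
```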
Exercises
(a) α₀ = |A|,
(b) αₙ₋₁ = (−1)ⁿ⁻¹ tr(A),

where p(λ) = α₀ + α₁λ + ⋯ + αₙλⁿ denotes the characteristic polynomial of A ∈ Mₙ(F).
2. Let T : V → V be a nonsingular linear operator on an n-dimensional vector space
V, A the matrix of T with respect to a basis for V. Let λ be an eigenvalue of T
and X ∈ V an eigenvector of T corresponding to λ. Prove that λ = 0 (trivial)
and λ−1 is an eigenvalue of A−1 and X is an eigenvector of T −1 corresponding
to λ−1 .
3. Let T : V → V be a linear operator on an n-dimensional vector space V and let
A, B ∈ Mₙ(F) be two matrices of T with respect to two different bases for V (i.e., A
and B are similar). Let λ ∈ F be an eigenvalue of A and B. Prove that:
(a) The geometric multiplicity of λ as eigenvalue of A coincides with its geo-
metric multiplicity as eigenvalue of B.
(b) The trace, the determinant and the rank of A are, respectively, equal to the
trace, the determinant and the rank of B.
4. Let T : R3 → R3 be the linear operator with associated matrix
    ⎡0 1 0⎤
A = ⎢0 0 1⎥.
    ⎣9 9 1⎦
8. Let A be a 3 × 3 matrix with real entries such that det(A) = 6 and the trace
of A is 0. If det(A + I) = 0, where I denotes the 3 × 3 identity matrix, then
determine the eigenvalues of A.
9. Let J denote the 101 × 101 matrix with all the entries equal to 1 and let I denote
the identity matrix of order 101. Then find det(J − I) and trace(J − I).
10. Let A be a 4 × 4 matrix with real entries such that −1, 1, 2, −2 are its eigen-
values. If B = A⁴ − 5A² + 5I, then find:
(a) det (A + B)
(b) det (B)
(c) trace (A − B)
(d) trace (A + B).
11. Show that all eigenvalues of a real skew-symmetric orthogonal matrix are of
unit modulus.
12. Let α, β be two distinct eigenvalues of a square matrix A, and W1 , W2 be the
corresponding eigenspaces associated with α, β, respectively. Then show that
W1 ∩ W2 = {0}.
where λ1 , . . . , λn ∈ F are precisely the eigenvalues of T (of A), see Remark 6.13.
Analogously, we say that a matrix is triangularizable if it is similar to an upper
triangular matrix.
p(λ) = (λ₁ − λ)^{a₁} ⋯ (λₜ − λ)^{aₜ}, where Σ_{i=1}^{t} aᵢ = n and each λᵢ ∈ F, if and only
if T has an upper-triangular matrix with respect to some orthonormal basis of V.
Proof Let A ∈ Mₙ(F) be the matrix of T. The fact that T has an upper-triangular
matrix with respect to some orthonormal basis of V is equivalent to saying that A is
unitarily similar to an upper triangular matrix A′, namely H⁻¹AH = A′, where H
is a unitary matrix.
Firstly, we assume that the characteristic polynomial p(λ) of T splits over F
and prove that T is unitarily triangularizable. We prove the result by induction on the
dimension of V. Of course, it is trivial in the case dim(V) = 1. Thus, we assume that
the result holds for any inner product space of dimension less than n. Let λ₀ ∈ F be
an eigenvalue of T and X₁ an eigenvector corresponding to λ₀ having unit norm, that
is ‖X₁‖ = 1. Extending {X₁} to an orthonormal basis B = {X₁, ..., Xₙ} of V, we
may compute the matrix A′ of T with respect to B. If we denote by P the transition
matrix, then it follows that

A′ = P⁻¹AP = ⎡λ₀         uᵀ⎤
             ⎣0_{n−1,1}  A₁⎦,

for 0_{n−1,1} = [0, ..., 0]ᵗ (with n − 1 zero entries), u ∈ Fⁿ⁻¹ and A₁ ∈ M_{n−1}(F).
Since A and A′ have the same characteristic polynomial, a fortiori any eigenvalue of
A₁ is an eigenvalue of A. Therefore, the characteristic polynomial of A₁ splits over F.
By the induction hypothesis, there exists a unitary matrix Q ∈ M_{n−1}(F) such that
Q⁻¹A₁Q = A₂ is an upper triangular matrix. Hence, if

R = ⎡1          0_{1,n−1}⎤
    ⎣0_{n−1,1}  Q        ⎦

then

R⁻¹A′R = R⁻¹P⁻¹APR = (PR)⁻¹A(PR),

where both P and R are unitary matrices. Thus, the product PR is a unitary matrix,
and its columns are the coordinate vectors of an orthonormal basis with respect to
which the linear operator T is represented by the upper triangular matrix

⎡λ₀         uᵀQ⎤
⎣0_{n−1,1}  A₂ ⎦.
so that the associated homogeneous linear system has solutions X = (α, 0, 0, α), for
any α ∈ R. The corresponding eigenspace is N₁ = ⟨(1, 0, 0, 1)⟩ = ⟨(1/√2, 0, 0, 1/√2)⟩.
The orthogonal complement is

N₁⊥ = ⟨(1/√2, 0, 0, −1/√2), (0, 1, 0, 0), (0, 0, 1, 0)⟩,

so we consider the orthonormal basis

B₁ = {(1/√2, 0, 0, 1/√2), (1/√2, 0, 0, −1/√2), (0, 1, 0, 0), (0, 0, 1, 0)}

and the orthogonal matrix Q₁ having these vectors as columns:

     ⎡1/√2   1/√2  0  0⎤
Q₁ = ⎢ 0      0    1  0⎥.
     ⎢ 0      0    0  1⎥
     ⎣1/√2  −1/√2  0  0⎦
Hence we compute

               ⎡1  −4/√2  0⎤
A₂ = Q₂ᵀC₁Q₂ = ⎢0    2    0⎥.
               ⎣0  −2/√2  2⎦

Once again, we delete the first row and column of A₂ and consider the following
2 × 2 submatrix:

C₂ = ⎡  2    0⎤
     ⎣−2/√2  2⎦

and compute

A₃ = Q₃ᵀC₂Q₃ = ⎡2  −2/√2⎤.
               ⎣0    2  ⎦
Actually the process is finished. We have now to construct the final upper-
triangular matrix which is similar to A. In other words, we have to find the orthog-
onal matrix U ∈ M₄(R) such that UᵀAU is an upper-triangular matrix. This matrix
is the product of 3 orthogonal matrices U₁, U₂, U₃ ∈ M₄(R), each of them corre-
sponding to one step of the above process. More precisely:
(i) The first matrix is U₁ = Q₁.
(ii) Let U₂ be the matrix

⎡1        0_{1,3}⎤   ⎡1 0 0 0⎤
⎣0_{3,1}  Q₂     ⎦ = ⎢0 0 1 0⎥.
                     ⎢0 1 0 0⎥
                     ⎣0 0 0 1⎦
Thus

             ⎡1/√2  0  0   1/√2⎤
U = U₁U₂U₃ = ⎢ 0    1  0    0  ⎥
             ⎢ 0    0  1    0  ⎥
             ⎣1/√2  0  0  −1/√2⎦

and

       ⎡1  2/√2  2/√2   −3  ⎤
UᵗAU = ⎢0    1     0   −4/√2⎥.
       ⎢0    0     2   −2/√2⎥
       ⎣0    0     0     2  ⎦
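The step-by-step unitary reduction carried out above is, in numerical practice, the Schur decomposition; SciPy computes it directly. A sketch of ours on a random matrix:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))

T, Z = schur(A, output='complex')    # A = Z T Z*, Z unitary, T upper triangular
print(np.allclose(Z @ T @ Z.conj().T, A))       # True
print(np.allclose(Z.conj().T @ Z, np.eye(4)))   # True: Z is unitary
print(np.round(np.diag(T), 6))                  # eigenvalues of A on the diagonal
```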
Exercises
where λ₁, ..., λₙ ∈ F are precisely the eigenvalues of T (of A); see Remark 6.13.
Usually, B is said to be a diagonalizing basis of T. Analogously, we say that a matrix
is diagonalizable if it is similar to a diagonal matrix.
Assume that V has dimension n. One may note that, in light of the definition of the
matrix associated with a linear operator, saying that B = {e₁, ..., eₙ} is a diago-
nalizing basis of V amounts to saying that T(e₁) = λ₁e₁, ..., T(eₙ) = λₙeₙ, for
suitable elements λᵢ ∈ F.
Nevertheless, not all matrices can be diagonalized, in the sense that, in some
cases it is not possible to find a basis of V such that the matrix of T with respect to
that basis is diagonal. Here, we give an answer to the question of which matrices are
similar to diagonal matrices or, analogously, which linear operators are represented
by diagonal matrices. The diagonalizable linear operators are characterized by the
following:
Theorem 6.24 Let V be a vector space of dimension n and T : V → V a linear
transformation from V to itself. If the characteristic polynomial p(λ) of T splits
over F, i.e., p(λ) breaks over F in linear factors, then the following conditions are
equivalent:
(i) T is diagonalizable.
(ii) For any eigenvalue of T, its algebraic multiplicity matches with its geometric
multiplicity.
(iii) There exists a basis for V that consists entirely of eigenvectors of T.
Proof Let A ∈ Mₙ(F) be the matrix of T. The fact that T has a diagonal matrix with
respect to some basis of V is equivalent to saying that A is similar to a diagonal
matrix A′, namely P⁻¹AP = A′, where the columns of P are the coordinate vectors
of a basis with respect to which the linear operator T is represented by the diagonal
matrix A′. Thus, we actually show that the following conditions are equivalent:
(i) A is diagonalizable.
(ii) For any eigenvalue of A, its algebraic multiplicity matches with its geometric
multiplicity.
(iii) There exists a basis for V that consists entirely of eigenvectors of A.
where the diagonal entries λᵢ are not necessarily distinct and any eigenvalue λᵢ
repeatedly occurs on the main diagonal as many times as it occurs as a root of the
characteristic polynomial of A′. Let λₖ be any eigenvalue of A′. Since the rank of
A′ − λₖI is equal to n − aₖ, the dimension of the eigenspace Vₖ associated with
λₖ is n − (n − aₖ) = aₖ, as required.
(ii) ⇒ (iii) We now assume that, for any eigenvalue λₖ of A, its algebraic multi-
plicity aₖ matches with its geometric multiplicity gₖ. Thus, the dimension of any
eigenspace Vₖ associated with λₖ is equal to aₖ. Hence we have

dim(V₁) + dim(V₂) + ⋯ + dim(Vₘ) = a₁ + a₂ + ⋯ + aₘ = n,

that is

V₁ ⊕ V₂ ⊕ ⋯ ⊕ Vₘ = V.

Therefore, the union of the bases of the Vₖ is a basis for V that consists entirely
of eigenvectors of A.
(iii) ⇒ (i) We finally assume that B = {X 1 , . . . , X n } is a basis for V that con-
sists entirely of eigenvectors of A. Even if the eigenvalues of A are not necessarily
distinct, we can list them in the set {λ1 , . . . , λn }, such that any eigenvalue λi repeat-
edly occurs as many times as it occurs as a root of the characteristic polynomial
of A and
AX 1 = λ1 X 1 , AX 2 = λ2 X 2 , AX 3 = λ3 X 3 , . . . , AX n = λn X n .
T(X₁) = AX₁ = λ₁X₁ = (λ₁, 0, 0, ..., 0)_B
T(X₂) = AX₂ = λ₂X₂ = (0, λ₂, 0, ..., 0)_B
T(X₃) = AX₃ = λ₃X₃ = (0, 0, λ₃, ..., 0)_B
⋮
T(Xₙ) = AXₙ = λₙXₙ = (0, 0, 0, ..., λₙ)_B.
If we denote by A′ the matrix of T with respect to the basis B, then the column
coordinate vectors of A′ are precisely the images T(Xᵢ), that is,

     ⎡λ₁           ⎤
     ⎢   λ₂        ⎥
A′ = ⎢      ⋱      ⎥.
     ⎣          λₙ ⎦

Moreover, A′ = P⁻¹AP, where P = [X₁ | X₂ | ⋯ | Xₙ] is the transition matrix
having in its ith column the coordinates of the eigenvector Xᵢ.
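For instance, the 3 × 3 matrix from the first eigenvalue example of this chapter has three distinct eigenvalues, so the recipe above can be carried out mechanically; a numpy sketch of ours:

```python
import numpy as np

A = np.array([[1., 2., 2.],
              [1., 3., 1.],
              [2., 2., 1.]])     # the 3x3 matrix from Sect. 6.1

evals, P = np.linalg.eig(A)      # columns of P are eigenvectors
A_prime = np.linalg.inv(P) @ A @ P
print(np.round(evals, 10))       # -1, 5, 1 in some order
print(np.round(A_prime, 10))     # diagonal matrix with those eigenvalues
```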
having rank equal to 1, so that the associated homogeneous linear system reduces
to x2 − x3 = 0 and has solutions X = (α, β, β), for any α, β ∈ R. The eigenspace
corresponding to λ1 is generated by the eigenvectors {(1, 0, 0), (0, 1, 1)} and the
geometric multiplicity of λ1 is equal to 2.
For λ₂ = 3, we have

         ⎡−1 −3  3⎤
A − 3I = ⎢ 0 −2  1⎥
         ⎣ 0 −2  1⎦
has solutions X = (3α, α, 2α), for any α ∈ R. In this case, the eigenspace corre-
sponding to λ2 is generated by the eigenvector (3, 1, 2) and the geometric multiplicity
of λ2 is equal to 1.
Now consider the basis B′ = {(1, 0, 0), (0, 1, 1), (3, 1, 2)} for R³, consisting
entirely of eigenvectors of T. With respect to this basis, T has a diagonal form.
The matrix of T with respect to B′ can be computed as A′ = P⁻¹AP, where
⎡ ⎤
103
P = ⎣0 1 1⎦
012
−i x1 − x2 + x3 = 0
x1 − i x2 + x3 + x4 = 0
−i x3 − x4 = 0
x3 − i x4 = 0
has solutions X = (α, −iα, 0, 0), for any α ∈ C. The eigenspace corresponding to
λ₁ is generated by the eigenvector (1, −i, 0, 0) and the geometric multiplicity of λ₁
is equal to 1. We may conclude that, in this case, T is not diagonalizable.
2x₁ − 2x₂ + x₃ − x₄ = 0
x₃ − x₄ = 0

while for λ₂ the associated homogeneous linear system

2x₂ − x₃ + x₄ = 0
2x₁ + x₃ − x₄ = 0
x₃ + x₄ = 0
x₅ = 0
has solutions X = (−α, α, α, −α, 0), for any α ∈ R. The eigenspace corresponding
to λ2 is generated by the eigenvector (−1, 1, 1, −1, 0) and the geometric multiplicity
of λ2 is equal to 1. For λ3 = −1, we have
⎡ ⎤
2 2 −1 1 0
⎢2 2 1 −1 0⎥
⎢ ⎥
A+I =⎢
⎢0 0 3 1 0⎥⎥
⎣0 0 1 3 0⎦
0 0 0 0 4
2x₁ + 2x₂ − x₃ + x₄ = 0
2x₁ + 2x₂ + x₃ − x₄ = 0
3x₃ + x₄ = 0
x₃ + 3x₄ = 0
x₅ = 0
B = {(1, 1, 0, 0, 0), (0, 0, 1, 1, 0), (0, 0, 0, 0, 1), (−1, 1, 1, −1, 0), (−1, 1, 0, 0, 0)}.
Exercises
in terms of the canonical basis for R3 . Determine, if possible, the basis for R3
with respect to which the matrix of T is diagonal.
4. Redo Exercise 3 for the linear operator T : R4 → R4 having matrix
    ⎡ 0 −1  1  1⎤
A = ⎢−1  0  1  1⎥.
    ⎢ 1  1 −2  1⎥
    ⎣ 0  0  0  1⎦
diagonalizable?
6. Suppose that A ∈ Mn (F) has two distinct eigenvalues λ and μ such that dim(E λ )
= (n − 1), where E λ is the eigenspace associated with λ. Prove that A is diago-
nalizable.
7. For each of the following linear operators T on a vector space V, test for diag-
onalizability and if T is diagonalizable find a basis B for V such that [T ] B is a
diagonal matrix.
(a) V = P₃(R) and T is defined by T(f(x)) = f′(x) + f″(x).
(b) V = P2 (R) and T is defined by T (ax 2 + bx + c) = cx 2 + bx + a.
(c) V = R3 and T is defined by
T(a₁, a₂, a₃) = (a₂, −a₁, 2a₃).
We know that T is diagonalizable if and only if there exists a basis for V that consists
entirely of eigenvectors of T. On the other hand, not all linear operators can be
diagonalized. Here, we analyze what can be done in the case T is not diagonalizable.
In this case and under suitable assumptions, we shall see that there exists a basis of
V with respect to which the matrix of T assumes a fairly simple form, called the
Jordan canonical form.
More precisely, the p × p Jordan block associated with the scalar α ∈ F is defined
as the following p × p matrix:

         ⎡α 1      ⎤
         ⎢  α  ⋱   ⎥
J_p(α) = ⎢     ⋱  1⎥
         ⎣        α⎦

Since J_p(α) − αI_p has rank p − 1, the geometric multiplicity of α is 1. This means
that J_p(α) is not diagonalizable, except in the trivial case p = 1 and J₁(α) = [α] ∈ M₁(F).
We say that A ∈ Mₙ(F) has Jordan form if A is made up of diagonal Jordan blocks,
that is

    ⎡J_{n₁}(α₁)              ⎤
    ⎢           ⋱            ⎥
A = ⎢                        ⎥,
    ⎣              J_{nᵣ}(αᵣ)⎦

where any J_{nᵢ}(αᵢ) is an nᵢ × nᵢ Jordan block associated with the scalar αᵢ, and
Σᵢ nᵢ = n. It is easy to see that α₁, ..., αᵣ are the eigenvalues of A, which are not
necessarily distinct, and the characteristic polynomial of A is

p_A(λ) = (λ − α₁)^{n₁}(λ − α₂)^{n₂} ⋯ (λ − αᵣ)^{nᵣ}.
has Jordan form, in fact it consists of 2 diagonal Jordan blocks associated with the
eigenvalues 3 and 2, respectively.
has Jordan form, in fact, it consists of 3 diagonal Jordan blocks associated with the
eigenvalues 2, 4 and 3, respectively.
has Jordan form, in fact, it consists of 4 diagonal 1 × 1 Jordan blocks associated with
the eigenvalues 3, 2, 4 and 5, respectively.
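Computer algebra systems can compute a Jordan form exactly; e.g., with sympy (the test matrix below is our own construction, built by conjugating a known Jordan form):

```python
import sympy as sp

J0 = sp.Matrix([[3, 1, 0],
                [0, 3, 0],
                [0, 0, 2]])          # J_2(3) and J_1(2) on the diagonal
S = sp.Matrix([[1, 1, 0],
               [0, 1, 1],
               [1, 0, 1]])           # an invertible change of basis (det = 2)
A = S * J0 * S.inv()                 # similar to J0, but no longer triangular

P, J = A.jordan_form()               # A = P J P^{-1}
print(J)                             # recovers J_2(3) and J_1(2)
```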
where 0k,n−k is the zero matrix in Mk,n−k (F), 0n−k,k is the zero matrix in Mn−k,k (F),
AU is the matrix of the restriction T|U with respect to the basis BU , A W is the matrix
of the restriction T|W with respect to the basis BW . Therefore,
pT (λ) = |A − λIn | = |AU − λIk ||A W − λIn−k | = pT|U (λ) pT|W (λ)
as required.
exists a positive integer k such that Aᵏ = 0. The smallest such k is called the index of
nilpotency of A, in the sense that Aʰ ≠ 0 for any positive integer h < k. Of course,
the linear operator T is nilpotent if and only if its matrix A is nilpotent.
Lemma 6.37 Let V be a vector space of dimension n, T : V → V a linear operator
and A ∈ Mn (F) the matrix of T. If A is nilpotent, then its characteristic polynomial
is p(λ) = (−1)n λn .
Proof Let k ≥ 1 be the smallest integer such that Aᵏ = 0. Suppose that 0 ≠ λ ∈ F is
an eigenvalue of A. Then there exists an eigenvector 0 ≠ v ∈ V such that Av = λv.
By Lemma 6.5, it follows that Aᵏv = λᵏv ≠ 0, which contradicts the nilpotency of
A. Therefore λ = 0 is the only eigenvalue of A, so that its characteristic polynomial
is precisely p(λ) = (−1)ⁿλⁿ.
Definition 6.38 A generalized eigenvector corresponding to an eigenvalue λ of a
linear operator T : V → V is a nonzero vector v ∈ V such that (T − λIV )k v = 0,
for some integer k ≥ 1. The exponent of the generalized eigenvector v is the smallest
integer h ≥ 1 such that (T − λIV )h v = 0.
Let now A ∈ Mn (F) be the matrix of T : V → V, and λ an eigenvalue of T. Denote
(T − λI )0 = I, (T − λI )1 = T − λI, (T − λI )2 = (T − λI )(T − λI ), . . . , (T −
λI )h = (T − λI )(T − λI )h−1 and consider both kernel and image of any (T −
λI )h : V → V, respectively:
N_{h,λ} = N((T − λI)ʰ) = {v ∈ V | (T − λI)ʰ(v) = 0},
R_{h,λ} = R((T − λI)ʰ) = {(T − λI)ʰ(v) | v ∈ V}.
cannot be infinite. Since both chains terminate after finitely many steps, we can find
a minimum integer m ≥ 1 such that

N_{m,λ} = N_{m+1,λ} = N_{m+2,λ} = ⋯ = N_{t,λ}  for all t ≥ m

and

R_{m,λ} = R_{m+1,λ} = R_{m+2,λ} = ⋯ = R_{t,λ}  for all t ≥ m.
Remark 6.39 The linear operator T : V → V is nilpotent if and only if there exists
k ≥ 1 (which is precisely the index of nilpotency of T ) such that Nk,0 = V and
Rk,0 = {0}.
Proof We divide the proof into two steps. Firstly, we consider the case when T
has the eigenvalue λ = 0. Let G = T|Ns,0 : Ns,0 → Ns,0 be the restriction of T to
Ns,0 , H = T|Rs,0 : Rs,0 → Rs,0 the restriction of T to Rs,0 . Denote by pT (t), pG (t)
and p_H(t) the characteristic polynomials of T, G and H, respectively. Since V =
N_{s,0} ⊕ R_{s,0}, by Lemma 6.35 we have p_T(t) = p_G(t)·p_H(t).
On the other hand, (T − λ₂I)^{t₂} ⋯ (T − λᵣI)^{tᵣ}X₁ ∈ N_{t₁,λ₁} \ N_{tₛ,λₛ} for any
s ≠ 1. Therefore, relation (6.2) is true only if X₁ = 0. In this case, (6.1) reduces to

X₂ + ⋯ + Xᵣ = 0,  Xᵢ ∈ N_{tᵢ,λᵢ}. (6.3)

So, starting from (6.3) and repeating the same process r − 1 times, one may
prove that Xᵢ = 0, for any i = 1, ..., r.
Let now W = Nt1 ,λ1 ⊕ · · · ⊕ Ntr ,λr . By Lemma 6.43, we know that dim(Nti ,λi ) is
equal to the algebraic multiplicity of the eigenvalue λi , that is dim F (W ) = dim F (V )
and V = W.
where:
(i) A1 is a nilpotent matrix of index s, corresponding to the restriction of T to Ns ;
(ii) A2 is an invertible matrix, corresponding to the restriction of T to Rs .
More precisely:
(iii) For i = 1, ..., t, the coordinate column vectors of the image T(eᵢ) define the
submatrix

⎡A₁       ⎤
⎣0_{n−t,t}⎦.

(iv) For i = t + 1, ..., n, the coordinate column vectors of the image T(eᵢ) define
the submatrix

⎡0_{t,n−t}⎤
⎣A₂       ⎦.
Thus, according to the decomposition V = Ns ⊕ Rs we have the decomposition
(6.4). It is called Fitting decomposition of A (usually V = Ns ⊕ Rs is also called
Fitting decomposition of T ).
Proof In case A is nilpotent, then p(λ) = (−1)n λn is proved in Lemma 6.37. Hence,
we assume p(λ) = (−1)n λn , that is 0 is the only eigenvalue of A. By contradiction,
suppose that A is not nilpotent, that is, T is not nilpotent. Then there is some vector
v ∈ V such that Tˢ(v) ≠ 0; in particular Rₛ ≠ {0} (see Remark 6.39).
Thus, by Fitting decomposition rule, V = Ns ⊕ Rs and
A = ⎡A₁         0_{t,n−t}⎤,
    ⎣0_{n−t,t}  A₂       ⎦
where A2 is an invertible matrix. On the other hand, since we have assumed that 0 is
the only eigenvalue of A, a fortiori it is the only eigenvalue of the nonsingular matrix
A2 , which is a contradiction.
We are now able to construct a basis of V with respect to which the matrix of T has
Jordan form. It will be an inductive process and consists of several steps. We start
with the following:
0 = α1 T r −1 (X 1 ) + α2 T r −1 (X 2 ) + · · · + αr T r −1 (X r ). (6.6)
α1 X 1 + α2 X 2 + · · · + αr −1 X r −1 = 0. (6.7)
Remark 6.51 Let T : V → V be a nilpotent operator (that is, 0 is the only eigen-
value of T ), X 1 , . . . , X k generalized eigenvectors of T such that ri is the exponent of
X i , for any i = 1, . . . , k. Starting from any X i we now construct the chain of vectors
as in Theorem 6.49:
(i) X i = X i,ri ;
(ii) X i,ri − j = T j (X i,ri ), for j = 1, . . . , ri − 1.
The set Bi = {X i,1 , X i,2 , . . . , X i,ri } is linearly independent and the subspace Wi =
Span(Bi ) is T -invariant. Moreover, in light of Remark 6.50:
(iii) X i,k−1 = T (X i,k ), for k = 2, . . . , ri ;
(iv) T (X i,1 ) = 0, that is, X i,1 is an eigenvector of T.
If V = ⊕_{i=1}^{k} Wᵢ, where the Wᵢ are subspaces of V having bases Bᵢ respectively, then
the set B = ∪_{i=1}^{k} Bᵢ is a basis for V. Write
and let A be the matrix of T with respect to B. The column coordinate vectors of A
are the images of the elements of the basis B. By computing these images, we have
that
(v) T (X i, j ) = X i, j−1 in case X i, j is not the first vector in the chain Bi ;
(vi) T (X i, j ) = 0 in case X i, j is the first vector in the chain Bi .
Therefore, the coordinate vector of T(X_{i,j}) with respect to B is either
(0, 0, ..., 0, 1, 0, ..., 0), where the 1 is the (Σ_{h=1}^{i−1} rₕ + (j − 1))-th entry, or
(0, 0, ..., 0, 0), respectively. Therefore, the matrix A has the following block diag-
onal form

⎡J₁(0)           ⎤
⎢       ⋱        ⎥
⎣           Jₖ(0)⎦

where

        ⎡0 1     ⎤
Jᵢ(0) = ⎢  ⋱  ⋱  ⎥
        ⎢    0  1⎥
        ⎣       0⎦
Notice that

T(Y_{i,1}) = T(T^{tᵢ}(Y_{i,tᵢ+1})) = T^{tᵢ+1}(Y_{i,tᵢ+1}) = 0.

Moreover

Y_{i,m} = T^{tᵢ+1−m}(Y_{i,tᵢ+1}) = T^{tᵢ−m}(X_{i,tᵢ})

implies

T(Y_{i,m}) = T^{tᵢ+1−m}(X_{i,tᵢ}) ≠ 0,  for all 1 < m ≤ tᵢ + 1.

0 = Σ_{m=2}^{tᵢ+1} αₘT^{tᵢ+1−m}(X_{i,tᵢ}) = Σ_{m=2}^{tᵢ+1} αₘX_{i,m−1}. (6.10)
V = W ⊕ Span{x1 , . . . , xm } = V1 ⊕ · · · ⊕ Vk ⊕ Span{x1 , . . . , xm }.
Proof We prove the theorem by induction on the dimension of V. The result is trivial
if dim F V = 1. Assume dim F V = n ≥ 2 and the theorem holds for any T -invariant
proper subspace of V.
We firstly consider the case p(λ) = (λ − λ1 )n , that is there exists a unique eigen-
value λ1 of T, having algebraic multiplicity equal to n. Let A be the matrix of T with
respect to a basis B of V. Now replace T by T − λ1 I, so that A − λ1 I is the matrix
of T − λ1 I with respect to the same basis B. Moreover, if A − λ1 I is similar to a
Jordan diagonal blocks matrix, so is A. However, T − λ1 I is a nilpotent operator
and we can conclude by Theorem 6.52.
Hence, we may assume that there exist at least two distinct eigenvalues λ1 , λ2 of
T. The linear operator T − λ1 I is singular, having eigenvalue zero, and A − λ1 I is
its matrix. As above, if A − λ1 I admits a Jordan canonical form, so does A.
Let s ≥ 1 be the index of λ1 . That is, s is the minimum integer such that Ns,0 = Nt,0
and Rs,0 = Rt,0 , for any t ≥ s. We write Ns instead of Ns,0 and Rs instead of Rs,0 .
6.4 Jordan Canonical Form of an Operator 203
The fact that Jₖ(0)ᵏ is the zero matrix is trivial, so we can consider the case d < k.
We denote by R, R², ..., R^d the images of the operators H, H², ..., H^d, respectively.
Let X₁, ..., Xₖ be generalized eigenvectors of T corresponding to the eigenvalue
zero, such that Jₖ(0) is the matrix of H with respect to the basis C = {X₁, ..., Xₖ} for
Fᵏ. Since the columns of Jₖ(0) are the coordinate vectors of H(X₁), ..., H(Xₖ) with
respect to the basis C, then H(X₁) = 0 and H(Xᵢ) = X_{i−1}, for any i = 2, ..., k.
Since {H(X₁), ..., H(Xₖ)} is a generating set for R, {H(X₂), ..., H(Xₖ)} =
{X₁, ..., X_{k−1}} is a basis for R. Here we also notice that H²(X₁) = 0, H²(X₂) =
H(H(X₂)) = H(X₁) = 0 and H²(Xᵢ) = H(X_{i−1}) = X_{i−2}, for any i = 3, ..., k.
As above, this means that {H²(X₃), ..., H²(Xₖ)} = {X₁, ..., X_{k−2}} is a basis for R².
Repeating this process, we can see that H^d(Xᵢ) = 0 for any i ≤ d and H^d(Xᵢ) =
X_{i−d} for any d < i ≤ k, so that {H^d(X_{d+1}), ..., H^d(Xₖ)} = {X₁, ..., X_{k−d}} is a
basis for R^d, that is dim_F(R^d) = k − d. Moreover, Jₖ(0)^d is the matrix of H^d; in other
words, the rank of Jₖ(0)^d is equal to the dimension of R^d, and we are done.
Corollary 6.55 Let T : V → V be a singular linear operator of the finite dimen-
sional vector space V and assume that there exists a basis B for V such that the
matrix A of T with respect to B has Jordan form. If Jk (0) is a k × k Jordan block of
A corresponding to the eigenvalue zero, then
rank(Jₖ(0)^{d+1}) − 2·rank(Jₖ(0)^{d}) + rank(Jₖ(0)^{d−1}) = { 1, d = k;  0, 1 ≤ d ≠ k }.
be the Jordan block diagonal form, where any n i × n i block Ai is associated with an
eigenvalue λi of T. Thus
A − λI = ⎡A₁ − λI_{n₁}                ⎤
         ⎢              ⋱             ⎥
         ⎣                Aₜ − λI_{nₜ}⎦

and

(A − λI)ᵏ = ⎡(A₁ − λI_{n₁})ᵏ                   ⎤
            ⎢                 ⋱                ⎥
            ⎣                   (Aₜ − λI_{nₜ})ᵏ⎦

so that

rank((A − λI)ᵏ) = Σ_{i=1}^{t} rank((Aᵢ − λI_{nᵢ})ᵏ). (6.12)

To simplify the notation we denote rₖ(A) = rank((A − λI)ᵏ) and rₖ(Aᵢ) =
rank((Aᵢ − λI_{nᵢ})ᵏ). By (6.12) it follows that

rₖ₊₁(A) − 2rₖ(A) + rₖ₋₁(A) = Σ_{i=1}^{t} (rₖ₊₁(Aᵢ) − 2rₖ(Aᵢ) + rₖ₋₁(Aᵢ)). (6.13)
Σ_{i=1}^{t} (rₖ₊₁(Aᵢ) − 2rₖ(Aᵢ) + rₖ₋₁(Aᵢ)) = 0.

Hence, in light of relation (6.13), we obtain that rₖ₊₁(A) − 2rₖ(A) + rₖ₋₁(A) rep-
resents precisely the number of Jordan blocks Aᵢ having dimension k and associated
with the eigenvalue λ.
and an application of Theorem 6.58 shows that the number of times Jₕ(λ) occurs as
a block in A is

rank((A − λI)^{h+1}) − 2·rank((A − λI)^{h}) + rank((A − λI)^{h−1}) = 0,

t = Σ_{i=1}^{k} [rank((A − λI)^{i+1}) − 2·rank((A − λI)^{i}) + rank((A − λI)^{i−1})],

so that, since the sum telescopes,

t = rank((A − λI)⁰) − rank(A − λI) = n − rank(A − λI)
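These rank differences are straightforward to compute numerically. A numpy helper of ours, applied to a block-diagonal test matrix of our own construction:

```python
import numpy as np

def jordan_block_counts(A, lam, tol=1e-8):
    """Number of h x h Jordan blocks for lam, via
    rank((A-lam I)^{h+1}) - 2 rank((A-lam I)^h) + rank((A-lam I)^{h-1})."""
    n = A.shape[0]
    M = A - lam * np.eye(n)
    r, P = [n], np.eye(n)            # r[0] = rank of M^0 = n
    for _ in range(n + 1):
        P = P @ M
        r.append(np.linalg.matrix_rank(P, tol=tol))
    return {h: r[h + 1] - 2 * r[h] + r[h - 1] for h in range(1, n + 1)}

A = np.array([[3., 1., 0., 0., 0.],
              [0., 3., 0., 0., 0.],
              [0., 0., 3., 0., 0.],
              [0., 0., 0., 2., 1.],
              [0., 0., 0., 0., 2.]])
print(jordan_block_counts(A, 3.0))   # one J_1(3) and one J_2(3)
print(jordan_block_counts(A, 2.0))   # one J_2(2)
```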
having rank r₁ = 3, so that the associated homogeneous linear system has solutions
X = (α, −2β, β, 0, 0), for any α, β ∈ R. The eigenspace corresponding to λ₁ is
N_{1,λ₁} = ⟨(1, 0, 0, 0, 0), (0, −2, 1, 0, 0)⟩ and the geometric multiplicity of λ₁ is equal
to 2. Therefore, there are 2 Jordan blocks corresponding to the eigenvalue λ₁ = 3.
More precisely, one block has dimension 2 and one block has dimension 1. Now
⎡ ⎤
0 0 0 −1 −1
⎢0 0 0 −2 4 ⎥
⎢ ⎥
(A − 3I )2 = ⎢
⎢0 0 0 1 −2 ⎥
⎥
⎣0 0 0 1 −2 ⎦
0 0 0 0 1
has rank r₂ = 2, that is, the associated homogeneous linear system has solutions
(α, β, γ, 0, 0), for any α, β, γ ∈ R. Thus N_{2,λ₁} has dimension 3 and N_{2,λ₁} =
⟨(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0)⟩. Notice that the dimension of N_{2,λ₁} coin-
cides with the algebraic multiplicity of λ₁, so that N_{2,λ₁} is the generalized eigenspace
having rank equal to 4, so that the associated homogeneous linear system has solu-
tions (−α, −2α, α, α, 0), for any α ∈ R. The eigenspace corresponding to λ₂ is
N_{1,λ₂} = ⟨(−1, −2, 1, 1, 0)⟩ and the geometric multiplicity of λ₂ is equal to 1. There-
fore, there is 1 Jordan block corresponding to the eigenvalue λ₂ = 2, having dimen-
sion 2. Since

            ⎡1 2 4  1 3⎤
            ⎢0 1 0  2 0⎥
(A − 2I)² = ⎢0 0 1 −1 0⎥
            ⎢0 0 0  0 0⎥
            ⎣0 0 0  0 0⎦
has rank r₂ = 3, then the associated homogeneous linear system has solutions
(−α − 3β, −2α, α, α, β), for any α, β ∈ R. Thus, the generalized eigenspace N_{2,λ₂}
has dimension 2 and N_{2,λ₂} = ⟨(−1, −2, 1, 1, 0), (−3, 0, 0, 0, 1)⟩. To obtain a set of
Jordan generators corresponding to λ₂, we start from X₅ ∈ N_{2,λ₂} \ N_{1,λ₂}. Of course
X₅ = (−3, 0, 0, 0, 1). Then compute X₄ = (A − 2I)X₅ = (−1, −2, 1, 1, 0). Hence
B₃ = {X₄, X₅} is a set of Jordan generators associated with the block of dimension
2, corresponding to λ₂.
Finally, we can write a Jordan basis for T :
Finally, we can write a Jordan basis for T :
B = B1 ∪ B2 ∪ B3
= {X 1 , X 2 , X 3 , X 4 , X 5 }
= {(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, −2, 1, 0, 0), (−1, −2, 1, 1, 0), (−3, 0, 0, 0, 1)}.
T (X 1 ) = AX 1
= (3, 0, 0, 0, 0), with coordinates in terms of B as (3, 0, 0, 0, 0);
T (X 2 ) = AX 2
= (1, 3, 0, 0, 0), with coordinates in terms of B as (1, 3, 0, 0, 0);
T (X 3 ) = AX 3
= (0, −6, 3, 0, 0), with coordinates in terms of B as (0, 0, 3, 0, 0);
T (X 4 ) = AX 4
= (−2, −4, 2, 2, 0), with coordinates in terms of B as (0, 0, 0, 2, 0);
T (X 5 ) = AX 5
= (−7, −2, 1, 1, 2), with coordinates in terms of B as (0, 0, 0, 1, 2).
then

             ⎡3 1 0 0 0⎤
             ⎢0 3 0 0 0⎥
A′ = P⁻¹AP = ⎢0 0 3 0 0⎥
             ⎢0 0 0 2 1⎥
             ⎣0 0 0 0 2⎦
and (A − I )3 = 0. In particular, this means that there exists a Jordan block corre-
sponding to λ of dimension 3 and any other Jordan block of λ has dimension less
than 3.
Therefore, A − I has rank r1 = 3, so that the associated homogeneous linear
system has solutions X = (α, β, 0, −α, 0), for any α, β ∈ R. So
N_{1,λ} = ⟨(1, 0, 0, −1, 0), (0, 1, 0, 0, 0)⟩, the geometric multiplicity of λ is equal to
2 and there are 2 Jordan blocks corresponding to the eigenvalue λ. Since (A −
I)² has rank r₂ = 1, the associated homogeneous linear system has solutions X =
(α, β, γ, δ, 0), for any α, β, γ, δ ∈ R. So
B = B1 ∪ B2
= {X 1 , X 2 , X 3 , X 4 , X 5 }
= {(−1, −1, 0, 1, 0), (1, 1, 0, 0, 0), (0, 0, 0, 0, 1), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0)}.
then

             ⎡1 1 0 0 0⎤
             ⎢0 1 1 0 0⎥
A′ = P⁻¹AP = ⎢0 0 1 0 0⎥
             ⎢0 0 0 1 1⎥
             ⎣0 0 0 0 1⎦
Exercises
1. For each of the following linear operators T on the vector space V, determine
whether the given subspace W is a T -invariant subspace of V :
(a) V = P3 (R), T ( f (x)) = f (x), and W = P2 (R).
(b) V = P(R), T ( f (x)) = x f (x), and W = P2 (R).
(c) V = R³, T(a, b, c) = (a + b + c, a + b + c, a + b + c) and W =
{(t, t, t) : t ∈ R}.
(d) V = C([0, 1]), T(f(t)) = [∫₀¹ f(x) dx]·t, and W = {f ∈ V : f(t) =
at + b for some a and b}.
(e) V = M₂(R), T(A) = ⎡0 1⎤
                       ⎣1 0⎦ A, and W = {A ∈ V : Aᵗ = A}.
2. Let T : R4 → R4 be the linear operator having matrix
    ⎡2  0 0 0⎤
A = ⎢1  3 1 0⎥
    ⎢0 −1 1 1⎥
    ⎣0  1 2 1⎦
p(X ) = a0 + a1 X + a2 X 2 + · · · + ad X d , a0 , a1 , . . . , ad ∈ F,
we can substitute the matrix A for the indeterminate X and obtain a matrix p(A) ∈
Mn (F), more precisely
p(A) = a0 I + a1 A + a2 A2 + · · · + ad Ad ∈ Mn (F).
In particular, we say that A is a root for the polynomial p(X ), in case p(A) is the
zero matrix in Mn (F) and write p(A) = 0. Starting from these comments, we may
now state the following:
X = X1 + · · · + Xk. (6.14)
A³ = 4A² + 5A = 4(4A + 5I) + 5A = 21A + 20I.

Then

A⁶ = (A³)² = (21A + 20I)² = 441A² + 840A + 400I = 441(4A + 5I) + 840A + 400I = 2604A + 2605I.

This implies that

A⁶ = ⎡7813 7812⎤.
     ⎣7812 7813⎦
that is

B = ⎡152 144⎤
    ⎣144 152⎦
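A quick numerical check of the reduction above (our reconstruction: the example's matrix is not reprinted in this excerpt, but A = [[2, 3], [3, 2]] satisfies A² = 4A + 5I and reproduces the A⁶ shown):

```python
import numpy as np

A = np.array([[2., 3.],
              [3., 2.]])   # tr A = 4, det A = -5, so p(A) gives A^2 = 4A + 5I
I = np.eye(2)

print(np.allclose(A @ A, 4 * A + 5 * I))            # True (Cayley-Hamilton)
print(np.allclose(np.linalg.matrix_power(A, 6),
                  2604 * A + 2605 * I))             # True: matches the A^6 above
```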
Proof Let f (A) = 0. By the division algorithm for polynomials, there exist polyno-
mials q(λ), r (λ) ∈ F[λ] such that f (λ) = q(λ)m(λ) + r (λ), where r (λ) = 0 or
deg r (λ) < deg m(λ). Thus 0 = f (A) = q(A)m(A) + r (A) = r (A). Since m(λ) is
the minimal polynomial of T and deg r (λ) < deg m(λ), then the polynomial r (λ)
must be identically zero. Thus m(λ) divides f (λ).
Assume now that f (λ) = m(λ)q(λ), for some q(λ) ∈ F[λ]. Then f (A) = m(A)
q(A) = 0, as required.
Corollary 6.67 The minimal polynomial of a linear operator T divides the charac-
teristic polynomial of T.
Proof Let p(λ) ∈ F[λ] be the characteristic polynomial of T. The result is trivial in
case m(λ) = p(λ). Thus we can consider deg(m) < deg( p) and p(λ) = m(λ)q(λ)
for some q(λ) ∈ F[λ].
Assume first that λ0 is a root of m(λ). In this case it is easy to see that p(λ0 ) =
m(λ0 )q(λ0 ) = 0, that is λ0 is a root of the characteristic polynomial, i.e., λ0 is an
eigenvalue of T.
Suppose now that p(λ₀) = 0, and suppose by contradiction that (λ − λ₀) does not
divide m(λ). Hence, there exists a polynomial q(λ) ∈ F[λ] such that m(λ) =
(λ − λ₀)q(λ) + r, where deg(q) = deg(m) − 1 and 0 ≠ r ∈ F. Hence 0 = m(A) =
(A − λ₀I)q(A) + rI, that is −rI = (A − λ₀I)q(A). By computing the determi-
nants of the matrices in the last identity, we get |−rI| = |A − λ₀I||q(A)|, which
means (−r)ⁿ = |A − λ₀I||q(A)|. Since λ₀ is an eigenvalue of T, |A − λ₀I| =
p(λ₀) = 0, so (−r)ⁿ = 0, which is a contradiction.
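Theorem 6.66 above and Theorem 6.69 below justify a simple search procedure: the minimal polynomial is the smallest-degree monic product ∏(λ − λᵢ)^{sᵢ}, with 1 ≤ sᵢ ≤ aᵢ, that annihilates A. A sympy sketch of ours, assuming the characteristic polynomial splits (as it always does over C):

```python
import sympy as sp
from itertools import product

def minimal_poly(A):
    """Smallest monic prod (x - lam)^s, 1 <= s <= alg. multiplicity, with m(A) = 0."""
    A = sp.Matrix(A)
    n = A.shape[0]
    ev = A.eigenvals()                       # {eigenvalue: algebraic multiplicity}
    lams, mults = list(ev.keys()), list(ev.values())
    x = sp.symbols('x')
    good = []
    for exps in product(*[range(1, a + 1) for a in mults]):
        M = sp.eye(n)
        for lam, e in zip(lams, exps):
            M = M * (A - lam * sp.eye(n))**e
        if M == sp.zeros(n, n):
            good.append(exps)
    s = min(good, key=sum)                   # minimal total degree that works
    return sp.prod([(x - lam)**e for lam, e in zip(lams, s)])

print(minimal_poly([[3, 1, 0], [0, 3, 0], [0, 0, 3]]))   # (x - 3)**2
```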
where λ1 , . . . , λk are the distinct eigenvalues of T and ti is the index of λi , for any
i = 1, . . . , k.
that is
(A − λ1 I )t1 · · · (A − λk I )tk X = 0, for all X ∈ V.
Therefore, the polynomial m(λ) represents the zero operator on V, so that m(A) =
0. Moreover, the polynomial m(λ) divides the characteristic polynomial p(λ), since
any ti ≤ ai , for any i = 1, . . . , k, where ai is the algebraic multiplicity of λi .
To conclude the proof, it is sufficient to show that m(λ) is the monic polynomial
of smallest degree such that m(A) is the zero matrix.
Denote
f (λ) = (λ − λ1 )s1 · · · (λ − λk )sk ,
the minimal polynomial of T, and suppose by contradiction that f(λ) ≠ m(λ). Since
m(A) = 0, by Theorem 6.66, f(λ) divides m(λ). In other words, there is at least
one sⱼ ∈ {s₁, ..., sₖ} such that sⱼ < tⱼ. Since tⱼ is the index of λⱼ, there exists
X ∈ Nⱼ such that 0 ≠ (A − λⱼI)^{sⱼ}X = Y ∈ Nⱼ. On the other hand, for any 0 ≠ Z ∈ Nⱼ
and l ≠ j, we know that Z ∉ N_l and, by Lemma 6.42, 0 ≠ (A − λ_lI)^{s_l}Z ∈ Nⱼ.
Moreover, since (A − λᵢI) and (A − λⱼI) commute for any λᵢ ≠ λⱼ, we can write
the factorization of f(A) as follows:

0 = f(A)(X)
  = (A − λ₁I)^{s₁} ⋯ (A − λ_{j−1}I)^{s_{j−1}}(A − λ_{j+1}I)^{s_{j+1}} ⋯ (A − λₖI)^{sₖ}(A − λⱼI)^{sⱼ}X
  = (A − λ₁I)^{s₁} ⋯ (A − λ_{j−1}I)^{s_{j−1}}(A − λ_{j+1}I)^{s_{j+1}} ⋯ (A − λₖI)^{sₖ}Y
  ≠ 0,

a contradiction.
Example 6.70 Let T : R5 → R5 be the linear operator as in the Example 6.61. The
characteristic polynomial of T is p(λ) = (3 − λ)3 (2 − λ)2 and the Jordan canonical
form of T is represented by the matrix
⎡ ⎤
3 1 0 0 0
⎢0 3 0 0 0⎥
⎢ ⎥
A = ⎢
⎢0 0 3 0 0⎥⎥
⎣0 0 0 2 1⎦
0 0 0 0 2
B = {(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, −2, 1, 0, 0), (−1, −2, 1, 1, 0), (−3, 0, 0, 0, 1)}
B = {(−1, −1, 0, 1, 0), (1, 1, 0, 0, 0), (0, 0, 0, 0, 1), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0)}
Exercises
with respect to the canonical basis for R3 . Applying the Cayley-Hamilton The-
orem, find the eigenvalues and the eigenvectors of the matrix B = A5 − 5A4 +
8A3 − 8A2 + 8A − 7I.
4. Let T : R2 → R2 be the linear operator having matrix
A = ⎡0 1⎤
    ⎣2 1⎦
with respect to the canonical basis for R2 . Compute A10 using the Cayley-
Hamilton Theorem.
5. Let T : R3 → R3 be the linear operator having matrix
    ⎡3 1 1⎤
A = ⎢1 3 1⎥
    ⎣0 0 3⎦
with respect to the canonical basis for R3 . Find the inverse matrix of A using the
Cayley-Hamilton Theorem.
6. Let T : R3 → R3 be the linear operator having matrix
    ⎡3  1 1⎤
A = ⎢0 −1 1⎥
    ⎣0  0 2⎦
with respect to the canonical basis for R3 . Applying the Cayley-Hamilton The-
orem,
(a) Find the eigenvalues and the eigenvectors of the matrix B = A5 − 2 A4 −
7A3 + 8A2 + 11A − 2I.
(b) Find the inverse matrix of A.
(c) Compute the matrix A8 .
7. Let T : R9 → R9 be the linear operator having characteristic polynomial equal to
p(λ) = (3 − λ)4 (2 − λ)2 (−1 − λ)3 and minimal polynomial equal to m(λ) =
(3 − λ)2 (2 − λ)(−1 − λ)2 . Describe all possibilities for the Jordan canonical
form of T.
that is (T* − λ̄I)X = 0, as desired.
and also

⟨T(X), Y⟩ = ⟨X, T*(Y)⟩ = ⟨X, μ̄Y⟩ = μ⟨X, Y⟩.
λ‖X‖² = λ⟨X, X⟩ = ⟨λX, X⟩ = ⟨T(X), X⟩ = ⟨X, T(X)⟩ = ⟨X, λX⟩ = λ̄⟨X, X⟩ = λ̄‖X‖².

Hence, λ = λ̄ ∈ R.
λ‖X‖² = λ⟨X, X⟩ = ⟨λX, X⟩ = ⟨T(X), X⟩ = ⟨−X, T(X)⟩ = ⟨−X, λX⟩ = −λ̄⟨X, X⟩ = −λ̄‖X‖².
1 = ⟨X, X⟩ = ⟨T(X), T(X)⟩ = ⟨λX, λX⟩ = λλ̄⟨X, X⟩ = λλ̄,

as desired.
Remark 6.80 The Spectral Theorem can be stated from two different points of view:
(i) If V is a real inner product space, then the matrix A of T is orthogonally similar
to a diagonal matrix if and only if A is symmetric.
(ii) If V is a complex inner product space, then the matrix A of T is unitarily similar
to a diagonal matrix if and only if A is Hermitian.
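Numerically, both statements are realized by np.linalg.eigh. Below is a sketch of ours run on the symmetric matrix of the example that follows (reconstructed from the A − 2I displayed there):

```python
import numpy as np

A = np.array([[2., 1., 1., 0.],
              [1., 2., 1., 0.],
              [1., 1., 2., 0.],
              [0., 0., 0., 2.]])   # real symmetric

w, Q = np.linalg.eigh(A)           # eigh: orthonormal eigenvectors, sorted eigenvalues
print(np.round(w, 10))                       # 1, 1, 2, 4
print(np.allclose(Q.T @ Q, np.eye(4)))       # True: Q is orthogonal
print(np.round(Q.T @ A @ Q, 10))             # the diagonal matrix A'
```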
For λ₂ = 2, we have

         ⎡0 1 1 0⎤
A − 2I = ⎢1 0 1 0⎥
         ⎢1 1 0 0⎥
         ⎣0 0 0 0⎦

so that the associated homogeneous linear system has solutions: X = (0, 0, 0, α),
for any α ∈ R. So the corresponding eigenspace is N_{1,λ₂} = ⟨(0, 0, 0, 1)⟩. Finally, for
λ₃ = 4, we have
A − 4I = ⎡−2  1  1  0⎤
         ⎢ 1 −2  1  0⎥
         ⎢ 1  1 −2  0⎥
         ⎣ 0  0  0 −2⎦

so that the associated homogeneous linear system has solutions X = (α, α, α, 0),
for any α ∈ R. The corresponding eigenspace is N₁,λ₃ = ⟨(1, 1, 1, 0)⟩ = ⟨(1/√3, 1/√3, 1/√3, 0)⟩.
Therefore, with respect to the orthonormal basis

{(−1/√2, 1/√2, 0, 0), (1/√6, 1/√6, −2/√6, 0), (0, 0, 0, 1), (1/√3, 1/√3, 1/√3, 0)}

the matrix of T is

A′ = ⎡1 0 0 0⎤
     ⎢0 1 0 0⎥
     ⎢0 0 2 0⎥
     ⎣0 0 0 4⎦.
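A quick NumPy check of this example (a sketch of ours, not part of the text): the symmetric matrix above is orthogonally diagonalizable with eigenvalues 1, 1, 2, 4.

```python
# Sketch (not from the text): orthogonal diagonalization of the
# symmetric matrix of the example above.
import numpy as np

A = np.array([[2., 1., 1., 0.],
              [1., 2., 1., 0.],
              [1., 1., 2., 0.],
              [0., 0., 0., 2.]])

eigvals, U = np.linalg.eigh(A)     # eigh returns an orthonormal eigenbasis
print(eigvals)                     # -> [1. 1. 2. 4.]
assert np.allclose(U.T @ U, np.eye(4))            # U is orthogonal
assert np.allclose(U.T @ A @ U, np.diag(eigvals)) # U^t A U is diagonal
```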
For λ₁ = 1, we have

A − I = ⎡0 −i  0 0⎤
        ⎢i −1 −1 0⎥
        ⎢0 −1  0 0⎥
        ⎣0  0  0 1⎦

so that the associated homogeneous linear system has solutions X = (α, 0, iα, 0),
for any α ∈ C. The corresponding eigenspace is N₁,λ₁ = ⟨(1, 0, i, 0)⟩ = ⟨(1/√2, 0, i/√2, 0)⟩. For λ₂ = −1, we have
A + I = ⎡2 −i  0 0⎤
        ⎢i  1 −1 0⎥
        ⎢0 −1  2 0⎥
        ⎣0  0  0 3⎦
the matrix of T is

A″ = ⎡2 0 0  0⎤
     ⎢0 2 0  0⎥
     ⎢0 0 1  0⎥
     ⎣0 0 0 −1⎦.
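The matrix below is reconstructed from the displayed A + I (so the matrix itself is an inference, not quoted from the text); the following sketch confirms that it is Hermitian with real eigenvalues 2, 2, 1, −1, diagonalized by a unitary matrix.

```python
# Sketch (matrix inferred from the A + I shown above): a Hermitian matrix
# is unitarily diagonalizable with real eigenvalues.
import numpy as np

A = np.array([[1, -1j,  0, 0],
              [1j,  0, -1, 0],
              [0,  -1,  1, 0],
              [0,   0,  0, 2]], dtype=complex)

assert np.allclose(A, A.conj().T)                 # A is Hermitian
eigvals, U = np.linalg.eigh(A)
print(eigvals)                                    # -> [-1.  1.  2.  2.]
assert np.allclose(U.conj().T @ A @ U, np.diag(eigvals))
```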
We notice that

T(e₁) = ⎡α         uᵗ⎤ ⎡1        ⎤ = ⎡α        ⎤
        ⎣0_{n−1,1}  E⎦ ⎣0_{n−1,1}⎦   ⎣0_{n−1,1}⎦

and

T*(e₁) = ⎡ᾱ  0_{1,n−1}⎤ ⎡1        ⎤ = ⎡ᾱ⎤
         ⎣ū  E*       ⎦ ⎣0_{n−1,1}⎦   ⎣ū⎦,

that is,

‖T(e₁)‖² = (A′X)*(A′X) = [ᾱ  0_{1,n−1}] ⎡α        ⎤ = ᾱα
                                        ⎣0_{n−1,1}⎦

and

‖T*(e₁)‖² = ((A′)*X)*((A′)*X) = [α  uᵗ] ⎡ᾱ⎤ = αᾱ + u*u.
                                        ⎣ū⎦

Since T is normal, ‖T(e₁)‖² = ‖T*(e₁)‖², so that u*u = 0 and hence u = 0.
and

A′(A′)* = ⎡α         0_{1,n−1}⎤ ⎡ᾱ         0_{1,n−1}⎤ = ⎡αᾱ        0_{1,n−1}⎤
          ⎣0_{n−1,1}  E       ⎦ ⎣0_{n−1,1}  E*      ⎦   ⎣0_{n−1,1}  EE*     ⎦

(A′)*A′ = ⎡ᾱ         0_{1,n−1}⎤ ⎡α         0_{1,n−1}⎤ = ⎡ᾱα        0_{1,n−1}⎤.
          ⎣0_{n−1,1}  E*      ⎦ ⎣0_{n−1,1}  E       ⎦   ⎣0_{n−1,1}  E*E     ⎦
we have

U⁻¹A′U = ⎡α 0⎤
         ⎣0 D⎦,

where U is orthogonal or unitary according as Q is orthogonal or unitary.
We cannot expect the same conclusion when V is a real inner product space:
it is clear from the above discussion that one has to ask what the canonical form
of a real normal operator could be, in case it is not diagonal. To answer this
question, we need to establish some results on invariant subspaces of V.
Proof
(i) Let V = U ⊕ U ⊥ and B1 = {e1 , . . . , ek }, B2 = {c1 , . . . , cn−k } be bases of U
and U ⊥ , respectively. Since T (ei ) ∈ U , for any i = 1, . . . , k, then the matrix
A of T with respect to the basis B1 ∪ B2 for V is
A = ⎡A₁         E ⎤
    ⎣0_{n−k,k}  A₂⎦,

where 0_{n−k,k} is the zero matrix in M_{n−k,k}(F), A₁ ∈ M_k(F), A₂ ∈ M_{n−k}(F),
E ∈ M_{k,n−k}(F). Here, we denote by α_{ij} ∈ F and η_{ij} ∈ F the elements of A₁ and
E, respectively. For i = 1, …, k, the coordinate vector of T(eᵢ) consists of the entries
in the i-th column of A.
The matrix A* of T* with respect to the same basis B₁ ∪ B₂ is

A* = ⎡A₁*  0_{k,n−k}⎤
     ⎣E*   A₂*      ⎦,

where 0_{k,n−k} is the zero matrix in M_{k,n−k}(F). For i = 1, …, k, the coordinate
vector of T*(eᵢ) consists of the entries in the i-th column of A₁* and E*, that is,
the conjugates of the entries in the i-th row of A₁ and E.
Since T is normal, ‖T(eᵢ)‖² = ‖T*(eᵢ)‖² for any i, thus

Σ_{i=1}^{k} ‖T(eᵢ)‖² = Σ_{i=1}^{k} ‖T*(eᵢ)‖²
that is
Σ_{i=1}^{k} Σ_{j=1}^{k} ᾱ_{ji} α_{ji} = Σ_{i=1}^{k} Σ_{j=1}^{k} ᾱ_{ij} α_{ij} + Σ_{i=1}^{k} Σ_{j=k+1}^{n} η̄_{ij} η_{ij}.

This implies that Σ_{i=1}^{k} Σ_{j=k+1}^{n} η̄_{ij} η_{ij} = 0, that is, each η_{ij} must be zero. Therefore E = 0 and

A = ⎡A₁         0_{k,n−k}⎤
    ⎣0_{n−k,k}  A₂       ⎦.
⟨X, G*(Y)⟩ = ⟨G(X), Y⟩ = ⟨T(X), Y⟩ = ⟨X, T*(Y)⟩
(T|_U)* T|_U = T|_U (T|_U)*.
(v) It is a consequence of the previous result, since U ⊥ is T -invariant.
p(λ) = ∏_{k=1}^{m} (λ − (α_k + iβ_k))(λ − (α_k − iβ_k)) = ∏_{k=1}^{m} (λ² + a_k λ + b_k),

where α_k = −a_k/2 and β_k = ±√(4b_k − a_k²)/2. If A is the matrix of T, then by the Cayley-Hamilton Theorem,
0 = p(A) = ∏_{k=1}^{m} (A² + a_k A + b_k I).
T(Y) = μ₁T(X) + μ₂T²(X) = μ₁T(X) + μ₂(−a_j T(X) − b_j X) ∈ W
where each Aᵢ is either a real number or a 2 × 2 block of the form

⎡α −β⎤    (6.20)
⎣β  α⎦
Proof Firstly, we suppose there exists an orthogonal matrix U ∈ Mₙ(F) such that

A′ = U⁻¹AU = ⎡A₁           ⎤
             ⎢    A₂       ⎥
             ⎢       ⋱     ⎥
             ⎣          A_k⎦,
where each Aᵢ has the form described in the statement of the theorem. In this case,
each Aᵢ commutes with its transpose, that is, each Aᵢ is normal. Hence the fact that A′
commutes with its transpose follows by easy computations. Thus A′ is normal, and so is A.
Assume now that A is a normal matrix. In case n = 1 it is trivial.
We now prove the result for n = 2. Write
A = ⎡a b⎤
    ⎣c d⎦
and

‖T*(X₁)‖² = (AᵗX₁)ᵗ(AᵗX₁) = a² + b², since AᵗX₁ = ⎡a c⎤⎡1⎤ = ⎡a⎤.
                                                  ⎣b d⎦⎣0⎦   ⎣b⎦

If b < 0 the result is proved. If b > 0, we compute the matrix A′ of T with respect
to the orthonormal basis {X₁, −X₂}. It is

A′ = ⎡a −b⎤
     ⎣b  a⎦

as required.
Hence we can suppose in what follows n ≥ 3 and prove the result by induction
on n. Assume that the theorem holds for any normal matrix having order less than n.
Let U be a T-invariant subspace of V having dimension 1 or 2. If dim_R U = 1,
then any vector in U with norm 1 is an orthonormal basis of U and the matrix A₁
of T|_U has order 1. If dim_R U = 2, then T|_U is a normal operator on U (see Lemma
6.87) and the matrix A₁ of T|_U has the form (6.20) with respect to an orthonormal
basis C₁ of U. Since T|_{U⊥} is also a normal operator on U⊥ (see again Lemma 6.87),
by induction hypothesis there exists an orthonormal basis C2 of U ⊥ with respect to
which the matrix A2 of T|U ⊥ has the desired form.
Therefore, considering the basis C1 ∪ C2 for V , the matrix A of T with respect
to C1 ∪ C2 is
A′ = ⎡A₁  0⎤
     ⎣0  A₂⎦,

which has the desired form.

Now let A′ = diag(A₁, …, A_k) be the block diagonal form of the matrix of a normal operator T as in Theorem 6.89.
Then each 1 × 1 block Aᵢ is precisely a real eigenvalue of T, and any 2 × 2 block
of the form (6.20) corresponds to the pair of complex conjugate eigenvalues
α + iβ, α − iβ of T.
Taking conjugates in AX = λX, we get A̅X̅ = λ̄X̅ and, since A is real, A̅X̅ = AX̅.
Thus, AX̅ = λ̄X̅ as desired.
Proof It follows from the fact that λ ≠ λ̄ (since λ is not real). Hence, X and X̅
correspond to the distinct eigenvalues λ and λ̄, respectively.
(c) Let X ∈ Cⁿ be an eigenvector of T. Since X is a complex vector, it can be written in terms of two real vectors, x ∈ Rⁿ the real part and y ∈ Rⁿ the imaginary part of X, that is, X = x + iy. Then x, y are orthogonal vectors and ‖x‖² = ‖y‖².
Proof Since X, X̅ are orthogonal, we have 0 = (X̅)*X = XᵗX = (x + iy)ᵗ(x + iy) = ‖x‖² − ‖y‖² + 2i xᵗy, which implies both ‖x‖² = ‖y‖² and xᵗy = 0, as required.
(d) Let Z, Y ∈ Cⁿ be such that both Z*Y = 0 and (Z̅)*Y = 0. If Z = a + ib and
Y = c + id, for a, b, c, d ∈ V, then aᵗc = aᵗd = bᵗc = bᵗd = 0, that is, the
real and imaginary parts of Z are orthogonal to both the real and imaginary parts
of Y.
Proof By Z*Y = 0 we get (a − ib)ᵗ(c + id) = 0, that is,

aᵗc + bᵗd = 0,  aᵗd − bᵗc = 0.   (6.21)

Analogously, by (Z̅)*Y = 0, it follows (a + ib)ᵗ(c + id) = 0, that is,

aᵗc − bᵗd = 0,  aᵗd + bᵗc = 0.   (6.22)

Adding and subtracting (6.21) and (6.22), we obtain aᵗc = bᵗd = aᵗd = bᵗc = 0.
(f) If X = x + iy is a complex eigenvector of T having length equal to 1, then both
√2x and √2y have length equal to 1. Indeed,

1 = ‖X‖² = ‖x‖² + ‖y‖² = 2‖x‖² = 2‖y‖².
We are now ready to construct the required orthonormal basis for V. Let α1 , . . . , αr
be the real eigenvalues of T, w1 , . . . , wr real eigenvectors of T corresponding
to α₁, …, α_r, respectively, and let W = ⟨w₁, …, w_r⟩. By standard computations,
we obtain an orthonormal basis for W. Let {z₁, …, z_r} be such a basis. Let
we obtain an orthonormal basis for W. Let {z 1 , . . . , zr } be such a basis. Let
[a₁, b₁, 0, …, 0]   ((n − 2) zeros),

T(√2y₁) = A(√2y₁) = √2(−b₁x₁ + a₁y₁), having coordinates in terms of B′:

[−b₁, a₁, 0, …, 0]   ((n − 2) zeros),

T(√2x₂) = A(√2x₂) = √2(a₂x₂ + b₂y₂), having coordinates in terms of B′:

[0, 0, a₂, b₂, 0, …, 0]   ((n − 4) zeros),

T(√2y₂) = A(√2y₂) = √2(−b₂x₂ + a₂y₂), having coordinates in terms of B′:

[0, 0, −b₂, a₂, 0, …, 0]   ((n − 4) zeros).

More generally, for any j = 1, …, k: T(√2xⱼ) = A(√2xⱼ) = √2(aⱼxⱼ + bⱼyⱼ),
having coordinates in terms of B′:

[0, …, 0, aⱼ, bⱼ, 0, …, 0]   ((2j − 2) zeros before, (n − 2j) zeros after),

T(√2yⱼ) = A(√2yⱼ) = √2(−bⱼxⱼ + aⱼyⱼ), having coordinates in terms of B′:

[0, …, 0, −bⱼ, aⱼ, 0, …, 0]   ((2j − 2) zeros before, (n − 2j) zeros after).
[0, …, 0, α_h, 0, …, 0]   ((2k + h − 1) zeros before, (n − 2k − h) zeros after),
where

Aⱼ = ⎡0   −αⱼ⎤
     ⎣αⱼ    0⎦
The eigenvalues are 3i, −3i, 0. For λ = 3i, the corresponding eigenspace is W₁ =
⟨(−6i − 2, 3i − 4, 5)⟩. So we obtain an eigenvector X₁ ∈ C³ having length equal to
1, such that W₁ = ⟨X₁⟩:

X₁ = ((−6i − 2)/√90, (3i − 4)/√90, 5/√90) = (−2/√90, −4/√90, 5/√90) + i(−6/√90, 3/√90, 0).
Let A′ be the matrix of T with respect to B′. Then the column coordinate vectors
of A′ are the images of the elements of B′. By computations, we get

T(√2x₂) = 3√2y₂,  T(√2y₂) = −3√2x₂,  T(X₃) = 0,
as expected.
Exercises
with respect to the canonical basis of C⁴. Determine a unitary basis for C⁴ with
respect to which the matrix of T has diagonal form.
4. Let T : R4 → R4 be the normal (skew-symmetric) operator having matrix
A = ⎡ 0 −2 0 2⎤
    ⎢ 2  0 0 2⎥
    ⎢ 0  0 0 0⎥
    ⎣−2 −2 0 0⎦
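For Exercise 4, the following sketch (ours, not from the text; it uses SciPy's real Schur decomposition) illustrates the canonical form just discussed: the eigenvalues of a real skew-symmetric matrix are purely imaginary or zero, and the real Schur form exhibits the 2 × 2 blocks of type (6.20) with α = 0.

```python
# Sketch (not from the text): block form of a real skew-symmetric matrix.
import numpy as np
from scipy.linalg import schur

A = np.array([[ 0., -2., 0., 2.],
              [ 2.,  0., 0., 2.],
              [ 0.,  0., 0., 0.],
              [-2., -2., 0., 0.]])

print(np.linalg.eigvals(A))      # purely imaginary conjugate pairs and zeros
T, Z = schur(A, output='real')   # Z orthogonal, T block upper triangular
print(np.round(T, 6))            # for a normal A, T is in fact block diagonal
assert np.allclose(Z @ T @ Z.T, A)
```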
This chapter is devoted to the study of the properties of bilinear and quadratic forms
defined on a vector space V over a field F. The main goal will be the construction of
appropriate methods aimed at obtaining the canonical expression of such functions,
in terms of suitable bases for V. To do this, we will introduce the concept of orthogonality
with respect to a bilinear form and mostly make use of the Gram-Schmidt
orthogonalization process. Unless otherwise stated, any vector space V here is a finite
dimensional vector space over F.
Example 7.2 Let B = {b1 , b2 } be a basis for R2 and C = {c1 , c2 , c3 } a basis for R3 .
Let f : R2 × R3 −→ R be the function defined by
f((x₁, x₂), (y₁, y₂, y₃)) = x₁(y₁ + y₂) + x₂(y₁ − y₃),
where (x₁, x₂) and (y₁, y₂, y₃) are the coordinate vectors of any v ∈ R² and w ∈ R³
in terms of B and C, respectively. Then f is a bilinear form on R² × R³. In fact,
for any α, β ∈ R, v₁ = (x₁, x₂), v₂ = (x₁′, x₂′) ∈ R² and w₁ = (y₁, y₂, y₃), w₂ = (y₁′, y₂′, y₃′) ∈ R³, it is easy to see that

f(α(x₁, x₂) + β(x₁′, x₂′), (y₁, y₂, y₃)) = α f((x₁, x₂), (y₁, y₂, y₃)) + β f((x₁′, x₂′), (y₁, y₂, y₃))

and

f((x₁, x₂), α(y₁, y₂, y₃) + β(y₁′, y₂′, y₃′)) = α f((x₁, x₂), (y₁, y₂, y₃)) + β f((x₁, x₂), (y₁′, y₂′, y₃′)).
If we consider the coefficient matrix A = (a_{ij}), where a_{ij} = f(bᵢ, cⱼ), for any
1 ≤ i ≤ 2 and 1 ≤ j ≤ 3, then it is easy to see that f(v, w) = ([v]_B)ᵗ A [w]_C. Thus

A = ⎡1 1  0⎤
    ⎣1 0 −1⎦.
Since
f (d1 , e1 ) = 3, f (d1 , e2 ) = −1, f (d1 , e3 ) = 3;
in terms of the bases B = {b1 , b2 , b3 } = {(1, 1, 0), (0, 0, 1), (0, 1, 1)} for R3 and
C = {c1 , c2 } = {(1, 1), (0, 1)} for R2 . Then f can be expressed as follows
f((x₁, x₂, x₃), (y₁, y₂)) = [x₁ x₂ x₃] ⎡1 1⎤
                                       ⎢1 0⎥ ⎡y₁⎤
                                       ⎣2 1⎦ ⎣y₂⎦
                          = x₁(y₁ + y₂) + x₂y₁ + x₃(2y₁ + y₂).
We now obtain the matrix of f in terms of different ordered bases for V and W.
Let D = {d1 , d2 , d3 } = {(0, 1, 0), (0, 1, 1), (2, 0, 1)} be a basis for R3 and E =
{e1 , e2 } = {(1, 1), (2, 1)} be a basis for R2 . In order to obtain the matrix relative
to the bases D and E, one has to compute the coefficients f (di , e j ). The first
step is to determine the coordinate vectors of d1 , d2 , d3 in terms of B, and e1 , e2
in terms of C: d₁ = [0, 1, 0]ᵗ = [0, −1, 1]ᵗ_B, d₂ = [0, 1, 1]ᵗ = [0, 0, 1]ᵗ_B, d₃ = [2, 0, 1]ᵗ = [2, 3, −2]ᵗ_B, and e₁ = [1, 1]ᵗ = [1, 0]ᵗ_C, e₂ = [2, 1]ᵗ = [2, −1]ᵗ_C.
Thus

f(d₁, e₁) = [0 −1 1] ⎡1 1⎤ ⎡1⎤ = 1,
                     ⎢1 0⎥ ⎣0⎦
                     ⎣2 1⎦

f(d₁, e₂) = [0 −1 1] ⎡1 1⎤ ⎡ 2⎤ = 1,
                     ⎢1 0⎥ ⎣−1⎦
                     ⎣2 1⎦
We now consider two different ordered bases B = {b₁, …, bₙ} and B′ = {b₁′, …, bₙ′}
for V, as well as two different ordered bases C = {c₁, …, c_m} and C′ = {c₁′, …, c_m′}
for W. In view of the above, the bilinear form f : V × W → F can be represented
by different matrices, in connection with the choice of a basis for V and W. For
instance, let A be the matrix of f with respect to the ordered bases B for V and C
for W, and A′ the matrix of f with respect to the ordered bases B′ for V and C′ for
W. Now let us describe the relationship between the matrices A and A′.
Let P ∈ Mₙ(F) be the transition matrix of B′ relative to B, whose i-th column is
the coordinate vector [bᵢ′]_B, and Q ∈ M_m(F) be the transition matrix of C′ relative to
C, whose i-th column is the coordinate vector [cᵢ′]_C. We recall that, for any vectors
v ∈ V and w ∈ W, the following hold: [v]_B = P[v]_{B′} and [w]_C = Q[w]_{C′}. Thus

f(v, w) = [v]ᵗ_B A [w]_C
        = (P[v]_{B′})ᵗ A (Q[w]_{C′})
        = [v]ᵗ_{B′} (PᵗAQ) [w]_{C′}.
On the other hand, f(v, w) = [v]ᵗ_{B′} A′ [w]_{C′} and, by the uniqueness of A′ in terms of
the ordered bases B′ and C′, we get A′ = PᵗAQ.
in terms of the ordered bases B = {b1 , b2 , b3 } = {(1, 1, 0), (0, 0, 1), (0, 1, 1)} for
R3 and C = {c1 , c2 } = {(1, 1), (0, 1)} for R2 .
We introduce D = {d1 , d2 , d3 } = {(0, 1, 0), (0, 1, 1), (2, 0, 1)} a basis for R3 and
E = {e1 , e2 } = {(1, 1), (2, 1)} a basis for R2 .
The transition matrices P, Q of D relative to B and that of E relative to C,
respectively, are

P = ⎡ 0 0  2⎤      Q = ⎡1  2⎤
    ⎢−1 0  3⎥ ,        ⎣0 −1⎦
    ⎣ 1 1 −2⎦
so that the matrix A′ of f, in terms of the ordered bases D for R³ and E for R², is

A′ = PᵗAQ = ⎡0 −1  1⎤ ⎡1 1⎤ ⎡1  2⎤   ⎡1 1⎤
            ⎢0  0  1⎥ ⎢1 0⎥ ⎣0 −1⎦ = ⎢2 3⎥
            ⎣2  3 −2⎦ ⎣2 1⎦          ⎣1 2⎦.
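This change-of-basis computation is easy to verify numerically. The sketch below (ours, not part of the text) reproduces A′ = PᵗAQ with the matrices of the example.

```python
# Sketch (not from the text): verifying A' = P^t A Q for the example above.
import numpy as np

A = np.array([[1, 1],
              [1, 0],
              [2, 1]])            # matrix of f w.r.t. the bases B and C
P = np.array([[ 0, 0,  2],
              [-1, 0,  3],
              [ 1, 1, -2]])       # transition matrix of D relative to B
Q = np.array([[1,  2],
              [0, -1]])           # transition matrix of E relative to C

print(P.T @ A @ Q)                # -> [[1 1]
                                  #     [2 3]
                                  #     [1 2]]
```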
Let us now investigate the special case when V = W; in other words, we consider a
bilinear form f : V × V → F. Under this assumption, we may always consider only
one ordered basis B = {e₁, …, eₙ} for V, so that the coefficient matrix A = (a_{ij})
of f is obtained by the computations a_{ij} = f(eᵢ, eⱼ), for any i, j = 1, …, n.
Therefore, if B and B′ are two different bases for V, then f can be represented
by two different matrices: A, the matrix of f with respect to the ordered basis B,
and A′, the matrix of f with respect to the ordered basis B′. In light of the above
argument, A′ = PᵗAP, where P is the transition matrix of B′ relative to B. Notice
that A, A′, P are n × n square matrices and, in particular, P is an invertible matrix.
At this point, we would like to recall the following:
Proof We have already proved one direction: if both A and A′ represent the same
bilinear form f with respect to the ordered bases B and B′, respectively, then there
exists a nonsingular matrix P (which is precisely the transition matrix of B′ relative
to B) such that A′ = PᵗAP.
In order to prove the other direction, we assume that A is the matrix of f in
terms of the ordered basis B = {b₁, …, bₙ}, and suppose that A′ = PᵗAP for
some nonsingular n × n matrix P. Of course, the i-th column vector [u_{1i}, …, u_{ni}]ᵗ
of P can be viewed as a coordinate vector with respect to the ordered basis B. Let
uᵢ = Σ_{j=1}^{n} u_{ji} bⱼ be the vector of V having coordinates [u_{1i}, …, u_{ni}]ᵗ in terms of B.
Hence, for any i = 1, . . . , n, we obtain a sequence of n linearly independent vectors
Proof (i) We firstly assume that f is alternating. Thus, by expanding the relation
f (u + v, u + v) = 0 for any u, v ∈ V, we get
(ii) Consider now the case char(F) ≠ 2 and suppose f is skew-symmetric. Therefore,
f(u, u) = −f(u, u) for any u ∈ V, which implies 2f(u, u) = 0 and hence f(u, u) = 0,
for any u ∈ V; that is, f is alternating.
(iii) Finally, if char (F) = 2, then f is symmetric if and only if f (u, v) = f (v, u) =
− f (v, u) for any u, v ∈ V (since 1 = −1). This last relation holds if and only if f
is skew-symmetric, as desired.
Proof We remark that the same result holds for bilinear skew-symmetric forms, i.e.,
f is skew-symmetric if and only if the matrix A of f is skew-symmetric. Never-
theless, in light of the previous theorem, it is known that a skew-symmetric form is
either symmetric or alternating. Therefore, it is sufficient to prove the result in these
last two cases.
Let B = {e₁, …, eₙ} be any ordered basis for V and A = (a_{ij}) be the matrix of
f in terms of B.
as required.
(ii) Let now f be alternating. Thus, aii = f (ei , ei ) = 0 for any i = 1, . . . , n. More-
over, for any i = j, f (ei + e j , ei + e j ) = 0 implies f (ei , e j ) + f (e j , ei ) = 0, i.e.,
ai j = f (ei , e j ) = − f (e j , ei ) = −a ji and hence A is skew-symmetric.
Conversely, let Aᵗ = −A be such that a_{ii} = 0 for any i = 1, …, n. Hence, for
any u ∈ V, f(u, u) = uᵗAu. As above, since uᵗAu ∈ F is a scalar element,
uᵗAu = (uᵗAu)ᵗ = uᵗAᵗu = −uᵗAu, so that 2f(u, u) = 0.
Hence, in case char(F) ≠ 2, it follows f(u, u) = 0 for any u ∈ V and we are
done.
Finally, let char(F) = 2 and let u = Σ_{i=1}^{n} αᵢeᵢ be any vector of V. Since f(eᵢ, eᵢ) =
a_{ii} = 0 for any i, and f(eᵢ, eⱼ) = a_{ij} = −a_{ji} = −f(eⱼ, eᵢ) = f(eⱼ, eᵢ) for any i ≠ j, it follows that

f(u, u) = f(Σ_{i=1}^{n} αᵢeᵢ, Σ_{i=1}^{n} αᵢeᵢ) = 2 Σ_{i<j} αᵢαⱼ f(eᵢ, eⱼ) = 0,
as required.
= 3x₁y₁ − x₂y₁ + 2x₁y₂ + x₂y₂.

For u = [1, 1]ᵗ and v = [3, −2]ᵗ, we have f(u, v) = 0 but f(v, u) ≠ 0.
Remark 7.14 Of course, if f is either symmetric or alternating, then the orthogo-
nality relation ⊥ is symmetric. In fact, if f (u, v) = f (v, u) (or f (u, v) = − f (v, u))
for any u, v ∈ V, then u ⊥ v if and only if v ⊥ u.
When the orthogonality relation is symmetric, that is, f (u, v) = 0 if and only if
f (v, u) = 0 for all u, v ∈ V, we say that f is reflexive.
Theorem 7.15 Let f : V × V → F be a bilinear form on the vector space V. Then
f is reflexive if and only if f is either symmetric or alternating.
Proof In light of Remark 7.14, we now assume that f is reflexive and prove that
it is either symmetric or alternating. Let x, y, z ∈ V. Of course, f(x, y)f(x, z) = f(x, z)f(x, y). Thus, we have

0 = f(x, y)f(x, z) − f(x, z)f(x, y) = f(x, f(x, y)z − f(x, z)y).   (7.1)
Here, we suppose that f is neither symmetric nor alternating and show that a contradiction
follows. In light of our last assumption, there exist u, v, w ∈ V such that
f(v, w) ≠ f(w, v) and f(u, u) ≠ 0. By (7.4) and f(v, w) ≠ f(w, v), it follows
that f(w, w) = f(v, v) = 0. Moreover, by (7.3) and f(u, u) ≠ 0, we also have both
f(u, v) = f(v, u) and f(u, w) = f(w, u).
On the other hand, for x = v, y = w and z = u in (7.2), it follows

0 = f(v, w)f(u, v) − f(v, u)f(w, v) = f(u, v)(f(v, w) − f(w, v)).   (7.5)

Therefore, since f(v, w) ≠ f(w, v), relations (7.5) and (7.6) say that f(u, v) =
f(v, u) = 0 and f(u, w) = f(w, u) = 0. Hence
and call W ⊥ the f -orthogonal space of W. One may notice that this definition is
equivalent to the one of orthogonal complement in an inner product space. In this
sense, we prefer to use the term f -orthogonal space, and not orthogonal complement
for the set W ⊥ , in order to distinguish the case of metric spaces and the other one of
inner product spaces.
In fact, if W⊥ is the orthogonal complement of the subspace W of an inner product
space V, then W ⊕ W⊥ = V. On the other hand, if W⊥ is simply the f-orthogonal
space of W in the metric space V (i.e., V is equipped with a bilinear form that is not
an inner product), then it may happen that W + W⊥ ≠ V.
= x1 y2 − x2 y1 + x2 y3 − x3 y2 .
If W = ⟨(1, −1, 0)⟩, then W⊥ = ⟨(1, −1, 0), (1, 0, 1)⟩ and W + W⊥ = W⊥ ≠ R³.
= x1 y2 + x2 y1 + x1 y3 + x3 y1 + x2 y3 + x3 y2 .
If W = ⟨(1, 1, 0), (0, 1, 1)⟩, then any pair of vectors u, v ∈ W can be written as

u = α₁(1, 1, 0) + α₂(0, 1, 1),  v = β₁(1, 1, 0) + β₂(0, 1, 1),  α₁, α₂, β₁, β₂ ∈ R.
Hence
f(u, v) = α₁β₁ f((1, 1, 0), (1, 1, 0)) + α₁β₂ f((1, 1, 0), (0, 1, 1))
        + α₂β₁ f((0, 1, 1), (1, 1, 0)) + α₂β₂ f((0, 1, 1), (0, 1, 1))
        = 2α₁β₁ + 3α₁β₂ + 3α₂β₁ + 2α₂β₂
        = [α₁ α₂] ⎡2 3⎤ ⎡β₁⎤
                  ⎣3 2⎦ ⎣β₂⎦.

Thus, the matrix ⎡2 3⎤ represents the restriction of f to W.
                 ⎣3 2⎦
For instance, let u ∈ W have coordinate vector [2, 1, −1]ᵗ in V. Thus
u = 2(1, 1, 0) + (−1)(0, 1, 1). Similarly, if v ∈ W has coordinate vector [1, 4, 3]ᵗ
in V, then v = 1(1, 1, 0) + 3(0, 1, 1). Thus, the coordinate vector of u
with respect to the basis B = {(1, 1, 0), (0, 1, 1)} for W is [2, −1]ᵗ. Analogously,
the coordinate vector of v with respect to B is [1, 3]ᵗ.
As vectors of V, we get
f(u, v) = [2 1 −1] ⎡0 1 1⎤ ⎡1⎤
                   ⎢1 0 1⎥ ⎢4⎥ = 13.
                   ⎣1 1 0⎦ ⎣3⎦
As vectors of W,

f(u, v) = [2 −1] ⎡2 3⎤ ⎡1⎤ = 13.
                 ⎣3 2⎦ ⎣3⎦
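The agreement of the two computations can be checked mechanically; the following sketch (ours, not from the text) evaluates f(u, v) both in the ambient space and through the restriction matrix.

```python
# Sketch (not from the text): f(u, v) via the ambient matrix and via the
# restriction matrix of f to W give the same value.
import numpy as np

M  = np.array([[0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])       # matrix of f on R^3 (canonical basis)
MW = np.array([[2, 3],
               [3, 2]])          # matrix of the restriction of f to W

u3, v3 = np.array([2, 1, -1]), np.array([1, 4, 3])   # coordinates in R^3
u2, v2 = np.array([2, -1]),    np.array([1, 3])      # coordinates in W

assert u3 @ M @ v3 == 13
assert u2 @ MW @ v2 == 13
```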
Given a vector space V and a reflexive bilinear form f on V, we define the radical of
f as the subspace Rad(f) = V⊥ = {u ∈ V | f(u, v) = 0 for all v ∈ V} = {u ∈
V | f(v, u) = 0 for all v ∈ V}. The bilinear form f is said to be non-degenerate if
V⊥ = {0}. This is equivalent to saying that f(v, u) = 0, for any v ∈ V, implies u = 0.
In case V⊥ ≠ {0}, we refer to f as a degenerate form.
Similarly, we may define the radical of the restriction f |W , where W is a subspace
of V,
in other words, X ∈ V ⊥ if and only if its coordinate vector [x1 , . . . , xn ]tB , in terms
of B, is a solution of the homogeneous linear system associated with the matrix A.
Thus V ⊥ = {0} if and only if the homogeneous linear system associated with the
matrix A has only the trivial solution. This happens if and only if the rank of A is
equal to n, that is, A is invertible, as required.
Remark 7.21 Using the above argument one may prove that if f : V × V → F
is a reflexive non-degenerate form and W is any subspace of V, then dim(W ) +
dim(W ⊥ ) = dim(V ).
= ½x₁y₂ + ½x₂y₁ + ½x₁y₃ + ½x₃y₁ + ½x₂y₃ + ½x₃y₂.

If we denote W = ⟨(1, 1, 0)⟩, it is easy to see that ⟨(−½, ½, 0), (−1, −1, 1)⟩ is the
f-orthogonal space of W. So we may write W⊥ = ⟨(−½, ½, 0), (−1, −1, 1)⟩. Thus
R³ = W ⊕ W⊥.
Moreover, any pair of vectors u′, v′ ∈ W⊥ can be written as

u′ = (−α₁/2 − α₂, α₁/2 − α₂, α₂),  v′ = (−β₁/2 − β₂, β₁/2 − β₂, β₂),
where (α₁, α₂) and (β₁, β₂) are the coordinates of u′ and v′, respectively (in terms
of the above fixed basis for W⊥). By computations, we have that

f(u′, v′) = −¼α₁β₁ − α₂β₂.

Therefore, the matrix

⎡−¼  0⎤
⎣ 0 −1⎦
Example 7.26 Consider the symmetric form f defined in Example 7.23 and the
basis
B′ = {(1, 1, 0), (−½, ½, 0), (−1, −1, 1)}
in terms of the canonical basis for R³. Notice that f is non-degenerate. The vector
X = [1, 0, −½]ᵗ is f-isotropic; in fact, f(X, X) = 0. Hence R³ is f-isotropic. The
vector Y = [1, 2, 1]ᵗ is f-nonisotropic; in fact, f(Y, Y) = 19.
Let V be equipped with the symmetric bilinear form f. Notice that any inner product
on a real vector space (according to the definition in Chap. 5) is a symmetric bilinear
form. The converse is not generally true. For instance, let f : R2 × R2 → R be a
bilinear form having matrix

A = ⎡−4  2⎤
    ⎣ 2 −2⎦
in terms of the canonical basis for R2 . One can verify that f is symmetric. On the
other hand, for u = [1, 1]t ∈ R2 , we see that f (u, u) = u t Au = −2. Thus, f is not
an inner product (in the sense of our definition in Chap. 5).
In general, as stated in the previous section, if the bilinear form f is reflexive,
one may refer to (V, f) as a metric space. In particular, if f is symmetric, (V, f)
is called a symmetric space. Nevertheless, any symmetric form having the additional
property f(v, v) > 0 for all 0 ≠ v ∈ V is an inner product. Of course, in this case,
we may also refer to (V, f) as an inner product space.
Lemma 7.31 Let F be a field of characteristic different from 2 and (V, f) a symmetric
space. If f ≠ 0 (that is, f is not identically zero on V × V), then there is
w₀ ∈ V such that f(w₀, w₀) ≠ 0.
Proof Assume that f(v, v) = 0 for all v ∈ V. Therefore, for any x₀, y₀ ∈ V, it
follows that f(x₀, x₀) = 0, f(y₀, y₀) = 0 and f(x₀ + y₀, x₀ + y₀) = 0. Therefore,
since the form is symmetric, we get the contradiction

0 = f(x₀ + y₀, x₀ + y₀) = f(x₀, x₀) + 2f(x₀, y₀) + f(y₀, y₀) = 2f(x₀, y₀),

so that f(x₀, y₀) = 0 for all x₀, y₀ ∈ V, that is, f = 0.
Theorem 7.32 Let F be a field of characteristic different from 2 and (V, f ) a sym-
metric space. Then there is an f -orthogonal basis for V.
Here, we would like to describe the classical method for finding an f-orthogonal
basis for a symmetric space (V, f), under the assumption that f(v, v) ≠ 0 for any
nonzero v ∈ V.
Actually, the present method does not differ from the one we have already outlined
in the case of inner product spaces and is usually called the Gram-Schmidt process.
Let B = {b1 , . . . , bn } be a basis of the symmetric space V and A0 the matrix of
f with respect to B. Using the Gram-Schmidt process, we may compute the basis
E = {e1 , . . . , en } for V as follows:
e1 = b1
k−1
f (bk ,ei )
ek = bk − e
f (ei ,ei ) i
for all k = 2, . . . , n;
i=1
that is,
e₁ = b₁,
e₂ = b₂ − (f(b₂, e₁)/f(e₁, e₁)) e₁,
…
By easy computations, one can see that e_k ⊥ eᵢ, for any k ≠ i. Moreover, since f
is nonisotropic, f(e_k, e_k) ≠ 0 for any k = 1, …, n. Hence, E is an f-orthogonal
basis for V. Let U be the transition matrix of E relative to B, so that UᵗA₀U is
diagonal. Then, the column vectors of U are the coordinates of the elements of E in
terms of B. In particular, U has the following form:
U = ⎡1  α₁₂  α₁₃  ⋯  α₁ₙ      ⎤
    ⎢0   1   α₂₃  ⋯  α₂ₙ      ⎥
    ⎢0   0    1   ⋱   ⋮       ⎥ ,  α_{ij} ∈ F.   (7.10)
    ⎢⋮            ⋱  α_{n−1,n}⎥
    ⎣0   0    0   ⋯   1       ⎦
We recall that such a matrix is usually called upper unitriangular, in the sense that it
is an upper triangular matrix having all diagonal coefficients equal to 1.
The matrix A′ of f in terms of E is diagonal, namely

A′ = UᵗA₀U = ⎡a₁₁           ⎤
             ⎢    a₂₂       ⎥   (7.11)
             ⎢        ⋱     ⎥
             ⎣           aₙₙ⎦
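The process just described translates directly into a small program. The following sketch (ours, not from the text; the function name is illustrative) works with the coefficient matrix A₀ of f instead of an inner product, and assumes, as above, that no f(e_k, e_k) vanishes along the way.

```python
# Sketch (not from the text) of the f-Gram-Schmidt process for a
# symmetric bilinear form with matrix A0 (assumes no f(e_k, e_k) = 0).
import numpy as np

def f_gram_schmidt(A0):
    """Return U upper unitriangular with U.T @ A0 @ U diagonal."""
    n = A0.shape[0]
    U = np.eye(n)                  # columns = coordinates of e_1, ..., e_n
    for k in range(1, n):
        for i in range(k):
            ei, bk = U[:, i], U[:, k]
            # subtract the f-projection of b_k onto e_i
            U[:, k] = bk - (bk @ A0 @ ei) / (ei @ A0 @ ei) * ei
    return U

A0 = np.array([[-2., 1., 0.],
               [ 1.,-2., 1.],
               [ 0., 1.,-3.]])     # the matrix used in the example below
U = f_gram_schmidt(A0)
print(np.round(U.T @ A0 @ U, 6))   # -> diag(-2, -1.5, -2.3333...)
```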
Hence, the matrix à of f in terms of the basis B̃ has the following diagonal form:
Ã = ⎡1      ⎤
    ⎢  1    ⎥
    ⎢    ⋱  ⎥
    ⎣      1⎦.
If P̃ is the transition matrix having in its i-th column the coordinates of the vector
ẽᵢ, then P̃ᵗAP̃ = Ã.
Of course, if F is not algebraically closed, one cannot expect the same final result.
In particular, here we describe the case F = R.
Starting from B, we obtain a basis C = {c1 , . . . , cn } for V, as follows:
c_k = (1/√|f(e_k, e_k)|) e_k = (1/√|a_{kk}|) e_k   for all k = 1, …, n.
Hence, the matrix A′ of f in terms of the basis C has the following diagonal form:

A′ = ⎡±1         ⎤
     ⎢   ±1      ⎥
     ⎢      ⋱    ⎥
     ⎣        ±1 ⎦,
where any diagonal (i, i)-entry is equal to +1 or −1 according as a_{ii} is
positive or negative, respectively. Finally, by reordering the vectors in C, we get an
f-orthogonal basis D = {w₁, …, wₙ} for V, with respect to which the matrix A″
of f is
A″ = ⎡1           ⎤
     ⎢  ⋱         ⎥
     ⎢    1       ⎥
     ⎢      −1    ⎥
     ⎢         ⋱  ⎥
     ⎣          −1⎦.
Moreover, if P is the transition matrix having in its i-th column the coordinates of
the vector wᵢ, then PᵗAP = A″.
The Gram-Schmidt process can be essentially viewed as a recursive algorithm,
divided into several steps, which should allow us to obtain a sequence of bases
for V. In any single step of the process, the basis of the sequence consists of vec-
tors obtained as linear combinations of vectors from the basis given at the previous
step. More precisely, at the i-th step, we get a basis Bi = {e1(i) , . . . , en(i) } such that
{e1(i) , . . . , ei(i) , ek(i) } is an f -orthogonal set for any k ≥ i + 1. We stop the process
whenever the basis consists of all f -orthogonal vectors.
Denote by B₀ = {e₁⁽⁰⁾, …, eₙ⁽⁰⁾} the starting basis for V and A₀ = (a_{ij}) the coefficient
matrix of f with respect to B₀. Let us describe in detail any single step:
(i) Step 1:
Set B₁ = {e₁⁽¹⁾, …, eₙ⁽¹⁾}, where e₁⁽¹⁾ = e₁⁽⁰⁾ and

e_k⁽¹⁾ = e_k⁽⁰⁾ − (f(e_k⁽⁰⁾, e₁⁽¹⁾)/f(e₁⁽¹⁾, e₁⁽¹⁾)) e₁⁽¹⁾,  k = 2, …, n.
(ii) Step 2:
Set e₁⁽²⁾ = e₁⁽¹⁾ and e₂⁽²⁾ = e₂⁽¹⁾, and

e_k⁽²⁾ = e_k⁽¹⁾ − (f(e_k⁽¹⁾, e₂⁽²⁾)/f(e₂⁽²⁾, e₂⁽²⁾)) e₂⁽²⁾,  k = 3, …, n.
(iii) Step m:
Set eⱼ⁽ᵐ⁾ = eⱼ⁽ᵐ⁻¹⁾, j = 1, …, m, so that
eⱼ⁽ᵐ⁾ ⊥ eᵢ⁽ᵐ⁾  for all i ≠ j and i, j = 1, …, m,

and

e_k⁽ᵐ⁾ = e_k⁽ᵐ⁻¹⁾ − (f(e_k⁽ᵐ⁻¹⁾, e_m⁽ᵐ⁾)/f(e_m⁽ᵐ⁾, e_m⁽ᵐ⁾)) e_m⁽ᵐ⁾,  k = m + 1, …, n.
f(X, X) = −2x₁² + 2x₁x₂ − 2x₂² + 2x₂x₃ − 3x₃²
        = −(x₁² + (x₁ − x₂)² + (x₂ − x₃)² + 2x₃²) < 0  for all 0 ≠ X ∈ R³.
e₁′ = e₁,  e₂′ = e₂ − (f(e₁, e₂)/f(e₁, e₁)) e₁,  e₃′ = e₃ − (f(e₁, e₃)/f(e₁, e₁)) e₁,

that is,
e₁′ = e₁,  e₂′ = e₂ + ½e₁,  e₃′ = e₃.
Therefore, the transition matrix is

C₁ = ⎡1 ½ 0⎤
     ⎢0 1 0⎥
     ⎣0 0 1⎦
In the second step, we define the following new basis B″ = {e₁″, e₂″, e₃″} for R³:

e₁″ = e₁′,  e₂″ = e₂′,  e₃″ = e₃′ − (f(e₂′, e₃′)/f(e₂′, e₂′)) e₂′,

that is,

e₁″ = e₁′,  e₂″ = e₂′,  e₃″ = e₃′ + ⅔e₂′.

The transition matrix is now

C₂ = ⎡1 0 0⎤
     ⎢0 1 ⅔⎥
     ⎣0 0 1⎦
In order to determine the coordinates of the vectors e₁″, e₂″, e₃″, we compose both
changes of basis. So we obtain the transition matrix C of the final basis B″
relative to the starting canonical basis:

C = C₁C₂ = ⎡1 ½ ⅓⎤
           ⎢0 1 ⅔⎥.
           ⎣0 0 1⎦
Hence, the coordinates of e₁″, e₂″, e₃″ are precisely the columns of C, that is,
B″ = {(1, 0, 0), (½, 1, 0), (⅓, ⅔, 1)}.
A′ is the matrix of f in terms of B″. Finally, we construct the basis for R³ with
respect to which the matrix of f has all nonzero entries equal to ±1. To do this, we
replace each eᵢ″ by eᵢ″/√|f(eᵢ″, eᵢ″)|. In particular, we have

√|f(e₁″, e₁″)| = √2,  √|f(e₂″, e₂″)| = √3/√2,  √|f(e₃″, e₃″)| = √7/√3.
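The diagonalization and the final ±1 normalization of this example can be verified numerically; the sketch below (ours, not part of the text) rescales the columns of C by 1/√|f(eᵢ″, eᵢ″)| and exhibits the signature (0, 3) of this negative definite form.

```python
# Sketch (not from the text): C^t A C is diagonal; rescaling the columns
# of C turns the diagonal entries into -1's.
import numpy as np

A = np.array([[-2., 1., 0.],
              [ 1.,-2., 1.],
              [ 0., 1.,-3.]])
C = np.array([[1., .5, 1/3],
              [0., 1., 2/3],
              [0., 0., 1.]])

D = C.T @ A @ C                        # -> diag(-2, -3/2, -7/3)
S = C / np.sqrt(np.abs(np.diag(D)))    # scale each column
print(np.round(S.T @ A @ S, 6))        # -> diag(-1, -1, -1)
```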
Here, we would like to repeat the previous example, this time with the bilinear form
defined on a complex vector space. More precisely:
Example 7.35 Let f : C3 × C3 → C be the symmetric bilinear form defined by the
matrix

A = ⎡−2  1  0⎤
    ⎢ 1 −2  1⎥
    ⎣ 0  1 −3⎦
A′ is the matrix of f in terms of B″. Since the vector space is defined over an
algebraically closed field, here we may construct a basis for C³ with respect to
which the matrix of f has all nonzero entries equal to 1. To do this, we replace each
eᵢ″ by eᵢ″/√f(eᵢ″, eᵢ″). Thus, we have

√f(e₁″, e₁″) = i√2,  √f(e₂″, e₂″) = i√3/√2,  √f(e₃″, e₃″) = i√7/√3.
So, after m steps, we suppose that B_m = {e₁⁽ᵐ⁾, …, eₙ⁽ᵐ⁾} is the basis for V
such that {e₁⁽ᵐ⁾, …, e_m⁽ᵐ⁾, e_k⁽ᵐ⁾} is an f-orthogonal set, for any k ≥ m + 1, but
f(e_{m+1}⁽ᵐ⁾, e_{m+1}⁽ᵐ⁾) = 0. Nevertheless, if there exists j ≥ m + 2 such that f(eⱼ⁽ᵐ⁾, eⱼ⁽ᵐ⁾) ≠ 0,
we may obtain a new basis B_m′ where e_{m+1}⁽ᵐ⁾ switches places with eⱼ⁽ᵐ⁾. The outcome
achieved enables us to apply the Gram-Schmidt method starting from B_m′.
Therefore, we have to consider the hardest case, when f(eⱼ⁽ᵐ⁾, eⱼ⁽ᵐ⁾) = 0 for
any j ≥ m + 1. Firstly, we notice that if f(eⱼ⁽ᵐ⁾, e_k⁽ᵐ⁾) = 0 for any j ≥ m + 1 and
k ≥ j + 1, then B_m is already an f-orthogonal basis, and we are done. Thus, we
suppose there are j ≥ m + 1 and k ≥ j + 1 such that f(eⱼ⁽ᵐ⁾, e_k⁽ᵐ⁾) = α ≠ 0. Easy
computations show that

f(eⱼ⁽ᵐ⁾ + e_k⁽ᵐ⁾, eⱼ⁽ᵐ⁾ + e_k⁽ᵐ⁾) = 2α ≠ 0.
Once again, we may obtain a new basis B_m′ for V, by replacing eⱼ⁽ᵐ⁾ with eⱼ⁽ᵐ⁾ + e_k⁽ᵐ⁾
in the basis B_m. As above, we can apply the Gram-Schmidt algorithm, by using the
vectors from B_m′. By repeating this argument, we will finally find an f-orthogonal
basis for V.
in terms of the canonical basis B = {e1 , e2 , e3 } for R3 . Notice that, in this case, R3 is
f -isotropic. We firstly proceed to construct a basis B = {e1 , e2 , e3 } for R3 in terms
of which the matrix of f has some nonzero element on the main diagonal. Since
f(e₁, e₁) = 0 and f(e₁, e₂) ≠ 0, we may define e₁′ = e₁ + αe₂, e₂′ = e₂, e₃′ = e₃,
where α ∈ R and f(e₁′, e₁′) = α. It is easy to see that for α = 1 we get the required
condition f(e₁′, e₁′) ≠ 0. Thus, the transition matrix is
C₁ = ⎡1 0 0⎤
     ⎢1 1 0⎥
     ⎣0 0 1⎦
that is,

e₁″ = e₁′,  e₂″ = e₂′ − ½e₁′,  e₃″ = e₃′ − e₁′,

and the corresponding transition matrix is

C₂ = ⎡1 −½ −1⎤
     ⎢0  1  0⎥.
     ⎣0  0  1⎦
The basis B″ = {e₁″, e₂″, e₃″}, in terms of which the matrix of f is precisely A′, is
obtained by the computation

C = C₁ · C₂ = ⎡1 −½ −1⎤
              ⎢1  ½ −1⎥ ,
              ⎣0  0  1⎦
where C is the transition matrix of B″ relative to the starting basis B. Looking at the
columns of C, we have

B″ = {(1, 1, 0), (−½, ½, 0), (−1, −1, 1)}.
Moreover, CᵗAC = A′.
Finally, we construct the basis for R³ with respect to which the matrix of f has
all nonzero entries equal to ±1. To do this, we replace each eᵢ″ by eᵢ″/√|f(eᵢ″, eᵢ″)|.
In particular, we have

√|f(e₁″, e₁″)| = 1,  √|f(e₂″, e₂″)| = ½,  √|f(e₃″, e₃″)| = 1
and obtain the basis
B̃ = {(1, 1, 0), (−1, 1, 0), (−1, −1, 1)}.
Putting Theorem 7.32 and the above orthogonalization process together, we are now
able to state the following:
(Notice that we cannot say that B is orthonormal, because there is the chance that
some diagonal element is zero, that is, f (ei , ei ) = 0 for some ei ∈ B).
Theorem 7.38 Let (V, f ) be a real symmetric vector space. Then there is an f -
orthogonal basis B for V such that the matrix of f in terms of B has the following
form:
⎡1              ⎤
⎢  ⋱            ⎥
⎢    1          ⎥
⎢      −1       ⎥
⎢         ⋱     ⎥
⎢           −1  ⎥
⎢             0 ⎥
⎢               ⋱⎥
⎣                0⎦
The map q is called a quadratic form on V if both the following conditions hold:
(1) the map (7.12) is bilinear;
(2) for any α ∈ F and v ∈ V, q(αv) = α 2 q(v).
It is easy to see that f is symmetric. The map f is called the symmetric bilinear form
associated with q.
Remark 7.40 For the rest of this section, we always assume that char (F) = 2.
= ϕ(u, u) + ϕ(u, v) + ϕ(v, u) + ϕ(v, v), so that

½(q(u + v) − ϕ(u, u) − ϕ(v, v)) = ½(ϕ(u, v) + ϕ(v, u)).   (7.14)
Let now f : V × V → F be defined as f(u, v) = ½(q(u + v) − q(u) − q(v)), for
any u, v ∈ V. By relation (7.14) it is clear that f (u, v) = f (v, u), that is, f is
symmetric, moreover, since ϕ is bilinear, so is also f . Thus, in light of Definition 7.39,
we conclude that q is a quadratic form on V, having f as its associated symmetric
bilinear form.
Remark 7.42 The previous result outlines the fact that the quadratic form q can be
expressed both in terms of its associated symmetric bilinear form f , and in terms of
any bilinear form ϕ having the property ϕ(v, v) = q(v) for all v ∈ V. Nevertheless,
ϕ is not required to be necessarily symmetric. On the other hand, in case we assume
ϕ symmetric, by (7.14) it follows that ϕ = f.
In other words, there is a unique symmetric bilinear form associated with q.
Definition 7.43 Let q : V → F be a quadratic form on V, B = {e1 , . . . , en } a basis
for V and ϕ be any bilinear form associated with q, that is, ϕ(v, v) = q(v) for any
v ∈ V. If C is the matrix of ϕ in terms of the basis B, then q(v) = ϕ(v, v) = v t Cv.
We say that C is a matrix associated with q in terms of the basis B.
In light of the previous remark, there is only one symmetric matrix associated
with a quadratic form in terms of a fixed basis.
Now the question that arises is how we can compute the symmetric form associated
with a given quadratic form. To provide an answer to this question, we prove the
following:
Proof Let ϕ be any bilinear form associated with q, that is, ϕ(v, v) = q(v), for any
v ∈ V. If C is the matrix of ϕ in terms of the basis B, then q(v) = ϕ(v, v) = v t Cv.
Now we define the bilinear form ϕ̃ having matrix C t , that is, ϕ̃(v, w) = v t C t w for
any v, w ∈ V. So we may introduce the quadratic form associated with ϕ̃ as follows:
q̃(v) = v t C t v for any v ∈ V. Since both v t C t v and v t Cv are scalar elements of F,
it is clear that each of them coincides with its transpose. On the other hand, the
transpose of v t C t v is precisely v t Cv (and viceversa). Hence v t C t v = v t Cv, i.e.,
q(v) = q̃(v), for any v ∈ V and
2q(v) = q(v) + q̃(v) = vᵗ(C + Cᵗ)v  ⟹  q(v) = ½vᵗ(C + Cᵗ)v = vᵗ(½(C + Cᵗ))v,
None of the above bilinear maps is symmetric. In order to obtain the symmetric form
associated with q, we may choose arbitrarily one of the above bilinear forms and
compute its matrix in terms of the canonical basis for R². For instance, the matrix of
ϑ is

A = ⎡1 2⎤ ,  so that  Aᵗ = ⎡1 3⎤
    ⎣3 2⎦                  ⎣2 2⎦

and the only symmetric bilinear map ϕ associated with q has matrix

½(A + Aᵗ) = ⎡ 1  5/2⎤ ,
            ⎣5/2  2 ⎦

that is,

ϕ((x₁, x₂), (y₁, y₂)) = x₁y₁ + (5/2)x₁y₂ + (5/2)x₂y₁ + 2x₂y₂.

Of course, we get the same result starting from the matrix of η or ψ.
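The symmetrization step is easy to check in code; the sketch below (ours, not from the text) confirms that the matrix A of ϑ and its symmetric part (A + Aᵗ)/2 represent the same quadratic form.

```python
# Sketch (not from the text): A and its symmetric part represent
# the same quadratic form q(x) = x^t A x.
import numpy as np

A = np.array([[1., 2.],
              [3., 2.]])          # matrix of the (non-symmetric) form theta
S = (A + A.T) / 2                 # -> [[1, 2.5], [2.5, 2]]

rng = np.random.default_rng(2)
x = rng.standard_normal(2)
assert np.isclose(x @ A @ x, x @ S @ x)
```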
where X is the column coordinate vector with respect to B. We say that A is the
matrix of q in terms of the basis B.
Moreover, if B′ is another basis for V, different from B, it is known that f can be
represented also by the matrix A′ = PᵗAP in terms of B′, where P is the transition
matrix of B′ relative to B.
As above, we say that A′ is the matrix of q with respect to the basis B′, and
q(X) = XᵗA′X,
In light of the equivalence between quadratic and symmetric bilinear forms, let us
rephrase Theorems 7.37 and 7.38 as follows:
In other words, if r is the rank of the matrix of q, then it follows that, for any X ∈ V,
having coordinate vector [x1 , . . . , xn ]t in terms of B,
q(X) = XᵗAX = [x₁, …, xₙ] ⎡1          ⎤ ⎡x₁⎤
                          ⎢  ⋱        ⎥ ⎢⋮ ⎥ = x₁² + ⋯ + x_r².
                          ⎢    1      ⎥ ⎣xₙ⎦
                          ⎢      0    ⎥
                          ⎢        ⋱  ⎥
                          ⎣          0⎦
q(X) = XᵗAX = [x₁, …, xₙ] ⎡1              ⎤ ⎡x₁⎤
                          ⎢  ⋱            ⎥ ⎢⋮ ⎥
                          ⎢    1          ⎥ ⎣xₙ⎦
                          ⎢      −1       ⎥
                          ⎢         ⋱     ⎥ = x₁² + ⋯ + x_p² − x_{p+1}² − ⋯ − x_r²,
                          ⎢           −1  ⎥
                          ⎢             0 ⎥
                          ⎢               ⋱⎥
                          ⎣                0⎦
having p ones and r − p negative ones on the main diagonal, where r is the rank of
A (recall that A and D have the same rank, since they are congruent). The ordered
pair ( p, r − p) is called the signature of A.
having p positive ones and r − p negative ones on the main diagonal. In other words,
there exists an invertible matrix P, that is, the transition matrix of B′ relative to B,
such that PᵗDP = D′.
Definition 7.54 Let q : V → R be a real quadratic form, associated with the sym-
metric bilinear form f. The quadratic form q is called:
(1) Positive definite if q(X ) > 0 for all nonzero vectors X ∈ V ; in this case, the
signature is (n, 0).
(2) Negative definite if q(X ) < 0 for all nonzero vectors X ∈ V ; in this case, the
signature is (0, n).
(3) Indefinite if it is neither positive definite nor negative definite, in the sense that
q takes on V both positive and negative values; in this case, the signature is
( p, r − p) with 0 < p < r.
(4) Positive semi-definite if q(X ) ≥ 0 for all X ∈ V, but there is some nonzero vector
X 0 ∈ V so that q(X 0 ) = 0; in this case, the signature is (r, 0).
(5) Negative semi-definite if q(X ) ≤ 0 for all X ∈ V, but there is some nonzero
vector X 0 ∈ V so that q(X 0 ) = 0; in this case, the signature is (0, r ).
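Definition 7.54 can be tested numerically: the signs of the eigenvalues of the symmetric matrix of q give its signature, and hence its definiteness class. The sketch below is ours, not from the text; the classifier function name is illustrative, and the second test matrix is the one shown earlier in this chapter as an example of a symmetric form that is not an inner product.

```python
# Sketch (not from the text): classifying a real quadratic form by the
# eigenvalue signs of its symmetric matrix.
import numpy as np

def classify(A, tol=1e-10):
    ev = np.linalg.eigvalsh(A)         # real, since A is symmetric
    p = int(np.sum(ev >  tol))         # positive eigenvalues
    m = int(np.sum(ev < -tol))         # negative eigenvalues
    if p == len(ev): return 'positive definite'
    if m == len(ev): return 'negative definite'
    if m == 0:       return 'positive semi-definite'
    if p == 0:       return 'negative semi-definite'
    return 'indefinite'

print(classify(np.array([[2., 1.], [1., 2.]])))    # positive definite
print(classify(np.array([[-4., 2.], [2., -2.]])))  # negative definite
```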
Remark 7.56 Let (V, f ) be a finite dimensional metric space over the field R. If f is
a symmetric bilinear form on V, having the additional property that f (v, v) ≥ 0, for
any v ∈ V and f (v, v) = 0 if and only if v = 0, then we say that (V, f ) is an inner
product vector space. This is equivalent to say that the quadratic form q associated
with f is positive definite.
Proof Since D and E are congruent, they represent the same real quadratic form
q : V → R, in terms of two different bases for V. The fact that D and E have the
same rank is proved in Lemma 7.50, namely, rank(D) = rank(E) = r ≤ n. Thus,
there exists a basis B = {e1 , . . . , en } for V such that the matrix of q in terms of B is
D = ⎡α₁²                         ⎤
    ⎢    ⋱                       ⎥
    ⎢      α_p²                  ⎥
    ⎢          −α_{p+1}²         ⎥
    ⎢                  ⋱         ⎥ ,
    ⎢                   −α_r²    ⎥
    ⎢                        0   ⎥
    ⎢                          ⋱ ⎥
    ⎣                           0⎦
Consider now the set S = {e₁, …, e_p, e_{s+1}′, …, eₙ′} and suppose there exist λ₁, …, λ_p,
λ₁′, …, λ_{n−s}′ ∈ R such that

λ₁e₁ + ⋯ + λ_p e_p + λ₁′e_{s+1}′ + ⋯ + λ_{n−s}′eₙ′ = 0.   (7.15)
and

q(u) = q(−λ₁′e_{s+1}′ − ⋯ − λ_{n−s}′eₙ′) = Σ_{i=1}^{n−s} λᵢ′² q(e_{s+i}′) = −Σ_{i=1}^{r−s} λᵢ′² β_{s+i}² ≤ 0.
Σ_{i=1}^{p} λᵢ² αᵢ² = Σ_{i=1}^{r−s} λᵢ′² β_{s+i}² = 0.
That is,

λ₁′e_{s+1}′ + ⋯ + λ_{n−s}′eₙ′ = 0,

implying λⱼ′ = 0 for all j = 1, …, n − s.
Hence the set S is linearly independent, so that the dimension of the vector subspace
spanned by S is equal to p + n − s ≤ n, i.e., p ≤ s.
Proof We firstly recall that A, A′ are congruent if and only if they represent the
same real quadratic form q : V → R, in terms of two different bases for V (see
Theorem 7.7). Moreover, Remark 7.33 ensures that any symmetric matrix, over a
field of characteristic different from 2, is congruent to a diagonal matrix.
In light of these comments, there exist invertible n × n real matrices M, P, Q and
diagonal n × n real matrices D, D′ such that the following relations hold simultaneously:

A′ = MᵗAM,  A = PᵗDP,  A′ = QᵗD′Q.
Of course, by Remark 7.53, we may assume that D and D′ represent the signature
of A and A′, respectively. Therefore, by

QᵗD′Q = A′ = MᵗAM = MᵗPᵗDPM,

it follows

D′ = (Qᵗ)⁻¹MᵗPᵗDPMQ⁻¹ = (PMQ⁻¹)ᵗ D (PMQ⁻¹).

Hence, the diagonal matrices D, D′ are congruent and, by Lemma 7.57, they have
the same signature.
We now prove the other direction of the theorem and suppose that A, A′ have the
same signature. Thus, there exist a diagonal n × n real matrix D (representing the
common signature of A and A′) and two invertible n × n real matrices P, Q such
that D = PᵗAP = QᵗA′Q. Hence

A′ = (Qᵗ)⁻¹PᵗAPQ⁻¹ = (PQ⁻¹)ᵗ A (PQ⁻¹),
with p positive ones and r − p negative ones on the main diagonal. More precisely,
there exists an invertible n × n real matrix Q such that D = Q t AQ.
On the other hand, since A is symmetric, it is orthogonally similar to a diagonal
matrix D having the eigenvalues {λ1 , . . . , λn } of A on the main diagonal, that is,
there exists an orthogonal real n × n matrix P such that
PᵗAP = D′ = ⎡λ₁         ⎤
            ⎢   λ₂      ⎥ .
            ⎢      ⋱    ⎥
            ⎣        λₙ ⎦
where

μᵢ = ⎧ 1/√λᵢ     if λᵢ > 0
     ⎨ 1/√(−λᵢ)  if λᵢ < 0
     ⎩ 1         if λᵢ = 0.
where

ηᵢ = μᵢ²λᵢ = ⎧  1  if λᵢ > 0
             ⎨ −1  if λᵢ < 0
             ⎩  0  if λᵢ = 0.
We conclude this chapter by studying the relationship between the principal subma-
trices of a matrix associated with a quadratic form q and the definiteness of q. Here,
we recall the following well-known objects:
Definition 7.61 Let A be an n × n matrix over F, namely

A = ⎡a₁₁ a₁₂ ⋯ a₁ₙ⎤
    ⎢ ⋮   ⋮  ⋱  ⋮ ⎥ .
    ⎣aₙ₁ aₙ₂ ⋯ aₙₙ⎦
Proof Firstly, we assume that any principal minor of A is positive and show, by
induction on the dimension n, that q is positive definite. Recall that this is equivalent
to saying that any eigenvalue of A is positive.
Of course, in case n = 1, the conclusion is trivial, because A = (a₁₁), where
a₁₁ > 0 and q(X) = a₁₁x₁² > 0 for any nonzero X ∈ V with coordinate vector
[X] = [x₁]ᵗ.
Thus, we assume that the result holds for any quadratic form q : W → R, where
W is a real space of dimension m < n. In other words, we suppose that if any principal
minor of the symmetric real matrix associated with q is positive, then q is positive
definite.
Since a11 is the principal minor of order 1, then a11 > 0. By using this element,
we may apply the orthogonalization process for symmetric spaces. We compute the
following coefficients:
β₁₂ = a₁₂/a₁₁,  β₁₃ = a₁₃/a₁₁,  ……,  β₁ᵢ = a₁ᵢ/a₁₁  for all i ≥ 2
such that

UᵗAU = ⎡a₁₁        0_{1,n−1}⎤   (7.16)
       ⎣0_{n−1,1}  B        ⎦
where
Analogously, we denote by Rᵢ′ the row coordinate vector consisting of the elements
of the i-th row of the product UᵗA. By easy computations, we see that

UᵗA = ⎡R₁          ⎤
      ⎢R₂ − β₁₂R₁  ⎥
      ⎢R₃ − β₁₃R₁  ⎥ .
      ⎢     ⋮      ⎥
      ⎣Rₙ − β₁ₙR₁  ⎦
In other words, R₁′ = R₁ and, for any k ≥ 2, R_k′ = R_k − β₁ₖR₁. Therefore, any k × k
principal submatrix of UᵗA is obtained from the corresponding k × k principal submatrix
of A by a finite number of elementary row operations; in particular, each
operation is the addition of a multiple of one row to another. It is well known that this
type of basic operation has no effect on the determinant. Hence any k × k principal
minor of UᵗA is equal to the corresponding k × k principal minor of A.
By using the same argument, one can see that the matrix Uᵗ(AU) has the same
principal minors as AU. On the other hand, since any matrix has the same principal
minors as its transpose, AU and UᵗAᵗ have the same principal minors. In
summary, UᵗAU, AU, UᵗAᵗ, Aᵗ and A all have the same principal minors.
Looking at (7.16), it is clear that any principal minor of U t AU is the product of
a11 > 0 with a principal minor of B. Therefore, any principal minor of B should be
positive. On the other hand, B is a symmetric matrix of dimension n − 1, therefore,
by induction hypothesis, the eigenvalues {λ1 , . . . , λn−1 } of B are positive. Thus, the
eigenvalues {a11 , λ1 , . . . , λn−1 } of U t AU are positive, so that q is positive definite.
Suppose now that q is positive definite. Let A_k be any principal submatrix of
A of dimension k and Y = (α₁, …, α_k) ∈ Rᵏ, with coordinate vector [Y] =
[α₁, …, α_k]ᵗ. Further, let X ∈ V be such that its coordinate vector is [X] =
[α₁, …, α_k, 0, …, 0]ᵗ ∈ Rⁿ, with n − k zeros. Since q is positive definite, XᵗAX > 0. That is,
XᵗAX = [α₁, …, α_k, 0, …, 0] ⎡A_k        B_{k,n−k}  ⎤ ⎡α₁⎤
                             ⎣C_{n−k,k}  E_{n−k,n−k}⎦ ⎢⋮ ⎥
                                                      ⎢α_k⎥
                                                      ⎢0 ⎥
                                                      ⎢⋮ ⎥
                                                      ⎣0 ⎦

     = [α₁, …, α_k] A_k ⎡α₁⎤
                        ⎢⋮ ⎥ > 0,
                        ⎣α_k⎦
where Bk,n−k ∈ Mk,n−k (R), Cn−k,k ∈ Mn−k,k (R) and E n−k,n−k ∈ Mn−k,n−k (R).
By the arbitrariness of Y = (α₁, …, α_k), it follows that YᵗA_kY > 0 for any nonzero vector
Y ∈ Rᵏ, that is, A_k ∈ M_k(R) is the matrix of a positive definite quadratic form on
Rᵏ. Therefore, any eigenvalue of A_k is positive, and hence the determinant of A_k is
positive, because the determinant of A_k equals the product of all the eigenvalues of
A_k. Thus, any principal minor of A is positive.
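The criterion just proved is easy to exercise numerically. The following sketch (ours, not from the text) checks only the leading principal minors — which, by Sylvester's criterion, already suffices for a symmetric matrix — and compares the outcome with positivity of the eigenvalues.

```python
# Sketch (not from the text): leading principal minors vs. eigenvalues
# for a symmetric real matrix.
import numpy as np

def leading_minors_positive(A):
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

A = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])
print(leading_minors_positive(A))          # True
print(np.all(np.linalg.eigvalsh(A) > 0))   # True: positive definite
```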
Exercises
(a) Show that the equation q(A) = det A defines a quadratic form q on V.
(b) Let W be the subspace of V of matrices of trace 0. Show that the bilinear
form f determined by q is negative
definite on the subspace W.
10. Prove that the matrix A = ⎡0 b⎤ over a field F with char(F) ≠ 2 is congruent to
                              ⎣b 0⎦

A′ = ⎡2b    0 ⎤ . Further, find a nonsingular matrix P over F such that A′ = PᵗAP.
     ⎣0   −b/2⎦
Chapter 8
Sesquilinear and Hermitian Forms
Let C be the complex field, V, W complex vector spaces and V × W the cartesian
product of V and W (as sets). A function f : V × W → C is called sesquilinear
if it is conjugate-linear (complex semi-linear) in the first variable and linear in the
second one, that is,

f(α₁v₁ + α₂v₂, w) = ᾱ₁ f(v₁, w) + ᾱ₂ f(v₂, w)   (8.1)

and

f(v, β₁w₁ + β₂w₂) = β₁ f(v, w₁) + β₂ f(v, w₂)   (8.2)
is a sesquilinear form on Cn × Cn .
Example 8.2 Let A be a complex n × n matrix. The function f : Cn × Cn −→ C
defined as f (X, Y ) = X ∗ AY is a sesquilinear form on Cn × Cn , where X, Y ∈ Cn
and X ∗ denotes the transpose conjugate of the vector X , i.e., X ∗ = (X )t . Here the
vectors X and Y have been identified by column matrices with complex entries.
Let f : V × W → C be a sesquilinear form on V × W, where V and W are finite
dimensional complex vector spaces. Let B = {b1 , . . . , bn } and C = {c1 , . . . , cm }
be ordered bases for V and W, respectively. Let [v] B and [w]C be the coordinate
vectors of v ∈ V in terms of B and w ∈ W in terms of C, respectively. Say [v]_B =
[x₁, …, xₙ]ᵗ and [w]_C = [y₁, …, y_m]ᵗ, i.e., v = Σ_{i=1}^{n} xᵢbᵢ and w = Σ_{j=1}^{m} yⱼcⱼ. We have

f(v, w) = f(Σ_{i=1}^{n} xᵢbᵢ, Σ_{j=1}^{m} yⱼcⱼ) = Σ_{i,j} x̄ᵢyⱼ f(bᵢ, cⱼ).
If we consider the coefficient matrix A = (a_{ij}), where a_{ij} = f(bᵢ, cⱼ), for any
i = 1, …, n and j = 1, …, m, then it is easy to see that f(v, w) = ([v]_B)* A [w]_C.
f(v, w) = f(Σ_{i=1}^{3} xᵢbᵢ, Σ_{j=1}^{4} yⱼcⱼ)
        = i x̄₁y₁ + x̄₁y₂ + (2 + i)x̄₁y₃ + (1 + i)x̄₁y₄ − i x̄₂y₁ + x̄₂y₂ + x̄₂y₃
          + (1 − i)x̄₂y₄ + 3i x̄₃y₁ + 2x̄₃y₂ + 2x̄₃y₃ + 3x̄₃y₄.
As in the case of bilinear forms, we introduce the concept of the quadratic form
associated with a sesquilinear form f. More precisely:
Definition 8.4 Let f : V × V → C be a sesquilinear form. The corresponding
quadratic function Q : V → C is defined as Q(v) = f (v, v), for any v ∈ V.
By relations (8.1) and (8.2), it follows that the corresponding quadratic form Q(v) =
f (v, v) satisfies the following:
and
Q(λu) = |λ|2 Q(u) (8.4)
for any λ ∈ C, u ∈ V.
Moreover, relation (8.3) yields

Q(u + v) = Q(u) + Q(v) + f(u, v) + f(v, u)   (8.5)

for any u, v ∈ V.
Unlike the situation, we have previously described in the case of bilinear forms,
here we may prove the following:
Proposition 8.5 Let f : V × V → C be a sesquilinear form and Q : V → C the
quadratic form defined as Q(v) = f (v, v), for any v ∈ V. Then f is the unique
sesquilinear form corresponding to Q.
Proof In (8.5) we replace v by iv, so it follows that

Q(u + iv) = Q(u) + Q(iv) + f(u, iv) + f(iv, u) = Q(u) + Q(v) + i f(u, v) − i f(v, u).   (8.6)
Multiplying (8.6) by i we get

iQ(u + iv) = iQ(u) + iQ(v) − f(u, v) + f(v, u),

so that, subtracting this relation from (8.5) and dividing by 2,

f(u, v) = ½(Q(u + v) − Q(u) − Q(v)) − (i/2)(Q(u + iv) − Q(u) − Q(v)).   (8.8)
Therefore, f is uniquely determined by the function Q.
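The polarization identity (8.8) is easy to verify numerically; the sketch below (ours, not from the text) recovers a random sesquilinear form f(u, v) = u*Av from its quadratic function Q alone.

```python
# Sketch (not from the text): the polarization identity (8.8) recovers f
# from Q for a form conjugate-linear in the first slot.
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

f = lambda u, v: u.conj() @ A @ v       # conjugate-linear in the first slot
Q = lambda v: f(v, v)

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

lhs = f(u, v)
rhs = 0.5 * (Q(u + v) - Q(u) - Q(v)) - 0.5j * (Q(u + 1j * v) - Q(u) - Q(v))
assert np.allclose(lhs, rhs)
```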
Remark 8.6 To underline the difference between bilinear and sesquilinear cases,
we recall that if Q is a quadratic form associated with a bilinear function f , then f
is uniquely determined only if it is symmetric.
In order to extend the concept of symmetric forms to complex vector spaces, we
introduce the following:
Definition 8.7 Let C be the complex field and V a complex vector space. A sesquilin-
ear form f : V × V → C is called Hermitian if
f (v, w) = f (w, v)
for any v, w ∈ V.
Example 8.8 Let f : C³ × C³ → C be a sesquilinear form on C³ × C³. Let B =
{b₁, b₂, b₃} and C = {c₁, c₂, c₃} be ordered bases for C³. Next suppose that [X]_B
and [Y]_C are the coordinate vectors of v ∈ V and w ∈ V in terms of B and C,
respectively. Say [X]_B = [x₁, x₂, x₃]ᵗ and [Y]_C = [y₁, y₂, y₃]ᵗ, i.e., v = Σ_{i=1}^{3} xᵢbᵢ and
w = Σ_{j=1}^{3} yⱼcⱼ. Let

f(v, w) = f(Σ_{i=1}^{3} xᵢbᵢ, Σ_{j=1}^{3} yⱼcⱼ)
        = x̄₁y₁ + i x̄₁y₂ + (2 + i)x̄₁y₃ − i x̄₂y₁ + x̄₂y₂ + (1 + i)x̄₂y₃
          + (2 − i)x̄₃y₁ + (1 − i)x̄₃y₂ + 2x̄₃y₃.
Easy computations show that f is Hermitian (as well as its associated matrix).
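Reading off the coefficient matrix of Example 8.8 and checking the Hermitian property takes a few lines; the sketch below is ours, not part of the text.

```python
# Sketch (not from the text): the coefficient matrix of Example 8.8 is
# Hermitian, and consequently f(v, v) is real for every v.
import numpy as np

A = np.array([[1,      1j,     2 + 1j],
              [-1j,    1,      1 + 1j],
              [2 - 1j, 1 - 1j, 2     ]], dtype=complex)

assert np.allclose(A, A.conj().T)       # A is Hermitian

rng = np.random.default_rng(1)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
q = v.conj() @ A @ v
assert abs(q.imag) < 1e-12              # f(v, v) is real
```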
In light of the above comments regarding the matrix of a sesquilinear form, it is clear
that any Hermitian form on a complex vector space V is represented by a complex
Hermitian matrix, depending on the choice of the basis for V. Conversely, for any
Hermitian matrix A, the sesquilinear form f : V × V → C associated with A is
Hermitian.
Theorem 8.9 Let f : V × V → C be a sesquilinear form on the complex vector
space V. The function f is a Hermitian form if and only if f (v, v) ∈ R, for any
v ∈ V.
Proof If we assume that f is Hermitian, then for any v ∈ V the scalar f(v, v)
coincides with its own conjugate, which implies that f(v, v) ∈ R.
Conversely, suppose that f (v, v) ∈ R, for any v ∈ V. Thus, for any u, v ∈ V we
also have
implying that
f (u, v) + f (v, u) = α ∈ R. (8.9)
In particular,
f (iu, v) + f (v, iu) = β ∈ R
that is
−i f (u, v) + i f (v, u) = β
and multiplying by i,
f (u, v) − f (v, u) = iβ. (8.10)
f(u, v) = (α + iβ)/2

and

f(v, u) = (α − iβ)/2,
If B = {b₁, …, bₙ} and B′ = {b₁′, …, bₙ′} are two different ordered bases for V,
and C = {c₁, …, c_m}, C′ = {c₁′, …, c_m′} are ordered bases for W, we know that
the sesquilinear form f : V × W → C can be represented by different matrices,
according to the choice of a basis for V and W. Let A be the matrix of f in terms of
the bases B for V and C for W, and A′ the matrix of f in terms of the bases B′ for
V and C′ for W.
By following the same procedure as in the case of bilinear forms, we may establish
the following relationship between A and A′:
A′ = P*AQ
Proof The proof is unchanged with respect to the one of Theorem 7.7.
8.3 Orthogonality
0 = f(x₀, z₀), and since f is Hermitian, f(z₀, x₀) is its conjugate, so f(z₀, x₀) = 0 as well.
To prove our result, we suppose on the contrary f (v, v) = 0 for all v ∈ V. Hence, it
follows that f (x0 , x0 ) = 0, f (z 0 , z 0 ) = 0 and f (x0 + z 0 , x0 + z 0 ) = 0. Therefore,
we get the contradiction
Proof An inspection of the proof of Theorem 7.32 reveals that it applies unchanged;
here we just apply Lemma 8.15 in place of Lemma 7.31. As mentioned above,
the rest of the proof is unchanged.
Remark 8.17 The Gram-Schmidt orthonormalization process, introduced in order
to construct an orthonormal basis for a symmetric space, also applies to the case
of a complex space equipped with a Hermitian form f. No change is needed in
the procedure previously described for symmetric spaces, if not the replacement of
the transpose matrix U t by the conjugate-transpose (adjoint) U ∗ , where U is the
transition matrix.
Moreover, we also recall that, in the case of a complex space V equipped with
a Hermitian form f, if we change to a new orthonormal basis for V, the transition
matrix U will be unitary. In fact, since the matrix A associated with f is Hermitian,
it is unitarily similar to a diagonal matrix. This means that there is an f -orthogonal
292 8 Sesquilinear and Hermitian Forms
basis for V in terms of which the real diagonal matrix A representing f is obtained
by A = U ∗ AU = U −1 AU, where U is an unitary matrix.
Theorem 8.16 and Remark 8.17 will allow us to state the ‘Hermitian’ version of
Theorem 7.38:
in terms of the canonical basis B = {e1 , e2 , e3 } for C3 . In order to apply the orthonor-
malization process, we firstly introduce the following new basis B = {e1 , e2 , e3 } for
C3 :
e₁′ = e₁,  e₂′ = e₂ − (f(e₁, e₂)/f(e₁, e₁)) e₁,  e₃′ = e₃ − (f(e₁, e₃)/f(e₁, e₁)) e₁,

that is,

e₁′ = e₁,  e₂′ = e₂ − ((1 + 2i)/2) e₁,  e₃′ = e₃ − (i/2) e₁,

and the corresponding transition matrix is

C₁ = ⎡1  −(1 + 2i)/2  −i/2⎤
     ⎢0       1         0 ⎥ .
     ⎣0       0         1 ⎦
In the second step, we define

e₁″ = e₁′,  e₂″ = e₂′,  e₃″ = e₃′ − (f(e₂′, e₃′)/f(e₂′, e₂′)) e₂′,

that is,

e₁″ = e₁′,  e₂″ = e₂′,  e₃″ = e₃′ + ((2 − 3i)/5) e₂′,

and the corresponding transition matrix is

C₂ = ⎡1  0      0     ⎤
     ⎢0  1  (2 − 3i)/5⎥ .
     ⎣0  0      1     ⎦
The basis B″ = {e₁″, e₂″, e₃″}, in terms of which the matrix of f is precisely A′, is
obtained by the computation

C = C₁C₂ = ⎡1  −(1 + 2i)/2  −(4 + 3i)/5⎤
           ⎢0       1        (2 − 3i)/5⎥
           ⎣0       0            1     ⎦
where C is the transition matrix of B″ relative to the starting basis B. Looking at the
columns of C and reordering the vectors, we have the following basis for C³:

B″ = {(1, 0, 0), (−(4 + 3i)/5, (2 − 3i)/5, 1), (−(1 + 2i)/2, 1, 0)}.
Finally, we construct the basis for C³ with respect to which the matrix of f has all
nonzero entries equal to ±1. To do this, we replace each vector

e₁″ = (1, 0, 0),  e₂″ = (−(4 + 3i)/5, (2 − 3i)/5, 1),  e₃″ = (−(1 + 2i)/2, 1, 0)

by the corresponding eᵢ″/√|f(eᵢ″, eᵢ″)|. In particular, we have

√|f(e₁″, e₁″)| = √2,  √|f(e₂″, e₂″)| = 3/√5,  √|f(e₃″, e₃″)| = √5/√2.
having p ones and r − p negative ones on the main diagonal, where r is the rank of
A (recall that A and D have the same rank, since they are congruent). The ordered
pair ( p, r − p) is called the signature of A. Moreover, according to Definition 8.12,
f is:
(1) Positive definite if its signature is (n, 0).
(2) Negative definite if its signature is (0, n).
(3) Indefinite if its signature is ( p, r − p), with 0 < p < r.
(4) Positive semi-definite if its signature is (r, 0).
(5) Negative semi-definite if its signature is (0, r ).
To prove Theorems 8.23 and 8.24, it is sufficient to recall that the eigenvalues of any
Hermitian matrix are real numbers. Therefore, we can use the arguments contained in
Theorems 7.58 and 7.59 without any change, after replacing transpose matrices by
adjoint matrices throughout the proofs.
Exercises
In Chap. 7 bilinear and quadratic forms, with various ramifications, have been discussed.
In the present chapter we address an aesthetic concern raised by bilinear
forms and, as a part of this study, the tensor product of vector spaces is introduced.
Further, besides the study of the tensor product of linear transformations, in the
subsequent sections a tensor algebra will be developed, and the chapter concludes
with the study of the exterior algebra, viz. the Grassmann algebra.
Definition 9.1 Let U and V be vector spaces over the same field F. Then tensor
product of U and V is a pair (W, ψ) consisting of a vector space W over F together
with a bilinear map ψ : U × V → W satisfying the following universal property:
for any vector space X over F and any bilinear map f : U × V → X , there exists a
unique linear map h : W → X such that h ◦ ψ = f , that is, the following diagram
         ψ
U × V ──────▶ W
      ╲       │
    f   ╲     │ h
          ╲   ▼
            ▶ X

commutes.
u ⊗ v = ψ(u, v) = ψ(Σ_{i=1}^{m} αᵢuᵢ, Σ_{j=1}^{n} βⱼvⱼ) = Σ_{i=1}^{m} Σ_{j=1}^{n} αᵢβⱼ (uᵢ ⊗ vⱼ).
(iii) As indicated above, for any u ∈ U and v ∈ V, the image of (u, v) under the
universal bilinear pairing into U ⊗ V shall be denoted u ⊗ v ∈ U ⊗ V . In view
of the bilinearity of the pairing (u, v) → u ⊗ v, we find the relations (α1 u 1 +
α2 u 2 ) ⊗ v = α1 (u 1 ⊗ v) + α2 (u 2 ⊗ v) and u ⊗ (β1 v1 + β2 v2 )=β1 (u ⊗ v1 ) +
β2 (u ⊗ v2 ) in U ⊗ V for any α1 , α2 , β1 , β2 ∈ F, u, u 1 , u 2 ∈ U, v, v1 , v2 ∈ V .
(iv) Note that if U = {0} or V = {0} , then U ⊗ V = {0}. Indeed the only bilinear
pairing ψ : U × V → W is the zero pairing and hence the pairing U × V →
{0} yields the tensor product. Thus {0} ⊗ V = {0} and U ⊗ {0} = {0}.
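For U = Rᵐ and V = Rⁿ the universal bilinear pairing can be realized concretely by the outer product, identifying U ⊗ V with the m × n matrices. The sketch below (ours, not from the text) illustrates this and the bilinearity relations of Remark (iii).

```python
# Sketch (not from the text): the outer product as a bilinear pairing
# realizing R^m (tensor) R^n as the m x n matrices.
import numpy as np

u1, u2 = np.array([1., 2.]), np.array([0., 1.])
v1, v2 = np.array([3., 1., 4.]), np.array([1., 0., 2.])

psi = np.outer                    # (u, v) -> u (tensor) v, a 2 x 3 matrix

# bilinearity in the first slot: (2u1 + u2) ⊗ v1 = 2(u1 ⊗ v1) + (u2 ⊗ v1)
assert np.allclose(psi(2 * u1 + u2, v1), 2 * psi(u1, v1) + psi(u2, v1))
# bilinearity in the second slot: u1 ⊗ (v1 + 3v2) = u1 ⊗ v1 + 3(u1 ⊗ v2)
assert np.allclose(psi(u1, v1 + 3 * v2), psi(u1, v1) + 3 * psi(u1, v2))
```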
Example 9.3 Let R[x] and R[y] be vector spaces over R. Then, R[x] ⊗ R[y] =
R[x, y].
Define a map f : R[x] × R[y] → R[x, y] given by f(p(x), q(y)) = p(x)q(y). For
any α, β ∈ R, p₁(x), p₂(x) ∈ R[x], q(y) ∈ R[y] we have

f(αp₁(x) + βp₂(x), q(y)) = (αp₁(x) + βp₂(x))q(y) = α f(p₁(x), q(y)) + β f(p₂(x), q(y)).
Thus f is linear in first slot. Similarly, one can show that f is linear in second
slot and hence f is a bilinear map. Now we claim that, (R[x, y], f ) is the tensor
product of R[x] and R[y]. For this let X be any arbitrary vector space over R and
g : R[x] × R[y] → X be any arbitrary bilinear map. Then we have to construct a
unique homomorphism h : R[x, y] → X such that the following diagram commutes,
i.e., h ◦ f = g.
R[x] × R[y] —f→ R[x, y],  g : R[x] × R[y] → X,  h : R[x, y] → X  (h ∘ f = g)
Define h : R[x, y] → X by h(∑_{i,j≥0} α_{ij} x^i y^j) = ∑_{i,j≥0} α_{ij} g(x^i, y^j), where all sums are finite. Then h is linear: for any α, β ∈ R,

h(α ∑_{i,j≥0} α_{ij} x^i y^j + β ∑_{i,j≥0} β_{ij} x^i y^j) = h(∑_{i,j≥0} (αα_{ij} + ββ_{ij}) x^i y^j)
= ∑_{i,j≥0} (αα_{ij} + ββ_{ij}) g(x^i, y^j)
= α ∑_{i,j≥0} α_{ij} g(x^i, y^j) + β ∑_{i,j≥0} β_{ij} g(x^i, y^j)
= α h(∑_{i,j≥0} α_{ij} x^i y^j) + β h(∑_{i,j≥0} β_{ij} x^i y^j).
Let p(x) = ∑_{i=0}^m αᵢx^i and q(y) = ∑_{j=0}^n βⱼy^j. This implies that p(x)q(y) = ∑_{i=0}^m ∑_{j=0}^n δ_{ij} x^i y^j, where δ_{ij} = αᵢβⱼ, 0 ≤ i ≤ m, 0 ≤ j ≤ n.
Therefore,
(h ∘ f)(p(x), q(y)) = ∑_{i=0}^m ∑_{j=0}^n δ_{ij} g(x^i, y^j) = ∑_{i=0}^m ∑_{j=0}^n αᵢβⱼ g(x^i, y^j)
= ∑_{i=0}^m ∑_{j=0}^n g(αᵢx^i, βⱼy^j)
= g(∑_{i=0}^m αᵢx^i, ∑_{j=0}^n βⱼy^j)
= g(p(x), q(y)).
This implies that h = h₁. Therefore, h is unique and finally we have R[x] ⊗ R[y] = R[x, y]. Here it is to be noted that x and y are independent indeterminates.
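The bilinearity of the pairing in Example 9.3 can also be verified symbolically; the following sympy sketch is an illustration, not part of the formal proof, and the sample polynomials are chosen arbitrarily.

    # Check that f(p, q) = p*q is linear in the first slot.
    import sympy as sp

    x, y, a, b = sp.symbols('x y a b')
    p1, p2, q = 1 + x**2, 3*x, 2*y - y**3

    f = lambda p, q: sp.expand(p * q)      # f : R[x] x R[y] -> R[x, y]
    lhs = f(a*p1 + b*p2, q)
    rhs = sp.expand(a*f(p1, q) + b*f(p2, q))
    assert sp.simplify(lhs - rhs) == 0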
Existence and Uniqueness of Tensor Product of Two Vector Spaces
To prove the existence of tensor product of two vector spaces, we need the notion of
“free vector space over a given set”. Let S be any set and F a field. A pair (V, f), where V is a vector space over F and f : S → V is a function, is called a free vector space over F on the set S if for any arbitrary vector space X over F and any arbitrary function g : S → X, there exists a unique linear map h : V → X such that the following diagram commutes, i.e., h ∘ f = g.
S —f→ V,  g : S → X,  h : V → X  (h ∘ f = g)
It can be easily shown that, given any set S and a field F, there always exists a free vector space over F on S. If (V, f) is a free vector space on S, then f is one-to-one and L(f(S)) = V. Moreover, every vector space can be realized as a quotient space of a free vector space. If (V, f) is a free vector space over F on S, then as a convention we say that V is a free vector space on S; the associated map f and the field F are assumed to be understood.
Theorem 9.4 Let U and V be vector spaces over a field F. Then their tensor product
U ⊗ V exists and it is unique up to isomorphism.
Proof Let (W, f) be a free vector space over F on the set U × V. Now consider T, the subspace of W generated by the elements f(α₁u₁ + α₂u₂, v) − α₁f(u₁, v) − α₂f(u₂, v) and f(u, β₁v₁ + β₂v₂) − β₁f(u, v₁) − β₂f(u, v₂), for all elements u, u₁, u₂ ∈ U, v, v₁, v₂ ∈ V and α₁, α₂, β₁, β₂ ∈ F. Now construct the quotient space W/T. It is obvious that there exists the quotient homomorphism q : W → W/T. Set f̄ = q ∘ f : U × V → W/T. We show that f̄ is a bilinear map from U × V to W/T;

f̄(α₁u₁ + α₂u₂, v) − α₁f̄(u₁, v) − α₂f̄(u₂, v)
= q[f(α₁u₁ + α₂u₂, v)] − α₁q[f(u₁, v)] − α₂q[f(u₂, v)]
= q[f(α₁u₁ + α₂u₂, v) − α₁f(u₁, v) − α₂f(u₂, v)]
= 0.

Hence

f̄(α₁u₁ + α₂u₂, v) = α₁f̄(u₁, v) + α₂f̄(u₂, v), for all α₁, α₂ ∈ F, u₁, u₂ ∈ U, v ∈ V.

This shows that f̄ is linear in the first coordinate. Similarly, it can be seen that f̄ is also linear in the second coordinate. Thus, f̄ is a bilinear map. Now, we claim that (W/T, f̄) is a tensor product of U and V. For this let g : U × V → X be any arbitrary bilinear map, where X is any vector space. Then we have to produce a unique linear map h̄ : W/T → X such that the following diagram commutes, i.e., h̄ ∘ f̄ = g.
U × V —f̄→ W/T,  g : U × V → X,  h̄ : W/T → X  (h̄ ∘ f̄ = g)
Since W is a free vector space over F on U × V, for the map g : U × V → X there exists a unique linear map h : W → X such that the following diagram commutes, i.e., h ∘ f = g.
U × V —f→ W,  g : U × V → X,  h : W → X  (h ∘ f = g)
h ( f (α1 u 1 + α2 u 2 , v) − α1 f (u 1 , v) − α2 f (u 2 , v))
= g(α1 u 1 + α2 u 2 , v) − α1 g(u 1 , v) − α2 g(u 2 , v)
= 0.
Factoring φ through the universal property of W, we obtain a unique linear map h : W → W′:

U × V —ψ→ W,  φ : U × V → W′,  h : W → W′,

then h ∘ ψ = φ.
In a similar way, now factor ψ through the universal property of W′:
U × V —φ→ W′,  ψ : U × V → W,  h′ : W′ → W  (h′ ∘ φ = ψ).

Composing the two factorizations gives (h′ ∘ h) ∘ ψ = h′ ∘ φ = ψ, while also I_W ∘ ψ = ψ; by the uniqueness in the universal property of W, therefore, h′ ∘ h = I_W.
For any vector space X and any bilinear map f : U × V → X there is a unique linear map h with

U × V —ψ→ U ⊗ V,  f : U × V → X,  h : U ⊗ V → X  (h ∘ ψ = f).

In particular, taking X = ⟨ψ(U × V)⟩ and f the corestriction of ψ to ⟨ψ(U × V)⟩, there is a unique linear map h : U ⊗ V → ⟨ψ(U × V)⟩ with h ∘ ψ = f; and taking X = U ⊗ V and f = ψ itself, the identity I : U ⊗ V → U ⊗ V is the unique linear map with I ∘ ψ = ψ.
But we have

U ⊗ V —h→ ⟨ψ(U × V)⟩ —i→ U ⊗ V

and

(i ∘ h) ∘ ψ = i ∘ (h ∘ ψ) = i ∘ f = f = ψ.
This shows that the homomorphism i ∘ h also makes the latter diagram commutative. Hence, the uniqueness of I in that diagram guarantees that i ∘ h = I. Since I is bijective and i ∘ h = I, the map i : ⟨ψ(U × V)⟩ → U ⊗ V is surjective, i.e., i(⟨ψ(U × V)⟩) = U ⊗ V. But since i is the inclusion map, the latter relation yields that ⟨ψ(U × V)⟩ = U ⊗ V and hence ψ(U × V) generates U ⊗ V.
(ii) ⇒ (i) Let ψ : U × V → U ⊗ V be a bilinear function such that ⟨ψ(U × V)⟩ = U ⊗ V and for any vector space X over F and any bilinear map f : U × V →
Theorem 9.6 Let U, V be vector spaces over a field F with the ordered bases {u₁, u₂, …, u_m} and {v₁, v₂, …, vₙ}, respectively. Then any bilinear mapping f : U × V → W is uniquely determined by the values f(uᵢ, vⱼ) ∈ W, 1 ≤ i ≤ m, 1 ≤ j ≤ n; conversely, if elements wᵢⱼ ∈ W are arbitrarily given, then there exists a unique bilinear mapping f : U × V → W satisfying f(uᵢ, vⱼ) = wᵢⱼ.
f(u, v) = ∑_{i=1}^m ∑_{j=1}^n αᵢβⱼ f(uᵢ, vⱼ).
m
we find that αu + α u = (ααi + α αi )u i and
i=1
m
n
f (αu + α u , v) = f (ααi + α αi )u i , βjvj
i=1 j=1
m
n
= (ααi + α αi ) β j f (u i , v j )
i=1 j=1
m n
= (ααi + α αi )β j wi j
i=1 j=1
m n
m
n
=α αi β j wi j + α αi β j wi j
i=1 j=1 i=1 j=1
= α f (u, v) + α f (u , v).
Theorem 9.8 Let U and V be vector spaces over a field F and let B1 = {u i | i ∈ I }
and B2 = {v j | j ∈ J } be bases of U and V, respectively. Then the set B = {u i ⊗
v j |i ∈ I, j ∈ J } is a basis for U ⊗ V.
Proof To show that B is linearly independent, suppose that ∑_{i=1}^m ∑_{j=1}^n αᵢⱼ(uᵢ ⊗ vⱼ) = 0. This can be rewritten
∑_{i=1}^m uᵢ ⊗ (∑_{j=1}^n αᵢⱼvⱼ) = 0,

and hence application of Lemma 9.7 yields that ∑_{j=1}^n αᵢⱼvⱼ = 0 for all i and hence
αᵢⱼ = 0 for all i and j. Now to show that B spans U ⊗ V, let u ⊗ v be an arbitrary element of U ⊗ V. Then since u = ∑_{i=1}^m αᵢuᵢ and v = ∑_{j=1}^n βⱼvⱼ, we find that

u ⊗ v = (∑_{i=1}^m αᵢuᵢ) ⊗ (∑_{j=1}^n βⱼvⱼ) = ∑_{i=1}^m ∑_{j=1}^n αᵢβⱼ (uᵢ ⊗ vⱼ).
Hence, any sum of elements of the form u ⊗ v is a linear combination of the vectors
u i ⊗ v j , as desired.
Corollary 9.9 If U and V are finite dimensional vector spaces over a field F, then
dim(U ⊗ V ) = dim(U )dim(V ).
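In the coordinate model used in the sketches above, Corollary 9.9 can be checked numerically: the mn Kronecker products of basis vectors of R^m and R^n are linearly independent, so dim(U ⊗ V) = dim(U) dim(V). (A numpy sketch; matrix_rank serves as the linear-independence test.)

    import numpy as np

    m, n = 3, 4
    basis_U = np.eye(m)
    basis_V = np.eye(n)
    # All Kronecker products u_i ⊗ v_j, stacked as rows.
    B = np.array([np.kron(ui, vj) for ui in basis_U for vj in basis_V])
    assert np.linalg.matrix_rank(B) == m * n   # the u_i ⊗ v_j form a basis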
If U and V are finite dimensional vector spaces, then one can describe the linear functionals on U ⊗ V. The proof of the following theorem illustrates that, up to isomorphism, any linear functional on the tensor product is a tensor product of linear functionals.
Theorem 9.10 Let U and V be finite dimensional vector spaces. Then

U∗ ⊗ V∗ ≅ (U ⊗ V)∗

via the isomorphism h : U∗ ⊗ V∗ → (U ⊗ V)∗ given by h(σ ⊗ τ)(u ⊗ v) = σ(u)τ(v), for every σ ⊗ τ ∈ U∗ ⊗ V∗ and for every u ⊗ v ∈ U ⊗ V.
U × V —f→ U ⊗ V,  g′ : U × V → F,  h_{σ,τ} : U ⊗ V → F  (h_{σ,τ} ∘ f = g′)
i.e., h_{σ,τ}(f(u, v)) = g′(u, v), or h_{σ,τ}(u ⊗ v) = g′(u, v), and hence h_{σ,τ}(u ⊗ v) = σ(u)τ(v). This shows that h_{σ,τ} ∈ (U ⊗ V)∗. As for each fixed σ ∈ U∗ and each fixed τ ∈ V∗ we have a unique linear functional h_{σ,τ} on U ⊗ V, we may define a map g : U∗ × V∗ → (U ⊗ V)∗ such that g(σ, τ) = h_{σ,τ}. By the above arguments, g is well defined. Here, for any α, β ∈ F, σ₁, σ₂ ∈ U∗,
g(ασ1 + βσ2 , τ )(u ⊗ v) = (h ασ1 +βσ2 ,τ )(u ⊗ v)
= (ασ1 + βσ2 )(u)τ (v)
= (ασ1 (u) + βσ2 (u))τ (v)
= ασ1 (u)τ (v) + βσ2 (u)τ (v)
= α(h σ1 ,τ (u ⊗ v)) + β(h σ2 ,τ (u ⊗ v))
= (αh σ1 ,τ )(u ⊗ v) + (βh σ2 ,τ )(u ⊗ v)
= (αh σ1 ,τ + βh σ2 ,τ )(u ⊗ v)
= (αg(σ₁, τ) + βg(σ₂, τ))(u ⊗ v).
U∗ × V∗ —f′→ U∗ ⊗ V∗,  g : U∗ × V∗ → (U ⊗ V)∗,  h : U∗ ⊗ V∗ → (U ⊗ V)∗  (h ∘ f′ = g).

Let {u₁, u₂, …, u_m} and {v₁, v₂, …, vₙ} be bases of U and V, with dual bases {û₁, û₂, …, û_m} and {v̂₁, v̂₂, …, v̂ₙ}. We know that
{uᵢ ⊗ vⱼ, 1 ≤ i ≤ m, 1 ≤ j ≤ n} and {ûᵢ ⊗ v̂ⱼ, 1 ≤ i ≤ m, 1 ≤ j ≤ n} are bases of U ⊗ V and U∗ ⊗ V∗, respectively. Then

h(ûᵢ ⊗ v̂ⱼ)(u_ℓ ⊗ v_μ) = ûᵢ(u_ℓ)v̂ⱼ(v_μ) = δ_{i,ℓ}δ_{j,μ} = δ_{(i,j),(ℓ,μ)},

i.e., h(ûᵢ ⊗ v̂ⱼ) is the dual basis vector corresponding to uᵢ ⊗ vⱼ. In this way, we have proved that the linear map h sends a basis of U∗ ⊗ V∗ to a basis of (U ⊗ V)∗, and hence h is a bijective map. Thus h becomes an isomorphism and we have U∗ ⊗ V∗ ≅ (U ⊗ V)∗.
Proof Let B₁ and B₂ be bases for V₁ and V₂, respectively. We first show that μ(B₁ × B₂) spans W. Let y ∈ W. Since μ(V₁ × V₂) spans W, we can write y = ∑_{j=1}^r ∑_{k=1}^s a_{jk} μ(z_{1j}, z_{2k}), where z_{1j} ∈ V₁, z_{2k} ∈ V₂. But since B₁ is a basis for V₁, z_{1j} = ∑_{ℓ=1}^t b_{jℓ} x_{1ℓ}, where x_{1ℓ} ∈ B₁. Similarly, z_{2k} = ∑_{m=1}^p c_{km} x_{2m}, where x_{2m} ∈ B₂.
Thus,

y = ∑_{j=1}^r ∑_{k=1}^s a_{jk} μ(∑_{ℓ=1}^t b_{jℓ} x_{1ℓ}, ∑_{m=1}^p c_{km} x_{2m})
= ∑_{j=1}^r ∑_{k=1}^s ∑_{ℓ=1}^t ∑_{m=1}^p a_{jk} b_{jℓ} c_{km} μ(x_{1ℓ}, x_{2m}).
0 = ∑_{ℓ=1}^t ∑_{m=1}^p d_{ℓm} μ(x_{1ℓ}, x_{2m})
= ∑_{ℓ=1}^t ∑_{m=1}^p d_{ℓm} μ(∑_{j=1}^r e_{ℓj} z_{1j}, ∑_{k=1}^s f_{mk} z_{2k})
= ∑_{j=1}^r ∑_{k=1}^s (∑_{ℓ=1}^t ∑_{m=1}^p d_{ℓm} e_{ℓj} f_{mk}) μ(z_{1j}, z_{2k}).
Since μ(B₁ × B₂) is linearly independent, ∑_{ℓ=1}^t ∑_{m=1}^p d_{ℓm} e_{ℓj} f_{mk} = 0 for all j, k. But now,

d_{ℓ′m′} = ∑_{ℓ=1}^t ∑_{m=1}^p d_{ℓm} δ_{ℓℓ′} δ_{mm′}
= ∑_{ℓ=1}^t ∑_{m=1}^p d_{ℓm} (∑_{j=1}^r b_{jℓ′} e_{ℓj})(∑_{k=1}^s c_{km′} f_{mk})
= ∑_{j=1}^r ∑_{k=1}^s b_{jℓ′} c_{km′} (∑_{ℓ=1}^t ∑_{m=1}^p d_{ℓm} e_{ℓj} f_{mk})
= 0, for all ℓ′, m′.
Thus, μ(B1 × B2 ) is linearly independent. Hence μ(B1 × B2 ) is a basis for W .
U × V —ψ→ U ⊗ V,  f : U × V → V ⊗ U,  h : U ⊗ V → V ⊗ U  (h ∘ ψ = f),

and

V × U —ψ′→ V ⊗ U,  g : V × U → U ⊗ V,  h′ : V ⊗ U → U ⊗ V

is commutative, i.e., h′ ∘ ψ′ = g. Now for any x ∈ U ⊗ V, we find that
h′ ∘ h(x) = h′ ∘ h(∑_{i=1}^m ∑_{j=1}^n αᵢⱼ(uᵢ ⊗ vⱼ))
= h′(∑_{i=1}^m ∑_{j=1}^n αᵢⱼ h(uᵢ ⊗ vⱼ))
= h′(∑_{i=1}^m ∑_{j=1}^n αᵢⱼ(vⱼ ⊗ uᵢ))
= ∑_{i=1}^m ∑_{j=1}^n αᵢⱼ h′(vⱼ ⊗ uᵢ)
= ∑_{i=1}^m ∑_{j=1}^n αᵢⱼ(uᵢ ⊗ vⱼ)
= x.
Similarly, it can be seen that h ∘ h′(y) = y for all y ∈ V ⊗ U. This shows that h ∘ h′ = I on V ⊗ U and h′ ∘ h = I on U ⊗ V. Thus h is an isomorphism whose inverse is h′ and hence U ⊗ V ≅ V ⊗ U.
(ii) The above proof shows that the correspondence u ⊗ v ←→ v ⊗ u has been used to establish the isomorphism U ⊗ V ≅ V ⊗ U. Similarly, using the correspondence u ⊗ α ←→ α ⊗ u ←→ αu, where α ∈ F, u ∈ U, it can be seen that U ⊗ F ≅ F ⊗ U ≅ U.
g(u, α(v₁ ⊗ w₁) + β(v₂ ⊗ w₂)) = L_u(α(v₁ ⊗ w₁) + β(v₂ ⊗ w₂))
= αL_u(v₁ ⊗ w₁) + βL_u(v₂ ⊗ w₂)
= αg(u, v₁ ⊗ w₁) + βg(u, v₂ ⊗ w₂).
Using the isomorphism given in (iii) as identification, define the tensor product
of three vector spaces as follows:
U ⊗ V ⊗ W = (U ⊗ V ) ⊗ W = U ⊗ (V ⊗ W ).
⊗_{i=1}^n Vᵢ = V₁ ⊗ V₂ ⊗ ⋯ ⊗ Vₙ.

(v₁, v₂, …, vₙ) ↦ v₁ ⊗ v₂ ⊗ ⋯ ⊗ vₙ ∈ ⊗_{i=1}^n Vᵢ
V₁ ⊗ V₂ ⊗ ⋯ ⊗ V_k = (V₁ ⊗ V₂ ⊗ ⋯ ⊗ V_{k−1}) ⊗ V_k
f(v₁, v₂, …, vᵢ, …, vⱼ, …, vₙ) = f(v₁, v₂, …, vⱼ, …, vᵢ, …, vₙ)

for any i ≠ j.
An n-linear map f : V × V × ⋯ × V → W is called antisymmetric or skew-symmetric if interchanging any two coordinate positions introduces a factor of (−1), i.e.,

f(v₁, v₂, …, vᵢ, …, vⱼ, …, vₙ) = −f(v₁, v₂, …, vⱼ, …, vᵢ, …, vₙ)

for any i ≠ j.
An n-linear map f : V × V × ⋯ × V → W is called alternate or alternating if

f(v₁, v₂, …, vₙ) = 0

whenever vᵢ = vⱼ for some i ≠ j.
V₁ × V₂ × ⋯ × Vₙ —ψ→ V₁ ⊗ V₂ ⊗ ⋯ ⊗ Vₙ,  f : V₁ × ⋯ × Vₙ → X,  h : V₁ ⊗ ⋯ ⊗ Vₙ → X  (h ∘ ψ = f)
Exercises
1. If U and V are finite dimensional vector spaces over the same field, then show that
U ⊗ V can be represented as the dual of the space of bilinear forms on U ⊕ V .
2. Let U and V be vector spaces over the same field. If V = V1 ⊕ V2 , then show
that U ⊗ V = (U ⊗ V1 ) ⊕ (U ⊗ V2 ).
(U₁ ⊗ U) ∩ (U₂ ⊗ U) ≅ (U₁ ∩ U₂) ⊗ U.

(U₁ ⊗ V) ∩ (U ⊗ V₁) ≅ U₁ ⊗ V₁.

(U₁ ⊗ V₁) ∩ (U₂ ⊗ V₂) ≅ (U₁ ∩ U₂) ⊗ (V₁ ∩ V₂).
U₁ × U₂ —τ₁→ U₁ ⊗ U₂,  g = T₁ × T₂ : U₁ × U₂ → V₁ × V₂,  τ₂ ∘ g : U₁ × U₂ → V₁ ⊗ V₂,  f : U₁ ⊗ U₂ → V₁ ⊗ V₂  (f ∘ τ₁ = τ₂ ∘ g)
The unique map f is called the tensor product of linear maps T1 and T2 and
usually f is represented by f = T1 ⊗ T2 . Here, we have ( f ◦ τ1 )(u 1 , u 2 ) = (τ2 ◦
g)(u 1 , u 2 ), i.e., f (u 1 ⊗ u 2 ) = T1 (u 1 ) ⊗ T2 (u 2 ). We also write as (T1 ⊗ T2 )(u 1 ⊗
u 2 ) = T1 (u 1 ) ⊗ T2 (u 2 ).
(ii) Let 0 : U₁ → V₁ be the zero linear map and T : U₂ → V₂ be any arbitrary linear map; then 0 ⊗ T = 0′ and T ⊗ 0 = 0″, where 0′ : U₁ ⊗ U₂ → V₁ ⊗ V₂ and 0″ : U₂ ⊗ U₁ → V₂ ⊗ V₁ are the zero linear maps. (0 ⊗ T)(u₁ ⊗ u₂) = 0(u₁) ⊗ T(u₂) = 0 ⊗ T(u₂) = 0 = 0′(u₁ ⊗ u₂) gives the clue.
Example 9.14 Let D : R[x] → R[x] be the linear map known as derivation and ∫ : R[y] → R[y] be the linear map known as the integration operator. We shall determine D ⊗ ∫. Since we know that R[x] ⊗ R[y] = R[x, y], we have D ⊗ ∫ : R[x] ⊗ R[y] → R[x] ⊗ R[y], i.e., D ⊗ ∫ : R[x, y] → R[x, y], given by (D ⊗ ∫)(p(x) ⊗ q(y)) = (Dp(x)) ⊗ (∫q(y)). In other words,

(D ⊗ ∫)(∑_{i,j≥0} α_{ij} x^i y^j) = ∑_{i,j≥0} α_{ij} D(x^i) ∫y^j
= ∑_{i,j≥0} α_{ij} i x^{i−1} (y^{j+1}/(j+1))
= ∑_{i,j≥0} (i/(j+1)) α_{ij} x^{i−1} y^{j+1},

all sums being finite.
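The action of D ⊗ ∫ can be reproduced with sympy by differentiating in x and taking the antiderivative in y term by term. A small sketch (illustrative only; the sample polynomial is chosen arbitrarily):

    import sympy as sp

    x, y = sp.symbols('x y')

    def D_tensor_int(poly):
        # Differentiate in x, then take the antiderivative in y;
        # both operations act monomial by monomial, so this realizes D ⊗ ∫.
        return sp.expand(sp.integrate(sp.diff(poly, x), y))

    p = 3*x**2*y + 5*x*y**4
    print(D_tensor_int(p))   # 3*x*y**2 + y**5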
Proof We are given that U₁ ≅ U₂ and V₁ ≅ V₂, so there exist isomorphisms T₁ : U₁ → U₂ and T₂ : V₁ → V₂. Using the definition of tensor product of linear maps, we have the linear map T₁ ⊗ T₂ : U₁ ⊗ V₁ → U₂ ⊗ V₂ given by (T₁ ⊗ T₂)(u₁ ⊗ v₁) = T₁(u₁) ⊗ T₂(v₁). We claim that T₁ ⊗ T₂ is a bijective map. For ontoness of T₁ ⊗ T₂, let u₂ ⊗ v₂ ∈ U₂ ⊗ V₂. Since T₁ and T₂ are onto, there exist u₁ ∈ U₁ and v₁ ∈ V₁ such that T₁(u₁) = u₂ and T₂(v₁) = v₂. Obviously (T₁ ⊗ T₂)(u₁ ⊗ v₁) = T₁(u₁) ⊗ T₂(v₁) = u₂ ⊗ v₂. This shows that T₁ ⊗ T₂ is onto. Before proving that T₁ ⊗ T₂ is injective, we will prove a fact: the kernel of T₁ ⊗ T₂ is the subspace K of U₁ ⊗ V₁ generated by the elements x ⊗ y of U₁ ⊗ V₁ with x ∈ Ker(T₁) or y ∈ Ker(T₂). In other words,

Ker(T₁ ⊗ T₂) = K = ⟨{x ⊗ y ∈ U₁ ⊗ V₁ | x ∈ Ker(T₁) or y ∈ Ker(T₂)}⟩.
U₁ ⊗ V₁ —T₁⊗T₂→ U₂ ⊗ V₂,  q : U₁ ⊗ V₁ → (U₁ ⊗ V₁)/K,  q∗ : (U₁ ⊗ V₁)/K → U₂ ⊗ V₂
Hence, g is linear in the first coordinate. Similarly, we can show that g is linear in the second coordinate also. Thus g is a bilinear map. By the definition of U₂ ⊗ V₂, there exists a unique linear map h : U₂ ⊗ V₂ → (U₁ ⊗ V₁)/K such that the following diagram commutes, i.e., h(u₂ ⊗ v₂) = (u₁ ⊗ v₁) + K, where T₁(u₁) = u₂ and T₂(v₁) = v₂.
U₂ × V₂ → U₂ ⊗ V₂,  g : U₂ × V₂ → (U₁ ⊗ V₁)/K,  h : U₂ ⊗ V₂ → (U₁ ⊗ V₁)/K
Here, we observe that h ∘ q∗ : (U₁ ⊗ V₁)/K → (U₁ ⊗ V₁)/K is a homomorphism such that

(h ∘ q∗)(u₁ ⊗ v₁ + K) = h(q∗(u₁ ⊗ v₁ + K)) = h(T₁(u₁) ⊗ T₂(v₁)) = (u₁ ⊗ v₁) + K.
This shows that h ∘ q∗ is the identity map. Hence it is a bijective map; as a result q∗ is one-to-one. Thus Ker q∗ = {K}. Let us suppose that z ⊗ t ∈ Ker(T₁ ⊗ T₂). This implies that (T₁ ⊗ T₂)(z ⊗ t) = 0 ⟹ T₁(z) ⊗ T₂(t) = 0. Hence, q∗((z ⊗ t) + K) = 0, i.e., (z ⊗ t) + K ∈ Ker(q∗). But since Ker(q∗) = {K}, this implies that (z ⊗ t) + K = K, i.e., z ⊗ t ∈ K. Thus, we have proved that Ker(T₁ ⊗ T₂) ⊆ K; finally, Ker(T₁ ⊗ T₂) = K. Given that T₁ and T₂ are injective, i.e., Ker T₁ = {0} and Ker T₂ = {0}, as a result

Ker(T₁ ⊗ T₂) = ⟨{x ⊗ y ∈ U₁ ⊗ V₁ | x ∈ {0} or y ∈ {0}}⟩ = {0},

and hence T₁ ⊗ T₂ is one-to-one.
Exercises
3. Let d/dx : R[x] → R[x] and d/dy : R[y] → R[y] be linear maps, known as derivations. Prove that d/dx ⊗ d/dy ≡ ∂²/∂x∂y. Here, R[x] and R[y] represent the vector spaces of all real polynomials in x and y, respectively, and x and y are independent indeterminates.
4. Let P2 (x) be the vector space of all polynomials over R in x of degree less
than or equal to 2 and P3 (y) be the vector space of all polynomials over R in y
In the previous section, we studied how to obtain the tensor product of a finite number of vector spaces over the same field F. Given a vector space V, we want to construct an algebra over F. Before doing that, we give a brief idea of the external direct product and external direct sum of an arbitrarily given set of vector spaces over the same field F. Let F = {Vᵢ | i ∈ I} be an arbitrarily given set of vector spaces Vᵢ, where I is an indexing set. Let P denote the set of all maps from I to the union M of the sets Vᵢ such that f(i) ∈ Vᵢ holds for every i ∈ I, i.e., P = {f : I → M = ∪_{i∈I} Vᵢ | f(i) ∈ Vᵢ for every i ∈ I}. Define addition and scalar multiplication in P as: for any f, g ∈ P, α ∈ F, the functions f + g : I → M and αf : I → M are defined by (f + g)(i) = f(i) + g(i) and (αf)(i) = α(f(i)) for each i ∈ I. One can easily verify that P is a vector space over F with regard to these operations. P is known as the external direct product of the vector spaces Vᵢ, i ∈ I. It is usually denoted by ∏, i.e., P = ∏_{i∈I} Vᵢ.
Now, we construct a special type of subspace of the vector space P. Let us consider the subset S of P consisting of all f ∈ P such that f(i) = 0 holds for all except finitely many indices i ∈ I. It is easy to observe that S is a subspace of P. This subspace is known as the external direct sum of the given set F of vector spaces. Usually, we denote it by ⊕ᵉˣᵗ, i.e., S = ⊕ᵉˣᵗ_{i∈I} Vᵢ. It is to be remarked here that if the indexing set I is finite, then P = S, and in this case each of them can be called either the external direct product or the external direct sum of the given set F of vector spaces.
For each index j ∈ I, we define a map fⱼ : Vⱼ → S such that fⱼ(v) ∈ S for every v ∈ Vⱼ, where fⱼ(v) : I → M is the map defined by (fⱼ(v))(i) = v if i = j and 0 otherwise. It can be easily shown that fⱼ is an injective linear transformation. Thus, for each index j ∈ I, we can identify Vⱼ with its image fⱼ(Vⱼ) in S, and in this sense we can say that Vⱼ is a subspace of S for each index j ∈ I. Let f ∈ S and suppose that f(i₁) = v₁ ≠ 0, v₁ ∈ V_{i₁}, f(i₂) = v₂ ≠ 0, v₂ ∈ V_{i₂}, …, f(i_r) = v_r ≠ 0, v_r ∈ V_{i_r}, but f(i) = 0 for each i ∈ I − {i₁, i₂, …, i_r}, where r is any nonnegative integer. Now, we can write f = f_{i₁}(v₁) + f_{i₂}(v₂) + ⋯ + f_{i_r}(v_r). Here, the vectors f_{i₁}(v₁), f_{i₂}(v₂), …, f_{i_r}(v_r) can be identified with the vectors v₁, v₂, …, v_r, respectively. Thus, finally, we can write f ∈ S as f = v₁ + v₂ + ⋯ + v_r, i.e., f = ∑_{i=1}^r vᵢ in the identified sense. Now, we start constructing an algebra over F.
Let V be a vector space over F. Define the symbol V_q^p by V_q^p = V₁ ⊗ V₂ ⊗ ⋯ ⊗ V_p ⊗ V̂₁ ⊗ V̂₂ ⊗ ⋯ ⊗ V̂_q, where Vᵢ = V, i = 1, 2, …, p, and V̂ⱼ = V∗, j = 1, 2, …, q. Also, define V₀⁰ = F, V₀^p = V₁ ⊗ V₂ ⊗ ⋯ ⊗ V_p, and V_q⁰ = V̂₁ ⊗ V̂₂ ⊗ ⋯ ⊗ V̂_q, where Vᵢ = V, i = 1, 2, …, p, V̂ⱼ = V∗, j = 1, 2, …, q. Thus, the symbols V_q^p have been defined for all nonnegative integers p and q. Obviously, F = {V_q^p | (p, q) ∈ (N ∪ {0}) × (N ∪ {0})} is a set of vector spaces, where (N ∪ {0}) × (N ∪ {0}) is an indexing set. Let T(V) be the external direct sum of this set of vector spaces, i.e., T(V) = ⊕ᵉˣᵗ_{(p,q)∈I} V_q^p, where I = (N ∪ {0}) × (N ∪ {0}). The vector space T(V) is known as the tensor space of V and an element of T(V) is called a tensor. By the previous arguments, it is clear that the V_q^p are subspaces of T(V) for all nonnegative integers p and q. If p and q are positive integers, then an element of V₀^p is called a contravariant tensor, an element of V_q⁰ is called a covariant tensor and an element of V_q^p is called a p times contravariant and q times covariant tensor, or a mixed tensor of type (p, q).
Define a multiplication in T(V) as follows: let x, y ∈ T(V); clearly

x = ∑_{i=1}^r ∑_{j=1}^s v_{q_j}^{p_i},   y = ∑_{i=1}^m ∑_{j=1}^n v′_{q′_j}^{p′_i},

where v_{q_j}^{p_i} ∈ V_{q_j}^{p_i} and v′_{q′_j}^{p′_i} ∈ V_{q′_j}^{p′_i}; then

x y = ∑_{a=0}^{p_r+p′_m} ∑_{b=0}^{q_s+q′_n} ∑_{p_i+p′_i=a} ∑_{q_j+q′_j=b} v_{q_j}^{p_i} ⊗ v′_{q′_j}^{p′_i},

where v_{q_j}^{p_i} ⊗ v′_{q′_j}^{p′_i} ∈ V_{q_j}^{p_i} ⊗ V_{q′_j}^{p′_i} ≅ V_{q_j+q′_j}^{p_i+p′_i}. It can easily be seen that this multiplication is a binary operation in T(V); in fact, it is a bilinear map from T(V) × T(V) to T(V). It can be verified that the vector space T(V) forms an algebra over F with regard to the multiplication defined above, which is in general noncommutative, infinite dimensional and with identity 1 ∈ V₀⁰ = F. The algebra constructed above is known as the tensor algebra of V and is usually denoted by T(V). We will denote the sets of all contravariant and covariant tensors in T(V) by T₀(V) and T⁰(V), respectively.
Theorem 9.17 T0 (V ) and T 0 (V ) form subalgebras of the tensor algebra T (V ).
Proof Clearly T₀(V) = {∑_{finite} v₀^p | v₀^p ∈ V₀^p, p = 1, 2, 3, …}. Let x, y ∈ T₀(V). Thus x = ∑_{i=1}^r v₀^{p_i} and y = ∑_{i=1}^m v′₀^{p′_i}. For any α, β ∈ F, we have αx + βy = ∑_{i=1}^r αv₀^{p_i} + ∑_{i=1}^m βv′₀^{p′_i}. As V₀^{p_i} and V₀^{p′_i} are subspaces of T(V), we have, say, w₀^{p_i} = αv₀^{p_i} ∈ V₀^{p_i} and w′₀^{p′_i} = βv′₀^{p′_i} ∈ V₀^{p′_i}. Thus αx + βy = ∑_{i=1}^r w₀^{p_i} + ∑_{i=1}^m w′₀^{p′_i}, which shows that αx + βy ∈ T₀(V); as a result T₀(V) is a subspace of T(V). Taking α = 1, β = −1, we have x − y ∈ T₀(V). Here

x y = ∑_{a=0}^{p_r+p′_m} ∑_{p_i+p′_i=a} v₀^{p_i} ⊗ v′₀^{p′_i},

where v₀^{p_i} ⊗ v′₀^{p′_i} ∈ V₀^{p_i} ⊗ V₀^{p′_i} ≅ V₀^{p_i+p′_i}. In turn, we conclude that x y ∈ T₀(V) and hence T₀(V) is a subring of T(V). A little observation shows that T₀(V) is in fact a subalgebra of T(V). Using similar arguments, it can be proved that T⁰(V) is also a subalgebra of T(V).
x = x₁ ⊗ x₂ ⊗ ⋯ ⊗ x_p ⊗ y¹ ⊗ y² ⊗ ⋯ ⊗ y^q,   (9.1)
where x_k = ∑_{i_k=1}^n x_k^{i_k} e_{i_k}, k = 1, 2, …, p, and y^m = ∑_{j_m=1}^n y^m_{j_m} e^{j_m}, m = 1, 2, …, q. If we substitute the values of x_k and y^m in the above expression for x, a vast collection of summations occurs. To avoid writing all these summation signs we suppress them; summation over each repeated index is understood. Now using the multilinearity, we obtain
x = x₁^{i₁} x₂^{i₂} ⋯ x_p^{i_p} y¹_{j₁} y²_{j₂} ⋯ y^q_{j_q} e^{j₁j₂⋯j_q}_{i₁i₂⋯i_p}.

Using this short notation, we can write the above expression as x = α^{i₁i₂⋯i_p}_{j₁j₂⋯j_q} e^{j₁j₂⋯j_q}_{i₁i₂⋯i_p}, where it is understood that summation is carried out over each index. Thus, each tensor in V_q^p can be written in this form. Let y = β^{k₁k₂⋯k_p}_{h₁h₂⋯h_q} e^{h₁h₂⋯h_q}_{k₁k₂⋯k_p}. As a result
1, 2, …, p, V̂ⱼ = V∗, j = 1, 2, …, q, by

f(x₁, x₂, …, x_p, y¹, y², …, y^q) = y^k(x_h) x₁ ⊗ x₂ ⊗ ⋯ ⊗ x̂_h ⊗ ⋯ ⊗ x_p ⊗ y¹ ⊗ y² ⊗ ⋯ ⊗ ŷ^k ⊗ ⋯ ⊗ y^q,

where x̂_h and ŷ^k indicate that these vectors are deleted from the tensor. Here, y^k(x_h) represents the image of x_h ∈ V under the linear functional y^k ∈ V∗. It can be verified that f is a (p + q)-linear mapping. Thus, using the definition of tensor product of a finite number of vector spaces, there exists a unique linear map, say C_k^h : V_q^p → V_{q−1}^{p−1}, such that

C_k^h(x₁ ⊗ x₂ ⊗ ⋯ ⊗ x_p ⊗ y¹ ⊗ y² ⊗ ⋯ ⊗ y^q) = y^k(x_h) x₁ ⊗ x₂ ⊗ ⋯ ⊗ x̂_h ⊗ ⋯ ⊗ x_p ⊗ y¹ ⊗ y² ⊗ ⋯ ⊗ ŷ^k ⊗ ⋯ ⊗ y^q.
Clearly, under this map the order of each tensor in V_q^p is reduced. The mapping C_k^h is called the contraction of the hth contravariant index and the kth covariant index. Now, we examine some behavior of C_k^h on the tensors belonging to V_q^p. For this, let V be of finite dimension n; we use the notation related to V and T(V) introduced in the previous paragraphs. Consider the tensor e^{j₁j₂⋯j_q}_{i₁i₂⋯i_p} ∈ V_q^p; then

C_k^h(e^{j₁j₂⋯j_q}_{i₁i₂⋯i_p}) = e^{j_k}(e_{i_h}) e^{j₁⋯ĵ_k⋯j_q}_{i₁⋯î_h⋯i_p} = δ^{j_k}_{i_h} e^{j₁⋯ĵ_k⋯j_q}_{i₁⋯î_h⋯i_p};

using the linearity of C_k^h, we have C_k^h(x) = α^{i₁ ⋯ i_h ⋯ i_p}_{j₁ ⋯ j_{k−1} i_h j_{k+1} ⋯ j_q} e^{j₁⋯ĵ_k⋯j_q}_{i₁⋯î_h⋯i_p}, summation over the repeated index i_h being understood. We consider some examples to clarify these complicated symbols.
Let x ∈ T(V) be such that x = e₁ ⊗ e² − 5e₂ ⊗ e² + e₁ ⊗ e² − 7e₂ ⊗ e¹. Then x ∈ V₁¹ and C₁¹(x) = −5. Let y ∈ T(V) be such that y = 5e₁ ⊗ e₂ ⊗ e¹ ⊗ e¹ ⊗ e³ − 7e₁ ⊗ e₂ ⊗ e¹ ⊗ e¹ ⊗ e³ + e₁ ⊗ e₁ ⊗ e² ⊗ e³ ⊗ e². Clearly y ∈ V₃² and C₁¹(y) = 5e₂ ⊗ e¹ ⊗ e³ − 7e₁ ⊗ e² ⊗ e³, C₂¹(y) = 5e₂ ⊗ e¹ ⊗ e³ − 7e₂ ⊗ e¹ ⊗ e³, C₃²(y) = 0, while C₂³(y), C₄¹(y), etc., are undefined.
Example 9.18 Let V be a finite dimensional vector space of dimension n. Then V₁¹ is isomorphic to A(V) via the map f, x ⊗ y ↦ f(x ⊗ y), such that f(x ⊗ y)(v) = y(v)x for all v ∈ V, where y(v) is the image of v under the linear functional y. Moreover, the contraction C₁¹ of any x ⊗ y ∈ V₁¹ is precisely the trace of the linear operator f(x ⊗ y), where the trace of a linear operator is defined as the trace of any matrix of the operator with regard to an ordered basis B of V.
Define a map η : V × V∗ → A(V) such that η(x, y) = g(x, y), where g(x, y)(v) = y(v)x for all v ∈ V. Obviously, for any fixed x and y, g(x, y) is a linear operator on V. It can also be easily verified that η is a bilinear map. Using the definition of tensor product, there exists a unique linear map f : V₁¹ → A(V) such that f(x ⊗ y)(v) = g(x, y)(v) = y(v)x. Here, we observe that V₁¹ and A(V) have the same finite dimension. To prove that f is an isomorphism, it only remains to show that f is one-to-one. For this, let z ∈ Ker f, i.e., z = ∑_{i=1}^n ∑_{j=1}^n αᵢ^j (eᵢ ⊗ e^j), because {eᵢ ⊗ e^j | 1 ≤ i ≤ n, 1 ≤ j ≤ n} is a basis for V₁¹, where αᵢ^j ∈ F. This implies that f(∑_{i=1}^n ∑_{j=1}^n αᵢ^j (eᵢ ⊗ e^j)) = 0. Due to the linearity of f, we have ∑_{i=1}^n ∑_{j=1}^n αᵢ^j f(eᵢ ⊗ e^j) = 0. It follows that [∑_{i=1}^n ∑_{j=1}^n αᵢ^j f(eᵢ ⊗ e^j)](v) = 0 for all v ∈ V, i.e., ∑_{i=1}^n ∑_{j=1}^n αᵢ^j e^j(v) eᵢ = 0 for all v ∈ V. Using the fact that {e₁, e₂, …, eₙ} is a linearly independent set in V and varying v through all the vectors in the set {e₁, e₂, …, eₙ}, one concludes that αᵢ^j = 0 for all i, j. In turn, we get z = 0 and hence f becomes injective. Hence V₁¹ is isomorphic to Hom(V, V). This shows that tensors in V₁¹ can be regarded as elements of A(V). In a similar fashion, one can show that tensors in V₂¹ can be regarded as linear transformations from V₀² into V.
The contraction of x ⊗ y ∈ V₁¹ is given by C₁¹(x ⊗ y) = y(x).
Case I: If at least one of x and y is zero, then C₁¹(x ⊗ y) = 0 and the operator f(x ⊗ y) is the zero operator. As the trace of the zero operator is zero, the result is obvious in this case.
Case II: Suppose that x ≠ 0 and y ≠ 0. As y is a nonzero linear functional, Ker y ≠ V and rank y = 1. Using the rank-nullity theorem, we have nullity y = n − 1. Thus V = Ker y ⊕ ⟨u⟩ for some nonzero u ∈ V. Now using the previous conclusion, one can show that the matrix of the linear operator f(x ⊗ y) with regard to an ordered basis adapted to this decomposition has each diagonal entry equal to 0 except one, which equals y(x). Thus, the trace of the linear operator equals C₁¹(x ⊗ y). This proves our result.
Using the same idea as in the previous example, each tensor in V₃¹ can be regarded as a linear transformation from V₀³ into V. It can be easily seen that contraction need not be commutative. For example, let x = α^{i₁i₂i₃}_{j₁j₂j₃} e^{j₁j₂j₃}_{i₁i₂i₃} be an element of V₃³ (summation over repeated indices understood). Then

C₁²(x) = α^{i₁i₂i₃}_{i₂j₂j₃} e^{j₂j₃}_{i₁i₃},   C₂¹(x) = α^{i₁i₂i₃}_{j₁i₁j₃} e^{j₁j₃}_{i₂i₃},

and computing C₁² ∘ C₂¹(x) and C₂¹ ∘ C₁²(x) term by term shows that they involve different contractions of the coefficients α^{i₁i₂i₃}_{j₁j₂j₃}; clearly C₁² ∘ C₂¹ ≠ C₂¹ ∘ C₁². It is to be noted that a product of contractions may be defined in one order but not in the reverse order. The contractions C₁¹ are special in nature. Notice that if we take x = α^{i₁i₂i₃}_{j₁j₂j₃} e^{j₁j₂j₃}_{i₁i₂i₃}, then C₁¹(x) = α^{i₁i₂i₃}_{i₁j₂j₃} e^{j₂j₃}_{i₂i₃}, C₁¹ ∘ C₁¹(x) = α^{i₁i₂i₃}_{i₁i₂j₃} e^{j₃}_{i₃}, and C₁¹ ∘ C₁¹ ∘ C₁¹(x) = α^{i₁i₂i₃}_{i₁i₂i₃}. Thus C₁¹ ∘ C₁¹ ∘ C₁¹ maps V₃³ into the scalars. Similarly, it follows that p copies of C₁¹ composed with each other map V_p^p into F.
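In components, contractions are index summations, which numpy's einsum expresses directly. The following sketch (with arbitrary random components, for illustration only) computes C₁¹ on a tensor in V₁¹, where it reduces to the trace, and one contraction on a tensor in V₂¹.

    import numpy as np

    a = np.random.rand(4, 4)          # components of x ∈ V¹₁ (one upper, one lower index)
    print(np.einsum('ii->', a))       # C¹₁(x): sum over the paired indices = trace

    t = np.random.rand(4, 4, 4)       # components of y ∈ V¹₂
    print(np.einsum('iik->k', t))     # contract the upper index with the first lower index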
Exercises
Throughout this section, V represents a finite dimensional vector space over a field F, where char(F) = 0 and dim(V) = n. In this section, we define symmetric and antisymmetric (alternating) tensors in V₀^p. Later, the constructions of the symmetric tensor algebra ST(V) and the antisymmetric tensor algebra AT(V) of the vector space V are presented together with their properties. The section concludes by introducing the notions of the exterior product or wedge product ∧ and the exterior algebra, or Grassmann algebra, ⋀V of V.
Let S_p be the group of permutations of the integers 1, 2, …, p and let σ ∈ S_p. Define a map g : V₁ × V₂ × ⋯ × V_p → V₀^p, where Vᵢ = V for each 1 ≤ i ≤ p, by g(x₁, x₂, …, x_p) = x_{σ(1)} ⊗ x_{σ(2)} ⊗ ⋯ ⊗ x_{σ(p)}. It can be shown that g is a p-linear mapping. Thus, using the definition of tensor product, there exists a unique linear map, say S_σ : V₀^p → V₀^p, such that S_σ(x₁ ⊗ x₂ ⊗ ⋯ ⊗ x_p) = x_{σ(1)} ⊗ x_{σ(2)} ⊗ ⋯ ⊗ x_{σ(p)}. With the help of S_σ, we define symmetric and antisymmetric tensors in V₀^p. A tensor x ∈ V₀^p is said to be symmetric if S_σ(x) = x for every σ. If S_σ(x) = x for some particular σ, then x is said to be symmetric with regard to σ. On the other hand, a tensor x ∈ V₀^p is said to be antisymmetric or alternating if S_σ(x) = (Sign σ)x for every σ. If S_σ(x) = (Sign σ)x for some particular σ, then x is said to be antisymmetric or alternating with regard to σ, where Sign σ is 1 if σ is even and −1 if σ is odd.
Theorem 9.19 Let x ∈ V₀^p. Then x is symmetric if and only if S_τ(x) = x for every transposition τ. The tensor x is antisymmetric if and only if S_τ(x) = −x for every transposition τ.
Proof Let x be symmetric. Thus Sσ (x) = x for every σ. But any transposition is also
a particular permutation, hence Sτ (x) = x. Conversely, suppose that Sτ (x) = x for
−A(x). Thus, we get A(x) = −A(x), which implies that A(x) = 0 because char(F) = 0. Hence x ∈ Kernel(A). This implies that I_p ⊆ Kernel A. Conversely, let x ∈ Kernel A, so that A(x) = 0, where A(x) = (1/p!) ∑_{σ∈S_p} (Sign σ)S_σ(x). Since A(x) = 0, we may write x = x − A(x) = (1/p!) ∑_{σ∈S_p} (x − (Sign σ)S_σ(x)). To show that x ∈ I_p, it is sufficient to prove that x − (Sign σ)S_σ(x) ∈ I_p for any σ ∈ S_p. We prove this statement by induction on the number of transpositions in a factorization of σ. The minimum number of transpositions is 1, which is the case when σ is itself a transposition. Thus let σ = (i j). Since x ∈ V₀^p, we may take x = x₁ ⊗ x₂ ⊗ ⋯ ⊗ x_p. This shows that

y = x − Sign(i j) S_{(i j)}(x)
= x₁ ⊗ ⋯ ⊗ xᵢ ⊗ ⋯ ⊗ xⱼ ⊗ ⋯ ⊗ x_p + x₁ ⊗ ⋯ ⊗ xⱼ ⊗ ⋯ ⊗ xᵢ ⊗ ⋯ ⊗ x_p
= x₁ ⊗ ⋯ ⊗ (xᵢ + xⱼ) ⊗ ⋯ ⊗ (xᵢ + xⱼ) ⊗ ⋯ ⊗ x_p − x₁ ⊗ ⋯ ⊗ xᵢ ⊗ ⋯ ⊗ xᵢ ⊗ ⋯ ⊗ x_p − x₁ ⊗ ⋯ ⊗ xⱼ ⊗ ⋯ ⊗ xⱼ ⊗ ⋯ ⊗ x_p.

Obviously y ∈ I_p. In fact, we have shown that if τ is any transposition then x + S_τ(x) ∈ I_p. Assume the induction hypothesis, i.e., that the statement is true for all permutations expressible as a product of r transpositions. Let σ₁ ∈ S_p be expressible as a product of r + 1 transpositions. We can write σ₁ = στ, where σ is a permutation expressible as a product of r transpositions and τ is a transposition. Now

x − Sign(σ₁)S_{σ₁}(x) = x − Sign(στ)S_{στ}(x) = x + Sign(σ)S_{στ}(x)
= (x − Sign(τ)S_τ(x)) − (S_τ(x) − Sign(σ)S_σ(S_τ(x))).

Using the fact that A(S_τ(x)) = (1/p!) ∑_σ (Sign σ)S_{στ}(x) = −(1/p!) ∑_σ (Sign στ)S_{στ}(x) = −A(x) = 0, together with the induction hypothesis applied to S_τ(x), we conclude that x − Sign(σ₁)S_{σ₁}(x) ∈ I_p. Thus the result follows by induction.
x y = ∑_{a=0}^{r+s} ∑_{i+j=a} S(vᵢ ⊗ v′ⱼ).
Theorem 9.22 Let V be any vector space. Then the symmetric tensor algebra ST (V )
is isomorphic to the algebra of polynomials F[e1 , e2 , . . . , en ], where {e1 , e2 , . . . , en }
is a basis of V.
where · represents the multiplication of the algebra F[e₁, e₂, …, eₙ]. It can be easily verified that f is an algebra isomorphism. Thus the symmetric tensor algebra ST(V) is isomorphic to the algebra of polynomials F[e₁, e₂, …, eₙ].
Theorem 9.23 Let V be any vector space. Then the vector space ST p (V ) of sym-
metric tensors is isomorphic to the vector space F p [e1 , e2 , . . . , en ], of homogeneous
polynomials of degree p for p ≥ 1.
Proof Let x ∈ ST_p(V); as x is a symmetric element of V₀^p, we have x = ∑_{j₁,j₂,…,j_p=1}^n α_{j₁j₂…j_p} e_{j₁} ⊗ e_{j₂} ⊗ ⋯ ⊗ e_{j_p}, where α_{j₁j₂…j_p} ∈ F. Define a map f : ST_p(V) → F_p[e₁, e₂, …, eₙ] such that

f(∑_{j₁,j₂,…,j_p=1}^n α_{j₁j₂…j_p} e_{j₁} ⊗ e_{j₂} ⊗ ⋯ ⊗ e_{j_p}) = ∑_{j₁,j₂,…,j_p=1}^n α_{j₁j₂…j_p} (e_{j₁} · e_{j₂} · ⋯ · e_{j_p}),

where · represents the multiplication of the algebra F[e₁, e₂, …, eₙ]. It can be easily verified that f is a vector space isomorphism. Thus the vector space ST_p(V) of symmetric tensors is isomorphic to the vector space F_p[e₁, e₂, …, eₙ].
Theorem 9.24 Let V be any vector space. Then, the vector space F p [e1 , e2 , . . . , en ],
of homogeneous polynomials of degree p is isomorphic to a quotient space of the
p
vector space V0 for p ≥ 1.
Proof Define a map f : V₀^p → F_p[e₁, e₂, …, eₙ] such that

f(∑_{j₁,j₂,…,j_p=1}^n α_{j₁j₂…j_p} e_{j₁} ⊗ e_{j₂} ⊗ ⋯ ⊗ e_{j_p}) = ∑_{j₁,j₂,…,j_p=1}^n α_{j₁j₂…j_p} (e_{j₁} · e_{j₂} · ⋯ · e_{j_p}).
Finally, at the end of this section, we want to explore the notions of the exterior product and the exterior algebra of a vector space V. Consider the quotient vector space V₀^p/I_p. Let q : V₀^p → V₀^p/I_p be the quotient homomorphism, i.e., for any x ∈ V₀^p, we have q(x) = x + I_p. The map f : V₁ × V₂ × ⋯ × V_p → V₀^p, where Vᵢ = V, i = 1, 2, …, p, such that f(x₁, x₂, …, x_p) = x₁ ⊗ x₂ ⊗ ⋯ ⊗ x_p, is a p-linear map. Thus, the composite map q ∘ f : V₁ × V₂ × ⋯ × V_p → V₀^p/I_p is also a p-linear map. The image of an element (x₁, x₂, …, x_p) under q ∘ f is denoted by x₁ ∧ x₂ ∧ ⋯ ∧ x_p. The quotient space V₀^p/I_p is denoted by ⋀^p V. The elements of ⋀^p V are called p-vectors over V. A p-vector is called decomposable, or pure, if it is of the form x₁ ∧ x₂ ∧ ⋯ ∧ x_p. We define ⋀¹V = V and ⋀⁰V = F. Now, we are interested in dealing with these p-vectors, where p = 0, 1, 2, …, which will give rise to an algebra. For this we prove the following.
Theorem 9.25 Let U be any arbitrary vector space. If h : ⋀^p V → U is a linear mapping, then h ∘ q ∘ f : V₁ × V₂ × ⋯ × V_p → U, where Vᵢ = V, i = 1, 2, …, p, is an alternating p-linear mapping, where q and f are as described in the above paragraph. Conversely, if g : V₁ × V₂ × ⋯ × V_p → U, where Vᵢ = V, i = 1, 2, …, p, is an alternating p-linear mapping, then there exists a unique linear mapping h : ⋀^p V → U such that g(x₁, x₂, …, x_p) = h(x₁ ∧ x₂ ∧ ⋯ ∧ x_p).
⋯ × V_p → ⋀^{p+q}(V) such that g₁(x₁, x₂, …, x_p) = x₁ ∧ x₂ ∧ ⋯ ∧ x_p ∧ y₁ ∧ y₂ ∧ ⋯ ∧ y_q. Clearly g₁ is an alternating p-linear map. Hence by Theorem 9.25, there exists a unique linear map h₁ : ⋀^p(V) → ⋀^{p+q}(V) such that g₁(x₁, x₂, …, x_p) = h₁(x₁ ∧ x₂ ∧ ⋯ ∧ x_p) = x₁ ∧ x₂ ∧ ⋯ ∧ x_p ∧ y₁ ∧ y₂ ∧ ⋯ ∧ y_q. Similarly, define a map g₂ : V₁ × V₂ × ⋯ × V_q → ⋀^{p+q}(V) such that g₂(y₁, y₂, …, y_q) = x₁ ∧ x₂ ∧ ⋯ ∧ x_p ∧ y₁ ∧ y₂ ∧ ⋯ ∧ y_q. Clearly g₂ is an alternating q-linear map, so there exists a unique linear map h₂ : ⋀^q(V) → ⋀^{p+q}(V) such that g₂(y₁, y₂, …, y_q) = h₂(y₁ ∧ y₂ ∧ ⋯ ∧ y_q) = x₁ ∧ x₂ ∧ ⋯ ∧ x_p ∧ y₁ ∧ y₂ ∧ ⋯ ∧ y_q. Now define a map f : ⋀^p(V) × ⋀^q(V) → ⋀^{p+q}(V) such that f(x₁ ∧ x₂ ∧ ⋯ ∧ x_p, y₁ ∧ y₂ ∧ ⋯ ∧ y_q) = x₁ ∧ x₂ ∧ ⋯ ∧ x_p ∧ y₁ ∧ y₂ ∧ ⋯ ∧ y_q. As h₁ and h₂ are linear maps, f is a bilinear map. Let x = x₁ ∧ x₂ ∧ ⋯ ∧ x_p and y = y₁ ∧ y₂ ∧ ⋯ ∧ y_q. Then the image of (x, y) under f, i.e., f(x, y) = x ∧ y = x₁ ∧ x₂ ∧ ⋯ ∧ x_p ∧ y₁ ∧ y₂ ∧ ⋯ ∧ y_q, is called the exterior product or wedge product or the Grassmann product. If p = 0 or q = 0, then we define α ∧ y to be αy and x ∧ α to be αx. Since the above map f is bilinear, the following identities are evident: (x + y) ∧ (u + v) = x ∧ u + x ∧ v + y ∧ u + y ∧ v, and (αx) ∧ y = x ∧ (αy) = α(x ∧ y). For pure p-vectors x, q-vectors y, and r-vectors z, we have (x ∧ y) ∧ z = x ∧ (y ∧ z).
Let F = {⋀^p(V) | p ∈ I = {0, 1, 2, …}} be a set of vector spaces of p-vectors, where I is an indexing set. Let ⋀(V) be the external direct sum of this set of vector spaces, i.e., ⋀(V) = ⊕ᵉˣᵗ_{p∈I} ⋀^p(V). Now we define a new multiplication in ⋀(V) in the following way. Let x, y ∈ ⋀(V); clearly x = ∑_{i=1}^r vᵢ, y = ∑_{j=1}^s v′ⱼ, where vᵢ ∈ ⋀^i(V) and v′ⱼ ∈ ⋀^j(V); then

x y = ∑_{a=0}^{r+s} ∑_{i+j=a} (vᵢ ∧ v′ⱼ).
It is easy to check that (V ) forms a noncommutative algebra with identity with
regard to the exterior product or wedge product defined above. This algebra is known
as antisymmetric tensor algebra of V, or exterior algebra of V or Grassmann algebra
of V. Usually it is represented by (V ) or AT (V ).
Exercises
1. Find the dimensions of the subspaces ST_p(V) and AT_p(V) of V₀^p.
2. Prove that the symmetric tensor algebra ST(V) and the exterior algebra ⋀(V) are graded algebras.
3. Let the dimension of V be n. Prove that dim ⋀^p(V) = C(n, p).
4. If x ∈ ⋀^p(V) and y ∈ ⋀^q(V), then show that x ∧ y = (−1)^{pq} y ∧ x.
5. Let x₁ ∧ x₂ ∧ ⋯ ∧ x_p ∈ ⋀^p(V). Then prove that x₁ ∧ x₂ ∧ ⋯ ∧ x_p = (Sign σ) x_{σ(1)} ∧ x_{σ(2)} ∧ ⋯ ∧ x_{σ(p)}, for any σ ∈ S_p.
In this chapter, we shall study common problems in numerical linear algebra, including LU and PLU decompositions together with their applications in solving linear systems of equations. Further, we shall briefly discuss the power method, which gives an approximation to the eigenvalue of greatest absolute value and a corresponding eigenvector. Finally, the singular value decomposition (SVD) of matrices, together with its properties and applications in diverse fields of study, is included.
Er · · · E 2 E 1 A = U.
Since the reduction of A to row echelon form can be achieved without interchanging any two rows, we can assume that the required elementary matrices E_k represent operations of the form Rᵢ → Rᵢ + αRⱼ designed to add a multiple αRⱼ of row Rⱼ to a row Rᵢ below it. This means i > j in all cases. Each E_k is, therefore, a lower triangular elementary matrix with 1's on the diagonal. The inverse of the operation Rᵢ → Rᵢ + αRⱼ is Rᵢ → Rᵢ − αRⱼ, again with i > j. Hence, the inverse E_k⁻¹ is also a lower triangular matrix with 1's on its main diagonal. Since the elementary matrices E₁, E₂, …, E_r are nonsingular, multiplying both sides of the above relation on the left successively by E_r⁻¹, …, E₂⁻¹, E₁⁻¹, we get

A = E₁⁻¹ E₂⁻¹ ⋯ E_r⁻¹ U.

The matrix L = E₁⁻¹ E₂⁻¹ ⋯ E_r⁻¹ is a lower triangular matrix with 1's on the main diagonal provided that no two rows are interchanged in reducing A to U, and the above yields that A = LU.
The following theorem summarizes the above result:
Theorem 10.1 (LU decomposition theorem) Suppose that A is an m × n matrix
that can be reduced to echelon form without interchanging any two rows. Then there
exist an m × m lower triangular matrix L with 1's on the main diagonal and an
m × n row echelon matrix U such that A = LU .
Definition 10.2 A factorization of a matrix A as A = LU , where L is a lower tri-
angular and U is an upper triangular matrix, is called an LU decomposition of A.
There is a convenient procedure for finding an LU decomposition. In fact, it is only necessary to keep track of the multipliers used to reduce the matrix to row echelon form. This procedure, described in the following example, is called the multiplier method or Doolittle's method.
Example 10.3 In order to find the LU decomposition of the matrix

    A = [1 2  4]
        [3 8 14]
        [2 6 13],

write the identity matrix on the left, i.e.,

    [1 0 0] [1 2  4]
    [0 1 0] [3 8 14]
    [0 0 1] [2 6 13].
The procedure involves performing row operations on the matrix on the right while simultaneously updating the matrix on the left. First, we perform the row operation R₂ → R₂ − 3R₁ to create a zero below the 1 in the first column. The entry 3 is recorded in the second row, first column of the left matrix because −3 times the first row of A was added to the second row:

    [1 0 0] [1 2  4]
    [3 1 0] [0 2  2]
    [0 0 1] [2 6 13].
We see that E₁ = [1 0 0; −3 1 0; 0 0 1] and hence E₁⁻¹ = [1 0 0; 3 1 0; 0 0 1]. We carry out the similar procedure for the third row and find that
    [1 0 0] [1 2 4]
    [3 1 0] [0 2 2]
    [2 0 1] [0 2 5].
Note that E₂ = [1 0 0; 0 1 0; −2 0 1] and hence E₂⁻¹ = [1 0 0; 0 1 0; 2 0 1]. Finally, similar arguments for the second column and third row yield that
    A = [1 0 0] [1 2 4]
        [3 1 0] [0 2 2]
        [2 1 1] [0 0 3].
Thus, we find the LU decomposition of the matrix A. We see that E₃ = [1 0 0; 0 1 0; 0 −1 1] and hence E₃⁻¹ = [1 0 0; 0 1 0; 0 1 1]. It can be seen that U = E₃E₂E₁A and L = E₁⁻¹E₂⁻¹E₃⁻¹.
Notice that in each position below the main diagonal of L, the entry is the negative
of the multiplier in the operation that introduced the zero in that position of U .
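The multiplier bookkeeping of Example 10.3 is easy to mechanize. The following Python sketch (assuming all pivots are nonzero, so no row interchanges are needed) reproduces L and U for the matrix above.

    import numpy as np

    def lu_doolittle(A):
        A = A.astype(float)
        n = A.shape[0]
        L, U = np.eye(n), A.copy()
        for j in range(n - 1):
            for i in range(j + 1, n):
                m = U[i, j] / U[j, j]       # multiplier for R_i -> R_i - m*R_j
                U[i, j:] -= m * U[j, j:]
                L[i, j] = m                 # negative of the alpha used in R_i -> R_i + alpha*R_j
        return L, U

    A = np.array([[1, 2, 4], [3, 8, 14], [2, 6, 13]])
    L, U = lu_doolittle(A)
    print(L)                    # [[1,0,0],[3,1,0],[2,1,1]]
    print(U)                    # [[1,2,4],[0,2,2],[0,0,3]]
    assert np.allclose(L @ U, A)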
Remark 10.4 (i) It is natural to ask whether every matrix has an LU decomposition. Sometimes it is impossible to write a given matrix in this form. In fact, if a square matrix A can be reduced to row echelon form without using row interchanges, then A has an LU decomposition. More generally, an invertible matrix A has an LU decomposition provided that all its leading submatrices have nonzero determinants. The kth leading submatrix of A, denoted A_k, is the k × k matrix obtained by retaining only the top k rows and the leftmost k columns. For example, if

    A = [a₁₁ a₁₂ ⋯ a₁ₙ]
        [a₂₁ a₂₂ ⋯ a₂ₙ]
        [ ⋮   ⋮  ⋱  ⋮ ]
        [aₙ₁ aₙ₂ ⋯ aₙₙ],
then A₁ = (a₁₁), A₂ = [a₁₁ a₁₂; a₂₁ a₂₂], …, A_k = [a₁₁ ⋯ a₁ₖ; ⋮ ⋱ ⋮; aₖ₁ ⋯ aₖₖ], and A has an LU decomposition if |Aᵢ| ≠ 0 for all 1 ≤ i ≤ n.
(ii) It is also interesting to ask whether a square matrix can have more than one LU decomposition. In the absence of additional restrictions, it is easy to see that LU decompositions are not unique. For example, if

    A = LU = [ℓ₁₁  0  ⋯  0 ] [1 u₁₂ ⋯ u₁ₙ]
             [ℓ₂₁ ℓ₂₂ ⋯  0 ] [0  1  ⋯ u₂ₙ]
             [ ⋮   ⋮  ⋱  ⋮ ] [⋮  ⋮  ⋱  ⋮ ]
             [ℓₙ₁ ℓₙ₂ ⋯ ℓₙₙ] [0  0  ⋯  1 ]

and L has nonzero entries on the main diagonal, then shift the diagonal entries from the left factor to the right factor as follows:

    A = [1        0       ⋯ 0] [ℓ₁₁ 0  ⋯  0 ] [1 u₁₂ ⋯ u₁ₙ]
        [ℓ₂₁/ℓ₁₁  1       ⋯ 0] [0  ℓ₂₂ ⋯  0 ] [0  1  ⋯ u₂ₙ]
        [ ⋮       ⋮       ⋱ ⋮] [⋮   ⋮  ⋱  ⋮ ] [⋮  ⋮  ⋱  ⋮ ]
        [ℓₙ₁/ℓ₁₁  ℓₙ₂/ℓ₂₂ ⋯ 1] [0   0  ⋯ ℓₙₙ] [0  0  ⋯  1 ]

      = [1        0       ⋯ 0] [ℓ₁₁ ℓ₁₁u₁₂ ⋯ ℓ₁₁u₁ₙ]
        [ℓ₂₁/ℓ₁₁  1       ⋯ 0] [0   ℓ₂₂    ⋯ ℓ₂₂u₂ₙ]
        [ ⋮       ⋮       ⋱ ⋮] [⋮    ⋮     ⋱   ⋮   ]
        [ℓₙ₁/ℓ₁₁  ℓₙ₂/ℓ₂₂ ⋯ 1] [0    0     ⋯  ℓₙₙ  ].

This is an LU decomposition of A.
For the 4 × 5 matrix

    A = [1 2 1 2 1]
        [2 0 2 1 1]
        [2 3 1 3 2]
        [1 0 1 1 2]

(used again in Example 10.7 below), it can be seen that

    E₁ = [1 0 0 0; −2 1 0 0; 0 0 1 0; 0 0 0 1],  E₂ = [1 0 0 0; 0 1 0 0; −2 0 1 0; 0 0 0 1],
    E₃ = [1 0 0 0; 0 1 0 0; 0 0 1 0; −1 0 0 1],  E₄ = [1 0 0 0; 0 1 0 0; 0 −1/4 1 0; 0 0 0 1],
    E₅ = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 −1/2 0 1]
with

    U = E₅E₄E₃E₂E₁A = [1  2  1   2    1  ]
                      [0 −4  0  −3   −1  ]
                      [0  0 −1 −1/4  1/4 ]
                      [0  0  0  1/2  3/2 ]

and

    L = E₁⁻¹E₂⁻¹E₃⁻¹E₄⁻¹E₅⁻¹ = [1  0   0 0]
                               [2  1   0 0]
                               [2 1/4  1 0]
                               [1 1/2  0 1].
ℓ₁₁y₁ = b₁
ℓ₂₁y₁ + ℓ₂₂y₂ = b₂
ℓ₃₁y₁ + ℓ₃₂y₂ + ℓ₃₃y₃ = b₃
  ⋮
ℓₙ₁y₁ + ℓₙ₂y₂ + ℓₙ₃y₃ + ⋯ + ℓₙₙyₙ = bₙ
This yields the value of y₁, and then, using the successive equations, one can find y₂, y₃, …, yₙ.
(4) Once Y has been determined, solve the upper triangular system U X = Y to find
the solution X of the system.
Remark 10.6 (i) If any of the diagonal elements ℓᵢᵢ is zero, then the system is singular and cannot be solved.
(ii) If all the diagonal elements ℓᵢᵢ are nonzero, then the system has a unique solution.
Although the above procedure replaces the problem of solving a single system by that of solving the two systems LY = B and UX = Y, because of the involvement of triangular matrices the latter systems are easy to solve, as in the sketch below.
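Steps (3) and (4) translate directly into code. The following Python sketch implements forward and back substitution for nonsingular triangular systems; the sample data reuse the factors of Example 10.3 with an arbitrarily chosen right-hand side.

    import numpy as np

    def forward_sub(L, b):
        # Solve L y = b for lower triangular L with nonzero diagonal.
        n = len(b)
        y = np.zeros(n)
        for i in range(n):
            y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
        return y

    def back_sub(U, y):
        # Solve U x = y for upper triangular U with nonzero diagonal.
        n = len(y)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    L = np.array([[1., 0, 0], [3, 1, 0], [2, 1, 1]])
    U = np.array([[1., 2, 4], [0, 2, 2], [0, 0, 3]])
    b = np.array([3., 13, 4])
    x = back_sub(U, forward_sub(L, b))
    assert np.allclose(L @ U @ x, b)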
Example 10.7 Solve the system of equations involving five variables and four equations:

    [1 2 1 2 1] [x]   [1]
    [2 0 2 1 1] [y]   [2]
    [2 3 1 3 2] [z] = [3]
    [1 0 1 1 2] [w]   [4]
                [t]
By the above example, note that we have the following LU decomposition of the coefficient matrix:

    [1 2 1 2 1]   [1  0   0 0] [1  2  1   2    1 ]
    [2 0 2 1 1] = [2  1   0 0] [0 −4  0  −3   −1 ]
    [2 3 1 3 2]   [2 1/4  1 0] [0  0 −1 −1/4 1/4 ]
    [1 0 1 1 2]   [1 1/2  0 1] [0  0  0  1/2 3/2 ].
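The factors quoted above can be verified numerically; the following sketch simply multiplies them back together (the 4 × 5 system itself is underdetermined, so one unknown remains free in the back substitution).

    import numpy as np

    A = np.array([[1, 2, 1, 2, 1],
                  [2, 0, 2, 1, 1],
                  [2, 3, 1, 3, 2],
                  [1, 0, 1, 1, 2]], dtype=float)
    L = np.array([[1, 0,    0, 0],
                  [2, 1,    0, 0],
                  [2, 0.25, 1, 0],
                  [1, 0.5,  0, 1]])
    U = np.array([[1,  2,  1,  2,    1   ],
                  [0, -4,  0, -3,   -1   ],
                  [0,  0, -1, -0.25, 0.25],
                  [0,  0,  0,  0.5,  1.5 ]])
    assert np.allclose(L @ U, A)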
The LU decomposition is a useful tool for finding the solution of a system of equations, but it does not work for every matrix. For example, if we consider A = [0 1; 1 0], then it can be seen that there do not exist a lower triangular matrix L and an upper triangular
Since the matrix P_σ consists mostly of zeros, it is easy to work with such a matrix. It is called a permutation matrix because it can be obtained from the identity matrix by permuting its rows.
Remark 10.9 (i) Let σ, τ ∈ Sₙ. Then their composition σ ∘ τ ∈ Sₙ.
(ii) If P_σ and P_τ are the permutation matrices associated with σ and τ, respectively, then P_σP_τ = P_{τ∘σ}. In fact, if aᵢⱼ, bᵢⱼ are the (i, j)th entries of P_σ and P_τ, respectively, and cᵢⱼ is the (i, j)th entry of P_{τ∘σ}, then for each i, j ∈ {1, 2, …, n}, cᵢⱼ = ∑_{k=1}^n aᵢₖbₖⱼ. Obviously, aᵢₖ = 1 if and only if k = σ(i), and bₖⱼ = 1 if and only if j = τ(k). Therefore, aᵢₖbₖⱼ = 1 if and only if j = τ(k) = τ(σ(i)). Hence, the product P_σP_τ is the matrix of the permutation τ ∘ σ and P_σP_τ = P_{τ∘σ}.
(iii) The elementary matrix associated with the elementary operation of switching
rows is a permutation matrix. Therefore, performing a series of row switches
may be represented as a permutation matrix, since it is a product of permutation
matrices.
(iv) If E ∈ Mₙ(F) is an elementary matrix that represents the action R_k → R_k + αR_ℓ and if P_σ is the permutation matrix for σ ∈ Sₙ, then EP_σ = P_σE′, where E′ is the elementary matrix that represents the action R_{σ(k)} → R_{σ(k)} + αR_{σ(ℓ)}.
Applying the procedure of LU decomposition, we can say that when no interchanges are needed, we can factor a matrix A ∈ Mₙ(C) as A = LU, where L is lower triangular while U is upper triangular. When row interchanges are needed, let P be the permutation matrix that creates these row interchanges; then the LU decomposition can be carried out for the matrix PA, i.e., PA = LU. This decomposition is known as the PLU decomposition.
Proof It is clear that a permutation of two rows has no bearing on the use of the elementary row operation Rᵢ → Rᵢ + αRⱼ in the reduction of A to row echelon form. Thus, for any matrix A, there exists a permutation matrix P such that PA can be reduced to row echelon form without requiring further permutation of rows. Hence, Theorem 10.1 guarantees that there exist suitable matrices L and U such that PA = LU.
Example 10.11 We find a PLU decomposition of the matrix

    A = [ 2  1 0  1]
        [ 2  1 2  3]
        [ 0  0 1  2]
        [−4 −1 0 −2].

Since an LU decomposition of A is not possible, let

    P = [1 0 0 0]
        [0 0 0 1]
        [0 0 1 0]
        [0 1 0 0]

be a permutation matrix and let

    A′ = PA = [1 0 0 0] [ 2  1 0  1]   [ 2  1 0  1]
              [0 0 0 1] [ 2  1 2  3] = [−4 −1 0 −2]
              [0 0 1 0] [ 0  0 1  2]   [ 0  0 1  2]
              [0 1 0 0] [−4 −1 0 −2]   [ 2  1 2  3].

Carrying out the LU decomposition of A′ gives PA = LU with

    L = [ 1 0 0 0]     U = [2 1 0  1]
        [−2 1 0 0]         [0 1 0  0]
        [ 0 0 1 0]         [0 0 1  2]
        [ 1 0 2 1]         [0 0 0 −2].
Hence, A = P²A = P(PA) = PLU, which yields the PLU decomposition of A, i.e.,

    [ 2  1 0  1]   [1 0 0 0] [ 1 0 0 0] [2 1 0  1]
    [ 2  1 2  3] = [0 0 0 1] [−2 1 0 0] [0 1 0  0]
    [ 0  0 1  2]   [0 0 1 0] [ 0 0 1 0] [0 0 1  2]
    [−4 −1 0 −2]   [0 1 0 0] [ 1 0 2 1] [0 0 0 −2].
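For comparison, scipy provides a library routine: scipy.linalg.lu(A) returns P, L, U with A = P @ L @ U. Because scipy uses partial pivoting, its permutation may differ from the one chosen by hand above, so the factors need not match entry for entry; the product, however, must recover A. (A sketch.)

    import numpy as np
    from scipy.linalg import lu

    A = np.array([[ 2,  1, 0,  1],
                  [ 2,  1, 2,  3],
                  [ 0,  0, 1,  2],
                  [-4, -1, 0, -2]], dtype=float)
    P, L, U = lu(A)
    assert np.allclose(P @ L @ U, A)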
To solve AX = B when PA = LU, one solves LY = PB and UX = Y.
The forward solution is

y₁ = b₁/ℓ₁₁
y₂ = (b₂ − ℓ₂₁y₁)/ℓ₂₂
  ⋮
yₙ = (bₙ − ∑_{i=1}^{n−1} ℓₙᵢyᵢ)/ℓₙₙ,

and the upper triangular system UX = Y reads

u₁₁x₁ + u₁₂x₂ + ⋯ + u₁ₙxₙ = y₁
u₂₂x₂ + ⋯ + u₂ₙxₙ = y₂
  ⋮
uₙₙxₙ = yₙ.
The back solution is

xₙ = yₙ/uₙₙ,  xₙ₋₁ = (yₙ₋₁ − u_{n−1,n}xₙ)/u_{n−1,n−1},  …,  x₁ = (y₁ − ∑_{k=2}^n u₁ₖxₖ)/u₁₁.
Remark 10.12 (i) In practice, the step of determining and then multiplying by the permutation matrix is not actually carried out. Rather, an index array is generated while the elimination step is accomplished, which effectively keeps track of the row interchanges by means of pointers. This saves considerable time in solving potentially very large systems.
(ii) If any of the diagonal elements uᵢᵢ is zero, then the system is singular and cannot be solved.
(iii) If all diagonal elements of U are nonzero, then the system has a unique solution.
Example 10.13 Use the PLU factorization of

    A = [1 2 3 4]
        [1 2 3 0]
        [5 3 1 1]

to solve the system of equations AX = B, where B = (1, 2, 3)ᵗ.
We proceed to find the row echelon form of the matrix A. First add −1 times the first row to the second row and then add −5 times the first row to the third row of A to get

    [1 0 0] [1  2   3   4 ]
    [1 1 0] [0  0   0  −4 ]
    [5 0 1] [0 −7 −14 −19].
Now there is no way to obtain an upper triangular matrix by using only the row operation of replacing a row by itself plus a multiple of another row (without interchanging any two rows). So consider the matrix A′ obtained by switching the last two rows of A:

    A′ = [1 2 3 4]   [1 0 0] [1 2 3 4]
         [5 3 1 1] = [0 1 0] [5 3 1 1]
         [1 2 3 0]   [0 0 1] [1 2 3 0].
Now add −1 times the first row to the third row and then add −5 times the first row to the second row to get

    [1 0 0] [1  2   3   4 ]
    [5 1 0] [0 −7 −14 −19]
    [1 0 1] [0  0   0  −4].
The first matrix is lower triangular while the second matrix is upper triangular, and hence A′ has an LU decomposition. Thus, A′ = PA = LU, where L and U are given above and P = [1 0 0; 0 0 1; 0 1 0]. Hence, A = P²A = P(PA) = PLU and
    [1 2 3 4]   [1 0 0] [1 0 0] [1  2   3   4 ]
    [1 2 3 0] = [0 0 1] [5 1 0] [0 −7 −14 −19]
    [5 3 1 1]   [0 1 0] [1 0 1] [0  0   0  −4].
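Again the hand computation can be verified by multiplying the factors back together (a numpy sketch; since A is 3 × 4, the system AX = B has a free unknown, so only the factorization is checked here).

    import numpy as np

    A = np.array([[1, 2, 3, 4],
                  [1, 2, 3, 0],
                  [5, 3, 1, 1]], dtype=float)
    P = np.array([[1, 0, 0],
                  [0, 0, 1],
                  [0, 1, 0]], dtype=float)
    L = np.array([[1, 0, 0],
                  [5, 1, 0],
                  [1, 0, 1]], dtype=float)
    U = np.array([[1,  2,   3,   4],
                  [0, -7, -14, -19],
                  [0,  0,   0,  -4]], dtype=float)
    assert np.allclose(P @ L @ U, A)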
Exercises
1. If A is any n × n matrix, then show that A can be factored as A = PLU, where L is lower triangular, U is upper triangular and P is a permutation matrix which can be obtained by interchanging rows of Iₙ appropriately.
2. Show that the product of finitely many lower triangular matrices is a lower triangular matrix, and apply this result to show that the product of finitely many upper triangular matrices is upper triangular.
3. Let A = [p q; r s]. Then
(a) prove that if p ≠ 0, then A has a unique LU decomposition with 1's along the main diagonal of L;
(b) find the LU decomposition described in (a).
4. Show that A = [1 6 2; 2 12 5; −1 −3 −1] does not have an LU decomposition. Moreover, reorder the rows of A and find an LU decomposition of the new matrix, and hence solve the system of equations:

x₁ + 6x₂ + 2x₃ = 9,
2x₁ + 12x₂ + 5x₃ = 7,
−x₁ − 3x₂ − x₃ = 17.
7. Use LU decomposition and forward and back substitution to solve the system:

    [1 −3   2  −2] [x₁]   [−11]
    [3 −2  −1   0] [x₂] = [ −4]
    [2 36 −28  27] [x₃]   [155]
    [1 −3  22   5] [x₄]   [ 10].
8. Factor A = [3 −1 0; 3 −1 1; 0 2 1] as A = PLU, where P is obtained from I₃ by interchanging rows appropriately, L is a lower triangular and U is an upper triangular matrix.
applications, only the dominant eigenvalue and a dominant eigenvector of a matrix are needed, and there the power method can be tried. However, if additional eigenvalues and eigenvectors are needed, then other methods are required. These methods will not involve the characteristic polynomial. To see that there are advantages to working directly with the matrix, we must determine the effect that minor changes in the entries of A have upon the eigenvalues. A result related to this is proved below.
Definition 10.14 Let A be a square matrix. An eigenvalue of A is called the dominant
eigenvalue of A if its absolute value is larger than the absolute values of the remaining
eigenvalues. An eigenvector corresponding to the dominant eigenvalue is called a
dominant eigenvector of A.
Theorem 10.15 Let A be an n × n matrix with a complete set of eigenvectors and let X be a matrix that diagonalizes A, i.e.,

X⁻¹AX = D = diag(λ₁, λ₂, …, λₙ).
If A′ = A + E and λ′ is an eigenvalue of A′, then min_{1≤i≤n} |λ′ − λᵢ| ≤ cond₂(X)‖E‖₂.
Proof If λ′ is equal to any of the λᵢ's, then there is nothing to do. Now suppose that λ′ is unequal to all of the λᵢ's. Thus, if we set D₁ = D − λ′I, then D₁ is a nonsingular matrix. As λ′ is an eigenvalue of A′, it is also an eigenvalue of X⁻¹A′X. Therefore, X⁻¹A′X − λ′I is singular and hence D₁⁻¹(X⁻¹A′X − λ′I) is also singular. On the other hand,

D₁⁻¹(X⁻¹A′X − λ′I) = D₁⁻¹X⁻¹(A + E − λ′I)X = D₁⁻¹X⁻¹EX + D₁⁻¹X⁻¹(A − λ′I)X
= D₁⁻¹X⁻¹EX + D₁⁻¹X⁻¹(XDX⁻¹ − λ′I)X = D₁⁻¹X⁻¹EX + D₁⁻¹(D − λ′I).

Now using the fact that D₁ = D − λ′I, we conclude that D₁⁻¹(X⁻¹A′X − λ′I) = D₁⁻¹X⁻¹EX + I. This implies that det(D₁⁻¹X⁻¹EX − (−1)I) = 0, i.e., −1 is an eigenvalue of D₁⁻¹X⁻¹EX. It follows that |−1| ≤ ‖D₁⁻¹X⁻¹EX‖₂ ≤ ‖D₁⁻¹‖₂ cond₂(X)‖E‖₂. The two-norm of D₁⁻¹ is given by ‖D₁⁻¹‖₂ = max_{1≤i≤n} |λ′ − λᵢ|⁻¹. The index i that maximizes |λ′ − λᵢ|⁻¹ is the same index that minimizes |λ′ − λᵢ|. Thus, min_{1≤i≤n} |λ′ − λᵢ| ≤ cond₂(X)‖E‖₂.
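Theorem 10.15 can be illustrated numerically: perturb a diagonalizable matrix by a small E and compare the eigenvalue displacement with the bound cond₂(X)‖E‖₂. (A numpy sketch with an arbitrarily chosen E.)

    import numpy as np

    A = np.array([[2., 1.], [1., 3.]])
    lam, X = np.linalg.eig(A)
    E = 1e-3 * np.array([[0.3, -0.7], [0.4, 0.9]])
    lam_p = np.linalg.eigvals(A + E)

    bound = np.linalg.cond(X, 2) * np.linalg.norm(E, 2)
    for lp in lam_p:
        # every perturbed eigenvalue lies within the bound of some lambda_i
        assert np.min(np.abs(lp - lam)) <= bound + 1e-12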
⟨X, AX⟩/⟨X, X⟩ = ⟨X, λX⟩/⟨X, X⟩ = λ⟨X, X⟩/⟨X, X⟩ = λ.
It is to be noted that we have not scaled the sequence {(1/λ₁ᵏ)Xₖ} in the process. On the other hand, if we scale the sequence {Xₖ}, then one gets a unit vector at each step and the sequence converges to a unit vector in the direction of X₁. The eigenvalue λ₁ can be computed at the same time.
We now summarize the steps in the power method with scaling as follows:
(1) Pick an arbitrary nonzero vector X₀.
(2) Compute AX₀ and scale down to obtain the first approximation to a dominant eigenvector; call it X₁.
(3) Compute AX₁ and scale down to obtain the second approximation X₂.
(4) Compute AX₂ and scale down to obtain the third approximation X₃.
Continuing in this way, a succession X₀, X₁, X₂, … of better and better approximations to a dominant eigenvector is obtained, and at each step the dominant eigenvalue λ₁ is approximated by ⟨Xᵢ, AXᵢ⟩/⟨Xᵢ, Xᵢ⟩, where i = 1, 2, …. A short implementation is sketched below.
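A minimal sketch of the method in Python, scaling by the entry of largest absolute value (an arbitrary but common choice) and estimating λ₁ with the quotient above:

    import numpy as np

    def power_method(A, x0, iters=20):
        x = x0.astype(float)
        for _ in range(iters):
            y = A @ x
            x = y / np.max(np.abs(y))          # scale down
            lam = (x @ (A @ x)) / (x @ x)      # estimate of the dominant eigenvalue
        return lam, x

    A = np.array([[1., 1.], [1., 3.]])
    lam, x = power_method(A, np.array([1., 1.]))
    print(lam)   # ≈ 2 + √2 ≈ 3.414; compare Example 10.17 below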
Example 10.17 Approximate a dominant eigenvector and the dominant eigenvalue of the matrix A by using the power method with scaling, where A = [1 1; 1 3].
We arbitrarily choose X₀ = (1, 1)ᵗ. Then AX₀ = (2, 4)ᵗ, so X₁ = (1/4)(2, 4)ᵗ = (.5, 1)ᵗ and

λ₁ ≈ ⟨X₁, AX₁⟩/⟨X₁, X₁⟩ = ((.5)(1.5) + 3.5)/((.5)(.5) + 1) = 3.4.

Next, AX₁ = (1.5, 3.5)ᵗ, so X₂ = (1/3.5)(1.5, 3.5)ᵗ = (.429, 1)ᵗ. Then

AX₂ = (1.429, 3.429)ᵗ,  X₃ = (1/3.429)(1.429, 3.429)ᵗ = (.417, 1)ᵗ,

λ₁ ≈ ⟨X₂, AX₂⟩/⟨X₂, X₂⟩ = ((.429)(1.429) + 3.429)/((.429)(.429) + 1) = 3.414.

AX₃ = (1.417, 3.417)ᵗ,  X₄ = (1/3.417)(1.417, 3.417)ᵗ = (.4147, 1)ᵗ,

λ₁ ≈ ⟨X₃, AX₃⟩/⟨X₃, X₃⟩ = ((.417)(1.417) + 3.417)/((.417)(.417) + 1) = 3.414.
Exercises
1. Let A = [2 1; 1 2]. Apply three iterations of the power method with any nonzero starting vector and obtain the approximate value of the dominant eigenvalue and a dominant eigenvector of A. Determine the exact eigenvalues of A by solving the characteristic equation and determine the eigenspace corresponding to the largest eigenvalue. Compare the answers you obtained in these two ways.
2. Find the dominant eigenvalue and a dominant eigenvector, if they exist, of the following matrices:

    [−1 4]   [0 1]   [4  2 1]   [1 −12 0]
    [1 −1],  [4 0],  [0 −5 3],  [1   0 0]
                     [0  0 6]   [0   0 2].
3. Let A = [1 2; −1 −1] and X₀ = (1, 1)ᵗ. Compute X₁, X₂, X₃ and X₄ using the power method. Explain why the power method fails to converge in this case.
4. Let A = [18 17; 2 3]. Use the power method with scaling to approximate the dominant eigenvalue and a dominant eigenvector of A. Start with X₀ = (1, 1)ᵗ. Round off all computations to three significant digits and stop after three iterations. Also, find the exact value of the dominant eigenvalue and eigenvector.
5. Let A = [2 1 0; 1 2 0; 0 0 10]. Use the power method with scaling to approximate the dominant eigenvalue and a dominant eigenvector of A. Start with X₀ = (1, 1, 1)ᵗ. Round off all computations to three significant digits and stop after three iterations. Also, find the exact value of the dominant eigenvalue and eigenvector.
6. Let X = (x₁, x₂, …, xₙ)ᵗ be an eigenvector of A corresponding to the eigenvalue λ. If |xᵢ| = ‖X‖∞, then show that ∑_{j=1}^n aᵢⱼxⱼ = λxᵢ and |λ − aᵢᵢ| ≤ ∑_{j=1, j≠i}^n |aᵢⱼ|.
7. Let A be a matrix with eigenvalues λ₁, λ₂, …, λₙ and let λ′ be an eigenvalue of A + E. Let X be a matrix that diagonalizes A, and let C = X⁻¹EX. Prove that for some i, |λ′ − λᵢ| ≤ ∑_{j=1}^n |cᵢⱼ| and min_{1≤j≤n} |λ′ − λⱼ| ≤ cond∞(X)‖E‖∞.
The first two sections of this chapter dealt with the LU and PLU decompositions of matrices. In this section, we shall discuss a decomposition for rectangular matrices rather than square matrices. This decomposition is known as the singular value decomposition (SVD) and is fundamental to numerical analysis and linear algebra. We will describe its properties and discuss its applications, which are many and growing. Throughout the section, A is an m × n matrix, where we assume m ≥ n. This assumption is made for convenience only; all the results also hold if m < n.
Definition 10.18 Let A be a matrix of order m × n. The real numbers σ1, σ2, . . . , σn are called singular values of A if $\sigma_i = \sqrt{\lambda_i}$ for each i = 1, 2, . . . , n, where the λi are the eigenvalues of AtA. The corresponding eigenvectors are called singular vectors of A. It is to be noted that AtA is positive semi-definite. As a result, all the eigenvalues of AtA are nonnegative, i.e., λi ≥ 0 for each i = 1, 2, . . . , n; thus, all the singular values of A are nonnegative.
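As a quick numerical illustration (not part of the original text), assuming NumPy is available, the square roots of the eigenvalues of AtA agree with the singular values returned by the library:

```python
# Singular values via Definition 10.18, checked against numpy.linalg.svd.
import numpy as np

A = np.array([[0.0, 2.0], [2.0, -1.0], [1.0, 0.0]])
eigvals = np.linalg.eigvalsh(A.T @ A)          # eigenvalues of A^t A (ascending)
print(np.sqrt(eigvals[::-1]))                  # sqrt(7), sqrt(3)
print(np.linalg.svd(A, compute_uv=False))      # the same singular values
```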
Remark 10.20 (i) Let A be any matrix of order m × n with real or complex entries. Then the Frobenius norm, sometimes also called the Euclidean norm, of A is defined as the square root of the sum of the absolute squares of its elements, i.e.,
$$\|A\|_F = \Big(\sum_{i=1}^m\sum_{j=1}^n |a_{ij}|^2\Big)^{1/2}, \quad \text{or equivalently} \quad \|A\|_F = \sqrt{\operatorname{trace}(AA^*)},$$
where A* is the conjugate transpose of A.
(ii) Let A be any matrix of order m × n with real or complex entries. Then
$$\|A\|_1 = \max_{1\le j\le n}\sum_{i=1}^m |a_{ij}|,$$
which is simply the maximum absolute column sum of the matrix A;
$$\|A\|_\infty = \max_{1\le i\le m}\sum_{j=1}^n |a_{ij}|,$$
which is simply the maximum absolute row sum of the matrix A.
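A short check of these three norms (our own illustration, assuming NumPy):

```python
# The Frobenius, 1- and infinity-norms of Remark 10.20, assuming NumPy.
import numpy as np

A = np.array([[1.0, -2.0], [3.0, 4.0]])
fro      = np.sqrt((np.abs(A) ** 2).sum())   # Frobenius norm
one_norm = np.abs(A).sum(axis=0).max()       # maximum absolute column sum
inf_norm = np.abs(A).sum(axis=1).max()       # maximum absolute row sum
assert np.isclose(fro, np.linalg.norm(A, 'fro'))
assert np.isclose(one_norm, np.linalg.norm(A, 1))
assert np.isclose(inf_norm, np.linalg.norm(A, np.inf))
```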
Proof Let A be any matrix of order m × n. We have X t At AX = ||AX ||22 ≥ 0 for all
X ∈ Rn . This shows that the matrix At A is positive semi-definite. As a result, all the
eigenvalues of the matrix AtA will be nonnegative. We order these eigenvalues so that λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0 and define $\sigma_i = \sqrt{\lambda_i}$. Without loss of generality, one may
suppose that exactly r of the σi s are nonzero, so that σ1 ≥ σ2 ≥ · · · ≥ σr > 0 and
σr +1 = σr +2 = · · · = σn = 0.
Set
$$D = \begin{pmatrix} D_r & O \\ O & O \end{pmatrix}, \qquad D_r = \operatorname{diag}(\sigma_1, \ldots, \sigma_r),$$
that is, D is the m × n matrix whose first r diagonal entries are σ1, . . . , σr and whose remaining entries are all zero.
(i) It follows that the eigenvalues of AtA are σ1², σ2², . . . , σn². As the σis are the nonnegative square roots of the eigenvalues of AtA, they are unique.
(ii) Singular value decomposition of A is not unique. If we notice in the proof of the
above theorem, then we come across the extension of the set {U1 , U2 , . . . , Ur }
to an orthogonal basis of Rm . With the help of this basis, we obtain U. But
we know that this extension is not unique in general. As a result, U is also not unique. Also, it is obvious that V is not unique either. Thus, we have shown that the singular value decomposition of A is not unique.
We can also justify this fact by giving a counterexample as follows: let A = In. If we set D = In and take U = V to be any arbitrary n × n orthogonal matrix, then A = UDVt will hold. Thus, uniqueness of the decomposition stands disproved.
(iii) Since V diagonalizes At A, it shows that the Vis : i = 1, 2, . . . , n are eigen-
vectors of At A. Also, as A At = U DV t V D t U t = U D D t U t , it follows that U
diagonalizes A At and the Uis : i = 1, 2, . . . , m are eigenvectors of A At .
(iv) It is also easy to observe that AV = U D. Now comparing ith columns of
each side of the previous equation, we have AVi = σi Ui ; i = 1, 2, . . . , n.
Similarly, we also have At U = V D t and we conclude that At Ui = σi Vi ;
for i = 1, 2, . . . , n; and At Ui = O; for i = n + 1, n + 2, . . . , m. The Vis are
called the right singular vectors of A and the Uis are called the left singular
vectors of A.
Note 10.23 Every complex matrix has a singular value decomposition.
Example 10.24 Find the singular values and a singular value decomposition of A, where
$$A = \begin{pmatrix} 0 & 2 \\ 2 & -1 \\ 1 & 0 \end{pmatrix}.$$
The matrix
$$A^tA = \begin{pmatrix} 5 & -2 \\ -2 & 5 \end{pmatrix}$$
has eigenvalues λ1 = 7 and λ2 = 3. As a result, the singular values of A are $\sigma_1 = \sqrt 7$ and $\sigma_2 = \sqrt 3$. Thus,
$$D = \begin{pmatrix} \sqrt 7 & 0 \\ 0 & \sqrt 3 \\ 0 & 0 \end{pmatrix}.$$
The eigenvectors corresponding to the eigenvalues λ1 and λ2 will be of the form $\alpha\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\beta\begin{pmatrix} 1 \\ -1 \end{pmatrix}$, respectively, where α, β are nonzero real numbers. Therefore, the orthogonal matrix
$$V = \frac{1}{\sqrt 2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$
diagonalizes AtA. As σ1 and σ2 are both nonzero, using (iv) of the above remark, U1 and U2 will be given by
$$U_1 = \frac{1}{\sigma_1}AV_1 = \frac{1}{\sqrt 7}\begin{pmatrix} 0 & 2 \\ 2 & -1 \\ 1 & 0 \end{pmatrix}\frac{1}{\sqrt 2}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2/\sqrt{14} \\ 1/\sqrt{14} \\ 1/\sqrt{14} \end{pmatrix};$$
$$U_2 = \frac{1}{\sigma_2}AV_2 = \frac{1}{\sqrt 3}\begin{pmatrix} 0 & 2 \\ 2 & -1 \\ 1 & 0 \end{pmatrix}\frac{1}{\sqrt 2}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} -2/\sqrt 6 \\ 3/\sqrt 6 \\ 1/\sqrt 6 \end{pmatrix}.$$
Extending {U1, U2} by a suitable U3 to an orthonormal basis of R³ yields U, and A = UDVt is a singular value decomposition of A.
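The following check (not from the original text, assuming NumPy) verifies this decomposition numerically; the vector U3 below is one choice, of our own, completing {U1, U2} to an orthonormal basis of R³.

```python
# Verifying the SVD of Example 10.24.
import numpy as np

A  = np.array([[0.0, 2.0], [2.0, -1.0], [1.0, 0.0]])
V  = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
D  = np.array([[np.sqrt(7), 0.0], [0.0, np.sqrt(3)], [0.0, 0.0]])
U1 = np.array([2.0, 1.0, 1.0]) / np.sqrt(14)
U2 = np.array([-2.0, 3.0, 1.0]) / np.sqrt(6)
U3 = np.array([1.0, 2.0, -4.0]) / np.sqrt(21)   # orthogonal to U1 and U2
U  = np.column_stack([U1, U2, U3])
print(np.allclose(A, U @ D @ V.T))              # True: A = U D V^t
```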
Recall that multiplication by an m × m orthogonal matrix Q preserves the Frobenius norm. Indeed,
$$\|QA\|_F^2 = \sum_{i=1}^m\sum_{k=1}^n\Big(\sum_{j=1}^m q_{ij}a_{jk}\Big)^2,$$
and expanding the inner square and using the orthonormality of the columns of Q (so that the cross terms cancel), we obtain
$$\|QA\|_F^2 = \sum_{i=1}^m\sum_{k=1}^n a_{ik}^2 = \|A\|_F^2.$$
Note 10.27 If A has singular value decomposition UDVt, then using the above lemma, we have ||A||F = ||UDVt||F = ||DVt||F = ||(DVt)t||F = ||VDt||F = ||Dt||F = ||D||F. It follows that $\|A\|_F = (\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2)^{1/2}$, where σ1, σ2, . . . , σn are the singular values of A.
Theorem 10.28 Let A be a matrix of order m × n of rank r and 0 < k < r. Let A = UDVt be a singular value decomposition and let Λ denote the set of all matrices of order m × n of rank k or less. Assuming that the minimum is achieved in Λ, i.e., that there exists a matrix X ∈ Λ such that ||A − X||F = min_{S∈Λ} ||A − S||F, then
$$\|A - X\|_F = (\sigma_{k+1}^2 + \sigma_{k+2}^2 + \cdots + \sigma_n^2)^{1/2}.$$
In particular, if A′ = UD′Vt, where
$$D' = \begin{pmatrix} D_k & O \\ O & O \end{pmatrix}, \qquad D_k = \operatorname{diag}(\sigma_1, \ldots, \sigma_k),$$
then
$$\|A - A'\|_F = (\sigma_{k+1}^2 + \sigma_{k+2}^2 + \cdots + \sigma_n^2)^{1/2} = \min_{S\in\Lambda}\|A - S\|_F.$$
Proof Let X be a matrix in Λ satisfying ||A − X||F = min_{S∈Λ} ||A − S||F. Since A′ has rank k, A′ ∈ Λ. This implies that
$$\|A - X\|_F \le \|A - A'\|_F = \|U(D - D')V^t\|_F = \|D - D'\|_F = (\sigma_{k+1}^2 + \sigma_{k+2}^2 + \cdots + \sigma_n^2)^{1/2}.$$
Next, we will prove that $\|A - X\|_F \ge (\sigma_{k+1}^2 + \sigma_{k+2}^2 + \cdots + \sigma_n^2)^{1/2}$. Let X = U1D1V1t be a singular value decomposition, where
$$D_1 = \begin{pmatrix} D_{1k} & O \\ O & O \end{pmatrix}, \qquad D_{1k} = \operatorname{diag}(\omega_1, \ldots, \omega_k).$$
Partitioning $B = U_1^tAV_1$ conformally with D1 as $B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$, we have
$$\|A - X\|_F^2 = \|B_{11} - D_{1k}\|_F^2 + \|B_{12}\|_F^2 + \|B_{21}\|_F^2 + \|B_{22}\|_F^2.$$
We claim that B12 = O. For otherwise, define $Y = U_1\begin{pmatrix} B_{11} & B_{12} \\ O & O \end{pmatrix}V_1^t$. Obviously, Y ∈ Λ and
$$\|A - Y\|_F^2 = \|B_{21}\|_F^2 + \|B_{22}\|_F^2 < \|A - X\|_F^2,$$
contradicting the minimality of X. A similar argument shows that B21 = O, and the remaining blocks are then estimated against the tail singular values σk+1, . . . , σn of A.
A = AI = AVVt
= σ1U1V1t + · · · + σnUnVnt
= σ1E1 + · · · + σnEn, where Ei = UiVit is a matrix of rank 1.
If A is of rank n, then by Theorem 10.28 the closest matrix of rank n − 1 in the Frobenius norm is
$$A' = U\begin{pmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_{n-1} & \\ & & & 0 \end{pmatrix}V^t = \sigma_1E_1 + \cdots + \sigma_{n-1}E_{n-1},$$
so that the distance from A to the nearest singular matrix is σn.
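A sketch (our own illustration, assuming NumPy) of this best rank-k truncation, together with a numerical check of the error formula of Theorem 10.28; `truncate_svd` is a hypothetical helper name.

```python
# Best rank-k approximation via the truncated SVD (Eckart-Young, Frobenius norm).
import numpy as np

def truncate_svd(A, k):
    """Return A_k = sum of the first k rank-one terms sigma_i U_i V_i^t."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

A  = np.random.default_rng(0).standard_normal((6, 4))
k  = 2
Ak = truncate_svd(A, k)
s  = np.linalg.svd(A, compute_uv=False)
# ||A - A_k||_F = (sigma_{k+1}^2 + ... + sigma_n^2)^{1/2}
print(np.isclose(np.linalg.norm(A - Ak, 'fro'), np.sqrt((s[k:] ** 2).sum())))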
It is to be noted that det A = det(A⁻¹) = 1 and all the eigenvalues of A are 1. However, if n is large, then A is close to being singular. To observe this, let
$$B = \begin{pmatrix} 1 & -1 & -1 & \cdots & -1 & -1 \\ 0 & 1 & -1 & \cdots & -1 & -1 \\ 0 & 0 & 1 & \cdots & -1 & -1 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -1 \\ -\frac{1}{2^{n-2}} & 0 & 0 & \cdots & 0 & 1 \end{pmatrix},$$
which differs from A only in the (n, 1) entry; one checks that B is singular, although the perturbation has size only 1/2^{n−2}.
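A quick numerical illustration of this phenomenon (not part of the original text, assuming NumPy): the determinant stays 1 while the smallest singular value of A decays rapidly with n.

```python
# A is upper triangular with 1 on the diagonal and -1 above; det A = 1,
# all eigenvalues equal 1, yet sigma_n -> 0 as n grows.
import numpy as np

for n in (5, 10, 20):
    A = np.triu(-np.ones((n, n)), 1) + np.eye(n)
    s = np.linalg.svd(A, compute_uv=False)
    print(n, np.linalg.det(A), s[-1])   # det stays ~1, smallest singular value shrinks
```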
Proof Since U and V are orthogonal, we have ||A||2 = ||UDVt||2 = ||D||2. Now, for any nonzero X = (x1, . . . , xn)t,
$$\frac{\|DX\|_2}{\|X\|_2} = \frac{\sqrt{\sum_{i=1}^n (\sigma_i x_i)^2}}{\sqrt{\sum_{i=1}^n x_i^2}} \le \sigma_1.$$
In particular, if we choose X = (1, 0, . . . , 0)t, then $\frac{\|DX\|_2}{\|X\|_2} = \sigma_1$. Therefore, it follows that ||A||2 = ||D||2 = σ1.
Proof The singular values of A⁻¹ = VD⁻¹Ut arranged in decreasing order are $\frac{1}{\sigma_n} \ge \frac{1}{\sigma_{n-1}} \ge \cdots \ge \frac{1}{\sigma_1}$. Therefore, $\|A^{-1}\|_2 = \frac{1}{\sigma_n}$ and $\mathrm{cond}_2(A) = \frac{\sigma_1}{\sigma_n}$.
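Both facts are easy to verify numerically (our own illustration, assuming NumPy):

```python
# ||A||_2 = sigma_1 and cond_2(A) = sigma_1 / sigma_n.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(np.linalg.norm(A, 2), s[0]))           # 2-norm is sigma_1
print(np.isclose(np.linalg.cond(A, 2), s[0] / s[-1]))   # cond_2 = sigma_1/sigma_n
```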
10.5 Applications of Singular Value Decomposition

The singular value decomposition (SVD) is not only a classical tool in matrix computation and analysis but also a powerful technique in machine learning and modern data analysis. Today, the singular value decomposition has spread through many branches of science, in particular psychology and sociology, climate and atmospheric science, astronomy, and descriptive and predictive statistics. It is also used in such important topics as digital image processing, spectral decomposition, the polar factorization of matrices, compression algorithms, ranking of documents and discrete optimization.
When the σis are all distinct, the uis are the eigenvectors of B and the σi² are the corresponding eigenvalues. If the σis are not distinct, then any vector that is a linear combination of those ui with the same eigenvalue is an eigenvector of B.
SVD in Principal Component Analysis
SVD is used in Principal Component Analysis (PCA). PCA is illustrated by an
example: customer-product data where there are n customers buying d products.
Let the matrix A with elements aij represent the probability of customer i purchasing product j. One hypothesizes that there are really only k underlying basic factors, like age, income, family size, etc., that determine a customer's purchase behavior.
An individual customer’s behavior is determined by some weighted combination
of these underlying factors. This implies that a customer’s purchase behavior can
be characterized by a k-dimensional vector where k is much smaller than n and d.
The components of the vector are weights for each of the basic factors. Associated
with each basic factor is a vector of probabilities, each component of which is the
probability of purchasing a given product by someone whose nature depends only on
that factor. More abstractly, A is an n × d matrix that can be expressed as the product
of two matrices U and V , where U is an n × k matrix expressing the factor weights
for each customer and V is a k × d matrix expressing the purchase probabilities of
products that correspond to that factor. It is possible that A may not be exactly equal
to U V , but close to it since there may be noise or random perturbations.
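A minimal sketch of this factor model via a truncated SVD (our own illustration, assuming NumPy; the sizes and the synthetic noise level are arbitrary choices):

```python
# Recovering approximate factors U, V from a noisy approximately rank-k matrix.
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 200, 50, 3                      # customers, products, hidden factors
U_true = rng.random((n, k))
V_true = rng.random((k, d))
A = U_true @ V_true + 0.01 * rng.standard_normal((n, d))   # approx. rank k + noise

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_hat = U[:, :k] * s[:k]                  # factor weights for each customer
V_hat = Vt[:k, :]                         # factor-to-product profiles
print(np.linalg.norm(A - U_hat @ V_hat, 'fro'))   # small: about the noise level
```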
As discussed in the previous section, we take the best rank k approximation Ak from the SVD; as a result, we get such a U and V. In this usual setting, one assumes that A is available completely and we wish to find U and V to identify the basic factors, or, in some applications, to denoise A if we think of A − UV as noise. Now suppose that n and d are very large, on the order of thousands or even millions; then there is probably little one could do even to store A. In this setting, we may assume that we are given just a few elements of A and wish to estimate it; if A is close to a low rank matrix, this becomes feasible.

SVD in the Maximum Cut Problem

Let G(V, E) be a directed graph on n vertices with 0-1 adjacency matrix A. Identifying a cut with the 0-1 indicator vector X of one of its sides (and writing I for the all-ones vector), the maximum cut problem asks to maximize
$$X^tA(I - X) \tag{10.1}$$
over 0-1 vectors X; replacing A by its best rank k approximation Ak gives the surrogate problem of maximizing
$$X^tA_k(I - X). \tag{10.2}$$
It is to be noted that the matrix Ak is no longer a 0-1 adjacency matrix. We will prove that (i) for each 0-1 vector X, XtAk(I − X) and XtA(I − X) differ by at most $\frac{n^2}{\sqrt{k+1}}$; thus, the maxima in (10.1) and (10.2) differ by at most this amount. Also, we will show that (ii) a near optimal X for (10.2) can be found by exploiting the low rank of Ak, which by (i) is near optimal for (10.1), where near optimal means with additive error of at most $\frac{n^2}{\sqrt{k+1}}$.
First, we prove (i). Since X and I − X are 0-1 n-vectors, each has length at most √n. Using the definition of the two-norm of a matrix, ||(A − Ak)(I − X)||2 ≤ √n ||A − Ak||2. Now, since Xt(A − Ak)(I − X) is the dot product of the vector X with the vector (A − Ak)(I − X), we get Xt(A − Ak)(I − X) ≤ n||A − Ak||2. But we know that ||A − Ak||2 = σk+1, the (k + 1)st singular value of A. The inequalities
$$(k+1)\sigma_{k+1}^2 \le \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_{k+1}^2 \le \|A\|_F^2 = \sum_{i,j} a_{ij}^2 \le n^2$$
imply that $\sigma_{k+1}^2 \le \frac{n^2}{k+1}$ and hence $\|A - A_k\|_2 \le \frac{n}{\sqrt{k+1}}$, showing (i).
Now we prove (ii) as given above. Let us look at the special case when k = 1 and
A is approximated by the rank 1 matrix A1 . An even more special case when the left
and the right singular vectors U and V are required to be identical is already hard
to solve exactly because it subsumes the problem of whether for a set of n vectors
{a1 , a2 , . . . , an }, there is a partition into two subsets whose sums are equal. So, we
look for algorithms that solve the Maximum Cut Problem approximately.
For (ii), we have to maximize $\sum_{i=1}^k \sigma_i(X^tU_i)(V_i^t(I - X))$ over 0-1 vectors X. For any S ⊆ {1, 2, . . . , n}, write Ui(S) for the sum of the coordinates of the vector Ui corresponding to elements in the set S, that is, $U_i(S) = \sum_{j\in S} u_{ij}$, and similarly for Vi. We will maximize $\sum_{i=1}^k \sigma_iU_i(S)V_i(\bar S)$ using dynamic programming.
For a subset S of {1, 2, . . . , n}, define the 2k-dimensional vector $W(S) = (U_1(S), V_1(\bar S), U_2(S), V_2(\bar S), \ldots, U_k(S), V_k(\bar S))$. If we had the list of all such vectors, we could find $\sum_{i=1}^k \sigma_iU_i(S)V_i(\bar S)$ for each of them and take the maximum.
There are 2ⁿ subsets S, but several S could have the same W(S), and in that case it suffices to list just one of them. Round each coordinate of each Ui to the nearest integer multiple of $\frac{1}{nk^2}$ and call the rounded vector Ūi; similarly obtain V̄i. Let W̄(S) denote the vector $(\bar U_1(S), \bar V_1(\bar S), \bar U_2(S), \bar V_2(\bar S), \ldots, \bar U_k(S), \bar V_k(\bar S))$. We will construct a list of all possible values of the vector W̄(S). Again, if several different S lead to the same vector W̄(S), we keep only one copy on the list. The list is constructed by dynamic programming. For the recursive step, assume we already have a list of all such vectors for S ⊆ {1, 2, . . . , i} and wish to construct the list for S ⊆ {1, 2, . . . , i + 1}. Each S ⊆ {1, 2, . . . , i} leads to two possible subsets of {1, 2, . . . , i + 1}, namely S ∪ {i + 1} and S. In the first case, $\bar W(S \cup \{i+1\}) = (\bar U_1(S) + \bar u_{1,i+1}, \bar V_1(\bar S), \bar U_2(S) + \bar u_{2,i+1}, \bar V_2(\bar S), \ldots)$; in the second case, the element i + 1 joins S̄, so $\bar W(S) = (\bar U_1(S), \bar V_1(\bar S) + \bar v_{1,i+1}, \bar U_2(S), \bar V_2(\bar S) + \bar v_{2,i+1}, \ldots)$. We put these two vectors on the list for each vector in the previous list and then, crucially, eliminate duplicates.
Assume that k is constant. We now bound the error introduced by the rounding. Since Ui and Vi are unit length vectors, $|U_i(S)|, |V_i(\bar S)| \le \sqrt n$. Also $|\bar U_i(S) - U_i(S)| \le n\cdot\frac{1}{nk^2} = \frac{1}{k^2}$, and similarly for Vi. To bound the error, we use an elementary fact: if a, b are reals with |a|, |b| ≤ M and we estimate a by ā and b by b̄ so that |a − ā|, |b − b̄| ≤ δ ≤ M, then
$$|ab - \bar a\bar b| = |a(b - \bar b) + \bar b(a - \bar a)| \le |a||b - \bar b| + |\bar b||a - \bar a| \le 3M\delta.$$
Using this with M = √n and δ = 1/k², we get
$$\Big|\sum_{i=1}^k \sigma_i\bar U_i(S)\bar V_i(\bar S) - \sum_{i=1}^k \sigma_iU_i(S)V_i(\bar S)\Big| \le 3k\sigma_1\frac{\sqrt n}{k^2} \le \frac{3n^{3/2}}{k},$$
since σ1 ≤ ||A||F ≤ n.
Next, we prove that the running time is polynomially bounded. We have $|\bar U_i(S)|, |\bar V_i(\bar S)| \le 2\sqrt n$. Since the Ūi(S), V̄i(S̄) are all integer multiples of $\frac{1}{nk^2}$, each coordinate can take at most $4n^{3/2}k^2 + 1$ values, from which it follows that the list of vectors W̄(S) never gets larger than $(4n^{3/2}k^2 + 1)^{2k}$, which for fixed k is polynomially bounded. Finally, we have the following conclusion:
“Given a directed graph G(V, E), a cut of size at least the maximum cut minus $O\big(\frac{n^2}{\sqrt k}\big)$ can be computed in time polynomial in n for any fixed k.”
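The sketch below (our own illustration, assuming NumPy) evaluates the surrogate objective $\sum_i \sigma_i U_i(S)V_i(\bar S)$ for a given subset S; the exhaustive search over tiny instances stands in for the dynamic program described above, which we do not implement here.

```python
# Evaluating the rank-k surrogate cut objective for 0-1 indicator vectors.
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n, k = 8, 2
A = (rng.random((n, n)) < 0.4).astype(float)   # a small 0-1 adjacency matrix
U, s, Vt = np.linalg.svd(A)

def surrogate_cut(x):
    xbar = 1.0 - x
    return sum(s[i] * (x @ U[:, i]) * (Vt[i] @ xbar) for i in range(k))

best = max((np.array(bits, dtype=float) for bits in product((0, 1), repeat=n)),
           key=surrogate_cut)
print(surrogate_cut(best), best @ A @ (1.0 - best))   # surrogate vs. true cut value
```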
SVD in Image Processing
Suppose A is the pixel intensity matrix of a large image. The entry ai j gives the
intensity of the i jth pixel. If A is n × n, the transmission of A requires transmitting
O(n 2 ) real numbers. Instead, one could send Ak , that is, the top k singular values
σ1 , σ2 , . . . , σk along with the left and right singular vectors U1 , U2 , . . . , Uk and
V1 , V2 , . . . , Vk . This would require sending O(kn) real numbers instead of O(n 2 )
real numbers. If k is much smaller than n, this results in substantial savings. For many images, a k much smaller than n can be used to reconstruct the image, provided that a relatively low-resolution version of the image is sufficient. Thus, one can use the SVD as a compression method.
For an illustration, suppose a satellite takes a picture and wants to send it to Earth.
The picture may contain 1000 × 1000 “pixels”, a million little squares, each with a
definite color. We can code the colors and send back 1000000 numbers. It is better
to find the essential information inside the 1000 × 1000 matrix and send only that.
Suppose we know the SVD. The key is in the singular values (in D, used in the previous section). Typically, some σs are significant and others are extremely small.
If we keep 20 and throw away 980, then we send only the corresponding 20 columns
of U and V (if A = U DV t , as in the previous section). The other 980 columns are
multiplied in UDVt by the small σs that are being ignored. We can do the matrix multiplication as columns times rows:
$$A = UDV^t = \sigma_1U_1V_1^t + \sigma_2U_2V_2^t + \cdots + \sigma_rU_rV_r^t.$$
Any matrix is the sum of r matrices of rank 1. If only 20 terms are kept, we send 20 times 2000 numbers instead of a million (25 to 1).
The pictures are really striking as more and more singular values are included: at first you see nothing, and suddenly you recognize everything. The cost is in computing the SVD; this has become much more efficient, but it is still expensive for a big matrix.
An example of SVD compression in image is demonstrated in Fig. 10.1, where
Fig. 10.1(a) is the original image of Airplane of size 3 × (512 × 512) which is con-
sidered as the test image. In order to apply SVD compression on color image, firstly
the color image is decomposed into three channels, i.e., Red, Green and Blue.
After that, each color channel passes through SVD compression. Finally, all the
three compressed channels are combined to generate the SVD compressed image.
Figure 10.1(b) shows the recovered image using the first 10 singular values having
Peak Signal-to-Noise Ratio (PSNR) value 22.62746 dB and Fig. 10.1(c) shows the
Fig. 10.1 SVD-based compression of Airplane image having size 3 × (512 × 512): (a) Original
image; (b) Compressed image using the first 10 singular values; (c) Compressed image using the
first 30 singular values
Fig. 10.2 Error of recovered image using different numbers of singular values
recovered image using the first 30 singular values, having PSNR value 27.202 dB. The visual quality of the recovered image is evaluated by PSNR, and a high value of PSNR shows that the reconstructed image has high visual quality. Moreover, from Fig. 10.2 we can see that using approximately the first 200 singular values yields approximately zero error in the reconstructed image.
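A sketch of this channel-wise compression pipeline (our own illustration, assuming NumPy); `image` below is a random stand-in for the 512 × 512 × 3 Airplane test image, and the PSNR helper follows the standard definition.

```python
# SVD compression of each color channel, with PSNR of the recovered image.
import numpy as np

def compress_channel(channel, k):
    U, s, Vt = np.linalg.svd(channel, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def psnr(original, recovered, peak=255.0):
    mse = np.mean((original - recovered) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(3)
image = rng.integers(0, 256, size=(512, 512, 3)).astype(float)  # stand-in image
k = 30
recovered = np.stack([compress_channel(image[..., c], k) for c in range(3)], axis=-1)
print(psnr(image, recovered))   # higher PSNR = better visual quality
```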
Exercises
1. Find the singular value decomposition of each of the following matrices:
$$\begin{pmatrix} 1 & -1 \\ 0 & 2 \end{pmatrix}, \quad \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}, \quad \begin{pmatrix} 2 & 3 \\ 3 & 1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} -3 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}.$$
2. Show that A and At have the same nonzero singular values. How are their singular value decompositions related? Justify your answer.
3. Let A be a symmetric matrix with eigenvalues λ1, λ2, . . . , λn. Show that |λ1|, |λ2|, . . . , |λn| are the singular values of A.
4. Let A be an m × n matrix with a singular value decomposition UDVt. Prove that
$$\min_{X\ne 0}\frac{\|AX\|_2}{\|X\|_2} = \sigma_n,$$
where σn is the smallest singular value of A.
Initially, we shall briefly describe the framework within which we are about to work.
Definition 11.1 Let F be a field, V a vector space of dimension n over F and S a nonempty set. The pair (S, V) is said to be an affine space if there exists a map
$$\varphi : S \times S \to V$$
such that (i) for any fixed P ∈ S and any vector v ∈ V, there is a unique Q ∈ S with ϕ(P, Q) = v, and (ii) ϕ(P, Q) + ϕ(Q, R) = ϕ(P, R) for all P, Q, R ∈ S.

Such a map establishes a correspondence between the elements of S and the vectors of V. In fact, after the choice of an origin O ∈ S, any vector v ∈ V can be represented as v = OP, where P ∈ S is uniquely determined. This bijection S → V then depends only on the choice of the origin point. Hence, if dimF V = n, then A ≅ V ≅ Fⁿ. In this sense, we have the following:
Definition 11.2 An affine coordinate system of the affine space A consists of a fixed
point O ∈ S, called origin, and a basis {e1 , . . . , en } of V. It is usually denoted by
{O, e1 , . . . , en }. In terms of this system, the coordinates of any point P ∈ A are
defined as the coordinates of the vector ϕ(O, P) = OP with respect to the basis
{e1 , . . . , en } of V. Hence, if
OP = x1e1 + x2e2 + · · · + xnen, then (x1, x2, . . . , xn) are called the coordinates of the point P in this system.
Remark 11.3 Let V be a vector space, n-dimensional over the field F. If we define
the map ϕ : V × V → V by ϕ(v, w) = w − v, for any vectors v, w ∈ V, then the
pair (V, V ) is an affine space of dimension n. Indeed, the conditions introduced in
Definition 11.1 are satisfied:
(i) For any fixed vector u ∈ V and for any vector v ∈ V, there is a unique vector
w ∈ V such that
ϕ(u, v) = v − u = w.
Definition 11.4 An affine space (S, V), where V is a real vector space, is said to be an affine Euclidean space if V is endowed with a symmetric bilinear form f whose associated quadratic form is positive definite. For brevity, we shall denote the affine Euclidean space (S, V) by E.
Hence, an affine Euclidean space E is associated with a real vector space V in which,
for any vectors v1 , v2 ∈ V , there corresponds a real nonnegative number f (v1 , v2 )
such that the following conditions are satisfied:
(1) For any vectors v1 , v2 , w ∈ V, f (v1 + v2 , w) = f (v1 , w) + f (v2 , w).
(2) For any vectors v1 , v2 ∈ V, f (v1 , v2 ) = f (v2 , v1 ).
(3) For any vectors v1 , v2 ∈ V and scalar λ ∈ R, f (λv1 , v2 ) = f (v1 , λv2 ) =
λ f (v1 , v2 ).
(4) For any vector 0 = v ∈ V, f (v, v) > 0.
A Euclidean coordinate system of E (also called Euclidean frame of reference in
E) is an affine coordinate system {O, e1 , . . . , en } of E, where vectors e1 , . . . , en are
pairwise f -orthonormal. For V = Rn , the affine Euclidean space (Rn , Rn ) is usually
called n-dimensional affine Euclidean space over R and denoted by REn .
Example 11.5 The real affine Euclidean plane RE2 in which f (v1 , v2 ) = v1 · v2
(the usual dot product) is a Euclidean space of dimension 2 associated with the
vector space R2 . It is well known how to introduce a coordinate system in the 2-
dimensional Euclidean plane RE2 (it is usually the 2-dimensional O X Y coordinate
system adopted for the Euclidean geometry in the plane). Choose an origin point
O in the plane and draw two perpendicular axes through it, one horizontal and one
vertical, respectively, called X and Y. Any point P can be uniquely represented by
the pair of real numbers (x, y), called the coordinates of the point P in terms of
the coordinate system O X Y. If i, j are the unit vectors of X and Y , respectively, the
point P has coordinates (x, y) if and only if
OP = xi + yj.
Analogously, in the 3-dimensional Euclidean space RE3 with coordinate system OXYZ and unit vectors i, j, k along the axes, any point P of coordinates (x, y, z) satisfies
OP = xi + yj + zk.
Example 11.7 More generally, if REn is the affine Euclidean space of dimension n
over R, then there exists an orthonormal basis B of Rn in terms of which the inner
product of vectors is defined by
v1 · v2 = α1β1 + · · · + αnβn, where (α1, . . . , αn) and (β1, . . . , βn) are the coordinates of v1 and v2 with respect to B.
the system (11.1) can be written in the compact form AX = B. The set of solutions
A′ is a subset of A = Fⁿ; more precisely, it is an affine subspace of A. To prove this fact, we show that if Y1, Y2 ∈ A′ are solutions of (11.1), then the vector Y1Y2 lies in
the solution space of homogeneous system associated with (11.1), which is a vector
subspace of V = Fn . For any P, Q ∈ A, we set PQ = Q − P. By Remark 11.3, we
know that this definition induces a structure of affine space. From AY1 = B and
AY2 = B, it follows A(Y2 − Y1 ) = AY2 − AY1 = 0, that is, Y1 Y2 = Y2 − Y1 lies in
the solution subspace of the homogeneous system associated with (11.1), as asserted.
In coordinates, the points (η1, . . . , ηn) of such an affine subspace are the solutions of linear equations
f1(η1, . . . , ηn) = b1, . . . , fr(η1, . . . , ηn) = br.
Remark 11.12 Let A = Fn be the affine space over the vector space V = Fn , for F
a field. Fix a point p ∈ A and let W be a vector subspace of V. The set
p + W = { p + w | w ∈ W }
is called the affine subspace of A passing through p with direction W.
Let A = Fn be the affine space over the vector space V = Fn , for F a field. Under the
assumption that PQ = Q − P, for any P, Q ∈ A, any affine subspace of A has the
form Q + W, for some fixed point Q ∈ A and vector subspace W of V. Moreover,
any affine subspace A of A can be represented by the associated vector subspace W
and by any of its points Q. To prove this, assume A′ = Q + W, let P ∈ A′ be any other point of A′ and set A″ = P + W. If R ∈ A′, then
PR = PQ + QR = −QP + QR ∈ W,
so R ∈ A″. Conversely, if S ∈ A″, then
QS = QP + PS = −PQ + PS ∈ W,
so S ∈ A′. Hence A′ = A″.
x = x0 + tl
y = y0 + tm (11.4)
z = z 0 + tn
which are called parametric equations of the straight line r. Thus, for any point
P ∈ r, if we denote by X and X0 the coordinate vectors of P and P0 , respectively,
we may write X = X0 + tv.
The real numbers (l, m, n) are called direction ratios of the straight line r. Therefore, two straight lines r and r′ are parallel if and only if their direction ratios are, respectively, (l, m, n) and ρ(l, m, n), for a suitable ρ ∈ R.
Moreover, if we consider two straight lines r and r′ in RE3, having respectively direction ratios (l, m, n) and (l′, m′, n′), we see that they are perpendicular if and only if the vectors v ≡ (l, m, n) and v′ ≡ (l′, m′, n′) are orthogonal, that is, the inner product v · v′ is zero. This means that ll′ + mm′ + nn′ = 0.
x − x0 = tl1 + t′l2
y − y0 = tm1 + t′m2
z − z0 = tn1 + t′n2,
so that the set of all points (x, y, z) belonging to the plane is described by
x = x0 + tl1 + t′l2
y = y0 + tm1 + t′m2
z = z0 + tn1 + t′n2.
Since, for every such point, the matrix
$$\begin{pmatrix} x - x_0 & y - y_0 & z - z_0 \\ l_1 & m_1 & n_1 \\ l_2 & m_2 & n_2 \end{pmatrix}$$
has rank ≤ 2, in particular its determinant is zero. By easy computation of this determinant, it follows that there exist suitable scalar coefficients a, b, c, d such that
the coordinates (x, y, z) of any point of the plane should satisfy the following relation
ax + by + cz + d = 0. It is called Cartesian equation of the plane.
Definition 11.15 Let A be an affine space over the vector space V and A′, A″ two affine subspaces of A, having directions V′ and V″, respectively. A′ and A″ are said to be parallel if either V′ ⊆ V″ or V″ ⊆ V′. In particular, if A′ and A″ have the same dimension, we say that they are parallel if V′ = V″.
and notice V′ ⊂ V″ as vector spaces. Then A′ and A″ are parallel subspaces in RA3. In particular, we may look at this from the geometrical point of view, saying that any line having direction ratios (1, −2, 2) and any plane containing vectors whose coordinates are of the form α(0, 1, −1) + β(1, 1, −1), for any α, β ∈ R, are parallel in the classical affine 3-dimensional space.
Consider in RA4 the linear systems
x1 − x2 + 2x3 − x4 = 1,
2x1 + x2 + x3 + x4 = 1, (11.5)
2x1 + x2 − x3 − x4 = 1,
and
x2 − x3 + x4 = 1,
(11.6)
x3 + x4 = 3.
The set A′ of solutions of (11.5) and the set A″ of solutions of (11.6) are parallel affine subspaces in RA4. To prove this, we first compute the direction V′ of A′. It consists of the solutions of the homogeneous linear system
consists of the solutions of the homogeneous linear system
x1 − x2 + 2x3 − x4 = 0,
2x1 + x2 + x3 + x4 = 0,
2x1 + x2 − x3 − x4 = 0,
that is, V′ = ⟨(1, −2, −1, 1)⟩. We may also notice that the point P = (2/3, −1/3, 0, 0) is a solution of system (11.5). Hence, A′ = P + V′ can be represented by the relations:
x1 = 23 + α
x2 = − 13 − 2α
, α ∈ R.
x3 = −α
x4 = α
On the other hand, the direction V″ of A″ consists of the solutions of the homogeneous linear system
x2 − x3 + x4 = 0,
x3 + x4 = 0,
that is, V″ = ⟨(1, 0, 0, 0), (0, −2, −1, 1)⟩. Since the point P′ = (0, 4, 3, 0) is a solution of system (11.6), we write A″ = P′ + V″, and it can be represented by
x1 = β
x2 = 4 − 2γ
, β, γ ∈ R.
x3 = 3−γ
x4 = γ.
Since V′ ⊂ V″ (indeed, (1, −2, −1, 1) = (1, 0, 0, 0) + (0, −2, −1, 1)), the subspaces A′ and A″ are parallel.
Proposition Every hyperplane H in REn consists of the points whose coordinate vectors Y satisfy
Y · u = λ (11.7)
for some nonzero vector u ∈ Rn that is orthogonal to the hyperplane and some λ ∈ R.
Proof Fix any point Q ∈ H ; let X be its coordinate vector. Then H − Q is a hyper-
plane containing the origin of the frame of reference and parallel to H. If u ∈ Rn is
orthogonal to H, then a point P, having coordinate vector Y, lies in H if and only if
the vector Y − X is orthogonal to u, that is, Y · u = X · u. Thus H has the form (11.7)
for λ = X · u. More precisely, if u = [a1 , . . . , an ]t ∈ Rn , then H is represented by
the linear equation
a1 x1 + · · · + an xn − λ = 0.
Notice that, following the argument presented in the above proposition, the hyper-
plane H contains the origin point of the frame of reference if and only if λ = 0.
Example 11.19 Let u = [1, 2, −1]t ∈ R3 . Then by the symbol Hu,λ we mean all
hyperplanes in RE3 that are orthogonal to vector u. The different hyperplanes Hu,λ
are parallel to each other, as λ varies. Thus, for instance,
(1) for λ = 0, Hu,0 is represented by equation x + 2y − z = 0;
(2) for λ = −1, Hu,−1 is represented by equation x + 2y − z + 1 = 0;
(3) for λ = 2, Hu,2 is represented by equation x + 2y − z − 2 = 0.
By using the dot product of inner product vector space R3 , we are in a position to
compile an overview of the main formulae and techniques enabling the computation
of angles and distances in the 3-dimensional real Euclidean space.
x = 1 + 2t
y = 2−t , t ∈R
z = 3t
and v be the vector of coordinates (2, −1, 1) in terms of the standard frame of
reference in O X Y Z . In order to get the projection of v onto r, we may compute
the projection onto r of a vector which is equipollent to v (that is, it has the same
length, direction and sense of v) but has its tail on r. We obtain this vector by a
simple translation of v. Actually, without loss of generality, we may assume that v
is precisely applied to r. Notice that the direction of r is represented by the vector
e = (2, −1, 3). At this point, performing the formula (11.8), we get
$$v' = \frac{v \cdot e}{e \cdot e}\,e = \Big(\frac{8}{7}, -\frac{4}{7}, \frac{12}{7}\Big).$$
x = 1 + t − t′
y = 2 + 2t − t′, t, t′ ∈ R
z = 1 − t + t′
and v be the vector of coordinates (1, 1, 2) in terms of the standard frame of reference
in O X Y Z . As above, we may assume that v is precisely applied to π. Starting from
the definition of π and using the classical orthogonalization process, we arrive at
the conclusion that the vector space representing the direction of π is generated by
orthogonal vectors e1 = (1, 2, −1), e2 = (−1, 1, 1). At this point, performing the
formula (11.8), we get
$$v' = \frac{v \cdot e_1}{e_1 \cdot e_1}e_1 + \frac{v \cdot e_2}{e_2 \cdot e_2}e_2 = \Big(-\frac{1}{2}, 1, \frac{1}{2}\Big).$$
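Both projections are easy to verify numerically (our own illustration, assuming NumPy; the helper name is hypothetical, and the basis passed to it must be orthogonal, as in the plane example above).

```python
# Orthogonal projection onto a line (one direction vector) or a plane
# (two orthogonal direction vectors), via formula (11.8).
import numpy as np

def project(v, *basis):
    """Projection of v onto span(basis); the basis vectors must be orthogonal."""
    return sum((v @ e) / (e @ e) * e for e in basis)

v_line = np.array([2.0, -1.0, 1.0])
e      = np.array([2.0, -1.0, 3.0])
print(project(v_line, e))                      # (8/7, -4/7, 12/7)

v_plane = np.array([1.0, 1.0, 2.0])
e1, e2  = np.array([1.0, 2.0, -1.0]), np.array([-1.0, 1.0, 1.0])
print(project(v_plane, e1, e2))                # (-1/2, 1, 1/2)
```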
Let r be a straight line with direction ratios (l, m, n), and denote by (r, X), (r, Y), (r, Z) the angles formed by r and the positive X, Y and Z axes, respectively. Then the direction cosines of r can be represented by
$$\alpha = \cos(r, X) \in \Big\{+\frac{l}{\sqrt{l^2+m^2+n^2}},\ -\frac{l}{\sqrt{l^2+m^2+n^2}}\Big\},$$
$$\beta = \cos(r, Y) \in \Big\{+\frac{m}{\sqrt{l^2+m^2+n^2}},\ -\frac{m}{\sqrt{l^2+m^2+n^2}}\Big\},$$
$$\gamma = \cos(r, Z) \in \Big\{+\frac{n}{\sqrt{l^2+m^2+n^2}},\ -\frac{n}{\sqrt{l^2+m^2+n^2}}\Big\}.$$
Let now (l, m, n) and (l′, m′, n′) be the direction ratios of the straight lines r and r′, respectively. The cosine of the angle enclosed between r and r′ is equal to the cosine of the angle enclosed between the unit vectors of r and r′, having coordinates
$$\Big(\frac{l}{\sqrt{l^2+m^2+n^2}}, \frac{m}{\sqrt{l^2+m^2+n^2}}, \frac{n}{\sqrt{l^2+m^2+n^2}}\Big)$$
and
$$\Big(\frac{l'}{\sqrt{l'^2+m'^2+n'^2}}, \frac{m'}{\sqrt{l'^2+m'^2+n'^2}}, \frac{n'}{\sqrt{l'^2+m'^2+n'^2}}\Big),$$
so that
$$\cos(rr') = \pm\frac{ll' + mm' + nn'}{\sqrt{l^2+m^2+n^2}\,\sqrt{l'^2+m'^2+n'^2}}.$$
For instance, consider the lines
$$r: \begin{cases} x = 1 + 2t \\ y = 1 - t \\ z = 1 + t \end{cases} \qquad r': \begin{cases} x = 2 + 3t' \\ y = 1 + 2t' \\ z = 1 + t' \end{cases} \qquad t, t' \in \mathbb{R}.$$
The directions of r and r′ are (2, −1, 1) and (3, 2, 1), respectively. Their direction cosines are
$$\cos(r, X) = \pm\frac{2}{\sqrt 6}, \quad \cos(r, Y) = \mp\frac{1}{\sqrt 6}, \quad \cos(r, Z) = \pm\frac{1}{\sqrt 6},$$
$$\cos(r', X) = \pm\frac{3}{\sqrt{14}}, \quad \cos(r', Y) = \pm\frac{2}{\sqrt{14}}, \quad \cos(r', Z) = \pm\frac{1}{\sqrt{14}},$$
so that
$$\cos(rr') = \pm\frac{5}{\sqrt 6\,\sqrt{14}}.$$
Let H be the hyperplane of equation
$$f(x_1, \ldots, x_n) = 0, \qquad \text{where } f(x_1, \ldots, x_n) = \sum_{i=1}^n \alpha_ix_i - \lambda, \tag{11.9}$$
let u = (α1, . . . , αn) be a vector orthogonal to H, and let P0 be a point with coordinate vector X0 = (β1, . . . , βn). The line through P0 orthogonal to H is
$$H^\perp: \; x_1 = \beta_1 + \alpha_1t, \;\ldots,\; x_n = \beta_n + \alpha_nt, \qquad t \in \mathbb{R}. \tag{11.10}$$
The foot Q0 of the perpendicular from P0 to H corresponds to the value of t for which
$$\sum_{i=1}^n (\alpha_i\beta_i + \alpha_i^2t) - \lambda = 0,$$
that is,
$$f(X_0) + t\|u\|^2 = 0.$$
Thus, the point Q0 is obtained from (11.10) for $t = -\frac{f(X_0)}{\|u\|^2}$. The coordinate vector representing P0Q0 is then
$$\Big(-\frac{f(X_0)}{\|u\|^2}\alpha_1, -\frac{f(X_0)}{\|u\|^2}\alpha_2, \ldots, -\frac{f(X_0)}{\|u\|^2}\alpha_n\Big),$$
whose length is
$$\sqrt{\frac{f(X_0)^2}{\|u\|^4}(\alpha_1^2 + \cdots + \alpha_n^2)} = \frac{|f(X_0)|}{\|u\|}.$$
In particular, in RE3, the distance of the point P0 ≡ (x0, y0, z0) from the plane H : ax + by + cz + d = 0 is
$$\delta(P_0, H) = \frac{|ax_0 + by_0 + cz_0 + d|}{\sqrt{a^2 + b^2 + c^2}}.$$
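A direct check of this formula (our own illustration, assuming NumPy; the plane and point below are arbitrary):

```python
# Point-to-plane distance in RE3.
import numpy as np

def point_plane_distance(p, coeffs):
    """delta(P0, H) for the plane a x + b y + c z + d = 0, coeffs = (a, b, c, d)."""
    a, b, c, d = coeffs
    return abs(a * p[0] + b * p[1] + c * p[2] + d) / np.sqrt(a**2 + b**2 + c**2)

print(point_plane_distance((1.0, 2.0, 3.0), (1.0, 2.0, -1.0, 4.0)))  # |1+4-3+4|/sqrt(6)
```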
Example 11.24 As a consequence, we may also obtain the distance between two parallel planes. To do this, without loss of generality, we consider the planes π and π′ having equations
π : ax + by + cz + d = 0, π′ : ax + by + cz + d′ = 0, d ≠ d′.
Choosing any point P0 ≡ (x0, y0, z0) of π (so that ax0 + by0 + cz0 = −d), we get
$$\delta(\pi, \pi') = \delta(P_0, \pi') = \frac{|ax_0 + by_0 + cz_0 + d'|}{\sqrt{a^2+b^2+c^2}} = \frac{|d' - d|}{\sqrt{a^2+b^2+c^2}}.$$
distance between the two lines, which is the distance between π and π′. It is naturally the shortest distance between the lines, i.e., the length of the line segment orthogonal to both lines. The solution of the problem is very simple through the use of both the scalar product · and the cross product ∧ of vectors. In detail, let v = (l, m, n) and v′ = (l′, m′, n′) be two vectors representing the direction ratios of r and r′, respectively. The vector v ∧ v′ is orthogonal to v and v′, that is, to r and r′. Hence, for any choice of two points P ∈ r and Q ∈ r′, the absolute value of the scalar projection of PQ in the direction of v ∧ v′ is the minimum distance δ(r, r′) between r and r′:
$$\delta(r, r') = \Big|\,\mathbf{PQ} \cdot \frac{v \wedge v'}{\|v \wedge v'\|}\,\Big|.$$
$$r: \begin{cases} x = 0 \\ y = t \\ z = 1 + t \end{cases} \qquad r': \begin{cases} x = -2 + t' \\ y = -1 - 3t' \\ z = -2t' \end{cases} \qquad t, t' \in \mathbb{R}.$$
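Applying the formula above to these two lines (our own numerical check, assuming NumPy), with P = (0, 0, 1) on r, Q = (−2, −1, 0) on r′, and directions v = (0, 1, 1), v′ = (1, −3, −2):

```python
# Minimum distance between the two skew lines of this example.
import numpy as np

P, Q = np.array([0.0, 0.0, 1.0]), np.array([-2.0, -1.0, 0.0])
v, vp = np.array([0.0, 1.0, 1.0]), np.array([1.0, -3.0, -2.0])
w = np.cross(v, vp)                       # orthogonal to both lines
delta = abs((Q - P) @ w) / np.linalg.norm(w)
print(delta)                              # 2/sqrt(3)
```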
The requested distance is the height ‖P1Q1‖ of the parallelogram P determined by the vectors Q1Q0 and Q0P1, where Q1 ≡ (x1, y1, z1) is any point of r and v ≡ (l, m, n) is the direction of r. On the one hand, the area of P equals ‖Q1Q0‖ · ‖P1Q1‖; on the other hand, since Q1Q0 and Q0P1 are the sides of P, it can also be obtained as the norm of Q1Q0 ∧ Q0P1. Therefore,
$$\delta(P_0, r) = \|\mathbf{P_1Q_1}\| = \frac{\|\mathbf{Q_1Q_0} \wedge \mathbf{Q_0P_1}\|}{\|\mathbf{Q_1Q_0}\|} = \frac{\sqrt{\begin{vmatrix} x_1-x_0 & y_1-y_0 \\ l & m \end{vmatrix}^2 + \begin{vmatrix} x_1-x_0 & z_1-z_0 \\ l & n \end{vmatrix}^2 + \begin{vmatrix} y_1-y_0 & z_1-z_0 \\ m & n \end{vmatrix}^2}}{\sqrt{l^2+m^2+n^2}}.$$
Example 11.26 Let P0 ∈ RE3 be the point of coordinates (1, −1, 2) and r the line
represented by the parametric form
x = 1 + 2t
r : y = 1 − t , t ∈ R.
z =1+t
In order to obtain the minimum distance from P0 to r, we firstly compute the coor-
dinate of vector P0 Q, where Q is any point of r. For instance, if we choose Q as the
point of coordinates (1, 1, 1), it follows that P0 Q ≡ (0, 2, −1). Then the requested
distance is
$$\delta(P_0, r) = \frac{\sqrt{\begin{vmatrix} 0 & 2 \\ 2 & -1 \end{vmatrix}^2 + \begin{vmatrix} 0 & -1 \\ 2 & 1 \end{vmatrix}^2 + \begin{vmatrix} 2 & -1 \\ -1 & 1 \end{vmatrix}^2}}{\sqrt{4 + 1 + 1}} = \frac{\sqrt{16 + 4 + 1}}{\sqrt 6} = \frac{\sqrt 7}{\sqrt 2}.$$
Exercises
1. Let A = RA5 be equipped with the standard frame of reference. Determine para-
metric and linear equations representing the affine subspace of A having direc-
tion V = ⟨(0, 1, 1, 0, 0), (0, 0, 0, 1, 0), (1, 1, 0, 0, −1)⟩ and containing the point
P ≡ (1, −1, 0, 0, 0).
2. In the affine space RA6 equipped with the standard frame of reference, represent
the affine subspace containing the following points:
3. In the Euclidean space RE3 equipped with the standard frame of reference, deter-
mine the projection of vector v ≡ (1, 2, −1) onto the hyperplane containing the
origin point and having direction V = ⟨(2, 1, 0), (−1, 1, 1)⟩.
4. In the Euclidean space RE3 , consider the following lines: r1 containing the point
P ≡ (1, 1, 2) and having direction V = ⟨(1, 1, 0)⟩, r2 as the intersection of the planes
having equations x + y − z + 1 = 0 and 2x + y + z − 1 = 0. Prove that r1 and
r2 are skew lines and determine their minimum distance.
5. Let P1 , P2 , P3 be three points in the affine space RA2 equipped with the standard
frame of reference. Letting (α1 , α2 ), (β1 , β2 ) and (γ1 , γ2 ) be their coordinates,
prove that P1 , P2 , P3 are collinear if and only if the matrix
$$\begin{pmatrix} \alpha_1 & \alpha_2 & 1 \\ \beta_1 & \beta_2 & 1 \\ \gamma_1 & \gamma_2 & 1 \end{pmatrix}$$
is singular.
The homomorphism ϕ : V → V is called the linear part of f (or also the associated
homomorphism with f). An affine transformation of A is an isomorphism of A onto itself, and its linear part is an automorphism ϕ : V → V.
The set of all affine transformations of A is usually denoted by A f f (A). The fact
that it is a group can be easily checked.
Remark 11.28 In fact,
(i) the identity map η : A → A is the affine transformation having the identity
map on V as associated automorphism,
(ii) the composition of two affine transformations g ◦ f of A (having associated
automorphisms χ , ϕ : V → V , respectively) is the affine transformation asso-
ciated with the automorphism χ ◦ ϕ of V,
(iii) the inverse of the affine transformation f : A → A, having associated auto-
morphism ϕ : V → V, is the affine transformation f −1 : A → A having
ϕ −1 : V → V as associated automorphism.
Example 11.29 Let A = (S, V ) be an affine space, where V is a vector space over
the field F, and let v ∈ V be a fixed vector. Consider the map f v : A → A defined
as follows: for any P ∈ A,
f v (P) = Q ∈ A ⇐⇒ PQ = v.
fv (P)fv (Q) = P1 P2
= P1 P + PQ + QP2
= −v + PQ + v
= PQ.
Remark 11.30 Let A = (S, V ) be an affine space and f : A → A any affine trans-
formation associated with the identity map of V. It follows that, for any P, Q ∈ A,
both the following hold:
f(P)f(Q) = PQ
and
Pf(P) = PQ + Qf(Q) + f(Q)f(P)
= PQ + Qf(Q) − PQ
= Qf(Q).
χ ( f w ◦ f v ) = w + v = χ ( f w ) + χ ( f v ),
Example 11.32 Let A = (S, V ) be an affine space, where V is a vector space over
the field F. Let P0 ∈ A be a fixed point and 0 = λ ∈ F. Consider the map f P0 ,λ :
A → A defined as follows: for any P ∈ A,
f P0 ,λ (P) = Q ∈ A ⇐⇒ P0 Q = λP0 P.
The point P0 is called the center of f P0 ,λ and the nonzero scalar λ is called its
ratio. If we assume f P0 ,λ (P1 ) = f P0 ,λ (P2 ) = Q ∈ A, then λP0 P1 = λP0 P2 , imply-
ing P1 = P2 ∈ A. Hence, f P0 ,λ is injective. Moreover, for any Q ∈ A and tak-
ing P = f P0 ,λ−1 (Q), one has that P0 P = λ−1 P0 Q, that is, P0 Q = λP0 P. Thus
Q = f P0,λ(P), i.e., f P0,λ is surjective.
Let now P, Q ∈ A, such that P1 = f P0 ,λ (P) and P2 = f P0 ,λ (Q), that is, P0 P1 =
λP0 P and P0 P2 = λP0 Q. It follows that
Proof Let V and V be the respective vector spaces of A and A , and ϕ : V → V the
linear transformation associated with f. For any points Q 1 , Q 2 ∈ f −1 (P ), we see
that ϕ(Q1 Q2 ) = f(Q1 )f(Q2 ) = 0 (since f (Q 1 ) = f (Q 2 ) = P ). Hence, the vector
Q1 Q2 lies in the null space of ϕ (which is a subspace of V ).
Consider now P ∈ f −1 (P ) ⊆ A and any vector u ∈ K er (ϕ), the null space of
ϕ. Then, there exists Q ∈ A such that PQ = u, that is, 0 = ϕ(PQ) = f(P)f(Q), and
f (P) = f (Q) follows. Thus, Q ∈ f −1 (P ) and K er (ϕ) is precisely the vector space
associated with the affine subspace f −1 (P ).
Definition 11.37 Let f : A → A be a projection of A onto its subspace A and
S ⊆ A (not necessarily a subspace of A ). The set
f −1 (S) = f −1 (P)
P∈S
is called a cylinder in A.
Let us now write down the action of affine transformations of affine spaces in coor-
dinate form. To do this, let A be an affine n-dimensional space associated with the
vector space V over the field F, B = {O, e1 , . . . , en } a frame of reference of A. We
prove the following.
Theorem 11.38 The map f : A → A is an affine transformation if and only if there
exist a nonsingular matrix A ∈ Mn (F) and a fixed point c ∈ A such that, for any
point P ∈ A having coordinate vector X = [x1 , . . . , xn ]t in terms of B, the point
f (P) ∈ A has coordinate vector Y = [y1 , . . . , yn ]t in terms of B, satisfying the
following relation:
Y = AX + c. (11.11)
OP = x1 e1 + · · · + xn en
Of(P) = y1 e1 + · · · + yn en
Of(O) = c1 e1 + · · · + cn en
and ϕ(ei ) = Aei , for each i = 1, . . . , n, treating ϕ(ei ) and ei as column vectors.
Thus,
ϕ(OP) = ϕ(x1 e1 + · · · + xn en ) = A(x1 e1 + · · · + xn en )
and
f(O)f(P) = (y1 − c1 )e1 + · · · + (yn − cn )en .
that is,
$$A\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} - \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix},$$
as required.
Conversely, for any invertible matrix A ∈ Mn (F) associated with an automor-
phism of V with respect to the basis {e1 , . . . , en }, and for any point c ∈ A having
coordinate vector [c1 , . . . , cn ]t in terms of B, let f A,c : A → A be the map defined
by relation (11.11). This map is an affine transformation. In fact, for any P, Q ∈ A,
having respectively coordinate vectors X = [x1 , . . . , xn ]t and Y = [y1 , . . . , yn ]t in
terms of B, the following holds:
Remark 11.39 Translations of an n-dimensional affine space A are precisely all the
affine transformations of the form f In ,c , where c ∈ A is a fixed point and In ∈ Mn (F)
is the identity matrix.
One of the most relevant aspects of the affine transformations is that some properties
are invariant under the action of such transformations. If a subset S ⊂ A possesses a
property that is invariant under the action of f, then f (S) ⊂ A is a subset having the
same property. Later, we describe in detail the application to the case of geometric
figures having properties that are invariant under the action of affine transformations.
Here, we firstly would like to fix some useful results:
Theorem 11.40 Let A be the affine space FAn and f : A → A an affine transfor-
mation. Then
(i) f maps an affine subspace A to an affine subspace having the same dimension
of A .
(ii) f preserves the property of parallelism among affine subspaces.
In particular, if n = 2 and A is the affine plane FA2, then it is obvious to observe that f maps a line to a line and also preserves the property of parallelism among lines; the same holds for lines in the affine 3-dimensional space FA3. Further, for n = 3, if A is the affine 3-dimensional space FA3 and f : A → A an affine transformation, then f maps a plane to a plane; it preserves the property of parallelism among planes and the property of parallelism between a plane and a line.
Proof (i) To show these properties, without loss of generality, we may consider
A = P0 + W, for a fixed point P0 ∈ A and a vector subspace W of Fn . Here, we
have identified V by Fn because V Fn , where V is the vector space associated
with the affine space A. Assume that {w1 , . . . , wk } is a basis of W. Thus, for any
point P ∈ A ,
P0 P = t1 w1 + · · · + tk wk
X = X 0 + t1 w1 + · · · + tk wk .
The coordinates of the point f (P) may be computed using the expression f (X ) =
AX + c, where A is an invertible matrix of Mn (F) and c is a fixed point of A . Thus
f (P) = f (X 0 + t1 w1 + · · · + tk wk )
= A(X 0 + t1 w1 + · · · + tk wk ) + c
= (AX 0 + c) + t1 Aw1 + · · · + tk Awk
= P1 + w
Since ϕ(U ) is a vector subspace of ϕ(W ), we conclude that f (A ) and f (A ) are
parallel.
Now, we spend a few lines in order to introduce the first step of the study of conics
in affine spaces. We’ll come back to a deep analysis of conics in the sequel; here, we
just would like to remark some properties which are strictly related to the previous
results.
A conic section or a conic is the locus of a point which moves in a plane so that its
distance from a fixed point is in a constant ratio to its perpendicular distance from a
fixed straight line. The fixed point is called the focus, the fixed straight line is called
the directrix and the constant ratio is called eccentricity usually denoted by e. The
line passing through the focus and perpendicular to the directrix is called axis, and
the point of intersection of a conic with its axis is called a vertex.
[Figure: a conic as the locus of a moving point P whose distances from the focus S and from the directrix are in a constant ratio.]
the general equation of the second degree. Conversely, it can be proved that the general equation of the second degree (11.12) also represents a conic section. Hence, a conic section can be defined in the following way also:
A conic section is the set of all points in a plane whose coordinates satisfy the general equation of the second degree given by (11.12). Now let
$$\Delta = \begin{vmatrix} a & h & g \\ h & b & f \\ g & f & c \end{vmatrix}.$$
Then the conic section represented by (11.12) is called
• a parabola if Δ ≠ 0, h² = ab;
• an ellipse if Δ ≠ 0, h² < ab (either a ≠ b or h ≠ 0);
• a circle if Δ ≠ 0, a = b, h = 0;
• a hyperbola if Δ ≠ 0, h² > ab.
so that q(x, y) can be written as q(x, y) = X t AX, where X = [x, y]t . Hence, if
we denote B = [2a13 , 2a23 ]t , then the equation g(x, y) = 0 can be written in the
following matrix notation:
X t AX + B t X + a33 = 0 (11.15)
Case I: If |C| ≠ 0 and |A| > 0, the conic is a non-degenerate ellipse.
Case II: If |C| ≠ 0 and |A| < 0, the conic is a non-degenerate hyperbola.
Case III: If |C| ≠ 0 and |A| = 0, the conic is a non-degenerate parabola.
Case IV: If |C| = 0, the conic is degenerate, that is, it consists of either the union of two (secant, parallel or merged) real lines or the union of two conjugate (secant or parallel) imaginary lines. In these subcases, the rank of C is equal to 2 if the conic is a union of distinct lines, and equal to 1 in the remaining case (merged lines).
where
$$H = M^{-1} = \begin{pmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{pmatrix}, \quad X' = f(X) = [x', y']^t, \quad d = -M^{-1}c = \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix}.$$
Hence, by using relation (11.17) in (11.15), we have that the geometrical locus f (Γ )
consisting of all points whose coordinates X are the solution of
that is,
X t (H t AH )X + X t H t Ad + d t AH X + d t Ad + B t H X + B t d + a33 = 0.
(11.18)
Moreover, by the facts X t H t Ad = d t At H X and At = A, (11.18) reduces to
X t (H t AH )X + (d t AH + d t AH + B t H )X + (d t Ad + B t d + a33 ) = 0.
(11.19)
Therefore, under the notation A′ = HtAH, B′ = (dtAH + dtAH + BtH)t and a′33 = dtAd + Btd + a33 in (11.19), f(Γ) is represented by
$$X'^tA'X' + B'^tX' + a'_{33} = 0,$$
that is, a quadratic equation h(x′, y′) = 0. Then f(Γ) is a conic curve. Notice that
relation (11.17) also induces the following one:
⎡ ⎤ ⎡ ⎤⎡ ⎤
x β11 β12 γ1 x
⎣ y ⎦ = ⎣ β21 β22 γ2 ⎦ ⎣ y ⎦ . (11.20)
1 0 0 1 1
that is,
$$Y'^t\begin{pmatrix} \beta_{11} & \beta_{21} & 0 \\ \beta_{12} & \beta_{22} & 0 \\ \gamma_1 & \gamma_2 & 1 \end{pmatrix} C \begin{pmatrix} \beta_{11} & \beta_{12} & \gamma_1 \\ \beta_{21} & \beta_{22} & \gamma_2 \\ 0 & 0 & 1 \end{pmatrix}Y' = 0. \tag{11.21}$$
Since C and C′ are congruent, they have the same rank. Then we may assert that Γ is non-degenerate (respectively, degenerate) if and only if f(Γ) is. Moreover, if Γ
were a degenerate conic, then it would be a pair of secant, parallel or merged lines.
In this case, since f maps a line to a line and preserves the property of parallelism
among lines; the image f (Γ ) would be again a pair of secant, parallel or merged
lines, respectively.
Finally, we fix our attention on the case of a non-degenerate conic Γ (ellipse, hyperbola or parabola). The matrix A′ represents the quadratic part of the polynomial h(x′, y′), which describes the image f(Γ). Since A′ = HtAH, we have |A′| = |H|²|A|, that is, the determinant of A′ is zero, positive or negative according as the determinant of A is zero, positive or negative. Thus, the type of the conic is unchanged.
Let us describe now one class of affine transformations of particular interest. Let us
assume that f : A → A has a fixed point, that is, O ∈ A such that f (O) = O. In
light of Theorem 11.38, f can be represented, with respect to a frame of reference
{O, e1 , . . . , en } of A, by the action
f A,O (X ) = AX,
the set of all affine transformations of the form f A,O . In this sense, there is a one-
to-one correspondence between A f f (A) O and the matrices A ∈ G L n (F). For this
reason, such affine transformations are usually called linear.
Consider then an affine transformation f : A → A and let O be any point of A.
If we denote v = Of(O) ∈ V and f v is the translation of A defined by v, then the
composition g = f v−1 ◦ f (that is, g = f −v ◦ f ) is clearly a linear affine transfor-
mation of A because g fixes the point O as discussed below. Hence, f = f v ◦ g is
a representation of f precisely in terms of composition of one translation and one
linear affine transformation.
Notice that if v′ = −Of⁻¹(O) ∈ V and h = f ◦ f v′⁻¹, where f v′ is the translation of A defined by v′, then
f v′⁻¹(O) = T ⟺ f −v′(O) = T ⟺ OT = −v′ ⟺ OT = Of⁻¹(O) ⟺ T = f⁻¹(O),
and
h(O) = f ◦ f v′⁻¹(O) = f(f⁻¹(O)) = O.
Therefore, g and h are linear affine transformations fixing the same point O ∈ A
and associated with the same automorphism of V. Thus, in light of the previously
mentioned bijection between A f f (A) O and G L n (F), g and h are represented by
the same matrix A ∈ G L n (F) (by the choice of a basis of V ), that is, g = h. As a
conclusion, we have that
f = f v ◦ g = g ◦ f v′. (11.22)
11.3 Isometries
Definition 11.45 Let E be an affine Euclidean space over the vector space V. An
affine transformation f : E → E is called an isometry of E if the associated auto-
morphism ϕ : V → V is an orthogonal operator (an isometry of V ).
v = w + αu (11.23)
for α ∈ R such that w = αu. By the fact that w · u = 0 and performing the dot
product by u in the identity (11.23), it follows v · u = αu · u.
$$f(v) = f(w + \alpha u) = w - \alpha u = (v - \alpha u) - \alpha u = v - 2\alpha u = v - 2\frac{v \cdot u}{u \cdot u}u,$$
which is the coordinate vector of the image f(P). In particular, since w and αu are orthogonal,
$$\|f(v)\| = \|w - \alpha u\| = \|w + \alpha u\| = \|v\|.$$
$$\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_{n-1} \\ -\alpha_n \end{pmatrix} = \begin{pmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & -1 \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_{n-1} \\ \alpha_n \end{pmatrix}.$$
This shows that every reflection across a hyperplane containing the origin consists
only of its linear part, moreover, it is represented by a matrix having determinant
equal to −1.
To get a formula for reflections across any hyperplane, not necessarily containing
the origin, we describe the hyperplane in terms of the coordinate vectors Y of its points:
$$Y \cdot u = \lambda$$
for some nonzero vector u ∈ Rn that is orthogonal to the hyperplane and some λ ∈ R.
$$f(v) = v - 2\frac{v \cdot u - \lambda}{u \cdot u}u,$$
which represents the coordinate vector of the reflection of the point P across H.
To clarify much more the previous example, we illustrate some specific cases.
Example 11.48 Let r be the straight line in RE2 defined by the equation x − 2y + 1 = 0. Its direction is V = ⟨(2, 1)⟩, whose orthogonal complement is ⟨u⟩, where u ≡ (1, −2). Any point of r possesses coordinate vector (y1, y2) satisfying the relation y1 − 2y2 = −1 (λ = −1). Hence, for any vector (x1, x2) representing a point of RE2, its reflection across r is obtained by
$$f(x_1, x_2) = \Big(x_1 - \frac{2x_1 - 4x_2 + 2}{5},\; x_2 + \frac{4x_1 - 8x_2 + 4}{5}\Big).$$
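This reflection is easy to verify numerically (our own illustration, assuming NumPy; the helper name is hypothetical):

```python
# Reflection across the hyperplane Y . u = lambda, applied to Example 11.48.
import numpy as np

def reflect(v, u, lam):
    """Reflection of the point with coordinate vector v across Y . u = lam."""
    return v - 2 * (v @ u - lam) / (u @ u) * u

u = np.array([1.0, -2.0])
print(reflect(np.array([0.0, 0.0]), u, -1.0))   # (-2/5, 4/5)
print(reflect(np.array([1.0, 1.0]), u, -1.0))   # (1, 1): this point lies on r
```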
By Theorem 11.38 and since any orthogonal operator of a vector space is represented
by an orthogonal matrix, the following directly follows.
Corollary An affine transformation f of E is an isometry if and only if it can be written in the form
Y = AX + c, (11.27)
where A is an orthogonal matrix.
Let us note that we are now able to reformulate the result contained in Theorem 11.44.
Theorem 11.52 Let E be an affine Euclidean space over the vector space V. For
any isometry f : E → E and for any point O ∈ E, there exist unique v ∈ V and an
isometry g : A → A fixing O, such that
f = fv ◦ g (11.28)
Remark 11.53 We know that, for any points P, Q of an affine Euclidean space E
over the Euclidean vector space V, the vector PQ is determined. Hence, we may
define the map
δ :E×E→R
such that δ(P, Q) = PQ , usually called distance between P and Q. As a met-
ric map on a vector space, one may see that the distance δ satisfies the following
properties:
(i) δ(P, Q) > 0, for any Q = P points of E.
(ii) δ(P, P) = 0, for any P ∈ E.
(iii) δ(P, Q) = δ(Q, P), for any P, Q ∈ E.
(iv) δ(P, Q) ≤ δ(P, H) + δ(H, Q), for any P, H, Q ∈ E.
In other words, a distance satisfies the same conditions as a metric on a vector space; thus, any affine Euclidean space is a metric space.
Theorem 11.54 Let E be an affine Euclidean space over the vector space V (over
the field R). The map f : E → E is an isometry if and only if there exists a distance
δ : E × E → R such that
δ(f(P), f(Q)) = δ(P, Q) (11.29)
for any P, Q ∈ E.
Proof If f is an isometry with associated orthogonal operator ϕ, then
$$\delta(f(P), f(Q)) = \|\mathbf{f(P)f(Q)}\| = \|\varphi(\mathbf{PQ})\| = \|\mathbf{PQ}\| = \delta(P, Q).$$
Conversely, assume that condition (11.29) holds. We now fix a point O ∈ E and
introduce the map σ : E → V defined by σ (P) = OP, for any point P ∈ E. In light
of the definition of affine spaces, it is clear that σ is a bijection between E and V,
i.e., for any point P ∈ E, there is a unique vector v ∈ V such that v = OP.
Consider the following function ϕ : V → V defined by ϕ(v) = f(O)f(P), for any
v = OP ∈ V. Notice that
$$\|\varphi(v) - \varphi(w)\| = \|\mathbf{f(O)f(P)} - \mathbf{f(O)f(Q)}\| = \|\mathbf{f(Q)f(P)}\| = \|\mathbf{QP}\| = \|v - w\|,$$
11.4 A Natural Application: Coordinate Transformation in RE2

A coordinate transformation in the affine Euclidean space RE2 is the transition from
a frame of reference of RE2 to another one, corresponding to a transition from a
basis of R2 to another one, so that any point of RE2 (and so any vector of R2 ) is
represented by different coordinates with respect to different systems.
We fix our attention to translations and rotations of the unit vectors i and j of a
Cartesian coordinate system O X Y. More precisely,
Translation: Introduce a second coordinate system O X Y , having the origin O and
unit vectors i and j , such that the coordinates of O with respect to the system O X Y
are (x0 , y0 ) and the base vectors i , j are parallel to i and j, respectively.
Rotation: Introduce a second coordinate system O X Y , having the same origin and
unit vectors i and j , such that the new coordinate system is obtained from the first
one by a rotation of a certain angle ϑ of the base vectors. We recall that the sign
convention of rotations is positive counterclockwise.
Let (x, y) be the coordinates of the point P in terms of O X Y. In order to determine
the coordinates (x , y ) of P with respect to the new coordinate system, we firstly
remark that they are the coordinates of the vector OP, where O is the tail and P is
the head. We express these coordinates both in terms of {i, j} and in terms of {i , j }:
OP = xi + yj, O′P = x′i′ + y′j′.
x = x − x0
y = y − y0
398 11 Affine and Euclidean Spaces and Applications of Linear Algebra to Geometry
and conversely
x = x′ + x0
y = y′ + y0.
Moreover, one can see that rotations are nothing more than changes of basis in the vector space R2, so they can be represented by the transition matrix of the ordered basis {i′, j′} relative to the ordered basis {i, j}. The coordinate vector of i′ relative to the basis {i, j} is the first column vector of this transition matrix; analogously, the coordinate vector of j′ relative to the basis {i, j} is the second column vector.
Hence, in the case the new coordinate system is obtained from the first one by a counterclockwise rotation of a certain angle ϑ, the coordinate vectors of i′ and j′ relative to the basis {i, j} are (cos ϑ, sin ϑ) and (−sin ϑ, cos ϑ), respectively. Thus, the rotation is represented by
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos\vartheta & -\sin\vartheta \\ \sin\vartheta & \cos\vartheta \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix}.$$
Equivalently,
x = x′ cos ϑ − y′ sin ϑ
y = x′ sin ϑ + y′ cos ϑ
and
x′ = x cos ϑ + y sin ϑ
y′ = −x sin ϑ + y cos ϑ.
We finally assume that the new coordinate system O′X′Y′ has been counterclockwise rotated by a certain angle ϑ with respect to OXY and then translated. As above, we denote by O′ ≡ (x0, y0) the coordinates of the translated origin. In this case, the roto-translation is represented by the relation
$$\begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix} = \begin{pmatrix} \cos\vartheta & -\sin\vartheta \\ \sin\vartheta & \cos\vartheta \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix}$$
and conversely
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\vartheta & \sin\vartheta \\ -\sin\vartheta & \cos\vartheta \end{pmatrix}\begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix}. \tag{11.30}$$
Thus, we have
x = x′ cos ϑ − y′ sin ϑ + x0
y = x′ sin ϑ + y′ cos ϑ + y0 (11.31)
and
x′ = (x − x0) cos ϑ + (y − y0) sin ϑ
y′ = −(x − x0) sin ϑ + (y − y0) cos ϑ.
Example 11.55 Let P ≡ (2, −1) and r : x + y − 2 = 0 be, respectively, the coordinates of the point P and the equation of the line r with respect to the system OXY. Introduce a new coordinate system O′X′Y′, where O′ ≡ (1, 4) and the base vectors {i′, j′} are counterclockwise rotated by π/4.
We now determine the coordinates of P and the equation of r in terms of the new coordinate system O′X′Y′. By using relation (11.30), we obtain the new coordinates of P:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \frac{\sqrt 2}{2} & \frac{\sqrt 2}{2} \\ -\frac{\sqrt 2}{2} & \frac{\sqrt 2}{2} \end{pmatrix}\begin{pmatrix} 1 \\ -5 \end{pmatrix} = \begin{pmatrix} -2\sqrt 2 \\ -3\sqrt 2 \end{pmatrix}.$$
Analogously, substituting (11.31) into the equation of r, we obtain the equation of r with respect to O′X′Y′, that is,
$$\sqrt 2\,x' + 3 = 0.$$
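The coordinates of P under this roto-translation can be checked directly (our own illustration, assuming NumPy):

```python
# Reproducing Example 11.55 with relation (11.30).
import numpy as np

theta, O_new = np.pi / 4, np.array([1.0, 4.0])
R = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
P = np.array([2.0, -1.0])
print(R @ (P - O_new))          # (-2*sqrt(2), -3*sqrt(2))
```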
We dedicate the present section to the study of quadrics in affine and Euclidean
spaces. More precisely, we approach the study of the action of affine transformations
and isometries on quadrics. After a general analysis and classification of all the
potential cases, we will focus the attention on conic curves in FA2 and FE2 and
quadric surfaces in FA3 and FE3 from a purely geometrical point of view.
In order to head in the right direction, the first step must therefore be to introduce
the framework within the strategy for classification of geometric loci is usually
implemented. Now, we start by some definitions:
Definition 11.56 Let S, S be two subsets of an affine space A. We say that S and S
are affinely equivalent if there exists a nonsingular affine transformation f : A → A
such that f (S) = S .
On the basis of the definitions we have just mentioned, we now investigate the
question about how a subset S ⊂ A can be transformed into another S ⊂ A by an
affine transformation in such a way that S could represent the simplest form among
all the possible subsets of A that are affinely equivalent to S.
More specifically, here we examine the case when such a subset is a quadric.
Definition 11.58 Let A be an affine space associated with the vector space V over
the field F and assume dim F V = n. A quadric in A is a nonempty set Q of points
whose coordinates satisfy the equation p(x1 , . . . , xn ) = 0, where p is a quadratic
polynomial in the variables x1 , . . . , xn and with coefficients in F.
Note 11.59 In all that follows, we always assume that the characteristic of the men-
tioned field F is different from 2.
Starting from the definition of a quadric and collecting terms of the second, first and
zeroth degrees, the polynomial p can be written as
$$p(x_1, \ldots, x_n) = \sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j + 2\sum_{k=1}^n a_{k,n+1}x_k + a_{n+1,n+1}, \tag{11.32}$$
with quadratic part
$$q(x_1, \ldots, x_n) = \sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j. \tag{11.33}$$
X t AX + B t X + an+1,n+1 = 0. (11.34)
Y t CY = 0. (11.35)
Theorem 11.62 Let Q be a quadric of Eq. (11.35), P a point of FAn having coordinate vector u ≡ [γ1, . . . , γn] in terms of a fixed coordinate system. The point P is a center of Q if and only if, for any i = 1, . . . , n,
$$\sum_{j=1}^n a_{ij}\gamma_j + a_{i,n+1} = 0,$$
that is, Au + v = 0.
Proof Let f : FAn → FAn be the central symmetry with respect to point P. By
applying f to the polynomial (11.32) representing Q, we obtain the polynomial
representing the set of points f (Q), that is,
$$f\big(p(x_1, \ldots, x_n)\big) = \sum_{i,j=1}^n a_{ij}(2\gamma_i - x_i)(2\gamma_j - x_j) + 2\sum_{i=1}^n a_{i,n+1}(2\gamma_i - x_i) + a_{n+1,n+1}$$
$$= 4\sum_{i,j=1}^n a_{ij}\gamma_i\gamma_j - 4\sum_{i,j=1}^n a_{ij}x_i\gamma_j + \sum_{i,j=1}^n a_{ij}x_ix_j + 4\sum_{i=1}^n a_{i,n+1}\gamma_i - 2\sum_{i=1}^n a_{i,n+1}x_i + a_{n+1,n+1}. \tag{11.36}$$
Now, P is a center of Q exactly when (11.36) and (11.32) define the same locus; comparing the two polynomials, this amounts to the identity
$$\sum_{i=1}^n\Big(\sum_{j=1}^n a_{ij}\gamma_j + a_{i,n+1}\Big)(x_i - \gamma_i) = 0.$$
Since the last identity holds only if any coefficient is identically zero, we get the
required conclusion.
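In coordinates, Theorem 11.62 says the center is found by solving a linear system. A sketch (our own illustrative example, not from the text, assuming NumPy):

```python
# Center of a quadric by solving A u = -v (Theorem 11.62), illustrated on
# the ellipse x^2 + 2 y^2 - 2 x - 4 y + 1 = 0.
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 2.0]])   # quadratic part
v = np.array([-1.0, -2.0])               # linear part: 2 v^t X = -2x - 4y
center = np.linalg.solve(A, -v)
print(center)                            # (1, 1)
```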
Corollary 11.63 A quadric Q of Eq. (11.35) has a center if and only if rank(C) ≤
rank(A) + 1.
Proof If we assume that Q has a center, then $\sum_{j=1}^{n} a_{ij}\gamma_j + a_{i,n+1} = 0$ for every i = 1, …, n, where [γ₁, …,
γₙ] is the coordinate vector of the center. Hence, the system of linear equations
coming from the identity AX = −v has solutions, that is, rank(A) = rank(A|v).
Since rank(C) ≤ rank(A|v) + 1, the conclusion follows.
Conversely, if rank(C) ≤ rank(A) + 1 but we assume that Q has no center, then
AX = −v has no solution, that is, rank(A|v) = rank(A) + 1.
This would imply that the (n + 1)th column of C is linearly independent of the first
n column vectors of C. On the other hand, since C is symmetric, the last assertion
means that the (n + 1)th row of C is linearly independent of the first n row vectors
of C. Thus rank(C) = rank(A|v) + 1 = rank(A) + 2, which is a contradiction.
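The rank criterion is easy to test numerically. Below is a minimal sketch, assuming NumPy; the function name is ours. It decides whether the quadric XᵗAX + 2vᵗX + a = 0 has a center and, if so, computes one:

```python
import numpy as np

def center_of_quadric(A, v):
    """Return a center u solving A u = -v, or None if no center exists."""
    A = np.asarray(A, dtype=float)
    v = np.asarray(v, dtype=float)
    rank_A = np.linalg.matrix_rank(A)
    rank_Av = np.linalg.matrix_rank(np.column_stack([A, -v]))
    if rank_A != rank_Av:        # A u = -v unsolvable: no center
        return None
    # least squares returns one solution of the (consistent) system
    return np.linalg.lstsq(A, -v, rcond=None)[0]

# quadratic and linear parts of x^2 + 4xy + 2x + y^2 + 2yz + 2z + 1 = 0,
# the quadric treated in the worked example below
A = [[1, 2, 0], [2, 1, 1], [0, 1, 0]]
v = [1, 0, 1]
print(center_of_quadric(A, v))   # [ 1. -1. -1.]
```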
These observations give us a first overview of how quadrics fall into different
categories according to the existence (or not) of a center.
Let us now investigate the following question: into what simplest form can the
equation of a quadric be written in terms of a suitable choice of a frame of reference
of the n-dimensional affine space An ? The answer to this question is a consequence
of the solution of the related problem regarding the establishment of appropriate
conditions under which two quadrics can be transformed into each other by an affine
transformation. Since a quadric is represented by a polynomial of degree 2, we may
consider whether two quadratic polynomials are affinely (or metrically) equivalent,
in the sense of the following definition:
Definition 11.65 Let p₁(x₁, …, xₙ) and p₂(x₁, …, xₙ) be two distinct polynomials
with coefficients in a field F. We say that p₁ and p₂ are affinely (or metrically)
equivalent if there exists an affine transformation (respectively, an isometry)
f : FAⁿ → FAⁿ such that f(p₁(x₁, …, xₙ)) = p₂(x₁, …, xₙ).
The quadrics Q₁ and Q₂, represented by two affinely (or metrically) equivalent
polynomials, are said to be affinely (or metrically) equivalent.
Here we extend the argument previously presented in Theorem 11.43 to the n-dimensional case. Hence, consider the affine transformation f of A, described by
f(X) = MX + c, where M is an invertible matrix of Mₙ(F) and c is a fixed point
of A. The image of Q is computed by the substitution

$$X = M^{-1}\big(X' - c\big) = H X' + d, \tag{11.37}$$

where X′ = f(X), H = M⁻¹ and d = −M⁻¹c, that is,

$$X'^t (H^t A H) X' + (2d^t A H + B^t H) X' + (d^t A d + B^t d + a_{n+1,n+1}) = 0. \tag{11.38}$$

Therefore, setting A′ = HᵗAH, B′ = (2dᵗAH + BᵗH)ᵗ and a′_{n+1,n+1} = dᵗAd + Bᵗd + a_{n+1,n+1} in (11.38), f(Q) is represented by the equation

$$X'^t A' X' + B'^t X' + a'_{n+1,n+1} = 0,$$
implying that f(Q) is again a quadric of A. Since the relation (11.37) can also be
obtained from

$$\begin{bmatrix} x_1 \\ \vdots \\ x_n \\ 1 \end{bmatrix} = \begin{bmatrix} H & d \\ 0^t & 1 \end{bmatrix} \begin{bmatrix} x'_1 \\ \vdots \\ x'_n \\ 1 \end{bmatrix}, \qquad \text{for } 0 = \underbrace{[0,\dots,0]^t}_{n\text{-times}}, \tag{11.39}$$
that is, Y = NY′ with $N = \begin{bmatrix} H & d \\ 0^t & 1 \end{bmatrix}$, substitution in (11.35) gives

$$Y'^t C' Y' = 0, \qquad C' = N^t C N = \begin{bmatrix} H^t & 0 \\ d^t & 1 \end{bmatrix} C \begin{bmatrix} H & d \\ 0^t & 1 \end{bmatrix}.$$
Thus, using the terminology introduced in Definition 11.56, we may assert that
(1) The matrices A, A′, representing the quadratic parts of the polynomials defining
two affinely equivalent quadrics, are congruent.
(2) The matrices C, C′, representing the entire polynomials defining two affinely
equivalent quadrics, are congruent.
One of the main properties of congruent matrices is that they have the same rank.
For this reason, in the further discussions we refer to the ranks of A and C in order
to indicate at the same time the ranks of A′ and C′, respectively.
If we consider the case when Q is a quadric of an affine Euclidean space E, then the
affine transformation acting on the points of Q can be described by f(X) = MX + c,
where M is an orthogonal matrix of Mₙ(F) and c is a fixed point of E. So,
following Definition 11.57, we say that
(1) The matrices A, A′, representing the quadratic parts of the polynomials defining
two metrically equivalent quadrics, are congruent.
(2) The matrices C, C′, representing the entire polynomials defining two metrically
equivalent quadrics, are congruent.
Fixing a quadric Q, we pay special attention to establishing what is a suitable choice
of a frame of reference of the n-dimensional affine space A, in terms of which the
equation of Q can be written in the simplest form. We divide our argument into two
main cases.
By assuming that Q has a center, and by Theorem 11.62, there exists a vector u ∈ Fⁿ
such that Au + v = 0. Looking at relation (11.37), and since A is symmetric, we
may translate the origin to the center, i.e., take H = Iₙ and d = u in (11.37): the linear
part of (11.38) then vanishes, because 2uᵗA + Bᵗ = 2(Au + v)ᵗ = 0ᵗ, that is,

$$X'^t A X' + e = 0 \tag{11.41}$$

where e = uᵗAu + Bᵗu + a_{n+1,n+1}.
Since A is symmetric, a further (nonsingular) change of coordinates diagonalizes the quadratic part, and the polynomial of Q takes the form

$$p(x_1,\dots,x_n) = \sum_{i=1}^{r} \alpha_i x_i^2 + e \tag{11.42}$$

where r = rank(A) and 0 ≠ αᵢ ∈ F. If e ≠ 0, then dividing by ±e and rescaling the variables (the affine transformation (11.43)) we obtain, over R, either

$$p(x_1,\dots,x_n) = \sum_{i=1}^{h} x_i^2 - \sum_{j=h+1}^{r} x_j^2 + 1 \tag{11.44}$$

or

$$p(x_1,\dots,x_n) = \sum_{i=1}^{h} x_i^2 - \sum_{j=h+1}^{r} x_j^2 - 1. \tag{11.45}$$

If e = 0, the polynomial reduces to

$$p(x_1,\dots,x_n) = \sum_{i=1}^{r} \alpha_i x_i^2 \tag{11.46}$$

and, after the same rescaling,

$$p(x_1,\dots,x_n) = \sum_{i=1}^{h} x_i^2 - \sum_{j=h+1}^{r} x_j^2. \tag{11.47}$$
As an example, consider the quadric of RA³ with equation

x² + 4xy + 2x + y² + 2yz + 2z + 1 = 0,

that is, in matrix form,

$$\begin{bmatrix} x & y & z & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 0 & 1 \\ 2 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = 0.$$
By solving the linear system Au = −v, we find the coordinates (1, −1, −1) of the center. Looking at the symmetric matrix
A, by the process of diagonalization of a bilinear symmetric form, we may obtain
H ∈ M₃(R) such that A′ = HᵗAH is a diagonal matrix. The implementation of the
standard process leads us to

$$H = \begin{bmatrix} 1 & -2 & -\tfrac{2}{3} \\ 0 & 1 & \tfrac{1}{3} \\ 0 & 0 & 1 \end{bmatrix},$$

and, after translating the origin to the center, the equation of the quadric becomes

$$x^2 - 3y^2 + \tfrac{1}{3}z^2 + 1 = 0.$$
Finally, for

x = X, y = (1/√3) Y, z = √3 Z,

the equation becomes

X² − Y² + Z² + 1 = 0,

which is a hyperboloid.
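The whole reduction can be verified symbolically. The sketch below, assuming SymPy, checks that HᵗAH is diagonal and that the final substitution yields the stated canonical equation (all names are ours):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 0], [2, 1, 1], [0, 1, 0]])
H = sp.Matrix([[1, -2, sp.Rational(-2, 3)],
               [0,  1, sp.Rational(1, 3)],
               [0,  0, 1]])

print(H.T * A * H)        # diag(1, -3, 1/3)

# substitute x = X, y = Y/sqrt(3), z = sqrt(3)*Z into x^2 - 3y^2 + z^2/3 + 1
X, Y, Z = sp.symbols('X Y Z')
expr = X**2 - 3*(Y/sp.sqrt(3))**2 + sp.Rational(1, 3)*(sp.sqrt(3)*Z)**2 + 1
print(sp.simplify(expr))  # X**2 - Y**2 + Z**2 + 1
```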
Assume now that Q has no center, and consider the hyperplane

N = {X ∈ Fⁿ : vᵗX = 0}.

One can construct a basis {e₁, …, eₙ} of Fⁿ, with transition matrix E, such that
(i) the nth column and nth row of the matrix EᵗAE are zero, except possibly the
(n, n)-entry;
(ii) the product vᵗEX is equal to αₙxₙ, for some αₙ ∈ F.
Thus, the polynomial representing Q with respect to the basis {e₁, …, eₙ} can be
written as

$$p(x_1,\dots,x_n) = X^t A' X + 2\alpha_n x_n + a_{n+1,n+1} \tag{11.49}$$

where

$$A' = \begin{bmatrix} \tilde{A} & 0 \\ 0^t & a_{nn} \end{bmatrix}.$$
Notice that, since Q has no center, the coefficient αₙ in (11.49) is not zero. Moreover,
Ã ∈ Mₙ₋₁(F) is symmetric, so that there exists a suitable basis {c₁, …, cₙ₋₁} for N
in terms of which the bilinear symmetric form associated with Ã is represented by
a diagonal matrix; in other words, there exists a nonsingular matrix D ∈ Mₙ₋₁(F)
such that

$$D^t \tilde{A} D = \mathrm{diag}(\alpha_1, \dots, \alpha_r, 0, \dots, 0), \qquad 0 \neq \alpha_i \in F, \; r = \mathrm{rank}(\tilde{A}).$$

After this further change of coordinates, the polynomial of Q takes the form

$$p(x_1,\dots,x_n) = \sum_{i=1}^{r} \alpha_i x_i^2 + a_{nn} x_n^2 + 2\alpha_n x_n + a_{n+1,n+1}. \tag{11.50}$$
Looking at (11.50), assume firstly that aₙₙ ≠ 0. In this case, since

$$a_{nn} x_n^2 + 2\alpha_n x_n + a_{n+1,n+1} = a_{nn}\Big(x_n + \frac{\alpha_n}{a_{nn}}\Big)^2 + a_{n+1,n+1} - \frac{\alpha_n^2}{a_{nn}},$$

the polynomial becomes

$$p(x_1,\dots,x_n) = \sum_{i=1}^{r} \alpha_i x_i^2 + a_{nn}\Big(x_n + \frac{\alpha_n}{a_{nn}}\Big)^2 + a' \tag{11.51}$$

for $a' = a_{n+1,n+1} - \dfrac{\alpha_n^2}{a_{nn}}$.
If we now apply to (11.51) the translation

x′ᵢ = xᵢ, i = 1, …, n − 1,   x′ₙ = xₙ + αₙ/aₙₙ,

we obtain the polynomial

$$\sum_{i=1}^{r} \alpha_i x_i'^2 + a_{nn}\, x_n'^2 + a',$$

which would give a quadric with center. This contradiction proves that the coefficient
aₙₙ of polynomial (11.50) must be zero. Hence,
$$p(x_1,\dots,x_n) = \sum_{i=1}^{r} \alpha_i x_i^2 + 2\alpha_n x_n + a_{n+1,n+1}. \tag{11.52}$$
Applying now the transformation

$$x_i = \frac{1}{\sqrt{|\alpha_i|}}\, x_i', \; i = 1,\dots,r, \qquad x_n = \frac{x_n'}{2\alpha_n} - \frac{a_{n+1,n+1}}{2\alpha_n},$$

and renaming the variables, the polynomial of Q is

$$p(x_1,\dots,x_n) = \sum_{i=1}^{h} x_i^2 - \sum_{j=h+1}^{r} x_j^2 + x_n. \tag{11.53}$$
Hence, any quadric without a center in the n-dimensional affine space A is affinely
equivalent to one of the form (11.53).
Then, for a quadric without center, we have that rank(A) = r ≤ n − 1 and
rank(C) = r + 2. In particular,
(i) for r = n − 1, the quadric is non-degenerate and here we say that Q is a
Paraboloid;
(ii) for r ≤ n − 2, the quadric is degenerate and here we say that Q is a parabolic
Cylinder.
We can sum up what we have done by saying that for any quadric Q in the n-dimensional
affine space An , over an arbitrary field F of characteristic different from 2, there is a
suitable frame of reference for An , in terms of which Q is specified by a particularly
simple equation, called canonical equation or canonical form for Q. Any possible
canonical equation of Q is associated with a polynomial of the form (11.44), (11.45),
(11.47) or (11.53). In particular,
(i) polynomials $\sum_{i=1}^{n} x_i^2 + 1$ and $\sum_{i=1}^{n} x_i^2 - 1$ represent the canonical form of an Ellipsoid (it is a non-degenerate quadric with exactly one center);
(ii) polynomials $\sum_{i=1}^{h} x_i^2 - \sum_{i=h+1}^{n} x_i^2 + 1$ and $\sum_{i=1}^{h} x_i^2 - \sum_{i=h+1}^{n} x_i^2 - 1$ (h ≠ 0, n) represent the canonical form of a Hyperboloid (once again, it is a non-degenerate quadric with exactly one center);
(iii) polynomials $\sum_{i=1}^{h} x_i^2 - \sum_{i=h+1}^{r} x_i^2 + 1$ and $\sum_{i=1}^{h} x_i^2 - \sum_{i=h+1}^{r} x_i^2 - 1$ (r ≤ n − 1) represent the canonical form of a non-parabolic Cylinder (it is a degenerate quadric with an infinite number of centers);
(iv) polynomials $\sum_{i=1}^{h} x_i^2 - \sum_{i=h+1}^{r} x_i^2$ (r ≤ n) represent a Cone (it is a degenerate quadric having exactly one center);
(v) the polynomial $\sum_{i=1}^{n-1} x_i^2 + x_n$ represents a Paraboloid (it is a non-degenerate quadric without center);
(vi) polynomials $\sum_{i=1}^{r} x_i^2 + x_n$ (r ≤ n − 2) represent a parabolic Cylinder (it is a degenerate quadric without center).
As a further example, consider the quadric of RA³ with equation

x² + 2xy + 2xz + 4x + 2yz + z² + 2z + 1 = 0,

that is,

$$\begin{bmatrix} x & y & z & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 2 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ 2 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = 0.$$
One checks that this quadric has no center, and that

$$C' = \begin{bmatrix} E^t & 0 \\ 0^t & 1 \end{bmatrix} C \begin{bmatrix} E & 0 \\ 0^t & 1 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix} = \begin{bmatrix} A' & w \\ w^t & 1 \end{bmatrix}, \qquad w = (0, 0, 1)^t.$$
By the standard process of diagonalization of bilinear forms, we find the basis
{(1, 0, 0), (1, 1, 0), (0, 0, 1)} in terms of which the bilinear symmetric form associated with A′ is represented by a diagonal matrix. More precisely, if we denote by D
the transition matrix of {(1, 0, 0), (1, 1, 0), (0, 0, 1)} relative to the standard basis of
R³, which has column vectors (1, 0, 0), (1, 1, 0), (0, 0, 1), then we get

$$A'' = D^t A' D = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} A' \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
Setting $H = \begin{bmatrix} D & 0 \\ 0^t & 1 \end{bmatrix}$, we get

$$H^t C' H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix},$$

that is, the equation x² − y² + 2z + 1 = 0. Finally, for

x = X, y = Y, z = Z/2 − 1/2,

we obtain X² − Y² + Z = 0, which is a (hyperbolic) paraboloid.
At this point, we ask what can be said in case F is an algebraically closed field
(for instance, F = C). We are going to solve the same problem as above by means of
successive affine transformations; after each one, the equation of the quadric is
simpler than the original one. One may simply repeat the arguments previously
presented, with appropriate amendments in some steps. More precisely, for
quadrics with center, we may start from the polynomial of the form (11.42):

$$p(x_1,\dots,x_n) = \sum_{i=1}^{r} \alpha_i x_i^2 + e.$$

If e ≠ 0, rescaling the variables (over C every square root exists) yields

$$p(x_1,\dots,x_n) = \sum_{i=1}^{r} x_i^2 + 1, \tag{11.54}$$

while for e = 0 we obtain

$$p(x_1,\dots,x_n) = \sum_{i=1}^{r} x_i^2. \tag{11.55}$$
In the case of quadrics without center, we look at the polynomial of the form (11.52):

$$p(x_1,\dots,x_n) = \sum_{i=1}^{r} \alpha_i x_i^2 + 2\alpha_n x_n + a_{n+1,n+1};$$

applying the transformation

$$x_i = \frac{1}{\sqrt{\alpha_i}}\, x_i', \; i = 1,\dots,n-1, \qquad x_n = \frac{x_n'}{2\alpha_n} - \frac{a_{n+1,n+1}}{2\alpha_n},$$

the polynomial of Q is

$$p(x_1,\dots,x_n) = \sum_{i=1}^{r} x_i^2 + x_n. \tag{11.56}$$
Hence, any quadric in the n-dimensional affine space Aⁿ over an algebraically closed field F, possibly after setting the superfluous variables to zero, is affinely equivalent to one of the sets defined by equating to zero
one of the polynomials (11.54), (11.55) and (11.56). In particular,
(i) the polynomial $\sum_{i=1}^{n} x_i^2 + 1$ represents the canonical form of an Ellipsoid;
(ii) polynomials $\sum_{i=1}^{r} x_i^2 + 1$ (r ≤ n − 1) represent the canonical form of a non-parabolic Cylinder;
(iii) polynomials $\sum_{i=1}^{r} x_i^2$ (r ≤ n) represent a Cone;
(iv) the polynomial $\sum_{i=1}^{n-1} x_i^2 + x_n$ represents a Paraboloid;
(v) polynomials $\sum_{i=1}^{r} x_i^2 + x_n$ (r ≤ n − 2) represent a parabolic Cylinder.
Remark 11.69 We notice that the canonical equation of a degenerate quadric contains fewer than n variables.
H1 : h 1 (x1 , . . . , xn ) = 0 H2 : h 2 (x1 , . . . , xn ) = 0.
Proof Assume firstly that Q is reducible and hyperplanes H1 and H2 are represented
by the following equations:
H1 : α1 x1 + · · · + αn xn + αn+1 = 0 H2 : β1 x1 + · · · + βn xn + βn+1 = 0,
γ1 x12 + γ2 x2 + · · · + γn xn + γn+1 = 0.
All that has just been said can be applied to the case of quadrics in A2 and A3 , in
order to obtain the well known affine classification of conic curves in the affine plane
and the affine classification of quadric surfaces in the 3-dimensional space. These
are nothing more than reduced cases of the ones previously discussed.
Thus, in the case of conics in the plane, we may simply assert the following.
Theorem 11.72 Let RA2 be the 2-dimensional affine space over the vector space
R2 . Any conic curve is affinely equivalent to one of the sets defined by the following
equations:
(i) x 2 + y 2 − 1 = 0, non-degenerate real ellipse;
(ii) x 2 − y 2 − 1 = 0, non-degenerate hyperbola;
(iii) x 2 − y = 0, non-degenerate parabola;
(iv) x 2 + y 2 + 1 = 0, empty set (non-degenerate imaginary ellipse);
(v) x 2 − y 2 = 0, two secant real lines;
(vi) x 2 + y 2 = 0, one real point (two complex conjugate lines);
(vii) x 2 − 1 = 0, two real parallel lines;
(viii) x 2 + 1 = 0, empty set (two complex parallel lines);
(ix) x² = 0, two real merged lines.
In the case of complex spaces, we have the following.
Theorem 11.73 Let CA2 be the 2-dimensional affine space over the vector space
C2 . Any conic curve is affinely equivalent to one of the sets defined by the following
equations:
(i) x 2 + y 2 + 1 = 0, non-degenerate ellipse;
(ii) x 2 − y = 0, non-degenerate parabola;
(iii) x 2 + y 2 = 0, two complex conjugate secant lines;
(iv) x 2 + 1 = 0, two complex parallel lines;
(v) x 2 = 0, two real merged lines.
Shifting the focus to quadric surfaces in real affine space, we have the following.
Theorem 11.74 Let RA3 be the 3-dimensional affine space over the vector space R3 .
Any quadric surface is affinely equivalent to one of the sets defined by the following
equations:
(i) x 2 + y 2 + z 2 − 1 = 0, real ellipsoid;
(ii) x 2 + y 2 + z 2 + 1 = 0, empty set (non-degenerate imaginary ellipsoid);
(iii) x 2 + y 2 − z 2 + 1 = 0, elliptic hyperboloid;
(iv) x 2 + y 2 − z 2 − 1 = 0, hyperbolic hyperboloid;
(v) x 2 + y 2 − z = 0, elliptic paraboloid;
(vi) x 2 − y 2 − z = 0, hyperbolic paraboloid;
(vii) x 2 + y 2 + z 2 = 0, one real point (imaginary cone);
(viii) x 2 + y 2 − z 2 = 0, real cone;
(ix) x² + y² + 1 = 0, empty set (imaginary cylinder);
(x) x 2 + y 2 − 1 = 0, right circular cylinder;
(xi) x 2 − y = 0, parabolic cylinder;
(xii) x 2 − y 2 − 1 = 0, hyperbolic cylinder;
Theorem 11.75 Let CA3 be the 3-dimensional affine space over the vector space C3 .
Any quadric surface is affinely equivalent to one of the sets defined by the following
equations:
(i) x 2 + y 2 + z 2 − 1 = 0, ellipsoid;
(ii) x 2 + y 2 − z = 0, paraboloid;
(iii) x 2 + y 2 + z 2 = 0, cone;
(iv) x 2 + y 2 + 1 = 0, elliptic cylinder;
(v) x 2 − y = 0, parabolic cylinder;
(vi) x 2 + y 2 = 0, two secant planes;
(vii) x 2 + 1 = 0, two parallel planes;
(viii) x 2 = 0, two merged planes.
We now pass to the metric classification. If Q is a quadric with center of the affine Euclidean space REⁿ, the spectral theorem provides an orthogonal matrix H such that HᵗAH is diagonal; after translating the origin to the center, the equation of Q becomes

$$X'^t A' X' + e = 0 \tag{11.58}$$

where A′ has the eigenvalues of A on the main diagonal. The polynomial associated with
the quadric is

$$p(x_1,\dots,x_n) = \sum_{i=1}^{r} \alpha_i x_i^2 + e \tag{11.59}$$

where α₁, …, α_r are the nonzero eigenvalues of A. If e ≠ 0, dividing by e we obtain

$$p(x_1,\dots,x_n) = \sum_{i=1}^{r} \alpha_i' x_i^2 + 1 \tag{11.60}$$

where α′ᵢ = αᵢ/e, for any i = 1, …, r. Notice that the last affine transformation (11.43),
previously used in the affine case, doesn't make any sense in the present Euclidean
space, because it is not orthogonal.
We then conclude that any quadric with center in the n-dimensional affine
Euclidean space REⁿ is metrically equivalent to one of the following forms:

$$\sum_{i=1}^{r} \alpha_i x_i^2 + 1 \tag{11.61}$$

$$\sum_{i=1}^{r} \alpha_i x_i^2. \tag{11.62}$$
As an example, consider the quadric of RE³ with equation

2x² + 2xy + 2x + 2y² + 4y + z² + 2 = 0.

By solving the linear system Au = −v, we find the coordinates (0, −1, 0) of the center. To implement the transformation X =
HX′ + u, where H ∈ M₃(R) is an orthogonal matrix, we determine the eigenvalues
λ₁, λ₂, λ₃ of A. We get
(1) λ₁ = λ₂ = 1, having associated eigenspace generated by the orthogonal vectors
(1/√2, −1/√2, 0) and (0, 0, 1);
(2) λ₃ = 3, having associated eigenspace generated by the vector (1/√2, 1/√2, 0).
Hence, in the new coordinates, the equation of the quadric is X² + Y² + 3Z² = 0.
Notice that, for F = R, the quadric consists of exactly one point. For F = C, the
quadric is a degenerate non-reducible surface.
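The eigenvalue computation behind this metric reduction can be verified numerically. A minimal sketch, assuming NumPy:

```python
import numpy as np

# quadratic part of 2x^2 + 2xy + 2x + 2y^2 + 4y + z^2 + 2 = 0
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
v = np.array([1.0, 2.0, 0.0])

u = np.linalg.solve(A, -v)      # center
print(u)                        # [ 0. -1.  0.]

w, H = np.linalg.eigh(A)        # eigenvalues and an orthogonal H
print(np.round(w, 10))          # [1. 1. 3.]

# constant term after the translation: e = u^t A u + 2 v^t u + a
e = u @ A @ u + 2 * v @ u + 2
print(e)                        # 0.0  ->  X^2 + Y^2 + 3Z^2 = 0
```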
where e = −2v₂ᵗu − v₁ᵗu + a_{n+1,n+1}. Notice that, by construction, the (r + 1)th
column of H is v₂/‖v₂‖, so that (v₂ᵗ/‖v₂‖)H = e_{r+1}ᵗ, the (r + 1)th row of HᵗH = Iₙ. Hence, v₂ᵗH = ‖v₂‖ e_{r+1}ᵗ. Therefore,
if we now apply the translation

xᵢ = x′ᵢ, i = 1, …, r,   x_{r+1} = x′_{r+1} − e/β_{r+1},

the polynomial becomes

$$\sum_{i=1}^{r} \alpha_i x_i^2 + \beta_{r+1}\, x_{r+1}. \tag{11.65}$$
has solutions, and their general form is (−α − 3/2, α, β, −β), for any α, β ∈ R. We
choose one of them, for instance α = β = 0, and denote it by u = (−3/2, 0, 0, 0).
This is the vector we will use for the translation. We now determine the null space N(A)
of A: one of its orthonormal bases is
(1/√2, −1/√2, 0, 0), (0, 0, 1/√2, −1/√2).
Then we compose an orthonormal basis for R⁴ as the union of bases of Im(A) and
N(A), that is,

B = {(1/√2, 1/√2, 0, 0), (0, 0, 1/√2, 1/√2), (1/√2, −1/√2, 0, 0), (0, 0, 1/√2, −1/√2)}.
Summarizing, we conclude that every quadric Q of the affine Euclidean space REn
having equation
X t AX + B t X + an+1,n+1 = 0
can be given in some coordinate system by an equation of the form (11.61) in the
case Q has a center, or by an equation of the form (11.65) if Q has no center.
So, we have proved the following.
Theorem 11.78 Every quadric of REn is metrically equivalent to one of the follow-
ing:
(i) $\sum_{i=1}^{p} \alpha_i x_i^2 - \sum_{j=p+1}^{r} \alpha_j x_j^2 = 0$ if rank(C) = rank(A),
(ii) $\sum_{i=1}^{p} \alpha_i x_i^2 - \sum_{j=p+1}^{r} \alpha_j x_j^2 = 1$ if rank(C) = rank(A) + 1,
(iii) $\sum_{i=1}^{p} \alpha_i x_i^2 - \sum_{j=p+1}^{r} \alpha_j x_j^2 - x_{r+1} = 0$ if rank(C) = rank(A) + 2,
Theorem 11.79 Let RE2 be the 2-dimensional affine Euclidean space over the vec-
tor space R2 . Any conic curve is metrically equivalent to one of the sets defined by
the following equations:
(i) x²/a² + y²/b² − 1 = 0, real ellipse;
(ii) x²/a² − y²/b² − 1 = 0, hyperbola;
(iii) x² + ay = 0, parabola;
(iv) x²/a² + y²/b² + 1 = 0, empty set (imaginary ellipse);
(v) x²/a² − y²/b² = 0, two secant real lines;
(vi) x²/a² + y²/b² = 0, one real point (two complex conjugate lines);
(vii) x²/a² − 1 = 0, two real parallel lines;
(viii) x²/a² + 1 = 0, empty set (two complex parallel lines);
(ix) x² = 0, two real merged lines.
Projective geometry studies properties which are invariant under the action of projective transformations: a characteristic feature of projective geometry is the symmetry of the relationships between
points and lines, called duality.
Nevertheless, we will restrict ourselves here to briefly mentioning the main defi-
nitions, which will contribute to achieving the primary objective of this chapter: the
study of conic curves and quadric surfaces from a projective point of view.
Let F be a field and consider the vector space Fⁿ⁺¹ of dimension n + 1. If
0 ≠ v ∈ Fⁿ⁺¹ is a nonzero vector, then the set

⟨v⟩ = {λv | λ ∈ F}

is a one-dimensional subspace of Fⁿ⁺¹, called the ray generated by v.
Definition 11.81 The projective space FPn , of dimension n, associated with Fn+1 ,
is the set of rays of Fn+1 . Any element of FPn is called a point.
Definition 11.83 Let Mₙ₊₁(F) be the ring of (n + 1) × (n + 1) matrices over F. Any
nonsingular matrix C ∈ Mₙ₊₁(F) defines a transformation F : FPⁿ → FPⁿ,
called a projective transformation, which transforms a point having projective
coordinates X into the point having projective coordinates X′ via X′ ≈ CX, where ≈
indicates equality up to a scale factor.
Definition 11.84 Let S and S′ be two subsets of the projective space FPⁿ. We say
that S is projectively equivalent to S′ if there exists a projective transformation
F : FPⁿ → FPⁿ such that F(S) = S′.
If P ≡ (x, y) is a point in the affine plane FA², we may represent it in the projective plane FP² by its homogeneous coordinates: we simply add a third coordinate
equal to 1, so that P ≡ (x, y) is represented by the homogeneous coordinates (x, y, 1).
More generally, the homogeneous coordinates of a point P are (x₁, x₂, x₃), with x₃ ≠ 0, if and only if
x = x₁/x₃ and y = x₂/x₃ are its Euclidean coordinates. Thus, homogeneous coordinates
are invariant up to scaling: (x₁, x₂, x₃) and (αx₁, αx₂, αx₃) represent the same point,
for any 0 ≠ α ∈ F.
To represent a line in the projective plane, we start from the standard formula ax +
by + c = 0 and introduce the homogeneous coordinates to arrive at the equation
ax1 + bx2 + cx3 = 0.
The coordinates (x₁, x₂, 0) cannot represent any point of the form (x, y, 1), as
they do not share the same third coordinate. In fact, (x₁, x₂, 0) is a point representing
the slope of a line: parallel lines r and r′ in FP² meet on the line x₃ = 0 (the line at
infinity). More precisely, if r : ax₁ + bx₂ + cx₃ = 0 and r′ : ax₁ + bx₂ + c′x₃ = 0,
then their intersection point has homogeneous coordinates equal to (b, −a, 0). This
point is called the point at infinity of r (and r′) and represents the class of all parallel
lines having the same direction.
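Homogeneous coordinates make such intersection computations purely algebraic: in FP² the common point of two lines is given by the cross product of their coefficient vectors. A minimal sketch, assuming NumPy (names ours):

```python
import numpy as np

# two parallel lines r: 2x + 3y - 1 = 0 and r': 2x + 3y + 5 = 0,
# written as coefficient vectors (a, b, c)
r  = np.array([2.0, 3.0, -1.0])
rp = np.array([2.0, 3.0,  5.0])

# the intersection point in homogeneous coordinates is the cross product
P = np.cross(r, rp)
print(P)      # [ 18. -12.   0.]  ~  (b, -a, 0) up to scale: the point at infinity
```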
Hence, the general idea is to let every pair of lines in the projective plane have
an intersection point; by this approach, the projective plane can be viewed as the
affine plane together with the line at infinity. A conic Γ of FP² is the set of points
whose homogeneous coordinates satisfy the equation f(x₁, x₂, x₃) = 0, where

$$f(x_1, x_2, x_3) = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 + 2a_{13}x_1x_3 + 2a_{23}x_2x_3 + a_{33}x_3^2 \tag{11.67}$$

and a_{ij} ∈ F, for any i, j = 1, 2, 3.
The quadratic polynomial f (x1 , x2 , x3 ) can be represented by the symmetric
matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{bmatrix}$$

so that

$$f(x_1, x_2, x_3) = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
Hence, if we denote X = [x₁ x₂ x₃]ᵗ, then the equation f(x₁, x₂, x₃) = 0 can be
written in the following matrix notation: X t AX = 0.
We say that A is the matrix associated with the conic Γ, or also that the conic Γ
is determined by the matrix A.
Following Definition 11.84, we say that two conics Γ and Γ′ of the projective
space FP² are projectively equivalent if there exists a projective transformation
F : FP² → FP² such that F(Γ) = Γ′.
Theorem 11.85 Let Γ be a conic of the projective space FP², determined by the 3 × 3
matrix A with coefficients in F. If F : FP² → FP² is a projective transformation
and C is the matrix representing F, then F(Γ) is the conic determined by the matrix
(C⁻¹)ᵗA C⁻¹.
Indeed, if X′ = CX are the coordinates of the image of a point X of Γ, then X = C⁻¹X′ and the equation XᵗAX = 0 becomes

$$(C^{-1}X')^t A\,(C^{-1}X') = 0, \qquad \text{that is,} \qquad X'^t\big((C^{-1})^t A\, C^{-1}\big)X' = 0.$$
Theorem 11.86 Let Γ and Γ′ be two projective conics, represented by the symmetric
matrices A ∈ M₃(F) and A′ ∈ M₃(F), respectively. Then Γ is projectively equivalent
to Γ′ if and only if there exists a nonsingular matrix C ∈ M₃(F) such that A =
CᵗA′C.
Proof Firstly, we assume that Γ is projectively equivalent to Γ′, that is, there exists
a projective transformation F : FP² → FP² such that F(Γ) = Γ′. Let C ∈ M₃(F)
be the nonsingular matrix associated with F. Hence, CX₀ ∈ Γ′, for any point X₀ ∈
Γ. The replacement of X by CX in the matrix notation XᵗA′X = 0 of Γ′ leads
to the relation (CX)ᵗA′(CX) = 0, that is, Xᵗ(CᵗA′C)X = 0, which represents Γ.
Therefore, A = CᵗA′C, as required.
In other words, two projective conics Γ and Γ′ are projectively equivalent if and
only if their associated matrices are congruent.
In light of Definition 11.87 and Lemma 7.50, we are now able to state the following.
Theorem 11.88 If two conics Γ and Γ′ of FP² are projectively equivalent, then
they have the same rank.
In particular,
Theorem 11.89 Let F be an algebraically closed field. Two conics Γ and Γ′ of FP²
are projectively equivalent if and only if they have the same rank.
The fact that the matrix associated with a conic is symmetric makes it possible for us to
transform it to a diagonal matrix. In this way, we determine the number of congruence
classes for matrices in M3 (F), which correspond to congruence classes of conics. Any
class is represented by a diagonal matrix notation, usually called projective canonical
form of a conic, such that any conic of FP2 is projectively equivalent to one (and
only one) of them. To do this, we refer to the results contained in Theorems 7.46 and
7.47, devoted to the description of the diagonal forms of quadratic functions. More
precisely, in the case F is algebraically closed and using Theorem 7.46, we have the
following.
Theorem 11.90 Let F be an algebraically closed field. Any conic Γ of FP2 is pro-
jectively equivalent to one (and only one) of the following:
(i) x12 + x22 + x32 = 0 (non-degenerate or ordinary conic);
(ii) x12 + x22 = 0 (degenerate conic of rank 2);
(iii) x12 = 0 (degenerate conic of rank 1).
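Theorem 11.89 turns the classification over an algebraically closed field into a rank computation. A minimal sketch, assuming SymPy (the function name is ours):

```python
import sympy as sp

def projective_conic_class(A):
    """Over an algebraically closed field the rank decides the class."""
    r = sp.Matrix(A).rank()
    return {3: 'non-degenerate (ordinary) conic',
            2: 'degenerate conic of rank 2',
            1: 'degenerate conic of rank 1'}[r]

# conic x1^2 + 2 x1 x2 + x2^2 - x3^2 = 0, i.e. (x1 + x2 - x3)(x1 + x2 + x3) = 0
A = [[1, 1, 0], [1, 1, 0], [0, 0, -1]]
print(projective_conic_class(A))   # degenerate conic of rank 2
```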
The analogous classification over the real numbers is obtained as follows.
Proof By using the same argument as in Theorem 11.90 and applying Theorem 7.47,
one has that any matrix representing a conic of RP² is congruent to one of the
following:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \; \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \; \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \; \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \; \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
As in the plane, homogeneous coordinates in FP³ are invariant up to scaling: (x₁, x₂, x₃, x₄) and (αx₁, αx₂, αx₃, αx₄) represent the
same point, for any 0 ≠ α ∈ F.
To represent a plane in the projective space FP³, we homogenize the general
equation for planes ax + by + cz + d = 0 by introducing the homogeneous coordinates, so that each term has the same degree. Hence, the general equation of a
plane in FP³ has the form ax₁ + bx₂ + cx₃ + dx₄ = 0. Therefore, in order to represent a line r in FP³, we just recall that it is the intersection of two non-parallel
planes π : ax₁ + bx₂ + cx₃ + dx₄ = 0 and π′ : a′x₁ + b′x₂ + c′x₃ + d′x₄ = 0, thus
the line r can be described as the system of the two equations of π and π′.
Let now P be a point having projective coordinates (l, m, n, 0). The first three coordinates (l, m, n) represent the direction vector of a line: parallel lines r and r′ in FP³
meet on the plane x₄ = 0 (the plane at infinity). More precisely, if r and r′ are parallel
lines in FP³, then their intersection point has homogeneous coordinates equal to
(l, m, n, 0), where (l, m, n) is the direction vector of r and r′. The point (l, m, n, 0)
is called the point at infinity of r (and r′) and represents the class of all parallel lines
having the same direction vector.
To represent a quadric of FP³, we start from the standard representing equation. A
quadric is a locus in F³ (F = R or F = C) consisting of all points whose coordinates
are a solution of a quadratic equation of the form f(x, y, z) = 0, where f is a quadratic
polynomial; passing to homogeneous coordinates, a quadric Σ of FP³ is represented
by an equation XᵗAX = 0, with A ∈ M₄(F) a symmetric matrix.
Definition 11.93 Let Σ and Σ be two quadrics of the projective space FP3 . We
say that Σ is projectively equivalent to Σ if there exists a projective transformation
F : FP3 → FP3 such that F(Σ) = Σ .
At this point, we may recall the arguments previously developed in the section
devoted to the classification of conics in the projective space FP². Following the
same line, we are able to state the following.
Theorem 11.94 Let Σ be a quadric of the projective space FP³, determined by the
4 × 4 matrix A with coefficients in F. If F : FP³ → FP³ is a projective transformation and C is the matrix representing F, then F(Σ) is the quadric determined by the
matrix (C⁻¹)ᵗA C⁻¹.
Theorem 11.95 Let Σ and Σ be two projective quadrics, represented by the sym-
metric matrices A ∈ M4 (F) and A ∈ M4 (F), respectively. Then Σ is projectively
equivalent to Σ if and only if there exists a nonsingular matrix C ∈ M4 (F) such that
A = C t AC.
Hence, two projective quadrics Σ and Σ′ are projectively equivalent if and only if
their associated matrices are congruent.
Moreover:
Theorem 11.97 If two quadrics of FP³ are projectively equivalent, then they have
the same rank. In particular, since C is an algebraically closed field, two quadrics of
CP³ are projectively equivalent if and only if they have the same rank.
Therefore, also in the case of classification of quadrics, we may determine a number
of congruence classes for matrices in M4 (F), which correspond to congruence classes
of quadrics. The classes are usually called projective canonical forms of a quadric,
such that any quadric of FP3 is projectively equivalent to one (and only one) of them.
By using again Theorems 7.46 and 7.47, we have the following.
Theorem 11.98 Let F be an algebraically closed field. Any quadric of FP3 is pro-
jectively equivalent to one (and only one) of the following:
(i) x₁² + x₂² + x₃² + x₄² = 0 (non-degenerate or ordinary quadric);
(ii) x₁² + x₂² + x₃² = 0 (degenerate quadric of rank 3);
(iii) x₁² + x₂² = 0 (degenerate quadric of rank 2);
(iv) x₁² = 0 (degenerate quadric of rank 1).
Theorem 11.99 Any quadric Σ of RP3 is projectively equivalent to one (and only
one) of the following:
(i) x12 + x22 + x32 + x42 = 0 (non-degenerate or ordinary quadric, containing no
real points);
(ii) x12 + x22 + x32 − x42 = 0 (non-degenerate or ordinary quadric);
(iii) x12 + x22 − x32 − x42 = 0 (non-degenerate or ordinary quadric);
(iv) x12 + x22 + x32 = 0 (degenerate quadric of rank 3, containing only one real
point);
(v) x12 + x22 − x32 = 0 (degenerate quadric of rank 3);
(vi) x12 + x22 = 0 (degenerate quadric of rank 2);
(vii) x12 − x22 = 0 (degenerate quadric of rank 2);
(viii) x12 = 0 (degenerate quadric of rank 1).
Proof As in Theorem 11.98 and applying Theorem 7.47, one has that any matrix
representing a quadric of RP³ is congruent to one of the following:

$$\mathrm{diag}(1,1,1,1), \quad \mathrm{diag}(1,1,1,-1), \quad \mathrm{diag}(1,1,-1,-1), \quad \mathrm{diag}(1,1,1,0),$$
$$\mathrm{diag}(1,1,-1,0), \quad \mathrm{diag}(1,1,0,0), \quad \mathrm{diag}(1,-1,0,0), \quad \mathrm{diag}(1,0,0,0).$$
Example 11.100 Let Q be the projective quadric of FP³ having equation x₁² + x₂² −
x₃² + x₁x₂ + x₁x₄ − 3x₄² = 0. The matrix associated with Q is

$$A = \begin{bmatrix} 1 & \tfrac12 & 0 & \tfrac12 \\ \tfrac12 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ \tfrac12 & 0 & 0 & -3 \end{bmatrix}.$$

The first transition of basis is represented by the matrix

$$E_1 = \begin{bmatrix} 1 & -\tfrac12 & 0 & -\tfrac12 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

so that

$$A' = E_1^t A E_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \tfrac34 & 0 & -\tfrac14 \\ 0 & 0 & -1 & 0 \\ 0 & -\tfrac14 & 0 & -\tfrac{13}{4} \end{bmatrix}.$$
4
Starting from this last one, the second transition of basis is represented by the matrix

$$E_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & \tfrac13 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

and we obtain

$$A'' = E_2^t A' E_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \tfrac34 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -\tfrac{10}{3} \end{bmatrix}.$$
Finally, over R, the substitution

x₁ = X₁, x₂ = (2/√3) X₂, x₃ = X₃, x₄ = (√3/√10) X₄

leads to the real projective canonical form X₁² + X₂² − X₃² − X₄² = 0, while over C
the substitution

x₁ = X₁, x₂ = (2/√3) X₂, x₃ = i X₃, x₄ = (i√3/√10) X₄

leads to X₁² + X₂² + X₃² + X₄² = 0.
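The chain of congruences in this example is easy to check symbolically. A minimal sketch, assuming SymPy, with E₁ as reconstructed above:

```python
import sympy as sp

half, third = sp.Rational(1, 2), sp.Rational(1, 3)

A = sp.Matrix([[1, half, 0, half],
               [half, 1, 0, 0],
               [0, 0, -1, 0],
               [half, 0, 0, -3]])
E1 = sp.Matrix([[1, -half, 0, -half],
                [0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1]])
E2 = sp.Matrix([[1, 0, 0, 0],
                [0, 1, 0, third],
                [0, 0, 1, 0],
                [0, 0, 0, 1]])

E = E1 * E2
print(E.T * A * E)   # diag(1, 3/4, -1, -10/3)
```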
Exercises
1. In the affine space RA2 , consider the transformation f : RA2 → RA2 defined by
f(x, y) = ( −(1/4)x − (√3/4)y + 3/4 , (√3/4)x − (1/4)y + √3/4 ).
Prove that f is an affinity on RA2 and describe geometrically what its action is.
(translation, rotation, reflection, a composition of some of them, etc.)
2. In the affine space RA2 , consider the transformation f : RA2 → RA2 such that
where coordinates of points are referred to the standard frame of reference. Prove
that f is an affinity on RA2 , determine its representation and describe geometri-
cally what its action is.
3. Represent the reflection in RE4 across the hyperplane of equation x1 + 2x2 −
x3 + x4 + 1 = 0.
4. Given a hyperbola Γ in space RE2 , prove that there exists an affine transformation
mapping Γ to the hyperbola represented by equation x y = 1.
5. Let S1 , S2 be two ordered sets of RA2 consisting of 3 non-collinear points each.
Prove that there exists a unique affine transformation f : RA2 → RA2 such that
f (S1 ) = S2 .
6. Determine the affine and metric classification of the following quadrics and the
transformations needed to obtain them, in both cases F = R and F = C:
(a) 2x 2 + y 2 + z 2 − x − 2y + 1 = 0 in FA3 and FE3 .
(b) x 2 + y 2 + 2x y + 2x z + 2yz − 2x + 2 = 0 in FA4 and FE4 .
In this chapter, we provide a method for solving systems of linear ordinary differ-
ential equations by using techniques associated with the calculation of eigenvalues,
eigenvectors and generalized eigenvectors of matrices. We learn in calculus how to
solve differential equations and the system of differential equations. Here, we firstly
show how to represent a system of differential equations in a matrix formulation.
Then, using the Jordan canonical form and, whenever possible, the diagonal canon-
ical form of matrices, we will describe a process aimed at solving systems of linear
differential equations in a very efficient way. To do this, we also give a short descrip-
tion of the so-called vector-valued functions. Finally, as a further application, in the
last part of the chapter, we show that the method linked with the solution of systems
also supplies a way of dealing with the problem of resolution of differential equations
of order n.
An ordinary differential equation of order n can be written in the general form

$$F\big(x, y(x), y'(x), \dots, y^{(n)}(x)\big) = g(x) \tag{12.1}$$

where x is a scalar parameter, g(x) is a function of the variable x which is defined and
continuous in an interval I ⊆ R, y(x) is the unknown function of x, y′(x) is the first
derivative of y(x) and, for any i, y⁽ⁱ⁾ is the ith derivative of y(x). We recall that in the
literature, the functions y(x), y′(x), …, y⁽ⁿ⁾(x) of the variable x that are involved
in an ordinary differential equation are commonly replaced by y, y′, …, y⁽ⁿ⁾, so
that Eq. (12.1) can be written as

F(x, y, y′, …, y⁽ⁿ⁾) = g(x).

We say that the differential equation is homogeneous if the function g(x) is just the
constant function 0, that is, the differential equation is precisely

F(x, y, y′, …, y⁽ⁿ⁾) = 0.
The order of a differential equation is the order of the highest derivative which
appears in the equation. A solution (or integral) of an ordinary differential equation
of order n of the form (12.1) is a function y(x) defined in the interval I ⊆ R and
satisfying the following:
(1) It is n-times differentiable in the domain of definition I .
(2) F(x0 , y(x0 ), y (x0 ), . . . , y (n) (x0 )) = g(x0 ), for any x0 ∈ I .
Example 12.1 Any solution of the first-order differential equation y′ = ay, where
a ∈ R is a fixed constant, has the form y(x) = ce^{ax} (c ∈ R). In fact, assuming that
y(x) is a solution, we notice that the product y(x)e^{−ax} has derivative equal to zero;
hence, it is a constant, say y(x)e^{−ax} = c, for some c ∈ R. Moreover, the function
ce^{ax} is differentiable in the whole of R.
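This elementary fact is also what a computer algebra system returns. A minimal check, assuming SymPy:

```python
import sympy as sp

x = sp.symbols('x')
a = sp.symbols('a', real=True)
y = sp.Function('y')

sol = sp.dsolve(sp.Eq(y(x).diff(x), a * y(x)), y(x))
print(sol)    # Eq(y(x), C1*exp(a*x))
```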
y′(x) = ac₁e^{ax} − ac₂e^{−ax} and y″(x) = a²c₁e^{ax} + a²c₂e^{−ax} = a²y(x).
Moreover, the function c₁e^{ax} + c₂e^{−ax} is twice differentiable in the whole of R.
Note that the constants of integration are included in each of the previous examples.
They are always included in the general solution of a differential equation. This means
that the solution of a differential equation represents a family of infinitely many curves
(∞¹ and ∞² curves for Examples 12.1 and 12.2, respectively). More in general, the
solution of a differential equation of order n is a function depending on n arbitrary
constants and represents a family of ∞ⁿ curves. For instance (Example 12.4), the set
of solutions of the second-order differential equation 2yy′ − xy′² − 4x = 0 is the family

y(x) = (x² + c²)/c   (0 ≠ c ∈ R).
If we consider the initial value problem obtained by adding the condition y(1) = 2,
we get the particular solution corresponding to the value of the constant c which satisfies

(1² + c²)/c = 2  ⟹  c² − 2c + 1 = 0  ⟹  (c − 1)² = 0  ⟹  c = 1,

that is, the particular solution y = x² + 1.
Example 12.5 Let us return to Example 12.4. Notice that both y = 2x and y =
−2x are solutions of the second-order differential equation 2yy′ − xy′² − 4x = 0.
Nevertheless, neither of them can be deduced from the general solution by assigning
particular values to the constant c.
The equation obtained from the nonhomogeneous linear equation (12.4) by replacing
g(x) with the zero function is called the associated homogeneous equation (12.5). It is
known that, if s(x) is the general solution of the associated homogeneous equation
(12.5) (s(x) is usually called a complementary solution) and s₀(x) is any solution of
the nonhomogeneous equation (12.4), then y(x) = cs(x) + s₀(x) is a general solution
of the nonhomogeneous equation, for an arbitrary constant c. In other words, the
general solution to (12.4) is thus obtained by adding all possible homogeneous
solutions to one fixed particular solution.
From this family, we can select one which satisfies the initial condition y(x0 ) = y0 .
In any course on ordinary differential equations, all of us probably encountered the
general method known as variation of parameters for constructing particular solu-
tions of nonhomogeneous ordinary differential equations with constant coefficients.
Here, it is not our intention to discuss this aspect. Actually, we would like to reiterate
that, from the point of view of the search for solutions, the study of the associated
homogeneous equation plays a crucial role. Thus, in this section, we shall concen-
trate our attention exactly on the case of linear homogeneous first-order ordinary
differential equations in which one or more than one unknown function occurs.
Let y1 (x), . . . , yn (x) be differentiable functions of the scalar parameter x, ai j
given scalars (for i, j = 1, . . . , n) and f 1 (x), . . . , f n (x) arbitrary functions of x.
Assume that f 1 (x), . . . , f n (x) are defined and continuous in an interval I ⊆ R.
A system of n linear first-order ordinary differential equations in n unknowns
with constant coefficients (an n × n system of linear equations) has the general form
$$\mathbf{y}'(x) = A\,\mathbf{y}(x) + \mathbf{f}(x) \tag{12.7}$$

where

$$\mathbf{y}'(x) = \begin{bmatrix} y_1'(x) \\ y_2'(x) \\ \vdots \\ y_n'(x) \end{bmatrix}, \quad A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}, \quad \mathbf{y}(x) = \begin{bmatrix} y_1(x) \\ y_2(x) \\ \vdots \\ y_n(x) \end{bmatrix}, \quad \mathbf{f}(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_n(x) \end{bmatrix}.$$
A vector-valued function g(x) = [g₁(x), …, gₙ(x)]ᵗ has as its domain of definition
the largest possible interval on which all components are defined, that is, the
intersection of their domains of definition.
The calculus processes of taking limits, differentiating and integrating are extended
to vector-valued functions by evaluating the limit (derivative or integral, respectively)
of each entry gi (x) separately, that is,
$$\lim_{x\to x_0} \mathbf{g}(x) = \begin{bmatrix} \lim_{x\to x_0} g_1(x) \\ \lim_{x\to x_0} g_2(x) \\ \vdots \\ \lim_{x\to x_0} g_n(x) \end{bmatrix}, \quad \mathbf{g}'(x) = \begin{bmatrix} g_1'(x) \\ g_2'(x) \\ \vdots \\ g_n'(x) \end{bmatrix}, \quad \int \mathbf{g}(x)\,dx = \begin{bmatrix} \int g_1(x)\,dx \\ \int g_2(x)\,dx \\ \vdots \\ \int g_n(x)\,dx \end{bmatrix}.$$
More precisely, for any point x0 in the domain of definition of g(x), the following
hold:
(1) The limit $\lim_{x\to x_0} \mathbf{g}(x)$ exists if and only if the limits of all components exist. If any
of the limits $\lim_{x\to x_0} g_i(x)$ fails to exist, then $\lim_{x\to x_0} \mathbf{g}(x)$ does not exist.
(2) g(x) is continuous at x = x0 if and only if all components are continuous at
x = x0 .
(3) g(x) is differentiable at x = x0 if and only if all components are differentiable
at x = x0 .
The objects y(x), y (x) and f(x) in relation (12.7) are examples of vector-valued
functions. In this sense, we say that a solution of (12.7) is a vector-valued function
y(x) = [y1 (x), . . . , yn (x)]T satisfying the following conditions:
(1) Each function yi (x) is defined and differentiable in the domain of definition I .
(2) y(x) satisfies (12.7) at every point, that is, for any x₀ ∈ I, y′(x₀) = A·y(x₀) + f(x₀).
Since linear homogeneous systems have a linear structure, if s₁(x), …, sₙ(x) are solutions of system (12.8), then so is every linear combination c₁s₁(x) + ⋯ + cₙsₙ(x), because

$$(c_1\mathbf{s}_1 + \dots + c_n\mathbf{s}_n)'(x) = c_1\mathbf{s}_1'(x) + \dots + c_n\mathbf{s}_n'(x) = c_1 A\,\mathbf{s}_1(x) + \dots + c_n A\,\mathbf{s}_n(x) = A\big(c_1\mathbf{s}_1(x) + \dots + c_n\mathbf{s}_n(x)\big).$$
To check whether solutions s₁(x), …, sₙ(x) are linearly independent, one can proceed as follows:
(1) Form the matrix M(x) whose columns are the coordinates of s₁(x), …, sₙ(x).
(2) Compute the determinant of the matrix M(x): it is called the Wronskian of the
n vector functions s₁(x), …, sₙ(x) and is usually denoted by W(x).
(3) If the Wronskian is different from zero for any x ∈ I, then s₁(x), …, sₙ(x)
are linearly independent.
As a consequence of the well-known Liouville theorem (also called Abel's formula),
we observe that if W(x₀) = 0 for some given x₀ ∈ I, then W(x) = 0 for every x ∈ I.
Remark 12.9 In general, if {yi (x)} is a linearly dependent set of functions, then the
Wronskian must vanish. However, the converse is not necessarily true, as one can find
examples in which the Wronskian vanishes without the functions being dependent.
Nevertheless, if {yi (x)} are solutions for a linear system of ordinary differential
equations, then the converse does hold. In other words, if {yi (x)} are solutions for
a linear system of ordinary differential equations and the Wronskian of the {yi (x)}
vanishes, then {yi (x)} is a linearly dependent set of functions.
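The Wronskian test is immediate to run symbolically. A minimal sketch, assuming SymPy, on a toy pair of vector solutions of a 2 × 2 diagonal system (names ours):

```python
import sympy as sp

x = sp.symbols('x')

# two vector solutions of y1' = 4 y1, y2' = -2 y2
s1 = sp.Matrix([sp.exp(4*x), 0])
s2 = sp.Matrix([0, sp.exp(-2*x)])

M = sp.Matrix.hstack(s1, s2)      # columns are the solutions
W = sp.simplify(M.det())
print(W)                          # exp(2*x), nowhere zero: linearly independent
```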
y′₁ = 4y₁,
y′₂ = −2y₂,
y′₃ = 3y₃,
y1 = c1 e4x ,
y2 = c2 e−2x ,
y3 = c3 e3x .
To achieve the same general solution, we now introduce a different approach. The
system has the matrix form

$$\mathbf{y}'(x) = \begin{bmatrix} y_1'(x) \\ y_2'(x) \\ y_3'(x) \end{bmatrix} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} y_1(x) \\ y_2(x) \\ y_3(x) \end{bmatrix}.$$

The coefficient matrix is diagonal and has three distinct real eigenvalues {4, −2, 3}. The corresponding
eigenvectors are as follows:
• X 1 = [1, 0, 0] for λ1 = 4.
• X 2 = [0, 1, 0] for λ2 = −2.
• X 3 = [0, 0, 1] for λ3 = 3.
At this point, we notice that s1 (x) = e4x X 1 , s2 (x) = e−2x X 2 , s3 (x) = e3x X 3 , where
s1 , s2 , s3 are precisely the solutions previously deducted from the system.
Thus, the general solution is given by

$$\mathbf{y}(x) = \begin{bmatrix} c_1 e^{4x} \\ c_2 e^{-2x} \\ c_3 e^{3x} \end{bmatrix} = c_1 e^{4x} X_1 + c_2 e^{-2x} X_2 + c_3 e^{3x} X_3.$$
0 = y1 (2) = c1 e8 ,
−1 = y2 (2) = c2 e−4 ,
3 = y3 (2) = c3 e6 .
Thus, c1 = 0, c2 = −e4 , c3 = 3e−6 and the solution satisfying the initial conditions
is
y1 = 0,
y2 = −e4 e−2x ,
y3 = 3e−6 e3x .
The previous example is easy to solve, thanks to the fact that each equation in
the system involves one and only one unknown function and its derivative. More
precisely, the ith equation involves precisely yi , yi , so that the matrix associated
with the system is diagonal. This allows us to easily find the solution of any equation
separately. It is clear that the simplest systems are those in which the associated
matrix is diagonal.
Here, we will answer the question of how to solve a homogeneous system
y (x) = Ay(x) (12.9)
whose coefficient matrix A is not diagonal. The first case we analyze is related to
systems whose associated coefficient matrix is diagonalizable. To work this out, we
prove the following:
Theorem 12.12 Let y₁(x), …, yₙ(x) be differentiable functions of the scalar parameter x and A = (a_{ij})_{n×n} a given diagonalizable scalar matrix such that

$$\begin{bmatrix} y_1'(x) \\ y_2'(x) \\ \vdots \\ y_n'(x) \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix} \begin{bmatrix} y_1(x) \\ y_2(x) \\ \vdots \\ y_n(x) \end{bmatrix}. \tag{12.10}$$
If {λ1 , . . . , λn } is the spectrum of A (the eigenvalues are not necessarily distinct) and
{X 1 , . . . , X n } is a linearly independent set of eigenvectors of A, such that AX i =
λi X i , for any i = 1, . . . , n, then {X 1 eλ1 x , . . . . . . , X n eλn x } is a fundamental system
of solutions and any solution of system (12.10) is a vector-valued function having
the form
y(x) = c1 X 1 eλ1 x + · · · + cn X n eλn x (12.11)
Proof Since A is diagonalizable, there exists an invertible matrix P such that
P⁻¹AP = diag(λ₁, …, λₙ), where the diagonal entries λᵢ are not necessarily distinct and any eigenvalue λᵢ
repeatedly occurs on the main diagonal as many times as it occurs as a root of the
characteristic polynomial of A. We recall that the columns of P coincide with the
eigenvectors of A. In this regard, we may write

P = [X₁ X₂ … Xₙ]
with

$$X_1 = \begin{bmatrix} p_{11} \\ p_{21} \\ \vdots \\ p_{n1} \end{bmatrix}, \quad X_2 = \begin{bmatrix} p_{12} \\ p_{22} \\ \vdots \\ p_{n2} \end{bmatrix}, \quad \dots, \quad X_n = \begin{bmatrix} p_{1n} \\ p_{2n} \\ \vdots \\ p_{nn} \end{bmatrix}.$$
Consider the vector-valued function g(x) = [g₁(x), …, gₙ(x)]ᵗ, where g₁, …, gₙ are
differentiable functions of the variable x such that g(x) is related to the unknown
function y(x) by the equation g(x) = P⁻¹·y(x). Hence,

y(x) = Pg(x)  (12.12)

and, differentiating componentwise,

y′(x) = Pg′(x).  (12.15)
Substitution of (12.12) and (12.15) in (12.10) leads us to Pg′(x) = APg(x). Thus,
g′(x) = P⁻¹APg(x), which means

$$\begin{bmatrix} g_1'(x) \\ g_2'(x) \\ \vdots \\ g_n'(x) \end{bmatrix} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} \begin{bmatrix} g_1(x) \\ g_2(x) \\ \vdots \\ g_n(x) \end{bmatrix} \tag{12.16}$$

so that

$$g_i'(x) = \lambda_i g_i(x), \quad \forall\, i = 1,\dots,n. \tag{12.17}$$
By Example 12.1, each equation in (12.17) has solution gᵢ(x) = cᵢe^{λᵢx}, for an arbitrary constant cᵢ. Hence,

$$y_i(x) = \sum_{j=1}^{n} p_{ij}\, g_j(x) = \sum_{j=1}^{n} p_{ij}\, c_j\, e^{\lambda_j x},$$

that is,
$$\mathbf{y}(x) = \begin{bmatrix} y_1(x) \\ y_2(x) \\ \vdots \\ y_n(x) \end{bmatrix} = \begin{bmatrix} p_{11}c_1 e^{\lambda_1 x} + \dots + p_{1n}c_n e^{\lambda_n x} \\ p_{21}c_1 e^{\lambda_1 x} + \dots + p_{2n}c_n e^{\lambda_n x} \\ \vdots \\ p_{n1}c_1 e^{\lambda_1 x} + \dots + p_{nn}c_n e^{\lambda_n x} \end{bmatrix} = c_1 X_1 e^{\lambda_1 x} + \dots + c_n X_n e^{\lambda_n x},$$
Moreover, the matrix whose columns are X₁e^{λ₁x}, …, Xₙe^{λₙx} has determinant equal to
e^{(λ₁+⋯+λₙ)x}·det(P), which is different from zero for any x ∈ R. Therefore, we
conclude that {X₁e^{λ₁x}, …, Xₙe^{λₙx}} is a fundamental system of solutions.
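Numerically, the proof's recipe y(x) = Pg(x) with gᵢ(x) = cᵢe^{λᵢx} takes only a few lines. A minimal sketch, assuming NumPy; function and variable names are ours:

```python
import numpy as np

def solve_linear_system(A, y0, x):
    """y(x) = sum_i c_i X_i e^{lambda_i x}, with c fixed by y(0) = y0."""
    lam, P = np.linalg.eig(A)      # columns of P are eigenvectors X_i
    c = np.linalg.solve(P, y0)     # y0 = P c
    return (P * np.exp(lam * x)) @ c

A = np.array([[4.0, 0.0, 0.0],
              [0.0, -2.0, 0.0],
              [0.0, 0.0, 3.0]])
y0 = np.array([0.0, -1.0, 3.0])    # initial values at x = 0
print(solve_linear_system(A, y0, 1.0))
```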
As an example, consider a system whose coefficient matrix
has four distinct real eigenvalues {−2, 2, 4, 6}; thus, it is diagonalizable. The corresponding eigenvectors are as follows:
• X1 = [−7, 5, 8, −16]T for λ1 = −2.
• X2 = [1, 1, 0, 0]T for λ2 = 2.
• X3 = [1, −1, 0, 0]T for λ3 = 4.
• X4 = [−9, 3, 8, 16]T for λ4 = 6.
Hence,

$$\mathbf{y}(x) = c_1\begin{bmatrix} -7 \\ 5 \\ 8 \\ -16 \end{bmatrix} e^{-2x} + c_2\begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} e^{2x} + c_3\begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix} e^{4x} + c_4\begin{bmatrix} -9 \\ 3 \\ 8 \\ 16 \end{bmatrix} e^{6x},$$
that is,

y₁ = −7c₁e^{−2x} + c₂e^{2x} + c₃e^{4x} − 9c₄e^{6x},
y₂ = 5c₁e^{−2x} + c₂e^{2x} − c₃e^{4x} + 3c₄e^{6x},
y₃ = 8c₁e^{−2x} + 8c₄e^{6x},
y₄ = −16c₁e^{−2x} + 16c₄e^{6x}.

For a suitable choice of initial conditions one obtains, for instance, the particular solution

y₁ = −(35/32)e^{−2x} + (23/16)e^{2x} + (3/2)e^{4x} − (27/32)e^{6x},
y₂ = (25/32)e^{−2x} + (23/16)e^{2x} − (3/2)e^{4x} + (9/32)e^{6x},
y₃ = (5/4)e^{−2x} + (3/4)e^{6x},
y₄ = −(5/2)e^{−2x} + (3/2)e^{6x}.
y′₁ = y₁ + 2y₂,
y′₂ = 3y₁ + 2y₂,
y′₃ = 3y₁ − 2y₂ + 4y₃.

The coefficient matrix

$$A = \begin{bmatrix} 1 & 2 & 0 \\ 3 & 2 & 0 \\ 3 & -2 & 4 \end{bmatrix}$$

has two distinct real eigenvalues {−1, 4}; in particular, the algebraic multiplicity of
λ = 4 is equal to 2. The corresponding eigenvectors are as follows:
• X₁ = [2, 3, 0]ᵗ and X₂ = [0, 0, 1]ᵗ for λ₁ = 4.
• X₃ = [1, −1, −1]ᵗ for λ₂ = −1.
Thus, the geometric multiplicity of λ = 4 is equal to the algebraic one, that is, the
matrix is diagonalizable. So, the general solution has the following form:
$$\mathbf{y}(x) = c_1\begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix} e^{4x} + c_2\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} e^{4x} + c_3\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} e^{-x},$$

that is,

y₁ = 2c₁e^{4x} − c₃e^{−x},
y₂ = 3c₁e^{4x} + c₃e^{−x},
y₃ = c₂e^{4x} + c₃e^{−x}.
For the complex eigenvalue μ = 4 + 3i, we get the complex eigenvector X₂ = [i, 1 + i, 1]ᵗ. Thus, the complex eigenvector
corresponding to μ̄ = 4 − 3i is X̄₂ = [−i, 1 − i, 1]ᵗ. So, the general solution has the
following form:

$$\mathbf{y}(x) = c_1\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} e^{-x} + c_2\begin{bmatrix} i \\ 1+i \\ 1 \end{bmatrix} e^{(4+3i)x} + c_3\begin{bmatrix} -i \\ 1-i \\ 1 \end{bmatrix} e^{(4-3i)x},$$

that is,

$$\begin{aligned} y_1 &= c_1 e^{-x} + i c_2 e^{(4+3i)x} - i c_3 e^{(4-3i)x} \\ &= c_1 e^{-x} + i c_2 e^{4x}(\cos 3x + i\sin 3x) - i c_3 e^{4x}(\cos 3x - i\sin 3x) \\ &= c_1 e^{-x} - (c_2 + c_3)\, e^{4x}\sin 3x + i\,(c_2 - c_3)\, e^{4x}\cos 3x, \\[4pt] y_2 &= (1+i)\, c_2 e^{(4+3i)x} + (1-i)\, c_3 e^{(4-3i)x} \\ &= (c_2 + c_3)\, e^{4x}(\cos 3x - \sin 3x) + i\,(c_2 - c_3)\, e^{4x}(\cos 3x + \sin 3x), \\[4pt] y_3 &= c_1 e^{-x} + c_2 e^{4x}(\cos 3x + i\sin 3x) + c_3 e^{4x}(\cos 3x - i\sin 3x) \\ &= c_1 e^{-x} + (c_2 + c_3)\, e^{4x}\cos 3x + i\,(c_2 - c_3)\, e^{4x}\sin 3x. \end{aligned}$$
Notice that every solution of the form (12.11) will be a linear combination of the
special solutions
X 1 e λ1 x , . . . , X n e λn x
each of which must be interpreted as the product of the variable scalar eλi x and the
constant vector X i , for any eigenvalue λi and corresponding eigenvector X i . Actually,
any particular solution Xᵢe^{λᵢx} can be obtained from the general one by assigning the
values cᵢ = 1 and c_j = 0 for any j ≠ i.
The previously mentioned method allows us to solve the system (12.10) whenever
the associated coefficient matrix A is diagonalizable. The difficulty arises when there
exist eigenvalues of A whose geometric multiplicity is strictly less than the
algebraic one. In this case, A has fewer than n linearly independent eigenvectors and
it is not diagonalizable. Nevertheless, when the matrix A has real eigenvalues, the
eigenvectors and generalized eigenvectors form a basis for Rn . From this basis, we
may construct the complete solution of system (12.10), by using a similar argument
as in Theorem 12.12.
Assume that

$$A' = P^{-1}AP = \begin{bmatrix} J_{n_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{n_r}(\lambda_r) \end{bmatrix}$$

is the Jordan canonical form of A. Then the general solution of system (12.19) is
$$\mathbf{y}(x) = \sum_{i=1}^{r} \sum_{k=1}^{n_i} c_{i,k}\, e^{\lambda_i x} \sum_{j=1}^{k} \frac{x^{k-j}}{(k-j)!}\, X_{i,j}. \tag{12.20}$$
We recall that the column vectors of P coincide with a Jordan basis for Rⁿ and
represent a set of n linearly independent generalized eigenvectors of A. Any Jordan
block J_{n_k}(λ_k) of size n_k is associated with a subset of n_k Jordan generators, that
is, a chain of n_k linearly independent generalized eigenvectors corresponding to the
eigenvalue λ_k. Moreover, the first vector in the chain is exactly an eigenvector of A
corresponding to λ_k. We may write
for g_{i,j} differentiable functions of the variable x, defined by g(x) = P⁻¹y(x);
one can see that the solution of (12.19) is given by

$$\mathbf{y}(x) = P\mathbf{g}(x) = X_{1,1}\, g_{1,1}(x) + \dots + X_{1,n_1}\, g_{1,n_1}(x) + \dots + X_{r,1}\, g_{r,1}(x) + \dots + X_{r,n_r}\, g_{r,n_r}(x) = \sum_{i=1}^{r}\sum_{j=1}^{n_i} X_{i,j}\, g_{i,j}(x). \tag{12.21}$$
Moreover, g′(x) = P⁻¹APg(x), that is,

$$\begin{bmatrix} g_{1,1}'(x) \\ \vdots \\ g_{1,n_1}'(x) \\ \vdots \\ g_{r,1}'(x) \\ \vdots \\ g_{r,n_r}'(x) \end{bmatrix} = \begin{bmatrix} J_{n_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{n_r}(\lambda_r) \end{bmatrix} \begin{bmatrix} g_{1,1}(x) \\ \vdots \\ g_{1,n_1}(x) \\ \vdots \\ g_{r,1}(x) \\ \vdots \\ g_{r,n_r}(x) \end{bmatrix}. \tag{12.22}$$
Restricting the attention to a single Jordan block J_k(λ) (the simplified case (12.23)), we get

$$g_i' = \lambda g_i + g_{i+1} \; \text{ for } i = 1,\dots,k-1; \qquad g_k' = \lambda g_k. \tag{12.25}$$
The last equation gives g_k(x) = c_k e^{λx}. Then g′_{k−1}(x) = λg_{k−1}(x) + g_k(x), whose solution is g_{k−1}(x) = c_{k−1}e^{λx} + c_k x e^{λx}, for arbitrary constants c_{k−1}, c_k.
Analogously, from

g′_{k−2}(x) = λg_{k−2}(x) + g_{k−1}(x) = λg_{k−2}(x) + c_{k−1}e^{λx} + c_k x e^{λx}

we obtain

g_{k−2}(x) = c_{k−2}e^{λx} + c_{k−1}x e^{λx} + c_k (x²/2) e^{λx}, for arbitrary constants c_{k−2}, c_{k−1}, c_k.
Continuing this backward substitution process, we arrive at the solution of system
(12.25); more precisely,

$$g_j(x) = \sum_{h=j}^{k} c_h\, \frac{x^{h-j}}{(h-j)!}\, e^{\lambda x}, \quad j = 1,\dots,k, \quad \text{for arbitrary constants } c_1,\dots,c_k. \tag{12.26}$$
Going back to the general case, the unknown functions {g_{i,1}, …, g_{i,n_i}} involved
in the system associated with J_{n_i} and related to the eigenvalue λᵢ (for any i = 1, …, r)
can be determined as follows:

$$g_{i,j}(x) = \sum_{h=j}^{n_i} c_{i,h}\, \frac{x^{h-j}}{(h-j)!}\, e^{\lambda_i x}, \quad j = 1,\dots,n_i, \quad \text{for arbitrary constants } c_{i,h}. \tag{12.27}$$
Substitution of (12.27) in (12.21) gives

$$\mathbf{y}(x) = \sum_{i=1}^{r}\sum_{j=1}^{n_i} X_{i,j}\, g_{i,j}(x) = \sum_{i=1}^{r}\sum_{j=1}^{n_i}\sum_{h=j}^{n_i} c_{i,h}\, \frac{x^{h-j}}{(h-j)!}\, e^{\lambda_i x}\, X_{i,j} = \sum_{i=1}^{r}\sum_{h=1}^{n_i} c_{i,h}\, e^{\lambda_i x} \sum_{j=1}^{h} \frac{x^{h-j}}{(h-j)!}\, X_{i,j}, \tag{12.28}$$
where

$$\Big\{\, e^{\lambda_i x} \sum_{j=1}^{h} \frac{x^{h-j}}{(h-j)!}\, X_{i,j} \;\Big|\; i = 1,\dots,r, \; h = 1,\dots,n_i \,\Big\} \tag{12.29}$$

is a set of n solutions to the initial system. We may describe this set by marking any
of its elements with an appropriate symbol as follows:
$$s_{i,h}(x) = e^{\lambda_i x} \sum_{j=1}^{h} \frac{x^{h-j}}{(h-j)!}\, X_{i,j}, \quad i = 1,\dots,r, \; h = 1,\dots,n_i,$$

and denote by S the n × n matrix-valued function whose columns are precisely the
coordinates of the s_{i,h}(x):

$$S(x) = \big[\, s_{1,1}(x)\; \dots\; s_{1,n_1}(x)\; \dots\dots\; s_{r,1}(x)\; \dots\; s_{r,n_r}(x) \,\big].$$
A direct computation shows that S(x) = P·B(x), where B(x) is the block diagonal matrix with blocks B_{n_i}(x), for

$$B_{n_i}(x) = \begin{bmatrix} e^{\lambda_i x} & x e^{\lambda_i x} & \frac{x^2}{2}e^{\lambda_i x} & \frac{x^3}{3!}e^{\lambda_i x} & \dots & \frac{x^{n_i-1}}{(n_i-1)!}e^{\lambda_i x} \\ 0 & e^{\lambda_i x} & x e^{\lambda_i x} & \frac{x^2}{2}e^{\lambda_i x} & \dots & \frac{x^{n_i-2}}{(n_i-2)!}e^{\lambda_i x} \\ 0 & 0 & e^{\lambda_i x} & x e^{\lambda_i x} & \dots & \frac{x^{n_i-3}}{(n_i-3)!}e^{\lambda_i x} \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & e^{\lambda_i x} & x e^{\lambda_i x} \\ 0 & 0 & 0 & \dots & 0 & e^{\lambda_i x} \end{bmatrix}.$$
By the fact that P is the constant invertible matrix whose columns are the coordinates
of linearly independent generalized eigenvectors of A, and since the determinant of
B(x) is a function of the variable x which is trivially nowhere zero, we arrive at
the conclusion that S(x) is an invertible matrix-valued function, for any x ∈ R. This
means that its rank is equal to n and its columns are linearly independent, as required.
Remark 12.17 Returning to the simplified case (12.23), we can assert that, for any
Jordan block Jk of length k and related to the eigenvalue λ, a set of k linearly
independent solutions is given by
$$\begin{aligned} s_1(x) &= e^{\lambda x} X_1; \qquad s_2(x) = e^{\lambda x}\big( x X_1 + X_2 \big); \\ s_3(x) &= e^{\lambda x}\Big( \tfrac{x^2}{2} X_1 + x X_2 + X_3 \Big); \qquad s_4(x) = e^{\lambda x}\Big( \tfrac{x^3}{3!} X_1 + \tfrac{x^2}{2} X_2 + x X_3 + X_4 \Big); \\ &\;\;\vdots \\ s_{k-1}(x) &= e^{\lambda x}\Big( \tfrac{x^{k-2}}{(k-2)!} X_1 + \dots + x X_{k-2} + X_{k-1} \Big); \\ s_k(x) &= e^{\lambda x}\Big( \tfrac{x^{k-1}}{(k-1)!} X_1 + \dots + \tfrac{x^2}{2} X_{k-2} + x X_{k-1} + X_k \Big); \end{aligned} \tag{12.30}$$
where {X₁, …, X_k} is the chain of generalized eigenvectors associated with the eigenvalue λ and generating J_k. Thus, the contribution to the general solution (12.21) of
system (12.19) has the form

$$\sum_{t=1}^{k} c_t\, s_t(x) = \sum_{t=1}^{k} c_t\, e^{\lambda x} \sum_{j=1}^{t} \frac{x^{t-j}}{(t-j)!}\, X_j, \tag{12.31}$$
Let us finally summarize the method for solving a system of the form (12.19):
• Find eigenvalues {λ1 , . . . , λr } of A.
• For any Jordan block Jk (λ) of order k (related to some eigenvalue λ), find the
corresponding chain of k generalized eigenvectors {X 1 , . . . , X k }.
• Construct the particular solution (12.31) corresponding with Jk , that is,
$$\begin{aligned} & c_1 e^{\lambda x} X_1 \\ &+ c_2 e^{\lambda x}\big( x X_1 + X_2 \big) \\ &+ c_3 e^{\lambda x}\Big( \tfrac{x^2}{2} X_1 + x X_2 + X_3 \Big) \\ &+ \dots \\ &+ c_{k-1} e^{\lambda x}\Big( \tfrac{x^{k-2}}{(k-2)!} X_1 + \dots + x X_{k-2} + X_{k-1} \Big) \\ &+ c_k e^{\lambda x}\Big( \tfrac{x^{k-1}}{(k-1)!} X_1 + \dots + \tfrac{x^2}{2} X_{k-2} + x X_{k-1} + X_k \Big). \end{aligned} \tag{12.32}$$
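When A is not diagonalizable, this recipe can be delegated to a computer algebra system: jordan_form produces the generalized eigenvectors, and the matrix exponential packages formula (12.28). A minimal sketch, assuming SymPy, on the coefficient matrix of the worked example that follows (the initial vector matches the initial conditions used there):

```python
import sympy as sp

x = sp.symbols('x')

A = sp.Matrix([[2, -1, 1, 1],
               [1, 4, 0, 1],
               [0, 0, 4, 1],
               [0, 0, 1, 4]])

P, J = A.jordan_form()            # A = P J P^{-1}
print(J)                          # Jordan blocks J_3(3) and J_1(5) (order may vary)

# the solution of y' = A y is y(x) = exp(xA) y(0), exp(xA) = P exp(xJ) P^{-1}
expA = (A * x).exp()
y0 = sp.Matrix([1, 0, 0, 1])
print(sp.simplify(expA * y0))     # the particular solution of the worked example
```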
y′₁ = 2y₁ − y₂ + y₃ + y₄,
y′₂ = y₁ + 4y₂ + y₄,
y′₃ = 4y₃ + y₄,
y′₄ = y₃ + 4y₄,
To arrive at the general solution of the system, we now find the eigenvectors and
generalized eigenvectors of matrix A.
Since there exists only one Jordan block of order 3 with eigenvalue λ1 = 3, we need
to find generalized eigenvectors corresponding to λ1 and having exponents 1, 2, 3.
Hence, we must solve the homogeneous linear systems associated with the matrices

$$(A-3I) = \begin{bmatrix} -1 & -1 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \qquad (A-3I)^2 = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 2 & 3 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 2 & 2 \end{bmatrix}$$

and

$$(A-3I)^3 = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 5 & 5 \\ 0 & 0 & 4 & 4 \\ 0 & 0 & 4 & 4 \end{bmatrix}.$$
and

$$X_1 = (A-3I)X_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix}.$$
having solution Y₁ = α(1, 5, 4, 4), for any α ∈ R. Therefore, the Jordan basis with
respect to which we have the Jordan canonical form A′ is {X₁, X₂, X₃, Y₁} and the
general solution of the original system of differential equations is
$$c_1 e^{3x}\begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix} + c_2 e^{3x}\left( x\begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ -1 \\ 0 \\ 0 \end{bmatrix} \right) + c_3 e^{3x}\left( \frac{x^2}{2}\begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix} + x\begin{bmatrix} 0 \\ -1 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \end{bmatrix} \right) + c_4 e^{5x}\begin{bmatrix} 1 \\ 5 \\ 4 \\ 4 \end{bmatrix},$$
that is,

y₁(x) = c₁e^{3x} + c₂xe^{3x} + c₃(x²/2)e^{3x} + c₄e^{5x},
y₂(x) = −c₁e^{3x} − c₂(xe^{3x} + e^{3x}) − c₃((x²/2)e^{3x} + xe^{3x}) + 5c₄e^{5x},
y₃(x) = c₃e^{3x} + 4c₄e^{5x},
y₄(x) = −c₃e^{3x} + 4c₄e^{5x}.
Imposing, for instance, the initial conditions y₁(0) = 1, y₂(0) = 0, y₃(0) = 0, y₄(0) = 1, we must solve

1 = y₁(0) = c₁ + c₄,  0 = y₂(0) = −c₁ − c₂ + 5c₄,  0 = y₃(0) = c₃ + 4c₄,  1 = y₄(0) = −c₃ + 4c₄,

obtaining c₁ = 7/8, c₂ = −1/4, c₃ = −1/2, c₄ = 1/8. Hence
y₁(x) = (7/8)e^{3x} − (1/4)xe^{3x} − (1/2)(x²/2)e^{3x} + (1/8)e^{5x},
y₂(x) = −(7/8)e^{3x} + (1/4)(xe^{3x} + e^{3x}) + (1/2)((x²/2)e^{3x} + xe^{3x}) + (5/8)e^{5x},
y₃(x) = −(1/2)e^{3x} + (1/2)e^{5x},
y₄(x) = (1/2)e^{3x} + (1/2)e^{5x}.
As a further example, consider a system whose coefficient matrix has only one eigenvalue λ = 2, with algebraic multiplicity equal to 4, but geometric
multiplicity equal to 1. Thus, the matrix has the following Jordan canonical form:

$$A' = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix}.$$
To arrive at the general solution of the system, we now find the eigenvectors and
generalized eigenvectors corresponding to λ and having exponents 1, 2, 3, 4.
Starting with (A − 2I ), we must solve the system
$$\begin{bmatrix} 1 & 2 & -1 & -1 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & -1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
and

$$X_1 = (A-2I)X_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}.$$
Therefore, the Jordan basis with respect to which we have the Jordan canonical form
A′ is {X₁, X₂, X₃, X₄} and the general solution of the original system of differential
equations is
$$\begin{aligned} \mathbf{y}(x) = {} & c_1 e^{2x}\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + c_2 e^{2x}\left( x\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ -1 \\ 0 \end{bmatrix} \right) + c_3 e^{2x}\left( \frac{x^2}{2}\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + x\begin{bmatrix} 0 \\ 0 \\ -1 \\ 0 \end{bmatrix} + \begin{bmatrix} -1 \\ 1 \\ 1 \\ 0 \end{bmatrix} \right) \\ & + c_4 e^{2x}\left( \frac{x^3}{6}\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + \frac{x^2}{2}\begin{bmatrix} 0 \\ 0 \\ -1 \\ 0 \end{bmatrix} + x\begin{bmatrix} -1 \\ 1 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right), \end{aligned}$$
that is,

y₁(x) = (c₁ + c₂x + c₃(x²/2 − 1) + c₄(x³/6 − x)) e^{2x},
y₂(x) = (c₃ + c₄x) e^{2x},
y₃(x) = (c₁ + c₂(x − 1) + c₃(x²/2 − x + 1) + c₄(x³/6 − x²/2 + x)) e^{2x},
y₄(x) = c₄ e^{2x}.
Consider now a system whose coefficient matrix has only one eigenvalue λ = 2, with algebraic multiplicity equal to 3, but geometric
multiplicity equal to 2. Thus, the matrix has the following Jordan canonical form:

$$A' = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.$$
To arrive at the general solution of the system, we firstly find the generalized eigenvectors corresponding to λ and generating the Jordan block of order 2. Starting with (A − 2I),
For the chain of order 1, we can easily choose Y₁ = (1, 0, 0) ∈ N_{1,λ}. Therefore, the Jordan basis with respect to which we have the Jordan canonical form A′ is {X₁, X₂, Y₁}
and the general solution of the original system of differential equations is

$$\mathbf{y}(x) = c_1 e^{2x}\begin{bmatrix} -8 \\ 4 \\ 2 \end{bmatrix} + c_2 e^{2x}\left( x\begin{bmatrix} -8 \\ 4 \\ 2 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right) + c_3 e^{2x}\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},$$
that is,

y₁(x) = (−8c₁ − 8c₂x + c₃) e^{2x},
y₂(x) = (4c₁ + 4c₂x) e^{2x},
y₃(x) = (2c₁ + c₂(2x + 1)) e^{2x}.
y′₁ = −y₃ − y₄,
y′₂ = −y₄,
y′₃ = y₁ + y₂,
y′₄ = y₂.
has two complex conjugate eigenvalues λ = ±i, each of which has algebraic mul-
tiplicity equal to 2, but geometric multiplicity equal to 1. Thus, the matrix has the
following Jordan canonical form
$$A' = \begin{bmatrix} i & 1 & 0 & 0 \\ 0 & i & 0 & 0 \\ 0 & 0 & -i & 1 \\ 0 & 0 & 0 & -i \end{bmatrix}.$$
that is,

$$\begin{aligned} & c_1(\cos x - i\sin x)\begin{bmatrix} -1 \\ 0 \\ -i \\ 0 \end{bmatrix} + c_2(\cos x - i\sin x)\left( x\begin{bmatrix} -1 \\ 0 \\ -i \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ -i \\ 0 \\ 1 \end{bmatrix} \right) \\ & + c_3(\cos x + i\sin x)\begin{bmatrix} -1 \\ 0 \\ i \\ 0 \end{bmatrix} + c_4(\cos x + i\sin x)\left( x\begin{bmatrix} -1 \\ 0 \\ i \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ i \\ 0 \\ 1 \end{bmatrix} \right). \end{aligned}$$
Since the coefficient matrix is real, together with any complex solution s(x) its
conjugate s̄(x) is also a solution, and Re s(x) = (s(x) + s̄(x))/2, Im s(x) = (s(x) − s̄(x))/(2i).
Therefore, the real and imaginary parts of a complex solution are real solutions of the
system, because they are linear combinations of solutions. The arguments that were put
forward are all we need for obtaining the real solutions of systems with complex
eigenvalues.
Example 12.22 Let’s get back to Example 12.15. We now would like to find the
general real solution of the system
The coefficient matrix of the system has one real eigenvalue $\lambda = -1$ and two complex conjugate eigenvalues $\mu = 4 + 3i$, $\bar{\mu} = 4 - 3i$. It is diagonalizable.
The eigenspace corresponding to λ = −1 is generated by the eigenvector X 1 =
[1, 0, 1]t . The particular solution associated with λ = −1 is then
$$s_1(x) = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix} e^{-x}.$$
To find the part of the general real solution associated with the pair of complex conjugate eigenvalues, it is sufficient to take only the real and imaginary parts of the eigenvector corresponding to one of them. For example, the complex eigenvector corresponding to $\mu = 4 + 3i$ is $X_2 = [i, 1+i, 1]^t = [0, 1, 1]^t + i\,[1, 1, 0]^t$. Thus, the real and imaginary parts of $X_2$ are, respectively,
$$\operatorname{Re} X_2 = [0, 1, 1]^t, \qquad \operatorname{Im} X_2 = [1, 1, 0]^t,$$
that is,
$$
s_2(x) = e^{4x}(\cos 3x + i \sin 3x)\left( \begin{bmatrix} 0\\ 1\\ 1 \end{bmatrix} + i \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix} \right)
= e^{4x} \begin{bmatrix} -\sin 3x\\ \cos 3x - \sin 3x\\ \cos 3x \end{bmatrix}
+ i\, e^{4x} \begin{bmatrix} \cos 3x\\ \cos 3x + \sin 3x\\ \sin 3x \end{bmatrix}.
$$
Hence, a linear combination of $s_1(x)$, $\operatorname{Re} s_2(x)$ and $\operatorname{Im} s_2(x)$ gives us the general real solution of the system, that is,
$$
y(x) = c_1 e^{-x} \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}
+ c_2 e^{4x} \begin{bmatrix} -\sin 3x\\ \cos 3x - \sin 3x\\ \cos 3x \end{bmatrix}
+ c_3 e^{4x} \begin{bmatrix} \cos 3x\\ \cos 3x + \sin 3x\\ \sin 3x \end{bmatrix}.
$$
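Since the coefficient matrix of Example 12.15 is not restated in this excerpt, a quick way to check the computation is to rebuild it from its eigendata. The sketch below (an illustration, not the book's text) forms $A = PDP^{-1}$ from the eigenpairs quoted above and verifies that $\operatorname{Re} s_2(x)$ really solves $y' = Ay$:

```python
# Sketch: rebuild the (real) coefficient matrix from the eigenvalues and
# eigenvectors quoted above, then verify one of the displayed real solutions.
from sympy import Matrix, I, diag, symbols, exp, cos, sin, simplify

P = Matrix([[1,     I,    -I],
            [0, 1 + I, 1 - I],
            [1,     1,     1]])          # columns: X1, X2, conj(X2)
D = diag(-1, 4 + 3*I, 4 - 3*I)
A = simplify(P * D * P.inv())            # comes out real, as it must
print(A)

x = symbols('x')
re_s2 = exp(4*x) * Matrix([-sin(3*x), cos(3*x) - sin(3*x), cos(3*x)])
print(simplify(re_s2.diff(x) - A*re_s2))   # the zero vector: a genuine real solution
```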
Example 12.23 In conclusion, let us return to Example 12.21 and find the general
real solution of the system
$$y_1' = -y_3 - y_4,$$
$$y_2' = -y_4,$$
$$y_3' = y_1 + y_2,$$
$$y_4' = y_2.$$
The coefficient matrix of the system has two complex conjugate eigenvalues λ =
±i, each of which has algebraic multiplicity equal to 2, but geometric multiplicity
equal to 1. As pointed out above, to find the general real solution associated with
the pair of complex conjugate eigenvalues, we take only the real and imaginary parts of the generalized eigenvectors corresponding to one of them; more precisely, we choose $\lambda = -i$. The chain of generalized eigenvectors associated with $\lambda = -i$ is $X_1 = [-1, 0, -i, 0]^t$ and $X_2 = [0, -i, 0, 1]^t$. Therefore, the particular solution corresponding to $\lambda = -i$ is
$$
s(x) = e^{-ix}\begin{bmatrix} -1\\ 0\\ -i\\ 0 \end{bmatrix}
+ e^{-ix}\left( x\begin{bmatrix} -1\\ 0\\ -i\\ 0 \end{bmatrix} + \begin{bmatrix} 0\\ -i\\ 0\\ 1 \end{bmatrix} \right),
$$
that is,
$$
s(x) = (\cos x - i \sin x)\begin{bmatrix} -1\\ 0\\ -i\\ 0 \end{bmatrix}
+ (\cos x - i \sin x)\left( x\begin{bmatrix} -1\\ 0\\ -i\\ 0 \end{bmatrix} + \begin{bmatrix} 0\\ -i\\ 0\\ 1 \end{bmatrix} \right)
$$
$$
= \begin{bmatrix} -\cos x - x\cos x\\ -\sin x\\ -\sin x - x\sin x\\ \cos x \end{bmatrix}
+ i \begin{bmatrix} \sin x + x\sin x\\ -\cos x\\ -\cos x - x\cos x\\ -\sin x \end{bmatrix}.
$$
The linear combination of the real and imaginary parts of $s(x)$ is then the real solution
$$
y(x) = c_1 \begin{bmatrix} -\cos x - x\cos x\\ -\sin x\\ -\sin x - x\sin x\\ \cos x \end{bmatrix}
+ c_2 \begin{bmatrix} \sin x + x\sin x\\ -\cos x\\ -\cos x - x\cos x\\ -\sin x \end{bmatrix}.
$$
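For this example the system is stated explicitly, so the claim can be verified directly; the following sketch (not part of the original text) checks the eigenvalue multiplicities and that both displayed real vector functions solve the system:

```python
# Sketch: verify Example 12.23 with the coefficient matrix read off the system.
from sympy import Matrix, symbols, cos, sin, simplify

A = Matrix([[0, 0, -1, -1],
            [0, 0,  0, -1],
            [1, 1,  0,  0],
            [0, 1,  0,  0]])
print(A.eigenvals())     # {I: 2, -I: 2}: algebraic multiplicity 2 for each of +/- i

x = symbols('x')
u = Matrix([-cos(x) - x*cos(x), -sin(x), -sin(x) - x*sin(x),  cos(x)])
v = Matrix([ sin(x) + x*sin(x), -cos(x), -cos(x) - x*cos(x), -sin(x)])
for s in (u, v):
    print(simplify(s.diff(x) - A*s))   # the zero vector for both real solutions
```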
We now consider the linear, homogeneous differential equation with constant coefficients of order $n$
$$y^{(n)}(x) + \sum_{i=1}^{n} a_i\, y^{(n-i)}(x) = 0, \qquad (12.34)$$
where $a_1, \dots, a_n$ are fixed real constants.
Any function z(x) defined in an interval I ⊆ R is a solution of (12.34) if
(1) z(x) is at least n-times differentiable in I ;
(2) $z^{(n)}(x) + \sum_{i=1}^{n} a_i\, z^{(n-i)}(x) = 0$, for any $x \in I$.
Let us introduce the operator $L$ defined by
$$L y = y^{(n)} + \sum_{i=1}^{n} a_i\, y^{(n-i)} \qquad (12.35)$$
for any function $y = y(x)$ at least $n$-times differentiable over $I$. Thus, $L$ defines a map $C^n(I) \to C^0(I)$ such that $Ly$ is a continuous function, for any complex-valued function $y = y(x)$ at least $n$-times differentiable over $I$. We trivially note that, for any $c_1, c_2 \in \mathbb{C}$ and $y_1, y_2 \in C^n(I)$,
$$L(c_1 y_1 + c_2 y_2) = c_1 L y_1 + c_2 L y_2,$$
that is, L is a linear operator. In particular, this means that if y1 , . . . , yk are k solutions
of (12.34), then any linear combination of them is a solution of (12.34). Therefore,
the set
$$V = \{\, y \in C^n(I) : L y = 0 \,\}$$
and assume that functions y1 (x), . . . , yn (x) are solutions of the problems (P1 ), . . . ,
$(P_n)$, respectively. Thus, $y = c_1 y_1 + \dots + c_n y_n$ ($c_i \in \mathbb{R}$) is a solution of the initial value problem. Now, for an arbitrary solution $w(x) \in V$, set
$$y(x) = w(x_0)\, y_1(x) + w'(x_0)\, y_2(x) + \dots + w^{(n-1)}(x_0)\, y_n(x),$$
where $x_0 \in I$ is precisely the previously fixed point in the initial value problems $(P_1), \dots, (P_n)$. By the definition of $y_1, \dots, y_n$, we observe that $L y = 0$ and also
Therefore, y and w are solutions of the same initial value problem and this implies that
$y(x) = w(x)$, for any $x \in I$. Therefore, the arbitrary solution $w(x) \in V$ is a linear combination of the linearly independent solutions $y_1, \dots, y_n$. The arbitrariness of $w$ allows us to conclude that $\{y_1, \dots, y_n\}$ is a set of linearly independent generators for the vector space $V$, that is, $\dim_{\mathbb{R}} V = n$.
At this point, before proceeding with the determination of methods to obtain the solutions of (12.34), we recall how to determine whether a set of known solutions is linearly independent or dependent. To do this, we suppose $\{y_1, \dots, y_n\}$ is a set of solutions of (12.34).
Definition 12.25 The Wronskian of {y1 , . . . , yn } is defined on the interval I to be
the determinant
$$W(x) = \begin{vmatrix}
y_1(x) & y_2(x) & \cdots & y_n(x)\\
y_1'(x) & y_2'(x) & \cdots & y_n'(x)\\
y_1''(x) & y_2''(x) & \cdots & y_n''(x)\\
\vdots & \vdots & & \vdots\\
y_1^{(n-1)}(x) & y_2^{(n-1)}(x) & \cdots & y_n^{(n-1)}(x)
\end{vmatrix}.$$
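The Wronskian is straightforward to evaluate symbolically. As a small illustration (not part of the original text), here it is for three hypothetical solutions $e^{2x}$, $x e^{2x}$, $e^{-x}$ of a constant-coefficient equation:

```python
# Sketch: the Wronskian of Definition 12.25 for three sample solutions.
from sympy import symbols, exp, wronskian, simplify

x = symbols('x')
W = wronskian([exp(2*x), x*exp(2*x), exp(-x)], x)
print(simplify(W))   # 9*exp(3*x); never zero, so the three functions are independent
```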
$$y_1' = y_2, \qquad y_2' = y_3, \qquad \dots$$
Hence, if the vector-valued function f(x) = [y1 (x), . . . , yn (x)]T is the solution of
system (12.37), the solution of the original differential equation (12.34) is precisely
the first component y1 (x) of f(x).
The above matrix A is usually called the Frobenius companion matrix of the monic
polynomial
$$p(x) = x^n + a_1 x^{n-1} + a_2 x^{n-2} + \dots + a_{n-1} x + a_n = x^n + \sum_{i=1}^{n} a_i\, x^{n-i}.$$
Note that, if $n$ is the degree of a monic polynomial, its companion matrix has order $n$. For example, the $4 \times 4$ companion matrix of $x^4 + 2x^3 - 3x^2 + x + 1$ is
$$\begin{bmatrix} 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ -1 & -1 & 3 & -2 \end{bmatrix}.$$
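The construction of a companion matrix is easy to automate. The sketch below (an illustration, not the book's text) builds the last-row form used here and confirms that its eigenvalues are exactly the roots of the polynomial:

```python
# Sketch: the book's (last-row) companion matrix, with an eigenvalue check in NumPy.
import numpy as np

def companion(coeffs):
    """coeffs = [a1, ..., an] of the monic p(x) = x^n + a1*x^(n-1) + ... + an."""
    n = len(coeffs)
    C = np.zeros((n, n))
    C[:-1, 1:] = np.eye(n - 1)            # superdiagonal of ones
    C[-1, :] = -np.asarray(coeffs[::-1])  # last row: -an, -a(n-1), ..., -a1
    return C

C = companion([2, -3, 1, 1])              # p(x) = x^4 + 2x^3 - 3x^2 + x + 1
print(C[-1])                              # [-1. -1.  3. -2.], as displayed above
print(np.sort_complex(np.linalg.eigvals(C)))
print(np.sort_complex(np.roots([1, 2, -3, 1, 1])))   # the same multiset of roots
```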
The above discussion clearly shows that it is important, from an analytic viewpoint, to describe some properties of companion matrices. In particular, we focus our attention on computing the eigenvalues and eigenvectors of a companion matrix. To do this, we first prove some easy and well-known results.
Let $A$ be the companion matrix of the monic polynomial
$$p(x) = x^n + a_1 x^{n-1} + a_2 x^{n-2} + \dots + a_{n-1} x + a_n = x^n + \sum_{i=1}^{n} a_i\, x^{n-i}.$$
Then the characteristic polynomial of $A$ is $(-1)^n p(x)$. Moreover, the minimal polynomial of $A$ coincides with the characteristic one.
Proof We prove the first part of the theorem by induction on the order $n$ of the matrix (the degree of $p(x)$).
For $n = 1$, $p(x) = x + a$ and $A = [-a]$. Hence, the characteristic polynomial of $A$ is $-a - \lambda = (-1)^1(\lambda + a) = (-1)^1 p(\lambda)$.
Suppose the assertion is true for any (n − 1) × (n − 1) companion matrix (with
n ≥ 2). Since A has order n, its characteristic polynomial is equal to the determinant
$$|A - \lambda I| = \begin{vmatrix}
-\lambda & 1 & 0 & \dots & 0\\
0 & -\lambda & 1 & \dots & 0\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
0 & 0 & \dots & -\lambda & 1\\
-a_n & -a_{n-1} & \dots & \dots & -a_1 - \lambda
\end{vmatrix}.$$
We compute |A − λI | by using the cofactor expansion with respect to the first col-
umn:
$$|A - \lambda I| = (-\lambda)\begin{vmatrix}
-\lambda & 1 & 0 & \dots & 0\\
0 & -\lambda & 1 & \dots & 0\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
0 & 0 & \dots & -\lambda & 1\\
-a_{n-1} & -a_{n-2} & \dots & \dots & -a_1 - \lambda
\end{vmatrix}
+ (-1)^{n+1}(-a_n)\begin{vmatrix}
1 & 0 & 0 & \dots & 0\\
-\lambda & 1 & 0 & \dots & 0\\
\vdots & \ddots & \ddots & \ddots & \vdots\\
0 & \dots & -\lambda & 1 & 0\\
0 & 0 & \dots & -\lambda & 1
\end{vmatrix}.$$
Notice that the first determinant is equal to the characteristic polynomial of the companion matrix of the polynomial
$$x^{n-1} + a_1 x^{n-2} + \dots + a_{n-2} x + a_{n-1},$$
which, by the induction hypothesis, equals $(-1)^{n-1}\left( \lambda^{n-1} + a_1 \lambda^{n-2} + \dots + a_{n-2}\lambda + a_{n-1} \right)$. Moreover, the second determinant is trivially equal to $1$, so the second summand equals $(-1)^{n+1}(-a_n) = (-1)^n a_n$. Hence,
$$|A - \lambda I| = (-1)^{n-1}(-\lambda)\left( \lambda^{n-1} + a_1 \lambda^{n-2} + \dots + a_{n-2}\lambda + a_{n-1} \right) + (-1)^n a_n
= (-1)^n \left( \lambda^n + a_1 \lambda^{n-1} + \dots + a_{n-2}\lambda^2 + a_{n-1}\lambda + a_n \right),$$
as desired.
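Both claims of the theorem are easy to test on a small instance. The sketch below (not part of the original text) uses the companion matrix of the hypothetical polynomial $p(x) = x^3 + 6x^2 + 11x + 6$:

```python
# Sketch: check det(A - lam*I) = (-1)^n p(lam) and the minimal-polynomial claim.
from sympy import Matrix, symbols, eye, expand, zeros

lam = symbols('lambda')
A = Matrix([[ 0,   1,  0],
            [ 0,   0,  1],
            [-6, -11, -6]])              # companion of p(x) = x^3 + 6x^2 + 11x + 6
p = lam**3 + 6*lam**2 + 11*lam + 6

print(expand((A - lam*eye(3)).det() - (-1)**3 * p))       # 0
print((A**3 + 6*A**2 + 11*A + 6*eye(3)) == zeros(3, 3))   # True: p(A) = 0

# No polynomial of degree < 3 annihilates A: I, A, A^2 are linearly independent.
M = Matrix.hstack(eye(3).reshape(9, 1), A.reshape(9, 1), (A**2).reshape(9, 1))
print(M.rank())   # 3, so the minimal polynomial has degree 3 and equals p
```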
Let now λ0 be any eigenvalue of A and consider the matrix A − λ0 I . Since its
determinant must be zero, its rank is less than or equal to n − 1. On the other hand,
writing
$$A - \lambda_0 I = \begin{bmatrix}
-\lambda_0 & 1 & 0 & \dots & 0\\
0 & -\lambda_0 & 1 & \dots & 0\\
\vdots & \vdots & \ddots & \ddots & \vdots\\
0 & 0 & \dots & -\lambda_0 & 1\\
-a_n & -a_{n-1} & \dots & \dots & -a_1 - \lambda_0
\end{bmatrix}$$
and deleting the first column and the last row, we obtain the $(n-1) \times (n-1)$ lower triangular submatrix
$$\begin{bmatrix}
1 & 0 & \dots & 0\\
-\lambda_0 & 1 & \dots & 0\\
\vdots & \ddots & \ddots & \vdots\\
0 & \dots & -\lambda_0 & 1
\end{bmatrix}$$
$$p(x) = x^n + a_1 x^{n-1} + a_2 x^{n-2} + \dots + a_{n-1} x + a_n = x^n + \sum_{i=1}^{n} a_i\, x^{n-i}.$$
$$\begin{aligned}
-\lambda x_1 + x_2 &= 0\\
-\lambda x_2 + x_3 &= 0\\
&\ \ \vdots\\
-\lambda x_{n-1} + x_n &= 0.
\end{aligned} \qquad (12.39)$$
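The relations in (12.39) force $x_{k+1} = \lambda x_k$ for every $k$, so each eigenvalue $\lambda$ of a companion matrix has, up to scalars, the single eigenvector $(1, \lambda, \lambda^2, \dots, \lambda^{n-1})$; in particular, its geometric multiplicity is always 1. A quick numerical check of this fact (not part of the original text):

```python
# Sketch: every eigenvalue lam of a companion matrix has eigenvector (1, lam, ..., lam^(n-1)).
import numpy as np

C = np.array([[ 0.,  1., 0., 0.],
              [ 0.,  0., 1., 0.],
              [ 0.,  0., 0., 1.],
              [-1., -1., 3., -2.]])      # companion of x^4 + 2x^3 - 3x^2 + x + 1
for lam in np.linalg.eigvals(C):
    v = lam ** np.arange(4)              # (1, lam, lam^2, lam^3)
    print(np.allclose(C @ v, lam * v))   # True for every eigenvalue
```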
We firstly set $y_1 = y$, $y_2 = y'$, $y_3 = y''$, so that the equation is equivalent to the first-order system
$$y_1' = y_2, \qquad y_2' = y_3, \qquad y_3' = -4y_1 + 3y_3. \qquad (12.40)$$
The matrix A has two distinct real eigenvalues: λ1 = −1 having algebraic multiplicity
equal to 1; λ2 = 2 having algebraic multiplicity equal to 2.
The eigenvector generating the null space of A + I is X 1 = (1, −1, 1) (that is, the
eigenvector associated with λ1 = −1).
Notice that, since $\lambda_2 = 2$ has geometric multiplicity equal to 1, there exists one Jordan block of order 2 generated by the generalized eigenvectors of $\lambda_2$. By easy computation, we get $N_{1,\lambda_2} = \langle (1, 2, 4) \rangle$ and $N_{2,\lambda_2} = \langle (1, 0, -4), (0, 1, 4) \rangle$. The corresponding chain of generalized eigenvectors of order 2 is $X_1 = (1, 2, 4)$, $X_2 = (0, 1, 4)$.
Hence, the Jordan canonical form of A is
$$A = \begin{bmatrix} 2 & 1 & 0\\ 0 & 2 & 0\\ 0 & 0 & -1 \end{bmatrix}$$
relative to the Jordan basis {(1, 2, 4), (0, 1, 4), (1, −1, 1)}. Thus, a general solution
for the system (12.40) is
$$
c_1 e^{2x}\begin{bmatrix} 1\\ 2\\ 4 \end{bmatrix}
+ c_2 e^{2x}\left( x\begin{bmatrix} 1\\ 2\\ 4 \end{bmatrix} + \begin{bmatrix} 0\\ 1\\ 4 \end{bmatrix} \right)
+ c_3 e^{-x}\begin{bmatrix} 1\\ -1\\ 1 \end{bmatrix},
$$
that is,
$$y_1(x) = \left( c_1 + c_2 x \right) e^{2x} + c_3 e^{-x},$$
$$y_2(x) = \left( 2c_1 + c_2(2x + 1) \right) e^{2x} - c_3 e^{-x},$$
$$y_3(x) = \left( 4c_1 + c_2(4x + 4) \right) e^{2x} + c_3 e^{-x}.$$
In particular,
$$y(x) = y_1(x) = \left( c_1 + c_2 x \right) e^{2x} + c_3 e^{-x}$$
and the initial conditions give
$$1 = y(0) = c_1 + c_3,\qquad 0 = y'(0) = 2c_1 + c_2 - c_3,\qquad -1 = y''(0) = 4c_1 + 4c_2 + c_3.$$
Solving the above linear system, one has $c_1 = \frac{2}{3}$, $c_2 = -1$, $c_3 = \frac{1}{3}$. So the solution satisfying the initial conditions is
$$y(x) = \left( \frac{2}{3} - x \right) e^{2x} + \frac{1}{3}\, e^{-x}.$$
Since $y^{(j)}(x) = \lambda^j e^{\lambda x}$ for any $j \ge 0$, it follows that $y(x)$ is a solution of (12.34) if and only if
$$e^{\lambda x}\left\{ \lambda^n + \sum_{i=1}^{n} a_i \lambda^{n-i} \right\} = 0 \quad \forall x \in I,$$
that is,
$$\lambda^n + \sum_{i=1}^{n} a_i \lambda^{n-i} = 0,$$
since the exponential is never zero. We then observe that the complex exponential $e^{\lambda x}$ can be a solution of the given differential equation. More precisely, it is a solution of (12.34) if and only if $\lambda$ is a root of the polynomial
$$p(t) = t^n + \sum_{i=1}^{n} a_i\, t^{n-i}.$$
$$p(t) = (t - \lambda_1)^{m_1}(t - \lambda_2)^{m_2} \cdots (t - \lambda_k)^{m_k},$$
$$L = D^n + \sum_{i=1}^{n} a_i D^{n-i} = p(D), \qquad (12.41)$$
where $p(t)$ is precisely the characteristic polynomial associated with the differential equation and $D^k = \left( \frac{d}{dx} \right)^k$ $(k = 1, \dots, n)$, denoting by $\frac{d}{dx}$ the differentiation operator.
Recalling that Eq. (12.34) can also be written as $Ly = 0$, we see that any solution $y(x)$ of (12.34) is actually an element of the null space of $L$, that is, $y(x) \in \operatorname{Ker}(L)$.
Remark 12.29 Suppose there exist linear differential operators $L_1, \dots, L_k$ with constant coefficients mapping $C^n(I) \to C^0(I)$, such that $L = L_1 L_2 \cdots L_k$. Then $\operatorname{Ker}(L_i) \subseteq \operatorname{Ker}(L)$, for any $i = 1, \dots, k$.
The idea is now to discuss the characteristic polynomial associated with the differential equation, in order to obtain the linearly independent functions generating the null space of $L$, that is, a basis for $\operatorname{Ker}(L)$. The roots of $p(t)$ fall into three possible cases:
Case (1): The roots of p(t) are all real and distinct.
In this case, the characteristic polynomial factors as
$$p(t) = (t - \lambda_1)(t - \lambda_2) \cdots (t - \lambda_n),$$
where $\lambda_1, \dots, \lambda_n$ are the distinct roots of $p(t)$, and the differential operator $L$ factors similarly as
$$L = (D - \lambda_1)(D - \lambda_2) \cdots (D - \lambda_n),$$
where
$$(D - \lambda_i)y = y' - \lambda_i y \quad \forall i = 1, \dots, n. \qquad (12.42)$$
Hence, $\{y_1, \dots, y_n\}$ is a linearly independent set, that is, a basis for the vector space of solutions. The general solution can be written as
$$y(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x} + \dots + c_n e^{\lambda_n x}.$$
$$y''' - \frac{9}{2}\, y'' + 5y' - \frac{3}{2}\, y = 0.$$
The characteristic polynomial associated with the equation is
$$p(t) = t^3 - \frac{9}{2}\, t^2 + 5t - \frac{3}{2}.$$
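The roots can be found quickly; the sketch below (not part of the original text) confirms they are $1$, $3$ and $\frac{1}{2}$, all simple, so Case (1) applies:

```python
# Sketch: roots of the characteristic polynomial above.
from sympy import symbols, Rational, roots

t = symbols('t')
p = t**3 - Rational(9, 2)*t**2 + 5*t - Rational(3, 2)
print(roots(p, t))   # {1: 1, 3: 1, 1/2: 1}: three distinct real roots
# hence the general solution is y(x) = c1*exp(x) + c2*exp(3*x) + c3*exp(x/2)
```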
Case (2): The roots of p(t) are all real but not all distinct.
Here we assume
$$p(t) = (t - \lambda_1)^{m_1}(t - \lambda_2)^{m_2} \cdots (t - \lambda_k)^{m_k},$$
where $\lambda_1, \dots, \lambda_k$ are the distinct roots of $p(t)$ and $m_i$ is the algebraic multiplicity of $\lambda_i$, for $i = 1, \dots, k$. In parallel, we can factor $L$ as
$$L = (D - \lambda_1)^{m_1}(D - \lambda_2)^{m_2} \cdots (D - \lambda_k)^{m_k}.$$
To fully describe the present case, we first need some preliminary results. More precisely,
Hence,
$$\cdots = 0 \quad \forall x \in I,$$
proving that
$$e^{\lambda x},\ x e^{\lambda x},\ \dots,\ x^{m-1} e^{\lambda x} \in \operatorname{Ker}(D - \lambda)^m.$$
To complete the proof, we then prove that those functions are linearly independent. Let $c_1, \dots, c_m \in \mathbb{R}$ be such that
$$c_1 e^{\lambda x} + c_2 x e^{\lambda x} + \dots + c_m x^{m-1} e^{\lambda x} = 0 \quad \forall x \in I, \qquad (12.43)$$
and, by contradiction, assume there exists at least one index $i \in \{1, \dots, m\}$ such that $c_i \neq 0$. Since $e^{\lambda x}$ is never zero, (12.43) says that
$$c_1 + c_2 x + \dots + c_m x^{m-1} = 0.$$
which cannot occur, due to the fact that $\lambda_1 - \lambda_2 \neq 0$ and both $f_1(x)$ and $f_2(x)$ are not identically zero. Hence, we may assert that $c_2 = 0$ and, by (12.45), $c_1 = 0$ follows trivially.
Suppose now that the result is true for the $k - 1$ functions
$$f_1(x)e^{\lambda_1 x},\ \dots,\ f_{k-1}(x)e^{\lambda_{k-1} x}.$$
Our final aim is to show that it holds again for the $k$ functions
$$f_1(x)e^{\lambda_1 x},\ \dots,\ f_k(x)e^{\lambda_k x}.$$
Let $c_1, \dots, c_k \in \mathbb{R}$ be such that
$$\sum_{i=1}^{k} c_i f_i(x)\, e^{\lambda_i x} = 0 \quad \forall x \in \mathbb{R}. \qquad (12.46)$$
Multiplying by $e^{-\lambda_k x}$ and rearranging, we obtain
$$\sum_{i=1}^{k-1} c_i f_i(x)\, e^{(\lambda_i - \lambda_k) x} = -c_k f_k(x) \quad \forall x \in \mathbb{R}. \qquad (12.47)$$
which is a contradiction, since $\lambda_i - \lambda_k \neq 0$ and $f_i(x)$ is not identically zero, for any $i = 1, \dots, k-1$.
Thus, $c_k = 0$ and relation (12.47) reduces to
$$\sum_{i=1}^{k-1} c_i f_i(x)\, e^{\mu_i x} = 0 \quad \forall x \in \mathbb{R}, \qquad (12.48)$$
where
$$\mu_1 = \lambda_1 - \lambda_k,\quad \mu_2 = \lambda_2 - \lambda_k,\quad \dots,\quad \mu_{k-1} = \lambda_{k-1} - \lambda_k$$
$$p(t) = (t - \lambda_1)^{m_1}(t - \lambda_2)^{m_2} \cdots (t - \lambda_k)^{m_k}$$
$$\sum_{i=1}^{k} \sum_{s=1}^{m_i} c_{is}\, x^{s-1} e^{\lambda_i x} = 0 \quad \forall x \in \mathbb{R}. \qquad (12.50)$$
For any $i \in \{1, \dots, k\}$, here we denote $f_i(x) = \sum_{s=1}^{m_i} c_{is}\, x^{s-1}$.
If every function $f_i(x)$ is identically zero in $\mathbb{R}$, it follows trivially that $c_{is} = 0$, for any $i \in \{1, \dots, k\}$ and for any $s \in \{1, \dots, m_i\}$.
Otherwise, we may assume that
• there exist some $i_1, \dots, i_h$ ($1 \le h \le k$) such that the polynomials $f_{i_1}, \dots, f_{i_h}$ are not identically zero;
• $f_r(x) = 0$ for any $x \in \mathbb{R}$, if $r \notin \{i_1, \dots, i_h\}$.
Hence, we reduce relation (12.50) to
$$\sum_{j=1}^{h} f_{i_j}(x)\, e^{\lambda_{i_j} x} = 0 \quad \forall x \in \mathbb{R}. \qquad (12.51)$$
But, in light of Proposition 12.33 and since $f_{i_1}, \dots, f_{i_h}$ are not identically zero, relation (12.51) represents a contradiction.
At this point, we are ready to describe the general solution of Eq. (12.34) in the case
its characteristic polynomial is
$$p(t) = (t - \lambda_1)^{m_1}(t - \lambda_2)^{m_2} \cdots (t - \lambda_k)^{m_k},$$
with $\lambda_1, \dots, \lambda_k$ distinct real roots: the general solution is
$$y(x) = \sum_{i=1}^{k} \sum_{s=1}^{m_i} c_{is}\, x^{s-1} e^{\lambda_i x}. \qquad (12.52)$$
Example 12.35 Repeat Example 12.28 and solve the differential equation
$$y''' - 3y'' + 4y = 0.$$
Its characteristic polynomial is
$$p(t) = t^3 - 3t^2 + 4,$$
which factors as $(t + 1)(t - 2)^2$.
$$p(t) = t^4 - 2t^3 - 3t^2 + 8t - 4,$$
having three distinct real roots: $\lambda_1 = 1$ of algebraic multiplicity equal to 2, and $\lambda_2 = 2$, $\lambda_3 = -2$, each of algebraic multiplicity equal to 1.
Thus, the general solution of the differential equation is
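The displayed general solution is omitted in this excerpt, but it can be recovered from the factorization of $p(t)$; a sketch, not part of the original text:

```python
# Sketch: factor the characteristic polynomial above and read off a solution basis.
from sympy import symbols, factor

t = symbols('t')
print(factor(t**4 - 2*t**3 - 3*t**2 + 8*t - 4))   # (t - 2)*(t + 2)*(t - 1)**2
# basis: exp(x), x*exp(x), exp(2*x), exp(-2*x), so
# y(x) = (c1 + c2*x)*exp(x) + c3*exp(2*x) + c4*exp(-2*x)
```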
$$p(t) = t^4 - 8t^3 + 18t^2 - 27$$
$$e^{\alpha x}(\cos \beta x + i \sin \beta x),\quad x e^{\alpha x}(\cos \beta x + i \sin \beta x),\quad \dots,\quad x^{m-1} e^{\alpha x}(\cos \beta x + i \sin \beta x),$$
$$e^{\alpha x}(\cos \beta x - i \sin \beta x),\quad x e^{\alpha x}(\cos \beta x - i \sin \beta x),\quad \dots,\quad x^{m-1} e^{\alpha x}(\cos \beta x - i \sin \beta x),$$
which represent bases for the null spaces $\operatorname{Ker}(D - \lambda)^m$ and $\operatorname{Ker}(D - \bar{\lambda})^m$, respectively. Exactly as seen in the previous cases, those functions give their contribution to the constitution of the whole basis for $\operatorname{Ker}(L)$.
$$e^{\alpha x}\cos \beta x,\quad x e^{\alpha x}\cos \beta x,\quad \dots,\quad x^{m-1} e^{\alpha x}\cos \beta x,$$
$$e^{\alpha x}\sin \beta x,\quad x e^{\alpha x}\sin \beta x,\quad \dots,\quad x^{m-1} e^{\alpha x}\sin \beta x.$$
Those functions contribute to the general real solution, together with any other solu-
tions corresponding to any real roots of p(t).
$$y^{(\mathrm{iv})} - 16y = 0.$$
Its characteristic polynomial is
$$p(t) = t^4 - 16,$$
having two distinct real roots $\lambda_1 = 2$ and $\lambda_2 = -2$, both of multiplicity equal to 1, and two complex conjugate roots $\lambda_3 = 2i$ and $\bar{\lambda}_3 = -2i$.
Thus, the general solution of the differential equation is
Example 12.39 Find the real solution of the following initial value problem:
$$y''' - y'' + 4y' - 4y = 0,$$
$$y(0) = 2,\qquad y'(0) = 1,\qquad y''(0) = 1.$$
Its characteristic polynomial is
$$p(t) = t^3 - t^2 + 4t - 4,$$
having one real root $\lambda_1 = 1$ of multiplicity equal to 1, and two complex conjugate roots $\lambda_2 = 2i$ and $\bar{\lambda}_2 = -2i$.
Thus, the general real solution of the differential equation is
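The displayed solution is again omitted in this excerpt; the following sketch (not part of the original text) recovers both the general real solution and the constants fixed by the initial data:

```python
# Sketch: solve the initial value problem of Example 12.39 directly.
from sympy import Function, Eq, dsolve, symbols

x = symbols('x')
y = Function('y')
ode = Eq(y(x).diff(x, 3) - y(x).diff(x, 2) + 4*y(x).diff(x) - 4*y(x), 0)
ics = {y(0): 2, y(x).diff(x).subs(x, 0): 1, y(x).diff(x, 2).subs(x, 0): 1}
print(dsolve(ode, y(x), ics=ics))
# y(x) = 9*exp(x)/5 + cos(2*x)/5 - 2*sin(2*x)/5
```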
Exercises
$$\begin{aligned} y_1' &= y_1,\\ y_2' &= -y_2 - 4y_3 + 2y_4,\\ y_3' &= 3y_2 + y_3 - 2y_4,\\ y_4' &= y_2 - 4y_3 + y_4. \end{aligned}$$

$$\begin{aligned} y_1' &= 2y_1 + y_2 - y_3 + y_4,\\ y_2' &= 2y_2 + y_3 + 2y_4,\\ y_3' &= 2y_3 + y_4,\\ y_4' &= 2y_4. \end{aligned}$$

$$\begin{aligned} y_1' &= y_1 + 2y_2 - y_3 - y_4,\\ y_2' &= -2y_1 + y_2 + y_3 + y_4,\\ y_3' &= y_3 + 2y_4,\\ y_4' &= -2y_3 + y_4. \end{aligned}$$

$$\begin{aligned} y_1' &= y_1 + 2y_2,\\ y_2' &= -2y_1 + y_2. \end{aligned}$$

$$\begin{aligned} y_1' &= y_2,\\ y_2' &= -3y_1 - 2y_2. \end{aligned}$$
$$\begin{aligned} y_1' &= y_1 + y_2 + y_3,\\ y_2' &= -y_1 + y_2 - y_4,\\ y_3' &= y_3 + y_4,\\ y_4' &= -y_3 + y_4. \end{aligned}$$

$$\begin{aligned} y_1' &= 3y_2 - y_4,\\ y_2' &= -3y_1 + y_3,\\ y_3' &= 2y_4,\\ y_4' &= -2y_3. \end{aligned}$$
$$y'' - 2y' + 2y = 0,$$
$$y(0) = \frac{1}{3},\qquad y'(0) = 1.$$

12. Solve the following initial value problem:
$$y'' - 2y' - 8y = 0,$$