RUSSIAN FEDERAL COMMITTEE

FOR HIGHER EDUCATION
BASHKIR STATE UNIVERSITY
SHARIPOV R. A.
COURSE OF LINEAR ALGEBRA
AND MULTIDIMENSIONAL GEOMETRY
The Textbook
Ufa 1996
MSC 97U20
PACS 01.30.Pp
UDC 512.64
Sharipov R. A. Course of Linear Algebra and Multidimensional Geometry: the textbook / Publ. of Bashkir State University — Ufa, 1996. — pp. 143.
ISBN 5-7477-0099-5.
This book is written as a textbook for the course of multidimensional geometry and linear algebra. At the Mathematical Department of Bashkir State University this course is taught to first-year students in the spring semester. It is a part of the basic mathematical education. Therefore, this course is taught at the Physical and Mathematical Departments of all universities of Russia.
In preparing the Russian edition of this book I used computer typesetting on the base of the AMS-TeX package, and I used the Cyrillic fonts of the Lh-family distributed by the CyrTUG association of Cyrillic TeX users. The English edition of this book is also typeset by means of the AMS-TeX package.
Referees: Computational Mathematics and Cybernetics group of Ufa State University for Aircraft and Technology (UGATU);
Prof. S. I. Pinchuk, Chelyabinsk State University for Technology (QGTU) and Indiana University.
Contacts to author.
Office: Mathematics Department, Bashkir State University, 32 Frunze street, 450074 Ufa, Russia
Phone: 7-(3472)-23-67-18
Fax: 7-(3472)-23-67-74
Home: 5 Rabochaya street, 450003 Ufa, Russia
Phone: 7-(917)-75-55-786
E-mails: R_Sharipov@ic.bashedu.ru, r-sharipov@mail.ru, ra_sharipov@lycos.com, ra_sharipov@hotmail.com
URL: http://www.geocities.com/r-sharipov
ISBN 5-7477-0099-5
© Sharipov R.A., 1996
© Bashkir State University, 1996
English translation © Sharipov R.A., 2004
CONTENTS.

PREFACE.

CHAPTER I. LINEAR VECTOR SPACES AND LINEAR MAPPINGS.
§ 1. The sets and mappings.
§ 2. Linear vector spaces.
§ 3. Linear dependence and linear independence.
§ 4. Spanning systems and bases.
§ 5. Coordinates. Transformation of the coordinates of a vector under a change of basis.
§ 6. Intersections and sums of subspaces.
§ 7. Cosets of a subspace. The concept of factorspace.
§ 8. Linear mappings.
§ 9. The matrix of a linear mapping.
§ 10. Algebraic operations with mappings. The space of homomorphisms Hom(V, W).

CHAPTER II. LINEAR OPERATORS.
§ 1. Linear operators. The algebra of endomorphisms End(V) and the group of automorphisms Aut(V).
§ 2. Projection operators.
§ 3. Invariant subspaces. Restriction and factorization of operators.
§ 4. Eigenvalues and eigenvectors.
§ 5. Nilpotent operators.
§ 6. Root subspaces. Two theorems on the sum of root subspaces.
§ 7. Jordan basis of a linear operator. Hamilton-Cayley theorem.

CHAPTER III. DUAL SPACE.
§ 1. Linear functionals. Vectors and covectors. Dual space.
§ 2. Transformation of the coordinates of a covector under a change of basis.
§ 3. Orthogonal complements in a dual space.
§ 4. Conjugate mapping.

CHAPTER IV. BILINEAR AND QUADRATIC FORMS.
§ 1. Symmetric bilinear forms and quadratic forms. Recovery formula.
§ 2. Orthogonal complements with respect to a quadratic form.
§ 3. Transformation of a quadratic form to its canonic form. Inertia indices and signature.
§ 4. Positive quadratic forms. Silvester's criterion.

CHAPTER V. EUCLIDEAN SPACES.
§ 1. The norm and the scalar product. The angle between vectors. Orthonormal bases.
§ 2. Quadratic forms in a Euclidean space. Diagonalization of a pair of quadratic forms.
§ 3. Selfadjoint operators. Theorem on the spectrum and the basis of eigenvectors for a selfadjoint operator.
§ 4. Isometries and orthogonal operators.

CHAPTER VI. AFFINE SPACES.
§ 1. Points and parallel translations. Affine spaces.
§ 2. Euclidean point spaces. Quadrics in a Euclidean space.

REFERENCES.
PREFACE.
There are two approaches to presenting linear algebra and multidimensional geometry. The first approach can be characterized as the «coordinates and matrices» approach. The second one is the «invariant geometric» approach.
In most textbooks the coordinates and matrices approach is used. It starts with considering systems of linear algebraic equations. Then the theory of determinants is developed, and the matrix algebra and the geometry of the space R^n are considered. This approach is convenient for an initial introduction to the subject since it is based on very simple concepts: numbers, sets of numbers, numeric matrices, linear functions, and linear equations. The proofs within this approach are conceptually simple and are mostly based on calculations. However, in the further presentation of the subject the coordinates and matrices approach is not so advantageous: computational proofs become huge, while the intention to consider only numeric objects prevents us from introducing and using new concepts.
The invariant geometric approach, which is used in this book, starts with the definition of an abstract linear vector space. Thereby the coordinate representation of vectors is not of crucial importance; the set-theoretic methods commonly used in modern algebra become more important. A linear vector space is the very object to which these methods apply in a most simple and effective way: proofs of many facts can be shortened and made more elegant.
The invariant geometric approach lets the reader get prepared for the study of more advanced branches of mathematics such as differential geometry, commutative algebra, algebraic geometry, and algebraic topology. I prefer a self-sufficient way of explanation. The reader is assumed to have only minimal preliminary knowledge of matrix algebra and of the theory of determinants. This material is usually given in courses of general algebra and analytic geometry.
Under the term «numeric field» in this book we understand one of the following three fields: the field of rational numbers Q, the field of real numbers R, or the field of complex numbers C. Therefore the reader need not know the general theory of numeric fields.
I am grateful to E. B. Rudenko for reading and correcting the manuscript of
Russian edition of this book.
May, 1996;
May, 2004. R. A. Sharipov.
CHAPTER I
LINEAR VECTOR SPACES AND LINEAR MAPPINGS.
§ 1. The sets and mappings.
The concept of a set is a basic concept of modern mathematics. It denotes any group of objects for some reason distinguished from other objects and grouped together. Objects constituting a given set are called the elements of this set. We usually assign some literal names (identifiers) to the sets and to their elements. Suppose the set A consists of three objects m, n, and q. Then we write

A = {m, n, q}.

The fact that m is an element of the set A is denoted by the membership sign: m ∈ A. The writing p ∉ A means that the object p is not an element of the set A.
If we have several sets, we can gather all of their elements into one set which
is called the union of initial sets. In order to denote this gathering operation we
use the union sign ∪. If we gather the elements each of which belongs to all of our
sets, they constitute a new set which is called the intersection of initial sets. In
order to denote this operation we use the intersection sign ∩.
If a set A is a part of another set B, we denote this fact as A ⊂ B or A ⊆ B
and say that the set A is a subset of the set B. Two signs ⊂ and ⊆ are equivalent.
However, using the sign ⊆, we emphasize that the condition A ⊂ B does not exclude the coincidence of sets A = B. If A ⊆ B and A ≠ B, then we say that the set A is a strict subset of the set B.
The term empty set is used to denote the set ∅ that comprises no elements at
all. The empty set is assumed to be a part of any set: ∅ ⊂ A.
Definition 1.1. The mapping f : X → Y from the set X to the set Y is a
rule f applicable to any element x of the set X and such that, being applied to a
particular element x ∈ X, uniquely defines some element y = f(x) in the set Y .
The set X in the definition 1.1 is called the domain of the mapping f. The
set Y in the definition 1.1 is called the domain of values of the mapping f. The
writing f(x) means that the rule f is applied to the element x of the set X. The
element y = f(x) obtained as a result of applying f to x is called the image of x
under the mapping f.
Let A be a subset of the set X. The set f(A) composed by the images of all
elements x ∈ A is called the image of the subset A under the mapping f:
f(A) = {y ∈ Y : ∃ x ((x ∈ A) & (f(x) = y))}.
If A = X, then the image f(X) is called the image of the mapping f. There is a special notation for this image: f(X) = Im f. The set of values is another term used for denoting Im f = f(X); don't confuse it with the domain of values.
Let y be an element of the set Y. Let's consider the set f^{-1}(y) consisting of all elements x ∈ X that are mapped to the element y. This set f^{-1}(y) is called the total preimage of the element y:

f^{-1}(y) = {x ∈ X : f(x) = y}.

Suppose that B is a subset in Y. Taking the union of total preimages for all elements of the set B, we get the total preimage of the set B itself:

f^{-1}(B) = {x ∈ X : f(x) ∈ B}.

It is clear that for the case B = Y the total preimage f^{-1}(Y) coincides with X. Therefore there is no special sign for denoting f^{-1}(Y).
Definition 1.2. The mapping f: X → Y is called injective if the images of any two distinct elements x_1 ≠ x_2 are different, i. e. x_1 ≠ x_2 implies f(x_1) ≠ f(x_2).

Definition 1.3. The mapping f: X → Y is called surjective if the total preimage f^{-1}(y) of any element y ∈ Y is not empty.

Definition 1.4. The mapping f: X → Y is called a bijective mapping or a one-to-one mapping if the total preimage f^{-1}(y) of any element y ∈ Y is a set consisting of exactly one element.
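For mappings between finite sets the definitions 1.2-1.4 can be checked by direct enumeration. The following small Python sketch is not part of the original text; the sample sets and function names are chosen here purely for illustration.

    # A mapping f: X -> Y between finite sets, stored as a dictionary.
    X = {1, 2, 3}
    Y = {"a", "b", "c"}
    f = {1: "a", 2: "b", 3: "c"}

    def is_injective(f):
        # distinct elements must have distinct images
        images = list(f.values())
        return len(images) == len(set(images))

    def is_surjective(f, Y):
        # every y in Y must have a non-empty total preimage
        return set(f.values()) == set(Y)

    def is_bijective(f, Y):
        # by theorem 1.1 (stated next) this is equivalent to being injective and surjective
        return is_injective(f) and is_surjective(f, Y)

    print(is_injective(f), is_surjective(f, Y), is_bijective(f, Y))   # True True True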
Theorem 1.1. The mapping f : X →Y is bijective if and only if it is injective
and surjective simultaneously.
Proof. According to the statement of theorem 1.1, simultaneous injectivity and surjectivity is a necessary and sufficient condition for the bijectivity of the mapping f: X → Y. Let's prove the necessity of this condition first.
Suppose that the mapping f: X → Y is bijective. Then for any y ∈ Y the total preimage f^{-1}(y) consists of exactly one element. This means that it is not empty. This fact proves the surjectivity of the mapping f: X → Y.
However, we need to prove that f is not only surjective, but injective as well. Let's prove the injectivity of f by contradiction. If the mapping f is not injective, then there are two distinct elements x_1 ≠ x_2 in X such that f(x_1) = f(x_2). Let's denote y = f(x_1) = f(x_2) and consider the total preimage f^{-1}(y). From the equality f(x_1) = y we derive x_1 ∈ f^{-1}(y). Similarly from f(x_2) = y we derive x_2 ∈ f^{-1}(y). Hence, the total preimage f^{-1}(y) is a set containing at least two distinct elements x_1 and x_2. This fact contradicts the bijectivity of the mapping f: X → Y. Due to this contradiction we conclude that f is surjective and injective simultaneously. Thus, we have proved the necessity of the condition stated in theorem 1.1.
Let’s proceed to the proof of sufficiency. Suppose that the mapping f : X → Y
is injective and surjective simultaneously. Due to the surjectivity the sets f
−1
(y)
are non-empty for all y ∈ Y . Suppose that someone of them contains more
than one element. If x
1
= x
2
are two distinct elements of the set f
−1
(y), then
f(x
1
) = y = f(x
2
). However, this equality contradicts the injectivity of the
mapping f : X → Y . Hence, each set f
−1
(y) is non-empty and contains exactly
one element. Thus, we have proved the bijectivity of the mapping f.
Theorem 1.2. The mapping f: X → Y is surjective if and only if Im f = Y.

Proof. If the mapping f: X → Y is surjective, then for any element y ∈ Y the total preimage f^{-1}(y) is not empty. Choosing some element x ∈ f^{-1}(y), we get y = f(x). Hence, each element y ∈ Y is an image of some element x under the mapping f. This proves the equality Im f = Y.
Conversely, if Im f = Y, then any element y ∈ Y is an image of some element x ∈ X, i. e. y = f(x). Hence, for any y ∈ Y the total preimage f^{-1}(y) is not empty. This means that f is a surjective mapping.
Let’s consider two mappings f : X → Y and g : Y →Z. Choosing an arbitrary
element x ∈ X we can apply f to it. As a result we get the element f(x) ∈ Y .
Then we can apply g to f(x). The successive application of two mappings g(f(x))
yields a rule that associates each element x ∈ X with some uniquely determined
element z = g(f(x)) ∈ Z, i. e. we have a mapping ϕ: X → Z. This mapping is
called the composition of two mappings f and g. It is denoted as ϕ = g ◦ f.
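As a hypothetical illustration (not from the original text), the composition of two finite mappings stored as dictionaries can be computed as follows; the check at the end verifies that g ◦ f(x) is just g applied to f(x).

    f = {1: "a", 2: "b", 3: "a"}     # f: X -> Y
    g = {"a": 10, "b": 20}           # g: Y -> Z

    def compose(g, f):
        # the dictionary of the mapping g o f: X -> Z
        return {x: g[f[x]] for x in f}

    gf = compose(g, f)
    print(gf)                 # {1: 10, 2: 20, 3: 10}
    print(gf[2] == g[f[2]])   # True: (g o f)(x) = g(f(x))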
Theorem 1.3. The composition g ◦ f of two injective mappings f: X → Y and g: Y → Z is an injective mapping.

Proof. Let's consider two elements x_1 and x_2 of the set X. Denote y_1 = f(x_1) and y_2 = f(x_2). Then g ◦ f(x_1) = g(y_1) and g ◦ f(x_2) = g(y_2). Due to the injectivity of f, from x_1 ≠ x_2 we derive y_1 ≠ y_2. Then, due to the injectivity of g, from y_1 ≠ y_2 we derive g(y_1) ≠ g(y_2). Hence, g ◦ f(x_1) ≠ g ◦ f(x_2). The injectivity of the composition g ◦ f is proved.
Theorem 1.4. The composition g ◦ f of two surjective mappings f: X → Y and g: Y → Z is a surjective mapping.

Proof. Let's take an arbitrary element z ∈ Z. Due to the surjectivity of g the total preimage g^{-1}(z) is not empty. Let's choose some arbitrary element y ∈ g^{-1}(z) and consider its total preimage f^{-1}(y). Due to the surjectivity of f it is not empty. Then, choosing an arbitrary element x ∈ f^{-1}(y), we get g ◦ f(x) = g(f(x)) = g(y) = z. This means that x ∈ (g ◦ f)^{-1}(z). Hence, the total preimage (g ◦ f)^{-1}(z) is not empty. The surjectivity of g ◦ f is proved.
As an immediate consequence of the above two theorems we obtain the following
theorem on composition of two bijections.
Theorem 1.5. The composition g ◦ f of two bijective mappings f : X →Y and
g : Y →Z is a bijective mapping.
Let’s consider three mappings f : X →Y , g : Y →Z, and h: Z → U. Then we
can form two different compositions of these mappings:
ϕ = h ◦ (g ◦ f),    ψ = (h ◦ g) ◦ f.    (1.1)
The fact of coincidence of these two mappings is formulated as the following
theorem on associativity.
Theorem 1.6. The operation of composition for the mappings is an associative
operation, i. e. h ◦ (g ◦ f) = (h ◦ g) ◦ f.
Proof. According to the definition 1.1, the coincidence of two mappings ϕ: X → U and ψ: X → U is established by verifying the equality ϕ(x) = ψ(x) for an arbitrary element x ∈ X. Let's denote α = h ◦ g and β = g ◦ f. Then

ϕ(x) = h ◦ β(x) = h(β(x)) = h(g(f(x))),
ψ(x) = α ◦ f(x) = α(f(x)) = h(g(f(x))).    (1.2)
Comparing right hand sides of the equalities (1.2), we derive the required equality
ϕ(x) = ψ(x) for the mappings (1.1). Hence, h ◦ (g ◦ f) = (h ◦ g) ◦ f.
Let’s consider a mapping f : X → Y and the pair of identical mappings
id
X
: X →X and id
Y
: Y →Y . The last two mappings are defined as follows:
id
X
(x) = x, id
Y
(y) = y.
Definition 1.5. A mapping l : Y → X is called left inverse to the mapping
f : X →Y if l ◦ f = id
X
.
Definition 1.6. A mapping r : Y → X is called right inverse to the mapping
f : X →Y if f ◦ r = id
Y
.
The problem of existence of the left and right inverse mappings is solved by the
following two theorems.
Theorem 1.7. A mapping f : X → Y possesses the left inverse mapping l if
and only if it is injective.
Theorem 1.8. A mapping f : X → Y possesses the right inverse mapping r if
and only if it is surjective.
Proof of the theorem 1.7. Suppose that the mapping f possesses the left inverse mapping l. Let's choose two elements x_1 and x_2 in the set X and let's denote y_1 = f(x_1) and y_2 = f(x_2). The equality l ◦ f = id_X yields x_1 = l(y_1) and x_2 = l(y_2). Hence, the equality y_1 = y_2 implies x_1 = x_2, and x_1 ≠ x_2 implies y_1 ≠ y_2. Thus, assuming the existence of the left inverse mapping l, we derive that the direct mapping f is injective.
Conversely, suppose that f is an injective mapping. First of all let's choose and fix some element x_0 ∈ X. Then let's consider an arbitrary element y ∈ Im f. Its total preimage f^{-1}(y) is not empty. For any y ∈ Im f we can choose and fix some element x_y ∈ f^{-1}(y) in the non-empty set f^{-1}(y). Then we define the mapping l: Y → X by the following equality:

l(y) = x_y for y ∈ Im f,    l(y) = x_0 for y ∉ Im f.

Let's study the composition l ◦ f. It is easy to see that for any x ∈ X and for y = f(x) the equality l ◦ f(x) = x_y is fulfilled. Then f(x_y) = y = f(x). Taking into account the injectivity of f, we get x_y = x. Hence, l ◦ f(x) = x for any x ∈ X. The equality l ◦ f = id_X for the mapping l is proved. Therefore, this mapping is a required left inverse mapping for f. The theorem is proved.
Proof of the theorem 1.8. Suppose that the mapping f possesses the right inverse mapping r. For an arbitrary element y ∈ Y, from the equality f ◦ r = id_Y we derive y = f(r(y)). This means that r(y) ∈ f^{-1}(y), therefore, the total preimage f^{-1}(y) is not empty. Thus, the surjectivity of f is proved.
Now, conversely, let's assume that f is surjective. Then for any y ∈ Y the total preimage f^{-1}(y) is not empty. In each non-empty set f^{-1}(y) we choose and mark exactly one element x_y ∈ f^{-1}(y). Then we can define a mapping r: Y → X by setting r(y) = x_y. Since f(x_y) = y, we get f(r(y)) = y and f ◦ r = id_Y. The existence of the right inverse mapping r for f is established.
Note that the mappings l: Y → X and r: Y → X constructed when proving theorems 1.7 and 1.8 in general are not unique. Even the method of constructing them involves a certain degree of arbitrariness.
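The constructions used in proving theorems 1.7 and 1.8 can be carried out explicitly for finite sets. The sketch below is only an illustration and not part of the original text; the particular sets and the choices of x_0 and of the marked elements x_y are arbitrary, which is exactly the arbitrariness mentioned above.

    # f: X -> Y is injective but not surjective here.
    X = {1, 2}
    Y = {"a", "b", "c"}
    f = {1: "a", 2: "b"}

    def left_inverse(f, X, Y, x0):
        # l(y) = x_y for y in Im f and l(y) = x0 otherwise, as in the proof of theorem 1.7
        preimage = {y: x for x, y in f.items()}   # well defined because f is injective
        return {y: preimage.get(y, x0) for y in Y}

    l = left_inverse(f, X, Y, x0=1)
    print(all(l[f[x]] == x for x in X))    # True: l o f = id_X

    # g: Y -> Z is surjective but not injective here.
    Z = {"a", "b"}
    g = {"a": "a", "b": "b", "c": "a"}

    def right_inverse(g, Z):
        # r(z) picks one marked element in the total preimage of z, as in the proof of theorem 1.8
        r = {}
        for y, z in g.items():
            r.setdefault(z, y)
        return r

    r = right_inverse(g, Z)
    print(all(g[r[z]] == z for z in Z))    # True: g o r is the identity on Z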
Definition 1.7. A mapping f^{-1}: Y → X is called a bilateral inverse mapping or simply an inverse mapping for the mapping f: X → Y if

f^{-1} ◦ f = id_X,    f ◦ f^{-1} = id_Y.    (1.3)
Theorem 1.9. A mapping f: X → Y possesses both left and right inverse mappings l and r if and only if it is bijective. In this case the mappings l and r are uniquely determined. They coincide with each other, thus determining the unique bilateral inverse mapping l = r = f^{-1}.

Proof. The first proposition of the theorem 1.9 follows from theorems 1.7, 1.8, and 1.1. Let's prove the remaining propositions of this theorem 1.9. The coincidence l = r is derived from the following chain of equalities:

l = l ◦ id_Y = l ◦ (f ◦ r) = (l ◦ f) ◦ r = id_X ◦ r = r.

The uniqueness of the left inverse mapping also follows from the same chain of equalities. Indeed, if we assume that there is another left inverse mapping l′, then from l = r and l′ = r it follows that l = l′.
In a similar way, assuming the existence of another right inverse mapping r′, we get l = r and l = r′. Hence, r = r′. Coinciding with each other, the left and right inverse mappings determine the unique bilateral inverse mapping f^{-1} = l = r satisfying the equalities (1.3).
§ 2. Linear vector spaces.
Let M be a set. A binary algebraic operation in M is a rule that maps each ordered pair of elements x, y of the set M to some uniquely determined element z ∈ M. This rule can be denoted as a function z = f(x, y). This notation is called a prefix notation for an algebraic operation: the operation sign f in it precedes the elements x and y to which it is applied. There is another infix notation for algebraic operations, where the operation sign is placed between the elements x and y. Examples are the binary operations of addition and multiplication of numbers: z = x + y, z = x · y. Sometimes special brackets play the role of the operation sign, while the operands are separated by a comma. The vector product of three-dimensional vectors yields an example of such notation: z = [x, y].
Let K be a numeric field. Under the numeric field in this book we shall
understand one of three such fields: the field of rational numbers K = Q, the field
of real numbers K = R, or the field of complex numbers K = C. The operation of
multiplication by numbers from the field K in a set M is a rule that maps each pair (α, x) consisting of a number α ∈ K and of an element x ∈ M to some element y ∈ M. The operation of multiplication by numbers is written in infix form: y = α · x. The multiplication sign in this notation is often omitted: y = αx.
Definition 2.1. A set V equipped with a binary operation of addition and with an operation of multiplication by numbers from the field K is called a linear vector space over the field K if the following conditions are fulfilled:
(1) u + v = v + u for all u, v ∈ V;
(2) (u + v) + w = u + (v + w) for all u, v, w ∈ V;
(3) there is an element 0 ∈ V such that v + 0 = v for all v ∈ V; any such element is called a zero element;
(4) for any v ∈ V and for any zero element 0 there is an element v′ ∈ V such that v + v′ = 0; it is called an opposite element for v;
(5) α · (u + v) = α · u + α · v for any number α ∈ K and for any two elements u, v ∈ V;
(6) (α + β) · v = α · v + β · v for any two numbers α, β ∈ K and for any element v ∈ V;
(7) α · (β · v) = (αβ) · v for any two numbers α, β ∈ K and for any element v ∈ V;
(8) 1 · v = v for the number 1 ∈ K and for any element v ∈ V.
The elements of a linear vector space are usually called the vectors, while
the conditions (1)-(8) are called the axioms of a linear vector space. We shall
distinguish rational, real, and complex linear vector spaces depending on which
numeric field K = Q, K = R, or K = C they are defined over. Most of the results
in this book are valid for any numeric field K. Formulating such results, we shall
not specify the type of linear vector space.
Axioms (1) and (2) are the axiom of commutativity¹ and the axiom of associativity respectively. Axioms (5) and (6) express the distributivity.

¹ The system of axioms (1)-(8) is excessive: the axiom (1) can be derived from the other axioms. I am grateful to A. B. Muftakhov who communicated to me this curious fact.
Theorem 2.1. Algebraic operations in an arbitrary linear vector space V possess the following properties:
(9) the zero vector 0 ∈ V is unique;
(10) for any vector v ∈ V the vector v′ opposite to v is unique;
(11) the product of the number 0 ∈ K and any vector v ∈ V is equal to the zero vector: 0 · v = 0;
(12) the product of an arbitrary number α ∈ K and the zero vector is equal to the zero vector: α · 0 = 0;
(13) the product of the number −1 ∈ K and the vector v ∈ V is equal to the opposite vector: (−1) · v = v′.

Proof. The properties (9)-(13) are immediate consequences of the axioms (1)-(8). Therefore, they are enumerated so that their numbers form a successive series with the numbers of the axioms of a linear vector space.
Suppose that in a linear vector space there are two elements 0 and 0′ with the properties of zero vectors. Then for any vector v ∈ V due to the axiom (3) we have v = v + 0 and v + 0′ = v. Let's substitute v = 0′ into the first equality and substitute v = 0 into the second one. Taking into account the axiom (1), we get

0′ = 0′ + 0 = 0 + 0′ = 0.

This means that the vectors 0 and 0′ do actually coincide. The uniqueness of the zero vector is proved.
Let v be some arbitrary vector in a vector space V. Suppose that there are two vectors v′ and v″ opposite to v. Then

v + v′ = 0,    v + v″ = 0.

The following calculations prove the uniqueness of the opposite vector:

v′ = v′ + 0 = v′ + (v + v″) = (v′ + v) + v″ = (v + v′) + v″ = 0 + v″ = v″ + 0 = v″.

In deriving v′ = v″ above we used the axiom (4), the associativity axiom (2), and we used twice the commutativity axiom (1).
Again, let v be some arbitrary vector in a vector space V. Let's take x = 0 · v, then let's add x with x and apply the distributivity axiom (6). As a result we get

x + x = 0 · v + 0 · v = (0 + 0) · v = 0 · v = x.

Thus we have proved that x + x = x. Then we easily derive that x = 0:

x = x + 0 = x + (x + x′) = (x + x) + x′ = x + x′ = 0.

Here we used the associativity axiom (2). The property (11) is proved.
Let α be some arbitrary number of the numeric field K. Let's take x = α · 0, where 0 is the zero vector of a vector space V. Then

x + x = α · 0 + α · 0 = α · (0 + 0) = α · 0 = x.

Here we used the axiom (5) and the property of the zero vector from the axiom (3). From the equality x + x = x it follows that x = 0 (see above). Thus, the property (12) is proved.
Let v be some arbitrary vector of a vector space V. Let x = (−1) · v. Applying axioms (8) and (6), for the vector x we derive

v + x = 1 · v + x = 1 · v + (−1) · v = (1 + (−1)) · v = 0 · v = 0.

The equality v + x = 0 just derived means that x is an opposite vector for the vector v in the sense of the axiom (4). Due to the uniqueness property (10) of the opposite vector we conclude that x = v′. Therefore, (−1) · v = v′. The theorem is completely proved.

Due to the commutativity and associativity axioms we need not worry about setting brackets and about the order of the summands when writing sums of vectors. The property (13) and the axioms (7) and (8) yield

(−1) · v′ = (−1) · ((−1) · v) = ((−1)(−1)) · v = 1 · v = v.
This equality shows that the notation v′ = −v for an opposite vector is quite natural. In addition, we can write

−α · v = −(α · v) = (−1) · (α · v) = (−α) · v.

The operation of subtraction is an opposite operation for the vector addition. It is determined as the addition with the opposite vector: x − y = x + (−y). The following properties of the operation of vector subtraction

(a + b) − c = a + (b − c),
(a − b) + c = a − (b − c),
(a − b) − c = a − (b + c),
α · (x − y) = α · x − α · y

make the calculations with vectors very simple and quite similar to the calculations with numbers. Proof of the above properties is left to the reader.
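As a sketch of how such a verification goes (this derivation is not in the original text), here is the third of these properties written out step by step in LaTeX notation, with the axiom or property used at each step:

    \begin{align*}
    (a - b) - c &= (a + (-b)) + (-c)               && \text{definition of subtraction}\\
                &= a + ((-b) + (-c))               && \text{axiom (2)}\\
                &= a + ((-1)\cdot b + (-1)\cdot c) && \text{property (13)}\\
                &= a + (-1)\cdot(b + c)            && \text{axiom (5)}\\
                &= a - (b + c)                     && \text{property (13) and definition of subtraction.}
    \end{align*}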
Let’s consider some examples of linear vector spaces. Real arithmetic vector
space R
n
is determined as a set of ordered n-tuples of real numbers x
1
, . . . , x
n
.
Such n-tuples are represented in the form of column vectors. Algebraic operations
with column vectors are determined as the operations with their components:

x
1
x
2
.
.
.
x
n

+

y
1
y
2
.
.
.
y
n

=

x
1
+y
1
x
2
+y
2
.
.
.
x
n
+y
n

α

x
1
x
2
.
.
.
x
n

=

α x
1
α x
2
.
.
.
α x
n

(2.1)
We leave to the reader to check the fact that the set R
n
of all ordered n-tuples
with algebraic operations (2.1) is a linear vector space over the field R of real
numbers. Rational arithmetic vector space Q
n
over the field Q of rational numbers
and complex arithmetic vector space C
n
over the field C of complex numbers are
defined in a similar way.
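A minimal numerical sketch of the operations (2.1), assuming the NumPy library (this code is illustrative and not part of the textbook): column vectors of R^n are modelled by arrays, and addition and multiplication by numbers act componentwise.

    import numpy as np

    # two vectors of R^3 and scalars from the field R
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([0.5, -1.0, 4.0])
    alpha, beta = 2.0, -3.0

    print(x + y)        # componentwise sum
    print(alpha * x)    # componentwise multiplication by a number

    # a quick check of axiom (6): (alpha + beta) * v = alpha * v + beta * v
    print(np.allclose((alpha + beta) * x, alpha * x + beta * x))   # True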
Let’s consider the set of m-times continuously differentiable real-valued func-
tions on the segment [−1, 1] of real axis. This set is usually denoted as C
m
([−1, 1]).
The operations of addition and multiplication by numbers in C
m
([−1, 1]) are de-
fined as pointwise operations. This means that the value of the function f + g at
a point a is the sum of the values of f and g at that point. In a similar way, the
value of the function α f at the point a is the product of two numbers α and f(a).
It is easy to verify that the set of functions C
m
([−1, 1]) with pointwise algebraic
operations of addition and multiplication by numbers is a linear vector space over
the field of real numbers R. The reader can easily verify this fact.
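The pointwise operations in C^m([−1, 1]) can be imitated in code; the sketch below (illustrative only, not part of the original text) represents functions by Python callables and builds f + g and α · f pointwise.

    import math

    def add(f, g):
        # (f + g)(a) = f(a) + g(a)
        return lambda a: f(a) + g(a)

    def scale(alpha, f):
        # (alpha * f)(a) = alpha * f(a)
        return lambda a: alpha * f(a)

    h = add(scale(2.0, math.sin), math.cos)          # the function 2*sin + cos

    a = 0.5
    print(h(a), 2.0 * math.sin(a) + math.cos(a))     # the two values coincide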
Definition 2.2. A non-empty subset U ⊂ V in a linear vector space V over a numeric field K is called a subspace of the space V if:
(1) from u_1, u_2 ∈ U it follows that u_1 + u_2 ∈ U;
(2) from u ∈ U it follows that α · u ∈ U for any number α ∈ K.

Let U be a subspace of a linear vector space V. Let's regard U as an isolated set. Due to the above conditions (1) and (2) this set is closed with respect to the operations of addition and multiplication by numbers. It is easy to show that the zero vector is an element of U and for any u ∈ U the opposite vector u′ also is an element of U. These facts follow from 0 = 0 · u and u′ = (−1) · u. Relying upon these facts one can easily prove that any subspace U ⊂ V, when considered as an isolated set, is a linear vector space over the field K. Indeed, we have already shown that axioms (3) and (4) are valid for it. Verifying axioms (1), (2) and the remaining axioms (5)-(8) consists in checking equalities written in terms of the operations of addition and multiplication by numbers. Being fulfilled for arbitrary vectors of V, these equalities are obviously fulfilled for vectors of the subset U ⊂ V. Since U is closed with respect to the algebraic operations, this guarantees that all calculations in these equalities are performed within the subset U.
As examples of the concept of subspace we can mention the following subspaces in the functional space C^m([−1, 1]):
– the subspace of even functions (f(−x) = f(x));
– the subspace of odd functions (f(−x) = −f(x));
– the subspace of polynomials (f(x) = a_n x^n + ... + a_1 x + a_0).
§ 3. Linear dependence and linear independence.
Let v_1, ..., v_n be a system of vectors from some linear vector space V. Applying the operations of multiplication by numbers and addition to them we can produce the following expressions with these vectors:

v = α_1 · v_1 + ... + α_n · v_n.    (3.1)

An expression of the form (3.1) is called a linear combination of the vectors v_1, ..., v_n. The numbers α_1, ..., α_n are taken from the field K; they are called the coefficients of the linear combination (3.1), while the vector v is called the value of this linear combination. A linear combination is said to be zero or equal to zero if its value is zero.
A linear combination is called trivial if all its coefficients are equal to zero: α_1 = ... = α_n = 0. Otherwise it is called nontrivial.
Definition 3.1. A system of vectors v_1, ..., v_n in a linear vector space V is called linearly dependent if there exists some nontrivial linear combination of these vectors equal to zero.

Definition 3.2. A system of vectors v_1, ..., v_n in a linear vector space V is called linearly independent if any linear combination of these vectors being equal to zero is necessarily trivial.

The concept of linear independence is obtained by direct logical negation of the concept of linear dependence. The reader can give several equivalent statements defining this concept. Here we give only one of such statements which, to our knowledge, is most convenient in what follows.
Let's introduce one more concept related to linear combinations. We say that a vector v is linearly expressed through the vectors v_1, ..., v_n if v is the value of some linear combination composed of v_1, ..., v_n.
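In a concrete arithmetic space the definitions 3.1 and 3.2, and the notion of being linearly expressed, can be tested numerically: a system of vectors of R^n is linearly independent exactly when the matrix formed by these vectors as columns has rank equal to the number of vectors. The NumPy sketch below is only an illustration and not part of the original text.

    import numpy as np

    # three vectors of R^3; v3 = v1 + v2, so the system is linearly dependent
    v1 = np.array([1.0, 0.0, 2.0])
    v2 = np.array([0.0, 1.0, 1.0])
    v3 = np.array([1.0, 1.0, 3.0])

    A = np.column_stack([v1, v2, v3])
    print(np.linalg.matrix_rank(A) == A.shape[1])    # False: the system is linearly dependent

    # express v3 through v1 and v2 by solving a least squares problem
    coeffs, *_ = np.linalg.lstsq(np.column_stack([v1, v2]), v3, rcond=None)
    print(coeffs)    # approximately [1. 1.], i.e. v3 = 1*v1 + 1*v2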
Theorem 3.1. The relation of linear dependence of vectors in a linear vector space has the following basic properties:
(1) any system of vectors comprising the zero vector is linearly dependent;
(2) any system of vectors comprising a linearly dependent subsystem is linearly dependent as a whole;
(3) if a system of vectors is linearly dependent, then at least one of these vectors is linearly expressed through the others;
(4) if a system of vectors v_1, ..., v_n is linearly independent and if adding the next vector v_{n+1} to it makes it linearly dependent, then the vector v_{n+1} is linearly expressed through the previous vectors v_1, ..., v_n;
(5) if a vector x is linearly expressed through the vectors y_1, ..., y_m and if each one of the vectors y_1, ..., y_m is linearly expressed through z_1, ..., z_n, then x is linearly expressed through z_1, ..., z_n.
Proof. Suppose that a system of vectors v_1, ..., v_n comprises the zero vector. For the sake of certainty we can assume that v_k = 0. Let's compose the following linear combination of the vectors v_1, ..., v_n:

0 · v_1 + ... + 0 · v_{k−1} + 1 · v_k + 0 · v_{k+1} + ... + 0 · v_n = 0.

This linear combination is nontrivial since the coefficient of the vector v_k is nonzero. And its value is equal to zero. Hence, the vectors v_1, ..., v_n are linearly dependent. The property (1) is proved.
Suppose that a system of vectors v_1, ..., v_n comprises a linearly dependent subsystem. Since linear dependence is not sensitive to the order in which the vectors in a system are enumerated, we can assume that the first k vectors form a linearly dependent subsystem in it. Then there exists some nontrivial linear combination of these k vectors being equal to zero:

α_1 · v_1 + ... + α_k · v_k = 0.

Let's expand this linear combination by adding the other vectors with zero coefficients:

α_1 · v_1 + ... + α_k · v_k + 0 · v_{k+1} + ... + 0 · v_n = 0.

It is obvious that the resulting linear combination is nontrivial and its value is equal to zero. Hence, the vectors v_1, ..., v_n are linearly dependent. The property (2) is proved.
Let's assume that the vectors v_1, ..., v_n are linearly dependent. Then there exists a nontrivial linear combination of them being equal to zero:

α_1 · v_1 + ... + α_n · v_n = 0.    (3.2)

Non-triviality of the linear combination (3.2) means that at least one of its coefficients is nonzero. Suppose that α_k ≠ 0. Let's write (3.2) in more detail:

α_1 · v_1 + ... + α_k · v_k + ... + α_n · v_n = 0.

Let's move the term α_k · v_k to the right hand side of the above equality, and then let's divide the equality by −α_k:

v_k = −(α_1/α_k) · v_1 − ... − (α_{k−1}/α_k) · v_{k−1} − (α_{k+1}/α_k) · v_{k+1} − ... − (α_n/α_k) · v_n.

Now we see that the vector v_k is linearly expressed through the other vectors of the system. The property (3) is proved.
Let’s consider a linearly independent system of vectors v
1
, . . . , v
n
such that
adding the next vector v
n+1
to it we make it linearly dependent. Then there is
some nontrivial linear combination of vectors v
1
, . . . , v
n+1
being equal to zero:
α
1
v
1
+ . . . +α
n
v
n
+ α
n+1
v
n+1
= 0.
Let’s prove that α
n+1
= 0. If, conversely, we assume that α
n+1
= 0, we would get
the nontrivial linear combination of n vectors being equal to zero:
α
1
v
1
+ . . . + α
n
v
n
= 0.
This contradicts to the linear independence of the first n vectors v
1
, . . . , v
n
.
Hence, α
n+1
= 0, and we can apply the trick already used above:
v
n+1
= −
α
1
α
n+1
v
1
−. . . −
α
n
α
n+1
v
n
.
This expression for the vector v
n+1
completes the proof of the property (4).
Suppose that the vector x is linearly expressed through y_1, ..., y_m, and each one of the vectors y_1, ..., y_m is linearly expressed through z_1, ..., z_n. This fact is expressed by the following formulas:

x = ∑_{i=1}^{m} α_i · y_i,    y_i = ∑_{j=1}^{n} β_{ij} · z_j.

Substituting the second formula into the first one, for the vector x we get

x = ∑_{i=1}^{m} α_i · ( ∑_{j=1}^{n} β_{ij} · z_j ) = ∑_{j=1}^{n} ( ∑_{i=1}^{m} α_i β_{ij} ) · z_j.

The above expression for the vector x shows that it is linearly expressed through the vectors z_1, ..., z_n. The property (5) is proved. This completes the proof of theorem 3.1 in whole.
Note the following important consequence that follows from the property (2) in the theorem 3.1.

Corollary. Any subsystem in a linearly independent system of vectors is linearly independent.
The next property of linear dependence of vectors is known as Steinitz theorem. It describes some quantitative feature of this concept.

Theorem 3.2 (Steinitz). If the vectors x_1, ..., x_n are linearly independent and if each of them is linearly expressed through the vectors y_1, ..., y_m, then m ≥ n.
Proof. We shall prove this theorem by induction on the number of vectors in the system x_1, ..., x_n. Let's begin with the case n = 1. Linear independence of a system with a single vector x_1 means that x_1 ≠ 0. In order to express the nonzero vector x_1 through the vectors of a system y_1, ..., y_m this system should contain at least one vector. Hence, m ≥ 1. The base step of induction is proved.
Suppose that the theorem holds for the case n = k. Under this assumption let's prove that it is valid for n = k + 1. If n = k + 1 we have a system of linearly independent vectors x_1, ..., x_{k+1}, each vector being expressed through the vectors of another system y_1, ..., y_m. We express this fact by the formulas

x_1 = α_{11} · y_1 + ... + α_{1m} · y_m,
. . . . . . . . . . . . . . . . . . . .
x_k = α_{k1} · y_1 + ... + α_{km} · y_m.    (3.3)

We shall write the analogous formula expressing x_{k+1} through y_1, ..., y_m in a slightly different way:

x_{k+1} = β_1 · y_1 + ... + β_m · y_m.
Due to the linear independence of the vectors x_1, ..., x_{k+1} the last vector x_{k+1} of this system is nonzero (as well as the other ones). Therefore at least one of the numbers β_1, ..., β_m is nonzero. Upon renumerating the vectors y_1, ..., y_m, if necessary, we can assume that β_m ≠ 0. Then

y_m = (1/β_m) · x_{k+1} − (β_1/β_m) · y_1 − ... − (β_{m−1}/β_m) · y_{m−1}.    (3.4)
Let’s substitute (3.4) into the relationships (3.3) and collect similar terms in them.
As a result the relationships (3.4) are written as
x
i

α
im
β
m
x
k+1
=
m−1
¸
j=1

α
ij
−β
j
α
im
β
m

y
j
, (3.5)
where i = 1, . . . , k. In order to simplify (3.5) we introduce the following notations:
x

i
= x
i

α
im
β
m
x
k+1
, α

ij
= α
ij
−β
j
α
im
β
m
. (3.6)
In these notations the formulas (3.5) are written as

x′_1 = α′_{11} · y_1 + ... + α′_{1 m−1} · y_{m−1},
. . . . . . . . . . . . . . . . . . . . . . . .
x′_k = α′_{k1} · y_1 + ... + α′_{k m−1} · y_{m−1}.    (3.7)
According to the above formulas, the k vectors x′_1, ..., x′_k are linearly expressed through y_1, ..., y_{m−1}. In order to apply the inductive hypothesis we need to show that the vectors x′_1, ..., x′_k are linearly independent. Let's consider a linear combination of these vectors being equal to zero:

γ_1 · x′_1 + ... + γ_k · x′_k = 0.    (3.8)

Substituting (3.6) for x′_i in (3.8), upon collecting similar terms, we get

γ_1 · x_1 + ... + γ_k · x_k − ( ∑_{i=1}^{k} γ_i · α_{im}/β_m ) · x_{k+1} = 0.

Due to the linear independence of the initial system of vectors x_1, ..., x_{k+1} we derive γ_1 = ... = γ_k = 0. Hence, the linear combination (3.8) is trivial, which proves the linear independence of the vectors x′_1, ..., x′_k. Now, applying the inductive hypothesis to the relationships (3.7), we get m − 1 ≥ k. The required inequality m ≥ k + 1 proving the theorem for the case n = k + 1 is an immediate consequence of m − 1 ≥ k. So, the inductive step is completed and the theorem is proved.
§ 4. Spanning systems and bases.
Let S ⊂ V be some non-empty subset in a linear vector space V. The set S can consist of either a finite number of vectors or an infinite number of vectors. We denote by ⟨S⟩ the set of all vectors, each of which is linearly expressed through some finite number of vectors taken from S:

⟨S⟩ = {v ∈ V : ∃ n (v = α_1 · s_1 + ... + α_n · s_n, where s_i ∈ S)}.

This set ⟨S⟩ is called the linear span of a subset S ⊂ V.
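For a finite subset S of an arithmetic space, membership in the linear span ⟨S⟩ can be tested by comparing matrix ranks: v ∈ ⟨S⟩ exactly when adjoining v to the vectors of S does not increase the rank. A hypothetical NumPy sketch, not part of the original text:

    import numpy as np

    def in_span(v, S):
        # S is a list of vectors; v lies in <S> iff the rank does not grow when v is adjoined
        rank_S = np.linalg.matrix_rank(np.column_stack(S))
        return np.linalg.matrix_rank(np.column_stack(S + [v])) == rank_S

    s1 = np.array([1.0, 0.0, 0.0])
    s2 = np.array([0.0, 1.0, 0.0])
    print(in_span(np.array([3.0, -2.0, 0.0]), [s1, s2]))   # True
    print(in_span(np.array([0.0, 0.0, 1.0]), [s1, s2]))    # False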
Theorem 4.1. The linear span of any subset S ⊂ V is a subspace in a linear vector space V.

Proof. In order to prove this theorem it is sufficient to check the two conditions from the definition 2.2 for ⟨S⟩. Suppose that u_1, u_2 ∈ ⟨S⟩. Then

u_1 = α_1 · s_1 + ... + α_n · s_n,
u_2 = β_1 · s′_1 + ... + β_m · s′_m.

Adding these two equalities, we see that the vector u_1 + u_2 also is expressed as a linear combination of some finite number of vectors taken from S. Therefore, we have u_1 + u_2 ∈ ⟨S⟩.
Now suppose that u ∈ ⟨S⟩. Then u = α_1 · s_1 + ... + α_n · s_n. For the vector α · u, from this equality we derive

α · u = (α α_1) · s_1 + ... + (α α_n) · s_n.

Hence, α · u ∈ ⟨S⟩. Both conditions (1) and (2) from the definition 2.2 for ⟨S⟩ are fulfilled. Thus, the theorem is proved.
Theorem 4.2. The operation of passing to the linear span in a linear vector space V possesses the following properties:
(1) if S ⊂ U and if U is a subspace in V, then ⟨S⟩ ⊂ U;
(2) the linear span of a subset S ⊂ V is the intersection of all subspaces comprising this subset S.

Proof. Let u ∈ ⟨S⟩ and S ⊂ U, where U is a subspace. Then for the vector u we have u = α_1 · s_1 + ... + α_n · s_n, where s_i ∈ S. But s_i ∈ S and S ⊂ U implies s_i ∈ U. Since U is a subspace, the value of any linear combination of its elements again is an element of U. Hence, u ∈ U. This proves the inclusion ⟨S⟩ ⊂ U.
Let's denote by W the intersection of all subspaces of V comprising the subset S. Due to the property (1), which is already proved, the subset ⟨S⟩ is included into each of such subspaces. Therefore, ⟨S⟩ ⊂ W. On the other hand, ⟨S⟩ is a subspace of V comprising the subset S (see theorem 4.1). Hence, ⟨S⟩ is among those subspaces forming W. Then W ⊂ ⟨S⟩. From the two inclusions ⟨S⟩ ⊂ W and W ⊂ ⟨S⟩ it follows that ⟨S⟩ = W. The theorem is proved.
Let ⟨S⟩ = U. Then we say that the subset S ⊂ V spans the subspace U, i. e. S generates U by means of the linear combinations. This terminology is supported by the following definition.

Definition 4.1. A subset S ⊂ V is called a generating subset or a spanning system of vectors in a linear vector space V if ⟨S⟩ = V.
A linear vector space V can have multiple spanning systems. Therefore the problem of choosing a minimal (in some sense) spanning system is reasonable.

Definition 4.2. A spanning system of vectors S ⊂ V in a linear vector space V is called a minimal spanning system if none of its smaller subsystems S′ ⊊ S is a spanning system in V, i. e. if ⟨S′⟩ ≠ V for all S′ ⊊ S.
Definition 4.3. A system of vectors S ⊂ V is called linearly independent if any finite subsystem of vectors s_1, ..., s_n taken from S is linearly independent.

This definition extends the definition 3.2 to the case of infinite systems of vectors. As for the spanning systems, the relation between the properties of minimality and linear independence for them is determined by the following theorem.

Theorem 4.3. A spanning system of vectors S ⊂ V is minimal if and only if it is linearly independent.
Proof. If a spanning system of vectors S ⊂ V is linearly dependent, then it contains some finite linearly dependent set of vectors s_1, ..., s_n. Due to the item (3) in the statement of theorem 3.1 one of these vectors s_k is linearly expressed through the others. Then the subsystem S′ = S \ {s_k} obtained by omitting this vector s_k from S is a spanning system in V. This fact obviously contradicts the minimality of S (see definition 4.2 above). Therefore any minimal spanning system of vectors in V is linearly independent.
If a spanning system of vectors S ⊂ V is not minimal, then there is some smaller spanning subsystem S′ ⊊ S, i. e. a subsystem S′ such that

⟨S′⟩ = ⟨S⟩ = V.    (4.1)
In this case we can choose some vector s_0 ∈ S such that s_0 ∉ S′. Due to (4.1) this vector is an element of ⟨S′⟩. Hence, s_0 is linearly expressed through some finite number of vectors taken from the subsystem S′:

s_0 = α_1 · s_1 + ... + α_n · s_n.    (4.2)

One can easily transform (4.2) to the form of a linear combination equal to zero:

(−1) · s_0 + α_1 · s_1 + ... + α_n · s_n = 0.    (4.3)

This linear combination is obviously nontrivial. Thus, we have found that the vectors s_0, ..., s_n form a finite linearly dependent subset of S. Hence, S is linearly dependent (see the item (2) in theorem 3.1 and the definition 4.3). This fact means that any linearly independent spanning system of vectors in V is minimal.
Definition 4.4. A linear vector space V is called finite dimensional if there is some finite spanning system of vectors S = {x_1, ..., x_n} in it.
In an arbitrary linear vector space V there is at least one spanning system, e. g. S = V. However, the problem of existence of minimal spanning systems in the general case is nontrivial. The solution of this problem is positive, but it is not elementary and not constructive. This problem is solved with the use of the axiom of choice (see [1]). Finite dimensional vector spaces are distinguished due to the fact that the proof of existence of minimal spanning systems for them is elementary.
Theorem 4.4. In a finite dimensional linear vector space V there is at least one minimal spanning system of vectors. Any two such systems {x_1, ..., x_n} and {y_1, ..., y_n} have the same number of elements n. This number n is called the dimension of V; it is denoted as n = dim V.

Proof. Let S = {x_1, ..., x_k} be some finite spanning system of vectors in a finite-dimensional linear vector space V. If this system is not minimal, then it is linearly dependent. Hence, one of its vectors is linearly expressed through the others. This vector can be omitted and we get a smaller spanning system S′ consisting of k − 1 vectors. If S′ is not minimal again, then we can iterate the process, getting one less vector in each step. Ultimately, we shall get a minimal spanning system S_min in V with a finite number of vectors n in it:

S_min = {y_1, ..., y_n}.    (4.4)
Usually, the minimal spanning system of vectors (4.4) is not unique. Suppose that {x_1, ..., x_m} is some other minimal spanning system in V. Both systems {x_1, ..., x_m} and {y_1, ..., y_n} are linearly independent and

x_i ∈ ⟨y_1, ..., y_n⟩ for i = 1, ..., m,
y_i ∈ ⟨x_1, ..., x_m⟩ for i = 1, ..., n.    (4.5)

Due to (4.5) we can apply Steinitz theorem 3.2 to the systems of vectors {x_1, ..., x_m} and {y_1, ..., y_n}. As a result we get two inequalities n ≥ m and m ≥ n. Therefore, m = n = dim V. The theorem is proved.
The dimension dim V is an integer invariant of a finite-dimensional linear vector space. If dim V = n, then such a space is called an n-dimensional space. Returning to the examples of linear vector spaces considered in § 2, note that dim R^n = n, while the functional space C^m([−1, 1]) is not finite-dimensional at all.
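For a subspace of R^n given by a finite spanning system, the dimension can be computed as the rank of the matrix whose columns are the spanning vectors: the removal of dependent vectors described in the proof of theorem 4.4 leaves exactly that many of them. An illustrative NumPy sketch (not from the original text):

    import numpy as np

    # a spanning system of a subspace U of R^4; the third vector is redundant
    S = [np.array([1.0, 0.0, 1.0, 0.0]),
         np.array([0.0, 1.0, 1.0, 0.0]),
         np.array([1.0, 1.0, 2.0, 0.0])]

    dim_U = np.linalg.matrix_rank(np.column_stack(S))
    print(dim_U)   # 2: a minimal spanning system of U consists of two vectors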
Theorem 4.5. Let V be a finite dimensional linear vector space. Then the following propositions are valid:
(1) the number of vectors in any linearly independent system of vectors x_1, ..., x_k in V is not greater than the dimension of V;
(2) any subspace U of the space V is finite-dimensional and dim U ≤ dim V;
(3) for any subspace U in V, if dim U = dim V, then U = V;
(4) any linearly independent system of n vectors x_1, ..., x_n, where n = dim V, is a spanning system in V.
Proof. Suppose that dim V = n. Let's fix some minimal spanning system of vectors y_1, ..., y_n in V. Then each vector of the linearly independent system of vectors x_1, ..., x_k in the proposition (1) is linearly expressed through y_1, ..., y_n. Applying Steinitz theorem 3.2, we get the inequality k ≤ n. The first proposition of the theorem is proved.
Let's consider all possible linearly independent systems u_1, ..., u_k composed of the vectors of a subspace U. Due to the proposition (1), which is already proved, the number of vectors in such systems is restricted. It is not greater than n = dim V. Therefore we can assume that u_1, ..., u_k is a linearly independent system with the maximal number of vectors: k = k_max ≤ n = dim V. If u is an arbitrary vector of the subspace U and if we add it to the system u_1, ..., u_k, we get a linearly dependent system; this is because k = k_max. Now, applying the property (4) from the theorem 3.1, we conclude that the vector u is linearly expressed through the vectors u_1, ..., u_k. Hence, the vectors u_1, ..., u_k form a finite spanning system in U. It is minimal since it is linearly independent (see theorem 4.3). The finite dimensionality of U is proved. The estimate for its dimension follows from the above inequality: dim U = k ≤ n = dim V.
Let U again be a subspace in V. Assume that dim U = dim V = n. Let's choose some minimal spanning system of vectors u_1, ..., u_n in U. It is linearly independent. Adding an arbitrary vector v ∈ V to this system, we make it linearly dependent since in V there is no linearly independent system with (n + 1) vectors (see proposition (1), which is already proved). Furthermore, applying the property (4) from the theorem 3.1 to the system u_1, ..., u_n, v, we find that

v = α_1 · u_1 + ... + α_n · u_n.

This formula means that v ∈ U, where v is an arbitrary vector of the space V. Therefore, U = V. The third proposition of the theorem is proved.
Let x_1, ..., x_n be a linearly independent system of n vectors in V, where n is equal to the dimension of the space V. Denote by U the linear span of this system of vectors: U = ⟨x_1, ..., x_n⟩. Since x_1, ..., x_n are linearly independent, they form a minimal spanning system in U. Therefore, dim U = n = dim V. Now, applying the proposition (3) of the theorem, we get

⟨x_1, ..., x_n⟩ = U = V.
This equality proves the fourth proposition of theorem 4.5 and completes the proof
of the theorem in whole.
Definition 4.5. A minimal spanning system e_1, ..., e_n with some fixed order of the vectors in it is called a basis of a finite-dimensional vector space V.
Theorem 4.6 (basis criterion). An ordered system of vectors e_1, ..., e_n is a basis in a finite-dimensional vector space V if and only if
(1) the vectors e_1, ..., e_n are linearly independent;
(2) an arbitrary vector of the space V is linearly expressed through e_1, ..., e_n.
Proof is obvious. The second condition of the theorem means that the vectors e_1, ..., e_n form a spanning system in V, while the first condition is equivalent to its minimality.

In essence, theorem 4.6 simply reformulates the definition 4.5. We give it here in order to simplify the terminology. The terms «spanning system» and «minimal spanning system» are bulky and inconvenient for frequent usage.
Theorem 4.7. Let e_1, ..., e_s be a basis in a subspace U ⊂ V and let v ∈ V be some vector outside this subspace: v ∉ U. Then the system of vectors e_1, ..., e_s, v is a linearly independent system.

Proof. Indeed, if the system of vectors e_1, ..., e_s, v is linearly dependent, while e_1, ..., e_s is a linearly independent system, then v is linearly expressed through the vectors e_1, ..., e_s, thus contradicting the condition v ∉ U. This contradiction proves the theorem 4.7.
Theorem 4.8 (on completing the basis). Let U be a subspace in a finite-dimensional linear vector space V. Then any basis e_1, ..., e_s of U can be completed up to a basis e_1, ..., e_s, e_{s+1}, ..., e_n in V.

Proof. Let's denote U = U_0. If U_0 = V, then there is no need to complete the basis since e_1, ..., e_s is a basis in V. Otherwise, if U_0 ≠ V, then let's denote by e_{s+1} some arbitrary vector of V taken outside the subspace U_0. According to the above theorem 4.7, the vectors e_1, ..., e_s, e_{s+1} are linearly independent.
Let’s denote by U
1
the linear span of vectors e
1
, . . . , e
s
, e
s+1
. For the subspace
U
1
we have the same two mutually exclusive options U
1
= V or U
1
= V , as we
previously had for the subspace U
0
. If U
1
= V , then the process of completing the
basis e
1
, . . . , e
s
is over. Otherwise, we can iterate the process and get a chain of
subspaces enclosed into each other:
U
0
U
1
U
2
. . . .
This chain of subspaces cannot be infinite since the dimension of every next
subspace is one as greater than the dimension of previous subspace, and the
dimensions of all subspaces are not greater than the dimension of V . The process
of completing the basis will be finished in (n − s)-th step, where U
n−s
= V .
§ 5. Coordinates. Transformation of the coordinates of a vector under a change of basis.
Let V be some finite-dimensional linear vector space over the field K and let dim V = n. In this section we shall consider only finite-dimensional spaces. Let's choose a basis e_1, ..., e_n in V. Then an arbitrary vector x ∈ V can be expressed as a linear combination of the basis vectors:

x = x^1 · e_1 + ... + x^n · e_n.    (5.1)

The linear combination (5.1) is called the expansion of the vector x in the basis e_1, ..., e_n. Its coefficients x^1, ..., x^n are the elements of the numeric field K. They are called the components or the coordinates of the vector x in this basis.
We use upper indices for the literal notations of the coordinates of a vector x in (5.1). The usage of upper indices for the coordinates of vectors is determined by a special convention, which is known as the tensorial notation. It was introduced in order to simplify huge calculations in differential geometry and in the theory of relativity (see [2] and [3]). Other rules of the tensorial notation are discussed in the coordinate theory of tensors (see [7]¹).

¹ The reference [7] is added in 2004 to the English translation of this book.
Theorem 5.1. For any vector x ∈ V its expansion in a basis of a linear vector space V is unique.

Proof. The existence of an expansion (5.1) for a vector x follows from the item (2) of theorem 4.6. Assume that there is another expansion

x = x̃^1 · e_1 + ... + x̃^n · e_n.    (5.2)

Subtracting (5.1) from this equality, we get

0 = (x̃^1 − x^1) · e_1 + ... + (x̃^n − x^n) · e_n.    (5.3)

Since the basis vectors e_1, ..., e_n are linearly independent, from the equality (5.3) it follows that the linear combination (5.3) is trivial: x̃^i − x^i = 0. Then

x^1 = x̃^1, ..., x^n = x̃^n.

Hence the expansions (5.1) and (5.2) do coincide. The uniqueness of the expansion (5.1) is proved.
Having chosen some basis e_1, ..., e_n in a space V and expanding a vector x in this basis we can write its coordinates in the form of a column vector. Due to the theorem 5.1 this determines a bijective map ψ: V → K^n. It is easy to verify that

           ⎛ x^1 + y^1 ⎞                 ⎛ α · x^1 ⎞
ψ(x + y) = ⎜     ⋮     ⎟,     ψ(α · x) = ⎜    ⋮    ⎟.        (5.4)
           ⎝ x^n + y^n ⎠                 ⎝ α · x^n ⎠
The above formulas (5.4) show that a basis is a very convenient tool when dealing with vectors. In a basis, algebraic operations with vectors are replaced by algebraic operations with their coordinates, i. e. with numeric quantities. However, the coordinate approach has one disadvantage. The mapping ψ essentially depends on the basis we choose, and there is no canonic choice of basis. In general, no basis is preferable with respect to another. Therefore we should be ready to consider various bases and should be able to recalculate the coordinates of vectors when passing from one basis to another basis.
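Computing the coordinates x^1, ..., x^n of a vector in a concrete basis of R^n amounts to solving the linear system (5.1) for the coefficients. A hypothetical NumPy sketch (the basis chosen here is arbitrary and serves only as an illustration):

    import numpy as np

    # a basis e_1, e_2, e_3 of R^3, written as the columns of the matrix E
    E = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])

    x = np.array([2.0, 3.0, 1.0])

    coords = np.linalg.solve(E, x)        # the column of coordinates x^1, x^2, x^3
    print(coords)
    print(np.allclose(E @ coords, x))     # True: x = x^1*e_1 + x^2*e_2 + x^3*e_3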
Let e_1, ..., e_n and ẽ_1, ..., ẽ_n be two arbitrary bases in a linear vector space V. We shall call them the «wavy» basis and the «non-wavy» basis (because of the tilde sign we use for denoting the vectors of one of them). The non-wavy basis will also be called the initial basis or the old basis, and the wavy one will be called the new basis. Taking the i-th vector of the new (wavy) basis, we expand it in the old basis:

ẽ_i = S^1_i · e_1 + ... + S^n_i · e_n.    (5.5)

According to the tensorial notation, the coordinates of the vector ẽ_i in the expansion (5.5) are specified by the upper index. The lower index i specifies the number of the vector ẽ_i being expanded. Totally, in the expansion (5.5) we determine n² numbers; they are usually arranged into a matrix:
S =

S
1
1
. . . S
1
n
.
.
.
.
.
.
.
.
.
S
n
1
. . . S
n
n

. (5.6)
Upper index j of the matrix element S
j
i
specifies the row number; lower index i
specifies the column number. The matrix S in (5.6) the direct transition matrix
for passing from the old basis e
1
, . . . , e
n
to the new basis ˜e
1
, . . . , ˜ e
n
.
Swapping the bases e_1, …, e_n and ẽ_1, …, ẽ_n, we can write the expansion of the vector e_j in the wavy basis:

e_j = T^1_j ẽ_1 + … + T^n_j ẽ_n.    (5.7)

The coefficients of the expansion (5.7) determine the matrix T, which is called the inverse transition matrix. Certainly, the usage of the terms «direct» and «inverse» here is relative; it depends on which basis is considered as the old basis and which one is taken for the new one.

Theorem 5.2. The direct transition matrix S and the inverse transition matrix T determined by the expansions (5.5) and (5.7) are inverse to each other.

Remember that two square matrices are inverse to each other if their product is equal to the unit matrix: S T = 1. Here we do not define matrix multiplication assuming that it is known from the course of general algebra.
Proof. Let’s begin the proof of the theorem 5.2 by writing the relationships (5.5) and (5.7) in a brief symbolic form:

ẽ_i = Σ_{k=1}^{n} S^k_i e_k,    e_j = Σ_{i=1}^{n} T^i_j ẽ_i.    (5.8)

Then we substitute the first relationship (5.8) into the second one. This yields:

e_j = Σ_{i=1}^{n} T^i_j ( Σ_{k=1}^{n} S^k_i e_k ) = Σ_{k=1}^{n} ( Σ_{i=1}^{n} S^k_i T^i_j ) e_k.    (5.9)
The symbol δ^k_j, which is called the Kronecker symbol, is used for denoting the following numeric array:

δ^k_j = { 1 for k = j,
        { 0 for k ≠ j.    (5.10)

We apply the Kronecker symbol determined in (5.10) in order to transform the left hand side of the equality (5.9):

e_j = Σ_{k=1}^{n} δ^k_j e_k.    (5.11)

Both equalities (5.11) and (5.9) represent the same vector e_j expanded in the same basis e_1, …, e_n. Due to the theorem 5.1 on the uniqueness of the expansion of a vector in a basis we have the equality

Σ_{i=1}^{n} S^k_i T^i_j = δ^k_j.

It is easy to note that this equality is equivalent to the matrix equality S T = 1. The theorem is proved.
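As an illustration (this numerical sketch is not part of the original text, and all variable names in it are ad hoc), the equality S T = 1 can be checked for two explicit bases of K³ with a few lines of Python:

    import numpy as np

    # Columns of E_old and E_new are the vectors of the old and the new basis of K^3.
    E_old = np.eye(3)
    E_new = np.array([[1., 1., 0.],
                      [0., 1., 1.],
                      [1., 0., 1.]])

    # Direct transition matrix S: its i-th column holds the coordinates of the i-th
    # new basis vector in the old basis, i.e. E_new = E_old @ S, cf. (5.5).
    S = np.linalg.solve(E_old, E_new)
    # Inverse transition matrix T: coordinates of the old vectors in the new basis, cf. (5.7).
    T = np.linalg.solve(E_new, E_old)

    # Theorem 5.2: S T = 1 (and hence det S · det T = 1).
    assert np.allclose(S @ T, np.eye(3))
    assert np.isclose(np.linalg.det(S) * np.linalg.det(T), 1.0)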
Corollary. The direct transition matrix S and the inverse transition matrix T both are non-degenerate matrices and det S · det T = 1.

Proof. The relationship det S · det T = 1 follows from the matrix equality S T = 1, which was proved just above. This fact is well known from the course of general algebra. If the product of two numbers is equal to unity, then neither of these two numbers can be equal to zero:

det S ≠ 0,    det T ≠ 0.

This proves the non-degeneracy of the transition matrices S and T. The corollary is proved.
Theorem 5.3. Every non-degenerate n × n matrix S can be obtained as a transition matrix for passing from some basis e_1, …, e_n to some other basis ẽ_1, …, ẽ_n in a linear vector space V of dimension n.

Proof. Let’s choose an arbitrary basis e_1, …, e_n in V and fix it. Then let’s determine the other n vectors ẽ_1, …, ẽ_n by means of the relationships (5.5) and prove that they are linearly independent. For this purpose we consider a linear combination of these vectors that is equal to zero:

α^1 ẽ_1 + … + α^n ẽ_n = 0.    (5.12)

Substituting (5.5) into this equality, one can transform it to the following one:

( Σ_{i=1}^{n} S^1_i α^i ) e_1 + … + ( Σ_{i=1}^{n} S^n_i α^i ) e_n = 0.

Since the basis vectors e_1, …, e_n are linearly independent, it follows that all the sums enclosed within the brackets in the above equality are equal to zero. Writing
these sums in expanded form, we get a homogeneous system of linear algebraic equations with respect to the variables α^1, …, α^n:

S^1_1 α^1 + … + S^1_n α^n = 0,
. . . . . . . . . . . . . . . . . . .
S^n_1 α^1 + … + S^n_n α^n = 0.

The matrix of coefficients of this system coincides with S. From the course of algebra we know that each homogeneous system of linear equations with a non-degenerate square matrix has a unique solution, which is purely zero:

α^1 = … = α^n = 0.

This means that an arbitrary linear combination (5.12) which is equal to zero is necessarily trivial. Hence, ẽ_1, …, ẽ_n is a linearly independent system of vectors. Applying the proposition (4) from the theorem 4.5 to these vectors, we find that they form a basis in V, while the matrix S appears to be the direct transition matrix for passing from e_1, …, e_n to ẽ_1, …, ẽ_n. The theorem is proved.
Let’s consider two bases e_1, …, e_n and ẽ_1, …, ẽ_n in a linear vector space V related by the transition matrix S. Let x be some arbitrary vector of the space V. It can be expanded in each of these two bases:

x = Σ_{k=1}^{n} x^k e_k,    x = Σ_{i=1}^{n} x̃^i ẽ_i.    (5.13)

Once the coordinates of x in one of these two bases are fixed, this fixes the vector x itself, and, hence, this fixes its coordinates in the other basis.
Theorem 5.4. The coordinates of a vector x in two bases e_1, …, e_n and ẽ_1, …, ẽ_n of a linear vector space V are related by the formulas

x^k = Σ_{i=1}^{n} S^k_i x̃^i,    x̃^i = Σ_{k=1}^{n} T^i_k x^k,    (5.14)

where S and T are the direct and inverse transition matrices for the passage from e_1, …, e_n to ẽ_1, …, ẽ_n, i. e. when e_1, …, e_n is treated as an old basis and ẽ_1, …, ẽ_n is treated as a new one.

The relationships (5.14) are known as the transformation formulas for the coordinates of a vector under a change of basis.
Proof. In order to prove the first relationship (5.14) we substitute the expansion of the vector ẽ_i taken from (5.8) into the second relationship (5.13):

x = Σ_{i=1}^{n} x̃^i ( Σ_{k=1}^{n} S^k_i e_k ) = Σ_{k=1}^{n} ( Σ_{i=1}^{n} S^k_i x̃^i ) e_k.

Comparing this expansion of x with the first expansion (5.13) and applying the theorem on the uniqueness of the expansion of a vector in a basis, we derive

x^k = Σ_{i=1}^{n} S^k_i x̃^i.

This is exactly the first transformation formula (5.14). The second formula (5.14) is proved similarly.
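The formulas (5.14) are easy to try out numerically. The following sketch (added for illustration, not part of the original text; the variable names are ad hoc) continues the example above: the coordinate columns of one and the same vector in the two bases of K³ are related by x = S x̃ and x̃ = T x.

    import numpy as np

    E_old = np.eye(3)                          # old basis (columns)
    E_new = np.array([[1., 1., 0.],
                      [0., 1., 1.],
                      [1., 0., 1.]])           # new basis (columns)
    S = np.linalg.solve(E_old, E_new)          # direct transition matrix
    T = np.linalg.inv(S)                       # inverse transition matrix

    x = np.array([2., -1., 3.])                # coordinates of a vector in the old basis
    x_new = T @ x                              # its coordinates in the new basis, cf. (5.14)

    # Both coordinate columns describe the same geometric vector:
    assert np.allclose(E_old @ x, E_new @ x_new)
    # The first formula (5.14) recovers the old coordinates:
    assert np.allclose(x, S @ x_new)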
§ 6. Intersections and sums of subspaces.
Suppose that we have a certain number of subspaces in a linear vector space V. In order to designate this fact we write U_i ⊂ V, where i ∈ I. If the number of subspaces is finite or countably infinite, they can be enumerated by the positive integers. However, in the general case we should enumerate the subspaces by the elements of some indexing set I, which can be finite, countable, or even uncountable. Let’s denote by U and by S the intersection and the union of all the subspaces that we consider:

U = ⋂_{i∈I} U_i,    S = ⋃_{i∈I} U_i.    (6.1)
Theorem 6.1. The intersection of an arbitrary number of subspaces in a linear vector space V is a subspace in V.

Proof. The set U in (6.1) is not empty since the zero vector is an element of each subspace U_i. Let’s verify the conditions (1) and (2) from the definition 2.2 for U. Suppose that u_1, u_2, and u are vectors from the subset U. Then they belong to U_i for each i ∈ I. However, U_i is a subspace, hence, u_1 + u_2 ∈ U_i and α u ∈ U_i for any i ∈ I and for any α ∈ K. Therefore, u_1 + u_2 ∈ U and α u ∈ U. The theorem is proved.
In general, the subset S in (6.1) is not a subspace. Therefore we need to introduce the following concept.

Definition 6.1. The linear span of the union of subspaces U_i, i ∈ I, is called the sum of these subspaces.

To denote the sum of subspaces W = ⟨S⟩ we use the standard summation sign:

W = ⟨ ⋃_{i∈I} U_i ⟩ = Σ_{i∈I} U_i.

Theorem 6.2. A vector w of a linear vector space V belongs to the sum of subspaces U_i, i ∈ I, if and only if it can be represented as a sum of a finite number of vectors each of which is taken from some of these subspaces:

w = u_{i_1} + … + u_{i_k},  where u_{i_m} ∈ U_{i_m}.    (6.2)
Proof. Let S be the union of the subspaces U_i ⊂ V, i ∈ I. Suppose that w ∈ W. Then w is a linear combination of a finite number of vectors taken from S:

w = α_1 s_1 + … + α_k s_k.

But S is the union of the subspaces U_i. Therefore, s_m ∈ U_{i_m} and α_m s_m = u_{i_m} ∈ U_{i_m}, where m = 1, …, k. This leads to the equality (6.2) for the vector w.

Conversely, suppose that w is a vector given by the formula (6.2). Then u_{i_m} ∈ U_{i_m} and U_{i_m} ⊂ S, i. e. u_{i_m} ∈ S. Therefore, the vector w belongs to the linear span of S. The theorem is proved.

Definition 6.2. The sum W of subspaces U_i, i ∈ I, is called the direct sum, if for any vector w ∈ W the expansion (6.2) is unique. In this case for the direct sum of subspaces we use the special notation:

W = ⨁_{i∈I} U_i.
Theorem 6.3. Let W = U_1 + … + U_k be the sum of a finite number of finite-dimensional subspaces. The dimension of W is equal to the sum of the dimensions of the subspaces U_i if and only if W is the direct sum: W = U_1 ⊕ … ⊕ U_k.

Proof. Let’s choose a basis in each subspace U_i. Suppose that dim U_i = s_i and let e_{i 1}, …, e_{i s_i} be a basis in U_i. Let’s join the vectors of all these bases into one system, ordering them alphabetically:

e_{1 1}, …, e_{1 s_1}, …, e_{k 1}, …, e_{k s_k}.    (6.3)

Due to the equality W = U_1 + … + U_k for an arbitrary vector w of the subspace W we have the expansion (6.2):

w = u_1 + … + u_k,  where u_i ∈ U_i.    (6.4)

Expanding each vector u_i of (6.4) in the basis of the corresponding subspace U_i, we get an expansion of w in the vectors of the system (6.3). Hence, (6.3) is a spanning system of vectors in W (though, in the general case, it is not a minimal spanning system).
If dim W = dim U_1 + … + dim U_k, then the number of vectors in (6.3) cannot be reduced. Therefore (6.3) is a basis in W. From any expansion (6.4) we can derive the following expansion of the vector w in the basis (6.3):

w = ( Σ_{j=1}^{s_1} α_{1 j} e_{1 j} ) + … + ( Σ_{j=1}^{s_k} α_{k j} e_{k j} ).    (6.5)

The sums enclosed into the round brackets in (6.5) are determined by the expansions of the vectors u_1, …, u_k in the bases of the corresponding subspaces U_1, …, U_k:

u_i = Σ_{j=1}^{s_i} α_{i j} e_{i j}.    (6.6)

Due to (6.6) the existence of two different expansions (6.4) for some vector w would mean the existence of two different expansions (6.5) of this vector in the basis (6.3). Hence, the expansion (6.4) is unique and the sum of subspaces W = U_1 + … + U_k is the direct sum.
Conversely, suppose that W = U_1 ⊕ … ⊕ U_k. We know that the vectors (6.3) span the subspace W. Let’s prove that they are linearly independent. For this purpose we consider a linear combination of these vectors that is equal to zero:

0 = ( Σ_{j=1}^{s_1} α_{1 j} e_{1 j} ) + … + ( Σ_{j=1}^{s_k} α_{k j} e_{k j} ).    (6.7)

Let’s denote by ũ_1, …, ũ_k the values of the sums enclosed into the round brackets in (6.7). It is easy to see that ũ_i ∈ U_i; therefore, (6.7) is an expansion of the form (6.4) for the vector w = 0. But 0 = 0 + … + 0 with 0 ∈ U_i is another expansion for the vector w = 0. However, W = U_1 ⊕ … ⊕ U_k; therefore, the expansion 0 = 0 + … + 0 is the unique expansion of the form (6.4) for the zero vector w = 0. Then we have the equalities

0 = Σ_{j=1}^{s_i} α_{i j} e_{i j}   for all i = 1, …, k.

It’s clear that these equalities are the expansions of the zero vector in the bases of the subspaces U_i. Hence, α_{i j} = 0. This means that the linear combination (6.7) is trivial, and (6.3) is a linearly independent system of vectors. Thus, being a spanning system and being linearly independent, the system of vectors (6.3) is a basis of W. Now we can find the dimension of the subspace W by counting the number of vectors in (6.3): dim W = s_1 + … + s_k = dim U_1 + … + dim U_k. The theorem is proved.
Note. If the sum of subspaces W = U_1 + … + U_k is not necessarily the direct sum, the vectors (6.3), nevertheless, form a spanning system in W. But they do not necessarily form a linearly independent system in this case. Therefore, we have

dim W ≤ dim U_1 + … + dim U_k.    (6.8)

Sharpening this inequality in the general case is rather complicated. We shall do it for the case of two subspaces.

Theorem 6.4. The dimension of the sum of two arbitrary finite-dimensional subspaces U_1 and U_2 in a linear vector space V is equal to the sum of their dimensions minus the dimension of their intersection:

dim(U_1 + U_2) = dim U_1 + dim U_2 − dim(U_1 ∩ U_2).    (6.9)
Proof. From the inclusion U_1 ∩ U_2 ⊂ U_1 and from the inequality (6.8) we conclude that all the subspaces considered in the theorem are finite-dimensional. Let’s denote dim(U_1 ∩ U_2) = s and choose a basis e_1, …, e_s in the intersection U_1 ∩ U_2. Due to the inclusion U_1 ∩ U_2 ⊂ U_1 we can apply the theorem 4.8 on completing the basis. This theorem says that we can complete the basis e_1, …, e_s of the intersection U_1 ∩ U_2 up to a basis e_1, …, e_s, e_{s+1}, …, e_{s+p} in U_1. For the dimension of U_1 we have dim U_1 = s + p. In a similar way, due to the inclusion U_1 ∩ U_2 ⊂ U_2 we can construct a basis e_1, …, e_s, e_{s+p+1}, …, e_{s+p+q} in U_2. For the dimension of U_2 this yields dim U_2 = s + q.
Now let’s join together the two bases constructed above with the use of the theorem 4.8 and consider the total set of vectors in them:

e_1, …, e_s, e_{s+1}, …, e_{s+p}, e_{s+p+1}, …, e_{s+p+q}.    (6.10)

Let’s prove that these vectors (6.10) form a basis in the sum of subspaces U_1 + U_2. Let w be some arbitrary vector in U_1 + U_2. The relationship (6.2) for this vector is written as w = u_1 + u_2. Let’s expand the vectors u_1 and u_2 in the above two bases of the subspaces U_1 and U_2 respectively:

u_1 = Σ_{i=1}^{s} α^i e_i + Σ_{j=1}^{p} β^{s+j} e_{s+j},

u_2 = Σ_{i=1}^{s} α̃^i e_i + Σ_{j=1}^{q} γ^{s+p+j} e_{s+p+j}.

Adding these two equalities, we find that the vector w is linearly expressed through the vectors (6.10). Hence, (6.10) is a spanning system of vectors in U_1 + U_2.
In order to prove that (6.10) is a linearly independent system of vectors we consider a linear combination of these vectors that is equal to zero:

Σ_{i=1}^{s+p} α^i e_i + Σ_{i=1}^{q} α^{s+p+i} e_{s+p+i} = 0.    (6.11)

Then we transform this equality by moving the second sum to the right hand side:

Σ_{i=1}^{s+p} α^i e_i = − Σ_{i=1}^{q} α^{s+p+i} e_{s+p+i}.

Let’s denote by u the value of the left and right sides of this equality. Then for the vector u we get the following two expressions:

u = Σ_{i=1}^{s+p} α^i e_i,    u = − Σ_{i=1}^{q} α^{s+p+i} e_{s+p+i}.    (6.12)

Because of the first expression (6.12) we have u ∈ U_1, while the second expression (6.12) yields u ∈ U_2. Hence, u ∈ U_1 ∩ U_2. This means that we can expand the vector u in the basis e_1, …, e_s:

u = Σ_{i=1}^{s} β^i e_i.    (6.13)

Comparing this expansion with the second expression (6.12), we find that

Σ_{i=1}^{s} β^i e_i + Σ_{i=1}^{q} α^{s+p+i} e_{s+p+i} = 0.    (6.14)
Note that the vectors e_1, …, e_s, e_{s+p+1}, …, e_{s+p+q} form a basis in U_2. They are linearly independent. Therefore, all the coefficients in (6.14) are equal to zero. In particular, we have the following equalities:

α^{s+p+1} = … = α^{s+p+q} = 0.    (6.15)

Moreover, β^1 = … = β^s = 0. Due to (6.13) this means that u = 0. Now from the first expression (6.12) we get the equality

Σ_{i=1}^{s+p} α^i e_i = 0.

Since e_1, …, e_s, e_{s+1}, …, e_{s+p} are linearly independent vectors, all the coefficients α^i in the above equality should be zero:

α^1 = … = α^s = α^{s+1} = … = α^{s+p} = 0.    (6.16)

Combining (6.15) and (6.16), we see that the linear combination (6.11) is trivial. This means that the vectors (6.10) are linearly independent. Hence, they form a basis in U_1 + U_2. For the dimension of the subspace U_1 + U_2 this yields

dim(U_1 + U_2) = s + p + q = (s + p) + (s + q) − s = dim U_1 + dim U_2 − dim(U_1 ∩ U_2).

Thus, the relationship (6.9) and the theorem 6.4 as a whole are proved.
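The relationship (6.9) can also be checked on a numerical example. The following sketch (added for illustration, not part of the original text; all names are ad hoc) builds two subspaces of K⁵ whose intersection is two-dimensional by construction and compares the dimensions, computed as matrix ranks:

    import numpy as np

    rng = np.random.default_rng(0)
    # Four vectors a, b, c, d in K^5, linearly independent with probability 1.
    a, b, c, d = rng.standard_normal((4, 5))
    assert np.linalg.matrix_rank(np.column_stack([a, b, c, d])) == 4

    U1 = np.column_stack([a, b, c])     # columns span U1, dim U1 = 3
    U2 = np.column_stack([a, b, d])     # columns span U2, dim U2 = 3
    # By construction U1 ∩ U2 = span{a, b}, so dim(U1 ∩ U2) = 2.

    dim_U1 = np.linalg.matrix_rank(U1)
    dim_U2 = np.linalg.matrix_rank(U2)
    dim_sum = np.linalg.matrix_rank(np.column_stack([U1, U2]))   # dim(U1 + U2)

    # Formula (6.9): dim(U1 + U2) = dim U1 + dim U2 - dim(U1 ∩ U2).
    assert dim_sum == dim_U1 + dim_U2 - 2      # 4 == 3 + 3 - 2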
§ 7. Cosets of a subspace. The concept of factorspace.
Let V be a linear vector space and let U be a subspace in it. A coset of the subspace U determined by a vector v ∈ V is the following set of vectors¹:

Cl_U(v) = { w ∈ V : w − v ∈ U }.    (7.1)

The vector v in (7.1) is called a representative of the coset (7.1). The coset Cl_U(v) is a very simple thing: it is obtained by adding the vector v to all vectors of the subspace U. The coset represented by the zero vector is especially simple since Cl_U(0) = U. It is called the zero coset.

Theorem 7.1. The cosets of a subspace U in a linear vector space V possess the following properties:
(1) a ∈ Cl_U(a) for any a ∈ V;
(2) if a ∈ Cl_U(b), then b ∈ Cl_U(a);
(3) if a ∈ Cl_U(b) and b ∈ Cl_U(c), then a ∈ Cl_U(c).

Proof. The first proposition is obvious. Indeed, the difference a − a is equal to the zero vector, which is an element of any subspace: a − a = 0 ∈ U. Hence, due to the formula (7.1), which is the formal definition of cosets, we have a ∈ Cl_U(a).
¹ We used the sign Cl for cosets since in Russia they are called adjacency classes.
Let a ∈ Cl_U(b). Then a − b ∈ U. For b − a we have b − a = (−1) (a − b). Therefore, b − a ∈ U and b ∈ Cl_U(a) (see the formula (7.1) and the definition 2.2). The second proposition is proved.

Let a ∈ Cl_U(b) and b ∈ Cl_U(c). Then a − b ∈ U and b − c ∈ U. Note that a − c = (a − b) + (b − c). Hence, a − c ∈ U and a ∈ Cl_U(c) (see the formula (7.1) and the definition 2.2 again). The third proposition is proved. This completes the proof of the theorem as a whole.
Let a ∈ Cl_U(b). This condition establishes some kind of dependence between the two vectors a and b. This dependence is not strict: the condition a ∈ Cl_U(b) does not exclude the possibility that a′ ∈ Cl_U(b) for some other vector a′. Such non-strict dependences in mathematics are described by the concept of a binary relation (see details in [1] and [4]). Let’s write a ∼ b as an abbreviation for a ∈ Cl_U(b). Then the theorem 7.1 reveals the following properties of the binary relation a ∼ b introduced just above:

(1) reflexivity: a ∼ a;
(2) symmetry: a ∼ b implies b ∼ a;
(3) transitivity: a ∼ b and b ∼ c imply a ∼ c.

A binary relation possessing the properties of reflexivity, symmetry, and transitivity is called an equivalence relation. Each equivalence relation defined on a set V partitions this set into a union of mutually non-intersecting subsets, which are called the equivalence classes:

Cl(v) = { w ∈ V : w ∼ v }.    (7.2)

In our particular case the formal definition (7.2) coincides with the formal definition (7.1). In order to keep the presentation self-contained we shall not use the notation a ∼ b in place of a ∈ Cl_U(b) anymore, and we shall not refer to the theory of binary relations (though it is simple and well known). Instead, we shall derive the result on partitioning V into mutually non-intersecting cosets from the following theorem.
Theorem 7.2. If two cosets Cl_U(a) and Cl_U(b) of a subspace U ⊂ V are intersecting, then they do coincide.

Proof. Assume that the intersection of the two cosets Cl_U(a) and Cl_U(b) is not empty. Then there is an element c belonging to both of them: c ∈ Cl_U(a) and c ∈ Cl_U(b). Due to the proposition (2) of the above theorem 7.1 we derive b ∈ Cl_U(c). Combining b ∈ Cl_U(c) and c ∈ Cl_U(a) and applying the proposition (3) of the theorem 7.1, we get b ∈ Cl_U(a). The opposite inclusion a ∈ Cl_U(b) then is obtained by applying the proposition (2) of the theorem 7.1.

Let’s prove that the two cosets Cl_U(a) and Cl_U(b) do coincide. For this purpose let’s consider an arbitrary vector x ∈ Cl_U(a). From x ∈ Cl_U(a) and a ∈ Cl_U(b) we derive x ∈ Cl_U(b). Hence, Cl_U(a) ⊂ Cl_U(b). The opposite inclusion Cl_U(b) ⊂ Cl_U(a) is proved similarly. From these two inclusions we derive Cl_U(a) = Cl_U(b). The theorem is proved.
The set of all cosets of a subspace U in a linear vector space V is called the factorset or the quotient set V/U. Due to the theorem proved just above any two different cosets Q_1 and Q_2 from the factorset V/U have the empty intersection Q_1 ∩ Q_2 = ∅, while the union of all cosets coincides with V:

V = ⋃_{Q ∈ V/U} Q.

This equality is a consequence of the fact that any vector v ∈ V is an element of some coset: v ∈ Q. This coset Q is determined by v according to the formula Q = Cl_U(v). For this reason the following theorem is a simple reformulation of the definition of cosets.

Theorem 7.3. Two vectors v and w belong to the same coset of a subspace U if and only if their difference v − w is a vector of U.
Definition 7.1. Let Q_1 and Q_2 be two cosets of a subspace U. The sum of the cosets Q_1 and Q_2 is the coset Q of the subspace U determined by the equality Q = Cl_U(v_1 + v_2), where v_1 ∈ Q_1 and v_2 ∈ Q_2.

Definition 7.2. Let Q be a coset of a subspace U. The product of Q and a number α ∈ K is the coset P of the subspace U determined by the relationship P = Cl_U(α v), where v ∈ Q.

For the addition of cosets and for the multiplication of them by numbers we use the same signs of algebraic operations as in the case of vectors, i. e. Q = Q_1 + Q_2 and P = α Q. The definitions 7.1 and 7.2 can be expressed by the formulas

Cl_U(v_1) + Cl_U(v_2) = Cl_U(v_1 + v_2),
α Cl_U(v) = Cl_U(α v).    (7.3)
These definitions require some comments. Indeed, the coset Q = Q_1 + Q_2 in the definition 7.1 and the coset P = α Q in the definition 7.2 both are determined using some representative vectors v_1 ∈ Q_1, v_2 ∈ Q_2, and v ∈ Q. The choice of a representative vector in a coset is not unique; therefore, we need especially to prove the uniqueness of the results of the algebraic operations determined in the definitions 7.1 and 7.2. This proof is called the proof of correctness.
Theorem 7.4. The definitions 7.1 and 7.2 are correct and the results of the algebraic operations of coset addition and of coset multiplication by numbers do not depend on the choice of representatives in the cosets.

Proof. To begin with, we study the operation of coset addition. Let’s consider two different choices of representatives within the cosets Q_1 and Q_2. Let v_1, ṽ_1 be two vectors of Q_1 and let v_2, ṽ_2 be two vectors of Q_2. Then due to the theorem 7.3 we have the following two conditions:

ṽ_1 − v_1 ∈ U,    ṽ_2 − v_2 ∈ U.

Hence, (ṽ_1 + ṽ_2) − (v_1 + v_2) = (ṽ_1 − v_1) + (ṽ_2 − v_2) ∈ U. This means that the cosets determined by the vectors ṽ_1 + ṽ_2 and v_1 + v_2 do coincide with each other:

Cl_U(ṽ_1 + ṽ_2) = Cl_U(v_1 + v_2).
This proves the correctness of the definition 7.1 for the operation of coset addition.

Now let’s consider two different representatives v and ṽ within the coset Q. Then ṽ − v ∈ U. Hence, α ṽ − α v = α (ṽ − v) ∈ U. This yields

Cl_U(α ṽ) = Cl_U(α v),

which proves the correctness of the definition 7.2 for the operation of multiplication of cosets by numbers.
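In computations, a coset of a subspace U ⊂ K^n is usually handled through a representative vector, and the theorem 7.3 gives the membership test. The following sketch (added for illustration, not part of the original text; the subspace and all names are ad hoc) checks the correctness statements of the theorem 7.4 on a concrete example:

    import numpy as np

    A = np.array([[1., 0.],
                  [0., 1.],
                  [0., 0.]])        # columns of A span the subspace U ⊂ K^3

    def same_coset(v, w, A=A):
        """Theorem 7.3: v and w lie in the same coset iff v - w ∈ U = column span of A."""
        d = v - w
        # d ∈ U  <=>  appending d to the columns of A does not raise the rank.
        return np.linalg.matrix_rank(np.column_stack([A, d])) == np.linalg.matrix_rank(A)

    v1, v1_alt = np.array([1., 2., 5.]), np.array([7., -3., 5.])    # two representatives of Q1
    v2, v2_alt = np.array([0., 1., -2.]), np.array([4., 4., -2.])   # two representatives of Q2

    assert same_coset(v1, v1_alt) and same_coset(v2, v2_alt)
    # Correctness of definition 7.1: the sum coset does not depend on the representatives.
    assert same_coset(v1 + v2, v1_alt + v2_alt)
    # Correctness of definition 7.2 for multiplication by a number.
    assert same_coset(3.0 * v1, 3.0 * v1_alt)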
Theorem 7.5. The factorset V/U of a linear vector space V over a subspace
U equipped with algebraic operations (7.3) is a linear vector space. This space is
called the factorspace or the quotient space of the space V over its subspace U.
Proof. The proof of this theorem consists in verifying the axioms (1)-(8) of a
linear vector space for V/U. The commutativity and associativity axioms for the
operation of coset addition follow from the following calculations:
Cl_U(v_1) + Cl_U(v_2) = Cl_U(v_1 + v_2) = Cl_U(v_2 + v_1) = Cl_U(v_2) + Cl_U(v_1),

(Cl_U(v_1) + Cl_U(v_2)) + Cl_U(v_3) = Cl_U(v_1 + v_2) + Cl_U(v_3) =
    = Cl_U((v_1 + v_2) + v_3) = Cl_U(v_1 + (v_2 + v_3)) =
    = Cl_U(v_1) + Cl_U(v_2 + v_3) = Cl_U(v_1) + (Cl_U(v_2) + Cl_U(v_3)).

In essence, they follow from the corresponding axioms for the operation of vector addition (see definition 2.1).
In order to verify the axiom (3) we should have a zero element in V/U. The zero coset 0 = Cl_U(0) is the best pretender for this role:

Cl_U(v) + Cl_U(0) = Cl_U(v + 0) = Cl_U(v).

In verifying the axiom (4) we should indicate the opposite coset Q′ for a coset Q = Cl_U(v). We define it as follows: Q′ = Cl_U(v′). Then

Q + Q′ = Cl_U(v) + Cl_U(v′) = Cl_U(v + v′) = Cl_U(0) = 0.
The remaining axioms (5)-(8) are verified by direct calculations on the base of the formula (7.3) for the coset operations. Here are these calculations:

α (Cl_U(v_1) + Cl_U(v_2)) = α Cl_U(v_1 + v_2) =
    = Cl_U(α (v_1 + v_2)) = Cl_U(α v_1 + α v_2) =
    = Cl_U(α v_1) + Cl_U(α v_2) = α Cl_U(v_1) + α Cl_U(v_2),

(α + β) Cl_U(v) = Cl_U((α + β) v) = Cl_U(α v + β v) =
    = Cl_U(α v) + Cl_U(β v) = α Cl_U(v) + β Cl_U(v),

α (β Cl_U(v)) = α Cl_U(β v) = Cl_U(α (β v)) =
    = Cl_U((αβ) v) = (αβ) Cl_U(v),

1 Cl_U(v) = Cl_U(1 v) = Cl_U(v).
The above equalities complete the verification of the fact that the factorset V/U possesses the structure of a linear vector space.

Note that in verifying the axiom (4) we have defined the opposite coset Q′ for a coset Q = Cl_U(v) by means of the relationship Q′ = Cl_U(v′), where v′ is the opposite vector for v. One could check the correctness of this definition. However, this is not necessary since, due to the property (10) (see theorem 2.1), the opposite coset Q′ for Q is unique.
The concept of factorspace is equally applicable to finite-dimensional and to infinite-dimensional spaces V. The finite or infinite dimensionality of a subspace U also makes no difference. The only simplification in the finite-dimensional case is that we can calculate the dimension of the factorspace V/U.

Theorem 7.6. If a linear vector space V is finite-dimensional, then for any subspace U of it the factorspace V/U is also finite-dimensional and its dimension is determined by the following formula:

dim U + dim(V/U) = dim V.    (7.4)
Proof. If U = V, then the factorspace V/U consists of the zero coset only: V/U = {0}. The dimension of such a zero space is equal to zero. Hence, the equality (7.4) in this trivial case is fulfilled.

Let’s consider the nontrivial case U ≠ V. Due to the theorem 4.5 the subspace U is finite-dimensional. Denote dim V = n and dim U = s; then s < n. Let’s choose a basis e_1, …, e_s in U and, according to the theorem 4.8, complete it with the vectors e_{s+1}, …, e_n up to a basis in V. For each of the complementary vectors e_{s+1}, …, e_n we consider the corresponding coset of the subspace U:

E_1 = Cl_U(e_{s+1}), …, E_{n−s} = Cl_U(e_n).    (7.5)
Now let’s show that the cosets (7.5) span the factorspace V/U. Indeed, let Q be an arbitrary coset in V/U and let v ∈ Q be some representative vector of this coset. Let’s expand the vector v in the above basis of V:

v = (α^1 e_1 + … + α^s e_s) + β^1 e_{s+1} + … + β^{n−s} e_n.

Let’s denote by u the initial part of this expansion: u = α^1 e_1 + … + α^s e_s. It is clear that u ∈ U. Then we can write

v = u + β^1 e_{s+1} + … + β^{n−s} e_n.

Since u ∈ U, we have Cl_U(u) = 0. For the coset Q = Cl_U(v) this equality yields Q = β^1 Cl_U(e_{s+1}) + … + β^{n−s} Cl_U(e_n). Hence, we have

Q = β^1 E_1 + … + β^{n−s} E_{n−s}.

This means that E_1, …, E_{n−s} is a finite spanning system in V/U. Therefore, V/U is a finite-dimensional linear vector space. To determine its dimension we
shall prove that the cosets (7.5) are linearly independent. Indeed, let’s consider a linear combination of these cosets that is equal to zero:

γ^1 E_1 + … + γ^{n−s} E_{n−s} = 0.    (7.6)

Passing from the cosets to their representative vectors, from (7.6) we derive

γ^1 Cl_U(e_{s+1}) + … + γ^{n−s} Cl_U(e_n) = Cl_U(γ^1 e_{s+1} + … + γ^{n−s} e_n) = Cl_U(0).

Let’s denote u = γ^1 e_{s+1} + … + γ^{n−s} e_n. From the above equality for this vector we get Cl_U(u) = Cl_U(0), which means u ∈ U. Let’s expand u in the basis of the subspace U: u = α^1 e_1 + … + α^s e_s. Then, equating the two expressions for the vector u, we get the following equality:

−α^1 e_1 − … − α^s e_s + γ^1 e_{s+1} + … + γ^{n−s} e_n = 0.

This is a linear combination of the basis vectors of V which is equal to zero. The basis vectors e_1, …, e_n are linearly independent. Hence, this linear combination is trivial and γ^1 = … = γ^{n−s} = 0. This proves the triviality of the linear combination (7.6) and, therefore, the linear independence of the cosets (7.5). Thus, for the dimension of the factorspace this yields dim(V/U) = n − s, which proves the equality (7.4). The theorem is proved.
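The proof also suggests how the formula (7.4) looks in coordinates: complete a basis of U up to a basis of V; the cosets of the complementary vectors form a basis of V/U. The following sketch (added for illustration, not from the original text; all names are ad hoc) carries out the completion step numerically for a random subspace of K⁵ and checks the dimension count:

    import numpy as np

    n = 5
    rng = np.random.default_rng(1)
    B_U = rng.standard_normal((n, 2))          # columns: a basis of a subspace U ⊂ K^5
    dim_U = np.linalg.matrix_rank(B_U)         # expected 2

    # Complete the basis of U up to a basis of V = K^5 (theorem 4.8): keep those
    # standard unit vectors that enlarge the span.
    basis = [B_U[:, 0], B_U[:, 1]]
    for e in np.eye(n):
        cand = np.column_stack(basis + [e])
        if np.linalg.matrix_rank(cand) > len(basis):
            basis.append(e)
    B_V = np.column_stack(basis)               # a basis of V containing the basis of U

    dim_V = np.linalg.matrix_rank(B_V)         # 5
    dim_factor = dim_V - dim_U                 # theorem 7.6: dim(V/U) = dim V - dim U
    assert dim_U + dim_factor == dim_V         # 2 + 3 == 5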
§ 8. Linear mappings.
Definition 8.1. Let V and W be two linear vector spaces over a numeric field K. A mapping f: V → W from the space V to the space W is called a linear mapping if the following two conditions are fulfilled:
(1) f(v_1 + v_2) = f(v_1) + f(v_2) for any two vectors v_1, v_2 ∈ V;
(2) f(α v) = α f(v) for any vector v ∈ V and for any number α ∈ K.

The relationship f(0) = 0 is one of the simplest and immediate consequences of the above two properties (1) and (2) of linear mappings. Indeed, we have

f(0) = f(0 + (−1) 0) = f(0) + (−1) f(0) = 0.    (8.1)
Theorem 8.1. Linear mappings possess the following three properties:
(1) the identical mapping id_V: V → V of a linear vector space V onto itself is a linear mapping;
(2) the composition of any two linear mappings f: V → W and g: W → U is a linear mapping g ◦ f: V → U;
(3) if a linear mapping f: V → W is bijective, then the inverse mapping f^{−1}: W → V also is a linear mapping.

Proof. The linearity of the identical mapping is obvious. Indeed, here is the verification of the conditions (1) and (2) from the definition 8.1 for id_V:

id_V(v_1 + v_2) = v_1 + v_2 = id_V(v_1) + id_V(v_2),
id_V(α v) = α v = α id_V(v).
Let’s prove the second proposition of the theorem 8.1. Consider the composition g ◦ f of two linear mappings f and g. For this composition the conditions (1) and (2) from the definition 8.1 are verified as follows:

g ◦ f(v_1 + v_2) = g(f(v_1 + v_2)) = g(f(v_1) + f(v_2)) =
    = g(f(v_1)) + g(f(v_2)) = g ◦ f(v_1) + g ◦ f(v_2),

g ◦ f(α v) = g(f(α v)) = g(α f(v)) = α g(f(v)) = α g ◦ f(v).
Now let’s prove the third proposition of the theorem 8.1. Suppose that f: V → W is a bijective linear mapping. Then it possesses a unique bilateral inverse mapping f^{−1}: W → V (see theorem 1.9). Let’s denote

z_1 = f^{−1}(w_1 + w_2) − f^{−1}(w_1) − f^{−1}(w_2),
z_2 = f^{−1}(α w) − α f^{−1}(w).

It is obvious that the linearity of the inverse mapping f^{−1} is equivalent to the vanishing of z_1 and z_2. Let’s apply f to these vectors:

f(z_1) = f(f^{−1}(w_1 + w_2) − f^{−1}(w_1) − f^{−1}(w_2)) =
    = f(f^{−1}(w_1 + w_2)) − f(f^{−1}(w_1)) − f(f^{−1}(w_2)) =
    = (w_1 + w_2) − w_1 − w_2 = 0,

f(z_2) = f(f^{−1}(α w) − α f^{−1}(w)) = f(f^{−1}(α w)) −
    − α f(f^{−1}(w)) = α w − α w = 0.

A bijective mapping is injective. Therefore, from the equalities f(z_1) = 0 and f(z_2) = 0 just derived and from the equality f(0) = 0 derived in (8.1) it follows that z_1 = z_2 = 0. The theorem is proved.
Each linear mapping f: V → W is related with two subsets: the kernel Ker f ⊂ V and the image Im f ⊂ W. The image Im f = f(V) of a linear mapping is defined in the same way as it was done for a general mapping in § 1:

Im f = { w ∈ W : ∃ v ((v ∈ V) & (f(v) = w)) }.

The kernel of a linear mapping f: V → W is the set of vectors of the space V that map to zero under the action of f:

Ker f = { v ∈ V : f(v) = 0 }.

Theorem 8.2. The kernel and the image of a linear mapping f: V → W both are subspaces in V and W respectively.

Proof. In order to prove this theorem we should check the conditions (1) and (2) from the definition 2.2 as applied to the subsets Ker f ⊂ V and Im f ⊂ W.
Suppose that v_1, v_2 ∈ Ker f. Then f(v_1) = 0 and f(v_2) = 0. Suppose also that v ∈ Ker f. Then f(v) = 0. As a result we derive

f(v_1 + v_2) = f(v_1) + f(v_2) = 0 + 0 = 0,
f(α v) = α f(v) = α 0 = 0.

Hence, v_1 + v_2 ∈ Ker f and α v ∈ Ker f. This proves the proposition of the theorem concerning the kernel Ker f.
Let w_1, w_2, w ∈ Im f. Then there are three vectors v_1, v_2, v in V such that f(v_1) = w_1, f(v_2) = w_2, and f(v) = w. Hence, we have

w_1 + w_2 = f(v_1) + f(v_2) = f(v_1 + v_2),
α w = α f(v) = f(α v).

This means that w_1 + w_2 ∈ Im f and α w ∈ Im f. The theorem is proved.

Remember that, according to the theorem 1.2, a linear mapping f: V → W is surjective if and only if Im f = W. There is a similar proposition for Ker f.
Theorem 8.3. A linear mapping f: V → W is injective if and only if its kernel is zero, i. e. Ker f = {0}.

Proof. Let f be injective and let v ∈ Ker f. Then f(0) = 0 and f(v) = 0. But if v ≠ 0, then due to the injectivity of f we would have f(v) ≠ f(0). Hence, v = 0. This means that the kernel of f consists of only one element: Ker f = {0}.

Now conversely, suppose that Ker f = {0}. Let’s consider two different vectors v_1 ≠ v_2 in V. Then v_1 − v_2 ≠ 0 and v_1 − v_2 ∉ Ker f. Therefore, f(v_1 − v_2) ≠ 0. Applying the linearity of f, from this inequality we derive f(v_1) − f(v_2) ≠ 0, i. e. f(v_1) ≠ f(v_2). Hence, f is an injective mapping. The theorem is proved.
The following theorem is known as the theorem on the linear independence of preimages. Here is its statement.

Theorem 8.4. Let f: V → W be a linear mapping and let v_1, …, v_s be some vectors of a linear vector space V such that their images f(v_1), …, f(v_s) in W are linearly independent. Then the vectors v_1, …, v_s themselves are also linearly independent.

Proof. In order to prove the theorem let’s consider a linear combination of the vectors v_1, …, v_s that is equal to zero:

α^1 v_1 + … + α^s v_s = 0.

Applying f to both sides of this equality and using the fact that f is a linear mapping, we obtain a quite similar equality for the images:

α^1 f(v_1) + … + α^s f(v_s) = 0.

However, the images f(v_1), …, f(v_s) are linearly independent. Hence, all the coefficients in the above linear combination are equal to zero: α^1 = … = α^s = 0. Then the initial linear combination is also necessarily trivial. This proves that the vectors v_1, …, v_s are linearly independent.
A linear vector space is a set. But it is not simply a set — it is a structured set. It is equipped with algebraic operations satisfying the axioms (1)-(8). Linear mappings are those mappings which are concordant with the structures of linear vector spaces in the spaces they act from and to. In algebra such mappings concordant with algebraic structures are called morphisms. So, in algebraic terminology, linear mappings are morphisms of linear vector spaces.

Definition 8.2. Two linear vector spaces V and W are called isomorphic if there is a bijective linear mapping f: V → W binding them.

The first example of an isomorphism of linear vector spaces is the mapping ψ: V → K^n in (5.4). Because of the existence of such a mapping we can formulate the following theorem.

Theorem 8.5. Any n-dimensional linear vector space V is isomorphic to the arithmetic linear vector space K^n.
Isomorphic linear vector spaces have many common features. Often they can be treated as indistinguishable. In particular, we have the following fact.

Theorem 8.6. If a linear vector space V is isomorphic to a finite-dimensional vector space W, then V is also finite-dimensional and the dimensions of these two spaces do coincide: dim V = dim W.

Proof. Let f: V → W be an isomorphism of the spaces V and W. Assume for the sake of certainty that dim W = n and choose a basis h_1, …, h_n in W. By means of the inverse mapping f^{−1}: W → V we define the vectors e_i = f^{−1}(h_i), i = 1, …, n. Let v be an arbitrary vector of V. Let’s map it with the use of f into the space W and then expand in the basis:

f(v) = α^1 h_1 + … + α^n h_n.

Applying the inverse mapping f^{−1} to both sides of this equality, due to the linearity of f^{−1} we get the expansion

v = α^1 e_1 + … + α^n e_n.

From this expansion we derive that {e_1, …, e_n} is a finite spanning system in V. The finite dimensionality of V is proved. The linear independence of e_1, …, e_n follows from the theorem 8.4 on the linear independence of preimages. Hence, e_1, …, e_n is a basis in V and dim V = n = dim W. The theorem is proved.
§ 9. The matrix of a linear mapping.
Let f: V → W be a linear mapping from an n-dimensional vector space V to an m-dimensional vector space W. Let’s choose a basis e_1, …, e_n in V and a basis h_1, …, h_m in W. Then consider the images of the basis vectors e_1, …, e_n in W and expand them in the basis h_1, …, h_m:

f(e_1) = F^1_1 h_1 + … + F^m_1 h_m,
. . . . . . . . . . . . . . . . . . . . .
f(e_n) = F^1_n h_1 + … + F^m_n h_m.    (9.1)
Totally in (9.1) we have n expansions that define nm numbers F^i_j. These numbers are arranged into a rectangular m × n matrix which is called the matrix of the linear mapping f in the pair of bases e_1, …, e_n and h_1, …, h_m:

        ⎛ F^1_1  …  F^1_n ⎞
    F = ⎜   ⋮    ⋱    ⋮   ⎟ .    (9.2)
        ⎝ F^m_1  …  F^m_n ⎠

When placing the element F^i_j into the matrix (9.2), the upper index determines the row number, while the lower index determines the column number. In other words, the matrix F is composed of the column vectors formed by the coordinates of the vectors f(e_1), …, f(e_n) in the basis h_1, …, h_m. The expansions (9.1), which determine the components of this matrix, are convenient to write as follows:

f(e_j) = Σ_{i=1}^{m} F^i_j h_i.    (9.3)
Let x be an arbitrary vector of V and let y = f(x) be its image under the mapping f. If we expand the vector x in the basis, x = x^1 e_1 + … + x^n e_n, then, taking into account (9.3), for the vector y we get

y = f(x) = Σ_{j=1}^{n} x^j f(e_j) = Σ_{j=1}^{n} x^j ( Σ_{i=1}^{m} F^i_j h_i ).

Changing the order of summations in the above expression, we get the expansion of the vector y in the basis h_1, …, h_m:

y = f(x) = Σ_{i=1}^{m} ( Σ_{j=1}^{n} F^i_j x^j ) h_i.

Due to the uniqueness of such an expansion, for the coordinates of the vector y in the basis h_1, …, h_m we get the following formula:

y^i = Σ_{j=1}^{n} F^i_j x^j.    (9.4)
This formula (9.4) is the basic application of the matrix of a linear mapping. It is used for calculating the coordinates of the vector f(x) through the coordinates of x. In matrix form this formula is written as

    ⎛ y^1 ⎞   ⎛ F^1_1  …  F^1_n ⎞ ⎛ x^1 ⎞
    ⎜  ⋮  ⎟ = ⎜   ⋮    ⋱    ⋮   ⎟ ⎜  ⋮  ⎟ .    (9.5)
    ⎝ y^m ⎠   ⎝ F^m_1  …  F^m_n ⎠ ⎝ x^n ⎠
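In other words, once the bases are fixed, applying f amounts to one matrix-vector multiplication. Here is a small sketch (added for illustration, not part of the original text) for a mapping f: K³ → K² written in the standard bases; the matrix F below is an arbitrary example:

    import numpy as np

    # Matrix of a linear mapping f : K^3 -> K^2 in the standard bases; its j-th
    # column holds the coordinates of f(e_j), as in the expansions (9.1).
    F = np.array([[2., 0., 1.],
                  [1., 3., -1.]])

    x = np.array([1., 2., 3.])        # coordinates of a vector x in the basis of V
    y = F @ x                         # formula (9.5): y^i = sum_j F^i_j x^j
    print(y)                          # [5. 4.]

    # Linearity is inherited from matrix algebra:
    x2 = np.array([-1., 0., 2.])
    assert np.allclose(F @ (x + x2), F @ x + F @ x2)
    assert np.allclose(F @ (2.5 * x), 2.5 * (F @ x))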
Remember that when composing the column vector of the coordinates of a vector x, we agreed to understand this procedure as a linear mapping ψ: V → K^n (see the formulas (5.4) and the theorem 8.5). Denote by ψ̃: W → K^m the analogous mapping for a vector y in W. Then the matrix relationship (9.5) can be treated as a mapping F: K^n → K^m. These three mappings ψ, ψ̃, F and the initial mapping f can be written in a diagram:

     V  ──f──>  W
     │ψ         │ψ̃
     ↓          ↓
    K^n ──F──> K^m    (9.6)

Such diagrams are called commutative diagrams if the compositions of mappings «when passing along the arrows» from any node to any other node do not depend on the particular path connecting these two nodes. When applied to the diagram (9.6), the commutativity means ψ̃ ◦ f = F ◦ ψ. Due to the bijectivity of the linear mappings ψ and ψ̃ the condition of commutativity of the diagram (9.6) can be written as

F = ψ̃ ◦ f ◦ ψ^{−1},    f = ψ̃^{−1} ◦ F ◦ ψ.    (9.7)

The reader can easily check that the relationships (9.7) are fulfilled due to the way the matrix F is constructed. Hence, the diagram (9.6) is commutative.
Now let’s look at the relationships (9.7) from a slightly different point of view. Let V and W be two spaces of dimensions n and m respectively. Suppose that we have an arbitrary m × n matrix F. Then the relationship (9.5) determines a linear mapping F: K^n → K^m. Choosing bases e_1, …, e_n and h_1, …, h_m in V and W, we can use the second relationship (9.7) in order to define the linear mapping f: V → W. The matrix of this mapping in the bases e_1, …, e_n and h_1, …, h_m coincides with F exactly. Thus, we have proved the following theorem.

Theorem 9.1. Any rectangular m × n matrix F can be obtained as the matrix of a linear mapping f: V → W from an n-dimensional vector space V to an m-dimensional vector space W in some pair of bases in these spaces.
A more straightforward way of proving the theorem 9.1 than the one considered above can be based on the following theorem.

Theorem 9.2. For any basis e_1, …, e_n in an n-dimensional vector space V and for any set of n vectors w_1, …, w_n in another vector space W there is a linear mapping f: V → W such that f(e_i) = w_i for i = 1, …, n.

Proof. Once the basis e_1, …, e_n in V is chosen, this defines the mapping ψ: V → K^n (see (5.4)). In order to construct the required mapping f we define a mapping ϕ: K^n → W by the following relationship:

ϕ: (x^1, …, x^n)^T ↦ x^1 w_1 + … + x^n w_n.

Now it is easy to verify that the required mapping is the composition f = ϕ ◦ ψ.
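When W is also coordinatized, the construction of the theorem 9.2 is very concrete: the matrix of the mapping f with f(e_i) = w_i simply has the coordinate columns of w_1, …, w_n as its columns. A short sketch (added for illustration, not part of the original text; the data are ad hoc):

    import numpy as np

    # Prescribed images w_1, w_2, w_3 in W = K^2 for the standard basis of V = K^3.
    w = [np.array([1., 4.]), np.array([0., -2.]), np.array([3., 3.])]

    # The matrix of the mapping f with f(e_i) = w_i: its columns are the coordinate
    # columns of the w_i, cf. theorem 9.2 and the expansions (9.1).
    F = np.column_stack(w)

    # f acts on coordinates by phi(x) = x^1 w_1 + x^2 w_2 + x^3 w_3 = F x.
    x = np.array([2., 1., -1.])
    assert np.allclose(F @ x, 2 * w[0] + 1 * w[1] - 1 * w[2])
    for i in range(3):
        assert np.allclose(F @ np.eye(3)[:, i], w[i])    # f(e_i) = w_i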
Let’s return to the initial situation. Suppose that we have a mapping f: V → W that determines a matrix F upon choosing two bases e_1, …, e_n and h_1, …, h_m in V and W respectively. The matrix F essentially depends on the choice of bases. In order to describe this dependence we consider four bases — two bases in V and other two bases in W. Suppose that S and P are the direct transition matrices for these pairs of bases. Their components are defined as follows:

ẽ_k = Σ_{j=1}^{n} S^j_k e_j,    h̃_r = Σ_{i=1}^{m} P^i_r h_i.
The inverse transition matrices T = S^{−1} and Q = P^{−1} are defined similarly:

e_j = Σ_{k=1}^{n} T^k_j ẽ_k,    h_i = Σ_{r=1}^{m} Q^r_i h̃_r.

We use these relationships and the above relationships (9.3) in order to carry out the following calculations for the vector f(ẽ_k):

f(ẽ_k) = Σ_{j=1}^{n} S^j_k f(e_j) = Σ_{j=1}^{n} S^j_k ( Σ_{i=1}^{m} F^i_j h_i ) =
    = Σ_{j=1}^{n} S^j_k ( Σ_{i=1}^{m} F^i_j ( Σ_{r=1}^{m} Q^r_i h̃_r ) ).
Upon changing the order of summations this result is written as

f(ẽ_k) = Σ_{r=1}^{m} ( Σ_{i=1}^{m} Σ_{j=1}^{n} Q^r_i F^i_j S^j_k ) h̃_r.

The double sums in the round brackets are the coefficients of the expansion of the vector f(ẽ_k) in the basis h̃_1, …, h̃_m. They determine the matrix of the linear mapping f in the wavy bases ẽ_1, …, ẽ_n and h̃_1, …, h̃_m:

F̃^r_k = Σ_{i=1}^{m} Σ_{j=1}^{n} Q^r_i F^i_j S^j_k.    (9.8)
In a similar way one can derive the converse relationship expressing F through F̃:

F^i_j = Σ_{r=1}^{m} Σ_{k=1}^{n} P^i_r F̃^r_k T^k_j.    (9.9)

The relationships (9.8) and (9.9) are called the transformation formulas for the matrix of a linear mapping under a change of bases. They can be written as

F̃ = P^{−1} F S,    F = P F̃ S^{−1}.    (9.10)

This is the matrix form of the relationships (9.8) and (9.9).
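The matrix form (9.10) is easy to verify numerically. The sketch below (added for illustration, not part of the original text) takes an arbitrary 2 × 3 matrix F and two random transition matrices, which are non-degenerate with probability 1, and checks both formulas together with their consistency with (9.4):

    import numpy as np

    rng = np.random.default_rng(2)
    n, m = 3, 2
    F = rng.standard_normal((m, n))    # matrix of f in the old pair of bases
    S = rng.standard_normal((n, n))    # direct transition matrix in V
    P = rng.standard_normal((m, m))    # direct transition matrix in W

    F_new = np.linalg.inv(P) @ F @ S   # formula (9.10): F~ = P^{-1} F S

    # The converse formula F = P F~ S^{-1} recovers the old matrix:
    assert np.allclose(F, P @ F_new @ np.linalg.inv(S))

    # Consistency with (9.4): for a vector with new coordinates x~, the new
    # coordinates of f(x) are F~ x~, and transforming both sides back agrees.
    x_new = rng.standard_normal(n)
    assert np.allclose(P @ (F_new @ x_new), F @ (S @ x_new))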
The transformation formulas like (9.10) lead us to the broad class of problems of «bringing to a canonic form». In our particular case a change of bases in the spaces V and W changes the matrix of the linear mapping f: V → W. The problem of bringing to a canonic form in this case consists in finding the optimal choice of bases, in which the matrix F has the most simple (canonic) form. The following theorem solving this particular problem is known as the theorem on bringing to the almost diagonal form.
Theorem 9.3. Let f: V → W be some nonzero linear mapping from an n-dimensional vector space V to an m-dimensional vector space W. Then there is a choice of bases in V and W such that the matrix F of this mapping has the following almost diagonal form:

        ⎛ 1 0 … 0 0 … 0 ⎞
        ⎜ 0 1 … 0 0 … 0 ⎟
        ⎜ ⋮ ⋮ ⋱ ⋮ ⋮   ⋮ ⎟
    F = ⎜ 0 0 … 1 0 … 0 ⎟    (9.11)
        ⎜ 0 0 … 0 0 … 0 ⎟
        ⎜ ⋮ ⋮   ⋮ ⋮   ⋮ ⎟
        ⎝ 0 0 … 0 0 … 0 ⎠

Here the unit entries fill the upper left s × s block (s rows and s columns), while all the other entries of the matrix are equal to zero.
Proof. The purely zero mapping 0: V → W maps each vector of the space V to the zero vector in W. The matrix of such a mapping consists of zeros only. There is no need to formulate the problem of bringing it to a canonic form.

Let f: V → W be a nonzero linear mapping. The integer number s = dim(Im f) is called the rank of the mapping f. The rank of a nonzero mapping is not equal to zero. We begin constructing a canonic base in W by choosing a base h_1, …, h_s in the image space Im f. For each basis vector h_i ∈ Im f there is a vector e_i ∈ V such that f(e_i) = h_i, i = 1, …, s. These vectors e_1, …, e_s are linearly independent due to the theorem 8.4. Let r = dim(Ker f). We choose a basis in Ker f and denote the basis vectors by e_{s+1}, …, e_{s+r}. Then we consider the vectors

e_1, …, e_s, e_{s+1}, …, e_{s+r}    (9.12)

and prove that they form a basis in V. For this purpose we use the theorem 4.6.
and prove that they form a basis in V . For this purpose we use the theorem 4.6.
Let’s begin with checking the condition (1) in the theorem 4.6 for the vectors
(9.12). In order to prove the linear independence of these vectors we consider a
linear combination of them being equal to zero:
α
1
e
1
+ . . . + α
s
e
s
+ α
s+1
e
s+1
+. . . +α
s+r
e
s+r
= 0. (9.13)
Let’s apply the mapping f to both sides of the equality (9.13) and take into
account that f(e
i
) = h
i
for i = 1, . . . , s. Other vectors belong to the kernel of the
mapping f, therefore, f(e
s+i
) = 0 for i = 1, . . . , r. Then from (9.13) we derive
α
1
h
1
+. . . + α
s
h
s
= 0.
The vectors h
1
, . . . , h
s
form a basis in Imf. They are linearly independent.
44 CHAPTER I. LINEAR VECTOR SPACES AND LINEAR MAPPINGS.
Hence, α
1
= . . . = α
s
= 0. Taking into account this fact, we reduce (9.13) to
α
s+1
e
s+1
+. . . +α
s+r
e
s+r
= 0.
The vectors e
s+1
, . . . , e
s+r
form a basis in Ker f. They are linearly independent,
therefore, α
s+1
= . . . = α
s+r
= 0. As a result we have proved that all coefficients
of the linear combination (9.13) are necessarily zero. Hence, the vectors (9.12) are
linearly independent.
Now let’s check the second condition of the theorem 4.6 for the vectors (9.12). Assume that v is an arbitrary vector in V. Then f(v) belongs to Im f. Let’s expand f(v) in the basis h_1, …, h_s:

f(v) = β^1 h_1 + … + β^s h_s.    (9.14)

Remember that f(e_i) = h_i for i = 1, …, s. Then from (9.14) we derive

0 = f(v) − β^1 f(e_1) − … − β^s f(e_s) = f(v − β^1 e_1 − … − β^s e_s).    (9.15)
(9.15)
Let’s denote ˜ v = v −β
1
e
1
−. . . −β
s
e
s
. From (9.15) we derive f(˜ v) = 0 for this
vector ˜ v. Hence, ˜ v ∈ Ker f. Let’s expand ˜ v in the basis of Ker f:
˜ v = β
s+1
e
s+1
+. . . +β
s+r
e
s+r
.
From the formula ˜ v = v − β
1
e
1
−. . . −β
s
e
s
and the above expansion we get
v = β
1
e
1
+ . . . +β
s
e
s
+ β
s+1
e
s+1
+. . . +β
s+r
e
s+r
.
This means that the vectors (9.12) form a spanning system in V . The condition
(2) of the theorem 4.6 for them is also fulfilled. Thus, the vectors (9.12) form a
basis in V . This yields the equality
dimV = s + r. (9.16)
In order to complete the proof of the theorem we need to complete the basis h_1, …, h_s of Im f up to a basis h_1, …, h_s, h_{s+1}, …, h_m in the space W. For the vector f(e_j) with j = 1, …, s we have the expansion

f(e_j) = h_j = Σ_{i=1}^{s} δ^i_j h_i + Σ_{i=s+1}^{m} 0 h_i.

If j = s + 1, …, s + r, the expansion for f(e_j) is purely zero:

f(e_j) = 0 = Σ_{i=1}^{s} 0 h_i + Σ_{i=s+1}^{m} 0 h_i.

Due to these expansions the matrix of the mapping f in the bases that we have constructed above has the required almost diagonal form (9.11).

In proving this theorem we have proved simultaneously the next one.
Theorem 9.4. Let f: V → W be a linear mapping from an n-dimensional space V to an arbitrary linear vector space W. Then

dim(Ker f) + dim(Im f) = dim V.    (9.17)

This theorem 9.4 is known as the theorem on the sum of dimensions of the kernel and the image of a linear mapping. The proposition of the theorem in the form of the relationship (9.17) immediately follows from (9.16).
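Both (9.11) and (9.17) can be observed numerically. The sketch below (added for illustration, not part of the original text) builds a mapping K⁵ → K⁴ of rank 2 and produces one possible pair of bases bringing its matrix to the almost diagonal form; the bases are obtained here from the singular value decomposition, which is not the construction used in the proof above, and all names are ad hoc:

    import numpy as np

    rng = np.random.default_rng(3)
    m, n, s = 4, 5, 2
    F = rng.standard_normal((m, s)) @ rng.standard_normal((s, n))   # a mapping of rank s

    # One possible choice of new bases: take the right singular vectors as the new
    # basis of V and rescale the first s left singular vectors by the singular values.
    U, sigma, Vt = np.linalg.svd(F)
    P = U.copy()
    P[:, :s] = U[:, :s] * sigma[:s]       # new basis of W (columns of P)
    S = Vt.T                              # new basis of V (columns of S)

    F_canonical = np.linalg.inv(P) @ F @ S          # formula (9.10)
    expected = np.zeros((m, n))
    expected[:s, :s] = np.eye(s)                    # the almost diagonal form (9.11)
    assert np.allclose(F_canonical, expected, atol=1e-10)

    # Theorem 9.4: dim Ker f + dim Im f = dim V.
    rank = np.linalg.matrix_rank(F)                 # dim Im f
    assert rank == s and (n - rank) + rank == n     # (n - rank) is dim Ker f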
§ 10. Algebraic operations with mappings.
The space of homomorphisms Hom(V, W).
Definition 10.1. Let V and W be two linear vector spaces and let f : V →W
and g : V → W be two linear mappings from V to W. The linear mapping
h: V →W defined by the relationship h(v) = f(v) +g(v), where v is an arbitrary
vector of V, is called the sum of the mappings f and g.
Definition 10.2. Let V and W be two linear vector spaces over a numeric
field K and let f : V →W be a linear mapping from V to W. The linear mapping
h : V → W defined by the relationship h(v) = α f(v), where v is an arbitrary
vector of V , is called the product of the number α ∈ K and the mapping f.
The algebraic operations introduced by the definitions 10.1 and 10.2 are called
pointwise addition and pointwise multiplication by a number. Indeed, they are
calculated «pointwise» by adding the values of the initial mappings and by
multiplying them by a number for each specific argument v ∈ V . These operations
are denoted by the same signs as the corresponding operations with vectors:
h = f + g and h = α f. The writing (f + g)(v) is understood as the sum of
mappings applied to the vector v. Another writing f(v) + g(v) denotes the sum
of the results of applying f and g to v separately. Though the results of these
calculations do coincide, their meanings are different. In a similar way one should
distinguish the meanings of left and right sides of the following equality:
(α f)(v) = α f(v).
Let’s denote by Map(V, W) the set of all mappings from the space V to the
space W. Sometimes this set is denoted by W^V.
Theorem 10.1. Let V and W be two linear spaces over a numeric field K. Then
the set of mappings Map(V, W) equipped with the operations of pointwise addition
and pointwise multiplication by numbers fits the definition of a linear vector space
over the numeric field K.
Proof. Let’s verify the axioms of a linear vector space for the set of mappings
Map(V, W). In the case of the first axiom we should verify the coincidence of the
mappings f + g and g + f. Remember that the coincidence of two mappings is
equivalent to the coincidence of their values when applied to an arbitrary vector
v ∈ V . The following calculations establish the latter coincidence:
(f + g)(v) = f(v) +g(v) = g(v) +f(v) = (g + f)(v).
As we see in the above calculations, the equality f + g = g + f follows from the
commutativity axiom for the addition of vectors in W due to pointwise nature of
the addition of mappings. The same arguments are applicable when verifying the
axioms (2), (5), and (6) for the algebraic operations with mappings:
((f + g) + h)(v) = (f + g)(v) +h(v) = (f(v) + g(v)) + h(v) =
= f(v) + (g(v) +h(v)) = f(v) +(g +h)(v) = (f + (g + h))(v)
(α (f +g))(v) = α (f + g)(v) = α (f(v) +g(v)) =
= α f(v) +α g(v) = (α f)(v) + (α g)(v) = (α f + α g)(v)
((α +β) f)(v) = (α +β) f(v) = α f(v) + β f(v) =
= (α f)(v) + (β f)(v) = (α f +β f)(v)
For the axioms (7) these calculations look like
(α (β f))(v) = α (β f)(v) = α (β f(v)) = (αβ) f(v) = ((αβ) f)(v).
In the case of the axiom (8) the calculations are even more simple:
(1 f)(v) = 1 f(v) = f(v).
Now let’s consider the remaining axioms (3) and (4). The zero mapping is the best
pretender for the role of zero element in the space Map(V, W), it maps each vector
v ∈ V to zero vector of the space W. For this mapping we have
(f + 0)(v) = f(v) + 0(v) = f(v) +0 = f(v).
As we see, the axiom (3) in Map(V, W) is fulfilled.
Suppose that f ∈ Map(V, W). We define the opposite mapping f′ for f as follows: f′ = (−1) f. Then we have

(f + f′)(v) = (f + (−1) f)(v) = f(v) + ((−1) f)(v) = f(v) + (−1) f(v) = 0 = 0(v).
The axiom (4) in Map(V, W) is also fulfilled. This completes the proof of the
theorem 10.1.
In a typical situation the space Map(V, W) is very large. Even for finite-dimensional spaces V and W it is usually an infinite-dimensional space. In linear algebra a much smaller subset of Map(V, W) is studied. This is the set of all linear mappings from V to W. It is denoted Hom(V, W) and is called the set of homomorphisms. The following two theorems show that Hom(V, W) is closed with respect to the algebraic operations in Map(V, W). Therefore, we can say that Hom(V, W) is the space of homomorphisms.
Theorem 10.2. The pointwise sum of two linear mappings f : V → W and
g : V →W is a linear mapping from the space V to the space W.
Theorem 10.3. The pointwise product of a linear mapping f : V → W by a
number α ∈ K is a linear mapping from the space V to the space W.
Proof. Let h = f + g be the sum of two linear mappings f and g. The following calculations prove the linearity of the mapping h:

h(v_1 + v_2) = f(v_1 + v_2) + g(v_1 + v_2) = (f(v_1) + f(v_2)) + (g(v_1) + g(v_2)) =
    = (f(v_1) + g(v_1)) + (f(v_2) + g(v_2)) = h(v_1) + h(v_2),

h(β v) = f(β v) + g(β v) = β f(v) + β g(v) = β (f(v) + g(v)) = β h(v).

Now let’s consider the product of the mapping f and the number α. Let’s denote it by h, i. e. let’s set h = α f. Then the following calculations

h(v_1 + v_2) = α f(v_1 + v_2) = α (f(v_1) + f(v_2)) = α f(v_1) + α f(v_2) = h(v_1) + h(v_2),

h(β v) = α f(β v) = α (β f(v)) = (αβ) f(v) = (βα) f(v) = β (α f(v)) = β h(v)

prove the linearity of the mapping h and thus complete the proofs of both theorems 10.2 and 10.3.
The space of homomorphisms Hom(V, W) is a subspace in the space of all
mappings Map(V, W). It is much smaller and it consists of objects which are in
the scope of linear algebra. For finite-dimensional spaces V and W the space of
homomorphisms Hom(V, W) is also finite-dimensional. This is the result of the
following theorem.
Theorem 10.4. For finite-dimensional spaces V and W the space of homomor-
phisms Hom(V, W) is also finite-dimensional. Its dimension is given by the formula

dim(Hom(V, W)) = dim(V) · dim(W).    (10.1)
Proof. Let dim V = n and dim W = m. We choose a basis e_1, …, e_n in the space V and another basis h_1, …, h_m in the space W. Let 1 ≤ i ≤ n and 1 ≤ j ≤ m. For each fixed pair of indices i, j within the above ranges we consider the following set of n vectors in the space W:

w_1 = 0, …, w_{i−1} = 0, w_i = h_j, w_{i+1} = 0, …, w_n = 0.

All the vectors in this set are equal to zero, except for the i-th vector w_i, which is equal to the j-th basis vector h_j. Now we apply the theorem 9.2 to the basis e_1, …, e_n in V and to the set of vectors w_1, …, w_n. This defines the linear mapping E^i_j: V → W such that E^i_j(e_s) = w_s for all s = 1, …, n. We write this fact as

E^i_j(e_s) = δ^i_s h_j,    (10.2)
where δ^i_s is the Kronecker symbol. As a result we have constructed nm mappings E^i_j satisfying the relationships (10.2):

E^i_j: V → W,  where 1 ≤ i ≤ n, 1 ≤ j ≤ m.    (10.3)
Now we show that the mappings (10.3) span the space of homomorphisms Hom(V, W). For this purpose we take a linear mapping f ∈ Hom(V, W). Suppose that F is its matrix in the pair of bases e_1, …, e_n and h_1, …, h_m. Denote by F^j_i the elements of this matrix. Then the result of applying f to an arbitrary vector v ∈ V is determined by the coordinates of this vector according to the formula

f(v) = Σ_{i=1}^{n} v^i f(e_i) = Σ_{i=1}^{n} Σ_{j=1}^{m} (F^j_i v^i) h_j.    (10.4)
Applying E^i_j to the same vector v and taking into account (10.2), we derive

E^i_j(v) = Σ_{s=1}^{n} v^s E^i_j(e_s) = Σ_{s=1}^{n} (v^s δ^i_s) h_j = v^i h_j.    (10.5)
Now, comparing the relationships (10.4) and (10.5), we find

f(v) = Σ_{i=1}^{n} Σ_{j=1}^{m} F^j_i E^i_j(v).

Since v is an arbitrary vector of the space V, this formula means that f is a linear combination of the mappings (10.3):

f = Σ_{i=1}^{n} Σ_{j=1}^{m} F^j_i E^i_j.

Hence, the mappings (10.3) span the space of homomorphisms Hom(V, W). This proves the finite-dimensionality of the space Hom(V, W).
In order to calculate the dimension of Hom(V, W) we shall prove that the mappings (10.3) are linearly independent. Let’s consider a linear combination of these mappings which is equal to zero:

Σ_{i=1}^{n} Σ_{j=1}^{m} γ^j_i E^i_j = 0.    (10.6)

Both the left and right hand sides of the equality (10.6) represent the zero mapping 0: V → W. Let’s apply this mapping to the basis vector e_s. Then

Σ_{i=1}^{n} Σ_{j=1}^{m} γ^j_i E^i_j(e_s) = Σ_{i=1}^{n} Σ_{j=1}^{m} (γ^j_i δ^i_s) h_j = 0.
The sum over the index i can be calculated explicitly. As a result we get a linear combination of the basis vectors in W which is equal to zero:

Σ_{j=1}^{m} γ^j_s h_j = 0.
Due to the linear independence of the vectors h_1, ..., h_m we derive γ^j_s = 0. This means that the linear combination (10.6) is necessarily trivial. Hence, the mappings (10.3) are linearly independent. They form a basis in Hom(V, W). Now, by counting these mappings, we find that the required formula (10.1) is valid.
The meaning of the above theorem becomes transparent in terms of the matrices of linear mappings. Indeed, upon choosing the bases in V and W, the linear mappings from Hom(V, W) are represented by rectangular m × n matrices. The sum of mappings corresponds to the sum of matrices, and the product of a mapping by a number corresponds to the product of the matrix by that number. Note that rectangular m × n matrices form a linear vector space isomorphic to the arithmetic linear vector space K^{mn}. This space is denoted as K^{m×n}. So, the choice of bases in V and W defines an isomorphism of Hom(V, W) and K^{m×n}.
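To make this isomorphism concrete, here is a brief numerical sketch (an editorial illustration, not part of the original text) in Python with NumPy; the dimensions and matrices are chosen ad hoc.

    import numpy as np

    # Linear mappings f, g : K^3 -> K^2 are represented by 2 x 3 matrices once
    # bases are fixed; the particular matrices below are chosen ad hoc.
    n, m = 3, 2
    F = np.array([[1., 2., 0.],
                  [0., 1., 3.]])
    G = np.array([[2., 0., 1.],
                  [1., 1., 0.]])
    alpha = 5.0
    v = np.array([1., -1., 2.])        # coordinates of a vector v in V

    # The sum of mappings corresponds to the sum of matrices and the product
    # by a number corresponds to the product of the matrix by that number.
    assert np.allclose(F @ v + G @ v, (F + G) @ v)
    assert np.allclose(alpha * (F @ v), (alpha * F) @ v)
    print("dim Hom(V, W) =", m * n)    # = dim V * dim W, cf. (10.1)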
CHAPTER II
LINEAR OPERATORS.
§ 1. Linear operators. The algebra of endomorphisms
End(V) and the group of automorphisms Aut(V).
A linear mapping f : V → V acting from a linear vector space V to the same vector space V is called a linear operator¹. Linear operators are a special form of linear mappings. Therefore, we can apply to them all the results of the previous chapter. However, the less the generality, the more the specific features. Therefore, the theory of linear operators appears to be richer and more complicated than the theory of linear mappings. It contains not only the strengthening of previous theorems for this particular case, but also a class of problems that cannot be formulated for general linear mappings.
Let’s consider the space of homomorphisms Hom(V, W). If W = V , this space
is called the space of endomorphisms End(V ) = Hom(V, V ). It consists of linear
operators f : V →V which are also called endomorphisms of the space V . Unlike
the space of homomorphisms Hom(V, W), the space of endomorphisms End(V ) is
equipped with an additional binary algebraic operation. Indeed, if we have two
linear operators f, g ∈ End(V ), we can not only add them and multiply them
by numbers, but we can also construct two compositions f ◦ g ∈ End(V ) and
g ◦ f ∈ End(V ).
Theorem 1.1. Let End(V ) be the space of endomorphisms of a linear vector
space V . Here, apart from the axioms (1)-(8) of a linear vector space, the following
relationships are fulfilled:
(9) (f + g) ◦ h = f ◦ h +g ◦ h;
(10) (α f) ◦ h = α (f ◦ h);
(11) f ◦ (g + h) = f ◦ g + f ◦ h;
(12) f ◦ (α g) = α (f ◦ g).
Proof. Each of the equalities (9)-(12) is an operator equality. As we know,
the equality of two operators means that these operators yield the same result
when applied to an arbitrary vector v ∈ V :
((f +g) ◦ h)(v) = (f + g)(h(v)) = f(h(v)) + g(h(v)) =
= (f ◦ h)(v) + (g ◦ h)(v) = (f ◦ h +g ◦ h)(v)
((α f) ◦ h)(v) = (α f)(h(v)) = α f(h(v)) =
= α (f ◦ h)(v) = (α (f ◦ h))(v)
(f ◦ (g + h))(v) = f((g +h)(v)) = f(g(v) +h(v)) =
= f(g(v)) + f(h(v)) = (f ◦ g)(v) + (f ◦ h)(v) = (f ◦ g +f ◦ h)(v)
¹ This terminology is not common; however, in this book we strictly follow it.
(f ◦ (α g))(v) = f((α g)(v)) = f(α g(v)) =
= α f(g(v)) = α (f ◦ g)(v) = (α (f ◦ g))(v)
The above calculations prove the properties (9)-(12) of the composition of linear
operators.
Let’s fix the operator h ∈ End(V) and consider the composition f ◦ h as a rule that maps each operator f to the other operator g = f ◦ h. Then we get a mapping:

R_h : End(V) → End(V).

The first two properties (9) and (10) from the theorem 1.1 mean that R_h is a linear mapping. This mapping is called the right shift by h since it acts as a composition, where h is placed on the right side. In a similar way we can define another mapping, which is called the left shift by h:

L_h : End(V) → End(V).

It acts according to the rule L_h(f) = h ◦ f. This mapping is linear due to the properties (11) and (12) from the theorem 1.1.

The operation of composition is an additional binary operation in the space of endomorphisms End(V). The linearity of the mapping R_h is interpreted as the linearity of this binary operation in its first argument, while the linearity of L_h is said to be the linearity of composition in its second argument. A binary algebraic operation linear in both arguments is called a bilinear operation. A situation where a linear vector space is equipped with an additional bilinear algebraic operation is rather typical.
Definition 1.1. A linear vector space A over a numeric field K equipped with
a bilinear binary operation of vector multiplication is called an algebra over the
field K or simply a K-algebra.
The operation of multiplication in algebras is usually denoted by some sign like a dot «•» or a circle «◦», but very often this sign is omitted altogether. The algebra A is called a commutative algebra if the multiplication in it is commutative: a b = b a. Similarly, the algebra A is called an associative algebra if the operation of multiplication is associative: (a b) c = a (b c).
From the definition 1.1 and from the theorem 1.1 we conclude that the linear
space End(V ) with the operation of composition taken for multiplication is an
algebra over the same numeric field K as the initial vector space V . This algebra
is called the algebra of endomorphisms of a linear vector space V . It is associative
due to the theorem 1.6 from Chapter I. However, this algebra is not commutative
in the general case.
The operation of composition is treated as a multiplication in the algebra of endomorphisms End(V). Therefore, it is usually omitted when written in this context. The multiplication of operators is a higher priority operation than addition. The relative priority of operator multiplication and multiplication by numbers makes no difference at all. This follows from the axiom (7) for the space End(V) and from the properties (10) and (12) of the multiplication in End(V). Now we can consider positive integer powers of linear operators:

f² = f f,    f³ = f² f,    f^{n+1} = f^n f.
If an operator f is bijective, then we have the inverse operator f^{−1} and we can consider negative integer powers of f as well:

f^{−2} = f^{−1} f^{−1},    f^n f^{−n} = id_V,    f^{n+m} = f^n f^m.

The latter equality is valid both for positive and negative values of the integers n and m.
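The power rules listed above are easy to check numerically. The following sketch (an illustration added for this edition, with an arbitrarily chosen invertible matrix) verifies f^n f^{−n} = id_V and f^{n+m} = f^n f^m.

    import numpy as np

    # Illustration with an arbitrarily chosen invertible matrix (det F = 1).
    F = np.array([[2., 1.],
                  [1., 1.]])

    def power(A, k):
        # integer power of a matrix; negative k uses the inverse matrix
        if k >= 0:
            return np.linalg.matrix_power(A, k)
        return np.linalg.matrix_power(np.linalg.inv(A), -k)

    n, m = 3, -2
    assert np.allclose(power(F, n) @ power(F, -n), np.eye(2))       # f^n f^{-n} = id_V
    assert np.allclose(power(F, n + m), power(F, n) @ power(F, m))  # f^{n+m} = f^n f^m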
Definition 1.2. An algebra A over the field K is called an algebra with unit
element or an algebra with unity if there is an element 1 ∈ A such that 1 a = a
and a 1 = a for all a ∈ A.
The algebra of endomorphisms End(V) is an algebra with unity. The identical operator plays the role of unit element in this algebra: 1 = id_V. Therefore, this operator is also called the unit operator or the operator unity.
Definition 1.3. A linear operator f : V →V is called a scalar operator if it is
obtained by multiplying the unit operator 1 by a number λ ∈ K, i. e. if f = λ 1.
The basic purpose of operators from the space End(V ) is to act upon vectors of
the space V . Suppose that a, b ∈ End(V ) and let x, y ∈ V . Then
(1) (a +b)(x) = a(x) + b(x);
(2) a(x +y) = a(x) + a(y).
These two relationships are well known: the first one follows from the definition of the sum of two operators, the second relationship follows from the linearity of the operator a. The question is why the vectors x and y in the above formulas are surrounded by brackets. This is a consequence of the «functional» form of writing the action of an operator upon a vector: the operator sign is put on the left and the vector sign is put on the right and is enclosed into brackets like an argument of a function: w = f(v). Algebraists use the more «deliberate» form of writing: w = f v. The operator sign is on the left and the vector sign is on the right, but no brackets are used. If we know that f ∈ End(V) and v ∈ V, then such a writing causes no confusion. In a more complicated case, even if we know that α ∈ K, f, g ∈ End(V), and v ∈ V, the writing w = α f g v admits several interpretations:

w = α f(g(v)),    w = (α f)(g(v)),
w = (α (f ◦ g))(v),    w = ((α f) ◦ g)(v).

However, for any one of these interpretations we get the same vector w. Therefore, in what follows we shall use the algebraic form of writing the action of an operator upon a vector, especially in large calculations.
Let f : V → V be a linear operator in a finite-dimensional vector space V. According to the general scheme of constructing the matrix of a linear mapping, we should choose two bases e_1, ..., e_n and h_1, ..., h_n in V and consider the expansions similar to (9.1) in Chapter I. No doubt this approach is valid, and it could be very fruitful in some cases. However, having two bases in one space is certainly excessive. Therefore, when constructing the matrix of a linear operator, the second basis h_1, ..., h_n is chosen to coincide with the first one. The
matrix F of an operator f is determined from the expansions
f(e_1) = F^1_1 e_1 + . . . + F^n_1 e_n,
. . . . . . . . . . . . . . . . . . . . . . . .
f(e_n) = F^1_n e_1 + . . . + F^n_n e_n,    (1.1)

which can be expressed in brief form by the formula

f(e_j) = Σ_{i=1}^{n} F^i_j e_i.    (1.2)

The matrix F determined by the expansions (1.1) or by the expansions (1.2) is called the matrix of a linear operator f in the basis e_1, ..., e_n. This is a square n × n matrix, where n = dim V.
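The expansions (1.1) say that the j-th column of F consists of the coordinates of f(e_j). The following sketch (not from the original text; the operator rule is invented for illustration) builds such a matrix in Python with NumPy.

    import numpy as np

    # The operator rule below is invented for illustration; V = K^3 with the
    # standard basis, so the coordinates of f(e_j) form the j-th column of F.
    def f(x):
        return np.array([x[0] + 2 * x[1], x[1] - x[2], 3 * x[0]])

    n = 3
    E = np.eye(n)                                        # the basis e_1, ..., e_n
    F = np.column_stack([f(E[:, j]) for j in range(n)])  # expansions (1.1), column by column

    v = np.array([1., 2., 3.])
    assert np.allclose(F @ v, f(v))   # applying f agrees with multiplying by F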
Theorem 1.2. Matrices related to operators f ∈ End(V) in some fixed basis e_1, ..., e_n possess the following properties:
(1) the sum of two operators is represented by the sum of their matrices;
(2) the product of an operator by a number is represented by the product of its matrix by that number;
(3) the composition of two operators is represented by the product of their matrices.
Proof. Consider the operators f, g, and h from End(V). Let F, G, and H be their matrices in the basis e_1, ..., e_n. Proving the first proposition in the theorem 1.2, let’s denote h = f + g. Then

h(e_j) = (f + g) e_j = f(e_j) + g(e_j) = Σ_{i=1}^{n} F^i_j e_i + Σ_{i=1}^{n} G^i_j e_i = Σ_{i=1}^{n} (F^i_j + G^i_j) e_i = Σ_{i=1}^{n} H^i_j e_i.

Due to the uniqueness of the expansion of a vector in a basis we have H^i_j = F^i_j + G^i_j and H = F + G. The first proposition of the theorem is proved.
The proof of the second proposition is similar. Let’s denote h = α f. Then

h(e_j) = (α f) e_j = α f(e_j) = α (Σ_{i=1}^{n} F^i_j e_i) = Σ_{i=1}^{n} (α F^i_j) e_i = Σ_{i=1}^{n} H^i_j e_i.
Therefore, H^i_j = α F^i_j and H = α F. The proof of the third proposition requires a little more effort. Denote h = f ◦ g. Then

h(e_j) = (f ◦ g) e_j = f(g(e_j)) = f(Σ_{i=1}^{n} G^i_j e_i) = Σ_{i=1}^{n} G^i_j f(e_i) = Σ_{i=1}^{n} G^i_j (Σ_{s=1}^{n} F^s_i e_s) = Σ_{s=1}^{n} (Σ_{i=1}^{n} F^s_i G^i_j) e_s = Σ_{s=1}^{n} H^s_j e_s.
Due to the uniqueness of the expansion of a vector in a basis we derive

H^s_j = Σ_{i=1}^{n} F^s_i G^i_j.
The right side of this equality is easily interpreted as the product of two matrices
written in terms of the components of these matrices. Therefore, H = F G. The
theorem is proved.
From the theorem that was proved just above we conclude that when relating an operator f ∈ End(V) with its matrix we establish the isomorphism of the algebra End(V) and the matrix algebra K^{n×n} with standard matrix multiplication.
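This correspondence, and in particular the fact that composition goes to matrix multiplication, can be checked numerically; in the sketch below (added for illustration) random matrices stand in for arbitrary operators.

    import numpy as np

    # Random matrices stand for the matrices F and G of two operators f and g.
    rng = np.random.default_rng(0)
    n = 4
    F = rng.normal(size=(n, n))
    G = rng.normal(size=(n, n))
    v = rng.normal(size=n)

    # f(g(v)) coincides with the action of the product matrix F G on v.
    assert np.allclose(F @ (G @ v), (F @ G) @ v)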
Now let’s study how the matrix of a linear operator f : V → V changes under the change of the basis e_1, ..., e_n for some other basis ẽ_1, ..., ẽ_n. Let S be the direct transition matrix and let T be the inverse one. Note that we need not derive the transformation formulas again. We can adapt the formulas (9.10) from Chapter I for our present purpose. Since the basis h_1, ..., h_n coincides with e_1, ..., e_n and the basis h̃_1, ..., h̃_n coincides with ẽ_1, ..., ẽ_n, we have P = S. Then the transformation formulas are written as

F̃ = S^{−1} F S,    F = S F̃ S^{−1}.    (1.3)

These are the required formulas for transforming the matrix of a linear operator under a change of basis. Taking into account that T = S^{−1} we can write (1.3) as

F̃^q_p = Σ_{i=1}^{n} Σ_{j=1}^{n} T^q_i S^j_p F^i_j,    F^i_j = Σ_{q=1}^{n} Σ_{p=1}^{n} S^i_q T^p_j F̃^q_p.    (1.4)
The relationships (1.3) yield a very important formula relating the determinants of the matrices F and F̃. Indeed, we have

det F̃ = det(S^{−1}) det F det S = (det S)^{−1} det F det S = det F.

The coincidence of the determinants of the matrices of a linear operator f in two arbitrary bases means that they represent a number which does not depend on a basis at all.
Definition 1.4. The determinant det f of a linear operator f is the number
equal to the determinant of the matrix F of this linear operator in some basis.
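As a quick numerical illustration of the formula (1.3) and of the basis independence of det f, one can run the following sketch (added for this edition; the transition matrix S is random and therefore invertible with probability one).

    import numpy as np

    # S is a random direct transition matrix, F is the matrix of f in the old
    # basis; both are placeholders chosen only for illustration.
    rng = np.random.default_rng(1)
    n = 3
    F = rng.normal(size=(n, n))
    S = rng.normal(size=(n, n))

    F_tilde = np.linalg.inv(S) @ F @ S      # formula (1.3)
    assert np.isclose(np.linalg.det(F_tilde), np.linalg.det(F))   # det f is basis independent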
A numeric invariant of a geometric object in a linear vector space V is a
number determined by this geometric object such that it does not depend on
anything else other than that geometric object itself. The determinant of a linear operator det f is an example of such a numeric invariant. Coordinates of a vector or
example of a numeric invariant of a linear operator is its rank:
rank f = dim(Imf).
Soon we shall define a lot of other numeric invariants of a linear operator.
From the third proposition of the theorem 1.2 we derive the following formula
for the determinant of a linear operator:
det(f ◦ g) = det(f) det(g). (1.5)
Theorem 1.3. A linear operator f : V →V in a finite-dimensional linear vector
space V is injective if and only if it is surjective.
Proof. In order to prove this theorem we apply the theorem 1.2 and two theorems 8.3 and 9.4 from Chapter I. The injectivity of the linear operator f is equivalent to the condition Ker f = {0}, the surjectivity of the operator f is equivalent to Im f = V, while the theorem 9.4 from Chapter I relates the dimensions of these two subspaces Ker f and Im f:

dim(Ker f) + dim(Im f) = dim(V).

If the operator f is injective, then Ker f = {0} and dim(Ker f) = 0. Then dim(Im f) = dim(V). Applying the third proposition of the theorem 4.5 from Chapter I, we get Im f = V, which proves the surjectivity of the operator f. Conversely, if the operator f is surjective, then Im f = V and dim(Im f) = dim(V). Hence, dim(Ker f) = 0 and Ker f = {0}. This proves the injectivity of the operator f.
Theorem 1.4. A linear operator f : V → V in a finite-dimensional linear vector space V is bijective if and only if det f ≠ 0.
Proof. Let x be a vector of V and let y = f(x). Expanding x and y in some basis e_1, ..., e_n, we get the following formula relating their coordinates:

| y^1 |   | F^1_1 . . . F^1_n | | x^1 |
| ... | = | . . . . . . . . . | | ... |    (1.6)
| y^n |   | F^n_1 . . . F^n_n | | x^n |

The formula (1.6) can be derived independently, or one can derive it from the formula (9.5) of Chapter I. From this formula we derive that x belongs to the kernel of the operator f if and only if its coordinates x^1, ..., x^n satisfy the homogeneous system of linear equations

| F^1_1 . . . F^1_n | | x^1 |   | 0 |
| . . . . . . . . . | | ... | = | ... |    (1.7)
| F^n_1 . . . F^n_n | | x^n |   | 0 |

The matrix of this system of equations coincides with the matrix of the operator f in the basis e_1, ..., e_n. Therefore, the kernel of the operator f is nonzero if and only if the system of equations (1.7) has a nonzero solution. Here we use the well-known result from the theory of determinants: a homogeneous system of linear algebraic equations with a square matrix F has a nonzero solution if and only if det F = 0. The proof of this fact can be found in [5]. From this result we immediately get that the condition det F ≠ 0 is equivalent to Ker f = {0}. Due to the previous theorem and due to the theorem 1.1 from Chapter I the latter equality Ker f = {0} is equivalent to the bijectivity of f. The theorem is proved.
An operator f with zero determinant det f = 0 is called a degenerate operator. Using this terminology we can formulate the following corollary of the theorem 1.4.
Corollary. A linear operator f : V → V in a finite-dimensional space V has a nontrivial kernel Ker f ≠ {0} if and only if it is degenerate. Otherwise this linear operator is bijective.
Remember that a bijective linear mapping f from V to W is called an isomor-
phism. If W = V such a mapping establishes an isomorphism of the space V with
itself. Therefore, it is called an automorphism of the space V . The set of all
automorphisms of the space V is denoted by Aut(V ). It is obvious that Aut(V )
possesses the following properties:
(1) if f, g ∈ Aut(V), then f ◦ g ∈ Aut(V);
(2) if f ∈ Aut(V), then f^{−1} ∈ Aut(V);
(3) 1 ∈ Aut(V), where 1 is the identical operator.
It is easy to see that due to the above three properties the set of automorphisms
Aut(V ) is equipped with a structure of a group. The group of automorphisms
Aut(V ) is a subset in the algebra of endomorphisms End(V ), however, it does not
inherit the structure of an algebra, nor even the structure of a linear vector space.
It is clear because, for instance, the zero operator does not belong to Aut(V). In the case of a finite-dimensional space V the group of automorphisms consists of all non-degenerate operators.
§ 2. Projection operators.
Let V be a linear vector space expanded into a direct sum of two subspaces:

V = U_1 ⊕ U_2.    (2.1)

Due to the expansion (2.1) each vector v ∈ V is expanded into a sum

v = u_1 + u_2, where u_1 ∈ U_1 and u_2 ∈ U_2,    (2.2)

the components u_1 and u_2 in (2.2) being uniquely determined by the vector v.

Definition 2.1. The operator P : V → V mapping each vector v ∈ V to its first component u_1 in the expansion (2.2) is called the operator of projection onto the subspace U_1 parallel to the subspace U_2.
Theorem 2.1. For any expansion of the form (2.1) the operator of projection onto the subspace U_1 parallel to the subspace U_2 is a linear operator.

Proof. Let’s consider a pair of vectors v_1, v_2 from the space V, and for each of them consider the expansion like (2.2):

v_1 = u_1 + u_2,
v_2 = ũ_1 + ũ_2.

Then P(v_1) = u_1 and P(v_2) = ũ_1. Let’s add the above two expansions and write

v_1 + v_2 = (u_1 + ũ_1) + (u_2 + ũ_2).    (2.3)
From u_1, ũ_1 ∈ U_1 and from u_2, ũ_2 ∈ U_2 we derive u_1 + ũ_1 ∈ U_1 and u_2 + ũ_2 ∈ U_2. Therefore, (2.3) is an expansion of the form (2.2) for the vector v_1 + v_2. Then

P(v_1 + v_2) = u_1 + ũ_1 = P(v_1) + P(v_2).    (2.4)

Now let’s consider the expansion (2.2) for an arbitrary vector v ∈ V and multiply it by a number α ∈ K:

α v = (α u_1) + (α u_2).

Then α u_1 ∈ U_1 and α u_2 ∈ U_2, therefore, due to the definition of P we get

P(α v) = α u_1 = α P(v).    (2.5)
The relationships (2.4) and (2.5) are just the very relationships that mean the
linearity of the operator P.
Suppose that v in the expansion (2.2) is chosen to be a vector of the subspace U_1. Then the expansion (2.2) for this vector is v = v + 0, therefore, P(v) = v. This means that all vectors of the subspace U_1 are projected by P onto themselves. This fact has an important consequence P² = P. Indeed, for any v ∈ V we have P(v) ∈ U_1, therefore, P(P(v)) = P(v).

Besides P, by means of (2.2) we can define the other operator Q such that Q(v) = u_2. It is also a projection operator: it projects onto U_2 parallel to U_1. Therefore, Q² = Q. For the sum of these two operators we get P + Q = 1. Indeed, for any vector v ∈ V we have

P(v) + Q(v) = u_1 + u_2 = v = id_V(v) = 1(v).
If v ∈ U_1, then the expansion (2.2) for this vector is v = v + 0, therefore, Q(v) = 0. Similarly, P(v) = 0 for all v ∈ U_2. Hence, we derive Q(P(v)) = 0 and P(Q(v)) = 0 for any v ∈ V. Summarizing these results, we write

P² = P,    P + Q = 1,
                                  (2.6)
Q² = Q,    P Q = Q P = 0.
A pair of projection operators satisfying the relationships (2.6) is called a con-
cordant pair of projectors.
In order to get a concordant pair of projectors it is sufficient to define only one of them, for instance, the operator P. The second operator Q then is given by the formula Q = 1 − P. All of the relationships (2.6) thereby will be automatically fulfilled. Indeed, we have the relationships

P Q = P ◦ (1 − P) = P − P² = P − P = 0,
Q P = (1 − P) ◦ P = P − P² = P − P = 0.

The relationship Q² = Q for Q is derived in a similar way:

Q² = (1 − P) ◦ (1 − P) = 1 − 2 P + P² = 1 − 2 P + P = 1 − P = Q.
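A concordant pair of projectors is easy to build numerically from a concrete direct sum. In the sketch below (an added illustration; the subspaces of R^3 are chosen ad hoc) the operators P and Q = 1 − P satisfy all of the relationships (2.6).

    import numpy as np

    # U_1 is spanned by the columns of A, U_2 by the column of B; together the
    # columns form a basis of R^3, so R^3 = U_1 (+) U_2. Both are chosen ad hoc.
    A = np.array([[1., 0.],
                  [0., 1.],
                  [1., 1.]])
    B = np.array([[0.],
                  [0.],
                  [1.]])
    M = np.hstack([A, B])

    # P keeps the U_1-component of the expansion (2.2); Q keeps the U_2-component.
    P = M @ np.diag([1., 1., 0.]) @ np.linalg.inv(M)
    Q = np.eye(3) - P

    assert np.allclose(P @ P, P) and np.allclose(Q @ Q, Q)
    assert np.allclose(P @ Q, 0) and np.allclose(Q @ P, 0)
    assert np.allclose(P + Q, np.eye(3))                     # relationships (2.6)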
Theorem 2.2. An operator P : V → V is a projector onto a subspace parallel to another subspace if and only if P² = P.

Proof. We have already shown that any projector satisfies the equality P² = P. Let’s prove the converse proposition. Suppose that P² = P. Let’s denote Q = 1 − P. Then for the operators P and Q all of the relationships (2.6) are fulfilled. Let’s consider two subspaces

U_1 = Im P,    U_2 = Ker P.
For an arbitrary vector v ∈ V we have the expansion

v = 1(v) = (P + Q) v = P(v) + Q(v),    (2.7)

where u_1 = P(v) ∈ Im P. From the relationship P Q = 0 for the other vector u_2 = Q(v) in (2.7) we get the equality

P(u_2) = P(Q(v)) = 0.

This means u_2 ∈ Ker P. Hence, V = Im P + Ker P. Let’s prove that this is a direct sum of subspaces. We should prove the uniqueness of the expansion

v = u_1 + u_2,    (2.8)

where u_1 ∈ Im P and u_2 ∈ Ker P. From u_1 ∈ Im P we conclude that u_1 = P(v_1) for some vector v_1 ∈ V. From u_2 ∈ Ker P we derive P(u_2) = 0. Then from (2.8) we derive the following formulas:

P(v) = P(u_1) + P(u_2) = P(P(v_1)) = P²(v_1) = P(v_1) = u_1,
Q(v) = (1 − P) v = v − P(v) = v − u_1 = u_2.
The relationships derived just above mean that any expansion (2.8) coincides with (2.7). Hence, it is unique and we have

V = Im P ⊕ Ker P.

The operator P maps an arbitrary vector v ∈ V into the first component of the expansion (2.8). Hence, P is an operator of projection onto the subspace Im P parallel to the subspace Ker P.
Now suppose that a linear vector space V is expanded into the direct sum of several of its subspaces U_1, ..., U_s:

V = U_1 ⊕ . . . ⊕ U_s.    (2.9)

This expansion of the space V implies the unique expansion for each vector v ∈ V:

v = u_1 + . . . + u_s, where u_i ∈ U_i.    (2.10)
Definition 2.2. The operator P_i : V → V that maps each vector v ∈ V to its i-th component u_i in the expansion (2.10) is called the operator of projection onto U_i parallel to the other subspaces.

The proof of the linearity of the operators P_i is practically the same as in the case of two subspaces considered in the theorem 2.1. It is based on the uniqueness of the expansion (2.10).

Let’s choose a vector u ∈ U_i. Then its expansion (2.10) looks like

u = 0 + . . . + 0 + u + 0 + . . . + 0.

Therefore, for any such vector u we have P_i(u) = u and P_j(u) = 0 for j ≠ i. For the projection operators P_i this yields
(P_i)² = P_i,    P_i ◦ P_j = 0 for i ≠ j.    (2.11)

Moreover, from the definition of P_i we get

P_1 + . . . + P_s = 1.    (2.12)
Due to the first relationship (2.11) the theory of separate operators P_i does not differ from the theory of projectors defined by two-component expansions of the space V. In the case of multicomponent expansions the collective behavior of projectors is of particular interest. A family of projection operators P_1, ..., P_s is called a concordant family of projectors if the operators of this family satisfy the relationships (2.11) and (2.12).

Theorem 2.3. A family of projection operators P_1, ..., P_s is determined by an expansion of the form (2.9) if and only if it is concordant, i. e. if these operators satisfy the relationships (2.11) and (2.12).
Proof. We already know that a family of projectors determined by an expansion (2.9) satisfies the relationships (2.11) and (2.12). Let’s prove the converse proposition. Suppose that we have a family of operators P_1, ..., P_s satisfying the relationships (2.11) and (2.12). Then we define the subspaces U_i = Im P_i. Due to the relationship (2.12) for an arbitrary vector v ∈ V we get

v = P_1(v) + . . . + P_s(v),    (2.13)

where P_i(v) ∈ Im P_i. Hence, we have the expansion of V into a sum of subspaces

V = Im P_1 + . . . + Im P_s.    (2.14)

Let’s prove that the sum (2.14) is a direct sum. For this purpose we consider an expansion of an arbitrary vector v ∈ V corresponding to the expansion (2.14):

v = u_1 + . . . + u_s, where u_i ∈ Im P_i.    (2.15)
From u_i ∈ Im P_i we conclude that u_i = P_i(v_i), where v_i ∈ V. Then from the expansion (2.15) we derive the following equality:

P_i(v) = P_i(u_1 + . . . + u_s) = Σ_{j=1}^{s} P_i(P_j(v_j)).

Due to (2.11) only one term in the above sum is nonzero. Therefore, we have

P_i(v) = (P_i)² v_i = P_i(v_i) = u_i.
This equality shows that an arbitrary expansion (2.15) should coincide with (2.13). This means that (2.13) is the unique expansion of that sort. Hence, the sum (2.14) is a direct sum and P_i is the projection operator onto the i-th component of the sum (2.14) parallel to its other components. The theorem is proved.
Now we consider a projection operator P as an example for the first approach
to the problem of bringing the matrix of a linear operator to a canonic form.
Theorem 2.4. For any nonzero projection operator in a finite-dimensional vector space V there is a basis e_1, ..., e_n such that the matrix of the operator P has the following form in that basis:

    | 1 0 . . . 0 0 . . . 0 |
    | 0 1 . . . 0 0 . . . 0 |
    | . . . . . . . . . . . |
P = | 0 0 . . . 1 0 . . . 0 |    (2.16)
    | 0 0 . . . 0 0 . . . 0 |
    | . . . . . . . . . . . |
    | 0 0 . . . 0 0 . . . 0 |

Here the unit diagonal block in the upper left corner is of size s × s.
Proof. Let’s consider the subspaces Im P and Ker P. From the condition P ≠ 0 we conclude that s = dim(Im P) ≠ 0. Then we choose a basis e_1, ..., e_s in U_1 = Im P and, if U_1 ≠ V, we complete it by choosing a basis e_{s+1}, ..., e_n in U_2 = Ker P. The sum of these two subspaces is a direct sum: V = U_1 ⊕ U_2, therefore, joining together the two bases in them, we get a basis of V (see the proof of theorem 6.3 in Chapter I).

Now let’s apply the operator P to the vectors of the basis we have constructed just above. This operator projects onto U_1 parallel to U_2, therefore, we have

P(e_i) = e_i for i = 1, . . . , s,
P(e_i) = 0   for i = s + 1, . . . , n.

Due to this formula it is clear that (2.16) is the matrix of the projection operator P in the basis e_1, ..., e_n.
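The following sketch illustrates theorem 2.4 for a specific projector in R^3 (chosen for illustration only): passing to a basis adapted to Im P ⊕ Ker P brings its matrix to the form (2.16).

    import numpy as np

    # P is the projector in R^3 onto the plane x_3 = 0 parallel to the line
    # spanned by u = (1, 1, 1); this concrete choice is for illustration only.
    u = np.array([1., 1., 1.])
    P = np.eye(3) - np.outer(u, [0., 0., 1.])
    assert np.allclose(P @ P, P)

    # Columns: a basis of Im P (two vectors) followed by a basis of Ker P.
    S = np.column_stack([P @ np.array([1., 0., 0.]),
                         P @ np.array([0., 1., 0.]),
                         u])
    canonical = np.linalg.inv(S) @ P @ S
    print(np.round(canonical))           # diag(1, 1, 0), i.e. the form (2.16)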
§ 3. Invariant subspaces.
Restriction and factorization of operators.
Let f : V → V be a linear operator and let U be a subspace of V. Let’s restrict the domain of f to the subspace U. Thereby the image of f shrinks to f(U). However, in general, the subspace f(U) is not enclosed into the subspace U. For this reason, in the general case we should treat the restricted operator f as a linear mapping f_U : U → V, rather than as a linear operator.
Definition 3.1. A subspace U is called an invariant subspace of a linear
operator f : V → V if f(U) ⊆ U, i. e. if u ∈ U implies f(u) ∈ U.
If U is an invariant subspace of f, the restriction f_U can be treated as a linear operator in U. Its action upon vectors u ∈ U coincides with the action of f upon u. As for the vectors outside the subspace U, the operator f_U cannot be applied to them at all.
Theorem 3.1. The kernel and the image of a linear operator f : V → V are
invariant subspaces of f.
Proof. Let’s consider the kernel of f first. If u ∈ Ker f, then f(u) = 0. Hence, f(u) ∈ Ker f, since the zero vector 0 is an element of any subspace of V. The invariance of the kernel Ker f is proved.

Now let u ∈ Im f. Denote w = f(u). Then w is the image of the vector u, hence, w = f(u) ∈ Im f. The invariance of the image Im f is proved.
Theorem 3.2. The intersection and the sum of an arbitrary number of invariant subspaces of a linear operator f : V → V are both invariant subspaces of f.

Proof. Let U_i, i ∈ I, be a family of invariant subspaces of a linear operator f : V → V. Let’s consider the intersection and the sum of these subspaces:

U = ∩_{i∈I} U_i,    W = Σ_{i∈I} U_i.
In § 6 of Chapter I we have proved that U and W are subspaces of V. Now we should prove that they are invariant subspaces. First, let’s prove that U is an invariant subspace. Consider a vector u ∈ U. This vector belongs to all subspaces U_i, which are invariant subspaces of f. Therefore, f(u) also belongs to all subspaces U_i. This means that f(u) belongs to their intersection U. The invariance of U is proved.
Now let’s consider a vector w ∈ W. According to the definition of the sum of subspaces, this vector admits the expansion

w = u_{i_1} + . . . + u_{i_s}, where u_{i_r} ∈ U_{i_r}.

Applying the operator f to both sides of this equality, we get:

f(w) = f(u_{i_1}) + . . . + f(u_{i_s}).

Due to the invariance of U_{i_r} we have f(u_{i_r}) ∈ U_{i_r}. Hence, f(w) ∈ W. This yields the invariance of the sum W of the invariant subspaces U_i.
Let U be an invariant subspace of a linear operator f : V → V. Let’s consider the factorspace V/U and define the operator f_{V/U} in this factorspace by the formula

f_{V/U}(Q) = Cl_U(f(v)), where Q = Cl_U(v).    (3.1)

The operator f_{V/U} : V/U → V/U acting according to the rule (3.1) is called the factoroperator or the quotient operator of the operator f by the subspace U. We can rewrite the formula (3.1) in a shorter form as follows:

f_{V/U}(Cl_U(v)) = Cl_U(f(v)).    (3.2)
Like the formulas (7.3) in Chapter I, the formulas (3.1) and (3.2) comprise a definite amount of uncertainty due to the uncertainty of the choice of a representative v in a coset Q = Cl_U(v). Therefore, we need to prove their correctness.

Theorem 3.3. The formula (3.1) and the equivalent formula (3.2) are both correct. They define a linear operator f_{V/U} in the factorspace V/U.
Proof. Let’s consider two different representative vectors in a coset Q, i. e. let v, ṽ ∈ Q. Then ṽ − v ∈ U. According to the formula (3.1), we consider two possible results of applying the operator f_{V/U} to Q:

f_{V/U}(Q) = Cl_U(f(v)),    f_{V/U}(Q) = Cl_U(f(ṽ)).

Let’s calculate the difference of these two possible results:

Cl_U(f(ṽ)) − Cl_U(f(v)) = Cl_U(f(ṽ) − f(v)) = Cl_U(f(ṽ − v)).

Note that the vector u = ṽ − v belongs to the subspace U. Since U is an invariant subspace, we have ũ = f(u) ∈ U. Therefore, we get

Cl_U(f(ṽ)) − Cl_U(f(v)) = Cl_U(ũ) = 0.

The coincidence Cl_U(f(ṽ)) = Cl_U(f(v)) that we have proved just above proves the correctness of the formula (3.1) and of the formula (3.2) as well.
Now let’s prove the linearity of the factoroperator f_{V/U} : V/U → V/U. We shall carry out the appropriate calculations on the base of the formula (3.1):

f_{V/U}(Q_1 + Q_2) = f_{V/U}(Cl_U(v_1) + Cl_U(v_2)) = f_{V/U}(Cl_U(v_1 + v_2)) = Cl_U(f(v_1 + v_2)) = Cl_U(f(v_1)) + Cl_U(f(v_2)) = f_{V/U}(Q_1) + f_{V/U}(Q_2),

f_{V/U}(α Q) = f_{V/U}(α Cl_U(v)) = f_{V/U}(Cl_U(α v)) = Cl_U(f(α v)) = Cl_U(α f(v)) = α Cl_U(f(v)) = α f_{V/U}(Q).
These calculations show that f_{V/U} is a linear operator. The theorem is proved.
Theorem 3.4. Suppose that U is a common invariant subspace of two linear operators f, g ∈ End(V). Then U is an invariant subspace of the operators f + g, α f, and f ◦ g as well. For their restrictions to the subspace U and for the corresponding factoroperators we have the following relationships:

(f + g)_U = f_U + g_U;        (f + g)_{V/U} = f_{V/U} + g_{V/U};
(α f)_U = α f_U;              (α f)_{V/U} = α f_{V/U};
(f ◦ g)_U = f_U ◦ g_U;        (f ◦ g)_{V/U} = f_{V/U} ◦ g_{V/U}.
Proof. Let’s begin with the first case. Denote h = f + g and assume that u is an arbitrary vector of U. Then f(u) ∈ U and g(u) ∈ U since U is an invariant subspace of both operators f and g. For this reason we obtain h(u) = f(u) + g(u) ∈ U. This proves that U is an invariant subspace of h. The relationship h_U = f_U + g_U follows from h = f + g since the results of applying the restricted operators to u do not differ from the results of applying f, g, and h to u. The corresponding relationship for the factoroperators is proved as follows:

h_{V/U}(Cl_U(v)) = Cl_U(h(v)) = Cl_U(f(v) + g(v)) = Cl_U(f(v)) + Cl_U(g(v)) = f_{V/U}(Cl_U(v)) + g_{V/U}(Cl_U(v)).
The second case, where we denote h = α f, is not much different from the first one. From u ∈ U it follows that f(u) ∈ U, hence, h(u) = α f(u) ∈ U. The relationship h_U = α f_U now is obvious due to the same reasons as above. For the factoroperators we perform the following calculations:

h_{V/U}(Cl_U(v)) = Cl_U(h(v)) = Cl_U(α f(v)) = α Cl_U(f(v)) = α f_{V/U}(Cl_U(v)) = (α f_{V/U})(Cl_U(v)).
Now we consider the third case. Here we denote h = f ◦ g. From u ∈ U we derive w = g(u) ∈ U, then from w ∈ U we derive f(w) ∈ U, which means that U is an invariant subspace of h. Indeed, h(u) = f(g(u)) = f(w) ∈ U. For the restricted operators this yields the equality

h_U(u) = h(u) = f(g(u)) = f_U(g_U(u)).

Hence, h_U = f_U ◦ g_U. Passing to the factoroperators, we obtain

h_{V/U}(Cl_U(v)) = Cl_U(h(v)) = Cl_U(f(g(v))) = f_{V/U}(Cl_U(g(v))) = f_{V/U}(g_{V/U}(Cl_U(v))) = (f_{V/U} ◦ g_{V/U})(Cl_U(v)).

The above calculations prove the last relationship of the theorem 3.4.
Theorem 3.5. Let V = U_1 ⊕ . . . ⊕ U_s be an expansion of a linear vector space V into a direct sum of its subspaces. The subspaces U_1, ..., U_s are invariant subspaces of an operator f : V → V if and only if the projection operators P_1, ..., P_s associated with the expansion V = U_1 ⊕ . . . ⊕ U_s commute with the operator f, i. e. if f ◦ P_i = P_i ◦ f, where i = 1, . . . , s.
Proof. Suppose that all subspaces U_i are invariant under the action of the operator f. For an arbitrary vector v ∈ V we consider the expansion determined by the direct sum V = U_1 ⊕ . . . ⊕ U_s:

v = u_1 + . . . + u_s.

Here u_i = P_i(v) ∈ U_i. From this expansion we derive

P_i(f(v)) = P_i(f(u_1) + . . . + f(u_s)) = f(u_i) = f(P_i(v)).

We used the inclusion w_j = f(u_j) ∈ U_j that follows from the invariance of the subspace U_j under the action of f. We also used the following properties of projection operators (they follow from (2.11) and U_i = Im P_i, see § 2 above):

P_i(w_j) = w_i for j = i,    P_i(w_j) = 0 for j ≠ i.

Since v is an arbitrary vector of the space V, from the above equality P_i(f(v)) = f(P_i(v)) we derive f ◦ P_i = P_i ◦ f.
Conversely, suppose that the operator f commutes with all projection operators P_1, ..., P_s associated with the expansion V = U_1 ⊕ . . . ⊕ U_s. Let u be an arbitrary vector of the subspace U_i. Then we denote w = f(u) and for w we derive

P_i(w) = P_i(f(u)) = f(P_i(u)) = f(u) = w.
Remember that P_i projects onto the subspace U_i. Hence, P_i(w) ∈ U_i. But due to the above equality we find that P_i(w) = w = f(u) ∈ U_i. Thus we have shown that the subspace U_i is invariant under the action of the operator f. The theorem is completely proved.
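Theorem 3.5 can be observed numerically. In the sketch below (an added illustration with coordinate subspaces of R^3 and ad hoc operators) a block-diagonal operator commutes with the associated projectors, while an operator that breaks the invariance does not.

    import numpy as np

    # Coordinate subspaces U_1 = span{e_1, e_2}, U_2 = span{e_3} of R^3 and the
    # associated projectors; the operators f and g below are chosen ad hoc.
    P1 = np.diag([1., 1., 0.])
    P2 = np.diag([0., 0., 1.])

    f = np.array([[2., 1., 0.],      # keeps both U_1 and U_2 invariant
                  [3., 4., 0.],
                  [0., 0., 5.]])
    assert np.allclose(f @ P1, P1 @ f) and np.allclose(f @ P2, P2 @ f)

    g = np.array([[2., 1., 7.],      # g(U_2) is not contained in U_2
                  [3., 4., 0.],
                  [0., 0., 5.]])
    assert not np.allclose(g @ P1, P1 @ g)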
Let’s consider a linear operator f in a finite-dimensional linear vector space V possessing an invariant subspace U. Suppose that dim V = n and dim U = s. Let’s choose a basis e_1, ..., e_s in U and then, if s < n, complete this basis up to a basis in V. Denote by e_{s+1}, ..., e_n the complementary vectors. For j ≤ s, due to the invariance of the subspace U under the action of f we have f(e_j) ∈ U. Therefore, in the expansions of these vectors

f(e_j) = Σ_{i=1}^{s} F^i_j e_i, where j ≤ s,

the summation index i runs from 1 to s, but not from 1 to n as it should in the general case, where we expand an arbitrary vector of V. This means that if we construct the matrix of the operator f in the basis e_1, ..., e_n, this matrix would be composed of blocks, with the lower left block being zero:
    | F^1_1 . . . F^1_s   F^1_{s+1}     . . . F^1_n     |
    | F^2_1 . . . F^2_s   F^2_{s+1}     . . . F^2_n     |
    | . . . . . . . . . . . . . . . . . . . . . . . . . |
F = | F^s_1 . . . F^s_s   F^s_{s+1}     . . . F^s_n     |    (3.3)
    |   0   . . .   0     F^{s+1}_{s+1} . . . F^{s+1}_n |
    | . . . . . . . . . . . . . . . . . . . . . . . . . |
    |   0   . . .   0     F^n_{s+1}     . . . F^n_n     |

Here the upper left diagonal block is of size s × s and the lower left block consists of zeros.
Matrices of this form are called blockwise-triangular matrices. The upper left diagonal block in the matrix (3.3) coincides with the matrix of the restricted operator f_U : U → U in the invariant subspace U.

The lower right diagonal block of the matrix (3.3) can also be interpreted in a special way. In order to find this interpretation let’s consider the cosets of the complementary vectors in the basis e_1, ..., e_n:

E_1 = Cl_U(e_{s+1}), . . . , E_{n−s} = Cl_U(e_n).    (3.4)
When proving the theorem 7.6 in Chapter I, we have found that these cosets form a basis in the factorspace V/U. Applying the factoroperator f_{V/U} to (3.4), we get

f_{V/U}(E_j) = f_{V/U}(Cl_U(e_{s+j})) = Cl_U(f(e_{s+j})) = Σ_{i=1}^{s} F^i_{s+j} Cl_U(e_i) + Σ_{i=s+1}^{n} F^i_{s+j} Cl_U(e_i).

The first sum in the above expression is equal to zero since the vectors e_1, ..., e_s belong to U. Then, shifting the index i + s → i, we find

f_{V/U}(E_j) = Σ_{i=1}^{n−s} F^{s+i}_{s+j} E_i.

Looking at this formula, we see that the matrix of the factoroperator f_{V/U} in the basis (3.4) coincides with the lower right diagonal block in the matrix (3.3).
Theorem 3.6. Let f : V → V be a linear operator in a finite-dimensional space and let U be an invariant subspace of this operator. Then the determinant of f is equal to the product of the determinant of the restricted operator f_U and the determinant of the factoroperator f_{V/U}:

det f = det(f_U) det(f_{V/U}).
The proof of this theorem is immediate from the following fact, well known in the theory of determinants: the determinant of a blockwise-triangular matrix is equal to the product of the determinants of all its diagonal blocks.
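Here is a small numerical check of theorem 3.6 (added for illustration): for a blockwise-triangular matrix of the form (3.3) the determinant factors into the determinants of the diagonal blocks, which represent f_U and f_{V/U}.

    import numpy as np

    # A is the matrix of f_U (s = 2), C is the matrix of f_{V/U}, B is the mixed
    # upper right block; all three are chosen arbitrarily.
    A = np.array([[1., 2.],
                  [3., 4.]])
    C = np.array([[5., 6.],
                  [0., 7.]])
    B = np.array([[8., 9.],
                  [1., 2.]])

    F = np.block([[A, B],
                  [np.zeros((2, 2)), C]])        # blockwise-triangular, cf. (3.3)
    assert np.isclose(np.linalg.det(F), np.linalg.det(A) * np.linalg.det(C))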
§ 4. Eigenvalues and eigenvectors.
Let f : V → V be a linear operator. A nonzero vector v ≠ 0 of the space V is called an eigenvector of the operator f if f v = λ v, where λ ∈ K. The number λ is called the eigenvalue of the operator f associated with the eigenvector v.

One eigenvalue λ of an operator f can be associated with several or even with an infinite number of eigenvectors. But conversely, if an eigenvector is given, the associated eigenvalue λ for this eigenvector is unique. Indeed, from the equality f v = λ v = λ′ v and from v ≠ 0 it follows that λ = λ′.

Let v be an eigenvector of the operator f : V → V. Let’s consider the other operator h_λ = f − λ 1. Then the equation f v = λ v can be rewritten as

(f − λ 1) v = 0.    (4.1)

Hence, v ∈ Ker(f − λ 1). The condition v ≠ 0 means that the kernel of this operator is nonzero: Ker(f − λ 1) ≠ {0}.
Definition 4.1. A number λ ∈ K is called an eigenvalue of a linear operator f : V → V if the subspace V_λ = Ker(f − λ 1) is nonzero. This subspace V_λ = Ker(f − λ 1) ≠ {0} is called the eigenspace associated with the eigenvalue λ, while any nonzero vector of V_λ is called an eigenvector of the operator f associated with the eigenvalue λ.
The collection of all eigenvalues of an operator f is sometimes called the spectrum of this operator, while the branch of mathematics studying the spectra of linear operators is known as the spectral theory of operators. The spectral theory of linear operators in finite-dimensional spaces is the simplest one. This is the very theory that is usually studied in the course of linear algebra.
Let f : V → V be a linear operator in a finite-dimensional linear vector space
V . In order to find the spectrum of this operator we apply the corollary of
theorem 1.4. Due to this corollary a number λ ∈ K is an eigenvalue of the operator
f if and only if it satisfies the equation
det(f −λ 1) = 0. (4.2)
The equation (4.2) is called the characteristic equation of the operator f; its roots are called the characteristic numbers of the operator f.
Let dim V = n. Then the determinant in the formula (4.2) is the determinant of a square n × n matrix. The matrix of the operator h_λ = f − λ 1 is derived from the matrix of the operator f by subtracting λ from each element on the primary diagonal of this matrix:

H_λ = | F^1_1 − λ   F^1_2       . . .  F^1_n     |
      | F^2_1       F^2_2 − λ   . . .  F^2_n     |    (4.3)
      | . . . . . . . . . . . . . . . . . . . .  |
      | F^n_1       F^n_2       . . .  F^n_n − λ |
The determinant of the matrix (4.3) is a polynomial in λ:

det(f − λ 1) = (−λ)^n + F_1 (−λ)^{n−1} + . . . + F_n.    (4.4)

The polynomial in the right hand side of (4.4) is called the characteristic polynomial of the operator f. If F is the matrix of the operator f in some basis, then the coefficients F_1, ..., F_n of the characteristic polynomial (4.4) are expressed through the elements of the matrix F. However, note that the left hand side of (4.4) is basis independent, therefore, the coefficients F_1, ..., F_n do not actually depend on the choice of basis. They are scalar invariants of the operator f. The first and the last invariants in (4.4) are the most popular ones:

F_1 = tr f,    F_n = det f.
The invariant F_1 is called the trace of the operator f. It is calculated through the matrix of this operator according to the following formula:

tr f = Σ_{i=1}^{n} F^i_i.    (4.5)

We shall not derive this formula (4.5) since it is well known in the theory of determinants. We shall only derive the invariance of the trace immediately on the base of the formula (1.4) which describes the transformation of the matrix of a linear operator under a change of basis:

Σ_{p=1}^{n} F̃^p_p = Σ_{i=1}^{n} Σ_{j=1}^{n} (Σ_{p=1}^{n} T^p_i S^j_p) F^i_j = Σ_{i=1}^{n} Σ_{j=1}^{n} δ^j_i F^i_j = Σ_{i=1}^{n} F^i_i.
Upon substituting (4.4) into (4.2) we see that the characteristic equation (4.2) of the operator f is a polynomial equation of n-th order with respect to λ:

(−λ)^n + F_1 (−λ)^{n−1} + . . . + F_n = 0.    (4.6)

Therefore we can estimate the number of eigenvalues of the operator f. Any eigenvalue λ ∈ K is a root of the characteristic equation (4.6). However, not any root of the equation (4.6) is an eigenvalue of the operator f. The matter is that a polynomial equation with coefficients in the numeric field K can have roots in some larger field K̃ (e. g. Q ⊂ R or R ⊂ C). For the characteristic number λ of the operator f to be an eigenvalue of this operator it should belong to K. From the course of general algebra we know that the total number of roots of the equation (4.6), counted according to their multiplicity and including those belonging to the extensions of the field K, is equal to n (see [4]).
Theorem 4.1. The number of eigenvalues of a linear operator f : V → V does not exceed the dimension of the space V.
Consider the case K = Q. The roots of a polynomial equation with rational coefficients are not necessarily rational numbers: the equation λ² − 3 = 0 is an example. In the case of real numbers K = R a polynomial equation with real coefficients can also have non-real roots, e. g. the equation λ² + 3 = 0. However, the field of complex numbers K = C is an exception.
Theorem 4.2. An arbitrary polynomial equation of n-th order with complex
coefficients has exactly n complex roots counted according to their multiplicity.
We shall not prove this theorem here, referring the reader to the course of general algebra (see [4]). The theorem 4.2 is known as the «basic theorem of
algebra», while the property of complex numbers stated in this theorem is called
the algebraic closure of C, i. e. C is an algebraically closed numeric field.
Definition 4.2. A numeric field K is called an algebraically closed field if the
roots of any polynomial equation with coefficients from K are again in K.
Certainly, C is not the unique algebraically closed field. However, in the list of
numeric fields Q, R, C that we consider in this book, only the field of complex
numbers is algebraically closed.
Let λ be an eigenvalue of a linear operator f. Then λ is a root of the equation
(4.6). The multiplicity of this root λ in the equation (4.6) is called the multiplicity
of the eigenvalue λ.
Theorem 4.3. For a linear operator f : V →V in a complex linear vector space
V the number of its eigenvalues counted according to their multiplicities is exactly
equal to the dimension of V .
This proposition strengthens the theorem 4.1. It is an immediate consequence of the algebraic closure of the field of complex numbers C. In the case K = C the characteristic polynomial (4.4) is factorized into a product of terms linear in λ:

det(f − λ 1) = Π_{i=1}^{n} (λ_i − λ).    (4.7)

For some operators such an expansion can occur in the case K = Q or K = R; however, it is not a typical situation. If λ_1, ..., λ_n are understood as the characteristic numbers of the operator f, then the formula (4.7) is always valid.
Due to the formula (4.7) we can present the numeric invariants F_1, ..., F_n of the operator f as elementary symmetric polynomials of its characteristic numbers:

F_i = σ_i(λ_1, . . . , λ_n).

In particular, for the trace and for the determinant of the operator f we have

tr f = Σ_{i=1}^{n} λ_i,    det f = Π_{i=1}^{n} λ_i.    (4.8)
The theory of symmetric polynomials is given in the course of general algebra (see,
for example, the book [4]).
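The formulas (4.8) are easy to verify numerically; in the sketch below (added for illustration) a random real matrix stands for the matrix of f, its eigenvalues being taken in the extension C.

    import numpy as np

    # A random real matrix stands for the matrix of f; its eigenvalues are
    # computed over C, so the sums and products below may be complex numbers.
    rng = np.random.default_rng(2)
    F = rng.normal(size=(4, 4))
    lam = np.linalg.eigvals(F)

    assert np.isclose(lam.sum(), np.trace(F))          # tr f = sum of the eigenvalues
    assert np.isclose(lam.prod(), np.linalg.det(F))    # det f = product of the eigenvalues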
Theorem 4.4. For any eigenvalue λ of a linear operator f : V → V the associated eigenspace V_λ is invariant under the action of f.

Proof. The definition 4.1 of an eigenspace V_λ of a linear operator f can be reformulated as V_λ = {v ∈ V : f(v) = λ v}. Therefore, v ∈ V_λ implies f(v) = λ v ∈ V_λ, which proves the invariance of V_λ.
We know that the set of linear operators in a space V forms the algebra End(V) over the numeric field K. However, this algebra is too big. Let’s consider some operator f ∈ End(V) and complement it with the identical operator 1. Within the algebra End(V) we can take positive integer powers of the operator f, we can multiply them by numbers from K, we can add such products, and we can add to them scalar operators obtained by multiplying the identical operator 1 by various numbers from K. As a result we obtain various operators of the form

P(f) = α_p f^p + . . . + α_1 f + α_0 1.    (4.9)
The set of all operators of the form (4.9) is called the polynomial envelope of the operator f; it is denoted K[f]. This is a subset of End(V) closed with respect to all algebraic operations in End(V). Such subsets are called subalgebras. It is important to say that the subalgebra K[f] is commutative, i. e. for any two polynomials P and Q the corresponding operators (4.9) commute:
P(f) ◦ Q(f) = Q(f) ◦ P(f). (4.10)
The equality (4.10) is verified by direct calculation. Indeed, let P(f) and Q(f) be two operator polynomials of the form

P(f) = Σ_{i=0}^{p} α_i f^i,    Q(f) = Σ_{j=0}^{q} β_j f^j.

Here we denote f^0 = 1. This relationship should be treated as the definition of the zeroth power of the operator f. Then

P(f) ◦ Q(f) = Σ_{i=0}^{p} Σ_{j=0}^{q} (α_i β_j) f^{i+j} = Q(f) ◦ P(f).

These calculations prove the relationship (4.10).
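The commutativity (4.10) of the polynomial envelope can be checked on matrices; in the sketch below (added for illustration) the polynomials are chosen arbitrarily.

    import numpy as np

    # P(f) = f^2 - 3 f + 2*1 and Q(f) = f^3 + 1 are arbitrary polynomials in f.
    rng = np.random.default_rng(3)
    f = rng.normal(size=(3, 3))
    I = np.eye(3)

    P = f @ f - 3 * f + 2 * I
    Q = np.linalg.matrix_power(f, 3) + I
    assert np.allclose(P @ Q, Q @ P)     # operators from K[f] commute, cf. (4.10)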
Theorem 4.5. Let U be an invariant subspace of an operator f. Then it is
invariant under the action of any operator from the polynomial envelope K[f].
Proof. Let u be an arbitrary vector of U. Let’s consider the following vectors: u_0 = u, u_1 = f(u), u_2 = f²(u), ..., u_p = f^p(u). Every next vector in this sequence is obtained by applying the operator f to the previous one: u_{i+1} = f(u_i). Therefore, from u_0 ∈ U it follows that u_1 ∈ U since U is an invariant subspace of f. Then, in turn, we successively obtain u_2 ∈ U, u_3 ∈ U, and so on up to u_p ∈ U. Applying the operator P(f) of the form (4.9) to the vector u, we get

P(f) u = α_p u_p + . . . + α_1 u_1 + α_0 u_0.

Hence, due to u_i ∈ U we find that P(f) u ∈ U, which proves the invariance of U under the action of the operator P(f).
The following fact is curious: if λ is an eigenvalue of the operator f and if v is an associated eigenvector, then P(f) v = P(λ) v. Therefore, any eigenvector v of the operator f is an eigenvector of the operator P(f). The converse proposition, however, is not true.
Let λ_1, ..., λ_s be a set of mutually distinct eigenvalues of the operator f. Let’s consider the operators h_i = f − λ_i 1, which certainly belong to the polynomial envelope of f. The permutability of any two such operators follows from (4.10). The eigenspace V_{λ_i} of the operator f is determined as the kernel of the operator h_i. According to the definition 4.1, it is nonzero. Moreover, the theorems 4.4 and 4.5 say that V_{λ_i} is invariant under the action of f and of all other operators h_j.
Theorem 4.6. Let λ_1, ..., λ_s be a set of mutually distinct eigenvalues of the operator f : V → V. Then the sum of the associated eigenspaces V_{λ_1}, ..., V_{λ_s} is a direct sum: V_{λ_1} + . . . + V_{λ_s} = V_{λ_1} ⊕ . . . ⊕ V_{λ_s}.

Note that the set of mutually distinct eigenvalues λ_1, ..., λ_s of the operator f in this theorem could be the complete set of such eigenvalues, or it could include only a part of such eigenvalues. This makes no difference for the result of the theorem 4.6; it remains valid in either case.
Proof. Let’s denote by W the sum of the eigenspaces of the operator f:

W = V_{λ_1} + . . . + V_{λ_s}.    (4.11)

In order to prove that the sum (4.11) is a direct sum we need to prove that for an arbitrary vector w ∈ W the expansion

w = v_1 + . . . + v_s, where v_i ∈ V_{λ_i},    (4.12)

is unique. For this purpose we consider the operator f_i defined by the formula

f_i = Π_{r≠i} h_r.
The operator f_i belongs to the polynomial envelope of the operator f and

f_i(v_j) = ( Π_{r≠i} (λ_j − λ_r) ) v_j.    (4.13)

This follows from v_j ∈ V_{λ_j}, which implies h_r(v_j) = (λ_j − λ_r) v_j. The formula (4.13) means that f_i(v_j) = 0 for all j ≠ i. Applying the operator f_i to both sides of the expansion (4.12), we get the equality

f_i(w) = ( Π_{r≠i} (λ_i − λ_r) ) v_i.

Hence, for the vector v_i in the expansion (4.12) we derive

v_i = f_i(w) / Π_{r≠i} (λ_i − λ_r).    (4.14)
The formula (4.14) uniquely determines all summands in the expansion (4.12) if
the vector w ∈ W is given. This means that the expansion (4.12) is unique and
the sum of subspaces (4.11) is a direct sum.
Definition 4.3. A linear operator f : V → V in a linear vector space V is called a diagonalizable operator if there is a basis e_1, ..., e_n in the space V such that the matrix of the operator f is diagonal in this basis.
Theorem 4.7. An operator f : V →V is diagonalizable if and only if the sum
of all its eigenspaces coincides with V .
Proof. Let f be a diagonalizable operator. Then we can choose a basis e_1, ..., e_n such that its matrix F in this basis is diagonal, i. e. only the diagonal elements F^i_i of this matrix can be nonzero. Then the relationship (1.2), which determines the matrix F, is written as f(e_i) = F^i_i e_i. Hence, each basis vector e_i is an eigenvector of the operator f, while λ_i = F^i_i is its associated eigenvalue. The expansion of an arbitrary vector v in this basis is an expansion by eigenvectors of the operator f. Therefore, having collected together the terms with coinciding eigenvalues in this expansion, we get the expansion

v = v_1 + . . . + v_s, where v_i ∈ V_{λ_i}.

Since v is an arbitrary vector of V, this means that V_{λ_1} + . . . + V_{λ_s} = V. The direct proposition of the theorem is proved.
Conversely, suppose that λ_1, ..., λ_s is the total set of mutually distinct eigenvalues of the operator f and assume that V_{λ_1} + . . . + V_{λ_s} = V. The theorem 4.6 says that this is a direct sum: V = V_{λ_1} ⊕ . . . ⊕ V_{λ_s}. Therefore, choosing a basis in each eigenspace and joining them together, we get a basis in V (see theorem 6.3 in Chapter I). This is a basis composed of eigenvectors of the operator f; the application of f to each basis vector reduces to multiplying this vector by its associated eigenvalue. Therefore, the matrix F of the operator f in this basis is diagonal. Its diagonal elements coincide with the eigenvalues of the operator f. The theorem is proved.
Assume that an operator f : V → V is diagonalizable and assume that we have chosen a basis where its matrix is diagonal. Then the matrix H_λ in the formula (4.3) is also diagonal. Hence, we immediately derive the following formula:

det(f − λ 1) = Π_{i=1}^{n} (F^i_i − λ).

Due to this equality we conclude that the characteristic polynomial of a diagonalizable operator is factorized into a product of linear terms, and all roots of the characteristic equation belong to the field K (not to its extension). This means that the characteristic numbers of a diagonalizable operator coincide with its eigenvalues. This is a necessary condition for the operator f to be diagonalizable. However, it is not a sufficient condition. Even in the case of the algebraically closed field of complex numbers K = C there are non-diagonalizable operators in vector spaces over the field C.
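The following sketch (added for illustration) contrasts a diagonalizable operator with the standard non-diagonalizable example over C, a 2 × 2 matrix chosen here only as an illustration.

    import numpy as np

    # A diagonalizable operator: its eigenvectors form a basis, so the matrix of
    # eigenvectors S brings F to diagonal form.
    F = np.array([[2., 1.],
                  [1., 2.]])
    lam, S = np.linalg.eig(F)
    assert np.allclose(np.linalg.inv(S) @ F @ S, np.diag(lam))

    # A non-diagonalizable operator over C: the only eigenvalue is 0 and its
    # eigenspace is one-dimensional, so the eigenspaces do not add up to C^2.
    J = np.array([[0., 1.],
                  [0., 0.]])
    assert np.linalg.matrix_rank(J) == 1        # dim Ker J = 1 < 2 = dim V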
§ 5. Nilpotent operators.
Definition 5.1. A linear operator f : V → V is called a nilpotent operator if for any vector v ∈ V there is a positive integer number k such that f^k(v) = 0.
According to the definition 5.1, for any vector v there is an integer number k (depending on v) such that f^k(v) = 0. The choice of such a number has no upper bound: indeed, if m > k and f^k(v) = 0, then f^m(v) = 0. This means that there is a minimal positive number k = k_min (depending on v) such that f^k(v) = 0. This minimal number k_min is called the height of the vector v with respect to the nilpotent operator f. The height of the zero vector is taken to be zero by definition; for any nonzero vector v its height is greater than or equal to unity. Let’s denote the height of v by ν(v) and define the number

ν(f) = max_{v∈V} ν(v).    (5.1)

For each vector v ∈ V its height is finite, but the maximum in (5.1) can be infinite since the number of vectors in a linear vector space usually is infinite.

Definition 5.2. In the case where the maximum in the formula (5.1) is finite, a nilpotent operator f is called an operator of finite height and the number ν(f) is called the height of the nilpotent operator f.
Theorem 5.1. In a finite-dimensional linear vector space V the height ν(f) of
any nilpotent operator f : V →V is finite.
Proof. Let’s choose a basis e_1, ..., e_n in V and consider the heights ν(e_1), ..., ν(e_n) of all basis vectors with respect to f. Then denote

m = max{ν(e_1), . . . , ν(e_n)}.

For an arbitrary vector v ∈ V consider its expansion v = v^1 e_1 + . . . + v^n e_n. Then, applying the operator f^m to v, we find

f^m(v) = Σ_{i=1}^{n} v^i f^m(e_i) = 0.    (5.2)

Due to the formula (5.2) we see that the heights of all vectors of the space V are bounded by the number m. This means that the height of the nilpotent operator f is finite: ν(f) = m < ∞.
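A strictly upper triangular matrix gives a simple nilpotent operator; the sketch below (added for illustration) computes the heights of the basis vectors and the height ν(f).

    import numpy as np

    # A strictly upper triangular matrix is nilpotent; the basis vectors of K^3
    # have heights 1, 2, 3 with respect to it, so nu(f) = 3.
    f = np.array([[0., 1., 0.],
                  [0., 0., 1.],
                  [0., 0., 0.]])

    def height(A, v):
        # minimal k >= 0 with A^k v = 0 (exists because A is nilpotent)
        k, w = 0, v.astype(float)
        while not np.allclose(w, 0):
            w, k = A @ w, k + 1
        return k

    heights = [height(f, e) for e in np.eye(3)]
    print(heights, "->", max(heights))          # [1, 2, 3] -> 3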
Theorem 5.2. If f : V → V is a nilpotent operator and if U is an invariant subspace of the operator f, then the restricted operator f_U and the factoroperator f_{V/U} are both nilpotent.
Proof. Any vector u of the subspace U ⊂ V is a vector of V. Therefore, there is an integer number k > 0 such that f^k(u) = 0. However, the result of applying the restricted operator f_U to a vector of U coincides with the result of applying the initial operator f to this vector. Hence, we have

(f_U)^k u = f^k(u) = 0.
This proves that f_U is a nilpotent operator. In the case of the factoroperator we consider an arbitrary coset Q in the factorspace V/U. Let Q = Cl_U(v), where v is some fixed vector in Q, and let k = ν(v) be the height of this vector v with respect to the operator f. Then we can calculate

(f_{V/U})^k Q = Cl_U(f^k(v)) = 0.

Now it is clear that the factoroperator f_{V/U} is a nilpotent operator. The theorem is completely proved.
Theorem 5.3. A nilpotent operator f cannot have a nonzero eigenvalue.
Proof. Let λ be an eigenvalue of a nilpotent operator f and let v ≠ 0 be an associated eigenvector. Then we have f(v) = λ v. On the other hand, since f is nilpotent, there is a number k > 0 such that f^k(v) = 0. Then we derive

f^k(v) = λ^k v = 0.

But v ≠ 0, therefore, λ^k = 0. This is an equation for λ and λ = 0 is its unique root. The theorem is proved.
In the finite-dimensional case this theorem can be strengthened as follows.
Theorem 5.4. In a finite-dimensional space V of the dimension dimV = n any
nilpotent operator f has exactly one eigenvalue λ = 0 with the multiplicity n.
Proof. We shall prove this theorem by induction on n = dim V. In the case n = 1 we fix some vector v ≠ 0 in V and denote by k = ν(v) its height. Then f^k(v) = 0 and f^{k−1}(v) ≠ 0. This means that w = f^{k−1}(v) ≠ 0 is an eigenvector of f with the eigenvalue λ = 0 since f(w) = f^k(v) = 0 = 0 · w. The base of the induction is proved.

Suppose that the theorem is proved for any finite-dimensional space of dimension less than n and consider a space V of the dimension n = dim V. As above, let’s fix some vector v ≠ 0 in V and denote by k = ν(v) its height with respect to the operator f. Then f^k(v) = 0 and w = f^{k−1}(v) ≠ 0. Hence, for the nonzero vector w we get the following series of equalities:

f(w) = f(f^{k−1}(v)) = f^k(v) = 0 = 0 · w.
Hence, w is an eigenvector of the operator f and λ = 0 is its associated eigenvalue.
Let’s consider the eigenspace U = V
0
corresponding to the eigenvalue λ = 0.
Let’s denote m = dimU = 0. The restricted operator f
U
is zero, hence, for
characteristic polynomial of this operator f
U
= 0 we derive
det(f
U
− λ 1) = (−λ)
m
.
Now, applying the theorem 3.6, we derive the characteristic polynomial of f:
det(f −λ 1) = (−λ)
m
det(f
V/U
−λ 1). (5.3)
The factoroperator f
V/U
is an operator in factorspace V/U whose dimension n−m
is less than n. Due to the theorem 5.2 the factoroperator f
V/U
is nilpotent,
therefore, we can apply the inductive hypothesis to it. Then for the characteristic
polynomial of the factoroperator f_{V/U} we get

        det(f_{V/U} − λ 1) = (−λ)^{n−m}.                            (5.4)

Comparing the above relationships (5.3) and (5.4), we find the characteristic
polynomial of the initial operator f:

        det(f − λ 1) = (−λ)^n.

This means that λ = 0 is the only eigenvalue of the operator f and its multiplicity
is n = dim V . The theorem is proved.
Let f : V → V be a linear operator. Consider a vector v ∈ V and denote by
k = ν(v) its height with respect to the operator f. This vector v produces a chain
of k vectors according to the following formulas:

        v_1 = f^{k−1}(v),   v_2 = f^{k−2}(v),   …,   v_k = f^0(v) = v.        (5.5)

The chain vectors (5.5) are related to each other as follows: v_{i−1} = f(v_i). Let's
apply the operator f to each vector in the chain (5.5). Then the first vector v_1
vanishes. Applying f to the remaining k − 1 vectors we get another chain:

        w_1 = f^{k−1}(v),   w_2 = f^{k−2}(v),   …,   w_{k−1} = f(v).          (5.6)

Comparing the two chains (5.5) and (5.6), we see that they are almost the same,
but the second chain is shorter. It is obtained from the first one by removing the
last vector v_k = v.

The vector v_1 is called the side vector or the eigenvector of the chain (5.5). The
other vectors are called the adjoint vectors of the chain. If the side vectors of two
chains are different, then these two chains have no coinciding vectors at all. However,
there is an even stronger result. It is known as the theorem on «linear independence
of chains».
Theorem 5.5. If the side vectors in several chains of the form (5.5) are linearly
independent, then the whole set of vectors in these chains is linearly independent.
Proof. We consider s chains of the form (5.5). In order to specify a chain
vector we use two indices: v_{i,j}. The first index i is the number of the chain to
which the vector v_{i,j} belongs, the second index j specifies the number of this
vector within the i-th chain. Denote by k_1, …, k_s the lengths of our chains.
Without loss of generality we can assume that the chains are arranged in the order
of decreasing lengths, i. e. we have the following inequalities:

        k_1 ≥ k_2 ≥ … ≥ k_s ≥ 1.                                    (5.7)

Let k = max{k_1, …, k_s}. We shall prove the theorem by induction on k. If
k = 1 then the lengths of all chains are equal to 1. Therefore, they contain only
the side vectors and have no adjoint vectors at all. The proposition of the theorem
in this case is obviously true.
Suppose that the theorem is valid for chains whose lengths are not greater
than k − 1. For our s chains, whose lengths are bounded by the number k, we
consider a linear combination of all their vectors equal to zero:

        Σ_{i=1}^{s} Σ_{j=1}^{k_i} α_{i,j} v_{i,j} = 0.              (5.8)

From this equality we should derive the triviality of the linear combination in its
left hand side. Let's apply the operator f to both sides of (5.8) and use the following
quite obvious relationships:

        f(v_{i,j}) = 0 for j = 1,        f(v_{i,j}) = v_{i,j−1} for j > 1.

If we take into account (5.7), then the result of applying f to (5.8) is written as

        Σ_{i=1}^{s} Σ_{j=1}^{k_i} α_{i,j} f(v_{i,j}) = Σ_{i=1}^{r} Σ_{j=2}^{k_i} α_{i,j} v_{i,j−1} = 0.    (5.9)

In the typical situation r = s. However, sometimes certain chains can drop out of
the above sums entirely. This happens if some of the chains were of length 1. In
this case r < s and k_{r+1} = … = k_s = 1. The lengths of all chains in (5.7) cannot
all be equal to 1 since k > 1.

Shifting the index j + 1 → j in the last sum we can write (5.9) as follows:

        Σ_{i=1}^{r} Σ_{j=1}^{k_i−1} α_{i,j+1} v_{i,j} = 0.          (5.10)

The left hand side of the relationship (5.10) is again a linear combination of chain
vectors. Here we have r chains whose lengths are one less than those of the original
chains in (5.8). Now we can apply the inductive hypothesis, which yields the linear
independence of all vectors present in (5.10). Hence, all coefficients of the linear
combination in the left hand side of (5.10) are equal to zero. When applied to (5.8),
this fact means that most of the terms in the left hand side of this equality do
actually vanish. The remainder is written as follows:

        Σ_{i=1}^{s} α_{i,1} v_{i,1} = 0.                            (5.11)

Now in the linear combination (5.11) we have only the side vectors of the initial
chains. They are linearly independent by the assumption of the theorem. Therefore,
the linear combination (5.11) is also trivial. From the triviality of (5.10) and (5.11)
it follows that the initial linear combination (5.8) is trivial too. We have completed
the inductive step and thus have proved the theorem in whole.
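Theorem 5.5 can also be illustrated numerically: take a nilpotent matrix, build the chains (5.5) from several starting vectors whose side vectors are linearly independent, and check that all chain vectors together are linearly independent. A minimal sketch (the matrix and the starting vectors are illustrative choices, not taken from the text):

```python
import numpy as np

# A nilpotent operator on R^4 consisting of two 2x2 Jordan cages (an invented example).
F = np.array([[0., 1., 0., 0.],
              [0., 0., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 0., 0.]])

def chain(F, v):
    """Chain (5.5): [f^(k-1)(v), ..., f(v), v], where k is the height of v."""
    vectors, w = [], v.astype(float)
    while not np.allclose(w, 0.0):
        vectors.append(w)
        w = F @ w
    return vectors[::-1]          # side vector first, the original vector last

v1 = np.array([0., 1., 0., 0.])   # height 2, side vector e_1
v2 = np.array([0., 0., 0., 1.])   # height 2, side vector e_3
vectors = chain(F, v1) + chain(F, v2)

# The side vectors e_1 and e_3 are independent, so all 4 chain vectors must be too.
print(np.linalg.matrix_rank(np.column_stack(vectors)) == len(vectors))   # True
```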
Let f : V → V be a nilpotent operator in a linear vector space V and let v be a
vector of the height k = ν(v) in V . Consider the chain of vectors (5.5) generated
by v and denote by U(v) the linear span of the chain vectors (5.5):

        U(v) = ⟨v_1, …, v_k⟩.                                       (5.12)
Due to the theorem 5.5 the subspace U(v) is a finite-dimensional subspace and
dim U(v) = k. The chain vectors (5.5) form a basis in this subspace (5.12). The
following relationships are derived directly from the definition of the chain (5.5):

        f(v_1) = 0,
        f(v_2) = v_1,
        . . . . . . . . .
        f(v_k) = v_{k−1}.                                           (5.13)

Due to (5.13) the subspace (5.12) is invariant under the action of the operator f.
Hence, we can consider the restricted operator f_{U(v)} and, using (5.13), we can find
the matrix of this restricted operator in the chain basis v_1, …, v_k:

        J_k(0) = \begin{pmatrix}
                 0 & 1 &        &   \\
                   & 0 & \ddots &   \\
                   &   & \ddots & 1 \\
                   &   &        & 0
                 \end{pmatrix}.                                     (5.14)

A matrix of the form (5.14) is called a Jordan block or a Jordan cage of a nilpotent
operator. Its primary diagonal is filled with zeros. The next diagonal above the
primary one is filled with unities. All other entries of the matrix (5.14) are zeros.
The matrix (5.14) is a square k×k matrix; if k = 1, this matrix degenerates and
becomes the purely zero matrix with the only element: J_1(0) = ‖0‖.
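For readers who like to experiment, here is a small NumPy sketch (an illustration, not part of the text) that builds the Jordan cage J_k(0) of (5.14) and checks the chain relations (5.13) in the chain basis.

```python
import numpy as np

def jordan_block_zero(k):
    """The k-by-k Jordan cage J_k(0): zeros on the diagonal, ones just above it."""
    return np.eye(k, k, 1)   # shifted identity: ones on the first superdiagonal

J = jordan_block_zero(4)
e = np.eye(4)                # chain basis v_1, ..., v_4 as the columns e[:, 0], ..., e[:, 3]

print(np.allclose(J @ e[:, 0], 0))                                       # f(v_1) = 0
print(all(np.allclose(J @ e[:, j], e[:, j - 1]) for j in range(1, 4)))   # f(v_j) = v_{j-1}
print(np.allclose(np.linalg.matrix_power(J, 4), 0))                      # J_4(0)^4 = 0
```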
Let f : V → V again be a nilpotent operator. We continue studying vector
chains of the form (5.5). For this purpose let's consider the following subspaces:

        U_k = Ker f ∩ Im f^{k−1}.                                   (5.15)

If u ∈ U_k, then u ∈ Im f^{k−1}. Therefore, u = f^{k−1}(v) for some vector v. This
means that u is a chain vector in a chain of the form (5.5). From the condition
u ∈ Ker f we derive f(u) = f^k(v) = 0. Hence, v is a vector of the height k and
u is the side vector in the chain (5.5) initiated by the vector v. For the subspaces
(5.15) we have the sequence of inclusions

        V_0 = U_1 ⊇ U_2 ⊇ … ⊇ U_k ⊇ … ,                             (5.16)

where V_0 = Ker f is the eigenspace corresponding to the unique eigenvalue λ = 0
of the nilpotent operator f. The inclusions (5.16) follow from the fact that any chain
(5.5) of the length k with the side vector u = f^{k−1}(v) can be treated as a chain of
the length k − 1 by dropping the k-th vector v_k = v (see (5.5) and (5.6)). Then for
the vector v′ = f(v) we have u = f^{k−2}(v′). This yields the inclusion of subspaces
U_k ⊂ U_{k−1} for k > 1.

In a finite-dimensional space V the height of any vector v ∈ V is bounded by
the height of the nilpotent operator f itself:

        ν(v) ≤ ν(f) = m < ∞

(see theorem 5.1). Therefore U_{m+1} = {0}. Hence, the sequence of inclusions (5.16)
terminates at the m-th step, i. e. we have a finite sequence of inclusions:

        V_0 = U_1 ⊇ U_2 ⊇ … ⊇ U_m ⊇ {0}.                            (5.17)

Sequences of mutually enclosed subspaces of the form (5.16) or (5.17) are called
flags, while each particular subspace in a flag is called a flag subspace.
Theorem 5.6. For any nilpotent operator f in a finite-dimensional space V
there is a basis in V composed by chain vectors of the form (5.5). Such a basis is
called a canonic basis or a Jordan basis of a nilpotent operator f.
Proof. The proof of the theorem is based on the fact that the flag (5.17) is
finite. We choose a basis in the smallest subspace U_m. Then we complete it up
to a basis in U_{m−1}, then in U_{m−2}, and so on backward along the sequence (5.17).
As a result we construct a basis e_1, …, e_s in V_0 = Ker f. Note that each vector
of such a basis is the side vector of some chain of the form (5.5). For the basis
vectors of the subspace U_m the lengths of such chains are equal to m. For the
complementary vectors from U_{m−1} their chains are of the length m − 1, and so on:
the lengths of the chains decrease step by step down to one for the complementary
vectors in the largest subspace U_1 = V_0.

Let's join together all vectors of the above chains and enumerate them by means
of double indices e_{i,j}, where i is the number of the chain and j is the individual
number of the vector within the i-th chain. Then

        e_1 = e_{1,1}, …, e_s = e_{s,1}.

Now let's prove that the set of all vectors from the above chains forms a basis in V .
The linear independence of this set of vectors follows from the theorem 5.5. We
only have to prove that an arbitrary vector v ∈ V can be represented as a linear
combination of the chain vectors e_{i,j}. We shall prove this fact by induction on
the height of the vector v.

If k = ν(v) = 1, then v ∈ Ker f = V_0. In this case v is expanded in the basis
e_1, …, e_s of the subspace V_0. This is the base of the induction.

Now suppose that any vector of height less than k can be represented as a linear
combination of the chain vectors e_{i,j}. Let's take a vector v of the height k and
denote u = f^{k−1}(v). Then f(u) = 0. This means that u is the side vector in a
chain of the length k initiated by the vector v. Therefore, u is an element of the
subspace U_k (see formula (5.15)); this vector can be expanded in the basis of the
subspace U_k, which we have constructed above:

        u = Σ_{i=1}^{r} α_i e_i.                                    (5.18)

Note that the expansion (5.18) involves only a part of the vectors e_1, …, e_s,
namely, only those of them that belong to U_k and, hence, are side vectors in chains
of the length not less than k. Therefore, we can write e_i = f^{k−1}(e_{i,k}) for
i = 1, …, r. Substituting these expressions into (5.18), we obtain

        f^{k−1}(v) = Σ_{i=1}^{r} α_i f^{k−1}(e_{i,k}).              (5.19)
By means of the coefficients of the expansion (5.19) we determine the vector v′:

        v′ = v − Σ_{i=1}^{r} α_i e_{i,k}.                           (5.20)

Applying the operator f^{k−1} to v′ and taking into account (5.19), we find

        f^{k−1}(v′) = f^{k−1}(v) − Σ_{i=1}^{r} α_i f^{k−1}(e_{i,k}) = 0.

Hence, the height of the vector v′ is less than k and we can apply the inductive
hypothesis to it. This means that v′ can be represented as a linear combination of
the chain vectors e_{i,j}. But v is expressed through v′ as follows:

        v = v′ + Σ_{i=1}^{r} α_i e_{i,k}.

Then v can also be expressed as a linear combination of the chain vectors e_{i,j}. The
inductive step is completed and the theorem in whole is proved.
In the basis composed by chain vectors, the existence of which was proved in
the theorem 5.6, the matrix of a nilpotent operator f has the following form:

        F = \begin{pmatrix}
            J_{k_1}(0) &            &        &            \\
                       & J_{k_2}(0) &        &            \\
                       &            & \ddots &            \\
                       &            &        & J_{k_s}(0)
            \end{pmatrix}.                                          (5.21)

The matrix (5.21) is blockwise-diagonal, its diagonal blocks are Jordan cages of
the form (5.14), and all other entries of this matrix are zeros. It is easy to
understand this fact. Indeed, each chain with the side vector e_i produces the
invariant subspace U(v) of the form (5.12), where v = e_{i,k_i}. Due to the
theorem 5.6 the space V is the direct sum of such invariant subspaces:

        V = U(e_{1,k_1}) ⊕ … ⊕ U(e_{s,k_s}).

The matrix (5.21) is called a Jordan form of the matrix of a nilpotent operator.
The theorem 5.6 is known as the theorem on bringing the matrix of a nilpotent
operator to a canonic Jordan form. If the chain basis e_1, …, e_s is constructed
strictly according to the proof of the theorem 5.6, then the Jordan cages are
arranged in the order of decreasing sizes:

        k_1 ≥ k_2 ≥ … ≥ k_s.

However, a permutation of the vectors e_1, …, e_s can change this order, and this
usually happens in practice.
Theorem 5.7. The height of a nilpotent operator f in a finite-dimensional space
V is less than or equal to the dimension n = dim V of this space and f^n = 0.

Proof. Above, in proving the theorem 5.1, we noted that the height ν(f) of a
nilpotent operator f coincides with the greatest height of the basis vectors. Due
to the theorem 5.6 we can now choose a chain basis. The height of a chain vector
is not greater than the length of the chain (5.5) to which it belongs. Therefore,
the height of the basis vectors in a chain basis is not greater than the number of
vectors in such a basis. This yields ν(f) ≤ n = dim V . The height of an arbitrary
vector v of V is not greater than the height of the operator f. Therefore, f^n(v) = 0
for all v ∈ V . This means that f^n = 0. The theorem is proved.
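The statement of theorem 5.7 is easy to test numerically. The following sketch (the random strictly upper-triangular matrix is an illustrative choice, since any such matrix is nilpotent) finds the height of a nilpotent matrix by direct powering and confirms ν(f) ≤ n and f^n = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Any strictly upper-triangular matrix is nilpotent; this one is a random example.
F = np.triu(rng.normal(size=(n, n)), k=1)

powers = [np.linalg.matrix_power(F, k) for k in range(n + 1)]
height = next(k for k, P in enumerate(powers) if np.allclose(P, 0))
print(height, height <= n)                             # nu(F) <= n = dim V
print(np.allclose(np.linalg.matrix_power(F, n), 0))    # F^n = 0, as Theorem 5.7 states
```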
§ 6. Root subspaces. Two theorems
on the sum of root subspaces.
Definition 6.1. The root subspace of a linear operator f : V → V corresponding
to its eigenvalue λ is the set

        V(λ) = {v ∈ V : ∃k ((k ∈ N) & ((f − λ 1)^k v = 0))}

consisting of the vectors vanishing under the action of some positive integer power
of the operator h_λ = f − λ 1.

For each positive integer k we define the subspace V(k, λ) = Ker (h_λ)^k. For
k = 1 the subspace V(1, λ) coincides with the eigenspace V_λ. Note that (h_λ)^k v = 0
implies (h_λ)^{k+1} v = 0. Therefore we have the sequence of inclusions

        V(1, λ) ⊆ V(2, λ) ⊆ … ⊆ V(k, λ) ⊆ …                         (6.1)

It is easy to see that all subspaces of the sequence (6.1) are enclosed into the root
subspace V(λ). Moreover, V(λ) is the union of the subspaces (6.1):

        V(λ) = ⋃_{k=1}^{∞} V(k, λ) = Σ_{k=1}^{∞} V(k, λ).           (6.2)

In (6.2) the sum of the subspaces V(k, λ) coincides with their union. Indeed, let v
be a vector of the sum of the subspaces V(k, λ). Then

        v = v_{k_1} + … + v_{k_s},   where v_{k_i} ∈ V(k_i, λ).     (6.3)

Let k = max{k_1, …, k_s}; then from the sequence of inclusions (6.1) we derive
v_{k_i} ∈ V(k, λ). Therefore the vector (6.3) belongs to V(k, λ) and, hence, it belongs
to the union of all subspaces V(k, λ).

The proof of the coincidence of the sum and the union in (6.2) is based only
on the inclusions (6.1). Therefore, we have actually proved the following more
general theorem.
Theorem 6.1. The sum of a growing sequence of mutually enclosed subspaces
coincides with their union.
The theorem 6.1 shows that the set V(λ) in the definition 6.1 is actually a subspace
in V . This subspace is nonzero since it comprises the eigenspace V_λ as a subset.
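In coordinates the subspaces V(k, λ) are the kernels of powers of the matrix of h_λ, so the chain (6.1) and its stabilization at V(λ) can be observed directly. A short sketch (the matrix below, with eigenvalue 2 of multiplicity 3 and eigenvalue 5 of multiplicity 1, is my own example):

```python
import numpy as np
from numpy.linalg import matrix_power, matrix_rank

A = np.array([[2., 1., 0., 0.],
              [0., 2., 0., 0.],
              [0., 0., 2., 0.],
              [0., 0., 0., 5.]])

n = A.shape[0]
lam = 2.0
h = A - lam * np.eye(n)          # the operator h_lambda = f - lambda*1

# dim V(k, lambda) = dim Ker (h_lambda)^k = n - rank((h_lambda)^k)
dims = [n - matrix_rank(matrix_power(h, k)) for k in range(1, n + 1)]
print(dims)   # [2, 3, 3, 3]: the chain (6.1) grows, then stabilizes at dim V(lambda) = 3
```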
Theorem 6.2. A root subspace V (λ) of an operator f is invariant under the
action of f and of all operators from its polynomial envelope P(f).
Proof. Let v ∈ V(λ). Then there exists a positive integer number k such that
(h_λ)^k v = 0. Let's consider the vector w = f(v). For this vector we have

        (h_λ)^k w = (h_λ)^k ∘ f v = f ∘ (h_λ)^k v = f((h_λ)^k(v)) = 0.

Here we used the permutability of the operators h_λ and f, which follows from the
inclusion h_λ ∈ P(f). Due to the above equality we have w = f(v) ∈ V(λ). The
invariance of V(λ) under the action of f is proved. Its invariance under the action
of the operators from P(f) now follows from the theorem 4.5.

Theorem 6.3. Let λ and µ be two eigenvalues of a linear operator f : V → V .
Then the restriction of the operator h_λ = f − λ 1 to the root subspace V(µ) is
    (1) a bijective operator if µ ≠ λ;
    (2) a nilpotent operator if µ = λ.
Proof. Let's prove the first proposition of the theorem. We already know that
the subspace V(µ) is invariant under the action of h_λ. For the sake of convenience
we denote by h_{λ,µ} the restriction of h_λ to the subspace V(µ). This is an operator
acting from V(µ) to V(µ). Let's find its kernel:

        Ker h_{λ,µ} = {v ∈ V(µ) : h_λ(v) = 0} = Ker h_λ ∩ V(µ).

The kernel of the operator h_λ by definition coincides with the eigenspace V_λ.
Therefore, Ker h_{λ,µ} = V_λ ∩ V(µ).

Let v be an arbitrary vector of the kernel Ker h_{λ,µ}. Due to the above result v
belongs to V_λ. Therefore, we have the equality

        f(v) = λ v.                                                 (6.4)

Simultaneously, we have the other condition v ∈ V(µ), which means that there
exists some integer number k > 0 such that

        (h_µ)^k v = (f − µ 1)^k v = 0.                              (6.5)

From (6.4) we get h_µ(v) = f(v) − µ v = (λ − µ) v. Combining this equality with
(6.5), we obtain the following equality for v:

        (h_µ)^k v = (λ − µ)^k v = 0.

Therefore, if λ ≠ µ, we immediately get v = 0, which means that Ker h_{λ,µ} = {0}.
Hence, in the case λ ≠ µ the operator h_{λ,µ} : V(µ) → V(µ) is injective. The
surjectivity of this operator and, hence, its bijectivity follows from its injectivity
due to the theorem 1.3.

Now let's prove the second proposition of the theorem. In this case µ = λ,
therefore, we consider the operator h_{λ,λ} being the restriction of h_λ to the subspace
V(λ). Note that h_{λ,λ} v = h_λ v for all v ∈ V(λ). Therefore, from the definition of a
root subspace we conclude that for any vector v ∈ V(λ) there is a positive integer
number k such that (h_{λ,λ})^k v = (f − λ 1)^k v = 0. This equality means that h_{λ,λ}
is a nilpotent operator in V(λ). The theorem is proved.
Theorem 6.4. Let λ_1, …, λ_s be a set of mutually distinct eigenvalues of a
linear operator f : V → V . Then the sum of the corresponding root subspaces is a
direct sum: V(λ_1) + … + V(λ_s) = V(λ_1) ⊕ … ⊕ V(λ_s).

Proof. The proof of this theorem is similar to that of the theorem 4.6. Denote
by W the sum of the subspaces specified in the theorem:

        W = V(λ_1) + … + V(λ_s).                                    (6.6)

In order to prove that the sum (6.6) is a direct sum we should prove the uniqueness
of the following expansion for an arbitrary vector w ∈ W:

        w = v_1 + … + v_s,   where v_i ∈ V(λ_i).                    (6.7)

Consider another expansion of the same sort for the same vector w:

        w = ṽ_1 + … + ṽ_s,   where ṽ_i ∈ V(λ_i).                    (6.8)

Then let's subtract the second expansion from the first one and for the sake of
brevity denote w_i = (v_i − ṽ_i) ∈ V(λ_i). As a result we get

        w_1 + … + w_s = 0.                                          (6.9)

Denote h_r = f − λ_r 1. According to the definition of the root subspace V(λ_r),
for any vector w_r in the expansion (6.9) there is some positive integer number k_r
such that (h_r)^{k_r} w_r = 0. We use this fact and define the operators

        f_i = ∏_{r≠i} (h_r)^{k_r}.                                  (6.10)

Due to the permutability of the operators h_1, …, h_s, which belong to the polynomial
envelope of the operator f, and due to the equality (h_r)^{k_r} w_r = 0 we get

        f_i(w_j) = 0   for all j ≠ i.

Let's apply the operator (6.10) to both sides of the equality (6.9). Then all terms
of the sum in the left hand side of this equality vanish, except for the i-th term
only. This yields f_i(w_i) = 0. Let's write this equality in expanded form:

        ( ∏_{r≠i} (h_r)^{k_r} ) w_i = 0.                            (6.11)

The vector w_i belongs to the root subspace V(λ_i), which is invariant under the
action of all the operators h_r in (6.11). Therefore we can replace the operators h_r
in (6.11) by their restrictions h_{r,i} to the subspace V(λ_i):

        ( ∏_{r≠i} (h_{r,i})^{k_r} ) w_i = 0.                        (6.12)
According to the theorem 6.3, the restricted operators h_{r,i} are bijective for r ≠ i.
The product (the composition) of bijective operators is bijective. We also know
that applying a bijective operator to a nonzero vector yields a nonzero result.
Therefore, (6.12) implies w_i = 0. Then v_i = ṽ_i and the expansions (6.7) and
(6.8) do coincide. The uniqueness of the expansion (6.7) and the theorem in whole
are proved.
Theorem 6.5. Let f be a linear operator in a finite-dimensional space V over
the field K and suppose that its characteristic polynomial factorizes into a product
of linear terms in K. Then the sum of all root subspaces of the operator f is equal
to V , i. e. V(λ_1) ⊕ … ⊕ V(λ_s) = V , where λ_1, …, λ_s is the set of all mutually
distinct eigenvalues of the operator f.

Proof. Since λ_1, …, λ_s is the set of all mutually distinct eigenvalues of the
operator f, for its characteristic polynomial we get

        det(f − λ 1) = ∏_{i=1}^{s} (λ_i − λ)^{n_i}.

According to the hypothesis of the theorem, it factorizes into a product of linear
polynomials of the form λ_i − λ, where λ_i is an eigenvalue of f and n_i is the
multiplicity of this eigenvalue. Let's denote by W the total sum of all root subspaces
of the operator f; we know that this is a direct sum (see theorem 6.4):

        W = V(λ_1) ⊕ … ⊕ V(λ_s).

The root subspaces are nonzero, hence, W ≠ {0}.

Further proof is by contradiction. Assume that the proposition of the theorem
is false and W ≠ V . The subspace W is invariant under the action of f as a sum
of the invariant subspaces V(λ_i) (see theorem 3.2). Due to the theorem 4.5 it is
invariant under the action of the operator h_λ = f − λ 1 as well. Let's apply the
theorem 3.5 to the operator h_λ. This yields

        det(f − λ 1) = det(f_W − λ 1) det(f_{V/W} − λ 1).           (6.13)

Here we took into account that 1_W = 1 and 1_{V/W} = 1; we also used the
theorem 3.4. The characteristic polynomial of the operator f is the product of the
characteristic polynomial of the restricted operator f_W and that of the factoroperator
f_{V/W}. The left hand side of (6.13) factorizes into a product of linear polynomials
in K, therefore, each of the polynomials in the right hand side of (6.13) should do
the same. Let λ_q be one of the eigenvalues of the factoroperator f_{V/W} and let
Q ∈ V/W be the corresponding eigenvector. Due to (6.13) the number λ_q is in the
list λ_1, …, λ_s of the eigenvalues of the operator f. Due to our assumption W ≠ V
we conclude that the factorspace V/W is nontrivial: V/W ≠ {0}, and the coset Q
is not zero. Suppose that v ∈ Q is a representative of this coset Q. Since Q ≠ 0, we
have v ∉ W. The coset Q is an eigenvector of the factoroperator f_{V/W}, therefore,
it should satisfy the following equality:

        (f_{V/W} − λ_q 1) Q = Cl_W((f − λ_q 1) v) = 0.              (6.14)
Let's denote h_r = f − λ_r 1 for all r = 1, …, s (we have already used this notation
in proving the previous theorem). The relationship (6.14) means that

        (f − λ_q 1) v = h_q(v) = w ∈ W.                             (6.15)

From the expansion W = V(λ_1) ⊕ … ⊕ V(λ_s) for the vector w, which arises in the
formula (6.15), we get the expansion

        h_q(v) = w = v_1 + … + v_s,   where v_i ∈ V(λ_i).           (6.16)

Let's consider the restriction of the operator h_q to the root subspace V(λ_i); this
restriction is denoted h_{q,i} (see the proof of theorem 6.4). Due to the theorem 6.3
we know that the operators h_{q,i} : V(λ_i) → V(λ_i) are bijective for all i ≠ q.
Therefore, for all v_1, …, v_s in (6.16) other than v_q we can find ṽ_i ∈ V(λ_i) such
that v_i = h_{q,i}(ṽ_i). Let's substitute these expressions into (6.16). Then we get

        w = h_q(v) = v_q + Σ_{i≠q} h_q(ṽ_i).                        (6.17)

Relying upon this formula (6.17), we define the new vector ṽ_q:

        ṽ_q = v − Σ_{i≠q} ṽ_i.                                      (6.18)

For this vector from (6.17) we derive h_q(ṽ_q) = v_q ∈ V(λ_q). Due to the definition
of the root subspace V(λ_q) there exists a positive integer number k such that
(h_q)^k v_q = 0. Hence, (h_q)^{k+1} ṽ_q = 0 and, therefore, ṽ_q ∈ V(λ_q). Returning
back to the formula (6.18), we derive

        v = Σ_{i=1}^{s} ṽ_i,   where ṽ_i ∈ V(λ_i).                  (6.19)

From the formula (6.19) and from the expansion W = V(λ_1) ⊕ … ⊕ V(λ_s) it follows
that v ∈ W, but this contradicts our initial choice v ∉ W, which was possible due
to the assumption W ≠ V . Hence, W = V . The theorem is proved.
§ 7. Jordan basis of a linear operator.
Hamilton-Cayley theorem.
Let f : V → V be a linear operator in a finite-dimensional linear vector space V .
Suppose that V is expanded into the sum of the root subspaces of the operator f:

        V = V(λ_1) ⊕ … ⊕ V(λ_s).                                    (7.1)

Let's denote h_i = f − λ_i 1. Then denote by h_{i,j} the restriction of h_i to V(λ_j).
According to the theorem 6.3, the restriction h_{i,i} is a nilpotent operator in the
i-th root subspace V(λ_i). Therefore, in V(λ_i) we can choose a canonic Jordan basis
for this operator (see theorem 5.6). The matrix of the operator h_{i,i} in a canonic
Jordan basis is a matrix of the form (5.21) composed by diagonal blocks, where
each diagonal block is a matrix of the form (5.14).
Definition 7.1. A Jordan normal basis of an operator f : V → V is a basis
composed by the canonic Jordan bases of the nilpotent operators h_{i,i} in the root
subspaces V(λ_i) of the operator f.

Note that an operator f in a finite-dimensional space V possesses a Jordan
normal basis if and only if V is expanded into the sum of the root subspaces of
the operator f, i. e. if we have (7.1). The theorem 6.5 yields a sufficient condition
for the existence of a Jordan normal basis of a linear operator.

Suppose that an operator f in a finite-dimensional linear vector space V possesses
a Jordan normal basis. The subspaces V(λ_i) in (7.1) are invariant with respect
to f. Let's denote by f_i the restriction of f to V(λ_i). The matrix of the operator f
in a Jordan normal basis is a blockwise-diagonal matrix:

        F = \begin{pmatrix}
            F_1 &     &        &     \\
                & F_2 &        &     \\
                &     & \ddots &     \\
                &     &        & F_s
            \end{pmatrix}.                                          (7.2)

The diagonal blocks F_i in (7.2) are determined by the operators f_i. Note that
the operators f_i and h_{i,i} are related to each other by the equality f_i = h_{i,i} + λ_i 1.
Therefore, F_i is also a blockwise-diagonal matrix:

        F_i = \begin{pmatrix}
              J_{k_1}(λ_i) &              &        &              \\
                           & J_{k_2}(λ_i) &        &              \\
                           &              & \ddots &              \\
                           &              &        & J_{k_r}(λ_i)
              \end{pmatrix}.                                        (7.3)

The number of the diagonal blocks in (7.3) is determined by the number of chains
in a canonic Jordan basis of the nilpotent operator h_{i,i}, while these diagonal blocks
themselves are matrices of the following form:

        J_k(λ) = \begin{pmatrix}
                 λ & 1 &        &   \\
                   & λ & \ddots &   \\
                   &   & \ddots & 1 \\
                   &   &        & λ
                 \end{pmatrix}.                                     (7.4)

A matrix of the form (7.4) is called a Jordan block or a Jordan cage with λ on
the diagonal. This is a square k×k matrix; if k = 1 this matrix degenerates and
becomes a matrix with the single element J_1(λ) = ‖λ‖.

The matrix of an operator f in a Jordan normal basis presented by the
relationships (7.2), (7.3), and (7.4) is called a Jordan normal form of the matrix
of this operator. The problem of constructing a Jordan normal basis for a linear
operator f and thus finding the Jordan normal form F of its matrix is known as
the problem of bringing the matrix of a linear operator to a Jordan normal form.
If the matrix of a linear operator can be brought to a Jordan normal form, this
fact has several important consequences. Note that a matrix of the form (7.4) is
upper-triangular. Hence, the matrices (7.3) and (7.2) are all upper-triangular. The
entries on the diagonal of (7.2) are the eigenvalues of the operator f, the i-th
eigenvalue λ_i being presented n_i times, where n_i = dim V(λ_i). From the course of
algebra we know that the determinant of an upper-triangular matrix is equal to
the product of all its diagonal elements. Therefore, the characteristic polynomial
of an operator possessing a Jordan normal basis is given by the formula

        det(f − λ 1) = ∏_{i=1}^{s} (λ_i − λ)^{n_i}.                 (7.5)
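Since a Jordan normal form is assembled from the blocks (7.4), it is easy to build one numerically and check the formula (7.5) on it. The sketch below uses an invented spectrum; the helper functions `jordan_cage` and `block_diag` are mine, not notation from the text.

```python
import numpy as np

def jordan_cage(lam, k):
    """The k-by-k Jordan cage J_k(lambda) of the form (7.4)."""
    return lam * np.eye(k) + np.eye(k, k, 1)

def block_diag(*blocks):
    """Assemble a blockwise-diagonal matrix as in (7.2) and (7.3)."""
    n = sum(b.shape[0] for b in blocks)
    F = np.zeros((n, n))
    pos = 0
    for b in blocks:
        k = b.shape[0]
        F[pos:pos + k, pos:pos + k] = b
        pos += k
    return F

# An invented Jordan normal form: eigenvalue 2 with cages of sizes 2 and 1,
# eigenvalue 5 with one cage of size 1.
F = block_diag(jordan_cage(2.0, 2), jordan_cage(2.0, 1), jordan_cage(5.0, 1))

# Check formula (7.5) at a test point: det(F - lambda*1) = (2 - lambda)^3 * (5 - lambda).
lam = 0.7
lhs = np.linalg.det(F - lam * np.eye(4))
rhs = (2.0 - lam) ** 3 * (5.0 - lam)
print(np.isclose(lhs, rhs))   # True
```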
Theorem 7.1. The matrix of a linear operator f in a finite-dimensional linear
vector space V over a numeric field K can be brought to a Jordan normal form if and
only if its characteristic polynomial factorizes into the product of linear polynomials
in the field K.
Proof. The necessity of the condition formulated in the theorem 7.1 is imme-
diate from (7.5); the sufficiency is provided by the theorems 5.6 and 6.5.
In the case of the field of complex numbers C any polynomial factorizes into a
product of linear terms. Therefore, the matrix of any linear operator in a complex
linear vector space can be brought to a Jordan normal form.
Theorem 7.2. The multiplicity of an eigenvalue λ of a linear operator f in a
finite-dimensional linear vector space V is equal to the dimension of the corresponding
root subspace V(λ).
For the operator f, the characteristic polynomial of which factorizes into
the product of linear terms, the proposition of theorem 7.2 immediately follows
from the formula (7.5). However, this fact is valid also in the case of partial
factorization. Such a case can be reduced to the case of complete factorization
by means of the field extension technique. We do not consider the field extension
technique in this book. But it is worth noting that the complete proof of the
following Hamilton-Cayley theorem is also based on that technique.
Theorem 7.3. Let P(λ) be the characteristic polynomial of a linear operator
f in a finite-dimensional space V . Then P(f) = 0.
Proof. We shall prove the Hamilton-Cayley theorem for the case where the
characteristic polynomial P(λ) factorizes into a product of linear terms:

        P(λ) = ∏_{i=1}^{s} (λ_i − λ)^{n_i}.                         (7.6)

Denote h_i := f − λ_i 1 and denote by h_{i,j} the restriction of h_i to the root subspace
V(λ_j). Then from the formula (7.6) we derive

        P(f) = ∏_{i=1}^{s} (h_i)^{n_i}.
Let's apply P(f) to an arbitrary vector v ∈ V . Due to the theorem 6.5 we can
expand v into a sum v = v_1 + … + v_s, where v_i ∈ V(λ_i). Therefore, we have

        P(f) v = P(f) v_1 + … + P(f) v_s.                           (7.7)

The root subspace V(λ_j) is invariant under the action of the operators h_i. Then

        P(f) v_j = ∏_{i=1}^{s} (h_i)^{n_i} v_j = ∏_{i=1}^{s} (h_{i,j})^{n_i} v_j.

Using the permutability of the operators h_i and of their restrictions h_{i,j}, we can
bring the above expression for P(f) v_j to the following form:

        P(f) v_j = ( ∏_{i≠j} (h_{i,j})^{n_i} ) (h_{j,j})^{n_j} v_j.  (7.8)

The operator h_{j,j} is a nilpotent operator in the subspace V(λ_j) and n_j = dim V(λ_j).
Therefore, we can apply the theorem 5.7. As a result we obtain (h_{j,j})^{n_j} v_j = 0.
Now from (7.7) and (7.8) for an arbitrary vector v ∈ V we derive P(f) v = 0. This
proves the theorem for the special case where the characteristic polynomial of the
operator f factorizes into a product of linear terms. The general case is reduced to
this special case by means of the field extension technique, which we do not consider
in this book.
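The identity P(f) = 0 is easy to check numerically for a concrete matrix. The sketch below (a random 4×4 matrix, used only as an illustration) evaluates the characteristic polynomial at the matrix itself; note that np.poly returns the monic polynomial det(λ·1 − f), which for even n coincides with det(f − λ·1).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))          # a random operator, purely for illustration

# Coefficients of the monic characteristic polynomial lambda^4 + c_3 lambda^3 + ... + c_0.
coeffs = np.poly(A)

# Evaluate P at the matrix A itself: A^4 + c_3 A^3 + ... + c_0 * 1.
P_of_A = sum(c * np.linalg.matrix_power(A, k)
             for k, c in zip(range(len(coeffs) - 1, -1, -1), coeffs))
print(np.allclose(P_of_A, 0))        # True: P(f) = 0, the Hamilton-Cayley identity
```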
CHAPTER III
DUAL SPACE.
§ 1. Linear functionals.
Vectors and covectors. Dual space.
Definition 1.1. Let V be a linear vector space over a numeric field K. A
numeric function y = f(v) with a vectorial argument v ∈ V and with values y ∈ K
is called a linear functional if
    (1) f(v_1 + v_2) = f(v_1) + f(v_2) for any two v_1, v_2 ∈ V ;
    (2) f(α v) = α f(v) for any v ∈ V and for any α ∈ K.
The definition of a linear functional is quite similar to the definition of a linear
mapping (see definition 8.1 in Chapter I). Comparing these two definitions, we see
that any linear functional f is a linear mapping f : V → K and, conversely, any
such linear mapping is a linear functional. Thereby the numeric field K is treated
as a linear space of the dimension 1 over itself.
Linear functionals, as linear mappings from V to K, constitute the space
Hom(V, K), which is called the dual space or the conjugate space for the space V .
The dual space Hom(V, K) is denoted by V*. The space of homomorphisms
Hom(V, W) is usually determined by two spaces V and W. However, the dual
space V* = Hom(V, K) is an exception: it is determined by V alone, since K is
known whenever V is given (see definition 2.1 in Chapter I).

Thus, V* = Hom(V, K) is a linear vector space over the same numeric field K
as V . If V is finite-dimensional, then the dimension of the conjugate space is
determined by the theorem 10.4 in Chapter I: dim V* = dim V . The structure of a
linear vector space in V* = Hom(V, K) is determined by two algebraic operations:
the operation of pointwise addition and pointwise multiplication by numbers (see
definitions 10.1 and 10.2 in Chapter I). However, it is worth formulating these
two definitions especially for the present case of linear functionals.

Definition 1.2. Let f and g be two linear functionals from V*. The sum of the
functionals f and g is the functional h whose values are determined by the formula
h(v) = f(v) + g(v) for all v ∈ V .

Definition 1.3. Let f be a linear functional from V*. The product of the
functional f by a number α ∈ K is the functional h whose values are determined
by the formula h(v) = α f(v) for all v ∈ V .

Let V be a finite-dimensional vector space over a field K and let e_1, …, e_n be
a basis in V . Then each vector v ∈ V can be expanded in this basis:

        v = v^1 e_1 + … + v^n e_n.                                  (1.1)
Let's consider the i-th coordinate of the vector v. Due to the uniqueness of the
expansion (1.1), when the basis is fixed, v^i is a number uniquely determined by
the vector v. Hence, we can consider a map h^i : V → K defined by the formula
h^i(v) = v^i. When adding vectors, their coordinates are added; when multiplying
a vector by a number, its coordinates are multiplied by that number (see the
relationships (5.4) in Chapter I). Therefore, h^i : V → K is a linear mapping. This
means that each basis e_1, …, e_n of a linear vector space V determines n linear
functionals in V*. The functionals h^1, …, h^n are called the coordinate functionals
of the basis e_1, …, e_n. They satisfy the relationships

        h^i(e_j) = δ^i_j,                                           (1.2)

where δ^i_j is the Kronecker symbol. The relationships (1.2) are called the
relationships of biorthogonality.

The proof of the relationships of biorthogonality is very simple. If we expand
the vector e_j in the basis e_1, …, e_n, then its j-th component is equal to unity,
while all other components are equal to zero. Note that h^i(e_j) is the number equal
to the i-th component of the vector e_j. Therefore, h^i(e_j) = 1 if i = j and
h^i(e_j) = 0 in all other cases.
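In coordinates the coordinate functionals are easy to compute: if the columns of a matrix E contain a basis e_1, …, e_n of K^n, then the rows of E^{-1} represent the functionals h^1, …, h^n, since h^i(v) is the i-th coordinate of v in that basis. A small sketch (the basis chosen below is an arbitrary invertible example):

```python
import numpy as np

# Columns of E form a basis e_1, e_2, e_3 of R^3 (an arbitrary invertible example).
E = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 2.]])
H = np.linalg.inv(E)        # row i of H represents the coordinate functional h^i

# Biorthogonality (1.2): h^i(e_j) = delta^i_j.
print(np.allclose(H @ E, np.eye(3)))       # True

# h^i(v) is the i-th coordinate of v in the basis e_1, e_2, e_3.
v = np.array([2., 3., 5.])
coords = H @ v
print(np.allclose(E @ coords, v))          # expanding back in the basis recovers v
```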
Theorem 1.1. The coordinate functionals h^1, …, h^n are linearly independent;
they form a basis in the dual space V*.

Proof. Let's consider a linear combination of the coordinate functionals associated
with a basis e_1, …, e_n in V and assume that it is equal to zero:

        α_1 h^1 + … + α_n h^n = 0.                                  (1.3)

The right hand side of (1.3) is the zero functional. Its value when applied to the
basis vector e_j is equal to zero. Hence, we have

        α_1 h^1(e_j) + … + α_n h^n(e_j) = 0.                        (1.4)

Now we use the relationships of biorthogonality (1.2). Due to these relationships,
among the n terms h^1(e_j), …, h^n(e_j) in the left hand side of the equality (1.4)
only one term is nonzero: h^j(e_j) = 1. Therefore, (1.4) reduces to α_j = 0. But j is
an index that runs from 1 to n. Hence, all coefficients of the linear combination
(1.3) are zero, i. e. it is trivial and the coordinate functionals h^1, …, h^n are linearly
independent.

In order to complete the proof of the theorem now we could use the equality
dim V* = dim V = n and refer to the item (4) of the theorem 4.5 in Chapter I.
However, we choose a more explicit way and directly prove that the coordinate
functionals h^1, …, h^n span the dual space V*. Let f ∈ V* be an arbitrary linear
functional and let v be an arbitrary vector of V . Then from (1.1) we derive

        f(v) = v^1 f(e_1) + … + v^n f(e_n) = f(e_1) h^1(v) + … + f(e_n) h^n(v).

Here f(e_1), …, f(e_n) are numeric coefficients from K and v is an arbitrary vector
of V . Therefore, the above equality can be rewritten as an equality of linear
functionals in the conjugate space V*:

        f = f(e_1) h^1 + … + f(e_n) h^n.                            (1.5)

The formula (1.5) shows that an arbitrary functional f ∈ V* can be represented as
a linear combination of the coordinate functionals h^1, …, h^n. Hence, being linearly
independent, they form a basis in V*. The theorem is proved.
Definition 1.4. The basis h^1, …, h^n in V* formed by the coordinate functionals
associated with a basis e_1, …, e_n in V is called the dual basis or the conjugate
basis for e_1, …, e_n.

Definition 1.5. Let f be a linear functional in a finite-dimensional space V
and let e_1, …, e_n be a basis in this space. The numbers f_1, …, f_n determined by
the linear functional f according to the formula

        f_i = f(e_i)                                                (1.6)

are called the coordinates or the components of f in the basis e_1, …, e_n.

As we see in the formula (1.5), the numbers (1.6) are the coefficients of the
expansion of f in the conjugate basis h^1, …, h^n. However, in the definition 1.5
they are mentioned as the components of f in the basis e_1, …, e_n. This is a purely
terminological trick: it means that we consider e_1, …, e_n as the primary basis,
while the conjugate basis is treated as an auxiliary and complementary object.

The algebraic operations of addition and multiplication by numbers in the
spaces V and V* are related to each other by the following equalities:

        f(v_1 + v_2) = f(v_1) + f(v_2),      f(α v) = α f(v);
                                                                    (1.7)
        (f_1 + f_2)(v) = f_1(v) + f_2(v),    (α f)(v) = α f(v).

Vectors and linear functionals enter these equalities in quite a similar way. The
fact that in the writing f(v) the functional plays the role of a function, while the
vector is written as its argument, is not so important. Therefore, sometimes the
quantity f(v) is denoted differently:

        f(v) = ⟨f | v⟩.                                             (1.8)

The writing (1.8) is associated with a special terminology. Functionals from the
dual space V* are called covectors, while the expression ⟨f | v⟩ itself is called the
pairing, or the contraction, or even the scalar product of a vector and a covector.
The scalar product (1.8) possesses the property of bilinearity: it is linear in its
first argument f and in its second argument v. This follows from the relationships
(1.7), which are now written as

        ⟨f_1 + f_2 | v⟩ = ⟨f_1 | v⟩ + ⟨f_2 | v⟩,      ⟨α f | v⟩ = α ⟨f | v⟩;
                                                                    (1.9)
        ⟨f | v_1 + v_2⟩ = ⟨f | v_1⟩ + ⟨f | v_2⟩,      ⟨f | α v⟩ = α ⟨f | v⟩.
We have already dealt with the concept of bilinearity earlier in this book (see
theorem 1.1 in Chapter II).
The properties (1.9) of the scalar product (1.8) are analogous to the properties
of the scalar product of geometric vectors, which is usually studied in the course of
analytic geometry (see [5]). However, in contrast to that «geometric» scalar product,
the scalar product (1.8) is not symmetric: its arguments belong to different spaces
and cannot be swapped. Covectors in the scalar product (1.8) are always written
on the left and vectors are always on the right.

The following definition is dictated by the intention to strengthen the analogy
of (1.8) with the traditional «geometric» scalar product.

Definition 1.7. A vector v and a covector f are called orthogonal to each
other if their scalar product is zero: ⟨f | v⟩ = 0.

Theorem 1.2. Let U ⊂ V be a subspace in a finite-dimensional vector space V
and let v ∉ U. Then there exists a linear functional f in V* such that f(v) ≠ 0
and f(u) = 0 for all u ∈ U.

Proof. Let dim V = n and dim U = s. Let's choose a basis e_1, …, e_s in
the subspace U. Let's add the vector v to the basis vectors e_1, …, e_s and denote
it v = e_{s+1}. The extended system of vectors is linearly independent since v ∉ U,
see the item (4) of the theorem 3.1 in Chapter I. Denote by W = ⟨e_1, …, e_{s+1}⟩
the linear span of this system of vectors. It is clear that W is a subspace of V
comprising the initial subspace U; its dimension is one greater than the dimension
of U. The vectors e_1, …, e_{s+1} form a basis in W. If W ≠ V , then we complete
the basis e_1, …, e_{s+1} up to a basis e_1, …, e_n in the space V and consider the
coordinate functionals h^1, …, h^n associated with this basis. Let's denote f = h^{s+1}.
Then from the relationships of biorthogonality (1.2) we derive

        f(v) = h^{s+1}(e_{s+1}) = 1   and   f(e_i) = 0 for i = 1, …, s.

Being zero on the basis vectors of the subspace U, the functional f = h^{s+1} vanishes
on all vectors u ∈ U. Its value on the vector v is equal to unity.

Let's consider the case U = {0} in the above theorem. Then for any nonzero
vector v we have v ∉ U, and we can formulate the following corollary of the
theorem 1.2.

Corollary. For any vector v ≠ 0 in a finite-dimensional space V there is a
linear functional f in V* such that f(v) ≠ 0.

Let V be a linear vector space over the field K and let W = V* be the conjugate
space of V . We know that W is also a linear vector space over the field K.
Therefore, it possesses its own conjugate space W*. With respect to V this is the
double conjugate space V**. We could also consider the triple conjugate space, the
fourth conjugate space, etc. Thus we would get an infinite sequence of conjugate
spaces. However, we shall soon see that in the case of finite-dimensional spaces
there is no need to consider the multiple conjugate spaces.

Let v ∈ V . To each f ∈ V* we associate the number f(v) ∈ K. Thus we define
a mapping ϕ_v : V* → K, which is linear due to the following relationships:

        ϕ_v(f_1 + f_2) = (f_1 + f_2)(v) = f_1(v) + f_2(v) = ϕ_v(f_1) + ϕ_v(f_2),
        ϕ_v(α f) = (α f)(v) = α f(v) = α ϕ_v(f).
Hence, ϕ_v is a linear functional in the space V* or, in other words, it is an element
of the double conjugate space. The functional ϕ_v is determined by the vector v ∈ V .
Therefore, associating ϕ_v with the vector v, we define a mapping

        h : V → V**,   where h(v) = ϕ_v for all v ∈ V .             (1.10)

The mapping (1.10) is a linear mapping. In order to prove this fact we should
verify the following identities for the mapping h:

        h(v_1 + v_2) = h(v_1) + h(v_2),     h(α v) = α h(v).        (1.11)

The result of applying h to a vector of the space V is an element of the double
conjugate space V**. Therefore, in order to verify the equalities (1.11) we should
apply both sides of these equalities to an arbitrary covector f ∈ V* and check the
coincidence of the results that we obtain:

        h(v_1 + v_2)(f) = ϕ_{v_1+v_2}(f) = f(v_1) + f(v_2) = ϕ_{v_1}(f) +
            + ϕ_{v_2}(f) = h(v_1)(f) + h(v_2)(f) = (h(v_1) + h(v_2))(f),

        h(α v)(f) = ϕ_{α v}(f) = f(α v) = α f(v) = α ϕ_v(f) = α h(v)(f) = (α h(v))(f).

Theorem 1.3. For a finite-dimensional linear vector space V the mapping (1.10)
is bijective. It is an isomorphism of the spaces V and V**. This isomorphism is
called the canonic isomorphism of these spaces.

Proof. First of all we shall prove the injectivity of the mapping (1.10). For
this purpose we consider its kernel Ker h. Let v be an arbitrary vector of Ker h.
Then ϕ_v = h(v) = 0. But ϕ_v ∈ V**, which means that ϕ_v is a linear functional in
the space V*. Therefore, the equality ϕ_v = 0 means that ϕ_v(f) = 0 for any covector
f ∈ V*. Using this equality, from (1.10) we derive

        h(v)(f) = ϕ_v(f) = f(v) = 0   for all f ∈ V*.               (1.12)

Now let's apply the corollary of the theorem 1.2. If the vector v were nonzero,
then there would be a functional f such that f(v) ≠ 0. This would contradict the
above condition (1.12). Hence, v = 0 by contradiction. This means that Ker h = {0}
and h is an injective mapping.

In order to prove the surjectivity of the mapping (1.10) we use the theorem 9.4
from Chapter I. According to this theorem

        dim(Ker h) + dim(Im h) = dim V.

We have already proved that dim(Ker h) = 0. Hence, dim(Im h) = dim V . Since
Im h is a subspace of V** and dim V** = dim V* = dim V , we have Im h = V**
(see item (3) of theorem 4.5 in Chapter I). This completes the proof of the
surjectivity of the mapping h and the proof of the theorem in whole.
The canonic isomorphism (1.10) possesses the property that for any vector v ∈ V
and for any covector f ∈ V* the following equality holds:

        ⟨h(v) | f⟩ = ⟨f | v⟩.                                       (1.13)

The equality (1.13) is derived from the definition of h. Indeed, ⟨h(v) | f⟩ =
h(v)(f) = ϕ_v(f) = f(v) = ⟨f | v⟩. The relationship (1.13) distinguishes the canonic
isomorphism among all other isomorphisms relating the spaces V and V**.
§ 2. Transformation of the coordinates
of a covector under a change of basis.
Let V be a finite-dimensional linear vector space and let V* be the associated
dual space. If we treat V* separately, forgetting its relation to V , then a choice of
basis and a change of basis in V* are quite the same as in any other linear vector
space. However, the conjugate space V* is practically never considered separately.
The theory of this space should be understood as an extension of the theory of the
initial space V .

Let e_1, …, e_n be a basis in a linear vector space V . Each such basis of V has
the associated basis of coordinate functionals in V*. Choosing another basis
ẽ_1, …, ẽ_n in V we immediately get another conjugate basis h̃^1, …, h̃^n in V*.
Let S be the transition matrix for passing from the old basis e_1, …, e_n to the
new basis ẽ_1, …, ẽ_n. Similarly, denote by P the transition matrix for passing
from the old dual basis h^1, …, h^n to the new dual basis h̃^1, …, h̃^n. The components
of these two transition matrices S and P are used to expand the vectors of the
«wavy» bases in the corresponding «non-wavy» bases:

        ẽ_j = Σ_{i=1}^{n} S^i_j e_i,        h̃^r = Σ_{s=1}^{n} P^r_s h^s.      (2.1)

Note that the second formula (2.1) differs from the standard given by the formula
(5.5) in Chapter I: the vectors of the dual bases in (2.1) are specified by upper
indices despite the usual convention of enumerating basis vectors by lower indices.
The reason is that the dual space V* and the dual bases are treated as complementary
objects with respect to the initial space V and its bases. We have already seen such
deviations from the standard notations in constructing the basis vectors E^i_j in
Hom(V, W) (see the proof of the theorem 10.4 in Chapter I).

In spite of breaking the standard rules of indexing the basis vectors, the
formula (2.1) does not break the rules of tensorial notation: the free index r is in
the same upper position in both sides of the equality, while the summation index s
enters twice, once as an upper index and once as a lower index.

Theorem 2.1. The transition matrix P for passing from the old conjugate
basis h^1, …, h^n to the new conjugate basis h̃^1, …, h̃^n is inverse to the transition
matrix S used for passing from the old basis e_1, …, e_n to the new basis ẽ_1, …, ẽ_n.

Proof. In order to prove this theorem we use the biorthogonality relationships
(1.2). Substituting (2.1) into these relationships, we get

        δ^r_j = h̃^r(ẽ_j) = Σ_{s=1}^{n} Σ_{i=1}^{n} P^r_s S^i_j h^s(e_i)
              = Σ_{s=1}^{n} Σ_{i=1}^{n} P^r_s S^i_j δ^s_i = Σ_{i=1}^{n} P^r_i S^i_j.
The above relationship can be written in matrix form as P S = 1. This means that
P = S^{−1}. The theorem is proved.
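Here is a small numerical check of theorem 2.1 (the bases below are arbitrary random examples). Representing a basis by the matrix whose columns are its vectors, the dual basis is given by the rows of the inverse matrix, and the computation confirms P = S^{-1}.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
E  = rng.normal(size=(n, n))      # columns: old basis e_1, ..., e_n (generic, hence invertible)
S  = rng.normal(size=(n, n))      # transition matrix to the new basis
Et = E @ S                        # columns: new basis, since e~_j = sum_i S^i_j e_i

H  = np.linalg.inv(E)             # rows: old dual basis h^1, ..., h^n
Ht = np.linalg.inv(Et)            # rows: new dual basis

# h~^r = sum_s P^r_s h^s reads in matrix form as Ht = P @ H, so P = Ht @ inv(H).
P = Ht @ np.linalg.inv(H)
print(np.allclose(P, np.linalg.inv(S)))   # True: P = S^(-1), as Theorem 2.1 states
```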
Remember that the inverse transition matrix T is also the inverse matrix for
S. Therefore, in order to write the complete set of formulas relating two pairs of
bases in V and V* it is sufficient to know the two matrices S and T = S^{−1}:

        ẽ_j = Σ_{i=1}^{n} S^i_j e_i,        h̃^r = Σ_{s=1}^{n} T^r_s h^s,
                                                                    (2.2)
        e_i = Σ_{j=1}^{n} T^j_i ẽ_j,        h^s = Σ_{r=1}^{n} S^s_r h̃^r.

Let f be a covector from the conjugate space V*. Let's consider its expansions
in the two conjugate bases h^1, …, h^n and h̃^1, …, h̃^n:

        f = Σ_{s=1}^{n} f_s h^s,        f = Σ_{r=1}^{n} f̃_r h̃^r.   (2.3)

The expansions (2.3) also differ from the standard introduced by the formula (5.1)
in Chapter I. To the coordinates of covectors the other standard is applied: they
are specified by lower indices and are written as row vectors.

Theorem 2.2. The coordinates of a covector f in two dual bases h^1, …, h^n
and h̃^1, …, h̃^n associated with the bases e_1, …, e_n and ẽ_1, …, ẽ_n in V are
related to each other by the formulas

        f̃_r = Σ_{s=1}^{n} S^s_r f_s,        f_s = Σ_{r=1}^{n} T^r_s f̃_r,      (2.4)

where S is the direct transition matrix for passing from e_1, …, e_n to the «wavy»
basis ẽ_1, …, ẽ_n, while T = S^{−1} is the inverse transition matrix.

Proof. In order to prove the first relationship (2.4) we substitute the fourth
expression (2.2) for h^s into the first expansion (2.3):

        f = Σ_{s=1}^{n} f_s ( Σ_{r=1}^{n} S^s_r h̃^r ) = Σ_{r=1}^{n} ( Σ_{s=1}^{n} S^s_r f_s ) h̃^r.

Then we compare the resulting expansion of f with the second expansion (2.3) and
derive the first formula (2.4). The second formula (2.4) is derived similarly.
Note that the formulas (2.4) can be derived immediately from the definition 1.5
and from formula (1.6) without using the conjugate bases.
Theorem 2.3. The scalar product of a vector v and a covector f is determined
by their coordinates according to the formula

        ⟨f | v⟩ = Σ_{i=1}^{n} f_i v^i = f_1 v^1 + … + f_n v^n.      (2.5)
Proof. In order to prove (2.5) we use the relationship (1.6):

        ⟨f | v⟩ = f(v) = Σ_{i=1}^{n} f(e_i) v^i = Σ_{i=1}^{n} f_i v^i.

In (2.5) and in the above calculations f is assumed to be expanded in the basis
h^1, …, h^n conjugate to the basis e_1, …, e_n in which v is expanded.
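The formulas (2.4) and (2.5) fit together: the pairing ⟨f | v⟩ computed from coordinates does not depend on the choice of basis, because vector coordinates transform with T = S^{-1} (see § 5 of Chapter I) while covector coordinates transform with S. A short check with an arbitrary random basis change, purely as an illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
v = rng.normal(size=n)            # coordinates v^i of a vector in the old basis
f = rng.normal(size=n)            # coordinates f_i of a covector in the old basis
S = rng.normal(size=(n, n))       # transition matrix to a new basis (generic, invertible)
T = np.linalg.inv(S)

v_new = T @ v                     # vector coordinates transform with T = S^(-1)
f_new = S.T @ f                   # covector coordinates transform with S, formula (2.4)

# The scalar product (2.5) is the same in both bases.
print(np.isclose(f @ v, f_new @ v_new))    # True
```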
§ 3. Orthogonal complements in a dual space.
Definition 3.1. Let S be a subset in a linear vector space V . The orthogonal
complement of the subset S in the conjugate space V* is the set S^⊥ ⊂ V* composed
by the covectors each of which is orthogonal to all vectors of S.

The above definition of the orthogonal complement S^⊥ can be expressed by the
formula S^⊥ = {f ∈ V* : ∀v ((v ∈ S) ⇒ (⟨f | v⟩ = 0))}.

Theorem 3.1. The operation of constructing orthogonal complements of subsets
S ⊂ V in the conjugate space V* possesses the following properties:
    (1) S^⊥ is a subspace in V*;
    (2) S_1 ⊂ S_2 implies (S_2)^⊥ ⊂ (S_1)^⊥;
    (3) ⟨S⟩^⊥ = S^⊥, where ⟨S⟩ is the linear span of S;
    (4) ( ⋃_{i∈I} S_i )^⊥ = ⋂_{i∈I} (S_i)^⊥.

Proof. Let's prove the first item of the theorem to begin with. For this
purpose we should verify two conditions from the definition of a subspace. Let
f_1, f_2 ∈ S^⊥; then ⟨f_1 | v⟩ = 0 and ⟨f_2 | v⟩ = 0 for all v ∈ S. Therefore, for all
vectors v ∈ S we derive the equality ⟨f_1 + f_2 | v⟩ = ⟨f_1 | v⟩ + ⟨f_2 | v⟩ = 0, which
means that f_1 + f_2 ∈ S^⊥.

Now assume that f ∈ S^⊥. Then ⟨f | v⟩ = 0 for all vectors v ∈ S. Hence, for
the covector α f we derive ⟨α f | v⟩ = α ⟨f | v⟩ = 0. This means that α f ∈ S^⊥.
Thus, the first item of the theorem 3.1 is proved.

In order to prove the inclusion (S_2)^⊥ ⊂ (S_1)^⊥ in the second item of the
theorem 3.1 we consider an arbitrary covector f ∈ (S_2)^⊥. From the condition
f ∈ (S_2)^⊥ we derive ⟨f | v⟩ = 0 for any v ∈ S_2. But S_1 ⊂ S_2, therefore, the
equality ⟨f | v⟩ = 0 holds for any v ∈ S_1. Then f ∈ (S_1)^⊥. This means that
f ∈ (S_2)^⊥ implies f ∈ (S_1)^⊥. The required inclusion is proved.

In order to prove the third item of the theorem note that the linear span of S
comprises this set: S ⊂ ⟨S⟩. Applying the item (2) of the theorem, which we have
already proved, we obtain the inclusion ⟨S⟩^⊥ ⊂ S^⊥. Now we need the opposite
inclusion S^⊥ ⊂ ⟨S⟩^⊥. In order to prove it let's remember that the linear span ⟨S⟩
consists of all possible linear combinations of the form

        v = α_1 v_1 + … + α_r v_r,   where v_i ∈ S.                 (3.1)

Let f ∈ S^⊥; then ⟨f | v⟩ = 0 for all v ∈ S. In particular, this applies to the vectors
v_i in the expansion (3.1), i. e. ⟨f | v_i⟩ = 0. Then from (3.1) we derive

        ⟨f | v⟩ = α_1 ⟨f | v_1⟩ + … + α_r ⟨f | v_r⟩ = 0.
This means that ⟨f | v⟩ = 0 for all v ∈ ⟨S⟩. This proves the opposite inclusion
S^⊥ ⊂ ⟨S⟩^⊥ and, thus, completes the proof of the equality ⟨S⟩^⊥ = S^⊥.

Now let's proceed to the proof of the fourth item of the theorem 3.1. For this
purpose we introduce the following notations:

        S = ⋃_{i∈I} S_i,        S̃ = ⋂_{i∈I} (S_i)^⊥.

Let f ∈ S^⊥. Then ⟨f | v⟩ = 0 for all v ∈ S. But S_i ⊂ S for any i ∈ I. Therefore,
⟨f | v⟩ = 0 for all v ∈ S_i and for all i ∈ I. This means that f belongs to each of the
orthogonal complements (S_i)^⊥ and, therefore, it belongs to their intersection. Thus,
we have proved the inclusion S^⊥ ⊂ S̃.

Conversely, from the inclusion f ∈ (S_i)^⊥ for all i ∈ I we derive ⟨f | v⟩ = 0 for
all v ∈ S_i and for all i ∈ I. This means that the equality ⟨f | v⟩ = 0 holds for all
vectors v in the union of all the sets S_i. This proves the converse inclusion S̃ ⊂ S^⊥.
Thus, we have proved that S^⊥ = S̃. The theorem is proved.

Definition 3.2. Let S be a subset of the conjugate space V*. The orthogonal
complement of S in V is the set S^⊥ ⊂ V formed by the vectors each of which is
orthogonal to all covectors of the set S.

The above definition of the orthogonal complement S^⊥ ⊂ V can be expressed
by the formula S^⊥ = {v ∈ V : ∀f ((f ∈ S) ⇒ (⟨f | v⟩ = 0))}. For this orthogonal
complement one can formulate a theorem quite similar to the theorem 3.1.

Theorem 3.2. The operation of constructing orthogonal complements of subsets
S ⊂ V* in V possesses the following four properties:
    (1) S^⊥ is a subspace in V ;
    (2) S_1 ⊂ S_2 implies (S_2)^⊥ ⊂ (S_1)^⊥;
    (3) ⟨S⟩^⊥ = S^⊥, where ⟨S⟩ is the linear span of S;
    (4) ( ⋃_{i∈I} S_i )^⊥ = ⋂_{i∈I} (S_i)^⊥.
The proof of this theorem almost literally coincides with the proof of the
theorem 3.1. Therefore, here we omit this proof.
Theorem 3.3. Let V be a finite-dimensional vector space and suppose that we
have a subspace U ⊂ V and a subspace W ⊂ V*. Then the condition W = U^⊥ in
the sense of definition 3.1 is equivalent to the condition U = W^⊥ in the sense of
definition 3.2.

Proof. Suppose that W = U^⊥ in the sense of definition 3.1. Then for any
w ∈ W and for any u ∈ U we have the orthogonality ⟨w | u⟩ = 0. By definition
W^⊥ is the set of all vectors v ∈ V such that ⟨w | v⟩ = 0 for all covectors w ∈ W.
Hence, u ∈ U implies u ∈ W^⊥ and we have the inclusion U ⊂ W^⊥.

However, we need to prove the coincidence U = W^⊥. Let's do it by contradiction.
Suppose that U ≠ W^⊥. Then there is a vector v_0 such that v_0 ∈ W^⊥ and
v_0 ∉ U. In this case we can apply the theorem 1.2, which says that there is a
linear functional f such that it vanishes on all vectors u ∈ U and is nonzero on
the vector v_0. Then f ∈ W and ⟨f | v_0⟩ ≠ 0, so we have a contradiction to the
condition v_0 ∈ W^⊥. This contradiction proves that U = W^⊥. As a result we have
proved that W = U^⊥ implies U = W^⊥.

Now, conversely, let U = W^⊥. Then for any w ∈ W and for any u ∈ U we
have the orthogonality ⟨w | u⟩ = 0. By definition U^⊥ is the set of all covectors f
perpendicular to all vectors u ∈ U. Hence, w ∈ W implies w ∈ U^⊥ and we have
the inclusion W ⊂ U^⊥.

The next step is to prove the coincidence W = U^⊥. We shall do it again by
contradiction. Assume that W ≠ U^⊥. Then there is a covector f_0 ∈ U^⊥ such
that f_0 ∉ W. Let's apply the theorem 1.2. In this case it says that there is a
linear functional ϕ in V** such that it vanishes on W and is nonzero on the
covector f_0. Remember that we have the canonic isomorphism h : V → V**. We
apply h^{−1} to ϕ and get the vector v = h^{−1}(ϕ). Then we take into account (1.13),
which yields v ∈ U and ⟨f_0 | v⟩ ≠ 0. This contradicts the condition f_0 ∈ U^⊥.
Hence, by contradiction, W = U^⊥, and thus U = W^⊥ implies W = U^⊥. The
theorem is completely proved.

The proposition of the theorem 3.3 can be reformulated as follows: in the case
of a finite-dimensional space V , for any subspace U ⊂ V and for any subspace
W ⊂ V* the following relationships are valid:

        (U^⊥)^⊥ = U,        (W^⊥)^⊥ = W.                            (3.2)

For arbitrary subsets S ⊂ V and R ⊂ V* (not necessarily subspaces) in the case
of a finite-dimensional space V we have the relationships

        (S^⊥)^⊥ = ⟨S⟩,      (R^⊥)^⊥ = ⟨R⟩.                          (3.3)

These relationships (3.3) are derived from (3.2) by using the item (3) in the
theorems 3.1 and 3.2.

Theorem 3.4. In the case of a finite-dimensional linear vector space V , if U is
a subspace of V or a subspace of V*, then dim U + dim U^⊥ = dim V .

Proof. Due to the relationships (3.2) the second case U ⊂ V* in the theorem 3.4
is reduced to the first case U ⊂ V if we replace U by U^⊥. Therefore, we consider
only the first case U ⊂ V .

Let dim V = n and dim U = s. We choose a basis e_1, …, e_s in the subspace U
and complete it up to a basis e_1, …, e_n in the space V . The basis e_1, …, e_n
determines the conjugate basis h^1, …, h^n in V*. If we specify vectors by their
coordinates in the basis e_1, …, e_n and covectors by their coordinates in the dual
basis h^1, …, h^n, then we can apply the formula (2.5).

By construction of the basis e_1, …, e_n, the subspace U consists of the vectors
whose initial s coordinates are arbitrary, while the remaining n − s coordinates
are equal to zero. Therefore the condition f ∈ U^⊥ means that the equality

        ⟨f | v⟩ = Σ_{i=1}^{s} f_i v^i = 0

should be fulfilled identically for arbitrary numbers v^1, …, v^s. This is the case if
and only if the first s coordinates of the covector f are zero. The other n − s coordinates
§ 4. CONJUGATE MAPPING. 97
of f are deliberate. This means that the subspace U

is the linear span of the last
n − s basis vectors of the conjugate basis:
U

= 'h
s+1
, . . . , h
n
`.
For the dimension of the subspace U

this yields dimU

= n − s, hence, we have
the required identity dimU +dimU

= dimV . The theorem is proved
The theorem 3.4 is known as the theorem on the dimension of orthogonal complements. As an immediate consequence of this theorem we get

    {0}^⊥ = V,      V^⊥ = {0},
                                                                         (3.4)
    {0}^⊥ = V^*,    (V^*)^⊥ = {0}.

All these equalities have a transparent interpretation. The first three of the equalities (3.4) can be proved immediately without using the finite-dimensionality of V. The proof of the last equality (3.4) uses the corollary of the theorem 1.2, while this theorem assumes V to be a finite-dimensional space.
Theorem 3.5. In the case of a finite-dimensional space V for any family of subspaces in V or in V^* the following relationships are fulfilled:

    (Σ_{i∈I} U_i)^⊥ = ∩_{i∈I} (U_i)^⊥,    (∩_{i∈I} U_i)^⊥ = Σ_{i∈I} (U_i)^⊥.          (3.5)

Proof. The sum of subspaces is the span of their union. Therefore, the first relationship (3.5) is an immediate consequence of the items (3) and (4) in the theorems 3.1 and 3.2. The finite-dimensionality of V is not used here.

The second relationship (3.5) follows from the first one upon substituting (U_i)^⊥ for U_i. Indeed, applying (3.2), we derive the equality

    (Σ_{i∈I} (U_i)^⊥)^⊥ = ∩_{i∈I} ((U_i)^⊥)^⊥ = ∩_{i∈I} U_i.

Now it is sufficient to pass to orthogonal complements in both sides of this equality and apply (3.2) again. The theorem is proved.
§ 4. Conjugate mapping.

Definition 4.1. Let f: V → W be a linear mapping from V to W. A mapping ϕ: W^* → V^* is called a conjugate mapping for f if for any v ∈ V and for any w ∈ W^* the relationship ⟨ϕ(w) | v⟩ = ⟨w | f(v)⟩ is fulfilled.

The problem of the existence of a conjugate mapping is solved by the definition 4.1 itself. Indeed, in order to define a mapping ϕ: W^* → V^*, for each functional w ∈ W^* we should specify the corresponding functional h = ϕ(w) ∈ V^*. But to specify a functional in V^* means to specify its action upon an arbitrary vector v ∈ V. In the sense of this reasoning the defining relationship for a conjugate mapping is written as follows:

    h(v) = ⟨h | v⟩ = ⟨ϕ(w) | v⟩ = ⟨w | f(v)⟩.

It is easy to verify that the above equality defines a linear functional h = h(v):

    h(v_1 + v_2) = ⟨w | f(v_1 + v_2)⟩ = ⟨w | f(v_1) + f(v_2)⟩ = ⟨w | f(v_1)⟩ + ⟨w | f(v_2)⟩ = h(v_1) + h(v_2),
    h(α v) = ⟨w | f(α v)⟩ = ⟨w | α f(v)⟩ = α ⟨w | f(v)⟩ = α h(v).
Theorem 4.1. For a linear mapping f: V → W from V to W the conjugate mapping ϕ: W^* → V^* is also linear.

Proof. Due to the definition 4.1, for the conjugate mapping ϕ: W^* → V^* we have the following relationships:

    ϕ(w_1 + w_2)(v) = ⟨w_1 + w_2 | f(v)⟩ = ⟨w_1 | f(v)⟩ + ⟨w_2 | f(v)⟩ = ϕ(w_1)(v) + ϕ(w_2)(v) = (ϕ(w_1) + ϕ(w_2))(v),
    ϕ(α w)(v) = ⟨α w | f(v)⟩ = α ⟨w | f(v)⟩ = α ϕ(w)(v) = (α ϕ(w))(v).

Since v ∈ V is an arbitrary vector of V, from the above calculations we obtain ϕ(w_1 + w_2) = ϕ(w_1) + ϕ(w_2) and ϕ(α w) = α ϕ(w). This means that the conjugate mapping ϕ is a linear mapping.
As we have seen above, the conjugate mapping ϕ: W^* → V^* for a mapping f: V → W is unique. It is usually denoted ϕ = f^*. The operation of passing from f to its conjugate mapping f^* possesses the following properties:

    (f + g)^* = f^* + g^*,    (α f)^* = α f^*,    (f ∘ g)^* = g^* ∘ f^*.

The first two properties are naturally called the linearity. The third property makes the operation f ↦ f^* an analog of the matrix transposition. All three of the above properties are proved by direct calculations on the base of the definition 4.1. We shall not give these calculations here since in what follows we shall not use these properties at all.
Theorem 4.2. In the case of finite-dimensional spaces V and W the kernels and images of the mappings f: V → W and f^*: W^* → V^* are related as follows:

    Ker f^* = (Im f)^⊥,    Ker f = (Im f^*)^⊥,
                                                                         (4.1)
    Im f = (Ker f^*)^⊥,    Im f^* = (Ker f)^⊥.

Proof. The kernel Ker f^* is the set of linear functionals of W^* that are mapped to the zero functional in V^* under the action of the mapping f^*. Therefore, w ∈ Ker f^* is equivalent to the equality f^*(w)(v) = 0 for all v ∈ V. As a result of simple calculations we obtain

    f^*(w)(v) = ⟨f^*(w) | v⟩ = ⟨w | f(v)⟩ = 0.

Hence, the kernel Ker f^* is the set of covectors orthogonal to the vectors of the form f(v). But the vectors of the form f(v) ∈ W constitute the image Im f. Therefore, Ker f^* = (Im f)^⊥. The first relationship (4.1) is proved. In proving this relationship we did not use the finite-dimensionality of W; it is valid for infinite-dimensional spaces as well.

In order to prove the second relationship we consider the orthogonal complement (Im f^*)^⊥. It is formed by the vectors orthogonal to all covectors of the form f^*(w):

    0 = ⟨f^*(w) | v⟩ = ⟨w | f(v)⟩.

Using the finite-dimensionality of W, we apply the corollary of the theorem 1.2. It says that if ⟨w | f(v)⟩ = 0 for all w ∈ W^*, then f(v) = 0. Therefore, we have (Im f^*)^⊥ = Ker f. The second relationship (4.1) is proved. The third and the fourth relationships are derived from the first and the second ones by means of the theorem 3.3. Thereby we use the finite-dimensionality of the spaces W and V.
Let the spaces V and W be finite-dimensional. Let's choose a basis e_1, …, e_n in V and a basis ẽ_1, …, ẽ_m in the space W. This choice uniquely determines the conjugate bases h^1, …, h^n and h̃^1, …, h̃^m in V^* and W^*. Let's consider a mapping f: V → W and the conjugate mapping f^*: W^* → V^*. The matrices of the mappings f and f^* are determined by the expansions:

    f(e_j) = Σ_{k=1}^{m} F^k_j ẽ_k,    f^*(h̃^i) = Σ_{q=1}^{n} Φ^i_q h^q.             (4.2)

The second relationship (4.2) is somewhat different in structure from the first one. The matter is that the basis vectors of the dual bases are indexed differently (with upper indices). However, this relationship implements the same idea as the first one: the mapping is applied to a basis vector of one space and the result is expanded in the corresponding basis of the other space.

Theorem 4.3. The matrices of the mappings f and f^* determined by the relationships (4.2) are the same, i. e. F^i_j = Φ^i_j.
Proof. From the definition of the conjugate mapping we derive

    ⟨h̃^i | f(e_j)⟩ = ⟨f^*(h̃^i) | e_j⟩.                                             (4.3)

Let's calculate separately the left and the right hand sides of this equality using the expansions (4.2) for this purpose:

    ⟨h̃^i | f(e_j)⟩ = Σ_{k=1}^{m} F^k_j ⟨h̃^i | ẽ_k⟩ = Σ_{k=1}^{m} F^k_j δ^i_k = F^i_j,
    ⟨f^*(h̃^i) | e_j⟩ = Σ_{q=1}^{n} Φ^i_q ⟨h^q | e_j⟩ = Σ_{q=1}^{n} Φ^i_q δ^q_j = Φ^i_j.

Substituting these expressions back into the formula (4.3), we get the required coincidence of the matrices: F^i_j = Φ^i_j.
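The coincidence of the two matrices is easy to check numerically. The following short sketch in Python with NumPy (the language, the sizes, and the test matrix are illustrative assumptions, not part of the course) represents vectors of V by coordinate columns and covectors of W^* by coordinate rows, so that f acts as x ↦ F x and f^* sends a row w to the row w F; reading off the matrix of f^* in the dual bases reproduces F.

    import numpy as np

    # Illustrative data: the matrix F of a mapping f: V -> W in bases e_1..e_n of V
    # and e~_1..e~_m of W, so that f(e_j) = sum_k F[k, j] e~_k as in (4.2).
    n, m = 4, 3
    rng = np.random.default_rng(0)
    F = rng.integers(-5, 5, size=(m, n)).astype(float)

    # A vector v of V is a coordinate column; its image has coordinates F @ v.
    # A covector w of W* is a coordinate row acting by w @ y; then f*(w) acts on v
    # by w @ (F @ v) = (w @ F) @ v, so f* sends the row w to the row w @ F.
    # The matrix Phi of f* in the dual bases is read off by applying f* to the
    # dual basis covectors (the rows of the identity matrix).
    Phi = np.vstack([np.eye(m)[i] @ F for i in range(m)])

    # Theorem 4.3: the two matrices coincide.
    assert np.allclose(Phi, F)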
Remark. In some theorems of this chapter the restrictions to the finite-dimensional case can be removed. However, the proof of such strengthened versions of these theorems is based on the axiom of choice (see [1]).
CHAPTER IV

BILINEAR AND QUADRATIC FORMS.

§ 1. Symmetric bilinear forms and quadratic forms. Recovery formula.

Definition 1.1. Let V be a linear vector space over a numeric field K. A numeric function y = f(v, w) with two arguments v, w ∈ V and with values in the field K is called a bilinear form if
(1) f(v_1 + v_2, w) = f(v_1, w) + f(v_2, w) for any two v_1, v_2 ∈ V;
(2) f(α v, w) = α f(v, w) for any v ∈ V and for any α ∈ K;
(3) f(v, w_1 + w_2) = f(v, w_1) + f(v, w_2) for any two w_1, w_2 ∈ V;
(4) f(v, α w) = α f(v, w) for any v ∈ V and for any α ∈ K.

The bilinear form f(v, w) is linear in its first argument v when the second argument w is fixed; it is also linear in its second argument w when the first argument v is fixed.

Definition 1.2. A bilinear form f(v, w) is called a symmetric bilinear form if f(v, w) = f(w, v).

Definition 1.3. A bilinear form f(v, w) is called a skew-symmetric bilinear form or an antisymmetric bilinear form if f(v, w) = −f(w, v).

Having a bilinear form f(v, w), one can produce a symmetric bilinear form:

    f_+(v, w) = (f(v, w) + f(w, v))/2.                                    (1.1)

Similarly, one can produce a skew-symmetric bilinear form:

    f_−(v, w) = (f(v, w) − f(w, v))/2.                                    (1.2)

The operation (1.1) is called the symmetrization of the bilinear form f; the operation (1.2) is called the alternation of this bilinear form. Thereby any bilinear form is the sum of a symmetric bilinear form and a skew-symmetric one:

    f(v, w) = f_+(v, w) + f_−(v, w).                                      (1.3)
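In coordinates (the matrix description of a bilinear form is given by the formula (1.7) below) the symmetrization (1.1) and the alternation (1.2) become the symmetric and the skew-symmetric parts of the matrix of the form. A minimal numerical sketch in Python with NumPy, on an arbitrary test matrix:

    import numpy as np

    # Illustrative matrix of a bilinear form f in some basis (see (1.7) below).
    F = np.array([[1.0, 2.0,  0.0],
                  [4.0, 3.0, -1.0],
                  [2.0, 5.0,  6.0]])

    F_plus  = (F + F.T) / 2     # matrix of the symmetrization (1.1)
    F_minus = (F - F.T) / 2     # matrix of the alternation (1.2)

    assert np.allclose(F_plus, F_plus.T)       # symmetric part
    assert np.allclose(F_minus, -F_minus.T)    # skew-symmetric part
    assert np.allclose(F_plus + F_minus, F)    # the expansion (1.3)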
Theorem 1.1. The expansion of a given bilinear form f(v, w) into the sum of a symmetric and a skew-symmetric bilinear form is unique.

Proof. Let's consider an expansion of f(v, w) into the sum of a symmetric and a skew-symmetric bilinear form:

    f(v, w) = h_+(v, w) + h_−(v, w).                                      (1.4)

By means of symmetrization and alternation from (1.4) we derive

    f(v, w) + f(w, v) = (h_+(v, w) + h_+(w, v)) + (h_−(v, w) + h_−(w, v)) = 2 h_+(v, w),
    f(v, w) − f(w, v) = (h_+(v, w) − h_+(w, v)) + (h_−(v, w) − h_−(w, v)) = 2 h_−(v, w).

Hence, h_+ = f_+ and h_− = f_−. Therefore, the expansion (1.4) coincides with the expansion (1.3). The theorem is proved.
Definition 1.4. A numeric function y = g(v) with one vectorial argument v ∈ V is called a quadratic form in a linear vector space V if g(v) = f(v, v) for some bilinear form f(v, w).

If g(v) = f(v, v), then the quadratic form g is said to be generated by the bilinear form f. For a skew-symmetric bilinear form we have f_−(v, v) = −f_−(v, v). Hence, f_−(v, v) = 0. Then from the expansion (1.3) we derive

    g(v) = f(v, v) = f_+(v, v).                                           (1.5)

The same quadratic form can be generated by several bilinear forms. The relationship (1.5) shows that any quadratic form can be generated by a symmetric bilinear form.

Theorem 1.2. For any quadratic form g(v) there is a unique symmetric bilinear form f(v, w) that generates g(v).

Proof. The existence of a symmetric bilinear form f(v, w) generating g(v) follows from (1.5). Let's prove the uniqueness of this form. From g(v) = f(v, v) and from the symmetry of the form f we derive

    g(v + w) = f(v + w, v + w) = f(v, v) + f(v, w) + f(w, v) + f(w, w) = f(v, v) + 2 f(v, w) + f(w, w).

Now f(v, v) and f(w, w) in the right hand side of this formula can be replaced by g(v) and g(w) respectively. Hence, we get

    f(v, w) = (g(v + w) − g(v) − g(w))/2.                                 (1.6)

The formula (1.6) shows that the values of the symmetric bilinear form f(v, w) are uniquely determined by the values of the quadratic form g(v). This proves the uniqueness of the form f.

The formula (1.6) is called a recovery formula. Usually, a quadratic form and the associated symmetric bilinear form are both denoted by the same symbol: g(v) = g(v, v). Moreover, when a quadratic form is given, we assume without special stipulations that the associated symmetric bilinear form g(v, w) is also given.
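The recovery formula (1.6) is easy to test numerically. The sketch below (Python with NumPy; the symmetric matrix G and the random vectors are illustrative assumptions) evaluates the quadratic form in coordinates and recovers the value of the symmetric bilinear form from it:

    import numpy as np

    # Illustrative symmetric matrix of a symmetric bilinear form g(v, w) in some basis.
    G = np.array([[2.0,  1.0,  0.0],
                  [1.0,  3.0, -1.0],
                  [0.0, -1.0,  1.0]])

    def quad(v):                 # the quadratic form g(v) = g(v, v)
        return v @ G @ v

    def recovered(v, w):         # the recovery formula (1.6)
        return (quad(v + w) - quad(v) - quad(w)) / 2

    rng = np.random.default_rng(1)
    v, w = rng.normal(size=3), rng.normal(size=3)
    assert np.isclose(recovered(v, w), v @ G @ w)    # coincides with g(v, w)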
Let f(v, w) be a bilinear form in a finite-dimensional linear vector space V and let e_1, …, e_n be a basis in this space. The numbers f_{ij} determined by the formula

    f_{ij} = f(e_i, e_j)                                                  (1.7)

are called the coordinates or the components of the form f in the basis e_1, …, e_n. The numbers (1.7) are written in form of a matrix

    F = \begin{pmatrix} f_{11} & \dots & f_{1n} \\ \vdots & \ddots & \vdots \\ f_{n1} & \dots & f_{nn} \end{pmatrix},                           (1.8)

which is called the matrix of the bilinear form f in the basis e_1, …, e_n. For the element f_{ij} of the matrix (1.8) the first index i specifies the row number, the second index j specifies the column number. The matrix of a symmetric bilinear form g is itself symmetric: g_{ij} = g_{ji}. Further, saying the matrix of a quadratic form g(v), we shall mean the matrix of the associated symmetric bilinear form g(v, w).

Let v^1, …, v^n and w^1, …, w^n be the coordinates of two vectors v and w in the basis e_1, …, e_n. Then the values f(v, w) and g(v) of a bilinear form and of a quadratic form respectively are calculated by the following formulas:

    f(v, w) = Σ_{i=1}^{n} Σ_{j=1}^{n} f_{ij} v^i w^j,    g(v) = Σ_{i=1}^{n} Σ_{j=1}^{n} g_{ij} v^i v^j.            (1.9)

In the case when g_{ij} is a diagonal matrix, the formula for g(v) contains only the squares of the coordinates of the vector v:

    g(v) = g_{11} (v^1)^2 + … + g_{nn} (v^n)^2.                           (1.10)

This supports the term «quadratic form». Bringing a quadratic form to the form (1.10) by means of choosing a proper basis e_1, …, e_n in a linear space V is one of the problems solved in the theory of quadratic forms.
Let e_1, …, e_n and ẽ_1, …, ẽ_n be two bases in a linear vector space V. Let's denote by S the transition matrix for passing from the first basis to the second one and denote T = S^{−1}. From (1.7) we easily derive the formula relating the components of a bilinear form f(v, w) in these two bases. For this purpose it is sufficient to substitute the relationship (5.8) of Chapter I into the formula (1.7) and use the bilinearity of the form f(v, w):

    f_{ij} = f(e_i, e_j) = Σ_{k=1}^{n} Σ_{q=1}^{n} T^k_i T^q_j f(ẽ_k, ẽ_q) = Σ_{k=1}^{n} Σ_{q=1}^{n} T^k_i T^q_j f̃_{kq}.

The reverse formula expressing f̃_{kq} through f_{ij} is derived similarly:

    f_{ij} = Σ_{k=1}^{n} Σ_{q=1}^{n} T^k_i T^q_j f̃_{kq},    f̃_{kq} = Σ_{i=1}^{n} Σ_{j=1}^{n} S^i_k S^j_q f_{ij}.            (1.11)

In matrix form these relationships are written as follows:

    F = T^{tr} F̃ T,    F̃ = S^{tr} F S.                                   (1.12)

Here S^{tr} and T^{tr} are the two matrices obtained from S and T by transposition.
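The transformation rule (1.12) can be verified numerically. In the sketch below (Python with NumPy; the matrices are random illustrative data, and the random matrix S is assumed to be invertible) the two formulas (1.12) are applied one after the other and return the original matrix:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4
    F = rng.normal(size=(n, n))       # matrix of a bilinear form in the basis e_1, ..., e_n
    S = rng.normal(size=(n, n))       # transition matrix to a new basis (assumed invertible)
    T = np.linalg.inv(S)              # T = S^{-1}

    F_new  = S.T @ F @ S              # the second formula (1.12): components in the new basis
    F_back = T.T @ F_new @ T          # the first formula (1.12): back to the old basis
    assert np.allclose(F_back, F)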
§ 2. Orthogonal complements with respect to a quadratic form.

Definition 2.1. Two vectors v and w in a linear vector space V are called orthogonal to each other with respect to a quadratic form g if g(v, w) = 0.

Definition 2.2. Let S be a subset of a linear vector space V. The orthogonal complement of the subset S with respect to a quadratic form g(v) is the set of vectors each of which is orthogonal to all vectors of S with respect to that quadratic form g. The orthogonal complement of S is denoted S^⊥ ⊂ V.

The orthogonal complement of a subset S with respect to a quadratic form g can be defined formally: S^⊥ = {v ∈ V : ∀w ((w ∈ S) ⇒ (g(v, w) = 0))}. For the orthogonal complements determined by a quadratic form g(v) there is a theorem analogous to the theorems 3.1 and 3.2 in Chapter III.

Theorem 2.1. The operation of constructing orthogonal complements of subsets S ⊂ V with respect to a quadratic form g possesses the following properties:
(1) S^⊥ is a subspace in V;
(2) S_1 ⊂ S_2 implies (S_2)^⊥ ⊂ (S_1)^⊥;
(3) ⟨S⟩^⊥ = S^⊥, where ⟨S⟩ is the linear span of S;
(4) (∪_{i∈I} S_i)^⊥ = ∩_{i∈I} (S_i)^⊥.
Proof. Let's prove the first item of the theorem for the beginning. For this purpose we should verify the two conditions from the definition of a subspace.

Let v_1, v_2 ∈ S^⊥. Then g(v_1, w) = 0 and g(v_2, w) = 0 for all w ∈ S. Hence, for all w ∈ S we have g(v_1 + v_2, w) = g(v_1, w) + g(v_2, w) = 0. This means that v_1 + v_2 ∈ S^⊥, so the first condition is verified.

Now let v ∈ S^⊥. Then g(v, w) = 0 for all w ∈ S. Hence, for the vector α v we derive g(α v, w) = α g(v, w) = 0. This means that α v ∈ S^⊥. Thus, the first item of the theorem 2.1 is proved.

In order to prove the inclusion (S_2)^⊥ ⊂ (S_1)^⊥ in the second item of the theorem 2.1 we consider an arbitrary vector v ∈ (S_2)^⊥. From the condition v ∈ (S_2)^⊥ we get g(v, w) = 0 for any w ∈ S_2. But S_1 ⊂ S_2, therefore, the equality g(v, w) = 0 is fulfilled for any w ∈ S_1. Then v ∈ (S_1)^⊥. Thus, v ∈ (S_2)^⊥ implies v ∈ (S_1)^⊥. This proves the required inclusion.

Now let's proceed to the third item of the theorem. Note that the linear span of S comprises this set: S ⊂ ⟨S⟩. Applying the second item of the theorem, which is already proved, we get the inclusion ⟨S⟩^⊥ ⊂ S^⊥. In order to prove the coincidence ⟨S⟩^⊥ = S^⊥ we have to prove the converse inclusion S^⊥ ⊂ ⟨S⟩^⊥. For this purpose let's remember that the linear span ⟨S⟩ is formed by the linear combinations

    w = α_1 w_1 + … + α_r w_r,  where w_i ∈ S.                            (2.1)

Let v ∈ S^⊥; then g(v, w) = 0 for all w ∈ S. In particular, this is true for the vectors w_i in the expansion (2.1): g(v, w_i) = 0. Then from (2.1) we derive

    g(v, w) = α_1 g(v, w_1) + … + α_r g(v, w_r) = 0.

Hence, g(v, w) = 0 for all w ∈ ⟨S⟩. This proves the converse inclusion S^⊥ ⊂ ⟨S⟩^⊥ and thus completes the proof of the coincidence ⟨S⟩^⊥ = S^⊥.

In proving the fourth item of the theorem we introduce the following notations:

    S = ∪_{i∈I} S_i,    S̃ = ∩_{i∈I} (S_i)^⊥.

Let v ∈ S^⊥. Then g(v, w) = 0 for all w ∈ S. But S_i ⊂ S for any i ∈ I. Therefore, g(v, w) = 0 for all w ∈ S_i and for all i ∈ I. This means that v belongs to each of the orthogonal complements (S_i)^⊥ and, hence, it belongs to their intersection. This proves the inclusion S^⊥ ⊂ S̃.

Conversely, if v ∈ (S_i)^⊥ for all i ∈ I, then g(v, w) = 0 for all w ∈ S_i and for all i ∈ I. Hence, g(v, w) = 0 for any vector w in the union of all the sets S_i. This proves the converse inclusion S̃ ⊂ S^⊥.

The above two inclusions S^⊥ ⊂ S̃ and S̃ ⊂ S^⊥ prove the coincidence of the two sets S^⊥ = S̃. The theorem 2.1 is proved.
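In coordinates the orthogonal complement of a finite subset S with respect to g is the solution set of a homogeneous linear system: v ∈ S^⊥ means w^{tr} G v = 0 for every w ∈ S, where G is the matrix of g. The following sketch (Python with NumPy; the matrix G and the vectors of S are illustrative test data) computes a basis of S^⊥ as the null space of the matrix whose rows are w^{tr} G:

    import numpy as np

    # Illustrative data: the (possibly degenerate) symmetric matrix G of a quadratic
    # form g in some basis, and two vectors forming a subset S.
    G = np.array([[1.0,  0.0, 0.0, 0.0],
                  [0.0, -2.0, 0.0, 0.0],
                  [0.0,  0.0, 0.0, 0.0],
                  [0.0,  0.0, 0.0, 3.0]])
    S_vectors = np.array([[1.0, 1.0, 0.0, 0.0],
                          [0.0, 0.0, 0.0, 1.0]])

    # v lies in S^perp iff (S_vectors @ G) @ v = 0; the null space of this matrix
    # is read off from the singular value decomposition.
    A = S_vectors @ G
    _, sing, Vt = np.linalg.svd(A)
    rank = int(np.sum(sing > 1e-12))
    complement_basis = Vt[rank:]       # rows form a basis of the subspace S^perp

    assert np.allclose(S_vectors @ G @ complement_basis.T, 0.0)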
Definition 2.3. The kernel of a quadratic form g(v) in a linear vector space V is the set Ker g = V^⊥ formed by the vectors orthogonal to each vector of the space V with respect to the form g.

Definition 2.4. A quadratic form with nontrivial kernel Ker g ≠ {0} is called a degenerate quadratic form. Otherwise, if Ker g = {0}, then the form g is called a non-degenerate quadratic form.

Due to the theorem 2.1 the kernel of a form g(v) is a subspace of the space V where this form is defined. The term «kernel» is not an occasional choice for denoting the set V^⊥. Each quadratic form is associated with some mapping for which the subspace V^⊥ is the kernel.

Definition 2.5. The associated mapping of a quadratic form g is the mapping a_g: V → V^* that takes each vector v of the space V to the linear functional f_v in the conjugate space V^* determined by the relationship

    f_v(w) = g(v, w) for all w ∈ V.                                       (2.2)

The associated mapping a_g: V → V^* is linear; this fact is immediate from the bilinearity of the form g. Its kernel Ker a_g coincides with the kernel of the form g. Indeed, the condition v ∈ Ker a_g means that the functional f_v determined by (2.2) is identically zero. Hence, v is orthogonal to all vectors w ∈ V with respect to the quadratic form g.

The associated mapping a_g relates the orthogonal complements S^⊥ determined by the quadratic form g and the orthogonal complements S^⊥ in the dual space, which we considered earlier in Chapter III.
Theorem 2.2. For any subset S ⊂ V and for any quadratic form g(v) in a linear vector space V the set S^⊥ ⊂ V is the total preimage of the set S^⊥ ⊂ V^* under the associated mapping a_g, i. e. S^⊥ = a_g^{−1}(S^⊥).

Proof. The condition v ∈ S^⊥ means that g(v, w) = 0 for all w ∈ S. But this equality can be rewritten in the following way:

    g(v, w) = f_v(w) = a_g(v)(w) = ⟨a_g(v) | w⟩ = 0 for all w ∈ S.

Hence, the condition v ∈ S^⊥ is equivalent to a_g(v) ∈ S^⊥. This proves the required equality S^⊥ = a_g^{−1}(S^⊥).
According to the definition 2.3, the vectors of the kernel Ker g are orthogonal to all vectors of the space V with respect to the form g. Therefore (Ker g)^⊥ = V. If we apply the result of the theorem 2.2 to the subset S = Ker g, we get

    a_g^{−1}((Ker g)^⊥) = (Ker g)^⊥ = V,

where the complement under the preimage sign is taken in the dual space V^*. This result becomes more clear if we write it in the following equivalent form:

    Im a_g = a_g(V) ⊆ (Ker g)^⊥.                                          (2.3)

Corollary 1. The image of the associated mapping a_g is enclosed into the orthogonal complement of its kernel (Ker a_g)^⊥, i. e. Im a_g ⊆ (Ker a_g)^⊥.

This corollary of the theorem 2.2 is derived from the formula (2.3) if we take into account Ker g = Ker a_g. For a quadratic form g in a finite-dimensional space V it can be strengthened.

Corollary 2. For a quadratic form g(v) in a finite-dimensional linear vector space V the image of the associated mapping a_g: V → V^* coincides with the orthogonal complement of its kernel Ker a_g:

    Im a_g = (Ker a_g)^⊥.                                                 (2.4)

Proof. Using the theorem 9.4 from Chapter I, we calculate the dimension of the image Im a_g of the associated mapping:

    dim(Im a_g) = dim V − dim(Ker a_g).

The dimension of the orthogonal complement of Ker a_g in the dual space is determined by the theorem 3.4 of Chapter III:

    dim(Ker a_g)^⊥ = dim V − dim(Ker a_g).

As we can see, the dimensions of these two subspaces are equal to each other. Therefore, we can apply the above corollary 1 and the item (3) of the theorem 4.5 from Chapter I. As a result we get the required equality (2.4).
Theorem 2.3. Let U ⊊ V be a subspace of a finite-dimensional space V comprising the kernel of a quadratic form g. For any vector v ∉ U there exists a vector w ∈ V such that g(v, w) ≠ 0 and g(w, u) = 0 for all u ∈ U.

Proof. This theorem is an analog of the theorem 1.2 from Chapter III. Its proof is essentially based on that theorem. Applying the theorem 1.2 from Chapter III, we find that there exists a linear functional f ∈ V^* such that f(v) ≠ 0 and f(u) = ⟨f | u⟩ = 0 for all u ∈ U. Due to the last condition this functional f belongs to the orthogonal complement U^⊥ ⊂ V^*. From the inclusion Ker g ⊂ U, applying the item (2) of the theorem 3.1 from Chapter III, we get U^⊥ ⊂ (Ker g)^⊥. Hence, we conclude that f ∈ (Ker g)^⊥.

Now we apply the corollary 2 of the theorem 2.2. From this corollary we obtain that (Ker g)^⊥ = Im a_g. Hence, f ∈ Im a_g and there is a vector w ∈ V that is taken to f by the associated mapping a_g, i. e. f = a_g(w). Then

    g(v, w) = a_g(w)(v) = f(v) ≠ 0,
    g(w, u) = a_g(w)(u) = f(u) = 0 for all u ∈ U.

Due to these relationships we find that w is the very vector needed to complete the proof of the theorem.
Theorem 2.4. Let V be a finite-dimensional linear vector space and let U and W be two subspaces of V comprising the kernel Ker g of a quadratic form g. Then the conditions W = U^⊥ and U = W^⊥ are equivalent to each other.

Proof. The theorem 2.4 is an analog of the theorem 3.3 from Chapter III. The proofs of these two theorems are also very similar.

Suppose that the condition W = U^⊥ is fulfilled. Then for any vector w ∈ W and for any vector u ∈ U we have the relationship g(w, u) = 0. The set W^⊥ is formed by the vectors orthogonal to all vectors of W with respect to the quadratic form g. Therefore, we have the inclusion U ⊂ W^⊥.

The further proof is by contradiction. Assume that U ≠ W^⊥. Then there is a vector v_0 such that v_0 ∈ W^⊥ and v_0 ∉ U. In this situation we can apply the theorem 2.3, which says that there exists a vector v such that g(v, v_0) ≠ 0 and g(v, u) = 0 for all u ∈ U. The latter condition means that v ∈ U^⊥ = W. Then the other condition g(v, v_0) ≠ 0 contradicts the initial choice v_0 ∈ W^⊥. This contradiction shows that the assumption U ≠ W^⊥ is not true and we have the coincidence U = W^⊥. Thus, W = U^⊥ implies U = W^⊥. We can swap U and W and obtain that U = W^⊥ implies W = U^⊥. Hence, these two conditions are equivalent.
The proposition of the theorem 2.4 can be reformulated as follows: for a subspace U ⊂ V of a finite-dimensional space V the condition Ker g ⊂ U means that the double orthogonal complement of U coincides with that subspace: (U^⊥)^⊥ = U. For an arbitrary subset S ⊂ V of a finite-dimensional space V one can derive

    (S^⊥)^⊥ = ⟨S⟩ + Ker g.                                                (2.5)

Let's prove the relationship (2.5). Note that the vectors of the kernel Ker g are orthogonal to all vectors of V. Therefore, joining the vectors of the kernel Ker g to S, we do not change the orthogonal complement of this subset:

    S^⊥ = (S ∪ Ker g)^⊥.

Now let's apply the item (3) of the theorem 2.1. This yields

    S^⊥ = (S ∪ Ker g)^⊥ = ⟨S ∪ Ker g⟩^⊥ = (⟨S⟩ + Ker g)^⊥.

The subspace U = ⟨S⟩ + Ker g comprises the kernel of the form g. Therefore, (U^⊥)^⊥ = U. This completes the proof of the relationship (2.5):

    (S^⊥)^⊥ = ((⟨S⟩ + Ker g)^⊥)^⊥ = ⟨S⟩ + Ker g.
Theorem 2.5. In the case of a finite-dimensional linear vector space V for any subspace U of V we have the equality

    dim U + dim U^⊥ = dim V + dim(Ker g ∩ U),                             (2.6)

where U^⊥ is the orthogonal complement of U with respect to the form g.

Proof. The vectors of the kernel Ker g are orthogonal to all vectors of the space V; therefore, joining them to U, we do not change the orthogonal complement U^⊥. Let's denote W = U + Ker g. Then U^⊥ = W^⊥. Applying the theorem 6.4 from Chapter I, for the dimension of W we derive the formula

    dim W = dim U + dim(Ker g) − dim(Ker g ∩ U).                          (2.7)

Now let's apply the theorem 2.2 to the subset S = W. This yields W^⊥ = a_g^{−1}(W^⊥); here and below in this proof the complement under the preimage sign a_g^{−1}( · ) is the complement of W taken in the dual space V^* in the sense of Chapter III, while W^⊥ in the left hand side is the complement taken in V with respect to g. Note that Ker g ⊂ W; this distinguishes W from the initial subspace U. Let's apply the item (2) of the theorem 3.1 from Chapter III to the inclusion Ker g ⊂ W and take into account the corollary 2 of the theorem 2.2. For the complements in V^* this yields

    W^⊥ ⊂ (Ker g)^⊥ = Im a_g.

The inclusion W^⊥ ⊂ Im a_g means that the preimage of each element f of the dual complement W^⊥ under the mapping a_g is not empty, while the equality W^⊥ = a_g^{−1}(W^⊥) shows that such a preimage is enclosed into the g-complement W^⊥. Therefore, the equality W^⊥ = a_g^{−1}(W^⊥) implies a_g(W^⊥) = W^⊥, with the g-complement in the left hand side and the dual complement in the right hand side.

Now let's consider the restriction of the associated mapping a_g to the subspace W^⊥ ⊂ V. We denote this restriction by a:

    a: W^⊥ → V^*.                                                         (2.8)

The kernel of the mapping (2.8) coincides with the kernel of the non-restricted mapping a_g since Ker a_g = Ker g ⊂ W^⊥. For the image of this mapping we have Im a = a_g(W^⊥) = W^⊥, the latter complement being the one taken in V^*.

Let's apply the theorem on the sum of the dimensions of the kernel and the image (see theorem 9.4 in Chapter I) to the mapping a:

    dim(Ker g) + dim W^⊥ = dim W^⊥,                                       (2.9)

where the dual complement of W enters the left hand side and its g-complement enters the right hand side. In order to determine the dimension of the dual complement of W we apply the relationship

    dim W + dim W^⊥ = dim V,                                              (2.10)

which follows from the theorem 3.4 of Chapter III. Now let's add the relationships (2.7) and (2.9) and subtract the relationship (2.10). Taking into account the coincidence W^⊥ = U^⊥, we get the required equality (2.6).

The analogs of the relationships (3.4) from Chapter III in the present case are the relationships {0}^⊥ = V and V^⊥ = Ker g.
Theorem 2.6. In the case of a finite-dimensional linear vector space V equipped with a quadratic form g for any family of subspaces of V, each of which comprises the kernel Ker g, the following relationships are fulfilled:

    (Σ_{i∈I} U_i)^⊥ = ∩_{i∈I} (U_i)^⊥,    (∩_{i∈I} U_i)^⊥ = Σ_{i∈I} (U_i)^⊥.          (2.11)

Proof. In proving the first relationship (2.11) the condition Ker g ⊂ U_i is inessential. This relationship is derived from the items (3) and (4) of the theorem 2.1 if we take into account that the sum of subspaces is the linear span of the union of these subspaces.

The second relationship (2.11) is derived from the first one. From the condition Ker g ⊂ U_i we derive that ((U_i)^⊥)^⊥ = U_i (see theorem 2.4). Let's denote (U_i)^⊥ = V_i and apply the first relationship (2.11) to the family of subspaces V_i:

    (Σ_{i∈I} (U_i)^⊥)^⊥ = (Σ_{i∈I} V_i)^⊥ = ∩_{i∈I} (V_i)^⊥ = ∩_{i∈I} U_i.

Now it is sufficient to pass to orthogonal complements in the left and right hand sides of the above equality and apply the theorem 2.4 again. This yields the required second equality (2.11). The theorem is proved.
§ 3. Transformation of a quadratic form to its canonic form. Inertia indices and signature.

Definition 3.1. A subspace U in a linear vector space V is called regular with respect to a quadratic form g if U ∩ U^⊥ ⊆ Ker g.

Theorem 3.1. Let U be a subspace in a finite-dimensional space V regular with respect to a quadratic form g. Then U + U^⊥ = V.

Proof. Let's denote W = U + U^⊥ and then let's calculate the dimension of the subspace W applying the theorem 6.4 from Chapter I:

    dim W = dim U + dim U^⊥ − dim(U ∩ U^⊥).

The vectors of the kernel Ker g are perpendicular to all vectors of the space V. Therefore, Ker g ⊆ U^⊥. Moreover, due to the regularity of U with respect to the form g we have U ∩ U^⊥ ⊆ Ker g. Therefore, we derive

    U ∩ U^⊥ = (U ∩ U^⊥) ∩ Ker g = U ∩ (U^⊥ ∩ Ker g) = U ∩ Ker g.

Because of the equality U ∩ U^⊥ = U ∩ Ker g the above formula for the dimension of the subspace W can be written as follows:

    dim W = dim U + dim U^⊥ − dim(U ∩ Ker g).                             (3.1)

Let's compare (3.1) with the formula (2.6) from the theorem 2.5. This comparison yields dim W = dim V. Now, applying the item (3) of the theorem 4.5 from Chapter I, we get W = V. The theorem is proved.
Theorem 3.2. Let U be a subspace of a finite-dimensional space V regular with respect to a quadratic form g. If U^⊥ ≠ Ker g, then there exists a vector v ∈ U^⊥ such that g(v) ≠ 0.

Proof. The proof is by contradiction. Assume that there is no vector v ∈ U^⊥ such that g(v) ≠ 0. Then the numeric function g(v) is identically zero on the subspace U^⊥. Due to the recovery formula (1.6) the numeric function g(v, w) is then also identically zero for all v, w ∈ U^⊥.

Now let's apply the theorem 3.1 and expand an arbitrary vector x ∈ V into a sum of two vectors x = u + w, where u ∈ U and w ∈ U^⊥. Then for an arbitrary vector v of the subspace U^⊥ we derive

    g(v, x) = g(v, u + w) = g(v, u) + g(v, w) = 0 + 0 = 0.

The first summand g(v, u) in the right hand side of the above equality is zero since the subspaces U and U^⊥ are orthogonal to each other. The second summand g(v, w) is zero due to our assumption made in the beginning of the proof. Since g(v, x) = 0 for an arbitrary vector x ∈ V, we get v ∈ Ker g. But v is an arbitrary vector of the subspace U^⊥. Therefore, U^⊥ ⊆ Ker g. The converse inclusion Ker g ⊆ U^⊥ is always valid. Hence, U^⊥ = Ker g, which contradicts the hypothesis of the theorem. This contradiction means that the assumption made in the beginning of the proof is not valid, and thus it proves the existence of a vector v ∈ U^⊥ such that g(v) ≠ 0. The theorem is proved.
Theorem 3.3. For any quadratic form g in a finite-dimensional vector space V there exists a basis e_1, …, e_n such that the matrix of g is diagonal in this basis.

Proof. The case g = 0 is trivial. The matrix of the zero quadratic form g is purely zero in any basis. The square n × n matrix which is purely zero is obviously a diagonal matrix.

Suppose that g ≢ 0. We shall prove the theorem by induction on the dimension of the space dim V = n. In the case n = 1 the proposition of the theorem is trivial: any 1 × 1 matrix is a diagonal matrix.

Suppose that the theorem is valid for any quadratic form in any space of dimension less than n. Let's consider the subspace U = Ker g. It is regular with respect to the form g, and its orthogonal complement is U^⊥ = V, which is different from Ker g since g ≢ 0. Therefore, we can apply the theorem 3.2. According to this theorem, there exists a vector v_0 such that g(v_0) ≠ 0; in particular, v_0 ∉ U = Ker g. Let's consider the subspace W obtained by joining v_0 to U = Ker g:

    W = Ker g + ⟨v_0⟩ = U ⊕ ⟨v_0⟩.                                        (3.2)

This subspace W determines the following two cases: W = V or W ≠ V.

In the case W = V we choose a basis e_1, …, e_s in the kernel Ker g and complete it by one additional vector e_{s+1} = v_0. As a result we get a basis in V. The matrix of the quadratic form g in this basis is almost completely filled with zeros; indeed, for i = 1, …, s and j = 1, …, s + 1 we have g_{ij} = g_{ji} = g(e_i, e_j) = 0 since e_i ∈ Ker g. The only nonzero element is g_{s+1 s+1}; it is a diagonal element: g_{s+1 s+1} = g(e_{s+1}, e_{s+1}) = g(v_0) ≠ 0.

In the case W ≠ V we consider the intersection W ∩ W^⊥. Let w ∈ W ∩ W^⊥. Then from (3.2) we derive w = α v_0 + u, where u ∈ Ker g. Since w is a vector of W and simultaneously a vector of W^⊥, it should be orthogonal to itself with respect to the quadratic form g:

    g(w, w) = g(α v_0 + u, α v_0 + u) = α^2 g(v_0, v_0) + 2α g(v_0, u) + g(u, u) = 0.            (3.3)

But u ∈ Ker g, therefore, g(v_0, u) = 0 and g(u, u) = 0, while g(v_0, v_0) = g(v_0) ≠ 0. Hence, from (3.3) we get α = 0. This means that w = u ∈ Ker g. Thus, we have proved the inclusion W ∩ W^⊥ ⊆ Ker g, which means the regularity of the subspace W with respect to the quadratic form g.

Now let's apply the theorem 3.1. It yields the expansion V = W + W^⊥. Note that v_0 ∈ W, but v_0 ∉ W ∩ W^⊥. This follows from g(v_0, v_0) ≠ 0. Hence, v_0 ∉ W^⊥ and W^⊥ ≠ V. This means that the dimension of the subspace W^⊥ is less than n. The formula (2.6) applied to W yields the exact value of this dimension:

    dim W^⊥ = dim V + dim(W ∩ Ker g) − dim W = n − 1.

Let's consider the restriction of g to the subspace W^⊥. We can apply the inductive hypothesis to g in W^⊥. Let e_1, …, e_{n−1} be a basis of the subspace W^⊥ in which the matrix of the restriction of g to W^⊥ is diagonal:

    g_{ij} = g_{ji} = g(e_i, e_j) = 0 for i < j ≤ n − 1.                  (3.4)

We complete this basis by one vector e_n = v_0. Since v_0 ∉ W^⊥, the extended system of vectors e_1, …, e_n is linearly independent and, hence, is a basis of V. Let's find the matrix of the quadratic form g in the extended basis. For the elements in the extension of this matrix we obtain the relationships

    g_{in} = g_{ni} = g(e_i, e_n) = 0 for i < n.                          (3.5)

They follow from the orthogonality of the vectors e_i and e_n in (3.5). Indeed, e_n ∈ W and e_i ∈ W^⊥. The relationships (3.4) and (3.5) taken together mean that the matrix of the quadratic form g is diagonal in the basis e_1, …, e_n. The inductive step is over and the theorem is completely proved.
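The inductive construction used in the proof can be turned into an algorithm: a symmetric matrix is brought to diagonal form by a congruence transformation, i. e. by a change of basis as in (1.12). Below is a simplified sketch in Python with NumPy; it assumes that at every step a nonzero diagonal pivot can be found, which is the generic situation, while the general case needs the preparatory step with the vector v_0 described in the proof of the theorem 3.3.

    import numpy as np

    def diagonalize(G, tol=1e-12):
        # Bring a symmetric matrix G to diagonal form D = C.T @ G @ C by a change
        # of basis (cf. (1.12)). Simplified: a nonzero diagonal pivot is assumed
        # to exist at every step of the elimination.
        n = G.shape[0]
        D = G.astype(float).copy()
        C = np.eye(n)
        for k in range(n):
            piv = next((j for j in range(k, n) if abs(D[j, j]) > tol), None)
            if piv is None:
                break                                  # remaining diagonal is zero
            if piv != k:                               # swap basis vectors k and piv
                P = np.eye(n)
                P[:, [k, piv]] = P[:, [piv, k]]
                D = P.T @ D @ P
                C = C @ P
            E = np.eye(n)
            E[k, k + 1:] = -D[k, k + 1:] / D[k, k]     # clear the k-th row and column
            D = E.T @ D @ E
            C = C @ E
        return D, C

    G = np.array([[0.0, 1.0,  2.0],
                  [1.0, 3.0,  0.0],
                  [2.0, 0.0, -1.0]])
    D, C = diagonalize(G)
    assert np.allclose(C.T @ G @ C, D)
    assert np.allclose(D, np.diag(np.diag(D)))         # D is diagonal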
Let g be a quadratic form in a finite-dimensional space V and let e_1, …, e_n be a basis in which the matrix of g is diagonal. Then the value of g(v) can be calculated by the formula (1.10). A part of the diagonal elements g_{11}, …, g_{nn} can be equal to zero. Let's denote by s the number of such elements. We can renumber the basis vectors e_1, …, e_n so that

    g_{11} = … = g_{ss} = 0.                                              (3.6)

The first s vectors of the basis, which correspond to the matrix elements (3.6), belong to the kernel of the form Ker g. Indeed, if w = e_i for i = 1, …, s, then g(v, w) = 0 for all vectors v ∈ V. This fact is easily derived with the use of the formulas (1.9).

Conversely, suppose that w ∈ Ker g. Then for an arbitrary vector v ∈ V we have the following relationships:

    g(v, w) = Σ_{i=1}^{n} Σ_{j=1}^{n} g_{ij} v^i w^j = Σ_{i=s+1}^{n} g_{ii} v^i w^i = 0.

Since v ∈ V is an arbitrary vector, the above equality should be fulfilled identically in v^{s+1}, …, v^n. But g_{ii} ≠ 0 for i ≥ s + 1, therefore, w^{s+1} = … = w^n = 0. From these equalities for the vector w we derive

    w = w^1 e_1 + … + w^s e_s.

The conclusion is that any vector w of the kernel Ker g can be expanded into a linear combination of the first s basis vectors. Hence, these basis vectors e_1, …, e_s form a basis in Ker g. The above considerations prove the following proposition, which we present in the form of a theorem.

Theorem 3.4. The number of zeros on the diagonal of the matrix of a quadratic form g brought to a diagonal form is a geometric invariant of the form g. It does not depend on the method used for bringing this matrix to a diagonal form and coincides with the dimension of the kernel of the quadratic form: s = dim(Ker g).

Definition 3.2. The number s = dim(Ker g) is called the zero inertia index of a quadratic form g.
Let g be a quadratic form in a linear vector space over the field of complex numbers C such that its matrix is diagonal in a basis e_1, …, e_n. Suppose that s is the zero inertia index of the quadratic form g. Without loss of generality we can assume that the first s basis vectors e_1, …, e_s form a basis in the kernel Ker g. We define the numbers γ_1, …, γ_n by means of the formula

    γ_i = 1 for i ≤ s,    γ_i = √(g_{ii}) for i > s.                      (3.7)

Remember that for any complex number one can take its square root, which is again a complex number. The complex numbers (3.7) are nonzero. We use them in order to construct the new basis:

    ẽ_i = (γ_i)^{−1} e_i,    i = 1, …, n.                                 (3.8)

The matrix of the quadratic form g in the new basis (3.8) is again a diagonal matrix. Indeed, we can explicitly calculate its matrix elements:

    g̃_{ij} = g(ẽ_i, ẽ_j) = (γ_i γ_j)^{−1} g_{ij} = 0 for i ≠ j.

For the diagonal elements of the matrix of g we derive

    g̃_{ii} = g(ẽ_i, ẽ_i) = (γ_i)^{−2} g_{ii} = 0 for i ≤ s,    g̃_{ii} = 1 for i > s.

The matrix of the quadratic form g in the basis ẽ_1, …, ẽ_n has the following form, which is usually called the canonic form of the matrix of a quadratic form over the field of complex numbers C:

    G = diag(0, …, 0, 1, …, 1).                                           (3.9)

The matrix G in (3.9) is a diagonal matrix; its diagonal is filled with s zeros and n − s ones, where s = dim Ker g.
In the case of a linear vector space over the field of real numbers R the canonic form of the matrix of a quadratic form is different from (3.9). Let e_1, …, e_n be a basis in which the matrix of g is diagonal. The diagonal elements of this matrix are now subdivided into three groups: zero elements, positive elements, and negative elements. If s is the number of zero elements and r is the number of positive elements, then the remaining n − s − r elements on the diagonal are negative numbers. Without loss of generality we can assume that the basis vectors e_1, …, e_n are enumerated so that g_{ii} = 0 for i = 1, …, s and g_{ii} > 0 for i = s + 1, …, s + r. Then g_{ii} < 0 for i = s + r + 1, …, n. In the field of reals we can take the square root only of non-negative numbers. Therefore, here we define γ_1, …, γ_n a little bit differently than it was done in (3.7) for complex numbers:

    γ_i = 1 for i ≤ s,    γ_i = √|g_{ii}| for i > s.                      (3.10)

By means of (3.10) we define the new basis ẽ_1, …, ẽ_n using the formulas (3.8). Here is the matrix of the quadratic form g in this basis:

    G = diag(0, …, 0, 1, …, 1, −1, …, −1),                                (3.11)

where the diagonal carries s zeros, then r_p plus ones, and then r_n minus ones.

Definition 3.3. The formula (3.11) defines the canonic form of the matrix of a quadratic form g in a space over the real numbers R. The integers r_p and r_n that determine the number of plus ones and the number of minus ones on the diagonal of the matrix (3.11) are called the positive inertia index and the negative inertia index of the quadratic form g respectively.
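For a concrete symmetric matrix over R the three inertia indices can be read off from the signs of its eigenvalues: an orthogonal eigenbasis also diagonalizes the form, so by the law of inertia proved below (theorem 3.5) the counts of zero, positive, and negative eigenvalues coincide with s, r_p, and r_n. A short sketch in Python with NumPy (the spectral theorem used here is assumed, not proved in this chapter; the matrix is illustrative test data):

    import numpy as np

    def inertia_indices(G, tol=1e-10):
        # Zero, positive and negative inertia indices (s, r_p, r_n) of a real
        # symmetric matrix G, read off from the signs of its eigenvalues.
        eig = np.linalg.eigvalsh(G)
        s   = int(np.sum(np.abs(eig) <= tol))
        r_p = int(np.sum(eig >  tol))
        r_n = int(np.sum(eig < -tol))
        return s, r_p, r_n

    G = np.array([[0.0, 1.0,  2.0],
                  [1.0, 3.0,  0.0],
                  [2.0, 0.0, -1.0]])
    print(inertia_indices(G))    # (0, 2, 1) for this test matrix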
Theorem 3.5. The positive and the negative inertia indices r_p and r_n of a quadratic form g in a space over the field of real numbers R are geometric invariants of g. They do not depend on the particular way in which the matrix of g was brought to a diagonal form.

Proof. Let e_1, …, e_n be a basis of the space V in which the matrix of g has the canonic form (3.11). Let's consider the following subspaces:

    U_+ = ⟨e_1, …, e_{s+r_p}⟩,    U_− = ⟨e_{s+r_p+1}, …, e_n⟩.            (3.12)

The intersection of U_+ and U_− is trivial, dim U_+ = s + r_p, dim U_− = r_n, and for their sum we have U_+ ⊕ U_− = V.

Let's take a vector v ∈ U_+. The value of the quadratic form g on this vector is determined by the matrix (3.11) according to the formula (1.10):

    g(v) = Σ_{i=s+1}^{s+r_p} (v^i)^2.

The sum of squares in the right hand side of this equality is a non-negative quantity, i. e. g(v) ≥ 0 for all v ∈ U_+.

Now let's take a vector v ∈ U_−. For this vector the formula (1.10) is written as

    g(v) = Σ_{i=s+r_p+1}^{n} (−(v^i)^2).

If v ≠ 0, then at least one summand in the right hand side is nonzero. Hence, g(v) < 0 for all nonzero vectors of the subspace U_−.

Suppose that ẽ_1, …, ẽ_n is some other basis in which the matrix of g has the canonic form. Denote by s̃, r̃_p, and r̃_n the inertia indices of g in this basis. The zero inertia indices in both bases are the same, s = s̃, since they are determined by the kernel of g: s = dim(Ker g) and s̃ = dim(Ker g).

Let's prove the coincidence of the positive and the negative inertia indices in the two bases. For this purpose we consider the subspaces Ũ_+ and Ũ_− determined by the relationships of the form (3.12), but for the «wavy» basis ẽ_1, …, ẽ_n. If we assume that r_p ≠ r̃_p, then r_p > r̃_p or r_p < r̃_p. For the sake of certainty suppose that r_p > r̃_p. Then we calculate the dimensions of U_+ and Ũ_−:

    dim U_+ = s + r_p,    dim Ũ_− = r̃_n = n − s − r̃_p.

For the sum of the dimensions of these two subspaces U_+ and Ũ_− we get the equality dim U_+ + dim Ũ_− = n + (r_p − r̃_p). Due to the above assumption r_p > r̃_p we derive

    dim U_+ + dim Ũ_− > dim V.                                            (3.13)

From the natural inclusion U_+ + Ũ_− ⊆ V we get dim(U_+ + Ũ_−) ≤ dim V. Using this estimate together with the inequality (3.13) and applying the theorem 6.4 of Chapter I to them, we derive dim(U_+ ∩ Ũ_−) > 0. Hence, the intersection U_+ ∩ Ũ_− is nonzero; it contains a nonzero vector v ∈ U_+ ∩ Ũ_−. From the conditions v ∈ U_+ and v ∈ Ũ_− we obtain the two inequalities

    g(v) ≥ 0,    g(v) < 0,

contradicting each other. This contradiction shows that our assumption r_p ≠ r̃_p is not valid, so the inertia indices r_p and r̃_p do coincide. From r_p = r̃_p and s = s̃ we then derive r_n = r̃_n. The theorem is proved.
Definition 3.4. The total set of inertia indices is called the signature of a quadratic form. In the case of a quadratic form in a complex space (K = C) the signature is formed by two numbers (s, n − s); in the case of a real space (K = R) it is formed by three numbers (s, r_p, r_n).

In the case of a linear space over the field of rational numbers K = Q we can also diagonalize the matrix of a quadratic form and subdivide the diagonal elements into three parts: positive, negative, and zero elements. This determines the numbers s, r_p, and r_n, which are geometric invariants of g, and we can define its signature.

However, in the case K = Q we cannot reduce the nonzero diagonal elements to plus ones and minus ones only. Therefore, the number of geometric invariants in this case is greater than 3. We shall not look for the complete set of geometric invariants of a quadratic form in the case K = Q and we shall not construct their theory, since this would lead us into number theory, toward the problems of divisibility, primality, factorization of integers, etc.
§ 4. Positive quadratic forms. Silvester's criterion.

In this section we consider quadratic forms in linear vector spaces over the field of real numbers R. However, almost all results of this section remain valid for quadratic forms in rational vector spaces as well.

Definition 4.1. A quadratic form g in a space V over the field R is called a positive form if g(v) > 0 for any nonzero vector v ∈ V.

Theorem 4.1. A quadratic form g in a finite-dimensional space V is positive if and only if the numbers s and r_n in its signature (s, r_p, r_n) are equal to zero.

Proof. Let g be a positive quadratic form and let e_1, …, e_n be a basis in which the matrix of g has the canonic form (3.11). If s ≠ 0, then for the basis vector e_1 ≠ 0 we would get g(e_1) = g_{11} = 0, which would contradict the positivity of g. If r_n ≠ 0, then for the basis vector e_n ≠ 0 we would get g(e_n) = g_{nn} = −1, which would also contradict the positivity of g. Hence, s = r_n = 0.

Now, conversely, let s = r_n = 0. Then in the basis e_1, …, e_n in which the matrix of g has the form (3.11) its value g(v) is the sum of squares

    g(v) = (v^1)^2 + … + (v^n)^2,

where v^1, …, v^n are the coordinates of the vector v. This formula follows from the formula (1.10). For a nonzero vector at least one of its coordinates is nonzero. Hence, g(v) > 0. This proves the positivity of g and thus completes the proof of the theorem in whole.
The condition s = dim(Ker g) obtained in the theorem 3.4 and the condition s = 0 mean that a positive form g in a finite-dimensional space V is non-degenerate: Ker g = {0}. This fact is valid for a form in an infinite-dimensional space as well.

Theorem 4.2. Any positive quadratic form g is non-degenerate.

Proof. If Ker g ≠ {0}, then there is a nonzero vector v ∈ Ker g. The vector v of the kernel is orthogonal to all vectors of the space V. Hence, it is orthogonal to itself: g(v) = g(v, v) = 0. This contradicts the positivity of the form g. Therefore, any positive form g should be non-degenerate.

Theorem 4.3. Any subspace U ⊂ V is regular with respect to a positive quadratic form g in a linear vector space V.

Proof. Since the kernel Ker g of a positive form g is zero, the regularity of a subspace U with respect to g is equivalent to the equality U ∩ U^⊥ = {0} (see definition 3.1). Let's prove this equality. Let v be an arbitrary vector of the intersection U ∩ U^⊥. From v ∈ U^⊥ we derive that it is orthogonal to all vectors of U. Hence, it is also orthogonal to itself since v ∈ U. Therefore, g(v) = g(v, v) = 0. Due to the positivity of g the equality g(v) = 0 holds only for the zero vector v = 0. Thus, we get U ∩ U^⊥ = {0}. The theorem is proved.

Theorem 4.4. For any subspace U ⊂ V and for any positive quadratic form g in a finite-dimensional space V there is an expansion V = U ⊕ U^⊥.

Proof. The expansion V = U + U^⊥ follows from the theorem 3.1. We need only to prove that the sum in this expansion is a direct sum. For the sum of the dimensions of U and U^⊥, from the theorem 2.5 and due to the triviality of the kernel Ker g = {0} of a positive quadratic form g, we derive

    dim U + dim U^⊥ = dim V.

Due to this equality, in order to complete the proof it is sufficient to apply the theorem 6.3 of Chapter I.
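The expansion V = U ⊕ U^⊥ underlies the orthogonal projection onto a subspace. The following sketch (Python with NumPy; the positive matrix G, the subspace U, and the vector x are illustrative assumptions) splits a vector x as x = u + w with u ∈ U and w ∈ U^⊥ by solving a small linear system for the coefficients of u in a spanning set of U:

    import numpy as np

    # Illustrative data: a positive matrix G of the form g, a matrix U whose columns
    # span a subspace U of V, and a vector x to be decomposed.
    G = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 1.0]])
    U = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
    x = np.array([1.0, 2.0, 3.0])

    # Writing u = U @ c and requiring g(u_i, x - u) = 0 for every column u_i of U
    # gives the system (U.T @ G @ U) c = U.T @ G @ x; for a positive form and
    # linearly independent columns the matrix of this system is invertible.
    c = np.linalg.solve(U.T @ G @ U, U.T @ G @ x)
    u = U @ c
    w = x - u
    assert np.allclose(U.T @ G @ w, 0.0)    # w is orthogonal to U with respect to g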
Let g be a quadratic form in a finite-dimensional space V over the field of real numbers R. Let's choose an arbitrary basis e_1, …, e_n in V and then construct the matrix of the quadratic form g in this basis:

    G = \begin{pmatrix} g_{11} & \dots & g_{1n} \\ \vdots & \ddots & \vdots \\ g_{n1} & \dots & g_{nn} \end{pmatrix}.                           (4.1)

Let's delete the last n − k columns and the last n − k rows of the matrix (4.1). The determinant of the matrix thus obtained is called the k-th principal minor of the matrix G. We denote this determinant by M_k:

    M_k = det \begin{pmatrix} g_{11} & \dots & g_{1k} \\ \vdots & \ddots & \vdots \\ g_{k1} & \dots & g_{kk} \end{pmatrix}.                     (4.2)

The n-th principal minor M_n coincides with the determinant of the matrix G.
Theorem 4.5. Let g be a positive quadratic form in a finite-dimensional space V. Then the determinant of the matrix of g in an arbitrary basis of V is positive.

Proof. For the beginning we consider a canonic basis e_1, …, e_n in which the matrix of g has the canonic form (3.11). According to the theorem 4.1, the matrix of a positive quadratic form g in a canonic basis is the unit matrix. Hence, its determinant is equal to unity and thus it is positive: det G = 1 > 0.

Now let ẽ_1, …, ẽ_n be an arbitrary basis and let S be the transition matrix for passing from e_1, …, e_n to ẽ_1, …, ẽ_n. Applying the formula (1.12), we get

    det G̃ = det S^{tr} (det G) det S = (det S)^2.

In a linear vector space V over the real numbers R the elements of any transition matrix S are real numbers. Its determinant is also a nonzero real number. Therefore, (det S)^2 is a positive number. The theorem is proved.
Now again let e_1, …, e_n be an arbitrary basis of V and let g_{ij} be the matrix of a positive quadratic form g in this basis. Let's consider the subspace

    U_k = ⟨e_1, …, e_k⟩.

Let's denote by h_k the restriction of g to the subspace U_k. The matrix of the form h_k in the basis e_1, …, e_k coincides with the upper left diagonal block of the matrix of the initial form g. This is the very block that determines the k-th principal minor M_k in the formula (4.2). It is clear that the restriction of a positive form g to any subspace is again a positive quadratic form. Therefore, we can apply the theorem 4.5 to the form h_k. This yields M_k > 0.

Conclusion: the positivity of all principal minors (4.2) is a necessary condition for the positivity of the quadratic form g itself. As it appears, this condition is a sufficient condition as well. This fact is known as Silvester's criterion.
Theorem 4.6 (Silvester). A quadratic form g in a finite-dimensional space V is positive if and only if all principal minors of its matrix are positive.

Proof. The positivity of g implies the positivity of all principal minors of its matrix. This fact is already proved. Let's prove the converse proposition. Suppose that all principal minors (4.2) of the matrix of a quadratic form g are positive. We should prove that g is positive. The proof is by induction on n = dim V.

The base of the induction in the case dim V = 1 is obvious. Here the matrix of g consists of the only element g_{11}, which coincides with the only principal minor: g_{11} = M_1. The value g(v) in a one-dimensional space is determined by the only coordinate of a vector v according to the formula g(v) = g_{11} (v^1)^2. Therefore M_1 > 0 implies the positivity of the form g.

Suppose that the proposition we are going to prove is valid for a quadratic form in any space of dimension less than n = dim V. Let g_{ij} be the matrix of our quadratic form g in some basis e_1, …, e_n of V. Let's denote

    U = ⟨e_1, …, e_{n−1}⟩.

Denote by h the restriction of the form g to the subspace U of dimension n − 1. The matrix elements h_{ij} of the matrix of h calculated in the basis e_1, …, e_{n−1} coincide with the corresponding elements of the matrix of the initial form: h_{ij} = g_{ij}. Therefore, the minors M_1, …, M_{n−1} can be calculated by means of the matrix h_{ij}. Due to the positivity of these minors, applying the inductive hypothesis, we find that h is a positive quadratic form in U.

Let ẽ_1, …, ẽ_{n−1} be a basis of U in which the matrix of the form h has the canonic form (3.11). Applying the theorem 4.1 to the form h, we conclude that the matrix h̃_{ij} in the canonic basis ẽ_1, …, ẽ_{n−1} is the unit matrix. Let's complete the basis ẽ_1, …, ẽ_{n−1} of the subspace U by the vector e_n ∉ U. As a result we get the basis ẽ_1, …, ẽ_{n−1}, e_n, in which the matrix of g has the form

    G_1 = \begin{pmatrix} 1 & \dots & 0 & g̃_{1n} \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \dots & 1 & g̃_{n−1\,n} \\ g̃_{n1} & \dots & g̃_{n\,n−1} & g_{nn} \end{pmatrix}.             (4.3)

The passage from the basis e_1, …, e_n to the basis ẽ_1, …, ẽ_{n−1}, e_n is described by a blockwise-diagonal transition matrix S of the form

    S = \begin{pmatrix} S^1_1 & \dots & S^1_{n−1} & 0 \\ \vdots & \ddots & \vdots & \vdots \\ S^{n−1}_1 & \dots & S^{n−1}_{n−1} & 0 \\ 0 & \dots & 0 & 1 \end{pmatrix}.                       (4.4)

The formula (1.12) relates the matrix (4.3) to the matrix G of the quadratic form g in the initial basis: G_1 = S^{tr} G S. From this formula we derive

    det G_1 = det G (det S)^2 = M_n (det S)^2.                            (4.5)

Due to the above formula (4.5) the positivity of the principal minor M_n = det G of the initial matrix (4.1) implies the positivity of the determinant of the matrix (4.3), i. e. det G_1 > 0.

Let's calculate the determinant of the matrix (4.3) explicitly. For this purpose we multiply the first column of this matrix by g̃_{1n} and subtract it from the last column. Then we multiply the second column by g̃_{2n} and subtract it from the last one. We repeat this operation for each of the first n − 1 columns of the matrix (4.3). From the course of algebra we know that such transformations do not change the determinant of a matrix. In the present case they simplify the matrix (4.3), bringing it to a lower-triangular form. Therefore, we can calculate the determinant of the matrix (4.3) in explicit form:

    det G_1 = det \begin{pmatrix} 1 & \dots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \dots & 1 & 0 \\ g̃_{n1} & \dots & g̃_{n\,n−1} & g̃_{nn} \end{pmatrix} = g̃_{nn}.            (4.6)

The element g̃_{nn} in the transformed matrix is given by the formula

    g̃_{nn} = g_{nn} − Σ_{i=1}^{n−1} g̃_{ni} g̃_{in} = g_{nn} − Σ_{i=1}^{n−1} (g̃_{in})^2.         (4.7)

The matrix of the quadratic form g in the basis ẽ_1, …, ẽ_{n−1}, e_n is close to a diagonal matrix. Let's complete the process of diagonalization by replacing the vector e_n with the vector ẽ_n ∉ U such that

    ẽ_n = e_n − Σ_{i=1}^{n−1} g̃_{in} ẽ_i.

The passage from ẽ_1, …, ẽ_{n−1}, e_n to ẽ_1, …, ẽ_n changes only the last basis vector. Therefore, the unit diagonal block of the matrix (4.3) remains unchanged. For the non-diagonal elements g(ẽ_k, ẽ_n) in the new basis we have

    g(ẽ_k, ẽ_n) = g̃_{kn} − Σ_{i=1}^{n−1} g̃_{in} g(ẽ_k, ẽ_i) = g̃_{kn} − Σ_{i=1}^{n−1} g̃_{in} h̃_{ki} = 0.

The equality g(ẽ_k, ẽ_n) = 0 in the above formula is due to the fact that the matrix of the restricted form h in its canonic basis ẽ_1, …, ẽ_{n−1} is the unit matrix. For the diagonal element g(ẽ_n, ẽ_n) from this fact we derive

    g(ẽ_n, ẽ_n) = g_{nn} − Σ_{i=1}^{n−1} Σ_{k=1}^{n−1} g̃_{in} g̃_{kn} h̃_{ik} = g_{nn} − Σ_{i=1}^{n−1} (g̃_{in})^2.

Comparing this expression with (4.7), we find that g(ẽ_n, ẽ_n) = g̃_{nn}. Thus, the matrix of g in the basis ẽ_1, …, ẽ_n is a diagonal matrix of the form

    G_2 = diag(1, …, 1, g̃_{nn}).                                         (4.8)

Combining (4.5) and (4.6), for the element g̃_{nn} in (4.8) we get g̃_{nn} = M_n (det S)^2. Since the principal minor M_n of the initial matrix (4.1) is positive, we find that g̃_{nn} in (4.8) is also positive. Hence, g is a positive quadratic form. Thus, we have completed the inductive step and have proved the theorem in whole.
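In computations Silvester's criterion amounts to checking the signs of the determinants of the upper left k × k blocks of the matrix of the form. A minimal sketch in Python with NumPy (the test matrices are illustrative):

    import numpy as np

    def is_positive_sylvester(G, tol=1e-12):
        # Theorem 4.6: G is the matrix of a positive form iff all principal minors
        # M_1, ..., M_n (determinants of the upper left blocks) are positive.
        n = G.shape[0]
        return all(np.linalg.det(G[:k, :k]) > tol for k in range(1, n + 1))

    G = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])
    print(is_positive_sylvester(G))         # True:  the minors are 2, 3, 4

    H = np.array([[1.0, 2.0],
                  [2.0, 1.0]])
    print(is_positive_sylvester(H))         # False: the second minor is -3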
CHAPTER V

EUCLIDEAN SPACES.

§ 1. The norm and the scalar product. The angle between vectors. Orthonormal bases.

Definition 1.1. A Euclidean vector space is a linear vector space V over the field of reals R equipped with some fixed positive quadratic form g.

Let (V, g) be a Euclidean vector space. There are many positive quadratic forms in the linear vector space V; however, only one of them is associated with V so that it defines the structure of a Euclidean space in V. Two Euclidean vector spaces (V, g_1) and (V, g_2) with g_1 ≠ g_2 coincide as linear vector spaces, but they are different when considered as Euclidean vector spaces.

The structure of the Euclidean vector space (V, g) is associated with a special terminology and special notations. The value of the quadratic form g(v) is non-negative. The square root of g(v) is called the norm or the length of a vector v. The norm of a vector v is denoted as follows:

    |v| = √(g(v)).                                                        (1.1)

The quadratic form g(v) produces the bilinear form g(v, w) determined by the recovery formula (1.6) of Chapter IV. The value of that bilinear form is called the scalar product of two vectors v and w. The scalar product is denoted as follows:

    (v | w) = g(v, w).                                                    (1.2)

Due to the notations (1.1) and (1.2), when dealing with some fixed Euclidean space (V, g), we can omit the symbol g at all.

The scalar product (1.2) is defined for a pair of two vectors v, w ∈ V. It is quite different from the scalar product (1.8) of Chapter III, which is defined for a pair of a vector and a covector. The scalar product (1.2) of a Euclidean vector space possesses the following properties:
(1) (v_1 + v_2 | w) = (v_1 | w) + (v_2 | w) for all v_1, v_2, w ∈ V;
(2) (α v | w) = α (v | w) for all v, w ∈ V and for all α ∈ R;
(3) (v | w_1 + w_2) = (v | w_1) + (v | w_2) for all w_1, w_2, v ∈ V;
(4) (v | α w) = α (v | w) for all v, w ∈ V and for all α ∈ R;
(5) (v | w) = (w | v) for all v, w ∈ V;
(6) |v|^2 = (v | v) ≥ 0 for all v ∈ V, and |v| = 0 implies v = 0.

The properties (1)-(4) reflect the bilinearity of the form g in (1.2). They are analogous to the properties of the scalar product of a vector and a covector (see formulas (1.9) in Chapter III).
120 CHAPTER V. EUCLIDEAN SPACES.
The properties (5) and (6) have no such analogs. But they are the very
properties that make the scalar product (1.2) a generalization of the scalar
product of 3-dimensional geometric vectors.
Theorem 1.1. The following two additional properties of the scalar product
(1.2) are derived from the properties (1)-(6):
(7) [(v, w)[ [v[ [w[ for all v, w ∈ V ;
(8) [v + w[ [v[ + [w[ for all v, w ∈ V .
The property (7) is known as the Cauchy-Bunyakovsky-Schwarz inequality, while
the property (8) is called the triangle inequality.
Proof. In order to prove the inequality (7) we choose two arbitrary nonzero
vectors v, w ∈ V and consider the numeric function f(α) of a numeric argument
α defined by the following explicit formula:
f(α) = [v + α w[
2
. (1.3)
Using the properties (1)-(6) we find that f(α) is a polynomial of degree two:
f(α) = [v + α w[
2
= (v + α w[ v + α w) =
= (v [ v) +2 α(v [ w) + α
2
(w[ w).
The function (1.3) has a lower bound: f(α) 0. This follows from the property
(6). Let’s calculate the minimum point of the function f(α) by equating its
derivative f

(α) to zero. This yields the following equation:
f

(α) = 2 (v [ w) +2 α(w[ w) = 0.
Solving this equation, we find α
min
= −(v [ w)/(w[ w). Now let’s write the
condition f(α) 0 for the minimal value of the function f(α):
f
min
= f(α
min
) =
[v[
2
[w[
2
− (v [ w)
2
[w[
2
0. (1.4)
The denominator of the fraction (1.4) is positive, therefore, from the inequality
(1.4) we easily derive the property (7).
In order to prove the property (8) we consider the square of the norm of the vector v + w. For this quantity we derive
|v + w|^2 = (v + w | v + w) = |v|^2 + 2 (v | w) + |w|^2.   (1.5)
Applying the property (7), which is already proved, to the right hand side of the equality (1.5), we get the following estimate:
|v|^2 + 2 (v | w) + |w|^2 ≤ |v|^2 + 2 |v| |w| + |w|^2 = (|v| + |w|)^2.
From the relationship (1.5) and from the above inequality we derive the inequality |v + w|^2 ≤ (|v| + |w|)^2. Now the property (8) is obtained by taking the square root of both sides of this inequality. This operation is correct since y = √x is an increasing function on the semiaxis [0, +∞). The theorem is proved.
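Both inequalities are easy to test numerically. The following short sketch (an illustration only, not part of the original exposition; Python with numpy is assumed, and the dimension and random vectors are arbitrary choices) checks the properties (7) and (8) for the standard scalar product of R^n:

    import numpy as np

    rng = np.random.default_rng(0)

    def norm(v):
        # |v| = sqrt((v | v)) for the standard scalar product of R^n
        return np.sqrt(v @ v)

    for _ in range(1000):
        v = rng.normal(size=5)
        w = rng.normal(size=5)
        # property (7): |(v | w)| <= |v| |w|  (Cauchy-Bunyakovsky-Schwarz)
        assert abs(v @ w) <= norm(v) * norm(w) + 1e-12
        # property (8): |v + w| <= |v| + |w|  (triangle inequality)
        assert norm(v + w) <= norm(v) + norm(w) + 1e-12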
Due to the analogy between (1.2) and the scalar product of geometric vectors and due to the Cauchy-Bunyakovsky-Schwarz inequality |(v | w)| ≤ |v| |w| we can introduce the concept of an angle between vectors in a Euclidean vector space.
Definition 1.2. The number ϕ from the interval 0 ≤ ϕ ≤ π, which is determined by the following implicit formula
cos(ϕ) = (v | w) / (|v| |w|),   (1.6)
is called the angle between two nonzero vectors v and w in a Euclidean space V.
Due to the property (7) from the theorem 1.1 the modulus of the fraction in the right hand side of (1.6) is not greater than 1. Therefore, the formula (1.6) is correct. It determines the unique number ϕ from the specified interval 0 ≤ ϕ ≤ π.
Definition 1.3. Two vectors v and w in a Euclidean space V are called orthogonal vectors if they form a right angle (ϕ = π/2).
The definition 1.3 applies only to nonzero vectors v and w. The definition 2.1 of Chapter IV is more general. Let's reformulate it for the case of Euclidean spaces.
Definition 1.4. Two vectors v and w in a Euclidean space V are called orthogonal vectors if their scalar product is zero: (v | w) = 0.
For nonzero vectors v and w these two definitions 1.3 and 1.4 are equivalent.
Let v_1, ..., v_m be a system of vectors in a Euclidean space (V, g). The matrix g_{ij} composed of the mutual scalar products of these vectors,
g_{ij} = (v_i | v_j),   (1.7)
is called the Gram matrix of the system of vectors v_1, ..., v_m.
Theorem 1.2. A system of vectors v_1, ..., v_m in a Euclidean space is linearly dependent if and only if the determinant of their Gram matrix is equal to zero.
Proof. Suppose that the vectors v_1, ..., v_m are linearly dependent. Then there is a nontrivial linear combination of these vectors which is equal to zero:
α^1 v_1 + ... + α^m v_m = 0.   (1.8)
Using the coefficients of the linear combination (1.8), we construct the following expression with the components of the Gram matrix (1.7):
Σ_{j=1}^{m} g_{ij} α^j = Σ_{j=1}^{m} (v_i | v_j) α^j = (v_i | α^1 v_1 + ... + α^m v_m) = (v_i | 0) = 0.
Since i is a free index running over the integers from 1 to m, this formula means that the columns of the Gram matrix g_{ij} are linearly dependent. Hence, its determinant is equal to zero (this fact is known from the course of general algebra).
Conversely, assume that the determinant of the Gram matrix (1.7) is equal to zero. Then the columns of this matrix are linearly dependent and, hence, there is a nontrivial linear combination of them that is equal to zero:
Σ_{j=1}^{m} g_{ij} α^j = 0.   (1.9)
Let's denote v = α^1 v_1 + ... + α^m v_m. Then consider the following double sum, which is obviously equal to zero due to the equality (1.9):
0 = Σ_{i=1}^{m} Σ_{j=1}^{m} α^i g_{ij} α^j = Σ_{i=1}^{m} α^i (v_i | α^1 v_1 + ... + α^m v_m) = (α^1 v_1 + ... + α^m v_m | v) = (v | v) = |v|^2.
Thus, we get |v|^2 = 0 and, using the positivity of the basic quadratic form g of the Euclidean space V, we derive v = 0. Since v = 0, we have a nontrivial linear combination of the form (1.8) which is equal to zero. Hence, the vectors v_1, ..., v_m are linearly dependent.
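As an illustration of the theorem 1.2, one can compute the Gram matrix of a concrete system of vectors and compare its determinant with the (in)dependence of the system. The sketch below is only an example, assuming Python with numpy and the standard scalar product of R^3; the specific vectors are arbitrary.

    import numpy as np

    def gram(vectors):
        # g_ij = (v_i | v_j) for the standard scalar product, formula (1.7)
        V = np.array(vectors, dtype=float)
        return V @ V.T

    independent = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
    dependent   = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 1.0]]  # first two are proportional

    print(np.linalg.det(gram(independent)))   # nonzero: the system is linearly independent
    print(np.linalg.det(gram(dependent)))     # zero up to rounding: the system is dependent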
Let e_1, ..., e_n be a basis in a finite-dimensional Euclidean vector space (V, g). Let's consider the Gram matrix of this basis. Knowing the components of the Gram matrix, we can calculate the norm of vectors (1.1) and the scalar product of vectors (1.2) through their coordinates:
|v|^2 = Σ_{i=1}^{n} Σ_{j=1}^{n} g_{ij} v^i v^j,      (v | w) = Σ_{i=1}^{n} Σ_{j=1}^{n} g_{ij} v^i w^j.   (1.10)
A basis e_1, ..., e_n in a Euclidean space V is called an orthonormal basis if the Gram matrix of the basis vectors is the unit matrix:
g_{ij} = 1 for i = j,      g_{ij} = 0 for i ≠ j.   (1.11)
If the condition (1.11) is not fulfilled, then the basis e_1, ..., e_n is called a skew-angular basis. In an orthonormal basis the vectors e_1, ..., e_n are unit vectors orthogonal to each other. This simplifies the formulas (1.10) substantially:
|v|^2 = Σ_{i=1}^{n} (v^i)^2,      (v | w) = Σ_{i=1}^{n} v^i w^i.   (1.12)
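The formulas (1.10) and (1.12) can be compared in a small numeric sketch: the same geometric vectors are described either by their coordinates in a skew-angular basis together with its Gram matrix, or by their coordinates in an orthonormal basis. The basis and the coordinates below are assumptions made only for this illustration; Python with numpy is assumed.

    import numpy as np

    # a skew-angular basis of R^2 written by columns in the standard orthonormal basis
    E = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
    G = E.T @ E                      # its Gram matrix g_ij = (e_i | e_j)

    v_coords = np.array([2.0, 1.0])  # coordinates of v in the skew-angular basis
    w_coords = np.array([1.0, -1.0])

    # formula (1.10): norm and scalar product through the Gram matrix
    norm_sq = v_coords @ G @ v_coords
    scal    = v_coords @ G @ w_coords

    # the same quantities computed from the orthonormal coordinates, formula (1.12)
    v = E @ v_coords
    w = E @ w_coords
    assert np.isclose(norm_sq, v @ v)
    assert np.isclose(scal, v @ w)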
Orthonormal bases do exist. Due to (1.2) and (1.7) we know that the Gram matrix of the basis vectors e_1, ..., e_n is the matrix of the quadratic form g in this basis. The theorem 3.3 of Chapter IV says that there exists a basis in which the matrix of g has its canonic form (see (3.11) in Chapter IV). Since g is a positive quadratic form, its matrix in a canonic form is the unit matrix (see theorem 4.1 in Chapter IV).
The theorem 4.8 on completing the basis of a subspace formulated in Chapter I
has its analog for orthonormal bases.
Theorem 1.3. Let e_1, ..., e_s be an orthonormal basis in a subspace U of a finite-dimensional Euclidean space (V, g). Then it can be completed up to an orthonormal basis e_1, ..., e_n in V.
Proof. Let's consider the orthogonal complement U^⊥ of the subspace U. According to the theorem 4.4 of Chapter IV, the subspaces U and U^⊥ define the expansion of the space V into a direct sum:
V = U ⊕ U^⊥.
The subspace U^⊥ inherits the structure of a Euclidean space from V. Let's choose an orthonormal basis e_{s+1}, ..., e_n in U^⊥ and then join together the two bases of U and U^⊥. As a result we get a basis in V (see theorem 6.3 of Chapter I). The vectors of this basis have unit length and they are orthogonal to each other. Hence, this is an orthonormal basis completing the initial basis e_1, ..., e_s of the subspace U.
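Computationally the completion stated in the theorem 1.3 can be carried out by orthogonalizing any system that spans the whole space, which is the Gram-Schmidt procedure; the text itself argues through the orthogonal complement, so the sketch below is only an alternative illustration (Python with numpy assumed, subspace chosen arbitrarily).

    import numpy as np

    def complete_orthonormal(U):
        """Given the rows of U as an orthonormal system in R^n, return an
        orthonormal basis of R^n whose first rows are the rows of U
        (a Gram-Schmidt sketch)."""
        n = U.shape[1]
        basis = [u for u in U]
        for x in np.eye(n):                              # candidate vectors: the standard basis
            y = x - sum((x @ b) * b for b in basis)      # subtract the projections onto the current system
            if np.linalg.norm(y) > 1e-10:
                basis.append(y / np.linalg.norm(y))
        return np.array(basis[:n])

    U = np.array([[1.0, 1.0, 0.0]]) / np.sqrt(2)         # orthonormal basis of a 1-dimensional subspace
    B = complete_orthonormal(U)
    print(np.allclose(B @ B.T, np.eye(3)))               # True: the completed basis is orthonormal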
Let e_1, ..., e_n and ẽ_1, ..., ẽ_n be two orthonormal bases and let S be the transition matrix. The Gram matrices of these two bases are unit matrices. Therefore, applying the formulas (1.12) of Chapter IV, for the transition matrix S we derive
S^{tr} S = 1,      S^{−1} = S^{tr}.   (1.13)
Note that a square matrix S satisfying the above relationships (1.13) is called an orthogonal matrix.
From the relationships (1.13) for the determinant of an orthogonal matrix we get (det S)^2 = 1. Therefore, orthogonal matrices are subdivided into two types:
matrices with positive determinant det S = 1 and those with negative determinant
det S = −1. This subdivision is related to the concept of orientation. All bases in
a linear vector space over the field of real numbers R (not necessarily a Euclidean
space) can be subdivided into two sets which can be called «left bases» and «right
bases». The transition matrix for passing from a left basis to a left basis or
for passing from a right basis to another right basis is a matrix with positive
determinant — it does not change the orientation. The transition matrix for
passing from a left basis to a right basis or, conversely, from a right basis to a
left basis is a matrix with negative determinant. Such a transition matrix changes
the orientation of a basis. We say that a linear vector space V over the field of
real numbers R is equipped with the orientation if there is some mechanism to
distinguish one of two types of bases in V .
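A quick numeric illustration of (1.13) and of the orientation dichotomy: the transition matrix between two orthonormal bases, written by columns, satisfies S^{tr} S = 1, and its determinant is +1 or −1 depending on whether the two bases have the same orientation. The concrete bases below and the use of numpy are assumptions of the example.

    import numpy as np

    E   = np.eye(2)                                     # the standard orthonormal basis of R^2
    phi = 0.3
    E1  = np.array([[np.cos(phi), -np.sin(phi)],
                    [np.sin(phi),  np.cos(phi)]])       # a rotated basis: the same orientation
    E2  = E1[:, ::-1]                                   # the same vectors in reversed order: opposite orientation

    for Enew in (E1, E2):
        S = np.linalg.solve(E, Enew)                    # transition matrix from E to Enew
        print(np.allclose(S.T @ S, np.eye(2)), round(np.linalg.det(S)))
        # prints: True 1  for E1,  True -1  for E2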
§ 2. Quadratic forms in a Euclidean space.
Diagonalization of a pair of quadratic forms.
Let (V, g) be a Euclidean vector space and let ϕ be a quadratic form in V. For such a form ϕ we define the following ratio:
µ(v) = |ϕ(v)| / |v|^2.   (2.1)
The number µ(v) in (2.1) is a real non-negative number. Note that µ(α v) = µ(v) for any nonzero α ∈ R. Therefore, we can assume v in (2.1) to be a unit vector.
Let's denote by ‖ϕ‖ the least upper bound of µ(v) over all unit vectors (such vectors sweep out the unit sphere in the Euclidean space V):
‖ϕ‖ = sup_{|v|=1} µ(v).   (2.2)
Definition 2.1. The quantity ‖ϕ‖ determined by the formulas (2.1) and (2.2) is called the norm of the quadratic form ϕ in a Euclidean vector space V. If the norm ‖ϕ‖ is finite, the form ϕ is said to be a restricted quadratic form.
Theorem 2.1. If ϕ is a restricted quadratic form, then the estimate |ϕ(v, w)| ≤ ‖ϕ‖ |v| |w| holds for the values of the corresponding symmetric bilinear form.
Proof. In order to calculate ϕ(v, w) we use the following equality, which, in essence, is a version of the recovery formula:
4 α ϕ(v, w) = ϕ(v + α w) − ϕ(v − α w).   (2.3)
From (2.3) we derive the following inequality for the quantity 4 α ϕ(v, w):
4 α ϕ(v, w) ≤ |ϕ(v + α w)| + |ϕ(v − α w)|.   (2.4)
Now let's apply the inequality |ϕ(u)| ≤ ‖ϕ‖ |u|^2, derived from (2.1) and (2.2), in order to estimate the right hand side of (2.4). This yields
4 α ϕ(v, w) ≤ ‖ϕ‖ (|v + α w|^2 + |v − α w|^2).   (2.5)
Let's express the squares of the norms through the scalar products:
|v ± α w|^2 = |v|^2 ± 2 α (v | w) + α^2 |w|^2.
Then we can simplify the inequality (2.5), bringing it to the following one:
4 α ϕ(v, w) ≤ 2 ‖ϕ‖ (|v|^2 + α^2 |w|^2).
Now let's transform the above inequality a little bit more:
f(α) = α^2 ‖ϕ‖ |w|^2 − 2 α ϕ(v, w) + ‖ϕ‖ |v|^2 ≥ 0.
The numeric function f(α) of a numeric argument α is a polynomial of degree two in α. Let's find the minimum point α = α_min for this function by equating its derivative to zero: f′(α) = 0. As a result we obtain
α_min = ϕ(v, w) / (‖ϕ‖ |w|^2).
Now let's write the inequality f(α_min) ≥ 0 for the minimal value of this function. This yields the following inequality for the bilinear form ϕ:
ϕ(v, w)^2 ≤ ‖ϕ‖^2 |v|^2 |w|^2.
Now it is easy to derive the required estimate for |ϕ(v, w)| by taking the square root of both sides of the above inequality. Note that a quite similar method was used when proving the Cauchy-Bunyakovsky-Schwarz inequality in the theorem 1.1.
Theorem 2.2. Any quadratic form ϕ in a finite-dimensional Euclidean vector
space V is a restricted form.
Proof. Let's choose an orthonormal basis e_1, ..., e_n in V and consider the expansion of a unit vector v in this basis. For the coordinates of v in this basis, due to the formulas (1.12), we obtain
(v^1)^2 + ... + (v^n)^2 = 1.
Hence, for the components of v we have |v^i| ≤ 1. Let's express the quantity µ(v), which is defined by the formula (2.1), through the coordinates of v:
µ(v) = |ϕ(v)| = | Σ_{i=1}^{n} Σ_{j=1}^{n} ϕ_{ij} v^i v^j |.
From |v^i| ≤ 1 we derive the following estimate for the quantity µ(v):
µ(v) ≤ Σ_{i=1}^{n} Σ_{j=1}^{n} |ϕ_{ij}| < ∞.   (2.6)
The right hand side of (2.6) does not depend on v. Due to (2.2) this sum is an upper bound for the norm ‖ϕ‖. Hence, ‖ϕ‖ < ∞. The theorem is proved.
Theorem 2.3. For any quadratic form ϕ in a finite-dimensional Euclidean vector space V the supremum in the formula (2.2) is reached, i. e. there exists a vector v ≠ 0 such that |ϕ(v)| = ‖ϕ‖ |v|^2.
Proof. From the course of mathematical analysis we know that the supremum of a numeric set is the limit of some converging sequence of numbers from this set (see [6]). This means that there is a sequence of unit vectors v(1), ..., v(n), ... in V such that the norm ‖ϕ‖ is expressed as the following limit:
‖ϕ‖ = lim_{s→∞} |ϕ(v(s))|.   (2.7)
Let's choose an orthonormal basis e_1, ..., e_n in V and expand each vector v(s) of the sequence in this basis. The equality
(v^1(s))^2 + ... + (v^n(s))^2 = 1   (2.8)
is derived from |v(s)| = 1 due to the formulas (1.12). Now the equality (2.8) means that each specific coordinate v^i(s) yields a restricted sequence of real numbers:
−1 ≤ v^i(s) ≤ 1.
From the course of mathematical analysis we know that in each restricted sequence
of real numbers one can choose a converging subsequence. So, in the sequence
of unit vectors v(s) one can choose a subsequence of unit vectors whose first coordinates form a convergent sequence of numbers. Let's denote this subsequence again by v(s) and choose its subsequence with converging second coordinates. Repeating this choice n times, once for each coordinate, we get a subsequence of unit vectors v(s_k) such that all their coordinates are converging sequences of numbers. Let's consider the limits of these sequences:
v^i = lim_{k→∞} v^i(s_k).   (2.9)
Denote by v the vector whose coordinates are determined by the limit values (2.9). Passing to the limit s → ∞ in (2.8), we conclude that v is a unit vector: |v| = 1.
Now let's calculate |ϕ(v)| using the matrix of the quadratic form ϕ and the coordinates of v in the basis e_1, ..., e_n:
|ϕ(v)| = | Σ_{i=1}^{n} Σ_{j=1}^{n} ϕ_{ij} v^i v^j | = lim_{k→∞} | Σ_{i=1}^{n} Σ_{j=1}^{n} ϕ_{ij} v^i(s_k) v^j(s_k) |.
On the other hand, taking into account (2.7), for |ϕ(v)| we get
|ϕ(v)| = lim_{k→∞} |ϕ(v(s_k))| = lim_{s→∞} |ϕ(v(s))| = ‖ϕ‖.   (2.10)
Thus, for the unit vector v with the coordinates (2.9) we get |ϕ(v)| = ‖ϕ‖. Multiplying v by an arbitrary number α ∈ R, we can remove the restriction |v| = 1. Then the equality (2.10) is written as |ϕ(v)| = ‖ϕ‖ |v|^2. The theorem is proved.
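The norm ‖ϕ‖ from (2.2) can also be approximated numerically by sampling the unit sphere, which gives a feeling for the supremum whose attainment the theorem 2.3 asserts. The sketch below is only an illustration under assumed data (Python with numpy; the matrix of ϕ is an arbitrary symmetric example).

    import numpy as np

    rng = np.random.default_rng(1)
    Phi = np.array([[2.0, 1.0,  0.0],
                    [1.0, 0.0,  1.0],
                    [0.0, 1.0, -3.0]])   # matrix of a quadratic form phi in an orthonormal basis

    samples = rng.normal(size=(20000, 3))
    samples /= np.linalg.norm(samples, axis=1, keepdims=True)      # points of the unit sphere
    mu = np.abs(np.einsum('ij,jk,ik->i', samples, Phi, samples))   # the ratio (2.1) on unit vectors
    print(mu.max())   # approximates the norm ||phi|| of the definition 2.1 from below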
Theorem 2.4. For any quadratic form ϕ in a finite-dimensional Euclidean vector space (V, g) there is an orthonormal basis e_1, ..., e_n such that the matrix of the form ϕ in this basis is a diagonal matrix.
Proof. The proof is by induction on the dimension of the space V. In the case dim V = 1 the proposition of the theorem is obvious: any 1 × 1 matrix is a diagonal matrix.
Suppose that the proposition of the theorem is valid for all quadratic forms in Euclidean spaces of dimension less than n. Let dim V = n and let ϕ be a quadratic form in the Euclidean space (V, g). Applying the theorems 2.2 and 2.3, we find a unit vector v ∈ V such that |ϕ(v)| = ‖ϕ‖. For the sake of certainty we assume that ϕ(v) ≥ 0. Then we can remove the modulus sign: ϕ(v) = ‖ϕ‖. In the case ϕ(v) < 0 we replace the form ϕ by the opposite form ϕ̃ = −ϕ, since two opposite forms diagonalize simultaneously.
Let's denote U = ⟨v⟩ and consider the orthogonal complement U^⊥. The subspaces U = ⟨v⟩ and U^⊥ have zero intersection, their sum is a direct sum, and U ⊕ U^⊥ = V (see theorem 4.4 in Chapter IV). Let's take an arbitrary vector w ∈ U^⊥ of unit length and compose the vector u as follows:
u = cos(α) v + sin(α) w.
Here α is a numeric parameter. It is easy to see that u is also a unit vector; this follows from the identity cos^2(α) + sin^2(α) = 1.
Let's calculate the value of the quadratic form ϕ on the vector u and treat it as a function of the numeric parameter α:
f(α) = ϕ(u) = cos^2(α) ϕ(v) + 2 sin(α) cos(α) ϕ(v, w) + sin^2(α) ϕ(w).
According to the choice of the vector v, we have the estimate ϕ(u) ≤ ϕ(v), and for α = 0, i. e. when u = v, we have the equality ϕ(u) = ϕ(v). Hence, α = 0 is a maximum point of the function f(α). Let's calculate its derivative at the point α = 0 and equate it to zero. This yields
f′(0) = 2 ϕ(v, w) = 0.   (2.11)
Hence, ϕ(v, w) = 0 for all vectors w ∈ U^⊥. Let's apply the inductive hypothesis to the subspace U^⊥, whose dimension is less by 1 than the dimension of the space V. Therefore, we can find an orthonormal basis e_1, ..., e_{n−1} in the subspace U^⊥ such that the matrix of the form ϕ is diagonal in this basis: ϕ(e_i, e_j) = 0 for i ≠ j. Let's complete the basis e_1, ..., e_{n−1} with the vector e_n = v. The complementary vector e_n is a vector of unit length. It is orthogonal to the vectors e_1, ..., e_{n−1}. Therefore, the basis e_1, ..., e_n is an orthonormal basis in V. The matrix of the form ϕ is diagonal in the basis e_1, ..., e_n. This fact is immediate from (2.11). The theorem is proved.
The theorem 2.4 is known as the theorem on simultaneous diagonalization of a pair of quadratic forms ϕ and g. For this purpose one of the two forms should be positive. Then the positive form g defines the structure of a Euclidean space in V and one can apply the theorem 2.4. Orthonormality of the basis e_1, ..., e_n means that the matrix of g is diagonal in this basis (it is the unit matrix). The matrix of ϕ is also diagonal, as stated in the theorem 2.4.
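A computational counterpart of this simultaneous diagonalization (a sketch only, not the book's construction): given the matrix G of the positive form g and the matrix A of the form ϕ in the same basis, a basis that is orthonormal with respect to g and diagonalizes ϕ can be built from a Cholesky factorization of G and an eigenvalue decomposition. Python with numpy and the concrete matrices are assumptions of the example.

    import numpy as np

    G = np.array([[2.0, 1.0],
                  [1.0, 2.0]])     # matrix of the positive form g (a Gram matrix)
    A = np.array([[0.0, 1.0],
                  [1.0, 3.0]])     # matrix of the form phi in the same basis

    L = np.linalg.cholesky(G)                        # G = L L^tr
    M = np.linalg.inv(L) @ A @ np.linalg.inv(L).T    # symmetric matrix of phi in an intermediate basis
    lam, Q = np.linalg.eigh(M)                       # orthogonal diagonalization of M

    S = np.linalg.inv(L).T @ Q     # columns of S: coordinates of the new basis vectors in the old basis

    print(np.allclose(S.T @ G @ S, np.eye(2)))       # True: the new basis is orthonormal w.r.t. g
    print(np.allclose(S.T @ A @ S, np.diag(lam)))    # True: phi is diagonal in the new basis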
§ 3. Selfadjoint operators. The theorem on the spectrum
and the basis of eigenvectors for a selfadjoint operator.
Definition 3.1. A linear operator f : V → V in a Euclidean vector space V is called a symmetric operator or a selfadjoint operator if for any two vectors v, w ∈ V the following equality is fulfilled: (v | f(w)) = (f(v) | w).
Definition 3.2. A linear operator h : V → V in a Euclidean vector space V is called an adjoint operator to the operator f : V → V if for any two vectors v, w ∈ V the following equality is fulfilled: (v | f(w)) = (h(v) | w). The adjoint operator is denoted as follows: h = f^+.
In § 4 of Chapter III we have introduced the concept of conjugate mapping. There we have shown that any linear mapping f : V → W possesses the conjugate mapping f^* : W^* → V^*. For a linear operator f : V → V the conjugate mapping f^* is a linear operator in the dual space V^*. It is related to f by means of the equality
⟨f^*(u) | v⟩ = ⟨u | f(v)⟩,   (3.1)
which is fulfilled for all u ∈ V^* and for all v ∈ V.
The structure of a Euclidean vector space in V is determined by a positive quadratic form g. Like every quadratic form, the form g possesses the associated mapping a_g : V → V^* (see § 2 in Chapter III) such that
⟨a_g(v) | w⟩ = g(v, w) = (v | w).   (3.2)
In the case of a finite-dimensional space V and a positive form g the associated mapping a_g is bijective. Therefore, for any linear operator f : V → V we can define the composition h = a_g^{−1} ◦ f^* ◦ a_g. Then from (3.1) and (3.2) we derive
(h(v) | w) = ⟨a_g ◦ h(v) | w⟩ = ⟨f^* ◦ a_g(v) | w⟩ = ⟨a_g(v) | f(w)⟩ = (v | f(w)).   (3.3)
Comparing (3.3) with the definition 3.2, we can formulate the following theorem.
Theorem 3.1. For any operator f in a finite-dimensional Euclidean space (V, g) there is a unique adjoint operator f^+ = a_g^{−1} ◦ f^* ◦ a_g.
Proof. The existence of an adjoint operator is already derived from the formula f^+ = a_g^{−1} ◦ f^* ◦ a_g and the equality (3.3). Let's prove its uniqueness. Assume that h is another operator satisfying the definition 3.2. Then for the difference r = h − f^+ we derive the relationship
(r(v) | w) = (h(v) | w) − (f^+(v) | w) = (v | f(w)) − (v | f(w)) = 0.   (3.4)
Since w in (3.4) is an arbitrary vector, we conclude that r(v) ∈ Ker g. However, Ker g = {0} for a positive quadratic form g, hence, r(v) = 0 for any v ∈ V. This means that r = 0 and h = f^+. Thus, we have proved that the adjoint operator f^+ for f is unique. This completes the proof of the theorem.
Corollary. The passage from f to f^+ is an operator in the space of endomorphisms End(V) of a finite-dimensional Euclidean vector space (V, g). This operator possesses the following properties:
(f + h)^+ = f^+ + h^+,      (α f)^+ = α f^+,
(f ◦ h)^+ = h^+ ◦ f^+,      (f^+)^+ = f.
Relying upon the existence and the uniqueness of the adjoint operator f^+ for any operator f ∈ End(V), we can derive all the above relationships immediately from the definition 3.2. The relationship f^+ = a_g^{−1} ◦ f^* ◦ a_g can be expressed in the form of the following commutative diagram:
                f^+
          V ----------> V
          |             |
      a_g |             | a_g
          v             v
          V* ---------> V*
                f^*
Comparing the definitions 3.1 and 3.2, now we see that a selfadjoint operator f is an operator which is adjoint to itself: f^+ = f.
Let e_1, ..., e_n be a basis in a finite-dimensional Euclidean space (V, g) and let h^1, ..., h^n be the corresponding dual basis composed of the coordinate functionals. For any vector v ∈ V we have the following expansion, which follows from the definition of the coordinate functionals (see § 1 in Chapter III):
v = h^1(v) e_1 + ... + h^n(v) e_n.
Let's apply this expansion in order to calculate the matrix of the associated mapping a_g. For this purpose we need to apply a_g one by one to all basis vectors e_1, ..., e_n and expand the results in the dual basis of V^*. Let's consider the value of the functional a_g(e_i) on an arbitrary vector v of the space V:
a_g(e_i)(v) = ⟨a_g(e_i) | v⟩ = g(e_i, v) = g(e_i, h^1(v) e_1 + ... + h^n(v) e_n) = Σ_{j=1}^{n} g_{ij} h^j(v).
Since v ∈ V is an arbitrary vector, we conclude that the matrix of the associated mapping a_g in the two bases e_1, ..., e_n and h^1, ..., h^n coincides with the matrix g_{ij} = g(e_i, e_j) of the quadratic form g in the basis e_1, ..., e_n. The matrix g_{ij} is non-degenerate (see theorem 1.2 or Silvester's criterion in § 4 of Chapter IV). Let's denote by g^{ij} the components of the matrix inverse to g_{ij}. The matrix g^{ij} is the matrix of the inverse mapping a_g^{−1}, i. e. we have:
a_g(e_i) = Σ_{j=1}^{n} g_{ij} h^j,      a_g^{−1}(h^j) = Σ_{i=1}^{n} g^{ij} e_i.   (3.5)
The matrix inverse to a symmetric matrix is again a symmetric matrix (this fact is well known from general algebra). Therefore g^{ij} = g^{ji}.
Remember that we have already calculated the matrix of the conjugate mapping f^* (see formula (4.2) and theorem 4.3 in Chapter III). When applied to our present case the results of Chapter III mean that the matrix of the operator f^* : V^* → V^* in the basis of the coordinate functionals h^1, ..., h^n coincides with the matrix of the initial operator f in the basis e_1, ..., e_n. Let's combine this fact with (3.5) and use the formula f^+ = a_g^{−1} ◦ f^* ◦ a_g from the theorem 3.1. Then for the matrix F^+ of the adjoint operator f^+ we obtain:
(F^+)^i_j = Σ_{k=1}^{n} Σ_{q=1}^{n} g^{iq} F^k_q g_{kj}.   (3.6)
In the matrix form the formula (3.6) is written as F^+ = G^{−1} F^{tr} G, where G is the Gram matrix of that basis in which the matrices of f and f^+ are calculated. The formula (3.6) simplifies substantially for orthonormal bases: there the passage to the adjoint operator means simply the transposition of its matrix. The matrix of a selfadjoint operator in an orthonormal basis is symmetric. For this reason selfadjoint operators are often called symmetric operators.
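The matrix formula F^+ = G^{−1} F^{tr} G can be checked directly in coordinates: writing the scalar product of a skew-angular basis as (x | y) = x^{tr} G y (formula (1.10)), the matrix computed this way satisfies the defining identity of the definition 3.2. The sketch below is an illustration with arbitrarily chosen G and F; Python with numpy is assumed.

    import numpy as np

    G = np.array([[2.0, 1.0],
                  [1.0, 3.0]])          # Gram matrix of a skew-angular basis (positive definite)
    F = np.array([[1.0, 4.0],
                  [0.0, 2.0]])          # matrix of an operator f in that basis

    F_adj = np.linalg.inv(G) @ F.T @ G  # formula (3.6) in matrix form: F+ = G^{-1} F^tr G

    def scal(x, y):
        # the scalar product through coordinates: (x | y) = x^tr G y
        return x @ G @ y

    rng = np.random.default_rng(2)
    v, w = rng.normal(size=2), rng.normal(size=2)
    # defining property of the adjoint operator: (f+(v) | w) = (v | f(w))
    print(np.isclose(scal(F_adj @ v, w), scal(v, F @ w)))   # True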
Let f : V → V be a selfadjoint operator in a Euclidean space V. Each such operator produces the quadratic form ϕ_f according to the formula
ϕ_f(v) = (v | f(v)).   (3.7)
Conversely, assume that we have a quadratic form ϕ in a finite-dimensional Euclidean space (V, g). The form ϕ determines the associated mapping a_ϕ (see the definition 2.5 in Chapter IV). This mapping satisfies the relationship
⟨a_ϕ(v) | w⟩ = ϕ(v, w)   (3.8)
for any two vectors v, w ∈ V. The positive quadratic form g defining the structure of a Euclidean space in V also has its own associated mapping a_g. The mapping a_g is bijective since g is non-degenerate (see theorem 4.2 in Chapter IV). Therefore, we can consider the composition of a_g^{−1} and a_ϕ:
f_ϕ = a_g^{−1} ◦ a_ϕ.   (3.9)
This composition (3.9) is an operator in V. It is called the associated operator of the form ϕ in a Euclidean space. Since a_g is bijective, we can write (3.2) as ⟨u | w⟩ = (a_g^{−1}(u) | w). Combining this equality with (3.8), we find
(f_ϕ(v) | w) = (a_g^{−1}(a_ϕ(v)) | w) = ⟨a_ϕ(v) | w⟩ = ϕ(v, w).   (3.10)
Now, using the symmetry of the form ϕ(v, w), from (3.10) we derive
(f_ϕ(v) | w) = ϕ(v, w) = ϕ(w, v) = (f_ϕ(w) | v) = (v | f_ϕ(w)).   (3.11)
The relationship (3.11), which is an identity for all v, w ∈ V, means that f_ϕ is a selfadjoint operator (see the definition 3.1).
The formula (3.7) associates each selfadjoint operator f with the quadratic form ϕ_f, while the formula (3.9) associates each quadratic form ϕ with the selfadjoint operator f_ϕ. These two associations are one-to-one and inverse to each other. Indeed, let's apply the formula (3.7) to the operator (3.9) and use (3.10):
ϕ_{f_ϕ}(v) = (v | f_ϕ(v)) = ϕ(v, v) = ϕ(v).
Now, conversely, let's construct the operator h = f_ϕ for the quadratic form ϕ = ϕ_f. For the operator h and for two arbitrary vectors v, w ∈ V from (3.10) we derive
(h(v) | w) = ϕ_f(v, w) = (v | f(w)) = (f(v) | w).
Since w ∈ V is an arbitrary vector and since the form g determining the scalar product in V is non-degenerate, from the above equality we get h(v) = f(v).
Thus, from what was said above we conclude that defining a selfadjoint operator
in a finite-dimensional Euclidean space is equivalent to defining a quadratic form
in this space. Therefore, we can apply the theorem 2.4 for describing selfadjoint
operators in a finite-dimensional case.
Theorem 3.2. All eigenvalues of a selfadjoint operator f in a finite-dimensional Euclidean space V are real numbers and there is an orthonormal basis composed of eigenvectors of such an operator.
Proof. For the selfadjoint operator f in V we consider the symmetric bilinear form ϕ_f(v, w) determined by the quadratic form (3.7). Let e_1, ..., e_n be an orthonormal basis in which the matrix of the form ϕ_f is diagonal. Then from the formula (3.7) we derive the following equalities:
ϕ_f(e_i, e_j) = (e_i | f(e_j)) = Σ_{k=1}^{n} F^k_j g_{ik} = F^i_j.   (3.12)
As we see in (3.12), the matrices of the operator f and of the form ϕ_f in such a basis do coincide. In particular, the matrix of f is diagonal in this orthonormal basis, so the basis vectors e_1, ..., e_n are eigenvectors of f and the diagonal elements are its eigenvalues, which are real numbers. This proves the proposition of the theorem.
The theorem 3.2 is known as the theorem on the spectrum and the basis of eigenvectors of a selfadjoint operator. The main result of this theorem is the diagonalizability of selfadjoint operators in a finite-dimensional Euclidean space. The characteristic polynomial of a selfadjoint operator is factorized into a product of linear terms over R. Its eigenspaces coincide with the corresponding root subspaces, and the sum of all its eigenspaces coincides with the space V:
V = V_{λ_1} ⊕ ... ⊕ V_{λ_s}.   (3.13)
Theorem 3.3. Any two eigenvectors of a selfadjoint operator corresponding to different eigenvalues are orthogonal to each other.
Proof. Let f be a selfadjoint operator in a Euclidean space and let λ ≠ µ be two of its eigenvalues. Let's consider the corresponding eigenvectors a and b:
f(a) = λ a,      f(b) = µ b.
Then for these two eigenvectors a and b we derive:
λ (a | b) = (f(a) | b) = (a | f(b)) = µ (a | b).
Hence, (λ − µ) (a | b) = 0. But we know that λ − µ ≠ 0. Therefore, (a | b) = 0. The theorem is proved.
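For a symmetric matrix in an orthonormal basis the statements of the theorems 3.2 and 3.3 can be observed directly with a numerical eigenvalue routine: the eigenvalues come out real and the eigenvectors can be chosen orthonormal. The matrix below is an arbitrary example and Python with numpy is an assumption of this illustration.

    import numpy as np

    F = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])      # matrix of a selfadjoint operator in an orthonormal basis

    lam, Q = np.linalg.eigh(F)              # real eigenvalues, columns of Q = orthonormal eigenvectors

    print(lam)                                         # real numbers (theorem 3.2)
    print(np.allclose(Q.T @ Q, np.eye(3)))             # True: the eigenvectors are orthonormal (theorem 3.3)
    print(np.allclose(F @ Q, Q @ np.diag(lam)))        # True: the columns of Q are indeed eigenvectors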
Assume that the kernel of a selfadjoint operator f is nontrivial: Ker f ≠ {0}. Then λ_1 = 0 in (3.13) is one of the eigenvalues of the operator f and we have
Ker f = V_{λ_1},      Im f = V_{λ_2} ⊕ ... ⊕ V_{λ_s}.
This means that the kernel and the image of a selfadjoint operator are orthogonal to each other and their sum coincides with V:
V = Ker f ⊕ Im f.   (3.14)
§ 4. Isometries and orthogonal operators.
Definition 4.1. A linear mapping f : V → W from one Euclidean vector space (V, g) to another Euclidean vector space (W, h) is called an isometry if
(f(x) | f(y)) = (x | y)   (4.1)
for all x, y ∈ V, i. e. if it preserves the scalar product of vectors.
From (4.1) we easily derive |f(x)| = |x|; therefore, f(x) = 0 implies |x| = 0 and x = 0. This means that the kernel of an isometry is always trivial, Ker f = {0}, i. e. any isometry is an injective mapping. Due to the recovery formula for quadratic forms (see formula (1.6) in Chapter IV), in order to verify that f : V → W is an isometry it is sufficient to verify that it preserves the norm of vectors, i. e. |f(x)| = |x| for all vectors x ∈ V.
Theorem 4.1. The composition of isometries is again an isometry.
Proof. Assume that the mappings h : U → V and f : V → W both are isometries. Hence, |h(u)| = |u| for all u ∈ U and |f(v)| = |v| for all v ∈ V. Then
|f ◦ h(u)| = |f(h(u))| = |h(u)| = |u|
for all u ∈ U. This equality means that the mapping f ◦ h is an isometry. The theorem is proved.
Definition 4.2. A bijective isometry f : V → W is called an isomorphism of
Euclidean vector spaces.
Theorem 4.2. Isomorphisms of Euclidean vector spaces possess the following
three properties:
(1) the identical mapping id_V is an isomorphism;
(2) the composition of isomorphisms is an isomorphism;
(3) the mapping inverse to an isomorphism is an isomorphism.
The proof of this theorem is very easy if we use the above theorem 4.1 and the
theorem 8.1 of Chapter I.
Definition 4.3. Two Euclidean vector spaces V and W are called isomorphic
if there is an isomorphism f : V →W relating them.
Let's consider the arithmetic vector space R^n composed of column vectors of height n. The addition of such vectors and their multiplication by real numbers are performed componentwise (see formulas (2.1) in Chapter I). Let's define a quadratic form g(x) in R^n by setting
g(x) = (x^1)^2 + ... + (x^n)^2 = Σ_{i=1}^{n} (x^i)^2.   (4.2)
The form (4.2) yields the standard scalar product and, hence, defines the standard structure of a Euclidean space in R^n.
Theorem 4.3. Any n-dimensional Euclidean vector space V is isomorphic to the space R^n with the standard scalar product (4.2).
In order to prove this theorem it is sufficient to choose an orthonormal basis in V and consider the mapping ψ that associates each vector v ∈ V with the column vector of its coordinates (see formula (5.4) in Chapter I).
Definition 4.4. An operator f in a Euclidean vector space V is called an
orthogonal operator if it is bijective and defines an isometry f : V →V .
Due to the theorem 4.2 the orthogonal operators form a group which is called
the orthogonal group of a Euclidean space V and is denoted by O(V ). The group
O(V) is obviously a subgroup in the group of automorphisms Aut(V). In the case V = R^n the orthogonal group determined by the standard scalar product in R^n is denoted by O(n, R).
Let e_1, ..., e_n be an orthonormal basis in a Euclidean space V and let f be an orthogonal operator. Then from (4.1) we derive
(f(e_i) | f(e_j)) = (e_i | e_j).
For the matrix of the operator f in the basis e_1, ..., e_n this relationship yields:
Σ_{k=1}^{n} F^k_i F^k_j = 1 for i = j,      Σ_{k=1}^{n} F^k_i F^k_j = 0 for i ≠ j.   (4.3)
When written in the matrix form, the formula (4.3) means that
F^{tr} F = 1,      F^{−1} = F^{tr}.   (4.4)
The relationships (4.4) are identical to the relationships (1.13). Matrices that
satisfy such relationships, as we already know, are called orthogonal matrices. As
a corollary of this fact we can formulate the following theorem.
Theorem 4.4. An orthogonal operator f in an orthonormal basis e_1, ..., e_n of a Euclidean vector space V is given by an orthogonal matrix.
As we have noted in § 1, the determinant of an orthogonal matrix can be equal to 1 or to −1. The orthogonal operators in V with determinant 1 form a group which is called the special orthogonal group of a Euclidean vector space V. This group is denoted by SO(V). If V = R^n, this group is denoted by SO(n, R).
The operators f ∈ SO(V) in the two-dimensional case dim V = 2 are the most simple ones. If e_1, e_2 is an orthonormal basis in V, then from (4.3) and det F = 1 we easily find the form of an orthogonal matrix F:
F = | cos(ϕ)  −sin(ϕ) |
    | sin(ϕ)   cos(ϕ) |   (4.5)
A matrix F of the form (4.5) is called a matrix of a two-dimensional rotation, while the numeric parameter ϕ is interpreted as the angle of rotation.
Let's consider orthogonal operators f ∈ SO(V) in the case dim V = 3. Let e_1, e_2, e_3 be an orthonormal basis in V. A matrix of the form
F = | cos(ϕ)  −sin(ϕ)  0 |
    | sin(ϕ)   cos(ϕ)  0 |
    |   0        0     1 |   (4.6)
is an orthogonal matrix with determinant 1. The operator f associated with the matrix (4.6) is called the operator of rotation about the vector e_3 by the angle ϕ.
Theorem 4.5. In a three-dimensional Euclidean vector space V any orthogonal operator f with determinant 1 has the eigenvalue λ = 1.
Proof. Let's consider the characteristic polynomial of the operator f. This is a polynomial of degree 3 in λ with real coefficients:
P(λ) = −λ^3 + F_1 λ^2 − F_2 λ + F_3,  where F_3 = det f = 1.
Remember that the values of a polynomial of odd degree for large positive λ and for large negative λ differ in sign:
lim_{λ→−∞} P(λ) = +∞,      lim_{λ→+∞} P(λ) = −∞.
Therefore the equation P(λ) = 0 of odd degree with real coefficients has at least one real root λ = λ_1. This root is an eigenvalue of the operator f.
Let e_1 ≠ 0 be an eigenvector of f corresponding to the eigenvalue λ_1. Then, applying the isometry condition |v| = |f(v)| to the vector v = e_1, we get
|e_1| = |f(e_1)| = |λ_1 e_1| = |λ_1| |e_1|.
Hence, we find that |λ_1| = 1. This means that λ_1 = 1 or λ_1 = −1. In the case λ_1 = 1 the proposition of the theorem is valid. Therefore, we consider the case λ_1 = −1. Let's separate the linear factor (λ + 1) in the characteristic polynomial:
P(λ) = −λ^3 + F_1 λ^2 − F_2 λ + 1 = −(λ + 1)(λ^2 − Φ_1 λ − 1).
Then F_1 = Φ_1 − 1 and F_2 = −1 − Φ_1. In order to find the remaining roots of the polynomial P(λ) we consider the following quadratic equation:
λ^2 − Φ_1 λ − 1 = 0.
This equation always has two real roots λ_2 and λ_3 since its discriminant is positive: D = (Φ_1)^2 + 4 > 0. Due to Viète's theorem we have λ_2 λ_3 = −1. For the same reasons as above in the case of λ_1, for λ_2 and λ_3 we get |λ_2| = |λ_3| = 1. Hence, one of these two real numbers is equal to 1 and the other is equal to −1. Thus, we have proved that the number λ = 1 is among the eigenvalues of the operator f. The theorem is proved.
Theorem 4.6. In a three-dimensional Euclidean vector space V for any orthogonal operator f with determinant 1 there is an orthonormal basis in which the matrix of f has the form (4.6).
Proof. Under the assumptions of the theorem 4.5 the operator f has the eigenvalue λ_1 = 1. Let e_1 ≠ 0 be an eigenvector of this operator associated with the eigenvalue λ_1 = 1. Let's denote by U the span of the eigenvector e_1 and consider its orthogonal complement U^⊥. This is a two-dimensional subspace in the three-dimensional space V. This subspace is invariant under the action of f. Indeed, from x ∈ U^⊥ we derive (x | e_1) = 0. Let's write the isometry condition (4.1) for the vectors x and y = e_1:
0 = (x | e_1) = (f(x) | f(e_1)) = λ_1 (f(x) | e_1).
Since λ_1 = 1, we get (f(x) | e_1) = 0. Hence, f(x) ∈ U^⊥, which proves the invariance of the subspace U^⊥.
Let's consider the restriction of the operator f to the invariant subspace U^⊥. This restriction is an orthogonal operator in the two-dimensional space U^⊥, its determinant being equal to 1. Therefore, in some orthonormal basis e_2, e_3 of U^⊥ the matrix of the restricted operator has the form (4.5).
Remember that e_1 is perpendicular to e_2 and e_3. It can be normalized to the unit length. Then the three vectors e_1, e_2, e_3 form an orthonormal basis in the three-dimensional space V and the matrix of f in this basis has the form (4.6). The theorem is proved.
The result of this theorem is that any orthogonal operator f with determinant 1 in a three-dimensional Euclidean vector space V is an operator of rotation. The eigenvector e_1 associated with the eigenvalue λ_1 = 1 determines the axis of rotation, while the real parameter ϕ in the matrix (4.6) determines the angle of this rotation.
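The rotation data can be recovered numerically from any orthogonal 3 × 3 matrix with determinant 1: the axis is an eigenvector for the eigenvalue λ = 1, and since the trace is basis-independent, the form (4.6) gives tr F = 1 + 2 cos(ϕ). The sketch below is only an illustration; Python with numpy, the sample angle and the random change of basis are assumptions of the example.

    import numpy as np

    phi0 = 0.7
    F0 = np.array([[np.cos(phi0), -np.sin(phi0), 0.0],
                   [np.sin(phi0),  np.cos(phi0), 0.0],
                   [0.0,           0.0,          1.0]])   # the matrix (4.6)

    # the same operator written in another orthonormal basis
    Q, _ = np.linalg.qr(np.random.default_rng(3).normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]          # make the change of basis orientation-preserving
    F = Q @ F0 @ Q.T

    lam, vecs = np.linalg.eig(F)
    axis = np.real(vecs[:, np.argmin(np.abs(lam - 1.0))])   # eigenvector for lambda = 1: the rotation axis
    angle = np.arccos((np.trace(F) - 1.0) / 2.0)            # from tr F = 1 + 2 cos(phi)

    print(np.allclose(F @ axis, axis))     # True: the axis is fixed by the rotation
    print(np.isclose(angle, phi0))         # True (up to the sign convention for the angle)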
CHAPTER VI
AFFINE SPACES.
§ 1. Points and parallel translations. Affine spaces.
Let M be an arbitrary set. A transformation of the set M is a bijective
mapping p: M →M of the set M onto itself.
Definition 1.1. Let V be a linear vector space. We say that an action of V on a set M is defined if each vector v ∈ V is associated with some transformation p_v of the set M and the following conditions are fulfilled:
(1) p_0 = id_M;
(2) p_{v+w} = p_v ◦ p_w for all v, w ∈ V.
From the properties (1) and (2) of an action of a space V on a set M one can easily derive the following two properties of such an action:
(3) p_{−v} = p_v^{−1} for all v ∈ V;
(4) p_v ◦ p_w = p_w ◦ p_v for all v, w ∈ V.
Definition 1.2. An action of a vector space V on a set M is called a transitive action if for any two elements A, B ∈ M there is a vector v ∈ V such that p_v(A) = B, i. e. the transformation p_v takes A to B.
Definition 1.3. An action of a vector space V on a set M is called a free action if for any element A ∈ M the equality p_v(A) = A implies v = 0.
Definition 1.4. A set M is called an affine space over the field K if there is a free transitive action of some linear vector space V over the field K on M.
Due to this definition any affine space M is associated with some linear vector space V. Therefore an affine space M is often denoted as a pair (M, V). Elements of an affine space are usually called points. We shall denote them by capital letters A, B, C, etc. An affine space itself is sometimes called a point space. A transformation p_v given by a vector v ∈ V is called a parallel translation in the affine space M.
Let U be a subspace in V. Let's choose a point A ∈ M and then define a subset L ⊂ M in the following way:
L = {B ∈ M : ∃ u ((u ∈ U) & (B = p_u(A)))}.   (1.1)
A subset L of M determined according to (1.1) is called a linear submanifold of the affine space M. Thereby the subspace U ⊂ V is called the directing subspace of the linear submanifold L. The dimension of the directing subspace in (1.1) is taken for the dimension of the linear submanifold L. One-dimensional linear submanifolds are called straight lines; two-dimensional submanifolds are called planes. If the
dimension of U is less by one than the dimension of V , i. e. if dim(V/U) = 1, then
the corresponding linear submanifold L is called a hyperplane. Linear submanifolds
of other intermediate dimensions have no special titles.
Let U = ⟨a⟩ be a one-dimensional subspace in V. Then any vector u ∈ U is presented as u = t a, where t ∈ K. Upon choosing a point A ∈ M the subspace U determines the straight line in M passing through the point A. An arbitrary point A(t) of this straight line is given by the formula
A(t) = p_{t·a}(A).   (1.2)
The formula (1.2) is known as the parametric equation of a straight line in an affine space; the vector a is called a directing vector, while t ∈ K is a parameter.
If K = R, we can consider the set of points on the straight line (1.2) corresponding to the values of t taken from the interval [0, 1] ⊂ R. Such a set is called a segment of a straight line. The points A = A(0) and B = A(1) are the ending points of this segment. One can choose a direction on the segment AB by saying that one of the ending points is the beginning of the segment and the other is the end of the segment. A segment AB with a fixed direction on it is called a directed segment or an arrowhead segment. Two arrowhead segments \overrightarrow{AB} and \overrightarrow{BA} are assumed to be distinct (if K = R, an arrowhead segment \overrightarrow{AB} is assumed to consist of the two points A and B only; it has no interior at all).
Let A and B be two points of an affine space M. Due to the transitivity of the action of V on M there exists a vector v ∈ V that defines the parallel translation p_v taking the point A to the point B: p_v(A) = B. Let's prove that such a parallel translation is unique. If p_w is another parallel translation such that p_w(A) = B, then for the parallel translation p_{w−v} we have
p_{w−v}(A) = p_{−v} ◦ p_w(A) = p_v^{−1}(p_w(A)) = p_v^{−1}(B) = A.
Since V acts freely on M (see the definition 1.3), we have w − v = 0. Hence, w = v, which proves the uniqueness of the vector v determined by the condition p_v(A) = B.
The above fact appears to be very useful: if we have an affine space (M, V), then vectors of V can be represented by arrowhead segments in M. Each pair of points A, B ∈ M specifies the unique vector a ∈ V such that p_a(A) = B. This vector can be used as a directing vector of the straight line (1.2) passing through the points A and B. The arrowhead segment with the beginning at the point A and with the end at the point B is called the geometric representation of the vector a. It is denoted \overrightarrow{AB}.
A vector a is uniquely determined by its geometric representation \overrightarrow{AB}. However, a vector a can have several geometric representations. Indeed, if we choose a point C ≠ A, we can determine the point D = p_a(C) and then construct the geometric representation \overrightarrow{CD} of the vector a. The points A and C specify a parallel translation p_b such that p_b(A) = C. Using the property (4) of parallel translations, it is easy to find that the parallel translation p_b maps the segment AB to the segment CD. So we conclude: various geometric representations of a vector a are related to each other by means of parallel translations. Note that p_{a+b}(A) = D. Therefore, \overrightarrow{AD} is a geometric representation of the vector a + b. From
this fact we easily derive the well-known rules for vector addition: the triangle rule \overrightarrow{AC} + \overrightarrow{CD} = \overrightarrow{AD} and the parallelogram rule \overrightarrow{AB} + \overrightarrow{AC} = \overrightarrow{AD}.
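The axioms of the definition 1.1 and the triangle rule can be mirrored in a tiny model where the point set M is R^n and p_v simply adds the vector v. This model is only an illustration of the definitions, not part of the text; Python with numpy and the sample points are assumptions.

    import numpy as np

    def p(v):
        # the parallel translation p_v of the point set M = R^n
        return lambda point: point + v

    A = np.array([0.0, 0.0])
    v = np.array([1.0, 2.0])
    w = np.array([-3.0, 0.5])

    # property (2): p_{v+w} = p_v o p_w
    print(np.allclose(p(v + w)(A), p(v)(p(w)(A))))   # True

    # the triangle rule: if B = p_v(A) and C = p_w(B), the segment AC represents v + w
    B, C = p(v)(A), p(w)(p(v)(A))
    print(np.allclose(C - A, v + w))                 # True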
Let O be some fixed point of an affine space M. Let's call it the origin. Then any point A ∈ M specifies the arrowhead segment \overrightarrow{OA}, which is identified with the unique vector r ∈ V by means of the equality p_r(O) = A. This vector r = r_A is called the radius-vector of the point A. If the space V is finite-dimensional, then we can choose a basis e_1, ..., e_n and expand the radius-vectors of all points A ∈ M in this basis.
Definition 1.5. A frame or a coordinate system in an affine space M is a pair consisting of a point O ∈ M and a basis e_1, ..., e_n in V. The coordinates of the radius-vector r_A = \overrightarrow{OA} in the basis e_1, ..., e_n are called the coordinates of the point A in the coordinate system O, e_1, ..., e_n.
Coordinate systems in affine spaces play the same role as bases in linear vector spaces. Let O, e_1, ..., e_n and O′, ẽ_1, ..., ẽ_n be two coordinate systems in an affine space M. The relation of the bases e_1, ..., e_n and ẽ_1, ..., ẽ_n is given by the direct and inverse transition matrices S and T. The points O and O′ determine the arrowhead segment \overrightarrow{OO′} and the opposite arrowhead segment \overrightarrow{O′O}. They are associated with two vectors ρ, ρ̃ ∈ V:
ρ = \overrightarrow{OO′},      ρ̃ = \overrightarrow{O′O}.
Let's expand ρ in the basis e_1, ..., e_n and ρ̃ in the basis ẽ_1, ..., ẽ_n:
ρ = ρ^1 e_1 + ... + ρ^n e_n,      ρ̃ = ρ̃^1 ẽ_1 + ... + ρ̃^n ẽ_n.   (1.3)
Then consider a point X ∈ M. The following formulas are obvious:
\overrightarrow{OX} = \overrightarrow{OO′} + \overrightarrow{O′X},      \overrightarrow{O′X} = \overrightarrow{O′O} + \overrightarrow{OX}.
By means of them we can find the relation of the coordinates of the point X in the two different coordinate systems O, e_1, ..., e_n and O′, ẽ_1, ..., ẽ_n:
x^i = ρ^i + Σ_{j=1}^{n} S^i_j x̃^j,      x̃^i = ρ̃^i + Σ_{j=1}^{n} T^i_j x^j.   (1.4)
Though the vectors ρ and ρ̃ differ only in sign (ρ̃ = −ρ), their coordinates in the formulas (1.4) differ more substantially:
ρ^i = − Σ_{j=1}^{n} S^i_j ρ̃^j,      ρ̃^i = − Σ_{j=1}^{n} T^i_j ρ^j.
This happens because ρ and ρ̃ are expanded in two different bases (see the above expansions (1.3)).
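The formulas (1.4) and the relation between ρ and ρ̃ can be verified numerically for a concrete pair of frames; the frames, the transition matrix and the use of Python with numpy below are assumptions of this illustration.

    import numpy as np

    # frame (O, e_1, e_2): the standard frame of R^2
    # frame (O', e~_1, e~_2): shifted origin and a different basis
    S = np.array([[1.0, 1.0],
                  [0.0, 2.0]])       # direct transition matrix: e~_j expanded in e_1, e_2 (by columns)
    T = np.linalg.inv(S)             # inverse transition matrix
    rho = np.array([3.0, -1.0])      # coordinates of the vector OO' in the basis e_1, e_2

    X = np.array([0.5, 4.0])         # a point given by its coordinates x^i in the first frame
    x_tilde = T @ (X - rho)          # its coordinates in the second frame
    print(np.allclose(X, rho + S @ x_tilde))         # True: the first formula (1.4)

    rho_tilde = -T @ rho             # coordinates of O'O in the basis e~_1, e~_2
    print(np.allclose(x_tilde, rho_tilde + T @ X))   # True: the second formula (1.4)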
The facts from the theory of affine spaces stated above show that considering affine spaces is a proper way of geometrizing linear algebra. A vector is an algebraic object: we can add vectors, multiply them by numbers, and form linear combinations of them. In an affine space the concept of a point becomes paramount. Points form straight lines, planes, and their multidimensional generalizations, the linear submanifolds. In affine spaces we have a quite natural concept of parallel translations and, hence, we can define the concept of parallelism for linear submanifolds. The geometry of two-dimensional affine spaces is called planimetry, the geometry of three-dimensional affine spaces is called stereometry. Affine spaces of higher dimensions are studied by a geometrical discipline which is called multidimensional geometry.
§ 2. Euclidean point spaces.
Quadrics in a Euclidean space.
Definition 2.1. An affine space (M, V ) over the field of real numbers R is
called a Euclidean point space if the space V acting on M by parallel translations
is equipped with a structure of a Euclidean vector space, i. e. if in V some positive
quadratic form g is fixed.
In the affine spaces considered in the previous section a very important feature was lacking: there was no concept of a length and there was no concept of an angle. The structure of a Euclidean space given by a quadratic form g brings in this lacking feature. Let A and B be two points of a Euclidean point space M. They determine a vector v ∈ V specified by the condition p_v(A) = B (this vector is identified with the arrowhead segment \overrightarrow{AB}). The norm of the vector v determined by the quadratic form g is called the length of the segment AB or the distance between the two points A and B: |AB| = |v| = √(g(v)). Due to the equality |−v| = |v| we derive |AB| = |BA|.
Let \overrightarrow{AB} and \overrightarrow{CD} be two arrowhead segments in a Euclidean point space. They are geometric representations of two vectors v and w of V. The angle between \overrightarrow{AB} and \overrightarrow{CD} is by definition the angle between the vectors v and w determined by the formula (1.6) of Chapter V.
Definition 2.2. A coordinate system O, e_1, ..., e_n in a finite-dimensional Euclidean point space (M, V, g) is called a rectangular Cartesian coordinate system in M if e_1, ..., e_n is an orthonormal basis of the Euclidean vector space (V, g).
Definition 2.3. A quadric in a Euclidean point space M is a set of points of M whose coordinates x^1, ..., x^n in some rectangular Cartesian coordinate system O, e_1, ..., e_n satisfy some polynomial equation of degree two:
Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij} x^i x^j + 2 Σ_{i=1}^{n} b_i x^i + c = 0.   (2.1)
The definition of a quadric is not coordinate-free. It is formulated in terms of some rectangular Cartesian coordinate system O, e_1, ..., e_n. However, passing to another Cartesian coordinate system is equivalent to a linear change of variables in the equation (2.1) (see formulas (1.4)). Such a change of variables changes the coefficients of the polynomial in (2.1), but it does not change the structure of this equation as a whole. A quadric remains a quadric in any Cartesian coordinate system.
Let O′, ẽ_1, ..., ẽ_n be some other rectangular Cartesian coordinate system in M. Let's consider the passage from O, e_1, ..., e_n to O′, ẽ_1, ..., ẽ_n. In this case the transition matrices S and T in (1.4) appear to be orthogonal matrices (see formulas (1.13) in Chapter V). We can calculate the coefficients of the equation of the quadric in the new coordinate system. Substituting (1.4) into (2.1), we get
ã_{qp} = Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij} S^i_q S^j_p,   (2.2)
b̃_q = Σ_{i=1}^{n} b_i S^i_q + Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij} ρ^j S^i_q,   (2.3)
c̃ = Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij} ρ^i ρ^j + 2 Σ_{i=1}^{n} b_i ρ^i + c.   (2.4)
Now the problem of bringing the equation of a quadric to a canonic form is formulated as the problem of finding a proper rectangular Cartesian coordinate system in which the equation (2.1) takes its most simple, canonic form.
The formula (2.2) coincides with the transformation formula for the components of a quadratic form under a change of basis (see (1.11) in Chapter IV). Hence, we conclude that each quadric in M is associated with some quadratic form in V. The form a determined by the matrix a_{ij} in the basis e_1, ..., e_n is called the primary quadratic form of the quadric (2.1).
Let's consider the associated operator f_a determined by the primary quadratic form a (see formula (3.9) in Chapter V). The operator f_a is a selfadjoint operator in V; it determines the expansion of the space V into the direct sum of two mutually orthogonal subspaces Ker f_a and Im f_a:
V = Ker f_a ⊕ Im f_a   (2.5)
(see (3.14) in Chapter V). The matrix of the operator f_a is given by the formula
F^i_j = Σ_{k=1}^{n} g^{ik} a_{kj},   (2.6)
where g^{ik} is the matrix inverse to the Gram matrix of the basis e_1, ..., e_n. Apart from f_a, we define a vector b through its coordinates given by the formula
b^i = Σ_{k=1}^{n} g^{ik} b_k.   (2.7)
The definition of b through its coordinates (2.7) is essentially bound to the coordinate system O, e_1, ..., e_n. This is because the formula (2.3) differs from the standard transformation formula for the coordinates of a covector under a change of basis (see (2.4) in Chapter III). Let's rewrite (2.3) in the following form:
b̃_q = Σ_{i=1}^{n} S^i_q ( b_i + Σ_{j=1}^{n} a_{ij} ρ^j ).   (2.8)
Then let's consider the expansion of the vector b into the sum of two vectors b = b^{(1)} + b^{(2)} according to the expansion (2.5) of the space V. This expansion induces the expansion b_i = b^{(1)}_i + b^{(2)}_i, where the components b^{(1)}_i are transformed as follows:
b̃^{(1)}_q = Σ_{i=1}^{n} S^i_q b^{(1)}_i.   (2.9)
The vector b^{(2)} in the expansion b = b^{(1)} + b^{(2)} can be annihilated at the expense of a proper choice of the coordinate system. Let's determine the vector ρ = \overrightarrow{OO′} from the equality b^{(2)} = −f_a(ρ). Though it is not unique, a vector ρ satisfying this equality does exist since b^{(2)} ∈ Im f_a. For its components we have
b^{(2)}_i + Σ_{j=1}^{n} a_{ij} ρ^j = 0,   (2.10)
which follows from b^{(2)} = −f_a(ρ) due to (2.6) and (2.7). Substituting (2.10) into (2.8), we get the following equalities in the new coordinate system:
b̃^{(2)} = 0,      b̃ = b̃^{(1)}.
The relationships (2.9) show that the numbers b^{(1)}_i cannot be annihilated (unless they are equal to zero from the very beginning). These numbers determine the vector b^{(1)} ∈ Ker f_a, which does not depend on the choice of a coordinate system. As a result we have proved the following theorem.
Theorem 2.1. Any quadric in a Euclidean point space (M, V, g) is associated
with some selfadjoint operator f and some vector b ∈ Ker f such that in some
rectangular Cartesian coordinate system the radius vector r of an arbitrary point
of this quadric satisfies the following equation:
(f(r) | r) + 2 (b | r) + c = 0.   (2.11)
The operator f determines the leading part of the equation (2.11). By means of this operator we subdivide all quadrics into two basic types:
(1) non-degenerate quadrics, when Ker f = {0};
(2) degenerate quadrics, when Ker f ≠ {0}.
For non-degenerate quadrics the vector b in (2.11) is equal to zero. Therefore, non-degenerate quadrics are subdivided into three types:
(1) elliptic type, when c ≠ 0 and the quadratic form a(x) = (f(x) | x) is positive or negative, i. e. can be made positive by changing the sign of f;
(2) hyperbolic type, when c ≠ 0 and the quadratic form a(x) = (f(x) | x) is not sign-definite, i. e. its signature has both pluses and minuses;
(3) conic type, when c = 0.
Degenerate quadrics are subdivided into two types:
(1) parabolic type, when dim Ker f = 1 and b ≠ 0;
(2) cylindric type, when dim Ker f > 1 or b = 0.
The equation (2.1) in the case of a non-degenerate quadric of elliptic type can be brought to the following canonic form:
(x^1)^2/(a_1)^2 + ... + (x^n)^2/(a_n)^2 = ±1.
This is the canonic equation of a non-degenerate quadric of hyperbolic type:
(x^1)^2/(a_1)^2 ± ... ± (x^n)^2/(a_n)^2 = ±1.
The canonic equation of a non-degenerate quadric of conic type is homogeneous:
(x^1)^2/(a_1)^2 ± ... ± (x^n)^2/(a_n)^2 = 0.
The equation (2.1) in the case of a degenerate quadric of parabolic type can be brought to the following canonic form:
(x^1)^2/(a_1)^2 ± ... ± (x^{n−1})^2/(a_{n−1})^2 = 2 x^n.
If n = dim M > 1, then in a canonic equation of a quadric of cylindric type at least one variable does not enter explicitly. Therefore, we can reduce the dimension of the space M. The reduced quadric can belong to any one of the above four types. If it is again of cylindric type, then we can repeat the reduction procedure. This process can terminate in some intermediate dimension, yielding a reduced quadric of some non-cylindric type. Otherwise we reach the dimension dim M = 1. In a one-dimensional Euclidean point space there are no quadrics of cylindric type. Therefore, the quadrics of cylindric type are those which belong to one of the non-cylindric types in the reduced dimension.
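The classification above can be turned into a small decision procedure: bring the quadric to the form (2.11) and inspect Ker f, the sign-definiteness of a(x) = (f(x) | x), the vector b and the constant c. The sketch below is only an illustration under assumed conventions (Python with numpy, a numerical threshold eps, and arbitrarily chosen example data); it presumes f is given by its matrix in a rectangular Cartesian coordinate system and b ∈ Ker f.

    import numpy as np

    def classify(F, b, c, eps=1e-10):
        """Type of the quadric (f(r) | r) + 2 (b | r) + c = 0 with b in Ker f,
        following the subdivision given in the text (a sketch only)."""
        lam, _ = np.linalg.eigh(F)                 # real spectrum of the selfadjoint operator f
        kernel_dim = int(np.sum(np.abs(lam) < eps))
        if kernel_dim == 0:                        # non-degenerate quadrics
            if abs(c) < eps:
                return "conic"
            definite = np.all(lam > eps) or np.all(lam < -eps)
            return "elliptic" if definite else "hyperbolic"
        if kernel_dim == 1 and np.linalg.norm(b) > eps:
            return "parabolic"
        return "cylindric"

    F = np.diag([1.0, 4.0, 0.0])                   # dim Ker f = 1
    b = np.array([0.0, 0.0, 1.0])                  # b lies in Ker f
    print(classify(F, b, -1.0))                                    # parabolic
    print(classify(np.diag([1.0, 2.0, 3.0]), np.zeros(3), -1.0))   # elliptic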
REFERENCES.
1. Kurosh A. G. Course of general algebra, «Nauka» publishers, Moscow.
2. Sharipov R. A. Course of differential geometry¹, Bashkir State University, Ufa, 1996; see online math/0412421 in Electronic Archive http://arXiv.org and r-sharipov/r4-b3.htm in GeoCities.
3. Sharipov R. A. Classical electrodynamics and the theory of relativity, Bashkir State University, Ufa, 1997; see online physics/0311011 in Electronic Archive http://arXiv.org and r-sharipov/r4-b5.htm in GeoCities.
4. Kostrikin A. I. Introduction to algebra, «Nauka» publishers, Moscow, 1977.
5. Beklemishev D. V. Course of analytical geometry and linear algebra, «Nauka» publishers, Moscow, 1985.
6. Kudryavtsev L. D. Course of mathematical analysis, Vol. I and II, «Visshaya Shkola» publishers, Moscow, 1985.
7. Sharipov R. A. Quick introduction to tensor analysis, free online publication math.HO/0403252 in Electronic Archive http://arXiv.org, 2004; see also r-sharipov/r4-b6.htm in GeoCities.
¹ The references [2] and [3] are added in 1998, the reference [7] is added in 2004.
MSC 97U20 PACS 01.30.Pp UDC 512.64 Sharipov R. A. Course of Linear Algebra and Multidimensional Geometry: the textbook / Publ. of Bashkir State University — Ufa, 1996. — pp. 143. ISBN 5-7477-0099-5.

This book is written as a textbook for the course of multidimensional geometry and linear algebra. At Mathematical Department of Bashkir State University this course is taught to the first year students in the Spring semester. It is a part of the basic mathematical education. Therefore, this course is taught at Physical and Mathematical Departments in all Universities of Russia. In preparing Russian edition of this book I used the computer typesetting on the base of the AMS-TEX package and I used the Cyrillic fonts of Lh-family distributed by the CyrTUG association of Cyrillic TEX users. English edition of this book is also typeset by means of the AMS-TEX package. Referees: Computational Mathematics and Cybernetics group of Ufa State University for Aircraft and Technology (UGATU); Prof. S. I. Pinchuk, Chelyabinsk State University for Technology (QGTU) and Indiana University.

Contacts to author. Office: Phone: Fax: Mathematics Department, Bashkir State University, 32 Frunze street, 450074 Ufa, Russia 7-(3472)-23-67-18 7-(3472)-23-67-74

Home: 5 Rabochaya street, 450003 Ufa, Russia Phone: 7-(917)-75-55-786 E-mails: R Sharipov@ic.bashedu.ru r-sharipov@mail.ru ra sharipov@lycos.com ra sharipov@hotmail.com URL: http://www.geocities.com/r-sharipov

ISBN 5-7477-0099-5 English translation

c Sharipov R.A., 1996 c Bashkir State University, 1996 c Sharipov R.A., 2004

CONTENTS.

CONTENTS. ............................................................................................... 3. PREFACE. .................................................................................................. 5. CHAPTER I. LINEAR VECTOR SPACES AND LINEAR MAPPINGS. ........ 6. § § § § § § § § § § 1. 2. 3. 4. 5. The sets and mappings. ......................................................................... 6. Linear vector spaces. ........................................................................... 10. Linear dependence and linear independence. ......................................... 14. Spanning systems and bases. ................................................................ 18. Coordinates. Transformation of the coordinates of a vector under a change of basis. ....................................................................... 22. 6. Intersections and sums of subspaces. ..................................................... 27. 7. Cosets of a subspace. The concept of factorspace. ................................. 31. 8. Linear mappings. ................................................................................ 36. 9. The matrix of a linear mapping. ........................................................... 39. 10. Algebraic operations with mappings. The space of homomorphisms Hom(V, W ). ........................................... 45.

CHAPTER II. LINEAR OPERATORS. ................................................ 50.
§ 1. Linear operators. The algebra of endomorphisms End(V)
     and the group of automorphisms Aut(V). .................................. 50.
§ 2. Projection operators. ................................................... 56.
§ 3. Invariant subspaces. Restriction and factorization of operators. ........ 61.
§ 4. Eigenvalues and eigenvectors. ........................................... 66.
§ 5. Nilpotent operators. .................................................... 72.
§ 6. Root subspaces. Two theorems on the sum of root subspaces. .............. 79.
§ 7. Jordan basis of a linear operator. Hamilton-Cayley theorem. ............. 83.

CHAPTER III. DUAL SPACE. ..................................................... 87.
§ 1. Linear functionals. Vectors and covectors. Dual space. .................. 87.
§ 2. Transformation of the coordinates of a covector
     under a change of basis. ................................................ 92.
§ 3. Orthogonal complements in a dual space. ................................. 94.
§ 4. Conjugate mapping. ...................................................... 97.

CHAPTER IV. BILINEAR AND QUADRATIC FORMS. ......................... 100. § 1. Symmetric bilinear forms and quadratic forms. Recovery formula. ....... 100. § 2. Orthogonal complements with respect to a quadratic form. .................. 103.

§ 3. Transformation of a quadratic form to its canonic form.
     Inertia indices and signature. ......................................... 108.
§ 4. Positive quadratic forms. Silvester's criterion. ....................... 114.

CHAPTER V. EUCLIDEAN SPACES. ................................................ 119.
§ 1. The norm and the scalar product. The angle between vectors.
     Orthonormal bases. ..................................................... 119.
§ 2. Quadratic forms in a Euclidean space. Diagonalization of a pair
     of quadratic forms. .................................................... 123.
§ 3. Selfadjoint operators. Theorem on the spectrum and the basis
     of eigenvectors for a selfadjoint operator. ............................ 127.
§ 4. Isometries and orthogonal operators. ................................... 132.

CHAPTER VI. AFFINE SPACES. .................................................. 136.
§ 1. Points and parallel translations. Affine spaces. ....................... 136.
§ 2. Euclidean point spaces. Quadrics in a Euclidean space. ................. 139.

REFERENCES. ................................................................. 143.

PREFACE.

There are two approaches to stating the linear algebra and the multidimensional geometry. The first approach can be characterized as the «coordinates and matrices approach». The second one is the «invariant geometric approach».

In most of textbooks the coordinates and matrices approach is used. It starts with considering the systems of linear algebraic equations. Then the theory of determinants is developed, the matrix algebra and the geometry of the space Rn are considered. This approach is convenient for initial introduction to the subject since it is based on very simple concepts: the numbers, the sets of numbers, the numeric matrices, linear functions, and linear equations. The proofs within this approach are conceptually simple and mostly are based on calculations. However, in further statement of the subject the coordinates and matrices approach is not so advantageous. Computational proofs become huge, while the intention to consider only numeric objects prevents us from introducing and using new concepts.

The invariant geometric approach, which is used in this book, starts with the definition of abstract linear vector space. Thereby the coordinate representation of vectors is not of crucial importance; the set-theoretic methods commonly used in modern algebra become more important. Linear vector space is the very object to which these methods apply in a most simple and effective way: proofs of many facts can be shortened and made more elegant.

The invariant geometric approach lets the reader get prepared to the study of more advanced branches of mathematics, such as differential geometry, commutative algebra, algebraic geometry, and algebraic topology.

I prefer a self-sufficient way of explanation. The reader is assumed to have only minimal preliminary knowledge in matrix algebra and in theory of determinants. This material is usually given in courses of general algebra and analytic geometry.

Under the term «numeric field» in this book we assume one of the following three fields: the field of rational numbers Q, the field of real numbers R, or the field of complex numbers C. Therefore the reader should not know the general theory of numeric fields.

I am grateful to E. B. Rudenko for reading and correcting the manuscript of Russian edition of this book.

May, 1996;
May, 2004.                                                      R. A. Sharipov.

CHAPTER I

LINEAR VECTOR SPACES AND LINEAR MAPPINGS.

§ 1. The sets and mappings.

The concept of a set is a basic concept of modern mathematics. It denotes any group of objects for some reasons distinguished from other objects and grouped together. Objects constituting a given set are called the elements of this set. We usually assign some literal names (identificators) to the sets and to their elements. Suppose the set A consists of three objects m, n, q. Then we write A = {m, n, q}. The fact that m is an element of the set A is denoted by the membership sign: m ∈ A. The writing p ∉ A means that the object p is not an element of the set A.

If we have several sets, we can gather all of their elements into one set, which is called the union of initial sets. In order to denote this gathering operation we use the union sign ∪. If we gather the elements each of which belongs to all of our sets, they constitute a new set, which is called the intersection of initial sets. In order to denote this operation we use the intersection sign ∩.

If a set A is a part of another set B, we denote this fact as A ⊂ B or A ⊆ B and say that the set A is a subset of the set B. Two signs ⊂ and ⊆ are equivalent. However, using the sign ⊆, we emphasize that the condition A ⊂ B does not exclude the coincidence of sets A = B. If A ⊊ B, then we say that the set A is a strict subset in the set B.

The term empty set is used to denote the set ∅ that comprises no elements at all. The empty set is assumed to be a part of any set: ∅ ⊂ A.

Definition 1.1. The mapping f : X → Y from the set X to the set Y is a rule f applicable to any element x of the set X and such that, being applied to a particular element x ∈ X, uniquely defines some element y = f(x) in the set Y.

The set X in the definition 1.1 is called the domain of the mapping f. The set Y in the definition 1.1 is called the domain of values of the mapping f. The writing f(x) means that the rule f is applied to the element x of the set X. The element y = f(x) obtained as a result of applying f to x is called the image of x under the mapping f.

Let A be a subset of the set X. The set f(A) composed by the images of all elements x ∈ A is called the image of the subset A under the mapping f:

    f(A) = {y ∈ Y : ∃ x ((x ∈ A) & (f(x) = y))}.

If A = X, then the image f(X) is called the image of the mapping f. There is special notation for this image: f(X) = Im f. The set of values is another term used for denoting Im f = f(X); don't confuse it with the domain of values.
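For readers who like to experiment, the notions of image and set of values can be tried out on small finite sets. In the sketch below (Python; the sets X, Y and the rule f are arbitrary choices made only for illustration) the mapping is given by an explicit table of values:

    # A mapping f : X -> Y given by an explicit table of values.
    X = {1, 2, 3, 4}
    Y = {"a", "b", "c"}
    f = {1: "a", 2: "a", 3: "b", 4: "c"}

    def image(f, A):
        # f(A) = {y : there exists x in A with f(x) = y}
        return {f[x] for x in A}

    A = {1, 2, 3}
    print(image(f, A))   # {'a', 'b'}       -- the image of the subset A
    print(image(f, X))   # {'a', 'b', 'c'}  -- Im f = f(X)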

Let y be an element of the set Y. Let's consider the set f−1(y) consisting of all elements x ∈ X that are mapped to the element y. This set f−1(y) is called the total preimage of the element y:

    f−1(y) = {x ∈ X : f(x) = y}.

Suppose that B is a subset in Y. Taking the union of total preimages for all elements of the set B, we get the total preimage of the set B itself:

    f−1(B) = {x ∈ X : f(x) ∈ B}.

It is clear that for the case B = Y the total preimage f−1(Y) coincides with X. Therefore there is no special sign for denoting f−1(Y).

Definition 1.2. The mapping f : X → Y is called injective if images of any two distinct elements x1 ≠ x2 are different, i. e. x1 ≠ x2 implies f(x1) ≠ f(x2).

Definition 1.3. The mapping f : X → Y is called surjective if total preimage f−1(y) of any element y ∈ Y is not empty.

Definition 1.4. The mapping f : X → Y is called a bijective mapping or a one-to-one mapping if total preimage f−1(y) of any element y ∈ Y is a set consisting of exactly one element.

Theorem 1.1. The mapping f : X → Y is bijective if and only if it is injective and surjective simultaneously.

Proof. According to the statement of theorem 1.1, simultaneous injectivity and surjectivity is a necessary and sufficient condition for bijectivity of the mapping f : X → Y. Let's prove the necessity of this condition for the beginning.

Suppose that the mapping f : X → Y is bijective. Then for any y ∈ Y the total preimage f−1(y) consists of exactly one element. This means that it is not empty. This fact proves the surjectivity of the mapping f : X → Y. If the mapping f is not injective, then there are two distinct elements x1 ≠ x2 in X such that f(x1) = f(x2). Let's denote y = f(x1) = f(x2) and consider the total preimage f−1(y). From the equality f(x1) = y we derive x1 ∈ f−1(y). Similarly from f(x2) = y we derive x2 ∈ f−1(y). Hence, the total preimage f−1(y) is a set containing at least two distinct elements x1 and x2. This fact contradicts the bijectivity of the mapping f : X → Y. Due to this contradiction we conclude that f is surjective and injective simultaneously. Thus, we have proved the necessity of the condition stated in theorem 1.1.

Let's proceed to the proof of sufficiency. Suppose that the mapping f : X → Y is injective and surjective simultaneously. Let's prove the bijectivity of f by contradiction. Due to the surjectivity the sets f−1(y) are non-empty for all y ∈ Y. Suppose that some of them contains more than one element. If x1 ≠ x2 are two distinct elements of the set f−1(y), then f(x1) = y = f(x2). However, this equality contradicts the injectivity of the mapping f : X → Y. Hence, each set f−1(y) is non-empty and contains exactly one element. Thus, we have proved the bijectivity of the mapping f.
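Theorem 1.1 can be tested mechanically on finite sets: one computes every total preimage f−1(y) and checks how many elements it contains. A small sketch (Python; the particular map g below is an arbitrary toy example):

    def preimage(f, X, y):
        # total preimage of y: all x in X with f(x) = y
        return {x for x in X if f[x] == y}

    def is_injective(f, X, Y):
        return all(len(preimage(f, X, y)) <= 1 for y in Y)

    def is_surjective(f, X, Y):
        return all(len(preimage(f, X, y)) >= 1 for y in Y)

    def is_bijective(f, X, Y):
        return all(len(preimage(f, X, y)) == 1 for y in Y)

    X, Y = {1, 2, 3}, {"a", "b", "c"}
    g = {1: "a", 2: "b", 3: "c"}
    # The three checks agree, as theorem 1.1 asserts:
    print(is_bijective(g, X, Y) == (is_injective(g, X, Y) and is_surjective(g, X, Y)))  # True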

i. g : Y → Z. for any y ∈ Y the total preimage f −1 (y) is not empty. Theorem 1. Let’s choose some arbitrary vector y ∈ g−1 (z) and consider its total preimage f −1 (y). the total preimage (g ◦ f)−1 (z) is not empty.3. This means that x ∈ (g ◦ f)−1 (z). Theorem 1. Due to the injectivity of f from x1 = x2 we derive y1 = y2 . Choosing some element x ∈ f −1 (y). and h : Z → U . then any element y ∈ Y is an image of some element x ∈ X. Due to the surjectivity of f it is not empty. Proof. As an immediate consequence of the above two theorems we obtain the following theorem on composition of two bijections. Hence. (1. The composition g ◦ f of two injective mappings f : X → Y and g : Y → Z is an injective mapping. i. Hence. Proof.1) The fact of coincidence of these two mappings is formulated as the following theorem on associativity. e. The mapping f : X → Y is surjective if and only if Im f = Y . Denote y1 = f(x1 ) and y2 = f(x2 ). we get y = f(x).8 CHAPTER I. Let’s consider two mappings f : X → Y and g : Y → Z. Theorem 1. we get g ◦ f(x) = g(f(x)) = g(y) = z. LINEAR VECTOR SPACES AND LINEAR MAPPINGS. The surjectivity of g ◦ f is proved.4. This proves the equality Im f = Y . g ◦ f(x1 ) = g ◦ f(x2 ). The successive application of two mappings g(f(x)) yields a rule that associates each element x ∈ X with some uniquely determined element z = g(f(x)) ∈ Z. Hence. The operation of composition for the mappings is an associative operation. ψ = (h ◦ g) ◦ f. Then due to the injectivity of g from y1 = y2 we derive g(y1 ) = g(y2 ). . if Im f = Y . e. This means that f is a surjective mapping.2. then for any element y ∈ Y the total preimage f −1 (y) is not empty. we have a mapping ϕ : X → Z. Choosing an arbitrary element x ∈ X we can apply f to it. The composition g ◦ f of two surjective mappings f : X → Y and g : Y → Z is a surjective mapping. i. Theorem 1. Theorem 1. Let’s consider two elements x1 and x2 of the set X. Then we can form two different compositions of these mappings: ϕ = h ◦ (g ◦ f). e. It is denoted as ϕ = g ◦ f. The injectivity of the composition g ◦ f is proved. h ◦ (g ◦ f) = (h ◦ g) ◦ f. Let’s consider three mappings f : X → Y . each element y ∈ Y is an image of some element x under the mapping f. If the mapping f : X → Y is surjective. As a result we get the element f(x) ∈ Y . This mapping is called the composition of two mappings f and g. Proof. Hence. The composition g ◦ f of two bijective mappings f : X → Y and g : Y → Z is a bijective mapping. Due to the surjectivity of g the total preimage g −1 (z) is not empty.6.5. Let’s take an arbitrary element z ∈ Z. Therefore g ◦ f(x1 ) = g(y1 ) and g ◦ f(x2 ) = g(y2 ). Conversely. Then we can apply g to f(x). y = f(x). Then choosing an arbitrary vector x ∈ f −1 (y).

Thus. Theorem 1. (1. for y ∈ Im f. Then f(xy ) = y = f(x). idY (y) = y. we defive that the direct mapping f is injective. Let’s study the composition l◦f. ψ(x) = α ◦ f(x) = α(f(x)) = h(g(f(x))). Hence. Conversely. assuming the existence of left inverse mapping l.1. The last two mappings are defined as follows: idX (x) = x. from the equality f ◦ r = idY .8. Suppose that the mapping f possesses the left inverse mapping l. Suppose that the mapping f possesses the right inverse mapping r. Then we define the mapping l : Y → X by the following equality: l(y) = xy x0 for y ∈ Im f.2). Theorem is proved. The equality l ◦ f = idX for the mapping l is proved. First of all let’s choose and fix some element x0 ∈ X. For any y ∈ Im f we can choose and fix some element xy ∈ f −1 (y) in non-empty set f −1 (y). the equality y1 = y2 implies x1 = x2 and x1 = x2 implies y1 = y2 . Definition 1. A mapping r : Y → X is called right inverse to the mapping f : X → Y if f ◦ r = idY .7. It is easy to see that for any x ∈ X and for y = f(x) the equality l◦f(x) = xy is fulfilled. Let’s consider a mapping f : X → Y and the pair of identical mappings idX : X → X and idY : Y → Y . Let’s denote α = h ◦ g and β = g ◦ f. Taking into account the injectivity of f. we get xy = x.1). The equality l ◦ f = idX yields x1 = l(y1 ) and x2 = l(y2 ). l◦f(x) = x for any x ∈ X.§ 1. A mapping f : X → Y possesses the left inverse mapping l if and only if it is injective. Hence. Let’s choose two vectors x1 and x2 in the space X and let’s denote y1 = f(x1 ) and y2 = f(x2 ). Theorem 1. the coincidence of two mappings ϕ : X → U and ψ : X → U is verified by verifying the equality ϕ(x) = ψ(x) for an arbitrary element x ∈ X.6. we derive the required equality ϕ(x) = ψ(x) for the mappings (1. Hence.5. Proof of the theorem 1. Then ϕ(x) = h ◦ β(x) = h(β(x)) = h(g(f(x))). suppose that f is an injective mapping.8. this mapping is a required left inverse mapping for f. Proof of the theorem 1.2) Comparing right hand sides of the equalities (1. Definition 1. A mapping l : Y → X is called left inverse to the mapping f : X → Y if l ◦ f = idX . Therefore. Its total preimage f −1 (y) is not empty. The problem of existence of the left and right inverse mappings is solved by the following two theorems.7. For an arbitrary element y ∈ Y . THE SETS AND MAPPINGS. 9 Proof. Then let’s consider an arbitrary element y ∈ Im f. According to the definition 1. h ◦ (g ◦ f) = (h ◦ g) ◦ f. A mapping f : X → Y possesses the right inverse mapping r if and only if it is surjective.
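The constructions used in the proofs of the theorems on left and right inverse mappings are explicit enough to be programmed for finite sets. In the sketch below (Python; the sets and the map f are arbitrary choices), left_inverse follows the recipe with a fixed element x0, while right_inverse marks one element xy in every non-empty preimage:

    def left_inverse(f, X, Y, x0):
        # For an injective f, send each y = f(x) back to x; all other y go to the fixed x0.
        l = {y: x0 for y in Y}
        for x in X:
            l[f[x]] = x
        return l

    def right_inverse(f, X, Y):
        # For a surjective f every preimage is non-empty, so one marked element exists.
        return {y: next(x for x in X if f[x] == y) for y in Y}

    X, Y = {1, 2}, {"a", "b", "c"}
    f = {1: "a", 2: "b"}                  # injective, but not surjective
    l = left_inverse(f, X, Y, x0=1)
    print(all(l[f[x]] == x for x in X))   # True: l o f = id_X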

and 1. the left and right inverse mappings determine the unique bilateral inverse mapping f −1 = l = r satisfying the equalities (1. f ◦ f −1 = idY . we get f(r(y)) = y and f ◦ r = idY . In this case the mappings l and r are uniquely determined. if we assume that there is another left inverse mapping l . Even the method of constructing them contains definite extent of arbitrariness.9 follows from theorems 1. Definition 1. The first proposition of the theorem 1.7. Now. This rule can be denoted as a function z = f(x. (1.7. the surjectivity of f is proved. In each non-empty set f −1 (y) we choose and mark exactly one element xy ∈ f −1 (y). while operands are separated by comma. The vector product of three-dimensional vectors yields an example of such notation: z = [x. let’s assume that f is surjective. A mapping f : X → Y possesses both left and right inverse mappings l and r if and only if it is bijective. 1. where the operation sign is placed between the elements x and y. LINEAR VECTOR SPACES AND LINEAR MAPPINGS. They coincide with each other thus determining the unique bilateral inverse mapping l = r = f −1 . y). Sometimes special brackets play the role of the operation sign.10 CHAPTER I.9. The operation of . Then we can define a mapping by setting r(y) = xy . Examples are the binary operations of addition and multiplication of numbers: z = x + y. Proof. Coinciding with each other. § 2. Binary algebraic operation in M is a rule that maps each ordered pair of elements x. y]. This notation is called a prefix notation for an algebraic operation: the operation sign f in it precedes the elements x and y to which it is applied.8. The uniqueness of left inverse mapping also follows from the same chain of equalities. r = r . or the field of complex numbers K = C. the total preimage f −1 (y) is not empty. Thus. y of the set M to some uniquely determined element z ∈ M .1. the field of real numbers K = R. Let’s prove the remaining propositions of this theorem 1. then from l = r and l = r it follows that l = l . Hence. Let K be a numeric field. The coincidence l = r is derived from the following chain of equalities: l=l ◦ idY = l ◦ (f ◦ r) = (l ◦ f) ◦ r = idX ◦ r = r.7 and 1. therefore. we get l = r and l = r . This means that r(y) ∈ f −1 (y).3). There is another infix notation for algebraic operations. assuming the existence of another right inverse mapping r .9. In a similar way. Indeed.8 in general are not unique. z = x · y. we derive y = f(r(y)). Under the numeric field in this book we shall understand one of three such fields: the field of rational numbers K = Q. A mapping f −1 : Y → X is called bilateral inverse mapping or simply inverse mapping for the mapping f : X → Y if f −1 ◦ f = idX . Let M be a set. The existence of the right inverse mapping r for f is established. Linear vector spaces. conversely.3) Theorem 1. Since f(xy ) = y. Then for any y ∈ Y the total preimage f −1 (y) is not empty. Note that the mappings l : Y → X and r : Y → X constructed when proving theorems 1.
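For a bijective mapping given by a finite table, the bilateral inverse mapping is obtained simply by reading the table backwards; a one-line check (Python, arbitrary toy data):

    f = {1: "a", 2: "b", 3: "c"}                  # a bijection
    f_inv = {y: x for x, y in f.items()}          # the bilateral inverse mapping
    print(all(f_inv[f[x]] == x for x in f))       # True: f_inv o f = id_X
    print(all(f[f_inv[y]] == y for y in f_inv))   # True: f o f_inv = id_Y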

Then for any vector v ∈ V due to the axiom (3) we The system of axioms (1)-(8) is excessive: the axiom (1) can be derived from other axioms. The elements of a linear vector space are usually called the vectors. (10) for any vector v ∈ V the vector v opposite to v is unique.§ 2. (2) (u + v) + w = u + (v + w) for all u. if the following conditions are fulfilled: (1) u + v = v + u for all u. (3) there is an element 0 ∈ V such that v + 0 = v for all v ∈ V . Formulating such results. we shall not specify the type of linear vector space. (11) the product of the number 0 ∈ K and any vector v ∈ V is equal to zero vector: 0 · v = 0. We shall distinguish rational. v. any such element is called a zero element. (8) 1 · v = v for the number 1 ∈ K and for any element v ∈ V . Muftakhov who communicated me this curious fact. (12) the product of an arbitrary number α ∈ K and zero vector is equal to zero vector: α · 0 = 0. Proof. while the conditions (1)-(8) are called the axioms of a linear vector space. (13) the product of the number −1 ∈ K and the vector v ∈ V is equal to the opposite vector: (−1) · v = v . I am grateful to A. β ∈ K and for any element v∈V. (6) (α + β) · v = α · v + β · v for any two numbers α. A set V equipped with binary operation of addition and with the operation of multiplication by numbers from the field K. LINEAR VECTOR SPACES. w ∈ V . Theorem 2. (5) α · (u + v) = α · u + α · v for any number α ∈ K and for any two elements u. Axioms (1) and (2) are the axiom of commutativity 1 and the axiom of associativity respectively. or K = C they are defined over. (7) α · (β · v) = (αβ) · v for any two numbers α. Axioms (5) and (6) express the distributivity. Suppose that in a linear vector space there are two elements 0 and 0 with the properties of zero vectors. they are enumerated so that their numbers form successive series with the numbers of the axioms of a linear vector space. The multiplication sign in this notation is often omitted: y = α x. it is called an opposite element for v. is called a linear vector space over the field K. x) consisting of a number α ∈ K and of an element x ∈ M to some element y ∈ M . B. The operation of multiplication by numbers is written in infix form: y = α · x.1. Most of the results in this book are valid for any numeric field K. Definition 2. 1 . real. v ∈ V . β ∈ K and for any element v∈V. (4) for any v ∈ V and for any zero element 0 there is an element v ∈ V such that v + v = 0. and complex linear vector spaces depending on which numeric field K = Q. Algebraic operations in an arbitrary linear vector space V possess the following properties: (9) zero vector 0 ∈ V is unique. 11 multiplication by numbers from the field K in a set M is a rule that maps each pair (α.1. K = R. v ∈ V . Therefore. The properties (9)-(13) are immediate consequences of the axioms (1)-(8).

where 0 is zero vector of a vector space V . The equality v + x = 0 just derived means that x is an opposite vector for the vector v in the sense of the axiom (4). Let’s take x = 0 · v. The theorem is completely proved. Let x = (−1) · v. Thus. Due to the commutativity and associativity axioms we need not worry about setting brackets and about the order of the summands when writing the sums of vectors. This means that the vectors 0 and 0 do actually coincide. The property (11) is proved. As a result we get x + x = 0 · v + 0 · v = (0 + 0) · v = 0 · v = x. Let v be some arbitrary vector in a vector space V . The following calculations prove the uniqueness of opposite vector: v = v + 0 = v + (v + v ) = (v + v) + v = = (v + v ) + v = 0 + v = v + 0 = v . Let v be some arbitrary vector of a vector space V . Therefore. let v be some arbitrary vector in a vector space V . v + v = 0. then let’s add x with x and apply the distributivity axiom (6). From the equality x + x = x it follows that x = 0 (see above). the property (12) is proved. Then we easily derive that x = 0: x = x + 0 = x + (x + x ) = (x + x) + x = x + x = 0. Suppose that there are two vectors v and v opposite to v. (−1) · v = v . . Let’s substitute v = 0 into the first equality and substitute v = 0 into the second one. Here we used the axiom (5) and the property of zero vector from the axiom (3). Let’s take x = α · 0. Then v + v = 0. Thus we have proved that x + x = x. Applying axioms (8) and (6). we get 0 = 0 + 0 = 0 + 0 = 0. The property (13) and the axioms(7) and (8) yield (−1) · v = (−1) · ((−1) · v) = ((−1)(−1)) · v = 1 · v = v. LINEAR VECTOR SPACES AND LINEAR MAPPINGS. Due to the uniqueness property (10) of the opposite vector we conclude that x = v . The uniqueness of zero vector is proved. have v = v + 0 and v + 0 = v. Then x + x = α · 0 + α · 0 = α · (0 + 0) = α · 0 = x. Here we used the associativity axiom (2).12 CHAPTER I. for the vector x we derive v + x = 1 · v + x = 1 · v + (−1) · v = (1 + (−1)) · v = 0 · v = 0. In deriving v = v above we used the axiom (4). Again. Taking into account the axiom (1). the associativity axiom (2) and we used twice the commutativity axiom (1). Let α be some arbitrary number of a numeric field K.

13 This equality shows that the notation v = −v for an opposite vector is quite natural. .1) is a linear vector space over the field R of real numbers. Definition 2. Algebraic operations with column vectors are determined as the operations with their components: x1 x2 . (a − b) + c = a − (b − c). . xn y1 y2 + . In a similar way. The following properties of the operation of vector subtraction (a + b) − c = a + (b − c).1) We leave to the reader to check the fact that the set Rn of all ordered n-tuples with algebraic operations (2. . . . α · xn = = (2.§ 2. . 1]) are defined as pointwise operations. This means that the value of the function f + g at a point a is the sum of the values of f and g at that point. α · (x − y) = α · x − α · y (a − b) − c = a − (b + c). . 1] of real axis. . u2 ∈ U it follows that u1 + u2 ∈ U . A non-empty subset U ⊂ V in a linear vector space V over a numeric field K is called a subspace of the space V if: (1) from u1. The reader can easily verify this fact. Let’s consider the set of m-times continuously differentiable real-valued functions on the segment [−1. This set is usually denoted as C m ([−1. Proof of the above properties is left to the reader. Such n-tuples are represented in the form of column vectors. xn + y n x1 x2 α· . In addition. The operations of addition and multiplication by numbers in C m ([−1. The operation of subtraction is an opposite operation for the vector addition. . LINEAR VECTOR SPACES. 1]) with pointwise algebraic operations of addition and multiplication by numbers is a linear vector space over the field of real numbers R. Due to the above conditions (1) and (2) this set is closed with respect to operations of addition and multiplication by numbers. . xn. . (2) from u ∈ U it follows that α · u ∈ U for any number α ∈ K. xn α · x1 α · x2 . . Let’s consider some examples of linear vector spaces. It is determined as the addition with the opposite vector: x − y = x + (−y). yn x1 + y 1 x2 + y 2 . the value of the function α · f at the point a is the product of two numbers α and f(a). Let’s regard U as an isolated set. 1]). It is easy to verify that the set of functions C m ([−1. It is easy to show that . Real arithmetic vector space Rn is determined as a set of ordered n-tuples of real numbers x1. we can write −α · v = −(α · v) = (−1) · (α · v) = (−α) · v. . make the calculations with vectors very simple and quite similar to the calculations with numbers.2. Rational arithmetic vector space Qn over the field Q of rational numbers and complex arithmetic vector space Cn over the field C of complex numbers are defined in a similar way. Let U be a subspace of a linear vector space V . .

zero vector is an element of U and for any u ∈ U the opposite vector u′ also is an element of U. These facts follow from 0 = 0 · u and u′ = (−1) · u. Relying upon these facts one can easily prove that any subspace U ⊂ V, when considered as an isolated set, is a linear vector space over the field K. Indeed, we have already shown that axioms (3) and (4) are valid for it. Verifying axioms (1), (2) and remaining axioms (5)-(8) consists in checking equalities written in terms of the operations of addition and multiplication by numbers. Being fulfilled for arbitrary vectors of V, these equalities are obviously fulfilled for vectors of subset U ⊂ V. Since U is closed with respect to algebraic operations, it makes sure that all calculations in these equalities are performed within the subset U.

As the examples of the concept of subspace we can mention the following subspaces in the functional space C m([−1, 1]):
– the subspace of even functions (f(−x) = f(x));
– the subspace of odd functions (f(−x) = −f(x));
– the subspace of polynomials (f(x) = an xn + . . . + a1 x + a0).

§ 3. Linear dependence and linear independence.

Let v1, . . . , vn be a system of vectors from some linear vector space V. Applying the operations of multiplication by numbers and addition to them we can produce the following expressions with these vectors:

    v = α1 · v1 + . . . + αn · vn.                                         (3.1)

An expression of the form (3.1) is called a linear combination of the vectors v1, . . . , vn. The numbers α1, . . . , αn are taken from the field K; they are called the coefficients of the linear combination (3.1), while vector v is called the value of this linear combination. Linear combination is said to be zero or equal to zero if its value is zero.

Let's introduce one more concept related to linear combinations. A linear combination is called trivial if all its coefficients are equal to zero: α1 = . . . = αn = 0. Otherwise it is called nontrivial.

Definition 3.1. A system of vectors v1, . . . , vn in linear vector space V is called linearly dependent if there exists some nontrivial linear combination of these vectors equal to zero.

Definition 3.2. A system of vectors v1, . . . , vn in linear vector space V is called linearly independent if any linear combination of these vectors being equal to zero is necessarily trivial.

The concept of linear independence is obtained by direct logical negation of the concept of linear dependence. The reader can give several equivalent statements defining this concept. Here we give only one of such statements which, to our knowledge, is most convenient in what follows.
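For vectors of the arithmetic space Rn the question of linear dependence can be settled numerically: vectors v1, . . . , vk are linearly dependent exactly when the rank of the matrix built from them is less than k. A sketch (Python with numpy; the three vectors form an arbitrary dependent example chosen for illustration):

    import numpy as np

    v1 = np.array([1.0, 0.0, 1.0])
    v2 = np.array([0.0, 1.0, 1.0])
    v3 = v1 + 2.0 * v2          # deliberately a linear combination of v1 and v2

    M = np.column_stack([v1, v2, v3])
    print(np.linalg.matrix_rank(M) < 3)   # True: the system v1, v2, v3 is linearly dependent

    # Coefficients expressing v3 through v1 and v2:
    coeffs, *_ = np.linalg.lstsq(np.column_stack([v1, v2]), v3, rcond=None)
    print(coeffs)                          # approximately [1. 2.]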

.2) means that at least one of its coefficients is nonzero. . + αk · vk = 0. we can assume that first k vectors form linear dependent subsystem in it. . The property (2) is proved. . + 0 · vn = 0. the vectors v1 . + αk · vk + . .§ 3. Then there exists some nontrivial liner combination of these k vectors being equal to zero: α1 · v1 + . . Let’s compose the following linear combination of the vectors v1 . . + 0 · vk−1 + 1 · vk + 0 · vk+1 + . . . (2) any system of vectors comprising linearly dependent subsystem is linearly dependent in whole. (3) if a system of vectors is linearly dependent. It is obvious that the resulting linear combination is nontrivial and its value is equal to zero. . For the sake of certainty we can assume that vk = 0. (3. . . . vn are linearly dependent. + αn · vn = 0. vn is linearly independent and if adding the next vector vn+1 to it we make it linearly dependent. . . . Suppose that a system of vectors v1 . . . . Let assume that the vectors v1 . . Suppose that a system of vectors v1 . . . . the vectors v1 . . zn . . The property (1) is proved.2) in more details: α1 · v1 + . . Hence. . (4) if a system of vectors v1 .2) Non-triviality of the linear combination (3. vn are linearly dependent. vn comprises zero vector. zn . This linear combination is nontrivial since the coefficient of vector vk is nonzero. . . . . then at least one of these vectors is linearly expressed through others. . 15 Theorem 3. Hence. . + αn · vn = 0. Proof. (5) if a vector x is linearly expressed through the vectors y1 . ym and if each one of the vectors y1 . . . Let’s write (3. . then the vector vn+1 is linearly expressed through previous vectors v1 . . LINEAR DEPENDENCE AND LINEAR INDEPENDENCE. . . Since linear dependence is not sensible to the order in which the vectors in a system are enumerated. . . . . . . . ym is linearly expressed through z1 . . . + αk · vk + 0 · vk+1 + . And its value is equal to zero. Then there exists a nontrivial linear combination of them being equal to zero: α1 · v1 + . . + 0 · vn = 0. . vn are linearly dependent. Suppose that αk = 0. . . vn comprises a linear dependent subsystem. . . then x is linearly expressed through z1. . . . . . . .1. vn. The relation of linear dependence of vectors in a linear vector space has the following basic properties: (1) any system of vectors comprising zero vector is linearly dependent. . . . . Let’s expand this linear combination by adding other vectors with zero coefficients: α1 · v1 + . vn : 0 · v1 + .

. . If. Let’s prove that αn+1 = 0. . . . . ym . . for the vector x we get m n n m x= i=1 αi · j=1 βij · zj = j=1 i=1 αi βij · zj The above expression for the vector x shows that it is linearly expressed through vectors z1 . . αn+1 αn+1 This expression for the vector vn+1 completes the proof of the property (4). − · vn . . . αk αk αk αk Now we see that the vector vk is linearly expressed through other vectors of the system. . . yi = j=1 βij · zj . . − · vk−1 − · vk+1 − . . Let’s move the term αk · vk to the right hand side of the above equality. . Note the following important consequence that follows from the property (2) in the theorem 3. . + αn · vn = 0. we assume that αn+1 = 0. . . and then let’s divide the equality by −αk : vk = − α1 αk−1 αk+1 αn · v1 − . .1 in whole. . zn. vn such that adding the next vector vn+1 to it we make it linearly dependent. . . . conversely. vn .16 CHAPTER I. The property (5) is proved. . Substituting second formula into the first one. . + αn · vn + αn+1 · vn+1 = 0. Hence. . we would get the nontrivial linear combination of n vectors being equal to zero: α1 · v1 + . zn. Corollary. vn+1 being equal to zero: α1 · v1 + . . Let’s consider a linearly independent system of vectors v1 . Then there is some nontrivial linear combination of vectors v1. . . and each one of the vectors y1. . This completes the proof of theorem 3. This fact is expressed by the following formulas: m n x= i=1 αi · y i . . . Suppose that the vector x is linearly expressed through y1. . . . and we can apply the trick already used above: vn+1 = − α1 αn · v1 − . . Any subsystem in a linearly independent system of vectors is linearlyindependent.1. − · vn . . LINEAR VECTOR SPACES AND LINEAR MAPPINGS. This contradicts to the linear independence of the first n vectors v1 . αn+1 = 0. . ym is linearly expressed through z1 . The property (3) is proved. .

+ α∗ m−1 k · ym−1 .5) where i = 1. .3) and collect similar terms in them. Let’s begin with the case n = 1. . Theorem 3. . . . .5) are written as x∗ = α∗ · y1 + .6) In these notations the formulas (3.4) into the relationships (3. . βm βm βm (3. . In order to simplify (3. . . . . xn are linear independent and if each of them is expressed through the vectors y1 . If n = k + 1 we have a system of linearly independent vectors x1. .4) are written as xi − αim · xk+1 = βm m−1 j=1 αij − βj αim · yj . 17 The next property of linear dependence of vectors is known as Steinitz theorem. We express this fact by formulas x1 = α11 · y1 + . xk+1 the last vector xk+1 of this system is nonzero (as well as other ones). ym in a slightly different way: xk+1 = β1 · y1 + . − · ym−1 . .5) we introduce the following notations: x∗ = x i − i αim · xk+1. βm (3. Hence. LINEAR DEPENDENCE AND LINEAR INDEPENDENCE. . ym . . . . then m n. . . . . . . . xk = αk1 · y1 + . . βm (3. . . x∗ k = α∗ 1 k · y1 + . k. . . . . The base step of induction is proved. . . 1 11 1 . . . . we can assume that βm = 0. each vector being expressed through the vectors of another system y1. . . . . + αkm · ym . . Due to the linear independence of vectors x1 .2 (Steinitz). Therefore at least one of the numbers β1 . . . βm α∗ = αij − βj ij αim . As a result the relationships (3. . . If the vectors x1. xn . . . ym this system should contain at least one vector. m 1.7) . . . . βm is nonzero. . (3. . . . Linear independence of a system with a single vector x1 means that x1 = 0. Suppose that the theorem holds for the case n = k. . + α∗ m−1 · ym−1 . . Proof. . . . . ym . + βm · ym . if necessary. . . Under this assumption let’s prove that it is valid for n = k + 1. .4) (3. . . . . We shall prove this theorem by induction on the number of vectors in the system x1 . . xk+1 . ym . . It describes some quantitative feature of this concept. . . . . . . Then ym = 1 β1 βm−1 · xk+1 − · y1 − . . . We shall write the analogous formula expressing xk+1 through y1 . . . Upon renumerating the vectors y1 . + α1m · ym . . . . In order to express the nonzero vector x1 through the vectors of a system y1 .3) Let’s substitute (3.§ 3.

the inductive step is completed and the theorem is proved. This set S is called the linear span of a subset S ⊂ V . we have u1 + u2 ∈ S . .1. The linear span of any subset S ⊂ V is a subspace in a linear vector space V . Now. . Thus.8) γ 1 · x1 + . Hence. . For the vector α · u. + γk · x∗ = 0. xk+1 we derive γ1 = . . . . In order to apply the inductive hypothesis we need to ∗ show that the vectors x∗ . Due to the linear independence of the initial system of vectors x1 . . The required inequality m k + 1 proving the theorem for the case n = k + 1 is an immediate consequence of m k + 1. = γk = 0. from this equality we derive α · u = (α α1) · s1 + . + (α αn) · sn. . + αn · sn. . Let’s consider a linear 1 combination of these vectors being equal to zero: γ1 · x∗ + . 1 k Substituting (3. . . . . Suppose that u1. x∗ . Both conditions (1) and (2) from the definition 2. the theorem is proved.8). we get m − 1 k. The set S can consist of either finite number of vectors. k vectors x∗ . Now suppose that u ∈ S . Proof. Theorem 4. Then ∗ u2 = β 1 · s ∗ + . + α n · s n . we get i k (3. .7). or of infinite number of vectors. . + αn · sn. where si ∈ S)}. Hence.8) is trivial. . u2 ∈ S .2 for S are fulfilled.18 CHAPTER I. each of which is linearly expressed through some finite number of vectors taken from S: S = {v ∈ V : ∃ n (v = α1 · s1 + . 1 u1 = α 1 · s 1 + . . applying the inductive 1 k hypothesis to the relationships (3. We denote by S the set of all vectors. LINEAR VECTOR SPACES AND LINEAR MAPPINGS. Let S ⊂ V be some non-empty subset in a linear vector space V . So. . . Adding these two equalities. . Then u = α1 · s1 + . . + β m · s m . Therefore. upon collecting similar terms. In order to prove this theorem it is sufficient to check two conditions from the definition 2. . . ym−1 . .2 for S . + γ k · xk − γi i=1 αim βm · xk+1 = 0. . § 4. . . . the linear combination (3. .6) for x∗ in (3. . xk are linearly independent. Spanning systems and bases. . . . . According to the above formulas. we see that the vector u1 + u2 also is expressed as a linear combination of some finite number of vectors taken from S. x∗ are linearly expressed 1 k through y1 . α · u ∈ S . . which proves the linear independence of vectors x∗ . . .
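In Rn membership in a linear span reduces to a rank computation: a vector belongs to the span of s1, . . . , sk exactly when appending it to s1, . . . , sk does not increase the rank. A sketch (Python with numpy; the particular vectors are arbitrary illustrative choices):

    import numpy as np

    s1 = np.array([1.0, 0.0, 0.0])
    s2 = np.array([1.0, 1.0, 0.0])
    v  = np.array([3.0, 2.0, 0.0])     # lies in the plane spanned by s1, s2
    w  = np.array([0.0, 0.0, 1.0])     # does not

    S = np.column_stack([s1, s2])

    def in_span(x):
        return np.linalg.matrix_rank(np.column_stack([S, x])) == np.linalg.matrix_rank(S)

    print(in_span(v), in_span(w))      # True False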

As for the spanning systems. where U is a subspace. the subset S is included into each of such subspaces. This fact obviously contradicts the minimality of S (see definition 4. .§ 4. sn taken from S is linearly independent. u ∈ U . If a spanning system of vectors S ⊂ V is not minimal. which is already proved. . Let u ∈ S and S ⊂ U . subsystem S such that S = S = V. Hence. A system of vectors S ⊂ V is called linearly independent if any finite subsystem of vectors s1. . S is a subspace of V comprising the subset S (see theorem 4. where si ∈ S. Let S = U . Definition 4. Theorem 4. Therefore. This proves the inclusion S ⊂ U . i.1) . The operation of passing to the linear span in a linear vector space V possesses the following properties: (1) if S ⊂ U and if U is a subspace in V . Due to the property (1). A spanning system of vectors S ⊂ V in a linear vector space V is called a minimal spanning system if none of smaller subsystems S S is a spanning system in V . Then W ⊂ S . . Then for the vector u we have u = α1 · s1 + . Then the subsystem S = S {sk } obtained by omitting this vector sk from S is a spanning system in V . e.1.2 for the case of infinite systems of vectors. Therefore any minimal spanning system of vectors in V is linearly independent. (2) the linear span of a subset S ⊂ V is the intersection of all subspaces comprising this subset S. 19 Theorem 4. SPANNING SYSTEMS AND BASES. e. A linear vector space V can have multiple spanning systems. A subset S ⊂ V is called a generating subset or a spanning system of vectors in a linear vector space V if S = V . S is among those subspaces forming W . Proof. This definition extends the definition 3. S ⊂ W . then S ⊂ U . On the other hand. Due to the item (3) in the statement of theorem 3. sn. From the two inclusions S ⊂ W and W ⊂ S it follows that S = W . e. Let’s denote by W the intersection of all subspaces of V comprising the subset S. i. + αn · sn. .1 one of these vectors sk is linearly expressed through others. If a spanning system of vectors S ⊂ V is linearly dependent. . the relation of the properties of minimality and linear independence for them is determined by the following theorem. the value of any linear combination of its elements again is an element of U . .3. The theorem is proved. Then we say that the subset S ⊂ V spans the subspace U . then it contains some finite linearly dependent set of vectors s1 .3. (4. then there is some smaller spanning subsystem S S. if S = V for all S S.2. But si ∈ S and S ⊂ U implies si ∈ U . . Definition 4. This terminology is supported by the following definition. Definition 4. Proof. Therefore the problem of choosing of a minimal (is some sense) spanning system is reasonable. Hence. Since U is a subspace. A spanning system of vectors S ⊂ V is minimal if and only if it is linearly independent. .2. S generates U by means of the linear combinations. .1).2 above). i.
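The statement of theorem 4.3 is easy to observe numerically: a spanning system that is linearly dependent is not minimal, since a vector linearly expressed through the others can be dropped without shrinking the span. A sketch (Python with numpy; the vectors are an arbitrary redundant example):

    import numpy as np

    u1 = np.array([1.0, 0.0, 2.0])
    u2 = np.array([0.0, 1.0, 1.0])
    u3 = u1 - u2                          # redundant: linearly expressed through u1 and u2

    full    = np.column_stack([u1, u2, u3])
    reduced = np.column_stack([u1, u2])
    # Dropping u3 leaves the spanned subspace unchanged:
    print(np.linalg.matrix_rank(full) == np.linalg.matrix_rank(reduced))   # True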

. .2). . . . we shall get a minimal spanning system Smin in V with finite number of vectors n in it: Smin = {y1 . . . . . .20 CHAPTER I. . y n i for i = 1. .2) to the form of a linear combination equal to zero: (−1) · s0 + α1 · s1 + .2 to the systems of vectors {x1 . xn} in it. Hence. . However. (4. . If S is not minimal again. This fact means that any linearly independent spanning system of vector in V is minimal. xn } and {y1 . . . . . one of its vectors is linearly expressed through others. + αn · sn = 0. we have found that the vectors s0 . but it is not elementary and it is not constructive. . xm (4. . yn}. . . . Finite dimensional vector spaces are distinguished due to the fact that the proof of existence of minimal spanning systems for them is elementary. If this system is not minimal. . Thus. Theorem 4. LINEAR VECTOR SPACES AND LINEAR MAPPINGS. xm } is some other minimal spanning system in V .3) This linear combination is obviously nontrivial. Ultimately. In an arbitrary linear vector space V there is at lease one spanning system. . This problem is solved with the use of the axiom of choice (see [1]). . s0 is linearly expressed through some finite number of vectors taken from the subsystem S : s0 = α 1 · s1 + . The theorem is proved. Definition 4. xk } be some finite spanning system of vectors in a finite-dimensional linear vector space V . S = V . Hence. (4. . . .5) Due to (4.5) we can apply Steinitz theorem 3. . . . . The solution of this problem is positive. . yn }. . then it is linear dependent. .1 and the definition 4. . . Let S = {x1. . it is denoted as n = dim V . Hence. . xm } and {y1 . . . . Both systems {x1 . .1) this / vector is an element of S . then we can iterate the process getting one less vectors in each step. n. . .4. A linear vector space V is called finite dimensional if there is some finite spanning system of vectors S = {x1 . + α n · sn .4. This number n is called the dimension of V . . the minimal spanning system of vectors (4. . . . Proof. yn } are linearly independent and xi ∈ y 1 . (4. . As a result we get two inequalities n m and m n.2) One can easily transform (4. . . yn } have the same number of elements n. for i = 1. . m. xm } and {y1. .4) Usually. Due to (4. y ∈ x 1 . sn form a finite linearly dependent subset of S. . In a finite dimensional linear vector space V there is at least one minimal spanning system of vectors. . . In this case we can choose some vector s0 ∈ S such that s0 ∈ S . e. . . Any two of such systems {x1. . m = n = dim V . This vector can be omitted and we get the smaller spanning system S consisting of k − 1 vectors. Therefore.4) is not unique. . S is linearly dependent (see the item (2) in theorem 3. Suppose that {x1 . . the problem of existence of minimal spanning systems in general case is nontrivial. g. . .


The dimension dim V is an integer invariant of a finite-dimensional linear vector space. If dim V = n, then such a space is called an n-dimensional space. Returning to the examples of linear vector spaces considered in § 2, note that dim Rn = n, while the functional space C m ([−1, 1]) is not finite-dimensional at all. Theorem 4.5. Let V be a finite dimensional linear vector space. Then the following propositions are valid: (1) the number of vectors in any linearly independent system of vectors x1 , . . . , xk in V is not greater than the dimension of V ; (2) any subspace U of the space V is finite-dimensional and dim U dim V ; (3) for any subspace U in V if dim U = dim V , then U = V ; (4) any linearly independent system of n vectors x1 , . . . , xn , where n = dim V , is a spanning system in V . Proof. Suppose that dim V = n. Let’s fix some minimal spanning system of vectors y1 , . . . , yn in V . Then each vector of the linear independent system of vectors x1 , . . . , xk in proposition (1) is linearly expressed through y1 , . . . , yn . Applying Steinitz theorem 3.2, we get the inequality k n. The first proposition of theorem is proved. Let’s consider all possible linear independent systems u1 , . . . , uk composed by the vectors of a subspace U . Due to the proposition (1), which is already proved, the number of vectors in such systems is restricted. It is not greater than n = dim V . Therefore we can assume that u1, . . . , uk is a linearly independent system with maximal number of vectors: k = kmax n = dim V . If u is an arbitrary vector of the subspace U and if we add it to the system u1, . . . , uk , we get a linearly dependent system; this is because k = kmax . Now, applying the property (4) from the theorem 3.1, we conclude that the vector u is linearly expressed through the vectors u1 , . . . , uk. Hence, the vectors u1, . . . , uk form a finite spanning system in U . It is minimal since it is linearly independent (see theorem 4.3). Finite dimensionality of U is proved. The estimate for its dimension follows from the above inequality: dim U = k n = dim V . Let U again be a subspace in V . Assume that dim U = dim V = n. Let’s choose some minimal spanning system of vectors u1, . . . , un in U . It is linearly independent. Adding an arbitrary vector v ∈ V to this system, we make it linearly dependent since in V there is no linearly independent system with (n + 1) vectors (see proposition (1), which is already proved). Furthermore, applying the property (3) from the theorem 3.1 to the system u1, . . . , un, v, we find that v = α 1 · u1 + . . . + α m · um . This formula means that v ∈ U , where v is an arbitrary vector of the space V . Therefore, U = V . The third proposition of the theorem is proved. Let x1, . . . , xn be a linearly independent system of n vectors in V , where n is equal to the dimension of the space V . Denote by U the linear span of this system of vectors: U = x1 , . . . , xn . Since x1 , . . . , xn are linearly independent, they form a minimal spanning system in U . Therefore, dim U = n = dim V . Now, applying proposition (3) of the theorem, we get x1 , . . . , xn = U = V.

This equality proves the fourth proposition of theorem 4.5 and completes the proof of the theorem in whole.

Definition 4.5. A minimal spanning system e1, . . . , en with some fixed order of vectors in it is called a basis of a finite-dimensional vector space V.

Theorem 4.6 (basis criterion). An ordered system of vectors e1, . . . , en is a basis in a finite-dimensional vector space V if and only if
(1) the vectors e1, . . . , en are linearly independent;
(2) an arbitrary vector of the space V is linearly expressed through e1, . . . , en.

Proof is obvious. The second condition of the theorem means that the vectors e1, . . . , en form a spanning system in V, while the first condition is equivalent to its minimality.

In essence, theorem 4.6 simply reformulates the definition 4.5. We give it here in order to simplify the terminology. The terms «spanning system» and «minimal spanning system» are lengthy and inconvenient for frequent usage.

Theorem 4.7. Let e1, . . . , es be a basis in a subspace U ⊂ V and let v ∈ V be some vector outside this subspace: v ∉ U. Then the system of vectors e1, . . . , es, v is a linearly independent system.

Proof. Indeed, if the system of vectors e1, . . . , es, v is linearly dependent, while e1, . . . , es is a linearly independent system, then v is linearly expressed through the vectors e1, . . . , es, thus contradicting the condition v ∉ U. This contradiction proves the theorem 4.7.

Theorem 4.8 (on completing the basis). Let U be a subspace in a finite-dimensional linear vector space V. Then any basis e1, . . . , es of U can be completed up to a basis e1, . . . , es, es+1, . . . , en in V.

Proof. Let's denote U = U0. If U0 = V, then there is no need to complete the basis since e1, . . . , es is a basis in V. Otherwise, if U0 ≠ V, then let's denote by es+1 some arbitrary vector of V taken outside the subspace U0. According to the above theorem 4.7, the vectors e1, . . . , es, es+1 are linearly independent.

Let's denote by U1 the linear span of vectors e1, . . . , es, es+1. For the subspace U1 we have the same two mutually exclusive options U1 = V or U1 ≠ V, as we previously had for the subspace U0. If U1 = V, then the process of completing the basis e1, . . . , es is over. Otherwise, we can iterate the process and get a chain of subspaces enclosed into each other:

    U0 ⊊ U1 ⊊ U2 ⊊ . . . .

This chain of subspaces cannot be infinite since the dimension of every next subspace is one greater than the dimension of the previous subspace, and the dimensions of all subspaces are not greater than the dimension of V. The process of completing the basis will be finished in the (n − s)-th step, where Un−s = V.

§ 5. Coordinates. Transformation of the coordinates
of a vector under a change of basis.

Let V be some finite-dimensional linear vector space over the field K and let dim V = n. In this section we shall consider only finite-dimensional spaces. Let's


choose a basis e1, . . . , en in V . Then an arbitrary vector x ∈ V can be expressed as linear combination of the basis vectors: x = x1 · e1 + . . . + x n · en . (5.1)

The linear combination (5.1) is called the expansion of the vector x in the basis e1, . . . , en. Its coefficients x1, . . . , xn are the elements of the numeric field K. They are called the components or the coordinates of the vector x in this basis.

We use upper indices for the literal notations of the coordinates of a vector x in (5.1). The usage of upper indices for the coordinates of vectors is determined by a special convention, which is known as tensorial notation. It was introduced to simplify huge calculations in differential geometry and in theory of relativity (see [2] and [3]). Other rules of tensorial notation are discussed in coordinate theory of tensors (see [7]1).

Theorem 5.1. For any vector x ∈ V its expansion in a basis of a linear vector space V is unique.

Proof. The existence of an expansion (5.1) for a vector x follows from the item (2) of theorem 4.6. Assume that there is another expansion

    x = x′1 · e1 + . . . + x′n · en.                                       (5.2)

Subtracting (5.1) from this equality, we get

    0 = (x′1 − x1) · e1 + . . . + (x′n − xn) · en.                         (5.3)

Since basis vectors e1, . . . , en are linearly independent, from the equality (5.3) it follows that the linear combination (5.3) is trivial: x′i − xi = 0. Then x′1 = x1, . . . , x′n = xn. Hence the expansions (5.1) and (5.2) do coincide. The uniqueness of the expansion (5.1) is proved.

Having chosen some basis e1, . . . , en in a space V and expanding a vector x in this basis, we can write its coordinates in the form of column vectors. Due to the theorem 5.1 this determines a bijective map ψ : V → Kn. It is easy to verify that

    ψ(x + y) = (x1 + y1, . . . , xn + yn)T,    ψ(α · x) = (α · x1, . . . , α · xn)T,    (5.4)

where the right hand sides are understood as column vectors.
The above formulas (5.4) show that a basis is a very convenient tool when dealing with vectors. In a basis algebraic operations with vectors are replaced by algebraic operations with their coordinates, i. e. with numeric quantities. However, the coordinate approach has one disadvantage: the mapping ψ essentially depends on the basis we choose, and there is no canonic choice of basis. In general, no basis is preferable with respect to another. Therefore we should be ready to
1

The reference [7] is added in 2004 to English translation of this book.
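The coordinate mapping ψ, together with the transition matrices introduced below, can be explored numerically. In the following sketch (Python with numpy; the two bases of R3 are arbitrary choices made for illustration) the columns of E and E_new hold the basis vectors, the coordinates of a vector are found by solving a linear system, and S is the matrix whose i-th column consists of the coordinates of the i-th new basis vector in the old basis:

    import numpy as np

    # columns of E are the old basis vectors, columns of E_new the new ones
    E = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])
    E_new = np.array([[1.0, 0.0, 1.0],
                      [1.0, 1.0, 0.0],
                      [0.0, 1.0, 1.0]])

    def coords(basis, v):
        # coordinates of v in the given basis: solve basis @ c = v
        return np.linalg.solve(basis, v)

    S = np.linalg.solve(E, E_new)        # direct transition matrix
    T = np.linalg.inv(S)                 # inverse transition matrix

    v = np.array([2.0, 3.0, 4.0])
    x_old, x_new = coords(E, v), coords(E_new, v)
    print(np.allclose(x_old, S @ x_new))   # old coordinates = S times new coordinates
    print(np.allclose(S @ T, np.eye(3)))   # S and T are inverse to each other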

e (5. . Sn S= (5.6) the direct transition matrix ˜ for passing from the old basis e1 . and the wavy one will be called the new basis. .5) are specified by upper index. . ˜ Let e1 . ˜n be two arbitrary bases in a linear vector space e V . Certainly. . . .2. Proof. . . + S i · en .7) The coefficients of the expansion (5. which is called the inverse transition matrix. . Theorem 5. . . . en and e1 . . LINEAR VECTOR SPACES AND LINEAR MAPPINGS. .5) and (5. . en to the new basis ˜1. Remember that two square matrices are inverse to each other if their product is equal to unit matrix: S T = 1.8) into the second one.8) Then we substitute the first relationship (5. lower index i specifies the column number. the usage of terms «direct» and «inverse» here is relative. . . (5. We shall call them «wavy» basis and «non-wavy» basis (because of tilde sign we use for denoting the vectors of one of them). The non-wavy basis will also be called the initial basis or the old basis. consider various bases and should be able to recalculate the coordinates of vectors when passing from a basis to another basis.6) j Upper index j of the matrix element Si specifies the row number. the coordinates of the vector ˜i in the e expansion (5. The direct transition matrix S and the inverse transition matrix T determined by the expansions (5. . .9) . + Tjn · en . en and ˜1.7) are inverse to each other. . e ˜ Swapping the bases e1. This yields: n n n k Si · e k n k Si Tji k=1 i=1 ej = i=1 Tji · = k=1 · ek . . .2 by writing the relationships (5. . en we can write the expansion of e the vector ej in wavy basis: ˜ ˜ ej = Tj1 · e1 + . Totally in the expansion (5. Sn . . they are usually arranged into a matrix: 1 S1 .7) determine the matrix T . .5) According to the tensorial notation. Let’s begin the proof of the theorem 5. . . .. . The lower index i specifies the ˜ number of the vector ei being expanded. e (5. Here we do not define the matrix multiplication assuming that it is known from the course of general algebra. .24 CHAPTER I. . .5) and (5.5) we determine n2 numbers. . ˜i = e k=1 ej = i=1 Tji · ˜i. . The matrix S in (5.7) in a brief symbolic form: n n k Si · e k . n . en . . . . n S1 1 . it depends on which basis is considered as an old basis and which one is taken for a new one. (5. we expand it in the old basis: 1 n ˜ i = Si · e1 + . Taking i-th vector of new (wavy) basis. . .

. Proof.5) and prove that they are linearly independent. . is used for denoting the following numeric array: 1 for k = j. . 25 k The symbol δj . Theorem 5. This proves the non-degeneracy of transition matrices S and T . Every non-degenerate n × n matrix S can be obtained as a tran˜ sition matrix for passing from some basis e1. en by means of the relationships (5. . The corollary is proved. The direct transition matrix S and the inverse transition matrix T both are non-degenerate matrices and det S det T = 1.10) 0 for k = j. . .1 on the uniqueness of the expansion of a vector in a basis we have the equality n k k Si Tji = δj . en basis in V and fix it. Proof.9): n ej = k=1 k δj · e k . TRANSFORMATION OF THE COORDINATES OF VECTORS . . . one can transform it to the following one: n 1 Si α i i=1 n · e1 + . . det T = 0.5) into this equality. . If the product of two numbers is equal to unity. then none of these two numbers can be equal to zero: det S = 0.9) represent the same vector ej expanded in the same basis e1 . . . . + n Si α i i=1 · en = 0. . Corollary. This fact is well known from the course of general algebra. . . . .10) in order to transform left hand side of the equality (5.11) Both equalities (5.3.11) and (5. .§ 5. . en to some other basis e1 . . . . en. . . it follows that all sums enclosed within the brackets in the above equality are equal to zero. For this purpose we consider a linear combination of these vectors that is equal to zero: ˜ α1 · ˜1 + . en are linearly independent. The theorem is proved. We apply the Kronecker symbol determined in (5. Writing .12) Substituting (5. The relationship det S det T = 1 follows from the matrix equality S T = 1. e (5. i=1 It is easy to note that this equality is equivalent to the matrix equality S T = 1. . ˜n e in a linear vector space V of the dimension n. . . Due to the theorem 5. Since the basis vectors e1 . . (5. Then let’s ˜ ˜ determine the other n vectors e1 . k δj = (5. which is called the Kronecker symbol. + αn · en = 0. which was proved just above. Let’s choose an arbitrary e1 . .

. (5. . e ˜ Let’s consider two bases e1 . . From the course of algebra we know that each homogeneous system of linear equations with nondegenerate square matrix has unique solution.. This means that an arbitrary linear combination (5. en is treated as an old basis and e e ˜ ˜ e1 .. The coordinates of a vector x in two bases e1. It can be expanded in each of these two bases: n n x= k=1 xk · e k ... . LINEAR VECTOR SPACES AND LINEAR MAPPINGS. . ... . . . .. . .. e1 .. Comparing this expansion x with the first expansion (5. en of a linear vector space V are related by formulas n n k Si x i . .14) are known as transformation formulas for the coordinates of a vector under a change of basis. . . . αn: 1 1 S1 α1 + . when e1..14) where S and T are direct and inverse transition matrices for the passage from e1 . Theorem 5. The relationships (5. .13): n n n k Si · e k n k Si x i ˜ k=1 i=1 x= i=1 xi · ˜ = k=1 · ek . en and e1.. x= i=1 xi · ˜ i . . and. this fixes the vector x itself. we find that they form a basis in V .13) and applying the theorem on uniqueness of the expansion of a vector in a basis.13) Once the coordinates of x in one of these two bases are fixed. .14) we substitute the expan˜ sion of the vector ei taken from (5. we get a homogeneous system of linear algebraic equations with respect to the variables α1 .26 CHAPTER I. en to ˜1.. . these sums in expanded form. .. In order to prove the first relationship (5. . en to e1 . while the matrix S appears to be a direct transition matrix ˜ for passing from e1.. n n S1 α1 + .. which is purely zero: α1 = .. . The theorem is proved.4. . . ... we derive n xk = i=1 k Si x i . e Applying the proposition (4) from the theorem 4... . . ... . Proof. . ˜n. . ˜ .. i. .. is ˜ necessarily trivial. Hence. . ˜ i=1 xk = xi = ˜ k=1 i Tk xk . . ˜n in a linear vector space V e related by the transition matrix S. . . .8) into the second relationship (5. . ˜n is a linear independent system of vectors.. . hence. . . . .12).5 to these vectors. which is equal to zero. .. en and ˜ ˜ e1 . . e.. The matrix of coefficients of this system coincides with S. . Let x be some arbitrary vector of the space V .. . . . . + Sn αn = 0. = αn = 0. ... . . ˜ e (5. ˜n.. this fixes its coordinates in another basis... en is treated as a new one. . + Sn αn = 0.

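The relationships S T = 1, det S det T = 1 and the formulas (5.14) are easy to check numerically. The following small Python sketch is not part of the original text; it assumes the NumPy library and an arbitrarily chosen non-degenerate matrix S whose columns are the coordinates of the new basis vectors in the old basis.

import numpy as np

# A made-up non-degenerate transition matrix S; its columns are the
# coordinates of the new basis vectors ẽ_i in the old basis e_1, e_2.
S = np.array([[2.0, 1.0],
              [0.0, 1.0]])
T = np.linalg.inv(S)                                         # inverse transition matrix T = S^{-1}

# The matrix equality S T = 1 of (5.12) and det S det T = 1 of theorem 5.3.
print(np.allclose(S @ T, np.eye(2)))                         # True
print(np.isclose(np.linalg.det(S) * np.linalg.det(T), 1.0))  # True

# Transformation of coordinates under the change of basis, formulas (5.14).
x_old = np.array([3.0, 4.0])        # coordinates x^k of a vector in the old basis
x_new = T @ x_old                   # x̃^i = sum_k T^i_k x^k
print(np.allclose(S @ x_new, x_old))  # True: S carries the new coordinates back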
§ 6. Intersections and sums of subspaces.

Suppose that we have a certain number of subspaces in a linear vector space V. In order to designate this fact we write U_i ⊂ V. The number of subspaces can be finite or infinite enumerable; in the latter case the subspaces can be enumerated by the positive integers. However, in the general case we should enumerate the subspaces by the elements of some indexing set I, which can be finite, infinite enumerable, or even non-enumerable. Let's denote by U and by S the intersection and the union of all subspaces that we consider:

U = ∩_{i ∈ I} U_i,    S = ∪_{i ∈ I} U_i.    (6.1)

Theorem 6.1. The intersection of an arbitrary number of subspaces in a linear vector space V is a subspace in V.

Proof. The set U in (6.1) is not empty since the zero vector is an element of each subspace U_i. Let's verify the conditions (1) and (2) from the definition 2.2 for U. Suppose that u_1, u_2, and u are vectors from the subset U. Then they belong to U_i for each i ∈ I. Since U_i is a subspace, u_1 + u_2 ∈ U_i and α · u ∈ U_i for any i ∈ I and for any α ∈ K. Hence, u_1 + u_2 ∈ U and α · u ∈ U. The theorem is proved.

In the general case the subset S in (6.1) is not a subspace. Therefore we need to introduce the following concept.

Definition 6.1. The linear span of the union of subspaces U_i, i ∈ I, is called the sum of these subspaces. In order to denote the sum of subspaces W = ⟨S⟩ we use the standard summation sign: W = Σ_{i ∈ I} U_i.

Theorem 6.2. A vector w of a linear vector space V belongs to the sum of subspaces U_i, i ∈ I, if and only if it is represented as a sum of a finite number of vectors each of which is taken from some subspace U_i:

w = u_{i_1} + ... + u_{i_k}.    (6.2)

Proof. Suppose that w ∈ W. Then w is a linear combination of a finite number of vectors taken from S:

w = α^1 · s_1 + ... + α^k · s_k.

But S is the union of the subspaces U_i. Therefore s_m ∈ U_{i_m} and α^m · s_m = u_{i_m} ∈ U_{i_m}, where m = 1, ..., k. This leads to the equality (6.2) for the vector w.

Conversely, suppose that w is a vector given by the formula (6.2). Then u_{i_m} ∈ U_{i_m} and U_{i_m} ⊂ S. Hence, u_{i_m} ∈ S and, therefore, the vector w belongs to the linear span of S, i.e. w ∈ W. The theorem is proved.

Definition 6.2. The sum W of subspaces U_i, i ∈ I, is called the direct sum, if for any vector w ∈ W the expansion (6.2) is unique. In this case for the direct sum of subspaces we use the special notation:

W = ⊕_{i ∈ I} U_i.

Theorem 6.3. Let W = U_1 + ... + U_k be the sum of a finite number of finite-dimensional subspaces. The dimension of W is equal to the sum of the dimensions of the subspaces U_i if and only if W is the direct sum: W = U_1 ⊕ ... ⊕ U_k.

Proof. Suppose that dim U_i = s_i and let e_{i 1}, ..., e_{i s_i} be a basis in U_i. Let's choose such a basis in each subspace and join the vectors of all these bases into one system, ordering them alphabetically:

e_{1 1}, ..., e_{1 s_1}, ..., e_{k 1}, ..., e_{k s_k}.    (6.3)

Due to the equality W = U_1 + ... + U_k, for an arbitrary vector w of the subspace W we have the expansion (6.2):

w = u_1 + ... + u_k, where u_i ∈ U_i.    (6.4)

Expanding each vector u_i of (6.4) in the basis of the corresponding subspace U_i, we get the expansion of w in the vectors of the system (6.3). Hence, (6.3) is a spanning system of vectors in W (though, in the general case, it is not a minimal spanning system). From any expansion (6.4) we can derive the following expansion of the vector w in the basis (6.3):

w = (Σ_{j=1}^{s_1} α_{1 j} · e_{1 j}) + ... + (Σ_{j=1}^{s_k} α_{k j} · e_{k j}).    (6.5)

The sums enclosed into the round brackets in (6.5) are determined by the expansions of the vectors u_1, ..., u_k in the bases of the corresponding subspaces U_1, ..., U_k:

u_i = Σ_{j=1}^{s_i} α_{i j} · e_{i j}.    (6.6)

Now suppose that dim W = dim U_1 + ... + dim U_k. Then the number of vectors in the spanning system (6.3), which is equal to s_1 + ... + s_k = dim U_1 + ... + dim U_k, cannot be reduced. Therefore (6.3) is a basis in W, and the expansion (6.5) of a vector w in this basis is unique. Due to (6.6) the existence of two different expansions (6.4) for some vector w would mean the existence of two different expansions (6.5) of this vector in the basis (6.3), which is impossible. Hence, the expansion (6.4) is unique and the sum of subspaces W = U_1 + ... + U_k is the direct sum.

Conversely, suppose that W = U_1 ⊕ ... ⊕ U_k. We know that the vectors (6.3) span the subspace W. Let's prove that they are linearly independent. For this purpose we consider a linear combination of these vectors being equal to zero:

0 = (Σ_{j=1}^{s_1} α_{1 j} · e_{1 j}) + ... + (Σ_{j=1}^{s_k} α_{k j} · e_{k j}).    (6.7)

Let's denote by ũ_1, ..., ũ_k the values of the sums enclosed into the round brackets in (6.7). It is easy to see that ũ_i ∈ U_i; therefore, (6.7) is an expansion of the form (6.4) for the zero vector w = 0. But 0 = 0 + ... + 0 with 0 ∈ U_i is another expansion of the form (6.4) for the same vector w = 0, and in the direct sum such an expansion is unique. Hence, ũ_i = 0 for all i = 1, ..., k. Then we have the equalities

0 = Σ_{j=1}^{s_i} α_{i j} · e_{i j} for all i = 1, ..., k.

It is clear that these equalities are the expansions of the zero vector in the bases of the subspaces U_i; therefore, α_{i j} = 0. This means that the linear combination (6.7) is trivial. Thus, the system of vectors (6.3) is linearly independent and, being a spanning system and being linearly independent, it is a basis of W. Now we can find the dimension of the subspace W by counting the number of vectors in (6.3):

dim W = s_1 + ... + s_k = dim U_1 + ... + dim U_k.

The theorem is proved.

Note that if the sum of subspaces W = U_1 + ... + U_k is not necessarily the direct sum, the vectors (6.3), nevertheless, form a spanning system in W. But they do not necessarily form a linearly independent system in this case. Therefore, we have

dim W ≤ dim U_1 + ... + dim U_k.    (6.8)

Sharpening this inequality in the general case is sufficiently complicated. We shall do it for the case of two subspaces.

Theorem 6.4. The dimension of the sum of two arbitrary finite-dimensional subspaces U_1 and U_2 in a linear vector space V is equal to the sum of their dimensions minus the dimension of their intersection:

dim(U_1 + U_2) = dim U_1 + dim U_2 − dim(U_1 ∩ U_2).    (6.9)

Proof. From the inclusion U_1 ∩ U_2 ⊂ U_1 and from the inequality (6.8) we conclude that all subspaces considered in the theorem are finite-dimensional. Let's denote dim(U_1 ∩ U_2) = s and choose a basis e_1, ..., e_s in the intersection U_1 ∩ U_2. Due to the inclusion U_1 ∩ U_2 ⊂ U_1 we can apply the theorem 4.8 on completing the basis. This theorem says that we can complete the basis e_1, ..., e_s of the intersection U_1 ∩ U_2 up to a basis e_1, ..., e_s, e_{s+1}, ..., e_{s+p} in U_1. For the dimension of U_1, therefore, we have dim U_1 = s + p. In a similar way, due to the inclusion U_1 ∩ U_2 ⊂ U_2 we can construct a basis e_1, ..., e_s, e_{s+p+1}, ..., e_{s+p+q} in U_2. For the dimension of U_2 this yields dim U_2 = s + q.

Now let's join together the two bases constructed above with the use of the theorem 4.8 and consider the total set of vectors in them:

e_1, ..., e_s, e_{s+1}, ..., e_{s+p}, e_{s+p+1}, ..., e_{s+p+q}.    (6.10)

Let's prove that these vectors (6.10) form a basis in the sum of subspaces U_1 + U_2. Let w be some arbitrary vector in U_1 + U_2. The relationship (6.2) for this vector is written as w = u_1 + u_2. Let's expand the vectors u_1 and u_2 in the above two bases of the subspaces U_1 and U_2 respectively:

u_1 = Σ_{i=1}^{s} α_i · e_i + Σ_{j=1}^{p} β_{s+j} · e_{s+j},
u_2 = Σ_{i=1}^{s} α̃_i · e_i + Σ_{j=1}^{q} γ_{s+p+j} · e_{s+p+j}.

Adding these two equalities, we find that the vector w is linearly expressed through the vectors (6.10). Hence, (6.10) is a spanning system of vectors in U_1 + U_2. In order to prove that (6.10) is a linearly independent system of vectors we consider a linear combination of these vectors being equal to zero:

Σ_{i=1}^{s+p} α_i · e_i + Σ_{i=1}^{q} α_{s+p+i} · e_{s+p+i} = 0.    (6.11)

Then we transform this equality by moving the second sum to the right hand side. Let's denote by u the value of the left and right sides of the transformed equality. Then for the vector u we get the following two expressions:

u = Σ_{i=1}^{s+p} α_i · e_i,    u = − Σ_{i=1}^{q} α_{s+p+i} · e_{s+p+i}.    (6.12)

Because of the first expression (6.12) we have u ∈ U_1, while the second expression (6.12) yields u ∈ U_2. Hence, u ∈ U_1 ∩ U_2. This means that we can expand the vector u in the basis e_1, ..., e_s of the intersection:

u = Σ_{i=1}^{s} β_i · e_i.    (6.13)

Comparing this expansion with the second expression (6.12), i.e. equating the two expressions for the vector u, we find that

Σ_{i=1}^{s} β_i · e_i + Σ_{i=1}^{q} α_{s+p+i} · e_{s+p+i} = 0.    (6.14)

Note that the vectors e_1, ..., e_s, e_{s+p+1}, ..., e_{s+p+q} form a basis in U_2. They are linearly independent. Therefore, all coefficients in (6.14) are equal to zero. In particular, we have the following equalities:

β_1 = ... = β_s = 0,    α_{s+p+1} = ... = α_{s+p+q} = 0.    (6.15)

Due to (6.13) this means that u = 0. Then from the first expression (6.12) we get the equality

Σ_{i=1}^{s+p} α_i · e_i = 0.

Since e_1, ..., e_s, e_{s+1}, ..., e_{s+p} are linearly independent vectors, all coefficients α_i in the above equality should be zero:

α_1 = ... = α_s = α_{s+1} = ... = α_{s+p} = 0.    (6.16)

Combining (6.15) and (6.16), we see that the linear combination (6.11) is trivial. This means that the vectors (6.10) are linearly independent. Thus, being a spanning system and being linearly independent, they form a basis in U_1 + U_2. For the dimension of the subspace U_1 + U_2 this yields

dim(U_1 + U_2) = s + p + q = (s + p) + (s + q) − s = dim U_1 + dim U_2 − dim(U_1 ∩ U_2),

which proves the equality (6.9). The theorem 6.4 in whole is proved.

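The theorem 6.4 can be illustrated numerically when the subspaces are given by spanning vectors in K^n: the dimension of a subspace spanned by a finite system of vectors is the rank of the matrix whose columns are these vectors, and U_1 + U_2 is spanned by the joined system. The following sketch is not part of the original text; it assumes NumPy and uses made-up subspaces of R^3 whose intersection is known by construction.

import numpy as np

# U1 = span{e1, e2}, U2 = span{e2, e3} in R^3, so U1 ∩ U2 = span{e2} by construction.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])                              # columns span U1
B = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])                              # columns span U2

dim_U1 = np.linalg.matrix_rank(A)                       # 2
dim_U2 = np.linalg.matrix_rank(B)                       # 2
dim_sum = np.linalg.matrix_rank(np.hstack([A, B]))      # dim(U1 + U2) = 3
dim_int = 1                                             # dim(U1 ∩ U2), known by construction

print(dim_sum == dim_U1 + dim_U2 - dim_int)             # True, as the formula (6.9) predicts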
§ 7. Cosets of a subspace. The concept of factorspace.

Let V be a linear vector space and let U be a subspace in it. A coset of the subspace U determined by a vector v ∈ V is the following set of vectors¹:

Cl_U(v) = {w ∈ V : w − v ∈ U}.    (7.1)

The vector v in (7.1) is called a representative of the coset (7.1). The coset Cl_U(v) is a very simple thing: it is obtained by adding the vector v with all vectors of the subspace U. The coset represented by the zero vector is especially simple, since Cl_U(0) = U. It is called the zero coset.

¹ We used the sign Cl for cosets since in Russia they are called adjacency classes.

Theorem 7.1. The cosets of a subspace U in a linear vector space V possess the following properties:
(1) a ∈ Cl_U(a) for any a ∈ V;
(2) if a ∈ Cl_U(b), then b ∈ Cl_U(a);
(3) if a ∈ Cl_U(b) and b ∈ Cl_U(c), then a ∈ Cl_U(c).

Proof. The first proposition is obvious. Indeed, the difference a − a is equal to the zero vector, which is an element of any subspace: a − a = 0 ∈ U. Hence, due to the formula (7.1), which is the formal definition of cosets, we have a ∈ Cl_U(a).

Let a ∈ Cl_U(b). Then a − b ∈ U. For b − a we have b − a = (−1) · (a − b). Hence, due to the definition 2.2, b − a ∈ U and b ∈ Cl_U(a) (see formula (7.1)). The second proposition is proved.

Let a ∈ Cl_U(b) and b ∈ Cl_U(c). Then a − b ∈ U and b − c ∈ U. Note that a − c = (a − b) + (b − c). Hence, due to the definition 2.2, a − c ∈ U and a ∈ Cl_U(c) (see formula (7.1) again). The third proposition is proved. The theorem is proved.

Let's write a ∼ b as an abbreviation for a ∈ Cl_U(b). This condition establishes some kind of dependence between two vectors a and b. This dependence is not strict: the condition a ∈ Cl_U(b) does not exclude the possibility that a′ ∈ Cl_U(b) for some other vector a′. Such non-strict dependences in mathematics are described by the concept of binary relation (see details in [1] and [4]). The theorem 7.1 reveals the following properties of the binary relation a ∼ b, which is introduced just above:
(1) reflexivity: a ∼ a;
(2) symmetry: a ∼ b implies b ∼ a;
(3) transitivity: a ∼ b and b ∼ c implies a ∼ c.

A binary relation possessing the properties of reflexivity, symmetry, and transitivity is called an equivalence relation. Each equivalence relation determined in a set V partitions this set into a union of mutually non-intersecting subsets, which are called the equivalence classes:

Cl(v) = {w ∈ V : w ∼ v}.    (7.2)

In our particular case the definition (7.2) coincides with the formal definition of cosets (7.1). In order to keep the completeness of presentation we shall not use the notation a ∼ b in place of a ∈ Cl_U(b) anymore, and we shall not refer to the theory of binary relations (though it is simple and well-known). Instead of this we shall derive the result on partitioning V into mutually non-intersecting cosets from the following theorem.

Theorem 7.2. If two cosets Cl_U(a) and Cl_U(b) of a subspace U ⊂ V are intersecting, then they do coincide.

Proof. Assume that the intersection of two cosets Cl_U(a) and Cl_U(b) is not empty. Then there is an element c belonging to both of them: c ∈ Cl_U(a) and c ∈ Cl_U(b). Applying the proposition (2) of the above theorem 7.1 to c ∈ Cl_U(b), we get b ∈ Cl_U(c). Combining b ∈ Cl_U(c) and c ∈ Cl_U(a) and applying the proposition (3) of the theorem 7.1, we get b ∈ Cl_U(a). The opposite inclusion a ∈ Cl_U(b) is then obtained by applying the proposition (2) of the theorem 7.1.

Let's prove that the two cosets Cl_U(a) and Cl_U(b) do coincide. For this purpose we consider an arbitrary vector x ∈ Cl_U(a). From x ∈ Cl_U(a) and a ∈ Cl_U(b), applying the proposition (3) of the theorem 7.1, we derive x ∈ Cl_U(b). Hence, Cl_U(a) ⊂ Cl_U(b). The opposite inclusion Cl_U(b) ⊂ Cl_U(a) is proved similarly. From these two inclusions we derive Cl_U(a) = Cl_U(b). This completes the proof of the theorem in whole.

The set of all cosets of a subspace U in a linear vector space V is called the factorset or quotient set V/U. Due to the theorem proved just above any two different cosets Q_1 and Q_2 from the factorset V/U have the empty intersection Q_1 ∩ Q_2 = ∅, while the union of all cosets coincides with V:

V = ∪_{Q ∈ V/U} Q.

This equality is a consequence of the fact that any vector v ∈ V is an element of some coset: v ∈ Q. This coset Q is determined by v according to the formula Q = Cl_U(v). For this reason the following theorem is a simple reformulation of the definition of cosets.

Theorem 7.3. Two vectors v and w belong to the same coset of a subspace U if and only if their difference v − w is a vector of U.

Definition 7.1. Let Q_1 and Q_2 be two cosets of a subspace U. The sum of the cosets Q_1 and Q_2 is the coset Q of the subspace U determined by the equality Q = Cl_U(v_1 + v_2), where v_1 ∈ Q_1 and v_2 ∈ Q_2.

Definition 7.2. Let Q be a coset of a subspace U and let v ∈ Q. The product of Q and a number α ∈ K is the coset P of the subspace U determined by the relationship P = Cl_U(α · v).

For the addition of cosets and for the multiplication of them by numbers we use the same signs of algebraic operations as in the case of vectors, i.e. Q = Q_1 + Q_2 and P = α · Q. The definitions 7.1 and 7.2 can be expressed by the formulas

Cl_U(v_1) + Cl_U(v_2) = Cl_U(v_1 + v_2),    α · Cl_U(v) = Cl_U(α · v).    (7.3)

These definitions require some comments. Indeed, the coset Q = Q_1 + Q_2 in the definition 7.1 and the coset P = α · Q in the definition 7.2 both are determined using some representative vectors v_1 ∈ Q_1, v_2 ∈ Q_2, and v ∈ Q. The choice of a representative vector in a coset is not unique. Therefore, we need especially to prove the uniqueness of the results of the algebraic operations determined in the definitions 7.1 and 7.2, i.e. to prove that these results do not depend on the choice of representatives in the cosets. Such a proof is called a proof of correctness.

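The representative-based operations (7.3) are straightforward to model on a computer. The sketch below is not part of the original text; it is a minimal Python illustration, assuming NumPy, in which V = R^3, U is spanned by a single made-up vector, membership w − v ∈ U is tested by a rank comparison, and cosets are handled through representatives exactly as in the definitions 7.1 and 7.2.

import numpy as np

U = np.array([[1.0, 1.0, 0.0]]).T      # basis of the subspace U, stored as a column

def in_U(x):
    # x ∈ U  <=>  appending x to the columns of U does not raise the rank
    return np.linalg.matrix_rank(np.hstack([U, x.reshape(-1, 1)])) == np.linalg.matrix_rank(U)

def same_coset(v, w):
    # Theorem 7.3: v and w represent the same coset iff v - w ∈ U
    return in_U(v - w)

v1 = np.array([1.0, 0.0, 2.0])          # a representative of a coset Q1
v2 = np.array([0.0, 3.0, 1.0])          # a representative of a coset Q2

sum_rep = v1 + v2                       # a representative of Q1 + Q2, definition 7.1
scaled_rep = 2.0 * v1                   # a representative of 2 · Q1, definition 7.2

# Correctness: replacing v1 by another representative v1 + u with u ∈ U
# does not change the resulting cosets.
u = U[:, 0]
print(same_coset(sum_rep, (v1 + u) + v2))      # True
print(same_coset(scaled_rep, 2.0 * (v1 + u)))  # True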
Theorem 7.4. The definitions 7.1 and 7.2 are correct, and the results of the algebraic operations of coset addition and of coset multiplication by numbers do not depend on the choice of representatives in the cosets.

Proof. For the beginning we study the operation of coset addition. Let's consider two different choices of representatives within the cosets Q_1 and Q_2. Let v_1, ṽ_1 be two vectors of Q_1 and let v_2, ṽ_2 be two vectors of Q_2. Then due to the theorem 7.3 we have the following two conditions:

ṽ_1 − v_1 ∈ U,    ṽ_2 − v_2 ∈ U.

Hence, (ṽ_1 + ṽ_2) − (v_1 + v_2) = (ṽ_1 − v_1) + (ṽ_2 − v_2) ∈ U. This means that the cosets determined by the vectors ṽ_1 + ṽ_2 and v_1 + v_2 do coincide with each other:

Cl_U(ṽ_1 + ṽ_2) = Cl_U(v_1 + v_2).

This proves the correctness of the definition 7.1 for the operation of coset addition.

Now let's consider two different representatives v and ṽ within the coset Q. Then ṽ − v ∈ U. Hence, α · ṽ − α · v = α · (ṽ − v) ∈ U. This yields

Cl_U(α · ṽ) = Cl_U(α · v),

which proves the correctness of the definition 7.2 for the operation of multiplication of cosets by numbers. The theorem is proved.

Theorem 7.5. The factorset V/U of a linear vector space V over a subspace U equipped with the algebraic operations (7.3) is a linear vector space.

This space is called the factorspace or the quotient space of the space V over its subspace U.

Proof. The proof of this theorem consists in verifying the axioms (1)-(8) of a linear vector space for V/U. The commutativity and associativity axioms for the operation of coset addition follow from the calculations

Cl_U(v_1) + Cl_U(v_2) = Cl_U(v_1 + v_2) = Cl_U(v_2 + v_1) = Cl_U(v_2) + Cl_U(v_1),

(Cl_U(v_1) + Cl_U(v_2)) + Cl_U(v_3) = Cl_U(v_1 + v_2) + Cl_U(v_3) = Cl_U((v_1 + v_2) + v_3) =
= Cl_U(v_1 + (v_2 + v_3)) = Cl_U(v_1) + Cl_U(v_2 + v_3) = Cl_U(v_1) + (Cl_U(v_2) + Cl_U(v_3)).

In essence, they follow from the corresponding axioms for the operation of vector addition (see definition 2.1). In order to verify the axiom (3) we should have a zero element in V/U. The zero coset 0 = Cl_U(0) is the best pretender for this role:

Cl_U(v) + Cl_U(0) = Cl_U(v + 0) = Cl_U(v).

Hence, the axiom (3) in V/U is fulfilled. In verifying the axiom (4) we should indicate the opposite coset Q′ for a coset Q = Cl_U(v). We define it as follows: Q′ = Cl_U(v′), where v′ is the opposite vector for v. Then

Q + Q′ = Cl_U(v) + Cl_U(v′) = Cl_U(v + v′) = Cl_U(0) = 0.

Hence, the axiom (4) in V/U is also fulfilled. The rest of the axioms (5)-(8) are verified by direct calculations on the base of the formulas (7.3) for the coset operations. Here are these calculations:

α · (Cl_U(v_1) + Cl_U(v_2)) = α · Cl_U(v_1 + v_2) = Cl_U(α · (v_1 + v_2)) = Cl_U(α · v_1 + α · v_2) =
= Cl_U(α · v_1) + Cl_U(α · v_2) = α · Cl_U(v_1) + α · Cl_U(v_2),

(α + β) · Cl_U(v) = Cl_U((α + β) · v) = Cl_U(α · v + β · v) =
= Cl_U(α · v) + Cl_U(β · v) = α · Cl_U(v) + β · Cl_U(v),

α · (β · Cl_U(v)) = α · Cl_U(β · v) = Cl_U(α · (β · v)) = Cl_U((αβ) · v) = (αβ) · Cl_U(v),

1 · Cl_U(v) = Cl_U(1 · v) = Cl_U(v).

The above equalities complete the verification of the fact that the factorset V/U possesses the structure of a linear vector space. Note that in verifying the axiom (4) we have defined the opposite coset Q′ for a coset Q = Cl_U(v) by means of the relationship Q′ = Cl_U(v′), where v′ is the opposite vector for v. One could check the correctness of this definition. However, this is not necessary since, due to the property (10), the opposite coset Q′ for Q is unique.

The concept of factorspace is equally applicable to finite-dimensional and to infinite-dimensional spaces V. The finite or infinite dimensionality of a subspace U also makes no difference. The only simplification in the finite-dimensional case is that we can calculate the dimension of the factorspace V/U.

Theorem 7.6. If a linear vector space V is finite-dimensional, then for any subspace U of it the factorspace V/U also is finite-dimensional and its dimension is determined by the following formula:

dim U + dim(V/U) = dim V.    (7.4)

Proof. If U = V, then the factorspace V/U consists of the zero coset only: V/U = {0}. The dimension of such a zero space is equal to zero. Hence, the equality (7.4) in this trivial case is fulfilled.

Let's consider the nontrivial case U ≠ V. Denote dim V = n and dim U = s. The subspace U of the finite-dimensional space V is finite-dimensional, and since U ≠ V, we have s < n. Let's choose a basis e_1, ..., e_s in U and, according to the theorem 4.8 on completing the basis, complete it with vectors e_{s+1}, ..., e_n up to a basis in V. For each of the complementary vectors e_{s+1}, ..., e_n we consider the corresponding coset of the subspace U:

E_1 = Cl_U(e_{s+1}), ..., E_{n−s} = Cl_U(e_n).    (7.5)

Now let's show that the cosets (7.5) span the factorspace V/U. Indeed, let Q be an arbitrary coset in V/U and let v ∈ Q be some representative vector of this coset. Let's expand the vector v in the above basis of V:

v = (α_1 · e_1 + ... + α_s · e_s) + β_1 · e_{s+1} + ... + β_{n−s} · e_n.

Let's denote by u the initial part of this expansion: u = α_1 · e_1 + ... + α_s · e_s. It is clear that u ∈ U. Then we can write

v = u + β_1 · e_{s+1} + ... + β_{n−s} · e_n.

Since u ∈ U, we have Cl_U(u) = 0. For the coset Q = Cl_U(v) this yields

Q = β_1 · Cl_U(e_{s+1}) + ... + β_{n−s} · Cl_U(e_n) = β_1 · E_1 + ... + β_{n−s} · E_{n−s}.

This means that E_1, ..., E_{n−s} is a finite spanning system in V/U. Hence, V/U is a finite-dimensional linear vector space.

To determine its dimension we shall prove that the cosets (7.5) are linearly independent. For this purpose let's consider a linear combination of these cosets which is equal to zero:

γ_1 · E_1 + ... + γ_{n−s} · E_{n−s} = 0.    (7.6)

From (7.6) we derive

γ_1 · Cl_U(e_{s+1}) + ... + γ_{n−s} · Cl_U(e_n) = Cl_U(γ_1 · e_{s+1} + ... + γ_{n−s} · e_n) = Cl_U(0).

Let's denote u = γ_1 · e_{s+1} + ... + γ_{n−s} · e_n. From the above equality for this vector we get Cl_U(u) = Cl_U(0), which means u ∈ U. Let's expand u in the basis of the subspace U: u = α_1 · e_1 + ... + α_s · e_s. Then, equating the two expressions for the vector u, we get the following equality:

−α_1 · e_1 − ... − α_s · e_s + γ_1 · e_{s+1} + ... + γ_{n−s} · e_n = 0.

This is a linear combination of the basis vectors of V which is equal to zero. The basis vectors e_1, ..., e_n are linearly independent. Hence, this linear combination is trivial and γ_1 = ... = γ_{n−s} = 0. This proves the triviality of the linear combination (7.6) and, therefore, the linear independence of the cosets (7.5). Thus, the cosets (7.5) form a basis in V/U, and for the dimension of the factorspace this yields dim(V/U) = n − s, which proves the equality (7.4). The theorem is proved.

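A quick numerical illustration of the formula (7.4): if V = K^n and U is spanned by some vectors, then dim U is the rank of the matrix of those spanning vectors and dim(V/U) = n − dim U. The sketch below is not part of the original text; it assumes NumPy and made-up spanning vectors.

import numpy as np

n = 4                                     # V = R^4
U_span = np.array([[1.0, 0.0],
                   [2.0, 1.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])           # columns span the subspace U (example data)

dim_U = np.linalg.matrix_rank(U_span)     # 2
dim_factor = n - dim_U                    # dim(V/U), as given by the formula (7.4)

print(dim_U + dim_factor == n)            # True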
§ 8. Linear mappings.

Let V and W be two linear vector spaces over a numeric field K.

Definition 8.1. A mapping f : V → W from the space V to the space W is called a linear mapping if the following two conditions are fulfilled:
(1) f(v_1 + v_2) = f(v_1) + f(v_2) for any two vectors v_1, v_2 ∈ V;
(2) f(α · v) = α · f(v) for any vector v ∈ V and for any number α ∈ K.

The relationship f(0) = 0 is one of the simplest and most immediate consequences of the above two properties (1) and (2) of linear mappings. Indeed, we have

f(0) = f(0 + (−1) · 0) = f(0) + (−1) · f(0) = 0.    (8.1)

Theorem 8.1. Linear mappings possess the following three properties:
(1) the identical mapping id_V : V → V of a linear vector space V onto itself is a linear mapping;
(2) the composition of any two linear mappings f : V → W and g : W → U is a linear mapping g ∘ f : V → U;
(3) if a linear mapping f : V → W is bijective, then the inverse mapping f^{−1} : W → V also is a linear mapping.

Proof. The linearity of the identical mapping is obvious. Indeed, here is the verification of the conditions (1) and (2) from the definition 8.1 for id_V:

id_V(v_1 + v_2) = v_1 + v_2 = id_V(v_1) + id_V(v_2),
id_V(α · v) = α · v = α · id_V(v).


Let's prove the second proposition of the theorem 8.1. Consider the composition g ∘ f of two linear mappings f and g. For this composition the conditions (1) and (2) from the definition 8.1 are verified as follows:

g ∘ f(v_1 + v_2) = g(f(v_1 + v_2)) = g(f(v_1) + f(v_2)) =
= g(f(v_1)) + g(f(v_2)) = g ∘ f(v_1) + g ∘ f(v_2),

g ∘ f(α · v) = g(f(α · v)) = g(α · f(v)) = α · g(f(v)) = α · g ∘ f(v).

Now let's prove the third proposition of the theorem 8.1. Suppose that f : V → W is a bijective linear mapping. Then it possesses a unique bilateral inverse mapping f^{−1} : W → V (see theorem 1.9). Let's denote

z_1 = f^{−1}(w_1 + w_2) − f^{−1}(w_1) − f^{−1}(w_2),
z_2 = f^{−1}(α · w) − α · f^{−1}(w).

It is obvious that the linearity of the inverse mapping f^{−1} is equivalent to the vanishing of z_1 and z_2. Let's apply f to these vectors:

f(z_1) = f(f^{−1}(w_1 + w_2) − f^{−1}(w_1) − f^{−1}(w_2)) =
= f(f^{−1}(w_1 + w_2)) − f(f^{−1}(w_1)) − f(f^{−1}(w_2)) =
= (w_1 + w_2) − w_1 − w_2 = 0,

f(z_2) = f(f^{−1}(α · w) − α · f^{−1}(w)) =
= f(f^{−1}(α · w)) − α · f(f^{−1}(w)) = α · w − α · w = 0.

A bijective mapping is injective. Therefore, from the equalities f(z_1) = 0 and f(z_2) = 0 just derived and from the equality f(0) = 0 derived in (8.1) it follows that z_1 = z_2 = 0. The theorem is proved.

Each linear mapping f : V → W is related with two subsets: the kernel Ker f ⊂ V and the image Im f ⊂ W. The image Im f = f(V) of a linear mapping is defined in the same way as it was done for a general mapping in § 1:

Im f = {w ∈ W : ∃ v ((v ∈ V) & (f(v) = w))}.

The kernel of a linear mapping f : V → W is the set of vectors in the space V that map to zero under the action of f:

Ker f = {v ∈ V : f(v) = 0}.

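When V = K^n and W = K^m and f is given by its matrix F (see § 9 below), Im f is spanned by the columns of F and Ker f is the solution space of F x = 0, so their dimensions can be obtained from the rank of F. The following sketch is not part of the original text; it assumes NumPy and a made-up matrix F.

import numpy as np

# A made-up 3x4 matrix of a linear mapping f : R^4 -> R^3.
F = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 1.0],
              [1.0, 3.0, 1.0, 2.0]])     # third row = first + second, so the rank is 2

rank = np.linalg.matrix_rank(F)

dim_im = rank                 # dim(Im f) is the rank of F
dim_ker = F.shape[1] - rank   # dim(Ker f) = n - rank, cf. the theorem 9.4 below

print(dim_im, dim_ker)        # 2 2
print(dim_ker == 0)           # False, so this f is not injective (see theorem 8.3 below)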


Theorem 8.2. The kernel and the image of a linear mapping f : V → W both are subspaces in V and W respectively.

Proof. In order to prove this theorem we should check the conditions (1) and (2) from the definition 2.2 as applied to the subsets Ker f ⊂ V and Im f ⊂ W.

Suppose that v_1, v_2 ∈ Ker f. Then f(v_1) = 0 and f(v_2) = 0. Suppose also that v ∈ Ker f. Then f(v) = 0. As a result we derive

f(v_1 + v_2) = f(v_1) + f(v_2) = 0 + 0 = 0,    f(α · v) = α · f(v) = α · 0 = 0.

Hence, v_1 + v_2 ∈ Ker f and α · v ∈ Ker f. This proves the proposition of the theorem concerning the kernel Ker f.

Let w_1, w_2, w ∈ Im f. Then there are three vectors v_1, v_2, v in V such that f(v_1) = w_1, f(v_2) = w_2, and f(v) = w. Hence, we have

w_1 + w_2 = f(v_1) + f(v_2) = f(v_1 + v_2),    α · w = α · f(v) = f(α · v).

This means that w_1 + w_2 ∈ Im f and α · w ∈ Im f. The theorem is proved.

Remember that, according to the theorem 1.2, a linear mapping f : V → W is surjective if and only if Im f = W. There is a similar proposition for Ker f.

Theorem 8.3. A linear mapping f : V → W is injective if and only if its kernel is zero, i.e. Ker f = {0}.

Proof. Let f be injective and let v ∈ Ker f. Then f(0) = 0 and f(v) = 0. But if v ≠ 0, then due to the injectivity of f it would be f(v) ≠ f(0). Hence, v = 0. This means that the kernel of f consists of only one element: Ker f = {0}.

Now, conversely, suppose that Ker f = {0}. Let's consider two different vectors v_1 ≠ v_2 in V. Then v_1 − v_2 ≠ 0 and v_1 − v_2 ∉ Ker f. Therefore, f(v_1 − v_2) ≠ 0. Applying the linearity of f, from this inequality we derive f(v_1) − f(v_2) ≠ 0, i.e. f(v_1) ≠ f(v_2). Hence, f is an injective mapping. The theorem is proved.

The following theorem is known as the theorem on the linear independence of preimages. Here is its statement.

Theorem 8.4. Let f : V → W be a linear mapping and let v_1, ..., v_s be some vectors of a linear vector space V such that their images f(v_1), ..., f(v_s) in W are linearly independent. Then the vectors v_1, ..., v_s themselves are also linearly independent.

Proof. In order to prove the theorem let's consider a linear combination of the vectors v_1, ..., v_s being equal to zero:

α_1 · v_1 + ... + α_s · v_s = 0.

Applying f to both sides of this equality and using the fact that f is a linear mapping, we obtain a quite similar equality for the images:

α_1 · f(v_1) + ... + α_s · f(v_s) = 0.

However, the images f(v_1), ..., f(v_s) are linearly independent. Hence, all coefficients in the above linear combination are equal to zero: α_1 = ... = α_s = 0. Then the initial linear combination is also necessarily trivial. This proves that the vectors v_1, ..., v_s are linearly independent. The theorem is proved.



A linear vector space is a set. But it is not simply a set; it is a structured set equipped with algebraic operations satisfying the axioms (1)-(8). Linear mappings are those mappings which are concordant with the structures of a linear vector space in the spaces they are acting from and to. In algebra such mappings concordant with algebraic structures are called morphisms. So, in algebraic terminology, linear mappings are morphisms of linear vector spaces.

Definition 8.2. Two linear vector spaces V and W are called isomorphic if there is a bijective linear mapping f : V → W binding them.

The first example of an isomorphism of linear vector spaces is the mapping ψ : V → K^n in (5.4). Because of the existence of such a mapping we can formulate the following theorem.

Theorem 8.5. Any n-dimensional linear vector space V is isomorphic to the arithmetic linear vector space K^n.

Isomorphic linear vector spaces have many common features. Often they can be treated as undistinguishable. In particular, we have the following fact.

Theorem 8.6. If a linear vector space V is isomorphic to a finite-dimensional vector space W, then V is also finite-dimensional and the dimensions of these two spaces do coincide: dim V = dim W.

Proof. Let f : V → W be an isomorphism of the spaces V and W. Assume for the sake of certainty that dim W = n and choose a basis h_1, ..., h_n in W. By means of the inverse mapping f^{−1} : W → V we define the vectors e_i = f^{−1}(h_i), i = 1, ..., n. Let v be an arbitrary vector of V. Let's map it with the use of f into the space W and then expand in the basis:

f(v) = α_1 · h_1 + ... + α_n · h_n.

Applying the inverse mapping f^{−1} to both sides of this equality, due to the linearity of f^{−1} we get the expansion

v = α_1 · e_1 + ... + α_n · e_n.

From this expansion we derive that {e_1, ..., e_n} is a finite spanning system in V. The finite dimensionality of V is proved. The linear independence of e_1, ..., e_n follows from the theorem 8.4 on the linear independence of preimages. Hence, e_1, ..., e_n is a basis in V and dim V = n = dim W. The theorem is proved.

§ 9. The matrix of a linear mapping.

Let f : V → W be a linear mapping from an n-dimensional vector space V to an m-dimensional vector space W. Let's choose a basis e_1, ..., e_n in V and a basis h_1, ..., h_m in W. Then consider the images of the basis vectors e_1, ..., e_n in W and expand them in the basis h_1, ..., h_m:

f(e_1) = F^1_1 · h_1 + ... + F^m_1 · h_m,
. . . . . . . . . . . . . . . . . . . . . . . .
f(e_n) = F^1_n · h_1 + ... + F^m_n · h_m.    (9.1)

Totally in (9.1) we have n expansions that define nm numbers F^i_j. These numbers are arranged into a rectangular m × n matrix which is called the matrix of the linear mapping f in the pair of bases e_1, ..., e_n and h_1, ..., h_m:

    | F^1_1 ... F^1_n |
F = | ............... |    (9.2)
    | F^m_1 ... F^m_n |

When placing the element F^i_j into the matrix (9.2), the upper index determines the row number, while the lower index determines the column number. In other words, the matrix F is composed by the column vectors formed by the coordinates of the vectors f(e_1), ..., f(e_n) in the basis h_1, ..., h_m. The expansions (9.1), which determine the components of this matrix, are convenient to write as follows:

f(e_j) = Σ_{i=1}^{m} F^i_j · h_i.    (9.3)

Let x be an arbitrary vector of V and let y = f(x) be its image under the mapping f. If we expand the vector x in the basis, x = x^1 · e_1 + ... + x^n · e_n, then, taking into account (9.3), for the vector y we get

y = f(x) = Σ_{j=1}^{n} x^j · f(e_j) = Σ_{j=1}^{n} x^j · (Σ_{i=1}^{m} F^i_j · h_i).

Changing the order of summations in the above expression, we get the expansion of the vector y in the basis h_1, ..., h_m:

y = f(x) = Σ_{i=1}^{m} (Σ_{j=1}^{n} F^i_j x^j) · h_i.

Due to the uniqueness of such an expansion, for the coordinates of the vector y in the basis h_1, ..., h_m we get the following formula:

y^i = Σ_{j=1}^{n} F^i_j x^j.    (9.4)

This formula (9.4) is the basic application of the matrix of a linear mapping. It is used for calculating the coordinates of the vector f(x) through the coordinates of x. In matrix form this formula is written as

| y^1 |   | F^1_1 ... F^1_n |   | x^1 |
| ... | = | ............... | · | ... |    (9.5)
| y^m |   | F^m_1 ... F^m_n |   | x^n |

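In coordinates the content of (9.1)-(9.5) is exactly matrix-vector multiplication. The sketch below is not part of the original text; it assumes NumPy and made-up images f(e_j): the matrix F is assembled column by column from the coordinates of f(e_1), ..., f(e_n), and then y = F x reproduces the formula (9.4).

import numpy as np

# Coordinates of f(e_1), f(e_2), f(e_3) in the basis h_1, h_2 (example data).
f_e1 = np.array([1.0, 0.0])
f_e2 = np.array([2.0, 1.0])
f_e3 = np.array([0.0, 3.0])

# The matrix of the mapping: its columns are the coordinate columns of f(e_j), see (9.2).
F = np.column_stack([f_e1, f_e2, f_e3])          # a 2 x 3 matrix

x = np.array([1.0, 1.0, 2.0])                    # coordinates of a vector x in e_1, e_2, e_3

y = F @ x                                        # formula (9.4): y^i = sum_j F^i_j x^j
print(y)                                         # [3. 7.]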
Remember that when composing a column vector of the coordinates of a vector x we agreed to understand this procedure as a linear mapping ψ : V → K^n (see formulas (5.4)). Once the basis e_1, ..., e_n in V is chosen, this defines the mapping ψ : V → K^n. Denote by ψ̃ : W → K^m the analogous mapping for a vector y in W. Then the matrix relationship (9.5) can be treated as a mapping F : K^n → K^m. These three mappings ψ, ψ̃, F and the initial mapping f can be written in a diagram:

         f
   V ---------> W
   |ψ           |ψ̃
  K^n --------> K^m        (9.6)
         F

Such diagrams are called commutative diagrams if the compositions of mappings «when passing along arrows» from any node to any other node do not depend on a particular path connecting these two nodes. When applied to the diagram (9.6), the commutativity means ψ̃ ∘ f = F ∘ ψ. Due to the bijectivity of the linear mappings ψ and ψ̃ the condition of commutativity of the diagram (9.6) can be written as

F = ψ̃ ∘ f ∘ ψ^{−1},    f = ψ̃^{−1} ∘ F ∘ ψ.    (9.7)

The reader can easily check that the relationships (9.7) are fulfilled due to the way the matrix F is constructed. Thus, we have proved the following theorem.

Theorem 9.1. Any rectangular m × n matrix F can be constructed as a matrix of a linear mapping f : V → W from an n-dimensional vector space V to an m-dimensional vector space W in some pair of bases in these spaces.

Proof. Suppose that we have an arbitrary m × n matrix F. Then the relationship (9.5) determines a linear mapping F : K^n → K^m. Let V and W be two spaces of the dimensions n and m respectively. Choosing bases e_1, ..., e_n and h_1, ..., h_m in V and W, we can use the second relationship (9.7) in order to define the linear mapping f : V → W. The matrix of this mapping in the bases e_1, ..., e_n and h_1, ..., h_m coincides with F exactly.

A more straightforward way of proving the theorem 9.1 than the one we considered above can be based on the following theorem.

Theorem 9.2. For any basis e_1, ..., e_n in an n-dimensional vector space V and for any set of n vectors w_1, ..., w_n in another vector space W there is a linear mapping f : V → W such that f(e_i) = w_i for i = 1, ..., n.

Proof. In order to construct the required mapping f we define a mapping ϕ : K^n → W by the following relationship:

ϕ : (x^1, ..., x^n) ↦ x^1 · w_1 + ... + x^n · w_n.

Now it is easy to verify that the required mapping is the composition f = ϕ ∘ ψ, where ψ : V → K^n is the coordinate mapping determined by the basis e_1, ..., e_n.

Let's return to the initial situation. Suppose that we have a mapping f : V → W that determines a matrix F upon choosing two bases e_1, ..., e_n and h_1, ..., h_m in V and W respectively. The matrix F essentially depends on the choice of bases. In order to describe this dependence we consider four bases: two bases in V and two other bases in W. Suppose that S and P are the direct transition matrices for these pairs of bases. Their components are defined as follows:

ẽ_k = Σ_{j=1}^{n} S^j_k · e_j,    h̃_r = Σ_{i=1}^{m} P^i_r · h_i.

The inverse transition matrices T = S^{−1} and Q = P^{−1} are defined similarly:

e_j = Σ_{k=1}^{n} T^k_j · ẽ_k,    h_i = Σ_{r=1}^{m} Q^r_i · h̃_r.

We use these relationships and the above relationships (9.3) in order to carry out the following calculations for the vector f(ẽ_k):

f(ẽ_k) = Σ_{j=1}^{n} S^j_k · f(e_j) = Σ_{j=1}^{n} S^j_k · (Σ_{i=1}^{m} F^i_j · h_i) =
= Σ_{j=1}^{n} S^j_k · (Σ_{i=1}^{m} F^i_j · (Σ_{r=1}^{m} Q^r_i · h̃_r)).

Upon changing the order of summations this result is written as

f(ẽ_k) = Σ_{r=1}^{m} (Σ_{i=1}^{m} Σ_{j=1}^{n} Q^r_i F^i_j S^j_k) · h̃_r.

The double sums in the round brackets are the coefficients of the expansion of the vector f(ẽ_k) in the basis h̃_1, ..., h̃_m. They determine the matrix of the linear mapping f in the wavy bases ẽ_1, ..., ẽ_n and h̃_1, ..., h̃_m:

F̃^r_k = Σ_{i=1}^{m} Σ_{j=1}^{n} Q^r_i F^i_j S^j_k.    (9.8)

In a similar way one can derive the converse relationship expressing F through F̃:

F^i_j = Σ_{r=1}^{m} Σ_{k=1}^{n} P^i_r F̃^r_k T^k_j.    (9.9)

The relationships (9.8) and (9.9) are called the transformation formulas for the matrix of a linear mapping under a change of bases. They can be written in matrix form as

F̃ = P^{−1} F S,    F = P F̃ S^{−1}.    (9.10)

This is the matrix form of the relationships (9.8) and (9.9). The transformation formulas like (9.8) and (9.9) lead us to the broad class of problems of «bringing to a canonic form».

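Here is a small numerical check of the first formula (9.10); it is not part of the original text and assumes NumPy with made-up matrices. F is the matrix of a mapping in the old bases, S and P are the direct transition matrices in V and W, and the matrix in the new (wavy) bases is obtained as P^{−1} F S.

import numpy as np

F = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])          # matrix of f in the old bases (2 x 3, example data)

S = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])          # non-degenerate transition matrix in V
P = np.array([[2.0, 1.0],
              [1.0, 1.0]])               # non-degenerate transition matrix in W

F_new = np.linalg.inv(P) @ F @ S         # first formula (9.10): F̃ = P^{-1} F S

# Consistency check: transforming back must recover F, the second formula (9.10).
print(np.allclose(P @ F_new @ np.linalg.inv(S), F))   # True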
In our particular case a change of bases in the spaces V and W changes the matrix of the linear mapping f : V → W. The problem of bringing to a canonic form in this case consists in finding the optimal choice of bases, where the matrix F has the most simple (canonic) form.

The purely zero mapping 0 : V → W maps each vector of the space V to the zero vector in W. The matrix of such a mapping consists of zeros only. There is no need to formulate the problem of bringing it to a canonic form. Let f : V → W be a nonzero linear mapping. The integer number s = dim(Im f) is called the rank of the mapping f. The rank of a nonzero mapping is not equal to zero. The following theorem solving this particular problem is known as the theorem on bringing to the almost diagonal form.

Theorem 9.3. Let f : V → W be some nonzero linear mapping from an n-dimensional vector space V to an m-dimensional vector space W. Then there is a choice of bases in V and W such that the matrix F of this mapping has the following almost diagonal form, with the unit s × s block in the upper left corner and zeros everywhere else:

    | 1 ... 0  0 ... 0 |
    | ........ ....... |
    | 0 ... 1  0 ... 0 |
F = | 0 ... 0  0 ... 0 |    (9.11)
    | ........ ....... |
    | 0 ... 0  0 ... 0 |

Proof. We begin constructing a canonic basis in W by choosing a basis h_1, ..., h_s in the image space Im f. For each basis vector h_i ∈ Im f there is a vector e_i ∈ V such that f(e_i) = h_i, i = 1, ..., s. These vectors e_1, ..., e_s are linearly independent due to the theorem 8.4. Let r = dim(Ker f). We choose a basis in Ker f and denote its basis vectors by e_{s+1}, ..., e_{s+r}. Then we consider the vectors

e_1, ..., e_s, e_{s+1}, ..., e_{s+r}    (9.12)

and prove that they form a basis in V. For this purpose we use the theorem 4.6. Let's begin with checking the condition (1) of the theorem 4.6 for the vectors (9.12), i.e. let's prove that they are linearly independent. We consider a linear combination of them being equal to zero:

α_1 · e_1 + ... + α_s · e_s + α_{s+1} · e_{s+1} + ... + α_{s+r} · e_{s+r} = 0.    (9.13)

Let's apply the mapping f to both sides of the equality (9.13) and take into account that f(e_i) = h_i for i = 1, ..., s, while f(e_{s+i}) = 0 for i = 1, ..., r, since these vectors belong to the kernel of the mapping f. Then from (9.13) we derive

α_1 · h_1 + ... + α_s · h_s = 0.

The vectors h_1, ..., h_s form a basis in Im f. They are linearly independent. Hence, α_1 = ... = α_s = 0.

Taking into account this fact, we reduce (9.13) to the equality

α_{s+1} · e_{s+1} + ... + α_{s+r} · e_{s+r} = 0.

The vectors e_{s+1}, ..., e_{s+r} form a basis in Ker f. They are linearly independent; therefore, α_{s+1} = ... = α_{s+r} = 0. As a result we have proved that all coefficients of the linear combination (9.13) are necessarily zero. Hence, the vectors (9.12) are linearly independent.

Now let's check the second condition of the theorem 4.6 for the vectors (9.12). Assume that v is an arbitrary vector in V. Then f(v) belongs to Im f. Let's expand f(v) in the basis h_1, ..., h_s of Im f:

f(v) = β_1 · h_1 + ... + β_s · h_s.    (9.14)

Remember that f(e_i) = h_i for i = 1, ..., s. Therefore, from (9.14) we derive

0 = f(v) − β_1 · f(e_1) − ... − β_s · f(e_s) = f(v − β_1 · e_1 − ... − β_s · e_s).

Let's denote ṽ = v − β_1 · e_1 − ... − β_s · e_s. From the above relationship we derive f(ṽ) = 0, i.e. ṽ ∈ Ker f. Let's expand ṽ in the basis of Ker f:

ṽ = β_{s+1} · e_{s+1} + ... + β_{s+r} · e_{s+r}.    (9.15)

From the formula ṽ = v − β_1 · e_1 − ... − β_s · e_s and the above expansion (9.15) we get

v = β_1 · e_1 + ... + β_s · e_s + β_{s+1} · e_{s+1} + ... + β_{s+r} · e_{s+r}.

This means that the vectors (9.12) form a spanning system in V. The condition (2) of the theorem 4.6 for them is also fulfilled. Thus, being linearly independent and forming a spanning system, the vectors (9.12) form a basis in V. This yields the equality

dim V = s + r.    (9.16)

In order to complete the proof of the theorem we need to complete the basis h_1, ..., h_s of Im f up to a basis h_1, ..., h_s, h_{s+1}, ..., h_m in the space W. Remember again that f(e_i) = h_i for i = 1, ..., s. Therefore, for the vector f(e_j) with j = 1, ..., s we have the expansion

f(e_j) = h_j = Σ_{i=1}^{s} δ^i_j · h_i + Σ_{i=s+1}^{m} 0 · h_i.

If j = s + 1, ..., s + r, then f(e_j) = 0 and the expansion for f(e_j) is purely zero:

f(e_j) = 0 = Σ_{i=1}^{s} 0 · h_i + Σ_{i=s+1}^{m} 0 · h_i.

Due to these expansions the matrix of the mapping f in the bases that we have constructed above has the required almost diagonal form (9.11). The theorem is proved.

In proving this theorem we have proved simultaneously the next one.

Theorem 9.4. Let f : V → W be a linear mapping from an n-dimensional vector space V to an arbitrary linear vector space W. Then

dim(Ker f) + dim(Im f) = dim V.    (9.17)

Proof. The proposition of the theorem in the form of the relationship (9.17) immediately follows from (9.16). The theorem is proved.

This theorem 9.4 is known as the theorem on the sum of dimensions of the kernel and the image of a linear mapping.

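The almost diagonal form of the theorem 9.3 can be exhibited numerically by an explicit choice of bases. The sketch below is not part of the original text; it assumes NumPy and uses the singular value decomposition merely as a convenient way to produce suitable transition matrices S and P, after which the transformation formula (9.10) yields exactly the matrix (9.11).

import numpy as np

F = np.array([[1.0, 2.0, 3.0, 0.0],
              [2.0, 4.0, 6.0, 0.0],
              [0.0, 0.0, 0.0, 5.0]])        # made-up 3x4 matrix of a mapping f, rank s = 2

m, n = F.shape
U, sigma, Vt = np.linalg.svd(F)             # F = U Σ V^T
s = np.linalg.matrix_rank(F)

# Transition matrices: S in V and P in W, chosen so that P^{-1} F S is almost diagonal.
S = Vt.T
D = np.diag(np.concatenate([sigma[:s], np.ones(m - s)]))
P = U @ D

F_canonic = np.linalg.inv(P) @ F @ S        # formula (9.10) with these particular bases

expected = np.zeros((m, n))
expected[:s, :s] = np.eye(s)                # the almost diagonal form (9.11)
print(np.allclose(F_canonic, expected))     # True; note also s + (n - s) = n, cf. (9.17)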
§ 10. Algebraic operations with mappings.
The space of homomorphisms Hom(V, W).

Let V and W be two linear vector spaces over a numeric field K. Let's denote by Map(V, W) the set of all mappings from the space V to the space W. Sometimes this set is denoted by W^V.

Definition 10.1. Let V and W be two linear vector spaces and let f : V → W and g : V → W be two linear mappings from V to W. The linear mapping h : V → W defined by the relationship h(v) = f(v) + g(v), where v is an arbitrary vector of V, is called the sum of the mappings f and g.

Definition 10.2. Let V and W be two linear vector spaces over a numeric field K and let f : V → W be a linear mapping from V to W. The linear mapping h : V → W defined by the relationship h(v) = α · f(v), where v is an arbitrary vector of V, is called the product of the number α ∈ K and the mapping f.

The algebraic operations introduced by the definitions 10.1 and 10.2 are called pointwise addition and pointwise multiplication by a number: they are calculated «pointwise» by adding the values of the initial mappings and by multiplying them by a number for each specific argument v ∈ V. These operations are denoted by the same signs as the corresponding operations with vectors: h = f + g and h = α · f. The writing (f + g)(v) is understood as the sum of mappings applied to the vector v. Another writing f(v) + g(v) denotes the sum of the results of applying f and g to v separately. Though the results of these calculations do coincide, their meanings are different. In a similar way one should distinguish the meanings of the left and right sides of the following equality: (α · f)(v) = α · f(v).

Theorem 10.1. The set of mappings Map(V, W) equipped with the operations of pointwise addition and pointwise multiplication by numbers fits the definition of a linear vector space over the numeric field K.

Proof. Let's verify the axioms of a linear vector space for the set of mappings Map(V, W). Remember that the coincidence of two mappings is equivalent to the coincidence of their values when applied to an arbitrary vector v ∈ V. In the case of the first axiom we should verify the coincidence of the mappings f + g and g + f. The following calculations establish this coincidence:

(f + g)(v) = f(v) + g(v) = g(v) + f(v) = (g + f)(v).

As we see in the above calculations, the equality f + g = g + f follows from the commutativity axiom for the addition of vectors in W due to the pointwise nature of the addition of mappings. The same arguments are applicable when verifying the axioms (2), (5), and (6) for the algebraic operations with mappings:

((f + g) + h)(v) = (f + g)(v) + h(v) = (f(v) + g(v)) + h(v) =
= f(v) + (g(v) + h(v)) = f(v) + (g + h)(v) = (f + (g + h))(v),

(α · (f + g))(v) = α · (f + g)(v) = α · (f(v) + g(v)) =
= α · f(v) + α · g(v) = (α · f)(v) + (α · g)(v) = (α · f + α · g)(v),

((α + β) · f)(v) = (α + β) · f(v) = α · f(v) + β · f(v) =
= (α · f)(v) + (β · f)(v) = (α · f + β · f)(v).

For the axiom (7) these calculations look like

(α · (β · f))(v) = α · (β · f)(v) = α · (β · f(v)) = (αβ) · f(v) = ((αβ) · f)(v).

In the case of the axiom (8) the calculations are even more simple: (1 · f)(v) = 1 · f(v) = f(v).

Now let's consider the rest of the axioms, (3) and (4). The zero mapping is the best pretender for the role of the zero element in the space Map(V, W): it maps each vector v ∈ V to the zero vector of the space W. For this mapping we have

(f + 0)(v) = f(v) + 0(v) = f(v) + 0 = f(v).

Hence, the axiom (3) in Map(V, W) is fulfilled. Suppose that f ∈ Map(V, W). We define the opposite mapping f′ for f as follows: f′ = (−1) · f. Then we have

(f + f′)(v) = (f + (−1) · f)(v) = f(v) + ((−1) · f)(v) = f(v) + (−1) · f(v) = 0 = 0(v).

Therefore, the axiom (4) in Map(V, W) is also fulfilled. This completes the proof of the theorem 10.1.

As we see, the space Map(V, W) is very large. Even for finite-dimensional spaces V and W it usually is an infinite-dimensional space. In linear algebra a much smaller subset of Map(V, W) is studied. This is the set of all linear mappings from V to W. It is denoted Hom(V, W) and is called the set of homomorphisms. It is much smaller and it consists of objects which are in the scope of linear algebra. The following two theorems show that Hom(V, W) is closed with respect to the algebraic operations in Map(V, W). Therefore, we can say that the space of homomorphisms Hom(V, W) is a subspace in the space of all mappings Map(V, W).

Theorem 10.2. The pointwise sum of two linear mappings f : V → W and g : V → W is a linear mapping from the space V to the space W.

Proof. Let h = f + g be the sum of two linear mappings f and g. The following calculations prove the linearity of the mapping h:

h(v_1 + v_2) = f(v_1 + v_2) + g(v_1 + v_2) = (f(v_1) + f(v_2)) + (g(v_1) + g(v_2)) =
= (f(v_1) + g(v_1)) + (f(v_2) + g(v_2)) = h(v_1) + h(v_2),

h(β · v) = f(β · v) + g(β · v) = β · f(v) + β · g(v) = β · (f(v) + g(v)) = β · h(v).

Theorem 10.3. The pointwise product of a linear mapping f : V → W by a number α ∈ K is a linear mapping from the space V to the space W.

Proof. Now let's consider the product of the mapping f and the number α. Let's denote it by h, i.e. h = α · f. Then the following calculations prove the linearity of the mapping h and thus complete the proofs of both theorems 10.2 and 10.3:

h(v_1 + v_2) = α · f(v_1 + v_2) = α · (f(v_1) + f(v_2)) = α · f(v_1) + α · f(v_2) = h(v_1) + h(v_2),

h(β · v) = α · f(β · v) = α · (β · f(v)) = (αβ) · f(v) = (βα) · f(v) = β · (α · f(v)) = β · h(v).

For finite-dimensional spaces V and W the space of homomorphisms Hom(V, W) is also finite-dimensional. This is the result of the following theorem.

Theorem 10.4. For finite-dimensional spaces V and W the space of homomorphisms Hom(V, W) is finite-dimensional. Its dimension is given by the formula

dim(Hom(V, W)) = dim(V) · dim(W).    (10.1)

Proof. Let dim V = n and dim W = m. We choose a basis e_1, ..., e_n in the space V and another basis h_1, ..., h_m in the space W. Let 1 ≤ i ≤ n and 1 ≤ j ≤ m. For each fixed pair of indices i, j within the above ranges we consider the following set of n vectors in the space W:

w_1 = 0, ..., w_{i−1} = 0, w_i = h_j, w_{i+1} = 0, ..., w_n = 0.

All vectors in this set are equal to zero, except for the i-th vector w_i, which is equal to the j-th basis vector h_j. Now we apply the theorem 9.2 to the basis e_1, ..., e_n in V and to the set of vectors w_1, ..., w_n. This defines the linear mapping E^i_j : V → W such that E^i_j(e_s) = w_s for all s = 1, ..., n. We write this fact as

E^i_j(e_s) = δ^i_s · h_j,    (10.2)

where δ^i_s is the Kronecker symbol.

As a result we have constructed n m mappings E^i_j satisfying the relationships (10.2):

E^i_j : V → W, where 1 ≤ i ≤ n, 1 ≤ j ≤ m.    (10.3)

Now we show that the mappings (10.3) span the space of homomorphisms Hom(V, W). For this purpose we take a linear mapping f ∈ Hom(V, W). Suppose that F is its matrix in the pair of bases e_1, ..., e_n and h_1, ..., h_m. Denote by F^j_i the elements of this matrix. Then the result of applying f to an arbitrary vector v ∈ V is determined by the coordinates of this vector according to the formula

f(v) = Σ_{i=1}^{n} v^i · f(e_i) = Σ_{i=1}^{n} Σ_{j=1}^{m} (F^j_i v^i) · h_j.    (10.4)

Applying E^i_j to the same vector v and taking into account (10.2), we derive

E^i_j(v) = Σ_{s=1}^{n} v^s · E^i_j(e_s) = Σ_{s=1}^{n} (v^s δ^i_s) · h_j = v^i · h_j.    (10.5)

Now, comparing the relationships (10.4) and (10.5), we find

f(v) = Σ_{i=1}^{n} Σ_{j=1}^{m} F^j_i · E^i_j(v).

Since v is an arbitrary vector of the space V, this formula means that f is a linear combination of the mappings (10.3):

f = Σ_{i=1}^{n} Σ_{j=1}^{m} F^j_i · E^i_j.

Hence, the mappings (10.3) span the space of homomorphisms Hom(V, W). This proves the finite-dimensionality of the space Hom(V, W). In order to calculate the dimension of Hom(V, W) we shall prove that the mappings (10.3) are linearly independent. Let's consider a linear combination of these mappings which is equal to zero:

Σ_{i=1}^{n} Σ_{j=1}^{m} γ^j_i · E^i_j = 0.    (10.6)

Both left and right hand sides of the equality (10.6) represent the zero mapping 0 : V → W. Let's apply this mapping to the basis vector e_s. Then

Σ_{i=1}^{n} Σ_{j=1}^{m} γ^j_i · E^i_j(e_s) = Σ_{i=1}^{n} Σ_{j=1}^{m} (γ^j_i δ^i_s) · h_j = 0.

The sum in the index i can be calculated explicitly. As a result we get a linear combination of the basis vectors in W which is equal to zero:

Σ_{j=1}^{m} γ^j_s · h_j = 0.

Due to the linear independence of the vectors h_1, ..., h_m we derive γ^j_s = 0. This means that the linear combination (10.6) is necessarily trivial. Hence, the mappings (10.3) are linearly independent. They form a basis in Hom(V, W). Now, by counting these mappings we find that the required formula (10.1) is valid. The theorem is proved.

The meaning of the above theorem becomes transparent in terms of the matrices of linear mappings. Indeed, upon choosing the bases in V and W the linear mappings from Hom(V, W) are represented by rectangular m × n matrices. The sum of mappings corresponds to the sum of matrices, and the product of a mapping by a number corresponds to the product of the matrix by that number. Note that rectangular m × n matrices form a linear vector space isomorphic to the arithmetic linear vector space K^{mn}. This space is denoted as K^{m×n}. So, the choice of bases in V and W defines an isomorphism of Hom(V, W) and K^{m×n}.

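In matrix language the basis mappings E^i_j of (10.3) are simply the matrices with a single unit entry, so the isomorphism Hom(V, W) ≅ K^{m×n} and the dimension count (10.1) can be made explicit. The sketch below is not part of the original text; it assumes NumPy and small made-up dimensions.

import numpy as np

n, m = 3, 2                               # dim V = 3, dim W = 2 (example data)

# The matrix of the basis mapping E^i_j has a single unit entry:
# row j (image basis vector h_j), column i (argument basis vector e_i).
def E(i, j):
    M = np.zeros((m, n))
    M[j, i] = 1.0
    return M

F = np.array([[1.0, 2.0, 0.0],
              [4.0, 0.0, 5.0]])           # matrix of some f in Hom(V, W)

# f is the linear combination of the E^i_j with coefficients F^j_i, cf. (10.4)-(10.5).
recombined = sum(F[j, i] * E(i, j) for i in range(n) for j in range(m))
print(np.allclose(recombined, F))         # True

# There are n·m such basis mappings, which is the formula (10.1).
print(n * m)                              # 6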
CHAPTER II

LINEAR OPERATORS.

§ 1. Linear operators. The algebra of endomorphisms End(V)
and the group of automorphisms Aut(V).

A linear mapping f : V → V acting from a linear vector space V to the same vector space V is called a linear operator¹.

¹ This terminology is not common.

Linear operators are a special form of linear mappings. Therefore, we can apply to them all results of the previous chapter. However, the less the generality, the more the specific features. Therefore, the theory of linear operators appears to be richer and more complicated than the theory of linear mappings. It contains not only the strengthening of previous theorems for this particular case, but also a class of problems that cannot be formulated for the case of general linear mappings.

Let's consider the space of homomorphisms Hom(V, W). If W = V, this space is called the space of endomorphisms End(V) = Hom(V, V). It consists of linear operators f : V → V, which are also called endomorphisms of the space V. Unlike the space of homomorphisms Hom(V, W), the space of endomorphisms End(V) is equipped with an additional binary algebraic operation. Indeed, if we have two linear operators f, g ∈ End(V), we can not only add them and multiply them by numbers, but we can also construct two compositions f ∘ g ∈ End(V) and g ∘ f ∈ End(V).

Theorem 1.1. Let End(V) be the space of endomorphisms of a linear vector space V. Then, apart from the axioms (1)-(8) of a linear vector space, the following relationships are fulfilled:
(9) (f + g) ∘ h = f ∘ h + g ∘ h;
(10) (α · f) ∘ h = α · (f ∘ h);
(11) f ∘ (g + h) = f ∘ g + f ∘ h;
(12) f ∘ (α · g) = α · (f ∘ g).

Proof. Each of the equalities (9)-(12) is an operator equality. As we know, the equality of two operators means that these operators yield the same result when applied to an arbitrary vector v ∈ V. Indeed, we have

((f + g) ∘ h)(v) = (f + g)(h(v)) = f(h(v)) + g(h(v)) =
= (f ∘ h)(v) + (g ∘ h)(v) = (f ∘ h + g ∘ h)(v),

((α · f) ∘ h)(v) = (α · f)(h(v)) = α · f(h(v)) =
= α · (f ∘ h)(v) = (α · (f ∘ h))(v),

(f ∘ (g + h))(v) = f((g + h)(v)) = f(g(v) + h(v)) =
= f(g(v)) + f(h(v)) = (f ∘ g)(v) + (f ∘ h)(v) = (f ∘ g + f ∘ h)(v),

(f ∘ (α · g))(v) = f((α · g)(v)) = f(α · g(v)) =
= α · f(g(v)) = α · (f ∘ g)(v) = (α · (f ∘ g))(v).

The above calculations prove the properties (9)-(12) of the composition of linear operators. The theorem is proved.

The operation of composition is an additional binary operation in the space of endomorphisms End(V). Let's fix an operator h ∈ End(V) and consider the composition f ∘ h as a rule that maps each operator f to the other operator g = f ∘ h. Then we get a mapping

Rh : End(V) → End(V).

This mapping is called the right shift by h, since it acts as a composition where h is placed on the right side. In a similar way we can define another mapping, which is called the left shift by h:

Lh : End(V) → End(V).

It acts according to the rule Lh(f) = h ∘ f. The first two properties (9) and (10) from the theorem 1.1 mean that Rh is a linear mapping. The mapping Lh is linear due to the properties (11) and (12) from the theorem 1.1. The linearity of the mapping Rh is interpreted as the linearity of the binary operation of composition in its first argument, while the linearity of Lh is said to be the linearity of composition in its second argument. A binary algebraic operation linear in both arguments is called a bilinear operation.

A situation where a linear vector space is equipped with an additional bilinear algebraic operation is rather typical.

Definition 1.1. A linear vector space A over a numeric field K equipped with a bilinear binary operation of vector multiplication is called an algebra over the field K or simply a K-algebra.

The operation of multiplication in algebras is usually denoted by some sign like a dot «•» or a circle «∘», but very often this sign is omitted at all. The algebra A is called a commutative algebra if the multiplication in it is commutative: a b = b a. Similarly, the algebra A is called an associative algebra if the operation of multiplication is associative: (a b) c = a (b c).

From the definition 1.1 and from the theorem 1.1 we conclude that the linear space End(V) with the operation of composition taken for multiplication is an algebra over the same numeric field K as the initial vector space V. This algebra is called the algebra of endomorphisms of a linear vector space V. It is associative due to the theorem 1.6 from Chapter I. However, this algebra is not commutative in the general case.

The operation of composition is treated as a multiplication in the algebra of endomorphisms End(V); therefore, the composition sign is usually omitted in this context. The multiplication of operators is a higher priority operation as compared to addition. The priority of operator multiplication as compared to the multiplication by numbers makes no difference at all. This follows from the axiom (7) for the space End(V) and from the properties (10) and (12) of the multiplication in End(V). Now we can consider positive integer powers of linear operators:

f² = f f,    f³ = f² f,    ...,    f^{n+1} = f^n f.

Definition 1.2. An algebra A over the field K is called an algebra with unit element or an algebra with unity if there is an element 1 ∈ A such that 1 · a = a and a · 1 = a for all a ∈ A.

The algebra of endomorphisms End(V) is an algebra with unity. The identical operator plays the role of the unit element in this algebra: 1 = id_V. Therefore, this operator is also called the unit operator or the operator unity.

Definition 1.3. A linear operator f : V → V is called a scalar operator if it is obtained by multiplying the unit operator 1 by a number λ ∈ K, i.e. if f = λ · 1.

If an operator f is bijective, then we have the inverse operator f^{−1} and we can consider negative integer powers of f as well:

f^{−2} = f^{−1} f^{−1},    ...,    f^{n} f^{−n} = id_V,    f^{n+m} = f^{n} f^{m}.

The latter equality is valid both for positive and for negative values of the integer constants n and m.

The basic purpose of operators from the space End(V) is to act upon vectors of the space V. In the «functional» form of writing the action of an operator upon a vector, the operator sign is put on the left and the vector sign is put on the right and is enclosed into brackets like an argument of a function: w = f(v). Algebraists use the more «deliberate» form of writing: w = f v. The operator sign is on the left and the vector sign is on the right, but no brackets are used. No doubt that this approach is valid. Suppose that a, b ∈ End(V) and let x, y ∈ V. Then
(1) (a + b)(x) = a(x) + b(x),
(2) a(x + y) = a(x) + a(y).
These two relationships are well known: the first one follows from the definition of the sum of two operators, the second relationship follows from the linearity of the operator a. The question is why the vectors x and y in the above formulas are surrounded by brackets. If we know that f ∈ End(V) and v ∈ V, then the writing w = f v makes no confusion. In a more complicated case, even if we know that α ∈ K, f, g ∈ End(V), and v ∈ V, the writing w = α f g v admits several interpretations:

w = α · f(g(v)),    w = (α · f)(g(v)),    w = ((α · f) ∘ g)(v),    w = (α · (f ∘ g))(v).

However, for any one of these interpretations we get the same vector w. Therefore, the algebraic form of writing could be very fruitful in some cases, especially in huge calculations, and in what follows we shall use the algebraic form of writing the action of an operator upon a vector.

Let f : V → V be a linear operator in a finite-dimensional vector space V. According to the general scheme of constructing the matrix of a linear mapping we should choose two bases e_1, ..., e_n and h_1, ..., h_n in V and consider the expansions similar to (9.1) in Chapter I. However, to have two bases in one space is certainly excessive. Therefore, when constructing the matrix of a linear operator the second basis h_1, ..., h_n is chosen to be coinciding with the first one.

matrix F of an operator f is determined from the expansions

    f(e_1) = F^1_1 · e_1 + ... + F^n_1 · e_n,
    . . . . . . . . . . . . . . . . . . . . .                        (1.1)
    f(e_n) = F^1_n · e_1 + ... + F^n_n · e_n,

which can be expressed in brief form by the formula

    f(e_j) = Σ_{i=1}^{n} F^i_j · e_i.                                (1.2)

The matrix F determined by the expansions (1.1) or by the expansions (1.2) is called the matrix of a linear operator f in the basis e_1, ..., e_n. This is a square n × n matrix, where n = dim V.

Theorem 1.2. Matrices related to operators f ∈ End(V) in some fixed basis e_1, ..., e_n possess the following properties:
(1) the sum of two operators is represented by the sum of their matrices;
(2) the product of an operator by a number is represented by the product of its matrix by that number;
(3) the composition of two operators is represented by the product of their matrices.

Proof. Consider the operators f, g, and h from End(V). Let F, G, and H be their matrices in the basis e_1, ..., e_n. Proving the first proposition of the theorem 1.2, let's denote h = f + g. Then

    h(e_j) = (f + g) e_j = f(e_j) + g(e_j) = Σ_{i=1}^{n} F^i_j · e_i + Σ_{i=1}^{n} G^i_j · e_i = Σ_{i=1}^{n} (F^i_j + G^i_j) · e_i = Σ_{i=1}^{n} H^i_j · e_i.

Due to the uniqueness of the expansion of a vector in a basis we have H^i_j = F^i_j + G^i_j and H = F + G. The first proposition of the theorem is proved.

The proof of the second proposition is similar. Let's denote h = α · f. Then

    h(e_j) = (α · f) e_j = α · f(e_j) = α · Σ_{i=1}^{n} F^i_j · e_i = Σ_{i=1}^{n} (α F^i_j) · e_i = Σ_{i=1}^{n} H^i_j · e_i.

Therefore, H^i_j = α F^i_j and H = α · F.

The proof of the third proposition requires a little bit more effort. Denote h = f ◦ g. Then

    h(e_j) = (f ◦ g) e_j = f(g(e_j)) = f( Σ_{i=1}^{n} G^i_j · e_i ) = Σ_{i=1}^{n} G^i_j · f(e_i) = Σ_{i=1}^{n} G^i_j · Σ_{s=1}^{n} F^s_i · e_s = Σ_{s=1}^{n} ( Σ_{i=1}^{n} F^s_i G^i_j ) · e_s = Σ_{s=1}^{n} H^s_j · e_s.

Due to the uniqueness of the expansion of a vector in a basis we derive H^s_j = Σ_{i=1}^{n} F^s_i G^i_j. The right side of this equality is easily interpreted as the product of two matrices written in terms of their components. Therefore, H = F G. The theorem is proved.
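As an illustration of the expansions (1.1)-(1.2) and of the theorem 1.2, the following hedged sketch (Python with NumPy; the particular matrices and the dimension are assumptions of the illustration, not part of the original text) builds the matrix of an operator column by column from the images of the basis vectors and checks that sum, scalar multiple, and composition of operators go over into the corresponding matrix operations.

import numpy as np

n = 3
rng = np.random.default_rng(1)
F = rng.integers(-2, 3, size=(n, n)).astype(float)   # matrix of an operator f
G = rng.integers(-2, 3, size=(n, n)).astype(float)   # matrix of an operator g

# The operator acts on coordinate columns; its matrix is recovered column-wise
# from the images of the basis vectors, exactly as in the expansions (1.1).
def matrix_of(operator):
    basis = np.eye(n)
    return np.column_stack([operator(basis[:, j]) for j in range(n)])

f = lambda x: F @ x
g = lambda x: G @ x

assert np.allclose(matrix_of(f), F)

# Theorem 1.2: sum, scalar multiple and composition of operators correspond to
# the sum, scalar multiple and product of their matrices.
assert np.allclose(matrix_of(lambda x: f(x) + g(x)), F + G)
assert np.allclose(matrix_of(lambda x: 2.0 * f(x)), 2.0 * F)
assert np.allclose(matrix_of(lambda x: f(g(x))), F @ G)
print("the map from End(V) to matrices respects +, scalar multiplication and composition")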

From the theorem 1.2 that was proved just above we conclude that when relating an operator f ∈ End(V) with its matrix we establish an isomorphism of the algebra End(V) and the matrix algebra K^{n×n} with standard matrix multiplication.

Now let's study how the matrix of a linear operator f : V → V changes under the change of the basis e_1, ..., e_n for some other basis ẽ_1, ..., ẽ_n. Note that we need not derive the transformation formulas again: we can adapt the formulas (9.10) from Chapter I for our present purpose. Let S be the direct transition matrix and let T be the inverse one. Since the basis h_1, ..., h_n coincides with e_1, ..., e_n and the basis h̃_1, ..., h̃_n coincides with ẽ_1, ..., ẽ_n, we have P = S. Then the transformation formulas are written as

    F̃ = S^{-1} F S,        F = S F̃ S^{-1}.                          (1.3)

These are the required formulas for transforming the matrix of a linear operator under a change of basis. Taking into account that T = S^{-1}, we can write (1.3) as

    F̃^q_p = Σ_{i=1}^{n} Σ_{j=1}^{n} T^q_i S^j_p F^i_j,        F^i_j = Σ_{q=1}^{n} Σ_{p=1}^{n} S^i_q T^p_j F̃^q_p.        (1.4)

The relationships (1.3) yield a very important formula relating the determinants of the matrices F and F̃. Indeed, we have

    det F̃ = det(S^{-1}) det F det S = (det S)^{-1} det F det S = det F.

The coincidence of the determinants of the matrices of a linear operator f in two arbitrary bases means that they represent a number which does not depend on a basis at all.

Definition 1.4. The determinant det f of a linear operator f is the number equal to the determinant of the matrix F of this linear operator in some basis.

A numeric invariant of a geometric object in a linear vector space V is a number determined by this geometric object such that it does not depend on anything else other than that geometric object itself. The determinant of a linear operator det f is an example of such a numeric invariant. Coordinates of a vector or components of the matrix of a linear operator are not numeric invariants. Another example of a numeric invariant of a linear operator is its rank: rank f = dim(Im f). Soon we shall define a lot of other numeric invariants of a linear operator.
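The formulas (1.3) and the basis independence of the determinant and of the trace (the latter is derived in § 4) can be checked numerically. In the sketch below (Python with NumPy; the random matrices F and S are assumptions made only for illustration) S plays the role of a non-degenerate transition matrix.

import numpy as np

rng = np.random.default_rng(2)
n = 4
F = rng.normal(size=(n, n))            # matrix of the operator f in the basis e_1, ..., e_n

# Direct transition matrix S to another basis; regenerate until it is non-degenerate.
S = rng.normal(size=(n, n))
while abs(np.linalg.det(S)) < 1e-8:
    S = rng.normal(size=(n, n))

F_tilde = np.linalg.inv(S) @ F @ S     # formula (1.3): the matrix of f in the new basis

# det f and tr f do not depend on the basis, while individual matrix entries do.
print(np.isclose(np.linalg.det(F_tilde), np.linalg.det(F)))   # True
print(np.isclose(np.trace(F_tilde), np.trace(F)))             # True
print(np.allclose(F_tilde, F))                                # False in general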

From the third proposition of the theorem 1.2 we derive the following formula for the determinant of the composition of two linear operators:

    det(f ◦ g) = det(f) · det(g).                                    (1.5)

Theorem 1.3. A linear operator f : V → V in a finite-dimensional linear vector space V is injective if and only if it is surjective.

Proof. The injectivity of the linear operator f is equivalent to the condition Ker f = {0}, while the surjectivity of the operator f is equivalent to Im f = V. In order to prove this theorem we apply the theorems of Chapter I on the kernel and the image of a linear mapping, which relate the dimensions of these two subspaces Ker f and Im f:

    dim(Ker f) + dim(Im f) = dim(V).

If the operator f is injective, then Ker f = {0} and dim(Ker f) = 0. Then dim(Im f) = dim(V). Applying the theorem of Chapter I which says that a subspace of maximal dimension coincides with the whole space, we get Im f = V, which proves the surjectivity of the operator f. Conversely, if the operator f is surjective, then Im f = V and dim(Im f) = dim(V). Hence, dim(Ker f) = 0 and Ker f = {0}. This proves the injectivity of the operator f. The theorem is proved.

Theorem 1.4. A linear operator f : V → V in a finite-dimensional linear vector space V is bijective if and only if det f ≠ 0.

Proof. The injectivity of the linear operator f is equivalent to the condition Ker f = {0}. Due to the previous theorem and due to the theorem 1.1 from Chapter I the latter equality Ker f = {0} is equivalent to the bijectivity of f. Let x be a vector of V and let y = f(x). Expanding x and y in some basis e_1, ..., e_n, we get the following formula relating their coordinates:

    y^i = Σ_{j=1}^{n} F^i_j x^j,    i = 1, ..., n.                   (1.6)

The formula (1.6) can be derived independently, or one can derive it from the formula (9.5) of Chapter I. From this formula we derive that x belongs to the kernel of the operator f if and only if its coordinates x^1, ..., x^n satisfy the homogeneous system of linear equations

    Σ_{j=1}^{n} F^i_j x^j = 0,    i = 1, ..., n.                     (1.7)

The matrix of this system of equations coincides with the matrix of the operator f in the basis e_1, ..., e_n. Therefore, the kernel of the operator f is nonzero if and only if the system of equations (1.7) has a nonzero solution. Here we use the well-known result from the theory of determinants: a homogeneous system of linear algebraic equations with square matrix F has a nonzero solution if and only if det F = 0. The proof of this fact can be found in [5]. From this result we immediately get that the condition det f ≠ 0 is equivalent to Ker f = {0}, and hence to the bijectivity of the operator f. The theorem is proved.
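The criterion of the theorem 1.4 and the description of the kernel by the homogeneous system (1.7) can be illustrated as follows (a Python/NumPy sketch added only for illustration; the particular matrices are assumptions). A non-degenerate matrix has the trivial kernel, while a degenerate one admits a nonzero solution of F x = 0.

import numpy as np

def kernel_dimension(F, tol=1e-10):
    # dim Ker f = n - rank F; the rank is computed from the singular values.
    n = F.shape[0]
    return n - np.linalg.matrix_rank(F, tol=tol)

F_regular = np.array([[2.0, 1.0], [0.0, 3.0]])        # det = 6, bijective
F_degenerate = np.array([[1.0, 2.0], [2.0, 4.0]])     # det = 0, second row = 2 * first row

print(np.linalg.det(F_regular), kernel_dimension(F_regular))        # 6.0, 0  -> Ker f = {0}
print(np.linalg.det(F_degenerate), kernel_dimension(F_degenerate))  # 0.0, 1  -> nonzero kernel

# A nonzero vector of the kernel of the degenerate operator: F x = 0 for x = (2, -1).
x = np.array([2.0, -1.0])
print(F_degenerate @ x)    # [0. 0.]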

Let’s consider a pair of vectors v1 . where 1 is the identical operator.1) the operator of projection onto the subspace U1 parallel to the subspace U2 is a linear operator. nor even the structure of a linear vector space. It is obvious that Aut(V ) possesses the following properties: (1) if f. however. g ∈ Aut(V ). where u1 ∈ U1 and u2 ∈ U2 . Projection operators. For any expansion of the form (2. It is easy to see that due to the above three properties the set of automorphisms Aut(V ) is equipped with a structure of a group. In the case of finite-dimensional space V the group of automorphisms consists of all non-degenerate operators.2) (2. Otherwise this linear operator is bijective. Let V be a linear vector space expanded into a direct sum of two subspaces: V = U1 ⊕ U2 . CopyRight c Sharipov R. then f −1 ∈ Aut(V ). § 2.1) each vector v ∈ V is expanded into a sum v = u1 + u2.56 CHAPTER II.2) is called the operator of projection onto the subspace U1 parallel to the subspace U2 .A. The operator P : V → V mapping each vector v ∈ V to its first component u1 in the expansion (2. ˜ Then P (v1) = u1 and P (v2 ) = u1. (2. If W = V such a mapping establishes an isomorphism of the space V with itself. (2) if f ∈ Aut(V ). Remember that a bijective linear mapping f from V to W is called an isomorphism. 2004. it does not inherit the structure of an algebra. Definition 2. it is called an automorphism of the space V .1) the components u1 and u2 in (2.. The set of all automorphisms of the space V is denoted by Aut(V ). the zero operator does not belong to Aut(V ). An operator f with zero determinant detf = 0 is called a degenerate operator. (2. and for each of them consider the expansion like (2. then f ◦ g ∈ Aut(V ). Corollary. Therefore.2): v1 = u 1 + u 2 . for instance. ˜ ˜ v2 = u 1 + u 2 .3) . 1996.1. Let’s add the above two expansions and write ˜ ˜ v1 + v2 = (u1 + u1) + (u2 + u2 ).4. A linear operator f : V → V in a finite-dimensional space V has a nontrivial kernel Ker f = {0} if and only if it is degenerate. Theorem 2. Due to the expansion (2.2) being uniquely determined by the vector v. (3) 1 ∈ Aut(V ). The group of automorphisms Aut(V ) is a subset in the algebra of endomorphisms End(V ). Proof.1. v2 from the space V . LINEAR OPERATORS. Using this terminology we can formulate the following corollary of the theorem 1. It is clear because.

P (P (v)) = P (v).2) for this vector is v = v + 0. Indeed. for any v ∈ V we have P (v) ∈ U1 . Therefore. Similarly.2) we can define the other operator Q such that Q(v) = u2. then the expansion (2. u1 ∈ U1 and from u2.6) is called a concordant pair of projectors. for any vector v ∈ V we have P (v) + Q(v) = u1 + u2 = v = idV (v) = 1(v). 2 P + Q = 1. If v ∈ U1 . therefore. (2. therefore. u2 ∈ U2 we derive u1 + u1 ∈ U1 and u2 + u2 ∈ U2 . Indeed. 57 ˜ ˜ ˜ ˜ From u1. The second operator Q then is given by formula Q = 1 − P . Then α · u1 ∈ U1 and α · u2 ∈ U2 . for instance. therefore. Therefore. Q(v) = 0. (2. we write P 2 = P. we derive Q(P (v)) = 0 and P (Q(v)) = 0 for any v ∈ V . Q = Q. due to the definition of P we get P (α · v) = α · u1 = α · P (v). Indeed. Then ˜ P (v1 + v2) = u1 + u1 = P (v1 ) + P (v2). Besides P .§ 2.6) A pair of projection operators satisfying the relationships (2. For the sum of these two operators we get P + Q = 1. All of the relationships (2. It is also a projection operator: it projects onto U2 parallel to U1 .5) are just the very relationships that mean the linearity of the operator P . Q P = (1 − P ) ◦ P = P − P 2 = P − P = 0. therefore. P Q = Q P = 0. the operator P . . Hence. Then the expansion (2.4) Now let’s consider the expansion (2.2) for an arbitrary vector v ∈ V and multiply it by a number α ∈ K: α · v = (α · u1) + (α · u2 ).2) is chosen to be a vector of the subspace U1 . P (v) = v. in order to get a concordant pair of projectors it is sufficient to define only one of them.5) The relationships (2. (2. This means that all vectors of the subspace U1 are projected by P onto themselves.2) for this vector is v = +0.3) is an expansion of the form (2. P (v) = 0 for all v ∈ U2 . This fact has an important consequence P 2 = P .2) for the vector v1 + v2 . by means of (2.6) thereby will be automatically fulfilled. we have the relationships P Q = P ◦ (1 − P ) = P − P 2 = P − P = 0. (2. PROJECTION OPERATORS. The relationship Q2 = Q for Q is derived in a similar way: Q2 = (1 − P ) ◦ (1 − P ) = 1 − 2P + P = 1 − P = Q. Q2 = Q. Summarizing these results. Suppose that v in the expansion (2.4) and (2.
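The construction of a concordant pair of projectors is easy to see in coordinates. The following minimal sketch (Python with NumPy; the space R^3 and the particular subspaces U1 and U2 are arbitrary assumptions of the illustration) builds the matrix of the operator P of projection onto U1 parallel to U2 and checks the relationships P^2 = P, Q^2 = Q, P + Q = 1 and P Q = Q P = 0 collected in (2.6).

import numpy as np

# V = R^3, U1 = span{(1,0,0), (0,1,1)}, U2 = span{(1,1,2)}; then V = U1 (+) U2.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])   # basis of U1 as columns
B = np.array([[1.0], [1.0], [2.0]])                  # basis of U2 as a column
M = np.hstack([A, B])                                # joint basis of V
assert abs(np.linalg.det(M)) > 1e-12                 # the sum U1 + U2 is direct

# In the joint basis the projector onto U1 parallel to U2 is diag(1, 1, 0);
# transforming back to the standard basis gives its matrix P.
D = np.diag([1.0, 1.0, 0.0])
P = M @ D @ np.linalg.inv(M)
Q = np.eye(3) - P

assert np.allclose(P @ P, P)                   # P^2 = P
assert np.allclose(Q @ Q, Q)                   # Q^2 = Q
assert np.allclose(P @ Q, np.zeros((3, 3)))    # P Q = 0
assert np.allclose(Q @ P, np.zeros((3, 3)))    # Q P = 0
assert np.allclose(P + Q, np.eye(3))           # P + Q = 1
print("concordant pair of projectors verified")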

10) . it is unique and we have V = Im P ⊕ Ker P. Now suppose that a linear vector space V is expanded into the direct sum of several its subspaces U1 . . Proof. An operator P : V → V is a projector onto a subspace parallel to another subspace if and only if P 2 = P . LINEAR OPERATORS. Let’s prove that this is a direct sum of subspaces. Let’s consider two subspaces U1 = Im P. Theorem 2. From u2 ∈ Ker P we derive P (u2) = 0.8) where u1 ∈ Im P and u2 ∈ Ker P . Us : V = U1 ⊕ . Hence. We have already shown that any projector satisfies the equality P 2 = P . P is an operator of projection onto the subspace Im P parallel to the subspace Ker P .2. (2. . (2. Q(v) = (1 − P ) v = v − P (v) = v − u1 = u2. This means u2 ∈ Ker P . The relationships derives just above mean that any expansion (2. . ⊕ U s . Let’s prove the converse proposition.8) coincides with (2. . From u1 ∈ Im P we conclude that u1 = P (v1) for some vector v1 ∈ V . V = Im P + Ker P . Then for operators P and Q all of the relationships (2. + us. . We should prove the uniqueness of the expansion v = u1 + u2 .8). From the relationship P Q = 0 for the other vector u2 = Q(v) in (2.7) where u1 = P (v) ∈ Im P .7). Hence. The operator P maps an arbitrary vector v ∈ V into the first component of the expansion (2. Suppose that P 2 = P . (2. Let’s denote Q = 1 − P . Then from (2.7) we get the equality P (u2) = P (Q(v)) = 0.58 CHAPTER II. where ui ∈ Ui (2. For an arbitrary vector v ∈ V we have the expansion v = 1(v) = (P + Q)v = P (v) + Q(v). Hence. U2 = Ker P. .8) we derive the following formulas: P (v) = P (u1) + P (u2) = P (P (v1)) = P 2(v1 ) = P (v1) = u1. .9) This expansion of the space V implies the unique expansion for each vector v ∈ V : v = u1 + . .6) are fulfilled.

.12).10) looks like: u = 0 + . Due to the relationship (2. Ps satisfying the relationships (2.11) the theory of separate operators Pi does not differ from the theory of projectors defined by two component expansions of the space V . . Suppose that we have a family of operators P1 . . . (2.12). where ui ∈ Im Pi (2.10). A family of projection operators P1.12) Pi ◦ Pj = 0 for i = j. + us. Let’s prove the converse proposition. Ps is determined by an expansion of the form (2. In the case of multicomponent expansions the collective behavior of projectors is of particular interest. . The operator Pi : V → V that maps each vector v ∈ V to its i-th component ui in the expansion (2. . + Im Ps. Theorem 2. PROJECTION OPERATORS. For the projection operators Pi this yields (Pi )2 = Pi.13) where Pi(v) ∈ Im Pi. .11) and (2. + Ps = 1. It is based on the uniqueness of the expansion (2.3. Then its expansion (2. + 0 + u + 0 + . . Then we define the subspaces Ui = Im Pi . . . from the definition of Pi we get P1 + . Let’s choose a vector u ∈ Ui .9) satisfy the relationships (2. 59 Definition 2. .12).11) and (2. .§ 2. . A family of projection operators P1.12) for an arbitrary vector v ∈ V we get v = P1 (v) + . e. we have the expansion of V into a sum of subspaces V = Im P1 + . . if these operators satisfy the relationships (2. + 0.15) .11) and (2. .14) Let’s prove that the sum (2. + Ps(v).9) if and only if it is concordant. (2. (2.11) Due to the first relationship (2. Ps is called a concordant family of projectors if the operators of this family satisfy the relationships (2. . . . Proof. .12).10) is called the operator of projection onto Ui parallel to other subspaces. . Therefore for any such vector u we have Pi(u) = u and Pj (u) = 0 for j = i. For this purpose we consider an expansion of some arbitrary vector v ∈ V corresponding to the expansion (2.11) and (2. . i. The proof of linearity of the operators Pi is practically the same as in case of two subspaces considered in theorem 2. Hence. . (2. Moreover.14): v = u1 + .14) is a direct sum.2. . We already know that a family of projectors determined by an expansion (2. .1.

From u_i ∈ Im P_i we conclude that u_i = P_i(v_i), where v_i ∈ V. Then from the expansion (2.15) we derive the following equality:

    P_i(v) = P_i(u_1 + ... + u_s) = Σ_{j=1}^{s} P_i(P_j(v_j)).

Due to (2.11) only one term in the above sum is nonzero. Therefore we have P_i(v) = (P_i)^2 (v_i) = P_i(v_i) = u_i. This equality shows that an arbitrary expansion (2.15) should coincide with (2.13). Hence, (2.13) is the unique expansion of that sort, the sum (2.14) is a direct sum, and P_i is the projection operator onto the i-th component of the sum (2.14) parallel to its other components. The theorem is proved.

Now we consider a projection operator P as an example for the first approach to the problem of bringing the matrix of a linear operator to a canonic form.

Theorem 2.4. For any nonzero projection operator P in a finite-dimensional vector space V there is a basis e_1, ..., e_n such that the matrix of the operator P has the following form in that basis:

    P = diag(1, ..., 1, 0, ..., 0),                                  (2.16)

where the first s diagonal elements are equal to unity and all other elements of the matrix are equal to zero.

Proof. Let's consider the two subspaces U_1 = Im P and U_2 = Ker P. The sum of these two subspaces is a direct sum: V = U_1 ⊕ U_2, and the operator P projects onto U_1 parallel to U_2. From the condition P ≠ 0 we conclude that s = dim(Im P) ≠ 0. Then we choose a basis e_1, ..., e_s in U_1 = Im P and, if U_1 ≠ V, we complete it by choosing a basis e_{s+1}, ..., e_n in U_2 = Ker P. Joining together two bases in them, we get a basis of V (see the proof of theorem 6.3 in Chapter I). Now let's apply the operator P to the vectors of the basis we have constructed just above. We have

    P(e_i) = e_i for i = 1, ..., s,        P(e_i) = 0 for i = s + 1, ..., n.

Therefore (2.16) is the matrix of the projection operator P in the basis e_1, ..., e_n. The theorem is proved.
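Conversely, by the theorem 2.2 any operator with P^2 = P is a projector, and the proof of the theorem 2.4 shows how a suitable basis brings its matrix to the canonic form (2.16). A minimal numerical sketch (Python with NumPy; the idempotent matrix below is an arbitrary assumption chosen for illustration):

import numpy as np

# An idempotent matrix (P^2 = P) taken as an example.
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
assert np.allclose(P @ P, P)

def column_space(A, tol=1e-10):
    # An orthonormal basis of the column space of A, extracted via the SVD.
    u, s, _ = np.linalg.svd(A)
    return u[:, : int((s > tol).sum())]

im_basis = column_space(P)                 # basis of Im P
ker_basis = column_space(np.eye(3) - P)    # Ker P = Im(1 - P), since P (1 - P) = 0

# Joining the two bases gives a basis of V (theorem 2.2), and in this basis
# the matrix of P takes the canonic form (2.16): diag(1, ..., 1, 0, ..., 0).
T = np.hstack([im_basis, ker_basis])
assert abs(np.linalg.det(T)) > 1e-10
canonic = np.linalg.inv(T) @ P @ T
print(np.round(canonic, 10))               # diag(1, 1, 0)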

where uir ∈ Uir . Now let u ∈ Im f. . the operator fU cannot be applied to them at all. Theorem 3. . let’s prove that U is an invariant subspace. A subspace U is called an invariant subspace of a linear operator f : V → V if f(U ) ⊆ U . hence. For this reason in general case we should treat the restricted operator f as a linear mapping fU : U → V . Definition 3. W = i∈I Ui . The intersection and the sum of an arbitrary number of invariant subspaces of a linear operator f : V → V both are the invariant subspaces of f. Its action upon vectors u ∈ U coincides with the action of f upon u. Let’s consider the kernel of f for the first. in general. Restriction and factorization of operators. Now let’s consider a vector w ∈ W . The invariance of the image Im f is proved. The kernel and the image of a linear operator f : V → V are invariant subspaces of f. Let’s restrict the domain of f to the subspace U . if u ∈ U implies f(u) ∈ U . since the zero vector 0 is an element of any subspace of V . Therefore. f(u) ∈ U . i ∈ I be a family of invariant subspaces of a linear operator f : V → V . w = f(u) ∈ Im f. Hence. Thereby the image of f shrinks to f(U ). Let’s consider the intersection and the sum of these subspaces: U= i∈I Ui . Consider a vector u ∈ U . Denote w = f(u). Let Ui . rather than a linear operator. According to the definition of the sum of subspaces.2. Hence. . i. 61 § 3. then f(u) = 0. which are invariant subspaces of f. If u ∈ Ker f. This means that f(u) belongs to their intersection U . Now we should prove that they are invariant subspaces. Then w is the image of the vector u. If U is an invariant subspace of f. However. This vector belongs to all subspaces Ui . The invariance of the kernel Kerf is proved. The invariance of U is proved. Proof.1.§ 3. we get: f(w) = f(ui1 ) + . f(w) ∈ W .1. the subspace f(U ) is not enclosed into the subspace U . Let f : V → V be a linear operator and let U be a subspace of V . + f(uis ). this vector admits the expansion w = ui1 + . Due to the invariance of Ui we have f(uir ) ∈ Uir . e. . In § 6 of Chapter I we have proved that U and W are the subspaces of V . Invariant subspaces. . As for the vectors outside the subspace U . For the first. + uis . INVARIANT SUBSPACES. . the restriction fU can be treated as a linear operator in U . f(u) also belongs to all subspaces Ui . This yields the invariance of the sum W of the invariant subspaces Ui . Proof. Applying the operator f to both sides of this equality. Theorem 3.

let ˜ ˜ v.2) as well. f V /U (Q) = ClU (f(˜ )). Now let’s prove the linearity of the factoroperator fV /U : V /U → V /U . v Let’s calculate the difference of these two possible results: ClU (f(˜ )) − ClU (f(v)) = ClU (f(˜ ) − f(v)) = ClU (f(˜ − v)). We can rewrite the formula (3. the formulas (3. . (3. Therefore. Let’s consider the factorspace V /U and define the operator fV /U in this factorspace by formula fV /U (Q) = ClU (f(v)). LINEAR OPERATORS. we need to prove their correctness. v u This coincidence ClU (f(˜ )) = ClU (f(v)) that we have proved just above proves v the correctness of the formula (3. fV /U (α · Q) = fV /U (α · ClU (v)) = = fV /U (ClU (α · v)) = ClU (f(α · v)) = = ClU (α · f(v)) = α · ClU (f(v)) = α · fV /U (Q). (3.1) is called the factoroperator of the quotient operator of the operator f by the subspace U . e.3) in Chapter I. we have u = f(u) ∈ U .3.1): fV /U (Q1 + Q2 ) = fV /U (ClU (v1 ) + ClU (v2 )) = = fV /U (ClU (v1 + v2 )) = ClU (f(v1 + v2 )) = = ClU (f(v1 )) + ClU (f(v2 )) = fV /U (Q1) + fV /U (Q2 ). The formula (3. Let U be an invariant subspace of a linear operator f : V → V .62 CHAPTER II. i.2) comprise the definite amount of uncertainty due to the uncertainty of the choice of a representative v in a coset Q = ClU (v). They define a linear operator fV /U in factorspace V /U .1) in shorter form as follows: fV /U (ClU (v)) = ClU (f(v)).2) both are correct. we get ClU (f(˜ )) − ClU (f(v)) = ClU (˜ ) = 0.1) and the equivalent formula (3. where Q = ClU (v). Theorem 3. According to the formula (3. v v v ˜ Note that the vector u = v − v belongs to the subspace U . Therefore.1) and (3. We shall carry out the appropriate calculations on the base of formula (3.2) Like formulas (7. Since U is an invariant ˜ subspace. Let’s conside two different representative vectors in a coset Q. v ∈ Q.1) and the formula (3. we consider two possible results of applying the operator fV /U to Q: f V /U (Q) = ClU (f(v)).1). Proof. Then v − v ∈ U .1) The operator fV /U : V /U → V /U acting according to the rule (3.

then from w ∈ U we derive f(w) ∈ U . Hence. . where we denote h = α · f.§ 3. Denote h = f + g and assume that u is an arbitrary vector of U . . The relationship hU = α · fU now is obvious due to the same reasons as above. is not quite different from the first one. which means that U is an invariant subspace of h. Theorem 3. INVARIANT SUBSPACES.4. For the restricted operators this yields the equality hU (u) = h(u) = f(g(u)) = fU (gU (u)). g ∈ End(V ). (f ◦ g)U = fU ◦ gU . h(u) = f(g(u)) = f(w) ∈ U . (α · f)V /U = α · fV /U . Then U is an invariant subspace of the operators f +g. Now we consider the third case. From u ∈ U it follows that f(u) ∈ U . Suppose that U is a common invariant subspace of two linear operators f. For their restrictions to the subspace U and for the corresponding factoroperators we have the following relationships: (f + g)U = fU + gU . Here we denote h = f ◦ g. we obtain hV /U (ClU (v)) = ClU (h(v)) = ClU (f(g(v)) = fV /U (Cl(g(v))) = = fV /U (gV /U (ClU (v))) = fV /U ◦ gV /U (ClU (v)). The corresponding relationship for the factoroperators is proved as follows: hV /U (ClU (v)) = ClU (h(v)) = ClU (f(v) + h(v)) = = ClU (f(v)) + ClU (g(v)) = fV /U (ClU (v)) + gV /U (ClU (v)). The above calculations prove the last relationship of the theorem 3. For the factoroperators we perform the following calculations: hV /U (ClU (v)) = ClU (h(v)) = ClU (α · f(v)) = = α · ClU (f(v)) = α · fV /U (ClU (v)) = (α · fV /U)(ClU (v)). . hU = fU ◦ gU . For this reason we obtain h(u) = f(u) + g(u) ∈ U . Proof. Passing to factoroperators.4. (α · f)U = α · fU . h(u) = α · f(u) ∈ U . From u ∈ U we derive w = g(u) ∈ U . g. Us are invariant subCopyRight c Sharipov R. ⊕ Us be an expansion of a linear vector space V into a direct sum of its subspaces. Theorem 3. 1996. Indeed. . 2004. α·f and f ◦ g as well. The theorem is proved.. This proves that U is an invariant subspace of h. Then f(u) ∈ U and g(u) ∈ U since U is an invariant subspace of both operators f and g. The relationship hU = fU + gU follows from h = f + g since the results of applying the restricted operators to u do not differ from the results of applying f. (f + g)V /U = fV /U + gV /U .A. . and h to u. (f ◦ g)V /U = fV /U ◦ gV /U . Let’s begin with the first case. The subspaces U1 . hence.5. The second case. Let V = U1 ⊕ . . . 63 These calculations show that fV /U is a linear operator.

Ps associated with the expansion V = U1 ⊕ . But due to the above equality we find that Pi(w) = w = f(u) ∈ Ui . Therefore. . Suppose that dim V = n and dim U = s. . For j s due to the invariance of the subspace U under the action of f we have f(ej ) ∈ U . this matrix would . . ⊕ Us : v = u1 + .64 CHAPTER II. . if s < n. es in U and then. en. The theorem is completely proved. for j = i. the summation index i runs from 1 to s. Conversely. . Thus we have shown that the space Ui is invariant under the action of the operator f. . . Let u be an arbitrary vector of the subspace Ui . Since v is an arbitrary vector of the space V . . Hence.⊕ Us commute with the operator f. if f ◦ Pi = Pi ◦ f. Here ui = Pi(v) ∈ Ui . . . . . Let’s choose a basis e1 . + u s . . from the above equality Pi (f(v)) = f(Pi (v)) we derive f ◦ Pi = Pi ◦ f.⊕ Us . where j s. We also used the following properties of projection operators (they follow from (2. . see § 2 above): Pi (wj ) = wi 0 for j = i. For an arbitrary vector v ∈ V we consider the expansion determined by the direct sum V = U1 ⊕ . . . . . . Denote by es+1. . . Pi (w) ∈ Ui . complete this basis up to a basis in V . suppose that the operator f commute with all projection operators P1. . From this expansion we derive Pi(f(v)) = Pi(f(u1 ) + . . . This means that if we construct the matrix of the operator f in the basis e1 . . where we expand an arbitrary vector of V . . LINEAR OPERATORS. . Let’s consider a linear operator f in a finite-dimensional linear vector space V and possessing an invariant subspace U . where i = 1. en the complementary vectors. + f(us )) = f(ui ) = f(Pi (v)) We used the inclusion wj = f(uj ) ∈ Uj that follows from the invariance of the subspace Uj under the action of f. in the expansions of these vectors s f(ej ) = i=1 Fji · ei. . but not from 1 to n as it should in general case. spaces of an operator f : V → V if and only if the projection operators P1. . . . Remember that Pi projects onto the subspace Ui . i. Then we denote w = f(u) and for w we derive Pi(w) = Pi(f(v)) = f(Pi (u)) = f(u) = w.11) and Ui = Im Pi. . s. . e. Ps associated with the expansion V = U1 ⊕ . . Proof. Suppose that all subspaces Ui are invariant under the action of the operator f.

. . .§ 3. en: E1 = ClU (es+1 ). . . .. Fs 2 . . . The lower right diagonal block of the matrix (3. . . Let f : V → V be a linear operator in a finite-dimensional space and let U be an invariant subspace of this operator. . n . .4) coincides with the lower right diagonal block in the matrix (3. The proof of this theorem is immediate from the following fact well-known in the theory of determinants: the determinant of blockwise-triangular matrix is equal to the product of determinants of all its diagonal blocks. . s F2 1 .. . 0 s+1 . 0 0 . 4 . Looking at this formula. . . shifting the index i + s → i.. .. .3) can also be interpreted in a special way. . . s Fs+1 s+1 Fs+1 . Applying the factoroperator fV /U to (3.. .. Theorem 3.4). s F1 1 F2 2 F2 . . . Fs 1 Fs+1 2 Fs+1 . .. The upper left diagonal block in the matrix (3.. . we see that the matrix of the factoroperator fV /U in the basis (3. INVARIANT SUBSPACES. . . we get fV /U (Ej ) = fV /U ClU (es+j ) = ClU (f(es+j )) = s n i Fs+j i=1 = · ClU (ei) + i=s+1 i Fs+j · ClU (ei ). . . we find n−s fV /U (Ej ) = i=1 s+i Fs+j · Ei . Then the determinant of f is equal to the product of two determinants — the determinant of the restricted operator fU and that of the factoroperator fV /U : det f = det(f U ) · det(f V /U ). . .3) 0 . . . . Fs ..4) When proving the theorem 7. . . . 0 . En−s = ClU (en ).6 in Chapter I. . s Fn F =          s (3. . Fn .6. . 0 . 1 Fn 2 Fn . The first sum in the above expression is equal to zero since the vectors e1 . .. . . Then. we have found that these cosets form a basis in the factorspace V /U . . es belong to U . . 65 be mounted of blocks with the lower left block in it being zero: s 1 F1 2 F1 . In order to find this interpretation let’s consider the cosets of complementary vectors in the basis e1 . .. n Fs+1 .3) coincides with the matrix of restricted operator fU : U → U in the invariant subspace U .3). . Fn Matrices of this form are called blockwise-triangular matrices. .. (3. . .
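The blockwise-triangular form (3.3) and the determinant factorization of the theorem 3.6 can be observed on a small example. The sketch below (Python with NumPy; the particular operator and the invariant subspace are assumptions made only for illustration) takes an operator of a three-dimensional space written in a basis e1, e2, e3 adapted to the invariant subspace U = span{e1, e2}.

import numpy as np

# The lower left block is zero, as in (3.3), because U = span{e1, e2} is invariant.
F = np.array([[2.0, 1.0, 5.0],
              [1.0, 3.0, -1.0],
              [0.0, 0.0, 4.0]])
s = 2

# U is invariant: the images of e1 and e2 have no component along e3.
for j in range(s):
    assert abs(F[s:, j]).max() == 0.0

F_U = F[:s, :s]        # matrix of the restricted operator f_U in the basis e1, e2
F_quot = F[s:, s:]     # matrix of the factoroperator f_{V/U} in the basis Cl_U(e3)

# Theorem 3.6: det f = det(f_U) * det(f_{V/U}) for a blockwise-triangular matrix.
assert np.isclose(np.linalg.det(F), np.linalg.det(F_U) * np.linalg.det(F_quot))
print(np.linalg.det(F), np.linalg.det(F_U) * np.linalg.det(F_quot))   # both equal 20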

. This subspace Vλ = Ker(f − λ · 1) = {0} is called the eigenspace associated with the eigenvalue λ. A nonzero vector v = 0 of the space V is called an eigenvector of the operator f if f v = λ · v.. The number λ is called the eigenvalue of the operator f associated with the eigenvector v. Eigenvalues and eigenvectors. . This is the very theory that is usually studied in the course of linear algebra. . § 4. from the equality f v = λ · v = λ · v and from v = 0 it follows that λ = λ . while the brunch of mathematics studying the spectra of linear operators is known as the spectral theory of operators.. Definition 4. The spectral theory of linear operators in finite-dimensional spaces is the most simple one. Then the determinant in formula (4.1. if an eigenvector is given. . (4. Let f : V → V be a linear operator. . its roots are called the characteristic numbers of the operator f. . the associated eigenvalue λ for this eigenvector is unique. One eigenvalue λ of an operator f can be associated with several or even with infinite number of eigenvectors. while any nonzero vector of Vλ is called an eigenvector of the operator f associated with the eigenvalue λ. Due to this corollary a number λ ∈ K is an eigenvalue of the operator f if and only if it satisfies the equation det(f − λ · 1) = 0.66 CHAPTER II.2) The equation (4. LINEAR OPERATORS. Fn − λ . .4. The collection of all eigenvalues of an operator f is sometimes called the spectrum of this operator. 1 Fn 2 Fn Hλ = . v ∈ Ker(f − λ · 1). where λ ∈ K. Indeed.. Let’s consider the other operator hλ = f − λ · 1. . .. (4.2) is equal to the determinant of the square n × n matrix. .3) n . (4. Then the equation f v = λ · v can be rewritten as (f − λ · 1) v = 0. A number λ ∈ K is called an eigenvalue of a linear operator f : V → V if the subspace Vλ = Ker(f − λ · 1) is nonzero. n F2 . Let v be an eigenvector of the operator f : V → V . . The matrix of the operator hλ = f − λ · 1 is derived from the matrix of the operator f by subtracting λ from each element on the primary diagonal of this matrix: 1 F1 − λ 2 F1 1 F2 2 F2 − λ . . n F1 . But conversely. Let f : V → V be a linear operator in a finite-dimensional linear vector space V .2) is called the characteristic equation of the operator f. In order to find the spectrum of this operator we apply the corollary of theorem 1.1) Hence. Let dim V = n. The condition v = 0 means that the kernel of this operator is nonzero: Ker(f − λ · 1) = {0}. .

The determinant of the matrix (4.3) is a polynomial of λ:

    det(f − λ · 1) = (−λ)^n + F1 (−λ)^{n−1} + ... + Fn.              (4.4)

The polynomial in the right hand side of (4.4) is called the characteristic polynomial of the operator f. If F is the matrix of the operator f in some basis, then the coefficients F1, ..., Fn of the characteristic polynomial (4.4) are expressed through the elements of the matrix F. However, note that the left hand side of (4.4) is basis independent. Therefore, the coefficients F1, ..., Fn do not actually depend on the choice of basis. They are scalar invariants of the operator f. The first and the last invariants in (4.4) are the most popular ones:

    F1 = tr f,        Fn = det f.

The invariant F1 is called the trace of the operator f. It is calculated through the matrix of this operator according to the following formula:

    tr f = Σ_{i=1}^{n} F^i_i.                                        (4.5)

We shall not derive this formula (4.5) since it is well-known in the theory of determinants. We shall only derive the invariance of the trace immediately on the base of the formula (1.4), which describes the transformation of the matrix of a linear operator under a change of basis:

    Σ_{p=1}^{n} F̃^p_p = Σ_{p=1}^{n} Σ_{i=1}^{n} Σ_{j=1}^{n} T^p_i S^j_p F^i_j = Σ_{i=1}^{n} Σ_{j=1}^{n} δ^j_i F^i_j = Σ_{i=1}^{n} F^i_i.

Upon substituting (4.4) into (4.2) we see that the characteristic equation (4.2) of the operator f is a polynomial equation of n-th order with respect to λ:

    (−λ)^n + F1 (−λ)^{n−1} + ... + Fn = 0.                           (4.6)

Any eigenvalue λ ∈ K is a root of the characteristic equation (4.6). However, not any root of the equation (4.6) is an eigenvalue of the operator f. The matter is that a polynomial equation with coefficients in the numeric field K can have roots in some larger field K̃ (e. g. Q ⊂ R or R ⊂ C). Consider the case K = Q. The roots of a polynomial equation with rational coefficients are not necessarily rational numbers: the equation λ² − 3 = 0 is an example. In the case of real numbers K = R a polynomial equation with real coefficients can also have non-real roots, e. g. the equation λ² + 3 = 0. However, the field of complex numbers K = C is an exception. For the characteristic number λ of the operator f to be an eigenvalue of this operator it should belong to K. From the course of general algebra we know that the total number of roots of the equation (4.6) counted according to their multiplicity and including those belonging to the extensions of the field K is equal to n (see [4]). Therefore we can estimate the number of eigenvalues of the operator f.

Theorem 4.1. The number of eigenvalues of a linear operator f : V → V equals the dimension of the space V at most.
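The coefficients of the characteristic polynomial (4.4), and in particular the invariants F1 = tr f and Fn = det f, can be computed numerically. A minimal sketch (Python with NumPy; the matrix is an arbitrary assumption, and numpy.poly uses the convention det(λ·1 − F) rather than det(F − λ·1), which accounts for the signs below):

import numpy as np

F = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -2.0],
              [1.0, 0.0, 3.0]])
n = F.shape[0]

# numpy.poly(F) returns the coefficients c of det(lambda*1 - F) = lambda^n + c1*lambda^(n-1) + ... + cn.
# Comparing with (4.4): F1 = -c1 = tr F and Fn = (-1)^n * cn = det F.
c = np.poly(F)
print(c)                                                  # [1, c1, ..., cn]
print(np.isclose(c[1], -np.trace(F)))                     # True:  F1 = tr F
print(np.isclose(c[-1], (-1) ** n * np.linalg.det(F)))    # True:  Fn = det F

# The characteristic numbers are the roots of the characteristic equation (4.6).
roots = np.roots(c)
print(np.isclose(roots.sum(), np.trace(F)))               # sum of the roots equals tr f
print(np.isclose(roots.prod(), np.linalg.det(F)))         # product of the roots equals det f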

7) we can present the numeric invariants F1 . . Then λ is a root of the equation (4. However. Due to the formula (4. . Theorem 4.4) is factorized into a product of terms linear in λ: n det(f − λ · 1) = i=1 (λi − λ).6) is called the multiplicity of the eigenvalue λ.2. It is an immediate consequence of the algebraic closure of the field of complex numbers C. Definition 4.1 of an eigenspace Vλ of a linear operator f can be reformulated as Vλ = {v ∈ V : f(v) = λ · v}. In particular. In the case K = C the characteristic polynomial (4. . Let λ be an eigenvalue of a linear operator f.8) The theory of symmetric polynomials is given in the course of general algebra (see. e. Therefore.7) is always valid. only the field of complex numbers is algebraically closed.1. An arbitrary polynomial equation of n-th order with complex coefficients has exactly n complex roots counted according to their multiplicity. .68 CHAPTER II. This proposition strengthen the theorem 4. C is not the unique algebraically closed field. . in the list of numeric fields Q. (4. λn are understood as characteristic numbers of the operator f. We shall not prove here this theorem referring the reader to the course of general algebra (see [4]). LINEAR OPERATORS. the book [4]). .7) For some operators such an expansion can occur in the case K = Q or K = R. R. which proves the invariance of Vλ . Theorem 4. . The multiplicity of this root λ in the equation (4. Certainly.4. . (4.2.2 is known as the «basic theorem of algebra». For any eigenvalue λ of a linear operator f : V → V the associated eigenspace Vλ is invariant under the action of f. while the property of complex numbers stated in this theorem is called the algebraic closure of C. then the formula (4. . Theorem 4.3. i. Fn of the operator f as elementary symmetric polynomials of its characteristic numbers: Fi = σi (λ1 . The theorem 4. v ∈ Vλ implies f(v) = λ · v ∈ Vλ . . for example. The definition 4. it is not a typical situation. C that we consider in this book. For a linear operator f : V → V in a complex linear vector space V the number of its eigenvalues counted according to their multiplicities is exactly equal to the dimension of V . If λ1 . . for the trace and for the determinant of the operator f we have n n tr f = i=1 λi . however. A numeric field K is called an algebraically closed field if the roots of any polynomial equation with coefficients from K are again in K. . det f = i=1 λi . . C is an algebraically closed numeric field.6). λn ). Proof.

Applying the operator P (f) of the form (4. Proof. Then p q P (f) ◦ Q(f) = i=0 j=0 (αi βj ) · f i+j = Q(f) ◦ P (f). It is important to say that the subalgebra K[f] is commutative. any eigenvector v of . + α1 · f + α0 · 1. . we get P (f) u = αp · up + . . This relationship should be treated as the definition of zeroth power of the operator f. in turn.9) is called the polynomial envelope of the operator f. Let’s consider some operator f ∈ End(V ) and complement it with the identical operator 1. Here we denote: f 0 = 1. which proves the invariance of U under the action of the operator P (f). let P (f) and Q(f) be two operator polynomials of the form: p q i P (f) = i=0 αi · f . 69 We know that the set of linear operators in a space V form the algebra End(V ) over the numeric field K. . Theorem 4. However. u1 = f(u). The following fact is curious: if λ is an eigenvalue of the operator f and if v is an associated eigenvector. (4. . EIGENVALUES AND EIGENVECTORS.10). i. then P (f) v = P (λ) · v.10) is verified by direct calculation. and so on up to up ∈ U . due to ui ∈ U we find that P (f) u ∈ U .5. Hence. we can add such products. Therefore. up = f p (u).§ 4.9) commute: P (f) ◦ Q(f) = Q(f) ◦ P (f). u3 ∈ U . . Let U be an invariant subspace of an operator f. e. Then it is invariant under the action of any operator from the polynomial envelope K[f]. . we can multiply them by numbers from K. Let’s consider the following vectors u0 = u. this algebra is too big. These calculations prove the relationship (4. .9) to the vector u. Every next vector in this sequence is obtained by applying the operator f to the previous one: ui+1 = f(ui ). As a result we obtain various operators of the form P (f) = αp · f p + . Therefore. (4. it is denoted K[f]. Let u be an arbitrary vector of U . we successively obtain u2 ∈ U . + α1 · u1 + α0 · u0. for any two polynomials P and Q the corresponding operators (4. Within the algebra End(V ) we can take positive integer powers of the operator f. This is a subset of End(V ) closed with respect to all algebraic operations in End(V ). and we can add to them scalar operators obtained by multiplying the identical operator 1 by various numbers from K. Indeed.10) The equality (4.9) The set of all operators of the form (4. Q(f) = j=0 βj · f j . Then. . from u0 ∈ U it follows that u1 ∈ U since U is an invariant subspace of f. Such subsets are used to be called subalgebras. u2 = f 2 (u).

The eigenspace Vλi of the operator f is determined as the kernel of the operator hi . . . . Let’s denote by W the sum of eigenspaces of the operator f: W = V λ1 ⊕ . the theorems 4. ⊕ Vλs . ⊕ V λs . it remains valid in either case. Let’s consider the operators hi = f − λi · 1. The converse proposition. + vs .14) r=i (λi − λr ) CopyRight c Sharipov R. Hence.1. (4. . (4.12). Note that the set of mutually distinct eigenvalues λ1 . . which implies hr (vj ) = (λj − λr ) · vj . . Let λ1 .4 and 4. LINEAR OPERATORS. . it is nonzero. . .13) means that fi (vj ) = 0 for all j = i. . . 1996.12) fi = r=i hr . According to the definition 4. Proof. . λs of the operator f in this theorem could be the complete set of such eigenvalues. Moreover. The operator fi belongs to the polynomial envelope of the operator f and s fi (vj ) = r=i (λj − λr ) · vj . (4. The formula (4.11) is a direct sum we need to prove that for an arbitrary vector w ∈ W the expansion w = v1 + . λs be a set of mutually distinct eigenvalues of the operator f : V → V .10). Theorem 4. . .A. Let λ1.5 say that Vλi is invariant under the action of f and of all other operators hj .11) In order to prove that the sum (4. 2004. Applying the operator fi to both sides of the expansion (4. . . the operator f is an eigenvector of the operator P (f). . This makes no difference for the result of the theorem 4. . . however. .12) we derive vi = fi (w) s . For this purpose we consider the operator fi defined by formula s (4. Then the sum of associated eigenspaces Vλ1 . is unique. or it could include only a part of such eigenvalues. . Vλs is a direct sum: Vλ1 + .70 CHAPTER II.. where vi ∈ Vλi . is not true. . . .6.6. which certainly belong to the polynomial envelope of f. for the vector vi in the expansion (4. we get the equality s fi (w) = r=i (λi − λr ) · vi . . + Vλs = Vλ1 ⊕ . The permutability of any two such operators follows from (4. λs be a set of mutually distinct eigenvalues of the operator f.13) This follows from vj ∈ Vλj .
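The operators fi used in this proof are polynomials in f, so the construction also illustrates the polynomial envelope K[f] and the fact that P(f) v = P(λ) · v for an eigenvector v. A minimal numerical sketch (Python with NumPy; the diagonalizable operator with eigenvalues 1, 2, 3 below is an assumption chosen for illustration) extracts the eigencomponents of a vector exactly as in the formula (4.14).

import numpy as np

# An operator with mutually distinct eigenvalues 1, 2, 3; the columns of T are its eigenvectors.
T = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
lambdas = np.array([1.0, 2.0, 3.0])
F = T @ np.diag(lambdas) @ np.linalg.inv(T)
I = np.eye(3)

# P(f) for P(x) = x^2 - 3x + 1 acts on an eigenvector v with f v = lam*v as multiplication by P(lam).
P_of_F = F @ F - 3.0 * F + 1.0 * I
v = T[:, 1]                                    # eigenvector with eigenvalue 2
assert np.allclose(P_of_F @ v, (2.0 ** 2 - 3.0 * 2.0 + 1.0) * v)

# The operator f_i = prod_{r != i} (f - lambda_r * 1) from (4.12) kills the components in the
# other eigenspaces; dividing by prod_{r != i} (lambda_i - lambda_r) as in (4.14) recovers
# the i-th component of a vector w = v_1 + v_2 + v_3.
w = T @ np.array([1.0, 1.0, 1.0])              # the sum of the three chosen eigenvectors
for i in range(3):
    f_i = I.copy()
    for r in range(3):
        if r != i:
            f_i = f_i @ (F - lambdas[r] * I)
    v_i = f_i @ w / np.prod([lambdas[i] - lambdas[r] for r in range(3) if r != i])
    assert np.allclose(v_i, T[:, i])
print("eigencomponents recovered via the operators f_i")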

An operator f : V → V is diagonalizable if and only if the sum of all its eigenspaces coincides with V . + Vλs = V . we get a basis in V (see theorem 6. λs is the total set of mutually distinct eigenvalues of the operator f and assume that Vλ1 + . e.11) is a direct sum. . Let f be a diagonalizable operator. having collected together the terms with coinciding eigenvalues in this expansion. This is a basis composed by eigenvectors of the operator f. en such that its matrix F in this basis is diagonal. the application of f to each basis vector reduces to multiplying this vector by its associated eigenvalue. However. where vi ∈ Vλi .3) is also diagonal. The expansion of an arbitrary vector v in this base is an expansion by eigenvectors of the operator f. 71 The formula (4. . Therefore. . . choosing a basis in each eigenspace and joining them together. this means that Vλ1 + . This is a necessary condition for the operator f to be diagonalizable. i. + vs .3 in Chapter I). each basis vector ei is an eigenvector of the operator f. A linear operator f : V → V in a linear vector space V is called a diagonalizable operator if there is a basis e1 .14) uniquely determines all summands in the expansion (4. ⊕ Vλs = V . . The theorem 4. Therefore. is written as f(ei ) = Fii · ei.6 says that this is a direct sum: V = Vλ1 ⊕ . . only diagonal elements Fii of this matrix can be nonzero.12) is unique and the sum of subspaces (4. Since v is an arbitrary vector of V . Theorem 4. . . suppose that λ1 . Then the relationship (1. . . EIGENVALUES AND EIGENVECTORS. . Hence. Proof. + Vλs = V .3. The direct proposition of the theorem is proved. . This means that characteristic numbers of a diagonalizable operator coincide with its eigenvalues. . while λi = Fii is its associated eigenvalue. it is not a sufficient condition. . . Conversely. This means that the expansion (4. . Then the matrix Hf in formula (4. we get the expansion v = v1 + . . . Even in the case of algebraically closed field of complex numbers K = C there are non-diagonalizable operators in vector spaces over the field C. Assume that an operator f : V → V is diagonalizable and assume that we have chosen a basis where its matrix is diagonal.2). . Hence.§ 4. Due to this equality we conclude that the characteristic polynomial of a diagonalizable operator is factorized into the product of a linear terms and all roots of characteristic equation belong to the field K (not to its extension). Then we can choose a basis e1. we immediately derive the following formula: n det(f − λ · 1) = i=1 (Fii − λ). the matrix F of the operator f in this basis is diagonal. Definition4.12) if the vector w ∈ W is given.7. The theorem is proved. Its diagonal elements coincide with the eigenvalues of the operator f. . en in the space V such that the matrix of the operator f is diagonal in this basis. which determines the matrix F . . Therefore.
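Both situations described above are visible numerically. In the sketch below (Python with NumPy, an illustration only; the matrices are assumptions) the first operator is diagonalizable and numpy.linalg.eig produces a basis of eigenvectors bringing its matrix to diagonal form, while the second operator is non-diagonalizable even over C, since its only eigenspace is one-dimensional.

import numpy as np

# A diagonalizable operator: eig returns its eigenvalues and a basis of eigenvectors.
F = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, T = np.linalg.eig(F)
assert np.allclose(np.linalg.inv(T) @ F @ T, np.diag(lam))   # diagonal matrix in the eigenbasis

# A non-diagonalizable operator: the sum of its eigenspaces is smaller than V.
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])
# The only eigenvalue is 0, but Ker(N - 0*1) is one-dimensional, so V_0 is not the whole space.
print(np.linalg.matrix_rank(N))        # 1, hence dim Ker N = 2 - 1 = 1 < dim V = 2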

1) can be infinite since the number of vectors in a linear vector space usually is infinite. In that case. . . we have (fU )k u = f k (u) = 0. In a finite-dimensional linear vector space V the height ν(f) of any nilpotent operator f : V → V is finite. . A linear operator f : V → V is called a nilpotent operator if for any vector v ∈ V there is a positive integer number k such that f k (v) = 0. . ν(en) with respect to f. applying the operator f m to v.72 CHAPTER II. . This means that the height of a nilpotent operator f is finite: ν(f) = m < ∞. This means that there is a minimal positive number k = kmin (depending on v) such that f k (v) = 0. where the maximum in the formula (5. The height of zero vector is taken to be zero by definition. . According to the definition 5. Definition 5. . Proof.2. . v∈V (5. . Nilpotent operators. . . . If f : V → V is a nilpotent operator and if U is an invariant subspace of the operator f. there is an integer number k > 0 such that f k (u) = 0. en in V and consider the heights of all basis vectors ν(e1).1. but the maximum in (5.1 for any vector v there is an integer number k (depending on v) such that f k (v) = 0. for any nonzero vector v its height is greater or equal to the unity.2. . § 5. then the restricted operator fU and the factoroperator fV /U both are nilpotent. . However.2) Due to the formula (5. Hence. Theorem 5. Definition 5. a nilpotent operator f is called an operator of finite height and the number ν(f) is called the height of a nilpotent operator f. Let’s denote the height of v by ν(v) and define the number ν(f) = max ν(v). For an arbitrary vector v ∈ V consider its expansion v = v 1 · e1 + . Proof. . we find n f m (v) = i=1 vi · f m (ei ) = 0. + vn · en. ν(en)}. if m > k and f k (v) = 0 then f m (v) = 0. Let’s choose a basis e1 . This minimal number kmin is called the height of the vector v respective to the nilpotent operator f.2) we see that the heights of all vectors of the space V are restricted by the number m. indeed.1) For each vector v ∈ V its height is finite. Any vector u of the subspace U ⊂ V is a vector of V . the result of applying the restricted operator fU the a vector of U coincides with the result of applying the initial operator f to this vector. Theorem 5. Then denote m = max{ν(e1). The choice of such number has no upper bound. Then. LINEAR OPERATORS.1) is finite.1. (5. Therefore.
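The notions of the height of a vector and of the height of a nilpotent operator can be illustrated as follows (Python with NumPy; the shift operator below is an assumption chosen only for illustration).

import numpy as np

# A nilpotent operator of R^4: it shifts the basis vectors e4 -> e3 -> e2 -> e1 -> 0,
# so f^4 = 0 while f^3 is nonzero.
f = np.diag([1.0, 1.0, 1.0], k=1)   # 4 x 4 matrix with unities above the diagonal

def height(v, f, tol=1e-12):
    # The height nu(v) is the minimal k with f^k(v) = 0.
    k = 0
    while np.linalg.norm(v) > tol:
        v = f @ v
        k += 1
    return k

e = np.eye(4)
print([height(e[:, j], f) for j in range(4)])         # [1, 2, 3, 4]
print(height(np.array([1.0, 0.0, -2.0, 5.0]), f))     # 4: the height of a generic vector
print(np.allclose(np.linalg.matrix_power(f, 4), 0))   # True: nu(f) = 4 = dim V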

(5. A nilpotent operator f cannot have a nonzero eigenvalue. It the finite-dimensional case this theorem can be strengthened as follows. since f is nilpotent. Due to the theorem 5.2 the factoroperator fV /U is nilpotent. for the nonzero vector w we get the following series of equalities: f(w) = f(f k−1 (v)) = f k (v) = 0 = 0 · w.§ 5. This means that w = f k−1 (v) = 0 is an eigenvector of f with the eigenvalue λ = 0 since f(w) = f k (v) = 0 = 0 · w. . The theorem is proved. Then we derive f k (v) = λk · v = 0. Then f k (v) = 0 and w = f k−1 (v) = 0.6. The restricted operator f U is zero. Now it is clear that the factoroperator fV /U is a nilpotent operator. In the case n = 1 we fix some vector v = 0 in V and denote by k = ν(v) its height. λk = 0. w is an eigenvector of the operator f and λ = 0 is its associated eigenvalue. In a finite-dimensional space V of the dimension dim V = n any nilpotent operator f has exactly one eigenvalue λ = 0 with the multiplicity n. there is a number k > 0 such that f k (v) = 0. Now. The theorem is completely proved.3) The factoroperator fV /U is an operator in factorspace V /U whose dimension n − m is less than n. and let k = ν(v) be the height of this vector v respective to the operator f. for characteristic polynomial of this operator fU = 0 we derive det(fU − λ · 1) = (−λ)m .4. We shall prove this theorem by induction on n = dim V . Proof. Let’s consider the eigenspace U = V0 corresponding to the eigenvalue λ = 0. hence. On the other hand. Suppose that the theorem is proved for any finite-dimensional space of the dimension less than n and consider a space V of the dimension n = dim V . As above. let’s fix some vector v = 0 in V and denote by k = ν(v) its height respective to the operator f. where v is some fixed vector in Q. Let λ be an eigenvalue of a nilpotent operator f and let v = 0 be an associated eigenvector. Hence. Let Q = ClU (v). Hence. NILPOTENT OPERATORS. Let’s denote m = dim U = 0. Proof. Theorem 5. But v = 0. 73 This proves that f U is a nilpotent operator. we derive the characteristic polynomial of f: det(f − λ · 1) = (−λ)m det(fV /U − λ · 1). therefore. In the case of factoroperator we consider an arbitrary coset Q in the factorspace V /U . Theorem 5. applying the theorem 3. Then f k (v) = 0 and f k−1 (v) = 0. This is the equation for λ and λ = 0 its unique root. Then we have f(v) = λ · v.3. The base of the induction is proved. Then we can calculate (fV /U )k Q = ClU (f k (v)) = 0.

.7) Let k = max{k1.5). . . This vector v produces the chain of k vectors according to the following formulas: v1 = f k−1 (v).6) Comparing these two chains (5. Let f : V → V be a linear operator. In order to specify the chain vector we use two indices vi. then in these two chains there are no coinciding vectors at all. there is even stronger result. (5. This means that λ = 0 is the only eigenvalue of the operator f and its multiplicity is n = dim V . Then the first vector v1 vanished. Therefore. Without loss of generality we can assume that the chains are arranged in the order of decreasing their lengths. . then the whole set of vectors in these chains is linearly independent. therefore. . Let’s apply the operator f to each vector in the chain (5. We shall prove the theorem by induction on k. wk−1 = f(v). . If the side vectors in several chains of the form (5. The vector v1 is called the side vector or the eigenvector of the chain (5. e.5). Consider a vector v ∈ V and denote by k = ν(v) its height respective to the operator f.5) and (5. . they contain only the side vectors and have no adjoint vectors at all. i. we can apply the inductive hypothesis to it. If k = 1 then the lengths of all chains are equal to 1. (5. ks the lengths of our chains. . (5. . Applying f to the rest k − 1 vectors we get another chain: w1 = f k−1 (v). . If the side vectors of two chains are different. .j . The other vectors are called the adjoint vectors of the chain. .5). . (5.4) Comparing the above relationships (5. Proof. ks}. Then for its characteristic polynomial of the factoroperator fV /U we get det(f V /U − λ · 1) = (−λ)n−m .74 CHAPTER II. The first index i is the number of chain to which this vector vi. the second index j specifies the number of this vector within the i-th chain. v2 = f k−2 (v). . The theorem is proved.3) and (5. It is obtained from the first one by removing the last vector vk = v. LINEAR OPERATORS. We consider s chains of the form (5. It is known as the theorem on «linear independence of chains».5. vk = f 0 (v) = v. Denote by k1 .5) are related with each other as follows: vi = f(vi−1 ). ks 1. we have the following inequalities: k1 k2 . However. but the second chain is shorter.5) are linearly independent. we see that they are almost the same. The proposition of the theorem in this case is obviously true.6).5) The chain vectors (5. we find the characteristic polynomial of the initial operator f: det(f − λ · 1) = (−λ)n . . .j belongs. .4). . w2 = f k−2 (v). Theorem 5..

(5.12) . (5. For our s chains.11) is also trivial.7) cannot be equal to 1 since k > 1. the linear combination (5.j · f(vi. . all coefficients of the linear combination in left hand side of (5. then the result of applying f to (5. 75 Suppose that the theorem is valid for the chains whose lengths are not greater than k − 1.j ) = i=1 j=2 αi.8) and use the following quite obvious relationships: f(vi. When applied to (5.10) The left side of the relationship (5.j+1 · vi. Let f : V → V be a nilpotent operator in a linear vector space V and let v be a vector of the height k = ν(v) in V . The remainder is written as follows: s i=1 αi. we consider a linear combination of all their vectors being equal to zero: s ks i=1 j=1 αi.10) and (5. for j = 1. vk . We have completed the inductive step and thus have proved the theorem in whole. = ks = 1.10). (5.7).5) generated by v and denote by U (v) the linear span of chain vectors (5.9) as follows: r kr −1 i=1 j=1 αi. Now we can apply the inductive hypothesis. If we take into account (5. which yields the linear independence of all vectors presented in (5. In this case r < s and kr+1 = .j = 0. .§ 5.11) it follows that the initial linear combination (5.j · vi. The lengths of all chains in (5.11) we have only the side vectors of initial chains. vi.j = 0.10) is again a linear combination of chain vectors. The are linearly independent by the assumption of the theorem. . (5. (5. However. .8). . sometimes certain chains of vectors can drop from the above sums at all.1 · vi.j−1 = 0.8) this fact means that the most part of terms in left hand side of this equality do actually vanish. From triviality of (5. .5): U (v) = v1 .j ) = 0.10) are equal to zero. NILPOTENT OPERATORS.11) Now in the linear combination (5. Therefore.8) is written as s ks r kr i=1 j=1 αi. Shifting the index j + 1 → j in the last sum we can write (5. whose lengths are restricted by the number k.1 = 0.j · vi. Hence. Consider the chain of vectors (5. Here we have r chins with the lengths 1 less as compared to original ones in (5. Let’s apply the operator f to both sides of (5.9) In typical situation r = s. for j > 1. This happens if a part of chains were of the length 1.8) is trivial too.8) From this equality we should derive the triviality of the linear combination in its left hand side.j−1.

. This means that u is a chain vector in a chain of the form (5.15) we have the sequence of inclusions V0 = U 1 ⊇ U 2 ⊇ .5) and (5.5 the subspace U (v) is a finite-dimensional subspace and dim U (v) = k.. The chain vectors (5.. (5.. . Let f : V → V again be a nilpotent operator. Therefore Um+1 = {0}.5): f(v1 ) = 0. For this purpose let’s consider the following subspaces: Uk = Ker f ∩ Im f k−1 .13). 1 0 . Hence. The upward next diagonal parallel to the primary one is filled with unities.13) Due to (5.. .14) is a square k × k matrix. ⊇ U k ⊇ .5) of the length k with the side vector u = f k−1(v) can be treated as a chain of the length k − 1 by dropping the k-th vector vk = v (see (5. 0 .5)..14) A matrix of the form (5. using (5. vk : 0 1 Jk (0) = 0 .. v is a vector of the height k and u is a side vector in the chain (5.. For the subspaces (5.16) .5).14) is filled with zeros again. (5. . we can find the matrix of this restricted operator in the chain basis v1 .14) is called a Jordan block or a Jordan cage of a nilpotent operator..16) where V0 = Ker f is the eigenspace corresponding to the unique eigenvalue λ = 0 of nilpotent operator f. . The inclusions (5.12). Hence. The matrix (5. this matrix degenerates and becomes purely zero matrix with the only element: J1(0) = 0 .6)).12) is invariant under the action of the operator f. The following relationships are derived directly from the definition of the chain (5. we can consider the restricted operator fU (v) and.13) the subspace (5. u = f k−1 (v) for some vector v. . Then for the vector v = f(v) we have u = f k−2 (v ). Hence.5) form a basis in this subspace (5. f(v2 ) = v1 . All other space in the matrix (5.. (5. We continue to study vector chains of the form (5. . Its primary diagonal is filled with zeros.76 CHAPTER II. .. LINEAR OPERATORS.. Due to the theorem 5. This yields the inclusion of subspaces Uk ⊂ Uk−1 for k > 1. (5. From the condition u ∈ Ker f we derive f(u) = f k (v) = 0. Therefore..1). if k = 1. the sequence of inclusions (5. .5) initiated by the vector v.15) If u ∈ Uk .. f(vk ) = vk−1.16) follow from the fact that any chain (5. . . In a finite-dimensional space V the height of any vector v ∈ V is restricted by the height of the nilpotent operator f itself: ν(v) ≤ ν(f) = m < ∞ (see theorem 5. . then u ∈ Im f k−1 .

. . we can write ei = f k−1 (ei. Proof.5). es of the subspace V0 .18). we have only those of them that belongs to Uk and. . We only have to prove that an arbitrary vector v ∈ V can be represented as a linear combination of chain vectors ei. namely.1.6. we obtain r f k−1(v) = i=1 αi · f k−1 (ei. while each particular subspace in a flag is called a flag subspace. are side vectors in the chains of the length not less than k. . .15)). This is the base of induction. Theorem 5. Such a basis is called a canonic basis or a Jordan basis of a nilpotent operator f. 2004. Now suppose that any vector of the height less than k can be represented as a linear combination of chain vectors ei.17) are called flags. Note that each vector in such a basis is a side vector of some chain of the form (5. Let’s join together all vectors of the above chains and let’s enumerate them by means of double indices: ei.j . Here i is the number of the chain and j is the individual number of the vector within i-th chain. Therefore. which we have constructed above: r u= i=1 αi · e i .j . and so on backward along the sequence (5. 77 terminates on m-th step. Then we complete it up to a basis in Um−1 . .k ) for i = 1..A. (5.17).§ 5. . Let’s take a vector v of the height k and denote u = f k−1 (v). es in V0 = Ker f. As a result we construct a basis e1 . . Then e1 = e1. . hence. 1996. e.17) Sequences of mutually enclosed subspaces of the form (5. This means that u is a side vector in a chain of the length k initiated by the vector v. this vector can be expanded in the basis of the subspace Uk . . In this case v is expanded in the basis e1. (5. The proof of the theorem is based on the fact that the flag (5. The linear independence of this set of vectors follows from the theorem 5. in Um−2 . For any nilpotent operator f in a finite-dimensional space V there is a basis in V composed by chain vectors of the form (5.5. . If k = ν(v) = 1. i. For the complementary vectors from Um−1 their chins are of the length m − 1 and further the length of chains decreases step by step until the unity for the complementary vectors in largest subspace U1 = V0 . Now let’s prove that the set of all vectors from the above chains form a basis in V . . .17) is finite.j . (5.5). We shall prove this fact by induction on the height of the vector v. . Therefore. then v ∈ Ker f = V0 . we have a finite sequence of inclusions: V0 = U1 ⊇ U2 ⊇ .k ).18) we have only a part of vectors e1. r. es = es. NILPOTENT OPERATORS. ⊇ Um ⊇ {0}. . . We choose a basis in the smallest subspace Um . . Substituting these expressions into (5. For basis vectors of the subspace Um the lengths of such chains are equal to m. Then f(u) = 0.16) or (5. . . .19) CopyRight c Sharipov R.1. . u is an element of the subspace Uk (see formula (5.18) Note that in the expansion (5. . es . .

Jks (0) The matrix (5. .k1 ) ⊕ .21) is called a Jordan form of the matrix of a nilpotent operator. Hence. (5. es is constructed strictly according to the proof of the theorem 5. The matrix (5.ki . . (5. and this usually happens in practice. .21) is blockwise-diagonal. Then v can also be expressed as a linear combination of chain vectors ei. es can change this order.14). . . It is easy to understand this fact.21) However. the existence of which was proved in theorem 5.k ) = 0. In the basis composed by chain vectors. . the matrix of nilpotent operator f has the following form: Jk1 (0) Jk2 (0) F = . then Jordan cages are arranged in the order of decreasing sizes: k1 k2 .19) we determine the vector v : r v =v− i=1 αi · ei. . .k . the height of the vector v is less than k and we can apply the inductive hypothesis to it. we find r f k−1(v ) = f k−1 (v) − i=1 αi · f k−1(ei. The inductive step is completed and the theorem in whole is proved.6..j . all other space in this matrix is filled with zeros.6 is known as the theorem on bringing the matrix of a nilpotent operator to a canonic Jordan form. The theorem 5. By means of the coefficients of the expansion (5.78 CHAPTER II. each chain with the side vector ei produces the invariant subspace U (v) of the form (5. LINEAR OPERATORS.j . ⊕ U (es. . Due to the theorem 5. If the chain basis e1 .. . the permutation of vectors e1.6. This means that v can be represented as a linear combination of chain vectors ei.19). its diagonal blocks are Jordan cages of the form (5.k.ks ).12). . .20) Applying the operator f k−1 to v and taking into account (5.6 the space V is the direct sum of such invariant subspaces: V = U (e1. ks . . Indeed.. . . where v = ei. . But v is expressed through v as follows: r v=v + i=1 αi · ei.
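For instance, let f be a nilpotent operator in a three-dimensional space V whose canonic Jordan basis consists of two chains e_{1,1}, e_{1,2} and e_{2,1} of the lengths k_1 = 2 and k_2 = 1. Then in this basis

F = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} J_2(0) & 0 \\ 0 & J_1(0) \end{pmatrix},

and V = U(e_{1,2}) ⊕ U(e_{2,1}), where dim U(e_{1,2}) = 2 and dim U(e_{2,1}) = 1.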

7.1 is actually a subspace in V . λ) ⊆ V (2. . + vks . Therefore we have the sequence of inclusions V (1.6 now we can choose the chain basis. The height of a chain vector is not greater than the length of the chain (5.5) to which it belongs. λ). it belongs to the union of all subspaces V (k.3) belongs to V (k. λ). (6. . . Therefore.3) Let k = max{k1. Therefore. Therefore the vector (6. . λ). Note that (hλ )k v = 0 implies (hλ )k+1 v = 0.1 shows that the set V (λ) in definition 6. let v be a vector of the sum of subspaces V (k. . 79 Theorem 5.§ 6. Therefore. The proof of coincidence of the sum and the union in (6. λ) coincides with the eigenspace Vλ . Two theorems on the sum of root subspaces. The root subspace of a linear operator f : V → V corresponding to its eigenvalue λ is the set V (λ) = {v ∈ V : ∃k ((k ∈ N) & ((f − λ · 1)k v = 0))} that consist of vectors vanishing under the action of some positive integer power of the operator hλ = f − λ · 1. This yields ν(f) n = dim V . hence. The height of an arbitrary vector v of V is not greater than the height of the operator f. . The theorem 6. . λ) ⊆ . Definition 6. λ). § 6.1) we derive vki ∈ V (k. This means that f n = 0. (6. Due to the theorem 5.1): ∞ ∞ V (λ) = k=1 V (k. Moreover. ks}. . ROOT SUBSPACES. Proof. Then v = vk1 + . Indeed. the height of basis vectors in a chain basis is not greater than the number of vectors in such a basis. (6.2) is based on the inclusions (6.1) It is easy to see that all subspaces in the sequence (6. f n (v) = 0 for all v ∈ V . For k = 1 the subspace V (1.1 we noted that the height ν(f) of a nilpotent operator f coincides with the greatest height of basis vectors. V (λ) is the union of the subspaces (6.1). λ) ⊆ . λ).1. λ) coincides with their union. The theorem is proved. The sum of a growing sequence of mutually enclosed subspaces coincides with their union.2) In this case the sum of subspaces the sum of subspaces V (k.1) are enclosed into the root subspace V (λ). Above in proving the theorem 5. λ). This subspace is nonzero since it comprises the eigenspace Vλ as a subset. ⊆ V (k. For each positive integer k we define the subspace V (k. then from the sequence of inclusions (6. . Theorem 6. Root subspaces.1. where vks ∈ V (ks . . we have proved the more general theorem. λ) = Ker(hλ )k . The height of a nilpotent operator f in a finite-dimensional space V is less or equal to the dimension n = dim V of this space and f n = 0. . λ) = k=1 V (k.

80 CHAPTER II. Let’s consider the vector w = f(v). Hence. Therefore. Due to the above result v belongs to Vλ . its bijectivity follows from its injectivity due to the theorem 1. The invariance of V (λ) under the action of f is proved. Let λ and µ be two eigenvalues of a linear operator f : V → V . therefore. Ker hλ. A root subspace V (λ) of an operator f is invariant under the action of f and of all operators from its polynomial envelope P (f). (6. Here we used the permutability of the operators hλ and f. Therefore. Therefore. we have the equality f(v) = λ · v. (2) a nilpotent operator if µ = λ. Theorem 6. which means that Ker hλ. Then there exists a positive integer number k such that (hλ )k v = 0.µ the restriction of hλ to the subspace V (µ). hence. in the case λ = µ the operator hλ. we immediately get v = 0. if λ = µ. Then the restriction of the operator hλ = f − λ · 1 to the root subspace V (µ) is (1) a bijective operator if µ = λ. For this vector we have (hλ )k w = (hλ )k ◦ f v = f ◦ (hλ )k v = f((hλ )k (v)) = 0. The kernel of the operator hλ by definition coincides with the eigenspace Vλ . Now let’s prove the second proposition of the theorem. Let’s find its kernel: Ker hλ.µ : V (µ) → V (µ) is injective. Proof. Note that hλ.µ = Vλ ∩ V (µ). (6. LINEAR OPERATORS. This is an operator acting from V (µ) to V (µ). Its invariance under the action of operators from P (f) now follows from the theorem 4.µ = {0}. we consider the operator hλ. For the sake of convenience we denote by hλ.λ being the restriction of hλ to the subspace V (λ).5). Let v be an arbitrary vector of the kernel Ker hλ.µ .2. Let v ∈ V (λ). Proof. The surjectivity of this operator and.4) we get hµ (v) = f(v) − µ · v = (λ − µ) · v.3. it follows from the inclusion hλ ∈ P (f). Therefore.µ = {v ∈ V (µ) : hλ (v) = 0} = Ker hλ ∩ V (µ).λ v = hλ v for all v ∈ V (λ). we obtain the following equality for v: (hµ )k v = (λ − µ)k · v = 0. In this case µ = λ. We already know that the subspace V (µ) is invariant under the action of hλ. Combining this equality with (6.5. Due to the above equality we have w = f(v) ∈ V (λ). from the definition of a root subspace we conclude that for any vector v ∈ V (λ) there is a positive integer . we have the other condition v ∈ V (µ) which means that there exists some integer number k > 0 such that (hµ )k v = (f − µ · 1)k v = 0. Let’s prove the first proposition of the theorem.3.4) Simultaneously.5) From (6. Theorem 6.
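For instance, let f be the operator in a two-dimensional space V with a basis e_1, e_2 such that f(e_1) = λ · e_1 and f(e_2) = e_1 + λ · e_2. Then h_λ = f − λ·1 acts as h_λ(e_1) = 0, h_λ(e_2) = e_1, so that (h_λ)^2 = 0. Hence the eigenspace V_λ = Ker h_λ is spanned by e_1, while the root subspace V(λ) = Ker(h_λ)^2 coincides with the whole space V. Thus the root subspace can be strictly larger than the eigenspace, and the restriction of h_λ to V(λ) is a nilpotent operator, in accordance with the theorem 6.3.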

This equality means that hλ.11) The vector wi belongs to the root space V (λi ). As a result we get w1 + . (6.9) Denote hr = f − λr · 1. Therefore we can replace the operators hr in (6. except for i-th term only.10) Due to the permutability of the operators h1.λ is a nilpotent operator in V (λ).11) by their restrictions hr. . + vs . . .8) (6. + V (λs ) = V (λ1 ) ⊕ . . Theorem 6. . . + ws = 0. + V (λs ).7) Then let’s subtract the second expansion from the first one and for the sake of ˜ brevity denote wi = (vi − vi ) ∈ V (λi ).6) is a direct sum we should prove the uniqueness of the following expansion for an arbitrary vector w ∈ W : w = v1 + . for any vector wr in the expansion (6. . where vi ∈ V (λi ). Let’s apply the operator (6.4. . The proof of this theorem is similar to that of theorem 4. . (6. According to the definition of the root subspace V (λr ). . ⊕ V (λs ). ROOT SUBSPACES.§ 6.9) there is some positive integer number kr such that (hr )kr wr = 0. . .10) to both sides of the equality (6. which is invariant under the action of all operators hr in (6. where vi ∈ V (λi ). . .λ )k v = (f − λ · 1)k v = 0. Let’s write this equality in expanded form: s (hr )kr r=i wi = 0 (6.6. Denote by W the sum of subspaces specified in the theorem: W = V (λ1 ) + . (6. Proof. . + vs . hs belonging to the polynomial envelope of the operator f and due to the equality (hr )kr wr = 0 we get fi (wj ) = 0 for all j = i. (6.i to the subspace V (λi ): s (hr.12) . Then all terms in the sum in left hand side of this equality do vanish.11).i )kr r=i wi = 0. . We use this fact and define the operators s fi = r=i (hr )kr . The theorem is proved. Consider another expansion of the same sort for the same vector w: ˜ ˜ ˜ w = v1 + .6) In order to prove that the sum (6. . . . Let λ1 . Then the sum of corresponding root subspaces is a direct sum: V (λ1 ) + . λs be a set of mutually distinct eigenvalues of a linear operator f : V → V . 81 number k such that (hλ. . (6. This yields fi (wi ) = 0.9).


According to the theorem 6.3, the restricted operators h_{r,i} are bijective if r ≠ i. The product (the composition) of bijective operators is bijective. We also know that applying a bijective operator to a nonzero vector we would get a nonzero result. Therefore, (6.12) implies w_i = 0. Then ṽ_i = v_i and the expansions (6.7) and (6.8) do coincide. The uniqueness of the above expansion (6.7) and the theorem in whole are proved.

Theorem 6.5. Let f be a linear operator in a finite-dimensional space V over the field K and suppose that its characteristic polynomial factorizes into a product of linear terms in K. Then the sum of all root subspaces of the operator f is equal to V, i. e. V(λ_1) ⊕ ... ⊕ V(λ_s) = V, where λ_1, ..., λ_s is the set of all mutually distinct eigenvalues of the operator f.

Proof. Since λ_1, ..., λ_s is the set of all mutually distinct eigenvalues of the operator f, for its characteristic polynomial we get
det(f − λ·1) = ∏_{i=1}^{s} (λ_i − λ)^{n_i}.

According to the hypothesis of the theorem, it is factorized into a product of linear polynomials of the form λ_i − λ, where λ_i is an eigenvalue of f and n_i is the multiplicity of this eigenvalue. Let's denote by W the total sum of all root subspaces of the operator f; we know that this is a direct sum (see theorem 6.4): W = V(λ_1) ⊕ ... ⊕ V(λ_s). The root subspaces are nonzero, hence, W ≠ {0}. Further proof is by contradiction. Assume that the proposition of the theorem is false and W ≠ V. The subspace W is invariant under the action of f as a sum of invariant subspaces V(λ_i) (see theorem 3.2). Due to the theorem 4.5 it is invariant under the action of the operator h_λ = f − λ·1 as well. Let's apply the theorem 3.5 to the operator h_λ. This yields

det(f − λ·1) = det(f_W − λ·1) det(f_{V/W} − λ·1).    (6.13)

Here we took into account that 1_W = 1 and 1_{V/W} = 1; we also used the theorem 3.4. The characteristic polynomial of the operator f is the product of the characteristic polynomial of the restricted operator f_W and that of the factoroperator f_{V/W}. The left hand side of (6.13) factorizes into a product of linear polynomials in K, therefore, each of the polynomials in the right hand side of (6.13) should do the same. Let λ_q be one of the eigenvalues of the factoroperator f_{V/W} and let Q ∈ V/W be the corresponding eigenvector. Due to (6.13) the number λ_q is in the list λ_1, ..., λ_s of eigenvalues of the operator f. Due to our assumption W ≠ V we conclude that the factorspace V/W is nontrivial: V/W ≠ {0}, and the coset Q is not zero. Suppose that v ∈ Q is a representative of this coset Q. Since Q ≠ 0, we have v ∉ W. The coset Q is an eigenvector of the factoroperator f_{V/W}, therefore, it should satisfy the following equality:

(f_{V/W} − λ_q·1) Q = Cl_W((f − λ_q·1) v) = 0.    (6.14)


Let’s denote hr = f − λr · 1 for all r = 1, . . . , s (we have already used this notation in proving the previous theorem). The relationship (6.14) means that (f − λq · 1) v = hq (v) = w ∈ W. (6.15)

From the expansion W = V (λ1 ) ⊕ . . . ⊕ V (λs ) for the vector w, which arises in formula (6.15), we get the expansion hq (v) = w = v1 + . . . + vs , where vi ∈ V (λi ). (6.16)

Let's consider the restriction of the operator h_q to the root subspace V(λ_i); this restriction is denoted h_{q,i} (see the proof of theorem 6.4). Due to the theorem 6.3 we know that the operators h_{q,i} : V(λ_i) → V(λ_i) are bijective for all i ≠ q. Therefore, for all v_1, ..., v_s in (6.16) other than v_q we can find ṽ_i ∈ V(λ_i) such that v_i = h_{q,i}(ṽ_i). Let's substitute these expressions into (6.16). Then we get

w = h_q(v) = v_q + ∑_{i≠q} h_q(ṽ_i).    (6.17)

Relying upon this formula (6.17), we define the new vector ṽ_q:

ṽ_q = v − ∑_{i≠q} ṽ_i.    (6.18)

For this vector, from (6.17) we derive h_q(ṽ_q) = v_q ∈ V(λ_q). Due to the definition of the root subspace V(λ_q) there exists a positive integer number k such that (h_q)^k v_q = 0. Hence, (h_q)^{k+1} ṽ_q = 0 and, therefore, ṽ_q ∈ V(λ_q). Returning back to the formula (6.18), we derive

v = ∑_{i=1}^{s} ṽ_i, where ṽ_i ∈ V(λ_i).    (6.19)

From the formula (6.19) and from the expansion W = V(λ_1) ⊕ ... ⊕ V(λ_s) it follows that v ∈ W. But this contradicts our initial choice v ∉ W, which was possible due to the assumption W ≠ V. Hence, W = V. The theorem is proved.

§ 7. Jordan basis of a linear operator. Hamilton-Cayley theorem.

Let f : V → V be a linear operator in a finite-dimensional linear vector space V. Suppose that V is expanded into the sum of root subspaces of the operator f:

V = V(λ_1) ⊕ ... ⊕ V(λ_s).    (7.1)

Let’s denote hi = f − λi · 1. Then denote by hi,j the restriction of hi to V (λj ). According to the theorem 6.3, the restriction hi,i is a nilpotent operator in i-th root subspace V (λi ). Therefore, in V (λi ) we can choose a canonic Jordan basis for this operator (see theorem 5.6). The matrix of the operator hi,i in canonic


Jordan basis is a matrix of the form (5.21) composed by diagonal blocks, where each diagonal block is a matrix of the form (5.14).

Definition 7.1. A Jordan normal basis of an operator f : V → V is a basis composed by canonic Jordan bases of the nilpotent operators h_{i,i} in the root subspaces V(λ_i) of the operator f.

Note that an operator f in a finite-dimensional space V possesses a Jordan normal basis if and only if V is expanded into the sum of root subspaces of the operator f, i. e. if we have (7.1). The theorem 6.5 yields a sufficient condition for the existence of a Jordan normal basis of a linear operator.

Suppose that an operator f in a finite-dimensional linear vector space V possesses a Jordan normal basis. The subspaces V(λ_i) in (7.1) are invariant with respect to f. Let's denote by f_i the restriction of f to V(λ_i). The matrix of the operator f in a Jordan normal basis is a blockwise-diagonal matrix:

F = \begin{pmatrix} F_1 & & & \\ & F_2 & & \\ & & \ddots & \\ & & & F_s \end{pmatrix}.    (7.2)

The diagonal blocks F_i in (7.2) are determined by the operators f_i. Note that the operators f_i and h_{i,i} are related to each other by the equality f_i = h_{i,i} + λ_i · 1. Therefore, F_i is also a blockwise-diagonal matrix:

F_i = \begin{pmatrix} J_{k_1}(λ_i) & & & \\ & J_{k_2}(λ_i) & & \\ & & \ddots & \\ & & & J_{k_r}(λ_i) \end{pmatrix}.    (7.3)

The number of diagonal blocks in (7.3) is determined by the number of chains in a canonic Jordan basis of the nilpotent operator h_{i,i}, while these diagonal blocks themselves are matrices of the following form:

J_k(λ) = \begin{pmatrix} λ & 1 & & 0 \\ & λ & \ddots & \\ & & \ddots & 1 \\ 0 & & & λ \end{pmatrix}.    (7.4)

A matrix of the form (7.4) is called a Jordan block or a Jordan cage with λ on the diagonal. This is a square k × k matrix; if k = 1 this matrix degenerates and becomes a matrix with the single element J_1(λ) = (λ). The matrix of an operator f in a Jordan normal basis presented by the relationships (7.2), (7.3), and (7.4) is called a Jordan normal form of the matrix of this operator. The problem of constructing a Jordan normal basis for a linear operator f and thus finding the Jordan normal form F of its matrix is known as the problem of bringing the matrix of a linear operator to a Jordan normal form.
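For instance, let f be an operator in a three-dimensional complex space whose matrix in some basis e_1, e_2, e_3 is already a Jordan normal form:

F = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} J_2(2) & 0 \\ 0 & J_1(3) \end{pmatrix}.

Here det(f − λ·1) = (2 − λ)^2 (3 − λ), the root subspace V(2) is spanned by e_1 and e_2, the root subspace V(3) is spanned by e_3, and their dimensions 2 and 1 are equal to the multiplicities of the eigenvalues λ_1 = 2 and λ_2 = 3.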

If the matrix of a linear operator can be brought to a Jordan normal form, this fact has several important consequences. The entries on the diagonal of (7.2) are the eigenvalues of the operator f, the i-th eigenvalue λ_i being presented n_i times. Note that a matrix of the form (7.4) is upper-triangular. Therefore, the matrices (7.3) and (7.2) all are upper-triangular matrices. From the course of algebra we know that the determinant of an upper-triangular matrix is equal to the product of all its diagonal elements. Therefore, the characteristic polynomial of an operator possessing a Jordan normal basis is given by the formula

det(f − λ·1) = ∏_{i=1}^{s} (λ_i − λ)^{n_i}, where n_i = dim V(λ_i).    (7.5)

Theorem 7.1. The matrix of a linear operator f in a finite-dimensional linear vector space V over a numeric field K can be brought to a Jordan normal form if and only if its characteristic polynomial factorizes into the product of linear polynomials in the field K.

Proof. The necessity of the condition formulated in the theorem 7.1 is immediate from (7.5). The sufficiency is provided by the theorems 5.6 and 6.5.

In the case of the field of complex numbers C any polynomial factorizes into a product of linear terms. Hence, the matrix of any linear operator in a complex linear vector space can be brought to a Jordan normal form.

Theorem 7.2. The multiplicity of an eigenvalue λ of a linear operator f in a finite-dimensional linear vector space V is equal to the dimension of the corresponding root subspace V(λ).

Proof. For the operator f, the characteristic polynomial of which factorizes into the product of linear terms, the proposition of theorem 7.2 immediately follows from the formula (7.5). However, this fact is valid also in the case of partial factorization. Such a case can be reduced to the case of complete factorization by means of the field extension technique. We do not consider the field extension technique in this book. But it is worth to note that the complete proof of the following Hamilton-Cayley theorem is also based on that technique.

Theorem 7.3. Let P(λ) be the characteristic polynomial of a linear operator f in a finite-dimensional space V. Then P(f) = 0.

Proof. We shall prove the Hamilton-Cayley theorem for the case where the characteristic polynomial P(λ) factorizes into the product of linear terms:

P(λ) = ∏_{i=1}^{s} (λ_i − λ)^{n_i}.    (7.6)

Denote h_i = f − λ_i · 1 and denote by h_{i,j} the restriction of h_i to the root subspace V(λ_j). Then from the formula (7.6) we derive

P(f) = ∏_{i=1}^{s} (h_i)^{n_i}.

Let's apply P(f) to an arbitrary vector v ∈ V. Due to the theorem 6.5 we can expand v into a sum v = v_1 + ... + v_s, where v_i ∈ V(λ_i). Therefore, we have

P(f) v = P(f) v_1 + ... + P(f) v_s.    (7.7)

The root subspace V(λ_j) is invariant under the action of the operators h_i. Then

P(f) v_j = ∏_{i=1}^{s} (h_i)^{n_i} v_j = ∏_{i=1}^{s} (h_{i,j})^{n_i} v_j.

Using the permutability of the operators h_i and their restrictions h_{i,j}, we can bring the above expression for P(f) v_j to the following form:

P(f) v_j = ∏_{i≠j} (h_{i,j})^{n_i} (h_{j,j})^{n_j} v_j.    (7.8)

The operator h_{j,j} is a nilpotent operator in the subspace V(λ_j) and n_j = dim V(λ_j). Therefore, we can apply the theorem 5.7. As a result we obtain (h_{j,j})^{n_j} v_j = 0. Hence, from (7.7) and (7.8) for an arbitrary vector v ∈ V we derive P(f) v = 0. This proves the theorem for the special case, where the characteristic polynomial of the operator f factorizes into a product of linear terms. The general case is reduced to this special case by means of the field extension technique, which we do not consider in this book.
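To illustrate the theorem 7.3, consider the operator f in a two-dimensional space given by the matrix

F = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.

Its characteristic polynomial is P(λ) = det(F − λ·1) = λ^2 − 5 λ − 2, and a direct calculation yields

P(F) = F^2 − 5 F − 2·1 = \begin{pmatrix} 7 & 10 \\ 15 & 22 \end{pmatrix} − \begin{pmatrix} 5 & 10 \\ 15 & 20 \end{pmatrix} − \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = 0,

in agreement with the Hamilton-Cayley theorem.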

3.1) . Definition 1. (1. Let V be a linear vector space over a numeric field K.4 in Chapter I: dim V ∗ = dim V . Comparing these two definitions. The definition of a linear functional is quite similar to the definition of a linear mapping (see definition 8. The product of the functional f by a number α ∈ K is a functional h whose values are determined by formula h(v) = α · f(v) for all v ∈ V . . Vectors and covectors. it would be worth to formulate these two definitions especially for the present case of linear functionals. which is called the dual space or the conjugate space for the space V . The structure of a linear vector space in V ∗ = Hom(V. as linear mappings from V to K. (2) f(α · v) = α f(v) for any v ∈ V and for any α ∈ K.CHAPTER III DUAL SPACE. it is determined only by V since K is known whenever V is given (see definition 2. W ) is usually determined by two spaces V and W . . K) is a linear vector space over the same numeric field K as V . Linear functionals. Dual space. The sum of functionals f and g is a functional h whose values are determined by formula h(v) = f(v) + g(v) for all v ∈ V . Let f be a linear functional of V ∗ .1 in Chapter I). en be a basis in V . Thereby the numeric field K is treated as a linear space of the dimension 1 over itself. A numeric function y = f(v) with vectorial argument v ∈ V and with values y ∈ K is called a linear functional if (1) f(v1 + v2 ) = f(v1 ) + f(v2 ) for any two v1 .2 in Chapter I). . K). the dual space is an exception V ∗ = Hom(V. However. . K) is determined by two algebraic operations: the operation of pointwise addition and pointwise multiplication by numbers (see definitions 10. v2 ∈ V . .1 and 10. However. If V is finite-dimensional. K).2. § 1. then the dimension of the conjugate space is determined by the theorem 10. constitute the space Hom(V. Let f and g be two linear functionals of V ∗ . conversely. The dual space Hom(V.1 in Chapter I). The space of homomorphisms Hom(V. Definition 1. + v n · en . Let V be a finite-dimensional vector space over a field K and let e1 . V ∗ = Hom(V. K) is denoted by V ∗ . Definition 1. . we see that any linear functional f is a linear mapping f : V → K and. Linear functionals. Then each vector v ∈ V can be expanded in this basis: v = v 1 · e1 + .1. any such linear mapping is a linear functional. Thus.

+ vn f(en ) = f( e1 ) h1 (v) + . it is trivial and coordinate functionals h1. . f(en ) are numeric coefficients from K and v is an arbitrary . . . Theorem 1. This means that each basis e1. . their coordinates are added. Proof. en . Let f ∈ V ∗ be an arbitrary linear functional and let v be an arbitrary vector of V . . Coordinate functionals h1. . . (1.3) Right hand side of (1. when the basis is fixed. . Its value when applied to the base vector ej is equal to zero. en. . we have α1 h1(ej ) + . . Due to the uniqueness of the expansion (1.4) in Chapter I). hn are linearly independent. Therefore. DUAL SPACE. However.4) Now we use the relationships of biorthogonality (1. . . . we choose more explicit way and directly prove that coordinate functionals h1 . + f(en ) hn (v).3) are zero. then its j-th component is equal to unity. Due to these relationships among n terms h1(ej ). If we expand the vector ej in the basis e1 . . Hence. . . . The proof of the relationships of biorthogonality is very simple. + αn hn (ej ) = 0. .2) i where δj is the Kronecker symbol. . all coefficients of the linear combination (1. Here f(e1 ). hi (ej ) = 1 if i = j and hi (ej ) = 0 in all other cases. . Note that hi (ej ) is a number equal to i-th component of the vector ej . (1. hn are linearly independent. . . . .2) are called the relationships of biorthogonality. . v i is a number uniquely determined by the vector v. Let’s consider i-th coordinate of the vector v.1) we derive f(v) = v1 f(e1 ) + . . . hn span the dual space V ∗ . . In order to complete the proof of the theorem now we could use the equality dim V ∗ = dim V = n and refer to the item (4) of the theorem 4. . e. . . we can consider a map hi : V → K. . . i. They satisfy the relationships i hi (ej ) = δj . . When adding vectors. . . Let’s consider a linear combination of the coordinate functionals associated with a basis e1.4) only one term is nonzero: hj (ej ) = 1. hi : V → K is a linear mapping. .3) is zero functional. hn(ej ) in left hand side of the equality (1. + αn · hn = 0. . (1. . . hn are called the coordinate functionals the basis e1. they form a basis in dual space V ∗ .1. . . while all other components are equal to zero. its coordinates are multiplied by that number (see the relationships (5.4) reduces to αj = 0. en in V and assume that it is equal to zero: α1 · h1 + . (1. . Then from (1. Therefore. when multiplying a vector by a number. These relationships (1. Therefore. en of a linear vector space V determines n linear functionals in V ∗ .5 in Chapter I.88 CHAPTER III. Hence. . . The functionals h1 .1). Hence. . . But j is an index that runs from 1 to n.2). defining it by formula hi (v) = vi . .

. DUAL SPACE. . . (f1 + f2 )(v) = f1 (v) + f2 (v). en . . .7) (1. . . . (1. or even the scalar product of a vector and a covector. the above equality can be rewritten as an equality of linear functionals in the conjugate space V ∗ : f = f(e1 ) · h1 + . . . 89 vector of V . The numbers f1. . The fact that in the writing f(v) the functional plays the role of a function.5) The formula (1. Hence. As we see in formula (1. en . Functionals from the dual space V ∗ are called covectors. . This follows from the relationships (1. The basis h1. in the definition 1. This is purely terminological trick. . .6) are the coefficients of the expansion of f in the conjugate basis h1. Definition 1. α·f |v = α f |v .5.5).5) shows that an arbitrary function f ∈ V ∗ can be represented as a linear combination of coordinate functionals h1. while the conjugate basis is treated as an auxiliary and complementary thing. being linearly independent. en in V is called the dual basis or the conjugate basis for e1. the numbers (1. . . .8) is associated with the special terminology. sometimes the quantity f(v) is denoted differently: f(v) = f | v . . . . (1. which are now written as f1 + f 2 | v = f 1 | v + f 2 | v . Therefore. The theorem is proved. . or the contraction. LINEAR FUNCTIONALS. . f(α · v) = α f(v).9) We have already dealt with the concept of bilinearity earlier in this book (see theorem 1. hn . Let f be a linear functional in a finite-dimensional space V and let e1.8) possesses the property of bilinearity: it is linear in its first argument f and in its second argument v.5 they are mentioned as the components of f in the basis e1. . (α · f)(v) = α f(v). The algebraic operations of addition and multiplication by numbers in the spaces V and V ∗ are related to each other by the following equalities: f(v1 + v2 ) = f(v1 ) + f(v2 ). . .4. . Definition 1. However. . fn determined by the linear functional f according to the formula fi = f(ei ) are called the coordinates or the components of f in the basis e1.7). (1. while the vector is written as an argument is not so important. The scalar product (1. . . . f | v 1 + v2 = f | v 1 + f | v 2 . + f(en ) · hn .§ 1. . f |α·v = α f |v . . . it means that we consider e1. Therefore. . en be a basis in this space. . . they form a basis in V ∗ .6) Vectors and linear functionals enter these equalities in a quite similar way. . en. . . . . . (1. .8) The writing (1.1 in Chapter II). hn . while the expression f | v itself is called the pairing. en as a primary basis. hn in V ∗ formed by coordinate functionals associated with a basis e1 .

. . . . es+1 the linear span of this system of vectors. etc. . . The properties (1. If W = V .2. es in a subspace U . . . . Proof. . A vector v and a covector f are called orthogonal to each other if their scalar product is zero: f | v = 0.8) is not symmetric: its arguments belong to different spaces — they cannot be swapped.2. . .8) are always written on the left and vectors are always on the right. Let V be a linear vector space over the field K and let W = V ∗ be the conjugate space of V . Thus we define a mapping ϕv : V ∗ → K. Being zero on the basis vectors of the subspace U . Let’s add the vector v to basis vectors e1. Let dim V = n and dim U = s. . Covectors in the scalar product (1. s.9) of the scalar properties (1. . We know that W is also a linear vector space over the field K. Let’s consider the case U = {0} in the above theorem. es and denote it v = es+1 . With respect to V this is the double conjugate space V ∗∗ . . Let’s denote f = hs+1 . . Its value on the vector v is equal to unity. . For any vector v = 0 in a finite-dimensional space V there is a linear functional f in V ∗ such that f(v) = 0. its dimension is one as greater than the dimension of U . . see the item (4) of the theorem 3. .7. . in contrast to that «geometric» scalar product.2) we derive f(v) = hs+1 (es+1 ) = 1 and f(ei ) = 0 for i = 1. . However. Corollary. Let v ∈ V . Denote by W = e1 . then we complete the basis e1 . . . Definition 1. Let U V be a subspace in a finite-dimensional vector space V and let v ∈ U . fourth conjugate. the scalar product (1. es+1 up to a basis e1. that in the case of finite-dimensional spaces there is no need to consider the multiple conjugate spaces. it possesses its own conjugate space W ∗ . The extended system of vectors is linearly independent since v ∈ U . . . and we can formulate the following corollary of the theorem 1. the functional f = hs+1 vanishes on all vector u ∈ U . hn associated with this base. . which is linear due to the following relationships: ϕv (f1 + f2 ) = (f1 + f2 )(v) = f1 (v) + f2 (v) = ϕv (f1 ) + ϕv (f2 ). Theorem 1. However. To any f ∈ V ∗ we associate the number f(v) ∈ K. . Let’s choose a basis e1. . soon we shall see.1 in Chapter I. . Then for any nonzero vector v we have v ∈ U . Then there exists a linear functional f in V ∗ such that f(v) = 0 and f(u) = 0 for all u ∈ U .8) are analogous to the properties of the scalar product of geometric vectors — it is usually studied in the course analytic geometry (see [5]). Therefore. ϕv (α · f) = (α · f)(v) = α f(v) = α ϕv (f). It is clear that W is a subspace of V comprising the initial subspace U . es+1 form a basis in W . The following definition is dictated by the intension to strengthen the analogy of (1. . We can also consider triple conjugate. DUAL SPACE. . en in the space V and consider the coordinate functionals h1. The vectors e1.90 CHAPTER III.8) and traditional «geometric» scalar product. Then from the relationships of biorthogonality (1. . . Thus we would have the infinite sequence of conjugate spaces.
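For instance, let V be a two-dimensional space with a basis e_1, e_2, let U be the subspace spanned by e_1, and let v = e_2, so that v ∉ U. Then the coordinate functional f = h^2 satisfies f(u) = 0 for all u ∈ U and f(v) = h^2(e_2) = 1 ≠ 0, exactly as stated in the theorem 1.2.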

Therefore. LINEAR FUNCTIONALS. DUAL SPACE. CopyRight c Sharipov R. In order to prove the surjectivity of the mapping (1. This means that Ker h = {0} and h is an injective mapping. from (1. This would contradict the above condition (1. In order to prove this fact we should verify the following identities for this mapping h: h(v1 + v2 ) = h(v1 ) + h(v2 ). in order to verify the equalities (1. 2004. Proof. h(α · v) = α h(v). The functional ϕv is determined by a vector v ∈ V . where h(v) = ϕv for all v ∈ V.10). Therefore. ϕv is a linear functional in the space V ∗ or.11) we should apply both sides of these equalities to an arbitrary covector f ∈ V ∗ and check the coincidence of the results that we obtain: h(v1 + v2 )(f) = ϕv1 +v2 (f) = f(v1 ) + f(v2 ) = ϕv1 (f)+ + ϕv2 (f) = h(v1 )(f) + h(v2)(f) = (h(v1 ) + h(v2 ))(f). we define a mapping h : V → V ∗∗ . For a finite-dimensional linear vector space V the mapping (1.10) is bijective. Since Im h is a subspace of V ∗∗ and dim V ∗∗ = dim V ∗ = dim V .10) The mapping (1. this means that ϕ is a linear functional in the space V ∗ . in other words. h(α · v)(f) = ϕα·v (f) = f(α · v) = α f(v) = = α ϕv (f) = α h(v)(f) = (α · h(v))(f). Theorem 1.5 in Chapter I). then there would be a functional f such that f(v) = 0.2. Let v be an arbitrary vector of Ker h.12) Now let’s apply the corollary of the theorem 1. We have already proved that dim(Ker h) = 0.10) is a linear mapping.10) we derive h(v)(f) = ϕv (f) = f(v) = 0 for all f ∈ V ∗ . First of all we shall prove the injectivity of the mapping (1. (1. 91 Hence. v = 0 by contradiction. Hence. we have Im h = V ∗∗ (see item (3) of theorem 4.11) The result of applying h to a vector of the space V is an element of double conjugate space V ∗∗ . when associating ϕv with a vector v. This completes the proof of surjectivity of the mapping h and the proof of the theorem in whole. . But ϕv ∈ V ∗∗ . the equality ϕv = 0 means that ϕ(f) = 0 for any covector f ∈ V ∗ . Using this equality. This isomorphism is called canonic isomorphism of these spaces. 1996.§ 1. (1.A. It is an isomorphism of the spaces V and V ∗∗ . dim(Im h) = dim V . If the vector v would be nonzero. it is an element of double conjugate space. Hence. (1. Then ϕv = h(v) = 0..4 from Chapter I.12). Therefore.3.10) we use the theorem 9. According to this theorem dim(Ker h) + dim(Im h) = dim V. For this purpose we consider its kernel Ker h.

. . en to the new ˜ basis e1 . . . . The reason is that the dual space V ∗ and the dual bases are treated as complementary objects with respect to the initial space V and its bases. .13) distinguishes canonic isomorphism among all other isomorphisms relating the spaces V and V ∗∗ . Indeed. . The components h of these two transition matrices S and P are used to expand the vectors of «wavy» bases in corresponding «non-wavy» bases: n n i Sj · e i . e Let S be the transition matrix for passing from the old basis e1 . . . e Proof.1. . Similarly. Let e1. . . . . then a choice of basis and a change of basis in V ∗ are quite the same as in any other linear vector space. Transformation of the coordinates of a covector under a change of basis.1) does not break the rules of tensorial notation: the free index r is in the same upper position in both sides of the equality. .92 CHAPTER III. ˜n in V we immediately get another conjugate basis h1. The transition matrix P for passing from the old conjugate basis h1. The theory of this space should be understood as an extension of the theory of initial space V . the formula (2. . denote by P the transition matrix for passing from e ˜ the old dual basis h1. ej = ˜ i=1 ˜r = h s=1 r P s · hs . . . (1. . .2). . en be a basis in a linear vector space V . the summation index s enters twice — once as an upper index and for the second time as a lower index. Let V be a finite-dimensional linear vector space and let V ∗ be the associated dual space. hn . Each such basis of V has the associated basis of coordinate functionals in V ∗ . .5) in Chapter I: the vectors of dual bases in (2. However. .13) The equality (1. § 2. .13) is derived from the definition of h. . We have already seen such i deviations from the standard notations in constructing the basis vectors Ej in Hom(V. hn to the new conjugate basis ˜ 1 .4 in Chapter I). DUAL SPACE. If we treat V ∗ separately forgetting its relation to V . . Choosing another basis ˜ ˜ ˜ e1 .1) into these relationships.1) are specified by upper indices despite to the usual convention of enumerating the basis vectors. . ˜n. . . . the conjugate space V ∗ is practically never considered separately. In spite of the breaking the standard rules in indexing the basis vectors. .1) differs from the standard given by formula (5. . hn in V ∗ . hn to the new dual basis ˜ 1. Canonic isomorphism (1. . we get n r δj = hr (ej ) = s=1 i=1 n r i Ps Sj hs (ei ) = s=1 i=1 n n r i s P s Sj δ i = i=1 n i Pir Sj .10) possesses the property that for any vector v ∈ V and for any covector f ∈ V ∗ the following equality holds: h(v) | f = f | v . . ˜ n is inverse to the transih h tion matrix S that is used for passing from the old basis e1 . The relationship (1. . . en to the new ˜ basis e1 . . Substituting (2. . .1) Note that the second formula (2. . (2. W ) (see proof of the theorem 10. Theorem 2. In order to prove this theorem we use the biorthogonality relationships (1. . h(v) | f = h(v)(f) = ϕv (f) = f(v) = f | v . . . . ˜n. .

6) without using the conjugate bases.1) in Chapter I. .2. en . Theorem 2. .3): n n n s Sr · ˜ r h n s Sr f s r=1 s=1 f= s=1 fs · = r=1 · ˜r.2) ei = j=1 Tij · ˜j . ˜j = e i=1 n ˜ hr = s=1 n r Ts · h s . . Therefore. s Sr (2. . The second formula (2. fs = j=1 r ˜ Ts f r . . . Let’s consider its expansions ˜ in two conjugate bases h1.3. e h = r=1 s ·h . . Theorem 2. . while T = S −1 is the inverse transition matrix.3) and derive the first formula (2. (2. in order to write the complete set of formulas relating two pairs of bases in V and V ∗ it is sufficient to know two matrices S and T = S −1 : n n i Sj · e i .4) we substitute the fourth expression (2.4) where S is the direct transition matrix for passing from e1. . . The coordinates of a covector f in two dual bases h1 .4) is derived similarly. . .5 and from formula (1. en to the «wavy» ˜ basis ˜1. f = r=1 ˜ h fr · ˜ r . .§ 2. en and e1 .4).3) also differ from the standard introduced by formula (5. TRANSFORMATION OF THE COORDINATES OF A COVECTOR . . e Proof. (2. . .3) The expansions (2. . . . . Note that the formulas (2. . 93 The above relationship can be written in matrix form P S = 1. . . This means that P = S −1 . . . . ˜ n: h n n f = s=1 f s · hs . + f n v n . Remember that the inverse transition matrix T is also the inverse matrix for S. To the coordinates of covectors the other standard is applied: they are specified by lower indices and are written in row vectors. hn associated with the bases e1 . The theorem is proved. h Then we compare the resulting expansion of f with the second expansion (2. . . The scalar product of a vector v and a covector f is determined by their coordinates according to the formula n f |v = fi v i = f 1 v 1 + . . . . hn and h1 . i=1 (2. ˜r Let f be a covector from the conjugate space V ∗ . .2) for hs into the first expansion (2. . . In order to prove the first relationship (2.5) . . hn ˜ ˜ and ˜ 1.4) can be derived immediately from the definition 1. . ˜n in V are h e related to each other by formulas n n s Sr s=1 ˜ fr = fs .
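For instance, let dim V = 2 and let the new basis be ẽ_1 = e_1, ẽ_2 = e_1 + e_2, so that

S = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},    T = S^{−1} = \begin{pmatrix} 1 & −1 \\ 0 & 1 \end{pmatrix}.

Then the new dual basis is h̃^1 = h^1 − h^2, h̃^2 = h^2, and for a covector f = f_1 · h^1 + f_2 · h^2 the formulas (2.4) yield f̃_1 = f_1 and f̃_2 = f_1 + f_2. This agrees with the definition 1.5: f̃_1 = f(ẽ_1) = f(e_1) = f_1 and f̃_2 = f(ẽ_2) = f(e_1) + f(e_2) = f_1 + f_2.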

. . hn conjugated to the basis e1 . f2 ∈ S ⊥ . e. where vi ∈ S.5) we use the relationship (1. (3.1) Let f ∈ S ⊥ . then f1 | v = 0 and f2 | v = 0 for all v ∈ S. From the condition f ∈ (S2 )⊥ we derive f | v = 0 for any v ∈ S2 . . .1). for all vectors v ∈ S we derive the equality f1 + f2 | v = f1 | v + f2 | v = 0 which means that f1 + f2 ∈ S ⊥ . The above definition of the orthogonal complement S ⊥ can be expressed by the formula S ⊥ = {f ∈ V ∗ : ∀v ((v ∈ S) ⇒ ( f | v = 0))}. This means that α · f ∈ S ⊥ . where v is expanded. where S is the linear span of S.1 we consider an arbitrary covector f of (S2 )⊥ . (2) S1 ⊂ S2 implies (S2 )⊥ ⊂ (S1 )⊥ . . i. (3) S ⊥ = S ⊥ . then f | v = 0 for all v ∈ S. therefore. In order to prove it let’s remember that the linear span S consists of all possible linear combinations of the form v = α1 · v1 + . Definition 3. In order to prove the third item of the theorem note that the linear span of S comprises this set: S ⊂ S . Hence.1) we derive f | v = α1 f | v1 + . Now assume that f ∈ S ⊥ . . ⊥ (4) i∈I Si = i∈I (Si )⊥ .1 is proved. Then f | v = 0 for all vectors v ∈ S. this applies to the vectors vi in the expansion (3. . § 3. . f | vi = 0. . Thus. which we have already proved.6): n n f | v = f(v) = f(ei ) vi = i=1 i=1 fi v i . The orthogonal complement of the subset S in the conjugate space V ∗ is the set S ⊥ ⊂ V ∗ composed by covectors each of which orthogonal to all vectors of S. The operation of constructing orthogonal complements of subsets S ⊂ V in the conjugate space V ∗ possesses the following properties: (1) S ⊥ is a subspace in V ∗ . In order to prove the inclusion (S2 )⊥ ⊂ (S1 )⊥ in the second item of the theorem 3. In order to prove (2. Therefore. + αr f | vr = 0. Proof. the equality f | v = 0 holds for any v ∈ S1 .1. In (2. But S1 ⊂ S2 . Orthogonal complements in a dual space. Applying the item (2) of the theorem. Now we need the opposite inclusion S ⊥ ⊂ S ⊥ . . the first item in the theorem 3. Let f1 . DUAL SPACE. For this purpose we should verify two conditions from the definition of a subspace. Let’s prove the first item of the theorem for the beginning. for the covector α · f we defive α · f | v = α f | v = 0. Theorem 3.94 CHAPTER III. In particular. Then f ∈ (S1 )⊥ . . we obtain the inclusion S ⊥ ⊂ S ⊥ . . + αr · vr . . This means that f ∈ (S2 )⊥ implies f ∈ (S1 )⊥ . Then from (3.1. The required inclusion is proved.5) and in the above calculations f is assumed to be expanded in the basis h1 . Proof. en. Let S be a subset in a linear vector space V .

95 This means that f | v = 0 for all v ∈ S .1. This means that the equality f | v = 0 holds for all ˜ vectors v in the union of all sets Si . Theorem 3. Now let’s proceed to the proof of the fourth item of the theorem 3. Let f ∈ S ⊥ . here we omit this proof. Let S be a subset of conjugate space V ∗ . Then f | v = 0 for all v ∈ S. Conversely. The orthogonal complement of S in V is the set S ⊥ ∈ V formed by vectors each of which is orthogonal to all covectors of the set S.1. from the inclusion f ∈ (Si )⊥ for all i ∈ I we derive f | v = 0 for all v ∈ Si and for all i ∈ I. we have proved that S = S. ˜ S= i∈I (Si )⊥ . The above definition of the orthogonal complement S ⊥ ⊂ V can be expressed by the formula S ⊥ = {v ∈ V : ∀f ((f ∈ S) ⇒ ( f | v = 0))}.2 which says that there is / a linear functional f such that it vanishes on all vectors u ∈ U and is nonzero on the vector v0 . Then for any w ∈ W and for any u ∈ U we have the orthogonality w | u = 0. This proves the opposite inclusion S ⊥ ⊂ S ⊥ and. therefore. Suppose that W = U ⊥ in the sense of definition 3. ⊥ (4) i∈I Si = i∈I (Si )⊥ . Definition 3.2.§ 3. thus. But Si ⊂ S for any i ∈ I. so we have the contradiction to the . Thus. The operation of constructing orthogonal complements of subsets S ⊂ V ∗ in V possesses the following four properties: (1) S ⊥ is a subspace in V . f | v = 0 for all v ∈ Si and for all i ∈ I.1.1 is equivalent to the condition U = W ⊥ in the sense of definition 3. (2) S1 ⊂ S2 implies (S2 )⊥ ⊂ (S1 )⊥ . Theorem 3. Let’s do it by contradiction. Let V be a finite-dimensional vector space and suppose that we have a subspaces U ⊂ V and a subspace W ⊂ V ∗ . completes the proof of the equality S ⊥ = S ⊥ .2. The the condition W = U ⊥ in the sense of definition 3.3. Therefore. However. Proof. Suppose that U = W ⊥ . Thus. This means that f belongs to each of the orthogonal complement (Si )⊥ . In this case we can apply the theorem 1.1. Hence. where S is a linear span of S. ⊥ ˜ The theorem is proved. Then f ∈ W and f | v0 = 0. it belongs to their intersection. ORTHOGONAL COMPLEMENTS IN A DUAL SPACE. The proof of this theorem almost literally coincides with the proof of the theorem 3. For this purpose we introduce the following notations: S= i∈I Si . This proves the converse inclusion S ⊂ S ⊥ . ˜ we have proved the inclusion S ⊥ ⊂ S. (3) S ⊥ = S ⊥ . we need to prove the coincidence U = W ⊥ . Then there is a vector v0 such that v0 ∈ W ⊥ and v0 ∈ U . Therefore.2. By definition W ⊥ is the set of all vectors v ∈ V such that w | v = 0 for all covectors w ∈ W . For this orthogonal complement one can formulate a theorem quite similar to the theorem 3. u ∈ U implies u ∈ W ⊥ and we have the inclusion U ⊂ W ⊥ .

. let U = W ⊥ . conversely.3) are derived from (3. . . . It this case it says that there is / a linear functional ϕ in V ∗∗ such that it vanishes on W and is nonzero on the covector f0 . (W ⊥ )⊥ = W. The basis e1.5). (3. . By construction of the basis e1 .3) These relationships (3. Theorem 3. . Hence. . . U = W ⊥ and U = W ⊥ implies W = U ⊥ .2.2) the second case U ⊂ V ∗ in the theorem 3. DUAL SPACE. Remember that we have the canonic isomorphism h : V → V ∗∗ . . . es in the subspace U and complete it up to a basis e1. We apply h−1 to ϕ and get the vector v = h−1(ϕ).1 and 3. We choose a basis e1. . . en the subspace U consists of vectors the initial s coordinates of which are deliberate. . . vs . . w ∈ W implies w ∈ U ⊥ and we have the inclusion W ⊂ U ⊥ . Assume that W = U ⊥ . This contradiction proves that U = W ⊥ . by contradiction.2) For arbitrary subsets S ∈ V and R ∈ V ∗ (not subspaces) in the case of a finite-dimensional space V we have the relationships (S ⊥ )⊥ = S . . en in the subspace V . This contradicts to the condition f0 ∈ U ⊥ . en and if we specify covectors by their coordinates in dual basis h1. Now. . then we can apply the formula (2. . we consider only the first case U ⊂ V . (3. . . Due to the relationships (3. This is the case if and only if the first s coordinates of the covector f are zero. . By definition U ⊥ is the set of all covectors f perpendicular to all vectors u ∈ U .3 can be reformulated as follows: in the case of a finite-dimensional space V for any subspace U ∈ V and for any subspace W ∈ V ∗ the following relationships are valid: (U ⊥ )⊥ = U. . . then dim U + dim U ⊥ = dim V . hn in V ∗ . As a result we have proved that W = U ⊥ implies U = W ⊥ .13) which yields v ∈ U and f0 | v = 0. The proposition of the theorem 3. Then there is a covector f0 ∈ U ⊥ such that f0 ∈ W . . . Therefore.2) by using the item (3) in theorems 3. . Then for any w ∈ W and for any u ∈ U we have the orthogonality w | u = 0. Let dim V = n and dim U = s. Proof. In the case of a finite-dimensional linear vector space V if U is a subspace of V or if U is a subspace of V ∗ . Therefore the condition f ∈ U ⊥ means that the equality s f |v = fi v i = 0 i=1 should be fulfilled identically for any numbers v 1 . while the remaining n − s coordinates are equal to zero. We shall do it again by contradiction. . The theorem is completely proved. . . Other n − s coordinates .4. en determines the conjugate basis h1.4 is reduced to the first case U ⊂ V if we replace U by U ⊥ . (R⊥ )⊥ = R . hn .2. Then we take into account (1. condition v0 ∈ W ⊥ . Hence. If we specify vectors by their coordinates in the basis e1 . Next step is to prove the coincidence W = U ⊥ . .96 CHAPTER III. . Let’s apply the theorem 1. .

A linear mapping ϕ : W ∗ → V ∗ is called a conjugate mapping for f if for any v ∈ V and for any w ∈ W ∗ the relationship ϕ(w) | v = w | f(v) is fulfilled. V ⊥ = {0}. (3. we derive the equality ⊥ (Ui )⊥ i∈I = i∈I ((Ui )⊥ )⊥ = i∈I Ui .5) follows from the first one upon substituting Ui by (Ui )⊥ . Let f : V → W be a linear mapping from V to W . . we have the required identity dim U + dim U ⊥ = dim V . (3. (V ∗ )⊥ = {0}. For the dimension of the subspace U ⊥ this yields dim U ⊥ = n − s. Theorem 3. The problem of the existence of a conjugate mapping is solved by the definition 4. . Indeed. In the sense of this reasoning the defining relationship for a conjugate mapping is written as follows: h(v) = h | v = ϕ(w) | v = w | f(v) . 97 of f are deliberate.4) can be proved immediately without using the finite-dimensionality of V .5.1 itself.4) All these equalities have the transparent interpretation. Now it is sufficient to pass to orthogonal complements in both sides of this equality and apply (3. .4) uses the corollary of the theorem 1. The sum of subspaces is the span of their union. As an immediate consequence of this theorem we get {0}⊥ = V. while this theorem assumes V to be a finite-dimensional space. hence. applying (3. Indeed. hn . CONJUGATE MAPPING. This means that the subspace U ⊥ is the linear span of the last n − s basis vectors of the conjugate basis: U ⊥ = hs+1 . § 4. Therefore.1. The second relationship (3. The theorem is proved The theorem 3. Definition 4.4 is known as the theorem on the dimension of orthogonal complements.2).5) Proof. .2. But to specify a functional in V ∗ this means that we should specify its action upon an arbitrary vector v ∈ V .§ 4. The proof of the last equality (3.2) again. the first relationship (3.2. The first three of the equalities (3. {0}⊥ = V ∗ . The theorem is proved.5) is an immediate consequence of the items (3) and (4) in the theorems 3. i∈I Ui = i∈I (Ui )⊥ . In the case of a finite-dimensional space V for any family of subspaces in V or in V ∗ the following relationships are fulfilled ⊥ ⊥ Ui i∈I = i∈I (Ui )⊥ . in order to define a mapping ϕ : W ∗ → V ∗ for each functional w ∈ W ∗ we should specify the corresponding functional h = ϕ(w) ∈ V ∗ .1 and 3. . The finite-dimensionality of V here is not used. Conjugate mapping.

Ker f = (Im f ∗ )⊥ . (4. We shall not give these calculations here since in what follows we shall not use the above properties at all. Theorem 4. Im f ∗ = (Ker f)⊥ . Im f = (Ker f ∗ )⊥ . This means that the conjugate mapping ϕ is a linear mapping. Proof. The operation of passing from f to its conjugate mapping f ∗ possesses the following properties: (f + g)∗ = f ∗ + g∗ . (f ◦ g)∗ = g∗ ◦ f ∗ . But the vectors of the form f(v) ∈ W constitute the image Im f.1 for the conjugate mapping ϕ : W ∗ → V ∗ we have the following relationships: ϕ(w1 + w2)(v) = w1 + w2 | f(v) = w1 | f(v) + ϕ(α · w)(v) = α · w | f(v) = α w | f(v) = + w2 | f(v) = ϕ(w1 )(v) + ϕ(w2)(v) = (ϕ(w1 ) + ϕ(w2))(v). In the case of finite-dimensional spaces V and W the kernels and images of the mappings f : V → W and f ∗ : W ∗ → V ∗ are related as follows: Ker f ∗ = (Im f)⊥ . 1996. DUAL SPACE. It is usually denoted ϕ = f ∗ . the kernel Ker f ∗ is the set of covectors orthogonal to the vectors of the form f(v). . = α ϕ(w)(v) = (α · ϕ(w))(v). It is easy to verify that the above equality defines a linear functional h = h(v): h(v1 + v2 ) = w | f(v1 + v2 ) = w | f(v1 ) + f(v2 ) = = w | f(v1 ) + w | f(v2 ) = h(v1 ) + h(v2 ). w ∈ Ker f ∗ is equivalent to the equality f ∗ (w)(v) = 0 for all v ∈ V . As we have seen above.1.. For a linear mapping f : V → W from V to W the conjugate mapping ϕ : W ∗ → V ∗ is also linear.2.98 CHAPTER III. All three of the above properties are proved by direct calculations on the base of the definition 4. h(α · v) = w | f(α · v) = w | α · f(v) = α w | f(v) = α h(v). Theorem 4. Hence. Therefore. Since v ∈ V is an arbitrary vector of V from the above calculations we obtain ϕ(w1 + w2) = ϕ(w1 ) + ϕ(w2) and ϕ(α · w) = α · ϕ(w). CopyRight c Sharipov R. Due to the relationship 4. The first two properties are naturally called the linearity.A. 2004. (α · f)∗ = α · f ∗ . The last third property makes the operation f → f ∗ an analog of the matrix transposition. The kernel Ker f ∗ is the set of linear functionals of W ∗ that are mapped to the zero functional in V ∗ under the action of the mapping f ∗ . As a result of simple calculations we obtain f ∗ (w)(v) = f ∗ (w) | v = w | f(v) = 0.1) Proof. the conjugate mapping ϕ : W ∗ → V ∗ for a mapping f : V → W is unique.1.

the prove of such strengthened versions of these theorems is based on the axiom of choice (see [1]). . The matrices of the mappings f and f ∗ determined by the relationships (4. In proving this relationship we did not use the finite-dimensionality of W . .1) is proved. .2. . . 99 Therefore.3. . .2) for this purpose: m m ˜ i | f(ej ) = h k=1 n Fjk ˜ i | ˜k = h e i Fjk δk = Fji.3. This choice uniquely determines e ˜ the conjugate bases h1. The matrices of the mappings f and f ∗ are determined by the expansions: m n f(ej ) = k=1 Fjk ˜k . hm in V ∗ and W ∗ . The third and the fourth relationships are derived from the first and the second ones by means of the theorem 3. . In some theorems of this chapter the restrictions to the finitedimensional case can be removed. Let’s consider a h mapping f : V → W and the conjugate mapping f ∗ : W ∗ → V ∗ . q (4. However. Fji = Φi . e. Let’s choose a basis e1 .1) is proved. . .3). In order to prove the second relationship we consider the orthogonal complement (Im f ∗ )⊥ .2) are the same. The first relationship (4. . this relationship implement the same idea as the first one: the mapping is applied to a basis vector of one space and the result is expanded in the basis of another space.§ 4. we have (Im f ∗ )⊥ = Ker f. It says that if w | f(v) = 0 for all w ∈ W ∗ . . Theorem 4. Ker f ∗ = (Im f)⊥ . It is valid for infinite dimensional spaces as well. Therefore. Thereby we use the finite-dimensionality of the spaces W and V .1) is somewhat different by structure from the first one.2) The second relationship (4. em in the space W . en ˜ in V and a basis ˜1. CONJUGATE MAPPING. From the definition of the conjugate mapping we derive ˜ i | f(ej ) = f ∗ (˜ i ) | ej . . Using the finite-dimensionality of W . hn and ˜ 1. However. The matter is that the basis vectors of the dual basis are indexed differently (with upper indices). j Remark. It is formed by the vectors orthogonal to all covectors of the form f ∗ (w): 0 = f ∗ (w) | v = w | f(v) . . Let the spaces V and W be finite-dimensional. then f(v) = 0. we get the required coincidence of the matrices: Fji = Φi . i.3) Let’s calculate separately the left and the right hand sides of this equality using the expansion (4. q=1 f ∗ (˜ i ) | e j = h q=1 Φi hq | e j = q Substituting the above expressions back to the formula (4. k=1 n i q i Φq δj = Φ j . h h (4. . e f (˜ i ) = h ∗ q=1 Φ i hq . we apply the corollary of the theorem 1. . . j Proof. The second relationship (4.
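For instance, let V = W be a two-dimensional space and let f be given in a basis e_1, e_2 by f(e_1) = e_1 + 3 e_2 and f(e_2) = 2 e_1 + 4 e_2, i. e. by the matrix components F^1_1 = 1, F^1_2 = 2, F^2_1 = 3, F^2_2 = 4. Then for the conjugate mapping f^* we find

⟨f^*(h^1) | e_1⟩ = ⟨h^1 | f(e_1)⟩ = 1,    ⟨f^*(h^1) | e_2⟩ = ⟨h^1 | f(e_2)⟩ = 2,

hence f^*(h^1) = h^1 + 2·h^2 and, similarly, f^*(h^2) = 3·h^1 + 4·h^2, so that Φ^i_q = F^i_q, in accordance with the theorem 4.3.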

w) + f(w.3.CHAPTER IV BILINEAR AND QUADRATIC FORMS. Definition 1. α · w) = α f(v. w) = f(v. w) = −f(w.2) The operation (1.4) . w) = f(v1 . w1 ) + f(v. w) = h+ (v.1) Similarly.2. Definition 1. w) for any v ∈ V and for any α ∈ K. The bilinear form f(v. Let’s consider an expansion of f(v. Thereby any bilinear form is the sum of a symmetric bilinear form and a skew-symmetric one: f(v.2) is called the alternation of this bilinear form. 2 (1. w) for any two v1 .3) Theorem 1. § 1. A bilinear form f(v. Let V be a linear vector space over a numeric field K. w) + f(v2 . Recovery formula.1. v) . w) into the sum of a symmetric and a skew-symmetric bilinear forms is unique. v) . w1 + w2 ) = f(v. A numeric function y = f(v. v2 ∈ V . w) with two arguments v. (1. w2 ) for any two v1 . Definition 1.1) is called the symmetrization of the bilinear form f. 2 (1. (4) f(v. Having a bilinear form f(v. (1. w) + h− (v. v). w) = α f(v. v2 ∈ V . it is also linear in its second argument w when the first argument v is fixed. w) = f+ (v. w ∈ V and with the values in the field K is called a bilinear form if (1) f(v1 + v2. w) = f(v. w) − f(w. one can produce a symmetric bilinear form: f+ (v. w). (3) f(v. v). w) into the sum of a symmetric and a skew-symmetric bilinear forms f(v. the operation (1. w) is linear in its first argument v when the second argument w is fixed. Symmetric bilinear forms and quadratic forms. w) is called a skew-symmetric bilinear form or an antisymmetric bilinear form if f(v. A bilinear form f(v. w) + f− (v. w) for any v ∈ V and for any α ∈ K. w). w). Proof. The expansion of a given bilinear form f(v. w) is called a symmetric bilinear form if f(v. (2) f(α · v. one can produce a skew-symmetric bilinear form: f− (v. w) = f(w.1.

w) is also given. v) + f(v. . w). w) generating g(v) follows from (1. v). v + w) = f(v. v) = f+ (v. v) + 2 f(v. v) = 0. The formula (1. w) = f(v. Hence. 2 (1. w) + h− (w. w) + h+ (w.4) we derive f(v. h+ = f+ and h− = f− . 101 By means of symmetrization and alternation from (1.5). the expansion (1. For any quadratic form g(v) there is the unique bilinear form f(v. From g(v) = f(v. Hence. v))+ + (h− (v.5) shows that any quadratic form can be generated by a symmetric bilinear form. The existence of a symmetric bilinear form f(v.5) The same quadratic form can be generated by several bilinear forms. a quadratic form and an associated symmetric bilinear form for it both are denoted by the same symbol: g(v) = g(v. This proves the uniqueness of the form f. Hence. The relationship (1. we get f(v.3) we derive g(v) = f(v.6) Formula (1. w) − h+ (w. w) + f(w. f(v. SYMMETRIC BILINEAR FORMS . v))+ + (h− (v. v). w) − h− (w. Proof. Usually. w) that generates g(v). v) and f(w. v) and from the symmetry of the form f we derive g(v + w) = f(v + w. Then from the expansion (1. f− (v. w) in right hand side of this formula can be replaced by g(v) and g(w) respectively. The theorem is proved.3). (1. Let’s prove the uniqueness of this form. w)+ + f(w. w) = g(v + w) − g(v) − g(w) . For a skew-symmetric bilinear form we have f− (v. v) + f(w. v)) = 2 h+ (v. w). Moreover. v). Now f(v. when a quadratic form is given.§ 1. w). If g(v) = f(v. v) = (h+ (v. v) = (h+ (v. . we assume without stipulations that the associated symmetric bilinear form g(v. v)) = 2 h− (v.2.4) coincides with the expansion (1. Therefore. w) − f(w. v) for some bilinear form f(v. v). v) = −f− (v. w) are uniquely determined by the values of the quadratic form g(v).4. then the quadratic form g is said to be generated by the bilinear form f. Definition 1.6) shows that the values of the symmetric bilinear form f(v. w). A numeric function y = g(v) with one vectorial argument v ∈ V is called a quadratic form in a linear vector space V if g(v) = f(v. .6) is called a recovery formula. w) + f(w. Theorem 1.
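For instance, let g be the quadratic form in a two-dimensional space given by g(v) = (v^1)^2 + v^1 v^2. Then the recovery formula (1.6) yields

f(v, w) = (g(v + w) − g(v) − g(w)) / 2 = v^1 w^1 + (v^1 w^2 + v^2 w^1) / 2,

which is indeed a symmetric bilinear form with f(v, v) = g(v).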

.8) the first index i specifies the row number. en in a linear space V is one of the problems which are solved in the theory of quadratic form. .10) This supports the term «quadratic form». (1. . wn be coordinates of two vectors v and w in the basis e1. . The numbers (1. . the second index j specifies the column number. Then the values f(v. .7) we easily derive the formula relating the components of a bilinear form f(v. . Denote T = S −1 . i=1 j=1 fij = k=1 q=1 ˜ Tik Tjq fkq . en be a basis in this space. Further. . w) these two bases.7) are written in form of a matrix f11 F = . f1n . . .11) . (1. Let e1 . . Bringing a quadratic form to the form (1. . en. . Let v1 .. BILINEAR AND QUADRATIC FORMS. . . . The numbers fij determined by formula fij = f(ei . . . fn1 .. fnn which is called the matrix of the bilinear form f in the basis e1 . vn and w1 . .8) . we shall assume the matrix of an associated symmetric bilinear form g(v. . ˜ fkq = (1. w) be a bilinear form in a finite-dimensional linear vector space V and let e1 . (1.9) In the case when gij is a diagonal matrix. . . ˜n be two bases in a linear vector space V . w) = i=1 j=1 fij vi wj . The matrix of a symmetric bilinear form g is also symmetric: gij = gji. .102 CHAPTER IV.7) are called the coordinates or the components of the form f in the basis e1 . . . For the element fij in the matrix (1. en and ˜1. ˜q ) = e e k=1 q=1 ˜ Tik Tjq fkq . saying the matrix of a quadratic form g(v). . . ˜ The reverse formula expressing fkq through fij is derived similarly: n n n n i j Sk Sq fij . . . . .7) and use the bilinearity of the form f(v. . w) and g(v) of a bilinear form and of a quadratic form respectively are calculated by the following formulas: n n n n f(v. . + gnn (vn )2. . ej ) (1.. . . . the formula for g(v) contains only the squares of coordinates of a vector v: g(v) = g11 (v1 )2 + . . . en. w). Let f(v. . . . From (1. . . en . For this purpose it is sufficient to substitute the relationship (5. w): n n n n fij = f(ei .8) of Chapter I into the formula (1. . Let’s e e denote by S the transition matrix for passing from the first basis to the second one. g(v) = i=1 j=1 gij vi vj . .10) by means of choosing proper basis e1. . ej ) = k=1 q=1 Tik Tjq f(˜k .

In order to prove the coincidence S ⊥ = S⊥ we have to prove the converse inclusion S⊥ ⊂ S ⊥ . For this purpose we should verify two conditions from the definition of a subspace. w) = 0. Definition 2. w) = 0))}. the first item of the theorem 2. . For the orthogonal complements determined by a quadratic form g(v) there is a theorem analogous to theorems 3.1 we consider an arbitrary vector v in (S2 )⊥ . + αr · wr . . (3) S ⊥ = S⊥ . For this purpose let’s remember that the linear span S is formed by the linear combinations w = α1 · w1 + . w) = 0. Definition 2. Then g(v. where wi ∈ S. Orthogonal complements with respect to a quadratic form. Then g(v1 . Thus. w) = 0 and g(v2 .§ 2. (4) i∈I Si ⊥ = i∈I (Si )⊥ . 103 In matrix form these relationships are written as follows: ˜ F = T tr F T.1 is proved. for the vector α · v we derive g(α · v. v ∈ (S2 )⊥ implies v ∈ (S1 )⊥ . Hence. v2 ∈ S⊥ . The orthogonal complement of a subset S with respect to a quadratic form g can be defined formally: S⊥ = {v ∈ V : ∀w ((w ∈ S) ⇒ (g(v. Now let’s proceed to the third item of the theorem.1. ORTHOGONAL COMPLEMENTS . we get the inclusion S ⊥ ⊂ S⊥ . which is already proved.1) . w) = 0 for any w ∈ S2 . w) = 0. so the first condition is verified. where S is the linear span of S. therefore.12) Here S tr and T tr are two matrices obtained from S and T by transposition. Two vectors v and w in a linear vector space V are called orthogonal to each other with respect to the quadratic form g if g(v. Thus. . In order to prove the inclusion (S2 )⊥ ⊂ (S1 )⊥ in the second item of the theorem 2. w) = 0 for all w ∈ S. Now let v ∈ S⊥ . for all w ∈ S we have g(v1 + v2 . ˜ F = S tr F S. w) + g(v2 . The orthogonal complement of the subset S with respect to a quadratic form g(v) is the set of vectors each of which is orthogonal to all vectors of S with respect that quadratic form g. w) = 0 for all w ∈ S. From the condition v ∈ (S2 )⊥ we get g(v. This proves the required inclusion. The orthogonal complement of S is denoted S⊥ ⊂ V . Theorem 2. Proof. The operation of constructing orthogonal complements of subsets S ⊂ V with respect to a quadratic form g possesses the following properties: (1) S⊥ is a subspace in V . (1. Applying the second item of the theorem. . § 2.2 in Chapter III. This means that v1 + v2 ∈ S⊥ .1. (2) S1 ⊂ S2 implies (S2 )⊥ ⊂ (S1 )⊥ . w) = α g(v. Then v ∈ (S1 )⊥ . the equality g(v. Let v1 . Let S be a subset of a linear vector space V . w) = g(v1 . But S1 ⊂ S2 . Let’s prove the first item in the theorem for the beginning. This means that α · v ∈ S⊥ . Note that the linear span of S comprises this set: S ⊂ S . w) = 0 is fulfilled for any w ∈ S1 .1 and 3.2. Hence. (2.

w) = α1 g(v. w1 ) + . w) = 0 for all w ∈ S. w) = 0 for all w ∈ S . it belongs to their intersection. Indeed. where it is defined. Definition 2. . Let v ∈ S⊥ . w) = 0 for all w ∈ Si and for all i ∈ I. ˜ S= i∈I (Si )⊥ . hence. g(v. for which the subspace V⊥ is the kernel. This proves the converse inclusion S⊥ ⊂ S ⊥ and thus completes the proof of the coincidence S ⊥ = S⊥ . But Si ⊂ S for any i ∈ I. if v ∈ (Si )⊥ for all i ∈ I. Therefore. Its kernel Ker ag coincides with the kernel of the form g. An associated mapping of a quadratic form g is the mapping ag : V → V ∗ that takes each vector v of the space V to the linear functional fv in the conjugate space V ∗ determined by the relationship fv (w) = g(v. this fact is immediate from the bilinearity of the form g. Hence. The term «kernel» is not an occasional choice for denoting the set V⊥ .104 CHAPTER IV. Hence. Hence. w) = 0 for all w ∈ Si and for all i ∈ I. ˜ This proves the inclusion S⊥ ⊂ S. Otherwise. if Ker g = {0}. which we considered earlier in Chapter III. The associated mapping ag relates orthogonal complements S⊥ determined by the quadratic form g and and orthogonal complements S ⊥ in a dual space. Then g(v. The kernel of a quadratic form g(v) in a linear vector space V is the set Ker g = V⊥ formed by vectors orthogonal to each vector of the space V with respect to the form g. Conversely. + αr g(v. A quadratic form with nontrivial kernel Ker g = {0} is called a degenerate quadratic form. Each quadratic form is associated with some mapping. v is orthogonal to all vectors w ∈ V with respect to the quadratic form g(v). The theorem 2. Definition 2. ˜ ˜ The above two inclusions S⊥ ⊂ S and S ⊂ S⊥ prove the coincidence of two sets ˜ S⊥ = S.2) The associated mapping ag : V → V ∗ is linear. In proving the fourth item of the theorem we introduce the following notations: S= i∈I Si . w) = 0 for all w ∈ S. this is true for the vectors wi in the expansion (2. w) = 0 for any vector w in the union of all sets Si . Then from (2.1 the kernel of a form g(v) is a subspace of the space V . Let v ∈ S⊥ . Due to the theorem 2. Definition 2.1): g(v. In particular. then g(v. then the form g is called a non-degenerate quadratic form. This proves ˜ the converse inclusion S ⊂ S⊥ . g(v.2) is identically zero. then g(v.3.5. w) for all w ∈ V. (2. g(v. wr ) = 0. .1) we derive g(v.1 is proved. . the condition v ∈ Ker ag means that the functional fv determined by (2. This means that v belongs to each of the orthogonal complements (Si )⊥ and. wi ) = 0. BILINEAR AND QUADRATIC FORMS.4.

i. w) = 0 for all w ∈ S. we calculate the dimension of the image Im ag of the associated mapping: dim(Im ag ) = dim V − dim(Ker ag ). 1996.. we get a−1 ((Ker g)⊥ ) = (Ker g)⊥ = V. . Corollary 2. But this equality can be rewritten in the following way: g(v. e. This corollary of the theorem 2. 105 Theorem 2. As a result we get the required equality (2.4 from Chapter I. S⊥ = a−1(S ⊥ ).A. Therefore (Ker g)⊥ = V . (2. Using the theorem 9.3. .2 is derived from the formula (2. g Proof. . the condition v ∈ S⊥ is equivalent to ag (v) ∈ S ⊥ . i. Therefore.4).3) Corollary 1. Hence. For any subset S ⊂ V and for any quadratic form g(v) in a linear vector space V the set S⊥ is the total preimage of the set S ⊥ under the associated mapping ag . CopyRight c Sharipov R. For a quadratic form g(v) in a finite-dimensional linear vector space V the image of the associated mapping ag : V → V ∗ coincides with the orthogonal complement of its kernel Ker ag : Im ag = (Ker ag )⊥ . 2004. vectors of the kernel Ker g are orthogonal to all vectors of the space V with respect to the form g. g According to the definition 2.4 in Chapter III: dim(Ker ag )⊥ = dim V − dim(Ker ag ).§ 2. The condition v ∈ S⊥ means that g(v. ORTHOGONAL COMPLEMENTS . Im ag ⊆ (Ker ag )⊥ .2 to the kernel S = Ker g. The dimension of the orthogonal complement of Ker ag in the dual space is determined by the theorem 3. This proves the required equality S⊥ = a−1 (S ⊥ ).4) Proof. we can apply the above corollary 1 and the item (3) of the theorem 4. For a quadratic form g in a finite-dimensional space V it can be strengthened.5 from Chapter I.2. e. g This result becomes more clear if we write it in the following equivalent form: Im ag = ag (V ) ⊆ (Ker g)⊥ . As we can see.3) if we take into account Ker g = Ker ag . (2. The image of the associated mapping ag is enclosed into the orthogonal complement to its kernel (Ker ag )⊥ . the dimensions of these two subspaces are equal to each other. w) = fv (w) = ag (v)(w) = ag (v) | w = 0 for all w ∈ S. If we apply the result of the theorem 2.

Theorem 2.5) Let’s prove the relationship (2. Proof. Theorem 2. we have the inclusion U ⊂ W⊥ . Then g(v. We can swap U and W and obtain that U = W⊥ implies W = U⊥ . Hence. we get that there exist a linear functional f ∈ V ∗ such that f(v) = 0 and f(u) = f | u = 0 for all u ∈ U . i. w) = ag (w)(v) = f(v) = 0. v0 ) = 0 contradicts to the initial choice v0 ∈ W⊥ . Then the other condition g(v. From the inclusion Ker g ⊂ U . w) = 0 and g(v. For an arbitrary subset S ⊂ V of a finite-dimensional space V one can derive (S⊥ )⊥ = S + Ker g.4 is an analog of the theorem 3. From this corollary we obtain that (Ker g)⊥ = Im ag . The proofs of these two theorems are also very similar. Hence. W = U⊥ implies U = W⊥ . f = ag (w). Let V be a finite-dimensional linear vector space and let U and W be two subspaces of V comprising the kernel Ker g of a quadratic form g. joining the vectors of the kernel Ker g .3 can be reformulated as follows: for a subspace U ⊂ V in a finite-dimensional space V the condition Ker g ⊂ U means that double orthogonal complement of U coincides with that space: (U⊥ )⊥ = U . Then the conditions W = U⊥ and U = W⊥ are equivalent to each other. Suppose that the condition W = U⊥ is fulfilled. Let U V be a subspace of a finite-dimensional space V comprising the kernel of a quadratic form g.3 from Chapter III. u) = 0 for all u ∈ U . This theorem is an analog of the theorem 1. v0 ) = 0 and g(v. (2. g(v.1 from Chapter III. Assume that U = W⊥ . f ∈ Im ag and there is a vector w ∈ V that is taken to f by the associated mapping ag . we get U ⊥ ⊂ (Ker g)⊥ . Therefore. The theorem 2.3. Proof. u) = 0.2.3 which says that there exists a vector v such that g(v.4. Hence. Due to these relationship we find that w is the very vector that we need to complete the proof of the theorem. Note that vectors of the kernel Ker g are orthogonal to all vectors of V . It’s proof is essentially based on that theorem. Applying the theorem 1. The latter condition means that v ∈ U⊥ = W . u) = ag (w)(u) = f(u) = 0 for all u ∈ U. u) = 0 for all u ∈ U . The set W⊥ is formed by vectors orthogonal to all vectors of W with respect to the quadratic form g.106 CHAPTER IV. In this situation we can apply the theorem 2.2 from Chapter III. Due to the last condition this functional f belongs to the orthogonal complement U ⊥ . we conclude that f ∈ (Ker g)⊥ . This contradiction shows that the assumption U = W⊥ is not true and we have the coincidence U = W⊥ . Therefore. Thus.5). e.2 from Chapter III. Then there is a vector v0 such that v0 ∈ W⊥ and v0 ∈ U . BILINEAR AND QUADRATIC FORMS. applying the item (2) of the theorem 3. The proposition of the theorem 2. Further proof is by contradiction. Then for any vector w ∈ W and for any vector u ∈ U we have the relationship g(w. For any vector v ∈ U there exists a vector / w ∈ V such that g(v. these two conditions are equivalent. Now we apply the corollary 2 from the theorem 2.

W⊥ = a−1 (W ⊥ ) implies the g equality ag (W⊥ ) = W ⊥ .5. Theorem 2. . we do not change the orthogonal complement U⊥ .4 from Chapter I. where U⊥ is the orthogonal complement of U with respect to the form g. The vectors of the kernel Ker g are orthogonal to all vectors of the space V . (U⊥ )⊥ = U . In the case of finite-dimensional linear vector space V for any subspace U of V we have the equality dim U + dim U⊥ = dim V + dim(Ker g ∩ U ).6) Now let’s apply the theorem 2. This completes the proof of the relationship (2. Now let’s apply the item (3) of the theorem 2. Now let’s consider the restriction of the associated mapping ag to the subspace W⊥ .8) coincides with the kernel of the non-restricted mapping ag since Ker ag = Ker g ⊂ W⊥ . 107 to S. this differs W from the initial subspace U . for the dimension of W we derive the formula dim W = dim U + dim(Ker g) − dim(Ker g ∩ U ). g Note that Ker g ⊂ W . while the equality W⊥ = a−1 (W ⊥ ) shows g that such preimage is enclosed into W⊥ . Applying the theorem 6. This yields W ⊥ ⊂ (Ker g)⊥ = Im ag . Let’s apply the theorem on the sum of dimensions of the kernel and the image (see theorem 9. Therefore. We denote this restriction by a: a : W⊥ → V ∗ .4 in Chapter I) to the mapping a: dim(Ker g) + dim W ⊥ = dim W⊥ (2.1. This yields W⊥ = a−1 (W ⊥ ). Let’s denote W = U + Ker g.1 to the inclusion Ker g ⊂ W and take into account the corollary 2 of the theorem 2. joining them to U . Therefore. Let’s apply the item (2) of the theorem 3.9) . Then U⊥ = W⊥ . we do not change the orthogonal complement of this subset: S⊥ = (S ∪ Ker g)⊥ . The subspace U = S + Ker g comprises the kernel of the form g. Proof. ORTHOGONAL COMPLEMENTS . The inclusion W ⊥ ⊂ Im ag means that the preimage of each element f ∈ W ⊥ under the mapping ag is not empty. For the image of this mapping we have Im a = ag (W⊥ ) = W ⊥ . (2.2. (2. therefore. This yields S⊥ = (S ∪ Ker g)⊥ = S ∪ Ker g ⊥ = ( S + Ker g)⊥ .2 to the subset S = W . .§ 2.7) (2.8) The kernel of the mapping (2.5): (S⊥ )⊥ = (( S + Ker g)⊥ )⊥ = S + Ker g.

Inertia indices and signature.108 CHAPTER IV.10) which follows from the theorem 3. A subspace U in a linear vector space V is called regular with respect to a quadratic form g if U ∩ U⊥ ⊆ Ker g. . Then U + U⊥ = V . BILINEAR AND QUADRATIC FORMS. the following relationships are fulfilled: Ui i∈I ⊥ = i∈I (Ui )⊥ . In the case of finite-dimensional linear vector space V equipped with a quadratic form g for any family of subspaces in V .11) the condition Ker g ⊂ Ui is inessential. Moreover. each of which comprises the kernel Ker g.11). Let U be a subspace in a finite-dimensional space V regular with respect to a quadratic form g. Taking into account the coincidence W⊥ = U⊥ .7) and (2.4 of Chapter III. Transformation of a quadratic form to its canonic form. § 3. Now it is sufficient to pass to orthogonal complements in left and right hand sides of the above equality and apply the theorem 2. In order to determine the dimension of W ⊥ we apply the relationship dim W + dim W ⊥ = dim V (2.4 from Chapter I: dim W = dim U + dim U⊥ − dim(U ∩ U⊥ ). Therefore. The vectors of the kernel Ker g are perpendicular to all vectors of the space V . The theorem is proved. From the condition Ker g ∈ Ui we derive that ((Ui )⊥ )⊥ = Ui (see theorem 2. Now let’s add the relationships (2. we get the required equality (2.9) and subtract the relationship (2.4 again.1.11) is derived from the first one. due to the regularity of U with respect to the form g we have U ∩ U⊥ ⊆ Ker g.1.6.4) from Chapter III in present case are the relationships {0}⊥ = V and V⊥ = Ker g. This yields the required equality (2. Theorem 3. Let’s denote (Ui )⊥ = Vi and apply the first relationship (2.1 if we take into account that the sum of subspaces is the linear span of the union of these subspaces. Theorem 2. Let’s denote W = U + U⊥ and then let’s calculate the dimension of the subspace W applying the theorem 6.11) to the family of subsets Vi : (Ui )⊥ i∈I ⊥ = i∈I Vi ⊥ = i∈I (Vi )⊥ = i∈I Ui . Ker g ⊆ U⊥ . i∈I Ui ⊥ = i∈I (Ui )⊥ . Therefore.6).11) Proof. Proof.4). This relationship is derived from the items (3) and (4) of the theorem 2. The analogs of the relationships (3.10). In proving the first relationship (2. Definition 3. (2. The second relationship (2. we derive U ∩ U⊥ = (U ∩ U⊥ ) ∩ Ker g = U ∩ (U⊥ ∩ Ker g) = U ∩ Ker g.

2. Let’s consider the subspace W obtained by joining v0 to U = Ker g: W = Ker g + v0 = U ⊕ v0 . 109 Because of the equality U ∩ U⊥ = U ∩ Ker g the above formula for the dimension of the subspace W can be written as follows: dim W = dim U + dim U⊥ − dim(U ∩ Ker g). there exists a vector v0 ∈ U such that g(v0 ) = 0. Then the numeric function g(v) is identically zero in the subspace U⊥ .1) Let’s compare (3. . The case g = 0 is trivial. The theorem is proved. w) = 0 + 0 = 0. TRANSFORMATION TO A CANONIC FORM. is obviously a diagonal matrix. Since g(v. which we have made in the beginning of our proof. Now. The converse inclusion Ker g ⊆ U⊥ is always valid. which contradicts the hypothesis of the theorem.6) the numeric function g(v. Theorem 3. In the case n = 1 the proposition of the theorem is trivial: any 1 × 1 matrix is a diagonal matrix. Assume that there is no vector v ∈ U⊥ such that g(v) = 0. This contradiction means that the assumption. The matrix of the zero quadratic form g is purely zero in any basis.2.6) from the theorem 2. It is regular with respect to the form g and U⊥ = V . The proof is by contradiction. The second summand g(v. . . Proof. Due to the recovery formula (1. Now let’s apply the theorem 3. w) is zero due to our assumption in the beginning of the proof.2) . U⊥ = Ker g. then there exists a vector v ∈ U⊥ such that g(v) = 0. According to this theorem. Suppose that the theorem is valid for any quadratic form in any space of the dimension less than n.3. Hence. Suppose that g ≡ 0. is not valid and. We shall prove the theorem by induction on the dimension of the space dim V = n. This comparison yields dim W = dim V . x) = g(v. en such that the matrix of g is diagonal in this basis. w ∈ U⊥ . . we get v ∈ Ker g. thus. But v is an arbitrary vector of the subspace U⊥ . it proves the existence of a vector v ∈ U⊥ such that g(v) = 0. The square n × n matrix. Let U be a subspace of a finite-dimensional space V regular with respect to a quadratic form g. x) = 0 for an arbitrary vector x ∈ V . For any quadratic form g in a finite-dimensional vector space V there exists a basis e1 . applying the item (3) of the theorem 4. we can apply the theorem 3. Therefore. This subspace W determines the following two cases: W = V or W = V . The first summand g(v.5 from Chapter I. The theorem is proved. (3.§ 3. Proof. u) in right hand side of the above equality is zero since the subspaces U and U⊥ are orthogonal to each other. If U⊥ = Ker g. Theorem 3. Let’s consider the subspace U = Ker g.1) with the formula (2. (3. Then for an arbitrary vector v of the subspace U⊥ we derive g(v. u) + g(v. w) is also identically zero for all v.1 and expand an arbitrary vector x ∈ V into a sum of two vectors x = u + w. which is purely zero. we get W = V . U⊥ ⊆ Ker g.5. Therefore. u + w) = g(v. where u ∈ U and w ∈ U⊥ .

6) yield the exact value of this dimension dim W⊥ = dim V + dim(U ∩ Ker g) − dim U = n − 1. en . In the case W = V we consider the intersection W ∩ W⊥ . . Let’s find the matrix of the quadratic form g in the extended basis. . . . for i = 1. We can apply the inductive hypothesis to g in W⊥ .6) . . This means that the dimension of the subspace W⊥ is less than n. Since v0 ∈ W⊥ the extended system of vectors e1 . . . Let w ∈ W ∩ W⊥ . It yields the expansion V = W + W⊥ . Let e1 . Hence. We can renumerate the basis vectors e1. In the case W = V we choose a basis e1 . . . s and j = 1. . BILINEAR AND QUADRATIC FORMS. u) + g(u.10). But u ∈ Ker g.1. Since w is a vector of W ans simultaneously it is a vector of W⊥ . . ej ) = 0 for i < j n − 1. it should be orthogonal to itself with respect to the quadratic form g: g(w. while g(v0 . . which means the regularity of the subspace W with respect to the quadratic form g. is a basis of V . Then the value of g(v) can be calculated by formula (1.4) and (3. v0 ) = g(v0 ) = 0. . . .3) = α2 g(v0 . it is a diagonal element: gs+1 s+1 = g(es+1 . we have proved the inclusion W ∩ W⊥ ⊆ Ker g. Note that v0 ∈ W . gnn can be equal to zero. hence. . g(v0 . The inductive step is over and the theorem is completely proved. en is linearly independent and. en−1 be a basis of the subspace W⊥ in which the matrix of the restriction of g to W⊥ is diagonal: gij = gji = g(ei . s + 1 we have gij = gji = g(ei . Thus. .5). . (3. (3. Let’s denote by s the number of such elements. u) = 0. v0) + 2α g(v0 . Now let’s apply the theorem 3. . . (3. en) = 0 for i < n. This follows from g(v0 . . The relationships (3. u) = 0 and g(u. therefore. The matrix of the quadratic form g in this basis is a matrix almost completely filled with zeros. Indeed. . .110 CHAPTER IV. α · v0 + u) = (3. The only nonzero element is gs+1 s+1 . Let’s consider the restriction of g to the subspace W⊥ . . . but v0 ∈ W ∩ W⊥ . .3) we get α = 0. u) = 0. Let g be a quadratic form in a finite dimensional space V and let e1. w) = g(α · v0 + u.2) we derive w = α · v0 + u. from (3. . . ej ) = 0 since ei ∈ Ker g.4) / We complete this basis by one vector en = v0 . For the elements in the extension of this matrix we obtain the relationships gin = gni = g(ei . . Hence. en be a basis in which the matrix of g is diagonal. . . es+1 ) = g(v0 ) = 0. Then from (3. . en ∈ W and ei ∈ W⊥ . v0) = 0. . As a result we get the basis in V . = gss = 0. v0 ∈ W⊥ / / and W⊥ = V . . es in the kernel Ker g and complete it by one additional vector es+1 = v0 . . . en so that g11 = .5) They follow from the orthogonality of ei and en in (3. The formula (2. where u ∈ Ker g. . A part of the diagonal elements g11. This means that w = u ∈ Ker g.5) taken together mean that the matrix of the quadratic form g is diagonal in the basis e1. . indeed.

= wn = 0. . .2. Hence. From these equalities for the vector w we derive w = w 1 · e1 + .6). for i > s. Suppose that s is the zero inertia index of the quadratic form g. vn .8) The matrix of the quadratic form g in the new basis (3. . the above equality should be fulfilled identically in vs+1 . . . (3. Conversely.§ 3. . + w s · es . . we can explicitly calculate the matrix elements: gij = g(˜i . . ˜j ) = (γi γj )−1 gij = 0 for i = j. Complex numbers (3. s. Let g be a quadratic form in a linear vector space over the field of complex numbers C such that its matrix is diagonal a basis e1. n. Then for an arbitrary vector v ∈ V we have the following relationships: n n n g(v. . en . belong to the kernel of the form Ker g. The above considerations prove the following proposition that we present in the form of a theorem. .4. these basis vectors e1. 111 The first s vectors of the basis. Since v ∈ V is an arbitrary vector. if w = ei for i = 1. (3. . TRANSFORMATION TO A CANONIC FORM. We use them in order to construct the new basis: ˜ ei = (γi )−1 · ei . . . . . Without loss of generality we can assume that the first s basis vectors e1 . Definition 3. The conclusion is that any vector w of the kernel Ker g can be expanded into a linear combination of the first s basis vectors. .9). es form a basis in Ker g. then g(v. therefore. suppose that w ∈ Ker g. ˜ e e . es form a basis in the kernel Ker g. is a geometric invariant of the form g. Theorem 3. w s+1 = . which correspond to the matrix elements (3. w) = 0 for all vectors v ∈ V . Indeed. . .7) are nonzero. .8) is again a diagonal matrix. It does not depend on the method used for bringing this matrix to a diagonal form and coincides with the dimension of the kernel of the quadratic form: s = dim(Ker g). Indeed.7) Remember that for any complex number one can take its square root which is again a complex number. . . . i = 1. The number of zeros on the diagonal of the matrix of a quadratic form g. . γn by means of formula γi = 1 √ gii for i s. We define the numbers γ1 . . . . This fact can be easily derived with the use of formulas (1. . . w) = i=1 j=1 gij vi wj = i=s+1 gii vi wi = 0. The number s = dim(Ker g) is called the zero inertia index of a quadratic form g. But gii = 0 for i s + 1. . . . . brought to the diagonal form.

its diagonal is filled with s zeros and n − s ones.10) for i > s. en be a basis in which the matrix of g is diagonal. ˜n has the following form e which is used to be called the canonic form of the matrix of a quadratic form over the field of complex numbers C: 0 G= .7) for complex numbers: γi = 1 |gii| for i s. .9) is a diagonal matrix. . γn a little bit differently than it was done in (3. (3. . then remaining n − s − r elements on the diagonal are negative numbers. .9). Then gii < 0 for i = s + r + 1. . 1     . n.A. . −1        . . . . .112 CHAPTER IV. . Without loss of generality we can assume that the basis vectors e1. ˜i) = (γi )−2 gii = ˜ e e 0 1 for i s.9). .. . In the case of a linear vector space over the field of real numbers R the canonic form of the matrix of a quadratic form is different from (3. e Here is the matrix of the quadratic form g in this basis: 0   1 G= . here we define γ1 . Therefore. en are enumerated so that gii = 0 for i = 1. . . ... . where s = dim Ker g. 0 s rp (3. . s + r. . .11) rn CopyRight c Sharipov R. . Diagonal elements of this matrix now is subdivided into three groups: zero elements. . 1996.. For the diagonal elements of the matrix of g we derive gii = g(˜i . 1 −1 . for i > s. and negative elements. . In the field of reals we can take the square root only of non-negative numbers.. . . Let e1 .10) we define new basis e1 .9) n−s The matrix G in (3. . positive elements. . . . ˜n using the formulas (3.   1 . . .. . . 0 s (3. s and gii > 0 for i = s + 1. ˜ The matrix of the quadratic form g in the basis e1 . . BILINEAR AND QUADRATIC FORMS. . . . 2004. If s is the number of zero elements and r is the number of positive elements. ˜ By means of (3.

es+rp . then rp > rp or rp < rp . . rp . ˜ dim U− = rn = n − s − rp. The value of the quadratic form g for that vector is determined by the matrix (3. The sum of squares in the right hand side of this equality is a non-negative quantity.§ 3.10): s+rp g(v) = i=s+1 (vi )2. The positive and the negative inertia indices rp and rn of a quadratic form g in a space over the field of real numbers R are geometric invariants of g. . Let’s take a vector v ∈ U+ . (3.10) is written as n g(v) = i=s+rp +1 (−(vi )2 ). .11). en be a basis of a space V in which the matrix of g has the canonic form (3. The ˜ ˜ ˜ zero inertia indices in both bases are the same s = s since they are determined by ˜ the kernel of g: s = dim(Ker g) and s = dim(Ker g). e.10) are called the positive inertia index and the negative inertia index of the quadratic form g respectively. Theorem 3.12) The intersection of U+ and U− is trivial. The formula (3. . U− = es+rp +1 . g(v) 0 for all v ∈ U+ . .5.11) according to the formula (1. ˜ Suppose that ˜1. . Now let’s take a vector v ∈ U− . . Let’s consider the following subspaces: U+ = e1 . . . 113 Definition 3. g(v) < 0 for all nonzero vectors of the subspace U− . and rn the inertia indices of g in this basis.11) defines the canonic form of the matrix of a quadratic form g in a space over the real numbers R. . ˜ ˜ ˜ For the sum of dimensions of these two subspaces U+ and U− we get the equality ˜− = n + (rp − rp ). The integers rp and rn that determine the number of plus ones and the number of minus ones on the diagonal of the matrix (3. If we assume that rp = rp . For this purpose we consider the subspaces U+ and U− determined by ˜ ˜ the relationships of the form (3. en is some other basis in which the matrix of g has the e canonic form. Proof. . . Denote by s. Hence. They do not depend on a particular way how the matrix of g was brought to the diagonal form. For this vector the formula (1. Then we calculate the dimensions of U+ and U− : ˜ dim U+ = s + rp . dim U+ = s + rp . ˜ Let’s prove the coincidence of the positive and the negative inertia indices in ˜ ˜ two bases. .13) . . TRANSFORMATION TO A CANONIC FORM. . . then at least one summand in right hand side is nonzero.12) but for the «wavy» basis e1 . en. . Let e1. If v = 0. dim U− = rn . . en .3. i. . Due to the above assumption rp > rp we derive dim U+ + dim U ˜ ˜ ˜ dim U+ + dim U− > dim V (3. For the sake of certainty suppose ˜ ˜ ˜ ˜ that rp > rp . and for their sum we have U+ ⊕ U− = V . .

en. In the case of a quadratic form in complex space (K = C) the signature is formed by two numbers (s. . etc. In the case of a linear space over the field of rational numbers K = Q we can also diagonalize the matrix of a quadratic form and subdivide the diagonal elements into three parts: positive. Definition 4.13) and applying the theorem 6. in the case of real space (K = R) it is formed by three numbers (s. From rp = rp and s = s ˜ ˜ ˜ then we derive rn = rn. In this section we consider quadratic forms in linear vector spaces over the field of real numbers R.11). g(v) < 0 contradicting each other. Theorem 4.1. primality. in which the matrix of g has the form (3. . Therefore. A quadratic form g in a space V over the field R is called a positive form if g(v) > 0 for any nonzero vector v ∈ V . the intersection U+ ∩ U− ˜ is nonzero. which would also contradict the positivity of g. and zero elements. . Let g be a positive quadratic form and let e1. If s = 0 then for the basis vector e1 = 0 we would get g(e1 ) = g11 = 0. which are geometric invariants of g. n − s). s = rn = 0. The theorem is proved. . we derive dim(U+ ∩ U− ) > 0. . Positive quadratic forms. A quadratic form g in a finite-dimensional space V is positive if and only if the numbers s and rn in its signature (s. negative. . However. conversely. Hence. . Silvester’s criterion. rp . and we can define its signature. § 4.1. BILINEAR AND QUADRATIC FORMS. rn) are equal to zero. The total set of inertia indices is called the signature of a quadratic form. Proof. which would contradict the positivity of g. From the conditions v ∈ U+ and v ∈ U− we obtain two inequalities g(v) 0. almost all results of this section remain valid for quadratic forms in rational vector spaces as well. en be a basis in which the matrix of g has the canonic form (3.11). Hence. let s = rn = 0. ˜ Definition 3. + (vn )2 . . factorization of integers. its value g(v) is the sum of squares g(v) = (v1 )2 + . rn). in the case K = Q we cannot reduce the nonzero diagonal elements to plus ones and minus ones only. then for the basis vector en = 0 we would get g(en ) = gnn = −1. the number of geometric invariants in this case is greater than 3. Using this estimate together with the inequality (3. Then in the basis e1 .4. This determines the numbers s. This contradiction shows that our assumption rp = rp is ˜ not valid and the inertia indices rp and rp do coincide.4 of ˜ ˜ Chapter I to them. We shall not look for the complete set of geometric invariants of a quadratic form in the case K = Q and we shall not construct their theory since this would lead us to the number theory toward the problems of divisibility. . If rn = 0.114 CHAPTER IV. it contains a nonzero vector v ∈ U+ ∩ U− . . and rn. Now. ˜ ˜ From the natural inclusion U+ + U− ⊆ V we get dim(U+ + U− ) dim V . However. rp . . rp .

vn are coordinates of a vector v. . .3 of Chapter I. it is orthogonal to itself: g(v) = g(v.10). Let’s choose an arbitrary basis e1. . Thus. en in V and then let’s construct the matrix of the quadratic form g: g11 .4. gn1 . The theorem is proved. Any positive quadratic form g is non-degenerate. Proof. . We need only to prove that the sum in this expansion is a direct sum. This proves the positivity of g and thus completes the proof of the theorem in whole. If so. Theorem 4. Since the kernel Ker g of a positive form g is zero. v) = 0.5 due to the triviality of the kernel Ker g = {0} of a positive quadratic form g we derive dim U + dim U⊥ = dim V. g1n . From v ∈ U⊥ we derive that it is orthogonal to all vectors of U . any positive form g should be non-degenerate. 115 where v1 . The determinant of the matrix thus obtained is called the k-th principal . .2. SILVESTER’S CRITERION. Theorem 4. v) = 0. . Let v be an arbitrary vector of the intersection U ∩ U⊥ . . If Ker g = {0} then there is a nonzero vector v ∈ Ker g. The condition s = Ker g obtained in the theorem 3.1) Let’s delete the last n − k columns and the last n − k raws in the above matrix (4. This formula follows from the formula (1. Proof. . . . Hence.. Hence. For any subspace U ⊂ V and for any positive quadratic form g in a finite-dimensional space V there is an expansion V = U ⊕ U⊥ . . g(v) > 0. we get U ∩ U⊥ = {0}. The vector v of the kernel is orthogonal to all vectors of the space V . This fact is valid for a form in an infinite-dimensional space as well.1. Therefore. . this fact contradicts the positivity of the form g. Theorem 4. .4 and the condition s = 0 mean that a positive form g in a finite-dimensional space V is non-degenerate: Ker g = {0}. Let’s prove this equality.§ 4. Let g be a quadratic form in a finite-dimensional space V over the field of real numbers R. . Any subspace U ⊂ V is regular with respect to a positive quadratic form g in a linear vector space V . the regularity of a subspace U with respect to g is equivalent to the equality U ∩ U⊥ = {0} (see definition 3. Proof. gnn G= (4. For the sum of the dimensions of U and U⊥ from the theorem 2. Due to positivity of g the equality g(v) = 0 holds only for the zero vector v = 0.1). Due to this equality in order to complete the proof it is sufficient to apply the theorem 6. The expansion V = U + U⊥ follows from the theorem 3.1). . .3. For a nonzero vector at least one of its coordinates is nonzero. Therefore. . it is also orthogonal to itself since v ∈ U . Hence. . g(v) = g(v. .

BILINEAR AND QUADRATIC FORMS.2) is a necessary condition for the positivity of a quadratic form g itself. Mk = det ..116 CHAPTER IV. . Suppose that all diagonal minors (4. . . . Therefore M1 > 0 implies the positivity of the form g. ˜ ˜ Now let e1. (det S)2 is a positive number. . . . . . .1. The matrix of the form hk in the basis e1 . its determinant is equal to unity and thus it is positive: det G = 1 > 0. Then the determinant of the matrix of g in an arbitrary basis of V is positive. Its determinant is also a nonzero real number. .2).5 to the form hk . . The positivity of g implies the positivity of all principal minors in its matrix. . . Proof. We should prove that g is positive. . The theorem is proved. gk1 . Theorem 4. en in which the matrix of g has the canonic form (3. . . Therefore. . e k . . Now again let e1. . . . . . en to ˜1. . . Applying the formula (1.6 (Silvester). . For the beginning we consider a canonic basis e1. . This fact is known as Silvester’s criterion. This fact is already proved. the matrix of a positive quadratic form g in a canonic basis is the unit matrix.11). this condition is a sufficient condition as well. Here the matrix of g consists of the only element g11 that coincides with the only principal minor: g11 = M1. en . Let g be a positive quadratic form in a finite-dimensional space V . The value g(v) in one-dimensional space is determined by the only coordinate of a vector v according to the formula g(v) = g11 (v1 )2 . The basis of the induction in the case dim V = 1 is obvious. we can apply the theorem 4. In a linear vector space V over the real numbers R the elements of any transition matrix S are real numbers.5. Hence. Therefore.2) The n-th principal minor Mn coincides with the determinant of the matrix G. We denote this determinant by Mk : g11 . Let’s denote by hk the restriction of g to the subspace Uk .2) in the matrix of a quadratic form g are positive. It is clear that the restriction of a positive form g to any subspace is again a positive quadratic form. . . This is the very block that determines the k-th principal minor Mk in the formula (4. ek coincides with upper left diagonal block in the matrix of the initial form g. . g1k . . Conclusion: the positivity of all principal minors (4. . . . en be an arbitrary basis of V and let gij be the matrix of a positive quadratic form g in this basis.12). A quadratic form g in a finite-dimensional space V is positive if and only if all principal minors of its matrix are positive. The proof is by induction on n = dim V . minor of the matrix G. we get e ˜ det G = det S tr (det G) det S = (det S)2 . Theorem 4. . Proof. According to the theorem 4. . gkk (4. . This yields Mk > 0. en be an arbitrary basis and let S be the transition matrix for ˜ passing from e1 . Let’s consider the subspace Uk = e 1 . As appears. Let’s prove the converse proposition. .

3) bringing it to a lower-triangular form. . . . en of V . SILVESTER’S CRITERION. . Due to the positivity of these minors. .3). . . From this formula we derive det G1 = det G (det S)2 = Mn (det S)2 ..3) explicitly.. . . (4. .11). . Let gij be the matrix of our quadratic form g in some basis e1 . . From the course of algebra we know that such transformations do not change the determinant of a matrix. . . . en−1 be a basis in which the matrix of the form h has the canonic e form (3.. . . . . . Therefore. 1 Sn . en in which the matrix of g has the form ˜ e 1 . . . . en is described e by a blockwise-diagonal matrix S of the form 1 S1 . en to the basis ˜1. .. . e..12) relates the matrix (4.3) . gn n−1 ˜ ˜ The passage from the basis e1 . . gn−1 n ˜ gnn G1 = . . i. . 117 Suppose that the proposition we are going to prove is valid for a quadratic form in any space of the dimension less than n = dim V . 0 gn1 ˜ . Let’s calculate the determinant of the matrix (4. . .1) implies the positivity of the determinant of the matrix (4. the minors M1 . we can calculate . . . . . en−1 of the subspace U by the vector en ∈ U .3) with the matrix G of the quadratic form g in the initial basis: G1 = S tr G S. Then we multiply the second column by g2n and subtract it from the last ˜ one.. . . . 0 . . 1 g1n ˜ . . en−1 coincide with corresponding elements in the matrix of the initial form: hij = gij . .. . . n−1 S1 S1 = . en−1 is the unit matrix. Denote by h the restriction of the form g to the subspace U of the dimension n − 1. Mn−1 can be calculated by means of the matrix hij .1 to the form h. . .4) 0 0 The formula (1. We produce such an operation repeatedly for each of the first n − 1 columns of the matrix (4.. we conclude that the matrix ˜ ij in the canonic basis ˜1. det G1 > 0. . . . we find that h is a positive quadratic form in U . 0 . . (4. . Sn . As a result we get the basis ˜ e / ˜1. . . . . . . In present case they simplify the matrix (4. ˜ Let ˜1. . en−1.3). Let’s denote U = e1 . .. (4.§ 4. Applying the theorem 4. For this purpose we multiply the first column of this matrix by g1n and subtract it from the last ˜ column.5) Due to the above formula (4.. .5) the positivity of the principal minor Mn = det G in the initial matrix (4. 0 1 n−1 . en−1. . . . en−1 . The matrix elements hij in the matrix of h calculated in the basis e1. applying the inductive hypothesis. Therefore. . Let’s complete the basis ˜ h e ˜1.

. . for the element gnn in (4.. Hence. . For non-diagonal elements g(˜k . the unit diagonal block in the matrix (4.3) remains unchanged. . . g is a positive quadratic form. en−1. h i=1 The equality g(˜k . BILINEAR AND QUADRATIC FORMS. . . 0 gn1 ˜ . en is close to the diagonal matrix. ..1) is positive. . we have ˜ completed the inductive step and have proved the theorem in whole.3) in explicit form: 1 . i=1 (4. . 0 .6) . . . . 0 .8) . . 0 gnn ˜ det G1 = det = gnn. ˜n) from this fact we derive e e n−1 n−1 n−1 e e g(˜n . the determinant of the matrix (4.7) ˜ ˜ The matrix of the quadratic form g in the basis e1 . . . . . . .. ˜n) in the new basis we have e e n−1 n−1 g(˜k . . . 1 0 .. ˜ ˜ ˜ The passage from e1 . Thus. .8) is also positive. 0 G2 = . 1 0 . (4. . . .. .. the ˜ ˜ matrix of g in the basis e1 . . For e the diagonal element g(˜n . en−1.7). ˜n) = gkn − e e ˜ i=1 gin g(˜k . en) = 0 in the above formula is due to the fact that the matrix e ˜ ˜ of the restricted form h in its canonic basis ˜1. gn n−1 ˜ The element gnn in the transformed matrix is given by the formula ˜ n−1 n−1 gnn = gnn − ˜ i=1 gni gin = gnn − (gin)2 .. . Thus. . ˜i) = gkn − e e ˜ gin ˜ ki = 0.118 CHAPTER IV. en is a diagonal matrix of the form 1 .6). . en changes only the last basis e vector. Therefore. 0 0 . .8) we get gnn = Mn (det S)2 . we find that gnn in (4. . . i=1 e e ˜ Comparing this expression with (4. Let’s complete the process of diagonalization replacing the vector ˜ en by the vector en ∈ U such that n−1 ˜ en = e n − i=1 ˜ gin · ei . . ˜n) = gnn. en to ˜1. .. . ˜n) = gnn − i=1 k=1 gin gkn hik = gnn − (gin )2. we find that g(˜n . ˜ ˜ Since the principal minor Mn of the initial matrix (4. . 0 gnn ˜ Combining (4. ˜ (4. en−1 is the unit matrix.. .5) and (4.. . . . .

(v | w1 + w2 ) = (v | w1 ) + (v | w2 ) for all w1 .CHAPTER V EUCLIDEAN SPACES. § 1. w) determined by the recovery formula (1. only one of them is associated with V so that it defines the structure of Euclidean space in V . (v | w) = (w | v) for all v.8) of Chapter III. Definition 1. A Euclidean vector space is a linear vector space V over the field of reals R which is equipped with some fixed positive quadratic form g. 2004. g) be a Euclidean vector space. Two Euclidean vector spaces (V. (v | α · w) = α (v | w) for all v. which is defined for a pair of a vector and a covector. The angle between vectors.2) Due to the notation (1. (1. but they are different when considered as Euclidean vector spaces.. There many positive quadratic forms in the linear vector space V . however. w ∈ V . The scalar product (1.6) of Chapter IV. 1996. The norm and the scalar product. They are analogous to that of the scalar product of a vector and a covector (see formulas (1. we can omit the symbol g at all. The structure of the Euclidean vector space (V. g) is associated with a special terminology and special notations. v2 .2) of a Euclidean vector space possesses the following properties: (1) (2) (3) (4) (5) (6) (v1 + v2 | w) = (v1 | w) + (v2 | w) for all v1. The properties (1)-(4) reflect the bilinearity of the form g in (1.1. w ∈ V and for all α ∈ R. CopyRight c Sharipov R. g2 ) with g1 = g2 coincide as linear vector spaces. .2). The scalar product (1. v ∈ V . (α · v | w) = α (v | w) for all v. The scalar product is denoted as follows: (v | w) = g(v. The square root of g(v) is called the norm or the length of a vector v.9) in Chapter III). The value of that bilinear form is called the scalar product of two vectors v and w. g).2) is defined for a pair of two vectors v. w2.A. |v|2 = (v | v) 0 for all v ∈ V and |v| = 0 implies v = 0. w ∈ V and for all α ∈ R. Orthonormal bases.1) The quadratic form g(v) produces the bilinear form g(v. It is quite different from the scalar product (1. w). w ∈ V . (1. w ∈ V . Let (V. when dealing with some fixed Euclidean space (V.2).1) and (1. g1) and (V. The norm of a vector v is denoted as follows: |v| = g(v). The value of the quadratic form g(v) is nonnegative.

(1. . therefore.4) we easily derive the property (7).4) is positive.1.4) The denominator of the fraction (1. The theorem is proved. for the right hand side of the equality (1. +∞). Using the properties (1)-(6) we find that f(α) is a polynomial of degree two: f(α) = |v + α · w|2 = (v + α · w | v + α · w) = (1.5) we get the following estimate: |v|2 + 2 (v | w) + |w|2 |v|2 + 2 |v| |w| + |w|2 = (|v| + |w|)2. In order to prove the property (8) we consider the square of the norm for the vector v + w. Proof.2) are derived from the properties (1)-(6): (7) |(v.3) has a lower bound: f(α) 0.5) Applying the property (8).2) a generalization of the scalar product of 3-dimensional geometric vectors. The properties (5) and (6) have no such analogs. This yields the following equation: f (α) = 2 (v | w) + 2 α (w | w) = 0. The function (1. The following two additional properties of the scalar product (1. This follows from the property (6). Now let’s write the condition f(α) 0 for the minimal value of the function f(α): fmin = f(αmin ) = |v|2 |w|2 − (v | w)2 |w|2 0.120 CHAPTER V. From the relationship (1.3) = (v | v) + 2 α (v | w) + α2 (w | w). (1.5) and from the above inequality we derive the other inequality |v + w|2 ≤ (|v| + |w|)2. which is already proved. (8) |v + w| |v| + |w| for all v. w ∈ V and consider the numeric function f(α) of a numeric argument α defined by the following explicit formula: f(α) = |v + α · w|2 . Theorem 1. w)| |v| |w| for all v. Let’s calculate the minimum point of the function f(α) by equating its derivative f (α) to zero. we find αmin = −(v | w)/(w | w). In order to prove the inequality (7) we choose two arbitrary nonzero vectors v. w ∈ V . Solving this equation. while the property (8) is called the triangle inequality. The property (7) is known as the Cauchy-Bunyakovsky-Schwarz inequality. For this quantity we derive |v + w|2 = (v + w | v + w) = |v|2 + 2 (v | w) + |w|2. w ∈ V . from the inequality (1. This operation is correct since y = x is an increasing function of the real semiaxis [0. EUCLIDEAN SPACES. But they are the very properties that make the scalar product (1. Now the property (8) is derived by taking √ the square root of both sides of this inequality.

Definition 1. . Then there is a nontrivial linear combination of these vectors which is equal to zero: α1 · v1 + . Therefore. The definition 1. . . The matrix gij composed by the mutual scalar products of these vectors gij = (vi | vj ). . vm are linearly dependent. Hence. . vm be a system of vectors in a Euclidean space (V.7): m m gij αj = j=1 j=1 (vi | vj ) αj = (vi | α1 · v1 + . Suppose that the vectors v1.2.3 applies only to nonzero vectors v and w. vm in a Euclidean space is linearly dependent if and only if the determinant of their Gram matrix is equal to zero. . 121 Due to the analogy of (1.4 are equivalent.4. Two vectors v and w in a Euclidean space V are called orthogonal vectors if they form a right angle (ϕ = π/2). . (1. The definition 2.1 of Chapter IV is more general. Theorem 1. Since i is a free index running over the interval of integer numbers from 1 to m. .6) is not greater than 1. . . + αm · vm = 0. . we construct the following expression with the components of Gram matrix (1.3.§ 1. w)| |v| |w| we can introduce the concept of an angle between vectors in a Euclidean vector space. .8) (1. . is called the Gram matrix of the system of vectors v1 . . It determines the unique number ϕ from the specified interval 0 ϕ π.1 the modulus of the fraction in left hand side of (1. Definition 1. The number ϕ from the interval 0 mined by the following implicit formula cos(ϕ) = (v | w) . + αm · vm ) = (vi | 0) = 0.7) Using the coefficients of the linear combination (1. g). Definition 1. Two vectors v and w in a Euclidean space V are called orthogonal vectors if their scalar product is zero: (v | w) = 0. Proof.6) is called the angle between two nonzero vectors v and w in a Euclidean space V .3 and 1.8). Due to the property (7) from the theorem 1. .2. . . . vm . its determinant is equal to zero (this fact is known from the course of general algebra).6) is correct. Let v1 . For nonzero vectors v and w these two definitions 1. THE NORM AND THE SCALAR PRODUCT.2) and the scalar product of geometric vectors and due to the Cauchy-Bunyakovsky-Schwarz inequality |(v. Let’s reformulate for the case of Euclidean spaces. |v| |w| ϕ π. A system of vectors v1 . the formula (1. . . . which is deter- (1. this formula means that the columns of Gram matrix gij are linearly dependent.

10) substantially: n n |v|2 = (vi )2. . hence. i=1 (1. . which is equal to zero. there is a nontrivial linear combination of them that is equal to zero: m gij αj = 0. .11) is not fulfilled. en in a Euclidean space V is called an orthonormal basis if the Gram matrix for the basis vectors is the unit matrix: gij = 1 0 for i = j. Let’s consider the Gram matrix of this basis. .8 on completing the basis of a subspace formulated in Chapter I has its analog for orthonormal bases. + αm · vm .1) and the scalar product of vectors (1. for i = j. . . . .10) A basis e1. Conversely. . en is the matrix of the quadratic g in this basis. i=1 j=1 (1. . + αm · vm ) = = (α1 · v1 + . assume that the determinant of the Gram matrix (1. Thus. . . . The theorem 3. j=1 (1. Since g is a positive quadratic form. . Hence. .7) we know that the Gram matrix of the basis vectors e1 . . . .11) in Chapter IV). . Due to (1. . i=1 j=1 (v | w) = gij vi wj . en are unit vectors orthogonal to each other. . . . . which is obviously equal to zero due to the equality (1. + αm · vm | v) = (v | v) = |v|2. en be a basis in a finite-dimensional Euclidean vector space (V. EUCLIDEAN SPACES. Let e1 . .122 CHAPTER V. vm are linearly dependent. . we get |v|2 = 0 and.9): m m m 0= i=1 j=1 αi gij αj = i=1 αi (vi | α1 · v1 + . we get the nontrivial linear combination of the form (1. The theorem 4. we derive v = 0.7) is equal to zero. i=1 (v | w) = vi wi .9) Let’s denote v = α1 · v1 + .2) through their coordinates: n n n n |v|2 = gij vi vj . .8). (1.2) and (1.12) Orthonormal bases do exist. In an orthonormal basis the vectors e1 . . then the basis e1 . we can calculate the norm of vectors (1.3 of Chapter IV says that there exists a basis in which the matrix of g has its canonic form (see (3.1 in Chapter IV). using the positivity of the basic quadratic form g of the Euclidean space V . . Knowing the components of the Gram matrix. . . g). Since v = 0. This simplifies the formulas (1. en is called a skewangular basis. our vector v1 .11) If the condition (1. its matrix in a canonic form is the unit matrix (see theorem 4. Then the columns of this matrix are linearly dependent and. Then consider the following double sum.

es be two orthonormal bases and let S be the transition.1) is a real non-negative number. . Quadratic forms in a Euclidean space. en in U⊥ and then join together two bases of U and U⊥ . . from a right basis to a left basis is a matrix with negative determinant. for the transition matrix S we derive S tr S = 1. . es and e1. conversely. .12) of Chapter IV. (1. Therefore. orthogonal matrices are subdivided into two types: matrices with positive determinant det S = 1 and those with negative determinant det S = −1. 123 Theorem 1.13) for the determinant of an orthogonal matrix we get: (det S)2 = 1. . es of the subspace U . This subdivision is related to the concept of orientation. Therefore. . As a result we get the basis in V (see theorem 6. en in V . . All bases in a linear vector space over the field of real numbers R (not necessarily a Euclidean space) can be subdivided into two sets which can be called «left bases» and «right bases». The subspace U⊥ inherits the structure of a Euclidean space from V . QUADRATIC FORMS IN A EUCLIDEAN SPACE. Let’s consider the orthogonal complement U⊥ of the subspace U . Proof. Let’s choose an orthonormal basis es+1 . . |v|2 (2. ˜ ˜ Let e1.1) The number µ(v) in (2. . . . the subspaces U and U⊥ define the expansion of the space V into a direct sum: V = U ⊕ U⊥ . . For such a form ϕ we define the following ratio: µ(v) = |ϕ(v)| . . . . We say that a linear vector space V over the field of real numbers R is equipped with the orientation if there is some mechanism to distinguish one of two types of bases in V . . The transition matrix for passing from a left basis to a right basis or. Diagonalization of a pair of quadratic forms.4 of Chapter IV. Such a transition matrix changes the orientation of a basis. applying the formulas (1. § 2. . The Gram matrices of these two bases are the unit matrices. .§ 2. g). Therefore.3 of Chapter I). g) be a Euclidean vector space and let ϕ be a quadratic form in V . Then it can be completed up to an orthonormal basis e1. . Hence. this is an orthonormal basis completing the initial basis e1. we can assume v in (2. . . . According to the theorem 4. . S −1 = S tr .13) is called an orthogonal matrix. The vectors of this basis are unit vectors by their length and they are orthogonal to each other. es be an orthonormal basis in a subspace U of a finite-dimensional Euclidean space (V. Let e1. Note that µ(α · v) = µ(v) for any nonzero α ∈ R. From the relationships (1.13) Note that a square matrix S satisfying the above relationships (1.1) to be a unit vector.3. . . The transition matrix for passing from a left basis to a left basis or for passing from a right basis to another right basis is a matrix with positive determinant — it does not change the orientation. Let (V.

This yields 4 α ϕ(v.2) Definition 2. (2. EUCLIDEAN SPACES. w) = ϕ(v + α · w) − ϕ(v − α · w). w): 4 α ϕ(v. If the norm ϕ is finite. Theorem 2.4) (2. From (2. w) .3) Now let’s apply the inequality |ϕ(u)| ≤ ϕ |u|2 derived from (2. As a result we obtain αmin = ϕ(v. This yields the following inequality for the bilinear form ϕ: ϕ(v.3) we derive the following inequality for the quantity 4 α ϕ(v. The quantity ϕ determined by the formulas (2. w) we use the following equality.1.4).5) bringing it to the following one: 4 α ϕ(v. then there is the estimate |ϕ(v. (2. w)2 ϕ 2 |v|2 |w|2.5) Let’s express the squares of moduli through the scalar products: |v ± α · w|2 = |v|2 ± 2 α (v | w) + α2 |w|2.2) in order to estimate the right hand side of (2. If ϕ is a restricted quadratic form. w) 2 ϕ (|v|2 + α2 |w|2). Let’s find the minimum point α = αmin for this function by equating its derivative to zero: f (α) = 0.124 CHAPTER V. w) |ϕ(v + α · w)| + |ϕ(v − α · w)|. Let’s denote by ϕ the least upper bound of µ(v) for all unit vectors (such vectors sweep out the unit sphere in the Euclidean space V ): ϕ = sup µ(v). ϕ |w|2 Now let’s write the inequality f(αmin ) 0 for the minimal value of this function.2) is called the norm of a quadratic form ϕ in a Euclidean vector space V . is a version of the recovery formula: 4 α ϕ(v.1) and (2. The numeric function f(α) of a numeric argument α is a polynomial of degree two in α. In order to calculate ϕ(v. w) + ϕ |v|2 0. |v|=1 (2. Proof. the form ϕ is said to be a restricted quadratic form. Then we can simplify the inequality (2.1. .1) and (2. w) ϕ (|v + α · w|2 + |(v − α · w)|2). w)| ϕ |v| |w| for the values of corresponding symmetric bilinear form. which. Now let’s transform the above inequality a little bit more: f(α) = α2 ϕ |w|2 − 2α ϕ(v. in essential.

Any quadratic form ϕ in a finite-dimensional Euclidean vector space V is a restricted form. .§ 2. . Hence. . For the coordinates of v in this basis due to the formulas (1. Due to (2. . For any quadratic form ϕ in a finite-dimensional Euclidean vector space V the supremum in formula (2. . Proof. in V such that the norm ϕ is expressed as the following limit: ϕ = lim |ϕ(v(s))|. This means that there is a sequence of unit vectors v(1). Theorem 2. So.2) is reached. . s→∞ (2.2) this sum is an upper bound for the norm ϕ . . for the components of v we have |v i | 1. Hence.2.12) we obtain (v1 )2 + . e.6) does not depend v. . QUADRATIC FORMS IN A EUCLIDEAN SPACE. .1). through the coordinates of v: n n µ(v) = |ϕ(v)| = From |vi | ϕij vi vj . Let’s choose an orthonormal basis e1. 125 Now it is easy to derive the required estimate for |ϕ(v. Note that a quite similar method was used when proving the Cauchy-Bunyakovsky-Schwarz inequality in the theorem 1. + (vn (s))2 = 1 (2. + (vn )2 = 1. (2. i. Let’s express the quantity µ(v).6) Right hand site of (2. which is defined by formula (2. From the course of mathematical analysis we knows that the supremum of a numeric set is the limit of some converging sequence of numbers of this set (see [6]). The equality (v1 (s))2 + . . . . en in v and let’s expand each vector v(s) of the sequence in this basis. w)| by taking the square root of both sides of the above inequality. .8) means that each specific coordinate v i (s) yields a restricted sequence of real numbers: −1 vi (s) 1. v(n). .1.7) Let’s choose an orthonormal basis e1 . Proof. . en in V and consider the expansion of a unit vector v in this basis. . in the sequence . . there exists a vector v = 0 such that |ϕ(v)| = ϕ |v|2. .12). . The theorem is proved. i=1 j=1 1 we derive the following estimate for the quantity µ(v): n n µ(v) i=1 j=1 |ϕij | < ∞.8) is derived from |v(s)| = 1 due to the formulas (1. From the course of mathematical analysis we know that in each restricted sequence of real numbers one can choose a converging subsequence. ϕ < ∞. Theorem 2.3. Now the equality (2.

for the unit vector v with coordinates (2. .9) Denote by v the vector whose coordinates are determined by the limit values (2. The theorem is proved.44 in Chapter IV).10) Thus. 1996. Here α is a numeric parameter. EUCLIDEAN SPACES. . Now let’s calculate |ϕ(v)| using the matrix of the quadratic form ϕ and the coordinates of v in the basis e1. The proof is by induction on the dimension of the space V . For any quadratic form ϕ in a finite-dimensional Euclidean vector space (V.3. For the sake of certainty we assume that ϕ(v) 0. It is easy to see that u is also a unit vector. Applying theorems 2. Multiplying v by some number α ∈ R. Let’s denote U = v and consider the orthogonal complement U⊥ . k→∞ (2. The subspaces U = v and U⊥ have zero intersection. . . . Let’s take an arbitrary vector w ∈ U⊥ of the unit length and compose the vector u as follows: u = cos(α) · v + sin(α) · w. taking into account (2. Suppose that the proposition of the theorem is valid for all quadratic forms in Euclidean spaces of the dimension less than n.A. en such that the matrix of the form ϕ in this basis is a diagonal matrix. we find a unit vector v ∈ V such that |ϕ(v)| = ϕ .4. we get a subsequence of unit vectors v(sk ) such that their coordinates all are the converging sequences of numbers. en : n n n n |ϕ(v)| = ϕij vi vj = lim i=1 j=1 k→∞ ϕij vi (sk ) vj (sk ) . their sum is a direct sum and U ⊕ U⊥ = V (see theorem 4. Then we can remove the modulus sign: ϕ(v) = ϕ .2 and 2. g). Let’s consider the limits of these sequences: vi = lim vi (sk ). . Theorem 2.7). In the case ϕ(v) < 0 we replace the form ϕ by the opposite form ϕ = −ϕ since two ˜ opposite forms diagonalize simultaneously. we conclude that v is a unit vector: |v| = 1. i=1 j=1 On the other hand. . of unit vectors v(s) one can choose a subsequence of unit vectors whose first coordinates form a convergent sequence of numbers. for |ϕ(v)| we get |ϕ(v)| = lim |ϕ(v(sk ))| = lim |ϕ(v(s))| = ϕ .9) we get |ϕ(v)| = ϕ . Proof. 2004. Then the equality (2.10) will be written as |ϕ(v)| = ϕ |v|2.9). Let’s denote this subsequence again by v(s) and choose its subsequence with converging second coordinates. this follows from the identity cos 2 (α) + sin2 (α) = 1. .126 CHAPTER V. we can remove the restriction |v| = 1. Let dim V = n and let ϕ be a quadratic form in the Euclidean space (V. g) there is an orthonormal basis e1 . k→∞ s→∞ (2.. . In the case dim V = 1 the proposition of the theorem is obvious: any square 1 × 1 matrix is a diagonal matrix. Passing to the limit s → ∞ in (2.8). CopyRight c Sharipov R. Repeating this choice n-times for each specific coordinate.

§ 3. Selfadjoint operators. The theorem on the spectrum and the basis of eigenvectors for a selfadjoint operator.

Definition 3.1. A linear operator f : V → V in a Euclidean vector space V is called a symmetric operator or a selfadjoint operator if for any two vectors v, w ∈ V the following equality is fulfilled: (v | f(w)) = (f(v) | w).

In § 4 of Chapter III we have introduced the concept of the conjugate mapping. There we have shown that any linear mapping f : V → W possesses the conjugate mapping f* : W* → V*. It is related to f by means of the equality

    ⟨f*(u) | v⟩ = ⟨u | f(v)⟩,        (3.1)

which is fulfilled for all u ∈ W* and for all v ∈ V. For a linear operator f : V → V the conjugate mapping f* is a linear operator in the dual space V*.

Definition 3.2. A linear operator h : V → V in a Euclidean vector space V is called an adjoint operator to the operator f : V → V if for any two vectors v, w ∈ V the following equality is fulfilled: (v | f(w)) = (h(v) | w). The adjoint operator is denoted as follows: h = f⁺.
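A minimal numerical check of the definitions 3.1 and 3.2 in the standard Euclidean space ℝ^n (an editorial illustration, not taken from the book; the matrices and names are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
F = rng.standard_normal((3, 3))
S = (F + F.T) / 2                      # a symmetric matrix gives a selfadjoint operator
v, w = rng.standard_normal(3), rng.standard_normal(3)

print(np.isclose(v @ (S @ w), (S @ v) @ w))    # (v | f(w)) = (f(v) | w), definition 3.1
print(np.isclose(v @ (F @ w), (F.T @ v) @ w))  # (v | f(w)) = (h(v) | w) with h given by F^T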

Like every quadratic form, the form g possesses the associated mapping a_g : V → V* (see § 2 in Chapter III) such that

    ⟨a_g(v) | w⟩ = g(v, w) = (v | w).        (3.2)

In the case of a finite-dimensional space V and a positive form g the associated mapping a_g is bijective. Therefore, for any linear operator f : V → V we can define the composition h = a_g⁻¹ ∘ f* ∘ a_g. Then from (3.1) and (3.2) we derive

    (h(v) | w) = ⟨a_g ∘ h(v) | w⟩ = ⟨f* ∘ a_g(v) | w⟩ = ⟨a_g(v) | f(w)⟩ = (v | f(w)).        (3.3)

Comparing (3.3) with the definition 3.2, we conclude that the operator h = a_g⁻¹ ∘ f* ∘ a_g is an adjoint operator for f.

Theorem 3.1. For any operator f in a finite-dimensional Euclidean space (V, g) there is the unique adjoint operator f⁺ = a_g⁻¹ ∘ f* ∘ a_g.

Proof. The existence of an adjoint operator is already derived from the formula f⁺ = a_g⁻¹ ∘ f* ∘ a_g and the equality (3.3). Let's prove its uniqueness. Assume that h is another operator satisfying the definition 3.2. Then for the difference r = h − f⁺ we derive the relationship

    (r(v) | w) = (h(v) | w) − (f⁺(v) | w) = (v | f(w)) − (v | f(w)) = 0.        (3.4)

Since w in (3.4) is an arbitrary vector, we conclude that r(v) ∈ Ker g. However, Ker g = {0} for a positive quadratic form g. Hence, r(v) = 0 for any v ∈ V. This means that r = 0 and h = f⁺. Thus, the adjoint operator f⁺ for f is unique. This completes the proof of the theorem.

The relationship f⁺ = a_g⁻¹ ∘ f* ∘ a_g can be expressed in the form of the following commutative diagram:

              f⁺
        V ---------→ V
        |            |
     a_g|            |a_g
        ↓            ↓
        V* --------→ V*
              f*

Relying upon the existence and the uniqueness of the adjoint operator f⁺ for any operator f ∈ End(V), we can formulate the following corollary.

Corollary. The passage from f to f⁺ is an operator in the space of endomorphisms End(V) of a finite-dimensional Euclidean vector space (V, g). This operator possesses the following properties:

    (f + h)⁺ = f⁺ + h⁺,        (α · f)⁺ = α · f⁺,
    (f ∘ h)⁺ = h⁺ ∘ f⁺,        (f⁺)⁺ = f.

All the above relationships can be derived immediately from the definition 3.2.

Comparing the definitions 3.1 and 3.2, now we see that a selfadjoint operator f is an operator which is adjoint to itself: f⁺ = f.

Let e_1, ..., e_n be a basis in a finite-dimensional Euclidean space (V, g) and let h^1, ..., h^n be the corresponding dual basis composed by coordinate functionals. For any vector v ∈ V we have the following expansion, which follows from the definition of coordinate functionals (see Chapter III):

    v = h^1(v) · e_1 + ... + h^n(v) · e_n.

Let's apply this expansion in order to calculate the matrix of the associated mapping a_g. For this purpose we need to apply a_g one by one to all basis vectors e_1, ..., e_n and expand the results in the dual basis in V*. Let's consider the value of the functional a_g(e_i) on an arbitrary vector v of the space V:

    ⟨a_g(e_i) | v⟩ = g(e_i, v) = g(e_i, h^1(v) · e_1 + ... + h^n(v) · e_n) = Σ_{j=1}^{n} g_ij h^j(v).

Since v ∈ V is an arbitrary vector, we have:

    a_g(e_i) = Σ_{j=1}^{n} g_ij h^j.        (3.5)

Hence, we conclude that the matrix of the associated mapping a_g in the two bases e_1, ..., e_n and h^1, ..., h^n coincides with the matrix g_ij = g(e_i, e_j) of the quadratic form g in the basis e_1, ..., e_n. The matrix g_ij is symmetric and non-degenerate since the form g is positive. Let's denote by g^ij the components of the matrix inverse to g_ij. The matrix inverse to a symmetric matrix is again a symmetric matrix (this fact is well-known from general algebra); therefore g^ij = g^ji. The matrix g^ij is the matrix of the inverse mapping a_g⁻¹:

    a_g⁻¹(h^j) = Σ_{i=1}^{n} g^ji e_i.

Remember that we have already calculated the matrix of the conjugate mapping f* (see § 4 in Chapter III). When applied to our present case the results of Chapter III mean that the matrix of the operator f* : V* → V* in the basis of coordinate functionals h^1, ..., h^n coincides with the matrix of the initial operator f in the basis e_1, ..., e_n. Let's combine this fact with (3.5) and let's use the formula f⁺ = a_g⁻¹ ∘ f* ∘ a_g from the theorem 3.1. Then for the matrix F⁺ of the adjoint operator f⁺ we obtain:

    (F⁺)^i_j = Σ_{k=1}^{n} Σ_{q=1}^{n} g^iq F^k_q g_kj.        (3.6)

In matrix form the formula (3.6) is written as

    F⁺ = G⁻¹ F^tr G,

where G is the Gram matrix of that basis in which the matrices of f and f⁺ are calculated.

The formula (3.6) simplifies substantially for orthonormal bases. In an orthonormal basis the Gram matrix is the unit matrix, hence F⁺ = F^tr. Here the passage to the adjoint operator means the transposition of its matrix. Therefore, the matrix of a selfadjoint operator in an orthonormal basis is symmetric. For this reason selfadjoint operators are often called symmetric operators.
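The matrix formula F⁺ = G⁻¹ F^tr G can be verified numerically. The following sketch is an editorial illustration with randomly chosen F and G; it assumes NumPy and writes the scalar product as (x | y) = x^T G y.

import numpy as np

rng = np.random.default_rng(3)
n = 4
F = rng.standard_normal((n, n))                               # matrix of the operator f
L = rng.standard_normal((n, n)); G = L @ L.T + n * np.eye(n)  # Gram matrix (positive)

F_adj = np.linalg.inv(G) @ F.T @ G                            # matrix of f+, formula (3.6)

v, w = rng.standard_normal(n), rng.standard_normal(n)
lhs = v @ G @ (F @ w)                                         # (v | f(w))
rhs = (F_adj @ v) @ G @ w                                     # (f+(v) | w)
print(np.isclose(lhs, rhs))                                   # the defining property holds
# in an orthonormal basis G is the unit matrix and F_adj reduces to F.T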

Let f : V → V be a selfadjoint operator in a Euclidean space V. Each such operator produces the quadratic form ϕ_f according to the formula

    ϕ_f(v) = (v | f(v)).        (3.7)

Conversely, assume that we have a quadratic form ϕ in a finite-dimensional Euclidean space (V, g). The form ϕ determines the associated mapping a_ϕ (see Chapter IV). This mapping satisfies the relationship

    ⟨a_ϕ(v) | w⟩ = ϕ(v, w)        (3.8)

for any two vectors v, w ∈ V. The positive quadratic form g defining the structure of a Euclidean space in V has also its own associated mapping a_g. The mapping a_g is bijective since g is non-degenerate. Since a_g is bijective, we can consider the composition of a_g⁻¹ and a_ϕ:

    f_ϕ = a_g⁻¹ ∘ a_ϕ.        (3.9)

The composition (3.9) is an operator in V. It is called the associated operator of the form ϕ in a Euclidean space. Due to (3.2) we can write ⟨u | w⟩ = (a_g⁻¹(u) | w). Applying this equality and (3.8), we find

    (f_ϕ(v) | w) = (a_g⁻¹(a_ϕ(v)) | w) = ⟨a_ϕ(v) | w⟩ = ϕ(v, w).        (3.10)

Now, using the symmetry of the form ϕ(v, w) in (3.10), we derive

    (f_ϕ(v) | w) = ϕ(v, w) = ϕ(w, v) = (f_ϕ(w) | v) = (v | f_ϕ(w)),        (3.11)

which is an identity for all v, w ∈ V. It means that f_ϕ is a selfadjoint operator (see the definition 3.1).

The formula (3.7) associates each selfadjoint operator f with the quadratic form ϕ_f, while the formula (3.9) associates each quadratic form ϕ with the selfadjoint operator f_ϕ. These two associations are one to one and are inverse to each other. Indeed, let's construct the operator h = f_ϕ for the quadratic form ϕ = ϕ_f. For the operator h and for two arbitrary vectors v, w ∈ V from (3.10) we derive

    (h(v) | w) = ϕ_f(v, w) = (v | f(w)) = (f(v) | w).

Since w ∈ V is an arbitrary vector and since the form g determining the scalar product in V is non-degenerate, from the above equality we get h(v) = f(v), i.e. h = f. Conversely, let's apply the formula (3.7) to the operator (3.9) and use (3.10):

    ϕ_f(v) = (v | f_ϕ(v)) = ϕ(v, v) = ϕ(v).

Thus, from what was said above we conclude that defining a selfadjoint operator in a finite-dimensional Euclidean space is equivalent to defining a quadratic form in this space. Therefore, we can apply the theorem 2.4 for describing selfadjoint operators in the finite-dimensional case.
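In coordinates the correspondence ϕ ↔ f_ϕ becomes a pair of matrix formulas: the associated operator has the matrix G⁻¹ A, where A is the matrix of ϕ, and the form is recovered as ϕ(v) = v^T G F v. The sketch below is an editorial illustration, not from the original text; A, G and all names are arbitrary.

import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n)); A = (A + A.T) / 2             # matrix of phi
L = rng.standard_normal((n, n)); G = L @ L.T + n * np.eye(n)   # Gram matrix of g

F = np.linalg.inv(G) @ A            # matrix of the associated operator f_phi
v, w = rng.standard_normal(n), rng.standard_normal(n)

print(np.isclose((F @ v) @ G @ w, v @ A @ w))   # (f_phi(v) | w) = phi(v, w), formula (3.10)
print(np.isclose(v @ G @ (F @ v), v @ A @ v))   # phi_f(v) = phi(v), the inverse association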

Theorem 3.2. All eigenvalues of a selfadjoint operator f in a finite-dimensional Euclidean space V are real numbers and there is an orthonormal basis composed by eigenvectors of such an operator.

Proof. For the selfadjoint operator f in V we consider the symmetric bilinear form ϕ_f(v, w) determined by the quadratic form (3.7). Due to the theorem 2.4 there is an orthonormal basis e_1, ..., e_n in which the matrix of the form ϕ_f is diagonal. From the formula (3.7) we derive the following equalities:

    ϕ_f(e_i, e_j) = (e_i | f(e_j)) = Σ_{k=1}^{n} F^k_j g_ik = F^i_j.        (3.12)

As we see in (3.12), the matrices of the operator f and of the form ϕ_f in such a basis do coincide. Hence, the matrix of the operator f in the basis e_1, ..., e_n is diagonal. Its diagonal elements are real numbers; they are the eigenvalues of the operator f, while the basis vectors e_1, ..., e_n are the corresponding eigenvectors. This proves the proposition of the theorem.

The theorem 3.2 is known as the theorem on the spectrum and the basis of eigenvectors of a selfadjoint operator. The main result of this theorem is the diagonalizability of selfadjoint operators in a finite-dimensional Euclidean space. The characteristic polynomial of a selfadjoint operator is factorized into the product of linear terms in ℝ. Its eigenspaces coincide with the corresponding root subspaces, and the sum of all its eigenspaces coincides with the space V:

    V = V_λ1 ⊕ ... ⊕ V_λs.        (3.13)

Theorem 3.3. Any two eigenvectors of a selfadjoint operator corresponding to different eigenvalues are orthogonal to each other.

Proof. Let f be a selfadjoint operator in a Euclidean space and let λ ≠ µ be two of its eigenvalues. Let's consider the corresponding eigenvectors a and b: f(a) = λ · a, f(b) = µ · b. Then for these two eigenvectors a and b we derive:

    λ (a | b) = (f(a) | b) = (a | f(b)) = µ (a | b).

Hence, (λ − µ) (a | b) = 0. But we know that λ − µ ≠ 0. Hence, (a | b) = 0. The theorem is proved.

Assume that the kernel of a selfadjoint operator f is nontrivial: Ker f ≠ {0}. Then λ_1 = 0 in (3.13) is one of the eigenvalues of the operator f and we have Ker f = V_λ1. Due to the theorem 3.3 this means that the kernel and the image of a selfadjoint operator are orthogonal to each other and their sum coincides with V:

    V = Ker f ⊕ Im f,        Im f = V_λ2 ⊕ ... ⊕ V_λs.        (3.14)
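The content of the theorems 3.2 and 3.3 is easy to observe numerically for a symmetric matrix, which represents a selfadjoint operator in an orthonormal basis. The example below is an editorial illustration; the particular matrix is chosen to have a nontrivial kernel.

import numpy as np

F = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 0.0]])          # symmetric, rank 2, so Ker f is nontrivial

lam, Q = np.linalg.eigh(F)               # real eigenvalues, orthonormal eigenvectors
print(lam)                               # [0., 1., 3.], all real
print(np.allclose(Q.T @ Q, np.eye(3)))   # the eigenvectors form an orthonormal basis
print(np.allclose(Q @ np.diag(lam) @ Q.T, F))   # diagonalization of f

kernel = Q[:, np.isclose(lam, 0.0)]      # eigenvectors with eigenvalue 0 span Ker f
image  = Q[:, ~np.isclose(lam, 0.0)]     # the remaining ones span Im f
print(np.allclose(kernel.T @ image, 0))  # Ker f is orthogonal to Im f, as in (3.14)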

§ 4. Isometries and orthogonal operators.

Definition 4.1. A linear mapping f : V → W from one Euclidean vector space (V, g) to another Euclidean vector space (W, h) is called an isometry if it preserves the scalar product of vectors, i.e. if

    (f(x) | f(y)) = (x | y)        (4.1)

for all x, y ∈ V.

From (4.1) we easily derive |f(x)| = |x|, i.e. an isometry preserves the norm of vectors. Hence, f(x) = 0 implies |x| = 0 and x = 0. This means that the kernel of an isometry is always trivial: Ker f = {0}. Therefore, any isometry is an injective mapping. Due to the recovery formula for quadratic forms (see formula (1.6) in Chapter IV), in order to verify that a linear mapping f : V → W is an isometry it is sufficient to verify that it preserves the norm of vectors, i.e. that |f(x)| = |x| for all vectors x ∈ V.

Theorem 4.1. The composition of isometries is again an isometry.

Proof. Assume that the mappings h : U → V and f : V → W both are isometries, i.e. |h(u)| = |u| for all u ∈ U and |f(v)| = |v| for all v ∈ V. Then |f ∘ h(u)| = |f(h(u))| = |h(u)| = |u| for all u ∈ U. This equality means that the mapping f ∘ h is an isometry. The theorem is proved.

Definition 4.2. A bijective isometry f : V → W is called an isomorphism of Euclidean vector spaces.

Definition 4.3. Two Euclidean vector spaces V and W are called isomorphic if there is an isomorphism f : V → W relating them.

Theorem 4.2. Isomorphisms of Euclidean vector spaces possess the following three properties:
(1) the identical mapping id_V is an isomorphism;
(2) the composition of isomorphisms is an isomorphism;
(3) the mapping inverse to an isomorphism is an isomorphism.

The proof of this theorem is very easy if we use the above theorem 4.1 and the results of § 8 in Chapter I.

Let's consider the arithmetic vector space ℝ^n composed by column vectors of the height n. The addition of such vectors and the multiplication of them by real numbers are performed as the operations with their components (see formulas (2.1) in Chapter I). Let's define a quadratic form g(x) in ℝ^n by setting

    g(x) = (x^1)^2 + ... + (x^n)^2 = Σ_{i=1}^{n} (x^i)^2.        (4.2)

The form (4.2) yields the standard scalar product and, therefore, defines the standard structure of a Euclidean space in ℝ^n.
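The remark about the recovery formula can be illustrated as follows. This is an editorial sketch assuming NumPy; the orthogonal matrix Q is merely a convenient example of a norm-preserving linear mapping of ℝ^4.

import numpy as np

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))      # an orthogonal matrix
x, y = rng.standard_normal(4), rng.standard_normal(4)

# the scalar product is recovered from the norm:
# (x | y) = (|x + y|^2 - |x|^2 - |y|^2) / 2
recover = lambda u, v: (np.linalg.norm(u + v)**2
                        - np.linalg.norm(u)**2
                        - np.linalg.norm(v)**2) / 2
print(np.isclose(recover(x, y), x @ y))               # the recovery formula itself
print(np.isclose((Q @ x) @ (Q @ y), x @ y))           # a norm-preserving map preserves (x | y)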

Theorem 4.3. Any n-dimensional Euclidean vector space V is isomorphic to the space ℝ^n with the standard scalar product (4.2).

In order to prove this theorem it is sufficient to choose an orthonormal basis in V and consider the mapping ψ that associates a vector v ∈ V with the column vector of its coordinates (see § 5 in Chapter I).

Definition 4.4. An operator f in a Euclidean vector space V is called an orthogonal operator if it is bijective and defines an isometry f : V → V.

Due to the theorem 4.2 the orthogonal operators form a group, which is called the orthogonal group of a Euclidean space V and is denoted by O(V). The group O(V) is obviously a subgroup in the group of automorphisms Aut(V). In the case V = ℝ^n the orthogonal group determined by the standard scalar product in ℝ^n is denoted by O(n, ℝ).

Let e_1, ..., e_n be an orthonormal basis in a Euclidean space V and let f be an orthogonal operator. Then from (4.1) we derive (f(e_i) | f(e_j)) = (e_i | e_j). For the matrix of the operator f in the basis e_1, ..., e_n this relationship yields:

    Σ_{k=1}^{n} F^k_i F^k_j = 1 for i = j and 0 for i ≠ j.        (4.3)

When written in the matrix form, the formula (4.3) means that

    F^tr F = 1,  i.e.  F⁻¹ = F^tr.        (4.4)

The relationships (4.4) are identical to the relationships derived in § 1 for transition matrices relating two orthonormal bases. Matrices that satisfy such relationships are called orthogonal matrices. As a corollary of this fact we can formulate the following theorem.

Theorem 4.4. An orthogonal operator f in an orthonormal basis e_1, ..., e_n of a Euclidean vector space V is given by an orthogonal matrix.

Since det F^tr = det F, from (4.4) we get (det F)^2 = 1. Hence, as we already know, the determinant of an orthogonal matrix can be equal to 1 or to −1. The orthogonal operators in V with determinant 1 form a group which is called the special orthogonal group of a Euclidean vector space V. This group is denoted by SO(V). If V = ℝ^n, this group is denoted by SO(n, ℝ).

The operators f ∈ SO(V) in the two-dimensional case dim V = 2 are the most simple ones. If e_1, e_2 is an orthonormal basis in V, then from (4.3) and det F = 1 we easily find the form of an orthogonal matrix F with unit determinant:

    F = | cos(ϕ)  −sin(ϕ) |
        | sin(ϕ)   cos(ϕ) |        (4.5)

A matrix F of the form (4.5) is called a matrix of two-dimensional rotation, while the numeric parameter ϕ is interpreted as the angle of rotation.
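A small numerical check of the relationships (4.4) and of the determinant condition for a two-dimensional rotation matrix (an illustration added here, not part of the book; the angle is arbitrary):

import numpy as np

phi = 0.7                                     # an arbitrary rotation angle
F = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

print(np.allclose(F.T @ F, np.eye(2)))        # F^tr F = 1, formula (4.4)
print(np.allclose(np.linalg.inv(F), F.T))     # F^{-1} = F^tr
print(np.isclose(np.linalg.det(F), 1.0))      # det F = 1, so F belongs to SO(2, R)

x = np.array([1.0, 0.0])
print(np.isclose(np.linalg.norm(F @ x), np.linalg.norm(x)))   # lengths are preserved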

Let's consider orthogonal operators f ∈ SO(V) in the case dim V = 3. A matrix of the form

    F = | cos(ϕ)  −sin(ϕ)  0 |
        | sin(ϕ)   cos(ϕ)  0 |
        |   0        0     1 |        (4.6)

is an orthogonal matrix with determinant 1. The operator f associated with the matrix (4.6) is called the operator of rotation about the vector e_3 by the angle ϕ.

Theorem 4.5. In a three-dimensional Euclidean vector space V any orthogonal operator f with determinant 1 has an eigenvalue λ = 1.

Proof. Let e_1, e_2, e_3 be an orthonormal basis in V. Let's consider the characteristic polynomial of the operator f. This is a polynomial of degree 3 in λ with real coefficients:

    P(λ) = −λ^3 + F_1 λ^2 − F_2 λ + F_3,  where F_3 = det f = 1.

Remember that the values of a polynomial of odd degree for large positive λ and for large negative λ differ in sign:

    lim_{λ→−∞} P(λ) = +∞,        lim_{λ→+∞} P(λ) = −∞.

Therefore the equation P(λ) = 0 of odd degree with real coefficients has at least one real root λ = λ_1. This root is an eigenvalue of the operator f. Let e_1 ≠ 0 be an eigenvector of f corresponding to the eigenvalue λ_1. Then, applying the isometry condition |v| = |f(v)| to the vector v = e_1, we get

    |e_1| = |f(e_1)| = |λ_1 · e_1| = |λ_1| |e_1|.

Hence, we find that |λ_1| = 1. This means that λ_1 = 1 or λ_1 = −1. In the case λ_1 = 1 the proposition of the theorem is valid. Therefore, we consider the case λ_1 = −1. Let's separate the linear factor (λ + 1) in the characteristic polynomial:

    P(λ) = −λ^3 + F_1 λ^2 − F_2 λ + 1 = −(λ + 1)(λ^2 − Φ_1 λ − 1).

Then F_1 = Φ_1 − 1 and F_2 = −1 − Φ_1. In order to find the remaining roots of the polynomial P(λ) we consider the following quadratic equation:

    λ^2 − Φ_1 λ − 1 = 0.

This equation always has two real roots λ_2 and λ_3 since its discriminant is positive: D = (Φ_1)^2 + 4 > 0. Due to Vieta's theorem we have λ_2 λ_3 = −1. Due to the same reasons as above in the case of λ_1, for λ_2 and λ_3 we get |λ_2| = |λ_3| = 1. Hence, one of these two real numbers is equal to 1 and the other is equal to −1. Thus, we have proved that the number λ = 1 is among the eigenvalues of the operator f. The theorem is proved.
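The statement of the theorem 4.5 can be tested on random matrices from SO(3). This is an editorial sketch; producing a random orthogonal matrix through the QR decomposition is a common trick and is not taken from the book.

import numpy as np

rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # a random orthogonal matrix
if np.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]                             # fix the sign so that det Q = 1

eigvals = np.linalg.eigvals(Q)
print(np.isclose(np.linalg.det(Q), 1.0))           # Q belongs to SO(3, R)
print(np.any(np.isclose(eigvals, 1.0)))            # the eigenvalue 1 is always present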

Theorem 4.6. In a three-dimensional Euclidean vector space V for any orthogonal operator f with determinant 1 there is an orthonormal basis in which the matrix of f has the form (4.6).

Proof. Due to the theorem 4.5 the operator f has an eigenvalue λ_1 = 1. Let e_1 ≠ 0 be an eigenvector of this operator associated with the eigenvalue λ_1 = 1. It can be normalized to the unit length. Let's denote by U = ⟨e_1⟩ the span of the eigenvector e_1 and consider its orthogonal complement U⊥. This is a two-dimensional subspace in the three-dimensional space V. This subspace is invariant under the action of f. Indeed, from x ∈ U⊥ we derive (x | e_1) = 0. Let's write the isometry condition (4.1) for the vectors x and y = e_1:

    0 = (x | e_1) = (f(x) | f(e_1)) = λ_1 (f(x) | e_1).

Since λ_1 = 1, we get (f(x) | e_1) = 0. Hence, f(x) ∈ U⊥, which proves the invariance of the subspace U⊥.

Let's consider the restriction of the operator f to the invariant subspace U⊥. This restriction is an orthogonal operator in the two-dimensional space U⊥, its determinant being equal to 1 (since det f = λ_1 · det of the restriction). Therefore, in some orthonormal basis e_2, e_3 of U⊥ the matrix of the restricted operator has the form (4.5). Remember that e_1 is perpendicular to e_2 and e_3. Then the three vectors e_1, e_2, e_3 form an orthonormal basis in the three-dimensional space V and the matrix of f in this basis has the form (4.6). The theorem is proved.

The result of this theorem is that any orthogonal operator f with determinant 1 in a three-dimensional Euclidean vector space V is an operator of rotation. The eigenvector e_1 associated with the eigenvalue λ_1 = 1 determines the axis of rotation, while the real parameter ϕ in the matrix (4.6) determines the angle of such rotation.
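In practice the axis and the angle of the rotation can be extracted directly from the matrix of f written in an arbitrary orthonormal basis. The sketch below is an illustration only; the trace formula tr F = 1 + 2 cos(ϕ) used in it follows from (4.6) but is not stated in the text.

import numpy as np

phi = 1.2
F = np.array([[np.cos(phi), -np.sin(phi), 0.0],
              [np.sin(phi),  np.cos(phi), 0.0],
              [0.0,          0.0,         1.0]])

# conjugate by a random orthogonal matrix: the same rotation in another basis
rng = np.random.default_rng(7)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R = Q @ F @ Q.T

w, v = np.linalg.eig(R)
axis = np.real(v[:, np.argmin(np.abs(w - 1.0))])   # eigenvector for lambda = 1: the axis
angle = np.arccos((np.trace(R) - 1.0) / 2.0)       # the rotation angle
print(axis / np.linalg.norm(axis), angle)          # angle equals phi up to rounding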

CHAPTER VI

AFFINE SPACES.

§ 1. Points and parallel translations. Affine spaces.

Let M be an arbitrary set.

Definition 1.1. A transformation of the set M is a bijective mapping p : M → M of the set M onto itself.

Let V be a linear vector space. We say that an action of V on a set M is defined if each vector v ∈ V is associated with some transformation p_v of the set M and the following conditions are fulfilled:
(1) p_0 = id_M;
(2) p_{v+w} = p_v ∘ p_w for all v, w ∈ V.

From the properties (1) and (2) of an action of a space V on a set M one can easily derive the following two properties of such an action:
(3) p_{−v} = p_v⁻¹ for all v ∈ V;
(4) p_v ∘ p_w = p_w ∘ p_v for all v, w ∈ V.

Definition 1.2. An action of a vector space V on a set M is called a transitive action if for any two elements A, B ∈ M there is a vector v ∈ V such that p_v(A) = B, i.e. the transformation p_v takes A to B.

Definition 1.3. An action of a vector space V on a set M is called a free action if for any element A ∈ M the equality p_v(A) = A implies v = 0.

Definition 1.4. A set M is called an affine space over the field K if there is a free transitive action of some linear vector space V over the field K on M.

Due to this definition any affine space M is associated with some linear vector space V. Therefore an affine space M is often denoted as a pair (M, V). Elements of an affine space are usually called points; we shall denote them by capital letters A, B, C, etc. An affine space itself is sometimes called a point space. A transformation p_v given by a vector v ∈ V is called a parallel translation in an affine space M.

Let U be a subspace in V. Let's choose a point A ∈ M and then define a subset L ⊂ M in the following way:

    L = {B ∈ M : ∃ u ((u ∈ U) & (B = p_u(A)))}.        (1.1)

A subset L of M determined according to (1.1) is called a linear submanifold of an affine space M. Thereby the subspace U ⊂ V is called the directing subspace of the linear submanifold L. The dimension of the directing subspace U in (1.1) is taken for the dimension of the linear submanifold L. One-dimensional linear submanifolds are called straight lines, two-dimensional submanifolds are called planes, etc. If the dimension of U is less by one than the dimension of V, i.e. if dim(V/U) = 1, then the corresponding linear submanifold L is called a hyperplane. Linear submanifolds of other intermediate dimensions have no special titles.
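The standard model of these definitions is M = ℝ^n acted upon by V = ℝ^n through p_v(A) = A + v. The following sketch is an editorial illustration (all particular numbers are arbitrary); it checks the conditions (1), (2), the derived properties (3), (4), and the transitivity of this action.

import numpy as np

p = lambda v: (lambda A: A + v)          # the parallel translation p_v in the model

v, w = np.array([1.0, 2.0, 3.0]), np.array([-2.0, 0.5, 1.0])
A, B = np.array([0.0, 1.0, 0.0]), np.array([4.0, 4.0, 4.0])

print(np.allclose(p(np.zeros(3))(A), A))            # (1)  p_0 = id_M
print(np.allclose(p(v + w)(A), p(v)(p(w)(A))))      # (2)  p_{v+w} = p_v composed with p_w
print(np.allclose(p(-v)(p(v)(A)), A))               # (3)  p_{-v} is inverse to p_v
print(np.allclose(p(v)(p(w)(A)), p(w)(p(v)(A))))    # (4)  parallel translations commute
print(np.allclose(p(B - A)(A), B))                  # transitivity: v = B - A takes A to B
# freeness is clear in this model: A + v = A forces v = 0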

Let U = ⟨a⟩ be a one-dimensional subspace in V. Then any vector u ∈ U is presented as u = t · a, where t ∈ K. Upon choosing a point A ∈ M the subspace U determines the straight line in M passing through the point A. An arbitrary point A(t) of this straight line is given by the formula

    A(t) = p_{t·a}(A),        (1.2)

where t ∈ K is a parameter. The formula (1.2) is known as a parametric equation of a straight line in an affine space; the vector a in it is called a directing vector of the straight line.

Let A and B be two points of an affine space M. Due to the transitivity of the action of V on M there exists a vector v ∈ V that defines the parallel translation p_v taking the point A to the point B: p_v(A) = B. Let's prove that such a parallel translation is unique. If p_w is another parallel translation such that p_w(A) = B, then for the parallel translation p_{w−v} we have

    p_{w−v}(A) = p_{−v} ∘ p_w(A) = p_v⁻¹(p_w(A)) = p_v⁻¹(B) = A.

Since V acts freely on M (see the definition 1.3), we have w − v = 0, i.e. w = v. This proves the uniqueness of the vector v determined by the condition p_v(A) = B. Hence, each pair of points A, B ∈ M specifies the unique vector a ∈ V such that p_a(A) = B. This vector can be used as a directing vector of the straight line (1.2) passing through the points A and B.

If K = ℝ, we can consider the set of points on the straight line (1.2) corresponding to the values of t taken from the interval [0, 1] ⊂ ℝ. Such a set is called a segment of a straight line. The points A = A(0) and B = A(1) are the ending points of this segment. One can choose a direction on the segment AB by saying that one of the ending points is the beginning of the segment and the other is the end of the segment. A segment AB with a fixed direction on it is called a directed segment or an arrowhead segment. Two arrowhead segments AB and BA are assumed to be distinct¹.

The arrowhead segment with the beginning at the point A and with the end at the point B is called the geometric representation of the vector a. A vector a is uniquely determined by its geometric representation AB. However, a vector a can have several geometric representations. Indeed, if we choose a point C ≠ A, we can determine the point D = p_a(C) and then construct the geometric representation CD of the vector a. The points A and C specify a parallel translation p_b such that p_b(A) = C. Using the property (4) of parallel translations, it is easy to find that the parallel translation p_b maps the segment AB to the segment CD. So we conclude: various geometric representations of a vector a are related to each other by means of parallel translations.

The above fact appears to be very useful: if we have an affine space (M, V), then vectors of V can be represented by arrowhead segments in M. Note that p_{a+b}(A) = D. Hence, AD is a geometric realization of the vector a + b. From this fact we easily derive the well-known rules for vector addition: the triangle rule AC + CD = AD and the parallelogram rule AB + AC = AD.

¹ If K ≠ ℝ, an arrowhead segment AB is assumed to consist of the two points A and B only, i.e. it has no interior at all.
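The same model makes the parametric equation (1.2) and the relation between different geometric representations of a vector concrete. This is an editorial illustration, not part of the original text; the points are arbitrary.

import numpy as np

A = np.array([1.0, 1.0])
B = np.array([4.0, 3.0])
a = B - A                                   # the unique vector with p_a(A) = B

line = lambda t: A + t * a                  # A(t) = p_{t*a}(A), formula (1.2)
print(line(0.0), line(1.0))                 # the ending points A and B of the segment
print(line(0.5))                            # the midpoint of the segment AB

C = np.array([0.0, 5.0])                    # another point, C != A
D = C + a                                   # D = p_a(C): a second representation CD of a
b = C - A                                   # the translation p_b takes A to C
print(np.allclose(B + b, D))                # p_b also takes B to D, so it maps AB onto CD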

Let O be some fixed point of an affine space M. Let's call it the origin. Then any point A ∈ M specifies the arrowhead segment OA, which is identified with the unique vector r ∈ V by means of the equality p_r(O) = A. This vector r = r_A is called the radius-vector of the point A. If the space V is finite-dimensional, then we can choose a basis e_1, ..., e_n in V and expand the radius-vectors of all points A ∈ M in this basis. The coordinates of the radius-vector r_A = OA in the basis e_1, ..., e_n are called the coordinates of the point A in the coordinate system O, e_1, ..., e_n.

Definition 1.5. A frame or a coordinate system in an affine space M is a pair consisting of a point O ∈ M and a basis e_1, ..., e_n in V.

Coordinate systems in affine spaces play the same role as bases in linear vector spaces. Let O, e_1, ..., e_n and Õ, ẽ_1, ..., ẽ_n be two coordinate systems in an affine space M. The relation of the bases e_1, ..., e_n and ẽ_1, ..., ẽ_n is given by the direct and inverse transition matrices S and T. The points O and Õ determine the arrowhead segment OÕ and the opposite arrowhead segment ÕO. They are associated with two vectors ρ, ρ̃ ∈ V:

    ρ = OÕ,        ρ̃ = ÕO.        (1.3)

Let's expand ρ in the basis e_1, ..., e_n and ρ̃ in the basis ẽ_1, ..., ẽ_n:

    ρ = ρ^1 · e_1 + ... + ρ^n · e_n,        ρ̃ = ρ̃^1 · ẽ_1 + ... + ρ̃^n · ẽ_n.        (1.4)

Though the vectors ρ and ρ̃ differ only in sign (ρ̃ = −ρ), their coordinates in the expansions (1.4) are much more different:

    ρ^i = − Σ_{j=1}^{n} S^i_j ρ̃^j,        ρ̃^i = − Σ_{j=1}^{n} T^i_j ρ^j.

This happens because ρ and ρ̃ are expanded in two different bases.

Now let's consider a point X ∈ M. The following formulas are obvious:

    OX = OÕ + ÕX,        ÕX = ÕO + OX.

By means of them we can find the relation of the coordinates of the point X in the two different coordinate systems O, e_1, ..., e_n and Õ, ẽ_1, ..., ẽ_n:

    x^i = ρ^i + Σ_{j=1}^{n} S^i_j x̃^j,        x̃^i = ρ̃^i + Σ_{j=1}^{n} T^i_j x^j.        (1.5)
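A short numerical illustration of the formulas (1.5) in the model M = ℝ^2 (an editorial addition; S, the origins and the point X are arbitrary, and T is computed as the inverse transition matrix):

import numpy as np

O      = np.array([0.0, 0.0])
O_new  = np.array([2.0, 1.0])
S      = np.array([[1.0, 1.0],
                   [0.0, 2.0]])             # columns: the new basis vectors in the old basis
T      = np.linalg.inv(S)
rho    = O_new - O                          # coordinates of rho in the old basis

X      = np.array([5.0, 7.0])               # a point given by its old coordinates x^i
x_new  = T @ (X - rho)                      # new coordinates, since rho~ = -T rho
x_back = rho + S @ x_new                    # x^i = rho^i + sum S^i_j x~^j, formula (1.5)
print(x_new, np.allclose(x_back, X))        # the round trip restores the old coordinates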

The facts from the theory of affine spaces, which we stated above, show that considering affine spaces is a proper way for the geometrization of linear algebra. A vector is an algebraic object: we can add vectors, we can multiply them by numbers, and we can form linear combinations of them. In an affine space the concept of a point becomes paramount. Points form straight lines, planes, and their multidimensional generalizations — linear submanifolds. The geometry of two-dimensional affine spaces is called the planimetry, the geometry of three-dimensional affine spaces is called the stereometry. Affine spaces of higher dimensions are studied by a geometrical discipline which is called the multidimensional geometry.

§ 2. Euclidean point spaces. Quadrics in a Euclidean space.

In affine spaces we have a quite natural concept of parallel translations and, hence, we can define the concept of parallelism for linear submanifolds. However, a very important feature was lacking: there was no concept of a length and there was no concept of an angle. The structure of a Euclidean space given by a quadratic form g brings this lacking feature in.

Definition 2.1. An affine space (M, V) over the field of real numbers ℝ is called a Euclidean point space if the space V acting on M by parallel translations is equipped with a structure of a Euclidean vector space, i.e. if in V some positive quadratic form g is fixed.

Let A and B be two points of a Euclidean point space M. They determine a vector v ∈ V specified by the condition p_v(A) = B (this vector is identified with the arrowhead segment AB). The norm of the vector v determined by the quadratic form g is called the length of the segment AB or the distance between the points A and B:

    |AB| = |v| = √g(v).

Due to the equality |−v| = |v| we derive |AB| = |BA|.

Let AB and CD be two arrowhead segments in a Euclidean point space. They are geometric representations of two vectors v and w of V. The angle between AB and CD by definition is the angle between the vectors v and w, which is determined by means of the scalar product (see § 1 of Chapter V).

Definition 2.2. A coordinate system O, e_1, ..., e_n in a finite-dimensional Euclidean point space (M, V, g) is called a rectangular Cartesian coordinate system in M if e_1, ..., e_n is an orthonormal basis of the Euclidean vector space (V, g).

Definition 2.3. A quadric in a Euclidean point space M is a set of points in M whose coordinates x^1, ..., x^n in some rectangular Cartesian coordinate system O, e_1, ..., e_n satisfy some polynomial equation of degree two:

    Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij x^i x^j + 2 Σ_{i=1}^{n} b_i x^i + c = 0.        (2.1)

The definition of a quadric is not coordinate-free: it is formulated in terms of some rectangular Cartesian coordinate system O, e_1, ..., e_n. However, passing to another Cartesian coordinate system is equivalent to a linear change of variables in the equation (2.1) (see the formulas (1.5)). Such a change of variables changes the coefficients of the polynomial in (2.1), but it does not change the structure of this equation in whole. A quadric continues to be a quadric in any Cartesian coordinate system.

Let Õ, ẽ_1, ..., ẽ_n be some other rectangular Cartesian coordinate system in M. In this case the transition matrices S and T in (1.5) appear to be orthogonal matrices (see Chapter V). Let's consider the passage from O, e_1, ..., e_n to Õ, ẽ_1, ..., ẽ_n and calculate the coefficients of the equation of the quadric in the new coordinate system. Substituting (1.5) into (2.1), we get

    ã_qp = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij S^i_q S^j_p,        (2.2)

    b̃_q = Σ_{i=1}^{n} b_i S^i_q + Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij ρ^j S^i_q,        (2.3)

    c̃ = c + Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij ρ^i ρ^j + 2 Σ_{i=1}^{n} b_i ρ^i.        (2.4)

The formula (2.2) coincides with the transformation formula for the components of a quadratic form under a change of basis (see Chapter IV). Hence, each quadric in M is associated with some quadratic form in V. The form a determined by the matrix a_ij in the basis e_1, ..., e_n is called the primary quadratic form of the quadric (2.1). Now the problem of bringing the equation of a quadric to a canonic form is formulated as the problem of finding a proper rectangular Cartesian coordinate system in which the equation (2.1) takes its most simple, canonic form.

Let's rewrite (2.3) in the following form:

    b̃_q = Σ_{i=1}^{n} S^i_q (b_i + Σ_{j=1}^{n} a_ij ρ^j).        (2.5)

Let's consider the associated operator f_a determined by the primary quadratic form a (see formula (3.9) in Chapter V). The operator f_a is a selfadjoint operator in V. The matrix of the operator f_a is given by the formula

    F^i_j = Σ_{k=1}^{n} g^ik a_kj,        (2.6)

where g^ik is the matrix inverse to the Gram matrix of the basis e_1, ..., e_n. Apart from f_a, we define a vector b through its coordinates given by the formula

    b^i = Σ_{k=1}^{n} g^ik b_k.        (2.7)

The definition of b through its coordinates (2.7) is essentially bound to the coordinate system O, e_1, ..., e_n. This is because the formula (2.3) differs from the standard transformation formula for the coordinates of a covector under a change of basis (see Chapter III).

The operator f_a is selfadjoint. Hence, it determines the expansion of the space V into the direct sum of two mutually orthogonal subspaces Ker f_a and Im f_a (see formula (3.14) in Chapter V):

    V = Ker f_a ⊕ Im f_a.        (2.8)
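The decomposition of b along (2.8) is easy to carry out numerically. This is an editorial sketch; in a rectangular Cartesian coordinate system the matrix of f_a is just the symmetric matrix A of the primary form, and the example matrices are arbitrary.

import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0]])            # degenerate primary form, Ker f_a is spanned by e_3
b = np.array([1.0, -3.0, 0.5])

lam, Q = np.linalg.eigh(A)
ker = Q[:, np.isclose(lam, 0.0)]           # orthonormal basis of Ker f_a
b1 = ker @ (ker.T @ b)                     # b^(1): the component in Ker f_a
b2 = b - b1                                # b^(2): the component in Im f_a

rho = np.linalg.lstsq(A, -b2, rcond=None)[0]   # a solution of b^(2) = -f_a(rho)
print(b1, np.allclose(A @ rho, -b2))           # b^(1) survives, b^(2) can be annihilated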

Let's consider the expansion of the vector b into the sum of two vectors b = b^(1) + b^(2) according to the expansion (2.8) of the space V. This expansion induces the expansion b_i = b^(1)_i + b^(2)_i of its coordinates. Due to (2.5) the components of b^(1) are transformed as follows:

    b̃^(1)_q = Σ_{i=1}^{n} S^i_q b^(1)_i.        (2.9)

The relationship (2.9) shows that the numbers b^(1)_i cannot be annihilated by a change of the coordinate system (unless they are equal to zero from the very beginning). These numbers determine the vector b^(1) ∈ Ker f_a which does not depend on the choice of a coordinate system.

The vector b^(2) in the expansion b = b^(1) + b^(2) can be annihilated at the expense of a proper choice of the coordinate system. Let's determine the vector ρ = OÕ from the equality b^(2) = −f_a(ρ). Though it is not unique, the vector ρ satisfying this equality does exist since b^(2) ∈ Im f_a. For its components, due to (2.6) and (2.7), we have

    b^(2)_i + Σ_{j=1}^{n} a_ij ρ^j = 0.        (2.10)

Substituting (2.10) into (2.5), we get the following equalities in the new coordinate system: b̃^(2) = 0 and hence b̃ = b̃^(1), i.e. in the new coordinate system the vector b coincides with b^(1) ∈ Ker f_a. As a result we have proved the following theorem.

Theorem 2.1. Any quadric in a Euclidean point space (M, V, g) is associated with some selfadjoint operator f and some vector b ∈ Ker f such that in some rectangular Cartesian coordinate system the radius-vector r of an arbitrary point of this quadric satisfies the following equation:

    (f(r) | r) + 2 (b | r) + c = 0.        (2.11)

The operator f determines the leading part of the equation (2.11). By means of this operator we subdivide all quadrics into two basic types:
(1) non-degenerate quadrics, when Ker f = {0};
(2) degenerate quadrics, when Ker f ≠ {0}.

For non-degenerate quadrics the vector b in (2.11) is equal to zero. Therefore, non-degenerate quadrics are subdivided into three types:
(1) elliptic type, when c ≠ 0 and the quadratic form a(x) = (f(x) | x) is positive or negative (it can be made positive by changing the sign of f);
(2) hyperbolic type, when c ≠ 0 and the quadratic form a(x) = (f(x) | x) is not sign-definite, i.e. its signature has both pluses and minuses;
(3) conic type, when c = 0.

Degenerate quadrics are subdivided into two types:
(1) parabolic type, when dim Ker f = 1 and b ≠ 0;
(2) cylindric type, when dim Ker f > 1 or b = 0.
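The classification above can be turned into a small routine. This is an illustration only; the function quadric_type and the tolerance eps are ad hoc names, and the quadric is assumed to be already written in the reduced form (2.11) with a symmetric matrix A, a vector b ∈ Ker A and a free term c.

import numpy as np

def quadric_type(A, b, c, eps=1e-12):
    lam = np.linalg.eigvalsh(A)
    kernel_dim = int(np.sum(np.abs(lam) < eps))
    if kernel_dim == 0:                                   # non-degenerate quadrics
        if abs(c) < eps:
            return "conic"
        definite = np.all(lam > eps) or np.all(lam < -eps)
        return "elliptic" if definite else "hyperbolic"
    if kernel_dim == 1 and np.linalg.norm(b) > eps:       # degenerate quadrics
        return "parabolic"
    return "cylindric"

print(quadric_type(np.diag([1.0, 2.0, 3.0]), np.zeros(3), -1.0))          # elliptic
print(quadric_type(np.diag([1.0, -2.0, 3.0]), np.zeros(3), 1.0))          # hyperbolic
print(quadric_type(np.diag([1.0, 2.0, 0.0]), np.array([0., 0., 1.]), 0.)) # parabolic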

The equation (2.1) in the case of a non-degenerate quadric of elliptic type can be brought to the following canonic form:

    (x^1)^2/(a_1)^2 + ... + (x^n)^2/(a_n)^2 = ±1.

The canonic equation of a non-degenerate quadric of hyperbolic type differs from it in the signs of the terms in the left hand side:

    (x^1)^2/(a_1)^2 ± ... ± (x^n)^2/(a_n)^2 = ±1.

The canonic equation of a non-degenerate quadric of conic type is homogeneous:

    (x^1)^2/(a_1)^2 ± ... ± (x^n)^2/(a_n)^2 = 0.

The equation (2.1) in the case of a degenerate quadric of parabolic type can be brought to the following canonic form:

    (x^1)^2/(a_1)^2 ± ... ± (x^{n−1})^2/(a_{n−1})^2 = 2 x^n.

If n = dim M > 1, then in a canonic equation of a quadric of cylindric type there is no explicit entry of at least one variable. Therefore, we can reduce the dimension of the space M. The reduced quadric can belong to any one of the above four types. If it is again of cylindric type, then we can repeat the reduction procedure. This process can terminate in some intermediate dimension yielding a reduced quadric of some non-cylindric type. Otherwise we shall reach the dimension dim M = 1. In a one-dimensional Euclidean point space there are no quadrics of cylindric type. Therefore, the quadrics of cylindric type are those which belong to one of the non-cylindric types in the reduced dimension.

REFERENCES.

1. Kurosh A. G. Course of general algebra, «Nauka» publishers, Moscow.
2. Sharipov R. A. Course of differential geometry, Bashkir State University, Ufa, 1996; see also math/0412421 in Electronic Archive http://arXiv.org and r-sharipov/r4-b3.htm in GeoCities¹.
3. Sharipov R. A. Classical electrodynamics and the theory of relativity, Bashkir State University, Ufa, 1997; see also physics/0311011 in Electronic Archive http://arXiv.org and r-sharipov/r4-b5.htm in GeoCities.
4. Kostrikin A. I. Introduction to algebra, «Nauka» publishers, Moscow, 1977.
5. Beklemishev D. V. Course of analytical geometry and linear algebra, «Nauka» publishers, Moscow, 1985.
6. Kudryavtsev L. D. Course of mathematical analysis, Vol. I and II, «Visshaya Shkola» publishers, Moscow, 1985.
7. Sharipov R. A. Quick introduction to tensor analysis, free online publication math.HO/0403252 in Electronic Archive http://arXiv.org, 2004; see also r-sharipov/r4-b6.htm in GeoCities.

¹ The references [2] and [3] are added in 1998; the reference [7] is added in 2004.
