
REVIEWS OF

Algebra and Analysis


for Engineers and Scientists

"This book is a useful compendium of the mathematics of (mostly) finite-dimensional linear vector spaces (plus two final chapters on infinite-dimensional spaces), which do find increasing application in many branches of engineering and science .... The treatment is thorough; the book will certainly serve as a valuable reference."
- American Scientist
"The authors present topics in algebra and analysis for students in engineering
and science .... Each chapter is organized to include a brief overview, detailed
topical discussions and references for further study. Notes about the references
guide the student to collateral reading. Theorems, definitions, and corollaries are
illustrated with examples. The student is encouraged to prove some theorems
and corollaries as models for proving others in exercises. In most chapters, the
authors discuss constructs used to illustrate examples of applications. Discussions are tied together by frequent, well written notes. The tables and index are
good. The type faces are nicely chosen. The text should prepare a student well in
mathematical matters."
- Science Books and Films
"This is an intermediate level text, with exercises, whose avowed purpose is to
provide the science and engineering graduate student with an appropriate modern mathematical (analysis and algebra) background in a succinct, but nontrivial,
manner. After some fundamentals, algebraic structures are introduced followed
by linear spaces, matrices, metric spaces, normed and inner product spaces and
linear operators.... While one can quarrel with the choice of specific topics and
the omission of others, the book is quite thorough and can serve as a text, for
self-study or as a reference."
- Mathematical Reviews
"The authors designed a typical work from graduate mathematical lectures: formal definitions, theorems, corollaries, proofs, examples, and exercises. It is to
be noted that problems to challenge students' comprehension are interspersed
throughout each chapter rather than at the end."
- CHOICE

Printed in the USA

Anthony N. Michel
Charles J. Herget

Algebra and Analysis


for Engineers and Scientists

Birkhäuser
Boston Basel Berlin

Anthony N. Michel
Department of Electrical Engineering
University of Notre Dame
Notre Dame, IN 46556
U.S.A.

Charles J. Herget
Herget Associates
P.O. Box 1425
Alameda, CA 94501
U.S.A.

Cover design by Dutton and Sherman, Hamden, CT.


Mathematics Subject Classification (2000): 03Exx, 03E20, 08-XX, 08-01, 15-XX, 15-01, 15A03, 15A04, 15A06, 15A09, 15A15, 15A18, 15A21, 15A57, 15A60, 15A63, 20-XX, 20-01, 26-XX, 26-01, 26Axx, 26A03, 26A15, 26Bxx, 34-XX, 34-01, 34Axx, 34A12, 34A30, 34H05, 46-XX, 46-01, 46Axx, 46A22, 46A50, 46A55, 46Bxx, 46B20, 46B25, 46Cxx, 46C05, 46Exx, 46N10, 46N20, 47-XX, 47-01, 47Axx, 47A05, 47A07, 47A10, 47A25, 47A30, 47A67, 47B15, 47H10, 47N20, 47N70, 54-XX, 54-01, 54A20, 54B05, 54Cxx, 54C05, 54C30, 54Dxx, 54D05, 54D30, 54D35, 54D45, 54E35, 54E45, 54E50, 93E10

Library of Congress Control Number: 2007931687

ISBN-13: 978-0-8176-4706-3

e-ISBN-13: 978-0-8176-4707-0

Printed on acid-free paper.


© 2007 Birkhäuser Boston

Originally published as Mathematical Foundations in Engineering and Science by Prentice-Hall, Englewood Cliffs, NJ, 1981. A subsequent paperback edition under the title Applied Algebra and Functional Analysis was published by Dover, New York, 1993. For the Birkhäuser Boston printing, the authors have revised the original preface.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Birkhäuser Boston, c/o Springer Science+Business Media LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks and similar terms, even if they
are not identified as such, is not to be taken as an expression of opinion as to whether or not they are
subject to proprietary rights.
9 8 7 6 5 4 3 2

www.birkhauser.com

(IBT)

CONTENTS

PREFACE ix

CHAPTER 1: FUNDAMENTAL CONCEPTS 1
1.1 Sets 1
1.2 Functions 12
1.3 Relations and Equivalence Relations 25
1.4 Operations on Sets 26
1.5 Mathematical Systems Considered in This Book 30
1.6 References and Notes 31
References 32

CHAPTER 2: FUNDAMENTAL ALGEBRAIC STRUCTURES 33
2.1 Some Basic Structures of Algebra 34
    A. Semigroups and Groups 36
    B. Rings and Fields 46
    C. Modules, Vector Spaces, and Algebras 53
    D. Overview 61
2.2 Homomorphisms 62
2.3 Application to Polynomials 69
2.4 References and Notes 74
References 74

CHAPTER 3: VECTOR SPACES AND LINEAR TRANSFORMATIONS 75
3.1 Linear Spaces 75
3.2 Linear Subspaces and Direct Sums 81
3.3 Linear Independence, Bases, and Dimension 85
3.4 Linear Transformations 95
3.5 Linear Functionals 109
3.6 Bilinear Functionals 113
3.7 Projections 119
3.8 Notes and References 123
References 123

CHAPTER 4: FINITE-DIMENSIONAL VECTOR SPACES AND MATRICES 124
4.1 Coordinate Representation of Vectors 124
4.2 Matrices 129
    A. Representation of Linear Transformations by Matrices 129
    B. Rank of a Matrix 134
    C. Properties of Matrices 136
4.3 Equivalence and Similarity 148
4.4 Determinants of Matrices 155
4.5 Eigenvalues and Eigenvectors 163
4.6 Some Canonical Forms of Matrices 169
4.7 Minimal Polynomials, Nilpotent Operators and the Jordan Canonical Form 178
    A. Minimal Polynomials 178
    B. Nilpotent Operators 185
    C. The Jordan Canonical Form 190
4.8 Bilinear Functionals and Congruence 194
4.9 Euclidean Vector Spaces 202
    A. Euclidean Spaces: Definition and Properties 202
    B. Orthogonal Bases 209
4.10 Linear Transformations on Euclidean Vector Spaces 216
    A. Orthogonal Transformations 216
    B. Adjoint Transformations 218
    C. Self-Adjoint Transformations 221
    D. Some Examples 227
    E. Further Properties of Orthogonal Transformations 231
4.11 Applications to Ordinary Differential Equations 238
    A. Initial-Value Problem: Definition 238
    B. Initial-Value Problem: Linear Systems 244
4.12 Notes and References 261
References 262

CHAPTER 5: METRIC SPACES 263
5.1 Definition of Metric Spaces 264
5.2 Some Inequalities 268
5.3 Examples of Important Metric Spaces 271
5.4 Open and Closed Sets 275
5.5 Complete Metric Spaces 286
5.6 Compactness 298
5.7 Continuous Functions 307
5.8 Some Important Results in Applications 314
5.9 Equivalent and Homeomorphic Metric Spaces. Topological Spaces 317
5.10 Applications 323
    A. Applications of the Contraction Mapping Principle 323
    B. Further Applications to Ordinary Differential Equations 329
5.11 References and Notes 341
References 341

CHAPTER 6: NORMED SPACES AND INNER PRODUCT SPACES 343
6.1 Normed Linear Spaces 344
6.2 Linear Subspaces 348
6.3 Infinite Series 350
6.4 Convex Sets 351
6.5 Linear Functionals 355
6.6 Finite-Dimensional Spaces 360
6.7 Geometric Aspects of Linear Functionals 363
6.8 Extension of Linear Functionals 367
6.9 Dual Space and Second Dual Space 370
6.10 Weak Convergence 372
6.11 Inner Product Spaces 375
6.12 Orthogonal Complements 381
6.13 Fourier Series 387
6.14 The Riesz Representation Theorem 393
6.15 Some Applications 394
    A. Approximation of Elements in Hilbert Space (Normal Equations) 395
    B. Random Variables 397
    C. Estimation of Random Variables 398
6.16 Notes and References 404
References 404

CHAPTER 7: LINEAR OPERATORS 406
7.1 Bounded Linear Transformations 407
7.2 Inverses 415
7.3 Conjugate and Adjoint Operators 419
7.4 Hermitian Operators 427
7.5 Other Linear Operators: Normal Operators, Projections, Unitary Operators, and Isometric Operators 431
7.6 The Spectrum of an Operator 439
7.7 Completely Continuous Operators 447
7.8 The Spectral Theorem for Completely Continuous Normal Operators 454
7.9 Differentiation of Operators 458
7.10 Some Applications 465
    A. Applications to Integral Equations 465
    B. An Example from Optimal Control 468
    C. Minimization of Functionals: Method of Steepest Descent 471
7.11 References and Notes 473
References 473

INDEX 475

PREFACE

This book evolved from a one-year sequence of courses offered by the authors
at Iowa State University. The audience for this book typically included theoretically oriented first- or second-year graduate students in various engineering or
science disciplines. Subsequently, while serving as Chair of the Department of
Electrical Engineering, and later, as Dean of the College of Engineering at the
University of Notre Dame, the first author continued using this book in courses
aimed primarily at graduate students in control systems. Since administrative
demands precluded the possibility of regularly scheduled classes, the Socratic
method was used in guiding students in self study. This method of course delivery turned out to be very effective and satisfying to student and teacher alike.
Feedback from colleagues and students suggests that this book has been used in
a similar manner elsewhere.
The original objectives in writing this book were to provide the reader with appropriate mathematical background for graduate study in engineering or science;
to provide the reader with appropriate prerequisites for more advanced subjects
in mathematics; to allow the student in engineering or science to become familiar with a great deal of pertinent mathematics in a rapid and efficient manner
without sacrificing rigor; to give the reader a unified overview of applicable
mathematics, thus enabling him or her to choose additional courses in mathematics more intelligently; and to make it possible for the student to understand
at an early stage of his or her graduate studies the mathematics used in the current literature (e.g., journal articles, monographs, and the like).


Whereas the objectives enumerated above for writing this book were certainly pertinent over twenty years ago, they are even more compelling today. The
reasons for this are twofold. First, today's graduate students in engineering or
science are expected to be more knowledgeable and sophisticated in mathematics than students in the past. Second, today's graduate students in engineering
or science are expected to be familiar with a great deal of ancillary material
(primarily in the computer science area), acquired in courses that did not even
exist a couple of decades ago. In view of these added demands on the students'
time, to become familiar with a great deal of mathematics in an efficient manner,
without sacrificing rigor, seems essential.
Since the original publication of this book, progress in technology, and consequently, in applications of mathematics in engineering and science, has been
phenomenal. However, it must be emphasized that the type of mathematics itself that is being utilized in these applications did not experience corresponding
substantial changes. This is particularly the case for algebra and analysis at the
intermediate level, as addressed in the present book. Accordingly, the material
of the present book is as current today as it was at the time when this book first
appeared. (Plus ça change, plus c'est la même chose. - Alphonse Karr, 1849.)
This book may be viewed as consisting essentially of three parts: set theory
(Chapter 1), algebra (Chapters 2-4), and analysis (Chapters 5-7). Chapter 1 is
a prerequisite for all subsequent chapters. Chapter 2 emphasizes abstract algebra (semigroups, groups, rings, etc.) and may essentially be skipped by those
who are not interested in this topic. Chapter 3, which addresses linear spaces
and linear transformations, is a prerequisite for Chapters 4, 6, and 7. Chapter 4, which treats finite-dimensional vector spaces and linear transformations
on such spaces (matrices) is required for Chapters 6 and 7. In Chapter 5, metric
spaces are treated. This chapter is a prerequisite for the subsequent chapters. Finally, Chapters 6 and 7 consider Banach and Hilbert spaces and linear operators
on such spaces, respectively.
The choice of applications in a book of this kind is subjective and will always be susceptible to criticisms. We have attempted to include applications of
algebra and analysis that have broad appeal. These applications, which may be
omitted without loss of continuity, are presented at the ends of Chapters 2, 4, 5,
6, and 7 and include topics dealing with ordinary differential equations, integral
equations, applications of the contraction mapping principle, minimization of
functionals, an example from optimal control, and estimation of random variables.
All exercises are an integral part of the text and are given when they arise,
rather than at the end of a chapter. Their intent is to further the reader's understanding of the subject matter on hand.


The prerequisites for this book include the usual background in undergraduate
mathematics offered to students in engineering or in the sciences at universities
in the United States. Thus, in addition to graduate students, this book is suitable for advanced senior undergraduate students as well, and for self study by
practitioners.
Concerning the labeling of items in the book, some comments are in order. Sections are assigned numerals that reflect the chapter and the section numbers. For
example, Section 2.3 signifies the third section in the second chapter. Extensive
sections are usually divided into subsections identified by upper-case common letters A, B, C, etc. Equations, definitions, theorems, corollaries, lemmas,
examples, exercises, figures, and special remarks are assigned monotonically
increasing numerals which identify the chapter, section, and item number. For
example, Theorem 4.4.7 denotes the seventh identified item in the fourth section
of Chapter 4. This theorem is followed by Eq. (4.4.8), the eighth identified item
in the same section. Within a given chapter, figures are identified by upper-case
letters A, B, C, etc., while outside of the chapter, the same figure is identified
by the above numbering scheme. Finally, the end of a proof or of an example is
signified by the symbol .

Suggested Course Outlines


Because of the flexibility described above, this book can be used either in a one-semester course, or a two-semester course. In either case, mastery of the material
presented will give the student an appreciation of the power and the beauty of
the axiomatic method; will increase the student's ability to construct proofs;
will enable the student to distinguish between purely algebraic and topological
structures and combinations of such structures in mathematical systems; and of
course, it will broaden the student's background in algebra and analysis.

A one-semester course
Chapters 1, 3, 4, 5, and Sections 6.1 and 6.11 in Chapter 6 can serve as the basis
for a one-semester course, emphasizing basic aspects of Linear Algebra and
Analysis in a metric space setting.
The coverage of Chapter 1 should concentrate primarily on functions (Section 1.2) and relations and equivalence relations (Section 1.3), while the material
concerning sets (Section 1.1) and operations on sets (Section 1.4) may be covered as reading assignments. On the other hand, Section 1.5 (on mathematical
systems) merits formal coverage, since it gives the student a good overview of
the book's aims and contents.


The material in this book has been organized so that Chapter 2, which addresses the important algebraic structures encountered in Abstract Algebra, may
be omitted without any loss of continuity. In a one-semester course emphasizing
Linear Algebra, this chapter may be omitted in its entirety.
In Chapter 3, which addresses general vector spaces and linear transformations, the material concerning linear spaces (Section 3.1), linear subspaces and
direct sums (Section 3.2), linear independence and bases (Section 3.3), and linear transformations (Section 3.4) should be covered in its entirety, while selected
topics on linear functionals (Section 3.5), bilinear functionals (Section 3.6), and
projections (Section 3.7) should be deferred until they are required in Chapter 4.
Chapter 4 addresses finite-dimensional vector spaces and linear transformations (matrices) defined on such spaces. The material on determinants (Section
4.4) and some of the material concerning linear transformations on Euclidean
vector spaces (Subsections 4.10D and 4.10E), as well as applications to ordinary
differential equations (Section 4.11) may be omitted without any loss of continuity. The emphasis in this chapter should be on coordinate representations of
vectors (Section 4.1), the representation of linear transformations by matrices
and the properties of matrices (Section 4.2), equivalence and similarity of matrices (Section 4.3), eigenvalues and eigenvectors (Section 4.5), some canonical
forms of matrices (Section 4.6), minimal polynomials, nilpotent operators and
the Jordan canonical form (Section 4.7), bilinear functionals and congruence
(Section 4.8), Euclidean vector spaces (Section 4.9), and linear transformations
on Euclidean vector spaces (Subsections 4.10A, 4.10B, and 4.10C).
Chapter 5 addresses metric spaces, which constitute some of the most important topological spaces. In a one-semester course, the emphasis in this chapter
should be on the definition of metric space and the presentation of important
classes of metric spaces (Sections 5.1 and 5.3), open and closed sets (Section 5.4), complete metric spaces (Section 5.5), compactness (Section 5.6), and
continuous functions (Section 5.7). The development of many classes of metric
spaces requires important inequalities, including the Hölder and the Minkowski
inequalities for finite and infinite sums and for integrals. These are presented
in Section 5.2 and need to be included in the course. Sections 5.8 and 5.10 address specific applications and may be omitted without any loss of continuity.
However, time permitting, the material in Section 5.9, concerning equivalent and
homeomorphic metric spaces and topological spaces, should be considered for
inclusion in the course, since it provides the student a glimpse into other areas
of mathematics.
To demonstrate mathematical systems endowed with both algebraic and topological structures, the one-semester course should include the material of
Sections 6.1 and 6.2 in Chapter 6, concerning normed linear spaces (resp., Banach spaces) and inner product spaces (resp., Hilbert spaces), respectively.


A two-semester course
In addition to the material outlined above for a one-semester course, a two-semester course should include most of the material in Chapters 2, 6, and 7.


Chapter 2 addresses algebraic structures. The coverage of semigroups and
groups, rings and fields, and modules, vector spaces and algebras (Section 2.1)
should be in sufficient detail to give the student an appreciation of the various
algebraic structures summarized in Figure B on page 61. Important mappings
defined on these algebraic structures (homomorphisms) should also be emphasized (Section 2.2) in a two-semester course, as should the brief treatment of
polynomials in Section 2.3.
The first ten sections of Chapter 6 address normed linear spaces (resp., Banach spaces) while the next four sections address inner product spaces (resp.,
Hilbert spaces). The last section of this chapter, which includes applications (to
random variables and estimates of random variables), may be omitted without
any loss of continuity. The material concerning normed linear spaces (Section 6.1), linear subspaces (Section 6.2), infinite series (Section 6.3), convex sets
(Section 6.4), linear functionals (Section 6.5), finite-dimensional spaces (Section 6.6), inner product spaces (Section 6.11), orthogonal complements (Section
6.12), and Fourier series (Section 6.13) should be covered in its entirety. Coverage of the material on geometric aspects of linear functionals (Section 6.7),
extensions of linear functionals (Section 6.8), dual space and second dual space
(Section 6.9), weak convergence (Section 6.10), and the Riesz representation
theorem (Section 6.14) should be selective and tailored to the availability of time
and the students' areas of interest. (For example, students interested in optimization and estimation problems may want a detailed coverage of the H a hn- B anach
theorem included in Section 6.8.)
Chapter 7 addresses (bounded) linear operators defined on Banach and Hilbert
spaces. The first nine sections of this chapter should be covered in their entirety
in a two-semester course. The material of this chapter includes bounded linear transformations (Section 7.1), inverses (Section 7.2), conjugate and adjoint
operators (Section 7.3), Hermitian operators (Section 7.4), normal, projection,
unitary and isometric operators (Section 7.5), the spectrum of an operator (Section 7.6), completely continuous operators (Section 7.7), the spectral theorem
for completely continuous normal operators (Section 7.8), and differentiation of
(not necessarily linear and bounded) operators (Section 7.9). The last section,
which includes applications to integral equations, an example from optimal control, and minimization of functionals by the method of steepest descent, may be
omitted without loss of continuity.
Both one-semester and two-semester courses offered by the present authors,
based on this book, usually included a project conducted by each course participant to demonstrate the applicability of the course material. Each project


involved a formal presentation to the entire class at the end of the semester.
The courses described above were also offered using the Socratic method, following the outlines given above. These courses typically involved half a dozen
participants. While most of the material was self taught by the students themselves, the classroom meetings served as a forum for guidance, clarifications, and
challenges by the teacher, usually resulting in lively discussions of the subject on
hand not only among teacher and students, but also among students themselves.
For the current printing of this book, we have created a supplementary website of additional resources for students and instructors: http://Michel.Herget.net. Available at this website are additional current references concerning the
subject matter of the book and a list of several areas of applications (including
references). Since the latter reflects mostly the authors' interests, it is by definition rather subjective. Among several additional items, the website also includes
some reviews of the present book. In this regard, the authors would like to invite
readers to submit reviews of their own for inclusion into the website.
The present publication of Algebra and Analysis for Engineers and Scientists
was made possible primarily because of Tom Grasso, Birkhäuser's Computational Sciences and Engineering Editor, whom we would like to thank for his
considerations and professionalism.
Anthony N. Michel
Charles J. Herget
Summer 2007

FUNDAMENTAL CONCEPTS

In this chapter we present fundamental concepts required throughout the


remainder of this book. We begin by considering sets in Section 1.1. In
Section 1.2 we discuss functions; in Section 1.3 we introduce relations and
equivalence relations; and in Section 1.4 we concern ourselves with operations
on sets. In Section 1.5 we give a brief indication of the types of mathematical
systems which we will consider in this book. The chapter concludes with a
brief discussion of references.

1.1. SETS
Virtually every area of modern mathematics is developed by starting from
an undefined object called a set. There are several reasons for doing this.
One of these is to develop a mathematical discipline in a completely axiomatic
and totally abstract manner. Another reason is to present a unified approach
to what may seem to be highly diverse topics in mathematics. Our reason is
the latter, for our interest is not in abstract mathematics for its own sake.
However, by using abstraction, many of the underlying principles of modern
mathematics are more clearly understood.
Thus, we begin by assuming that a set is a well defined collection of elements or objects. We denote sets by common capital letters A, B, C, etc., and elements or objects of sets by lower case letters a, b, c, etc. For example, we write

A = {a, b, c}

to indicate that A is the collection of elements a, b, c. If an element x belongs to a set A, we write

x ∈ A.

In this case we say that "x belongs to A," or "x is contained in A," or "x is a member of A," etc. If x is any element and if A is a set, then we assume that one knows whether x belongs to A or whether x does not belong to A. If x does not belong to A we write

x ∉ A.

To illustrate some of the concepts, we assume that the reader is familiar


with the set of real numbers. Thus, if we say

R is the set of all real numbers,


then this is a well defined collection of objects. We point out that it is possible
to characterize the set of real numbers in a purely abstract manner based on
an axiomatic approach. We shall not do so here.
To illustrate a non-well defined collection of objects, consider the statement "the set of all tall people in Ames, Iowa." This is clearly not precise
enough to be considered here.
We will agree that any set A may not contain any given element x more
than once unless we explicitly say so. Moreover, we assume that the concept
of "order" will play no role when representing elements of a set, unless we
say so. Thus, the sets A = a{ , b, c} and B = c{ , b, a} are to be viewed as being
exactly the same set.
We usually do not describe a set by listing every element between the
curly brackets { } as we did for set A above. A convenient method of characterizing sets is as follows. Suppose that for each element x of a set A there is a statement P(x) which is either true or false. We may then define a set B which consists of all elements x ∈ A such that P(x) is true, and we may write

B = {x ∈ A : P(x) is true}.

For example, let A denote the set of all people who live in Ames, Iowa, and let B denote the set of all males who live in Ames. We can write, then,

B = {x ∈ A : x is a male}.

When it is clear which set x belongs to, we sometimes write {x : P(x) is true} (instead of, say, {x ∈ A : P(x) is true}).
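The set-builder notation above translates almost verbatim into a Python set comprehension. A minimal sketch, where the particular set A and predicate P(x) are illustrative choices, not taken from the text:

```python
# Set-builder notation B = {x in A : P(x) is true} maps directly onto a
# Python set comprehension.
A = {-3, -2, -1, 0, 1, 2, 3}

def P(x):
    # The defining statement P(x): "x is non-negative" (an arbitrary example).
    return x >= 0

B = {x for x in A if P(x)}
print(B)  # {0, 1, 2, 3}
```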
It is also necessary to consider a set which has no members. Since a set is determined by its elements, there is only one such set, which is called the empty set, or the vacuous set, or the null set, or the void set, and which is denoted by ∅. Any set, A, consisting of one or more elements is said to be non-empty or non-void. If A is non-void we write A ≠ ∅.
If A and B are sets and if every element of B also belongs to A, then we say that B is a subset of A or A includes B, and we write B ⊂ A or A ⊃ B. Furthermore, if B ⊂ A and if there is an x ∈ A such that x ∉ B, then we say that B is a proper subset of A. Some texts make a distinction between a proper subset and any subset by using the notation ⊂ and ⊆, respectively. We shall not use the symbol ⊆ in this book. We note that if A is any set, then ∅ ⊂ A. Also, ∅ ⊂ ∅. If B is not a subset of A, we write B ⊄ A or A ⊅ B.
1.1.1. Example. Let R denote the set of all real numbers, let Z denote the set of all integers, let J denote the set of all positive integers, and let Q denote the set of all rational numbers. We could alternately describe the set Z as

Z = {x ∈ R : x is an integer}.

Thus, for every x ∈ R, the statement x is an integer is either true or false. We frequently also specify sets such as J in the following obvious manner,

J = {x ∈ Z : x = 1, 2, ...}.

We can specify the set Q as

Q = {x ∈ R : x = p/q, p, q ∈ Z, q ≠ 0}.

It is clear that ∅ ⊂ J ⊂ Z ⊂ Q ⊂ R, and that each of these subsets is a proper subset. We note that 0 ∉ J.

We now wish to state what is meant by equality of sets.


1.1.2. Definition. Two sets, A and B, are said to be equal if A ⊂ B and B ⊂ A. In this case we write A = B. If two sets, A and B, are not equal, we write A ≠ B. If x and y denote the same element of a set, we say that they are equal and we write x = y. If x and y denote distinct elements of a set, we write x ≠ y.

We emphasize that all definitions are "if and only if" statements. Thus, in the above definition we should actually have said: A and B are equal if and only if A ⊂ B and B ⊂ A. Since this is always understood, hereafter all definitions will imply the "only if" portion. Thus, we simply say: two sets A and B are said to be equal if A ⊂ B and B ⊂ A.
In Definition 1.1.2 we introduced two concepts of equality, one of equality
of sets and one of equality of elements. We shall encounter many forms of
equality throughout this book.


Now let X be a set and let A ⊂ X. The complement of subset A with respect to X is the set of elements of X which do not belong to A. We denote the complement of A with respect to X by CxA. When it is clear that the complement is with respect to X, we simply say the complement of A (instead of the complement of A with respect to X), and simply write A~. Thus, we have

A~ = {x ∈ X : x ∉ A}.    (1.1.3)

In every discussion involving sets, we will always have a given fixed set in mind from which we take elements and subsets. We will call this set the universal set, and we will usually denote this set by X.
Throughout the remainder of the present section, X always denotes an arbitrary non-void fixed set.
We now establish some properties of sets.
1.1.4. Theorem. Let A, B, and C be subsets of X. Then

(i) if A ⊂ B and B ⊂ C, then A ⊂ C;
(ii) X~ = ∅;
(iii) ∅~ = X;
(iv) (A~)~ = A;
(v) A ⊂ B if and only if A~ ⊃ B~; and
(vi) A = B if and only if A~ = B~.

Proof. To prove (i), first assume that A is non-void and let x ∈ A. Since A ⊂ B, x ∈ B, and since B ⊂ C, x ∈ C. Since x is arbitrary, every element of A is also an element of C and so A ⊂ C. Finally, if A = ∅, then A ⊂ C follows trivially.
The proofs of parts (ii) and (iii) follow immediately from (1.1.3).
To prove (iv), we must show that A ⊂ (A~)~ and (A~)~ ⊂ A. If A = ∅, then clearly A ⊂ (A~)~. Now suppose that A is non-void. We note from (1.1.3) that

(A~)~ = {x ∈ X : x ∉ A~}.    (1.1.5)

If x ∈ A, it follows from (1.1.3) that x ∉ A~, and hence we have from (1.1.5) that x ∈ (A~)~. This proves that A ⊂ (A~)~.
If (A~)~ = ∅, then A = ∅; otherwise we would have a contradiction by what we have already shown, i.e., A ⊂ (A~)~. So let us assume that (A~)~ ≠ ∅. If x ∈ (A~)~ it follows from (1.1.5) that x ∉ A~, and thus we have x ∈ A in view of (1.1.3). Hence, (A~)~ ⊂ A.
We leave the proofs of parts (v) and (vi) as an exercise.

1.1.6. Exercise. Prove parts (v) and (vi) of Theorem 1.1.4.
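As a finite sanity check (not a proof), the complement identities of Theorem 1.1.4 can be tested mechanically on a small universal set. The particular X, A, and B below are arbitrary illustrative choices:

```python
# Spot-check parts (iv)-(vi) of Theorem 1.1.4 on a finite universal set X.
X = {1, 2, 3, 4, 5}

def complement(S):
    # Complement of S with respect to the universal set X.
    return X - S

A = {1, 2}
B = {1, 2, 3}

assert complement(complement(A)) == A                 # part (iv): (A~)~ = A
assert (A <= B) == (complement(A) >= complement(B))   # part (v)
assert (A == B) == (complement(A) == complement(B))   # part (vi)
print("Theorem 1.1.4 spot-checks pass")
```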

The proofs given in parts (i) and (iv) of Theorem 1.1.4 are intentionally quite detailed in order to demonstrate the exact procedure required to prove containment and equality of sets. Frequently, the manipulations required to prove some seemingly obvious statements are quite long. It is suggested that the reader carry out all the details in the manipulations of the above exercise and the exercises that follow.
Next, let A and B be subsets of X. We define the union of sets A and B, denoted by A ∪ B, as the set of all elements that are in A or B; i.e.,

A ∪ B = {x ∈ X : x ∈ A or x ∈ B}.

When we say x ∈ A or x ∈ B, we mean x is in either A or in B or in both A and B. This inclusive use of "or" is standard in mathematics and logic.
If A and B are subsets of X, we define their intersection to be the set of all elements which belong to both A and B and denote the intersection by A ∩ B. Specifically,

A ∩ B = {x ∈ X : x ∈ A and x ∈ B}.

If the intersection of two sets A and B is empty, i.e., if A ∩ B = ∅, we say that A and B are disjoint.
For example, let X = {1, 2, 3, 4, 5}, let A = {1, 2}, let B = {3, 4, 5}, let C = {2, 3}, and let D = {4, 5}. Then A⁻ = B, B⁻ = A, D ⊂ B, A ∪ B = X, A ∩ B = ∅, A ∪ C = {1, 2, 3}, B ∩ D = D, A ∩ C = {2}, etc.
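The example above carries over directly to Python's built-in set operations, with `|` for union, `&` for intersection, and `-` for the complement relative to X (the sets below are those of the example):

```python
X = {1, 2, 3, 4, 5}
A, B, C, D = {1, 2}, {3, 4, 5}, {2, 3}, {4, 5}

assert X - A == B            # the complement of A is B
assert X - B == A            # the complement of B is A
assert D <= B                # D is a subset of B
assert A | B == X
assert A & B == set()        # A and B are disjoint
assert A | C == {1, 2, 3}
assert B & D == D
assert A & C == {2}
```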
In the next result we summarize some of the important properties of
union and intersection of sets.
1.1.7. Theorem. Let A, B, and C be subsets of X. Then
(i) A ∩ B = B ∩ A;
(ii) A ∪ B = B ∪ A;
(iii) A ∩ ∅ = ∅;
(iv) A ∪ ∅ = A;
(v) A ∩ X = A;
(vi) A ∪ X = X;
(vii) A ∩ A = A;
(viii) A ∪ A = A;
(ix) A ∪ A⁻ = X;
(x) A ∩ A⁻ = ∅;
(xi) A ∩ B ⊂ A;
(xii) A ∩ B = A if and only if A ⊂ B;
(xiii) A ⊂ A ∪ B;
(xiv) A = A ∪ B if and only if B ⊂ A;
(xv) (A ∩ B) ∩ C = A ∩ (B ∩ C);
(xvi) (A ∪ B) ∪ C = A ∪ (B ∪ C);
(xvii) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C);
(xviii) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C);
(xix) (A ∪ B)⁻ = A⁻ ∩ B⁻; and
(xx) (A ∩ B)⁻ = A⁻ ∪ B⁻.

Proof. We only prove part (xviii) of this theorem, again as an illustration of the manipulations involved. We will first show that (A ∩ B) ∪ C ⊂ (A ∪ C) ∩ (B ∪ C), and then we show that (A ∩ B) ∪ C ⊃ (A ∪ C) ∩ (B ∪ C).
Clearly, if (A ∩ B) ∪ C = ∅, the assertion is true. So let us assume that (A ∩ B) ∪ C ≠ ∅, and let x be any element of (A ∩ B) ∪ C. Then x ∈ A ∩ B or x ∈ C. Suppose x ∈ A ∩ B. Then x belongs to both A and B, and hence x ∈ A ∪ C and x ∈ B ∪ C. From this it follows that x ∈ (A ∪ C) ∩ (B ∪ C). On the other hand, let x ∈ C. Then x ∈ A ∪ C and x ∈ B ∪ C, and hence x ∈ (A ∪ C) ∩ (B ∪ C). Thus, if x ∈ (A ∩ B) ∪ C, then x ∈ (A ∪ C) ∩ (B ∪ C), and we have

(A ∩ B) ∪ C ⊂ (A ∪ C) ∩ (B ∪ C).    (1.1.8)

To show that (A ∩ B) ∪ C ⊃ (A ∪ C) ∩ (B ∪ C) we need to prove the assertion only when (A ∪ C) ∩ (B ∪ C) ≠ ∅. So let x be any element of (A ∪ C) ∩ (B ∪ C). Then x ∈ A ∪ C and x ∈ B ∪ C. Since x ∈ A ∪ C, then x ∈ A or x ∈ C. Furthermore, x ∈ B ∪ C implies that x ∈ B or x ∈ C. We know that either x ∈ C or x ∉ C. If x ∈ C, then x ∈ (A ∩ B) ∪ C. If x ∉ C, then it follows from the above comments that x ∈ A and also x ∈ B. Then x ∈ A ∩ B, and hence x ∈ (A ∩ B) ∪ C. Thus, if x ∉ C, then x ∈ (A ∩ B) ∪ C. Since this exhausts all the possibilities, we conclude that

(A ∪ C) ∩ (B ∪ C) ⊂ (A ∩ B) ∪ C.    (1.1.9)

From (1.1.8) and (1.1.9) it follows that (A ∪ C) ∩ (B ∪ C) = (A ∩ B) ∪ C. ∎

1.1.10. Exercise. Prove parts (i) through (xvii) and parts (xix) and (xx) of Theorem 1.1.7.
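Identities such as part (xviii) can also be verified exhaustively over all subsets of a small universe; the following Python sketch (the three-element universe is an illustrative choice) checks every one of the 8³ triples of subsets:

```python
from itertools import combinations

# Exhaustively verify part (xviii): (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
# over every triple of subsets of a small universe.
X = {0, 1, 2}

def subsets(X):
    """Yield every subset of X."""
    xs = list(X)
    for r in range(len(xs) + 1):
        for combo in combinations(xs, r):
            yield set(combo)

checked = 0
for A in subsets(X):
    for B in subsets(X):
        for C in subsets(X):
            assert (A & B) | C == (A | C) & (B | C)
            checked += 1

assert checked == 8 ** 3   # all 512 triples of subsets were checked
```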

In view of part (xvi) of Theorem 1.1.7, there is no ambiguity in writing A ∪ B ∪ C. Extending this concept, let n be any positive integer and let A₁, A₂, ..., Aₙ denote subsets of X. The set A₁ ∪ A₂ ∪ ... ∪ Aₙ is defined to be the set of all x ∈ X which belong to at least one of the subsets Aᵢ, and we write

⋃_{i=1}^{n} Aᵢ = A₁ ∪ A₂ ∪ ... ∪ Aₙ = {x ∈ X : x ∈ Aᵢ for some i = 1, ..., n}.

Similarly, by part (xv) of Theorem 1.1.7, there is no ambiguity in writing A ∩ B ∩ C. We define

⋂_{i=1}^{n} Aᵢ = A₁ ∩ A₂ ∩ ... ∩ Aₙ = {x ∈ X : x ∈ Aᵢ for all i = 1, ..., n}.

1.1. Sets
That is, ⋂_{i=1}^{n} Aᵢ consists of those members of X which belong to all the subsets A₁, A₂, ..., Aₙ.

We will consider the union and the intersection of an infinite number of subsets A_α at a later point in the present section.
The following is a generalization of parts (xix) and (xx) of Theorem 1.1.7.

1.1.11. Theorem. Let A₁, ..., Aₙ be subsets of X. Then

(i) [⋃_{i=1}^{n} Aᵢ]⁻ = ⋂_{i=1}^{n} Aᵢ⁻,    (1.1.12)

and

(ii) [⋂_{i=1}^{n} Aᵢ]⁻ = ⋃_{i=1}^{n} Aᵢ⁻.    (1.1.13)

1.1.14. Exercise. Prove Theorem 1.1.11.

The results expressed in Eqs. (1.1.12) and (1.1.13) are usually referred to as De Morgan's laws. We will see later in this section that these laws hold under more general conditions.
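De Morgan's laws (1.1.12) and (1.1.13) can be checked for any concrete family of subsets; a short Python sketch (the universe and family below are illustrative choices) might read:

```python
from functools import reduce

X = set(range(10))
family = [{0, 1, 2}, {1, 2, 3, 4}, {2, 4, 6, 8}]

union = reduce(set.union, family)
intersection = reduce(set.intersection, family)

# (1.1.12): the complement of the union is the intersection of complements.
assert X - union == reduce(set.intersection, [X - A for A in family])
# (1.1.13): the complement of the intersection is the union of complements.
assert X - intersection == reduce(set.union, [X - A for A in family])
```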
Next, let A and B be two subsets of X. We define the difference of B and A, denoted (B − A), as the set of elements in B which are not in A, i.e.,

B − A = {x ∈ X : x ∈ B and x ∉ A}.

We note here that A is not required to be a subset of B. It is clear that

B − A = B ∩ A⁻.

Now let A and B again be subsets of the set X. The symmetric difference of A and B is denoted by A Δ B and is defined as

A Δ B = (A − B) ∪ (B − A).

The following properties follow immediately.

1.1.15. Theorem. Let A, B, and C denote subsets of X. Then
(i) A Δ B = B Δ A;
(ii) A Δ B = (A ∪ B) − (A ∩ B);
(iii) A Δ A = ∅;
(iv) A Δ ∅ = A;
(v) A Δ (B Δ C) = (A Δ B) Δ C;
(vi) A ∩ (B Δ C) = (A ∩ B) Δ (A ∩ C); and
(vii) A Δ B ⊂ (A Δ C) ∪ (C Δ B).

1.1.16. Exercise. Prove Theorem 1.1.15.
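Python's `^` operator implements the symmetric difference, so the identities above admit a direct numerical check (the three sample sets are illustrative choices):

```python
X = set(range(8))
A, B, C = {1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7}

# The difference B − A equals B intersected with the complement of A.
assert B - A == B & (X - A)

# Symmetric difference: (A − B) ∪ (B − A), written A ^ B in Python.
assert A ^ B == (A - B) | (B - A)
assert A ^ B == (A | B) - (A & B)          # part (ii)
assert A ^ A == set()                      # part (iii)
assert A ^ set() == A                      # part (iv)
assert A ^ (B ^ C) == (A ^ B) ^ C          # part (v): associativity
assert A & (B ^ C) == (A & B) ^ (A & C)    # part (vi)
```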

Chapter 1 / Fundamental Concepts

In passing, we point out that the use of Venn diagrams is highly useful in
visualizing properties of sets; however, under no circumstances should such
diagrams take the place of a proof. In Figure A we illustrate the concepts of
union, intersection, difference, and symmetric difference of two sets, and the
complement of a set, by making use of Venn diagrams. Here, the shaded
regions represent the indicated sets.

1.1.17. Figure A. Venn diagrams.

1.1.18. Definition. A non-void set A is said to be finite if A contains n distinct elements, where n is some positive integer; such a set A is said to be of order n. The null set is defined to be finite with order zero. A set consisting of exactly one element, say A = {a}, is called a singleton or the singleton of a. If a set A is not finite, then we say that A is infinite.
In Section 1.2 we will further categorize infinite sets as being countable or uncountable.
Next, we need to consider sets whose elements are sets themselves. For example, if A, B, and C are subsets of X, then the collection 𝓐 = {A, B, C} is a set whose elements are A, B, and C. We usually call a set whose elements are subsets of X a family of subsets of X or a collection of subsets of X.
We will usually employ a hierarchical system of notation where lower case letters, e.g., a, b, c, are elements of X; upper case letters, e.g., A, B, C, are subsets of X; and script letters, e.g., 𝓐, 𝓑, 𝓒, are families of subsets of X. We could, of course, continue this process and consider a set whose elements are families of subsets, e.g., {𝓐, 𝓑, 𝓒}.
In connection with the above comments, we point out that the empty set, ∅, is a subset of X. It is possible to form a non-empty set whose only element is the empty set, i.e., {∅}. In this case, {∅} is a singleton. We see that ∅ ∈ {∅} and ∅ ⊂ {∅}.
In principle, we could also consider sets made up of both elements of X and subsets of X. For example, if x ∈ X and A ⊂ X, then {x, A} is a valid set. However, we shall not make use of sets of this nature in this book.
There is a special family of subsets of X to which we give a special name.

1.1.19. Definition. Let A be any subset of X. We define the power class of A or the power set of A to be the family of all subsets of A. We denote the power class of A by 𝓟(A). Specifically,

𝓟(A) = {B : B ⊂ A}.

1.1.20. Example. The power class of the empty set is 𝓟(∅) = {∅}, i.e., the singleton of ∅. The power class of a singleton is 𝓟({a}) = {∅, {a}}. For the set A = {a, b}, 𝓟(A) = {∅, {a}, {b}, {a, b}}. In general, if A is a finite set with n elements, then 𝓟(A) contains 2ⁿ elements. ∎
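The power class of a finite set is easy to generate programmatically; the following Python sketch (the helper name `power_class` is our own) builds it and confirms the 2ⁿ count:

```python
from itertools import chain, combinations

def power_class(A):
    """Return the family of all subsets of A, each as a frozenset."""
    xs = list(A)
    return {frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))}

assert power_class(set()) == {frozenset()}                    # P(∅) = {∅}
assert power_class({'a'}) == {frozenset(), frozenset({'a'})}  # P({a})
assert len(power_class({'a', 'b'})) == 4                      # P({a, b})
# In general, a set with n elements has 2**n subsets.
assert all(len(power_class(set(range(n)))) == 2 ** n for n in range(6))
```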

Before proceeding further, it should be pointed out that a free and uncritical use of set theory can lead to contradictions and that set theory has had a careful development with various devices used to exclude the contradictions. Roughly speaking, contradictions arise when one uses sets which are "too big," such as trying to speak of a set which contains everything. In all of our subsequent discussions we will keep away from these contradictions by always having some set or space X fixed for a given discussion and by considering only sets whose elements are elements of X, or sets (collections) whose elements are subsets of X, or sets (families) whose elements are collections of subsets of X, etc.
Let us next consider ordered sets. Above, we defined set in such a manner that the ordering of the elements is immaterial, and furthermore that each element is distinct. Thus, if a and b are elements of X, then {a, b} = {b, a}; i.e., there is no preference given to a or b. Furthermore, we have {a, a, b} = {a, b}. In this case we sometimes speak of an unordered pair {a, b}.
Frequently, we will need to consider the ordered pair (a, b) (a and b need not belong to the same set), where we distinguish between the first element a and the second element b. In this case (a, b) = (u, v) if and only if u = a and v = b. Thus, (a, b) ≠ (b, a) if a ≠ b. Also, we will consider ordered triplets (a, b, c), ordered quadruplets (a, b, c, d), etc., where we need to distinguish between the first element, second element, third element, fourth element, etc. Ordered pairs, ordered triplets, ordered quadruplets, etc., are examples of ordered sets.
We point out here that our characterization of ordered sets is not axiomatic, since we are assuming that the reader knows what is meant by the first element, second element, third element, etc. (However, it is possible to define ordered sets in a totally abstract fashion without assuming this simple fact. We shall forego these subtle distinctions and accept the preceding as a definition.)
Now let X and Y be two non-void sets. We define the Cartesian or direct product of X and Y, denoted by X × Y, as the set of all ordered pairs whose first element belongs to X and whose second element belongs to Y. Thus,

X × Y = {(x, y) : x ∈ X, y ∈ Y}.    (1.1.21)

Next, let X₁, ..., Xₙ denote n arbitrary non-void sets. We similarly define the (n-fold) Cartesian product of X₁, ..., Xₙ, denoted by X₁ × X₂ × ... × Xₙ, as

X₁ × X₂ × ... × Xₙ = {(x₁, x₂, ..., xₙ) : x₁ ∈ X₁, x₂ ∈ X₂, ..., xₙ ∈ Xₙ}.    (1.1.22)

We call xᵢ the ith element of the ordered set (x₁, ..., xₙ) ∈ X₁ × X₂ × ... × Xₙ, i = 1, ..., n. Here again, two ordered sets (x₁, ..., xₙ) and (y₁, ..., yₙ) are said to be equal if and only if xᵢ = yᵢ, i = 1, ..., n.
In the following example, the symbol ≜ means equal by definition.

1.1.23. Example. Let R be the set of all real numbers. We denote the Cartesian product R × R by R² ≜ R × R. Thus, if x, y ∈ R, the ordered pair (x, y) ∈ R × R. We may interpret (x, y) geometrically as being the coordinates of a point in the plane, x being the first coordinate and y the second coordinate. ∎

1.1.24. Example. Let A = {0, 1} and let B = {a, b, c}. Then

A × B = {(0, a), (0, b), (0, c), (1, a), (1, b), (1, c)}

and

B × A = {(a, 0), (a, 1), (b, 0), (b, 1), (c, 0), (c, 1)}. ∎

From this example it follows that, in general, if A and B are distinct sets, then

A × B ≠ B × A.
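The Cartesian product of Eq. (1.1.21) corresponds to `itertools.product` in Python; a quick check with the sets of Example 1.1.24 (0 and 1 as integers, a, b, c as strings):

```python
from itertools import product

A = {0, 1}
B = {'a', 'b', 'c'}

AxB = set(product(A, B))   # all ordered pairs (x, y) with x in A, y in B
BxA = set(product(B, A))

assert len(AxB) == len(A) * len(B)
assert (0, 'a') in AxB and ('a', 0) in BxA
# Ordered pairs distinguish the first and second elements, so in general
# the Cartesian product is not commutative:
assert AxB != BxA
```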
Next, we consider some generalizations to an ordered set. To this end, let I denote any non-void set which we call an index set. Now for each α ∈ I, suppose there is a unique A_α ⊂ X. We call {A_α : α ∈ I} an indexed family of sets. This notation requires some clarification. Strictly speaking, the set notation {A_α : α ∈ I} would normally indicate that none of the sets A_α, α ∈ I may be repeated. However, in the case of an indexed family we agree to permit the possibility that the sets A_α, α ∈ I need not be distinct.
We define an indexed set in a similar manner. Let I be an index set, and for each α ∈ I let there be a unique element x_α ∈ X. Then the set {x_α : α ∈ I} is called an indexed set. Here again, we agree to permit the possibility that the elements x_α, α ∈ I need not be distinct. Clearly, if I is a finite non-void set, then an indexed set is simply an ordered set.
In the next definition, and throughout the remainder of this section, J denotes the set of positive integers.

1.1.25. Definition. A sequence is an indexed set whose index set is J. A sequence of sets is an indexed family of sets whose index set is J.

We usually abbreviate the sequence {xₙ ∈ X : n ∈ J} by {xₙ}, when no possibility for confusion exists. (Even though the same notation is used for the sequence {xₙ} and the singleton of xₙ, the meaning as to which is meant will always be clear from context.) Some authors write {xₙ}_{n=1}^∞ to indicate that the index set of the sequence is J. Also, some authors allow the index set of a sequence to be finite.
We are now in a position to consider the following additional generalizations.

1.1.26. Definition. Let {A_α : α ∈ I} be an indexed family of sets, and let K be any subset of I. If K is non-void, we define

⋃_{α∈K} A_α = {x ∈ X : x ∈ A_α for some α ∈ K}

and

⋂_{α∈K} A_α = {x ∈ X : x ∈ A_α for all α ∈ K}.

If K = ∅, we define

⋃_{α∈∅} A_α = ∅ and ⋂_{α∈∅} A_α = X.

The union and intersection of families of sets which are not necessarily indexed are defined in a similar fashion. Thus, if 𝓕 is any non-void family of subsets of X, then we define

⋃_{F∈𝓕} F = {x ∈ X : x ∈ F for some F ∈ 𝓕}

and

⋂_{F∈𝓕} F = {x ∈ X : x ∈ F for all F ∈ 𝓕}.

When, in Definition 1.1.26, K is of the form K = {k, k + 1, k + 2, ...}, where k is an integer, we sometimes write ⋃_{n=k}^∞ Aₙ and ⋂_{n=k}^∞ Aₙ.

1.1.27. Example. Let X = R, the set of real numbers, and let I = {x ∈ R : 0 < x < 1}. Let A_α = {x ∈ R : 0 ≤ x < α} for all α ∈ I. Then ⋃_{α∈I} A_α = {x ∈ R : 0 ≤ x < 1} and ⋂_{α∈I} A_α = {0}, i.e., the singleton containing only the element 0. ∎


1.1.28. Example. Let X = R, the set of real numbers, and let I = J. Let

Aₙ = {x ∈ R : −n < x < n + 1}

and

Bₙ = {x ∈ R : |x| < 1 + 1/n}.

Then, ⋃_{n=1}^∞ Aₙ = R and ⋂_{n=1}^∞ Aₙ = {x ∈ R : −1 < x < 2}, while ⋃_{n=1}^∞ Bₙ = {x ∈ R : |x| < 2} and ⋂_{n=1}^∞ Bₙ = {x ∈ R : |x| ≤ 1}. ∎

The reader is now in a position to prove the following results.


1.1.29. Theorem. Let {A_α : α ∈ I} be an indexed family of sets, let B be any subset of X, and let K be any subset of I. Then
(i) B ∩ [⋃_{α∈K} A_α] = ⋃_{α∈K} (B ∩ A_α);
(ii) B ∪ [⋂_{α∈K} A_α] = ⋂_{α∈K} (B ∪ A_α);
(iii) B − ⋃_{α∈K} A_α = ⋂_{α∈K} (B − A_α);
(iv) B − ⋂_{α∈K} A_α = ⋃_{α∈K} (B − A_α);
(v) [⋃_{α∈K} A_α]⁻ = ⋂_{α∈K} A_α⁻; and
(vi) [⋂_{α∈K} A_α]⁻ = ⋃_{α∈K} A_α⁻.

1.1.30. Exercise. Prove Theorem 1.1.29.
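An indexed family {A_α : α ∈ K} can be modeled in Python as a dict from indices to sets, which makes the identities of Theorem 1.1.29 checkable on concrete data (the family below, indexed by divisors, is an illustrative choice):

```python
# An indexed family {A_k : k in K} modeled as a dict from indices to sets.
X = set(range(12))
family = {k: {x for x in X if x % k == 0} for k in (2, 3, 4)}
K = {2, 3}

union = set().union(*(family[k] for k in K))
inter = set(X).intersection(*(family[k] for k in K))
B = {0, 1, 2, 3, 4, 5}

# Parts (iii) and (iv): differences distribute over indexed unions/intersections.
assert B - union == set(X).intersection(*(B - family[k] for k in K))
assert B - inter == set().union(*(B - family[k] for k in K))
# Parts (v) and (vi): De Morgan's laws for the indexed family.
assert X - union == set(X).intersection(*(X - family[k] for k in K))
assert X - inter == set().union(*(X - family[k] for k in K))
```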

Parts (v) and (vi) of Theorem 1.1.29 are called De Morgan's laws.
We conclude the present section with the following:

1.1.31. Definition. Let 𝓕 be any family of subsets of X. 𝓕 is said to be a family of disjoint sets if for all A, B ∈ 𝓕 such that A ≠ B, A ∩ B = ∅. A sequence of sets {Eₙ} is said to be a sequence of disjoint sets if for every m, n ∈ J such that m ≠ n, Eₘ ∩ Eₙ = ∅.

1.2. FUNCTIONS

We first give the definition of a function in a set-theoretic manner. Then we discuss the meaning of function in more intuitive terms.

1.2.1. Definition. Let X and Y be non-void sets. A function f from X into Y is a subset of X × Y such that for every x ∈ X there is one and only one y ∈ Y (i.e., there is a unique y ∈ Y) such that (x, y) ∈ f. The set X is called the domain of f (or the domain of definition of f), and we say that f is defined on X. The set {y ∈ Y : (x, y) ∈ f for some x ∈ X} is called the range of f and is denoted by 𝓡(f). For each (x, y) ∈ f, we call y the value of f at x and denote it by f(x). We sometimes write f : X → Y to denote the function f from X into Y.

The terms mapping, map, operator, transformation, and function are used interchangeably. When using the term mapping, we usually say "a mapping of X into Y." Although the distinction between the words "of X" and "from X" is immaterial, as we shall see, the wording "into Y" becomes important as opposed to the wording "onto Y," which we will encounter later.
Sometimes it is convenient not to insist that the domain of definition of f be all of X; i.e., a function is sometimes defined on a subset of X rather than on all of X. In any case, the domain of definition of f is denoted by 𝓓(f) ⊂ X. Unless specified otherwise, we shall always assume that 𝓓(f) = X.
Intuitively, a function f is a "rule" whereby for each x ∈ X a unique y ∈ Y is assigned to x. When viewed in this manner, the term mapping is quite descriptive. However, defining a function as a "rule" involves usage of yet another undefined term.
Concerning functions, some additional comments are in order.
1. So-called "multivalued functions" are not allowed by the above definition. They will be treated later under the topic of relations (Section 1.3).
2. The set X (or Y) may be the Cartesian product of sets, e.g., X = X₁ × X₂ × ... × Xₙ. In this case we think of f as being a function of n variables. We write f(x₁, ..., xₙ) to denote the value of f at (x₁, ..., xₙ) ∈ X = X₁ × ... × Xₙ.
3. It is important that the distinction between a function and the value of a function be clearly understood. The value of a function, f(x), is an element of Y. The function f is a much larger entity, and it is to be thought of as a single object. Note that f ∈ 𝓟(X × Y) (the power set of X × Y), but not every element of 𝓟(X × Y) is a function. The set of all functions from X into Y is a subset of 𝓟(X × Y) and is sometimes denoted by Yˣ.

1.2.2. Example. Let A and B be the sets defined in Example 1.1.24. Let f be the subset of A × B given by f = {(0, a), (1, b)}. Then f is a function from A into B. We see that f(0) = a and f(1) = b. The range of f is the set {a, b}, which is a proper subset of B. ∎

Although we have defined a function as being a set, we usually characterize a function according to a rule as shown, for example, in the following.

1.2.3. Example. Let R denote the real numbers, and let f be a function from R into R whose value at each x ∈ R is given by f(x) = sin x. The function f is the sine function. Expressed explicitly as a set, we see that f = {(x, y) : y = sin x}. Note that the subset {(x, y) : x = sin y} ⊂ R × R is not a function. ∎
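The defining property of Definition 1.2.1 — exactly one ordered pair for each element of the domain — can be tested directly on a finite set of pairs; a minimal Python sketch (the helper name `is_function` is our own):

```python
def is_function(pairs, domain):
    """Check the defining property: every x in the domain appears as the
    first element of exactly one ordered pair (x, y)."""
    firsts = [x for (x, y) in pairs]
    return sorted(firsts) == sorted(domain) and len(set(firsts)) == len(firsts)

A = {0, 1}
B = {'a', 'b', 'c'}

f = {(0, 'a'), (1, 'b')}
assert is_function(f, A)          # the f of Example 1.2.2

g = {(0, 'a'), (0, 'b'), (1, 'c')}
assert not is_function(g, A)      # 0 is paired with two distinct values

# The range of f is the set of second elements -- here a proper subset of B.
assert {y for (x, y) in f} == {'a', 'b'} < B
```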

The preceding example also illustrates the notion of the graph of a function. Let X and Y denote the set of real numbers, let X × Y denote their Cartesian product, and let f be a function from X into Y. The collection of ordered pairs (x, f(x)) in X × Y is called the graph of the function f. Thus, a subset G of X × Y is the graph of a function defined on X if and only if for each x ∈ X there is a unique ordered pair in G whose first element is x. In fact, the graph of a function and the function itself are one and the same thing.
Since functions are defined as sets, equality of functions is to be interpreted in the sense of equality of sets. With this in mind, the reader will have no difficulty in proving the following.
1.2.4. Theorem. Two mappings f and g of X into Y are equal if and only if f(x) = g(x) for every x ∈ X.

1.2.5. Exercise. Prove Theorem 1.2.4.

We now wish to further characterize and classify functions. If f is a function from X into Y, we denote the range of f by 𝓡(f). In general, 𝓡(f) ⊂ Y may or may not be a proper subset of Y. Thus, we have the following definition.

1.2.6. Definition. Let f be a function from X into Y. If 𝓡(f) = Y, then f is said to be surjective or a surjection, and we say that f maps X onto Y. If f is a function such that for every x₁, x₂ ∈ X, f(x₁) = f(x₂) implies that x₁ = x₂, then f is said to be injective or a one-to-one mapping, or an injection. If f is both injective and surjective, we say that f is bijective or one-to-one and onto, or a bijection.
Let's go over this again. Every function f : X → Y is a mapping of X into Y. If the range of f happens to be all of Y, then we say f maps X onto Y. For each x ∈ X, there is always a unique y ∈ Y such that y = f(x). However, there may be distinct elements x₁ and x₂ in X such that f(x₁) = f(x₂). If there is a unique x ∈ X such that f(x) = y for each y ∈ 𝓡(f), then we say that f is a one-to-one mapping. If f maps X onto Y and is one-to-one, we say that f is one-to-one and onto. In Figure B an attempt is made to illustrate these concepts pictorially. In this figure the dots denote elements of sets and the arrows indicate the rules of the various functions.
The reader should commit to memory the following associations: surjective ↔ onto; injective ↔ one-to-one; bijective ↔ one-to-one and onto. Frequently, the term one-to-one is abbreviated as (1-1).

1.2.7. Figure B. Illustration of different types of mappings. (The figure depicts four mappings from X into Y: f₁, which is into; f₂, which is onto; f₃, which is (1-1); and f₄, which is bijective.)
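For finite mappings, the classifications of Definition 1.2.6 reduce to simple counting tests; the following Python sketch (the predicate names and the four sample mappings, echoing the spirit of Figure B, are our own) makes this concrete:

```python
def is_injective(f):
    """f given as a dict; injective iff no two domain points share a value."""
    return len(set(f.values())) == len(f)

def is_surjective(f, Y):
    """Surjective iff the range of f is all of Y."""
    return set(f.values()) == Y

def is_bijective(f, Y):
    return is_injective(f) and is_surjective(f, Y)

Y = {1, 2, 3}
f1 = {'a': 1, 'b': 1, 'c': 2}          # into Y: neither injective nor onto
f2 = {'a': 1, 'b': 1, 'c': 2, 'd': 3}  # onto Y but not one-to-one
f3 = {'a': 1, 'b': 2}                  # one-to-one but not onto
f4 = {'a': 1, 'b': 2, 'c': 3}          # one-to-one and onto: a bijection

assert not is_injective(f1) and not is_surjective(f1, Y)
assert is_surjective(f2, Y) and not is_injective(f2)
assert is_injective(f3) and not is_surjective(f3, Y)
assert is_bijective(f4, Y)
```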

We now prove the following important but obvious result.

1.2.8. Theorem. Let f be a function from X into Y, and let Z = 𝓡(f), the range of f. Let g denote the set {(y, x) ∈ Z × X : (x, y) ∈ f}. Then, clearly, g is a subset of Z × X, and f is injective if and only if g is a function from Z into X.

Proof. Let f be injective, and let y ∈ Z. Since y ∈ 𝓡(f), there is an x ∈ X such that (x, y) ∈ f and hence (y, x) ∈ g. Now suppose there is another x₁ ∈ X such that (y, x₁) ∈ g. Then (x₁, y) ∈ f. Since f is injective and y = f(x) = f(x₁), this implies that x = x₁ and so x is unique. This means that g is a function from Z into X.
Conversely, suppose g is a function from Z into X. Let x₁, x₂ ∈ X be such that f(x₁) = f(x₂). This implies that (x₁, f(x₁)) and (x₂, f(x₂)) ∈ f and so (f(x₁), x₁) and (f(x₂), x₂) ∈ g. Since f(x₁) = f(x₂) and g is a function, we must have x₁ = x₂. Therefore, f is injective. ∎
The above result motivates the following definition.

1.2.9. Definition. Let f be an injective mapping of X into Y. Then we say that f has an inverse, and we call the mapping g defined in Theorem 1.2.8 the inverse of f. Hereafter, we will denote the inverse of f by f⁻¹.

Clearly, if f has an inverse, then f⁻¹ is a mapping from 𝓡(f) onto X.

1.2.10. Theorem. Let f be an injective mapping of X into Y. Then
(i) f is a one-to-one mapping of X onto 𝓡(f);
(ii) f⁻¹ is a one-to-one mapping of 𝓡(f) onto X;
(iii) for every x ∈ X, f⁻¹(f(x)) = x; and
(iv) for every y ∈ 𝓡(f), f(f⁻¹(y)) = y.

1.2.11. Exercise. Prove Theorem 1.2.10.

Note that in the above definition, the domain of f⁻¹ is 𝓡(f), which need not be all of Y.
Some texts insist that in order for a function f to have an inverse, it must be bijective. Thus, when reading the literature it is important to note which definition of f⁻¹ the author has in mind. (Note that an injective function f : X → Y is a bijective function from X onto 𝓡(f).)
1.2.12. Example. Let X = Y = R, the set of real numbers. Let f : X → Y be given by f(x) = x³ for every x ∈ R. Then f is a (1-1) mapping of X onto Y, and f⁻¹(y) = y^(1/3) for all y. ∎
1.2.13. Example. Let X = Y = J, the set of positive integers. Let f : X → Y be given by f(n) = n + 3 for all n ∈ J. Then f is a (1-1) mapping of X into Y. However, the range of f, 𝓡(f) = {y ∈ Y : y > 3} = {4, 5, ...} ≠ Y. Therefore, f has an inverse, f⁻¹, which is defined only on 𝓡(f) and not on all of Y. In this case we have f⁻¹(y) = y − 3 for all y ∈ 𝓡(f). ∎

1.2.14. Example. Let X = Y = R, the set of all real numbers. Let f : X → Y be given by

f(x) = x/(1 + |x|) for all x ∈ R.

Then f is an injective mapping into R, and 𝓡(f) = {y ∈ Y : −1 < y < +1}. Also, f⁻¹ is a mapping from 𝓡(f) into R given by

f⁻¹(y) = y/(1 − |y|) for all y ∈ 𝓡(f). ∎

Next, let X, Y, and Z be non-void sets. Suppose that f : X → Y and g : Y → Z. For each x ∈ X, we have f(x) ∈ Y and g(f(x)) ∈ Z. Since f and g are mappings from X into Y and from Y into Z, respectively, it follows that for each x ∈ X there is one and only one element g(f(x)) ∈ Z. Hence, the set

{(x, z) ∈ X × Z : z = g(f(x)), x ∈ X}    (1.2.15)

is a function from X into Z. We call this function the composite function of g and f and denote it by g ∘ f. The value of g ∘ f at x is given by

(g ∘ f)(x) = g ∘ f(x) ≜ g(f(x)).
In Figure C, a pictorial interpretation of a composite function is given.

1.2.16. Figure C. Illustration of a composite function.

1.2.17. Theorem. If f is a mapping of a set X onto a set Y and g is a mapping of the set Y onto a set Z, then g ∘ f is a mapping of X onto Z.

Proof. In order to show that g ∘ f is an onto mapping we must show that for any z ∈ Z there exists an x ∈ X such that g(f(x)) = z. If z ∈ Z, then since g is a mapping of Y onto Z, there is an element y ∈ Y such that g(y) = z. Furthermore, since f is a mapping of X onto Y, there is an x ∈ X such that f(x) = y. Since g ∘ f(x) = g(f(x)) = g(y) = z, it readily follows that g ∘ f is a mapping of X onto Z, which proves the theorem. ∎

We also have:

1.2.18. Theorem. If f is a (1-1) mapping of a set X onto a set Y, and if g is a (1-1) mapping of the set Y onto a set Z, then g ∘ f is a (1-1) mapping of X onto Z.

1.2.19. Exercise. Prove Theorem 1.2.18.

Next we prove:

1.2.20. Theorem. If f is a (1-1) mapping of a set X onto a set Y, and if g is a (1-1) mapping of Y onto a set Z, then (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹.

Proof. Let z ∈ Z. Then there exists an x ∈ X such that g ∘ f(x) = z, and hence (g ∘ f)⁻¹(z) = x. Also, since g ∘ f(x) = g(f(x)) = z, it follows that g⁻¹(z) = f(x), from which we have f⁻¹(g⁻¹(z)) = x. But f⁻¹(g⁻¹(z)) = f⁻¹ ∘ g⁻¹(z), and since this is equal to x, we have f⁻¹ ∘ g⁻¹(z) = (g ∘ f)⁻¹(z). Since z is arbitrary, the theorem is proved. ∎

Note carefully that in Theorem 1.2.20 f is a mapping of X onto Y. If it had simply been an injective mapping, the composite function f⁻¹ ∘ g⁻¹ may not be defined. That is, the range of g⁻¹ is Y; however, the domain of f⁻¹ is 𝓡(f). Clearly, the domain of f⁻¹ must include the range of g⁻¹ in order that the composition f⁻¹ ∘ g⁻¹ be defined.
1.2.21. Example. Let A = {r, s, t, u}, B = {u, v, w, x}, and C = {w, x, y, z}. Let the function f : A → B be defined as

f = {(r, u), (s, w), (t, v), (u, x)}.

We find it convenient to represent this function in the following way:

f = ( r  s  t  u )
    ( u  w  v  x ).

That is, the top row identifies the domain of f, and the bottom row contains each unique element in the range of f directly below the appropriate element in the domain. Clearly, this representation can be used for any function defined on a finite set. In a similar fashion, let the function g : B → C be defined as

g = ( u  v  w  x )
    ( x  y  w  z ).

Clearly, both f and g are bijective. Also, g ∘ f is the (1-1) mapping of A onto C given by

g ∘ f = ( r  s  t  u )
        ( x  w  y  z ).

Furthermore,

g⁻¹ = ( x  y  w  z )
      ( u  v  w  x )

and

(g ∘ f)⁻¹ = ( x  w  y  z )
            ( r  s  t  u ).

Now

f⁻¹ ∘ g⁻¹ = ( x  w  y  z )
            ( r  s  t  u ),

i.e., f⁻¹ ∘ g⁻¹ = (g ∘ f)⁻¹. ∎
The reader can prove the next result readily.

1.2.22. Theorem. Let W, X, Y, and Z be non-void sets. If f is a mapping of set W into set X, if g is a mapping of X into set Y, and if h is a mapping of Y into set Z (sets W, X, Y, Z are not necessarily distinct), then h ∘ (g ∘ f) = (h ∘ g) ∘ f.

1.2.23. Exercise. Prove Theorem 1.2.22.

1.2.24. Example. Let A = {m, n, p, q}, B = {m, r, s}, C = {r, t, u, v}, D = {w, x, y, z}, and define f : A → B, g : B → C, and h : C → D as

f = ( m  n  p  q )      g = ( m  r  s )      h = ( r  t  u  v )
    ( r  s  m  s ),         ( r  t  u ),         ( w  x  y  z ).

Then

g ∘ f = ( m  n  p  q )      h ∘ g = ( m  r  s )
        ( t  u  r  u )              ( w  x  y ).

Thus,

h ∘ (g ∘ f) = ( m  n  p  q )      (h ∘ g) ∘ f = ( m  n  p  q )
              ( x  y  w  y ),                   ( x  y  w  y ),

i.e., h ∘ (g ∘ f) = (h ∘ g) ∘ f. ∎

There is a special mapping which is so important that we give it a special name. We have:

1.2.25. Definition. Let X be a non-void set. Let e : X → X be defined by e(x) = x for all x ∈ X. We call e the identity function on X.
It is clear that the identity function is bijective.

1.2.26. Theorem. Let X and Y be non-void sets, and let f : X → Y. Let e_X, e_Y, and e₁ be the identity functions on X, Y, and 𝓡(f), respectively. Then
(i) if f is injective, then f⁻¹ ∘ f = e_X and f ∘ f⁻¹ = e₁; and
(ii) f is bijective if and only if there is a g : Y → X such that g ∘ f = e_X and f ∘ g = e_Y.

Proof. Part (i) follows immediately from parts (iii) and (iv) of Theorem 1.2.10. The proof of part (ii) is left as an exercise. ∎

1.2.27. Exercise. Prove part (ii) of Theorem 1.2.26.

Another special class of important functions are permutations.

1.2.28. Definition. A permutation on a set X is a (1-1) mapping of X onto X.

It is clear that the identity mapping on X is a permutation on X. For this reason it is sometimes called the identity permutation on X. It is also clear that the inverse of a permutation is also a permutation.

1.2.29. Exercise. Let X = {a, b, c}, and define f : X → X and g : X → X as

f = ( a  b  c )      g = ( a  b  c )
    ( c  b  a ),         ( b  c  a ).

Show that f, g, f⁻¹, and g⁻¹ are permutations on X.


1.2.30. Exercise. Let Z denote the set of integers, and let f : Z → Z be defined by f(n) = n + 3 for all n ∈ Z. Show that f and f⁻¹ are permutations on Z and that f⁻¹ ∘ f = f ∘ f⁻¹.

The reader can readily prove the following results.


1.2.31. Theorem. If f is a (1-1) mapping of a set A onto a set B and if g is a (1-1) mapping of the set B onto the set A, then g ∘ f is a permutation on A.

1.2.32. Corollary. If f and g are both permutations on a set A, then g ∘ f is a permutation on A.

1.2.33. Exercise. Prove Theorem 1.2.31 and Corollary 1.2.32.

1.2.34. Exercise. Show that if a set A consists of n elements, then there are exactly n! (n factorial) distinct permutations on A.
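The n! count of Exercise 1.2.34 can be confirmed by enumeration; the Python sketch below (the four-element set is an illustrative choice) also checks that the inverse of each permutation is again a permutation:

```python
from itertools import permutations
from math import factorial

A = ['a', 'b', 'c', 'd']

# Each ordering of A determines a (1-1) mapping of A onto A.
perms = [dict(zip(A, p)) for p in permutations(A)]

assert len(perms) == factorial(len(A))   # exactly n! permutations on A

# The inverse of a permutation is itself a permutation on A.
for p in perms:
    q = {y: x for x, y in p.items()}
    assert sorted(q) == sorted(A) and sorted(q.values()) == sorted(A)
```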

Now let f be a mapping of a set X into a set Y. If X₁ is a subset of X, then for each element x′ ∈ X₁ there is a unique element f(x′) ∈ Y. Thus, f may be used to define a mapping f′ of X₁ into Y defined by

f′(x′) = f(x′) for all x′ ∈ X₁.    (1.2.35)

This motivates the following definition.

1.2.36. Definition. The mapping f′ of subset X₁ ⊂ X into Y of Eq. (1.2.35) is called the mapping of X₁ into Y induced by the mapping f : X → Y. In this case f′ is called the restriction of f to the set X₁.

We also have:

1.2.37. Definition. If f is a mapping of X₁ into Y and if X₁ ⊂ X, then any mapping f̃ of X into Y is said to be an extension of f if

f̃(x) = f(x) for every x ∈ X₁.    (1.2.38)

Thus, if f̃ is an extension of f, then f is a mapping of a set X₁ into Y which is induced by the mapping f̃ of X into Y.

1.2.39. Example. Let X₁ = {u, v, x}, X = {u, v, x, y, z}, and Y = {n, p, q, r, s, t}. Clearly X₁ ⊂ X. Define f : X₁ → Y as

f = ( u  v  x )
    ( n  p  q ).

Also, define f̃, f̂ : X → Y as

f̃ = ( u  v  x  y  z )      f̂ = ( u  v  x  y  z )
    ( n  p  q  r  s ),         ( n  p  q  n  t ).

Then f̃ and f̂ are two different extensions of f. Moreover, f is the mapping of X₁ into Y induced either by f̃ or f̂. In general, two distinct mappings may induce the same mapping on a subset. ∎
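Restriction and extension are transparent for finite mappings written as dicts; a sketch using the data of Example 1.2.39 (the helper name `restriction` is our own):

```python
# Example 1.2.39 in dict form; X1 = {u, v, x} is a subset of X = {u, v, x, y, z}.
X1 = {'u', 'v', 'x'}
f = {'u': 'n', 'v': 'p', 'x': 'q'}                            # f : X1 -> Y
f_tilde = {'u': 'n', 'v': 'p', 'x': 'q', 'y': 'r', 'z': 's'}  # one extension
f_hat   = {'u': 'n', 'v': 'p', 'x': 'q', 'y': 'n', 'z': 't'}  # another

def restriction(F, X1):
    """The mapping of X1 into Y induced by F (the restriction of F to X1)."""
    return {x: F[x] for x in X1}

# Two distinct mappings may induce the same mapping on a subset:
assert f_tilde != f_hat
assert restriction(f_tilde, X1) == f == restriction(f_hat, X1)
```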
Let us next consider the image and the inverse image of sets under mappings. Specifically, we have:

1.2.40. Definition. Let f be a function from a set X into a set Y. Let A ⊂ X, and let B ⊂ Y. We define the image of A under f, denoted by f(A), to be the set

f(A) = {y ∈ Y : y = f(x), x ∈ A}.

We define the inverse image of B under f, denoted by f⁻¹(B), to be the set

f⁻¹(B) = {x ∈ X : f(x) ∈ B}.

Note that f⁻¹(B) is always defined for any f : X → Y. That is, there is no implication here that f has an inverse. The notation is somewhat unfortunate in this respect. Note also that the range of f is f(X).
In the next result, some of the important properties of images and inverse
images of functions are summarized.
1.2.41. Theorem. Let f be a function from X into Y, let A, A₁, and A₂ be subsets of X, and let B, B₁, and B₂ be subsets of Y. Then
(i) if A₁ ⊂ A, then f(A₁) ⊂ f(A);
(ii) f(A₁ ∪ A₂) = f(A₁) ∪ f(A₂);
(iii) f(A₁ ∩ A₂) ⊂ f(A₁) ∩ f(A₂);
(iv) f⁻¹(B₁ ∪ B₂) = f⁻¹(B₁) ∪ f⁻¹(B₂);
(v) f⁻¹(B₁ ∩ B₂) = f⁻¹(B₁) ∩ f⁻¹(B₂);
(vi) f⁻¹(B⁻) = [f⁻¹(B)]⁻;
(vii) f⁻¹[f(A)] ⊃ A; and
(viii) f[f⁻¹(B)] ⊂ B.
Proof. We prove parts (i) and (ii) to demonstrate the method of proof. The remaining parts are left as an exercise.
To prove part (i), let y ∈ f(A₁). Then there is an x ∈ A₁ such that y = f(x). But A₁ ⊂ A, and so x ∈ A. Hence, f(x) = y ∈ f(A). This proves that f(A₁) ⊂ f(A).
To prove part (ii), let y ∈ f(A₁ ∪ A₂). Then there is an x ∈ A₁ ∪ A₂ such that y = f(x). If x ∈ A₁, then f(x) = y ∈ f(A₁). If x ∈ A₂, then f(x) = y ∈ f(A₂). Since x is in A₁ or in A₂, f(x) must be in f(A₁) or f(A₂). Therefore, f(A₁ ∪ A₂) ⊂ f(A₁) ∪ f(A₂). To prove that f(A₁) ∪ f(A₂) ⊂ f(A₁ ∪ A₂), we note that A₁ ⊂ A₁ ∪ A₂. So by part (i), f(A₁) ⊂ f(A₁ ∪ A₂). Similarly, f(A₂) ⊂ f(A₁ ∪ A₂). From this it follows that f(A₁) ∪ f(A₂) ⊂ f(A₁ ∪ A₂). We conclude that f(A₁ ∪ A₂) = f(A₁) ∪ f(A₂). ∎

1.2.42. Exercise. Prove parts (iii) through (viii) of Theorem 1.2.41.

We note that, in general, equality is not attained in parts (iii), (vii), and
(viii) of Theorem 1.2.41. However, by considering special types of mappings
we can obtain the following results for these cases.

1.2.43. Theorem. Let f be a function from X into Y, let A, A₁, and A₂ be subsets of X, and let B be a subset of Y. Then

(i) f(A₁ ∩ A₂) = f(A₁) ∩ f(A₂) for all pairs of subsets A₁, A₂ of X if and only if f is injective;
(ii) f⁻¹[f(A)] = A for all A ⊂ X if and only if f is injective; and
(iii) f[f⁻¹(B)] = B for all B ⊂ Y if and only if f is surjective.

Proof. We will prove only part (i) and leave the proofs of parts (ii) and (iii) as an exercise.
To prove sufficiency, let f be injective and let A₁ and A₂ be subsets of X. In view of part (iii) of Theorem 1.2.41, we need only show that f(A₁) ∩ f(A₂) ⊂ f(A₁ ∩ A₂). In doing so, let y ∈ f(A₁) ∩ f(A₂). Then y ∈ f(A₁) and y ∈ f(A₂). This means there is an x₁ ∈ A₁ and an x₂ ∈ A₂ such that y = f(x₁) = f(x₂). Since f is injective, x₁ = x₂. Hence, x₁ ∈ A₁ ∩ A₂. This implies that y ∈ f(A₁ ∩ A₂); i.e., f(A₁) ∩ f(A₂) ⊂ f(A₁ ∩ A₂).
To prove necessity, assume that f(A₁ ∩ A₂) = f(A₁) ∩ f(A₂) for all subsets A₁ and A₂ of X. For purposes of contradiction, suppose there are x₁, x₂ ∈ X such that x₁ ≠ x₂ and f(x₁) = f(x₂). Let A₁ = {x₁} and A₂ = {x₂}; i.e., A₁ and A₂ are singletons of x₁ and x₂, respectively. Then A₁ ∩ A₂ = ∅, and so f(A₁ ∩ A₂) = ∅. However, f(A₁) = {y} and f(A₂) = {y}, and thus f(A₁) ∩ f(A₂) = {y} ≠ ∅. This contradicts the fact that f(A₁) ∩ f(A₂) = f(A₁ ∩ A₂) for all subsets A₁ and A₂ of X. Thus, f is injective. ∎

1.2.44. Exercise. Prove parts (ii) and (iii) of Theorem 1.2.43.
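All three parts of Theorem 1.2.43 can be verified exhaustively over every subset of a small domain. A brief sketch (the injective and surjective maps below are arbitrary choices, not from the text):

```python
from itertools import chain, combinations

def image(f, A):
    """f(A) for a map given as a dict."""
    return {f[x] for x in A}

def preimage(f, B):
    """f^{-1}(B) for a map given as a dict."""
    return {x for x in f if f[x] in B}

def subsets(S):
    """All subsets of a finite set S."""
    S = list(S)
    return [set(c) for c in chain.from_iterable(combinations(S, r)
                                                for r in range(len(S) + 1))]

inj = {1: 'a', 2: 'b', 3: 'c'}          # an injective map (arbitrary choice)
# Part (i): images commute with intersections for an injective f.
assert all(image(inj, A1 & A2) == image(inj, A1) & image(inj, A2)
           for A1 in subsets(inj) for A2 in subsets(inj))
# Part (ii): f^{-1}[f(A)] = A for every A when f is injective.
assert all(preimage(inj, image(inj, A)) == A for A in subsets(inj))

surj = {1: 'a', 2: 'b', 3: 'a'}         # surjective onto Y = {'a', 'b'}
# Part (iii): f[f^{-1}(B)] = B for every B when f is surjective.
assert all(image(surj, preimage(surj, B)) == B for B in subsets({'a', 'b'}))
```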

Some of the preceding results can be extended to families of sets. For example, we have:

1.2.45. Theorem. Let f be a function from X into Y, let {A_α : α ∈ I} be an indexed family of sets in X, and let {B_α : α ∈ K} be an indexed family of sets in Y. Then

(i) f(∪_{α∈I} A_α) = ∪_{α∈I} f(A_α);
(ii) f(∩_{α∈I} A_α) ⊂ ∩_{α∈I} f(A_α);
(iii) f⁻¹(∪_{α∈K} B_α) = ∪_{α∈K} f⁻¹(B_α);
(iv) f⁻¹(∩_{α∈K} B_α) = ∩_{α∈K} f⁻¹(B_α); and
(v) if B ⊂ Y, f⁻¹(Bᶜ) = [f⁻¹(B)]ᶜ.

Proof. We prove parts (i) and (iii) and leave the proofs of the remaining parts as an exercise.
To prove part (i), let y ∈ f(∪_{α∈I} A_α). This means that there is an x ∈ ∪_{α∈I} A_α such that y = f(x). Thus, for some α ∈ I, x ∈ A_α. This implies that f(x) ∈ f(A_α), and so y ∈ f(A_α). Hence, y ∈ ∪_{α∈I} f(A_α). This shows that f(∪_{α∈I} A_α) ⊂ ∪_{α∈I} f(A_α).
To prove the converse, let y ∈ ∪_{α∈I} f(A_α). Then y ∈ f(A_α) for some α ∈ I. This means there is an x ∈ A_α such that f(x) = y. Now x ∈ ∪_{α∈I} A_α, and so f(x) = y ∈ f(∪_{α∈I} A_α). Therefore, ∪_{α∈I} f(A_α) ⊂ f(∪_{α∈I} A_α). This completes the proof of part (i).
To prove part (iii), let x ∈ f⁻¹(∪_{α∈K} B_α). This means that f(x) ∈ ∪_{α∈K} B_α, and so f(x) ∈ B_α for some α ∈ K. Hence, x ∈ f⁻¹(B_α) for some α ∈ K, and so x ∈ ∪_{α∈K} f⁻¹(B_α). Therefore, f⁻¹(∪_{α∈K} B_α) ⊂ ∪_{α∈K} f⁻¹(B_α).
Conversely, let x ∈ ∪_{α∈K} f⁻¹(B_α). Then x ∈ f⁻¹(B_α) for some α ∈ K. Thus, f(x) ∈ B_α, and so f(x) ∈ ∪_{α∈K} B_α. Hence, x ∈ f⁻¹(∪_{α∈K} B_α). This means that ∪_{α∈K} f⁻¹(B_α) ⊂ f⁻¹(∪_{α∈K} B_α), which completes the proof of part (iii). ∎

1.2.46. Exercise. Prove parts (ii), (iv), and (v) of Theorem 1.2.45.

Having introduced the concept of mapping, we are in a position to consider


an important classification of infinite sets. We first consider the following
definition.

1.2.47. Definition. Let A and B be any two sets. The set A is said to be
equivalent to set B if there exists a bijective mapping of A onto B.
Clearly, if A is equivalent to B, then B is equivalent to A.

1.2.48. Definition. Let J be the set of positive integers, and let A be any set. Then A is said to be countably infinite if A is equivalent to J. A set is said to be countable or denumerable if it is either finite or countably infinite. If a set is not countable, it is said to be uncountable.
We have:


1.2.49. Theorem. Let J be the set of positive integers, and let I ⊂ J. If I is infinite, then I is equivalent to J.
Proof. We shall construct a bijective mapping, f, from J onto I. Let {Jₙ : n ∈ J} be the family of sets given by Jₙ = {1, 2, …, n} for n = 1, 2, …. Clearly, each Jₙ is finite and of order n. Therefore, Jₙ ∩ I is finite. Since I is infinite, I − Jₙ ≠ ∅ for all n. Let us now define f : J → I as follows. Let f(1) be the smallest integer in I. We now proceed inductively. Assume f(n) ∈ I has been defined, and let f(n + 1) be the smallest integer in I which is greater than f(n). Now f(n + 1) > f(n), and so f(n₁) > f(n₂) for any n₁ > n₂. This implies that f is injective.
Next, we want to show that f is surjective. We do so by contradiction. Suppose that f(J) ≠ I. Since f(J) ⊂ I, this implies that I − f(J) ≠ ∅. Let q be the smallest integer in I − f(J). Then q ≠ f(1) because f(1) ∈ f(J), and so q > f(1). This implies that I ∩ J_{q−1} ≠ ∅. Since I ∩ J_{q−1} is non-void and finite, we may find the largest integer in this set, say r. It follows that r ≤ q − 1 < q. Now r is the largest integer in I which is less than q. But r < q implies that r ∈ f(J). This means there is an s ∈ J such that r = f(s). By definition of f, f(s + 1) = q. Hence, q ∈ f(J), and we have arrived at a contradiction. Thus, f is surjective. This completes the proof. ∎
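The inductive construction in the proof (f(1) = least element of I, and f(n + 1) = least element of I above f(n)) is directly computable. A small sketch, taking for I the even integers and the primes as stand-ins for an arbitrary infinite subset of J:

```python
def enumerate_subset(member, n):
    """Return f(1), ..., f(n) for I = {k in J : member(k)}, where f(1) is the
    least element of I and f(k+1) is the least element of I greater than f(k)
    -- the map constructed in the proof of Theorem 1.2.49."""
    values, k = [], 0
    while len(values) < n:
        k += 1
        if member(k):
            values.append(k)
    return values

def is_prime(k):
    return k > 1 and all(k % d for d in range(2, int(k ** 0.5) + 1))

# f is strictly increasing, hence injective, and in the limit exhausts I.
first = enumerate_subset(is_prime, 8)
assert first == [2, 3, 5, 7, 11, 13, 17, 19]
assert all(a < b for a, b in zip(first, first[1:]))
```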

We now have the following corollary.


1.2.50. Corollary. Let A ⊂ B ⊂ X. If B is a countable set, then A is countable.

Proof. If A is finite, then there is nothing to prove. So let us assume that A is infinite. This means that B is countably infinite, and so there exists a bijective mapping f : B → J. Let g be the restriction of f to A. Then for all x₁, x₂ ∈ A such that x₁ ≠ x₂, g(x₁) = f(x₁) ≠ f(x₂) = g(x₂). Thus, g is an injective mapping of A into J. By part (i) of Theorem 1.2.10, g is a bijective mapping of A onto g(A). This means A is equivalent to g(A), and thus g(A) is an infinite set. Since g(A) ⊂ J, g(A) is equivalent to J. Hence, there is a bijective mapping of g(A) onto J, which we call h. By Theorem 1.2.18, the composite mapping h ∘ g is a bijective mapping of A onto J. This means that J is equivalent to A. Therefore, A is countable. ∎

We conclude the present section by considering the cardinality of sets.


Specifically, if a set is finite, we say the cardinal number of the set is equal to the number of elements of the set. If two sets are countably infinite, then we say they have the same cardinal number, which we can define to be the cardinal number of the positive integers. More generally, two arbitrary sets are said to have the same cardinal number if we can establish a bijective mapping between the two sets (i.e., the sets are equivalent).

1.3. RELATIONS AND EQUIVALENCE RELATIONS

Throughout the present section, X denotes a non-void set.

We begin by introducing the notion of relation, which is a generalization


of the concept of function.
1.3.1. Definition. Let X and Y be non-void sets. Any subset of X × Y is called a relation from X to Y. Any subset of X × X is called a relation in X.

1.3.2. Example. Let A = {u, v, x, y} and B = {a, b, c, d}. Let ρ = {(u, a), (v, b), (u, c), (x, a)}. Then ρ is a relation from A into B. It is clearly not a function from A into B (why?). ∎

1.3.3. Example. Let X = Y = R, the set of real numbers. The set {(x, y) ∈ R × R : x ≤ y} is a relation in R. Also, the set {(x, y) ∈ R × R : x = sin y} is a relation in R. This shows that so-called multivalued functions are actually relations rather than mappings. ∎

As in the case of mappings, it makes sense to speak of the domain and the
range of a relation. We have:
1.3.4. Definition. Let ρ be a relation from X to Y. The subset of X,

{x ∈ X : (x, y) ∈ ρ, y ∈ Y},

is called the domain of ρ. The subset of Y,

{y ∈ Y : (x, y) ∈ ρ, x ∈ X},

is called the range of ρ.


Now let ρ be a relation from X to Y. Then, clearly, the set ρ⁻¹ ⊂ Y × X defined by

ρ⁻¹ = {(y, x) ∈ Y × X : (x, y) ∈ ρ ⊂ X × Y}

is a relation from Y to X. The relation ρ⁻¹ is called the inverse relation of ρ. Note that whereas the inverse of a function does not always exist, the inverse of a relation does always exist.
Next, we consider equivalence relations. Let ρ denote a relation in X; i.e., ρ ⊂ X × X. Then for any x, y ∈ X, either (x, y) ∈ ρ or (x, y) ∉ ρ, but not both. If (x, y) ∈ ρ, then we write x ρ y, and if (x, y) ∉ ρ, we write x ρ̸ y.
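Since a relation is simply a set of ordered pairs, its domain, range, and inverse are immediate to compute. A minimal sketch, using the relation ρ of Example 1.3.2:

```python
# The relation of Example 1.3.2, a subset of A x B.
rho = {('u', 'a'), ('v', 'b'), ('u', 'c'), ('x', 'a')}

domain = {x for (x, y) in rho}
rng    = {y for (x, y) in rho}
# The inverse relation always exists: just swap each pair.
rho_inv = {(y, x) for (x, y) in rho}

assert domain == {'u', 'v', 'x'}
assert rng == {'a', 'b', 'c'}
assert rho_inv == {('a', 'u'), ('b', 'v'), ('c', 'u'), ('a', 'x')}
# rho is not a function from A into B: 'u' is paired with two elements of B.
assert sum(1 for (x, y) in rho if x == 'u') == 2
```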

1.3.5. Definition. Let ρ be a relation in X.

(i) If x ρ x for all x ∈ X, then ρ is said to be reflexive;
(ii) if x ρ y implies y ρ x for all x, y ∈ X, then ρ is said to be symmetric; and
(iii) if for all x, y, z ∈ X, x ρ y and y ρ z implies x ρ z, then ρ is said to be transitive.

1.3.6. Example. Let R denote the set of real numbers. The relation in R given by {(x, y) : x < y} is transitive but not reflexive and not symmetric. The relation in R given by {(x, y) : x ≠ y} is symmetric but not reflexive and not transitive. ∎

1.3.7. Example. Let ρ be the relation in P(X) defined by ρ = {(A, B) : A ⊂ B}. That is, A ρ B if and only if A ⊂ B. Then ρ is reflexive and transitive but not symmetric. ∎
In the following, we use the symbol ∼ to denote a relation in X. If (x, y) ∈ ∼, then we write, as before, x ∼ y.

1.3.8. Definition. Let ∼ be a relation in X. Then ∼ is said to be an equivalence relation in X if ∼ is reflexive, symmetric, and transitive. If ∼ is an equivalence relation and if x ∼ y, we say that x is equivalent to y.
In particular, the equivalence relation in X characterized by the statement "x ∼ y if and only if x = y" is called the equals relation in X or the identity relation in X.
1.3.9. Example. Let X be a finite set, and let A, B, C ∈ P(X). Let ∼ on P(X) be defined by saying that A ∼ B if and only if A and B have the same number of elements. Clearly A ∼ A. Also, if A ∼ B then B ∼ A. Furthermore, if A ∼ B and B ∼ C, then A ∼ C. Hence, ∼ is reflexive, symmetric, and transitive. Therefore, ∼ is an equivalence relation in P(X). ∎

1.3.10. Example. Let R² = R × R, the real plane. Let X be the family of all triangles in R². Then each of the following statements can be used to define an equivalence relation in X: "is similar to," "is congruent to," "has the same area as," and "has the same perimeter as." ∎

1.4. OPERATIONS ON SETS

In the present section we introduce the concept of an operation on a set, and


we consider some of the properties of operations. Throughout this section, X
denotes a non-void set.
1.4.1. Definition. A binary operation on X is a mapping of X × X into X. A ternary operation on X is a mapping of X × X × X into X.

We could proceed in an obvious manner and define an n-ary operation on X. Since our primary concern in this book will be with binary operations, we will henceforth simply say "an operation on X" when we actually mean a binary operation on X.
If α : X × X → X is an operation, then we usually use the notation α(x, y) ≜ x α y.

1.4.2. Example. Let R denote the real numbers. Let f : R × R → R be given by f(x, y) = x + y for all x, y ∈ R, where x + y denotes the customary sum of x plus y (i.e., + denotes the usual operation of addition of real numbers). Then f is clearly an operation on R, in the sense of Definition 1.4.1. We could just as well have defined "+" as being the operation on R, i.e., + : R × R → R, where +(x, y) ≜ x + y. Similarly, the ordinary rules of subtraction and multiplication on R, "−" and "·", respectively, are also operations on R. Notice that division, "÷", is not an operation on R, because x ÷ y is not defined for all y ∈ R (i.e., x ÷ y is not defined for y = 0). However, if we let R* = R − {0}, then "÷" is an operation on R*. ∎

1.4.3. Exercise. Show that if A is a set consisting of n distinct elements, then there exist exactly n^(n²) distinct operations on A.
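The count in Exercise 1.4.3 holds because an operation on A is an arbitrary mapping of the n²-element set A × A into A, and there are n^(n²) such mappings. A brute-force check for n = 2, where the count is 2⁴ = 16:

```python
from itertools import product

A = ('a', 'b')
pairs = list(product(A, A))                 # the 4 elements of A x A
# Every assignment of a value in A to each pair of A x A is a distinct operation.
operations = [dict(zip(pairs, values))
              for values in product(A, repeat=len(pairs))]
assert len(operations) == len(A) ** (len(A) ** 2)   # 2**4 == 16
```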
1.4.4. Example. Let A = {a, b}. An example of an operation on A is the mapping α : A × A → A defined by

α(a, a) = a α a = a,  α(a, b) = a α b = b,
α(b, a) = b α a = b,  α(b, b) = b α b = a.

It is convenient to utilize the following operation table to define α:

    α | a b
    a | a b
    b | b a          (1.4.5)

If, in general, α is an operation on an arbitrary finite set A, or sometimes even on a countably infinite set A, then we can construct an operation table as follows: label the rows by the elements x of A and the columns by the elements y of A, and enter x α y in row x, column y.
If A = {a, b}, as at the beginning of this example, then in addition to α given in (1.4.5), we can define, for example, the operations β, γ, and δ on A as

    β | a b      γ | a b      δ | a b
    a | a a      a | a b      a | a a
    b | b a      b | a b      b | b b
We now consider operations with important special properties.


1.4.6. Definition. An operation α on X is said to be commutative if x α y = y α x for all x, y ∈ X.

1.4.7. Definition. An operation α on X is said to be associative if (x α y) α z = x α (y α z) for all x, y, z ∈ X.

In the case of the real numbers R, the operations of addition and multiplication are both associative and commutative. The operation of subtraction is neither associative nor commutative.
1.4.8. Definition. If α and β are operations on X (not necessarily distinct), then

(i) α is said to be left distributive over β if

x α (y β z) = (x α y) β (x α z)

for every x, y, z ∈ X;
(ii) α is said to be right distributive over β if

(x β y) α z = (x α z) β (y α z)

for every x, y, z ∈ X; and
(iii) α is said to be distributive over β if α is both left and right distributive over β.

In Example 1.4.4, α is the only commutative operation. The operation β of Example 1.4.4 is not associative. The operations α, γ, and δ of this example are associative. In this example, γ is distributive over δ and δ is distributive over γ.
In the case of the real numbers R, multiplication, "·", is distributive over addition, "+". The converse is not true.

1.4.9. Definition. If α is an operation on X, and if X₁ is a subset of X, then X₁ is said to be closed relative to α if for every x, y ∈ X₁, x α y ∈ X₁.

Clearly, every set is closed with respect to an operation on it. The set of all integers Z, which is a subset of the real numbers R, is closed with respect to the operations of addition and multiplication defined on R. The even integers are also closed with respect to both of these operations, whereas the odd integers are not a closed set relative to addition.
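Closure of a subset can be disproved by exhibiting a single pair that escapes it. A sketch over a finite window of integers already shows the asymmetry between the even and odd integers under addition (the window size is an arbitrary choice; the general facts follow from (2a) + (2b) = 2(a + b) and (2a + 1) + (2b + 1) = 2(a + b + 1)):

```python
evens = range(-20, 21, 2)
odds  = range(-19, 20, 2)

# Sums of even integers stay even: no counterexample exists.
assert all((x + y) % 2 == 0 for x in evens for y in evens)
# The odd integers are not closed under addition: e.g. 1 + 1 = 2 is even.
assert any((x + y) % 2 == 0 for x in odds for y in odds)
# Both subsets are closed under multiplication.
assert all((x * y) % 2 == 0 for x in evens for y in evens)
assert all((x * y) % 2 == 1 for x in odds for y in odds)
```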


1.4.10. Definition. If a subset X₁ of X is closed relative to an operation α on X, then the operation α′ on X₁ defined by

α′(x, y) = x α′ y = x α y

for all x, y ∈ X₁ is called the operation on X₁ induced by α.

If X₁ = X, then α′ = α. If X₁ ⊂ X but X₁ ≠ X, then α′ ≠ α, since α′ and α are operations on different sets, namely X₁ and X, respectively. In general, an induced operation α′ differs from its predecessor α; however, it does inherit the essential properties which α possesses, as shown in the following result.

1.4.11. Theorem. Let α be an operation on X, let X₁ ⊂ X, where X₁ is closed relative to α, and let α′ be the operation on X₁ induced by α. Then

(i) if α is commutative, then α′ is commutative;
(ii) if α is associative, then α′ is associative; and
(iii) if β is an operation on X and X₁ is closed relative to β, and if α is left (right) distributive over β, then α′ is left (right) distributive over β′, where β′ is the operation on X₁ induced by β.

1.4.12. Exercise. Prove Theorem 1.4.11.

The operation α′ on a subset X₁ induced by an operation α on X will frequently be denoted by α, and we will refer to α as an operation on X₁. In such cases one must keep in mind that we are actually referring to the induced operation α′ and not to α.
1.4.13. Definition. Let X₁ be a subset of X. An operation ᾱ on X is called an extension of an operation α on X₁ if X₁ is closed relative to ᾱ and if α is equal to the operation on X₁ induced by ᾱ.

A given operation α on a subset X₁ of a set X may, in general, have many different extensions.

1.4.14. Example. Let X₁ = {a, b, c}, and let X = {a, b, c, d, e}. Define α on X₁ and ᾱ and β̄ on X as

    α | a b c      ᾱ | a b c d e      β̄ | a b c d e
    a | a c b      a | a c b d e      a | a c b d e
    b | c b a      b | c b a e d      b | c b a e d
    c | b a c      c | b a c d e      c | b a c e d
                   d | c d a e b      d | d e a b c
                   e | d c a b e      e | e d b a c

Clearly, α is an operation on X₁, and ᾱ and β̄ are operations on X. Moreover, both ᾱ and β̄ (ᾱ ≠ β̄) are extensions of α. Also, α may be viewed as being induced by ᾱ and β̄. ∎

1.5. MATHEMATICAL SYSTEMS CONSIDERED IN THIS BOOK

We will concern ourselves with several different types of mathematical


systems in the subsequent chapters. Although it is possible to give an abstract
definition of the term mathematical system, we will not do so. Instead, we
will briefly indicate which types of mathematical systems we shall consider in
this book.
1. In Chapter 2 we will begin by considering mathematical systems which are made up of an underlying set X and an operation α defined on X. We will identify such systems by writing {X; α}. We will be able to characterize a system {X; α} according to certain properties which X and α possess. Two important cases of such systems that we will consider are semigroups and groups.
In Chapter 2 we will also consider mathematical systems consisting of a basic set X and two operations, say α and β, defined on X, where a special relation exists between α and β. We will identify such systems by writing {X; α, β}. Included among the mathematical systems of this kind which we will consider are rings and fields.
In Chapter 2 we will also consider composite mathematical systems. Such systems are endowed with two underlying sets, say X and F, and possess a much more complex (algebraic) structure than semigroups, groups, rings, and fields. Composite systems which we will consider include modules, vector spaces over a field F (which are also called linear spaces), and algebras.
In Chapter 2 we will also study various types of important mappings
(e.g., homomorphisms and isomorphisms) defined on semigroups, groups,
rings, etc.
Mathematical systems of the type considered in Chapter 2 are sometimes
called algebraic systems.
2. In Chapters 3 and 4 we will study in some detail vector spaces and
special types of mappings on vector spaces, called linear transformations.
An important class of linear transformations can be represented by matrices,
which we will consider in Chapter 4. In this chapter we will also study in
some detail important vector spaces, called Euclidean spaces.
3. Most of Chapter 5 is devoted to mathematical systems consisting of a basic set X and a function ρ : X × X → R (R denotes the real numbers), where ρ possesses certain properties (namely, the properties of distance


between points or elements in X). The function ρ is called a metric (or a distance function), and the pair {X; ρ} is called a metric space.
In Chapter 5 we will also consider mathematical systems consisting of a basic set X and a family of subsets of X (called open sets) denoted by 𝒯. The pair {X; 𝒯} is called a topological space. It turns out that all metric spaces are in a certain sense topological spaces.
We will also study functions and their properties on metric (topological) spaces in Chapter 5.
4. In Chapters 6 and 7 we will consider normed linear spaces, inner product spaces, and an important class of functions (linear operators) defined on such spaces.
A normed linear space is a mathematical system consisting of a vector space X and a real-valued function, denoted by ‖·‖, which takes elements of X into R and which possesses the properties which characterize the "length" of a vector. We will denote normed spaces by {X; ‖·‖}.
An inner product space consists of a vector space X (over the field of real numbers R or over the field of complex numbers C) and a function (·, ·), which takes elements from X × X into R (or into C) and possesses certain properties which allow us to introduce, among other items, the concept of orthogonality. We will identify such mathematical systems by writing {X; (·, ·)}.

It turns out that in a certain sense all inner product spaces are normed linear spaces, that all normed linear spaces are metric spaces, and, as indicated before, that all metric spaces are topological spaces. Since normed linear spaces and inner product spaces are also vector spaces, it should be clear that, in the case of such spaces, properties of algebraic systems (called algebraic structure) and properties of topological systems (called topological structure) are combined.
A class of normed linear spaces which are very important are Banach spaces, and among the more important inner product spaces are Hilbert spaces. Such spaces will be considered in some detail in Chapter 6. Also, in Chapter 7, linear transformations defined on Banach and Hilbert spaces will be considered.
5. Applications are considered at the ends of Chapters 4, 5, and 7.

1.6. REFERENCES AND NOTES

A classic reference on set theory is the book by Hausdorff [1.5]. The many excellent references on the present topics include the elegant text by Hanneken [1.4], the standard reference by Halmos [1.3], as well as the books by Gleason [1.1] and Goldstein and Rosenbaum [1.2].

REFERENCES

[1.1] A. M. GLEASON, Fundamentals of Abstract Analysis. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1966.
[1.2] M. E. GOLDSTEIN and B. M. ROSENBAUM, "Introduction to Abstract Analysis," National Aeronautics and Space Administration, Report No. SP-203, Washington, D.C., 1969.
[1.3] P. R. HALMOS, Naive Set Theory. Princeton, N.J.: D. Van Nostrand Company, Inc., 1960.
[1.4] C. B. HANNEKEN, Introduction to Abstract Algebra. Belmont, Calif.: Dickenson Publishing Co., Inc., 1968.
[1.5] F. HAUSDORFF, Mengenlehre. New York: Dover Publications, Inc., 1944.

ALGEBRAIC STRUCTURES

The subject matter of the previous chapter is concerned with set theoretic
structure. We emphasized essential elements of set theory and introduced
related concepts such as mappings, operations, and relations.
In the present chapter we concern ourselves with algebraic structure.
The material of this chapter falls usually under the heading of abstract
algebra or modern algebra. In the next two chapters we will continue our
investigation of algebraic structure. The topics of those chapters go usually
under the heading of linear algebra.
This chapter is divided into three parts. The first section is concerned
with some basic algebraic structures, including semigroups, groups, rings,
fields, modules, vector spaces, and algebras. In the second section we study
properties of special important mappings on the above structures, including
homomorphisms, isomorphisms, endomorphisms, and automorphisms of
semigroups, groups and rings. Because of their importance in many areas
of mathematics, as well as in applications, polynomials are considered in
the third section. Some appropriate references for further reading are suggested at the end of the chapter.
The subject matter of the present chapter is widely used in pure as well as
in applied mathematics, and it has found applications in diverse areas, such
as modern physics, automata theory, systems engineering, information
theory, graph theory, and the like.


Our presentation of modern algebra is by necessity very brief. However,


mastery of the topics covered in the present chapter will provide the reader
with the foundation required to make contact with the literature in applications, and it will enable the interested reader to pursue this subject further
at a more advanced level.

2.1. SOME BASIC STRUCTURES OF ALGEBRA

We begin by developing some of the more important properties of mathematical systems {X; α}, where α is an operation on a non-void set X.
2.1.1. Definition. Let α be an operation on X. If for all x, y, z ∈ X, x α y = x α z implies that y = z, then we say that {X; α} possesses the left cancellation property. If x α y = z α y implies that x = z, then {X; α} is said to possess the right cancellation property. If {X; α} possesses both the left and right cancellation properties, then we say that the cancellation laws hold in {X; α}.
In the following exercise, some specific cases are given.
2.1.2. Exercise. Let X = {x, y} and let α, β, γ, and δ be defined as

    α | x y      β | x y      γ | x y      δ | x y
    x | x y      x | x x      x | x y      x | x x
    y | y x      y | y x      y | x y      y | y y

Show that (i) {X; β} possesses neither the right nor the left cancellation property; (ii) {X; γ} possesses the left cancellation property but not the right cancellation property; (iii) {X; δ} possesses the right cancellation property but not the left cancellation property; and (iv) {X; α} possesses both the left and the right cancellation property.
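Cancellation properties can be checked mechanically from an operation table. A sketch in Python, using four sample operations on a two-element set that realize all four combinations of the two properties:

```python
def left_cancels(op, X):
    """x op y = x op z implies y = z, for all x, y, z in X."""
    return all(op[x, y] != op[x, z]
               for x in X for y in X for z in X if y != z)

def right_cancels(op, X):
    """x op y = z op y implies x = z, for all x, y, z in X."""
    return all(op[x, y] != op[z, y]
               for x in X for y in X for z in X if x != z)

X = ('x', 'y')
table = lambda a, b, c, d: {('x', 'x'): a, ('x', 'y'): b,
                            ('y', 'x'): c, ('y', 'y'): d}
alpha = table('x', 'y', 'y', 'x')   # mod-2-style operation: cancels on both sides
beta  = table('x', 'x', 'y', 'x')   # cancels on neither side
gamma = table('x', 'y', 'x', 'y')   # x gamma y = y: left cancellation only
delta = table('x', 'x', 'y', 'y')   # x delta y = x: right cancellation only

assert not left_cancels(beta, X) and not right_cancels(beta, X)
assert left_cancels(gamma, X) and not right_cancels(gamma, X)
assert right_cancels(delta, X) and not left_cancels(delta, X)
assert left_cancels(alpha, X) and right_cancels(alpha, X)
```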
In an arbitrary mathematical system {X; α} there are sometimes special elements in X which possess important properties relative to the operation α. We have:

2.1.3. Definition. Let α be an operation on a set X and let X contain an element eᵣ such that

x α eᵣ = x

for all x ∈ X. We call eᵣ a right identity element of X relative to α, or simply a right identity of the system {X; α}. If X contains an element eₗ which satisfies the condition

eₗ α x = x

for all x ∈ X, then eₗ is called a left identity element of X relative to α, or simply a left identity of the system {X; α}.

We note that a system {X; α} may contain more than one right identity element of X (e.g., system {X; δ} of Exercise 2.1.2) or left identity element of X (e.g., system {X; γ} of Exercise 2.1.2).

2.1.4. Definition. An element e of a set X is called an identity element of X relative to an operation α on X if

e α x = x α e = x

for every x ∈ X.

2.1.5. Exercise. Let X = {0, 1} and define the operations "+" and "·" by

    + | 0 1      · | 0 1
    0 | 0 1      0 | 0 0
    1 | 1 0      1 | 0 1

Does either {X; +} or {X; ·} have an identity element?

Identity elements have the following properties.

2.1.6. Theorem. Let α be an operation on X.

(i) If {X; α} has an identity element e, then e is unique.
(ii) If {X; α} has a right identity eᵣ and a left identity eₗ, then eᵣ = eₗ.
(iii) If α is a commutative operation and if {X; α} has a right identity element eᵣ, then eᵣ is also a left identity.

Proof. To prove the first part, let e′ and e″ be identity elements of {X; α}. Then e′ α e″ = e′ and e′ α e″ = e″. Hence, e′ = e″.
To prove the second part, note that since eᵣ is a right identity, eₗ α eᵣ = eₗ. Also, since eₗ is a left identity, eₗ α eᵣ = eᵣ. Thus, eₗ = eᵣ.
To prove the last part, note that for all x ∈ X we have x = x α eᵣ = eᵣ α x. ∎

In summary, if {X; α} has an identity element, then that element is unique. Furthermore, if {X; α} has both a right identity and a left identity element, then these elements are equal, and in fact they are equal to the unique identity element. Also, if {X; α} has a right (or left) identity element and α is a commutative operation, then {X; α} has an identity element.

2.1.7. Definition. Let α be an operation on X and let e be an identity of X relative to α. If x ∈ X, then x′ ∈ X is called a right inverse of x relative to α provided that

x α x′ = e.

An element x″ ∈ X is called a left inverse of x relative to α if

x″ α x = e.

The following exercise shows that some elements may not possess any right
or left inverses. Some other elements may possess several inverses of one kind
and none of the other, and other elements may possess a number of inverses
of both kinds.
2.1.8. Exercise. Let X = {x, y, u, v} and define α as

    α | x y u v
    x | x y x y
    y | x y y x
    u | x y u v
    v | y x v u

(i) Show that {X; α} contains an identity element.
(ii) Which elements possess neither left inverses nor right inverses?
(iii) Which element has a left and a right inverse?
A. Semigroups and Groups

Of crucial importance are mathematical systems called semigroups. Such mathematical systems serve as the natural setting for many important results in algebra and are used in several diverse areas of applications (e.g., qualitative analysis of dynamical systems, automata theory, etc.).

2.1.9. Definition. Let α be an operation on X. We call {X; α} a semigroup if α is an associative operation on X.

Now let x, y, z ∈ X, and let α be an associative operation on X. Then x α (y α z) = (x α y) α z = u ∈ X. Henceforth, we will often simply write u = x α y α z. As a result of this convention we see that for x, y, u, v ∈ X,

x α y α u α v = x α (y α u) α v = (x α y) α (u α v) = x α y α (u α v) = (x α y) α u α v.    (2.1.10)

As a generalization of the above we have the so-called generalized associative law, which asserts that if x₁, x₂, …, xₙ are elements of a semigroup {X; α}, then any two products, each involving these elements in a particular order, are equal. This allows us to simply write x₁ α x₂ α … α xₙ.
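The generalized associative law can be checked by brute force for small n: enumerate every full parenthesization of a product and compare the results. A sketch, where string concatenation stands in for an arbitrary associative α:

```python
def parenthesizations(items, op):
    """All values obtainable by fully parenthesizing items, in the given order."""
    if len(items) == 1:
        return {items[0]}
    return {op(l, r)
            for k in range(1, len(items))
            for l in parenthesizations(items[:k], op)
            for r in parenthesizations(items[k:], op)}

concat = lambda a, b: a + b            # an associative operation on strings
vals = parenthesizations(('x1', 'x2', 'x3', 'x4'), concat)
# One value regardless of grouping: the generalized associative law.
assert vals == {'x1x2x3x4'}

subtract = lambda a, b: a - b          # non-associative: groupings disagree
assert len(parenthesizations((1, 2, 3, 4), subtract)) > 1
```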


In view of Theorem 2.1.6, part (i), if a semigroup has an identity element, then such an element is unique. We give a special name to such a semigroup.

2.1.11. Definition. A semigroup {X; α} is called a monoid if X contains an identity element relative to α. Henceforth, the unique identity element of a monoid {X; α} will be denoted by e.
Subsequently, we frequently single out elements of monoids which possess inverses.

2.1.12. Definition. Let {X; α} be a monoid. If x ∈ X possesses a right inverse x′ ∈ X, then x is called a right invertible element in X. If x ∈ X possesses a left inverse x″ ∈ X, then x is called a left invertible element in X. If x ∈ X is both right invertible and left invertible in X, then we say that x is an invertible element or a unit of X.

Clearly, if e ∈ X, then e is an invertible element.

2.1.13. Theorem. Let {X; α} be a monoid, and let x ∈ X. If there exists a left inverse of x, say x′, and a right inverse of x, say x″, then x′ = x″ and x′ is unique.
Proof. Since α is associative, we have (x′ α x) α x″ = x″ and x′ α (x α x″) = x′. Thus, x′ = x″. Now suppose there is another left inverse of x, say x‴. Then x‴ = x″, and therefore x‴ = x′. ∎

Theorem 2.1.13 does not, in general, hold for arbitrary mathematical systems {X; α} with identity, as is evident from the following:
2.1.14. Exercise. Let X = {u, v, x, y} and define α as

    α | u v x y
    u | v v u u
    v | u u v x
    x | u v x y
    y | x v y u

Use this operation table to demonstrate that Theorem 2.1.13 does not, in general, hold if the monoid {X; α} is replaced by a system {X; α} with identity.
By Theorem 2.1.13, any invertible element of a monoid possesses a unique
right inverse and a unique left inverse, and moreover these inverses are
equal. This gives rise to the following.


2.1.15. Definition. Let {X; α} be a monoid. If x ∈ X has a left inverse and a right inverse, x′ and x″, respectively, then this unique element x′ = x″ is called the inverse of x and is denoted by x⁻¹.

Concerning inverses we have:
2.1.16. Theorem. eL t ;X{

a} be a monoid.

(i) If x E X has an inverse, X - I , then X - I has an inverse (X - I t I = x .


(ii) If x, y E X have inverses X - I , y- I , respectively, then X a y has an
inverse, and moreover (x a y)- I = y- I 1% X - I .
(iii) The identity element e E X has an inverse e- I and e- I = e.

Proof To prove the first part, note that x a X - I = e and X - I


Thus, x is both a left and a right inverse of X - I and (X - I )- I
= .X
To prove the second part, note that
(x a y)a(y- I

and
(y- I

a X-I)

ax - I )
1%

(x

x l % ( yay- I )ax -

a y) = y- I

1%

(X - I

ax

e.

a x ) a y = e.

The third part of the theorem follows trivially from e a e =

e.

In the remainder of the present chapter we will often use the symbols "+" and "·" to denote operations in place of α, β, etc. We will call these "addition" and "multiplication." However, we strongly emphasize here that "+" and "·" will, in general, not denote addition and multiplication of real numbers but, instead, arbitrary operations. In cases where there exists an identity element relative to "+", we will denote this element by "0" and call it "zero." If there exists an identity element relative to "·", we will denote this element either by "1" or by e. Our usual notation for representing an identity relative to an arbitrary operation α will still be e. If in a system {X; +} an element x ∈ X possesses an inverse, we will denote this element by −x and we will call it "minus x." For example, if {X; +} is a monoid, then we denote the inverse of an invertible element x ∈ X by −x, and in this case we have x + (−x) = (−x) + x = 0, and also −(−x) = x. Furthermore, if x, y ∈ X are invertible elements, then the "sum" x + y is also invertible, and −(x + y) = (−y) + (−x). Note, however, that unless "+" is commutative, −(x + y) ≠ (−x) + (−y) in general. Finally, if x, y ∈ X and if y is an invertible element, then −y ∈ X. In this case we often will simply write x + (−y) = x − y.
2.1. Some Basic Structures of Algebra

2.1.17. Example. Let X = {0, 1, 2, 3}, and let the systems {X; +} and {X; ·} be defined by means of the operation tables

 + | 0  1  2  3        · | 0  1  2  3
---+------------      ---+------------
 0 | 0  1  2  3        0 | 0  0  0  0
 1 | 1  2  3  0        1 | 0  1  2  3
 2 | 2  3  0  1        2 | 0  2  0  2
 3 | 3  0  1  2        3 | 0  3  2  1

The reader should readily show that the systems {X; +} and {X; ·} are monoids. In this case the operation "+" is called "addition mod 4" and "·" is called "multiplication mod 4." ∎
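The two tables can be rebuilt from modular arithmetic and the monoid axioms checked exhaustively; a minimal sketch:

```python
# Sketch: reconstruct the operation tables of Example 2.1.17 from
# arithmetic mod 4 and verify associativity plus a two-sided identity.
X = range(4)

add = {(a, b): (a + b) % 4 for a in X for b in X}   # "addition mod 4"
mul = {(a, b): (a * b) % 4 for a in X for b in X}   # "multiplication mod 4"

def is_monoid(op, identity):
    assoc = all(op[op[(a, b)], c] == op[a, op[(b, c)]]
                for a in X for b in X for c in X)
    ident = all(op[(identity, a)] == a and op[(a, identity)] == a for a in X)
    return assoc and ident

assert is_monoid(add, 0)   # {X; +} is a monoid with identity 0
assert is_monoid(mul, 1)   # {X; .} is a monoid with identity 1
```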

The most important special type of semigroup that we will encounter in this chapter is the group.

2.1.18. Definition. A group is a monoid in which every element is invertible; i.e., a group is a semigroup, {X; α}, with identity in which every element is invertible.

The set R of real numbers with the operation of addition is an example of a group. The set of real numbers with the operation of multiplication does not form a group, since the number zero does not have an inverse relative to multiplication. However, the latter system is a monoid. If we let R# = R − {0}, then {R#; ·} is a group.
Groups possess several important properties. Some of these are summarized in the next result.

2.1.19. Theorem. Let {X; α} be a group, and let e denote the identity element of X relative to α. Let x and y be arbitrary elements in X. Then
(i) if x α x = x, then x = e;
(ii) if z ∈ X and x α y = x α z, then y = z;
(iii) if z ∈ X and x α y = z α y, then x = z;
(iv) there exists a unique w ∈ X such that

    w α x = y;    (2.1.20)

and
(v) there exists a unique z ∈ X such that

    x α z = y.    (2.1.21)

Proof. To prove the first part, let x α x = x. Then x⁻¹ α (x α x) = x⁻¹ α x, and so (x⁻¹ α x) α x = e. This implies that x = e.
To prove the second part, let x α y = x α z. Then x⁻¹ α (x α y) = x⁻¹ α (x α z), and so (x⁻¹ α x) α y = (x⁻¹ α x) α z. This implies that y = z.
The proof of part (iii) is similar to that of part (ii).
To prove part (iv), let w = y α x⁻¹. Then w α x = (y α x⁻¹) α x = y α (x⁻¹ α x) = y. To show that w is unique, suppose there is a v ∈ X such that v α x = y. Then w α x = v α x. By part (iii), w = v.
The proof of the last part of the theorem is similar to the proof of part (iv). ∎
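Parts (iv) and (v) of Theorem 2.1.19 can be illustrated in a small concrete group. The sketch below uses the nonzero residues mod 5 under multiplication (an assumed example, not taken from the text) and checks that w α x = y and x α z = y each have exactly one solution for every pair (x, y).

```python
# Sketch of Theorem 2.1.19 (iv)-(v): in a group, each of the equations
# w o x = y and x o z = y has exactly one solution.
G = [1, 2, 3, 4]                 # nonzero residues mod 5
op = lambda a, b: (a * b) % 5    # group operation

for x in G:
    for y in G:
        left = [w for w in G if op(w, x) == y]   # solutions of w o x = y
        right = [z for z in G if op(x, z) == y]  # solutions of x o z = y
        assert len(left) == 1 and len(right) == 1
```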

In part (iv) of Theorem 2.1.19 the element w is called the left solution of Eq. (2.1.20), and in part (v) of this theorem the element z is called the right solution of Eq. (2.1.21).
We can classify groups in a variety of ways. Some of these classifications are as follows. Let {X; α} be a group. If the set X possesses a finite number of elements, then we speak of a finite group. If the operation α is commutative, then we have a commutative group, also called an abelian group. If α is not commutative, then we speak of a non-commutative group or a non-abelian group. Also, by the order of a group we understand the order of the set X.
Now let {X; α} be a semigroup and let X₁ be a non-void subset of X which is closed relative to α. Then by Theorem 1.4.11, the operation α₁ on X₁ induced by the associative operation α is also associative, and thus the mathematical system {X₁; α₁} is also a semigroup. The system {X₁; α₁} is called a subsystem of {X; α}. This gives rise to the following concept.

2.1.22. Definition. Let {X; α} be a semigroup, let X₁ be a non-void subset of X which is closed relative to α, and let α₁ be the operation on X₁ induced by α. The semigroup {X₁; α₁} is called a subsemigroup of {X; α}.

In order to simplify our notation, we will henceforth use the notation {X₁; α} to denote the subsemigroup {X₁; α₁} (i.e., we will suppress the subscript of α).
The following result allows us to generate subsemigroups in a variety of ways.

2.1.23. Theorem. Let {X; α} be a semigroup and let Xᵢ ⊂ X for all i ∈ I, where I denotes some index set. Let Y = ∩_{i∈I} Xᵢ. If {Xᵢ; α} is a subsemigroup of {X; α} for every i ∈ I, and if Y is not empty, then {Y; α} is a subsemigroup of {X; α}.

Proof. Let x, y ∈ Y. Then x, y ∈ Xᵢ for all i ∈ I, and so x α y ∈ Xᵢ for every i, and hence x α y ∈ Y. This implies that {Y; α} is a subsemigroup. ∎

Now let W be any non-void subset of X, where {X; α} is a semigroup, and let

    𝒴 = {Y : W ⊂ Y ⊂ X and {Y; α} is a subsemigroup of {X; α}}.

Then 𝒴 is non-empty, since X ∈ 𝒴. Also, let

    G = ∩_{Y∈𝒴} Y.

Then W ⊂ G, and by Theorem 2.1.23, {G; α} is a subsemigroup of {X; α}. This subsemigroup is called the subsemigroup generated by W.

2.1.24. Theorem. Let {X; α} be a monoid with e its identity element, and let {X₁; α₁} be a subsemigroup of {X; α}. If e ∈ X₁, then e is an identity element of {X₁; α₁} and {X₁; α₁} is a monoid.

2.1.25. Exercise. Prove Theorem 2.1.24.

Next we define subgroup.

2.1.26. Definition. Let {X; α} be a semigroup, and let {X₁; α₁} be a subsemigroup of {X; α}. If {X₁; α₁} is a group, then {X₁; α₁} is called a subgroup of {X; α}. We denote this subgroup by {X₁; α}, and we say the set X₁ determines a subgroup of {X; α}.
We consider a specific example in the following:

2.1.27. Exercise. Let Z₆ = {0, 1, 2, 3, 4, 5} and define the operation + on Z₆ by means of the following operation table:

 + | 0  1  2  3  4  5
---+-----------------
 0 | 0  1  2  3  4  5
 1 | 1  0  4  5  2  3
 2 | 2  5  0  4  3  1
 3 | 3  4  5  0  1  2
 4 | 4  3  1  2  5  0
 5 | 5  2  3  1  0  4

(a) Show that {Z₆; +} is a group.
(b) Let K = {0, 1}. Show that {K; +} is a subgroup of {Z₆; +}.
(c) Are there any other subgroups of {Z₆; +}?
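A minimal sketch for this exercise: transcribe the table, check the group axioms by brute force, and answer part (c) by enumerating every subset containing 0 that is closed under the operation (closure suffices for subgroups in the finite case).

```python
# Sketch for Exercise 2.1.27: the operation table as a matrix, group
# axioms checked exhaustively, and all subgroups found by search.
from itertools import combinations

T = [[0, 1, 2, 3, 4, 5],
     [1, 0, 4, 5, 2, 3],
     [2, 5, 0, 4, 3, 1],
     [3, 4, 5, 0, 1, 2],
     [4, 3, 1, 2, 5, 0],
     [5, 2, 3, 1, 0, 4]]
Z6 = range(6)
op = lambda a, b: T[a][b]

# (a) Associativity, identity 0, and an inverse for each element.
assert all(op(op(a, b), c) == op(a, op(b, c)) for a in Z6 for b in Z6 for c in Z6)
assert all(op(0, a) == a and op(a, 0) == a for a in Z6)
assert all(any(op(a, b) == 0 and op(b, a) == 0 for b in Z6) for a in Z6)

# (b), (c): a finite subset containing 0 is a subgroup iff it is closed.
def is_subgroup(S):
    return all(op(a, b) in S for a in S for b in S)

subgroups = [set(c) | {0}
             for r in range(6) for c in combinations(range(1, 6), r)
             if is_subgroup(set(c) | {0})]
assert {0, 1} in subgroups      # part (b)
assert len(subgroups) == 6      # part (c): four proper nontrivial choices exist
```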
We have seen in Theorem 2.1.24 that if e ∈ X₁ ⊂ X, then it is also an identity of the subsemigroup {X₁; α}. We can state something further.

2.1.28. Theorem. Let {X; α} be a group with identity element e, and let {X₁; α} be a subgroup of {X; α}. Then e₁ is the identity element of {X₁; α} if and only if e₁ = e.

2.1.29. Exercise. Prove Theorem 2.1.28.

It should be noted that a semigroup {X; α} which has no identity element may contain a subgroup {X₁; α}, since it is possible for a subsystem to possess an identity element while the original system does not. If {X; α} is a semigroup with an identity element and if {X₁; α} is a subgroup, then the identity element of X may or may not be the identity element of X₁. However, if {X; α} is a group, then the subgroup must satisfy the conditions given in the following:

2.1.30. Theorem. Let {X; α} be a group, and let X₁ be a non-empty subset of X. Then {X₁; α} is a subgroup if and only if
(i) e ∈ X₁;
(ii) for every x ∈ X₁, x⁻¹ ∈ X₁; and
(iii) for every x, y ∈ X₁, x α y ∈ X₁.

Proof. Assume that {X₁; α} is a subgroup. Then (i) follows from Theorem 2.1.28, and (ii) and (iii) follow from the definition of a group.
Conversely, assume that hypotheses (i), (ii), and (iii) hold. Condition (iii) implies that X₁ is closed relative to α, and therefore {X₁; α} is a subsemigroup. Condition (i) along with Theorem 2.1.24 imply that {X₁; α} is a monoid, and condition (ii) implies that {X₁; α} is a group. ∎
Analogous to Theorem 2.1.23 we have:

2.1.31. Theorem. Let {X; α} be a group, and let Xᵢ ⊂ X for all i ∈ I, where I is some index set. Let Y = ∩_{i∈I} Xᵢ. If {Xᵢ; α} is a subgroup of {X; α} for every i ∈ I, then {Y; α} is a subgroup of {X; α}.

Proof. Since e ∈ Xᵢ for every i ∈ I, it follows that e ∈ Y. Therefore, Y is non-empty. Now let y ∈ Y. Then y ∈ Xᵢ for all i ∈ I, and thus y⁻¹ ∈ Xᵢ for all i, so that y⁻¹ ∈ Y. Since y ∈ X, it follows that Y ⊂ X. Also, for every x, y ∈ Y we have x, y ∈ Xᵢ for every i ∈ I, and thus x α y ∈ Xᵢ for every i and hence x α y ∈ Y. Therefore, we conclude from Theorem 2.1.30 that {Y; α} is a subgroup of {X; α}. ∎
A direct consequence of the above result is the following:

2.1.32. Corollary. Let {X; α} be a group, and let {X₁; α} and {X₂; α} be subgroups of {X; α}. Let X₃ = X₁ ∩ X₂. Then {X₃; α} is a subgroup of {X₁; α} and {X₂; α}.

2.1.33. Exercise. Prove Corollary 2.1.32.

We can define a generated subgroup in a similar manner as was done in the case of semigroups. To this end, let W be any subset of X, where {X; α} is a group, and let

    𝒴 = {Y : W ⊂ Y ⊂ X and {Y; α} is a subgroup of {X; α}}.

The set 𝒴 is clearly non-empty because X ∈ 𝒴. Now let

    G = ∩_{Y∈𝒴} Y.

Then W ⊂ G, and by Theorem 2.1.31, {G; α} is a subgroup of {X; α}. This subgroup is called the subgroup generated by W.

2.1.34. Exercise. Let W be defined as above. Show that if {W; α} is a subgroup of {X; α}, then it is the subgroup generated by W.

Let us now consider the following:

2.1.35. Example. Let Z denote the set of integers, and let "+" denote the usual operation of addition of integers. Let W = {1}. If Y is any subset of Z such that {Y; +} is a subgroup of {Z; +} and W ⊂ Y, then Y = Z. To prove this statement, let n be any positive integer. Since Y is closed with respect to +, we must have 1 + 1 = 2 ∈ Y. Similarly, we must have 1 + 1 + ··· + 1 = n ∈ Y. Also, the inverse of n is −n, and therefore all the negative integers are in Y. Also, n − n = 0 ∈ Y; i.e., Y = Z. Thus, G = ∩_{Y∈𝒴} Y = Z, and so the group {Z; +} is the subgroup generated by {1}. ∎

The above is an example of a special class of generated subgroups, the so-called cyclic groups, which we will define after our next result.

2.1.36. Theorem. Let Z denote the set of all integers, and let {X; α} be a group. Let x ∈ X and define xᵏ = x α x α ··· α x (k times), for k a positive integer. Let x⁻ᵏ = (xᵏ)⁻¹, and let x⁰ = e. Let Y = {xᵏ : k ∈ Z}. Then {Y; α} is the subgroup of {X; α} generated by {x}.

Proof. We first show that {Y; α} is a subgroup of {X; α}. Clearly, Y ⊂ X and e ∈ Y, and for every y ∈ Y we have y⁻¹ ∈ Y. Also, for every x, y ∈ Y we have x α y ∈ Y. Thus, by Theorem 2.1.30, {Y; α} is a subgroup of {X; α}. Next, we must show that {Y; α} is the subgroup generated by {x}. To do so, it suffices to show that Y ⊂ Y' for every Y' such that x ∈ Y' and such that {Y'; α} is a subgroup of {X; α}. But this is certainly true, since y ∈ Y implies y = xᵏ for some k ∈ Z. Since x ∈ Y', it follows that xᵏ ∈ Y' and therefore y ∈ Y'. ∎

The preceding result motivates the following:

2.1.37. Definition. Let {X; α} be a group. If there exists an element x ∈ X such that the subgroup generated by {x} is equal to {X; α}, then {X; α} is called the cyclic group generated by x.

By Theorem 2.1.36, we see that a cyclic group has elements of such a form that X = {..., x⁻³, x⁻², x⁻¹, e, x, x², ...}. Now suppose there is some positive integer n such that xⁿ = e. Then we see that xⁿ⁺¹ = x. Similarly, x⁻ⁿ = e, and x⁻ⁿ⁺¹ = x. Thus, X = {e, x, ..., xⁿ⁻¹}, and X is a finite set of order n. If there is no n such that xⁿ = e, then X is an infinite set.
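The construction of Theorem 2.1.36 can be sketched concretely. Assuming the group of nonzero residues mod 7 under multiplication (an illustrative choice, not from the text), the distinct powers of an element are collected until they repeat; by the remark above, in the finite case these powers exhaust the generated subgroup.

```python
# Sketch: the cyclic subgroup {x^k : k in Z} generated by an element
# of the group of nonzero residues mod 7 under multiplication.
def cyclic_subgroup(x, op, e):
    """Collect e, x, x^2, ... until the powers repeat (finite case)."""
    powers, p = [e], op(e, x)
    while p != e:
        powers.append(p)
        p = op(p, x)
    return powers

op = lambda a, b: (a * b) % 7
assert cyclic_subgroup(3, op, 1) == [1, 3, 2, 6, 4, 5]  # 3 generates the whole group
assert cyclic_subgroup(2, op, 1) == [1, 2, 4]           # 2 generates a subgroup of order 3
```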
We consider next another important class of groups, the so-called permutation groups. To this end, let X be a non-empty set and let M(X) denote the set of all mappings of X into itself. Now, if α, β ∈ M(X), then it follows from (1.2.15) that the composite mapping β ∘ α also belongs to M(X), and we can define an operation on M(X) (i.e., a mapping from M(X) × M(X) into M(X)) by associating with each ordered pair (β, α) the element β ∘ α. We denote this operation by "·" and write

    β · α = β ∘ α,    α, β ∈ M(X).    (2.1.38)

We call this operation "multiplication," we refer to β · α as the product of β and α, and we note that (β ∘ α)(x) = (β · α)(x) for all x ∈ X. We also note that "·" is associative, for if α, β, γ ∈ M(X), then

    (α · β) · γ = (α ∘ β) ∘ γ = α ∘ (β ∘ γ) = α · (β · γ).

Thus, the system {M(X); ·} is a semigroup, which we call the semigroup of transformations on X.
Next, let us recall that a permutation on X is a one-to-one mapping of X onto X. Clearly, any permutation on X belongs to M(X). In particular, the identity permutation e: X → X, defined by

    e(x) = x for all x ∈ X,

belongs to M(X). We thus can readily prove the following:

2.1.39. Theorem. {M(X); ·} is a monoid whose identity element is the identity permutation of M(X).

Proof. Let α ∈ M(X). Then (e · α)(x) = (e ∘ α)(x) = e(α(x)) = α(x) for every x ∈ X, and so e · α = α. Similarly, (α · e)(x) = α(x) for all x ∈ X, and so α · e = α. ∎
Next, we prove:

2.1.40. Theorem. Let {M(X); ·} be the semigroup of transformations on the set X. An element α ∈ M(X) has an inverse in M(X) if and only if α is a permutation on X. Moreover, the inverse of a unit α is the inverse mapping α⁻¹ determined by the permutation α.

Proof. Suppose that α ∈ M(X) is a permutation on X. Then it follows from Theorem 1.2.10, part (ii), that α⁻¹ is a permutation on X and hence α⁻¹ ∈ M(X). Since α ∘ α⁻¹ = α⁻¹ ∘ α = e, it follows that α · α⁻¹ = α⁻¹ · α = e, and thus α has an inverse.
Next, suppose that α has an inverse in M(X) and let α' denote that inverse relative to "·". Then α' ∈ M(X) and α · α' = α' · α = e. To show that α is a permutation on X we must show that α is a one-to-one mapping of X onto X. To prove that α is onto, we must show that for any x ∈ X there exists a y ∈ X such that α(y) = x. Since α' ∈ M(X) it follows that α'(x) ∈ X for every x ∈ X and α ∘ α'(x) = e(x) = x. Letting y = α'(x), it follows that α is onto. To show that α is one-to-one, we assume that α(x) = α(y). Then α'(α(x)) = α'(α(y)), and since α' ∘ α = e, we have

    x = e(x) = α' ∘ α(x) = α' ∘ α(y) = e(y) = y.

Therefore, α is one-to-one. Hence, if α ∈ M(X) has an inverse, α⁻¹, it is a permutation on X. ∎

Henceforth, we employ the following notation: the set of all permutations on a given set X is denoted by P(X). As pointed out in Chapter 1, if a set X has n elements, then there are n! distinct permutations on X.
The reader is now in a position to prove the following result.

2.1.41. Theorem. {P(X); ·} is a subgroup of {M(X); ·}.

2.1.42. Exercise. Prove Theorem 2.1.41.

The preceding result gives rise to a very important class of groups.

2.1.43. Definition. Any subgroup of the group {P(X); ·} is called a permutation group or a transformation group on X, and {P(X); ·} is called the permutation group or the transformation group on X.

Occasionally, we speak of a permutation group on X, say {Y; ·}, without making reference to the set X. In such cases it is assumed that {Y; ·} is a subgroup of the permutation group P(X) for some set X.

2.1.44. Example. Let X = {x, y, z}. Then P(X) consists of 3! = 6 permutations, namely,

    α₁: x ↦ x, y ↦ y, z ↦ z;    α₂: x ↦ x, y ↦ z, z ↦ y;
    α₃: x ↦ y, y ↦ x, z ↦ z;    α₄: x ↦ y, y ↦ z, z ↦ x;
    α₅: x ↦ z, y ↦ x, z ↦ y;    α₆: x ↦ z, y ↦ y, z ↦ x.

We can readily verify that α₁ = e. If X₁ = {e, α₂}, then {X₁; ·} is a subgroup of P(X) and hence a permutation group on X. Let X₂ = {e, α₄, α₅}. Then {X₂; ·} is also a permutation group on X. Note that {X₁; ·} is of order 2 and {X₂; ·} is of order 3. ∎
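Permutations of a three-element set compose mechanically. A sketch, representing each permutation as a dictionary and taking "·" to be composition as in (2.1.38); a4 and a5 below stand for the two 3-cycles of X, which together with e form a subgroup of order 3.

```python
# Sketch for Example 2.1.44: the 3! = 6 permutations of X = {x, y, z},
# with "multiplication" taken to be composition of mappings.
from itertools import permutations

X = ("x", "y", "z")
P = [dict(zip(X, img)) for img in permutations(X)]   # all 6 permutations
compose = lambda b, a: {t: b[a[t]] for t in X}       # (b . a)(t) = b(a(t))

e = {"x": "x", "y": "y", "z": "z"}
assert e in P and len(P) == 6

# The two 3-cycles together with e form a subgroup of order 3.
a4 = {"x": "y", "y": "z", "z": "x"}
a5 = {"x": "z", "y": "x", "z": "y"}
X2 = [e, a4, a5]
assert all(compose(p, q) in X2 for p in X2 for q in X2)   # closure
assert compose(a4, a5) == e and compose(a4, a4) == a5     # inverses stay inside X2
```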

B. Rings and Fields

Thus far we have concerned ourselves with mathematical systems consisting of a set and an operation on the set. Presently we consider mathematical systems consisting of a basic set X with two operations α and β defined on the set, denoted by {X; α, β}. Associated with such systems there are two mathematical systems (called subsystems), {X; α} and {X; β}. By insisting that the systems {X; α} and {X; β} possess certain properties and that one of the operations be distributive over the other, we introduce the important mathematical systems known as rings. We then concern ourselves with special types of important rings called integral domains, division rings, and fields.
2.1.45. Definition. Let X be a non-empty set, and let α and β be operations on X. The set X together with the operations α and β on X, denoted by {X; α, β}, is called a ring if
(i) {X; α} is an abelian group;
(ii) {X; β} is a semigroup; and
(iii) β is distributive over α.

We refer to {X; α} as the group component of the ring, to {X; β} as the semigroup component of the ring, to α as the group operation of the ring, and to β as the semigroup operation of the ring. For convenience we often denote a ring {X; α, β} by X and simply refer to "ring X". For obvious reasons, we often use the symbols "+" and "·" ("addition" and "multiplication") in place of α and β, respectively. Thus, if X is a ring we may write {X; +, ·} and assume that {X; +} is the group component of X and {X; ·} is the semigroup component of X. We call {X; +} the additive group of ring X, {X; ·} the multiplicative semigroup of ring X, x + y the sum of x and y, and x · y the product of x and y.
We use 0 ("zero") to denote the identity element of {X; +}. If {X; ·} has an identity element, we denote that identity by e.
The inverse of an element x relative to "+" is denoted by −x. If x has an inverse relative to "·", we denote it by x⁻¹. Furthermore, we denote x + (−y) by x − y (the "difference of x and y") and (−x) · y by −x · y. Note that the elements 0, e, −x, and x⁻¹ are unique.
Subsequently, we adopt the convention that when the operations "+" and "·" appear mixed without parentheses to clarify the order of operation, the operation should be taken with respect to "·" first and then with respect to "+". For example,

    x · y + z = (x · y) + z

and not x · (y + z); the latter would have to be written with parentheses. Thus, we have

    x · (y + z) = (x · y) + (x · z) = x · y + x · z.

In general, the semigroup {X; ·} does not contain an identity. However, if it does, we have:

2.1.46. Definition. Let {X; +, ·} be a ring. If the semigroup {X; ·} has an identity element, we say that X is a ring with identity.

There should be no ambiguity concerning the above statement. The group {X; +} always has an identity, so if we say "ring with identity," we must refer to {X; ·}.
We note that it is always true that the operation "+" is commutative for a given ring. If in addition the operation "·" is also commutative, we have:

2.1.47. Definition. Let {X; +, ·} be a ring. If the operation "·" is commutative on the set X, then the ring X is called a commutative ring.
For rings we also have:

2.1.48. Definition. Let {X; +, ·} be a ring with identity. An element x ∈ X is called a unit of X if x has an inverse as an element of the semigroup {X; ·}. We denote this inverse of x by x⁻¹.

The reader can readily verify that the following examples are rings.

2.1.49. Exercise. Letting "+" and "·" denote the usual operations of addition and multiplication, show that {X; +, ·} is a commutative ring with identity if
(i) X is the set of integers;
(ii) X is the set of rational numbers; and
(iii) X is the set of real numbers.

2.1.50. Exercise. Let X = {0, 1} and define "+" and "·" by the following operation tables:

 + | 0  1        · | 0  1
---+------      ---+------
 0 | 0  1        0 | 0  0
 1 | 1  0        1 | 0  1

Show that {X; +, ·} is a commutative ring with identity.
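The two tables above are addition and multiplication mod 2, so the ring axioms can be verified exhaustively; a minimal sketch:

```python
# Sketch for Exercise 2.1.50: check the ring axioms over X = {0, 1},
# reading the tables as arithmetic mod 2.
X = (0, 1)
add = lambda a, b: (a + b) % 2
mul = lambda a, b: (a * b) % 2

triples = [(a, b, c) for a in X for b in X for c in X]
assert all(add(add(a, b), c) == add(a, add(b, c)) for a, b, c in triples)
assert all(mul(mul(a, b), c) == mul(a, mul(b, c)) for a, b, c in triples)
assert all(add(a, b) == add(b, a) and mul(a, b) == mul(b, a) for a in X for b in X)
assert all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c)) for a, b, c in triples)
assert all(add(0, a) == a and mul(1, a) == a for a in X)   # 0 and e = 1
assert all(add(a, a) == 0 for a in X)                      # each x is its own -x
```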

2.1.51. Exercise. Let {X; α} be an abelian group with identity element e. Define the operation β on X by x β y = e for every x, y ∈ X. Show that {X; α, β} is a ring.

For rings we have:

2.1.52. Theorem. If {X; +, ·} is a ring, then for every x, y ∈ X we have
(i) x + 0 = 0 + x = x;
(ii) −(x + y) = (−x) + (−y) = (−x) − y = −x − y;
(iii) if x + y = 0, then x = −y;
(iv) −(−x) = x;
(v) x · 0 = 0 · x = 0;
(vi) (−x) · y = −(x · y) = x · (−y); and
(vii) (−x) · (−y) = x · y.

Proof. Parts (i)-(iv) follow from the fact that {X; +} is an abelian group and from our notation convention.
To prove part (v), we note that since z + 0 = z for every z ∈ X, we have for every x ∈ X, 0 · x + 0 = 0 · x = (0 + 0) · x = 0 · x + 0 · x, and thus 0 = 0 · x. Also, x · 0 + 0 = x · 0 = x · (0 + 0) = x · 0 + x · 0, so that 0 = x · 0. Hence, 0 = x · 0 = 0 · x for every x ∈ X.
To prove part (vi), note that 0 · y = 0 for every y ∈ X, and since x + (−x) = 0 we have 0 = 0 · y = [x + (−x)] · y = x · y + (−x) · y. This implies that −(x · y) = (−x) · y, since −(x · y) is the additive inverse of x · y. Similarly, 0 = x · 0 = x · [y + (−y)] = x · y + x · (−y). This implies that x · (−y) = −(x · y). Thus, (−x) · y = −(x · y) = x · (−y).
Finally, to prove part (vii), we note that since −(−z) = z for every z ∈ X and since part (vi) holds for any x ∈ X, we obtain, replacing x by −x,

    (−x) · (−y) = −[(−x) · y] = −[−(x · y)] = x · y. ∎
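Parts (v)-(vii) of the theorem can be spot-checked in a small concrete ring. The sketch below uses the residues mod 4 (an illustrative choice, not from the text), where the additive inverse of x is (−x) mod 4.

```python
# Sketch: sign rules (v)-(vii) of Theorem 2.1.52 verified exhaustively
# in the ring of residues mod 4.
X = range(4)
neg = lambda a: (-a) % 4          # additive inverse -x
mul = lambda a, b: (a * b) % 4    # ring multiplication

for x in X:
    assert mul(x, 0) == 0 and mul(0, x) == 0                       # (v)
    for y in X:
        assert mul(neg(x), y) == neg(mul(x, y)) == mul(x, neg(y))  # (vi)
        assert mul(neg(x), neg(y)) == mul(x, y)                    # (vii)
```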

Now let {X; +, ·} denote a ring for which the two operations are equal, i.e., "+" = "·". Then x + y = x · y for all x, y ∈ X. In particular, if y = 0, then x + 0 = x · 0 = 0 for all x ∈ X, and we conclude that 0 is the only element of the set X. This gives rise to:

2.1.53. Definition. A ring {X; +, ·} is called a trivial ring if X = {0}.

We next introduce:

2.1.54. Definition. Let {X; +, ·} be a ring. If there exist non-zero elements x, y ∈ X (not necessarily distinct) such that x · y = 0, then x and y are both called divisors of zero.

We have:

2.1.55. Theorem. Let {X; +, ·} be a ring, and let X# = X − {0}. Then X has no divisors of zero if and only if {X#; ·} is a subsemigroup of {X; ·}.

Proof. Assume that X has no divisors of zero. Then x, y ∈ X# implies x · y ≠ 0, so x · y ∈ X# and X# is a subsemigroup.
Conversely, if x, y ∈ X# implies x · y ∈ X#, then x · y ≠ 0 if x ≠ 0 and y ≠ 0. ∎

We now consider special types of rings called integral domains.

2.1.56. Definition. A ring {X; +, ·} is called an integral domain if it has no divisors of zero.

Our next result enables us to characterize integral domains in another, equivalent fashion.

2.1.57. Theorem. A ring X is an integral domain if and only if for every x ≠ 0, the following three statements are equivalent for every y, z ∈ X:
(i) y = z;
(ii) x · y = x · z; and
(iii) y · x = z · x.

Proof. Assume that X is an integral domain. Clearly (i) implies (ii) and (iii). To show that (ii) implies (i), let x · y = x · z. Then x · (y − z) = 0. Since x ≠ 0 and X has no zero divisors, y − z = 0, or y = z. Thus, (ii) implies (i). Similarly, it follows that (iii) implies (i). This proves that (i), (ii), and (iii) are equivalent.
Conversely, assume that x ≠ 0 and that (i), (ii), and (iii) are equivalent. Let x · y = 0. Then x · 0 = x · y, and it follows that y must be zero since (ii) implies (i). Thus, x · y ≠ 0 for y ≠ 0, and X has no zero divisors. ∎
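The cancellation property of Theorem 2.1.57 is what separates integral domains from rings with zero divisors. A sketch contrasting the residues mod 5 with those mod 6 (illustrative choices, not from the text):

```python
# Sketch: cancellation x.y = x.z => y = z (x nonzero) holds in Z5 but
# fails in Z6, where 2 . 3 = 0 supplies divisors of zero.
def cancellation_holds(n):
    """True iff x.y = x.z forces y = z for every nonzero x in Zn."""
    return all(y == z
               for x in range(1, n) for y in range(n) for z in range(n)
               if (x * y) % n == (x * z) % n)

assert cancellation_holds(5)        # Z5 is an integral domain
assert not cancellation_holds(6)    # Z6: 2 . 3 = 2 . 0 = 0, yet 3 != 0
assert (2 * 3) % 6 == 0
```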

We now introduce divisors of elements.

2.1.58. Definition. Let {X; +, ·} be a commutative integral domain with identity, and let x, y ∈ X. We say y is a divisor of x if there exists an element z ∈ X such that x = y · z. If y is a divisor of x, we write y | x.

If y | x, it is customary to say that y divides x.

2.1.59. Theorem. Let {X; +, ·} be a commutative integral domain with identity, and let x ∈ X. Then x is a unit of X if and only if x | e.

Proof. Let x | e. Then there is a z ∈ X such that e = x · z. Thus, z is an inverse of x; i.e., z = x⁻¹.
Conversely, let x be a unit of X. Then there exists x⁻¹ such that e = x · x⁻¹, and thus x | e. ∎
We notice that if in an integral domain x · y = 0, then either x = 0 or y = 0.
Now a divisor of zero cannot have an inverse. To show this, we let x and y be divisors of zero, i.e., x · y = 0. Suppose that y has an inverse. Then x · y · y⁻¹ = 0 · y⁻¹, or x = 0, which contradicts the fact that x and y are zero divisors. However, the fact that an element is not a zero divisor does not imply that it has an inverse. If all of the elements except zero have an inverse, we have yet another special type of ring.

2.1.60. Definition. Let {X; +, ·} be a non-trivial ring, and let X# = X − {0}. The ring X is called a division ring if {X#; ·} is a subgroup of {X; ·}.
In the case of division rings we have:

2.1.61. Theorem. Let {X; +, ·} be a division ring. Then X is a ring with identity.

Proof. Let X# = X − {0}. Then {X#; ·} has an identity element e. Let x ∈ X. If x ∈ X#, then e · x = x · e = x. If x ∉ X#, then x = 0 and 0 · e = e · 0 = 0. Therefore, e is an identity element of X. ∎

Of utmost importance is the following special type of ring.

2.1.62. Definition. Let {X; +, ·} be a division ring. Then X is called a field if the operation "·" is commutative.

Because of the prominence of fields in mathematics as well as in applications, and because we will have occasion to make repeated use of fields, it may be worthwhile to restate the above definition by listing all the properties of fields.

2.1.63. Definition. Let X be a set containing more than one element, and let there be two operations "+" and "·" defined on X. Then {X; +, ·} is a field provided that:
(i) x + (y + z) = (x + y) + z and x · (y · z) = (x · y) · z for all x, y, z ∈ X (i.e., "+" and "·" are associative operations);
(ii) x + y = y + x and x · y = y · x for all x, y ∈ X (i.e., "+" and "·" are commutative operations);
(iii) there exists an element 0 ∈ X such that 0 + x = x for all x ∈ X;
(iv) for every x ∈ X there exists an element −x ∈ X such that x + (−x) = 0;
(v) x · (y + z) = x · y + x · z for all x, y, z ∈ X (i.e., "·" is distributive over "+");
(vi) there exists an element e ≠ 0 such that e · x = x for all x ∈ X; and
(vii) for any x ≠ 0, there exists an x⁻¹ ∈ X such that x · x⁻¹ = e.


2.1.64. Example. Perhaps the most widely known field is the set of real numbers with the usual rules for addition and multiplication. ∎

2.1.65. Exercise. Let Z denote the set of all integers, and let "+" and "·" denote the usual operations of addition and multiplication on Z. Show that {Z; +, ·} is an integral domain, but not a division ring, and hence not a field.
The above example and exercise yield:

2.1.66. Definition. Let R denote the set of all real numbers, let Z denote the set of all integers, and let "+" and "·" denote the usual operations of addition and multiplication, respectively. We call {R; +, ·} the field of real numbers and {Z; +, ·} the ring of integers.
Another very important field is considered in the following:

2.1.67. Exercise. Let C = R × R, where R is given in Definition 2.1.66. For any x, y ∈ C, let x = (a, b) and y = (c, d), where a, b, c, d ∈ R. We define x = y if and only if a = c and b = d. Also, we define the operations "+" and "·" on C by

    x + y = (a + c, b + d)

and

    x · y = (ac − bd, ad + bc).

Show that {C; +, ·} is a field.
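A sketch for this exercise, with elements of C as pairs of floats and the operations exactly as defined above; the inverse formula used at the end is a standard fact assumed here, not given in the exercise.

```python
# Sketch for Exercise 2.1.67: pairs of reals with the stated operations.
add = lambda x, y: (x[0] + y[0], x[1] + y[1])
mul = lambda x, y: (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])

zero, e = (0.0, 0.0), (1.0, 0.0)
x, y = (3.0, 4.0), (1.0, -2.0)

assert add(x, zero) == x and mul(x, e) == x          # identities (0,0) and (1,0)
assert mul(x, y) == mul(y, x) == (11.0, -2.0)        # "." is commutative
assert mul((0.0, 1.0), (0.0, 1.0)) == (-1.0, 0.0)    # (0, 1) plays the role of i

def inv(z):
    a, b = z
    d = a * a + b * b            # nonzero whenever z != (0, 0)
    return (a / d, -b / d)       # standard inverse formula (an assumption here)

assert mul((0.0, 1.0), inv((0.0, 1.0))) == e         # z . z^{-1} = e
```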

In view of the last exercise we have:

2.1.68. Definition. The field {C; +, ·} defined in Exercise 2.1.67 is called the field of complex numbers.

2.1.69. Exercise. Let Q denote the set of rational numbers, let P denote the set of irrational numbers, and let "+" and "·" denote the usual operations of addition and multiplication on P and Q.
(a) Discuss the system {Q; +, ·}.
(b) Discuss the system {P; +, ·}.

2.1.70. Exercise. (This exercise shows that the family of 2 × 2 matrices forms a ring but not a field.) Let {R; +, ·} denote the field of real numbers. Define M to be the set characterized as follows. If u, v ∈ M, then u and v are of the form

    u = [ a  b ]      v = [ m  n ]
        [ c  d ],         [ p  q ],

where a, b, c, d and m, n, p, q ∈ R. Define the operations "+" and "·" on M by

    u + v = [ a  b ] + [ m  n ] = [ a+m  b+n ]
            [ c  d ]   [ p  q ]   [ c+p  d+q ]

and

    u · v = [ a  b ] · [ m  n ] = [ a·m + b·p   a·n + b·q ]
            [ c  d ]   [ p  q ]   [ c·m + d·p   c·n + d·q ].

(Note that in the preceding, the operations + and · defined on M are entirely different from the operations + and · for the field R.)
(a) Show that {M; +} is a monoid.
(b) Show that {M; +} is an abelian group.
(c) Show that {M; +, ·} is a ring.
(d) Show that {M; +, ·} has divisors of zero.
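For part (d), a sketch representing a 2 × 2 matrix as a tuple (a, b, c, d): two nonzero matrices whose product is the zero matrix exhibit divisors of zero, so M cannot be a field.

```python
# Sketch for Exercise 2.1.70(d): 2x2 real matrices as tuples (a, b, c, d)
# with the operations stated in the exercise.
mat_add = lambda u, v: tuple(a + b for a, b in zip(u, v))
mat_mul = lambda u, v: (u[0] * v[0] + u[1] * v[2], u[0] * v[1] + u[1] * v[3],
                        u[2] * v[0] + u[3] * v[2], u[2] * v[1] + u[3] * v[3])

zero = (0, 0, 0, 0)
u = (1, 0, 0, 0)    # nonzero matrix
v = (0, 0, 0, 1)    # nonzero matrix

assert u != zero and v != zero
assert mat_mul(u, v) == zero          # u . v = 0: u and v are divisors of zero
assert mat_mul(v, u) == zero
assert mat_add(u, v) == (1, 0, 0, 1)  # yet u + v is the multiplicative identity
```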

Next, we introduce the concept of subring.

2.1.71. Definition. Let X be a ring, and let Y be a non-void subset of X which is closed relative to both operations "+" and "·" of the ring X. The set Y, together with the (induced) operations "+" and "·", {Y; +, ·}, is called a subring of the ring X provided that {Y; +, ·} is itself a ring.

In connection with the above definition we say that the subset Y determines the subring {Y; +, ·}. We have:

2.1.72. Theorem. If X is a ring, then a non-void subset Y of X determines a subring of the ring X if and only if
(i) Y is closed with respect to both operations "+" and "·"; and
(ii) −x ∈ Y whenever x ∈ Y.

2.1.73. Exercise. Prove Theorem 2.1.72.

Using the concept of subring, we now introduce subdomains.

2.1.74. Definition. Let X be a ring, and let Y be a subring of X. If Y is an integral domain, then it is called a subdomain of X.

We also define subfield in a natural way.

2.1.75. Definition. Let X be a ring, and let Y be a subring of X. If Y is a field, then it is called a subfield of X.

Before, we characterized a trivial ring as a ring for which the set X consists only of the 0 element. In the case of subrings we have:

2.1.76. Definition. Let {X; +, ·} be a ring, and let {Y; +, ·} be a subring. Then subring Y is called a trivial subring if either
(i) Y = {0}, or
(ii) Y = X.

For subdomains we have:

2.1.77. Theorem. Let X be an integral domain, and let Y be a non-trivial subring of X. Then Y is a subdomain of X.

Proof. Let x, y ∈ Y, and let x · y = 0. Since x, y ∈ X, x and y cannot be zero divisors. Thus, Y has no zero divisors. ∎

For subfields we have:

2.1.78. Theorem. Let X be a field, and let Y be a subring of X. Then Y is a subfield of X if and only if for every x ∈ Y, x ≠ 0, we have x⁻¹ ∈ Y.

2.1.79. Exercise. Prove Theorem 2.1.78.

For the intersection of arbitrary subrings we have the following:

2.1.80. Theorem. Let X be a ring, and let Xᵢ be a subring of X for each i ∈ I, where I is some index set. Let Y = ∩_{i∈I} Xᵢ. Then {Y; +, ·} is a subring of {X; +, ·}.

Proof. Since 0 ∈ Xᵢ for all i ∈ I, it follows that 0 ∈ Y and Y is non-empty. Let x, y ∈ Y. Then x, y ∈ Xᵢ for all i ∈ I. Hence, x + y ∈ Xᵢ and x·y ∈ Xᵢ for all i ∈ I, so that Y is closed with respect to "+" and "·". Also, −x ∈ Xᵢ for every i ∈ I. Thus, by Theorem 2.1.72, Y is a subring of X. ∎

Now let {X; +, ·} be a ring and let W be any subset of X. Also, let

𝒴 = {Y : W ⊆ Y ⊆ X and Y is a subring of X}.

Then 𝒴 is non-empty because X ∈ 𝒴. Now let R = ∩_{Y∈𝒴} Y. Then W ⊆ R and, by Theorem 2.1.80, {R; +, ·} is a subring of {X; +, ·}. This subring is called the subring generated by W.

C. Modules, Vector Spaces, and Algebras


Thus far we have considered mathematical systems consisting of a set X of elements and of mappings from X × X into X called operations on X. Since a mapping may be regarded as a set and since an operation is a mapping (see Chapter 1), the various components of the mathematical systems considered up to this point may be thought of as being derived from one set X.

Chapter 2 / Algebraic Structures


Next, we concern ourselves with mathematical systems which are not restricted to possessing one single fundamental set. We have seen that a single set admits a number of basic derived sets. Clearly, the number of sets that may be derived from two sets, say X and Y, will increase considerably. For example, there are sets which may be generated by utilizing operations on X and Y, and then there are sets which may be derived from mappings of X × Y into X or into Y.
Mathematical systems which possess several fundamental sets and
operations on at least one of these sets may, at least in part, be analyzed by
making use of the development given thus far in the present section. Indeed,
one may view many such complex systems as a composite of simpler mathematical systems and refer to such systems simply as composite mathematical
systems. Important examples of such systems include vector spaces, algebras,
and modules.

2.1.81. Definition. Let {R; +, ·} be a ring with identity, e, and let {X; +} be an abelian group. Let ρ: R × X → X be any function satisfying the following four conditions for all α, β ∈ R and for all x, y ∈ X:

(i) ρ(α + β, x) = ρ(α, x) + ρ(β, x);
(ii) ρ(α, x + y) = ρ(α, x) + ρ(α, y);
(iii) ρ(α, ρ(β, x)) = ρ(α·β, x); and
(iv) ρ(e, x) = x.

Then the composite system {R, X, ρ} is called a module.

Since the function ρ is defined on R × X, the module defined above is sometimes called a left R-module. A right R-module is defined in an analogous manner. We will consider only left R-modules and simply refer to them as modules, or R-modules.

The mapping ρ: R × X → X is usually abbreviated by writing ρ(α, x) = αx, i.e., in the same manner as "multiplication of α times x." Using this notation, conditions (i) to (iv) above become

(i) (α + β)x = αx + βx;
(ii) α(x + y) = αx + αy;
(iii) α(βx) = (α·β)x; and
(iv) ex = x;

respectively. We usually refer to the module {R, X, ρ} by simply referring to X and calling it an R-module or a module over R.

To simplify notation, we used in the preceding the same operation symbol, "+", for ring R as well as for group X. However, this should cause no confusion, since it will always be clear from context which operation is used. We will follow similar practices on numerous other occasions in this book.

2.1.82. Example. Let {Z; +, ·} denote the ring of integers, and let {X; +} be any abelian group. Define ρ: Z × X → X by ρ(n, x) = x + ... + x, where the summation includes x n times. We abbreviate this as ρ(n, x) = nx and think of it as "n times x." The identity element in Z is 1, and we see that conditions (i) to (iv) in Definition 2.1.81 are satisfied. Thus, any abelian group may be viewed as a module over the ring of integers. ∎
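Example 2.1.82 can be illustrated numerically. Below is a small sketch (the names and the choice of group are ours) taking the abelian group to be the integers modulo 5 under addition and defining ρ(n, x) as the n-fold sum:

```python
# rho(n, x) = x + ... + x (n times) makes an abelian group a Z-module.
# Group here: integers mod 5 under addition; negative n uses -x.

def rho(n, x, add, zero, neg):
    """n-fold sum of x in the group, defined for any integer n."""
    if n < 0:
        return rho(-n, neg(x), add, zero, neg)
    acc = zero
    for _ in range(n):
        acc = add(acc, x)
    return acc

add = lambda x, y: (x + y) % 5
neg = lambda x: (-x) % 5

# Spot checks of conditions (i) and (ii) of Definition 2.1.81:
assert rho(3 + 4, 2, add, 0, neg) == add(rho(3, 2, add, 0, neg), rho(4, 2, add, 0, neg))
assert rho(3, add(2, 4), add, 0, neg) == add(rho(3, 2, add, 0, neg), rho(3, 4, add, 0, neg))
```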

2.1.83. Example. Let {X; +, ·} be a ring with identity, and let R be a subring of X with e ∈ R. By defining ρ: R × X → X as ρ(α, x) = α·x, it is clear that X is an R-module. In particular, if R = X, we see that any ring with identity can be made into a module over itself. ∎
For modules we have:

2.1.84. Theorem. Let X be an R-module. Then for all α ∈ R and x ∈ X we have

(i) α0 = 0;
(ii) α(−x) = −(αx);
(iii) 0x = 0; and
(iv) (−α)x = −(αx).

Proof. To prove the first part, we note that for 0 ∈ X we have 0 + 0 = 0. Thus, α(0 + 0) = α0 + α0 = α0, and so α0 = 0.

To prove the second part, note that for any x ∈ X we have x + (−x) = 0, and thus α(x + (−x)) = αx + α(−x) = α0 = 0. Therefore, α(−x) = −(αx).

To prove the third part, observe that for 0 ∈ R we have 0 + 0 = 0. Hence, (0 + 0)x = 0x + 0x = 0x, and therefore 0x = 0.

To prove the last part, note that since α + (−α) = 0 it follows that (α + (−α))x = 0x = 0. Therefore, αx + (−α)x = 0, and (−α)x = −(αx). ∎

We next introduce the important concept of vector space.

2.1.85. Definition. Let {F; +, ·} be a field, and let {X; +} be an abelian group. If X is an F-module, then X is called a vector space over F.

The notion of vector space, also called linear space, is among the most important concepts encountered in mathematics. We will devote the next two chapters and a large portion of the remainder of this book to vector spaces and to mappings on such spaces.

2.1.86. Theorem. Let {R; +, ·} be a ring, and let Rⁿ = R × ... × R; i.e., Rⁿ denotes the n-fold Cartesian product of R. We denote the element x ∈ Rⁿ by x = (x₁, ..., xₙ) and define the operation "+" on Rⁿ by

x + y = (x₁ + y₁, ..., xₙ + yₙ)

for all x, y ∈ Rⁿ. Also, we define ρ: R × Rⁿ → Rⁿ by

αx = (αx₁, ..., αxₙ)

for all α ∈ R and x ∈ Rⁿ. Then Rⁿ is an R-module.

2.1.87. Exercise. Prove Theorem 2.1.86.
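A concrete rendering of Theorem 2.1.86, with R the ring of integers and n = 3; the function names are illustrative, not from the text:

```python
# Componentwise operations on R^n (Theorem 2.1.86), here Z^3.

def vec_add(x, y):
    """x + y = (x1 + y1, ..., xn + yn)."""
    return tuple(a + b for a, b in zip(x, y))

def scal_mul(alpha, x):
    """alpha x = (alpha x1, ..., alpha xn)."""
    return tuple(alpha * a for a in x)

x, y = (1, 2, 3), (4, 5, 6)
assert vec_add(x, y) == (5, 7, 9)
# module condition (i): (alpha + beta)x = alpha x + beta x
assert scal_mul(2 + 3, x) == vec_add(scal_mul(2, x), scal_mul(3, x))
```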

We also have:

2.1.88. Theorem. Let {F; +, ·} be a field, and let Fⁿ = F × ... × F be the n-fold Cartesian product of F. Denote the element x ∈ Fⁿ by x = (ξ₁, ξ₂, ..., ξₙ), and define the operation "+" on Fⁿ componentwise, as in Theorem 2.1.86, for all x, y ∈ Fⁿ. Also, define ρ: F × Fⁿ → Fⁿ by

αx = (αξ₁, ..., αξₙ)

for all α ∈ F and x ∈ Fⁿ. Then Fⁿ is a vector space over F.

2.1.89. Exercise. Prove Theorem 2.1.88.

In view of Theorem 2.1.88 we have:

2.1.90. Definition. Let {F; +, ·} be a field. The vector space Fⁿ over F is called the vector space of n-tuples over F.

Another very important concept encountered in mathematics is that of an algebra. We have:

2.1.91. Definition. Let X be a vector space over a field F. Let a binary operation called "multiplication" and denoted by "·" be defined on X, satisfying the following axioms:

(i) x·(y + z) = x·y + x·z;
(ii) (x + y)·z = x·z + y·z; and
(iii) (αx)·(βy) = (α·β)(x·y)

for all x, y, z ∈ X and for all α, β ∈ F. Then X is called an algebra over F. If, in addition to the above axioms, the binary operation of multiplication is associative, then X is called an associative algebra. If the operation is commutative, then X is called a commutative algebra. If X has an identity element, then X is called an algebra with identity.

Note that in hypothesis (iii) the symbol "·" is used to denote two different operations. Thus, in the case of x·y the operation used is defined on X, while in the case of α·β the operation used is defined on F.

The reader is cautioned that in some texts the term algebra means what we defined to be an associative algebra.
2.1.92. Exercise. Let {M; +, ·} denote the ring of 2 × 2 matrices defined in Exercise 2.1.70, and let {R; +, ·} be the field of real numbers. For u ∈ M given by

u = [a b; c d],

where a, b, c, d ∈ R, define αu for α ∈ R by

αu = [αa αb; αc αd].

Show that M is an associative algebra over R.


In some areas of application, so-called Lie
We have:

algebras are of importance.

2.1.93. Definition. A non-associative algebra R is called a Lie


if x x = 0 for every x E R and if
x (y )z

for every x , y, Z

y . (z x)

(x y) =

algebra
(2.1.94)

R. Eq u ation (2.1.94) is called the J a cobi identity.

Let us now consider some specific cases of Lie algebras. Our first exercise shows that any associative algebra can be made into a Lie algebra.

2.1.95. Exercise. Let R be an associative algebra over F, and define the operation "∘" on R by

x ∘ y = x·y − y·x

for all x, y ∈ R (where "·" is the operation on the associative algebra R over F). Show that R with "∘" defined on it is a Lie algebra.

2.1.96. Example. In Exercise 2.1.70 we showed that the set of 2 × 2 matrices forms a ring but not a field, and in Exercise 2.1.92 we showed that this set forms an algebra over the field of real numbers. This set can be made into a Lie algebra by Exercise 2.1.95. ∎
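Exercise 2.1.95 can be spot-checked numerically for the 2 × 2 matrices of Example 2.1.96. The sketch below (our own encoding of matrices as nested tuples) verifies x ∘ x = 0 and the Jacobi identity (2.1.94) for the commutator bracket; a few instances, of course, do not constitute a proof:

```python
# Commutator bracket x ∘ y = x.y - y.x on 2x2 integer matrices.

def mat_mul(x, y):
    return tuple(tuple(sum(x[i][k] * y[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def mat_sub(x, y):
    return tuple(tuple(x[i][j] - y[i][j] for j in range(2)) for i in range(2))

def bracket(x, y):
    return mat_sub(mat_mul(x, y), mat_mul(y, x))

a, b, c = ((1, 2), (3, 4)), ((0, 1), (1, 0)), ((2, 0), (0, 5))
zero = ((0, 0), (0, 0))

assert bracket(a, a) == zero  # x ∘ x = 0
# Jacobi identity: x∘(y∘z) + y∘(z∘x) + z∘(x∘y) = 0
terms = [bracket(a, bracket(b, c)), bracket(b, bracket(c, a)), bracket(c, bracket(a, b))]
assert all(sum(t[i][j] for t in terms) == 0 for i in range(2) for j in range(2))
```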

2.1.97. Exercise. Let X denote the usual "three-dimensional space," and let i, j, k denote the elements of X depicted in Figure A.

2.1.98. Figure A. Unit vectors i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1) in three-dimensional space.

Define the operation "×" on X by

i × i = 0,   i × j = k,    i × k = −j,
j × i = −k,  j × j = 0,    j × k = i,
k × i = j,   k × j = −i,   k × k = 0;

i.e., "×" denotes the usual "cross product," also called "outer product," encountered in vector analysis. Show that X is a Lie algebra.
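The cross product of Exercise 2.1.97 can likewise be spot-checked. The sketch below encodes vectors as 3-tuples (a numerical check on a few vectors, not a proof):

```python
# Cross product on R^3 as a Lie bracket: x × x = 0 and the Jacobi identity.

def cross(x, y):
    return (x[1]*y[2] - x[2]*y[1],
            x[2]*y[0] - x[0]*y[2],
            x[0]*y[1] - x[1]*y[0])

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
assert cross(i, j) == k and cross(j, k) == i and cross(k, i) == j
assert cross(i, i) == (0, 0, 0)

u, v, w = (1, 2, 3), (4, 0, 1), (2, 5, 7)
jacobi = tuple(cross(u, cross(v, w))[m] + cross(v, cross(w, u))[m]
               + cross(w, cross(u, v))[m] for m in range(3))
assert jacobi == (0, 0, 0)
```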
Let us next consider submodules.

2.1.99. Definition. Let {R; +, ·} be a ring with identity, and let {X; +} be an abelian group, where X is an R-module. Let {Y; +} be a subgroup of {X; +}. If Y is an R-module, then Y is called an R-submodule of X.

We can characterize submodules by the following:

2.1.100. Theorem. Let X be an R-module, and let Y be a non-empty subset of X. Then Y is an R-submodule if and only if

(i) {Y; +} is a subgroup of {X; +}; and
(ii) for all α ∈ R and x ∈ Y, we have αx ∈ Y.

Proof. We give the sufficiency part of the proof and leave the necessity part as an exercise.

Let α, β ∈ R and let x ∈ Y. Then αx, βx, (α + β)x ∈ Y by hypothesis (ii). Since Y is a group, it follows that αx + βx ∈ Y, and since x ∈ X we have (α + β)x = αx + βx. Now let α ∈ R and let x, y ∈ Y. Then α(x + y) ∈ Y and, also, αx, αy ∈ Y. Thus, α(x + y) = αx + αy, since Y is a subgroup of X. Now let α, β ∈ R, and let x ∈ Y. Then βx ∈ Y, and hence α(βx) ∈ Y. We have (α·β)x ∈ Y, and so α(βx) = (α·β)x. Also, since e ∈ R, we have ex ∈ Y for all x ∈ Y and, furthermore, since x ∈ X, we have ex = x. This proves that Y is an R-module and hence an R-submodule of X. ∎

2.1.101. Exercise. Prove the necessity part of the preceding theorem.

We next introduce the notion of vector subspace, also called linear subspace.

2.1.102. Definition. Let F be a field, and let X be a vector space over F. Let Y be a subset of X. If Y is an F-submodule of X, then Y is called a vector subspace.

Let us consider some specific cases.

2.1.103. Example. Let R be a ring, let X be an R-module, and let xᵢ ∈ X for i = 1, ..., n. Then the subset of X given by {x ∈ X : x = α₁x₁ + ... + αₙxₙ, αᵢ ∈ R} is an R-submodule of X. ∎

2.1.104. Example. Let F be a field, and let Fⁿ be the vector space of n-tuples over F. Let x₁ = (1, 0, ..., 0) and x₂ = (0, 1, 0, ..., 0). Then x₁, x₂ ∈ Fⁿ. Let Y = {x ∈ Fⁿ : x = α₁x₁ + α₂x₂, α₁, α₂ ∈ F}. Then Y is a vector subspace. We see that if x ∈ Y, then x is of the form x = (α₁, α₂, 0, ..., 0). ∎

We next prove:

2.1.105. Theorem. Let X be an R-module, and let 𝒴 denote a family of R-submodules of X; i.e., Yᵢ is a submodule of X for every Yᵢ ∈ 𝒴, where i ∈ I and I is some index set. Let Y = ∩_{i∈I} Yᵢ. Then Y is an R-submodule of X.

Proof. Since Yᵢ is a subgroup of X for all Yᵢ ∈ 𝒴, it follows that Y is a subgroup of X by Theorem 2.1.31. Now let α ∈ R and let y ∈ Y. Then y ∈ Yᵢ for all Yᵢ ∈ 𝒴. Hence, αy ∈ Yᵢ for all Yᵢ ∈ 𝒴, and so αy ∈ Y. Therefore, by Theorem 2.1.100, Y is an R-submodule of X. ∎

The above result gives rise to:

2.1.106. Definition. Let X be an R-module, and let W be a subset of X. Let 𝒴 be the family of subsets of X given by

𝒴 = {Y : W ⊆ Y ⊆ X and Y is an R-submodule of X}.

Let G = ∩_{Y∈𝒴} Y. Then G is called the R-submodule of X generated by W.

Let us next prove:

2.1.107. Theorem. Let X be an R-module, let x₁, ..., xₙ ∈ X, and let Y(x₁, ..., xₙ) denote the subset of X given by

Y(x₁, ..., xₙ) = {x ∈ X : x = α₁x₁ + ... + αₙxₙ, α₁, ..., αₙ ∈ R}.

Then Y(x₁, ..., xₙ) is an R-submodule of X.

Proof. For brevity, let Y = Y(x₁, ..., xₙ). To show that Y is a subgroup of X we first note that 0 ∈ Y. Next, for x = α₁x₁ + ... + αₙxₙ ∈ Y, let y = (−α₁)x₁ + ... + (−αₙ)xₙ. Then y ∈ Y and x + y = 0, and hence y = −x. Next, let z = β₁x₁ + ... + βₙxₙ. Then x + z = (α₁ + β₁)x₁ + ... + (αₙ + βₙ)xₙ ∈ Y. Therefore, by Theorem 2.1.30, Y is a subgroup of X.

Finally, note that for any α ∈ R,

αx = α(α₁x₁ + ... + αₙxₙ) = (α·α₁)x₁ + ... + (α·αₙ)xₙ ∈ Y.

Thus, by Theorem 2.1.100, Y is an R-submodule of X. ∎
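As a concrete instance of Theorem 2.1.107, take X = Z² as a module over the integers and n = 1; the set Y(x₁) then consists of the integer multiples of x₁. The snippet below (the encoding and names are ours) spot-checks closure on a finite slice of Y(x₁):

```python
# Y(x1) = {a·x1 : a in Z} for x1 = (2, 4) in the Z-module Z^2.

x1 = (2, 4)
Y_slice = [tuple(a * c for c in x1) for a in range(-10, 11)]  # finite slice of Y(x1)

def add(u, v):
    return tuple(p + q for p, q in zip(u, v))

# sums and scalar multiples of slice elements stay in Y(x1):
assert add((2, 4), (4, 8)) == (6, 12) and (6, 12) in Y_slice
assert tuple(-1 * c for c in (2, 4)) == (-2, -4) and (-2, -4) in Y_slice
```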


We see that (Y x
if we let Y = Y ( x "

l,

.Y

, ,x ,)
belongs to the family cy of Definition 2.1.106
... , ,x ,), in which case n Y , = (Y x l , , ,x ,). This

2.1.108. Deftnition. eL t
I'

IX""X

l ,

r'.E.y

leads to:
(Y x
(Y x

+a

be an R-module, let XI'

,x,,) = { x E X : x = IXIX I + ...


, ,x ,) is called the R-module of X

X, and let
IX""X , lX I ' . . ,IX " E R}. Then
generated by x I ' . . . , "x .

... ,X "

Also of interest to us is:

2.1.109. Definition. Let X be an R-module. If there exist elements x₁, ..., xₙ ∈ X such that for every x ∈ X there exist α₁, ..., αₙ ∈ R such that x = α₁x₁ + ... + αₙxₙ, then X is said to be finitely generated and x₁, ..., xₙ are called the generators of X.

It can happen that the indexed set {α₁, ..., αₙ} in the above definition is not unique. That is to say, for x ∈ X we may have x = α₁x₁ + ... + αₙxₙ = β₁x₁ + ... + βₙxₙ, where αᵢ ≠ βᵢ for some i. However, if it turns out that the above representation of x in terms of x₁, ..., xₙ is unique, then we have:

2.1.110. Definition. Let X be an R-module which is finitely generated. Let x₁, ..., xₙ be generators of X. If for every x ∈ X the relation

x = α₁x₁ + ... + αₙxₙ = β₁x₁ + ... + βₙxₙ

implies that αᵢ = βᵢ for all i = 1, ..., n, then the set {x₁, ..., xₙ} is called a basis for X.

D. Overview

We conclude this section with the flow chart of Figure B, which attempts to put into perspective most of the algebraic systems considered thus far.

2.1.111. Figure B. Some basic structures of algebra. (The chart relates, among others, commutative rings, integral domains, modules, associative algebras, and commutative algebras.)

2.2. HOMOMORPHISMS

Thus far we have concerned ourselves with various aspects of different mathematical systems (e.g., semigroups, groups, rings, etc.). In the present section we study special types of mappings defined on such algebraic structures. We begin by first considering mappings on semigroups.
2.2.1. Definition. Let {X; α} and {Y; β} be two semigroups (not necessarily distinct). A mapping ρ of set X into set Y is called a homomorphism of the semigroup {X; α} into the semigroup {Y; β} if

ρ(x α y) = ρ(x) β ρ(y)   (2.2.2)

for every x, y ∈ X. The image of X under ρ, denoted by ρ(X), is called the homomorphic image of X. If x ∈ X, then ρ(x) is called the homomorphic image of x.

In Figure C, the significance of Eq. (2.2.2) is depicted pictorially. From this figure and from Eq. (2.2.2) it is evident why homomorphisms are said to "preserve the operations α and β."
2.2.3. Figure C. Homomorphism ρ of semigroup {X; α} into semigroup {Y; β}.

In the above definition we have used arbitrary semigroups {X; α} and {Y; β}. As mentioned in Section 2.1, it is often convenient to use the symbol "+" for operations. When using the notation {X; +} and {Y; +} to denote two different semigroups, it should of course be understood that the operation "+" associated with set X will, in general, be different from the operation "+" associated with set Y. Since it will usually be clear from context which particular operation is being used, the same symbol will be employed for both semigroups (however, on rare occasions we may wish to distinguish between different operations on different sets).

Using the notation {X; +} and {Y; +} in Definition 2.2.1, Eq. (2.2.2) now assumes the form

ρ(x + y) = ρ(x) + ρ(y)   (2.2.4)

for every x, y ∈ X. This relation looks very much like the "linearity property" which will be the central topic of a large portion of the remainder of this book, and with which the reader is no doubt familiar. However, we emphasize here that the definition of "linear" will be reserved for a later occasion, and that the term homomorphism is not to be taken as being synonymous with linear. Nevertheless, we will see that many of the subsequent results for homomorphisms will recur with appropriate counterparts throughout this book.
2.2.5. Example. Let R denote the set of real numbers, and let "+" and "·" denote the usual operations of addition and multiplication on R. Then {R; +} and {R; ·} are semigroups. Let

f(x) = eˣ

for all x ∈ R. Then f is a homomorphism from {R; +} to {R; ·}. ∎
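Example 2.2.5 is easy to check numerically, since eˣ⁺ʸ = eˣ·eʸ:

```python
# f(x) = e^x sends "+" in {R; +} to "." in {R; .}: f(x + y) = f(x) . f(y).
import math

x, y = 1.5, -0.3
assert math.isclose(math.exp(x + y), math.exp(x) * math.exp(y))
```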

2.2.6. Exercise. Let {X; +} and {X; ·} denote the semigroups defined in Example 2.1.17. Let f: X → X be defined as follows: f(0) = 1, f(1) = 3, f(2) = 1, and f(3) = 3. Show that f is a homomorphism from {X; +} into {X; ·}.
In order to simplify our notation even further, we will often use the symbol "·" in the remainder of the present chapter to denote operations for semigroups (or groups), say {X; ·}, {Y; ·}, and we will often refer to these simply as semigroup (or group) X and Y, respectively. In this case, if ρ denotes a homomorphism of X into Y, we write

ρ(x·y) = ρ(x)·ρ(y)

for all x, y ∈ X.

In Chapter 1 we classified mappings as being into, onto, one-to-one and into, and one-to-one and onto. Now if ρ is a homomorphism of a semigroup X into a semigroup Y, we can also classify homomorphisms as being into, onto, one-to-one and into, and one-to-one and onto. This classification gives rise to the following concepts.
2.2.7. Definition. Let ρ be a homomorphism of a semigroup X into a semigroup Y.

(i) If ρ is a mapping of X onto Y, we say that X and Y are homomorphic semigroups, and we refer to X as being homomorphic to Y.
(ii) If ρ is a one-to-one mapping of X into Y, then ρ is called an isomorphism of X into Y.
(iii) If ρ is a mapping which is onto and one-to-one, we say that semigroup X is isomorphic to semigroup Y.
(iv) If X = Y (i.e., ρ is a homomorphism of semigroup X into itself), then ρ is called an endomorphism.
(v) If X = Y and if ρ is an isomorphism (i.e., ρ is an isomorphism of semigroup X into itself), then ρ is called an automorphism of X.

We note that since all groups are semigroups, the concepts introduced in the above definition apply necessarily also to groups.

In connection with isomorphic semigroups (or groups) a very important observation is in order. We first note that if a semigroup (or group) X is isomorphic to a semigroup Y, then there exists a mapping ρ from X into Y which is one-to-one and onto. Thus, the inverse of ρ, ρ⁻¹, exists, and we can associate with each element of X one and only one element of Y, and vice versa. Secondly, we note that ρ is a homomorphism, i.e., ρ preserves the properties of the respective operations associated with semigroup (or group) X and semigroup (or group) Y; or, to put it another way, under ρ the (algebraic) properties of semigroups (or groups) X and Y are preserved. Hence, it should be clear that isomorphic semigroups (or groups) are essentially indistinguishable, the homomorphism (which is one-to-one and onto in this case) amounting to a mere relabeling of elements of one set by elements of a second set. We will encounter this type of phenomenon on several other occasions in this book.
We are now ready to prove several results.

2.2.8. Theorem. Let ρ be a homomorphism from a semigroup X into a semigroup Y. Then

(i) ρ(X) is a subsemigroup of Y;
(ii) if X has an identity element e, then ρ(e) is an identity element of ρ(X);
(iii) if X has an identity element e, and if x ∈ X has an inverse x⁻¹, then ρ(x) has an inverse in ρ(X) and, in fact, [ρ(x)]⁻¹ = ρ(x⁻¹);
(iv) if X₁ is a subsemigroup of X, then ρ(X₁) is a subsemigroup of ρ(X); and
(v) if Y₁ is a subsemigroup of ρ(X), then

X₁ = {x ∈ X : ρ(x) ∈ Y₁}

is a subsemigroup of X.
Proof. To prove the first part we must show that the subset ρ(X) of Y is closed relative to the operation "·" on Y. Now if x′, y′ ∈ ρ(X), then there exists at least one x ∈ X and at least one y ∈ X such that ρ(x) = x′ and ρ(y) = y′. Since ρ is a homomorphism, we have

x′·y′ = ρ(x)·ρ(y) = ρ(x·y),

and since x·y ∈ X it follows that x′·y′ ∈ ρ(X) because ρ(x·y) ∈ ρ(X). Thus, ρ(X) is closed and, hence, is a subsemigroup of Y.

To prove the second part, note that since e ∈ X we have ρ(e) ∈ ρ(X), and since for any x′ ∈ ρ(X) there exists x ∈ X such that ρ(x) = x′, we have

ρ(e)·x′ = ρ(e)·ρ(x) = ρ(e·x) = ρ(x) = x′.

Since this is true for every x′ ∈ ρ(X), it follows that ρ(e) is a left identity element of ρ(X). Similarly, we can show that x′·ρ(e) = x′ for every x′ ∈ ρ(X). Thus, ρ(e) is an identity element of the subsemigroup ρ(X) of Y.

To prove the third part of the theorem, note that since ρ is a homomorphism, we have

ρ(x)·ρ(x⁻¹) = ρ(x·x⁻¹) = ρ(e)

and

ρ(x⁻¹)·ρ(x) = ρ(x⁻¹·x) = ρ(e);

i.e., ρ(e) is an identity element of ρ(X). Also, since ρ(x⁻¹) ∈ ρ(X), ρ(x) has an inverse in ρ(X), and [ρ(x)]⁻¹ = ρ(x⁻¹).

The proofs of parts (iv) and (v) of this theorem are left as an exercise. ∎

2.2.9. Exercise. Complete the proof of Theorem 2.2.8.

We emphasize that although ρ(e) in the above theorem is an identity element of the subsemigroup ρ(X) of Y, it is not necessarily true that ρ(e) has to be an identity element of Y.

2.2.10. Definition. Let ρ be a homomorphism of a semigroup X into a semigroup Y. If ρ(X) has an identity element, say e′, then the subset K_ρ of X defined by

K_ρ = {x ∈ X : ρ(x) = e′}

is called the kernel of the homomorphism ρ.

It turns out that K_ρ is a semigroup; i.e., we have:

2.2.11. Theorem. K_ρ is a subsemigroup of X.

2.2.12. Exercise. Prove Theorem 2.2.11.

Now let X and Y be groups (instead of semigroups, as above), and let ρ be a homomorphism of X into Y. We have:

2.2.13. Theorem. Let ρ be a homomorphism from a group X into a group Y. Then

(i) ρ(X) is a subgroup of Y; and
(ii) if e is the identity element of X, then ρ(e) is the identity element of Y.

Proof. To prove the first part, let e denote the identity element of X. By part (i) of Theorem 2.2.8, ρ(X) is a subsemigroup of Y; by part (ii) of Theorem 2.2.8, ρ(e) is an identity element of ρ(X); and by part (iii) of the same theorem, it follows that every element of ρ(X) has an inverse. Thus, ρ(X) is a subgroup of Y.

The second part of this theorem follows from Theorem 2.1.28 and from part (ii) of Theorem 2.2.8. ∎
The following result is known as Cayley's theorem.

2.2.14. Theorem. Let {X; ·} be a group, and let {P(X); ∘} denote the permutation group on X. Then X is isomorphic to a subgroup of P(X).

Proof. For each a ∈ X, define the mapping f_a: X → X by f_a(x) = a·x for each x ∈ X. If x, y ∈ X and f_a(x) = f_a(y), then a·x = a·y, and so x = y. Hence, f_a is an injective mapping. Now let y ∈ X. Then a⁻¹·y ∈ X, and f_a(a⁻¹·y) = y. This implies that f_a is surjective. Hence, f_a is a one-to-one mapping of X onto X, which implies that f_a is a permutation on X; i.e., f_a ∈ P(X). Now define the function φ: X → P(X) by φ(a) = f_a for each a ∈ X. Let u, v ∈ X. For each x ∈ X, f_{u·v}(x) = (u·v)·x = u·(v·x) = f_u(v·x) = f_u(f_v(x)) = (f_u ∘ f_v)(x). Thus, f_{u·v} = f_u ∘ f_v for all u, v ∈ X. Since φ(u·v) = f_{u·v} and φ(u) ∘ φ(v) = f_u ∘ f_v, it follows that φ(u·v) = φ(u) ∘ φ(v), and so φ is a homomorphism. Suppose u, v ∈ X are such that φ(u) = φ(v). Then f_u = f_v, which implies that f_u(x) = f_v(x) for all x ∈ X. In particular, f_u(e) = f_v(e). Hence, u·e = v·e, so that u = v. This implies that φ is injective. It follows that φ is a one-to-one mapping of X onto φ(X). By Theorem 2.2.13, part (i), φ(X) is a subgroup of P(X). This completes the proof. ∎
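The construction in the proof of Cayley's theorem can be carried out explicitly for a small group. The sketch below (our encoding: permutations as tuples of images) embeds the group of integers mod 4 under addition into its permutation group:

```python
# Cayley embedding of Z_4: a -> f_a, with f_a(x) = a + x (mod 4).

n = 4

def f(a):
    """The permutation f_a, stored as the tuple (f_a(0), ..., f_a(n-1))."""
    return tuple((a + x) % n for x in range(n))

def compose(p, q):
    """(p ∘ q)(x) = p(q(x))."""
    return tuple(p[q[x]] for x in range(n))

# phi(a) = f_a is a homomorphism: f_{u·v} = f_u ∘ f_v ...
for u in range(n):
    for v in range(n):
        assert f((u + v) % n) == compose(f(u), f(v))

# ... and injective: distinct group elements give distinct permutations.
assert len({f(a) for a in range(n)}) == n
```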
We also have:

2.2.15. Theorem. Let ρ be a homomorphism of a semigroup X into a semigroup Y, and let ρ be an isomorphism of X with ρ(X). Then

(i) ρ⁻¹ is an isomorphism of ρ(X) with X; and
(ii) if ρ(X) contains an identity element e′, then ρ⁻¹(e′) = e is an identity element of X, and K_ρ = {e} and K_{ρ⁻¹} = {e′} (K_ρ denotes the kernel of the homomorphism ρ).

Proof. To prove the first part of the theorem, let x′, y′ ∈ ρ(X). Then there exist unique x, y ∈ X such that ρ(x) = x′ and ρ(y) = y′, and ρ⁻¹(x′) = x and ρ⁻¹(y′) = y. Since

ρ(x·y) = ρ(x)·ρ(y) = x′·y′,

we have

ρ⁻¹(x′·y′) = x·y = ρ⁻¹(x′)·ρ⁻¹(y′).

Since this is true for all x′, y′ ∈ ρ(X), it follows that ρ⁻¹ is an isomorphism of ρ(X) with X.

To prove the second part of the theorem, we first note that ρ(X) is a subsemigroup of Y by Theorem 2.2.8. It follows from Theorem 2.2.13 that e = ρ⁻¹(e′) is an identity element of X. Now let ρ(k) = e′. Since ρ(e) = e′, it follows that k = e and that K_ρ = {e}. We can similarly show that K_{ρ⁻¹} = {e′}. ∎

From the above result we can now conclude that if a semigroup X is isomorphic to a semigroup Y, then the semigroup Y is isomorphic to the semigroup X.
For endomorphisms and automorphisms we have:

2.2.16. Theorem. Let η and ψ be homomorphisms of a semigroup X into itself.

(i) If η and ψ are endomorphisms of X, then the composite mapping ψ ∘ η is likewise an endomorphism of X.
(ii) If η and ψ are automorphisms of X, then ψ ∘ η is an automorphism of X.
(iii) If η is an automorphism of X, then η⁻¹ is also an automorphism of X.

Proof. To prove the first part, note that η and ψ are both mappings of X into X, and thus ψ ∘ η is a mapping of X into X. Also, by definition, (ψ ∘ η)(x) = ψ(η(x)) for every x ∈ X. Now since η(x·y) = η(x)·η(y) and ψ(x·y) = ψ(x)·ψ(y) for every x, y ∈ X, we have

(ψ ∘ η)(x·y) = ψ(η(x·y)) = ψ(η(x)·η(y)) = ψ(η(x))·ψ(η(y)) = (ψ ∘ η)(x)·(ψ ∘ η)(y).

This implies that the mapping ψ ∘ η is an endomorphism of X.

The proofs of the second and third parts of this theorem are left as an exercise. ∎

2.2.17. Exercise. Complete the proof of the above theorem.

Let us next consider homomorphisms of rings. To this end let, henceforth, X and Y be arbitrary rings, and without loss of generality let the operations of these two rings be denoted by "+" and "·".

2.2.18. Definition. Let X and Y be two rings. A mapping ρ of set X into set Y is called a homomorphism of the ring X into the ring Y if

(i) ρ(x + y) = ρ(x) + ρ(y); and
(ii) ρ(x·y) = ρ(x)·ρ(y)

for every x, y ∈ X. The image of X in Y, denoted by ρ(X), is called the homomorphic image of X.

If a homomorphism ρ is a one-to-one mapping of a ring X into a ring Y, then ρ is called an isomorphism of X into Y. If the isomorphism ρ is an onto mapping of X into Y, then ρ is called an isomorphism of X with Y. Furthermore, if ρ is a homomorphism of X into X, then ρ is called an endomorphism of the ring X. Finally, an isomorphism of X with itself is called an automorphism of ring X.

The properties associated with homomorphisms of groups and semigroups can, of course, be utilized when discussing homomorphisms of rings.
2.2.19. Theorem. Let ρ be a homomorphism of a ring X into a ring Y.

(i) The homomorphic image ρ(X) is a subring of Y.
(ii) If X₁ is a subring of X, then ρ(X₁) is a subring of ρ(X).
(iii) Let Y₁ be a subring of ρ(X). Then the subset X₁ ⊆ X defined by

X₁ = {x ∈ X : ρ(x) ∈ Y₁}

is a subring of X.
(iv) Let Z be a ring and let ψ be a homomorphism of Y into Z. Then the composite mapping ψ ∘ ρ is a homomorphism of X into Z.

Proof. To prove the first part of the theorem, we note that the homomorphic image ρ(X) is clearly the homomorphic image of the group {X; +} and of the semigroup {X; ·}. Since this homomorphic image is a subgroup of {Y; +} and a subsemigroup of {Y; ·}, it follows from Theorem 2.1.72 that ρ(X) is a subring of Y.

The proofs of the remaining parts of this theorem are left as an exercise. ∎

2.2.20. Exercise. Prove parts (ii), (iii), and (iv) of Theorem 2.2.19.

Analogous to Definition 2.2.10, we make the following definition.

2.2.21. Definition. If ρ is a homomorphism of a ring X into a ring Y, then the subset K_ρ of X defined by

K_ρ = {x ∈ X : ρ(x) = 0}

is called the kernel of the homomorphism ρ of the ring X into Y.


We close the present section by introducing one more concept.
2.2.22. Definition. eL t { R ; ,+ .} be a ring with identity and let X and Y
be two R-modules. A mapping f: X Y is called an R-homomorphism if,
for all u, v E X and E R the relations

2.3. Application to Polynomials

(i) f(u +
(ii) f(rt.u)

1)

f(u)
rt.f(u)
=

69

f(v); and

hold.
In the next chapter we will consider in great detail a special class of
vector spaces and homomorphisms, and for this reason we will not pursue
this subject any further at this time.

2.3. APPLICATION TO POLYNOMIALS

Polynomials play an important role in many branches of mathematics as well as in science and engineering. In the present section we briefly consider applications of some of the concepts of the preceding sections to polynomials.

First, we wish to give an abstract definition for a polynomial function. Basically, we want this function to take the form

f(t) = a₀ + a₁t + ... + aₙtⁿ.

However, we are not looking for a way of defining the value of f(t) for each t, but instead we seek a definition of f in terms of the indexed set {a₀, ..., aₙ}. To this end we let the aᵢ belong to some field.

More formally, let F be a field and define a set P as follows. If a ∈ P, then a denotes an infinite sequence of elements from F in which all except a finite number are zero. Thus, if a ∈ P, then

a = {a₀, a₁, ..., aₙ, 0, 0, ...}.
That is to say, there exists some integer n ≥ 0 such that aᵢ = 0 for all i > n.

Now let b be another element of P, where

b = {b₀, b₁, ..., bₘ, 0, 0, ...}.

We say that a = b if and only if aᵢ = bᵢ for all i. We now define the operation "+" on P by

a + b = {a₀ + b₀, a₁ + b₁, ...}.

Thus, if n ≥ m, then aᵢ + bᵢ = 0 for all i > n, and P is clearly closed with respect to "+". Next, we define the operation "·" on P by

a·b = c = {c₀, c₁, ...},

where

cₖ = Σᵢ₌₀ᵏ aᵢbₖ₋ᵢ

for all k. In this case cₖ = 0 for all k > m + n, and P is also closed with respect to the operation "·". Now let us define

0 = {0, 0, ...}.

Then 0 ∈ P and {P; +} is clearly an abelian group with identity 0. Next, define

e = {1, 0, 0, ...}.

Then e ∈ P and {P; ·} is obviously a monoid with e as its identity element. We can now easily prove the following:

2.3.1. Theorem. The mathematical system {P; +, ·} is a commutative ring with identity. It is called the ring of polynomials over the field F.

2.3.2. Exercise. Prove Theorem 2.3.1.
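The ring {P; +, ·} can be modelled directly: a polynomial is a finite list of coefficients, "+" acts componentwise, and "·" is the convolution cₖ = Σᵢ aᵢbₖ₋ᵢ. The sketch below uses integer coefficients for readability (a field such as the rationals would be used for the full theory; the function names are ours):

```python
# Polynomials as coefficient lists: {a0, a1, ..., an, 0, 0, ...} -> [a0, ..., an].

def poly_add(a, b):
    """Componentwise "+" on P."""
    m = max(len(a), len(b))
    a = a + [0] * (m - len(a))
    b = b + [0] * (m - len(b))
    return [x + y for x, y in zip(a, b)]

def poly_mul(a, b):
    """Convolution product: c_k = sum_i a_i * b_{k-i}."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            c[i + k] += ai * bk
    return c

# (1 + t)(1 - t) = 1 - t^2
assert poly_mul([1, 1], [1, -1]) == [1, 0, -1]
# e = {1, 0, 0, ...} is the multiplicative identity
assert poly_mul([1], [5, 7]) == [5, 7]
# commutativity spot check (Theorem 2.3.1)
assert poly_mul([1, 2, 3], [4, 5]) == poly_mul([4, 5], [1, 2, 3])
```

Note that the product of a list of length n + 1 and one of length m + 1 has length m + n + 1, in agreement with the degree result proved later in this section.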

Let us next complete the connection between our abstract characterization of polynomials and the function f(t) we originally introduced. To this end we let

t⁰ = {1, 0, 0, ...},
t¹ = {0, 1, 0, 0, ...},
t² = {0, 0, 1, 0, ...},
t³ = {0, 0, 0, 1, 0, ...},

etc. At this point we still cannot give meaning to aᵢtⁱ, because aᵢ ∈ F and tⁱ ∈ P. However, if we make the obvious identification {aᵢ, 0, 0, ...} ∈ P, and if we denote this element simply by aᵢ ∈ P, then we have

f(t) = a₀·t⁰ + a₁·t¹ + ... + aₙ·tⁿ.

Thus, we can represent f(t) uniquely by the sequence {a₀, a₁, ..., aₙ, 0, ...}. By convention, we henceforth omit the symbol "·" and write, e.g.,

f(t) = a₀ + a₁t + ... + aₙtⁿ.

We assign t appearing in the argument of f(t) a special name.

2.3.3. Definition. Let {P; +, ·} be the polynomial ring over a field F. The element t ∈ P, t = {0, 1, 0, ...}, is called the indeterminate of P.

To simplify notation, we denote by F[t] the ring of polynomials over a field F, and we identify elements of F[t] (i.e., polynomials) by making use of the argument t, e.g., f(t) ∈ F[t].

2.3.4. Definition. Let f(t) ∈ F[t], f(t) ≠ 0, and let f(t) = {f_0, f_1, ..., f_n, ...}, where f_i ∈ F for all i. The polynomial f(t) is said to be of order n or of degree n if f_n ≠ 0 and if f_i = 0 for all i > n. In this case we write deg f(t) = n, and we call f_n the leading coefficient of f. If f_n = 1 and f_i = 0 for all i > n, then f(t) is said to be monic.

If every coefficient of a polynomial f is zero, then f is called the zero polynomial. The order of the zero polynomial is not defined.

2.3. Application to Polynomials

2.3.5. Theorem. Let f(t) be a polynomial of order n and let g(t) be a polynomial of order m. Then f(t)g(t) is a polynomial of order m + n.

Proof. Let f(t) = f_0 + f_1 t + ... + f_n t^n, let g(t) = g_0 + g_1 t + ... + g_m t^m, and let h(t) = f(t)g(t). Then

h_k = Σ_{i=0}^{k} f_i g_{k-i}.

Since f_i = 0 for i > n and g_j = 0 for j > m, the largest possible value of k such that h_k is non-zero occurs for k = m + n; i.e.,

h_{m+n} = f_n g_m.

Since F is a field, f_n and g_m cannot be zero divisors, and thus f_n g_m ≠ 0. Therefore, h_{m+n} ≠ 0, and h_k = 0 for all k > m + n. ∎

The reader can readily prove the next result.


2.3.6. Theorem. The ring F[t] of polynomials over a field F is an integral domain.

2.3.7. Exercise. Prove Theorem 2.3.6.

Our next result shows that, in general, we cannot go any further than integral domain for F[t].
2.3.8. Theorem. Let f(t) ∈ F[t]. Then f(t) has an inverse relative to "." if and only if f(t) is of order zero.

Proof. Let f(t) ∈ F[t] be of order n, and assume that f(t) has an inverse relative to ".", denoted by f^{-1}(t), which is of order m. Then

f(t)f^{-1}(t) = e,

where e = {1, 0, 0, ...} is of order zero. By Theorem 2.3.5 the degree of f(t)f^{-1}(t) is m + n. Thus, m + n = 0, and since m >= 0 and n >= 0, we must have m = n = 0.

Conversely, let f(t) = f_0 = {f_0, 0, 0, ...}, where f_0 ≠ 0. Then f^{-1}(t) = f_0^{-1} = {f_0^{-1}, 0, 0, ...}. ∎

In the case of polynomials of order zero we omit the notation t, and we say f(t) is a scalar. Thus, if c(t) is a polynomial of order zero, we have c(t) = c, where c ≠ 0. We see immediately that cf(t) = cf_0 + cf_1 t + ... + cf_n t^n for all f(t) ∈ F[t].

The following result, which we will require in Chapter 4, is sometimes called the division algorithm.


2.3.9. Theorem. Let f(t), g(t) ∈ F[t] and assume that g(t) ≠ 0. Then there exist unique elements q(t) and r(t) in F[t] such that

f(t) = q(t)g(t) + r(t),   (2.3.10)

where either r(t) = 0 or deg r(t) < deg g(t).
Proof. If f(t) = 0 or if deg f(t) < deg g(t), then Eq. (2.3.10) is satisfied with q(t) = 0 and r(t) = f(t). If deg g(t) = 0, i.e., g(t) = c, then f(t) = [c^{-1} f(t)]c, and Eq. (2.3.10) holds with q(t) = c^{-1} f(t) and r(t) = 0.

Assume now that deg f(t) >= deg g(t) >= 1. The proof is by induction on the degree of the polynomial f(t). Thus, let us assume that Eq. (2.3.10) holds for deg f(t) = n. We first prove our assertion for n = 1 and then for n + 1.

Assume that deg f(t) = 1, i.e., f(t) = a_0 + a_1 t, where a_1 ≠ 0. We need only consider the case g(t) = b_0 + b_1 t, where b_1 ≠ 0. We readily see that Eq. (2.3.10) is satisfied with q(t) = a_1 b_1^{-1} and r(t) = a_0 - a_1 b_1^{-1} b_0.

Now assume that Eq. (2.3.10) holds for deg f(t) = k, where k = 1, ..., n. We want to show that this implies the validity of Eq. (2.3.10) for deg f(t) = n + 1. Let

f(t) = a_0 + a_1 t + ... + a_{n+1} t^{n+1},

where a_{n+1} ≠ 0. Let deg g(t) = m. We may assume that 0 < m <= n + 1. Let g(t) = b_0 + b_1 t + ... + b_m t^m, where b_m ≠ 0. It is now readily verified that

f(t) = b_m^{-1} a_{n+1} t^{n+1-m} g(t) + [f(t) - b_m^{-1} a_{n+1} t^{n+1-m} g(t)].   (2.3.11)

Now let h(t) = f(t) - b_m^{-1} a_{n+1} t^{n+1-m} g(t). It can readily be verified that the coefficient of t^{n+1} in h(t) is 0. Hence, either h(t) = 0 or deg h(t) < n + 1. By our induction hypothesis, this implies there exist polynomials s(t) and r(t) such that h(t) = s(t)g(t) + r(t), where r(t) = 0 or deg r(t) < deg g(t). Substituting the expression for h(t) into Eq. (2.3.11), we have

f(t) = [b_m^{-1} a_{n+1} t^{n+1-m} + s(t)]g(t) + r(t).

Thus, Eq. (2.3.10) is satisfied and the proof of the existence of ret) and q(t)
is complete.
The proof of the uniqueness of q(t) and ret) is left as an exercise. _
2.3.12. Exercise. Prove that q(t) and r(t) in Theorem 2.3.9 are unique.
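The constructive step in the proof of Theorem 2.3.9 (repeatedly subtracting b_m^{-1} a_k t^{k-m} g(t) to cancel the leading coefficient) translates directly into code. The sketch below is ours, with rational coefficients standing in for a general field F; it returns the quotient and remainder of Definition 2.3.13.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g over the rationals; coefficient lists, lowest degree first.
    Returns (q, r) with f = q*g + r and either r = [] or deg r < deg g."""
    g = [Fraction(c) for c in g]
    while g and g[-1] == 0:          # normalize g: drop trailing zeros
        g.pop()
    if not g:
        raise ZeroDivisionError("g(t) must be non-zero")
    r = [Fraction(c) for c in f]
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 1)
    m = len(g) - 1                   # deg g
    while True:
        while r and r[-1] == 0:      # keep r trimmed so len(r) - 1 = deg r
            r.pop()
        if len(r) - 1 < m:
            break
        k = len(r) - 1
        coeff = r[-1] / g[-1]        # b_m^{-1} a_k, as in the proof
        q[k - m] += coeff
        for i, gi in enumerate(g):   # subtract coeff * t^(k-m) * g(t)
            r[k - m + i] -= coeff * gi
    return q, r
```

For example, dividing f(t) = 5 + 2t + t^3 by g(t) = 1 + t^2 yields q(t) = t and r(t) = 5 + t, matching f = qg + r with deg r < deg g.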

The preceding result motivates the following definition.


2.3.13. Definition. Let f(t) and g(t) be any non-zero polynomials. Let q(t) and r(t) be the unique polynomials such that f(t) = q(t)g(t) + r(t), where either r(t) = 0 or deg r(t) < deg g(t). We call q(t) the quotient and r(t) the remainder in the division of f(t) by g(t). If r(t) = 0, we say that g(t) divides f(t) or is a factor of f(t).


Next, we prove:

2.3.14. Theorem. Let F[t] denote the ring of polynomials over a field F. Let f(t) and g(t) be non-zero polynomials in F[t]. Then there exists a unique monic polynomial, d(t), such that (i) d(t) divides f(t) and g(t), and (ii) if d'(t) is any polynomial which divides f(t) and g(t), then d'(t) divides d(t).
Proof. Let

K[t] = {x(t) ∈ F[t]: x(t) = m(t)f(t) + n(t)g(t), where m(t), n(t) ∈ F[t]}.

We note that f(t), g(t) ∈ K[t]. Furthermore, if a(t), b(t) ∈ K[t], then a(t) + b(t) ∈ K[t] and a(t)b(t) ∈ K[t]. Also, if c is a scalar, then ca(t) ∈ K[t] for all a(t) ∈ K[t].

Now let d(t) be a polynomial of lowest degree in K[t]. Since all scalar multiples of d(t) belong to K[t], we may assume that d(t) is monic. We now show that for any h(t) ∈ K[t] there is a q(t) ∈ F[t] such that h(t) = d(t)q(t). To prove this, we know from Theorem 2.3.9 that there exist unique elements q(t) and r(t) in F[t] such that h(t) = q(t)d(t) + r(t), where either r(t) = 0 or deg r(t) < deg d(t). Since d(t) ∈ K[t] and q(t) ∈ F[t], it follows that q(t)d(t) ∈ K[t]. Also, since h(t) ∈ K[t], it follows that r(t) = h(t) - q(t)d(t) ∈ K[t]. Since d(t) is a polynomial of smallest degree in K[t], it follows that r(t) = 0. Hence, d(t) divides every polynomial in K[t].

To show that d(t) is unique, suppose d_1(t) is another monic polynomial in K[t] which divides every polynomial in K[t]. Then d(t) = a(t)d_1(t) and d_1(t) = b(t)d(t) for some a(t), b(t) ∈ F[t]. It can readily be verified that this is true only when a(t) = b(t) = 1. Now, since f(t), g(t) ∈ K[t], part (i) of the theorem has been proven.

To prove part (ii), let a(t), b(t) ∈ F[t] be such that f(t) = a(t)d'(t) and g(t) = b(t)d'(t). Since d(t) ∈ K[t], there exist polynomials m(t), n(t) such that d(t) = m(t)f(t) + n(t)g(t). Hence,

d(t) = m(t)a(t)d'(t) + n(t)b(t)d'(t) = [m(t)a(t) + n(t)b(t)]d'(t).

This implies that d'(t) divides d(t) and completes the proof of the theorem. ∎

The polynomial d(t) in the preceding theorem is called the greatest


common divisor of f(t) and g(t). If d(t) = 1, then f(t) and g(t) are said to be relatively prime.
2.3.15. Exercise. Show that if d(t) is the greatest common divisor of f(t) and g(t), then there exist polynomials m(t) and n(t) such that

d(t) = m(t)f(t) + n(t)g(t).

If f(t) and g(t) are relatively prime, then

1 = m(t)f(t) + n(t)g(t).
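The monic d(t) of Theorem 2.3.14, together with the m(t) and n(t) of Exercise 2.3.15, can be computed by the extended Euclidean algorithm, i.e., by repeated application of the division algorithm. A self-contained sketch over the rationals (helper names ours; the division routine is restated so the block stands alone):

```python
from fractions import Fraction

def pdivmod(f, g):
    # polynomial division over the rationals (coefficient lists, lowest degree first)
    g = [Fraction(c) for c in g]
    while g and g[-1] == 0:
        g.pop()
    r = [Fraction(c) for c in f]
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 1)
    while True:
        while r and r[-1] == 0:
            r.pop()
        if len(r) < len(g):
            break
        k, coeff = len(r) - 1, r[-1] / g[-1]
        q[k - len(g) + 1] += coeff
        for i, gi in enumerate(g):
            r[k - len(g) + 1 + i] -= coeff * gi
    return q, r

def padd(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]

def pmul(a, b):
    if not a or not b:
        return []
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def pgcd(f, g):
    """Monic d(t) with d = m*f + n*g, via the extended Euclidean algorithm."""
    r0, r1 = [Fraction(c) for c in f], [Fraction(c) for c in g]
    m0, m1 = [Fraction(1)], [Fraction(0)]
    n0, n1 = [Fraction(0)], [Fraction(1)]
    while any(r1):
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        m0, m1 = m1, padd(m0, pmul([Fraction(-1)], pmul(q, m1)))
        n0, n1 = n1, padd(n0, pmul([Fraction(-1)], pmul(q, n1)))
    while r0 and r0[-1] == 0:
        r0.pop()
    scale = [Fraction(1) / r0[-1]]   # make d(t) monic
    return pmul(scale, r0), pmul(scale, m0), pmul(scale, n0)
```

For f(t) = t^2 - 1 and g(t) = t - 1 this yields d(t) = t - 1; for relatively prime f(t) and g(t) it yields d(t) = 1 together with a pair m(t), n(t) realizing 1 = m(t)f(t) + n(t)g(t).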


Now let f(t) ∈ F[t] be of positive degree. If f(t) = g(t)h(t) implies that either g(t) is a scalar or h(t) is a scalar, then f(t) is said to be irreducible.
We close the present section with a statement of the fundamental theorem of algebra.
2.3.16. Theorem. Let f(t) ∈ F[t] be a non-zero polynomial. Let R denote the field of real numbers and let C denote the field of complex numbers.

(i) If F = C, then f(t) can be written uniquely, except for order, as a product

f(t) = c(t - c_1)(t - c_2) ... (t - c_n),

where c, c_1, ..., c_n ∈ C.

(ii) If F = R, then f(t) can be written uniquely, except for order, as a product

f(t) = c f_1(t) f_2(t) ... f_m(t),

where c ∈ R and the f_1(t), ..., f_m(t) are monic irreducible polynomials of degree one or two.
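As a concrete illustration of part (ii), with numbers of our choosing: t^3 - 1 factors over R as (t - 1)(t^2 + t + 1), a monic degree-one factor times a monic degree-two factor irreducible over R, while over C the quadratic splits further at the complex cube roots of unity. A convolution product verifies the factorization:

```python
def pmul(a, b):
    # convolution product of coefficient lists (lowest degree first)
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

f1 = [-1, 1]       # t - 1: monic, degree one
f2 = [1, 1, 1]     # t^2 + t + 1: monic, degree two, irreducible over R
print(pmul(f1, f2))                 # [-1, 0, 0, 1], i.e. t^3 - 1

# over C the quadratic factor has the root w = (-1 + i*sqrt(3))/2
w = complex(-0.5, 3 ** 0.5 / 2)
print(abs(w * w + w + 1) < 1e-12)   # True
```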

2.4. REFERENCES AND NOTES

There are many excellent texts on abstract algebra. For an introductory exposition of this subject refer, e.g., to Birkhoff and MacLane [2.1], Hanneken [2.2], Hu [2.3], Jacobson [2.4], and McCoy [2.6]. The books by Birkhoff and MacLane and Jacobson are standard references. The texts by Hu and McCoy are very readable. The excellent presentation by Hanneken is concise, somewhat abstract, yet very readable. Polynomials over a field are treated extensively in these references. For a brief summary of the properties of polynomials over a field, refer also to Lipschutz [2.5].

REFERENCES

[2.1] G. BIRKHOFF and S. MACLANE, A Survey of Modern Algebra. New York: The Macmillan Company, 1965.
[2.2] C. B. HANNEKEN, Introduction to Abstract Algebra. Belmont, Calif.: Dickenson Publishing Co., Inc., 1968.
[2.3] S. T. HU, Elements of Modern Algebra. San Francisco, Calif.: Holden-Day, Inc., 1965.
[2.4] N. JACOBSON, Lectures in Abstract Algebra. New York: D. Van Nostrand Company, Inc., 1951.
[2.5] S. LIPSCHUTZ, Linear Algebra. New York: McGraw-Hill Book Company, 1968.
[2.6] N. H. McCOY, Fundamentals of Abstract Algebra. Boston: Allyn & Bacon, Inc., 1972.

VECTOR SPACES AND LINEAR TRANSFORMATIONS

In Chapter 1 we considered the set-theoretic structure of mathematical


systems, and in Chapter 2 we developed to various degrees of complexity the
algebraic structure of mathematical systems. One of the mathematical systems
introduced in Chapter 2 was the linear or vector space, a concept of great
importance in mathematics and applications.
In the present chapter we further examine properties of linear spaces.
Then we consider special types of mappings defined on linear spaces, called
linear transformations, and establish several important properties of linear
transformations.
In the next chapter we will concern ourselves with finite dimensional
vector spaces, and we will consider matrices, which are used to represent
linear transformations on finite dimensional vector spaces.

3.1. LINEAR SPACES

We begin by restating the definition of linear space.


3.1.1. Definition. Let X be a non-empty set, let F be a field, let "+" denote a mapping of X × X into X, and let "." denote a mapping of F × X into X. Let the members x ∈ X be called vectors, let the elements α ∈ F be called scalars, let the operation "+" defined on X be called vector addition,

and let the mapping "." be called scalar multiplication or multiplication of vectors by scalars. Then for each x, y ∈ X there is a unique element, x + y ∈ X, called the sum of x and y, and for each x ∈ X and α ∈ F there is a unique element, α . x ≜ αx ∈ X, called the multiple of x by α. We say that the non-empty set X and the field F, along with the two mappings of vector addition and scalar multiplication, constitute a vector space or a linear space if the following axioms are satisfied:

(i) x + y = y + x for every x, y ∈ X;
(ii) x + (y + z) = (x + y) + z for every x, y, z ∈ X;
(iii) there is a unique vector in X, called the zero vector or the null vector or the origin, which is denoted by 0 and which has the property that 0 + x = x for all x ∈ X;
(iv) α(x + y) = αx + αy for all α ∈ F and for all x, y ∈ X;
(v) (α + β)x = αx + βx for all α, β ∈ F and for all x ∈ X;
(vi) (αβ)x = α(βx) for all α, β ∈ F and for all x ∈ X;
(vii) 0x = 0 for all x ∈ X; and
(viii) 1x = x for all x ∈ X.

The reader may find it instructive to review the axioms of a field, which are summarized in Definition 2.1.63. In (v) the "+" on the left-hand side denotes the operation of addition on F; the "+" on the right-hand side denotes vector addition. Also, in (vi) αβ ≜ α . β, where "." denotes the operation of multiplication on F. In (vii) the symbol 0 on the left-hand side is a scalar; the same symbol on the right-hand side denotes a vector. The 1 on the left-hand side of (viii) is the identity element of F relative to ".".

To indicate the relationship between the set of vectors X and the underlying field F, we sometimes refer to a vector space X over field F. However, usually we speak of a vector space X without making explicit reference to the field F and to the operations of vector addition and scalar multiplication. If F is the field of real numbers we call our vector space a real vector space. Similarly, if F is the field of complex numbers, we speak of a complex vector space. Throughout this chapter we will usually use lower case Latin letters (e.g., x, y, z) to denote vectors (i.e., elements of X) and lower case Greek letters (e.g., α, β, γ) to denote scalars (i.e., elements of F).

If we agree to denote the element (-1)x ∈ X simply by -x, i.e., (-1)x ≜ -x, then we have x - x = 1x + (-1)x = (1 - 1)x = 0x = 0. Thus, if X is a vector space, then for every x ∈ X there is a unique vector, denoted -x, such that x - x = 0. There are several other elementary properties of vector spaces which are a direct consequence of the above axioms. Some of these are summarized below. The reader will have no difficulties in verifying these.


3.1.2. Theorem. Let X be a vector space. If x, y, z are elements in X and if α, β are any members of F, then the following hold:

(i) if αx = αy and α ≠ 0, then x = y;
(ii) if αx = βx and x ≠ 0, then α = β;
(iii) if x + y = x + z, then y = z;
(iv) α0 = 0;
(v) α(x - y) = αx - αy;
(vi) (α - β)x = αx - βx; and
(vii) x + y = 0 implies that x = -y.

3.1.3. Exercise. Prove Theorem 3.1.2.

We now consider several important examples of vector spaces.


3.1.4. Example. Let X be the set of all "arrows" in the "plane" emanating from a reference point which we call the origin or the zero vector or the null vector, and which we denote by 0. Let F denote the set of real numbers, and let vector addition and scalar multiplication be defined in the usual way, as shown in Figure A.

3.1.5. Figure A. [Vectors x, y, x + y, and the scalar multiples αy (0 < α < 1), βy (β > 1), γy (γ < 0).]

The reader can readily verify that, for the space described above, all the axioms of a linear space are satisfied, and hence X is a vector space. ∎
The purpose of the above example is to provide an intuitive idea of a linear space. We will utilize this space occasionally for purposes of motivation in our development. We must point out, however, that the terms "plane" and "arrows" were not formally defined, and thus the space X was not really properly defined. In the examples which follow, we give a more precise formulation of vector spaces.


3.1.6. Example. Let X = R denote the set of real numbers, and let F also denote the set of real numbers. We define vector addition to be the usual addition of real numbers and multiplication of vectors x ∈ R by scalars α ∈ F to be multiplication of real numbers. It is a simple matter to show that this space is a linear space. ∎

3.1.7. Example. Let X = F^n denote the set of all ordered n-tuples of elements from field F. Thus, if x ∈ F^n, then x = (ξ_1, ξ_2, ..., ξ_n), where ξ_i ∈ F, i = 1, ..., n. With x, y ∈ F^n and α ∈ F, let vector addition and scalar multiplication be defined as

x + y = (ξ_1, ξ_2, ..., ξ_n) + (η_1, η_2, ..., η_n) = (ξ_1 + η_1, ξ_2 + η_2, ..., ξ_n + η_n)   (3.1.8)

and

αx = α(ξ_1, ξ_2, ..., ξ_n) = (αξ_1, αξ_2, ..., αξ_n).   (3.1.9)

It should be noted that the symbol "+" on the right-hand side of Eq. (3.1.8) denotes addition on the field F, and the symbol "+" on the left-hand side of Eq. (3.1.8) designates vector addition. (See Theorem 2.1.88.)

In the present case the null vector is defined as 0 = (0, 0, ..., 0), and the vector -x is defined by -x = -(ξ_1, ξ_2, ..., ξ_n) = (-ξ_1, -ξ_2, ..., -ξ_n). Utilizing the properties of the field F, all axioms of Definition 3.1.1 are readily verified, and F^n is thus a vector space. We call this space the space F^n of n-tuples of elements of F. ∎
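The componentwise definitions (3.1.8) and (3.1.9) can be exercised numerically. A sketch for F = R (function names ours), with spot checks of a few axioms of Definition 3.1.1:

```python
def vadd(x, y):
    """Vector addition per Eq. (3.1.8): componentwise sum of n-tuples."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def smul(alpha, x):
    """Scalar multiplication per Eq. (3.1.9): multiply each component by alpha."""
    return tuple(alpha * xi for xi in x)

x, y = (1, 2, 3), (4, 5, 6)
zero = (0, 0, 0)
assert vadd(x, y) == vadd(y, x)                             # axiom (i)
assert vadd(zero, x) == x                                   # axiom (iii)
assert smul(2, vadd(x, y)) == vadd(smul(2, x), smul(2, y))  # axiom (iv)
assert smul(0, x) == zero                                   # axiom (vii)
```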

"+

3.1.10. Example. In Example 3.1.7 let F = R, the field of real numbers. Then X = R^n denotes the set of all n-tuples of real numbers. We call the vector space R^n the n-dimensional real coordinate space. Similarly, in Example 3.1.7 let F = C, the field of complex numbers. Then X = C^n designates the set of all n-tuples of complex numbers. The linear space C^n is called the n-dimensional complex coordinate space. ∎
In the previous example we used the term dimension. At a later point in
the present chapter the concept of dimension will be defined precisely and
some of its properties will be examined in detail.

3.1.11. Example. Let X denote the set of all infinite sequences of real numbers of the form

x = (ξ_1, ξ_2, ξ_3, ...),   (3.1.12)

let F denote the field of real numbers, let vector addition be defined similarly as in Eq. (3.1.8), and let scalar multiplication be defined similarly as in Eq. (3.1.9). It is again an easy matter to show that this space is a vector space. We point out that this space, which we denote by R^∞, is simply the collection of all infinite sequences; i.e., there is no requirement that any type of convergence of the sequence be implied. ∎
3.1.13. Example. Let X = C^∞ denote the set of all infinite sequences of complex numbers of the form (3.1.12), let F represent the field of complex numbers, let vector addition be defined similarly as in Eq. (3.1.8), and let scalar multiplication be defined similarly as in Eq. (3.1.9). Then C^∞ is a vector space. ∎
3.1.14. Example. Let X denote the set of all sequences of real numbers having only a finite number of non-zero terms. Thus, if x ∈ X, then

x = (ξ_1, ξ_2, ..., ξ_l, 0, 0, ...)   (3.1.15)

for some positive integer l. If we define vector addition similarly as in Eq. (3.1.8), if we define scalar multiplication similarly as in Eq. (3.1.9), and if we let F be the field of real numbers, then we can readily show that X is a real vector space. We call this space the space of finitely non-zero sequences.

If X denotes the set of all sequences of complex numbers of the form (3.1.15), if vector addition and scalar multiplication are defined similarly as in equations (3.1.8) and (3.1.9), respectively, then X is again a vector space (a complex vector space). ∎
3.1.16. Example. Let X be the set of infinite sequences of real numbers of the form (3.1.12), with the property that lim_{k→∞} ξ_k = 0. If F is the field of real numbers, if vector addition is defined similarly as in Eq. (3.1.8), and if scalar multiplication is defined similarly as in Eq. (3.1.9), then X is a vector space. This is so because the sum of two sequences which converge to zero also converges to zero, and because the scalar multiple of a sequence converging to zero also converges to zero. ∎
3.1.17. Example. Let X be the set of infinite sequences of real numbers of the form (3.1.12) which are bounded. If vector addition and scalar multiplication are again defined similarly as in (3.1.8) and (3.1.9), respectively, and if F denotes the field of real numbers, then X is a vector space. This space is called the space of bounded real sequences.

There also exists, of course, a complex counterpart to this space, the space of bounded complex sequences. ∎
3.1.18. Example. Let X denote the set of infinite sequences of real numbers of the form (3.1.12), with the property that Σ_{i=1}^{∞} |ξ_i| < ∞. Let F be the field of real numbers, let vector addition be defined similarly as in (3.1.8), and let scalar multiplication be defined similarly as in Eq. (3.1.9). Then X is a vector space. ∎


3.1.19. Example. Let X be the set of all real-valued continuous functions defined on the interval [a, b]. Thus, if x ∈ X, then x: [a, b] → R is a real, continuous function defined for all a <= t <= b. We note that x = y if and only if x(t) = y(t) for all t ∈ [a, b], and that the null vector is the function which is zero for all t ∈ [a, b]. Let F denote the field of real numbers, let α ∈ F, and let vector addition and scalar multiplication be defined pointwise by

(x + y)(t) = x(t) + y(t) for all t ∈ [a, b]   (3.1.20)

and

(αx)(t) = αx(t) for all t ∈ [a, b].   (3.1.21)

Then clearly x + y ∈ X whenever x, y ∈ X, αx ∈ X whenever α ∈ F and x ∈ X, and all the axioms of a vector space are satisfied. We call this vector space the space of real-valued continuous functions on [a, b], and we denote it by C[a, b]. ∎
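In C[a, b] the vectors are functions, and Eqs. (3.1.20) and (3.1.21) build new functions pointwise. A small sketch (ours), with Python closures standing in for elements of the space:

```python
import math

def fadd(x, y):
    """Vector addition per Eq. (3.1.20): (x + y)(t) = x(t) + y(t)."""
    return lambda t: x(t) + y(t)

def fsmul(alpha, x):
    """Scalar multiplication per Eq. (3.1.21): (alpha*x)(t) = alpha * x(t)."""
    return lambda t: alpha * x(t)

# sin and cos are elements of C[a, b]; so is any linear combination of them
z = fadd(math.sin, fsmul(2.0, math.cos))
print(z(0.0))   # 2.0  (sin 0 + 2*cos 0)
```

Note that closure under the two operations is exactly what makes the set of continuous functions a linear space: the sum of two continuous functions, and a scalar multiple of one, are again continuous.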
3.1.22. Example. Let X be the set of all real-valued functions defined on the interval [a, b] such that

∫_a^b |x(t)| dt < ∞,

where integration is taken in the Riemann sense. Let F denote the field of real numbers, and let vector addition and scalar multiplication be defined as in equations (3.1.20) and (3.1.21), respectively. We can readily verify that X is a vector space. ∎
3.1.23. Example. Let X denote the set of all real-valued polynomials defined on the interval [a, b], let F be the field of real numbers, and let vector addition and scalar multiplication be defined as in equations (3.1.20) and (3.1.21), respectively. We note that the null vector is the function which is zero for all t ∈ [a, b], and also, if x(t) is a polynomial, then so is -x(t). Furthermore, we observe that the sum of two polynomials is again a polynomial, and that a scalar multiple of a polynomial is also a polynomial. We can now readily verify that X is a linear space. ∎
3.1.24. Example. Let X denote the set of real numbers between -a < 0 and +a > 0; i.e., if x ∈ X then x ∈ [-a, a]. Let F be the field of real numbers. Let vector addition and scalar multiplication be as defined in Example 3.1.6. Now, if α ∈ F is such that α > 1, then αa > a and αa ∉ X. From this it follows that X is not a vector space. ∎
Vector spaces such as those encountered in Examples 3.1.19, 3.1.22, and 3.1.23 are called function spaces. In Chapter 6 we will consider some additional linear spaces.


3.1.25. Exercise. Verify the assertions made in Examples 3.1.6, 3.1.7, 3.1.10, 3.1.11, 3.1.13, 3.1.14, 3.1.16, 3.1.17, 3.1.18, 3.1.19, 3.1.22, and 3.1.23.

3.2. LINEAR SUBSPACES AND DIRECT SUMS

We first introduce the notion of linear subspace. (See also Definition 2.1.102.)
3.2.1. Definition. A non-empty subset Y of a vector space X is called a linear manifold or a linear subspace in X if (i) x + y is in Y whenever x and y are in Y, and (ii) αx is in Y whenever α ∈ F and x ∈ Y.
It is an easy matter to verify that a linear manifold Y satisfies all the
axioms of a vector space and may as such be regarded as a linear space itself.
3.2.2. Example. The set consisting of the null vector 0 is a linear subspace; i.e., the set Y = {0} is a linear subspace. Also, the vector space X is a linear subspace of itself. If a linear subspace Y is not all of X, then we say that Y is a proper subspace of X. ∎

3.2.3. Example. The set of all real-valued polynomials defined on the


interval [a, b] (see Example 3.1.23) is a linear subspace of the vector space consisting of all real-valued continuous functions defined on the interval [a, b] (see Example 3.1.19). ∎
Concerning linear subspaces we now state and prove the following result.
3.2.4. Theorem. Let Y and Z be linear subspaces of a vector space X. The intersection of Y and Z, Y ∩ Z, is also a linear subspace of X.

Proof. Since Y and Z are linear subspaces, it follows that 0 ∈ Y and 0 ∈ Z, and thus 0 ∈ Y ∩ Z. Hence, Y ∩ Z is non-empty. Now let α, β ∈ F, let x, y ∈ Y, and let x, y ∈ Z. Then αx + βy ∈ Y and also αx + βy ∈ Z, because Y and Z are both linear subspaces. Hence, αx + βy ∈ Y ∩ Z, and Y ∩ Z is a linear subspace of X. ∎

We can extend the above theorem to a more general result.

3.2.5. Theorem. Let X be a vector space and let X_i be a linear subspace of X for every i ∈ I, where I denotes some index set. Then ∩_{i∈I} X_i is a linear subspace of X.

3.2.6. Exercise. Prove Theorem 3.2.5.


Now consider in the vector space of Example 3.1.4 the subsets Y and Z consisting of two lines intersecting at the origin 0, as shown in Figure B. Clearly, Y and Z are linear subspaces of the vector space X. On the other hand, the union of Y and Z, Y ∪ Z, obviously does not contain arbitrary sums αy + βz, where α, β ∈ F and y ∈ Y and z ∈ Z. From this it follows that if Y and Z are linear subspaces then, in general, the union Y ∪ Z is not a linear subspace of X.

3.2.7. Figure B

3.2.8. Definition. Let X be a linear space, and let Y and Z be arbitrary subsets of X. The sum of sets Y and Z, denoted by Y + Z, is the set of all vectors in X which are of the form y + z, where y ∈ Y and z ∈ Z.

The above concept is depicted pictorially in Figure C by utilizing the vector space of Example 3.1.4. With the aid of our next result we can generate various linear subspaces.

3.2.9. Figure C. Sum of Two Subsets.


3.2.10. Theorem. Let Y and Z be linear subspaces of a vector space X. Then their sum, Y + Z, is also a linear subspace of X.

3.2.11. Exercise. Prove Theorem 3.2.10.

Now let Y and Z be linear subspaces of a vector space X. If Y ∩ Z = {0}, we say that the spaces Y and Z are disjoint. We emphasize that this terminology is not consistent with that used in connection with sets. We now have:

3.2.12. Theorem. Let Y and Z be linear subspaces of a vector space X. Then for every x ∈ Y + Z there exist unique elements y ∈ Y and z ∈ Z such that x = y + z if and only if Y ∩ Z = {0}.

Proof. Let x ∈ Y + Z be such that x = y_1 + z_1 = y_2 + z_2, where y_1, y_2 ∈ Y and z_1, z_2 ∈ Z. Then clearly y_1 - y_2 = z_2 - z_1. Now y_1 - y_2 ∈ Y and z_2 - z_1 ∈ Z, and since by assumption Y ∩ Z = {0}, it follows that y_1 - y_2 = 0 and z_2 - z_1 = 0, i.e., y_1 = y_2 and z_1 = z_2. Thus, every x ∈ Y + Z has a unique representation x = y + z, where y ∈ Y and z ∈ Z, provided that Y ∩ Z = {0}.

Conversely, let us assume that for each x = y + z ∈ Y + Z the y ∈ Y and the z ∈ Z are uniquely determined. Let us further assume that the linear subspaces Y and Z are not disjoint. Then there exists a non-zero vector v ∈ Y ∩ Z. In this case we can write x = y + z = y + z + αv - αv = (y + αv) + (z - αv) for all α ∈ F. But this implies that y and z are not unique, which is a contradiction to our hypothesis. Hence, the spaces Y and Z must be disjoint. ∎
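A concrete instance of Theorem 3.2.12, with subspaces of our choosing: in R^2 let Y be the line spanned by (1, 0) and Z the line spanned by (1, 1). Since Y ∩ Z = {0}, every x splits uniquely as y + z, and the components can be solved for directly:

```python
def decompose(x):
    """Write x = y + z with y = alpha*(1, 0) in Y and z = beta*(1, 1) in Z.
    The system alpha + beta = x_1, beta = x_2 has exactly one solution."""
    beta = x[1]
    alpha = x[0] - beta
    return (alpha, 0.0), (beta, beta)

y, z = decompose((3.0, 1.0))
print(y, z)   # (2.0, 0.0) (1.0, 1.0)
assert (y[0] + z[0], y[1] + z[1]) == (3.0, 1.0)
```

Had Y and Z been the same line, the decomposition would not be unique, matching the "only if" direction of the theorem.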

Theorem 3.2.10 is readily extended to any number of linear subspaces of X. Specifically, if X_1, ..., X_r are linear subspaces of X, then X_1 + ... + X_r is also a linear subspace of X. This enables us to introduce the following:

3.2.13. Definition. Let X_1, ..., X_r be linear subspaces of the vector space X. The sum X_1 + ... + X_r is said to be a direct sum if for each x ∈ X_1 + ... + X_r there is a unique set of x_i ∈ X_i, i = 1, ..., r, such that x = x_1 + ... + x_r. We denote the direct sum of X_1, ..., X_r by X_1 ⊕ ... ⊕ X_r.

There is a connection between the Cartesian product of two vector spaces and their direct sum. Let Y and Z be two arbitrary linear spaces over the same field F, and let V = Y × Z. Thus, if v ∈ V, then v is the ordered pair

v = (y, z),

where y ∈ Y and z ∈ Z. Now let us define vector addition as

(y_1, z_1) + (y_2, z_2) = (y_1 + y_2, z_1 + z_2)   (3.2.14)

and scalar multiplication as

α(y, z) = (αy, αz),   (3.2.15)

where (y_1, z_1), (y_2, z_2) ∈ V = Y × Z and where α ∈ F. Noting that for each vector (y, z) ∈ V there is a vector -(y, z) = (-y, -z) ∈ V, and observing that (0, 0) = (y, z) - (y, z) for all elements in V, it is an easy matter to show that the space V = Y × Z is a linear space. We note that Y is not a linear subspace of V because, in fact, it is not even a subset of V. However, if we let

Y' = {(y, 0): y ∈ Y}

and

Z' = {(0, z): z ∈ Z},

then Y' and Z' are linear subspaces of V and V = Y' ⊕ Z'. By abuse of notation, we frequently express this simply as V = Y ⊕ Z.

Once more, making use of Example 3.1.4, let Y and Z denote two lines intersecting at the origin 0, as shown in Figure D. The direct sum of the linear subspaces Y and Z is in this case the "entire plane."

3.2.16. Figure D

In order that a subset be a linear subspace of a vector space, it is necessary that this subset contain the null vector. Thus, in Figure D, the lines Y and Z passing through the origin 0 are linear subspaces of the plane (see Example 3.1.4). In many applications this requirement is too restrictive and a generalization is called for. We have:

3.2.17. Definition. Let Y be a linear subspace of a vector space X, and let x be a fixed vector in X. We call the translation

Z = x + Y ≜ {z ∈ X: z = x + y, y ∈ Y}

a linear variety or a flat or an affine linear subspace of X.

In Figure E, an example of a linear variety is given for the vector space of Example 3.1.4.

3.2.18. Figure E

3.3. LINEAR INDEPENDENCE, BASES, AND DIMENSION

Throughout the remainder of this chapter and in the following chapter we use the following notation: {α_1, ..., α_n}, α_i ∈ F, denotes an indexed set of scalars, and {x_1, ..., x_n}, x_i ∈ X, denotes an indexed set of vectors.

Before introducing the notions of linear dependence and independence of a set of vectors in a linear space X, we first consider the following.

3.3.1. Definition. Let Y be a set in a linear space X (Y may be a finite set or an infinite set). We say that a vector x ∈ X is a finite linear combination of vectors in Y if there is a finite set of elements {y_1, y_2, ..., y_n} in Y and a finite set of scalars {α_1, α_2, ..., α_n} in F such that

x = α_1 y_1 + α_2 y_2 + ... + α_n y_n.   (3.3.2)

In Eq. (3.3.2) vector addition has been extended in an obvious way from the case of two vectors to the case of n vectors. In later chapters we will consider linear combinations which are not necessarily finite. The representation of x in Eq. (3.3.2) is, of course, not necessarily unique. Thus, in the
case of Example 3.1.10, if X = R^2 and if x = (1, 1), then x can be represented as

x = 1(1, 0) + 1(0, 1),

or as

x = 2(1/2, 0) + 3(0, 1/3),

etc. This situation is depicted in Figure F.

3.3.3. Figure F. [The vector x = (1, 1) expressed as 1(1, 0) + 1(0, 1) = 2(1/2, 0) + 3(0, 1/3), etc.]

3.3.4. Theorem. Let Y be a non-empty subset of a linear space X. Let V(Y) be the set of all finite linear combinations of the vectors from Y; i.e., y ∈ V(Y) if and only if there is some set of scalars {α_1, ..., α_m} and some finite subset {y_1, ..., y_m} of Y such that

y = α_1 y_1 + α_2 y_2 + ... + α_m y_m,

where m may be any positive integer. Then V(Y) is a linear subspace of X.

3.3.5. Exercise. Prove Theorem 3.3.4.

Our previous result motivates the following concepts.

3.3.6. Definition. We say the linear space V(Y) in Theorem 3.3.4 is the linear subspace generated by the set Y.

3.3.7. Definition. Let Z be a linear subspace of a vector space X. If there exists a set of vectors Y ⊂ X such that the linear space V(Y) generated by Y is Z, then we say Y spans Z.

If, in particular, the space of Example 3.1.4 is considered and if V and W are linear subspaces of X as depicted in Figure G, then the set Y = {e_1} spans W, the set Z = {e_2} spans V, and the set M = {e_1, e_2} spans the vector space X. The set N = {e_1, e_2, e_3} also spans the vector space X.

3.3.8. Figure G. V and W are Lines Intersecting at Origin 0.

3.3.9. Exercise. Show that V(Y) is the smallest linear subspace of a vector space X containing the subset Y of X. Specifically, show that if Z is a linear subspace of X and if Z contains Y, then Z also contains V(Y).

And now the important notion of linear dependence.

3.3.10. Definition. Let {x₁, x₂, ..., xₘ} be a finite non-empty set in a
linear space X. If there exist scalars α₁, ..., αₘ ∈ F, not all zero, such that

α₁x₁ + ... + αₘxₘ = 0,    (3.3.11)

then the set {x₁, x₂, ..., xₘ} is said to be linearly dependent. If a set is not
linearly dependent, then it is said to be linearly independent. In this case the
relation (3.3.11) implies that α₁ = α₂ = ... = αₘ = 0. An infinite set of
vectors Y in X is said to be linearly independent if every finite subset of Y
is linearly independent.

Note that the null vector cannot be contained in a set which is linearly
independent. Also, if a set of vectors contains a linearly dependent subset,
then the whole set is linearly dependent.
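As a concrete companion to Definition 3.3.10 (an illustration of ours, not part of the text): in a coordinate space such as Rⁿ, a finite set of vectors is linearly independent exactly when Gaussian elimination on the vectors, taken as rows, produces no zero row. A minimal sketch, with the helper name our own choice:

```python
def is_linearly_independent(vectors, tol=1e-12):
    """True iff the given coordinate vectors are linearly independent
    (Definition 3.3.10), i.e. Gaussian elimination on the vectors-as-rows
    yields a pivot for every vector and no zero row."""
    rows = [list(map(float, v)) for v in vectors]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if abs(rows[i][c]) > tol), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r == len(rows)

print(is_linearly_independent([(1, 0), (0, 1)]))   # True
print(is_linearly_independent([(1, 2), (2, 4)]))   # False: second = 2 * first
print(is_linearly_independent([(0, 0), (1, 1)]))   # False: contains the null vector
```

The third call illustrates the remark above: a set containing the null vector can never be linearly independent.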
If X denotes the space of Example 3.1.4, the set of vectors {y, z} in Figure
H is linearly independent, while the set of vectors {u, v} is linearly dependent.

3.3.12. Figure H. Linearly Independent and Linearly Dependent Vectors.

3.3.13. Exercise.

Let X = C[a, b], the set of all real-valued continuous
functions on [a, b], where b > a. As we saw in Example 3.1.19, this set forms
a vector space. Let n be a fixed positive integer, and let us define xᵢ ∈ X for
i = 0, 1, 2, ..., n, as follows. For all t ∈ [a, b], let

x₀(t) = 1

and

xᵢ(t) = tⁱ,  i = 1, ..., n.

Let Y = {x₀, x₁, ..., xₙ}. Then V(Y) is the set of all polynomials on [a, b]
of degree less than or equal to n.

(a) Show that Y is a linearly independent set in X.

(b) Let Xᵢ = {xᵢ}, i = 0, 1, ..., n; i.e., each Xᵢ is a singleton subset
of X. Show that

V(Y) = V(X₀) ⊕ V(X₁) ⊕ ... ⊕ V(Xₙ).

(c) Let z₀(t) = 1 for all t ∈ [a, b] and let

zₖ(t) = 1 + t + ... + tᵏ

for all t ∈ [a, b] and k = 1, ..., n. Show that Z = {z₀, z₁, ..., zₙ}
is a linearly independent set in V(Y).
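A finite-dimensional glimpse of part (a), offered as an aside of ours: if some combination α₀·1 + α₁t + ... + αₙtⁿ were the zero function on [a, b], it would in particular vanish at any n + 1 distinct sample points, and the resulting Vandermonde system forces all αᵢ = 0 because its determinant is nonzero. For n = 2 the determinant factors explicitly:

```python
def vandermonde_det3(t0, t1, t2):
    """det [[1, t0, t0^2], [1, t1, t1^2], [1, t2, t2^2]]
    = (t1 - t0)(t2 - t0)(t2 - t1): nonzero whenever the points are distinct."""
    return (t1 - t0) * (t2 - t0) * (t2 - t1)

# Any three distinct points witness the independence of {1, t, t^2}:
print(vandermonde_det3(0.0, 0.5, 1.0))  # 0.25
print(vandermonde_det3(0.0, 1.0, 2.0))  # 2.0
```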

3.3.14. Theorem. Let {x₁, ..., xₘ} be a linearly independent set in
a vector space X. If

α₁x₁ + ... + αₘxₘ = β₁x₁ + ... + βₘxₘ,

then αᵢ = βᵢ for all i = 1, 2, ..., m.

Proof. If α₁x₁ + ... + αₘxₘ = β₁x₁ + ... + βₘxₘ, then
(α₁ − β₁)x₁ + ... + (αₘ − βₘ)xₘ = 0. Since the set {x₁, ..., xₘ}
is linearly independent, we have (αᵢ − βᵢ) = 0 for all i = 1, ..., m.
Therefore αᵢ = βᵢ for all i. ∎

The next result provides us with an alternate way of defining linear
dependence.

3.3.15. Theorem. A set of vectors {x₁, ..., xₘ} in a linear space X
is linearly dependent if and only if for some index i, 1 ≤ i ≤ m, we can find
scalars α₁, ..., αᵢ₋₁, αᵢ₊₁, ..., αₘ such that

xᵢ = α₁x₁ + ... + αᵢ₋₁xᵢ₋₁ + αᵢ₊₁xᵢ₊₁ + ... + αₘxₘ.    (3.3.16)

Proof. Assume that Eq. (3.3.16) is satisfied. Then

α₁x₁ + ... + αᵢ₋₁xᵢ₋₁ + (−1)xᵢ + αᵢ₊₁xᵢ₊₁ + ... + αₘxₘ = 0.

Thus, αᵢ = −1 ≠ 0 is a non-trivial choice of coefficient for which Eq.
(3.3.11) holds, and therefore the set {x₁, x₂, ..., xₘ} is linearly dependent.

Conversely, assume that the set {x₁, x₂, ..., xₘ} is linearly dependent.
Then there exist coefficients α₁, ..., αₘ which are not all zero, such that

α₁x₁ + α₂x₂ + ... + αₘxₘ = 0.    (3.3.17)

Suppose that index i is chosen such that αᵢ ≠ 0. Rearranging Eq. (3.3.17) to

−αᵢxᵢ = α₁x₁ + ... + αᵢ₋₁xᵢ₋₁ + αᵢ₊₁xᵢ₊₁ + ... + αₘxₘ,    (3.3.18)

and multiplying both sides of Eq. (3.3.18) by −1/αᵢ, we obtain

xᵢ = β₁x₁ + ... + βᵢ₋₁xᵢ₋₁ + βᵢ₊₁xᵢ₊₁ + ... + βₘxₘ,

where βₖ = −αₖ/αᵢ, k = 1, ..., i − 1, i + 1, ..., m. This concludes our
proof. ∎

The proof of the next result is left as an exercise.


3.3.19. Theorem. A finite non-empty set Y in a linear space X is linearly
independent if and only if for each y ∈ V(Y), y ≠ 0, there is a unique finite
subset of Y, say {x₁, x₂, ..., xₘ}, and a unique set of scalars {α₁, α₂, ..., αₘ},
such that

y = α₁x₁ + α₂x₂ + ... + αₘxₘ.

3.3.20. Exercise. Prove Theorem 3.3.19.

3.3.21. Exercise. Let Y be a finite set in a linear space X. Show that Y
is linearly independent if and only if there is no proper subset Z of Y such
that V(Z) = V(Y).
A concept which is of utmost importance in the study of vector spaces is
that of basis of a linear space.
3.3.22. Definition. A set Y in a linear space X is called a Hamel basis,
or simply a basis, for X if

(i) Y is linearly independent; and
(ii) the span of Y is the linear space X itself; i.e., V(Y) = X.

As an immediate consequence of this definition we have:

3.3.23. Theorem. Let X be a linear space, and let Y be a linearly independent set in X. Then Y is a basis for V(Y).
3.3.24. Exercise. Prove Theorem 3.3.23.

In order to introduce the notion of dimension of a vector space we show
that if a linear space X is generated by a finite number of linearly independent
elements, then this number of elements must be unique. We first prove the
following result.
3.3.25. Theorem. Let {x₁, x₂, ..., xₙ} be a basis for a linear space X.
Then for each vector x ∈ X there exist unique scalars α₁, ..., αₙ such that

x = α₁x₁ + ... + αₙxₙ.


Proof. Since x₁, ..., xₙ span X, every vector x ∈ X can be expressed as
a linear combination of them; i.e.,

x = α₁x₁ + α₂x₂ + ... + αₙxₙ

for some choice of scalars α₁, ..., αₙ. We now must show that these scalars
are unique. To this end, suppose that

x = α₁x₁ + α₂x₂ + ... + αₙxₙ

and

x = β₁x₁ + β₂x₂ + ... + βₙxₙ.

Then

x + (−x) = (α₁x₁ + α₂x₂ + ... + αₙxₙ) + (−β₁x₁ − β₂x₂ − ... − βₙxₙ)
         = (α₁ − β₁)x₁ + ... + (αₙ − βₙ)xₙ = 0.

Since the vectors x₁, x₂, ..., xₙ form a basis for X, it follows that they
are linearly independent, and therefore we must have (αᵢ − βᵢ) = 0 for
i = 1, ..., n. From this it follows that α₁ = β₁, α₂ = β₂, ..., αₙ = βₙ. ∎
We also have:
3.3.26. Theorem. Let {x₁, x₂, ..., xₙ} be a basis for a vector space X, and
let {y₁, ..., yₘ} be any linearly independent set of vectors. Then m ≤ n.

Proof. We need to consider only the case m ≥ n and prove that then we
actually have m = n. Consider the set of vectors {y₁, x₁, ..., xₙ}. Since the
vectors x₁, ..., xₙ span X, y₁ can be expressed as a linear combination of
them. Thus, the set {y₁, x₁, ..., xₙ} is not linearly independent. Therefore,
there exist scalars β₁, α₁, ..., αₙ, not all zero, such that

β₁y₁ + α₁x₁ + ... + αₙxₙ = 0.    (3.3.27)

If all the αᵢ are zero, then β₁ ≠ 0 and β₁y₁ = 0. Thus, we can write

β₁y₁ + 0y₂ + ... + 0yₘ = 0.

But this contradicts the hypothesis of the theorem and can't happen because
the y₁, ..., yₘ are linearly independent. Therefore, at least one of the αᵢ ≠ 0.
Renumbering all the xᵢ, if necessary, we can assume that αₙ ≠ 0. Solving for
xₙ, we now obtain

xₙ = (−β₁/αₙ)y₁ + (−α₁/αₙ)x₁ + ... + (−αₙ₋₁/αₙ)xₙ₋₁.    (3.3.28)

Now we show that the set {y₁, x₁, ..., xₙ₋₁} is also a basis for X. Since
{x₁, ..., xₙ} is a basis for X, we have ξ₁, ..., ξₙ ∈ F such that

x = ξ₁x₁ + ... + ξₙxₙ.

Substituting (3.3.28) into the above expression we note that

x = η₀y₁ + η₁x₁ + ... + ηₙ₋₁xₙ₋₁,

where the ηᵢ are defined in an obvious way. In any case, every x ∈ X can
be expressed as a linear combination of the set of vectors {y₁, x₁, ..., xₙ₋₁},
and thus this set must span X. To show that this set is also linearly independent, let us assume that there are scalars λ, λ₁, ..., λₙ₋₁ such that

λy₁ + λ₁x₁ + ... + λₙ₋₁xₙ₋₁ + 0xₙ = 0,    (3.3.29)

and assume that λ ≠ 0. Then

y₁ = (−λ₁/λ)x₁ + ... + (−λₙ₋₁/λ)xₙ₋₁ + 0xₙ.

In view of Eq. (3.3.27) we have, since β₁ ≠ 0, the relation

y₁ = (−α₁/β₁)x₁ + ... + (−αₙ₋₁/β₁)xₙ₋₁ + (−αₙ/β₁)xₙ.    (3.3.30)

Now the term (−αₙ/β₁)xₙ in Eq. (3.3.30) is not zero, because we solved for
xₙ in Eq. (3.3.28), so that αₙ ≠ 0; yet the coefficient multiplying xₙ in Eq.
(3.3.29) is zero. Since {x₁, ..., xₙ} is a basis, we have arrived at a contradiction, in view of
Theorem 3.3.25. Therefore, we must have λ = 0. Thus, we have

λ₁x₁ + ... + λₙ₋₁xₙ₋₁ + 0xₙ = 0,

and since {x₁, ..., xₙ} is a linearly independent set it follows that λ₁ = 0,
..., λₙ₋₁ = 0. Therefore, the set {y₁, x₁, ..., xₙ₋₁} is indeed a basis for X.

By a similar argument as the preceding one we can show that the set
{y₂, y₁, x₁, ..., xₙ₋₂} is a basis for X, that the set {y₃, y₂, y₁, x₁, ..., xₙ₋₃}
is a basis for X, etc. Now if m > n, then we would not utilize yₙ₊₁ in our
process. Since {yₙ, ..., y₁} is a basis by the preceding argument, there exist
coefficients η₁, ..., ηₙ such that

yₙ₊₁ = η₁yₙ + ... + ηₙy₁.

But by Theorem 3.3.15 this means the yᵢ, i = 1, ..., n + 1, are linearly
dependent, a contradiction to the hypothesis of our theorem. From this it
now follows that if m ≥ n, then we must have m = n. This concludes the
proof of the theorem. ∎
As a direct consequence of Theorem 3.3.26 we have:

3.3.31. Theorem. If a linear space X has a basis containing a finite number
n of vectors, then any other basis for X consists of exactly n elements.

Proof. Let {x₁, ..., xₙ} be a basis for X, and let also {y₁, ..., yₘ} be a
basis for X. Then in view of Theorem 3.3.26 we have m ≤ n. Interchanging
the roles of the xᵢ and yᵢ, we also have n ≤ m. Hence, m = n. ∎


Our preceding result enables us to make the following definition.


3.3.32. Definition. If a linear space X has a basis consisting of a finite
number of vectors, say {x₁, ..., xₙ}, then X is said to be a finite-dimensional
vector space and the dimension of X is n, abbreviated dim X = n. In this
case we speak of an n-dimensional vector space. If X is not a finite-dimensional
vector space, it is said to be an infinite-dimensional vector space.

We will agree that the linear space consisting of the null vector is finite
dimensional, and we will say that the dimension of this space is zero.
Our next result provides us with an alternate characterization of (finite)
dimension of a linear space.
3.3.33. Theorem. Let X be a vector space which contains n linearly independent vectors. If every set of n + 1 vectors in X is linearly dependent,
then X is finite dimensional and dim X = n.
Proof. Let {x₁, ..., xₙ} be a linearly independent set in X, and let x ∈ X.
Then there exists a set of scalars {α₁, ..., αₙ₊₁}, not all zero, such that

α₁x₁ + ... + αₙxₙ + αₙ₊₁x = 0.

Now αₙ₊₁ ≠ 0, otherwise we would contradict the fact that x₁, ..., xₙ are
linearly independent. Hence,

x = (−α₁/αₙ₊₁)x₁ + ... + (−αₙ/αₙ₊₁)xₙ,

and x ∈ V({x₁, ..., xₙ}); i.e., {x₁, ..., xₙ} is a basis for X. Therefore, X
is n-dimensional. ∎

From our preceding result follows:


3.3.34. Corollary. Let X be a vector space. If for given n every set of n + 1
vectors in X is linearly dependent, then X is finite dimensional and
dim X ≤ n.

3.3.35. Exercise. Prove Corollary 3.3.34.

We are now in a position to speak of coordinates of a vector. We have:


3.3.36. Definition. Let X be a finite-dimensional vector space, and let
{x₁, ..., xₙ} be a basis for X. Let x ∈ X be represented by

x = ξ₁x₁ + ... + ξₙxₙ.

The unique scalars ξ₁, ξ₂, ..., ξₙ are called the coordinates of x with respect
to the basis {x₁, x₂, ..., xₙ}.
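To make Definition 3.3.36 concrete (an illustration of ours, not from the text): in R² the coordinates of x with respect to a basis {b₁, b₂} are found by solving ξ₁b₁ + ξ₂b₂ = x, for example by Cramer's rule. The sketch below recovers the two representations of x = (1, 1) used at the start of this section:

```python
def coordinates_2d(x, b1, b2):
    """Coordinates (xi1, xi2) of x with respect to the basis {b1, b2} of R^2,
    i.e. the unique scalars with xi1*b1 + xi2*b2 = x (Theorem 3.3.25),
    computed by Cramer's rule."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1, b2 are linearly dependent, hence not a basis")
    xi1 = (x[0] * b2[1] - b2[0] * x[1]) / det
    xi2 = (b1[0] * x[1] - x[0] * b1[1]) / det
    return xi1, xi2

print(coordinates_2d((1, 1), (1, 0), (0, 1)))        # (1.0, 1.0)
print(coordinates_2d((1, 1), (0.5, 0), (0, 1 / 3)))  # approximately (2.0, 3.0)
```

The same vector has different coordinates relative to different bases, while the coordinates relative to any one basis are unique.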

It is possible to prove results similar to Theorems 3.3.26 and 3.3.31 for
infinite-dimensional linear spaces. Since we will not make further use of
these results in this book, their proofs will be omitted. In the following
theorems, X is an arbitrary vector space (i.e., finite dimensional or infinite
dimensional).
3.3.37. Theorem. If Y is a linearly independent set in a linear space X,
then there exists a Hamel basis Z for X such that Y ⊂ Z.
3.3.38. Theorem. If Y and Z are Hamel bases for a linear space X, then
Y and Z have the same cardinal number.

The notion of Hamel basis is not the only concept of basis with which we
will deal. Such other concepts (to be specified later) reduce to Hamel basis
on finite-dimensional vector spaces but differ significantly on infinite-dimensional spaces. We will find that on infinite-dimensional spaces the concept
of Hamel basis is not very useful. However, in the case of finite-dimensional
spaces the concept of Hamel basis is most crucial.
In view of the results presented thus far, the reader can readily prove the
following facts.
3.3.39. Theorem. Let X be a finite-dimensional linear space with
dim X = n.

(i) No linearly independent set in X contains more than n vectors.
(ii) A linearly independent set in X is a basis if and only if it contains
exactly n vectors.
(iii) Every spanning or generating set for X contains a basis for X.
(iv) Every set of vectors which spans X contains at least n vectors.
(v) Every linearly independent set of vectors in X is contained in a basis
for X.
(vi) If Y is a linear subspace of X, then Y is finite dimensional and
dim Y ≤ n.
(vii) If Y is a linear subspace of X and if dim X = dim Y, then Y = X.
3.3.40. Exercise. Prove Theorem 3.3.39.

From Theorem 3.3.39 follows directly our next result.


3.3.41. Theorem. Let X be a finite-dimensional linear space of dimension
n, and let Y be a collection of vectors in X. Then any two of the three conditions listed below imply the third condition:

(i) the vectors in Y are linearly independent;
(ii) the vectors in Y span X; and
(iii) the number of vectors in Y is n.

Chapter 3 / Vector Spaces and Linear Transformations


3.3.42. Exercise. Prove Theorem 3.3.41.

Another way of restating Theorem 3.3.41 is as follows:

(a) the dimension of a finite-dimensional linear space X is equal to the
smallest number of vectors that can be used to span X; and
(b) the dimension of a finite-dimensional linear space X is the largest
number of vectors that can be linearly independent in X.
For the direct sum of two linear subspaces we have the following result.

3.3.43. Theorem. Let X be a finite-dimensional vector space. If there
exist linear subspaces Y and Z of X such that X = Y ⊕ Z, then dim(X)
= dim(Y) + dim(Z).
Proof. Since X is finite dimensional it follows from part (vi) of Theorem
3.3.39 that Y and Z are finite-dimensional linear spaces. Thus, there exists
a basis, say {y₁, ..., yₙ}, for Y, and a basis, say {z₁, ..., zₘ}, for Z. Let
W = {y₁, ..., yₙ, z₁, ..., zₘ}. We must show that W is a linearly independent
set in X and that V(W) = X. Now suppose that

α₁y₁ + ... + αₙyₙ + β₁z₁ + ... + βₘzₘ = 0.

Since the representation for 0 ∈ X must be unique in terms of its components
in Y and Z, we must have

α₁y₁ + ... + αₙyₙ = 0 and β₁z₁ + ... + βₘzₘ = 0.

But this implies that α₁ = α₂ = ... = αₙ = β₁ = β₂ = ... = βₘ = 0.
Thus, W is a linearly independent set in X. Since X is the direct sum of Y
and Z, it is clear that W generates X. Thus, dim X = m + n. This completes
the proof of the theorem. ∎
We conclude the present section with the following results.
3.3.44. Theorem. Let X be an n-dimensional vector space, and let {y₁,
..., yₘ} be a linearly independent set of vectors in X, where m ≤ n. Then it
is possible to form a basis for X consisting of n vectors x₁, ..., xₙ, where
xᵢ = yᵢ for i = 1, ..., m.

Proof. Let {e₁, ..., eₙ} be a basis for X. Let S₁ be the set of vectors {y₁,
..., yₘ, e₁, ..., eₙ}, where {y₁, ..., yₘ} is a linearly independent set of
vectors in X and where m ≤ n. We note that S₁ spans X and is linearly
dependent, since it contains more than n vectors. Now let

α₁y₁ + ... + αₘyₘ + β₁e₁ + ... + βₙeₙ = 0,

where the αᵢ, βᵢ are not all zero. Then there must be some βⱼ ≠ 0, otherwise
the linear independence of {y₁, ..., yₘ} would be contradicted. But this
means that eⱼ is a linear combination of the set of vectors S₂ = {y₁, ..., yₘ,
e₁, ..., e_{j−1}, e_{j+1}, ..., eₙ}; i.e., S₂ is the set S₁ with eⱼ eliminated.
Clearly, S₂ still spans X. Now either S₂ contains n vectors or else it is a
linearly dependent set. If it contains n vectors, then by Theorem 3.3.41 these
vectors must be linearly independent, in which case S₂ is a basis for X. We
then let the remaining eᵢ's serve as x_{m+1}, ..., xₙ, and the theorem is proved.
On the other hand, if S₂ contains more than n vectors, then we continue the
above procedure to eliminate vectors from the remaining eᵢ's until exactly
n − m of them are left. Letting e_{i₁}, ..., e_{i_{n−m}} be the remaining vectors
and letting x_{m+1} = e_{i₁}, ..., xₙ = e_{i_{n−m}}, we have completed the proof
of the theorem. ∎
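The elimination procedure in the proof can also be run in the opposite, equally common direction: start from the independent set and adjoin standard basis vectors of Rⁿ one at a time, keeping only those that raise the rank. A sketch of this variant (our own illustration, with rank computed by Gaussian elimination):

```python
def rank(vectors, tol=1e-12):
    """Rank of the list of coordinate vectors (taken as matrix rows),
    by Gaussian elimination."""
    rows = [list(map(float, v)) for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if abs(rows[i][c]) > tol), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(ys, n):
    """Extend the linearly independent ys in R^n to a basis of R^n by sifting
    in standard basis vectors, in the spirit of Theorem 3.3.44: the first m
    vectors of the result are the ys themselves."""
    basis = [list(map(float, y)) for y in ys]
    for i in range(n):
        e = [1.0 if j == i else 0.0 for j in range(n)]
        if rank(basis + [e]) > rank(basis):
            basis.append(e)
    return basis

b = extend_to_basis([[1, 1, 0]], 3)
print(len(b), rank(b))  # 3 3
```

As the corollary below notes, the extension is by no means unique; this sketch simply makes one canonical choice.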

3.3.45. Corollary. Let X be an n-dimensional vector space, and let Y be
an m-dimensional subspace of X. Then there exists a subspace Z of X of
dimension (n − m) such that X = Y ⊕ Z.

3.3.46. Exercise. Prove Corollary 3.3.45.

Referring to Figure 3.3.8, it is easy to see that the subspace Z in Corollary
3.3.45 need not be unique.

3.4. LINEAR TRANSFORMATIONS

Among the most important notions which we will encounter are special
types of mappings on vector spaces, called linear transformations.
3.4.1. Definition. A mapping T of a linear space X into a linear space Y,
where X and Y are vector spaces over the same field F, is called a linear
transformation or linear operator provided that

(i) T(x + y) = T(x) + T(y) for all x, y ∈ X; and
(ii) T(αx) = αT(x) for all x ∈ X and for all α ∈ F.

A transformation which is not linear is called a non-linear transformation.


We will find it convenient to write T E L ( X ,
)Y
to indicate that T is
a linear transformation from a linear space X into a linear space Y (i.e.,

Chapter 3 I Vector Spaces and iL near Transformations

96

)Y denotes the set of all linear transformations from linear space X


into linear space Y).
.
It follows immediately from the above definition that T is a linear transfor-

L(X,

mation from a linear space X into a linear space Y if and only if

" II,T(X
= I-I;
I

,) for all ,X

E X

and for all II, E F , ;

T(tl IIIXI)

I, ... ,n. In engineering

and science this is called the principle of soperposition and is among the most
important concepts in those disciplines.
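The superposition principle can be spot-checked numerically. The sketch below (ours, not from the text) verifies T(ax + by) = aT(x) + bT(y) over finitely many sample vectors and scalars; such a finite check can refute linearity but can never prove it:

```python
def satisfies_superposition(T, samples, scalars, tol=1e-9):
    """Spot-check T(a*x + b*y) == a*T(x) + b*T(y) for maps between
    coordinate spaces, over all sample pairs and scalar pairs given."""
    for x in samples:
        for y in samples:
            for a in scalars:
                for b in scalars:
                    lhs = T([a * xi + b * yi for xi, yi in zip(x, y)])
                    rhs = [a * u + b * v for u, v in zip(T(x), T(y))]
                    if any(abs(u - v) > tol for u, v in zip(lhs, rhs)):
                        return False
    return True

linear = lambda x: [2 * x[0] + x[1], -x[0]]  # a linear map on R^2
affine = lambda x: [x[0] + 1.0, x[1]]        # a translation: not linear

pts = [[1.0, 0.0], [0.0, 1.0], [2.0, -3.0]]
print(satisfies_superposition(linear, pts, [0.0, 1.0, -2.5]))  # True
print(satisfies_superposition(affine, pts, [0.0, 1.0, -2.5]))  # False
```

The affine map fails already at a = b = 0, since a linear transformation must send the null vector to the null vector.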
3.4.2. Example. Let X = Y denote the space of real-valued continuous
functions on the interval [a, b] as described in Example 3.1.19. Let T: X → Y
be defined by

[T(x)](t) = ∫ₐᵗ x(s) ds,  a ≤ t ≤ b,

where integration is in the Riemann sense. By the properties of integrals
it follows readily that T is a linear transformation. ∎
3.4.3. Example. Let X = Cⁿ(a, b) denote the set of functions x(t) with n
continuous derivatives on the interval (a, b), and let vector addition and scalar
multiplication be defined by equations (3.1.20) and (3.1.21), respectively.
It is readily verified that Cⁿ(a, b) is a linear space. Now let T: Cⁿ(a, b)
→ Cⁿ⁻¹(a, b) be defined by

[T(x)](t) = dx(t)/dt.

From the properties of derivatives it follows that T is a linear transformation
from Cⁿ(a, b) to Cⁿ⁻¹(a, b). ∎
3.4.4. Example. Let X denote the space of all complex-valued functions x(t)
defined on the half-open interval [0, ∞) such that x(t) is Riemann integrable
and such that

|x(t)| ≤ k e^{at},  t ∈ [0, ∞),

where k is some positive constant and a is any real number. Defining vector
addition and scalar multiplication as in Eqs. (3.1.20) and (3.1.21), respectively,
it is easily shown that X is a linear space. Now let Y denote the linear space of
complex functions of a complex variable s (s = σ + iω, i = √−1). The
reader can readily verify that the mapping T: X → Y defined by

[T(x)](s) = ∫₀^∞ e^{−st} x(t) dt    (3.4.5)

is a linear transformation (called the Laplace transform of x(t)). ∎
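As a numerical aside of ours (not in the original text): for x(t) = e^{−t} the transform in Eq. (3.4.5) is 1/(s + 1), and a crude trapezoidal quadrature, with the infinite upper limit truncated at T, recovers it; the step count and cutoff below are arbitrary choices:

```python
import math

def laplace_numeric(x, s, T=40.0, n=200_000):
    """Trapezoidal approximation of the Laplace integral in Eq. (3.4.5),
    i.e. the integral of e^(-s t) x(t) over [0, T], truncating at T."""
    h = T / n
    total = 0.5 * (x(0.0) + math.exp(-s * T) * x(T))
    for k in range(1, n):
        t = k * h
        total += math.exp(-s * t) * x(t)
    return h * total

val = laplace_numeric(lambda t: math.exp(-t), s=2.0)
print(val)  # approximately 1/3, since L{e^(-t)}(s) = 1/(s + 1)
```

The exponential decay of the integrand makes the truncation error at T = 40 negligible here; for slowly decaying x(t) the cutoff would have to be chosen with care.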

3.4.6. Example. Let X be the space of real-valued continuous functions
on [a, b] as described in Example 3.1.19. Let k(s, t) be a real-valued function
defined for a ≤ s ≤ b, a ≤ t ≤ b, such that for each x ∈ X the Riemann
integral

∫ₐᵇ k(s, t) x(t) dt    (3.4.7)

exists and defines a continuous function of s on [a, b]. Let T₁: X → X be
defined by

[T₁x](s) = y(s) = ∫ₐᵇ k(s, t) x(t) dt.    (3.4.8)

It is readily shown that T₁ ∈ L(X, X). The equation (3.4.8) is called the
Fredholm integral equation of the first type. ∎

3.4.9. Example. If in place of (3.4.8) we define T₂: X → X by

[T₂x](s) = y(s) = x(s) − ∫ₐᵇ k(s, t) x(t) dt,    (3.4.10)

then it is again readily shown that T₂ ∈ L(X, X). Equation (3.4.10) is known
as the Fredholm integral equation of the second type. ∎
3.4.11. Example. In Examples 3.4.6 and 3.4.9, assume that k(s, t) = 0
when t > s. In place of (3.4.7) we now have

∫ₐˢ k(s, t) x(t) dt.    (3.4.12)

Equations (3.4.8) and (3.4.10) now become

[T₃x](s) = y(s) = ∫ₐˢ k(s, t) x(t) dt    (3.4.13)

and

[T₄x](s) = y(s) = x(s) − ∫ₐˢ k(s, t) x(t) dt,    (3.4.14)

respectively. Equations (3.4.13) and (3.4.14) are called Volterra integral
equations (of the first type and the second type, respectively). Again, the
mappings T₃ and T₄ are linear transformations from X into X. ∎
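Discretizing the integral on a uniform grid turns an operator such as T₂ into a map between coordinate vectors, which makes its linearity easy to check numerically. A sketch of ours; the kernel, interval, and grid size are arbitrary illustrative choices:

```python
def fredholm2_apply(k, x, a, b):
    """Apply a discretized Fredholm operator of the second type,
    (T2 x)(s) = x(s) - integral of k(s,t) x(t) over [a,b], using the
    rectangle rule on the grid points where x is sampled."""
    n = len(x)
    h = (b - a) / n
    out = []
    for i in range(n):
        s = a + i * h
        integral = sum(k(s, a + j * h) * x[j] for j in range(n)) * h
        out.append(x[i] - integral)
    return out

# Kernel k(s,t) = s*t on [0,1]; spot-check superposition on two vectors.
ker = lambda s, t: s * t
n = 50
u = [1.0] * n
v = [j / n for j in range(n)]
w = [2 * ui + 3 * vi for ui, vi in zip(u, v)]
lhs = fredholm2_apply(ker, w, 0.0, 1.0)
rhs = [2 * p + 3 * q for p, q in zip(fredholm2_apply(ker, u, 0.0, 1.0),
                                     fredholm2_apply(ker, v, 0.0, 1.0))]
print(max(abs(p - q) for p, q in zip(lhs, rhs)) < 1e-12)  # True
```

For a Volterra kernel one would simply set k(s, t) = 0 for t > s, which truncates the inner sum at j ≤ i.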

3.4.15. Example. Let X = C, the set of complex numbers. If x ∈ C,
let x̄ denote the complex conjugate of x. Define T: X → X as

T(x) = x̄.

Then, clearly, T(x + y) = (x + y)‾ = x̄ + ȳ = T(x) + T(y). Now if F = C,
the field of complex numbers, and if α ∈ F, then

T(αx) = (αx)‾ = ᾱx̄ ≠ αT(x).

Therefore, T is not a linear transformation. ∎
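The computation can be replayed with Python's built-in complex numbers; a non-real scalar is essential, since for real α condition (ii) does hold:

```python
def T(x):
    """Complex conjugation on C (Example 3.4.15)."""
    return x.conjugate()

x, y, alpha = 1 + 2j, 3 - 1j, 1j  # alpha is non-real on purpose

print(T(x + y) == T(x) + T(y))       # True: condition (i) holds
print(T(alpha * x) == alpha * T(x))  # False: condition (ii) fails
```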


Example 3.4.15 demonstrates the important fact that condition (i) of
Definition 3.4.1 does not imply condition (ii) of this definition.

Henceforth, when dealing with linear transformations T: X → Y, we will
write Tx in place of T(x).
3.4.16. Definition. Let T ∈ L(X, Y). We call the set

N(T) = {x ∈ X : Tx = 0}    (3.4.17)

the null space of T. The set

R(T) = {y ∈ Y : y = Tx, x ∈ X}    (3.4.18)

is called the range space of T.


Since T0 = 0, it follows that N(T) and R(T) are never empty. The next
two important assertions are readily proved.
3.4.19. Theorem. Let T ∈ L(X, Y). Then

(i) the null space N(T) is a linear subspace of X; and
(ii) the range space R(T) is a linear subspace of Y.

3.4.20. Exercise. Prove Theorem 3.4.19.

For the dimension of the range space R(T) we have:

3.4.21. Theorem. Let T ∈ L(X, Y). If X is finite dimensional with dimension n, then R(T) is finite dimensional and dim[R(T)] ≤ n.

Proof. We assume that R(T) ≠ {0} and X ≠ {0}, for if R(T) = {0} or
X = {0}, then dim[R(T)] = 0, and the theorem is proved. Thus, assume that
n > 0 and let y₁, ..., yₙ₊₁ ∈ R(T). Then there exist x₁, ..., xₙ₊₁ ∈ X
such that Txᵢ = yᵢ for i = 1, ..., n + 1. Since X is of dimension n, there
exist α₁, ..., αₙ₊₁ ∈ F such that not all αᵢ = 0 and

α₁x₁ + ... + αₙ₊₁xₙ₊₁ = 0.

This implies that

T(α₁x₁ + ... + αₙ₊₁xₙ₊₁) = 0,

or

α₁y₁ + ... + αₙ₊₁yₙ₊₁ = 0.

Therefore, by Corollary 3.3.34, R(T) is finite dimensional and dim[R(T)]
≤ n. ∎
3.4.22. Example. Let T: R² → R^∞, where R² and R^∞ are defined in Examples 3.1.10 and 3.1.11, respectively. For x ∈ R² we write x = (ξ₁, ξ₂). Define
T by

T(ξ₁, ξ₂) = (0, ξ₁, 0, ξ₂, 0, 0, ...).

The mapping T is clearly a linear transformation. The vectors (0, 1, 0, 0, ...)
and (0, 0, 0, 1, 0, 0, ...) span R(T), and dim[R(T)] = 2 = dim[R²]. ∎

We also have:

3.4.23. Theorem. Let T ∈ L(X, Y), and let X be finite dimensional. Let
{y₁, ..., yₙ} be a basis for R(T), and let xᵢ be such that Txᵢ = yᵢ for i = 1,
..., n. Then x₁, ..., xₙ are linearly independent in X.

3.4.24. Exercise. Prove Theorem 3.4.23.

Our next result, which as we will see is of utmost importance, is sometimes
called the fundamental theorem of linear equations.
3.4.25. Theorem. Let T ∈ L(X, Y). If X is finite dimensional, then

dim N(T) + dim R(T) = dim X.    (3.4.26)

Proof. Let dim X = n, let dim N(T) = s, and let r = n − s. We must
show that dim R(T) = r.

First, let us assume that 0 < s < n, and let {e₁, e₂, ..., eₙ} be a basis for
X chosen in such a way that the last s vectors, e_{r+1}, ..., eₙ, form a
basis for the linear subspace N(T) (see Theorem 3.3.44). Then the vectors
Te₁, Te₂, ..., Teᵣ, Te_{r+1}, ..., Teₙ generate the linear subspace R(T). But
e_{r+1}, ..., eₙ are vectors in N(T), and thus Te_{r+1} = 0, ..., Teₙ = 0.
From this it now follows that the vectors Te₁, Te₂, ..., Teᵣ must generate
R(T). Now let f₁ = Te₁, f₂ = Te₂, ..., fᵣ = Teᵣ. We must show that the
vectors {f₁, f₂, ..., fᵣ} are linearly independent and as such form a basis
for R(T).

Next, we observe that γ₁f₁ + γ₂f₂ + ... + γᵣfᵣ ∈ R(T). If the γ₁, γ₂,
..., γᵣ are chosen in such a fashion that γ₁f₁ + γ₂f₂ + ... + γᵣfᵣ = 0, then

0 = γ₁f₁ + γ₂f₂ + ... + γᵣfᵣ = γ₁Te₁ + γ₂Te₂ + ... + γᵣTeᵣ
  = T(γ₁e₁ + γ₂e₂ + ... + γᵣeᵣ),

and from this it follows that x = γ₁e₁ + γ₂e₂ + ... + γᵣeᵣ ∈ N(T). Now,
by assumption, the set {e_{r+1}, ..., eₙ} is a basis for N(T). Thus there must
exist scalars γ_{r+1}, γ_{r+2}, ..., γₙ such that

γ₁e₁ + γ₂e₂ + ... + γᵣeᵣ = γ_{r+1}e_{r+1} + ... + γₙeₙ.

This can be rewritten as

γ₁e₁ + ... + γᵣeᵣ − γ_{r+1}e_{r+1} − ... − γₙeₙ = 0.

But {e₁, e₂, ..., eₙ} is a basis for X. From this it follows that γ₁ = γ₂ = ...
= γᵣ = γ_{r+1} = ... = γₙ = 0. Hence, f₁, f₂, ..., fᵣ are linearly independent
and therefore dim R(T) = r. If s = 0, the preceding proof remains valid if
we let {e₁, ..., eₙ} be any basis for X and ignore the remarks about the
vectors {e_{r+1}, ..., eₙ}. If s = n, then N(T) = X. Hence, R(T) = {0} and so
dim R(T) = 0. This concludes the proof of the theorem. ∎
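In coordinates the theorem becomes the familiar pivot count, an observation of ours rather than of the text: for T(x) = Ax, Gaussian elimination on A yields ρ(T) pivot columns (the dimension of R(T)) and n − ρ(T) free columns (one independent null-space solution each), so the two dimensions necessarily sum to n = dim X. A sketch:

```python
def rank(rows, tol=1e-12):
    """Rank of a matrix given as a list of rows, by Gaussian elimination."""
    rows = [list(map(float, r)) for r in rows]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if abs(rows[i][c]) > tol), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# A maps R^3 into R^2; its second row is twice the first, so the rank is 1.
A = [[1, 2, 3],
     [2, 4, 6]]
n = 3                # dim X
rho = rank(A)        # dim R(T)
nu = n - rho         # dim N(T), by Eq. (3.4.26)
print(rho, nu)       # 1 2
```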
Our preceding result gives rise to the next definition.
3.4.27. Definition. The rank ρ(T) of a linear transformation T of a finite-dimensional vector space X into a vector space Y is the dimension of the
range space R(T). The nullity ν(T) of the linear transformation T is the dimension of the null space N(T).
The reader is now in a position to prove the next result.
3.4.28. Theorem. Let T ∈ L(X, Y). Let X be finite dimensional, and let
s = dim N(T). Let {x₁, ..., xₛ} be a basis for N(T). Then

(i) a vector x ∈ X satisfies the equation

Tx = 0

if and only if x = α₁x₁ + ... + αₛxₛ for some set of scalars {α₁,
..., αₛ}. Furthermore, for each x ∈ X such that Tx = 0 is satisfied,
the set of scalars {α₁, ..., αₛ} is unique;

(ii) if y₀ is a fixed vector in Y, then Tx = y₀ holds for at least one x ∈ X
(called a solution of the equation Tx = y₀) if and only if y₀ ∈ R(T);
and

(iii) if y₀ is any fixed vector in Y and if x₀ is some vector in X such that
Tx₀ = y₀ (i.e., x₀ is a solution of the equation Tx = y₀), then a
vector x ∈ X satisfies Tx = y₀ if and only if x = x₀ + β₁x₁ + ...
+ βₛxₛ for some set of scalars {β₁, β₂, ..., βₛ}. Furthermore, for
each x ∈ X such that Tx = y₀, the set of scalars {β₁, β₂, ..., βₛ}
is unique.
3.4.29. Exercise. Prove Theorem 3.4.28.
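Part (iii) is the familiar description of a solution set as "one particular solution plus the null space." A toy illustration of ours: take T(ξ₁, ξ₂) = ξ₁ + ξ₂, a linear map of R² into R¹. Then N(T) is spanned by (1, −1), and every solution of Tx = 5 has the form (5, 0) + β(1, −1):

```python
T = lambda x: x[0] + x[1]        # T: R^2 -> R^1, a linear functional
x0 = (5.0, 0.0)                  # one particular solution of Tx = 5
nb = (1.0, -1.0)                 # basis vector of the null space N(T)

# Every x0 + beta*(1, -1) solves Tx = 5, one solution per scalar beta:
solutions = [(x0[0] + b * nb[0], x0[1] + b * nb[1]) for b in (-2.0, 0.0, 3.5)]
print([T(x) for x in solutions])  # [5.0, 5.0, 5.0]
```

Here s = dim N(T) = 1, so the scalar β in part (iii) is unique for each solution.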

Since a linear transformation T of a linear space X into a linear space Y
is a mapping, we can distinguish, as in Chapter 1, between linear transformations that are surjective (i.e., onto), injective (i.e., one-to-one), and bijective
(i.e., onto and one-to-one). We will often be particularly interested in
knowing when a linear transformation T has an inverse, which we denote by
T⁻¹. In this connection, the following terms are used interchangeably: T⁻¹
exists, T has an inverse, T is invertible, and T is non-singular. Also, a linear
transformation which is not non-singular is said to be singular. We recall,
if T has an inverse, then

T⁻¹(Tx) = x for all x ∈ X    (3.4.30)

and

T(T⁻¹y) = y for all y ∈ R(T).    (3.4.31)

The following theorem is a fundamental result concerning inverses of
linear transformations.
3.4.32. Theorem. Let T ∈ L(X, Y).

(i) The inverse of T exists if and only if Tx = 0 implies x = 0.
(ii) If T⁻¹ exists, then T⁻¹ is a linear transformation from R(T) onto X.
Proof. To prove part (i), assume first that Tx = 0 implies x = 0. Let
x₁, x₂ ∈ X with Tx₁ = Tx₂. Then T(x₁ − x₂) = 0 and therefore x₁ − x₂
= 0. Thus, x₁ = x₂ and T has an inverse.

Conversely, assume that T has an inverse. Let Tx = 0. Since T0 = 0, we
have T0 = Tx. Since T has an inverse, x = 0.

To prove part (ii), assume that T⁻¹ exists. To establish the linearity of
T⁻¹, let y₁ = Tx₁ and y₂ = Tx₂, where y₁, y₂ ∈ R(T) and x₁, x₂ ∈ X. Then

T⁻¹(y₁ + y₂) = T⁻¹(Tx₁ + Tx₂) = T⁻¹T(x₁ + x₂) = x₁ + x₂
             = T⁻¹(y₁) + T⁻¹(y₂).

Also, for α ∈ F we have

T⁻¹(αy₁) = T⁻¹(αTx₁) = T⁻¹(T(αx₁)) = αx₁ = αT⁻¹(y₁).

Thus, T⁻¹ is linear. It is also a mapping onto X, since every x ∈ X is the
image under T⁻¹ of some y ∈ R(T). For, if x ∈ X, then there is a y ∈ R(T)
such that Tx = y. Hence, x = T⁻¹y, and x is in the range of T⁻¹. ∎

3.4.33. Example. Consider the linear transformation T: R² → R^∞ of
Example 3.4.22. Since Tx = 0 implies x = 0, T has an inverse. We see that T
is not a mapping of R² onto R^∞; however, T is clearly a one-to-one mapping
of R² onto R(T). ∎

For finite-dimensional vector spaces we have:

3.4.34. Theorem. Let T ∈ L(X, Y). If X is finite dimensional, T has an
inverse if and only if R(T) has the same dimension as X; i.e., ρ(T) = dim X.
Proof. By Theorem 3.4.25 we have

dim N(T) + dim R(T) = dim X.

Since T has an inverse if and only if N(T) = {0}, it follows that ρ(T) = dim X
if and only if T has an inverse. ∎

For finite-dimensional linear spaces we also have:

3.4.35. Theorem. Let X and Y be finite-dimensional vector spaces of the
same dimension, say dim X = dim Y = n. Let T ∈ L(X, Y). Then R(T) = Y
if and only if T has an inverse.

Proof. Assume that T has an inverse. By Theorem 3.4.34 we know that
dim R(T) = n. Thus, dim R(T) = dim Y, and it follows from Theorem 3.3.39,
part (vii), that R(T) = Y.

Conversely, assume that R(T) = Y. Let {y₁, ..., yₙ} be a basis for
R(T). Let xᵢ be such that Txᵢ = yᵢ for i = 1, ..., n. Then, by Theorem
3.4.23, the vectors x₁, ..., xₙ are linearly independent. Since the dimension
of X is n, it follows that the vectors x₁, ..., xₙ span X. Now let Tx = 0 for
some x ∈ X. We can represent x as x = α₁x₁ + ... + αₙxₙ. Hence, 0 = Tx
= α₁y₁ + ... + αₙyₙ. Since the vectors y₁, ..., yₙ are linearly independent,
we must have α₁ = ... = αₙ = 0, and thus x = 0. This implies that T has
an inverse. ∎

At this point we find it instructive to summarize the preceding results
which characterize injective, surjective, and bijective linear transformations.
In so doing, it is useful to keep Figure J in mind.

3.4.36. Figure J. Linear transformation T from vector space X into
vector space Y.

3.4.37. Summary (Injective Linear Transformations). Let X and Y be
vector spaces over the same field F, and let T ∈ L(X, Y). The following
are equivalent:

(i) T is injective;
(ii) T has an inverse;
(iii) Tx = 0 implies x = 0;
(iv) for each y ∈ R(T), there is a unique x ∈ X such that Tx = y;
(v) if Tx₁ = Tx₂, then x₁ = x₂; and
(vi) if x₁ ≠ x₂, then Tx₁ ≠ Tx₂.

If X is finite dimensional, then the following are equivalent:

(i) T is injective; and
(ii) ρ(T) = dim X.
3.4.38. Summary (Surjective Linear Transformations). Let X and Y be
vector spaces over the same field F, and let T ∈ L(X, Y). The following are
equivalent:

(i) T is surjective; and
(ii) for each y ∈ Y, there is an x ∈ X such that Tx = y.

If X and Y are finite dimensional, then the following are equivalent:

(i) T is surjective; and
(ii) dim Y = ρ(T).
3.4.39. Summary (Bijective Linear Transformations). Let X and Y be
vector spaces over the same field F, and let T ∈ L(X, Y). The following are
equivalent:

(i) T is bijective; and
(ii) for every y ∈ Y there is a unique x ∈ X such that Tx = y.

If X and Y are finite dimensional, then the following are equivalent:

(i) T is bijective; and
(ii) dim X = dim Y = ρ(T).
3.4.40. Summary (Injective, Surjective, and Bijective Linear Transformations). Let X and Y be finite-dimensional vector spaces over the same field F, and let dim X = dim Y. (Note: this is true if, e.g., X = Y.) Let T ∈ L(X, Y). The following are equivalent:

(i) T is injective;
(ii) T is surjective;
(iii) T is bijective; and
(iv) T has an inverse.

3.4.41. Exercise. Verify the assertions made in summaries (3.4.37)-(3.4.40).
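For a concrete illustration, the equivalences of Summary 3.4.40 can be checked numerically when X = Y = Rⁿ by representing T ∈ L(Rⁿ, Rⁿ) as an n × n matrix; in the sketch below (Python with numpy, an assumption of this illustration, not part of the text), injectivity, surjectivity, and bijectivity all reduce to a single rank test.

```python
import numpy as np

# T : R^3 -> R^3 represented by a matrix; rho(T) = matrix rank.
T = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
n = T.shape[0]
rank = np.linalg.matrix_rank(T)

# Per Summary 3.4.40 (dim X = dim Y = n), injective <=> surjective
# <=> bijective <=> invertible; each condition reduces to rank == n.
injective = rank == n        # Tx = 0 only for x = 0
surjective = rank == n       # every y equals Tx for some x
bijective = injective and surjective
print(rank, bijective)       # this particular T has rank 3, hence is bijective
```

The same test applied to a rank-deficient matrix would report all four properties failing at once, which is exactly the content of the summary.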

Chapter 3 / Vector Spaces and Linear Transformations

Let us next examine some of the properties of the set L(X, Y), the set of all linear transformations from a vector space X into a vector space Y. As before, we assume that X and Y are linear spaces over the same field F. Let S, T ∈ L(X, Y), and define the sum of S and T by

(S + T)x ≜ Sx + Tx    (3.4.42)

for all x ∈ X. Also, with α ∈ F and T ∈ L(X, Y), define multiplication of T by a scalar α as

(αT)x ≜ αTx    (3.4.43)

for all x ∈ X. It is an easy matter to show that (S + T) ∈ L(X, Y) and also that αT ∈ L(X, Y). Let us further note that there exists a zero element in L(X, Y), called the zero transformation and denoted by 0, which is defined by

0x = 0    (3.4.44)

for all x ∈ X. Moreover, to each T ∈ L(X, Y) there corresponds a unique linear transformation -T ∈ L(X, Y) defined by

(-T)x = -Tx    (3.4.45)

for all x ∈ X. In this case it follows trivially that -T + T = 0.

3.4.46. Exercise. Let X be a finite-dimensional space, and let T ∈ L(X, Y). Let {e₁, ..., eₙ} be a basis for X. Then Teᵢ = 0 for i = 1, ..., n if and only if T = 0 (i.e., T is the zero transformation).

With the above definitions it is now easy to establish the following result.

3.4.47. Theorem. Let X and Y be two linear spaces over the same field of scalars F, and let L(X, Y) denote the set of all linear transformations from X into Y. Then L(X, Y) is itself a linear space over F, called the space of linear transformations (here, vector addition is defined by Eq. (3.4.42) and multiplication of vectors by scalars is defined by Eq. (3.4.43)).

3.4.48. Exercise. Prove Theorem 3.4.47.
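The vector-space operations on L(X, Y) given by Eqs. (3.4.42)-(3.4.45) can be verified numerically for matrix representations; in the following sketch (Python with numpy, assumed here for illustration), S and T are matrices acting on R³.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((2, 3))   # S, T in L(R^3, R^2), represented as 2x3 matrices
T = rng.standard_normal((2, 3))
x = rng.standard_normal(3)
alpha = 2.5

# Eq. (3.4.42): (S + T)x = Sx + Tx, and Eq. (3.4.43): (alpha T)x = alpha(Tx).
assert np.allclose((S + T) @ x, S @ x + T @ x)
assert np.allclose((alpha * T) @ x, alpha * (T @ x))

# The zero transformation (3.4.44) and the negative -T (3.4.45):
Z = np.zeros((2, 3))
assert np.allclose(Z @ x, 0)
assert np.allclose((-T) + T, Z)   # -T + T = 0, as noted in the text
print("L(X, Y) vector-space identities verified")
```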

Next, let us recall the definition of an algebra, considered in Chapter 2.

3.4.49. Definition. A set X is called an algebra if it is a linear space and if in addition to each x, y ∈ X there corresponds an element in X, denoted by x·y and called the product of x times y, satisfying the following axioms:

(i) x·(y + z) = x·y + x·z for all x, y, z ∈ X;
(ii) (x + y)·z = x·z + y·z for all x, y, z ∈ X; and
(iii) (αx)·(βy) = (αβ)(x·y) for all x, y ∈ X and for all α, β ∈ F.

If in addition to the above,

(iv) (x·y)·z = x·(y·z) for all x, y, z ∈ X,

then X is called an associative algebra.

If there exists an element i ∈ X such that i·x = x·i = x for every x ∈ X, then i is called the identity of the algebra. It can be readily shown that if i exists, then it is unique. Furthermore, if x·y = y·x for all x, y ∈ X, then X is said to be a commutative algebra. Finally, if Y is a subset of X (X is an algebra) and (a) if x + y ∈ Y whenever x, y ∈ Y, and (b) if αx ∈ Y whenever α ∈ F and x ∈ Y, and (c) if x·y ∈ Y whenever x, y ∈ Y, then Y is called a subalgebra of X.
Now let us return to the subject on hand. Let X, Y, and Z be linear spaces over F, and consider the vector spaces L(X, Y) and L(Y, Z). If S ∈ L(Y, Z) and if T ∈ L(X, Y), then we define the product ST as the mapping of X into Z characterized by

(ST)x = S(Tx)    (3.4.50)

for all x ∈ X. The reader can readily verify that ST ∈ L(X, Z).

Next, let X = Y = Z. If S, T, U ∈ L(X, X) and if α, β ∈ F, then it is easily shown that

S(TU) = (ST)U,    (3.4.51)

S(T + U) = ST + SU,    (3.4.52)

(S + T)U = SU + TU,    (3.4.53)

and

(αS)(βT) = (αβ)ST.    (3.4.54)

For example, to verify (3.4.52), we observe that

[S(T + U)]x = S[(T + U)x] = S[Tx + Ux] = (ST)x + (SU)x = (ST + SU)x

for all x ∈ X, and hence Eq. (3.4.52) follows.


We emphasize at this point that, in general, commutativity of linear transformations does not hold; i.e., in general,

ST ≠ TS.    (3.4.55)

There is a special mapping from a linear space X into X, called the identity transformation, defined by

Ix = x    (3.4.56)

for all x ∈ X. We note that I is linear, i.e., I ∈ L(X, X), that I ≠ 0 if and only if X ≠ {0}, that I is unique, and that

TI = IT = T    (3.4.57)

for all T ∈ L(X, X). Also, we can readily verify that the transformation αI, α ∈ F, defined by

(αI)x = αIx = αx    (3.4.58)

is also a linear transformation.

The above discussion gives rise to the following result.

3.4.59. Theorem. The set of linear transformations of a linear space X into X, denoted by L(X, X), is an associative algebra with identity I. This algebra is, in general, not commutative.
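The failure of commutativity asserted in Theorem 3.4.59 is easy to exhibit with 2 × 2 matrices; the following sketch (Python with numpy, assumed here for illustration) shows ST ≠ TS while associativity (3.4.51) and distributivity (3.4.52) do hold.

```python
import numpy as np

S = np.array([[0.0, 1.0],
              [0.0, 0.0]])
T = np.array([[0.0, 0.0],
              [1.0, 0.0]])

ST = S @ T
TS = T @ S
print(ST)   # [[1, 0], [0, 0]]
print(TS)   # [[0, 0], [0, 1]]
assert not np.allclose(ST, TS)        # ST != TS: L(X, X) is not commutative

# Associativity (3.4.51) and distributivity (3.4.52) nevertheless hold:
U = np.array([[1.0, 2.0],
              [3.0, 4.0]])
assert np.allclose(S @ (T @ U), (S @ T) @ U)
assert np.allclose(S @ (T + U), S @ T + S @ U)
```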
We further have:

3.4.60. Theorem. Let T ∈ L(X, X). If T is bijective, then T⁻¹ ∈ L(X, X) and

T⁻¹T = TT⁻¹ = I,    (3.4.61)

where I denotes the identity transformation defined in Eq. (3.4.56).

3.4.62. Exercise. Prove Theorem 3.4.60.

For invertible linear transformations defined on finite-dimensional linear spaces we have the following result.

3.4.63. Theorem. Let X be a finite-dimensional vector space, and let T ∈ L(X, X). Then the following are equivalent:

(i) T is invertible;
(ii) rank T = dim X;
(iii) T is one-to-one;
(iv) T is onto; and
(v) Tx = 0 implies x = 0.

3.4.64. Exercise. Prove Theorem 3.4.63.
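Theorem 3.4.63 can be explored numerically: for a matrix representation of T ∈ L(Rⁿ, Rⁿ), each of conditions (i)-(v) is a computable test. A sketch (Python with numpy, assumed here for illustration; the particular matrices are hypothetical choices):

```python
import numpy as np

# An invertible T and a singular T0 on R^2, illustrating Theorem 3.4.63.
T = np.array([[1.0, 2.0],
              [0.0, 3.0]])
T0 = np.array([[1.0, 2.0],
               [2.0, 4.0]])   # second row = 2 * first row

# (ii) rank T = dim X corresponds to (i) T is invertible:
assert np.linalg.matrix_rank(T) == 2
Tinv = np.linalg.inv(T)
assert np.allclose(Tinv @ T, np.eye(2))   # T^{-1} T = I, cf. Eq. (3.4.61)

# For T0: rank < dim X, and Tx = 0 has a nonzero solution (fails (v)).
assert np.linalg.matrix_rank(T0) == 1
x = np.array([2.0, -1.0])                 # T0 x = 0 although x != 0
assert np.allclose(T0 @ x, 0)
```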

Bijective linear transformations are further characterized by our next result.

3.4.65. Theorem. Let X be a linear space, let S, T, U ∈ L(X, X), and let I ∈ L(X, X) denote the identity transformation.

(i) If ST = US = I, then S is bijective and S⁻¹ = T = U.
(ii) If S and T are bijective, then ST is bijective, and (ST)⁻¹ = T⁻¹S⁻¹.
(iii) If S is bijective, then (S⁻¹)⁻¹ = S.
(iv) If S is bijective, then αS is bijective and (αS)⁻¹ = α⁻¹S⁻¹ for all α ∈ F with α ≠ 0.

3.4.66. Exercise. Prove Theorem 3.4.65.

With the aid of the above concepts and results we can now construct certain classes of functions of linear transformations. Since relation (3.4.51) allows us to write the product of three or more linear transformations without the use of parentheses, we can define Tⁿ, where T ∈ L(X, X) and n is a positive integer, as

Tⁿ ≜ T·T·...·T  (n times).    (3.4.67)

Similarly, if T⁻¹ is the inverse of T, then we can define T⁻ᵐ, where m is a positive integer, as

T⁻ᵐ ≜ (T⁻¹)ᵐ = T⁻¹·T⁻¹·...·T⁻¹  (m times).    (3.4.68)

It is readily verified that

Tᵐ⁺ⁿ = T·T·...·T  (m + n times) = (T·...·T)(T·...·T) = TⁿTᵐ = TᵐTⁿ,    (3.4.69)

where the first factor in the middle expression contains n terms and the second contains m terms. In a similar fashion we have

T⁻⁽ᵐ⁺ⁿ⁾ = T⁻ᵐT⁻ⁿ    (3.4.70)

and

(Tᵐ)ⁿ = Tᵐⁿ = Tⁿᵐ = (Tⁿ)ᵐ,    (3.4.71)

where m and n are positive integers. Consistent with this notation we also have

T¹ = T    (3.4.72)

and

T⁰ = I.    (3.4.73)
We are now in a position to consider polynomials of linear transformations. Thus, if f(λ) is a polynomial, i.e.,

f(λ) = α₀ + α₁λ + ... + αₙλⁿ,    (3.4.74)

where α₀, α₁, ..., αₙ ∈ F, then by f(T) we mean

f(T) = α₀I + α₁T + ... + αₙTⁿ.    (3.4.75)
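A polynomial of a linear transformation, Eq. (3.4.75), can be evaluated for a matrix representation of T; the sketch below (Python with numpy, assumed here for illustration) uses Horner's rule, which is legitimate because powers of a single T commute with one another.

```python
import numpy as np

def poly_of_operator(coeffs, T):
    """Evaluate f(T) = a0*I + a1*T + ... + an*T^n (cf. Eq. (3.4.75))
    by Horner's rule; coeffs = [a0, a1, ..., an]."""
    n = T.shape[0]
    result = np.zeros_like(T)
    for a in reversed(coeffs):
        result = result @ T + a * np.eye(n)
    return result

T = np.array([[2.0, 1.0],
              [0.0, 2.0]])
# f(lambda) = 3 + 2*lambda + lambda^2
fT = poly_of_operator([3.0, 2.0, 1.0], T)
direct = 3 * np.eye(2) + 2 * T + T @ T   # term-by-term evaluation of (3.4.75)
assert np.allclose(fT, direct)
print(fT)   # [[11, 6], [0, 11]]
```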

The reader is cautioned that the above concept can, in general, not be extended to functions of two or more linear transformations, because linear transformations in general do not commute.

Next, we consider the important concept of isomorphic linear spaces. In Chapter 2 we encountered the notion of isomorphisms of groups and rings. We saw that such mappings, if they exist, preserve the algebraic properties of groups and rings. Thus, in many cases two algebraic systems (such as groups or rings) may differ only in the nature of the elements of the underlying set and may thus be considered as being the same in all other respects. We now extend this concept to linear spaces.
3.4.76. Definition. Let X and Y be vector spaces over the same field F. If there exists T ∈ L(X, Y) such that T is a one-to-one mapping of X into Y, then T is said to be an isomorphism of X into Y. If, in addition, T maps X onto Y, then X and Y are said to be isomorphic.

Note that if X and Y are isomorphic, then clearly Y and X are isomorphic.

Our next result shows that all n-dimensional linear spaces over the same field are isomorphic.

3.4.77. Theorem. Every n-dimensional vector space X over a field F is isomorphic to Fⁿ.

Proof. Let {e₁, ..., eₙ} be a basis for X. Then every x ∈ X has the unique representation

x = ξ₁e₁ + ... + ξₙeₙ,

where {ξ₁, ξ₂, ..., ξₙ} is a unique set of scalars (belonging to F). Now let us define a linear transformation T from X into Fⁿ by

Tx = (ξ₁, ξ₂, ..., ξₙ).

It is an easy matter to verify that T is a linear transformation of X onto Fⁿ, and that it is one-to-one (the reader is invited to do so). Thus, X is isomorphic to Fⁿ. ∎
It is not difficult to establish the next result.

3.4.78. Theorem. Two finite-dimensional vector spaces X and Y over the same field F are isomorphic if and only if dim X = dim Y.

3.4.79. Exercise. Prove Theorem 3.4.78.

Theorem 3.4.77 points out the importance of the spaces Rⁿ and Cⁿ. Namely, every n-dimensional vector space over the field of real numbers is isomorphic to Rⁿ, and every n-dimensional vector space over the field of complex numbers is isomorphic to Cⁿ (see Example 3.1.10).
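The coordinate map of Theorem 3.4.77 can be made concrete for the space of real polynomials of degree less than 3 with basis {1, t, t²} (a hypothetical choice made for this sketch; Python assumed):

```python
import numpy as np

# X = real polynomials of degree < 3, basis {1, t, t^2}; the coordinate
# map T sends xi0 + xi1*t + xi2*t^2 to (xi0, xi1, xi2) in R^3.
def T(p):
    """Coordinate map X -> R^3; here a polynomial p is held as its
    coefficient list [xi0, xi1, xi2]."""
    return np.array(p, dtype=float)

p = [1.0, -2.0, 5.0]   # 1 - 2t + 5t^2
q = [0.0, 3.0, 1.0]    # 3t + t^2

# T is linear, one-to-one, and onto R^3, so X and R^3 are isomorphic:
p_plus_q = [a + b for a, b in zip(p, q)]
assert np.allclose(T(p_plus_q), T(p) + T(q))          # T(p + q) = Tp + Tq
assert np.allclose(T([2 * a for a in p]), 2 * T(p))   # T(2p) = 2 Tp
print(T(p_plus_q))
```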

3.5. LINEAR FUNCTIONALS

There is a special type of linear transformation which is so important that we give it a special name: linear functional.

We showed in Example 3.1.7 that if F is a field, then Fⁿ is a vector space over F. If, in particular, n = 1, then we may view F as being a vector space over itself. This enables us to consider linear transformations of a vector space X over F into F.

3.5.1. Definition. Let X be a vector space over a field F. A mapping f of X into F is called a functional on X. If f is a linear transformation of X into F, then we call f a linear functional on X.
We cite some specific examples of linear functionals.

3.5.2. Example. Consider the space C[a, b]. Then the mapping

f₁(x) = ∫ₐᵇ x(s) ds,  x ∈ C[a, b],    (3.5.3)

is a linear functional on C[a, b]. Also, the function defined by

f₂(x) = x(s₀),  x ∈ C[a, b],  s₀ ∈ [a, b],    (3.5.4)

is also a linear functional on C[a, b]. Furthermore, the mapping

f₃(x) = ∫ₐᵇ x(s)x₀(s) ds,    (3.5.5)

where x₀ is a fixed element of C[a, b] and where x is any element in C[a, b], is also a linear functional on C[a, b]. ∎
3.5.6. Example. Let X = Fⁿ, and denote x ∈ X by x = (ξ₁, ..., ξₙ). The mapping f₄ defined by

f₄(x) = ξ₁    (3.5.7)

is a linear functional on X. A more general form of f₄ is as follows. Let a = (α₁, ..., αₙ) ∈ X be fixed, and let x = (ξ₁, ..., ξₙ) be an arbitrary element of X. It is readily shown that the function

f₅(x) = Σᵢ₌₁ⁿ αᵢξᵢ    (3.5.8)

is a linear functional on X. ∎

3.5.9. Exercise. Show that the mappings (3.5.3), (3.5.4), (3.5.5), (3.5.7), and (3.5.8) are linear functionals.
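Concretely, the functional f₅ of Eq. (3.5.8) is a dot product with the fixed vector a; the sketch below (Python with numpy, assumed here for illustration) verifies its linearity numerically.

```python
import numpy as np

# f5 of Eq. (3.5.8): f5(x) = sum_i alpha_i * xi_i, determined by a fixed
# vector a = (alpha_1, ..., alpha_n).
a = np.array([1.0, -1.0, 2.0])

def f5(x):
    return float(a @ x)

x = np.array([3.0, 0.0, 1.0])
y = np.array([1.0, 1.0, 1.0])

# Linearity: f5(x + y) = f5(x) + f5(y) and f5(c*x) = c*f5(x).
assert np.isclose(f5(x + y), f5(x) + f5(y))
assert np.isclose(f5(4.0 * x), 4.0 * f5(x))
print(f5(x))   # 1*3 - 1*0 + 2*1 = 5.0
```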
Now let X be a linear space, and let X′ denote the set of all linear functionals on X. If f ∈ X′ is evaluated at a point x ∈ X, we write f(x). Frequently we will also find the notation

f(x) = ⟨x, f⟩    (3.5.10)

useful. In addition to Eq. (3.5.10), the notation x′(x) or x′x is sometimes used. In this case Eq. (3.5.10) becomes

f(x) = ⟨x, f⟩ = x′(x) = ⟨x, x′⟩,    (3.5.11)

where x′ is used in place of f. Now let f₁ = x′₁ and f₂ = x′₂ belong to X′, and let α ∈ F. Let us define f₁ + f₂ = x′₁ + x′₂ and αf = αx′ by

(f₁ + f₂)(x) = ⟨x, x′₁ + x′₂⟩ = ⟨x, x′₁⟩ + ⟨x, x′₂⟩ = f₁(x) + f₂(x)    (3.5.12)

and

(αf)(x) = ⟨x, αx′⟩ = α⟨x, x′⟩ = αf(x),    (3.5.13)

respectively. We denote the functional f = x′ such that f(x) = 0 for all x ∈ X by 0. If f is a linear functional, then we note that

f(x₁ + x₂) = ⟨x₁ + x₂, x′⟩ = ⟨x₁, x′⟩ + ⟨x₂, x′⟩ = f(x₁) + f(x₂),    (3.5.14)

and also,

f(αx) = ⟨αx, x′⟩ = α⟨x, x′⟩ = αf(x).    (3.5.15)
It is now a simple matter to prove the following:

3.5.16. Theorem. The space X′, with vector addition and multiplication of vectors by scalars defined by equations (3.5.12) and (3.5.13), respectively, is a vector space over F.

3.5.17. Exercise. Prove Theorem 3.5.16.

3.5.18. Definition. The linear space X′ is called the algebraic conjugate of X.

Let us now examine some of the properties of X′ for the case of finite-dimensional linear spaces. We have:

3.5.19. Theorem. Let X be a finite-dimensional vector space, and let {e₁, ..., eₙ} be a basis for X. If {α₁, ..., αₙ} is an arbitrary set of scalars, then there is a unique linear functional x′ ∈ X′ such that ⟨eᵢ, x′⟩ = αᵢ for i = 1, ..., n.

Proof. For every x ∈ X we have

x = ξ₁e₁ + ξ₂e₂ + ... + ξₙeₙ.

Now let x′ ∈ X′ be given by

⟨x, x′⟩ = Σᵢ₌₁ⁿ ξᵢαᵢ.

If x = eᵢ for some i, we have ξᵢ = 1 and ξⱼ = 0 if j ≠ i. Thus, ⟨eᵢ, x′⟩ = αᵢ for i = 1, ..., n. To show that x′ is unique, suppose there is an x̃′ ∈ X′ such that ⟨eᵢ, x̃′⟩ = αᵢ for i = 1, ..., n. It then follows that ⟨eᵢ, x̃′⟩ − ⟨eᵢ, x′⟩ = 0 for i = 1, ..., n, and so ⟨eᵢ, x̃′ − x′⟩ = 0 for i = 1, ..., n. This implies x̃′ − x′ = 0; i.e., x̃′ = x′. ∎

In our next result and on several other occasions throughout this book, we make use of the Kronecker delta.

3.5.20. Definition. Let

δᵢⱼ = 1 if i = j,  δᵢⱼ = 0 if i ≠ j,    (3.5.21)

for i, j = 1, ..., n. Then δᵢⱼ is called the Kronecker delta.

We now have:

3.5.22. Theorem. Let X be a finite-dimensional vector space. If {e₁, e₂, ..., eₙ} is a basis for X, then there is a unique basis {e′₁, e′₂, ..., e′ₙ} in X′ with the property that ⟨eᵢ, e′ⱼ⟩ = δᵢⱼ. From this it follows that if X is n-dimensional, then so is X′.

Proof. From Theorem 3.5.19 it follows that for each j = 1, ..., n a unique e′ⱼ ∈ X′ can be found such that ⟨eᵢ, e′ⱼ⟩ = δᵢⱼ. Thus, we only have to show that the set {e′₁, e′₂, ..., e′ₙ} is a linearly independent set which spans X′.

To show that {e′₁, e′₂, ..., e′ₙ} is linearly independent, let

β₁e′₁ + β₂e′₂ + ... + βₙe′ₙ = 0.

Then

0 = ⟨eⱼ, Σᵢ₌₁ⁿ βᵢe′ᵢ⟩ = Σᵢ₌₁ⁿ βᵢ⟨eⱼ, e′ᵢ⟩ = Σᵢ₌₁ⁿ βᵢδⱼᵢ = βⱼ,

and therefore we have β₁ = β₂ = ... = βₙ = 0. This proves that {e′₁, e′₂, ..., e′ₙ} is a linearly independent set.

To show that the set {e′₁, e′₂, ..., e′ₙ} spans X′, let x′ ∈ X′ and define

αᵢ = ⟨eᵢ, x′⟩.

Let x = ξ₁e₁ + ... + ξₙeₙ. We then have

⟨x, x′⟩ = ξ₁⟨e₁, x′⟩ + ... + ξₙ⟨eₙ, x′⟩ = ξ₁α₁ + ... + ξₙαₙ.

Also,

⟨x, e′ᵢ⟩ = ⟨ξ₁e₁ + ... + ξₙeₙ, e′ᵢ⟩ = ξᵢ.

Combining the above relations, we now have

⟨x, x′⟩ = α₁⟨x, e′₁⟩ + ... + αₙ⟨x, e′ₙ⟩ = ⟨x, α₁e′₁ + ... + αₙe′ₙ⟩.

From this it now follows that for any x′ ∈ X′ we have

x′ = α₁e′₁ + ... + αₙe′ₙ,

which proves our theorem. ∎

The previous result motivates the following definition.

3.5.23. Definition. The basis {e′₁, e′₂, ..., e′ₙ} of X′ in Theorem 3.5.22 is called the dual basis of {e₁, e₂, ..., eₙ}.
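For X = Rⁿ, the dual basis of Theorem 3.5.22 can be computed explicitly: if the basis vectors e₁, ..., eₙ form the columns of a matrix E, then the rows of E⁻¹ act as the dual functionals e′₁, ..., e′ₙ, since E⁻¹E = I encodes exactly ⟨eᵢ, e′ⱼ⟩ = δᵢⱼ. A sketch (Python with numpy, assumed here for illustration):

```python
import numpy as np

# Basis {e1, e2} of R^2 as the columns of E.
E = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Stacking the dual functionals e'_j as the rows of D, the condition
# <e_i, e'_j> = delta_ij reads D @ E = I, hence D = E^{-1}.
D = np.linalg.inv(E)
assert np.allclose(D @ E, np.eye(2))

# e'_j picks out the j-th coordinate of x relative to the basis {e1, e2}:
x = 3.0 * E[:, 0] + 5.0 * E[:, 1]      # x = 3*e1 + 5*e2
coords = D @ x
assert np.allclose(coords, [3.0, 5.0])
print(coords)
```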

We are now in a position to consider the algebraic transpose of a linear transformation. Let S be a linear transformation of a linear space X into a linear space Y, and let X′ and Y′ denote the algebraic conjugates of X and Y, respectively (the spaces X and Y need not be finite dimensional). For each y′ ∈ Y′ let us establish a correspondence with an element x′ ∈ X′ according to the rule

x′(x) = ⟨x, x′⟩ = ⟨Sx, y′⟩ = y′(Sx),    (3.5.24)

where x ∈ X. Let us denote the mapping defined in this way by Sᵀ: Sᵀy′ = x′, and let us rewrite Eq. (3.5.24) as

⟨x, Sᵀy′⟩ = ⟨Sx, y′⟩,  x ∈ X, y′ ∈ Y′,    (3.5.25)

to define Sᵀ. It should be noted that if S is a mapping of X into Y, then Sᵀ is a mapping of Y′ into X′, as depicted in Figure K. We now state the following formal definition.

3.5.26. Figure K. Transpose of a linear transformation. (The figure shows S: X → Y and Sᵀ: Y′ → X′.)

3.5.27. Definition. Let S be a linear transformation of a linear space X into a linear space Y over the same field F, and let X′ and Y′ denote the algebraic conjugates of X and Y, respectively. A transformation Sᵀ from Y′ into X′ such that

⟨x, Sᵀy′⟩ = ⟨Sx, y′⟩

for all x ∈ X and all y′ ∈ Y′ is called the (algebraic) transpose of S.

We now show that Sᵀ is a linear transformation.

3.5.28. Theorem. Let S ∈ L(X, Y), and let Sᵀ be the transpose of S. Then Sᵀ is a linear transformation from Y′ into X′.

Proof. Let α ∈ F, and let y′₁, y′₂ ∈ Y′. Then for all x ∈ X,

⟨x, Sᵀ(y′₁ + y′₂)⟩ = ⟨Sx, y′₁ + y′₂⟩ = ⟨Sx, y′₁⟩ + ⟨Sx, y′₂⟩ = ⟨x, Sᵀy′₁⟩ + ⟨x, Sᵀy′₂⟩.

Thus, Sᵀ(y′₁ + y′₂) = Sᵀ(y′₁) + Sᵀ(y′₂). Also,

⟨x, Sᵀ(αy′₁)⟩ = ⟨Sx, αy′₁⟩ = α⟨Sx, y′₁⟩ = α⟨x, Sᵀy′₁⟩.

Hence, Sᵀ(αy′₁) = αSᵀ(y′₁). Therefore, Sᵀ ∈ L(Y′, X′). ∎

The reader should now have no difficulties in proving the following results.

3.5.29. Theorem. Let R, S ∈ L(X, Y), and let T ∈ L(Y, Z). Let Rᵀ, Sᵀ, and Tᵀ be the transpose transformations of R, S, and T, respectively. Then,

(i) (R + S)ᵀ = Rᵀ + Sᵀ; and
(ii) (TS)ᵀ = SᵀTᵀ.

3.5.30. Theorem. Let I denote the identity element of L(X, X). Then Iᵀ is the identity element of L(X′, X′).

3.5.31. Theorem. Let 0 be the null transformation in L(X, Y). Then 0ᵀ is the null transformation in L(Y′, X′).

3.5.32. Exercise. Prove Theorems 3.5.29-3.5.31.

We will consider an important class of transpose linear transformations in Chapter 4 (transpose of a matrix).
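When X = Rⁿ and Y = Rᵐ, with functionals identified with row vectors, the algebraic transpose of Definition 3.5.27 becomes the ordinary matrix transpose; the sketch below (Python with numpy, assumed here for illustration) checks the defining relation (3.5.25) and Theorem 3.5.29(ii).

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((3, 2))    # S in L(R^2, R^3)
x = rng.standard_normal(2)
yp = rng.standard_normal(3)        # a functional y' on R^3, held as a vector

# Defining relation (3.5.25): <x, S^T y'> = <Sx, y'>.
assert np.isclose((S.T @ yp) @ x, yp @ (S @ x))

# Theorem 3.5.29(ii): (TS)^T = S^T T^T.
T = rng.standard_normal((4, 3))    # T in L(R^3, R^4)
assert np.allclose((T @ S).T, S.T @ T.T)
print("transpose identities verified")
```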

3.6. BILINEAR FUNCTIONALS

In the present section we introduce the notion of bilinear functional and examine some of the properties of this concept. Throughout the present section we concern ourselves only with real vector spaces or complex vector spaces. Thus, if X is a linear space over a field F, it will be assumed that F is either the field of real numbers, R, or the field of complex numbers, C.

3.6.1. Definition. Let X be a vector space over C. A mapping g from X into C is said to be a conjugate functional if

g(αx + βy) = ᾱg(x) + β̄g(y)    (3.6.2)

for all x, y ∈ X and for all α, β ∈ C, where ᾱ denotes the complex conjugate of α and β̄ denotes the complex conjugate of β.

If in Definition 3.6.1 the complex vector space is replaced by a real linear space, then the concept of conjugate functional reduces to that of linear functional, for in this case Eq. (3.6.2) assumes the form

g(αx + βy) = αg(x) + βg(y)    (3.6.3)

for all x, y ∈ X and for all α, β ∈ R.

3.6.4. Definition. Let X be a vector space over C. A mapping g of X × X into C is called a bilinear functional or a bilinear form if

(i) for each fixed y, g(x, y) is a linear functional in x; and
(ii) for each fixed x, g(x, y) is a conjugate functional in y.

Thus, if g is a bilinear functional, then

(a) g(αx + βy, z) = αg(x, z) + βg(y, z); and
(b) g(x, αy + βz) = ᾱg(x, y) + β̄g(x, z)

for all x, y, z ∈ X and for all α, β ∈ C.

For the case of real linear spaces the definition of bilinear functional is modified in an obvious way by deleting in Definition 3.6.4 the symbol for complex conjugates.

We leave it as an exercise to verify that the examples cited below are bilinear functionals.
3.6.5. Example. Let x, y ∈ C², where C² denotes the linear space of ordered pairs of complex numbers (if x, y ∈ C², then x = (ξ₁, ξ₂) and y = (η₁, η₂)). The function

g(x, y) = ξ₁η̄₁ + ξ₂η̄₂

is a bilinear functional. ∎

3.6.6. Example. Let x, y ∈ R², where R² denotes the linear space of ordered pairs of real numbers (if x, y ∈ R², then x = (ξ₁, ξ₂) and y = (η₁, η₂)). Let θ denote the angle between x and y. The dot product of two vectors, defined by

g(x, y) = ξ₁η₁ + ξ₂η₂ = (ξ₁² + ξ₂²)^(1/2)(η₁² + η₂²)^(1/2) cos θ,

is a bilinear functional. ∎

3.6.7. Example. Let X be an arbitrary linear space over C, and let L(x) and P(y) denote two linear functionals on X. The transformation

g(x, y) = L(x)P̄(y),

where P̄(y) denotes the complex conjugate of P(y), is a bilinear functional. ∎

3.6.8. Example. Let X be any linear space over C, and let g be a bilinear functional. The transformation h defined by

h(x, y) = ḡ(y, x)

is a bilinear functional. ∎

3.6.9. Exercise. Verify that the transformations given in Examples 3.6.5 through 3.6.8 are bilinear functionals.

We note that for any bilinear functional g we have g(0, y) = g(0 + 0, y) = g(0, y) + g(0, y), so that g(0, y) = 0 for all y ∈ X. Also, g(x, 0) = 0 for all x ∈ X.

Frequently, we find it convenient to impose certain restrictions on bilinear functionals.

3.6.10. Definition. Let X be a complex linear space. A bilinear functional g is said to be symmetric if g(x, y) = ḡ(y, x) for all x, y ∈ X. If g(x, x) ≥ 0 for all x ∈ X, then g is said to be positive. If g(x, x) > 0 for all x ≠ 0, then g is said to be strictly positive.

3.6.11. Definition. Let X be a complex vector space, and let g be a bilinear functional. We call the function ĝ: X → C defined by

ĝ(x) = g(x, x)

for all x ∈ X the quadratic form induced by g (we frequently omit the phrase "induced by g").

For example, if g(x, y) = ξ₁η̄₁ + ξ₂η̄₂, as in Example 3.6.5, then ĝ(x) = ξ₁ξ̄₁ + ξ₂ξ̄₂ = |ξ₁|² + |ξ₂|². This is a quadratic form as studied in analytic geometry.

For real linear spaces, Definitions 3.6.10 and 3.6.11 are again modified in an obvious way by ignoring complex conjugates.

3.6.12. Theorem. If ĝ is the quadratic form induced by a bilinear functional g, then

(1/2)[g(x, y) + g(y, x)] = ĝ((x + y)/2) − ĝ((x − y)/2).

Proof. By direct expansion we have

ĝ((x + y)/2) = g((x + y)/2, (x + y)/2) = (1/4)[g(x, x) + g(x, y) + g(y, x) + g(y, y)],

and also,

ĝ((x − y)/2) = (1/4)[g(x, x) − g(x, y) − g(y, x) + g(y, y)].

Thus,

(1/2)[g(x, y) + g(y, x)] = ĝ((x + y)/2) − ĝ((x − y)/2). ∎

Our next result is commonly referred to as polarization.

3.6.13. Theorem. If ĝ is the quadratic form induced by a bilinear form g on a complex vector space X, then

g(x, y) = ĝ((x + y)/2) − ĝ((x − y)/2) + iĝ((x + iy)/2) − iĝ((x − iy)/2)    (3.6.14)

for every x, y ∈ X (here i = √(−1)).

Proof. From the proof of the last theorem we have

ĝ((x + y)/2) = (1/4)[g(x, x) + g(x, y) + g(y, x) + g(y, y)]

and

ĝ((x − y)/2) = (1/4)[g(x, x) − g(x, y) − g(y, x) + g(y, y)].

Also,

iĝ((x + iy)/2) = (1/4)[ig(x, x) + g(x, y) − g(y, x) + ig(y, y)]

and

−iĝ((x − iy)/2) = (1/4)[−ig(x, x) + g(x, y) − g(y, x) − ig(y, y)].

After combining the above four expressions, Eq. (3.6.14) results. ∎

The reader can prove the next result readily.

3.6.15. Theorem. Let X be a complex vector space. If two bilinear functionals g and h are such that ĝ = ĥ, then g = h.

3.6.16. Exercise. Prove Theorem 3.6.15.
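The polarization formula of Theorem 3.6.13 can be checked numerically for a bilinear functional on C² built from a matrix (the matrix A below is a hypothetical choice made for this sketch; Python with numpy is assumed):

```python
import numpy as np

# A bilinear functional in the sense of Definition 3.6.4: linear in x,
# conjugate linear in y. Here g(x, y) = x^T A conj(y) for a fixed A.
A = np.array([[2.0, 1.0j],
              [0.5, 3.0]])

def g(x, y):
    return x @ A @ np.conj(y)

def ghat(x):                 # induced quadratic form, ghat(x) = g(x, x)
    return g(x, x)

x = np.array([1.0 + 2.0j, -1.0j])
y = np.array([0.5, 2.0 - 1.0j])

# Polarization, Eq. (3.6.14): g is recovered from ghat alone.
rhs = (ghat((x + y) / 2) - ghat((x - y) / 2)
       + 1j * ghat((x + 1j * y) / 2) - 1j * ghat((x - 1j * y) / 2))
assert np.isclose(g(x, y), rhs)
print("polarization identity verified")
```

Since A was arbitrary, the check succeeds for any such form, which is consistent with Theorem 3.6.15: on a complex space the quadratic form determines the bilinear form.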

For symmetric bilinear functionals we have:

3.6.17. Theorem. A bilinear functional g on a complex vector space X is symmetric if and only if ĝ is real (i.e., ĝ(x) is real for all x ∈ X).

Proof. Suppose that g is symmetric; i.e., suppose that

g(x, y) = ḡ(y, x)

for all x, y ∈ X. Setting x = y, we obtain

ĝ(x) = g(x, x) = ḡ(x, x),

i.e., ĝ(x) equals its own complex conjugate for all x ∈ X. But this implies that ĝ is real.

Conversely, if ĝ(x) is real for all x ∈ X, then for h(x, y) = ḡ(y, x) we have ĥ(x) = ḡ(x, x) = g(x, x) = ĝ(x). Since ĥ = ĝ, it now follows from Theorem 3.6.15 that h = g, and thus

g(x, y) = ḡ(y, x). ∎

Note that Theorems 3.6.13, 3.6.15, and 3.6.17 hold only for complex vector spaces. Theorem 3.6.15 implies that a bilinear form is uniquely determined by its induced quadratic form, and Theorem 3.6.13 gives an explicit connection between g and ĝ. In the case of real spaces, these conclusions do not follow.
3.6.18. Example. Let X = R², with x = (ξ₁, ξ₂) ∈ R² and y = (η₁, η₂) ∈ R². Define the bilinear functionals g and h by

g(x, y) = ξ₁η₁ + 6ξ₂η₁ + 2ξ₂η₂

and

h(x, y) = ξ₁η₁ + 3ξ₂η₁ + 3ξ₁η₂ + 2ξ₂η₂.

Then ĝ(x) = ĥ(x), but g ≠ h. Note that h is symmetric whereas g is not. ∎
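On a real space the quadratic form does not determine the bilinear functional, which is the point of Example 3.6.18; the sketch below (Python with numpy, assumed; the coefficients here are chosen for illustration and follow the pattern of that example) exhibits such a pair g, h.

```python
import numpy as np

# Two real bilinear forms on R^2 with the same induced quadratic form but
# g != h, in the spirit of Example 3.6.18 (coefficients chosen for this sketch).
Mg = np.array([[1.0, 0.0],
               [6.0, 2.0]])    # g(x, y) = x^T Mg y, not symmetric
Mh = np.array([[1.0, 3.0],
               [3.0, 2.0]])    # h(x, y) = x^T Mh y, symmetric

def g(x, y): return x @ Mg @ y
def h(x, y): return x @ Mh @ y

x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])

# Same quadratic forms: ghat(x) = hhat(x) for every x ...
assert np.isclose(g(x, x), h(x, x))
# ... yet the bilinear forms differ: on a real space the quadratic form
# does NOT determine the bilinear form (contrast Theorem 3.6.15).
assert not np.isclose(g(x, y), h(x, y))
```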

Using bilinear functionals, we now introduce the very important concept of inner product.

3.6.19. Definition. A strictly positive, symmetric bilinear functional g on a complex linear space X is called an inner product.

For the case of real linear spaces, the definition of inner product is identical to the above definition.
Since in a given discussion the particular bilinear functional g is always specified, we will write (x, y) in place of g(x, y) to denote an inner product. Utilizing this notation, the inner product can alternatively be defined as a rule which assigns a scalar (x, y) to every x, y ∈ X (X is a complex vector space), having the following properties:

(i) (x, x) > 0 for all x ≠ 0, and (x, x) = 0 if x = 0;
(ii) (x, y) equals the complex conjugate of (y, x) for all x, y ∈ X;
(iii) (αx + βy, z) = α(x, z) + β(y, z) for all x, y, z ∈ X and for all α, β ∈ C; and
(iv) (x, αy + βz) = ᾱ(x, y) + β̄(x, z) for all x, y, z ∈ X and for all α, β ∈ C.

In the case of real linear spaces, the preceding characterization of inner product is identical except, of course, that we omit conjugates in (i)-(iv).
We are now in a position to introduce the concept of inner product space.

3.6.20. Definition. A complex (real) linear space X on which a complex (real) inner product, (·, ·), is defined is called a complex (real) inner product space. In general, we denote this space by {X; (·, ·)}. If the particular inner product is understood, we simply write X to denote such a space (and we usually speak of an inner product space rather than a complex or real inner product space).

It should be noted that if two different inner products are defined on the same linear space X, say (·, ·)₁ and (·, ·)₂, then we have two different inner product spaces, namely, {X; (·, ·)₁} and {X; (·, ·)₂}.

Now let {X; (·, ·)′} be an inner product space, let Y be a linear subspace of X, and let (·, ·)″ denote the inner product on Y induced by the inner product on X; i.e.,

(x, y)′ = (x, y)″    (3.6.21)

for all x, y ∈ Y ⊂ X. Then {Y; (·, ·)″} is an inner product space in its own right, and we say that Y is an inner product subspace of X.

Using the concept of inner product, we are in a position to introduce the notion of orthogonality. We have:

3.6.22. Definition. Let X be an inner product space. The vectors x, y ∈ X are said to be orthogonal if (x, y) = 0. In this case we write x ⊥ y. If a vector x ∈ X is orthogonal to every vector of a set Y ⊂ X, then x is said to be orthogonal to the set Y, and we write x ⊥ Y. If every vector of set Y ⊂ X is orthogonal to every vector of set Z ⊂ X, then set Y is said to be orthogonal to set Z, and we write Y ⊥ Z.

Clearly, if x is orthogonal to y, then y is orthogonal to x. Note that if x ≠ 0, then it is not possible that x ⊥ x, because (x, x) > 0 for all x ≠ 0. Also note that 0 ⊥ x for all x ∈ X.


Before closing the present section, let us consider a few specific examples.

3.6.23. Example. Let X = Rⁿ. For x = (ξ₁, ..., ξₙ) ∈ Rⁿ and y = (η₁, ..., ηₙ) ∈ Rⁿ, we can readily verify that

(x, y) = Σᵢ₌₁ⁿ ξᵢηᵢ

is an inner product, and {X; (·, ·)} is a real inner product space. ∎

3.6.24. Example. Let X = Cⁿ. For x = (ξ₁, ..., ξₙ) ∈ Cⁿ and y = (η₁, ..., ηₙ) ∈ Cⁿ, let

(x, y) = Σᵢ₌₁ⁿ ξᵢη̄ᵢ.

Then (x, y) is an inner product and {X; (·, ·)} is a complex inner product space. ∎
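The inner product of Example 3.6.24 can be checked against properties (i)-(iv) of the preceding characterization; a sketch (Python with numpy, assumed here for illustration):

```python
import numpy as np

# The inner product of Example 3.6.24 on C^2: (x, y) = sum_i xi_i * conj(eta_i).
def ip(x, y):
    return np.sum(x * np.conj(y))

x = np.array([1.0 + 1.0j, 2.0])
y = np.array([0.0 - 1.0j, 1.0 + 3.0j])

# (i)  strict positivity: (x, x) > 0 for x != 0, and it is real;
assert ip(x, x).real > 0 and np.isclose(ip(x, x).imag, 0.0)
# (ii) (x, y) is the complex conjugate of (y, x);
assert np.isclose(ip(x, y), np.conj(ip(y, x)))
# (iii) linearity in the first argument:
z = np.array([1.0j, -1.0])
a, b = 2.0 - 1.0j, 0.5j
assert np.isclose(ip(a * x + b * z, y), a * ip(x, y) + b * ip(z, y))
```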

3.6.25. Example. Let X denote the space of continuous complex-valued functions on the interval [0, 1]. The reader can readily show that for f, g ∈ X,

(f, g) = ∫₀¹ f(t)ḡ(t) dt

is an inner product. Now consider the family of functions {fₙ} defined by

fₙ(t) = e^(i2πnt),  t ∈ [0, 1],

n = 0, 1, 2, .... Clearly, fₙ ∈ X for all n. It is easily shown that (fₘ, fₙ) = 0 if m ≠ n. Thus, fₘ ⊥ fₙ if m ≠ n. ∎

3.7. PROJECTIONS

In the present section we consider another special class of linear transformations, called projections. Such transformations, which utilize direct sums (introduced in Section 3.2) as their natural setting, will find wide applications in later parts of this book.

3.7.1. Definition. Let X be the direct sum of linear spaces X₁ and X₂; i.e., let X = X₁ ⊕ X₂. Let x = x₁ + x₂ be the unique representation of x ∈ X, where x₁ ∈ X₁ and x₂ ∈ X₂. We say that the projection on X₁ along X₂ is the transformation defined by

P(x) = x₁.

3.7.2. Figure L. Projection on X₁ along X₂.

Referring to Figure L, we note that elements in the plane X can uniquely be represented as x = x₁ + x₂, where x₁ ∈ X₁ and x₂ ∈ X₂ (X₁ and X₂ are one-dimensional linear spaces represented by the indicated lines intersecting at the origin 0). In this case, a projection P can be defined as that transformation which maps every point x in the plane X onto the subspace X₁ along the subspace X₂.
3.7.3. Theorem. Let X be the direct sum of two linear subspaces X₁ and X₂, and let P be the projection on X₁ along X₂. Then

(i) P ∈ L(X, X);
(ii) ℜ(P) = X₁; and
(iii) 𝔑(P) = X₂.
Proof. To prove the first part, note that if x = x₁ + x₂ and y = y₁ + y₂, where x₁, y₁ ∈ X₁ and x₂, y₂ ∈ X₂, then clearly

P(αx + βy) = P((αx₁ + βy₁) + (αx₂ + βy₂)) = αx₁ + βy₁ = αP(x) + βP(y),

and therefore P is a linear transformation.

To prove the second part of the theorem, we note that from the definition of P it follows that ℜ(P) ⊂ X₁. Now assume that x₁ ∈ X₁. Then Px₁ = x₁, and thus x₁ ∈ ℜ(P). This implies that X₁ ⊂ ℜ(P) and proves that ℜ(P) = X₁.

To prove the last part of the theorem, let x₂ ∈ X₂. Then Px₂ = 0, so that X₂ ⊂ 𝔑(P). On the other hand, if x ∈ 𝔑(P), then Px = 0. Since x = x₁ + x₂, where x₁ ∈ X₁ and x₂ ∈ X₂, it follows that x₁ = 0 and x ∈ X₂. Thus, X₂ ⊃ 𝔑(P). Therefore, X₂ = 𝔑(P). ∎

Our next result enables us to characterize projections in an alternative way.

3.7.4. Theorem. Let P ∈ L(X, X). Then P is a projection on ℜ(P) along 𝔑(P) if and only if PP = P² = P.

Proof. Assume that P is the projection on the linear subspace X₁ of X along the linear subspace X₂, where X = X₁ ⊕ X₂. By the preceding theorem, X₁ = ℜ(P) and X₂ = 𝔑(P). For x ∈ X, we have x = x₁ + x₂, where x₁ ∈ X₁ and x₂ ∈ X₂. Then

P²x = P(Px) = Px₁ = x₁ = Px,

and thus P² = P.

Conversely, let us assume that P² = P. Let X₂ = 𝔑(P) and let X₁ = ℜ(P). Clearly, 𝔑(P) and ℜ(P) are linear subspaces of X. We must show that X = ℜ(P) ⊕ 𝔑(P) = X₁ ⊕ X₂. In particular, we must show that ℜ(P) ∩ 𝔑(P) = {0} and that ℜ(P) and 𝔑(P) span X.

Now if y ∈ ℜ(P), there exists an x ∈ X such that Px = y. Thus, P²x = Py = Px = y. If y ∈ 𝔑(P), then Py = 0. Thus, if y is in both ℜ(P) and 𝔑(P), then we must have y = 0; i.e., ℜ(P) ∩ 𝔑(P) = {0}.

Next, let x be an arbitrary element in X. Then we have

x = Px + (I − P)x.

Letting Px = x₁ and (I − P)x = x₂, we have Px₁ = P²x = Px = x₁ and also Px₂ = P(I − P)x = Px − P²x = Px − Px = 0; i.e., x₁ ∈ X₁ and x₂ ∈ X₂. From this it follows that X = X₁ ⊕ X₂ and that the projection on X₁ along X₂ is P. ∎
The preceding result gives rise to the following:

3.7.5. Definition. Let P ∈ L(X, X). Then P is said to be idempotent if P² = P.

Now let P be the projection on a linear subspace X₁ along a linear subspace X₂. Then the projection on X₂ along X₁ is characterized in the following way.

3.7.6. Theorem. A linear transformation P is a projection on a linear subspace if and only if (I − P) is a projection. If P is the projection on X₁ along X₂, then (I − P) is the projection on X₂ along X₁.

3.7.7. Exercise. Prove Theorem 3.7.6.

In view of the preceding results there is no ambiguity in simply saying a transformation P is a projection (rather than P is a projection on X₁ along X₂). We emphasize here that if P is a projection, then

X = ℜ(P) ⊕ 𝔑(P).    (3.7.8)

This is not necessarily the case for arbitrary linear transformations T ∈ L(X, X), for, in general, ℜ(T) and 𝔑(T) need not be disjoint. For example, if there exists a vector x ∈ X such that Tx ≠ 0 and such that T²x = 0, then Tx ∈ ℜ(T) and Tx ∈ 𝔑(T).
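To make these observations concrete, the following sketch (using numpy; the particular matrices P and T are assumptions chosen for illustration) checks idempotence for a projection and exhibits a transformation T for which ℜ(T) and 𝔑(T) overlap:

```python
import numpy as np

# A projection on X1 = span{(1, 0)} along X2 = span{(0, 1)} in R^2.
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])

# Idempotence (Definition 3.7.5): P^2 = P.
assert np.allclose(P @ P, P)

# Every x splits as x = Px + (I - P)x with Px in R(P), (I - P)x in N(P).
v = np.array([3.0, -2.0])
v1, v2 = P @ v, (np.eye(2) - P) @ v
assert np.allclose(v1 + v2, v)
assert np.allclose(P @ v1, v1) and np.allclose(P @ v2, 0.0)

# By contrast, for this T we have Tx != 0 but T(Tx) = 0, so the
# nonzero vector Tx lies in both R(T) and N(T).
T = np.array([[0.0, 1.0],
              [0.0, 0.0]])
x = np.array([0.0, 1.0])
assert np.any(T @ x != 0) and np.allclose(T @ (T @ x), 0.0)
```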

Chapter 3 / Vector Spaces and Linear Transformations
Let us now consider:

3.7.9. Definition. Let T ∈ L(X, X). A linear subspace Y of a vector space X is said to be invariant under the linear transformation T if y ∈ Y implies that Ty ∈ Y.

Note that this definition does not imply that every element in Y can be written in the form z = Ty, with y ∈ Y. It is not even assumed that Ty ∈ Y implies y ∈ Y.
For invariant subspaces under a transformation T ∈ L(X, X) we can readily prove the following result.

3.7.10. Theorem. Let T ∈ L(X, X). Then

(i) X is an invariant subspace under T;
(ii) {0} is an invariant subspace under T;
(iii) ℜ(T) is an invariant subspace under T; and
(iv) 𝔑(T) is an invariant subspace under T.

3.7.11. Exercise. Prove Theorem 3.7.10.

Next we consider:

3.7.12. Definition. Let X be a linear space which is the direct sum of two linear subspaces Y and Z; i.e., X = Y ⊕ Z. If Y and Z are both invariant under a linear transformation T, then T is said to be reduced by Y and Z.

We are now in a position to prove the following result.

3.7.13. Theorem. Let Y and Z be two linear subspaces of a vector space X such that X = Y ⊕ Z. Let T ∈ L(X, X). Then T is reduced by Y and Z if and only if PT = TP, where P is the projection on Y along Z.

Proof. Assume that PT = TP. If y ∈ Y, then Ty = TPy = PTy, so that Ty ∈ Y and Y is invariant under T. Now let y ∈ Z. Then Py = 0 and PTy = TPy = T0 = 0. Thus, Ty ∈ Z and Z is also invariant under T. Hence, T is reduced by Y and Z.

Conversely, let us assume that T is reduced by Y and Z. If x ∈ X, then x = y + z, where y ∈ Y and z ∈ Z. Then Px = y and TPx = Ty ∈ Y. Hence, PTPx = Ty = TPx; i.e.,

PTPx = TPx   (3.7.14)

for all x ∈ X. On the other hand, since Y and Z are invariant under T, we have Tx = Ty + Tz with Ty ∈ Y and Tz ∈ Z. Hence, PTx = Ty = PTy = PTPx; i.e.,

PTPx = PTx   (3.7.15)

for all x ∈ X. Equations (3.7.14) and (3.7.15) imply that PT = TP. ∎
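As a quick numerical check of Theorem 3.7.13 (a sketch with numpy; the particular matrices are assumptions chosen for illustration), a transformation that leaves both Y and Z invariant commutes with the projection on Y along Z, while one that couples Z into Y does not:

```python
import numpy as np

# P projects R^2 onto Y = span{(1, 0)} along Z = span{(0, 1)}.
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])

# A block-diagonal T maps Y into Y and Z into Z, so T is reduced by
# Y and Z, and Theorem 3.7.13 predicts PT = TP.
T = np.array([[2.0, 0.0],
              [0.0, 3.0]])
assert np.allclose(P @ T, T @ P)

# A coupling term maps Z partly into Y; Z is then not invariant
# under T_bad, and the commutation fails.
T_bad = np.array([[2.0, 1.0],
                  [0.0, 3.0]])
assert not np.allclose(P @ T_bad, T_bad @ P)
```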

We close the present section by considering the following special type of projection.

3.7.16. Definition. A projection P on an inner product space X is said to be an orthogonal projection if the range of P and the null space of P are orthogonal; i.e., if ℜ(P) ⊥ 𝔑(P).

We will consider examples and additional properties of projections in much greater detail in Chapters 4 and 7.

3.8. NOTES AND REFERENCES

The material of the present chapter as well as that of the next chapter is usually referred to as linear algebra. Thus, these two chapters should be viewed as one package. For this reason, applications (dealing with ordinary differential equations) are presented at the end of the next chapter.

There are many textbooks and reference works dealing with vector spaces and linear transformations. Some of these which we have found to be very useful are cited in the references for this chapter. The reader should consult these for further study.

REFERENCES

[3.1] P. R. HALMOS, Finite-Dimensional Vector Spaces. Princeton, N.J.: D. Van Nostrand Company, Inc., 1958.
[3.2] K. HOFFMAN and R. KUNZE, Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1971.
[3.3] A. W. NAYLOR and G. R. SELL, Linear Operator Theory in Engineering and Science. New York: Holt, Rinehart and Winston, 1971.
[3.4] A. E. TAYLOR, Introduction to Functional Analysis. New York: John Wiley & Sons, Inc., 1966.

FINITE-DIMENSIONAL VECTOR SPACES AND MATRICES

In the present chapter we examine some of the properties of finite-dimensional linear spaces. We will show how elements of such spaces are represented by coordinate vectors and how linear transformations on such spaces are represented by means of matrices. We then will study some of the important properties of matrices. Also, we will investigate in some detail a special type of vector space, called the Euclidean space. This space is one of the most important spaces encountered in applied mathematics.

Throughout this chapter {α₁, ..., αₙ}, αᵢ ∈ F, and {x₁, ..., xₙ}, xᵢ ∈ X, denote an indexed set of scalars and an indexed set of vectors, respectively.

4.1. COORDINATE REPRESENTATION OF VECTORS

Let X be a finite-dimensional linear space over a field F, and let {x₁, ..., xₙ} be a basis for X. Now if x ∈ X, then according to Theorem 3.3.25 and Definition 3.3.36, there exist unique scalars ξ₁, ..., ξₙ, called the coordinates of x with respect to this basis, such that

x = ξ₁x₁ + ... + ξₙxₙ.   (4.1.1)

This enables us to represent x unambiguously in terms of its coordinates as the column

x = (ξ₁, ξ₂, ..., ξₙ)ᵀ   (4.1.2)

or as the row

xᵀ = (ξ₁, ξ₂, ..., ξₙ).   (4.1.3)

We call x (or xᵀ) the coordinate representation of the underlying object (vector) x with respect to the basis {x₁, ..., xₙ}. We call x a column vector and xᵀ a row vector. Also, we say that xᵀ is the transpose vector, or simply the transpose, of the vector x. Furthermore, we define (xᵀ)ᵀ to be x.


It is important to note that in the coordinate representation (4.1.2) or (4.1.3) of the vector (4.1.1), an "ordering" of the basis {x₁, ..., xₙ} is employed (i.e., the coefficient of xᵢ is the ith entry in Eqs. (4.1.2) and (4.1.3)). If the members of this basis were to be relabeled, thus specifying a different "ordering," then the corresponding coordinate representation of the vector x would have to be altered to reflect this change. However, this does not pose any difficulties, because in a given discussion we will always agree on a particular "ordering" of the basis vectors.
Now let α ∈ F. Then

αx = α(ξ₁x₁ + ... + ξₙxₙ) = (αξ₁)x₁ + ... + (αξₙ)xₙ.   (4.1.4)

In view of Eqs. (4.1.1)–(4.1.4) it now follows that the coordinate representation of αx with respect to the basis {x₁, ..., xₙ} is given by

αx = (αξ₁, αξ₂, ..., αξₙ)ᵀ   (4.1.5)

or

αxᵀ = α(ξ₁, ξ₂, ..., ξₙ) = (αξ₁, αξ₂, ..., αξₙ).   (4.1.6)
Next, let y ∈ X, where

y = η₁x₁ + ... + ηₙxₙ.   (4.1.7)

The coordinate representation of y with respect to the basis {x₁, ..., xₙ} is, of course,

y = (η₁, η₂, ..., ηₙ)ᵀ   (4.1.8)

or

yᵀ = (η₁, η₂, ..., ηₙ).   (4.1.9)
Now

x + y = (ξ₁x₁ + ... + ξₙxₙ) + (η₁x₁ + ... + ηₙxₙ) = (ξ₁ + η₁)x₁ + ... + (ξₙ + ηₙ)xₙ.   (4.1.10)

From Eq. (4.1.10) it now follows that the coordinate representation of the vector x + y ∈ X with respect to the basis {x₁, ..., xₙ} is given by

x + y = (ξ₁ + η₁, ..., ξₙ + ηₙ)ᵀ   (4.1.11)

or

xᵀ + yᵀ = (ξ₁, ..., ξₙ) + (η₁, ..., ηₙ) = (ξ₁ + η₁, ..., ξₙ + ηₙ).   (4.1.12)

Next, let {u₁, ..., uₙ} and {v₁, ..., vₙ} be two different bases for the linear space X. Then clearly there exist two different but unique sets of scalars (i.e., coordinates) {α₁, ..., αₙ} and {β₁, ..., βₙ} such that

x = α₁u₁ + ... + αₙuₙ = β₁v₁ + ... + βₙvₙ.   (4.1.13)

This enables us to represent the same vector x ∈ X with respect to two different bases in terms of two different but unique sets of coordinates, namely,

(α₁, ..., αₙ)ᵀ and (β₁, ..., βₙ)ᵀ.   (4.1.14)

The next two examples are intended to throw additional light on the above discussion.

4.1.15. Example. Let x = (ξ₁, ..., ξₙ) ∈ Rⁿ. Let u₁ = (1, 0, ..., 0), u₂ = (0, 1, 0, ..., 0), ..., uₙ = (0, ..., 0, 1). It is readily shown that the set {u₁, ..., uₙ} is a basis for Rⁿ. We call this basis the natural basis for Rⁿ. Noting that

x = ξ₁u₁ + ξ₂u₂ + ... + ξₙuₙ,   (4.1.16)

the unambiguous coordinate representation of x ∈ Rⁿ with respect to the natural basis of Rⁿ is

x = (ξ₁, ξ₂, ..., ξₙ)ᵀ   (4.1.17)

or xᵀ = (ξ₁, ..., ξₙ). Moreover, the coordinate representations of the basis vectors are

u₁ = (1, 0, ..., 0)ᵀ, u₂ = (0, 1, ..., 0)ᵀ, ..., uₙ = (0, ..., 0, 1)ᵀ,   (4.1.18)

respectively. We call the coordinates in Eq. (4.1.17) the natural coordinates of x ∈ Rⁿ. (The natural basis for Fⁿ and the natural coordinates of x ∈ Fⁿ are similarly defined.)

Next, consider the set of vectors {v₁, ..., vₙ} given by v₁ = (1, 0, ..., 0), v₂ = (1, 1, 0, ..., 0), ..., vₙ = (1, ..., 1). We see that the vectors {v₁, ..., vₙ} form a basis for Rⁿ. We can express the vector x given in Eq. (4.1.16) in terms of this basis by

x = α₁v₁ + ... + αₙvₙ,   (4.1.19)

where αₙ = ξₙ and αᵢ = ξᵢ − ξᵢ₊₁ for i = 1, 2, ..., n − 1. Thus, the coordinate representation of x relative to {v₁, ..., vₙ} is given by

(α₁, α₂, ..., αₙ₋₁, αₙ)ᵀ = (ξ₁ − ξ₂, ξ₂ − ξ₃, ..., ξₙ₋₁ − ξₙ, ξₙ)ᵀ.   (4.1.20)

Hence, we have represented the same vector x ∈ Rⁿ by two different coordinate vectors with respect to two different bases for Rⁿ. ∎
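The computation in Example 4.1.15 can be carried out numerically. The following sketch (numpy; taking n = 3 and a particular x as assumptions for illustration) solves for the coordinates relative to {v₁, v₂, v₃} and checks the closed-form expression in Eq. (4.1.20):

```python
import numpy as np

# Natural coordinates of x = (4, 3, 1) in R^3.
xi = np.array([4.0, 3.0, 1.0])

# Columns are v1 = (1,0,0), v2 = (1,1,0), v3 = (1,1,1).
V = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# Solving V @ alpha = xi gives the coordinates of x relative to {v_i}.
alpha = np.linalg.solve(V, xi)

# Closed form from the example: alpha_i = xi_i - xi_{i+1}, alpha_n = xi_n.
assert np.allclose(alpha, [4.0 - 3.0, 3.0 - 1.0, 1.0])

# Both coordinate vectors represent the same underlying vector x.
assert np.allclose(V @ alpha, xi)
```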
4.1.21. Example. Let X = C[a, b], the set of all real-valued continuous functions on the interval [a, b]. Let Y = {x₀, x₁, ..., xₙ} ⊂ X, where x₀(t) = 1 and xᵢ(t) = tⁱ for all t ∈ [a, b], i = 1, ..., n. As we saw in Exercise 3.3.13, Y is a linearly independent set in X and as such it is a basis for V(Y).

Hence, for any y ∈ V(Y) there exists a unique set of scalars {η₀, η₁, ..., ηₙ} such that

y = η₀x₀ + η₁x₁ + ... + ηₙxₙ.   (4.1.22)

Since y is a polynomial in t we can write, more explicitly,

y(t) = η₀ + η₁t + ... + ηₙtⁿ,   t ∈ [a, b].   (4.1.23)

In the present example there is also a coordinate representation; i.e., we can represent y ∈ V(Y) by

y = (η₀, η₁, ..., ηₙ)ᵀ.   (4.1.24)

This representation is with respect to the basis {x₀, x₁, ..., xₙ} in V(Y).

We could, of course, also have used another basis for V(Y). For example, let us choose the basis {z₀, z₁, ..., zₙ} for V(Y) given in Exercise 3.3.13. Then we have

y = α₀z₀ + α₁z₁ + ... + αₙzₙ,   (4.1.25)

where αₙ = ηₙ and αᵢ = ηᵢ − ηᵢ₊₁, i = 0, 1, ..., n − 1. Thus, y ∈ V(Y) may also be represented with respect to the basis {z₀, z₁, ..., zₙ} by

(α₀, α₁, ..., αₙ₋₁, αₙ)ᵀ = (η₀ − η₁, η₁ − η₂, ..., ηₙ₋₁ − ηₙ, ηₙ)ᵀ.   (4.1.26)

Thus, two different coordinate vectors were used above in representing the same vector y ∈ V(Y) with respect to two different bases for V(Y). ∎

Summarizing, we observe:

1. Every vector x belonging to an n-dimensional linear space X over a field F can be represented in terms of a coordinate vector x, or its transpose xᵀ, with respect to a given basis {e₁, ..., eₙ} ⊂ X. We note that xᵀ ∈ Fⁿ (the space Fⁿ is defined in Example 3.1.7). By convention we will henceforth also write x ∈ Fⁿ. To indicate the coordinate representation of x ∈ X by x ∈ Fⁿ, we write x ~ x.
2. In representing x by x, an "ordering" of the basis {e₁, ..., eₙ} ⊂ X is implied.
3. Usage of different bases for X results in different coordinate representations of x ∈ X.

4.2. MATRICES

In this section we will first concern ourselves with the representation of linear transformations on finite-dimensional vector spaces. Such representations of linear transformations are called matrices. We will then examine the properties of matrices in great detail. Throughout the present section X will denote an n-dimensional vector space and Y an m-dimensional vector space over the same field F.

A. Representation of Linear Transformations by Matrices

We first prove the following result.
4.2.1. Theorem. Let {e₁, e₂, ..., eₙ} be a basis for a linear space X.

(i) Let A be a linear transformation from X into a vector space Y, and set e′₁ = Ae₁, e′₂ = Ae₂, ..., e′ₙ = Aeₙ. If x is any vector in X and if (ξ₁, ξ₂, ..., ξₙ) are the coordinates of x with respect to {e₁, e₂, ..., eₙ}, then Ax = ξ₁e′₁ + ξ₂e′₂ + ... + ξₙe′ₙ.
(ii) Let {e′₁, e′₂, ..., e′ₙ} be any set of vectors in Y. Then there exists a unique linear transformation A from X into Y such that Ae₁ = e′₁, Ae₂ = e′₂, ..., Aeₙ = e′ₙ.
Proof. To prove (i) we note that

Ax = A(ξ₁e₁ + ξ₂e₂ + ... + ξₙeₙ) = ξ₁Ae₁ + ξ₂Ae₂ + ... + ξₙAeₙ = ξ₁e′₁ + ξ₂e′₂ + ... + ξₙe′ₙ.

To prove (ii), we first observe that for each x ∈ X we have unique scalars ξ₁, ξ₂, ..., ξₙ such that

x = ξ₁e₁ + ξ₂e₂ + ... + ξₙeₙ.

Now define a mapping A from X into Y as

A(x) = ξ₁e′₁ + ... + ξₙe′ₙ.

Clearly, A(eᵢ) = e′ᵢ for i = 1, ..., n. We first must show that A is linear. Given x = ξ₁e₁ + ξ₂e₂ + ... + ξₙeₙ and y = η₁e₁ + η₂e₂ + ... + ηₙeₙ, we have

A(x + y) = A[(ξ₁ + η₁)e₁ + ... + (ξₙ + ηₙ)eₙ] = (ξ₁ + η₁)e′₁ + ... + (ξₙ + ηₙ)e′ₙ.

On the other hand,

A(x) = ξ₁e′₁ + ... + ξₙe′ₙ and A(y) = η₁e′₁ + ... + ηₙe′ₙ.

Thus,

A(x) + A(y) = (ξ₁ + η₁)e′₁ + ... + (ξₙ + ηₙ)e′ₙ = A(x + y).

In an identical way we establish that

αA(x) = A(αx)

for all x ∈ X and all α ∈ F. It thus follows that A ∈ L(X, Y).

To show that A is unique, suppose there exists a B ∈ L(X, Y) such that Beᵢ = e′ᵢ for i = 1, ..., n. It follows that (A − B)eᵢ = 0 for all i = 1, ..., n, and thus it follows from Exercise 3.4.64 that A = B. ∎
We point out that part (i) of Theorem 4.2.1 implies that a linear transformation is completely determined by knowing how it transforms the basis vectors in its domain, and part (ii) of Theorem 4.2.1 states that this linear transformation is uniquely determined in this way. We will utilize these facts in the following.

Now let X be an n-dimensional vector space, and let {e₁, e₂, ..., eₙ} be a basis for X. Let Y be an m-dimensional vector space, and let {f₁, f₂, ..., fₘ} be a basis for Y. Let A ∈ L(X, Y), and let e′ᵢ = Aeᵢ for i = 1, ..., n. Since {f₁, f₂, ..., fₘ} is a basis for Y, there are unique scalars {aᵢⱼ}, i = 1, ..., m, j = 1, ..., n, such that

Ae₁ = e′₁ = a₁₁f₁ + a₂₁f₂ + ... + aₘ₁fₘ
Ae₂ = e′₂ = a₁₂f₁ + a₂₂f₂ + ... + aₘ₂fₘ
.....................................................
Aeₙ = e′ₙ = a₁ₙf₁ + a₂ₙf₂ + ... + aₘₙfₘ.   (4.2.2)

Now let x ∈ X. Then x has the unique representation

x = ξ₁e₁ + ξ₂e₂ + ... + ξₙeₙ   (4.2.3)

with respect to the basis {e₁, ..., eₙ}. In view of part (i) of Theorem 4.2.1 we have

Ax = ξ₁e′₁ + ξ₂e′₂ + ... + ξₙe′ₙ.

Since Ax ∈ Y, Ax has a unique representation with respect to the basis {f₁, ..., fₘ}, say,

Ax = η₁f₁ + η₂f₂ + ... + ηₘfₘ.   (4.2.4)

Combining Eqs. (4.2.2) and (4.2.3), we have

Ax = ξ₁(a₁₁f₁ + ... + aₘ₁fₘ) + ξ₂(a₁₂f₁ + ... + aₘ₂fₘ) + ... + ξₙ(a₁ₙf₁ + ... + aₘₙfₘ).

Rearranging the last expression we have

Ax = (a₁₁ξ₁ + a₁₂ξ₂ + ... + a₁ₙξₙ)f₁ + (a₂₁ξ₁ + a₂₂ξ₂ + ... + a₂ₙξₙ)f₂ + ... + (aₘ₁ξ₁ + aₘ₂ξ₂ + ... + aₘₙξₙ)fₘ.

However, in view of the uniqueness of the representation in Eq. (4.2.4), we have

η₁ = a₁₁ξ₁ + a₁₂ξ₂ + ... + a₁ₙξₙ
η₂ = a₂₁ξ₁ + a₂₂ξ₂ + ... + a₂ₙξₙ
..........................................
ηₘ = aₘ₁ξ₁ + aₘ₂ξ₂ + ... + aₘₙξₙ.   (4.2.5)

This set of equations enables us to represent the linear transformation A from linear space X into linear space Y by the unique scalars {aᵢⱼ}, i = 1, ..., m, j = 1, ..., n. For convenience we let

A = [aᵢⱼ] =
[ a₁₁  a₁₂  ...  a₁ₙ ]
[ a₂₁  a₂₂  ...  a₂ₙ ]
[ ...               ]
[ aₘ₁  aₘ₂  ...  aₘₙ ].   (4.2.6)

We see that once the bases {e₁, e₂, ..., eₙ} and {f₁, f₂, ..., fₘ} are fixed, we can represent the linear transformation A by the array of scalars in Eq. (4.2.6), which are uniquely determined by Eq. (4.2.2).

In view of part (ii) of Theorem 4.2.1, the converse to the preceding also holds. Specifically, with the bases for X and Y still fixed, the array given in Eq. (4.2.6) is uniquely associated with the linear transformation A of X into Y. The above discussion justifies the following important definition.

4.2.7. Definition. The array given in Eq. (4.2.6) is called the matrix A of the linear transformation A from linear space X into linear space Y with respect to the basis {e₁, ..., eₙ} of X and the basis {f₁, ..., fₘ} of Y.

If, in Definition 4.2.7, X = Y, and if for both X and Y the same basis {e₁, ..., eₙ} is used, then we simply speak of the matrix A of the linear transformation A with respect to the basis {e₁, ..., eₙ}.
In Eq. (4.2.6), the scalars (aᵢ₁, aᵢ₂, ..., aᵢₙ) form the ith row of A and the scalars (a₁ⱼ, a₂ⱼ, ..., aₘⱼ) form the jth column of A. The scalar aᵢⱼ refers to that element of matrix A which can be found in the ith row and jth column of A. The array in Eq. (4.2.6) is said to be an (m × n) matrix. If m = n, we speak of a square matrix (i.e., an (n × n) matrix).

In accordance with our discussion of Section 4.1, an (n × 1) matrix is called a column vector, column matrix, or n-vector, and a (1 × n) matrix is called a row vector.

We say that two (m × n) matrices A = [aᵢⱼ] and B = [bᵢⱼ] are equal if and only if aᵢⱼ = bᵢⱼ for all i = 1, ..., m and for all j = 1, ..., n.

From the preceding discussion it should be clear that the same linear transformation A from linear space X into linear space Y may be represented by different matrices, depending on the particular choice of bases in X and Y. Since it is always clear from context which particular bases are being used, we usually don't refer to them explicitly, thus avoiding cumbersome notation.
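To illustrate Definition 4.2.7, the following sketch (numpy; the choice of transformation and bases is an assumption for illustration) builds the matrix of the differentiation map on polynomials of degree ≤ 2 with respect to the basis {1, t, t²}: as in Eq. (4.2.2), each column holds the coordinates of the image of a basis vector.

```python
import numpy as np

# Differentiation D: p(t) -> p'(t) on polynomials of degree <= 2,
# with coordinates (eta0, eta1, eta2) relative to the basis {1, t, t^2}.
# Images of the basis vectors: D(1) = 0, D(t) = 1, D(t^2) = 2t, so the
# columns of the matrix are (0,0,0), (1,0,0), and (0,2,0).
D = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])

# p(t) = 4 + 3t + 5t^2 has derivative p'(t) = 3 + 10t.
p = np.array([4.0, 3.0, 5.0])
assert np.allclose(D @ p, [3.0, 10.0, 0.0])
```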
Now let Aᵀ denote the transpose of A ∈ L(X, Y) (refer to Definition 3.5.27). Our next result provides the matrix representation of Aᵀ.
4.2.8. Theorem. Let A ∈ L(X, Y) and let A denote the matrix of A with respect to the bases {e₁, ..., eₙ} in X and {f₁, ..., fₘ} in Y. Let X′ and Y′ be the algebraic conjugates of X and Y, respectively. Let Aᵀ ∈ L(Y′, X′) be the transpose of A. Let {f′₁, ..., f′ₘ} and {e′₁, ..., e′ₙ} denote the dual bases of {f₁, ..., fₘ} and {e₁, ..., eₙ}, respectively. If the matrix A is given by Eq. (4.2.6), then the matrix of Aᵀ with respect to {f′₁, ..., f′ₘ} of Y′ and {e′₁, ..., e′ₙ} of X′ is given by

Aᵀ =
[ a₁₁  a₂₁  ...  aₘ₁ ]
[ a₁₂  a₂₂  ...  aₘ₂ ]
[ ...               ]
[ a₁ₙ  a₂ₙ  ...  aₘₙ ].   (4.2.9)

Proof. Let B = [bᵢⱼ] denote the (n × m) matrix of the linear transformation Aᵀ with respect to the bases {f′₁, ..., f′ₘ} and {e′₁, ..., e′ₙ}. We want to show that B is the matrix in Eq. (4.2.9). By Eq. (4.2.2) we have

Aeᵢ = Σₖ₌₁ᵐ aₖᵢfₖ

for i = 1, ..., n, and

Aᵀf′ⱼ = Σₖ₌₁ⁿ bₖⱼe′ₖ

for j = 1, ..., m. By Theorem 3.5.22,

⟨eᵢ, e′ₖ⟩ = δᵢₖ and ⟨fₗ, f′ⱼ⟩ = δₗⱼ.

Therefore,

⟨Aeᵢ, f′ⱼ⟩ = ⟨Σₖ₌₁ᵐ aₖᵢfₖ, f′ⱼ⟩ = aⱼᵢ.

Also,

⟨Aeᵢ, f′ⱼ⟩ = ⟨eᵢ, Aᵀf′ⱼ⟩ = ⟨eᵢ, Σₖ₌₁ⁿ bₖⱼe′ₖ⟩ = bᵢⱼ.

Therefore, bᵢⱼ = aⱼᵢ, which proves the theorem. ∎

The preceding result gives rise to the following concept.

4.2.10. Definition. The matrix Aᵀ in Eq. (4.2.9) is called the transpose of matrix A.
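As a numerical sanity check of Theorem 4.2.8 (a sketch with numpy; the matrix and vectors are assumptions, and with respect to dual bases the pairing ⟨·, ·⟩ on coordinates becomes the dot product), the defining identity ⟨Ax, y′⟩ = ⟨x, Aᵀy′⟩ holds with the matrix of Aᵀ taken to be Aᵀ:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [3.0, -1.0, 4.0]])   # a 2 x 3 matrix: A in L(X, Y)
x = np.array([1.0, 0.0, 2.0])      # coordinates of x in X
y_prime = np.array([5.0, -2.0])    # coordinates of y' in Y'

lhs = np.dot(A @ x, y_prime)       # <Ax, y'>
rhs = np.dot(x, A.T @ y_prime)     # <x, A^T y'>
assert np.isclose(lhs, rhs)
```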

Our next result follows trivially from the discussion leading up to Definition 4.2.7.

4.2.11. Theorem. Let A be a linear transformation of an n-dimensional vector space X into an m-dimensional vector space Y, and let y = Ax. Let the coordinates of x with respect to the basis {e₁, e₂, ..., eₙ} be (ξ₁, ξ₂, ..., ξₙ), and let the coordinates of y with respect to the basis {f₁, f₂, ..., fₘ} be (η₁, η₂, ..., ηₘ). Let

A =
[ a₁₁  a₁₂  ...  a₁ₙ ]
[ a₂₁  a₂₂  ...  a₂ₙ ]
[ ...               ]
[ aₘ₁  aₘ₂  ...  aₘₙ ]   (4.2.12)

be the matrix of A with respect to the bases {e₁, e₂, ..., eₙ} and {f₁, f₂, ..., fₘ}. Then

a₁₁ξ₁ + a₁₂ξ₂ + ... + a₁ₙξₙ = η₁
a₂₁ξ₁ + a₂₂ξ₂ + ... + a₂ₙξₙ = η₂
..........................................
aₘ₁ξ₁ + aₘ₂ξ₂ + ... + aₘₙξₙ = ηₘ   (4.2.13)

or, equivalently,

ηᵢ = Σⱼ₌₁ⁿ aᵢⱼξⱼ,   i = 1, ..., m.   (4.2.14)

4.2.15. Exercise. Prove Theorem 4.2.11.

Using matrix and vector notation, let us agree to express the system of linear equations given by Eq. (4.2.13) equivalently as

[ a₁₁  a₁₂  ...  a₁ₙ ] [ ξ₁ ]   [ η₁ ]
[ a₂₁  a₂₂  ...  a₂ₙ ] [ ξ₂ ] = [ η₂ ]
[ ...               ] [ ... ]   [ ... ]
[ aₘ₁  aₘ₂  ...  aₘₙ ] [ ξₙ ]   [ ηₘ ]   (4.2.16)

or, more succinctly, as

Ax = y,   (4.2.17)

where xᵀ ≜ (ξ₁, ξ₂, ..., ξₙ) and yᵀ ≜ (η₁, η₂, ..., ηₘ).

In terms of xᵀ, yᵀ, and Aᵀ, let us agree to express Eq. (4.2.13) equivalently as

                  [ a₁₁  a₂₁  ...  aₘ₁ ]
(ξ₁, ξ₂, ..., ξₙ) [ a₁₂  a₂₂  ...  aₘ₂ ] = (η₁, η₂, ..., ηₘ)   (4.2.18)
                  [ ...               ]
                  [ a₁ₙ  a₂ₙ  ...  aₘₙ ]

or, in short, as

xᵀAᵀ = yᵀ.   (4.2.19)

We note that in Eq. (4.2.17), x ∈ Fⁿ, y ∈ Fᵐ, and A is an m × n matrix.

From our discussion thus far it should be clear that we can utilize matrices to study systems of linear equations which are of the form of Eq. (4.2.13). It should also be clear that an m × n matrix A is nothing more than a unique representation of a linear transformation A of an n-dimensional vector space X into an m-dimensional vector space Y over the same field F. As such, A possesses all the properties of such transformations. We could, in fact, utilize matrices in place of general linear transformations to establish many facts concerning linear transformations defined on finite-dimensional linear spaces. However, since a given matrix is dependent upon the selection of two particular sets of bases (not necessarily distinct), such practice will, in general, be avoided whenever possible.

We emphasize that a matrix and a linear transformation are not one and the same thing. In many texts no distinction in symbols is made between linear transformations and their matrix representations. We will not follow this custom.
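Equations (4.2.17) and (4.2.19) are two bookkeeping conventions for the same system of equations; the following sketch (numpy, with an assumed A and x) confirms that they agree:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])    # an m x n matrix with m = 2, n = 3
x = np.array([1.0, -1.0, 2.0])

y = A @ x        # Eq. (4.2.17): y = Ax, acting on the column vector x
y_row = x @ A.T  # Eq. (4.2.19): y^T = x^T A^T, acting on the row vector

# Both conventions produce the same coordinates (eta_1, ..., eta_m).
assert np.allclose(y, y_row)
assert np.allclose(y, [1.0 - 2.0 + 6.0, 4.0 - 5.0 + 12.0])
```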

B. Rank of a Matrix

We begin by proving the following result.

4.2.20. Theorem. Let A be a linear transformation from X into Y. Then A has rank r if and only if it is possible to choose a basis {e₁, e₂, ..., eₙ} for X and a basis {f₁, ..., fₘ} for Y such that the matrix A of A with respect to these bases is the (m × n) array

A =
[ 1  0  ...  0  0  ...  0 ]
[ 0  1  ...  0  0  ...  0 ]
[ ...                    ]
[ 0  0  ...  1  0  ...  0 ]
[ 0  0  ...  0  0  ...  0 ]
[ ...                    ]
[ 0  0  ...  0  0  ...  0 ]   (4.2.21)

whose first r diagonal entries are 1 and whose remaining entries are 0 (here m = dim Y and n = dim X).

Proof. We choose a basis for X of the form {e₁, e₂, ..., eᵣ, eᵣ₊₁, ..., eₙ}, where {eᵣ₊₁, ..., eₙ} is a basis for 𝔑(A). If f₁ = Ae₁, f₂ = Ae₂, ..., fᵣ = Aeᵣ, then {f₁, f₂, ..., fᵣ} is a basis for ℜ(A), as we saw in the proof of Theorem 3.4.25. Now choose vectors fᵣ₊₁, ..., fₘ in Y such that the set of vectors {f₁, f₂, ..., fₘ} forms a basis for Y (see Theorem 3.3.44). Then

f₁ = Ae₁ = (1)f₁ + (0)f₂ + ... + (0)fᵣ + (0)fᵣ₊₁ + ... + (0)fₘ
f₂ = Ae₂ = (0)f₁ + (1)f₂ + ... + (0)fᵣ + (0)fᵣ₊₁ + ... + (0)fₘ
.............................................................
fᵣ = Aeᵣ = (0)f₁ + (0)f₂ + ... + (1)fᵣ + (0)fᵣ₊₁ + ... + (0)fₘ   (4.2.22)
0 = Aeᵣ₊₁ = (0)f₁ + (0)f₂ + ... + (0)fᵣ + (0)fᵣ₊₁ + ... + (0)fₘ
.............................................................
0 = Aeₙ = (0)f₁ + (0)f₂ + ... + (0)fᵣ + (0)fᵣ₊₁ + ... + (0)fₘ.

The necessity is proven by applying Definition 4.2.7 (and also Eq. (4.2.2)) to the set of equations (4.2.22); the desired result given by Eq. (4.2.21) follows.

Sufficiency follows from the fact that the basis for ℜ(A) contains r linearly independent vectors. ∎
A question of practical significance is the following: if A is the matrix of a linear transformation A from linear space X into linear space Y with respect to arbitrary bases {e₁, ..., eₙ} for X and {f₁, ..., fₘ} for Y, what is the rank of A in terms of matrix A? Let ℜ(A) be the subspace of Y generated by Ae₁, Ae₂, ..., Aeₙ. Then, in view of Eq. (4.2.2), the coordinate representation of Aeᵢ, i = 1, ..., n, in Y with respect to {f₁, ..., fₘ} is given by

Ae₁ ~ (a₁₁, a₂₁, ..., aₘ₁)ᵀ, ..., Aeₙ ~ (a₁ₙ, a₂ₙ, ..., aₘₙ)ᵀ.

From this it follows that ℜ(A) consists of vectors y whose coordinate representation is

y = η₁(a₁₁, a₂₁, ..., aₘ₁)ᵀ + ... + ηₙ(a₁ₙ, a₂ₙ, ..., aₘₙ)ᵀ,   (4.2.23)

where η₁, ..., ηₙ are scalars. Since every spanning or generating set of a linear space contains a basis, we are able to select from among the vectors Ae₁, Ae₂, ..., Aeₙ a basis for ℜ(A). Suppose that the set {Ae₁, Ae₂, ..., Aeₖ} is this basis. Then the vectors Ae₁, Ae₂, ..., Aeₖ are linearly independent, and the vectors Aeₖ₊₁, ..., Aeₙ are linear combinations of the vectors Ae₁, Ae₂, ..., Aeₖ. From this there now follows:
4.2.24. Theorem. Let A ∈ L(X, Y), and let A be the matrix of A with respect to the (arbitrary) basis {e₁, e₂, ..., eₙ} for X and with respect to the (arbitrary) basis {f₁, f₂, ..., fₘ} for Y. Let the coordinate representation of y = Ax be y = Ax. Then

(i) the rank of A is the number of vectors in the largest possible linearly independent set of columns of A; and
(ii) the rank of A is the number of vectors in the smallest possible set of columns of A which has the property that all columns not in it can be expressed as linear combinations of the columns in it.

In view of this result we make the following definition.

4.2.25. Definition. The rank of an m × n matrix A is the largest number of linearly independent columns of A.
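Definition 4.2.25 is easy to check numerically; in the sketch below (numpy; the matrix is an assumption chosen for illustration) the third column is the sum of the first two, so only two columns are linearly independent:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])   # column 3 = column 1 + column 2

# The largest linearly independent set of columns has two members.
assert np.linalg.matrix_rank(A) == 2

# Dropping the dependent column leaves the rank unchanged.
assert np.linalg.matrix_rank(A[:, :2]) == 2
```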

C. Properties of Matrices

Now let X be an n-dimensional linear space, let Y be an m-dimensional linear space, let F be the field for X and Y, and let A and B be linear transformations of X into Y. Let A = [aᵢⱼ] be the matrix of A, and let B = [bᵢⱼ] be the matrix of B with respect to the bases {e₁, e₂, ..., eₙ} in X and {f₁, f₂, ..., fₘ} in Y. Using Eq. (3.4.24) as well as Definition 4.2.7, the reader can readily verify that the matrix of A + B, denoted by C ≜ A + B, is given by

C = [aᵢⱼ] + [bᵢⱼ] = [aᵢⱼ + bᵢⱼ] = [cᵢⱼ].   (4.2.26)

Using Eq. (3.4.34) and Definition 4.2.7, the reader can also easily show that the matrix of αA, denoted by D ≜ αA, is given by

D = α[aᵢⱼ] = [αaᵢⱼ] = [dᵢⱼ].   (4.2.27)

From Eq. (4.2.26) we note that, in order to be able to add two matrices A and B, they must have the same number of rows and columns. In this case we say that A and B are comparable matrices. Also, from Eq. (4.2.27) it is clear that if A is an m × n matrix, then so is αA.
Next, let Z be an r-dimensional vector space, let A ∈ L(X, Y), and let B ∈ L(Y, Z). Let A be the matrix of A with respect to the basis {e₁, e₂, ..., eₙ} in X and with respect to the basis {f₁, f₂, ..., fₘ} in Y. Let B be the matrix of B with respect to the basis {f₁, f₂, ..., fₘ} in Y and with respect to the basis {g₁, g₂, ..., gᵣ} in Z. The product mapping BA as defined by Eq. (3.4.50) is a linear transformation of X into Z. We now ask: what is the matrix C of BA with respect to the bases {e₁, e₂, ..., eₙ} of X and {g₁, g₂, ..., gᵣ} of Z? By definition of matrices A and B (see Eq. (4.2.2)), we have

Aeₖ = Σⱼ₌₁ᵐ aⱼₖfⱼ,   k = 1, ..., n,

and

Bfⱼ = Σₗ₌₁ʳ bₗⱼgₗ,   j = 1, ..., m.

Now

BAeₖ = B(Σⱼ₌₁ᵐ aⱼₖfⱼ) = Σⱼ₌₁ᵐ aⱼₖBfⱼ = Σₗ₌₁ʳ (Σⱼ₌₁ᵐ bₗⱼaⱼₖ) gₗ

for k = 1, ..., n. Thus, the matrix C of BA with respect to basis {e₁, ..., eₙ} in X and {g₁, ..., gᵣ} in Z is C = [cᵢⱼ], where

cᵢⱼ = Σₖ₌₁ᵐ bᵢₖaₖⱼ   (4.2.28)

for i = 1, ..., r and j = 1, ..., n. We write this as

C = BA.   (4.2.29)

From the preceding discussion it is clear that two matrices A and B can be multiplied to form the product BA if and only if the number of columns of B is equal to the number of rows of A. In this case we say that the matrices B and A are conformal matrices.

In arriving at Eqs. (4.2.28) and (4.2.29) we established the result given below.

4.2.30. Theorem. Let A be the matrix of A ∈ L(X, Y) with respect to the basis {e₁, e₂, ..., eₙ} in X and basis {f₁, f₂, ..., fₘ} in Y. Let B be the matrix of B ∈ L(Y, Z) with respect to basis {f₁, f₂, ..., fₘ} in Y and basis {g₁, g₂, ..., gᵣ} in Z. Then BA is the matrix of BA.
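Theorem 4.2.30 says that composing transformations corresponds to multiplying their matrices; a quick sketch (numpy, with assumed matrices A and B):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0]])        # matrix of A in L(X, Y), 3 x 2
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])   # matrix of B in L(Y, Z), 2 x 3

x = np.array([1.0, 1.0])

# Applying A and then B agrees with applying the single matrix BA.
assert np.allclose(B @ (A @ x), (B @ A) @ x)

# B (2 x 3) and A (3 x 2) are conformal: columns of B = rows of A.
assert B.shape[1] == A.shape[0]
```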
We now summarize the above discussion in the following definition.

4.2.31. Definition. Let A = [aᵢⱼ] and B = [bᵢⱼ] be two m × n matrices, let C = [cᵢⱼ] be an n × r matrix, and let α ∈ F. Then

(i) the sum of A and B is the m × n matrix

D = A + B,

where dᵢⱼ = aᵢⱼ + bᵢⱼ for all i = 1, ..., m and for all j = 1, ..., n;

(ii) the product of matrix A by the scalar α is the m × n matrix

E = αA,

where eᵢⱼ = αaᵢⱼ for all i = 1, ..., m and for all j = 1, ..., n; and

(iii) the product of matrix A and matrix C is the m × r matrix

G = AC,

where gᵢⱼ = Σₖ₌₁ⁿ aᵢₖcₖⱼ for each i = 1, ..., m and for each j = 1, ..., r.

The properties of general linear transformations established in Section 3.4 hold, of course, in the case of their matrix representation. We summarize some of these in the remainder of the present section.

4.2.32. Theorem.

(i) Let A and B be (m × n) matrices, and let C be an (n × r) matrix. Then

(A + B)C = AC + BC.   (4.2.33)

(ii) Let A be an (m × n) matrix, and let B and C be (n × r) matrices. Then

A(B + C) = AB + AC.   (4.2.34)

(iii) Let A be an (m × n) matrix, let B be an (n × r) matrix, and let C be an (r × s) matrix. Then

A(BC) = (AB)C.   (4.2.35)

(iv) Let α, β ∈ F, and let A be an (m × n) matrix. Then

(α + β)A = αA + βA.   (4.2.36)

(v) Let α ∈ F, and let A and B be (m × n) matrices. Then

α(A + B) = αA + αB.   (4.2.37)

(vi) Let α, β ∈ F, let A be an (m × n) matrix, and let B be an (n × r) matrix. Then

(αA)(βB) = (αβ)(AB).   (4.2.38)

(vii) Let A and B be (m × n) matrices. Then

A + B = B + A.   (4.2.39)

(viii) Let A, B, and C be (m × n) matrices. Then

(A + B) + C = A + (B + C).   (4.2.40)

The proofs of the next two results are left as an exercise.

4.2.41. Theorem. Let 0 ∈ L(X, Y) be the zero transformation defined by Eq. (3.4.4). Then for any bases {e₁, ..., eₙ} and {f₁, ..., fₘ} for X and Y, respectively, the linear transformation 0 is represented by the (m × n) matrix

0 =
[ 0  0  ...  0 ]
[ ...         ]
[ 0  0  ...  0 ].   (4.2.42)

The matrix 0 is called the null matrix.

4.2.43. Theorem. Let I ∈ L(X, X) be the identity transformation defined by Eq. (3.4.56). Let {e₁, ..., eₙ} be an arbitrary basis for X. Then the matrix representation of the linear transformation I from X into X with respect to the basis {e₁, ..., eₙ} is given by

I =
[ 1  0  ...  0 ]
[ 0  1  ...  0 ]
[ ...         ]
[ 0  0  ...  1 ].   (4.2.44)

I is called the (n × n) identity matrix.

4.2.45. Exercise. Prove Theorems 4.2.32, 4.2.41, and 4.2.43.

For any (m × n) matrix A we have

A + 0 = 0 + A = A,   (4.2.46)

and for any (n × n) matrix B we have

BI = IB = B,   (4.2.47)

where I is the (n × n) identity matrix.

If A = [aᵢⱼ] is a matrix of the linear transformation A, then correspondingly, −A is a matrix of the linear transformation −A, where

−A = (−1)A = [−aᵢⱼ].   (4.2.48)

It follows immediately that A + (−A) = 0, where 0 denotes the null matrix. By convention we usually write A + (−B) = A − B.

Let A and B be (n × n) matrices. Then we have, in general,

AB ≠ BA,   (4.2.49)

as was the case in Eq. (3.4.55).


Nex t ,let A E L ( X ,
X ) and assume that A is non-singular. Let A- I denote
the inverse of A. Then, by Theorem 3.4.60, ..4A1= A-1A = 1. Now if A is
the (n x n) matrix of A with respect to the basis e{ l , ,ell} in ;X then
there is an (n X n) matrix B of A- I with respect to the basis e{ u ... ,ell} in
,X such that
(4.2.50)
BA= A B= I .
We call B the inverse of A and we denote it by A- I . In this connection we use
the following terms interchangeably: A- I exists, A bas an inverse, A is
invertible, or A is non-singular. If A is not non-singular, we say A is singnlar.
With the aid of Theorem 3.4.63 the reader can readily establish the following result for matrices.
4.2.51.

Theorem. eL t A be an (n

(i) rank A = n;
(ii) Ax = 0 implies x

0;

n) matrix. The following are equivalent:

.4 2.

Matrices

141

(iii) for every oY E "F , there is a unique X o E F " such that oY =


(iv) the columns of A are linearly independent; and
(v) A - I exists.
4.2.52.

Exercise.

Ax o;

Prove Theorem .4 2.51.

We have shown that we can represent n linear equations by the matrix equation (4.2.17). Now let A be a non-singular (n × n) matrix and consider the equation

y = Ax.   (4.2.53)

If we premultiply both sides of this equation by A⁻¹ we obtain

x = A⁻¹y,   (4.2.54)

the solution to Eq. (4.2.53). Thus, knowledge of the inverse of A enables us to solve the system of linear equations (4.2.53).

In our next result, which is readily verified, some of the important properties of non-singular matrices are given.

4.2.55. Theorem.

(i) An (n × n) non-singular matrix has one and only one inverse.
(ii) If A and B are non-singular (n × n) matrices, then (AB)⁻¹ = B⁻¹A⁻¹.
(iii) If A and B are (n × n) matrices and if AB is non-singular, then so are A and B.

4.2.56. Exercise. Prove Theorem 4.2.55.
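Equation (4.2.54) can be exercised directly. The following sketch (numpy, with an assumed non-singular A) recovers x from y = Ax and also checks property (ii) of Theorem 4.2.55:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])   # non-singular: determinant = 1
x = np.array([3.0, -1.0])
y = A @ x                     # Eq. (4.2.53): y = Ax

# Eq. (4.2.54): x = A^{-1} y. (In numerical practice np.linalg.solve
# is preferred to forming the inverse explicitly.)
assert np.allclose(np.linalg.inv(A) @ y, x)
assert np.allclose(np.linalg.solve(A, y), x)

# Theorem 4.2.55(ii): (AB)^{-1} = B^{-1} A^{-1}.
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))
```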

Our next theorem summarizes some of the important properties of the
transpose of matrices. The proof of this theorem is a direct consequence of
the definition of the transpose of a matrix (see Eq. (4.2.9)).

4.2.57. Theorem.

(i) For any matrix A, (Aᵀ)ᵀ = A.
(ii) Let A and B be conformal matrices. Then (AB)ᵀ = BᵀAᵀ.
(iii) Let A be a non-singular matrix. Then (Aᵀ)⁻¹ = (A⁻¹)ᵀ.
(iv) Let A be an (n × n) matrix. Then Aᵀ is non-singular if and only if A is non-singular.
(v) Let A and B be comparable matrices. Then (A + B)ᵀ = Aᵀ + Bᵀ.
(vi) Let α ∈ F and let A be a matrix. Then (αA)ᵀ = αAᵀ.

4.2.58. Exercise. Prove Theorem 4.2.57.
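Parts (ii) and (v) are easy to check numerically. A minimal sketch (the matrices are arbitrary illustrative choices):

```python
# Verify (AB)^T = B^T A^T and (A + B)^T = A^T + B^T on small integer matrices.
# The particular matrices are arbitrary illustrative choices.

def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matadd(A, B):
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 2]]

print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True
print(transpose(matadd(A, B)) == matadd(transpose(A), transpose(B)))  # True
```

Note the order reversal in part (ii): the transpose of a product is the product of the transposes taken in the opposite order.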

Now let A be an (n × n) matrix, and let m be a positive integer. Similarly
as in Eq. (3.4.67) we define the (n × n) matrix Aᵐ by

    Aᵐ = A · A ··· A  (m times),     (4.2.59)

and if A⁻¹ exists, then similarly as in Eq. (3.4.68), we define the (n × n)
matrix A⁻ᵐ as

    A⁻ᵐ = (A⁻¹)ᵐ = A⁻¹ · A⁻¹ ··· A⁻¹  (m times).     (4.2.60)

As in the case of Eqs. (3.4.69) through (3.4.71), the usual laws of exponents
follow from the above definitions. Specifically, if A is an (n × n) matrix and
if r and s are positive integers, then

    AʳAˢ = Aʳ⁺ˢ = AˢAʳ,     (4.2.61)
    (Aʳ)ˢ = Aʳˢ = (Aˢ)ʳ,     (4.2.62)

and if A⁻¹ exists, then

    A⁻ʳA⁻ˢ = A⁻⁽ʳ⁺ˢ⁾ = A⁻ˢA⁻ʳ.     (4.2.63)

Consistent with the above notation we have

    A¹ = A     (4.2.64)

and

    A⁰ = I.     (4.2.65)

We are now once more in a position to consider functions of linear transformations, where in the present case the linear transformations are represented
by matrices. For example, if f(λ) is the polynomial in λ given in Eq. (3.4.74),
and if A is any (n × n) matrix, then by f(A) we mean

    f(A) = α₀I + α₁A + ··· + αₙAⁿ.     (4.2.66)

4.2.67. Exercise. Let A ∈ L(X, X), and let A be the matrix of A with
respect to the basis {e₁, ..., eₙ} in X. Let f(λ) be given by Eq. (3.4.74). Show
that f(A) is the matrix of f(A) with respect to the basis {e₁, ..., eₙ}.
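Evaluating Eq. (4.2.66) amounts to accumulating successive powers of A. A sketch (the coefficients and matrix are illustrative assumptions, not the polynomial of Eq. (3.4.74)):

```python
# Evaluate f(A) = a0*I + a1*A + ... + an*A^n for a square matrix A.
# The coefficients and the matrix below are illustrative choices.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_of_matrix(coeffs, A):
    """coeffs = [a0, a1, ..., an]; returns a0*I + a1*A + ... + an*A^n."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # A^0 = I
    for a in coeffs:
        result = [[result[i][j] + a * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = matmul(power, A)  # next power of A
    return result

A = [[1, 1], [0, 1]]
print(poly_of_matrix([2, 0, 1], A))  # f(t) = 2 + t^2, so f(A) = 2I + A^2
```

For this A, A² = [[1, 2], [0, 1]], so f(A) = [[3, 2], [0, 3]].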
We noted earlier that in general linear transformations and matrices do
not commute (see (3.4.55) and (4.2.49)). However, in the case of square
matrices, the reader can verify the following result easily.

4.2.68. Theorem. Let A, B, C denote (n × n) matrices, let 0 denote the
(n × n) null matrix, and let I denote the (n × n) identity matrix. Then,

(i) 0 commutes with any A;
(ii) A^p commutes with A^q, where p and q are positive integers;
(iii) αI commutes with any A, where α ∈ F; and
(iv) if A commutes with B and if A commutes with C, then A commutes with αB + βC, where α, β ∈ F.

4.2.69. Exercise. Prove Theorem 4.2.68.

Let us now consider some specific examples. [The matrix displays in
Examples 4.2.70-4.2.74 are not recoverable from this copy; the surrounding
text is reproduced.]

4.2.70. Example. Let F denote the field of real numbers, and let A, B,
and C be given real matrices. Then A + B and A - B are computed entry by
entry, and if α = 3, then αC is obtained by multiplying every entry of C by 3.

4.2.71. Example. Let F denote the field of complex numbers, let i = √(-1),
and let A and B be given complex matrices. Then A + B is computed entry
by entry, and if α = -i, then αA is obtained by multiplying every entry of
A by -i.

4.2.72. Example. Let F denote the field of real numbers, and let G and H
be real matrices for which the product GH is defined. Notice that in this
case HG is not defined.

4.2.73. Example. Let F be the field of real numbers, and let K and L be
(2 × 2) real matrices. Then KL and LK are both defined; clearly, KL ≠ LK.

4.2.74. Example. Let M and N be non-zero (2 × 2) matrices such that
MN = 0; i.e., MN = 0 even though M ≠ 0 and N ≠ 0.
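The phenomenon of Example 4.2.74 — two non-zero matrices whose product is the zero matrix — is easy to reproduce. The particular pair below is an illustrative choice (the entries of the example itself are not legible in this copy):

```python
# Two non-zero 2x2 matrices whose product is the zero matrix: matrix
# multiplication has zero divisors, unlike multiplication in a field.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

M = [[0, 1], [0, 0]]
N = [[1, 0], [0, 0]]

print(matmul(M, N))  # [[0, 0], [0, 0]], even though M != 0 and N != 0
print(matmul(N, M))  # [[0, 1], [0, 0]]: note that N M is not zero
```

This also illustrates once more that the order of the factors matters: MN = 0 while NM ≠ 0.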

4.2.75. Example. If A is as defined in Example 4.2.70, then Aᵀ is obtained
by interchanging the rows and columns of A.

4.2.76. Example. Let P and Q be (3 × 3) real matrices such that

    PQ = QP = I,

i.e., Q = P⁻¹ or, equivalently, P = Q⁻¹.

4.2.77. Example.

Consider the set of simultaneous linear equations

    2ξ₁ + ξ₂ + 0ξ₃ + ξ₄ = 0,
    4ξ₁ + 2ξ₂ + ξ₃ + 3ξ₄ = 0,     (4.2.78)
    6ξ₁ + 3ξ₂ + ξ₃ + 4ξ₄ = 0.

Equation (4.2.78) can be rewritten as

    [ 2  1  0  1 ] [ ξ₁ ]   [ 0 ]
    [ 4  2  1  3 ] [ ξ₂ ] = [ 0 ]     (4.2.79)
    [ 6  3  1  4 ] [ ξ₃ ]   [ 0 ]
                   [ ξ₄ ]

Let

    A = [ 2  1  0  1 ]
        [ 4  2  1  3 ]     (4.2.80)
        [ 6  3  1  4 ]

Matrix A is the coordinate representation of a linear transformation A ∈ L(X, Y). In this case dim X = 4 and dim Y = 3. Observe now that the first column
of A is a linear combination of the second column of A. Also, by adding the
third column of A to the second column we obtain the fourth column of A.
It follows that A has only two linearly independent columns. Hence, the rank
of A is 2. Now since dim X = dim 𝔑(A) + dim ℜ(A), the nullity of A is
also 2.
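The rank computation in this example can be checked by Gaussian elimination over the rationals. A sketch (using the matrix A of Eq. (4.2.80) as reconstructed above):

```python
# Compute the rank of a matrix over the rationals by Gaussian elimination.
from fractions import Fraction

def rank(A):
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # current pivot row
    for c in range(cols):
        # Find a row at or below r with a non-zero entry in column c.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        # Eliminate column c from the rows below the pivot.
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [M[i][j] - factor * M[r][j] for j in range(cols)]
        r += 1
    return r

A = [[2, 1, 0, 1], [4, 2, 1, 3], [6, 3, 1, 4]]
print(rank(A))  # 2, so the nullity is 4 - 2 = 2
```

Exact `Fraction` arithmetic avoids the round-off issues that make floating-point rank determination delicate.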
Next, we discuss briefly partitioned vectors and matrices. Such vectors and
matrices arise in a natural way when linear transformations acting on the
direct sum of linear spaces are considered.
Let X be an n-dimensional vector space, and let Y be an m-dimensional
vector space. Suppose that X = U ⊕ W, where U is an r-dimensional linear
subspace of X, and suppose that Y = R ⊕ Q, where R is a p-dimensional
linear subspace of Y. Let A ∈ L(X, Y), let {e₁, ..., eₙ} be a basis for X such
that {e₁, ..., eᵣ} is a basis for U, and let {f₁, ..., fₘ} be a basis for Y such
that {f₁, ..., f_p} is a basis for R. Let A be the matrix of A with respect to
these bases. Now if x ∈ Fⁿ is the coordinate representation of x ∈ X with
respect to the basis {e₁, ..., eₙ}, we can partition x into two components,

    x = [ u ]
        [ w ]     (4.2.81)

where u ∈ Fʳ and w ∈ Fⁿ⁻ʳ. Similarly, we can express y ∈ Fᵐ as

    y = [ r ]
        [ q ]     (4.2.82)

where y is the coordinate representation of y with respect to {f₁, ..., fₘ}
and where r ∈ Fᵖ and q ∈ Fᵐ⁻ᵖ. We say the vector x in Eq. (4.2.81) is partitioned into components u and w. Clearly, the vector u is determined by the
coordinates of x corresponding to the basis vectors {e₁, ..., eᵣ} in U.

We can similarly divide the matrix A into the partition

    A = [ A₁₁  A₁₂ ]
        [ A₂₁  A₂₂ ]     (4.2.83)

where A₁₁ is a (p × r) matrix, A₁₂ is a (p × (n - r)) matrix, A₂₁ is an
((m - p) × r) matrix, and A₂₂ is an ((m - p) × (n - r)) matrix. In this case, the
equation

    y = Ax     (4.2.84)

is equivalent to the pair of equations

    r = A₁₁u + A₁₂w,
    q = A₂₁u + A₂₂w.     (4.2.85)

A matrix in the form of Eq. (4.2.83) is called a partitioned matrix. The
matrices A₁₁, A₁₂, A₂₁, and A₂₂ are called submatrices of A.

The generalization of partitioning the matrix A into more than four
submatrices is accomplished in an obvious way, when the linear space X
and/or the linear space Y are the direct sum of more than two linear
subspaces.

Now let the linear spaces X and Y and the linear transformation A and
the matrix A of A still be defined as in the preceding discussion. Let Z be a
k-dimensional vector space (the spaces X, Y, and Z are vector spaces over
the same field F). Let Z = M ⊕ N, where M is a j-dimensional linear subspace of Z. Let B ∈ L(Y, Z). In a manner analogous to our preceding discussion, we represent B by the partitioned matrix

    B = [ B₁₁  B₁₂ ]
        [ B₂₁  B₂₂ ]     (4.2.86)

It is now a simple matter to show that the linear transformation BA ∈ L(X, Z) is represented by the partitioned matrix

    BA = [ B₁₁A₁₁ + B₁₂A₂₁   B₁₁A₁₂ + B₁₂A₂₂ ]
         [ B₂₁A₁₁ + B₂₂A₂₁   B₂₁A₁₂ + B₂₂A₂₂ ]     (4.2.87)
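Equation (4.2.87) says that partitioned matrices multiply "blockwise," exactly as if the blocks were scalars. The following sketch (with an arbitrary pair of 4 × 4 matrices split into 2 × 2 blocks) assembles BA block by block and compares it with the direct product:

```python
# Blockwise product of partitioned matrices: the (i,j) block of BA is
# sum_k B_ik A_kj, mirroring Eq. (4.2.87). The matrices are arbitrary examples.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matadd(A, B):
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))] for i in range(len(A))]

def block(M, rs, cs):
    """Extract the submatrix with row indices rs and column indices cs."""
    return [[M[i][j] for j in cs] for i in rs]

B = [[1, 0, 2, 1], [0, 1, 1, 0], [3, 1, 0, 2], [1, 2, 1, 1]]
A = [[2, 1, 0, 0], [1, 1, 1, 0], [0, 2, 1, 1], [1, 0, 0, 2]]

top, bot = [0, 1], [2, 3]
B11, B12 = block(B, top, top), block(B, top, bot)
B21, B22 = block(B, bot, top), block(B, bot, bot)
A11, A12 = block(A, top, top), block(A, top, bot)
A21, A22 = block(A, bot, top), block(A, bot, bot)

# Assemble BA block by block, as in Eq. (4.2.87).
C11 = matadd(matmul(B11, A11), matmul(B12, A21))
C12 = matadd(matmul(B11, A12), matmul(B12, A22))
C21 = matadd(matmul(B21, A11), matmul(B22, A21))
C22 = matadd(matmul(B21, A12), matmul(B22, A22))
blockwise = [C11[0] + C12[0], C11[1] + C12[1], C21[0] + C22[0], C21[1] + C22[1]]

print(blockwise == matmul(B, A))  # True
```

The same assembly works for any conformable partition, which is why block structure is preserved under composition of linear transformations.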

4.2.88. Theorem. Let X be an n-dimensional vector space, and let
P ∈ L(X, X). If P is a projection, then there exists a basis {e₁, ..., eₙ} for
X such that the matrix P of P with respect to this basis is of the form

    P = [ I_r  0 ]
        [ 0    0 ]     (4.2.89)

where r = dim ℜ(P), I_r denotes the (r × r) identity matrix, and the zero
blocks are of appropriate dimension (the lower right block being
((n - r) × (n - r))).

Proof. Since P is a projection we have, from Eq. (3.7.8),

    X = ℜ(P) ⊕ 𝔑(P).

Now let r = dim ℜ(P), and let {e₁, ..., eₙ} be a basis for X such that {e₁,
..., eᵣ} is a basis for ℜ(P). Let P be the matrix of P with respect to this basis,
and the theorem follows. ∎


We leave the next result as an exercise.

4.2.90. Theorem. Let X be a finite-dimensional vector space, and let
A ∈ L(X, X). If W is a p-dimensional invariant subspace of X and if
X = W ⊕ Z, then there exists a basis for X such that the matrix A of A with
respect to this basis has the form

    A = [ A₁₁  A₁₂ ]
        [ 0    A₂₂ ]

where A₁₁ is a (p × p) matrix and the remaining submatrices are of appropriate dimension.

4.2.91. Exercise. Prove Theorem 4.2.90.

4.3. EQUIVALENCE AND SIMILARITY

From the previous section it is clear that a linear transformation A of
a finite-dimensional vector space X into a finite-dimensional vector space Y
can be represented by means of different matrices, depending on the particular
choice of bases in X and Y. The choice of bases may in different cases result
in matrices that are "easy" or "hard" to utilize. Many of the resulting
"standard" forms of matrices, called canonical forms, arise because of practical considerations. Such canonical forms often exhibit inherent characteristics of the underlying transformation A. Before we can consider some of the
more important canonical forms of matrices, we need to introduce several
new concepts which are of great importance in their own right.

Throughout the present section X and Y are finite-dimensional vector spaces
over the same field F, dim X = n and dim Y = m. We begin our discussion
with the following result.
4.3.1. Theorem. Let {e₁, ..., eₙ} be a basis for a linear space X, and let
{e′₁, ..., e′ₙ} be a set of vectors in X given by

    e′ᵢ = Σⱼ₌₁ⁿ pⱼᵢeⱼ,  i = 1, ..., n,     (4.3.2)

where pᵢⱼ ∈ F for all i, j = 1, ..., n. The set {e′₁, ..., e′ₙ} forms a basis for
X if and only if P = [pᵢⱼ] is non-singular.

Proof. Let {e′₁, ..., e′ₙ} be linearly independent, and let pⱼ denote the
jth column vector of P. Let

    Σᵢ₌₁ⁿ αᵢpᵢ = 0

for some scalars α₁, ..., αₙ ∈ F. This implies that

    Σᵢ₌₁ⁿ αᵢpⱼᵢ = 0,  j = 1, ..., n.

It follows, rearranging, that

    Σᵢ₌₁ⁿ αᵢe′ᵢ = Σᵢ₌₁ⁿ αᵢ (Σⱼ₌₁ⁿ pⱼᵢeⱼ) = Σⱼ₌₁ⁿ (Σᵢ₌₁ⁿ αᵢpⱼᵢ) eⱼ = 0.

Since e′₁, ..., e′ₙ are linearly independent, it follows that α₁ = ··· = αₙ = 0.
Thus, the columns of P are linearly independent. Therefore, P is non-singular.

Conversely, let P be non-singular, i.e., let {p₁, ..., pₙ} be a linearly independent set of vectors in Fⁿ. Let

    Σᵢ₌₁ⁿ αᵢe′ᵢ = 0

for some scalars α₁, ..., αₙ ∈ F. Then

    Σⱼ₌₁ⁿ (Σᵢ₌₁ⁿ αᵢpⱼᵢ) eⱼ = 0.

Since {e₁, ..., eₙ} is a linearly independent set, it follows that Σᵢ₌₁ⁿ αᵢpⱼᵢ = 0
for j = 1, ..., n, and thus, Σᵢ₌₁ⁿ αᵢpᵢ = 0. Since {p₁, ..., pₙ} is a linearly
independent set, it now follows that α₁ = ··· = αₙ = 0, and therefore
{e′₁, ..., e′ₙ} is a linearly independent set. ∎

The preceding result gives rise to:

4.3.3. Definition. The matrix P of Theorem 4.3.1 is called the matrix of
basis {e′₁, ..., e′ₙ} with respect to basis {e₁, ..., eₙ}.

We note that since P is non-singular, P⁻¹ exists. Thus, we can readily
prove the next result.

4.3.4. Theorem. Let {e₁, ..., eₙ} and {e′₁, ..., e′ₙ} be two bases for X,
and let P be the matrix of basis {e′₁, ..., e′ₙ} with respect to basis {e₁, ..., eₙ}.
Then P⁻¹ is the matrix of basis {e₁, ..., eₙ} with respect to the basis {e′₁,
..., e′ₙ}.

4.3.5. Exercise. Prove Theorem 4.3.4.

The next result is also easily verified.

4.3.6. Theorem. Let X be a linear space, and let the sets of vectors {e₁,
..., eₙ}, {e′₁, ..., e′ₙ}, and {e″₁, ..., e″ₙ} be bases for X. If P is the matrix
of basis {e′₁, ..., e′ₙ} with respect to basis {e₁, ..., eₙ} and if Q is the matrix
of basis {e″₁, ..., e″ₙ} with respect to basis {e′₁, ..., e′ₙ}, then PQ is the
matrix of basis {e″₁, ..., e″ₙ} with respect to basis {e₁, ..., eₙ}.

4.3.7. Exercise. Prove Theorem 4.3.6.

We now prove:
4.3.8. Theorem. Let {e₁, ..., eₙ} and {e′₁, ..., e′ₙ} be two bases for a linear
space X, and let P be the matrix of basis {e′₁, ..., e′ₙ} with respect to basis
{e₁, ..., eₙ}. Let x ∈ X and let x denote the coordinate representation of x
with respect to the basis {e₁, ..., eₙ}. Let x′ denote the coordinate representation of x with respect to the basis {e′₁, ..., e′ₙ}. Then Px′ = x.

Proof. Let xᵀ = (ξ₁, ..., ξₙ) and let (x′)ᵀ = (ξ′₁, ..., ξ′ₙ). Then

    x = Σⱼ₌₁ⁿ ξⱼeⱼ  and  x = Σⱼ₌₁ⁿ ξ′ⱼe′ⱼ.

Thus,

    Σⱼ₌₁ⁿ ξ′ⱼe′ⱼ = Σⱼ₌₁ⁿ ξ′ⱼ (Σᵢ₌₁ⁿ pᵢⱼeᵢ) = Σᵢ₌₁ⁿ (Σⱼ₌₁ⁿ pᵢⱼξ′ⱼ) eᵢ,

which implies that

    ξᵢ = Σⱼ₌₁ⁿ pᵢⱼξ′ⱼ,  i = 1, ..., n.

Therefore, x = Px′. ∎
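Theorem 4.3.8 is readily checked numerically: the columns of P are the new basis vectors expressed in the old basis (cf. the exercise that follows). A sketch, with an illustrative basis of R² and an illustrative vector:

```python
# Verify Px' = x: the columns of P are the new basis vectors expressed in
# the old basis. The basis and test vector below are illustrative choices.

def matvec(P, v):
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]

# New basis e1' = (1, 1), e2' = (1, -1) w.r.t. the natural basis of R^2:
# P has these vectors as its columns.
P = [[1, 1], [1, -1]]

x = [3, 1]          # coordinates of a vector w.r.t. the natural basis
x_prime = [2, 1]    # the same vector: 2*(1, 1) + 1*(1, -1) = (3, 1)

print(matvec(P, x_prime) == x)  # True: Px' = x
```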

4.3.9. Exercise. Let X = Rⁿ and let {u₁, ..., uₙ} be the natural basis
for Rⁿ (see Example 4.1.15). Let {e₁, ..., eₙ} be another basis for Rⁿ, and let
e₁, ..., eₙ be the coordinate representations of e₁, ..., eₙ, respectively, with
respect to the natural basis. Show that the matrix of basis {e₁, ..., eₙ} with
respect to basis {u₁, ..., uₙ} is given by P = [e₁, e₂, ..., eₙ], i.e., the matrix
whose columns are the column vectors e₁, ..., eₙ.
4.3.10. Theorem. Let A ∈ L(X, Y), and let {e₁, ..., eₙ} and {f₁, ..., fₘ}
be bases for X and Y, respectively. Let A be the matrix of A with respect
to the bases {e₁, ..., eₙ} in X and {f₁, ..., fₘ} in Y. Let {e′₁, ..., e′ₙ} be
another basis for X, and let the matrix of {e′₁, ..., e′ₙ} with respect to {e₁, ...,
eₙ} be P. Let {f′₁, ..., f′ₘ} be another basis for Y, and let Q be the matrix of
{f₁, ..., fₘ} with respect to {f′₁, ..., f′ₘ}. Let A′ be the matrix of A with respect
to the bases {e′₁, ..., e′ₙ} in X and {f′₁, ..., f′ₘ} in Y. Then

    A′ = QAP.

Proof. We have

    Ae′ᵢ = A(Σₖ₌₁ⁿ pₖᵢeₖ) = Σₖ₌₁ⁿ pₖᵢAeₖ = Σₖ₌₁ⁿ pₖᵢ (Σₗ₌₁ᵐ aₗₖfₗ)
         = Σₖ₌₁ⁿ pₖᵢ [Σₗ₌₁ᵐ aₗₖ (Σⱼ₌₁ᵐ qⱼₗf′ⱼ)]
         = Σⱼ₌₁ᵐ (Σₗ₌₁ᵐ Σₖ₌₁ⁿ qⱼₗaₗₖpₖᵢ) f′ⱼ.

Now, by definition, Ae′ᵢ = Σⱼ₌₁ᵐ a′ⱼᵢf′ⱼ. Since a matrix of a linear transformation
is uniquely determined once the bases are specified, we conclude that

    a′ⱼᵢ = Σₗ₌₁ᵐ Σₖ₌₁ⁿ qⱼₗaₗₖpₖᵢ

for j = 1, ..., m and i = 1, ..., n. Therefore, A′ = QAP. ∎

In Figure A, Theorem 4.3.10 is depicted schematically.

[4.3.11. Figure A. Schematic diagram of Theorem 4.3.10: A maps x to
y = Ax with respect to {e₁, ..., eₙ} and {f₁, ..., fₘ}; A′ maps x′ to y′ with
respect to {e′₁, ..., e′ₙ} and {f′₁, ..., f′ₘ}; the coordinates are related by
x = Px′ and y′ = Qy.]

The preceding result motivates the following definition.

4.3.12. Definition. An (m × n) matrix A′ is said to be equivalent to an
(m × n) matrix A if there exists an (m × m) non-singular matrix Q and an
(n × n) non-singular matrix P such that

    A′ = QAP.     (4.3.13)

If A′ is equivalent to A, we write A′ ~ A.


Thus, an (m × n) matrix A′ is equivalent to an (m × n) matrix A if and
only if A and A′ can be interpreted as both being matrices of the same linear
transformation A of a linear space X into a linear space Y, but with respect
to possibly different choices of bases.

Our next result shows that ~ is reflexive, symmetric, and transitive,
and as such is an equivalence relation.

4.3.14. Theorem. Let A, B, and C be (m × n) matrices. Then

(i) A is always equivalent to A;
(ii) if A is equivalent to B, then B is equivalent to A; and
(iii) if A is equivalent to B and B is equivalent to C, then A is equivalent to C.

4.3.15. Exercise. Prove Theorem 4.3.14.

The reader can prove the next result readily.

4.3.16. Theorem. Let A and B be (m × n) matrices. Then

(i) every matrix A is equivalent to a matrix of the form

    [ I_r  0 ]
    [ 0    0 ]     (4.3.17)

where I_r is the (r × r) identity matrix and r = rank A;

(ii) two (m × n) matrices A and B are equivalent if and only if they have the same rank; and
(iii) A and Aᵀ have the same rank.

4.3.18. Exercise. Prove Theorem 4.3.16.

Our definition of rank of a matrix given in the last section (Definition
4.2.25) is sometimes called the column rank of a matrix. Sometimes, an analogous definition for row rank of a matrix is also considered. The above theorem
shows that the row rank of a matrix is equal to its column rank.

Next, let us consider the special case when X = Y. We have:

4.3.19. Theorem. Let A ∈ L(X, X), let {e₁, ..., eₙ} be a basis for X, and
let A be the matrix of A with respect to {e₁, ..., eₙ}. Let {e′₁, ..., e′ₙ} be
another basis for X whose matrix with respect to {e₁, ..., eₙ} is P. Let A′
be the matrix of A with respect to {e′₁, ..., e′ₙ}. Then

    A′ = P⁻¹AP.     (4.3.20)

The meaning of the above theorem is depicted schematically in Figure B.
The proof of this theorem is just a special application of Theorem 4.3.10.

[4.3.21. Figure B. Schematic diagram of Theorem 4.3.19: A acts with
respect to {e₁, ..., eₙ}; A′ acts with respect to {e′₁, ..., e′ₙ}; the two bases
are related by P.]

Theorem 4.3.19 gives rise to the following concept.

4.3.22. Definition. An (n × n) matrix A′ is said to be similar to an (n × n)
matrix A if there exists an (n × n) non-singular matrix P such that

    A′ = P⁻¹AP.     (4.3.23)

If A′ is similar to A, we write A′ ~ A. We call P a similarity transformation.

It is a simple matter to prove the following:

4.3.24. Theorem. Let A′ be similar to A; i.e., A′ = P⁻¹AP, where P is
non-singular. Then A is similar to A′ and A = PA′P⁻¹.

In view of this result, there is no ambiguity in saying two matrices are
similar.

To sum up, if two matrices A and A′ represent the same linear transformation A ∈ L(X, X), possibly with respect to two different bases for X,
then A and A′ are similar matrices.

Our next result shows that ~ given in Definition 4.3.22 is an equivalence
relation.

4.3.25. Theorem. Let A, B, and C be (n × n) matrices. Then

(i) A is similar to A;
(ii) if A is similar to B, then B is similar to A; and
(iii) if A is similar to B and if B is similar to C, then A is similar to C.

4.3.26. Exercise. Prove Theorem 4.3.25.

For similar matrices we also have the following result.

4.3.27. Theorem.

(i) If an (n × n) matrix A is similar to an (n × n) matrix B, then Aᵏ is
similar to Bᵏ, where k is a positive integer.
(ii) Let

    f(λ) = α₀ + α₁λ + ··· + αₙλⁿ,     (4.3.28)

where α₀, ..., αₙ ∈ F. Then

    f(P⁻¹AP) = P⁻¹f(A)P.     (4.3.29)

This implies that if B is similar to A, then f(B) is similar to f(A). In fact,
the same matrix P is involved.
(iii) Let A′ be similar to A, and let f(λ) denote the polynomial of Eq.
(4.3.28). Then f(A) = 0 if and only if f(A′) = 0.
(iv) Let A ∈ L(X, X), and let A be the matrix of A with respect to a
basis {e₁, ..., eₙ} in X. Let f(λ) denote the polynomial of Eq.
(4.3.28). Then f(A) is the matrix of f(A) with respect to the basis
{e₁, ..., eₙ}.
(v) Let A ∈ L(X, X), and let f(λ) denote the polynomial of Eq. (4.3.28).
Let A be any matrix of A. Then f(A) = 0 if and only if f(A) = 0.

4.3.30. Exercise. Prove Theorem 4.3.27.

We can use results such as the preceding ones to good advantage. For
example, let A′ denote the matrix

    A′ = [ λ₁  0   ···  0  ]
         [ 0   λ₂  ···  0  ]     (4.3.31)
         [ ···             ]
         [ 0   0   ···  λₙ ]

Then

    (A′)ᵏ = [ λ₁ᵏ  0    ···  0   ]
            [ 0    λ₂ᵏ  ···  0   ]
            [ ···                ]
            [ 0    0    ···  λₙᵏ ]

Now let f(λ) be given by Eq. (4.3.28). Then

    f(A′) = α₀I + α₁A′ + ··· + αₙ(A′)ⁿ = [ f(λ₁)  0      ···  0     ]
                                         [ 0      f(λ₂)  ···  0     ]
                                         [ ···                      ]
                                         [ 0      0      ···  f(λₙ) ]

We conclude the present section with the following definition.


4.3.32. Definition. We call a matrix of the form (4.3.31) a diagonal matrix.
Specifically, a square (n × n) matrix A = [aᵢⱼ] is said to be a diagonal
matrix if aᵢⱼ = 0 for all i ≠ j. In this case we write A = diag(a₁₁, a₂₂, ...,
aₙₙ).

4.4. DETERMINANTS OF MATRICES

At this point of our development we need to consider the important
topic of determinants. After stating the definition of the determinant of a
matrix, we explore some of the commonly used properties of determinants.
We then characterize singular and non-singular linear transformations on
finite-dimensional vector spaces in terms of determinants. Finally, we give
a method of determining the inverse of non-singular matrices.

Let N = {1, 2, ..., n}. We recall (see Definition 1.2.28) that a permutation
on N is a one-to-one mapping of N onto itself. For example, if σ denotes a
permutation on N, then we can represent it as

    σ = ( 1   2  ···  n  )
        ( j₁  j₂ ···  jₙ )

where jᵢ ∈ N for i = 1, ..., n and jᵢ ≠ jₖ for i ≠ k. Henceforth, we represent
σ given above, more compactly, as

    σ = j₁j₂ ··· jₙ.

Clearly, there are n! possible permutations on N. We let P(N) denote the
set of all permutations on N, and we distinguish between odd and even
permutations. Specifically, if there is an even number of pairs (i, k) such
that i > k but i precedes k in σ, then we say that σ is even. Otherwise σ is
said to be odd. Finally, we define the function sgn from P(N) into F by

    sgn(σ) = {  1 if σ is even
             { -1 if σ is odd

for all σ ∈ P(N).

Before giving the definition of the determinant of a matrix, let us consider
a specific example.

4.4.1. Example. As indicated in the accompanying table, there are six
permutations on N = {1, 2, 3}. In this table the odd and even permutations
are identified and the function sgn is given.

    σ     (j₁, j₂)  (j₁, j₃)  (j₂, j₃)  odd or even  sgn σ
    123   (1, 2)    (1, 3)    (2, 3)    even         +1
    132   (1, 3)    (1, 2)    (3, 2)    odd          -1
    213   (2, 1)    (2, 3)    (1, 3)    odd          -1
    231   (2, 3)    (2, 1)    (3, 1)    even         +1
    312   (3, 1)    (3, 2)    (1, 2)    even         +1
    321   (3, 2)    (3, 1)    (2, 1)    odd          -1

Now let A denote the (n × n) matrix

    A = [ a₁₁  a₁₂  ···  a₁ₙ ]
        [ a₂₁  a₂₂  ···  a₂ₙ ]
        [ ···                ]
        [ aₙ₁  aₙ₂  ···  aₙₙ ]

We form the product of n elements from A by taking one and only one
element from each row and one and only one element from each column. We
represent this product as

    a₁ⱼ₁a₂ⱼ₂ ··· aₙⱼₙ,

where σ = (j₁j₂ ··· jₙ) ∈ P(N). It is possible to find n! such products, one for
each σ ∈ P(N). We now define the determinant of A, denoted by det(A), by
the sum

    det(A) = Σ_{σ∈P(N)} sgn(σ) a₁ⱼ₁a₂ⱼ₂ ··· aₙⱼₙ,     (4.4.2)

where σ = j₁ ··· jₙ. We also denote the determinant of A by writing

    det(A) = | a₁₁  a₁₂  ···  a₁ₙ |
             | a₂₁  a₂₂  ···  a₂ₙ |     (4.4.3)
             | ···                |
             | aₙ₁  aₙ₂  ···  aₙₙ |
We now present some of the fundamental properties of determinants.


.4 .4 .4

Theorem.

eL t A and B be (n

x n) matrices.

(i) det (AT) = det (A).


(ii) If all elements of a column (or row) of A are ez ro, then det (A) = O.
(iii) IfB is the matrix obtained by multiplying every element in a column
(or row) of A by a constant tx, while all other columns of B are the
same as those in A, then det (B) = tx det (A).
(iv) If B is the same as A, except that two columns (or rows) are interchanged, then det (B) = - d et (A).
(v) If two columns (or rows) of A are identical, then det (A) = O.
(vi) If the columns (or rows) of A are linearly dependent, then det
(A) = O.

Proof To prove the first part, we note first that each product in the sum
given in Eq. (4..4 2) has as a factor one and only one element from each
column and each row of A. Thus, transposing matrix A will not affect the
n! products appearing in the summation. We now must check to see that
the sign of each term is the same.
F o r U E P(N), the term in det (A) corresponding to 0' is sgn (u)a llta 2}
. a.} . There is a product term in det (AT) of the form a lt'lajo'2" . aN. such
that a 1lt a 2jo . . , a.} . = a} I ' l aN2 ... au . The right-hand side of this equation
is just a rearrangement of the left-hand side. The number of j; > j;+ I for
i = I, ... ,n - I is the same as the number of j/ > j/+ I for i = 1, ... ,
n - 1. Thus, if 0" = ;U j~ . . .j~) then sgn (u' ) = sgn (0'), which means det
(AT) = det (A). Note that this result implies that any property below which
is proved for columns holds equally as well for rows.
To prove the second part, we note from Eq. (4..4 2) that if for some i,
Q/ k =
0 for all k, then det (A) = O. This proves that if every element in a row
of A is ez ro, then det (A) = O. By part (i) it follows that this result holds
also for columns. _

4.4.5. Exercise. Prove parts (iii)-(vi) of Theorem 4.4.4.

We now introduce some additional concepts for determinants.


4.4.6. Definition. Let A = [aᵢⱼ] be an (n × n) matrix. If the ith row and
jth column of A are deleted, the remaining (n - 1) rows and (n - 1) columns
can be used to form another matrix Mᵢⱼ whose determinant is det(Mᵢⱼ).
We call det(Mᵢⱼ) the minor of aᵢⱼ. If the diagonal elements of Mᵢⱼ are diagonal
elements of A, i.e., i = j, then we speak of a principal minor of A. The cofactor
of aᵢⱼ is defined as (-1)ⁱ⁺ʲ det(Mᵢⱼ).

For example, if A is a (3 × 3) matrix, then

    det(A) = | a₁₁  a₁₂  a₁₃ |
             | a₂₁  a₂₂  a₂₃ |
             | a₃₁  a₃₂  a₃₃ |

the minor of element a₂₃ is

    det(M₂₃) = | a₁₁  a₁₂ |
               | a₃₁  a₃₂ |

and the cofactor of a₂₃ is

    (-1)²⁺³ det(M₂₃) = -det(M₂₃).

The next result provides us with a convenient method of evaluating
determinants.

4.4.7. Theorem. Let A be an (n × n) matrix. Let cᵢⱼ denote the cofactor of
aᵢⱼ, i, j = 1, ..., n. Then the determinant of A is equal to the sum of the
products of the elements of any column (or row) of A, each by its own
cofactor. Specifically,

    det(A) = Σᵢ₌₁ⁿ aᵢⱼcᵢⱼ     (4.4.8)

for j = 1, ..., n, and

    det(A) = Σⱼ₌₁ⁿ aᵢⱼcᵢⱼ     (4.4.9)

for i = 1, ..., n.

For example, if A is a (2 × 2) matrix, then we have

    det(A) = | a₁₁  a₁₂ | = a₁₁c₁₁ + a₁₂c₁₂ = a₁₁a₂₂ - a₁₂a₂₁.
             | a₂₁  a₂₂ |

If A is a (3 × 3) matrix, then we have

    det(A) = | a₁₁  a₁₂  a₁₃ |
             | a₂₁  a₂₂  a₂₃ | = a₁₁c₁₁ + a₁₂c₁₂ + a₁₃c₁₃.
             | a₃₁  a₃₂  a₃₃ |

In this case five other possibilities exist. For example, we also have

    det(A) = a₁₁c₁₁ + a₂₁c₂₁ + a₃₁c₃₁.

4.4.10. Exercise. Prove Theorem 4.4.7.

We also have:

4.4.11. Theorem. If the ith row of an (n × n) matrix A consists of elements
of the form aᵢ₁ + a′ᵢ₁, aᵢ₂ + a′ᵢ₂, ..., aᵢₙ + a′ᵢₙ, i.e., if

    A = [ a₁₁          ···  a₁ₙ         ]
        [ ···                           ]
        [ aᵢ₁ + a′ᵢ₁   ···  aᵢₙ + a′ᵢₙ  ]
        [ ···                           ]
        [ aₙ₁          ···  aₙₙ         ]

then det(A) equals the sum of the determinants of the two matrices obtained
from A by replacing its ith row by (aᵢ₁, ..., aᵢₙ) and by (a′ᵢ₁, ..., a′ᵢₙ),
respectively.

4.4.12. Exercise. Prove Theorem 4.4.11.

Furthermore, we have:

4.4.13. Theorem. Let A and B be (n × n) matrices. If B is obtained from
the matrix A by adding a constant α times any column (or row) to any other
column (or row) of A, then det(B) = det(A).

4.4.14. Exercise. Prove Theorem 4.4.13.

In addition, we can prove:

4.4.15. Theorem. Let A be an (n × n) matrix, and let cᵢⱼ denote the
cofactor of aᵢⱼ, i, j = 1, ..., n. Then the sum of products of the elements
of any column (or row) by the corresponding cofactors of the elements of
any other column (or row) is zero. That is,

    Σᵢ₌₁ⁿ aᵢⱼcᵢₖ = 0 for j ≠ k,     (4.4.16a)

and

    Σⱼ₌₁ⁿ aᵢⱼcₖⱼ = 0 for i ≠ k.     (4.4.16b)

4.4.17. Exercise. Prove Theorem 4.4.15.

We can combine Eqs. (4.4.8) and (4.4.16a) to obtain

    Σᵢ₌₁ⁿ aᵢⱼcᵢₖ = det(A) δⱼₖ,     (4.4.18)

j, k = 1, ..., n, where δⱼₖ denotes the Kronecker delta. Similarly, we can
combine Eqs. (4.4.9) and (4.4.16b) to obtain

    Σⱼ₌₁ⁿ aᵢⱼcₖⱼ = det(A) δᵢₖ,     (4.4.19)

i, k = 1, ..., n.

We are now in a position to prove the following important result.

4.4.20. Theorem. Let A and B be (n × n) matrices. Then

    det(AB) = det(A) det(B).     (4.4.21)

Proof. The jth column of AB is Σᵢ₌₁ⁿ bᵢⱼaᵢ, where aᵢ denotes the ith column
of A. By Theorem 4.4.11 and Theorem 4.4.4, part (iii), det(AB) can therefore
be expanded as

    det(AB) = Σᵢ₁ ··· Σᵢₙ bᵢ₁₁bᵢ₂₂ ··· bᵢₙₙ det(aᵢ₁, aᵢ₂, ..., aᵢₙ),

where each index runs from 1 to n and det(aᵢ₁, ..., aᵢₙ) denotes the determinant of the matrix with the indicated columns. This determinant will
vanish whenever two or more of the indices iⱼ, j = 1, ..., n, are identical.
Thus, we need to sum only over σ ∈ P(N). We have

    det(AB) = Σ_{σ∈P(N)} bᵢ₁₁bᵢ₂₂ ··· bᵢₙₙ det(aᵢ₁, ..., aᵢₙ),

where σ = i₁i₂ ··· iₙ and P(N) is the set of all permutations of N = {1, ...,
n}. It is now straightforward to show that

    det(aᵢ₁, ..., aᵢₙ) = sgn(σ) det(A),

and hence it follows that

    det(AB) = det(A) det(B). ∎

Our next result is readily verified.

4.4.22. Theorem. Let I be the (n × n) identity matrix, and let 0 be the
(n × n) zero matrix. Then det(I) = 1 and det(0) = 0.

4.4.23. Exercise. Prove Theorem 4.4.22.

The next theorem allows us to characterize non-singular matrices in terms
of their determinants.

4.4.24. Theorem. An (n × n) matrix A is non-singular if and only if
det(A) ≠ 0.

Proof. Suppose that A is non-singular. Then A⁻¹ exists and A⁻¹A = AA⁻¹
= I. From this it follows that det(A⁻¹A) = 1 ≠ 0, and thus, in view of Eq.
(4.4.21), det(A⁻¹) ≠ 0 and det(A) ≠ 0.

Next, assume that A is singular. By Theorem 4.3.16, there exist non-singular
matrices Q and P such that

    A′ = QAP = [ I_r  0 ]
               [ 0    0 ]

This shows that rank A < n and det(A′) = 0. But

    det(QAP) = [det(Q)][det(A)][det(P)] = 0,

and det(Q) ≠ 0 and det(P) ≠ 0. Therefore, if A is singular, then det(A) = 0. ∎

Let us now turn to the problem of finding the inverse A⁻¹ of a non-singular matrix A. In doing so, we need to introduce the classical adjoint of A.

4.4.25. Definition. Let A be an (n × n) matrix, and let cᵢⱼ be the cofactor
of aᵢⱼ for i, j = 1, ..., n. Let C be the matrix formed by the cofactors of A;
i.e., C = [cᵢⱼ]. The matrix Cᵀ is called the classical adjoint of A. We write
adj(A) to denote the classical adjoint of A.

We now have:

4.4.26. Theorem. Let A be an (n × n) matrix. Then

    A[adj(A)] = [adj(A)]A = [det(A)]I.

Proof. The proof follows by direct computation, using Eqs. (4.4.18) and
(4.4.19). ∎

As an immediate consequence of Theorem 4.4.26 we now have the following practical result.

4.4.27. Corollary. Let A be a non-singular (n × n) matrix. Then

    A⁻¹ = [1/det(A)] adj(A).     (4.4.28)

4.4.29. Example. For the (3 × 3) matrix A displayed in the text (the
display is not recoverable from this copy), one computes det(A) = -1, forms
adj(A) from the cofactors of A, and obtains A⁻¹ = [1/det(A)] adj(A)
= -adj(A).
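Corollary 4.4.27 gives an exact inversion procedure: form the cofactors, transpose, and divide by det(A). A sketch using exact rational arithmetic (the 3 × 3 matrix is an illustrative choice, not the one in Example 4.4.29):

```python
# Invert a matrix via A^{-1} = adj(A) / det(A), with adj(A) the transposed
# cofactor matrix (Eq. 4.4.28). The example matrix is an arbitrary choice.
from fractions import Fraction

def minor(A, i, j):
    """Matrix M_ij: A with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    # Cofactor expansion along the first row (Eq. 4.4.8).
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

def inverse(A):
    n = len(A)
    d = Fraction(det(A))
    if d == 0:
        raise ValueError("A is singular")
    # adj(A)[i][j] is the cofactor of a_ji (note the transpose).
    return [[(-1) ** (i + j) * det(minor(A, j, i)) / d for j in range(n)]
            for i in range(n)]

A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
Ainv = inverse(A)
# Check A * A^{-1} = I exactly.
I = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
     for i in range(3)]
print(I == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```

Because `Fraction` arithmetic is exact, the product AA⁻¹ equals the identity matrix exactly, not merely to within round-off.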

The proofs of the next two theorems are left as an exercise.

4.4.30. Theorem. If A and B are similar matrices, then det(A) = det(B).

4.4.31. Theorem. Let A ∈ L(X, X). Let A be the matrix of A with respect
to a basis {e₁, ..., eₙ} in X, and let A′ be the matrix of A with respect to
another basis {e′₁, ..., e′ₙ} in X. Then det(A) = det(A′).

4.4.32. Exercise. Prove Theorems 4.4.30 and 4.4.31.

In view of the preceding results, there is no ambiguity in the following
definition.

4.4.33. Definition. The determinant of a linear transformation A of a
finite-dimensional vector space X into X is the determinant of any matrix
A representing it; i.e., det(A) ≜ det(A).

The last result of the present section is a consequence of Theorems 4.4.20
and 4.4.24.

4.4.34. Theorem. Let X be a finite-dimensional vector space, and let
A, B ∈ L(X, X). Then A is non-singular if and only if det(A) ≠ 0. Also,
det(AB) = [det(A)][det(B)].

4.5. EIGENVALUES AND EIGENVECTORS

In the present section we consider eigenvalues and eigenvectors of linear
transformations defined on finite-dimensional vector spaces. Later, in
Chapter 7, we will reconsider these concepts in a more general setting.
Eigenvalues and eigenvectors play, of course, a crucial role in the study of
linear transformations.

Throughout the present section, X denotes an n-dimensional vector space
over a field F.

Let A ∈ L(X, X), and let us assume that there exist sets of vectors {e₁,
..., eₙ} and {e′₁, ..., e′ₙ}, which are bases for X, such that

    e′₁ = Ae₁ = λ₁e₁,
    ···     (4.5.1)
    e′ₙ = Aeₙ = λₙeₙ,

where λᵢ ∈ F, i = 1, ..., n. If this is the case, then the matrix A′ of A with
respect to the given basis is

    A′ = [ λ₁  0   ···  0  ]
         [ 0   λ₂  ···  0  ]
         [ ···             ]
         [ 0   0   ···  λₙ ]

This motivates the following result.

4.5.2. Theorem. Let A ∈ L(X, X), and let λ ∈ F. Then the set of all x ∈ X
such that

    Ax = λx     (4.5.3)

is a linear subspace of X. In fact, it is the null space of the linear transformation (A - λI), where I is the identity element of L(X, X).

Proof. Since the zero vector satisfies Eq. (4.5.3) for any λ ∈ F, the set is
non-void. If the zero vector is the only such vector, then we are done, for
{0} is a linear subspace of X (of dimension zero). In any case, Eq. (4.5.3)
holds if and only if (A - λI)x = 0. Thus, x belongs to the null space of
A - λI, and it follows from Theorem 3.4.19 that the set of all x ∈ X satisfying Eq. (4.5.3) is a linear subspace of X. ∎

Henceforth we let

    𝔑_λ = {x ∈ X : (A - λI)x = 0}.     (4.5.4)
The preceding result gives rise to several important concepts which we
introduce in the following definition.

4.5.5. Definition. Let X, A ∈ L(X, X), and 𝔑_λ be defined as in Theorem
4.5.2 and Eq. (4.5.4). A scalar λ such that 𝔑_λ contains more than just the
zero vector is called an eigenvalue of A (i.e., if there is an x ≠ 0 such that
Ax = λx, then λ is called an eigenvalue of A). When λ is an eigenvalue of A,
then each x ≠ 0 in 𝔑_λ is called an eigenvector of A corresponding to the
eigenvalue λ. The dimension of the linear subspace 𝔑_λ is called the multiplicity of the eigenvalue λ. If 𝔑_λ is of dimension one, then λ is called a simple
eigenvalue. The set of all eigenvalues of A is called the spectrum of A.

Some authors call an eigenvalue a proper value or a characteristic value
or a latent value or a secular value. Similarly, other names for eigenvector are
proper vector or characteristic vector. The space 𝔑_λ is called the λth proper
subspace of X.
For matrices we give the following corresponding definition.

4.5.6. Definition. Let A be an (n × n) matrix whose elements belong to the field F. If there exist λ ∈ F and a non-zero vector x ∈ F^n such that

    Ax = λx,                                                 (4.5.7)

then λ is called an eigenvalue of A and x is called an eigenvector of A corresponding to the eigenvalue λ.
Our next result provides the connection between Definitions 4.5.5 and 4.5.6.

4.5.8. Theorem. Let A ∈ L(X, X), and let A be the matrix of A with respect to the basis {e_1, ..., e_n}. Then λ is an eigenvalue of the linear transformation A if and only if λ is an eigenvalue of the matrix A. Also, x ∈ X is an eigenvector of the transformation A corresponding to λ if and only if x, the coordinate representation of x with respect to the basis {e_1, ..., e_n}, is an eigenvector of the matrix A corresponding to λ.

4.5.9. Exercise. Prove Theorem 4.5.8.

Note that if x (or x) is an eigenvector of A (of A), then any non-zero multiple of x (of x) is also an eigenvector of A (of A).

In the next result, the proof of which is left as an exercise, we use determinants to characterize eigenvalues. We have:

4.5.10. Theorem. Let A ∈ L(X, X). Then λ ∈ F is an eigenvalue of A if and only if det (A − λI) = 0.

4.5.11. Exercise. Prove Theorem 4.5.10.

Let us next examine the equation

    det (A − λI) = 0                                         (4.5.12)

in terms of the parameter λ. We ask: can we determine which values of λ, if any, satisfy Eq. (4.5.12)? Let {e_1, ..., e_n} be an arbitrary basis for X and let A be the matrix of the transformation A with respect to this basis. We then have

    det (A − λI) = det (A − λI).                             (4.5.13)

The right-hand side of Eq. (4.5.13) may be rewritten as

    det (A − λI) = det [ (a_11 − λ)    a_12     ...    a_1n  ]
                       [   a_21     (a_22 − λ)  ...    a_2n  ]
                       [    ⋮           ⋮                ⋮   ]     (4.5.14)
                       [   a_n1        a_n2     ... (a_nn − λ)].

It is clear from Eq. (4.4.2) that expansion of the determinant (4.5.14) yields a polynomial in λ of degree n. In order for λ to be an eigenvalue of A it must (a) satisfy Eq. (4.5.12), and (b) belong to F. Requirement (b) warrants further comment: note that there is no guarantee that there exists λ ∈ F such that Eq. (4.5.12) is satisfied; equivalently, we have no assurance that the nth-order polynomial equation

    det (A − λI) = 0

has any roots in F. There is, however, a special class of fields for which requirement (b) is automatically satisfied. We have:

4.5.15. Definition. A field F is said to be algebraically closed if for every polynomial p(λ) there is at least one λ ∈ F such that

    p(λ) = 0.                                                (4.5.16)

Any λ which satisfies Eq. (4.5.16) is said to be a root of the polynomial equation (4.5.16).

In particular, the field of complex numbers is algebraically closed, whereas the field of real numbers is not (e.g., consider the equation λ² + 1 = 0). There are other fields besides the field of complex numbers which are algebraically closed. However, since we will not develop these, we will restrict ourselves to the field of complex numbers, C, whenever the algebraic closure property of Definition 4.5.15 is required. When considering results that are valid for a vector space over an arbitrary field, we will (as before) make use of the symbol F or frequently (as before) make no reference to F at all.
We summarize the above discussion in the following theorem.
4.5.17. Theorem. Let A ∈ L(X, X). Then

(i) det (A − λI) is a polynomial of degree n in the parameter λ; i.e., there exist scalars α_0, α_1, ..., α_n, depending only on A, such that

    det (A − λI) = α_0 + α_1 λ + α_2 λ² + ... + α_n λⁿ       (4.5.18)

(note that α_0 = det (A) and α_n = (−1)ⁿ);

(ii) the eigenvalues of A are precisely the roots of the equation det (A − λI) = 0; i.e., they are the roots of

    α_0 + α_1 λ + α_2 λ² + ... + α_n λⁿ = 0;                 (4.5.19)

and

(iii) A has, at most, n distinct eigenvalues.
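As a quick numerical illustration of part (i) of this theorem (a sketch assuming numpy; the random matrix is an arbitrary example, not one from the text), the coefficients of det (A − λI) can be read off from `np.poly`, which returns the coefficients of the monic polynomial det (λI − A):

```python
import numpy as np

# Checking Theorem 4.5.17(i): det(A - lambda I) is a degree-n polynomial
# whose constant term is det(A) and whose leading coefficient is (-1)^n.
rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))

# np.poly gives det(lambda I - A); multiply by (-1)^n for det(A - lambda I).
c = np.poly(A) * (-1) ** n   # descending powers: c[0] lambda^n, ..., c[-1]

print(np.isclose(c[-1], np.linalg.det(A)))  # True  (alpha_0 = det A)
print(np.isclose(c[0], (-1) ** n))          # True  (alpha_n = (-1)^n)
```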


The above result motivates the following definition.
.4 5.20.

Definition. eL t A E L ( X ,
det (A -

1I)

and let A be a matrix of A. We call

X),

= det (A -

).1) =

/1 0

/II).

+ ... +

/I)."

(4.5.21)

the characteristic polynomial of A (or of A) and


det(A - 1 T) =

det(A - 1 1) =

(4.5.22)

the characteristic equation of A (or of A).


From the fundamental properties of polynomials over the field of complex numbers there now follows:

4.5.23. Theorem. If X is an n-dimensional vector space over C and if A ∈ L(X, X), then it is possible to write the characteristic polynomial of A in the form

    det (A − λI) = (λ_1 − λ)^{m_1} (λ_2 − λ)^{m_2} ... (λ_p − λ)^{m_p},    (4.5.24)

where λ_i, i = 1, ..., p, are the distinct roots of Eq. (4.5.19) (i.e., λ_i ≠ λ_j for i ≠ j). In Eq. (4.5.24), m_i is called the algebraic multiplicity of the root λ_i. The m_i are positive integers, and

    Σ_{i=1}^{p} m_i = n.

Note the distinction between the concept of algebraic multiplicity of λ_i given in Theorem 4.5.23 and the multiplicity of λ_i as given in Definition 4.5.5. In general, these need not be the same, as will be seen later.

We now state and prove one of the most important results of linear algebra, the Cayley–Hamilton theorem.

4.5.25. Theorem. Let A be an n × n matrix, and let p(λ) = det (A − λI) be the characteristic polynomial of A. Then p(A) = 0.

Proof. Let the characteristic polynomial for A be

    p(λ) = α_0 + α_1 λ + ... + α_n λⁿ.

Now let B(λ) be the classical adjoint of (A − λI). Since the elements b_ij(λ) of B(λ) are cofactors of the matrix A − λI, they are polynomials in λ of degree not more than n − 1. Thus,

    b_ij(λ) = β_ij0 + β_ij1 λ + ... + β_ij(n−1) λ^{n−1}.

Letting B_k = [β_ijk] for k = 0, 1, ..., n − 1, we have

    B(λ) = B_0 + λB_1 + ... + λ^{n−1} B_{n−1}.

By Theorem 4.4.26,

    (A − λI) B(λ) = [det (A − λI)] I.

Thus,

    (A − λI)(B_0 + λB_1 + ... + λ^{n−1} B_{n−1}) = (α_0 + α_1 λ + ... + α_n λⁿ) I.

Expanding the left-hand side of this equation and equating like powers of λ, we have

    −B_{n−1} = α_n I,   AB_{n−1} − B_{n−2} = α_{n−1} I,   ...,   AB_1 − B_0 = α_1 I,   AB_0 = α_0 I.

Premultiplying the above matrix equations by Aⁿ, A^{n−1}, ..., A, I, respectively, we have

    −Aⁿ B_{n−1} = α_n Aⁿ,
    Aⁿ B_{n−1} − A^{n−1} B_{n−2} = α_{n−1} A^{n−1},
    ...,
    A² B_1 − A B_0 = α_1 A,
    A B_0 = α_0 I.

Adding these matrix equations, we obtain

    0 = α_0 I + α_1 A + ... + α_n Aⁿ = p(A),

which was to be shown. ∎
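The theorem is easy to check numerically (a sketch assuming numpy; the matrix below is an arbitrary illustration, not one from the text):

```python
import numpy as np

# A small numerical check of the Cayley-Hamilton theorem (Theorem 4.5.25):
# every square matrix satisfies its own characteristic equation, p(A) = 0.
A = np.array([[2.0, 1.0],
              [4.0, 2.0]])

# np.poly returns the coefficients of det(lambda*I - A) in descending powers
# of lambda; for an n x n matrix this is a monic polynomial of degree n.
coeffs = np.poly(A)

# Evaluate the polynomial at the matrix A: sum of coeffs[k] * A^(n-k).
n = A.shape[0]
p_of_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))

print(np.allclose(p_of_A, np.zeros_like(A)))  # True
```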


As an immediate consequence of the Cayley–Hamilton theorem, we have:

4.5.26. Theorem. Let A be an (n × n) matrix with characteristic polynomial given by Eq. (4.5.21). Then

(i) Aⁿ = (−1)^{n+1} [α_0 I + α_1 A + ... + α_{n−1} A^{n−1}]; and

(ii) if f(λ) is any polynomial in λ, then there exist β_0, β_1, ..., β_{n−1} such that

    f(A) = β_0 I + β_1 A + ... + β_{n−1} A^{n−1}.

Proof. Part (i) follows from Theorem 4.5.25 and from the fact that α_n = (−1)ⁿ.

To prove part (ii), let f(λ) be any polynomial in λ and let p(λ) denote the characteristic polynomial of A. Then there exist two polynomials g(λ) and r(λ) (see Theorem 2.3.9) such that

    f(λ) = p(λ) g(λ) + r(λ),                                 (4.5.27)

where deg [r(λ)] ≤ n − 1. Using the fact that p(A) = 0, we have f(A) = r(A), and the theorem follows. ∎

The Cayley–Hamilton theorem holds also in the case of linear transformations. Specifically, we have the following result.

4.5.28. Theorem. Let A ∈ L(X, X), and let p(λ) denote the characteristic polynomial of A. Then p(A) = 0.

4.5.29. Exercise. Prove Theorem 4.5.28.

Let us now consider a specific example.

4.5.30. Example. Consider the matrix

    A = [1  1]
        [0  2].

Let us use Theorem 4.5.26 to evaluate A^37. Since n = 2, we assume that A^37 is of the form

    A^37 = β_0 I + β_1 A.

The characteristic polynomial of A is

    p(λ) = (1 − λ)(2 − λ),

and the eigenvalues of A are λ_1 = 1 and λ_2 = 2. In the present case f(λ) = λ^37, and r(λ) in Eq. (4.5.27) is

    r(λ) = β_0 + β_1 λ.

We must determine β_0 and β_1. Using the fact that p(λ_1) = p(λ_2) = 0, it follows from Eq. (4.5.27) that f(λ_1) = r(λ_1) and f(λ_2) = r(λ_2). Thus, we have

    β_0 + β_1 = 1^37 = 1,   β_0 + 2β_1 = 2^37.

Hence, β_1 = 2^37 − 1 and β_0 = 2 − 2^37. Therefore,

    A^37 = (2 − 2^37) I + (2^37 − 1) A,

or

    A^37 = [1   2^37 − 1]
           [0   2^37   ]. ∎
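The computation in this example can be verified directly (a sketch assuming numpy; all values involved are integers below 2^53, so the floating-point comparison is exact):

```python
import numpy as np

# Recomputing Example 4.5.30: by Theorem 4.5.26(ii), A^37 = beta0*I + beta1*A
# with beta0 = 2 - 2**37 and beta1 = 2**37 - 1.
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])

beta1 = 2.0**37 - 1.0
beta0 = 2.0 - 2.0**37
via_theorem = beta0 * np.eye(2) + beta1 * A

brute = np.linalg.matrix_power(A, 37)   # direct computation for comparison
print(np.array_equal(via_theorem, brute))  # True
```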

Before closing the present section, let us introduce another important concept for matrices.

4.5.31. Definition. If A is an (n × n) matrix, then the trace of A, denoted by trace A or by tr A, is defined as

    tr A = trace A = a_11 + a_22 + ... + a_nn                (4.5.32)

(i.e., the trace of a square matrix is the sum of its diagonal elements).

It turns out that if F = C, the field of complex numbers, then there is a relationship between the trace, determinant, and eigenvalues of an (n × n) matrix A. We have:

4.5.33. Theorem. Let X be a vector space over C. Let A be a matrix of A ∈ L(X, X), and let det (A − λI) be given by Eq. (4.5.24). Then

(i) det (A) = Π_{j=1}^{p} λ_j^{m_j};

(ii) trace (A) = Σ_{j=1}^{p} m_j λ_j;

(iii) if B is any matrix similar to A, then trace (B) = trace (A); and

(iv) if f(λ) denotes the polynomial

    f(λ) = γ_0 + γ_1 λ + ... + γ_m λ^m,

then the roots of the characteristic polynomial of f(A) are f(λ_1), ..., f(λ_p), and

    det [f(A) − λI] = [f(λ_1) − λ]^{m_1} ... [f(λ_p) − λ]^{m_p}.

4.5.34. Exercise. Prove Theorem 4.5.33.
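Parts (i) and (ii) of this theorem lend themselves to a quick numerical check (a sketch assuming numpy; the random complex matrix is an arbitrary illustration):

```python
import numpy as np

# Checking Theorem 4.5.33 (i) and (ii) for a random complex matrix:
# det(A) is the product of the eigenvalues and trace(A) their sum,
# each eigenvalue counted with its algebraic multiplicity.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

eigs = np.linalg.eigvals(A)
print(np.isclose(np.prod(eigs), np.linalg.det(A)))  # True
print(np.isclose(np.sum(eigs), np.trace(A)))        # True
```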

4.6. SOME CANONICAL FORMS OF MATRICES

In the present section we investigate under which conditions a linear transformation of a vector space into itself can be represented by special types of matrices, namely, by (a) a diagonal matrix, (b) a so-called triangular matrix, and (c) a so-called "block diagonal matrix." We will also investigate when a linear transformation cannot be represented by a diagonal matrix.

Throughout the present section X denotes an n-dimensional vector space over a field F.
4.6.1. Theorem. Let λ_1, ..., λ_p be distinct eigenvalues of a linear transformation A ∈ L(X, X). Let e'_1 ≠ 0, ..., e'_p ≠ 0 be eigenvectors of A corresponding to λ_1, ..., λ_p, respectively. Then the set {e'_1, ..., e'_p} is linearly independent.

Proof. The proof is by contradiction. Assume that the set {e'_1, ..., e'_p} is linearly dependent, so that there exist scalars α_1, ..., α_p, not all zero, such that

    α_1 e'_1 + ... + α_p e'_p = 0.

We assume that these scalars have been chosen in such a fashion that as few of them as possible are non-zero. Relabeling, if necessary, we thus have

    α_1 e'_1 + ... + α_r e'_r = 0,                           (4.6.2)

where α_1 ≠ 0, ..., α_r ≠ 0 and where r ≤ p is the smallest number for which we can get such an expression.

Since λ_1, ..., λ_r are eigenvalues and since e'_1, ..., e'_r are eigenvectors, we have

    0 = A(0) = A(α_1 e'_1 + ... + α_r e'_r) = α_1 Ae'_1 + ... + α_r Ae'_r
      = (α_1 λ_1) e'_1 + ... + (α_r λ_r) e'_r.               (4.6.3)

Also,

    0 = λ_r · 0 = λ_r (α_1 e'_1 + ... + α_r e'_r)
      = (α_1 λ_r) e'_1 + ... + (α_r λ_r) e'_r.               (4.6.4)

Subtracting Eq. (4.6.4) from Eq. (4.6.3) we obtain

    0 = α_1 (λ_1 − λ_r) e'_1 + ... + α_{r−1} (λ_{r−1} − λ_r) e'_{r−1}.

Since by assumption the λ_i's are distinct, we have found an expression involving only (r − 1) vectors satisfying Eq. (4.6.2). But r was chosen to be the smallest number for which Eq. (4.6.2) holds. We have thus arrived at a contradiction, and our theorem is proved. ∎

We note that if, in the above theorem, A has n distinct eigenvalues, then the corresponding n eigenvectors span the linear space X (recall that dim X = n).

Our next result enables us to represent a linear transformation with n distinct eigenvalues in a very convenient form.

4.6.5. Theorem. Let A ∈ L(X, X). Assume that the characteristic polynomial of A has n distinct roots, so that

    det (A − λI) = (λ_1 − λ)(λ_2 − λ) ... (λ_n − λ),

where λ_1, λ_2, ..., λ_n are distinct eigenvalues. Then there exists a basis {e'_1, ..., e'_n} of X such that e'_i is an eigenvector corresponding to λ_i for i = 1, 2, ..., n. The matrix A' of A with respect to the basis {e'_1, e'_2, ..., e'_n} is

    A' = diag (λ_1, λ_2, ..., λ_n).                          (4.6.6)

Proof. Let e'_i denote the eigenvector corresponding to the eigenvalue λ_i. In view of Theorem 4.6.1, the set {e'_1, e'_2, ..., e'_n} is linearly independent because λ_1, λ_2, ..., λ_n are all different. Moreover, since there are n of the e'_i, the set {e'_1, e'_2, ..., e'_n} forms a basis for the n-dimensional space X. Also, from the definition of eigenvalue and eigenvector, we have

    Ae'_1 = λ_1 e'_1,
    Ae'_2 = λ_2 e'_2,
    ⋮                                                        (4.6.7)
    Ae'_n = λ_n e'_n.

From Eq. (4.6.7) we obtain the desired matrix given in Eq. (4.6.6). ∎

The reader can readily prove the following useful result.

4.6.8. Theorem. Let A ∈ L(X, X), and let A be the matrix of A with respect to a basis {e_1, e_2, ..., e_n}. If the characteristic polynomial

    det (A − λI) = α_0 + α_1 λ + α_2 λ² + ... + α_n λⁿ

has n distinct roots λ_1, ..., λ_n, then A is similar to the matrix A' of A with respect to a basis {e'_1, ..., e'_n}, where

    A' = diag (λ_1, ..., λ_n).                               (4.6.9)

In this case there exists a non-singular matrix P such that

    A' = P⁻¹ A P.                                            (4.6.10)

The matrix P is the matrix of the basis {e'_1, e'_2, ..., e'_n} with respect to the basis {e_1, e_2, ..., e_n}, and P⁻¹ is the matrix of the basis {e_1, ..., e_n} with respect to the basis {e'_1, ..., e'_n}. The matrix P can be constructed by letting its columns be eigenvectors of A corresponding to λ_1, ..., λ_n, respectively. That is,

    P = [x_1, x_2, ..., x_n],                                (4.6.11)

where x_1, ..., x_n are eigenvectors of A corresponding to the eigenvalues λ_1, ..., λ_n, respectively.

The similarity transformation P given in Eq. (4.6.11) is called a modal matrix. If the conditions of Theorem 4.6.8 are satisfied and if, in particular, Eq. (4.6.9) holds, then we say that the matrix A has been diagonalized.

4.6.12. Exercise. Prove Theorem 4.6.8.

Let us now consider some specific examples.

4.6.13. Example. Let X be a two-dimensional vector space over the field of real numbers. Let A ∈ L(X, X), and let {e_1, e_2} be a basis for X. Suppose the matrix A of A with respect to this basis is given by

    A = [−2  4]
        [ 1  1].

The characteristic polynomial of A is

    p(λ) = det (A − λI) = det (A − λI) = λ² + λ − 6.

Now det (A − λI) = 0 if and only if λ² + λ − 6 = 0, or (λ − 2)(λ + 3) = 0. Thus, the eigenvalues of A are λ_1 = 2 and λ_2 = −3. To find an eigenvector corresponding to λ_1, we solve the equation (A − λ_1 I)x = 0, or

    [−4   4] [ξ_1]   [0]
    [ 1  −1] [ξ_2] = [0].

The last equation yields the equations

    −4ξ_1 + 4ξ_2 = 0,   ξ_1 − ξ_2 = 0.

These are satisfied whenever ξ_1 = ξ_2. Thus, any vector of the form

    x_1 = [ξ]
          [ξ],   ξ ≠ 0,

is an eigenvector of A corresponding to the eigenvalue λ_1. For convenience, let us choose ξ = 1. Then

    x_1 = [1]
          [1]

is an eigenvector. In a similar fashion we obtain an eigenvector x_2 corresponding to λ_2, given by

    x_2 = [ 4]
          [−1].

The diagonal matrix A' given in Eq. (4.6.9) is, in the present case,

    A' = [λ_1   0 ]   [2   0]
         [0    λ_2] = [0  −3].

We can arrive at A' using Eq. (4.6.10). Specifically, let

    P = [x_1, x_2] = [1   4]
                     [1  −1].

Then

    P⁻¹ = [0.2   0.8]
          [0.2  −0.2],

and

    P⁻¹ A P = [2   0]
              [0  −3] = A'.

By Eq. (4.3.2), the basis {e'_1, e'_2} ⊂ X with respect to which A' represents A is given by

    e'_1 = Σ_i p_i1 e_i = e_1 + e_2,   e'_2 = Σ_i p_i2 e_i = 4e_1 − e_2.

In view of Theorem 4.3.8, if x is the coordinate representation of the vector x with respect to {e_1, e_2}, then x' = P⁻¹x is the coordinate representation of x with respect to {e'_1, e'_2}. The vectors e'_1, e'_2 are, of course, eigenvectors of A corresponding to λ_1 and λ_2, respectively. ∎
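The diagonalization in this example can be reproduced numerically (a sketch assuming numpy, using the matrix and modal matrix of Example 4.6.13):

```python
import numpy as np

# Reproducing Example 4.6.13: diagonalize A via a modal matrix P whose
# columns are eigenvectors of A (Theorem 4.6.8).
A = np.array([[-2.0, 4.0],
              [ 1.0, 1.0]])

P = np.array([[1.0,  4.0],    # columns: eigenvectors for lambda = 2 and -3
              [1.0, -1.0]])

A_prime = np.linalg.inv(P) @ A @ P
print(np.allclose(A_prime, np.diag([2.0, -3.0])))  # True
```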

When the algebraic multiplicity of one or more of the eigenvalues of a linear transformation is greater than one, the linear transformation is said to have repeated eigenvalues. Unfortunately, in this case it is not always possible to represent the linear transformation by a diagonal matrix. To put it another way, if a square matrix has repeated eigenvalues, then it is not always possible to diagonalize it. However, from the preceding results of the present section it should be clear that a linear transformation with repeated eigenvalues can be represented by a diagonal matrix if the number of linearly independent eigenvectors corresponding to each eigenvalue is the same as the algebraic multiplicity of the eigenvalue. The following examples throw additional light on these comments.
4.6.14. Example. The characteristic equation of the matrix

    A = [1  3  −2]
        [0  4  −2]
        [0  3  −1]

is

    det (A − λI) = (1 − λ)²(2 − λ) = 0,

and the eigenvalues of A are λ_1 = 1 and λ_2 = 2. The algebraic multiplicity of λ_1 is two. Corresponding to λ_1 we can find two linearly independent eigenvectors,

    x_1 = [1]         x_2 = [0]
          [0]   and         [2]
          [0]               [3].

Corresponding to λ_2 we have an eigenvector

    x_3 = [1]
          [1]
          [1].

Letting P denote a modal matrix, we have

    P = [1  0  1]                 [1  −3   2]
        [0  2  1]   and   P⁻¹ =   [0  −1   1],
        [0  3  1]                 [0   3  −2]

and

    A' = P⁻¹ A P = [1  0  0]
                   [0  1  0]
                   [0  0  2].

In this example, dim 𝔑_{λ_1} = 2, which happens to be the same as the algebraic multiplicity of λ_1. For this reason we were able to diagonalize the matrix A. ∎

The next example shows that the multiplicity of an eigenvalue need not be the same as its algebraic multiplicity. In this case we are not able to diagonalize the matrix.

4.6.15. Example. The characteristic equation of the matrix

    A = [2  1  −2]
        [0  2  −1]
        [0  0   1]

is

    det (A − λI) = (1 − λ)(2 − λ)² = 0,

and the eigenvalues of A are λ_1 = 1 and λ_2 = 2. The algebraic multiplicity of λ_2 is two. An eigenvector corresponding to λ_1 is x_1^T = (1, 1, 1). An eigenvector corresponding to λ_2 must be of the form

    x_2 = [ξ]
          [0],   ξ ≠ 0.
          [0]

Setting x_2^T = (1, 0, 0), we see that dim 𝔑_{λ_2} = 1, and thus we have not been able to determine a basis for R³ consisting of eigenvectors. Consequently, we have not been able to diagonalize A. ∎
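The failure in this example can be quantified (a sketch assuming numpy, using the matrix of Example 4.6.15): the eigenvalue 2 has algebraic multiplicity two but a one-dimensional eigenspace.

```python
import numpy as np

# The matrix of Example 4.6.15 is defective: dim N_{lambda=2} = 1 < 2, so no
# eigenvector basis (and hence no diagonalization) exists.
A = np.array([[2.0, 1.0, -2.0],
              [0.0, 2.0, -1.0],
              [0.0, 0.0,  1.0]])

# Multiplicity of lambda = 2 is dim null(A - 2I) = 3 - rank(A - 2I).
geom_mult = 3 - np.linalg.matrix_rank(A - 2 * np.eye(3))
print(geom_mult)  # 1
```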
When a matrix cannot be diagonalized we seek, for practical reasons, to represent a linear transformation by a matrix which is as nearly diagonal as possible. Our next result provides the basis for representing linear transformations by such matrices, which we call block diagonal matrices. In the next section we will consider the "simplest" type of block diagonal matrix, called the Jordan canonical form.
4.6.16. Theorem. Let X be an n-dimensional vector space, and let A ∈ L(X, X). Let Y and Z be linear subspaces of X such that X = Y ⊕ Z and such that A is reduced by Y and Z. Then there exists a basis for X such that the matrix A of A with respect to this basis has the form

    A = [A_1   0 ]
        [ 0   A_2],

where dim Y = r, A_1 is an (r × r) matrix, and A_2 is an (n − r) × (n − r) matrix.

4.6.17. Exercise. Prove Theorem 4.6.16.

We can generalize the preceding result. Suppose that X is the direct sum of linear subspaces X_1, ..., X_p that are invariant under A ∈ L(X, X). We can define linear transformations A_i ∈ L(X_i, X_i), i = 1, ..., p, by A_i x = Ax for x ∈ X_i. That is to say, A_i is the restriction of A to X_i. We now can find for each A_i a matrix representation A_i, which will lead us to the following result.

4.6.18. Theorem. Let X be a finite-dimensional vector space, and let A ∈ L(X, X). If X is the direct sum of p linear subspaces, X_1, ..., X_p, which are invariant under A, then there exists a basis for X such that the matrix representation for A is in the block diagonal form given by

    A = [A_1             ]
        [     A_2        ]
        [         ⋱      ]
        [            A_p].

Moreover, A_i is a matrix representation of A_i, the restriction of A to X_i, i = 1, ..., p. Also,

    det (A − λI) = Π_{i=1}^{p} det (A_i − λI).

4.6.19. Exercise. Prove Theorem 4.6.18.
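A small numerical check of the block-diagonal structure (a sketch assuming numpy; the blocks are arbitrary illustrations): the eigenvalues of a block diagonal matrix are the union of the eigenvalues of its blocks.

```python
import numpy as np

# Theorem 4.6.18: det(A - lambda I) factors into the characteristic
# polynomials of the diagonal blocks, so the spectrum of A is the union of
# the spectra of A_1, ..., A_p.
A1 = np.array([[2.0, 1.0],
               [0.0, 3.0]])
A2 = np.array([[-1.0]])
A = np.block([[A1, np.zeros((2, 1))],
              [np.zeros((1, 2)), A2]])

all_eigs = np.concatenate([np.linalg.eigvals(A1), np.linalg.eigvals(A2)])
print(np.allclose(sorted(np.linalg.eigvals(A)), sorted(all_eigs)))  # True
```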

From the preceding it is clear that, in order to carry out the block diagonalization of a matrix A, we need to find an appropriate set of invariant subspaces of X and, furthermore, to find a simple matrix representation on each of these subspaces.

4.6.20. Example. Let X be an n-dimensional vector space. If A ∈ L(X, X) has n distinct eigenvalues λ_1, ..., λ_n, and if we let

    𝔑_j = {x : (A − λ_j I)x = 0},   j = 1, ..., n,

then 𝔑_j is an invariant linear subspace under A and

    X = 𝔑_1 ⊕ ... ⊕ 𝔑_n.

For any x ∈ 𝔑_j, we have Ax = λ_j x, and hence A_j x = λ_j x for x ∈ 𝔑_j. A basis for 𝔑_j is any non-zero x_j ∈ 𝔑_j. Thus, with respect to this basis, A_j is represented by the matrix λ_j (in this case, simply a scalar). With respect to a basis of n linearly independent eigenvectors, {x_1, ..., x_n}, A is represented by Eq. (4.6.6). ∎
In addition to the diagonal form and the block diagonal form, there are many other useful forms for matrices to represent linear transformations on finite-dimensional vector spaces. One of these canonical forms involves triangular matrices, which we consider in the last result of the present section. We say that an (n × n) matrix is a triangular matrix if it either has the form

    [a_11  a_12  a_13  ...  a_1n    ]
    [ 0    a_22  a_23  ...  a_2n    ]
    [ ⋮                       ⋮     ]                        (4.6.21)
    [ 0     0     0  ... a_{n−1,n}  ]
    [ 0     0     0   ...  a_nn     ]

or the form

    [a_11   0     0   ...   0  ]
    [a_21  a_22   0   ...   0  ]
    [ ⋮                     ⋮  ]                             (4.6.22)
    [a_n1  a_n2  a_n3 ... a_nn].

In case of Eq. (4.6.21) we speak of an upper triangular matrix, whereas in case of Eq. (4.6.22) we say the matrix is in the lower triangular form.

4.6.23. Theorem. Let X be an n-dimensional vector space over C, and let A ∈ L(X, X). Then there exists a basis for X such that A is represented by an upper triangular matrix.

Proof. We will show that if A is a matrix of A, then A is similar to an upper triangular matrix A'. Our proof is by induction on n.

If n = 1, then the assertion is clearly true. Now assume that for n = k, and C any k × k matrix, there exists a non-singular matrix Q such that C' = Q⁻¹CQ is an upper triangular matrix. We now must show the validity of the assertion for n = k + 1. Let X be a (k + 1)-dimensional vector space over C. Let λ_1 be an eigenvalue of A, and let f_1 be a corresponding eigenvector. Let {f_2, ..., f_{k+1}} be any set of vectors in X such that {f_1, ..., f_{k+1}} is a basis for X. Let B be the matrix of A with respect to the basis {f_1, ..., f_{k+1}}. Since Af_1 = λ_1 f_1, B must be of the form

    B = [λ_1   b_12      ...   b_1,k+1    ]
        [ 0    b_22      ...   b_2,k+1    ]
        [ ⋮     ⋮                 ⋮       ]
        [ 0    b_{k+1,2} ...   b_{k+1,k+1}].

Now let C be the k × k matrix

    C = [b_22      ...   b_2,k+1    ]
        [ ⋮                 ⋮       ]
        [b_{k+1,2} ...   b_{k+1,k+1}].

By our induction hypothesis, there exists a non-singular matrix Q such that C' = Q⁻¹CQ, where C' is an upper triangular matrix. Now let

    P = [1   0]
        [0   Q].

By direct computation we have

    P⁻¹ = [1   0  ]
          [0   Q⁻¹]

and

    P⁻¹BP = [λ_1   *   ...   *]
            [ 0               ]
            [ ⋮       C'      ]
            [ 0               ],

where the *'s denote elements which may be non-zero. Letting A' = P⁻¹BP, it follows that A' is upper triangular and is similar to B. Hence, any (k + 1) × (k + 1) matrix which represents A ∈ L(X, X) is similar to the upper triangular matrix A', by Theorem 4.3.19. This completes the proof of the theorem. ∎

Note that if A is in the triangular form of either Eq. (4.6.21) or (4.6.22), then

    det (A − λI) = (a_11 − λ)(a_22 − λ) ... (a_nn − λ).

In this case the diagonal elements of A are the eigenvalues of A.
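This last observation is easy to confirm numerically (a sketch assuming numpy, reusing the triangular matrix of Example 4.6.15):

```python
import numpy as np

# For a triangular matrix the eigenvalues are exactly the diagonal entries
# (note following Theorem 4.6.23).
A = np.array([[2.0, 1.0, -2.0],
              [0.0, 2.0, -1.0],
              [0.0, 0.0,  1.0]])

print(np.allclose(sorted(np.linalg.eigvals(A)), [1.0, 2.0, 2.0]))  # True
```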

4.7. MINIMAL POLYNOMIALS, NILPOTENT OPERATORS, AND THE JORDAN CANONICAL FORM

In the present section we develop the Jordan canonical form of a matrix. To do so, we need to introduce the concepts of minimal polynomial and nilpotent operator and to study some of the properties of such polynomials and operators. Unless otherwise specified, X denotes an n-dimensional vector space over a field F throughout the present section.


A. Minimal Polynomials

For purposes of motivation, consider the matrix

    A = [1  3  −2]
        [0  4  −2]
        [0  3  −1].

The characteristic polynomial of A is

    p(λ) = (1 − λ)²(2 − λ),

and we know from the Cayley–Hamilton theorem that

    p(A) = 0.                                                (4.7.1)

Now let us consider the polynomial

    m(λ) = (1 − λ)(2 − λ) = 2 − 3λ + λ².

Then

    m(A) = 2I − 3A + A² = 0.                                 (4.7.2)

Thus, matrix A satisfies Eq. (4.7.2), which is of lower degree than Eq. (4.7.1), the characteristic equation of A.

Before stating our first result, we recall that an nth-order polynomial in λ is said to be monic if the coefficient of λⁿ is unity (see Definition 2.3.4).
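The motivating computation can be verified directly (a sketch assuming numpy, using the matrix displayed above):

```python
import numpy as np

# For this A the degree-2 polynomial m(lambda) = (1 - lambda)(2 - lambda)
# already annihilates A, even though the characteristic polynomial of A
# has degree 3.
A = np.array([[1.0, 3.0, -2.0],
              [0.0, 4.0, -2.0],
              [0.0, 3.0, -1.0]])
I = np.eye(3)

m_of_A = (I - A) @ (2 * I - A)          # equals 2I - 3A + A^2
print(np.allclose(m_of_A, np.zeros((3, 3))))  # True
```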
4.7.3. Theorem. Let A be an (n × n) matrix. Then there exists a unique polynomial m(λ) such that

(i) m(A) = 0;
(ii) m(λ) is monic; and
(iii) if m'(λ) is any other polynomial such that m'(A) = 0, then the degree of m(λ) is less than or equal to the degree of m'(λ) (i.e., m(λ) is of the lowest degree such that m(A) = 0).

Proof. We know that a polynomial, p(λ), exists such that p(A) = 0, namely, the characteristic polynomial. Furthermore, the degree of p(λ) is n. Thus, there exists a polynomial, say f(λ), of degree m ≤ n such that f(A) = 0. Let us choose m to be the lowest degree for which f(A) = 0. Since f(λ) is of degree m, we may divide f(λ) by the coefficient of λ^m, thus obtaining a monic polynomial, m(λ), such that m(A) = 0. To show that m(λ) is unique, suppose there is another monic polynomial m'(λ) of degree m such that m'(A) = 0. Then m(λ) − m'(λ) is a polynomial of degree less than m. Furthermore, m(A) − m'(A) = 0, which contradicts our assumption that m(λ) is the polynomial of lowest degree such that m(A) = 0. This completes the proof. ∎

The preceding result gives rise to the notion of minimal polynomial.

4.7.4. Definition. The polynomial m(λ) defined in Theorem 4.7.3 is called the minimal polynomial of A.

Other names for minimal polynomial are minimum polynomial and reduced characteristic function. In the following we will develop an explicit form for the minimal polynomial of A, which makes it possible to determine it systematically, rather than by trial and error.

In the remainder of this section we let A denote an (n × n) matrix, we let p(λ) denote the characteristic polynomial of A, and we let m(λ) denote the minimal polynomial of A.
4.7.5. Theorem. Let f(λ) be any polynomial such that f(A) = 0. Then m(λ) divides f(λ).

Proof. Let ν denote the degree of m(λ). Then there exist polynomials q(λ) and r(λ) such that (see Theorem 2.3.9)

    f(λ) = q(λ)m(λ) + r(λ),

where deg [r(λ)] < ν or r(λ) = 0. Since f(A) = 0, we have

    0 = q(A)m(A) + r(A),

and hence r(A) = 0. This means r(λ) = 0, for otherwise we would have a contradiction to the fact that m(λ) is the minimal polynomial of A. Hence, f(λ) = q(λ)m(λ) and m(λ) divides f(λ). ∎

4.7.6. Corollary. The minimal polynomial of A, m(λ), divides the characteristic polynomial of A, p(λ).

4.7.7. Exercise. Prove Corollary 4.7.6.

We now prove:

4.7.8. Theorem. The polynomial p(λ) divides [m(λ)]ⁿ.

Proof. We want to show that [m(λ)]ⁿ = p(λ)q(λ) for some polynomial q(λ). Let m(λ) be of degree ν and be given by

    m(λ) = λ^ν + β_1 λ^{ν−1} + ... + β_ν.

Let us now define the matrices B_0, B_1, ..., B_{ν−1} as

    B_0 = I,   B_1 = A + β_1 I,   B_2 = A² + β_1 A + β_2 I,   ...,
    B_{ν−1} = A^{ν−1} + β_1 A^{ν−2} + ... + β_{ν−1} I.

Then

    B_0 = I,   B_1 − AB_0 = β_1 I,   B_2 − AB_1 = β_2 I,   ...,
    B_{ν−1} − AB_{ν−2} = β_{ν−1} I,

and

    −AB_{ν−1} = β_ν I − [A^ν + β_1 A^{ν−1} + ... + β_ν I] = β_ν I − m(A) = β_ν I.

Now let

    B(λ) = λ^{ν−1} B_0 + λ^{ν−2} B_1 + ... + B_{ν−1}.

Then

    (A − λI)B(λ) = −λ^ν B_0 − λ^{ν−1}[B_1 − AB_0] − ... − λ[B_{ν−1} − AB_{ν−2}] + AB_{ν−1}
                 = −λ^ν I − β_1 λ^{ν−1} I − ... − β_{ν−1} λI − β_ν I = −m(λ)I.

Taking the determinant of both sides of this equation, we have

    [det (A − λI)][det B(λ)] = (−1)ⁿ [m(λ)]ⁿ.

But det B(λ) is a polynomial in λ; letting q(λ) = (−1)ⁿ det B(λ), we have thus proved that p(λ)q(λ) = [m(λ)]ⁿ. ∎

The next result establishes the form of the minimal polynomial.

4.7.9. Theorem. Let p(λ) be given by Eq. (4.5.24); i.e.,

    p(λ) = (λ_1 − λ)^{m_1} (λ_2 − λ)^{m_2} ... (λ_p − λ)^{m_p},

where m_1, ..., m_p are the algebraic multiplicities of the distinct eigenvalues λ_1, ..., λ_p of A, respectively. Then

    m(λ) = (λ − λ_1)^{ν_1} (λ − λ_2)^{ν_2} ... (λ − λ_p)^{ν_p},     (4.7.10)

where 1 ≤ ν_i ≤ m_i for i = 1, ..., p.

4.7.11. Exercise. Prove Theorem 4.7.9. (Hint: assume that m(λ) = (λ − μ_1)^{η_1} ... (λ − μ_s)^{η_s}, and use Corollary 4.7.6 and Theorem 4.7.8.)

The only unknowns left to determine the minimal polynomial of A are ν_1, ..., ν_p in Eq. (4.7.10). These can be determined in several ways. Our next result is an immediate consequence of Theorem 4.3.27.
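For small matrices, Theorem 4.7.9 suggests a direct way to find the exponents ν_i: try candidate exponent tuples with 1 ≤ ν_i ≤ m_i in order of total degree until the product annihilates A (a sketch assuming numpy, using the motivating matrix from the beginning of this section):

```python
import numpy as np
from itertools import product

# Determining the minimal polynomial by search: by Theorem 4.7.9 the
# exponents satisfy 1 <= nu_i <= m_i, so the candidate grid is small.
# Matrix from the section's motivating example: eigenvalues 1 (m=2), 2 (m=1).
A = np.array([[1.0, 3.0, -2.0],
              [0.0, 4.0, -2.0],
              [0.0, 3.0, -1.0]])
I = np.eye(3)
eigs, mults = [1.0, 2.0], [2, 1]

def annihilates(nus):
    # Form prod (A - lambda_i I)^{nu_i} and test whether it is zero.
    M = I.copy()
    for lam, nu in zip(eigs, nus):
        M = M @ np.linalg.matrix_power(A - lam * I, nu)
    return np.allclose(M, 0)

candidates = sorted(product(*[range(1, m + 1) for m in mults]), key=sum)
nus = next(n for n in candidates if annihilates(n))
print(nus)  # (1, 1): m(lambda) = (lambda - 1)(lambda - 2)
```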

4.7.12. Theorem. Let A' be similar to A, and let m'(λ) be the minimal polynomial of A'. Then m'(λ) = m(λ).

This result justifies the following definition.

4.7.13. Definition. Let A ∈ L(X, X). The minimal polynomial of A is the minimal polynomial of any matrix A which represents A.
In order to develop the Jordan canonical form (for linear transformations with repeated eigenvalues), we need to establish several additional preliminary results which are important in their own right.

4.7.14. Theorem. Let A ∈ L(X, X), and let f(λ) be any polynomial in λ. Let 𝔑_f = {x : f(A)x = 0}. Then 𝔑_f is an invariant linear subspace of X under A.

Proof. The proof that 𝔑_f is a linear subspace of X is straightforward and is left as an exercise. To show that 𝔑_f is invariant under A, let x ∈ 𝔑_f, so that f(A)x = 0. We want to show that Ax ∈ 𝔑_f. Since f(A) is a polynomial in A, it commutes with A. Then

    f(A)(Ax) = A f(A)x = A·0 = 0,

which completes the proof. ∎


Before proceeding further, we establish some additional notation. Let
AI" .. ,Ap be distinct eigenvalues of A E L(X,
)X . F o r j = I, ... ,p and
for any positive integer ,q let
1~

{x:

AJT)qX

(A -

OJ.

(4.7.15)

Note that this notation is consistent with that used in Example


if we define

}~

.4 6.20

~J.

Note also that, in view of Theorem .4 7.14, 1~ is an invariant linear subspace


of X under A.
We will need the following result concerning the restriction of a linear
transformation.

4.7.16. Theorem. Let A ∈ L(X, X). Let X_1 and X_2 be linear subspaces of X such that X = X_1 ⊕ X_2, and let A_1 be the restriction of A to X_1. Let f(λ) be any polynomial in λ. If A is reduced by X_1 and X_2, then, for all x_1 ∈ X_1, f(A_1)x_1 = f(A)x_1.

4.7.17. Exercise. Prove Theorem 4.7.16.

Next we prove:

4.7.18. Theorem. Let X be a vector space over C, and let A ∈ L(X, X). Let m(λ) be the minimal polynomial of A as given in Eq. (4.7.10). Let g(λ) = (λ − λ_1)^{ν_1}; let h(λ) = (λ − λ_2)^{ν_2} ... (λ − λ_p)^{ν_p} if p ≥ 2, and let h(λ) = 1 if p = 1. Let A_1 be the restriction of A to 𝔑_1^{ν_1}; i.e., A_1 x = Ax for all x ∈ 𝔑_1^{ν_1}. Let 𝔐 = {x ∈ X : h(A)x = 0}. Then

(i) X = 𝔑_1^{ν_1} ⊕ 𝔐; and
(ii) (λ − λ_1)^{ν_1} is the minimal polynomial for A_1.

Proof. By Theorem 4.7.14, 𝔐 and 𝔑_1^{ν_1} are invariant linear subspaces under A. Since g(λ) and h(λ) are relatively prime, there exist polynomials q(λ) and r(λ) such that (see Exercise 2.3.15)

    q(λ)g(λ) + r(λ)h(λ) = 1.

Hence, for the linear transformation A we have

    q(A)g(A) + r(A)h(A) = I.                                 (4.7.19)

Thus, for x ∈ X, we have

    x = q(A)g(A)x + r(A)h(A)x.

Now since

    h(A)q(A)g(A)x = q(A)g(A)h(A)x = q(A)m(A)x = q(A)·0 = 0,

it follows that q(A)g(A)x ∈ 𝔐. We can similarly show that r(A)h(A)x ∈ 𝔑_1^{ν_1}. Thus, for every x ∈ X we have x = x_1 + x_2, where x_1 ∈ 𝔑_1^{ν_1} and x_2 ∈ 𝔐.

Let us now show that this representation of x is unique. Let x = x_1 + x_2 = x'_1 + x'_2, where x_1, x'_1 ∈ 𝔑_1^{ν_1} and x_2, x'_2 ∈ 𝔐. Then, since h(A)x_2 = h(A)x'_2 = 0,

    r(A)h(A)x = r(A)h(A)x_1   and   r(A)h(A)x = r(A)h(A)x'_1.

Applying Eq. (4.7.19) to x_1 and x'_1, and noting that g(A)x_1 = g(A)x'_1 = 0, we get

    x_1 = r(A)h(A)x_1   and   x'_1 = r(A)h(A)x'_1.

From this we conclude that x_1 = x'_1. Similarly, we can show that x_2 = x'_2. Therefore, X = 𝔑_1^{ν_1} ⊕ 𝔐.

To prove the second part of the theorem, let A_1 be the restriction of A to 𝔑_1^{ν_1}, and let A_2 be the restriction of A to 𝔐. Let m_1(λ) and m_2(λ) be the minimal polynomials for A_1 and A_2, respectively. Since g(A_1) = 0 and h(A_2) = 0, it follows that m_1(λ) divides g(λ) and m_2(λ) divides h(λ), by Theorem 4.7.5. Hence, we can write

    m_1(λ) = (λ − λ_1)^{k_1}   and   m_2(λ) = (λ − λ_2)^{k_2} ... (λ − λ_p)^{k_p},

where 0 ≤ k_i ≤ ν_i for i = 1, ..., p. Now let f(λ) = m_1(λ)m_2(λ). Then f(A) = m_1(A)m_2(A). Let x ∈ X with x = x_1 + x_2, where x_1 ∈ 𝔑_1^{ν_1} and x_2 ∈ 𝔐. Then

    f(A)x = m_2(A)m_1(A)x_1 + m_1(A)m_2(A)x_2 = 0.

Therefore, f(A) = 0. But this implies that m(λ) divides f(λ), and so ν_i ≤ k_i for i = 1, ..., p. We thus conclude that k_i = ν_i for i = 1, ..., p, which completes the proof of the theorem. ∎

We are now in a position to prove the following important result, called the primary decomposition theorem.

4.7.20. Theorem. Let X be an n-dimensional vector space over C, let λ₁, ..., λ_p be the distinct eigenvalues of A ∈ L(X, X), let the characteristic polynomial of A be

    p(λ) = (λ₁ - λ)^{m₁} ⋯ (λ_p - λ)^{m_p},                       (4.7.21)

and let the minimal polynomial of A be

    m(λ) = (λ - λ₁)^{ν₁} ⋯ (λ - λ_p)^{ν_p}.                       (4.7.22)

Let

    Xᵢ = {x: (A - λᵢI)^{νᵢ}x = 0},  i = 1, ..., p.

Then

(i) Xᵢ, i = 1, ..., p, are invariant linear subspaces of X under A;
(ii) X = X₁ ⊕ ⋯ ⊕ X_p;
(iii) (λ - λᵢ)^{νᵢ} is the minimal polynomial of Aᵢ, where Aᵢ is the restriction of A to Xᵢ; and
(iv) dim Xᵢ = mᵢ, i = 1, ..., p.

Proof. The proofs of parts (i), (ii), and (iii) follow from the preceding theorem by a simple induction argument and are left as an exercise.

To prove the last part of the theorem, we first show that the only eigenvalue of Aᵢ ∈ L(Xᵢ, Xᵢ) is λᵢ, i = 1, ..., p. Let v ∈ Xᵢ, v ≠ 0, and consider (Aᵢ - λI)v = 0. From part (iii) it follows that

    0 = (Aᵢ - λᵢI)^{νᵢ}v = (Aᵢ - λᵢI)^{νᵢ-1}(Aᵢ - λᵢI)v
      = (Aᵢ - λᵢI)^{νᵢ-1}(λ - λᵢ)v = (λ - λᵢ)(Aᵢ - λᵢI)^{νᵢ-2}(Aᵢ - λᵢI)v
      = (λ - λᵢ)²(Aᵢ - λᵢI)^{νᵢ-2}v = ⋯ = (λ - λᵢ)^{νᵢ}v.

From this we conclude that λ = λᵢ.

We can now find a matrix representation of A in the form given in Theorem 4.6.18. Furthermore, from this theorem it follows that

    p(λ) = det (A - λI) = ∏ᵢ det (Aᵢ - λIᵢ).

Now since the only eigenvalue of Aᵢ is λᵢ, the determinant of Aᵢ - λIᵢ must be of the form

    det (Aᵢ - λIᵢ) = (λᵢ - λ)^{qᵢ},

where qᵢ = dim Xᵢ. Since p(λ) is given by Eq. (4.7.21), we must have

    (λ₁ - λ)^{m₁} ⋯ (λ_p - λ)^{m_p} = (λ₁ - λ)^{q₁} ⋯ (λ_p - λ)^{q_p},

from which we conclude that mᵢ = qᵢ. Thus, dim Xᵢ = mᵢ, i = 1, ..., p. This concludes the proof of the theorem. ∎

4.7.23. Exercise. Prove parts (i)-(iii) of Theorem 4.7.20.

The preceding result shows that we can always represent A ∈ L(X, X) by a matrix in block diagonal form, where the number of diagonal blocks (in the matrix A of Theorem 4.6.18) is equal to the number of distinct eigenvalues of A. We will next find a convenient representation for each of the diagonal submatrices Aᵢ. It may turn out that one or more of the submatrices Aᵢ will be diagonal. Our next result tells us specifically when A ∈ L(X, X) is representable by a diagonal matrix.

4.7.24. Theorem. Let X be an n-dimensional vector space over C, and let A ∈ L(X, X). Let λ₁, ..., λ_p, p ≤ n, be the distinct eigenvalues of A. Then there exists a basis for X such that the matrix A of A with respect to this basis is diagonal if and only if the minimal polynomial for A is of the form

    m(λ) = (λ - λ₁)(λ - λ₂) ⋯ (λ - λ_p).

4.7.25. Exercise. Prove Theorem 4.7.24.

4.7.26. Exercise. Apply the above theorem to the matrices in Examples 4.6.14 and 4.6.15.
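Theorem 4.7.24 suggests a direct numerical test: A is diagonalizable exactly when the product (A - λ₁I)⋯(A - λ_pI) over the distinct eigenvalues vanishes. The sketch below (Python with numpy; the helper name `is_diagonalizable` is ours, not the text's) illustrates this on two small matrices:

```python
import numpy as np

def is_diagonalizable(A, eigvals, tol=1e-9):
    """Check Theorem 4.7.24 numerically: A is diagonalizable iff
    m(lambda) = (lambda - l1)...(lambda - lp), with l1, ..., lp the
    distinct eigenvalues, annihilates A."""
    M = np.eye(A.shape[0])
    for lam in eigvals:
        M = M @ (A - lam * np.eye(A.shape[0]))
    return bool(np.abs(M).max() < tol)

# diag(2, 2, 3) is diagonalizable: (A - 2I)(A - 3I) = 0.
A1 = np.diag([2.0, 2.0, 3.0])
# A 2x2 Jordan block with eigenvalue 2 is not: (A - 2I) != 0.
A2 = np.array([[2.0, 1.0],
               [0.0, 2.0]])

print(is_diagonalizable(A1, [2.0, 3.0]))  # True
print(is_diagonalizable(A2, [2.0]))       # False
```

In exact arithmetic the product is the evaluation m(A) of the candidate minimal polynomial; the tolerance only guards against floating-point round-off.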

B. Nilpotent Operators

Let us now proceed to find a representation for each of the Aᵢ ∈ L(Xᵢ, Xᵢ) in Theorem 4.7.20 so that the block diagonal matrix representation of A ∈ L(X, X) (see Theorem 4.6.18) is as simple as possible. To accomplish this, we first need to define and examine so-called nilpotent operators.

4.7.27. Definition. Let N ∈ L(X, X). Then N is said to be nilpotent if there exists an integer q > 0 such that N^q = 0. A nilpotent operator is said to be of index q if N^q = 0 but N^{q-1} ≠ 0.

Recall now that Theorem 4.7.20 enables us to write X = X₁ ⊕ X₂ ⊕ ⋯ ⊕ X_p. Furthermore, the linear transformation (Aᵢ - λᵢI) is nilpotent on Xᵢ. If we let Nᵢ = Aᵢ - λᵢI, then Aᵢ = λᵢI + Nᵢ. Now λᵢI is clearly represented by a diagonal matrix. However, the transformation Nᵢ forces the matrix representation of Aᵢ to be, in general, non-diagonal. So our next task is to seek a simple representation of the nilpotent operator Nᵢ.

In the next few results, which are concerned with properties of nilpotent operators, we drop for convenience the subscript i.

4.7.28. Theorem. Let N ∈ L(V, V), where V is an m-dimensional vector space. If N is a nilpotent linear transformation of index q and if x ∈ V is such that N^{q-1}x ≠ 0, then the vectors x, Nx, ..., N^{q-1}x in V are linearly independent.

Proof. We first note that if N^{q-1}x ≠ 0, then N^j x ≠ 0 for j = 0, 1, ..., q - 1. Our proof is now by contradiction. Suppose that

    Σᵢ₌₀^{q-1} αᵢ Nⁱx = 0.

Let j be the smallest integer such that αⱼ ≠ 0. Then we can write

    αⱼ N^j x = - Σᵢ₌ⱼ₊₁^{q-1} αᵢ Nⁱx.

Thus,

    N^j x = N^{j+1} [ Σᵢ₌ⱼ₊₁^{q-1} (-αᵢ/αⱼ) N^{i-j-1} x ] = N^{j+1} y,

where y is defined in an obvious way. Now we can write

    N^{q-1}x = N^{q-j-1} N^j x = N^{q-j-1} N^{j+1} y = N^q y = 0.

We thus have arrived at a contradiction, which proves our result. ∎
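Theorem 4.7.28 is easy to observe numerically: stacking x, Nx, ..., N^{q-1}x as columns gives a matrix of full column rank. A small sketch (Python with numpy; the particular N and x are our illustrative choices):

```python
import numpy as np

# A nilpotent matrix of index q = 3 on R^3 (a single shift block).
N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

x = np.array([0.0, 0.0, 1.0])        # N^2 x = e1 != 0, so x qualifies
vectors = [x, N @ x, N @ N @ x]      # x, Nx, N^2 x

# Theorem 4.7.28: these q = 3 vectors are linearly independent.
print(np.linalg.matrix_rank(np.column_stack(vectors)))  # 3
```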


Next, let us examine
mations.

0, I, ... ,

the matrix

representation of nilpotent transfor-

.4 7.29. Theorem. Let V be a q-dimensional vector space, and let N E L ( V,


V) be nilpotent of index .q Let mo E V be such that Nf-1m o *- o. Then the
matrix N of N with respect to the basis { N f- I m o, NQ-2 mo , . .. ,mol in V
is given by
0100
00
0010
00
N=
.
(4.7.30)
0000
01
0000
00

Proof.

By the previous theorem we know that {Nf-Im o,' .. ,mol is a linearly


independent set. By hypothesis, there are q vectors in the set, and thus
'{ N f- I m o, ... ,mol forms a basis for V. Let el = Nqm
o for i = I, ... ,q .
Then
O,
i= I
Ne l

Hence,

Ne l

= 0 et

Ne 2

Ne f

= 0 e

I et

{ el->J

0 e2 +
0 e2 +
0 e2 +

2, ... ,q .

1=

+
... +

0 . ef -

0 ef -

I e

f-

0 ef
0 eq

0 e

F r om Eq. (4.2.2) and Definition .4 2.7, it follows that the representation


of Nis that given by Eq. (4.7.30). This completes the proofofthetheorem. -
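The change of basis in Theorem 4.7.29 can be carried out explicitly: place the columns N^{q-1}m₀, ..., Nm₀, m₀ in a matrix P and conjugate. A sketch (Python with numpy; the matrix N and vector m₀ are our illustrative choices):

```python
import numpy as np

# A nilpotent N of index q = 3 that is not already in the form (4.7.30).
N = np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

m0 = np.array([0.0, 0.0, 1.0])       # chosen so that N^2 m0 != 0

# Basis {N^2 m0, N m0, m0} as the columns of P.
P = np.column_stack([N @ N @ m0, N @ m0, m0])

shift = np.linalg.inv(P) @ N @ P     # matrix of N in the new basis
print(np.round(shift))
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
```

The result is exactly the shift matrix of Eq. (4.7.30), independently of which admissible m₀ is chosen.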

The above theorem establishes the matrix representation of a nilpotent linear transformation of index q on a q-dimensional vector space. We will next determine the representation of a nilpotent operator of index ν on a vector space of dimension m, where ν < m. The following lemma shows that we can dismiss the case ν > m.

4.7.31. Lemma. Let N ∈ L(V, V) be nilpotent of index ν, where dim V = m. Then ν ≤ m.

Proof. Assume x ∈ V, N^ν x = 0, N^{ν-1}x ≠ 0, and ν > m. Then, by Theorem 4.7.28, the vectors x, Nx, ..., N^{ν-1}x are linearly independent, which contradicts the fact that dim V = m. ∎

To prove the next theorem, we require the following result.

4.7.32. Lemma. Let V be an m-dimensional vector space, let N ∈ L(V, V), let ν be any positive integer, and let

    W₁ = {x: Nx = 0},       dim W₁ = l₁,
    W₂ = {x: N²x = 0},      dim W₂ = l₂,
    ⋮
    W_ν = {x: N^ν x = 0},   dim W_ν = l_ν.

Also, let {e₁, ..., e_m} be a basis for V such that {e₁, ..., e_{lᵢ}} is a basis for Wᵢ for each i, 1 ≤ i ≤ ν, and set l₀ = 0. Then, for any i such that 1 ≤ i < ν,

(i) W₁ ⊂ W₂ ⊂ ⋯ ⊂ W_ν; and
(ii) {e₁, ..., e_{l_{i-1}}, Ne_{l_i+1}, ..., Ne_{l_{i+1}}} is a linearly independent set of vectors in Wᵢ.

Proof. To prove the first part, let x ∈ Wᵢ for any i < ν. Then Nⁱx = 0. Hence, N^{i+1}x = 0, which implies x ∈ W_{i+1}.

To prove the second part, let r = l_{i-1} and let t = l_{i+1} - lᵢ. We note that if x ∈ W_{i+1}, then Nⁱ(Nx) = 0, and so Nx ∈ Wᵢ. This implies that Neⱼ ∈ Wᵢ for j = lᵢ + 1, ..., l_{i+1}. This means that the set of vectors {e₁, ..., e_r, Ne_{l_i+1}, ..., Ne_{l_{i+1}}} is in Wᵢ. We show that this set is linearly independent by contradiction. Assume there are scalars α₁, ..., α_r and β₁, ..., β_t, not all zero, such that

    α₁e₁ + ⋯ + α_r e_r + β₁Ne_{l_i+1} + ⋯ + β_t Ne_{l_{i+1}} = 0.

Since {e₁, ..., e_r} is a linearly independent set, at least one of the βᵢ must be non-zero. Applying N^{i-1} to the last equation, and noting that N^{i-1}eⱼ = 0 for j = 1, ..., r (since eⱼ ∈ W_{i-1}), we obtain

    Nⁱ(β₁e_{l_i+1} + ⋯ + β_t e_{l_{i+1}}) = 0,

and so (β₁e_{l_i+1} + ⋯ + β_t e_{l_{i+1}}) ∈ Wᵢ. If β₁e_{l_i+1} + ⋯ + β_t e_{l_{i+1}} ≠ 0, it can be written as a linear combination of e₁, ..., e_{lᵢ}, which contradicts the fact that {e₁, ..., e_{l_{i+1}}} is a linearly independent set. If β₁e_{l_i+1} + ⋯ + β_t e_{l_{i+1}} = 0, we contradict the fact that {e_{l_i+1}, ..., e_{l_{i+1}}} is a linearly independent set. Hence, we conclude that αᵢ = 0 for i = 1, ..., r and βᵢ = 0 for i = 1, ..., t. This completes the proof of the lemma. ∎
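The nested kernels of Lemma 4.7.32 can be computed directly, since lᵢ = dim Wᵢ = m - rank(Nⁱ). A sketch (Python with numpy; the matrix N is our illustrative choice):

```python
import numpy as np

# N with two strings: one of length 3 and one of length 1 (4 x 4).
N = np.zeros((4, 4))
N[0, 1] = N[1, 2] = 1.0   # a shift block of size 3; e4 is annihilated at once

n = N.shape[0]
# l_i = dim W_i = n - rank(N^i), for i = 1, 2, 3.
dims = [n - np.linalg.matrix_rank(np.linalg.matrix_power(N, i))
        for i in range(1, 4)]
print(dims)   # [2, 3, 4]: l1 <= l2 <= l3, reflecting W1 c W2 c W3
```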
We are now in a position to consider the general representation of a nilpotent operator on a finite-dimensional vector space.

4.7.33. Theorem. Let V be an m-dimensional vector space over C, and let N ∈ L(V, V) be nilpotent of index ν. Let W₁ = {x: Nx = 0}, ..., W_ν = {x: N^ν x = 0}, and let lᵢ = dim Wᵢ, i = 1, ..., ν. Then there exists a basis for V such that the matrix N of N is of block diagonal form,

    N = [ N₁           ]
        [    N₂        ]                                          (4.7.34)
        [       ⋱      ]
        [          N_r ]

where

    Nᵢ = [ 0 1 0 ⋯ 0 0 ]
         [ 0 0 1 ⋯ 0 0 ]
         [ ⋮ ⋮ ⋮   ⋮ ⋮ ]                                         (4.7.35)
         [ 0 0 0 ⋯ 0 1 ]
         [ 0 0 0 ⋯ 0 0 ]

i = 1, ..., r, where r = l₁, Nᵢ is a (kᵢ × kᵢ) matrix, 1 ≤ kᵢ ≤ ν, and kᵢ is determined in the following way: there are

    l_ν - l_{ν-1}             (ν × ν) matrices,
    2lᵢ - l_{i+1} - l_{i-1}   (i × i) matrices, i = 2, ..., ν - 1, and
    2l₁ - l₂                  (1 × 1) matrices.

The basis for V consists of strings of vectors of the form {N^{kᵢ-1}xᵢ, ..., Nxᵢ, xᵢ}, i = 1, ..., r.

Proof. By Lemma 4.7.32, W₁ ⊂ W₂ ⊂ ⋯ ⊂ W_ν. Let {e₁, ..., e_m} be a basis for V such that {e₁, ..., e_{lᵢ}} is a basis for Wᵢ, i = 1, ..., ν. We see that W_ν = V. Since N is nilpotent of index ν, W_{ν-1} ≠ W_ν and l_{ν-1} < l_ν.

We now proceed to select a new basis for V which yields the desired result. We find it convenient to use double subscripting of vectors. Let f_{1,ν} = e_{l_{ν-1}+1}, ..., f_{(l_ν - l_{ν-1}),ν} = e_{l_ν}, and let f_{1,ν-1} = Nf_{1,ν}, ..., f_{(l_ν - l_{ν-1}),ν-1} = Nf_{(l_ν - l_{ν-1}),ν}. By Lemma 4.7.32, it follows that {e₁, ..., e_{l_{ν-2}}, f_{1,ν-1}, ..., f_{(l_ν - l_{ν-1}),ν-1}} is a linearly independent subset of W_{ν-1}, which may or may not be a basis for W_{ν-1}. If it is not, we adjoin additional elements from W_{ν-1}, denoted by f_{(l_ν - l_{ν-1})+1,ν-1}, ..., f_{(l_{ν-1} - l_{ν-2}),ν-1}, so as to form a basis for W_{ν-1}. Now let f_{1,ν-2} = Nf_{1,ν-1}, f_{2,ν-2} = Nf_{2,ν-1}, ..., f_{(l_{ν-1} - l_{ν-2}),ν-2} = Nf_{(l_{ν-1} - l_{ν-2}),ν-1}. By Lemma 4.7.32 it follows, as before, that {e₁, ..., e_{l_{ν-3}}, f_{1,ν-2}, ..., f_{(l_{ν-1} - l_{ν-2}),ν-2}} is a linearly independent set in W_{ν-2}. If this set is not a basis, we adjoin vectors from W_{ν-2} so that we do have a basis. We denote the vectors that we adjoin by f_{(l_{ν-1} - l_{ν-2})+1,ν-2}, ..., f_{(l_{ν-2} - l_{ν-3}),ν-2}. We continue in this manner until we have formed a basis for V. We express this basis in the manner indicated in Figure C.

    f_{1,ν}     ⋯  f_{(l_ν - l_{ν-1}),ν}
    f_{1,ν-1}   ⋯  f_{(l_ν - l_{ν-1}),ν-1}  ⋯  f_{(l_{ν-1} - l_{ν-2}),ν-1}
      ⋮               ⋮                           ⋮
    f_{1,2}     ⋯      ⋯                     ⋯  f_{(l₂ - l₁),2}
    f_{1,1}     ⋯      ⋯                     ⋯  f_{(l₂ - l₁),1}  ⋯  f_{l₁,1}

4.7.36. Figure C. Basis for V. (Each column is one string; N maps every vector in the table to the vector directly below it, and each bottom-row vector to 0.)

The desired result follows now, for we have

    Nf_{i,j} = f_{i,j-1},  j > 1,   and   Nf_{i,1} = 0.

Hence, if we let x₁ = f_{1,ν}, we see that the first column in Figure C, reading from bottom to top, is

    N^{ν-1}x₁, ..., Nx₁, x₁.

We see that each column of Figure C determines a string consisting of kᵢ entries, where kᵢ = ν for i = 1, ..., (l_ν - l_{ν-1}). Note that (l_ν - l_{ν-1}) > 0, so there is at least one string. In general, the number of strings with j entries is (lⱼ - l_{j-1}) - (l_{j+1} - lⱼ) = 2lⱼ - l_{j+1} - l_{j-1} for j = 2, ..., ν - 1. Also, there are l₁ - (l₂ - l₁) = 2l₁ - l₂ vectors, or strings with one entry. Finally, to show that the number of diagonal blocks Nᵢ in N is l₁, we see that there are a total of

    (l_ν - l_{ν-1}) + (2l_{ν-1} - l_ν - l_{ν-2}) + ⋯ + (2l₂ - l₃ - l₁) + (2l₁ - l₂) = l₁

columns in the table of Figure C. This completes the proof of the theorem. ∎

The reader should study Figure C to obtain an appreciation of the structure of the basis for the space V.
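The block counts in Theorem 4.7.33 follow mechanically from the dimensions lᵢ. The sketch below (Python with numpy; the helper name is ours, and the input is assumed nilpotent so the loop terminates) computes 2lₖ - l_{k+1} - l_{k-1} with the conventions l₀ = 0 and l_{ν+1} = l_ν:

```python
import numpy as np

def jordan_block_counts(N):
    """Number of k x k blocks in the nilpotent form (4.7.34), computed
    from l_i = dim W_i = dim ker(N^i) as in Theorem 4.7.33.
    Assumes N is nilpotent (otherwise the loop would not terminate)."""
    n = N.shape[0]
    l = [0]                       # l_0 = 0 by convention
    P = np.eye(n)
    while True:
        P = P @ N
        l.append(n - np.linalg.matrix_rank(P))
        if l[-1] == n:            # reached W_nu = V
            break
    nu = len(l) - 1
    counts = {}
    for k in range(1, nu + 1):
        l_next = l[k + 1] if k < nu else l[nu]   # l_{nu+1} = l_nu
        counts[k] = 2 * l[k] - l_next - l[k - 1]
    return counts

# Strings of lengths 3, 2, 1, 1 on R^7.
N = np.zeros((7, 7))
N[0, 1] = N[1, 2] = 1.0   # a string of length 3
N[3, 4] = 1.0             # a string of length 2
print(jordan_block_counts(N))   # {1: 2, 2: 1, 3: 1}
```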

C. The Jordan Canonical Form

We are finally now in a position to state and prove the result which establishes the Jordan canonical form of matrices.

4.7.37. Theorem. Let X be an n-dimensional vector space over C, and let A ∈ L(X, X). Let the characteristic polynomial of A be

    p(λ) = (λ₁ - λ)^{m₁} ⋯ (λ_p - λ)^{m_p},

and let the minimal polynomial of A be

    m(λ) = (λ - λ₁)^{ν₁} ⋯ (λ - λ_p)^{ν_p},

where λ₁, ..., λ_p are the distinct eigenvalues of A. Let

    Xᵢ = {x ∈ X: (A - λᵢI)^{νᵢ}x = 0}.

Then

(i) X₁, ..., X_p are invariant subspaces of X under A;
(ii) X = X₁ ⊕ ⋯ ⊕ X_p;
(iii) dim Xᵢ = mᵢ, i = 1, ..., p; and
(iv) there exists a basis for X such that the matrix A of A with respect to this basis is of the form

    A = [ A₁ 0  ⋯ 0   ]
        [ 0  A₂ ⋯ 0   ]                                          (4.7.38)
        [ ⋮  ⋮     ⋮  ]
        [ 0  0  ⋯ A_p ]

where Aᵢ is an (mᵢ × mᵢ) matrix of the form

    Aᵢ = λᵢIᵢ + Nᵢ,                                              (4.7.39)

and where Nᵢ is the matrix of the nilpotent operator (Aᵢ - λᵢIᵢ) of index νᵢ on Xᵢ, given by Eq. (4.7.34) and Eq. (4.7.35).

Proof. Parts (i)-(iii) are restatements of the primary decomposition theorem (Theorem 4.7.20). From this theorem we also know that (λ - λᵢ)^{νᵢ} is the minimal polynomial of Aᵢ, the restriction of A to Xᵢ. Hence, if we let Nᵢ = Aᵢ - λᵢI, then Nᵢ is a nilpotent operator of index νᵢ on Xᵢ. We are thus able to represent Nᵢ as shown in Eq. (4.7.35). This completes the proof of the theorem. ∎

A little extra work shows that the representation of A ∈ L(X, X) by a matrix A of the form given in Eqs. (4.7.38) and (4.7.39) is unique, except for the order in which the diagonal blocks A₁, ..., A_p appear in A.

4.7.40. Definition. The matrix A of A ∈ L(X, X) given by Eqs. (4.7.38) and (4.7.39) is called the Jordan canonical form of A.
We conclude the present section with an example.

4.7.41. Example. Let X = R⁷ and let {u₁, ..., u₇} be the natural basis for X (see Example 4.1.15). Let A ∈ L(X, X) be represented by the matrix

    A = [ -1   0  -1   1   1   3   0 ]
        [  0   1   0   0   0   0   0 ]
        [  2   1   2  -1  -1  -6   0 ]
        [ -2   0  -1   2   1   3   0 ]
        [  0   0   0   0   1   0   0 ]
        [  0   0   0   0   0   1   0 ]
        [ -1  -1   0   1   2   4   1 ]

with respect to {u₁, ..., u₇}. Let us find the matrix A′ which represents A in the Jordan canonical form.

We first find that the characteristic polynomial of A is

    p(λ) = (1 - λ)⁷.

This implies that λ₁ = 1 is the only distinct eigenvalue of A. Its algebraic multiplicity is m₁ = 7. In order to find the minimal polynomial of A, let

    N = A - λ₁I,

where I is the identity operator in L(X, X). The representation for N with respect to the natural basis in X is

    N = A - I = [ -2   0  -1   1   1   3   0 ]
                [  0   0   0   0   0   0   0 ]
                [  2   1   1  -1  -1  -6   0 ]
                [ -2   0  -1   1   1   3   0 ]
                [  0   0   0   0   0   0   0 ]
                [  0   0   0   0   0   0   0 ]
                [ -1  -1   0   1   2   4   0 ]

We assume the minimal polynomial is of the form m(λ) = (λ - 1)^{ν₁} and proceed to find the smallest ν₁ such that m(A) = N^{ν₁} = 0. We first obtain

    N² = [ 0  -1   0   0   0   3   0 ]
         [ 0   0   0   0   0   0   0 ]
         [ 0   1   0   0   0  -3   0 ]
         [ 0  -1   0   0   0   3   0 ]
         [ 0   0   0   0   0   0   0 ]
         [ 0   0   0   0   0   0   0 ]
         [ 0   0   0   0   0   0   0 ]

Next, we get that

    N³ = 0,

and so ν₁ = 3. Hence, N is a nilpotent operator of index 3, and X = X₁ (in the notation of Theorem 4.7.20). We will now apply Theorem 4.7.33 to obtain a representation for N in this space.

Using the notation of Theorem 4.7.33, we let W₁ = {x: Nx = 0}, W₂ = {x: N²x = 0}, and W₃ = {x: N³x = 0}. We see that N has three linearly independent rows. This means that the rank of N is 3, and so dim W₁ = l₁ = 4. Similarly, the rank of N² is 1, and so dim W₂ = l₂ = 6. Clearly, dim W₃ = l₃ = 7. We can conclude that N will have a representation N′ of the form in Eq. (4.7.34) with r = 4. Each of the Nᵢ′ will be of the form in Eq. (4.7.35). There will be l₃ - l₂ = 1 (3 × 3) matrix. There will be 2l₂ - l₃ - l₁ = 1 (2 × 2) matrix, and 2l₁ - l₂ = 2 (1 × 1) matrices. Hence, there is a basis for X such that N may be represented by the matrix

    N′ = [ 0 1 0 | 0 0 | 0 | 0 ]
         [ 0 0 1 | 0 0 | 0 | 0 ]
         [ 0 0 0 | 0 0 | 0 | 0 ]
         [ 0 0 0 | 0 1 | 0 | 0 ]
         [ 0 0 0 | 0 0 | 0 | 0 ]
         [ 0 0 0 | 0 0 | 0 | 0 ]
         [ 0 0 0 | 0 0 | 0 | 0 ]

The corresponding basis will consist of strings of vectors of the form

    N²x₁, Nx₁, x₁;   Nx₂, x₂;   x₃;   x₄.

We will represent the vectors x₁, x₂, x₃, and x₄ by x₁, x₂, x₃, and x₄, their coordinate representations, respectively, with respect to the natural basis {u₁, ..., u₇} in X. We begin by choosing x₁ ∈ W₃ such that x₁ ∉ W₂; i.e., we find an x₁ such that N³x₁ = 0 but N²x₁ ≠ 0. The vector x₁ᵀ = (0, 1, 0, 0, 0, 0, 0) will do. We see that (Nx₁)ᵀ = (0, 0, 1, 0, 0, 0, -1) and (N²x₁)ᵀ = (-1, 0, 1, -1, 0, 0, 0). Hence, Nx₁ ∈ W₂ but Nx₁ ∉ W₁, and N²x₁ ∈ W₁. We see there will be only one string of length three, and so we next choose x₂ ∈ W₂ such that x₂ ∉ W₁. Also, the pair {Nx₁, x₂} must be linearly independent. The vector x₂ᵀ = (1, 0, 0, 0, 0, 0, 0) will do. Now (Nx₂)ᵀ = (-2, 0, 2, -2, 0, 0, -1), and Nx₂ ∈ W₁. We complete the basis for X by selecting two more vectors, x₃, x₄ ∈ W₁, such that {N²x₁, Nx₂, x₃, x₄} are linearly independent. The vectors x₃ᵀ = (0, 0, -1, -2, 1, 0, 0) and x₄ᵀ = (1, 3, 1, 0, 0, 1, 0) will suffice.

It follows that the matrix

    P = [N²x₁, Nx₁, x₁, Nx₂, x₂, x₃, x₄]

is the matrix of the new basis with respect to the natural basis (see Exercise 4.3.9). The reader can readily show that

    N′ = P⁻¹NP,

where

    P = [ -1   0   0  -2   1   0   1 ]
        [  0   0   1   0   0   0   3 ]
        [  1   1   0   2   0  -1   1 ]
        [ -1   0   0  -2   0  -2   0 ]
        [  0   0   0   0   0   1   0 ]
        [  0   0   0   0   0   0   1 ]
        [  0  -1   0  -1   0   0   0 ]

and

    P⁻¹ = [ 0   0   2   1   4  -2   2 ]
          [ 0   0   1   1   3  -1   0 ]
          [ 0   1   0   0   0  -3   0 ]
          [ 0   0  -1  -1  -3   1  -1 ]
          [ 1   0   0  -1  -2  -1   0 ]
          [ 0   0   0   0   1   0   0 ]
          [ 0   0   0   0   0   1   0 ]

Finally, the Jordan canonical form for A is given by

    A′ = N′ + I.

(Recall that the matrix representation for I is the same for any basis in X.) Thus,

    A′ = [ 1 1 0 | 0 0 | 0 | 0 ]
         [ 0 1 1 | 0 0 | 0 | 0 ]
         [ 0 0 1 | 0 0 | 0 | 0 ]
         [ 0 0 0 | 1 1 | 0 | 0 ]
         [ 0 0 0 | 0 1 | 0 | 0 ]
         [ 0 0 0 | 0 0 | 1 | 0 ]
         [ 0 0 0 | 0 0 | 0 | 1 ]

Again, the reader can show that

    A′ = P⁻¹AP.

In general, it is more convenient as a check to show that PA′
= AP.
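The example's bookkeeping is easy to verify numerically. The sketch below (Python with numpy, not part of the original text) builds P from the string vectors x₁, Nx₁, N²x₁, x₂, Nx₂, x₃, x₄ listed above, forms A′ = I + N′, and checks that P is non-singular, that N = PA′P⁻¹ - I is nilpotent of index 3, and that PA′ = AP:

```python
import numpy as np

# Columns of P: the strings N^2 x1, N x1, x1, N x2, x2, x3, x4 from the example.
P = np.array([
    [-1,  0, 0, -2, 1,  0, 1],
    [ 0,  0, 1,  0, 0,  0, 3],
    [ 1,  1, 0,  2, 0, -1, 1],
    [-1,  0, 0, -2, 0, -2, 0],
    [ 0,  0, 0,  0, 0,  1, 0],
    [ 0,  0, 0,  0, 0,  0, 1],
    [ 0, -1, 0, -1, 0,  0, 0]], dtype=float)

# A' = I + N': strings of lengths 3, 2, 1, 1 put 1's at (1,2), (2,3), (4,5).
Ap = np.eye(7)
for i, j in [(0, 1), (1, 2), (3, 4)]:
    Ap[i, j] = 1.0

A = P @ Ap @ np.linalg.inv(P)       # the operator in the natural basis
N = A - np.eye(7)

print(np.linalg.matrix_rank(P))                        # 7 (P non-singular)
print(np.allclose(np.linalg.matrix_power(N, 3), 0))    # True:  N^3 = 0
print(np.allclose(np.linalg.matrix_power(N, 2), 0))    # False: index is 3
print(np.allclose(P @ Ap, A @ P))                      # True:  PA' = AP
```

Since A = PA′P⁻¹ by construction, the final check is exactly the identity PA′ = AP recommended at the end of the example.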

4.7.42. Exercise. Let X = R⁶, and let {u₁, ..., u₆} denote the natural basis for X. Let A ∈ L(X, X) be represented by the matrix

A=

05 -1
I
1
0
3 -I
-1
1
0
0
4
0
0
0
1
1
-1
0
0
0
4 -1
0
0
0
0
1
3
0
0
0
0
1
3

Show that the Jordan canonical form of A is given by


    A′ = [ 4 1 0 | 0 0 | 0 ]
         [ 0 4 1 | 0 0 | 0 ]
         [ 0 0 4 | 0 0 | 0 ]
         [ 0 0 0 | 4 1 | 0 ]
         [ 0 0 0 | 0 4 | 0 ]
         [ 0 0 0 | 0 0 | 2 ]

and find a basis for X for which A' represents A.

4.8. BILINEAR FUNCTIONALS AND CONGRUENCE

In the present section we consider the representation and some of the properties of bilinear functionals on real finite-dimensional vector spaces. (We will consider bilinear functionals defined on complex vector spaces in Chapter 6.)

Throughout this section X is assumed to be an n-dimensional vector space over the field of real numbers. We recall that if f is a bilinear functional on a real vector space X, then f: X × X → R and

    f(αx₁ + βx₂, y) = αf(x₁, y) + βf(x₂, y)

and

    f(x, αy₁ + βy₂) = αf(x, y₁) + βf(x, y₂)

for all α, β ∈ R and for all x, x₁, x₂, y, y₁, y₂ ∈ X. As a consequence of these properties we have, more generally,

    f(Σⱼ₌₁ʳ αⱼxⱼ, Σₖ₌₁ˢ βₖyₖ) = Σⱼ₌₁ʳ Σₖ₌₁ˢ αⱼβₖ f(xⱼ, yₖ)

for all αⱼ, βₖ ∈ R and all xⱼ, yₖ ∈ X, j = 1, ..., r and k = 1, ..., s.

4.8.1. Definition. Let {e₁, ..., eₙ} be a basis for the vector space X, and let

    fᵢⱼ = f(eᵢ, eⱼ),  i, j = 1, ..., n.

The matrix F = [fᵢⱼ] is called the matrix of the bilinear functional f with respect to {e₁, ..., eₙ}.

Our first result provides us with the representation of bilinear functionals on finite-dimensional vector spaces.

4.8.2. Theorem. Let f be a bilinear functional on a vector space X, and let {e₁, ..., eₙ} be a basis for X. Let F be the matrix of the bilinear functional f with respect to the basis {e₁, ..., eₙ}. If x and y are arbitrary vectors in X and if x and y are their coordinate representations with respect to the basis {e₁, e₂, ..., eₙ}, then

    f(x, y) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ fᵢⱼ ξᵢ ηⱼ = xᵀFy.                       (4.8.3)

Proof. We have xᵀ = (ξ₁, ..., ξₙ) and yᵀ = (η₁, ..., ηₙ); i.e., x = ξ₁e₁ + ⋯ + ξₙeₙ and y = η₁e₁ + ⋯ + ηₙeₙ. Therefore,

    f(x, y) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ ξᵢηⱼ f(eᵢ, eⱼ) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ fᵢⱼ ξᵢ ηⱼ = xᵀFy,

which was to be shown. ∎

Conversely, if we are given any (n × n) matrix F, we can use formula (4.8.3) to define the bilinear functional f whose matrix with respect to the given basis {e₁, ..., eₙ} is, in turn, F again. In general, it therefore follows that on finite-dimensional vector spaces, bilinear functionals correspond in a one-to-one fashion to matrices. The particular one-to-one correspondence depends on the particular basis chosen.
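Eq. (4.8.3) is a one-line computation in coordinates. A sketch (Python with numpy; the matrix F and the vectors are our illustrative choices):

```python
import numpy as np

# Matrix of a bilinear functional f on R^3 with respect to a fixed basis:
# F[i, j] = f(e_i, e_j), as in Definition 4.8.1.
F = np.array([[2.0, 1.0,  0.0],
              [0.0, 3.0, -1.0],
              [1.0, 0.0,  1.0]])

def f(x, y):
    # Eq. (4.8.3): f(x, y) = x^T F y
    return x @ F @ y

x = np.array([1.0, 2.0, 0.0])
y = np.array([0.0, 1.0, 1.0])
print(f(x, y))                                               # 5.0
# Bilinearity in the first argument:
print(np.isclose(f(2 * x + y, y), 2 * f(x, y) + f(y, y)))    # True
```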
Now recall that if X is a real vector space, then f is said to be symmetric if f(x, y) = f(y, x) for all x, y ∈ X. We also have the following related concept.

4.8.4. Definition. A bilinear functional f on a vector space X is said to be skew symmetric if

    f(x, y) = -f(y, x)                                            (4.8.5)

for all x, y ∈ X.

For symmetric and skew symmetric bilinear functionals we have the following result.

4.8.6. Theorem. Let {e₁, ..., eₙ} be a basis for X, and let F be the matrix for a bilinear functional f with respect to {e₁, ..., eₙ}. Then

(i) f is symmetric if and only if F = Fᵀ;
(ii) f is skew symmetric if and only if F = -Fᵀ; and
(iii) for every bilinear functional f, there exists a unique symmetric bilinear functional f₁ and a unique skew symmetric bilinear functional f₂ such that

    f = f₁ + f₂.

We call f₁ the symmetric part of f and f₂ the skew symmetric part of f.

4.8.7. Exercise. Prove Theorem 4.8.6.

The preceding result motivates the following definitions.

4.8.8. Definition. An (n × n) matrix F is said to be

(i) symmetric if F = Fᵀ; and
(ii) skew symmetric if F = -Fᵀ.

The next result is easily verified.

4.8.9. Theorem. Let f be a bilinear functional on X, and let f₁ and f₂ be the symmetric and skew symmetric parts of f, respectively. Then

    f₁(x, y) = ½[f(x, y) + f(y, x)]

and

    f₂(x, y) = ½[f(x, y) - f(y, x)]

for all x, y ∈ X.

4.8.10. Exercise. Prove Theorem 4.8.9.

Now let us recall that the quadratic form induced by f was defined by the assignment x ↦ f(x, x). For quadratic forms on a real finite-dimensional vector space X we have the following result.
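In matrix terms, the split of Theorem 4.8.6(iii) is F = ½(F + Fᵀ) + ½(F - Fᵀ), and the skew part contributes nothing to the quadratic form. A sketch (Python with numpy; F and x are our illustrative choices):

```python
import numpy as np

F = np.array([[1.0, 4.0],
              [2.0, 3.0]])

F1 = 0.5 * (F + F.T)      # symmetric part      (Theorem 4.8.9)
F2 = 0.5 * (F - F.T)      # skew symmetric part (Theorem 4.8.9)

x = np.array([3.0, -1.0])
print(np.allclose(F, F1 + F2))   # True: f = f1 + f2
print(x @ F2 @ x)                # 0.0: the skew part induces the zero quadratic form
```

The second printout illustrates the fact recorded below in Theorem 4.8.13.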


4.8.11. Theorem. Let f and g be bilinear functionals on X. The quadratic forms induced by f and g are equal if and only if f and g have the same symmetric part. In other words, f(x, x) = g(x, x) for all x ∈ X if and only if

    ½[f(x, y) + f(y, x)] = ½[g(x, y) + g(y, x)]                   (4.8.12)

for all x, y ∈ X.

Proof. We note that

    f(x - y, x - y) = f(x, x) - f(x, y) - f(y, x) + f(y, y),

so that

    ½[f(x, y) + f(y, x)] = ½[f(x, x) + f(y, y) - f(x - y, x - y)],

and similarly

    ½[g(x, y) + g(y, x)] = ½[g(x, x) + g(y, y) - g(x - y, x - y)].

Now if g(x, x) = f(x, x) for all x ∈ X, then the right-hand sides of the last two equations are equal, and hence

    ½[f(x, y) + f(y, x)] = ½[g(x, y) + g(y, x)];

i.e., Eq. (4.8.12) holds. Conversely, assume that Eq. (4.8.12) holds for all x, y ∈ X. Then, in particular, if we let x = y, we have f(x, x) = g(x, x) for all x ∈ X. This concludes our proof. ∎

From Theorem 4.8.11 the following useful result follows: when treating quadratic functionals, it suffices to work with symmetric bilinear functionals.

We leave the proof of the next result as an exercise.

4.8.13. Theorem. A bilinear functional f on a vector space X is skew symmetric if and only if f(x, x) = 0 for all x ∈ X.

4.8.14. Exercise. Prove Theorem 4.8.13.

The next result enables us to introduce the concept of congruence.

4.8.15. Theorem. Let f be a bilinear functional on a vector space X, let {e₁, ..., eₙ} be a basis for X, and let F be the matrix of f with respect to this basis. Let {e₁′, ..., eₙ′} be another basis whose matrix with respect to {e₁, ..., eₙ} is P. Then the matrix F′ of f with respect to the basis {e₁′, ..., eₙ′} is given by

    F′ = PᵀFP.                                                    (4.8.16)

Proof. Let F′ = [fᵢⱼ′], where, by definition, fᵢⱼ′ = f(eᵢ′, eⱼ′) and eⱼ′ = Σₖ₌₁ⁿ pₖⱼ eₖ. Then

    fᵢⱼ′ = f(Σₖ₌₁ⁿ pₖᵢ eₖ, Σₗ₌₁ⁿ pₗⱼ eₗ) = Σₖ₌₁ⁿ Σₗ₌₁ⁿ pₖᵢ f(eₖ, eₗ) pₗⱼ = Σₖ₌₁ⁿ Σₗ₌₁ⁿ pₖᵢ fₖₗ pₗⱼ.

Hence, F′ = PᵀFP. ∎

We now have:

4.8.17. Definition. An (n × n) matrix F′ is said to be congruent to an (n × n) matrix F if there exists a non-singular matrix P such that

    F′ = PᵀFP.                                                    (4.8.18)

We express this congruence by writing F′ ≈ F.

Note that congruent matrices are also equivalent matrices. The next theorem shows that ≈ in Definition 4.8.17 is reflexive, symmetric, and transitive, and as such it is an equivalence relation.

4.8.19. Theorem. Let A, B, and C be (n × n) matrices. Then,

(i) A is congruent to A;
(ii) if A is congruent to B, then B is congruent to A; and
(iii) if A is congruent to B and B is congruent to C, then A is congruent to C.

Proof. Clearly A = IᵀAI, which proves the first part. To prove the second part, let A = PᵀBP, where P is non-singular. Then

    B = (Pᵀ)⁻¹AP⁻¹ = (P⁻¹)ᵀA(P⁻¹),

which proves the second part.

Let A = PᵀBP and B = QᵀCQ, where P and Q are non-singular matrices. Then

    A = PᵀQᵀCQP = (QP)ᵀC(QP),

where QP is non-singular. This proves the third part. ∎
For practical reasons we are interested in determining the "nicest" (i.e., the simplest) matrix congruent to a given matrix, or, what amounts to the same thing, the "nicest" (i.e., the most convenient) basis to use in expressing a given bilinear functional. If, in particular, we confine our interest to quadratic functionals, then it suffices, in view of Theorem 4.8.11, to consider symmetric bilinear functionals.

We come now to the main result of this section, called Sylvester's theorem.

4.8.20. Theorem. Let f be any symmetric bilinear functional on a real n-dimensional vector space X. Then there exists a basis {e₁, ..., eₙ} of X such that the matrix of f with respect to this basis is of the form

    [ +1                      ]
    [     ⋱                   ]
    [       +1                ]
    [          -1             ]
    [             ⋱           ]                                   (4.8.21)
    [               -1        ]
    [                   0     ]
    [                     ⋱   ]
    [                       0 ]

where the first p diagonal entries equal +1, the next r - p equal -1, and the remaining n - r equal 0. The integers r and p in the above matrix are uniquely determined by the bilinear functional f.

Proof. Since the proof of this theorem is somewhat long, it will be carried out in several steps.

Step 1. We first show that there exists a basis {v₁, ..., vₙ} of X such that f(vᵢ, vⱼ) = 0 for i ≠ j. The proof of this step is by induction on the dimension of X. The statement is trivial if dim X = 1. Suppose that the assertion is true for dim X = n - 1. Let f be a symmetric bilinear functional on X, where dim X = n. Let v₁ ∈ X be such that f(v₁, v₁) ≠ 0. There must be such a v₁; otherwise, by Theorem 4.8.13, f would be skew symmetric as well as symmetric, and we would conclude that f(x, y) = 0 for all x, y, in which case any basis serves. Now let 𝔐 = {x ∈ X: f(v₁, x) = 0}. We now show that 𝔐 is a linear subspace of X. Let x₁, x₂ ∈ 𝔐, so that f(v₁, x₁) = f(v₁, x₂) = 0. Then f(v₁, x₁ + x₂) = f(v₁, x₁) + f(v₁, x₂) = 0 + 0 = 0. Similarly, f(v₁, αx₁) = αf(v₁, x₁) = 0 for all α ∈ R. Therefore, 𝔐 is a linear subspace of X. Furthermore, 𝔐 ≠ X because v₁ ∉ 𝔐. Hence, dim 𝔐 ≤ n - 1. Now let dim 𝔐 = q ≤ n - 1. Since f is a bilinear functional on 𝔐, it follows by the induction hypothesis that there is a basis for 𝔐 consisting of a set of q vectors {v₂, ..., v_{q+1}} such that f(vᵢ, vⱼ) = 0 for i ≠ j, 2 ≤ i, j ≤ q + 1. Also, f(v₁, vⱼ) = 0 for j = 2, ..., q + 1, by definition of 𝔐. Furthermore, f(vⱼ, v₁) = f(v₁, vⱼ). Hence, f(vᵢ, v₁) = f(v₁, vᵢ) = 0 for i = 2, ..., q + 1. It follows that f(vᵢ, vⱼ) = 0 for i ≠ j and 1 ≤ i, j ≤ q + 1.

We now show that {v₁, ..., v_{q+1}} is a basis for X. Let x ∈ X and let x′ = x - α₁v₁, where α₁ = f(v₁, x)/f(v₁, v₁). Then f(v₁, x′) = f(v₁, x) - α₁f(v₁, v₁) = f(v₁, x) - f(v₁, x) = 0. Thus, x′ ∈ 𝔐. Since {v₂, ..., v_{q+1}} is a basis for 𝔐, there exist α₂, ..., α_{q+1} such that x′ = α₂v₂ + ⋯ + α_{q+1}v_{q+1}; i.e., x = α₁v₁ + α₂v₂ + ⋯ + α_{q+1}v_{q+1}. Thus, {v₁, ..., v_{q+1}} spans X. To show that the set {v₁, ..., v_{q+1}} is linearly independent, assume that α₁v₁ + ⋯ + α_{q+1}v_{q+1} = 0. Then 0 = f(v₁, 0) = f(v₁, α₁v₁ + ⋯ + α_{q+1}v_{q+1}) = α₁f(v₁, v₁), which implies that α₁ = 0. Hence, α₂v₂ + ⋯ + α_{q+1}v_{q+1} = 0. Since the set {v₂, ..., v_{q+1}} forms a basis for 𝔐, we must have α₂ = ⋯ = α_{q+1} = 0. Thus, {v₁, ..., v_{q+1}} forms a basis for X, and we conclude that q + 1 = n. This completes the proof of Step 1.

Step 2. Let {v₁, ..., vₙ} be a basis for X such that f(vᵢ, vⱼ) = 0 for i ≠ j, and let βᵢ = f(vᵢ, vᵢ) for i = 1, ..., n. Let eᵢ = γᵢvᵢ for i = 1, ..., n, where γᵢ = 1/√|βᵢ| if βᵢ ≠ 0 and γᵢ = 1 if βᵢ = 0. Now suppose that βᵢ = f(vᵢ, vᵢ) ≠ 0. Then we have f(eᵢ, eᵢ) = f(γᵢvᵢ, γᵢvᵢ) = γᵢ²f(vᵢ, vᵢ) = βᵢ/|βᵢ| = ±1. Also, if βᵢ = f(vᵢ, vᵢ) = 0, then f(eᵢ, eᵢ) = γᵢ²f(vᵢ, vᵢ) = 0. Finally, we see that f(eᵢ, eⱼ) = f(γᵢvᵢ, γⱼvⱼ) = γᵢγⱼf(vᵢ, vⱼ) = 0 if i ≠ j. Thus, we have established a basis for X such that fᵢⱼ = f(eᵢ, eⱼ) = 0 if i ≠ j and fᵢᵢ = f(eᵢ, eᵢ) = +1, -1, or 0. Reordering the basis, if necessary, so that the +1's come first, followed by the -1's and then the 0's, yields the form (4.8.21).

Step 3. We now show that the integers p and r in matrix (4.8.21) are uniquely determined by f. Let {e₁, ..., eₙ} and {e₁′, ..., eₙ′} be bases for X, and let F and F′ be matrices of f with respect to {e₁, ..., eₙ} and {e₁′, ..., eₙ′}, respectively, where both F and F′ are of the form (4.8.21): F with p entries equal to +1 and r - p entries equal to -1, and F′ with q entries equal to +1 and r′ - q entries equal to -1.

To prove that p = q we show that e₁, ..., e_p, e′_{q+1}, ..., e′ₙ are linearly independent. From this it must follow that p + (n - q) ≤ n, or p ≤ q. By the same argument, q ≤ p, and so p = q. Let

    γ₁e₁ + ⋯ + γ_p e_p + γ′_{q+1}e′_{q+1} + ⋯ + γ′ₙe′ₙ = 0,

where γᵢ ∈ R, i = 1, ..., p, and γ′ⱼ ∈ R, j = q + 1, ..., n. Rewriting the above equation, and setting x₀ = γ₁e₁ + ⋯ + γ_p e_p, we have

    x₀ = -(γ′_{q+1}e′_{q+1} + ⋯ + γ′ₙe′ₙ).

Then

    f(x₀, x₀) = f(γ₁e₁ + ⋯ + γ_p e_p, γ₁e₁ + ⋯ + γ_p e_p) = γ₁² + ⋯ + γ_p² ≥ 0,

by choice of {e₁, ..., e_p}. On the other hand,

    f(x₀, x₀) = f[-(γ′_{q+1}e′_{q+1} + ⋯ + γ′ₙe′ₙ), -(γ′_{q+1}e′_{q+1} + ⋯ + γ′ₙe′ₙ)]
              = (-1)²[-(γ′_{q+1})² - (γ′_{q+2})² - ⋯ - (γ′_{r′})²] ≤ 0,

by choice of {e′_{q+1}, ..., e′ₙ}. From this we conclude that γ₁ = ⋯ = γ_p = 0. Hence, γ′_{q+1}e′_{q+1} + ⋯ + γ′ₙe′ₙ = 0. But the set {e′_{q+1}, ..., e′ₙ} is linearly independent, and thus γ′_{q+1} = ⋯ = γ′ₙ = 0. Hence, the vectors e₁, ..., e_p, e′_{q+1}, ..., e′ₙ are linearly independent, and it follows that p = q.

To prove that r is unique, let r be the number of non-zero elements of F and let r′ be the number of non-zero elements of F′. By Theorem 4.8.15, F and F′ are congruent and hence equivalent. Thus, it follows from Theorem 4.3.16 that F and F′ must have the same rank, and therefore r = r′. This concludes the proof of the theorem. ∎
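Numerically, p and r can be read off from the eigenvalue signs of a symmetric matrix: orthogonal diagonalization is a congruence, so by Sylvester's theorem the counts of positive, negative, and zero eigenvalues coincide with p, r - p, and n - r. A sketch (Python with numpy; the helper name and the matrices are ours):

```python
import numpy as np

def inertia(F, tol=1e-9):
    """Return (p, r - p, n - r) for a real symmetric F: the counts of
    +1, -1, and 0 entries in its canonical form (4.8.21)."""
    w = np.linalg.eigvalsh(F)
    p = int(np.sum(w > tol))
    neg = int(np.sum(w < -tol))
    return p, neg, len(w) - p - neg

F = np.diag([5.0, 1.0, -2.0, 0.0])
P = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 2.0],
              [0.0, 0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])   # non-singular change of basis

print(inertia(F))              # (2, 1, 1)
print(inertia(P.T @ F @ P))    # (2, 1, 1): congruence preserves p and r
```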

Sylvester's theorem allows the following classification of symmetric bilinear functionals.

4.8.22. Definition. The integer r in Theorem 4.8.20 is called the rank of the symmetric bilinear functional f. The integer p is called the index of f. The integer n is called the order of f. The integer s = 2p - r (i.e., the number of +1's minus the number of -1's) is called the signature of f.

Since every real symmetric matrix is congruent to a unique matrix of the form (4.8.21), we define the index, order, and rank of a real symmetric matrix analogously as in Definition 4.8.22.

Now let us recall that a bilinear functional f on a vector space X is said to be positive if f(x, x) ≥ 0 for all x ∈ X. Also, a bilinear functional f is said to be strictly positive if f(x, x) > 0 for all x ≠ 0, x ∈ X (it should be noted that f(x, x) = 0 for x = 0). Our final result of the present section, which is a consequence of Theorem 4.8.20, enables us now to classify symmetric bilinear functionals.

4.8.23. Theorem. Let p, r, and n be defined as in Theorem 4.8.20. A symmetric bilinear functional f on a real n-dimensional vector space X is

(i) strictly positive if and only if p = n; and
(ii) positive if and only if p = r.

4.8.24. Exercise. Prove Theorem 4.8.23.

EUCIL DEAN

VECTOR SPACES

A. Euclidean Spaces: Definition and Properties

Among the various linear spaces which we will encounter, the so-called
Euclidean spaces are so important that we devote the next two sections to
them. These spaces will allow us to make many generalizations of facts established in plane geometry, and they will enable us to consider several
important special types of linear transformations. In order to characterize
these spaces properly, we must make use of two important notions, that of
the norm of a vector and that of the inner product of two vectors (refer to
Section 3.6). In the real plane, these concepts are related to the length of a
vector and to the angle between two vectors, respectively. Before considering
the matter at hand, some preliminary remarks are in order.
To begin with, we would like to point out that from a strictly logical
point of view Euclidean spaces should actually be treated at a later point of


our development. This is so because these spaces are specific examples of


metric spaces (to be treated in the next chapter), of normed spaces (to be dealt
with in Chapter 6), and of inner product spaces (also to be considered in
Chapter 6). However, there are several good reasons for considering Euclidean
spaces and their properties at this point. These include: Euclidean spaces are
so important in applications that the reader should be exposed to them as
early as possible; these spaces and their properties will provide the motivation
for subsequent topics treated in this book; and the material covered in the
present section and in the next section (dealing with linear transformations
defined on Euclidean spaces) constitutes a natural continuation and conclusion of the topics considered thus far in the present chapter.
In order to provide proper motivation for the present section, it is useful
to utilize certain facts from plane geometry to indicate the way. To this end
let us consider the space R² and let x = (ξ₁, ξ₂) and y = (η₁, η₂) be vectors in R². Let {u₁, u₂} be the natural basis for R². Then the natural coordinate representation of x and y is

x = (ξ₁, ξ₂)ᵀ and y = (η₁, η₂)ᵀ,   (4.9.1)

respectively (see Example 4.1.15). The representation of these vectors in the plane is shown in Figure D.

Figure D. Length of vectors and angle between vectors.

In this figure, |x|, |y|, and |x − y| denote the lengths of vectors x, y, and (x − y), respectively, and θ represents the angle between x and y. The length of vector x is equal to √(ξ₁² + ξ₂²), and the length of vector (x − y) is equal to √((ξ₁ − η₁)² + (ξ₂ − η₂)²). By convention,


we say in this case that "the distance from x to y" is equal to √((ξ₁ − η₁)² + (ξ₂ − η₂)²), that "the distance from the origin 0 (the null vector) to x" is equal to √(ξ₁² + ξ₂²), and the like. Using the notation of the present chapter, we have

|x| = √(xᵀx)   (4.9.3)

and

|x − y| = √((x − y)ᵀ(x − y)) = √((y − x)ᵀ(y − x)) = |y − x|.   (4.9.4)

The angle θ between vectors x and y can easily be characterized by its cosine, namely,

cos θ = (ξ₁η₁ + ξ₂η₂) / (√(ξ₁² + ξ₂²) √(η₁² + η₂²)).   (4.9.5)

Utilizing the notation of the present chapter, we have

cos θ = xᵀy / (√(xᵀx) √(yᵀy)).   (4.9.6)

It turns out that the real-valued function xᵀy, which we used in both Eqs. (4.9.3) and (4.9.6) to characterize the length of any vector x and the angle between any vectors x and y, is of fundamental importance. For this reason we denote it by a special symbol; i.e., we write

(x, y) ≜ xᵀy.   (4.9.7)

Now if we let x = y in Eq. (4.9.7), then in view of Eq. (4.9.3) we have

|x| = √(x, x).   (4.9.8)

By inspection of Eq. (4.9.3) we note that

(x, x) > 0 for all x ≠ 0   (4.9.9)

and

(x, x) = 0 for x = 0.   (4.9.10)

Also, from Eq. (4.9.7) we have

(x, y) = (y, x)   (4.9.11)

for all x and y. Moreover, for any vectors x, y, and z and for any real scalars α and β we have, in view of Eq. (4.9.7), the relations

(x + y, z) = (x, z) + (y, z),   (4.9.12)

(x, y + z) = (x, y) + (x, z),   (4.9.13)

(αx, y) = α(x, y),   (4.9.14)

(x, αy) = α(x, y).   (4.9.15)

In connection with Eq. (4.9.6) we can make several additional observations. First, we note that if x = y, then cos θ = 1; if x = −y, then cos θ = −1; if xᵀ = (ξ₁, 0) and yᵀ = (0, η₂), then cos θ = 0; etc. It is easily verified, using Eq. (4.9.6), that cos θ assumes all values between +1 and −1; i.e., −1 ≤ cos θ ≤ 1.
The above formulation agrees, of course, with our notions of length of a vector, distance between two vectors, and angle between two vectors. From Eqs. (4.9.9)–(4.9.15) it is also apparent that relation (4.9.7) satisfies all the axioms of an inner product (see Section 3.6).

Using the above discussion as motivation, let us now begin our treatment of Euclidean vector spaces.

First, we recall the definition of a real inner product: a bilinear functional f on a real vector space X is said to be an inner product on X if (i) f is symmetric and (ii) f is strictly positive. We also recall that a real vector space X on which an inner product is defined is called a real inner product space. We now have the following important

4.9.16. Definition. A real finite-dimensional vector space on which an inner product is defined is called a Euclidean space. A finite-dimensional vector space over the field of complex numbers on which an inner product is defined is called a unitary space.

We point out that some authors do not restrict Euclidean spaces to be


finite dimensional.
Although many of the results of unitary spaces are essentially identical
to those of Euclidean spaces, we postpone our treatment of complex inner
product spaces until Chapter 6, where we consider spaces that, in general,
may be infinite dimensional.
Throughout the remainder of the present section, X will denote an n-dimensional Euclidean space, unless otherwise specified. Since we will always be concerned with a given bilinear functional on X, we will henceforth write (x, y) in place of f(x, y) to denote the inner product of x and y. Finally, for purposes of completeness, we give a summary of the axioms of a real inner product. We have

(i) (x, x) > 0 for all x ≠ 0 and (x, x) = 0 if x = 0;
(ii) (x, y) = (y, x) for all x, y ∈ X;
(iii) (αx + βy, z) = α(x, z) + β(y, z) for all x, y, z ∈ X and all α, β ∈ R; and
(iv) (x, αy + βz) = α(x, y) + β(x, z) for all x, y ∈ X and all α, β ∈ R.

We note that Eqs. (4.9.9)–(4.9.15) are clearly in agreement with these axioms.

4.9.17. Theorem. The inner product (x, y) = 0 for all x ∈ X if and only if y = 0.


Proof. If y = 0, then y = 0·x and (x, 0) = (x, 0·x) = 0·(x, x) = 0 for all x ∈ X.

On the other hand, let (x, y) = 0 for all x ∈ X. Then, in particular, it must be true that (x, y) = 0 if x = y. We thus have (y, y) = 0, which implies that y = 0. ■
The reader can prove the next results readily.
4.9.18. Corollary. Let A ∈ L(X, X). Then (x, Ay) = 0 for all x, y ∈ X if and only if A = 0.

4.9.19. Corollary. Let A, B ∈ L(X, X). If (x, Ay) = (x, By) for all x, y ∈ X, then A = B.

4.9.20. Corollary. Let A be a real (n × n) matrix. If xᵀAy = 0 for all x, y ∈ Rⁿ, then A = 0.

4.9.21. Exercise. Prove Corollaries 4.9.18–4.9.20.

Of crucial importance is the notion of norm. We have:

4.9.22. Definition. For each x ∈ X, let

|x| = (x, x)^{1/2}.

We call |x| the norm of x.

Let us consider a specific case.

4.9.23. Example. Let X = Rⁿ and let x, y ∈ X, where x = (ξ₁, ..., ξₙ) and y = (η₁, ..., ηₙ). From Example 3.6.23 it follows that

(x, y) = Σᵢ₌₁ⁿ ξᵢηᵢ   (4.9.24)

is an inner product on X. The coordinate representation of x and y with respect to the natural basis in Rⁿ is given by x = (ξ₁, ..., ξₙ)ᵀ and y = (η₁, ..., ηₙ)ᵀ, respectively (see Example 4.1.15). We thus have

(x, y) = xᵀy,   (4.9.25)

and

|x| = (Σᵢ₌₁ⁿ ξᵢ²)^{1/2} = (xᵀx)^{1/2}.   (4.9.26)

The above example gives rise to:

4.9.27. Definition. The vector space Rⁿ with the inner product defined in Eq. (4.9.24) is denoted by Eⁿ. The norm of x given by Eq. (4.9.26) is called the Euclidean norm on Rⁿ.
Relation (4.9.29) of the next result is called the Schwarz inequality.

4.9.28. Theorem. Let x and y be any elements of X. Then

|(x, y)| ≤ |x| |y|,   (4.9.29)

where in Eq. (4.9.29) |(x, y)| denotes the absolute value of a real scalar and |x| denotes the norm of x.
Proof. For any x and y in X and for any real scalar α we have

(x + αy, x + αy) = (x, x) + α(x, y) + α(y, x) + α²(y, y) ≥ 0.

Now assume first that y ≠ 0, and let

α = −(x, y)/(y, y).

Then

(x + αy, x + αy) = (x, x) − 2(x, y)²/(y, y) + (x, y)²(y, y)/(y, y)² ≥ 0,

or

(x, x) − (x, y)²/(y, y) ≥ 0,

or

(x, x)(y, y) ≥ (x, y)².

Taking the square root of both sides, we have the desired inequality

|(x, y)| ≤ |x| |y|.

To complete the proof, consider the case y = 0. Then (x, y) = 0, |y| = 0, and in this case the inequality follows trivially. ■

4.9.30. Exercise. For x, y ∈ X, show that

|(x, y)| = |x| |y|

if and only if x and y are linearly dependent.
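As a numerical illustration (not part of the original text), the following NumPy sketch checks the Schwarz inequality (4.9.29) and its equality condition for the standard inner product on Rⁿ; the particular vectors and helper names are arbitrary.

```python
import numpy as np

inner = lambda x, y: float(x @ y)        # the standard inner product (x, y) = x^T y
norm  = lambda x: inner(x, x) ** 0.5     # |x| = (x, x)^{1/2}, Definition 4.9.22

x = np.array([1.0, 2.0, 3.0])
y = np.array([-2.0, 0.5, 4.0])

# Schwarz inequality: |(x, y)| <= |x| |y|
lhs = abs(inner(x, y))
rhs = norm(x) * norm(y)

# equality holds exactly when x and y are linearly dependent (Exercise 4.9.30)
z = -3.0 * x
gap = abs(abs(inner(x, z)) - norm(x) * norm(z))
```

Here `gap` is essentially zero because z is a scalar multiple of x, while `lhs` is strictly below `rhs` for the independent pair above.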


In the next result we establish the axioms of a norm.

4.9.31. Theorem. For all x and y in X and for all real scalars α, the following hold:

(i) |x| > 0 unless x = 0, in which case |x| = 0;
(ii) |αx| = |α| · |x|, where |α| denotes the absolute value of the scalar α; and
(iii) |x + y| ≤ |x| + |y|.


Proof. The proof of part (i) follows from the definition of an inner product.

To prove part (ii), we note that

|αx|² = (αx, αx) = α²(x, x) = α²|x|².

Taking the square root of both sides we have the desired relation

|αx| = |α| |x|.

To verify the last part of the theorem we note that

|x + y|² = (x + y, x + y) = (x, x) + 2(x, y) + (y, y) = |x|² + 2(x, y) + |y|².

Using the Schwarz inequality we obtain

|x + y|² ≤ |x|² + 2|x| |y| + |y|² = (|x| + |y|)².

Taking the square root of both sides we have

|x + y| ≤ |x| + |y|,

which is the desired result. ■


Part (iii) of Theorem 4.9.31 is called the triangle inequality. Part (ii) is called the homogeneous property of a norm. In Chapter 6 we will define functions on general vector spaces satisfying axioms (i), (ii), and (iii) of Theorem 4.9.31 without making use of inner products. In such cases we will speak of normed linear spaces (Euclidean spaces are examples of normed linear spaces).
Our next result is called the parallelogram law. Its meaning in the plane is
evident from Figure E.
4.9.32. Figure E. Interpretation of the parallelogram law.

4.9.33. Theorem. For all x, y ∈ X the equality

|x + y|² + |x − y|² = 2|x|² + 2|y|²

holds.

4.9.34. Exercise. Prove Theorem 4.9.33.
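The parallelogram law and the triangle inequality can be checked numerically; the sketch below (an illustration added here, using random vectors under the standard inner product) does so.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)
norm = lambda v: float(np.sqrt(v @ v))

# parallelogram law (Theorem 4.9.33): |x+y|^2 + |x-y|^2 = 2|x|^2 + 2|y|^2
lhs = norm(x + y) ** 2 + norm(x - y) ** 2
rhs = 2.0 * norm(x) ** 2 + 2.0 * norm(y) ** 2

# triangle inequality (Theorem 4.9.31(iii)): |x + y| <= |x| + |y|
triangle_ok = norm(x + y) <= norm(x) + norm(y) + 1e-12
```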

Generalizing Eq. (4.9.4), we define the distance between two vectors x and y of X as

ρ(x, y) = |x − y|.   (4.9.35)

It is not difficult for the reader to prove the next result.

4.9.36. Theorem. For all x, y, z ∈ X, the following hold:

(i) ρ(x, y) = ρ(y, x);
(ii) ρ(x, y) ≥ 0 and ρ(x, y) = 0 if and only if x = y; and
(iii) ρ(x, y) ≤ ρ(x, z) + ρ(z, y).

A function ρ(x, y) having properties (i), (ii), and (iii) of Theorem 4.9.36 is called a metric. Without making use of inner products, we will in Chapter 5 define such functions on non-empty sets (not necessarily linear spaces) and we will in such cases speak of metric spaces (Euclidean spaces are examples of metric spaces).

4.9.37. Exercise. Prove Theorem 4.9.36.

B. Orthogonal Bases

Following our discussion at the beginning of the present section further, we now recall the important concept of orthogonality, using inner products. In accordance with Definition 3.6.22, two vectors x, y ∈ X are said to be orthogonal (to one another) if (x, y) = 0. We recall that this is written as x ⊥ y. From the discussion at the beginning of this section it is clear that in the plane x ≠ 0 is orthogonal to y ≠ 0 if and only if the angle between x and y is some odd multiple of 90°.

The reader has undoubtedly encountered a special case of our next result, known as the Pythagorean theorem.

4.9.38. Theorem. Let x, y ∈ X. If x ⊥ y, then

|x + y|² = |x|² + |y|².

Proof. Since by assumption x ⊥ y, we have (x, y) = (y, x) = 0. Thus,

|x + y|² = (x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y) = |x|² + |y|²,

which is the desired result. ■

4.9.39. Definition. A vector x ∈ X is said to be a unit vector if |x| = 1.

Let us choose any vector y ≠ 0 and let z = (1/|y|)y. Then the norm of z is

|z| = |(1/|y|)y| = (1/|y|)|y| = 1;

i.e., z is a unit vector. This process is called normalizing the vector y.
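A brief numerical sketch (added for illustration; the vectors are arbitrary) of normalization and of the Pythagorean theorem under the standard inner product:

```python
import numpy as np

norm = lambda v: float(np.sqrt(v @ v))

# normalizing y: z = (1/|y|) y is a unit vector
y = np.array([3.0, 4.0])
z = y / norm(y)

# Pythagorean theorem (4.9.38): x ⊥ y implies |x + y|^2 = |x|^2 + |y|^2
x = np.array([-4.0, 3.0])       # (x, y) = -12 + 12 = 0, so x ⊥ y
lhs = norm(x + y) ** 2
rhs = norm(x) ** 2 + norm(y) ** 2
```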


Next, let {f₁, ..., fₙ} be an arbitrary basis for X and let F = [fᵢⱼ] denote the matrix of the inner product with respect to this basis; i.e., fᵢⱼ = (fᵢ, fⱼ) for all i and j. More specifically, F denotes the matrix of the bilinear functional f that is used in determining the inner product on X with respect to the indicated basis (see Definition 4.8.1). Let x and y denote the coordinate representation of x and y, respectively, with respect to {f₁, ..., fₙ}. Then we have, by Theorem 4.8.2,

(x, y) = xᵀFy = Σᵢ,ⱼ₌₁ⁿ fᵢⱼξᵢηⱼ = yᵀFx.

Now by Theorems 4.8.20 and 4.8.23, since the inner product is symmetric and strictly positive, there exists a basis {e₁, ..., eₙ} for X such that the matrix of the inner product with respect to this basis is the (n × n) identity matrix I; i.e.,

(eᵢ, eⱼ) = δᵢⱼ = 0 if i ≠ j, and 1 if i = j.

This motivates the following:


4.9.40. Definition. If {e₁, ..., eₙ} is a basis for X such that (eᵢ, eⱼ) = 0 for all i ≠ j, i.e., if eᵢ ⊥ eⱼ for all i ≠ j, then {e₁, ..., eₙ} is called an orthogonal basis. If in addition, (eᵢ, eᵢ) = 1, i.e., if |eᵢ| = 1 for all i, then {e₁, ..., eₙ} is said to be an orthonormal basis for X (thus, {e₁, ..., eₙ} is orthonormal if and only if (eᵢ, eⱼ) = δᵢⱼ).

Using the properties of inner products and the definitions of orthogonal and orthonormal bases, we are now in a position to establish several useful results.
4.9.41. Theorem. Let {e₁, ..., eₙ} be an orthonormal basis for X. Let x and y be arbitrary vectors in X, and let the coordinate representation of x and y with respect to this basis be xᵀ = (ξ₁, ..., ξₙ) and yᵀ = (η₁, ..., ηₙ), respectively. Then

(x, y) = xᵀy = ξ₁η₁ + ⋯ + ξₙηₙ   (4.9.42)

and

|x| = (xᵀx)^{1/2} = √(ξ₁² + ⋯ + ξₙ²).   (4.9.43)

Proof. From the above discussion we have

(x, y) = xᵀFy = Σᵢ,ⱼ₌₁ⁿ δᵢⱼξᵢηⱼ = Σᵢ₌₁ⁿ ξᵢηᵢ.

In particular, we have

(x, x) = Σᵢ₌₁ⁿ ξᵢ². ■


The reader should note that Eqs. (4.9.7) and (4.9.8) introduced at the beginning of this section are, of course, in agreement with Eqs. (4.9.42) and (4.9.43). (See also Example 4.9.23.)

Our next result enables us to determine the coordinates of a vector with respect to a given orthonormal basis.
4.9.44. Theorem. Let {e₁, ..., eₙ} be an orthonormal basis for X and let x be an arbitrary vector. The coordinates of x with respect to {e₁, ..., eₙ} are given by the formulas

ξᵢ = (x, eᵢ), i = 1, ..., n.

Proof. Since x = ξ₁e₁ + ⋯ + ξₙeₙ, we have

(x, e₁) = (ξ₁e₁ + ⋯ + ξₙeₙ, e₁) = ξ₁(e₁, e₁) + ⋯ + ξₙ(eₙ, e₁) = ξ₁.

Repeating this procedure for (x, eᵢ), i = 2, ..., n, yields the desired result. ■
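The coordinate formula of Theorem 4.9.44 is easy to verify numerically; the NumPy sketch below (illustrative only; the rotated basis is an arbitrary choice) recovers a vector from its inner products with an orthonormal basis.

```python
import numpy as np

t = 0.3
e1 = np.array([np.cos(t), np.sin(t)])     # an orthonormal basis of R^2
e2 = np.array([-np.sin(t), np.cos(t)])

x = np.array([2.0, -1.0])

# Theorem 4.9.44: the i-th coordinate of x is xi_i = (x, e_i)
xi1, xi2 = float(x @ e1), float(x @ e2)
x_rebuilt = xi1 * e1 + xi2 * e2
```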

Let us consider some specific cases.

4.9.45. Example. Let X = E² (see Definition 4.9.27). Let x, y ∈ E², where x = (ξ₁, ξ₂) and y = (η₁, η₂). The natural basis for E² is given by u₁ = (1, 0) and u₂ = (0, 1). Since (uᵢ, uⱼ) = δᵢⱼ, it follows that {u₁, u₂} is an orthonormal basis for E². Furthermore, we have

(x, y) = ξ₁η₁ + ξ₂η₂. ■
4.9.46. Example. Let X = R², and let the inner product on R² be defined by

(x, y) = ξ₁η₁ + 4ξ₂η₂.   (4.9.47)

(The reader may verify that this is indeed an inner product.) Let {u₁, u₂} denote the natural basis for R²; i.e., u₁ = (1, 0) and u₂ = (0, 1). The matrix representation of the bilinear functional which determines the above inner product with respect to the basis {u₁, u₂} is

F = [ 1  0
      0  4 ],

so that

(x, y) = xᵀFy,

where x and y are the coordinate vectors of x and y with respect to {u₁, u₂}. We see that (u₁, u₂) = 1·1·0 + 4·0·1 = 0; i.e., u₁ and u₂ are orthogonal with respect to the inner product (4.9.47). Note however that |u₁| = 1 and |u₂| = 2; i.e., the vectors u₁ and u₂ are not orthonormal.

Now let e₁ = (1, 0) and e₂ = (0, ½). Then it is readily verified that {e₁, e₂} is an orthonormal basis for X. Furthermore, for x = ξ′₁e₁ + ξ′₂e₂, we have

ξ′₁ = (x, e₁) and ξ′₂ = (x, e₂). If we let

x′ = (ξ′₁, ξ′₂)ᵀ and y′ = (η′₁, η′₂)ᵀ

denote the coordinate representation of x and y, respectively, with respect to {e₁, e₂}, then

(x, y) = (x′)ᵀy′.

This illustrates the fact that the norm of a vector must be interpreted with respect to the inner product used in determining the norm. ■
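Example 4.9.46 can be replayed numerically; the following sketch (added here for illustration) uses the matrix F of the inner product (4.9.47) and confirms that u₁, u₂ are orthogonal but not orthonormal, while e₁, e₂ are orthonormal.

```python
import numpy as np

F = np.array([[1.0, 0.0],
              [0.0, 4.0]])                  # matrix of the inner product (4.9.47)

inner = lambda x, y: float(x @ F @ y)       # (x, y) = x^T F y
norm  = lambda x: inner(x, x) ** 0.5

u1, u2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # natural basis
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 0.5])   # orthonormal for (4.9.47)
```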
Our next result allows us to represent vectors in X in a convenient way.
4.9.48. Theorem. Let {e₁, ..., eₙ} be an orthogonal basis for X. Then for all x ∈ X we have

x = ((x, e₁)/(e₁, e₁))e₁ + ⋯ + ((x, eₙ)/(eₙ, eₙ))eₙ.

Proof. Normalizing e₁, ..., eₙ, we obtain the orthonormal basis {e′₁, ..., e′ₙ}, where e′ᵢ = eᵢ/|eᵢ|, i = 1, ..., n. By Theorem 4.9.44 we have

x = (x, e′₁)e′₁ + ⋯ + (x, e′ₙ)e′ₙ
  = (x, e₁/|e₁|)(e₁/|e₁|) + ⋯ + (x, eₙ/|eₙ|)(eₙ/|eₙ|)
  = ((x, e₁)/|e₁|²)e₁ + ⋯ + ((x, eₙ)/|eₙ|²)eₙ
  = ((x, e₁)/(e₁, e₁))e₁ + ⋯ + ((x, eₙ)/(eₙ, eₙ))eₙ. ■

We are now in a position to characterize inner products by means of Parseval's identity, given in our next result.

4.9.49. Corollary. Let {e₁, ..., eₙ} be an orthogonal basis for X. Then for any x, y ∈ X we have

(x, y) = Σᵢ₌₁ⁿ (x, eᵢ)(y, eᵢ)/(eᵢ, eᵢ).

4.9.50. Exercise. Verify Corollary 4.9.49.
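The expansion of Theorem 4.9.48 and Parseval's identity can be checked together; the sketch below (an added illustration; the orthogonal basis of R³ is arbitrary) does so for the standard inner product.

```python
import numpy as np

# an orthogonal (not normalized) basis of R^3
e = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, -1.0, 0.0]),
     np.array([0.0, 0.0, 2.0])]

x = np.array([3.0, -1.0, 5.0])
y = np.array([0.5, 2.0, -1.0])
dot = lambda a, b: float(a @ b)

# Theorem 4.9.48: x = sum_i ((x, e_i)/(e_i, e_i)) e_i
x_rebuilt = sum(dot(x, ei) / dot(ei, ei) * ei for ei in e)

# Corollary 4.9.49 (Parseval): (x, y) = sum_i (x, e_i)(y, e_i)/(e_i, e_i)
parseval = sum(dot(x, ei) * dot(y, ei) / dot(ei, ei) for ei in e)
```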

Our next result establishes the linear independence of orthogonal vectors. We have:

4.9.51. Theorem. Suppose that x₁, ..., xₖ are mutually orthogonal non-zero vectors in X; i.e., xᵢ ⊥ xⱼ, i ≠ j. Then x₁, ..., xₖ are linearly independent.

Proof. Assume that for real scalars α₁, ..., αₖ we have

α₁x₁ + ⋯ + αₖxₖ = 0.

For arbitrary i = 1, ..., k, we have

0 = (0, xᵢ) = (α₁x₁ + ⋯ + αₖxₖ, xᵢ) = α₁(x₁, xᵢ) + ⋯ + αₖ(xₖ, xᵢ) = αᵢ(xᵢ, xᵢ);

i.e., αᵢ(xᵢ, xᵢ) = 0. This implies that αᵢ = 0 for arbitrary i, which proves the linear independence of x₁, ..., xₖ. ■

Note that the converse to the above theorem is not true. We leave the proofs of the next two results as an exercise.

4.9.52. Corollary. A set of k non-zero mutually orthogonal vectors is a basis for X if and only if k = dim X = n.

4.9.53. Corollary. For X there exist not more than n mutually orthonormal vectors. (In this case we speak of a complete orthonormal set of vectors.)

4.9.54. Exercise. Prove Corollaries 4.9.52 and 4.9.53.

Our next result, which is called the Gram–Schmidt process, allows us to construct an orthonormal basis from an arbitrary basis.

4.9.55. Theorem. Let {f₁, ..., fₙ} be an arbitrary basis for X. Set

g₁ = f₁,  e₁ = g₁/|g₁|,

g₂ = f₂ − (f₂, e₁)e₁,  e₂ = g₂/|g₂|,

. . .

gₙ = fₙ − Σⱼ₌₁ⁿ⁻¹ (fₙ, eⱼ)eⱼ,  eₙ = gₙ/|gₙ|.

Then {e₁, ..., eₙ} is an orthonormal basis for X.

4.9.56. Exercise. Prove Theorem 4.9.55. To accomplish this, show that (eᵢ, eⱼ) = 0 for i ≠ j, that |eᵢ| = 1 for i = 1, ..., n, and that {e₁, ..., eₙ} forms a basis for X.
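The recursion of Theorem 4.9.55 translates directly into code. The following NumPy sketch (an illustration added here; the basis of R³ is arbitrary) orthonormalizes a basis and checks the result.

```python
import numpy as np

def gram_schmidt(f):
    # Theorem 4.9.55: g_k = f_k - sum_j (f_k, e_j) e_j,  e_k = g_k / |g_k|
    e = []
    for fk in f:
        gk = fk - sum(float(fk @ ej) * ej for ej in e)
        e.append(gk / np.sqrt(gk @ gk))
    return e

f = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]
e = gram_schmidt(f)
E = np.column_stack(e)       # columns are the e_k; E^T E = I iff orthonormal
```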
The next result is a direct consequence of Theorem 4.9.55 and Theorem 3.3.44.

4.9.57. Corollary. If e₁, ..., eₖ, k < n, are mutually orthogonal non-zero vectors in X, then we can find a set of vectors e_{k+1}, ..., eₙ such that the set {e₁, ..., eₙ} forms a basis for X.

Our next result is known as the Bessel inequality.

4.9.58. Theorem. If {x₁, ..., xₖ} is an arbitrary set of mutually orthonormal vectors in X, then

Σᵢ₌₁ᵏ (x, xᵢ)² ≤ |x|²

for all x ∈ X. Moreover, the vector

x − Σᵢ₌₁ᵏ (x, xᵢ)xᵢ

is orthogonal to each xᵢ, i = 1, ..., k.

Proof. Let αᵢ = (x, xᵢ). We have

0 ≤ |x − Σᵢ₌₁ᵏ αᵢxᵢ|² = (x − Σᵢ₌₁ᵏ αᵢxᵢ, x − Σⱼ₌₁ᵏ αⱼxⱼ)
  = (x, x) − 2 Σᵢ₌₁ᵏ αᵢ(x, xᵢ) + Σᵢ₌₁ᵏ Σⱼ₌₁ᵏ αᵢαⱼ(xᵢ, xⱼ).

Now since the vectors x₁, ..., xₖ are mutually orthonormal, we have

0 ≤ (x, x) − 2 Σᵢ₌₁ᵏ αᵢ² + Σᵢ₌₁ᵏ αᵢ² = |x|² − Σᵢ₌₁ᵏ (x, xᵢ)²,

which proves the first part of the theorem.

To prove the second part, we note that for j = 1, ..., k,

(x − Σᵢ₌₁ᵏ (x, xᵢ)xᵢ, xⱼ) = (x, xⱼ) − Σᵢ₌₁ᵏ (x, xᵢ)(xᵢ, xⱼ) = (x, xⱼ) − (x, xⱼ) = 0. ■
In Theorem 4.9.58, let U denote the linear subspace of X which is spanned by the set of vectors {x₁, ..., xₖ}. Then clearly each vector y defined in this theorem is orthogonal to each vector of U; i.e., y ⊥ U (see Definition 3.6.22).
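As an added numerical illustration of the Bessel inequality and of the orthogonality of the residual (the particular orthonormal pair in R³ is arbitrary):

```python
import numpy as np

s = 1.0 / np.sqrt(2.0)
x1 = np.array([s, s, 0.0])      # mutually orthonormal, k = 2 < n = 3
x2 = np.array([0.0, 0.0, 1.0])

x = np.array([2.0, -3.0, 1.5])
coeffs = [float(x @ xi) for xi in (x1, x2)]

bessel_lhs = sum(c * c for c in coeffs)   # sum_i (x, x_i)^2
bessel_rhs = float(x @ x)                 # |x|^2

# the residual x - sum_i (x, x_i) x_i is orthogonal to each x_i
resid = x - sum(c * xi for c, xi in zip(coeffs, (x1, x2)))
```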
Let us next consider:

4.9.59. Theorem. Let Y be a linear subspace of X, and let

Y⊥ = {x ∈ X : (x, y) = 0 for all y ∈ Y}.   (4.9.60)

Then

(i) if {f₁, ..., fₖ} span Y, then x ∈ Y⊥ if and only if x ⊥ fⱼ for j = 1, ..., k;
(ii) Y⊥ is a linear subspace of X;
(iii) n = dim X = dim Y + dim Y⊥;
(iv) (Y⊥)⊥ = Y;
(v) X = Y ⊕ Y⊥; and
(vi) if x = x₁ + x₂ and y = y₁ + y₂, where x₁, y₁ ∈ Y and x₂, y₂ ∈ Y⊥, then

(x, y) = (x₁, y₁) + (x₂, y₂)

and

|x| = √(|x₁|² + |x₂|²).

Proof. To prove the first part, note that if x ∈ Y⊥, then x ⊥ f₁, ..., x ⊥ fₖ, since fᵢ ∈ Y for i = 1, ..., k. On the other hand, let x ⊥ fᵢ, i = 1, ..., k. Then for any y ∈ Y there exist scalars ηᵢ, i = 1, ..., k, such that y = η₁f₁ + ⋯ + ηₖfₖ. Hence,

(x, y) = (x, Σᵢ₌₁ᵏ ηᵢfᵢ) = Σᵢ₌₁ᵏ ηᵢ(x, fᵢ) = 0.

Thus, x ∈ Y⊥.

The remaining parts of the theorem are left as an exercise.


4.9.61. Exercise. Prove parts (ii) through (vi) of Theorem 4.9.59.
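Parts (ii) and (iii) of Theorem 4.9.59 can be illustrated concretely: with respect to the standard inner product, Y⊥ is the null space of the matrix whose rows span Y. The sketch below (added illustration; the spanning vectors are arbitrary, and the SVD is just one convenient way to obtain the null space) checks the dimension formula in R⁴.

```python
import numpy as np

# Y = span{f1, f2} in R^4; Y-perp = {x : (x, f1) = (x, f2) = 0},
# i.e. the null space of the 2 x 4 matrix with rows f1, f2
f1 = np.array([1.0, 0.0, 1.0, 0.0])
f2 = np.array([0.0, 1.0, 0.0, 1.0])
A = np.vstack([f1, f2])

_, _, Vt = np.linalg.svd(A)
r = np.linalg.matrix_rank(A)
Yperp = Vt[r:]                 # rows form an orthonormal basis of Y-perp

dim_Y, dim_Yperp = r, Yperp.shape[0]
```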

4.9.62. Definition. Let Y be a linear subspace of X. The subspace Y⊥ defined in Eq. (4.9.60) is called the orthogonal complement of Y.

Before closing the present section we state and prove the following important result.
4.9.63. Theorem. Let f be a linear functional on X. There exists a unique y ∈ X such that

f(x) = (x, y)   (4.9.64)

for all x ∈ X.

Proof. If f(x) = 0 for all x ∈ X, then y = 0 is the unique vector such that Eq. (4.9.64) is satisfied for all x ∈ X, by Theorem 4.9.17. So let us suppose that f(x) ≠ 0 for some x ∈ X, and let

Z = {x ∈ X : f(x) = 0}.

Then Z is a linear subspace of X. Let Z⊥ be the orthogonal complement of Z. Then it follows from Theorem 4.9.59 that X = Z ⊕ Z⊥. Furthermore, Z⊥ contains a non-zero vector. Let y₀ ∈ Z⊥ and, without loss of generality, let y₀ be chosen in such a fashion that |y₀| = 1. Now let y = f(y₀)y₀, and for any x ∈ X let x₀ = x − αy₀, where α = f(x)/f(y₀). Then f(x₀) = 0, and thus x₀ ∈ Z. We now have x = x₀ + αy₀, and

(x, y) = (x₀ + αy₀, f(y₀)y₀) = f(y₀)(x₀, y₀) + αf(y₀)(y₀, y₀) = αf(y₀) = f(x);

i.e., for all x ∈ X, f(x) = (x, y).

To show that y is unique, suppose that (x, y₁) = (x, y₂) for all x ∈ X. Then (x, y₁ − y₂) = 0 for all x ∈ X. But this implies that y₁ − y₂ = 0, or y₁ = y₂. This completes the proof of the theorem. ■
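In coordinates with respect to an orthonormal basis, the representing vector y of Theorem 4.9.63 has components f(eᵢ), since ηᵢ = (y, eᵢ) = f(eᵢ). The sketch below (added illustration; the functional is a hypothetical example) recovers y this way.

```python
import numpy as np

# a hypothetical linear functional on E^3
f = lambda x: 2.0 * x[0] - x[1] + 0.5 * x[2]

# coordinates of the representer y with respect to the natural (orthonormal) basis
basis = np.eye(3)
y = np.array([f(e) for e in basis])

x = np.array([1.0, 4.0, -2.0])
fx = f(x)
ip = float(x @ y)            # (x, y) should equal f(x)
```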

4.10. LINEAR TRANSFORMATIONS ON EUCLIDEAN VECTOR SPACES

A. Orthogonal Transformations

In the present section we concern ourselves with special types of linear transformations defined on Euclidean vector spaces. We will have occasion to reconsider similar types of transformations again in Chapter 7, in a much more general setting. Unless otherwise specified, X will denote an n-dimensional Euclidean vector space throughout the present section.

The first special type of linear transformation defined on Euclidean vector spaces which we consider is the so-called "orthogonal transformation." Let {e₁, ..., eₙ} be an orthonormal basis for X, let

e′ᵢ = Σⱼ₌₁ⁿ pⱼᵢeⱼ, i = 1, ..., n,

and let P denote the matrix determined by the real scalars pⱼᵢ. The following question arises: when is the set {e′₁, ..., e′ₙ} also an orthonormal basis for X?
To determine the desired properties of P, we consider

(e′ᵢ, e′ⱼ) = (Σₖ₌₁ⁿ pₖᵢeₖ, Σₗ₌₁ⁿ pₗⱼeₗ) = Σₖ₌₁ⁿ Σₗ₌₁ⁿ pₖᵢpₗⱼ(eₖ, eₗ) = Σₖ₌₁ⁿ pₖᵢpₖⱼ.

In order that (e′ᵢ, e′ⱼ) = 0 for i ≠ j and (e′ᵢ, e′ⱼ) = 1 for i = j, we require that

(e′ᵢ, e′ⱼ) = Σₖ₌₁ⁿ pₖᵢpₖⱼ = δᵢⱼ;

i.e., we require that

PᵀP = I,

where, as usual, I denotes the n × n identity matrix. We summarize.

4.10.1. Theorem. Let {e₁, ..., eₙ} be an orthonormal basis for X. Let e′ᵢ = Σⱼ₌₁ⁿ pⱼᵢeⱼ, i = 1, ..., n. Then {e′₁, ..., e′ₙ} is an orthonormal basis for X if and only if Pᵀ = P⁻¹.

This result gives rise to the following:

4.10.2. Definition. A matrix P such that Pᵀ = P⁻¹, i.e., such that PᵀP = P⁻¹P = I, is called an orthogonal matrix.

4.10.3. Exercise. Show that if P is an orthogonal matrix, then either det P = 1 or det P = −1. Also, show that if P and Q are (n × n) orthogonal matrices, then so is PQ.
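The claims of Exercise 4.10.3 are easy to spot-check numerically; the sketch below (added illustration; the specific matrices are arbitrary) verifies PᵀP = I, det P = ±1, and closure under products.

```python
import numpy as np

t = 0.7
P = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])   # orthogonal: P^T P = I
Q = np.array([[1.0, 0.0],
              [0.0, -1.0]])               # orthogonal as well

I2 = np.eye(2)
PQ = P @ Q                                # product of orthogonal matrices
```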
The nomenclature used in our next definition will become clear shortly.


4.10.4. Definition. A linear transformation A from X into X is called an orthogonal linear transformation if (Ax, Ay) = (x, y) for all x, y ∈ X.
Let us now establish some of the properties of orthogonal transformations.
4.10.5. Theorem. Let A ∈ L(X, X). Then A is orthogonal if and only if |Ax| = |x| for all x ∈ X.

Proof. If A is orthogonal, then (Ax, Ax) = (x, x) and |Ax| = |x|. Conversely, if |Ax| = |x| for all x ∈ X, then

|A(x + y)|² = (A(x + y), A(x + y)) = (Ax + Ay, Ax + Ay) = |Ax|² + 2(Ax, Ay) + |Ay|²
            = |x|² + 2(Ax, Ay) + |y|².

Also,

|A(x + y)|² = |x + y|² = (x + y, x + y) = |x|² + 2(x, y) + |y|²,

and therefore

(Ax, Ay) = (x, y)

for all x, y ∈ X. ■
We note that if A is an orthogonal linear transformation, then x ⊥ y for all x, y ∈ X if and only if Ax ⊥ Ay. For (x, y) = 0 if and only if (Ax, Ay) = 0.

4.10.6. Corollary. Every orthogonal linear transformation of X into X is non-singular.

Proof. Let Ax = 0. Then |Ax| = |x| = 0. Thus, x = 0 and A is non-singular. ■

Our next result establishes the link between Definitions 4.10.2 and 4.10.4.

4.10.7. Theorem. Let {e₁, ..., eₙ} be an orthonormal basis for X. Let A ∈ L(X, X), and let A be the matrix of A with respect to this basis. Then A is orthogonal if and only if A is orthogonal.

Proof. Let x and y be arbitrary vectors in X, and let x and y denote their coordinate representations, respectively, with respect to the basis {e₁, ..., eₙ}. Then Ax and Ay denote the coordinate representations of Ax and Ay, respectively, with respect to this basis. Now,
(Ax, Ay) = (Ax)ᵀ(Ay) = xᵀAᵀAy,

and

(x, y) = xᵀy.

Now suppose that A is orthogonal. Then AᵀA = I and (Ax, Ay) = xᵀy = (x, y) for all x, y ∈ X. On the other hand, if A is orthogonal, then (Ax, Ay) = xᵀAᵀAy = xᵀy = (x, y) for all x, y ∈ X. Thus, xᵀ[(AᵀA − I)]y = 0. Since this holds for all x, y ∈ X, we conclude from Corollary 4.9.20 that AᵀA − I = 0; i.e., AᵀA = I. ■

The next two results are left as an exercise.

4.10.8. Corollary. Let A ∈ L(X, X). If A is orthogonal, then det A = ±1.

4.10.9. Corollary. Let A, B ∈ L(X, X). If A and B are orthogonal transformations, then AB is also an orthogonal linear transformation.

4.10.10. Exercise. Prove Corollaries 4.10.8 and 4.10.9.

For reasons that will become apparent later, we introduce the following convention.

4.10.11. Definition. Let A ∈ L(X, X) be an orthogonal linear transformation. If det A = +1, then A is called a rotation. If det A = −1, then A is called a reflection.
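The distinction between rotations and reflections, and the norm-preservation of Theorem 4.10.5, can be seen in the plane; the sketch below (an added illustration with arbitrary sample vectors) checks both.

```python
import numpy as np

t = np.pi / 6
A_rot = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])   # det = +1: a rotation
A_ref = np.array([[0.0, 1.0],
                  [1.0, 0.0]])                # det = -1: a reflection

x = np.array([3.0, -2.0])
y = np.array([1.0, 5.0])

# both preserve inner products, hence lengths and angles
ip_gap  = abs(float((A_rot @ x) @ (A_rot @ y)) - float(x @ y))
len_gap = abs(np.linalg.norm(A_ref @ x) - np.linalg.norm(x))
```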
B. Adjoint Transformations

The next important class of linear transformations on Euclidean spaces which we consider are the so-called adjoint linear transformations. Our next result enables us to introduce such transformations in a natural way.

4.10.12. Theorem. Let G ∈ L(X, X) and define g: X × X → R by g(x, y) = (x, Gy) for all x, y ∈ X. Then g is a bilinear functional on X. Moreover, if {e₁, ..., eₙ} is an orthonormal basis for X, then the matrix of g with respect to this basis, denoted by G, is the matrix of G with respect to {e₁, ..., eₙ}. Conversely, given an arbitrary bilinear functional g defined on X, there exists a unique linear transformation G ∈ L(X, X) such that (x, Gy) = g(x, y) for all x, y ∈ X.

Proof. Let G ∈ L(X, X), and let g(x, y) = (x, Gy). Then

g(x₁ + x₂, y) = (x₁ + x₂, Gy) = (x₁, Gy) + (x₂, Gy) = g(x₁, y) + g(x₂, y).

Also,

g(x, y₁ + y₂) = (x, G(y₁ + y₂)) = (x, Gy₁ + Gy₂) = (x, Gy₁) + (x, Gy₂) = g(x, y₁) + g(x, y₂).

Furthermore,

g(αx, y) = (αx, Gy) = α(x, Gy) = αg(x, y)

and

g(x, αy) = (x, G(αy)) = (x, αGy) = α(x, Gy) = αg(x, y),

where α is a real scalar. Therefore, g is a bilinear functional.


Next, let e{ ., ... ,e.} be an orthonormal basis for .X Then the matrix
G of g with respect to this basis is determined by the elements g/j = g(e l, eJ).
Now let G' = g[ ;J] be the matrix of G with respect to {e., . .. ,e.}. Then
Ge J

k=.

g~Jek

for j =

I, ...

,n.

Hence,

(e lt Ge) =

(e k=t.
l,

g~)ek)
=

g;j.

Since glJ = g(e l , eJ ) = (e lt Ge J ) = g;J' it follows that G' = G; eL ., G is the


matrix ofG.
To prove the last part of the theorem, choose any orthonormal basis
e[ ., ... ,e.} for .X Given a bilinear functional g defined on ,X let G = g[ lj]
denote its matrix with respect to this basis, and let G be the linear transformation corresponding to G. Then (x, Gy) = g(x, y) by the identical argument
given above. Finally, since the matrix of the bilinear functional and the matrix
of the linear transformation were determined independently, this correspondence is unique.
_
It should be noted that the correspondence between bilinear functionals and linear transformations determined by the relation (x, Gy) = g(x, y) for all x, y ∈ X does not depend on the particular basis chosen for X; however, it does depend on the way the inner product is chosen for X at the outset.

Now let G ∈ L(X, X), set g(x, y) = (x, Gy), and let h(x, y) = g(y, x) = (y, Gx) = (Gx, y). By Theorem 4.10.12, there exists a unique linear transformation, denote it by G*, such that h(x, y) = (x, G*y) for all x, y ∈ X. We call the linear transformation G* ∈ L(X, X) the adjoint of G.

4.10.13. Theorem.

(i) For each G ∈ L(X, X), there is a unique G* ∈ L(X, X) such that (x, G*y) = (Gx, y) for all x, y ∈ X.
(ii) Let {e₁, ..., eₙ} be an orthonormal basis for X, and let G be the matrix of the linear transformation G ∈ L(X, X) with respect to this basis. Let G* be the matrix of G* with respect to {e₁, ..., eₙ}. Then G* = Gᵀ.

Proof. The proof of the first part follows from the discussion preceding the present theorem.

To prove the second part, let {e₁, ..., eₙ} be an orthonormal basis for X, and let G* denote the matrix of G* with respect to this basis. Let x and y be the coordinate representations of x and y, respectively, with respect to this basis. Then

(x, G*y) = xᵀG*y

and

(Gx, y) = (Gx)ᵀy = xᵀGᵀy.

Thus, for all x and y we have xᵀ(G* − Gᵀ)y = 0. Hence, G* = Gᵀ. ■

The above result allows the following equivalent definition of the adjoint linear transformation.

4.10.14. Definition. Let G ∈ L(X, X). The adjoint transformation, G* ∈ L(X, X), is defined by the formula

(x, G*y) = (Gx, y)

for all x, y ∈ X.
Although there is obviously great similarity between the adjoint linear transformation and the transpose of a linear transformation, it should be noted that these two transformations constitute different concepts. The differences will become more apparent in our subsequent discussion of linear transformations defined on complex vector spaces in Chapter 7.

Our next result includes some of the elementary properties of the adjoint of linear transformations. The reader should compare these with the properties of the transpose of linear transformations.
4.10.15. Theorem. Let A, B ∈ L(X, X), let A*, B* denote their respective adjoints, and let α be a real scalar. Then

(i) (A*)* = A;
(ii) (A + B)* = A* + B*;
(iii) (αA)* = αA*;
(iv) (AB)* = B*A*;
(v) I* = I, where I denotes the identity transformation;
(vi) 0* = 0, where 0 denotes the null transformation;
(vii) A is non-singular if and only if A* is non-singular; and
(viii) if A is non-singular, then (A*)⁻¹ = (A⁻¹)*.

4.10.16. Exercise. Prove Theorem 4.10.15.
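On a real inner product space with an orthonormal basis, Theorem 4.10.13 lets the adjoint be computed as a matrix transpose, so the properties above can be spot-checked numerically. The following is a minimal sketch added here for illustration (it assumes the matrices represent the transformations with respect to an orthonormal basis):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# With respect to an orthonormal basis, the adjoint matrix is the transpose.
A_star = A.T

# (i)   (A*)* = A
assert np.allclose(A_star.T, A)
# (iv)  (AB)* = B*A*
assert np.allclose((A @ B).T, B.T @ A.T)
# (viii) if A is non-singular, (A*)^{-1} = (A^{-1})*
assert np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)

# The defining relation (x, A*y) = (Ax, y) for random x, y:
x = rng.standard_normal(n)
y = rng.standard_normal(n)
assert np.isclose(x @ (A_star @ y), (A @ x) @ y)
```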

Our next result enables us to characterize orthogonal transformations in terms of their adjoints.

4.10.17. Theorem. Let A ∈ L(X, X). Then A is orthogonal if and only if A* = A⁻¹.

Proof. We have (Ax, Ay) = (A*Ax, y). But A is orthogonal if and only if

(Ax, Ay) = (x, y)

for all x, y ∈ X. Therefore,

(A*Ax, y) = (x, y)

for all x and y. From this it follows that A*A = I, which implies that A* = A⁻¹. ∎

4.10. Linear Transformations on Euclidean Vector Spaces

The proof of the next theorem is left as an exercise.

4.10.18. Theorem. Let A ∈ L(X, X). Then A is orthogonal if and only if A⁻¹ is orthogonal, and A⁻¹ is orthogonal if and only if A* is orthogonal.

4.10.19. Exercise. Prove Theorem 4.10.18.

C. Self-Adjoint Transformations

Using adjoints, we now introduce two additional important types of linear transformations.

4.10.20. Definition. Let A ∈ L(X, X). Then A is said to be self-adjoint if A* = A, and it is said to be skew-adjoint if A* = −A.
Some of the properties of such transformations are as follows.

4.10.21. Theorem. Let A ∈ L(X, X), let {e₁, ..., eₙ} be an orthonormal basis for X, and let A be the matrix of A with respect to this basis. The following are equivalent:

(i) A is self-adjoint;
(ii) A is symmetric; and
(iii) (Ax, y) = (x, Ay) for all x, y ∈ X.

4.10.22. Theorem. Let A ∈ L(X, X), and let {e₁, ..., eₙ} be an orthonormal basis for X. Let A be the matrix of A with respect to this basis. The following are equivalent:

(i) A is skew-adjoint;
(ii) A is skew-symmetric (see Definition 4.8.8); and
(iii) (Ax, y) = −(x, Ay) for all x, y ∈ X.

4.10.23. Exercise. Prove Theorems 4.10.21 and 4.10.22.

The following corollary follows from part (iii) of Theorem 4.10.22.

4.10.24. Corollary. Let A be as defined in Theorem 4.10.22. Then the following are equivalent:

(i) A is skew-symmetric;
(ii) (x, Ax) = 0 for all x ∈ X; and
(iii) Ax ⊥ x for all x ∈ X.

Our next result enables us to represent arbitrary linear transformations as the sum of self-adjoint and skew-adjoint transformations.

4.10.25. Corollary. Let A ∈ L(X, X). Then there exist unique A₁, A₂ ∈ L(X, X) such that A = A₁ + A₂, where A₁ is self-adjoint and A₂ is skew-adjoint.

4.10.26. Exercise. Prove Corollaries 4.10.24 and 4.10.25.

4.10.27. Exercise. Show that every real n × n matrix can be written in one and only one way as the sum of a symmetric and a skew-symmetric matrix.
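For matrices the unique decomposition of Corollary 4.10.25 is explicit: A₁ = ½(A + Aᵀ) and A₂ = ½(A − Aᵀ). A brief numerical sketch, added here for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))

# Unique decomposition A = A1 + A2 with A1 symmetric, A2 skew-symmetric.
A1 = (A + A.T) / 2
A2 = (A - A.T) / 2

assert np.allclose(A1, A1.T)     # A1 is symmetric (self-adjoint)
assert np.allclose(A2, -A2.T)    # A2 is skew-symmetric (skew-adjoint)
assert np.allclose(A1 + A2, A)   # the two parts sum back to A
```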
Our next result is applicable to real as well as complex vector spaces.

4.10.28. Theorem. Let X be a complex vector space. Then the eigenvalues of a real symmetric matrix A are all real. (If all eigenvalues of A are positive (negative), then A is called positive (negative) definite.)

Proof. Let λ = r + is denote an eigenvalue of A, where r and s are real numbers and where i = √−1. We must show that s = 0.
Since λ is an eigenvalue we know that the matrix (A − λI) is singular. So is the matrix

B = [A − (r + is)I][A − (r − is)I]
  = A² − (r − is)A − (r + is)A + (r + is)(r − is)I
  = A² − 2rA + (r² + s²)I = (A − rI)² + s²I.

Since B is singular, there exists an x ≠ 0 such that Bx = 0. Also,

0 = xᵀBx = xᵀ[(A − rI)² + s²I]x = xᵀ(A − rI)²x + s²xᵀx.

Since A and I are symmetric,

(A − rI)ᵀ = Aᵀ − rIᵀ = A − rI.

Therefore,

0 = yᵀy + s²xᵀx,

where y = (A − rI)x. Now yᵀy ≥ 0 and xᵀx > 0, because by assumption x ≠ 0. Thus, we have

0 = yᵀy + s²xᵀx ≥ s²xᵀx.

The only way that this last relation can hold is if s = 0. Therefore, λ = r, and λ is real. ∎
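Theorem 4.10.28 can be observed numerically: a general-purpose eigensolver returns complex eigenvalues, but for a real symmetric matrix the imaginary parts vanish (up to roundoff). A small sketch, added here for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2             # a real symmetric matrix

eigs = np.linalg.eigvals(A)   # general eigensolver; returns complex values
# Theorem 4.10.28: every eigenvalue has zero imaginary part, i.e. s = 0.
assert np.allclose(eigs.imag, 0.0)
```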

Now let A be the matrix of the linear transformation A ∈ L(X, X) with respect to some basis. If A is symmetric, then all its eigenvalues are real. In this case A is self-adjoint and all its eigenvalues are also real; in fact, the eigenvalues of A and A are identical. Thus, there exist unique real scalars λ₁, ..., λₚ, p ≤ n, such that

det (A − λI) = (λ₁ − λ)^{m₁}(λ₂ − λ)^{m₂} ··· (λₚ − λ)^{mₚ}.   (4.10.29)

We summarize these observations in the following:

4.10.30. Corollary. Let A ∈ L(X, X). If A is self-adjoint, then all eigenvalues of A are real and there exist unique real numbers λ₁, ..., λₚ, p ≤ n, such that Eq. (4.10.29) holds.

As in Section 4.5, we say that in Corollary 4.10.30 the eigenvalues λᵢ, i = 1, ..., p ≤ n, have algebraic multiplicities mᵢ, i = 1, ..., p, respectively.
Another direct consequence of Theorem 4.10.28 is the following result.

4.10.31. Corollary. Let A ∈ L(X, X). If A is self-adjoint, then A has at least one eigenvalue.

4.10.32. Exercise. Prove Corollary 4.10.31.

Let us now examine some of the properties of the eigenvalues and eigenvectors of self-adjoint linear transformations. First, we have:

4.10.33. Theorem. Let A ∈ L(X, X) be a self-adjoint transformation, and let λ₁, ..., λₚ, p ≤ n, denote the distinct eigenvalues of A. If xᵢ is an eigenvector for λᵢ, and if xⱼ is an eigenvector for λⱼ, then xᵢ ⊥ xⱼ for all i ≠ j.

Proof. Assume that λᵢ ≠ λⱼ, and consider Axᵢ = λᵢxᵢ and Axⱼ = λⱼxⱼ, where xᵢ ≠ 0 and xⱼ ≠ 0. We have

λᵢ(xᵢ, xⱼ) = (λᵢxᵢ, xⱼ) = (Axᵢ, xⱼ) = (xᵢ, Axⱼ) = (xᵢ, λⱼxⱼ) = λⱼ(xᵢ, xⱼ).

Thus,

(λᵢ − λⱼ)(xᵢ, xⱼ) = 0.

Since λᵢ ≠ λⱼ, we have (xᵢ, xⱼ) = 0, which means xᵢ ⊥ xⱼ. ∎

Now let A ∈ L(X, X), and let λᵢ be an eigenvalue of A. Recall that 𝔑ᵢ denotes the null space of the linear transformation A − λᵢI, i.e.,

𝔑ᵢ = {x ∈ X : (A − λᵢI)x = 0}.   (4.10.34)

Recall also that 𝔑ᵢ is a linear subspace of X. From Theorem 4.10.33 we now have immediately:

4.10.35. Corollary. Let A ∈ L(X, X) be a self-adjoint transformation, and let λᵢ and λⱼ be eigenvalues of A. If λᵢ ≠ λⱼ, then 𝔑ᵢ ⊥ 𝔑ⱼ.

4.10.36. Exercise. Prove Corollary 4.10.35.

Making use of Theorem 4.9.59, we now prove the following important result.

4.10.37. Theorem. Let A ∈ L(X, X) be a self-adjoint transformation, and let λ₁, ..., λₚ, p ≤ n, denote the distinct eigenvalues of A. Then

dim X = n = dim 𝔑₁ + dim 𝔑₂ + ··· + dim 𝔑ₚ.

Proof. Let dim 𝔑₁ = n₁, and let {e₁, ..., e_{n₁}} be an orthonormal basis for 𝔑₁. Next, let {e_{n₁+1}, ..., e_{n₁+n₂}} be an orthonormal basis for 𝔑₂. We continue in this manner, finally letting {e_{n₁+···+n_{p−1}+1}, ..., e_{n₁+···+nₚ}} be an orthonormal basis for 𝔑ₚ. Let n₁ + ··· + nₚ = m. Since 𝔑ᵢ ⊥ 𝔑ⱼ, i ≠ j, it follows that the vectors e₁, ..., eₘ, relabeled in an obvious way, are orthonormal in X. We can conclude, by Corollary 4.9.52, that these vectors are a basis for X, if we can prove that m = n.
Let Y be the linear subspace of X generated by the orthonormal vectors e₁, ..., eₘ. Then {e₁, ..., eₘ} is an orthonormal basis for Y and dim Y = m. Since dim Y + dim Y⊥ = dim X = n (see Theorem 4.9.59), we need only prove that dim Y⊥ = 0. To this end let x be an arbitrary vector in Y⊥. Then (x, e₁) = 0, ..., (x, eₘ) = 0; i.e., x ⊥ e₁, ..., x ⊥ eₘ, by Theorem 4.9.59. So, in particular, again by Theorem 4.9.59, we have x ⊥ 𝔑ᵢ, i = 1, ..., p. Now let y be in 𝔑ᵢ. Then

(Ax, y) = (x, Ay) = (x, λᵢy) = λᵢ(x, y) = 0,

since A is self-adjoint, since y is in 𝔑ᵢ, and since x ⊥ 𝔑ᵢ. Thus, Ax ⊥ 𝔑ᵢ for i = 1, ..., p, and again by Theorem 4.9.59, Ax ⊥ eᵢ, i = 1, ..., m. Thus, by Theorem 4.9.59, Ax ⊥ Y. Therefore, for each x ∈ Y⊥ we also have Ax ∈ Y⊥. Hence, A induces a linear transformation, say A′, from Y⊥ into Y⊥, where A′x = Ax for all x ∈ Y⊥. Now A′ is a self-adjoint linear transformation from Y⊥ into Y⊥, because for all x and y in Y⊥ we have

(A′x, y) = (Ax, y) = (x, Ay) = (x, A′y).

Assume now that dim Y⊥ > 0. Then by Corollary 4.10.31, A′ has an eigenvalue, say λ₀, and a corresponding eigenvector x₀ ≠ 0. Thus, x₀ ≠ 0 is in Y⊥ and A′x₀ = Ax₀ = λ₀x₀; i.e., λ₀ is also an eigenvalue of A, say λ₀ = λᵢ. So now it follows that x₀ ∈ 𝔑ᵢ. But from above, x₀ ∈ Y⊥, which means x₀ ⊥ 𝔑ᵢ. This implies that x₀ ⊥ x₀, or (x₀, x₀) = 0, which in turn implies that x₀ = 0. But this contradicts our earlier assumption that x₀ ≠ 0. Hence, we have arrived at a contradiction, and it therefore follows that dim Y⊥ = 0. This proves the theorem. ∎

Our next result is a direct consequence of Theorem 4.10.37.

4.10.38. Corollary. Let A ∈ L(X, X). If A is self-adjoint, then

(i) there exists an orthonormal basis in X such that the matrix of A with respect to this basis is diagonal; and
(ii) for each eigenvalue λᵢ of A we have dim 𝔑ᵢ = multiplicity of λᵢ.

Proof. As in the proof of Theorem 4.10.37 we choose an orthonormal basis {e₁, ..., eₘ}, where m = n. We have Ae₁ = λ₁e₁, ..., Ae_{n₁} = λ₁e_{n₁}, Ae_{n₁+1} = λ₂e_{n₁+1}, ..., Ae_{n₁+···+nₚ} = λₚe_{n₁+···+nₚ}. Thus, the matrix A of A with respect to {e₁, ..., eₙ} is the diagonal matrix

A = diag(λ₁, ..., λ₁, λ₂, ..., λ₂, ..., λₚ, ..., λₚ),

in which each λᵢ is repeated nᵢ times.
To prove the second part, we note that the characteristic polynomial of A is

det (A − λI) = (λ₁ − λ)^{n₁}(λ₂ − λ)^{n₂} ··· (λₚ − λ)^{nₚ},

and, hence, nᵢ = dim 𝔑ᵢ = multiplicity of λᵢ, i = 1, ..., p. ∎

Another consequence of Theorem 4.10.37 is the following:

4.10.39. Corollary. Let A be a real (n × n) symmetric matrix. Then there exists an orthogonal matrix P such that the matrix A′ defined by

A′ = P⁻¹AP = PᵀAP

is diagonal.

4.10.40. Exercise. Prove Corollary 4.10.39.
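Corollary 4.10.39 is exactly what `numpy.linalg.eigh` produces for a real symmetric matrix: its eigenvector matrix P is orthogonal, and PᵀAP is diagonal. A brief sketch, added here for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                 # real symmetric

# eigh returns the eigenvalues and an orthogonal matrix P of eigenvectors.
lams, P = np.linalg.eigh(A)

assert np.allclose(P.T @ P, np.eye(5))      # P is orthogonal: P^{-1} = P^T
A_prime = P.T @ A @ P
assert np.allclose(A_prime, np.diag(lams))  # P^T A P is diagonal
```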

For symmetric bilinear functionals defined on Euclidean vector spaces we have the following result.

4.10.41. Corollary. Let f(x, y) be a symmetric bilinear functional on X. Then there exists an orthonormal basis for X such that the matrix of f with respect to this basis is diagonal.

Proof. By Theorem 4.10.12 there exists an F ∈ L(X, X) such that f(x, y) = (x, Fy) for all x, y ∈ X. Since f is symmetric, (x, Fy) = f(x, y) = f(y, x) = (y, Fx) = (Fx, y) for all x, y ∈ X, and thus, by Theorem 4.10.21, F is self-adjoint. Hence, by Corollary 4.10.38, there is an orthonormal basis for X such that the matrix of F is diagonal. By Theorem 4.10.12, this matrix is also the representation of f with respect to the same basis. ∎

The proof of the next result is left as an exercise.

4.10.42. Corollary. Let f̂(x) be a quadratic form defined on X. Then there exists an orthonormal basis for X such that if xᵀ = (ξ₁, ..., ξₙ) is the coordinate representation of x with respect to this basis, then f̂(x) = α₁ξ₁² + ··· + αₙξₙ² for some real scalars α₁, ..., αₙ.

4.10.43. Exercise. Prove Corollary 4.10.42.
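Corollary 4.10.42 is the principal-axes reduction of a quadratic form: in the orthonormal eigenbasis, xᵀAx becomes a weighted sum of squared coordinates. A short numerical sketch, added here for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2               # matrix of the quadratic form f(x) = x^T A x
alphas, P = np.linalg.eigh(A)   # the scalars alpha_i and the orthonormal basis

x = rng.standard_normal(3)
xi = P.T @ x                    # coordinates of x in the new basis
# f(x) = alpha_1 xi_1^2 + ... + alpha_n xi_n^2
assert np.isclose(x @ A @ x, np.sum(alphas * xi**2))
```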

Next, we state and prove the spectral theorem for self-adjoint linear transformations. First, we recall that a transformation P ∈ L(X, X) is a projection on a linear subspace of X if and only if P² = P (see Theorem 3.7.4). Also, for any projection P, X = ℜ(P) ⊕ 𝔑(P), where ℜ(P) is the range of P and 𝔑(P) is the null space of P (see Eq. (3.7.8)). Furthermore, recall that a projection P is called an orthogonal projection if ℜ(P) ⊥ 𝔑(P) (see Definition 3.7.16).

4.10.44. Theorem. Let A ∈ L(X, X) be a self-adjoint transformation, let λ₁, ..., λₚ denote the distinct eigenvalues of A, and let 𝔑ᵢ be the null space of A − λᵢI (see Eq. (4.10.34)). For each i = 1, ..., p, let Pᵢ denote the projection on 𝔑ᵢ along 𝔑ᵢ⊥. Then

(i) Pᵢ is an orthogonal projection for each i = 1, ..., p;
(ii) PᵢPⱼ = 0 for i ≠ j, i, j = 1, ..., p;
(iii) P₁ + ··· + Pₚ = I, where I ∈ L(X, X) denotes the identity transformation; and
(iv) A = λ₁P₁ + ··· + λₚPₚ.

Proof. To prove the first part, note that X = 𝔑ᵢ ⊕ 𝔑ᵢ⊥, i = 1, ..., p, by Theorem 4.9.59. Thus, by Theorem 3.7.3, ℜ(Pᵢ) = 𝔑ᵢ and 𝔑(Pᵢ) = 𝔑ᵢ⊥, and hence, Pᵢ is an orthogonal projection.
To prove the second part, let i ≠ j and let x ∈ X. Then Pⱼx ≜ xⱼ ∈ 𝔑ⱼ. Since ℜ(Pᵢ) = 𝔑ᵢ and since 𝔑ᵢ ⊥ 𝔑ⱼ, we must have xⱼ ∈ 𝔑(Pᵢ); i.e., PᵢPⱼx = 0 for all x ∈ X.
To prove the third part, let P = P₁ + ··· + Pₚ. We must show that P = I. To do so, we first show that P is a projection. This follows immediately from the fact that for arbitrary x ∈ X, P²x = (P₁ + ··· + Pₚ)(P₁x + ··· + Pₚx) = P₁²x + ··· + Pₚ²x, because PᵢPⱼ = 0 for i ≠ j. Since each Pᵢ is a projection, it follows that P²x = (P₁ + ··· + Pₚ)x = Px, and thus P is a projection. Next, we show that dim [ℜ(P)] = n. It is straightforward to show that

dim [ℜ(P)] = dim [𝔑₁] + ··· + dim [𝔑ₚ].

But by Theorem 4.10.37,

dim [𝔑₁] + ··· + dim [𝔑ₚ] = n,

and thus dim [ℜ(P)] = n. Since X = ℜ(P) ⊕ 𝔑(P), we conclude that ℜ(P) = X. Finally, since P is a projection with range X, we conclude that Px = x for all x ∈ X, i.e., P = I.
To prove the last part of the theorem, let x ∈ X. From part (iii) we have

x = Px = P₁x + P₂x + ··· + Pₚx.

Let xᵢ = Pᵢx for i = 1, ..., p. Then xᵢ ∈ 𝔑ᵢ and Axᵢ = λᵢxᵢ. Hence,

Ax = A(x₁ + ··· + xₚ) = Ax₁ + ··· + Axₚ = λ₁x₁ + ··· + λₚxₚ = λ₁P₁x + ··· + λₚPₚx = (λ₁P₁ + ··· + λₚPₚ)x,

which concludes the proof of the theorem. ∎

Any set of linear transformations {P₁, ..., Pₚ} satisfying parts (i)–(iii) of Theorem 4.10.44 is said to be a resolution of the identity in the setting of a Euclidean space. We shall give a more general definition of this concept in Chapter 7.
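For a symmetric matrix with distinct eigenvalues, the projections Pᵢ of Theorem 4.10.44 are the outer products of the unit eigenvectors, and parts (i)–(iv) can be verified directly. A sketch added here for illustration (it assumes the random matrix has distinct eigenvalues, which holds almost surely):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2               # self-adjoint
lams, P = np.linalg.eigh(A)

# One orthogonal projection per eigenvector direction; for distinct
# eigenvalues these are the projections P_i onto the null spaces of A - lambda_i I.
projs = [np.outer(P[:, i], P[:, i]) for i in range(4)]

for i in range(4):
    assert np.allclose(projs[i] @ projs[i], projs[i])   # (i) each P_i is a projection
    assert np.allclose(projs[i], projs[i].T)            #     and an orthogonal one
    for j in range(4):
        if i != j:
            assert np.allclose(projs[i] @ projs[j], 0.0)  # (ii) P_i P_j = 0

assert np.allclose(sum(projs), np.eye(4))                       # (iii) resolution of the identity
assert np.allclose(sum(l * Pi for l, Pi in zip(lams, projs)), A)  # (iv) A = sum lambda_i P_i
```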
D. Some Examples

At this point it is appropriate to consider some specific cases.

4.10.45. Example. Let X = E², let A ∈ L(X, X), and let {e₁, e₂} be an arbitrary basis for X. Suppose that

A = [a₁₁  a₁₂
     a₂₁  a₂₂]

is the matrix of A with respect to the basis {e₁, e₂}. Let x ∈ E², and let xᵀ = (ξ₁, ξ₂) denote the coordinate representation of x with respect to this basis. Then Ax is the coordinate representation of Ax with respect to this basis, and we have

Ax = [a₁₁ξ₁ + a₁₂ξ₂
      a₂₁ξ₁ + a₂₂ξ₂] = [η₁
                        η₂] = y.

This transformation is depicted pictorially in Figure F.
Now assume that A is a self-adjoint linear transformation. Then there exists an orthonormal basis {e₁′, e₂′} such that

Ae₁′ = λ₁e₁′,  Ae₂′ = λ₂e₂′,

where λ₁ and λ₂ denote the eigenvalues of A.

4.10.46. Figure F

4.10.47. Figure G

Suppose that the coordinates of x with respect to {e₁′, e₂′} are ξ₁′ and ξ₂′, respectively. Then

Ax = A(ξ₁′e₁′ + ξ₂′e₂′) = ξ₁′Ae₁′ + ξ₂′Ae₂′ = ξ₁′λ₁e₁′ + ξ₂′λ₂e₂′;

i.e., the coordinate representation of Ax with respect to {e₁′, e₂′} is (λ₁ξ₁′, λ₂ξ₂′). Thus, in order to determine Ax, we merely "stretch" or "compress" the coordinates ξ₁′, ξ₂′ along lines collinear with e₁′ and e₂′, respectively. This is illustrated in Figure G.
4.10.48. Example. Consider a transformation R from E² into E² which rotates vectors as shown in Figure H. By inspection we can characterize R, with respect to the indicated orthonormal basis {e₁, e₂}, as

Re₁ = cos θ e₁ + sin θ e₂,
Re₂ = −sin θ e₁ + cos θ e₂.

4.10.49. Figure H (the unit circle)

The reader can readily verify that R is indeed a linear transformation. The matrix of R with respect to this basis is

R_θ = [cos θ  −sin θ
       sin θ   cos θ].

By direct computation we can verify that

R_θᵀ = R_θ⁻¹ = [ cos θ  sin θ
                −sin θ  cos θ],

and, moreover, that

det R_θ = cos² θ + sin² θ = 1.

Thus, R is indeed a rotation as defined in Definition 4.10.11.
For the matrix R_θ we also note that R₀ = I, R_θ⁻¹ = R_{−θ}, and R_θR_φ = R_{θ+φ}.
4.10.50. Example. Consider now a transformation A from E³ into E³, as depicted in Figure J. The vectors e₁, e₂, e₃ form an orthonormal basis for E³. The plane Z is spanned by e₁ and e₂. This transformation accomplishes a rotation about the vector e₃ in the plane Z.

4.10.51. Figure J (the plane Z, the set Y₊, and the basis vectors e₁, e₂, e₃)

By inspection of Figure J it is clear that this transformation is characterized by the set of equations

Ae₁ =  cos θ e₁ + sin θ e₂ + 0·e₃,
Ae₂ = −sin θ e₁ + cos θ e₂ + 0·e₃,
Ae₃ =  0·e₁ + 0·e₂ + 1·e₃.

The reader can readily verify that A is a linear transformation. The matrix of A with respect to the basis {e₁, e₂, e₃} is

A = [cos θ  −sin θ  0
     sin θ   cos θ  0
       0       0    1].

For this transformation the following facts are immediately evident (assume sin θ ≠ 0): (a) e₃ is an eigenvector with eigenvalue 1; (b) plane Z is a linear subspace of E³; (c) Ax ∈ Z whenever x ∈ Z; (d) the set Y₊ is a linear subspace of E³; (e) Ax ∈ Y₊ whenever x ∈ Y₊; (f) Z ⊥ Y₊; and (g) dim Y₊ = 1, dim Z = 2, and dim Y₊ + dim Z = dim E³.

E. Further Properties of Orthogonal Transformations

The preceding example motivates several of our subsequent results. Let A ∈ L(X, X). We recall that a linear subspace Y of X is invariant under A if Ax ∈ Y whenever x ∈ Y. We now prove the following:

4.10.52. Theorem. Let A ∈ L(X, X) be an orthogonal transformation. Then

(i) the only possible real eigenvalues of A, if there are any, are +1 and −1;
(ii) if Y is a linear subspace of X which is invariant under A, then the restriction A′ of A to Y is an orthogonal transformation from Y into Y; and
(iii) if Y is a linear subspace of X which is invariant under A, then Y⊥ is also a linear subspace of X which is invariant under A.

Proof. To prove the first part, assume that A has a real eigenvalue, say λ₀. (The definition of eigenvalue of A ∈ L(X, X) excludes the possibility of complex eigenvalues, since X is a vector space over the field R of real numbers.) Then Ax = λ₀x for some x ≠ 0, and

|Ax| = |λ₀x| = |λ₀| |x|.

But |Ax| = |x|, because A is by assumption an orthogonal linear transformation. Therefore, |λ₀| = 1, and we have λ₀ = +1 or −1.
To prove the second part, assume that Y is invariant under A. Then Ax ∈ Y whenever x ∈ Y, and thus the restriction A′ of A to Y, defined by

A′x = Ax

for all x in Y, is clearly a linear transformation of Y into Y. Now, trivially, for all x in Y we have

|A′x| = |Ax| = |x|,

since A ∈ L(X, X) is an orthogonal transformation. Therefore, A′ is an orthogonal transformation from Y into Y.
To prove the last part, let Y be an invariant subspace of X under A. Then x ∈ Y⊥ if and only if x ⊥ y for all y ∈ Y. Suppose then that x ∈ Y⊥ and consider Ax. Then for each y ∈ Y we have

(Ax, y) = (x, A*y) = (x, A⁻¹y),

because A is orthogonal. But A⁻¹y is also in Y, for the following reasons. The restriction A′ of A to Y is orthogonal on Y by part (ii) and is therefore a non-singular transformation from Y into Y. Hence, (A′)⁻¹ exists and, moreover, (A′)⁻¹ must be a transformation from Y into Y. Thus, (A′)⁻¹y = A⁻¹y, and A⁻¹y is in Y. We finally have

(Ax, y) = (x, A⁻¹y) = 0

for each y in Y. Thus, Ax ∈ Y⊥ whenever x ∈ Y⊥. This proves that Y⊥ is invariant under A. ∎

We also have:

4.10.53. Theorem. Let A ∈ L(X, X) be an orthogonal transformation, let Y₊ denote the set of all x ∈ X such that Ax = x, and let Y₋ denote the set of all x ∈ X such that Ax = −x. Then Y₊ and Y₋ are linear subspaces of X and Y₊ ⊥ Y₋.

Proof. Since Y₊ = 𝔑(A − I) and Y₋ = 𝔑(A + I), it follows that Y₊ and Y₋ are linear subspaces of X. Now let x ∈ Y₊ and let y ∈ Y₋. Then

(x, y) = (Ax, Ay) = (x, −y) = −(x, y),

which implies that (x, y) = 0. Therefore, x ⊥ y and Y₊ ⊥ Y₋. ∎

Using the above theorem we now can prove the following result.

4.10.54. Corollary. Let A, Y₊, and Y₋ be defined as in Theorem 4.10.53, and let Z denote the set of all x ∈ X such that x ⊥ Y₊ and x ⊥ Y₋. Then Z is a linear subspace of X and dim Y₊ + dim Y₋ + dim Z = dim X = n. Furthermore, the restriction of A to Z has no (real) eigenvalues.

Proof. Let {e₁, ..., e_{n₁}} be an orthonormal basis for Y₊, and let {e_{n₁+1}, ..., e_{n₁+n₂}} be an orthonormal basis for Y₋, where dim Y₊ = n₁ and dim Y₋ = n₂. Then the set {e₁, ..., e_{n₁+n₂}} is orthonormal. Let Y denote the linear subspace generated by {e₁, ..., e_{n₁+n₂}}. Then dim Y = n₁ + n₂. By the definition of Z and by Theorem 4.9.59 we have Z = Y⊥, and thus Z is a linear subspace of X. Therefore,

n = dim X = dim Y + dim Y⊥ = dim Y₊ + dim Y₋ + dim Z,

which was to be shown.
To prove the second assertion, let A′ denote the restriction of A to Z. Suppose there exists a non-zero vector x ∈ Z such that A′x = λ₀x. Since A′ is orthogonal by part (ii) of Theorem 4.10.52, we have λ₀ = ±1 by part (i) of Theorem 4.10.52. Thus, x is either in Y₊ or in Y₋. But by assumption, x ∈ Z and Z ⊥ Y₊ and Z ⊥ Y₋. Therefore, x = 0, a contradiction to our earlier assumption. Hence, the restriction A′ of A to Z cannot have a real eigenvalue. ∎

Our next result is concerned with orthogonal transformations on two-dimensional Euclidean spaces.

4.10.55. Theorem. Let A ∈ L(X, X) be an orthogonal transformation, where dim X = 2.

(i) If det A = +1 (i.e., A is a rotation), there exists some real θ such that for every orthonormal basis {e₁, e₂} the corresponding matrix of A is

R_θ = [cos θ  −sin θ
       sin θ   cos θ].   (4.10.56)

(ii) If det A = −1 (i.e., A is a reflection), there exists some orthonormal basis {e₁, e₂} such that the matrix of A with respect to this basis is

Q = [1   0
     0  −1].   (4.10.57)

Proof. To prove the first part, assume that det A = +1 and choose an arbitrary orthonormal basis {e₁, e₂}. Let

A = [a₁₁  a₁₂
     a₂₁  a₂₂]

denote the matrix of A with respect to this basis. Then, since A is orthogonal, so is A, and we have

AᵀA = I   (4.10.58)

and

det A = 1.   (4.10.59)

Solving Eqs. (4.10.58) and (4.10.59) (we leave the details to the reader) yields a₁₁ = cos θ, a₁₂ = −sin θ, a₂₁ = sin θ, and a₂₂ = cos θ.
To prove the second part, assume that A is orthogonal and that det A = −1. Consider the characteristic polynomial of A,

p(λ) = λ² + α₁λ + α₀.

Since det A = −1 we have α₀ = −1. Solving for λ₁ and λ₂ we have

λ₁, λ₂ = (−α₁ ± √(α₁² + 4))/2,

which implies that both λ₁ and λ₂ are real and that λ₁ ≠ λ₂. From Theorem 4.10.52 these eigenvalues are +1 and −1. Therefore, there exists an orthonormal basis such that the matrix of A with respect to this basis is

[λ₁   0     [1   0
  0  λ₂]  =  0  −1]. ∎
In the above proof we have e₁ ∈ Y₊ and e₂ ∈ Y₋, in view of Theorem 4.10.53. Also, from the preceding theorem it is clear that if A is orthogonal and (a) if det A = 1, then det (A − λI) = 1 − 2λ cos θ + λ², and (b) if det A = −1, then det (A − λI) = λ² − 1.
4.10.60. Theorem. Let A ∈ L(X, X) be an orthogonal transformation having no (real) eigenvalues. Then there exist linear subspaces Y₁, ..., Yᵣ of X such that

(i) dim Yᵢ = 2, i = 1, ..., r;
(ii) Yᵢ ⊥ Yⱼ for all i ≠ j;
(iii) dim Y₁ + ··· + dim Yᵣ = dim X = n; and
(iv) each subspace Yᵢ is invariant under A; in fact, the restriction of A to Yᵢ is a non-trivial rotation (i.e., for the matrix given by Eq. (4.10.56) we have θ ≠ kπ, k = 0, 1, 2, ...).

Proof. Since by assumption A does not have any (real) eigenvalues, we have

det (A − λI) = (α₁ + β₁λ + λ²) ··· (αᵣ + βᵣλ + λ²),

where the αᵢ, βᵢ, i = 1, ..., r, are real (i.e., det (A − λI) does not have any linear factors (λᵢ − λ) with λᵢ real). Solving the first quadratic factor we have

λ₁ = (−β₁ + √(β₁² − 4α₁))/2  and  λ₂ = (−β₁ − √(β₁² − 4α₁))/2,

where λ₁ and λ₂ are complex. By Theorem 4.5.33, part (iv), if f(·) is any polynomial function, then f(λ₁) will be an eigenvalue of f(A). In particular, if f(λ) = α₁ + β₁λ + λ², we know that one of the eigenvalues of the linear transformation α₁I + β₁A + A² will be α₁ + β₁λ₁ + λ₁² = 0, by choice. Thus, the linear transformation (α₁I + β₁A + A²) has 0 as an eigenvalue. Therefore, there exists a vector f₁ ≠ 0 in X such that

(α₁I + β₁A + A²)f₁ = 0,   (4.10.61)

or

A²f₁ = −α₁f₁ − β₁Af₁.

Now let f₂ = Af₁. We assert that f₁ and f₂ are linearly independent. For if they were not, we would have f₂ = ηf₁ = Af₁, where η is a real scalar, and f₁ would be an eigenvector corresponding to a real eigenvalue η of A, which is impossible by hypothesis. Next, let Y₁ be the linear subspace of X generated by f₁ and f₂. Then Y₁ is two-dimensional. We now show that Y₁ is invariant under A. Let x ∈ Y₁. Then

x = ε₁f₁ + ε₂f₂

for some ε₁ and ε₂, and

Ax = ε₁Af₁ + ε₂Af₂ = ε₁Af₁ + ε₂A²f₁.

But from Eq. (4.10.61) it follows that

A²f₁ = −α₁f₁ − β₁Af₁.

Thus,

Ax = ε₁Af₁ + ε₂(−α₁f₁ − β₁Af₁) = −ε₂α₁f₁ + (ε₁ − ε₂β₁)Af₁ = −ε₂α₁f₁ + (ε₁ − ε₂β₁)f₂,

which shows that Ax ∈ Y₁ whenever x ∈ Y₁. Thus, Y₁ is invariant under A.
By Theorem 4.10.52, the restriction A′ of A to Y₁ is an orthogonal transformation from Y₁ into Y₁. This restriction cannot have any (real) eigenvalues, for then A would also have (real) eigenvalues.
From Theorem 4.10.55, A′ cannot be a reflection, for in that case A′ would have eigenvalues equal to +1 and −1. Moreover, A′ cannot be a trivial rotation, for then the eigenvalues of A′ would be equal to 1 if θ = 0° and −1 if θ = 180°. But from Corollary 4.10.8 we know that if A′ is orthogonal, then det A′ = ±1. Therefore, it follows now from Theorem 4.10.55 that the restriction of A to Y₁ is a non-trivial rotation.
Now let Z₁ = Y₁⊥. Since Y₁ is invariant under A, so is Z₁, by Theorem 4.10.52, part (iii), and dim Z₁ = dim X − 2. The restriction A₁ of A to Z₁ is an orthogonal transformation from Z₁ into Z₁, and it cannot have any (real) eigenvalues. Applying the argument already given for A and X now to A₁ and Z₁, we can conclude that there exists a two-dimensional linear subspace Y₂ of Z₁ such that the restriction of A₁ to Y₂ is a non-trivial rotation. Now since Y₂ is contained in Z₁ and since by definition Z₁ = Y₁⊥, we have Y₁ ⊥ Y₂.
Next, let Z₂ be the linear subspace which is orthogonal to both Y₁ and Y₂, and let A₂ be the restriction of A to Z₂. Repeating the argument given thus far, we can conclude that there exists a two-dimensional linear subspace Y₃ of Z₂ such that the restriction of A₂ to Y₃ is a non-trivial rotation and such that Y₂ ⊥ Y₃ and Y₁ ⊥ Y₃.
To conclude the proof of the theorem, we continue the above process until we have exhausted the original space X. ∎

Combining Theorems 4.10.53 and 4.10.60, we obtain the following:

4.10.62. Corollary. Let A ∈ L(X, X) be an orthogonal linear transformation. Then there exist linear subspaces Y₊, Y₋, Y₁, ..., Yᵣ of X such that

(i) all of the above linear subspaces are orthogonal to one another;
(ii) n = dim X = dim Y₊ + dim Y₋ + dim Y₁ + ··· + dim Yᵣ;
(iii) x ∈ Y₊ if and only if Ax = x;
(iv) x ∈ Y₋ if and only if Ax = −x; and
(v) the restriction of A to each Yᵢ, i = 1, ..., r, is a non-trivial rotation.

Since in the above corollary the dimension of each Yᵢ, i = 1, ..., r, is two, we have the following additional result.

4.10.63. Corollary. If in Corollary 4.10.62 dim X is odd, then A has a real eigenvalue.

We leave the proof of the next result as an exercise.

4.10.64. Theorem. If A is an orthogonal transformation from X into X, then the characteristic polynomial of A is of the form

det (A − λI) = (1 − λ)^{n₊}(−1 − λ)^{n₋}(1 − 2λ cos θ₁ + λ²) ··· (1 − 2λ cos θᵣ + λ²),

where n₊ = dim Y₊ and n₋ = dim Y₋ (see Corollary 4.10.62). Moreover, there exists an orthonormal basis {e₁, ..., eₙ} of X such that the matrix A of A with respect to this basis is of the block diagonal form

A = diag(R_{θ₁}, ..., R_{θᵣ}, 1, ..., 1, −1, ..., −1),

where each R_{θᵢ} is the 2 × 2 rotation matrix

R_{θᵢ} = [cos θᵢ  −sin θᵢ
          sin θᵢ   cos θᵢ].

4.10.65. Exercise. Prove Theorem 4.10.64.

In our next result the canonical form of skew-adjoint linear transformations is established.

4.10.66. Theorem. Let A be a skew-adjoint linear transformation from X into X. Then there exists an orthonormal basis {e₁, ..., eₙ} such that the matrix A of A with respect to this basis is of the block diagonal form

A = diag(N₁, ..., Nᵣ, 0, ..., 0),  Nᵢ = [ 0   νᵢ
                                        −νᵢ   0],

where the νᵢ, i = 1, ..., r, are real and where some of the νᵢ may be zero.

4.10.67. Exercise. Prove Theorem 4.10.66.
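The canonical form implies that the eigenvalues of a real skew-symmetric matrix are the purely imaginary pairs ±iνᵢ (some possibly zero). A numerical sketch, added here for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((5, 5))
A = (M - M.T) / 2            # real skew-symmetric (skew-adjoint)

eigs = np.linalg.eigvals(A)
# Consistent with the canonical form: eigenvalues occur as +/- i*nu_i
# (some possibly zero), so every eigenvalue has zero real part.
assert np.allclose(eigs.real, 0.0)
```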

Before closing the present section, we briefly introduce so-called "normal transformations." We will have quite a bit more to say about such transformations and their representation in Chapter 7.

4.10.68. Definition. A transformation A ∈ L(X, X) is said to be a normal linear transformation if A*A = AA*.

Some of the properties of such transformations are as follows.

4.10.69. Theorem. Let A ∈ L(X, X). Then

(i) if A is a self-adjoint transformation, then it is also a normal transformation;
(ii) if A is a skew-adjoint transformation, then it is also a normal transformation;
(iii) if A is an orthogonal transformation, then it is also a normal transformation; and
(iv) if A is a normal linear transformation, then there exists an orthonormal basis {e₁, ..., eₙ} of X such that the matrix A of A with respect to this basis is block diagonal, with 2 × 2 blocks of the form

[ βᵢ  λᵢ
 −λᵢ  βᵢ],  i = 1, ..., r,

along the diagonal, together with real diagonal entries in the remaining n − 2r positions.

The proofs of parts (i)–(iii) follow from the definitions of normal, self-adjoint, skew-adjoint, and orthogonal linear transformations. To prove part (iv), let A = A₁ + A₂, where A₁ = ½(A + A*) and A₂ = ½(A − A*), and note that A₁ is self-adjoint and A₂ is skew-adjoint. This representation is unique by Corollary 4.10.25. Making use of Theorem 4.10.66 and Corollary 4.10.38, we obtain the desired result. We leave the details of the proof of this theorem as an exercise.

4.10.70. Exercise. Prove Theorem 4.10.69.
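Parts (i)–(iii) can be checked numerically: the self-adjoint and skew-adjoint parts of any real matrix, and any orthogonal matrix, each commute with their transpose. A sketch added here for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
A1 = (M + M.T) / 2    # self-adjoint part of M
A2 = (M - M.T) / 2    # skew-adjoint part of M
assert np.allclose(A1 + A2, M)   # the unique decomposition of Corollary 4.10.25

# Self-adjoint and skew-adjoint matrices are normal: A^T A = A A^T.
assert np.allclose(A1.T @ A1, A1 @ A1.T)
assert np.allclose(A2.T @ A2, A2 @ A2.T)

# An orthogonal matrix (here obtained from a QR factorization) is normal too.
Q, _ = np.linalg.qr(M)
assert np.allclose(Q.T @ Q, Q @ Q.T)
```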

4.11. APPLICATIONS TO ORDINARY DIFFERENTIAL EQUATIONS

In the present section we present applications of the material covered in the present chapter and the preceding chapter. Because of their importance in almost all branches of science and engineering, we consider some topics in ordinary differential equations. Specifically, we concern ourselves with initial-value problems described by ordinary differential equations. The present section is divided into two parts. In subsection A we define the initial-value problem, while in subsection B we treat linear initial-value problems. At the end of the next chapter, we will continue our discussion of ordinary differential equations.

A. Initial-Value Problem: Definition

Let $R$ denote the set of real numbers, and let $D \subset R^2$ be a domain (i.e., $D$ is an open and connected subset of $R^2$). We will call $R^2$ the $(t, x)$ plane. Let $f$ be a real-valued function which is defined and continuous on $D$, and let $\dot{x} \triangleq dx/dt$ (i.e., $\dot{x}$ denotes the derivative of $x$ with respect to $t$). We call

$$\dot{x} = f(t, x) \tag{4.11.1}$$

an ordinary differential equation of the first order. Let $T = (t_1, t_2) \subset R$ be an open interval which we call a $t$ interval (i.e., $T = (t_1, t_2) = \{t \in R : t_1 < t < t_2\}$). A real differentiable function $\varphi$ (if it exists) defined on $T$ such that the points $(t, \varphi(t)) \in D$ for all $t \in T$ and such that

$$\dot{\varphi}(t) = f(t, \varphi(t)) \tag{4.11.2}$$

for all $t \in T$ is called a solution of the differential equation (4.11.1).

4.11.3. Definition. Let $(\tau, \xi) \in D$. If $\varphi$ is a solution of the differential equation (4.11.1) and if $\varphi(\tau) = \xi$, then $\varphi$ is called a solution of the initial-value problem

$$\dot{x} = f(t, x), \qquad x(\tau) = \xi. \tag{4.11.4}$$

In Figure K a typical solution of an initial-value problem is depicted.


4.11.5. Figure K. Typical solution of an initial-value problem.

We can represent the initial-value problem given in Eq. (4.11.4) equivalently by means of the integral equation

$$\varphi(t) = \xi + \int_\tau^t f(s, \varphi(s))\, ds. \tag{4.11.6}$$

Here we say that two problems are equivalent if they have the same solution. To prove this equivalence, let $\varphi$ be a solution of the initial-value problem (4.11.4). Then $\varphi(\tau) = \xi$ and

$$\dot{\varphi}(t) = f(t, \varphi(t))$$

for all $t \in T$. Integrating from $\tau$ to $t$ we have

$$\int_\tau^t \dot{\varphi}(s)\, ds = \int_\tau^t f(s, \varphi(s))\, ds,$$

or

$$\varphi(t) = \xi + \int_\tau^t f(s, \varphi(s))\, ds.$$

Thus, $\varphi$ is a solution of the integral equation (4.11.6).

Conversely, let $\varphi$ be a solution of the integral equation (4.11.6). Then $\varphi(\tau) = \xi$, and differentiating both sides of Eq. (4.11.6) with respect to $t$ we have

$$\dot{\varphi}(t) = f(t, \varphi(t)),$$

and thus $\varphi$ is a solution of the initial-value problem (4.11.4).
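The equivalence between the initial-value problem (4.11.4) and the integral equation (4.11.6) underlies the classical method of successive approximations. The following sketch (an illustration assuming Python with NumPy; the choice $f(t, x) = x$, $\tau = 0$, $\xi = 1$ is arbitrary, with known solution $e^t$) iterates the integral equation on a grid:

```python
import numpy as np

# Successive approximations for the integral equation (4.11.6):
# phi_{k+1}(t) = xi + integral from tau to t of f(s, phi_k(s)) ds,
# illustrated with f(t, x) = x, tau = 0, xi = 1 (solution: e^t).
def picard(f, xi, t_grid, iterations):
    phi = np.full_like(t_grid, xi, dtype=float)
    for _ in range(iterations):
        integrand = f(t_grid, phi)
        # cumulative trapezoidal integral from the left endpoint (tau)
        integral = np.concatenate(([0.0], np.cumsum(
            0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t_grid))))
        phi = xi + integral
    return phi

t = np.linspace(0.0, 1.0, 201)
approx = picard(lambda s, x: x, 1.0, t, iterations=25)
error = float(np.max(np.abs(approx - np.exp(t))))
print(error)  # small: the iterates converge to the solution of the IVP
```

The iterates converge to the (discretized) fixed point of the integral operator, which is exactly the equivalence the text establishes.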
Next, we consider initial-value problems described by means of several first-order ordinary differential equations. Let $D \subset R^{n+1}$ be a domain (i.e., $D$ is an open and connected subset of $R^{n+1}$). We will call $R^{n+1}$ the $(t, x_1, \ldots, x_n)$ space. Let $f_1, \ldots, f_n$ be $n$ real-valued functions which are defined and continuous on $D$ (i.e., $f_i(t, x_1, \ldots, x_n)$, $i = 1, \ldots, n$, are defined for all points in $D$ and are continuous with respect to all arguments $t, x_1, \ldots, x_n$). We call

$$\dot{x}_i = f_i(t, x_1, \ldots, x_n), \qquad i = 1, \ldots, n, \tag{4.11.7}$$

a system of n ordinary differential equations of the first order. A set of $n$ real differentiable functions $\{\varphi_1, \ldots, \varphi_n\}$ (if it exists) defined on a real $t$ interval $T = (t_1, t_2) \subset R$ such that the points $(t, \varphi_1(t), \ldots, \varphi_n(t)) \in D$ for all $t \in T$ and such that

$$\dot{\varphi}_i(t) = f_i(t, \varphi_1(t), \ldots, \varphi_n(t)), \qquad i = 1, \ldots, n \tag{4.11.8}$$

for all $t \in T$, is called a solution of the system of ordinary differential equations (4.11.7).

4.11.9. Definition. Let $(\tau, \xi_1, \ldots, \xi_n) \in D$. If the set $\{\varphi_1, \ldots, \varphi_n\}$ is a solution of the system of equations (4.11.7) and if $(\varphi_1(\tau), \ldots, \varphi_n(\tau)) = (\xi_1, \ldots, \xi_n)$, then the set $\{\varphi_1, \ldots, \varphi_n\}$ is called a solution of the initial-value problem

$$\dot{x}_i = f_i(t, x_1, \ldots, x_n), \qquad x_i(\tau) = \xi_i, \qquad i = 1, \ldots, n. \tag{4.11.10}$$
It is convenient to use vector notation to represent Eq. (4.11.10). Let

$$x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \qquad f(t, x) = \begin{bmatrix} f_1(t, x_1, \ldots, x_n) \\ \vdots \\ f_n(t, x_1, \ldots, x_n) \end{bmatrix} = \begin{bmatrix} f_1(t, x) \\ \vdots \\ f_n(t, x) \end{bmatrix},$$

and define $\dot{x} = dx/dt$ componentwise; i.e., $\dot{x} = (\dot{x}_1, \ldots, \dot{x}_n)^T$. We can express Eq. (4.11.10) equivalently as

$$\dot{x} = f(t, x), \qquad x(\tau) = \xi. \tag{4.11.11}$$

If in Eq. (4.11.11) $f(t, x)$ does not depend on $t$ (i.e., $f(t, x) = f(x)$ for all $(t, x) \in D$), then we have

$$\dot{x} = f(x). \tag{4.11.12}$$

In this case we speak of an autonomous system of first-order ordinary differential equations.


Of special importance are systems of first-order ordinary differential
equations described by
(4.11.13)
i = A(t)x + vet),
i

and

(4.11.14)

A(t)x,

(4.11.15)

i= A x ,

where x is a real n-vector, A(t) = a[ j{ (t)] is a real (n x n) matrix with elements


a{j(/) that are defined and continuous on a t interval T, A = a[ ,/] is an (n X n)
matrix with real constant coefficients, and vet) is a real n-vector with components v,(t), i = 1, ... ,n, which are defined and at least piecewise continuous on T. These equations are clearly a special case of Eq. (4.11.7). F o r example, if in Eq. (4.11.7) we let
/,(t,

XI'

,x . ) = /,(t, x) =

a'/(t)x

I- I

l,

i=

I, ... ,n,

then Eq. (4.11.14) results. In the case of Eqs. (4.11.14)


and (4.11.15), we
speak of a linear homogeneous system of ordinary differential equations, in
the case of Eq. (4.11.13) we have a linear non-bomogeneous system of ordinary
differential equations, and in the case of Eq. (4.11.15) we speak of a linear
system of ordinary differential equations with constant coefficients.
Next, we consider initial-value problems described by means of nth-order ordinary differential equations. Let $f$ be a real function which is defined and continuous in a domain $D$ of the real $(t, x_1, \ldots, x_n)$ space, and let $x^{(k)} \triangleq d^k x/dt^k$. We call

$$x^{(n)} = f(t, x, x^{(1)}, \ldots, x^{(n-1)}) \tag{4.11.16}$$

an nth-order ordinary differential equation. A real function $\varphi$ (if it exists) which is defined on a $t$ interval $T = (t_1, t_2) \subset R$ and which has $n$ derivatives on $T$ is called a solution of Eq. (4.11.16) if $(t, \varphi(t), \ldots, \varphi^{(n-1)}(t)) \in D$ for all $t \in T$ and if

$$\varphi^{(n)}(t) = f(t, \varphi(t), \ldots, \varphi^{(n-1)}(t)) \tag{4.11.17}$$

for all $t \in T$.
4.11.18. Definition. Let $(\tau, \xi_1, \ldots, \xi_n) \in D$. If $\varphi$ is a solution of Eq. (4.11.16) and if $\varphi(\tau) = \xi_1, \ldots, \varphi^{(n-1)}(\tau) = \xi_n$, then $\varphi$ is called a solution of the initial-value problem

$$x^{(n)} = f(t, x, x^{(1)}, \ldots, x^{(n-1)}), \qquad x(\tau) = \xi_1, \; \ldots, \; x^{(n-1)}(\tau) = \xi_n. \tag{4.11.19}$$

Of particular interest are nth-order ordinary differential equations of the form

$$a_n(t)x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = v(t), \tag{4.11.20}$$
$$a_n(t)x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = 0, \tag{4.11.21}$$
$$a_n x^{(n)} + a_{n-1}x^{(n-1)} + \cdots + a_1 x^{(1)} + a_0 x = 0, \tag{4.11.22}$$

where $a_n(t), \ldots, a_0(t)$ are real continuous functions defined on the interval $T$, where $a_n(t) \neq 0$ for all $t \in T$, where $a_n, \ldots, a_0$ are real constants, where $a_n \neq 0$, and where $v(t)$ is a real function defined and piecewise continuous on $T$. We call Eq. (4.11.21) a linear homogeneous ordinary differential equation of order n, Eq. (4.11.20) a linear non-homogeneous ordinary differential equation of order n, and Eq. (4.11.22) a linear ordinary differential equation of order n with constant coefficients.
We now show that the theory of nth-order ordinary differential equations reduces to the theory of a system of $n$ first-order ordinary differential equations. To this end, let in Eq. (4.11.19) $x = x_1$, and let

$$\begin{aligned} \dot{x}_1 &= x_2 \\ \dot{x}_2 &= x_3 \\ &\;\;\vdots \\ \dot{x}_{n-1} &= x_n \\ \dot{x}_n &= f(t, x_1, \ldots, x_n). \end{aligned} \tag{4.11.23}$$
This system of equations is clearly defined for all $(t, x_1, \ldots, x_n) \in D$. Now assume that the vector $\varphi^T = (\varphi_1, \ldots, \varphi_n)$ is a solution of Eq. (4.11.23) on an interval $T$. Since $\varphi_2 = \dot{\varphi}_1$, $\varphi_3 = \dot{\varphi}_2, \ldots, \varphi_n = \varphi_1^{(n-1)}$, and since

$$f(t, \varphi_1(t), \ldots, \varphi_n(t)) = f(t, \varphi_1(t), \ldots, \varphi_1^{(n-1)}(t)) = \varphi_1^{(n)}(t),$$

it follows that the first component $\varphi_1$ of the vector $\varphi$ is a solution of Eq. (4.11.16) on the interval $T$. Conversely, assume that $\varphi_1$ is a solution of Eq. (4.11.16) on the interval $T$. Then the vector $\varphi^T = (\varphi_1, \varphi_1^{(1)}, \ldots, \varphi_1^{(n-1)})$ is clearly a solution of the system of equations (4.11.23). Note that if $\varphi_1(\tau) = \xi_1, \ldots, \varphi_1^{(n-1)}(\tau) = \xi_n$, then the vector $\varphi$ satisfies $\varphi(\tau) = \xi$, where $\xi^T = (\xi_1, \ldots, \xi_n)$. The converse is also true.
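The reduction (4.11.23) is also how nth-order equations are handled numerically. A minimal sketch (assuming Python with NumPy; the test equation $x'' = -x$, $x(0) = 0$, $\dot{x}(0) = 1$, with solution $\sin t$, and the Runge-Kutta integrator are illustrative choices, not from the text):

```python
import numpy as np

# Reduce x^(n) = f(t, x, x', ..., x^(n-1)) to the first-order system
# (4.11.23): x1' = x2, ..., x_{n-1}' = x_n, x_n' = f(t, x1, ..., xn).
def companion_system(f):
    def F(t, x):
        return np.concatenate((x[1:], [f(t, *x)]))
    return F

def rk4(F, x0, t0, t1, steps):
    # classical fourth-order Runge-Kutta integration
    t, x = t0, np.asarray(x0, dtype=float)
    h = (t1 - t0) / steps
    for _ in range(steps):
        k1 = F(t, x)
        k2 = F(t + h / 2, x + h / 2 * k1)
        k3 = F(t + h / 2, x + h / 2 * k2)
        k4 = F(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

# x'' = -x with x(0) = 0, x'(0) = 1 has the solution x = sin t.
F = companion_system(lambda t, x1, x2: -x1)
x_final = rk4(F, [0.0, 1.0], 0.0, np.pi / 2, 200)
print(x_final[0])  # close to sin(pi/2) = 1
```

The first component of the integrated vector is the solution of the original second-order equation, exactly as the argument above asserts.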
Thus far we have concerned ourselves with initial-value problems characterized by real ordinary differential equations. It is possible to consider initial-value problems involving complex ordinary differential equations. For example, let $t$ be real and let $z^T = (z_1, \ldots, z_n)$ be a complex vector (i.e., $z_k$ is of the form $u_k + iv_k$, where $u_k$ and $v_k$ are real and $i = \sqrt{-1}$). Let $D$ be a domain in the $(t, z)$ space, and let $f_1, \ldots, f_n$ be $n$ continuous complex-valued functions defined on $D$. Let $f^T = (f_1, \ldots, f_n)$, and let $\dot{z} = dz/dt$. We call

$$\dot{z} = f(t, z) \tag{4.11.24}$$

a system of n complex ordinary differential equations of the first order. A complex vector $\varphi^T = (\varphi_1, \ldots, \varphi_n)$ which is defined and differentiable on a real $t$ interval $T = (\tau_1, \tau_2) \subset R$ such that the points $(t, \varphi_1(t), \ldots, \varphi_n(t)) \in D$ for all $t \in T$ and such that

$$\dot{\varphi}(t) = f(t, \varphi(t))$$

for all $t \in T$, is called a solution of the system of equations (4.11.24). If in addition $(\tau, \xi_1, \ldots, \xi_n) \in D$ and if $(\varphi_1(\tau), \ldots, \varphi_n(\tau)) = (\xi_1, \ldots, \xi_n) = \xi^T$, then $\varphi$ is said to be a solution of the initial-value problem

$$\dot{z} = f(t, z), \qquad z(\tau) = \xi. \tag{4.11.25}$$

Of particular interest in applications are initial-value problems characterized by complex linear ordinary differential equations having forms analogous to those given in equations (4.11.13)-(4.11.15). We can similarly consider initial-value problems described by complex nth-order ordinary differential equations.
Let us look now at some specific examples. The first example demonstrates that the solution to an initial-value problem may not be unique.

4.11.26. Example. Consider the initial-value problem

$$\dot{x} = x^{1/3}, \qquad x(0) = 0.$$

We can readily verify that this problem has infinitely many solutions passing through the origin of the $(t, x)$ plane, given by

$$\varphi_p(t) = \begin{cases} 0, & t \leq p, \\ \left[\tfrac{2}{3}(t - p)\right]^{3/2}, & p < t, \end{cases}$$

where $p$ is any real number with $p \geq 0$.
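The two solution branches can be checked directly. A small sketch (assuming Python with NumPy; $p = 0$ is chosen for the nonzero branch):

```python
import numpy as np

# Two distinct solutions of x' = x**(1/3), x(0) = 0 (Example 4.11.26,
# taking p = 0): phi_a(t) = 0 and phi_b(t) = (2*t/3)**1.5 for t >= 0.
# Both satisfy the equation and the initial condition.
def phi_b(t):
    return (2.0 * t / 3.0) ** 1.5

t = np.linspace(0.1, 2.0, 100)
h = 1e-6
deriv = (phi_b(t + h) - phi_b(t - h)) / (2 * h)     # central difference
residual = float(np.max(np.abs(deriv - phi_b(t) ** (1.0 / 3.0))))
print(residual)      # phi_b satisfies the equation up to discretization error
print(phi_b(0.0))    # and phi_b(0) = 0, like the trivial solution
```

Since both $\varphi_a \equiv 0$ and $\varphi_b$ pass through $(0, 0)$ and satisfy the equation, uniqueness fails, as claimed.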

The next example shows that the $t$ interval for which a solution to the initial-value problem exists may be restricted.

4.11.27. Example. Consider the initial-value problem

$$\dot{x} = x^2, \qquad x(t_1) = \xi,$$

where $\xi$ is any real number. By direct computation we can verify that

$$\varphi(t) = \xi[1 - (t - t_1)\xi]^{-1}$$

is a solution of this problem. We note that if $t = t_1 + 1/\xi$, then the solution $\varphi(t)$ is not defined. Thus, there is a restriction on the $t$ interval for which a solution to the above problem exists. Namely, if $\xi > 0$, the above solution is valid over any interval $(t_1, t_2)$ such that $t_1 < t_2 < t_1 + 1/\xi$. In this case we say the solution fails to exist for $t \geq t_1 + 1/\xi$. On the other hand, if $\xi < 0$, the solution given above is valid for any $t > t_1$, and we say the solution exists on any interval $(t_1, t_2)$.
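A numerical sketch of this example (assuming Python with NumPy; $t_1 = 0$ and $\xi = 2$ are arbitrary choices, giving the escape time $t_1 + 1/\xi = 0.5$):

```python
import numpy as np

# The claimed solution phi(t) = xi * [1 - (t - t1)*xi]**(-1) of
# x' = x**2, x(t1) = xi, checked numerically for t1 = 0, xi = 2.
t1, xi = 0.0, 2.0

def phi(t):
    return xi / (1.0 - (t - t1) * xi)

t = np.linspace(0.0, 0.45, 50)                 # escape time is 0.5
h = 1e-7
deriv = (phi(t + h) - phi(t - h)) / (2 * h)    # central-difference derivative
ode_residual = float(np.max(np.abs(deriv - phi(t) ** 2)))
escape_value = phi(0.499)                      # just before t = 0.5
print(ode_residual)   # phi satisfies x' = x**2 up to discretization error
print(escape_value)   # the solution is already very large near the escape time
```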

The preceding examples give rise to several important questions:

When does an initial-value problem possess a solution?
When is a solution unique?
What is the extent of the interval over which such a solution exists?
Is the solution continuously dependent on the initial condition $\xi$?

At the end of the next chapter we will state and prove results which give answers to these questions.
B. Initial-Value Problem: Linear Systems

In the remainder of the present section we concern ourselves exclusively with initial-value problems described by linear ordinary differential equations. Let again $T = (t_1, t_2)$ be a real $t$ interval, let $x^T = (x_1, \ldots, x_n)$ denote an n-dimensional vector, let $A = [a_{ij}]$ be a constant $(n \times n)$ matrix, let $A(t) = [a_{ij}(t)]$ be an $(n \times n)$ matrix with elements $a_{ij}(t)$ that are defined and continuous on the interval $T$, and let $v(t)^T = (v_1(t), \ldots, v_n(t))$ denote an n-vector with components $v_i(t)$ that are defined and piecewise continuous on $T$. In the following we consider matrices and vectors with components which may be either real- or complex-valued. In the former case the field for the $x$ space is the field of real numbers, while in the latter case the field for the $x$ space is the field of complex numbers. Also, let

$$D = \{(t, x) : t \in T,\; x \in R^n \text{ (or } C^n\text{)}\}. \tag{4.11.28}$$

At first we consider systems of ordinary differential equations given by

$$\dot{x} = A(t)x + v(t), \tag{4.11.29}$$
$$\dot{x} = A(t)x, \tag{4.11.30}$$

and

$$\dot{x} = Ax. \tag{4.11.31}$$

In the applications section of the next chapter we will show that, with the above assumptions, equations (4.11.29)-(4.11.31) possess unique solutions for every $(\tau, \xi) \in D$ which exist over the entire interval $T = (t_1, t_2)$ and which depend continuously on the initial conditions. This is an extremely important result in applications, where we usually require that $T = (-\infty, \infty)$.
4.11.32. Theorem. The set $S$ of all solutions of Eq. (4.11.30) on $T$ forms an n-dimensional vector space.

Proof. Let $\varphi_1$ and $\varphi_2$ be solutions of Eq. (4.11.30), let $F$ denote the field for the $x$ space, and let $\alpha_1, \alpha_2 \in F$. Since

$$\frac{d}{dt}[\alpha_1\varphi_1(t) + \alpha_2\varphi_2(t)] = \alpha_1\dot{\varphi}_1(t) + \alpha_2\dot{\varphi}_2(t) = \alpha_1 A(t)\varphi_1(t) + \alpha_2 A(t)\varphi_2(t) = A(t)[\alpha_1\varphi_1(t) + \alpha_2\varphi_2(t)],$$

it follows that $\alpha_1\varphi_1 + \alpha_2\varphi_2 \in S$ whenever $\varphi_1, \varphi_2 \in S$ and whenever $\alpha_1, \alpha_2 \in F$. Furthermore, the trivial solution $\varphi = 0$ defined by $\varphi(t) = 0$ for all $t \in T$ is clearly in $S$, and for every $\eta \in S$ there exists an $\eta' = -\eta \in S$ such that $\eta + \eta' = 0$. It is now an easy matter to verify that all the axioms of a vector space are satisfied for $S$ (we leave the details to the reader to verify).

Next, we must show that $S$ is n-dimensional; i.e., we must find a set of solutions $\varphi_1, \ldots, \varphi_n$ which is linearly independent and which spans $S$. Let $\xi_1, \ldots, \xi_n$ be a set of linearly independent vectors in the n-dimensional $x$ space. By the existence results which we will prove in the next chapter (and which we will accept here on faith), if $\tau \in T$, there exist $n$ solutions $\varphi_1, \ldots, \varphi_n$ of Eq. (4.11.30) such that $\varphi_i(\tau) = \xi_i$, $i = 1, \ldots, n$. We first show that these solutions are linearly independent. For purposes of contradiction, assume that these solutions are linearly dependent. Then there exist scalars $\alpha_1, \ldots, \alpha_n \in F$, not all zero, such that

$$\sum_{i=1}^{n} \alpha_i \varphi_i(t) = 0$$

for all $t \in T$. This implies that

$$\sum_{i=1}^{n} \alpha_i \xi_i = \sum_{i=1}^{n} \alpha_i \varphi_i(\tau) = 0.$$

But this last equation contradicts the assumption that the $\xi_i$ are linearly independent. Thus, the $\varphi_i$, $i = 1, \ldots, n$, are linearly independent. Finally, to show that these solutions span $S$, let $\varphi$ be any solution of Eq. (4.11.30) on $T$ such that $\varphi(\tau) = \xi$. Then there exist unique scalars $\alpha_1, \ldots, \alpha_n \in F$ such that

$$\xi = \sum_{i=1}^{n} \alpha_i \xi_i,$$

because the vectors $\xi_i$, $i = 1, \ldots, n$, form a basis for the $x$ space. It now follows that

$$\psi = \sum_{i=1}^{n} \alpha_i \varphi_i$$

is a solution of Eq. (4.11.30) on $T$ such that $\psi(\tau) = \xi$. By the uniqueness results which we will prove in the next chapter (and which we accept here on faith),

$$\varphi = \sum_{i=1}^{n} \alpha_i \varphi_i.$$

Since $\varphi$ was chosen arbitrarily, it follows that the solutions $\varphi_i$, $i = 1, \ldots, n$, span $S$. This concludes the proof. $\blacksquare$
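The superposition property at the heart of this proof can be illustrated numerically. A sketch (assuming Python with NumPy; the matrix $A$ and the coefficients are arbitrary choices):

```python
import numpy as np

# Superposition for x' = A x with A = [[0, 1], [-1, 0]]: the known solutions
# phi1(t) = (cos t, -sin t) and phi2(t) = (sin t, cos t) can be combined
# arbitrarily, and the combination still satisfies the same equation.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
a1, a2 = 2.5, -1.3

def combo(t):
    phi1 = np.array([np.cos(t), -np.sin(t)])
    phi2 = np.array([np.sin(t), np.cos(t)])
    return a1 * phi1 + a2 * phi2

h = 1e-6
residual = max(
    float(np.max(np.abs((combo(s + h) - combo(s - h)) / (2 * h) - A @ combo(s))))
    for s in np.linspace(0.0, 3.0, 25))
print(residual)  # numerically zero: a1*phi1 + a2*phi2 is again a solution
```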

The above result motivates the following two definitions.

4.11.33. Definition. A set of $n$ linearly independent solutions of Eq. (4.11.30) on $T$ is called a fundamental set of solutions of (4.11.30). An $(n \times n)$ matrix $\Psi$ whose $n$ columns are linearly independent solutions of Eq. (4.11.30) on $T$ is called a fundamental matrix.

Thus, if $\{\psi_1, \ldots, \psi_n\}$ is a set of $n$ linearly independent solutions of Eq. (4.11.30) and if $\psi_i^T = (\psi_{1i}, \ldots, \psi_{ni})$, then

$$\Psi = \begin{bmatrix} \psi_{11} & \psi_{12} & \cdots & \psi_{1n} \\ \psi_{21} & \psi_{22} & \cdots & \psi_{2n} \\ \vdots & & & \vdots \\ \psi_{n1} & \psi_{n2} & \cdots & \psi_{nn} \end{bmatrix} = [\psi_1 \,|\, \psi_2 \,|\, \cdots \,|\, \psi_n]$$

is a fundamental matrix.

In our next definition we employ the natural basis for the $x$ space, given by

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad e_n = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}.$$

4.11.34. Definition. A fundamental matrix $\Phi$ (for Eq. (4.11.30)) whose columns are determined by the linearly independent solutions $\varphi_i$, $i = 1, \ldots, n$, with

$$\varphi_i(\tau) = e_i, \qquad i = 1, \ldots, n, \qquad \tau \in T,$$

is called the state transition matrix $\Phi$ of Eq. (4.11.30).

Let $X = [x_{ij}]$ be an $(n \times n)$ matrix and define differentiation of $X$ with respect to $t \in T$ componentwise; i.e., $\dot{X} \triangleq [\dot{x}_{ij}]$. We now have:

4.11.35. Theorem. Let $\Psi$ be a fundamental matrix of Eq. (4.11.30) and let $X$ denote an $(n \times n)$ matrix. Then $\Psi$ satisfies the matrix equation

$$\dot{X} = A(t)X, \qquad t \in T. \tag{4.11.36}$$

Proof. We have

$$\dot{\Psi} = [\dot{\psi}_1 \,|\, \dot{\psi}_2 \,|\, \cdots \,|\, \dot{\psi}_n] = [A(t)\psi_1 \,|\, A(t)\psi_2 \,|\, \cdots \,|\, A(t)\psi_n] = A(t)[\psi_1 \,|\, \psi_2 \,|\, \cdots \,|\, \psi_n] = A(t)\Psi. \;\blacksquare$$

We also have:

4.11.37. Theorem. If $\Psi$ is a solution of the matrix equation (4.11.36) on $T$ and if $t, \tau \in T$, then

$$\det \Psi(t) = \det \Psi(\tau)\, e^{\int_\tau^t \operatorname{tr} A(s)\, ds}, \qquad t \in T. \tag{4.11.38}$$

Proof. Recall that if $C = [c_{ij}]$ is an $(n \times n)$ matrix, then $\operatorname{tr} C = \sum_{i=1}^{n} c_{ii}$. Let $\Psi = [\psi_{ij}]$ and $A(t) = [a_{ij}(t)]$. Then $\dot{\psi}_{ij} = \sum_{k=1}^{n} a_{ik}(t)\psi_{kj}$. Now

$$\frac{d}{dt}(\det \Psi) = \begin{vmatrix} \dot{\psi}_{11} & \cdots & \dot{\psi}_{1n} \\ \psi_{21} & \cdots & \psi_{2n} \\ \vdots & & \vdots \\ \psi_{n1} & \cdots & \psi_{nn} \end{vmatrix} + \begin{vmatrix} \psi_{11} & \cdots & \psi_{1n} \\ \dot{\psi}_{21} & \cdots & \dot{\psi}_{2n} \\ \vdots & & \vdots \\ \psi_{n1} & \cdots & \psi_{nn} \end{vmatrix} + \cdots + \begin{vmatrix} \psi_{11} & \cdots & \psi_{1n} \\ \vdots & & \vdots \\ \psi_{(n-1)1} & \cdots & \psi_{(n-1)n} \\ \dot{\psi}_{n1} & \cdots & \dot{\psi}_{nn} \end{vmatrix}. \tag{4.11.39}$$

Also,

$$\begin{vmatrix} \dot{\psi}_{11} & \cdots & \dot{\psi}_{1n} \\ \psi_{21} & \cdots & \psi_{2n} \\ \vdots & & \vdots \\ \psi_{n1} & \cdots & \psi_{nn} \end{vmatrix} = \begin{vmatrix} \sum_k a_{1k}\psi_{k1} & \cdots & \sum_k a_{1k}\psi_{kn} \\ \psi_{21} & \cdots & \psi_{2n} \\ \vdots & & \vdots \\ \psi_{n1} & \cdots & \psi_{nn} \end{vmatrix}.$$

The last determinant is unchanged if we subtract from the first row $a_{12}$ times the second row, plus $a_{13}$ times the third row, up to $a_{1n}$ times the nth row. This yields

$$\begin{vmatrix} a_{11}(t)\psi_{11} & \cdots & a_{11}(t)\psi_{1n} \\ \psi_{21} & \cdots & \psi_{2n} \\ \vdots & & \vdots \\ \psi_{n1} & \cdots & \psi_{nn} \end{vmatrix} = a_{11}(t) \det \Psi.$$

Repeating the above procedure for the remaining determinants we get

$$\frac{d}{dt}[\det \Psi(t)] = a_{11}(t)\det \Psi(t) + a_{22}(t)\det \Psi(t) + \cdots + a_{nn}(t)\det \Psi(t) = [\operatorname{tr} A(t)]\det \Psi(t).$$

This now implies

$$\det \Psi(t) = \det \Psi(\tau)\, e^{\int_\tau^t \operatorname{tr} A(s)\, ds}$$

for all $t \in T$. $\blacksquare$

4.11.40. Exercise. Verify Eq. (4.11.39).
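Formula (4.11.38) can be checked against a direct numerical integration of the matrix equation (4.11.36). A sketch (assuming Python with NumPy; the matrix $A(t)$ is an arbitrary choice):

```python
import numpy as np

# Abel's formula (4.11.38): det Psi(t) = det Psi(tau) * exp(int tr A(s) ds).
# Integrate Psi' = A(t) Psi, Psi(0) = I, to t = 2 with Runge-Kutta steps
# for A(t) = [[sin t, 1], [0, cos t]], then compare determinants.
def A(t):
    return np.array([[np.sin(t), 1.0], [0.0, np.cos(t)]])

Psi, h, n = np.eye(2), 1e-3, 2000
for i in range(n):
    t = i * h
    k1 = A(t) @ Psi
    k2 = A(t + h / 2) @ (Psi + h / 2 * k1)
    k3 = A(t + h / 2) @ (Psi + h / 2 * k2)
    k4 = A(t + h) @ (Psi + h * k3)
    Psi = Psi + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# tr A(s) = sin s + cos s, so the exponent is (1 - cos 2) + sin 2
predicted = np.exp((1.0 - np.cos(2.0)) + np.sin(2.0))
print(np.linalg.det(Psi), predicted)  # the two values agree
```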

We now prove:

4.11.41. Theorem. A solution $\Psi$ of the matrix equation (4.11.36) is a fundamental matrix for Eq. (4.11.30) if and only if $\det \Psi(t) \neq 0$ for all $t \in T$.

Proof. Assume that $\Psi = [\psi_1 \,|\, \psi_2 \,|\, \cdots \,|\, \psi_n]$ is a fundamental matrix for Eq. (4.11.30), and let $\varphi$ be a nontrivial solution of (4.11.30). By Theorem 4.11.32 there exist unique scalars $\alpha_1, \ldots, \alpha_n \in F$, not all zero, such that

$$\varphi = \sum_{i=1}^{n} \alpha_i \psi_i,$$

or

$$\varphi = \Psi a, \tag{4.11.42}$$

where $a^T = (\alpha_1, \ldots, \alpha_n)$. Equation (4.11.42) constitutes a system of $n$ linear equations with unknowns $\alpha_1, \ldots, \alpha_n$ at any $\tau \in T$ and has a unique solution for any choice of $\varphi(\tau)$. Hence, we have $\det \Psi(\tau) \neq 0$, and it now follows from Theorem 4.11.37 that $\det \Psi(t) \neq 0$ for any $t \in T$.

Conversely, let $\Psi$ be a solution of the matrix equation (4.11.36) and assume that $\det \Psi(t) \neq 0$ for all $t \in T$. Then the columns of $\Psi$ are linearly independent for all $t \in T$. $\blacksquare$

The reader can readily prove the next result.

4.11.43. Theorem. Let $\Psi$ be a fundamental matrix for Eq. (4.11.30), and let $C$ be an arbitrary $(n \times n)$ non-singular constant matrix. Then $\Psi C$ is also a fundamental matrix for Eq. (4.11.30). Moreover, if $\Gamma$ is any other fundamental matrix for Eq. (4.11.30), then there exists a constant $(n \times n)$ non-singular matrix $P$ such that $\Gamma = \Psi P$.

4.11.44. Exercise. Prove Theorem 4.11.43.

Now let $R(t) = [r_{ij}(t)]$ be an arbitrary matrix such that the scalar-valued functions $r_{ij}(t)$ are Riemann integrable on $T$. We define integration of $R(t)$ componentwise; i.e.,

$$\int R(t)\, dt = \left[\int r_{ij}(t)\, dt\right].$$

Integration of vectors is defined similarly.


In the next result we establish some of the properties of the state transition
matrix, . Hereafter, in order to indicate the dependence of. on l' as well
as t, we will write .(t, 1'). By b4 (t, 1'), we mean u.(t, 1')/ut.
.4 11.45.
Theorem. eL t D be defined by Eq. (4.11.28), let l' E T, let
cp(1') = ~, let (1',)~
E D, and let .(t,1' ) denote the state transition matrix
for Eq. (4.11.30) for all t E T. Then
(i) b4 (t, f) = A(t).(t, 1') with .(1' , 1' ) = I, where I denotes the (n x
identity matrix;
(ii) the unique solution of Eq. (4.11.30) is given by
,(t)

for all t E T;
(iii) .(t, f) is non-singular for alI t
(iv) for any t, (J E T we have

.(t,1' )

n)

(4.11.46)

.(t, 1'~
T;

= .(t, (J~(J,

f);

(v) [.(t,1')-1
t:. .- I (t, f) =
.(- r , t) for all t E T; and
(vi) the unique solution of Eq. (4.11.29) is given by

cp(t)

= .(t, 1')~

f .(t,

")v(,,)d,,.

(4.11.47)

Proof. The first part of the theorem follows from the definition of the state transition matrix.

To prove the second part, assume that $\varphi(t) = \Phi(t, \tau)\xi$. Differentiating with respect to $t$ we have

$$\dot{\varphi}(t) = \dot{\Phi}(t, \tau)\xi = A(t)\Phi(t, \tau)\xi = A(t)\varphi(t).$$

Furthermore, $\varphi(\tau) = \Phi(\tau, \tau)\xi = \xi$. From the uniqueness results (to be presented in the next chapter) it follows that the specified $\varphi$ is indeed the solution of Eq. (4.11.30).

The third part of the theorem is a consequence of Theorem 4.11.41.

To prove the fourth part of the theorem we note that $\varphi(t) = \Phi(t, \tau)\xi$ is the unique solution of Eq. (4.11.30) satisfying $\varphi(\tau) = \xi$, and also that $\varphi(\sigma) = \Phi(\sigma, \tau)\xi$, $\sigma \in T$. Now consider the solution of Eq. (4.11.30) with initial condition given at $\sigma$ in place of $\tau$; i.e., $\varphi(t) = \Phi(t, \sigma)\varphi(\sigma)$. Then

$$\varphi(t) = \Phi(t, \tau)\xi = \Phi(t, \sigma)\Phi(\sigma, \tau)\xi.$$

Since this equation holds for arbitrary $\xi$ in the $x$ space, we have

$$\Phi(t, \tau) = \Phi(t, \sigma)\Phi(\sigma, \tau).$$

To prove the fifth part of the theorem we note that $\Phi^{-1}(t, \tau)$ exists by part (iii). From part (iv) it now follows that

$$I = \Phi(t, \tau)\Phi(\tau, t),$$

where $I$ denotes the $(n \times n)$ identity matrix. Thus,

$$\Phi^{-1}(t, \tau) = \Phi(\tau, t)$$

for all $t \in T$.

In the next chapter we will show that under the present assumptions, Eq. (4.11.29) possesses a unique solution for every $(\tau, \xi) \in D$, where $\varphi(\tau) = \xi$. Thus, to prove the last part of the theorem, we must show that the function (4.11.47) is this solution. Differentiating with respect to $t$ we have

$$\begin{aligned} \dot{\varphi}(t) &= \dot{\Phi}(t, \tau)\xi + \Phi(t, t)v(t) + \int_\tau^t \dot{\Phi}(t, \eta)v(\eta)\, d\eta \\ &= A(t)\Phi(t, \tau)\xi + v(t) + \int_\tau^t A(t)\Phi(t, \eta)v(\eta)\, d\eta \\ &= A(t)\left[\Phi(t, \tau)\xi + \int_\tau^t \Phi(t, \eta)v(\eta)\, d\eta\right] + v(t) \\ &= A(t)\varphi(t) + v(t). \end{aligned}$$

Also, $\varphi(\tau) = \xi$. Therefore, $\varphi$ is the unique solution of Eq. (4.11.29). $\blacksquare$
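Parts (iv) and (v) of the theorem can be illustrated in the constant-coefficient case, where $\Phi(t, \tau) = e^{A(t-\tau)}$. A sketch (assuming Python with NumPy; the matrix $A$ and the time points are arbitrary, and the exponential is evaluated by a truncated power series):

```python
import numpy as np

# Numerical check of Theorem 4.11.45 (iv) and (v) in the constant-coefficient
# case Phi(t, tau) = e^{A(t - tau)}, with the exponential evaluated by a
# truncated power series.
def expm_series(M, terms=40):
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
Phi = lambda t, tau: expm_series(A * (t - tau))

t, sigma, tau = 1.2, 0.7, 0.3
err_semigroup = float(np.max(np.abs(Phi(t, tau) - Phi(t, sigma) @ Phi(sigma, tau))))
err_inverse = float(np.max(np.abs(np.linalg.inv(Phi(t, tau)) - Phi(tau, t))))
print(err_semigroup, err_inverse)  # both are numerically zero
```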

In engineering and physics, $x$ is interpreted as representing the "state" of a physical system described by appropriate ordinary differential equations. In Eq. (4.11.46), the matrix $\Phi(t, \tau)$ relates the "states" of the system at the points $t \in T$ and $\tau \in T$; hence the name "state transition matrix."
Next, we wish to examine the properties of linear ordinary differential equations with constant coefficients given by Eq. (4.11.31). We require the following preliminary result.
4.11.48. Theorem. Let $A$ be a constant $(n \times n)$ matrix ($A$ may be real or complex). Let $S_N(t)$ denote the matrix

$$S_N(t) = I + \sum_{k=1}^{N} \frac{t^k}{k!} A^k.$$

Then each element of the matrix $S_N(t)$ converges absolutely and uniformly on any finite interval $(-t_1, t_1)$, $t_1 > 0$, as $N \to \infty$.

Proof. Let $a_{ij}^{(k)}$ denote the $(i, j)$th element of the matrix $A^k$, where $i, j = 1, \ldots, n$, and $k = 0, 1, 2, \ldots$. Then the $(i, j)$th element of $S_N(t)$ is equal to

$$\delta_{ij} + \sum_{k=1}^{N} a_{ij}^{(k)} \frac{t^k}{k!},$$

where $\delta_{ij}$ is the Kronecker delta. We now show that

$$\delta_{ij} + \sum_{k=1}^{\infty} a_{ij}^{(k)} \frac{t^k}{k!}$$

converges absolutely and uniformly for all $i, j$. Let $m = \max_i \left(\sum_{p=1}^{n} |a_{ip}|\right)$. Then $m$ is a constant which depends on the elements of the matrix $A$. Since $A^{k+1} = A \cdot A^k$, we have

$$\max_{i,j} \left|a_{ij}^{(k+1)}\right| = \max_{i,j} \left|\sum_{p=1}^{n} a_{ip}\, a_{pj}^{(k)}\right| \leq \max_i \left(\sum_{p=1}^{n} |a_{ip}|\right)\left(\max_{p,j} \left|a_{pj}^{(k)}\right|\right) \leq m \cdot \max_{i,j} \left|a_{ij}^{(k)}\right|.$$

When $k = 1$ we have $\max_{i,j} |a_{ij}^{(1)}| \leq m$, and by induction it follows that $\max_{i,j} |a_{ij}^{(k)}| \leq m^k$. Then we have for any $t \in (-t_1, t_1)$, $t_1 > 0$, and for any $i, j$,

$$\left|a_{ij}^{(k)} \frac{t^k}{k!}\right| \leq \frac{m^k t_1^k}{k!}.$$

Now let $M_k = (m t_1)^k / k!$. Since

$$\sum_{k=1}^{\infty} M_k = e^{m t_1} - 1,$$

it follows that

$$\delta_{ij} + \sum_{k=1}^{\infty} a_{ij}^{(k)} \frac{t^k}{k!}$$

is an absolutely and uniformly convergent series for each $i, j$ over the interval $(-t_1, t_1)$ by the Weierstrass M-test. $\blacksquare$

We are now in a position to consider the following:

4.11.49. Definition. Let $A$ be a constant $(n \times n)$ matrix. We define $e^{At}$ to be the matrix

$$e^{At} = I + \sum_{k=1}^{\infty} \frac{t^k}{k!} A^k$$

for any $-\infty < t < \infty$.

We note immediately that $e^{At}\big|_{t=0} = I$.
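The tail bound used in the proof of Theorem 4.11.48 can be observed numerically. A sketch (assuming Python with NumPy; the matrix $A$ and the interval endpoint $t_1$ are arbitrary choices):

```python
import math
import numpy as np

# The partial sums S_N(t) = I + sum_{k<=N} t^k A^k / k! converge, and the
# error after N terms is dominated by the tail of the scalar bound
# sum_k (m*t1)^k / k! used in the proof of Theorem 4.11.48.
A = np.array([[1.0, 2.0], [0.5, -1.0]])
m = float(np.max(np.sum(np.abs(A), axis=1)))   # m = max_i sum_p |a_ip|
t1 = 2.0

def S(N):
    out, term = np.eye(2), np.eye(2)
    for k in range(1, N + 1):
        term = term @ (A * t1) / k
        out = out + term
    return out

reference = S(80)                   # effectively the limit e^{A t1}
errors, bounds = [], []
for N in (5, 10, 20):
    errors.append(float(np.max(np.abs(S(N) - reference))))
    bounds.append(sum((m * t1) ** k / math.factorial(k) for k in range(N + 1, 81)))
print(errors)   # decreasing rapidly
print(bounds)   # each error lies below the corresponding tail bound
```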

We now prove:

4.11.50. Theorem. Let $T = (-\infty, \infty)$, let $\tau \in T$, and let $A$ be a constant $(n \times n)$ matrix. Then

(i) the state transition matrix for Eq. (4.11.31) is given by $\Phi(t, \tau) = e^{A(t - \tau)}$ for all $t \in T$;
(ii) the matrix $e^{At}$ is non-singular for all $t \in T$;
(iii) $e^{At_1} e^{At_2} = e^{A(t_1 + t_2)}$ for all $t_1, t_2 \in T$;
(iv) $A e^{At} = e^{At} A$ for all $t \in T$; and
(v) $(e^{At})^{-1} = e^{-At}$ for all $t \in T$.
Proof. To prove the first part we must show that $\Phi(t, \tau)$ satisfies the matrix equation

$$\dot{\Phi}(t, \tau) = A\Phi(t, \tau)$$

for all $t \in T$, with $\Phi(\tau, \tau) = I$. Now, by definition,

$$\Phi(t, \tau) = e^{A(t - \tau)} = I + \sum_{k=1}^{\infty} \frac{(t - \tau)^k}{k!} A^k.$$

In view of Theorem 4.11.48 we may differentiate the above series term by term. In doing so we obtain

$$\frac{d}{dt}\left[e^{A(t - \tau)}\right] = A + \sum_{k=1}^{\infty} \frac{(t - \tau)^k}{k!} A^{k+1} = A\left[I + \sum_{k=1}^{\infty} \frac{(t - \tau)^k}{k!} A^k\right] = A e^{A(t - \tau)},$$

and thus we have

$$\dot{\Phi}(t, \tau) = A\Phi(t, \tau)$$

for all $t \in T$, with $\Phi(\tau, \tau) = e^{A(\tau - \tau)} = I$. Therefore, $e^{A(t - \tau)}$ is the state transition matrix for Eq. (4.11.31).

The second part of the theorem is obvious.

To prove the third part of the theorem, we note that for any $t_1, t_2 \in T$ we have $\Phi(t_1, -t_2) = e^{A(t_1 + t_2)}$, $\Phi(t_1, 0) = e^{At_1}$, and $\Phi(0, -t_2) = e^{At_2}$, which yields the desired result.

To prove the fourth part of the theorem we note that for all $t \in T$,

$$A\left(I + \sum_{k=1}^{\infty} \frac{t^k}{k!} A^k\right) = A + \sum_{k=1}^{\infty} \frac{t^k}{k!} A^{k+1} = \left(I + \sum_{k=1}^{\infty} \frac{t^k}{k!} A^k\right)A.$$

Finally, to prove the last part of the theorem, note that for all $t \in T$,

$$e^{At} \cdot e^{A(-t)} = e^{A(t - t)} = I.$$

Therefore, $(e^{At})^{-1} = e^{-At}$. $\blacksquare$

The following natural question arises: can we find an expression similar to $e^{At}$ for the case when $A = A(t)$, $t \in T$? The answer is, in general, no. However, there is a special case when such a generalization is valid.

4.11.51. Theorem. If for Eq. (4.11.30) $A(t_1)A(t_2) = A(t_2)A(t_1)$ for all $t_1, t_2 \in T$, then the state transition matrix $\Phi(t, \tau)$ is given by

$$\Phi(t, \tau) = e^{\int_\tau^t A(\eta)\, d\eta} = e^{B(t, \tau)} = I + \sum_{k=1}^{\infty} \frac{1}{k!} B^k(t, \tau),$$

where $B(t, \tau) = \int_\tau^t A(\eta)\, d\eta$.

4.11.52. Exercise. Prove Theorem 4.11.51.
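Theorem 4.11.51 can be illustrated with a diagonal (hence commuting) choice of $A(t)$. A sketch (assuming Python with NumPy; $A(t) = \mathrm{diag}(1, 2t)$ is an arbitrary example):

```python
import numpy as np

# For A(t) = diag(1, 2t), B(t, 0) = diag(t, t^2), so Theorem 4.11.51
# predicts Phi(1, 0) = e^{B(1,0)} = diag(e, e).  Compare with a direct
# Runge-Kutta integration of Psi' = A(t) Psi, Psi(0) = I.
def A(t):
    return np.diag([1.0, 2.0 * t])

Psi, h, n = np.eye(2), 1e-3, 1000          # integrate from t = 0 to t = 1
for i in range(n):
    t = i * h
    k1 = A(t) @ Psi
    k2 = A(t + h / 2) @ (Psi + h / 2 * k1)
    k3 = A(t + h / 2) @ (Psi + h / 2 * k2)
    k4 = A(t + h) @ (Psi + h * k3)
    Psi = Psi + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

predicted = np.diag([np.e, np.e])
err = float(np.max(np.abs(Psi - predicted)))
print(err)  # the integrated transition matrix matches e^{B(1,0)}
```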

We note that a sufficient condition for $A(t_1)$ to commute with $A(t_2)$ for all $t_1, t_2 \in T$ is that $A(t)$ be a diagonal matrix.

4.11.53. Exercise. Find the state transition matrix for $\dot{x} = A(t)x$, where

$$A(t) = \begin{bmatrix} \; \cdots \; \end{bmatrix}.$$

The reader will find it instructive to verify the following additional results.

4.11.54. Exercise. Let $\Lambda$ denote the $(n \times n)$ diagonal matrix

$$\Lambda = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}.$$

Show that

$$e^{\Lambda t} = \begin{bmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n t} \end{bmatrix}$$

for all $t \in T = (-\infty, \infty)$.

4.11.55. Exercise. Let $t \in T = (-\infty, \infty)$, let $\tau \in T$, and let $\xi \in R^n$ (or $C^n$). Let $A$ be the $(n \times n)$ matrix for Eq. (4.11.31), and let $\varphi$ denote the unique solution of Eq. (4.11.31) with $\varphi(\tau) = \xi$. Let $P$ be a similarity transformation for $A$, and let $B = P^{-1}AP$.

(a) Show that $e^{At} = P e^{Bt} P^{-1}$ for all $t \in T$.
(b) Show that the unique solution of Eq. (4.11.31) is given by $\varphi = P\psi$, where $\psi$ is the unique solution of the initial-value problem

$$\dot{y} = By \quad \text{with} \quad \psi(\tau) = P^{-1}\varphi(\tau) = P^{-1}\xi.$$

4.11.56. Exercise. Let $D$ be defined by (4.11.28). In Eq. (4.11.29), let $A(t) = A$ for all $t \in T$; i.e.,

$$\dot{x} = Ax + v(t). \tag{4.11.57}$$

Let $\tau \in T$, and let $\varphi$ denote the unique solution of Eq. (4.11.57) with $\varphi(\tau) = \xi$. Let $P$ be a similarity transformation for $A$, and let $B = P^{-1}AP$. Show that the unique solution of Eq. (4.11.57) is given by

$$\varphi = P\psi,$$

where $\psi$ is the unique solution of the initial-value problem

$$\dot{y} = By + P^{-1}v(t) \quad \text{with} \quad \psi(\tau) = P^{-1}\xi, \qquad (\tau, \psi(\tau)) \in D, \quad t \in T.$$

4.11.58. Exercise. Let $J$ denote the Jordan canonical form of the $(n \times n)$ matrix $A$ of Eq. (4.11.31), and let $M$ denote the non-singular $(n \times n)$ matrix which transforms $A$ into $J$; i.e., $J = M^{-1}AM$. Then

$$J = \begin{bmatrix} J_0 & & & 0 \\ & J_1 & & \\ & & \ddots & \\ 0 & & & J_p \end{bmatrix},$$

where

$$J_0 = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_k \end{bmatrix}, \qquad J_m = \begin{bmatrix} \lambda_{k+m} & 1 & & 0 \\ & \lambda_{k+m} & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_{k+m} \end{bmatrix},$$

$m = 1, \ldots, p$, and where $\lambda_1, \ldots, \lambda_k, \lambda_{k+1}, \ldots, \lambda_{k+p}$ denote the (not necessarily distinct) eigenvalues of $A$. Show that

$$e^{At} = M e^{Jt} M^{-1}, \qquad e^{Jt} = \begin{bmatrix} e^{J_0 t} & & 0 \\ & \ddots & \\ 0 & & e^{J_p t} \end{bmatrix},$$

and

$$e^{J_m t} = e^{\lambda_{k+m} t} \begin{bmatrix} 1 & t & \dfrac{t^2}{2!} & \cdots & \dfrac{t^{\nu_m - 1}}{(\nu_m - 1)!} \\ 0 & 1 & t & \cdots & \dfrac{t^{\nu_m - 2}}{(\nu_m - 2)!} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix},$$

where $J_m$ is a $(\nu_m \times \nu_m)$ matrix and $k + \nu_1 + \cdots + \nu_p = n$.
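The closed form for $e^{J_m t}$ can be checked against the series definition for a single small Jordan block. A sketch (assuming Python with NumPy; $\lambda$ and $t$ are arbitrary):

```python
import numpy as np

# A single (3 x 3) Jordan block J = lam*I + N, N the superdiagonal of ones.
# Since lam*I and N commute, e^{Jt} = e^{lam t} (I + tN + t^2 N^2 / 2!),
# i.e. e^{lam t} [[1, t, t^2/2], [0, 1, t], [0, 0, 1]].
lam, t = -0.5, 1.3
J = lam * np.eye(3) + np.diag([1.0, 1.0], k=1)

def expm_series(M, terms=60):
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

closed_form = np.exp(lam * t) * np.array([
    [1.0, t, t ** 2 / 2.0],
    [0.0, 1.0, t],
    [0.0, 0.0, 1.0],
])
diff = float(np.max(np.abs(expm_series(J * t) - closed_form)))
print(diff)  # the closed form agrees with the series definition of e^{Jt}
```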

Next, we consider initial-value problems characterized by linear nth-order ordinary differential equations given by

$$a_n(t)x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = v(t), \tag{4.11.59}$$
$$a_n(t)x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = 0, \tag{4.11.60}$$
$$a_n x^{(n)} + a_{n-1}x^{(n-1)} + \cdots + a_1 x^{(1)} + a_0 x = 0. \tag{4.11.61}$$

In Eqs. (4.11.59) and (4.11.60), $v(t)$ and $a_i(t)$, $i = 0, \ldots, n$, are functions which are defined and continuous on a real $t$ interval $T$, and in Eq. (4.11.61) the $a_i$, $i = 0, \ldots, n$, are constant coefficients. We assume that $a_n \neq 0$, that $a_n(t) \neq 0$ for any $t \in T$, and that $v(t)$ is not identically zero. Furthermore, the coefficients $a_i$, $a_i(t)$, $i = 0, \ldots, n$, may be either real or complex.

In accordance with Eq. (4.11.23), we can reduce the study of Eq. (4.11.60) to the study of the system of $n$ first-order ordinary differential equations

$$\dot{x} = A(t)x, \tag{4.11.62}$$

where

$$A(t) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -\dfrac{a_0(t)}{a_n(t)} & -\dfrac{a_1(t)}{a_n(t)} & -\dfrac{a_2(t)}{a_n(t)} & \cdots & -\dfrac{a_{n-1}(t)}{a_n(t)} \end{bmatrix}. \tag{4.11.63}$$

In this case the matrix $A(t)$ is said to be in companion form. Since $A(t)$ is continuous on $T$, there exists for all $t \in T$ a unique solution $\varphi$ to the initial-value problem

$$\dot{x} = A(t)x, \qquad x(\tau) = \xi = (\xi_1, \ldots, \xi_n)^T, \tag{4.11.64}$$

where $\tau \in T$ and $\xi \in R^n$ (or $C^n$) (this will be proved in the next chapter). Moreover, the first component $\varphi_1$ of $\varphi$ is the solution of Eq. (4.11.60) satisfying

$$\varphi_1(\tau) = \xi_1, \quad \varphi_1^{(1)}(\tau) = \xi_2, \quad \ldots, \quad \varphi_1^{(n-1)}(\tau) = \xi_n.$$

Now let $\psi_1, \ldots, \psi_n$ be solutions of Eq. (4.11.60). Then we can readily verify that the matrix

$$\Psi(t) = \begin{bmatrix} \psi_1(t) & \cdots & \psi_n(t) \\ \psi_1^{(1)}(t) & \cdots & \psi_n^{(1)}(t) \\ \vdots & & \vdots \\ \psi_1^{(n-1)}(t) & \cdots & \psi_n^{(n-1)}(t) \end{bmatrix} \tag{4.11.65}$$

is a solution of the matrix equation

$$\dot{X} = A(t)X, \tag{4.11.66}$$

where $A(t)$ is defined by Eq. (4.11.63). We call the determinant of $\Psi$ the Wronskian of Eq. (4.11.60) with respect to the solutions $\psi_1, \ldots, \psi_n$, and we denote it by

$$\det \Psi = W(\psi_1, \ldots, \psi_n). \tag{4.11.67}$$

Note that for a fixed set of solutions $\psi_1, \ldots, \psi_n$ (and considering $\tau$ fixed), the Wronskian is a function of $t$. To indicate this, we write $W(\psi_1, \ldots, \psi_n)(t)$.

In view of Theorem 4.11.37 we have, for all $t \in T$,

$$W(\psi_1, \ldots, \psi_n)(t) = \det \Psi(t) = \det \Psi(\tau)\, e^{\int_\tau^t \operatorname{tr} A(\eta)\, d\eta} = W(\psi_1, \ldots, \psi_n)(\tau)\, e^{-\int_\tau^t [a_{n-1}(\eta)/a_n(\eta)]\, d\eta}. \tag{4.11.68}$$

4.11.69. Example. Consider the second-order ordinary differential equation

$$t^2 x^{(2)} + t x^{(1)} - x = 0, \qquad 0 < t < \infty. \tag{4.11.70}$$

The functions $\psi_1(t) = t$ and $\psi_2(t) = 1/t$ are clearly solutions of Eq. (4.11.70). Consider now the matrix

$$\Psi(t) = \begin{bmatrix} t & 1/t \\ 1 & -1/t^2 \end{bmatrix}.$$

Then

$$W(\psi_1, \psi_2)(t) = \det \Psi(t) = -\frac{2}{t}, \qquad t > 0.$$

Using the notation of Eq. (4.11.63), we have in the present case $a_1(t)/a_2(t) = 1/t$. From Eq. (4.11.68) we have, for any $\tau > 0$,

$$W(\psi_1, \psi_2)(t) = \det \Psi(t) = W(\psi_1, \psi_2)(\tau)\, e^{-\int_\tau^t (1/\eta)\, d\eta} = -\frac{2}{\tau} \cdot \frac{\tau}{t} = -\frac{2}{t}, \qquad t > 0,$$

which checks.
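Example 4.11.69 can also be verified numerically. A sketch (assuming Python with NumPy):

```python
import numpy as np

# psi1(t) = t and psi2(t) = 1/t are substituted into t^2 x'' + t x' - x = 0,
# and the Wronskian det Psi(t) is compared with -2/t.
t = np.linspace(0.5, 3.0, 50)

def residual(psi, dpsi, ddpsi):
    return float(np.max(np.abs(t ** 2 * ddpsi(t) + t * dpsi(t) - psi(t))))

r1 = residual(lambda s: s, lambda s: np.ones_like(s), lambda s: np.zeros_like(s))
r2 = residual(lambda s: 1 / s, lambda s: -1 / s ** 2, lambda s: 2 / s ** 3)
W = t * (-1 / t ** 2) - 1 * (1 / t)       # det of Psi(t) from the example
wronskian_err = float(np.max(np.abs(W + 2 / t)))
print(r1, r2, wronskian_err)  # all three are numerically zero
```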

The reader will have no difficulty in proving the following:

4.11.71. Theorem. A set of $n$ solutions of Eq. (4.11.60), $\psi_1, \ldots, \psi_n$, is linearly independent on a $t$ interval $T$ if and only if $W(\psi_1, \ldots, \psi_n)(t) \neq 0$ for all $t \in T$. Moreover, every solution of Eq. (4.11.60) is a linear combination of any set of $n$ linearly independent solutions.

4.11.72. Exercise. Prove Theorem 4.11.71.

We call a set of $n$ solutions of Eq. (4.11.60), $\psi_1, \ldots, \psi_n$, which is linearly independent on $T$ a fundamental set for Eq. (4.11.60).
Let us next turn our attention to the non-homogeneous linear nth-order ordinary differential equation (4.11.59). Without loss of generality, let us assume that $a_n(t) = 1$ for all $t \in T$; i.e., let us consider

$$x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = v(t). \tag{4.11.73}$$

The study of this equation reduces to the study of the system of $n$ first-order ordinary differential equations

$$\dot{x} = A(t)x + b(t), \tag{4.11.74}$$

where

$$A(t) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0(t) & -a_1(t) & -a_2(t) & \cdots & -a_{n-1}(t) \end{bmatrix}, \qquad b(t) = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ v(t) \end{bmatrix}. \tag{4.11.75}$$

In the next chapter we will show that for all $t \in T$ there exists a unique solution $\psi$ to the initial-value problem

$$\dot{x} = A(t)x + b(t), \qquad x(\tau) = \xi = (\xi_1, \ldots, \xi_n)^T, \tag{4.11.76}$$

where $\tau \in T$ and $\xi \in R^n$ (or $C^n$). The first component $\psi_1$ of $\psi$ is the solution of Eq. (4.11.59), with $a_n(t) = 1$ for all $t \in T$, satisfying

$$\psi_1(\tau) = \xi_1, \quad \psi_1^{(1)}(\tau) = \xi_2, \quad \ldots, \quad \psi_1^{(n-1)}(\tau) = \xi_n.$$
We now have:

4.11.77. Theorem. Let {ψ₁, ..., ψₙ} be a fundamental set for the equation

$$x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = 0. \qquad (4.11.78)$$

Then the solution φ of the equation

$$x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = v(t), \qquad (4.11.79)$$

satisfying φ(τ) = ξ = (ξ₁, ..., ξₙ)ᵀ, τ ∈ T, ξ ∈ Rⁿ (or Cⁿ), is given by the expression

$$\phi(t) = \phi_h(t) + \sum_{i=1}^{n} \psi_i(t) \int_{\tau}^{t} \frac{W_i(\psi_1, \ldots, \psi_n)(s)}{W(\psi_1, \ldots, \psi_n)(s)}\, v(s)\, ds, \qquad (4.11.80)$$

where φ_h is the solution of Eq. (4.11.78) with φ_h(τ) = ξ, and where Wᵢ(ψ₁, ..., ψₙ)(t) is obtained from W(ψ₁, ..., ψₙ)(t) by replacing the ith column of W(ψ₁, ..., ψₙ)(t) by (0, 0, ..., 1)ᵀ.

4.11.81. Exercise. Prove Theorem 4.11.77.

Let us consider a specific case.

4.11.82. Example. Consider the second-order ordinary differential equation

$$t^2 x^{(2)} + t x^{(1)} - x = b(t), \qquad t > 0, \qquad (4.11.83)$$

where b(t) is a real continuous function for all t > 0. This equation is equivalent to

$$x^{(2)} + \frac{1}{t}\, x^{(1)} - \frac{1}{t^2}\, x = v(t), \qquad (4.11.84)$$

where v(t) = b(t)/t². From Example 4.11.69 we have ψ₁(t) = t, ψ₂(t) = 1/t, and W(ψ₁, ψ₂)(t) = −2/t. Also,

$$W_1(\psi_1, \psi_2)(t) = \begin{vmatrix} 0 & 1/t \\ 1 & -1/t^2 \end{vmatrix} = -\frac{1}{t}, \qquad W_2(\psi_1, \psi_2)(t) = \begin{vmatrix} t & 0 \\ 1 & 1 \end{vmatrix} = t. \qquad \blacksquare$$
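The variation-of-constants expression (4.11.80) can be spot-checked numerically for this example. The sketch below (an illustration under stated assumptions, not part of the text) takes b(t) = t², i.e. v(t) = 1, and τ = 1, evaluates the two integrals in (4.11.80) by a trapezoid rule, and compares the result with the closed form obtained by carrying out the integrations by hand, φ_p(t) = t²/3 − t/2 + 1/(6t):

```python
# Spot-check of Eq. (4.11.80) for Example 4.11.82 with b(t) = t^2 (so v(t) = 1)
# and tau = 1.  Facts used from the example: psi1(t) = t, psi2(t) = 1/t,
# W = -2/t, W1 = -1/t, W2 = t.  Hand integration gives
# phi_p(t) = t^2/3 - t/2 + 1/(6t).

def trapezoid(f, a, b, n=20000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n))
    return s * h

def phi_p(t, tau=1.0):
    v = lambda s: 1.0                                    # v(s) = b(s)/s^2 = 1
    ratio1 = lambda s: (-1.0 / s) / (-2.0 / s) * v(s)    # (W1/W) v = 1/2
    ratio2 = lambda s: s / (-2.0 / s) * v(s)             # (W2/W) v = -s^2/2
    return t * trapezoid(ratio1, tau, t) + (1.0 / t) * trapezoid(ratio2, tau, t)

for t in [1.5, 2.0, 3.0]:
    exact = t**2 / 3 - t / 2 + 1.0 / (6 * t)
    assert abs(phi_p(t) - exact) < 1e-6
```

Note that this φ_p satisfies φ_p(1) = 0 and φ_p⁽¹⁾(1) = 0, so it is the solution of (4.11.84) with τ = 1 and ξ = (0, 0)ᵀ; for other initial data the homogeneous term φ_h of (4.11.80) must be added.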

Let us next focus our attention on linear nth-order ordinary differential equations with constant coefficients. Without loss of generality, let us assume that, in Eq. (4.11.61), aₙ = 1. We have

$$x^{(n)} + a_{n-1}x^{(n-1)} + \cdots + a_1 x^{(1)} + a_0 x = 0. \qquad (4.11.85)$$

We call the algebraic equation

$$p(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = 0 \qquad (4.11.86)$$

the characteristic equation of the differential equation (4.11.85).

As was done before, we see that the study of Eq. (4.11.85) reduces to the study of the system of first-order ordinary differential equations given by

$$\dot{x} = Ax, \qquad (4.11.87)$$

where

$$A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} \end{bmatrix}. \qquad (4.11.88)$$
We now show that the eigenvalues of matrix A of Eq. (4.11.88) are precisely the roots of the characteristic equation (4.11.86). First we consider

$$\det(A - \lambda I) = \begin{vmatrix} -\lambda & 1 & 0 & \cdots & 0 \\ 0 & -\lambda & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} - \lambda \end{vmatrix}.$$

Expanding this determinant about its first column yields a term $-\lambda$ times the corresponding (n − 1) × (n − 1) determinant of the same form, plus the term $(-1)^{n+1}(-a_0)$ times the determinant of a triangular matrix with ones along its diagonal. Using induction we arrive at the expression

$$\det(A - \lambda I) = (-1)^n \{\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0\}. \qquad (4.11.89)$$

It follows from Eq. (4.11.89) that λ is an eigenvalue of A if and only if λ is a root of the characteristic equation (4.11.86).
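Identity (4.11.89) is easy to confirm numerically. The following sketch (an illustration, with ad hoc helper functions) builds the companion matrix A for specific coefficients, evaluates det(A − λI) by Gaussian elimination, and compares it with (−1)ⁿ p(λ):

```python
# Numerical check of Eq. (4.11.89): det(A - lambda*I) = (-1)^n * p(lambda),
# where A is the companion matrix (4.11.88) of p(s) = s^n + a_{n-1} s^{n-1} + ... + a_0.

def companion(a):
    """Companion matrix (4.11.88) for coefficients a = [a_0, ..., a_{n-1}]."""
    n = len(a)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1.0
    A[n - 1] = [-c for c in a]
    return A

def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for j in range(n):
        piv = max(range(j, n), key=lambda i: abs(M[i][j]))
        if abs(M[piv][j]) < 1e-14:
            return 0.0
        if piv != j:
            M[j], M[piv] = M[piv], M[j]
            d = -d
        d *= M[j][j]
        for i in range(j + 1, n):
            f = M[i][j] / M[j][j]
            M[i] = [x - f * y for x, y in zip(M[i], M[j])]
    return d

a = [6.0, -5.0, 2.0]                      # p(s) = s^3 + 2s^2 - 5s + 6
for lam in [-1.0, 0.5, 2.0]:
    A = companion(a)
    for i in range(3):
        A[i][i] -= lam
    p = lam**3 + 2.0 * lam**2 - 5.0 * lam + 6.0
    assert abs(det(A) - (-1.0)**3 * p) < 1e-9
```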

4.11.90. Exercise. Assume that the eigenvalues of matrix A given in Eq. (4.11.88) are all real and distinct. Let Λ denote the diagonal matrix

$$\Lambda = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}, \qquad (4.11.91)$$

where λ₁, ..., λₙ denote the eigenvalues of matrix A. Let V denote the Vandermonde matrix given by

$$V = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_n \\ \lambda_1^2 & \lambda_2^2 & \cdots & \lambda_n^2 \\ \vdots & \vdots & & \vdots \\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \end{bmatrix}.$$

(a) Show that V is non-singular.
(b) Show that Λ = V⁻¹AV.
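The identity in part (b) is equivalent to AV = VΛ; i.e., the jth column of V, (1, λⱼ, λⱼ², ..., λⱼⁿ⁻¹)ᵀ, is an eigenvector of A for λⱼ. A quick numerical illustration (not a proof), using a cubic with the arbitrarily chosen roots 1, 2, 3:

```python
# Check that each Vandermonde column v_j = (1, l, l^2)^T is an eigenvector of
# the companion matrix A of p(s) = s^3 - 6s^2 + 11s - 6 = (s-1)(s-2)(s-3),
# so that A V = V Lambda (equivalent to Lambda = V^{-1} A V, Exercise 4.11.90).

A = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [6.0, -11.0, 6.0]]          # last row: -a_0, -a_1, -a_2

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

for lam in [1.0, 2.0, 3.0]:      # the distinct eigenvalues
    v = [1.0, lam, lam**2]       # Vandermonde column for lam
    Av = matvec(A, v)
    assert all(abs(Av[i] - lam * v[i]) < 1e-9 for i in range(3))
```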
Before closing the present section, let us consider so-called "adjoint systems." To this end let us consider once more Eq. (4.11.30); i.e.,

$$\dot{x} = A(t)x. \qquad (4.11.92)$$

Let A*(t) denote the conjugate transpose of A(t). (That is, if A(t) = [aᵢⱼ(t)], then A*(t) = [āᵢⱼ(t)]ᵀ = [āⱼᵢ(t)], where āᵢⱼ(t) denotes the complex conjugate of aᵢⱼ(t).) We call the system of linear first-order ordinary differential equations

$$\dot{y} = -A^*(t)y \qquad (4.11.93)$$

the adjoint system to (4.11.92).

4.11.94. Exercise. Let Ψ be a fundamental matrix of Eq. (4.11.92). Show that Φ is a fundamental matrix for Eq. (4.11.93) if and only if

$$\Phi^*\Psi = C,$$

where C is a constant non-singular matrix, and where Φ* denotes the conjugate transpose of Φ.
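A concrete numerical illustration (with arbitrarily chosen data, not from the text): for the constant real system ẋ = Ax with A = [[0, 1], [−1, 0]], a fundamental matrix is Ψ(t) = [[cos t, sin t], [−sin t, cos t]]. Here −A* = −Aᵀ = A, so Φ = Ψ is also a fundamental matrix of the adjoint system, and Φ*Ψ should be a constant non-singular matrix (in this case the identity):

```python
import math

# Psi(t) = [[cos t, sin t], [-sin t, cos t]] solves dx/dt = A x, A = [[0,1],[-1,0]].
# Since -A^T = A, Phi = Psi also solves the adjoint system, and Phi^T Psi
# should be constant (Exercise 4.11.94); here it equals the 2x2 identity.

def psi(t):
    return [[math.cos(t), math.sin(t)],
            [-math.sin(t), math.cos(t)]]

def mat_T_mul(P, Q):
    """Compute P^T Q for 2x2 real matrices (conjugate transpose = transpose)."""
    return [[sum(P[k][i] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

for t in [0.0, 0.7, 2.0, 5.0]:
    C = mat_T_mul(psi(t), psi(t))
    for i in range(2):
        for j in range(2):
            assert abs(C[i][j] - (1.0 if i == j else 0.0)) < 1e-12
```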
It is also possible to consider adjoint equations for linear nth-order ordinary differential equations. Let us for example consider Eq. (4.11.85), the study of which can be reduced to that of Eq. (4.11.87), with A specified by Eq. (4.11.88). Now consider the adjoint system to Eq. (4.11.87), given by

$$\dot{y} = -A^*y, \qquad (4.11.95)$$

where

$$-A^* = \begin{bmatrix} 0 & 0 & \cdots & 0 & \bar{a}_0 \\ -1 & 0 & \cdots & 0 & \bar{a}_1 \\ 0 & -1 & \cdots & 0 & \bar{a}_2 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & -1 & \bar{a}_{n-1} \end{bmatrix}, \qquad (4.11.96)$$

and where āᵢ denotes the complex conjugate of aᵢ, i = 0, ..., n − 1. Equation (4.11.95) represents the system of equations

$$\dot{y}_1 = \bar{a}_0 y_n, \quad \dot{y}_2 = -y_1 + \bar{a}_1 y_n, \quad \ldots, \quad \dot{y}_n = -y_{n-1} + \bar{a}_{n-1} y_n. \qquad (4.11.97)$$

Differentiating the last expression in Eq. (4.11.97) (n − 1) times, eliminating y₁, ..., yₙ₋₁, and letting yₙ = y, we obtain

$$(-1)^n y^{(n)} + (-1)^{n-1} \bar{a}_{n-1} y^{(n-1)} + \cdots + (-1)\bar{a}_1 y^{(1)} + \bar{a}_0 y = 0. \qquad (4.11.98)$$

Equation (4.11.98) is called the adjoint of Eq. (4.11.85).

4.12. NOTES AND REFERENCES

There are many excellent texts on finite-dimensional vector spaces and matrices that can be used to supplement this chapter (see, e.g., [4.1], [4.2], [4.4], and [4.6]–[4.10]). References [4.1], [4.2], [4.6], and [4.10] include applications. (In particular, consult the references in [4.10] for a list of diversified areas of applications.)

Excellent references on ordinary differential equations include [4.3], [4.5], and [4.11].
REFERENCES

[4.1] N. R. AMUNDSON, Mathematical Methods in Chemical Engineering: Matrices and Their Applications. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1966.
[4.2] R. E. BELLMAN, Introduction to Matrix Algebra. New York: McGraw-Hill Book Company, Inc., 1970.
[4.3] F. BRAUER and J. A. NOHEL, Qualitative Theory of Ordinary Differential Equations: An Introduction. New York: W. A. Benjamin, Inc., 1969.*
[4.4] E. T. BROWNE, Introduction to the Theory of Determinants and Matrices. Chapel Hill, N.C.: The University of North Carolina Press, 1958.
[4.5] E. A. CODDINGTON and N. LEVINSON, Theory of Ordinary Differential Equations. New York: McGraw-Hill Book Company, Inc., 1955.
[4.6] F. R. GANTMACHER, Theory of Matrices. Vols. I, II. New York: Chelsea Publishing Company, 1959.
[4.7] P. R. HALMOS, Finite Dimensional Vector Spaces. Princeton, N.J.: D. Van Nostrand Company, Inc., 1958.
[4.8] K. HOFFMAN and R. KUNZE, Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1961.
[4.9] S. LIPSCHUTZ, Linear Algebra. New York: McGraw-Hill Book Company, 1968.
[4.10] B. NOBLE, Applied Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1969.
[4.11] L. S. PONTRYAGIN, Ordinary Differential Equations. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1962.

*Reprinted by Dover Publications, Inc., New York, 1989.

METRIC SPACES

Up to this point in our development we have concerned ourselves primarily with the algebraic structure of mathematical systems. In the present chapter we focus our attention on topological structure. In doing so, we introduce the concepts of "distance" and "closeness." In the final two chapters we will consider mathematical systems endowed with algebraic as well as topological structure.

A generalization of the concept of "distance" is the notion of metric. Using the terminology of geometry, we will refer to elements of an arbitrary set X as points, and we will characterize a metric as a real-valued, non-negative function on X × X satisfying the properties of "distance" between two points of X. We will refer to a mathematical system consisting of a basic set X and a metric defined on it as a metric space. We emphasize that in the present chapter the underlying space X need not be a linear space.

In the first nine sections of the present chapter we establish several basic facts from the theory of metric spaces, while in the last section of the present chapter, which consists of two parts, we consider some applications of the material of the present chapter.

5.1. DEFINITION OF METRIC SPACE

We begin with the following definition of metric and metric space.


5.1.1. Definition. Let X be an arbitrary non-empty set, and let ρ be a real-valued function on X × X, i.e., ρ: X × X → R, where ρ has the following properties:

(i) ρ(x, y) ≥ 0 for all x, y ∈ X, and ρ(x, y) = 0 if and only if x = y;
(ii) ρ(x, y) = ρ(y, x) for all x, y ∈ X; and
(iii) ρ(x, y) ≤ ρ(x, z) + ρ(z, y) for all x, y, z ∈ X.

The function ρ is called a metric on X, and the mathematical system consisting of ρ and X, {X; ρ}, is called a metric space.

The set X is often called the underlying set of the metric space, the elements of X are often called points, and ρ(x, y) is frequently called the distance from a point x ∈ X to a point y ∈ X. In view of axiom (i) the distance between two different points is a unique positive number and is equal to zero if and only if the two points coincide. Axiom (ii) indicates that the distance between points x and y is equal to the distance between points y and x. Axiom (iii) represents the well-known triangle inequality encountered, for example, in plane geometry. Clearly, if ρ is a metric for X and if α is any real positive number, then the function αρ(x, y) is also a metric for X. We are thus in a position to define infinitely many metrics on X.

The above definition of metric was motivated by our notion of distance. Our next result enables us to define metric in an equivalent (and often convenient) way.
5.1.2. Theorem. Let ρ: X × X → R. Then ρ is a metric if and only if

(i) ρ(x, y) = 0 if and only if x = y; and
(ii) ρ(y, z) ≤ ρ(x, y) + ρ(x, z) for all x, y, z ∈ X.

Proof. The necessity is obvious. To prove sufficiency, let x, y, z ∈ X with y = z. Then 0 = ρ(y, y) ≤ 2ρ(x, y). Hence, ρ(x, y) ≥ 0 for all x, y ∈ X. Next, let z = x. Then ρ(y, x) ≤ ρ(x, y). Since x and y are arbitrary, we can reverse their roles and conclude ρ(x, y) ≤ ρ(y, x). Therefore, ρ(x, y) = ρ(y, x) for all x, y ∈ X. Finally, combining (ii) with this symmetry gives ρ(x, y) ≤ ρ(z, x) + ρ(z, y) = ρ(x, z) + ρ(z, y) for all x, y, z ∈ X, the triangle inequality. This proves that ρ is a metric. ■
Different metrics defined on the same underlying set X yield different metric spaces. In applications, the choice of a specific metric is often dictated by the particular problem at hand. If in a particular situation the metric ρ is understood, then we simply write X in place of {X; ρ} to denote the particular metric space under consideration.

Let us now consider a few examples of metric spaces.

5.1.3. Example. Let X be the set of real numbers R, and let the function ρ on R × R be defined as

$$\rho(x, y) = |x - y| \qquad (5.1.4)$$

for all x, y ∈ R, where |x| denotes the absolute value of x. Now clearly ρ(x, y) = |x − y| = 0 if and only if x = y. Also, for all x, y, z ∈ R, we have ρ(y, z) = |y − z| = |(y − x) + (x − z)| ≤ |x − y| + |x − z| = ρ(x, y) + ρ(x, z). Therefore, by Theorem 5.1.2, ρ is a metric and {R; ρ} is a metric space. We call ρ(x, y) defined by Eq. (5.1.4) the usual metric on R, and we call the metric space {R; ρ} the real line. ■

5.1.5. Example. Let X be the set of all complex numbers C. If z ∈ C, then z = a + ib, where i = √−1 and a, b are real numbers. Let z̄ = a − ib and define ρ as

$$\rho(z_1, z_2) = [(z_1 - z_2)(\bar{z}_1 - \bar{z}_2)]^{1/2}. \qquad (5.1.6)$$

It can readily be shown that {C; ρ} is a metric space. We call (5.1.6) the usual metric for C. ■
5.1.7. Example. Let X be an arbitrary non-empty set, and define the function ρ on X × X as

$$\rho(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y. \end{cases} \qquad (5.1.8)$$

Clearly ρ(x, y) ≥ 0 for all x, y ∈ X, ρ(x, x) = 0 for all x ∈ X, and ρ(x, y) ≤ ρ(x, z) + ρ(z, y) for all x, y, z ∈ X. Therefore, (5.1.8) is a metric on X. The function defined in Eq. (5.1.8) is called the discrete metric and is important in analysis because it can be used to metrize any set X. ■
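Since each of ρ(x, y), ρ(x, z), ρ(z, y) takes only the values 0 and 1, the three metric axioms for (5.1.8) can be checked exhaustively on a small finite set. A brief sketch (illustration only):

```python
# Exhaustive check of the metric axioms of Definition 5.1.1 for the
# discrete metric (5.1.8) on a small finite set X.

def rho(x, y):
    return 0 if x == y else 1   # the discrete metric (5.1.8)

X = ["a", "b", "c", "d"]
for x in X:
    for y in X:
        assert rho(x, y) >= 0
        assert (rho(x, y) == 0) == (x == y)             # axiom (i)
        assert rho(x, y) == rho(y, x)                   # axiom (ii)
        for z in X:
            assert rho(x, y) <= rho(x, z) + rho(z, y)   # axiom (iii)
```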

We distinguish between bounded and unbounded metric spaces.

5.1.9. Definition. Let {X; ρ} be a metric space. If there exists a positive number r such that ρ(x, y) < r for all x, y ∈ X, we say {X; ρ} is a bounded metric space. If {X; ρ} is not bounded, we say {X; ρ} is an unbounded metric space.

If {X; ρ} is an unbounded metric space, then ρ takes on arbitrarily large values. The metric spaces in Examples 5.1.3 and 5.1.5 are unbounded, whereas the metric space in Example 5.1.7 is clearly bounded.
5.1.10. Exercise. Let {X; ρ} be an arbitrary metric space. Define the function ρ₁: X × X → R by

$$\rho_1(x, y) = \frac{\rho(x, y)}{1 + \rho(x, y)}. \qquad (5.1.11)$$

Show that ρ₁(x, y) is a metric. Show that {X; ρ₁} is a bounded metric space, even though {X; ρ} may not be bounded. Thus, the function (5.1.11) can be used to generate a bounded metric space from any unbounded metric space. (Hint: Show that if φ: R → R is given by φ(t) = t/(1 + t), then φ(t₁) < φ(t₂) for all t₁, t₂ such that 0 ≤ t₁ < t₂.)
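The claims of the exercise can be probed numerically before proving them. The sketch below (illustration only) takes ρ to be the usual, unbounded metric on R and checks, on many random triples, that ρ₁ = ρ/(1 + ρ) satisfies the triangle inequality and is bounded by 1:

```python
import random

# Numerical probe of Exercise 5.1.10: with rho the usual (unbounded) metric
# on R, rho1 = rho/(1 + rho) should again be a metric, and bounded above by 1.

random.seed(0)

def rho(x, y):
    return abs(x - y)

def rho1(x, y):
    r = rho(x, y)
    return r / (1.0 + r)

for _ in range(10000):
    x, y, z = (random.uniform(-1e6, 1e6) for _ in range(3))
    assert 0.0 <= rho1(x, y) < 1.0                        # bounded by 1
    assert rho1(x, y) == rho1(y, x)                       # symmetric
    assert rho1(x, y) <= rho1(x, z) + rho1(z, y) + 1e-12  # triangle inequality
```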

Subsequently, we will call

$$R^* = R \cup \{+\infty\} \cup \{-\infty\}$$

the extended real numbers. In the following exercise, we define a useful metric on R*. This metric is, of course, not the only metric possible.

5.1.12. Exercise. Let X = R* and define the function f: R* → R as

$$f(x) = \begin{cases} \dfrac{x}{1 + |x|}, & x \in R \\ 1, & x = +\infty \\ -1, & x = -\infty. \end{cases}$$

Let ρ*: R* × R* → R be defined by ρ*(x, y) = |f(x) − f(y)| for all x, y ∈ R*. Show that {R*; ρ*} is a bounded metric space. The function ρ* is called the usual metric for R*, and {R*; ρ*} is called the extended real line.
We will have occasion to use the next result.

5.1.13. Theorem. Let {X; ρ} be a metric space, and let x, y, and z be any elements of X. Then

$$|\rho(x, z) - \rho(y, z)| \leq \rho(x, y) \qquad (5.1.14)$$

for all x, y, z ∈ X.

Proof. From axiom (iii) of Definition 5.1.1 it follows that

$$\rho(x, z) \leq \rho(x, y) + \rho(y, z) \qquad (5.1.15)$$

and

$$\rho(y, z) \leq \rho(y, x) + \rho(x, z). \qquad (5.1.16)$$

From (5.1.15) we have

$$\rho(x, z) - \rho(y, z) \leq \rho(x, y), \qquad (5.1.17)$$

and from (5.1.16) we have

$$-\rho(y, x) \leq \rho(x, z) - \rho(y, z). \qquad (5.1.18)$$

In view of axiom (ii) of Definition 5.1.1 we have ρ(x, y) = ρ(y, x), and thus relations (5.1.17) and (5.1.18) imply

$$-\rho(x, y) \leq \rho(x, z) - \rho(y, z) \leq \rho(x, y).$$

This proves that |ρ(x, z) − ρ(y, z)| ≤ ρ(x, y) for all x, y, z ∈ X. ■
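Inequality (5.1.14), the "reverse triangle inequality," is easy to exercise numerically. The following sketch (illustration only) checks it for the usual metric on R over random triples:

```python
import random

# Numerical check of the reverse triangle inequality (5.1.14),
# |rho(x, z) - rho(y, z)| <= rho(x, y), for the usual metric on R.

random.seed(1)
rho = lambda u, v: abs(u - v)

for _ in range(10000):
    x, y, z = (random.uniform(-100.0, 100.0) for _ in range(3))
    assert abs(rho(x, z) - rho(y, z)) <= rho(x, y) + 1e-12
```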

The notion of metric makes it possible to consider various geometric concepts. We have:

5.1.19. Definition. Let {X; ρ} be a metric space, and let Y be a non-void subset of X. If ρ(x, y) is bounded for all x, y ∈ Y, we define the diameter of set Y, denoted δ(Y) or diam (Y), as

$$\delta(Y) = \sup\,\{\rho(x, y): x, y \in Y\}.$$

If ρ(x, y) is unbounded, we write δ(Y) = ∞ and we say that Y has infinite diameter, or Y is unbounded. If Y is empty, we define δ(Y) = 0.

5.1.20. Exercise. Show that if Y ⊂ Z ⊂ X, where {X; ρ} is a metric space, then δ(Y) ≤ δ(Z). Also, show that if Z is non-empty, then δ(Z) = 0 if and only if Z is a singleton.
We also have:

5.1.21. Definition. Let {X; ρ} be a metric space, and let Y and Z be two non-void subsets of X. We define the distance between sets Y and Z as

$$d(Y, Z) = \inf\,\{\rho(y, z): y \in Y, z \in Z\}.$$

Let p ∈ X and define

$$d(p, Z) = \inf\,\{\rho(p, z): z \in Z\}.$$

We call d(p, Z) the distance between point p and set Z.

Since ρ(y, z) = ρ(z, y) for all y ∈ Y and z ∈ Z, it follows that d(Y, Z) = d(Z, Y). We note that, in general, d(Y, Z) = 0 does not imply that Y and Z have points in common. For example, let X be the real line with the usual metric ρ. If Y = {x ∈ X: 0 < x < 1} and Z = {x ∈ X: 1 < x < 2}, then clearly d(Y, Z) = 0, even though Y ∩ Z = ∅. Similarly, d(p, Z) = 0 does not imply that p ∈ Z.

5.1.22. Theorem. Let {X; ρ} be a metric space, and let Y be any non-void subset of X. If ρ′ denotes the restriction of ρ to Y × Y, i.e., if

ρ′(x, y) = ρ(x, y) for all x, y ∈ Y,

then {Y; ρ′} is a metric space.

5.1.23. Exercise. Prove Theorem 5.1.22.

We call ρ′ the metric induced by ρ on Y, and we say that {Y; ρ′} is a metric subspace of {X; ρ} or simply a subspace of X. Since usually there is no room for confusion, we drop the prime from ρ′ and simply denote the metric subspace by {Y; ρ}. We emphasize that any non-void subset of a metric space can be made into a metric subspace. This is not so in the case of linear subspaces. If Y ≠ X, then we speak of a proper subspace.

5.2. SOME INEQUALITIES

In order to present some of the important metric spaces that arise in applications, we first need to establish some important inequalities. These are summarized and proved in the following:
5.2.1. Theorem. Let R denote the set of real numbers, and let C denote the set of complex numbers.

(i) Let p, q ∈ R be such that 1 < p < ∞ and 1 < q < ∞ and such that 1/p + 1/q = 1. Then for all α, β ∈ R such that α ≥ 0 and β ≥ 0, we have

$$\alpha\beta \leq \frac{\alpha^p}{p} + \frac{\beta^q}{q}. \qquad (5.2.2)$$

(ii) (Hölder's inequality) Let p, q ∈ R be such that 1 < p < ∞ and such that 1/p + 1/q = 1.

(a) Finite Sums. Let n be any positive integer, and let ξ₁, ..., ξₙ and η₁, ..., ηₙ belong either to R or to C. Then

$$\sum_{i=1}^{n} |\xi_i \eta_i| \leq \left(\sum_{i=1}^{n} |\xi_i|^p\right)^{1/p} \left(\sum_{i=1}^{n} |\eta_i|^q\right)^{1/q}. \qquad (5.2.3)$$

(b) Infinite Sums. Let {ξᵢ} and {ηᵢ} be infinite sequences in either R or C. If $\sum_{i=1}^{\infty} |\xi_i|^p < \infty$ and $\sum_{i=1}^{\infty} |\eta_i|^q < \infty$, then

$$\sum_{i=1}^{\infty} |\xi_i \eta_i| \leq \left(\sum_{i=1}^{\infty} |\xi_i|^p\right)^{1/p} \left(\sum_{i=1}^{\infty} |\eta_i|^q\right)^{1/q}. \qquad (5.2.4)$$

(c) Integrals. Let [a, b] be an interval on the real line, and let f, g: [a, b] → R. If $\int_a^b |f(t)|^p\, dt < \infty$ and $\int_a^b |g(t)|^q\, dt < \infty$ (integration is in the Riemann sense), then

$$\int_a^b |f(t)g(t)|\, dt \leq \left[\int_a^b |f(t)|^p\, dt\right]^{1/p} \left[\int_a^b |g(t)|^q\, dt\right]^{1/q}. \qquad (5.2.5)$$

(iii) (Minkowski's inequality) Let p ∈ R, where 1 ≤ p < ∞.

(a) Finite Sums. Let n be any positive integer, and let ξ₁, ..., ξₙ and η₁, ..., ηₙ belong either to R or to C. Then

$$\left(\sum_{i=1}^{n} |\xi_i + \eta_i|^p\right)^{1/p} \leq \left(\sum_{i=1}^{n} |\xi_i|^p\right)^{1/p} + \left(\sum_{i=1}^{n} |\eta_i|^p\right)^{1/p}. \qquad (5.2.6)$$

(b) Infinite Sums. Let {ξᵢ} and {ηᵢ} be infinite sequences in either R or C. If $\sum_{i=1}^{\infty} |\xi_i|^p < \infty$ and $\sum_{i=1}^{\infty} |\eta_i|^p < \infty$, then

$$\left(\sum_{i=1}^{\infty} |\xi_i + \eta_i|^p\right)^{1/p} \leq \left(\sum_{i=1}^{\infty} |\xi_i|^p\right)^{1/p} + \left(\sum_{i=1}^{\infty} |\eta_i|^p\right)^{1/p}. \qquad (5.2.7)$$

(c) Integrals. Let [a, b] be an interval on the real line, and let f, g: [a, b] → R. If $\int_a^b |f(t)|^p\, dt < \infty$ and $\int_a^b |g(t)|^p\, dt < \infty$, then

$$\left[\int_a^b |f(t) + g(t)|^p\, dt\right]^{1/p} \leq \left[\int_a^b |f(t)|^p\, dt\right]^{1/p} + \left[\int_a^b |g(t)|^p\, dt\right]^{1/p}. \qquad (5.2.8)$$

Proof. To prove part (i), consider the graph of η = ξ^{p−1} in the (ξ, η) plane, depicted in Figure A. Let

$$q_1 = \int_0^{\alpha} \xi^{p-1}\, d\xi = \frac{\alpha^p}{p} \qquad \text{and} \qquad q_2 = \int_0^{\beta} \eta^{q-1}\, d\eta = \frac{\beta^q}{q}.$$

From Figure A it is clear that q₁ + q₂ ≥ αβ for any choice of α, β > 0, and hence relation (5.2.2) follows.

5.2.9. Figure A. The graph of η = ξ^{p−1}; the areas q₁ and q₂ together cover the rectangle with sides α and β.

To prove part (iia) we first note that if $\left(\sum_{i=1}^{n} |\xi_i|^p\right)^{1/p} = 0$ or if $\left(\sum_{i=1}^{n} |\eta_i|^q\right)^{1/q} = 0$, then inequality (5.2.3) follows trivially. Therefore, we assume that $\left(\sum_{i=1}^{n} |\xi_i|^p\right)^{1/p} \neq 0$ and $\left(\sum_{i=1}^{n} |\eta_i|^q\right)^{1/q} \neq 0$. From (5.2.2) we now have

$$\frac{|\xi_j|}{\left(\sum_i |\xi_i|^p\right)^{1/p}} \cdot \frac{|\eta_j|}{\left(\sum_i |\eta_i|^q\right)^{1/q}} \leq \frac{1}{p} \cdot \frac{|\xi_j|^p}{\sum_i |\xi_i|^p} + \frac{1}{q} \cdot \frac{|\eta_j|^q}{\sum_i |\eta_i|^q}.$$

Summing over j from 1 to n, it now follows that

$$\sum_{j=1}^{n} |\xi_j \eta_j| \leq \left(\sum_{i=1}^{n} |\xi_i|^p\right)^{1/p} \left(\sum_{i=1}^{n} |\eta_i|^q\right)^{1/q},$$

which was to be proved.

To prove part (iib), we note that for any positive integer n,

$$\sum_{i=1}^{n} |\xi_i \eta_i| \leq \left(\sum_{i=1}^{n} |\xi_i|^p\right)^{1/p} \left(\sum_{i=1}^{n} |\eta_i|^q\right)^{1/q} \leq \left(\sum_{i=1}^{\infty} |\xi_i|^p\right)^{1/p} \left(\sum_{i=1}^{\infty} |\eta_i|^q\right)^{1/q}.$$

If we let n → ∞ in the above inequality, then (5.2.4) follows.

The proof of part (iic) is established in a similar fashion. We leave the details of the proof to the reader.

To prove part (iiia), we first note that if p = 1, then inequality (5.2.6) follows trivially. It therefore suffices to consider the case 1 < p < ∞. We observe that for any ξᵢ and ηᵢ we have

$$(|\xi_i| + |\eta_i|)^p = (|\xi_i| + |\eta_i|)^{p-1}|\xi_i| + (|\xi_i| + |\eta_i|)^{p-1}|\eta_i|.$$

Summing the above identity with respect to i from 1 to n, we now have

$$\sum_{i=1}^{n} (|\xi_i| + |\eta_i|)^p = \sum_{i=1}^{n} (|\xi_i| + |\eta_i|)^{p-1}|\xi_i| + \sum_{i=1}^{n} (|\xi_i| + |\eta_i|)^{p-1}|\eta_i|.$$

Applying the Hölder inequality (5.2.3) to each of the sums on the right side of the above relation and noting that (p − 1)q = p, we now obtain

$$\sum_{i=1}^{n} (|\xi_i| + |\eta_i|)^p \leq \left[\sum_{i=1}^{n} (|\xi_i| + |\eta_i|)^p\right]^{1/q} \left\{\left[\sum_{i=1}^{n} |\xi_i|^p\right]^{1/p} + \left[\sum_{i=1}^{n} |\eta_i|^p\right]^{1/p}\right\}.$$

If we assume that $\left[\sum_{i=1}^{n} (|\xi_i| + |\eta_i|)^p\right]^{1/q} \neq 0$ and divide both sides of the above inequality by this term, we have

$$\left[\sum_{i=1}^{n} (|\xi_i| + |\eta_i|)^p\right]^{1/p} \leq \left[\sum_{i=1}^{n} |\xi_i|^p\right]^{1/p} + \left[\sum_{i=1}^{n} |\eta_i|^p\right]^{1/p}.$$

Since $\left[\sum_{i=1}^{n} |\xi_i + \eta_i|^p\right]^{1/p} \leq \left[\sum_{i=1}^{n} (|\xi_i| + |\eta_i|)^p\right]^{1/p}$, the desired result follows. We note that in case $\left[\sum_{i=1}^{n} (|\xi_i| + |\eta_i|)^p\right]^{1/q} = 0$, inequality (5.2.6) follows trivially.

Applying the same reasoning as above, the reader can now prove the Minkowski inequality for infinite sums and for integrals. ■

If in (5.2.3), (5.2.4), or (5.2.5) we let p = q = 2, then we speak of the Schwarz inequality for finite sums, infinite sums, and integrals, respectively.

5.2.10. Exercise. Prove Hölder's inequality for integrals (5.2.5), Minkowski's inequality for infinite sums (5.2.7), and Minkowski's inequality for integrals (5.2.8).
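Before proving these inequalities one can test them numerically. The sketch below (illustration only) checks Hölder (5.2.3) and Minkowski (5.2.6) for random real n-tuples and several exponent pairs:

```python
import random

# Numerical test of Hoelder (5.2.3) and Minkowski (5.2.6) for finite sums.
# For Hoelder, q is the conjugate exponent: 1/p + 1/q = 1.

random.seed(2)

def norm_p(v, p):
    return sum(abs(t)**p for t in v) ** (1.0 / p)

for _ in range(1000):
    n = random.randint(1, 8)
    xs = [random.uniform(-5, 5) for _ in range(n)]
    ys = [random.uniform(-5, 5) for _ in range(n)]
    for p in [1.5, 2.0, 3.0]:
        q = p / (p - 1.0)
        # Hoelder's inequality (5.2.3)
        assert sum(abs(a * b) for a, b in zip(xs, ys)) \
               <= norm_p(xs, p) * norm_p(ys, q) + 1e-9
        # Minkowski's inequality (5.2.6)
        assert norm_p([a + b for a, b in zip(xs, ys)], p) \
               <= norm_p(xs, p) + norm_p(ys, p) + 1e-9
```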

5.3. EXAMPLES OF IMPORTANT METRIC SPACES

In the present section we consider specific examples of metric spaces which are very important in applications. It turns out that all of the spaces of this section are also vector spaces.

As in Chapter 4, we denote elements x, y ∈ Rⁿ (elements x, y ∈ Cⁿ) by x = (ξ₁, ..., ξₙ) and y = (η₁, ..., ηₙ), respectively, where ξᵢ, ηᵢ ∈ R for i = 1, ..., n (where ξᵢ, ηᵢ ∈ C for i = 1, ..., n). Similarly, elements x, y ∈ R^∞ (elements x, y ∈ C^∞) are denoted by x = (ξ₁, ξ₂, ...) and y = (η₁, η₂, ...), respectively, where ξᵢ, ηᵢ ∈ R for all i (where ξᵢ, ηᵢ ∈ C for all i).

5.3.1. Example. Let X = Rⁿ (let X = Cⁿ), let 1 ≤ p < ∞, and let

$$\rho_p(x, y) = \left[\sum_{i=1}^{n} |\xi_i - \eta_i|^p\right]^{1/p}. \qquad (5.3.2)$$

We now show that {Rⁿ; ρ_p} ({Cⁿ; ρ_p}) is a metric space.
Axioms (i) and (ii) of Definition 5.1.1 are readily verified. To show that axiom (iii) is satisfied, let a, b, d ∈ Rⁿ (let a, b, d ∈ Cⁿ), where a = (α₁, ..., αₙ), b = (β₁, ..., βₙ), and d = (δ₁, ..., δₙ). If x = a − b and y = b − d, then we have from inequality (5.2.6),

$$\rho_p(a, d) = \left[\sum_{i=1}^{n} |\alpha_i - \delta_i|^p\right]^{1/p} = \left[\sum_{i=1}^{n} |(\alpha_i - \beta_i) + (\beta_i - \delta_i)|^p\right]^{1/p} \leq \left[\sum_{i=1}^{n} |\alpha_i - \beta_i|^p\right]^{1/p} + \left[\sum_{i=1}^{n} |\beta_i - \delta_i|^p\right]^{1/p} = \rho_p(a, b) + \rho_p(b, d),$$

the triangle inequality. It thus follows that {Rⁿ; ρ_p} ({Cⁿ; ρ_p}) is a metric space; in fact, it is an unbounded metric space.

We frequently abbreviate {Rⁿ; ρ_p} by Rⁿ_p and {Cⁿ; ρ_p} by Cⁿ_p. For the case p = 2, we call ρ₂ the Euclidean metric or the usual metric on Rⁿ. ■
5.3.3. Example. For x, y ∈ Rⁿ (for x, y ∈ Cⁿ), let

$$\rho_\infty(x, y) = \max\,\{|\xi_1 - \eta_1|, \ldots, |\xi_n - \eta_n|\}. \qquad (5.3.4)$$

It is readily shown that {Rⁿ; ρ_∞} ({Cⁿ; ρ_∞}) is a metric space. ■

5.3.5. Example. Let 1 ≤ p < ∞, let X = R^∞ (or X = C^∞), and define

$$l_p = \left\{x \in X: \sum_{i=1}^{\infty} |\xi_i|^p < \infty\right\}. \qquad (5.3.6)$$

For x, y ∈ l_p, let

$$\rho_p(x, y) = \left[\sum_{i=1}^{\infty} |\xi_i - \eta_i|^p\right]^{1/p}. \qquad (5.3.7)$$

We can readily verify that {l_p; ρ_p} is a metric space. ■


5.3.8. Example. Let X = R^∞ (or X = C^∞), and let

$$l_\infty = \left\{x \in X: \sup_i |\xi_i| < \infty\right\}. \qquad (5.3.9)$$

For x, y ∈ l_∞, define

$$\rho_\infty(x, y) = \sup_i |\xi_i - \eta_i|. \qquad (5.3.10)$$

We can easily show that {l_∞; ρ_∞} is a metric space. ■


5.3.11. Exercise. Use the inequalities of Section 5.2 to show that the spaces of Examples 5.3.3, 5.3.5, and 5.3.8 are metric spaces.
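A numerical aside (illustration only): the metrics ρ_p of Example 5.3.1 and ρ_∞ of Example 5.3.3 can be compared on concrete vectors, and one can watch ρ_p(x, y) approach ρ_∞(x, y) as p grows:

```python
# Compare rho_p of Eq. (5.3.2) with rho_inf of Eq. (5.3.4) on vectors in R^3.
# As p grows, rho_p(x, y) decreases toward rho_inf(x, y).

def rho_p(x, y, p):
    return sum(abs(a - b)**p for a, b in zip(x, y)) ** (1.0 / p)

def rho_inf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1.0, 5.0, 2.0), (4.0, 1.0, 2.0)   # componentwise differences: 3, 4, 0
assert rho_p(x, y, 1) == 7.0
assert rho_p(x, y, 2) == 5.0
assert rho_inf(x, y) == 4.0
assert rho_p(x, y, 1) >= rho_p(x, y, 2) >= rho_inf(x, y)
assert abs(rho_p(x, y, 50) - rho_inf(x, y)) < 0.1   # rho_p -> rho_inf as p grows
```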
5.3.12. Example. Let [a, b], a < b, be an interval on the real line, and let C[a, b] be the set of all real-valued continuous functions defined on [a, b]. Let 1 ≤ p < ∞ and for x, y ∈ C[a, b], define

$$\rho_p(x, y) = \left[\int_a^b |x(t) - y(t)|^p\, dt\right]^{1/p}. \qquad (5.3.13)$$

We now show that {C[a, b]; ρ_p} is a metric space.

Clearly, ρ_p(x, y) = ρ_p(y, x), and ρ_p(x, y) ≥ 0 for all x, y ∈ C[a, b]. If x(t) = y(t) for all t ∈ [a, b], then ρ_p(x, y) = 0. To prove the converse of this statement, suppose that x(t) ≠ y(t) for some t ∈ [a, b]. Since x, y ∈ C[a, b], x − y ∈ C[a, b], and there is some interval in [a, b], i.e., a subinterval of [a, b], such that |x(t) − y(t)| > 0 for all t in that subinterval. Hence,

$$\left[\int_a^b |x(t) - y(t)|^p\, dt\right]^{1/p} > 0.$$

Therefore, ρ_p(x, y) = 0 if and only if x(t) = y(t) for all t ∈ [a, b].

To show that the triangle inequality holds, let u, v, w ∈ C[a, b], and let x = u − v and y = v − w. Then we have, from inequality (5.2.8),

$$\rho_p(u, w) = \left[\int_a^b |u(t) - w(t)|^p\, dt\right]^{1/p} = \left[\int_a^b |u(t) - v(t) + v(t) - w(t)|^p\, dt\right]^{1/p} \leq \left[\int_a^b |u(t) - v(t)|^p\, dt\right]^{1/p} + \left[\int_a^b |v(t) - w(t)|^p\, dt\right]^{1/p} = \rho_p(u, v) + \rho_p(v, w),$$

the triangle inequality. It now follows that {C[a, b]; ρ_p} is a metric space. It is easy to see that this space is an unbounded metric space. ■

5.3.14. Example. Let C[a, b] be defined as in the preceding example. For x, y ∈ C[a, b], let

$$\rho_\infty(x, y) = \sup_{a \leq t \leq b} |x(t) - y(t)|. \qquad (5.3.15)$$

To show that {C[a, b]; ρ_∞} is a metric space we first note that ρ_∞(x, y) = ρ_∞(y, x), that ρ_∞(x, y) ≥ 0 for all x, y, and that ρ_∞(x, y) = 0 if and only if x(t) = y(t) for all t ∈ [a, b]. To show that ρ_∞ satisfies the triangle inequality we note that

$$\rho_\infty(x, y) = \sup_{a \leq t \leq b} |x(t) - z(t) + z(t) - y(t)| \leq \sup_{a \leq t \leq b} \{|x(t) - z(t)| + |z(t) - y(t)|\} \leq \sup_{a \leq t \leq b} |x(t) - z(t)| + \sup_{a \leq t \leq b} |z(t) - y(t)| = \rho_\infty(x, z) + \rho_\infty(z, y).$$

It thus follows that {C[a, b]; ρ_∞} is a metric space. ■

In Figure B, several metrics considered in Section 5.1 and in the present section are depicted pictorially.

5.3.16. Figure B. Illustration of various metrics: the usual metric ρ(x, y) = |x − y| on R; the metrics ρ₁, ρ₂, and ρ_∞ on R²; and the metric ρ_∞(x₁, x₂) = sup_{a≤t≤b} |x₁(t) − x₂(t)| on C[a, b].
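The function-space metrics (5.3.13) and (5.3.15) can be approximated numerically by sampling. The sketch below (an illustration; the discretization only approximates the integral) compares ρ₂ and ρ_∞ for x(t) = t and y(t) = t² on [0, 1], whose exact values are √(1/30) and 1/4, respectively:

```python
# Approximate rho_2 (Eq. 5.3.13) and rho_inf (Eq. 5.3.15) on C[0, 1]
# for x(t) = t and y(t) = t^2, by sampling on a fine grid.
# Exact values: rho_2 = (1/30)^(1/2), rho_inf = 1/4 (attained at t = 1/2).

N = 100000
ts = [k / N for k in range(N + 1)]
x = lambda t: t
y = lambda t: t * t

# trapezoid approximation of the integral in (5.3.13) with p = 2
integral = sum((x(t) - y(t))**2 for t in ts[1:-1]) / N \
           + 0.5 * ((x(0.0) - y(0.0))**2 + (x(1.0) - y(1.0))**2) / N
rho2 = integral ** 0.5
rho_inf = max(abs(x(t) - y(t)) for t in ts)

assert abs(rho2 - (1.0 / 30.0)**0.5) < 1e-4
assert abs(rho_inf - 0.25) < 1e-9
```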

5.3.17. Exercise. Show that the metric defined in Eq. (5.3.4) is equivalent to

$$\rho_\infty(x, y) = \lim_{p \to \infty} \left[\sum_{i=1}^{n} |\xi_i - \eta_i|^p\right]^{1/p}.$$

5.3.18. Exercise. Let X = R denote the set of real numbers, and define d(x, y) = (x − y)² for all x, y ∈ R. Show that the function d is not a metric. This illustrates the necessity for the exponent 1/p in Eq. (5.3.2).
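For Exercise 5.3.18, a single counterexample suffices: d(x, y) = (x − y)² violates the triangle inequality. A minimal check (illustration only):

```python
# d(x, y) = (x - y)^2 is not a metric: the triangle inequality fails.
# With x = 0, y = 2, z = 1:  d(x, y) = 4  but  d(x, z) + d(z, y) = 1 + 1 = 2.

d = lambda x, y: (x - y)**2

x, y, z = 0.0, 2.0, 1.0
assert d(x, y) == 4.0
assert d(x, z) + d(z, y) == 2.0
assert d(x, y) > d(x, z) + d(z, y)   # triangle inequality violated
```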

We conclude the present section by considering Cartesian products of metric spaces. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces, and let Z = X × Y. Utilizing the metrics ρ_x and ρ_y we can define metrics on Z in an infinite variety of ways. Some of the more interesting cases are given in the following:

5.3.19. Theorem. Let {X; ρ_x} and {Y; ρ_y} be metric spaces, and let Z = X × Y. Let z₁ = (x₁, y₁) and z₂ = (x₂, y₂) be two points of Z = X × Y. Define the functions

$$\rho_p(z_1, z_2) = \{[\rho_x(x_1, x_2)]^p + [\rho_y(y_1, y_2)]^p\}^{1/p}, \qquad 1 \leq p < \infty,$$

and

$$\rho_\infty(z_1, z_2) = \max\,\{\rho_x(x_1, x_2), \rho_y(y_1, y_2)\}.$$

Then {Z; ρ_p} and {Z; ρ_∞} are metric spaces.

The spaces {Z; ρ_p} and {Z; ρ_∞} are examples of product (metric) spaces.

5.3.20. Exercise. Prove Theorem 5.3.19.

We can extend the above concept to the product of n metric spaces. We have:

5.3.21. Theorem. Let {X₁; ρ₁}, ..., {Xₙ; ρₙ} be n metric spaces, and let X = X₁ × ... × Xₙ. For x = (x₁, ..., xₙ), y = (y₁, ..., yₙ), xᵢ ∈ Xᵢ, define the functions

$$\rho'(x, y) = \sum_{i=1}^{n} \rho_i(x_i, y_i)$$

and

$$\rho''(x, y) = \left[\sum_{i=1}^{n} [\rho_i(x_i, y_i)]^2\right]^{1/2}.$$

Then {X; ρ′} and {X; ρ″} are metric spaces.

5.3.22. Exercise. Prove Theorem 5.3.21.
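A numerical probe of Theorems 5.3.19 and 5.3.21 (illustration only): take each component space to be R with the usual metric and check the triangle inequality for ρ′, ρ″, and the max-type metric on random triples of points of a three-fold product:

```python
import random

# Probe the product metrics of Theorems 5.3.19 and 5.3.21 with each factor
# equal to R with the usual metric rho_i(u, v) = |u - v|.

random.seed(3)

def rho_prime(x, y):                      # rho'(x, y) = sum_i rho_i(x_i, y_i)
    return sum(abs(a - b) for a, b in zip(x, y))

def rho_dprime(x, y):                     # rho''(x, y) = (sum_i rho_i^2)^(1/2)
    return sum((a - b)**2 for a, b in zip(x, y)) ** 0.5

def rho_max(x, y):                        # max-type metric of Theorem 5.3.19
    return max(abs(a - b) for a, b in zip(x, y))

for _ in range(2000):
    x, y, z = ([random.uniform(-10, 10) for _ in range(3)] for _ in range(3))
    for m in (rho_prime, rho_dprime, rho_max):
        assert m(x, y) <= m(x, z) + m(z, y) + 1e-9
```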

5.4. OPEN AND CLOSED SETS

Having introduced the notion of metric, we are now in a position to consider several important fundamental concepts which we will need throughout the remainder of this book. In the present section {X; ρ} will denote an arbitrary metric space.

5.4.1. Definition. Let x₀ ∈ X and let r ∈ R, r > 0. An open sphere or open ball, denoted by S(x₀; r), is defined as the set

$$S(x_0; r) = \{x \in X: \rho(x, x_0) < r\}.$$

We call the fixed point x₀ the center and the number r the radius of S(x₀; r). For simplicity, we often call an open sphere simply a sphere.

The radius of a sphere is always positive and finite. In place of the terms ball or sphere we also use the term spherical neighborhood of x₀.

In Figure C, spheres in several types of metric spaces considered in the previous sections are depicted. Note that in these figures the indicated spheres do not include boundaries.

5.4.3. Exercise. Describe the open sphere in Rⁿ as a function of r if the metric is the discrete metric of Example 5.1.7.
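For Exercise 5.4.3 the answer can be anticipated numerically: under the discrete metric, S(x₀; r) is {x₀} when r ≤ 1 and all of X when r > 1. A small sketch (illustration only) on a finite set:

```python
# Open spheres under the discrete metric (Example 5.1.7) on a finite set:
# S(x0; r) = {x0} if r <= 1, and S(x0; r) = X if r > 1.

def rho(x, y):
    return 0 if x == y else 1

def sphere(X, x0, r):
    return {x for x in X if rho(x, x0) < r}

X = {"a", "b", "c"}
assert sphere(X, "a", 0.5) == {"a"}
assert sphere(X, "a", 1.0) == {"a"}    # rho = 1 is not < 1
assert sphere(X, "a", 1.5) == X
```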

We can now categorize the points or elements of a metric space in several ways.

5.4.4. Definition. Let Y be a subset of X. A point x ∈ X is called a contact point or adherent point of set Y if every open sphere with center x contains at least one point of Y. The set of all adherent points of Y is called the closure of Y and is denoted by Ȳ.

We note that every point of Y is an adherent point of Y; however, there may be points not in Y which are also adherent points of Y.

5.4.2. Figure C. Spheres S(x₀; r) in various metric spaces: on the real line with the usual metric; on R² with the metrics ρ₁, ρ₂, and ρ_∞; and on C[a, b] with the metric ρ_∞.

5.4.5. Definition. Let Y be a subset of X, and let x ∈ X be an adherent point of Y. Then x is called an isolated point if there is a sphere with center x which contains no point of Y other than x itself. The point x is called a limit point or point of accumulation of set Y if every sphere with center at x contains an infinite number of points of Y. The set of all limit points of Y is called the derived set of Y and is denoted by Y′.
Our next result shows that adherent points are either limit points or isolated points.

5.4.6. Theorem. Let Y be a subset of X and let x ∈ X. If x is an adherent point of Y, then x is either a limit point or an isolated point.

Proof. We prove the theorem by assuming that x is an adherent point of Y but not an isolated point. We must then show that x is a limit point of Y. To do so, consider the family of spheres S(x; 1/n) for n = 1, 2, .... Let xₙ ∈ S(x; 1/n) be such that xₙ ∈ Y but xₙ ≠ x for each n. Now suppose there are only a finite number of distinct such points xₙ, say, {x₁, ..., x_k}. If we let d = min_{1≤i≤k} ρ(x, xᵢ), then d > 0. But this contradicts the fact that there is an xₙ ∈ S(x; 1/n) for every n = 1, 2, 3, .... Hence, there are infinitely many xₙ, and thus x is a limit point of Y. ■

We can now categorize adherent points of Y ⊂ X into the following three classes: (a) isolated points of Y, which always belong to Y; (b) points of accumulation which belong to Y; and (c) points of accumulation which do not belong to Y.

5.4.7. Example. Let X = R, let ρ be the usual metric, and let Y = {x ∈ R: 0 < x < 1} ∪ {2}, as depicted in Figure D. The element x = 2 is an isolated point of Y, the elements 0 and 1 are adherent points of Y which do not belong to Y, and each point of the set {x ∈ R: 0 < x < 1} is a limit point of Y belonging to Y. ■

5.4.8. Figure D. The set Y = {x ∈ R: 0 < x < 1} ∪ {2} of Example 5.4.7.

5.4.9. Example. Let {R; ρ} be the real line with the usual metric, and let Q be the set of rational numbers in R. For every x ∈ R, any open sphere S(x; r) contains a point in Q. Thus, every point in R is an adherent point of Q; i.e., R ⊂ Q̄. Since Q̄ ⊂ R, it follows that R = Q̄. Clearly, there are no isolated points in Q. Also, for any x ∈ R, every sphere S(x; r) contains an infinite number of points in Q. Therefore, every point in R is a limit point of Q; i.e., R ⊂ Q′. This implies that Q′ = R. ■
Let us now consider the following basic results.

5.4.10. Theorem. Let Y and Z be subsets of X, and let $\bar{Y}$ and $\bar{Z}$ denote the closures of Y and Z, respectively. Let $\bar{\bar{Y}}$ denote the closure of $\bar{Y}$, and let $Y'$ be the derived set of Y. Then

(i) $Y \subset \bar{Y}$;
(ii) $\bar{\bar{Y}} = \bar{Y}$;
(iii) if $Y \subset Z$, then $\bar{Y} \subset \bar{Z}$;
(iv) $\overline{Y \cup Z} = \bar{Y} \cup \bar{Z}$;
(v) $\overline{Y \cap Z} \subset \bar{Y} \cap \bar{Z}$; and
(vi) $\bar{Y} = Y \cup Y'$.

Proof. To prove the first part, let $x \in Y$. Then $x \in S(x; r)$ for every r > 0. Hence, $x \in \bar{Y}$. Therefore, $Y \subset \bar{Y}$.

To prove the second part, let $x \in \bar{\bar{Y}}$, and let r > 0. Then there is an $x_1 \in \bar{Y}$ such that $x_1 \in S(x; r)$, and hence $\rho(x, x_1) = r_1 < r$. Let $r_0 = r - r_1 > 0$. We now wish to show that $S(x_1; r_0) \subset S(x; r)$. In doing so, let $y \in S(x_1; r_0)$. Then $\rho(y, x_1) < r_0$. By the triangle inequality we have $\rho(x, y) \leq \rho(x, x_1) + \rho(x_1, y) < r_1 + (r - r_1) = r$, and hence $y \in S(x; r)$. Since $x_1 \in \bar{Y}$, the sphere $S(x_1; r_0)$ contains a point $x_2 \in Y$. Thus, $x_2 \in S(x; r)$. Since $S(x; r)$ is an arbitrary spherical neighborhood of x, we have $x \in \bar{Y}$. This proves that $\bar{\bar{Y}} \subset \bar{Y}$. Also, in view of part (i), we have $\bar{Y} \subset \bar{\bar{Y}}$. Therefore, it follows that $\bar{\bar{Y}} = \bar{Y}$.

To prove the third part of the theorem, let r > 0 and let $x \in \bar{Y}$. Then there is a $y \in Y$ such that $y \in S(x; r)$. Since $Y \subset Z$, $y \in Z$, and thus x is an adherent point of Z.

To prove the fourth part, note that $Y \subset Y \cup Z$ and $Z \subset Y \cup Z$. From part (iii) it now follows that $\bar{Y} \subset \overline{Y \cup Z}$ and $\bar{Z} \subset \overline{Y \cup Z}$. Thus, $\bar{Y} \cup \bar{Z} \subset \overline{Y \cup Z}$. To show that $\overline{Y \cup Z} \subset \bar{Y} \cup \bar{Z}$, let $x \in \overline{Y \cup Z}$ and suppose that $x \notin \bar{Y} \cup \bar{Z}$. Then there exist spheres $S(x; r_1)$ and $S(x; r_2)$ such that $S(x; r_1) \cap Y = \varnothing$ and $S(x; r_2) \cap Z = \varnothing$. Let $r = \min\{r_1, r_2\}$. Then $S(x; r) \cap [Y \cup Z] = \varnothing$. But this is impossible since $x \in \overline{Y \cup Z}$. Hence, $x \in \bar{Y} \cup \bar{Z}$, and thus $\overline{Y \cup Z} \subset \bar{Y} \cup \bar{Z}$.

The proof of the remainder of the theorem is left as an exercise. ■

5.4.11. Exercise. Prove parts (v) and (vi) of Theorem 5.4.10.

We can further classify points and subsets of metric spaces.


5.4.12. Definition. Let Y be a subset of X, and let Y~ denote the complement of Y. A point x ∈ X is called an interior point of the set Y if there

5.4. Open and Closed Sets    279

exists a sphere S(x; r) such that S(x; r) ⊂ Y. The set of all interior points of
set Y is called the interior of Y and is denoted by Y°. A point x ∈ X is an
exterior point of Y if it is an interior point of the complement of Y. The
exterior of Y is the set of all exterior points of set Y. The set of all points
x ∈ X which belong to both Ȳ and the closure of Y~ is called the frontier of
set Y. The boundary of a set Y is the set of all points in the frontier of Y
which belong to Y.
5.4.13. Example. Let {R; ρ} be the real line with the usual metric, and
let Y = {y ∈ R: 0 < y ≤ 1} = (0, 1]. The interior of Y is the set (0, 1) =
{y ∈ R: 0 < y < 1}. The exterior of Y is the set (−∞, 0) ∪ (1, +∞), Ȳ =
{y ∈ R: 0 ≤ y ≤ 1} = [0, 1], and Y~ = (−∞, 0] ∪ (1, +∞). Thus, the
frontier of Y is the set {0, 1}, and the boundary of Y is the singleton {1}. ■

We now introduce the following important concepts.


5.4.14. Definition. A subset Y of X is said to be an open subset of X if
every point of Y is an interior point of Y; i.e., Y = Y°. A subset Z of X is
said to be a closed subset of X if Z̄ = Z.

When there is no room for confusion, we usually call Y an open set and
Z a closed set. On occasions when we want to be very explicit, we will say
that Y is open relative to {X; ρ} or with respect to {X; ρ}.

In our next result we establish some of the important properties of open
sets.

5.4.15. Theorem.

(i) X and ∅ are open sets.
(ii) If {Y_α}_{α∈A} is an arbitrary family of open subsets of X, then
∪_{α∈A} Y_α is an open set.
(iii) The intersection of a finite number of open sets of X is open.

Proof. To prove the first part, note that for every x ∈ X, any sphere
S(x; r) ⊂ X. Hence, every point in X is an interior point. Thus, X is open.
Also, observe that ∅ has no points, and therefore every point of ∅ is an
interior point of ∅. Hence, ∅ is an open subset of X.

To prove the second part, let {Y_α}_{α∈A} be a family of open sets in X, and
let Y = ∪_{α∈A} Y_α. If Y_α is empty for every α ∈ A, then Y = ∅ is an open
subset of X. Now suppose that Y ≠ ∅, and let x ∈ Y. Then x ∈ Y_α for some
α ∈ A. Since Y_α is an open set, there is a sphere S(x; r) such that S(x; r)
⊂ Y_α. Hence, S(x; r) ⊂ Y, and thus x is an interior point of Y. Therefore,
Y is an open set.

To prove the third part, let Y₁ and Y₂ be open subsets of X. If Y₁ ∩ Y₂
= ∅, then Y₁ ∩ Y₂ is open. So let us assume that Y₁ ∩ Y₂ ≠ ∅, and let
x ∈ Y₁ ∩ Y₂. Since x ∈ Y₁, there is an r₁ > 0 such that x ∈ S(x; r₁) ⊂ Y₁.
Similarly, there is an r₂ > 0 such that x ∈ S(x; r₂) ⊂ Y₂. Let
r = min{r₁, r₂}. Then x ∈ S(x; r), where S(x; r) ⊂ S(x; r₁) and S(x; r)
⊂ S(x; r₂). Thus, S(x; r) ⊂ Y₁ ∩ Y₂, and x is an interior point of Y₁ ∩ Y₂.
Hence, Y₁ ∩ Y₂ is an open subset of X. By induction, we can show that the
intersection of any finite number of open subsets of X is open. ■

We now make the following definition.

5.4.16. Definition. Let {X; ρ} be a metric space. The topology of X determined by ρ is defined to be the family of all open subsets of X.

In our next result we establish a connection between open and closed
subsets of X.

5.4.17. Theorem.

(i) X and ∅ are closed sets.
(ii) If Y is an open subset of X, then Y~ is closed.
(iii) If Z is a closed subset of X, then Z~ is open.

Proof. The first part of this theorem follows immediately from the definitions of X, ∅, and closed set.

To prove the second part, let Y be any open subset of X. We may assume
that Y ≠ ∅ and Y ≠ X. Let x be any adherent point of Y~. Then x cannot
belong to Y, for if it did, then there would exist a sphere S(x; r) ⊂ Y, which
is impossible. Therefore, every adherent point of Y~ belongs to Y~, and thus
Y~ is closed if Y is open.

To prove the third part, let Z be any closed subset of X. Again, we may
assume that Z ≠ ∅ and Z ≠ X. Let x ∈ Z~. Then there exists a sphere
S(x; r) which contains no point of Z. This is so because if every such sphere
were to contain a point of Z, then x would be an adherent point of Z and
consequently would belong to Z, since Z is closed. Thus, there is a sphere
S(x; r) ⊂ Z~; i.e., x is an interior point of Z~. Since this holds for arbitrary
x ∈ Z~, Z~ is an open set. ■
In the next result we present additional important properties of open
sets.

5.4.18. Theorem.

(i) Every open sphere in X is an open set.
(ii) If Y is an open subset of X, then there is a family of open spheres,
{S_α}_{α∈A}, such that Y = ∪_{α∈A} S_α.
(iii) The interior of any subset Y of X is the largest open set contained
in Y.


Proof. To prove the first part, let S(x; r) be any open sphere in X. Let
x₁ ∈ S(x; r), and let ρ(x, x₁) = r₁. If we let r₀ = r − r₁, then according to
the proof of part (ii) of Theorem 5.4.10 we have S(x₁; r₀) ⊂ S(x; r). Hence,
x₁ is an interior point of S(x; r). Since this is true for any x₁ ∈ S(x; r), it
follows that S(x; r) is an open subset of X.

To prove the second part of the theorem, we first note that if Y = ∅,
then Y is open and is the union of an empty family of spheres. So assume
that Y ≠ ∅ and that Y is open. Then each point x ∈ Y is the center of a
sphere S(x; r) ⊂ Y, and moreover Y is the union of the family of all such
spheres.

The proof of the last part of the theorem is left as an exercise. ■

5.4.19. Exercise. Prove part (iii) of Theorem 5.4.18.

Let {Y; ρ} be a subspace of a metric space {X; ρ}, and suppose that V
is a subset of Y. It can happen that V may be an open subset of Y and at
the same time not be an open subset of X. Thus, when a set is described as
open, it is important to know in what space it is open. We have:

5.4.20. Theorem. Let {Y; ρ} be a metric subspace of {X; ρ}.

(i) A subset V ⊂ Y is open relative to {Y; ρ} if and only if there is a
subset U ⊂ X such that U is open relative to {X; ρ} and V = Y ∩ U.
(ii) A subset G ⊂ Y is closed relative to {Y; ρ} if and only if there is a
subset F of X such that F is closed relative to {X; ρ} and G = F ∩ Y.

Proof. Let S(x₀; r) = {x ∈ X: ρ(x, x₀) < r} and S′(x₀; r) = {x ∈ Y: ρ(x, x₀) < r}. Then S′(x₀; r) = Y ∩ S(x₀; r).

To prove the necessity of part (i), let V be an open set relative to {Y; ρ},
and let x ∈ V. Then there is a sphere S′(x; r) ⊂ V (r may depend on x).
Now

V = ∪_{x∈V} S′(x; r) = ∪_{x∈V} [Y ∩ S(x; r)] = Y ∩ [∪_{x∈V} S(x; r)].

By part (ii) of Theorem 5.4.15, U = ∪_{x∈V} S(x; r) is an open set in {X; ρ},
and V = Y ∩ U.

To prove the sufficiency of part (i), let V = Y ∩ U, where U is an open
subset of X. Let x ∈ V. Then x ∈ U, and hence there is a sphere S(x; r) ⊂ U.
Thus, S′(x; r) = Y ∩ S(x; r) ⊂ Y ∩ U = V. This proves that x is an interior point of V and that V is an open subset of Y.

The proof of part (ii) of the theorem is left as an exercise. ■

5.4.21. Exercise. Prove part (ii) of Theorem 5.4.20.

The first part of the preceding theorem may be stated in another equivalent
way. Let 𝒯 and 𝒯′ be the topologies of {X; ρ} and {Y; ρ}, respectively, generated
by ρ. Then 𝒯′ = {Y ∩ U: U ∈ 𝒯}.

Let us now consider some specific examples.


5.4.22. Example. Let X = R, and let ρ be the usual metric on R; i.e.,
ρ(x, y) = |x − y|. Any set Y = (a, b) = {x: a < x < b} is an open subset
of X. We call (a, b) an open interval on R. ■
5.4.23. Example. We now show that the word "finite" is crucial in part
(iii) of Theorem 5.4.15. Let {R; ρ} denote again the real line with the usual
metric, and let a < b. If Yₙ = {x ∈ R: a < x < b + 1/n}, then for each
positive integer n, Yₙ is an open subset of the real line. However, the set

∩_{n=1}^∞ Yₙ = {x ∈ R: a < x ≤ b} = (a, b]

is not an open subset of R. (This can readily be verified, since every sphere
S(b; r) contains a point greater than b and hence is not contained in
∩_{n=1}^∞ Yₙ.) ■

In the above example, let Y = (a, b]. We saw that Y is not an open subset
of R; i.e., b is not an interior point of Y. However, if we were to consider
{Y; ρ} as a metric space by itself, then Y is an open set.
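The failure of openness at the point b in Example 5.4.23 can be checked numerically. The following is a minimal sketch; the particular endpoints a, b and the helper names are our own choices, not the text's:

```python
# Numerical illustration of Example 5.4.23: each Y_n = (a, b + 1/n) is open,
# but the intersection over all n is (a, b], which is not open at b.
a, b = 0.0, 1.0

def in_Yn(x, n):
    """Membership in the open set Y_n = {x : a < x < b + 1/n}."""
    return a < x < b + 1.0 / n

def in_intersection(x, n_max=10**6):
    """Membership in the finite intersection Y_1 ∩ ... ∩ Y_{n_max}, which
    approximates (a, b] as n_max grows.  Because the Y_n are nested and
    decreasing, membership in the smallest set Y_{n_max} suffices."""
    return in_Yn(x, n_max)

# b belongs to every Y_n ...
all_contain_b = all(in_Yn(b, n) for n in range(1, 1000))

# ... yet b is not an interior point of (a, b]: every sphere S(b; r)
# contains the point b + r/2, which lies outside (a, b].
def sphere_escapes(r):
    return not (a < b + r / 2 <= b)
```

The same style of check shows that any finite intersection of the Yₙ is still open at b, which is exactly why "finite" matters in part (iii) of Theorem 5.4.15.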
5.4.24. Example. Let {𝒞[a, b]; ρ∞} denote the metric space of Example
5.3.14. Let γ be an arbitrary finite positive number. Then the set of continuous
functions satisfying the condition |x(t)| < γ for all a ≤ t ≤ b is an open
subset of the metric space {𝒞[a, b]; ρ∞}. ■
Theorems 5.4.15 and 5.4.17 tell us that the sets X and ∅ are both open
and closed in any metric space. In some metric spaces there may be proper
subsets of X which are both open and closed, as illustrated in the following
example.
5.4.25. Example. Let X be the set of real numbers given by X = (−2,
−1) ∪ (+1, +2), and let ρ(x, y) = |x − y| for x, y ∈ X. Then {X; ρ} is
clearly a metric space. Let Y = (−2, −1) ⊂ X and Z = (+1, +2) ⊂ X.
Note that both Y and Z are open subsets of X. However, Y~ = Z, Z~ = Y,
and thus Y and Z are also closed subsets of X. Therefore, Y and Z are proper
subsets of the metric space {X; ρ} which are both open and closed. (Note that
in the preceding we are not viewing X as a subset of R. As such, X would be
open. Considering {X; ρ} as our metric space, X is both open and closed.) ■
5.4.26. Exercise. Let {X; ρ} be a metric space with ρ the discrete metric
defined in Example 5.1.7. Show that every subset of X is both open and
closed.
In our next result we summarize several important properties of closed
sets.


5.4.27. Theorem.

(i) Every subset of X consisting of a finite number of elements is closed.
(ii) Let x₀ ∈ X, let r > 0, and let K(x₀; r) = {x ∈ X: ρ(x, x₀) ≤ r}.
Then K(x₀; r) is closed.
(iii) A subset Y ⊂ X is closed if and only if Ȳ ⊂ Y.
(iv) A subset Y ⊂ X is closed if and only if Y′ ⊂ Y.
(v) Let {Y_α}_{α∈A} be any family of closed sets in X. Then ∩_{α∈A} Y_α
is closed.
(vi) The union of a finite number of closed sets in X is closed.
(vii) The closure of a subset Y of X is the intersection of all closed sets
containing Y.

Proof. Only the proof of part (v) is given. Let {Y_α}_{α∈A} be any family of
closed subsets of X. Then {Y_α~}_{α∈A} is a family of open sets. Now

(∩_{α∈A} Y_α)~ = ∪_{α∈A} Y_α~

is an open set, and hence ∩_{α∈A} Y_α is a closed subset of X. ■

5.4.28. Exercise. Prove parts (i) to (iv), (vi), and (vii) of Theorem 5.4.27.

We now consider several specific examples of closed sets.


5.4.29. Example. Let X = R, and let ρ be the usual metric, ρ(x, y)
= |x − y|. Any set Y = {x ∈ R: a ≤ x ≤ b}, where a < b, is a closed subset
of R. We call Y a closed interval on R and denote it by [a, b]. ■
5.4.30. Example. We now show that the word "finite" is essential in part
(vi) of Theorem 5.4.27. Let {R; ρ} denote the real line with the usual metric,
and let a > 0. If Yₙ = {x ∈ R: 1/n ≤ x ≤ a} for each positive integer n,
then Yₙ is a closed subset of the real line. However, the set

∪_{n=1}^∞ Yₙ = {x ∈ R: 0 < x ≤ a} = (0, a]

is not a closed subset of the real line, as can readily be verified, since 0 is an
adherent point of (0, a]. ■

5.4.31. Exercise. The set K(x₀; r) defined in part (ii) of Theorem 5.4.27
is sometimes called a closed sphere. It need not coincide with S̄(x₀; r), i.e.,
the closure of the open sphere S(x₀; r).

(i) Show that S̄(x₀; r) ⊂ K(x₀; r).
(ii) Let {X; ρ} be the discrete metric space defined in Example 5.1.7.
Describe the sets S(x; 1), S̄(x; 1), and K(x; 1) for any x ∈ X, and conclude
that, in general, S̄(x; 1) ≠ K(x; 1) if X contains more than one point.
(iii) Let X = (−∞, 0] ∪ J, where J denotes the set of positive integers,
and let ρ(x, y) = |x − y|. Describe S(0; 1), S̄(0; 1), and K(0; 1),
and conclude that S̄(0; 1) ≠ K(0; 1).

We are now in a position to introduce certain additional concepts which
are important in analysis and applications.

5.4.32. Definition. Let Y and Z be subsets of X. The set Y is said to be
dense in Z (or dense with respect to Z) if Ȳ ⊃ Z. The set Y is said to be
everywhere dense in {X; ρ} (or simply, everywhere dense in X) if Ȳ = X.
If the exterior of Ȳ is everywhere dense in X, then Y is said to be nowhere
dense in X. A subset Y of X is said to be dense-in-itself if every point of Y
is a limit point of Y. A subset Y of X which is both closed and dense-in-itself
is called a perfect set.
5.4.33. Definition. A metric space {X; ρ} is said to be separable if there
is a countable subset Y in X which is everywhere dense in X.

The following result enables us to characterize separable metric spaces
in an equivalent way. We have:

5.4.34. Theorem. A metric space {X; ρ} is separable if and only if there
is a countable set S = {x₁, x₂, ...} ⊂ X such that for every x ∈ X and for
every given ε > 0 there is an xₙ ∈ S such that ρ(x, xₙ) < ε.

5.4.35. Exercise. Prove Theorem 5.4.34.

Let us now consider some specific cases.


5.4.36. Example. The real line with the usual metric is a separable space.
As we saw in Example 5.4.9, if Q is the set of rational numbers, then Q̄ = R. ■

5.4.37. Example. Let {Rⁿ; ρₚ} be the metric space defined in Example
5.3.1 (recall that 1 ≤ p < ∞). The set of vectors x = (ξ₁, ..., ξₙ) with
rational coordinates (i.e., ξᵢ is a rational real number, i = 1, ..., n) is a
denumerable everywhere dense set in Rⁿ and, therefore, {Rⁿ; ρₚ} is a separable
metric space. ■

5.4.38. Example. Let {lₚ; ρₚ} be the metric space defined in Example 5.3.5
(recall that 1 ≤ p < ∞). We can show that this space is separable in the
following manner. Let

Y = {y ∈ lₚ: y = (η₁, ..., ηₙ, 0, 0, ...) for some n,
where ηᵢ is a rational real number, i = 1, ..., n}.


Then Y is a countable subset of lₚ. To show that it is everywhere dense, let
ε > 0 and let x ∈ lₚ, where x = (ξ₁, ξ₂, ...). Choose n sufficiently large so
that

Σ_{k=n+1}^∞ |ξₖ|^p < (ε/2)^p.

We can now find a y = (η₁, ..., ηₙ, 0, 0, ...) ∈ Y such that

Σ_{k=1}^n |ξₖ − ηₖ|^p < (ε/2)^p.

Hence,

ρₚ(x, y) = [Σ_{k=1}^∞ |ξₖ − ηₖ|^p]^{1/p} ≤ [(ε/2)^p + (ε/2)^p]^{1/p} = 2^{1/p}(ε/2) ≤ ε;

i.e., ρₚ(x, y) ≤ ε. By Theorem 5.4.34, {lₚ; ρₚ} is separable. ■
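The approximation step of Example 5.4.38 can be imitated in exact arithmetic. The sketch below works in l₂ with the particular element x = (1/2, 1/4, 1/8, ...), represented by its first 20 coordinates; since these coordinates are already rational, truncation alone produces the finitely nonzero rational approximant (the sequence, the length 20, and the helper names are our own choices):

```python
from fractions import Fraction

# Approximate x = (1/2^k) in l_2 by finitely nonzero sequences with
# rational coordinates, as in Example 5.4.38: truncate after n terms.
def l2_dist_sq(x, y):
    """Exact squared l_2 distance between two finitely supported sequences."""
    n = max(len(x), len(y))
    xs = list(x) + [Fraction(0)] * (n - len(x))
    ys = list(y) + [Fraction(0)] * (n - len(y))
    return sum((a - b) ** 2 for a, b in zip(xs, ys))

x = [Fraction(1, 2 ** k) for k in range(1, 21)]   # first 20 coordinates of x

def truncation(n):
    return x[:n]                                   # (xi_1, ..., xi_n, 0, 0, ...)

# The squared tail sum_{k>n} 1/4^k = (1/4^n)/3 tends to 0, so the
# truncations approach x in the l_2 metric, exactly as in the example.
```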

In order to establish the separability of the space of continuous functions,
it is necessary to use the Weierstrass approximation theorem, which we state
without proof.

5.4.39. Theorem. Let 𝒞[a, b] be the space of real continuous functions
on the interval [a, b], and let 𝒫(t) be the family of all polynomials (defined
on [a, b]). Let ε > 0, and let x ∈ 𝒞[a, b]. Then there is a p ∈ 𝒫(t) such that

sup_{a≤t≤b} |x(t) − p(t)| < ε.
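One standard constructive route to Theorem 5.4.39 uses Bernstein polynomials. The sketch below approximates the non-polynomial function x(t) = |t − 1/2| on [0, 1]; the target function, the evaluation grid, and the chosen degrees are our own illustrative choices, not part of the theorem:

```python
from math import comb

def bernstein(f, n, t):
    """Value at t of the degree-n Bernstein polynomial of f on [0, 1]:
    B_n(f)(t) = sum_k f(k/n) * C(n, k) * t^k * (1 - t)^(n - k)."""
    return sum(f(k / n) * comb(n, k) * t**k * (1 - t)**(n - k)
               for k in range(n + 1))

def sup_error(f, n, grid=200):
    """Grid estimate of sup_{0<=t<=1} |f(t) - B_n(f)(t)|."""
    return max(abs(f(i / grid) - bernstein(f, n, i / grid))
               for i in range(grid + 1))

f = lambda t: abs(t - 0.5)
errors = [sup_error(f, n) for n in (4, 16, 64)]
# The sup-norm error shrinks as the degree grows, as Theorem 5.4.39 promises.
```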

5.4.40. Exercise. Using the Weierstrass approximation theorem, show
that the metric spaces {𝒞[a, b]; ρₚ}, defined in Example 5.3.12, and {𝒞[a, b];
ρ∞}, defined in Example 5.3.14, are separable.

5.4.41. Exercise. Show that the metric space {X; ρ}, where ρ is the discrete
metric defined in Example 5.1.7, is separable if and only if X is a countable
set.

We conclude the present section by considering an example of a metric
space which is not separable.

5.4.42. Example. Let {l∞; ρ∞} be the metric space defined in Example
5.3.8. Let Y ⊂ l∞ denote the set

Y = {y ∈ l∞: y = (η₁, η₂, ...), where ηᵢ = 0 or 1}.

Now for every real number α ∈ [0, 1], there is a y ∈ Y such that

α = Σ_{i=1}^∞ ηᵢ/2^i,

where y = (η₁, η₂, ...). Thus, Y is an uncountable set. Notice now that for
every y₁, y₂ ∈ Y with y₁ ≠ y₂, ρ∞(y₁, y₂) = 1. That is, ρ∞ restricted to Y is
the discrete metric. It follows from Exercise 5.4.41 that Y cannot be separable
and, consequently, {l∞; ρ∞} is not separable. ■
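The key fact in Example 5.4.42, that distinct 0-1 sequences sit at sup-distance exactly 1 from one another, is easy to verify on finite truncations; a sketch (the truncation length n = 8 is our choice):

```python
from itertools import product

# All 0-1 tuples of length n stand at sup-distance exactly 1 from each
# other.  Consequently the spheres S(y; 1/2) around the (uncountably many)
# 0-1 sequences in l_inf are pairwise disjoint, and no countable set can
# meet them all, which is why l_inf fails to be separable.
def rho_inf(y1, y2):
    return max(abs(a - b) for a, b in zip(y1, y2))

n = 8
Y = list(product((0, 1), repeat=n))          # 2**n distinct truncated points
distances = {rho_inf(y1, y2)
             for y1 in Y for y2 in Y if y1 != y2}
```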


5.5. COMPLETE METRIC SPACES

The set of real numbers R with the usual metric ρ defined on it has many
remarkable properties, several of which are attributable to the so-called
"completeness property" of this space. For this reason we speak of {R; ρ}
as being a complete metric space. In the present section we consider general
complete metric spaces.

Throughout this section, {X; ρ} is our underlying metric space, and J denotes
the set of positive integers. Before considering the completeness of metric
spaces we need to consider a few facts about sequences on metric spaces (cf.
Definition 1.1.25).
5.5.1. Definition. A sequence {xₙ} in a set Y ⊂ X is a function f: J → Y.
Thus, if {xₙ} is a sequence in Y, then f(n) = xₙ for each n ∈ J.

5.5.2. Definition. Let {xₙ} be a sequence of points in X, and let x be a
point of X. The sequence {xₙ} is said to converge to x if for every ε > 0
there is an integer N such that for all n ≥ N, ρ(x, xₙ) < ε (i.e., xₙ ∈ S(x; ε)
for all n ≥ N). In general, N depends on ε; i.e., N = N(ε). We call x the limit
of {xₙ}, and we usually write

lim_{n→∞} xₙ = x,

or xₙ → x as n → ∞. If there is no x ∈ X to which the sequence converges,
then we say that {xₙ} diverges.

Thus, xₙ → x if and only if the sequence of real numbers {ρ(xₙ, x)}
converges to zero. In view of the above definition we note that for every
ε > 0 there is a finite number N such that all terms of {xₙ} except the first
(N − 1) terms must lie in the sphere with center x and radius ε. Hence, the
convergence of a sequence depends on the infinite number of terms {x_{N+1},
x_{N+2}, ...}, and no amount of alteration of a finite number of terms of a
divergent sequence can make it converge. Moreover, if a convergent sequence
is changed by omitting or adding a finite number of terms, then the resulting
sequence is still convergent to the same limit as the original sequence.

Note that in Definition 5.5.2 we called x the limit of the sequence {xₙ}.
We will show that if {xₙ} has a limit in X, then that limit is unique.
5.5.3. Definition. Let {xₙ} be a sequence of points in X, where f(n) = xₙ
for each n ∈ J. If the range of f is bounded, then {xₙ} is said to be a bounded
sequence.

The range of f in the above definition may consist of a finite number of
points or of an infinite number of points. Specifically, if the range of f
consists of one point, then we speak of a constant sequence. Clearly, all
constant sequences are convergent.

5.5.4. Example. Let {R; ρ} denote the set of real numbers with the usual
metric. If n ∈ J, then the sequence {n²} diverges and is unbounded, and the
range of this sequence is an infinite set. The sequence {(−1)ⁿ} diverges, is
bounded, and its range is a finite set. The sequence {a + (−1)ⁿ/n} converges
to a, is bounded, and its range is an infinite set. ■

5.5.5. Definition. Let {xₙ} be a sequence in X. Let n₁, n₂, ..., n_k, ... be
a sequence of positive integers which is strictly increasing; i.e., n_j > n_k for
all j > k. Then the sequence {x_{n_k}} is called a subsequence of {xₙ}. If the
subsequence {x_{n_k}} converges, then its limit is called a subsequential limit
of {xₙ}.

It turns out that many of the important properties of convergence on R
can be extended to the setting of arbitrary metric spaces. In the next result
several of these properties are summarized.

5.5.6. Theorem. Let {xₙ} be a sequence in X. Then

(i) there is at most one point x ∈ X such that lim_n xₙ = x;
(ii) if {xₙ} is convergent, then it is bounded;
(iii) {xₙ} converges to a point x ∈ X if and only if every sphere about x
contains all but a finite number of terms in {xₙ};
(iv) {xₙ} converges to a point x ∈ X if and only if every subsequence
of {xₙ} converges to x;
(v) if {xₙ} converges to x ∈ X and if y ∈ X, then lim_n ρ(xₙ, y) = ρ(x, y);
(vi) if {xₙ} converges to x ∈ X and if the sequence {yₙ} of X converges
to y ∈ X, then lim_n ρ(xₙ, yₙ) = ρ(x, y); and
(vii) if {xₙ} converges to x ∈ X, and if there is a y ∈ X and a γ > 0 such
that ρ(xₙ, y) ≤ γ for all n ∈ J, then ρ(x, y) ≤ γ.

Proof. To prove part (i), assume that x, y ∈ X and that lim_n xₙ = x and
lim_n xₙ = y. Then for every ε > 0 there are positive integers Nₓ and N_y such
that ρ(xₙ, x) < ε/2 whenever n ≥ Nₓ and ρ(xₙ, y) < ε/2 whenever n ≥ N_y.
If we let N = max(Nₓ, N_y), then it follows that

ρ(x, y) ≤ ρ(x, x_N) + ρ(x_N, y) < ε/2 + ε/2 = ε,

where ε is any positive number. Since the only non-negative number which
is less than every positive number is zero, it follows that ρ(x, y) = 0 and
therefore x = y.
To prove part (iii), assume that lim xₙ = x and let S(x; ε) be any sphere
about x. Then there is a positive integer N such that the only terms of the
sequence {xₙ} which are possibly not in S(x; ε) are the terms x₁, x₂, ..., x_{N−1}.
Conversely, assume that every sphere about x contains all but a finite number
of terms from the sequence {xₙ}. With ε > 0 specified, let M = max{n ∈ J:
xₙ ∉ S(x; ε)}. If we set N = M + 1, then xₙ ∈ S(x; ε) for all n ≥ N, which
was to be shown.
To prove part (v), we note from Theorem 5.1.13 that

|ρ(y, xₙ) − ρ(y, x)| ≤ ρ(x, xₙ).

By hypothesis, lim xₙ = x. Therefore, lim ρ(x, xₙ) = 0, and so
lim |ρ(y, xₙ) − ρ(y, x)| = 0; i.e., lim ρ(y, xₙ) = ρ(y, x).

Finally, to prove part (vii), suppose to the contrary that ρ(x, y) > γ.
Then δ = ρ(x, y) − γ > 0. Now γ − ρ(xₙ, y) ≥ 0 for all n ∈ J, and thus

0 < δ ≤ ρ(x, y) − ρ(xₙ, y) ≤ ρ(x, xₙ)

for all n ∈ J. But this is impossible, since lim xₙ = x. Thus, ρ(x, y) ≤ γ.

We leave the proofs of the remaining parts as an exercise. ■

5.5.7. Exercise. Prove parts (ii), (iv), and (vi) of Theorem 5.5.6.
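The inequality driving part (v), |ρ(y, xₙ) − ρ(y, x)| ≤ ρ(x, xₙ), can be spot-checked numerically in the plane with the Euclidean metric. The following sketch is ours; the points and the sequence chosen are illustrative only:

```python
from math import hypot

# Spot-check of Theorem 5.5.6(v) in R^2 with the Euclidean metric:
# if x_n -> x, then rho(x_n, y) -> rho(x, y), with the explicit bound
# |rho(y, x_n) - rho(y, x)| <= rho(x, x_n) (cf. Theorem 5.1.13).
def rho(p, q):
    return hypot(p[0] - q[0], p[1] - q[1])

x, y = (1.0, 2.0), (4.0, 6.0)                 # rho(x, y) = 5 (3-4-5 triangle)
xn = [(1.0 + 1.0 / n, 2.0 - 1.0 / n) for n in range(1, 200)]

bound_ok = all(abs(rho(y, p) - rho(y, x)) <= rho(x, p) + 1e-12 for p in xn)
final_gap = abs(rho(y, xn[-1]) - rho(y, x))   # shrinks like rho(x, x_n)
```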

In Definition 5.4.5, we introduced the concept of limit point of a set
Y ⊂ X. In Definition 5.5.2, we defined the limit of a sequence of points,
{xₙ}, in X. These two concepts are closely related; however, the reader should
carefully note the distinction between the two. The limit point of a set is
strictly a property of the set itself. On the other hand, a sequence is not a set.
Furthermore, the elements of a sequence are ordered and not necessarily
distinct, while the elements of a set are not ordered but are distinct. However,
the range of a sequence is a subset of X. We now give a result relating these
concepts.

5.5.8. Theorem. Let Y be a subset of X. Then

(i) x ∈ X is an adherent point of Y if and only if there is a sequence
{yₙ} in Y (i.e., yₙ ∈ Y for all n) such that lim yₙ = x;
(ii) x ∈ X is a limit point of the set Y if and only if there is a sequence
{yₙ} of distinct points in Y such that lim yₙ = x; and
(iii) Y is closed if and only if for every convergent sequence {yₙ}, such
that yₙ ∈ Y for all n, lim yₙ = x ∈ Y.


Proof. To prove part (i), assume that lim yₙ = x. Then every sphere about
x contains at least one term of the sequence {yₙ} and, since every term of
{yₙ} is a point of Y, it follows that x is an adherent point of Y. Conversely,
assume that x is an adherent point of Y. Then every sphere about x contains
at least one point of Y. Now let us choose for each positive integer n a point
yₙ ∈ Y such that yₙ ∈ S(x; 1/n). Then it follows readily that the sequence
{yₙ} chosen in this fashion converges to x. Specifically, if ε > 0 is given,
then we choose a positive integer N such that 1/N < ε. Then for every n ≥ N
we have yₙ ∈ S(x; 1/n) ⊂ S(x; ε). This concludes the proof of part (i).

To prove part (ii), assume that x is a limit point of the set Y. Then every
sphere S(x; 1/n) contains an infinite number of points of Y, and so we can choose
a yₙ ∈ S(x; 1/n) such that yₙ ≠ y_m for all m < n. The sequence {yₙ} consists
of distinct points and converges to x. Conversely, if {yₙ} is a sequence of
distinct points convergent to x and if S(x; ε) is any sphere with center at x,
then by definition of convergence there is an N such that for all n ≥ N,
yₙ ∈ S(x; ε). That is, there are infinitely many points of Y in S(x; ε), and
hence x is a limit point of Y.

To prove part (iii), assume that Y is closed and let {yₙ} be a convergent
sequence with yₙ ∈ Y for all n and lim yₙ = x. We want to show that x ∈ Y.
By part (i), x must be an adherent point of Y. Since Y is closed, x ∈ Y.
Next, we prove the converse. Let x be an adherent point of Y. Then by part
(i), there is a sequence {yₙ} in Y such that lim yₙ = x. By hypothesis, we must
have x ∈ Y. Since Y contains all of its adherent points, it must be closed. ■

Statement (iii) of Theorem 5.5.8 is often used as an alternate way of
defining a closed set.

The next theorem provides us with conditions under which a sequence is
convergent in a product metric space.
5.5.9. Theorem. Let {X; ρₓ} and {Y; ρ_y} be two metric spaces, let Z = X
× Y, let ρ be any of the metrics defined on Z in Theorem 5.3.19, and let
{Z; ρ} denote the product metric space of {X; ρₓ} and {Y; ρ_y}. If z ∈ Z
= X × Y, then z = (x, y), where x ∈ X and y ∈ Y. Let {xₙ} be a sequence
in X, and let {yₙ} be a sequence in Y. Then,

(i) the sequence {(xₙ, yₙ)} converges in Z if and only if {xₙ} converges in
X and {yₙ} converges in Y; and
(ii) lim (xₙ, yₙ) = (lim xₙ, lim yₙ) whenever this limit exists.

5.5.10. Exercise. Prove Theorem 5.5.9.

In many situations the limit to which a given sequence may converge is
unknown. The following concept enables us to consider the convergence
of a sequence without knowing the limit to which the sequence may converge.

5.5.11. Definition. A sequence {xₙ} of points in a metric space {X; ρ} is
said to be a Cauchy sequence or a fundamental sequence if for every ε > 0
there is an integer N such that ρ(xₙ, x_m) < ε whenever m, n ≥ N.
The next result follows directly from the triangle inequality.

5.5.12. Theorem. Every convergent sequence in a metric space {X; ρ} is
a Cauchy sequence.

Proof. Assume that lim_n xₙ = x. Then for arbitrary ε > 0 we can find an
integer N such that ρ(xₙ, x) < ε/2 and ρ(x_m, x) < ε/2 whenever m, n ≥ N.
In view of the triangle inequality we now have

ρ(xₙ, x_m) ≤ ρ(xₙ, x) + ρ(x_m, x) < ε

whenever m, n ≥ N. This proves the theorem. ■

We emphasize that in an arbitrary metric space {X; ρ} a Cauchy sequence
is not necessarily convergent.

5.5.13. Theorem. Let {xₙ} be a Cauchy sequence. Then {xₙ} is a bounded
sequence.

Proof. We need to show that there is a constant γ such that 0 < γ < ∞ and
such that ρ(x_m, xₙ) ≤ γ for all m, n ∈ J.

Letting ε = 1, we can find N such that ρ(x_m, xₙ) < 1 whenever m, n ≥ N.
Now let λ = max{ρ(x₁, x₂), ρ(x₁, x₃), ..., ρ(x₁, x_N)}. Then, by the triangle
inequality,

ρ(x₁, xₙ) ≤ ρ(x₁, x_N) + ρ(x_N, xₙ) < λ + 1

if n ≥ N. Thus, for all n ∈ J, ρ(x₁, xₙ) ≤ λ + 1. Again, by the triangle
inequality,

ρ(x_m, xₙ) ≤ ρ(x_m, x₁) + ρ(x₁, xₙ) ≤ 2(λ + 1)

for all m, n ∈ J. Thus, ρ(x_m, xₙ) ≤ 2(λ + 1), and {xₙ} is a bounded
sequence. ■

We also have:

5.5.14. Theorem. If a Cauchy sequence {xₙ} contains a convergent subsequence {x_{n_k}}, then the sequence {xₙ} is convergent.

5.5.15. Exercise. Prove Theorem 5.5.14.

We now give the definition of complete metric space.

5.5.16. Definition. If every Cauchy sequence in a metric space {X; ρ} converges to an element in X, then {X; ρ} is said to be a complete metric space.


Complete metric spaces are of utmost importance in analysis and applications. We will have occasion to make extensive use of the properties of such
spaces in the remainder of this book.

5.5.17. Example. Let X = (0, 1), and let ρ(x, y) = |x − y| for all x, y ∈ X.
Let xₙ = 1/n for n ∈ J. Then the sequence {xₙ} is Cauchy, since
|xₙ − x_m| < 1/N for all n, m ≥ N. Since there is no x ∈ X to which {xₙ}
converges, the metric space {X; ρ} is not complete. ■

5.5.18. Example. Let X = Q, the set of rational numbers, and let ρ(x, y)
= |x − y| for all x, y ∈ X. Let

xₙ = 1 + 1/1! + 1/2! + ... + 1/n!

for n ∈ J. The sequence {xₙ} is Cauchy. Since there is no limit in Q to which
{xₙ} converges (in R this sequence converges to the irrational number e), the
metric space {Q; ρ} is not complete. ■

5.5.19. Example. Let R# = R − {0}, and let ρ(x, y) = |x − y| for all
x, y ∈ R#. Let xₙ = 1/n, n ∈ J. The sequence {xₙ} is Cauchy; however, it
does not converge to a limit in R#. Thus, {R#; ρ} is not complete. Some
further comments are in order here. If we view R# as a subset of R in the
metric space {R; ρ} (ρ denotes the usual metric on R), then the sequence {xₙ}
converges to zero; i.e., lim xₙ = 0. By Theorem 5.5.8, R# cannot be a closed
subset of R. However, R# is a closed subset of the metric space {R#; ρ},
since it is the whole space. There is no contradiction here to Theorem 5.5.8,
for the sequence {xₙ} does not converge to a limit in R#. Specifically, Theorem
5.5.8 states that if a sequence does converge to a limit, then the limit must
belong to the space. The requirement for completeness is that every Cauchy
sequence must converge to an element in the space. ■
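The sequence of Example 5.5.18 can be computed exactly with rational arithmetic. The sketch below uses Python's Fraction; the tail bound 2/(n+1)! is a standard estimate for the remainder of the series for e, not something taken from the text:

```python
from fractions import Fraction
from math import factorial

# x_n = 1 + 1/1! + ... + 1/n! is a sequence of rational numbers.  It is
# Cauchy: for m > n, |x_m - x_n| <= 2/(n+1)!.  Its limit in R is e,
# which is irrational, so {x_n} has no limit in {Q; rho}.
def x(n):
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

gap = x(20) - x(10)                  # tiny: the sequence is Cauchy
tail_bound = Fraction(2, factorial(11))
approx_e = float(x(20))              # agrees with e to double precision
```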
We now consider several specific examples of important complete metric
spaces.

5.5.20. Example. Let ρ denote the usual metric on R, the set of real
numbers. The completeness of {R; ρ} is one of the fundamental results of
analysis. ■

5.5.21. Example. Let {X; ρₓ} and {Y; ρ_y} be arbitrary complete metric
spaces. If Z = X × Y and if z ∈ Z, then z = (x, y), where x ∈ X and y ∈ Y
(see Theorem 5.3.19). Define

ρ₂(z₁, z₂) = ρ₂((x₁, y₁), (x₂, y₂)) = {[ρₓ(x₁, x₂)]² + [ρ_y(y₁, y₂)]²}^{1/2}.

It can readily be shown that the metric space {Z; ρ₂} is complete. ■

5.5.22. Exercise. Verify the completeness of {Z; ρ₂} in the above example.

5.5.23. Example. Let ρ be the usual metric defined on C, the set of
complex numbers. Utilizing Example 5.5.21 along with the completeness of
{R; ρ} (see Example 5.5.20), we can readily show that {C; ρ} is a complete
metric space. ■

5.5.24. Exercise. Verify the completeness of {C; ρ}.

5.5.25. Exercise. Let X = Rⁿ (let X = Cⁿ) denote the set of all real (of
all complex) ordered n-tuples x = (ξ₁, ..., ξₙ). Let y = (η₁, ..., ηₙ), let

ρₚ(x, y) = [Σ_{i=1}^n |ξᵢ − ηᵢ|^p]^{1/p}, 1 ≤ p < ∞,

and let

ρ∞(x, y) = max{|ξ₁ − η₁|, ..., |ξₙ − ηₙ|}; i.e., p = ∞.

Utilizing the completeness of the real line (of the complex plane), show that
{Rⁿ; ρₚ} = Rₚⁿ ({Cⁿ; ρₚ} = Cₚⁿ) is a complete metric space for 1 ≤ p ≤ ∞.
In particular, show that if {x_k} is a Cauchy sequence in Rₚⁿ (in Cₚⁿ), where
x_k = (ξ₁^(k), ..., ξₙ^(k)), then {ξⱼ^(k)} is a Cauchy sequence in R (in C) for j = 1,
..., n, and {x_k} converges to x, where x = (ξ₁, ..., ξₙ) and ξⱼ = lim_k ξⱼ^(k)
for j = 1, ..., n.

5.5.26. Example. Let {lₚ; ρₚ} be the metric space defined in Example
5.3.5. We now show that this space is a complete metric space.

Let {x_k} be a Cauchy sequence in lₚ, where x_k = (ξ₁^(k), ξ₂^(k), ...). Let
ε > 0. Then there is an N ∈ J such that

ρₚ(x_k, x_j) = [Σ_{m=1}^∞ |ξ_m^(k) − ξ_m^(j)|^p]^{1/p} < ε

for all k, j ≥ N. This implies that |ξ_m^(k) − ξ_m^(j)| < ε for every m ∈ J and all
k, j ≥ N. Thus, {ξ_m^(k)} is a Cauchy sequence in R for every m ∈ J, and hence
{ξ_m^(k)} is convergent to some limit, say lim_k ξ_m^(k) = ξ_m for m ∈ J. Now let
x = (ξ₁, ξ₂, ..., ξ_m, ...). We want to show that (i) x ∈ lₚ and (ii) lim_k x_k = x.

Since {x_k} is a Cauchy sequence, we know by Theorem 5.5.13 that there
exists a γ > 0 such that

ρₚ(0, x_k) = [Σ_{m=1}^∞ |ξ_m^(k)|^p]^{1/p} ≤ γ

for all k ∈ J. Now let n be any positive integer, let ρₚⁿ be the metric on Rⁿ
defined in Exercise 5.5.25, and let x_k^n = (ξ₁^(k), ..., ξₙ^(k)). Then ρₚⁿ(x_k^n,
x_j^n) ≤ ρₚ(x_k, x_j), and thus {x_k^n} is a Cauchy sequence in Rₚⁿ. It also follows that
ρₚⁿ(0, x_k^n) ≤ γ for all k ∈ J. Now by Exercise 5.5.25, {x_k^n} converges to x^n,
where x^n = (ξ₁, ..., ξₙ). It follows from Theorem 5.5.6, part (vii), that
ρₚⁿ(0, x^n) ≤ γ; i.e.,

[Σ_{m=1}^n |ξ_m|^p]^{1/p} ≤ γ.

Since this must hold for all n ∈ J, it follows that x ∈ lₚ. To show that lim_k x_k
= x, let ε > 0. Then there is an integer N such that ρₚ(x_j, x_k) < ε for all
k, j ≥ N. Again, let n be any positive integer. Then we have ρₚⁿ(x_j^n, x_k^n)
< ε for all j, k ≥ N. For fixed n, we conclude from Theorem 5.5.6, part (vii),
that ρₚⁿ(x^n, x_k^n) ≤ ε for all k ≥ N. Hence,

[Σ_{m=1}^n |ξ_m − ξ_m^(k)|^p]^{1/p} ≤ ε

for all k ≥ N, where N depends only on ε (and not on n). Since this must
hold for all n ∈ J, we conclude that ρₚ(x, x_k) ≤ ε for all k ≥ N. This implies
that lim_k x_k = x. ■

5.5.27. Exercise. Show that the discrete metric space of Example 5.1.7 is complete.

5.5.28. Example. Let {C[a, b]; ρ_∞} be the metric space defined in Example 5.3.14. Thus, C[a, b] is the set of all continuous functions on [a, b] and

    ρ_∞(x, y) = sup_{a ≤ t ≤ b} |x(t) − y(t)|.

We now show that {C[a, b]; ρ_∞} is a complete metric space. If {x_n} is a Cauchy sequence in C[a, b], then for each ε > 0 there is an N such that |x_m(t) − x_n(t)| < ε whenever m, n ≥ N for all t ∈ [a, b]. Thus, for fixed t, the sequence {x_n(t)} converges to, say, x_0(t). Since t is arbitrary, the sequence of functions {x_n(·)} converges pointwise to a function x_0(·). Also, since N = N(ε) is independent of t, the sequence {x_n(·)} converges uniformly to x_0(·). Now from the calculus we know that if a sequence of continuous functions {x_n(·)} converges uniformly to a function x_0(·), then x_0(·) is continuous. Therefore, every Cauchy sequence in {C[a, b]; ρ_∞} converges to an element in this space in the sense of the metric ρ_∞. Therefore, the metric space {C[a, b]; ρ_∞} is complete. ■
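A hedged illustration of the argument above: a sequence that is Cauchy in the sup metric converges uniformly, and the uniform limit is again continuous. The sketch below approximates the sup metric by sampling and uses the partial sums of the exponential series as the Cauchy sequence; both choices are our own, not the text's:

```python
import math

def rho_inf(x, y, ts):
    """Approximate sup-metric on C[a, b] by sampling at the points ts."""
    return max(abs(x(t) - y(t)) for t in ts)

def partial_exp(n):
    """x_n(t) = sum_{k=0}^{n} t^k / k!, a Cauchy sequence in {C[0,1]; rho_inf}."""
    return lambda t: sum(t ** k / math.factorial(k) for k in range(n + 1))

ts = [i / 200.0 for i in range(201)]
# The sup-distances shrink as the indices grow (the sequence is Cauchy), and
# the uniform limit exp(t) is again continuous, illustrating completeness.
d_5_10 = rho_inf(partial_exp(5), partial_exp(10), ts)
d_10_20 = rho_inf(partial_exp(10), partial_exp(20), ts)
d_to_limit = rho_inf(partial_exp(20), math.exp, ts)
print(d_5_10, d_10_20, d_to_limit)
```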

5.5.29. Example. Let {C[a, b]; ρ_2} be the metric space defined in Example 5.3.12, with p = 2; i.e.,

    ρ_2(x, y) = {∫_a^b [x(t) − y(t)]² dt}^{1/2}.

We now show that this metric space is not complete. Without loss of generality let the closed interval be [−1, 1]. In particular, consider the sequence {x_n} of continuous functions defined by

    x_n(t) = 0,  −1 ≤ t ≤ 0;   nt,  0 ≤ t ≤ 1/n;   1,  1/n ≤ t ≤ 1,

n = 1, 2, …. This sequence is depicted pictorially in Figure F.

5.5.30. Figure F. Sequence {x_n} for {C[a, b]; ρ_2}. [Graph of the ramp functions x_n(t) for n = 1, 2, 3.]

Now let m > n and note that

    [ρ_2(x_m, x_n)]² = (m − n)² ∫_0^{1/m} t² dt + ∫_{1/m}^{1/n} (1 − nt)² dt = (m − n)²/(3m²n) < 1/(3n) < ε

whenever n > 1/(3ε). Therefore, {x_n} is a Cauchy sequence.
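The closed-form value [ρ_2(x_m, x_n)]² = (m − n)²/(3m²n) can be checked numerically. The following sketch is our own illustration, using a simple midpoint-rule quadrature to compare the integral against the formula for one choice of m > n:

```python
# Numerical check (an illustration, not part of the text's proof) that
# [rho_2(x_m, x_n)]^2 = (m - n)^2 / (3 m^2 n) for the ramps of Example 5.5.29.

def x(n, t):
    """The continuous ramp functions x_n of Example 5.5.29 on [-1, 1]."""
    if t <= 0.0:
        return 0.0
    if t <= 1.0 / n:
        return n * t
    return 1.0

def rho2_squared(m, n, steps=20000):
    """Midpoint-rule approximation of the integral of (x_m - x_n)^2 over [-1, 1]."""
    h = 2.0 / steps
    return sum((x(m, -1.0 + (i + 0.5) * h) - x(n, -1.0 + (i + 0.5) * h)) ** 2
               for i in range(steps)) * h

m, n = 8, 4
exact = (m - n) ** 2 / (3.0 * m * m * n)
print(rho2_squared(m, n), exact)  # the two values agree closely
```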
For purposes of contradiction, let us now assume that {x_n} converges to a continuous function x, where convergence is taken with respect to the metric ρ_2. In other words, assume that

    ∫_{−1}^{1} |x_n(t) − x(t)|² dt → 0 as n → ∞.

This implies that the above integral with any limits between −1 and +1 also approaches zero as n → ∞. Since x_n(t) = 0 whenever t ∈ [−1, 0], we have

    ∫_{−1}^{0} |x_n(t) − x(t)|² dt = ∫_{−1}^{0} |x(t)|² dt → 0 as n → ∞,

independent of n. From this it follows that the continuous function x is such that x(t) = 0 whenever t ∈ [−1, 0]. Now if 0 < a ≤ 1, then choosing n > 1/a, we have

    ∫_a^1 |x_n(t) − x(t)|² dt = ∫_a^1 |1 − x(t)|² dt → 0 as n → ∞.

Since this integral is independent of n it vanishes. Also, since x is continuous it follows that x(t) = 1 for t ≥ a. Since a can be chosen arbitrarily close to zero, we end up with a function x such that

    x(t) = 0,  t ∈ [−1, 0];   1,  t ∈ (0, 1].

Therefore, the Cauchy sequence {x_n} does not converge to a point in C[a, b], and the metric space is not complete. ■
The completeness property of certain metric spaces is an essential and important property which we will use and encounter frequently in the remainder of this book. The preceding example demonstrates that not all metric spaces are complete. However, this space {C[a, b]; ρ_2} is a subspace of a larger metric space which is complete. To discuss this complete metric space (i.e., the completion of {C[a, b]; ρ_2}), it is necessary to make use of the Lebesgue theory of measure and integration. For a thorough treatment of this theory, we refer the reader to the texts by Royden [5.9] and Taylor [5.10]. Although knowledge of this theory is not an essential requirement in the development of the subsequent results in this book, we will want to make reference to certain examples of important metric spaces which are defined in terms of the Lebesgue integral. For this reason, we provide the following heuristic comments for those readers who are unfamiliar with this subject.
The Lebesgue measure space on the real numbers, R, consists of the triple {R, 𝔐, μ}, where 𝔐 is a certain family of subsets of R, called the Lebesgue measurable sets in R, and μ is a mapping, μ: 𝔐 → R*, called Lebesgue measure, which may be viewed as a generalization of the concept of length in R. While it is not possible to characterize 𝔐 without providing additional details concerning the Lebesgue theory, it is quite simple to enumerate several important examples of elements in 𝔐. For instance, 𝔐 contains all intervals of the form (a, b) = {x ∈ R: a < x < b}, [c, d) = {x ∈ R: c ≤ x < d}, (e, f] = {x ∈ R: e < x ≤ f}, [g, h] = {x ∈ R: g ≤ x ≤ h}, as well as all countable unions and intersections of such intervals. It is emphasized that 𝔐 does not include all subsets of R. Now if A ∈ 𝔐 is an interval, then the measure of A, μ(A), is the length of A. For example, if A = [a, b], then μ(A) = b − a. Also, if B is a countable union of disjoint intervals, then μ(B) is the sum of the lengths of the disjoint intervals (this sum may be infinite). Of particular interest are subsets of R having measure zero. Essentially, this means it is possible to "cover" the set with an arbitrarily small subset of R. Thus, every subset of R containing at most a countable number of points has Lebesgue measure equal to zero. For example, the set of rational numbers has Lebesgue measure zero. (There are also uncountable subsets of R having Lebesgue measure zero.)

In connection with the above discussion, we say that a proposition P(x) is true almost everywhere (abbreviated a.e.) if the set S = {x ∈ R: P(x) is not true} has Lebesgue measure zero. For example, two functions f, g: R → R are said to be equal a.e. if the set S = {x ∈ R: f(x) ≠ g(x)} ∈ 𝔐 and if μ(S) = 0.
Let us now consider the integral of real-valued functions defined on the interval [a, b] ⊂ R. It can be shown that a bounded function f: [a, b] → R is Riemann integrable (where the Riemann integral is denoted, as usual, by ∫_a^b f(x) dx) if and only if f is continuous almost everywhere on [a, b]. The class of Riemann integrable functions with a metric defined in the same manner as in Example 5.5.29 (for continuous functions on [a, b]) is not a complete metric space. However, as pointed out before, it is possible to generalize the concept of integral and make it applicable to a class of functions significantly larger than the class of functions which are continuous a.e. In doing so, we must consider the class of measurable functions. Specifically, a function f: R → R is said to be a Lebesgue measurable function if f⁻¹(U) ∈ 𝔐 for every open set U ⊂ R. Now let f be a Lebesgue measurable function which is bounded on the interval [a, b], let M = sup {f(x) = y: x ∈ [a, b]}, and let m = inf {f(x) = y: x ∈ [a, b]}. In the Lebesgue approach to integration, the range of f is partitioned into intervals. (This is in contrast with the Riemann approach, where the domain of f is partitioned in developing the integral.) Specifically, let us divide the range of f into the n parts specified by m = y_0 < y_1 < ⋯ < y_{n−1} < y_n = M, let E_k = {x ∈ R: y_{k−1} < f(x) ≤ y_k} for k = 1, …, n, and let ζ_k be such that y_{k−1} ≤ ζ_k ≤ y_k for k = 1, …, n. The sum

    Σ_{k=1}^{n} ζ_k μ(E_k)

approximates the area under the graph of f, and it can serve as the definition of the integral of f between a and b, after an appropriate limiting process has been performed. Provided that this limit exists, it is called the Lebesgue integral of f over [a, b], and it is denoted by ∫_{[a,b]} f dμ. It can be shown that any bounded function f which is Riemann integrable over [a, b] is Lebesgue integrable over [a, b], and furthermore

    ∫_{[a,b]} f dμ = ∫_a^b f(x) dx.

On the other hand, there are functions which are Lebesgue integrable but not Riemann integrable over [a, b]. For example, consider the function f: [a, b] → R defined by f(x) = 0 if x is rational and f(x) = 1 if x is irrational. This function is so erratic that the Riemann integral does not exist in this case. However, since the interval [a, b] = A ∪ B, where A = {x: f(x) = 1} and B = {x: f(x) = 0}, it follows from the preceding characterization of the Lebesgue integral that

    ∫_{[a,b]} f dμ = 1·μ(A) + 0·μ(B) = b − a.

Let us now consider an important class of complete metric spaces, given in the next example.
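The failure of the Riemann integral for this f can be made concrete: Riemann sums depend on whether the sample points are rational or irrational. In the sketch below we model "rational versus irrational" tags by Python type (Fraction versus float), which is our own device for illustration only:

```python
import math
from fractions import Fraction

# Dirichlet-type function of the discussion: 0 on rationals, 1 on irrationals.
# Assumption: rational sample points are passed as Fraction, irrational as float.
def f(x):
    return 0.0 if isinstance(x, Fraction) else 1.0

n = 1000
# Riemann sum over [0, 1] with rational tags k/n: every term is 0.
sum_rational = sum(f(Fraction(k, n)) / n for k in range(n))
# Riemann sum with irrational tags k/n + sqrt(2)/(2n): every term is 1/n.
sum_irrational = sum(f(k / n + math.sqrt(2) / (2 * n)) / n for k in range(n))
print(sum_rational, sum_irrational)  # 0.0 versus (approximately) 1.0
# No common limit exists, so f is not Riemann integrable; the Lebesgue
# integral, by contrast, is 1 * mu(irrationals in [0, 1]) = 1.
```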


5.5.31. Example. Let p ≥ 1 (p not necessarily an integer), let {R, 𝔐, μ} denote the Lebesgue measure space on the real numbers, and let [a, b] be a subset of R. Let ℒ_p[a, b] denote the family of functions f: R → R which are Lebesgue measurable and such that ∫_{[a,b]} |f|^p dμ exists and is finite. We define an equivalence relation ~ on ℒ_p[a, b] by saying that f ~ g if f(x) = g(x) except on a subset of [a, b] having Lebesgue measure zero. Now denote the family of equivalence classes into which ℒ_p[a, b] is divided by L_p[a, b]. Specifically, let us denote the equivalence class [f] = {g ∈ ℒ_p[a, b]: g ~ f} for f ∈ ℒ_p[a, b]. Then L_p[a, b] = {[f]: f ∈ ℒ_p[a, b]}. Now let X = L_p[a, b] and define ρ_p: X × X → R by

    ρ_p([f], [g]) = [∫_{[a,b]} |f − g|^p dμ]^{1/p}.    (5.5.32)

It can be shown that the value of ρ_p([f], [g]) defined by Eq. (5.5.32) is the same for any f and g in the equivalence classes [f] and [g], respectively. Furthermore, ρ_p satisfies all the axioms of a metric, and as such {L_p[a, b]; ρ_p} is a metric space. One of the important results of the Lebesgue theory is that this space is complete.

It is important to note that the right-hand side of Eq. (5.5.32) cannot be used to define a metric on ℒ_p[a, b], since there are functions f ≠ g such that ∫_{[a,b]} |f − g|^p dμ = 0; however, in the literature the distinction between L_p[a, b] and ℒ_p[a, b] is usually suppressed. That is, we usually write f ∈ L_p[a, b] instead of [f] ∈ L_p[a, b], where f ∈ ℒ_p[a, b]. Finally, in the particular case when p = 2, the space {C[a, b]; ρ_2} of Example 5.5.29 is a subspace of the space {L_2[a, b]; ρ_2}. ■

Before closing the present section we consider some important general properties of complete metric spaces.

5.5.33. Theorem. Let {X; ρ} be a complete metric space, and let {Y; ρ} be a metric subspace of {X; ρ}. Then {Y; ρ} is complete if and only if Y is a closed subset of X.

Proof. Assume that {Y; ρ} is complete. To show that Y is a closed subset of X we must show that Y contains all of its adherent points. Let y be an adherent point of Y; i.e., let y ∈ Ȳ. Then each open sphere S(y; 1/n), n = 1, 2, …, contains at least one point y_n in Y. Since ρ(y_n, y) < 1/n it follows that the sequence {y_n} converges to y. Since {y_n} is a Cauchy sequence in the complete space {Y; ρ} we have {y_n} converging to a point y′ ∈ Y. But the limit of a sequence of points in a metric space is unique by Theorem 5.5.6. Therefore, y′ = y; i.e., y ∈ Y and Y is closed.

Conversely, assume that Y is a closed subset of X. To show that the space {Y; ρ} is complete, let {y_n} be an arbitrary Cauchy sequence in {Y; ρ}. Then {y_n} is a Cauchy sequence in the complete metric space {X; ρ} and as such it has a limit y ∈ X. However, in view of Theorem 5.5.8, part (iii), the closed subset Y of X contains all its adherent points; since y is an adherent point of Y, we have y ∈ Y. Therefore, {Y; ρ} is complete. ■
We emphasize that completeness and closure are not necessarily equivalent in arbitrary metric spaces. For example, a metric space is always closed, yet it is not necessarily complete.
Before characterizing a complete metric space in an alternate way, we need to introduce the following concept.

5.5.34. Definition. A sequence {S_k} of subsets of a metric space {X; ρ} is called a nested sequence of sets if

    S_1 ⊃ S_2 ⊃ S_3 ⊃ ⋯.

We leave the proof of the last result of the present section as an exercise.

5.5.35. Theorem. Let {X; ρ} be a metric space. Then,

(i) {X; ρ} is complete if and only if every sequence of closed nested spheres in {X; ρ} with radii tending to zero has a non-void intersection; and
(ii) if {X; ρ} is complete, if {S_k} is a nested sequence of non-empty closed subsets of X, and if lim_n diam (S_n) = 0, then the intersection ⋂_{n=1}^{∞} S_n is not empty; in fact, it consists of a single point.

5.5.36. Exercise. Prove Theorem 5.5.35.

5.6. COMPACTNESS

We recall the Bolzano-Weierstrass theorem from the calculus: Every bounded, infinite subset of the real line (i.e., of the set of real numbers with the usual metric) has at least one point of accumulation. Thus, if Y is an arbitrary bounded infinite subset of R, then in view of this theorem we know that any sequence formed from elements of Y has a convergent subsequence. For example, let Y = [0, 2], and let {x_n} be the sequence of real numbers given by

    x_n = (1 − (−1)ⁿ)/2 + 1/n,  n = 1, 2, ….

Then the range of this sequence lies in Y and is thus bounded. Hence, the range has at least one accumulation point. It, in fact, has two.
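A quick numerical look (our illustration) at this sequence: the even-indexed terms cluster near 0 and the odd-indexed terms near 1, which are the sequence's two accumulation points:

```python
# The sequence x_n = (1 - (-1)^n)/2 + 1/n from the text: even-indexed terms
# equal 1/n and tend to 0; odd-indexed terms equal 1 + 1/n and tend to 1.
def x(n):
    return (1 - (-1) ** n) / 2 + 1.0 / n

evens = [x(n) for n in range(2, 2001, 2)]  # subsequence converging to 0
odds = [x(n) for n in range(1, 2001, 2)]   # subsequence converging to 1
print(evens[-1], odds[-1])  # close to 0 and 1, respectively
```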


A theorem from the calculus which is closely related to the Bolzano-Weierstrass theorem is the Heine-Borel theorem. We need the following terminology.

5.6.1. Definition. Let Y be a set in a metric space {X; ρ}, and let A be an index set. A collection of sets {Y_α: α ∈ A} in {X; ρ} is called a covering of Y if Y ⊂ ⋃_{α∈A} Y_α. A subcollection {Y_β: β ∈ B} of the covering {Y_α: α ∈ A}, i.e., B ⊂ A, such that Y ⊂ ⋃_{β∈B} Y_β, is called a subcovering of {Y_α: α ∈ A}. If all the members Y_α and Y_β are open sets, then we speak of an open covering and open subcovering. If A is a finite set, then we speak of a finite covering. In general, A may be an uncountable set.

We now recall the Heine-Borel theorem as it applies to subsets of the real line (i.e., of R): Let Y be a closed and bounded subset of R. If {Y_α: α ∈ A} is any family of open sets on the real line which covers Y, then it is possible to find a finite subcovering of sets from {Y_α: α ∈ A}.

Many important properties of the real line follow from the Bolzano-Weierstrass theorem and from the Heine-Borel theorem. In general, these properties cannot be carried over directly to arbitrary metric spaces. The concept of compactness, to be introduced in the present section, will enable us to isolate those metric spaces which possess the Heine-Borel and Bolzano-Weierstrass property.

Because of its close relationship to compactness, we first introduce the concept of total boundedness.

5.6.2. Definition. Let Y be any set in a metric space {X; ρ}, and let ε be an arbitrary positive number. A set S_ε in X is said to be an ε-net for Y if for any point y ∈ Y there exists at least one point s ∈ S_ε such that ρ(s, y) < ε. The ε-net, S_ε, is said to be finite if S_ε contains a finite number of points. A subset Y of X is said to be totally bounded if X contains a finite ε-net for Y for every ε > 0.

Some authors use the terminology ε-dense set for ε-net and precompact for totally bounded sets.

An obvious equivalent characterization of total boundedness is contained in the following result.

5.6.3. Theorem. A subset Y ⊂ X is totally bounded if and only if Y can be covered by a finite number of spheres of radius ε for any ε > 0.
5.6.4. Exercise. Prove Theorem 5.6.3.

In Figure G a pictorial demonstration of the preceding concepts is given.

5.6.5. Figure G. Total boundedness of a set Y. [S_ε is the finite set consisting of the dots within the set X; the set Y lies inside X.]

If in this figure the size of ε were decreased, then correspondingly, the number of elements in S_ε would increase. If for arbitrarily small ε the number of elements in S_ε remains finite, then we have a totally bounded set Y.

Total boundedness is a stronger property than boundedness. We leave the proof of the next result as an exercise.
5.6.6. Theorem. Let {X; ρ} be a metric space, and let Y be a subset of X. Then,

(i) if Y is totally bounded, then it is bounded;
(ii) if Y is totally bounded, then its closure Ȳ is totally bounded; and
(iii) if the metric space {X; ρ} is totally bounded, then it is separable.

5.6.7. Exercise. Prove Theorem 5.6.6.

We note, for example, that all finite sets (including the empty set) are totally bounded. Whereas all totally bounded sets are also bounded, the converse does, in general, not hold. We demonstrate this by means of the following example.

5.6.8. Example. Let {l_2; ρ_2} be the metric space defined in Example 5.3.5. Consider the subset Y ⊂ l_2 defined by

    Y = {y ∈ l_2: Σ_{i=1}^{∞} |η_i|² ≤ 1}.

We show that Y is bounded but not totally bounded. For any x, y ∈ Y we have by the Minkowski inequality (5.2.7),

    ρ_2(x, y) = [Σ_{i=1}^{∞} |ξ_i − η_i|²]^{1/2} ≤ [Σ_{i=1}^{∞} |ξ_i|²]^{1/2} + [Σ_{i=1}^{∞} |η_i|²]^{1/2} ≤ 2.

Thus, Y is bounded. To show that Y is not totally bounded, consider the set of points E = {e_1, e_2, …} ⊂ Y, where e_1 = (1, 0, 0, …), e_2 = (0, 1, 0, …), etc. Then ρ_2(e_i, e_j) = √2 for i ≠ j. Now suppose there is a finite ε-net for Y for, say, ε = ½. Let {s_1, …, s_n} be the net S_ε. Now if e_j is such that ρ(e_j, s_i) < ½ for some i, then ρ(e_k, s_i) ≥ ρ(e_k, e_j) − ρ(e_j, s_i) > ½ for k ≠ j. Hence, there can be at most one element of the set E in each sphere S(s_i; ½) for i = 1, …, n. Since there are infinitely many points in E and only a finite number of spheres S(s_i; ½), this contradicts the fact that S_ε is an ε-net. Hence, there is no finite ε-net for ε = ½, and Y is not totally bounded. ■
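The key computation in the example above, ρ_2(e_i, e_j) = √2 for i ≠ j, can be verified on truncated sequences; the helper functions below are our own sketch:

```python
import math

def rho2(x, y):
    """l2 metric on finitely supported sequences, padding with zeros."""
    n = max(len(x), len(y))
    x = x + (0.0,) * (n - len(x))
    y = y + (0.0,) * (n - len(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def e(i, n=10):
    """The unit vector e_i of Example 5.6.8, truncated to n coordinates."""
    return tuple(1.0 if j == i else 0.0 for j in range(n))

# Any two distinct unit vectors are a distance sqrt(2) apart, so no finite
# family of spheres of radius 1/2 can cover all of E = {e_1, e_2, ...}.
print(rho2(e(0), e(1)))  # sqrt(2), about 1.4142
```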
Let us now consider an example of a totally bounded set.

5.6.9. Example. Let {Rⁿ; ρ_2} be the metric space defined in Example 5.3.1, and let Y be the subset of Rⁿ defined by Y = {y ∈ Rⁿ: |η_i| ≤ 1, i = 1, …, n}. Clearly, Y is bounded. To show that Y is totally bounded, we construct an ε-net for Y for an arbitrary ε > 0. To this end, let N be a positive integer such that N > √n/ε, and let S_ε be the set of all n-tuples given by

    S_ε = {s = (q_1, …, q_n): q_i = m_i/N for some integer m_i, −N ≤ m_i ≤ N, i = 1, …, n}.

Then clearly S_ε ⊂ Y and S_ε is finite. Now for any y = (η_1, …, η_n) ∈ Y, there is an s ∈ S_ε such that |q_i − η_i| ≤ 1/N for i = 1, …, n. Thus,

    ρ_2(y, s) ≤ [Σ_{i=1}^{n} (1/N)²]^{1/2} = √n/N < ε.

Therefore, S_ε is a finite ε-net. Since ε is arbitrary, Y is totally bounded. In general, any bounded subset of {Rⁿ; ρ_2} is totally bounded.
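The ε-net construction of Example 5.6.9 can be carried out concretely; the sketch below (our own code, for n = 2) builds the grid net and checks that rounding an arbitrary point of the cube to the nearest grid point lands within ε:

```python
import itertools
import math
import random

# Grid eps-net for the cube Y = {y in R^n : |eta_i| <= 1}, as in Example
# 5.6.9: the points m_i/N with N > sqrt(n)/eps form a finite eps-net.
def grid_net(n, eps):
    N = int(math.sqrt(n) / eps) + 1
    pts = [m / N for m in range(-N, N + 1)]
    return [s for s in itertools.product(pts, repeat=n)], N

def rho2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

n, eps = 2, 0.25
net, N = grid_net(n, eps)
random.seed(0)
y = tuple(random.uniform(-1, 1) for _ in range(n))
# Rounding each coordinate to the nearest grid point stays within sqrt(n)/(2N).
s = tuple(round(c * N) / N for c in y)
print(rho2(y, s) <= eps)  # True: s is within eps of y
```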

5.6.10. Exercise. Let {l_2; ρ_2} be the metric space defined in Example 5.3.5, and let Y ⊂ l_2 be the subset defined by

    Y = {y ∈ l_2: |η_1| ≤ 1, |η_2| ≤ ½, …, |η_n| ≤ (½)^{n−1}, …}.

Show that Y is totally bounded.


In studying compactness of metric spaces, we will find it convenient to introduce the following concept.

5.6.11. Definition. A metric space {X; ρ} is said to be sequentially compact if every sequence of elements in X contains a subsequence which converges to some element x ∈ X. A set Y in the metric space {X; ρ} is said to be sequentially compact if the subspace {Y; ρ} is sequentially compact; i.e., every sequence in Y contains a subsequence which converges to a point in Y.

5.6.12. Example. Let X = (0, 1], and let ρ be the usual metric on the real line R. Consider the sequence {x_n}, where x_n = 1/n, n = 1, 2, …. This sequence has no subsequence which converges to a point in X, and thus {X; ρ} is not sequentially compact. ■

We now define compactness.

5.6.13. Definition. A metric space {X; ρ} is said to be compact, or to possess the Heine-Borel property, if every open covering of {X; ρ} contains a finite open subcovering. A set Y in a metric space {X; ρ} is said to be compact if the subspace {Y; ρ} is compact.

Some authors use the term bicompact for Heine-Borel compactness and the term compact for what we call sequentially compact. As we shall see shortly, in the case of metric spaces, compactness and sequential compactness are equivalent, so no confusion should arise.
We will also show that compact metric spaces can equivalently be characterized by means of the Bolzano-Weierstrass property, given by the following.

5.6.14. Definition. A metric space {X; ρ} possesses the Bolzano-Weierstrass property if every infinite subset of X has at least one point of accumulation. A set Y in X possesses the Bolzano-Weierstrass property if the subspace {Y; ρ} possesses the Bolzano-Weierstrass property.

Before setting out on proving the assertions made above, i.e., the equivalence of compactness, sequential compactness, and the Bolzano-Weierstrass property in metric spaces, a few comments concerning some of these concepts may be of benefit.

Informally, we may view a sequentially compact metric space as having such an abundance of elements that no matter how we choose a sequence, there will always be a clustering of an infinite number of points around at least one point in the metric space. A similar interpretation can be made concerning metric spaces which possess the Bolzano-Weierstrass property.

Utilizing the concepts of sequential compactness and total boundedness, we first state and prove the following result.
5.6.15. Theorem. Let {X; ρ} be a metric space, and let Y be a subset of X. The following properties hold:

(i) if Y is sequentially compact, then Y is bounded;
(ii) if Y is sequentially compact, then Y is closed;
(iii) if {X; ρ} is sequentially compact, then {X; ρ} is totally bounded;
(iv) if {X; ρ} is sequentially compact, then {X; ρ} is complete; and
(v) if {X; ρ} is totally bounded and complete, then it is sequentially compact.

Proof. To prove (i), assume that Y is a sequentially compact subset of X and assume, for purposes of contradiction, that Y is unbounded. Then we can construct a sequence {y_n} with elements arbitrarily far apart. Specifically, let y_1 ∈ Y and choose y_2 ∈ Y such that ρ(y_1, y_2) > 1. Next, choose y_3 ∈ Y such that ρ(y_1, y_3) > 1 + ρ(y_1, y_2). Continuing this process, choose y_n ∈ Y such that ρ(y_1, y_n) > 1 + ρ(y_1, y_{n−1}). If m > n, then ρ(y_1, y_m) > 1 + ρ(y_1, y_n) and ρ(y_m, y_n) ≥ |ρ(y_1, y_m) − ρ(y_1, y_n)| > 1. But this implies that {y_n} contains no convergent subsequence. However, we assumed that Y is sequentially compact; i.e., every sequence in Y contains a convergent subsequence. Therefore, we have arrived at a contradiction. Hence, Y must be bounded. In the above argument we assumed that Y is an infinite set. We note that if Y is a finite set then there is nothing to prove.

To prove part (ii), let Ȳ denote the closure of Y and assume that y ∈ Ȳ. Then there is a sequence of points {y_n} in Y which converges to y, and every subsequence of {y_n} converges to y, by Theorem 5.5.6, part (iv). But, by hypothesis, Y is sequentially compact. Thus, the sequence {y_n} in Y contains a subsequence which converges to some element in Y; since every subsequence converges to y, this element is y. Therefore, y ∈ Y, Y = Ȳ, and Y is closed.

We now prove part (iii). Let {X; ρ} be a sequentially compact metric space, and let x_1 ∈ X. With ε > 0 fixed we choose, if possible, x_2 ∈ X such that ρ(x_1, x_2) > ε. Next, if possible choose x_3 ∈ X such that ρ(x_1, x_3) > ε and ρ(x_2, x_3) > ε. Continuing this process we have, for every n, ρ(x_n, x_1) > ε, ρ(x_n, x_2) > ε, …, ρ(x_n, x_{n−1}) > ε. We now show that this process must ultimately terminate. Clearly, if {X; ρ} is a bounded metric space then we can pick ε sufficiently large to terminate the process after the first step; i.e., there is no point x ∈ X such that ρ(x_1, x) ≥ ε. Now suppose that, in general, the process does not terminate. Then we have constructed a sequence {x_n} such that for any two members x_i, x_j of this sequence, we have ρ(x_i, x_j) > ε. But, by hypothesis, {X; ρ} is sequentially compact, and thus {x_n} contains a subsequence which is convergent to an element in X. Hence, we have arrived at a contradiction and the process must terminate. Using this procedure we now have for arbitrary ε > 0 a finite set of points {x_1, x_2, …, x_l} such that the spheres S(x_n; ε), n = 1, …, l, cover X; i.e., for any ε > 0, X contains a finite ε-net. Therefore, the metric space {X; ρ} is totally bounded.

We now prove part (iv) of the theorem. Let {x_n} be a Cauchy sequence. Then for every ε > 0 there is an integer l such that ρ(x_m, x_n) < ε whenever m > n > l. Since {X; ρ} is sequentially compact, the sequence {x_n} contains a subsequence {x_{l_n}} convergent to a point x ∈ X, so that lim_{n→∞} ρ(x_{l_n}, x) = 0. The sequence {l_n} is an increasing sequence and l_m ≥ m. It now follows that ρ(x_n, x) ≤ ρ(x_n, x_{l_m}) + ρ(x_{l_m}, x) whenever m > n > l. Letting m → ∞, we have 0 ≤ ρ(x_n, x) ≤ ε whenever n > l. Hence, the Cauchy sequence {x_n} converges to x ∈ X. Therefore, X is complete.

In connection with parts (iv) and (v) we note that a totally bounded metric space is not necessarily sequentially compact. We leave the proof of part (v) as an exercise. ■

5.6.16. Exercise. Prove part (v) of Theorem 5.6.15.
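The termination argument in the proof of part (iii) suggests a concrete procedure: greedily keep points that are more than ε away from all points kept so far. On a bounded sample the process always stops and the kept points form a finite ε-net; the sketch below is our own illustration on a finite point cloud rather than an abstract metric space:

```python
import math
import random

# Greedy construction from the proof of part (iii) of Theorem 5.6.15:
# keep a point only if it is more than eps from every point kept so far.
# (math.dist requires Python 3.8 or later.)
def greedy_net(points, eps):
    net = []
    for p in points:
        if all(math.dist(p, s) > eps for s in net):
            net.append(p)
    return net

random.seed(1)
cloud = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(500)]
net = greedy_net(cloud, 0.2)
# Every point of the cloud is within eps of some net point ...
print(all(any(math.dist(p, s) <= 0.2 for s in net) for p in cloud))  # True
# ... and the net stays small because the unit square is bounded.
print(len(net))
```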

Parts (iii), (iv), and (v) of the above theorem allow us to define a sequentially compact metric space equivalently as a metric space which is complete and totally bounded. We now show that a metric space is sequentially compact if and only if it satisfies the Bolzano-Weierstrass property.

5.6.17. Theorem. A metric space {X; ρ} is sequentially compact if and only if every infinite subset of X has at least one point of accumulation.

Proof. Assume that Y is an infinite subset of a sequentially compact metric space {X; ρ}. If {y_n} is any sequence of distinct points in Y, then {y_n} contains a convergent subsequence, because {X; ρ} is sequentially compact. The limit y of the subsequence is a point of accumulation of Y.

Conversely, assume that {X; ρ} is a metric space such that every infinite subset Y of X has a point of accumulation. Let {y_n} be any sequence of points in Y. If a point occurs an infinite number of times in {y_n}, then this sequence contains a convergent subsequence, a constant subsequence, and we are finished. If this is not the case, then we can assume that all elements of {y_n} are distinct. Let Z denote the set of all points y_n, n = 1, 2, …. By hypothesis, the infinite set Z has at least one point of accumulation. If z is such a point of accumulation, then we can choose a sequence of points of Z which converges to z (see Theorem 5.5.8, part (i)), and this sequence is a subsequence of {y_n}. Therefore, {X; ρ} is sequentially compact. This concludes the proof. ■
Our next objective is to show that in metric spaces the concepts of compactness and sequential compactness are equivalent. In doing so we employ the following lemma, the proof of which is left as an exercise.

5.6.18. Lemma. Let {X; ρ} be a sequentially compact metric space. If {Y_α: α ∈ A} is an infinite open covering of {X; ρ}, then there exists a number ε > 0 such that every sphere in X of radius ε is contained in at least one of the open sets Y_α.

5.6.19. Exercise. Prove Lemma 5.6.18.

5.6.20. Theorem. A metric space {X; ρ} is compact if and only if it is sequentially compact.

Proof. From Theorem 5.6.17, a metric space is sequentially compact if and only if it has the Bolzano-Weierstrass property. Therefore, we first show that every infinite subset of a compact metric space has a point of accumulation.

Let {X; ρ} be a compact metric space, and let Y be an infinite subset of X. For purposes of contradiction, assume that Y has no point of accumulation. Then each x ∈ X is the center of a sphere which contains no point of Y, except possibly x itself. These spheres form an infinite open covering of X. But, by hypothesis, {X; ρ} is compact, and therefore we can choose from this infinite covering a finite number of spheres which also cover X. Now each sphere from this finite subcovering contains at most one point of Y, and therefore Y is finite. But this is contrary to our original assumption, and we have arrived at a contradiction. Therefore, Y has at least one point of accumulation, and {X; ρ} is sequentially compact.

Conversely, assume that {X; ρ} is a sequentially compact metric space, and let {Y_α: α ∈ A} be an arbitrary infinite open covering of X. From Lemma 5.6.18 there exists an ε > 0 such that every sphere in X of radius ε is contained in at least one of the open sets Y_α. Now, by hypothesis, {X; ρ} is sequentially compact and is therefore totally bounded by part (iii) of Theorem 5.6.15. Thus, with arbitrary ε fixed we can find a finite ε-net, {x_1, x_2, …, x_l}, such that X ⊂ ⋃_{i=1}^{l} S(x_i; ε). Now in view of Lemma 5.6.18, S(x_i; ε) ⊂ Y_{α_i}, i = 1, …, l, where the sets Y_{α_i} are from the family {Y_α: α ∈ A}. Hence,

    X ⊂ ⋃_{i=1}^{l} Y_{α_i},

and X has a finite open subcovering chosen from the infinite open covering {Y_α: α ∈ A}. Therefore, the metric space {X; ρ} is compact. This proves the theorem. ■

There is yet another way of characterizing a compact metric space. Before doing so, we give the following definition.

5.6.21. Definition. Let {F_α: α ∈ A} be an infinite family of closed sets. The family {F_α: α ∈ A} is said to have the finite intersection property if for every finite set B ⊂ A the set ⋂_{α∈B} F_α is not empty.

5.6.22. Theorem. A metric space {X; ρ} is compact if and only if every infinite family {F_α: α ∈ A} of closed sets in X with the finite intersection property has a non-void intersection; i.e., ⋂_{α∈A} F_α ≠ ∅.

5.6.23. Exercise. Prove Theorem 5.6.22.

We now summarize the above results as follows.

5.6.24. Theorem. In a metric space {X; ρ} the following are equivalent:

(i) {X; ρ} is compact;
(ii) {X; ρ} is sequentially compact;
(iii) {X; ρ} possesses the Bolzano-Weierstrass property;
(iv) {X; ρ} is complete and totally bounded; and
(v) every infinite family of closed sets in {X; ρ} with the finite intersection property has a non-void intersection.

Concerning product spaces we offer the following exercise.

5.6.25. Exercise. Let {X_1; ρ_1}, {X_2; ρ_2}, …, {X_n; ρ_n} be n compact metric spaces. Let X = X_1 × X_2 × ⋯ × X_n, and let

    ρ(x, y) = ρ_1(x_1, y_1) + ⋯ + ρ_n(x_n, y_n),    (5.6.26)

where x_i, y_i ∈ X_i, i = 1, …, n, and where x, y ∈ X. Show that the product space {X; ρ} is also a compact metric space.

The next result constitutes an important characterization of compact sets in the spaces Rⁿ and Cⁿ.

5.6.27. Theorem. Let {Rⁿ; ρ_2} (let {Cⁿ; ρ_2}) be the metric space defined in Example 5.3.1. A set Y ⊂ Rⁿ (a set Y ⊂ Cⁿ) is compact if and only if it is closed and bounded.

5.6.28. Exercise. Prove Theorem 5.6.27.

Recall that every non-void compact set in the real line R contains its infimum and its supremum.

In general, it is not an easy task to apply the results of Theorem 5.6.24 to specific spaces in order to establish necessary and sufficient conditions for compactness. From the point of view of applications, criteria such as those established in Theorem 5.6.27 are much more desirable.
We now give a condition which tells us when a subset of a metric space is compact. We have:

5.6.29. Theorem. Let {X; ρ} be a compact metric space, and let Y ⊂ X. If Y is closed, then Y is compact.

Proof. Let {Y_α: α ∈ A} be any open covering of Y; i.e., each Y_α is open relative to {Y; ρ}. Then, by Theorem 5.4.20, for each Y_α there is a U_α which is open relative to {X; ρ} such that Y_α = Y ∩ U_α. Since Y is closed, X − Y is an open set in {X; ρ}. Also, since X = Y ∪ (X − Y), the family (X − Y) ∪ {U_α: α ∈ A} is an open covering of X. Since X is compact, it is possible to find a finite subcovering from this family; i.e., there is a finite set B ⊂ A such that X = (X − Y) ∪ [⋃_{α∈B} U_α]. Since Y ⊂ ⋃_{α∈B} U_α, Y = ⋃_{α∈B} Y_α; i.e., {Y_α: α ∈ B} covers Y. This implies that Y is compact. ■
We close the present section by introducing the concept of relative compactness.

5.6.30. Definition. Let {X; ρ} be a metric space and let Y ⊂ X. The subset Y is said to be relatively compact in X if Ȳ, the closure of Y, is a compact subset of X.
One of the essential features of a relatively compact set is that every
sequence has a convergent subsequence, just as in the case of compact
subsets; however, the limit of the subsequence need not be in the subset.
Thus, we have the following result.
5.6.31. Theorem. Let {X; ρ} be a metric space and let Y ⊂ X. Then Y is relatively compact in X if and only if every sequence of elements in Y contains a subsequence which converges to some x ∈ X.

Proof. Let Y be relatively compact in X, and let {yₙ} be any sequence in Y. Then {yₙ} belongs to Ȳ also, and hence it has a convergent subsequence in Ȳ, since Ȳ is sequentially compact. Hence, {yₙ} contains a subsequence which converges to an element x ∈ Ȳ ⊂ X.

Conversely, let {yₙ} be a sequence in Ȳ. Then for each n = 1, 2, ..., there is an xₙ ∈ Y such that ρ(xₙ, yₙ) < 1/n. Since {xₙ} is a sequence in Y, it contains a convergent subsequence, say {xₙₖ}, which converges to some x ∈ X, and since ρ(xₙₖ, yₙₖ) < 1/nₖ, the subsequence {yₙₖ} converges to x as well. Since {xₙₖ} is also in Ȳ, it follows from part (iii) of Theorem 5.5.8 that x ∈ Ȳ. Hence, Ȳ is sequentially compact, and so Y is relatively compact in X. ■

5.7. CONTINUOUS FUNCTIONS

Having introduced the concept of metric space, we are in a position to give a generalization of the concept of continuity of functions encountered in calculus.
5.7.1. Definition. Let {X; ρx} and {Y; ρy} be two metric spaces, and let f: X → Y be a mapping of X into Y. The mapping f is said to be continuous at the point x₀ ∈ X if for every ε > 0 there is a δ > 0 such that

ρy(f(x), f(x₀)) < ε

whenever ρx(x, x₀) < δ. The mapping f is said to be continuous on X or simply continuous if it is continuous at each point x ∈ X.

Chapter 5 / Metric Spaces

We note that in the above definition the δ is dependent on the choice of x₀ and ε; i.e., δ = δ(ε, x₀). Now if for each ε > 0 there exists a δ = δ(ε) > 0 such that for any x₀ we have ρy(f(x), f(x₀)) < ε whenever ρx(x, x₀) < δ, then we say that the function f is uniformly continuous on X. Henceforth, if we simply say f is continuous, we mean f is continuous on X.
5.7.2. Example. Let {X; ρx} = Rⁿ and let {Y; ρy} = Rᵐ (see Example 5.3.1). Let A = [aᵢⱼ] denote a real m × n matrix, and denote x ∈ Rⁿ and y ∈ Rᵐ by x = (ξ₁, ..., ξₙ)ᵀ and y = (η₁, ..., ηₘ)ᵀ, respectively. Let us define the function f: Rⁿ → Rᵐ by

f(x) = Ax

for each x ∈ Rⁿ. We now show that f is continuous on Rⁿ. If x, x₀ ∈ Rⁿ are such that y = f(x) and y₀ = f(x₀), then we have

[ρy(y, y₀)]² = Σᵢ₌₁ᵐ [Σⱼ₌₁ⁿ aᵢⱼ(ξⱼ − ξ₀ⱼ)]².

Using the Schwarz inequality, it follows that

[ρy(y, y₀)]² ≤ [Σᵢ₌₁ᵐ Σⱼ₌₁ⁿ a²ᵢⱼ][Σⱼ₌₁ⁿ (ξⱼ − ξ₀ⱼ)²] = M²[ρx(x, x₀)]²,

where M = {Σᵢ₌₁ᵐ Σⱼ₌₁ⁿ a²ᵢⱼ}^{1/2}. Assume M ≠ 0 (if M = 0 then we are done). Given any ε > 0 and choosing δ = ε/M, it follows that ρy(y, y₀) < ε whenever ρx(x, x₀) < δ, and any mapping f: Rⁿ → Rᵐ which is represented by a real, constant (m × n) matrix A is continuous on Rⁿ. ■
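The bound ρy(Ax, Ax₀) ≤ M ρx(x, x₀), with M the square root of the sum of squared entries of A, can be spot-checked numerically. The sketch below uses an arbitrary random 2 × 3 matrix; the helper names are illustrative.

```python
import math
import random

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rho2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

random.seed(1)
A = [[random.uniform(-2, 2) for _ in range(3)] for _ in range(2)]
M = math.sqrt(sum(a * a for row in A for a in row))  # the constant of Example 5.7.2

for _ in range(500):
    x = [random.uniform(-5, 5) for _ in range(3)]
    x0 = [random.uniform(-5, 5) for _ in range(3)]
    # The Schwarz-inequality bound: rho_y(Ax, Ax0) <= M * rho_x(x, x0).
    assert rho2(matvec(A, x), matvec(A, x0)) <= M * rho2(x, x0) + 1e-9
```

For a linear map the bound holds with equality only in special directions; M here is the Frobenius norm, which is an upper bound for the best Lipschitz constant.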

5.7.3. Example. Let {X; ρx} = {Y; ρy} = {C[a, b]; ρ₂}, the metric space defined in Example 5.3.12, and let us define a function f: X → Y in the following way. For x ∈ X, y = f(x) is given by

y(t) = ∫ₐᵇ k(t, s)x(s) ds,   t ∈ [a, b],

where k: R² → R is continuous in the usual sense, i.e., with respect to the metric spaces R² and R¹. We now show that f is continuous on X. Let x, x₀ ∈ X and y, y₀ ∈ Y be such that y = f(x) and y₀ = f(x₀). Then

[ρy(y, y₀)]² = ∫ₐᵇ {∫ₐᵇ k(t, s)[x(s) − x₀(s)] ds}² dt.

It follows from Hölder's inequality for integrals (5.2.5) that

ρy(y, y₀) ≤ M ρx(x, x₀),

where M = {∫ₐᵇ∫ₐᵇ k²(t, s) ds dt}^{1/2}. Hence, for any ε > 0, ρy(y, y₀) < ε whenever ρx(x, x₀) < δ, where δ = ε/M. ■
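The same bound survives discretization, because the discrete Cauchy-Schwarz inequality mirrors the integral one. The sketch below picks [a, b] = [0, 1], the kernel k(t, s) = exp(−ts), and midpoint Riemann sums; all of these are assumed choices for illustration only.

```python
import math

# Assumed setting: [a, b] = [0, 1], kernel k(t, s) = exp(-t*s), midpoint sums.
N = 200
h = 1.0 / N
grid = [(i + 0.5) * h for i in range(N)]

def k(t, s):
    return math.exp(-t * s)

def f(x):  # x is a list of samples of a function on `grid`
    return [h * sum(k(t, s) * xs for s, xs in zip(grid, x)) for t in grid]

def rho2(u, v):  # discretized L2 metric on C[a, b]
    return math.sqrt(h * sum((a - b) ** 2 for a, b in zip(u, v)))

M = math.sqrt(h * h * sum(k(t, s) ** 2 for t in grid for s in grid))

x = [math.sin(5 * t) for t in grid]
x0 = [t ** 2 for t in grid]
# Discrete analogue of rho_y(f(x), f(x0)) <= M * rho_x(x, x0):
assert rho2(f(x), f(x0)) <= M * rho2(x, x0) + 1e-9
```

The inequality holds exactly for the discretized operator, not merely up to quadrature error, since it is an instance of Cauchy-Schwarz in Rᴺ.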

5.7.4. Example. Consider the metric space {C[a, b]; ρ∞} defined in Example 5.3.14. Let C¹[a, b] be the subset of C[a, b] of all functions having continuous first derivatives on (a, b), and let {X; ρx} be the metric subspace {C¹[a, b]; ρ∞}. Let {Y; ρy} = {C[a, b]; ρ∞}, and define the function f: X → Y as follows. For x ∈ X, y = f(x) is given by

y(t) = dx(t)/dt.

To show that f is not continuous, we show that for any δ > 0 there is a pair x, x₀ ∈ X such that ρx(x, x₀) < δ but ρy(f(x), f(x₀)) ≥ 1. Let x₀(t) = 0 for all t ∈ [a, b], and let x(t) = α sin ωt, α > 0, ω > 0. Then ρ(x₀, x) ≤ α. Now if y₀ = f(x₀) and y = f(x), then y₀(t) = 0 for all t ∈ [a, b] and y(t) = αω cos ωt. Hence, ρ(y₀, y) = αω, provided that ω is sufficiently large, i.e., so that cos ωt = 1 for some t ∈ [a, b]. Now no matter what value of δ we choose, there is an x ∈ X such that ρ(x, x₀) < δ if we pick α < δ. However, ρ(y, y₀) = 1 if we let ω = 1/α. Therefore, f is not continuous on X. ■
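The construction of Example 5.7.4 is easy to see numerically: x(t) = α sin ωt with ω = 1/α is uniformly tiny, yet its derivative αω cos ωt has sup norm 1. The sketch below makes this concrete on the assumed interval [0, 10] with a sampled sup norm.

```python
import math

# Assumed interval [a, b] = [0, 10]; sup norm approximated by dense sampling.
def sup_norm(g, a=0.0, b=10.0, samples=10000):
    return max(abs(g(a + (b - a) * i / samples)) for i in range(samples + 1))

for alpha in [0.1, 0.01, 0.001]:
    omega = 1.0 / alpha
    x = lambda t: alpha * math.sin(omega * t)
    dx = lambda t: alpha * omega * math.cos(omega * t)
    assert sup_norm(x) <= alpha + 1e-12   # rho_inf(x, 0) <= alpha, arbitrarily small
    assert sup_norm(dx) > 0.99            # but rho_inf(dx, 0) is 1 (attained at t = 0)
```

Shrinking α makes the input arbitrarily close to x₀ = 0 while the output stays a unit distance away, which is exactly the failure of continuity.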

We can interpret the notion of continuity of functions in the following equivalent way.

5.7.5. Theorem. Let {X; ρx} and {Y; ρy} be metric spaces, and let f: X → Y. Then f is continuous at a point x₀ ∈ X if and only if for every ε > 0 there exists a δ > 0 such that

f(S(x₀; δ)) ⊂ S(f(x₀); ε).

5.7.6. Exercise. Prove Theorem 5.7.5.

Intuitively, Theorem 5.7.5 tells us that f is continuous at x₀ if f(x) is arbitrarily close to f(x₀) when x is sufficiently close to x₀. The concept of continuity is depicted in Figure H for the case where {X; ρx} = {Y; ρy} = R².

5.7.7. Figure H. Illustration of continuity.

As we did in Chapter 1, we distinguish between mappings on metric spaces which are injective, surjective, or bijective.
It turns out that the concepts of continuity and convergence of sequences
are related. Our next result yields a connection between convergence and
continuity.
5.7.8. Theorem. Let {X; ρx} and {Y; ρy} be two metric spaces. A function f: X → Y is continuous at a point x₀ ∈ X if and only if for every sequence {xₙ} of points in X which converges to a point x₀ the corresponding sequence {f(xₙ)} converges to the point f(x₀) in Y; i.e.,

lim f(xₙ) = f(lim xₙ) = f(x₀) whenever lim xₙ = x₀.

Proof. Assume that f is continuous at a point x₀ ∈ X, and let {xₙ} be a sequence such that lim xₙ = x₀. Then for every ε > 0 there is a δ > 0 such that ρy(f(x), f(x₀)) < ε whenever ρx(x, x₀) < δ. Also, there is an N such that ρx(xₙ, x₀) < δ whenever n > N. Hence, ρy(f(xₙ), f(x₀)) < ε whenever n > N. Thus, if f is continuous at x₀ and if lim xₙ = x₀, then lim f(xₙ) = f(x₀).

Conversely, assume that f(xₙ) → f(x₀) whenever xₙ → x₀. For purposes of contradiction, assume that f is not continuous at x₀. Then there exists an ε > 0 such that for each δ > 0 there is an x with the property that ρx(x, x₀) < δ and ρy(f(x), f(x₀)) ≥ ε. This implies that for each positive integer n there is an xₙ such that ρx(xₙ, x₀) < 1/n and ρy(f(xₙ), f(x₀)) ≥ ε for all n; i.e., xₙ → x₀ but {f(xₙ)} does not converge to f(x₀). But we assumed that f(xₙ) → f(x₀) whenever xₙ → x₀. Hence, we have arrived at a contradiction, and f must be continuous at x₀. This concludes the proof of our theorem. ■

Continuous mappings on metric spaces possess the following important properties.

5.7.9. Theorem. Let {X; ρx} and {Y; ρy} be two metric spaces, and let f be a mapping of X into Y. Then

(i) f is continuous on X if and only if the inverse image of each open subset of {Y; ρy} is open in {X; ρx}; and
(ii) f is continuous on X if and only if the inverse image of each closed subset of {Y; ρy} is closed in {X; ρx}.

Proof. Let f be continuous on X, and let V ≠ ∅ be an open subset of {Y; ρy}. Let U = f⁻¹(V). Clearly, U ≠ ∅. Now let x ∈ U. Then there exists a unique y = f(x) ∈ V. Since V is open, there is a sphere S(y; ε) which is entirely contained in V. Since f is continuous at x, there is a sphere S(x; δ) such that its image f(S(x; δ)) is entirely contained in S(y; ε) and therefore in V. But from this it follows that S(x; δ) ⊂ U. Hence, every x ∈ U is the center of a sphere which is contained in U. Therefore, U is open.

Conversely, assume that the inverse image of each non-empty open subset of Y is open. For arbitrary x ∈ X we have y = f(x). Since S(y; ε) ⊂ Y is open, the set f⁻¹(S(y; ε)) is open for every ε > 0, and x ∈ f⁻¹(S(y; ε)). Hence, there is a sphere S(x; δ) such that S(x; δ) ⊂ f⁻¹(S(y; ε)). From this it follows that for every ε > 0 there is a δ > 0 such that f(S(x; δ)) ⊂ S(y; ε). Therefore, f is continuous at x. But x ∈ X was arbitrarily chosen. Hence, f is continuous on X. This concludes the proof of part (i).

To prove part (ii) we utilize part (i) and take complements of open sets. ■

The reader is cautioned that the image of an open subset of X under a continuous mapping f: X → Y is not necessarily an open subset of Y. For example, let f: R → R be defined by f(x) = x² for every x ∈ R. Clearly, f is continuous on R. Yet the image of the open interval (−1, 1) is the interval [0, 1). But the interval [0, 1) is not open.

We leave the proof of the next result as an exercise to the reader.

5.7.10. Theorem. Let {X; ρx}, {Y; ρy}, and {Z; ρz} be metric spaces, let f be a mapping of X into Y, and let g be a mapping of Y into Z. If f is continuous on X and g is continuous on Y, then the composite mapping h = g ∘ f of X into Z is continuous on X.

5.7.11. Exercise. Prove Theorem 5.7.10.

For continuous mappings on compact spaces we state and prove the following result.

5.7.12. Theorem. Let {X; ρx} and {Y; ρy} be two metric spaces, and let f: X → Y be continuous on X.

(i) If {X; ρx} is compact, then f(X) is a compact subset of {Y; ρy}.
(ii) If U is a compact subset of the metric space {X; ρx}, then f(U) is a compact subset of the metric space {Y; ρy}.
(iii) If {X; ρx} is compact and if U is a closed subset of X, then f(U) is a closed subset of {Y; ρy}.
(iv) If {X; ρx} is compact, then f is uniformly continuous on X.

Proof. To prove part (i) let {yₙ} be a sequence in f(X). Then there are points {xₙ} in X such that yₙ = f(xₙ). Since {X; ρx} is compact we can find a subsequence {xₙₖ} of {xₙ} which converges to a point in X; i.e., xₙₖ → x. In view of Theorem 5.7.8 we have, since f is continuous at x, f(xₙₖ) → f(x) ∈ f(X). From this it follows that the sequence {yₙ} has a convergent subsequence and f(X) is compact.

To prove part (ii), let U be a compact subset of X. Then {U; ρx} is a compact metric space. In view of part (i) it now follows that f(U) is also a compact subset of the metric space {Y; ρy}.

To prove part (iii), we first observe that a closed subset U of a compact metric space {X; ρx} is itself compact, and {U; ρx} is itself a compact metric space. In view of part (ii), f(U) is a compact subset of the metric space {Y; ρy} and as such is bounded and closed.

To prove part (iv), let ε > 0. For every x ∈ X, there is some positive number η(x) such that f(S(x; 2η(x))) ⊂ S(f(x); ε/2). Now the family {S(x; η(x)): x ∈ X} is an open covering of X. Since X is compact, there is a finite set, say F ⊂ X, such that {S(x; η(x)): x ∈ F} is a covering of X. Now let

δ = min {η(x): x ∈ F}.

Since F is a finite set, δ is some positive number. Now let x, y ∈ X be such that ρ(x, y) < δ. Choose z ∈ F such that x ∈ S(z; η(z)). Since δ ≤ η(z), y ∈ S(z; 2η(z)). Since f(S(z; 2η(z))) ⊂ S(f(z); ε/2), it follows that f(x) and f(y) are in S(f(z); ε/2). Hence, ρy(f(x), f(y)) < ε. Since δ does not depend on x ∈ X, f is uniformly continuous on X. This completes the proof of the theorem. ■
Let us next consider some additional generalizations of concepts encountered in the calculus.
5.7.13. Definition. Let {X; ρx} and {Y; ρy} be metric spaces, and let {fₙ} be a sequence of functions from X into Y. If {fₙ(x)} converges at each x ∈ X, then we say that {fₙ} is pointwise convergent. In this case we write lim fₙ = f, where f is defined for every x ∈ X.

Equivalently, we say that the sequence {fₙ} is pointwise convergent to a function f if for every ε > 0 and for every x ∈ X there is an integer N = N(ε, x) such that

ρy(fₙ(x), f(x)) < ε

whenever n > N(ε, x). In general, N(ε, x) is not necessarily bounded. However, if N(ε, x) is bounded for all x ∈ X, then we say that the sequence {fₙ} converges to f uniformly on X; in this case let M(ε) = sup_{x∈X} N(ε, x) < ∞. Equivalently, we say that the sequence {fₙ} converges uniformly to f on X if for every ε > 0 there is an M(ε) such that

ρy(fₙ(x), f(x)) < ε

whenever n > M(ε), for all x ∈ X.
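The distinction between N(ε, x) and a uniform M(ε) is visible in the standard example fₙ(x) = xⁿ on [0, 1), which converges pointwise to 0 but not uniformly; this concrete choice is an illustration, not from the text.

```python
import math

def N_of(eps, x):
    """Smallest n with |x**n - 0| < eps, for 0 < x < 1: the pointwise index."""
    return max(1, math.ceil(math.log(eps) / math.log(x)))

eps = 0.1
# N(eps, x) blows up as x -> 1, so sup over x of N(eps, x) is infinite:
assert N_of(eps, 0.5) < N_of(eps, 0.9) < N_of(eps, 0.999)

# Uniform convergence fails: sup over [0, 1) of |x**n| stays near 1 for every n.
for n in [1, 10, 100]:
    assert max(x ** n for x in [1 - 10 ** (-k) for k in range(1, 8)]) > 0.9
```

No single M(ε) works for all x simultaneously, which is exactly the statement that the convergence is pointwise but not uniform.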


In the next result a connection between uniform convergence of functions and continuity is established. (We used a special case of this result in the proof of Example 5.5.28.)

5.7.14. Theorem. Let {X; ρx} and {Y; ρy} be two metric spaces, and let {fₙ} be a sequence of functions from X into Y such that fₙ is continuous on X for each n. If the sequence {fₙ} converges uniformly to f on X, then f is continuous on X.
Proof. Assume that the sequence {fₙ} converges uniformly to f on X. Then for every ε > 0 there is an N such that ρy(fₙ(x), f(x)) < ε whenever n > N for all x ∈ X. If M > N is a fixed integer, then f_M is continuous on X. Letting x₀ ∈ X be fixed, we can find a δ > 0 such that ρy(f_M(x), f_M(x₀)) < ε whenever ρx(x, x₀) < δ. Therefore, we have

ρy(f(x), f(x₀)) ≤ ρy(f(x), f_M(x)) + ρy(f_M(x), f_M(x₀)) + ρy(f_M(x₀), f(x₀)) < 3ε

whenever ρx(x, x₀) < δ. From this it follows that f is continuous at x₀. Since x₀ was arbitrarily chosen, f is continuous at all x ∈ X. This proves the theorem. ■
The reader will recognize in the last result of the present section several
generalizations from the calculus to real-valued functions defined on metric
spaces.
5.7.15. Theorem. Let {X; ρx} be a metric space, and let {R; ρ} denote the real line R with the usual metric. Let f: X → R, and let U ⊂ X. If f is continuous on X and if U is a compact subset of {X; ρx}, then

(i) f is uniformly continuous on U;
(ii) f is bounded on U; and
(iii) if U ≠ ∅, f attains its infimum and supremum on U; i.e., there exist x₀, x₁ ∈ U such that f(x₀) = inf {f(x): x ∈ U} and f(x₁) = sup {f(x): x ∈ U}.

Proof. Part (i) follows from part (iv) of Theorem 5.7.12. Since U is a compact subset of X, it follows that f(U) is a compact subset of R. Thus, f(U) is bounded and closed. From this it follows that f is bounded. To prove part (iii), note that if U is a non-empty compact subset of {X; ρx}, then f(U) is a non-empty compact subset of R. This implies that f attains its infimum and supremum on U. ■

5.8. SOME IMPORTANT RESULTS IN APPLICATIONS

In this section we present two results which are used widely in applications. The first of these is called the fixed point principle, while the second is known as the Ascoli-Arzela theorem. Both of these results are widely utilized, for example, in establishing existence and uniqueness of solutions of various types of equations (ordinary differential equations, integral equations, algebraic equations, functional differential equations, and the like).
We begin by considering a special class of continuous mappings on metric
spaces, so-called contraction mappings.
5.8.1. Definition. Let {X; ρ} be a metric space and let f: X → X. The function f is said to be a contraction mapping if there exists a real number c such that 0 < c < 1 and

ρ(f(x), f(y)) ≤ c ρ(x, y)   (5.8.2)

for all x, y ∈ X.

The reader can readily verify the following result.

5.8.3. Theorem. Every contraction mapping is uniformly continuous on X.

5.8.4. Exercise. Prove Theorem 5.8.3.

The following result is known as the fixed point principle or the principle of contraction mappings.

5.8.5. Theorem. Let {X; ρ} be a complete metric space, and let f be a contraction mapping of X into X. Then

(i) there exists a unique point x₀ ∈ X such that

f(x₀) = x₀;   (5.8.6)

and
(ii) for any x₁ ∈ X, the sequence {xₙ} in X defined by

xₙ₊₁ = f(xₙ),   n = 1, 2, ...   (5.8.7)

converges to the unique element x₀ given in (5.8.6).

The unique point x₀ satisfying Eq. (5.8.6) is called a fixed point of f. In this case we say that x₀ is obtained by the method of successive approximations.
Proof. We first show that if there is an x₀ ∈ X satisfying (5.8.6), then it must be unique. Suppose that x₀ and y₀ satisfy (5.8.6). Then by inequality (5.8.2), we have ρ(x₀, y₀) ≤ c ρ(x₀, y₀). Since 0 < c < 1, it follows that ρ(x₀, y₀) = 0 and therefore x₀ = y₀.

Now let x₁ be any point in X. We want to show that the sequence {xₙ} generated by Eq. (5.8.7) is a Cauchy sequence. For any n > 1, we have ρ(xₙ₊₁, xₙ) ≤ c ρ(xₙ, xₙ₋₁). By induction we see that ρ(xₙ₊₁, xₙ) ≤ cⁿ⁻¹ ρ(x₂, x₁) for n = 1, 2, .... Thus, for any m > n we have

ρ(x_m, xₙ) ≤ Σ_{k=n}^{m−1} ρ(x_{k+1}, x_k) ≤ cⁿ⁻¹ ρ(x₂, x₁)[1 + c + ... + c^{m−n−1}] ≤ (cⁿ⁻¹/(1 − c)) ρ(x₂, x₁).

Since 0 < c < 1, the right-hand side of the above inequality can be made arbitrarily small by choosing n sufficiently large. Thus, {xₙ} is a Cauchy sequence.

Next, since {X; ρ} is complete, it follows that {xₙ} converges; i.e., lim xₙ exists. Let lim xₙ = x. Now since f is continuous on X, we have lim f(xₙ) = f(lim xₙ). But f(lim xₙ) = f(x) and lim f(xₙ) = lim xₙ₊₁ = x. Thus, f(x) = x and we have proven the existence of a fixed point of f. Since we have already proven uniqueness, the proof is complete. ■
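The method of successive approximations (5.8.7) is directly computable. The sketch below applies it to f(x) = cos x on X = [0, 1], which is complete, maps into itself, and is a contraction with c = sin 1 < 1; this particular f is an assumed illustration, not from the text.

```python
import math

def fixed_point(f, x1, tol=1e-12, max_iter=1000):
    """Successive approximations x_{n+1} = f(x_n), stopped once consecutive
    iterates agree to within tol."""
    x = x1
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x0 = fixed_point(math.cos, 0.5)
assert abs(math.cos(x0) - x0) < 1e-10  # f(x0) = x0, the unique fixed point
```

The iterates converge geometrically with ratio at most c, mirroring the cⁿ⁻¹ estimate in the proof above.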
It may turn out that the composite function f⁽ⁿ⁾ ≜ f ∘ f ∘ ... ∘ f is a contraction mapping, whereas f is not. The following result shows that such a mapping still has a unique fixed point.

5.8.8. Corollary. Let {X; ρ} be a complete metric space, and let f: X → X be continuous on X. If the composite function f⁽ⁿ⁾ = f ∘ f ∘ ... ∘ f is a contraction mapping, then there is a unique point x₀ ∈ X such that

f(x₀) = x₀.   (5.8.9)

Moreover, the fixed point can be determined by the method of successive approximations (see Theorem 5.8.5).

5.8.10. Exercise. Prove Corollary 5.8.8.

We will consider several applications of the above results in the last section of this chapter.

Before we can consider the Arzela-Ascoli theorem, we need to introduce the following concept.

5.8.11. Definition. Let C[a, b] denote the set of all continuous real-valued functions defined on the interval [a, b] of the real line R. A subset Y of C[a, b] is said to be equicontinuous on [a, b] if for every ε > 0 there exists a δ > 0 such that |x(t) − x(t₀)| < ε for all x ∈ Y and all t, t₀ such that |t − t₀| < δ.

Note that in this definition δ depends only on ε and not on x or on t and t₀.

We now state and prove the Arzela-Ascoli theorem.

5.8.12. Theorem. Let {C[a, b]; ρ∞} be the metric space defined in Example 5.3.14. Let Y be a bounded subset of C[a, b]. If Y is equicontinuous on [a, b], then Y is relatively compact in C[a, b].

Proof. For each positive integer k, let us divide the interval [a, b] into k equal parts by the set of points V_k = {t_{0k}, t_{1k}, ..., t_{kk}} ⊂ [a, b]. That is, a = t_{0k} < t_{1k} < ... < t_{kk} = b, where t_{ik} = a + (i/k)(b − a), i = 0, 1, ..., k, and [a, b] = ∪_{i=1}^{k} [t_{(i−1)k}, t_{ik}] for all k = 1, 2, .... Since each V_k is a finite set, ∪_{k=1}^{∞} V_k is a countable set. For convenience of notation, let us denote this set by {τ₁, τ₂, ...}. The ordering of this set is immaterial. Next, since Y is bounded, there is a γ > 0 such that ρ∞(x, y) < γ for all x, y ∈ Y. Let x₀ be held fixed in Y, and let y ∈ Y be arbitrary. Let 0 ∈ C[a, b] be the function which is zero for all t ∈ [a, b]. Then ρ∞(y, 0) ≤ ρ∞(y, x₀) + ρ∞(x₀, 0). Hence, ρ∞(y, 0) < M for all y ∈ Y, where M = γ + ρ∞(x₀, 0). This implies that sup_{t∈[a,b]} |y(t)| < M for all y ∈ Y.

Now, let {yₙ} be an arbitrary sequence in Y. We want to show that {yₙ} contains a convergent subsequence. Since |yₙ(τ₁)| < M for all n, the sequence of real numbers {yₙ(τ₁)} contains a convergent subsequence which we shall call {y₁ₙ(τ₁)}. Again, since |y₁ₙ(τ₂)| < M for all n, the sequence of real numbers {y₁ₙ(τ₂)} contains a convergent subsequence which we shall call {y₂ₙ(τ₂)}. We see that {y₂ₙ(τ₁)} is a subsequence of {y₁ₙ(τ₁)}, and hence it is convergent. Proceeding in a similar fashion, we obtain sequences {y₁ₙ}, {y₂ₙ}, ... such that {yₖₙ} is a subsequence of {yⱼₙ} for all k > j. Furthermore, each sequence is such that limₙ yₖₙ(τᵢ) exists for each i such that 1 ≤ i ≤ k. Now let {xₙ} be the diagonal sequence {yₙₙ}. Then {xₙ} is a subsequence of {yₙ}, and limₙ xₙ(τᵢ) exists for i = 1, 2, ....

We now wish to show that {xₙ} is a Cauchy sequence in {C[a, b]; ρ∞}. Let ε > 0 be given. Since Y is equicontinuous on [a, b], we can find a positive integer k such that |xₙ(t) − xₙ(t′)| < ε/3 for every n whenever |t − t′| < 1/k. Since {xₙ(τᵢ)} is a convergent sequence of real numbers, there exists a positive integer N such that |xₙ(τᵢ) − x_m(τᵢ)| < ε/3 whenever m > N and n > N, for all τᵢ ∈ V_k. Now, if t ∈ [a, b], there is some τᵢ ∈ V_k such that |t − τᵢ| < 1/k. Hence, for all m > N and n > N, we have

|x_m(t) − xₙ(t)| ≤ |x_m(t) − x_m(τᵢ)| + |x_m(τᵢ) − xₙ(τᵢ)| + |xₙ(τᵢ) − xₙ(t)| < ε.

This implies that ρ∞(x_m, xₙ) < ε for all m, n > N. Therefore, {xₙ} is a Cauchy sequence in C[a, b]. Since {C[a, b]; ρ∞} is a complete metric space (see Example 5.5.28), {xₙ} converges to some point in C[a, b]. This implies that {yₙ} has a subsequence which converges to a point in C[a, b] and so, by Theorem 5.6.31, Y is relatively compact in C[a, b]. This completes the proof of the theorem. ■
Our next result follows directly from Theorem 5.8.12. It is sometimes referred to as Ascoli's lemma.

5.8.13. Corollary. Let {φₙ} be a sequence of functions in {C[a, b]; ρ∞}. If {φₙ} is equicontinuous on [a, b] and uniformly bounded on [a, b] (i.e., there exists an M > 0 such that sup_{a≤t≤b} |φₙ(t)| < M for all n), then there exist a φ ∈ C[a, b] and a subsequence {φₙₖ} of {φₙ} such that {φₙₖ} converges to φ uniformly on [a, b].

5.8.14. Exercise. Prove Corollary 5.8.13.

We close the present section with the following converse to Theorem 5.8.12.

5.8.15. Theorem. Let Y be a subset of C[a, b] which is relatively compact in the metric space {C[a, b]; ρ∞}. Then Y is a bounded set and is equicontinuous on [a, b].

5.8.16. Exercise. Prove Theorem 5.8.15.

5.9. EQUIVALENT AND HOMEOMORPHIC METRIC SPACES. TOPOLOGICAL SPACES

It is possible that seemingly different metric spaces may exhibit properties which are very similar with regard to such concepts as open sets, limits of sequences, and continuity of functions. For example, for each p, 1 ≤ p ≤ ∞, the spaces Rₚⁿ (see Examples 5.3.1, 5.3.3) are different metric spaces. However, it turns out that the family of all open sets is the same in all of these metric spaces for 1 ≤ p ≤ ∞ (e.g., the family of open sets in R₁ⁿ is the same as the family of open sets in R₂ⁿ, which is the same as the family of open sets in R∞ⁿ, etc.). Furthermore, metric spaces which are not even defined on the same underlying set (e.g., the metric spaces {X; ρx} and {Y; ρy}, where X ≠ Y) may have many similar properties of the type mentioned above.

We begin with equivalence of metric spaces defined on the same underlying set.

5.9.1. Definition. Let {X; ρ₁} and {X; ρ₂} be two metric spaces defined on the same underlying set X. Let 𝒯₁ and 𝒯₂ be the topology of X determined by ρ₁ and ρ₂, respectively. Then the metrics ρ₁ and ρ₂ are said to be equivalent metrics if 𝒯₁ = 𝒯₂.
Throughout the present section we use the notation

f: {X; ρ₁} → {Y; ρ₂}

to indicate a mapping from X into Y, where the metric on X is ρ₁ and the metric on Y is ρ₂. This distinction becomes important in the case where X = Y, i.e., in the case f: {X; ρ₁} → {X; ρ₂}.

Let us denote by i the identity mapping from X onto X; i.e., i(x) = x for all x ∈ X. Clearly, i is a bijective mapping, and the inverse is simply i itself. However, since the domain and range of i may have different metrics associated with them, we shall write

i: {X; ρ₁} → {X; ρ₂}

and

i⁻¹: {X; ρ₂} → {X; ρ₁}.

With the foregoing statements in mind, we provide in the following theorem a number of equivalent statements to characterize equivalent metrics.

5.9.2. Theorem. Let {X; ρ₁}, {X; ρ₂}, and {Y; ρ₃} be metric spaces. Then the following statements are equivalent:

(i) ρ₁ and ρ₂ are equivalent metrics;
(ii) for any mapping f: X → Y, f: {X; ρ₁} → {Y; ρ₃} is continuous on X if and only if f: {X; ρ₂} → {Y; ρ₃} is continuous on X;
(iii) the mapping i: {X; ρ₁} → {X; ρ₂} is continuous on X, and the mapping i⁻¹: {X; ρ₂} → {X; ρ₁} is continuous on X; and
(iv) for any sequence {xₙ} in X, {xₙ} converges to a point x in {X; ρ₁} if and only if {xₙ} converges to x in {X; ρ₂}.

Proof. To prove this theorem we show that statement (i) implies statement (ii); that statement (ii) implies statement (iii); that statement (iii) implies statement (iv); and that statement (iv) implies statement (i).

To show that (i) implies (ii), assume that ρ₁ and ρ₂ are equivalent metrics, and let f be any continuous mapping from {X; ρ₁} into {Y; ρ₃}. Let U be any open set in {Y; ρ₃}. Since f is continuous, f⁻¹(U) is an open set in {X; ρ₁}. Since ρ₁ and ρ₂ are equivalent metrics, f⁻¹(U) is also an open set in {X; ρ₂}. Hence, the mapping f: {X; ρ₂} → {Y; ρ₃} is continuous. The proof of the converse in statement (ii) is identical.

We now show that (ii) implies (iii). Clearly, the mapping i: {X; ρ₂} → {X; ρ₂} is continuous. Now assume the validity of statement (ii), and let {Y; ρ₃} = {X; ρ₂}. Then i: {X; ρ₁} → {X; ρ₂} is continuous. Again, it is clear that i⁻¹: {X; ρ₁} → {X; ρ₁} is continuous. Letting {Y; ρ₃} = {X; ρ₁} in statement (ii), it follows that i⁻¹: {X; ρ₂} → {X; ρ₁} is continuous.

Next, we show that (iii) implies (iv). Let i: {X; ρ₁} → {X; ρ₂} be continuous, and let the sequence {xₙ} in metric space {X; ρ₁} converge to x. By Theorem 5.7.8, lim i(xₙ) = i(x); i.e., lim xₙ = x in {X; ρ₂}. The converse is proven in the same manner.

Finally, we show that (iv) implies (i). Let U be an open set in {X; ρ₁}. Then its complement Ũ is closed in {X; ρ₁}. Now let {xₙ} be a sequence in Ũ which converges to x in {X; ρ₁}. Then x ∈ Ũ by part (iii) of Theorem 5.5.8. By assumption, {xₙ} converges to x in {X; ρ₂} also. Furthermore, since x ∈ Ũ, Ũ is closed in {X; ρ₂} by part (iii) of Theorem 5.5.8. Hence, U is open in {X; ρ₂}. Letting U be an open set in {X; ρ₂}, by the same reasoning we conclude that U is open in {X; ρ₁}. Thus, ρ₁ and ρ₂ are equivalent metrics.

This concludes the proof of the theorem. ■
The next result establishes sufficient conditions for two metrics to be equivalent. These conditions are not necessary, however.

5.9.3. Theorem. Let {X; ρ₁} and {X; ρ₂} be two metric spaces. If there exist two positive real numbers, γ and λ, such that

γ ρ₂(x, y) ≤ ρ₁(x, y) ≤ λ ρ₂(x, y)

for all x, y ∈ X, then ρ₁ and ρ₂ are equivalent metrics.

5.9.4. Exercise. Prove Theorem 5.9.3.

Let us now consider some specific examples of equivalent metric spaces.

5.9.5. Exercise. Let {X; ρ} be any metric space. For the example of Exercise 5.1.10, the reader showed that {X; ρ₁} is a metric space, where

ρ₁(x, y) = ρ(x, y)/(1 + ρ(x, y))

for all x, y ∈ X. Show that ρ and ρ₁ are equivalent metrics.

5.9.6. Theorem. Let {Rⁿ; ρ₁} = R₁ⁿ and {Rⁿ; ρ₂} = R₂ⁿ be the metric spaces defined in Example 5.3.1, and let {Rⁿ; ρ∞} be the metric space defined in Example 5.3.3. Then

(i) ρ∞(x, y) ≤ ρ₂(x, y) ≤ √n ρ∞(x, y) for all x, y ∈ Rⁿ;
(ii) ρ∞(x, y) ≤ ρ₁(x, y) ≤ n ρ∞(x, y) for all x, y ∈ Rⁿ; and
(iii) ρ₁, ρ₂, and ρ∞ are equivalent metrics.

5.9.7. Exercise. Prove Theorem 5.9.6.
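The two-sided bounds of Theorem 5.9.6, which combine with Theorem 5.9.3 to give equivalence, can be spot-checked on random vectors (an illustration, not a proof):

```python
import math
import random

def rho1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def rho2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def rho_inf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

random.seed(7)
n = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    d1, d2, di = rho1(x, y), rho2(x, y), rho_inf(x, y)
    # Theorem 5.9.6 (i) and (ii), with small float tolerances:
    assert di <= d2 + 1e-12 and d2 <= math.sqrt(n) * di + 1e-9
    assert di <= d1 + 1e-12 and d1 <= n * di + 1e-9
```

Both constants are sharp: √n and n are attained when all coordinates of x − y are equal, while the lower bounds are attained when only one coordinate differs.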

It can be shown that for the metric spaces {Rⁿ; ρₚ} and {Rⁿ; ρ_q}, ρₚ and ρ_q are equivalent metrics for any p, q such that 1 ≤ p ≤ ∞, 1 ≤ q ≤ ∞.

In Example 5.1.12, we defined a metric ρ*, called the usual metric for R*. Up until now, it has not been apparent that there is any meaningful connection between ρ* and the usual metric for R. The following result shows that when ρ* is restricted to R, it is equivalent to the usual metric on R.
5.9.8. Theorem. Let {R; ρ} denote the real line with the usual metric, and let {R*; ρ*} denote the extended real line (see Exercise 5.1.12). Consider {R; ρ*}, which is a metric subspace of {R*; ρ*}. Then

(i) for the metric spaces {R; ρ} and {R; ρ*}, ρ and ρ* are equivalent metrics;
(ii) if U ⊂ R, then U is open in {R; ρ} if and only if U is open in {R*; ρ*}; and
(iii) if U is open in {R*; ρ*}, then U ∩ R, U − {+∞}, and U − {−∞} are open in {R*; ρ*}.

5.9.9. Exercise. Prove Theorem 5.9.8. (Hint: Use part (iii) of Theorem 5.9.2 to prove part (i) of this theorem.)

Our next example shows that i⁻¹ need not be continuous, even though i is continuous.

5.9.10. Example. Let X be any non-empty set, and let ρ₁ be the discrete metric on X (see Example 5.1.7). In Exercise 5.4.26 the reader was asked to show that every subset of X is open in {X; ρ₁}. Now let {X; ρ} be an arbitrary metric space with the same underlying set X. Clearly, i: {X; ρ₁} → {X; ρ} is continuous. However, i⁻¹: {X; ρ} → {X; ρ₁} is not continuous unless every subset of {X; ρ} is open. Since this is usually not true, i⁻¹ need not be continuous. ■
Next, we introduce the concepts of homeomorphism and homeomorphic metric spaces.

5.9.11. Definition. Two metric spaces {X; ρx} and {Y; ρy} are said to be homeomorphic if there exists a mapping φ: {X; ρx} → {Y; ρy} such that (i) φ is a bijective mapping of X onto Y, and (ii) E ⊂ X is open in {X; ρx} if and only if φ(E) is open in {Y; ρy}. The mapping φ is called a homeomorphism.
We immediately have the following generalization of Theorem 5.9.2.
5.9.12. Theorem. Let {X; ρx}, {Y; ρy}, and {Z; ρz} be metric spaces, and let φ be a bijective mapping of {X; ρx} onto {Y; ρy}. Then the following statements are equivalent:

(i) φ is a homeomorphism;
(ii) for any mapping f: X → Z, f: {X; ρx} → {Z; ρz} is continuous on X if and only if f ∘ φ⁻¹: {Y; ρy} → {Z; ρz} is continuous on Y;
(iii) φ: {X; ρx} → {Y; ρy} is continuous and φ⁻¹: {Y; ρy} → {X; ρx} is continuous; and
(iv) for any sequence {xₙ} in X, {xₙ} converges to a point x in {X; ρx} if and only if {φ(xₙ)} converges to φ(x) in {Y; ρy}.

5.9.13. Exercise. Prove Theorem 5.9.12.

The connection between homeomorphic metric spaces defined on the same


underlying set and equivalent metrics is provided by the next result.
5.9.14. Theorem. Let {X; ρ₁} and {X; ρ₂} be two metric spaces with the same underlying set X. Then ρ₁ and ρ₂ are equivalent if and only if the identity mapping i: {X; ρ₁} → {X; ρ₂} is a homeomorphism.
5.9.15. Exercise. Prove Theorem 5.9.14.

It is possible for {X; ρ₁} and {X; ρ₂} to be homeomorphic, even though ρ₁ and ρ₂ may not be equivalent.
There are important cases for which the metric relations between the elements of two distinct metric spaces are the same. In such cases only the nature of the elements of the metric spaces differs. Since this difference may be of no importance, such spaces may often be viewed as being essentially identical. Such metric spaces are said to be isometric. Specifically, we have:

5.9.16. Definition. Let {X; ρx} and {Y; ρy} be two metric spaces, and let φ: {X; ρx} → {Y; ρy} be a bijective mapping of X onto Y. The mapping φ is said to be an isometry if

ρx(x, y) = ρy(φ(x), φ(y))

for all x, y ∈ X. If such an isometry exists, then the metric spaces {X; ρx} and {Y; ρy} are said to be isometric.

5.9.17. Theorem. Let φ be an isometry. Then φ is a homeomorphism.

5.9.18. Exercise. Prove Theorem 5.9.17.

We close the present section by introducing the concept of topological space. It turns out that metric spaces are special cases of such spaces.
Chapter 5 / Metric Spaces    322

In Theorem 5.4.15 we showed that, in the case of a metric space {X; ρ}, (i) the empty set ∅ and the entire space X are open; (ii) the union of an arbitrary collection of open sets is open; and (iii) the intersection of a finite collection of open sets is open. Examining the various proofs of the present chapter, we note that a great deal of the development of metric spaces is not a consequence of the metric but, rather, depends only on the properties of certain open and closed sets. Taking the notion of open set as basic (instead of the concept of distance, as in the case of metric spaces) and taking the aforementioned properties of open sets as postulates, we can form a mathematical structure which is much more general than the metric space.
5.9.19. Definition. Let X be a non-void set of points, and let 𝔗 be a family of subsets of X which we will call open. We call the pair {X; 𝔗} a topological space if the following hold:

(i) X ∈ 𝔗, ∅ ∈ 𝔗;
(ii) if U₁ ∈ 𝔗 and U₂ ∈ 𝔗, then U₁ ∩ U₂ ∈ 𝔗; and
(iii) for any index set A, if α ∈ A and U_α ∈ 𝔗, then ⋃_{α∈A} U_α ∈ 𝔗.

The family 𝔗 is called the topology for the set X. The complement of an open set U ∈ 𝔗 with respect to X is called a closed set.
The reader can readily verify the following results:
5.9.20. Theorem. Let {X; 𝔗} be a topological space. Then

(i) ∅ is closed;
(ii) X is closed;
(iii) the union of a finite number of closed sets is closed; and
(iv) the intersection of an arbitrary collection of closed sets is closed.

5.9.21. Exercise. Prove Theorem 5.9.20.

We close the present section by citing several specific examples of topological spaces.

5.9.22. Example. In view of Theorem 5.4.15, every metric space is a topological space. ■

5.9.23. Example. Let X = {x, y}, and let the open sets in X be the void set ∅, the set X itself, and the set {x}. If 𝔗 is defined in this way, then {X; 𝔗} is a topological space. In this case the closed sets are ∅, X, and {y}. ■
5.9.24. Example. Although many fundamental concepts carry over from metric spaces to topological spaces, it turns out that the concept of topological space is often too general. Therefore, it is convenient to suppose that certain topological spaces satisfy some additional conditions which are also true in metric spaces. These conditions, called the separation axioms, are imposed on topological spaces {X; 𝔗} to form the following important special cases:


T₁-spaces: A topological space {X; 𝔗} is called a T₁-space if every set consisting of a single point is closed. Equivalently, a space is called a T₁-space provided that if x and y are distinct points, there is an open set containing y but not x. Clearly, metric spaces satisfy the T₁-axiom.

T₂-spaces: A topological space {X; 𝔗} is called a T₂-space if for all distinct points x, y ∈ X there are disjoint open sets Ux and Uy such that x ∈ Ux and y ∈ Uy. T₂-spaces are also called Hausdorff spaces. All metric spaces are Hausdorff spaces. Also, all T₂-spaces are T₁-spaces. However, there are T₁-spaces which do not satisfy the T₂-separation axiom.

T₃-spaces: A topological space {X; 𝔗} is called a T₃-space if (i) it is a T₁-space, and (ii) given a closed set Y and a point x not in Y, there are disjoint open sets U₁ and U₂ such that x ∈ U₁ and Y ⊂ U₂. T₃-spaces are also called regular topological spaces. All metric spaces are T₃-spaces. All T₃-spaces are T₂-spaces; however, not all T₂-spaces are T₃-spaces.

T₄-spaces: A topological space {X; 𝔗} is called a T₄-space if (i) it is a T₁-space, and (ii) for each pair of disjoint closed sets Y₁, Y₂ in X there exists a pair of disjoint open sets U₁, U₂ such that Y₁ ⊂ U₁ and Y₂ ⊂ U₂. T₄-spaces are also called normal topological spaces. Such spaces are clearly T₃-spaces. However, there are T₃-spaces which are not normal topological spaces. On the other hand, all metric spaces are T₄-spaces. ■

5.10. APPLICATIONS

The present section consists of two parts (subsections A and B). In the first part we make extensive use of the contraction mapping principle to establish existence and uniqueness results for various types of equations. This part consists essentially of some specific examples.
In the second part, we continue the discussion of Section 4.11, dealing with ordinary differential equations. Specifically, we will apply Ascoli's lemma, and we will answer the questions raised at the end of subsection 4.11A.

A. Applications of the Contraction Mapping Principle

In our first example we consider a scalar algebraic equation which may be linear or nonlinear.
5.10.1. Example. Consider the equation

x = f(x),    (5.10.2)

where f: [a, b] → [a, b] and where [a, b] is a closed interval of R. Let L > 0, and assume that f satisfies the condition

|f(x₂) − f(x₁)| ≤ L|x₂ − x₁|    (5.10.3)

for all x₁, x₂ ∈ [a, b]. In this case f is said to satisfy a Lipschitz condition, and L is called a Lipschitz constant.
Now consider the complete metric space {R; ρ}, where ρ denotes the usual metric on the real line. Then {[a, b]; ρ} is a complete metric subspace of {R; ρ} (see Theorem 5.5.33). If in (5.10.3) we assume that L < 1, then f is clearly a contraction mapping, and Theorem 5.8.5 applies. It follows that if L < 1, then Eq. (5.10.2) possesses a unique solution. Specifically, if x₀ ∈ [a, b], then the sequence {xₙ}, n = 1, 2, ..., determined by xₙ = f(xₙ₋₁) converges to the unique solution of Eq. (5.10.2).
Note that if |df(x)/dx| = |f′(x)| ≤ c < 1 on the interval [a, b] (in this case f′(a) denotes the right-hand derivative of f at a, and f′(b) denotes the left-hand derivative of f at b), then f is clearly a contraction.
In Figures J and K the applicability of the contraction mapping principle is demonstrated pictorially. As indicated, the sequence {xₙ} determined by successive approximations converges to the fixed point x̄.

5.10.4. Figure J. Successive approximations (convergent case).

5.10.5. Figure K. Successive approximations (convergent case).
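The successive-approximation scheme xₙ = f(xₙ₋₁) of Example 5.10.1 is easy to carry out numerically. Below is a minimal sketch; the choice f(x) = cos x on [0, 1], where |f′(x)| = |sin x| ≤ sin 1 < 1, is our own illustration and not from the text:

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=200):
    """Successive approximations x_n = f(x_{n-1}) for x = f(x) (Theorem 5.8.5)."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

# cos maps [0, 1] into [cos 1, 1], a subset of [0, 1], and is a contraction
# there, so the iteration converges to the unique solution of x = cos x.
root = fixed_point(math.cos, 0.5)
assert abs(math.cos(root) - root) < 1e-10
```

The convergence is geometric with ratio at most the Lipschitz constant, as the contraction mapping principle predicts.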


In our next example we consider a system of linear equations.

5.10.6. Example. Consider the system of n linear equations

ξᵢ = Σ_{j=1}^n aᵢⱼξⱼ + βᵢ,   i = 1, ..., n.    (5.10.7)

Assume that x = (ξ₁, ..., ξₙ) ∈ Rⁿ, b = (β₁, ..., βₙ) ∈ Rⁿ, and aᵢⱼ ∈ R. Here the constants aᵢⱼ, βᵢ are known and the ξᵢ are unknown. In the following we use the contraction mapping principle to determine conditions for the existence and uniqueness of solutions of Eq. (5.10.7). In doing so we consider different metric spaces. In all cases we let

y = f(x)

denote the mapping determined by the system of linear equations

ηᵢ = Σ_{j=1}^n aᵢⱼξⱼ + βᵢ,   i = 1, ..., n,

where y = (η₁, ..., ηₙ) ∈ Rⁿ.
First we consider the complete space {Rⁿ; ρ₁} = R₁ⁿ. Let y′ = f(x′), y″ = f(x″), x′ = (ξ′₁, ..., ξ′ₙ), and x″ = (ξ″₁, ..., ξ″ₙ). We have

ρ₁(y′, y″) = ρ₁(f(x′), f(x″)) = Σ_{i=1}^n |Σ_{j=1}^n aᵢⱼ(ξ′ⱼ − ξ″ⱼ)|
    ≤ Σ_{i=1}^n Σ_{j=1}^n |aᵢⱼ| |ξ′ⱼ − ξ″ⱼ| ≤ {max_j Σ_{i=1}^n |aᵢⱼ|} ρ₁(x′, x″),

where in the preceding the Hölder inequality for finite sums was used (see Theorem 5.2.1). Clearly, f is a contraction if the inequality

Σ_{i=1}^n |aᵢⱼ| < 1    (5.10.8)

holds for all j. Thus, Eq. (5.10.7) possesses a unique solution if (5.10.8) holds for all j.
Next, we consider the complete space {Rⁿ; ρ₂} = R₂ⁿ. We have

ρ₂²(y′, y″) = ρ₂²(f(x′), f(x″)) = Σ_{i=1}^n {Σ_{j=1}^n aᵢⱼ(ξ′ⱼ − ξ″ⱼ)}²
    ≤ {Σ_{i=1}^n Σ_{j=1}^n aᵢⱼ²} ρ₂²(x′, x″),

where, in the preceding, the Schwarz inequality for finite sums was employed (see Theorem 5.2.1). It follows that f is a contraction, provided that the inequality

Σ_{i=1}^n Σ_{j=1}^n aᵢⱼ² < 1    (5.10.9)

holds. Therefore, Eq. (5.10.7) possesses a unique solution if (5.10.9) is satisfied.
Lastly, let us consider the complete metric space {Rⁿ; ρ∞} = R∞ⁿ. We have

ρ∞(y′, y″) = ρ∞(f(x′), f(x″)) = max_i |Σ_{j=1}^n aᵢⱼ(ξ′ⱼ − ξ″ⱼ)|
    ≤ {max_i Σ_{j=1}^n |aᵢⱼ|} ρ∞(x′, x″).

Thus, f is a contraction if

max_i Σ_{j=1}^n |aᵢⱼ| < 1.    (5.10.10)

Hence, if (5.10.10) holds, then Eq. (5.10.7) has a unique solution.


In summary, if anyone of the conditions (5.10.8), (5.10.9), or (5.10.10)
holds, then Eq. (5.10.7) possesses a uniq u e solution, namely .x This solution
can be determined by the successive approx i mation
I- '..t(k) - -

for all i =

I, ...

a IJ ' - I..t(k- I )

1-

bI'

k -- " 1 2 ... ,

,n, with starting point X C01


=

...

(~iO),

,~~O.

(5.10.11)
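The iteration (5.10.11) can be sketched directly. The following checks condition (5.10.10), the maximum absolute row sum being less than 1, before iterating; the 2 × 2 example data are our own and not from the text:

```python
def solve_linear_fixed_point(A, b, x0=None, tol=1e-12, max_iter=1000):
    """Iterate x^(k) = A x^(k-1) + b, per Eq. (5.10.11)."""
    n = len(b)
    # Condition (5.10.10): the maximum absolute row sum of A must be < 1.
    if max(sum(abs(a) for a in row) for row in A) >= 1:
        raise ValueError("contraction condition (5.10.10) not satisfied")
    x = x0 or [0.0] * n
    for _ in range(max_iter):
        x_new = [sum(A[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("iteration did not converge")

A = [[0.2, 0.1], [0.0, 0.3]]   # max row sum = 0.3 < 1, so f is a contraction
b = [1.0, 2.0]
x = solve_linear_fixed_point(A, b)
# The limit satisfies x_i = sum_j a_ij x_j + b_i:
assert all(abs(sum(A[i][j] * x[j] for j in range(2)) + b[i] - x[i]) < 1e-9
           for i in range(2))
```

Any one of (5.10.8), (5.10.9), or (5.10.10) would do as a gate; (5.10.10) is checked here because it is the cheapest to compute row by row.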

Next, let us consider an integral equation.

5.10.12. Example. Let ψ ∈ C[a, b], and let K(s, t) be a real-valued function which is continuous on the square [a, b] × [a, b]. Let λ ∈ R. We call

x(s) = ψ(s) + λ ∫_a^b K(s, t)x(t) dt    (5.10.13)

a Fredholm non-homogeneous linear integral equation of the second kind. In this equation x is the unknown, K(s, t) and ψ are specified, and λ is regarded as an arbitrary parameter.
We now show that for all |λ| sufficiently small, Eq. (5.10.13) has a unique solution which is continuous on [a, b]. To this end, consider the complete metric space {C[a, b]; ρ∞}, and let y = f(x) denote the mapping determined by

y(s) = ψ(s) + λ ∫_a^b K(s, t)x(t) dt.

Clearly y ∈ C[a, b]. We thus have f: C[a, b] → C[a, b]. Now let M = sup_{a≤s≤b, a≤t≤b} |K(s, t)|. Then

ρ∞(f(x₁), f(x₂)) ≤ |λ|M(b − a) ρ∞(x₁, x₂).

Therefore, if we choose λ so that

|λ| < 1/[M(b − a)],    (5.10.14)

then f is a contraction mapping. From Theorem 5.8.5 it now follows that Eq. (5.10.13) possesses a unique solution x ∈ C[a, b] if (5.10.14) holds. Starting at x₀ ∈ C[a, b], successive approximations to this solution are given by

xₙ(s) = ψ(s) + λ ∫_a^b K(s, t)xₙ₋₁(t) dt,   n = 1, 2, 3, ....    (5.10.15)
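A discretized sketch of the iteration (5.10.15), with the integral replaced by the trapezoidal rule. The data K ≡ 1, ψ ≡ 1, λ = 0.5 on [0, 1] are our own toy choice; there |λ|M(b − a) = 0.5 < 1, so (5.10.14) holds, and the exact solution is the constant x(s) = 2:

```python
def fredholm_iterate(psi, K, lam, a, b, n_grid=101, n_iter=60):
    """Successive approximations (5.10.15), with the integral over [a, b]
    replaced by the trapezoidal rule on a uniform grid."""
    h = (b - a) / (n_grid - 1)
    s = [a + i * h for i in range(n_grid)]
    x = [psi(si) for si in s]                 # x_0 = psi
    for _ in range(n_iter):
        x = [psi(si) + lam * h * sum(
                 (0.5 if j in (0, n_grid - 1) else 1.0) * K(si, s[j]) * x[j]
                 for j in range(n_grid))
             for si in s]
    return s, x

# Toy data (our own): K = 1, psi = 1, lam = 0.5 on [0, 1]; exact solution 2.
s, x = fredholm_iterate(lambda s: 1.0, lambda s, t: 1.0, 0.5, 0.0, 1.0)
assert max(abs(v - 2.0) for v in x) < 1e-8
```

For this constant-kernel problem the trapezoidal rule is exact, so the only residual error is the geometric factor (1/2)ⁿ left after n iterations.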

Next, we consider yet another type of integral equation.

5.10.16. Example. Let φ ∈ C[a, b], let K(s, t) be a real continuous function on the triangle a ≤ t ≤ s ≤ b, and let λ ∈ R. We call

x(s) = φ(s) + λ ∫_a^s K(s, t)x(t) dt,   a ≤ s ≤ b,    (5.10.17)

a linear Volterra integral equation. Here x is unknown, K(s, t) and φ are specified, and λ is an arbitrary parameter.
We now show that, for all λ, Eq. (5.10.17) possesses a unique continuous solution. We consider again the complete metric space {C[a, b]; ρ∞}, and we let y = f(x) be the mapping determined by

y(s) = φ(s) + λ ∫_a^s K(s, t)x(t) dt.

Since the right-hand side of this expression is continuous, it follows that f: C[a, b] → C[a, b]. Moreover, since K is continuous, there is an M such that |K(s, t)| ≤ M. Let y₁ = f(x₁), and let y₂ = f(x₂). As in the preceding example, we have

ρ∞(f(x₁), f(x₂)) = ρ∞(y₁, y₂) ≤ |λ|M(b − a) ρ∞(x₁, x₂).

Now let f⁽ⁿ⁾ denote the composite mapping f ∘ f ∘ ⋯ ∘ f (n times), and let f⁽ⁿ⁾(x) = y⁽ⁿ⁾. A little bit of algebra yields

ρ∞(f⁽ⁿ⁾(x₁), f⁽ⁿ⁾(x₂)) = ρ∞(y₁⁽ⁿ⁾, y₂⁽ⁿ⁾) ≤ (1/n!)|λ|ⁿMⁿ(b − a)ⁿ ρ∞(x₁, x₂).    (5.10.18)

However,

(1/n!)|λ|ⁿMⁿ(b − a)ⁿ → 0 as n → ∞.

Thus, for an arbitrary value of λ, n can be chosen so large that

k = (1/n!)|λ|ⁿMⁿ(b − a)ⁿ < 1.

Hence, we have

ρ∞(f⁽ⁿ⁾(x₁), f⁽ⁿ⁾(x₂)) ≤ k ρ∞(x₁, x₂),   0 < k < 1.

Therefore, the composite mapping f⁽ⁿ⁾ is a contraction mapping. It follows from Corollary 5.8.8 that Eq. (5.10.17) possesses a unique continuous solution for arbitrary λ. This solution can be determined by the method of successive approximations. ■


5.10.19. Exercise. Verify inequality (5.10.18).

Next we consider initial-value problems characterized by scalar ordinary differential equations.

5.10.20. Example. Consider the initial-value problem

ẋ = f(t, x),   x(τ) = ξ    (5.10.21)

discussed in Section 4.11. We would like to determine conditions for the existence and uniqueness of a solution φ(t) of (5.10.21) for τ ≤ t ≤ T.
Let k > 0, and assume that f satisfies the condition

|f(t, x₁) − f(t, x₂)| ≤ k|x₁ − x₂|

for all t ∈ [τ, T] and for all x₁, x₂ ∈ R. In this case we say that f satisfies a Lipschitz condition in x, and we call k a Lipschitz constant.
As was pointed out in Section 4.11, Eq. (5.10.21) is equivalent to the integral equation

φ(t) = ξ + ∫_τ^t f(s, φ(s)) ds.    (5.10.22)

Consider now the complete metric space {C[τ, T]; ρ∞}, and let

F(φ)(t) = ξ + ∫_τ^t f(s, φ(s)) ds,   τ ≤ t ≤ T.

Then clearly F: C[τ, T] → C[τ, T]. Now

ρ∞(F(φ₁), F(φ₂)) = sup_{τ≤t≤T} |∫_τ^t [f(s, φ₁(s)) − f(s, φ₂(s))] ds|
    ≤ sup_{τ≤t≤T} ∫_τ^t k|φ₁(s) − φ₂(s)| ds ≤ k(T − τ) ρ∞(φ₁, φ₂).

Thus, F is a contraction if k < 1/(T − τ).
Next, let F⁽ⁿ⁾ denote the composite mapping F ∘ F ∘ ⋯ ∘ F (n times). Similarly as in (5.10.18), the reader can verify that

ρ∞(F⁽ⁿ⁾(φ₁), F⁽ⁿ⁾(φ₂)) ≤ (1/n!)kⁿ(T − τ)ⁿ ρ∞(φ₁, φ₂).    (5.10.23)

Since

(1/n!)kⁿ(T − τ)ⁿ → 0 as n → ∞,

it follows that (1/n!)kⁿ(T − τ)ⁿ < 1 for sufficiently large n. Therefore, F⁽ⁿ⁾ is a contraction. It now follows from Corollary 5.8.8 that Eq. (5.10.21) possesses a unique solution on [τ, T]. Furthermore, this solution can be obtained by the method of successive approximations. ■

5.10.24. Exercise. Generalize Example 5.10.20 to the initial-value problem

ẋᵢ = fᵢ(t, x₁, ..., xₙ),   xᵢ(τ) = ξᵢ,   i = 1, ..., n,

which is discussed in Section 4.11.


B. Further Applications to Ordinary Differential Equations

At the end of Section 4.11A we raised the following questions: (i) When does an initial-value problem possess solutions? (ii) When are these solutions unique? (iii) What is the extent of the interval over which such solutions exist? (iv) Are these solutions continuously dependent on initial conditions?
In Example 5.10.20 we have already given a partial answer to the first two questions. In the remainder of the present section we refine the type of result given in Example 5.10.20, and we give an answer to the remaining items raised above.
As in the beginning of Section 4.11A, we call R² the (t, x) plane, we let D ⊂ R² denote an open connected set (i.e., D is a domain), we assume that f is a real-valued function which is defined and continuous on D, we call T = (t₁, t₂) ⊂ R a t interval, and we let φ denote a solution of the differential equation

ẋ = f(t, x).    (5.10.25)
The reader should refer to Section 4.11A for the definition of solution φ.
We first concern ourselves with the initial-value problem

ẋ = f(t, x),   x(τ) = ξ    (5.10.26)

characterized in Definition 4.11.3. Our first result is concerned with the existence of solutions of this problem. It is convenient to establish this result in two stages, using the notion of ε-approximate solution of Eq. (5.10.25).
5.10.27. Definition. A function φ defined and continuous on a t interval T is called an ε-approximate solution of Eq. (5.10.25) on T if

(i) (t, φ(t)) ∈ D for all t ∈ T;
(ii) φ has a continuous derivative on T except possibly on a finite set S of points in T, where jump discontinuities are allowed; and
(iii) |φ′(t) − f(t, φ(t))| ≤ ε for all t ∈ T − S.

If S is not empty, φ is said to have piecewise continuous derivatives on T.


We now prove:
5.10.28. Theorem. In Eq. (5.10.25), let f be continuous on the rectangle

D₀ = {(t, x): |t − τ| ≤ a, |x − ξ| ≤ b}.

Given any ε > 0, there exists an ε-approximate solution φ of Eq. (5.10.25) on an interval |t − τ| ≤ α ≤ a such that φ(τ) = ξ.

5.10.29. Figure L. Construction of an ε-approximate solution.

Proof. Let M = max_{(t,x)∈D₀} |f(t, x)|, and let α = min(a, b/M). Note that α = a if a ≤ b/M and α = b/M if a > b/M (refer to Figure L). We will show that an ε-approximate solution exists on the interval [τ, τ + α]. The proof is similar for the interval [τ − α, τ]. In our proof we will construct an ε-approximate solution starting at (τ, ξ), consisting of a finite number of straight line segments joined end to end (see Figure L).
Since f is continuous on the compact set D₀, it is uniformly continuous on D₀ (see Theorem 5.7.12). Hence, given ε > 0, there exists δ = δ(ε) > 0 such that |f(t, x) − f(t′, x′)| < ε whenever (t, x), (t′, x′) ∈ D₀, |t − t′| ≤ δ, and |x − x′| ≤ δ. Now let τ = t₀ and τ + α = tₙ. We divide the half-open interval (t₀, tₙ] into n half-open subintervals (t₀, t₁], (t₁, t₂], ..., (tₙ₋₁, tₙ] in such a fashion that

max_{1≤i≤n} |tᵢ − tᵢ₋₁| ≤ min{δ, δ/M}.    (5.10.30)

Next, we construct a polygonal path consisting of n straight lines joined end to end, starting at the point (τ, ξ) = (t₀, ξ₀) and having slopes equal to mᵢ₋₁ = f(tᵢ₋₁, ξᵢ₋₁) over the intervals (tᵢ₋₁, tᵢ], i = 1, ..., n, respectively, where ξᵢ = ξᵢ₋₁ + mᵢ₋₁(tᵢ − tᵢ₋₁). A typical polygonal path is shown in Figure L. Note that the graph of this path is confined to the triangular region shown in Figure L. Let us denote the polygonal path constructed in this way by φ. Note that φ is continuous on the interval [τ, τ + α], that φ is a piecewise linear function, and that φ is piecewise continuously differentiable. Indeed, we have φ(τ) = ξ₀ = ξ and

φ(t) = φ(tᵢ₋₁) + f(tᵢ₋₁, φ(tᵢ₋₁))(t − tᵢ₋₁),   tᵢ₋₁ < t ≤ tᵢ,   i = 1, ..., n.    (5.10.31)

Also note that

|φ(t) − φ(t′)| ≤ M|t − t′|    (5.10.32)

for all t, t′ ∈ [τ, τ + α].
We now show that φ is an ε-approximate solution. Let t ∈ (tᵢ₋₁, tᵢ]. Then it follows from (5.10.30) and (5.10.32) that |φ(t) − φ(tᵢ₋₁)| ≤ δ. Now since |f(t, x) − f(t′, x′)| < ε whenever (t, x), (t′, x′) ∈ D₀, |t − t′| ≤ δ, and |x − x′| ≤ δ, it follows from Eq. (5.10.31) that

|φ′(t) − f(t, φ(t))| = |f(tᵢ₋₁, φ(tᵢ₋₁)) − f(t, φ(t))| < ε.

Therefore, the function φ is an ε-approximate solution on the interval |t − τ| ≤ α ≤ a. ■
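The polygonal construction in this proof is exactly the Euler method. A sketch follows; the test equation ẋ = x with x(0) = 1, whose solution is eᵗ, is our own choice:

```python
import math

def euler_polygon(f, tau, xi, alpha, n):
    """Polygonal path of Theorem 5.10.28 on [tau, tau + alpha]: straight
    segments of slope f(t_{i-1}, xi_{i-1}) over each subinterval."""
    h = alpha / n
    t, x = [tau], [xi]
    for _ in range(n):
        x.append(x[-1] + f(t[-1], x[-1]) * h)   # xi_i = xi_{i-1} + m_{i-1} * h
        t.append(t[-1] + h)
    return t, x

# Test equation (our own choice): x' = x, x(0) = 1, exact solution e^t.
# Refining the mesh makes the polygon an epsilon-approximate solution with
# smaller and smaller epsilon, and the path approaches the true solution.
t, x = euler_polygon(lambda t, x: x, 0.0, 1.0, 1.0, 100000)
assert abs(x[-1] - math.e) < 2e-4
```

Here the mesh is uniform for simplicity; the proof only requires the mesh condition (5.10.30), which a uniform mesh satisfies once n is large enough.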

We are now in a position to establish conditions for the existence of solutions of the initial-value problem (5.10.26).

5.10.33. Theorem. In Eq. (5.10.25), let f be continuous on the rectangle D₀ = {(t, x): |t − τ| ≤ a, |x − ξ| ≤ b}. Then the initial-value problem (5.10.26) has a solution on some t interval given by |t − τ| ≤ α ≤ a.

Proof. Let εₙ > 0, εₙ₊₁ < εₙ, n = 1, 2, ..., and lim_{n→∞} εₙ = 0 (i.e., let {εₙ} be a monotone decreasing sequence of positive numbers tending to zero). By Theorem 5.10.28, there exists for every εₙ an εₙ-approximate solution of Eq. (5.10.25), call it φₙ, on some interval |t − τ| ≤ α such that φₙ(τ) = ξ. Now for each φₙ it is true, by construction of φₙ, that

|φₙ(t) − φₙ(t′)| ≤ M|t − t′|.    (5.10.34)

This shows that {φₙ} is an equicontinuous set of functions (see Definition 5.8.11). Letting t′ = τ in (5.10.34), we have |φₙ(t) − ξ| ≤ M|t − τ| ≤ Mα, and thus |φₙ(t)| ≤ |ξ| + Mα for all n and for all t ∈ [τ − α, τ + α]. Thus, the sequence {φₙ} is uniformly bounded. In view of the Ascoli lemma (see Corollary 5.8.13) there exists a subsequence {φ_{n_k}}, k = 1, 2, ..., of the sequence {φₙ} which converges uniformly on the interval [τ − α, τ + α] to a limit function φ; i.e., φ_{n_k} → φ uniformly on [τ − α, τ + α] as k → ∞.


This function is continuous (see Theorem 5.7.14) and, in addition, |φ(t) − φ(t′)| ≤ M|t − t′|.

To complete the proof, we must show that φ is a solution of Eq. (5.10.26) or, equivalently, that φ satisfies the integral equation

φ(t) = ξ + ∫_τ^t f(s, φ(s)) ds.    (5.10.35)

Let φₙ be an εₙ-approximate solution, let Δₙ(t) = φₙ′(t) − f(t, φₙ(t)) at those points where φₙ is differentiable, and let Δₙ(t) = 0 at the points where φₙ is not differentiable. Then φₙ can be expressed in integral form as

φₙ(t) = ξ + ∫_τ^t [f(s, φₙ(s)) + Δₙ(s)] ds.    (5.10.36)

Since φₙ is an εₙ-approximate solution, we have |Δₙ(t)| ≤ εₙ. Also, since f is uniformly continuous on D₀ and since φ_{n_k} → φ uniformly on [τ − α, τ + α] as k → ∞, it follows that |f(t, φ_{n_k}(t)) − f(t, φ(t))| ≤ ε on the interval [τ − α, τ + α] whenever k is so large that |φ_{n_k}(t) − φ(t)| ≤ δ on [τ − α, τ + α]. Using Eq. (5.10.36) we now have

|φ_{n_k}(t) − ξ − ∫_τ^t f(s, φ(s)) ds| = |∫_τ^t [f(s, φ_{n_k}(s)) − f(s, φ(s)) + Δ_{n_k}(s)] ds|
    ≤ ∫_τ^t |f(s, φ_{n_k}(s)) − f(s, φ(s))| ds + ∫_τ^t |Δ_{n_k}(s)| ds ≤ α(ε + ε_{n_k}).

Therefore, φ_{n_k}(t) → ξ + ∫_τ^t f(s, φ(s)) ds. It now follows that

φ(t) = ξ + ∫_τ^t f(s, φ(s)) ds,

which completes the proof. ■

Using Theorem 5.10.33, the reader can readily prove the next result.

5.10.37. Corollary. In Eq. (5.10.25), let f be continuous on a domain D of the (t, x) plane, and let (τ, ξ) ∈ D. Then the initial-value problem (5.10.26) has a solution φ on some t interval containing τ.

5.10.38. Exercise. Prove Corollary 5.10.37.

Theorem 5.10.33 (along with Corollary 5.10.37) is known in the literature as the Cauchy-Peano existence theorem. Note that in these results the solution φ is not guaranteed to be unique.
Next, we seek conditions under which uniqueness of solutions is assured. We require the following preliminary result, called the Gronwall inequality.
5.10.39. Theorem. Let r and k be real continuous functions on an interval [a, b]. Suppose r(t) ≥ 0 and k(t) ≥ 0 for all t ∈ [a, b], and let δ ≥ 0 be a given non-negative constant. If

r(t) ≤ δ + ∫_a^t k(s)r(s) ds    (5.10.40)

for all t ∈ [a, b], then

r(t) ≤ δ e^{∫_a^t k(s) ds}    (5.10.41)

for all t ∈ [a, b].

Proof. Let

R(t) = δ + ∫_a^t k(s)r(s) ds.

Then r(t) ≤ R(t), R(a) = δ, R′(t) = k(t)r(t) ≤ k(t)R(t), and

R′(t) − k(t)R(t) ≤ 0    (5.10.42)

for all t ∈ [a, b]. Let K(t) = e^{−∫_a^t k(s) ds}. Then K′(t) = −k(t)e^{−∫_a^t k(s) ds} = −K(t)k(t). Multiplying both sides of (5.10.42) by K(t) we have

K(t)R′(t) − K(t)k(t)R(t) ≤ 0,

or

d/dt [K(t)R(t)] ≤ 0.

Integrating this last expression from a to t, we obtain

K(t)R(t) − K(a)R(a) ≤ 0,

or

K(t)R(t) − δ ≤ 0,

or

r(t) ≤ R(t) ≤ δ e^{∫_a^t k(s) ds},

which is the desired inequality. ■
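A numerical sanity check of the theorem. The data k ≡ 1, δ = 1, and r(t) = eᵗ on [0, 1] are our own choice; they satisfy hypothesis (5.10.40) with equality, so the conclusion (5.10.41) also holds with equality:

```python
import math

def trapezoid(g, a, b, n=1000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * g(a) + sum(g(a + i * h) for i in range(1, n)) + 0.5 * g(b))

# Data (our own choice): k(s) = 1, delta = 1, r(t) = e^t on [0, 1].
# Then r(t) = 1 + integral_0^t r(s) ds, i.e. (5.10.40) holds with equality,
# and r(t) <= delta * exp(integral_0^t k(s) ds) also holds with equality.
delta = 1.0
k = lambda s: 1.0
r = lambda s: math.exp(s)
for t in (0.25, 0.5, 1.0):
    assert abs(r(t) - (delta + trapezoid(lambda s: k(s) * r(s), 0.0, t))) < 1e-4
    assert r(t) <= delta * math.exp(trapezoid(k, 0.0, t)) + 1e-6
```

The equality case shows that the exponential bound in (5.10.41) cannot be improved in general.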

In our next result we will require that the function f in Eq. (5.10.25) satisfy a Lipschitz condition

|f(t, x′) − f(t, x″)| ≤ k|x′ − x″|

for all (t, x′), (t, x″) ∈ D.

5.10.43. Theorem. In Eq. (5.10.25), let f be continuous on a domain D of the (t, x) plane, and let f satisfy a Lipschitz condition with respect to x on D. Let (τ, ξ) ∈ D. Then the initial-value problem (5.10.26) has a unique solution on some t interval containing τ (i.e., if φ₁ and φ₂ are two solutions of Eq. (5.10.25) on an interval (a, b), if τ ∈ (a, b), and if φ₁(τ) = φ₂(τ) = ξ, then φ₁ = φ₂).

Proof. By Corollary 5.10.37, at least one solution exists on some interval (a, b), τ ∈ (a, b). Now suppose there is more than one solution, say φ₁ and φ₂, to the initial-value problem (5.10.26). Then

φᵢ(t) = ξ + ∫_τ^t f(s, φᵢ(s)) ds,   i = 1, 2,

for all t ∈ (a, b), and

φ₁(t) − φ₂(t) = ∫_τ^t [f(s, φ₁(s)) − f(s, φ₂(s))] ds.

Let r(t) = |φ₁(t) − φ₂(t)|, and let k > 0 denote the Lipschitz constant for f. In the following we consider the case when t ≥ τ, and we leave the details of the proof for t < τ as an exercise. We have

r(t) ≤ ∫_τ^t |f(s, φ₁(s)) − f(s, φ₂(s))| ds ≤ ∫_τ^t k|φ₁(s) − φ₂(s)| ds = ∫_τ^t k r(s) ds;

i.e.,

r(t) ≤ ∫_τ^t k r(s) ds

for all t ∈ [τ, b). The conditions of Theorem 5.10.39 are clearly satisfied, and we have: if r(t) ≤ δ + ∫_τ^t k r(s) ds, then r(t) ≤ δ e^{k(t−τ)}. Since in the present case δ = 0, it follows that

r(t) = 0 for all t ∈ [τ, b).

Therefore, |φ₁(t) − φ₂(t)| = 0 for all t ∈ [τ, b), and φ₁(t) = φ₂(t) for all t in this interval. ■

Now suppose that in Eq. (5.10.25) f is continuous on some domain D of the (t, x) plane, and assume that f is bounded on D; i.e., suppose there exists a constant M > 0 such that

sup_{(t,x)∈D} |f(t, x)| ≤ M.

Also, assume that τ ∈ (a, b), that (τ, ξ) ∈ D, and that the initial-value problem (5.10.26) has a solution φ on a t interval (a, b) such that (t, φ(t)) ∈ D for all t ∈ (a, b). Then the limits

lim_{t→a⁺} φ(t) = φ(a⁺)  and  lim_{t→b⁻} φ(t) = φ(b⁻)

exist. To prove this, let t ∈ (a, b). Then

φ(t) = ξ + ∫_τ^t f(s, φ(s)) ds.

If a < t₁ < t₂ < b, then

|φ(t₁) − φ(t₂)| ≤ ∫_{t₁}^{t₂} |f(s, φ(s))| ds ≤ M|t₂ − t₁|.

Now let t₁ → b and t₂ → b. Then |t₁ − t₂| → 0, and therefore |φ(t₁) − φ(t₂)| → 0. This limiting process thus yields a convergent Cauchy sequence; i.e., φ(b⁻) exists. The existence of φ(a⁺) is similarly established.
Next, let us assume that the points (a, φ(a⁺)) and (b, φ(b⁻)) are in the domain D. We now show that the solution φ can be continued to the right of t = b. An identical procedure can be used to show that the solution φ can be continued to the left of t = a.
We define a function

φ̂(t) = φ(t) for t ∈ (a, b),   φ̂(b) = φ(b⁻).

Then

φ̂(t) = ξ + ∫_τ^t f(s, φ̂(s)) ds

for all t ∈ (a, b]. Thus, the derivative of φ̂(t) exists on the interval (a, b], and the left-hand derivative of φ̂(t) at t = b is given by

φ̂′(b⁻) = f(b, φ̂(b)).

Next, we consider the initial-value problem

ẋ = f(t, x),   x(b) = φ(b⁻).

By Corollary 5.10.37, the differential equation ẋ = f(t, x) has a solution ψ which passes through the point (b, φ(b⁻)) and which exists on some interval [b, b + β], β > 0. Now let

φ̃(t) = φ̂(t) for t ∈ (a, b],   φ̃(t) = ψ(t) for t ∈ [b, b + β].

To show that φ̃ is a solution of the differential equation on the interval (a, b + β], with φ̃(τ) = ξ, we must show that φ̃ is continuous at t = b. Since

ψ(t) = φ(b⁻) + ∫_b^t f(s, ψ(s)) ds

and since

φ(b⁻) = ξ + ∫_τ^b f(s, φ̂(s)) ds,

we have

φ̃(t) = ξ + ∫_τ^t f(s, φ̃(s)) ds

for all t ∈ (a, b + β]. The continuity of φ̃ in the last equation implies the continuity of f(s, φ̃(s)). Differentiating the last equation, we have

φ̃′(t) = f(t, φ̃(t))

for all t ∈ (a, b + β].
We call φ̃ a continuation of the solution φ to the interval (a, b + β]. If f satisfies a Lipschitz condition on D with respect to x, then φ̃ is unique, and we call φ̃ the continuation of φ to the interval (a, b + β].
We can repeat the above procedure of continuing solutions until the boundary of D is reached.
Now let the domain D be, in particular, a rectangle, as shown in Figure M. It is important to notice that, in general, we cannot continue solutions over the entire t interval T shown in this figure.

5.10.44. Figure M. Continuation of a solution to the boundary of domain D. Here D = {(t, x): T₁ < t < T₂, ξ₁ < x < ξ₂} and T = (T₁, T₂).

We summarize the above discussion in the following:

5.10.45. Theorem. In Eq. (5.10.25), let f be continuous and bounded on a domain D of the (t, x) plane, and let (τ, ξ) ∈ D. Then all solutions of the initial-value problem (5.10.26) can be continued to the boundary of D.
We can readily extend Theorems 5.10.28, 5.10.33, Corollary 5.10.37, and Theorems 5.10.43 and 5.10.45 to initial-value problems characterized by systems of n first-order ordinary differential equations, as given in Definition 4.11.9 and Eq. (4.11.11). In doing so we replace D ⊂ R² by D ⊂ R^{n+1}, x ∈ R by x ∈ Rⁿ, f: D → R by f: D → Rⁿ, the absolute value |x| by the quantity

|x| = Σ_{i=1}^n |xᵢ|,    (5.10.46)

and the metric ρ(x, y) = |x − y| on R by the metric ρ(x, y) = Σ_{i=1}^n |xᵢ − yᵢ| on Rⁿ. (The reader can readily verify that the function given in Eq. (5.10.46) satisfies the axioms of a norm (see Theorem 4.9.31).) The definition of ε-approximate solution for the differential equation ẋ = f(t, x) is identical to that given in Definition 5.10.27, save that scalars are replaced by vectors (e.g., the scalar function φ is replaced by an n-vector valued function).


Also, the modifications involved in defining a Lipschitz condition for f(t, x) on D ⊂ R^{n+1} are obvious.

5.10.47. Exercise. For the ordinary differential equation

ẋ = f(t, x)    (5.10.48)

and for the initial-value problem

ẋ = f(t, x),   x(τ) = ξ,    (5.10.49)

characterized in Eq. (4.11.7) and Definition 4.11.9, respectively, state and prove results for existence, uniqueness, and continuation of solutions which are analogous to Theorems 5.10.28, 5.10.33, Corollary 5.10.37, and Theorems 5.10.43 and 5.10.45.
In connection with Theorem 5.10.45 we noted that the solutions of initial-value problems described by non-linear ordinary differential equations can, in general, not be extended to the entire t interval T depicted in Figure M. We now show that in the case of initial-value problems characterized by linear ordinary differential equations it is possible to extend solutions to the entire interval T. First, we need some preliminary results.
Let

D = {(t, x): a < t < b, x ∈ Rⁿ},    (5.10.50)
where the norm |x| is as defined in Eq. (5.10.46). Consider the set of linear equations

ẋᵢ = Σ_{j=1}^n aᵢⱼ(t)xⱼ = fᵢ(t, x),   i = 1, ..., n,    (5.10.51)

where the aᵢⱼ(t), i, j = 1, ..., n, are assumed to be real and continuous functions defined on the interval [a, b]. We first show that f(t, x) = (f₁(t, x), ..., fₙ(t, x))ᵀ satisfies a Lipschitz condition on D,

|f(t, x′) − f(t, x″)| ≤ k|x′ − x″|

for all (t, x′), (t, x″) ∈ D, where x′ = (x′₁, ..., x′ₙ)ᵀ, x″ = (x″₁, ..., x″ₙ)ᵀ, and k = max_j max_{a≤t≤b} Σ_{i=1}^n |aᵢⱼ(t)|. Indeed, we have

|f(t, x′) − f(t, x″)| = Σ_{i=1}^n |fᵢ(t, x′) − fᵢ(t, x″)| = Σ_{i=1}^n |Σ_{j=1}^n aᵢⱼ(t)(x′ⱼ − x″ⱼ)|
    ≤ Σ_{j=1}^n {Σ_{i=1}^n |aᵢⱼ(t)|} |x′ⱼ − x″ⱼ| ≤ k|x′ − x″|.
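The constant k = max_j Σᵢ |aᵢⱼ| is the matrix norm induced by the norm (5.10.46), i.e., the maximum absolute column sum. A sketch with a constant coefficient matrix (our own simplification of (5.10.51); the entries are arbitrary) checks the Lipschitz bound on random vectors:

```python
import random

# Constant-coefficient instance of (5.10.51) (our own simplification: the
# a_ij do not depend on t). In the norm (5.10.46), the Lipschitz constant is
# k = max_j sum_i |a_ij|, the maximum absolute column sum of the matrix.
A = [[0.5, -1.2], [0.3, 0.8]]
n = len(A)
k = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))

def f(x):
    return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def norm(x):                                   # Eq. (5.10.46)
    return sum(abs(v) for v in x)

random.seed(0)
for _ in range(1000):
    xp = [random.uniform(-5, 5) for _ in range(n)]
    xq = [random.uniform(-5, 5) for _ in range(n)]
    d = [u - v for u, v in zip(xp, xq)]
    fd = [u - v for u, v in zip(f(xp), f(xq))]
    assert norm(fd) <= k * norm(d) + 1e-12     # |f(x') - f(x'')| <= k |x' - x''|
```

With time-varying coefficients one would take the maximum of the column sums over t in [a, b], as in the derivation above.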


Next, we prove the following:

5.10.52. Lemma. In Eq. (5.10.48), let f(t, x) = (f₁(t, x), ..., fₙ(t, x))ᵀ be continuous on a domain D ⊂ R^{n+1}, and let f(t, x) satisfy a Lipschitz condition on D with respect to x, with Lipschitz constant k. If φ₁ and φ₂ are unique solutions of the initial-value problem (5.10.49), with φ₁(τ) = ξ₁, φ₂(τ) = ξ₂, and with (τ, ξ₁), (τ, ξ₂) ∈ D, then

|φ₁(t) − φ₂(t)| ≤ |ξ₁ − ξ₂| e^{k|t−τ|}    (5.10.53)

for all (t, φ₁(t)), (t, φ₂(t)) ∈ D.

Proof. We assume that t ≥ τ, and we leave the details of the proof for t < τ as an exercise. We have

φ₁(t) = ξ₁ + ∫_τ^t f(s, φ₁(s)) ds,
φ₂(t) = ξ₂ + ∫_τ^t f(s, φ₂(s)) ds,

and

|φ₁(t) − φ₂(t)| ≤ |ξ₁ − ξ₂| + ∫_τ^t |f(s, φ₁(s)) − f(s, φ₂(s))| ds.    (5.10.54)

Applying Theorem 5.10.39 to inequality (5.10.54), the desired inequality (5.10.53) results. ■

We are now in a position to prove the following important result for systems of linear ordinary differential equations.

5.10.55. Theorem. Let D ⊂ Rⁿ⁺¹ be given by Eq. (5.10.50), and let the real functions a_ij(t), i, j = 1, ..., n, be continuous on the t interval [a, b]. Then there exists a unique solution to the initial-value problem

ẋ_i = Σ_{j=1}^{n} a_ij(t)x_j = f_i(t, x), i = 1, ..., n,
x_i(τ) = ξ_i, i = 1, ..., n,    (5.10.56)

with (τ, ξ₁, ..., ξₙ) ∈ D. This solution can be extended to the entire interval [a, b].

Proof. Since the vector f(t, x) = [f₁(t, x), ..., fₙ(t, x)]ᵀ is continuous on D, since f(t, x) satisfies a Lipschitz condition with respect to x on D, and since (τ, ξ) ∈ D (where ξ = (ξ₁, ..., ξₙ)ᵀ), it follows from Theorem 5.10.43 (interpreted for systems of first-order ordinary differential equations) that the initial-value problem (5.10.56) has a unique solution φ through the point


(τ, ξ) over some interval [c, d] ⊂ [a, b]. We must show that φ can be continued to a unique solution over the entire interval [a, b].
Let ψ be any solution of Eq. (5.10.56) through (τ, ξ) which exists on some subinterval of [a, b]. Applying Lemma 5.10.52 to φ₁ = ψ and φ₂ = 0, we have

|ψ(t)| ≤ |ξ| e^{k|t−τ|}    (5.10.57)

for all t in the domain of definition of ψ. For purposes of contradiction, suppose that φ does not have a continuation to [a, b], and assume that φ has a continuation ψ existing up to t′ < b which cannot be continued beyond t′. But inequality (5.10.57) implies that the path (t, ψ(t)) remains inside a closed bounded subset of D. It follows from Theorem 5.10.45, interpreted for systems of first-order ordinary differential equations, that ψ may be continued beyond t′. We thus have arrived at a contradiction, which proves that a continuation of φ exists on the entire interval [a, b]. This continuation is unique because f(t, x) satisfies a Lipschitz condition with respect to x on D. ■
5.10.58. Exercise. In Theorem 5.10.55, let a_ij(t), i, j = 1, ..., n, be continuous on the open interval (−∞, ∞). Show that the initial-value problem (5.10.56) possesses unique solutions for every (τ, ξ) ∈ Rⁿ⁺¹ which can be extended to the t interval (−∞, ∞).

5.10.59. Exercise. Let D ⊂ Rⁿ⁺¹ be given by Eq. (5.10.50), and let the real functions a_ij(t), v_i(t), i, j = 1, ..., n, be continuous on the t interval [a, b]. Show that there exists a unique solution to the initial-value problem

ẋ_i = Σ_{j=1}^{n} a_ij(t)x_j + v_i(t), x_i(τ) = ξ_i, i = 1, ..., n,    (5.10.60)

with (τ, ξ₁, ..., ξₙ) ∈ D. Show that this solution can be extended to the entire interval [a, b].

It is possible to relax the conditions on v_i(t), i = 1, ..., n, in the above exercise considerably. For example, it can be shown that if v_i(t) is piecewise continuous on [a, b], then the assertions of Exercise 5.10.59 still hold.
We now address ourselves to the last item of the present section. Consider the initial-value problem (5.10.49) which we characterized in Definition 4.11.9. Assume that f(t, x) satisfies a Lipschitz condition on a domain D ⊂ Rⁿ⁺¹ and that (τ, ξ) ∈ D. Then the initial-value problem possesses a unique solution over some t interval containing τ. To indicate the dependence of φ on the initial point (τ, ξ), we write

φ(t; τ, ξ),

where φ(τ; τ, ξ) = ξ. We now ask: What are the effects of different initial conditions on the solution of Eq. (5.10.48)? Our next result provides the answer.
5.10.61. Theorem. In Eq. (5.10.49) let f(t, x) satisfy a Lipschitz condition with respect to x on D ⊂ Rⁿ⁺¹. Let (τ, ξ) ∈ D. Then the unique solution φ(t; τ, ξ) of Eq. (5.10.49), existing on some bounded t interval containing τ, depends continuously on ξ on any such bounded interval. (This means if ξ_m → ξ, then φ(t; τ, ξ_m) → φ(t; τ, ξ).)

Proof. We have

φ(t; τ, ξ_m) = ξ_m + ∫_τ^t f[s, φ(s; τ, ξ_m)] ds

and

φ(t; τ, ξ) = ξ + ∫_τ^t f[s, φ(s; τ, ξ)] ds.

It follows that for t > τ (the proof for t < τ is left as an exercise),

|φ(t; τ, ξ_m) − φ(t; τ, ξ)| ≤ |ξ_m − ξ| + ∫_τ^t |f[s, φ(s; τ, ξ_m)] − f[s, φ(s; τ, ξ)]| ds
≤ |ξ_m − ξ| + k ∫_τ^t |φ(s; τ, ξ_m) − φ(s; τ, ξ)| ds,

where k denotes a Lipschitz constant for f(t, x). Using Theorem 5.10.39, we obtain

|φ(t; τ, ξ_m) − φ(t; τ, ξ)| ≤ |ξ_m − ξ| e^{∫_τ^t k ds} = |ξ_m − ξ| e^{k(t−τ)}.

Thus if ξ_m → ξ, then φ(t; τ, ξ_m) → φ(t; τ, ξ). ■

It follows from the proof of the above theorem that the convergence is uniform with respect to t on any interval [a, b] on which the solutions are defined.

5.10.62. Example. The initial-value problem

ẋ = 2x, x(τ) = ξ,    (5.10.63)

where −∞ < τ < ∞, −∞ < ξ < ∞, has the unique solution

φ(t; τ, ξ) = ξe^{2(t−τ)}, −∞ < t < ∞,

which depends continuously on the initial value ξ. ■
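Since Eq. (5.10.63) has the closed-form solution above, continuous dependence can be observed directly. The following sketch (with hypothetical values of τ, t, and ξ) evaluates |φ(t; τ, ξ_m) − φ(t; τ, ξ)| as ξ_m approaches ξ; the difference is exactly |ξ_m − ξ|e^{2(t−τ)} and shrinks to zero.

```python
import math

def phi(t, tau, xi):
    # the unique solution of x' = 2x, x(tau) = xi
    return xi * math.exp(2.0 * (t - tau))

tau, t, xi = 0.0, 1.0, 1.0
diffs = [abs(phi(t, tau, xi + 10.0 ** (-m)) - phi(t, tau, xi)) for m in range(1, 6)]
# each difference equals 10**(-m) * e**2, so diffs shrink to 0 as xi_m -> xi
print(diffs)
```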


Thus far, in the present section, we have concerned ourselves with problems characterized by real ordinary differential equations. It is an easy matter to verify that all the existence, uniqueness, continuation, and dependence (on initial conditions) results proved in the present section are also valid for initial-value problems described by complex ordinary differential equations such as those given, e.g., in Eq. (4.11.25). In this case, the norm of a complex vector z = (z₁, ..., zₙ)ᵀ, z_k = u_k + iv_k, k = 1, ..., n, is given by

|z| = Σ_{k=1}^{n} |z_k|,

where |z_k| = (u_k² + v_k²)^{1/2}. The metric on Cⁿ is in this case given by ρ(z₁, z₂) = |z₁ − z₂|.

5.11. REFERENCES AND NOTES

There are numerous excellent texts on metric spaces. Books which are especially readable include Copson [5.2], Gleason [5.3], Goldstein and Rosenbaum [5.4], Kantorovich and Akilov [5.5], Kolmogorov and Fomin [5.7], Naylor and Sell [5.8], and Royden [5.9]. Reference [5.8] includes some applications. The book by Kelley [5.6] is a standard reference on topology. An excellent reference on ordinary differential equations is the book by Coddington and Levinson [5.1].

REFERENCES

[5.1] E. A. CODDINGTON and N. LEVINSON, Theory of Ordinary Differential Equations. New York: McGraw-Hill Book Company, Inc., 1955.
[5.2] E. T. COPSON, Metric Spaces. Cambridge, England: Cambridge University Press, 1968.
[5.3] A. M. GLEASON, Fundamentals of Abstract Analysis. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1966.
[5.4] M. E. GOLDSTEIN and B. M. ROSENBAUM, "Introduction to Abstract Analysis," National Aeronautics and Space Administration, Report No. SP-203, Washington, D.C., 1969.
[5.5] L. V. KANTOROVICH and G. P. AKILOV, Functional Analysis in Normed Spaces. New York: The Macmillan Company, 1964.
[5.6] J. KELLEY, General Topology. Princeton, N.J.: D. Van Nostrand Company, Inc., 1955.
[5.7] A. N. KOLMOGOROV and S. V. FOMIN, Elements of the Theory of Functions and Functional Analysis. Vol. I. Albany, N.Y.: Graylock Press, 1957.
[5.8] A. W. NAYLOR and G. R. SELL, Linear Operator Theory in Engineering and Science. New York: Holt, Rinehart and Winston, 1971.
[5.9] H. L. ROYDEN, Real Analysis. New York: The Macmillan Company, 1965.
[5.10] A. E. TAYLOR, General Theory of Functions and Integration. New York: Blaisdell Publishing Company, 1965.

NORMED SPACES AND INNER PRODUCT SPACES

In Chapters 2-4 we concerned ourselves primarily with algebraic aspects of


certain mathematical systems, while in Chapter 5 we addressed ourselves to
topological properties of some mathematical systems. The stage is now set
to combine topological and algebraic structures. In doing so, we arrive at
linear topological spaces, namely normed linear spaces and inner product
spaces, in general, and Banach spaces and Hilbert spaces, in particular. The
properties of such spaces are the topic of the present chapter. In the next
chapter we will study linear transformations defined on Banach and Hilbert
spaces. The material of the present chapter and the next chapter constitutes
part of a branch of mathematics called functional analysis.
Since normed linear spaces and inner product spaces are vector spaces as
well as metric spaces, the results of Chapters 3 and 5 are applicable to the
spaces considered in this chapter. Furthermore, since the Euclidean spaces
considered in Chapter 4 are important examples of normed linear spaces and
inner product spaces, the reader may find it useful to refer to Section 4.9 for
proper motivation of the material to follow.
The present chapter consists of 16 sections. In the first 10 sections we
consider some of the important general properties of normed linear spaces
and Banach spaces. In Sections 11 through 14 we examine some of the
important general characteristics of inner product spaces and Hilbert spaces.
(Inner product spaces are special types of normed linear spaces; Hilbert

spaces are special cases of Banach spaces; Banach spaces are special kinds of normed linear spaces; and Hilbert spaces are special types of inner product spaces.) In Section 15, we consider two applications. This chapter is concluded with a brief discussion of pertinent references in the last section.

6.1. NORMED LINEAR SPACES

Throughout this chapter, R denotes the field of real numbers, C denotes the field of complex numbers, F denotes either R or C, and X denotes a vector space over F.
6.1.1. Definition. Let ‖·‖ denote a mapping from X into R which satisfies the following properties for every x, y ∈ X and every α ∈ F:

(i) ‖x‖ ≥ 0;
(ii) ‖x‖ = 0 if and only if x = 0;
(iii) ‖αx‖ = |α| ‖x‖; and
(iv) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

The function ‖·‖ is called a norm on X, the mathematical system consisting of X and ‖·‖, {X; ‖·‖}, is called a normed linear space, and ‖x‖ is called the norm of x. If F = C we speak of a complex normed linear space, and if F = R we speak of a real normed linear space.
Different norms defined on the same linear space X yield different normed linear spaces. If in a given discussion it is clear which particular norm is being used, we simply write X in place of {X; ‖·‖} to denote the normed linear space under consideration. Properties (iii) and (iv) in Definition 6.1.1 are called the homogeneity property and the triangle inequality of a norm, respectively.
Let {X; ‖·‖} be a normed linear space and let x_i ∈ X, i = 1, ..., n. Repeated use of the triangle inequality yields

‖x₁ + ... + xₙ‖ ≤ ‖x₁‖ + ... + ‖xₙ‖.
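As an illustration (not part of the text), the axioms of Definition 6.1.1 can be spot-checked numerically for a concrete norm; here the Euclidean norm on R³ and a few arbitrarily chosen vectors are used. Such a check cannot prove the axioms, of course, but it is a quick sanity test when implementing a candidate norm.

```python
# Spot-check the norm axioms (i)-(iv) of Definition 6.1.1 for the
# Euclidean norm on R^3; the sample vectors and scalar are arbitrary.
def norm2(v):
    return sum(c * c for c in v) ** 0.5

xs = [(1.0, -2.0, 3.0), (0.5, 0.0, -1.5), (-4.0, 2.0, 2.0)]
alpha = -2.5

checks = []
for x in xs:
    checks.append(norm2(x) >= 0.0)                                   # axiom (i)
    ax = tuple(alpha * c for c in x)
    checks.append(abs(norm2(ax) - abs(alpha) * norm2(x)) < 1e-12)    # axiom (iii)
    for y in xs:
        s = tuple(a + b for a, b in zip(x, y))
        checks.append(norm2(s) <= norm2(x) + norm2(y) + 1e-12)       # axiom (iv)
checks.append(norm2((0.0, 0.0, 0.0)) == 0.0)                         # axiom (ii)
print(all(checks))   # prints: True
```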

The following result shows that every normed linear space has a metric associated with it, induced by the norm ‖·‖. Therefore, every normed linear space is also a metric space.

6.1.2. Theorem. Let {X; ‖·‖} be a normed linear space, and let ρ be a real-valued function defined on X × X given by ρ(x, y) = ‖x − y‖ for all x, y ∈ X. Then ρ is a metric on X and {X; ρ} is a metric space.

6.1.3. Exercise. Prove Theorem 6.1.2.


This theorem tells us that all of the results in the previous chapter on metric spaces apply to normed linear spaces as well, provided we let ρ(x, y) = ‖x − y‖. We will adopt the convention that when using the terminology of metric spaces (e.g., completeness, compactness, convergence, continuity, etc.) in a normed linear space {X; ‖·‖}, we mean with respect to the metric space {X; ρ}, where ρ(x, y) = ‖x − y‖. Also, whenever we use metric space properties on F, i.e., on R or C, we mean with respect to the usual metric on R or C, respectively.
With the foregoing in mind, we now introduce the following important concept.

6.1.4. Definition. A complete normed linear space is called a Banach space.

Thus, {X; ‖·‖} is a Banach space if and only if {X; ρ} is a complete metric space, where ρ(x, y) = ‖x − y‖.

6.1.5. Example. Let X = Rⁿ, the space of n-tuples of real numbers, or let X = Cⁿ, the space of n-tuples of complex numbers. From Example 3.1.10 we see that X is a vector space. For x ∈ X given by x = (ξ₁, ..., ξₙ), and for p ∈ R such that 1 ≤ p < ∞, define

‖x‖_p = [|ξ₁|^p + ... + |ξₙ|^p]^{1/p}.

We can readily verify that ‖·‖_p satisfies the axioms of a norm. Axioms (i), (ii), (iii) of Definition 6.1.1 follow trivially, while axiom (iv) is a direct consequence of Minkowski's inequality for finite sums (5.2.6). Letting ρ_p(x, y) = ‖x − y‖_p, then {X; ρ_p} is the metric space of Exercise 5.5.25. Since {X; ρ_p} is complete, it follows that {Rⁿ; ‖·‖_p} and {Cⁿ; ‖·‖_p} are Banach spaces.
We may also define a norm on X by letting

‖x‖_∞ = max_{1≤i≤n} |ξ_i|.

It can readily be verified that {Rⁿ; ‖·‖_∞} and {Cⁿ; ‖·‖_∞} are also Banach spaces (see Exercise 5.5.25). ■

6.1.6. Example. Let X = R^∞ (see Example 3.1.11) or X = C^∞ (see Example 3.1.13), let 1 ≤ p < ∞, and as in Example 5.3.5, let

l_p = {x ∈ X: Σ_{i=1}^{∞} |ξ_i|^p < ∞}.

Define

‖x‖_p = ( Σ_{i=1}^{∞} |ξ_i|^p )^{1/p}.    (6.1.7)

It is readily verified that ‖·‖_p is a norm on the linear space l_p. Axioms (i), (ii), (iii) of Definition 6.1.1 follow trivially, while axiom (iv), the triangle inequality, follows from Minkowski's inequality for infinite sums (5.2.7). Invoking Example 5.5.26, it also follows that {l_p; ‖·‖_p} is a Banach space. Henceforth, when we simply refer to the Banach space l_p, we assume that the norm on this space is given by Eq. (6.1.7).
Letting p = ∞ and

l_∞ = {x ∈ X: sup_n |ξ_n| < ∞}

(refer to Example 5.3.8), and defining

‖x‖_∞ = sup_n |ξ_n|,    (6.1.8)

it is readily verified that {l_∞; ‖·‖_∞} is also a Banach space. When we simply refer to the Banach space l_∞, we have in mind the norm given in Eq. (6.1.8).
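The two families of norms in Example 6.1.5 are related: on Rⁿ, ‖x‖_p decreases to ‖x‖_∞ = max_i |ξ_i| as p → ∞, which motivates the subscript ∞. A small numerical sketch (the sample vector is an arbitrary choice):

```python
# Illustrate that the p-norms of Example 6.1.5 approach the max-norm
# as p grows; the gap ||x||_p - ||x||_inf is positive and shrinks to 0.
def norm_p(x, p):
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

x = (3.0, -4.0, 1.0)
norm_inf = max(abs(c) for c in x)      # equals 4.0
gaps = [norm_p(x, p) - norm_inf for p in (1, 2, 4, 8, 16, 32, 64)]
print(gaps)                            # positive and decreasing toward 0
```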

6.1.9. Example.
(a) Let C[a, b] denote the linear space of real continuous functions on the interval [a, b], as given in Example 3.1.19. For x ∈ C[a, b] define

‖x‖_p = [ ∫_a^b |x(t)|^p dt ]^{1/p},  1 ≤ p < ∞.

It is easily shown that {C[a, b]; ‖·‖_p} is a normed linear space. Axioms (i)-(iii) of Definition 6.1.1 follow trivially, while axiom (iv) follows from the Minkowski inequality for integrals (5.2.8). Let ρ_p(x, y) = ‖x − y‖_p. Then {C[a, b]; ρ_p} is a metric space which is not complete (see Example 5.5.29, where we considered the special case p = 2). It follows that {C[a, b]; ‖·‖_p} is not a Banach space.
Next, define on the linear space C[a, b] the function ‖·‖_∞ by

‖x‖_∞ = sup_{t∈[a,b]} |x(t)|.

It is readily shown that {C[a, b]; ‖·‖_∞} is a normed linear space. Let ρ_∞(x, y) = ‖x − y‖_∞. In accordance with Example 5.5.28, {C[a, b]; ρ_∞} is a complete metric space, and thus {C[a, b]; ‖·‖_∞} is a Banach space.
The above discussion can be modified in an obvious way for the case where C[a, b] consists of complex-valued continuous functions defined on [a, b]. Here vector addition and multiplication of vectors by scalars are defined similarly as in Eqs. (3.1.20) and (3.1.21), respectively. Furthermore, it is easy to show that {C[a, b]; ‖·‖_p}, 1 ≤ p < ∞, and {C[a, b]; ‖·‖_∞} are normed linear spaces with norms defined similarly as above. Once more, the space {C[a, b]; ‖·‖_p}, 1 ≤ p < ∞, is not a Banach space, while the space {C[a, b]; ‖·‖_∞} is.
(b) The metric space {L_p[a, b]; ρ_p} was defined in Example 5.5.31. It can be shown that L_p[a, b] is a vector space over R. If we let

‖f‖_p = [ ∫_{[a,b]} |f|^p dμ ]^{1/p},

p ≥ 1, for f ∈ L_p[a, b], where the integral is the Lebesgue integral, then {L_p[a, b]; ‖·‖_p} is a Banach space, since {L_p[a, b]; ρ_p} is complete, where ρ_p(x, y) = ‖x − y‖_p. ■
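The incompleteness claim for {C[a, b]; ‖·‖₁} in Example 6.1.9 can be visualized numerically. In this hypothetical sketch, continuous ramp functions xₙ on [0, 1] steepen toward a discontinuous step; their pairwise ‖·‖₁ distances shrink like 1/(4n) (so the sequence is Cauchy in ‖·‖₁), yet the pointwise limit leaves C[0, 1]. The Riemann-sum integration below is only approximate.

```python
# Continuous ramps x_n(t) = min(1, max(0, n*(t - 1/2))) on [0, 1]:
# Cauchy in the integral norm ||.||_1, but converging to a step function.
def x_n(n, t):
    return min(1.0, max(0.0, n * (t - 0.5)))

def dist1(n, m, grid=20000):
    # midpoint-rule approximation of the L1 distance on [0, 1]
    h = 1.0 / grid
    return sum(abs(x_n(n, (k + 0.5) * h) - x_n(m, (k + 0.5) * h))
               for k in range(grid)) * h

d = [dist1(n, 2 * n) for n in (4, 8, 16, 32)]
print(d)   # distances shrink roughly like 1/(4n)
```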

6.1.10. Example. Let {X; ‖·‖_X}, {Y; ‖·‖_Y} be two normed linear spaces over F, and let X × Y denote the Cartesian product of X and Y. Defining vector addition on X × Y by

(x₁, y₁) + (x₂, y₂) = (x₁ + x₂, y₁ + y₂)

and multiplication of vectors by scalars as

α(x, y) = (αx, αy),

we can readily show that X × Y is a linear space (see Eqs. (3.2.14), (3.2.15) and the related discussion). This space can be used to generate a normed linear space {X × Y; ‖·‖} by defining the norm ‖·‖ as

‖(x, y)‖ = ‖x‖_X + ‖y‖_Y.

Furthermore, if {X; ‖·‖_X} and {Y; ‖·‖_Y} are Banach spaces, then it is easily shown that {X × Y; ‖·‖} is also a Banach space. ■

6.1.11. Exercise. Verify the assertions made in Examples 6.1.5 through 6.1.10.

We note that in a normed linear space {X; ‖·‖} a sphere S(x₀; r) with center x₀ ∈ X and radius r > 0 is given by

S(x₀; r) = {x ∈ X: ‖x − x₀‖ < r}.    (6.1.12)

Referring to Theorem 5.4.27 and Exercise 5.4.31, recall that in a metric space the closure of a sphere (denoted by S̄(x₀; r)) need not coincide with the closed sphere (denoted by K(x₀; r)). In a normed linear space we have the following result.

6.1.13. Theorem. Let X be a normed linear space, let x₀ ∈ X, and let r > 0. Let S̄(x₀; r) denote the closure of the open sphere S(x₀; r) given by Eq. (6.1.12). Then S̄(x₀; r) = K(x₀; r), the closed sphere, where

K(x₀; r) = {x ∈ X: ‖x − x₀‖ ≤ r}.    (6.1.14)

Proof. By Exercise 5.4.31 we know that S̄(x₀; r) ⊂ K(x₀; r). Thus, we need only show that K(x₀; r) ⊂ S̄(x₀; r). It is clearly sufficient to show that {x ∈ X: ‖x − x₀‖ = r} ⊂ S̄(x₀; r). To do so, let x be such that ‖x − x₀‖ = r, and let 0 < ε < 1. Let y = εx₀ + (1 − ε)x. Then y − x₀ = (1 − ε)(x − x₀). Thus, ‖y − x₀‖ = |1 − ε| ‖x − x₀‖ < r, and so y ∈ S(x₀; r). Also, y − x = ε(x₀ − x). Therefore, ‖y − x‖ = εr. This means that x ∈ S̄(x₀; r), which completes the proof. ■

Thus, in a normed linear space we may call S̄(x₀; r) the closed sphere given by Eq. (6.1.14).
When regarded as a function from X into R, a norm has the following important property.

6.1.15. Theorem. Let {X; ‖·‖} be a normed linear space. Then ‖·‖ is a continuous mapping of X into R.

Proof. We view ‖·‖ as a mapping from the metric space {X; ρ}, ρ(x, y) = ‖x − y‖, into the real numbers with the usual metric for R. Thus, for given ε > 0, we wish to show that there is a δ > 0 such that ‖x − y‖ < δ implies |‖x‖ − ‖y‖| < ε. Now let z = x − y. Then x = z + y and so ‖x‖ ≤ ‖z‖ + ‖y‖. This implies that ‖x‖ − ‖y‖ ≤ ‖z‖. Similarly, y = x − z, and so ‖y‖ ≤ ‖x‖ + ‖−z‖ = ‖x‖ + ‖z‖. Thus, ‖y‖ − ‖x‖ ≤ ‖z‖. It now follows that |‖x‖ − ‖y‖| ≤ ‖z‖ = ‖x − y‖. Letting δ = ε, the desired result follows. ■

In this chapter we will not always require that a particular normed linear space be a Banach space. Nonetheless, many important results of analysis require the completeness property. This is also true in applications. For example, in the solution of various types of equations (such as non-linear differential equations, integral equations, etc.) or in optimization problems or in non-linear feedback problems or in approximation theory, as well as many other areas of applications, we frequently obtain our desired solution in the form of a sequence generated by means of some iterative scheme. In such a sequence, each succeeding member is closer to the desired solution than its predecessor. Now even though the precise solution to which a sequence of this type may converge is unknown, it is usually imperative that the sequence converge to an element in that space which happens to be the setting of the particular problem in question.

6.2. LINEAR SUBSPACES

We now turn our attention briefly to linear subspaces of a normed linear space. We first recall Definition 3.2.1. A non-empty subset Y of a vector space X is called a linear subspace in X if (i) x + y ∈ Y whenever x and y are in Y, and (ii) αx ∈ Y whenever α ∈ F and x ∈ Y. Next, consider a normed linear space {X; ‖·‖}, let Y be a linear subspace in X, and let ‖·‖₁ denote the restriction of ‖·‖ to Y; i.e.,

‖x‖₁ = ‖x‖ for all x ∈ Y.

Then it is easy to show that {Y; ‖·‖₁} is also a normed linear space. We


call ‖·‖₁ the norm induced by ‖·‖ on Y, and we say that {Y; ‖·‖₁} is a normed linear subspace of {X; ‖·‖}, or simply a linear subspace of X. Since there is usually no room for confusion, we drop the subscript and simply denote this subspace by {Y; ‖·‖}. In fact, when it is clear which norm is being used, we usually refer to the normed linear spaces X and Y.
Our first result is an immediate consequence of Theorem 5.5.33.

6.2.1. Theorem. Let X be a Banach space, and let Y be a linear subspace of X. Then Y is a Banach space if and only if Y is closed.
In the following we give an example of a linear subspace of a Banach space which is not closed.

6.2.2. Example. Let X be the Banach space l₁ of Example 6.1.6, and let Y be the space of finitely non-zero sequences given in Example 3.1.14. It is easily shown that Y is a linear subspace of X. To show that Y is not closed, consider the sequence {yₙ} in Y defined by

y₁ = (1, 0, 0, ...),
y₂ = (1, 1/2, 0, 0, ...),
y₃ = (1, 1/2, 1/4, 0, 0, ...),
...
yₙ = (1, 1/2, ..., 1/2^{n−1}, 0, 0, ...).

This sequence converges to the point x = (1, 1/2, ..., 1/2ⁿ, 1/2^{n+1}, ...) ∈ X. Since x ∉ Y, it follows from part (iii) of Theorem 5.5.8 that Y is not a closed subset of X. ■
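The convergence asserted in Example 6.2.2 can be made concrete: ‖x − yₙ‖₁ is the tail of a geometric series, namely Σ_{k≥n} 2^{−k} = 2^{1−n}. The sketch below truncates that series at an arbitrarily chosen index for the computation.

```python
# l1 distance between x = (1, 1/2, 1/4, ...) and y_n, which agrees with x
# in its first n entries and is zero afterwards; the exact value is 2**(1-n).
def tail_distance(n, terms=60):
    # truncate the geometric tail after `terms` entries
    return sum(2.0 ** (-k) for k in range(n, terms))

dists = [tail_distance(n) for n in range(1, 6)]
print(dists)   # 1.0, 0.5, 0.25, 0.125, 0.0625 up to the truncation error
```

So yₙ → x in l₁, although every yₙ lies in Y and x does not.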

Next, we prove:

6.2.3. Theorem. Let X be a Banach space, let Y be a linear subspace of X, and let Ȳ denote the closure of Y. Then Ȳ is a closed linear subspace of X.

Proof. Since Ȳ is closed, we only have to show that Ȳ is a linear subspace. Let x, y ∈ Ȳ, and let ε > 0. Then there exist elements x′, y′ ∈ Y such that ‖x − x′‖ < ε and ‖y − y′‖ < ε. Now αx′ + βy′ ∈ Y. Hence, for arbitrary α, β ∈ F,

‖(αx + βy) − (αx′ + βy′)‖ = ‖α(x − x′) + β(y − y′)‖ ≤ |α| ‖x − x′‖ + |β| ‖y − y′‖ < (|α| + |β|)ε.

Since ε > 0 is arbitrary, this implies that αx + βy is an adherent point of Y; i.e., αx + βy ∈ Ȳ. This completes the proof of the theorem. ■


6.2.4. Theorem. Let X be a normed linear space, and let Y
subspace of .X If Y is an open subset of ,X then Y = .X

be a linear

Chapter 6 I Normed Spaces and Inner Product Spaces

350

Proof. Let x ∈ X. We wish to show that x ∈ Y. Since 0 ∈ Y, we may assume that x ≠ 0. Since Y is open and 0 ∈ Y, there is some λ > 0 such that the sphere S(0; λ) ⊂ Y. Let z = (λ/(2‖x‖))x. Then ‖z‖ < λ, and so z ∈ Y. Since Y is a linear subspace, it follows that (2‖x‖/λ)z = x ∈ Y. ■

6.3. INFINITE SERIES

Having defined a norm on a linear space, we are in a position to consider the concept of infinite series in a meaningful way. Throughout this section we refer to a normed linear space {X; ‖·‖} simply as X.
6.3.1. Definition. Let {xₙ} be a sequence of elements in X. For each positive integer m, let

y_m = x₁ + ... + x_m.

We call {y_m} the sequence of partial sums of {xₙ}. If the sequence {y_m} converges to a limit y ∈ X, we say the infinite series

x₁ + ... + xₙ + ... = Σ_{n=1}^{∞} xₙ

converges, and we write

y = Σ_{n=1}^{∞} xₙ.

We say the infinite series Σ_{n=1}^{∞} xₙ diverges if the sequence {y_m} diverges.

The following result yields sufficient conditions for an infinite series to converge.

6.3.2. Theorem. Let X be a Banach space, and let {xₙ} be a sequence in X. If Σ_{n=1}^{∞} ‖xₙ‖ < ∞, then

(i) the infinite series Σ_{n=1}^{∞} xₙ converges; and
(ii) ‖ Σ_{n=1}^{∞} xₙ ‖ ≤ Σ_{n=1}^{∞} ‖xₙ‖.

Proof. To prove the first part, let y_m = x₁ + ... + x_m. If n > m, then

y_n − y_m = x_{m+1} + ... + x_n.

Since Σ_{n=1}^{∞} ‖xₙ‖ is a convergent infinite series of real numbers, the sequence of partial sums s_m = ‖x₁‖ + ... + ‖x_m‖ is Cauchy. Hence, given ε > 0, there is a positive integer N such that n > m > N implies |s_n − s_m| ≤ ε. But |s_n − s_m| ≥ ‖y_n − y_m‖, and so {y_m} is a Cauchy sequence. Since X is complete, {y_m} is convergent and conclusion (i) follows.
To prove the second part, let y_m = x₁ + ... + x_m, and let y = lim y_m = Σ_{n=1}^{∞} xₙ. Then for each positive integer m we have y = y − y_m + y_m and

‖y‖ ≤ ‖y − y_m‖ + ‖y_m‖ ≤ ‖y − y_m‖ + Σ_{i=1}^{m} ‖x_i‖.

Taking the limit as m → ∞, we have ‖ Σ_{i=1}^{∞} x_i ‖ ≤ Σ_{i=1}^{∞} ‖x_i‖. ■
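Theorem 6.3.2 can be illustrated numerically in the Banach space (R², ‖·‖₁); the terms xₙ below are an arbitrary example satisfying Σ‖xₙ‖ < ∞.

```python
# Terms x_n = ((-1)**n / n**2, 1 / 2**n): both coordinate series converge
# absolutely, so sum ||x_n||_1 < infinity and Theorem 6.3.2 applies.
def norm1(v):
    return abs(v[0]) + abs(v[1])

y = (0.0, 0.0)      # running partial sum y_m
norm_sum = 0.0      # running sum of ||x_n||
for n in range(1, 200):
    x_n = ((-1.0) ** n / n ** 2, 1.0 / 2.0 ** n)
    y = (y[0] + x_n[0], y[1] + x_n[1])
    norm_sum += norm1(x_n)
print(norm1(y) <= norm_sum)   # conclusion (ii): prints True
```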

6.4. CONVEX SETS

In the present section we consider the concepts of convexity and cones, which arise naturally in many applications. Throughout this section, X is a real normed linear space.

Let x and y be two elements of X. We call the set x̄y, defined by

x̄y = {z ∈ X: z = αx + (1 − α)y for all α ∈ R such that 0 ≤ α ≤ 1},

the line segment joining x and y. Convex sets are now characterized as follows.

6.4.1. Definition. Let Y be a subset of X. Then Y is said to be convex if Y contains the line segment x̄y whenever x and y are two arbitrary points in Y. A convex set is called a convex body if it contains at least one interior point, i.e., if it completely contains some sphere.

In Figure A we depict a line segment x̄y, a convex set, and a non-convex set in R².

6.4.2. Figure A. A line segment xy, a convex set, and a non-convex set.


Note that an equivalent statement for Y to be convex is that if x, y ∈ Y, then αx + βy ∈ Y whenever α and β are positive constants such that α + β = 1.
We cite a few examples.
We cite a few examples.
6.4.3. Example. The empty set is convex. Also, a set consisting of one point is convex. In R³, a cube and a sphere are convex bodies, while a plane and a line segment are convex sets but not convex bodies. Any linear subspace of X is a convex set. Also, any linear variety of X (see Definition 3.2.17) is a convex set. ■

6.4.4. Example. Let Y and Z be convex sets in X, let α, β ∈ R, and let αY = {x ∈ X: x = αy, y ∈ Y}. Then the set αY + βZ is a convex set in X. ■

6.4.5. Exercise. Prove the assertions made in Examples 6.4.3 and 6.4.4.

6.4.6. Theorem. Let Y be a convex set in X, and let α, β ∈ R be positive scalars. Then (α + β)Y = αY + βY.

Proof. Regardless of convexity, if x ∈ (α + β)Y, then x = (α + β)y = αy + βy ∈ αY + βY, and thus (α + β)Y ⊂ αY + βY. Now let Y be convex, and let x = αy + βz, where y, z ∈ Y. Then

x = (α + β)[ (α/(α + β))y + (β/(α + β))z ] ∈ (α + β)Y,

because α/(α + β) + β/(α + β) = 1 and Y is convex. Therefore, x ∈ (α + β)Y, and thus αY + βY ⊂ (α + β)Y. This completes the proof. ■

We leave the proof of the next result as an exercise.

6.4.7. Theorem. Let C be an arbitrary collection of convex sets. Then the intersection ∩_{Y∈C} Y is also a convex set.

6.4.8. Exercise. Prove Theorem 6.4.7.

The preceding result gives rise to the following concept.


6.4.9. Definition. eL t Y be any set in .X The convex bull of ,Y also called
the convex cover of ,Y denoted by Y e , is the intersection of all convex sets
which contain .Y


We note that the convex hull of Y is the smallest convex set which contains Y. Examples of convex covers of sets in R² are depicted in Figure B.

6.4.10. Figure B. Convex hulls.

6.4.11. Theorem. Let Y be any set in X. The convex hull of Y is the set of points expressible as α₁y₁ + α₂y₂ + ... + αₙyₙ, where y₁, ..., yₙ ∈ Y, where α_i > 0, i = 1, ..., n, where Σ_{i=1}^{n} α_i = 1, and where n is not fixed.

Proof. If Z is the set of elements expressible as described above, then clearly Z is convex. Moreover, Y ⊂ Z, and hence Y_c ⊂ Z. To show that Z ⊂ Y_c, we show that Z is contained in every convex set which contains Y. We do so by induction on the number of elements of Y that appear in the representation of an element of Z. Let U be a convex set with U ⊃ Y. If z = α₁z₁ ∈ Z for which n = 1, then α₁ = 1 and z ∈ U. Now assume that an element of Z is in U if it is represented in terms of n − 1 elements of Y. Let z = α₁z₁ + ... + αₙzₙ be in Z, let β = α₁ + ... + α_{n−1}, let β_i = α_i/β, i = 1, ..., n − 1, and let u = β₁z₁ + ... + β_{n−1}z_{n−1}. Then u ∈ U, by the induction hypothesis. But zₙ ∈ U, αₙ = 1 − β, and z = βu + (1 − β)zₙ ∈ U, since U is convex. This completes the induction, and thus Z ⊂ U, from which it follows that Z ⊂ Y_c. ■
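Theorem 6.4.11 can be illustrated with a finite set in R²: every point expressible as a convex combination of the corners of the unit square lies in the square, which is the convex hull of those corners. The weights below are arbitrary sample choices.

```python
# Convex combinations sum_i alpha_i * y_i with alpha_i > 0, sum alpha_i = 1,
# formed from the corners of the unit square; all results lie in the square.
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

def combo(weights):
    s = sum(weights)   # normalize so the coefficients sum to 1
    return tuple(sum(w / s * c[i] for w, c in zip(weights, corners)) for i in (0, 1))

points = [combo(w) for w in [(1, 1, 1, 1), (3, 1, 1, 1), (1, 5, 2, 2), (2, 2, 5, 1)]]
inside = all(0.0 <= u <= 1.0 and 0.0 <= v <= 1.0 for u, v in points)
print(inside)   # prints: True
```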

6.4.12. Theorem. Let Y be a convex set in X. Then the closure Ȳ of Y is also a convex set.

6.4.13. Exercise. Prove Theorem 6.4.12.

Since the intersection of any number of closed sets is always closed, it


follows from Theorem 6.4.7 that the intersection of an arbitrary number of
closed convex sets is also a closed convex set.
We now consider some interesting aspects of norms in terms of convex
sets.


6.4.14. Theorem. Any sphere in X is a convex set.

Proof. We consider, without loss of generality, the unit sphere

Y = {x ∈ X: ‖x‖ < 1}.

If x₀, y₀ ∈ Y, then ‖x₀‖ < 1 and ‖y₀‖ < 1. Now if α > 0 and β > 0, where α + β = 1, then

‖αx₀ + βy₀‖ ≤ ‖αx₀‖ + ‖βy₀‖ = α‖x₀‖ + β‖y₀‖ < α + β = 1,

and thus αx₀ + βy₀ ∈ Y. ■

In view of Theorems 6.1.13, 6.4.12, and 6.4.14, it follows that a closed sphere S̄(x₀; r) is also convex. The following example, cast in R², is rather instructive.

6.4.15. Example. On R² we define the norm ‖·‖_p of Example 6.1.5. A moment's reflection reveals that in case of ‖·‖₂, the unit sphere is a circle of radius 1; when the norm is ‖·‖_∞, the unit sphere is a square with vertices (1, 1), (1, −1), (−1, 1), (−1, −1); if the norm is ‖·‖₁, the unit sphere is the square with vertices (0, 1), (1, 0), (−1, 0), (0, −1). If for the unit sphere corresponding to ‖·‖_p we let p increase from 1 to ∞, then this sphere will deform in a continuous manner from the square corresponding to ‖·‖₁ to the square corresponding to ‖·‖_∞. This is depicted in Figure C. We note that in all cases the unit sphere results in a convex set.
For the case of the real-valued function

‖x‖ = [|ξ₁|^p + |ξ₂|^p]^{1/p},  0 < p < 1,    (6.4.16)

the set determined by ‖x‖ ≤ 1 results in a set which is not convex. In particular, if p = 2/3, the set determined by ‖x‖ ≤ 1 yields the boundary and the interior of an asteroid, as shown in Figure C. The reason for the non-convexity of this set can be found in the fact that the function (6.4.16) does not represent a norm. In particular, it can be shown that (6.4.16) does not satisfy the triangle inequality. ■

6.4.17. Figure C. Unit spheres for Example 6.4.15.
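The final claim of Example 6.4.15 is easy to verify numerically: for p = 2/3 the function (6.4.16) violates the triangle inequality already at x = (1, 0) and y = (0, 1).

```python
# For p = 2/3, f(x + y) = 2**(3/2) > 2 = f(x) + f(y), so (6.4.16) fails
# the triangle inequality and is therefore not a norm.
p = 2.0 / 3.0

def f(x):
    return (abs(x[0]) ** p + abs(x[1]) ** p) ** (1.0 / p)

x, y = (1.0, 0.0), (0.0, 1.0)
lhs = f((x[0] + y[0], x[1] + y[1]))   # f(x + y) = 2**(3/2), about 2.83
rhs = f(x) + f(y)                     # f(x) + f(y) = 2
print(lhs > rhs)                      # prints: True
```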


6.4.18. Exercise. Verify the assertions made in Example 6.4.15.

We conclude this section by introducing the notion of a cone.

6.4.19. Definition. A set Y in X is called a cone with vertex at the origin if y ∈ Y implies that αy ∈ Y for all α > 0. If Y is a cone with vertex at the origin, then the set x₀ + Y, x₀ ∈ X, is called a cone with vertex x₀. A convex cone is a set which is both convex and a cone.

In Figure D examples of cones are shown.

6.4.20. Figure D. (a) Cone. (b) Convex cone.

6.5. LINEAR FUNCTIONALS

Throughout this section X is a normed linear space.

We recall that a mapping f from X into F is called a functional on X (see Definition 3.5.1). If f is also linear, i.e., f(αx + βy) = αf(x) + βf(y) for all α, β ∈ F and all x, y ∈ X, then f is called a linear functional (refer to Definition 3.5.1). Recall further that X′, the set of all linear functionals on X, is a linear space over F (see Theorem 3.5.16). Let f ∈ X′ and x ∈ X. In accordance with Eq. (3.5.10), we use the notation

f(x) = ⟨x, f⟩    (6.5.1)

to denote the value of f at x. Alternatively, we sometimes find it convenient to let x′ ∈ X′ denote a linear functional defined on X and write (see Eq. (3.5.11))

x′(x) = ⟨x, x′⟩.    (6.5.2)

Invoking Definition 5.7.1, we note that continuity of a functional at a point x₀ ∈ X means, in the present context, that for every ε > 0 there is a δ > 0 such that |f(x) − f(x₀)| < ε whenever ‖x − x₀‖ < δ. Our first result shows that if a linear functional on X is continuous at one point of X, then it is continuous at all points of X.
6.5.3. Theorem. If a linear functional f on X is continuous at some point x₀ ∈ X, then it is continuous for all x ∈ X.

Proof. If {yₙ} is a sequence in X such that yₙ → x₀, then f(yₙ) → f(x₀), by Theorem 5.7.8. Now let {xₙ} be a sequence in X converging to x ∈ X. Then the sequence {yₙ} in X given by yₙ = xₙ − x + x₀ converges to x₀. By the linearity of f, we have

f(xₙ) − f(x) = f(yₙ) − f(x₀).

Since |f(yₙ) − f(x₀)| → 0 as yₙ → x₀, we have |f(xₙ) − f(x)| → 0 as xₙ → x, and therefore f is continuous at x ∈ X. Since x is arbitrary, the proof of the theorem is complete. ■

It is clear that if f is a linear functional and if f(x) ≠ 0 for some x ∈ X, then the range of f is all of F; i.e., R(f) = F.
For linear functionals we define boundedness as follows.

6.5.4. Definition. A linear functionalf on X is said to be bounded if there


exists a real constant M > 0 such that
for all x

.X

If(x ) I < Mil x II


Iff is not bounded, then it is said to be unbounded.

The following theorem shows that continuity and boundedness of linear functionals are equivalent.

6.5.5. Theorem. A linear functional f on a normed linear space X is bounded if and only if it is continuous.

Proof. Assume that f is bounded, and let M be such that |f(x)| ≤ M||x|| for all x ∈ X. If x_n → 0, then |f(x_n)| ≤ M||x_n|| → 0. Hence, f is continuous at x = 0. From Theorem 6.5.3 it follows that f is continuous for all x ∈ X.

Conversely, assume that f is continuous at x = 0 and hence at any x ∈ X. There is a δ > 0 such that |f(x)| < 1 whenever ||x|| ≤ δ. Now for any x ≠ 0 we have ||(δx)/||x|||| = δ, and thus

    |f(x)| = (||x||/δ) |f(δx/||x||)| ≤ ||x||/δ.

If we let M = 1/δ, then |f(x)| ≤ M||x||, and f is bounded. ■

We will see later, in Example 6.5.17, that there may exist linear functionals on a normed linear space which are unbounded. The class of linear functionals which are bounded has some interesting properties.

6.5.6. Theorem. Let X' be the vector space of all linear functionals on X, and let X* denote the family of all bounded linear functionals on X. Define the function ||·||: X* → R by

    ||f|| = sup_{x≠0} |f(x)| / ||x||    for f ∈ X*.                (6.5.7)

Then
(i) X* is a linear subspace of X';
(ii) the function ||·|| defined in Eq. (6.5.7) is a norm on X*; and
(iii) the normed space {X*; ||·||} is complete.

Proof. The proof of part (i) is straightforward and is left as an exercise.

To prove part (ii), note that if f ≠ 0, then ||f|| > 0, and if f = 0, then ||f|| = 0. Also, since

    sup_{x≠0} |αf(x)| / ||x|| = |α| sup_{x≠0} |f(x)| / ||x||,

it follows that ||αf|| = |α| ||f||. Finally,

    ||f_1 + f_2|| = sup_{x≠0} |f_1(x) + f_2(x)| / ||x||
                 ≤ sup_{x≠0} {|f_1(x)| + |f_2(x)|} / ||x||
                 ≤ sup_{x≠0} |f_1(x)| / ||x|| + sup_{x≠0} |f_2(x)| / ||x||
                 = ||f_1|| + ||f_2||.

Hence, ||·|| satisfies the axioms of a norm.


To prove part (iii), let {x_n'} ⊂ X* be a Cauchy sequence. Then ||x_m' − x_n'|| → 0 as m, n → ∞. If we evaluate this sequence at any x ∈ X, then {x_n'(x)} is a Cauchy sequence of scalars, because |x_m'(x) − x_n'(x)| ≤ ||x_m' − x_n'|| ||x||. This implies that for each x ∈ X there is a scalar x'(x) such that x_n'(x) → x'(x). We observe that

    x'(αx + βy) = lim_{n→∞} x_n'(αx + βy) = lim_{n→∞} [αx_n'(x) + βx_n'(y)]
                = α lim_{n→∞} x_n'(x) + β lim_{n→∞} x_n'(y) = αx'(x) + βx'(y),

and thus x' is a linear functional. Next we show that x' is bounded. Since {x_n'} is a Cauchy sequence, for ε > 0 there is an M such that |x_m'(x) − x_n'(x)| < ε||x|| for all m, n > M and for all x ∈ X. But x_n'(x) → x'(x), and hence |x'(x) − x_m'(x)| ≤ ε||x|| for all m > M. It now follows that

    |x'(x)| ≤ |x'(x) − x_m'(x)| + |x_m'(x)| ≤ ε||x|| + ||x_m'|| ||x||,

and thus x' is a bounded linear functional. Finally, to show that x_m' → x' ∈ X*, we note that |x'(x) − x_m'(x)| ≤ ε||x|| whenever m > M, from which we have ||x' − x_m'|| ≤ ε whenever m > M. This proves the theorem. ■

6.5.8. Exercise. Prove part (i) of Theorem 6.5.6.

It is especially interesting to note that X* is a Banach space whether X is or is not a Banach space. We are now in a position to make the following definition.

6.5.9. Definition. The set of all bounded linear functionals on a normed space X is called the normed conjugate space of X, or the normed dual of X, or simply the dual of X, and is denoted by X*. For f ∈ X* we call ||f|| defined by Eq. (6.5.7) the norm of f.
The next result states that the norm of a functional can be represented in various equivalent ways.

6.5.10. Theorem. Let f be a bounded linear functional on X, and let ||f|| be the norm of f. Then

(i)   ||f|| = inf{M : |f(x)| ≤ M||x|| for all x ∈ X};
(ii)  ||f|| = sup_{||x||≤1} {|f(x)|}; and
(iii) ||f|| = sup_{||x||=1} {|f(x)|}.

6.5.11. Exercise. Prove Theorem 6.5.10.
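The three characterizations in Theorem 6.5.10 can be checked numerically in a simple finite-dimensional case. The sketch below, assuming nothing beyond Example 6.5.14 (where ||f|| = ||a|| for f(x) = a·x on R² with the Euclidean norm), samples the unit sphere and the unit ball; the vector a and the sample counts are arbitrary illustrative choices.

```python
import math

# Hedged numerical sketch: for f(x) = a1*x1 + a2*x2 on R^2 with the
# Euclidean norm, Example 6.5.14 gives ||f|| = ||a||.  We estimate the
# suprema in Theorem 6.5.10 (ii) and (iii) by sampling; a and the grid
# sizes are arbitrary choices for illustration only.
a = (3.0, -4.0)                      # fixed functional coefficients; ||a|| = 5

def f(x):
    return a[0] * x[0] + a[1] * x[1]

# (iii): sup of |f(x)| over the unit sphere ||x|| = 1, sampled densely.
sup_sphere = max(abs(f((math.cos(t), math.sin(t))))
                 for t in [2 * math.pi * k / 10000 for k in range(10000)])

# (ii): sup over the closed unit ball; by homogeneity it is attained on
# the sphere, so sampling radii r <= 1 gives the same answer.
sup_ball = max(r * sup_sphere for r in [k / 100 for k in range(101)])
```

Both estimates should come out close to ||a|| = 5, in agreement with characterization (i), since no smaller M can satisfy |f(x)| ≤ M||x|| everywhere.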

Let us now consider the norms of some specific linear functionals.

6.5.12. Example. Consider the normed linear space {C[a, b]; ||·||_∞}. The mapping

    f(x) = ∫_a^b x(s) ds

is a linear functional on C[a, b] (cf. Example 3.5.2). The norm of this functional equals (b − a), because

    |f(x)| = |∫_a^b x(s) ds| ≤ (b − a) max_{a≤s≤b} |x(s)|. ■

6.5.13. Example. Consider the space {C[a, b]; ||·||_∞}, let x_0 be a fixed element of C[a, b], and let x be any element of C[a, b]. The mapping

    f(x) = ∫_a^b x(s) x_0(s) ds

is a linear functional on C[a, b] (cf. Example 3.5.2). This functional is bounded, because

    |f(x)| = |∫_a^b x(s) x_0(s) ds| ≤ (∫_a^b |x_0(s)| ds) ||x||_∞.

Since f is bounded and linear, it follows that it is continuous. We leave it to the reader to show that

    ||f|| = ∫_a^b |x_0(s)| ds. ■

6.5.14. Example. Let a = (α_1, ..., α_n) be a fixed element of F^n, and let x = (ξ_1, ..., ξ_n) denote an arbitrary element of F^n. Then if

    f(x) = Σ_{i=1}^n α_i ξ_i,

it follows that f is a linear functional on F^n (cf. Example 3.5.6). Letting ||x|| = (|ξ_1|² + ... + |ξ_n|²)^{1/2}, it follows from the Schwarz inequality (4.9.29) that

    |f(x)| ≤ ||a|| ||x||.                                          (6.5.15)

Thus, f is bounded and continuous. In order to determine the norm of f, we rewrite (6.5.15) as

    |f(x)| / ||x|| ≤ sup_{x≠0} |f(x)| / ||x|| ≤ ||a||,

from which it follows that ||f|| ≤ ||a||. Next, by setting x = a, we have |f(a)| = ||a||². Thus,

    |f(a)| / ||a|| = ||a||.

Therefore, ||f|| = ||a||. ■
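The two halves of the argument in Example 6.5.14 (the Schwarz upper bound, and equality at x = a) can be observed directly. A minimal sketch, assuming an arbitrary vector a in R³ and random sample points:

```python
import math, random

# Hedged numerical check of Example 6.5.14: on R^3 with the Euclidean
# norm and f(x) = sum(a_i * x_i), the Schwarz inequality gives
# |f(x)| <= ||a|| ||x||, and x = a attains equality, so ||f|| = ||a||.
# The vector a and the random samples are arbitrary illustrative choices.
a = [1.0, -2.0, 2.0]                       # ||a|| = 3

def f(x):
    return sum(ai * xi for ai, xi in zip(a, x))

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

random.seed(0)
ratios = []
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in a]
    if norm(x) > 0:
        ratios.append(abs(f(x)) / norm(x))

best = max(ratios)               # sampled lower estimate of ||f||
attained = abs(f(a)) / norm(a)   # equals ||a|| exactly
```

No sampled quotient exceeds ||a|| = 3, while the choice x = a attains it, mirroring the two inequalities in the text.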

6.5.16. Example. Analogous to the above example, let a = (α_1, α_2, ...) be a fixed element of the Banach space l_2 (see Example 6.1.6), and let x = (ξ_1, ξ_2, ...) be an arbitrary element of l_2. It follows that if

    f(x) = Σ_{i=1}^∞ α_i ξ_i,

then f is a linear functional on l_2. We can show that f is bounded by observing that

    |f(x)| = |Σ_i α_i ξ_i| ≤ Σ_i |α_i| |ξ_i| ≤ ||a|| ||x||,

which follows from Hölder's inequality for infinite sums (5.2.4). Thus, f is bounded and, hence, continuous. In a manner similar to that of Example 6.5.14, we can show that ||f|| = ||a||. ■
We conclude this section with an example of an unbounded linear functional.

6.5.17. Example. Consider the space X of finitely non-zero sequences x = (ξ_1, ξ_2, ..., ξ_n, 0, 0, ...) (cf. Example 3.1.14). Define ||·||: X → R as ||x|| = max_i |ξ_i|. It is easy to show that {X; ||·||} is a normed linear space. Furthermore, it is readily verified that the mapping

is an unbounded linear functional on X.

6.5.18. Exercise. Verify the assertions made in Examples 6.5.12, 6.5.13, 6.5.14, 6.5.16, and 6.5.17.
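The specific mapping of Example 6.5.17 did not survive in this copy, but one standard choice of unbounded functional on this space (an assumption here, not necessarily the text's own formula) is f(x) = Σ_i i·ξ_i. A minimal sketch of why no bound |f(x)| ≤ M||x|| can hold:

```python
# Hedged sketch for Example 6.5.17.  On the space of finitely non-zero
# sequences with ||x|| = max|xi_i|, one standard choice of unbounded
# linear functional (an assumption; the book's own formula is illegible
# in this copy) is f(x) = sum over i of i * xi_i.  On the unit sequences
# e_n we get ||e_n|| = 1 while |f(e_n)| = n, so the quotients are unbounded.

def f(x):
    # x is a finite list standing for a finitely non-zero sequence
    return sum((i + 1) * xi for i, xi in enumerate(x))

def sup_norm(x):
    return max(abs(xi) for xi in x)

def e(n):
    # n-th unit sequence, n = 1, 2, ...
    return [0.0] * (n - 1) + [1.0]

quotients = [abs(f(e(n))) / sup_norm(e(n)) for n in range(1, 6)]
```

The quotients |f(e_n)|/||e_n|| grow without bound, so by Definition 6.5.4 this f is unbounded, and by Theorem 6.5.5 it is discontinuous.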

6.6. FINITE-DIMENSIONAL SPACES

We now briefly turn our attention to finite-dimensional vector spaces. Throughout this section X denotes a normed linear space.

We recall that if {x_1, ..., x_n} is a basis for a linear space X, then for each x ∈ X there is a unique set of scalars {ξ_1, ..., ξ_n} in F, called the coordinates of x with respect to this basis (see Definition 3.3.36). We now prove the following result.
6.6.1. Theorem. Let X be a finite-dimensional normed linear space, and let {x_1, ..., x_n} be a basis for X. For each x ∈ X, let the coordinates of x with respect to this basis be denoted by (ξ_1, ..., ξ_n) ∈ F^n. For i = 1, ..., n, define the linear functionals f_i: X → F by f_i(x) = ξ_i. Then each f_i is a continuous linear functional.

Proof. The proof that f_i is linear is straightforward. To show that f_i is a bounded linear functional, we let

    S = {a = (α_1, ..., α_n) ∈ F^n : |α_1| + |α_2| + ... + |α_n| = 1}.

It is left as an exercise to show that S is a compact set in the metric space {F^n; ρ_1} (see Example 5.3.1). Now let us define the function g: S → R by

    g(a) = ||α_1 x_1 + ... + α_n x_n||.

The reader can readily verify that g is a continuous function on S. Now let m = inf{g(a): a ∈ S}. It follows from Theorem 5.7.15 that there is an a_0 ∈ S such that g(a_0) = m. Note that m ≠ 0, since {x_1, ..., x_n} is a basis for X and a_0 ≠ 0. Hence m > 0. It now follows that

    ||α_1 x_1 + ... + α_n x_n|| ≥ m

for every a = (α_1, ..., α_n) ∈ S. Since |α_1| + ... + |α_n| = 1 for a ∈ S, we see that

    ||α_1 x_1 + ... + α_n x_n|| ≥ m(|α_1| + ... + |α_n|)           (6.6.2)

for all a ∈ S.
Next, for arbitrary x ∈ X with coordinates (ξ_1, ..., ξ_n) ∈ F^n, we let β = |ξ_1| + ... + |ξ_n|. First, we suppose that β > 0. Then

    ||x|| = ||ξ_1 x_1 + ... + ξ_n x_n|| = β ||(ξ_1/β) x_1 + ... + (ξ_n/β) x_n||
          ≥ βm(|ξ_1/β| + ... + |ξ_n/β|) = m(|ξ_1| + ... + |ξ_n|),

where inequality (6.6.2) has been used. Therefore, if β ≠ 0, we have

    (|ξ_1| + ... + |ξ_n|) ≤ (1/m) ||x||.                           (6.6.3)

Noting that inequality (6.6.3) is also true if β = 0, we conclude that this inequality is true for all x ∈ X. Since |f_i(x)| = |ξ_i| ≤ |ξ_1| + ... + |ξ_n|, i = 1, ..., n, we see that |f_i(x)| ≤ (1/m) ||x|| for any x ∈ X. Hence, f_i is a bounded linear functional and, consequently, it is continuous. ■
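The constant m in the proof above can be estimated numerically for a concrete basis. The sketch below takes R² with the max norm and the (arbitrary) basis x_1 = (1, 0), x_2 = (1, 1), samples the simplex S, and spot-checks inequality (6.6.3); for this basis the true infimum works out to 1/3.

```python
# Hedged numerical sketch of the constant m in the proof of Theorem 6.6.1.
# In R^2 with the max norm and the arbitrary basis x1 = (1,0), x2 = (1,1),
# we estimate m = inf{ ||a1*x1 + a2*x2|| : |a1| + |a2| = 1 } by sampling
# the simplex S, then spot-check (6.6.3): |xi1| + |xi2| <= (1/m)||x||.
x1, x2 = (1.0, 0.0), (1.0, 1.0)

def norm_max(v):
    return max(abs(v[0]), abs(v[1]))

def combo(a1, a2):
    return (a1 * x1[0] + a2 * x2[0], a1 * x1[1] + a2 * x2[1])

m = min(norm_max(combo(t, sgn * (1.0 - abs(t))))
        for t in [-1.0 + 2.0 * k / 4000 for k in range(4001)]
        for sgn in (1.0, -1.0))          # expected value: 1/3

# Spot-check (6.6.3) for a few coordinate vectors (xi1, xi2).
checks = []
for xi in [(1.0, 0.0), (-2.0, 5.0), (0.3, -0.7)]:
    x = combo(xi[0], xi[1])
    checks.append(abs(xi[0]) + abs(xi[1]) <= norm_max(x) / m + 1e-9)
```

Because the grid minimum can only overestimate the infimum slightly, the sampled m sits just above 1/3, and the coordinate functionals are bounded by roughly 3||x||, as (6.6.3) predicts.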
6.6.4. Exercise.
Prove that the set S and the function g have the properties
asserted in the proof of Theorem 6.6.1.
The preceding theorem allows us to prove the following important result.
6.6.5. Theorem. Let X be a finite-dimensional normed linear space. Then X is complete.

Proof. Let {x_1, ..., x_n} be a basis for X, let {y_k} be a Cauchy sequence in X, and for each k let the coordinates of y_k with respect to {x_1, ..., x_n} be given by (η_{k1}, ..., η_{kn}). It follows from Theorem 6.6.1 that there is a constant M such that |η_{kj} − η_{ij}| ≤ M||y_k − y_i|| for j = 1, ..., n and all i, k = 1, 2, .... Hence, each sequence {η_{kj}} is a Cauchy sequence in F, i.e., in R or C, and is therefore convergent. Let η_{0j} = lim_{k→∞} η_{kj} for j = 1, ..., n. If we let

    y_0 = η_{01} x_1 + ... + η_{0n} x_n,

it follows that {y_k} converges to y_0. This proves that X is complete. ■
The next result follows from Theorems 6.6.5 and 6.2.1.

6.6.6. Theorem. Let X be a normed linear space, and let Y be a finite-dimensional linear subspace of X. Then (i) Y is complete, and (ii) Y is closed.

6.6.7. Exercise. Prove Theorem 6.6.6.

Our next result is an immediate consequence of Theorem 6.6.1.

6.6.8. Theorem. Let X be a finite-dimensional normed linear space, and let f be a linear functional on X. Then f is continuous.

6.6.9. Exercise. Prove Theorem 6.6.8.

We recall from Definition 5.6.30 and Theorem 5.6.31 that a subset Y of a metric space X is relatively compact if every sequence of elements in Y contains a subsequence which converges to an element in X. This property can be useful in characterizing finite-dimensional subspaces in an arbitrary normed linear space, as we shall see in the next theorem. Note also that in view of Definition 5.1.19 a subset Y in a normed linear space X is bounded if and only if there is a λ > 0 such that ||y|| ≤ λ for all y ∈ Y.

6.6.10. Theorem. Let X be a normed linear space, and let Y be a linear subspace of X. Then Y is finite dimensional if and only if every bounded subset of Y is relatively compact.
Proof. (Necessity) Assume that Y is finite dimensional, and let {x_1, ..., x_n} be a basis for Y. Then for any y ∈ Y there is a unique set {η_1, ..., η_n} such that y = η_1 x_1 + ... + η_n x_n. Let A be a bounded subset of Y, and let {y_k} be a sequence in A. Then we can write y_k = η_{1k} x_1 + ... + η_{nk} x_n for k = 1, 2, .... There exists a λ > 0 such that ||y_k|| ≤ λ for all k. Consider |η_{1k}| + ... + |η_{nk}|. We wish to show that this sum is bounded. Suppose that it is not. Then for each positive integer m, we can find a y_{k_m} such that |η_{1k_m}| + ... + |η_{nk_m}| ≜ γ_m > m. Now let y'_{k_m} = (1/γ_m) y_{k_m}. It follows that

    ||y'_{k_m}|| = (1/γ_m) ||y_{k_m}|| ≤ λ/γ_m < λ/m.

Thus, y'_{k_m} → 0 as m → ∞. On the other hand,

    y'_{k_m} = η'_{1k_m} x_1 + ... + η'_{nk_m} x_n,

where η'_{ik_m} = η_{ik_m}/γ_m for i = 1, ..., n. Since |η'_{1k_m}| + ... + |η'_{nk_m}| = 1, the coordinates (η'_{1k_m}, ..., η'_{nk_m}) form a bounded sequence in F^n and as such contain a convergent subsequence. Let (η'_{10}, ..., η'_{n0}) be the limit of such a convergent subsequence, whose indices we denote by k_{m_j}. If we let y'_0 = η'_{10} x_1 + ... + η'_{n0} x_n, then we have

    ||y'_0 − y'_{k_{m_j}}|| ≤ |η'_{10} − η'_{1k_{m_j}}| ||x_1|| + ... + |η'_{n0} − η'_{nk_{m_j}}| ||x_n|| → 0 as m_j → ∞.

Thus, y'_{k_{m_j}} → y'_0. Since y'_{k_{m_j}} → 0, it follows that y'_0 = 0. But this is impossible, because {x_1, ..., x_n} is a linearly independent set. We conclude that the sum |η_{1k}| + ... + |η_{nk}| is bounded. Consequently, there are subsequences {η_{1k_j}}, ..., {η_{nk_j}} which are convergent in F. Let (η_{10}, ..., η_{n0}) be the limit of the convergent subsequence, and let y_0 = η_{10} x_1 + ... + η_{n0} x_n. Then y_{k_j} → y_0. Thus, {y_k} contains a convergent subsequence, and this proves that A is relatively compact.

(Sufficiency) Assume that every bounded subset of Y is relatively compact. Let x_1 ∈ Y be such that ||x_1|| = 1, and let V_1 = V({x_1}) be the linear subspace generated by {x_1} (see Definition 3.3.6). If V_1 = Y, then we are done. If V_1 ≠ Y, let y_2 ∈ Y be such that y_2 ∉ V_1. Let d = inf_{x∈V_1} ||y_2 − x||. Since V_1 is closed by Theorem 6.6.6, we must have d > 0; otherwise y_2 ∈ V_1. For every η > 0 there is an x_0 ∈ V_1 such that d ≤ ||y_2 − x_0|| < d + η. Now let x_2 = (y_2 − x_0)/||y_2 − x_0||. Then x_2 ∉ V_1, ||x_2|| = 1, and for every x ∈ V_1,

    ||x_2 − x|| = ||(y_2 − x_0)/||y_2 − x_0|| − x|| = ||y_2 − x'|| / ||y_2 − x_0|| ≥ d/(d + η) = 1 − η/(d + η),

where x' = x_0 + ||y_2 − x_0|| x ∈ V_1. Since η is arbitrary, we can choose η so that ||x_2 − x|| > 1/2.

Now let V_2 be the linear subspace generated by {x_1, x_2}. If V_2 = Y, we are done. If not, we can proceed in the manner used above to select an x_3 ∉ V_2 with ||x_3|| = 1, ||x_1 − x_3|| > 1/2, and ||x_2 − x_3|| > 1/2. If we continue this process, then either we have V({x_1, ..., x_n}) = Y for some n, or else we obtain an infinite sequence {x_n} such that ||x_n|| = 1 and ||x_n − x_m|| > 1/2 for all m ≠ n. The second alternative is impossible, since {x_n} is a bounded sequence and as such must contain a convergent subsequence. This completes the proof. ■
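The sequence constructed in the sufficiency argument can be seen concretely in an infinite-dimensional space: in l_2 the unit vectors e_n are already 1/2-separated (in fact √2-separated), so the bounded set {e_n} has no convergent subsequence. A minimal sketch with truncated sequences (the truncation length N is an arbitrary choice):

```python
import math

# Hedged numerical illustration of the sufficiency part of Theorem 6.6.10:
# in the infinite-dimensional space l_2 the unit vectors e_n satisfy
# ||e_n|| = 1 and ||e_n - e_m|| = sqrt(2) > 1/2 for n != m, so the bounded
# set {e_n} contains no convergent subsequence.  We work with sequences
# truncated to length N; N is an arbitrary choice.
N = 10

def e(n):
    v = [0.0] * N
    v[n] = 1.0
    return v

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

norms = [dist(e(n), [0.0] * N) for n in range(N)]
gaps = [dist(e(n), e(m)) for n in range(N) for m in range(N) if n != m]
```

Every norm is 1 and every pairwise gap is √2, which is exactly the kind of sequence whose existence the theorem rules out in a finite-dimensional subspace.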
>

IIYz

6.7. GEOMETRIC ASPECTS OF LINEAR FUNCTIONALS

Throughout this section X denotes a real normed linear space. Before giving geometric interpretations of linear functionals we introduce the notions of maximal subspace and hyperplane.

6.7.1. Definition. A linear subspace Y of a linear space X is called maximal if it is not all of X and if there exists no linear subspace Z of X such that Y ≠ Z, Z ≠ X, and Y ⊂ Z.

Recall that if Y is a linear subspace of X and if z ∈ X, then we call the set Z = z + Y a linear variety (see Definition 3.2.17). In this case we also say that Z is a translation of Y.

6.7.2. Definition. A hyperplane Y in a linear space X is a maximal linear variety resulting from the translation of a maximal linear subspace.

If a hyperplane Y contains the origin, then it is simply a maximal linear subspace, and all hyperplanes Z obtained by translating Y are said to be parallel to Y.

The following theorem provides us with an important characterization of hyperplanes in terms of linear functionals.

6.7.3. Theorem. If f ≠ 0 is a linear functional on X and if α is any fixed scalar, then the set Y = {x: f(x) = α} is a hyperplane. It contains the origin 0 if and only if α = 0. Conversely, if Y is a hyperplane in a linear space X, then there is a linear functional f on X and a fixed scalar α such that Y = {x: f(x) = α}.

Proof. Consider the first part. Since f ≠ 0, there is an x_1 such that f(x_1) = β ≠ 0. If x_0 = (α/β)x_1, then f(x_0) = (α/β)f(x_1) = α, and thus x_0 ∈ Y. Let Y_0 = −x_0 + Y. It is readily verified that Y_0 = {x: f(x) = 0} and that Y_0 is a linear subspace, so that Y is a linear variety. Since Y_0 ≠ X, we can write every element of X as the sum of an element of Y_0 and a multiple of y, where y ∈ X − Y_0. For if x ∈ X, if y is any element in X − Y_0 such that f(y) ≠ 0, and if

    z = x − (f(x)/f(y)) y,

then f(z) = 0, and thus x has the required form. Now assume that Y_1 is a linear subspace of X for which Y_0 ⊂ Y_1 and Y_1 ≠ Y_0. We can choose y ∈ Y_1 − Y_0, and the above argument shows that X ⊂ Y_1, so that Y_1 = X. This shows that Y_0 is maximal and that Y is a hyperplane.

The assertion that Y contains 0 if and only if α = 0 follows readily.

Consider now the last part of the theorem. If Y is a hyperplane in X, then Y is the translation of a linear subspace Z in X; i.e., Y = x_0 + Z, with x_0 fixed. If x_0 ∉ Z, and if V(Y + x_0) denotes the linear subspace generated by the set Y + x_0, then V(Y + x_0) = X. If for x = αx_0 + z, z ∈ Z, we define f(x) = α, then Y = {x: f(x) = 1}. On the other hand, if x_0 ∈ Z, then we take x_1 ∉ Z, X = V(Z + x_1), Y = Z, and define for x = αx_1 + z, z ∈ Z, f(x) = α. Then Y = {x: f(x) = 0}. This concludes the proof of the theorem. ■
In the proof of the above theorem we established also the following result:

6.7.4. Theorem. Let f ≠ 0 be a linear functional on the linear space X, and let Z = {x: f(x) = 0}. If x_0 ∈ X − Z, then every x ∈ X can be expressed as

    x = (f(x)/f(x_0)) x_0 + z,    z ∈ Z.

The next result shows that it is possible to establish a unique correspondence between hyperplanes and linear functionals. This result follows readily from Theorem 6.7.3.

6.7.5. Theorem. Let Y be a hyperplane in a linear space X. If Y does not contain the origin, there is a unique linear functional f on X such that Y = {x: f(x) = 1}.

6.7.6. Exercise. Prove Theorem 6.7.5.

6.7.7. Theorem. Let Y be a maximal linear subspace in a Banach space X. Then either Ȳ = Y or Ȳ = X; i.e., either Y is closed or else Y is dense in X.

Proof. Since Y is a linear subspace, Ȳ is a linear subspace of X by Theorem 6.2.3. Now Y ⊂ Ȳ. Hence, if Y ≠ Ȳ we must have Ȳ = X, since Y is a maximal linear subspace. ■

In the next result we will show that Y is closed if and only if the functional f associated with Y is bounded (i.e., continuous). Thus, corresponding to any hyperplane in a normed linear space there is a functional that is bounded whenever the hyperplane is closed and vice versa.

6.7.8. Theorem. Let f be a non-zero linear functional on X, and let Y = {x: f(x) = α} be a hyperplane in X. Then Y is closed for every α if and only if f is bounded.

Proof. Assume first that f is bounded; then it is continuous. If {x_n} is a sequence in Y which converges to x ∈ X, then f(x_n) → f(x) = α, so that x ∈ Y, and thus Y is closed.

Conversely, let Z = {x: f(x) = 0} be closed. In view of Theorem 6.7.4, there exists an x_0 ∈ X − Z such that X = [x_0] + Z. Now let {x_n} be a sequence in X such that x_n → x ∈ X. Then it is possible to express each x_n and x as x_n = c_n x_0 + z_n and x = c x_0 + z, where z_n, z ∈ Z. Let d = inf_{z̃∈Z} ||x_0 − z̃||. Since Z is closed, d > 0. Now

    ||x − x_n|| = ||(c − c_n) x_0 − (z_n − z)||
               ≥ inf_{z̃∈Z} ||(c − c_n) x_0 − z̃|| = |c − c_n| d.

Thus c_n → c. Moreover, since f(x_n) = c_n f(x_0) + f(z_n) = c_n f(x_0) → c f(x_0) = f(x), it follows that f is continuous on X, and hence bounded. ■
We now introduce the concept of a half-space.

6.7.9. Definition. Let f be a non-zero linear functional on X, and let α ∈ R. Let Y be the hyperplane given by Y = {x: f(x) = α}. Let Y_1, Y_2, Y_3, and Y_4 be subsets of X defined by Y_1 = {x: f(x) < α}, Y_2 = {x: f(x) ≤ α}, Y_3 = {x: f(x) > α}, and Y_4 = {x: f(x) ≥ α}. Then each of the sets Y_1, Y_2, Y_3, and Y_4 is called a half space determined by Y. In addition, let Z_1 and Z_2 be subsets of X. We say that Y separates Z_1 and Z_2 if either (i) Z_1 ⊂ Y_2 and Z_2 ⊂ Y_4 or (ii) Z_1 ⊂ Y_4 and Z_2 ⊂ Y_2.

6.7.10. Exercise. Show that each of the sets Y_1, Y_2, Y_3, Y_4 in the preceding definition is convex. Also, show that if in the above definition f is continuous, then Y_1 and Y_3 are open sets in X, and Y_2 and Y_4 are closed sets in X.
In order to demonstrate some of the notions introduced, we conclude this section with the following example.

6.7.11. Example. Let X = R², and let x = (ξ_1, ξ_2) ∈ X. Let y = (η_1, η_2) be any fixed vector in X, and define the linear functional f on X as

    f(x) = η_1 ξ_1 + η_2 ξ_2.

The set

    Y_0 = {x ∈ R²: f(x) = η_1 ξ_1 + η_2 ξ_2 = 0}

is a line through the origin of R² which is normal to the vector y. If x_1 ∈ X, the hyperplane

    Y = x_1 + Y_0

6.7.12. Figure E. Half spaces.

is a linear variety which is parallel to Y_0. The hyperplane Y divides R² into two open half-spaces Z_1 and Z_2, as depicted in Figure E. It should be noted that x ∈ X can now be written as x = z + βy, z ∈ Y, where x ∈ Z_1 if β > 0 and x ∈ Z_2 if β < 0. ■
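The decomposition x = z + βy of Example 6.7.11 can be exercised numerically: since f(z) = α on Y and f(y) = ||y||² > 0, we have f(x) = α + βf(y), so the sign of β decides the half-space. A minimal sketch with the arbitrary choices y = (1, 2) and α = 3:

```python
# Hedged numerical companion to Example 6.7.11.  With y = (eta1, eta2)
# = (1, 2) and f(x) = xi1 + 2*xi2 (arbitrary choices), the hyperplane
# {x: f(x) = alpha} splits R^2 into {f(x) < alpha} and {f(x) > alpha};
# writing x = z + beta*y with f(z) = alpha gives f(x) = alpha + beta*f(y),
# and f(y) = ||y||^2 > 0, so the sign of beta decides the side.
eta = (1.0, 2.0)
alpha = 3.0

def f(x):
    return eta[0] * x[0] + eta[1] * x[1]

def side(x):
    # +1 in the open half-space {f > alpha}, -1 in {f < alpha}, 0 on Y
    v = f(x) - alpha
    return (v > 0) - (v < 0)

z = (3.0, 0.0)                 # a point on Y: f(z) = 3 = alpha
f_y = f(eta)                   # ||y||^2 = 5
above = side((z[0] + 0.1 * eta[0], z[1] + 0.1 * eta[1]))   # beta = +0.1
below = side((z[0] - 0.1 * eta[0], z[1] - 0.1 * eta[1]))   # beta = -0.1
```

Moving from z along +y lands in one open half-space and along −y in the other, matching the Z_1/Z_2 picture of Figure E.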

6.8. EXTENSION OF LINEAR FUNCTIONALS

In this section we state and prove the Hahn-Banach theorem. This result is very important in analysis and has important implications in applications. We would like to point out that the present form of this theorem is not the most general version of the Hahn-Banach theorem.

Throughout this section X will denote a real normed linear space.

6.8.1. Definition. Let Y be a linear subspace of X, let Z be a proper linear subspace of Y, let f be a bounded linear functional defined on Z, and let f̃ be a bounded linear functional defined on Y. If f̃(x) = f(x) whenever x ∈ Z, then f̃ is called an extension of f from Z to Y. If the spaces X, Y, Z are normed and if ||f||_Z = ||f̃||_Y, then f̃ is called a norm preserving extension of f.

We now prove the following version of the Hahn-Banach theorem.
6.8.2. Theorem. Every bounded linear functional f defined on a linear subspace Y of a real normed linear space X can be extended to the entire space X with preservation of norm. Specifically, one can find a bounded linear functional f̃ such that

(i) f̃(x) = f(x) for every x ∈ Y; and
(ii) ||f̃||_X = ||f||_Y.

Proof. Although this theorem is true for X not separable, we shall give the proof only for the case where X is separable (see Definition 5.4.33 for separability). We assume that Y is a proper linear subspace of X, for otherwise there is nothing to prove. Let x_1 ∈ X but x_1 ∉ Y, and let us define the subset

    Y_1 = {x ∈ X: x = αx_1 + y, α ∈ R, y ∈ Y}.

It is straightforward to verify that Y_1 is a linear subspace of X, and furthermore that for each x ∈ Y_1 there is a unique α ∈ R and a unique y ∈ Y such that x = αx_1 + y. If an extension f̃ of f from Y to Y_1 exists, then it has the form

    f̃(x) = αf̃(x_1) + f(y),

and if we let c = −f̃(x_1), then f̃(x) = f(y) − cα. From this it is clear that the extension is specified by prescribing the constant c. In order that the

norm of the functional not be increased when it is continued from Y to Y_1, we must find a c such that the inequality

    |f(y) − αc| ≤ ||f|| ||y + αx_1||

holds for all y ∈ Y and all α ∈ R. If y ∈ Y and α ≠ 0, then z = y/α ∈ Y, and the above inequality can be written as

    |f(αz) − αc| ≤ ||f|| ||αz + αx_1||,

or

    |f(z) − c| ≤ ||f|| ||z + x_1||.

This inequality can be rewritten as

    −||f|| ||z + x_1|| ≤ f(z) − c ≤ ||f|| ||z + x_1||,

or, equivalently, as

    f(z) − ||f|| ||z + x_1|| ≤ c ≤ f(z) + ||f|| ||z + x_1||          (6.8.3)

for all z ∈ Y. We now must show that such a number c does indeed always exist. To do this, it suffices to show that for any y_1, y_2 ∈ Y we have

    c_1 ≜ f(y_1) − ||f|| ||y_1 + x_1|| ≤ f(y_2) + ||f|| ||y_2 + x_1|| ≜ c_2.    (6.8.4)

But this inequality follows directly from

    f(y_1) − f(y_2) ≤ ||f|| ||y_1 − y_2|| = ||f|| ||y_1 + x_1 − x_1 − y_2||
                   ≤ ||f|| ||y_1 + x_1|| + ||f|| ||y_2 + x_1||.

In view of (6.8.3) and (6.8.4) it follows that c_1 ≤ c ≤ c_2 for a suitable choice of c. If we now let

    f̃(x) = f(y) − αc,    x = αx_1 + y ∈ Y_1,

we have ||f̃|| = ||f||, and f̃ is an extension of f from Y to Y_1.
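The admissible interval (6.8.3)-(6.8.4) for the constant c can be estimated numerically in a small example. The sketch below takes X = R² with the Euclidean norm, Y = {(t, 0)}, f(t, 0) = t (so ||f|| = 1), and x_1 = (0, 1); all of these are arbitrary illustrative assumptions. For this data the interval collapses to the single value c = 0, and the norm-preserving extension is f̃(ξ_1, ξ_2) = ξ_1.

```python
import math

# Hedged numerical sketch of the interval (6.8.3)-(6.8.4).  X = R^2 with
# the Euclidean norm, Y = {(t, 0)}, f(t, 0) = t so ||f|| = 1, and
# x_1 = (0, 1) are arbitrary illustrative choices.  Sampling z = (t, 0)
# in Y estimates sup of the lower bounds (c1) and inf of the upper
# bounds (c2); here both tend to 0.
norm_f = 1.0
ts = [-200.0 + 0.5 * k for k in range(801)]   # sample grid on Y

c1 = max(t - norm_f * math.sqrt(t * t + 1.0) for t in ts)  # -> sup, just below 0
c2 = min(t + norm_f * math.sqrt(t * t + 1.0) for t in ts)  # -> inf, just above 0
```

Any c with c1 ≤ c ≤ c2 yields a norm-preserving extension; here the sampled bounds pin c to (essentially) 0, matching the orthogonal-projection intuition for the Euclidean norm.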


Next, since X is separable it contains a denumerable everywhere dense set {x_1, x_2, ..., x_n, ...}. From this set of vectors we select, one at a time, a linearly independent subset of vectors {y_1, y_2, ..., y_n, ...} which belongs to X − Y. The set {y_1, y_2, ..., y_n, ...} together with the linear subspace Y generates a subspace W dense in X.

Following the above procedure, we now extend the functional f to a functional f̃ on the subspace W by extending f from Y to Y_1, then to Y_2, etc., where

    Y_1 = {x: x = αy_1 + y; y ∈ Y, α ∈ R},
    Y_2 = {x: x = αy_2 + y; y ∈ Y_1, α ∈ R},

etc.

Finally, we extend f̃ from the dense subspace W to the space X. At the remaining points of X the functional is defined by continuity. If x ∈ X, then there exists a sequence {w_n} of vectors in W converging to x. By continuity, if lim_{n→∞} w_n = x, then f̃(x) = lim_{n→∞} f̃(w_n). The inequality |f̃(x)| ≤ ||f|| ||x|| follows from

    |f̃(x)| = lim_{n→∞} |f̃(w_n)| ≤ lim_{n→∞} ||f|| ||w_n|| = ||f|| ||x||.

This completes the proof of the theorem. ■

The next result is a direct consequence of Theorem 6.8.2.

6.8.5. Corollary. Let x_0 ∈ X, x_0 ≠ 0. Then there exists a bounded non-zero linear functional f defined on all of X such that f(x_0) = ||x_0|| and ||f|| = 1.

Proof. Let Y be the linear subspace of X given by Y = {y ∈ X: y = αx_0, α ∈ R}. For y ∈ Y, define f_0(y) = α||x_0||, where y = αx_0. Then ||y|| = |α| ||x_0||, and so

    |f_0(y)| / ||y|| = 1    for all y ∈ Y.

This implies that ||f_0|| = 1. The proof now follows from Theorem 6.8.2. ■
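In an inner product space the norming functional of Corollary 6.8.5 can be written down explicitly. A minimal sketch, assuming R³ with the Euclidean norm and an arbitrary x_0, where f(x) = ⟨x, x_0⟩/||x_0|| plays the role of the extension:

```python
import math

# Hedged finite-dimensional illustration of Corollary 6.8.5: in R^3 with
# the Euclidean norm, f(x) = <x, x0>/||x0|| is a concrete norming
# functional with f(x0) = ||x0|| and, by Example 6.5.14, ||f|| = 1.
# The vector x0 is an arbitrary choice.
x0 = [2.0, -1.0, 2.0]                       # ||x0|| = 3

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def f(x):
    return sum(a * b for a, b in zip(x, x0)) / norm(x0)

value_at_x0 = f(x0)        # should equal ||x0|| = 3
norm_bound_ok = all(abs(f([math.cos(t), math.sin(t), 0.0])) <= 1.0 + 1e-9
                    for t in [k * 0.01 for k in range(629)])
```

The functional attains ||x_0|| at x_0 itself while staying within 1 on sampled unit vectors, as the corollary requires.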

The next result is also a consequence of the Hahn-Banach theorem.

6.8.6. Corollary. Let x_0 ∈ X, x_0 ≠ 0, and let γ > 0. Then there exists a bounded non-zero linear functional f defined on all of X such that ||f|| = γ and f(x_0) = ||f|| ||x_0||.

The above corollary guarantees the existence of non-trivial bounded linear functionals.

6.8.7. Exercise. Prove Corollary 6.8.6.

In the next example a geometric interpretation of Corollary 6.8.5 is given.

6.8.8. Example. Let x_0 ∈ X, x_0 ≠ 0, and let f be a linear functional defined on X such that f(x_0) = ||x_0|| and ||f|| = 1. Let K be the closed sphere given by K = {x ∈ X: ||x|| ≤ ||x_0||}. Now if x ∈ K, then f(x) ≤ |f(x)| ≤ ||f|| ||x|| ≤ ||x_0||, and so x belongs to the half-space {x ∈ X: f(x) ≤ ||x_0||}. Thus, the hyperplane {x ∈ X: f(x) = ||x_0||} is tangent to the closed sphere K (as illustrated in Figure F). ■

6.8.9. Figure F. Illustration of Corollary 6.8.5.

In closing this section, we mention two of the more important consequences of the Hahn-Banach theorem with significant practical implications. One of these states that given a convex set Y in X containing an interior point and given a fixed point not in the interior of Y, there is a hyperplane separating the fixed point and the convex set Y. The second of these asserts that if Y_1 and Y_2 are convex sets in X, if Y_1 has interior points, and if Y_2 contains no interior point of Y_1, then there is a closed hyperplane which separates Y_1 and Y_2.

6.9. DUAL SPACE AND SECOND DUAL SPACE

In this section we briefly reconsider the dual space X* (see Definition 6.5.9), and we introduce the dual space of X*, called the second dual space. Throughout this section X is a real normed linear space, and X' is the algebraic conjugate of X.

We begin by determining the dual spaces of some common normed linear spaces.
6.9.1. Example. Let X = Rⁿ, let x = (ξ_1, ..., ξ_n) denote an arbitrary element of Rⁿ, let a = (α_1, ..., α_n) be some fixed element of Rⁿ, and let ||x|| = (ξ_1² + ... + ξ_n²)^{1/2}. Recall from Example 6.5.14 that the functional f(x) = α_1 ξ_1 + ... + α_n ξ_n is a bounded linear functional on X and ||f|| = ||a||. If we define a set of basis vectors in Rⁿ as e_1 = (1, 0, ..., 0), ..., e_n = (0, ..., 0, 1), then x ∈ Rⁿ may be expressed as x = Σ_{i=1}^n ξ_i e_i. If we let α_i = f(e_i), where f is any bounded linear functional on Rⁿ, then

    f(x) = Σ_{i=1}^n ξ_i f(e_i) = Σ_{i=1}^n α_i ξ_i.

Thus, the dual space X* of Rⁿ is itself the space Rⁿ in the sense that the elements of X* consist of all functionals of the form

    f(x) = α_1 ξ_1 + ... + α_n ξ_n.

Furthermore, the norm on X* is

    ||f|| = ||a|| = (α_1² + ... + α_n²)^{1/2}. ■

6.9.2. Exercise. Let X = Rⁿ, where the norm of x = (ξ_1, ..., ξ_n) is given by ||x|| = max_{1≤i≤n} |ξ_i| (see Example 6.1.5). Show that if f ∈ X*, then there is an a = (α_1, ..., α_n) ∈ Rⁿ such that f(x) = α_1 ξ_1 + ... + α_n ξ_n, i.e., X* = Rⁿ, and show that the norm on X* is given by ||f|| = Σ_{i=1}^n |α_i|.

6.9.3. Exercise. Let X = Rⁿ, and define the norm of x = (ξ_1, ..., ξ_n) ∈ X by ||x|| = (|ξ_1|^p + ... + |ξ_n|^p)^{1/p}, where 1 < p < ∞ (see Example 6.1.5). Show that if f ∈ X*, then there is an a = (α_1, ..., α_n) ∈ Rⁿ such that f(x) = α_1 ξ_1 + ... + α_n ξ_n, i.e., X* = Rⁿ, and show that the norm on X* is given by

    ||f|| = (|α_1|^q + ... + |α_n|^q)^{1/q},

where q is such that 1/p + 1/q = 1.

6.9.4. Exercise. Let X be the space l_p, 1 ≤ p < ∞, defined in Example 6.1.6, and let q be such that 1/p + 1/q = 1. If p = 1, we take q = ∞. Show that the dual space of l_p is l_q. Specifically, show that every bounded linear functional on l_p is uniquely representable as

    f(x) = Σ_{i=1}^∞ α_i ξ_i,

where a = (α_1, ..., α_k, ...) is an element of l_q. Also, show that every element a of l_q defines an element of (l_p)* in the same way, and that

    ||f|| = (Σ_{i=1}^∞ |α_i|^q)^{1/q}    if 1 < p < ∞,
    ||f|| = sup_i |α_i|                  if p = 1.
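The equality ||f|| = ||a||_q asserted in Exercise 6.9.4 rests on the extremal case of Hölder's inequality: the vector with ξ_i = sign(α_i)|α_i|^{q−1} attains it. A minimal finite-truncation sketch, assuming p = 3 and an arbitrary vector a:

```python
import math

# Hedged finite check of the duality in Exercise 6.9.4, using truncated
# sequences.  For p = 3 (so q = 3/2) and an arbitrary a, the vector with
# xi_i = sign(alpha_i)*|alpha_i|**(q-1) attains equality in Hoelder's
# inequality, so f(x)/||x||_p equals ||a||_q.
p = 3.0
q = p / (p - 1.0)
a = [0.5, -1.0, 2.0, 0.0, -0.25]

def pnorm(v, r):
    return sum(abs(t) ** r for t in v) ** (1.0 / r)

x = [math.copysign(abs(t) ** (q - 1.0), t) if t != 0 else 0.0 for t in a]

f_x = sum(ai * xi for ai, xi in zip(a, x))   # equals sum of |alpha_i|^q
ratio = f_x / pnorm(x, p)                    # should equal ||a||_q
target = pnorm(a, q)
```

Since |x_i|^p = |α_i|^{(q−1)p} = |α_i|^q, the quotient simplifies to (Σ|α_i|^q)^{1−1/p} = ||a||_q, which is what the computation confirms.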

Since X* is a normed linear space (see Theorem 6.5.6), it is possible to form the dual space of X*, which we will denote by X** and which will be referred to as the second dual space of X. As before, we will use the notation x'' for elements of X**, and we will write

    x''(x') = ⟨x', x''⟩,

where x' ∈ X*. If X' denotes the algebraic conjugate of X, then the reader can readily show that even though X* ⊂ X' and X** ⊂ (X*)', in general X** is not a linear subspace of X''.

Let us define a mapping J of X into X** by the relation

    ⟨x', Jx⟩ = ⟨x, x'⟩,    x ∈ X, x' ∈ X*,                         (6.9.5)

or, equivalently, by

    Jx = x'',    x''(x') = x'(x).                                  (6.9.6)

We call this mapping J the canonical mapping of X into X**. The functional x'' defined on X* in this way is linear, because

    x''(αx_1' + βx_2') = ⟨x, αx_1' + βx_2'⟩ = α⟨x, x_1'⟩ + β⟨x, x_2'⟩
                       = αx''(x_1') + βx''(x_2'),

and thus x'' ∈ (X*)'. Since

    |x''(x')| = |x'(x)| = |⟨x, x'⟩| ≤ ||x|| ||x'||,

it follows that ||x''|| ≤ ||x||, and thus x'' ∈ X**. We can actually show that ||x''|| = ||x||. This is obvious for x = 0. If x ≠ 0, then in view of Corollary 6.8.6 there exists a non-zero x' ∈ X* such that ⟨x, x'⟩ = ||x|| ||x'||, and thus ||x''|| = ||x||. From this it follows that the norm of every x ∈ X can be defined in two ways: as the norm of an element in X and as the norm of a linear functional on X*, i.e., as the norm of an element in X**. We summarize this discussion in the following result:

6.9.7. Theorem. X is isometric to some linear subspace in X**.

If we agree not to distinguish between isometric spaces, then Theorem 6.9.7 can simply be stated as X ⊂ X**.

6.9.8. Definition. A normed linear space is said to be reflexive if the canonical mapping (6.9.6), J: X → X**, is onto. If we again agree not to distinguish between isometric spaces, we write in this case X** = X. If X ≠ X**, then X is said to be irreflexive.

6.9.9. Example. The space Rⁿ with the p-norm, 1 ≤ p ≤ ∞, is reflexive. ■

6.9.10. Example. The spaces l_p, 1 < p < ∞, are reflexive. ■

6.9.11. Example. The space l_1 is irreflexive. ■

6.9.12. Exercise. Prove the assertions made in Examples 6.9.9 through 6.9.11.

6.10. WEAK CONVERGENCE

Having introduced the normed dual space, we are now in a position to consider the notion of weak convergence, a concept which arises frequently in analysis and which plays an important role in certain applications. Throughout this section X denotes a normed linear space and X* is the dual space of X.


6.10.1. Definition. A sequence {x_n} of elements in X is said to converge weakly to the element x ∈ X if for every x' ∈ X*, ⟨x_n, x'⟩ → ⟨x, x'⟩. In this case we write x_n → x weakly. If a sequence {x_n} converges to x ∈ X, i.e., if ||x_n − x|| → 0 as n → ∞, then we call this convergence strong convergence, or convergence in norm, to distinguish it from weak convergence.

6.10.2. Theorem. Let {x_n} be a sequence in X which converges in norm to x ∈ X. Then {x_n} converges weakly to x.

Proof. Assume that ||x_n − x|| → 0 as n → ∞. Then for any x' ∈ X* we have

    |⟨x_n, x'⟩ − ⟨x, x'⟩| ≤ ||x'|| ||x_n − x|| → 0  as n → ∞,

and thus x_n → x weakly. ■

Thus, strong convergence implies weak convergence. However, the converse is not true, in general, as the following example shows.

6.10.3. Example. Consider in l₂ the sequence of vectors x₁ = (1, 0, ..., 0, ...), x₂ = (0, 1, 0, ..., 0, ...), x₃ = (0, 0, 1, ..., 0, ...), .... To show that {x_n} converges weakly, we note that every x' ∈ l₂ = X* can be represented as the scalar product with some fixed vector y = (η₁, η₂, ..., η_n, ...); i.e., if x = (ξ₁, ξ₂, ..., ξ_n, ...), then

    ⟨x, x'⟩ = Σ_{i=1}^∞ ξᵢηᵢ

(see Exercise 6.9.4). For the case of the sequence {x_n} we now have ⟨x_n, x'⟩ = η_n, and since η_n → 0 as n → ∞ for every y ∈ l₂, it follows that ⟨x_n, x'⟩ → 0 as n → ∞ for every x' ∈ l₂. Thus, {x_n} converges to 0 weakly. However, x_n does not converge to 0 strongly, because ||x_n|| = 1 for all n. ■
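The behavior in Example 6.10.3 is easy to observe numerically. The following Python sketch (an illustration, not part of the original text; the truncation length N and the choice y_n = 1/n are arbitrary) pairs the unit vectors x_n with a fixed y ∈ l₂ and confirms that the pairings shrink toward zero while every ||x_n|| stays equal to 1:

```python
import math

def inner(x, y):
    # truncated l2 pairing: sum of coordinatewise products
    return sum(a * b for a, b in zip(x, y))

N = 2000
y = [1.0 / (n + 1) for n in range(N)]     # a fixed y in l2 (sum of squares finite)

def e(n, size=N):
    # n-th standard unit vector, truncated to `size` coordinates
    v = [0.0] * size
    v[n] = 1.0
    return v

pairings = [inner(e(n), y) for n in range(0, N, 200)]
norms = [math.sqrt(inner(e(n), e(n))) for n in range(0, N, 200)]
print(pairings)   # decreases toward 0: weak convergence to 0
print(norms)      # all equal to 1.0: no strong convergence
```

Every fixed functional sees the sequence go to zero, yet the distance of each x_n from the origin never shrinks, which is exactly the gap between weak and strong convergence.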

We leave the proof of the next result as an exercise to the reader.

6.10.4. Theorem. If X is finite dimensional, weak and strong convergence are equivalent.

6.10.5. Exercise. Prove Theorem 6.10.4.

Analogous to the concept of weak convergence of elements of a normed linear space X, we can introduce the notion of weak convergence of elements of X*.

6.10.6. Definition. A sequence of functionals {x_n'} in X* converges weak-star (i.e., weak*) to the linear functional x' ∈ X* if for every x ∈ X we have ⟨x, x_n'⟩ → ⟨x, x'⟩. We say that x_n' → x' weak*.


Since strong convergence in X* implies weak convergence in X*, it follows that if a sequence of linear functionals {x_n'} in X* converges in norm to the linear functional x' ∈ X*, then x_n' → x' weak*.
Let us consider an example.
6.10.7. Example. Let [a, b] be an interval on the real line containing the origin, i.e., a < 0 < b, and let {C[a, b]; ||·||∞} be the Banach space of real-valued continuous functions defined in Example 6.1.9. Let {φ_n} be a sequence of functions in C[a, b] satisfying the following conditions for n = 1, 2, ...:

(i) φ_n(t) ≥ 0 for all t ∈ [a, b];
(ii) φ_n(t) = 0 if |t| > 1/n and t ∈ [a, b]; and
(iii) ∫_a^b φ_n(t) dt = 1.

For each n = 1, 2, ..., we can define a continuous linear functional x_n' on X (see Example 6.5.13) by

    ⟨x, x_n'⟩ = ∫_a^b x(t)φ_n(t) dt,

where x ∈ C[a, b]. Now let x₀' be defined on C[a, b] by

    ⟨x, x₀'⟩ = x(0)

for all x ∈ C[a, b]. It is clear that x₀' ∈ X*. We now show that x_n' → x₀' weak*. By the mean value theorem from the calculus, there is a t_n such that −1/n ≤ t_n ≤ 1/n and

    ∫_{−1/n}^{1/n} φ_n(t)x(t) dt = x(t_n) ∫_{−1/n}^{1/n} φ_n(t) dt = x(t_n)

for each n = 1, 2, ... and each x ∈ C[a, b]. Since t_n → 0 and x is continuous, ⟨x, x_n'⟩ = x(t_n) → x(0) for every x ∈ C[a, b]; i.e., x_n' → x₀' weak*. We see that the sequence of functions {φ_n} does not approach a limit in C[a, b]. In particular, there is no φ₀ ∈ C[a, b] such that x(0) = ∫_a^b x(t)φ₀(t) dt for all x. Frequently, in applications, it is convenient to say that the sequence {φ_n} converges to the so-called "δ function," which has this property. We see that the sequence {φ_n} converges to the δ function in the sense of weak* convergence. ■
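The δ-sequence of Example 6.10.7 can be simulated numerically. In the Python sketch below (illustrative only; the triangular bumps φ_n(t) = n(1 − n|t|) for |t| ≤ 1/n are one convenient choice satisfying conditions (i)-(iii), and x(t) = cos t is an arbitrary continuous test function), the Riemann sums for ⟨x, x_n'⟩ approach x(0):

```python
import math

def phi(n, t):
    # triangular bump: phi_n >= 0, vanishes for |t| > 1/n, integral 1
    return n * (1.0 - n * abs(t)) if abs(t) <= 1.0 / n else 0.0

def pair(x, n, a=-1.0, b=1.0, steps=50_000):
    # midpoint Riemann sum for <x, x_n'> = integral of x(t) * phi_n(t) dt
    h = (b - a) / steps
    return sum(x(a + (k + 0.5) * h) * phi(n, a + (k + 0.5) * h)
               for k in range(steps)) * h

x = math.cos                       # continuous on [-1, 1], with x(0) = 1
vals = [pair(x, n) for n in (1, 4, 16, 64)]
print(vals)                        # approaches x(0) = 1 as n grows
```

No continuous limit function exists for the bumps themselves; only the values of the functionals converge, which is the point of the example.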
6.10.8. Theorem. Let X be a separable normed linear space. Every bounded sequence of linear functionals in X* contains a weak* convergent subsequence.

Proof. Since X is separable, we can choose a denumerable everywhere dense set {x₁, x₂, ..., x_n, ...} in X. Now let {x_n'} be a bounded sequence in X*. Since this sequence is bounded in norm, the sequence {⟨x₁, x_n'⟩} is a bounded sequence in either R or C. It now follows that we can select from {x_n'} a subsequence {x'₁ₙ} such that the sequence {⟨x₁, x'₁ₙ⟩} converges. Again, from the subsequence {x'₁ₙ} we can select another subsequence {x'₂ₙ} such that the sequence {⟨x₂, x'₂ₙ⟩} converges. Continuing this procedure, we obtain the sequences

    x'₁₁, x'₁₂, x'₁₃, ...,
    x'₂₁, x'₂₂, x'₂₃, ...,
    x'₃₁, x'₃₂, x'₃₃, ...,
    ..........................

By taking the diagonal of the above array, we obtain the subsequence of linear functionals x'₁₁, x'₂₂, x'₃₃, .... For this subsequence, the sequence x'₁₁(x_n), x'₂₂(x_n), x'₃₃(x_n), ... converges for all n. But then, since {x_n} is dense in X and the functionals x'ₙₙ are uniformly bounded in norm, the sequence x'₁₁(x), x'₂₂(x), x'₃₃(x), ... converges for all x ∈ X. This completes the proof of the theorem. ■
The concepts of weak convergence and weak* convergence give rise to various generalizations, some of which we briefly mention.

Let X be a normed linear space and let X* be its normed dual. We call a set Y ⊂ X* weak* compact if every infinite sequence from Y contains a weak* convergent subsequence. We say that a functional f defined on X, which in general may be non-linear, is weakly continuous at a point x₀ ∈ X if for every ε > 0 there is a δ > 0 and a finite collection {x₁', x₂', ..., x_n'} in X* such that |f(x) − f(x₀)| < ε for all x such that |⟨x − x₀, xᵢ'⟩| < δ for i = 1, 2, ..., n. We can define weak* continuity of a functional similarly by interchanging the roles of X and X*.

It can be shown that if X is a real normed linear space and X* is its normed dual, then any closed sphere in X* is weak* compact.

The reader can readily show that if f is a weakly continuous functional, then x_n → x weakly implies that f(x_n) → f(x).

6.11. INNER PRODUCT SPACES

We recall (see Definition 3.6.19 and the discussion following this definition) that if X is a complex linear space, a function defined on X × X into C, which we denote by (x, y) for x, y ∈ X, is called an inner product if

(i) (x, x) > 0 for all x ≠ 0 and (x, x) = 0 if x = 0;
(ii) (x, y) = (y, x)‾ for all x, y ∈ X, where the overbar denotes complex conjugation;
(iii) (αx + βy, z) = α(x, z) + β(y, z) for all x, y, z ∈ X and for all α, β ∈ C; and
(iv) (x, αy + βz) = ᾱ(x, y) + β̄(x, z) for all x, y, z ∈ X and for all α, β ∈ C.


In the case of real linear spaces, the preceding characterization of an inner product is identical, except we omit complex conjugates in (ii) and (iv).

We call a complex (real) linear space X on which an inner product, (·,·), is defined a complex (real) inner product space, which we denote by {X; (·,·)} (see Definition 3.6.20). If the particular inner product being used in a given discussion is understood, we simply write X to denote the inner product space. In accordance with our discussion following Definition 3.6.20, recall also that different inner products defined on the same linear space yield different inner product spaces. Finally, refer also to the discussion following Definition 3.6.20 for the characterization of an (inner product) subspace.

We have already extensively studied finite-dimensional real inner product spaces, i.e., Euclidean vector spaces, in Sections 4.9 and 4.10. Our subsequent presentation will be in a more general setting, where X need not be finite dimensional and where X may be a complex vector space. In fact, unless otherwise stated, {X; (·,·)} will denote in this section an arbitrary complex inner product space. Since the proofs of several of the following theorems are nearly identical to corresponding ones in Sections 4.9 and 4.10, we will leave such proofs as exercises.

One of our first objectives will be to show that every inner product space {X; (·,·)} has a norm associated with it which is induced by its inner product (·,·). We find it convenient to consider first the Schwarz inequality, given in the following theorem.
6.11.1. Theorem. For any x ∈ X, let us define the function ||·||: X → R by ||x|| = (x, x)^{1/2}. Then for all x, y ∈ X,

    |(x, y)| ≤ ||x|| ||y||.                                        (6.11.2)

6.11.3. Exercise. Prove Theorem 6.11.1 (see Theorem 4.9.28).

Using the above results, we can now readily show that the function ||·||: X → R defined by ||x|| = (x, x)^{1/2} is a norm.

6.11.4. Theorem. Let X be an inner product space. Then the function

    ||x|| = (x, x)^{1/2}                                           (6.11.5)

is a norm; i.e., for every x, y ∈ X and for every α ∈ C, we have

(i) ||x|| ≥ 0;
(ii) ||x|| = 0 if and only if x = 0;
(iii) ||αx|| = |α| ||x||; and
(iv) ||x + y|| ≤ ||x|| + ||y||.

6.11.6. Exercise. Prove Theorem 6.11.4 (see Theorem 4.9.31).
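The Schwarz inequality (6.11.2) and the triangle inequality of part (iv) can be spot-checked numerically. The Python sketch below (illustrative, not from the original text; it uses random complex vectors with the usual inner product (x, y) = Σ ξᵢη̄ᵢ on Cⁿ) verifies that both gaps are non-negative:

```python
import math, random

def ip(x, y):
    # inner product on C^n, linear in the first argument: (x, y) = sum x_i * conj(y_i)
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    # induced norm ||x|| = (x, x)^(1/2); (x, x) is real and non-negative
    return math.sqrt(ip(x, x).real)

random.seed(0)
n = 8
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
y = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

schwarz_gap = norm(x) * norm(y) - abs(ip(x, y))                         # >= 0 by (6.11.2)
triangle_gap = norm(x) + norm(y) - norm([a + b for a, b in zip(x, y)])  # >= 0 by (iv)
print(schwarz_gap, triangle_gap)
```

Equality in (6.11.2) would require x and y to be scalar multiples of one another, which random vectors almost never are, so both gaps come out strictly positive.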


Theorem 6.11.4 allows us to view every inner product space as a normed linear space, provided that we use Eq. (6.11.5) to define the norm on X. Moreover, in view of Theorem 6.1.2, we may view every inner product space as a metric space, provided that we define the metric by ρ(x, y) = ||x − y||.

Subsequently, we adopt the convention that when using the properties and terminology of a normed linear space in connection with an inner product space we mean the norm induced by the inner product, as given in Eq. (6.11.5).
We are now in a position to make the following important definition.

6.11.7. Definition. A complete inner product space is called a Hilbert space.

Thus, every Hilbert space is also a Banach space (and also a complete metric space). Some authors insist that Hilbert spaces be infinite dimensional. We shall not follow that practice. An arbitrary inner product space (not necessarily complete) is sometimes also called a pre-Hilbert space.
6.11.8. Example. Let X be a finite-dimensional (real or complex) inner product space. It follows from Theorem 6.6.5 that X is a Hilbert space. ■

6.11.9. Example. Let l₂ be the (complex) linear space defined in Example 6.1.6. Let x = (ξ₁, ξ₂, ...) ∈ l₂, y = (η₁, η₂, ...) ∈ l₂, and define (x, y): l₂ × l₂ → C as

    (x, y) = Σ_{i=1}^∞ ξᵢη̄ᵢ.

It can readily be shown that (·,·) is an inner product on X. Since l₂ is complete relative to the norm induced by this inner product (see Example 6.1.6), it follows that l₂ is a Hilbert space. ■
6.11.10. Example.
(a) Let X = C[a, b] denote the linear space of complex-valued continuous functions defined on [a, b] (see Example 6.1.9). For x, y ∈ C[a, b] define

    (x, y) = ∫_a^b x(t)ȳ(t) dt.

It is readily verified that this space is a pre-Hilbert space. In view of Example 6.1.9 this space is not complete relative to the norm ||x|| = (x, x)^{1/2}, and hence it is not a Hilbert space.
(b) We extend the space of real-valued functions, L_p[a, b], defined in Example 5.5.31 for the case p = 2, to complex-valued functions to be the set of all functions f: [a, b] → C such that f = u + iv for u, v ∈ L₂[a, b]. Denoting this space also by L₂[a, b], we define

    (f, g) = ∫_{[a,b]} f ḡ dμ

for f, g ∈ L₂[a, b], where integration is in the Lebesgue sense. The space {L₂[a, b]; (·,·)} is a Hilbert space. ■

In the next example we consider the Cartesian product of Hilbert spaces.

6.11.11. Example. Let {Xᵢ}, i = 1, ..., n, denote a finite collection of Hilbert spaces over C, and let X = X₁ × ... × X_n. If x ∈ X, then x = (x₁, ..., x_n) with xᵢ ∈ Xᵢ. Defining vector addition and multiplication of vectors by scalars in the usual manner (see Eqs. (3.2.14), (3.2.15), and the related discussion, and see Example 6.1.10), it follows that X is a linear space. If x, y ∈ X and if (xᵢ, yᵢ)ᵢ denotes the inner product of xᵢ and yᵢ on Xᵢ, then it is easy to show that

    (x, y) = Σ_{i=1}^n (xᵢ, yᵢ)ᵢ

defines an inner product on X. The norm induced on X by this inner product is

    ||x|| = (x, x)^{1/2} = (Σ_{i=1}^n ||xᵢ||ᵢ²)^{1/2},

where ||xᵢ||ᵢ = (xᵢ, xᵢ)ᵢ^{1/2}. It is readily verified that X is complete, and thus X is a Hilbert space. ■

6.11.12. Exercise. Verify the assertions made in Example 6.11.11.

In Theorem 6.1.15 we saw that in a normed linear space {X; ||·||}, the norm ||·|| is a continuous mapping of X into R. Our next result establishes the continuity of an inner product. In the following, x_n → x implies convergence with respect to the norm induced by the inner product (·,·) on X.

6.11.13. Theorem. Let {x_n} be a sequence in X such that x_n → x, where x ∈ X, and let {y_n} be a sequence in X. Then

(i) (z, x_n) → (z, x) for all z ∈ X;
(ii) (x_n, z) → (x, z) for all z ∈ X;
(iii) ||x_n|| → ||x||; and
(iv) if Σ_{n=1}^∞ y_n is convergent in X, then (Σ_{n=1}^∞ y_n, z) = Σ_{n=1}^∞ (y_n, z) for all z ∈ X.

6.11.14. Exercise. Prove Theorem 6.11.13.

Next, let us recall that two vectors x, y ∈ X are said to be orthogonal if (x, y) = 0 (see Definition 3.6.22). In this case we write x ⊥ y. If Y ⊂ X and x ∈ X is such that x ⊥ y for all y ∈ Y, then we write x ⊥ Y. Also, if Z ⊂ X and Y ⊂ X and if z ⊥ Y for all z ∈ Z, then we write Y ⊥ Z. Furthermore, observe that x ⊥ x implies that x = 0. Finally, the notion of inner product allows us to consider the concepts of alignment and colinearity of vectors.

6.11.15. Definition. Let X be an inner product space. The vectors x, y ∈ X are said to be colinear if |(x, y)| = ||x|| ||y|| and aligned if (x, y) = ||x|| ||y||.
Our next result is proved by straightforward computation.

6.11.16. Theorem. For all x, y ∈ X we have

(i) ||x + y||² + ||x − y||² = 2||x||² + 2||y||²; and
(ii) if x ⊥ y, then ||x + y||² = ||x||² + ||y||².

6.11.17. Exercise. Prove Theorem 6.11.16.

Parts (i) and (ii) of Theorem 6.11.16 are referred to as the parallelogram law and the Pythagorean theorem, respectively (refer to Theorems 4.9.33 and 4.9.38).
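Both identities of Theorem 6.11.16 are easy to confirm numerically in Rⁿ. In the Python sketch below (illustrative only; the vectors are random, and y is explicitly orthogonalized against x before testing part (ii)):

```python
import math, random

def ip(x, y):
    # real inner product on R^n
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(ip(x, x))

random.seed(1)
x = [random.gauss(0, 1) for _ in range(6)]
y = [random.gauss(0, 1) for _ in range(6)]

add = [a + b for a, b in zip(x, y)]
sub = [a - b for a, b in zip(x, y)]

# (i) parallelogram law: ||x+y||^2 + ||x-y||^2 = 2||x||^2 + 2||y||^2
para_lhs = norm(add) ** 2 + norm(sub) ** 2
para_rhs = 2 * norm(x) ** 2 + 2 * norm(y) ** 2

# (ii) Pythagorean theorem, after making y orthogonal to x
y_perp = [b - ip(x, y) / ip(x, x) * a for a, b in zip(x, y)]
s = [a + b for a, b in zip(x, y_perp)]
pyth_lhs = norm(s) ** 2
pyth_rhs = norm(x) ** 2 + norm(y_perp) ** 2

print(abs(para_lhs - para_rhs), abs(pyth_lhs - pyth_rhs))   # both ~0
```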
6.11.18. Definition. Let {x_α : α ∈ I} be an indexed set of elements in X, where I is an arbitrary index set (i.e., I is not necessarily the integers). Then {x_α : α ∈ I} is said to be an orthogonal set of vectors if x_α ⊥ x_β for all α, β ∈ I such that α ≠ β. A vector x ∈ X is called a unit vector if ||x|| = 1. An orthogonal set of vectors is called an orthonormal set if every element of the set is a unit vector. Finally, if {xᵢ} is a sequence of elements in X, we define an orthogonal sequence and an orthonormal sequence in an obvious manner.

Using an inductive process we can generalize part (ii) of Theorem 6.11.16 as follows.
6.11.19. Theorem. Let {x₁, ..., x_n} be a finite orthogonal set in X. Then

    ||Σ_{i=1}^n xᵢ||² = Σ_{i=1}^n ||xᵢ||².

We note that if x ≠ 0 and if y = x/||x||, then ||y|| = 1. Hence, it is possible to convert every orthogonal set of vectors into an orthonormal set.
Let us now consider a specific example.
6.11.20. Example. Let X denote the space of continuous complex-valued functions on the interval [0, 1]. In accordance with Example 6.11.10, we define an inner product on X by

    (f, g) = ∫₀¹ f(t)ḡ(t) dt.                                      (6.11.21)

We now show that the set of vectors defined by

    f_n(t) = e^{2πint},  n = 0, 1, 2, ...,  i = √−1,               (6.11.22)

is an orthonormal set in X. Substituting Eq. (6.11.22) into Eq. (6.11.21), we obtain

    (f_n, f_m) = ∫₀¹ f_n(t)f̄_m(t) dt = ∫₀¹ e^{2π(n−m)it} dt.

Since e^{2πki} = cos 2πk + i sin 2πk = 1 for every integer k, we have

    (f_n, f_m) = 0,  m ≠ n;

i.e., if m ≠ n, then f_n ⊥ f_m. On the other hand, if n = m, then

    (f_n, f_n) = ∫₀¹ e^{2π(n−n)it} dt = ∫₀¹ dt = 1;

i.e., ||f_n|| = 1. Thus {f_n} is an orthonormal set in X. ■
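The orthonormality in Example 6.11.20 can be checked by numerical integration. The Python sketch below is illustrative only; the midpoint rule happens to be exact up to rounding here, because the integrands are periodic over [0, 1]:

```python
import cmath

def f(n, t):
    # f_n(t) = exp(2*pi*i*n*t)
    return cmath.exp(2j * cmath.pi * n * t)

def ip(n, m, steps=4000):
    # midpoint-rule approximation of (f_n, f_m) = integral_0^1 f_n(t) conj(f_m(t)) dt
    h = 1.0 / steps
    return sum(f(n, (k + 0.5) * h) * f(m, (k + 0.5) * h).conjugate()
               for k in range(steps)) * h

on_diag = abs(ip(3, 3))    # ~1: each f_n is a unit vector
off_diag = abs(ip(3, 5))   # ~0: f_n is orthogonal to f_m for n != m
print(on_diag, off_diag)
```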

The next result arises often in applications.


6.11.23. Theorem. If {x₁, ..., x_n} is a finite orthonormal set in X, then

(i) Σ_{i=1}^n |(x, xᵢ)|² ≤ ||x||² for all x ∈ X; and               (6.11.24)

(ii) (x − Σ_{i=1}^n (x, xᵢ)xᵢ) ⊥ x_j for any j = 1, ..., n.

6.11.25. Exercise. Prove Theorem 6.11.23 (see Theorem 4.9.58).

On passing to the limit as n → ∞ in (6.11.24), we obtain the following result.

6.11.26. Theorem. If {xᵢ} is any countable orthonormal set in X, then

    Σ_{i=1}^∞ |(x, xᵢ)|² ≤ ||x||²                                  (6.11.27)

for every x ∈ X.

The relationship (6.11.27) is known as the Bessel inequality. The scalars (x, xᵢ) are called the Fourier coefficients of x with respect to the orthonormal set {xᵢ}.
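The Bessel inequality (6.11.27) can be observed numerically with the orthonormal exponentials of Example 6.11.20. In the Python sketch below (illustrative; the choice x(t) = t, the finite index range, and the inclusion of negative indices are all arbitrary choices for the demonstration), the sum of squared Fourier coefficients stays below ||x||²:

```python
import cmath

STEPS = 20000
H = 1.0 / STEPS
TS = [(k + 0.5) * H for k in range(STEPS)]

def ip(u, v):
    # midpoint-rule approximation of (u, v) = integral_0^1 u(t) conj(v(t)) dt
    return sum(u(t) * v(t).conjugate() for t in TS) * H

def x(t):
    return complex(t)                 # the vector x(t) = t; ||x||^2 = 1/3

def e(n):
    # orthonormal exponential f_n(t) = exp(2*pi*i*n*t)
    return lambda t: cmath.exp(2j * cmath.pi * n * t)

norm_sq = ip(x, x).real
bessel = sum(abs(ip(x, e(n))) ** 2 for n in range(-5, 6))
print(norm_sq, bessel)                # bessel <= norm_sq, per (6.11.27)
```

Enlarging the index range pushes the partial sums closer to ||x||², but by (6.11.27) they can never exceed it.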
The next result is a generalization of Theorem 4.9.17.


6.11.28. Theorem. In an inner product space X we have (x, y) = 0 for all x ∈ X if and only if y = 0.

6.11.29. Exercise. Prove Theorem 6.11.28.

From our discussion thus far it should be clear that not every normed
linear space can be made into an inner product space. The following theorem
gives us sufficient conditions for which a normed linear space is also an
inner product space.
6.11.30. Theorem. Let X be a normed linear space. If for all x, y ∈ X,

    ||x + y||² + ||x − y||² = 2(||x||² + ||y||²),                  (6.11.31)

then it is possible to define an inner product on X by

    (x, y) = ¼{||x + y||² − ||x − y||² + i||x + iy||² − i||x − iy||²}  (6.11.32)

for all x, y ∈ X, where i = √−1.

6.11.33. Exercise. Prove Theorem 6.11.30.

6.11.34. Corollary. If X is a real normed linear space whose norm satisfies Eq. (6.11.31) for all x, y ∈ X, then it is possible to define an inner product on X by

    (x, y) = ¼{||x + y||² − ||x − y||²}

for all x, y ∈ X.

6.11.35. Exercise. Prove Corollary 6.11.34.
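The polarization formula of Corollary 6.11.34 and the role of condition (6.11.31) can both be illustrated numerically (a Python sketch, not from the original text; the specific vectors are arbitrary). For the Euclidean norm the formula recovers the usual dot product, while for the l₁ norm the parallelogram law fails, so no inner product can induce that norm:

```python
def norm_p(x, p):
    # p-norm on R^n (finite p)
    return sum(abs(a) ** p for a in x) ** (1.0 / p)

def polarize(x, y, norm):
    # real polarization identity: (x, y) = (||x+y||^2 - ||x-y||^2) / 4
    add = [a + b for a, b in zip(x, y)]
    sub = [a - b for a, b in zip(x, y)]
    return (norm(add) ** 2 - norm(sub) ** 2) / 4.0

x, y = [1.0, 2.0, -1.0], [0.5, -1.0, 3.0]

# p = 2: polarization recovers the dot product 1*0.5 + 2*(-1) + (-1)*3 = -4.5
recovered = polarize(x, y, lambda v: norm_p(v, 2))
print(recovered)

# p = 1: the parallelogram law (6.11.31) fails
u, v = [1.0, 0.0], [0.0, 1.0]
l1_lhs = norm_p([1.0, 1.0], 1) ** 2 + norm_p([1.0, -1.0], 1) ** 2
l1_rhs = 2 * (norm_p(u, 1) ** 2 + norm_p(v, 1) ** 2)
print(l1_lhs, l1_rhs)      # 8.0 vs 4.0: unequal
```

The failure of (6.11.31) for p ≠ 2 is exactly the content of Exercise 6.11.36 below.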

In view of part (i) of Theorem 6.11.16 and in view of Theorem 6.11.30, condition (6.11.31) is both necessary and sufficient for a normed linear space to be an inner product space as well. Furthermore, it can also be shown that Eq. (6.11.32) uniquely defines the inner product on such a normed linear space.
We conclude this section with the following exercise.

6.11.36. Exercise. Let l_p, 1 ≤ p ≤ ∞, be the normed linear space defined in Example 6.1.6. Show that l_p is an inner product space if and only if p = 2.

6.12. ORTHOGONAL COMPLEMENTS

In this section we establish some interesting structural properties of Hilbert spaces. Specifically, we will show that any vector x of a Hilbert space X can uniquely be represented as the sum of two vectors y and z, where y is in a subspace Y of X and z is orthogonal to Y. This is known as the projection theorem. In proving this theorem we employ the so-called "classical projection theorem," a result of great importance in its own right. This theorem extends the following familiar result to the case of (infinite-dimensional) Hilbert spaces: in the three-dimensional Euclidean space the shortest distance between a point and a plane is along a vector through the point and perpendicular to the plane. Both the classical projection theorem and the projection theorem are of great importance in applications.

Throughout this section, {X; (·,·)} is a complex inner product space.

6.12.1. Definition. Let Y be a non-void subset of X. The set of all vectors orthogonal to Y, denoted by Y⊥, is called the orthogonal complement of Y. The orthogonal complement of Y⊥ is denoted by (Y⊥)⊥ ≜ Y⊥⊥, the orthogonal complement of Y⊥⊥ is denoted by (Y⊥⊥)⊥ ≜ Y⊥⊥⊥, etc.

6.12.2. Example. Let X be the space E³ depicted in Figure G, and let Y be the x₁-axis. Then Y⊥ is the x₂x₃-plane, Y⊥⊥ is the x₁-axis, and Y⊥⊥⊥ is again the x₂x₃-plane, etc. Thus, in the present case, Y⊥⊥ = Y, Y⊥⊥⊥ = Y⊥, Y⊥⊥⊥⊥ = Y⊥⊥, etc. ■

6.12.3. Figure G

We now state and prove several properties of the orthogonal complement.


The proof of the first result is left as an exercise.
6.12.4. Theorem. In an inner product space X, {0}⊥ = X and X⊥ = {0}.

6.12.5. Exercise. Prove Theorem 6.12.4.

6.12.6. Theorem. Let Y be a non-void subset of X. Then Y⊥ is a closed linear subspace of X.

Proof. If x, y ∈ Y⊥, then (x, z) = 0 and (y, z) = 0 for all z ∈ Y. Hence, (αx + βy, z) = α(x, z) + β(y, z) = 0, and thus (αx + βy) ⊥ z for all z ∈ Y, or (αx + βy) ∈ Y⊥. Therefore, Y⊥ is a linear subspace of X.


To show that Y⊥ is closed, assume that x₀ is a point of accumulation of Y⊥. Then there is a sequence {x_n} from Y⊥ such that ||x_n − x₀|| → 0 as n → ∞. By Theorem 6.11.13 we have 0 = (x_n, z) → (x₀, z) as n → ∞ for all z ∈ Y. Therefore x₀ ∈ Y⊥, and Y⊥ is closed. ■

Before considering the next result we require the following concept.

6.12.7. Definition. Let Y be a non-void subset of X, and let V(Y) be the linear subspace generated by Y (see Definition 3.3.6). Let V̄(Y) denote the closure of V(Y). We call V̄(Y) the closed linear subspace generated by Y.

Note that in view of Theorem 6.2.3, V̄(Y) is indeed a linear subspace of X.

6.12.8. Theorem. Let Y and Z be non-void subsets of X. Then

(i) either Y ∩ Y⊥ = ∅ or Y ∩ Y⊥ = {0};
(ii) Y ⊂ Y⊥⊥;
(iii) if Y ⊂ Z, then Z⊥ ⊂ Y⊥;
(iv) Y⊥ = Y⊥⊥⊥; and
(v) Y⊥⊥ is the smallest closed linear subspace of X which contains Y; i.e., Y⊥⊥ = V̄(Y).

Proof. To prove part (i), assume that Y ∩ Y⊥ ≠ ∅, and let x ∈ Y ∩ Y⊥. Then x ∈ Y and x ∈ Y⊥, and so (x, x) = 0. This implies that x = 0.
The proof of part (ii) is left as an exercise.
To prove part (iii), let y ∈ Z⊥. Then y ⊥ x for all x ∈ Z. Since Z ⊃ Y, it follows that y ⊥ x for all x ∈ Y. Thus, y ∈ Y⊥ whenever y ∈ Z⊥, and Y⊥ ⊃ Z⊥.
To prove part (iv) we note that, by part (ii) of this theorem, Y⊥ ⊂ Y⊥⊥⊥. On the other hand, since Y ⊂ Y⊥⊥, it follows from part (iii) of this theorem that Y⊥ ⊃ Y⊥⊥⊥. Thus, Y⊥ = Y⊥⊥⊥.
The proof of part (v) is also left as an exercise. ■

6.12.9. Exercise.

Prove parts (ii) and (v) of Theorem 6.12.8.

In view of part (iv) of the above theorem, we can write Y⊥ = Y⊥⊥⊥ = Y⊥⊥⊥⊥⊥ = ..., and Y⊥⊥ = Y⊥⊥⊥⊥ = Y⊥⊥⊥⊥⊥⊥ = ....
Before giving the classical projection theorem, we state and prove the following preliminary result.

6.12.10. Theorem. Let Y be a linear subspace of X, and let x be an arbitrary vector in X. Let

    δ = inf{||y − x|| : y ∈ Y}.

If there exists a y₀ ∈ Y such that ||y₀ − x|| = δ, then y₀ is unique; moreover, y₀ ∈ Y is the unique element of Y with ||y₀ − x|| = δ if and only if (x − y₀) ⊥ Y.

Proof. Let us first show that if ||y₀ − x|| = δ, then (x − y₀) ⊥ Y. In doing so we assume to the contrary that there is a y ∈ Y not orthogonal to x − y₀. We may assume, without loss of generality, that y is a unit vector and that (x − y₀, y) = α ≠ 0. Defining a vector z ∈ Y as z = y₀ + αy, we have

    ||x − z||² = ||x − y₀ − αy||²
              = (x − y₀, x − y₀) − ᾱ(x − y₀, y) − α(y, x − y₀) + |α|²||y||²
              = ||x − y₀||² − |α|²;

i.e., ||x − z|| < ||x − y₀||. From this it follows that if x − y₀ is not orthogonal to every y ∈ Y, then ||y₀ − x|| ≠ δ. This completes the first part of the proof.

Next, assume that (x − y₀) ⊥ Y. We must show that y₀ is the unique vector such that ||x − y|| > ||x − y₀|| for all y ≠ y₀. For any y ∈ Y we have, in view of part (ii) of Theorem 6.11.16 (note that (x − y₀) ⊥ (y₀ − y)),

    ||x − y||² = ||(x − y₀) + (y₀ − y)||² = ||x − y₀||² + ||y₀ − y||².

From this it follows that ||x − y|| > ||x − y₀|| for all y ≠ y₀. This completes the proof of the theorem. ■

In Figure H the meaning of Theorem 6.12.10 is illustrated pictorially for a subset Y of E³.

6.12.11. Figure H

The preceding theorem does not ensure the existence of the vector y₀. However, if we require in Theorem 6.12.10 that Y be a closed linear subspace in a Hilbert space X, then the existence of the unique vector y₀ is guaranteed.


This important result, which we will prove below, is called the classical projection theorem.

6.12.12. Theorem. Let X be a Hilbert space, and let Y be a closed linear subspace of X. Let x be an arbitrary vector in X, and let

    δ = inf{||y − x|| : y ∈ Y}.

Then there exists a unique vector y₀ ∈ Y such that ||y₀ − x|| = δ. Moreover, y₀ ∈ Y is the unique vector such that ||y₀ − x|| = inf{||y − x|| : y ∈ Y} if and only if the vector (x − y₀) ⊥ Y.
Proof. In view of Theorem 6.12.10 we only have to establish the existence of a vector y₀ ∈ Y such that ||x − y₀|| = δ. Assume that x ∉ Y (if x ∈ Y, then x = y₀ and we are done). Since δ is the infimum of ||y − x|| for all y ∈ Y, there is a sequence {y_n} in Y such that ||x − y_n|| → δ as n → ∞. We now show that {y_n} is a Cauchy sequence. By part (i) of Theorem 6.11.16 we have

    ||(y_m − x) + (x − y_n)||² + ||(y_m − x) − (x − y_n)||² = 2||y_m − x||² + 2||x − y_n||².

This equation yields, after some straightforward manipulations, the relation

    ||y_m − y_n||² = 2||y_m − x||² + 2||x − y_n||² − 4||x − (y_m + y_n)/2||².

Since Y is a linear subspace, it follows that (y_m + y_n)/2 ∈ Y for each m, n. Thus, ||x − (y_m + y_n)/2|| ≥ δ and

    ||y_m − y_n||² ≤ 2||y_m − x||² + 2||x − y_n||² − 4δ².

Also, since ||y_m − x||² → δ² as m → ∞, it follows that ||y_m − y_n||² → 0 as m, n → ∞. Hence, {y_n} is a Cauchy sequence. Since Y is a closed linear subspace of a Hilbert space, it is itself a Hilbert space and as such {y_n} has a limit y₀ ∈ Y. Finally, by the continuity of the norm (see Theorem 6.1.15), it follows that lim ||x − y_n|| = ||x − y₀|| = δ. This proves the theorem. ■

The next result is a consequence of the preceding theorem.
6.12.13. Theorem. If Y and Z are closed linear subspaces of a Hilbert space X, if Y ⊂ Z, and if Y ≠ Z, then there exists a non-zero vector in Z, say z, such that z ⊥ Y.

Proof. Let x be any vector in Z which is not in Y (there is one such vector by hypothesis). If we define δ as above, i.e., δ = inf{||y − x|| : y ∈ Y}, then there exists by Theorem 6.12.12 a vector y₀ ∈ Y such that ||x − y₀|| = δ. Now let z = y₀ − x. Then z ⊥ Y by Theorem 6.12.12. ■
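In finite dimensions the minimizing vector y₀ of the classical projection theorem can be computed explicitly. The Python sketch below (illustrative, not from the original text; it orthonormalizes a spanning set with the Gram-Schmidt process and sums the resulting Fourier components) finds the closest point to x in a plane Y ⊂ R³ and checks that the residual x − y₀ is orthogonal to Y:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    # orthonormalize a linearly independent list of vectors
    ortho = []
    for v in vectors:
        w = list(v)
        for q in ortho:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(dot(w, w))
        ortho.append([wi / n for wi in w])
    return ortho

def project(x, basis):
    # y0 = closest point to x in Y = span(basis)
    y0 = [0.0] * len(x)
    for q in gram_schmidt(basis):
        c = dot(x, q)
        y0 = [a + c * b for a, b in zip(y0, q)]
    return y0

x = [1.0, 2.0, 3.0]
span_Y = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]   # spans the x1-x2 plane
y0 = project(x, span_Y)
resid = [a - b for a, b in zip(x, y0)]
print(y0)                                      # [1.0, 2.0, 0.0]
print([dot(resid, b) for b in span_Y])         # both ~0: (x - y0) is orthogonal to Y
```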


From part (ii) of Theorem 6.12.8 we have, in general, Y⊥⊥ ⊃ Y. Under certain conditions equality holds.

6.12.14. Theorem. Let Y be a linear subspace of a Hilbert space X. Then Ȳ = Y⊥⊥.

Proof. From part (ii) of Theorem 6.12.8 we have Y ⊂ Y⊥⊥. Since Y⊥⊥ is closed by Theorem 6.12.6, it follows that Ȳ ⊂ Y⊥⊥. For purposes of contradiction, let us now assume that Ȳ ≠ Y⊥⊥. Then Theorem 6.12.13 establishes the existence of a vector z ∈ Y⊥⊥ such that z ≠ 0 and z ⊥ Ȳ; thus, z ∈ Ȳ⊥. Since Y ⊂ Ȳ, it follows that z ∈ Y⊥. Therefore, we have z ∈ Y⊥ ∩ Y⊥⊥ and z ≠ 0, which contradicts part (i) of Theorem 6.12.8. Hence, we must have Ȳ = Y⊥⊥. ■

We note that if, in particular, Y is a closed linear subspace of X, then Y = Y⊥⊥.
In connection with the next result, recall the definition of the sum of two subsets of X (see Definition 3.2.8).
6.12.15. Theorem. If Y and Z are closed linear subspaces of a Hilbert space X, and if Y ⊥ Z, then Y + Z is a closed linear subspace of X.

Proof. In view of Theorem 3.2.10, Y + Z is a linear subspace of X. To show that Y + Z is closed, it suffices to show that if u is a point of accumulation of Y + Z, then u = y + z for some y ∈ Y and some z ∈ Z. Let u be such a point of accumulation. Then there is a sequence of vectors {u_n} in Y + Z with ||u_n − u|| → 0 as n → ∞, where for each n, u_n = y_n + z_n with y_n ∈ Y and z_n ∈ Z. Since Y ⊥ Z, the Pythagorean theorem (see Theorem 6.11.16) yields

    ||u_n − u_m||² = ||(y_n − y_m) + (z_n − z_m)||² = ||y_n − y_m||² + ||z_n − z_m||².

But ||u_n − u_m|| → 0 as m, n → ∞, because {u_n}, having a limit, is a Cauchy sequence. Therefore, ||y_n − y_m|| → 0 and ||z_n − z_m|| → 0 as m, n → ∞; i.e., the sequences {y_n} and {z_n} are also Cauchy sequences. Since Y and Z are closed, these sequences have limits y ∈ Y and z ∈ Z, respectively. Finally, we note that

    ||u_n − (y + z)|| = ||(y_n − y) + (z_n − z)|| ≤ ||y_n − y|| + ||z_n − z|| → 0

as n → ∞. Therefore, since {u_n} cannot approach two distinct limits, we have u = y + z. This completes the proof. ■

Before proceeding to the next result, we recall from Definition 3.2.13 that a linear space X is the direct sum of two linear subspaces Y and Z if for every x ∈ X there is a unique y ∈ Y and a unique z ∈ Z such that x = y + z. We write, in this case, X = Y ⊕ Z. The following result is known as the projection theorem.

6.12.16. Theorem. If Y is a closed linear subspace of a Hilbert space X, then X = Y ⊕ Y⊥.

Proof. Let Z = Y + Y⊥. By hypothesis, Y is a closed linear subspace, and so is Y⊥ in view of Theorem 6.12.6. From the previous result it now follows that Z is also a closed linear subspace. Next, we show that Z = X. Since Y ⊂ Z and Y⊥ ⊂ Z, it follows from part (iii) of Theorem 6.12.8 that Z⊥ ⊂ Y⊥ and also that Z⊥ ⊂ Y⊥⊥, so that Z⊥ ⊂ Y⊥ ∩ Y⊥⊥. But from part (i) of Theorem 6.12.8 we have Y⊥ ∩ Y⊥⊥ = {0}. Therefore, the zero vector is the only element in both Y⊥ and Y⊥⊥, and thus Z⊥ = {0}. Since Z is a closed linear subspace, we have from Theorems 6.12.4 and 6.12.14,

    Z = Z⊥⊥ = (Z⊥)⊥ = {0}⊥ = X.

We have thus shown that we can represent every x ∈ X as the sum x = y + z, where y ∈ Y and z ∈ Y⊥. To show that this representation is unique we consider x = y₁ + z₁ and x = y₂ + z₂, where y₁, y₂ ∈ Y and z₁, z₂ ∈ Y⊥. Then 0 = x − x = (y₁ − y₂) + (z₁ − z₂), or y₁ − y₂ = z₂ − z₁. Now clearly (y₁ − y₂) ∈ Y and (z₂ − z₁) ∈ Y⊥. Since y₁ − y₂ = z₂ − z₁, we also have (y₁ − y₂) ∈ Y⊥ and (z₂ − z₁) ∈ Y. From this it follows that y₁ − y₂ = z₂ − z₁ = 0; i.e., y₁ = y₂ and z₁ = z₂. Therefore, the representation of x is unique. ■
The above theorem allows us to write any vector x of a Hilbert space X as the sum of two vectors y and z; i.e., x = y + z, where y is in a closed linear subspace Y of X and z is in Y⊥. It is this theorem which gave rise to the expression orthogonal complement.
If X is a Hilbert space, if Y is a closed linear subspace of X, and if x = y + z with y ∈ Y and z ∈ Y⊥, then we define the mapping P by

    Px = y.

We call the function P the projection of x onto Y. Note that P(Px) ≜ P²x = Py = y; i.e., P² = P. We will examine the properties of projections in greater detail in the next chapter. (Refer also to Definition 3.7.1 and Theorem 3.7.4.)
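The decomposition x = Px + (x − Px) and the idempotence P² = P can be seen numerically in the simplest case, projection onto the span of a single vector u in R³ (a Python sketch for illustration; u and x are arbitrary choices):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj(x, u):
    # Px = projection of x onto the closed subspace Y = span{u}
    c = dot(x, u) / dot(u, u)
    return [c * a for a in u]

u = [2.0, 1.0, -2.0]
x = [3.0, -1.0, 4.0]

y = proj(x, u)                       # y = Px, an element of Y
z = [a - b for a, b in zip(x, y)]    # z = x - Px, in the orthogonal complement

idem_err = max(abs(a - b) for a, b in zip(proj(y, u), y))
print(dot(z, u))                     # ~0: z is orthogonal to Y
print(idem_err)                      # ~0: P(Px) = Px, i.e., P^2 = P
```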

6.13. FOURIER SERIES

In the previous section we examined some of the structural properties of Hilbert spaces. Presently, we will concern ourselves with the representation of elements in Hilbert space. We will see that the vectors of a Hilbert space can under certain conditions be represented as a linear combination of a finite or infinite number of vectors from an orthonormal set. In this connection we will touch upon the concept of basis in Hilbert space. The property which makes all this possible is, of course, the inner product.

Much of the material in this section is concerned with an abstract approach to the topic of Fourier series. Since the reader is probably already familiar with certain facets of Fourier analysis, he or she is now in a position to recognize the power and the beauty of the abstract approach.

Throughout this section {X; (·,·)} is a complex inner product space, and convergence of an infinite series is to be understood in the sense of Definition 6.3.1.
We now consider the representation of a vector y of a finite-dimensional linear subspace Y in an inner product space.

6.13.1. Theorem. Let X be an inner product space, let {y₁, ..., y_n} be a finite orthonormal set in X, and let Y be the linear subspace of X generated by {y₁, ..., y_n}. Then the vectors {y₁, ..., y_n} form a basis for Y and, moreover, in the representation of a vector y ∈ Y by the sum

    y = α₁y₁ + ... + α_n y_n,

the coefficients αᵢ are specified by

    αᵢ = (y, yᵢ),  i = 1, ..., n.

6.13.2. Exercise. Prove Theorem 6.13.1. (Refer to Theorems 4.9.44 and 4.9.51.)

We now generalize the preceding result.


6.13.3. Theorem. Let X be a Hilbert space and let {x_i} be a countably
infinite orthonormal sequence in X. A series Σ_{i=1}^∞ α_i x_i is convergent to an
element x in X if and only if

Σ_{i=1}^∞ |α_i|² < ∞.

In this case we have the relation

α_i = (x, x_i),  i = 1, 2, ....

Proof. Assume that Σ_{i=1}^∞ |α_i|² < ∞, and let s_n = Σ_{i=1}^n α_i x_i. If n > m, then

||s_n − s_m||² = ||Σ_{i=m+1}^n α_i x_i||² = Σ_{i=m+1}^n |α_i|² → 0

as n, m → ∞. Therefore, {s_n} is a Cauchy sequence and as such it has a
limit, say x, in the Hilbert space X. Thus lim s_n = x.

Conversely, if {s_n} converges then it is a Cauchy sequence and ||s_n − s_m|| → 0
as n, m → ∞. From this it follows that Σ_{i=m+1}^n |α_i|² → 0, and hence
Σ_{i=1}^∞ |α_i|² < ∞.

Now assume that Σ_{i=1}^∞ |α_i|² < ∞, and let x = lim s_n. We must show that
α_i = (x, x_i). From Theorem 6.13.1 we have α_i = (s_n, x_i), i = 1, ..., n.
But s_n → x, and hence by the continuity of the inner product we have
(s_n, x_i) → (x, x_i) as n → ∞. Therefore, α_i = (x, x_i), which completes the
proof. ∎
In the next result we use the concept of closed linear subspace generated by
a set (see Definition 6.12.7).

6.13.4. Theorem. Let {x_i} be an orthonormal sequence in a Hilbert space
X, and let Y be the closed linear subspace generated by {x_i}. Corresponding
to each x ∈ X the series

Σ_{i=1}^∞ (x, x_i) x_i    (6.13.5)

converges to an element x̂ ∈ Y. Moreover, (x − x̂) ⊥ Y.

6.13.6. Exercise. Prove Theorem 6.13.4. (Hint: utilize Theorems 6.11.26,
6.13.3, and the continuity of the inner product.)

A more general version of Theorem 6.13.4 can be established by replacing
the orthonormal sequence {x_i} by an arbitrary orthonormal set Z.

In view of Theorem 6.13.4 any element x of a Hilbert space X can
unambiguously be represented by a series of the form (6.13.5) provided that
the closed linear subspace Y generated by the orthonormal sequence {x_i}
is equal to the space X. The scalars (x, x_i) in (6.13.5) are called Fourier coefficients of x with respect to the {x_i}.
6.13.7. Definition. Let X be a Hilbert space. An orthonormal set Y in X
is said to be complete if there exists no orthonormal set of which Y is a
proper subset.
The next result enables us to characterize complete orthonormal sets.
6.13.8. Theorem. Let X be a Hilbert space, and let Y be an orthonormal
set in X. Then the following statements are equivalent:

(i) Y is complete;
(ii) if (x, y) = 0 for all y ∈ Y, then x = 0; and
(iii) V(Y) = X.

6.13.9. Exercise. Prove Theorem 6.13.8 for the case where Y is an orthonormal sequence {x_i}.
As a specific example of a complete orthonormal set, we consider the
set of elements e_1 = (1, 0, ..., 0, ...), e_2 = (0, 1, 0, ..., 0, ...), e_3 =
(0, 0, 1, 0, ..., 0, ...), ... in the Hilbert space l_2 (see Example 6.11.9). It is
readily verified that Y = {e_i} is an orthonormal set in l_2. Now let x = (ξ_1,
ξ_2, ...) ∈ l_2, and corresponding to x let x_k = Σ_{i=1}^k ξ_i e_i. Then

||x − x_k||² = Σ_{i=k+1}^∞ |ξ_i|²,

and thus lim_{k→∞} ||x − x_k|| = 0. Hence, V(Y) = l_2 and
Y is complete by the preceding theorem.

Many of the subsequent results involving countable orthonormal sets
may be shown to hold for uncountable orthonormal sets as well (refer to
Definition 1.2.48). The proofs of these generalized results usually require a
postulate known as Zorn's lemma. (Consult the references cited at the end
of this chapter for a discussion of this lemma.) Although the proofs of such
generalized results are not particularly difficult, they do involve an added
level of abstraction which we do not wish to pursue in this book. In connection with generalized results of this type, it is also necessary to use the notion
of cardinal number of a set, introduced at the end of Section 1.2.

The next result is known as Parseval's formula (refer also to Corollary
4.9.49).
6.13.10. Theorem. Let X be a Hilbert space and let the sequence {x_i} be
orthonormal in X. Then

||x||² = Σ_{i=1}^∞ |(x, x_i)|²    (6.13.11)

for every x ∈ X if and only if the sequence {x_i} is complete.

Proof. Assume to the contrary that the sequence {x_i} is not complete. Then
there exists some z ≠ 0 in X such that (z, x_i) = 0 for all i. Thus, there exists a
z ∈ X such that ||z||² ≠ Σ_{i=1}^∞ |(z, x_i)|². This proves the first part.

Now assume that the sequence {x_i} is complete. In view of Theorems
6.13.4 and 6.13.8 we have

x = Σ_{i=1}^∞ (x, x_i) x_i ≜ Σ_{i=1}^∞ α_i x_i.

Since {x_i} is orthonormal we obtain

||x||² = (Σ_{i=1}^∞ α_i x_i, Σ_{j=1}^∞ α_j x_j) = Σ_{i=1}^∞ Σ_{j=1}^∞ α_i ᾱ_j (x_i, x_j) = Σ_{i=1}^∞ |α_i|².

This completes the proof. ∎
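In a finite-dimensional complex inner product space the content of Theorem 6.13.10 can be verified directly. The sketch below (numpy; the unitary matrix and the vector x are arbitrary illustrative choices) takes the columns of a unitary matrix as a complete orthonormal set in C^5 and checks Parseval's formula:

```python
import numpy as np

rng = np.random.default_rng(0)

# The columns of a unitary matrix Q form a complete orthonormal set in C^5.
M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
Q, _ = np.linalg.qr(M)

x = rng.standard_normal(5) + 1j * rng.standard_normal(5)

# Fourier coefficients (x, x_i): inner products of x with each column of Q.
coeffs = Q.conj().T @ x

# Parseval: ||x||^2 = sum over i of |(x, x_i)|^2.
assert np.isclose(np.linalg.norm(x) ** 2, np.sum(np.abs(coeffs) ** 2))
```

Completeness is what makes the equality exact; dropping a column of Q would turn the equality into the strict inequality of Bessel's type.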

A more general version of Theorem 6.13.10 can be established by replacing
the orthonormal sequence by an orthonormal set.

The next result, known as the Gram-Schmidt procedure, allows us to
construct orthonormal sets in inner product spaces (compare with Theorem
4.9.55).

6.13.12. Theorem. Let X be an inner product space. Let {x_i} be a finite
or a countably infinite sequence of linearly independent vectors. Then there
exists an orthonormal sequence {y_i} having the same cardinal number as the
sequence {x_i} and generating the same linear subspace as {x_i}.

Proof. Since x_1 ≠ 0, let us define y_1 as

y_1 = x_1 / ||x_1||.

It is clear that y_1 and x_1 generate the same linear subspace. Next, let

z_2 = x_2 − (x_2, y_1) y_1.

Since

(z_2, y_1) = (x_2, y_1) − (x_2, y_1)(y_1, y_1) = (x_2, y_1) − (x_2, y_1) = 0,

it follows that z_2 ⊥ y_1. We now let y_2 = z_2/||z_2||. Note that z_2 ≠ 0, because
x_2 and y_1 are linearly independent. Also, y_1 and y_2 generate the same linear
subspace as x_1 and x_2, because y_2 is a linear combination of y_1 and x_2.

Proceeding in the fashion described above, we define z_2, z_3, ... and
y_2, y_3, ... recursively as

z_n = x_n − Σ_{i=1}^{n−1} (x_n, y_i) y_i

and

y_n = z_n / ||z_n||.

As before, we can readily verify that z_n ⊥ y_i for all i < n, that z_n ≠ 0, and
that the {y_i}, i = 1, ..., n, generate the same linear subspace as the {x_i},
i = 1, ..., n. If the set {x_i} is finite, the process terminates. Otherwise it is
continued indefinitely by induction.

The sequence {y_i} thus constructed can be put into a one-to-one correspondence with the sequence {x_i}. Therefore, these sequences have the
same cardinal number. ∎
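The recursion in the proof translates directly into code. The following sketch (numpy; the input vectors are an illustrative choice) applies the Gram-Schmidt procedure of Theorem 6.13.12 to a finite list of linearly independent vectors in R^3:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors.

    Implements y_1 = x_1/||x_1||, z_n = x_n - sum_{i<n} (x_n, y_i) y_i,
    y_n = z_n/||z_n||, as in the proof of Theorem 6.13.12.
    """
    basis = []
    for x in vectors:
        z = x - sum(np.dot(x, y) * y for y in basis)
        norm = np.linalg.norm(z)
        if norm == 0:
            raise ValueError("vectors are linearly dependent")
        basis.append(z / norm)
    return basis

xs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
ys = gram_schmidt(xs)

# The y_i are orthonormal: their Gram matrix is the identity.
G = np.array([[np.dot(u, v) for v in ys] for u in ys])
assert np.allclose(G, np.eye(3))
```

Each y_n depends only on x_n and the previously constructed y_1, ..., y_{n-1}, which is why the process extends to a countably infinite sequence by induction.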
The following result can be established by use of Zorn's lemma.

6.13.13. Theorem. Let X be an inner product space containing a nonzero element. Then X contains a complete orthonormal set. If Y is any
orthonormal set in X, then there is a complete orthonormal set containing
Y as a subset.

Indeed, it is also possible to prove the following result: if in an inner
product space Y_1 and Y_2 are two complete orthonormal sets, then Y_1 and Y_2
have the same cardinal number, so that a one-to-one mapping of set Y_1 onto
set Y_2 can be established. This result, along with Theorem 6.13.13, allows
us to conclude that with each Hilbert space X there is associated in a natural
way a cardinal number. This, in turn, enables us to consider this cardinal number as the
dimension of a Hilbert space X. For the case of finite-dimensional spaces this
concept and the usual definition of dimension coincide. However, in general,
these two notions are not to be viewed as one and the same concept.
Next, recall that in Chapter 5 we defined a metric space X to be separable
if there is a countable subset everywhere dense in X (see Definition 5.4.33).
Since normed linear spaces and inner product spaces are also metric spaces,
we speak also of separable Banach spaces and separable Hilbert spaces. In
the case of Hilbert spaces, we can characterize separability in the following
equivalent way.

6.13.14. Theorem. A Hilbert space X is separable if and only if it contains
a complete orthonormal sequence.

6.13.15. Exercise. Prove Theorem 6.13.14.

Since in a separable Hilbert space X with a complete orthonormal sequence
{x_i} one can represent every x ∈ X as

x = Σ_{i=1}^∞ (x, x_i) x_i,

we refer to a complete orthonormal sequence {x_i} in a separable Hilbert
space X as a basis for X. Caution should be taken here not to confuse this
concept with the definition of basis introduced in Chapter 3. (See Definitions
3.3.6 and 3.3.22.) In that case we defined each x in a vector space to have a
representation as a finite linear combination of vectors x_i. Indeed, the concept of Hamel basis (see Definition 3.3.22), which is a purely algebraic
concept, is of very little value in spaces which are not finite dimensional. In
such spaces, an orthonormal basis as defined above is much more useful.
We conclude this section with the following result.

6.13.16. Theorem. Let Y be an orthonormal set in a separable Hilbert
space X. Then Y is either a finite set or a countably infinite set.

6.13.17. Exercise. Prove Theorem 6.13.16.

6.14. THE RIESZ REPRESENTATION THEOREM

In this section we state and prove an important result known as the
Riesz representation theorem. A direct consequence of this theorem is that
the dual space X* of a Hilbert space X is itself a Hilbert space. Throughout
this section, {X; (·, ·)} is a Hilbert space.

We begin by first noting that for a fixed y ∈ X,

f(x) = (x, y)    (6.14.1)

is a linear functional on X. By means of (6.14.1) distinct vectors y ∈ X are
associated with distinct functionals. From the Schwarz inequality we have

|(x, y)| ≤ ||x|| ||y||.

Hence, ||f|| ≤ ||y|| and f is bounded (i.e., f ∈ X*). From this it follows that
if X is a Hilbert space, then bounded linear functionals are determined by
the elements of X itself. In the next theorem we show that every element
y of X determines a unique bounded linear functional f (i.e., a unique element
of X*) of the form (6.14.1) and that ||f|| = ||y||. From this we conclude
that the dual space X* of the Hilbert space X is itself a Hilbert space.
(Compare the following with Theorem 4.9.63.)

6.14.2. Theorem (Riesz). Let f be a bounded linear functional on X. Then
there is a unique y ∈ X such that f(x) = (x, y) for all x ∈ X. Moreover,
||f|| = ||y||, and every y determines a unique element of the dual space X*
in this way.

Proof. For fixed y ∈ X, define the linear functional f on X by Eq. (6.14.1).
From the Schwarz inequality we have |f(x)| = |(x, y)| ≤ ||y|| ||x||, so that f
is a bounded linear functional and ||f|| ≤ ||y||. Letting x = y we have |f(y)|
= |(y, y)| = ||y|| ||y||, from which it follows that ||f|| = ||y||.

Next, let f be a bounded linear functional defined on the Hilbert space
X. Let Z be the set of all vectors z ∈ X such that f(z) = 0. By Theorem
3.4.19, Z is a linear subspace of X. Now let {z_n} be a sequence of vectors in
Z, and let x_0 ∈ X be a point of accumulation of {z_n}. In view of the continuity of f we now have 0 = f(z_n) → f(x_0) as n → ∞. Thus, x_0 ∈ Z and Z
is closed.

If Z = X, then for all x ∈ X we have f(x) = 0, and the equality f(x) =
(x, y) = 0 for all x ∈ X holds if and only if y = 0.

Now consider the case Z ⊂ X, X ≠ Z. From above, Z is a closed linear
subspace of X. We can therefore utilize Theorem 6.12.16 to represent X by
the direct sum

X = Z ⊕ Z⊥.

Since Z ⊂ X and Z ≠ X, there exists in view of Theorem 6.12.13 a non-zero
vector u ∈ X such that u ⊥ Z; i.e., u ∈ Z⊥. Also, since u ≠ 0 and since
u ∈ Z⊥, it follows from part (i) of Theorem 6.12.8 that u ∉ Z, and hence
f(u) ≠ 0. Since Z⊥ is a linear subspace of X, we may assume without loss of
generality that f(u) = 1. We now show that u is a scalar multiple of our
desired vector y in Eq. (6.14.1).

For any fixed x ∈ X we can write

f(x − f(x)u) = f(x) − f(x)f(u) = f(x) − f(x) = 0,

and thus (x − f(x)u) ∈ Z. From before, we have u ⊥ Z and hence
(x − f(x)u, u) = 0, or (x, u) = f(x)||u||², or f(x) = (x, u/||u||²). Letting
y = u/||u||² now yields the desired form

f(x) = (x, y).

To show that the vector y is unique we assume that f(x) = (x, y′) and
f(x) = (x, y″) for all x ∈ X. Then (x, y′) − (x, y″) = 0, or (x, y′ − y″) = 0,
or (y′ − y″, x) = 0 for all x ∈ X. It now follows from Theorem 6.11.28 that
y′ = y″. This completes the proof of the theorem. ∎

6.14.3. Exercise. Show that every Hilbert space X is reflexive (refer to
Definition 6.9.8).

6.14.4. Exercise. Two normed linear spaces over the same field are said
to be congruent if they are isomorphic (see Definition 3.4.76) and isometric
(see Definition 5.9.16). Let X be a Hilbert space. Show that X is congruent
to X*.

6.15. SOME APPLICATIONS

We now consider two applications to some of the material of the present
chapter. This section consists of three parts. In the first of these we consider
the problem of approximating elements in a Hilbert space by elements in a
finite-dimensional subspace. In the second part we briefly consider random
variables, while in the third part we concern ourselves with the estimation of
random variables.

A. Approximation of Elements in Hilbert Space
(Normal Equations)

In many applications it is necessary to approximate functions by simpler
ones. This problem can often be implemented by approximating elements
from an appropriate Hilbert space by elements belonging to a suitable linear
subspace. In other words, we need to consider the problem of approximating
a vector x in a Hilbert space X by a vector y_0 in a linear subspace Y of X.

Let y_i ∈ X for i = 1, ..., n, and let Y = V({y_i}) denote the linear subspace of X generated by {y_1, ..., y_n}. Since Y is finite dimensional, it is
closed. Now for any fixed x ∈ X we wish to find that element of Y which
minimizes ||x − y|| for all y ∈ Y. If y_0 ∈ Y is that element, then we say that
y_0 approximates x. We call (x − y_0) the error vector and ||x − y_0|| the error.
Since any vector in Y can be expressed as a linear combination y = α_1 y_1
+ ... + α_n y_n, our problem is reduced to finding the set of α_i, i = 1, ..., n,
for which the error ||x − α_1 y_1 − ... − α_n y_n|| is minimized. But in view of
the classical projection theorem (Theorem 6.12.12), the y_0 ∈ Y which minimizes
the error is unique and, moreover, (x − y_0) ⊥ y_i, i = 1, ..., n. From this
we obtain the n simultaneous linear equations

G^T(y_1, ..., y_n) [α_1, ..., α_n]^T = [(x, y_1), ..., (x, y_n)]^T,    (6.15.1)

where in Eq. (6.15.1) G^T(y_1, ..., y_n) is the transpose of the matrix

G(y_1, ..., y_n) =
[ (y_1, y_1)  ...  (y_1, y_n) ]
[ (y_2, y_1)  ...  (y_2, y_n) ]    (6.15.2)
[    ...      ...     ...     ]
[ (y_n, y_1)  ...  (y_n, y_n) ]

The matrix (6.15.2) is called the Gram matrix of y_1, ..., y_n. The determinant
of (6.15.2) is called the Gram determinant and is denoted by Δ(y_1, ..., y_n).
The equations (6.15.1) are called the normal equations. It is clear that in a
real Hilbert space G(y_1, ..., y_n) = G^T(y_1, ..., y_n), and that in a complex
Hilbert space G(y_1, ..., y_n) = G̅^T(y_1, ..., y_n).

In order to approximate x ∈ X by y_0 ∈ Y we only need to solve Eq.
(6.15.1) for the α_i, i = 1, ..., n. The next result gives conditions under
which Eq. (6.15.1) possesses a unique solution for the α_i.
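For a concrete case the normal equations (6.15.1) can be formed and solved numerically. The sketch below (numpy; the vectors y_1, y_2 and x are arbitrary illustrative choices) computes the best approximation of x from the span of two vectors in R^3 and checks the orthogonality of the error vector:

```python
import numpy as np

# Generators of the subspace Y and the vector to be approximated.
y1 = np.array([1.0, 0.0, 1.0])
y2 = np.array([0.0, 1.0, 1.0])
x  = np.array([1.0, 2.0, 0.0])

Y = np.column_stack([y1, y2])

# Gram matrix G(y1, y2) and right-hand side (x, y_i); in a real Hilbert
# space G = G^T, so Eq. (6.15.1) reads G alpha = b.
G = Y.T @ Y
b = Y.T @ x
alpha = np.linalg.solve(G, b)

y0 = Y @ alpha                     # best approximation of x in Y
# Orthogonality of the error vector: (x - y0) is perpendicular to each y_i.
assert np.allclose(Y.T @ (x - y0), 0.0)
```

Because the y_i here are linearly independent, the Gram matrix is non-singular and the solution of the normal equations is unique, as the next theorem asserts in general.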


6.15.3. Theorem. A set of elements {y_1, ..., y_n} of a Hilbert space X is
linearly independent if and only if the Gram determinant Δ(y_1, ..., y_n) ≠ 0.

Proof. We prove this result by proving the equivalent statement: Δ(y_1,
..., y_n) = 0 if and only if the vectors {y_1, ..., y_n} are linearly dependent.

Assume that {y_1, ..., y_n} is a set of linearly dependent vectors in X.
Then there exists a set of scalars {α_1, ..., α_n}, not all zero, such that

α_1 y_1 + ... + α_n y_n = 0.    (6.15.4)

Taking the inner product of Eq. (6.15.4) with the vectors {y_1, ..., y_n} yields
the n linear equations

α_1 (y_1, y_1) + ... + α_n (y_1, y_n) = 0
  ⋮
α_1 (y_n, y_1) + ... + α_n (y_n, y_n) = 0.    (6.15.5)

Taking the {α_1, ..., α_n} as unknowns, we see that for a non-trivial solution
(α_1, ..., α_n) to exist we must have Δ(y_1, ..., y_n) = 0.

Conversely, assume that Δ(y_1, ..., y_n) = 0. Then a non-trivial solution
(α_1, ..., α_n) exists for Eq. (6.15.5). After rewriting Eq. (6.15.5) we obtain

(Σ_{i=1}^n α_i y_i, Σ_{j=1}^n α_j y_j) = ||Σ_{i=1}^n α_i y_i||² = 0,

which implies that Σ_{i=1}^n α_i y_i = 0. Therefore, the set {y_1, ..., y_n} is linearly
dependent. This completes the proof. ∎

The next result establishes an expression for the error ||x − y_0||. The
proof of this result follows directly from the classical projection theorem.

6.15.6. Theorem. Let X be a Hilbert space, let x ∈ X, let {y_1, ..., y_n} be
a set of linearly independent vectors in X, let Y be the linear subspace of X
generated by {y_1, ..., y_n}, and let y_0 ∈ Y be such that

||x − y_0|| = min_{y ∈ Y} ||x − y|| = min ||x − α_1 y_1 − ... − α_n y_n||.

Then

||x − y_0||² = Δ(y_1, ..., y_n, x) / Δ(y_1, ..., y_n),

where

Δ(y_1, ..., y_n, x) = det
[ (y_1, y_1)  ...  (y_1, y_n)  (y_1, x) ]
[ (y_2, y_1)  ...  (y_2, y_n)  (y_2, x) ]
[    ...      ...     ...        ...    ]
[ (y_n, y_1)  ...  (y_n, y_n)  (y_n, x) ]
[ (x, y_1)    ...  (x, y_n)    (x, x)  ]

6.15.7. Exercise. Prove Theorem 6.15.6.

B. Random Variables

A rigorous development of the theory of probability is based on measure
and integration theory. Since knowledge of this theory by the reader has not
been assumed, a brief discussion of some essential concepts will now be
given.

We begin by introducing some terminology. If Ω is a non-void set, a
family of subsets, ℱ, of Ω is called a σ-algebra (or a σ-field) if (i) for all
E, F ∈ ℱ we have E ∪ F ∈ ℱ and E − F ∈ ℱ, (ii) for any countable
sequence of sets {E_n} in ℱ we have ∪_{n=1}^∞ E_n ∈ ℱ, and (iii) Ω ∈ ℱ. It readily
follows that a σ-algebra is a family of subsets of Ω which is closed under all
countable set operations.

A function P: ℱ → R, where ℱ is a σ-algebra, is called a probability
measure if (i) 0 ≤ P(E) ≤ 1 for all E ∈ ℱ, (ii) P(∅) = 0 and P(Ω) = 1, and
(iii) for any countable collection of sets {E_n} in ℱ such that E_i ∩ E_j = ∅
if i ≠ j, we have P(∪_{n=1}^∞ E_n) = Σ_{n=1}^∞ P(E_n).

A probability space is a triple {Ω, ℱ, P}, where Ω is a non-void set, ℱ is
a σ-algebra of subsets of Ω, and P is a probability measure on ℱ. We call
elements ω ∈ Ω outcomes (usually thought of as occurring at random), and
we call elements E ∈ ℱ events.

A function X: Ω → R is called a random variable if {ω: X(ω) ≤ x} ∈ ℱ
for all x ∈ R. The set {ω: X(ω) ≤ x} is usually written in shorter form as
{X ≤ x}. If X is a random variable, then the function F_X: R → R defined by
F_X(x) = P{X ≤ x} for x ∈ R is called the distribution function of X. If X_i,
i = 1, ..., n, are random variables, we define the random vector X as X =
(X_1, ..., X_n)^T. Also, for x = (x_1, ..., x_n)^T ∈ R^n, the event {X_1 ≤ x_1, ..., X_n
≤ x_n} is defined to be {ω: X_1(ω) ≤ x_1} ∩ {ω: X_2(ω) ≤ x_2} ∩ ... ∩ {ω:
X_n(ω) ≤ x_n}. Furthermore, for a random vector X, the function F_X: R^n → R,
function of .X
defined by F_X(x) = P{X_1 ≤ x_1, ..., X_n ≤ x_n}, is called the distribution
function of X.

If X is a random variable and g is a function, g: R → R, such that the
Stieltjes integral ∫_{−∞}^{∞} g(x) dF_X(x) exists, then the expected value of g(X)
is defined to be E{g(X)} = ∫_{−∞}^{∞} g(x) dF_X(x). Similarly, if X is a random
vector and if g is a function, g: R^n → R, such that ∫_{R^n} g(x) dF_X(x) exists,
then the expected value of g(X) is defined to be E{g(X)} = ∫_{R^n} g(x) dF_X(x).
Some of the expected values of primary interest are E(X), the expected value
of X; E(X²), the second moment of X; and E{[X − E(X)]²}, the variance of X.

If we let ℒ₂ denote the family of random variables defined on a probability
space {Ω, ℱ, P} such that E(X²) < ∞, then this space is a vector space over
R with the usual definition of addition and multiplication by a scalar. We
say two random variables, X_1 and X_2, are equal almost surely if P{ω: X_1(ω)
≠ X_2(ω)} = 0. If we let L₂ denote the family of equivalence classes of all
random variables which are almost surely equal (as in Example 5.5.31),
then {L₂; (·, ·)} is a real Hilbert space where the inner product is defined by

(X, Y) = E(XY)

for X, Y ∈ L₂.

Throughout the remainder of this section, we let {Ω, ℱ, P} denote our
underlying probability space, and we assume that all random variables belong
to the Hilbert space L₂ with inner product (X, Y) = E(XY).

C. Estimation of Random Variables

The special class of estimation problems which we consider may be
formulated as follows: given a set of random variables {Y_1, ..., Y_m}, find
the best estimate of another random variable, X. The sense in which an
estimate is "best" will be defined shortly. Here we view the set {Y_1, ..., Y_m}
to be observations and the random variable X as the unknown.

For any mapping f: R^m → R such that f(Y_1, ..., Y_m) ∈ L₂ for all observations {Y_1, ..., Y_m}, we call X̂ = f(Y_1, ..., Y_m) an estimate of X. If f is
linear, we call X̂ a linear estimate.

Next, let f be linear; i.e., let f be a linear functional on R^m. Then there is
a vector a^T = (a_1, ..., a_m) ∈ R^m such that f(y) = a^T y for all y^T = (η_1, ...,
η_m) ∈ R^m. Now a linear estimate, X̂ = a_1 Y_1 + ... + a_m Y_m, is called the
best linear estimate of X, given {Y_1, ..., Y_m}, if E{[X − a_1 Y_1 − ... −
a_m Y_m]²} is minimum with respect to a ∈ R^m.

The classical projection theorem (see Theorem 6.12.12) tells us that the
best linear estimate of X is the projection of X onto the linear vector space
V({Y_1, ..., Y_m}). Furthermore, Eq. (6.15.1) gives us the explicit form for
a_i, i = 1, ..., m. We are now in a position to summarize the above discussion in the following theorem, which is usually called the orthogonality
principle.

6.15.8. Theorem. Let X, Y_1, ..., Y_m belong to L₂. Then X̂ = a_1 Y_1 + ...
+ a_m Y_m is the best linear estimate of X if and only if {a_1, ..., a_m} are such
that E{[X − X̂]Y_i} = 0 for i = 1, ..., m.

We also have the following result.

6.15.9. Corollary. Let X, Y_1, ..., Y_m belong to L₂. Let G = [γ_ij], where
γ_ij = E{Y_i Y_j} for i, j = 1, ..., m, and let b^T = (β_1, ..., β_m) ∈ R^m, where
β_i = E{X Y_i} for i = 1, ..., m. If G is non-singular, then X̂ = a_1 Y_1 + ...
+ a_m Y_m is the best linear estimate of X if and only if a^T = b^T G^{−1}.

6.15.10. Exercise. Prove Theorem 6.15.8 and Corollary 6.15.9.

Let us now consider a specific case.

6.15.11. Example. Let X, V_1, ..., V_m be random variables in L₂ such that
E{X} = E{V_i} = E{XV_i} = 0 for i = 1, ..., m, and let R = [ρ_ij] be non-singular, where ρ_ij = E[V_i V_j] for i, j = 1, ..., m. Suppose that the measurements {Y_1, ..., Y_m} of X are given by Y_i = X + V_i for i = 1, ..., m.
Then we have E{Y_i Y_j} = E{[X + V_i][X + V_j]} = σ² + ρ_ij for i, j = 1,
..., m, where σ² ≜ E{X²}. Also, E{XY_i} = E{X(X + V_i)} = σ² for i = 1,
..., m. Thus, G = [γ_ij], where γ_ij = σ² + ρ_ij for i, j = 1, ..., m, b^T =
(β_1, ..., β_m), where β_i = σ² for i = 1, ..., m, and a^T = b^T G^{−1}.

6.15.12. Exercise. In the preceding example, show that if ρ_ij = σ_v² δ_ij for
i, j = 1, ..., m, where δ_ij is the Kronecker delta, then

a_i = σ² / (mσ² + σ_v²)  for i = 1, ..., m.
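The closed form in the preceding exercise is easy to confirm numerically. In the sketch below (numpy; the values of m, σ², and σ_v² are arbitrary illustrative choices), G and b are built as in Example 6.15.11 with ρ_ij = σ_v² δ_ij, and the resulting weights are compared against a_i = σ²/(mσ² + σ_v²):

```python
import numpy as np

m = 4                 # number of measurements (illustrative)
sigma2 = 2.0          # sigma^2 = E{X^2}
sigma2_v = 0.5        # noise variance: rho_ij = sigma2_v * delta_ij

# G and b from Example 6.15.11: gamma_ij = sigma2 + rho_ij, beta_i = sigma2.
G = sigma2 * np.ones((m, m)) + sigma2_v * np.eye(m)
b = sigma2 * np.ones(m)

# a^T = b^T G^{-1}; since G is symmetric this is just solving G a = b.
a = np.linalg.solve(G, b)

# Closed form from Exercise 6.15.12: a_i = sigma2 / (m*sigma2 + sigma2_v).
assert np.allclose(a, sigma2 / (m * sigma2 + sigma2_v))
```

Note that the weights are equal across measurements, as symmetry of the model suggests, and shrink toward zero as the noise variance grows.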

The next result provides us with a useful means for finding the best linear
estimate of a random variable X, given a set of random variables {Y_1, ...,
Y_k}, if we already have the best linear estimate, given {Y_1, ..., Y_{k−1}}.

6.15.13. Theorem. Let k ≥ 2, and let Y_1, ..., Y_k be random variables
in L₂. Let 𝒴_j = V({Y_1, ..., Y_j}), the linear vector space generated by the
random variables {Y_1, ..., Y_j}, for 1 ≤ j ≤ k. Let Ŷ_k(k − 1) denote the
best linear estimate of Y_k given {Y_1, ..., Y_{k−1}}, and let Ỹ_k(k − 1) = Y_k −
Ŷ_k(k − 1). Then 𝒴_k = 𝒴_{k−1} ⊕ V({Ỹ_k(k − 1)}).

Proof. By the classical projection theorem (see Theorem 6.12.12), Ỹ_k(k − 1)
⊥ 𝒴_{k−1}. Now for arbitrary Z ∈ 𝒴_k, we must have Z = c_1 Y_1 + ... +
c_{k−1} Y_{k−1} + c_k Y_k for some (c_1, ..., c_k). We can rewrite this as Z = Z_1 + Z_2,
where Z_1 = c_1 Y_1 + ... + c_{k−1} Y_{k−1} + c_k Ŷ_k(k − 1) and Z_2 = c_k Ỹ_k(k − 1).
Since Z_1 ∈ 𝒴_{k−1} and Z_2 ⊥ 𝒴_{k−1}, it follows from Theorem 6.12.12 that Z_1
and Z_2 are unique. Since Z_1 ∈ 𝒴_{k−1} and Z_2 ∈ V({Ỹ_k(k − 1)}), the theorem
is proved. ∎

We can extend the problem of estimation of (scalar) random variables
to random vectors. Let X_1, ..., X_n be random variables in L₂, and let X =
(X_1, ..., X_n)^T be a random vector. Let Y_1, ..., Y_m be random variables in
L₂. We call X̂ = (X̂_1, ..., X̂_n)^T the best linear estimate of X, given {Y_1, ...,
Y_m}, if X̂_i is the best linear estimate of X_i, given {Y_1, ..., Y_m}, for i = 1, ...,
n. Clearly, the orthogonality principle must hold for each X̂_i; i.e., we must
have E{(X_i − X̂_i)Y_j} = 0 for i = 1, ..., n and j = 1, ..., m. In this case
X̂ can be expressed as X̂ = AY, where A is an (n × m) matrix of real numbers
and Y = (Y_1, ..., Y_m)^T. Corollary 6.15.9 assumes now the following matrix
form.

6.15.14. Theorem. Let X_1, ..., X_n, Y_1, ..., Y_m be random variables in
L₂. Let G = [γ_ij], where γ_ij = E{Y_i Y_j} for i, j = 1, ..., m, and let B =
[β_ij], where β_ij = E{X_i Y_j} for i = 1, ..., n and j = 1, ..., m. If G is non-singular, then
X̂ = AY is the best linear estimate of X, given Y, if and only if A = BG^{−1}.

6.15.15. Exercise. Prove Theorem 6.15.14.

We note that B and G in the above theorem can be written in an alternate
way. That is, we can say that

X̂ = E{XY^T}[E{YY^T}]^{−1} Y    (6.15.16)

is the best linear estimate of X. By the expected value of a matrix of random
variables, we mean the expected value of each element of the matrix.
In the remainder of this section we apply the preceding development to
dynamic systems.

Let J = {1, 2, ...} denote the set of positive integers. We use the notation
{X(k)} to denote a sequence of random vectors; i.e., X(k) is a random vector
for each k ∈ J. Let {U(k)} be a sequence of random vectors, U(k) = [U_1(k),
..., U_p(k)]^T, with the properties

E{U(k)} = 0    (6.15.17)

and

E{U(k)U^T(j)} = Q(k)δ_jk    (6.15.18)

for all j, k ∈ J, where Q(k) is a symmetric positive definite (p × p) matrix
for all k ∈ J. Next, let {V(k)} be a sequence of random vectors, V(k) =
[V_1(k), ..., V_m(k)]^T, with the properties

E{V(k)} = 0    (6.15.19)

and

E{V(k)V^T(j)} = R(k)δ_jk    (6.15.20)

for all j, k ∈ J, where R(k) is a symmetric positive definite (m × m) matrix
for all k ∈ J.

Now let X(1) be a random vector, X(1) = [X_1(1), ..., X_n(1)]^T, with the
properties

E{X(1)} = 0    (6.15.21)

and

E{X(1)X^T(1)} = P(1),    (6.15.22)

where P(1) is an (n × n) symmetric positive definite matrix. We assume
further that the relationships among the random vectors are such that

E{U(k)V^T(j)} = 0,    (6.15.23)

E{X(1)U^T(k)} = 0,    (6.15.24)

and

E{X(1)V^T(k)} = 0    (6.15.25)

for all k, j ∈ J.


Next, let A(k) be a real (n x n) matrix for each k E ,J let B(k) be a real
(n x p) matrix for each k E ,J and let C(k) be a real (m x n) matrix for
each k E .J We let {X(k)}
and (Y{ k)}
be the sequences of random vectors
generated by the difference eq u ations
and

(X k

Y(k)

1)

A(k)X(k)
C(k)X(k)

B(k)U(k)

(6.15.26)

V(k)

(6.15.27)

for k = 1,2, ....


We are now in a position to consider the following estimation problem:
... , Y(k)}, find the best linear estimate of
given the set of observations, (Y{ I),
the random vector (X k). We could view the observed random variables as
Y [ (I), yT(2), ... ,YT(k)],
and apply
a single random vector, say cyT = T
Theorem 6.15.14; however, it turns out that a rather elegant and significant
algorithm exists for this problem, due to R. E. Kalman, which we consider
next.
In the following, we adopt some additional convenient notation. F o r
each k,j E ,J we let t(j Ik) denote the best linear estimate of X ( j), given
(Y{ I),
... , (Y k)}.
This notation is valid for j < k and j;;::: k; however, we
shall limit our attention to the situation where j ;;::: k. In the present context,
a recursive algorithm means that ~(k + I Ik + I) is a function only of
~(k Ik) and Y ( k + I). The following theorem, which is the last result of this
section, provides the desired algorithm explicitly.

I Normed Spaces and Inner Product Spaces

Chapter 6

6.15.28. Theorem (Kalman). Given the foregoing assumptions for the
dynamic system described by Eqs. (6.15.26) and (6.15.27), the best linear
estimate of X(k), given {Y(1), ..., Y(k)}, is provided by the following set of
difference equations:

X̂(k|k) = X̂(k|k − 1) + K(k)[Y(k) − C(k)X̂(k|k − 1)]    (6.15.29)

and

X̂(k + 1|k) = A(k)X̂(k|k),    (6.15.30)

where

K(k) = P(k|k − 1)C^T(k)[C(k)P(k|k − 1)C^T(k) + R(k)]^{−1},    (6.15.31)

P(k|k) = [I − K(k)C(k)]P(k|k − 1),    (6.15.32)

and

P(k + 1|k) = A(k)P(k|k)A^T(k) + B(k)Q(k)B^T(k)    (6.15.33)

for k = 1, 2, ..., with initial conditions

X̂(1|0) = 0 and P(1|0) = P(1).

Proof Assume that i(kl k - I) is known for k E .J We may interpret


i(lIO) as the best linear estimate of X(l), given no observations. We wish

to find i(k Ik) and i(k + 11 k). It follows from Theorem 6.15.13 (extended
to the case of random vectors) that there is a matrix K ( k) such that i(k I k)
= i(kl k - I) + K ( k)f(kl k - 1), where f(kl k - 1) = Y ( k) - t(kl k
- I), and t(k Ik - I) is the best linear estimate of Y ( k), given {Y(l),
... ,
Y ( k - I)}. It follows immediately from Eqs. (6.15.23) and (6.15.27) and the
orthogonality principle that t(k I k - 1) = C(k)i(k I k - I). Thus, we have
shown that Eq. (6.15.29) must be true. In order to determine K ( k), let
X ( kl k - 1) = X ( k) - X ( kl k - I). Then it follows from Eqs. (6.15.26) and
(6.15.29) that

(X kl

k)

X ( kl

k -

I) -

K ( k)[ C (k)X ( kl

k -

I) +

V(k)] .

To satisfy the orthogonality principle, we must have E{ X ( k I k)Y T (j)} = 0 for


j = 1, ... , k. We see that this is satisfied for any K ( k) for j = 1, ... , k - 1.
In order to satisfy E(X ( k Ik)YT(k)} = 0, K ( k) must satisfy
0=

E{ X ( k

Ik -

l)YT(k)}

K ( k)[ C (k)E{ X ( k

Let us first consider the term


E{ X ( k

Ik -

I)YT(k)}

E(X ( kl

k -

Ik -

l)Y T (k)}

E{V(k)YT(k)}].

I)X T (k)C T(k)

X ( kl

k -

(6.15.34)
l)VT(k)}.

(6.15.35)

We observe that X(k), the solution to the difference eq u ation (6.15.26) at


(time) k, is a linear combination of X ( l) and U ( l), ... , U ( k - 1). In view

6.15. Some Applications

04 3

of Eqs. (6.15.23) and (6.15.25) it follows that E{X(j)VT(k)}


= 0 for all
k,j E .J Hence, E{ X ( kl k - I)VT(k)} = 0, since X ( kl k - I) is a linear
Y ), ... , Y(k - I).
combination of X(k) and O
Next, we consider the term

E{ X ( kl

k-

= E{ X ( kl

I)XT(k)}

k-

l)[XT(k)

iT(kl k -

iT(k Ik -

1)' +

I)]}

= E { X ( klk- I )[ X T (klk- l ).+ i T(klk- I )] }

P(kl

k-

I)

E{X(kl

I)

(6.15.36)

where

P(kl k and E{ X ( klk


tion of O
Y { ),

t::.

l)iT(klk' - I)} =
I)} .

k-

I)}

I)X T (klk -

0, since i(klk - I ) is a linear combina-

... , Y(k -

Now consider

Using

.+

= E{V(k)[TX (k)CT(k)

E{V(k)YT(k)}

= R(k).

VT(k)J}

(6.15.37)

Eqs. (6.15.35), (6.15.36), and (6.15.37), Eq. (6.15.34) becomes

0=

P(kl k -

I)CT(k) -

K(k)[C(k)P(kl

k-

l)CT(k)

.+

R(k)].

(6.15.38)

Solving for (K k), we obtain Eq. (6.15.31).


To obtain Eq. (6.15.32), let X(k I k) = i(k) - X(k Ik) and P(k Ik) =
ErX ( k I k)XT(k I k)}. In view of Eqs. (6.15.27) and (6.15.29) we have

X ( kl k) =

X ( kl k -

1) -

K(k)[C(k)X(kl

Ik -

= [ I - K(k)C(k)]X(k

k-

1) -

1)

V(k)]

(K k)V(k).

F r om this it follows that


P(kl k) =

I[ -

(K k)C(k)JP(kl
I[ -

= I[ -

K ( k)C(k)] P (kl

K(k)C(k)]P(k
P
{ (k

k-

Ik -

1)

k-

Ik -

I)CT(k) -

I)CT(k)KT(k)

(K k)R(k)KT(k)

1)

K(k)[C(k)P(k

Ik -

I)CT(k)

.+ R(k)J}

T
K (k).

U s ing Eq. (6.15.38), it follows that Eq. (6.15.32) must be true.


To show that X̂(k+1|k) is given by Eq. (6.15.30), we simply show that
the orthogonality principle is satisfied. That is,

E{[X(k+1) - A(k)X̂(k|k)]Yᵀ(j)}
    = E{A(k)[X(k) - X̂(k|k)]Yᵀ(j)} + E{B(k)U(k)Yᵀ(j)} = 0

for j = 1, ..., k.

Finally, to verify Eq. (6.15.33), we have from Eqs. (6.15.26) and (6.15.30)

X̃(k+1|k) = A(k)X̃(k|k) + B(k)U(k).

From this, Eq. (6.15.33) follows immediately. We note that X̂(1|0) = 0 and
P(1|0) = P(1). This completes the proof. ■
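The recursion just established can be sketched numerically as follows. This is a minimal sketch, not the text's notation: the function name is invented for illustration, and it assumes Q(k) denotes the covariance E{U(k)Uᵀ(k)} of the disturbance, so that the prediction covariance takes the form A(k)P(k|k)Aᵀ(k) + B(k)Q(k)Bᵀ(k).

```python
import numpy as np

def kalman_step(x_pred, P_pred, y, A, B, C, Q, R):
    """One cycle: measurement update at time k, then prediction to k + 1.

    x_pred and P_pred play the roles of X-hat(k|k-1) and P(k|k-1).
    """
    # Gain, Eq. (6.15.31): K(k) = P(k|k-1)C^T(k)[C(k)P(k|k-1)C^T(k) + R(k)]^(-1)
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
    # Filtered estimate: X-hat(k|k) = X-hat(k|k-1) + K(k)[Y(k) - C(k)X-hat(k|k-1)]
    x_filt = x_pred + K @ (y - C @ x_pred)
    # Filtered covariance, Eq. (6.15.32): P(k|k) = [I - K(k)C(k)]P(k|k-1)
    P_filt = (np.eye(len(x_pred)) - K @ C) @ P_pred
    # One-step prediction and its error covariance (assumed form, see lead-in)
    return A @ x_filt, A @ P_filt @ A.T + B @ Q @ B.T

# Scalar illustration: A = B = C = 1, Q = 0.1, R = 1, X-hat(1|0) = 0, P(1|0) = 1.
A = np.array([[1.0]]); B = np.array([[1.0]]); C = np.array([[1.0]])
Q = np.array([[0.1]]); R = np.array([[1.0]])
x, P = kalman_step(np.array([0.0]), np.array([[1.0]]), np.array([1.0]),
                   A, B, C, Q, R)
print(x, P)   # gain is 0.5, so x = [0.5] and P = [[0.6]]
```

With these scalar values the gain is P/(P + R) = 1/2, the filtered variance is (1 - 1/2)·1 = 0.5, and the predicted variance is 0.5 + 0.1 = 0.6, matching the printed output.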

6.16. NOTES AND REFERENCES

The material of the present chapter as well as that of the next chapter
constitutes part of what usually goes under the heading of functional analysis.
Thus, these two chapters should be viewed as a whole rather than two separate
parts.
There are numerous excellent sources dealing with Hilbert and Banach
spaces. We cite a representative sample of these which the reader should
consult for further study. References [6.6]-[6.8], [6.10], and [6.12] are at an
introductory or intermediate level, whereas references [6.2]-[6.4] and [6.13]
are at a more advanced level. The books by Dunford and Schwartz and by
Hille and Phillips are standard and encyclopedic references on functional
analysis; the text by Yosida constitutes a concise treatment of this subject,
while the monograph by Halmos contains a compact exposition on Hilbert
space. The book by Taylor is a standard reference on functional analysis at
the intermediate level. The texts by Kantorovich and Akilov, by Kolmogorov
and Fomin, and by Liusternik and Sobolev are very readable presentations
of this subject. The book by Naylor and Sell, which presents a very nice
introduction to functional analysis, includes some interesting examples. For
references with applications of functional analysis to specific areas, including
those in Section 6.15, see, e.g., Byron and Fuller [6.1], Kalman et al. [6.5],
Luenberger [6.9], and Porter [6.11].

REFERENCES

[6.1]  F. W. BYRON and R. W. FULLER, Mathematics of Classical and Quantum
       Physics. Vols. I, II. Reading, Mass.: Addison-Wesley Publishing Co., Inc.,
       1969 and 1970.*
[6.2]  N. DUNFORD and J. SCHWARTZ, Linear Operators. Parts I and II. New York:
       Interscience Publishers, 1958 and 1964.
[6.3]  P. R. HALMOS, Introduction to Hilbert Space. New York: Chelsea Publishing
       Company, 1957.
[6.4]  E. HILLE and R. S. PHILLIPS, Functional Analysis and Semi-Groups.
       Providence, R.I.: American Mathematical Society, 1957.
[6.5]  R. E. KALMAN, P. L. FALB, and M. A. ARBIB, Topics in Mathematical System
       Theory. New York: McGraw-Hill Book Company, 1969.
[6.6]  L. V. KANTOROVICH and G. P. AKILOV, Functional Analysis in Normed
       Spaces. New York: The Macmillan Company, 1964.
[6.7]  A. N. KOLMOGOROV and S. V. FOMIN, Elements of the Theory of Functions
       and Functional Analysis. Vols. I, II. Albany, N.Y.: Graylock Press, 1957
       and 1961.
[6.8]  L. A. LIUSTERNIK and V. J. SOBOLEV, Elements of Functional Analysis. New
       York: Frederick Ungar Publishing Company, 1961.
[6.9]  D. G. LUENBERGER, Optimization by Vector Space Methods. New York:
       John Wiley & Sons, Inc., 1969.
[6.10] A. W. NAYLOR and G. R. SELL, Linear Operator Theory. New York: Holt,
       Rinehart and Winston, 1971.
[6.11] W. A. PORTER, Modern Foundations of Systems Engineering. New York:
       The Macmillan Company, 1966.
[6.12] A. E. TAYLOR, Introduction to Functional Analysis. New York: John Wiley
       & Sons, Inc., 1958.
[6.13] K. YOSIDA, Functional Analysis. Berlin: Springer-Verlag, 1965.

*Reprinted in one volume by Dover Publications, Inc., New York, 1992.

LINEAR OPERATORS

In the present chapter we concern ourselves with linear operators defined


on Banach and Hilbert spaces and we study some of the important properties
of such operators. We also consider selected applications in this chapter.
This chapter consists of ten parts. Throughout, we consider primarily
bounded linear operators, which we introduce in the first section. In the second
section we look at inverses of linear transformations, in section three we
introduce conjugate and adjoint operators, and in section four we study
hermitian operators. In the fifth section we present additional special linear
transformations, including normal operators, projections, unitary operators,
and isometric operators. The spectrum of an operator is considered in the
sixth, while completely continuous operators are introduced in the seventh
section. In the eighth section we present one of the main results of the present
chapter, the spectral theorem for completely continuous normal operators.
Finally, in section nine we study differentiation of operators (which need not
be linear) defined on Banach and Hilbert spaces.
Section ten, which consists of three subsections, is devoted to selected
topics in applications. Items touched upon include applications to integral
equations, an example from optimal control, and minimization of functionals
(method of steepest descent). The chapter is concluded with a brief discussion
of pertinent references in the eleventh section.

7.1. BOUNDED LINEAR TRANSFORMATIONS

Throughout this section X and Y denote vector spaces over the same field
F, where F is either R (the real numbers) or C (the complex numbers).
We begin by pointing to several concepts considered previously. Recall
from Chapter 1 that a transformation or operator T is a mapping of a subset
𝔇(T) of X into Y. Unless specified to the contrary, we will assume that
X = 𝔇(T). Since a transformation is a mapping we distinguish, as in Chapter
1, between operators which are onto or surjective, one-to-one or injective,
and one-to-one and onto or bijective. If T is a transformation of X into Y, we
write T: X → Y. If x ∈ X we call y = T(x) the image of x in Y under T, and
if V ⊂ X we define the image of the set V in Y under T as the set

T(V) = {y ∈ Y: y = T(v), v ∈ V ⊂ X}.

On the other hand, if W ⊂ Y, then the inverse image of the set W under T is
the set

T⁻¹(W) = {x ∈ X: y = T(x) ∈ W ⊂ Y}.

We define the range of T, denoted ℜ(T), by

ℜ(T) = {y ∈ Y: y = T(x), x ∈ X};

i.e., ℜ(T) = T(X). Recall that if a transformation T of X into Y is injective,
then the inverse of T, denoted T⁻¹, exists (see Definition 1.2.9). Thus, if
y = T(x) and if T is injective, then x = T⁻¹(y).
In Definition 3.4.1 we defined a linear operator (or a linear transformation)
as a mapping T of X into Y having the property that

(i) T(x + y) = T(x) + T(y) for all x, y ∈ X; and
(ii) T(αx) = αT(x) for all α ∈ F and all x ∈ X.

As in Chapter 3, we denote the class of all linear transformations from
X into Y by L(X, Y). Also, in the case of linear transformations we write
Tx in place of T(x).

Of great importance are bounded linear operators, which turn out also to
be continuous. We have the following definition.

7.1.1. Definition. Let X and Y be normed linear spaces. A linear operator
T: X → Y is said to be bounded if there is a real number γ > 0 such that

||Tx||_Y ≤ γ||x||_X

for all x ∈ X.

The notation ||x||_X indicates that the norm on X is used, while the
notation ||Tx||_Y indicates that the norm on Y is employed. However, since
the norms of the various spaces are usually understood, it is customary to
drop the subscripts and simply write ||x|| and ||Tx||.


Our first result allows us to characterize a bounded linear operator in an
equivalent way.

7.1.2. Theorem. Let T ∈ L(X, Y). Then T is bounded if and only if T
maps the unit sphere into a bounded subset of Y.

7.1.3. Exercise. Prove Theorem 7.1.2.

In Chapter 5 we introduced continuous functions (see Definition 5.7.1).
The definition of continuity of an operator in the setting of normed linear
spaces can now be rephrased as follows.

7.1.4. Definition. An operator T: X → Y (not necessarily linear) is said
to be continuous at a point x₀ ∈ X if for every ε > 0 there is a δ > 0 such that

||T(x) - T(x₀)|| < ε

whenever ||x - x₀|| < δ.

The reader can readily prove the next result.

7.1.5. Theorem. Let T ∈ L(X, Y). If T is continuous at a single point
x₀ ∈ X, then it is continuous at all x ∈ X.

7.1.6. Exercise. Prove Theorem 7.1.5.

In this chapter we will mainly concern ourselves with bounded linear
operators. Our next result shows that in the case of linear operators
boundedness and continuity are equivalent.

7.1.7. Theorem. Let T ∈ L(X, Y). Then T is continuous if and only if it
is bounded.

Proof. Assume that T is bounded, and let γ be such that ||Tx|| ≤ γ||x|| for
all x ∈ X. Now consider a sequence {xₙ} in X such that xₙ → 0 as n → ∞.
Then ||Txₙ|| ≤ γ||xₙ|| → 0 as n → ∞, and hence T is continuous at the point
x = 0 ∈ X. From Theorem 7.1.5 it follows that T is continuous at all points
x ∈ X.
Conversely, assume that T is continuous at x = 0, and hence at all x ∈ X.
Since T0 = 0 we can find a δ > 0 such that ||Tx|| < 1 whenever ||x|| ≤ δ.
For any x ≠ 0 we have ||(δx)/||x|| || = δ, and hence

||Tx|| = (||x||/δ) ||T(δx/||x||)|| ≤ (1/δ)||x||.

If we let γ = 1/δ, then ||Tx|| ≤ γ||x||, and T is bounded. ■

Now let S, T ∈ L(X, Y). In Eq. (3.4.42) we defined the sum of linear
operators (S + T) by

(S + T)x = Sx + Tx, x ∈ X,

and in Eq. (3.4.43) we defined multiplication of T by a scalar α ∈ F as

(αT)x = α(Tx), x ∈ X, α ∈ F.

We also recall (see Eq. (3.4.44)) that the zero transformation, 0, of X into Y
is defined by 0x = 0 for all x ∈ X and that the negative of a transformation
T, denoted by -T, is defined by (-T)x = -Tx for all x ∈ X (see Eq.
(3.4.45)). Furthermore, the identity transformation I ∈ L(X, X) is defined
by Ix = x for all x ∈ X (see Eq. (3.4.46)). Referring to Theorem 3.4.47, we
recall that L(X, Y) is a linear space over F.
Next, let X, Y, Z be vector spaces over F, and let S ∈ L(Y, Z) and
T ∈ L(X, Y). The product of S and T, denoted by ST, was defined in Eq.
(3.4.50) as the mapping of X into Z such that

(ST)x = S(Tx), x ∈ X.

It can readily be shown that ST ∈ L(X, Z). Furthermore, if X = Y = Z,
then L(X, X) is an associative algebra with identity I (see Theorem 3.4.59).
Note, however, that the algebra L(X, X) is, in general, not commutative
because, in general,

ST ≠ TS.

In the following, we will use the notation B(X, Y) to denote the set of
all bounded linear transformations from X into Y; i.e.,

B(X, Y) = {T ∈ L(X, Y): T is bounded}.   (7.1.8)

The reader should have no difficulty in proving the next theorem.

7.1.9. Theorem. The space B(X, Y) is a linear space over F.

7.1.10. Exercise. Prove Theorem 7.1.9.

Next, we wish to define a norm on B(X, Y).

7.1.11. Definition. Let T ∈ B(X, Y). The norm of T, denoted ||T||, is
defined by

||T|| = inf{γ: ||Tx|| ≤ γ||x|| for all x ∈ X}.   (7.1.12)

Note that ||T|| is finite and that

||Tx|| ≤ ||T|| ||x||

for all x ∈ X. In proving that the function || · ||: B(X, Y) → R satisfies all
the axioms of a norm (see Definition 6.1.1), we need the following result.


7.1.13. Theorem. Let T ∈ B(X, Y). Then ||T|| can equivalently be
expressed in any one of the following forms:

(i)   ||T|| = inf{γ: ||Tx|| ≤ γ||x|| for all x ∈ X};
(ii)  ||T|| = sup{||Tx||/||x||: x ∈ X, x ≠ 0};
(iii) ||T|| = sup{||Tx||: x ∈ X, ||x|| = 1}; and
(iv)  ||T|| = sup{||Tx||: x ∈ X, ||x|| ≤ 1}.

7.1.14. Exercise. Prove Theorem 7.1.13.

We now show that the function || · || defined in Eq. (7.1.12) satisfies all
the axioms of a norm.

7.1.15. Theorem. The linear space B(X, Y) is a normed linear space (with
norm defined by Eq. (7.1.12)); i.e.,

(i)   for every T ∈ B(X, Y), ||T|| ≥ 0, and ||T|| = 0 if and only if T = 0;
(ii)  ||S + T|| ≤ ||S|| + ||T|| for every S, T ∈ B(X, Y); and
(iii) ||αT|| = |α| ||T|| for every T ∈ B(X, Y) and for every α ∈ F.

Proof. The proof of part (i) is obvious. To verify (ii) we note that

||(S + T)x|| = ||Sx + Tx|| ≤ ||Sx|| + ||Tx|| ≤ (||S|| + ||T||)||x||

for all x ∈ X. If x = 0, then we are finished. If x ≠ 0, then

||S + T|| = sup_{x≠0} ||(S + T)x||/||x|| ≤ ||S|| + ||T||.

We leave the proof of part (iii), which is similar, as an exercise. ■


F o r the space B(X ,

B(X,

X),

then ST

IISTII < IISII 1 1


F o r each x

B(X,

IISTII =
completing the proof.

B(X,

X)

and

Til

X we have

II (ST)x II = II S(Tx) II < IISII11


which shows that ST

we have the following results.

X)

7.1.16. Theorem. If S, T

Proof

O.

X).

sup
","0

If x t= =

Tx l l

< IISIIIITIIllxll,

0, then

II(ST)xll < IISIIIITII,


IIxll

7.1.17. Theorem. Let I denote the identity operator on X. Then I ∈
B(X, X), and ||I|| = 1.

7.1.18. Exercise. Prove Theorem 7.1.17.

We now consider some specific cases.


7.1.19. Example. Let X = l₂, the Banach space of Example 6.1.6. For
x = (ξ₁, ξ₂, ...) ∈ X, let us define T: X → X by

Tx = (0, ξ₂, ξ₃, ...).

The reader can readily verify that T is a linear operator which is neither
injective nor surjective. We see that

||Tx||² = Σ_{i=2}^∞ |ξᵢ|² ≤ Σ_{i=1}^∞ |ξᵢ|² = ||x||².

Thus, T is a bounded linear operator. To compute ||T|| we observe that
||Tx|| ≤ ||x||, which implies that ||T|| ≤ 1. Choosing, in particular, x =
(0, 1, 0, ...) ∈ X, we have ||Tx|| = ||x|| = 1 and

||Tx|| ≤ ||T|| ||x|| = ||T||.

Thus, it must be that ||T|| = 1. ■
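A finite-dimensional sketch of this computation (truncating the l₂ sequence to a numpy vector is an assumption for illustration): the operator annihilates the first coordinate, so it never increases the norm, and it attains equality on x = (0, 1, 0, ...).

```python
import numpy as np

def T(x):
    """Tx = (0, xi2, xi3, ...): zero out the first coordinate."""
    y = x.copy()
    y[0] = 0.0
    return y

x = np.array([3.0, 4.0, 0.0])
print(np.linalg.norm(T(x)), np.linalg.norm(x))   # 4.0 5.0, so ||Tx|| <= ||x||
e2 = np.array([0.0, 1.0, 0.0])
print(np.linalg.norm(T(e2)))                     # 1.0 = ||e2||, so ||T|| = 1
```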

7.1.20. Example. Let X = C[a, b], and let || ||_∞ be the norm on C[a, b]
defined in Example 6.1.9. Let k: [a, b] × [a, b] → R be a real-valued function,
continuous on the square a ≤ s ≤ b, a ≤ t ≤ b. Define the operator
T: X → X by

[Tx](s) = ∫ₐᵇ k(s, t)x(t) dt

for x ∈ X. Then T ∈ L(X, X) (see Example 3.4.6). Then

||Tx|| = sup_{a≤s≤b} |∫ₐᵇ k(s, t)x(t) dt|
       ≤ [sup_{a≤s≤b} ∫ₐᵇ |k(s, t)| dt] [sup_{a≤t≤b} |x(t)|] = γ₀||x||,

where γ₀ = sup_{a≤s≤b} ∫ₐᵇ |k(s, t)| dt. This shows that T ∈ B(X, X) and
that ||T|| ≤ γ₀. It can, in fact, be shown that ||T|| = γ₀. ■
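The bound γ₀ is easy to estimate numerically. The kernel k(s, t) = st on [0, 1] × [0, 1] is an assumption chosen for illustration; for it, γ₀ = sup_s ∫₀¹ |st| dt = sup_s s/2 = 1/2.

```python
import numpy as np

a, b, n = 0.0, 1.0, 2001
s = np.linspace(a, b, n)
t = np.linspace(a, b, n)
K = np.abs(np.outer(s, t))                # |k(s, t)| sampled on a grid
row_integrals = K.mean(axis=1) * (b - a)  # Riemann estimate of the integral of |k(s, .)|
gamma0 = row_integrals.max()              # sup over s
print(gamma0)                             # close to 0.5
```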

For norms of linear operators on finite-dimensional spaces, we have the
following important result.

7.1.21. Theorem. Let T ∈ L(X, Y). If X is finite dimensional, then T is
continuous.

Proof. Let {x₁, ..., xₙ} be a basis for X. For each x ∈ X there is a unique
set of scalars {ξ₁, ..., ξₙ} such that x = ξ₁x₁ + ... + ξₙxₙ. If we define
the linear functionals fᵢ: X → F by fᵢ(x) = ξᵢ, i = 1, ..., n, then by Theorem
6.6.1 we know that each fᵢ is a continuous linear functional. Thus, there
exists a set of real numbers {η₁, ..., ηₙ} such that |fᵢ(x)| ≤ ηᵢ||x|| for i =
1, ..., n. Now

Tx = ξ₁Tx₁ + ... + ξₙTxₙ.

If we let β = maxᵢ ||Txᵢ|| and γ₀ = maxᵢ ηᵢ, then it follows that ||Tx||
≤ nβγ₀||x||. Thus, T is bounded and hence continuous. ■

Next, we concern ourselves with various norms of linear transformations
on the finite-dimensional space Rⁿ.

7.1.22. Example. Let X = Rⁿ, and let {u₁, ..., uₙ} be the natural basis for
Rⁿ (see Example 4.1.15). For any A ∈ L(X, X) there is an n × n matrix, say
A = [aᵢⱼ] (see Definition 4.2.7), which represents A with respect to
{u₁, ..., uₙ}. Thus, if Ax = y, where x = (ξ₁, ..., ξₙ) ∈ X and
y = (η₁, ..., ηₙ) ∈ X, we may represent this transformation by y = Ax (see
Eq. (4.2.17)). In Example 6.1.5 we defined several norms on Rⁿ, namely

||x||_p = [|ξ₁|^p + ... + |ξₙ|^p]^{1/p},  1 ≤ p < ∞,

and

||x||_∞ = maxᵢ {|ξᵢ|}.

It turns out that different norms on Rⁿ give rise to different norms of the
transformation A. (In this case we speak of the norm of A induced by the
norm defined on Rⁿ.) In the present example we derive expressions for the
norm of A in terms of the elements of matrix A when the norm on Rⁿ is
given by || ||₁, || ||₂, and || ||_∞.

(i) Let p = 1; i.e., ||x|| = |ξ₁| + ... + |ξₙ|. Let

γ₀ = max_{1≤j≤n} Σ_{i=1}^n |aᵢⱼ|.

Then ||A|| = γ₀. To prove this, we see that

||Ax|| = Σ_{i=1}^n |Σ_{j=1}^n aᵢⱼξⱼ| ≤ Σ_{i=1}^n Σ_{j=1}^n |aᵢⱼ| |ξⱼ|
       = Σ_{j=1}^n |ξⱼ| Σ_{i=1}^n |aᵢⱼ| ≤ [max_{1≤j≤n} Σ_{i=1}^n |aᵢⱼ|] ||x|| = γ₀||x||.

From this it follows that ||A|| ≤ γ₀. To show that equality must hold, let
j₀ be such that

Σ_{i=1}^n |aᵢⱼ₀| = max_{1≤j≤n} Σ_{i=1}^n |aᵢⱼ|,

and let x₀ = (ξ₁, ..., ξₙ) ∈ Rⁿ be given by ξⱼ₀ = 1 and ξⱼ = 0 if j ≠ j₀. Then
||x₀|| = 1 and ||Ax₀|| = γ₀, and so we conclude that ||A|| = γ₀.

(ii) Let p = 2; i.e., ||x|| = (|ξ₁|² + ... + |ξₙ|²)^{1/2}. Let Aᵀ denote the
transpose of A (see Eq. (4.2.9)), and let {λ₁, ..., λ_k} be the distinct
eigenvalues of the matrix AᵀA (see Definition 4.5.6). Let λ₀ = max_j λ_j.
Then ||A|| = √λ₀.
To prove this we note first that by Theorem 4.10.28 the eigenvalues of
AᵀA are all real. We show first that they are, in fact, non-negative. Let
{x₁, ..., x_k} be eigenvectors of AᵀA corresponding to the eigenvalues
{λ₁, ..., λ_k}, respectively. Then for each i = 1, ..., k we have AᵀAxᵢ = λᵢxᵢ.
Thus, xᵢᵀAᵀAxᵢ = λᵢxᵢᵀxᵢ. From this it follows that

λᵢ = xᵢᵀAᵀAxᵢ / xᵢᵀxᵢ ≥ 0.

For arbitrary x ∈ X it follows from Theorem 4.10.44 that x = x₁ + ...
+ x_k, where AᵀAxᵢ = λᵢxᵢ, i = 1, ..., k. Hence, AᵀAx = λ₁x₁ + ... + λ_k x_k.
By Theorem 4.9.41 we have ||Ax||² = xᵀAᵀAx. Thus,

||Ax||² = xᵀAᵀAx = Σ_{i=1}^k λᵢ||xᵢ||² ≤ λ₀ Σ_{i=1}^k ||xᵢ||² = λ₀||x||²,

from which it follows that ||A|| ≤ √λ₀. If we let x be an eigenvector
corresponding to λ₀, then we must have ||Ax||² = λ₀||x||², and so equality is
achieved. Thus, ||A|| = √λ₀.

(iii) Let ||x|| = maxᵢ {|ξᵢ|}. Then

||A|| = maxᵢ (Σ_{j=1}^n |aᵢⱼ|).

The proof of this part is left as an exercise. ■

7.1.23. Exercise. Prove part (iii) of Example 7.1.22.
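The three formulas of Example 7.1.22 are easy to check numerically against numpy's induced norms; the sample matrix is an assumption chosen for illustration.

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

norm_1   = np.abs(A).sum(axis=0).max()                # part (i): max column sum
norm_2   = np.sqrt(np.linalg.eigvalsh(A.T @ A).max()) # part (ii): sqrt of largest eigenvalue of A^T A
norm_inf = np.abs(A).sum(axis=1).max()                # part (iii): max row sum

print(norm_1, norm_inf)                               # 6.0 7.0
print(abs(norm_2 - np.linalg.norm(A, 2)) < 1e-12)     # True
```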

Next, we prove the following important result concerning the completeness
of B(X, Y).

7.1.24. Theorem. If Y is complete, then the normed linear space B(X, Y)
is also complete.

Proof. Let {Tₙ} be a Cauchy sequence in the normed linear space B(X, Y).
Choose N such that for a given ε > 0, ||Tₘ - Tₙ|| < ε whenever m ≥ N
and n ≥ N. Since the Tₙ are bounded we have for each x ∈ X,

||Tₘx - Tₙx|| ≤ ||Tₘ - Tₙ|| ||x|| < ε||x||

whenever m, n ≥ N. From this it follows that {Tₙx} is a Cauchy sequence in
Y. But Y is complete, by hypothesis. Therefore, Tₙx has a limit in Y which
depends on x ∈ X. Let us denote this limit by Tx; i.e., lim_{n→∞} Tₙx = Tx. To
show that T is linear we note that

T(x + y) = lim Tₙ(x + y) = lim Tₙx + lim Tₙy = Tx + Ty

and

T(αx) = lim Tₙ(αx) = α lim Tₙx = αTx.

Thus, T is a linear operator of X into Y. We show next that T is bounded
and hence continuous. Since every Cauchy sequence in a normed linear
space is bounded, it follows that the sequence {Tₙ} is bounded, and thus
||Tₙ|| ≤ M for all n, where M is some constant. We have

||Tx|| = ||lim Tₙx|| = lim ||Tₙx|| ≤ (supₙ ||Tₙ||)||x|| ≤ M||x||.

This proves that T is bounded and therefore continuous, and T ∈ B(X, Y).
Finally, we must show that Tₙ → T as n → ∞ in the norm of B(X, Y). From
before, we have ||Tₘx - Tₙx|| < ε||x|| whenever m, n ≥ N. If we let n → ∞,
then ||Tₘx - Tx|| ≤ ε||x|| for every x ∈ X provided that m ≥ N. This
implies that ||Tₘ - T|| ≤ ε whenever m ≥ N. Thus, Tₘ → T as m → ∞ with
respect to the norm defined on B(X, Y). Therefore, B(X, Y) is complete and
the theorem is proved. ■
In Definition 3.4.16 we defined the null space of T ∈ L(X, Y) as

𝔑(T) = {x ∈ X: Tx = 0}.   (7.1.25)

We then showed that the range space ℜ(T) is a linear subspace of Y and
that 𝔑(T) is a linear subspace of X. For the case of bounded linear
transformations we have the following result.

7.1.26. Theorem. Let T ∈ B(X, Y). Then 𝔑(T) is a closed linear subspace
of X.

Proof. 𝔑(T) is a linear subspace of X by Theorem 3.4.19. That it is closed
follows from part (ii) of Theorem 5.7.9, since 𝔑(T) = T⁻¹({0}) and since {0}
is a closed subset of Y. ■

We conclude this section with the following useful result for continuous
linear transformations.

7.1.27. Theorem. Let T ∈ L(X, Y). Then T is continuous if and only if

T(Σ_{i=1}^∞ αᵢxᵢ) = Σ_{i=1}^∞ αᵢTxᵢ

for every convergent series Σ_{i=1}^∞ αᵢxᵢ in X.

The proof of this theorem follows readily from Theorem 5.7.8. We leave
the details as an exercise.

7.1.28. Exercise. Prove Theorem 7.1.27.

7.2. INVERSES

Throughout this section X and Y denote vector spaces over the same field
F, where F is either R (the real numbers) or C (the complex numbers).
We recall that a linear operator T: X → Y has an inverse, T⁻¹, if it is
injective, and if this is so, then T⁻¹ is a linear operator from ℜ(T) onto X
(see Theorem 3.4.32). We have the following result concerning the continuity
of T⁻¹.

7.2.1. Theorem. Let T ∈ L(X, Y). Then T⁻¹ exists, and T⁻¹ ∈ B(ℜ(T), X)
if and only if there is an α > 0 such that ||Tx|| ≥ α||x|| for all x ∈ X. If
this is so, ||T⁻¹|| ≤ 1/α.

Proof. Assume that there is a constant α > 0 such that α||x|| ≤ ||Tx|| for
all x ∈ X. Then Tx = 0 implies x = 0, and T⁻¹ exists by Theorem 3.4.32.
For y ∈ ℜ(T) there is an x ∈ X such that y = Tx and T⁻¹y = x. Thus,

α||T⁻¹y|| = α||x|| ≤ ||Tx|| = ||y||,

or

||T⁻¹y|| ≤ (1/α)||y||.

Hence, T⁻¹ is bounded and ||T⁻¹|| ≤ 1/α.
Conversely, assume that T⁻¹ exists and is bounded. Then, for x ∈ X
there is a y ∈ ℜ(T) such that y = Tx, and also x = T⁻¹y. Since T⁻¹ is
bounded we have

||x|| = ||T⁻¹y|| ≤ ||T⁻¹|| ||y|| = ||T⁻¹|| ||Tx||,

or

||Tx|| ≥ α||x|| with α = 1/||T⁻¹||. ■

The next result, called the Neumann expansion theorem, gives us important
information concerning the existence of the inverse of a certain class of
bounded linear transformations.

7.2.2. Theorem. Let X be a Banach space, let T ∈ B(X, X), let I ∈ B(X, X)
denote the identity operator, and let ||T|| < 1. Then the range of (I - T) is
X, the inverse of (I - T) exists and is bounded and satisfies the inequality

||(I - T)⁻¹|| ≤ 1/(1 - ||T||).   (7.2.3)

Furthermore, the series Σ_{n=0}^∞ Tⁿ in B(X, X) converges uniformly to
(I - T)⁻¹ with respect to the norm of B(X, X); i.e.,

(I - T)⁻¹ = I + T + T² + ... + Tⁿ + ....   (7.2.4)
Proof. Since ||T|| < 1, it follows that the series Σ_{n=0}^∞ ||T||ⁿ converges.
In view of Theorem 7.1.16 we have ||Tⁿ|| ≤ ||T||ⁿ, and hence the series
Σ_{n=0}^∞ Tⁿ converges in the space B(X, X), because this space is complete
in view of Theorem 7.1.24. If we set

S = Σ_{n=0}^∞ Tⁿ,

then

ST = TS = Σ_{n=1}^∞ Tⁿ = S - I,

and

(I - T)S = S(I - T) = I.

It now follows from Theorem 3.4.65 that (I - T)⁻¹ exists and is equal to S.
Furthermore, S ∈ B(X, X). The inequality (7.2.3) now follows readily and
is left as an exercise. ■

7.2.5. Exercise. Prove inequality (7.2.3).
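Theorem 7.2.2 can be illustrated with a matrix whose norm is less than one; the particular T below is an assumption chosen for illustration. The partial sums I + T + ... + Tⁿ converge to (I - T)⁻¹ at a rate governed by ||T||.

```python
import numpy as np

T = np.array([[0.2, 0.1],
              [0.0, 0.3]])
assert np.linalg.norm(T, 2) < 1          # hypothesis ||T|| < 1 of Theorem 7.2.2

S = np.zeros((2, 2))
term = np.eye(2)
for _ in range(100):                     # partial sum I + T + ... + T^99
    S += term
    term = term @ T

exact = np.linalg.inv(np.eye(2) - T)
print(np.max(np.abs(S - exact)) < 1e-12)   # True: the series has converged
```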

The next result, which is of great significance, is known as the Banach
inverse theorem.

7.2.6. Theorem. Let X and Y be Banach spaces, and let T ∈ B(X, Y).
If T is bijective, then T⁻¹ is bounded.

Proof. The proof of this theorem is rather lengthy and requires two
preliminary results which we state and prove separately.

7.2.7. Proposition. If A is any subset of X such that Ā = X (Ā denotes the
closure of A), then any x ∈ X such that x ≠ 0 can be written in the form

x = x₁ + x₂ + ... + xₙ + ...,

where xₙ ∈ A and ||xₙ|| ≤ 3||x||/2ⁿ, n = 1, 2, ....

Proof. The sequence {xₙ} is constructed as follows. Let x₁ ∈ A be such
that ||x - x₁|| < (1/2)||x||. This can certainly be done since Ā = X. Now
choose x₂ ∈ A such that ||x - x₁ - x₂|| < (1/4)||x||. We continue in this
manner and obtain

||x - x₁ - ... - xₙ|| < (1/2ⁿ)||x||.

We can always choose such an xₙ ∈ A, because Ā = X. By construction of
{xₙ},

||x - (x₁ + ... + xₙ)|| → 0 as n → ∞.

Hence,

x = Σ_{k=1}^∞ x_k.

We now compute ||xₙ||. First, we see that

||x₁|| = ||x₁ - x + x|| ≤ ||x₁ - x|| + ||x|| < (3/2)||x||,

||x₂|| = ||x₂ + x₁ - x - x₁ + x|| ≤ ||x - x₁ - x₂|| + ||x - x₁|| < (3/4)||x||,

and, in general,

||xₙ|| = ||xₙ + xₙ₋₁ + ... + x₁ - x + x - x₁ - ... - xₙ₋₁||
       ≤ ||x - x₁ - ... - xₙ|| + ||x - x₁ - ... - xₙ₋₁|| < (3/2ⁿ)||x||,

which proves the proposition. ■
7.2.8. Proposition. If {Aₙ} is any countable collection of subsets of X
such that X = ∪_{n=1}^∞ Aₙ, then there is a sphere S(x₀; ε) and a set Aₙ
such that S(x₀; ε) ⊂ Āₙ.

Proof. The proof is by contradiction. Without loss of generality, assume
that

A₁ ⊂ A₂ ⊂ A₃ ⊂ ....

For purposes of contradiction assume that for every x ∈ X and every n
there is an εₙ > 0 such that S(x; εₙ) ∩ Aₙ = ∅. Now let x₁ ∈ X and ε₁ > 0
be such that S(x₁; ε₁) ∩ A₁ = ∅. Let x₂ ∈ X and ε₂ > 0 be such that
S(x₂; ε₂) ⊂ S(x₁; ε₁) and S(x₂; ε₂) ∩ A₂ = ∅. We see that it is possible to
construct a sequence of closed nested spheres, {Kₙ} (see Definition 5.5.34),
in such a fashion that the diameter of these spheres, diam (Kₙ), converges to
zero. In view of part (ii) of Theorem 5.5.35, ∩_{n=1}^∞ Kₙ ≠ ∅. Let

x ∈ ∩_{n=1}^∞ Kₙ.

Then x ∉ Aₙ for all n. But this contradicts the fact that X = ∪_{n=1}^∞ Aₙ.
This completes the proof of the proposition. ■

Proof of Theorem 7.2.6. Let

A_k = {y ∈ Y: ||T⁻¹y|| ≤ k||y||},  k = 1, 2, ....

Clearly, Y = ∪_{k=1}^∞ A_k.

By Proposition 7.2.8 there is a sphere S(y₀; ε) and a set Aₙ such that
S(y₀; ε) ⊂ Āₙ. We may assume that y₀ ∈ Aₙ. Let ρ be such that 0 < ρ < ε,
and let us define the sets B and B₀ by

B = {y ∈ S(y₀; ε): ρ ≤ ||y - y₀||}

and

B₀ = {y ∈ Y: y = z - y₀, z ∈ B}.

We now show that there is an A_K such that B₀ ⊂ Ā_K. Let y ∈ B ∩ Aₙ.
Then y - y₀ ∈ B₀. We then have

||T⁻¹(y - y₀)|| ≤ ||T⁻¹y|| + ||T⁻¹y₀||
              ≤ n[||y|| + ||y₀||]
              ≤ n[||y - y₀|| + 2||y₀||]
              = n||y - y₀||[1 + 2||y₀||/||y - y₀||]
              ≤ n||y - y₀||[1 + 2||y₀||/ρ].

Now let K be a positive integer such that

n[1 + 2||y₀||/ρ] ≤ K.

It then follows that y - y₀ ∈ A_K. It follows readily that B₀ ⊂ Ā_K.
Now let y be an arbitrary element in Y. It is always possible to choose
a real number λ such that λy ∈ B₀. Thus, there is a sequence {yᵢ} such that
yᵢ ∈ Ā_K for all i and lim yᵢ = λy. This means that the sequence {(1/λ)yᵢ}
converges to y. We observe from the definition of A_K that if yᵢ ∈ Ā_K, then
(1/λ)yᵢ ∈ Ā_K for any real number λ ≠ 0. Hence, we have shown that
Y ⊂ Ā_K.
Finally, for arbitrary y ∈ Y we can write, by Proposition 7.2.7,

y = y₁ + y₂ + ... + y_k + ...,

where y_k ∈ A_K and ||y_k|| ≤ 3||y||/2ᵏ. Let x_k = T⁻¹y_k, k = 1, 2, ...,
and consider the infinite series Σ_{k=1}^∞ x_k. This series converges, since

Σ_{k=1}^∞ ||x_k|| ≤ K Σ_{k=1}^∞ ||y_k|| ≤ 3K||y|| Σ_{k=1}^∞ (1/2ᵏ) = 3K||y||,

so that x = Σ_{k=1}^∞ x_k ∈ X. Since T is continuous and since

T(Σ_{k=1}^∞ x_k) = Σ_{k=1}^∞ Tx_k = Σ_{k=1}^∞ y_k = y,

it follows that Tx = y. Hence, x = T⁻¹y. Therefore,

||x|| = ||T⁻¹y|| ≤ 3K||y||,

and T⁻¹ is bounded, which was to be proved. ■

Utilizing the principle of contraction mappings (see Theorem 5.8.5),
we now establish results related to inverses which are important in
applications. In the setting of normed linear spaces we can restate the
definition of a contraction mapping as being a function T: X → X (T is not
necessarily linear) such that

||T(x) - T(y)|| ≤ α||x - y||

for all x, y ∈ X, with 0 ≤ α < 1. The principle of contraction mappings
asserts that if T is a contraction mapping, then the equation

T(x) = x

has one and only one solution x ∈ X.
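A classical numerical illustration of the principle (the particular map is an assumption chosen for illustration): T(x) = cos x maps [0, 1] into itself and is a contraction there, since sup |T'| = sin 1 ≈ 0.84 < 1, so the successive approximations x_{n+1} = T(x_n) converge to the unique fixed point.

```python
import math

x = 0.5
for _ in range(100):
    x = math.cos(x)          # successive approximations x_{n+1} = T(x_n)

print(x)                     # approximately 0.7390851, the unique solution of cos x = x
assert abs(math.cos(x) - x) < 1e-12
```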


We now state and prove the following result.
7.2.9. Theorem. Let X be a Banach space, let T ∈ B(X, X), let λ ∈ F,
and let λ ≠ 0.

(i)   If |λ| > ||T||, then Tx = λx has a unique solution, namely x = 0;
(ii)  if |λ| > ||T||, then (T - λI)⁻¹ exists and is continuous on X;
(iii) if |λ| > ||T||, then for a given y ∈ X there is one and only one
      vector x ∈ X such that (T - λI)x = y, and

x = -[y/λ + Ty/λ² + T²y/λ³ + ...];

and

(iv)  if ||I - T|| < 1, then T⁻¹ exists and is continuous on X.

Proof.
(i) For any x, y ∈ X, we have

||λ⁻¹Tx - λ⁻¹Ty|| = |λ⁻¹| ||T(x - y)|| ≤ |λ|⁻¹||T|| ||x - y||.

Thus, if ||T|| < |λ|, then λ⁻¹T is a contraction mapping. In view of the
principle of contraction mappings there is a unique x ∈ X with λ⁻¹Tx = x,
or Tx = λx. The unique solution has to be x = 0, because T0 = 0.
(ii) Let L = λ⁻¹T. Then ||L|| = |λ|⁻¹||T|| < 1. It now follows from Theorem
7.2.2 that (L - I)⁻¹ exists and is continuous on X. Thus, (λL - λI)⁻¹ =
(T - λI)⁻¹ exists and is continuous on X. This completes the proof of
part (ii).
The proofs of the remaining parts are left as an exercise. ■

7.2.10. Exercise. Prove parts (iii) and (iv) of Theorem 7.2.9.

7.3. CONJUGATE AND ADJOINT OPERATORS

Associated with every bounded linear operator defined on a normed linear
space is a transformation called its conjugate, and associated with every
bounded linear operator defined on an inner product space is a transformation
called its adjoint. These operators, which we consider in this section, are of
utmost importance in analysis as well as in applications.
Throughout this section X and Y are normed linear spaces over F, where
F is either R (the real numbers) or C (the complex numbers). In some cases
we may further assume that X and Y are inner product spaces, and in other
instances we may require that X and/or Y be complete.
Let Xᶠ and Yᶠ denote the algebraic conjugate of X and Y, respectively
(refer to Definition 3.5.18). Utilizing the notation of Section 3.5, we write
x' ∈ Xᶠ and y' ∈ Yᶠ to denote elements of these spaces. If T ∈ L(X, Y),
we defined the transpose of T, Tᵀ, to be a mapping from Yᶠ to Xᶠ determined
by the equation

⟨x, Tᵀy'⟩ = ⟨Tx, y'⟩ for all x ∈ X, y' ∈ Yᶠ

(see Definition 3.5.27), and we showed that Tᵀ ∈ L(Yᶠ, Xᶠ).
Now let us assume that T: X → Y is a bounded linear operator on X
into Y. Let X* and Y* denote the normed conjugate spaces of X and Y,
respectively (refer to Definition 6.5.9). If y' ∈ Y*, then y'(y) = ⟨y, y'⟩ is
defined for every y ∈ Y and, in particular, it is defined for every y = Tx,
x ∈ X. The quantity ⟨Tx, y'⟩ = y'(Tx) is a scalar for each x ∈ X. Writing
x'(x) = ⟨Tx, y'⟩ = y'(Tx), we have defined a functional x' on X. Since y'
is a linear transformation (it is a bounded linear functional) and since T is a
linear transformation (it is a bounded linear operator), it follows readily
that x' is a linear functional. Also, since T is bounded, we have

|x'(x)| = |y'(Tx)| = |⟨Tx, y'⟩| ≤ ||y'|| ||Tx|| ≤ ||y'|| ||T|| ||x||,

and therefore x' is a bounded linear functional and x' ∈ X*. We have thus
assigned to each functional y' ∈ Y* a functional x' ∈ X*; i.e., we have
established a linear operator which maps Y* into X*. This operator is called
the conjugate operator of the operator T and is denoted by T'. We now have

x' = T'y'.

The definition of T': Y* → X* is usually expressed by the relation

⟨x, T'y'⟩ = ⟨Tx, y'⟩, x ∈ X, y' ∈ Y*.

Utilizing operator notation rather than bracket notation, the definition
of the conjugate operator T' satisfies the equation

x'(x) = y'(Tx) = (T'y')(x), x ∈ X,

and we may therefore write

y'T = T'y',

where y'T denotes the functional on X obtained by composing T and y',
and T'y' is the functional obtained by operating on y' by T'.
The reader can readily show that T' is unique and linear. If Y* = Yᶠ,
which is the case if Y is finite dimensional, then the conjugate T' and the
transpose Tᵀ are identical concepts. However, since, in general, Y* is a proper
subspace of Yᶠ, Tᵀ is an extension of T' or, conversely, T' is a restriction of
Tᵀ to the space Y*.
We summarize the above discussion in the following definition and
Figure A.

7.3.1. Figure A

7.3.2. Definition. Let T be a bounded linear operator on X into Y. The
conjugate operator of T, T': Y* → X*, is defined by the formula

⟨x, T'y'⟩ = ⟨Tx, y'⟩, x ∈ X, y' ∈ Y*.

7.3.3. Exercise. Show that the conjugate operator T' is unique and linear.

Before exploring the properties of conjugate operators, we introduce
another important operator which is closely related to the conjugate operator,
the so-called "adjoint operator." In this case we focus our attention on
Hilbert spaces.
Let X and Y denote Hilbert spaces, and let the symbol (,) denote the
inner product on both X and Y. If T is a bounded linear transformation on
X into Y, then in view of the above discussion there is a unique bounded linear
operator from Y* into X*, called the conjugate of T. But in view of Theorem
6.14.2, the dual spaces X*, Y* may be identified with X and Y, respectively,
because X and Y are Hilbert spaces. This gives rise to a new type of bounded
linear operator from Y into X, called the adjoint of T, which we consider in
place of T'.
Let y₀ ∈ Y be fixed, and let x'(x) = (Tx, y₀), where T ∈ B(X, Y) and
x' ∈ X*. By Theorem 6.14.2 there is a unique x₀ ∈ X such that
x'(x) = (x, x₀). Writing x₀ = T*y₀, we define in this way a transformation of
Y into X. We call this transformation the adjoint of T. Dropping the subscript
zero, we characterize the adjoint of T by the formula

(Tx, y) = (x, T*y), x ∈ X, y ∈ Y.

We will now show that T*: Y → X is linear, unique, and bounded. To prove linearity, let x ∈ X, y₁, y₂ ∈ Y, let α, β ∈ F, and note that

(x, T*(αy₁ + βy₂)) = (Tx, αy₁ + βy₂) = ᾱ(Tx, y₁) + β̄(Tx, y₂)
= ᾱ(x, T*y₁) + β̄(x, T*y₂) = (x, αT*y₁ + βT*y₂).

From this it follows that

T*(αy₁ + βy₂) = αT*y₁ + βT*y₂,

and therefore T* is linear.

To show that T* is unique we note that if (x, T*y) = (x, S*y), then (x, T*y) − (x, S*y) = 0 implies (x, (T* − S*)y) = 0 for all x ∈ X. From this it follows that (T* − S*)y ⊥ x for all x ∈ X, and thus (T* − S*)y = 0 for all y ∈ Y. Therefore, T* = S*.

To verify that T* is bounded we observe that

‖T*x‖² = |(T*x, T*x)| = |(T(T*x), x)| ≤ ‖T(T*x)‖ ‖x‖ ≤ ‖T‖ ‖T*x‖ ‖x‖,

and thus

‖T*x‖ ≤ ‖T‖ ‖x‖.

From this it follows that T* is bounded and, furthermore, ‖T*‖ ≤ ‖T‖.


We now give the following formal definition.

7.3.4. Definition. Let X and Y be Hilbert spaces, and let T be a bounded linear operator on X into Y. The adjoint operator T*: Y → X is defined by the formula

(Tx, y) = (x, T*y),  x ∈ X, y ∈ Y.

Summarizing the above discussion we have the following result.

7.3.5. Theorem. The adjoint operator T* given in Definition 7.3.4 is linear, unique, and bounded.

The reader is cautioned that many authors use the terms conjugate operator and adjoint operator interchangeably. Also, the symbol T* is used by many authors to denote both adjoint and conjugate operators.


7.3.6. Theorem. Conjugate transformations have the following properties:

(i) ‖T'‖ = ‖T‖;
(ii) I' = I, where I is the identity operator on a normed linear space X;
(iii) 0' = 0, where 0 is the zero operator on a normed linear space X;
(iv) (S + T)' = S' + T', where S, T ∈ B(X, Y) and where X, Y are normed linear spaces;
(v) (αT)' = αT', where T ∈ B(X, Y), α ∈ F, and X, Y are normed linear spaces;
(vi) (ST)' = T'S', where T ∈ B(X, Y), S ∈ B(Y, Z), and X, Y, Z are normed linear spaces; and
(vii) if T⁻¹ exists and if T⁻¹ ∈ B(Y, X), then (T')⁻¹ exists, and moreover (T')⁻¹ = (T⁻¹)'.

Proof. To prove part (i) we note that

|⟨x, T'y'⟩| = |⟨Tx, y'⟩| ≤ ‖y'‖ ‖Tx‖ ≤ ‖y'‖ ‖T‖ ‖x‖.

From this it follows that ‖T'y'‖ ≤ ‖T‖ ‖y'‖, and therefore

‖T'‖ ≤ ‖T‖.

Next, let x₀ ∈ X, x₀ ≠ 0. In view of the Hahn-Banach theorem (see Corollary 6.8.5) there is a y₀' ∈ Y*, ‖y₀'‖ = 1, such that ⟨Tx₀, y₀'⟩ = ‖Tx₀‖. Therefore,

‖Tx₀‖ = |⟨x₀, T'y₀'⟩| ≤ ‖T'y₀'‖ ‖x₀‖ ≤ ‖T'‖ ‖x₀‖,

from which it follows that

‖T‖ ≤ ‖T'‖.

Therefore, ‖T‖ = ‖T'‖.


The proofs of properties (ii)-(vi) are straightforward. To prove (iv), for example, we note that

⟨x, (S + T)'y'⟩ = ⟨(S + T)x, y'⟩ = ⟨Sx, y'⟩ + ⟨Tx, y'⟩
= ⟨x, S'y'⟩ + ⟨x, T'y'⟩ = ⟨x, (S' + T')y'⟩.

From this it follows that (S + T)' = S' + T'.
To prove part (vii) assume that T ∈ B(X, Y) has a bounded inverse T⁻¹: Y → X. To show that T': Y* → X* has an inverse we must show that it is injective. Let y₁', y₂' ∈ Y* be such that y₁' ≠ y₂'. Then

⟨x, T'y₁'⟩ − ⟨x, T'y₂'⟩ = ⟨Tx, y₁' − y₂'⟩ ≠ 0

for some x ∈ X. From this it follows that T'y₁' ≠ T'y₂', and T' is one-to-one. We can, in fact, show that T' is onto. We note that for any x' ∈ X* and any y ∈ Y there is an x ∈ X such that Tx = y, and we have

⟨x, x'⟩ = ⟨T⁻¹y, x'⟩ = ⟨y, (T⁻¹)'x'⟩ = ⟨Tx, (T⁻¹)'x'⟩ = ⟨x, T'(T⁻¹)'x'⟩.

From this it follows that

x' = T'(T⁻¹)'x'.

This shows that x' ∈ ℛ(T') and that (T')⁻¹ = (T⁻¹)'. ∎

7.3.7. Exercise. Prove parts (ii), (iii), (v), and (vi) of Theorem 7.3.6.

In the next theorem some of the important properties of adjoint operators are summarized.

7.3.8. Theorem. Let X, Y, and Z be Hilbert spaces, and let I and 0 denote the identity and zero transformations on X, respectively. Then

(i) ‖T*‖ = ‖T‖, where T ∈ B(X, Y);
(ii) I* = I;
(iii) 0* = 0;
(iv) (S + T)* = S* + T*, where S, T ∈ B(X, Y);
(v) (αT)* = ᾱT*, where T ∈ B(X, Y) and α ∈ F;
(vi) (ST)* = T*S*, where T ∈ B(X, Y), S ∈ B(Y, Z);
(vii) if T⁻¹ ∈ B(Y, X) exists, then (T*)⁻¹ ∈ B(X, Y) exists, and moreover (T*)⁻¹ = (T⁻¹)*;
(viii) if for T ∈ B(X, Y) we define (T*)* = T**, then T** = T; and
(ix) ‖T*T‖ = ‖T‖², where T ∈ B(X, Y).

Proof. To prove part (i) we note that

‖T*x‖² = |(T*x, T*x)| = |(T(T*x), x)| ≤ ‖T(T*x)‖ ‖x‖ ≤ ‖T‖ ‖T*x‖ ‖x‖,

or

‖T*x‖ ≤ ‖T‖ ‖x‖.

From the last inequality it follows that ‖T*‖ ≤ ‖T‖. Reversing the roles of T and T* we obtain

‖Tx‖² = |(Tx, Tx)| = |(T*(Tx), x)| ≤ ‖T*(Tx)‖ ‖x‖ ≤ ‖T*‖ ‖Tx‖ ‖x‖,

or

‖Tx‖ ≤ ‖T*‖ ‖x‖.

From this it follows that ‖T‖ ≤ ‖T*‖, and therefore ‖T‖ = ‖T*‖.
The proofs of properties (ii)-(viii) are trivial. To prove part (ix), we first note that

‖T*T‖ ≤ ‖T*‖ ‖T‖ = ‖T‖ ‖T‖ = ‖T‖².

On the other hand,

‖Tx‖² = (Tx, Tx) = (T*Tx, x) ≤ ‖T*Tx‖ ‖x‖ ≤ ‖T*T‖ ‖x‖ ‖x‖.

Taking the square root on both sides of the above inequality we obtain

‖Tx‖ ≤ √(‖T*T‖) ‖x‖,

and thus ‖T‖ ≤ √(‖T*T‖), or ‖T‖² ≤ ‖T*T‖. Hence, ‖T*T‖ = ‖T‖². ∎

7.3.9. Exercise. Prove parts (ii)-(viii) of Theorem 7.3.8.

From the above discussion it is obvious that adjoint operators are distinct from conjugate operators even though many of their properties appear to be identical, especially for the case of real spaces. We now cite a few examples to illustrate some of the concepts considered above.
7.3.10. Example. Let X = Cⁿ be the Hilbert space with inner product defined in Example 3.6.24, and let A ∈ L(X, X) be represented (with respect to the natural basis for X) by the n × n matrix A = [aᵢⱼ]. The transformation y = Ax can be written in the form

yᵢ = Σⱼ₌₁ⁿ aᵢⱼxⱼ,  i = 1, 2, …, n,

where yᵢ is the ith component of the vector y ∈ X. Let A* denote the adjoint of A on the Hilbert space X, and let A* be represented by the n × n matrix [a*ᵢⱼ]. Now if u = (u₁, …, uₙ) ∈ X, then

(Ax, u) = (y, u) = Σᵢ₌₁ⁿ ūᵢ (Σⱼ₌₁ⁿ aᵢⱼxⱼ),

and

(x, A*u) = Σᵢ₌₁ⁿ xᵢ (Σⱼ₌₁ⁿ ā*ᵢⱼūⱼ).

In order that (Ax, u) = (x, A*u) we must have a*ᵢⱼ = āⱼᵢ; i.e., the matrix of A* is the transpose of the conjugate of the matrix of A.
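A small numerical sketch of Example 7.3.10 (our own illustration, not from the text; the helper names are ours): with the inner product (x, y) = Σᵢ xᵢūᵢ on Cⁿ, the matrix built by conjugating and transposing A satisfies the defining adjoint relation (Ax, u) = (x, A*u).

```python
# Sketch: the adjoint of a matrix on C^n is its conjugate transpose.

def inner(x, y):
    """Inner product of Example 3.6.24: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def conj_transpose(A):
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

A = [[1 + 2j, 3 - 1j], [0 + 1j, 2 + 0j]]
x = [1 - 1j, 2 + 3j]
u = [0 + 2j, 1 + 1j]

lhs = inner(matvec(A, x), u)                    # (Ax, u)
rhs = inner(x, matvec(conj_transpose(A), u))    # (x, A*u)
assert abs(lhs - rhs) < 1e-12
```

The identity holds exactly (up to rounding) for every choice of A, x, and u, which is precisely the statement a*ᵢⱼ = āⱼᵢ of the example.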

7.3.11. Example. Let X = Y = L₂[a, b], a < b (see Example 6.11.10), and define the Fredholm operator T by

y(t) = (Tx)(t) = ∫ₐᵇ k(s, t)x(s) ds,  t ∈ [a, b],

where it is assumed that the kernel function k(s, t) is well enough behaved so that

∫ₐᵇ ∫ₐᵇ |k(s, t)|² dt ds < ∞.

Now if u ∈ L₂[a, b], then

(Tx, u) = (y, u) = ∫ₐᵇ y(t)ū(t) dt = ∫ₐᵇ (∫ₐᵇ k(s, t)x(s) ds) ū(t) dt
= ∫ₐᵇ x(s) (∫ₐᵇ k(s, t)ū(t) dt) ds.

From this it follows that the adjoint T* of T maps u into the function

z(t) = (T*u)(t) = ∫ₐᵇ k̄(t, s)u(s) ds;

i.e., the adjoint of T is obtained by interchanging the roles of s and t in the kernel and by utilizing the complex conjugate of k.
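A discretized sketch of this example (our own illustration, not from the text; the kernel below is a hypothetical choice): replacing the integrals by midpoint-rule sums on a uniform grid, the adjoint relation (Tx, u) = (x, T*u) holds exactly for the sums when T* is built from the conjugated kernel with s and t interchanged.

```python
import cmath

n = 40
a, b = 0.0, 1.0
h = (b - a) / n
grid = [a + (i + 0.5) * h for i in range(n)]   # midpoint rule nodes

k = lambda s, t: cmath.exp(1j * s * t)         # hypothetical kernel k(s, t)
x = [cmath.sin(t) + 1j * t for t in grid]
u = [cmath.cos(t) - 2j for t in grid]

def inner(f, g):
    """Discretized L2 inner product on [a, b]."""
    return sum(fi * gi.conjugate() for fi, gi in zip(f, g)) * h

# (Tx)(t) = sum_s k(s, t) x(s) h  and  (T*u)(t) = sum_s conj(k(t, s)) u(s) h
Tx = [sum(k(s, t) * xs for s, xs in zip(grid, x)) * h for t in grid]
Tstar_u = [sum(k(t, s).conjugate() * us for s, us in zip(grid, u)) * h for t in grid]

assert abs(inner(Tx, u) - inner(x, Tstar_u)) < 1e-10
```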
7.3.12. Exercise. Let X = Y = l₂ (see Example 6.1.6), and define T: l₂ → l₂ by

T(ξ₁, ξ₂, …, ξₙ, …) = (0, ξ₁, ξ₂, …, ξₙ, …)

for all x = (ξ₁, ξ₂, …, ξₙ, …) ∈ l₂. Show that T*: l₂ → l₂ is the operator defined by

T*(η₁, η₂, …, ηₙ, …) = (η₂, η₃, …, ηₙ, …)

for all y = (η₁, η₂, …, ηₙ, …) ∈ l₂.
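A quick check of this exercise on finite truncations (our own sketch; we assume truncated vectors of equal length suffice to illustrate the pairing): T is the right shift, T* the left shift, and ⟨Tx, y⟩ = ⟨x, T*y⟩.

```python
# Sketch: the adjoint of the right shift on l2 is the left shift.

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def right_shift(x):
    """T(x1, x2, ...) = (0, x1, x2, ...), truncated to the same length."""
    return [0] + x[:-1]

def left_shift(y):
    """T*(y1, y2, ...) = (y2, y3, ...), padded to the same length."""
    return y[1:] + [0]

x = [1 + 1j, 2 - 1j, 0.5j, 3 + 0j]
y = [2 + 0j, 1 - 2j, 4 + 1j, 1j]

assert abs(inner(right_shift(x), y) - inner(x, left_shift(y))) < 1e-12
```

Note that T here is isometric (it preserves norms) but not unitary, since it is not surjective; T* is surjective but not injective.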

Recalling the definition of the orthogonal complement (refer to Definition 6.12.1), we have the following important results for bounded linear operators on Hilbert spaces.

7.3.13. Theorem. Let T be a bounded linear operator on a Hilbert space X into a Hilbert space Y. Then,

(i) {ℛ(T)}⊥ = 𝔑(T*);
(ii) the closure of ℛ(T) equals {𝔑(T*)}⊥;
(iii) 𝔑(T) = {ℛ(T*)}⊥;
(iv) the closure of ℛ(T*) equals {𝔑(T)}⊥;
(v) 𝔑(T*) = 𝔑(TT*); and
(vi) the closure of ℛ(T) equals the closure of ℛ(TT*).

Proof. We prove (i) and (v) and leave the proofs of (ii)-(iv) and (vi) as an exercise.

To prove (i), we first show that ℛ(T)⊥ = 𝔑(T*). Let y ∈ ℛ(T)⊥. Then (y, Tx) = 0 for all x ∈ X, and hence (T*y, x) = 0 for all x ∈ X. This can be true only if T*y = 0; i.e., y ∈ 𝔑(T*). On the other hand, if y ∈ 𝔑(T*), then (T*y, x) = 0 for all x ∈ X. Thus, (y, Tx) = 0 for every x ∈ X, which implies that y ∈ ℛ(T)⊥. Now ℛ(T) need not be closed. However, by Theorem 6.12.14, ℛ(T)⊥⊥ is the closure of ℛ(T). Therefore, the orthogonal complement of the closure of ℛ(T) is ℛ(T)⊥⊥⊥ = ℛ(T)⊥ = 𝔑(T*).

To prove (v), let y ∈ 𝔑(T*). Then T*y = 0 and TT*y = 0. This implies that 𝔑(T*) ⊂ 𝔑(TT*). Next, let y ∈ 𝔑(TT*). Then TT*y = 0 and (y, TT*y) = 0. This implies that (T*y, T*y) = 0, so that T*y = 0. Therefore, y ∈ 𝔑(T*) and 𝔑(TT*) ⊂ 𝔑(T*), completing the proof of part (v). ∎

7.3.14. Exercise. Prove parts (ii)-(iv) and (vi) of Theorem 7.3.13.
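A finite-dimensional sketch of part (i) (our own illustration, with a rank-one matrix chosen for the purpose): a vector in the null space of T* (the conjugate transpose) is orthogonal to every vector Tx in the range of T.

```python
def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

# T has rank 1: its range is span{(1, 1j)}.
T = [[1, 2], [1j, 2j]]
Tstar = [[1, -1j], [2, -2j]]   # conjugate transpose of T

y0 = [1j, 1]                   # chosen so that T* y0 = 0
assert all(abs(v) < 1e-12 for v in matvec(Tstar, y0))

# y0 ∈ N(T*) is orthogonal to Tx for every x, i.e. y0 ∈ {R(T)}⊥.
for x in ([1, 0], [0, 1], [2 + 1j, -3j]):
    assert abs(inner(matvec(T, x), y0)) < 1e-12
```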

We conclude this section with the following results.

7.3.15. Theorem. Let T ∈ B(X, X), where X is a Hilbert space, and let M and N be subsets of X. Define T(M) as

T(M) = {y : y = Tx, x ∈ M}.

If T(M) ⊂ N, then T*(N⊥) ⊂ M⊥.

Proof. Let z ⊥ N. Then for x ∈ M we have (Tx, z) = 0 = (x, T*z). Therefore, T*z ⊥ x for all x ∈ M, and T*z ∈ M⊥. ∎

7.3.16. Theorem. Let T ∈ B(X, X), where X is a Hilbert space, and let M and N be closed linear subspaces of X. Then T(M) ⊂ N if and only if T*(N⊥) ⊂ M⊥.

Proof. If T(M) ⊂ N, then by Theorem 7.3.15, T*(N⊥) ⊂ M⊥. Conversely, if T*(N⊥) ⊂ M⊥, then by Theorem 7.3.15, T**(M⊥⊥) ⊂ N⊥⊥. But T** = T, and if M and N are closed linear subspaces, then M⊥⊥ = M and N⊥⊥ = N. Therefore, T(M) ⊂ N. ∎

7.4. HERMITIAN OPERATORS

Throughout this section X denotes a complex Hilbert space. We shall be primarily concerned with operators T ∈ B(X, X). By T* we shall always mean the adjoint of T.

For our first result, recall the definition of a bilinear functional (Definition 3.6.4).

7.4.1. Theorem. Let T ∈ B(X, X), and define the function φ: X × X → C by φ(x, y) = (Tx, y) for all x, y ∈ X. Then φ is a bilinear functional.

7.4.2. Exercise. Prove Theorem 7.4.1.

Of central importance in this section is the following class of operators.

7.4.3. Definition. A bounded linear transformation T ∈ B(X, X) is said to be hermitian if T = T*.

Some authors call such transformations self-adjoint operators (see Definition 4.10.20).
The next two results allow us to characterize a hermitian operator in an equivalent manner. The first of these involves symmetric bilinear forms (see Definition 3.6.10).

7.4.4. Theorem. Let T ∈ B(X, X). Then T is hermitian if and only if the bilinear functional φ(x, y) = (Tx, y) is symmetric.

Proof. If T* = T, then φ(x, y) = (Tx, y) = (x, T*y) = (x, Ty), which is the complex conjugate of (Ty, x) = φ(y, x); therefore φ is symmetric.

Conversely, assume that φ is symmetric, i.e., that φ(x, y) is the complex conjugate of φ(y, x) for all x, y ∈ X. The complex conjugate of φ(y, x) = (Ty, x) is (x, Ty), while φ(x, y) = (Tx, y) = (x, T*y); i.e., (x, Ty) = (x, T*y) for all x, y ∈ X. From this it follows that (x, (T* − T)y) = 0, and thus (T* − T)y ⊥ x for all x ∈ X. This implies that T*y = Ty for all y ∈ X, or T* = T. ∎

7.4.5. Theorem. Let T ∈ B(X, X). Then T is hermitian if and only if (Tx, x) is real for every x ∈ X.

Proof. If T is hermitian, then (Tx, y) = (x, Ty). Setting y = x, we obtain (Tx, x) = (x, Tx); i.e., (Tx, x) equals its own complex conjugate, which implies that (Tx, x) is real.

Conversely, suppose (Tx, x) is real for every x ∈ X. Then (Tx, x) = (x, Tx) for all x ∈ X. Now consider (x, Ty) for arbitrary x, y ∈ X. It is easily verified that

(x + y, T(x + y)) − (x − y, T(x − y)) + i(x + iy, T(x + iy)) − i(x − iy, T(x − iy)) = 4(x, Ty),   (7.4.6)

where i = √−1. Also,

(T(x + y), x + y) − (T(x − y), x − y) + i(T(x + iy), x + iy) − i(T(x − iy), x − iy) = 4(Tx, y).   (7.4.7)

Since the left-hand sides of Eqs. (7.4.6) and (7.4.7) are equal (each inner product appearing there is of the form (u, Tu) or (Tu, u), and these agree because (Tu, u) is real), it follows that (x, Ty) = (Tx, y) for all x, y ∈ X, and hence T = T*. ∎

The norm of a hermitian operator can be found as follows.

7.4.8. Theorem. Let T ∈ B(X, X) be a hermitian operator. Then the norm of T can be expressed in the following equivalent ways:

(i) ‖T‖ = sup {|(Tx, x)| : ‖x‖ = 1}; and
(ii) ‖T‖ = sup {|(Tx, y)| : ‖x‖ = ‖y‖ = 1}.

7.4.9. Exercise. Prove Theorem 7.4.8.

In the next theorem, some of the more important properties of hermitian operators are given.

7.4.10. Theorem. Let S, T ∈ B(X, X) be hermitian operators, and let α be a real scalar. Then

(i) (S + T) is a hermitian operator;
(ii) αT is a hermitian operator;
(iii) if T is bijective, then T⁻¹ is hermitian; and
(iv) ST is hermitian if and only if ST = TS.

7.4.11. Exercise. Prove Theorem 7.4.10.

Since in the case of hermitian operators (Tx, x) is real for all x ∈ X, the following definition concerning definiteness applies (recall Definition 3.6.10).

7.4.12. Definition. Let T ∈ B(X, X) be a hermitian operator. Then T is said to be positive if (Tx, x) ≥ 0 for all x ∈ X. In this case we write T ≥ 0. If (Tx, x) > 0 for all x ≠ 0, we say that T is strictly positive.

7.4.13. Definition. Let S, T ∈ B(X, X) be hermitian operators. If the hermitian operator T + (−S) = T − S ≥ 0, then we write T ≥ S.

7.4.14. Theorem. Let S, T, U ∈ B(X, X) be hermitian operators, and let α be a real scalar. Then,

(i) if S ≥ 0 and T ≥ 0, then (S + T) ≥ 0;
(ii) if α > 0 and T ≥ 0, then αT ≥ 0;
(iii) if S ≤ T and T ≤ U, then S ≤ U; and
(iv) for any V ∈ B(X, X), if T ≥ 0, then V*TV ≥ 0. In particular, V*V ≥ 0.

Proof. The proofs of parts (i)-(iii) are obvious. For example, if S ≥ 0 and T ≥ 0, then (Sx, x) + (Tx, x) = (Sx + Tx, x) = ((S + T)x, x) ≥ 0, and so (S + T) ≥ 0.

To prove part (iv) we note that (V*TVx, x) = (TVx, Vx) ≥ 0, since Vx = y is a vector in X and (Ty, y) ≥ 0 for all y ∈ X. If we consider, in particular, T = I = I*, then V*V ≥ 0. ∎

The proof of the next result follows by direct verification of the formulas involved.

7.4.15. Theorem. Let A ∈ B(X, X), and let

U = (A + A*)/2  and  V = (A − A*)/(2i),

where i = √−1. Then

(i) U and V are hermitian operators; and
(ii) if A = C + iD, where C and D are hermitian, then C = U and D = V.

7.4.16. Exercise. Prove Theorem 7.4.15.
Let us now consider some specific cases.

7.4.17. Example. Let X = Cⁿ with inner product given in Example 3.6.24. Let A ∈ B(X, X), and let {e₁, …, eₙ} be any orthonormal basis for X. As we saw in Example 7.3.10, if A is represented by the matrix A, then A* is represented by the matrix A* = Āᵀ. In this case A is hermitian if and only if A = Āᵀ.
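A quick numerical sketch (ours, not the text's; the matrices are hypothetical examples): for a matrix equal to its conjugate transpose, the quadratic form (Ax, x) of Theorem 7.4.5 is real for every x, while for a non-hermitian matrix it generally is not.

```python
def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

A = [[2, 1 - 1j], [1 + 1j, 3]]      # equal to its conjugate transpose
for v in ([1, 1j], [2 - 1j, 0.5], [1j, 1j]):
    assert abs(inner(matvec(A, v), v).imag) < 1e-12   # (Av, v) is real

B = [[0, 1], [2, 0]]                # not hermitian
assert abs(inner(matvec(B, [1, 1j]), [1, 1j]).imag) > 0.5
```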
7.4.18. Example. Let X = L₂[a, b] (see Example 6.11.10), and define T ∈ B(X, X) by

(Tx)(t) = t x(t).

Then for any x, z ∈ X we have

(Tx, z) = ∫ₐᵇ t x(t)z̄(t) dt = ∫ₐᵇ x(t) t z̄(t) dt = (x, Tz),

since t is real on [a, b]. Thus, (x, T*z) = (x, Tz) for all x, z ∈ X, so T = T* and T is hermitian.

7.4.19. Exercise. Let X = L₂[a, b], and define T: X → X by

y(t) = (Tx)(t) = ∫ₐᵗ x(s) ds.

Show that T* ≠ T and therefore T is not hermitian.

7.4.20. Exercise. Let X = L₂[a, b] and consider the Fredholm operator given in Example 7.3.11; i.e.,

(Tx)(t) = ∫ₐᵇ k(s, t)x(s) ds,  t ∈ [a, b].

Show that T = T* if and only if k(t, s) = k̄(s, t).
We conclude this section with the following result, which we will subsequently require.

7.4.21. Theorem. Let X be a Hilbert space, let T ∈ B(X, X) be a hermitian operator, and let λ ∈ R. Then there exists a real number γ > 0 such that γ‖x‖ ≤ ‖(T − λI)x‖ for all x ∈ X if and only if (T − λI) is bijective and (T − λI)⁻¹ ∈ B(X, X), in which case ‖(T − λI)⁻¹‖ ≤ 1/γ.

Proof. Let T_λ = T − λI. It follows from Theorem 7.4.10 that T_λ is also hermitian.

To prove sufficiency, let T_λ⁻¹ ∈ B(X, X). It follows that for all y ∈ X, ‖T_λ⁻¹y‖ ≤ ‖T_λ⁻¹‖ ‖y‖. Letting y = T_λx and γ = ‖T_λ⁻¹‖⁻¹, we have ‖T_λx‖ ≥ γ‖x‖ for all x ∈ X.

To prove necessity, let γ > 0 be such that γ‖x‖ ≤ ‖T_λx‖ for all x ∈ X. We see that T_λx = 0 implies x = 0; i.e., 𝔑(T_λ) = {0}, and so T_λ is injective. We next show that the closure of ℛ(T_λ) is X. It follows from Theorem 6.12.16 that X is the direct sum of the closure of ℛ(T_λ) and ℛ(T_λ)⊥. From Theorem 7.3.13 we have ℛ(T_λ)⊥ = 𝔑(T_λ*). Since T_λ is hermitian, 𝔑(T_λ*) = 𝔑(T_λ) = {0}. Hence, the closure of ℛ(T_λ) is X. We next show that ℛ(T_λ) is in fact closed. Let {yₙ} be a sequence in ℛ(T_λ) such that yₙ → y. Then there is a sequence {xₙ} in X such that T_λxₙ = yₙ. For any positive integers m and n, γ‖xₘ − xₙ‖ ≤ ‖T_λxₘ − T_λxₙ‖ = ‖yₘ − yₙ‖. Since {yₙ} is Cauchy, {xₙ} must also be Cauchy. Let xₙ → x. Then yₙ = T_λxₙ → T_λx = y. Thus, y ∈ ℛ(T_λ), and so ℛ(T_λ) is closed. This proves that T_λ is bijective. Finally, γ‖T_λ⁻¹y‖ ≤ ‖y‖ for all y ∈ X implies T_λ⁻¹ ∈ B(X, X) and ‖T_λ⁻¹‖ ≤ 1/γ. This completes the proof of the theorem. ∎

7.5. OTHER LINEAR OPERATORS: NORMAL OPERATORS, PROJECTIONS, UNITARY OPERATORS, AND ISOMETRIC OPERATORS

In this section we consider additional important types of linear operators. Throughout this section X is a complex Hilbert space, T* denotes the adjoint of T ∈ B(X, X), and I ∈ B(X, X) denotes the identity operator.

7.5.1. Definition. An operator T ∈ B(X, X) is said to be a normal operator if T*T = TT*.

7.5.2. Definition. An operator T ∈ B(X, X) is said to be an isometric operator if T*T = I.

7.5.3. Definition. An operator T ∈ B(X, X) is said to be a unitary operator if T*T = TT* = I.

Our first result is for normal operators.

7.5.4. Theorem. Let T ∈ B(X, X), and let U, V ∈ B(X, X) be hermitian operators such that T = U + iV. Then T is normal if and only if UV = VU.

7.5.5. Exercise. Prove Theorem 7.5.4. Recall that U and V are unique by Theorem 7.4.15.

For the next result, recall that a linear subspace Y of X is invariant under a linear transformation T if T(Y) ⊂ Y (see Definition 3.7.9). Also, recall that a closed linear subspace Y of a Hilbert space X is itself a Hilbert space with inner product induced by the inner product on X (see Theorem 6.2.1).

7.5.6. Theorem. Let T ∈ B(X, X) be a normal operator, and let Y be a closed linear subspace of X which is invariant under T. Let T₁ be the restriction of T to Y. Then T₁ ∈ B(Y, Y) and T₁ is normal.

7.5.7. Exercise. Prove Theorem 7.5.6.

For isometric operators we have the following result.

7.5.8. Theorem. Let T ∈ B(X, X). Then the following are equivalent:

(i) T is isometric;
(ii) (Tx, Ty) = (x, y) for all x, y ∈ X; and
(iii) ‖Tx − Ty‖ = ‖x − y‖ for all x, y ∈ X.

Proof. If T is isometric, then (x, y) = (Ix, y) = (T*Tx, y) = (Tx, Ty) for all x, y ∈ X.

Next, assume that (Tx, Ty) = (x, y). Then ‖Tx − Ty‖² = ‖T(x − y)‖² = (T(x − y), T(x − y)) = (x − y, x − y) = ‖x − y‖²; i.e., ‖Tx − Ty‖ = ‖x − y‖.

Finally, assume that ‖Tx − Ty‖ = ‖x − y‖. Then (T*Tx, x) = (Tx, Tx) = ‖Tx‖² = ‖x‖² = (x, x); i.e., (T*Tx, x) = (x, x) for all x ∈ X. But this implies that T*T = I; i.e., T is isometric. ∎

From Theorem 7.5.8 there follows the following corollary.

7.5.9. Corollary. If T ∈ B(X, X) is an isometric operator, then ‖Tx‖ = ‖x‖ for all x ∈ X and ‖T‖ = 1.
For unitary operators we have the following result.

7.5.10. Theorem. Let T ∈ B(X, X). Then the following are equivalent:

(i) T is unitary;
(ii) T* is unitary;
(iii) T and T* are isometric;
(iv) T is isometric and T* is injective;
(v) T is isometric and surjective; and
(vi) T is bijective and T⁻¹ = T*.

7.5.11. Exercise. Prove Theorem 7.5.10.

Before considering projections, let us briefly return to Section 3.7. Recall that if (a linear space) X is the direct sum of two linear subspaces X₁ and X₂, i.e., X = X₁ ⊕ X₂, then for each x ∈ X there exist unique x₁ ∈ X₁ and x₂ ∈ X₂ such that x = x₁ + x₂. We call the mapping P: X → X defined by Px = x₁ the projection on X₁ along X₂. Recall that P ∈ L(X, X), ℛ(P) = X₁, and 𝔑(P) = X₂. Furthermore, recall that if P ∈ L(X, X) is such that P² = P, then P is said to be idempotent, and this condition is both necessary and sufficient for P to be a projection on ℛ(P) along 𝔑(P) (see Theorem 3.7.4). Now if X is a Hilbert space and if X₁ = Y is a closed linear subspace of X, then X₂ = Y⊥ and X = Y ⊕ Y⊥ (see Theorem 6.12.16). If for this particular case P is the projection on Y along Y⊥, then P is an orthogonal projection (see Definition 3.7.16). In this case we shall simply call P the orthogonal projection on Y.

7.5.12. Theorem. Let Y be a closed linear subspace of X such that Y ≠ {0} and Y ≠ X. Let P be the orthogonal projection onto Y. Then

(i) P ∈ B(X, X);
(ii) ‖P‖ = 1; and
(iii) P* = P.

Proof. We know that P ∈ L(X, X). To show that P is bounded, let x = x₁ + x₂, where x₁ ∈ Y and x₂ ∈ Y⊥. Then ‖Px‖ = ‖x₁‖ ≤ ‖x‖. Hence, P is bounded and ‖P‖ ≤ 1. If x₂ = 0, then ‖Px‖ = ‖x‖, and so ‖P‖ = 1.

To prove (iii), let x, y ∈ X be given by x = x₁ + x₂ and y = y₁ + y₂, respectively, where x₁, y₁ ∈ Y and x₂, y₂ ∈ Y⊥. Then (x, Py) = (x₁ + x₂, y₁) = (x₁, y₁) and (Px, y) = (x₁, y₁ + y₂) = (x₁, y₁). Thus, (x, Py) = (Px, y) for all x, y ∈ X. This implies that P = P*. ∎

From the above theorem it follows that an orthogonal projection is a hermitian operator.

7.5.13. Theorem. Let Y be a closed linear subspace of X, and let P be the orthogonal projection onto Y. If

Y₁ = {x ∈ X : Px = x}

and if Y₂ is the range of P, then Y = Y₁ = Y₂.

Proof. Since Y₁ ⊂ Y₂, since Y₂ ⊂ Y, and since Y ⊂ Y₁, it follows that Y = Y₁ = Y₂. ∎

7.5.14. Theorem. Let P ∈ L(X, X). If P is idempotent and hermitian, then

Y = {x ∈ X : Px = x}

is a closed linear subspace of X, and P is the orthogonal projection onto Y.

Proof. Since P is a linear operator, we have

P(αx + βy) = αPx + βPy.

If x, y ∈ Y, then Px = x and Py = y, and it follows that

P(αx + βy) = αx + βy.

Therefore, (αx + βy) ∈ Y, and Y is a linear subspace of X. We must show that Y is a closed linear subspace. First, however, we show that P is bounded and therefore continuous. Since

‖Pz‖² = (Pz, Pz) = (P*Pz, z) = (P²z, z) = (Pz, z) ≤ ‖Pz‖ ‖z‖,

we have ‖Pz‖ ≤ ‖z‖, and thus ‖P‖ ≤ 1.

To show that Y is a closed linear subspace of X, let x₀ be a point of accumulation of the space Y. Then there is a sequence of vectors {xₙ} in Y such that lim ‖xₙ − x₀‖ = 0. Since xₙ ∈ Y, we can put Pxₙ = xₙ, and we have ‖Pxₙ − x₀‖ → 0 as n → ∞. Since P is bounded, it is continuous, and thus we also have ‖Pxₙ − Px₀‖ → 0 as n → ∞, and hence x₀ ∈ Y.

Finally, we must show that P is an orthogonal projection. Let x ∈ Y, and let y ∈ Y⊥. Then (Py, x) = (y, Px) = (y, x) = 0, since x ⊥ y. Therefore, Py ⊥ x and Py ∈ Y⊥. But P(Py) = Py, since P² = P, and thus Py ∈ Y. Therefore, it follows that Py = 0, because Py ∈ Y and Py ∈ Y⊥. Now let z = x + y ∈ X, where x ∈ Y and y ∈ Y⊥. Then Pz = Px + Py = x + 0 = x. Hence, P is the orthogonal projection onto Y. ∎

The next result is a direct consequence of Theorem 7.5.14.

7.5.15. Corollary. Let Y be a closed linear subspace of X, and let P be the orthogonal projection onto Y. Then P(Y⊥) = {0}.

7.5.16. Exercise. Prove Corollary 7.5.15.

The next result yields the representation of an orthogonal projection onto a finite-dimensional subspace of X.

7.5.17. Theorem. Let {x₁, …, xₙ} be a finite orthonormal set in X, and let Y be the linear subspace of X generated by {x₁, …, xₙ}. Then the orthogonal projection of X onto Y is given by

Px = Σᵢ₌₁ⁿ (x, xᵢ)xᵢ  for all x ∈ X.

Proof. We first note that Y is a closed linear subspace of X by Theorem 6.6.6. We now show that P is a projection by proving that P² = P. For any j = 1, …, n we have

Pxⱼ = Σᵢ₌₁ⁿ (xⱼ, xᵢ)xᵢ = xⱼ.   (7.5.18)

Hence, for any x ∈ X we have

P²x = P(Σᵢ₌₁ⁿ (x, xᵢ)xᵢ) = Σᵢ₌₁ⁿ (x, xᵢ)Pxᵢ = Σᵢ₌₁ⁿ (x, xᵢ)xᵢ = Px.

Next, we show that ℛ(P) = Y. It is clear that ℛ(P) ⊂ Y. To show that Y ⊂ ℛ(P), let y ∈ Y. Then

y = η₁x₁ + … + ηₙxₙ

for some scalars {η₁, …, ηₙ}. It follows from Eq. (7.5.18) that Py = y, and so y ∈ ℛ(P).

Finally, to show that P is an orthogonal projection, we must show that ℛ(P) ⊥ 𝔑(P). To do so, let x ∈ 𝔑(P) and let y ∈ ℛ(P). Then

(x, y) = (x, Py) = (x, Σᵢ₌₁ⁿ (y, xᵢ)xᵢ) = Σᵢ₌₁ⁿ (xᵢ, y)(x, xᵢ) = (Σᵢ₌₁ⁿ (x, xᵢ)xᵢ, y) = (Px, y) = (0, y) = 0.

This completes the proof. ∎
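The projection formula of Theorem 7.5.17 can be sketched numerically (our own illustration; the orthonormal set below is a hypothetical choice in C³):

```python
import math

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

def proj(basis, x):
    """Px = sum_i (x, x_i) x_i for an orthonormal set {x_i}."""
    out = [0j] * len(x)
    for b in basis:
        c = inner(x, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

e1 = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
e2 = [0.0, 0.0, 1.0]
x = [3 + 1j, 1 - 2j, 5j]

Px = proj([e1, e2], x)
PPx = proj([e1, e2], Px)
assert all(abs(a - b) < 1e-12 for a, b in zip(Px, PPx))   # P^2 = P

residual = [a - b for a, b in zip(x, Px)]                 # x - Px ∈ N(P)
assert abs(inner(residual, e1)) < 1e-12
assert abs(inner(residual, e2)) < 1e-12                   # R(P) ⊥ N(P)
```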

Referring to Definition 3.7.12, we recall that if Y and Z are linear subspaces of (a linear space) X such that X = Y ⊕ Z, and if T ∈ L(X, X) is such that both Y and Z are invariant under T, then T is said to be reduced by Y and Z. When X is a Hilbert space, we make the following definition.

7.5.19. Definition. Let Y be a closed linear subspace of X, and let T ∈ L(X, X). Then Y is said to reduce T if Y and Y⊥ are invariant under T.

Note that in view of Theorem 6.12.16, Definitions 3.7.12 and 7.5.19 are consistent.

The proof of the next theorem is straightforward.

7.5.20. Theorem. Let Y be a closed linear subspace of X, and let T ∈ B(X, X). Then

(i) Y is invariant under T if and only if Y⊥ is invariant under T*; and
(ii) Y reduces T if and only if Y is invariant under T and T*.

7.5.21. Exercise. Prove Theorem 7.5.20.

7.5.22. Theorem. Let Y be a closed linear subspace of X, let P be the orthogonal projection onto Y, let T ∈ B(X, X), and let I denote the identity operator on X. Then

(i) Y is invariant under T if and only if TP = PTP;
(ii) Y reduces T if and only if TP = PT; and
(iii) (I − P) is the orthogonal projection onto Y⊥.

Proof. To prove (i), assume that TP = PTP. Then for any x ∈ Y we have Tx = T(Px) = P(TPx) ∈ Y, since P applied to any vector of X is in Y. Conversely, if Y is invariant under T, then for any vector x ∈ X we have T(Px) ∈ Y, because Px ∈ Y. Thus, P(TPx) = TPx for every x ∈ X.

To prove (ii), assume that PT = TP. Then PTP = P²T = PT = TP. Therefore, PTP = TP, and it follows from (i) that Y is invariant under T. To prove that Y reduces T we must show that Y is invariant under T*. Since P is hermitian we have T*P = (PT)* = (TP)* = P*T* = PT*; i.e., T*P = PT*. But above we showed that PTP = TP. Applying this to T* we obtain T*P = PT*P. In view of (i), Y is now invariant under T*. Therefore, the closed linear subspace Y reduces the linear operator T.

Conversely, assume that Y reduces T. By part (i), TP = PTP and T*P = PT*P. Thus, PT = (T*P)* = (PT*P)* = PTP = TP; i.e., TP = PT.

To prove (iii) we first show that (I − P) is hermitian. We note that (I − P)* = I* − P* = I − P. Next, we show that (I − P) is idempotent. We observe that (I − P)² = (I − 2P + P²) = (I − 2P + P) = (I − P). Finally, we note that (I − P)x = x if and only if Px = 0, which implies that x ∈ Y⊥. Thus,

Y⊥ = {x ∈ X : (I − P)x = x}.

It follows from Theorem 7.5.14 that (I − P) is the orthogonal projection onto Y⊥. ∎

The next result follows immediately from part (iii) of the preceding theorem.
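A small sketch of the complementary projection I − P (our own illustration): for the coordinate projection onto the first component of R², the matrix I − P is again idempotent and hermitian, and its range is the orthogonal complement span{e₂}.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

P = [[1, 0], [0, 0]]          # orthogonal projection onto span{e1}
IP = [[(1 if i == j else 0) - P[i][j] for j in range(2)] for i in range(2)]   # I - P

assert matmul(IP, IP) == IP                                     # idempotent
assert IP == [[IP[j][i] for j in range(2)] for i in range(2)]   # symmetric (hermitian)
assert [IP[i][1] for i in range(2)] == [0, 1]                   # (I - P)e2 = e2
```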

7.5.23. Theorem. Let Y be a closed linear subspace of X, and let P be the orthogonal projection on Y. If ‖Px‖ = ‖x‖, then Px = x, and consequently x ∈ Y.

7.5.24. Exercise. Prove Theorem 7.5.23.

We leave the proof of the following result as an exercise.

7.5.25. Theorem. Let Y and Z be closed linear subspaces of X, and let P and Q be the orthogonal projections on Y and Z, respectively. Let 0 denote the zero transformation in B(X, X). The following are equivalent:

(i) Y ⊥ Z;
(ii) PQ = 0;
(iii) QP = 0;
(iv) P(Z) = {0}; and
(v) Q(Y) = {0}.

7.5.26. Exercise. Prove Theorem 7.5.25.

For the product of two orthogonal projections we have the following result.

7.5.27. Theorem. Let Y₁ and Y₂ be closed linear subspaces of X, and let P₁ and P₂ be the orthogonal projections onto Y₁ and Y₂, respectively. The product transformation P₁P₂ is an orthogonal projection if and only if P₁ commutes with P₂. In this case the range of P₁P₂ is Y₁ ∩ Y₂.

Proof. Assume that P₁P₂ = P₂P₁. Then (P₁P₂)* = P₂*P₁* = P₂P₁ = P₁P₂; i.e., if P₁P₂ = P₂P₁, then (P₁P₂)* = P₁P₂. Also, (P₁P₂)² = P₁P₂P₁P₂ = P₁P₁P₂P₂ = P₁P₂; i.e., if P₁P₂ = P₂P₁, then P₁P₂ is idempotent. Therefore, P₁P₂ is an orthogonal projection.

Conversely, assume that P₁P₂ is an orthogonal projection. Then (P₁P₂)* = P₂*P₁* = P₂P₁, and also (P₁P₂)* = P₁P₂. Hence, P₁P₂ = P₂P₁.

Finally, we must show that the range of P₁P₂ is equal to Y₁ ∩ Y₂. Assume that x ∈ ℛ(P₁P₂). Then P₁P₂x = x, because P₁P₂ is an orthogonal projection. Also, P₁P₂x = P₁(P₂x) ∈ Y₁, because any vector operated on by P₁ is in Y₁. Similarly, P₂P₁x = P₂(P₁x) ∈ Y₂. Now, by hypothesis, P₁P₂ = P₂P₁, and therefore P₁P₂x = P₂P₁x = x ∈ Y₁ ∩ Y₂. Thus, whenever x ∈ ℛ(P₁P₂), then x ∈ Y₁ ∩ Y₂. This implies that ℛ(P₁P₂) ⊂ Y₁ ∩ Y₂. To show that ℛ(P₁P₂) ⊃ Y₁ ∩ Y₂, assume that x ∈ Y₁ ∩ Y₂. Then P₁P₂x = P₁(P₂x) = P₁x = x ∈ ℛ(P₁P₂). Thus, Y₁ ∩ Y₂ ⊂ ℛ(P₁P₂). Therefore, ℛ(P₁P₂) = Y₁ ∩ Y₂. ∎
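A coordinate sketch of Theorem 7.5.27 in R³ (our own illustration): the projections onto span{e₁, e₂} and span{e₂, e₃} commute, and their product is the orthogonal projection onto the intersection span{e₂}.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def diag(*d):
    return [[d[i] if i == j else 0 for j in range(len(d))] for i in range(len(d))]

P1 = diag(1, 1, 0)    # projection onto span{e1, e2}
P2 = diag(0, 1, 1)    # projection onto span{e2, e3}

assert matmul(P1, P2) == matmul(P2, P1)   # the projections commute
Q = matmul(P1, P2)
assert Q == diag(0, 1, 0)                 # product projects onto span{e2}
assert matmul(Q, Q) == Q                  # and is itself a projection
```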

7.5.28. Theorem. Let Y and Z be closed linear subspaces of X, and let P and Q be the orthogonal projections onto Y and Z, respectively. The following are equivalent:

(i) P ≤ Q;
(ii) ‖Px‖ ≤ ‖Qx‖ for all x ∈ X;
(iii) Y ⊂ Z;
(iv) QP = P; and
(v) PQ = P.

Proof. Assume that P ≤ Q. Since P and Q are orthogonal projections, they are hermitian. For a hermitian operator, P ≥ 0 means (Px, x) ≥ 0 for all x ∈ X. If P ≤ Q, then (Px, x) ≤ (Qx, x) for all x ∈ X, or (P²x, x) ≤ (Q²x, x), or (Px, Px) ≤ (Qx, Qx), or ‖Px‖² ≤ ‖Qx‖², and hence ‖Px‖ ≤ ‖Qx‖ for all x ∈ X.

Next, assume that ‖Px‖ ≤ ‖Qx‖ for all x ∈ X. If x ∈ Y, then Px = x and

(x, x) = (Px, Px) = ‖Px‖² ≤ ‖Qx‖² ≤ ‖Q‖²‖x‖² ≤ ‖x‖² = (x, x),

and therefore ‖Qx‖ = ‖x‖. From Theorem 7.5.23 it now follows that Qx = x, and hence x ∈ Z. Thus, whenever x ∈ Y, then x ∈ Z, and Z ⊃ Y.

Now assume that Z ⊃ Y, and let y = Px, where x is any vector in X. Then QPx = Qy = y = Px for all x ∈ X, and QP = P.

Suppose now that QP = P. Then (QP)* = P*, or P*Q* = PQ = P* = P; i.e., PQ = P.

Finally, assume that PQ = P. For any x ∈ X we have (Px, x) = ‖Px‖² = ‖PQx‖² ≤ ‖P‖²‖Qx‖² = ‖Qx‖² = (Qx, Qx) = (Q²x, x) = (Qx, x); i.e., (Px, x) ≤ (Qx, x), from which we have P ≤ Q. ∎
We leave the proof of the next result as an exercise.

7.5.29. Theorem. Let Y₁ and Y₂ be closed linear subspaces of X, and let P₁ and P₂ be the orthogonal projections onto Y₁ and Y₂, respectively. The difference transformation P = P₁ − P₂ is an orthogonal projection if and only if P₂ ≤ P₁. The range of P is Y₁ ∩ Y₂⊥.

7.5.30. Exercise. Prove Theorem 7.5.29.

We close this section by considering some specific cases.


7.5.31. Example. Let R denote the transformation from E² into E² given in Example 4.10.48. That transformation is represented by the matrix

R = [ cos θ   −sin θ ]
    [ sin θ    cos θ ]

with respect to an orthonormal basis {e₁, e₂}. By direct computation we obtain

R* = [  cos θ   sin θ ]
     [ −sin θ   cos θ ].

It readily follows that R*R = RR* = I. Therefore, R is a linear transformation which is isometric, unitary, and normal. ■
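As a quick numerical sanity check of this example (a sketch in plain Python; the helper names are ours, not the book's), one can verify R*R = RR* = I for a sample angle:

```python
import math

def rotation(theta):
    # 2x2 rotation matrix of Example 7.5.31, stored as row tuples
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def transpose(m):
    return tuple(zip(*m))

def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

theta = 0.7
R = rotation(theta)
R_star = transpose(R)          # R is real, so R* is simply the transpose
RstarR = matmul(R_star, R)
RRstar = matmul(R, R_star)
# Both products equal the identity up to rounding, so R*R = RR* = I:
# R is isometric, unitary, and normal.
```

The identities hold exactly because cos²θ + sin²θ = 1 and the off-diagonal terms cancel.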


7.5.32. Exercise. Let X = L₂[0, ∞) and define the truncation operator P_T by y = P_T x, where

y(t) = x(t) for all 0 ≤ t ≤ T,
y(t) = 0    for all t > T.

Show that P_T is an orthogonal projection with range

ℛ(P_T) = {x ∈ X: x(t) = 0 for t > T}

and null space

𝔑(P_T) = {x ∈ X: x(t) = 0 for all t ≤ T}.

Additional examples of different types of operators are considered in Section 7.10.

7.6. THE SPECTRUM OF AN OPERATOR

In Chapter 4 we introduced and discussed eigenvalues and eigenvectors of linear transformations defined on finite-dimensional vector spaces. In the present section we continue this discussion in the setting of infinite-dimensional spaces.

Unless otherwise stated, X will denote a complex Banach space and I will denote the identity operator on X. However, in our first definition, X may be an arbitrary vector space over a field F.

7.6.1. Definition. Let T ∈ L(X, X). A scalar λ ∈ F is called an eigenvalue of T if there exists an x ∈ X such that x ≠ 0 and such that Tx = λx. Any vector x ≠ 0 satisfying the equation Tx = λx is called an eigenvector of T corresponding to the eigenvalue λ.
7.6.2. Definition. Let X be a complex Banach space and let T: X → X. The set of all λ ∈ F = C such that

(i) ℛ(T − λI) is dense in X;
(ii) (T − λI)⁻¹ exists; and
(iii) (T − λI)⁻¹ is continuous (i.e., bounded)

is called the resolvent set of T and is denoted by ρ(T). The complement of ρ(T) is called the spectrum of T and is denoted by σ(T).
The preceding definitions require some comments. First, note that if λ is an eigenvalue of T, there is an x ≠ 0 such that (T − λI)x = 0. From Theorem 3.4.32 this is true if and only if (T − λI) does not have an inverse. Hence, if λ is an eigenvalue of T, then λ ∈ σ(T). Note, however, that there are other ways that a complex number λ may fail to be in ρ(T). These possibilities are enumerated in the following definition.
7.6.3. Definition. The set of all eigenvalues of T is called the point spectrum of T. The set of all λ such that (T − λI)⁻¹ exists but ℛ(T − λI) is not dense in X is called the residual spectrum of T. The set of all λ such that (T − λI)⁻¹ exists and such that ℛ(T − λI) is dense in X but (T − λI)⁻¹ is not continuous is called the continuous spectrum. We denote these sets by Pσ(T), Rσ(T), and Cσ(T), respectively.

Clearly, σ(T) = Pσ(T) ∪ Cσ(T) ∪ Rσ(T). Furthermore, when X is finite dimensional, then σ(T) = Pσ(T). We summarize the preceding definition in the following table.
                                           ℛ(T − λI) dense in X    ℛ(T − λI) not dense in X
(T − λI)⁻¹ exists and is continuous        λ ∈ ρ(T)                λ ∈ Rσ(T)
(T − λI)⁻¹ exists but is not continuous    λ ∈ Cσ(T)               λ ∈ Rσ(T)
(T − λI)⁻¹ does not exist                  λ ∈ Pσ(T)               λ ∈ Pσ(T)

7.6.4. Table A. Characterization of the resolvent set and the spectrum of an operator

7.6.5. Example. Let X = l₂ be the Hilbert space of Example 6.11.9, let x = (ξ₁, ξ₂, ...) ∈ X, and define T ∈ B(X, X) by

Tx = (ξ₁, ½ξ₂, ⅓ξ₃, ...).

For each λ ∈ C we want to determine (a) whether (T − λI)⁻¹ exists; (b) if so, whether (T − λI)⁻¹ is continuous; and (c) whether ℛ(T − λI) = X.

First we consider the point spectrum of T. If Tx = λx, then (1/k − λ)ξₖ = 0, k = 1, 2, .... This holds for non-trivial x if and only if λ = 1/k for some k. Hence,

Pσ(T) = {1/k: k = 1, 2, ...}.

Next, assume that λ ∉ Pσ(T), so that (T − λI)⁻¹ exists, and let us investigate the continuity of (T − λI)⁻¹. We see that if y = (η₁, η₂, ...) ∈ ℛ(T − λI), then (T − λI)⁻¹y = x is given by

ξₖ = ηₖ/(1/k − λ) = kηₖ/(1 − λk).

Now if λ = 0, then

‖(T − λI)⁻¹y‖² = Σ_{k=1}^∞ k²|ηₖ|²,

and (T − λI)⁻¹ is not bounded and hence not continuous. On the other hand, if λ ≠ 0, then (T − λI)⁻¹ is continuous since |ξₖ| ≤ γ|ηₖ| for all k, where γ = sup_k |k/(1 − λk)| < ∞. Since ℛ(T − λI) can be shown to be dense in X whenever λ ∉ Pσ(T), we have Rσ(T) = ∅, 0 ∈ Cσ(T), and

ρ(T) = [Pσ(T) ∪ Cσ(T)]ᶜ. ■
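A finite truncation of this example can be explored numerically. The sketch below (plain Python, with our own naming) applies the diagonal rule (Tx)ₖ = ξₖ/k to a standard basis vector and notes that the inverse stretches eₖ by a factor k, mirroring the failure of continuity of (T − λI)⁻¹ at λ = 0:

```python
# Truncation of the operator Tx = (xi_1, xi_2/2, xi_3/3, ...) of Example 7.6.5.
n = 50

def T(x):
    return [x[k] / (k + 1) for k in range(len(x))]

# The k-th standard basis vector is an eigenvector with eigenvalue 1/k.
k = 5                                    # 1-based index of the basis vector
e_k = [1.0 if i == k - 1 else 0.0 for i in range(n)]
Te_k = T(e_k)
eigenvalue = 1.0 / k

residual = max(abs(Te_k[i] - eigenvalue * e_k[i]) for i in range(n))

# T^{-1} e_k = k e_k, so ||T^{-1} e_k|| / ||e_k|| = k is unbounded in k:
# no single constant bounds the inverse, as in the lambda = 0 case above.
inverse_ratio = float(k)
```

As k grows, the stretching factor k exceeds any bound, which is exactly why 0 lands in the continuous spectrum of the infinite-dimensional operator.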

7.6.6. Exercise. Let X = l₂, the Hilbert space of Example 6.11.9, let x = (ξ₁, ξ₂, ξ₃, ...), and define the right shift operator T_r: X → X and the left shift operator T_l: X → X by

T_r x = (0, ξ₁, ξ₂, ...)

and

T_l x = (ξ₂, ξ₃, ξ₄, ...),

respectively. Show that

ρ(T_r) = ρ(T_l) = {λ ∈ C: |λ| > 1},
Cσ(T_r) = Cσ(T_l) = {λ ∈ C: |λ| = 1},
Rσ(T_r) = {λ ∈ C: |λ| < 1},   Pσ(T_r) = ∅,
Pσ(T_l) = {λ ∈ C: |λ| < 1},   Rσ(T_l) = ∅.
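One half of this exercise can be seen numerically (a plain-Python sketch over finitely many coordinates, with names of our choosing): for |λ| < 1 the geometric sequence (λ, λ², λ³, ...) is an eigenvector of the left shift, while the right shift has no eigenvectors at all because its output always starts with 0.

```python
# Truncated shift operators of Exercise 7.6.6 acting on length-n lists.
def right_shift(x):
    return [0.0] + x[:-1]

def left_shift(x):
    return x[1:] + [0.0]

lam = 0.5                                  # any |lambda| < 1 works
n = 200
x = [lam ** k for k in range(1, n + 1)]    # x = (lam, lam^2, lam^3, ...)

y = left_shift(x)
# Except for the truncated final entry, T_l x = lam * x, so lam is an
# eigenvalue of the left shift (in l2 the identity is exact).
errors = [abs(y[k] - lam * x[k]) for k in range(n - 1)]
max_error = max(errors)

# The right shift pushes every coordinate down and inserts a leading 0,
# so T_r x = lam * x with lam != 0 forces x = 0: no eigenvalues.
z = right_shift(x)
```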

We now examine some of the properties of the resolvent set and the spectrum.

7.6.7. Theorem. Let T ∈ B(X, X). If |λ| > ‖T‖, then λ ∈ ρ(T) or, equivalently, if λ ∈ σ(T), then |λ| ≤ ‖T‖.

7.6.8. Exercise. Prove Theorem 7.6.7 (use Theorem 7.2.2).

7.6.9. Theorem. Let T ∈ B(X, X). Then ρ(T) is open and σ(T) is closed.

Proof. Since σ(T) is the complement of ρ(T), it is closed if and only if ρ(T) is open. Let λ₀ ∈ ρ(T). Then (T − λ₀I) has a continuous inverse. For arbitrary λ we now have

‖I − (T − λ₀I)⁻¹(T − λI)‖ = ‖(T − λ₀I)⁻¹(T − λ₀I) − (T − λ₀I)⁻¹(T − λI)‖
= ‖(T − λ₀I)⁻¹[(T − λ₀I) − (T − λI)]‖
= ‖(λ − λ₀)(T − λ₀I)⁻¹‖
= |λ − λ₀| ‖(T − λ₀I)⁻¹‖.

Now for |λ − λ₀| sufficiently small, we have

‖I − (T − λ₀I)⁻¹(T − λI)‖ = |λ − λ₀| ‖(T − λ₀I)⁻¹‖ < 1.

Now in Theorem 7.2.2 we showed that if T ∈ B(X, X), then T has a continuous inverse if ‖I − T‖ < 1. In our case it now follows that (T − λ₀I)⁻¹(T − λI) has a continuous inverse, and therefore (T − λI) has a continuous inverse whenever |λ − λ₀| is sufficiently small. This implies that λ ∈ ρ(T) and ρ(T) is open. Hence, σ(T) is closed. ■
For normal, hermitian, and isometric operators we have the following result.

7.6.10. Theorem. Let X be a Hilbert space, let T ∈ B(X, X), let λ be an eigenvalue of T, and let Tx = λx. Then

(i) if T is hermitian, then λ is real;
(ii) if T is isometric, then |λ| = 1;
(iii) if T is normal, then λ̄ is an eigenvalue of T* and T*x = λ̄x; and
(iv) if T is normal, if μ is an eigenvalue of T such that μ ≠ λ, and if Ty = μy, then x ⊥ y.

Proof. Without loss of generality, assume that x is a unit vector.

To prove (i), note that λ = λ‖x‖² = λ(x, x) = (λx, x) = (Tx, x), which is real by Theorem 7.4.5. Therefore, (Tx, x) = (x, Tx) = λ̄; i.e., λ = λ̄ and λ is real.

To verify (ii), note that if T is isometric, then ‖Tx‖ = ‖x‖ = 1, by Corollary 7.5.9. Since Tx = λx it follows that ‖λx‖ = 1, or |λ| ‖x‖ = 1, and hence |λ| = 1.

To prove (iii), assume that T is normal; i.e., T*T = TT*. Then

(T − λI)(T − λI)* = (T − λI)(T* − λ̄I)
= TT* − λ̄T − λT* + λλ̄I
= T*T − λ̄T − λT* + λλ̄I
= (T* − λ̄I)(T − λI)
= (T − λI)*(T − λI),

and (T − λI) is normal. Also, we can readily verify that ‖(T − λI)x‖ = ‖(T − λI)*x‖. Since (T − λI)x = 0, it follows that (T − λI)*x = 0, or (T* − λ̄I)x = 0, or T*x = λ̄x. Therefore, λ̄ is an eigenvalue of T* with eigenvector x.

To prove the last part, assume that λ ≠ μ and that T is normal. Then

(λ − μ)(x, y) = λ(x, y) − μ(x, y) = (λx, y) − (x, μ̄y)
= (Tx, y) − (x, T*y) = (Tx, y) − (Tx, y) = 0;

i.e., (λ − μ)(x, y) = 0. Since λ ≠ μ, we have x ⊥ y. ■
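Parts (iii) and (iv) can be illustrated in finite dimensions with NumPy (a sketch under our own construction of a normal matrix; none of the variable names come from the text):

```python
import numpy as np

# Build a normal matrix T = Q D Q* with unitary Q and distinct (partly
# non-real) eigenvalues on the diagonal of D.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(M)                       # Q is unitary
D = np.diag(np.array([1.0, 2.0, -1.0, 3.0j]))
T = Q @ D @ Q.conj().T

normality_gap = np.abs(T @ T.conj().T - T.conj().T @ T).max()

# Part (iv): eigenvectors for distinct eigenvalues are orthogonal, so the
# Gram matrix of normalized eigenvectors is (nearly) the identity.
w, V = np.linalg.eig(T)
V = V / np.linalg.norm(V, axis=0)
G = V.conj().T @ V
max_off_diag = np.abs(G - np.diag(np.diag(G))).max()

# Part (iii): the eigenvalues of T* are the conjugates of those of T.
w_star = np.linalg.eigvals(T.conj().T)
conj_gap = np.abs(np.sort_complex(w_star) - np.sort_complex(w.conj())).max()
```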

The next two results indicate what happens to the spectrum of an operator T when it is subjected to various elementary transformations.

7.6.11. Theorem. Let T ∈ B(X, X), and let p(T) denote a polynomial in T. Then

σ(p(T)) = p(σ(T)) = {p(λ): λ ∈ σ(T)}.

7.6.12. Exercise. Prove Theorem 7.6.11.

7.6.13. Theorem. Let T ∈ B(X, X) be a bijective mapping. Then

σ(T⁻¹) = [σ(T)]⁻¹ ≜ {1/λ: λ ∈ σ(T)}.

Proof. Since T⁻¹ exists, 0 ∉ σ(T), and so the definition of [σ(T)]⁻¹ makes sense. Now for any λ ≠ 0, consider the identity

(T⁻¹ − (1/λ)I) = −(1/λ)T⁻¹(T − λI).

It follows that if λ ∈ ρ(T), then (T⁻¹ − (1/λ)I) has a continuous inverse; i.e., 1/λ ∈ ρ(T⁻¹). Equivalently, 1/λ ∈ σ(T⁻¹) implies that λ ∈ σ(T). In other words, σ(T⁻¹) ⊂ [σ(T)]⁻¹. To prove that [σ(T)]⁻¹ ⊂ σ(T⁻¹), we proceed similarly, interchanging the roles of T and T⁻¹. ■
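In finite dimensions Theorem 7.6.13 is easy to check numerically (a NumPy sketch with a well-conditioned matrix of our own choosing):

```python
import numpy as np

# Eigenvalues of A^{-1} are the reciprocals of the eigenvalues of A.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 5.0 * np.eye(5)   # shifted to keep A invertible

eig_A = np.linalg.eigvals(A)
eig_A_inv = np.linalg.eigvals(np.linalg.inv(A))
reciprocals = 1.0 / eig_A

# For each reciprocal 1/lambda, find the distance to the nearest eigenvalue
# of A^{-1}; all distances should be essentially zero.
max_err = max(np.abs(eig_A_inv - r).min() for r in reciprocals)
```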

We now introduce the concept of the approximate point spectrum of an operator.

7.6.14. Definition. Let T ∈ B(X, X). Then λ ∈ C is said to belong to the approximate point spectrum of T if for every ε > 0 there exists a non-zero vector x ∈ X such that ‖Tx − λx‖ < ε‖x‖. We denote the approximate point spectrum by π(T). If λ ∈ π(T), then λ is called an approximate eigenvalue of T.

Clearly, Pσ(T) ⊂ π(T). Other properties of π(T) are as follows.
7.6.15. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). Then π(T) ⊂ σ(T).

Proof. Assume that λ ∉ σ(T). Then (T − λI) has a continuous inverse, and for any x ∈ X we have

‖x‖ = ‖(T − λI)⁻¹(T − λI)x‖ ≤ ‖(T − λI)⁻¹‖ ‖(T − λI)x‖.

Now let ε = 1/‖(T − λI)⁻¹‖. Then we have, from above, ‖Tx − λx‖ ≥ ε‖x‖ for every x ∈ X, and λ ∉ π(T). Therefore, σ(T) ⊃ π(T). ■

We leave the proof of the next result as an exercise.

7.6.16. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X) be a normal operator. Then π(T) = σ(T).

7.6.17. Exercise. Prove Theorem 7.6.16.

We can use the approximate point spectrum to establish some of the properties of the spectrum of hermitian operators.

7.6.18. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X) be hermitian. Then

(i) σ(T) is a subset of the real line;
(ii) ‖T‖ = sup {|λ|: λ ∈ σ(T)}; and
(iii) σ(T) is not empty and either +‖T‖ or −‖T‖ belongs to σ(T).

Proof. To prove (i), note that if T is hermitian it is normal, and σ(T) = π(T). Let λ ∈ π(T), and assume that λ ≠ λ̄; i.e., that λ is complex. Then for any x ≠ 0 we have

0 < |λ − λ̄| ‖x‖² = |λ − λ̄| (x, x) = |((T − λ̄I)x, x) − ((T − λI)x, x)|
≤ |((T − λ̄I)x, x)| + |((T − λI)x, x)| = 2|((T − λI)x, x)|
≤ 2‖(T − λI)x‖ ‖x‖;

i.e.,

0 < |λ − λ̄| ‖x‖ ≤ 2‖(T − λI)x‖

for all x ∈ X. But this implies that λ ∉ π(T), contrary to the original assumption. Hence, it must follow that λ = λ̄, which implies that λ is real.

To prove (ii), first note that ‖T‖ ≥ sup {|λ|: λ ∈ σ(T)} for any T ∈ B(X, X) (see Theorem 7.6.7). To show that equality holds if T is hermitian, we first show that ‖T‖² ∈ π(T²) = σ(T²). For all real λ and all x ∈ X we can write

‖T²x − λ²x‖² = (T²x − λ²x, T²x − λ²x)
= (T²x, T²x) − (T²x, λ²x) − (λ²x, T²x) + (λ²x, λ²x).

Since (T²x, x) = (Tx, T*x) = (Tx, Tx), we now have

‖T²x − λ²x‖² = (T²x, T²x) − 2λ²(Tx, Tx) + λ⁴(x, x)
= ‖T²x‖² − 2λ²‖Tx‖² + λ⁴‖x‖².          (7.6.19)

Now let {xₙ} be a sequence of unit vectors such that ‖Txₙ‖ → ‖T‖. If λ = ‖T‖, then we have, from Eq. (7.6.19),

‖T²xₙ − λ²xₙ‖² = ‖T²xₙ‖² − 2λ²‖Txₙ‖² + λ⁴
≤ (‖T‖ ‖Txₙ‖)² − 2λ²‖Txₙ‖² + λ⁴
= λ²‖Txₙ‖² − 2λ²‖Txₙ‖² + λ⁴
= λ⁴ − λ²‖Txₙ‖² → 0 as n → ∞;

i.e., ‖T²xₙ − λ²xₙ‖ → 0 as n → ∞, and thus λ² ∈ π(T²) = σ(T²). Using Theorems 7.6.11 and 7.6.15 and the fact that λ² ∈ π(T²), it now follows that

‖T‖ = sup {|λ|: λ ∈ σ(T)}.

The proof of (iii) is left as an exercise. ■

7.6.20. Exercise. Prove part (iii) of Theorem 7.6.18.

7.6.21. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). Then π(T) is closed.

7.6.22. Exercise. Prove Theorem 7.6.21.

In the following we let T ∈ B(X, X) and λ ∈ C, and we let 𝔑_λ(T) be the null space of T − λI; i.e.,

𝔑_λ(T) ≜ {x ∈ X: (T − λI)x = 0} = 𝔑(T − λI).          (7.6.23)

It follows from Theorem 7.1.26 that 𝔑_λ(T) is a closed linear subspace of X.
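The conclusions of Theorem 7.6.18 are easy to observe for hermitian matrices (a NumPy sketch; the random matrix is our own choice):

```python
import numpy as np

# For a hermitian matrix: the spectrum is real, and the operator norm
# equals max |lambda| (parts (i) and (ii) of Theorem 7.6.18).
rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
T = (B + B.conj().T) / 2                     # hermitian part of B

w = np.linalg.eigvals(T)
max_imag = np.abs(w.imag).max()              # (i): essentially zero
operator_norm = np.linalg.norm(T, 2)         # largest singular value
spectral_radius = np.abs(w).max()            # sup{|lambda|: lambda in sigma(T)}
gap = abs(operator_norm - spectral_radius)   # (ii): essentially zero
```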
For the next result, recall Definition 3.7.9 for the meaning of an invariant subspace.

7.6.24. Theorem. Let X be a Hilbert space, let λ ∈ C, and let S, T ∈ B(X, X). If ST = TS, then 𝔑_λ(T) is invariant under S.

Proof. Let x ∈ 𝔑_λ(T). We want to show that Sx ∈ 𝔑_λ(T); i.e., TSx = λSx. Since x ∈ 𝔑_λ(T), we have Tx = λx. Thus, STx = λSx. Since ST = TS, we have TSx = λSx. ■

7.6.25. Corollary. 𝔑_λ(T) is invariant under T.

Proof. Since TT = TT, the result follows from Theorem 7.6.24. ■

For the next result, recall Definition 7.5.19.

7.6.26. Theorem. Let X be a Hilbert space, let λ, μ ∈ C, and let T ∈ B(X, X). If T is normal, then

(i) 𝔑_λ(T) = 𝔑_λ̄(T*);
(ii) 𝔑_λ(T) ⊥ 𝔑_μ(T) if λ ≠ μ; and
(iii) 𝔑_λ(T) reduces T.

Proof. The proofs of parts (i) and (ii) are left as an exercise.

To prove (iii), we see that 𝔑_λ(T) is invariant under T from Corollary 7.6.25. To prove that 𝔑_λ(T)⊥ is invariant under T, let y ∈ 𝔑_λ(T)⊥. We want to show that (x, Ty) = 0 for all x ∈ 𝔑_λ(T). If x ∈ 𝔑_λ(T), we have Tx = λx, and so, by part (i), T*x = λ̄x. Now (x, Ty) = (T*x, y) = (λ̄x, y) = λ̄(x, y) = 0. This implies that Ty ∈ 𝔑_λ(T)⊥, and so 𝔑_λ(T)⊥ is invariant under T. This completes the proof of part (iii). ■

7.6.27. Exercise. Prove parts (i) and (ii) of Theorem 7.6.26.

Before considering the last result of this section, we make the following definition.

7.6.28. Definition. A family of closed linear subspaces in a Hilbert space X is said to be total if the only vector y ∈ X orthogonal to each member of the family is y = 0.

7.6.29. Theorem. Let X be a Hilbert space and let S, T ∈ B(X, X). If the family of closed linear subspaces of X given by {𝔑_λ(T): λ ∈ C} is total, then TS = ST if and only if 𝔑_λ(T) is invariant under S for all λ ∈ C.

Proof. The necessity follows from Theorem 7.6.24. To prove sufficiency, assume that 𝔑_λ(T) is invariant under S for all λ ∈ C. Let 𝔑 denote the null space of TS − ST; i.e., 𝔑 = 𝔑(TS − ST). If x ∈ 𝔑_λ(T), then Sx ∈ 𝔑_λ(T) by hypothesis. Hence, TSx = T(Sx) = λ(Sx) = S(λx) = S(Tx) = STx for all x ∈ 𝔑_λ(T). Thus, (TS − ST)x = 0 for any x ∈ 𝔑_λ(T), and so 𝔑_λ(T) ⊂ 𝔑. If there is a vector y ⊥ 𝔑, then it follows that y ⊥ 𝔑_λ(T) for all λ ∈ C. By hypothesis, the family {𝔑_λ(T): λ ∈ C} is total, and thus y = 0. It follows that 𝔑⊥ = {0}, and since 𝔑 is a closed linear subspace of X, 𝔑 = 𝔑⊥⊥ = {0}⊥ = X; i.e., (TS − ST)x = 0 for all x ∈ X. Hence, TS = ST. ■

7.7. COMPLETELY CONTINUOUS OPERATORS

Throughout this section X is a normed linear space over the field of complex numbers C.

Recall that a set Y ⊂ X is bounded if there is a constant k such that for all x ∈ Y we have ‖x‖ ≤ k. Also, recall that a set Y is relatively compact if each sequence {xₙ} of elements chosen from Y contains a convergent subsequence (see Definition 5.6.30 and Theorem 5.6.31). When Y contains only a finite number of elements, then any sequence constructed from Y must include some elements infinitely many times, and thus it contains a convergent subsequence. From this it follows that any set containing a finite number of elements is relatively compact. Every relatively compact set is contained in a compact set and hence is bounded. For the finite-dimensional case it is also true that every bounded set is relatively compact (e.g., in Rⁿ the Bolzano–Weierstrass theorem guarantees this). However, in the infinite-dimensional case it does not follow that every bounded set is also relatively compact.

In analysis and in applications, linear operators which transform bounded sets into relatively compact sets are of great importance. Such operators are called completely continuous operators or compact operators. We give the following formal definition.

7.7.1. Definition. Let X and Y be normed linear spaces, and let T be a linear transformation with domain X and range in Y. Then T is said to be completely continuous or compact if for each bounded sequence {xₙ} in X, the sequence {Txₙ} contains a subsequence converging to some element y ∈ Y.

We have the following equivalent characterization of a completely continuous operator.

7.7.2. Theorem. Let X and Y be normed linear spaces, and let T ∈ B(X, Y). Then T is completely continuous if and only if the sequence {Txₙ} contains a subsequence convergent to some y ∈ Y for all sequences {xₙ} such that ‖xₙ‖ ≤ 1 for all n.

7.7.3. Exercise. Prove Theorem 7.7.2.

Clearly, if an operator T is completely continuous, then it is continuous. On the other hand, the fact that T may be continuous does not ensure that it is completely continuous. We now cite some examples.

7.7.4. Example. Let T: X → X be the zero operator; i.e., Tx = 0 for all x ∈ X. Then T is clearly completely continuous. ■

7.7.5. Example. Let X = C[a, b], and let ‖·‖_∞ be the norm on C[a, b] as defined in Example 6.1.9. Let k: [a, b] × [a, b] → R be a real-valued function continuous on the square a ≤ s ≤ b, a ≤ t ≤ b. Defining T: X → X by

[Tx](s) = ∫ₐᵇ k(s, t)x(t) dt

for all x ∈ X, we saw in Example 7.1.20 that T is a bounded linear operator. We now show that T is completely continuous.

Let {xₙ} be a bounded sequence in X; i.e., there is a K > 0 such that ‖xₙ‖_∞ ≤ K for all n. It readily follows that if yₙ = Txₙ, then ‖yₙ‖ ≤ γ₀‖xₙ‖, where γ₀ = sup_{a≤s≤b} ∫ₐᵇ |k(s, t)| dt (see Example 7.1.20). We now show that {yₙ} is an equicontinuous set of functions on [a, b] (see Definition 5.8.11). Let ε > 0. Then, because of the uniform continuity of k on [a, b] × [a, b], there is a δ > 0 such that |k(s₁, t) − k(s₂, t)| < ε/(K(b − a)) if |s₁ − s₂| < δ for every t ∈ [a, b]. Thus,

|yₙ(s₁) − yₙ(s₂)| ≤ ∫ₐᵇ |k(s₁, t) − k(s₂, t)| |xₙ(t)| dt < ε

for all n and all s₁, s₂ such that |s₁ − s₂| < δ. This implies the set {yₙ} is equicontinuous, and so by the Arzelà–Ascoli theorem (Theorem 5.8.12), the set {yₙ} is relatively compact in C[a, b]; i.e., it has a convergent subsequence. This implies that T is completely continuous.

It can be shown that if X = L₂[a, b] and if T is the Fredholm operator defined in Example 7.3.11, then T is also a completely continuous operator. ■
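The complete continuity of such kernel operators has a visible finite-dimensional signature. In the NumPy sketch below (the smooth kernel k(s, t) = exp(−(s − t)²) on [0, 1] and the crude Riemann-sum discretization are our own choices), the singular values of the discretized operator decay rapidly, so the operator is well approximated by operators of small finite rank:

```python
import numpy as np

# n-point Riemann-sum discretization of (Tx)(s) = integral_0^1 k(s, t) x(t) dt
# with the smooth kernel k(s, t) = exp(-(s - t)^2).
n = 200
s = np.linspace(0.0, 1.0, n)
K = np.exp(-np.subtract.outer(s, s) ** 2) / n    # kernel values times weight 1/n

singular_values = np.linalg.svd(K, compute_uv=False)
# Rapid decay: almost all of the operator's action lives in a few modes.
decay_ratio = singular_values[50] / singular_values[0]
```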

The next result provides us with an example of a continuous linear transformation which is not completely continuous.

7.7.6. Theorem. Let I ∈ B(X, X) denote the identity operator on X. Then I is completely continuous if and only if X is finite dimensional.

Proof. The proof is an immediate consequence of Theorem 6.6.10. ■

We now consider some of the general properties of completely continuous operators.

7.7.7. Theorem. Let X and Y be normed linear spaces, let S, T ∈ B(X, Y) be completely continuous operators, and let α, β ∈ C. Then the operator (αS + βT) is completely continuous.

Proof. Given a sequence {xₙ} with ‖xₙ‖ ≤ 1, there is a subsequence {xₙₖ} such that the sequence {Sxₙₖ} has a limit u; i.e., Sxₙₖ → u. From the sequence {xₙₖ} we pick another subsequence {xₙₖⱼ} such that Txₙₖⱼ → v. Then

(αS + βT)xₙₖⱼ = αSxₙₖⱼ + βTxₙₖⱼ → αu + βv

as nₖⱼ → ∞. ■
We leave the proofs of the next results as an exercise.


7.7.8. Theorem. L e t T E B(X, X ) be completely continuous. Let Y be a
closed linear subspace of X which is invariant under T. Let T t be the restriction of T to .Y Then T t E B(Y, )Y and T t is completely continuous.
7.7.9. Exercise.

Prove Theorem 7.7.8.

7.7.10. Theorem. L e t T E B(X, X ) be a completely continuous operator,


and let S E B(X , X ) be any bounded linear operator. Then ST and TS are
completely continuous.
7.7.11. Exercise.

Prove Theorem 7.7.10.

7.7.12. Corollary. Let X


B(X , )Y and S E B( ,Y X).
pletely continuous.
7.7.13.

Exercise.

and Y be normed linear spaces, and let T E


If T is completely continuous, then ST is com-

Prove Corollary 7.7.12.

7.7.14. Example. A consequence of the above corollary is that if T ∈ B(X, X) is completely continuous and X is infinite dimensional, then T cannot be a bijective mapping of X onto X. For, suppose T were bijective. Then we would have T⁻¹T = I. By the Banach inverse theorem (see Theorem 7.2.6) T⁻¹ would then be continuous, and by the preceding theorem the identity mapping would be completely continuous. However, according to Theorem 7.7.6, this is possible only when X is finite dimensional.

Pursuing this example further, let X = C[a, b] with ‖·‖_∞ as defined in Example 6.1.9. Let T: X → X be defined by Tx(t) = ∫ₐᵗ x(τ) dτ for a ≤ t ≤ b and x ∈ X. It is easily shown that T is a completely continuous operator on X. It is, however, not bijective, since ℛ(T) is the family of all functions which are continuously differentiable in X, and thus ℛ(T) is clearly a proper subset of X. The operator T is injective, since Tx = 0 implies x = 0. The inverse T⁻¹ is given by T⁻¹y(t) = dy(t)/dt for y ∈ ℛ(T) and a ≤ t ≤ b. We saw in Example 5.7.4 that T⁻¹ is not continuous. ■
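This phenomenon can be watched numerically. A crude discretization of the integration operator on [0, 1] (a NumPy sketch; the grid and quadrature are our choices) is a lower-triangular matrix whose smallest singular value shrinks as the grid is refined, so the norms of the discrete inverses blow up, echoing the unboundedness of T⁻¹ = d/dt:

```python
import numpy as np

# Left-endpoint Riemann discretization of (Tx)(t) = integral_0^t x(tau) dtau
# on [0, 1]: a lower-triangular averaging matrix.
def volterra(n):
    return np.tril(np.ones((n, n))) / n

norms_of_inverse = []
for n in (50, 100, 200):
    sv = np.linalg.svd(volterra(n), compute_uv=False)
    norms_of_inverse.append(1.0 / sv[-1])   # = ||V_n^{-1}|| in the 2-norm

# The discrete inverse norms grow without bound as the grid is refined.
```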
In our next result we require the following definition.

7.7.15. Definition. Let X and Y be normed linear spaces, and let T ∈ B(X, Y). The operator T is said to be finite dimensional if T(X) is finite dimensional; i.e., the range of T is finite dimensional.

7.7.16. Theorem. Let X and Y be normed linear spaces, and let T ∈ B(X, Y). If T is a finite-dimensional operator, then it is a completely continuous operator.

Proof. Let {xₙ} be a sequence in X such that ‖xₙ‖ ≤ 1 for all n. Then {Txₙ} is a bounded sequence in T(X). It follows from Theorem 6.6.10 that the set {Txₙ} is relatively compact, and as such this set has a convergent subsequence in T(X). It follows from Theorem 7.7.2 that T is completely continuous. ■

The proof of the next result utilizes what is called the diagonalization process.

7.7.17. Theorem. Let X and Y be Banach spaces, and let {Tₙ} be a sequence of completely continuous operators mapping X into Y. If the sequence {Tₙ} converges in norm to an operator T, then T is completely continuous.

Proof. Let {xₙ} be an arbitrary sequence in X with ‖xₙ‖ ≤ 1. We must show that the sequence {Txₙ} contains a convergent subsequence.

By assumption, T₁ is a completely continuous operator, and thus we can select a convergent subsequence from the sequence {T₁xₙ}. Let

x₁₁, x₁₂, x₁₃, ..., x₁ₙ, ...

denote the inverse images of the members of this convergent subsequence. Next, let us apply T₂ to each member of the above subsequence. Since T₂ is completely continuous, we can again select a convergent subsequence from the sequence {T₂x₁ₙ}. The inverse images of the terms of this sequence are

x₂₁, x₂₂, x₂₃, ..., x₂ₙ, ....

Continuing this process we can generate the array

x₁₁, x₁₂, x₁₃, ...
x₂₁, x₂₂, x₂₃, ...
x₃₁, x₃₂, x₃₃, ...
..................

Using this array, let us now form the diagonal sequence

x₁₁, x₂₂, x₃₃, ....

Now each of the operators T₁, T₂, T₃, ..., Tₙ, ... transforms this sequence into a convergent sequence. To show that T is completely continuous we must show that T also transforms this sequence into a convergent sequence. Now

‖Txₙₙ − Txₘₘ‖ ≤ ‖Txₙₙ − Tₖxₙₙ‖ + ‖Tₖxₙₙ − Tₖxₘₘ‖ + ‖Tₖxₘₘ − Txₘₘ‖;

i.e.,

‖Txₙₙ − Txₘₘ‖ ≤ ‖T − Tₖ‖(‖xₙₙ‖ + ‖xₘₘ‖) + ‖Tₖxₙₙ − Tₖxₘₘ‖.

Since the sequence {Tₖxₙₙ} converges, we can choose m, n > N such that ‖Tₖxₙₙ − Tₖxₘₘ‖ < ε/2, and also we can choose k so that ‖T − Tₖ‖ < ε/4. We now have

‖Txₙₙ − Txₘₘ‖ < ε

whenever m, n > N, and {Txₙₙ} is a Cauchy sequence. Since Y is a complete space it follows that this sequence converges in Y, and by Theorem 7.7.2 the desired result follows. ■

Theorem 7.7.7 implies that the family of completely continuous operators forms a linear subspace of B(X, Y). The preceding theorem states that if Y is complete, then this linear subspace is closed.
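A concrete instance in coordinates (a NumPy sketch; the diagonal operator is our own choice): T = diag(1, 1/2, 1/3, ...) is the norm limit of its finite-rank truncations, which are completely continuous by Theorem 7.7.16, so Theorem 7.7.17 would yield that T is completely continuous.

```python
import numpy as np

n = 100
d = 1.0 / np.arange(1, n + 1)                 # diag(1, 1/2, 1/3, ...)
T = np.diag(d)

def truncation(m):
    """Finite-rank operator T_m keeping only the first m diagonal entries."""
    dm = d.copy()
    dm[m:] = 0.0
    return np.diag(dm)

# ||T - T_m|| = 1/(m + 1) -> 0: the truncations converge to T in norm.
errors = [np.linalg.norm(T - truncation(m), 2) for m in (5, 10, 50)]
```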
7.7.18. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). Then

(i) T is completely continuous if and only if T*T is completely continuous; and
(ii) T is completely continuous if and only if T* is completely continuous.

Proof. We prove (i) and leave the proof of (ii) as an exercise. Assume that T is completely continuous. It then follows from Theorem 7.7.10 that T*T is completely continuous.

Conversely, assume that T*T is completely continuous, and let {xₙ} be a sequence in X such that ‖xₙ‖ ≤ 1. It follows that there is a subsequence {xₙₖ} such that T*Txₙₖ → x ∈ X as nₖ → ∞. Now

‖Txₙⱼ − Txₙₖ‖² = ‖T(xₙⱼ − xₙₖ)‖² = (T(xₙⱼ − xₙₖ), T(xₙⱼ − xₙₖ))
= (T*T(xₙⱼ − xₙₖ), xₙⱼ − xₙₖ) ≤ ‖T*T(xₙⱼ − xₙₖ)‖ ‖xₙⱼ − xₙₖ‖
≤ 2‖T*Txₙⱼ − T*Txₙₖ‖ → 0

as nⱼ, nₖ → ∞. Thus, {Txₙₖ} is a Cauchy sequence and so it is convergent. It follows from Theorem 7.7.2 that T is completely continuous. ■

7.7.19. Exercise. Prove part (ii) of Theorem 7.7.18.

In the remainder of this section we turn our attention to the properties of eigenvalues of completely continuous operators.

7.7.20. Theorem. Let X be a Hilbert space, let T ∈ B(X, X), and let λ ∈ C. If T is completely continuous and if λ ≠ 0, then

𝔑_λ(T) = {x: Tx = λx}

is finite dimensional.

Proof. The proof is by contradiction. Assume that 𝔑_λ(T) is not finite dimensional. Then there is an orthonormal infinite sequence x₁, x₂, ..., xₙ, ... in 𝔑_λ(T), and

‖Txₙ − Txₘ‖² = ‖λxₙ − λxₘ‖² = |λ|² ‖xₙ − xₘ‖² = 2|λ|²;

i.e., ‖Txₙ − Txₘ‖ = √2 |λ| ≠ 0 for all m ≠ n. Therefore, no subsequence of {Txₙ} can be a Cauchy sequence, and hence no subsequence of {Txₙ} can converge. This completes the proof. ■
In the next result π(T) denotes the approximate point spectrum of T.

7.7.21. Theorem. Let X be a Hilbert space, let T ∈ B(X, X), and let λ ∈ C. If T is completely continuous, if λ ≠ 0, and if λ ∈ π(T), then λ is an eigenvalue of T.

Proof. For each positive integer n there is an xₙ ∈ X such that ‖Txₙ − λxₙ‖ < (1/n)‖xₙ‖, since λ ∈ π(T). We may assume that ‖xₙ‖ = 1. Since T is completely continuous, there is a subsequence of {xₙ}, say {xₙₖ}, such that {Txₙₖ} is convergent. Let lim Txₙₖ = y ∈ X. It now follows that ‖y − λxₙₖ‖ → 0 as nₖ → ∞; i.e., λxₙₖ → y. Now ‖y‖ ≠ 0, because ‖y‖ = lim ‖λxₙₖ‖ = |λ| lim ‖xₙₖ‖ = |λ| ≠ 0. By the continuity of T, we now have

Ty = T(lim λxₙₖ) = lim T(λxₙₖ) = λ lim Txₙₖ = λy.

Hence, Ty = λy, y ≠ 0. Thus, λ is an eigenvalue of T and y is the corresponding eigenvector. ■

The proof of the next result is an immediate consequence of Theorems 7.6.16 and 7.7.21.

7.7.22. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X) be completely continuous and normal. If λ ∈ σ(T) and λ ≠ 0, then λ is an eigenvalue of T.

7.7.23. Exercise. Prove Theorem 7.7.22.

The above theorem states that, with the possible exception of λ = 0, the spectrum of a completely continuous normal operator consists entirely of eigenvalues; i.e., if λ ≠ 0, either λ ∈ Pσ(T) or λ ∈ ρ(T).

7.7.24. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). If T is completely continuous and hermitian, then T has an eigenvalue λ with |λ| = ‖T‖.

Proof. The proof follows directly from part (iii) of Theorem 7.6.18 and Theorem 7.7.22. ■
7.7.25. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). If T is normal and completely continuous, then T has at least one eigenvalue.

Proof. If T = 0, then λ = 0 clearly satisfies the conclusion of the theorem. So let us assume that T ≠ 0. Also, if T = T*, the conclusion of the theorem follows from Theorem 7.7.24. So let us assume that T ≠ T*. Let U = ½(T + T*) and V = (1/2i)(T − T*). It follows from Theorem 7.4.15 that U and V are hermitian. Furthermore, by Theorem 7.5.4 we have UV = VU. From Theorems 7.7.7 and 7.7.18, U and V are completely continuous. By assumption, V ≠ 0. By the preceding theorem, V has a non-zero eigenvalue which we shall call β. It follows from Theorem 7.1.26 that 𝔑_β(V) = 𝔑(V − βI) ≜ N is a closed linear subspace of X. Since UV = VU, Theorem 7.6.24 implies that N is invariant under U. Now let U₁ be the restriction of U to the linear subspace N. It follows that U₁ is completely continuous by Theorem 7.7.8. It is readily verified that U₁ is a hermitian operator on the inner product subspace N (see Eq. (3.6.21)). Hence, U₁ is completely continuous and hermitian. This implies that there is an α ∈ C and an x ∈ N such that x ≠ 0 and U₁x = αx. This means Ux = αx. Now since x ∈ N, we must have Vx = βx. It follows that λ = α + iβ is an eigenvalue of T with corresponding eigenvector x, since Tx = [U + iV]x = αx + iβx = (α + iβ)x = λx. This completes the proof. ■

We now state and prove the last result of this section.

7.7.26. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). If T is normal and completely continuous, then T has an eigenvalue λ such that |λ| = ‖T‖.

Proof. Let S = T*T. Then S is hermitian and completely continuous by Theorem 7.7.18. Also, S ≥ 0 because (Sx, x) = (T*Tx, x) = (Tx, Tx) = ‖Tx‖² ≥ 0. This last condition implies that S has no negative eigenvalues. Specifically, if λ is an eigenvalue of S, then there is an x ≠ 0 in X such that Sx = λx. Now

0 ≤ (Sx, x) = (λx, x) = λ(x, x) = λ‖x‖²,

and since ‖x‖ ≠ 0, we have λ ≥ 0. By Theorem 7.7.24, S has an eigenvalue μ, where μ = ‖S‖ = ‖T*T‖ = ‖T‖². Now let N ≜ 𝔑(S − μI) = 𝔑_μ(S), and note that N contains a non-zero vector. Since T is normal, TS = T(T*T) = (TT*)T = ST. Similarly, we have T*S = ST*. By Theorem 7.6.24, N is invariant under T and under T*. By Theorem 7.5.6 this means T remains normal when its domain of definition is restricted to N. By Theorem 7.7.25, there is a λ ∈ C and a vector x ≠ 0 in N such that Tx = λx, and thus T*x = λ̄x. Now since Sx = T*Tx = T*(λx) = λT*x = λλ̄x = |λ|²x for this x ≠ 0, and since Sx = μx for all x ∈ N, it follows that |λ|² = μ = ‖S‖ = ‖T*T‖ = ‖T‖². Therefore, |λ| = ‖T‖, and λ is an eigenvalue of T. ■
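For matrices, Theorem 7.7.26 says a normal matrix has an eigenvalue whose modulus equals its operator (2-)norm, a property that fails for non-normal matrices. A NumPy sketch (the matrices below are our own constructions):

```python
import numpy as np

# Normal case: T = Q D Q* with unitary Q; some eigenvalue attains ||T||.
rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
Q, _ = np.linalg.qr(M)                        # unitary
D = np.diag(np.array([2.0 + 1j, -1.0, 0.5j, 3.0, -2.5]))
T = Q @ D @ Q.conj().T

gap_normal = abs(np.linalg.norm(T, 2) - np.abs(np.linalg.eigvals(T)).max())

# Non-normal counterexample: a nilpotent Jordan block has spectrum {0}
# but operator norm 1.
J = np.array([[0.0, 1.0], [0.0, 0.0]])
gap_jordan = abs(np.linalg.norm(J, 2) - np.abs(np.linalg.eigvals(J)).max())
```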

7.8. THE SPECTRAL THEOREM FOR COMPLETELY CONTINUOUS NORMAL OPERATORS

The main result of this section is referred to as the spectral theorem (for completely continuous operators). Some of the direct consequences of this theorem provide an insight into the geometric properties of normal operators. Results such as the spectral theorem play a central role in applications. In Section 7.10 we will apply this theorem to integral equations.

Throughout this section, X is a complex Hilbert space. We require some preliminary results.

7.8.1. Theorem. Let T ∈ B(X, X) be completely continuous and normal. For each ε > 0, let A_ε be the annulus in the complex plane defined by

A_ε = {λ ∈ C: ε < |λ| ≤ ‖T‖}.

Then the number of eigenvalues of T contained in A_ε is finite.

Proof. To the contrary, let us assume that for some ε > 0 the annulus A_ε contains an infinite number of eigenvalues. By the Bolzano–Weierstrass theorem, there is a point of accumulation λ₀ of the eigenvalues in the annulus A_ε. Let {λₙ} be a sequence of distinct eigenvalues such that λₙ → λ₀ as n → ∞, and let Txₙ = λₙxₙ, ‖xₙ‖ = 1. Since T is a completely continuous operator, there is a subsequence {xₙₖ} of {xₙ} for which the sequence {Txₙₖ} converges to an element u ∈ X; i.e., Txₙₖ → u as nₖ → ∞. Thus, since Txₙₖ = λₙₖxₙₖ, we have λₙₖxₙₖ → u. But 1/λₙₖ → 1/λ₀ because |λ₀| ≥ ε > 0. Therefore xₙₖ → (1/λ₀)u. But the xₙₖ are distinct eigenvectors corresponding to distinct eigenvalues. By part (iv) of Theorem 7.6.10, {xₙₖ} is an orthonormal sequence, so that ‖xₙₖ − xₙⱼ‖² = 2 for k ≠ j, and thus {xₙₖ} cannot be a Cauchy sequence. Yet, it is convergent by assumption; i.e., we have arrived at a contradiction. Therefore, our initial assumption is false and the theorem is proved. ■

Our next result is a direct consequence of the preceding theorem.


7.8.2. Theorem. Let T ∈ B(X, X) be completely continuous and normal. Then the number of eigenvalues of T is at most denumerable. If the set of eigenvalues is denumerable, then we have a point of accumulation at zero and only at zero (in the complex plane). The non-zero eigenvalues can be ordered so that

|λ₁| ≥ |λ₂| ≥ ⋯ ≥ |λₙ| ≥ ⋯.

7.8.3. Exercise.

Prove Theorem 7.8.2.

The next result is known as the spectral theorem. Here we let λ₀ = 0, and we let {λ₁, λ₂, …} be the non-zero eigenvalues of a completely continuous operator T ∈ B(X, X). Note that λ₀ may or may not be an eigenvalue of T. If λ₀ is an eigenvalue, then 𝔑(T) need not be finite dimensional. However, by Theorem 7.7.20, 𝔑(T − λᵢI) is finite dimensional for i = 1, 2, ….

7.8.4. Theorem. Let T ∈ B(X, X) be completely continuous and normal, let λ₀ = 0, and let {λ₁, λ₂, …} be the non-zero distinct eigenvalues of T (this collection may be finite). Let 𝔑ᵢ = 𝔑(T − λᵢI) for i = 0, 1, 2, …. Then the family of closed linear subspaces {𝔑ᵢ : i = 0, 1, 2, …} of X is total.

Proof. The fact that each 𝔑ᵢ is a closed linear subspace of X follows from Theorem 7.1.26. Now let Y = ∪ᵢ 𝔑ᵢ, and let N = Y^⊥. We wish to show that N = {0}. By Theorem 6.12.6, N is a closed linear subspace of X. We will show first that Y is invariant under T*. Let x ∈ Y. Then x ∈ 𝔑ₙ for some n and Tx = λₙx. Now λₙ(T*x) = T*(λₙx) = T*Tx = T(T*x); i.e., T(T*x) = λₙ(T*x), and so T*x ∈ 𝔑ₙ, which implies T*x ∈ Y. Therefore, Y is invariant under T*. From Theorem 7.3.15 it follows that Y^⊥ is invariant under T. Hence, N is an invariant closed linear subspace under T. It follows from Theorems 7.7.8 and 7.5.6 that if T₁ is the restriction of T to N, then T₁ ∈ B(N, N) and T₁ is completely continuous and normal. Now let us suppose that N ≠ {0}. By Theorem 7.7.25 there is a non-zero x ∈ N and a λ ∈ C such that T₁x = λx. But if this is so, λ is an eigenvalue of T and it follows that x ∈ 𝔑ₙ for some n. Hence, x ∈ N ∩ Y, which is impossible unless x = 0. This completes the proof. ∎
In proving an alternate form of the spectral theorem, we require the following result.
the

7.8.5. Theorem. Let {Nₖ} be a sequence of orthogonal closed linear subspaces of X; i.e., Nₖ ⊥ Nⱼ for all j ≠ k. Then the following statements are equivalent:

(i) {Nₖ} is a total family;
(ii) X is the smallest closed linear subspace which contains every Nₖ; and
(iii) for every x ∈ X there is a unique sequence {xₖ} such that
(a) xₖ ∈ Nₖ for every k, and
(b) x = ∑_{k=1}^∞ xₖ.

Proof. We first prove the equivalence of statements (i) and (ii). Let Y = ∪ₙ Nₙ. Then Y ⊂ Y^⊥⊥ by Theorem 6.12.8. Furthermore, Y^⊥⊥ is the smallest closed linear subspace which contains Y by Theorem 6.12.8. Now suppose {Nₖ} is a total family. Then Y^⊥ = {0}. Hence, Y^⊥⊥ = X and so X is the smallest closed linear subspace which contains every Nₖ.

On the other hand, suppose X is the smallest closed linear subspace which contains every Nₖ. Then X = Y^⊥⊥ and Y^⊥⊥⊥ = {0}. But Y^⊥⊥⊥ = Y^⊥. Thus, Y^⊥ = {0}, and so {Nₖ} is a total family.

We now prove the equivalence of statements (i) and (iii). Let {Nₖ} be a total family, and let x ∈ X. For every k = 1, 2, …, there is an xₖ ∈ Nₖ and a yₖ ∈ Nₖ^⊥ such that x = xₖ + yₖ. If xₖ = 0, then (x, xₖ) = 0. If xₖ ≠ 0, then (x, xₖ/‖xₖ‖) = (xₖ + yₖ, xₖ/‖xₖ‖) = ‖xₖ‖. Thus, it follows from Bessel's inequality that

∑_{k=1}^∞ ‖xₖ‖² < ∞.

Hence, x₀ = ∑_{k=1}^∞ xₖ converges to an element of X. Next, let y ∈ Nⱼ for fixed j. Then

(x − x₀, y) = (xⱼ + yⱼ − x₀, y) = (xⱼ, y) − (x₀, y) = (xⱼ, y) − ∑_{k=1}^∞ (xₖ, y) = (xⱼ, y) − (xⱼ, y) = 0.

Thus, (x − x₀) is orthogonal to every element of Nⱼ for every j. Since {Nₖ} is a total family, we have x = x₀.

To prove uniqueness, suppose that x = ∑_{k=1}^∞ xₖ = ∑_{k=1}^∞ xₖ′, where xₖ, xₖ′ ∈ Nₖ for all k. Then ∑_{k=1}^∞ (xₖ − xₖ′) = 0. Since (xⱼ − xⱼ′) ⊥ (xₖ − xₖ′) for j ≠ k, we have

‖∑_{k=1}^∞ (xₖ − xₖ′)‖² = ∑_{k=1}^∞ ‖xₖ − xₖ′‖² = 0.

Thus, ‖xₖ − xₖ′‖ = 0 for all k, and xₖ is unique for each k.

To prove that (iii) implies (i), assume that x ∈ Nₖ^⊥ for every k. By hypothesis, x = ∑_{k=1}^∞ xₖ, where xₖ ∈ Nₖ for all k. Hence, for any j we have (x, xⱼ) = 0, and thus (x, x) = ∑_{j=1}^∞ (x, xⱼ) = 0. This means x = 0, and so {Nₖ} is a total family. This completes the proof. ∎

In Definition 3.2.13 we introduced the direct sum of a finite number of linear subspaces. The preceding theorem permits us to extend this definition in a meaningful way to a countable number of linear subspaces.

7.8.6. Definition. Let {Yₖ} be a sequence of mutually orthogonal closed linear subspaces of X, and let V({Yₖ}) be the closed linear subspace generated by {Yₖ}. If every x ∈ V({Yₖ}) is uniquely representable as x = ∑_{k=1}^∞ xₖ, where xₖ ∈ Yₖ for every k, then we say V({Yₖ}) is the direct sum of {Yₖ}. In this case we write

V({Yₖ}) = Y₁ ⊕ Y₂ ⊕ ⋯ ⊕ Yₖ ⊕ ⋯.

We are now in a position to present another version of the spectral theorem.

7.8.7. Theorem. Let T ∈ B(X, X) be completely continuous and normal, let λ₀ = 0, and let {λ₁, λ₂, …, λₙ, …} be the non-zero distinct eigenvalues of T. Let 𝔑ᵢ = 𝔑(T − λᵢI) for i = 0, 1, 2, …, and let Pᵢ be the projection on 𝔑ᵢ along 𝔑ᵢ^⊥. Then

(i) Pᵢ is an orthogonal projection for each i;
(ii) PᵢPⱼ = 0 for all i, j such that i ≠ j;
(iii) ∑_{j=0}^∞ Pⱼ = I; and
(iv) T = ∑_{j=1}^∞ λⱼPⱼ.

Proof. The proof of each part follows readily from results already obtained. We simply indicate the principal results needed and leave the details as an exercise. Part (i) follows from the definition of orthogonal projection. Part (ii) follows from part (ii) of Theorem 7.6.26. Parts (iii) and (iv) follow from Theorems 7.1.27 and 7.8.5. ∎

7.8.8. Exercise.

Prove Theorem 7.8.7.

In Chapter 4 we defined the resolution of the identity operator for Euclidean spaces. We conclude this section with a more general definition.

7.8.9. Definition. Let {Pₙ} be a sequence of linear transformations on X such that Pₙ ∈ B(X, X) for each n. If conditions (i), (ii), and (iii) of Theorem 7.8.7 are satisfied, then {Pₙ} is said to be a resolution of the identity.
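In a finite-dimensional Hilbert space every linear operator is completely continuous, so the conclusions of Theorem 7.8.7 can be checked concretely for a normal matrix. The following sketch (the hermitian matrix T and its size are illustrative choices, not taken from the text) builds the orthogonal projections onto the distinct eigenspaces and verifies that they form a resolution of the identity with T = ∑ⱼ λⱼPⱼ:

```python
import numpy as np

# A normal (here: hermitian) matrix plays the role of T in B(X, X).
T = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])

eigvals, eigvecs = np.linalg.eigh(T)  # orthonormal eigenvectors for hermitian T

# Build the orthogonal projection P_j onto each distinct eigenspace N(T - lambda_j I),
# grouping (numerically) equal eigenvalues together.
projections = {}
for lam, v in zip(eigvals, eigvecs.T):
    key = round(lam, 10)
    P = np.outer(v, v)
    projections[key] = projections.get(key, np.zeros_like(P)) + P

# Theorem 7.8.7: sum_j P_j = I (a resolution of the identity) and
# T = sum_j lambda_j P_j (the spectral representation).
I = sum(projections.values())
S = sum(lam * P for lam, P in projections.items())
assert np.allclose(I, np.eye(3))
assert np.allclose(S, T)
```

For a normal but non-hermitian matrix one would use `np.linalg.eig` instead and the eigenvalues would be complex; `eigh` is used here only because this illustrative T is hermitian.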

7.9. DIFFERENTIATION OF OPERATORS

In this section we consider differentiation of operators on normed linear spaces. Such operators need not be linear. Throughout this section, X and Y are normed linear spaces over a field F, where F may be either R, the real numbers, or C, the complex numbers. We will identify mappings which are, in general, not linear by f: X → Y. As usual, L(X, Y) will denote the class of all linear operators from X into Y, while B(X, Y) will denote the class of all bounded linear operators from X into Y.

7.9.1. Definition. Let x₀ ∈ X be a fixed element, and let f: X → Y. If there exists a function δf(x₀, ·): X → Y such that

lim_{t→0} ‖(1/t)[f(x₀ + th) − f(x₀)] − δf(x₀, h)‖ = 0   (7.9.2)

(where t ∈ F) for all h ∈ X, then f is said to be Gateaux differentiable at x₀, and δf(x₀, h) is called the Gateaux differential of f at x₀ with increment h.

The Gateaux differential of f is sometimes also called the weak differential of f or the G-differential of f. If f is Gateaux differentiable at x₀, then δf(x₀, h) need not be linear nor continuous as a function of h ∈ X. However, we shall primarily be concerned with functions f: X → Y which have these properties. This gives rise to the following concept.
7.9.3. Definition. Let x₀ ∈ X be a fixed element, and let f: X → Y. If there exists a bounded linear operator F(x₀) ∈ B(X, Y) such that

lim_{‖h‖→0} (1/‖h‖)‖f(x₀ + h) − f(x₀) − F(x₀)h‖ = 0

(where h ∈ X), then f is said to be Fréchet differentiable at x₀, and F(x₀) is called the Fréchet derivative of f at x₀. We define f′(x₀) = F(x₀).

If f is Fréchet differentiable for each x ∈ D, where D ⊂ X, then f is said to be Fréchet differentiable on D.

We now show that Fréchet differentiability implies Gateaux differentiability.

7.9.4. Theorem. Let f: X → Y, and let x₀ ∈ X be a fixed element. If f is Fréchet differentiable at x₀, then f is Gateaux differentiable, and furthermore the Gateaux differential is given by

δf(x₀, h) = f′(x₀)h   for all h ∈ X.

Proof. Let F(x₀) = f′(x₀), let ε > 0, and let h ∈ X, h ≠ 0. Then there is a δ > 0 such that

‖f(x₀ + th) − f(x₀) − F(x₀)th‖ < (ε/‖h‖)‖th‖

provided that ‖th‖ < δ and th ≠ 0. This implies that

‖(1/t)[f(x₀ + th) − f(x₀)] − F(x₀)h‖ < ε

provided that |t| < δ/‖h‖. Hence, f is Gateaux differentiable at x₀ and δf(x₀, h) = F(x₀)h. ∎

Because of the preceding theorem, if f: X → Y is Fréchet differentiable at x₀ ∈ X, the Gateaux differential δf(x₀, h) = f′(x₀)h is also called the Fréchet differential of f at x₀ with increment h.

Let us now consider some examples.

7.9.5. Example. Let X be a Hilbert space, and let f be a functional defined on X; i.e., f: X → F. If f has a Fréchet derivative at some x₀ ∈ X, then that derivative must be a bounded linear functional on X; i.e., f′(x₀) ∈ X*. In view of Theorem 6.14.2, there is an element y₀ ∈ X such that f′(x₀)h = (h, y₀) for each h ∈ X. Although f′(x₀) ∈ X* and y₀ ∈ X, we know by Exercise 6.14.4 that X and X* are congruent and thus isometric. It is customary to view the corresponding elements of isometric spaces as being one and the same element. With this in mind, we say f′(x₀) = y₀ and we call f′(x₀) the gradient of f at x₀.

As a special case of the preceding example, we consider the following specific case.

7.9.6. Example. Let X = Rⁿ and let ‖·‖ be any norm on X. By Theorem 6.6.5, X is a Banach space. Now let f be a functional defined on X; i.e., f: X → R. Let x = (ξ₁, …, ξₙ) ∈ X and h = (h₁, …, hₙ) ∈ X. If f has continuous partial derivatives with respect to ξᵢ, i = 1, …, n, then the Fréchet differential of f is given by

δf(x, h) = (∂f(x)/∂ξ₁)h₁ + ⋯ + (∂f(x)/∂ξₙ)hₙ.

For fixed x₀ ∈ X, we define the bounded linear functional F(x₀) on X by

F(x₀)h = ∑_{i=1}^n (∂f(x)/∂ξᵢ)|_{x=x₀} hᵢ   for h ∈ X.

Then F(x₀) is the Fréchet derivative of f at x₀. As in the preceding example, we do not distinguish between X and X*, and we write the gradient of f at x as

f′(x) = (∂f(x)/∂ξ₁, …, ∂f(x)/∂ξₙ).   (7.9.7)
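For X = Rⁿ the Gateaux differential can be approximated numerically by evaluating (f(x₀ + th) − f(x₀))/t for a small t and comparing it with (h, ∇f(x₀)), as Eq. (7.9.7) and Theorem 7.9.4 suggest. A minimal sketch (the particular f, x₀, and h are arbitrary illustrative choices):

```python
import numpy as np

# f: R^2 -> R with continuous partials; grad_f implements Eq. (7.9.7).
f = lambda x: x[0]**2 + 3.0 * x[0] * x[1]
grad_f = lambda x: np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

x0 = np.array([1.0, -2.0])
h = np.array([0.5, 1.0])

# Gateaux differential as the limit of (f(x0 + t h) - f(x0)) / t as t -> 0;
# by Theorem 7.9.4 it equals the Frechet differential f'(x0) h = (h, grad f(x0)).
t = 1e-7
numerical = (f(x0 + t * h) - f(x0)) / t
analytic = grad_f(x0).dot(h)
assert abs(numerical - analytic) < 1e-5
```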

In the following, we consider another example of the gradient of a functional.

7.9.8. Example. Let X be a real Hilbert space, let L: X → X be a bounded linear operator, and let f: X → R be given by f(x) = (x, Lx). Then f has a Fréchet derivative which is given by f′(x) = (L + L*)x. To verify this, we let h be an arbitrary element in X and we let F(x) = (L + L*)x. Then

f(x + h) − f(x) − F(x)h = (x + h, Lx + Lh) − (x, Lx) − (h, Lx) − (h, L*x) = (h, Lh).

From this it follows that

lim_{‖h‖→0} |f(x + h) − f(x) − F(x)h|/‖h‖ = 0.

In the next example we consider a functional which frequently arises in optimization problems.
7.9.9. Example. Let X and Y be real Hilbert spaces, and let L be a bounded linear operator from X into Y; i.e., L ∈ B(X, Y). Let L* be the adjoint of L. Let v be a fixed element in Y, and let f be a real-valued functional defined on X by

f(x) = ‖v − Lx‖²   for all x ∈ X.

Then f has a Fréchet derivative which is given by

f′(x) = −2L*v + 2L*Lx.

To verify this, observe that

f(x) = (v − Lx, v − Lx) = (v, v) − 2(L*v, x) + (x, L*Lx).

The conclusion now follows from Examples 7.9.5 and 7.9.8.
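The gradient formula of this example can be checked numerically in finite dimensions, where L* is simply the matrix transpose. A sketch under those assumptions (the random L, v, and x are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((4, 3))   # bounded linear operator R^3 -> R^4
v = rng.standard_normal(4)
x = rng.standard_normal(3)

f = lambda x: np.sum((v - L @ x)**2)          # f(x) = ||v - Lx||^2
grad = -2.0 * L.T @ v + 2.0 * L.T @ (L @ x)   # f'(x) = -2 L*v + 2 L*L x

# Check against a central finite difference in each coordinate direction.
eps = 1e-6
num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])
assert np.allclose(num, grad, atol=1e-4)
```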

In the next example we introduce the Jacobian matrix of a function f: X → Rᵐ.

7.9.10. Example. Let X = Rⁿ, and let Y = Rᵐ. Since X and Y are finite dimensional, we may assume arbitrary norms on each of these spaces and they will both be Banach spaces. Let f: X → Y. For x = (ξ₁, …, ξₙ) ∈ X,

let us write

f(x) = [f₁(x), …, fₘ(x)]ᵀ = [f₁(ξ₁, …, ξₙ), …, fₘ(ξ₁, …, ξₙ)]ᵀ.

For x₀ ∈ X, assume that the partial derivatives ∂fᵢ(x₀)/∂ξⱼ exist and are continuous for i = 1, …, m and j = 1, …, n. The Fréchet differential of f at x₀ with increment h = (h₁, …, hₙ) ∈ X is given by

δf(x₀, h) = [∂fᵢ(x₀)/∂ξⱼ](h₁, …, hₙ)ᵀ,

where [∂fᵢ(x₀)/∂ξⱼ] denotes the m × n matrix whose (i, j) entry is ∂fᵢ(x₀)/∂ξⱼ. The Fréchet derivative of f at x₀ is given by

f′(x₀) = [∂fᵢ(x₀)/∂ξⱼ],

which is also called the Jacobian matrix of f at x₀. We sometimes write f′(x) = ∂f(x)/∂x.
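The Jacobian matrix can be illustrated with a small map f: R² → R³, checking the first-order approximation f(x₀ + h) − f(x₀) ≈ f′(x₀)h directly (the particular f is an arbitrary illustrative choice):

```python
import numpy as np

# f: R^2 -> R^3; its Frechet derivative is the 3 x 2 Jacobian matrix.
def f(x):
    x1, x2 = x
    return np.array([x1 * x2, x1**2, np.sin(x2)])

def jacobian(x):
    x1, x2 = x
    return np.array([[x2,       x1        ],
                     [2.0 * x1, 0.0       ],
                     [0.0,      np.cos(x2)]])

x0 = np.array([1.0, 0.5])
h = np.array([1e-6, -2e-6])

# delta f(x0, h) = f'(x0) h approximates f(x0 + h) - f(x0) to first order.
assert np.allclose(f(x0 + h) - f(x0), jacobian(x0) @ h, atol=1e-10)
```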

7.9.11. Example. Let X = C[a, b], the family of real-valued continuous functions defined on [a, b], and let {X; ‖·‖_∞} be the Banach space given in Example 6.1.9. Let k(s, t) be a real-valued function defined and continuous on [a, b] × [a, b], and let g(t, x) be a real-valued function which is defined for t ∈ [a, b] and x ∈ R and such that ∂g(t, x)/∂x is continuous. Let f: X → X be defined by

[f(x)](s) = ∫_a^b k(s, t)g(t, x(t)) dt,   x ∈ X.

For fixed x₀ ∈ X, the Fréchet differential of f at x₀ with increment h ∈ X is given by

[δf(x₀, h)](s) = ∫_a^b k(s, t)(∂g(t, x₀(t))/∂x)h(t) dt.

7.9.12. Exercise. Verify the assertions made in Examples 7.9.5 to 7.9.11.

We now establish some of the properties of Fréchet differentials.

7.9.13. Theorem. Let f, g: X → Y be Fréchet differentiable at x₀ ∈ X. Then

(i) f is continuous at x₀ ∈ X; and
(ii) for all α, β ∈ F, αf + βg is Fréchet differentiable at x₀ and (αf + βg)′(x₀) = αf′(x₀) + βg′(x₀).

Proof. To prove (i), let f be Fréchet differentiable at x₀, and let F(x₀) be the Fréchet derivative of f at x₀. Then

f(x₀ + h) − f(x₀) = f(x₀ + h) − f(x₀) − F(x₀)h + F(x₀)h,

and

‖f(x₀ + h) − f(x₀)‖ ≤ ‖f(x₀ + h) − f(x₀) − F(x₀)h‖ + ‖F(x₀)h‖.

Since F(x₀) is bounded, there is an M > 0 such that ‖F(x₀)h‖ ≤ M‖h‖. Furthermore, for given ε > 0 there is a δ > 0 such that ‖f(x₀ + h) − f(x₀) − F(x₀)h‖ < ε‖h‖ provided that ‖h‖ < δ. Hence, ‖f(x₀ + h) − f(x₀)‖ < (M + ε)‖h‖ whenever ‖h‖ < δ. This implies that f is continuous at x₀.

The proof of part (ii) is straightforward and is left as an exercise. ∎

7.9.14. Exercise. Prove part (ii) of Theorem 7.9.13.

We now show that the chain rule encountered in calculus applies to Fréchet derivatives as well.

7.9.15. Theorem. Let X, Y, and Z be normed linear spaces. Let g: X → Y, f: Y → Z, and let φ: X → Z be the composite function φ = f ∘ g. Let g be Fréchet differentiable on an open set D ⊂ X, and let f be Fréchet differentiable on an open set E ⊂ g(D). If x ∈ D is such that g(x) ∈ E, then φ is Fréchet differentiable at x and φ′(x) = f′(g(x))g′(x).

Proof. Let y = g(x) and d = g(x + h) − g(x), where h ∈ X is such that x + h ∈ D. Then

φ(x + h) − φ(x) − f′(y)g′(x)h = f(y + d) − f(y) − f′(y)d + f′(y)[d − g′(x)h].

Thus, given ε > 0 there is a δ > 0 such that ‖d‖ < δ and ‖h‖ < δ imply

‖φ(x + h) − φ(x) − f′(y)g′(x)h‖ ≤ ε‖d‖ + ε‖f′(y)‖‖h‖.

By the continuity of g (see the proof of part (i) of Theorem 7.9.13), it follows that ‖d‖ < M‖h‖ for some constant M. Hence, there is a constant k such that

‖φ(x + h) − φ(x) − f′(y)g′(x)h‖ ≤ kε‖h‖.

This implies that φ′(x) exists and φ′(x) = f′(g(x))g′(x). ∎

We next consider the Fréchet derivative of bounded linear operators.

7.9.16. Theorem. Let T be a linear operator from X into Y. If f(x) = Tx for all x ∈ X, then f is Fréchet differentiable on X if and only if T is a bounded linear operator. In this case, f′(x) = T for all x ∈ X.

Proof. Let T be a bounded linear operator. Then ‖f(x + h) − f(x) − Th‖ = ‖T(x + h) − Tx − Th‖ = 0 for all x, h ∈ X. From this it follows that f′(x) = T. Conversely, suppose T is unbounded. Then, by Theorem 7.9.13, f cannot be Fréchet differentiable. ∎

Let us consider a specific case.


7.9.17. Example. Let X = Rⁿ and Y = Rᵐ, and let us assume that the natural basis for each of these spaces is being used (see Example 4.1.15). If A ∈ L(X, Y), then Ax is given in matrix representation by

Ax = [aᵢⱼ](ξ₁, …, ξₙ)ᵀ,

where [aᵢⱼ] is the m × n matrix representing A. Hence, if f(x) = Ax, then f′(x) = A, and the matrix representation of f′(x) = ∂f(x)/∂x is A.

The next result is useful in obtaining bounds on Fréchet differentiable functions.

7.9.18. Theorem. Let f: X → Y, let D be an open set in X, and let f be Fréchet differentiable on D. Let x₀ ∈ D, and let h ∈ X be such that x₀ + th ∈ D for all t when 0 ≤ t ≤ 1. Let N = sup_{0<t<1} ‖f′(x₀ + th)‖. Then

‖f(x₀ + h) − f(x₀)‖ ≤ N‖h‖.

Proof. Let y = f(x₀ + h) − f(x₀), and let φ be a bounded linear functional defined on Y (i.e., φ ∈ Y*) such that φ(y) = ‖φ‖‖y‖ (see Corollary 6.8.6). Define g: (0, 1) → R by g(t) = φ(f(x₀ + th)) for 0 ≤ t ≤ 1. By Theorems 7.9.15 and 7.9.16, g′(t) = φ(f′(x₀ + th)h). By the mean value theorem of calculus, there is a t₀ such that 0 < t₀ < 1 and g(1) − g(0) = g′(t₀). Thus,

|φ(f(x₀ + h)) − φ(f(x₀))| ≤ ‖φ‖ sup_{0<t<1} ‖f′(x₀ + th)‖‖h‖.

Since

|φ(f(x₀ + h)) − φ(f(x₀))| = |φ(y)| = ‖φ‖‖f(x₀ + h) − f(x₀)‖,

it follows that ‖f(x₀ + h) − f(x₀)‖ ≤ sup_{0<t<1} ‖f′(x₀ + th)‖‖h‖. ∎

If a function f: X → Y is Fréchet differentiable on an open set D ⊂ X, and if f′(x) is Fréchet differentiable at x ∈ D, then f is said to be twice Fréchet differentiable at x, and we call the Fréchet derivative of f′(x) the second derivative of f. We denote the second derivative of f by f″. Note that f″(x) is a bounded linear operator defined on X with range in the normed linear space B(X, Y).

We leave the proof of the next result as an exercise.

7.9.19. Theorem. Let f: X → Y be twice Fréchet differentiable on an open set D ⊂ X. Let x₀ ∈ D, and let h ∈ X be such that x₀ + th ∈ D for all t when 0 ≤ t ≤ 1. Let N = sup_{0<t<1} ‖f″(x₀ + th)‖. Then

‖f(x₀ + h) − f(x₀) − f′(x₀)h‖ ≤ ½N‖h‖².

7.9.20. Exercise. Prove Theorem 7.9.19.

We conclude the present section by showing that the Gateaux and Fréchet differentials play a role in maximizing and minimizing functionals which is similar to that of the ordinary derivative of functions of real variables.

Let F = R, and let f be a functional on X; i.e., f: X → R. Clearly, for fixed x₀, h ∈ X, we may define a function g: R → R by the relation g(t) = f(x₀ + th) for all t ∈ R. In this case, if f is Gateaux differentiable at x₀, we see that δf(x₀, h) = g′(t)|_{t=0}, where g′(t) is the usual derivative of g(t). We will need this property in proving our next result, Theorem 7.9.22. First, however, we require the following important concept.

7.9.21. Definition. Let f be a real-valued functional defined on a domain 𝔇 ⊂ X; i.e., f: 𝔇 → R. Let x₀ ∈ 𝔇. Then f is said to have a relative minimum (relative maximum) at x₀ if there exists an open sphere S(x₀; r) ⊂ X such that for all x ∈ S(x₀; r) ∩ 𝔇 the relation f(x₀) ≤ f(x) (f(x₀) ≥ f(x)) holds. If f has either a relative minimum or a relative maximum at x₀, then f is said to have a relative extremum at x₀.

For relative extrema, we have the following result.

7.9.22. Theorem. Let f: X → R be Gateaux differentiable at x₀ ∈ X. If f has a relative extremum at x₀, then δf(x₀, h) = 0 for all h ∈ X.

Proof. As pointed out in the remark preceding Definition 7.9.21, the real-valued function g(t) = f(x₀ + th) must have an extremum at t = 0. From the ordinary calculus we must have g′(t)|_{t=0} = 0. Hence, δf(x₀, h) = 0 for all h ∈ X. ∎

We leave the proof of the next result as an exercise.

7.9.23. Corollary. Let f: X → R be Fréchet differentiable at x₀ ∈ X. If f has a relative extremum at x₀, then f′(x₀) = 0.

7.9.24. Exercise. Prove Corollary 7.9.23.

We conclude this section with the following example.

7.9.25. Example. Consider the real-valued functional f defined in Example 7.9.9; i.e., f(x) = ‖v − Lx‖². For a given v ∈ Y, a necessary condition for f to have a minimum at x₀ ∈ X is that

0 = L*Lx₀ − L*v.
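In Rⁿ the necessary condition 0 = L*Lx₀ − L*v is the classical normal equation of least squares, with L* the matrix transpose. A sketch comparing its solution with numpy's least-squares routine (the random L and v are illustrative only, and L is assumed to have full column rank):

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.standard_normal((5, 3))
v = rng.standard_normal(5)

# The condition 0 = L*L x0 - L*v; in R^n, L* is the transpose.
x0 = np.linalg.solve(L.T @ L, L.T @ v)

# np.linalg.lstsq minimizes ||v - Lx||^2 directly.
x_ls, *_ = np.linalg.lstsq(L, v, rcond=None)
assert np.allclose(x0, x_ls)

# The gradient 2(L*L x0 - L*v) vanishes at the minimizer.
assert np.allclose(L.T @ (L @ x0) - L.T @ v, 0.0, atol=1e-10)
```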

7.10. SOME APPLICATIONS


In this section we consider selected applications of the material of the
present chapter. The section consists of three parts. In the first part we consider integral equations, in the second part we give an example in optimal
control, while in the third part we address the problem of minimizing functionals by the method of steepest descent.
A. Applications to Integral Equations

Throughout this part, X is a complex Hilbert space while T denotes a completely continuous normal operator defined on X. We recall that if, e.g., X = L₂[a, b] and T is defined by (see Example 7.3.11 and the comment at the end of Example 7.7.5)

[Tx](s) = ∫_a^b k(s, t)x(t) dt,   (7.10.1)

then T is a completely continuous operator defined on X. Furthermore, if k(s, t) = k̄(t, s) for all s, t ∈ [a, b] (the bar denoting complex conjugation), then T is hermitian (see Exercise 7.4.20) and, hence, normal.
In the following, we shall focus our attention on equations of the form

Tx − λx = y,   (7.10.2)

where λ ∈ C and x, y ∈ X. If, in particular, T is defined by Eq. (7.10.1), then Eq. (7.10.2) includes a large class of integral equations. Indeed, it was the study of such equations which gave rise to much of the development of functional analysis.

We now prove the following existence and uniqueness result.
7.10.3. Theorem. If λ ≠ 0 and if λ is not an eigenvalue of T, then Eq. (7.10.2) has a unique solution, which is given by

x = −(1/λ)P₀y + ∑_{n=1}^∞ (1/(λₙ − λ))Pₙy,   (7.10.4)

where {λₙ} are the non-zero distinct eigenvalues of T, Pₙ is the projection of X onto 𝔑ₙ = 𝔑(T − λₙI) along 𝔑ₙ^⊥ for n = 1, 2, …, and P₀x is the projection of x onto 𝔑(T).
Proof. We first prove that the infinite series on the right-hand side of Eq. (7.10.4) is convergent. Since λ ≠ 0, it cannot be an accumulation point of {λₙ}. Thus, we can find a d > 0 such that |λ| > d and |λ − λₖ| > d for k = 1, 2, …. We note from Theorem 7.8.7 that PᵢPⱼ = 0 for i ≠ j. Now for N < ∞, we have by the Pythagorean theorem,

‖−(1/λ)P₀y + ∑_{k=1}^N (1/(λₖ − λ))Pₖy‖²
= (1/|λ|²)‖P₀y‖² + ∑_{k=1}^N (1/|λₖ − λ|²)‖Pₖy‖²
≤ (1/d²)‖P₀y‖² + (1/d²)∑_{k=1}^N ‖Pₖy‖²
= (1/d²)[‖P₀y‖² + ∑_{k=1}^N ‖Pₖy‖²]
= (1/d²)‖P₀y + ∑_{k=1}^N Pₖy‖²
≤ (1/d²)‖y‖².

This implies that ∑_{k=1}^∞ (1/|λₖ − λ|²)‖Pₖy‖² is convergent, and so it follows from Theorem 6.13.3 that ∑_{k=1}^∞ (1/(λₖ − λ))Pₖy is convergent to an element in X.

Let j be a positive integer. By Theorem 7.5.12, Pⱼ is continuous, and so by Theorem 7.1.27, Pⱼ may be applied to the series in Eq. (7.10.4) term by term. Now let x be given by Eq. (7.10.4) for arbitrary y ∈ X. We want to show that Tx − λx = y. From

Eq. (7.10.4) we have

P₀x = −(1/λ)P₀y   and   Pⱼx = (1/(λⱼ − λ))Pⱼy   for j = 1, 2, ….

Thus, P₀y = −λP₀x and Pⱼy = λⱼPⱼx − λPⱼx. Now from the spectral theorem (Theorem 7.8.7), we have

y = P₀y + ∑_{j=1}^∞ Pⱼy,   Tx = ∑_{j=1}^∞ λⱼPⱼx,   and   λx = λP₀x + ∑_{j=1}^∞ λPⱼx.

Hence, y = Tx − λx.

Finally, to show that x given by Eq. (7.10.4) is unique, let x and z be such that Tx − λx = Tz − λz = y. Then it follows that T(x − z) − λ(x − z) = y − y = 0. Hence, T(x − z) = λ(x − z). Since λ is by assumption not an eigenvalue of T, we must have x − z = 0. This completes the proof. ∎
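Formula (7.10.4) can be verified for a finite-dimensional normal operator, where the spectral projections are explicit. The diagonal T below is an illustrative choice with a non-trivial null space, so that the P₀ term participates:

```python
import numpy as np

# A hermitian (hence normal) T with a non-trivial null space.
T = np.diag([0.0, 1.0, 1.0, 4.0])
lambdas = [1.0, 4.0]                           # non-zero distinct eigenvalues
P = {1.0: np.diag([0.0, 1.0, 1.0, 0.0]),       # projection onto N(T - I)
     4.0: np.diag([0.0, 0.0, 0.0, 1.0])}       # projection onto N(T - 4I)
P0 = np.diag([1.0, 0.0, 0.0, 0.0])             # projection onto N(T)

lam = 2.0                                      # lam != 0 and not an eigenvalue
y = np.array([1.0, 2.0, -1.0, 3.0])

# Eq. (7.10.4): x = -(1/lam) P0 y + sum_n (1/(lam_n - lam)) P_n y.
x = -P0 @ y / lam + sum(P[ln] @ y / (ln - lam) for ln in lambdas)
assert np.allclose(T @ x - lam * x, y)         # x solves Tx - lam x = y
```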

In the next result we consider the case where λ is a non-zero eigenvalue of T.

7.10.5. Theorem. Let {λₙ} denote the non-zero distinct eigenvalues of T, and let λ = λⱼ for some positive integer j. Then there is a (non-unique) x ∈ X satisfying Eq. (7.10.2) if and only if Pⱼy = 0, where Pⱼ is the orthogonal projection of X onto 𝔑ⱼ = {x : (T − λI)x = 0}. If Pⱼy = 0, then a solution to Eq. (7.10.2) is given by

x = x₀ − (1/λ)P₀y + ∑_{k=1, k≠j}^∞ (1/(λₖ − λ))Pₖy,   (7.10.6)

where P₀ is the orthogonal projection of X onto 𝔑(T) and x₀ is any element in 𝔑ⱼ.

Proof. We first observe that 𝔑ⱼ reduces T by part (iii) of Theorem 7.6.26. It therefore follows from part (ii) of Theorem 7.5.22 that TPⱼ = PⱼT. Now suppose that y is such that Eq. (7.10.2) is satisfied for some x ∈ X. Then it follows that Pⱼy = Pⱼ(Tx − λⱼx) = TPⱼx − λⱼPⱼx = λⱼPⱼx − λⱼPⱼx = 0. In the preceding, we used the fact that Tx = λⱼx for x ∈ 𝔑ⱼ and Pⱼx ∈ 𝔑ⱼ for all x ∈ X. Hence, Pⱼy = 0.

Conversely, suppose that Pⱼy = 0, and let x be given by Eq. (7.10.6). The proof that x satisfies Eq. (7.10.2) follows along the same lines as the proof of Theorem 7.10.3, and the details are left as an exercise. The non-uniqueness of the solution is apparent, since (T − λI)x₀ = 0 for any x₀ ∈ 𝔑ⱼ. ∎

7.10.7. Exercise. Complete the proof of Theorem 7.10.5.
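Theorem 7.10.5 is a Fredholm-alternative statement: when λ equals an eigenvalue λⱼ, Eq. (7.10.2) is solvable exactly when Pⱼy = 0, and then only up to an element of 𝔑ⱼ. A finite-dimensional sketch (the diagonal T is an illustrative choice):

```python
import numpy as np

T = np.diag([0.0, 1.0, 1.0, 4.0])
lam = 1.0                                  # lam = lambda_j IS an eigenvalue
Pj = np.diag([0.0, 1.0, 1.0, 0.0])         # orthogonal projection onto N(T - lam I)

y_bad = np.array([1.0, 1.0, 0.0, 2.0])     # Pj y != 0: Eq. (7.10.2) has no solution
y_ok = np.array([1.0, 0.0, 0.0, 2.0])      # Pj y == 0: solvable (non-uniquely)
assert np.linalg.norm(Pj @ y_bad) > 0
assert np.allclose(Pj @ y_ok, 0.0)

# Eq. (7.10.6) with x0 = 0: the k = j term is skipped in the series.
P0 = np.diag([1.0, 0.0, 0.0, 0.0])         # projection onto N(T)
P4 = np.diag([0.0, 0.0, 0.0, 1.0])         # projection for eigenvalue 4
x = -P0 @ y_ok / lam + P4 @ y_ok / (4.0 - lam)
assert np.allclose(T @ x - lam * x, y_ok)

# Adding any x0 in N(T - lam I) gives another solution: non-uniqueness.
x0 = np.array([0.0, 7.0, -3.0, 0.0])
assert np.allclose(T @ (x + x0) - lam * (x + x0), y_ok)
```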

B. An Example from Optimal Control

In this example we consider systems which can appropriately be described by the system of first-order ordinary differential equations

ẋ(t) = Ax(t) + Bu(t),   (7.10.8)

where x(0) = x₀ is given. Here x(t) ∈ Rⁿ and u(t) ∈ Rᵐ for every t such that 0 ≤ t ≤ T for some T > 0, A is an n × n matrix, and B is an n × m matrix. As we saw in part (vi) of Theorem 4.11.45, if each element of the vector u(t) is a continuous function of t, then the unique solution to Eq. (7.10.8) at time t is given by

x(t) = Φ(t, 0)x(0) + ∫_0^t Φ(t, τ)Bu(τ) dτ,   (7.10.9)

where Φ(t, τ) is the state transition matrix for the system of equations given in Eq. (7.10.8).
Let us now define the class of vector-valued functions L₂ᵐ[0, T] by

L₂ᵐ[0, T] = {u : uᵀ = (u₁, …, uₘ), where uᵢ ∈ L₂[0, T], i = 1, …, m}.

If we define the inner product by

(u, v) = ∫_0^T uᵀ(t)v(t) dt

for u, v ∈ L₂ᵐ[0, T], then it follows that L₂ᵐ[0, T] is a Hilbert space (see Example 6.11.11). Next, let us define the linear operator L: L₂ᵐ[0, T] → L₂ⁿ[0, T] by

[Lu](t) = ∫_0^t Φ(t, τ)Bu(τ) dτ   (7.10.10)

for all u ∈ L₂ᵐ[0, T]. Since the elements of Φ(t, τ) are continuous functions on [0, T] × [0, T], it follows that L is completely continuous.
Now recall from Exercise 5.10.59 that Eq. (7.10.9) is the unique solution to Eq. (7.10.8) when the elements of the vector u(t) are continuous functions of t. It can be shown that the solution of Eq. (7.10.8) exists in an extended sense if we permit u ∈ L₂ᵐ[0, T]. Allowing for this generalization, we can now consider the following optimal control problem. Let γ ∈ R be such that γ > 0, and let f be the real-valued functional defined on L₂ᵐ[0, T] given by

f(u) = ∫_0^T xᵀ(t)x(t) dt + γ∫_0^T uᵀ(t)u(t) dt,   (7.10.11)

where x(t) is given by Eq. (7.10.9) for u ∈ L₂ᵐ[0, T]. The linear quadratic cost control problem is to find u ∈ L₂ᵐ[0, T] such that f(u) in Eq. (7.10.11) is minimum, where x(t) is the solution to the set of ordinary differential equations (7.10.8). This problem can be cast into a minimization problem in a Hilbert space as follows.

Let

v(t) = −Φ(t, 0)x₀   for 0 ≤ t ≤ T.

Then we can rewrite Eq. (7.10.9) as

x = Lu − v,

and Eq. (7.10.11) assumes the form

f(u) = ‖Lu − v‖² + γ‖u‖².

We can find the desired minimizing u in the more general context of arbitrary real Hilbert spaces by means of the following result.

7.10.12. Theorem. Let X and Y be real Hilbert spaces, let L: X → Y be a completely continuous operator, and let L* denote the adjoint of L. Let v be a given fixed element in Y, let γ ∈ R, and define the functional f: X → R by

f(u) = ‖Lu − v‖² + γ‖u‖²   (7.10.13)

for u ∈ X. (In Eq. (7.10.13) we use the norm induced by the inner product and note that ‖u‖ is the norm of u ∈ X, while ‖Lu − v‖ is the norm of (Lu − v) ∈ Y.) If in Eq. (7.10.13) γ > 0, then there exists a unique u₀ ∈ X such that f(u₀) ≤ f(u) for all u ∈ X. Furthermore, u₀ is the solution to the equation

L*Lu₀ + γu₀ = L*v.   (7.10.14)

Proof. Let us first examine Eq. (7.10.14). Since L is a completely continuous operator, by Corollary 7.7.12, so is L*L. Furthermore, the eigenvalues of L*L cannot be negative, and so −γ cannot be an eigenvalue of L*L. Making the association T = L*L, λ = −γ, and y = L*v in Eq. (7.10.2), it is clear that T is normal and it follows from Theorem 7.10.3 that Eq. (7.10.14) has a unique solution. In fact, this solution is given by Eq. (7.10.4), using the above definitions of symbols.

Next, let us assume that u₀ is the unique element in X satisfying Eq. (7.10.14), and let h ∈ X be arbitrary. It follows from Eq. (7.10.13) that

f(u₀ + h) = (Lu₀ + Lh − v, Lu₀ + Lh − v) + γ(u₀ + h, u₀ + h)
= (Lu₀ − v, Lu₀ − v) + 2(Lh, Lu₀ − v) + (Lh, Lh) + γ(u₀, u₀) + 2γ(u₀, h) + γ(h, h)
= ‖Lu₀ − v‖² + γ‖u₀‖² + 2(h, L*Lu₀ + γu₀ − L*v) + ‖Lh‖² + γ‖h‖²
= f(u₀) + ‖Lh‖² + γ‖h‖².

Therefore, f(u₀ + h) is minimum if and only if h = 0. ∎
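Equation (7.10.14) is, in finite dimensions, the Tikhonov-regularized (ridge) normal equation. The sketch below solves (LᵀL + γI)u = Lᵀv and checks the identity f(u₀ + h) = f(u₀) + ‖Lh‖² + γ‖h‖² > f(u₀) from the proof (random data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
L = rng.standard_normal((6, 4))
v = rng.standard_normal(6)
gamma = 0.5

# f(u) = ||Lu - v||^2 + gamma ||u||^2, as in Eq. (7.10.13).
f = lambda u: np.sum((L @ u - v)**2) + gamma * np.sum(u**2)

# Eq. (7.10.14): the unique minimizer solves (L*L + gamma I) u = L*v.
u0 = np.linalg.solve(L.T @ L + gamma * np.eye(4), L.T @ v)
assert np.allclose(L.T @ (L @ u0) + gamma * u0, L.T @ v)

# f(u0 + h) = f(u0) + ||Lh||^2 + gamma ||h||^2 > f(u0) for every h != 0.
for _ in range(5):
    h = rng.standard_normal(4)
    assert f(u0 + h) > f(u0)
```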

The solution to Eq. (7.10.14) can be obtained from Eq. (7.10.4); however, a more convenient method is available for finding the solution when L is given by Eq. (7.10.10). This is summarized in the following result.

7.10.15. Theorem. Let γ > 0, and let f(u) be defined by Eq. (7.10.11), where x(t) is the solution to Eq. (7.10.8). If

u(t) = −(1/γ)BᵀP(t)x(t)

for all t such that 0 ≤ t ≤ T, where P(t) is the solution to the matrix differential equation

Ṗ(t) = −AᵀP(t) − P(t)A + (1/γ)P(t)BBᵀP(t) − I   (7.10.16)

with P(T) = 0, then u minimizes f(u).

Proof. We want to show that u satisfies Eq. (7.10.14), where Lu is given by Eq. (7.10.10). We note that if u satisfies Eq. (7.10.14), then

u = −(1/γ)L*(Lu − v) = −(1/γ)L*x.

We now find the expression for evaluating L*w for arbitrary w ∈ L₂ⁿ[0, T]. We compute

(w, Lu) = ∫_0^T [∫_0^s Φ(s, t)Bu(t) dt]ᵀ w(s) ds
= ∫_0^T ∫_0^s uᵀ(t)BᵀΦᵀ(s, t)w(s) dt ds
= ∫_0^T uᵀ(t)[∫_t^T BᵀΦᵀ(s, t)w(s) ds] dt.

In order for this last expression to equal (L*w, u), we must have

[L*w](t) = ∫_t^T BᵀΦᵀ(s, t)w(s) ds.

Thus, u must satisfy

u(t) = −(1/γ)Bᵀ ∫_t^T Φᵀ(s, t)x(s) ds

for all t such that 0 ≤ t ≤ T. Now assume there exists a matrix P(t) such that

P(t)x(t) = ∫_t^T Φᵀ(s, t)x(s) ds.   (7.10.17)

We now find conditions for such a matrix P(t) to exist. First, we see that P(T) = 0. Next, differentiating both sides of Eq. (7.10.17) with respect to t, and noting that (∂/∂t)Φᵀ(s, t) = −AᵀΦᵀ(s, t), we have

Ṗ(t)x(t) + P(t)ẋ(t) = −x(t) − Aᵀ ∫_t^T Φᵀ(s, t)x(s) ds = −x(t) − AᵀP(t)x(t).

Therefore,

Ṗ(t)x(t) = −P(t)[Ax(t) + Bu(t)] − x(t) − AᵀP(t)x(t).

But

u(t) = −(1/γ)BᵀP(t)x(t),

so that

Ṗ(t)x(t) = −AᵀP(t)x(t) − P(t)Ax(t) + (1/γ)P(t)BBᵀP(t)x(t) − x(t).

Hence, P(t) must satisfy

Ṗ(t) = −AᵀP(t) − P(t)A + (1/γ)P(t)BBᵀP(t) − I

with P(T) = 0. If u(t) = −(1/γ)BᵀP(t)x(t), it follows that u satisfies

L*Lu + γu = L*v,

where v(t) = −Φ(t, 0)x₀, and so, by Theorem 7.10.12, u minimizes f given by Eq. (7.10.11). This completes the proof of the theorem. ∎

The differential equation for P(t) in Eq. (7.10.16) is called a matrix Riccati equation and can be shown to have a unique solution for all t ≤ T.
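For a scalar system the Riccati equation (7.10.16) can be integrated backward from P(T) = 0 with a simple fixed-step scheme; no ODE library is assumed. The values of a, b, γ, and T below are illustrative choices:

```python
import numpy as np

# Scalar linear-quadratic problem: x' = a x + b u, cost per Eq. (7.10.11).
a, b, gamma, T = -1.0, 1.0, 0.5, 5.0

# Integrate Eq. (7.10.16) backward in time from P(T) = 0: stepping from
# t = T down to t = 0, so each step subtracts dt * dP/dt.
dt = 1e-4
p = 0.0
for _ in range(int(T / dt)):
    pdot = -2.0 * a * p + (b**2 / gamma) * p**2 - 1.0
    p -= dt * pdot

p0 = p
assert p0 > 0.0   # P(t) is positive for t < T in this example

# The optimal feedback of Theorem 7.10.15 at t = 0: u = -(1/gamma) b P(0) x.
x0 = 1.0
u0 = -(1.0 / gamma) * b * p0 * x0
assert u0 < 0.0   # the control opposes the positive initial state
```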

C. Minimiz a tion of Functionals:

Method of Steepest Descent

The problem of finding the minimum (or maximum) of functionals arises


frequently in many diverse areas in applications. In this part we turn our
attention to an iterative method of obtaining the minimum of a functional
I defined on a real Hilbert space .X
Consider a functional I: X - + R of the form

I(x ) =

(x, Mx ) -

2(w, x )

+ p,

(7.10.18)

where w is a fixed vector in X, where p ∈ R, and where M is a linear self-adjoint operator having the property

c₁‖x‖² ≤ (x, Mx) ≤ c₂‖x‖²    (7.10.19)

for all x ∈ X and some constants c₂ > c₁ > 0. The reader can readily verify that the functional given in Eq. (7.10.13) is a special case of f given in Eq. (7.10.18), where we make the association M = L*L + γI (provided γ > 0), w = L*v, and p = (v, v).

Under the above conditions, the equation

Mx = w    (7.10.20)


has a unique solution, say x₀, and x₀ minimizes f(x). Iterative methods are based on beginning with an initial guess to the solution of Eq. (7.10.20) and then successively attempting to improve the estimate according to a recursive relationship of the form

xₙ₊₁ = xₙ + αₙrₙ,    (7.10.21)

where αₙ ∈ R and rₙ ∈ X. Different methods of selecting αₙ and rₙ give rise to various algorithms of minimizing f(x) given in Eq. (7.10.18) or, equivalently, finding the solution to Eq. (7.10.20). In this part we shall in particular consider the method of steepest descent. In doing so we let

rₙ = w - Mxₙ,    n = 1, 2, . . . .    (7.10.22)

The term rₙ defined by Eq. (7.10.22) is called the residual of the approximation xₙ. If, in particular, xₙ satisfies Eq. (7.10.20), we see that the residual is zero. For f(x) given in Eq. (7.10.18), we see that

f′(xₙ) = -2rₙ,

where f′(xₙ) denotes the gradient of f(xₙ). That is, the residual, rₙ, is "pointing" in the direction of the negative of the gradient, or in the direction of steepest descent. Equation (7.10.21) indicates that the correction term αₙrₙ is to be a scalar multiple of the gradient, and thus the steepest descent method constitutes an example of one of the so-called "gradient methods." With rₙ given by Eq. (7.10.22), αₙ is chosen so that f(xₙ + αₙrₙ) is minimum. Substituting xₙ + αₙrₙ into Eq. (7.10.18), it is readily shown that

αₙ = (rₙ, rₙ)/(rₙ, Mrₙ)    (7.10.23)

is the minimizing value. This method is illustrated pictorially in Figure B.

Figure B. Illustration of the method of steepest descent.
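When X = Rⁿ with the usual inner product and M is a symmetric positive definite matrix, the iteration (7.10.21)-(7.10.23) takes only a few lines of code. The sketch below uses hypothetical data: the particular M, w, starting point, and tolerance are illustrative assumptions, not from the text.

```python
import numpy as np

def steepest_descent(M, w, x1, tol=1e-10, max_iter=10_000):
    """Minimize f(x) = (x, Mx) - 2(w, x) + p, i.e. solve Mx = w, by
    x_{n+1} = x_n + a_n r_n with residual r_n = w - M x_n and step
    length a_n = (r_n, r_n)/(r_n, M r_n)."""
    x = np.asarray(x1, dtype=float).copy()
    for _ in range(max_iter):
        r = w - M @ x                # residual, Eq. (7.10.22)
        rr = r @ r
        if rr < tol ** 2:            # zero residual: x solves Mx = w
            break
        a = rr / (r @ (M @ r))       # minimizing step, Eq. (7.10.23)
        x = x + a * r                # update, Eq. (7.10.21)
    return x

# Hypothetical data: M symmetric with c1||x||^2 <= (x, Mx) <= c2||x||^2
M = np.array([[4.0, 1.0], [1.0, 3.0]])
w = np.array([1.0, 2.0])
x = steepest_descent(M, w, x1=np.zeros(2))
# x now approximates the unique solution of Mx = w
```

Note that the method never forms M⁻¹; each step needs only one application of M to the current residual.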


In the following result we show that under appropriate conditions the sequence {xₙ} generated in the heuristic discussion above converges to the unique minimizing element x₀ satisfying Eq. (7.10.20).

7.10.24. Theorem. Let M ∈ B(X, X) be a self-adjoint operator such that for some pair of positive real numbers η and μ we have η‖x‖² ≤ (x, Mx) ≤ μ‖x‖² for all x ∈ X. Let x₁ ∈ X be arbitrary, let w ∈ X, and let rₙ = w - Mxₙ, where xₙ₊₁ = xₙ + αₙrₙ for n = 1, 2, ..., and αₙ = (rₙ, rₙ)/(rₙ, Mrₙ). Then the sequence {xₙ} converges to x₀, where x₀ is the unique solution to Eq. (7.10.20).

Proof. In view of the Schwarz inequality we have (x, Mx) ≤ ‖Mx‖‖x‖. This implies that η‖x‖ ≤ ‖Mx‖ for all x ∈ X, and so M is a bijective mapping by Theorem 7.4.21, with M⁻¹ ∈ B(X, X) and ‖M⁻¹‖ ≤ 1/η. By Theorem 7.4.10, M⁻¹ is also self-adjoint. Let x₀ be the unique solution to Eq. (7.10.20), and define F: X → R by

F(x) = (x - x₀, M(x - x₀)) for x ∈ X.

We see that F is minimized uniquely by x = x₀, and furthermore F(x₀) = 0. We now show that lim F(xₙ) = 0. If for some n, F(xₙ) = 0, the process terminates and we are done. So assume in the following that F(xₙ) ≠ 0. Note also that since M is positive, we have F(x) ≥ 0 for all x ∈ X.

We begin with the fact that

F(xₙ₊₁) = F(xₙ) - 2αₙ(rₙ, Myₙ) + αₙ²(rₙ, Mrₙ),

where we have let yₙ = x₀ - xₙ. Noting that rₙ = Myₙ, so that F(xₙ) = (yₙ, Myₙ) = (M⁻¹rₙ, rₙ), we have

F(xₙ₊₁) = [1 - (rₙ, rₙ)²/((rₙ, Mrₙ)(M⁻¹rₙ, rₙ))] F(xₙ).

Since

(rₙ, rₙ)²/((rₙ, Mrₙ)(M⁻¹rₙ, rₙ)) ≥ η/μ,

it follows that

F(xₙ₊₁) ≤ (1 - η/μ)F(xₙ) ≤ (1 - η/μ)ⁿ F(x₁).

Thus, lim F(xₙ) = 0, and so xₙ → x₀, which was to be proven. ∎
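In finite dimensions the contraction estimate F(xₙ₊₁) ≤ (1 - η/μ)F(xₙ) from the proof can be checked directly, taking η and μ to be the extreme eigenvalues of a symmetric positive definite matrix M. The sketch below uses hypothetical data chosen only to exercise the bound.

```python
import numpy as np

M = np.array([[4.0, 1.0], [1.0, 3.0]])    # symmetric, positive definite
w = np.array([1.0, 2.0])
x0 = np.linalg.solve(M, w)                # unique solution of Mx = w
eigs = np.linalg.eigvalsh(M)
eta, mu = eigs[0], eigs[-1]               # eta||x||^2 <= (x, Mx) <= mu||x||^2

F = lambda x: (x - x0) @ (M @ (x - x0))   # F(x) = (x - x0, M(x - x0))

x = np.array([5.0, -5.0])                 # arbitrary starting vector x1
for _ in range(20):
    r = w - M @ x
    a = (r @ r) / (r @ (M @ r))
    x_next = x + a * r
    # per-step contraction established in the proof
    assert F(x_next) <= (1 - eta / mu) * F(x) + 1e-12
    x = x_next
```

Since F(xₙ) decreases at least geometrically with ratio 1 - η/μ, the iterates converge to x₀ at the rate the proof guarantees.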

7.11. REFERENCES AND NOTES

Many of the excellent sources dealing with linear operators on Banach and Hilbert spaces include Balakrishnan [7.2], Dunford and Schwarz [7.5], Kantorovich and Akilov [7.6], Kolmogorov and Fomin [7.7], Liusternik and Sobolev [7.8], Naylor and Sell [7.11], and Taylor [7.12]. The exposition by Naylor and Sell is especially well suited from the viewpoint of applications in science and engineering.


For applications of the type considered in Section 7.10, as well as additional applications, refer to Antosiewicz and Rheinboldt [7.1], Balakrishnan [7.2], Byron and Fuller [7.3], Curtain and Pritchard [7.4], Kantorovich and Akilov [7.6], Lovitt [7.9], and Luenberger [7.10]. Applications to integral equations (see Section 7.10A) are treated in [7.3] and [7.9]. Optimal control problems (see Section 7.10B) in a Banach and Hilbert space setting are presented in [7.2], [7.4], and [7.10]. Methods for minimization of functionals (see Section 7.10C) are developed in [7.1], [7.6], and [7.10].

REFERENCES

[7.1] H. A. ANTOSIEWICZ and W. C. RHEINBOLDT, "Numerical Analysis and Functional Analysis," Chapter 14 in Survey of Numerical Analysis, ed. by J. TODD. New York: McGraw-Hill Book Company, 1962.
[7.2] A. V. BALAKRISHNAN, Applied Functional Analysis. New York: Springer-Verlag, 1976.
[7.3] F. W. BYRON and R. W. FULLER, Mathematics of Classical and Quantum Physics. Vols. I, II. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1969 and 1970.
[7.4] R. F. CURTAIN and A. J. PRITCHARD, Functional Analysis in Modern Applied Mathematics. London: Academic Press, Inc., 1977.
[7.5] N. DUNFORD and J. SCHWARZ, Linear Operators, Parts I and II. New York: Interscience Publishers, 1958 and 1964.
[7.6] L. V. KANTOROVICH and G. P. AKILOV, Functional Analysis in Normed Spaces. New York: The Macmillan Company, 1964.
[7.7] A. N. KOLMOGOROV and S. V. FOMIN, Elements of the Theory of Functions and Functional Analysis. Vols. I, II. Albany, N.Y.: Graylock Press, 1957 and 1961.
[7.8] L. A. LIUSTERNIK and V. J. SOBOLEV, Elements of Functional Analysis. New York: Frederick Ungar Publishing Company, 1961.
[7.9] W. V. LOVITT, Linear Integral Equations. New York: Dover Publications, Inc., 1950.
[7.10] D. G. LUENBERGER, Optimization by Vector Space Methods. New York: John Wiley & Sons, Inc., 1969.
[7.11] A. W. NAYLOR and G. R. SELL, Linear Operator Theory. New York: Holt, Rinehart and Winston, 1971.
[7.12] A. E. TAYLOR, Introduction to Functional Analysis. New York: John Wiley & Sons, Inc., 1958.

*Reprinted in one volume by Dover Publications, Inc., New York, 1992.

INDEX

Abelian group, 40
abstract algebra, 33
additive group, 46
adherent point, 275
adjoint system of
ordinary differential
equations, 261
adjoint transformation, 219, 220,422
affine linear subspace, 85
algebra, 30,56,57,104
algebraically closed
field, 165
algebraic conjugate, 110
algebraic multiplicity, 167,223

algebraic structure, 31
algebraic system, 30
algebra with identity, 57,105
aligned, 379
almost everywhere, 295
approximate eigenvalue, 444
approximate point
spectrum, 444
approximation, 395
Arzela-Ascoli theorem, 316
Ascoli's lemma, 317
associative algebra, 56, 105
associative operation, 28
automorphism, 64, 68
autonomous system of
differential equations, 241
Axioms of norm, 207
B

Banach inverse theorem, 416


Banach space, 31, 345
basis, 61,89
Bessel inequality, 213, 380
bicompact, 302
bijection, 14
bijective, 14, 100
bilinear form, 114
bilinear functional, 114-115
binary operation, 26
block diagonal matrix, 175
Bolzano-Weierstrass
property, 302
Bolzano-Weierstrass
theorem, 298
boundary, 279
bounded linear
functional, 356
bounded linear operator, 407
bounded metric space, 265
bounded sequence, 286
B(X,Y), 409

C
C[a,b], 80
cancellation laws, 34
canonical mapping, 372
cardinal number, 24
cartesian product, 10
Cauchy-Peano
existence theorem, 332
Cauchy sequence, 290
Cayley-Hamilton theorem, 167
Cayley's theorem, 66
characteristic equation, 166,259
characteristic polynomial, 166
characteristic value, 164
characteristic vector, 164
0 > 79
classical adjoint of a
matrix, 162
closed interval, 283
closed relative to an
operation, 28

closed set, 279


closed sphere, 283
closure, 275
Cn, 78
cofactor, 158
colinear, 379
collection of subsets, 8
column matrix, 132
column of a matrix, 132
column rank of a matrix, 152
column vector, 125
commutative algebra, 57,105
commutative group, 40
commutative operation, 28
commutative ring, 47
compact, 302
compact operator, 447
companion form, 256
comparable matrices, 137
complement of a subset, 4
completely continuous
operator, 447
complete metric space, 290
complete orthonormal
set of vectors, 213, 389
completion, 295
complex vector space, 76
composite function, 16
composite mathematical
system, 30, 54
conformal matrices, 137
congruent matrices, 198
conjugate functional, 114
conjugate operator, 421
constant coefficients, 241
contact point, 275
continuation of a
solution, 336
continuous function, 307,408
continuous spectrum, 440
contraction mapping, 314
converge, 286,350
convex, 351-355
coordinate representation
of a vector, 125
coordinates of a vector
with respect to a basis, 92, 124
countable set, 23
countably infinite set, 23


covering, 299
cyclic group, 43,44

D
degree of a polynomial, 70
DeMorgan's laws, 7,12
dense-in-itself, 284
denumerable set, 23
derived set, 277-278
determinant of a
linear transformation, 163
determinant of a matrix, 157
diagonalization of a
matrix, 172
diagonalization process, 450
diagonal matrix, 155
diameter of a set, 267
difference of sets, 7
differentiation:
of matrices, 247
of vectors, 241
dimension, 78,92,392
direct product, 10
direct sum of linear
subspaces, 83, 457
discrete metric, 265
disjoint sets, 5
disjoint vector spaces, 83
distance, 264
between a point
and a set, 267
between sets, 267
between vectors, 208
distribution function, 397
distributive, 28
diverge, 286, 350
division algorithm, 71
division (of
polynomials), 72
division ring, 46, 50
divisor, 49
divisors of zero, 48
domain of a function, 12
domain of a relation, 25
dot product, 114

dual, 358
dual basis, 112

E
e-approximate solution, 329
e-dense set, 299
e-net, 299
eigenvalue, 164,439
eigenvector, 164,439
element, 2
element of ordered set, 10
empty set, 3
endomorphism, 64, 68
equal by definition, 10
equality of functions, 14
equality of matrices, 132
equality of sets, 3
equals relation, 26
equicontinuous, 316
equivalence relation, 26
equivalent matrices, 151
equivalent metrics, 318
equivalent sets, 23
error vector, 395
estimate, 398
Euclidean metric, 271
Euclidean norm, 207
Euclidean space, 30,124, 205
even permutation, 156
events, 397
everywhere dense, 284
expected value, 398
extended real line, 266
extended real numbers, 266
extension of a function, 20
extension of an
operation, 29
exterior, 279
extremum, 464

F
factor, 72
family of disjoint sets, 12
family of subsets, 8


field, 30, 46, 50


field of complex
numbers, 51
field of real numbers, 51
finite covering, 299
finite-dimensional
operator, 450
finite-dimensional
vector space, 92,124
finite group, 40
finite intersection
property, 305
finite linear
combination of vectors, 85
finite set, 8
fixed point, 315
flat, 85
F n , 78
Fourier coefficients, 380,389
Frechet derivative, 458
Fredholm equation, 97,326
Fredholm operator, 425
function, 12
functional, 109,355
functional analysis, 343
function space, 80
fundamental matrix, 246
fundamental sequence, 290
fundamental set, 246
fundamental theorem
of algebra, 74
fundamental theorem of
linear equations, 99

G
Gateaux differential, 458
generalized associative
law, 36
generated subspace, 383
generators of a set, 60
Gram matrix, 395
Gram-Schmidt process, 213,391
graph of a function, 14
greatest common divisor, 73
Gronwall inequality, 332
group, 30, 39

group component, 46
group operation, 46

H
Hahn-Banach theorem, 367-370
half space, 366
Hamel basis, 89
Hausdorff spaces, 323
Heine-Borel property, 302
Heine-Borel theorem, 299
hermitian operator, 427
Hilbert space, 31, 377
homeomorphism, 320
homogeneous property
of a norm, 208,344
homogeneous system, 241-242
homomorphic image, 62,68
homomorphic rings, 67
homomorphic semigroups, 63
homomorphism, 30, 62
hyperplane, 364

I
idempotent operator, 121
identity:
element, 35
function, 19
matrix, 139
permutation, 19,44
relation, 26
transformation, 105,409
image of a set under f, 21
indeterminate of a
polynomial ring, 70
index:
of a nilpotent
operator, 185
of a symmetric
bilinear functional, 202
set, 10
indexed family of sets, 10
indexed set, 11
induced:
mapping, 20

metric, 267
norm,349,412
operation, 29
inequalities, 268-271
infinite-dimensional
vector space, 92
infinite series, 350
infinite set, 8
initial value problem, 238-261,328-:
injection, 14
injective, 14,100
inner product, 117,205,375
inner product space, 31, 118, 205
inner product subspace, 118
integral domain, 46,49
integration:
of matrices, 249
of vectors, 249
interior, 278
intersection of sets, 5
invariant linear
subspace, 122
inverse:
image 21
of a function, 15, 100
of a matrix, 140
of an element, 38
relation, 25
invertible element, 37
invertible linear
transformation, 100
invertible matrix, 140
irreducible polynomial, 74
irreflexive, 372
isolated point, 275
isometric operator, 431
isometry,321
isomorphic, 108
isomorphic semigroups, 64
isomorphism, 30, 63, 68,108

J
Jacobian matrix, 461
Jacobi identity, 57
Jordan canonical form, 175,191

K
Kalman's theorem, 401-402
kernel of a homomorphism, 65
Kronecker delta, 111

L
Laplace transform, 96
latent value, 164
leading coefficient of
a polynomial, 70
Lebesgue integral, 296
Lebesgue measurable
function, 296
Lebesgue measurable
sets, 295
Lebesgue measure, 295
left cancellation
property, 34
left distributive, 28
left identity, 35
left inverse, 36
left invertible element, 37
left R-module, 54
left solution, 40
Lie algebra, 57
limit, 286
limit point, 277,288
line segment, 351
linear:
algebra, 33
functional, 109,355-360
manifold, 81
operator, 31,95
quadratic cost
control, 468
space, 30,55,76
subspace, 59,81,348
subspace generated
by a set, 86
transformation, 30, 95,100
variety, 85
linearly dependent, 87
linearly independent, 87
Lipschitz condition, 324, 328
Lipschitz constant, 324, 328


lower triangular matrix, 176


L 297
L(X,Y), 104

M
map, 13
mapping, 13
mathematical system, 30
matrices, 30
matrix, 132
matrix of:
a bilinear functional, 195
a linear transformation, 131
one basis with respect
to a second basis, 149
maximal linear subspace, 363
metric, 31,209,264
metric space, 31,209, 263-342
metric subspace, 267
minimal polynomial, 179,181
minor of a matrix, 158
modal matrix, 172
modern algebra, 33
module, 30, 54
monic polynomial, 70
monoid, 37
multiplication of a
linear transformation
by a scalar, 104
multiplication of
vectors by scalars, 76,409
multiplicative semigroup, 46
multiplicity of an
eigenvalue, 164
multivalued function, 25

N
natural basis, 126
natural coordinates, 127
n-dimensional complex
coordinate space, 78
n-dimensional real
coordinate space, 78

n-dimensional vector
space, 92
negative definite
matrix, 222
nested sequence
of sets, 298
Neumann expansion
theorem, 415
nilpotent operator, 185
non-abelian group, 40
non-commutative group, 40
non-empty set, 3
non-homogeneous system, 241-242
non-linear
transformation, 95
non-singular linear
transformation, 100
non-singular matrix, 140
non-void set, 3
norm, 206, 344
normal:
equations, 395
linear
transformation, 237
operator, 431
topological space, 323
normalizing a vector, 209
normed conjugate space, 358
normed dual space, 358
normed linear space, 31, 208,344
norm of a bounded
linear transformation, 409
norm preserving, 367
nowhere dense, 284
null:
matrix, 139
set, 3
space, 98,224
vector, 76, 77
nullity of a linear
transformation, 100
n-vector, 132

O
object, 2
observations, 398
odd permutation, 156

one-to-one and onto
mapping, 14,100
one-to-one mapping, 14, 100
onto mapping, 14,100
open:
ball, 275
covering, 299
interval, 282
set, 279
sphere, 275
operation table, 27
operator, 13
optimal control problem, 468
ordered sets, 9
order of a group, 40
order of a polynomial, 70
order of a set, 8
ordinary differential
equations, 238-261
origin, 76, 77
orthogonal:
basis, 210
complement, 215,382
linear transformation, 217, 231-:
matrix, 216,226
projection, 123,433
set of vectors, 379
vectors, 118,209
orthogonality principle, 399
orthonormal set of
vectors, 379
outcomes, 397

point spectrum, 440


polarization, 116
polynomial, 69
positive definite matrix, 222
positive operator, 429
power class, 9
power set, 9
precompact, 299
predecessor of an
operation, 29
pre-Hilbert space, 377
primary decomposition
theorem, 183
principal minor of a
matrix, 158
principle of superposition, 96
probability space, 397
product metric spaces, 274
product of:
a matrix by a scalar, 138
linear transformations, 105,409
two elements, 46,104
two matrices, 138
projection, 119,226,387
projection theorem, 387,400
proper:
subset, 3
subspace, 81, 164
value, 164
vector, 164
Pythagorean theorem, 209, 379
Q

P
parallel, 364
parallelogram law, 208, 379
Parseval's formula, 390
Parseval's identity, 212
partial sums, 350
partitioned matrix, 147
permutation group, 44,45
permutation on a set, 19
piecewise continuous
derivatives, 329
point of accumulation, 277
points, 264

quadratic form, 115, 226


quotient, 72

R
radius, 275
random variable, 397
range of a function, 12
range of a relation, 25
range space, 98
rank of a linear
transformation, 100

rank of a matrix, 136
rank of a symmetric
bilinear functional, 202
real inner product space, 205
real line, 265
real vector space, 76
reduce, 435
reduced characteristic
function, 179
reduced linear
transformation, 122
reflection, 218
reflexive, 372
reflexive relation, 25
regular topological
space, 323
relation, 25
relatively compact, 307
relatively prime, 73
remainder, 72
repeated eigenvalues, 173
residual, 472
residual spectrum, 440
resolution of the
identity, 226,457
resolvent set, 439
restriction of a mapping, 20
R-homomorphism, 68
Riccati equation, 471
Riemann integrable, 296
Riesz representation
theorem, 393
right:
cancellation property, 34
distributive, 28
identity, 34
inverse, 35
invertible element, 37
R-module, 54
solution, 40
R, 78
ring, 30,46
ring of integers, 51
ring of polynomials, 70
ring with identity, 47
R-module, 54
Rn, 78
rotation, 218, 230
row of a matrix, 131

row rank of a matrix, 152
row vector, 125,132
R*, 266
R-submodule, 58
R-submodule generated
by a set, 60

S
scalar, 75
scalar multiplication, 76
Schwarz inequality, 207,376
second dual space, 371
secular value, 164
self-adjoint linear
transformation, 221, 224-225
self-adjoint operators, 428
semigroup, 30, 36
semigroup component, 46
semigroup of
transformations, 44
semigroup operation, 46
separable, 284, 300
separates, 366
sequence, 11, 286
sequence of disjoint
sets, 12
sequence of sets, 11
sequentially compact, 301-305
set, 1
set of order zero, 8
shift operator, 441
σ-algebra, 397
σ-field, 397
signature of a symmetric
bilinear functional, 202
similarity transformation, 153
similar matrices, 153
simple eigenvalues, 164
singleton set, 8
singular linear
transformation, 101
singular matrix, 140
skew-adjoint linear
transformation, 221, 237
skew symmetric bilinear
functional, 196
skew symmetric matrix, 196

skew symmetric part of a
linear functional, 196
solution of a differential
equation, 239
solution of an initial
value problem, 239
space of:
bounded complex
sequences, 79
bounded real sequences, 79
finitely non-zero
sequences, 79
linear transformations, 104
real-valued continuous
functions, 80
span, 86
spectral theorem, 226,455,457
spectrum, 164,439
sphere, 275
spherical neighborhood, 275
square matrix, 132
state transition matrix, 247-255
steepest descent, 472
strictly positive, 429
strong convergence, 373
subalgebra, 105
subcovering, 299
subdomain, 52
subfield, 52
subgroup, 41
subgroup generated
by a set, 43
submatrix, 147
subring, 52
subring generated by
a set, 53
subsemigroup, 40
subsemigroup generated
by a set, 41
subsequence, 287
subset, 3
subsystem, 40,46
successive
approximations, 315, 324-328
sum of:
elements, 46
linear operators, 409
linear transformations, 104
matrices, 138

sets, 82
vectors, 76
surjective, 14, 100
Sylvester's theorem, 199
symmetric difference
of sets, 7
symmetric matrix, 196, 226
symmetric part of a
linear functional, 196
symmetric relation, 26
system of differential
equations, 240, 255-260

T
ternary operation, 26
Tj-spaces, 323
topological space, 31
topological structure, 31
topology, 280, 318,322-323
totally bounded, 299
T',421
trace of a matrix, 169
transformation, 13
transformation group, 45
transitive relation, 26
transpose of a linear
transformation, 113,420
transpose of a matrix, 133
transpose of a vector, 125
triangle inequality, 208, 264, 344
triangular matrix, 176
trivial ring, 48
trivial solution, 245
trivial subring, 53
truncation operator, 439
T*, 422
T^T, 113

U
unbounded linear
functional, 356
unbounded metric space, 265
uncountable set, 23
uniform convergence, 313


uniformly continuous, 308


union of sets, 5
unit, 37
unitary operator, 431
unitary space, 205
unit of a ring, 47
unit vector, 209
unordered pair of
elements, 9
upper triangular matrix, 176
usual metric for R*, 266,320
usual metric on R, 265
usual metric on Rn, 271

V
vacuous set, 3
Vandermonde matrix, 260
variance, 398
vector, 75
vector addition, 75
vector space, 30,55, 76
vector space of n-tuples
over F, 56
vector space over a field, 76
vector subspace, 59

Venn diagram, 8
void set, 3
Volterra equation, 327
Volterra integral
equation, 97

W
weak convergence, 373
weakly continuous, 375
weak* compact, 375
weak-star convergence, 373
Weierstrass approximation
theorem, 285
Wronskian, 256-259
XYZ
Xf, 357
X*, 357-358
zero:
polynomial, 70
transformation, 104,409
vector, 76, 77
Zorn's lemma, 390
