Matrix Analysis
for Scientists & Engineers
Matrix Analysis
for Scientists & Engineers
Alan J. Laub
University of California
Davis, California
siam.
Copyright © 2005 by the Society for Industrial and Applied Mathematics.
10 9 8 7 6 5 4 3 2 1
All rights reserved. Printed in the United States of America. No part of this book
may be reproduced, stored, or transmitted in any manner without the written permission
of the publisher. For information, write to the Society for Industrial and Applied
Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.
MATLAB® is a registered trademark of The MathWorks, Inc. For MATLAB product information,
please contact The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 USA,
508-647-7000, Fax: 508-647-7101, info@mathworks.com, www.mathworks.com
Mathematica is a registered trademark of Wolfram Research, Inc.
Mathcad is a registered trademark of Mathsoft Engineering & Education, Inc.
Library of Congress Cataloging-in-Publication Data
Laub, Alan J., 1948–
Matrix analysis for scientists and engineers / Alan J. Laub.
p. cm.
Includes bibliographical references and index.
ISBN 0-89871-576-8 (pbk.)
1. Matrices. 2. Mathematical analysis. I. Title.
QA188.L38 2005
512.9'434—dc22
2004059962
About the cover: The original artwork featured on the cover was created by freelance
artist Aaron Tallon of Philadelphia, PA. Used by permission.
siam is a registered trademark.
To my wife, Beverley
(who captivated me in the UBC math library
nearly forty years ago)
Contents
Preface xi
1 Introduction and Review 1
1.1 Some Notation and Terminology 1
1.2 Matrix Arithmetic 3
1.3 Inner Products and Orthogonality 4
1.4 Determinants 4
2 Vector Spaces 7
2.1 Definitions and Examples 7
2.2 Subspaces 9
2.3 Linear Independence 10
2.4 Sums and Intersections of Subspaces 13
3 Linear Transformations 17
3.1 Definition and Examples 17
3.2 Matrix Representation of Linear Transformations 18
3.3 Composition of Transformations 19
3.4 Structure of Linear Transformations 20
3.5 Four Fundamental Subspaces 22
4 Introduction to the Moore-Penrose Pseudoinverse 29
4.1 Definitions and Characterizations 29
4.2 Examples 30
4.3 Properties and Applications 31
5 Introduction to the Singular Value Decomposition 35
5.1 The Fundamental Theorem 35
5.2 Some Basic Properties 38
5.3 Row and Column Compressions 40
6 Linear Equations 43
6.1 Vector Linear Equations 43
6.2 Matrix Linear Equations 44
6.3 A More General Matrix Linear Equation 47
6.4 Some Useful and Interesting Inverses 47
7 Projections, Inner Product Spaces, and Norms 51
7.1 Projections 51
7.1.1 The four fundamental orthogonal projections 52
7.2 Inner Product Spaces 54
7.3 Vector Norms 57
7.4 Matrix Norms 59
8 Linear Least Squares Problems 65
8.1 The Linear Least Squares Problem 65
8.2 Geometric Solution 67
8.3 Linear Regression and Other Linear Least Squares Problems 67
8.3.1 Example: Linear regression 67
8.3.2 Other least squares problems 69
8.4 Least Squares and Singular Value Decomposition 70
8.5 Least Squares and QR Factorization 71
9 Eigenvalues and Eigenvectors 75
9.1 Fundamental Definitions and Properties 75
9.2 Jordan Canonical Form 82
9.3 Determination of the JCF 85
9.3.1 Theoretical computation 86
9.3.2 On the +1's in JCF blocks 88
9.4 Geometric Aspects of the JCF 89
9.5 The Matrix Sign Function 91
10 Canonical Forms 95
10.1 Some Basic Canonical Forms 95
10.2 Definite Matrices 99
10.3 Equivalence Transformations and Congruence 102
10.3.1 Block matrices and definiteness 104
10.4 Rational Canonical Form 104
11 Linear Differential and Difference Equations 109
11.1 Differential Equations 109
11.1.1 Properties of the matrix exponential 109
11.1.2 Homogeneous linear differential equations 112
11.1.3 Inhomogeneous linear differential equations 112
11.1.4 Linear matrix differential equations 113
11.1.5 Modal decompositions 114
11.1.6 Computation of the matrix exponential 114
11.2 Difference Equations 118
11.2.1 Homogeneous linear difference equations 118
11.2.2 Inhomogeneous linear difference equations 118
11.2.3 Computation of matrix powers 119
11.3 Higher-Order Equations 120
12 Generalized Eigenvalue Problems 125
12.1 The Generalized Eigenvalue/Eigenvector Problem 125
12.2 Canonical Forms 127
12.3 Application to the Computation of System Zeros 130
12.4 Symmetric Generalized Eigenvalue Problems 131
12.5 Simultaneous Diagonalization 133
12.5.1 Simultaneous diagonalization via SVD 133
12.6 Higher-Order Eigenvalue Problems 135
12.6.1 Conversion to first-order form 135
13 Kronecker Products 139
13.1 Definition and Examples 139
13.2 Properties of the Kronecker Product 140
13.3 Application to Sylvester and Lyapunov Equations 144
Bibliography 151
Index 153
Preface
This book is intended to be used as a text for beginning graduate-level (or even senior-level)
students in engineering, the sciences, mathematics, computer science, or computational
science who wish to be familiar with enough matrix analysis that they are prepared to use its
tools and ideas comfortably in a variety of applications. By matrix analysis I mean linear
algebra and matrix theory together with their intrinsic interaction with and application to
linear dynamical systems (systems of linear differential or difference equations). The text
can be used in a one-quarter or one-semester course to provide a compact overview of
much of the important and useful mathematics that, in many cases, students meant to learn
thoroughly as undergraduates, but somehow didn't quite manage to do. Certain topics
that may have been treated cursorily in undergraduate courses are treated in more depth
and more advanced material is introduced. I have tried throughout to emphasize only the
more important and "useful" tools, methods, and mathematical structures. Instructors are
encouraged to supplement the book with specific application examples from their own
particular subject area.
The choice of topics covered in linear algebra and matrix theory is motivated both by
applications and by computational utility and relevance. The concept of matrix factorization
is emphasized throughout to provide a foundation for a later course in numerical linear
algebra. Matrices are stressed more than abstract vector spaces, although Chapters 2 and 3
do cover some geometric (i.e., basisfree or subspace) aspects of many of the fundamental
notions. The books by Meyer [18], Noble and Daniel [20], Ortega [21], and Strang [24]
are excellent companion texts for this book. Upon completion of a course based on this
text, the student is then well-equipped to pursue, either via formal courses or through
self-study, follow-on topics on the computational side (at the level of [7], [11], [23], or [25], for
example) or on the theoretical side (at the level of [12], [13], or [16], for example).
Prerequisites for using this text are quite modest: essentially just an understanding
of calculus and definitely some previous exposure to matrices and linear algebra. Basic
concepts such as determinants, singularity of matrices, eigenvalues and eigenvectors, and
positive definite matrices should have been covered at least once, even though their
recollection may occasionally be "hazy." However, requiring such material as prerequisite permits
the early (but "out-of-order" by conventional standards) introduction of topics such as
pseudoinverses and the singular value decomposition (SVD). These powerful and versatile tools
can then be exploited to provide a unifying foundation upon which to base subsequent
topics. Because tools such as the SVD are not generally amenable to "hand computation," this
approach necessarily presupposes the availability of appropriate mathematical software on
a digital computer. For this, I highly recommend MATLAB® although other software such as
Mathematica® or Mathcad® is also excellent. Since this text is not intended for a course in
numerical linear algebra per se, the details of most of the numerical aspects of linear algebra
are deferred to such a course.
The presentation of the material in this book is strongly influenced by computational
issues for two principal reasons. First, "real-life" problems seldom yield to simple
closed-form formulas or solutions. They must generally be solved computationally and
it is important to know which types of algorithms can be relied upon and which cannot.
Some of the key algorithms of numerical linear algebra, in particular, form the foundation
upon which rests virtually all of modern scientific and engineering computation. A second
motivation for a computational emphasis is that it provides many of the essential tools for
what I call "qualitative mathematics." For example, in an elementary linear algebra course,
a set of vectors is either linearly independent or it is not. This is an absolutely fundamental
concept. But in most engineering or scientific contexts we want to know more than that.
If a set of vectors is linearly independent, how "nearly dependent" are the vectors? If they
are linearly dependent, are there "best" linearly independent subsets? These turn out to
be much more difficult problems and frequently involve research-level questions when set
in the context of the finite-precision, finite-range floating-point arithmetic environment of
most modern computing platforms.
Some of the applications of matrix analysis mentioned briefly in this book derive
from the modern state-space approach to dynamical systems. State-space methods are
now standard in much of modern engineering where, for example, control systems with
large numbers of interacting inputs, outputs, and states often give rise to models of very
high order that must be analyzed, simulated, and evaluated. The "language" in which such
models are conveniently described involves vectors and matrices. It is thus crucial to acquire
a working knowledge of the vocabulary and grammar of this language. The tools of matrix
analysis are also applied on a daily basis to problems in biology, chemistry, econometrics,
physics, statistics, and a wide variety of other fields, and thus the text can serve a rather
diverse audience. Mastery of the material in this text should enable the student to read and
understand the modern language of matrices used throughout mathematics, science, and
engineering.
While prerequisites for this text are modest, and while most material is developed from
basic ideas in the book, the student does require a certain amount of what is conventionally
referred to as "mathematical maturity." Proofs are given for many theorems. When they are
not given explicitly, they are either obvious or easily found in the literature. This is ideal
material from which to learn a bit about mathematical proofs and the mathematical maturity
and insight gained thereby. It is my firm conviction that such maturity is neither encouraged
nor nurtured by relegating the mathematical aspects of applications (for example, linear
algebra for elementary state-space theory) to an appendix or introducing it "on-the-fly" when
necessary. Rather, one must lay a firm foundation upon which subsequent applications and
perspectives can be built in a logical, consistent, and coherent fashion.
I have taught this material for many years, many times at UCSB and twice at UC
Davis, and the course has proven to be remarkably successful at enabling students from
disparate backgrounds to acquire a quite acceptable level of mathematical maturity and
rigor for subsequent graduate studies in a variety of disciplines. Indeed, many students who
completed the course, especially the first few times it was offered, remarked afterward that
if only they had had this course before they took linear systems, or signal processing,
or estimation theory, etc., they would have been able to concentrate on the new ideas
they wanted to learn, rather than having to spend time making up for deficiencies in their
background in matrices and linear algebra. My fellow instructors, too, realized that by
requiring this course as a prerequisite, they no longer had to provide as much time for
"review" and could focus instead on the subject at hand. The concept seems to work.
AJL, June 2004
Chapter 1
Introduction and Review

1.1 Some Notation and Terminology

We begin with a brief introduction to some standard notation and terminology to be used
throughout the text. This is followed by a review of some basic notions in matrix analysis
and linear algebra.

The following sets appear frequently throughout subsequent chapters:

1. R^n = the set of n-tuples of real numbers represented as column vectors. Thus, x ∈ R^n
means

    x = [x_1, ..., x_n]^T,

where x_i ∈ R for i ∈ n. Henceforth, the notation n denotes the set {1, ..., n}.

Note: Vectors are always column vectors. A row vector is denoted by y^T, where
y ∈ R^n and the superscript T is the transpose operation. That a vector is always a
column vector rather than a row vector is entirely arbitrary, but this convention makes
it easy to recognize immediately throughout the text that, e.g., x^T y is a scalar while
x y^T is an n × n matrix.

2. C^n = the set of n-tuples of complex numbers represented as column vectors.

3. R^{m×n} = the set of real (or real-valued) m × n matrices.

4. R_r^{m×n} = the set of real m × n matrices of rank r. Thus, R_n^{n×n} denotes the set of real
nonsingular n × n matrices.

5. C^{m×n} = the set of complex (or complex-valued) m × n matrices.

6. C_r^{m×n} = the set of complex m × n matrices of rank r.
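The column-vector convention above pays off immediately in code as well: storing x and y as n × 1 arrays makes x^T y a 1 × 1 (scalar) product and x y^T an n × n matrix, so the shapes alone reveal which product is meant. A minimal NumPy sketch (the data values are our own illustration, not from the text):

```python
import numpy as np

# Column vectors in R^3, stored as 3 x 1 arrays per the text's convention.
x = np.array([[1.0], [2.0], [3.0]])
y = np.array([[4.0], [5.0], [6.0]])

inner = x.T @ y   # x^T y: a 1 x 1 array holding the scalar 1*4 + 2*5 + 3*6 = 32
outer = x @ y.T   # x y^T: a 3 x 3 matrix

print(inner.shape, outer.shape)  # (1, 1) (3, 3)
```

Keeping vectors two-dimensional (n × 1 rather than flat) is a deliberate choice here: it makes the transpose meaningful and mirrors the book's notation exactly.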
We now classify some of the more familiar "shaped" matrices. A matrix A ∈ R^{n×n}
(or A ∈ C^{n×n}) is

• diagonal if a_ij = 0 for i ≠ j.
• upper triangular if a_ij = 0 for i > j.
• lower triangular if a_ij = 0 for i < j.
• tridiagonal if a_ij = 0 for |i − j| > 1.
• pentadiagonal if a_ij = 0 for |i − j| > 2.
• upper Hessenberg if a_ij = 0 for i − j > 1.
• lower Hessenberg if a_ij = 0 for j − i > 1.

Each of the above also has a "block" analogue obtained by replacing scalar components in
the respective definitions by block submatrices. For example, if A ∈ R^{n×n}, B ∈ R^{n×m}, and
C ∈ R^{m×m}, then the (m + n) × (m + n) matrix

    [ A  B ]
    [ 0  C ]

is block upper triangular.

The transpose of a matrix A is denoted by A^T and is the matrix whose (i, j)th entry
is the (j, i)th entry of A, that is, (A^T)_ij = a_ji. Note that if A ∈ R^{m×n}, then A^T ∈ R^{n×m}.
If A ∈ C^{m×n}, then its Hermitian transpose (or conjugate transpose) is denoted by A^H (or
sometimes A*) and its (i, j)th entry is (A^H)_ij = ā_ji, where the bar indicates complex
conjugation; i.e., if z = α + jβ (j = i = √−1), then z̄ = α − jβ. A matrix A is symmetric
if A = A^T and Hermitian if A = A^H. We henceforth adopt the convention that, unless
otherwise noted, an equation like A = A^T implies that A is real-valued while a statement
like A = A^H implies that A is complex-valued.

Remark 1.1. While √−1 is most commonly denoted by i in mathematics texts, j is
the more common notation in electrical engineering and system theory. There is some
advantage to being conversant with both notations. The notation j is used throughout the
text but reminders are placed at strategic locations.

Example 1.2.

1. A = [ 5  7 ]
       [ 7  2 ]   is symmetric (and Hermitian).

2. A = [ 5      7 + j ]
       [ 7 + j  2     ]   is complex-valued symmetric but not Hermitian.

3. A = [ 5      7 + j ]
       [ 7 − j  2     ]   is Hermitian (but not symmetric).

Transposes of block matrices can be defined in an obvious way. For example, it is
easy to see that if A_ij are appropriately dimensioned subblocks, then

    [ A_11  A_12 ]^T     [ A_11^T  A_21^T ]
    [ A_21  A_22 ]    =  [ A_12^T  A_22^T ].
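The "shaped" matrix definitions and the symmetric-versus-Hermitian distinction translate directly into entrywise tests. The sketch below (the helper function and test matrices are our own illustration, not from the text) checks the upper Hessenberg condition a_ij = 0 for i − j > 1 and reproduces the distinction drawn in Example 1.2:

```python
import numpy as np

def is_upper_hessenberg(A: np.ndarray) -> bool:
    """True iff a_ij = 0 whenever i - j > 1, i.e., zero below the first subdiagonal."""
    i, j = np.indices(A.shape)
    return bool(np.all(A[i - j > 1] == 0))

# An upper triangular matrix is in particular upper Hessenberg.
T = np.triu(np.arange(1.0, 17.0).reshape(4, 4))

# Cf. Example 1.2: symmetric vs. Hermitian for complex matrices.
S = np.array([[5, 7 + 1j], [7 + 1j, 2]])  # symmetric (S == S^T), not Hermitian
H = np.array([[5, 7 + 1j], [7 - 1j, 2]])  # Hermitian (H == H^H), not symmetric

print(is_upper_hessenberg(T))                                  # True
print(np.array_equal(S, S.T), np.array_equal(S, S.conj().T))   # True False
print(np.array_equal(H, H.conj().T), np.array_equal(H, H.T))   # True False
```

The same index-mask idea extends to every class in the list: swap the condition `i - j > 1` for `i != j` (diagonal), `i > j` (upper triangular), `abs(i - j) > 1` (tridiagonal), and so on.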
1.2 Matrix Arithmetic

It is assumed that the reader is familiar with the fundamental notions of matrix addition,
multiplication of a matrix by a scalar, and multiplication of matrices.

A special case of matrix multiplication occurs when the second matrix is a column
vector x, i.e., the matrix-vector product Ax. A very important way to view this product is
to interpret it as a weighted sum (linear combination) of the columns of A. That is, suppose

A = [a1, ..., an] ∈ R^{m×n} with ai ∈ R^m and x = [x1; ...; xn].

Then

Ax = x1 a1 + ⋯ + xn an ∈ R^m.

The importance of this interpretation cannot be overemphasized. As a numerical example,
take A = [9 8 7; 6 5 4] and x = [3; 2; 1]. Then we can quickly calculate dot products of the rows of A
with the column x to find Ax = [50; 32], but this matrix-vector product can also be computed
via

3·[9; 6] + 2·[8; 5] + 1·[7; 4].

For large arrays of numbers, there can be important computer-architecture-related advantages
to preferring the latter calculation method.
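The two computation orders can be sketched in NumPy (NumPy is an assumption of this note, not something the text uses):

```python
import numpy as np

A = np.array([[9, 8, 7],
              [6, 5, 4]])
x = np.array([3, 2, 1])

# Row-oriented: dot products of the rows of A with x.
by_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column-oriented: weighted sum (linear combination) of the columns of A.
by_cols = sum(x[j] * A[:, j] for j in range(A.shape[1]))

assert np.array_equal(by_rows, np.array([50, 32]))
assert np.array_equal(by_cols, by_rows)
```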
For matrix multiplication, suppose A ∈ R^{m×n} and B = [b1, ..., bp] ∈ R^{n×p} with
bi ∈ R^n. Then the matrix product AB can be thought of as above, applied p times:

AB = [Ab1, ..., Abp].
There is also an alternative, but equivalent, formulation of matrix multiplication that appears
frequently in the text and is presented below as a theorem. Again, its importance cannot be
overemphasized. It is deceptively simple and its full understanding is well rewarded.
Theorem 1.3. Let U = [u1, ..., un] ∈ R^{m×n} with ui ∈ R^m and V = [v1, ..., vn] ∈ R^{p×n}
with vi ∈ R^p. Then

UV^T = Σ_{i=1}^{n} ui vi^T ∈ R^{m×p}.
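Theorem 1.3 expresses UV^T as a sum of rank-one (outer-product) matrices. A small NumPy sketch (NumPy and the random test matrices are assumptions of this note) confirms the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((4, 3))   # U = [u1, u2, u3], ui in R^4
V = rng.standard_normal((5, 3))   # V = [v1, v2, v3], vi in R^5

# Sum of rank-one outer products u_i v_i^T, as in Theorem 1.3.
outer_sum = sum(np.outer(U[:, i], V[:, i]) for i in range(U.shape[1]))

assert np.allclose(outer_sum, U @ V.T)
```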
If matrices C and D are compatible for multiplication, recall that (CD)^T = D^T C^T
(or (CD)^H = D^H C^H). This gives a dual to the matrix-vector result above. Namely, if
C ∈ R^{m×n} has row vectors cj^T ∈ R^{1×n}, and is premultiplied by a row vector y^T ∈ R^{1×m},
then the product can be written as a weighted linear sum of the rows of C as follows:

y^T C = y1 c1^T + ⋯ + ym cm^T ∈ R^{1×n}.

Theorem 1.3 can then also be generalized to its "row dual." The details are left to the reader.
Chapter 1. Introduction and Review
1.3 Inner Products and Orthogonality

For vectors x, y ∈ R^n, the Euclidean inner product (or inner product, for short) of x and
y is given by

(x, y) := x^T y = Σ_{i=1}^{n} xi yi.

Note that the inner product is a scalar.

If x, y ∈ C^n, we define their complex Euclidean inner product (or inner product,
for short) by

(x, y)_c := x^H y = Σ_{i=1}^{n} x̄i yi.

Note that (x, y)_c equals the complex conjugate of (y, x)_c, i.e., the order in which x and
y appear in the complex inner product is important. The more conventional definition of
the complex inner product is (x, y)_c = y^H x = Σ_{i=1}^{n} xi ȳi, but throughout the text we
prefer the symmetry with the real case.

Example 1.4. Let x = [1; j] and y = [1; 2]. Then

(x, y)_c = x^H y = [1, −j] [1; 2] = 1 − 2j,

while

(y, x)_c = y^H x = [1, 2] [1; j] = 1 + 2j,

and we see that, indeed, (x, y)_c is the complex conjugate of (y, x)_c.

Note that x^T x = 0 if and only if x = 0 when x ∈ R^n but that this is not true if x ∈ C^n.
What is true in the complex case is that x^H x = 0 if and only if x = 0. To illustrate, consider
the nonzero vector x above. Then x^T x = 0 but x^H x = 2.

Two nonzero vectors x, y ∈ R^n are said to be orthogonal if their inner product is
zero, i.e., x^T y = 0. Nonzero complex vectors are orthogonal if x^H y = 0. If x and y are
orthogonal and x^T x = 1 and y^T y = 1, then we say that x and y are orthonormal. A
matrix A ∈ R^{n×n} is an orthogonal matrix if A^T A = A A^T = I, where I is the n × n
identity matrix. The notation I_n is sometimes used to denote the identity matrix in R^{n×n}
(or C^{n×n}). Similarly, a matrix A ∈ C^{n×n} is said to be unitary if A^H A = A A^H = I. Clearly
an orthogonal or unitary matrix has orthonormal rows and orthonormal columns. There is
no special name attached to a nonsquare matrix A ∈ R^{m×n} (or ∈ C^{m×n}) with orthonormal
rows or columns.
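Example 1.4 and the contrast between x^T x and x^H x can be reproduced with NumPy (an assumption of this note); note that `np.vdot` conjugates its first argument, matching the text's convention:

```python
import numpy as np

x = np.array([1, 1j])   # the nonzero complex vector of Example 1.4
y = np.array([1, 2])

# Complex Euclidean inner product (x, y)_c = x^H y.
ip_xy = np.vdot(x, y)
ip_yx = np.vdot(y, x)

assert ip_xy == 1 - 2j
assert ip_yx == np.conj(ip_xy)   # order matters in the complex inner product

# x^T x = 0 even though x != 0; only x^H x = 0 forces x = 0.
assert x @ x == 0
assert np.vdot(x, x) == 2
```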
1.4 Determinants

It is assumed that the reader is familiar with the basic theory of determinants. For A ∈ R^{n×n}
(or A ∈ C^{n×n}) we use the notation det A for the determinant of A. We list below some of
the more useful properties of determinants. Note that this is not a minimal set, i.e., several
properties are consequences of one or more of the others.
1. If A has a zero row or if any two rows of A are equal, then det A = 0.
2. If A has a zero column or if any two columns of A are equal, then det A = 0.
3. Interchanging two rows of A changes only the sign of the determinant.
4. Interchanging two columns of A changes only the sign of the determinant.
5. Multiplying a row of A by a scalar a results in a new matrix whose determinant is
a det A.
6. Multiplying a column of A by a scalar a results in a new matrix whose determinant
is a det A.
7. Multiplying a row of A by a scalar and then adding it to another row does not change
the determinant.
8. Multiplying a column of A by a scalar and then adding it to another column does not
change the determinant.
9. det A^T = det A (det A^H = det Ā if A ∈ C^{n×n}, the bar denoting complex conjugation).

10. If A is diagonal, then det A = a11 a22 ⋯ ann, i.e., det A is the product of its diagonal
elements.
11. If A is upper triangular, then det A = a11 a22 ⋯ ann.

12. If A is lower triangular, then det A = a11 a22 ⋯ ann.
13. If A is block diagonal (or block upper triangular or block lower triangular), with
square diagonal blocks A11, A22, ..., Ann (of possibly different sizes), then det A =
det A11 det A22 ⋯ det Ann.
14. If A, B ∈ R^{n×n}, then det(AB) = det A det B.
15. If A ∈ R^{n×n} is nonsingular, then det(A^{−1}) = 1/det A.
16. If A ∈ R^{n×n} is nonsingular and D ∈ R^{m×m}, then det [A B; C D] = det A det(D − C A^{−1} B).

Proof: This follows easily from the block LU factorization

[A B; C D] = [I 0; C A^{−1} I] [A B; 0 D − C A^{−1} B].
17. If A ∈ R^{n×n} and D ∈ R^{m×m} is nonsingular, then det [A B; C D] = det D det(A − B D^{−1} C).

Proof: This follows easily from the block UL factorization

[A B; C D] = [I B D^{−1}; 0 I] [A − B D^{−1} C 0; C D].
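Properties 16 and 17 are easy to check numerically. The following NumPy sketch (NumPy and the random test blocks are assumptions of this note) verifies both Schur-complement formulas on a random block matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 3))
D = rng.standard_normal((2, 2))

M = np.block([[A, B], [C, D]])
lhs = np.linalg.det(M)

# Property 16: det M = det A * det(D - C A^{-1} B)  (A nonsingular).
schur_A = D - C @ np.linalg.inv(A) @ B
assert np.isclose(lhs, np.linalg.det(A) * np.linalg.det(schur_A))

# Property 17: det M = det D * det(A - B D^{-1} C)  (D nonsingular).
schur_D = A - B @ np.linalg.inv(D) @ C
assert np.isclose(lhs, np.linalg.det(D) * np.linalg.det(schur_D))
```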
Remark 1.5. The factorization of a matrix A into the product of a unit lower triangular
matrix L (i.e., lower triangular with all 1's on the diagonal) and an upper triangular matrix
U is called an LU factorization; see, for example, [24]. Another such factorization is UL
where U is unit upper triangular and L is lower triangular. The factorizations used above
are block analogues of these.
Remark 1.6. The matrix D − C A^{−1} B is called the Schur complement of A in [A B; C D].
Similarly, A − B D^{−1} C is the Schur complement of D in [A B; C D].
EXERCISES
1. If A ∈ R^{n×n} and α is a scalar, what is det(αA)? What is det(−A)?
2. If A is orthogonal, what is det A? If A is unitary, what is det A?
3. Let x, y ∈ R^n. Show that det(I − x y^T) = 1 − y^T x.
4. Let U1, U2, ..., Uk ∈ R^{n×n} be orthogonal matrices. Show that the product U =
U1 U2 ⋯ Uk is an orthogonal matrix.
5. Let A ∈ R^{n×n}. The trace of A, denoted Tr A, is defined as the sum of its diagonal
elements, i.e., Tr A = Σ_{i=1}^{n} aii.

(a) Show that the trace is a linear function; i.e., if A, B ∈ R^{n×n} and α, β ∈ R, then
Tr(αA + βB) = α Tr A + β Tr B.

(b) Show that Tr(AB) = Tr(BA), even though in general AB ≠ BA.

(c) Let S ∈ R^{n×n} be skew-symmetric, i.e., S^T = −S. Show that Tr S = 0. Then
either prove the converse or provide a counterexample.
6. A matrix A ∈ R^{n×n} is said to be idempotent if A² = A.

(a) Show that the matrix

A = (1/2) [2 cos²θ  sin 2θ; sin 2θ  2 sin²θ]

is idempotent for all θ.

(b) Suppose A ∈ R^{n×n} is idempotent and A ≠ I. Show that A must be singular.
Chapter 2
Vector Spaces
In this chapter we give a brief review of some of the basic concepts of vector spaces. The
emphasis is on finite-dimensional vector spaces, including spaces formed by special classes
of matrices, but some infinite-dimensional examples are also cited. An excellent reference
for this and the next chapter is [10], where some of the proofs that are not given here may
be found.
2.1 Definitions and Examples
Definition 2.1. A field is a set F together with two operations +, · : F × F → F such that

(A1) α + (β + γ) = (α + β) + γ for all α, β, γ ∈ F.

(A2) there exists an element 0 ∈ F such that α + 0 = α for all α ∈ F.

(A3) for all α ∈ F, there exists an element (−α) ∈ F such that α + (−α) = 0.

(A4) α + β = β + α for all α, β ∈ F.

(M1) α · (β · γ) = (α · β) · γ for all α, β, γ ∈ F.

(M2) there exists an element 1 ∈ F such that α · 1 = α for all α ∈ F.

(M3) for all α ∈ F, α ≠ 0, there exists an element α^{−1} ∈ F such that α · α^{−1} = 1.

(M4) α · β = β · α for all α, β ∈ F.

(D) α · (β + γ) = α · β + α · γ for all α, β, γ ∈ F.

Axioms (A1)-(A3) state that (F, +) is a group and an abelian group if (A4) also holds.
Axioms (M1)-(M4) state that (F \ {0}, ·) is an abelian group.

Generally speaking, when no confusion can arise, the multiplication operator "·" is
not written explicitly.
Example 2.2.

1. R with ordinary addition and multiplication is a field.

2. C with ordinary complex addition and multiplication is a field.

3. Ra[x] = the field of rational functions in the indeterminate x

= { (a0 + a1 x + ⋯ + ap x^p) / (β0 + β1 x + ⋯ + βq x^q) : ai, βi ∈ R; p, q ∈ Z+ },

where Z+ = {0, 1, 2, ...}, is a field.

4. R_r^{m×n} = {m × n matrices of rank r with real coefficients} is clearly not a field since,
for example, (M1) does not hold unless m = n. Moreover, R_n^{n×n} is not a field either
since (M4) does not hold in general (although the other 8 axioms hold).

Definition 2.3. A vector space over a field F is a set V together with two operations
+ : V × V → V and · : F × V → V such that

(V1) (V, +) is an abelian group.

(V2) (α · β) · v = α · (β · v) for all α, β ∈ F and for all v ∈ V.

(V3) (α + β) · v = α · v + β · v for all α, β ∈ F and for all v ∈ V.

(V4) α · (v + w) = α · v + α · w for all α ∈ F and for all v, w ∈ V.

(V5) 1 · v = v for all v ∈ V (1 ∈ F).

A vector space is denoted by (V, F) or, when there is no possibility of confusion as to the
underlying field, simply by V.

Remark 2.4. Note that + and · in Definition 2.3 are different from the + and · in Definition
2.1 in the sense of operating on different objects in different sets. In practice, this causes
no confusion and the · operator is usually not even written explicitly.

Example 2.5.

1. (R^n, R) with addition defined by

[x1; ...; xn] + [y1; ...; yn] = [x1 + y1; ...; xn + yn]

and scalar multiplication defined by

α · [x1; ...; xn] = [αx1; ...; αxn]

is a vector space. Similar definitions hold for (C^n, C).
2. (R^{m×n}, R) is a vector space with addition defined by

A + B = [a11 + β11 ⋯ a1n + β1n; ⋮ ; am1 + βm1 ⋯ amn + βmn]

and scalar multiplication defined by

γA = [γa11 ⋯ γa1n; ⋮ ; γam1 ⋯ γamn].

3. Let (V, F) be an arbitrary vector space and D be an arbitrary set. Let Φ(D, V) be the
set of functions f mapping D to V. Then Φ(D, V) is a vector space with addition
defined by

(f + g)(d) = f(d) + g(d) for all d ∈ D and for all f, g ∈ Φ

and scalar multiplication defined by

(αf)(d) = αf(d) for all α ∈ F, for all d ∈ D, and for all f ∈ Φ.

Special Cases:

(a) D = [t0, t1], (V, F) = (R^n, R), and the functions are piecewise continuous
=: (PC[t0, t1])^n or continuous =: (C[t0, t1])^n.

(b) D = [t0, +∞), (V, F) = (R^n, R), etc.

4. Let A ∈ R^{n×n}. Then {x(t) : ẋ(t) = Ax(t)} is a vector space (of dimension n).

2.2 Subspaces

Definition 2.6. Let (V, F) be a vector space and let W ⊆ V, W ≠ ∅. Then (W, F) is a
subspace of (V, F) if and only if (W, F) is itself a vector space or, equivalently, if and only
if (αw1 + βw2) ∈ W for all α, β ∈ F and for all w1, w2 ∈ W.

Remark 2.7. The latter characterization of a subspace is often the easiest way to check
or prove that something is indeed a subspace (or vector space); i.e., verify that the set in
question is closed under addition and scalar multiplication. Note, too, that since 0 ∈ F, this
implies that the zero vector must be in any subspace.

Notation: When the underlying field is understood, we write W ⊆ V, and the symbol ⊆,
when used with vector spaces, is henceforth understood to mean "is a subspace of." The
less restrictive meaning "is a subset of" is specifically flagged as such.
Example 2.8.

1. Consider (V, F) = (R^{n×n}, R) and let W = {A ∈ R^{n×n} : A is symmetric}. Then
W ⊆ V.

Proof: Suppose A1, A2 are symmetric. Then it is easily shown that αA1 + βA2 is
symmetric for all α, β ∈ R.

2. Let W = {A ∈ R^{n×n} : A is orthogonal}. Then W is not a subspace of R^{n×n}.

3. Consider (V, F) = (R², R) and for each v ∈ R² of the form v = [v1; v2] identify v1 with
the x-coordinate in the plane and v2 with the y-coordinate. For α, β ∈ R, define

W_{α,β} = {v : v = [c; αc + β]; c ∈ R}.

Then W_{α,β} is a subspace of V if and only if β = 0. As an interesting exercise, sketch
W_{2,1}, W_{2,0}, W_{1/2,1}, and W_{1/2,0}. Note, too, that the vertical line through the origin (i.e.,
α = ∞) is also a subspace.

All lines through the origin are subspaces. Shifted subspaces W_{α,β} with β ≠ 0 are
called linear varieties.

Henceforth, we drop the explicit dependence of a vector space on an underlying field.
Thus, V usually denotes a vector space with the underlying field generally being R unless
explicitly stated otherwise.

Definition 2.9. If R and S are vector spaces (or subspaces), then R = S if and only if
R ⊆ S and S ⊆ R.

Note: To prove two vector spaces are equal, one usually proves the two inclusions separately:
An arbitrary r ∈ R is shown to be an element of S and then an arbitrary s ∈ S is shown to
be an element of R.

2.3 Linear Independence

Let X = {v1, v2, ...} be a nonempty collection of vectors vi in some vector space V.

Definition 2.10. X is a linearly dependent set of vectors if and only if there exist k distinct
elements v1, ..., vk ∈ X and scalars α1, ..., αk not all zero such that

α1 v1 + ⋯ + αk vk = 0.

X is a linearly independent set of vectors if and only if for any collection of k distinct
elements v1, ..., vk of X and for any scalars α1, ..., αk,

α1 v1 + ⋯ + αk vk = 0 implies α1 = 0, ..., αk = 0.
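Definition 2.10 can be tested numerically: stacking the vectors as columns of a matrix V, independence is equivalent to V having full column rank (equivalently, V^T V nonsingular, a criterion discussed shortly). A NumPy sketch with illustrative vectors, both assumptions of this note:

```python
import numpy as np

# An independent set: the columns of V have full rank.
v1, v2, v3 = np.array([1.0, 0, 0]), np.array([1.0, 1, 0]), np.array([1.0, 1, 1])
V = np.column_stack([v1, v2, v3])
assert np.linalg.matrix_rank(V) == 3
assert abs(np.linalg.det(V.T @ V)) > 1e-12    # V^T V nonsingular

# A dependent set: w3 = 2*w1 + w2 gives a nontrivial a with W a = 0.
w1, w2 = np.array([1.0, 2, 3]), np.array([0.0, 1, 1])
W = np.column_stack([w1, w2, 2 * w1 + w2])
assert np.linalg.matrix_rank(W) == 2
a = np.array([2.0, 1.0, -1.0])                # 2*w1 + w2 - w3 = 0
assert np.allclose(W @ a, 0)
```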
Example 2.11.

1. Let V = R³. Then {[1; 0; 0], [1; 1; 0], [1; 1; 1]} is a linearly independent set. Why?
However, {v1, v2, v3} = {[1; 1; 1], [1; 2; 3], [−1; 0; 1]} is a linearly dependent set
(since 2v1 − v2 + v3 = 0).

2. Let A ∈ R^{n×n} and B ∈ R^{n×m}. Then consider the rows of e^{tA} B as vectors in C^m[t0, t1]
(recall that e^{tA} denotes the matrix exponential, which is discussed in more detail in
Chapter 11). Independence of these vectors turns out to be equivalent to a concept
called controllability, to be studied further in what follows.

Let vi ∈ R^n, i ∈ k, and consider the matrix V = [v1, ..., vk] ∈ R^{n×k}. The linear
dependence of this set of vectors is equivalent to the existence of a nonzero vector a ∈ R^k
such that Va = 0. An equivalent condition for linear dependence is that the k × k matrix
V^T V is singular. If the set of vectors is independent, and there exists a ∈ R^k such that
Va = 0, then a = 0. An equivalent condition for linear independence is that the matrix
V^T V is nonsingular.

Definition 2.12. Let X = {v1, v2, ...} be a collection of vectors vi ∈ V. Then the span of
X is defined as

Sp(X) = Sp{v1, v2, ...} = {v : v = α1 v1 + ⋯ + αk vk ; αi ∈ F, vi ∈ X, k ∈ N},

where N = {1, 2, ...}.

Example 2.13. Let V = R^n and define

e1 = [1; 0; ...; 0], e2 = [0; 1; 0; ...; 0], ..., en = [0; ...; 0; 1].

Then Sp{e1, e2, ..., en} = R^n.

Definition 2.14. A set of vectors X is a basis for V if and only if

1. X is a linearly independent set (of basis vectors), and

2. Sp(X) = V.
Example 2.15. {e1, ..., en} is a basis for R^n (sometimes called the natural basis).

Now let b1, ..., bn be a basis (with a specific order associated with the basis vectors)
for V. Then for all v ∈ V there exists a unique n-tuple {ξ1, ..., ξn} such that

v = ξ1 b1 + ⋯ + ξn bn = Bx,

where B = [b1, ..., bn] and x = [ξ1; ...; ξn].

Definition 2.16. The scalars {ξi} are called the components (or sometimes the coordinates)
of v with respect to the basis {b1, ..., bn} and are unique. We say that the vector x of
components represents the vector v with respect to the basis B.

Example 2.17. In R^n,

[v1; ...; vn] = v1 e1 + v2 e2 + ⋯ + vn en.

We can also determine components of v with respect to another basis. For example, while

[1; 2] = 1 · e1 + 2 · e2,

with respect to the basis {[−1; 2], [1; −1]} we have

[1; 2] = 3 · [−1; 2] + 4 · [1; −1].

To see this, write

[1; 2] = x1 · [−1; 2] + x2 · [1; −1] = [−1 1; 2 −1] [x1; x2].

Then

[x1; x2] = [−1 1; 2 −1]^{−1} [1; 2] = [3; 4].

Theorem 2.18. The number of elements in a basis of a vector space is independent of the
particular basis considered.

Definition 2.19. If a basis X for a vector space V (≠ 0) has n elements, V is said to
be n-dimensional or have dimension n and we write dim(V) = n or dim V = n. For
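Finding the components of v with respect to a basis {b1, b2} amounts to solving Bx = v. A NumPy sketch (NumPy is an assumption of this note, and the basis vectors are illustrative):

```python
import numpy as np

# Components of v with respect to the basis {b1, b2}: solve B x = v.
b1, b2 = np.array([-1.0, 2.0]), np.array([1.0, -1.0])
B = np.column_stack([b1, b2])
v = np.array([1.0, 2.0])

x = np.linalg.solve(B, v)     # x holds the components xi_1, xi_2
assert np.allclose(x, [3.0, 4.0])
assert np.allclose(x[0] * b1 + x[1] * b2, v)   # v = xi_1 b1 + xi_2 b2
```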
consistency, and because the 0 vector is in any vector space, we define dim(0) = 0. A
vector space V is finite-dimensional if there exists a basis X with n < +∞ elements;
otherwise, V is infinite-dimensional.

Thus, Theorem 2.18 says that dim(V) = the number of elements in a basis.

Example 2.20.

1. dim(R^n) = n.

2. dim(R^{m×n}) = mn.

Note: Check that a basis for R^{m×n} is given by the mn matrices Eij, i ∈ m, j ∈ n,
where Eij is a matrix all of whose elements are 0 except for a 1 in the (i, j)th location.
The collection of Eij matrices can be called the "natural basis matrices."

3. dim(C[t0, t1]) = +∞.

4. dim{A ∈ R^{n×n} : A = A^T} = (1/2) n(n + 1).

(To see why, determine (1/2) n(n + 1) symmetric basis matrices.)

5. dim{A ∈ R^{n×n} : A is upper (lower) triangular} = (1/2) n(n + 1).

2.4 Sums and Intersections of Subspaces

Definition 2.21. Let (V, F) be a vector space and let R, S ⊆ V. The sum and intersection
of R and S are defined respectively by:

1. R + S = {r + s : r ∈ R, s ∈ S}.

2. R ∩ S = {v : v ∈ R and v ∈ S}.

Theorem 2.22.

1. R + S ⊆ V (in general, R1 + ⋯ + Rk =: Σ_{i=1}^{k} Ri ⊆ V, for finite k).

2. R ∩ S ⊆ V (in general, ∩_{α∈A} Rα ⊆ V for an arbitrary index set A).

Remark 2.23. The union of two subspaces, R ∪ S, is not necessarily a subspace.

Definition 2.24. T = R ⊕ S is the direct sum of R and S if

1. R ∩ S = 0, and

2. R + S = T (in general, Ri ∩ (Σ_{j≠i} Rj) = 0 and Σ Ri = T).

The subspaces R and S are said to be complements of each other in T.
2.4. Sums and Intersections of Subspaces 13
consistency, and because the 0 vector is in any vector space, we define dim(0) = 0. A
vector space V is finite-dimensional if there exists a basis X with n < +∞ elements;
otherwise, V is infinite-dimensional.
Thus, Theorem 2.18 says that dim(V) = the number of elements in a basis.
Example 2.20.
1. dim(R^n) = n.

2. dim(R^{m×n}) = mn.

Note: Check that a basis for R^{m×n} is given by the mn matrices Eij, i ∈ m, j ∈ n,
where Eij is a matrix all of whose elements are 0 except for a 1 in the (i, j)th location.
The collection of Eij matrices can be called the "natural basis matrices."

3. dim(C[t0, t1]) = +∞.

4. dim{A ∈ R^{n×n} : A = A^T} = n(n + 1)/2.
(To see why, determine n(n + 1)/2 symmetric basis matrices.)

5. dim{A ∈ R^{n×n} : A is upper (lower) triangular} = n(n + 1)/2.
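The dimension count in item 4 can be verified numerically. The following Python/NumPy sketch (not from the text) builds the symmetric "natural basis": Eii on the diagonal and Eij + Eji off it, then checks that there are n(n + 1)/2 of them and that they are linearly independent:

```python
import numpy as np

n = 4
# Candidate basis for the symmetric n x n matrices:
# E_ii on the diagonal, E_ij + E_ji for i < j off the diagonal.
basis = []
for i in range(n):
    for j in range(i, n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        E[j, i] = 1.0
        basis.append(E.flatten())

count = len(basis)                               # n(n+1)/2 matrices
rank = np.linalg.matrix_rank(np.array(basis))    # all independent
```

Since the matrices have pairwise disjoint supports, the rank equals the count, confirming the dimension n(n + 1)/2 (here 10 for n = 4).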
2.4 Sums and Intersections of Subspaces
Definition 2.21. Let (V, F) be a vector space and let R, S ⊆ V. The sum and intersection
of R and S are defined respectively by:
1. R + S = {r + s : r ∈ R, s ∈ S}.

2. R ∩ S = {v : v ∈ R and v ∈ S}.
Theorem 2.22.
1. R + S ⊆ V (in general, R1 + ... + Rk =: Σ_{i=1}^{k} Ri ⊆ V, for finite k).

2. R ∩ S ⊆ V (in general, ∩_{α∈A} Rα ⊆ V for an arbitrary index set A).
Remark 2.23. The union of two subspaces, R ∪ S, is not necessarily a subspace.
Definition 2.24. T = R ⊕ S is the direct sum of R and S if

1. R ∩ S = 0, and

2. R + S = T (in general, Ri ∩ (Σ_{j≠i} Rj) = 0 and Σ_i Ri = T).
The subspaces R and S are said to be complements of each other in T.
14 Chapter 2. Vector Spaces
Remark 2.25. The complement of R (or S) is not unique. For example, consider V = R^2
and let R be any line through the origin. Then any other distinct line through the origin is
a complement of R. Among all the complements there is a unique one orthogonal to R.
We discuss more about orthogonal complements elsewhere in the text.
Theorem 2.26. Suppose T = R ⊕ S. Then

1. every t ∈ T can be written uniquely in the form t = r + s with r ∈ R and s ∈ S.
2. dim(T) = dim(R) + dim(S).
Proof: To prove the first part, suppose an arbitrary vector t ∈ T can be written in two ways
as t = r1 + s1 = r2 + s2, where r1, r2 ∈ R and s1, s2 ∈ S. Then r1 − r2 = s2 − s1. But
r1 − r2 ∈ R and s2 − s1 ∈ S. Since R ∩ S = 0, we must have r1 = r2 and s1 = s2, from
which uniqueness follows.
The statement of the second part is a special case of the next theorem. □
Theorem 2.27. For arbitrary subspaces R, S of a vector space V,
dim(R + S) = dim(R) + dim(S) − dim(R ∩ S).
Example 2.28. Let U be the subspace of upper triangular matrices in R^{n×n} and let L be the
subspace of lower triangular matrices in R^{n×n}. Then it may be checked that U + L = R^{n×n}
while U ∩ L is the set of diagonal matrices in R^{n×n}. Using the fact that dim{diagonal
matrices} = n, together with Examples 2.20.2 and 2.20.5, one can easily verify the validity
of the formula given in Theorem 2.27.
Example 2.29. Let (V, F) = (R^{n×n}, R), let R be the set of skew-symmetric matrices in
R^{n×n}, and let S be the set of symmetric matrices in R^{n×n}. Then V = R ⊕ S.
Proof: This follows easily from the fact that any A ∈ R^{n×n} can be written in the form

A = (1/2)(A + A^T) + (1/2)(A − A^T).

The first matrix on the right-hand side above is in S while the second is in R.
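The symmetric/skew-symmetric splitting can be illustrated with a short Python/NumPy sketch (not from the text; the 4-by-4 random matrix is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))      # arbitrary square matrix

S_part = 0.5 * (A + A.T)             # symmetric piece, lies in S
R_part = 0.5 * (A - A.T)             # skew-symmetric piece, lies in R
```

The two pieces are symmetric and skew-symmetric, respectively, and sum back to A, in line with V = R ⊕ S.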
EXERCISES
1. Suppose {v1, ..., vk} is a linearly dependent set. Then show that one of the vectors
must be a linear combination of the others.

2. Let x1, x2, ..., xk ∈ R^n be nonzero mutually orthogonal vectors. Show that {x1, ...,
xk} must be a linearly independent set.
3. Let v1, ..., vn be orthonormal vectors in R^n. Show that Av1, ..., Avn are also
orthonormal if and only if A ∈ R^{n×n} is orthogonal.
4. Consider the vectors v1 = [2 1]^T and v2 = [3 1]^T. Prove that v1 and v2 form a basis
for R^2. Find the components of the vector v = [4 1]^T with respect to this basis.
Exercises 15
5. Let P denote the set of polynomials of degree less than or equal to two of the form
p0 + p1x + p2x^2, where p0, p1, p2 ∈ R. Show that P is a vector space over R. Show
that the polynomials 1, x, and 2x^2 − 1 are a basis for P. Find the components of the
polynomial 2 + 3x + 4x^2 with respect to this basis.
6. Prove Theorem 2.22 (for the case of two subspaces R and S only).
7. Let Pn denote the vector space of polynomials of degree less than or equal to n, and of
the form p(x) = p0 + p1x + ... + pnx^n, where the coefficients pi are all real. Let PE
denote the subspace of all even polynomials in Pn, i.e., those that satisfy the property
p(−x) = p(x). Similarly, let PO denote the subspace of all odd polynomials, i.e.,
those satisfying p(−x) = −p(x). Show that Pn = PE ⊕ PO.
8. Repeat Example 2.28 using instead the two subspaces T of tridiagonal matrices and
U of upper triangular matrices.
Chapter 3
Linear Transformations
3.1 Definition and Examples
We begin with the basic definition of a linear transformation (or linear map, linear function,
or linear operator) between two vector spaces.
Definition 3.1. Let (V, F) and (W, F) be vector spaces. Then L : V → W is a linear
transformation if and only if

L(αv1 + βv2) = αLv1 + βLv2 for all α, β ∈ F and for all v1, v2 ∈ V.
The vector space V is called the domain of the transformation L while W, the space into
which it maps, is called the codomain.
Example 3.2.
1. Let F = R and take V = W = PC[t0, +∞).
Define L : PC[t0, +∞) → PC[t0, +∞) by
v(t) ↦ w(t) = (Lv)(t) = ∫_{t0}^{t} e^{−(t−τ)} v(τ) dτ.
2. Let F = R and take V = W = R^{m×n}. Fix M ∈ R^{m×m}.
Define L : R^{m×n} → R^{m×n} by

X ↦ Y = LX = MX.
3. Let F = R and take V = P^n = {p(x) = a0 + a1x + ... + anx^n : ai ∈ R} and
W = P^{n−1}.
Define L : V → W by Lp = p′, where ′ denotes differentiation with respect to x.
17
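The defining property of Definition 3.1 is easy to test numerically for Example 3.2.2. The following Python/NumPy sketch (not from the text; the sizes, matrices, and scalars are invented for illustration) checks linearity of the map X ↦ MX:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))          # fixed M in R^{3x3}

def L(X):
    """The map L(X) = MX on 3 x 2 matrices."""
    return M @ X

X1 = rng.standard_normal((3, 2))
X2 = rng.standard_normal((3, 2))
a, b = 2.0, -0.5
lhs = L(a * X1 + b * X2)                 # L(a X1 + b X2)
rhs = a * L(X1) + b * L(X2)              # a L(X1) + b L(X2)
```

The two sides agree, as matrix multiplication distributes over the linear combination.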
18 Chapter 3. Linear Transformations
3.2 Matrix Representation of Linear Transformations
Linear transformations between vector spaces with specific bases can be represented
conveniently in matrix form. Specifically, suppose L : (V, F) → (W, F) is linear and further
suppose that {vi, i ∈ n} and {wj, j ∈ m} are bases for V and W, respectively. Then the
ith column of A = Mat L (the matrix representation of L with respect to the given bases
for V and W) is the representation of Lvi with respect to {wj, j ∈ m}. In other words,
A = [a11 ... a1n; ... ; am1 ... amn] ∈ R^{m×n}

represents L since

Lvi = a1i w1 + ... + ami wm = W ai,

where W = [w1, ..., wm] and

ai = [a1i ... ami]^T

is the ith column of A. Note that A = Mat L depends on the particular bases for V and W.
This could be reflected by subscripts, say, in the notation, but this is usually not done.
The action of L on an arbitrary vector v ∈ V is uniquely determined (by linearity)
by its action on a basis. Thus, if v = ξ1v1 + ... + ξnvn = Vx (where v, and hence x, is
arbitrary), then

Lv = LVx = ξ1Lv1 + ... + ξnLvn = ξ1Wa1 + ... + ξnWan = WAx.
Thus, LV = WA since x was arbitrary.
When V = R^n, W = R^m and {vi, i ∈ n}, {wj, j ∈ m} are the usual (natural) bases,
the equation LV = WA becomes simply L = A. We thus commonly identify A as a linear
transformation with its matrix representation, i.e., A : R^n → R^m, x ↦ Ax.
Thinking of A both as a matrix and as a linear transformation from R^n to R^m usually causes no
confusion. Change of basis then corresponds naturally to appropriate matrix multiplication.
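As a concrete instance of Mat L (not from the text), the differentiation operator of Example 3.2.3 has an explicit matrix with respect to the monomial bases. The Python/NumPy sketch below assumes the bases {1, x, ..., x^n} for P^n and {1, x, ..., x^{n−1}} for P^{n−1}, with coefficient vectors listing the constant term first:

```python
import numpy as np

n = 3
# Matrix of d/dx from P^3 to P^2: column i is the coefficient vector of
# d/dx x^i = i x^(i-1) in the basis {1, x, x^2}.
A = np.zeros((n, n + 1))
for i in range(1, n + 1):
    A[i - 1, i] = i

p = np.array([2.0, 3.0, 4.0, 5.0])   # p(x) = 2 + 3x + 4x^2 + 5x^3
dp = A @ p                           # coefficients of p'(x) = 3 + 8x + 15x^2
```

Multiplying the coefficient vector of p by A performs differentiation, exactly as the representation Lvi = W ai dictates column by column.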
3.3. Composition of Transformations 19
3.3 Composition of Transformations
Consider three vector spaces U, V, and W and transformations B from U to V and A from
V to W. Then we can define a new transformation C as follows:

U --B--> V --A--> W,    C = AB.

The above diagram illustrates the composition of transformations C = AB. Note that in
most texts, the arrows above are reversed as follows:

W <--A-- V <--B-- U,    C = AB.
However, it might be useful to prefer the former since the transformations A and B appear
in the same order in both the diagram and the equation. If dim U = p, dim V = n,
and dim W = m, and if we associate matrices with the transformations in the usual way,
then composition of transformations corresponds to standard matrix multiplication. That is,
we have C = AB ∈ R^{m×p}. The above is sometimes expressed componentwise by the
formula

cij = Σ_{k=1}^{n} aik bkj.

Two Special Cases:
Inner Product: Let x, y ∈ R^n. Then their inner product is the scalar

x^T y = Σ_{i=1}^{n} xi yi.
Outer Product: Let x ∈ R^m, y ∈ R^n. Then their outer product is the m × n
matrix

xy^T.

Note that any rank-one matrix A ∈ R^{m×n} can be written in the form A = xy^T
above (or xy^H if A ∈ C^{m×n}). A rank-one symmetric matrix can be written in
the form xx^T (or xx^H).
20 Chapter 3. Linear Transformations
3.4 Structure of Linear Transformations
Let A : V → W be a linear transformation.
Definition 3.3. The range of A, denoted R(A), is the set {w ∈ W : w = Av for some v ∈ V}.
Equivalently, R(A) = {Av : v ∈ V}. The range of A is also known as the image of A and
denoted Im(A).
The nullspace of A, denoted N(A), is the set {v ∈ V : Av = 0}. The nullspace of
A is also known as the kernel of A and denoted Ker(A).
Theorem 3.4. Let A : V → W be a linear transformation. Then

1. R(A) ⊆ W.

2. N(A) ⊆ V.
Note that N(A) and R(A) are, in general, subspaces of different spaces.
Theorem 3.5. Let A ∈ R^{m×n}. If A is written in terms of its columns as A = [a1, ..., an],
then

R(A) = Sp{a1, ..., an}.
Proof: The proof of this theorem is easy, essentially following immediately from the
definition. □
Remark 3.6. Note that in Theorem 3.5 and throughout the text, the same symbol (A) is
used to denote both a linear transformation and its matrix representation with respect to the
usual (natural) bases. See also the last paragraph of Section 3.2.
Definition 3.7. Let {v1, ..., vk} be a set of nonzero vectors vi ∈ R^n. The set is said to
be orthogonal if vi^T vj = 0 for i ≠ j and orthonormal if vi^T vj = δij, where δij is the
Kronecker delta defined by

δij = 1 if i = j, and δij = 0 if i ≠ j.
Example 3.8.
1. { [·], [·] } is an orthogonal set.

2. { [·], [·] } is an orthonormal set.

3. If {v1, ..., vk} with vi ∈ R^n is an orthogonal set, then {v1/√(v1^T v1), ..., vk/√(vk^T vk)}
is an orthonormal set.
3.4. Structure of Linear Transformations 21
Definition 3.9. Let S ⊆ R^n. Then the orthogonal complement of S is defined as the set

S^⊥ = {v ∈ R^n : v^T s = 0 for all s ∈ S}.
Example 3.10. Let
Then it can be shown that
Working from the definition, the computation involved is simply to find all nontrivial (i.e.,
nonzero) solutions of the system of equations
3x1 + 5x2 + 7x3 = 0,
4x1 + x2 + x3 = 0.
Note that there is nothing special about the two vectors in the basis defining S being or
thogonal. Any set of vectors will do, including dependent spanning vectors (which would,
of course, then give rise to redundant equations).
Theorem 3.11. Let R, S ⊆ R^n. Then

1. S^⊥ is a subspace of R^n.

2. S ⊕ S^⊥ = R^n.

3. (S^⊥)^⊥ = S.

4. R ⊆ S if and only if S^⊥ ⊆ R^⊥.

5. (R + S)^⊥ = R^⊥ ∩ S^⊥.

6. (R ∩ S)^⊥ = R^⊥ + S^⊥.
Proof: We prove and discuss only item 2 here. The proofs of the other results are left as
exercises. Let {v1, ..., vk} be an orthonormal basis for S and let x ∈ R^n be an arbitrary
vector. Set

x1 = Σ_{i=1}^{k} (x^T vi) vi,
x2 = x − x1.
22 Chapter 3. Linear Transformations
Then x1 ∈ S and, since

x2^T vj = x^T vj − x1^T vj = x^T vj − x^T vj = 0,
we see that x2 is orthogonal to v1, ..., vk and hence to any linear combination of these
vectors. In other words, x2 is orthogonal to any vector in S. We have thus shown that
S + S^⊥ = R^n. We also have that S ∩ S^⊥ = 0 since the only vector s ∈ S orthogonal to
everything in S (i.e., including itself) is 0.
It is also easy to see directly that, when we have such direct sum decompositions, we
can write vectors in a unique way with respect to the corresponding subspaces. Suppose,
for example, that x = x1 + x2 = x1′ + x2′, where x1, x1′ ∈ S and x2, x2′ ∈ S^⊥. Then
(x1′ − x1)^T (x2′ − x2) = 0 by definition of S^⊥. But then (x1′ − x1)^T (x1′ − x1) = 0 since
x2′ − x2 = −(x1′ − x1) (which follows by rearranging the equation x1 + x2 = x1′ + x2′). Thus,
x1 = x1′ and x2 = x2′. □
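The projection construction used in the proof (x1 as the sum of the components of x along an orthonormal basis of S, x2 as the remainder) can be carried out numerically. This Python/NumPy sketch is not from the text; the subspace and the vector x are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
# Columns of V: an orthonormal basis {v1, v2} for a subspace S of R^4,
# obtained from a QR factorization of a random 4 x 2 matrix.
V, _ = np.linalg.qr(rng.standard_normal((4, 2)))

x = np.array([1.0, 2.0, 3.0, 4.0])
x1 = V @ (V.T @ x)    # x1 = sum_i (x^T v_i) v_i, lies in S
x2 = x - x1           # the remainder, lies in S-perp
```

By construction x = x1 + x2 with x2 orthogonal to every basis vector of S, which is exactly the decomposition S ⊕ S^⊥ = R^n.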
Theorem 3.12. Let A : R^n → R^m. Then

1. N(A)^⊥ = R(A^T). (Note: This holds only for finite-dimensional vector spaces.)

2. R(A)^⊥ = N(A^T). (Note: This also holds for infinite-dimensional vector spaces.)
Proof: To prove the first part, take an arbitrary x ∈ N(A). Then Ax = 0 and this is
equivalent to y^T Ax = 0 for all y. But y^T Ax = (A^T y)^T x. Thus, Ax = 0 if and only if x
is orthogonal to all vectors of the form A^T y, i.e., x ∈ R(A^T)^⊥. Since x was arbitrary, we
have established that N(A)^⊥ = R(A^T).
The proof of the second part is similar and is left as an exercise. □
Definition 3.13. Let A : R^n → R^m. Then {v ∈ R^n : Av = 0} is sometimes called the
right nullspace of A. Similarly, {w ∈ R^m : w^T A = 0} is called the left nullspace of A.
Clearly, the right nullspace is N(A) while the left nullspace is N(A^T).
Theorem 3.12 and part 2 of Theorem 3.11 can be combined to give two very
fundamental and useful decompositions of vectors in the domain and codomain of a linear
transformation A. See also Theorem 2.26.
Theorem 3.14 (Decomposition Theorem). Let A : R^n → R^m. Then

1. every vector v in the domain space R^n can be written in a unique way as v = x + y,
where x ∈ N(A) and y ∈ N(A)^⊥ = R(A^T) (i.e., R^n = N(A) ⊕ R(A^T)).

2. every vector w in the codomain space R^m can be written in a unique way as w = x + y,
where x ∈ R(A) and y ∈ R(A)^⊥ = N(A^T) (i.e., R^m = R(A) ⊕ N(A^T)).
This key theorem becomes very easy to remember by carefully studying and understanding
Figure 3.1 in the next section.
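The domain-space decomposition of Theorem 3.14 can be computed directly with the Moore-Penrose pseudoinverse, since pinv(A) A is the orthogonal projector onto R(A^T). This Python/NumPy sketch is not from the text; the rank-one matrix A and the vector v are invented for illustration:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])    # rank one, so N(A) has dimension 2

v = np.array([3.0, -1.0, 2.0])
y = np.linalg.pinv(A) @ (A @ v)    # projection of v onto R(A^T)
x = v - y                          # the remainder, lies in N(A)
```

Then v = x + y with Ax = 0 and x ⊥ y, the unique splitting R^3 = N(A) ⊕ R(A^T).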
3.5 Four Fundamental Subspaces
Consider a general matrix A ∈ R^{m×n}. When thought of as a linear transformation from R^n
to R^m, many properties of A can be developed in terms of the four fundamental subspaces
3.5. Four Fundamental Subspaces 23
Figure 3.1. Four fundamental subspaces.
R(A), R(A)^⊥, N(A), and N(A)^⊥. Figure 3.1 makes many key properties seem almost
obvious and we return to this figure frequently both in the context of linear transformations
and in illustrating concepts such as controllability and observability.
Definition 3.15. Let V and W be vector spaces and let A : V → W be a linear
transformation.

1. A is onto (also called epic or surjective) if R(A) = W.

2. A is one-to-one or 1-1 (also called monic or injective) if N(A) = 0. Two equivalent
characterizations of A being 1-1 that are often easier to verify in practice are the
following:

(a) Av1 = Av2 ⟹ v1 = v2.

(b) v1 ≠ v2 ⟹ Av1 ≠ Av2.

Definition 3.16. Let A : R^n → R^m. Then rank(A) = dim R(A). This is sometimes called
the column rank of A (maximum number of independent columns). The row rank of A is
24 Chapter 3. Linear Transformations
dim R(A^T) (maximum number of independent rows). The dual notion to rank is the nullity
of A, sometimes denoted nullity(A) or corank(A), and is defined as dim N(A).
Theorem 3.17. Let A : R^n → R^m. Then dim R(A) = dim N(A)^⊥. (Note: Since
N(A)^⊥ = R(A^T), this theorem is sometimes colloquially stated "row rank of A = column
rank of A.")
Proof: Define a linear transformation T : N(A)^⊥ → R(A) by

Tv = Av for all v ∈ N(A)^⊥.
Clearly T is 1-1 (since N(T) = 0). To see that T is also onto, take any w ∈ R(A). Then
by definition there is a vector x ∈ R^n such that Ax = w. Write x = x1 + x2, where
x1 ∈ N(A)^⊥ and x2 ∈ N(A). Then Ax1 = w = Tx1 since x1 ∈ N(A)^⊥. The last equality
shows that T is onto. We thus have that dim R(A) = dim N(A)^⊥ since it is easily shown
that if {v1, ..., vr} is a basis for N(A)^⊥, then {Tv1, ..., Tvr} is a basis for R(A). Finally, if
we apply this and several previous results, the following string of equalities follows easily:
"column rank of A" = rank(A) = dim R(A) = dim N(A)^⊥ = dim R(A^T) = rank(A^T) =
"row rank of A." □
The following corollary is immediate. Like the theorem, it is a statement about equality
of dimensions; the subspaces themselves are not necessarily in the same vector space.
Corollary 3.18. Let A : R^n → R^m. Then dim N(A) + dim R(A) = n, where n is the
dimension of the domain of A.
Proof: From Theorems 3.11 and 3.17 we see immediately that
n = dim N(A) + dim N(A)^⊥
= dim N(A) + dim R(A). □
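Corollary 3.18 (rank plus nullity equals the domain dimension) is easy to witness numerically. This Python/NumPy sketch is not from the text; the 3-by-4 matrix, whose third row is the sum of the first two, is invented for illustration:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0,  1.0],
              [0.0, 1.0, 1.0, -1.0],
              [1.0, 1.0, 3.0,  0.0]])   # third row = first + second

rank = np.linalg.matrix_rank(A)          # dim R(A) = 2
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[rank:]                   # rows of Vt past the rank span N(A)
nullity = null_basis.shape[0]            # dim N(A) = 2
```

Here rank + nullity = 2 + 2 = 4 = n, and A annihilates each computed null-space direction.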
For completeness, we include here a few miscellaneous results about ranks of sums
and products of matrices.
Theorem 3.19. Let A, B ∈ R^{n×n}. Then

1. 0 ≤ rank(A + B) ≤ rank(A) + rank(B).

2. rank(A) + rank(B) − n ≤ rank(AB) ≤ min{rank(A), rank(B)}.

3. nullity(B) ≤ nullity(AB) ≤ nullity(A) + nullity(B).

4. If B is nonsingular, rank(AB) = rank(BA) = rank(A) and N(BA) = N(A).
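Parts 2 and 4 of Theorem 3.19 can be checked on small explicit matrices. This Python/NumPy sketch is not from the text; the diagonal rank-3 matrix A and the nonsingular matrix B are invented for illustration:

```python
import numpy as np

A = np.diag([1.0, 2.0, 3.0, 0.0])             # rank(A) = 3
B = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 2.0]])          # nonsingular (det = 2)

rA = np.linalg.matrix_rank(A)
rB = np.linalg.matrix_rank(B)
rAB = np.linalg.matrix_rank(A @ B)
rBA = np.linalg.matrix_rank(B @ A)
```

The Sylvester bounds of part 2 hold (3 + 4 − 4 ≤ 3 ≤ min{3, 4}), and multiplying by the nonsingular B on either side leaves the rank of A unchanged, as part 4 asserts.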
Part 4 of Theorem 3.19 suggests looking at the general problem of the four fundamental
subspaces of matrix products. The basic results are contained in the following easily proved
theorem.
3.5. Four Fundamental Subspaces 25
Theorem 3.20. Let A ∈ R^{m×n}, B ∈ R^{n×p}. Then

1. R(AB) ⊆ R(A).

2. N(AB) ⊇ N(B).

3. R((AB)^T) ⊆ R(B^T).

4. N((AB)^T) ⊇ N(A^T).
The next theorem is closely related to Theorem 3.20 and is also easily proved. It
is extremely useful in text that follows, especially when dealing with pseudoinverses and
linear least squares problems.
Theorem 3.21. Let A ∈ R^{m×n}. Then

1. R(A) = R(AA^T).

2. R(A^T) = R(A^T A).

3. N(A) = N(A^T A).

4. N(A^T) = N(AA^T).
We now characterize 1-1 and onto transformations and provide characterizations in
terms of rank and invertibility.
Theorem 3.22. Let A : R^n → R^m. Then

1. A is onto if and only if rank(A) = m (A has linearly independent rows or is said to
have full row rank; equivalently, AA^T is nonsingular).

2. A is 1-1 if and only if rank(A) = n (A has linearly independent columns or is said
to have full column rank; equivalently, A^T A is nonsingular).
Proof: Proof of part 1: If A is onto, dim R(A) = m = rank(A). Conversely, let y ∈ R^m
be arbitrary. Let x = A^T (AA^T)^{−1} y ∈ R^n. Then y = Ax, i.e., y ∈ R(A), so A is onto.
Proof of part 2: If A is 1-1, then N(A) = 0, which implies that dim N(A)^⊥ = n =
dim R(A^T), and hence dim R(A) = n by Theorem 3.17. Conversely, suppose Ax1 = Ax2.
Then A^T Ax1 = A^T Ax2, which implies x1 = x2 since A^T A is invertible. Thus, A is
1-1. □
Definition 3.23. A : V —» W is invertible (or bijective) if and only if it is 11 and onto.
Note that if A is invertible, then dim V — dim W. Also, A : W
1
»• E" is invertible or
nonsingular if and only z/r ank(A) = n.
Note that in the special case when A € R"
x
", the transformations A, A
r
, and A"
1
are all 11 and onto between the two spaces M(A)
±
and 7£(A). The transformations A
T
and A~
!
have the same domain and range but are in general different maps unless A is
orthogonal. Similar remarks apply to A and A~
T
.
3.5. Four Fundamental Subspaces 25
Theorem 3.20. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{n×p}. Then
1. R(AB) ⊆ R(A).
2. N(AB) ⊇ N(B).
3. R((AB)^T) ⊆ R(B^T).
4. N((AB)^T) ⊇ N(A^T).
The next theorem is closely related to Theorem 3.20 and is also easily proved. It
is extremely useful in the text that follows, especially when dealing with pseudoinverses and
linear least squares problems.
Theorem 3.21. Let A ∈ ℝ^{m×n}. Then
1. R(A) = R(AA^T).
2. R(A^T) = R(A^T A).
3. N(A) = N(A^T A).
4. N(A^T) = N(AA^T).
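These equalities of subspaces can be verified numerically by rank comparisons (a NumPy sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
rk = np.linalg.matrix_rank

# R(X) = R(Y) iff rank([X Y]) == rank(X) == rank(Y).
def same_column_space(X, Y):
    return rk(np.hstack([X, Y])) == rk(X) == rk(Y)

assert same_column_space(A, A @ A.T)       # R(A) = R(AA^T)
assert same_column_space(A.T, A.T @ A)     # R(A^T) = R(A^T A)
```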
We now characterize 1-1 and onto transformations and provide characterizations in
terms of rank and invertibility.
Theorem 3.22. Let A : ℝ^n → ℝ^m. Then
1. A is onto if and only if rank(A) = m (A has linearly independent rows or is said to
have full row rank; equivalently, AA^T is nonsingular).
2. A is 1-1 if and only if rank(A) = n (A has linearly independent columns or is said
to have full column rank; equivalently, A^T A is nonsingular).
Proof: Proof of part 1: If A is onto, dim R(A) = m = rank(A). Conversely, let y ∈ ℝ^m
be arbitrary. Let x = A^T (AA^T)^{-1} y ∈ ℝ^n. Then y = Ax, i.e., y ∈ R(A), so A is onto.
Proof of part 2: If A is 1-1, then N(A) = 0, which implies that dim N(A)^⊥ = n =
dim R(A^T), and hence dim R(A) = n by Theorem 3.17. Conversely, suppose Ax_1 = Ax_2.
Then A^T Ax_1 = A^T Ax_2, which implies x_1 = x_2 since A^T A is invertible. Thus, A is
1-1.  □
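A NumPy spot-check of both characterizations (illustrative only; random matrices are full rank with probability 1):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 4))   # fat: full row rank, so onto but not 1-1
B = rng.standard_normal((4, 2))   # tall: full column rank, so 1-1 but not onto
rk = np.linalg.matrix_rank

# onto <=> full row rank <=> AA^T nonsingular
assert rk(A) == A.shape[0] and abs(np.linalg.det(A @ A.T)) > 1e-12
# 1-1 <=> full column rank <=> B^T B nonsingular
assert rk(B) == B.shape[1] and abs(np.linalg.det(B.T @ B)) > 1e-12
```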
Definition 3.23. A : V → W is invertible (or bijective) if and only if it is 1-1 and onto.
Note that if A is invertible, then dim V = dim W. Also, A : ℝ^n → ℝ^n is invertible or
nonsingular if and only if rank(A) = n.
Note that in the special case when A ∈ ℝ_n^{n×n}, the transformations A, A^T, and A^{-1}
are all 1-1 and onto between the two spaces N(A)^⊥ and R(A). The transformations A^T
and A^{-1} have the same domain and range but are in general different maps unless A is
orthogonal. Similar remarks apply to A and A^{-T}.
26 Chapter 3. Linear Transformations
If a linear transformation is not invertible, it may still be right or left invertible. Definitions
of these concepts are followed by a theorem characterizing left and right invertible
transformations.
Definition 3.24. Let A : V → W. Then
1. A is said to be right invertible if there exists a right inverse transformation A^{-R} :
W → V such that AA^{-R} = I_W, where I_W denotes the identity transformation on W.
2. A is said to be left invertible if there exists a left inverse transformation A^{-L} : W →
V such that A^{-L} A = I_V, where I_V denotes the identity transformation on V.
Theorem 3.25. Let A : V → W. Then
1. A is right invertible if and only if it is onto.
2. A is left invertible if and only if it is 1-1.
Moreover, A is invertible if and only if it is both right and left invertible, i.e., both 1-1 and
onto, in which case A^{-1} = A^{-R} = A^{-L}.
Note: From Theorem 3.22 we see that if A : ℝ^n → ℝ^m is onto, then a right inverse
is given by A^{-R} = A^T (AA^T)^{-1}. Similarly, if A is 1-1, then a left inverse is given by
A^{-L} = (A^T A)^{-1} A^T.
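The two formulas in the Note can be exercised numerically (NumPy sketch; random matrices are assumed full rank, which holds with probability 1):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 5))   # onto (full row rank)
B = rng.standard_normal((5, 2))   # 1-1 (full column rank)

A_R = A.T @ np.linalg.inv(A @ A.T)     # right inverse A^{-R} = A^T (A A^T)^{-1}
B_L = np.linalg.inv(B.T @ B) @ B.T     # left inverse  B^{-L} = (B^T B)^{-1} B^T

assert np.allclose(A @ A_R, np.eye(2))   # A A^{-R} = I_W
assert np.allclose(B_L @ B, np.eye(2))   # B^{-L} B = I_V
```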
Theorem 3.26. Let A : V → V.
1. If there exists a unique right inverse A^{-R} such that AA^{-R} = I, then A is invertible.
2. If there exists a unique left inverse A^{-L} such that A^{-L} A = I, then A is invertible.
Proof: We prove the first part and leave the proof of the second to the reader. Notice the
following:
    A(A^{-R} + A^{-R} A − I) = AA^{-R} + AA^{-R} A − A
                             = I + IA − A        since AA^{-R} = I
                             = I.
Thus, (A^{-R} + A^{-R} A − I) must be a right inverse and, therefore, by uniqueness it must be
the case that A^{-R} + A^{-R} A − I = A^{-R}. But this implies that A^{-R} A = I, i.e., that A^{-R} is
a left inverse. It then follows from Theorem 3.25 that A is invertible.  □
Example 3.27.
1. Let A = [1 2] : ℝ^2 → ℝ^1. Then A is onto. (Proof: Take any α ∈ ℝ^1; then one
can always find v ∈ ℝ^2 such that [1 2]v = α.) Obviously A has full row rank
(= 1) and A^{-R} = [−1; 1] is a right inverse. Also, it is clear that there are infinitely many
right inverses for A. In Chapter 6 we characterize all right inverses of a matrix by
characterizing all solutions of the linear matrix equation AR = I.
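A numeric check of this example (NumPy; it assumes the partially garbled display shows the right inverse [−1; 1], which indeed satisfies [1 2][−1; 1] = 1):

```python
import numpy as np

A = np.array([[1.0, 2.0]])          # A = [1 2] : R^2 -> R^1

R1 = np.array([[-1.0], [1.0]])      # the right inverse read from the example
R2 = A.T @ np.linalg.inv(A @ A.T)   # another one, from A^T (A A^T)^{-1} = [0.2; 0.4]

assert np.allclose(A @ R1, np.eye(1))
assert np.allclose(A @ R2, np.eye(1))
assert not np.allclose(R1, R2)      # right inverses are not unique
```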
Exercises 27
2. Let A = [1; 2] : ℝ^1 → ℝ^2. Then A is 1-1. (Proof: The only solution to 0 = Av = [1; 2]v
is v = 0, whence N(A) = 0, so A is 1-1.) It is now obvious that A has full column
rank (= 1) and A^{-L} = [3 −1] is a left inverse. Again, it is clear that there are
infinitely many left inverses for A. In Chapter 6 we characterize all left inverses of a
matrix by characterizing all solutions of the linear matrix equation LA = I.
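A numeric check of this example (NumPy sketch): [3 −1][1; 2] = 3 − 2 = 1, and the formula left inverse is (A^T A)^{-1} A^T = [0.2 0.4].

```python
import numpy as np

A = np.array([[1.0], [2.0]])        # A = [1; 2] : R^1 -> R^2

L1 = np.array([[3.0, -1.0]])        # the left inverse given in the example
L2 = np.linalg.inv(A.T @ A) @ A.T   # another one: (A^T A)^{-1} A^T

assert np.allclose(L1 @ A, np.eye(1))
assert np.allclose(L2 @ A, np.eye(1))
assert not np.allclose(L1, L2)      # left inverses are not unique
```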
3. The matrix
    A = [ 1 1
          2 1
          3 1 ]
when considered as a linear transformation on ℝ^3, is neither 1-1 nor onto. We give
below bases for its four fundamental subspaces.
EXERCISES
1. Let A = [ 3 4 · ; 8 5 · ] and consider A as a linear transformation mapping ℝ^3 to ℝ^2.
Find the matrix representation of A with respect to the bases
{ · , · , · } of ℝ^3 and { · , · } of ℝ^2.
2. Consider the vector space ℝ^{n×n} over ℝ, let S denote the subspace of symmetric
matrices, and let R denote the subspace of skew-symmetric matrices. For matrices
X, Y ∈ ℝ^{n×n} define their inner product by (X, Y) = Tr(X^T Y). Show that, with
respect to this inner product, R = S^⊥.
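The orthogonality asserted in this exercise can be illustrated numerically (this demonstrates, but of course does not prove, the claim):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
N = rng.standard_normal((4, 4))
X = (M + M.T) / 2          # symmetric part
Y = (N - N.T) / 2          # skew-symmetric part

inner = np.trace(X.T @ Y)  # (X, Y) = Tr(X^T Y)
assert abs(inner) < 1e-12  # symmetric and skew-symmetric matrices are orthogonal
```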
3. Consider the differentiation operator L defined in Example 3.2.3. Is L 1-1? Is L
onto?
4. Prove Theorem 3.4.
28 Chapter 3. Linear Transformations
5. Prove Theorem 3.11.4.
6. Prove Theorem 3.12.2.
7. Determine bases for the four fundamental subspaces of the matrix
2 5 5 3
8. Suppose A ∈ ℝ^{m×n} has a left inverse. Show that A^T has a right inverse.
9. Let A = [ 0 1 ; 0 0 ]. Determine N(A) and R(A). Are they equal? Is this true in general?
If this is true in general, prove it; if not, provide a counterexample.
10. Suppose A ∈ ℝ_9^{9×48}. How many linearly independent solutions can be found to the
homogeneous linear system Ax = 0?
11. Modify Figure 3.1 to illustrate the four fundamental subspaces associated with A^T ∈
ℝ^{n×m} thought of as a transformation from ℝ^m to ℝ^n.
Chapter 4
Introduction to the
Moore-Penrose
Pseudoinverse
In this chapter we give a brief introduction to the Moore-Penrose pseudoinverse, a generalization
of the inverse of a matrix. The Moore-Penrose pseudoinverse is defined for any
matrix and, as is shown in the following text, brings great notational and conceptual clarity
to the study of solutions to arbitrary systems of linear equations and linear least squares
problems.
4.1 Definitions and Characterizations
Consider a linear transformation A : X → Y, where X and Y are arbitrary finite-dimensional
vector spaces. Define a transformation T : N(A)^⊥ → R(A) by
    Tx = Ax for all x ∈ N(A)^⊥.
Then, as noted in the proof of Theorem 3.17, T is bijective (1-1 and onto), and hence we
can define a unique inverse transformation T^{-1} : R(A) → N(A)^⊥. This transformation
can be used to give our first definition of A^+, the Moore-Penrose pseudoinverse of A.
Unfortunately, the definition neither provides nor suggests a good computational strategy
for determining A^+.
Definition 4.1. With A and T as defined above, define a transformation A^+ : Y → X by
    A^+ y = T^{-1} y_1,
where y = y_1 + y_2 with y_1 ∈ R(A) and y_2 ∈ R(A)^⊥. Then A^+ is the Moore-Penrose
pseudoinverse of A.
Although X and Y were arbitrary vector spaces above, let us henceforth consider the
case X = ℝ^n and Y = ℝ^m. We have thus defined A^+ for all A ∈ ℝ^{m×n}. A purely algebraic
characterization of A^+ is given in the next theorem, which was proved by Penrose in 1955;
see [22].
29
30 Chapter 4. Introduction to the Moore-Penrose Pseudoinverse
Theorem 4.2. Let A ∈ ℝ^{m×n}. Then G = A^+ if and only if
(P1) AGA = A.
(P2) GAG = G.
(P3) (AG)^T = AG.
(P4) (GA)^T = GA.
Furthermore, A^+ always exists and is unique.
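The four conditions translate directly into a numerical test (NumPy sketch, not part of the text):

```python
import numpy as np

def is_pseudoinverse(A, G, tol=1e-10):
    """Check the four Penrose conditions (P1)-(P4) for a candidate G."""
    return (np.allclose(A @ G @ A, A, atol=tol) and      # (P1)
            np.allclose(G @ A @ G, G, atol=tol) and      # (P2)
            np.allclose((A @ G).T, A @ G, atol=tol) and  # (P3)
            np.allclose((G @ A).T, G @ A, atol=tol))     # (P4)

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))
assert is_pseudoinverse(A, np.linalg.pinv(A))
assert not is_pseudoinverse(A, A.T)   # an arbitrary candidate generally fails
```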
Note that the inverse of a nonsingular matrix satisfies all four Penrose properties. Also,
a right or left inverse satisfies no fewer than three of the four properties. Unfortunately, as
with Definition 4.1, neither the statement of Theorem 4.2 nor its proof suggests a computational
algorithm. However, the Penrose properties do offer the great virtue of providing a
checkable criterion in the following sense. Given a matrix G that is a candidate for being
the pseudoinverse of A, one need simply verify the four Penrose conditions (P1)-(P4). If G
satisfies all four, then by uniqueness, it must be A^+. Such a verification is often relatively
straightforward.
Example 4.3. Consider A = [1; 2]. Verify directly that A^+ = [1/5 2/5] satisfies (P1)-(P4).
Note that other left inverses (for example, A^{-L} = [3 −1]) satisfy properties (P1), (P2),
and (P4) but not (P3).
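Numerically (NumPy), with the example's A = [1; 2] and A^+ = [1/5 2/5]:

```python
import numpy as np

A = np.array([[1.0], [2.0]])
Aplus = np.array([[0.2, 0.4]])       # A+ = [1/5 2/5]
L = np.array([[3.0, -1.0]])          # another left inverse

assert np.allclose(A @ Aplus @ A, A)             # (P1)
assert np.allclose(Aplus @ A @ Aplus, Aplus)     # (P2)
assert np.allclose((A @ Aplus).T, A @ Aplus)     # (P3)
assert np.allclose((Aplus @ A).T, Aplus @ A)     # (P4)

# The left inverse [3 -1] satisfies (P1), (P2), (P4) but fails (P3):
assert np.allclose(A @ L @ A, A) and np.allclose(L @ A @ L, L)
assert np.allclose((L @ A).T, L @ A)
assert not np.allclose((A @ L).T, A @ L)
```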
Still another characterization of A^+ is given in the following theorem, whose proof
can be found in [1, p. 19]. While not generally suitable for computer implementation, this
characterization can be useful for hand calculation of small examples.
Theorem 4.4. Let A ∈ ℝ^{m×n}. Then
    A^+ = lim_{δ→0} (A^T A + δ^2 I)^{-1} A^T          (4.1)
        = lim_{δ→0} A^T (AA^T + δ^2 I)^{-1}.          (4.2)
4.2 Examples
Each of the following can be derived or verified by using the above definitions or characterizations.
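The limit characterization can be illustrated numerically by taking a small δ (NumPy sketch; δ = 10^{-3} is an arbitrary choice balancing the limit against round-off):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])          # rank 1, so no ordinary inverse exists
delta = 1e-3
m, n = A.shape

approx1 = np.linalg.inv(A.T @ A + delta**2 * np.eye(n)) @ A.T   # (4.1)
approx2 = A.T @ np.linalg.inv(A @ A.T + delta**2 * np.eye(m))   # (4.2)
Aplus = np.linalg.pinv(A)

assert np.allclose(approx1, Aplus, atol=1e-6)
assert np.allclose(approx2, Aplus, atol=1e-6)
```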
Example 4.5. X
t
= AT (AA T) I if A is onto (independent rows) (A is right invertible).
Example 4.6. A+ = (AT A)I AT if A is 11 (independent columns) (A is left invertible).
Example 4.7. For any scalar a,
if a t= 0,
if a =0.
4.3. Properties and Applications 31
Example 4.8. For any vector v ∈ ℝ^n,
    v^+ = v^T / (v^T v) if v ≠ 0,
    v^+ = 0             if v = 0.
Example 4.9.
Example 4.10.
    [ 1 1 ; 1 1 ]^+ = [ 1/4 1/4 ; 1/4 1/4 ].
4.3 Properties and Applications
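Numeric checks of Examples 4.8 and 4.10 (NumPy sketch, not part of the text):

```python
import numpy as np

# Example 4.8: v+ = v^T / (v^T v) for a nonzero column vector v.
v = np.array([[1.0], [2.0], [2.0]])
v_plus = v.T / float(v.T @ v)            # = [1/9 2/9 2/9]
assert np.allclose(v_plus, np.linalg.pinv(v))

# Example 4.10: the all-ones 2x2 matrix has pseudoinverse (1/4) * ones.
J = np.ones((2, 2))
assert np.allclose(np.linalg.pinv(J), np.full((2, 2), 0.25))
```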
This section presents some miscellaneous useful results on pseudoinverses. Many of these
are used in the text that follows.
Theorem 4.11. Let A ∈ ℝ^{m×n} and suppose U ∈ ℝ^{m×m}, V ∈ ℝ^{n×n} are orthogonal (M is
orthogonal if M^T = M^{-1}). Then
    (UAV)^+ = V^T A^+ U^T.
Proof: For the proof, simply verify that the expression above does indeed satisfy each of
the four Penrose conditions.  □
Theorem 4.12. Let S ∈ ℝ^{n×n} be symmetric with U^T S U = D, where U is orthogonal and
D is diagonal. Then S^+ = U D^+ U^T, where D^+ is again a diagonal matrix whose diagonal
elements are determined according to Example 4.7.
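Both theorems are easy to confirm numerically (NumPy sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(9)

# Theorem 4.11: (U A V)^+ = V^T A^+ U^T for orthogonal U, V.
A = rng.standard_normal((3, 4))
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
assert np.allclose(np.linalg.pinv(U @ A @ V), V.T @ np.linalg.pinv(A) @ U.T)

# Theorem 4.12: S^+ = Q D^+ Q^T for symmetric S with eigendecomposition Q D Q^T.
B = rng.standard_normal((4, 2))
S = B @ B.T                                  # symmetric, rank 2
w, Q = np.linalg.eigh(S)                     # S = Q diag(w) Q^T
d_plus = np.array([1.0 / x if abs(x) > 1e-10 else 0.0 for x in w])
assert np.allclose(Q @ np.diag(d_plus) @ Q.T, np.linalg.pinv(S))
```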
Theorem 4.13. For all A ∈ ℝ^{m×n},
1. A^+ = (A^T A)^+ A^T = A^T (AA^T)^+.
2. (A^T)^+ = (A^+)^T.
Proof: Both results can be proved using the limit characterization of Theorem 4.4. The
proof of the first result is not particularly easy and does not even have the virtue of being
especially illuminating. The interested reader can consult the proof in [1, p. 27]. The
proof of the second result (which can also be proved easily by verifying the four Penrose
conditions) is as follows:
    (A^T)^+ = lim_{δ→0} (AA^T + δ^2 I)^{-1} A
            = lim_{δ→0} [A^T (AA^T + δ^2 I)^{-1}]^T
            = [lim_{δ→0} A^T (AA^T + δ^2 I)^{-1}]^T
            = (A^+)^T.  □
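A numeric spot-check of both parts of Theorem 4.13 (NumPy, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 5))  # rank 2, 3x5
P = np.linalg.pinv

assert np.allclose(P(A), P(A.T @ A) @ A.T)   # A+ = (A^T A)+ A^T
assert np.allclose(P(A), A.T @ P(A @ A.T))   # A+ = A^T (A A^T)+
assert np.allclose(P(A.T), P(A).T)           # (A^T)+ = (A+)^T
```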
32 Chapter 4. Introduction to the Moore-Penrose Pseudoinverse
Note that by combining Theorems 4.12 and 4.13 we can, in theory at least, compute
the Moore-Penrose pseudoinverse of any matrix (since AA^T and A^T A are symmetric). This
turns out to be a poor approach in finite-precision arithmetic, however (see, e.g., [7], [11],
[23]), and better methods are suggested in the text that follows.
Theorem 4.11 is suggestive of a "reverse-order" property for pseudoinverses of products
of matrices such as exists for inverses of products. Unfortunately, in general,
    (AB)^+ ≠ B^+ A^+.
As an example consider A = [0 1] and B = [1; 1]. Then
    (AB)^+ = 1^+ = 1,
while
    B^+ A^+ = [1/2 1/2] [0; 1] = 1/2.
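The counterexample can be reproduced numerically (NumPy; this assumes the reading A = [0 1], B = [1; 1], which matches the computed values):

```python
import numpy as np

A = np.array([[0.0, 1.0]])
B = np.array([[1.0], [1.0]])

AB_plus = np.linalg.pinv(A @ B)              # pinv([1]) = [1]
rev = np.linalg.pinv(B) @ np.linalg.pinv(A)  # [1/2 1/2] @ [0; 1] = [1/2]

assert np.allclose(AB_plus, 1.0)
assert np.allclose(rev, 0.5)
assert not np.allclose(AB_plus, rev)         # the reverse-order law fails here
```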
However, necessary and sufficient conditions under which the reverse-order property does
hold are known and we quote a couple of moderately useful results for reference.
Theorem 4.14. (AB)^+ = B^+ A^+ if and only if
1. R(BB^T A^T) ⊆ R(A^T)
and
2. R(A^T AB) ⊆ R(B).
Proof: For the proof, see [9].  □
Theorem 4.15. (AB)^+ = B_1^+ A_1^+, where B_1 = A^+ AB and A_1 = AB_1 B_1^+.
Proof: For the proof, see [5].  □
Theorem 4.16. If A ∈ ℝ_r^{n×r}, B ∈ ℝ_r^{r×m}, then (AB)^+ = B^+ A^+.
Proof: Since A ∈ ℝ_r^{n×r}, then A^+ = (A^T A)^{-1} A^T, whence A^+ A = I_r. Similarly, since
B ∈ ℝ_r^{r×m}, we have B^+ = B^T (BB^T)^{-1}, whence BB^+ = I_r. The result then follows by
taking B_1 = B, A_1 = A in Theorem 4.15.  □
The following theorem gives some additional useful properties of pseudoinverses.
Theorem 4.17. For all A ∈ ℝ^{m×n},
1. (A^+)^+ = A.
2. (A^T A)^+ = A^+ (A^T)^+, (AA^T)^+ = (A^T)^+ A^+.
3. R(A^+) = R(A^T) = R(A^+ A) = R(A^T A).
4. N(A^+) = N(AA^+) = N((AA^T)^+) = N(AA^T) = N(A^T).
5. If A is normal, then A^k A^+ = A^+ A^k and (A^k)^+ = (A^+)^k for all integers k > 0.
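Spot-checking the first two properties numerically (NumPy sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(12)
A = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 4))  # rank 2, 3x4
P = np.linalg.pinv

assert np.allclose(P(P(A)), A)                    # (A+)+ = A
assert np.allclose(P(A.T @ A), P(A) @ P(A.T))     # (A^T A)+ = A+ (A^T)+
assert np.allclose(P(A @ A.T), P(A.T) @ P(A))     # (A A^T)+ = (A^T)+ A+
```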
Exercises 33
Note: Recall that A ∈ ℝ^{n×n} is normal if AA^T = A^T A. For example, if A is symmetric,
skew-symmetric, or orthogonal, then it is normal. However, a matrix can be none of the
preceding but still be normal, such as
    A = [ a b ; −b a ]
for scalars a, b ∈ ℝ.
The next theorem is fundamental to facilitating a compact and unifying approach
to studying the existence of solutions of (matrix) linear equations and linear least squares
problems.
Theorem 4.18. Suppose A ∈ ℝ^{n×p}, B ∈ ℝ^{n×m}. Then R(B) ⊆ R(A) if and only if
AA^+ B = B.
Proof: Suppose R(B) ⊆ R(A) and take arbitrary x ∈ ℝ^m. Then Bx ∈ R(B) ⊆ R(A), so
there exists a vector y ∈ ℝ^p such that Ay = Bx. Then we have
    Bx = Ay = AA^+ Ay = AA^+ Bx,
where one of the Penrose properties is used above. Since x was arbitrary, we have shown
that B = AA^+ B.
To prove the converse, assume that AA^+ B = B and take arbitrary y ∈ R(B). Then
there exists a vector x ∈ ℝ^m such that Bx = y, whereupon
    y = Bx = AA^+ Bx ∈ R(A).  □
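The criterion of Theorem 4.18 is easy to exercise numerically (NumPy sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(13)
A = rng.standard_normal((5, 3))
B = A @ rng.standard_normal((3, 2))      # columns of B lie in R(A) by construction
C = rng.standard_normal((5, 2))          # generic C: R(C) not contained in R(A)

Ap = np.linalg.pinv(A)
assert np.allclose(A @ Ap @ B, B)        # R(B) ⊆ R(A)  =>  A A+ B = B
assert not np.allclose(A @ Ap @ C, C)    # fails for a generic C
```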
EXERCISES
1. Use Theorem 4.4 to compute the pseudoinverse of [ 2 2 ; 1 · ].
2. If x, y ∈ ℝ^n, show that (xy^T)^+ = (x^T x)^+ (y^T y)^+ yx^T.
3. For A ∈ ℝ^{m×n}, prove that R(A) = R(AA^T) using only definitions and elementary
properties of the Moore-Penrose pseudoinverse.
4. For A ∈ ℝ^{m×n}, prove that R(A^+) = R(A^T).
5. For A ∈ ℝ^{p×n} and B ∈ ℝ^{m×n}, show that N(A) ⊆ N(B) if and only if BA^+ A = B.
6. Let A ∈ ℝ^{n×n}, B ∈ ℝ^{n×m}, and D ∈ ℝ^{m×m} and suppose further that D is nonsingular.
(a) Prove or disprove that
    [ A AB ; 0 D ]^+ = [ A^+ −A^+ABD^{-1} ; 0 D^{-1} ].
(b) Prove or disprove that
    [ A B ; 0 D ]^+ = [ A^+ −A^+BD^{-1} ; 0 D^{-1} ].
Chapter 5
Introduction to the Singular
Value Decomposition
In this chapter we give a brief introduction to the singular value decomposition (SVD). We
show that every matrix has an SVD and describe some useful properties and applications
of this important matrix factorization. The SVD plays a key conceptual and computational
role throughout (numerical) linear algebra and its applications.
5.1 The Fundamental Theorem
Theorem 5.1. Let A ∈ ℝ_r^{m×n}. Then there exist orthogonal matrices U ∈ ℝ^{m×m} and
V ∈ ℝ^{n×n} such that
    A = UΣV^T,                                    (5.1)
where Σ = [ S 0 ; 0 0 ], S = diag(σ_1, ..., σ_r) ∈ ℝ^{r×r}, and σ_1 ≥ ··· ≥ σ_r > 0. More
specifically, we have
    A = [U_1 U_2] [ S 0 ; 0 0 ] [ V_1^T ; V_2^T ]     (5.2)
      = U_1 S V_1^T.                                  (5.3)
The submatrix sizes are all determined by r (which must be ≤ min{m, n}), i.e., U_1 ∈ ℝ^{m×r},
U_2 ∈ ℝ^{m×(m−r)}, V_1 ∈ ℝ^{n×r}, V_2 ∈ ℝ^{n×(n−r)}, and the 0-subblocks in Σ are compatibly
dimensioned.
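The full form (5.2) and the compact form (5.3) can be reproduced with a library SVD (NumPy sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(14)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # m=5, n=4, r=2

U, s, Vt = np.linalg.svd(A)            # full SVD: U is 5x5, Vt is 4x4
r = int(np.sum(s > 1e-10))
S = np.diag(s[:r])
U1, V1 = U[:, :r], Vt[:r].T            # U1 is m x r, V1 is n x r

Sigma = np.zeros_like(A)               # the m x n matrix [S 0; 0 0]
Sigma[:r, :r] = S
assert np.allclose(A, U @ Sigma @ Vt)  # full SVD (5.2)
assert np.allclose(A, U1 @ S @ V1.T)   # compact SVD (5.3)
```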
Proof: Since A^T A ≥ 0 (A^T A is symmetric and nonnegative definite; recall, for example,
[24, Ch. 6]), its eigenvalues are all real and nonnegative. (Note: The rest of the proof follows
analogously if we start with the observation that AA^T ≥ 0 and the details are left to the reader
as an exercise.) Denote the set of eigenvalues of A^T A by {σ_i^2, i ∈ n} with σ_1 ≥ ··· ≥ σ_r >
0 = σ_{r+1} = ··· = σ_n. Let {v_i, i ∈ n} be a set of corresponding orthonormal eigenvectors
and let V_1 = [v_1, ..., v_r], V_2 = [v_{r+1}, ..., v_n]. Letting S = diag(σ_1, ..., σ_r), we can
write A^T A V_1 = V_1 S^2. Premultiplying by V_1^T gives V_1^T A^T A V_1 = V_1^T V_1 S^2 = S^2, the latter
equality following from the orthonormality of the v_i vectors. Pre- and postmultiplying by
S^{-1} gives the equation
    S^{-1} V_1^T A^T A V_1 S^{-1} = I.            (5.4)
35
36 Chapter 5. Introduction to the Singular Value Decomposition
Turning now to the eigenvalue equations corresponding to the eigenvalues σ_{r+1}, ..., σ_n we
have that A^T A V_2 = V_2 · 0 = 0, whence V_2^T A^T A V_2 = 0. Thus, AV_2 = 0. Now define the
matrix U_1 ∈ ℝ^{m×r} by U_1 = AV_1 S^{-1}. Then from (5.4) we see that U_1^T U_1 = I; i.e., the
columns of U_1 are orthonormal. Choose any matrix U_2 ∈ ℝ^{m×(m−r)} such that [U_1 U_2] is
orthogonal. Then
    U^T A V = [ U_1^T A V_1   U_1^T A V_2 ; U_2^T A V_1   U_2^T A V_2 ]
            = [ U_1^T A V_1   0 ; U_2^T A V_1   0 ]
since AV_2 = 0. Referring to the equation U_1 = AV_1 S^{-1} defining U_1, we see that U_1^T A V_1 =
S and U_2^T A V_1 = U_2^T U_1 S = 0. The latter equality follows from the orthogonality of the
columns of U_1 and U_2. Thus, we see that, in fact, U^T A V = [ S 0 ; 0 0 ], and defining this matrix
to be Σ completes the proof.  □
Definition 5.2. Let A = UΣV^T be an SVD of A as in Theorem 5.1.
1. The set {σ_1, ..., σ_r} is called the set of (nonzero) singular values of the matrix A and
is denoted Σ(A). From the proof of Theorem 5.1 we see that σ_i(A) = λ_i^{1/2}(A^T A) =
λ_i^{1/2}(AA^T). Note that there are also min{m, n} − r zero singular values.
2. The columns of U are called the left singular vectors of A (and are the orthonormal
eigenvectors of AA^T).
3. The columns of V are called the right singular vectors of A (and are the orthonormal
eigenvectors of A^T A).
Remark 5.3. The analogous complex case in which A ∈ C^{m×n} is quite straightforward. The decomposition is A = UΣV^H, where U and V are unitary and the proof is essentially identical, except for Hermitian transposes replacing transposes.

Remark 5.4. Note that U and V can be interpreted as changes of basis in both the domain and codomain spaces with respect to which A then has a diagonal matrix representation. Specifically, let C denote A thought of as a linear transformation mapping R^n to R^m. Then rewriting A = UΣV^T as AV = UΣ we see that Mat C is Σ with respect to the bases {v_1, ..., v_n} for R^n and {u_1, ..., u_m} for R^m (see the discussion in Section 3.2). See also Remark 5.16.

Remark 5.5. The singular value decomposition is not unique. For example, an examination of the proof of Theorem 5.1 reveals that

• any orthonormal basis for N(A) can be used for V_2.

• there may be nonuniqueness associated with the columns of V_1 (and hence U_1) corresponding to multiple σ_i's.
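As a quick numerical sketch of Definition 5.2 (NumPy assumed; the 3 × 2 test matrix is an arbitrary choice for illustration), the singular values of A can be compared against the square roots of the eigenvalues of A^T A:

```python
import numpy as np

# Arbitrary illustrative matrix (rank 1, so one singular value is zero).
A = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [2.0, 2.0]])

# Singular values directly from the SVD...
sigma = np.linalg.svd(A, compute_uv=False)

# ...and as square roots of the eigenvalues of A^T A (Definition 5.2).
lam = np.linalg.eigvalsh(A.T @ A)                     # ascending order
sigma_from_eig = np.sqrt(np.maximum(lam[::-1], 0.0))  # descending, clipped at 0

match = np.allclose(sigma, sigma_from_eig)
```

Here A^T A = [9 9; 9 9] has eigenvalues 18 and 0, so the nonzero singular value is √18 = 3√2.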
5.1. The Fundamental Theorem 37
• any U_2 can be used so long as [U_1 U_2] is orthogonal.

• columns of U and V can be changed (in tandem) by sign (or multiplier of the form e^{jθ} in the complex case).

What is unique, however, is the matrix Σ and the span of the columns of U_1, U_2, V_1, and V_2 (see Theorem 5.11). Note, too, that a "full SVD" (5.2) can always be constructed from a "compact SVD" (5.3).
Remark 5.6. Computing an SVD by working directly with the eigenproblem for A^T A or A A^T is numerically poor in finite-precision arithmetic. Better algorithms exist that work directly on A via a sequence of orthogonal transformations; see, e.g., [7], [11], [25].
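A small sketch of the phenomenon behind Remark 5.6 (NumPy assumed; the matrix is the classic ill-conditioned example with a tiny parameter eps, not taken from the text). Forming A^T A squares the singular values, and eps² underflows relative to 1 in double precision, so the Gram-matrix route destroys the small singular value that the direct SVD recovers:

```python
import numpy as np

eps = 1e-9
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

# Working directly on A recovers the small singular value (exactly eps here).
direct = np.linalg.svd(A, compute_uv=False)
small_ok_direct = abs(direct[1] - eps) < 1e-12

# A^T A = [[1 + eps^2, 1], [1, 1 + eps^2]] rounds to [[1, 1], [1, 1]] in
# floating point, so all information about eps is lost before the
# eigenproblem is even posed.
information_lost = bool((A.T @ A == 1.0).all())
```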
Example 5.7.

    A = [ 1  0 ] = U I_2 U^T,
        [ 0  1 ]

where U is an arbitrary 2 × 2 orthogonal matrix, is an SVD.

Example 5.8.

    A = [ 1   0 ] = [ cos θ   sin θ ] [ 1  0 ] [  cos θ   sin θ ]^T
        [ 0  −1 ]   [ sin θ  −cos θ ] [ 0  1 ] [ −sin θ   cos θ ]

where θ is arbitrary, is an SVD.

Example 5.9.

    A = [ 1  1 ]   [ 1/3   2√5/5   2√5/15 ] [ 3√2  0 ] [ √2/2   √2/2 ]^T
        [ 2  2 ] = [ 2/3  −√5/5   4√5/15 ] [  0   0 ] [ √2/2  −√2/2 ]
        [ 2  2 ]   [ 2/3    0     −√5/3  ] [  0   0 ]

      = [ 1/3 ]
        [ 2/3 ] 3√2 [ √2/2  √2/2 ]
        [ 2/3 ]

is an SVD.
Example 5.10. Let A ∈ R^{n×n} be symmetric and positive definite. Let V be an orthogonal matrix of eigenvectors that diagonalizes A, i.e., V^T A V = Λ > 0. Then A = VΛV^T is an SVD of A.

A factorization UΣV^T of an m × n matrix A qualifies as an SVD if U and V are orthogonal and Σ is an m × n "diagonal" matrix whose diagonal elements in the upper left corner are positive (and ordered). For example, if A = UΣV^T is an SVD of A, then VΣ^T U^T is an SVD of A^T.
38 Chapter 5. Introduction to the Singular Value Decomposition
5.2 Some Basic Properties

Theorem 5.11. Let A ∈ R^{m×n} have a singular value decomposition A = UΣV^T. Using the notation of Theorem 5.1, the following properties hold:

1. rank(A) = r = the number of nonzero singular values of A.

2. Let U = [u_1, ..., u_m] and V = [v_1, ..., v_n]. Then A has the dyadic (or outer product) expansion

       A = Σ_{i=1}^{r} σ_i u_i v_i^T.                    (5.5)

3. The singular vectors satisfy the relations

       A v_i = σ_i u_i,                                  (5.6)
       A^T u_i = σ_i v_i                                 (5.7)

   for i ∈ r.
4. Let U_1 = [u_1, ..., u_r], U_2 = [u_{r+1}, ..., u_m], V_1 = [v_1, ..., v_r], and V_2 = [v_{r+1}, ..., v_n]. Then

   (a) R(U_1) = R(A) = N(A^T)^⊥.

   (b) R(U_2) = R(A)^⊥ = N(A^T).

   (c) R(V_1) = N(A)^⊥ = R(A^T).

   (d) R(V_2) = N(A) = R(A^T)^⊥.
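The partitioning in part 4 translates directly into code. The sketch below (NumPy assumed; the function name and test matrix are illustrative choices, not from the text) slices a full SVD at the numerical rank r to get orthonormal bases for the four fundamental subspaces:

```python
import numpy as np

def four_subspaces(A, tol=1e-10):
    """Orthonormal bases for R(A), N(A^T), R(A^T), N(A) from a full SVD."""
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > tol))           # numerical rank
    U1, U2 = U[:, :r], U[:, r:]        # R(A) and its complement N(A^T)
    V1, V2 = Vt[:r, :].T, Vt[r:, :].T  # R(A^T) and its complement N(A)
    return U1, U2, V1, V2

A = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [2.0, 2.0]])             # rank 1
U1, U2, V1, V2 = four_subspaces(A)

in_null = np.allclose(A @ V2, 0)               # columns of V2 lie in N(A)
in_left_null = np.allclose(A.T @ U2, 0)        # columns of U2 lie in N(A^T)
spans_range = np.allclose(U1 @ (U1.T @ A), A)  # projecting onto R(U1) fixes A
```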
Remark 5.12. Part 4 of the above theorem provides a numerically superior method for
finding (orthonormal) bases for the four fundamental subspaces compared to methods based
on, for example, reduction to row or column echelon form. Note that each subspace requires
knowledge of the rank r. The relationship to the four fundamental subspaces is summarized
nicely in Figure 5.1.
Remark 5.13. The elegance of the dyadic decomposition (5.5) as a sum of outer products and the key vector relations (5.6) and (5.7) explain why it is conventional to write the SVD as A = UΣV^T rather than, say, A = UΣV.
Theorem 5.14. Let A ∈ R^{m×n} have a singular value decomposition A = UΣV^T as in Theorem 5.1. Then

    A^+ = V Σ^+ U^T,                                     (5.8)

where

    Σ^+ = [ S^{-1}  0 ] ∈ R^{n×m}                        (5.9)
          [   0     0 ]
5.2. Some Basic Properties 39
Figure 5.1. SVD and the four fundamental subspaces.
with the 0-subblocks appropriately sized. Furthermore, if we let the columns of U and V be as defined in Theorem 5.11, then

    A^+ = Σ_{i=1}^{r} (1/σ_i) v_i u_i^T.                 (5.10)

Proof: The proof follows easily by verifying the four Penrose conditions. □
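The construction of Theorem 5.14 and the four Penrose conditions invoked in the proof are easy to check numerically. A sketch (NumPy assumed; the helper name and test matrix are illustrative, not from the text):

```python
import numpy as np

def pinv_from_svd(A, tol=1e-12):
    """Build A+ = V Sigma+ U^T from a full SVD, as in Theorem 5.14."""
    U, s, Vt = np.linalg.svd(A)
    splus = np.zeros((Vt.shape[0], U.shape[0]))   # n x m, all zero blocks
    r = int(np.sum(s > tol))
    splus[:r, :r] = np.diag(1.0 / s[:r])          # invert S on the top corner
    return Vt.T @ splus @ U.T

A = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [2.0, 2.0]])
Ap = pinv_from_svd(A)

# The four Penrose conditions characterize the pseudoinverse.
penrose = (np.allclose(A @ Ap @ A, A) and
           np.allclose(Ap @ A @ Ap, Ap) and
           np.allclose((A @ Ap).T, A @ Ap) and
           np.allclose((Ap @ A).T, Ap @ A))
matches_numpy = np.allclose(Ap, np.linalg.pinv(A))
```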
Remark 5.15. Note that none of the expressions above quite qualifies as an SVD of A^+ if we insist that the singular values be ordered from largest to smallest. However, a simple reordering accomplishes the task:

    A^+ = Σ_{i=1}^{r} (1/σ_{r+1−i}) v_{r+1−i} u_{r+1−i}^T.      (5.11)

This can also be written in matrix terms by using the so-called reverse-order identity matrix (or exchange matrix) P = [e_r, e_{r−1}, ..., e_2, e_1], which is clearly orthogonal and symmetric.
40 Chapter 5. Introduction to the Singular Value Decomposition
Then

    A^+ = (V_1 P)(P S^{-1} P)(P U_1^T)

is the matrix version of (5.11). A "full SVD" can be similarly constructed.
Remark 5.16. Recall the linear transformation T used in the proof of Theorem 3.17 and in Definition 4.1. Since T is determined by its action on a basis, and since {v_1, ..., v_r} is a basis for N(A)^⊥, then T can be defined by T v_i = σ_i u_i, i ∈ r. Similarly, since {u_1, ..., u_r} is a basis for R(A), then T^{-1} can be defined by T^{-1} u_i = (1/σ_i) v_i, i ∈ r. From Section 3.2, the matrix representation for T with respect to the bases {v_1, ..., v_r} and {u_1, ..., u_r} is clearly S, while the matrix representation for the inverse linear transformation T^{-1} with respect to the same bases is S^{-1}.
5.3 Row and Column Compressions
Row compression

Let A ∈ R^{m×n} have an SVD given by (5.1). Then

    U^T A = Σ V^T
          = [ S  0 ] [ V_1^T ]
            [ 0  0 ] [ V_2^T ]
          = [ S V_1^T ] ∈ R^{m×n}.
            [    0    ]

Notice that N(A) = N(U^T A) = N(S V_1^T) and the matrix S V_1^T ∈ R^{r×n} has full row rank. In other words, premultiplication of A by U^T is an orthogonal transformation that "compresses" A by row transformations. Such a row compression can also be accomplished by orthogonal row transformations performed directly on A to reduce it to the form [ R ; 0 ], where R is upper triangular. Both compressions are analogous to the so-called row-reduced echelon form which, when derived by a Gaussian elimination algorithm implemented in finite-precision arithmetic, is not generally as reliable a procedure.
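Row compression is easy to observe in code. The sketch below (NumPy assumed; the rank-deficient test matrix is an arbitrary choice) premultiplies A by U^T and checks that everything below the first r rows vanishes while those r rows retain full row rank:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],    # dependent on row 1
              [1.0, 0.0, 1.0]])   # independent, so rank(A) = 2
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

compressed = U.T @ A               # equals Sigma V^T = [S V1^T; 0]
bottom_rows_zero = np.allclose(compressed[r:, :], 0)
top_rows_full_rank = np.linalg.matrix_rank(compressed[:r, :]) == r
```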
Column compression

Again, let A ∈ R^{m×n} have an SVD given by (5.1). Then

    A V = U Σ
        = [ U_1  U_2 ] [ S  0 ]
                       [ 0  0 ]
        = [ U_1 S   0 ] ∈ R^{m×n}.

This time, notice that R(A) = R(A V) = R(U_1 S) and the matrix U_1 S ∈ R^{m×r} has full column rank. In other words, postmultiplication of A by V is an orthogonal transformation that "compresses" A by column transformations. Such a compression is analogous to the
Exercises 41
so-called column-reduced echelon form, which is not generally a reliable procedure when performed by Gauss transformations in finite-precision arithmetic. For details, see, for example, [7], [11], [23], [25].
EXERCISES
1. Let X ∈ R^{m×n}. If X^T X = 0, show that X = 0.

2. Prove Theorem 5.1 starting from the observation that A A^T ≥ 0.
3. Let A ∈ R^{n×n} be symmetric but indefinite. Determine an SVD of A.

4. Let x ∈ R^m, y ∈ R^n be nonzero vectors. Determine an SVD of the matrix A ∈ R^{m×n} defined by A = x y^T.

5. Determine SVDs of the matrices given in parts (a) and (b) [the displayed matrices are not recoverable from this copy].
6. Let A ∈ R^{m×n} and suppose W ∈ R^{m×m} and Y ∈ R^{n×n} are orthogonal.

   (a) Show that A and W A Y have the same singular values (and hence the same rank).

   (b) Suppose that W and Y are nonsingular but not necessarily orthogonal. Do A and W A Y have the same singular values? Do they have the same rank?

7. Let A ∈ R^{n×n}. Use the SVD to determine a polar factorization of A, i.e., A = Q P where Q is orthogonal and P = P^T > 0. Note: this is analogous to the polar form z = re^{iθ} of a complex scalar z (where i = j = √−1).
Chapter 6
Linear Equations
In this chapter we examine existence and uniqueness of solutions of systems of linear
equations. General linear systems of the form

    AX = B;  A ∈ R^{m×n}, B ∈ R^{m×k},                   (6.1)

are studied and include, as a special case, the familiar vector system

    Ax = b;  A ∈ R^{n×n}, b ∈ R^n.                       (6.2)
6.1 Vector Linear Equations
We begin with a review of some of the principal results associated with vector linear systems.
Theorem 6.1. Consider the system of linear equations
    Ax = b;  A ∈ R^{m×n}, b ∈ R^m.                       (6.3)

1. There exists a solution to (6.3) if and only if b ∈ R(A).

2. There exists a solution to (6.3) for all b ∈ R^m if and only if R(A) = R^m, i.e., A is onto; equivalently, there exists a solution if and only if rank([A, b]) = rank(A), and this is possible only if m ≤ n (since m = dim R(A) = rank(A) ≤ min{m, n}).

3. A solution to (6.3) is unique if and only if N(A) = 0, i.e., A is 1-1.

4. There exists a unique solution to (6.3) for all b ∈ R^m if and only if A is nonsingular; equivalently, A ∈ R^{m×m} and A has neither a 0 singular value nor a 0 eigenvalue.

5. There exists at most one solution to (6.3) for all b ∈ R^m if and only if the columns of A are linearly independent, i.e., N(A) = 0, and this is possible only if m ≥ n.

6. There exists a nontrivial solution to the homogeneous system Ax = 0 if and only if rank(A) < n.
43
44 Chapter 6. Linear Equations
Proof: The proofs are straightforward and can be consulted in standard texts on linear algebra. Note that some parts of the theorem follow directly from others. For example, to prove part 6, note that x = 0 is always a solution to the homogeneous system. Therefore, we must have the case of a nonunique solution, i.e., A is not 1-1, which implies rank(A) < n by part 3. □
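The rank test in part 2 of Theorem 6.1 can be checked mechanically. A sketch (NumPy assumed; the singular matrix and the two right-hand sides are arbitrary illustrative choices):

```python
import numpy as np

def has_solution(A, b):
    # Part 2 of Theorem 6.1: a solution exists iff rank([A, b]) = rank(A).
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # rank 1: R(A) is the line through (1, 2)
b_in = np.array([3.0, 6.0])         # lies in R(A)
b_out = np.array([1.0, 0.0])        # does not lie in R(A)

ok_in = has_solution(A, b_in)
ok_out = has_solution(A, b_out)
```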
6.2 Matrix Linear Equations
In this section we present some of the principal results concerning existence and uniqueness
of solutions to the general matrix linear system (6.1). Note that the results of Theorem
6.1 follow from those below for the special case k = 1, while results for (6.2) follow by
specializing even further to the case m = n.
Theorem 6.2 (Existence). The matrix linear equation

    AX = B;  A ∈ R^{m×n}, B ∈ R^{m×k},                   (6.4)

has a solution if and only if R(B) ⊆ R(A); equivalently, a solution exists if and only if A A^+ B = B.

Proof: The subspace inclusion criterion follows essentially from the definition of the range of a matrix. The matrix criterion is Theorem 4.18. □
Theorem 6.3. Let A ∈ R^{m×n}, B ∈ R^{m×k} and suppose that A A^+ B = B. Then any matrix of the form

    X = A^+ B + (I − A^+ A) Y,  where Y ∈ R^{n×k} is arbitrary,   (6.5)

is a solution of

    AX = B.                                              (6.6)

Furthermore, all solutions of (6.6) are of this form.

Proof: To verify that (6.5) is a solution, premultiply by A:

    AX = A A^+ B + A(I − A^+ A) Y
       = B + (A − A A^+ A) Y       by hypothesis
       = B                         since A A^+ A = A by the first Penrose condition.

That all solutions are of this form can be seen as follows. Let Z be an arbitrary solution of (6.6), i.e., AZ = B. Then we can write

    Z = A^+ A Z + (I − A^+ A) Z
      = A^+ B + (I − A^+ A) Z

and this is clearly of the form (6.5). □
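The solution family of Theorem 6.3 can be exercised numerically. A sketch (NumPy assumed; the rank-deficient A is arbitrary, and B is made consistent by construction):

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])    # rank 1
X0 = rng.standard_normal((3, 2))
B = A @ X0                          # consistent right-hand side by construction

Ap = np.linalg.pinv(A)
exists = np.allclose(A @ Ap @ B, B)            # criterion of Theorem 6.2

# Any X = A+ B + (I - A+ A) Y solves AX = B, whatever Y is chosen.
n = A.shape[1]
Y = rng.standard_normal((n, 2))
X = Ap @ B + (np.eye(n) - Ap @ A) @ Y
solves = np.allclose(A @ X, B)
differs = not np.allclose(X, Ap @ B)           # a genuinely different solution
```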
6.2. Matrix Linear Equations 45
Remark 6.4. When A is square and nonsingular, A^+ = A^{-1} and so (I − A^+ A) = 0. Thus, there is no "arbitrary" component, leaving only the unique solution X = A^{-1} B.

Remark 6.5. It can be shown that the particular solution X = A^+ B is the solution of (6.6) that minimizes Tr X^T X. (Tr(·) denotes the trace of a matrix; recall that Tr X^T X = Σ_{i,j} x_{ij}^2.)

Theorem 6.6 (Uniqueness). A solution of the matrix linear equation

    AX = B;  A ∈ R^{m×n}, B ∈ R^{m×k}                    (6.7)

is unique if and only if A^+ A = I; equivalently, (6.7) has a unique solution if and only if N(A) = 0.

Proof: The first equivalence is immediate from Theorem 6.3. The second follows by noting that A^+ A = I can occur only if r = n, where r = rank(A) (recall r ≤ n). But rank(A) = n if and only if A is 1-1 or N(A) = 0. □
Example 6.7. Suppose A ∈ R^{n×n}. Find all solutions of the homogeneous system Ax = 0.

Solution:

    x = A^+ 0 + (I − A^+ A) y
      = (I − A^+ A) y,

where y ∈ R^n is arbitrary. Hence, there exists a nonzero solution if and only if A^+ A ≠ I. This is equivalent to either rank(A) = r < n or A being singular. Clearly, if there exists a nonzero solution, it is not unique.

Computation: Since y is arbitrary, it is easy to see that all solutions are generated from a basis for R(I − A^+ A). But if A has an SVD given by A = UΣV^T, then it is easily checked that I − A^+ A = V_2 V_2^T and R(V_2 V_2^T) = R(V_2) = N(A).
Example 6.8. Characterize all right inverses of a matrix A ∈ R^{m×n}; equivalently, find all solutions R of the equation A R = I_m. Here, we write I_m to emphasize the m × m identity matrix.

Solution: There exists a right inverse if and only if R(I_m) ⊆ R(A) and this is equivalent to A A^+ I_m = I_m. Clearly, this can occur if and only if rank(A) = r = m (since r ≤ m) and this is equivalent to A being onto (A^+ is then a right inverse). All right inverses of A are then of the form

    R = A^+ I_m + (I_n − A^+ A) Y
      = A^+ + (I − A^+ A) Y,

where Y ∈ R^{n×m} is arbitrary. There is a unique right inverse if and only if A^+ A = I (N(A) = 0), in which case A must be invertible and R = A^{-1}.
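A small sketch of Example 6.8 (NumPy assumed; the wide full-row-rank matrix and the random Y are arbitrary illustrative choices). A^+ is one right inverse, and adding (I − A^+ A) Y produces another:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])     # 2 x 3, rank 2, so A is onto
Ap = np.linalg.pinv(A)

rng = np.random.default_rng(1)
Y = rng.standard_normal((3, 2))
R = Ap + (np.eye(3) - Ap @ A) @ Y   # another member of the family

pinv_is_right_inverse = np.allclose(A @ Ap, np.eye(2))
R_is_right_inverse = np.allclose(A @ R, np.eye(2))
not_unique = not np.allclose(R, Ap)  # the (I - A+ A) Y term moved R off A+
```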
Example 6.9. Consider the system of linear first-order difference equations

    x_{k+1} = A x_k + B u_k                              (6.8)
46 Chapter 6. Linear Equations
with A e R"
xn
and fieR"
xm
(rc>l,ra>l). The vector Jt* in linear system theory is
known as the state vector at time k while Uk is the input (control) vector. The general
solution of (6.8) is given by
for k > 1. We might now ask the question: Given X Q = 0, does there exist an input sequence
{uj } y~ Q such that x^ takes an arbitrary va
of reachability. Since m > 1, from the
see that (6.8) is reachable if and only if
[ Uj }
k
jj^ such that X k takes an arbitrary value in W ? In linear system theory, this is a question
of reachability. Since m > 1, from the fundamental Existence Theorem, Theorem 6.2, we
or, equivalently, if and only if
A related question is the following: Given an arbitrary initial vector X Q , does there ex
ist an input sequence {"y} "~ o such that x
n
= 0? In linear system theory, this is called
controllability. Again from Theorem 6.2, we see that (6.8) is controllable if and only if
Clearly, reachability always implies controllability and, if A is nonsingular, control
lability and reachability are equivalent. The matrices A = [ °
1
Q
1 and 5 = f ^ 1 provide an
example of a system that is controllable but not reachable.
The above are standard conditions with analogues for continuoustime models (i.e.,
linear differential equations). There are many other algebraically equivalent conditions.
Example 6.10. We now introduce an output vector y
k
to the system (6.8) of Example 6.9
by appending the equation
with C e R
pxn
and D € R
pxm
(p > 1). We can then pose some new questions about the
overall system that are dual in the systemtheoretic sense to reachability and controllability.
The answers are cast in terms that are dual in the linear algebra sense as well. The condition
dual to reachability is called observability: When does knowledge of {"
7
}"!Q and {y_ / } "~ o
suffice to determine (uniquely) Jt
0
? As a dual to controllability, we have the notion of
reconstructibility: When does knowledge of {w
y
} "~ Q and {;y/ } "Io suffice to determine
(uniquely) x
n
l The fundamental duality result from linear system theory is the following:
(A, B) is reachable [ controllable] if and only if (A
T
, B
T
] is observable [ reconstructive].
46 Chapter 6. Linear Equations
with A ∈ R^{n×n} and B ∈ R^{n×m} (n ≥ 1, m ≥ 1). The vector x_k in linear system theory is known as the state vector at time k while u_k is the input (control) vector. The general solution of (6.8) is given by

    x_k = A^k x_0 + Σ_{j=0}^{k−1} A^{k−1−j} B u_j        (6.9)

        = A^k x_0 + [B, AB, ..., A^{k−1} B] [ u_{k−1} ]
                                            [ u_{k−2} ]  (6.10)
                                            [    ⋮    ]
                                            [   u_0   ]

for k ≥ 1. We might now ask the question: Given x_0 = 0, does there exist an input sequence {u_j}_{j=0}^{k−1} such that x_k takes an arbitrary value in R^n? In linear system theory, this is a question of reachability. Since m ≥ 1, from the fundamental Existence Theorem, Theorem 6.2, we see that (6.8) is reachable if and only if

    R([B, AB, ..., A^{n−1} B]) = R^n

or, equivalently, if and only if

    rank [B, AB, ..., A^{n−1} B] = n.
A related question is the following: Given an arbitrary initial vector x_0, does there exist an input sequence {u_j}_{j=0}^{n−1} such that x_n = 0? In linear system theory, this is called controllability. Again from Theorem 6.2, we see that (6.8) is controllable if and only if

    R(A^n) ⊆ R([B, AB, ..., A^{n−1} B]).

Clearly, reachability always implies controllability and, if A is nonsingular, controllability and reachability are equivalent. The matrices A = [0 1; 0 0] and B = [1; 0] provide an example of a system that is controllable but not reachable.

The above are standard conditions with analogues for continuous-time models (i.e., linear differential equations). There are many other algebraically equivalent conditions.
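The rank tests above are easy to automate. The sketch below (NumPy assumed; the helper name is an invented label, and the nilpotent pair (A, B) is a hypothetical example of a system that is controllable but not reachable):

```python
import numpy as np

def reachability_matrix(A, B):
    """[B, AB, ..., A^(n-1) B] for the difference equation x_{k+1} = A x_k + B u_k."""
    n = A.shape[0]
    blocks, P = [B], B
    for _ in range(n - 1):
        P = A @ P
        blocks.append(P)
    return np.hstack(blocks)

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])          # nilpotent: A^2 = 0
B = np.array([[1.0],
              [0.0]])

R = reachability_matrix(A, B)       # here [B, AB] = [[1, 0], [0, 0]], rank 1
reachable = np.linalg.matrix_rank(R) == A.shape[0]
# Controllability: R(A^n) must lie in R([B, ..., A^(n-1) B]); A^2 = 0, so it does.
An = np.linalg.matrix_power(A, A.shape[0])
controllable = (np.linalg.matrix_rank(np.hstack([R, An]))
                == np.linalg.matrix_rank(R))
```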
Example 6.10. We now introduce an output vector y_k to the system (6.8) of Example 6.9 by appending the equation

    y_k = C x_k + D u_k                                  (6.11)

with C ∈ R^{p×n} and D ∈ R^{p×m} (p ≥ 1). We can then pose some new questions about the overall system that are dual in the system-theoretic sense to reachability and controllability. The answers are cast in terms that are dual in the linear algebra sense as well. The condition dual to reachability is called observability: When does knowledge of {u_j}_{j=0}^{n−1} and {y_j}_{j=0}^{n−1} suffice to determine (uniquely) x_0? As a dual to controllability, we have the notion of reconstructibility: When does knowledge of {u_j}_{j=0}^{n−1} and {y_j}_{j=0}^{n−1} suffice to determine (uniquely) x_n? The fundamental duality result from linear system theory is the following:

(A, B) is reachable [controllable] if and only if (A^T, B^T) is observable [reconstructible].
6.4 Some Useful and Interesting Inverses 47

To derive a condition for observability, notice that

    y_k = C A^k x_0 + Σ_{j=0}^{k−1} C A^{k−1−j} B u_j + D u_k.       (6.12)

Thus,

    [ y_0 − D u_0                                          ]   [    C     ]
    [ y_1 − C B u_0 − D u_1                                ]   [   CA     ]
    [                    ⋮                                 ] = [    ⋮     ] x_0.   (6.13)
    [ y_{n−1} − Σ_{j=0}^{n−2} C A^{n−2−j} B u_j − D u_{n−1} ]   [ CA^{n−1} ]

Let v denote the (known) vector on the left-hand side of (6.13) and let R denote the matrix on the right-hand side. Then, by definition, v ∈ R(R), so a solution exists. By the fundamental Uniqueness Theorem, Theorem 6.6, the solution is then unique if and only if N(R) = 0, or, equivalently, if and only if

    rank [ C ; CA ; ... ; CA^{n−1} ] = n.
6.3 A More General Matrix Linear Equation

Theorem 6.11. Let A ∈ R^{m×n}, B ∈ R^{m×q}, and C ∈ R^{p×q}. Then the equation

    A X C = B                                            (6.14)

has a solution if and only if A A^+ B C^+ C = B, in which case the general solution is of the form

    X = A^+ B C^+ + Y − A^+ A Y C C^+,                   (6.15)

where Y ∈ R^{n×p} is arbitrary.

A compact matrix criterion for uniqueness of solutions to (6.14) requires the notion of the Kronecker product of matrices for its statement. Such a criterion (C C^+ ⊗ A^+ A = I) is stated and proved in Theorem 13.27.
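The solvability criterion of Theorem 6.11 can be checked numerically. A sketch (NumPy assumed; the dimensions and random matrices are arbitrary, and B is made consistent by construction so that a solution is guaranteed to exist):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
C = rng.standard_normal((2, 5))
X0 = rng.standard_normal((3, 2))
B = A @ X0 @ C                       # consistent right-hand side by construction

Ap, Cp = np.linalg.pinv(A), np.linalg.pinv(C)
solvable = np.allclose(A @ Ap @ B @ Cp @ C, B)   # criterion of Theorem 6.11
X = Ap @ B @ Cp                                   # the particular solution (Y = 0)
solves = np.allclose(A @ X @ C, B)
```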
6.4 Some Useful and Interesting Inverses

In many applications, the coefficient matrices of interest are square and nonsingular. Listed below is a small collection of useful matrix identities, particularly for block matrices, associated with matrix inverses. In these identities, A ∈ R^{n×n}, B ∈ R^{n×m}, C ∈ R^{m×n}, and D ∈ R^{m×m}. Invertibility is assumed for any component or subblock whose inverse is indicated. Verification of each identity is recommended as an exercise for the reader.
48 Chapter 6. Linear Equations
1. (A + BDCr
1
= A~
l
 A~
l
B(D~
l
+ CA~
l
B)~
[
CA~
l
.
This result is known as the ShermanMorrisonWoodbury formula. It has many
applications (and is frequently "rediscovered") including, for example, formulas for
the inverse of a sum of matrices such as (A + D)"
1
or (A"
1
+ D"
1
) . It also
yields very efficient "updating" or "downdating" formulas in expressions such as
T — 1
(A + JUT ) (with symmetric A e R"
x
" and ;c e E") that arise in optimization
theory.
2. [I, B; 0, -I]^{-1} = [I, B; 0, -I].

3. [I, 0; C, -I]^{-1} = [I, 0; C, -I].

Both of these matrices satisfy the matrix equation X^2 = I, from which it is obvious that X^{-1} = X. Note that the positions of the I and -I blocks may be exchanged.

4. [A, B; 0, D]^{-1} = [A^{-1}, -A^{-1}BD^{-1}; 0, D^{-1}].

5. [A, 0; C, D]^{-1} = [A^{-1}, 0; -D^{-1}CA^{-1}, D^{-1}].

6. [I + BC, B; C, I]^{-1} = [I, -B; -C, I + CB].

7. [A, B; C, D]^{-1} = [A^{-1} + A^{-1}BECA^{-1}, -A^{-1}BE; -ECA^{-1}, E],

where E = (D - CA^{-1}B)^{-1} (E is the inverse of the Schur complement of A). This result follows easily from the block LU factorization in property 16 of Section 1.4.

8. [A, B; C, D]^{-1} = [F, -FBD^{-1}; -D^{-1}CF, D^{-1} + D^{-1}CFBD^{-1}],

where F = (A - BD^{-1}C)^{-1}. This result follows easily from the block UL factorization in property 17 of Section 1.4.

EXERCISES

1. As in Example 6.8, characterize all left inverses of a matrix A ∈ ℝ^{m×n}.

2. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{m×k} and suppose A has an SVD as in Theorem 5.1. Assuming R(B) ⊆ R(A), characterize all solutions of the matrix linear equation

AX = B

in terms of the SVD of A.
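The block 2x2 inverse in identity 7 above can likewise be checked numerically (an added sketch; the matrices are arbitrary, chosen so that A, D, and the Schur complement are invertible):

```python
import numpy as np

A = np.diag([4.0, 5.0, 6.0])
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
C = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
D = np.diag([3.0, 4.0])

Ai = np.linalg.inv(A)
E = np.linalg.inv(D - C @ Ai @ B)     # inverse of the Schur complement of A
Minv = np.vstack([
    np.hstack([Ai + Ai @ B @ E @ C @ Ai, -Ai @ B @ E]),
    np.hstack([-E @ C @ Ai, E]),
])
M = np.vstack([np.hstack([A, B]), np.hstack([C, D])])
assert np.allclose(np.linalg.inv(M), Minv)
```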
Exercises 49

3. Let x, y ∈ ℝ^n and suppose further that x^T y ≠ 1. Show that

(I - xy^T)^{-1} = I - (1/(x^T y - 1)) xy^T.

4. Let x, y ∈ ℝ^n and suppose further that x^T y ≠ 1. Show that

[I, x; y^T, 1]^{-1} = [I + cxy^T, -cx; -cy^T, c],

where c = 1/(1 - x^T y).

5. Let A ∈ ℝ^{n×n} and let A^{-1} have columns c_1, ..., c_n and individual elements γ_ij. Assume that γ_ji ≠ 0 for some i and j. Show that the matrix B = A - (1/γ_ji) e_i e_j^T (i.e., A with 1/γ_ji subtracted from its (ij)th element) is singular.
Hint: Show that c_i ∈ N(B).

6. As in Example 6.10, check directly that the condition for reconstructibility takes the form

N([C; CA; ...; CA^{n-1}]) ⊆ N(A^n).
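Exercise 3's identity (a special case of the rank-one update formula) is easy to verify numerically (an added illustration; the vectors are arbitrary choices with x^T y ≠ 1):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.5, -1.0, 0.25, 1.0])
assert not np.isclose(x @ y, 1.0)      # hypothesis x^T y != 1

lhs = np.linalg.inv(np.eye(4) - np.outer(x, y))
rhs = np.eye(4) - np.outer(x, y) / (x @ y - 1.0)
assert np.allclose(lhs, rhs)
```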
Chapter 7

Projections, Inner Product Spaces, and Norms

7.1 Projections

Definition 7.1. Let V be a vector space with V = X ⊕ Y. By Theorem 2.26, every v ∈ V has a unique decomposition v = x + y with x ∈ X and y ∈ Y. Define P_{X,Y} : V → X ⊆ V by

P_{X,Y} v = x for all v ∈ V.

P_{X,Y} is called the (oblique) projection on X along Y.

Figure 7.1 displays the projection of v on both X and Y in the case V = ℝ^2.

Figure 7.1. Oblique projections.

Theorem 7.2. P_{X,Y} is linear and P_{X,Y}^2 = P_{X,Y}.

Theorem 7.3. A linear transformation P is a projection if and only if it is idempotent, i.e., P^2 = P. Also, P is a projection if and only if I - P is a projection. In fact, P_{Y,X} = I - P_{X,Y}.

Proof: Suppose P is a projection, say on X along Y (using the notation of Definition 7.1).

51
52 Chapter 7. Projections, Inner Product Spaces, and Norms

Let v ∈ V be arbitrary. Then Pv = P(x + y) = Px = x. Moreover, P^2 v = PPv = Px = x = Pv. Thus, P^2 = P. Conversely, suppose P^2 = P. Let X = {v ∈ V : Pv = v} and Y = {v ∈ V : Pv = 0}. It is easy to check that X and Y are subspaces. We now prove that V = X ⊕ Y. First note that if v ∈ X, then Pv = v. If v ∈ Y, then Pv = 0. Hence if v ∈ X ∩ Y, then v = 0. Now let v ∈ V be arbitrary. Then v = Pv + (I - P)v. Let x = Pv, y = (I - P)v. Then Px = P^2 v = Pv = x so x ∈ X, while Py = P(I - P)v = Pv - P^2 v = 0 so y ∈ Y. Thus, V = X ⊕ Y and the projection on X along Y is P.

Essentially the same argument shows that I - P is the projection on Y along X. □

Definition 7.4. In the special case where Y = X⊥, P_{X,X⊥} is called an orthogonal projection and we then use the notation P_X = P_{X,X⊥}.

Theorem 7.5. P ∈ ℝ^{n×n} is the matrix of an orthogonal projection (onto R(P)) if and only if P^2 = P = P^T.

Proof: Let P be an orthogonal projection (on X, say, along X⊥) and let x, y ∈ ℝ^n be arbitrary. Note that (I - P)x = (I - P_{X,X⊥})x = P_{X⊥,X} x by Theorem 7.3. Thus, (I - P)x ∈ X⊥. Since Py ∈ X, we have (Py)^T (I - P)x = y^T P^T (I - P)x = 0. Since x and y were arbitrary, we must have P^T (I - P) = 0. Hence P^T = P^T P = P, with the second equality following since P^T P is symmetric. Conversely, suppose P is a symmetric projection matrix and let x be arbitrary. Write x = Px + (I - P)x. Then x^T P^T (I - P)x = x^T P(I - P)x = 0. Thus, since Px ∈ R(P), then (I - P)x ∈ R(P)⊥ and P must be an orthogonal projection. □

7.1.1 The four fundamental orthogonal projections

Using the notation of Theorems 5.1 and 5.11, let A ∈ ℝ^{m×n} with SVD A = UΣV^T = U_1 S V_1^T. Then

P_{R(A)}  = AA^+     = U_1 U_1^T = Σ_{i=1}^{r} u_i u_i^T,
P_{R(A)⊥} = I - AA^+ = U_2 U_2^T = Σ_{i=r+1}^{m} u_i u_i^T,
P_{N(A)}  = I - A^+A = V_2 V_2^T = Σ_{i=r+1}^{n} v_i v_i^T,
P_{N(A)⊥} = A^+A     = V_1 V_1^T = Σ_{i=1}^{r} v_i v_i^T

are easily checked to be (unique) orthogonal projections onto the respective four fundamental subspaces.
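These four formulas can be spot-checked through the pseudoinverse (a numerical sketch added here; the rank and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 5, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank r
Ap = np.linalg.pinv(A)

P_range      = A @ Ap                  # P_R(A)
P_range_perp = np.eye(m) - A @ Ap      # P_R(A)-perp
P_null       = np.eye(n) - Ap @ A      # P_N(A)
P_null_perp  = Ap @ A                  # P_N(A)-perp

for P in (P_range, P_range_perp, P_null, P_null_perp):
    assert np.allclose(P @ P, P)       # idempotent (Theorem 7.3)
    assert np.allclose(P, P.T)         # symmetric, hence orthogonal (Theorem 7.5)
```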
7.1. Projections 53

Example 7.6. Determine the orthogonal projection of a vector v ∈ ℝ^n on another nonzero vector w ∈ ℝ^n.

Solution: Think of the vector w as an element of the one-dimensional subspace R(w). Then the desired projection is simply

P_{R(w)} v = ww^+ v = (ww^T/(w^T w)) v (using Example 4.8) = ((w^T v)/(w^T w)) w.

Moreover, the vector z that is orthogonal to w and such that v = Pv + z is given by z = P_{R(w)⊥} v = (I - P_{R(w)})v = v - ((w^T v)/(w^T w)) w. See Figure 7.2. A direct calculation shows that z and w are, in fact, orthogonal:

z^T w = v^T w - ((w^T v)/(w^T w)) w^T w = v^T w - w^T v = 0.

Figure 7.2. Orthogonal projection on a "line."

Example 7.7. Recall the proof of Theorem 3.11. There, {v_1, ..., v_k} was an orthonormal basis for a subspace S of ℝ^n. An arbitrary vector x ∈ ℝ^n was chosen and a formula for x_1 appeared rather mysteriously. The expression for x_1 is simply the orthogonal projection of x on S. Specifically,

x_1 = P_S x = Σ_{i=1}^{k} (v_i^T x) v_i.

Example 7.8. Recall the diagram of the four fundamental subspaces. The indicated direct sum decompositions of the domain ℝ^n and codomain ℝ^m are given easily as follows.

Let x ∈ ℝ^n be an arbitrary vector. Then

x = P_{N(A)⊥} x + P_{N(A)} x
  = A^+Ax + (I - A^+A)x
  = V_1 V_1^T x + V_2 V_2^T x (recall VV^T = I).
54 Chapter 7. Projections, Inner Product Spaces, and Norms

Similarly, let y ∈ ℝ^m be an arbitrary vector. Then

y = P_{R(A)} y + P_{R(A)⊥} y
  = AA^+y + (I - AA^+)y
  = U_1 U_1^T y + U_2 U_2^T y (recall UU^T = I).

Example 7.9. Let

A = [1, 1, 0; 1, 1, 0; 0, 0, 0].

Then

A^+ = [1/4, 1/4, 0; 1/4, 1/4, 0; 0, 0, 0]

and we can decompose the vector [2 3 4]^T uniquely into the sum of a vector in N(A)⊥ and a vector in N(A), respectively, as follows:

[2; 3; 4] = A^+Ax + (I - A^+A)x
          = [1/2, 1/2, 0; 1/2, 1/2, 0; 0, 0, 0][2; 3; 4] + [1/2, -1/2, 0; -1/2, 1/2, 0; 0, 0, 1][2; 3; 4]
          = [5/2; 5/2; 0] + [-1/2; 1/2; 4].

7.2 Inner Product Spaces

Definition 7.10. Let V be a vector space over ℝ. Then ⟨·,·⟩ : V × V → ℝ is a real inner product if

1. ⟨x, x⟩ ≥ 0 for all x ∈ V and ⟨x, x⟩ = 0 if and only if x = 0.
2. ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ V.
3. ⟨x, αy_1 + βy_2⟩ = α⟨x, y_1⟩ + β⟨x, y_2⟩ for all x, y_1, y_2 ∈ V and for all α, β ∈ ℝ.

Example 7.11. Let V = ℝ^n. Then ⟨x, y⟩ = x^T y is the "usual" Euclidean inner product or dot product.

Example 7.12. Let V = ℝ^n. Then ⟨x, y⟩_Q = x^T Qy, where Q = Q^T > 0 is an arbitrary n × n positive definite matrix, defines a "weighted" inner product.

Definition 7.13. If A ∈ ℝ^{m×n}, then A^T ∈ ℝ^{n×m} is the unique linear transformation or map such that ⟨x, Ay⟩ = ⟨A^T x, y⟩ for all x ∈ ℝ^m and for all y ∈ ℝ^n.
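The decomposition in Example 7.9 can be reproduced with a few lines of NumPy (a check added here, not from the text):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
Ap = np.linalg.pinv(A)
x = np.array([2.0, 3.0, 4.0])

x1 = Ap @ A @ x                        # component in N(A)-perp
x2 = (np.eye(3) - Ap @ A) @ x          # component in N(A)
assert np.allclose(x1, [2.5, 2.5, 0.0])
assert np.allclose(x2, [-0.5, 0.5, 4.0])
assert np.allclose(A @ x2, 0.0)        # x2 really lies in N(A)
```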
7.2. Inner Product Spaces 55

It is easy to check that, with this more "abstract" definition of transpose, and if the (i,j)th element of A is a_ij, then the (i,j)th element of A^T is a_ji. It can also be checked that all the usual properties of the transpose hold, such as (AB)^T = B^T A^T. However, the definition above allows us to extend the concept of transpose to the case of weighted inner products in the following way. Suppose A ∈ ℝ^{m×n} and let ⟨·,·⟩_Q and ⟨·,·⟩_R, with Q and R positive definite, be weighted inner products on ℝ^m and ℝ^n, respectively. Then we can define the "weighted transpose" A^# as the unique map that satisfies

⟨x, Ay⟩_Q = ⟨A^# x, y⟩_R for all x ∈ ℝ^m and for all y ∈ ℝ^n.

By Example 7.12 above, we must then have x^T QAy = x^T (A^#)^T Ry for all x, y. Hence we must have QA = (A^#)^T R. Taking transposes (of the usual variety) gives A^T Q = RA^#. Since R is nonsingular, we find

A^# = R^{-1} A^T Q.

We can also generalize the notion of orthogonality (x^T y = 0) to Q-orthogonality (Q is a positive definite matrix). Two vectors x, y ∈ ℝ^n are Q-orthogonal (or conjugate with respect to Q) if ⟨x, y⟩_Q = x^T Qy = 0. Q-orthogonality is an important tool used in studying conjugate direction methods in optimization theory.

Definition 7.14. Let V be a vector space over ℂ. Then ⟨·,·⟩ : V × V → ℂ is a complex inner product if

1. ⟨x, x⟩ ≥ 0 for all x ∈ V and ⟨x, x⟩ = 0 if and only if x = 0.
2. ⟨x, y⟩ = conj(⟨y, x⟩) for all x, y ∈ V.
3. ⟨x, αy_1 + βy_2⟩ = α⟨x, y_1⟩ + β⟨x, y_2⟩ for all x, y_1, y_2 ∈ V and for all α, β ∈ ℂ.

Remark 7.15. We could use the notation ⟨·,·⟩_ℂ to denote a complex inner product, but if the vectors involved are complex-valued, the complex inner product is to be understood. Note, too, from part 2 of the definition, that ⟨x, x⟩ must be real for all x.

Remark 7.16. Note from parts 2 and 3 of Definition 7.14 that we have

⟨αx_1 + βx_2, y⟩ = conj(α)⟨x_1, y⟩ + conj(β)⟨x_2, y⟩.

Remark 7.17. The Euclidean inner product of x, y ∈ ℂ^n is given by

⟨x, y⟩ = Σ_{i=1}^{n} conj(x_i) y_i = x^H y.

The conventional definition of the complex Euclidean inner product is ⟨x, y⟩ = y^H x but we use its complex conjugate x^H y here for symmetry with the real case.

Remark 7.18. A weighted inner product can be defined as in the real case by ⟨x, y⟩_Q = x^H Qy, for arbitrary Q = Q^H > 0. The notion of Q-orthogonality can be similarly generalized to the complex case.
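The defining property of the weighted transpose, and the closed form A^# = R^{-1} A^T Q derived above, can be checked numerically (an added sketch; the positive definite Q and R are arbitrary constructions):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, n))

def spd(k):
    """Return an arbitrary symmetric positive definite k x k matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

Q, R = spd(m), spd(n)
Ash = np.linalg.inv(R) @ A.T @ Q       # the weighted transpose A#

x, y = rng.standard_normal(m), rng.standard_normal(n)
# <x, Ay>_Q = <A# x, y>_R
assert np.isclose(x @ Q @ (A @ y), (Ash @ x) @ R @ y)
```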
56 Chapter 7. Projections, Inner Product Spaces, and Norms

Definition 7.19. A vector space (V, 𝔽) endowed with a specific inner product is called an inner product space. If 𝔽 = ℂ, we call V a complex inner product space. If 𝔽 = ℝ, we call V a real inner product space.

Example 7.20.

1. Check that V = ℝ^{n×n} with the inner product ⟨A, B⟩ = Tr A^T B is a real inner product space. Note that other choices are possible since by properties of the trace function, Tr A^T B = Tr B^T A = Tr AB^T = Tr BA^T.

2. Check that V = ℂ^{n×n} with the inner product ⟨A, B⟩ = Tr A^H B is a complex inner product space. Again, other choices are possible.

Definition 7.21. Let V be an inner product space. For v ∈ V, we define the norm (or length) of v by ‖v‖ = √⟨v, v⟩. This is called the norm induced by ⟨·,·⟩.

Example 7.22.

1. If V = ℝ^n with the usual inner product, the induced norm is given by ‖v‖ = (Σ_{i=1}^{n} v_i^2)^{1/2}.

2. If V = ℂ^n with the usual inner product, the induced norm is given by ‖v‖ = (Σ_{i=1}^{n} |v_i|^2)^{1/2}.

Theorem 7.23. Let P be an orthogonal projection on an inner product space V. Then ‖Pv‖ ≤ ‖v‖ for all v ∈ V.

Proof: Since P is an orthogonal projection, P^2 = P = P^#. (Here, the notation P^# denotes the unique linear transformation that satisfies ⟨Pu, v⟩ = ⟨u, P^#v⟩ for all u, v ∈ V. If this seems a little too abstract, consider V = ℝ^n (or ℂ^n), where P^# is simply the usual P^T (or P^H)). Hence ⟨Pv, v⟩ = ⟨P^2 v, v⟩ = ⟨Pv, P^#v⟩ = ⟨Pv, Pv⟩ = ‖Pv‖^2 ≥ 0. Now I - P is also a projection, so the above result applies and we get

0 ≤ ⟨(I - P)v, v⟩ = ⟨v, v⟩ - ⟨Pv, v⟩ = ‖v‖^2 - ‖Pv‖^2,

from which the theorem follows. □

Definition 7.24. The norm induced on an inner product space by the "usual" inner product is called the natural norm.

In case V = ℂ^n or V = ℝ^n, the natural norm is also called the Euclidean norm. In the next section, other norms on these vector spaces are defined. A converse to the above procedure is also available. That is, given a norm defined by ‖x‖ = √⟨x, x⟩, an inner product can be defined via the following.
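Theorem 7.23 is easy to observe numerically for V = ℝ^n with the natural norm (an added illustration; the subspace and test vectors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, 2))
P = M @ np.linalg.pinv(M)              # orthogonal projection onto R(M)
assert np.allclose(P @ P, P) and np.allclose(P, P.T)

# ||Pv|| <= ||v|| for every v
for _ in range(10):
    v = rng.standard_normal(n)
    assert np.linalg.norm(P @ v) <= np.linalg.norm(v) + 1e-12
```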
7.3. Vector Norms 57

Theorem 7.25 (Polarization Identity).

1. For x, y ∈ ℝ^n, an inner product is defined by

⟨x, y⟩ = x^T y = (‖x + y‖^2 - ‖x‖^2 - ‖y‖^2)/2.

2. For x, y ∈ ℂ^n, an inner product is defined by

⟨x, y⟩ = x^H y = (‖x + y‖^2 - ‖x‖^2 - ‖y‖^2)/2 - j(‖x + jy‖^2 - ‖x‖^2 - ‖y‖^2)/2,

where j = i = √-1.

7.3 Vector Norms

Definition 7.26. Let (V, 𝔽) be a vector space. Then ‖·‖ : V → ℝ is a vector norm if it satisfies the following three properties:

1. ‖x‖ ≥ 0 for all x ∈ V and ‖x‖ = 0 if and only if x = 0.
2. ‖αx‖ = |α| ‖x‖ for all x ∈ V and for all α ∈ 𝔽.
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ V.
(This is called the triangle inequality, as seen readily from the usual diagram illustrating the sum of two vectors in ℝ^2.)

Remark 7.27. It is convenient in the remainder of this section to state results for complex-valued vectors. The specialization to the real case is obvious.

Definition 7.28. A vector space (V, 𝔽) is said to be a normed linear space if and only if there exists a vector norm ‖·‖ : V → ℝ satisfying the three conditions of Definition 7.26.

Example 7.29.

1. For x ∈ ℂ^n, the Hölder norms, or p-norms, are defined by

‖x‖_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p}, 1 ≤ p < +∞.

Special cases:

(a) ‖x‖_1 = Σ_{i=1}^{n} |x_i| (the "Manhattan" norm).
(b) ‖x‖_2 = (Σ_{i=1}^{n} |x_i|^2)^{1/2} = (x^H x)^{1/2} (the Euclidean norm).
(c) ‖x‖_∞ = max_i |x_i| = lim_{p→+∞} ‖x‖_p.
(The second equality is a theorem that requires proof.)
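Both parts of the polarization identity can be verified numerically; the complex check below uses the convention ⟨x, y⟩ = x^H y of Remark 7.17 (an added sketch with arbitrary vectors):

```python
import numpy as np

rng = np.random.default_rng(3)
nsq = lambda v: np.linalg.norm(v) ** 2

# Real case.
x, y = rng.standard_normal(4), rng.standard_normal(4)
assert np.isclose((nsq(x + y) - nsq(x) - nsq(y)) / 2, x @ y)

# Complex case, <x, y> = x^H y.
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
re = (nsq(x + y) - nsq(x) - nsq(y)) / 2
im = -(nsq(x + 1j * y) - nsq(x) - nsq(y)) / 2
assert np.isclose(re + 1j * im, np.vdot(x, y))  # vdot conjugates its first argument
```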
58 Chapter 7. Projections, Inner Product Spaces, and Norms

2. Some weighted p-norms:

(a) ‖x‖_{1,D} = Σ_{i=1}^{n} d_i |x_i|, where d_i > 0.
(b) ‖x‖_{2,Q} = (x^H Qx)^{1/2}, where Q = Q^H > 0 (this norm is more commonly denoted ‖·‖_Q).

3. On the vector space (C[t_0, t_1], ℝ), define the vector norm

‖f‖ = max_{t_0 ≤ t ≤ t_1} |f(t)|.

On the vector space ((C[t_0, t_1])^n, ℝ), define the vector norm

‖f‖_∞ = max_{t_0 ≤ t ≤ t_1} ‖f(t)‖_∞.

Theorem 7.30 (Hölder Inequality). Let x, y ∈ ℂ^n. Then

|x^H y| ≤ ‖x‖_p ‖y‖_q, 1/p + 1/q = 1.

A particular case of the Hölder inequality is of special interest.

Theorem 7.31 (Cauchy-Bunyakovsky-Schwarz Inequality). Let x, y ∈ ℂ^n. Then

|x^H y| ≤ ‖x‖_2 ‖y‖_2

with equality if and only if x and y are linearly dependent.

Proof: Consider the matrix [x y] ∈ ℂ^{n×2}. Since

[x y]^H [x y] = [x^H x, x^H y; y^H x, y^H y]

is a nonnegative definite matrix, its determinant must be nonnegative. In other words, 0 ≤ (x^H x)(y^H y) - (x^H y)(y^H x). Since y^H x is the complex conjugate of x^H y, we see immediately that |x^H y| ≤ ‖x‖_2 ‖y‖_2. □

Note: This is not the classical algebraic proof of the Cauchy-Bunyakovsky-Schwarz (CBS) inequality (see, e.g., [20, p. 217]). However, it is particularly easy to remember.

Remark 7.32. The angle θ between two nonzero vectors x, y ∈ ℂ^n may be defined by cos θ = |x^H y|/(‖x‖_2 ‖y‖_2), 0 ≤ θ ≤ π/2. The CBS inequality is thus equivalent to the statement 0 ≤ cos θ ≤ 1.

Remark 7.33. Theorem 7.31 and Remark 7.32 are true for general inner product spaces.

Remark 7.34. The norm ‖·‖_2 is unitarily invariant, i.e., if U ∈ ℂ^{n×n} is unitary, then ‖Ux‖_2 = ‖x‖_2 (Proof: ‖Ux‖_2^2 = x^H U^H Ux = x^H x = ‖x‖_2^2). However, ‖·‖_1 and ‖·‖_∞
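The Gram-matrix argument in the proof of Theorem 7.31 translates directly into a numerical check (added here; the vectors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)
y = rng.standard_normal(6) + 1j * rng.standard_normal(6)

# [x y]^H [x y] is nonnegative definite, so its determinant is >= 0.
G = np.array([[np.vdot(x, x), np.vdot(x, y)],
              [np.vdot(y, x), np.vdot(y, y)]])
assert np.linalg.det(G).real >= 0
assert abs(np.vdot(x, y)) <= np.linalg.norm(x) * np.linalg.norm(y)

# Equality holds for linearly dependent vectors.
z = (2 - 3j) * x
assert np.isclose(abs(np.vdot(x, z)), np.linalg.norm(x) * np.linalg.norm(z))
```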
7.4. Matrix Norms 59

are not unitarily invariant. Similar remarks apply to the unitary invariance of norms of real vectors under orthogonal transformation.

Remark 7.35. If x, y ∈ ℂ^n are orthogonal, then we have the Pythagorean Identity

‖x ± y‖_2^2 = ‖x‖_2^2 + ‖y‖_2^2,

the proof of which follows easily from ‖z‖_2^2 = z^H z.

Theorem 7.36. All norms on ℂ^n are equivalent; i.e., there exist constants c_1, c_2 (possibly depending on n) such that

c_1 ‖x‖_α ≤ ‖x‖_β ≤ c_2 ‖x‖_α for all x ∈ ℂ^n.

Example 7.37. For x ∈ ℂ^n, the following inequalities are all tight bounds; i.e., there exist vectors x for which equality holds:

‖x‖_1 ≤ √n ‖x‖_2,  ‖x‖_1 ≤ n ‖x‖_∞;
‖x‖_2 ≤ ‖x‖_1,   ‖x‖_2 ≤ √n ‖x‖_∞;
‖x‖_∞ ≤ ‖x‖_1,   ‖x‖_∞ ≤ ‖x‖_2.

Finally, we conclude this section with a theorem about convergence of vectors. Convergence of a sequence of vectors to some limit vector can be converted into a statement about convergence of real numbers, i.e., convergence in terms of vector norms.

Theorem 7.38. Let ‖·‖ be a vector norm and suppose v, v^(1), v^(2), ... ∈ ℂ^n. Then

lim_{k→+∞} v^(k) = v if and only if lim_{k→+∞} ‖v^(k) - v‖ = 0.

7.4 Matrix Norms

In this section we introduce the concept of matrix norm. As with vectors, the motivation for using matrix norms is to have a notion of either the size of or the nearness of matrices. The former notion is useful for perturbation analysis, while the latter is needed to make sense of "convergence" of matrices. Attention is confined to the vector space (ℝ^{m×n}, ℝ) since that is what arises in the majority of applications. Extension to the complex case is straightforward and essentially obvious.

Definition 7.39. ‖·‖ : ℝ^{m×n} → ℝ is a matrix norm if it satisfies the following three properties:

1. ‖A‖ ≥ 0 for all A ∈ ℝ^{m×n} and ‖A‖ = 0 if and only if A = 0.
2. ‖αA‖ = |α| ‖A‖ for all A ∈ ℝ^{m×n} and for all α ∈ ℝ.
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖ for all A, B ∈ ℝ^{m×n}.
(As with vectors, this is called the triangle inequality.)
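A couple of the tight bounds in Example 7.37 can be confirmed with the vectors that attain them (an added illustration):

```python
import numpy as np

n = 4
x = np.ones(n)                         # attains equality in the sqrt(n) bounds
assert np.isclose(np.linalg.norm(x, 1), np.sqrt(n) * np.linalg.norm(x, 2))
assert np.isclose(np.linalg.norm(x, 2), np.sqrt(n) * np.linalg.norm(x, np.inf))

e = np.zeros(n); e[0] = 1.0            # a standard basis vector gives equality
assert np.isclose(np.linalg.norm(e, 2), np.linalg.norm(e, 1))
assert np.isclose(np.linalg.norm(e, np.inf), np.linalg.norm(e, 2))
```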
60 Chapter 7. Projections, Inner Product Spaces, and Norms

Example 7.40. Let A ∈ ℝ^{m×n}. Then the Frobenius norm (or matrix Euclidean norm) is defined by

‖A‖_F = (Σ_{i=1}^{m} Σ_{j=1}^{n} a_ij^2)^{1/2} = (Σ_{i=1}^{r} σ_i^2(A))^{1/2} = (Tr(A^T A))^{1/2} = (Tr(AA^T))^{1/2}

(where r = rank(A)).

Example 7.41. Let A ∈ ℝ^{m×n}. Then the matrix p-norms are defined by

‖A‖_p = max_{x≠0} ‖Ax‖_p/‖x‖_p = max_{‖x‖_p=1} ‖Ax‖_p.

The following three special cases are important because they are "computable." Each is a theorem and requires a proof.

1. The "maximum column sum" norm is

‖A‖_1 = max_j (Σ_{i=1}^{m} |a_ij|).

2. The "maximum row sum" norm is

‖A‖_∞ = max_i (Σ_{j=1}^{n} |a_ij|).

3. The spectral norm is

‖A‖_2 = λ_max^{1/2}(A^T A) = λ_max^{1/2}(AA^T) = σ_1(A).

Note: ‖A^+‖_2 = 1/σ_r(A), where r = rank(A).

Example 7.42. Let A ∈ ℝ^{m×n}. The Schatten p-norms are defined by

‖A‖_{S,p} = (σ_1^p + ... + σ_r^p)^{1/p}.

Some special cases of Schatten p-norms are equal to norms defined previously. For example, ‖·‖_{S,2} = ‖·‖_F and ‖·‖_{S,∞} = ‖·‖_2. The norm ‖·‖_{S,1} is often called the trace norm.

Example 7.43. Let A ∈ ℝ^{m×n}. Then "mixed" norms can also be defined by

‖A‖_{p,q} = max_{x≠0} ‖Ax‖_p/‖x‖_q.

Example 7.44. The "matrix analogue of the vector 1-norm," ‖A‖_S = Σ_{i,j} |a_ij|, is a norm.

The concept of a matrix norm alone is not altogether useful since it does not allow us to estimate the size of a matrix product AB in terms of the sizes of A and B individually.
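The three "computable" norms, and the Frobenius norm, are all available in NumPy, which makes the formulas above easy to spot-check (an added illustration; the matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

assert np.isclose(np.linalg.norm(A, 1), 6.0)       # max column sum: 1+3 vs 2+4
assert np.isclose(np.linalg.norm(A, np.inf), 7.0)  # max row sum: 1+2 vs 3+4
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(np.linalg.norm(A, 2), s[0])      # largest singular value
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt((s ** 2).sum()))
```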
7.4. Matrix Norms

Notice that this difficulty did not arise for vectors, although there are analogues for, e.g., inner products or outer products of vectors. We thus need the following definition.

Definition 7.45. Let $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times k}$. Then the norms $\|\cdot\|_\alpha$, $\|\cdot\|_\beta$, and $\|\cdot\|_\gamma$ are mutually consistent if $\|AB\|_\alpha \le \|A\|_\beta \|B\|_\gamma$. A matrix norm $\|\cdot\|$ is said to be consistent if $\|AB\| \le \|A\| \|B\|$ whenever the matrix product is defined.

Example 7.46.

1. $\|\cdot\|_F$ and $\|\cdot\|_p$ for all $p$ are consistent matrix norms.

2. The "mixed" norm
$$\|\cdot\|_{1,\infty} = \max_{x \neq 0} \frac{\|Ax\|_1}{\|x\|_\infty} = \max_{i,j} |a_{ij}|$$
is a matrix norm but it is not consistent. For example, take $A = B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. Then $\|AB\|_{1,\infty} = 2$ while $\|A\|_{1,\infty} \|B\|_{1,\infty} = 1$.

The p-norms are examples of matrix norms that are subordinate to (or induced by) a vector norm, i.e.,
$$\|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \max_{\|x\| = 1} \|Ax\|$$
(or, more generally, $\|A\|_{p,q} = \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_q}$). For such subordinate norms, also called operator norms, we clearly have $\|Ax\| \le \|A\|\|x\|$. Since $\|ABx\| \le \|A\|\|Bx\| \le \|A\|\|B\|\|x\|$, it follows that all subordinate norms are consistent.

Theorem 7.47. There exists a vector $x^*$ such that $\|Ax^*\| = \|A\|\|x^*\|$ if the matrix norm is subordinate to the vector norm.

Theorem 7.48. If $\|\cdot\|_m$ is a consistent matrix norm, there exists a vector norm $\|\cdot\|_v$ consistent with it, i.e., $\|Ax\|_v \le \|A\|_m \|x\|_v$.

Not every consistent matrix norm is subordinate to a vector norm. For example, consider $\|\cdot\|_F$. Then $\|Ax\|_2 \le \|A\|_F \|x\|_2$, so $\|\cdot\|_2$ is consistent with $\|\cdot\|_F$, but there does not exist a vector norm $\|\cdot\|$ such that $\|A\|_F$ is given by $\max_{x \neq 0} \frac{\|Ax\|}{\|x\|}$.

Useful Results

The following miscellaneous results about matrix norms are collected for future reference. The interested reader is invited to prove each of them as an exercise.

1. $\|I_n\|_p = 1$ for all $p$, while $\|I_n\|_F = \sqrt{n}$.

2. For $A \in \mathbb{R}^{n \times n}$, the following inequalities are all tight, i.e., there exist matrices $A$ for which equality holds:
$$\begin{array}{lll}
\|A\|_1 \le \sqrt{n}\,\|A\|_2, & \|A\|_1 \le n\,\|A\|_\infty, & \|A\|_1 \le \sqrt{n}\,\|A\|_F; \\
\|A\|_2 \le \sqrt{n}\,\|A\|_1, & \|A\|_2 \le \sqrt{n}\,\|A\|_\infty, & \|A\|_2 \le \|A\|_F; \\
\|A\|_\infty \le n\,\|A\|_1, & \|A\|_\infty \le \sqrt{n}\,\|A\|_2, & \|A\|_\infty \le \sqrt{n}\,\|A\|_F; \\
\|A\|_F \le \sqrt{n}\,\|A\|_1, & \|A\|_F \le \sqrt{n}\,\|A\|_2, & \|A\|_F \le \sqrt{n}\,\|A\|_\infty.
\end{array}$$
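The inconsistency counterexample of Example 7.46 is easy to verify numerically; the sketch below (NumPy assumed) also checks that the subordinate 2-norm is consistent on the same data:

```python
# Sketch (NumPy assumed): the "mixed" norm ||A||_{1,inf} = max_ij |a_ij|
# is a matrix norm but not consistent, exactly as in Example 7.46.
import numpy as np

A = np.ones((2, 2))   # A = B = [[1, 1], [1, 1]]
B = np.ones((2, 2))

def mixed_norm(M):
    # ||M||_{1,inf} = max_{x != 0} ||Mx||_1 / ||x||_inf = max_ij |m_ij|
    return float(np.max(np.abs(M)))

assert mixed_norm(A @ B) == 2.0               # ||AB||_{1,inf} = 2
assert mixed_norm(A) * mixed_norm(B) == 1.0   # ||A|| ||B|| = 1: consistency fails

# By contrast, the subordinate 2-norm is consistent on the same data:
assert np.linalg.norm(A @ B, 2) <= np.linalg.norm(A, 2) * np.linalg.norm(B, 2) + 1e-12
```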
3. For $A \in \mathbb{R}^{m \times n}$,
$$\max_{i,j} |a_{ij}| \le \|A\|_2 \le \sqrt{mn}\, \max_{i,j} |a_{ij}|.$$

4. The norms $\|\cdot\|_F$ and $\|\cdot\|_2$ (as well as all the Schatten p-norms, but not necessarily other p-norms) are unitarily invariant; i.e., for all $A \in \mathbb{R}^{m \times n}$ and for all orthogonal matrices $Q \in \mathbb{R}^{m \times m}$ and $Z \in \mathbb{R}^{n \times n}$, $\|QAZ\|_\alpha = \|A\|_\alpha$ for $\alpha = 2$ or $F$.

Convergence

The following theorem uses matrix norms to convert a statement about convergence of a sequence of matrices into a statement about the convergence of an associated sequence of scalars.

Theorem 7.49. Let $\|\cdot\|$ be a matrix norm and suppose $A, A^{(1)}, A^{(2)}, \ldots \in \mathbb{R}^{m \times n}$. Then
$$\lim_{k \to +\infty} A^{(k)} = A \quad \text{if and only if} \quad \lim_{k \to +\infty} \|A^{(k)} - A\| = 0.$$
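The unitary-invariance property and Theorem 7.49's norm test for convergence can be illustrated as follows (a sketch with randomly generated matrices; NumPy assumed):

```python
# Sketch (NumPy assumed, random illustrative data): unitary invariance of
# ||.||_2 and ||.||_F, and Theorem 7.49's norm test for convergence.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # orthogonal Q in R^{4x4}
Z, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # orthogonal Z in R^{3x3}

for p in (2, 'fro'):                               # ||Q A Z|| = ||A|| for alpha = 2 or F
    assert np.isclose(np.linalg.norm(Q @ A @ Z, p), np.linalg.norm(A, p))

# A^(k) = A + E/k converges to A, and ||A^(k) - A|| -> 0 accordingly.
E = rng.standard_normal(A.shape)
errors = [np.linalg.norm((A + E / k) - A, 'fro') for k in (1, 10, 100, 1000)]
assert all(e1 > e2 for e1, e2 in zip(errors, errors[1:]))
```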
EXERCISES

1. If $P$ is an orthogonal projection, prove that $P^+ = P$.

2. Suppose $P$ and $Q$ are orthogonal projections and $P + Q = I$. Prove that $P - Q$ must be an orthogonal matrix.

3. Prove that $I - A^+ A$ is an orthogonal projection. Also, prove directly that $V_2 V_2^T$ is an orthogonal projection, where $V_2$ is defined as in Theorem 5.1.

4. Suppose that a matrix $A \in \mathbb{R}^{m \times n}$ has linearly independent columns. Prove that the orthogonal projection onto the space spanned by these column vectors is given by the matrix $P = A(A^T A)^{-1} A^T$.

5. Find the (orthogonal) projection of the vector $[2 \; 3 \; 4]^T$ onto the subspace of $\mathbb{R}^3$ spanned by the plane $3x - y + 2z = 0$.

6. Prove that $\mathbb{R}^{n \times n}$ with the inner product $(A, B) = \mathrm{Tr}\, A^T B$ is a real inner product space.

7. Show that the matrix norms $\|\cdot\|_2$ and $\|\cdot\|_F$ are unitarily invariant.

8. Definition: Let $A \in \mathbb{R}^{n \times n}$ and denote its set of eigenvalues (not necessarily distinct) by $\{\lambda_1, \ldots, \lambda_n\}$. The spectral radius of $A$ is the scalar
$$\rho(A) = \max_i |\lambda_i|.$$
Let
A = [ ~ 0 ~ ]
    [ 14 12 5 ].
Determine $\|A\|_F$, $\|A\|_1$, $\|A\|_2$, $\|A\|_\infty$, and $\rho(A)$.

9. Let
$$A = \begin{bmatrix} 8 & 1 & 6 \\ 3 & 5 & 7 \\ 4 & 9 & 2 \end{bmatrix}.$$
Determine $\|A\|_F$, $\|A\|_1$, $\|A\|_2$, $\|A\|_\infty$, and $\rho(A)$. (An $n \times n$ matrix, all of whose columns and rows as well as main diagonal and antidiagonal sum to $s = n(n^2 + 1)/2$, is called a "magic square" matrix. If $M$ is a magic square matrix, it can be proved that $\|M\|_p = s$ for all $p$.)

10. Let $A = xy^T$, where both $x, y \in \mathbb{R}^n$ are nonzero. Determine $\|A\|_F$, $\|A\|_1$, $\|A\|_2$, and $\|A\|_\infty$ in terms of $\|x\|_\alpha$ and/or $\|y\|_\beta$, where $\alpha$ and $\beta$ take the value 1, 2, or $\infty$ as appropriate.
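The magic-square fact quoted in Exercise 9 can be checked numerically. A sketch, using the classic $3 \times 3$ magic square with line sum $s = 15$ (an illustrative choice; NumPy assumed):

```python
# Sketch (NumPy assumed): ||M||_p = s for a magic square M, checked on the
# classic 3x3 magic square (an illustrative choice) with s = 15.
import numpy as np

M = np.array([[8.0, 1.0, 6.0],
              [3.0, 5.0, 7.0],
              [4.0, 9.0, 2.0]])
s = 15.0

assert np.isclose(np.linalg.norm(M, 1), s)          # maximum column sum
assert np.isclose(np.linalg.norm(M, np.inf), s)     # maximum row sum
assert np.isclose(np.linalg.norm(M, 2), s)          # spectral norm
# rho(M) = 15 too, since M [1,1,1]^T = 15 [1,1,1]^T:
assert np.isclose(np.max(np.abs(np.linalg.eigvals(M))), s)
```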
Chapter 8

Linear Least Squares Problems

8.1 The Linear Least Squares Problem

Problem: Suppose $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and $b \in \mathbb{R}^m$ is a given vector. The linear least squares problem consists of finding an element of the set
$$\mathcal{X} = \{x \in \mathbb{R}^n : \rho(x) = \|Ax - b\|_2 \text{ is minimized}\}.$$

Solution: The set $\mathcal{X}$ has a number of easily verified properties:

1. A vector $x \in \mathcal{X}$ if and only if $A^T r = 0$, where $r = b - Ax$ is the residual associated with $x$. The equations $A^T r = 0$ can be rewritten in the form $A^T A x = A^T b$ and the latter form is commonly known as the normal equations, i.e., $x \in \mathcal{X}$ if and only if $x$ is a solution of the normal equations. For further details, see Section 8.2.

2. A vector $x \in \mathcal{X}$ if and only if $x$ is of the form
$$x = A^+ b + (I - A^+ A) y, \quad \text{where } y \in \mathbb{R}^n \text{ is arbitrary}. \tag{8.1}$$
To see why this must be so, write the residual $r$ in the form
$$r = (b - P_{\mathcal{R}(A)} b) + (P_{\mathcal{R}(A)} b - Ax).$$
Now, $(P_{\mathcal{R}(A)} b - Ax)$ is clearly in $\mathcal{R}(A)$, while
$$(b - P_{\mathcal{R}(A)} b) = (I - P_{\mathcal{R}(A)}) b = P_{\mathcal{R}(A)^\perp} b \in \mathcal{R}(A)^\perp,$$
so these two vectors are orthogonal. Hence,
$$\|r\|_2^2 = \|b - Ax\|_2^2 = \|b - P_{\mathcal{R}(A)} b\|_2^2 + \|P_{\mathcal{R}(A)} b - Ax\|_2^2$$
from the Pythagorean identity (Remark 7.35). Thus, $\|Ax - b\|_2^2$ (and hence $\rho(x) = \|Ax - b\|_2$) assumes its minimum value if and only if
$$Ax = P_{\mathcal{R}(A)} b = A A^+ b, \tag{8.2}$$
and this equation always has a solution since $AA^+ b \in \mathcal{R}(A)$. By Theorem 6.3, all solutions of (8.2) are of the form
$$x = A^+ A A^+ b + (I - A^+ A) y = A^+ b + (I - A^+ A) y,$$
where $y \in \mathbb{R}^n$ is arbitrary. The minimum value of $\rho(x)$ is then clearly equal to
$$\|b - P_{\mathcal{R}(A)} b\|_2 = \|(I - A A^+) b\|_2 \le \|b\|_2,$$
the last inequality following by Theorem 7.23.

3. $\mathcal{X}$ is convex. To see why, consider two arbitrary vectors $x_1 = A^+ b + (I - A^+ A) y$ and $x_2 = A^+ b + (I - A^+ A) z$ in $\mathcal{X}$. Let $\theta \in [0, 1]$. Then the convex combination $\theta x_1 + (1 - \theta) x_2 = A^+ b + (I - A^+ A)(\theta y + (1 - \theta) z)$ is clearly in $\mathcal{X}$.

4. $\mathcal{X}$ has a unique element $x^*$ of minimal 2-norm. In fact, $x^* = A^+ b$ is the unique vector that solves this "double minimization" problem, i.e., $x^*$ minimizes the residual $\rho(x)$ and is the vector of minimum 2-norm that does so. This follows immediately from convexity or directly from the fact that all $x \in \mathcal{X}$ are of the form (8.1) and
$$\|x\|_2^2 = \|A^+ b\|_2^2 + \|(I - A^+ A) y\|_2^2,$$
which follows since the two vectors are orthogonal.

5. There is a unique solution to the least squares problem, i.e., $\mathcal{X} = \{x^*\} = \{A^+ b\}$, if and only if $A^+ A = I$ or, equivalently, if and only if $\mathrm{rank}(A) = n$.
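The solution-set characterization above can be illustrated numerically. A sketch of property 2 and equation (8.1), using a deliberately rank-deficient $A$ (made-up illustrative data; NumPy's `pinv` plays the role of $A^+$):

```python
# Sketch of property 2 / equation (8.1) (NumPy assumed; A, b are made-up
# illustrative data, with A rank deficient so the solution set is not a singleton).
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])          # rank(A) = 1
b = np.array([1.0, 0.0, 1.0])

A_pinv = np.linalg.pinv(A)          # numerical stand-in for A^+
x_star = A_pinv @ b                 # minimum-2-norm least squares solution
P_null = np.eye(2) - A_pinv @ A     # projector onto N(A)

rho_min = np.linalg.norm(A @ x_star - b)
for _ in range(5):
    x = x_star + P_null @ rng.standard_normal(2)   # x = A^+ b + (I - A^+ A) y
    assert np.isclose(np.linalg.norm(A @ x - b), rho_min)       # same residual
    assert np.linalg.norm(x) >= np.linalg.norm(x_star) - 1e-12  # x* has minimal norm
```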
Just as for the solution of linear equations, we can generalize the linear least squares problem to the matrix case.

Theorem 8.1. Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times k}$. The general solution to
$$\min_{X \in \mathbb{R}^{n \times k}} \|AX - B\|_2$$
is of the form
$$X = A^+ B + (I - A^+ A) Y,$$
where $Y \in \mathbb{R}^{n \times k}$ is arbitrary. The unique solution of minimum 2-norm or F-norm is $X = A^+ B$.

Remark 8.2. Notice that solutions of the linear least squares problem look exactly the same as solutions of the linear system $AX = B$. The only difference is that in the case of linear least squares solutions, there is no "existence condition" such as $\mathcal{R}(B) \subseteq \mathcal{R}(A)$. If the existence condition happens to be satisfied, then equality holds and the least squares
residual is 0. Of all solutions that give a residual of 0, the unique solution $X = A^+ B$ has minimum 2-norm or F-norm.

Remark 8.3. If we take $B = I_m$ in Theorem 8.1, then $X = A^+$ can be interpreted as saying that the Moore–Penrose pseudoinverse of $A$ is the best (in the matrix 2-norm sense) matrix such that $AX$ approximates the identity.

Remark 8.4. Many other interesting and useful approximation results are available for the matrix 2-norm (and F-norm). One such is the following. Let $A \in \mathbb{R}_r^{m \times n}$ with SVD
$$A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T.$$
Then a best rank $k$ approximation to $A$ for $1 \le k \le r$, i.e., a solution to
$$\min_{M \in \mathbb{R}_k^{m \times n}} \|A - M\|_2,$$
is given by
$$M_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T.$$
The special case in which $m = n$ and $k = n - 1$ gives a nearest singular matrix to $A \in \mathbb{R}_n^{n \times n}$.
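The best rank-k approximation of Remark 8.4 can be sketched as follows (random illustrative data; NumPy assumed); the 2-norm error is the first discarded singular value:

```python
# Sketch of Remark 8.4 (NumPy assumed; random illustrative data): the
# truncated SVD M_k is a best rank-k approximation, with error sigma_{k+1}.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Mk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # M_k = sum_{i=1}^{k} sigma_i u_i v_i^T

assert np.linalg.matrix_rank(Mk) == k
assert np.isclose(np.linalg.norm(A - Mk, 2), s[k])   # ||A - M_k||_2 = sigma_{k+1}
```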
8.2 Geometric Solution

Looking at the schematic provided in Figure 8.1, it is apparent that minimizing $\|Ax - b\|_2$ is equivalent to finding the vector $x \in \mathbb{R}^n$ for which $p = Ax$ is closest to $b$ (in the Euclidean norm sense). Clearly, $r = b - Ax$ must be orthogonal to $\mathcal{R}(A)$. Thus, if $Ay$ is an arbitrary vector in $\mathcal{R}(A)$ (i.e., $y$ is arbitrary), we must have
$$0 = (Ay)^T (b - Ax) = y^T A^T (b - Ax) = y^T (A^T b - A^T A x).$$
Since $y$ is arbitrary, we must have $A^T b - A^T A x = 0$ or $A^T A x = A^T b$.

Special case: If $A$ is full (column) rank, then $x = (A^T A)^{-1} A^T b$.

8.3 Linear Regression and Other Linear Least Squares Problems

8.3.1 Example: Linear regression

Suppose we have $m$ measurements $(t_1, y_1), \ldots, (t_m, y_m)$ for which we hypothesize a linear (affine) relationship
$$y = \alpha t + \beta \tag{8.3}$$
Figure 8.1. Projection of b on $\mathcal{R}(A)$.

for certain constants $\alpha$ and $\beta$. One way to solve this problem is to find the line that best fits the data in the least squares sense; i.e., with the model (8.3), we have
$$y_1 = \alpha t_1 + \beta + \delta_1, \quad y_2 = \alpha t_2 + \beta + \delta_2, \quad \ldots, \quad y_m = \alpha t_m + \beta + \delta_m,$$
where $\delta_1, \ldots, \delta_m$ are "errors" and we wish to minimize $\delta_1^2 + \cdots + \delta_m^2$. Geometrically, we are trying to find the best line that minimizes the (sum of squares of the) distances from the given data points. See, for example, Figure 8.2.

Figure 8.2. Simple linear regression.

Note that distances are measured in the vertical sense from the points to the line (as indicated, for example, for the point $(t_1, y_1)$). However, other criteria are possible. For example, one could measure the distances in the horizontal sense, or the perpendicular distance from the points to the line could be used. The latter is called total least squares. Instead of 2-norms, one could also use 1-norms or $\infty$-norms. The latter two are computationally
much more difficult to handle, and thus we present only the more tractable 2-norm case in the text that follows.

The $m$ "error equations" can be written in matrix form as
$$y = Ax + \delta,$$
where
$$A = \begin{bmatrix} t_1 & 1 \\ \vdots & \vdots \\ t_m & 1 \end{bmatrix}, \quad x = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}, \quad \delta = \begin{bmatrix} \delta_1 \\ \vdots \\ \delta_m \end{bmatrix}.$$
We then want to solve the problem
$$\min_x \delta^T \delta = \min_x (Ax - y)^T (Ax - y)$$
or, equivalently,
$$\min_x \|\delta\|_2 = \min_x \|Ax - y\|_2. \tag{8.4}$$

Solution: $x = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$ is a solution of the normal equations $A^T A x = A^T y$ where, for the special form of the matrices above, we have
$$A^T A = \begin{bmatrix} \sum_i t_i^2 & \sum_i t_i \\ \sum_i t_i & m \end{bmatrix}$$
and
$$A^T y = \begin{bmatrix} \sum_i t_i y_i \\ \sum_i y_i \end{bmatrix}.$$
The solution for the parameters $\alpha$ and $\beta$ can then be written
$$\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = (A^T A)^{-1} A^T y$$
(assuming at least two of the $t_i$ are distinct, so that $A^T A$ is nonsingular).

8.3.2 Other least squares problems

Suppose the hypothesized model is not the linear equation (8.3) but rather is of the form
$$y = f(t) = c_1 \phi_1(t) + \cdots + c_n \phi_n(t). \tag{8.5}$$
In (8.5) the $\phi_i(t)$ are given (basis) functions and the $c_i$ are constants to be determined to minimize the least squares error. The matrix problem is still (8.4), where we now have
$$A = \begin{bmatrix} \phi_1(t_1) & \cdots & \phi_n(t_1) \\ \vdots & & \vdots \\ \phi_1(t_m) & \cdots & \phi_n(t_m) \end{bmatrix}.$$
An important special case of (8.5) is least squares polynomial approximation, which corresponds to choosing $\phi_i(t) = t^{i-1}$, $i \in \underline{n}$, although this choice can lead to computational
difficulties because of numerical ill conditioning for large $n$. Numerically better approaches are based on orthogonal polynomials, piecewise polynomial functions, splines, etc.

The key feature in (8.5) is that the coefficients $c_i$ appear linearly. The basis functions $\phi_i$ can be arbitrarily nonlinear. Sometimes a problem in which the $c_i$'s appear nonlinearly can be converted into a linear problem. For example, if the fitting function is of the form $y = f(t) = c_1 e^{c_2 t}$, then taking logarithms yields the equation $\log y = \log c_1 + c_2 t$. Then defining $\tilde{y} = \log y$, $\tilde{c}_1 = \log c_1$, and $\tilde{c}_2 = c_2$ results in a standard linear least squares problem.
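The regression recipe of Section 8.3.1 can be sketched as follows (the measurements are made-up data, roughly following $y = 2t + 1$; NumPy assumed). For comparison, the same solution is computed by a solver that works on $A$ directly rather than on $A^T A$:

```python
# Sketch of the linear regression recipe (NumPy assumed; made-up data).
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

A = np.column_stack([t, np.ones_like(t)])   # rows [t_i, 1]
AtA = A.T @ A          # [[sum t_i^2, sum t_i], [sum t_i, m]]
Aty = A.T @ y          # [sum t_i y_i, sum y_i]
alpha, beta = np.linalg.solve(AtA, Aty)     # normal equations A^T A x = A^T y

# Cross-check with a solver that avoids forming A^T A explicitly
# (cf. Section 8.4 on the numerical drawbacks of the normal equations):
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose([alpha, beta], x_lstsq)
```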
8.4 Least Squares and Singular Value Decomposition

In the numerical linear algebra literature (e.g., [4], [7], [11], [23]), it is shown that solution of linear least squares problems via the normal equations can be a very poor numerical method in finite-precision arithmetic. Since the standard Kalman filter essentially amounts to sequential updating of normal equations, it can be expected to exhibit such poor numerical behavior in practice (and it does). Better numerical methods are based on algorithms that work directly and solely on $A$ itself rather than $A^T A$. Two basic classes of algorithms are based on SVD and QR (orthogonal-upper triangular) factorization, respectively. The former is much more expensive but is generally more reliable and offers considerable theoretical insight.

In this section we investigate solution of the linear least squares problem
$$\min_x \|Ax - b\|_2, \quad A \in \mathbb{R}^{m \times n}, \; b \in \mathbb{R}^m, \tag{8.6}$$
via the SVD. Specifically, we assume that $A$ has an SVD given by $A = U \Sigma V^T = U_1 S V_1^T$ as in Theorem 5.1. We now note that
$$\begin{aligned}
\|Ax - b\|_2^2 &= \|U \Sigma V^T x - b\|_2^2 \\
&= \|\Sigma V^T x - U^T b\|_2^2 \quad \text{since } \|\cdot\|_2 \text{ is unitarily invariant} \\
&= \|\Sigma z - c\|_2^2, \quad \text{where } z = V^T x, \; c = U^T b \\
&= \left\| \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} - \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \right\|_2^2 \\
&= \left\| \begin{bmatrix} S z_1 - c_1 \\ -c_2 \end{bmatrix} \right\|_2^2 \\
&= \|S z_1 - c_1\|_2^2 + \|c_2\|_2^2.
\end{aligned}$$
The last equality follows from the fact that if $v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$, then $\|v\|_2^2 = \|v_1\|_2^2 + \|v_2\|_2^2$ (note that orthogonality is not what is used here; the subvectors can have different lengths). This explains why it is convenient to work above with the square of the norm rather than the norm. As far as the minimization is concerned, the two are equivalent. In fact, the last quantity above is clearly minimized by taking $z_1 = S^{-1} c_1$. The subvector $z_2$ is arbitrary, while the minimum value of $\|Ax - b\|_2^2$ is $\|c_2\|_2^2$.
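The recipe above ($z_1 = S^{-1} c_1$, followed by transforming back to $x$-coordinates) can be sketched as follows (random full-column-rank data; NumPy assumed):

```python
# Sketch of the SVD recipe (NumPy assumed; random full-column-rank data):
# with A = U1 S V1^T, take z1 = S^{-1} c1 where c1 = U1^T b.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))     # full column rank with probability 1
b = rng.standard_normal(6)

U1, s, V1t = np.linalg.svd(A, full_matrices=False)   # "thin" SVD: A = U1 S V1^T
c1 = U1.T @ b
z1 = c1 / s                          # z1 = S^{-1} c1
x = V1t.T @ z1                       # transform back: x = V1 z1 = A^+ b here

assert np.allclose(x, np.linalg.pinv(A) @ b)
# minimum residual ||Ax - b||_2 = ||(I - A A^+) b||_2:
assert np.isclose(np.linalg.norm(A @ x - b),
                  np.linalg.norm(b - U1 @ (U1.T @ b)))
```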
Now transform back to the original coordinates:
$$x = Vz = [V_1 \; V_2] \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = V_1 z_1 + V_2 z_2 = V_1 S^{-1} c_1 + V_2 z_2 = V_1 S^{-1} U_1^T b + V_2 z_2.$$
The last equality follows from
$$c = U^T b = \begin{bmatrix} U_1^T b \\ U_2^T b \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$
Note that since $z_2$ is arbitrary, $V_2 z_2$ is an arbitrary vector in $\mathcal{R}(V_2) = \mathcal{N}(A)$. Thus, $x$ has been written in the form $x = A^+ b + (I - A^+ A) y$, where $y \in \mathbb{R}^n$ is arbitrary. This agrees, of course, with (8.1).

The minimum value of the least squares residual is
$$\|c_2\|_2 = \|U_2^T b\|_2,$$
and we clearly have that

minimum least squares residual is 0 $\iff$ $b$ is orthogonal to all vectors in $U_2$
$\iff$ $b$ is orthogonal to all vectors in $\mathcal{R}(A)^\perp$
$\iff$ $b \in \mathcal{R}(A)$.

Another expression for the minimum residual is $\|(I - AA^+) b\|_2$. This follows easily since
$$\|(I - AA^+) b\|_2^2 = \|U_2 U_2^T b\|_2^2 = b^T U_2 U_2^T U_2 U_2^T b = b^T U_2 U_2^T b = \|U_2^T b\|_2^2.$$

Finally, an important special case of the linear least squares problem is the so-called full-rank problem, i.e., $A \in \mathbb{R}_n^{m \times n}$. In this case the SVD of $A$ is given by $A = U \Sigma V^T = [U_1 \; U_2] \begin{bmatrix} S \\ 0 \end{bmatrix} V_1^T$, and there is thus "no $V_2$ part" to the solution.

8.5 Least Squares and QR Factorization

In this section, we again look at the solution of the linear least squares problem (8.6) but this time in terms of the QR factorization. This matrix factorization is much cheaper to compute than an SVD and, with appropriate numerical enhancements, can be quite reliable.

To simplify the exposition, we add the simplifying assumption that $A$ has full column rank, i.e., $A \in \mathbb{R}_n^{m \times n}$. It is then possible, via a sequence of so-called Householder or Givens transformations, to reduce $A$ in the following way. A finite sequence of simple orthogonal row transformations (of Householder or Givens type) can be performed on $A$ to reduce it to triangular form. If we label the product of such orthogonal row transformations as the orthogonal matrix $Q^T \in \mathbb{R}^{m \times m}$, we have
$$Q^T A = \begin{bmatrix} R \\ 0 \end{bmatrix}, \tag{8.7}$$
72 Chapter 8. Linear Least Squares Problems
where R E is upper triangular. Now write Q = [QI Qz], where QI E ffi.mxn and
Qz E ffi.m x (mn). Both Q I and Qz have orthonormal columns. Multiplying through by Q
in (8.7), we see that
(8.8)
= [QI Qz] [ ]
= QIR.
(8.9)
Any of (8.7), (8.8), or (8.9) are variously referred to as QR factorizations of A. Note that
(8.9) is essentially what is accomplished by the GramSchmidt process, i.e., by writing
AR
1
= QI we see that a "triangular" linear combination (given by the coefficients of
R
I
) of the columns of A yields the orthonormal columns of Q I.
Now note that
IIAx  = IIQ
T
Ax  since II . 112 is unitarily invariant
= II [ ] x  [ ] If:,
The last quantity above is clearly minimized by taking x = R
I
Cl and the minimum residual
is Ilczllz. Equivalently, we have x = R
1
Qf b = A +b and the minimum residual is II Qr bllz'
EXERCISES

1. For $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and any $y \in \mathbb{R}^n$, check directly that $(I - A^+ A) y$ and $A^+ b$ are orthogonal vectors.

2. Consider the following set of measurements $(x_i, y_i)$:
$$(1, 2), \; (2, 1), \; (3, 3).$$
(a) Find the best (in the 2-norm sense) line of the form $y = \alpha x + \beta$ that fits this data.
(b) Find the best (in the 2-norm sense) line of the form $x = \alpha y + \beta$ that fits this data.

3. Suppose $q_1$ and $q_2$ are two orthonormal vectors and $b$ is a fixed vector, all in $\mathbb{R}^n$.
(a) Find the optimal linear combination $\alpha q_1 + \beta q_2$ that is closest to $b$ (in the 2-norm sense).
(b) Let $r$ denote the "error vector" $b - \alpha q_1 - \beta q_2$. Show that $r$ is orthogonal to both $q_1$ and $q_2$.
Exercises 73

4. Find all solutions of the linear least squares problem

       min_x ||Ax - b||_2

   when A = [ ].

5. Consider the problem of finding the minimum 2-norm solution of the linear least squares problem

       min_x ||Ax - b||_2

   when A = [ ] and b = [ ]. The solution is [ ].
   (a) Consider a perturbation E_1 = [ ] of A, where δ is a small positive number. Solve the perturbed version of the above problem,

           min_y ||A_1 y - b||_2,

       where A_1 = A + E_1. What happens to ||x* - y||_2 as δ approaches 0?
   (b) Now consider the perturbation E_2 = [ ] of A, where again δ is a small positive number. Solve the perturbed problem

           min_z ||A_2 z - b||_2,

       where A_2 = A + E_2. What happens to ||x* - z||_2 as δ approaches 0?

6. Use the four Penrose conditions and the fact that Q_1 has orthonormal columns to verify that if A ∈ R^{m×n} can be factored in the form (8.9), then A^+ = R^{-1} Q_1^T.

7. Let A ∈ R^{n×n}, not necessarily nonsingular, and suppose A = QR, where Q is orthogonal. Prove that A^+ = R^+ Q^T.
Chapter 9

Eigenvalues and Eigenvectors

9.1 Fundamental Definitions and Properties

Definition 9.1. A nonzero vector x ∈ C^n is a right eigenvector of A ∈ C^{n×n} if there exists a scalar λ ∈ C, called an eigenvalue, such that

    Ax = λx.    (9.1)

Similarly, a nonzero vector y ∈ C^n is a left eigenvector corresponding to an eigenvalue μ if

    y^H A = μ y^H.    (9.2)

By taking Hermitian transposes in (9.1), we see immediately that x^H is a left eigenvector of A^H associated with λ̄. Note that if x [y] is a right [left] eigenvector of A, then so is αx [αy] for any nonzero scalar α ∈ C. One often-used scaling for an eigenvector is α = 1/||x|| so that the scaled eigenvector has norm 1. The 2-norm is the most common norm used for such scaling.

Definition 9.2. The polynomial π(λ) = det(A - λI) is called the characteristic polynomial of A. (Note that the characteristic polynomial can also be defined as det(λI - A). This results in at most a change of sign and, as a matter of convenience, we use both forms throughout the text.)

The following classical theorem can be very useful in hand calculation. It can be proved easily from the Jordan canonical form to be discussed in the text to follow (see, for example, [21]) or directly using elementary properties of inverses and determinants (see, for example, [3]).

Theorem 9.3 (Cayley-Hamilton). For any A ∈ C^{n×n}, π(A) = 0.

Example 9.4. Let A = [ ]. Then π(λ) = λ^2 + 2λ - 3. It is an easy exercise to verify that π(A) = A^2 + 2A - 3I = 0.
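The Cayley-Hamilton property is easy to confirm numerically. The entries of the matrix in Example 9.4 are illegible in this scan, so the sketch below (NumPy assumed) substitutes the companion matrix of λ^2 + 2λ - 3, which has the same characteristic polynomial:

```python
import numpy as np

# Companion matrix of pi(lambda) = lambda^2 + 2*lambda - 3 (a stand-in for the
# illegible matrix of Example 9.4; any matrix with this characteristic polynomial works).
A = np.array([[0.0, 1.0],
              [3.0, -2.0]])

# Characteristic polynomial coefficients, leading coefficient 1: [1, 2, -3].
coeffs = np.poly(A)

# Evaluate pi(A) = A^2 + 2A - 3I; Cayley-Hamilton says this is the zero matrix.
piA = A @ A + 2 * A - 3 * np.eye(2)
```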
It can be proved from elementary properties of determinants that if A ∈ C^{n×n}, then π(λ) is a polynomial of degree n. Thus, the Fundamental Theorem of Algebra says that
76 Chapter 9. Eigenvalues and Eigenvectors

π(λ) has n roots, possibly repeated. These roots, as solutions of the determinant equation

    π(λ) = det(A - λI) = 0,    (9.3)

are the eigenvalues of A and imply the singularity of the matrix A - λI, and hence further guarantee the existence of corresponding nonzero eigenvectors.

Definition 9.5. The spectrum of A ∈ C^{n×n} is the set of all eigenvalues of A, i.e., the set of all roots of its characteristic polynomial π(λ). The spectrum of A is denoted Λ(A).

Let the eigenvalues of A ∈ C^{n×n} be denoted λ_1, ..., λ_n. Then if we write (9.3) in the form

    π(λ) = det(A - λI) = (λ_1 - λ) ··· (λ_n - λ)    (9.4)

and set λ = 0 in this identity, we get the interesting fact that det(A) = λ_1 · λ_2 ··· λ_n (see also Theorem 9.25).

If A ∈ R^{n×n}, then π(λ) has real coefficients. Hence the roots of π(λ), i.e., the eigenvalues of A, must occur in complex conjugate pairs.

Example 9.6. Let α, β ∈ R and let A = [α β; -β α]. Then π(λ) = λ^2 - 2αλ + α^2 + β^2 and A has eigenvalues α ± βj (where j = i = √(-1)).

If A ∈ R^{n×n}, then there is an easily checked relationship between the left and right eigenvectors of A and A^T (take Hermitian transposes of both sides of (9.2)). Specifically, if y is a left eigenvector of A corresponding to λ ∈ Λ(A), then y is a right eigenvector of A^T corresponding to λ̄ ∈ Λ(A). Note, too, that by elementary properties of the determinant, we always have Λ(A) = Λ(A^T), but that Λ(A) = Λ(A^H) only if A ∈ R^{n×n}.

Definition 9.7. If λ is a root of multiplicity m of π(λ), we say that λ is an eigenvalue of A of algebraic multiplicity m. The geometric multiplicity of λ is the number of associated independent eigenvectors = n - rank(A - λI) = dim N(A - λI).

If λ ∈ Λ(A) has algebraic multiplicity m, then 1 ≤ dim N(A - λI) ≤ m. Thus, if we denote the geometric multiplicity of λ by g, then we must have 1 ≤ g ≤ m.
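Definition 9.7 translates directly into a rank computation. A sketch, assuming NumPy (the helper name is this note's own, not the book's):

```python
import numpy as np

def geometric_multiplicity(A, lam):
    """Number of independent eigenvectors for eigenvalue lam: n - rank(A - lam*I)."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam * np.eye(n))

# A 3x3 Jordan block with eigenvalue 2: algebraic multiplicity 3, geometric multiplicity 1.
J = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])
g = geometric_multiplicity(J, 2.0)
```

For the identity matrix, by contrast, the two multiplicities coincide (g = m = n).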
Definition 9.8. A matrix A ∈ R^{n×n} is said to be defective if it has an eigenvalue whose geometric multiplicity is not equal to (i.e., less than) its algebraic multiplicity. Equivalently, A is said to be defective if it does not have n linearly independent (right or left) eigenvectors.

From the Cayley-Hamilton Theorem, we know that π(A) = 0. However, it is possible for A to satisfy a lower-order polynomial. For example, if A = [1 0; 0 1], then A satisfies (λ - 1)^2 = 0. But it also clearly satisfies the smaller degree polynomial equation (λ - 1) = 0.

Definition 9.9. The minimal polynomial of A ∈ R^{n×n} is the polynomial α(λ) of least degree such that α(A) = 0.

It can be shown that α(λ) is essentially unique (unique if we force the coefficient of the highest power of λ to be +1, say; such a polynomial is said to be monic and we generally write α(λ) as a monic polynomial throughout the text). Moreover, it can also be
9.1. Fundamental Definitions and Properties 77

shown that α(λ) divides every nonzero polynomial β(λ) for which β(A) = 0. In particular, α(λ) divides π(λ).

There is an algorithm to determine α(λ) directly (without knowing eigenvalues and associated eigenvector structure). Unfortunately, this algorithm, called the Bezout algorithm, is numerically unstable.

Example 9.10. The above definitions are illustrated below for a series of matrices, each of which has an eigenvalue 2 of algebraic multiplicity 4, i.e., π(λ) = (λ - 2)^4. We denote the geometric multiplicity by g.

    A = [2 1 0 0;
         0 2 1 0;
         0 0 2 1;
         0 0 0 2]  has α(λ) = (λ - 2)^4 and g = 1.

    A = [2 1 0 0;
         0 2 1 0;
         0 0 2 0;
         0 0 0 2]  has α(λ) = (λ - 2)^3 and g = 2.

    A = [2 1 0 0;
         0 2 0 0;
         0 0 2 0;
         0 0 0 2]  has α(λ) = (λ - 2)^2 and g = 3.

    A = [2 0 0 0;
         0 2 0 0;
         0 0 2 0;
         0 0 0 2]  has α(λ) = (λ - 2) and g = 4.

At this point, one might speculate that g plus the degree of α must always be five. Unfortunately, such is not the case. The matrix

    A = [2 1 0 0;
         0 2 0 0;
         0 0 2 1;
         0 0 0 2]

has α(λ) = (λ - 2)^2 and g = 2.
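The counterexample can be checked directly: for this matrix (two 2×2 Jordan blocks at λ = 2), (A - 2I)^2 vanishes while A - 2I does not, so α(λ) = (λ - 2)^2, and the rank computation gives g = 2, so g plus the degree of α is 4, not 5. A sketch, assuming NumPy:

```python
import numpy as np

# Two 2x2 Jordan blocks with eigenvalue 2 (the counterexample above).
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 0.0, 2.0]])
N = A - 2 * np.eye(4)

g = 4 - np.linalg.matrix_rank(N)   # geometric multiplicity
# Minimal polynomial is (lambda - 2)^2: N^2 = 0 but N != 0.
min_poly_degree_two = np.allclose(N @ N, 0) and not np.allclose(N, 0)
```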
Theorem 9.11. Let A ∈ C^{n×n} and let λ_i be an eigenvalue of A with corresponding right eigenvector x_i. Furthermore, let y_j be a left eigenvector corresponding to any λ_j ∈ Λ(A) such that λ_j ≠ λ_i. Then y_j^H x_i = 0.

Proof: Since Ax_i = λ_i x_i,

    y_j^H A x_i = λ_i y_j^H x_i.    (9.5)
78 Chapter 9. Eigenvalues and Eigenvectors

Similarly, since y_j^H A = λ_j y_j^H,

    y_j^H A x_i = λ_j y_j^H x_i.    (9.6)

Subtracting (9.6) from (9.5), we find 0 = (λ_i - λ_j) y_j^H x_i. Since λ_i - λ_j ≠ 0, we must have y_j^H x_i = 0. □

The proof of Theorem 9.11 is very similar to two other fundamental and important results.

Theorem 9.12. Let A ∈ C^{n×n} be Hermitian, i.e., A = A^H. Then all eigenvalues of A must be real.

Proof: Suppose (λ, x) is an arbitrary eigenvalue/eigenvector pair such that Ax = λx. Then

    x^H A x = λ x^H x.    (9.7)

Taking Hermitian transposes in (9.7) yields

    x^H A^H x = λ̄ x^H x.

Using the fact that A is Hermitian, we have that λ̄ x^H x = λ x^H x. However, since x is an eigenvector, we have x^H x ≠ 0, from which we conclude λ̄ = λ, i.e., λ is real. □

Theorem 9.13. Let A ∈ C^{n×n} be Hermitian and suppose λ and μ are distinct eigenvalues of A with corresponding right eigenvectors x and z, respectively. Then x and z must be orthogonal.

Proof: Premultiply the equation Ax = λx by z^H to get z^H A x = λ z^H x. Take the Hermitian transpose of this equation and use the facts that A is Hermitian and λ is real to get x^H A z = λ x^H z. Premultiply the equation Az = μz by x^H to get x^H A z = μ x^H z = λ x^H z. Since λ ≠ μ, we must have that x^H z = 0, i.e., the two vectors must be orthogonal. □
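Both Hermitian facts are visible numerically in any eigensolver specialized to Hermitian matrices. A sketch, assuming NumPy; the particular matrix is an arbitrary Hermitian example:

```python
import numpy as np

# A Hermitian matrix: A = A^H, with distinct eigenvalues 1 and 4.
A = np.array([[2.0, 1 + 1j],
              [1 - 1j, 3.0]])

# eigh is specialized for Hermitian matrices and returns real eigenvalues (Theorem 9.12).
evals, evecs = np.linalg.eigh(A)

# Theorem 9.13: eigenvectors for distinct eigenvalues are orthogonal
# (eigh returns an orthonormal set, so the Gram matrix is the identity).
gram = evecs.conj().T @ evecs
```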
Let us now return to the general case.

Theorem 9.14. Let A ∈ C^{n×n} have distinct eigenvalues λ_1, ..., λ_n with corresponding right eigenvectors x_1, ..., x_n. Then {x_1, ..., x_n} is a linearly independent set. The same result holds for the corresponding left eigenvectors.

Proof: For the proof see, for example, [21, p. 118]. □

If A ∈ C^{n×n} has distinct eigenvalues, and if λ_i ∈ Λ(A), then by Theorem 9.11, x_i is orthogonal to all y_j's for which j ≠ i. However, it cannot be the case that y_i^H x_i = 0 as well, or else x_i would be orthogonal to n linearly independent vectors (by Theorem 9.14) and would thus have to be 0, contradicting the fact that it is an eigenvector. Since y_i^H x_i ≠ 0 for each i, we can choose the normalization of the x_i's, or the y_i's, or both, so that y_i^H x_i = 1 for i ∈ n̲.
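This normalization can be carried out numerically: right eigenvectors come from A, left eigenvectors are right eigenvectors of A^H, and after scaling, Y^H X is the identity. A sketch, assuming NumPy (SciPy's `scipy.linalg.eig` can also return left eigenvectors directly); the matrix is an arbitrary example with distinct eigenvalues:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])        # distinct eigenvalues 1, 3, 5

evals, X = np.linalg.eig(A)            # right eigenvectors as columns of X
evals_l, Y = np.linalg.eig(A.conj().T) # left eigenvectors of A (eigenvalues conjugated)

# Reorder the columns of Y so the i-th left eigenvector pairs with the i-th right one.
order = [int(np.argmin(np.abs(evals_l.conj() - lam))) for lam in evals]
Y = Y[:, order]

# Scale each y_i so that y_i^H x_i = 1.
d = np.diag(Y.conj().T @ X)
Y = Y / d.conj()

# Theorem 9.11 plus the normalization: Y^H X should be the identity.
gram = Y.conj().T @ X
```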
9.1. Fundamental Definitions and Properties 79

Theorem 9.15. Let A ∈ C^{n×n} have distinct eigenvalues λ_1, ..., λ_n and let the corresponding right eigenvectors form a matrix X = [x_1, ..., x_n]. Similarly, let Y = [y_1, ..., y_n] be the matrix of corresponding left eigenvectors. Furthermore, suppose that the left and right eigenvectors have been normalized so that y_i^H x_i = 1, i ∈ n̲. Finally, let Λ = diag(λ_1, ..., λ_n) ∈ C^{n×n}. Then Ax_i = λ_i x_i, i ∈ n̲, can be written in matrix form as

    AX = XΛ    (9.8)

while y_i^H x_j = δ_ij, i ∈ n̲, j ∈ n̲, is expressed by the equation

    Y^H X = I.    (9.9)

These matrix equations can be combined to yield the following matrix factorizations:

    X^{-1} A X = Λ = Y^H A X    (9.10)

and

    A = X Λ X^{-1} = X Λ Y^H = Σ_{i=1}^n λ_i x_i y_i^H.    (9.11)

Example 9.16. Let

    A = [ ].

Then π(λ) = det(A - λI) = -(λ^3 + 4λ^2 + 9λ + 10) = -(λ + 2)(λ^2 + 2λ + 5), from which we find Λ(A) = {-2, -1 ± 2j}. We can now find the right and left eigenvectors corresponding to these eigenvalues.

For λ_1 = -2, solve the 3 × 3 linear system (A - (-2)I)x_1 = 0 to get

    x_1 = [ ].

Note that one component of x_1 can be set arbitrarily, and this then determines the other two (since dim N(A - (-2)I) = 1). To get the corresponding left eigenvector y_1, solve the linear system y_1^H (A + 2I) = 0 to get

    y_1 = [ ].

This time we have chosen the arbitrary scale factor for y_1 so that y_1^H x_1 = 1.

For λ_2 = -1 + 2j, solve the linear system (A - (-1 + 2j)I)x_2 = 0 to get

    x_2 = [3 + j; 3 - j; 2].
80 Chapter 9. Eigenvalues and Eigenvectors

Solve the linear system y_2^H (A - (-1 + 2j)I) = 0 and normalize y_2 so that y_2^H x_2 = 1 to get

    y_2 = [ ].

For λ_3 = -1 - 2j, we could proceed to solve linear systems as for λ_2. However, we can also note that x_3 = x̄_2 and y_3 = ȳ_2. To see this, use the fact that λ_3 = λ̄_2 and simply conjugate the equation Ax_2 = λ_2 x_2 to get Ax̄_2 = λ̄_2 x̄_2. A similar argument yields the result for left eigenvectors.

Now define the matrix X of right eigenvectors:

    X = [x_1 x_2 x_3].

It is then easy to verify that

    X^{-1} = Y^H = [ ].

Other results in Theorem 9.15 can also be verified. For example,

    X^{-1} A X = Λ = [-2 0 0; 0 -1+2j 0; 0 0 -1-2j].

Finally, note that we could have solved directly only for x_1 and x_2 (and x_3 = x̄_2). Then, instead of determining the y_i's directly, we could have found them instead by computing X^{-1} and reading off its rows.

Example 9.17. Let

    A = [ ].

Then π(λ) = det(A - λI) = -(λ^3 + 8λ^2 + 19λ + 12) = -(λ + 1)(λ + 3)(λ + 4), from which we find Λ(A) = {-1, -3, -4}. Proceeding as in the previous example, it is straightforward to compute

    X = [ ]

and

    X^{-1} = Y^H = [ ].
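The device of reading the normalized left eigenvectors off the rows of X^{-1} is easy to confirm numerically. The specific matrices of Examples 9.16 and 9.17 are illegible in this scan, so the sketch below (NumPy assumed) uses an arbitrary diagonalizable matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])   # eigenvalues 5 and 2

evals, X = np.linalg.eig(A)
Yh = np.linalg.inv(X)        # rows of X^{-1} are the normalized left eigenvectors y_i^H

# Each row satisfies y_i^H A = lambda_i y_i^H (equivalently X^{-1} A = Lambda X^{-1}),
# and Y^H X = I holds automatically, so no separate normalization step is needed.
resid = Yh @ A - np.diag(evals) @ Yh
```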
9.1. Fundamental Definitions and Properties 81

We also have X^{-1} A X = Λ = diag(-1, -3, -4), which is equivalent to the dyadic expansion

    A = Σ_{i=1}^3 λ_i x_i y_i^H = (-1) x_1 y_1^H + (-3) x_2 y_2^H + (-4) x_3 y_3^H.

Theorem 9.18. Eigenvalues (but not eigenvectors) are invariant under a similarity transformation T.

Proof: Suppose (λ, x) is an eigenvalue/eigenvector pair such that Ax = λx. Then, since T is nonsingular, we have the equivalent statement (T^{-1} A T)(T^{-1} x) = λ (T^{-1} x), from which the theorem statement follows. For left eigenvectors we have a similar statement, namely y^H A = λ y^H if and only if (T^H y)^H (T^{-1} A T) = λ (T^H y)^H. □
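Theorem 9.18 is likewise easy to confirm numerically: the spectrum survives any similarity transformation, even though the eigenvectors are mapped by T^{-1}. A sketch, assuming NumPy; the random matrices are arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
T = rng.standard_normal((4, 4))     # generically nonsingular

B = np.linalg.inv(T) @ A @ T        # similar to A

# Same eigenvalues; compare as sorted multisets since eig's ordering is arbitrary.
eA = np.sort_complex(np.linalg.eig(A)[0])
eB = np.sort_complex(np.linalg.eig(B)[0])
```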
Remark 9.19. If f is an analytic function (e.g., f(x) is a polynomial, or e^x, or sin x, or, in general, representable by a power series Σ_{n=0}^{+∞} a_n x^n), then it is easy to show that the eigenvalues of f(A) (defined as Σ_{n=0}^{+∞} a_n A^n) are f(λ_i), but f(A) does not necessarily have all the same eigenvectors (unless, say, A is diagonalizable). For example, A = [0 1; 0 0] has only one right eigenvector corresponding to the eigenvalue 0, but A^2 = [0 0; 0 0] has two independent right eigenvectors associated with the eigenvalue 0. What is true is that the eigenvalue/eigenvector pair (λ, x) maps to (f(λ), x) but not conversely.

The following theorem is useful when solving systems of linear differential equations. Details of how the matrix exponential e^{tA} is used to solve the system ẋ = Ax are the subject of Chapter 11.

Theorem 9.20. Let A ∈ R^{n×n} and suppose X^{-1} A X = Λ, where Λ is diagonal. Then

    e^{tA} = Σ_{i=1}^n e^{λ_i t} x_i y_i^H.
82 Chapter 9. Eigenvalues and Eigenvectors

Proof: Starting from the definition, we have

    e^{tA} = Σ_{k=0}^{+∞} (t^k / k!) A^k
           = Σ_{k=0}^{+∞} (t^k / k!) (X Λ X^{-1})^k
           = X ( Σ_{k=0}^{+∞} (t^k / k!) Λ^k ) X^{-1}
           = X e^{tΛ} X^{-1}
           = Σ_{i=1}^n e^{λ_i t} x_i y_i^H. □

The following corollary is immediate from the theorem upon setting t = 1.

Corollary 9.21. If A ∈ R^{n×n} is diagonalizable with eigenvalues λ_i, i ∈ n̲, and right eigenvectors x_i, i ∈ n̲, then e^A has eigenvalues e^{λ_i}, i ∈ n̲, and the same eigenvectors.

There are extensions to Theorem 9.20 and Corollary 9.21 for any function that is analytic on the spectrum of A, i.e., f(A) = X f(Λ) X^{-1} = X diag(f(λ_1), ..., f(λ_n)) X^{-1}.
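The diagonalization formula can be exercised directly: build e^{tA} = X e^{tΛ} X^{-1} and compare it against a truncation of the defining power series. A sketch, assuming NumPy; the matrix is an arbitrary diagonalizable example:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # eigenvalues -1 and -2, diagonalizable
t = 0.5

# e^{tA} via the eigendecomposition: X exp(t*Lambda) X^{-1}.
evals, X = np.linalg.eig(A)
exptA = X @ np.diag(np.exp(evals * t)) @ np.linalg.inv(X)

# Power-series check: e^{tA} = sum_k (tA)^k / k!, truncated after enough terms.
S = np.eye(2)
term = np.eye(2)
for k in range(1, 30):
    term = term @ (t * A) / k
    S = S + term
```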
It is desirable, of course, to have a version of Theorem 9.20 and its corollary in which A is not necessarily diagonalizable. It is necessary first to consider the notion of Jordan canonical form, from which such a result is then available and presented later in this chapter.

9.2 Jordan Canonical Form

Theorem 9.22.

1. Jordan Canonical Form (JCF): For all A ∈ C^{n×n} with eigenvalues λ_1, ..., λ_n ∈ C (not necessarily distinct), there exists a nonsingular X ∈ C^{n×n} such that

    X^{-1} A X = J = diag(J_1, ..., J_q),    (9.12)

where each of the Jordan block matrices J_1, ..., J_q is of the form

    J_i = [λ_i 1 0 ... 0;
           0 λ_i 1 ... 0;
           ...;
           0 ... 0 λ_i 1;
           0 0 ... 0 λ_i] ∈ C^{k_i × k_i}    (9.13)
9.2. Jordan Canonical Form 83

and Σ_{i=1}^q k_i = n.

2. Real Jordan Canonical Form: For all A ∈ R^{n×n} with eigenvalues λ_1, ..., λ_n (not necessarily distinct), there exists a nonsingular X ∈ R^{n×n} such that

    X^{-1} A X = J = diag(J_1, ..., J_q),    (9.14)

where each of the Jordan block matrices J_1, ..., J_q is of the form (9.13) in the case of real eigenvalues λ_i ∈ Λ(A), and

    J_i = [M_i I_2 0 ... 0;
           0 M_i I_2 ... 0;
           ...;
           0 ... 0 M_i I_2;
           0 0 ... 0 M_i],

where M_i = [α_i β_i; -β_i α_i] and I_2 = [1 0; 0 1], in the case of complex conjugate eigenvalues α_i ± jβ_i ∈ Λ(A).

Proof: For the proof see, for example, [21, pp. 120-124]. □

Transformations like T = (1/√2)[1 -j; -j 1] allow us to go back and forth between a real JCF and its complex counterpart:

    T^{-1} [α + jβ, 0; 0, α - jβ] T = [α β; -β α] = M.
For nontrivial Jordan blocks, the situation is only a bit more complicated. With

    T = (1/√2) [1 -j 0 0;
                0 0 1 -j;
                -j 1 0 0;
                0 0 -j 1],
84 Chapter 9. Eigenvalues and Eigenvectors

it is easily checked that

    T^{-1} [α + jβ, 1, 0, 0;
            0, α + jβ, 0, 0;
            0, 0, α - jβ, 1;
            0, 0, 0, α - jβ] T = [M I_2; 0 M].
Definition 9.23. The characteristic polynomials of the Jordan blocks defined in Theorem 9.22 are called the elementary divisors or invariant factors of A.

Theorem 9.24. The characteristic polynomial of a matrix is the product of its elementary divisors. The minimal polynomial of a matrix is the product of the elementary divisors of highest degree corresponding to distinct eigenvalues.

Theorem 9.25. Let A ∈ C^{n×n} with eigenvalues λ_1, ..., λ_n. Then

1. det(A) = Π_{i=1}^n λ_i.

2. Tr(A) = Σ_{i=1}^n λ_i.

Proof:

1. From Theorem 9.22 we have that A = X J X^{-1}. Thus,

   det(A) = det(X J X^{-1}) = det(J) = Π_{i=1}^n λ_i.

2. Again, from Theorem 9.22 we have that A = X J X^{-1}. Thus,

   Tr(A) = Tr(X J X^{-1}) = Tr(J X^{-1} X) = Tr(J) = Σ_{i=1}^n λ_i. □
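Both identities of Theorem 9.25 hold for an arbitrary matrix and are easy to check numerically. A sketch, assuming NumPy; the random matrix is arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
evals = np.linalg.eig(A)[0]

# det(A) is the product of the eigenvalues; Tr(A) is their sum.
# Complex eigenvalues come in conjugate pairs, so both reduce to real numbers.
det_from_evals = np.prod(evals)
tr_from_evals = np.sum(evals)
```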
Example 9.26. Suppose A ∈ R^{7×7} is known to have π(λ) = (λ - 1)^4 (λ - 2)^3 and α(λ) = (λ - 1)^2 (λ - 2)^2. Then A has two possible JCFs (not counting reorderings of the diagonal blocks):

    J^(1) = [1 1 0 0 0 0 0;
             0 1 0 0 0 0 0;
             0 0 1 0 0 0 0;
             0 0 0 1 0 0 0;
             0 0 0 0 2 1 0;
             0 0 0 0 0 2 0;
             0 0 0 0 0 0 2]

and

    J^(2) = [1 1 0 0 0 0 0;
             0 1 0 0 0 0 0;
             0 0 1 1 0 0 0;
             0 0 0 1 0 0 0;
             0 0 0 0 2 1 0;
             0 0 0 0 0 2 0;
             0 0 0 0 0 0 2].

Note that J^(1) has elementary divisors (λ - 1)^2, (λ - 1), (λ - 1), (λ - 2)^2, and (λ - 2), while J^(2) has elementary divisors (λ - 1)^2, (λ - 1)^2, (λ - 2)^2, and (λ - 2).
9.3. Determination of the JCF 85

Example 9.27. Knowing π(λ), α(λ), and rank(A - λ_i I) for distinct λ_i is not sufficient to determine the JCF of A uniquely. The matrices

    A_1 = [a 1 0 0 0 0 0;
           0 a 1 0 0 0 0;
           0 0 a 0 0 0 0;
           0 0 0 a 1 0 0;
           0 0 0 0 a 1 0;
           0 0 0 0 0 a 0;
           0 0 0 0 0 0 a]

and

    A_2 = [a 1 0 0 0 0 0;
           0 a 1 0 0 0 0;
           0 0 a 0 0 0 0;
           0 0 0 a 1 0 0;
           0 0 0 0 a 0 0;
           0 0 0 0 0 a 1;
           0 0 0 0 0 0 a]

both have π(λ) = (λ - a)^7, α(λ) = (λ - a)^3, and rank(A - aI) = 4, i.e., three eigenvectors.

9.3 Determination of the JCF

The first critical item of information in determining the JCF of a matrix A ∈ R^{n×n} is its number of eigenvectors. For each distinct eigenvalue λ_i, the associated number of linearly independent right (or left) eigenvectors is given by dim N(A - λ_i I) = n - rank(A - λ_i I). The straightforward case is, of course, when λ_i is simple, i.e., of algebraic multiplicity 1; it then has precisely one eigenvector. The more interesting (and difficult) case occurs when λ_i is of algebraic multiplicity greater than one. For example, suppose

    A = [3 2 1;
         0 3 0;
         0 0 3].

Then

    A - 3I = [0 2 1;
              0 0 0;
              0 0 0]

has rank 1, so the eigenvalue 3 has two eigenvectors associated with it. If we let [ξ_1 ξ_2 ξ_3]^T denote a solution to the linear system (A - 3I)ξ = 0, we find that 2ξ_2 + ξ_3 = 0. Thus, both

    x_1 = [1; 0; 0] and x_2 = [0; 1; -2]

are eigenvectors (and are independent). To get a third vector x_3 such that X = [x_1 x_2 x_3] reduces A to JCF, we need the notion of principal vector.

Definition 9.28. Let A ∈ C^{n×n} (or R^{n×n}). Then x is a right principal vector of degree k associated with λ ∈ Λ(A) if and only if (A - λI)^k x = 0 and (A - λI)^{k-1} x ≠ 0.

Remark 9.29.

1. An analogous definition holds for a left principal vector of degree k.
Example 9.27. Knowing π(λ), α(λ), and rank(A − λᵢI) for distinct λᵢ is not sufficient to determine the JCF of A uniquely. The matrices

A₁ =
[a 1 0 0 0 0 0]
[0 a 1 0 0 0 0]
[0 0 a 0 0 0 0]
[0 0 0 a 1 0 0]
[0 0 0 0 a 1 0]
[0 0 0 0 0 a 0]
[0 0 0 0 0 0 a]

and A₂ =
[a 1 0 0 0 0 0]
[0 a 1 0 0 0 0]
[0 0 a 0 0 0 0]
[0 0 0 a 1 0 0]
[0 0 0 0 a 0 0]
[0 0 0 0 0 a 1]
[0 0 0 0 0 0 a]

both have π(λ) = (λ − a)⁷, α(λ) = (λ − a)³, and rank(A − aI) = 4, i.e., three eigenvectors.
9.3 Determination of the JCF
The first critical item of information in determining the JCF of a matrix A ∈ ℝⁿˣⁿ is its number of eigenvectors. For each distinct eigenvalue λᵢ, the associated number of linearly independent right (or left) eigenvectors is given by dim N(A − λᵢI) = n − rank(A − λᵢI). The straightforward case is, of course, when λᵢ is simple, i.e., of algebraic multiplicity 1; it then has precisely one eigenvector. The more interesting (and difficult) case occurs when λᵢ is of algebraic multiplicity greater than one. For example, suppose
A = [3 2 1]
    [0 3 0]
    [0 0 3].

Then

A − 3I = [0 2 1]
         [0 0 0]
         [0 0 0]

has rank 1, so the eigenvalue 3 has two eigenvectors associated with it. If we let [ξ₁ ξ₂ ξ₃]ᵀ denote a solution to the linear system (A − 3I)ξ = 0, we find that 2ξ₂ + ξ₃ = 0. Thus, both

x₁ = [1 0 0]ᵀ and x₂ = [0 1 −2]ᵀ,

for example, are eigenvectors (and are independent). To get a third vector x₃ such that X = [x₁ x₂ x₃] reduces A to JCF, we need the notion of principal vector.
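The rank argument above is easy to reproduce numerically. In this NumPy sketch the two eigenvectors are a hypothetical choice satisfying the constraint 2ξ₂ + ξ₃ = 0:

```python
import numpy as np

# A - 3I has rank 1, so lambda = 3 (algebraic multiplicity 3) has exactly
# n - rank(A - 3I) = 3 - 1 = 2 linearly independent eigenvectors.
A = np.array([[3., 2., 1.],
              [0., 3., 0.],
              [0., 0., 3.]])
B = A - 3 * np.eye(3)
print(np.linalg.matrix_rank(B))  # 1

# Two independent solutions of (A - 3I)x = 0, i.e., of 2*x2 + x3 = 0:
x1 = np.array([1., 0., 0.])
x2 = np.array([0., 1., -2.])
print(np.allclose(B @ x1, 0), np.allclose(B @ x2, 0))  # True True
```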
Definition 9.28. Let A ∈ ℂⁿˣⁿ (or ℝⁿˣⁿ). Then x is a right principal vector of degree k associated with λ ∈ Λ(A) if and only if (A − λI)ᵏx = 0 and (A − λI)ᵏ⁻¹x ≠ 0.
Remark 9.29.
1. An analogous definition holds for a left principal vector of degree k.
86 Chapter 9. Eigenvalues and Eigenvectors
2. The phrase "of grade k" is often used synonymously with "of degree k."
3. Principal vectors are sometimes also called generalized eigenvectors, but the latter
term will be assigned a much different meaning in Chapter 12.
4. The case k = 1 corresponds to the "usual" eigenvector.
5. A right (or left) principal vector of degree k is associated with a Jordan block Jᵢ of dimension k or larger.
9.3.1 Theoretical computation
To motivate the development of a procedure for determining principal vectors, consider a 2 × 2 Jordan block

[λ 1]
[0 λ].

Denote by x⁽¹⁾ and x⁽²⁾ the two columns of a matrix X ∈ ℝ²ˣ² that reduces a matrix A to this JCF. Then the equation AX = XJ can be written

A[x⁽¹⁾ x⁽²⁾] = [x⁽¹⁾ x⁽²⁾] [λ 1]
                           [0 λ].

The first column yields the equation Ax⁽¹⁾ = λx⁽¹⁾, which simply says that x⁽¹⁾ is a right eigenvector. The second column yields the following equation for x⁽²⁾, the principal vector of degree 2:

(A − λI)x⁽²⁾ = x⁽¹⁾.    (9.17)

If we premultiply (9.17) by (A − λI), we find (A − λI)²x⁽²⁾ = (A − λI)x⁽¹⁾ = 0. Thus, the definition of principal vector is satisfied.
This suggests a "general" procedure. First, determine all eigenvalues of A ∈ ℝⁿˣⁿ (or ℂⁿˣⁿ). Then for each distinct λ ∈ Λ(A) perform the following:

1. Solve

(A − λI)x⁽¹⁾ = 0.
This step finds all the eigenvectors (i.e., principal vectors of degree 1) associated with λ. The number of eigenvectors depends on the rank of A − λI. For example, if rank(A − λI) = n − 1, there is only one eigenvector. If the algebraic multiplicity of λ is greater than its geometric multiplicity, principal vectors still need to be computed from succeeding steps.
2. For each independent x⁽¹⁾, solve

(A − λI)x⁽²⁾ = x⁽¹⁾.
The number of linearly independent solutions at this step depends on the rank of (A − λI)². If, for example, this rank is n − 2, there are two linearly independent solutions to the homogeneous equation (A − λI)²x⁽²⁾ = 0. One of these solutions is, of course, x⁽¹⁾ (≠ 0), since (A − λI)²x⁽¹⁾ = (A − λI)0 = 0. The other solution is the desired principal vector of degree 2. (It may be necessary to take a linear combination of x⁽¹⁾ vectors to get a right-hand side that is in R(A − λI). See, for example, Exercise 7.)
3. For each independent x⁽²⁾ from step 2, solve

(A − λI)x⁽³⁾ = x⁽²⁾.
4. Continue in this way until the total number of independent eigenvectors and principal vectors is equal to the algebraic multiplicity of λ.
Unfortunately, this natural-looking procedure can fail to find all Jordan vectors. For
more extensive treatments, see, for example, [20] and [21]. Determination of eigenvectors
and principal vectors is obviously very tedious for anything beyond simple problems (n = 2
or 3, say). Attempts to do such calculations in finiteprecision floatingpoint arithmetic
generally prove unreliable. There are significant numerical difficulties inherent in attempting
to compute a JCF, and the interested student is strongly urged to consult the classical and very
readable [8] to learn why. Notice that high-quality mathematical software such as MATLAB does not offer a jcf command, although a jordan command is available in MATLAB's Symbolic Toolbox.
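For symbolic (exact-arithmetic) experiments, SymPy's Matrix.jordan_form plays a role analogous to the jordan command mentioned above; the 3 × 3 matrix here is an arbitrary illustrative choice:

```python
import sympy as sp

# jordan_form returns X and J with A = X J X^{-1}; J here consists of a
# 2 x 2 and a 1 x 1 Jordan block for the triple eigenvalue 3.
A = sp.Matrix([[3, 2, 1],
               [0, 3, 0],
               [0, 0, 3]])
X, J = A.jordan_form()
print(J)
print(sp.simplify(X * J * X.inv() - A) == sp.zeros(3, 3))  # True
```

Because the computation is symbolic, it avoids the finite-precision difficulties described in the paragraph above, at the cost of scaling poorly with problem size.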
Theorem 9.30. Suppose A ∈ ℂᵏˣᵏ has an eigenvalue λ of algebraic multiplicity k and suppose further that rank(A − λI) = k − 1. Let X = [x⁽¹⁾, ..., x⁽ᵏ⁾], where the chain of vectors x⁽ⁱ⁾ is constructed as above. Then X⁻¹AX is the single k × k Jordan block corresponding to λ.
Theorem 9.31. {x⁽¹⁾, ..., x⁽ᵏ⁾} is a linearly independent set.
Theorem 9.32. Principal vectors associated with different Jordan blocks are linearly independent.
Example 9.33. Let

A = [1 1 2]
    [0 1 3]
    [0 0 2].

The eigenvalues of A are λ₁ = 1, λ₂ = 1, and λ₃ = 2. First, find the eigenvectors associated with the distinct eigenvalues 1 and 2.

(A − 2I)x₃⁽¹⁾ = 0 yields

x₃⁽¹⁾ = [5 3 1]ᵀ.
(A − 1I)x₁⁽¹⁾ = 0 yields

x₁⁽¹⁾ = [1 0 0]ᵀ.

To find a principal vector of degree 2 associated with the multiple eigenvalue 1, solve (A − 1I)x₁⁽²⁾ = x₁⁽¹⁾ to get

x₁⁽²⁾ = [0 1 0]ᵀ.

Now let

X = [x₁⁽¹⁾ x₁⁽²⁾ x₃⁽¹⁾] = [1 0 5]
                         [0 1 3]
                         [0 0 1].

Then it is easy to check that

X⁻¹ = [1 0 −5]
      [0 1 −3]
      [0 0  1]

and

X⁻¹AX = [1 1 0]
        [0 1 0]
        [0 0 2].
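The arithmetic of Example 9.33 can be spot-checked numerically (the entries below follow this scan's reconstruction of the example, so treat them as illustrative):

```python
import numpy as np

# Columns of X: eigenvector and degree-2 principal vector for lambda = 1,
# then the eigenvector for lambda = 2; X^{-1} A X should be the JCF J.
A = np.array([[1., 1., 2.],
              [0., 1., 3.],
              [0., 0., 2.]])
X = np.array([[1., 0., 5.],
              [0., 1., 3.],
              [0., 0., 1.]])
J = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [0., 0., 2.]])
print(np.allclose(np.linalg.solve(X, A @ X), J))  # True
```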
9.3.2 On the +1's in JCF blocks

In this subsection we show that the nonzero superdiagonal elements of a JCF need not be 1's but can be arbitrary, so long as they are nonzero. For the sake of definiteness, we consider below the case of a single Jordan block, but the result clearly holds for any JCF. Suppose A ∈ ℝⁿˣⁿ and X⁻¹AX = J is the n × n Jordan block

J = [λ 1 0 ⋯ 0]
    [0 λ 1 ⋯ 0]
    [⋮    ⋱ ⋱ ⋮]
    [0 ⋯   λ 1]
    [0 0 ⋯ 0 λ].
Let D = diag(d₁, ..., dₙ) be a nonsingular "scaling" matrix. Then

D⁻¹(X⁻¹AX)D = D⁻¹JD = Ĵ = [λ d₂/d₁   0      ⋯     0      ]
                          [0   λ    d₃/d₂   ⋯     0      ]
                          [⋮          ⋱      ⋱     ⋮      ]
                          [0   ⋯           λ   dₙ/dₙ₋₁   ]
                          [0   0    ⋯      0      λ      ].
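A small numerical sketch of this scaling argument (the block size and scaling factors are arbitrary choices):

```python
import numpy as np

# D^{-1} J D keeps the eigenvalue on the diagonal but replaces the unit
# superdiagonal of the Jordan block by the ratios d2/d1, d3/d2, ....
lam, n = 2.0, 4
J = lam * np.eye(n) + np.diag(np.ones(n - 1), 1)
d = np.array([1., 2., 6., 24.])
Jhat = np.linalg.inv(np.diag(d)) @ J @ np.diag(d)

print(np.allclose(np.diag(Jhat), lam))                # diagonal unchanged
print(np.allclose(np.diag(Jhat, 1), d[1:] / d[:-1]))  # superdiagonal 2, 3, 4
```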
Appropriate choice of the dᵢ's then yields any desired nonzero superdiagonal elements. This result can also be interpreted in terms of the matrix X = [x₁, ..., xₙ] of eigenvectors and principal vectors that reduces A to its JCF. Specifically, Ĵ is obtained from A via the similarity transformation XD = [d₁x₁, ..., dₙxₙ].
In a similar fashion, the reverse-order identity matrix (or exchange matrix)

P = Pᵀ = P⁻¹ = [0 ⋯ 0 1]
               [0 ⋯ 1 0]        (9.18)
               [⋮ ⋰     ]
               [1 0 ⋯ 0]
can be used to put the superdiagonal elements in the subdiagonal instead if that is desired:
P⁻¹ [λ 1 0 ⋯ 0] P = [λ 0 0 ⋯ 0]
    [0 λ 1 ⋯ 0]     [1 λ 0 ⋯ 0]
    [⋮    ⋱ ⋱ ⋮]     [0 1 λ   ⋮]
    [0 ⋯   λ 1]     [⋮   ⋱ ⋱ 0]
    [0 0 ⋯ 0 λ]     [0 ⋯ 0 1 λ].
9.4 Geometric Aspects of the JCF
The matrix X that reduces a matrix A ∈ ℝⁿˣⁿ (or ℂⁿˣⁿ) to a JCF provides a change of basis with respect to which the matrix is diagonal or block diagonal. It is thus natural to expect an associated direct sum decomposition of ℝⁿ. Such a decomposition is given in the following theorem.
Theorem 9.34. Suppose A ∈ ℝⁿˣⁿ has characteristic polynomial

π(λ) = (λ − λ₁)^{n₁} ⋯ (λ − λₘ)^{nₘ}

and minimal polynomial

α(λ) = (λ − λ₁)^{ν₁} ⋯ (λ − λₘ)^{νₘ}

with λ₁, ..., λₘ distinct. Then

ℝⁿ = N(A − λ₁I)^{n₁} ⊕ ⋯ ⊕ N(A − λₘI)^{nₘ}
   = N(A − λ₁I)^{ν₁} ⊕ ⋯ ⊕ N(A − λₘI)^{νₘ}.

Note that dim N(A − λᵢI)^{νᵢ} = nᵢ.
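The dimension count dim N(A − λᵢI)^{νᵢ} = nᵢ can be illustrated numerically; this sketch builds a hypothetical 3 × 3 matrix similar to J₂(1) ⊕ J₁(2):

```python
import numpy as np

# For J = J2(1) + J1(2): n1 = nu1 = 2 for lambda = 1 and n2 = nu2 = 1 for
# lambda = 2, so the two null spaces decompose R^3.
J = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [0., 0., 2.]])
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3))   # generic, hence (almost surely) nonsingular
A = X @ J @ np.linalg.inv(X)

def dim_null(M):
    return M.shape[0] - np.linalg.matrix_rank(M)

print(dim_null(np.linalg.matrix_power(A - np.eye(3), 2)))  # 2
print(dim_null(A - 2 * np.eye(3)))                         # 1
```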
Definition 9.35. Let V be a vector space over F and suppose A : V → V is a linear transformation. A subspace 𝒮 ⊆ V is A-invariant if A𝒮 ⊆ 𝒮, where A𝒮 is defined as the set {As : s ∈ 𝒮}.
If V is taken to be ℝⁿ over ℝ, and S ∈ ℝⁿˣᵏ is a matrix whose columns s₁, ..., sₖ span a k-dimensional subspace 𝒮, i.e., R(S) = 𝒮, then 𝒮 is A-invariant if and only if there exists M ∈ ℝᵏˣᵏ such that

AS = SM.    (9.19)

This follows easily by comparing the ith columns of each side of (9.19):

Asᵢ = Smᵢ ∈ R(S),

where mᵢ denotes the ith column of M.
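Equation (9.19) is easy to test numerically; in this sketch R(S) is the invariant subspace spanned by the first two coordinates of a block upper triangular A (all entries are arbitrary illustrative choices):

```python
import numpy as np

# R(S) is A-invariant iff AS = SM for some M (equation (9.19)); M can be
# recovered column by column from the least-squares problem S M = A S.
A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [0., 0., 7.]])
S = np.array([[1., 0.],
              [0., 1.],
              [0., 0.]])
M, *_ = np.linalg.lstsq(S, A @ S, rcond=None)
print(np.allclose(A @ S, S @ M))  # True: R(S) is A-invariant
```

Here M is simply the leading 2 × 2 block of A, i.e., the matrix of the restriction of A to R(S) in the basis given by the columns of S.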
Example 9.36. The equation Ax = λx = xλ defining a right eigenvector x of an eigenvalue λ says that x spans an A-invariant subspace (of dimension one).
Example 9.37. Suppose X block diagonalizes A, i.e.,

X⁻¹AX = [J₁ 0 ]
        [0  J₂].

Rewriting in the form

A[X₁ X₂] = [X₁ X₂] [J₁ 0 ]
                   [0  J₂],

we have that AXᵢ = XᵢJᵢ, i = 1, 2, so the columns of Xᵢ span an A-invariant subspace.
Theorem 9.38. Suppose A ∈ ℝⁿˣⁿ.

1. Let p(A) = α₀I + α₁A + ⋯ + α_qA^q be a polynomial in A. Then N(p(A)) and R(p(A)) are A-invariant.

2. 𝒮 is A-invariant if and only if 𝒮^⊥ is Aᵀ-invariant.
Theorem 9.39. If V is a vector space over F such that V = N₁ ⊕ ⋯ ⊕ Nₘ, where each Nᵢ is A-invariant, then a basis for V can be chosen with respect to which A has a block diagonal representation.
The Jordan canonical form is a special case of the above theorem. If A has distinct eigenvalues λᵢ as in Theorem 9.34, we could choose bases for N(A − λᵢI)^{nᵢ} by SVD, for example (note that the power nᵢ could be replaced by νᵢ). We would then get a block diagonal representation for A with full blocks rather than the highly structured Jordan blocks. Other such "canonical" forms are discussed in text that follows.
Suppose X = [X₁, ..., Xₘ] ∈ ℝⁿˣⁿ is nonsingular and such that X⁻¹AX = diag(J₁, ..., Jₘ), where each Jᵢ = diag(Jᵢ₁, ..., Jᵢₖᵢ) and each Jᵢₖ is a Jordan block corresponding to λᵢ ∈ Λ(A). We could also use other block diagonal decompositions (e.g., via SVD), but we restrict our attention here to only the Jordan block case. Note that AXᵢ = XᵢJᵢ, so by (9.19) the columns of Xᵢ (i.e., the eigenvectors and principal vectors associated with λᵢ) span an A-invariant subspace of ℝⁿ.
Finally, we return to the problem of developing a formula for e^{tA} in the case that A is not necessarily diagonalizable. Let Yᵢ ∈ ℂ^{n×nᵢ} be a Jordan basis for N(Aᵀ − λᵢI)^{nᵢ}. Equivalently, partition

X⁻¹ = Yᴴ = [Y₁, ..., Yₘ]ᴴ
compatibly. Then

A = XJX⁻¹ = XJYᴴ
  = [X₁, ..., Xₘ] diag(J₁, ..., Jₘ) [Y₁, ..., Yₘ]ᴴ
  = ∑ᵢ₌₁ᵐ XᵢJᵢYᵢᴴ.
In a similar fashion we can compute

e^{tA} = ∑ᵢ₌₁ᵐ Xᵢ e^{tJᵢ} Yᵢᴴ,
which is a useful formula when used in conjunction with the result
       [λ 1 0 ⋯ 0]     [e^{λt}  t e^{λt}  (t²/2!) e^{λt}  ⋯  (t^{k−1}/(k−1)!) e^{λt}]
       [0 λ 1 ⋯ 0]     [0       e^{λt}    t e^{λt}        ⋯  (t^{k−2}/(k−2)!) e^{λt}]
exp( t [⋮    ⋱ ⋱ ⋮] ) = [⋮                  ⋱              ⋱   ⋮                     ]
       [0 ⋯   λ 1]     [0       ⋯                e^{λt}       t e^{λt}              ]
       [0 0 ⋯ 0 λ]     [0       0        ⋯       0             e^{λt}               ]

for a k × k Jordan block Jᵢ associated with an eigenvalue λ = λᵢ.
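The upper triangular formula for e^{tJ} can be verified against a general-purpose matrix exponential; here scipy.linalg.expm is used with an arbitrary λ, t, and block size:

```python
import numpy as np
from math import exp, factorial
from scipy.linalg import expm

# For a k x k Jordan block with eigenvalue lam, entry (i, j) of e^{tJ} is
# t^(j-i) e^(lam t) / (j-i)! for j >= i, and 0 below the diagonal.
lam, k, t = 0.5, 4, 2.0
J = lam * np.eye(k) + np.diag(np.ones(k - 1), 1)
E = expm(t * J)
F = np.array([[t**(j - i) * exp(lam * t) / factorial(j - i) if j >= i else 0.0
               for j in range(k)] for i in range(k)])
print(np.allclose(E, F))  # True
```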
9.5 The Matrix Sign Function
In this section we give a very brief introduction to an interesting and useful matrix function
called the matrix sign function. It is a generalization of the sign (or signum) of a scalar. A
survey of the matrix sign function and some of its applications can be found in [15].
Definition 9.40. Let z ∈ ℂ with Re(z) ≠ 0. Then the sign of z is defined by

sgn(z) = Re(z)/|Re(z)| = { +1 if Re(z) > 0,
                           −1 if Re(z) < 0.
Definition 9.41. Suppose A ∈ ℂⁿˣⁿ has no eigenvalues on the imaginary axis, and let

X⁻¹AX = J = [N 0]
            [0 P]

be a Jordan canonical form for A, with N containing all Jordan blocks corresponding to the eigenvalues of A in the left half-plane and P containing all Jordan blocks corresponding to eigenvalues in the right half-plane. Then the sign of A, denoted sgn(A), is given by
sgn(A) = X [−I 0] X⁻¹,
           [ 0 I]
where the negative and positive identity matrices are of the same dimensions as N and P,
respectively.
There are other equivalent definitions of the matrix sign function, but the one given
here is especially useful in deriving many of its key properties. The JCF definition of the
matrix sign function does not generally lend itself to reliable computation on a finiteword
length digital computer. In fact, its reliable numerical calculation is an interesting topic in
its own right.
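One classical computational approach, treated in the literature surveyed in [15] though not developed in this text, is the Newton iteration S_{k+1} = (S_k + S_k⁻¹)/2 started from S₀ = A; a minimal sketch:

```python
import numpy as np

def matrix_sign_newton(A, iters=100, tol=1e-13):
    """Newton iteration for sgn(A); assumes no imaginary-axis eigenvalues."""
    S = np.asarray(A, dtype=float)
    for _ in range(iters):
        S_next = 0.5 * (S + np.linalg.inv(S))
        if np.linalg.norm(S_next - S, 1) <= tol * np.linalg.norm(S, 1):
            return S_next
        S = S_next
    return S

A = np.array([[-3., 1.],
              [ 0., 2.]])   # eigenvalues -3 and 2
S = matrix_sign_newton(A)
print(np.allclose(S @ S, np.eye(2)))  # S^2 = I
print(np.allclose(A @ S, S @ A))      # S commutes with A
```

The iteration converges quadratically near the solution, although scaling refinements are used in practice to improve the initial phase of convergence.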
We state some of the more useful properties of the matrix sign function as theorems.
Their straightforward proofs are left to the exercises.
Theorem 9.42. Suppose A ∈ ℂⁿˣⁿ has no eigenvalues on the imaginary axis, and let S = sgn(A). Then the following hold:
1. S is diagonalizable with eigenvalues equal to ±1.

2. S² = I.

3. AS = SA.

4. sgn(Aᴴ) = (sgn(A))ᴴ.

5. sgn(T⁻¹AT) = T⁻¹ sgn(A)T for all nonsingular T ∈ ℂⁿˣⁿ.

6. sgn(cA) = sgn(c) sgn(A) for all nonzero real scalars c.
Theorem 9.43. Suppose A ∈ ℂⁿˣⁿ has no eigenvalues on the imaginary axis, and let S = sgn(A). Then the following hold:

1. R(S − I) is an A-invariant subspace corresponding to the left half-plane eigenvalues of A (the negative invariant subspace).

2. R(S + I) is an A-invariant subspace corresponding to the right half-plane eigenvalues of A (the positive invariant subspace).

3. negA = (I − S)/2 is a projection onto the negative invariant subspace of A.

4. posA = (I + S)/2 is a projection onto the positive invariant subspace of A.
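Theorems 9.42 and 9.43 can be verified numerically on a small example by forming S = sgn(A) directly from an eigendecomposition (signs of the real parts of the eigenvalues); the matrix is an arbitrary illustrative choice:

```python
import numpy as np

A = np.array([[-2., 1., 0.],
              [ 0., 3., 1.],
              [ 0., 0., 1.]])   # eigenvalues -2, 3, 1 (one in the left half-plane)
w, X = np.linalg.eig(A)
S = (X @ np.diag(np.sign(w.real)) @ np.linalg.inv(X)).real

P_neg = (np.eye(3) - S) / 2    # projection onto the negative invariant subspace
P_pos = (np.eye(3) + S) / 2    # projection onto the positive invariant subspace

print(np.allclose(S @ S, np.eye(3)))      # S^2 = I
print(np.allclose(P_neg @ P_neg, P_neg))  # idempotent
print(int(round(np.trace(P_neg))))        # 1 = number of LHP eigenvalues
```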
EXERCISES
1. Let A ∈ ℂⁿˣⁿ have distinct eigenvalues λ₁, ..., λₙ with corresponding right eigenvectors x₁, ..., xₙ and left eigenvectors y₁, ..., yₙ, respectively. Let v ∈ ℂⁿ be an arbitrary vector. Show that v can be expressed (uniquely) as a linear combination of the right eigenvectors. Find the appropriate expression for v as a linear combination of the left eigenvectors as well.
2. Suppose A ∈ ℂⁿˣⁿ is skew-Hermitian, i.e., Aᴴ = −A. Prove that all eigenvalues of a skew-Hermitian matrix must be pure imaginary.
3. Suppose A ∈ ℂⁿˣⁿ is Hermitian. Let λ be an eigenvalue of A with corresponding right eigenvector x. Show that x is also a left eigenvector for λ. Prove the same result if A is skew-Hermitian.
4. Suppose a matrix A ∈ ℝ⁵ˣ⁵ has eigenvalues {2, 2, 2, 2, 3}. Determine all possible JCFs for A.
5. Determine the eigenvalues, right eigenvectors and right principal vectors if necessary, and (real) JCFs of the following matrices:

(a) [2 1]
    [1 0].
6. Determine the JCFs of the following matrices:
<a) U j n
2
1
2
=n
7. Let
A = [H 1]·
2 2"
Find a nonsingular matrix X such that X⁻¹AX = J, where J is the JCF

J = [1 1 0]
    [0 1 0]
    [0 0 1].

Hint: Use [−1 1 −1]ᵀ as an eigenvector. The vectors [0 1 −1]ᵀ and [1 0 0]ᵀ are both eigenvectors, but then the equation (A − I)x⁽²⁾ = x⁽¹⁾ can't be solved.
8. Show that all right eigenvectors of the Jordan block matrix in Theorem 9.30 must be multiples of e₁ ∈ ℝᵏ. Characterize all left eigenvectors.
9. Let A ∈ ℝⁿˣⁿ be of the form A = xyᵀ, where x, y ∈ ℝⁿ are nonzero vectors with xᵀy = 0. Determine the JCF of A.
10. Let A ∈ ℝⁿˣⁿ be of the form A = I + xyᵀ, where x, y ∈ ℝⁿ are nonzero vectors with xᵀy = 0. Determine the JCF of A.
11. Suppose a matrix A ∈ ℝ¹⁶ˣ¹⁶ has 16 eigenvalues at 0 and its JCF consists of a single Jordan block of the form specified in Theorem 9.22. Suppose the small number 10⁻¹⁶ is added to the (16,1) element of J. What are the eigenvalues of this slightly perturbed matrix?
12. Show that every matrix A ∈ ℝⁿˣⁿ can be factored in the form A = S₁S₂, where S₁ and S₂ are real symmetric matrices and one of them, say S₁, is nonsingular.

Hint: Suppose A = XJX⁻¹ is a reduction of A to JCF and suppose we can construct the "symmetric factorization" of J. Then A = (XS₁Xᵀ)(X⁻ᵀS₂X⁻¹) would be the required symmetric factorization of A. Thus, it suffices to prove the result for the JCF. The transformation P in (9.18) is useful.
13. Prove that every matrix A ∈ ℝⁿˣⁿ is similar to its transpose and determine a similarity transformation explicitly.

Hint: Use the factorization in the previous exercise.
14. Consider the block upper triangular matrix

A = [A₁₁ A₁₂]
    [0   A₂₂],

where A ∈ ℝⁿˣⁿ and A₁₁ ∈ ℝᵏˣᵏ with 1 ≤ k < n. Suppose A₁₂ ≠ 0 and that we want to block diagonalize A via the similarity transformation

T = [Iₖ X    ]
    [0  Iₙ₋ₖ],

where X ∈ ℝᵏˣ⁽ⁿ⁻ᵏ⁾, i.e.,

T⁻¹AT = [A₁₁ 0  ]
        [0   A₂₂].

Find a matrix equation that X must satisfy for this to be possible. If n = 2 and k = 1, what can you say further, in terms of A₁₁ and A₂₂, about when the equation for X is solvable?
15. Prove Theorem 9.42.
16. Prove Theorem 9.43.
17. Suppose A ∈ ℂⁿˣⁿ has all its eigenvalues in the left half-plane. Prove that sgn(A) = −I.
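As a closing numerical aside, the sensitivity phenomenon behind Exercise 11 is easy to observe (this demonstrates the effect rather than working the exercise out exactly):

```python
import numpy as np

# A 16 x 16 nilpotent Jordan block, with 1e-16 added to the (16, 1) entry.
# All exact eigenvalues were 0; watch how far the perturbed ones move.
n, eps = 16, 1e-16
J = np.diag(np.ones(n - 1), 1)
J[n - 1, 0] = eps
w = np.linalg.eigvals(J)
print(np.abs(w).max())   # on the order of 0.1, not on the order of 1e-16
```

This is one of the classical illustrations of why finite-precision computation of a JCF is unreliable.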
Chapter 10
Canonical Forms
10.1 Some Basic Canonical Forms
Problem: Let V and W be vector spaces and suppose A : V → W is a linear transformation. Find bases in V and W with respect to which Mat A has a "simple form" or "canonical form." In matrix terms, if A ∈ ℝᵐˣⁿ, find nonsingular P ∈ ℝᵐˣᵐ and Q ∈ ℝⁿˣⁿ such that PAQ has a "canonical form." The transformation A ↦ PAQ is called an equivalence; it is called an orthogonal equivalence if P and Q are orthogonal matrices.
Remark 10.1. We can also consider the case A E e
mxn
and unitary equivalence if P and
Q are unitary.
Two special cases are of interest:
1. If W = V and Q = p
1
, the transformation A f+ PAPI is called a similarity.
2. If W = V and if Q = pT is orthogonal, the transformation A f+ P ApT is called
an orthogonal similarity (or unitary similarity in the complex case).
The following results are typical of what can be achieved under a unitary similarity. If
A = A H E en xn has eigenvalues AI, ... , An, then there exists a unitary matrix U such that
U
H
AU = D, where D = diag(AJ, ... , An). This is proved in Theorem 10.2. What other
matrices are "diagonalizable" under unitary similarity? The answer is given in Theorem
10.9, where it is proved that a general matrix A E e
nxn
is unitarily similar to a diagonal
matrix if and only if it is normal (i.e., AA H = AHA). Normal matrices include Hermitian,
skewHermitian, and unitary matrices (and their "real" counterparts: symmetric, skew
symmetric, and orthogonal, respectively), as well as other matrices that merely satisfy the
definition, such as A = [ _ ~ !] for real scalars a and h. If a matrix A is not normal, the
most "diagonal" we can get is the JCF described in Chapter 9.
Theorem 10.2. Let A = A H E en xn have (real) eigenvalues AI, ... ,An. Then there
exists a unitary matrix X such that X
H
AX = D = diag(Al, ... , An) (the columns of X are
orthonormal eigenvectors for A).
95
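Before the proof, Theorem 10.2 is easy to check concretely for $n = 2$: a real symmetric matrix $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$ is diagonalized by a plane rotation. The pure-Python sketch below (the function name and the closed-form rotation angle are ours, not the book's) returns the orthogonal $X$ and the eigenvalues:

```python
import math

def diagonalize_sym2(a, b, c):
    """Orthogonally diagonalize the real symmetric matrix [[a, b], [b, c]].

    Returns (X, (l1, l2)) with X orthogonal (a plane rotation) and
    X^T A X = diag(l1, l2), where l1 >= l2.  Illustrates Theorem 10.2
    in the real 2 x 2 case.
    """
    theta = 0.5 * math.atan2(2.0 * b, a - c)   # rotation angle
    ct, st = math.cos(theta), math.sin(theta)
    X = [[ct, -st], [st, ct]]                  # columns: orthonormal eigenvectors
    mean = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    return X, (mean + rad, mean - rad)
```

For example, `diagonalize_sym2(2, 1, 2)` yields a rotation by $\pi/4$ with $X^T A X = \mathrm{diag}(3, 1)$.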
Proof: Let $x_1$ be a right eigenvector corresponding to $\lambda_1$, and normalize it such that $x_1^H x_1 = 1$. Then there exist $n - 1$ additional vectors $x_2, \ldots, x_n$ such that $X = [x_1, \ldots, x_n] = [x_1 \; X_2]$ is unitary. Now

$$X^H A X = \begin{bmatrix} x_1^H \\ X_2^H \end{bmatrix} A \, [x_1 \; X_2] = \begin{bmatrix} x_1^H A x_1 & x_1^H A X_2 \\ X_2^H A x_1 & X_2^H A X_2 \end{bmatrix} = \begin{bmatrix} \lambda_1 & x_1^H A X_2 \\ 0 & X_2^H A X_2 \end{bmatrix} \tag{10.1}$$

$$= \begin{bmatrix} \lambda_1 & 0 \\ 0 & X_2^H A X_2 \end{bmatrix}. \tag{10.2}$$

In (10.1) we have used the fact that $Ax_1 = \lambda_1 x_1$. When combined with the fact that $x_1^H x_1 = 1$, we get $\lambda_1$ remaining in the (1,1)-block. We also get 0 in the (2,1)-block by noting that $x_1$ is orthogonal to all vectors in $X_2$. In (10.2), we get 0 in the (1,2)-block by noting that $X^H A X$ is Hermitian. The proof is completed easily by induction upon noting that the (2,2)-block must have eigenvalues $\lambda_2, \ldots, \lambda_n$. □

Given a unit vector $x_1 \in \mathbb{R}^n$, the construction of $X_2 \in \mathbb{R}^{n \times (n-1)}$ such that $X = [x_1 \; X_2]$ is orthogonal is frequently required. The construction can actually be performed quite easily by means of Householder (or Givens) transformations as in the proof of the following general result.

Theorem 10.3. Let $X_1 \in \mathbb{C}^{n \times k}$ have orthonormal columns and suppose $U$ is a unitary matrix such that $U X_1 = \begin{bmatrix} R \\ 0 \end{bmatrix}$, where $R \in \mathbb{C}^{k \times k}$ is upper triangular. Write $U^H = [U_1 \; U_2]$ with $U_1 \in \mathbb{C}^{n \times k}$. Then $[X_1 \; U_2]$ is unitary.

Proof: Let $X_1 = [x_1, \ldots, x_k]$. Construct a sequence of Householder matrices (also known as elementary reflectors) $H_1, \ldots, H_k$ in the usual way (see below) such that

$$H_k \cdots H_1 [x_1, \ldots, x_k] = \begin{bmatrix} R \\ 0 \end{bmatrix},$$

where $R$ is upper triangular (and nonsingular since $x_1, \ldots, x_k$ are orthonormal). Let $U = H_k \cdots H_1$. Then $U^H = H_1 \cdots H_k$ and

$$X_1 = U^H \begin{bmatrix} R \\ 0 \end{bmatrix} = U_1 R, \quad\text{so}\quad X_1^H U_2 = R^H U_1^H U_2 = 0 .$$

Then $x_i^H U_2 = 0$ ($i \in \underline{k}$) means that $x_i$ is orthogonal to each of the $n - k$ columns of $U_2$. But the latter are orthonormal since they are the last $n - k$ rows of the unitary matrix $U$. Thus, $[X_1 \; U_2]$ is unitary. □

The construction called for in Theorem 10.2 is then a special case of Theorem 10.3 for $k = 1$. We illustrate the construction of the necessary Householder matrix for $k = 1$. For simplicity, we consider the real case. Let the unit vector $x_1$ be denoted by $[\xi_1, \ldots, \xi_n]^T$.
Then the necessary Householder matrix needed for the construction of $X_2$ is given by

$$U = I - 2uu^+ = I - \frac{2}{u^T u}\, u u^T, \quad \text{where } u = [\xi_1 \pm 1, \xi_2, \ldots, \xi_n]^T.$$

It can easily be checked that $U$ is symmetric and $U^T U = U^2 = I$, so $U$ is orthogonal. To see that $U$ effects the necessary compression of $x_1$, it is easily verified that $u^T u = 2 \pm 2\xi_1$ and $u^T x_1 = 1 \pm \xi_1$. Thus,

$$U x_1 = x_1 - \frac{2(u^T x_1)}{u^T u}\, u = x_1 - u = [\mp 1, 0, \ldots, 0]^T.$$

Further details on Householder matrices, including the choice of sign and the complex case, can be consulted in standard numerical linear algebra texts such as [7], [11], [23], [25].

The real version of Theorem 10.2 is worth stating separately since it is applied frequently in applications.

Theorem 10.4. Let $A = A^T \in \mathbb{R}^{n \times n}$ have eigenvalues $\lambda_1, \ldots, \lambda_n$. Then there exists an orthogonal matrix $X \in \mathbb{R}^{n \times n}$ (whose columns are orthonormal eigenvectors of $A$) such that $X^T A X = D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.

Note that Theorem 10.4 implies that a symmetric matrix $A$ (with the obvious analogue from Theorem 10.2 for Hermitian matrices) can be written

$$A = X D X^T = \sum_{i=1}^n \lambda_i x_i x_i^T, \tag{10.3}$$

which is often called the spectral representation of $A$. In fact, $A$ in (10.3) is actually a weighted sum of orthogonal projections $P_i$ (onto the one-dimensional eigenspaces corresponding to the $\lambda_i$'s), i.e.,

$$A = \sum_{i=1}^n \lambda_i P_i,$$

where $P_i = P_{\mathcal{R}(x_i)} = x_i x_i^+ = x_i x_i^T$ since $x_i^T x_i = 1$.

The following pair of theorems form the theoretical foundation of the double-Francis-QR algorithm used to compute matrix eigenvalues in a numerically stable and reliable way.
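The $k = 1$ Householder construction above translates almost line for line into code. The pure-Python sketch below (the function name is ours, not the book's) takes the "+" sign choice, so $u = [\xi_1 + 1, \xi_2, \ldots, \xi_n]^T$ and $U x_1 = [-1, 0, \ldots, 0]^T$; the trailing $n - 1$ columns of $U$ then serve as $X_2$:

```python
def householder_completion(x):
    """Given a unit vector x in R^n, return the symmetric orthogonal matrix
    U = I - (2 / u^T u) u u^T with u = x + e_1 (the '+' sign choice).
    Then U x = -e_1, and columns 2..n of U complete x to an orthogonal basis.
    """
    n = len(x)
    u = list(x)
    u[0] += 1.0
    utu = sum(ui * ui for ui in u)      # equals 2 + 2*xi_1 for unit x
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / utu
             for j in range(n)] for i in range(n)]
```

For $x_1 = [0.6, 0.8]^T$ this gives $U = \begin{bmatrix} -0.6 & -0.8 \\ -0.8 & 0.6 \end{bmatrix}$, so $U x_1 = [-1, 0]^T$ and the second column supplies $X_2$.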
Theorem 10.5 (Schur). Let $A \in \mathbb{C}^{n \times n}$. Then there exists a unitary matrix $U$ such that $U^H A U = T$, where $T$ is upper triangular.

Proof: The proof of this theorem is essentially the same as that of Theorem 10.2 except that in this case (using the notation $U$ rather than $X$) the (1,2)-block $u_1^H A U_2$ is not 0. □

In the case of $A \in \mathbb{R}^{n \times n}$, it is thus unitarily similar to an upper triangular matrix, but if $A$ has a complex conjugate pair of eigenvalues, then complex arithmetic is clearly needed to place such eigenvalues on the diagonal of $T$. However, the next theorem shows that every $A \in \mathbb{R}^{n \times n}$ is also orthogonally similar (i.e., real arithmetic) to a quasi-upper-triangular matrix. A quasi-upper-triangular matrix is block upper triangular with $1 \times 1$ diagonal blocks corresponding to its real eigenvalues and $2 \times 2$ diagonal blocks corresponding to its complex conjugate pairs of eigenvalues.

Theorem 10.6 (Murnaghan–Wintner). Let $A \in \mathbb{R}^{n \times n}$. Then there exists an orthogonal matrix $U$ such that $U^T A U = S$, where $S$ is quasi-upper-triangular.

Definition 10.7. The triangular matrix $T$ in Theorem 10.5 is called a Schur canonical form or Schur form. The quasi-upper-triangular matrix $S$ in Theorem 10.6 is called a real Schur canonical form or real Schur form (RSF). The columns of a unitary [orthogonal] matrix $U$ that reduces a matrix to [real] Schur form are called Schur vectors.

Example 10.8. The matrix

$$S = \begin{bmatrix} -2 & 5 & * \\ -2 & 4 & * \\ 0 & 0 & * \end{bmatrix}$$

(entries marked $*$ are illegible in this reproduction) is in RSF: its leading $2 \times 2$ block has the complex conjugate pair of eigenvalues $1 \pm i$, and its (3,3) entry is its real eigenvalue. Its real JCF is

$$J = \begin{bmatrix} 1 & 1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & * \end{bmatrix}.$$

Note that only the first Schur vector (and then only if the corresponding first eigenvalue is real if $U$ is orthogonal) is an eigenvector. However, what is true, and sufficient for virtually all applications (see, for example, [17]), is that the first $k$ Schur vectors span the same $A$-invariant subspace as the eigenvectors corresponding to the first $k$ eigenvalues along the diagonal of $T$ (or $S$).

While every matrix can be reduced to Schur form (or RSF), it is of interest to know when we can go further and reduce a matrix via unitary similarity to diagonal form. The following theorem answers this question.

Theorem 10.9. A matrix $A \in \mathbb{C}^{n \times n}$ is unitarily similar to a diagonal matrix if and only if $A$ is normal (i.e., $A^H A = A A^H$).

Proof: Suppose $U$ is a unitary matrix such that $U^H A U = D$, where $D$ is diagonal. Then

$$A A^H = U D U^H U D^H U^H = U D D^H U^H = U D^H D U^H = A^H A,$$

so $A$ is normal.
Conversely, suppose $A$ is normal and let $U$ be a unitary matrix such that $U^H A U = T$, where $T$ is an upper triangular matrix (Theorem 10.5). Then

$$T T^H = U^H A U U^H A^H U = U^H A A^H U = U^H A^H A U = T^H T.$$

It is then a routine exercise to show that $T$ must, in fact, be diagonal. □

10.2 Definite Matrices

Definition 10.10. A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is

1. positive definite if and only if $x^T A x > 0$ for all nonzero $x \in \mathbb{R}^n$. We write $A > 0$.

2. nonnegative definite (or positive semidefinite) if and only if $x^T A x \geq 0$ for all nonzero $x \in \mathbb{R}^n$. We write $A \geq 0$.

3. negative definite if $-A$ is positive definite. We write $A < 0$.

4. nonpositive definite (or negative semidefinite) if $-A$ is nonnegative definite. We write $A \leq 0$.

Also, if $A$ and $B$ are symmetric matrices, we write $A > B$ if and only if $A - B > 0$ or $B - A < 0$. Similarly, we write $A \geq B$ if and only if $A - B \geq 0$ or $B - A \leq 0$.

Remark 10.11. If $A \in \mathbb{C}^{n \times n}$ is Hermitian, all the above definitions hold except that superscript $H$'s replace $T$'s. Indeed, this is generally true for all results in the remainder of this section that may be stated in the real case for simplicity.

Remark 10.12. If a matrix is neither definite nor semidefinite, it is said to be indefinite.

Theorem 10.13. Let $A = A^H \in \mathbb{C}^{n \times n}$ with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Then for all $x \in \mathbb{C}^n$,

$$\lambda_n x^H x \leq x^H A x \leq \lambda_1 x^H x.$$

Proof: Let $U$ be a unitary matrix that diagonalizes $A$ as in Theorem 10.2. Furthermore, let $y = U^H x$, where $x$ is an arbitrary vector in $\mathbb{C}^n$, and denote the components of $y$ by $\eta_i$, $i \in \underline{n}$. Then

$$x^H A x = (U^H x)^H U^H A U (U^H x) = y^H D y = \sum_{i=1}^n \lambda_i |\eta_i|^2.$$

But clearly

$$\sum_{i=1}^n \lambda_i |\eta_i|^2 \leq \lambda_1 y^H y = \lambda_1 x^H x$$
and

$$\sum_{i=1}^n \lambda_i |\eta_i|^2 \geq \lambda_n y^H y = \lambda_n x^H x,$$

from which the theorem follows. □

Remark 10.14. The ratio $\frac{x^H A x}{x^H x}$ for $A = A^H \in \mathbb{C}^{n \times n}$ and nonzero $x \in \mathbb{C}^n$ is called the Rayleigh quotient of $x$. Theorem 10.13 provides upper ($\lambda_1$) and lower ($\lambda_n$) bounds for the Rayleigh quotient. If $A = A^H \in \mathbb{C}^{n \times n}$ is positive definite, $x^H A x > 0$ for all nonzero $x \in \mathbb{C}^n$, so $0 < \lambda_n \leq \cdots \leq \lambda_1$.

Corollary 10.15. Let $A \in \mathbb{C}^{n \times n}$. Then $\|A\|_2 = \lambda_{\max}^{1/2}(A^H A)$.

Proof: For all $x \in \mathbb{C}^n$ we have

$$\|Ax\|_2 = (x^H A^H A x)^{1/2} \leq \lambda_{\max}^{1/2}(A^H A)\, \|x\|_2 .$$

Let $x$ be an eigenvector corresponding to $\lambda_{\max}(A^H A)$. Then $\frac{\|Ax\|_2}{\|x\|_2} = \lambda_{\max}^{1/2}(A^H A)$, whence

$$\|A\|_2 = \max_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \lambda_{\max}^{1/2}(A^H A). \;\; \Box$$

Definition 10.16. A principal submatrix of an $n \times n$ matrix $A$ is the $(n-k) \times (n-k)$ matrix that remains by deleting $k$ rows and the corresponding $k$ columns. A leading principal submatrix of order $n - k$ is obtained by deleting the last $k$ rows and columns.

Theorem 10.17. A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive definite if and only if any of the following three equivalent conditions hold:

1. The determinants of all leading principal submatrices of $A$ are positive.

2. All eigenvalues of $A$ are positive.

3. $A$ can be written in the form $M^T M$, where $M \in \mathbb{R}^{n \times n}$ is nonsingular.

Theorem 10.18. A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is nonnegative definite if and only if any of the following three equivalent conditions hold:

1. The determinants of all principal submatrices of $A$ are nonnegative.

2. All eigenvalues of $A$ are nonnegative.

3. $A$ can be written in the form $M^T M$, where $M \in \mathbb{R}^{k \times n}$ and $k \geq \mathrm{rank}(A) = \mathrm{rank}(M)$.

Remark 10.19. Note that the determinants of all principal submatrices must be nonnegative in Theorem 10.18.1, not just those of the leading principal submatrices. For example, consider the matrix $A = \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix}$. The determinant of the $1 \times 1$ leading submatrix is 0 and the determinant of the $2 \times 2$ leading submatrix is also 0 (cf. Theorem 10.17). However, the principal submatrix consisting of the (2,2) element is, in fact, negative and $A$ is nonpositive definite.

Remark 10.20. The factor $M$ in Theorem 10.18.3 is not unique. For example, if

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},$$

then $M$ can be

$$[1 \;\; 0], \quad \begin{bmatrix} \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & 0 \end{bmatrix}, \quad \begin{bmatrix} \frac{1}{\sqrt{3}} & 0 \\ \frac{1}{\sqrt{3}} & 0 \\ \frac{1}{\sqrt{3}} & 0 \end{bmatrix}, \; \ldots .$$

Recall that $A \geq B$ if the matrix $A - B$ is nonnegative definite. The following theorem is useful in "comparing" symmetric matrices. Its proof is straightforward from basic definitions.

Theorem 10.21. Let $A, B \in \mathbb{R}^{n \times n}$ be symmetric.

1. If $A \geq B$ and $M \in \mathbb{R}^{n \times m}$, then $M^T A M \geq M^T B M$.

2. If $A > B$ and $M \in \mathbb{R}_m^{n \times m}$, then $M^T A M > M^T B M$.

The following standard theorem is stated without proof (see, for example, [16, p. 181]). It concerns the notion of the "square root" of a matrix. That is, if $A \in \mathbb{R}^{n \times n}$, we say that $S \in \mathbb{R}^{n \times n}$ is a square root of $A$ if $S^2 = A$. In general, matrices (both symmetric and nonsymmetric) have infinitely many square roots. For example, if $A = I_2$, any matrix $S$ of the form $\begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$ is a square root.

Theorem 10.22. Let $A \in \mathbb{R}^{n \times n}$ be nonnegative definite. Then $A$ has a unique nonnegative definite square root $S$. Moreover, $SA = AS$ and $\mathrm{rank}\, S = \mathrm{rank}\, A$ (and hence $S$ is positive definite if $A$ is positive definite).

A stronger form of the third characterization in Theorem 10.17 is available and is known as the Cholesky factorization. It is stated and proved below for the more general Hermitian case.

Theorem 10.23. Let $A \in \mathbb{C}^{n \times n}$ be Hermitian and positive definite. Then there exists a unique nonsingular lower triangular matrix $L$ with positive diagonal elements such that $A = LL^H$.

Proof: The proof is by induction. The case $n = 1$ is trivially true. Write the matrix $A$ in the form

$$A = \begin{bmatrix} B & b \\ b^H & a_{nn} \end{bmatrix}.$$

By our induction hypothesis, assume the result is true for matrices of order $n - 1$ so that $B$ may be written as $B = L_1 L_1^H$, where $L_1 \in \mathbb{C}^{(n-1) \times (n-1)}$ is nonsingular and lower triangular with positive diagonal elements. It remains to prove that we can write the $n \times n$ matrix $A$ in the form

$$\begin{bmatrix} B & b \\ b^H & a_{nn} \end{bmatrix} = \begin{bmatrix} L_1 & 0 \\ c^H & \alpha \end{bmatrix} \begin{bmatrix} L_1^H & c \\ 0 & \alpha \end{bmatrix},$$

where $\alpha$ is positive. Performing the indicated matrix multiplication and equating the corresponding submatrices, we see that we must have $L_1 c = b$ and $a_{nn} = c^H c + \alpha^2$. Clearly $c$ is given simply by $c = L_1^{-1} b$. Substituting in the expression involving $\alpha$, we find $\alpha^2 = a_{nn} - b^H L_1^{-H} L_1^{-1} b = a_{nn} - b^H B^{-1} b$ (= the Schur complement of $B$ in $A$). But we know that

$$0 < \det(A) = \det \begin{bmatrix} B & b \\ b^H & a_{nn} \end{bmatrix} = \det(B) \det(a_{nn} - b^H B^{-1} b).$$

Since $\det(B) > 0$, we must have $a_{nn} - b^H B^{-1} b > 0$. Choosing $\alpha$ to be the positive square root of $a_{nn} - b^H B^{-1} b$ completes the proof. □

10.3 Equivalence Transformations and Congruence

Theorem 10.24. Let $A \in \mathbb{C}_r^{m \times n}$. Then there exist matrices $P \in \mathbb{C}_m^{m \times m}$ and $Q \in \mathbb{C}_n^{n \times n}$ such that

$$P A Q = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}. \tag{10.4}$$

Proof: A classical proof can be consulted in, for example, [21, p. 131]. Alternatively, suppose $A$ has an SVD of the form (5.2) in its complex version. Then

$$\begin{bmatrix} S^{-1} & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} U_1^H \\ U_2^H \end{bmatrix} A V = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}.$$

Take $P = \begin{bmatrix} S^{-1} & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} U_1^H \\ U_2^H \end{bmatrix}$ and $Q = V$ to complete the proof. □

Note that the greater freedom afforded by the equivalence transformation of Theorem 10.24, as opposed to the more restrictive situation of a similarity transformation, yields a far "simpler" canonical form (10.4). However, numerical procedures for computing such an equivalence directly via, say, Gaussian or elementary row and column operations, are generally unreliable. The numerically preferred equivalence is, of course, the unitary equivalence known as the SVD. However, the SVD is relatively expensive to compute and other canonical forms exist that are intermediate between (10.4) and the SVD; see, for example, [7, Ch. 5], [4, Ch. 2]. Two such forms are stated here. They are more stably computable than (10.4) and more efficiently computable than a full SVD. Many similar results are also available.
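For exact small examples, the elementary-operation route to (10.4) is nonetheless easy to carry out (keeping in mind the reliability caveat above for floating-point problems). The following pure-Python sketch (all names are ours; the pivot tolerance is an arbitrary illustrative choice) accumulates $P$ and $Q$ while reducing $A$ by row and column operations:

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def equivalence_canonical(A, tol=1e-12):
    """Reduce A (m x n, nested lists) to P A Q = [[I_r, 0], [0, 0]] by
    elementary row and column operations; returns (P, Q, r)."""
    m, n = len(A), len(A[0])
    A = [row[:] for row in A]
    P = [[float(i == j) for j in range(m)] for i in range(m)]  # accumulates row ops
    Q = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulates column ops
    r = 0
    while r < min(m, n):
        piv = next(((i, j) for i in range(r, m) for j in range(r, n)
                    if abs(A[i][j]) > tol), None)
        if piv is None:
            break
        i0, j0 = piv
        A[r], A[i0] = A[i0], A[r]; P[r], P[i0] = P[i0], P[r]   # row swap
        for M in (A, Q):                                       # column swap
            for row in M:
                row[r], row[j0] = row[j0], row[r]
        d = A[r][r]
        A[r] = [x / d for x in A[r]]; P[r] = [x / d for x in P[r]]
        for i in range(m):                                     # zero the pivot column
            if i != r and A[i][r] != 0.0:
                f = A[i][r]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
                P[i] = [x - f * y for x, y in zip(P[i], P[r])]
        for j in range(n):                                     # zero the pivot row
            if j != r and A[r][j] != 0.0:
                f = A[r][j]
                for M in (A, Q):
                    for row in M:
                        row[j] -= f * row[r]
        r += 1
    return P, Q, r
```

For the rank-one matrix $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$ this returns $r = 1$ with $PAQ = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$.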
Theorem 10.25 (Complete Orthogonal Decomposition). Let $A \in \mathbb{C}_r^{m \times n}$. Then there exist unitary matrices $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ such that

$$U^H A V = \begin{bmatrix} R & 0 \\ 0 & 0 \end{bmatrix}, \tag{10.5}$$

where $R \in \mathbb{C}_r^{r \times r}$ is upper (or lower) triangular with positive diagonal elements.

Proof: For the proof, see [4]. □

Theorem 10.26. Let $A \in \mathbb{C}_r^{m \times n}$. Then there exists a unitary matrix $Q \in \mathbb{C}^{m \times m}$ and a permutation matrix $\Pi \in \mathbb{C}^{n \times n}$ such that

$$Q A \Pi = \begin{bmatrix} R & S \\ 0 & 0 \end{bmatrix}, \tag{10.6}$$

where $R \in \mathbb{C}_r^{r \times r}$ is upper triangular and $S \in \mathbb{C}^{r \times (n-r)}$ is arbitrary but in general nonzero.

Proof: For the proof, see [4]. □

Remark 10.27. When $A$ has full column rank but is "near" a rank deficient matrix, various rank revealing QR decompositions are available that can sometimes detect such phenomena at a cost considerably less than a full SVD. Again, see [4] for details.

Definition 10.28. Let $A \in \mathbb{C}^{n \times n}$ and $X \in \mathbb{C}_n^{n \times n}$. The transformation $A \mapsto X^H A X$ is called a congruence. Note that a congruence is a similarity if and only if $X$ is unitary.

Note that congruence preserves the property of being Hermitian; i.e., if $A$ is Hermitian, then $X^H A X$ is also Hermitian. It is of interest to ask what other properties of a matrix are preserved under congruence. It turns out that the principal property so preserved is the sign of each eigenvalue.

Definition 10.29. Let $A = A^H \in \mathbb{C}^{n \times n}$ and let $\pi$, $\nu$, and $\zeta$ denote the numbers of positive, negative, and zero eigenvalues, respectively, of $A$. Then the inertia of $A$ is the triple of numbers $\mathrm{In}(A) = (\pi, \nu, \zeta)$. The signature of $A$ is given by $\mathrm{sig}(A) = \pi - \nu$.

Example 10.30.

1. $\mathrm{In}\,\mathrm{diag}(1, 1, -1, 0) = (2, 1, 1)$.

2. If $A = A^H \in \mathbb{C}^{n \times n}$, then $A > 0$ if and only if $\mathrm{In}(A) = (n, 0, 0)$.

3. If $\mathrm{In}(A) = (\pi, \nu, \zeta)$, then $\mathrm{rank}(A) = \pi + \nu$.

Theorem 10.31 (Sylvester's Law of Inertia). Let $A = A^H \in \mathbb{C}^{n \times n}$ and $X \in \mathbb{C}_n^{n \times n}$. Then $\mathrm{In}(A) = \mathrm{In}(X^H A X)$.

Proof: For the proof, see, for example, [21, p. 134]. □

Theorem 10.31 guarantees that rank and signature of a matrix are preserved under congruence.
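Sylvester's law also underlies a practical way to count eigenvalue signs without computing eigenvalues: factor $B = LDL^H$ with $D$ diagonal (so $D = X^H B X$ with $X = L^{-H}$, a congruence) and read the inertia off the signs of $D$. A pure-Python sketch (the function name is ours; no pivoting is done, so it assumes the pivots encountered are nonzero):

```python
def ldl_inertia(B, tol=1e-12):
    """Compute In(B) = (pi, nu, zeta) for a real symmetric B from an LDL^T
    factorization without pivoting (assumes all pivots encountered are
    nonzero).  By Sylvester's law, the signs of D carry the inertia of B."""
    n = len(B)
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = B[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            L[i][j] = (B[i][j] - sum(L[i][k] * L[j][k] * d[k]
                                     for k in range(j))) / d[j]
    pi = sum(1 for x in d if x > tol)
    nu = sum(1 for x in d if x < -tol)
    return pi, nu, n - pi - nu
```

For $B = X^T \mathrm{diag}(2, -3, 1) X$ with $X = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$, i.e. $B = \begin{bmatrix} -10 & -6 & 0 \\ -6 & -2 & 1 \\ 0 & 1 & 1 \end{bmatrix}$, the function returns $(2, 1, 0) = \mathrm{In}(\mathrm{diag}(2, -3, 1))$, as Theorem 10.31 predicts.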
104 Chapter 10. Canonical Forms
Theorem 10.32. Let A = A^H ∈ C^{n×n} with In(A) = (π, ν, ζ). Then there exists a nonsingular
matrix X ∈ C^{n×n} such that X^H AX = diag(1, ..., 1, −1, ..., −1, 0, ..., 0), where the number of
1's is π, the number of −1's is ν, and the number of 0's is ζ.
Proof: Let λ_1, ..., λ_n denote the eigenvalues of A and order them such that the first π are
positive, the next ν are negative, and the final ζ are 0. By Theorem 10.2 there exists a unitary
matrix U such that U^H AU = diag(λ_1, ..., λ_n). Define the n × n matrix

    W = diag(1/√λ_1, ..., 1/√λ_π, 1/√(−λ_{π+1}), ..., 1/√(−λ_{π+ν}), 1, ..., 1).

Then it is easy to check that X = UW yields the desired result. □
10.3.1 Block matrices and definiteness
Theorem 10.33. Suppose A = A^T and D = D^T. Then

    [ A    B ]
    [ B^T  D ]  >  0

if and only if either A > 0 and D − B^T A^{−1} B > 0, or D > 0 and A − B D^{−1} B^T > 0.
Proof: The proof follows by considering, for example, the congruence

    [ A    B ]       [ I  −A^{−1}B ]^T [ A    B ] [ I  −A^{−1}B ]
    [ B^T  D ]  ↦   [ 0   I       ]   [ B^T  D ] [ 0   I       ] .

The details are straightforward and are left to the reader. □
Remark 10.34. Note the symmetric Schur complements of A (or D) in the theorem.
Theorem 10.35. Suppose A = A^T and D = D^T. Then

    [ A    B ]
    [ B^T  D ]  ≥  0

if and only if A ≥ 0, AA⁺B = B, and D − B^T A⁺ B ≥ 0.
Proof: Consider the congruence with

    [ I  −A⁺B ]
    [ 0   I   ]

and proceed as in the proof of Theorem 10.33. □
10.4 Rational Canonical Form
One final canonical form to be mentioned is the rational canonical form.
10.4. Rational Canonical Form 105
Definition 10.36. A matrix A ∈ R^{n×n} is said to be nonderogatory if its minimal polynomial
and characteristic polynomial are the same or, equivalently, if its Jordan canonical form
has only one block associated with each distinct eigenvalue.
Suppose A ∈ R^{n×n} is a nonderogatory matrix and suppose its characteristic polynomial
is π(λ) = λ^n − (a_0 + a_1 λ + ··· + a_{n−1} λ^{n−1}). Then it can be shown (see [12]) that A
is similar to a matrix of the form

    [ 0    1    0   ···  0       ]
    [ 0    0    1   ···  0       ]
    [ ⋮              ⋱    ⋮      ]     (10.7)
    [ 0    0    0   ···  1       ]
    [ a_0  a_1  a_2 ···  a_{n−1} ]
Definition 10.37. A matrix A ∈ R^{n×n} of the form (10.7) is called a companion matrix or
is said to be in companion form.
Companion matrices also appear in the literature in several equivalent forms. To
illustrate, consider the companion matrix
    [ 0    1    0    0   ]
    [ 0    0    1    0   ]     (10.8)
    [ 0    0    0    1   ]
    [ a_0  a_1  a_2  a_3 ]
This matrix is a special case of a matrix in lower Hessenberg form. Using the reverse-order
identity similarity P given by (9.18), A is easily seen to be similar to the following matrix
in upper Hessenberg form:
    [ a_3  a_2  a_1  a_0 ]
    [ 1    0    0    0   ]     (10.9)
    [ 0    1    0    0   ]
    [ 0    0    1    0   ]
Moreover, since a matrix is similar to its transpose (see Exercise 13 in Chapter 9), the
following are also companion matrices similar to the above:

    [ 0  0  0  a_0 ]     [ a_3  1  0  0 ]
    [ 1  0  0  a_1 ]     [ a_2  0  1  0 ]
    [ 0  1  0  a_2 ] ,   [ a_1  0  0  1 ]     (10.10)
    [ 0  0  1  a_3 ]     [ a_0  0  0  0 ]
Notice that in all cases a companion matrix is nonsingular if and only if a_0 ≠ 0.
In fact, the inverse of a nonsingular companion matrix is again in companion form. For
example,
    [ 0    1    0    0   ]^{−1}     [ −a_1/a_0  −a_2/a_0  −a_3/a_0  1/a_0 ]
    [ 0    0    1    0   ]          [  1          0          0        0   ]
    [ 0    0    0    1   ]      =   [  0          1          0        0   ]     (10.11)
    [ a_0  a_1  a_2  a_3 ]          [  0          0          1        0   ]
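These companion-form facts are easy to experiment with. The sketch below (ours, not the text's; the coefficient vector is arbitrary) builds the form (10.7) with NumPy, checks its characteristic polynomial, and confirms that the inverse has the companion structure of (10.11):

```python
import numpy as np

def companion(a):
    # Bottom-row companion matrix (10.7) for pi(lam) = lam^n - (a_0 + ... + a_{n-1} lam^{n-1})
    n = len(a)
    C = np.zeros((n, n))
    C[:-1, 1:] = np.eye(n - 1)   # superdiagonal of 1's
    C[-1, :] = a                 # last row carries the coefficients
    return C

a = np.array([3., -2., 5., 1.])    # a_0, ..., a_3; a_0 != 0, so C is nonsingular
C = companion(a)

# Characteristic polynomial coefficients, highest power first: lam^4 - lam^3 - 5 lam^2 + 2 lam - 3
print(np.round(np.poly(C), 6))     # [ 1. -1. -5.  2. -3.]

# The inverse is again in companion form, as in (10.11)
Cinv = np.linalg.inv(C)
print(np.round(Cinv, 6))
```

The first row of the inverse is [−a_1/a_0, −a_2/a_0, −a_3/a_0, 1/a_0] and the remaining rows shift the identity, exactly as in (10.11).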
106 Chapter 10. Canonical Forms
with a similar result for companion matrices of the form (10.10).
If a companion matrix of the form (10.7) is singular, i.e., if a_0 = 0, then its pseudo-
inverse can still be computed. Let a ∈ R^{n−1} denote the vector [a_1, a_2, ..., a_{n−1}]^T and let
c = 1/(1 + a^T a). Then it is easily verified that
    [ 0  1    0    ···  0       ]+      [    0      ···     0         0   ]
    [ 0  0    1    ···  0       ]       [                                 ]
    [ ⋮             ⋱    ⋮      ]   =   [      I − c a a^T          c a   ] .
    [ 0  a_1  a_2  ···  a_{n−1} ]
Note that I − caa^T = (I + aa^T)^{−1}, and hence the pseudoinverse of a singular companion
matrix is not a companion matrix unless a = 0.
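A quick numerical check of this pseudoinverse formula (our own sketch; the coefficients a_1, a_2, a_3 are arbitrary) against NumPy's SVD-based pinv:

```python
import numpy as np

# Singular companion matrix (a_0 = 0) and the closed-form pseudoinverse
a = np.array([2., -1., 3.])                 # a_1, ..., a_{n-1}
n = len(a) + 1
C = np.zeros((n, n))
C[:-1, 1:] = np.eye(n - 1)
C[-1, 1:] = a                               # a_0 = 0: the first column is zero

c = 1.0 / (1.0 + a @ a)
Cplus = np.zeros((n, n))                    # first row is zero
Cplus[1:, :-1] = np.eye(n - 1) - c * np.outer(a, a)
Cplus[1:, -1] = c * a

print(np.allclose(Cplus, np.linalg.pinv(C)))   # True
```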
Companion matrices have many other interesting properties, among which, and per-
haps surprisingly, is the fact that their singular values can be found in closed form; see
[14].
Theorem 10.38. Let σ_1 ≥ σ_2 ≥ ··· ≥ σ_n be the singular values of the companion matrix
A in (10.7). Let α = a_1² + a_2² + ··· + a_{n−1}² and γ = 1 + a_0² + α. Then

    σ_1² = (γ + √(γ² − 4a_0²)) / 2,
    σ_i  = 1    for i = 2, 3, ..., n − 1,
    σ_n² = (γ − √(γ² − 4a_0²)) / 2.
If a_0 ≠ 0, the largest and smallest singular values can also be written in the equivalent form
Remark 10.39. Explicit formulas for all the associated right and left singular vectors can
also be derived easily.
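Theorem 10.38 can be checked directly against an SVD. In the sketch below (ours, not the text's; the coefficients are arbitrary), alpha and gamma follow the theorem's notation:

```python
import numpy as np

# Closed-form singular values of a companion matrix (Theorem 10.38) versus a direct SVD
a = np.array([0.5, -1., 2., 0.25, 1.5])     # a_0, ..., a_{n-1} with n = 5
n = len(a)
C = np.zeros((n, n))
C[:-1, 1:] = np.eye(n - 1)
C[-1, :] = a

alpha = np.sum(a[1:] ** 2)
gamma = 1 + a[0] ** 2 + alpha
disc = np.sqrt(gamma ** 2 - 4 * a[0] ** 2)

sigma = np.linalg.svd(C, compute_uv=False)  # descending order
print(sigma)
# Largest and smallest from the theorem; the middle n-2 singular values are all 1
print(np.sqrt((gamma + disc) / 2), np.sqrt((gamma - disc) / 2))
```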
If A ∈ R^{n×n} is derogatory, i.e., has more than one Jordan block associated with
at least one eigenvalue, then it is not similar to a companion matrix of the form (10.7).
However, it can be shown that a derogatory matrix is similar to a block diagonal matrix,
each of whose diagonal blocks is a companion matrix. Such matrices are said to be in
rational canonical form (or Frobenius canonical form). For details, see, for example, [12].
Companion matrices appear frequently in the control and signal processing literature
but unfortunately they are often very difficult to work with numerically. Algorithms to reduce
an arbitrary matrix to companion form are numerically unstable. Moreover, companion
matrices are known to possess many undesirable numerical properties. For example, in
general and especially as n increases, their eigenstructure is extremely ill conditioned,
nonsingular ones are nearly singular, stable ones are nearly unstable, and so forth [14].
Exercises 107
Companion matrices and rational canonical forms are generally to be avoided in floating-
point computation.
Remark 10.40. Theorem 10.38 yields some understanding of why difficult numerical
behavior might be expected for companion matrices. For example, when solving linear
systems of equations of the form (6.2), one measure of numerical sensitivity is κ_p(A) =
‖A‖_p ‖A^{−1}‖_p, the so-called condition number of A with respect to inversion and with respect
to the matrix p-norm. If this number is large, say O(10^k), one may lose up to k digits of
precision. In the 2-norm, this condition number is the ratio of largest to smallest singular
values which, by the theorem, can be determined explicitly as
    κ₂(A) = (γ + √(γ² − 4a_0²)) / (2|a_0|).
It is easy to show that γ/(2|a_0|) ≤ κ₂(A) ≤ γ/|a_0|, and when a_0 is small or γ is large (or
both), then κ₂(A) ≈ γ/|a_0|. It is not unusual for γ to be large for large n. Note that explicit
formulas for κ₁(A) and κ_∞(A) can also be determined easily by using (10.11).
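The closed form for κ₂(A) in Remark 10.40 is easy to exercise. The following sketch (ours; the coefficients are random except for a deliberately small a_0) compares the γ-based expression with numpy.linalg.cond and illustrates how ill conditioned companion matrices become:

```python
import numpy as np

# kappa_2 of companion matrices from the closed form versus numpy's direct computation
for n in (5, 10, 20):
    rng = np.random.default_rng(n)
    a = rng.standard_normal(n)
    a[0] = 0.01                      # small a_0 makes kappa_2 ~ gamma/|a_0| large
    C = np.zeros((n, n))
    C[:-1, 1:] = np.eye(n - 1)
    C[-1, :] = a
    gamma = 1 + np.sum(a ** 2)
    disc = np.sqrt(gamma ** 2 - 4 * a[0] ** 2)
    kappa_closed = (gamma + disc) / (2 * abs(a[0]))
    print(n, kappa_closed, np.linalg.cond(C))
```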
EXERCISES
1. Show that if a triangular matrix is normal, then it must be diagonal.
2. Prove that if A ∈ R^{n×n} is normal, then N(A) = N(A^T).
3. Let A ∈ C^{n×n} and define ρ(A) = max_{λ∈Λ(A)} |λ|. Then ρ(A) is called the spectral
   radius of A. Show that if A is normal, then ρ(A) = ‖A‖₂. Show that the converse
   is true if n = 2.
4. Let A ∈ C^{n×n} be normal with eigenvalues λ_1, ..., λ_n and singular values σ_1 ≥ σ_2 ≥
   ··· ≥ σ_n ≥ 0. Show that σ_i(A) = |λ_i(A)| for i ∈ n.
5. Use the reverseorder identity matrix P introduced in (9.18) and the matrix U in
   Theorem 10.5 to find a unitary matrix Q that reduces A ∈ C^{n×n} to lower triangular
form.
6. Let A ∈ C^{2×2}. Find a unitary matrix U such that
7. If A ∈ R^{n×n} is positive definite, show that A^{−1} must also be positive definite.
8. Suppose A ∈ R^{n×n} is positive definite. Is [A I; I A^{−1}] > 0?
9. Let R, S ∈ R^{n×n} be symmetric. Show that [R I; I S] > 0 if and only if S > 0 and
   R > S^{−1}.
108 Chapter 10. Canonical Forms

10. Find the inertia of the following matrices:

    (a) [ ~ ~ ],   (b) [ 2    1+j ],   (c) [ ~ ~ ],   (d) [ 1    1+j ].
                       [ 1−j  2   ]                       [ 1−j  1   ]
Chapter 11
Linear Differential and
Difference Equations
11.1 Differential Equations
In this section we study solutions of the linear homogeneous system of differential equations
    ẋ(t) = Ax(t);   x(t_0) = x_0 ∈ R^n     (11.1)

for t ≥ t_0. This is known as an initial-value problem. We restrict our attention in this
chapter only to the so-called time-invariant case, where the matrix A ∈ R^{n×n} is constant
and does not depend on t. The solution of (11.1) is then known always to exist and be
unique. It can be described conveniently in terms of the matrix exponential.
Definition 11.1. For all A ∈ R^{n×n}, the matrix exponential e^A ∈ R^{n×n} is defined by the
power series

    e^A = Σ_{k=0}^{+∞} (1/k!) A^k.     (11.2)
The series (11.2) can be shown to converge for all A (it has radius of convergence equal
to +∞). The solution of (11.1) involves the matrix

    e^{tA} = Σ_{k=0}^{+∞} (t^k/k!) A^k,     (11.3)
which thus also converges for all A and uniformly in t.
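As a quick illustration (ours, not the text's), the truncated series (11.3) already agrees with SciPy's expm to machine precision for a small matrix:

```python
import numpy as np
from scipy.linalg import expm

# Truncated power series for e^{tA} versus scipy's expm (illustrative matrix)
A = np.array([[0., 1.],
              [-2., -3.]])
t = 0.7

S = np.zeros_like(A)
term = np.eye(2)
for k in range(1, 30):
    S += term                   # adds (tA)^{k-1}/(k-1)!
    term = term @ (t * A) / k
print(S)
print(expm(t * A))
```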
11.1.1 Properties of the matrix exponential
1. e^0 = I.

   Proof: This follows immediately from Definition 11.1 by setting A = 0.
2. For all A ∈ R^{n×n}, (e^A)^T = e^{A^T}.

   Proof: This follows immediately from Definition 11.1 and linearity of the transpose.
109
110 Chapter 11. Linear Differential and Difference Equations
3. For all A ∈ R^{n×n} and for all t, τ ∈ R, e^{(t+τ)A} = e^{tA} e^{τA} = e^{τA} e^{tA}.

   Proof: Note that

       e^{(t+τ)A} = I + (t + τ)A + ((t + τ)²/2!) A² + ···

   and

       e^{tA} e^{τA} = (I + tA + (t²/2!) A² + ···)(I + τA + (τ²/2!) A² + ···).

   Compare like powers of A in the above two equations and use the binomial theorem
   on (t + τ)^k.
4. For all A, B ∈ R^{n×n} and for all t ∈ R, e^{t(A+B)} = e^{tA} e^{tB} = e^{tB} e^{tA} if and only if A
   and B commute, i.e., AB = BA.

   Proof: Note that

       e^{t(A+B)} = I + t(A + B) + (t²/2!)(A + B)² + ···

   and

       e^{tA} e^{tB} = (I + tA + (t²/2!) A² + ···)(I + tB + (t²/2!) B² + ···),

   while

       e^{tB} e^{tA} = (I + tB + (t²/2!) B² + ···)(I + tA + (t²/2!) A² + ···).

   Compare like powers of t in the first equation and the second or third and use the
   binomial theorem on (A + B)^k and the commutativity of A and B.
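Property 4 is easy to see numerically. In the sketch below (ours; the matrices are illustrative), B is a polynomial in A, so the two commute, while C and D do not:

```python
import numpy as np
from scipy.linalg import expm

t = 0.5
A = np.array([[1., 2.],
              [0., 1.]])
B = 3.0 * A - 2.0 * np.eye(2)        # a polynomial in A, so AB = BA
print(np.allclose(expm(t * (A + B)), expm(t * A) @ expm(t * B)))   # True

C = np.array([[0., 1.],
              [0., 0.]])
D = np.array([[0., 0.],
              [1., 0.]])             # CD != DC
print(np.allclose(expm(t * (C + D)), expm(t * C) @ expm(t * D)))   # False
```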
5. For all A ∈ R^{n×n} and for all t ∈ R, (e^{tA})^{−1} = e^{−tA}.

   Proof: Simply take τ = −t in property 3.
6. Let L denote the Laplace transform and L^{−1} the inverse Laplace transform. Then for
   all A ∈ R^{n×n} and for all t ∈ R,

   (a) L{e^{tA}} = (sI − A)^{−1}.
   (b) L^{−1}{(sI − A)^{−1}} = e^{tA}.

   Proof: We prove only (a). Part (b) follows similarly.

       L{e^{tA}} = ∫_0^{+∞} e^{−st} e^{tA} dt

                 = ∫_0^{+∞} e^{t(A−sI)} dt    since A and sI commute
11.1. Differential Equations 111
                 = ∫_0^{+∞} Σ_{i=1}^n e^{(λ_i−s)t} x_i y_i^H dt    assuming A is diagonalizable

                 = Σ_{i=1}^n [∫_0^{+∞} e^{(λ_i−s)t} dt] x_i y_i^H

                 = Σ_{i=1}^n (1/(s − λ_i)) x_i y_i^H    assuming Re s > Re λ_i for i ∈ n

                 = (sI − A)^{−1}.
The matrix (sI − A)^{−1} is called the resolvent of A and is defined for all s not in Λ(A).
Notice in the proof that we have assumed, for convenience, that A is diagonalizable.
If this is not the case, the scalar dyadic decomposition can be replaced by

    e^{t(A−sI)} = Σ_{i=1}^m X_i e^{t(J_i−sI)} Y_i^H
using the JCF. All succeeding steps in the proof then follow in a straightforward way.
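The identity L{e^{tA}} = (sI − A)^{−1} of property 6 can be confirmed by crude quadrature of the defining integral (our sketch; the matrix, the value of s, and the truncation T are illustrative):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1., 4.],
              [0., -2.]])
s = 1.5                               # Re s > max Re lambda_i = -1

# Composite trapezoidal rule on [0, T]; the integrand decays like e^{(lambda-s)t}
T, m = 40.0, 4000
dt = T / m
ts = np.linspace(0.0, T, m + 1)
vals = np.array([np.exp(-s * t) * expm(t * A) for t in ts])
integral = (vals[0] + vals[-1]) * 0.5 + vals[1:-1].sum(axis=0)
integral *= dt

resolvent = np.linalg.inv(s * np.eye(2) - A)
print(integral)
print(resolvent)
```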
7. For all A ∈ R^{n×n} and for all t ∈ R, (d/dt)(e^{tA}) = Ae^{tA} = e^{tA} A.

   Proof: Since the series (11.3) is uniformly convergent, it can be differentiated term-by-
   term, from which the result follows immediately. Alternatively, the formal definition

       (d/dt)(e^{tA}) = lim_{Δt→0} (e^{(t+Δt)A} − e^{tA}) / Δt

   can be employed as follows. For any consistent matrix norm,

       ‖(e^{(t+Δt)A} − e^{tA})/Δt − Ae^{tA}‖ = ‖(1/Δt)(e^{(t+Δt)A} − e^{tA}) − Ae^{tA}‖

           = ‖(1/Δt)(e^{ΔtA} − I)e^{tA} − Ae^{tA}‖

           = ‖(1/Δt)(ΔtA + ((Δt)²/2!)A² + ···)e^{tA} − Ae^{tA}‖

           = ‖(Ae^{tA} + (Δt/2!)A²e^{tA} + ···) − Ae^{tA}‖

           = ‖((Δt/2!)A² + ((Δt)²/3!)A³ + ···)e^{tA}‖

           ≤ Δt ‖A²‖ ‖e^{tA}‖ (1/2! + (Δt/3!)‖A‖ + ((Δt)²/4!)‖A‖² + ···)

           ≤ Δt ‖A²‖ ‖e^{tA}‖ (1 + Δt‖A‖ + ((Δt)²/2!)‖A‖² + ···)

           = Δt ‖A²‖ ‖e^{tA}‖ e^{Δt‖A‖}.
112 Chapter 11. Linear Differential and Difference Equations
For fixed t, the right-hand side above clearly goes to 0 as Δt goes to 0. Thus, the
limit exists and equals Ae^{tA}. A similar proof yields the limit e^{tA} A, or one can use the
fact that A commutes with any polynomial of A of finite degree and hence with e^{tA}.
11.1.2 Homogeneous linear differential equations
Theorem 11.2. Let A ∈ R^{n×n}. The solution of the linear homogeneous initial-value problem

    ẋ(t) = Ax(t);   x(t_0) = x_0 ∈ R^n     (11.4)

for t ≥ t_0 is given by

    x(t) = e^{(t−t_0)A} x_0.     (11.5)

Proof: Differentiate (11.5) and use property 7 of the matrix exponential to get ẋ(t) =
Ae^{(t−t_0)A} x_0 = Ax(t). Also, x(t_0) = e^{(t_0−t_0)A} x_0 = x_0 so, by the fundamental existence and
uniqueness theorem for ordinary differential equations, (11.5) is the solution of (11.4). □
11.1.3 Inhomogeneous linear differential equations
Theorem 11.3. Let A ∈ R^{n×n}, B ∈ R^{n×m}, and let the vector-valued function u be given
and, say, continuous. Then the solution of the linear inhomogeneous initial-value problem

    ẋ(t) = Ax(t) + Bu(t);   x(t_0) = x_0 ∈ R^n     (11.6)

for t ≥ t_0 is given by the variation of parameters formula

    x(t) = e^{(t−t_0)A} x_0 + ∫_{t_0}^t e^{(t−s)A} Bu(s) ds.     (11.7)
Proof: Differentiate (11.7) and again use property 7 of the matrix exponential. The general
formula

    (d/dt) ∫_{p(t)}^{q(t)} f(x, t) dx = ∫_{p(t)}^{q(t)} (∂f(x, t)/∂t) dx + f(q(t), t) (dq(t)/dt) − f(p(t), t) (dp(t)/dt)

is used to get ẋ(t) = Ae^{(t−t_0)A} x_0 + ∫_{t_0}^t Ae^{(t−s)A} Bu(s) ds + Bu(t) = Ax(t) + Bu(t). Also,
x(t_0) = e^{(t_0−t_0)A} x_0 + 0 = x_0 so, by the fundamental existence and uniqueness theorem for
ordinary differential equations, (11.7) is the solution of (11.6). □
Remark 11.4. The proof above simply verifies the variation of parameters formula by
direct differentiation. The formula can be derived by means of an integrating factor "trick"
as follows. Premultiply the equation ẋ − Ax = Bu by e^{−tA} to get

    (d/dt)(e^{−tA} x(t)) = e^{−tA} Bu(t).     (11.8)
11.1. Differential Equations 113
Now integrate (11.8) over the interval [t_0, t]:

    ∫_{t_0}^t (d/ds)(e^{−sA} x(s)) ds = ∫_{t_0}^t e^{−sA} Bu(s) ds.

Thus,

    e^{−tA} x(t) − e^{−t_0 A} x(t_0) = ∫_{t_0}^t e^{−sA} Bu(s) ds

and hence

    x(t) = e^{(t−t_0)A} x_0 + ∫_{t_0}^t e^{(t−s)A} Bu(s) ds.
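The variation of parameters formula (11.7) can be checked against a direct numerical integration of (11.6). The sketch below (ours; the system, the input u, and the step counts are illustrative) evaluates (11.7) by the trapezoidal rule and compares with a fixed-step RK4 integration:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0., 1.],
              [-4., -2.]])
B = np.array([[0.], [1.]])
x0 = np.array([1., 0.])
u = lambda t: np.array([np.sin(t)])      # a given continuous input

def x_vp(t, t0=0.0, m=2000):
    # x(t) = e^{(t-t0)A} x0 + integral of e^{(t-s)A} B u(s) ds, via the trapezoidal rule
    ss = np.linspace(t0, t, m + 1)
    vals = np.array([expm((t - s) * A) @ (B @ u(s)) for s in ss])
    w = np.full(m + 1, (t - t0) / m); w[0] *= 0.5; w[-1] *= 0.5
    return expm((t - t0) * A) @ x0 + (w[:, None] * vals).sum(axis=0)

def x_rk4(t, m=2000):
    # Fixed-step RK4 on xdot = Ax + Bu for comparison
    h, x, s = t / m, x0.copy(), 0.0
    f = lambda s, x: A @ x + (B @ u(s))
    for _ in range(m):
        k1 = f(s, x); k2 = f(s + h/2, x + h/2 * k1)
        k3 = f(s + h/2, x + h/2 * k2); k4 = f(s + h, x + h * k3)
        x = x + h/6 * (k1 + 2*k2 + 2*k3 + k4); s += h
    return x

print(x_vp(2.0))
print(x_rk4(2.0))
```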
11.1.4 Linear matrix differential equations
Matrixvalued initialvalue problems also occur frequently. The first is an obvious general
ization of Theorem 11.2, and the proof is essentially the same.
Theorem 11.5. Let A ∈ R^{n×n}. The solution of the matrix linear homogeneous initial-value
problem

    Ẋ(t) = AX(t);   X(t_0) = C ∈ R^{n×n}     (11.9)

for t ≥ t_0 is given by

    X(t) = e^{(t−t_0)A} C.     (11.10)
In the matrix case, we can have coefficient matrices on both the right and left. For
convenience, the following theorem is stated with initial time t_0 = 0.

Theorem 11.6. Let A ∈ R^{n×n}, B ∈ R^{m×m}, and C ∈ R^{n×m}. Then the matrix initial-value
problem

    Ẋ(t) = AX(t) + X(t)B;   X(0) = C     (11.11)

has the solution X(t) = e^{tA} C e^{tB}.
Proof: Differentiate e^{tA} C e^{tB} with respect to t and use property 7 of the matrix exponential.
The fact that X(t) satisfies the initial condition is trivial. □
Corollary 11.7. Let A, C ∈ R^{n×n}. Then the matrix initial-value problem

    Ẋ(t) = AX(t) + X(t)A^T;   X(0) = C     (11.12)

has the solution X(t) = e^{tA} C e^{tA^T}.
When C is symmetric in (11.12), X(t) is symmetric and (11.12) is known as a Lyapunov
differential equation. The initial-value problem (11.11) is known as a Sylvester
differential equation.
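Theorem 11.6 is easy to verify numerically by differencing (our sketch; the matrices are illustrative):

```python
import numpy as np
from scipy.linalg import expm

# X(t) = e^{tA} C e^{tB} solves Xdot = AX + XB, X(0) = C: check the derivative numerically
A = np.array([[0., 1.], [-1., -1.]])
B = np.array([[-2., 1.], [0., -3.]])
C = np.array([[1., 2.], [3., 4.]])

X = lambda t: expm(t * A) @ C @ expm(t * B)

t, h = 0.8, 1e-6
Xdot = (X(t + h) - X(t - h)) / (2 * h)      # central difference
print(np.allclose(Xdot, A @ X(t) + X(t) @ B, atol=1e-6))   # True
print(np.allclose(X(0.0), C))                              # True
```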
114 Chapter 11. Linear Differential and Difference Equations
11.1.5 Modal decompositions
Let A ∈ R^{n×n} and suppose, for convenience, that it is diagonalizable (if A is not diagonaliz-
able, the rest of this subsection is easily generalized by using the JCF and the decomposition
A = Σ_{i=1}^m X_i J_i Y_i^H as discussed in Chapter 9). Then the solution x(t) of (11.4) can be written

    x(t) = e^{(t−t_0)A} x_0

         = (Σ_{i=1}^n e^{λ_i(t−t_0)} x_i y_i^H) x_0

         = Σ_{i=1}^n (y_i^H x_0 e^{λ_i(t−t_0)}) x_i.
The λ_i s are called the modal velocities and the right eigenvectors x_i are called the modal
directions. The decomposition above expresses the solution x(t) as a weighted sum of its
modal velocities and directions.
This modal decomposition can be expressed in a different looking but identical form
if we write the initial condition x_0 as a weighted sum of the right eigenvectors: x_0 = Σ_{i=1}^n α_i x_i.
Then

    x(t) = Σ_{i=1}^n (α_i e^{λ_i(t−t_0)}) x_i.

In the last equality we have used the fact that y_i^H x_j = δ_{ij}.
Similarly, in the inhomogeneous case we can write
    ∫_{t_0}^t e^{(t−s)A} Bu(s) ds = Σ_{i=1}^n (∫_{t_0}^t e^{λ_i(t−s)} y_i^H Bu(s) ds) x_i.
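A numerical version of the homogeneous modal decomposition (our sketch; the matrix and initial condition are illustrative) recovers the expm solution:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1., 2.],
              [0., -3.]])            # diagonalizable, eigenvalues -1 and -3
x0 = np.array([1., 1.])
t, t0 = 1.2, 0.0

lam, Xr = np.linalg.eig(A)           # columns of Xr: right eigenvectors x_i
Yl = np.linalg.inv(Xr).conj().T      # columns of Yl: left eigenvectors, y_i^H x_j = delta_ij

# x(t) = sum_i (y_i^H x0) e^{lam_i (t - t0)} x_i
x_modal = sum((Yl[:, i].conj() @ x0) * np.exp(lam[i] * (t - t0)) * Xr[:, i]
              for i in range(len(lam)))
print(np.allclose(x_modal.real, expm((t - t0) * A) @ x0))   # True
```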
11.1.6 Computation of the matrix exponential
JCF method
Let A ∈ R^{n×n} and suppose X ∈ C^{n×n} is nonsingular and such that X^{−1}AX = J, where J
is a JCF for A. Then

    e^{tA} = X e^{tJ} X^{−1}

          = Σ_{i=1}^n e^{λ_i t} x_i y_i^H     if A is diagonalizable,

          = Σ_{i=1}^m X_i e^{tJ_i} Y_i^H     in general.
11.1. Differential Equations 115
If A is diagonalizable, it is then easy to compute e^{tA} via the formula e^{tA} = X e^{tJ} X^{−1}
since e^{tJ} is simply a diagonal matrix.
In the more general case, the problem clearly reduces simply to the computation of
the exponential of a Jordan block. To be specific, let J_i ∈ C^{k×k} be a Jordan block of the form

    J_i = [ λ  1  0  ···  0 ]
          [ 0  λ  1  ···  0 ]
          [ ⋮        ⋱     ⋮ ]   =  λI + N.
          [ 0  0  ···  λ   1 ]
          [ 0  0  ···  0   λ ]
Clearly λI and N commute. Thus, e^{tJ_i} = e^{tλI} e^{tN} by property 4 of the matrix exponential.
The diagonal part is easy: e^{tλI} = diag(e^{λt}, ..., e^{λt}). But e^{tN} is almost as easy since N is
nilpotent of degree k.
Definition 11.8. A matrix M ∈ R^{n×n} is nilpotent of degree (or index, or grade) p if
M^p = 0, while M^{p−1} ≠ 0.
For the matrix N defined above, it is easy to check that while N has 1's along only
its first superdiagonal (and 0's elsewhere), N² has 1's along only its second superdiagonal,
and so forth. Finally, N^{k−1} has a 1 in its (1, k) element and has 0's everywhere else, and
N^k = 0. Thus, the series expansion of e^{tN} is finite, i.e.,
    e^{tN} = I + tN + (t²/2!) N² + ··· + (t^{k−1}/(k−1)!) N^{k−1}.

Thus,

    e^{tJ_i} = [ e^{λt}  t e^{λt}  (t²/2!) e^{λt}  ···  (t^{k−1}/(k−1)!) e^{λt} ]
             [ 0       e^{λt}    t e^{λt}        ···  (t^{k−2}/(k−2)!) e^{λt} ]
             [ ⋮                             ⋱              ⋮                ]
             [ 0       0        ···             e^{λt}    t e^{λt}           ]
             [ 0       0        ···             0         e^{λt}             ]

In the case when λ is complex, a real version of the above can be worked out.
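The closed form for e^{tJ_i} above is easy to confirm (our sketch; λ, k, and t are arbitrary):

```python
import numpy as np
from scipy.linalg import expm
from math import factorial

# Closed form e^{tJ} for a single k x k Jordan block J = lam*I + N versus expm
lam, k, t = -0.5, 4, 0.9
J = lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

E = np.zeros((k, k))
for i in range(k):
    for j in range(i, k):
        E[i, j] = t ** (j - i) / factorial(j - i) * np.exp(lam * t)

print(np.allclose(E, expm(t * J)))   # True
```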
116 Chapter 11. Linear Differential and Difference Equations
Example 11.9. Let A = [−4 4; −1 0]. Then Λ(A) = {−2, −2} and

    e^{tA} = X e^{tJ} X^{−1}

          = [ 2  1 ] exp ( t [ −2   1 ] ) [  1  −1 ]
            [ 1  1 ]         [  0  −2 ]   [ −1   2 ]

          = [ 2  1 ] [ e^{−2t}  t e^{−2t} ] [  1  −1 ]
            [ 1  1 ] [ 0        e^{−2t}  ] [ −1   2 ]

          = [ (1 − 2t) e^{−2t}   4t e^{−2t}       ]
            [ −t e^{−2t}         (1 + 2t) e^{−2t} ] .
Interpolation method
This method is numerically unstable in finiteprecision arithmetic but is quite effective for
hand calculation in smallorder problems. The method is stated and illustrated for the
exponential function but applies equally well to other functions.
Given A ∈ R^{n×n} and f(λ) = e^{tλ}, compute f(A) = e^{tA}, where t is a fixed scalar.
Suppose the characteristic polynomial of A can be written as π(λ) = Π_{i=1}^m (λ − λ_i)^{n_i},
where the λ_i s are distinct. Define

    g(λ) = α_0 + α_1 λ + ··· + α_{n−1} λ^{n−1},

where α_0, ..., α_{n−1} are n constants that are to be determined. They are, in fact, the unique
solution of the n equations:

    g^{(k)}(λ_i) = f^{(k)}(λ_i);   k = 0, 1, ..., n_i − 1,  i ∈ m.
Here, the superscript (k) denotes the kth derivative with respect to A. With the aiS then
known, the function g is known and f(A) = g(A). The motivation for this method is
the CayleyHamilton Theorem, Theorem 9.3, which says that all powers of A greater than
n  1 can be expressed as linear combinations of A k for k = 0, I, ... , n  1. Thus, all the
terms of order greater than n  1 in the power series for e
t
A can be written in terms of these
lowerorder powers as well. The polynomial g gives the appropriate linear combination.
Example 11.10. Let
A = [  ~  ~ ~ ]
o 01
and f(A) = etA. Then n(A) = (A + 1)3, so m = 1 and nl = 3.
Let g(A) = ao + alA + a2A2. Then the three equations for the aiS are given by
g(I) = f(1) ==> ao al +a2 = e
t
,
g'(1) = f'(1) ==> at  2a2 = te
t
,
g"(I) = 1"(1) ==> 2a2 = t
2
e
t
•
Solving for the $\alpha_i$'s, we find

$$\alpha_0 = e^{-t} + te^{-t} + \tfrac{t^2}{2}e^{-t}, \quad \alpha_1 = te^{-t} + t^2 e^{-t}, \quad \alpha_2 = \tfrac{t^2}{2}e^{-t}.$$

Thus,

$$f(A) = e^{tA} = g(A) = \alpha_0 I + \alpha_1 A + \alpha_2 A^2 = e^{-t}\begin{bmatrix} 1 & t & \frac{t^2}{2} \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix}.$$

Example 11.11. Let $A = \begin{bmatrix} -4 & 4 \\ -1 & 0 \end{bmatrix}$ and $f(\lambda) = e^{t\lambda}$. Then $\pi(\lambda) = (\lambda + 2)^2$, so $m = 1$ and
$n_1 = 2$.

Let $g(\lambda) = \alpha_0 + \alpha_1 \lambda$. Then the defining equations for the $\alpha_i$'s are given by

$$g(-2) = f(-2) \implies \alpha_0 - 2\alpha_1 = e^{-2t},$$
$$g'(-2) = f'(-2) \implies \alpha_1 = te^{-2t}.$$

Solving for the $\alpha_i$'s, we find

$$\alpha_0 = e^{-2t} + 2te^{-2t}, \quad \alpha_1 = te^{-2t}.$$

Thus,

$$f(A) = e^{tA} = g(A) = \alpha_0 I + \alpha_1 A
= (e^{-2t} + 2te^{-2t})\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + te^{-2t}\begin{bmatrix} -4 & 4 \\ -1 & 0 \end{bmatrix}
= \begin{bmatrix} e^{-2t} - 2te^{-2t} & 4te^{-2t} \\ -te^{-2t} & e^{-2t} + 2te^{-2t} \end{bmatrix}.$$
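As a numerical sanity check on the interpolation method, the sketch below (NumPy/SciPy; the value of $t$ is an arbitrary choice for illustration) rebuilds $e^{tA}$ for the matrix of Example 11.11 from the two interpolation conditions and compares the result with `scipy.linalg.expm`:

```python
import numpy as np
from scipy.linalg import expm

# Example 11.11 numerically: A has the single eigenvalue -2 with multiplicity 2,
# so g(x) = a0 + a1*x must match f(x) = e^{tx} and its derivative at x = -2.
A = np.array([[-4.0, 4.0], [-1.0, 0.0]])
t = 0.7

a1 = t * np.exp(-2 * t)          # from g'(-2) = f'(-2)
a0 = np.exp(-2 * t) + 2 * a1     # from g(-2) = f(-2), i.e., a0 - 2*a1 = e^{-2t}

etA = a0 * np.eye(2) + a1 * A    # f(A) = g(A)
print(np.allclose(etA, expm(t * A)))   # True
```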
Other methods

1. Use $e^{tA} = \mathcal{L}^{-1}\{(sI - A)^{-1}\}$ and techniques for inverse Laplace transforms. This
is quite effective for small-order problems, but general nonsymbolic computational
techniques are numerically unstable since the problem is theoretically equivalent to
knowing precisely a JCF.

2. Use Padé approximation. There is an extensive literature on approximating cer-
tain nonlinear functions by rational functions. The matrix analogue yields $e^A \approx$
$D^{-1}(A)N(A)$, where $D(A) = \delta_0 I + \delta_1 A + \cdots + \delta_p A^p$ and $N(A) = \nu_0 I + \nu_1 A +
\cdots + \nu_q A^q$. Explicit formulas are known for the coefficients of the numerator and
denominator polynomials of various orders. Unfortunately, a Padé approximation for
the exponential is accurate only in a neighborhood of the origin; in the matrix case
this means when $\|A\|$ is sufficiently small. This can be arranged by scaling $A$, say, by
multiplying it by $1/2^k$ for sufficiently large $k$ and using the fact that $e^A = \left(e^{(1/2^k)A}\right)^{2^k}$.
Numerical loss of accuracy can occur in this procedure from the successive squarings.

3. Reduce $A$ to (real) Schur form $S$ via the unitary similarity $U$ and use $e^A = U e^S U^H$
and successive recursions up the superdiagonals of the (quasi) upper triangular matrix
$e^S$.

4. Many methods are outlined in, for example, [19]. Reliable and efficient computation
of matrix functions such as $e^A$ and $\log(A)$ remains a fertile area for research.
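Method 1 can be carried out symbolically. The sketch below (SymPy; a hedged illustration of the idea, not the book's procedure verbatim) forms the resolvent $(sI - A)^{-1}$ for the matrix of Example 11.11 and inverts the Laplace transform entry by entry:

```python
import sympy as sp

# Method 1 on the matrix of Example 11.11: form the resolvent (sI - A)^{-1}
# symbolically; here det(sI - A) = (s + 2)^2, so each entry is a simple
# rational function whose inverse Laplace transform SymPy can find.
s, t = sp.symbols('s t', positive=True)
A = sp.Matrix([[-4, 4], [-1, 0]])
R = (s * sp.eye(2) - A).inv()
etA = R.applyfunc(lambda entry: sp.inverse_laplace_transform(entry, s, t))
# The (1,2) entry should simplify to 4*t*exp(-2*t), matching Example 11.11.
print(sp.simplify(etA))
```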
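Method 2 can be sketched in a few lines. The function below is a simplified scaling-and-squaring evaluation of a diagonal Padé approximant (the degree $m = 6$ and the crude scaling rule are illustrative assumptions, not a production algorithm):

```python
import numpy as np

def expm_pade_scaled(A, m=6):
    """Sketch of method 2: diagonal Pade(m, m) of exp at A / 2**k, then k squarings."""
    # crude scaling so that ||A / 2**k||_1 <= 1/2, where the Pade fit is accurate
    k = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A, 1), 2.0**-53)))) + 1)
    X = A / 2.0**k
    n = A.shape[0]
    N = np.eye(n); D = np.eye(n); term = np.eye(n); c = 1.0
    for j in range(1, m + 1):
        c *= (m - j + 1) / (j * (2 * m - j + 1))   # diagonal Pade coefficient recurrence
        term = term @ X
        N = N + c * term
        D = D + (-1) ** j * c * term
    E = np.linalg.solve(D, N)       # Pade approximant of e^X
    for _ in range(k):              # undo the scaling: e^A = (e^{A/2^k})^{2^k}
        E = E @ E
    return E

A = np.array([[-4.0, 4.0], [-1.0, 0.0]])
print(expm_pade_scaled(A))   # matches e^{-2} * [[-1, 4], [-1, 3]] from Example 11.11
```

As the text notes, the successive squarings at the end are exactly where accuracy can be lost in ill-conditioned cases.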
11.2 Difference Equations

In this section we outline solutions of discrete-time analogues of the linear differential
equations of the previous section. Linear discrete-time systems, modeled by systems of
difference equations, exhibit many parallels to the continuous-time differential equation
case, and this observation is exploited frequently.

11.2.1 Homogeneous linear difference equations

Theorem 11.12. Let $A \in \mathbb{R}^{n \times n}$. The solution of the linear homogeneous system of difference
equations

$$x_{k+1} = A x_k \qquad (11.13)$$

for $k \geq 0$ is given by

$$x_k = A^k x_0. \qquad (11.14)$$

Proof: The proof is almost immediate upon substitution of (11.14) into (11.13).

Remark 11.13. Again, we restrict our attention only to the so-called time-invariant
case, where the matrix $A$ in (11.13) is constant and does not depend on $k$. We could also
consider an arbitrary "initial time" $k_0$, but since the system is time-invariant, and since we
want to keep the formulas "clean" (i.e., no double subscripts), we have chosen $k_0 = 0$ for
convenience.

11.2.2 Inhomogeneous linear difference equations

Theorem 11.14. Let $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$ and suppose $\{u_k\}_{k=0}^{+\infty}$ is a given sequence of
$m$-vectors. Then the solution of the inhomogeneous initial-value problem

$$x_{k+1} = A x_k + B u_k \qquad (11.15)$$

for $k \geq 0$
is given by

$$x_k = A^k x_0 + \sum_{j=0}^{k-1} A^{k-j-1} B u_j, \quad k \geq 0. \qquad (11.16)$$

Proof: The proof is again almost immediate upon substitution of (11.16) into (11.15).

11.2.3 Computation of matrix powers

It is clear that solution of linear systems of difference equations involves computation of
$A^k$. One solution method, which is numerically unstable but sometimes useful for hand
calculation, is to use z-transforms, by analogy with the use of Laplace transforms to compute
a matrix exponential. One definition of the z-transform of a sequence $\{g_k\}$ is

$$Z(\{g_k\}) = \sum_{k=0}^{+\infty} g_k z^{-k}.$$

Assuming $|z| > \max_{\lambda \in \Lambda(A)} |\lambda|$, the z-transform of the sequence $\{A^k\}$ is then given by

$$Z(\{A^k\}) = \sum_{k=0}^{+\infty} z^{-k} A^k = I + \frac{1}{z}A + \frac{1}{z^2}A^2 + \cdots
= \left(I - \frac{1}{z}A\right)^{-1} = z(zI - A)^{-1}.$$

Methods based on the JCF are sometimes useful, again mostly for small-order prob-
lems. Assume that $A \in \mathbb{R}^{n \times n}$ and let $X \in \mathbb{R}_n^{n \times n}$ be such that $X^{-1}AX = J$, where $J$ is a
JCF for $A$. Then

$$A^k = (XJX^{-1})^k = XJ^kX^{-1}
= \begin{cases} \sum_{i=1}^{n} \lambda_i^k x_i y_i^H & \text{if $A$ is diagonalizable,} \\ \sum_{i=1}^{m} X_i J_i^k Y_i^H & \text{in general.} \end{cases}$$

If $A$ is diagonalizable, it is then easy to compute $A^k$ via the formula $A^k = XJ^kX^{-1}$
since $J^k$ is simply a diagonal matrix.
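In the diagonalizable case the formula $A^k = XJ^kX^{-1}$ amounts to powering the eigenvalues; a small sketch with a hypothetical matrix:

```python
import numpy as np

# For diagonalizable A, A^k = X diag(lam_i^k) X^{-1} (hypothetical 2x2 example).
A = np.array([[2.0, 1.0], [1.0, 2.0]])   # symmetric, eigenvalues 1 and 3
lam, X = np.linalg.eig(A)
k = 7
Ak = X @ np.diag(lam**k) @ np.linalg.inv(X)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True
```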
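The closed-form solution (11.16) is also easy to check against direct iteration of (11.15); the data below are hypothetical:

```python
import numpy as np

# Check x_k = A^k x_0 + sum_{j=0}^{k-1} A^{k-j-1} B u_j against direct iteration
# of x_{k+1} = A x_k + B u_k  (hypothetical A, B, x_0, and inputs u_j).
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[1.0], [1.0]])
x0 = np.array([1.0, -1.0])
u = [np.array([np.sin(j)]) for j in range(10)]

x = x0.copy()
for j in range(10):
    x = A @ x + B @ u[j]

k = 10
closed = np.linalg.matrix_power(A, k) @ x0 + sum(
    np.linalg.matrix_power(A, k - j - 1) @ (B @ u[j]) for j in range(k))
print(np.allclose(x, closed))   # True
```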
In the general case, the problem again reduces to the computation of the power of a
Jordan block. To be specific, let $J_i \in \mathbb{C}^{p \times p}$ be a Jordan block of the form

$$J_i = \begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix}.$$

Writing $J_i = \lambda I + N$ and noting that $\lambda I$ and the nilpotent matrix $N$ commute, it is
then straightforward to apply the binomial theorem to $(\lambda I + N)^k$ and verify that

$$J_i^k = \begin{bmatrix}
\lambda^k & k\lambda^{k-1} & \binom{k}{2}\lambda^{k-2} & \cdots & \binom{k}{p-1}\lambda^{k-p+1} \\
0 & \lambda^k & k\lambda^{k-1} & \ddots & \vdots \\
\vdots & & \ddots & \ddots & \binom{k}{2}\lambda^{k-2} \\
& & & \lambda^k & k\lambda^{k-1} \\
0 & \cdots & & 0 & \lambda^k
\end{bmatrix}.$$

The symbol $\binom{k}{q}$ has the usual definition of $\frac{k!}{q!(k-q)!}$ and is to be interpreted as 0 if $k < q$.
In the case when $\lambda$ is complex, a real version of the above can be worked out.

Example 11.15. Let $A = \begin{bmatrix} -4 & 4 \\ -1 & 0 \end{bmatrix}$. Then

$$A^k = XJ^kX^{-1}
= \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} (-2)^k & k(-2)^{k-1} \\ 0 & (-2)^k \end{bmatrix}
\begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}
= \begin{bmatrix} (-2)^{k-1}(-2-2k) & k(-2)^{k+1} \\ -k(-2)^{k-1} & (-2)^{k-1}(2k-2) \end{bmatrix}.$$

Basic analogues of other methods such as those mentioned in Section 11.1.6 can also
be derived for the computation of matrix powers, but again no universally "best" method
exists. For an erudite discussion of the state of the art, see [11, Ch. 18].
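Example 11.15 can also be checked through the nilpotent splitting used above: writing $A = -2I + N$ with $N^2 = 0$, the binomial expansion collapses to two terms. A sketch:

```python
import numpy as np

# A = -2I + N with N nilpotent (N @ N = 0), so the binomial expansion of
# (-2I + N)^k collapses to A^k = (-2)^k I + k(-2)^{k-1} N.
A = np.array([[-4.0, 4.0], [-1.0, 0.0]])
N = A + 2 * np.eye(2)                       # nilpotent part
k = 6
Ak = (-2.0)**k * np.eye(2) + k * (-2.0)**(k - 1) * N
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True
```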
11.3 Higher-Order Equations

It is well known that a higher-order (scalar) linear differential equation can be converted to
a first-order linear system. Consider, for example, the initial-value problem

$$y^{(n)}(t) + a_{n-1}y^{(n-1)}(t) + \cdots + a_1\dot{y}(t) + a_0 y(t) = \phi(t) \qquad (11.17)$$

with $\phi(t)$ a given function and $n$ initial conditions

$$y(0) = c_0, \; \dot{y}(0) = c_1, \; \ldots, \; y^{(n-1)}(0) = c_{n-1}. \qquad (11.18)$$
Here, $y^{(m)}$ denotes the $m$th derivative of $y$ with respect to $t$. Define a vector $x(t) \in \mathbb{R}^n$ with
components $x_1(t) = y(t)$, $x_2(t) = \dot{y}(t)$, $\ldots$, $x_n(t) = y^{(n-1)}(t)$. Then

$$\dot{x}_1(t) = x_2(t) = \dot{y}(t),$$
$$\dot{x}_2(t) = x_3(t) = \ddot{y}(t),$$
$$\vdots$$
$$\dot{x}_{n-1}(t) = x_n(t) = y^{(n-1)}(t),$$
$$\dot{x}_n(t) = y^{(n)}(t) = -a_0 y(t) - a_1\dot{y}(t) - \cdots - a_{n-1}y^{(n-1)}(t) + \phi(t)$$
$$\phantom{\dot{x}_n(t)} = -a_0 x_1(t) - a_1 x_2(t) - \cdots - a_{n-1} x_n(t) + \phi(t).$$

These equations can then be rewritten as the first-order linear system

$$\dot{x}(t) = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \\ -a_0 & -a_1 & \cdots & -a_{n-1} \end{bmatrix} x(t) + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ \phi(t) \end{bmatrix}. \qquad (11.19)$$

The initial conditions take the form $x(0) = c = [c_0, c_1, \ldots, c_{n-1}]^T$.

Note that $\det(\lambda I - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0$. However, the companion
matrix $A$ in (11.19) possesses many nasty numerical properties for even moderately sized $n$
and, as mentioned before, is often well worth avoiding, at least for computational purposes.

A similar procedure holds for the conversion of a higher-order difference equation, with
$n$ initial conditions, into a linear first-order difference equation with (vector) initial
condition.
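The relationship between the companion matrix in (11.19) and its characteristic polynomial can be illustrated numerically; the coefficients below are taken from the polynomial $\lambda^2 + 2\lambda + 1$ (the same one that appears in Exercise 12):

```python
import numpy as np

# The companion matrix of lambda^n + a_{n-1} lambda^{n-1} + ... + a_0 has that
# polynomial as its characteristic polynomial, so its eigenvalues are the roots
# (here a double root at -1, computed only to rounding-level accuracy because
# the matrix is defective).
a0, a1 = 1.0, 2.0
A = np.array([[0.0, 1.0],
              [-a0, -a1]])
print(np.linalg.eigvals(A))        # both (numerically) -1
print(np.roots([1.0, a1, a0]))     # same roots
```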
EXERCISES

1. Let $P \in \mathbb{R}^{n \times n}$ be a projection. Show that $e^P \approx I + 1.718P$.

2. Suppose $x, y \in \mathbb{R}^n$ and let $A = xy^T$. Further, let $\alpha = x^T y$. Show that
$e^{tA} = I + g(t, \alpha)\,xy^T$, where

$$g(t, \alpha) = \begin{cases} \frac{1}{\alpha}\left(e^{\alpha t} - 1\right) & \text{if } \alpha \neq 0, \\ t & \text{if } \alpha = 0. \end{cases}$$

3. Let
where $X \in \mathbb{R}^{m \times n}$ is arbitrary. Show that

$$e^A = \begin{bmatrix} eI & (\sinh 1)X \\ 0 & e^{-1}I \end{bmatrix}.$$

4. Let $K$ denote the skew-symmetric matrix

$$K = \begin{bmatrix} 0 & I_n \\ -I_n & 0 \end{bmatrix},$$

where $I_n$ denotes the $n \times n$ identity matrix. A matrix $A \in \mathbb{R}^{2n \times 2n}$ is said to be
Hamiltonian if $K^{-1}A^TK = -A$ and to be symplectic if $K^{-1}A^TK = A^{-1}$.

(a) Suppose $H$ is Hamiltonian and let $\lambda$ be an eigenvalue of $H$. Show that $-\lambda$ must
also be an eigenvalue of $H$.

(b) Suppose $S$ is symplectic and let $\lambda$ be an eigenvalue of $S$. Show that $1/\lambda$ must
also be an eigenvalue of $S$.

(c) Suppose that $H$ is Hamiltonian and $S$ is symplectic. Show that $S^{-1}HS$ must be
Hamiltonian.

(d) Suppose $H$ is Hamiltonian. Show that $e^H$ must be symplectic.

5. Let $\alpha, \beta \in \mathbb{R}$ and

$$A = \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix}.$$

Then show that

$$e^{tA} = \begin{bmatrix} e^{\alpha t}\cos\beta t & e^{\alpha t}\sin\beta t \\ -e^{\alpha t}\sin\beta t & e^{\alpha t}\cos\beta t \end{bmatrix}.$$

6. Find a general expression for

7. Find $e^{tA}$ when $A =$

8. Let

(a) Solve the differential equation

$$\dot{x} = Ax$$

with the given initial condition $x(0)$.
(b) Solve the differential equation

$$\dot{x} = Ax + b$$

with the given initial condition $x(0)$.

9. Consider the initial-value problem

$$\dot{x}(t) = Ax(t); \quad x(0) = x_0$$

for $t \geq 0$. Suppose that $A \in \mathbb{R}^{n \times n}$ is skew-symmetric and let $\alpha = \|x_0\|_2$. Show that
$\|x(t)\|_2 = \alpha$ for all $t > 0$.

10. Consider the $n \times n$ matrix initial-value problem

$$\dot{X}(t) = AX(t) - X(t)A; \quad X(0) = C.$$

Show that the eigenvalues of the solution $X(t)$ of this problem are the same as those
of $C$ for all $t$.

11. The year is 2004 and there are three large "free trade zones" in the world: Asia (A),
Europe (E), and the Americas (R). Suppose certain multinational companies have
total assets of \$40 trillion of which \$20 trillion is in E and \$20 trillion is in R. Each
year half of the Americas' money stays home, a quarter goes to Europe, and a quarter
goes to Asia. For Europe and Asia, half stays home and half goes to the Americas.

(a) Find the matrix $M$ that gives

$$\begin{bmatrix} A \\ E \\ R \end{bmatrix}_{\text{year } k+1} = M \begin{bmatrix} A \\ E \\ R \end{bmatrix}_{\text{year } k}.$$

(b) Find the eigenvalues and right eigenvectors of $M$.

(c) Find the distribution of the companies' assets at year $k$.

(d) Find the limiting distribution of the \$40 trillion as the universe ends, i.e., as
$k \to +\infty$ (i.e., around the time the Cubs win a World Series).

(Exercise adapted from Problem 5.3.11 in [24].)

12. (a) Find the solution of the initial-value problem

$$\ddot{y}(t) + 2\dot{y}(t) + y(t) = 0; \quad y(0) = 1, \; \dot{y}(0) = 0.$$

(b) Consider the difference equation

$$z_{k+2} + 2z_{k+1} + z_k = 0.$$

If $z_0 = 1$ and $z_1 = 2$, what is the value of $z_{1000}$? What is the value of $z_k$ in
general?
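Exercise 4(d) can be checked numerically. In the sketch below, $H = KS$ with $S$ symmetric is one convenient way to generate a Hamiltonian matrix (an assumption for illustration; any Hamiltonian $H$ would do):

```python
import numpy as np
from scipy.linalg import expm

# With K as in Exercise 4, any H = K @ S with S = S^T satisfies
# K^{-1} H^T K = -H (Hamiltonian); its exponential E = e^H should then satisfy
# E^T K E = K, which is equivalent to the symplectic condition K^{-1} E^T K = E^{-1}.
n = 2
K = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n), np.zeros((n, n))]])

rng = np.random.default_rng(0)
W = rng.standard_normal((2 * n, 2 * n))
H = K @ (W + W.T) / 2                    # a random Hamiltonian matrix

E = expm(H)
print(np.allclose(np.linalg.inv(K) @ H.T @ K, -H))   # Hamiltonian check: True
print(np.allclose(E.T @ K @ E, K))                   # e^H is symplectic: True
```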
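One reading of the flows in Exercise 11 leads to the column-stochastic matrix $M$ below (the state ordering $[A, E, R]$ and the matrix itself are our interpretation of the problem statement, not given in the text); iterating $x_{k+1} = Mx_k$ exhibits the limiting distribution:

```python
import numpy as np

# One reading of the asset flows in Exercise 11, state ordered [A, E, R] in
# $trillions; each column of M sums to 1, so total money is conserved.
M = np.array([[0.5, 0.0, 0.25],
              [0.0, 0.5, 0.25],
              [0.5, 0.5, 0.50]])
x = np.array([0.0, 20.0, 20.0])     # start: $20T in Europe, $20T in the Americas
for _ in range(60):
    x = M @ x
print(x)   # approaches the lambda = 1 eigenvector scaled to $40T: [10, 10, 20]
```

The remaining eigenvalues of this $M$ are $0.5$ and $0$, which is why the iteration converges so quickly.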
Chapter 12

Generalized Eigenvalue Problems

12.1 The Generalized Eigenvalue/Eigenvector Problem

In this chapter we consider the generalized eigenvalue problem

$$Ax = \lambda Bx,$$

where $A, B \in \mathbb{C}^{n \times n}$. The standard eigenvalue problem considered in Chapter 9 obviously
corresponds to the special case that $B = I$.

Definition 12.1. A nonzero vector $x \in \mathbb{C}^n$ is a right generalized eigenvector of the pair
$(A, B)$ with $A, B \in \mathbb{C}^{n \times n}$ if there exists a scalar $\lambda \in \mathbb{C}$, called a generalized eigenvalue,
such that

$$Ax = \lambda Bx. \qquad (12.1)$$

Similarly, a nonzero vector $y \in \mathbb{C}^n$ is a left generalized eigenvector corresponding to an
eigenvalue $\lambda$ if

$$y^H A = \lambda y^H B. \qquad (12.2)$$

When the context is such that no confusion can arise, the adjective "generalized"
is usually dropped. As with the standard eigenvalue problem, if $x$ [$y$] is a right [left]
eigenvector, then so is $\alpha x$ [$\alpha y$] for any nonzero scalar $\alpha \in \mathbb{C}$.

Definition 12.2. The matrix $A - \lambda B$ is called a matrix pencil (or pencil of the matrices $A$
and $B$).

As with the standard eigenvalue problem, eigenvalues for the generalized eigenvalue
problem occur where the matrix pencil $A - \lambda B$ is singular.

Definition 12.3. The polynomial $\pi(\lambda) = \det(A - \lambda B)$ is called the characteristic poly-
nomial of the matrix pair $(A, B)$. The roots of $\pi(\lambda)$ are the eigenvalues of the associated
generalized eigenvalue problem.

Remark 12.4. When $A, B \in \mathbb{R}^{n \times n}$, the characteristic polynomial is obviously real, and
hence nonreal eigenvalues must occur in complex conjugate pairs.
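Generalized eigenvalue problems can be solved directly in software; a small sketch using `scipy.linalg.eig` with hypothetical data (here $B$ is nonsingular, so there are exactly $n$ eigenvalues):

```python
import numpy as np
from scipy.linalg import eig

# A small generalized eigenproblem Ax = lambda Bx:
# det(A - lambda B) = (2 - lambda)(3 - 2 lambda), so the eigenvalues are 2 and 1.5.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])
lam, V = eig(A, B)
print(np.sort(lam.real))                            # [1.5, 2.0]
# each right eigenpair satisfies A v = lambda B v:
print(np.allclose(A @ V[:, 0], lam[0] * (B @ V[:, 0])))   # True
```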
Remark 12.5. If $B = I$ (or in general when $B$ is nonsingular), then $\pi(\lambda)$ is a polynomial
of degree $n$, and hence there are $n$ eigenvalues associated with the pencil $A - \lambda B$. However,
when $B \neq I$, in particular, when $B$ is singular, there may be $0$, $k \in \underline{n}$, or infinitely many
eigenvalues associated with the pencil $A - \lambda B$. For example, suppose

$$A = \begin{bmatrix} 1 & 0 \\ 0 & \alpha \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 \\ 0 & \beta \end{bmatrix}, \qquad (12.3)$$

where $\alpha$ and $\beta$ are scalars. Then the characteristic polynomial is

$$\det(A - \lambda B) = (1 - \lambda)(\alpha - \beta\lambda)$$

and there are several cases to consider.

Case 1: $\alpha \neq 0$, $\beta \neq 0$. There are two eigenvalues, $1$ and $\frac{\alpha}{\beta}$.

Case 2: $\alpha = 0$, $\beta \neq 0$. There are two eigenvalues, $1$ and $0$.

Case 3: $\alpha \neq 0$, $\beta = 0$. There is only one eigenvalue, $1$ (of multiplicity 1).

Case 4: $\alpha = 0$, $\beta = 0$. All $\lambda \in \mathbb{C}$ are eigenvalues since $\det(A - \lambda B) \equiv 0$.

Definition 12.6. If $\det(A - \lambda B)$ is not identically zero, the pencil $A - \lambda B$ is said to be
regular; otherwise, it is said to be singular.

Note that if $\mathcal{N}(A) \cap \mathcal{N}(B) \neq 0$, the associated matrix pencil is singular (as in Case
4 above).

Associated with any matrix pencil $A - \lambda B$ is a reciprocal pencil $B - \mu A$ and cor-
responding generalized eigenvalue problem. Clearly the reciprocal pencil has eigenvalues
$\mu = \frac{1}{\lambda}$. It is instructive to consider the reciprocal pencil associated with the example in
Remark 12.5. With $A$ and $B$ as in (12.3), the characteristic polynomial is

$$\det(B - \mu A) = (1 - \mu)(\beta - \alpha\mu)$$

and there are again four cases to consider.

Case 1: $\alpha \neq 0$, $\beta \neq 0$. There are two eigenvalues, $1$ and $\frac{\beta}{\alpha}$.

Case 2: $\alpha = 0$, $\beta \neq 0$. There is only one eigenvalue, $1$ (of multiplicity 1).

Case 3: $\alpha \neq 0$, $\beta = 0$. There are two eigenvalues, $1$ and $0$.

Case 4: $\alpha = 0$, $\beta = 0$. All $\mu \in \mathbb{C}$ are eigenvalues since $\det(B - \mu A) \equiv 0$.

At least for the case of regular pencils, it is apparent where the "missing" eigenvalues have
gone in Cases 2 and 3. That is to say, there is a second eigenvalue "at infinity" for Case 3 of
$A - \lambda B$, with its reciprocal eigenvalue being $0$ in Case 3 of the reciprocal pencil $B - \mu A$.
A similar reciprocal symmetry holds for Case 2.

While there are applications in system theory and control where singular pencils
appear, only the case of regular pencils is considered in the remainder of this chapter. Note
that $A$ and/or $B$ may still be singular. If $B$ is singular, the pencil $A - \lambda B$ always has
fewer than $n$ eigenvalues. If $B$ is nonsingular, the pencil $A - \lambda B$ always has precisely $n$
eigenvalues, since the generalized eigenvalue problem is then easily seen to be equivalent
to the standard eigenvalue problem $B^{-1}Ax = \lambda x$ (or $AB^{-1}w = \lambda w$). However, this turns
out to be a very poor numerical procedure for handling the generalized eigenvalue problem
if $B$ is even moderately ill conditioned with respect to inversion. Numerical methods that
work directly on $A$ and $B$ are discussed in standard textbooks on numerical linear algebra;
see, for example, [7, Sec. 7.7] or [25, Sec. 6.7].
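The case of a singular $B$, i.e., an eigenvalue "at infinity" as in Case 3 above, is visible numerically if the eigenvalues are requested in homogeneous $(\alpha, \beta)$ form (a finite eigenvalue is the ratio $\alpha/\beta$; an infinite one has $\beta = 0$):

```python
import numpy as np
from scipy.linalg import eig

# Case 3 of Remark 12.5 in floating point: a singular B gives an eigenvalue
# "at infinity", reported by SciPy as an (alpha, beta) pair with beta = 0.
A = np.diag([1.0, 2.0])
B = np.diag([1.0, 0.0])
w, V = eig(A, B, homogeneous_eigvals=True)
alpha, beta = w
mask = np.abs(beta) > 1e-12
finite = alpha[mask] / beta[mask]
print(finite.real)                          # the single finite eigenvalue, 1.0
print(int(np.sum(~mask)))                   # one "infinite" eigenvalue (beta = 0)
```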
12.2 Canonical Forms

Just as for the standard eigenvalue problem, canonical forms are available for the generalized
eigenvalue problem. Since the latter involves a pair of matrices, we now deal with equiva-
lencies rather than similarities, and the first theorem deals with what happens to eigenvalues
and eigenvectors under equivalence.

Theorem 12.7. Let $A, B, Q, Z \in \mathbb{C}^{n \times n}$ with $Q$ and $Z$ nonsingular. Then

1. the eigenvalues of the problems $A - \lambda B$ and $QAZ - \lambda QBZ$ are the same (the two
problems are said to be equivalent).

2. if $x$ is a right eigenvector of $A - \lambda B$, then $Z^{-1}x$ is a right eigenvector of $QAZ - \lambda QBZ$.

3. if $y$ is a left eigenvector of $A - \lambda B$, then $Q^{-H}y$ is a left eigenvector of $QAZ - \lambda QBZ$.

Proof:

1. $\det(QAZ - \lambda QBZ) = \det[Q(A - \lambda B)Z] = \det Q \det Z \det(A - \lambda B)$. Since $\det Q$
and $\det Z$ are nonzero, the result follows.

2. The result follows by noting that $(A - \lambda B)x = 0$ if and only if $Q(A - \lambda B)Z(Z^{-1}x) = 0$.

3. Again, the result follows easily by noting that $y^H(A - \lambda B) = 0$ if and only if
$(Q^{-H}y)^H Q(A - \lambda B)Z = 0$.

The first canonical form is an analogue of Schur's Theorem and forms, in fact, the
theoretical foundation for the QZ algorithm, which is the generally preferred method for
solving the generalized eigenvalue problem; see, for example, [7, Sec. 7.7] or [25, Sec. 6.7].

Theorem 12.8. Let $A, B \in \mathbb{C}^{n \times n}$. Then there exist unitary matrices $Q, Z \in \mathbb{C}^{n \times n}$ such that

$$QAZ = T_\alpha, \quad QBZ = T_\beta,$$

where $T_\alpha$ and $T_\beta$ are upper triangular.

By Theorem 12.7, the eigenvalues of the pencil $A - \lambda B$ are then the ratios of the diag-
onal elements of $T_\alpha$ to the corresponding diagonal elements of $T_\beta$, with the understanding
that a zero diagonal element of $T_\beta$ corresponds to an infinite generalized eigenvalue.

There is also an analogue of the Murnaghan-Wintner Theorem for real matrices.
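Theorem 12.8 is precisely what the QZ algorithm computes; a sketch with hypothetical data (SciPy's convention is $A = Q T_\alpha Z^H$, $B = Q T_\beta Z^H$, which leaves the diagonal ratios unchanged):

```python
import numpy as np
from scipy.linalg import qz

# The QZ algorithm triangularizes the pair; the generalized eigenvalues are the
# ratios of corresponding diagonal entries of the two triangular factors.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])
AA, BB, Q, Z = qz(A, B, output='complex')
print(np.diag(AA) / np.diag(BB))            # 1.5 and 2.0, in some order
print(np.allclose(Q @ AA @ Z.conj().T, A))  # True: the factorization reproduces A
```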
Theorem 12.9. Let $A, B \in \mathbb{R}^{n \times n}$. Then there exist orthogonal matrices $Q, Z \in \mathbb{R}^{n \times n}$ such
that

$$QAZ = S, \quad QBZ = T,$$

where $T$ is upper triangular and $S$ is quasi-upper-triangular.

When $S$ has a $2 \times 2$ diagonal block, the $2 \times 2$ subpencil formed with the corresponding
$2 \times 2$ diagonal subblock of $T$ has a pair of complex conjugate eigenvalues. Otherwise, real
eigenvalues are given as above by the ratios of diagonal elements of $S$ to corresponding
elements of $T$.

There is also an analogue of the Jordan canonical form called the Kronecker canonical
form (KCF). A full description of the KCF, including analogues of principal vectors and
so forth, is beyond the scope of this book. In this chapter, we present only statements of
the basic theorems and some examples. The first theorem pertains only to "square" regular
pencils, while the full KCF in all its generality applies also to "rectangular" and singular
pencils.

Theorem 12.10. Let $A, B \in \mathbb{C}^{n \times n}$ and suppose the pencil $A - \lambda B$ is regular. Then there
exist nonsingular matrices $P, Q \in \mathbb{C}^{n \times n}$ such that

$$P(A - \lambda B)Q = \begin{bmatrix} J & 0 \\ 0 & I \end{bmatrix} - \lambda \begin{bmatrix} I & 0 \\ 0 & N \end{bmatrix},$$

where $J$ is a Jordan canonical form corresponding to the finite eigenvalues of $A - \lambda B$ and
$N$ is a nilpotent matrix of Jordan blocks associated with $0$ and corresponding to the infinite
eigenvalues of $A - \lambda B$.

Example 12.11. The matrix pencil

$$\begin{bmatrix} 2 & 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
- \lambda \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

with characteristic polynomial $(\lambda - 2)^2$ has a finite eigenvalue $2$ of multiplicity 2 and three
infinite eigenvalues.
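The characteristic polynomial claimed in Example 12.11 can be verified symbolically; the block pencil below is the $\operatorname{diag}(J - \lambda I,\; I - \lambda N)$ structure of Theorem 12.10:

```python
import sympy as sp

# Example 12.11: det(A - lambda B) for the block pencil diag(J - lambda I, I - lambda N)
# is det(J - lambda I) * det(I - lambda N) = (2 - lambda)^2 * 1 = (lambda - 2)^2.
lam = sp.symbols('lambda')
J = sp.Matrix([[2, 1], [0, 2]])                     # finite eigenvalue 2, multiplicity 2
N = sp.Matrix([[0, 1, 0], [0, 0, 1], [0, 0, 0]])    # 3x3 nilpotent Jordan block
A = sp.diag(J, sp.eye(3))
B = sp.diag(sp.eye(2), N)
p = sp.factor((A - lam * B).det())
print(p)   # (lambda - 2)**2
```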
Theorem 12.12 (Kronecker Canonical Form). Let $A, B \in \mathbb{C}^{m \times n}$. Then there exist
nonsingular matrices $P \in \mathbb{C}^{m \times m}$ and $Q \in \mathbb{C}^{n \times n}$ such that

$$P(A - \lambda B)Q = \operatorname{diag}(L_{l_1}, \ldots, L_{l_s},\; L_{r_1}^T, \ldots, L_{r_t}^T,\; J - \lambda I,\; I - \lambda N),$$
12.2. Canonical Forms 129

where N is nilpotent, both N and J are in Jordan canonical form, and L_k is the (k + 1) × k bidiagonal pencil

          [ λ          ]
          [ 1  λ       ]
    L_k = [    1  .    ]
          [       .  λ ]
          [          1 ]

The l_i are called the left minimal indices while the r_i are called the right minimal indices. Left or right minimal indices can take the value 0.

Example 12.13. Consider a 13 × 12 block diagonal matrix whose diagonal blocks are

    [ 0  0 ]   [ λ  0 ]
    [ 0  0 ],  [ 1  λ ],  [ λ  1  0 ],  J - λI,  I - λN.
    [ 0  0 ]   [ 0  1 ]   [ 0  λ  1 ]

Such a matrix is in KCF. The first block of zeros actually corresponds to L_0, L_0, L_0, L_0^T, L_0^T, where each L_0 has "zero columns" and one row, while each L_0^T has "zero rows" and one column. The second block is L_2 while the third block is L_2^T. The next two blocks correspond to

    J = [ 2  1 ]
        [ 0  2 ]

while the nilpotent matrix N in this example is the 3 × 3 nilpotent Jordan block

    N = [ 0  1  0 ]
        [ 0  0  1 ]
        [ 0  0  0 ].

Just as sets of eigenvectors span A-invariant subspaces in the case of the standard eigenproblem (recall Definition 9.35), there is an analogous geometric concept for the generalized eigenproblem.

Definition 12.14. Let A, B ∈ R^{n×n} and suppose the pencil A - λB is regular. Then V is a deflating subspace if

    dim(AV + BV) = dim V.    (12.4)

Just as in the standard eigenvalue case, there is a matrix characterization of deflating subspace. Specifically, suppose S ∈ R^{n×k} is a matrix whose columns span a k-dimensional subspace S of R^n, i.e., R(S) = S. Then S is a deflating subspace for the pencil A - λB if and only if there exists M ∈ R^{k×k} such that

    AS = BSM.    (12.5)
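As an illustration (not from the text), the matrix characterization (12.5) can be checked numerically on the pencil of Example 12.11, whose leading two coordinates span a deflating subspace; the specific S and M below are our choices, matched to that block structure.

```python
import numpy as np

# Pencil of Example 12.11 (regular; finite eigenvalue 2, three infinite eigenvalues).
A = np.diag([2., 2., 1., 1., 1.]); A[0, 1] = 1.0
B = np.diag([1., 1., 0., 0., 0.]); B[2, 3] = 1.0; B[3, 4] = 1.0

# Columns of S span V = span{e1, e2}; M is the 2 x 2 Jordan block for the eigenvalue 2.
S = np.eye(5)[:, :2]
M = np.array([[2., 1.],
              [0., 2.]])

# Matrix characterization (12.5): AS = BSM.
lhs, rhs = A @ S, B @ S @ M

# Geometric characterization (12.4): dim(AV + BV) = dim V = 2.
dim_sum = np.linalg.matrix_rank(np.hstack((A @ S, B @ S)))
```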
130 Chapter 12. Generalized Eigenvalue Problems

If B = I, then (12.4) becomes dim(AV + V) = dim V, which is clearly equivalent to AV ⊆ V. Similarly, (12.5) becomes AS = SM as before. If the pencil is not regular, there is a concept analogous to deflating subspace called a reducing subspace.

12.3 Application to the Computation of System Zeros

Consider the linear system

    ẋ = Ax + Bu,
    y = Cx + Du

with A ∈ R^{n×n}, B ∈ R^{n×m}, C ∈ R^{p×n}, and D ∈ R^{p×m}. This linear time-invariant state-space model is often used in multivariable control theory, where x(= x(t)) is called the state vector, u is the vector of inputs or controls, and y is the vector of outputs or observables. For details, see, for example, [26].

In general, the (finite) zeros of this system are given by the (finite) complex numbers z, where the "system pencil"

    [ A - zI   B ]
    [   C      D ]    (12.6)

drops rank. In the special case p = m, these values are the generalized eigenvalues of the (n + m) × (n + m) pencil.

Example 12.15. Let

    A = [ -4  -3 ],   B = [ 3 ],   C = [ 1  2 ],   D = 0.
        [  2   1 ]        [ 1 ]

Then the transfer matrix (see [26]) of this system is

    g(s) = C(sI - A)^{-1}B + D = (5s + 14)/(s^2 + 3s + 2),

which clearly has a zero at -2.8. Checking the finite eigenvalues of the pencil (12.6), we find the characteristic polynomial to be

    det [ A - λI   B ] = 5λ + 14,
        [   C      D ]

which has a root at -2.8.

The method of finding system zeros via a generalized eigenvalue problem also works well for general multi-input, multi-output systems. Numerically, however, one must be careful first to "deflate out" the infinite zeros (infinite eigenvalues of (12.6)). This is accomplished by computing a certain unitary equivalence on the system pencil that then yields a smaller generalized eigenvalue problem with only finite generalized eigenvalues (the finite zeros).

The connection between system zeros and the corresponding system pencil is nontrivial. However, we offer some insight below into the special case of a single-input,
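A quick numerical check (not from the text): the code below uses one state-space realization whose transfer function is (5s + 14)/(s^2 + 3s + 2), consistent with the numbers quoted in Example 12.15, and recovers the zero at -2.8 as the only finite generalized eigenvalue of the system pencil.

```python
import numpy as np
from scipy.linalg import eig

# A realization with transfer function (5s + 14)/(s^2 + 3s + 2) (assumed data).
A = np.array([[-4., -3.],
              [ 2.,  1.]])
b = np.array([[3.],
              [1.]])
c = np.array([[1., 2.]])
d = np.array([[0.]])

# System pencil (12.6); its finite generalized eigenvalues are the system zeros.
M = np.block([[A, b],
              [c, d]])
E = np.zeros((3, 3)); E[0, 0] = E[1, 1] = 1.0   # [I 0; 0 0]

w = eig(M, E, right=False)
zeros = w[np.isfinite(w) & (np.abs(w) < 1e6)].real
```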
12.4. Symmetric Generalized Eigenvalue Problems 131

single-output system. Specifically, let B = b ∈ R^n, C = c^T ∈ R^{1×n}, and D = d ∈ R. Furthermore, let g(s) = c^T(sI - A)^{-1}b + d denote the system transfer function (matrix), and assume that g(s) can be written in the form

    g(s) = ν(s)/π(s),

where π(s) is the characteristic polynomial of A, and ν(s) and π(s) are relatively prime (i.e., there are no "pole/zero cancellations").

Suppose z ∈ C is such that

    [ A - zI   b ]
    [  c^T     d ]

is singular. Then there exists a nonzero solution to

    [ A - zI   b ] [ x ]   [ 0 ]
    [  c^T     d ] [ y ] = [ 0 ],

or

    (A - zI)x + by = 0,    (12.7)
    c^T x + dy = 0.        (12.8)

Assuming z is not an eigenvalue of A (i.e., no pole/zero cancellations), then from (12.7) we get

    x = -(A - zI)^{-1}by.    (12.9)

Substituting this in (12.8), we have

    -c^T(A - zI)^{-1}by + dy = 0,

or g(z)y = 0 by the definition of g. Now y ≠ 0 (else x = 0 from (12.9)). Hence g(z) = 0, i.e., z is a zero of g.

12.4 Symmetric Generalized Eigenvalue Problems

A very important special case of the generalized eigenvalue problem

    Ax = λBx    (12.10)

for A, B ∈ R^{n×n} arises when A = A^T and B = B^T > 0. For example, the second-order system of differential equations

    Mẍ + Kx = 0,

where M is a symmetric positive definite "mass matrix" and K is a symmetric "stiffness matrix," is a frequently employed model of structures or vibrating systems and yields a generalized eigenvalue problem of the form (12.10).

Since B is positive definite it is nonsingular. Thus, the problem (12.10) is equivalent to the standard eigenvalue problem B^{-1}Ax = λx. However, B^{-1}A is not necessarily symmetric.
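The pole/zero argument above is easy to verify numerically. The sketch below (not from the text) reuses an assumed realization with g(s) = (5s + 14)/(s^2 + 3s + 2) and checks that the system pencil is singular exactly where g vanishes.

```python
import numpy as np

# Assumed single-input, single-output data with g(s) = (5s + 14)/(s^2 + 3s + 2).
A = np.array([[-4., -3.],
              [ 2.,  1.]])
b = np.array([[3.],
              [1.]])
c = np.array([[1., 2.]])
d = 0.0

def g(s):
    """Transfer function g(s) = c^T (sI - A)^{-1} b + d."""
    return (c @ np.linalg.solve(s * np.eye(2) - A, b))[0, 0] + d

z = -2.8                                   # claimed zero of g
P = np.block([[A - z * np.eye(2), b],
              [c, np.array([[d]])]])       # system pencil evaluated at z
```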
132 Chapter 12. Generalized Eigenvalue Problems

Example 12.16. Let

    A = [ 1  3 ],   B = [ 2  1 ].   Then B^{-1}A = [ -2  1 ].
        [ 3  2 ]        [ 1  1 ]                   [  5  1 ]

Nevertheless, the eigenvalues of B^{-1}A are always real (and are approximately 2.1926 and -3.1926 in Example 12.16).

Theorem 12.17. Let A, B ∈ R^{n×n} with A = A^T and B = B^T > 0. Then the generalized eigenvalue problem

    Ax = λBx

has n real eigenvalues, and the n corresponding right eigenvectors can be chosen to be orthogonal with respect to the inner product <x, y>_B = x^T By. Moreover, if A > 0, then the eigenvalues are also all positive.

Proof: Since B > 0, it has a Cholesky factorization B = LL^T, where L is nonsingular (Theorem 10.23). Then the eigenvalue problem

    Ax = λBx = λLL^T x

can be rewritten as the equivalent problem

    L^{-1}AL^{-T}(L^T x) = λ(L^T x).    (12.11)

Letting C = L^{-1}AL^{-T} and z = L^T x, (12.11) can then be rewritten as

    Cz = λz.    (12.12)

Since C = C^T, the eigenproblem (12.12) has n real eigenvalues, with corresponding eigenvectors z_1, ..., z_n satisfying

    z_i^T z_j = δ_ij.

Then x_i = L^{-T}z_i, i = 1, ..., n, are eigenvectors of the original generalized eigenvalue problem and satisfy

    <x_i, x_j>_B = x_i^T Bx_j = (z_i^T L^{-1})(LL^T)(L^{-T}z_j) = δ_ij.

Finally, if A = A^T > 0, then C = C^T > 0, so the eigenvalues are positive. □

Example 12.18. The Cholesky factor for the matrix B in Example 12.16 is

    L = [  √2       0   ]
        [ 1/√2    1/√2  ].

Then it is easily checked that

    C = L^{-1}AL^{-T} = [ 0.5   2.5 ]
                        [ 2.5  -1.5 ],

whose eigenvalues are approximately 2.1926 and -3.1926 as expected.

The material of this section can, of course, be generalized easily to the case where A and B are Hermitian, but since real-valued matrices are commonly used in most applications, we have restricted our attention to that case only.
12.5. Simultaneous Diagonalization 133

12.5 Simultaneous Diagonalization

Recall that many matrices can be diagonalized by a similarity. In particular, normal matrices can be diagonalized by a unitary similarity. It turns out that in some cases a pair of matrices (A, B) can be simultaneously diagonalized by the same matrix. There are many such results and we present only a representative (but important and useful) theorem here. Again, we restrict our attention only to the real case, with the complex case following in a straightforward way.

Theorem 12.19 (Simultaneous Reduction to Diagonal Form). Let A, B ∈ R^{n×n} with A = A^T and B = B^T > 0. Then there exists a nonsingular matrix Q such that

    Q^T AQ = D,    Q^T BQ = I,

where D is diagonal. In fact, the diagonal elements of D are the eigenvalues of B^{-1}A.

Proof: Let B = LL^T be the Cholesky factorization of B and set C = L^{-1}AL^{-T}. Since C is symmetric, there exists an orthogonal matrix P such that P^T CP = D, where D is diagonal. Let Q = L^{-T}P. Then

    Q^T AQ = P^T L^{-1}AL^{-T}P = P^T CP = D

and

    Q^T BQ = P^T L^{-1}(LL^T)L^{-T}P = P^T P = I.

Finally, since QDQ^{-1} = QQ^T AQQ^{-1} = L^{-T}PP^T L^{-1}A = L^{-T}L^{-1}A = B^{-1}A, we have Λ(D) = Λ(B^{-1}A). □

Note that Q is not in general orthogonal, so it does not preserve eigenvalues of A and B individually. However, it does preserve the eigenvalues of A - λB. This can be seen directly. Let Ã = Q^T AQ and B̃ = Q^T BQ. Then B̃^{-1}Ã = Q^{-1}B^{-1}Q^{-T}Q^T AQ = Q^{-1}B^{-1}AQ.

Theorem 12.19 is very useful for reducing many statements about pairs of symmetric matrices to "the diagonal case." The following is typical.

Theorem 12.20. Let A, B ∈ R^{n×n} be positive definite. Then A ≥ B if and only if B^{-1} ≥ A^{-1}.

Proof: By Theorem 12.19, there exists a nonsingular Q ∈ R^{n×n} such that Q^T AQ = D and Q^T BQ = I, where D is diagonal. Now D > 0 by Theorem 10.31. Also, since A ≥ B, by Theorem 10.21 we have that Q^T AQ ≥ Q^T BQ, i.e., D ≥ I. But then D^{-1} ≤ I (this is trivially true since the two matrices are diagonal). Thus, QD^{-1}Q^T ≤ QQ^T, i.e., A^{-1} ≤ B^{-1}. □

12.5.1 Simultaneous diagonalization via SVD

There are situations in which forming C = L^{-1}AL^{-T} as in the proof of Theorem 12.19 is numerically problematic, e.g., when L is highly ill conditioned with respect to inversion. In such cases, simultaneous reduction can also be accomplished via an SVD. To illustrate, let
134 Chapter 12. Generalized Eigenvalue Problems

us assume that both A and B are positive definite. Further, let A = L_A L_A^T and B = L_B L_B^T be Cholesky factorizations of A and B, respectively. Compute the SVD

    L_B^{-1}L_A = UΣV^T,    (12.13)

where Σ ∈ R^{n×n} is diagonal. Then the matrix Q = L_B^{-T}U performs the simultaneous diagonalization. To check this, note that

    Q^T AQ = U^T L_B^{-1}L_A L_A^T L_B^{-T}U = U^T UΣV^T VΣU^T U = Σ^2

while

    Q^T BQ = U^T L_B^{-1}(L_B L_B^T)L_B^{-T}U = U^T U = I.

Remark 12.21. The SVD in (12.13) can be computed without explicitly forming the indicated matrix product or the inverse by using the so-called generalized singular value decomposition (GSVD). Note that

    (L_B^{-1}L_A)(L_B^{-1}L_A)^T = L_B^{-1}AL_B^{-T}

and thus the singular values of L_B^{-1}L_A can be found from the eigenvalue problem

    L_B^{-1}AL_B^{-T}x = λx.    (12.14)

Letting x = L_B^T z we see that (12.14) can be rewritten in the form L_A L_A^T z = λL_B x = λL_B L_B^T z, which is thus equivalent to the generalized eigenvalue problem

    L_A L_A^T z = λL_B L_B^T z.    (12.15)

The problem (12.15) is called a generalized singular value problem and algorithms exist to solve it (and hence equivalently (12.13)) via arithmetic operations performed only on L_A and L_B separately, i.e., without forming the products L_A L_A^T or L_B L_B^T explicitly; see, for example, [7, Sec. 8.7.3]. This is analogous to finding the singular values of a matrix M by operations performed directly on M rather than by forming the matrix M^T M and solving the eigenproblem M^T Mx = λx.

Remark 12.22. Various generalizations of the results in Remark 12.21 are possible, for example, when A = A^T ≥ 0. The case when A is symmetric but indefinite is not so straightforward, at least in real arithmetic. For example, A can be written as A = PDP^T, where D is diagonal and P is orthogonal, but in writing A = PD̃D̃P^T = PD̃(PD̃)^T with D̃ diagonal, D̃ may have pure imaginary elements.
12.6. Higher-Order Eigenvalue Problems 135

12.6 Higher-Order Eigenvalue Problems

Consider the second-order system of differential equations

    Mq̈ + Cq̇ + Kq = 0,    (12.16)

where q(t) ∈ R^n and M, C, K ∈ R^{n×n}. Assume for simplicity that M is nonsingular. Suppose, by analogy with the first-order case, that we try to find a solution of (12.16) of the form q(t) = e^{λt}p, where the n-vector p and scalar λ are to be determined. Substituting in (12.16) we get

    λ^2 e^{λt}Mp + λe^{λt}Cp + e^{λt}Kp = 0

or, since e^{λt} ≠ 0,

    (λ^2 M + λC + K)p = 0.

To get a nonzero solution p, we thus seek values of λ for which the matrix λ^2 M + λC + K is singular. Since the determinantal equation

    0 = det(λ^2 M + λC + K) = λ^{2n} + ...

yields a polynomial of degree 2n, there are 2n eigenvalues for the second-order (or quadratic) eigenvalue problem λ^2 M + λC + K.

A special case of (12.16) arises frequently in applications: M = I, C = 0, and K = K^T. Suppose K has eigenvalues

    μ_1 ≥ ... ≥ μ_r ≥ 0 > μ_{r+1} ≥ ... ≥ μ_n.

Let ω_k = |μ_k|^{1/2}. Then the 2n eigenvalues of the second-order eigenvalue problem λ^2 I + K are

    ±jω_k;  k = 1, ..., r,
    ±ω_k;   k = r + 1, ..., n.

If r = n (i.e., K = K^T ≥ 0), then all solutions of q̈ + Kq = 0 are oscillatory.

12.6.1 Conversion to first-order form

Let x_1 = q and x_2 = q̇. Then (12.16) can be written as a first-order system (with block companion matrix)

    ẋ = [     0           I     ] x,
        [ -M^{-1}K    -M^{-1}C  ]

where x(t) ∈ R^{2n}. If M is singular, or if it is desired to avoid the calculation of M^{-1} because M is too ill conditioned with respect to inversion, the second-order problem (12.16) can still be converted to the first-order generalized linear system

    [ I  0 ] ẋ = [  0   I ] x.
    [ 0  M ]     [ -K  -C ]
136 Chapter 12. Generalized Eigenvalue Problems

Many other first-order realizations are possible. Some can be useful when M, C, and/or K have special symmetry or skew-symmetry properties that can be exploited.

Higher-order analogues of (12.16) involving, say, the kth derivative of q, lead naturally to higher-order eigenvalue problems that can be converted to first-order form using a kn × kn block companion matrix analogue of (11.19). Similar procedures hold for the general kth-order difference equation, which can be converted to various first-order systems of dimension kn.

EXERCISES

1. Suppose A ∈ R^{n×n} and D ∈ R^{m×m} is nonsingular. Show that the finite generalized eigenvalues of the pencil

    [ A  B ] - λ [ I  0 ]
    [ C  D ]     [ 0  0 ]

are the eigenvalues of the matrix A - BD^{-1}C.

2. Let F, G ∈ C^{n×n}. Show that the nonzero eigenvalues of FG and GF are the same.
Hint: An easy "trick proof" is to verify that the matrices

    [ FG  0 ]    and    [ 0   0  ]
    [ G   0 ]           [ G  GF ]

are similar via the similarity transformation

    [ I  F ]
    [ 0  I ].

3. Let F ∈ C^{n×m}, G ∈ C^{m×n}. Are the nonzero singular values of FG and GF the same?

4. Suppose A ∈ R^{n×n}, B ∈ R^{n×m}, and C ∈ R^{m×n}. Show that the generalized eigenvalues of the pencils

    [ A  B ] - λ [ I  0 ]
    [ C  0 ]     [ 0  0 ]

and

    [ A + BF + GC   B ] - λ [ I  0 ]
    [      C        0 ]     [ 0  0 ]

are identical for all F ∈ R^{m×n} and all G ∈ R^{n×m}.
Hint: Consider the equivalence

    [ I  G ] [ A - λI  B ] [ I  0 ]
    [ 0  I ] [   C     0 ] [ F  I ].

(A similar result is also true for "nonsquare" pencils. In the parlance of control theory, such results show that zeros are invariant under state feedback or output injection.)
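A numerical spot-check of Exercise 2's similarity trick (not from the text; the random matrices are our own test data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
F = rng.standard_normal((n, n))
G = rng.standard_normal((n, n))

# Verify T^{-1} [FG 0; G 0] T = [0 0; G GF] with T = [I F; 0 I].
T = np.block([[np.eye(n), F],
              [np.zeros((n, n)), np.eye(n)]])
M1 = np.block([[F @ G, np.zeros((n, n))],
               [G, np.zeros((n, n))]])
M2 = np.block([[np.zeros((n, n)), np.zeros((n, n))],
               [G, G @ F]])
similar = np.linalg.solve(T, M1) @ T
```

Since M1 and M2 are similar, FG and GF share their nonzero eigenvalues (each padded by n zeros).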
Exercises 137

5. Another family of simultaneous diagonalization problems arises when it is desired that the simultaneous diagonalizing transformation Q operates on matrices A, B ∈ R^{n×n} in such a way that Q^{-1}AQ^{-T} and Q^T BQ are simultaneously diagonal. Such a transformation is called contragredient. Consider the case where both A and B are positive definite with Cholesky factorizations A = L_A L_A^T and B = L_B L_B^T, respectively, and let UΣV^T be an SVD of L_B^T L_A.

(a) Show that Q = L_A V Σ^{-1/2} is a contragredient transformation that reduces both A and B to the same diagonal matrix.

(b) Show that Q^{-1} = Σ^{-1/2} U^T L_B^T.

(c) Show that the eigenvalues of AB are the same as those of Σ^2 and hence are positive.
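A numerical sketch of Exercise 5 (not from the text; random positive definite A and B of our own):

```python
import numpy as np

rng = np.random.default_rng(3)
MA = rng.standard_normal((3, 3)); A = MA @ MA.T + 3 * np.eye(3)
MB = rng.standard_normal((3, 3)); B = MB @ MB.T + 3 * np.eye(3)

LA = np.linalg.cholesky(A)
LB = np.linalg.cholesky(B)
U, s, Vt = np.linalg.svd(LB.T @ LA)        # LB^T LA = U diag(s) V^T

Q = LA @ Vt.T @ np.diag(s ** -0.5)         # part (a): Q = LA V Sigma^{-1/2}
Qinv = np.diag(s ** -0.5) @ U.T @ LB.T     # part (b): claimed inverse

D1 = Qinv @ A @ Qinv.T                     # Q^{-1} A Q^{-T}
D2 = Q.T @ B @ Q                           # Q^T B Q
```

Both congruences produce the same diagonal matrix diag(s), and the eigenvalues of AB agree with s^2, as parts (a)-(c) claim.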
Chapter 13

Kronecker Products

13.1 Definition and Examples

Definition 13.1. Let A ∈ R^{m×n}, B ∈ R^{p×q}. Then the Kronecker product (or tensor product) of A and B is defined as the matrix

    A ⊗ B = [ a_11 B  ...  a_1n B ]
            [   :             :   ]  ∈ R^{mp×nq}.    (13.1)
            [ a_m1 B  ...  a_mn B ]

Obviously, the same definition holds if A and B are complex-valued matrices. We restrict our attention in this chapter primarily to real-valued matrices, pointing out the extension to the complex case only where it is not obvious.
Example 13.2.

1. Let

    A = [ 1  2 ]   and   B = [ 2  1 ].   Then
        [ 3  1 ]             [ 2  3 ]

    A ⊗ B = [  B  2B ] = [ 2  1  4  2 ]
            [ 3B   B ]   [ 2  3  4  6 ]
                         [ 6  3  2  1 ]
                         [ 6  9  2  3 ].

Note that B ⊗ A ≠ A ⊗ B.

2. For any B ∈ R^{p×q},

    I_2 ⊗ B = [ B  0 ]
              [ 0  B ].

Replacing I_2 by I_n yields a block diagonal matrix with n copies of B along the diagonal.

3. Let B be an arbitrary 2 × 2 matrix. Then

    B ⊗ I_2 = [ b_11    0   b_12    0  ]
              [   0   b_11    0   b_12 ]
              [ b_21    0   b_22    0  ]
              [   0   b_21    0   b_22 ].

139
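These constructions are easy to reproduce with NumPy's `np.kron` (a sketch, not from the text; the small 2 × 2 matrices below are arbitrary test data):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 1.]])
B = np.array([[2., 1.],
              [2., 3.]])

AkB = np.kron(A, B)          # blocks a_ij * B, as in (13.1)
BkA = np.kron(B, A)          # generally different from A ⊗ B

I2 = np.eye(2)
blockdiag = np.kron(I2, B)   # block diagonal with two copies of B
```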
140 Chapter 13. Kronecker Products

The extension to arbitrary B and I_n is obvious.

4. Let x ∈ R^m, y ∈ R^n. Then

    x ⊗ y = [x_1 y^T, ..., x_m y^T]^T
          = [x_1 y_1, ..., x_1 y_n, x_2 y_1, ..., x_m y_n]^T ∈ R^{mn}.

5. Let x ∈ R^m, y ∈ R^n. Then

    x ⊗ y^T = y^T ⊗ x = xy^T ∈ R^{m×n}.

13.2 Properties of the Kronecker Product

Theorem 13.3. Let A ∈ R^{m×n}, B ∈ R^{r×s}, C ∈ R^{n×p}, and D ∈ R^{s×t}. Then

    (A ⊗ B)(C ⊗ D) = AC ⊗ BD  (∈ R^{mr×pt}).    (13.2)

Proof: Simply verify that

    (A ⊗ B)(C ⊗ D) = [ Σ_k a_1k c_k1 BD  ...  Σ_k a_1k c_kp BD ]
                     [        :                      :         ]
                     [ Σ_k a_mk c_k1 BD  ...  Σ_k a_mk c_kp BD ]
                   = AC ⊗ BD. □

Theorem 13.4. For all A and B, (A ⊗ B)^T = A^T ⊗ B^T.

Proof: For the proof, simply verify using the definitions of transpose and Kronecker product. □

Corollary 13.5. If A ∈ R^{n×n} and B ∈ R^{m×m} are symmetric, then A ⊗ B is symmetric.

Theorem 13.6. If A and B are nonsingular, (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}.

Proof: Using Theorem 13.3, simply note that (A ⊗ B)(A^{-1} ⊗ B^{-1}) = I ⊗ I = I. □
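Theorems 13.3, 13.4, and 13.6 can all be spot-checked numerically (a sketch, not from the text; random matrices of the stated compatible sizes, with the square ones shifted to be generically nonsingular):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 3)); B = rng.standard_normal((3, 2))
C = rng.standard_normal((3, 2)); D = rng.standard_normal((2, 4))

mixed_lhs = np.kron(A, B) @ np.kron(C, D)  # (A ⊗ B)(C ⊗ D)
mixed_rhs = np.kron(A @ C, B @ D)          # AC ⊗ BD   (Theorem 13.3)

An = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # generically nonsingular
Bn = rng.standard_normal((2, 2)) + 3 * np.eye(2)
```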
Theorem 13.7. If A ∈ R^{n×n} and B ∈ R^{m×m} are normal, then A ⊗ B is normal.

Proof:

      (A ⊗ B)^T (A ⊗ B) = (A^T ⊗ B^T)(A ⊗ B)   by Theorem 13.4
                        = A^T A ⊗ B^T B         by Theorem 13.3
                        = A A^T ⊗ B B^T         since A and B are normal
                        = (A ⊗ B)(A ⊗ B)^T     by Theorem 13.3.  □

Corollary 13.8. If A ∈ R^{n×n} is orthogonal and B ∈ R^{m×m} is orthogonal, then A ⊗ B is
orthogonal.

Example 13.9. Let A = [ cos θ  sin θ; −sin θ  cos θ ] and B = [ cos φ  sin φ; −sin φ  cos φ ].
Then it is easily seen that A is orthogonal with eigenvalues e^{±jθ} and B is orthogonal with
eigenvalues e^{±jφ}. The 4 × 4 matrix A ⊗ B is then also orthogonal with eigenvalues
e^{±j(θ+φ)} and e^{±j(θ−φ)}.

Theorem 13.10. Let A ∈ R^{m×n} have a singular value decomposition U_A Σ_A V_A^T and let
B ∈ R^{p×q} have a singular value decomposition U_B Σ_B V_B^T. Then

      (U_A ⊗ U_B)(Σ_A ⊗ Σ_B)(V_A ⊗ V_B)^T

yields a singular value decomposition of A ⊗ B (after a simple reordering of the diagonal
elements of Σ_A ⊗ Σ_B and the corresponding right and left singular vectors).

Corollary 13.11. Let A ∈ R_r^{m×n} have singular values σ_1 ≥ ... ≥ σ_r > 0 and let B ∈ R_s^{p×q}
have singular values τ_1 ≥ ... ≥ τ_s > 0. Then A ⊗ B (or B ⊗ A) has rs singular values
σ_1 τ_1 ≥ ... ≥ σ_r τ_s > 0 and

      rank(A ⊗ B) = (rank A)(rank B) = rank(B ⊗ A).

Theorem 13.12. Let A ∈ R^{n×n} have eigenvalues λ_i, i ∈ n, and let B ∈ R^{m×m} have
eigenvalues μ_j, j ∈ m. Then the mn eigenvalues of A ⊗ B are

      λ_1 μ_1, ..., λ_1 μ_m, λ_2 μ_1, ..., λ_2 μ_m, ..., λ_n μ_m.

Moreover, if x_1, ..., x_p are linearly independent right eigenvectors of A corresponding
to λ_1, ..., λ_p (p ≤ n), and z_1, ..., z_q are linearly independent right eigenvectors of B
corresponding to μ_1, ..., μ_q (q ≤ m), then x_i ⊗ z_j ∈ R^{mn} are linearly independent right
eigenvectors of A ⊗ B corresponding to λ_i μ_j, i ∈ p, j ∈ q.

Proof: The basic idea of the proof is as follows:

      (A ⊗ B)(x ⊗ z) = Ax ⊗ Bz
                     = λx ⊗ μz
                     = λμ(x ⊗ z).  □
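Theorem 13.12 and Corollary 13.11 lend themselves to a quick numerical check; a sketch using symmetric test matrices so the spectra are real and easy to sort (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3)); A = M + M.T   # symmetric => real eigenvalues
N = rng.standard_normal((4, 4)); B = N + N.T

# Theorem 13.12: the 12 eigenvalues of A (x) B are the products lambda_i * mu_j.
lam = np.linalg.eigvalsh(A)
mu = np.linalg.eigvalsh(B)
assert np.allclose(np.sort(np.outer(lam, mu).ravel()),
                   np.sort(np.linalg.eigvalsh(np.kron(A, B))))

# Corollary 13.11: the singular values of A (x) B are the products sigma_i * tau_j.
sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(B, compute_uv=False)
assert np.allclose(np.sort(np.outer(sA, sB).ravel()),
                   np.sort(np.linalg.svd(np.kron(A, B), compute_uv=False)))
```

Symmetric inputs are chosen only for convenience (A ⊗ B is then symmetric by Corollary 13.5, so `eigvalsh` applies); the theorems themselves hold for arbitrary square A and B.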
If A and B are diagonalizable in Theorem 13.12, we can take p = n and q = m and
thus get the complete eigenstructure of A ⊗ B. In general, if A and B have Jordan form
decompositions given by P^{-1} A P = J_A and Q^{-1} B Q = J_B, respectively, then we get the
following Jordan-like structure:

      (P ⊗ Q)^{-1} (A ⊗ B)(P ⊗ Q) = (P^{-1} ⊗ Q^{-1})(A ⊗ B)(P ⊗ Q)
                                  = (P^{-1} A P) ⊗ (Q^{-1} B Q)
                                  = J_A ⊗ J_B.

Note that J_A ⊗ J_B, while upper triangular, is generally not quite in Jordan form and needs
further reduction (to an ultimate Jordan form that also depends on whether or not certain
eigenvalues are zero or nonzero).

A Schur form for A ⊗ B can be derived similarly. For example, suppose P and
Q are unitary matrices that reduce A and B, respectively, to Schur (triangular) form, i.e.,
P^H A P = T_A and Q^H B Q = T_B (and similarly if P and Q are orthogonal similarities
reducing A and B to real Schur form). Then

      (P ⊗ Q)^H (A ⊗ B)(P ⊗ Q) = (P^H ⊗ Q^H)(A ⊗ B)(P ⊗ Q)
                                = (P^H A P) ⊗ (Q^H B Q)
                                = T_A ⊗ T_B.

Corollary 13.13. Let A ∈ R^{n×n} and B ∈ R^{m×m}. Then

1. Tr(A ⊗ B) = (Tr A)(Tr B) = Tr(B ⊗ A).

2. det(A ⊗ B) = (det A)^m (det B)^n = det(B ⊗ A).

Definition 13.14. Let A ∈ R^{n×n} and B ∈ R^{m×m}. Then the Kronecker sum (or tensor sum)
of A and B, denoted A ⊕ B, is the mn × mn matrix (I_m ⊗ A) + (B ⊗ I_n). Note that, in
general, A ⊕ B ≠ B ⊕ A.

Example 13.15.

1. Let A = [1 2 3; 3 2 1; 1 1 4] and B = [2 1; 2 3]. Then

      A ⊕ B = (I_2 ⊗ A) + (B ⊗ I_3)
            = [ 1 2 3 0 0 0       [ 2 0 0 1 0 0
                3 2 1 0 0 0         0 2 0 0 1 0
                1 1 4 0 0 0         0 0 2 0 0 1
                0 0 0 1 2 3    +    2 0 0 3 0 0
                0 0 0 3 2 1         0 2 0 0 3 0
                0 0 0 1 1 4 ]       0 0 2 0 0 3 ].

   The reader is invited to compute B ⊕ A = (I_3 ⊗ B) + (A ⊗ I_2) and note the difference
   with A ⊕ B.
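Definition 13.14 and Example 13.15 translate directly into NumPy; a sketch of the computation (the helper name `kron_sum` is ours, not a library function):

```python
import numpy as np

def kron_sum(A, B):
    """Kronecker sum A (+) B = (I_m (x) A) + (B (x) I_n), per Definition 13.14."""
    n, m = A.shape[0], B.shape[0]
    return np.kron(np.eye(m), A) + np.kron(B, np.eye(n))

# Matrices from Example 13.15, item 1.
A = np.array([[1, 2, 3],
              [3, 2, 1],
              [1, 1, 4]])
B = np.array([[2, 1],
              [2, 3]])

print(kron_sum(A, B))  # the 6 x 6 matrix of Example 13.15

# Kronecker sums, like Kronecker products, do not commute in general:
print(np.array_equal(kron_sum(B, A), kron_sum(A, B)))  # False
```

The final `False` is exactly the difference between A ⊕ B and B ⊕ A that the example invites the reader to compute.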
2. Recall the real JCF

      J = [ M  I  0  ...  0
            0  M  I  ...  0
            ...
            0  ...   M  I
            0  ...   0  M ] ∈ R^{2k×2k},   where M = [ α  β
                                                      −β  α ].

   Define

      E_k = [ 0 1 0 ... 0
              0 0 1 ... 0
              ...
              0 ... 0 1
              0 ... 0 0 ] ∈ R^{k×k}.

   Then J can be written in the very compact form J = (I_k ⊗ M) + (E_k ⊗ I_2) = M ⊕ E_k.

Theorem 13.16. Let A ∈ R^{n×n} have eigenvalues λ_i, i ∈ n, and let B ∈ R^{m×m} have
eigenvalues μ_j, j ∈ m. Then the Kronecker sum A ⊕ B = (I_m ⊗ A) + (B ⊗ I_n) has mn
eigenvalues

      λ_1 + μ_1, ..., λ_1 + μ_m, λ_2 + μ_1, ..., λ_2 + μ_m, ..., λ_n + μ_m.

Moreover, if x_1, ..., x_p are linearly independent right eigenvectors of A corresponding
to λ_1, ..., λ_p (p ≤ n), and z_1, ..., z_q are linearly independent right eigenvectors of B
corresponding to μ_1, ..., μ_q (q ≤ m), then z_j ⊗ x_i ∈ R^{mn} are linearly independent right
eigenvectors of A ⊕ B corresponding to λ_i + μ_j, i ∈ p, j ∈ q.

Proof: The basic idea of the proof is as follows:

      [(I_m ⊗ A) + (B ⊗ I_n)](z ⊗ x) = (z ⊗ Ax) + (Bz ⊗ x)
                                     = (z ⊗ λx) + (μz ⊗ x)
                                     = (λ + μ)(z ⊗ x).  □

If A and B are diagonalizable in Theorem 13.16, we can take p = n and q = m and
thus get the complete eigenstructure of A ⊕ B. In general, if A and B have Jordan form
decompositions given by P^{-1} A P = J_A and Q^{-1} B Q = J_B, respectively, then

      [(Q ⊗ I_n)(I_m ⊗ P)]^{-1} [(I_m ⊗ A) + (B ⊗ I_n)] [(Q ⊗ I_n)(I_m ⊗ P)]
        = [(I_m ⊗ P)^{-1} (Q ⊗ I_n)^{-1}] [(I_m ⊗ A) + (B ⊗ I_n)] [(Q ⊗ I_n)(I_m ⊗ P)]
        = [(I_m ⊗ P^{-1})(Q^{-1} ⊗ I_n)] [(I_m ⊗ A) + (B ⊗ I_n)] [(Q ⊗ I_n)(I_m ⊗ P)]
        = (I_m ⊗ J_A) + (J_B ⊗ I_n)

is a Jordan-like structure for A ⊕ B.
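Theorem 13.16 can be confirmed numerically the same way as Theorem 13.12; a sketch using symmetric test matrices (so the spectra are real and sorting is unambiguous):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 4
M = rng.standard_normal((n, n)); A = M + M.T   # symmetric => real eigenvalues
N = rng.standard_normal((m, m)); B = N + N.T

# A (+) B = (I_m (x) A) + (B (x) I_n) per Definition 13.14.
ks = np.kron(np.eye(m), A) + np.kron(B, np.eye(n))

# Theorem 13.16: the mn eigenvalues of A (+) B are all sums lambda_i + mu_j.
lam = np.linalg.eigvalsh(A)
mu = np.linalg.eigvalsh(B)
sums = np.add.outer(lam, mu).ravel()
assert np.allclose(np.sort(sums), np.sort(np.linalg.eigvalsh(ks)))
print("Kronecker sum spectrum verified")
```

The same check works for nonsymmetric A and B, at the cost of sorting complex eigenvalues carefully.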
A Schur form for A ⊕ B can be derived similarly. Again, suppose P and Q are unitary
matrices that reduce A and B, respectively, to Schur (triangular) form, i.e., P^H A P = T_A
and Q^H B Q = T_B (and similarly if P and Q are orthogonal similarities reducing A and B
to real Schur form). Then

      [(Q ⊗ I_n)(I_m ⊗ P)]^H [(I_m ⊗ A) + (B ⊗ I_n)] [(Q ⊗ I_n)(I_m ⊗ P)] = (I_m ⊗ T_A) + (T_B ⊗ I_n),

where [(Q ⊗ I_n)(I_m ⊗ P)] = (Q ⊗ P) is unitary by Theorem 13.3 and Corollary 13.8.

13.3 Application to Sylvester and Lyapunov Equations

In this section we study the linear matrix equation

      AX + XB = C,   (13.3)

where A ∈ R^{n×n}, B ∈ R^{m×m}, and C ∈ R^{n×m}. This equation is now often called a Sylvester
equation in honor of J.J. Sylvester, who studied general linear matrix equations of the form

      Σ_{i=1}^k A_i X B_i = C.

A special case of (13.3) is the symmetric equation

      AX + XA^T = C   (13.4)

obtained by taking B = A^T. When C is symmetric, the solution X ∈ R^{n×n} is easily shown
also to be symmetric, and (13.4) is known as a Lyapunov equation. Lyapunov equations
arise naturally in stability theory.

The first important question to ask regarding (13.3) is, When does a solution exist?
By writing the matrices in (13.3) in terms of their columns, it is easily seen by equating the
ith columns that

      A x_i + X b_i = c_i = A x_i + Σ_{j=1}^m b_{ji} x_j.

These equations can then be rewritten as the mn × mn linear system

      [ A + b_11 I    b_21 I     ...    b_m1 I
        b_12 I      A + b_22 I   ...    b_m2 I
        ...
        b_1m I        b_2m I     ...  A + b_mm I ] [ x_1; x_2; ...; x_m ] = [ c_1; c_2; ...; c_m ].   (13.5)

The coefficient matrix in (13.5) clearly can be written as the Kronecker sum (I_m ⊗ A) +
(B^T ⊗ I_n). The following definition is very helpful in completing the writing of (13.5) as
an "ordinary" linear system.
Definition 13.17. Let c_i ∈ R^n denote the columns of C ∈ R^{n×m} so that C = [c_1, ..., c_m].
Then vec(C) is defined to be the mn-vector formed by stacking the columns of C on top of
one another, i.e., vec(C) = [c_1; c_2; ...; c_m] ∈ R^{mn}.

Using Definition 13.17, the linear system (13.5) can be rewritten in the form

      [(I_m ⊗ A) + (B^T ⊗ I_n)] vec(X) = vec(C).   (13.6)

There exists a unique solution to (13.6) if and only if [(I_m ⊗ A) + (B^T ⊗ I_n)] is nonsingular.
But [(I_m ⊗ A) + (B^T ⊗ I_n)] is nonsingular if and only if it has no zero eigenvalues.
From Theorem 13.16, the eigenvalues of [(I_m ⊗ A) + (B^T ⊗ I_n)] are λ_i + μ_j, where
λ_i ∈ Λ(A), i ∈ n, and μ_j ∈ Λ(B), j ∈ m. We thus have the following theorem.

Theorem 13.18. Let A ∈ R^{n×n}, B ∈ R^{m×m}, and C ∈ R^{n×m}. Then the Sylvester equation

      AX + XB = C   (13.7)

has a unique solution if and only if A and −B have no eigenvalues in common.
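The vec formulation (13.6) can be exercised directly in a few lines; a sketch, assuming SciPy, comparing the dense mn × mn solve against SciPy's Schur-form-based Sylvester solver:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(3)
n, m = 4, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
C = rng.standard_normal((n, m))

# vec formulation (13.6): [(I_m (x) A) + (B^T (x) I_n)] vec(X) = vec(C).
# vec stacks columns, which is Fortran ("F") order in NumPy.
K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
X_vec = np.linalg.solve(K, C.ravel(order="F")).reshape((n, m), order="F")

# SciPy solves AX + XB = C via a Bartels-Stewart-style algorithm.
X = solve_sylvester(A, B, C)
assert np.allclose(X_vec, X)
assert np.allclose(A @ X + X @ B, C)
```

For generic random data A and −B share no eigenvalues, so by Theorem 13.18 both routes produce the same unique solution; the dense vec route, of course, scales far worse, which is the point of the numerical remarks below.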
Sylvester equations of the form (13.3) (or symmetric Lyapunov equations of the form
(13.4)) are generally not solved using the mn × mn "vec" formulation (13.6). The most
commonly preferred numerical algorithm is described in [2]. First A and B are reduced to
(real) Schur form. An equivalent linear system is then solved in which the triangular form
of the reduced A and B can be exploited to solve successively for the columns of a suitably
transformed solution matrix X. Assuming that, say, n ≥ m, this algorithm takes only O(n^3)
operations rather than the O(n^6) that would be required by solving (13.6) directly with
Gaussian elimination. A further enhancement to this algorithm is available in [6] whereby
the larger of A or B is initially reduced only to upper Hessenberg rather than triangular
Schur form.

The next few theorems are classical. They culminate in Theorem 13.24, one of many
elegant connections between matrix theory and stability theory for differential equations.

Theorem 13.19. Let A ∈ R^{n×n}, B ∈ R^{m×m}, and C ∈ R^{n×m}. Suppose further that A and B
are asymptotically stable (a matrix is asymptotically stable if all its eigenvalues have real
parts in the open left half-plane). Then the (unique) solution of the Sylvester equation

      AX + XB = C   (13.8)

can be written as

      X = − ∫_0^{+∞} e^{tA} C e^{tB} dt.   (13.9)

Proof: Since A and B are stable, λ_i(A) + λ_j(B) ≠ 0 for all i, j, so there exists a unique
solution to (13.8) by Theorem 13.18. Now integrate the differential equation Ẋ = AX + XB
(with X(0) = C) on [0, +∞):

      lim_{t→+∞} X(t) − X(0) = A ∫_0^{+∞} X(t) dt + (∫_0^{+∞} X(t) dt) B.   (13.10)
Using the results of Section 11.1.6, it can be shown easily that lim_{t→+∞} e^{tA} = lim_{t→+∞} e^{tB} = 0.
Hence, using the solution X(t) = e^{tA} C e^{tB} from Theorem 11.6, we have that lim_{t→+∞} X(t) = 0.
Substituting in (13.10) we have

      −C = A (∫_0^{+∞} e^{tA} C e^{tB} dt) + (∫_0^{+∞} e^{tA} C e^{tB} dt) B

and so X = − ∫_0^{+∞} e^{tA} C e^{tB} dt satisfies (13.8).  □

Remark 13.20. An equivalent condition for the existence of a unique solution to AX +
XB = C is that [ A  C; 0  −B ] be similar to [ A  0; 0  −B ] (via the similarity [ I  −X; 0  I ]).

Theorem 13.21. Let A, C ∈ R^{n×n}. Then the Lyapunov equation

      AX + XA^T = C   (13.11)

has a unique solution if and only if A and −A^T have no eigenvalues in common. If C is
symmetric and (13.11) has a unique solution, then that solution is symmetric.

Remark 13.22. If the matrix A ∈ R^{n×n} has eigenvalues λ_1, ..., λ_n, then −A^T has eigen-
values −λ_1, ..., −λ_n. Thus, a sufficient condition that guarantees that A and −A^T have
no common eigenvalues is that A be asymptotically stable. Many useful results exist con-
cerning the relationship between stability and Lyapunov equations. Two basic results due
to Lyapunov are the following, the first of which follows immediately from Theorem 13.19.

Theorem 13.23. Let A, C ∈ R^{n×n} and suppose further that A is asymptotically stable.
Then the (unique) solution of the Lyapunov equation

      AX + XA^T = C

can be written as

      X = − ∫_0^{+∞} e^{tA} C e^{tA^T} dt.   (13.12)

Theorem 13.24. A matrix A ∈ R^{n×n} is asymptotically stable if and only if there exists a
positive definite solution to the Lyapunov equation

      AX + XA^T = C,   (13.13)

where C = C^T < 0.

Proof: Suppose A is asymptotically stable. By Theorems 13.21 and 13.23 a solution to
(13.13) exists and takes the form (13.12). Now let v be an arbitrary nonzero vector in R^n.
Then

      v^T X v = − ∫_0^{+∞} v^T e^{tA} C e^{tA^T} v dt.
Since −C > 0 and e^{tA} is nonsingular for all t, the integrand above is positive. Hence
v^T X v > 0 and thus X is positive definite.

Conversely, suppose X = X^T > 0 and let λ ∈ Λ(A) with corresponding left eigen-
vector y. Then

      0 > y^H C y = y^H A X y + y^H X A^T y
                  = (λ + λ̄) y^H X y.

Since y^H X y > 0, we must have λ + λ̄ = 2 Re λ < 0. Since λ was arbitrary, A must be
asymptotically stable.  □
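Theorem 13.24 is easy to probe numerically; a sketch, assuming SciPy: for a stable A and C = C^T < 0, the computed Lyapunov solution should come out symmetric positive definite.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(4)
n = 4
# Make A asymptotically stable by shifting a random matrix into the left half-plane.
M = rng.standard_normal((n, n))
A = M - (np.max(np.linalg.eigvals(M).real) + 1.0) * np.eye(n)
C = -np.eye(n)  # C = C^T < 0

# SciPy solves A X + X A^H = C, i.e., equation (13.13).
X = solve_continuous_lyapunov(A, C)
assert np.allclose(A @ X + X @ A.T, C)
assert np.allclose(X, X.T)                   # symmetric, per Theorem 13.21
assert np.all(np.linalg.eigvalsh(X) > 0)     # positive definite, per Theorem 13.24
```

The shift guarantees max Re λ(A) = −1, so the hypotheses of Theorems 13.21 and 13.24 hold by construction.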
Remark 13.25. The Lyapunov equation AX + XA^T = C can also be written using the
vec notation in the equivalent form

      [(I ⊗ A) + (A ⊗ I)] vec(X) = vec(C).

A subtle point arises when dealing with the "dual" Lyapunov equation A^T X + XA = C.
The equivalent "vec form" of this equation is

      [(I ⊗ A^T) + (A^T ⊗ I)] vec(X) = vec(C).

However, the complex-valued equation A^H X + XA = C is equivalent to

      [(I ⊗ A^H) + (A^T ⊗ I)] vec(X) = vec(C).

The vec operator has many useful properties, most of which derive from one key
result.

Theorem 13.26. For any three matrices A, B, and C for which the matrix product ABC is
defined,

      vec(ABC) = (C^T ⊗ A) vec(B).

Proof: The proof follows in a fairly straightforward fashion either directly from the defini-
tions or from the fact that vec(xy^T) = y ⊗ x.  □
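Theorem 13.26 takes one line to verify numerically once vec is identified with column stacking (Fortran-order ravel in NumPy); a sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 2))

def vec(M):
    """Stack the columns of M, as in Definition 13.17."""
    return M.ravel(order="F")

# Theorem 13.26: vec(ABC) = (C^T (x) A) vec(B).
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# The rank-one special case used in the proof: vec(x y^T) = y (x) x.
x = rng.standard_normal(3)
y = rng.standard_normal(5)
assert np.allclose(vec(np.outer(x, y)), np.kron(y, x))
```

The rank-one identity holds because column j of xy^T is y_j x, so stacking the columns produces exactly y ⊗ x.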
An immediate application is to the derivation of existence and uniqueness conditions
for the solution of the simple Sylvester-like equation introduced in Theorem 6.11.

Theorem 13.27. Let A ∈ R^{m×n}, B ∈ R^{p×q}, and C ∈ R^{m×q}. Then the equation

      AXB = C   (13.14)

has a solution X ∈ R^{n×p} if and only if AA^+ C B^+ B = C, in which case the general solution
is of the form

      X = A^+ C B^+ + Y − A^+ A Y B B^+,   (13.15)

where Y ∈ R^{n×p} is arbitrary. The solution of (13.14) is unique if BB^+ ⊗ A^+ A = I.

Proof: Write (13.14) as

      (B^T ⊗ A) vec(X) = vec(C)   (13.16)
by Theorem 13.26. This "vector equation" has a solution if and only if

      (B^T ⊗ A)(B^T ⊗ A)^+ vec(C) = vec(C).

It is a straightforward exercise to show that (M ⊗ N)^+ = M^+ ⊗ N^+. Thus, (13.16) has a
solution if and only if

      vec(C) = (B^T ⊗ A)((B^+)^T ⊗ A^+) vec(C)
             = [(B^+ B)^T ⊗ A A^+] vec(C)
             = vec(A A^+ C B^+ B)

and hence if and only if AA^+ C B^+ B = C.

The general solution of (13.16) is then given by

      vec(X) = (B^T ⊗ A)^+ vec(C) + [I − (B^T ⊗ A)^+ (B^T ⊗ A)] vec(Y),

where Y is arbitrary. This equation can then be rewritten in the form

      vec(X) = ((B^+)^T ⊗ A^+) vec(C) + [I − (B B^+)^T ⊗ A^+ A] vec(Y)

or, using Theorem 13.26,

      X = A^+ C B^+ + Y − A^+ A Y B B^+.

The solution is clearly unique if B B^+ ⊗ A^+ A = I.  □
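Theorem 13.27 can be illustrated with the Moore–Penrose pseudoinverse; a sketch in which C is constructed to be consistent so that a solution is guaranteed to exist:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((5, 6))
X0 = rng.standard_normal((3, 5))
C = A @ X0 @ B  # consistent by construction

Ap = np.linalg.pinv(A)
Bp = np.linalg.pinv(B)

# Solvability test of Theorem 13.27: A A^+ C B^+ B = C.
assert np.allclose(A @ Ap @ C @ Bp @ B, C)

# General solution (13.15): X = A^+ C B^+ + Y - A^+ A Y B B^+,
# which satisfies A X B = C for every choice of Y.
Y = rng.standard_normal((3, 5))
X = Ap @ C @ Bp + Y - Ap @ A @ Y @ B @ Bp
assert np.allclose(A @ X @ B, C)
```

Expanding A X B with A A^+ A = A and B B^+ B = B shows why the Y-dependent terms cancel, so the assertion holds for any Y.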
EXERCISES

1. For any two matrices A and B for which the indicated matrix product is defined,
   show that (vec(A))^T (vec(B)) = Tr(A^T B). In particular, if B ∈ R^{n×n}, then Tr(B) =
   vec(I_n)^T vec(B).

2. Prove that for all matrices A and B, (A ⊗ B)^+ = A^+ ⊗ B^+.

3. Show that the equation AXB = C has a solution for all C if A has full row rank and
   B has full column rank. Also, show that a solution, if it exists, is unique if A has full
   column rank and B has full row rank. What is the solution in this case?

4. Show that the general linear equation

      Σ_{i=1}^k A_i X B_i = C

   can be written in the form

      [B_1^T ⊗ A_1 + ... + B_k^T ⊗ A_k] vec(X) = vec(C).
5. Let x ∈ R^m and y ∈ R^n. Show that x^T ⊗ y = yx^T.

6. Let A ∈ R^{n×n} and B ∈ R^{m×m}.

   (a) Show that ‖A ⊗ B‖_2 = ‖A‖_2 ‖B‖_2.

   (b) What is ‖A ⊗ B‖_F in terms of the Frobenius norms of A and B? Justify your
       answer carefully.

   (c) What is the spectral radius of A ⊗ B in terms of the spectral radii of A and B?
       Justify your answer carefully.

7. Let A, B ∈ R^{n×n}.

   (a) Show that (I ⊗ A)^k = I ⊗ A^k and (B ⊗ I)^k = B^k ⊗ I for all integers k.

   (b) Show that e^{I⊗A} = I ⊗ e^A and e^{B⊗I} = e^B ⊗ I.

   (c) Show that the matrices I ⊗ A and B ⊗ I commute.

   (d) Show that

          e^{A⊕B} = e^{(I⊗A)+(B⊗I)} = e^B ⊗ e^A.

       (Note: This result would look a little "nicer" had we defined our Kronecker
       sum the other way around. However, Definition 13.14 is conventional in the
       literature.)
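Exercise 7(d) can be sanity-checked numerically (a sketch, assuming SciPy's `expm`):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(7)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
I = np.eye(n)

# A (+) B = (I (x) A) + (B (x) I); the two summands commute (exercise 7(c)),
# so the exponential factors: e^{A (+) B} = e^{B (x) I} e^{I (x) A} = e^B (x) e^A.
lhs = expm(np.kron(I, A) + np.kron(B, I))
assert np.allclose(lhs, np.kron(expm(B), expm(A)))
```

Note the reversed order e^B ⊗ e^A on the right, which is exactly the asymmetry the exercise's parenthetical note remarks on.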
8. Consider the Lyapunov matrix equation (13.11) with

      A = [ 1  0
            0 −1 ]

   and C the symmetric matrix

      [ 2  0
        0 −2 ].

   Clearly

      X_s = [ 1  0
              0  1 ]

   is a symmetric solution of the equation. Verify that

      X_ns = [ 1  1
              −1  1 ]

   is also a solution and is nonsymmetric. Explain in light of Theorem 13.21.

9. Block Triangularization: Let

      S = [ A  B
            C  D ],

   where A ∈ R^{n×n} and D ∈ R^{m×m}. It is desired to find a similarity transformation
   of the form

      T = [ I  0
            X  I ]

   such that T^{-1} S T is block upper triangular.
   (a) Show that S is similar to

          [ A + BX    B
            0       D − XB ]

       if X satisfies the so-called matrix Riccati equation

          C − XA + DX − XBX = 0.

   (b) Formulate a similar result for block lower triangularization of S.

10. Block Diagonalization: Let

      S = [ A  B
            0  D ],

    where A ∈ R^{n×n} and D ∈ R^{m×m}. It is desired to find a similarity transformation of
    the form

      T = [ I  Y
            0  I ]

    such that T^{-1} S T is block diagonal.

    (a) Show that S is similar to

          [ A  0
            0  D ]

        if Y satisfies the Sylvester equation

          AY − YD = −B.

    (b) Formulate a similar result for block diagonalization of

          S = [ A  0
                C  D ].
Bibliography

[1] Albert, A., Regression and the Moore-Penrose Pseudoinverse, Academic Press, New
York, NY, 1972.

[2] Bartels, R.H., and G.W. Stewart, "Algorithm 432: Solution of the Matrix Equation
AX + XB = C," Comm. ACM, 15(1972), 820–826.

[3] Bellman, R., Introduction to Matrix Analysis, Second Edition, McGraw-Hill, New
York, NY, 1970.

[4] Bjorck, A., Numerical Methods for Least Squares Problems, SIAM, Philadelphia, PA,
1996.

[5] Cline, R.E., "Note on the Generalized Inverse of the Product of Matrices," SIAM Rev.,
6(1964), 57–58.

[6] Golub, G.H., S. Nash, and C. Van Loan, "A Hessenberg-Schur Method for the Problem
AX + XB = C," IEEE Trans. Autom. Control, AC-24(1979), 909–913.

[7] Golub, G.H., and C.F. Van Loan, Matrix Computations, Third Edition, Johns Hopkins
Univ. Press, Baltimore, MD, 1996.

[8] Golub, G.H., and J.H. Wilkinson, "Ill-Conditioned Eigensystems and the Computation
of the Jordan Canonical Form," SIAM Rev., 18(1976), 578–619.

[9] Greville, T.N.E., "Note on the Generalized Inverse of a Matrix Product," SIAM Rev.,
8(1966), 518–521 [Erratum, SIAM Rev., 9(1967), 249].

[10] Halmos, P.R., Finite-Dimensional Vector Spaces, Second Edition, Van Nostrand,
Princeton, NJ, 1958.

[11] Higham, N.J., Accuracy and Stability of Numerical Algorithms, Second Edition, SIAM,
Philadelphia, PA, 2002.

[12] Horn, R.A., and C.R. Johnson, Matrix Analysis, Cambridge Univ. Press, Cambridge,
UK, 1985.

[13] Horn, R.A., and C.R. Johnson, Topics in Matrix Analysis, Cambridge Univ. Press,
Cambridge, UK, 1991.
[14] Kenney, C., and A.J. Laub, "Controllability and Stability Radii for Companion Form
Systems," Math. of Control, Signals, and Systems, 1(1988), 361–390.

[15] Kenney, C.S., and A.J. Laub, "The Matrix Sign Function," IEEE Trans. Autom. Control,
40(1995), 1330–1348.

[16] Lancaster, P., and M. Tismenetsky, The Theory of Matrices, Second Edition with
Applications, Academic Press, Orlando, FL, 1985.

[17] Laub, A.J., "A Schur Method for Solving Algebraic Riccati Equations," IEEE Trans.
Autom. Control, AC-24(1979), 913–921.

[18] Meyer, C.D., Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, PA,
2000.

[19] Moler, C.B., and C.F. Van Loan, "Nineteen Dubious Ways to Compute the Exponential
of a Matrix," SIAM Rev., 20(1978), 801–836.

[20] Noble, B., and J.W. Daniel, Applied Linear Algebra, Third Edition, Prentice-Hall,
Englewood Cliffs, NJ, 1988.

[21] Ortega, J., Matrix Theory. A Second Course, Plenum, New York, NY, 1987.

[22] Penrose, R., "A Generalized Inverse for Matrices," Proc. Cambridge Philos. Soc.,
51(1955), 406–413.

[23] Stewart, G.W., Introduction to Matrix Computations, Academic Press, New York, NY,
1973.

[24] Strang, G., Linear Algebra and Its Applications, Third Edition, Harcourt Brace
Jovanovich, San Diego, CA, 1988.

[25] Watkins, D.S., Fundamentals of Matrix Computations, Second Edition, Wiley-
Interscience, New York, 2002.

[26] Wonham, W.M., Linear Multivariable Control. A Geometric Approach, Third Edition,
Springer-Verlag, New York, NY, 1985.
Index
A-invariant subspace, 89
matrix characterization of, 90
algebraic multiplicity, 76
angle between vectors, 58
basis, 11
natural, 12
block matrix, 2
definiteness of, 104
diagonalization, 150
inverse of, 48
LU factorization, 5
triangularization, 149
C^n, 1
C^{m×n}, 1
C^{n×n}, 1
Cauchy–Bunyakovsky–Schwarz Inequality, 58
Cayley–Hamilton Theorem, 75
chain
of eigenvectors, 87
characteristic polynomial
of a matrix, 75
of a matrix pencil, 125
Cholesky factorization, 101
co-domain, 17
column
rank, 23
vector, 1
companion matrix
inverse of, 105
pseudoinverse of, 106
singular values of, 106
singular vectors of, 106
complement
of a subspace, 13
orthogonal, 21
congruence, 103
conjugate transpose, 2
contragredient transformation, 137
controllability, 46
defective, 76
degree
of a principal vector, 85
determinant, 4
of a block matrix, 5
properties of, 4–6
dimension, 12
direct sum
of subspaces, 13
domain, 17
eigenvalue, 75
invariance under similarity transformation, 81
elementary divisors, 84
equivalence transformation, 95
orthogonal, 95
unitary, 95
equivalent generalized eigenvalue problems, 127
equivalent matrix pencils, 127
exchange matrix, 39, 89
exponential of a Jordan block, 91, 115
exponential of a matrix, 81, 109
computation of, 114–118
inverse of, 110
properties of, 109–112
field, 7
four fundamental subspaces, 23
function of a matrix, 81
generalized eigenvalue, 125
generalized real Schur form, 128
generalized Schur form, 127
generalized singular value decomposition,
134
geometric multiplicity, 76
Hölder Inequality, 58
Hermitian transpose, 2
higher-order difference equations
conversion to first-order form, 121
higher-order differential equations
conversion to first-order form, 120
higher-order eigenvalue problems
conversion to first-order form, 136
i, 2
idempotent, 6, 51
identity matrix, 4
inertia, 103
initial-value problem, 109
for higher-order equations, 120
for homogeneous linear difference equations, 118
for homogeneous linear differential equations, 112
for inhomogeneous linear difference equations, 119
for inhomogeneous linear differential equations, 112
inner product
complex, 55
complex Euclidean, 4
Euclidean, 4, 54
real, 54
usual, 54
weighted, 54
invariant factors, 84
inverses
of block matrices, 47
j, 2
Jordan block, 82
Jordan canonical form (JCF), 82
Kronecker canonical form (KCF), 129
Kronecker delta, 20
Kronecker product, 139
determinant of, 142
eigenvalues of, 141
eigenvectors of, 141
products of, 140
pseudoinverse of, 148
singular values of, 141
trace of, 142
transpose of, 140
Kronecker sum, 142
eigenvalues of, 143
eigenvectors of, 143
exponential of, 149
leading principal submatrix, 100
left eigenvector, 75
left generalized eigenvector, 125
left invertible, 26
left nullspace, 22
left principal vector, 85
linear dependence, 10
linear equations
characterization of all solutions, 44
existence of solutions, 44
uniqueness of solutions, 45
linear independence, 10
linear least squares problem, 65
general solution of, 66
geometric solution of, 67
residual of, 65
solution via QR factorization, 71
solution via singular value decomposition, 70
statement of, 65
uniqueness of solution, 66
linear regression, 67
linear transformation, 17
co-domain of, 17
composition of, 19
domain of, 17
invertible, 25
left invertible, 26
matrix representation of, 18
nonsingular, 25
nullspace of, 20
range of, 20
right invertible, 26
LU factorization, 6
block, 5
Lyapunov differential equation, 113
Lyapunov equation, 144
and asymptotic stability, 146
integral form of solution, 146
symmetry of solution, 146
uniqueness of solution, 146
matrix
asymptotically stable, 145
best rank k approximation to, 67
companion, 105
defective, 76
definite, 99
derogatory, 106
diagonal, 2
exponential, 109
Hamiltonian, 122
Hermitian, 2
Householder, 97
indefinite, 99
lower Hessenberg, 2
lower triangular, 2
nearest singular matrix to, 67
nilpotent, 115
nonderogatory, 105
normal, 33, 95
orthogonal, 4
pentadiagonal, 2
quasi-upper-triangular, 98
sign of a, 91
square root of a, 101
symmetric, 2
symplectic, 122
tridiagonal, 2
unitary, 4
upper Hessenberg, 2
upper triangular, 2
matrix exponential, 81, 91, 109
matrix norm, 59
1-, 60
2-, 60
∞-, 60
p-, 60
consistent, 61
Frobenius, 60
induced by a vector norm, 61
mixed, 60
mutually consistent, 61
relations among, 61
Schatten, 60
spectral, 60
subordinate to a vector norm, 61
unitarily invariant, 62
matrix pencil, 125
equivalent, 127
reciprocal, 126
regular, 126
singular, 126
matrix sign function, 91
minimal polynomial, 76
monic polynomial, 76
Moore–Penrose pseudoinverse, 29
multiplication
matrix-matrix, 3
matrix-vector, 3
Murnaghan–Wintner Theorem, 98
negative definite, 99
negative invariant subspace, 92
nonnegative definite, 99
criteria for, 100
nonpositive definite, 99
norm
induced, 56
natural, 56
normal equations, 65
normed linear space, 57
nullity, 24
nullspace, 20
left, 22
right, 22
observability, 46
one-to-one (1-1), 23
conditions for, 25
onto, 23
conditions for, 25
orthogonal
complement, 21
matrix, 4
projection, 52
subspaces, 14
vectors, 4, 20
orthonormal
vectors, 4, 20
outer product, 19
and Kronecker product, 140
exponential of, 121
pseudoinverse of, 33
singular value decomposition of, 41
various matrix norms of, 63
pencil
equivalent, 127
of matrices, 125
reciprocal, 126
regular, 126
singular, 126
Penrose theorem, 30
polar factorization, 41
polarization identity, 57
positive definite, 99
criteria for, 100
positive invariant subspace, 92
power (kth) of a Jordan block, 120
powers of a matrix
computation of, 119–120
principal submatrix, 100
projection
oblique, 51
on four fundamental subspaces, 52
orthogonal, 52
pseudoinverse, 29
four Penrose conditions for, 30
of a full-column-rank matrix, 30
of a full-row-rank matrix, 30
of a matrix product, 32
of a scalar, 31
of a vector, 31
uniqueness, 30
via singular value decomposition, 38
Pythagorean Identity, 59
Q-orthogonality, 55
QR factorization, 72
R^n, 1
R^{m×n}, 1
R_r^{m×n}, 1
R_n^{n×n}, 1
range, 20
range inclusion
characterized by pseudoinverses, 33
rank, 23
column, 23
row, 23
rank–one matrix, 19
rational canonical form, 104
Rayleigh quotient, 100
reachability, 46
real Schur canonical form, 98
real Schur form, 98
reciprocal matrix pencil, 126
reconstructibility, 46
regular matrix pencil, 126
residual, 65
resolvent, 111
reverse–order identity matrix, 39, 89
right eigenvector, 75
right generalized eigenvector, 125
right invertible, 26
right nullspace, 22
right principal vector, 85
row
rank, 23
vector, 1
Schur canonical form, 98
generalized, 127
Schur complement, 6, 48, 102, 104
Schur Theorem, 98
Schur vectors, 98
second-order eigenvalue problem, 135
conversion to first-order form, 135
Sherman–Morrison–Woodbury formula, 48
signature, 103
similarity transformation, 95
and invariance of eigenvalues, 81
orthogonal, 95
unitary, 95
simple eigenvalue, 85
simultaneous diagonalization, 133
via singular value decomposition, 134
singular matrix pencil, 126
singular value decomposition (SVD), 35
and bases for four fundamental subspaces, 38
and pseudoinverse, 38
and rank, 38
characterization of a matrix factorization as, 37
dyadic expansion, 38
examples, 37
full vs. compact, 37
fundamental theorem, 35
nonuniqueness, 36
singular values, 36
singular vectors
left, 36
right, 36
span, 11
spectral radius, 62, 107
spectral representation, 97
spectrum, 76
subordinate norm, 61
subspace, 9
A-invariant, 89
deflating, 129
reducing, 130
subspaces
complements of, 13
direct sum of, 13
equality of, 10
four fundamental, 23
intersection of, 13
orthogonal, 14
sum of, 13
Sylvester differential equation, 113
Sylvester equation, 144
integral form of solution, 145
uniqueness of solution, 145
Sylvester's Law of Inertia, 103
symmetric generalized eigenvalue problem, 131
total least squares, 68
trace, 6
transpose, 2
characterization by inner product, 54
of a block matrix, 2
triangle inequality
for matrix norms, 59
for vector norms, 57
unitarily invariant
matrix norm, 62
vector norm, 58
variation of parameters, 112
vec
of a matrix, 145
of a matrix product, 147
vector norm, 57
1-, 57
2-, 57
∞-, 57
p-, 57
equivalent, 59
Euclidean, 57
Manhattan, 57
relations among, 59
unitarily invariant, 58
weighted, 58
weighted p-, 58
vector space, 8
dimension of, 12
vectors, 1
column, 1
linearly dependent, 10
linearly independent, 10
orthogonal, 4, 20
orthonormal, 4, 20
row, 1
span of a set of, 11
zeros
of a linear dynamical system, 130
It is thus crucial to acquire knowledge vocabulary a working knowledge of the vocabulary and grammar of this language. are there "best" linearly independent subsets? These tum out to turn be much more difficult problems and frequently involve researchlevel questions when set be much more difficult problems and frequently involve researchlevel questions when set in the context of the finiteprecision. Since this text is not intended for a course in numerical linear algebra per se. for example. mathematics. The presentation of the material in this book is strongly influenced by computais influenced by computational issues for two principal reasons. science. and the course has proven to be remarkably successful at enabling students from disparate backgrounds to acquire a quite acceptable level of mathematical maturity and acceptable graduate rigor for subsequent graduate studies in a variety of disciplines. outputs. Statespace methods are Statespace modem now standard in much of modern engineering where. The tools of matrix analysis are also applied on a daily basis to problems in biology. consistent. . the details of most of the numerical aspects of linear algebra per se. if only they had had this course before they took linear systems. This is an absolutely fundamental fundamental concept. must lay firm foundation upon which and perspectives perspectives can be built in a logical. one must lay a firm foundation upon which subsequent applications and Rather. This is ideal material from which to learn a bit about mathematical proofs and the mathematical maturity and insight gained thereby. the student does require a certain amount of what is conventionally referred Proofs referred to as "mathematical maturity." For example. "reallife" problems seldom yield to simple "reallife" closedform formulas or solutions. simulated. form the foundation virtually modem upon which rests virtually all of modern scientific and engineering computation. 
finiterange floatingpoint arithmetic environment of of of most modem computing platforms. Some of the key algorithms of numerical linear algebra. they are either obvious or easily found in the literature. are deferred to such a course. First. applied physics. Indeed. in particular. It is my firm conviction that such maturity is neither encouraged conviction neither nor nurtured by relegating the mathematical aspects of applications (for example. When they are not given explicitly. prerequisites developed While prerequisites for this text are modest. I have taught this material for many years. they are either obvious or easily found in the literature. many times at UCSB and twice at UC Davis. econometrics. If If are linearly dependent.xii xii Preface Preface Mathcad® Mathematica® or Mathcad® is also excellent. They must generally be solved computationally and closedform it is important to know which types of algorithms can be relied upon and which cannot.
realized that by requiring this course as a prerequisite. . The concept seems to work.. My fellow instructors. etc. they would have been able to concentrate on the new ideas deficiencies they wanted to learn. AJL.Preface Preface xiii XIII or estimation theory. too. June 2004 — AJL. they no longer had to provide as much time for "review" and could focus instead on the subject at hand. rather than having to spend time making up for deficiencies in their background background in matrices and linear algebra.
Chapter 1

Introduction and Review

1.1 Some Notation and Terminology

We begin with a brief introduction to some standard notation and terminology to be used throughout the text. This is followed by a review of some basic notions in matrix analysis and linear algebra.

The following sets appear frequently throughout subsequent chapters:

1. R^n = the set of n-tuples of real numbers represented as column vectors. Thus, x ∈ R^n means

   x = [x_1; ... ; x_n],

   where x_i ∈ R for i ∈ n. Henceforth, the notation n denotes the set {1, ..., n}.

   Note: Vectors are always column vectors. A row vector is denoted by y^T, where y ∈ R^n and the superscript T is the transpose operation. That a vector is always a column vector rather than a row vector is entirely arbitrary, but this convention makes it easy to recognize immediately throughout the text that, e.g., x^T y is a scalar while x y^T is an n x n matrix.

2. C^n = the set of n-tuples of complex numbers represented as column vectors.

3. R^{m x n} = the set of real (or real-valued) m x n matrices.

4. R_r^{m x n} = the set of real m x n matrices of rank r. Thus, R_n^{n x n} denotes the set of real nonsingular n x n matrices.

5. C^{m x n} = the set of complex (or complex-valued) m x n matrices.

6. C_r^{m x n} = the set of complex m x n matrices of rank r.
The transpose of a matrix A is denoted by A^T and is the matrix whose (i, j)th entry is the (j, i)th entry of A; that is, (A^T)_{ij} = a_{ji}. Note that if A ∈ R^{m x n}, then A^T ∈ R^{n x m}. If A ∈ C^{m x n}, then its Hermitian transpose (or conjugate transpose) is denoted by A^H (or sometimes A*) and its (i, j)th entry is (A^H)_{ij} = conj(a_{ji}), where the bar (or conj) indicates complex conjugation; i.e., if z = α + jβ (j = i = sqrt(-1)), then conj(z) = α - jβ. A matrix A is symmetric if A = A^T and Hermitian if A = A^H. We henceforth adopt the convention that, unless otherwise noted, an equation like A = A^T implies that A is real-valued while a statement like A = A^H implies that A is complex-valued.

Remark 1.1. While sqrt(-1) is most commonly denoted by i in mathematics texts, j is the more common notation in electrical engineering and system theory. There is some advantage to being conversant with both notations. The notation j is used throughout the text but reminders are placed at strategic locations.

Example 1.2.

1. A = [7 3; 3 5] is symmetric (and Hermitian).

2. A = [7 3+j; 3-j 5] is Hermitian (but not symmetric).

3. A = [7 3+j; 3+j 5] is complex-valued symmetric but not Hermitian.

Transposes of block matrices can be defined in an obvious way. For example, it is easy to see that if A_{ij} are appropriately dimensioned subblocks, then

[A_{11} A_{12}; A_{21} A_{22}]^T = [A_{11}^T A_{21}^T; A_{12}^T A_{22}^T].

We now classify some of the more familiar "shaped" matrices. A matrix A ∈ R^{n x n} (or A ∈ C^{n x n}) is

• diagonal if a_{ij} = 0 for i ≠ j.
• upper triangular if a_{ij} = 0 for i > j.
• lower triangular if a_{ij} = 0 for i < j.
• tridiagonal if a_{ij} = 0 for |i - j| > 1.
• pentadiagonal if a_{ij} = 0 for |i - j| > 2.
• upper Hessenberg if a_{ij} = 0 for i - j > 1.
• lower Hessenberg if a_{ij} = 0 for j - i > 1.

Each of the above also has a "block" analogue obtained by replacing scalar components in the respective definitions by block submatrices. For example, if A ∈ R^{n x n}, B ∈ R^{n x m}, and C ∈ R^{m x m}, then the (m + n) x (m + n) matrix [A B; 0 C] is block upper triangular.
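The distinctions in Example 1.2 can be verified mechanically. The short Python sketch below is purely illustrative (the text itself recommends MATLAB for computation); matrices are represented as lists of rows.

```python
# Check symmetry (A == A^T) and the Hermitian property (A == conj(A)^T)
# for the three 2x2 matrices of Example 1.2.

def transpose(A):
    # rows of A^T are the columns of A
    return [list(row) for row in zip(*A)]

def conj_transpose(A):
    # A^H: transpose, then conjugate every entry
    return [[z.conjugate() for z in row] for row in zip(*A)]

A1 = [[7, 3], [3, 5]]                    # symmetric (and Hermitian)
A2 = [[7, 3 + 1j], [3 - 1j, 5]]          # Hermitian but not symmetric
A3 = [[7, 3 + 1j], [3 + 1j, 5]]          # complex symmetric, not Hermitian

assert A1 == transpose(A1) and A1 == conj_transpose(A1)
assert A2 != transpose(A2) and A2 == conj_transpose(A2)
assert A3 == transpose(A3) and A3 != conj_transpose(A3)
```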
1.2 Matrix Arithmetic

It is assumed that the reader is familiar with the fundamental notions of matrix addition, multiplication of a matrix by a scalar, and multiplication of matrices.

A special case of matrix multiplication occurs when the second matrix is a column vector x, i.e., the matrix-vector product Ax. A very important way to view this product is to interpret it as a weighted sum (linear combination) of the columns of A. That is, suppose

A = [a_1, ..., a_n] ∈ R^{m x n} with a_i ∈ R^m and x = [x_1; ... ; x_n].

Then

Ax = x_1 a_1 + ... + x_n a_n ∈ R^m.

The importance of this interpretation cannot be overemphasized. As a numerical example, take A = [9 8 7; 6 5 4], x = [3; 2; 1]. Then we can quickly calculate dot products of the rows of A with the column x to find Ax = [50; 32], but this matrix-vector product can also be computed via

3 [9; 6] + 2 [8; 5] + 1 [7; 4].

For large arrays of numbers, there can be important computer-architecture-related advantages to preferring the latter calculation method.

For matrix multiplication, suppose A ∈ R^{m x n} and B = [b_1, ..., b_p] ∈ R^{n x p} with b_i ∈ R^n. Then the matrix product AB can be thought of as above, applied p times:

AB = [Ab_1, ..., Ab_p].

There is also an alternative, but equivalent, formulation of matrix multiplication that appears frequently in the text and is presented below as a theorem. Again, its importance cannot be overemphasized. It is deceptively simple and its full understanding is well rewarded.

Theorem 1.3. Let U = [u_1, ..., u_n] ∈ R^{m x n} with u_i ∈ R^m and V = [v_1, ..., v_n] ∈ R^{p x n} with v_i ∈ R^p. Then

U V^T = sum_{i=1}^n u_i v_i^T ∈ R^{m x p}.

If matrices C and D are compatible for multiplication, recall that (CD)^T = D^T C^T (or (CD)^H = D^H C^H). This gives a dual to the matrix-vector result above. Namely, if C ∈ R^{m x n} has row vectors c_j^T ∈ R^{1 x n}, and is premultiplied by a row vector y^T ∈ R^{1 x m}, then the product can be written as a weighted linear sum of the rows of C as follows:

y^T C = y_1 c_1^T + ... + y_m c_m^T ∈ R^{1 x n}.

Theorem 1.3 can then also be generalized to its "row dual." The details are left to the reader.
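The two views of Ax described above (dot products with the rows of A versus a weighted sum of the columns of A) can be sketched in a few lines. Python is used here purely for illustration; the text itself recommends MATLAB.

```python
# Compute Ax two ways on the numerical example A = [9 8 7; 6 5 4], x = [3; 2; 1]:
# (1) each entry of Ax is a dot product of a row of A with x;
# (2) Ax is the weighted sum x1*a1 + x2*a2 + x3*a3 of the columns of A.

A = [[9, 8, 7],
     [6, 5, 4]]
x = [3, 2, 1]

# (1) row-oriented
row_result = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# (2) column-oriented: accumulate x_j times column j of A
m = len(A)
col_result = [0] * m
for j, x_j in enumerate(x):
    for i in range(m):
        col_result[i] += x_j * A[i][j]

assert row_result == col_result == [50, 32]
```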
1.3 Inner Products and Orthogonality

For vectors x, y ∈ R^n, the Euclidean inner product (or inner product, for short) of x and y is given by

(x, y) := x^T y = sum_{i=1}^n x_i y_i.

Note that the inner product is a scalar.

If x, y ∈ C^n, we define their complex Euclidean inner product (or inner product, for short) by

(x, y)_c := x^H y = sum_{i=1}^n conj(x_i) y_i.

Note that (x, y)_c = conj((y, x)_c), i.e., the order in which x and y appear in the complex inner product is important. The more conventional definition of the complex inner product is (x, y)_c = y^H x = sum_{i=1}^n x_i conj(y_i), but throughout the text we prefer the symmetry with the real case.

Example 1.4. Let x = [1; j] and y = [1; 2]. Then

(x, y)_c = [1 j]^H [1; 2] = [1 -j] [1; 2] = 1 - 2j,

while

(y, x)_c = [1 2]^H [1; j] = [1 2] [1; j] = 1 + 2j,

and we see that, indeed, (x, y)_c = conj((y, x)_c).

Note that x^T x = 0 if and only if x = 0 when x ∈ R^n but that this is not true if x ∈ C^n. What is true in the complex case is that x^H x = 0 if and only if x = 0. To illustrate, consider the nonzero vector x above. Then x^T x = 0 but x^H x = 2.

Two nonzero vectors x, y ∈ R^n are said to be orthogonal if their inner product is zero, i.e., x^T y = 0. Nonzero complex vectors are orthogonal if x^H y = 0. If x and y are orthogonal and x^T x = 1 and y^T y = 1, then we say that x and y are orthonormal. A matrix A ∈ R^{n x n} is an orthogonal matrix if A^T A = A A^T = I, where I is the n x n identity matrix. The notation I_n is sometimes used to denote the identity matrix in R^{n x n} (or C^{n x n}). Similarly, a matrix A ∈ C^{n x n} is said to be unitary if A^H A = A A^H = I. Clearly an orthogonal or unitary matrix has orthonormal rows and orthonormal columns. There is no special name attached to a nonsquare matrix A ∈ R^{m x n} (or ∈ C^{m x n}) with orthonormal rows or columns.
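The inner-product identities of this section are easy to confirm numerically. The following Python sketch (illustrative only; any computational environment, such as the MATLAB software recommended in the preface, would serve equally well) re-does Example 1.4.

```python
# Complex Euclidean inner product <x, y>_c := x^H y = sum conj(x_i) * y_i,
# checked on x = [1; j] and y = [1; 2] from Example 1.4.

def cip(x, y):
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

x = [1 + 0j, 1j]
y = [1 + 0j, 2 + 0j]

assert cip(x, y) == 1 - 2j
assert cip(y, x) == 1 + 2j
assert cip(x, y) == cip(y, x).conjugate()   # <x,y>_c = conj(<y,x>_c)

# x^T x = 0 for this nonzero complex x, yet x^H x = 2 != 0
assert sum(xi * xi for xi in x) == 0
assert cip(x, x) == 2
```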
1.4 Determinants

It is assumed that the reader is familiar with the basic theory of determinants. For A ∈ R^{n x n} (or A ∈ C^{n x n}) we use the notation det A for the determinant of A. We list below some of the more useful properties of determinants. Note that this is not a minimal set, i.e., several properties are consequences of one or more of the others.

1. If A has a zero row or if any two rows of A are equal, then det A = 0.
2. If A has a zero column or if any two columns of A are equal, then det A = 0.
3. Interchanging two rows of A changes only the sign of the determinant.
4. Interchanging two columns of A changes only the sign of the determinant.
5. Multiplying a row of A by a scalar α results in a new matrix whose determinant is α det A.
6. Multiplying a column of A by a scalar α results in a new matrix whose determinant is α det A.
7. Multiplying a row of A by a scalar and then adding it to another row does not change the determinant.
8. Multiplying a column of A by a scalar and then adding it to another column does not change the determinant.
9. det A^T = det A (det A^H = conj(det A) if A ∈ C^{n x n}).
10. If A is diagonal, then det A = a_{11} a_{22} ... a_{nn}, i.e., det A is the product of its diagonal elements.
11. If A is upper triangular, then det A = a_{11} a_{22} ... a_{nn}.
12. If A is lower triangular, then det A = a_{11} a_{22} ... a_{nn}.
13. If A is block diagonal (or block upper triangular or block lower triangular), with square diagonal blocks A_{11}, A_{22}, ..., A_{nn} (of possibly different sizes), then det A = det A_{11} det A_{22} ... det A_{nn}.
14. If A, B ∈ R^{n x n}, then det(AB) = det A det B.
15. If A ∈ R_n^{n x n}, then det(A^{-1}) = 1 / det A.
16. If A ∈ R_n^{n x n} and D ∈ R^{m x m}, then det [A B; C D] = det A det(D - C A^{-1} B).

Proof: This follows easily from the block LU factorization

[A B; C D] = [I 0; C A^{-1} I] [A B; 0 D - C A^{-1} B].

17. If A ∈ R^{n x n} and D ∈ R_m^{m x m}, then det [A B; C D] = det D det(A - B D^{-1} C).

Proof: This follows easily from the block UL factorization

[A B; C D] = [I B D^{-1}; 0 I] [A - B D^{-1} C 0; C D].
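Property 16 is easy to spot-check numerically. The Python sketch below is illustrative only (the text recommends MATLAB for such experiments); it verifies det [A B; C D] = det A det(D - C A^{-1} B) on one 4 x 4 example built from 2 x 2 blocks, with a plain cofactor-expansion determinant.

```python
# Check det([A B; C D]) = det(A) * det(D - C A^{-1} B) on 2x2 blocks.

def det(M):
    # determinant via cofactor expansion along the first row
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    # inverse of a nonsingular 2x2 matrix
    d = det(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

A = [[2.0, 1.0], [0.0, 3.0]]
B = [[1.0, 0.0], [2.0, 1.0]]
C = [[0.0, 1.0], [1.0, 0.0]]
D = [[4.0, 0.0], [1.0, 2.0]]

full = [A[0] + B[0], A[1] + B[1], C[0] + D[0], C[1] + D[1]]   # [A B; C D]
CAinvB = matmul(matmul(C, inv2(A)), B)
schur = [[D[i][j] - CAinvB[i][j] for j in range(2)] for i in range(2)]

assert abs(det(full) - det(A) * det(schur)) < 1e-9
```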
Remark 1.5. The factorization of a matrix A into the product of a unit lower triangular matrix L (i.e., lower triangular with all 1's on the diagonal) and an upper triangular matrix U is called an LU factorization; see, for example, [24]. Another such factorization is UL where U is unit upper triangular and L is lower triangular. The factorizations used above to prove properties 16 and 17 are block analogues of these.

Remark 1.6. The matrix D - C A^{-1} B is called the Schur complement of A in [A B; C D]. Similarly, A - B D^{-1} C is called the Schur complement of D in [A B; C D].

EXERCISES

1. If A ∈ R^{n x n} and α is a scalar, what is det(αA)? What is det(-A)?

2. If A is orthogonal, what is det A? If A is unitary, what is det A?

3. Let x, y ∈ R^n. Show that det(I - x y^T) = 1 - y^T x.

4. Let U_1, U_2, ..., U_k ∈ R^{n x n} be orthogonal matrices. Show that the product U = U_1 U_2 ... U_k is an orthogonal matrix.

5. Let A ∈ R^{n x n}. The trace of A, denoted Tr A, is defined as the sum of its diagonal elements, i.e., Tr A = sum_{i=1}^n a_{ii}.

   (a) Show that the trace is a linear function; i.e., if A, B ∈ R^{n x n} and α, β ∈ R, then Tr(αA + βB) = α Tr A + β Tr B.

   (b) Show that Tr(AB) = Tr(BA), even though in general AB ≠ BA.

   (c) Let S ∈ R^{n x n} be skew-symmetric, i.e., S^T = -S. Show that Tr S = 0. Then either prove the converse or provide a counterexample.

6. A matrix A ∈ R^{n x n} is said to be idempotent if A^2 = A.

   (a) Show that the matrix A = (1/2) [2 cos^2 θ  sin 2θ; sin 2θ  2 sin^2 θ] is idempotent for all θ.

   (b) Suppose A ∈ R^{n x n} is idempotent and A ≠ I. Show that A must be singular.
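As a quick numerical sanity check (not a proof) of Exercise 6(a), the Python sketch below verifies A^2 = A for several values of θ. It uses the identity sin 2θ / 2 = sin θ cos θ; Python is used here purely for illustration.

```python
# Verify numerically that A(t) = (1/2)[2cos^2 t, sin 2t; sin 2t, 2sin^2 t]
# satisfies A^2 = A for a few sample values of t.

import math

def idempotent_matrix(t):
    return [[math.cos(t) ** 2,    math.sin(2 * t) / 2],
            [math.sin(2 * t) / 2, math.sin(t) ** 2]]

def matmul2(X, Y):
    # product of two 2x2 matrices
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

for t in (0.0, 0.3, 1.0, 2.5):
    A = idempotent_matrix(t)
    A2 = matmul2(A, A)
    assert all(abs(A2[i][j] - A[i][j]) < 1e-12
               for i in range(2) for j in range(2))
```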
Chapter 2

Vector Spaces

In this chapter we give a brief review of some of the basic concepts of vector spaces. The emphasis is on finite-dimensional vector spaces, including spaces formed by special classes of matrices, but some infinite-dimensional examples are also cited. An excellent reference for this and the next chapter is [10], where some of the proofs that are not given here may be found.

2.1 Definitions and Examples

Definition 2.1. A field is a set F together with two operations +, · : F x F -> F such that

(A1) α + (β + γ) = (α + β) + γ for all α, β, γ ∈ F.
(A2) there exists an element 0 ∈ F such that α + 0 = α for all α ∈ F.
(A3) for all α ∈ F, there exists an element (-α) ∈ F such that α + (-α) = 0.
(A4) α + β = β + α for all α, β ∈ F.
(M1) α · (β · γ) = (α · β) · γ for all α, β, γ ∈ F.
(M2) there exists an element 1 ∈ F such that α · 1 = α for all α ∈ F.
(M3) for all α ∈ F, α ≠ 0, there exists an element α^{-1} ∈ F such that α · α^{-1} = 1.
(M4) α · β = β · α for all α, β ∈ F.
(D) α · (β + γ) = α · β + α · γ for all α, β, γ ∈ F.

Axioms (A1)-(A3) state that (F, +) is a group and an abelian group if (A4) also holds. Axioms (M1)-(M4) state that (F \ {0}, ·) is an abelian group.

Generally speaking, when no confusion can arise, the multiplication operator "·" is not written explicitly.

Example 2.2.

1. R with ordinary addition and multiplication is a field.

2. C with ordinary complex addition and multiplication is a field.

3. Ra[x] = the field of rational functions in the indeterminate x

   = { (α_0 + α_1 x + ... + α_p x^p) / (β_0 + β_1 x + ... + β_q x^q) : α_i, β_i ∈ R; p, q ∈ Z+ },

   where Z+ = {0, 1, 2, ...}, is a field.

4. R_r^{m x n} = {m x n matrices of rank r with real coefficients} is clearly not a field since, for example, (M1) does not hold unless m = n. Moreover, R^{n x n} is not a field either since (M4) does not hold in general (although the other 8 axioms hold).

Definition 2.3. A vector space over a field F is a set V together with two operations + : V x V -> V and · : F x V -> V such that

(V1) (V, +) is an abelian group.
(V2) (α · β) · v = α · (β · v) for all α, β ∈ F and for all v ∈ V.
(V3) (α + β) · v = α · v + β · v for all α, β ∈ F and for all v ∈ V.
(V4) α · (v + w) = α · v + α · w for all α ∈ F and for all v, w ∈ V.
(V5) 1 · v = v for all v ∈ V (1 ∈ F).

A vector space is denoted by (V, F) or, when there is no possibility of confusion as to the underlying field, simply by V.

Remark 2.4. Note that + and · in Definition 2.3 are different from the + and · in Definition 2.1 in the sense of operating on different objects in different sets. In practice, this causes no confusion and the · operator is usually not even written explicitly.

Example 2.5.

1. (R^n, R) with addition defined by

   x + y = [x_1 + y_1; ... ; x_n + y_n]

   and scalar multiplication defined by

   α x = [α x_1; ... ; α x_n]

   is a vector space. Similar definitions hold for (C^n, C).
2.P. simply by V.. C with ordinary complex addition and multiplication is a field. 3. where Z+ = {O. Example 2.2.5. R) with addition defined by I. 1. F) or. (IRn. Example 2.F xV »• V such that (VI) (V.. p E IF andfor all v e V. when there is no possibility of confusion as to the A vector space is denoted by (V. underlying field. Moreover.3 are different from the + and • in Definition 2.. no confusion and the operator is usually not even written explicitly. }. Ra[x] = the field of rational functions in the indeterminate x 3.. (V5) 1 v = v for all v e V (1 Elf).f3i EIR . IR with ordinary addition and multiplication is a field. 4. IR~ xn = m x n matrices of rank r with real coefficients) is clearly not a field since. C).3. (V4) a· (v + w) = a . is a vector space.2. IF) or. in Definition Remark 2. (V3) (a (V4) a(v w)=av a w for all a ElF andfor all v. A vector space over a field IF is a set V together with two operations + ::V x V + V and· :: IF x V + V such that V x V ^V and. lR~xn is not a field either since (M4) does not hold in general (although the other 8 axioms hold).}. + f3qXq :aj. 4. (VI) (V.5. . this causes no confusion and the·• operator is usually not even written explicitly. In practice. RMrmxn= {m x n matrices of rank r with real coefficients} is clearly not a field since. ) f for all a. f3 Elf andforall v E V.1. Vector Spaces Example 2. +) is an abelian group. (R". A vector space over a field F is a set V together with two operations Definition 2. Similar definitions hold for (en.4. w E V. this causes 2. e). v for all a.r] = the field of rational functions in the indeterminate x = {ao + f30 + atX f3t X + . I.1 in the sense of operating on different objects in different sets.. simply by V. Example 2.. 1. p € F and for all v e V. is a field. (V2) (a·.p ) .• v = a·• v + p • v for all a. v + a. IR) with addition defined by and scalar multiplication defined by and scalar multiplication defined by is a vector space. w for all a e F and for all v. 
v = a .qEZ +} . +) is an abelian group. where Z+ = {0..3 are different from the + and . Note that + and • in Definition 2. for example. A vector space is denoted by (V. In practice. (V3) (a + f3). (Ml) does not hold unless m = n. 2.4. Remark 2.( (f3' V v) o r all a.8 Chapter 2. Definition 2.3. when there is no possibility of confusion as to the underlying fie Id. . Note that + and· in Definition 2. Raf. (MI) does not hold unless m = n. w e V.. . (V2) ( a f3) v = a P . ft) v = a v + f3. e with ordinary complex addition and multiplication is a field. R with ordinary addition and multiplication is a field.1 in the sense of operating on different objects in different sets. R"x" is not a field either for example. f3 e F and for all v E V. since (M4) does not hold in general (although the other 8 axioms hold). Similar definitions hold for (C".l. 2. Moreover. + apxP + .. (V5) I·• v = v for all v E V (1 e F).2. is a field.
and the functions are piecewise continuous =: (PC[to. Then O(D. if and only subspace of (V. is henceforth understood to mean "is a subspace of. Note.2. that since 0 e F." + fJ2I a21 + P" .2. Let O(X>.7. verify that the set in or prove that something is indeed a subspace (or vector space). for all d ED. IF) = (JRn. td)n. Then cf>('O.7. Let A € R"x".2..2 2. JR). IF) be an arbitrary vector space and '0 be an arbitrary set. Notation: When the underlying field is understood. less restrictive meaning "is a subset of' is specifically flagged as such. foral! a. Let (V. Notation: When the underlying field is understood. V) be the set of functions / mapping D to V.e. t\]. and the symbol c. i. IF) is a subspace of (V. (JRmxn. (V. Let A E JR(nxn. W2 E W. Then (W.. is henceforth understood to mean "is a subspace of. g E cf> and scalar multiplication defined by and scalar multiplication defined by (af)(d) = af(d) for all a E IF. h])n (b) '0 = [to. Special Cases: Special Cases: (a) V = [to. td.. when used with vector spaces. and for all f E cf>. (V. implies that the zero vector must be in any subspace. V) is a vector space with addition set of functions f mapping '0 to V. y a l2 y a 22 yam 2 ya. Note. etc. F) be an arbitrary vector space and V be an arbitrary set. t\])n continuous =: (C[to. (E mxn JR) is a vector space with addition defined by 2. Then (W. +00). we write W c V. V) be the 3. E) is a vector space with addition defined by 9 9 A+B= [ . ß E ¥ and for all w1. too. . too. F) is itself a vector space or. (V.6. F) be a vector space and let W ~ V.2 Subspaces Subspaces Definition 2. Subspaces 2. if and only if(aw1 ßW2) E if(awl + fJw2) e W for all a. V) is a vector space with addition defined by defined by (f + g)(d) = fed) + g(d) for all d E '0 and for all f. l . F) = (IR". IF) = (JR n . IF) is itself a vector space or. W = 0. JR). 4. 4. =: (PC[f0. F) if and only if (W. 
The latter characterization of a subspace is often the easiest way to check or prove that something is indeed a subspace (or vector space)." The less restrictive meaning "is a subset of" is specifically flagged as such. i. Then {x(t) : x(t) = Ax(t}} is a vector space (of dimension n). Then (x(t) : x ( t ) = Ax(t)} is a vector space (of dimension n). Let cf>('O. Let (V. equivalently. this implies that the zero vector must be in any subspace. and the functions are piecewise continuous (a) '0 = [to. fJ e IF andforall WI. td)n or continuous =: (C[?0. + fJmn l yaml yamn 3. Subspaces 2." The when used with vector spaces. W f= 0.e. IF) if and only if (W." ya2n . this question is closed under addition and scalar multiplication. 2. we write W ~ V. The latter characterization of a subspace is often the easiest way to check Remark 2.6.. Let (V. and the symbol ~. amI al2 a22 + fJI2 + fJ22 aln + fJln a2n + fJ2n a mn + fJml am2 + fJm2 and scalar multiplication defined by and scalar multiplication defined by [ ya" y a 21 yA = . Let (V. E). F) is a Definition 2. verify that the set in question is closed under addition and scalar multiplication. w2 e Remark 2. IF) be a vector space and let W c V. that since 0 E IF. equivalently.
2.lF) = (R" X ".ß with ß =1= 0 are called linear varieties. . Then W is /wf a subspace of JR.0.o. If 12.O.. R) and for each v € R2 of the form v = [v1v2 ] identify v1 with 3. V2. Let W = {A € R"x" : A is orthogonal}.nxn. Vector Spaces Example 2. W~V. 1.1.e.S. .Vk of X and for any scalars a1. V usually denotes a vector space with the underlying field generally being R unless Thus. E ]Rnxn : not 2. . vk E X and scalars a1. then R = S if and only if R C S and S C R. in some vector space V. f3 e R. . A2 are symmetric. . .JR. too. W2. Wi. As an interesting exercise. c E JR.. v2. be an element of R..•. JR.9.nxn.} . W1/2.. (Xk not all zero such that X is a linearly independent set of vectors if and only if for any collection of k distinct X is a linearly independent set of vectors if and only if for any collection of k distinct elements v1. As an interesting exercise. All lines through the origin are subspaces. • • •} be a nonempty collection of vectors u.I' and Wi. X is a linearly dependent set of vectors ifand only if there exist k distinct if and only if exist distinct elements v1. Consider (V. For a. W2. sketch W2.. define W". sketch Then Wa. Proof: Suppose A\. Example 2.ß is a subspace of V if and only if ß = O. .. + (XkVk = 0 implies al = 0.• } be a nonempty collection of vectors Vi in some vector space V. For ß E R define the jccoordinate in the plane and V2 with the ycoordinate. called linear varieties. Henceforth. F) = (R2. Consider (V.8. Let X {VI. ak. that the vertical line through the origin (i. = {A E JR.. . •.R) and 1. F) = (JR.... ak = O.10 Chapter 2.10.and W1/2. V usually denotes a vector space with the underlying field generally being JR. then R = S if and only if Definition 2. ffR and S are vector spaces (or subspaces). Then (V. Note. Definition 2.. R"x". . Vk e X and scalars aI.. .3 2. •••.o./l = {V : v = [ ac ~ f3 ] . Henceforth. Then it is easily shown that ctA\ + f3A2 is Proof' Suppose AI. ~SandS ~ R.nxn : A We V. .. 
Definition 2.9. If R and S are vector spaces (or subspaces), then R = S if and only if R ⊆ S and S ⊆ R.

Note: To prove two vector spaces are equal, one usually proves the two inclusions separately: an arbitrary r ∈ R is shown to be an element of S and then an arbitrary s ∈ S is shown to be an element of R.

2.3 Linear Independence

Let X = {v1, v2, ...} be a nonempty collection of vectors vi in some vector space V.

Definition 2.10. X is a linearly dependent set of vectors if and only if there exist k distinct elements v1, ..., vk ∈ X and scalars α1, ..., αk not all zero such that

   α1v1 + ... + αkvk = 0.

X is a linearly independent set of vectors if and only if for any collection of k distinct elements v1, ..., vk of X and for any scalars α1, ..., αk,

   α1v1 + ... + αkvk = 0 implies α1 = 0, ..., αk = 0.
Let vi ∈ R^n, i ∈ k, and consider the matrix V = [v1, ..., vk] ∈ R^{n×k}. The linear dependence of this set of vectors is equivalent to the existence of a nonzero vector a ∈ R^k such that Va = 0. An equivalent condition for linear dependence is that the k × k matrix V^T V is singular. If the set of vectors is independent, and there exists a ∈ R^k such that Va = 0, then a = 0. An equivalent condition for linear independence is that the matrix V^T V is nonsingular. Why?

Example 2.11. Let V = R^3. Then

1. a set {v1, v2, v3} whose elements satisfy a nontrivial relation such as 2v1 − v2 + v3 = 0 is a linearly dependent set;

2. a set of vectors admitting no such relation, such as {e1, e2, e3}, is a linearly independent set.

Example 2.12. Let A ∈ R^{n×n} and B ∈ R^{n×m}. Then consider the rows of e^{tA}B as vectors in C^m[t0, t1] (recall that e^{tA} denotes the matrix exponential, which is discussed in more detail in Chapter 11). Independence of these vectors turns out to be equivalent to a concept called controllability, to be studied further in what follows.

Definition 2.13. Let X = {v1, v2, ...} be a collection of vectors vi ∈ V. Then the span of X is defined as

   Sp(X) = Sp{v1, v2, ...} = {v : v = α1v1 + ... + αkvk; αi ∈ F, vi ∈ X, k ∈ N},

where N = {1, 2, ...}.

Example 2.14. Let V = R^n and define

   e1 = [1; 0; ...; 0], e2 = [0; 1; ...; 0], ..., en = [0; 0; ...; 1].

Then Sp{e1, e2, ..., en} = R^n.

Definition 2.15. A set of vectors X is a basis for V if and only if

1. X is a linearly independent set (of basis vectors), and

2. Sp(X) = V.
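The V^T V criterion above translates directly into a numerical test. A sketch using numpy; the vectors below are illustrative, not from the text:

```python
import numpy as np

# Stack the candidate vectors as columns of V; the set is linearly
# independent iff the k-by-k Gram matrix V^T V is nonsingular
# (equivalently, Va = 0 has only the trivial solution a = 0).
def is_independent(vectors):
    V = np.column_stack(vectors)
    k = V.shape[1]
    return np.linalg.matrix_rank(V.T @ V) == k

v1 = np.array([1., 0., 1.])
v2 = np.array([0., 1., 1.])
v3 = np.array([2., -1., 1.])

print(is_independent([v1, v2]))      # True
print(is_independent([v1, v2, v3]))  # False, since 2*v1 - v2 - v3 = 0
```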
Theorem 2.16. The number of elements in a basis of a vector space is independent of the particular basis considered.

Definition 2.17. If a basis X for a vector space V (≠ 0) has n elements, V is said to be n-dimensional or have dimension n, and we write dim(V) = n or dim V = n.

In R^n, {e1, ..., en} is a basis (sometimes called the natural basis), and

   v = [v1; ...; vn] = v1e1 + v2e2 + ... + vnen.

Now let b1, ..., bn be a basis (with a specific order associated with the basis vectors) for V. Then for all v ∈ V there exists a unique n-tuple {ξ1, ..., ξn} such that

   v = ξ1b1 + ... + ξnbn = Bx,

where B = [b1, ..., bn] and x = [ξ1; ...; ξn].

Definition 2.18. The scalars {ξi} are called the components (or sometimes the coordinates) of v with respect to the basis {b1, ..., bn} and are unique. We say that the vector x of components represents the vector v with respect to the basis B.

Example 2.19. We can also determine components of v with respect to a basis other than the natural one. For example, in R^2 the components x1, x2 of v with respect to a basis {b1, b2} are found by writing v = x1b1 + x2b2 and solving the resulting 2 × 2 system of linear equations Bx = v.
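Finding components with respect to a basis amounts to solving Bx = v, as in Example 2.19. A small numerical sketch; the basis and vector are illustrative, not from the text:

```python
import numpy as np

# Components of v with respect to the basis {b1, b2}: solve B x = v,
# where B = [b1 b2] has the basis vectors as columns.
b1 = np.array([1., 1.])
b2 = np.array([1., -1.])
B = np.column_stack([b1, b2])
v = np.array([3., 1.])

x = np.linalg.solve(B, v)                    # components of v w.r.t. {b1, b2}
assert np.allclose(x[0] * b1 + x[1] * b2, v)
print(x)                                     # [2. 1.], i.e., v = 2*b1 + 1*b2
```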
Theorem 2.16 says that dim(V) = the number of elements in a basis. A vector space V is finite-dimensional if there exists a basis X with n < +∞ elements; otherwise, V is infinite-dimensional. For consistency, and because the 0 vector is in any vector space, we define dim(0) = 0.

Remark 2.20.

1. dim(R^n) = n.

2. dim(R^{m×n}) = mn.

   Note: Check that a basis for R^{m×n} is given by the mn matrices Eij, i ∈ m, j ∈ n, where Eij is a matrix all of whose elements are 0 except for a 1 in the (i, j)th location. The collection of Eij matrices can be called the "natural basis matrices."

3. dim{A ∈ R^{n×n} : A = A^T} = n(n + 1)/2. (To see why, determine n(n + 1)/2 symmetric basis matrices.)

4. dim{A ∈ R^{n×n} : A is upper (lower) triangular} = n(n + 1)/2.

5. dim(C[t0, t1]) = +∞.
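The dimension count in Remark 2.20, item 3, can be checked constructively by building the n(n + 1)/2 symmetric basis matrices. A sketch using numpy; the helper `symmetric_basis` is ours, not from the text:

```python
import numpy as np
from itertools import combinations_with_replacement

# Build the symmetric "basis matrices": E_ii, and E_ij + E_ji for i < j.
def symmetric_basis(n):
    basis = []
    for i, j in combinations_with_replacement(range(n), 2):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        E[j, i] = 1.0
        basis.append(E)
    return basis

n = 4
basis = symmetric_basis(n)
assert len(basis) == n * (n + 1) // 2       # 10 matrices for n = 4

# The flattened basis matrices are linearly independent:
M = np.column_stack([E.ravel() for E in basis])
assert np.linalg.matrix_rank(M) == len(basis)
```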
2.4 Sums and Intersections of Subspaces

Definition 2.21. Let (V, F) be a vector space and let R, S ⊆ V. The sum and intersection of R and S are defined respectively by:

1. R + S = {r + s : r ∈ R, s ∈ S}.

2. R ∩ S = {v : v ∈ R and v ∈ S}.

Theorem 2.22. Let (V, F) be a vector space and let R, S ⊆ V. Then

1. R + S ⊆ V (in general, R1 + ... + Rk =: Σ_{i=1}^{k} Ri ⊆ V, for finite k).

2. R ∩ S ⊆ V (in general, ∩_{α∈A} Rα ⊆ V for an arbitrary index set A).

Remark 2.23. The union of two subspaces, R ∪ S, is not necessarily a subspace.

Definition 2.24. T = R ⊕ S is the direct sum of R and S if

1. R ∩ S = 0 (in general, Ri ∩ (Σ_{j≠i} Rj) = 0), and

2. R + S = T (in general, Σ_i Ri = T).

The subspaces R and S are said to be complements of each other in T.
Remark 2.25. The complement of R (or S) is not unique. For example, consider V = R^2 and let R be any line through the origin. Then any other distinct line through the origin is a complement of R. Among all the complements there is a unique one orthogonal to R. We discuss more about orthogonal complements elsewhere in the text.

Theorem 2.26. Suppose T = R ⊕ S. Then

1. every t ∈ T can be written uniquely in the form t = r + s with r ∈ R and s ∈ S, and

2. dim(T) = dim(R) + dim(S).

Proof: To prove the first part, suppose an arbitrary vector t ∈ T can be written in two ways as t = r1 + s1 = r2 + s2, where r1, r2 ∈ R and s1, s2 ∈ S. Then r1 − r2 = s2 − s1. But r1 − r2 ∈ R and s2 − s1 ∈ S. Since R ∩ S = 0, we must have r1 = r2 and s1 = s2, from which uniqueness follows. The statement of the second part is a special case of the next theorem. □

Theorem 2.27. For arbitrary subspaces R, S of a vector space V,

   dim(R + S) = dim(R) + dim(S) − dim(R ∩ S).

Example 2.28. Let (V, F) = (R^{n×n}, R), let R be the set of skew-symmetric matrices in R^{n×n}, and let S be the set of symmetric matrices in R^{n×n}. Then V = R ⊕ S.

Proof: This follows easily from the fact that any A ∈ R^{n×n} can be written in the form

   A = (1/2)(A + A^T) + (1/2)(A − A^T).

The first matrix on the right-hand side above is in S while the second is in R. □

Example 2.29. Let U be the subspace of upper triangular matrices in R^{n×n} and let L be the subspace of lower triangular matrices in R^{n×n}. Then it may be checked that U + L = R^{n×n}, while U ∩ L is the set of diagonal matrices in R^{n×n}. Using the fact that dim{diagonal matrices} = n, one can easily verify the validity of the formula given in Theorem 2.27.
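The splitting used in the proof of Example 2.28 is easy to verify numerically. A sketch using numpy; the matrix is a random illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# R^{n x n} = {skew-symmetric} (+) {symmetric}, via
# A = (A + A^T)/2 + (A - A^T)/2.
S = (A + A.T) / 2    # symmetric part, in S
R = (A - A.T) / 2    # skew-symmetric part, in R
assert np.allclose(S, S.T)
assert np.allclose(R, -R.T)
assert np.allclose(S + R, A)
```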
EXERCISES

1. Consider the vectors v1 = [2 1]^T and v2 = [3 1]^T. Prove that v1 and v2 form a basis for R^2. Find the components of the vector v = [4 1]^T with respect to this basis.

2. Let x1, x2, ..., xk ∈ R^n be nonzero mutually orthogonal vectors. Show that {x1, ..., xk} must be a linearly independent set.

3. Let v1, ..., vn be orthonormal vectors in R^n. Show that Av1, ..., Avn are orthonormal if and only if A ∈ R^{n×n} is orthogonal.

4. Suppose {v1, ..., vk} is a linearly dependent set. Then show that one of the vectors v1, ..., vk must be a linear combination of the others.
5. Let P denote the set of polynomials of degree less than or equal to two of the form p0 + p1x + p2x^2, where p0, p1, p2 ∈ R. Show that P is a vector space over R. Show that the polynomials 1, x, and 2x^2 − 1 are a basis for P. Find the components of the polynomial 2 + 3x + 4x^2 with respect to this basis.
6. Prove Theorem 2.22 (for the case of two subspaces R and S only).
7. Let Pn denote the vector space of polynomials of degree less than or equal to n, and of the form p(x) = p0 + p1x + ... + pnx^n, where the coefficients pi are all real. Let PE denote the subspace of all even polynomials in Pn, i.e., those that satisfy the property p(−x) = p(x). Similarly, let PO denote the subspace of all odd polynomials, i.e., those satisfying p(−x) = −p(x). Show that Pn = PE ⊕ PO.

8. Repeat Example 2.28 using instead the two subspaces T of tridiagonal matrices and U of upper triangular matrices.
Chapter 3

Linear Transformations
3.1 Definition and Examples
We begin with the basic definition of a linear transformation (or linear map, linear function, or linear operator) between two vector spaces.
Definition 3.1. Let (V, F) and (W, F) be vector spaces. Then L : V → W is a linear transformation if and only if

   L(αv1 + βv2) = αLv1 + βLv2 for all α, β ∈ F and for all v1, v2 ∈ V.

The vector space V is called the domain of the transformation L while W, the space into which it maps, is called the codomain.
Example 3.2.

1. Let F = R and take V = W = PC[t0, +∞). Define L : PC[t0, +∞) → PC[t0, +∞) by

   v(t) ↦ w(t) = (Lv)(t) = ∫_{t0}^{t} e^{(t−τ)} v(τ) dτ.
2. Let F = R and take V = W = R^{m×n}. Fix M ∈ R^{m×m}. Define L : R^{m×n} → R^{m×n} by

   X ↦ Y = LX = MX.
3. Let F = R and take V = P^n = {p(x) = a0 + a1x + ... + anx^n : ai ∈ R} and W = P^{n−1}. Define L : V → W by Lp = p', where ' denotes differentiation with respect to x.
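Item 3 can be sketched in code by representing a polynomial by its coefficient list; the helper `deriv` below is an illustrative stand-in for L, not from the text:

```python
# Represent p(x) = a0 + a1*x + ... + an*x^n by its coefficient list [a0, ..., an].
def deriv(p):
    return [i * a for i, a in enumerate(p)][1:]   # drop the constant term

p = [1.0, 2.0, 3.0]          # 1 + 2x + 3x^2
q = [0.0, 0.0, 0.0, 4.0]     # 4x^3

# L(p) = p': the derivative of 1 + 2x + 3x^2 is 2 + 6x.
assert deriv(p) == [2.0, 6.0]

# Linearity: L(alpha*p + beta*q) = alpha*L(p) + beta*L(q).
alpha, beta = 2.0, -1.0
combo = [alpha * a + beta * b for a, b in zip(p + [0.0], q)]
lhs = deriv(combo)
rhs = [alpha * a + beta * b for a, b in zip(deriv(p) + [0.0], deriv(q))]
assert lhs == rhs
```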
3.2 Matrix Representation of Linear Transformations

Linear transformations between vector spaces with specific bases can be represented conveniently in matrix form. Specifically, suppose L : (V, F) → (W, F) is linear and further suppose that {vi, i ∈ n} and {wj, j ∈ m} are bases for V and W, respectively. Then the ith column of A = Mat L (the matrix representation of L with respect to the given bases for V and W) is the representation of Lvi with respect to {wj, j ∈ m}. In other words,

   A = [a1, ..., an] ∈ R^{m×n}

represents L since

   Lvi = a1i w1 + ... + ami wm = W ai,

where W = [w1, ..., wm] and ai is the ith column of A. Note that A = Mat L depends on the particular bases for V and W. This could be reflected by subscripts, say, in the notation, but this is usually not done.

The action of L on an arbitrary vector v ∈ V is uniquely determined (by linearity) by its action on a basis. Thus, if v = ξ1v1 + ... + ξnvn = Vx (where v, and hence x, is arbitrary), then

   LVx = Lv = ξ1Lv1 + ... + ξnLvn = ξ1Wa1 + ... + ξnWan = WAx.

Thus, LV = WA since x was arbitrary.

When V = R^n, W = R^m and {vi, i ∈ n}, {wj, j ∈ m} are the usual (natural) bases, the equation LV = WA becomes simply L = A. We thus commonly identify A as a linear transformation with its matrix representation. Thinking of A both as a matrix and as a linear transformation from R^n to R^m usually causes no confusion. Change of basis then corresponds naturally to appropriate matrix multiplication.

3.3 Composition of Transformations

Consider three vector spaces U, V, and W and transformations B from U to V and A from V to W. Then we can define a new transformation C as follows:

   U --B--> V --A--> W,   C = AB : U → W.

The above diagram illustrates the composition of transformations C = AB. Note that in most texts the arrows above are reversed; however, it might be useful to prefer the former since the transformations A and B appear in the same order in both the diagram and the equation. If dim U = p, dim V = n, and dim W = m, and if we associate matrices with the transformations in the usual way, then composition of transformations corresponds to standard matrix multiplication. That is, we have C = AB with C ∈ R^{m×p}, A ∈ R^{m×n}, and B ∈ R^{n×p}. The above is sometimes expressed componentwise by the formula

   cij = Σ_{k=1}^{n} aik bkj.

Two Special Cases:

Inner Product: Let x, y ∈ R^n. Then their inner product is the scalar

   x^T y = Σ_{i=1}^{n} xi yi.

Outer Product: Let x ∈ R^m, y ∈ R^n. Then their outer product is the m × n matrix

   xy^T = [xi yj].

Note that any rank-one matrix A ∈ R^{m×n} can be written in the form A = xy^T above (or xy^H if A ∈ C^{m×n}). A rank-one symmetric matrix can be written in the form xx^T (or xx^H).
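Sections 3.2 and 3.3 can be illustrated together: build Mat(L) column by column from the action of L on a basis, then check that composition corresponds to matrix multiplication. A sketch using numpy; the matrix M and the dimensions are illustrative choices, not from the text:

```python
import numpy as np

# The i-th column of Mat(L) is the representation of L(e_i) in the codomain
# basis; with natural bases on both sides, Mat(L) is recovered exactly.
M = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [4., 0., 1.]])
L = lambda v: M @ v                         # a linear map R^3 -> R^3

A = np.column_stack([L(e) for e in np.eye(3).T])
assert np.allclose(A, M)                    # Mat(L) w.r.t. natural bases

# Composition corresponds to matrix multiplication: C = AB.
rng = np.random.default_rng(0)
B = rng.standard_normal((3, 2))             # B : R^2 -> R^3, then L : R^3 -> R^3
u = rng.standard_normal(2)
assert np.allclose((A @ B) @ u, L(B @ u))   # (AB)u = A(Bu)

# Outer product xy^T has rank one.
x, y = rng.standard_normal(4), rng.standard_normal(3)
assert np.linalg.matrix_rank(np.outer(x, y)) == 1
```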
3.4 Structure of Linear Transformations

Let A : V → W be a linear transformation.

Definition 3.3. The range of A, denoted R(A), is the set {w ∈ W : w = Av for some v ∈ V}. Equivalently, R(A) = {Av : v ∈ V}. The range of A is also known as the image of A and denoted Im(A).

The nullspace of A, denoted N(A), is the set {v ∈ V : Av = 0}. The nullspace of A is also known as the kernel of A and denoted Ker(A).

Theorem 3.4. Let A : V → W be a linear transformation. Then

1. R(A) ⊆ W.

2. N(A) ⊆ V.

Note that N(A) and R(A) are, in general, subspaces of different spaces.

Theorem 3.5. Let A ∈ R^{m×n}. If A is written in terms of its columns as A = [a1, ..., an], then

   R(A) = Sp{a1, ..., an}.

Proof: The proof of this theorem is easy, essentially following immediately from the definition. □
Remark 3.6. Note that in Theorem 3.5 and throughout the text, the same symbol (A) is used to denote both a linear transformation and its matrix representation with respect to the usual (natural) bases. See also the last paragraph of Section 3.2.

Definition 3.7. Let {v1, ..., vk} be a set of nonzero vectors vi ∈ R^n. The set is said to be orthogonal if vi^T vj = 0 for i ≠ j and orthonormal if vi^T vj = δij, where δij is the Kronecker delta defined by

   δij = { 1 if i = j; 0 if i ≠ j }.

Example 3.8.

1. {[1; 1], [1; −1]} is an orthogonal set.

2. {[1/√2; 1/√2], [1/√2; −1/√2]} is an orthonormal set.

3. If {v1, ..., vk} with vi ∈ R^n is an orthogonal set, then {v1/√(v1^T v1), ..., vk/√(vk^T vk)} is an orthonormal set.
Definition 3.9. Let S ⊆ R^n. Then the orthogonal complement of S is defined as the set

   S⊥ = {v ∈ R^n : v^T s = 0 for all s ∈ S}.

Example 3.10. Let S = Sp{[3; 5; 7], [−4; 1; 1]} ⊆ R^3. Working from the definition, the computation involved in finding S⊥ is simply to find all nontrivial (i.e., nonzero) solutions of the system of equations

   3x1 + 5x2 + 7x3 = 0,
   −4x1 + x2 + x3 = 0.

Then it can be shown that S⊥ = Sp{[−2; −31; 23]}. Note that there is nothing special about the two vectors in the basis defining S being orthogonal. Any set of vectors will do, including dependent spanning vectors (which would, of course, then give rise to redundant equations).

Theorem 3.11. Let R, S ⊆ R^n. Then

1. S⊥ ⊆ R^n.

2. S ⊕ S⊥ = R^n.

3. (S⊥)⊥ = S.

4. R ⊆ S if and only if S⊥ ⊆ R⊥.

5. (R + S)⊥ = R⊥ ∩ S⊥.

6. (R ∩ S)⊥ = R⊥ + S⊥.

Proof: We prove and discuss only item 2 here. The proofs of the other results are left as exercises. Let {v1, ..., vk} be an orthonormal basis for S and let x ∈ R^n be an arbitrary vector. Set

   x1 = Σ_{i=1}^{k} (x^T vi) vi,   x2 = x − x1.
Then x1 ∈ S and, since

   x2^T vj = x^T vj − x1^T vj = x^T vj − x^T vj = 0,

x2 is orthogonal to v1, ..., vk and hence to any linear combination of these vectors. In other words, x2 is orthogonal to any vector in S. We have thus shown that S + S⊥ = R^n. We also have that S ∩ S⊥ = 0 since the only vector s ∈ S orthogonal to everything in S (i.e., including itself) is 0. □

It is also easy to see directly that, when we have such direct sum decompositions, we can write vectors in a unique way with respect to the corresponding subspaces. Suppose, for example, that x = x1 + x2 = x'1 + x'2, where x1, x'1 ∈ S and x2, x'2 ∈ S⊥. Then (x'1 − x1)^T (x'2 − x2) = 0 by definition of S⊥. But then (x'1 − x1)^T (x'1 − x1) = 0 since x'2 − x2 = −(x'1 − x1) (which follows by rearranging the equation x1 + x2 = x'1 + x'2). Thus, x1 = x'1 and x2 = x'2.

Theorem 3.12. Let A : R^n → R^m. Then

1. N(A)⊥ = R(A^T). (Note: This holds only for finite-dimensional vector spaces.)

2. R(A)⊥ = N(A^T). (Note: This also holds for infinite-dimensional vector spaces.)

Proof: To prove the first part, take an arbitrary x ∈ N(A). Then Ax = 0 and this is equivalent to y^T Ax = 0 for all y. But y^T Ax = (A^T y)^T x. Thus, Ax = 0 if and only if x is orthogonal to all vectors of the form A^T y, i.e., x ∈ R(A^T)⊥. Since x was arbitrary, we have established that N(A)⊥ = R(A^T).

The proof of the second part is similar and is left as an exercise. □
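The proof of Theorem 3.11, item 2, is constructive and easy to replay numerically. A sketch using numpy; the orthonormal basis and the vector x are illustrative choices:

```python
import numpy as np

# Given an orthonormal basis {v1, v2} of S, split any x into
# x1 = sum_i (x^T v_i) v_i in S and x2 = x - x1 in S-perp.
v1 = np.array([1., 1., 0.]) / np.sqrt(2)
v2 = np.array([0., 0., 1.])
x = np.array([3., -1., 2.])

x1 = (x @ v1) * v1 + (x @ v2) * v2   # component in S
x2 = x - x1                          # component in S-perp

assert np.allclose(x1 + x2, x)
assert np.isclose(x2 @ v1, 0) and np.isclose(x2 @ v2, 0)
```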
Definition 3.13. Let A : R^n → R^m. Then {v ∈ R^n : Av = 0} is sometimes called the right nullspace of A. Similarly, {w ∈ R^m : w^T A = 0} is called the left nullspace of A. Clearly, the right nullspace is N(A) while the left nullspace is N(A^T).

Theorem 3.12 and part 2 of Theorem 3.11 can be combined to give two very fundamental and useful decompositions of vectors in the domain and codomain of a linear transformation A. See also Theorem 2.26.

Theorem 3.14 (Decomposition Theorem). Let A : R^n → R^m. Then

1. every vector v in the domain space R^n can be written in a unique way as v = x + y, where x ∈ N(A) and y ∈ N(A)⊥ = R(A^T) (i.e., R^n = N(A) ⊕ R(A^T));

2. every vector w in the codomain space R^m can be written in a unique way as w = x + y, where x ∈ R(A) and y ∈ R(A)⊥ = N(A^T) (i.e., R^m = R(A) ⊕ N(A^T)).

This key theorem becomes very easy to remember by carefully studying and understanding Figure 3.1 in the next section.

3.5 Four Fundamental Subspaces

Consider a general matrix A ∈ R^{m×n} of rank r. When thought of as a linear transformation from R^n to R^m, many properties of A can be developed in terms of the four fundamental subspaces R(A), R(A)⊥, N(A), and N(A)⊥.
Then {v E R" : Av = 0} is sometimes called the right nullspace of A. Ax = Proof: To prove the first part.. and x2 = x~. . = x'1+ x'2. x~ X2 = (x. Let A : Rn + IRm. Then Ax = 0 and this is an and equivalent to yT Ax = 0 for all v. Similarly. (Note: This also holds for infinitedimensional vector spaces. – x1) = 0 since 0 by definition of S.l = 0 since the only vector s E S orthogonal to S1 = IRn.5 3.13. including itself) is 0. Thus. where XI. Then Theorem 3.) N(A)1" spaces. We have thus shown that vectors. we see that x2 is orthogonal to v1.22 22 Chapter 3.12 and part 2 of Theorem 3. the right nullspace is N(A) while the left nullspace is J\f(AT). X2 is orthogonal to any vector in S.e. y. x 1 E Sand x2.X2) 0 since (x'1 — x1) (x' 2 — x2) = 0 by definition of ST. ft(Ar) (i. .e. Then X2 = x. x. When thought of as a linear transformation from IR n to Rm.e.
Figure 3.1 makes many key properties seem almost obvious, and we return to this figure frequently both in the context of linear transformations and in illustrating concepts such as controllability and observability.

Definition 3.15. Let V and W be vector spaces and let A : V → W be a linear transformation.

1. A is onto (also called epic or surjective) if R(A) = W.

2. A is one-to-one or 1-1 (also called monic or injective) if N(A) = 0. Two equivalent characterizations of A being 1-1 that are often easier to verify in practice are the following:

   (a) Av1 = Av2 implies v1 = v2;

   (b) v1 ≠ v2 implies Av1 ≠ Av2.

Definition 3.16. Let A : R^n → R^m. Then rank(A) = dim R(A). This is sometimes called the column rank of A (maximum number of independent columns).
Clearly T is 11 (since A/"(T) = 0). Theorem 3. .. we include here a few miscellaneous results about ranks of sums completeness. Part 4 of Theorem 3. the subspaces themselves are not necessarily in the same vector space. and is defined as dimN(A).. if {ui. by definition there is a vector x E ]Rn such that Ax = w. 1. Tvrr]} is a basis for R(A). The basic results are contained in the following easily proved following theorem.19 suggests looking at the general problem of the four fundamental Part 4 of Theorem 3. Then N(T) = To w E 7£(A).11 and 3. this theorem is sometimes colloquially stated "row rank of A = column N(A)L = R(A A/^A) " = 7l(A ).17. then {TVI. denoted nullity(A) or corank(A)..17 we see immediately that Proof: From Theorems 3. of A. . rank(A) + B) :s rank(A) + rank(B). r*i *i E N(A)L. + nullity(B). v r } is a basis for N(A)L. . nullity(B) :s nullity(AB) :s nullity(A) 4. Write x = Xl + X2. rank(B)}.17 we see immediately that n = dimN(A) = dimN(A) + dimN(A)L + dim R(A) . following follows we apply this and several previous results. Proof: From Theorems 3. .andx22 e A/"(A)." 0 of D The following corollary is immediate.. Tv abasis 7?. 3. The dual notion to rank is the nullity R(AT) of independent rows). where Ax — w..19 suggests looking atthe general problem of the four fundamental subspaces of matrix products. Let A.18. O:s rank(A 2.11 and 3. + rank(B)  n :s rank(AB) :s min{rank(A).(A) = dimA/^A^ 1 if that if {VI.. dimension of the domain of A. of Corollary 3. Then Ajti = W = TXI since Xl e A/^A). B E R" xn . Linear Transformations dim 7£(A r ) (maximum number of independent rows). the following string of equalities follows easily: "column rank of A" = rank(A) = dim R(A) = dimN(A)L1 = dim R(AT) = rank(AT)) = A" rank(A) = dim7e(A) = dim A/^A) = dim7l(AT) = rank(A r = "column "row rank of A. Then dimN(A) + dim R(A) = n. . of A. and products of matrices. LinearTransformations Chapter3.(A). 
u.") Proof: Proof: Define a linear transformation T : N(A)L ~ R(A) by J\f(A)~L —>• 7£(A) by Tv = Av for all v E N(A)L. dimensions. and is defined as dim A/"(A). dimA/'(A) ± (Note: 1 T T ). Let A : R" ~ Rm. . . Let A : Rn > Rm. We thus have that dim R(A) = dimN(A)L since it is easily shown T dim7?. .") of A. . dimA/"(A) + dimft(A) = dimension of the domain of A. it is a statement about equality of dimensions. rank(AB) = rank(BA) = rank(A) and N(BA) = N(A).18. sometimes denoted nullity(A) or corank(A). take any W e R(A). where n is the ]Rn > ]Rm.. Like the theorem. . Theorem 3.. if B is nonsingular. {Tv\. iv} abasis forA/'CA) .19. 1 1 Xl E A/^A) .19. . shows that T is onto. To see that T is also onto. the subspaces themselves are not necessarily in the same vector space. e ]Rnxn. Then dim K(A) = dimNCA)L. (Note: Since 3. The last equality AXI x\ e N(A)L and jc E N(A). colloquially of = rank of A. 3. 0 For completeness. x x e R" x\ X2. R(A) : ]Rn ~ ]Rm. Finally.24 24 Chapter 3. Then 3.17.
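The rank-nullity identity and the rank inequalities for sums and products can be spot-checked numerically. A hedged sketch, assuming NumPy and using seeded random low-rank products as test matrices (these choices are illustrations, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random low-rank matrices: a product of n x k and k x n Gaussian factors
# has rank k almost surely.
n = 6
A = rng.standard_normal((n, 3)) @ rng.standard_normal((3, n))  # rank 3
B = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))  # rank 2

rank = np.linalg.matrix_rank
rA, rB, rAB, rApB = rank(A), rank(B), rank(A @ B), rank(A + B)
nullity_A = n - rA                      # dim N(A) + dim R(A) = n

checks = [
    0 <= rApB <= rA + rB,               # bound on rank(A + B)
    rA + rB - n <= rAB <= min(rA, rB),  # Sylvester-type bounds on rank(AB)
]
```

The checks mirror parts 1 and 2 of the rank theorem; the nullity computation mirrors the corollary.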
Theorem 3.20. Let A ∈ R^{m×n}, B ∈ R^{n×p}. Then

1. R(AB) ⊆ R(A).
2. N(AB) ⊇ N(B).
3. R((AB)^T) ⊆ R(B^T).
4. N((AB)^T) ⊇ N(A^T).

The next theorem is closely related to Theorem 3.20 and is also easily proved. It is extremely useful in text that follows, especially when dealing with pseudoinverses and linear least squares problems.

Theorem 3.21. Let A ∈ R^{m×n}. Then

1. R(A) = R(AA^T).
2. N(A) = N(A^T A).
3. R(A^T) = R(A^T A).
4. N(A^T) = N(AA^T).

We now characterize 1-1 and onto transformations and provide characterizations in terms of rank and invertibility.

Definition 3.22. A : V → W is invertible (or bijective) if and only if it is 1-1 and onto. Note that if A is invertible, then dim V = dim W. Also, A : R^n → R^n is invertible or nonsingular if and only if rank(A) = n.

Theorem 3.23. Let A : R^n → R^m. Then

1. A is onto if and only if rank(A) = m (A has linearly independent rows or is said to have full row rank; equivalently, AA^T is nonsingular).
2. A is 1-1 if and only if rank(A) = n (A has linearly independent columns or is said to have full column rank; equivalently, A^T A is nonsingular).

Proof: Proof of part 1: If A is onto, dim R(A) = m = rank(A). Conversely, let y ∈ R^m be arbitrary. Let x = A^T (AA^T)^{-1} y ∈ R^n. Then y = Ax, so A is onto.

Proof of part 2: If A is 1-1, then N(A) = 0, which implies that dim N(A)^⊥ = n = dim R(A^T), and hence dim R(A) = n by Theorem 3.17. Conversely, suppose Ax_1 = Ax_2. Then A^T A x_1 = A^T A x_2, which implies x_1 = x_2 since A^T A is invertible. Thus, A is 1-1. □

Note that in the special case when A ∈ R_n^{n×n}, the transformations A, A^T, and A^{-1} are all 1-1 and onto between the two spaces N(A)^⊥ and R(A). The transformations A^T and A^{-1} have the same domain and range but are in general different maps unless A is orthogonal. Similar remarks apply to A and A^{-T}.
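The proof of the onto characterization is constructive: for a full-row-rank A, the vector x = A^T (A A^T)^{-1} y solves Ax = y. A minimal numerical sketch of that construction, assuming NumPy (the particular A and y below are illustrative choices):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])          # 2 x 3, rank 2, hence onto R^2
y = np.array([3.0, -1.0])

# x = A^T (A A^T)^{-1} y, computed via a linear solve rather than an
# explicit inverse.
x = A.T @ np.linalg.solve(A @ A.T, y)
residual = np.linalg.norm(A @ x - y)
```

Because A has full row rank, A A^T is nonsingular and the solve succeeds for every right-hand side y.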
If a linear transformation is not invertible, it may still be right or left invertible. Definitions of these concepts are followed by a theorem characterizing left and right invertible transformations.

Definition 3.24. Let A : V → W. Then

1. A is said to be right invertible if there exists a right inverse transformation A^{-R} : W → V such that AA^{-R} = I_W, where I_W denotes the identity transformation on W.
2. A is said to be left invertible if there exists a left inverse transformation A^{-L} : W → V such that A^{-L}A = I_V, where I_V denotes the identity transformation on V.

Theorem 3.25. Let A : V → W. Then

1. A is right invertible if and only if it is onto.
2. A is left invertible if and only if it is 1-1.

Moreover, A is invertible if and only if it is both right and left invertible, i.e., both 1-1 and onto, in which case A^{-1} = A^{-R} = A^{-L}.

Note: From Theorem 3.23 we see that if A : R^n → R^m is onto, then a right inverse is given by A^{-R} = A^T (AA^T)^{-1}. Similarly, if A is 1-1, then a left inverse is given by A^{-L} = (A^T A)^{-1} A^T.

Theorem 3.26. Let A : V → V.

1. If there exists a unique right inverse A^{-R} such that AA^{-R} = I, then A is invertible.
2. If there exists a unique left inverse A^{-L} such that A^{-L}A = I, then A is invertible.

Proof: We prove the first part and leave the proof of the second to the reader. Notice the following:

A(A^{-R} + A^{-R}A − I) = AA^{-R} + AA^{-R}A − A
                        = I + IA − A    since AA^{-R} = I
                        = I.

Thus, (A^{-R} + A^{-R}A − I) must be a right inverse and, therefore, by uniqueness it must be the case that A^{-R} + A^{-R}A − I = A^{-R}. But this implies that A^{-R}A = I, i.e., that A^{-R} is also a left inverse. It then follows from Theorem 3.25 that A is invertible. □

Example 3.27. Let A = [1 2] : R^2 → R^1. Then A is onto. (Proof: Take any α ∈ R^1; one can always find v ∈ R^2 such that [1 2][v_1; v_2] = α.) Obviously A has full row rank (= 1) and A^{-R} = [−1; 1] is a right inverse. Also, it is clear that there are infinitely many right inverses for A. In Chapter 6 we characterize all right inverses of a matrix by characterizing all solutions of the linear matrix equation AR = I.
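Both the hand-picked inverses above and the formula-based choices A^T (A A^T)^{-1} and (A^T A)^{-1} A^T can be verified numerically, which also makes the nonuniqueness concrete. A sketch assuming NumPy:

```python
import numpy as np

# Right inverses of the onto map A = [1 2].
A = np.array([[1.0, 2.0]])
R1 = np.array([[-1.0], [1.0]])            # hand-picked: A @ R1 = [[1]]
R2 = A.T @ np.linalg.inv(A @ A.T)         # A^T (A A^T)^{-1}

# Left inverses of the 1-1 map B = [1; 2].
B = np.array([[1.0], [2.0]])
L1 = np.array([[3.0, -1.0]])              # hand-picked: L1 @ B = [[1]]
L2 = np.linalg.inv(B.T @ B) @ B.T         # (B^T B)^{-1} B^T

ok_right = np.allclose(A @ R1, np.eye(1)) and np.allclose(A @ R2, np.eye(1))
ok_left = np.allclose(L1 @ B, np.eye(1)) and np.allclose(L2 @ B, np.eye(1))
distinct = (not np.allclose(R1, R2)) and (not np.allclose(L1, L2))
```

That R1 ≠ R2 and L1 ≠ L2 while all four are valid one-sided inverses illustrates the infinitely many right (resp. left) inverses of an onto (resp. 1-1) map.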
Example 3.28. Let A = [1; 2] : R^1 → R^2. Then A is 1-1. (Proof: The only solution to 0 = Av = [1; 2]v is v = 0, whence N(A) = 0, so A is 1-1.) It is now obvious that A has full column rank (= 1) and A^{-L} = [3 −1] is a left inverse. Again, it is clear that there are infinitely many left inverses for A. In Chapter 6 we characterize all left inverses of a matrix by characterizing all solutions of the linear matrix equation LA = I.

Example 3.29. The matrix

A = [ 1  1  0
      2  1  1
      3  1  2 ],

when considered as a linear transformation on R^3, is neither 1-1 nor onto. We give below bases for its four fundamental subspaces:

R(A) = span{ [1; 2; 3], [1; 1; 1] },      N(A^T) = span{ [1; −2; 1] },
R(A^T) = span{ [1; 1; 0], [2; 1; 1] },    N(A) = span{ [1; −1; −1] }.

EXERCISES

1. Let A ∈ R^{2×3} and consider A as a linear transformation mapping R^3 to R^2. Find the matrix representation of A with respect to bases {v_1, v_2, v_3} of R^3 and {w_1, w_2} of R^2.

2. Consider the vector space R^{n×n} over R, let S denote the subspace of symmetric matrices, and let R denote the subspace of skew-symmetric matrices. For matrices X, Y ∈ R^{n×n}, define their inner product by <X, Y> = Tr(X^T Y). Show that, with respect to this inner product, R = S^⊥.

3. Consider the differentiation operator L defined in Example 3.4. Is L 1-1? Is L onto?

4. Prove Theorem 3.4.
5. Determine bases for the four fundamental subspaces of the matrix

   A = [ −2  5  5  ·
          ·  ·  3  8 ].

6. Prove Theorem 3.11.

7. Prove Theorem 3.12.

8. Suppose A ∈ R^{m×n} has a left inverse. Show that A^T has a right inverse.

9. Let A = [0 1; 0 0]. Determine N(A) and R(A). Are they equal? Is this true in general? If this is true in general, prove it; if not, provide a counterexample.

10. Suppose A ∈ R_8^{9×48}. How many linearly independent solutions can be found to the homogeneous linear system Ax = 0?

11. Modify Figure 3.1 to illustrate the four fundamental subspaces associated with A^T ∈ R^{n×m} thought of as a transformation from R^m to R^n.
Chapter 4

Introduction to the Moore-Penrose Pseudoinverse

In this chapter we give a brief introduction to the Moore-Penrose pseudoinverse, a generalization of the inverse of a matrix. The Moore-Penrose pseudoinverse is defined for any matrix and, as is shown in the following text, brings great notational and conceptual clarity to the study of solutions to arbitrary systems of linear equations and linear least squares problems.

4.1 Definitions and Characterizations

Consider a linear transformation A : X → Y, where X and Y are arbitrary finite-dimensional vector spaces. Define a transformation T : N(A)^⊥ → R(A) by

Tx = Ax for all x ∈ N(A)^⊥.

Then, as noted in the proof of Theorem 3.17, T is bijective (1-1 and onto), and hence we can define a unique inverse transformation T^{-1} : R(A) → N(A)^⊥. This transformation can be used to give our first definition of A^+, the Moore-Penrose pseudoinverse of A. Unfortunately, the definition neither provides nor suggests a good computational strategy for determining A^+.

Definition 4.1. With A and T as defined above, define a transformation A^+ : Y → X by

A^+ y = T^{-1} y_1, where y = y_1 + y_2 with y_1 ∈ R(A) and y_2 ∈ R(A)^⊥.

Then A^+ is the Moore-Penrose pseudoinverse of A.

Although X and Y were arbitrary vector spaces above, let us henceforth consider the case X = R^n and Y = R^m. We have thus defined A^+ for all A ∈ R_r^{m×n}.

A purely algebraic characterization of A^+ is given in the next theorem, which was proved by Penrose in 1955; see [22].
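Numerically, the Moore-Penrose pseudoinverse is available as numpy.linalg.pinv (an assumption of this illustration; the text itself defines A^+ abstractly). The sketch below shows the geometry of the definition: A A^+ is the orthogonal projection onto R(A), and A^+ inverts A on R(A):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [2.0, 2.0]])               # a rank-1 matrix
Ap = np.linalg.pinv(A)

# A A^+ is the orthogonal projector onto R(A): symmetric and idempotent.
P = A @ Ap
proj_idempotent = np.allclose(P @ P, P)
proj_symmetric = np.allclose(P, P.T)

# On R(A), A^+ acts as the inverse of A restricted to N(A)^perp:
y = A @ np.array([1.0, 0.0])             # a vector in R(A)
recovers = np.allclose(A @ (Ap @ y), y)
```

For y outside R(A), A^+ first projects y onto R(A) and then inverts, which is exactly the two-step recipe of the transformation-based definition.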
Theorem 4.2. Let A ∈ R_r^{m×n}. Then G = A^+ if and only if

(P1) AGA = A;
(P2) GAG = G;
(P3) (AG)^T = AG;
(P4) (GA)^T = GA.

Furthermore, A^+ always exists and is unique.

Note that the inverse of a nonsingular matrix satisfies all four Penrose properties. Also, a right or left inverse satisfies no fewer than three of the four properties. Unfortunately, as with Definition 4.1, neither the statement of Theorem 4.2 nor its proof suggests a computational algorithm. However, the Penrose properties do offer the great virtue of providing a checkable criterion in the following sense. Given a matrix G that is a candidate for being the pseudoinverse of A, one need simply verify the four Penrose conditions (P1)-(P4). If G satisfies all four, then by uniqueness, it must be A^+. Such a verification is often relatively straightforward.

Example 4.3. Consider A = [1; 2]. Verify directly that A^+ = [1/5 2/5] satisfies (P1)-(P4). Note that other left inverses (for example, A^{-L} = [3 −1]) satisfy properties (P1), (P2), and (P4) but not (P3).

Still another characterization of A^+ is given in the following theorem, whose proof can be found in [1, p. 19]. While not generally suitable for computer implementation, this characterization can be useful for hand calculation of small examples.

Theorem 4.4. Let A ∈ R_r^{m×n}. Then

A^+ = lim_{δ→0} (A^T A + δ^2 I)^{-1} A^T       (4.1)
    = lim_{δ→0} A^T (A A^T + δ^2 I)^{-1}.      (4.2)

4.2 Examples

Each of the following can be derived or verified by using the above definitions or characterizations.

Example 4.5. A^+ = (A^T A)^{-1} A^T if A is 1-1 (independent columns) (A is left invertible).

Example 4.6. A^+ = A^T (A A^T)^{-1} if A is onto (independent rows) (A is right invertible).

Example 4.7. For any scalar a,

a^+ = { 1/a  if a ≠ 0,
        0    if a = 0.
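Both the Penrose conditions and the limit characterization are directly checkable in floating point. A hedged sketch, assuming NumPy; the finite value of δ and the comparison tolerance are assumptions of this illustration:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
G = np.linalg.pinv(A)

p1 = np.allclose(A @ G @ A, A)            # (P1) AGA = A
p2 = np.allclose(G @ A @ G, G)            # (P2) GAG = G
p3 = np.allclose((A @ G).T, A @ G)        # (P3) (AG)^T = AG
p4 = np.allclose((G @ A).T, G @ A)        # (P4) (GA)^T = GA

# Limit characterization: (A^T A + d^2 I)^{-1} A^T -> A^+ as d -> 0.
d = 1e-6
G_limit = np.linalg.solve(A.T @ A + d**2 * np.eye(2), A.T)
limit_close = np.allclose(G_limit, G, atol=1e-8)
```

For this full-column-rank A the regularized expression agrees with (A^T A)^{-1} A^T up to an error of order δ^2, so a modest δ already matches A^+ to tight tolerance.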
Example 4.8. For any vector v ∈ R^n,

v^+ = (v^T v)^+ v^T = { v^T / (v^T v)  if v ≠ 0,
                        0^T            if v = 0.

Example 4.9.

[ 1  0 ]^+   [ 1  0 ]
[ 0  0 ]   = [ 0  0 ].

Example 4.10.

[ 1  1 ]^+   [ 1/4  1/4 ]
[ 1  1 ]   = [ 1/4  1/4 ].

4.3 Properties and Applications

This section presents some miscellaneous useful results on pseudoinverses. Many of these are used in the text that follows.

Theorem 4.11. For all A ∈ R^{m×n},

1. A^+ = (A^T A)^+ A^T = A^T (A A^T)^+.
2. (A^T)^+ = (A^+)^T.

Proof: Both results can be proved using the limit characterization of Theorem 4.4. The proof of the first result is not particularly easy and does not even have the virtue of being especially illuminating. The interested reader can consult the proof in [1, p. 27]. The proof of the second result (which can also be proved easily by verifying the four Penrose conditions) is as follows:

(A^T)^+ = lim_{δ→0} (A A^T + δ^2 I)^{-1} A
        = lim_{δ→0} [A^T (A A^T + δ^2 I)^{-1}]^T
        = [lim_{δ→0} A^T (A A^T + δ^2 I)^{-1}]^T
        = (A^+)^T. □

Theorem 4.12. Let A ∈ R^{m×n} and suppose U ∈ R^{m×m}, V ∈ R^{n×n} are orthogonal (M is orthogonal if M^T = M^{-1}). Then

(U A V)^+ = V^T A^+ U^T.

Proof: For the proof, simply verify that the expression above does indeed satisfy each of the four Penrose conditions. □

Theorem 4.13. Let S ∈ R^{n×n} be symmetric with U^T S U = D, where U is orthogonal and D is diagonal. Then S^+ = U D^+ U^T, where D^+ is again a diagonal matrix whose diagonal elements are determined according to Example 4.7.

Note that by combining Theorems 4.12 and 4.13 we can, in theory at least, compute the Moore-Penrose pseudoinverse of any matrix (since A A^T and A^T A are symmetric). This turns out to be a poor approach in finite-precision arithmetic, however (see, e.g., [7], [11], [23]), and better methods are suggested in text that follows.

Theorem 4.11 is suggestive of a "reverse-order" property for pseudoinverses of products of matrices such as exists for inverses of products. Unfortunately, in general,

(AB)^+ ≠ B^+ A^+.

As an example, consider A = [0 1] and B = [1; 1]. Then

(AB)^+ = 1^+ = 1,

while

B^+ A^+ = [1/2 1/2] [0; 1] = 1/2.

However, necessary and sufficient conditions under which the reverse-order property does hold are known, and we quote a couple of moderately useful results for reference.

Theorem 4.14. (AB)^+ = B^+ A^+ if and only if

1. R(B B^T A^T) ⊆ R(A^T)

and

2. R(A^T A B) ⊆ R(B).

Proof: For the proof, see [9]. □

Theorem 4.15. (AB)^+ = B_1^+ A_1^+, where B_1 = A^+ A B and A_1 = A B_1 B_1^+.

Proof: For the proof, see [5]. □

Theorem 4.16. If A ∈ R_r^{n×r} and B ∈ R_r^{r×m}, then (AB)^+ = B^+ A^+.

Proof: Since A ∈ R_r^{n×r}, A^+ = (A^T A)^{-1} A^T, whence A^+ A = I_r. Similarly, since B ∈ R_r^{r×m}, we have B^+ = B^T (B B^T)^{-1}, whence B B^+ = I_r. The result then follows by taking B_1 = B, A_1 = A in Theorem 4.15. □

The following theorem gives some additional useful properties of pseudoinverses.

Theorem 4.17. For all A ∈ R^{m×n},

1. (A^+)^+ = A.
2. (A^T A)^+ = A^+ (A^T)^+.
3. (A A^T)^+ = (A^T)^+ A^+.
4. R(A^+) = R(A^T) = R(A^+ A) = R(A^T A).
5. N(A^+) = N(A A^+) = N((A A^T)^+) = N(A A^T) = N(A^T).
6. If A is normal, then A^k A^+ = A^+ A^k and (A^k)^+ = (A^+)^k for all integers k > 0.

Note: Recall that A ∈ R^{n×n} is normal if A A^T = A^T A. For example, if A is symmetric, skew-symmetric, or orthogonal, then it is normal. However, a matrix can be none of the preceding but still be normal, such as

A = [ a  b
     −b  a ]

for scalars a, b ∈ R.

The next theorem is fundamental to facilitating a compact and unifying approach to studying the existence of solutions of (matrix) linear equations and linear least squares problems.

Theorem 4.18. Suppose A ∈ R^{n×p}, B ∈ R^{n×m}. Then R(B) ⊆ R(A) if and only if A A^+ B = B.

Proof: Suppose R(B) ⊆ R(A) and take arbitrary x ∈ R^m. Then Bx ∈ R(B) ⊆ R(A), so there exists a vector y ∈ R^p such that Ay = Bx. Then we have

Bx = Ay = A A^+ A y = A A^+ B x,

where one of the Penrose properties is used above. Since x was arbitrary, we have shown that B = A A^+ B.

To prove the converse, assume that A A^+ B = B and take arbitrary y ∈ R(B). Then there exists a vector x ∈ R^m such that Bx = y, whereupon

y = Bx = A A^+ B x ∈ R(A). □

EXERCISES

1. Use Theorem 4.4 to compute the pseudoinverse of

   [ 1  2
     1  2 ].

2. If x, y ∈ R^n, show that (x y^T)^+ = (x^T x)^+ (y^T y)^+ y x^T.

3. For A ∈ R^{m×n}, prove that R(A) = R(A A^T) using only definitions and elementary properties of the Moore-Penrose pseudoinverse.

4. For A ∈ R^{m×n}, prove that R(A^+) = R(A^T).

5. For A ∈ R^{p×n} and B ∈ R^{m×n}, show that N(A) ⊆ N(B) if and only if B A^+ A = B.

6. Let A ∈ R^{n×n}, B ∈ R^{n×m}, and D ∈ R^{m×m} and suppose further that D is nonsingular.

   (a) Prove or disprove that

       [ A  A B ]^+   [ A^+  −A^+ A B D^{-1} ]
       [ 0   D  ]   = [ 0     D^{-1}         ].

   (b) Prove or disprove that

       [ A  B ]^+   [ A^+  −A^+ B D^{-1} ]
       [ 0  D ]   = [ 0     D^{-1}       ].
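Two facts from this chapter lend themselves to quick numerical spot-checks: the reverse-order property (AB)^+ = B^+ A^+ fails in general but holds for a full-rank factorization, and R(B) ⊆ R(A) holds exactly when A A^+ B = B. A hedged sketch assuming NumPy; the seeded random test matrices are illustrative choices:

```python
import numpy as np

pinv = np.linalg.pinv

# Reverse-order failure: A = [0 1], B = [1; 1].
A = np.array([[0.0, 1.0]])
B = np.array([[1.0], [1.0]])
reverse_fails = not np.allclose(pinv(A @ B), pinv(B) @ pinv(A))

# Full column rank times full row rank: reverse order does hold.
rng = np.random.default_rng(0)
A1 = rng.standard_normal((5, 2))          # full column rank (almost surely)
B1 = rng.standard_normal((2, 4))          # full row rank (almost surely)
reverse_holds = np.allclose(pinv(A1 @ B1), pinv(B1) @ pinv(A1))

# Range-inclusion criterion: R(B) <= R(M) iff M M^+ B = B.
M = rng.standard_normal((5, 3))
B_in = M @ rng.standard_normal((3, 2))    # columns lie in R(M) by construction
B_out = rng.standard_normal((5, 2))       # almost surely not contained in R(M)
range_in = np.allclose(M @ pinv(M) @ B_in, B_in)
range_out = not np.allclose(M @ pinv(M) @ B_out, B_out)
```

Here (AB)^+ = 1 while B^+ A^+ = 1/2, reproducing the counterexample quoted in the text.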
Chapter 5

Introduction to the Singular Value Decomposition

In this chapter we give a brief introduction to the singular value decomposition (SVD). We show that every matrix has an SVD and describe some useful properties and applications of this important matrix factorization. The SVD plays a key conceptual and computational role throughout (numerical) linear algebra and its applications.

5.1 The Fundamental Theorem

Theorem 5.1. Let A ∈ R_r^{m×n}. Then there exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{n×n} such that

A = U Σ V^T,                                  (5.1)

where Σ = [S 0; 0 0] with S = diag(σ_1, ..., σ_r) ∈ R^{r×r} and σ_1 ≥ ... ≥ σ_r > 0. More specifically, we have

A = [U_1 U_2] [ S  0 ] [V_1 V_2]^T            (5.2)
              [ 0  0 ]
  = U_1 S V_1^T.                              (5.3)

The submatrix sizes are all determined by r (which must be ≤ min{m, n}), i.e., U_1 ∈ R^{m×r}, U_2 ∈ R^{m×(m−r)}, V_1 ∈ R^{n×r}, V_2 ∈ R^{n×(n−r)}, and the 0-subblocks in Σ are compatibly dimensioned.

Proof: Since A^T A ≥ 0 (A^T A is symmetric and nonnegative definite; recall, for example, [24, Ch. 6]), its eigenvalues are all real and nonnegative. (Note: The rest of the proof follows analogously if we start with the observation that A A^T ≥ 0 and the details are left to the reader as an exercise.) Denote the set of eigenvalues of A^T A by {σ_i^2, i ∈ n} with σ_1 ≥ ... ≥ σ_r > 0 = σ_{r+1} = ... = σ_n. Let {v_i, i ∈ n} be a set of corresponding orthonormal eigenvectors and let V_1 = [v_1, ..., v_r], V_2 = [v_{r+1}, ..., v_n]. Letting S = diag(σ_1, ..., σ_r), we can write A^T A V_1 = V_1 S^2. Premultiplying by V_1^T gives V_1^T A^T A V_1 = V_1^T V_1 S^2 = S^2, the latter equality following from the orthonormality of the v_i vectors. Pre- and postmultiplying by S^{-1} gives the equation

(S^{-1} V_1^T A^T)(A V_1 S^{-1}) = I.         (5.4)

Turning now to the eigenvalue equations corresponding to the eigenvalues σ_{r+1}, ..., σ_n, we have that A^T A V_2 = V_2 0 = 0, whence V_2^T A^T A V_2 = 0. Thus, A V_2 = 0. Now define the matrix U_1 ∈ R^{m×r} by U_1 = A V_1 S^{-1}. Then from (5.4) we see that U_1^T U_1 = I; i.e., the columns of U_1 are orthonormal. Choose any matrix U_2 ∈ R^{m×(m−r)} such that [U_1 U_2] is orthogonal. Then

U^T A V = [ U_1^T A V_1   U_1^T A V_2 ]
          [ U_2^T A V_1   U_2^T A V_2 ]

        = [ U_1^T A V_1   0 ]
          [ U_2^T A V_1   0 ]

since A V_2 = 0. Referring to the equation U_1 = A V_1 S^{-1} defining U_1, we see that U_1^T A V_1 = S and U_2^T A V_1 = U_2^T U_1 S = 0. The latter equality follows from the orthogonality of the columns of U_1 and U_2. Thus, we see that, in fact, U^T A V = [S 0; 0 0], and defining this matrix to be Σ completes the proof. □

Definition 5.2. Let A = U Σ V^T be an SVD of A as in Theorem 5.1.

1. The set {σ_1, ..., σ_r} is called the set of (nonzero) singular values of the matrix A and is denoted Σ(A). From the proof of Theorem 5.1 we see that σ_i(A) = λ_i^{1/2}(A^T A) = λ_i^{1/2}(A A^T). Note that there are also min{m, n} − r zero singular values.
2. The columns of U are called the left singular vectors of A (and are the orthonormal eigenvectors of A A^T).
3. The columns of V are called the right singular vectors of A (and are the orthonormal eigenvectors of A^T A).

Remark 5.3. The analogous complex case in which A ∈ C_r^{m×n} is quite straightforward. The decomposition is A = U Σ V^H, where U and V are unitary and the proof is essentially identical, except for Hermitian transposes replacing transposes.

Remark 5.4. Note that U and V can be interpreted as changes of basis in both the domain and co-domain spaces with respect to which A then has a diagonal matrix representation. Specifically, let C denote A thought of as a linear transformation mapping R^n to R^m. Then rewriting A = U Σ V^T as A V = U Σ, we see that Mat C = Σ with respect to the bases {v_1, ..., v_n} for R^n and {u_1, ..., u_m} for R^m (see the discussion in Section 3.2). See also Remark 5.16.

Remark 5.5. The singular value decomposition is not unique. For example, an examination of the proof of Theorem 5.1 reveals that

• any orthonormal basis for N(A) can be used for V_2;
• there may be nonuniqueness associated with the columns of V_1 (and hence U_1) corresponding to multiple σ_i's;
• any U_2 can be used so long as [U_1 U_2] is orthogonal;
• columns of U and V can be changed (in tandem) by sign (or multiplier of the form e^{jθ} in the complex case).
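The constructive proof above can be followed step by step in floating point: eigendecompose A^T A, take S as the diagonal of positive square roots, set V_1 to the corresponding eigenvectors, and form U_1 = A V_1 S^{-1}. A hedged sketch assuming NumPy; the threshold used to decide r is an assumption of this illustration:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [2.0, 2.0]])                 # rank 1

# Eigendecompose the symmetric matrix A^T A and sort eigenvalues descending.
evals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(evals)[::-1]
evals, V = evals[order], V[:, order]

r = int(np.sum(evals > 1e-12))             # number of positive eigenvalues
S = np.diag(np.sqrt(evals[:r]))            # S = diag(sigma_1, ..., sigma_r)
V1 = V[:, :r]
U1 = A @ V1 @ np.linalg.inv(S)             # U_1 = A V_1 S^{-1}

orthonormal = np.allclose(U1.T @ U1, np.eye(r))   # (5.4) in action
compact_svd_ok = np.allclose(U1 @ S @ V1.T, A)    # A = U_1 S V_1^T
sigma1 = S[0, 0]
```

As the text warns later, this eigenproblem route is for illustration only; production SVD codes work on A directly via orthogonal transformations.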
10. U2.sine sin e cose J[~ ~J[ cose sine Sine] cose ' where e is arbitrary. Vi. [25]. A factorization UI: VT of a n m x n matrix A qualifies as an SVD if U and V are A t/SV r o f an m n U orthogonal and I: is an m x n "diagonal" matrix whose diagonal elements in the upper £ left comer are positive (and ordered).. Let A e IRnxn" be symmetric and positive definite. see. 0 Example 5.9. Computing an SVD by working directly with the eigenproblem for AT A or 5.2) can always be constructed from ¥2 Theorem too. f/2. SVDof A. corner f/E V T r r T Ti isaanS V D o f AT.9. then A A. too. orthogonal transformations.3). VT AV = A > O. i.1. Note.2) can always be constructed from a "compact SVD" (5. symmetric V orthogonal matrix of eigenvectors that diagonalizes A.. C/ [U\ Ui] orthogonal.. VT A V = A = VAV eigenvectors > 0. Example 5. Example 5. is the matrix I: and the span of the columns of UI.7. i.8. A _ [ 1  0 ~ ] cose = [ . SVD" (5. The Fundamental Theorem 37 37 • any U22can be used so long as [U I U2] is orthogonal. is an SVD. What is unique. e.8. [7].11). n=[ [] 3 2 I 3 2 I 5 2y'5 y'5 S 4y'5 15 2~ ][ 3~ 0 0 0][ 0 0 3 0 _y'5 3 v'2 T v'2 T v'2 T v'2 2 ] 3 2 2 = 3 3 3J2 [~ ~] A E R MX Example 5.6. 01  where U is an arbitrary 2x2 2 orthogonal matrix. [7]. if A = UI:VT is an SVD of A.. e.5. U V form • columns of U and V can be changed (in tandem) by sign (or multiplier of the form e je in the complex case). [11].10. and E U\.1. The Fundamental Theorem 5. Then A = V A VTT is an A.. AT A Remark 5.e. A=U is an SVD.U I U T . that "full SVD" (5. see. Let V be an orthogonal 5. is an SVD.g.3). [11]. that aa"full SVD" (5. VI:TU/ s n SVD of A VS C . SVD of A. For example. Computing AA AATT is numerically poor in finiteprecision arithmetic.g. U arbitrary 2 x orthogonal 5.6.e. VI. V2 (see Theorem 5. [25]. F/vamnlp 5.[1 0 ] . Better algorithms exist that work directly on A via a sequence of orthogonal transformations. e j8 the case). Example A . however.
38                        Chapter 5. Introduction to the Singular Value Decomposition

5.2  Some Basic Properties

Theorem 5.11. Let A ∈ ℝ^{m×n} have a singular value decomposition A = U Σ V^T. Using the notation of Theorem 5.1, the following properties hold:

1. rank(A) = r = the number of nonzero singular values of A.

2. Let U = [u_1, ..., u_m] and V = [v_1, ..., v_n]. Then A has the dyadic (or outer product) expansion

       A = Σ_{i=1}^{r} σ_i u_i v_i^T.                                          (5.5)

3. The singular vectors satisfy the relations

       A v_i = σ_i u_i,                                                        (5.6)
       A^T u_i = σ_i v_i                                                       (5.7)

   for i = 1, ..., r.

4. Let U_1 = [u_1, ..., u_r], U_2 = [u_{r+1}, ..., u_m], V_1 = [v_1, ..., v_r], and V_2 = [v_{r+1}, ..., v_n]. Then

   (a) R(U_1) = R(A) = N(A^T)^⊥;
   (b) R(U_2) = R(A)^⊥ = N(A^T);
   (c) R(V_1) = N(A)^⊥ = R(A^T);
   (d) R(V_2) = N(A) = R(A^T)^⊥.

Remark 5.12. The elegance of the dyadic decomposition (5.5) as a sum of outer products and the key vector relations (5.6) and (5.7) explain why it is conventional to write the SVD as A = U Σ V^T rather than, say, A = U Σ V.

Remark 5.13. Part 4 of the above theorem provides a numerically superior method for finding (orthonormal) bases for the four fundamental subspaces compared to methods based on, for example, reduction to row or column echelon form. Note that each subspace requires knowledge of the rank r. The relationship to the four fundamental subspaces is summarized nicely in Figure 5.1.

Theorem 5.14. Let A ∈ ℝ^{m×n} have a singular value decomposition A = U Σ V^T as in Theorem 5.1. Then

       A^+ = V Σ^+ U^T,                                                        (5.8)

where

       Σ^+ = [ S^{−1}  0 ] ∈ ℝ^{n×m},                                          (5.9)
             [   0     0 ]
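The rank and dyadic-expansion properties above can be verified numerically. The sketch below (NumPy assumed; the rank-deficient matrix is illustrative, not from the text) rebuilds A from the outer products σ_i u_i v_i^T over the nonzero singular values:

```python
import numpy as np

# Illustrative rank-2 matrix (an assumption for this sketch).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # 2x the first row
              [1.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))       # number of nonzero singular values

# Dyadic (outer product) expansion  A = sum_i sigma_i u_i v_i^T,  i = 1..r.
dyadic = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))

assert r == np.linalg.matrix_rank(A)   # property 1
assert np.allclose(dyadic, A)          # property 2
```

The columns U[:, :r] and V[:, :r] (rows of Vt) then give orthonormal bases for R(A) and N(A)^⊥, as in part 4.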
5.2. Some Basic Properties                                                    39

[Figure 5.1 - diagram omitted: A maps N(A)^⊥ (dimension r) one-to-one and onto R(A) (dimension r), while N(A) (dimension n − r) is mapped to {0}; R(A)^⊥ has dimension m − r.]

Figure 5.1. SVD and the four fundamental subspaces.

with the 0-subblocks appropriately sized. Furthermore, if we let the columns of U and V be as defined in Theorem 5.1, then

       A^+ = Σ_{i=1}^{r} (1/σ_i) v_i u_i^T.                                    (5.10)

Proof: The proof follows easily by verifying the four Penrose conditions.  □

Remark 5.15. Note that none of the expressions above quite qualifies as an SVD of A^+ if we insist that the singular values be ordered from largest to smallest. However, a simple reordering accomplishes the task:

       A^+ = Σ_{i=1}^{r} (1/σ_{r−i+1}) v_{r−i+1} u_{r−i+1}^T.                  (5.11)

This can also be written in matrix terms by using the so-called reverse-order identity matrix (or exchange matrix) P = [e_r, e_{r−1}, ..., e_2, e_1], which is clearly orthogonal and symmetric.
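The SVD construction of the pseudoinverse is straightforward to implement; the minimal sketch below (NumPy assumed; the matrix is illustrative) forms A^+ = V Σ^+ U^T and verifies the four Penrose conditions used in the proof:

```python
import numpy as np

# Illustrative matrix (an assumption for this sketch, not from the text).
A = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))

Sp = np.zeros((A.shape[1], A.shape[0]))   # Sigma^+ is n x m
Sp[:r, :r] = np.diag(1.0 / s[:r])

Ap = Vt.T @ Sp @ U.T                      # A^+ = V Sigma^+ U^T

assert np.allclose(A @ Ap @ A, A)         # Penrose condition 1
assert np.allclose(Ap @ A @ Ap, Ap)       # Penrose condition 2
assert np.allclose((A @ Ap).T, A @ Ap)    # Penrose condition 3
assert np.allclose((Ap @ A).T, Ap @ A)    # Penrose condition 4
assert np.allclose(Ap, np.linalg.pinv(A)) # agrees with the library routine
```

Reversing the order of the retained singular triples, as in the remark above, changes none of these checks, since the sum (5.10) is unaffected by reordering.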
40                        Chapter 5. Introduction to the Singular Value Decomposition

       A^+ = (V_1 P)(P S^{−1} P)(P U_1^T)

is the matrix version of (5.11).

Remark 5.16. Recall the linear transformation T used in the proof of Theorem 3.17 and in Definition 4.1. Since T is determined by its action on a basis, and since {v_1, ..., v_r} is a basis for N(A)^⊥, then T can be defined by T v_i = σ_i u_i, i = 1, ..., r. Similarly, since {u_1, ..., u_r} is a basis for R(A), then T^{−1} can be defined by T^{−1} u_i = (1/σ_i) v_i, i = 1, ..., r. From Section 3.2, the matrix representation for T with respect to the bases {v_1, ..., v_r} and {u_1, ..., u_r} is clearly S, while the matrix representation for the inverse linear transformation T^{−1} with respect to the same bases is S^{−1}. A "full SVD" can be similarly constructed.

5.3  Row and Column Compressions

Row compression

Let A ∈ ℝ^{m×n} have an SVD given by (5.1). Then

       U^T A = Σ V^T
             = [ S  0 ] [ V_1^T ]  =  [ S V_1^T ] ∈ ℝ^{m×n}.
               [ 0  0 ] [ V_2^T ]     [    0    ]

Notice that N(A) = N(U^T A) = N(S V_1^T) and the matrix S V_1^T ∈ ℝ^{r×n} has full row rank. In other words, premultiplication of A by U^T is an orthogonal transformation that "compresses" A by row transformations. Such a row compression can also be accomplished by orthogonal row transformations performed directly on A to reduce it to the form [R; 0], where R is upper triangular. Both compressions are analogous to the so-called row-reduced echelon form which, when derived by a Gaussian elimination algorithm implemented in finite-precision arithmetic, is not generally as reliable a procedure.

Column compression

Again, let A ∈ ℝ^{m×n} have an SVD given by (5.1). Then

       A V = U Σ = [U_1  U_2] [ S  0 ]  =  [U_1 S   0] ∈ ℝ^{m×n}.
                              [ 0  0 ]

This time, notice that R(A) = R(A V) = R(U_1 S) and the matrix U_1 S ∈ ℝ^{m×r} has full column rank. In other words, postmultiplication of A by V is an orthogonal transformation that "compresses" A by column transformations. Such a compression is analogous to the
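Row compression via the SVD takes one line in code. The sketch below (NumPy assumed; the rank-deficient matrix is illustrative) shows that U^T A has exactly r nonzero rows, and that those rows have full row rank:

```python
import numpy as np

# Illustrative rank-2 matrix in R^{3x2} (an assumption for this sketch).
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))

compressed = U.T @ A          # orthogonal row transformation: [S V1^T; 0]

assert np.allclose(compressed[r:, :], 0.0)             # rows below r vanish
assert np.linalg.matrix_rank(compressed[:r, :]) == r   # full-row-rank block
```

Column compression is the mirror image: A @ Vt.T has its last n − r columns equal to zero, and the leading block U_1 S has full column rank.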
Exercises                                                                     41

so-called column-reduced echelon form, which is not generally a reliable procedure when performed by Gauss transformations in finite-precision arithmetic. For details, see, for example, [7], [11], [23], [25].

EXERCISES

1. Let X ∈ ℝ^{m×n}. If X^T X = 0, show that X = 0.

2. Prove Theorem 5.1 starting from the observation that A A^T ≥ 0.

3. Let A ∈ ℝ^{m×n} and suppose W ∈ ℝ^{m×m} and Y ∈ ℝ^{n×n} are orthogonal.

   (a) Show that A and W A Y have the same singular values (and hence the same rank).
   (b) Suppose that W and Y are nonsingular but not necessarily orthogonal. Do A and W A Y have the same singular values? Do they have the same rank?

4. Let A ∈ ℝ^{n×n} be symmetric but indefinite. Determine an SVD of A.

5. Let x ∈ ℝ^m, y ∈ ℝ^n be nonzero vectors. Determine an SVD of the matrix A ∈ ℝ^{m×n} defined by A = x y^T.

6. Determine SVDs of the matrices

   (a) [ −1  0 ]        (b) [ 1 ]
       [  0  1 ]            [ 0 ]
                            [ 1 ].

7. Let A ∈ ℝ^{n×n} be nonsingular. Use the SVD to determine a polar factorization of A, i.e., A = Q P, where Q is orthogonal and P = P^T > 0. Note: this is analogous to the polar form z = r e^{jθ} of a complex scalar z (where i = j = √−1).
Chapter 6

Linear Equations

In this chapter we examine existence and uniqueness of solutions of systems of linear equations. General linear systems of the form

       A X = B;   A ∈ ℝ^{m×n}, B ∈ ℝ^{m×k},                                    (6.1)

are studied and include, as a special case, the familiar vector system

       A x = b;   A ∈ ℝ^{m×n}, b ∈ ℝ^m.                                        (6.2)

6.1  Vector Linear Equations

We begin with a review of some of the principal results associated with vector linear systems.

Theorem 6.1. Consider the system of linear equations

       A x = b;   A ∈ ℝ^{m×n}, b ∈ ℝ^m.                                        (6.3)

1. There exists a solution to (6.3) if and only if b ∈ R(A); equivalently, there exists a solution if and only if rank([A, b]) = rank(A).

2. There exists a solution to (6.3) for all b ∈ ℝ^m if and only if R(A) = ℝ^m, i.e., A is onto, and this is possible only if m ≤ n (since m = dim R(A) = rank(A) ≤ min{m, n}).

3. A solution to (6.3) is unique if and only if N(A) = 0, i.e., A is 1-1.

4. There exists at most one solution to (6.3) for all b ∈ ℝ^m if and only if the columns of A are linearly independent, i.e., N(A) = 0, and this is possible only if m ≥ n.

5. There exists a nontrivial solution to the homogeneous system A x = 0 if and only if rank(A) < n.

6. There exists a unique solution to (6.3) for all b ∈ ℝ^m if and only if A is nonsingular; equivalently, A ∈ ℝ^{m×m} and A has neither a 0 singular value nor a 0 eigenvalue.
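The rank criteria of Theorem 6.1 translate directly into code. The sketch below (NumPy assumed; A and b are illustrative, with b chosen to lie in R(A)) checks solvability of Ax = b via rank([A, b]) = rank(A) and uniqueness via N(A) = 0:

```python
import numpy as np

# Illustrative data (assumptions for this sketch): a 3x2 system with a
# consistent right-hand side, namely b = -5*a1 + 3*a2.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

r = np.linalg.matrix_rank(A)
r_aug = np.linalg.matrix_rank(np.column_stack([A, b]))

assert r_aug == r            # b in R(A): a solution exists
assert r == A.shape[1]       # N(A) = 0: the solution is unique

x, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(A @ x, b)
```

Perturbing b off R(A) makes rank([A, b]) exceed rank(A), and the system becomes inconsistent.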
44                                                        Chapter 6. Linear Equations

Proof: The proofs are straightforward and can be consulted in standard texts on linear algebra. Note that some parts of the theorem follow directly from others. For example, to prove part 6, note that x = 0 is always a solution to the homogeneous system. Therefore, we must have the case of a nonunique solution, i.e., A is not 1-1, which implies rank(A) < n by part 3.  □

6.2  Matrix Linear Equations

In this section we present some of the principal results concerning existence and uniqueness of solutions to the general matrix linear system (6.1). Note that the results of Theorem 6.1 follow from those below for the special case k = 1, while results for (6.2) follow by specializing even further to the case m = n.

Theorem 6.2 (Existence). The matrix linear equation

       A X = B;   A ∈ ℝ^{m×n}, B ∈ ℝ^{m×k},                                    (6.4)

has a solution if and only if R(B) ⊆ R(A); equivalently, a solution exists if and only if A A^+ B = B.

Proof: The subspace inclusion criterion follows essentially from the definition of the range of a matrix. The matrix criterion is Theorem 4.18.  □

Theorem 6.3. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{m×k} and suppose that A A^+ B = B. Then any matrix X of the form

       X = A^+ B + (I − A^+ A) Y,  where Y ∈ ℝ^{n×k} is arbitrary,             (6.5)

is a solution of

       A X = B.                                                                (6.6)

Furthermore, all solutions of (6.6) are of this form.

Proof: To verify that (6.5) is a solution, premultiply by A:

       A X = A A^+ B + A (I − A^+ A) Y
           = B + (A − A A^+ A) Y    by hypothesis
           = B    since A A^+ A = A by the first Penrose condition.

That all solutions are of this form can be seen as follows. Let Z be an arbitrary solution of (6.6), i.e., A Z = B. Then we can write

       Z = A^+ A Z + (I − A^+ A) Z
         = A^+ B + (I − A^+ A) Z

and this is clearly of the form (6.5).  □

6.2. Matrix Linear Equations                                                  45

Remark 6.4. It can be shown that the particular solution X = A^+ B is the solution of (6.6) that minimizes Tr X^T X. (Tr(·) denotes the trace of a matrix; recall that Tr X^T X = Σ_{i,j} x_{ij}^2.)

Remark 6.5. When A is square and nonsingular, A^+ = A^{−1} and so I − A^+ A = 0. Thus, there is no "arbitrary" component, leaving only the unique solution X = A^{−1} B.

Theorem 6.6 (Uniqueness). A solution of the matrix linear equation

       A X = B;   A ∈ ℝ^{m×n}, B ∈ ℝ^{m×k},                                    (6.7)

is unique if and only if A^+ A = I; equivalently, (6.7) has a unique solution if and only if N(A) = 0.

Proof: The first equivalence is immediate from Theorem 6.3. The second follows by noting that A^+ A = I can occur only if r = n, where r = rank(A) (recall r ≤ n). But rank(A) = n if and only if A is 1-1 or N(A) = 0.  □

Example 6.7. Suppose A ∈ ℝ^{n×n}. Find all solutions of the homogeneous system A x = 0.

Solution:

       x = A^+ 0 + (I − A^+ A) y
         = (I − A^+ A) y,

where y ∈ ℝ^n is arbitrary. Hence there exists a nonzero solution if and only if A^+ A ≠ I. This is equivalent to either rank(A) = r < n or A being singular. Clearly, if there exists a nonzero solution, it is not unique.

Computation: Since y is arbitrary, it is easy to see that all solutions are generated from a basis for R(I − A^+ A). But if A has an SVD given by A = U Σ V^T, then it is easily checked that I − A^+ A = V_2 V_2^T and R(V_2 V_2^T) = R(V_2) = N(A).

Example 6.8. Characterize all right inverses of a matrix A ∈ ℝ^{m×n}; equivalently, find all solutions R of the equation A R = I_m. Here, we write I_m to emphasize the m × m identity matrix.

Solution: There exists a right inverse if and only if R(I_m) ⊆ R(A), and this is equivalent to A A^+ I_m = I_m. Clearly, this can occur if and only if rank(A) = r = m (since r ≤ m), and this is equivalent to A being onto (A^+ is then a right inverse). All right inverses of A are then of the form

       R = A^+ I_m + (I_n − A^+ A) Y
         = A^+ + (I − A^+ A) Y,

where Y ∈ ℝ^{n×m} is arbitrary. There is a unique right inverse if and only if A^+ A = I (N(A) = 0), in which case A must be invertible and R = A^{−1}.

Example 6.9. Consider the system of linear first-order difference equations

       x_{k+1} = A x_k + B u_k                                                 (6.8)
7.mxn.2. equivalently. there exists a nonzero solution if and only if A+A /= I. this can occur if and only if rank(A) = r m (since equivalent to AA+Im = 1m. r checked that 1. Ax — 0.7) has a unique solution if and only if unique if and only if A + A = I. A solution of the matrix linear equation Theorem 6.mxk (6. X• = A~ B. vD Example 6. nonzero solution. Clearly. find all A e ]Rmx". if and only if A is Ilor _/V(A) = O.6 (Uniqueness).4. Clearly. (6. A R (AA(A) = A"1. Example 6.A+A) = O. A A+ AI Remark (/ — A + A) 0. It can be shown that the particular solution X = A+B is the solution of (6. / .6. Find all solutions of the homogeneous system Ax = 0.9. But if A has an SVD given by A = U h VT. Consider the system of linear firstorder difference equations (6.3.5.) Theorem 6. y E R" A + A t= I.7) has a unique solution if and only if M(A) = 0. BE lR. Characterize all right inverses of a matrix A E lR. if there exists a unique. recall that TrXT X = £\ •xlj. D 0 Example 6.. there is no "arbitrary" component. Computation: Since y is arbitrary.A+A = Vz V2 and R(Vz2V^) = R(Vz) = N(A). equivalently. (6.A+A).S. It particular (6. Solution: There exists a right inverse if and only if R(Im) S. A E lR. Characterize AR = Im solutions R of the equation AR = 1m. A solution of the matrix linear equation AX = B.mxn. in which case A must be invertible and R = AI.nxn. A e E"x".5. where y e lR. The second follows by noting thatA+A = I can occur only ifr = n. Suppose A E lR.) that minimizes TrXT X. wherer = rank(A) (recallr ::: h). Example 6. Hence. this can occur if and only if rank(A) = r = m (since r ::: m) and this is equivalent to A being onto (A + is then a right inverse). rank(A) = < A This is equivalent to either rank (A) = r < n or A being singular.9.8) . recall that TrX r = Li.nxm is arbitrary. Proof: Proof: The first equivalence is immediate from Theorem 6. 
All right inverses r < m) A (A+ of A are then of the form of A R = A+ 1m + (In .6) that minimizes TrX7 (Tr() denotes the trace of a matrix.6 (Uniqueness). Remark 6. A A = f/E VT.6) +B Remark 6. N(A) = O.8. 7£(A) and this is 7£(/m) c R(A) equivalent to AA + 1m Im. (TrO denotes the trace of a matrix. A+ A where Y E lR. and (N(A) = 0). When A is square and nonsingular.n is arbitrary.A+ A V2 V[ and U(V = K(V2) = N(A). There is a unique right inverse if and only if A+A = I/ e E"xm arbitrary. then it is easily R(I — A + A). But rank(A) = n that A+ A = / if r — n. Butrank(A) = n if and only if A is 11 or N(A) = 0. Matrix Linear Equations 6.2. it is not unique. Here. matrix. it is easy to see that all solutions are generated y from a basis for 7£(7 .7) is unique if and only if A+A = /. Example 6. Thus.7. leaving only the unique solution X = AI1B.j jcj. we write 1m to emphasize the m x m identity Im matrix.A+ A)Y =A++(IA+A)Y. where r rank(A) (recall r < n). Solution: x=A+O+(IA+A)y = (IA+A)y. Consider Example 6. A+ = A"1 and so (I . Matrix Linear Equations 45 Remark 6. Clearly. equivalently.
if and only if rank [B.. if and only if or. The vector Jt* in linear system theory is e IR nx " fieR" (n ~ I. equivalently. Again from Theorem 6. does there exist an input sequence {ujj 1jj^ such that Xk takes an arbitrary value in 1R"? In linear system theory.8) is reachable if and only if if R([ B. m known as the state vector at time while Uk is the input (control) vector.. We might now ask the question: Given Xo 0. The general solution of (6. this is called controllability. .10.J B]) = 1R" or.e. standard conditions with analogues for continuoustime models (i. from the fundamental Existence Theorem. B) is if(AT . . Theorem l'/:b Clearly.11) with and D (p ~ 1). A n . Since > 1. The linear differential equations). The above are standard conditions with analogues for continuoustime models (i.10.• A k kJ B] [ ~o (6. There are many other algebraically equivalent conditions. The matrices A = [ ° Q and f ^ provide an lability and reachability are equivalent. The condition The answers are cast in terms that are dual in the linear algebra sense as well.9) ~Axo+[B. this is a question va [Uj }k~:b such that x^ takes an arbitrary value in W ? In linear system theory.AB •.. reachability always implies controllability and. We now introduce an output vector yk to the system (6.y/}"Io suffice to determine reconstructibility: When does knowledge of {w r/:b and {YJ lj:b suffice to determine (uniquely) xn? The fundamental duality result from linear system theory is the following: (uniquely) xnl The fundamental duality result from linear system theory is the following: E RPxn e IR pxn E RPxm € IR pxm (A. Since m ~ I. Linear Equations Xk with A E R"xn and B E IR nxmxm(rc>l. (A..ra>l). we have the notion of suffice to determine (uniquely) xo? As a dual to controllability. We might now ask the question: Given XQ = 0.8) is controllable if and only if if controllability. does there exist an input sequence {u j an input sequence {"y}"~o such that xn = O? 
In linear system theory.9 Example 6.8) of Example 6. we of reachability. AB. Example 6. A related question is the following: Given an arbitrary initial vector Xo.46 46 Equations Chapter 6..2. We now introduce an output vector Yk to the system (6. see that (6. B T] is observable [reconsrrucrible] [controllablcl if and T) observable [reconstructive]. from the fundamental Existence Theorem.. AB. A n . The answers are cast in terms that are dual in the linear algebra sense as well. if A is nonsingular. Theorem 6. controlA 1 lability and reachability are equivalent. The matrices A = [~ ~]1and B5 == [~] 1 providean example of a system that is controllable but not reachable. . we have the notion of reconstructibility: When does knowledge of {u jy }"~Q and {. example of a system that is controllable but not reachable.8) is given by solution of (6.10) for k ~ 1.:b dual to reachability is called observability: When does knowledge of {" j }"!Q and {y_/}"~o suffice to determine (uniquely) Jt0? As a dual to controllability. There are many other algebraically equivalent conditions.. We can then pose some new questions about the overall system that are dual in the systemtheoretic sense to reachability and controllability. The condition dual to reachability is called observability: When does knowledge of {u 7 r/:b and {Yj l'. .J B] = n.8) of Example 6. The general known as the state vector at time k while Uk is the input (control) vector.~ I).2. we see that (6. We can then pose some new questions about the with C and (p > 1). linear differential equations).T. equivalently..8) is given by kJ Xk = Akxo + LAkJj BUj j=O UkJ ] Uk2 (6.9 by appending the equation by appending the equation (6. this is a question {u }y~Q Xk of reacbability. overall system that are dual in the systemtheoretic sense to reachability and controllability.e. this is called such that Xn = 0? linear system theory. B) iJ reachable [controllable] ifand only if (A . 
does there exA related question is the following: Given an arbitrary initial vector XQ. does there exist an input sequence for k > 1.
(6. asbelow is a small collection of useful matrix identities. in which case the general solution is of the has a solution if and only if AA + BC+C = B. e Tl(R).14) requires the notion A compact matrix criterion for uniqueness of solutions to (6.DUnl 6.6. equivalently.27. B E Rmxq. 6. Thus. and C e jRpxq.Duo Yl . Listed below is a small collection of useful matrix identities. e Rmxn. the solution is then unique if and only if N(R) ==0.2 j BUj . indicated. and C E Rpxti. particularly for block matrices. arbitrary.13) and let denote the matrix on the righthand side.4 Some Useful and Interesting Inverses 47 To derive a condition for observability. By the fundamental the righthand side. is stated and proved in Theorem 13. Let A E Rmxn. Such a criterion (C C+ ® A +A = I) of the Kronecker product of matrices for its statement.11. if and only if r Yn]  Lj:~ CA n . by definition. Theorem 6.. Such a criterion (CC+ <g) A+ A — I) is stated and proved in Theorem 13. so a solution exists.4 6. Theorem 6. Verification of each identity is recommended as an exercise for the reader.13) and let R denote the matrix on Let denote the (known) vector on the lefthand side of (6. so a solution exists.3 6. B E Rnxm.4 Some Useful and Interesting Inverses 6. Then the equation e jRmxn.CBuo . by definition. Listed In many applications.12) j=O Yo .3 A More General Matrix Linear Equation A More General Matrix Linear Equation AXC=B (6. notice that To derive a condition for observability. In these identities. A compact matrix criterion for uniqueness of solutions to (6. Verification of each identity is recommended as an exercise for the reader.6. mxm and D E jRm Invertibility is assumed for any component or subblock whose inverse is and D € E xm.6.27. . E jRnxm. the solution is then unique if and only if N(R) Uniqueness Theorem. the has a solution if and only if AA+BC+C = B. Then. Invertibility is assumed for any component or subblock whose inverse is indicated. 
the coefficient matrices of interest are square and nonsingular. the coefficient matrices of interest are square and nonsingular. A E Rnxn. in which case the general solution is of the form (6. +L kl CAk1j BUj + DUk. By the fundamental Uniqueness Theorem. Then. v E R(R).14) requires the notion of the Kronecker product of matrices for its statement.Du] (6. notice that Yk = CAkxo Thus. sociated e jRnxn. equivalently. associated with matrix inverses.4 Some Useful and Interesting Inverses Some Useful and Interesting Inverses In many applications. 0.14) Theorem 6. if and only if or. B e jRmx q .13) Let v denote the (known) vector on the lefthand side of (6.15) E jRnxp where Y € Rn*p is arbitrary. particularly for block matrices. C E jRmxn. or.
Rmxk and suppose has an SVD as in Theorem 5.CA. (A + BDC)I = A~l . where F = (A .4. Linear Equations Chapter 6.4. result follows easily from the block LU factorization in property 16 of Section 1. Assuming 2.I B)I (E is the inverse of the Schur complement of A). characterize all solutions of the matrix linear equation 7Z(B) c 7£(A). BB EelR fflxk and suppose AAhas an SVD as in Theorem 5. l 8.. This where E = (D — CA B) (E is the inverse of the Schur complement of A). 2. X. It also the inverse of a sum of matrices such as (A + D)"1 or (A" + D"1) It also yields very efficient "updating" or "downdating" formulas in expressions such as yields very efficient "updating" or "downdating" formulas in expressions such as T (A + JUT ) I1 (with symmetric A E R"x" and . It has many This result is known as the ShermanMorrisonWoodbury formula.c E E") that arise in optimization (A + xx T ) — (with symmetric A e lRnxn and x e lRn) that arise in optimization theory.BDI l = [ AI BD. for example. r A~I [~ ~ r [D~I~AI D~I 1 ~r ~~B 1 r l [~ ~ r [D~CF +~~I~.8. where E = (D . Linear Equations 1.. Assuming R(B) ~ R(A). for example. theory.A~lB(D~ CA~lB)~[CA~l This result is known as the ShermanMorrisonWoodbury formula.4.B D.I C) I. Note that the positions of the / and .48 Chapter 6. It has many applications (and is frequently "rediscovered") including.I EXERCISES EXERCISES 1. that X~l [~ !/ [~ ~ r [~ ~ l [~ ~/ r [~ ~ 1 l l l = [ ~ 4. characterize all left inverses of a matrix A e lR ". 2. ization in property 17 of Section 1.AIB(DlI + CAIB)ICAI. = = Both of these matrices satisfy the matrix equation X2 = / from which it is obvious these matrices satisfy the matrix equation X^ = I from which it is obvious Both of that XI = X.1. (A BDCr1 = AI . characterize all solutions of the matrix linear equation AX=B in terms of the SVD of A in terms of the SVD of A. 1. [ / +c 7. This result follows easily from the block LU factorization in property 16 of Section 1. 
formulas for applications (and is frequently "rediscovered") including. This result follows easily from the block UL factorwhere F = (A — ED C) This result follows easily from the block UL factorization in property 17 of Section 1. BC 6. 5. mx .I ] D. characterize all left inverses of a matrix A E Mm xn . As in Example 6. As in Example 6. Let A E lRmxn.I . formulas for the inverse of a sum of matrices such as (A + D)lor (AI1 + DI)I. [~ ~ r l 3. Note that the positions of the / and — / blocks may be exchanged. Let A € E mx ". 1.4. l = l = [!C / [~ ~ l = [ AI +_~~!~CAI A~BE = D..1.8./ blocks may be exchanged.
e.10. Let x. where C = 1/(1 . y E E" and suppose further that XTy ^ 1. l' Hint: Show that Ci E N(B). A with — subtracted from its (zy)th element) is singular. . Show that (/ . in Example 6. Show that the matrix B = A . Show that cxJ C ' where c 1/(1 — T y). Let A e R"xxn and let A"1 have columns c\.e. As in Example 6. Let x. € IRn and suppose that x T y i= 1. Let A E 1R~ " and let A 1 have columns Cl. T 4... Show that 3.. Hint: Show that ct <= M(B).x xTy).l ~i e. 5. . check directly condition for reconstructibility the form form N[ fA J CA n 1 ~ N(A n ).. . Let jc.y Assume that Yji i= 0 for some i/ and j. check directly that the condition for reconstructibility takes the 6. y e IRn and suppose further that x T y ^ 1... 6. y E E" and suppose further that XTy i= 1.Exercises Exercises 3. .e. (i. Show that 4. Assume that x/( 7^ 0 for some and j..10. Show that the matrix B — A — —eie T : (i.. A with yl subtracted from its (ij)th element) is singular. c and individual elements y.xy) T 1 49 = I  1 xTy 1 xy .Cn and individual elements Yij.
Chapter 7

Projections, Inner Product Spaces, and Norms

7.1  Projections

Definition 7.1. Let V be a vector space with V = X ⊕ Y. By Theorem 2.26, every v ∈ V has a unique decomposition v = x + y with x ∈ X and y ∈ Y. Define P_{X,Y} : V → X ⊆ V by

       P_{X,Y} v = x   for all v ∈ V.

P_{X,Y} is called the (oblique) projection on X along Y.

Figure 7.1 displays the projection of v on both X and Y in the case V = ℝ^2.

[Figure 7.1 - diagram omitted: a vector v ∈ ℝ^2 resolved into its component P_{X,Y} v on X along Y and its component P_{Y,X} v on Y along X.]

Figure 7.1. Oblique projections.

Theorem 7.2. P_{X,Y} is linear and P_{X,Y}^2 = P_{X,Y}.

Theorem 7.3. A linear transformation P is a projection if and only if it is idempotent, i.e., P^2 = P. Also, P is a projection if and only if I − P is a projection. In fact, P_{Y,X} = I − P_{X,Y}.

Proof: Suppose P is a projection, say on X along Y (using the notation of Definition 7.1).
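The defining property P^2 = P of a projection is easy to exhibit numerically. The sketch below (NumPy assumed; the basis vectors for X and Y are hypothetical choices for illustration) builds an oblique projection on X along Y in ℝ^2 and checks that it is idempotent but, unlike an orthogonal projection, not symmetric:

```python
import numpy as np

# Hypothetical one-dimensional subspaces of R^2 (assumptions for this sketch):
x1 = np.array([1.0, 0.0])     # basis for X
y1 = np.array([1.0, 1.0])     # basis for Y, a complement of X but not X-perp

# P maps x1 -> x1 and y1 -> 0; in the basis [x1, y1] it is diag(1, 0).
M = np.column_stack([x1, y1])
P = np.column_stack([x1, np.zeros(2)]) @ np.linalg.inv(M)

assert np.allclose(P @ P, P)              # idempotent: P^2 = P
assert not np.allclose(P, P.T)            # oblique, hence not symmetric

v = np.array([2.0, 3.0])
assert np.allclose(P @ v + (np.eye(2) - P) @ v, v)   # v = x + y uniquely
```

The complementary matrix I − P is again a projection, on Y along X, matching the statement P_{Y,X} = I − P_{X,Y}.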
52                        Chapter 7. Projections, Inner Product Spaces, and Norms

Let v ∈ V be arbitrary. Then Pv = P(x + y) = Px = x. Thus, P^2 v = P P v = P x = x = Pv. Hence P^2 = P. Conversely, suppose P^2 = P. Let X = {v ∈ V : Pv = v} and Y = {v ∈ V : Pv = 0}. It is easy to check that X and Y are subspaces. We now prove that V = X ⊕ Y. First note that if v ∈ X, then Pv = v. If v ∈ Y, then Pv = 0. Hence if v ∈ X ∩ Y, then v = 0. Now let v ∈ V be arbitrary. Then v = Pv + (I − P)v. Let x = Pv, y = (I − P)v. Then Px = P^2 v = Pv = x so x ∈ X, while Py = Pv − P^2 v = 0 so y ∈ Y. Thus, V = X ⊕ Y and the projection on X along Y is P. Essentially the same argument shows that I − P is the projection on Y along X.  □

Definition 7.4. In the special case where Y = X^⊥, P_{X,X^⊥} is called an orthogonal projection and we then use the notation P_X = P_{X,X^⊥}.

Theorem 7.5. P ∈ ℝ^{n×n} is the matrix of an orthogonal projection (onto R(P)) if and only if P^2 = P = P^T.

Proof: Let P be an orthogonal projection (on X, say, along X^⊥) and let x, y ∈ ℝ^n be arbitrary. Note that (I − P)x = (I − P_{X,X^⊥})x = P_{X^⊥,X} x by Theorem 7.3. Thus, (I − P)x ∈ X^⊥. Since Py ∈ X, we have (Py)^T (I − P)x = y^T P^T (I − P)x = 0. Since x and y were arbitrary, we must have P^T (I − P) = 0. Hence P^T = P^T P = P, with the second equality following since P^T P is symmetric. Conversely, suppose P is a symmetric projection matrix and let x be arbitrary. Write x = Px + (I − P)x. Then x^T P^T (I − P)x = x^T P (I − P)x = 0. Thus, since Px ∈ R(P), then (I − P)x ∈ R(P)^⊥ and P must be an orthogonal projection.  □

7.1.1  The four fundamental orthogonal projections

Using the notation of Theorems 5.1 and 5.11, let A ∈ ℝ^{m×n} with SVD A = U Σ V^T = U_1 S V_1^T. Then

       P_{R(A)}     = A A^+      = U_1 U_1^T = Σ_{i=1}^{r} u_i u_i^T,
       P_{R(A)^⊥}   = I − A A^+  = U_2 U_2^T = Σ_{i=r+1}^{m} u_i u_i^T,
       P_{N(A)}     = I − A^+ A  = V_2 V_2^T = Σ_{i=r+1}^{n} v_i v_i^T,
       P_{N(A)^⊥}   = A^+ A     = V_1 V_1^T = Σ_{i=1}^{r} v_i v_i^T

are easily checked to be (unique) orthogonal projections onto the respective four fundamental subspaces.

Example 7.6. Determine the orthogonal projection of a vector v ∈ ℝ^n on another nonzero vector w ∈ ℝ^n.

Solution: Think of the vector w as an element of the one-dimensional subspace R(w). Then the desired projection is simply

       P_{R(w)} v = w w^+ v
                  = (w w^T / w^T w) v
                  = ((w^T v)/(w^T w)) w   (using Example 4.8).

Moreover, the vector z that is orthogonal to w and such that v = Pv + z is given by

       z = P_{R(w)^⊥} v = (I − P_{R(w)}) v = v − ((w^T v)/(w^T w)) w.

See Figure 7.2. A direct calculation shows that z and w are, in fact, orthogonal:

       w^T z = w^T v − ((w^T v)/(w^T w)) w^T w = 0.

[Figure 7.2 - diagram omitted: v, its projection Pv along w, and the residual z = v − Pv, which is perpendicular to w.]

Figure 7.2. Orthogonal projection on a "line."

7.1. Projections                                                              53

Example 7.7. Recall the proof of Theorem 3.11. There, {v_1, ..., v_k} was an orthonormal basis for a subset S of ℝ^n. An arbitrary vector x ∈ ℝ^n was chosen and a formula for x_1 appeared rather mysteriously. The expression for x_1 is simply the orthogonal projection of x on S.

Example 7.8. Recall the diagram of the four fundamental subspaces. The indicated direct sum decompositions of the domain ℝ^n and codomain ℝ^m are given easily as follows. Let x ∈ ℝ^n be an arbitrary vector. Then

       x = P_{N(A)^⊥} x + P_{N(A)} x
         = A^+ A x + (I − A^+ A) x
         = V_1 V_1^T x + V_2 V_2^T x   (recall V V^T = I).
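The four fundamental orthogonal projections can be formed directly from A^+ and tested against Theorem 7.5's characterization. The sketch below (NumPy assumed; the rank-1 matrix is illustrative, not from the text) checks that A A^+ and I − A^+ A are symmetric idempotents projecting onto R(A) and N(A):

```python
import numpy as np

# Illustrative rank-1 matrix in R^{2x3} (an assumption for this sketch).
A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0]])

Ap = np.linalg.pinv(A)
P_range = A @ Ap             # P_{R(A)}   = A A^+
P_null = np.eye(3) - Ap @ A  # P_{N(A)}   = I - A^+ A

for P in (P_range, P_null):
    assert np.allclose(P @ P, P)   # idempotent
    assert np.allclose(P, P.T)     # symmetric, hence orthogonal projection

# P_{R(A)} fixes the columns of A:
assert np.allclose(P_range @ A, A)
# (2, -1, 0)^T lies in N(A), so P_{N(A)} must fix it:
z = np.array([2.0, -1.0, 0.0])
assert np.allclose(P_null @ z, z)
```

The remaining two projections, I − A A^+ and A^+ A, follow by complementation, reproducing the direct sum decompositions of Example 7.8.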
since Px e U(P).xl. Let X n y. Then Pv = P(x + y) = Px = x. It is easy to check that X and 3^ are subspaces. then (/ . mental subspaces.3.. .P)v = = Pv.P)v. PN(A)J. 0 Definition 7. V Pv .uT. Write x = Px (I — P)x. Projections. Conversely. It is easy to check that X and Y are subspaces.P)v. Conversely.P2v = 0 so Y E y. Note that (/ . Hence that V X 0 y. Inner Product Spaces.P)x = yTTpT (I . A+A VIV{ r LViVT are easily checked to be (unique) orthogonal projections onto the respective four fundaare easily checked to be (unique) orthogonal projections onto the respective four fundamental subspaces. Moreover.XL Theorem 7..P)x 6 R(P)1and P must be an orthogonal projection. Conversely.V Theorems 5. Px E R(P). suppose symmetric projection matrix and let x be arbitrary. and Norms Chapter 7. First note that v E X. arbitrary. * called an orthogonal projecDefinition 7.AA+ U2 U ! LUiUT.1 7. Hence if v E X ny. Moreover. then Pv = 0.p 2 v 0 so y e Thus.P is the projection on Y along X. .1 The four fundamental orthogonal projections The four fundamental orthogonal projections Using the notation of Theorems 5. P Proof: Let P be an orthogonal projection (on X. Since Py E X. D 0 7.P) = 0. yy Ee jR" be arbitrary. PX. Since x and y were arbitrary. Then x T pT (I . (I . Inner Product Spaces. Thus. Let X = {v E V : Pv = v} Px = x = Pv. while Py = P(l .A+A V2V{ L i=r+l i=l 11 ViVf. We now prove and Y = {v E V : Pv = OJ. X 0 y and the projection on X along y is P. If v e y.1 5.5. Then Px = P2v = Pv = x so x E X. Let x = Pv. Then Pv = P(x + y) = Px = x. suppose p2 = P.P)x = x T P(l .
Example 7.6. Determine the orthogonal projection of a vector v ∈ ℝ^n on another nonzero vector w ∈ ℝ^n.

Solution: Think of the vector w as an element of the one-dimensional subspace R(w). Then the desired projection is simply

    P_{R(w)}v = ww⁺v = (ww^T/(w^T w))v  (using Example 4.8)  = ((w^T v)/(w^T w))w.

Moreover, the vector z that is orthogonal to w and such that v = Pv + z is given by

    z = P_{R(w)^⊥}v = (I − P_{R(w)})v = v − ((w^T v)/(w^T w))w.

See Figure 7.1. A direct calculation shows that z and w are, in fact, orthogonal.

Figure 7.1. Orthogonal projection on a "line."

Example 7.7. Recall the proof of Theorem 3.11. There, {v₁, ..., v_k} was an orthonormal basis for a subspace S of ℝ^n. An arbitrary vector x ∈ ℝ^n was chosen and a formula for x₁ appeared rather mysteriously. The expression for x₁ is simply the orthogonal projection of x on S. Specifically, x₁ = P_S x = Σ_{i=1}^{k} (vᵢ^T x)vᵢ.

Example 7.8. Recall the diagram of the four fundamental subspaces. The indicated direct sum decompositions of the domain ℝ^n and codomain ℝ^m are given easily as follows. Let x ∈ ℝ^n be an arbitrary vector. Then

    x = P_{N(A)^⊥}x + P_{N(A)}x = A⁺Ax + (I − A⁺A)x = V₁V₁^T x + V₂V₂^T x  (recall VV^T = I).
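Example 7.6 can be checked in a few lines. In this sketch the vectors v and w are sample data of my own choosing, not from the book:

```python
import numpy as np

v = np.array([3., 1., 2.])
w = np.array([1., 1., 0.])   # any nonzero direction works

Pv = (w @ v) / (w @ w) * w   # P_{R(w)} v = (w^T v / w^T w) w
z  = v - Pv                  # the part of v orthogonal to w

assert np.isclose(z @ w, 0.0)    # z is orthogonal to w
assert np.allclose(Pv + z, v)    # v = Pv + z
```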
Similarly, let y ∈ ℝ^m be an arbitrary vector. Then

    y = P_{R(A)}y + P_{R(A)^⊥}y = AA⁺y + (I − AA⁺)y = U₁U₁^T y + U₂U₂^T y  (recall UU^T = I).

For example, let

    A = [1 1 0; 1 1 0; 0 0 1].

Then

    A⁺ = [1/4 1/4 0; 1/4 1/4 0; 0 0 1],

and we can decompose the vector x = [2 3 4]^T uniquely into the sum of a vector in N(A)^⊥ and a vector in N(A), as follows:

    [2; 3; 4] = A⁺Ax + (I − A⁺A)x
              = [1/2 1/2 0; 1/2 1/2 0; 0 0 1][2; 3; 4] + [1/2 −1/2 0; −1/2 1/2 0; 0 0 0][2; 3; 4]
              = [5/2; 5/2; 4] + [−1/2; 1/2; 0].

7.2  Inner Product Spaces

Definition 7.9. Let V be a vector space over ℝ. Then ⟨·, ·⟩ : V × V → ℝ is a real inner product if

1. ⟨x, x⟩ ≥ 0 for all x ∈ V and ⟨x, x⟩ = 0 if and only if x = 0.
2. ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ V.
3. ⟨x, αy₁ + βy₂⟩ = α⟨x, y₁⟩ + β⟨x, y₂⟩ for all x, y₁, y₂ ∈ V and for all α, β ∈ ℝ.

Example 7.10. Let V = ℝ^n. Then ⟨x, y⟩ = x^T y is the "usual" Euclidean inner product or dot product.

Example 7.11. Let V = ℝ^n. Then ⟨x, y⟩_Q = x^T Qy, where Q = Q^T > 0 is an arbitrary n × n positive definite matrix, defines a "weighted" inner product.

Example 7.12. If A ∈ ℝ^{m×n}, then A^T ∈ ℝ^{n×m} is the unique linear transformation or map such that ⟨x, Ay⟩ = ⟨A^T x, y⟩ for all x ∈ ℝ^m and for all y ∈ ℝ^n.
It is easy to check that, with this more "abstract" definition of transpose, if the (i, j)th element of A is aᵢⱼ, then the (i, j)th element of A^T is aⱼᵢ. It can also be checked that all the usual properties of the transpose hold, such as (AB)^T = B^T A^T. However, the definition above allows us to extend the concept of transpose to the case of weighted inner products in the following way. Suppose A ∈ ℝ^{m×n} and let ⟨·, ·⟩_Q and ⟨·, ·⟩_R, with Q and R positive definite, be weighted inner products on ℝ^m and ℝ^n, respectively. Then we can define the "weighted transpose" A# as the unique map that satisfies

    ⟨x, Ay⟩_Q = ⟨A#x, y⟩_R  for all x ∈ ℝ^m and for all y ∈ ℝ^n.

By Example 7.12 above, we must then have x^T QAy = x^T (A#)^T Ry for all x, y. Hence we must have QA = (A#)^T R. Taking transposes (of the usual variety) gives A^T Q = RA#. Since R is nonsingular, we find

    A# = R⁻¹A^T Q.
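The formula A# = R⁻¹A^T Q is easy to confirm numerically. The sketch below (my own construction; the weight matrices are random symmetric positive definite samples) verifies the defining property ⟨x, Ay⟩_Q = ⟨A#x, y⟩_R:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))

# random symmetric positive definite weights (sample choices)
M = rng.standard_normal((m, m)); Q = M @ M.T + m * np.eye(m)
N = rng.standard_normal((n, n)); R = N @ N.T + n * np.eye(n)

A_sharp = np.linalg.solve(R, A.T @ Q)   # A# = R^{-1} A^T Q

x = rng.standard_normal(m)
y = rng.standard_normal(n)
lhs = x @ Q @ (A @ y)          # <x, Ay>_Q
rhs = (A_sharp @ x) @ R @ y    # <A# x, y>_R
assert np.isclose(lhs, rhs)
```

With Q = I and R = I the weighted transpose reduces to the ordinary transpose, as expected.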
We can also generalize the notion of orthogonality (x^T y = 0) to Q-orthogonality (Q is a positive definite matrix). Two vectors x, y ∈ ℝ^n are Q-orthogonal (or conjugate with respect to Q) if ⟨x, y⟩_Q = x^T Qy = 0. Q-orthogonality is an important tool used in studying conjugate direction methods in optimization theory.

Definition 7.14. Let V be a vector space over ℂ. Then ⟨·, ·⟩ : V × V → ℂ is a complex inner product if

1. ⟨x, x⟩ ≥ 0 for all x ∈ V and ⟨x, x⟩ = 0 if and only if x = 0.
2. ⟨x, y⟩ = ⟨y, x⟩* (the complex conjugate) for all x, y ∈ V.
3. ⟨x, αy₁ + βy₂⟩ = α⟨x, y₁⟩ + β⟨x, y₂⟩ for all x, y₁, y₂ ∈ V and for all α, β ∈ ℂ.

Remark 7.15. We could use the notation ⟨·, ·⟩_ℂ to denote a complex inner product, but if the vectors involved are complex-valued, the complex inner product is to be understood. Note, too, from part 2 of the definition, that ⟨x, x⟩ must be real for all x.

Remark 7.16. Note from parts 2 and 3 of Definition 7.14 that we have

    ⟨αx₁ + βx₂, y⟩ = α*⟨x₁, y⟩ + β*⟨x₂, y⟩,

where * denotes complex conjugation; i.e., the inner product is conjugate-linear in its first argument.
Remark 7.17. The Euclidean inner product of x, y ∈ ℂ^n is given by

    ⟨x, y⟩ = Σ_{i=1}^{n} xᵢ* yᵢ = x^H y.

The conventional definition of the complex Euclidean inner product is ⟨x, y⟩ = y^H x, but we use its complex conjugate x^H y here for symmetry with the real case.

Remark 7.18. A weighted inner product can be defined as in the real case by ⟨x, y⟩_Q = x^H Qy, for arbitrary Q = Q^H > 0. The notion of Q-orthogonality can be similarly generalized to the complex case.
Definition 7.19. A vector space (V, 𝔽) endowed with a specific inner product is called an inner product space. If 𝔽 = ℂ, we call V a complex inner product space. If 𝔽 = ℝ, we call V a real inner product space.

Example 7.20.
1. Check that V = ℝ^{n×n} with the inner product ⟨A, B⟩ = Tr A^T B is a real inner product space. Note that other choices are possible since by properties of the trace function, Tr A^T B = Tr B^T A = Tr AB^T = Tr BA^T.
2. Check that V = ℂ^{n×n} with the inner product ⟨A, B⟩ = Tr A^H B is a complex inner product space. Again, other choices are possible.

Definition 7.21. Let V be an inner product space. For v ∈ V, we define the norm (or length) of v by ‖v‖ = √⟨v, v⟩. This is called the norm induced by ⟨·, ·⟩.

Example 7.22.
1. If V = ℝ^n with the usual inner product, the induced norm is given by ‖v‖ = (Σ_{i=1}^{n} vᵢ²)^{1/2}.
2. If V = ℂ^n with the usual inner product, the induced norm is given by ‖v‖ = (Σ_{i=1}^{n} |vᵢ|²)^{1/2}.

Theorem 7.23. Let P be an orthogonal projection on an inner product space V. Then ‖Pv‖ ≤ ‖v‖ for all v ∈ V.
Proof: Since P is an orthogonal projection, P² = P = P#. (Here, the notation P# denotes the unique linear transformation that satisfies ⟨Pu, v⟩ = ⟨u, P#v⟩ for all u, v ∈ V. If this seems a little too abstract, consider V = ℝ^n (or ℂ^n), where P# is simply the usual P^T (or P^H).) Hence ⟨Pv, v⟩ = ⟨P²v, v⟩ = ⟨Pv, P#v⟩ = ⟨Pv, Pv⟩ = ‖Pv‖² ≥ 0. Now I − P is also a projection, so the above result applies and we get

    0 ≤ ⟨(I − P)v, v⟩ = ⟨v, v⟩ − ⟨Pv, v⟩ = ‖v‖² − ‖Pv‖²,

from which the theorem follows.  □

Definition 7.24. The norm induced on an inner product space by the "usual" inner product is called the natural norm.

In case V = ℂ^n or V = ℝ^n, the natural norm is also called the Euclidean norm. In the next section, other norms on these vector spaces are defined. A converse to the above procedure is also available. That is, given a norm defined by ‖x‖ = √⟨x, x⟩, an inner product can be defined via the following.
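Theorem 7.23 can be illustrated numerically: an orthogonal projection never increases the Euclidean norm. In this sketch (my own construction) the projection is onto a random two-dimensional subspace of ℝ⁵:

```python
import numpy as np

rng = np.random.default_rng(1)

B = rng.standard_normal((5, 2))
Qb, _ = np.linalg.qr(B)       # orthonormal basis for the column space of B
P = Qb @ Qb.T                 # orthogonal projection: P^2 = P = P^T

for _ in range(100):
    v = rng.standard_normal(5)
    # ||Pv|| <= ||v||  (Theorem 7.23)
    assert np.linalg.norm(P @ v) <= np.linalg.norm(v) + 1e-12
```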
Theorem 7.25 (Polarization Identity).

1. For x, y ∈ ℝ^n, an inner product is defined by

    ⟨x, y⟩ = x^T y = (‖x + y‖² − ‖x − y‖²)/4 = (‖x + y‖² − ‖x‖² − ‖y‖²)/2.

2. For x, y ∈ ℂ^n, an inner product is defined by

    ⟨x, y⟩ = x^H y = [(‖x + y‖² − ‖x − y‖²) + j(‖x − jy‖² − ‖x + jy‖²)]/4,

where j = i = √−1.
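As a sanity check on the complex polarization identity, the following sketch (NumPy, with random sample vectors of my own choosing) recovers x^H y from norms alone; `np.vdot` computes x^H y, conjugate-linear in the first argument:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

nrm2 = lambda z: np.linalg.norm(z) ** 2   # ||z||^2

ip = np.vdot(x, y)                        # x^H y
real_part = (nrm2(x + y) - nrm2(x - y)) / 4
imag_part = (nrm2(x - 1j * y) - nrm2(x + 1j * y)) / 4
assert np.isclose(ip, real_part + 1j * imag_part)
```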
7.3  Vector Norms

Definition 7.26. Let (V, 𝔽) be a vector space. Then ‖·‖ : V → ℝ is a vector norm if it satisfies the following three properties:

1. ‖x‖ ≥ 0 for all x ∈ V and ‖x‖ = 0 if and only if x = 0.
2. ‖αx‖ = |α| ‖x‖ for all x ∈ V and for all α ∈ 𝔽.
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ V.
(This is called the triangle inequality, as seen readily from the usual diagram illustrating the sum of two vectors in ℝ².)

Remark 7.27. It is convenient in the remainder of this section to state results for complex-valued vectors. The specialization to the real case is obvious.

Definition 7.28. A vector space (V, 𝔽) is said to be a normed linear space if and only if there exists a vector norm ‖·‖ : V → ℝ satisfying the three conditions of Definition 7.26.

Example 7.29.
1. For x ∈ ℂ^n, the Hölder norms, or p-norms, are defined by

    ‖x‖_p = (Σ_{i=1}^{n} |xᵢ|^p)^{1/p},  1 ≤ p ≤ +∞.

Special cases:
(a) ‖x‖₁ = Σ_{i=1}^{n} |xᵢ| (the "Manhattan" norm).
(b) ‖x‖₂ = (Σ_{i=1}^{n} |xᵢ|²)^{1/2} = (x^H x)^{1/2} (the Euclidean norm).
(c) ‖x‖_∞ = max_{1≤i≤n} |xᵢ| = lim_{p→+∞} ‖x‖_p.
(The second equality in (c) is a theorem that requires proof.)
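The special cases above, including the limiting behavior in (c), can be illustrated directly (a small sketch with a sample vector of my own choosing):

```python
import numpy as np

x = np.array([3., -4., 0., 1.])

one_norm = np.linalg.norm(x, 1)        # sum of |x_i|  ("Manhattan")
two_norm = np.linalg.norm(x, 2)        # sqrt(x^H x)   (Euclidean)
inf_norm = np.linalg.norm(x, np.inf)   # max |x_i|

assert one_norm == 8.0
assert np.isclose(two_norm, np.sqrt(26.0))
assert inf_norm == 4.0

# ||x||_p approaches ||x||_inf as p grows
assert np.isclose(np.linalg.norm(x, 100), inf_norm, atol=1e-2)
```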
2. Some weighted p-norms:
(a) ‖x‖_{1,D} = Σ_{i=1}^{n} dᵢ|xᵢ|, where dᵢ > 0.
(b) ‖x‖_{2,Q} = (x^H Qx)^{1/2}, where Q = Q^H > 0 (this norm is more commonly denoted ‖·‖_Q).

3. On the vector space (C[t₀, t₁], ℝ), define the vector norm

    ‖f‖ = max_{t₀≤t≤t₁} |f(t)|.

On the vector space ((C[t₀, t₁])^n, ℝ), define the vector norm

    ‖f‖_∞ = max_{t₀≤t≤t₁} ‖f(t)‖_∞.

Theorem 7.30 (Hölder Inequality). Let x, y ∈ ℂ^n. Then

    |x^H y| ≤ ‖x‖_p ‖y‖_q,  1/p + 1/q = 1.

A particular case of the Hölder inequality is of special interest.

Theorem 7.31 (Cauchy–Bunyakovsky–Schwarz Inequality). Let x, y ∈ ℂ^n. Then

    |x^H y| ≤ ‖x‖₂ ‖y‖₂

with equality if and only if x and y are linearly dependent.

Proof: Consider the matrix [x y] ∈ ℂ^{n×2}. Since

    [x y]^H [x y] = [x^H x  x^H y; y^H x  y^H y]

is a nonnegative definite matrix, its determinant must be nonnegative. In other words, 0 ≤ (x^H x)(y^H y) − (x^H y)(y^H x). Since y^H x is the complex conjugate of x^H y, we see immediately that |x^H y| ≤ ‖x‖₂‖y‖₂.  □

Note: This is not the classical algebraic proof of the Cauchy–Bunyakovsky–Schwarz (CBS) inequality (see, e.g., [20, p. 217]). However, it is particularly easy to remember.

Remark 7.32. The angle θ between two nonzero vectors x, y ∈ ℂ^n may be defined by cos θ = |x^H y|/(‖x‖₂‖y‖₂), 0 ≤ θ ≤ π/2. The CBS inequality is thus equivalent to the statement 0 ≤ cos θ ≤ 1.

Remark 7.33. Theorem 7.31 and Remark 7.32 are true for general inner product spaces.

Theorem 7.34. The norm ‖·‖₂ is unitarily invariant, i.e., if U ∈ ℂ^{n×n} is unitary, then ‖Ux‖₂ = ‖x‖₂ (Proof: ‖Ux‖₂² = x^H U^H Ux = x^H x = ‖x‖₂²). However, ‖·‖₁ and ‖·‖_∞
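The CBS inequality and the induced angle are easy to demonstrate with random sample vectors (my own sketch, not from the book):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(6)
y = rng.standard_normal(6)

lhs = abs(x @ y)
rhs = np.linalg.norm(x) * np.linalg.norm(y)
assert lhs <= rhs + 1e-12            # CBS inequality

cos_theta = lhs / rhs                # cos of the angle between x and y
assert 0.0 <= cos_theta <= 1.0

# equality holds when x and y are linearly dependent
z = 2.5 * x
assert np.isclose(abs(x @ z), np.linalg.norm(x) * np.linalg.norm(z))
```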
are not unitarily invariant. Similar remarks apply to the unitary invariance of norms of real vectors under orthogonal transformation.

Remark 7.35. If x, y ∈ ℂ^n are orthogonal, then we have the Pythagorean Identity

    ‖x ± y‖₂² = ‖x‖₂² + ‖y‖₂²,

the proof of which follows easily from ‖z‖₂² = z^H z.

Theorem 7.36. All norms on ℂ^n are equivalent, i.e., there exist constants c₁, c₂ (possibly depending on n) such that

    c₁‖x‖_α ≤ ‖x‖_β ≤ c₂‖x‖_α  for all x ∈ ℂ^n.

Example 7.37. For x ∈ ℂ^n, the following inequalities are all tight bounds, i.e., there exist vectors x for which equality holds:

    ‖x‖₁ ≤ √n ‖x‖₂,   ‖x‖₁ ≤ n ‖x‖_∞,   ‖x‖₂ ≤ ‖x‖₁,
    ‖x‖₂ ≤ √n ‖x‖_∞,  ‖x‖_∞ ≤ ‖x‖₁,    ‖x‖_∞ ≤ ‖x‖₂.

Finally, we conclude this section with a theorem about convergence of vectors. Convergence of a sequence of vectors to some limit vector can be converted into a statement about convergence of real numbers, i.e., convergence in terms of vector norms.

Theorem 7.38. Let ‖·‖ be a vector norm and suppose v, v⁽¹⁾, v⁽²⁾, ... ∈ ℂ^n. Then

    lim_{k→+∞} v⁽ᵏ⁾ = v  if and only if  lim_{k→+∞} ‖v⁽ᵏ⁾ − v‖ = 0.

7.4  Matrix Norms

In this section we introduce the concept of matrix norm. As with vectors, the motivation for using matrix norms is to have a notion of either the size of or the nearness of matrices. The former notion is useful for perturbation analysis, while the latter is needed to make sense of "convergence" of matrices. Attention is confined to the vector space (ℝ^{m×n}, ℝ) since that is what arises in the majority of applications. Extension to the complex case is straightforward and essentially obvious.

Definition 7.39. ‖·‖ : ℝ^{m×n} → ℝ is a matrix norm if it satisfies the following three properties:

1. ‖A‖ ≥ 0 for all A ∈ ℝ^{m×n} and ‖A‖ = 0 if and only if A = 0.
2. ‖αA‖ = |α| ‖A‖ for all A ∈ ℝ^{m×n} and for all α ∈ ℝ.
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖ for all A, B ∈ ℝ^{m×n}.
(As with vectors, this is called the triangle inequality.)
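Several of the tight bounds between vector norms are attained by the vector of all ones or by a unit coordinate vector. A minimal sketch (my own choice of vectors):

```python
import numpy as np

n = 5
ones = np.ones(n)                 # attains several of the bounds
e1 = np.zeros(n); e1[0] = 1.0     # a unit coordinate vector

n1   = lambda v: np.linalg.norm(v, 1)
n2   = lambda v: np.linalg.norm(v, 2)
ninf = lambda v: np.linalg.norm(v, np.inf)

assert np.isclose(n1(ones), np.sqrt(n) * n2(ones))    # ||x||_1 = sqrt(n) ||x||_2
assert np.isclose(n2(ones), np.sqrt(n) * ninf(ones))  # ||x||_2 = sqrt(n) ||x||_inf
assert np.isclose(n2(e1), n1(e1))                     # ||x||_2 = ||x||_1
assert np.isclose(ninf(e1), n2(e1))                   # ||x||_inf = ||x||_2
```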
Example 7.40. Let A ∈ ℝ^{m×n}. Then the Frobenius norm (or matrix Euclidean norm) is defined by

    ‖A‖_F = (Σ_{i=1}^{m} Σ_{j=1}^{n} |aᵢⱼ|²)^{1/2} = (Tr(A^T A))^{1/2} = (Tr(AA^T))^{1/2} = (σ₁² + ... + σ_r²)^{1/2}

(where r = rank(A)).

Example 7.41. Let A ∈ ℝ^{m×n}. Then the matrix p-norms are defined by

    ‖A‖_p = max_{x≠0} ‖Ax‖_p/‖x‖_p = max_{‖x‖_p=1} ‖Ax‖_p.

The following three special cases are important because they are "computable." Each is a theorem and requires a proof.

1. The "maximum column sum" norm is

    ‖A‖₁ = max_{1≤j≤n} Σ_{i=1}^{m} |aᵢⱼ|.

2. The "maximum row sum" norm is

    ‖A‖_∞ = max_{1≤i≤m} Σ_{j=1}^{n} |aᵢⱼ|.

3. The spectral norm is

    ‖A‖₂ = λ_max^{1/2}(A^T A) = λ_max^{1/2}(AA^T) = σ₁(A).

Note: ‖A⁺‖₂ = 1/σ_r(A), where r = rank(A).

Example 7.42. Let A ∈ ℝ^{m×n}. The Schatten p-norms are defined by

    ‖A‖_{S,p} = (σ₁^p + ... + σ_r^p)^{1/p}.

Some special cases of Schatten p-norms are equal to norms defined previously. For example, ‖·‖_{S,1} is often called the trace norm, ‖·‖_{S,2} = ‖·‖_F, and ‖·‖_{S,∞} = ‖·‖₂.

Example 7.43. Let A ∈ ℝ^{m×n}. Then "mixed" norms can also be defined by

    ‖A‖_{p,q} = max_{x≠0} ‖Ax‖_p/‖x‖_q.

Example 7.44. The "matrix analogue of the vector 1-norm," ‖A‖_S = Σ_{i,j} |aᵢⱼ|, is a norm.

The concept of a matrix norm alone is not altogether useful since it does not allow us to estimate the size of a matrix product AB in terms of the sizes of A and B individually.
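All of the "computable" norms above, and their Schatten counterparts, are available directly in NumPy; the sketch below (random sample matrix of my own choosing) checks the characterizations against the singular values:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending

fro  = np.linalg.norm(A, 'fro')
one  = np.linalg.norm(A, 1)              # max column sum
inf_ = np.linalg.norm(A, np.inf)         # max row sum
two  = np.linalg.norm(A, 2)              # sigma_1 (spectral norm)

assert np.isclose(fro, np.sqrt(np.trace(A.T @ A)))
assert np.isclose(fro, np.sqrt(np.sum(s**2)))        # Schatten 2-norm
assert np.isclose(one, np.abs(A).sum(axis=0).max())
assert np.isclose(inf_, np.abs(A).sum(axis=1).max())
assert np.isclose(two, s[0])                         # Schatten infinity-norm

trace_norm = s.sum()                                 # Schatten 1-norm
```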
Notice that this difficulty did not arise for vectors, although there are analogues for, e.g., inner products or outer products of vectors. We thus need the following definition.

Definition 7.45. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{n×k}. Then the norms ‖·‖_α, ‖·‖_β, and ‖·‖_γ are mutually consistent if ‖AB‖_α ≤ ‖A‖_β ‖B‖_γ. A matrix norm ‖·‖ is said to be consistent if ‖AB‖ ≤ ‖A‖ ‖B‖ whenever the matrix product is defined.

Example 7.46.
1. ‖·‖_F and ‖·‖_p for all p are consistent matrix norms.
2. The "mixed" norm ‖·‖_{∞,1} = max_{i,j} |aᵢⱼ| is a matrix norm but it is not consistent. For example, take A = B = [1 1; 1 1]. Then ‖AB‖_{∞,1} = 2 while ‖A‖_{∞,1}‖B‖_{∞,1} = 1.

The p-norms are examples of matrix norms that are subordinate to (or induced by) a vector norm, i.e.,

    ‖A‖ = max_{x≠0} ‖Ax‖/‖x‖ = max_{‖x‖=1} ‖Ax‖

(or, more generally, ‖A‖_{p,q} = max_{x≠0} ‖Ax‖_p/‖x‖_q). For such subordinate norms, also called operator norms, we clearly have ‖Ax‖ ≤ ‖A‖ ‖x‖. Since ‖ABx‖ ≤ ‖A‖ ‖Bx‖ ≤ ‖A‖ ‖B‖ ‖x‖, it follows that all subordinate norms are consistent.

Theorem 7.47. There exists a vector x* such that ‖Ax*‖ = ‖A‖ ‖x*‖ if the matrix norm is subordinate to the vector norm.

Theorem 7.48. If ‖·‖_m is a consistent matrix norm, there exists a vector norm ‖·‖_v consistent with it, i.e., ‖Ax‖_v ≤ ‖A‖_m ‖x‖_v.

Not every consistent matrix norm is subordinate to a vector norm. For example, consider ‖·‖_F. Then ‖Ax‖₂ ≤ ‖A‖_F ‖x‖₂, so ‖·‖₂ is consistent with ‖·‖_F, but there does not exist a vector norm ‖·‖ such that ‖A‖_F is given by max_{x≠0} ‖Ax‖/‖x‖.

Useful Results

The following miscellaneous results about matrix norms are collected for future reference. The interested reader is invited to prove each of them as an exercise.

1. ‖Iₙ‖_p = 1 for all p, while ‖Iₙ‖_F = √n.
2. For A ∈ ℝ^{m×n}, the following inequalities are all tight, i.e., there exist matrices A for which equality holds:

    ‖A‖₁ ≤ √m ‖A‖₂,    ‖A‖₁ ≤ m ‖A‖_∞,    ‖A‖₁ ≤ √m ‖A‖_F,
    ‖A‖₂ ≤ √n ‖A‖₁,    ‖A‖₂ ≤ √m ‖A‖_∞,   ‖A‖₂ ≤ ‖A‖_F,
    ‖A‖_∞ ≤ n ‖A‖₁,    ‖A‖_∞ ≤ √n ‖A‖₂,   ‖A‖_∞ ≤ √n ‖A‖_F,
    ‖A‖_F ≤ √n ‖A‖₁,   ‖A‖_F ≤ √m ‖A‖_∞,  ‖A‖_F ≤ √n ‖A‖₂.
3. For A ∈ ℝ^{m×n},

    max_{i,j} |aᵢⱼ| ≤ ‖A‖₂ ≤ √(mn) max_{i,j} |aᵢⱼ|.

4. The norms ‖·‖_F and ‖·‖₂ (as well as all the Schatten p-norms, but not necessarily other p-norms) are unitarily invariant, i.e., for all A ∈ ℝ^{m×n} and for all orthogonal matrices Q ∈ ℝ^{m×m} and Z ∈ ℝ^{n×n}, ‖QAZ‖_α = ‖A‖_α for α = 2 or F.

Convergence

The following theorem uses matrix norms to convert a statement about convergence of a sequence of matrices into a statement about the convergence of an associated sequence of scalars.

Theorem 7.49. Let ‖·‖ be a matrix norm and suppose A, A⁽¹⁾, A⁽²⁾, ... ∈ ℝ^{m×n}. Then

    lim_{k→+∞} A⁽ᵏ⁾ = A  if and only if  lim_{k→+∞} ‖A⁽ᵏ⁾ − A‖ = 0.

EXERCISES

1. If P is an orthogonal projection, prove that P⁺ = P.
2. Suppose P and Q are orthogonal projections and P + Q = I. Prove that P − Q must be an orthogonal matrix.
3. Prove that I − A⁺A is an orthogonal projection. Also, prove directly that V₂V₂^T is an orthogonal projection, where V₂ is defined as in Theorem 5.1.
4. Suppose that a matrix A ∈ ℝ^{m×n} has linearly independent columns. Prove that the orthogonal projection onto the space spanned by these column vectors is given by the matrix P = A(A^T A)⁻¹A^T.
5. Find the (orthogonal) projection of the vector [2 3 4]^T onto the subspace of ℝ³ spanned by the plane 3x − y + 2z = 0.
6. Prove that ℝ^{n×n} with the inner product ⟨A, B⟩ = Tr A^T B is a real inner product space.
7. Show that the matrix norms ‖·‖₂ and ‖·‖_F are unitarily invariant.
8. Definition: Let A ∈ ℝ^{n×n} and denote its set of eigenvalues (not necessarily distinct) by {λ₁, ..., λₙ}. The spectral radius of A is the scalar

    ρ(A) = maxᵢ |λᵢ|.
Let

    A = [14  0
         12  5].

Determine ‖A‖_F, ‖A‖₁, ‖A‖₂, ‖A‖_∞, and ρ(A).

9. Let

    A = [4 9 2
         3 5 7
         8 1 6].

Determine ‖A‖_F, ‖A‖₁, ‖A‖₂, ‖A‖_∞, and ρ(A). (An n × n matrix, all of whose columns and rows as well as main diagonal and antidiagonal sum to s = n(n² + 1)/2, is called a "magic square" matrix. If M is a magic square matrix, it can be proved that ‖M‖_p = s for all p.)

10. Let A = xy^T, where both x, y ∈ ℝ^n are nonzero. Determine ‖A‖_F, ‖A‖₁, ‖A‖₂, and ‖A‖_∞ in terms of ‖x‖_α and/or ‖y‖_β, where α and β take the value 1, 2, or ∞ as appropriate.
Chapter 8

Linear Least Squares Problems

8.1 The Linear Least Squares Problem

Problem: Suppose A ∈ R^{m×n} with m ≥ n and b ∈ R^m is a given vector. The linear least squares problem consists of finding an element of the set

    X = {x ∈ R^n : ρ(x) = ||Ax − b||_2 is minimized}.

Solution: The set X has a number of easily verified properties:

1. A vector x ∈ X if and only if A^T r = 0, where r = b − Ax is the residual associated with x. The equations A^T r = 0 can be rewritten in the form A^T Ax = A^T b, and the latter form is commonly known as the normal equations, i.e., x ∈ X if and only if x is a solution of the normal equations. For further details, see Section 8.2.

2. A vector x ∈ X if and only if x is of the form

    x = A^+ b + (I − A^+ A)y,  (8.1)

where y ∈ R^n is arbitrary. To see why this must be so, write the residual r in the form

    r = (b − P_{R(A)} b) + (P_{R(A)} b − Ax).

Now, (P_{R(A)} b − Ax) is clearly in R(A), while

    (b − P_{R(A)} b) = (I − P_{R(A)}) b = P_{R(A)^⊥} b ∈ R(A)^⊥,

so these two vectors are orthogonal. Hence,

    ||r||_2^2 = ||b − Ax||_2^2 = ||b − P_{R(A)} b||_2^2 + ||P_{R(A)} b − Ax||_2^2

from the Pythagorean identity (Remark 7.35). Thus, ||Ax − b||_2^2 (and hence ρ(x) = ||Ax − b||_2) assumes its minimum value if and only if

    Ax = P_{R(A)} b  (8.2)
and this equation always has a solution since AA^+ b ∈ R(A). By Theorem 6.3, all solutions of (8.2) are of the form

    x = A^+ AA^+ b + (I − A^+ A)y
      = A^+ b + (I − A^+ A)y,

where y ∈ R^n is arbitrary. The minimum value of ρ(x) is then clearly equal to

    ||b − P_{R(A)} b||_2 = ||(I − AA^+) b||_2 ≤ ||b||_2,

the last inequality following by Theorem 7.23.

3. X is convex. To see why, consider two arbitrary vectors x_1 = A^+ b + (I − A^+ A)y and x_2 = A^+ b + (I − A^+ A)z in X. Let θ ∈ [0, 1]. Then the convex combination θx_1 + (1 − θ)x_2 = A^+ b + (I − A^+ A)(θy + (1 − θ)z) is clearly in X.

4. X has a unique element x* of minimal 2-norm. In fact, x* = A^+ b is the unique vector that solves this "double minimization" problem, i.e., x* minimizes the residual ρ(x) and is the vector of minimum 2-norm that does so. This follows immediately from convexity or directly from the fact that all x ∈ X are of the form (8.1) and

    ||x||_2^2 = ||A^+ b||_2^2 + ||(I − A^+ A)y||_2^2 ≥ ||A^+ b||_2^2,

which follows since the two vectors are orthogonal.

5. There is a unique solution to the least squares problem, i.e., X = {x*} = {A^+ b}, if and only if A^+ A = I or, equivalently, if and only if rank(A) = n.

Just as for the solution of linear equations, we can generalize the linear least squares problem to the matrix case.

Theorem 8.1. Let A ∈ R^{m×n} and B ∈ R^{m×k}. The general solution to

    min_{X ∈ R^{n×k}} ||AX − B||_2

is of the form

    X = A^+ B + (I − A^+ A)Y,

where Y ∈ R^{n×k} is arbitrary. The unique solution of minimum 2-norm or F-norm is X = A^+ B.

Remark 8.2. Notice that solutions of the linear least squares problem look exactly the same as solutions of the linear system AX = B. The only difference is that in the case of linear least squares solutions, there is no "existence condition" such as R(B) ⊆ R(A). If the existence condition happens to be satisfied, then equality holds and the least squares
residual is 0. Of all solutions that give a residual of 0, the unique solution X = A^+ B has minimum 2-norm or F-norm.

Remark 8.3. If we take B = I_m in Theorem 8.1, then X = A^+ can be interpreted as saying that the Moore–Penrose pseudoinverse of A is the best (in the matrix 2-norm sense) matrix such that AX approximates the identity.

Remark 8.4. Many other interesting and useful approximation results are available for the matrix 2-norm (and F-norm). One such is the following. Let A ∈ R_r^{m×n} with SVD
    A = UΣV^T = Σ_{i=1}^r σ_i u_i v_i^T.

Then a best rank-k approximation to A for 1 ≤ k ≤ r, i.e., a solution to

    min_{M ∈ R_k^{m×n}} ||A − M||_2,

is given by

    M_k = Σ_{i=1}^k σ_i u_i v_i^T.

The special case in which m = n and k = n − 1 gives a nearest singular matrix to A ∈ R_n^{n×n}.
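Remark 8.4 can be checked by hand in the 2×2 symmetric positive semidefinite case, where the SVD coincides with the eigendecomposition. The sketch below (pure Python; the matrix [3 1; 1 3] is invented for the illustration) forms the best rank-1 approximation M_1 = σ_1 u_1 u_1^T and the predicted minimum error σ_2:

```python
import math

def best_rank1_sym2x2(a11, a12, a22):
    """Best rank-1 (2-norm) approximation M1 = sigma1 * u1 u1^T for a
    symmetric positive semidefinite 2x2 matrix [[a11, a12], [a12, a22]],
    whose SVD coincides with its eigendecomposition."""
    # Eigenvalues via the quadratic formula on the characteristic polynomial.
    tr, det = a11 + a22, a11 * a22 - a12 * a12
    disc = math.sqrt(tr * tr - 4.0 * det)
    lam1, lam2 = (tr + disc) / 2.0, (tr - disc) / 2.0   # lam1 >= lam2 >= 0
    # Unit eigenvector u1 for lam1 (valid when a12 != 0).
    u1 = (a12, lam1 - a11)
    n = math.hypot(*u1)
    u1 = (u1[0] / n, u1[1] / n)
    # M1 = lam1 * u1 u1^T; the minimum 2-norm error ||A - M1||_2 is lam2.
    M1 = [[lam1 * u1[i] * u1[j] for j in range(2)] for i in range(2)]
    return M1, lam2

# A = [[3, 1], [1, 3]] has eigenvalues (singular values) 4 and 2.
M1, err = best_rank1_sym2x2(3.0, 1.0, 3.0)
```

Here M_1 = [2 2; 2 2] and the residual A − M_1 = [1 −1; −1 1] has 2-norm σ_2 = 2, as Remark 8.4 predicts.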
8.2 Geometric Solution
Looking at the schematic provided in Figure 8.1, it is apparent that minimizing ||Ax − b||_2 is equivalent to finding the vector x ∈ R^n for which p = Ax is closest to b (in the Euclidean norm sense). Clearly, r = b − Ax must be orthogonal to R(A). Thus, if Ay is an arbitrary vector in R(A) (i.e., y is arbitrary), we must have
    0 = (Ay)^T (b − Ax) = y^T A^T (b − Ax) = y^T (A^T b − A^T Ax).
Since y is arbitrary, we must have A^T b − A^T Ax = 0 or A^T Ax = A^T b.

Special case: If A is full (column) rank, then x = (A^T A)^{-1} A^T b.
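A minimal numerical sketch of the full-rank special case, using an invented 3×2 example and the explicit inverse of the 2×2 matrix A^T A:

```python
def lstsq_2col(A, b):
    """Solve min ||Ax - b||_2 for an m x 2 full-column-rank A via the
    normal equations A^T A x = A^T b, using the explicit 2x2 inverse."""
    g11 = sum(row[0] * row[0] for row in A)          # (A^T A)_11
    g12 = sum(row[0] * row[1] for row in A)          # (A^T A)_12 = (A^T A)_21
    g22 = sum(row[1] * row[1] for row in A)          # (A^T A)_22
    h1 = sum(row[0] * bi for row, bi in zip(A, b))   # (A^T b)_1
    h2 = sum(row[1] * bi for row, bi in zip(A, b))   # (A^T b)_2
    det = g11 * g22 - g12 * g12                      # nonzero iff rank(A) = 2
    return [(g22 * h1 - g12 * h2) / det, (g11 * h2 - g12 * h1) / det]

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
x = lstsq_2col(A, b)
# The residual r = b - Ax satisfies A^T r = 0, as derived above.
r = [bi - (row[0] * x[0] + row[1] * x[1]) for row, bi in zip(A, b)]
```

For this data x = [4/3, 7/3], and both components of A^T r vanish, confirming that the residual is orthogonal to R(A).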
8.3 Linear Regression and Other Linear Least Squares Problems

8.3.1 Example: Linear regression
Suppose we have m measurements (t_1, y_1), ..., (t_m, y_m) for which we hypothesize a linear (affine) relationship

    y = αt + β.  (8.3)
Figure 8.1. Projection of b on R(A).
For certain constants α and β. One way to solve this problem is to find the line that best fits the data in the least squares sense; i.e., with the model (8.3), we have

    y_1 = αt_1 + β + δ_1,
    y_2 = αt_2 + β + δ_2,
        ⋮
    y_m = αt_m + β + δ_m,

where δ_1, ..., δ_m are "errors" and we wish to minimize δ_1^2 + ··· + δ_m^2. Geometrically, we are trying to find the best line that minimizes the (sum of squares of the) distances from the given data points. See, for example, Figure 8.2.
Figure 8.2. Simple linear regression.
Note that distances are measured in the vertical sense from the points to the line (as indicated, for example, for the point (t_1, y_1)). However, other criteria are possible. For example, one could measure the distances in the horizontal sense, or the perpendicular distance from the points to the line could be used. The latter is called total least squares. Instead of 2-norms, one could also use 1-norms or ∞-norms. The latter two are computationally
much more difficult to handle, and thus we present only the more tractable 2-norm case in the text that follows.

The m "error equations" can be written in matrix form as

    y = Ax + δ,

where

    y = [y_1, ..., y_m]^T,  A = [t_1 1; t_2 1; ...; t_m 1],  x = [α; β],  δ = [δ_1, ..., δ_m]^T.

We then want to solve the problem

    min_x δ^T δ = min_x (Ax − y)^T (Ax − y)

or, equivalently,

    min_x ||δ||_2^2 = min_x ||Ax − y||_2^2.  (8.4)

Solution: x = [α; β] is a solution of the normal equations A^T Ax = A^T y where, for the special form of the matrices above, we have

    A^T A = [Σ_i t_i^2  Σ_i t_i;  Σ_i t_i  m]

and

    A^T y = [Σ_i t_i y_i;  Σ_i y_i].

The solution for the parameters α and β can then be written

    [α; β] = (A^T A)^{-1} A^T y.
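These closed-form expressions can be sketched in a few lines of Python (the three data points are invented for the illustration):

```python
def linreg(ts, ys):
    """Fit y = alpha*t + beta by solving the 2x2 normal equations
    [sum t_i^2, sum t_i; sum t_i, m] [alpha; beta] = [sum t_i y_i; sum y_i]
    with Cramer's rule."""
    m = len(ts)
    st = sum(ts)
    stt = sum(t * t for t in ts)
    sy = sum(ys)
    sty = sum(t * y for t, y in zip(ts, ys))
    det = stt * m - st * st            # nonzero unless all t_i coincide
    alpha = (m * sty - st * sy) / det
    beta = (stt * sy - st * sty) / det
    return alpha, beta

alpha, beta = linreg([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

For this data the best line is y = 1.5 t − 2/3.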
8.3.2 Other least squares problems
Suppose the hypothesized model is not the linear equation (8.3) but rather is of the form

    y = f(t) = c_1 φ_1(t) + ··· + c_n φ_n(t).  (8.5)

In (8.5) the φ_i(t) are given (basis) functions and the c_i are constants to be determined to minimize the least squares error. The matrix problem is still (8.4), where we now have

    A = [φ_1(t_1) ··· φ_n(t_1); ⋮ ; φ_1(t_m) ··· φ_n(t_m)],  x = [c_1, ..., c_n]^T.
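The basis-function formulation can be exercised with a short script. The sketch below (invented data; monomial basis functions chosen for the example) forms the normal equations A^T A c = A^T y and solves them by Gaussian elimination:

```python
def fit_basis(ts, ys, phis):
    """Least squares fit y ~ sum_j c_j * phis[j](t): form the normal
    equations A^T A c = A^T y and solve them by Gaussian elimination
    with partial pivoting."""
    n = len(phis)
    A = [[phi(t) for phi in phis] for t in ts]
    G = [[sum(A[k][i] * A[k][j] for k in range(len(ts))) for j in range(n)]
         for i in range(n)]                                   # G = A^T A
    h = [sum(A[k][i] * ys[k] for k in range(len(ts))) for i in range(n)]
    for col in range(n):                                      # eliminate below diagonal
        piv = max(range(col, n), key=lambda r: abs(G[r][col]))
        G[col], G[piv] = G[piv], G[col]
        h[col], h[piv] = h[piv], h[col]
        for r in range(col + 1, n):
            f = G[r][col] / G[col][col]
            for c2 in range(col, n):
                G[r][c2] -= f * G[col][c2]
            h[r] -= f * h[col]
    c = [0.0] * n                                             # back substitution
    for i in range(n - 1, -1, -1):
        c[i] = (h[i] - sum(G[i][j] * c[j] for j in range(i + 1, n))) / G[i][i]
    return c

# Quadratic fit with the monomial basis 1, t, t^2; the data lie exactly
# on y = 1 + t^2, so the least squares solution recovers it.
c = fit_basis([0.0, 1.0, 2.0], [1.0, 2.0, 5.0],
              [lambda t: 1.0, lambda t: t, lambda t: t * t])
```

As the next paragraph notes, the monomial basis is convenient but can be numerically ill conditioned for larger n; this sketch is only meant to make the setup of A and the normal equations concrete.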
An important special case of (8.5) is least squares polynomial approximation, which corresponds to choosing φ_i(t) = t^{i−1}, i ∈ n, although this choice can lead to computational
difficulties because of numerical ill conditioning for large n. Numerically better approaches are based on orthogonal polynomials, piecewise polynomial functions, splines, etc. [4], [11], [23].

The key feature in (8.5) is that the coefficients c_i appear linearly. The basis functions φ_i can be arbitrarily nonlinear. Sometimes a problem in which the c_i's appear nonlinearly can be converted into a linear problem. For example, if the fitting function is of the form y = f(t) = c_1 e^{c_2 t}, then taking logarithms yields the equation log y = log c_1 + c_2 t. Then defining ȳ = log y, c̄_1 = log c_1, and c̄_2 = c_2 results in a standard linear least squares problem.

8.4 Least Squares and Singular Value Decomposition

In the numerical linear algebra literature (e.g., [4], [7], [11], [23]), it is shown that solution of linear least squares problems via the normal equations can be a very poor numerical method in finite-precision arithmetic. Since the standard Kalman filter essentially amounts to sequential updating of normal equations, it can be expected to exhibit such poor numerical behavior in practice (and it does). Better numerical methods are based on algorithms that work directly and solely on A itself rather than A^T A. Two basic classes of algorithms are based on SVD and QR (orthogonal–upper triangular) factorization, respectively. The former is much more expensive but is generally more reliable and offers considerable theoretical insight.

In this section we investigate solution of the linear least squares problem

    min_x ||Ax − b||_2,  A ∈ R^{m×n},  b ∈ R^m,  (8.6)

via the SVD. Specifically, as in Theorem 5.1, we assume that A has an SVD given by A = UΣV^T = U_1 S V_1^T. We now note that

    ||Ax − b||_2^2 = ||UΣV^T x − b||_2^2
                  = ||ΣV^T x − U^T b||_2^2     since || · ||_2 is unitarily invariant
                  = ||Σz − c||_2^2             where z = V^T x, c = U^T b
                  = || [S 0; 0 0][z_1; z_2] − [c_1; c_2] ||_2^2
                  = || [S z_1 − c_1; −c_2] ||_2^2
                  = ||S z_1 − c_1||_2^2 + ||c_2||_2^2.

The last equality follows from the fact that if v = [v_1; v_2], then ||v||_2^2 = ||v_1||_2^2 + ||v_2||_2^2 (note that orthogonality is not what is used here; the subvectors can have different lengths). This explains why it is convenient to work above with the square of the norm rather than the norm. As far as the minimization is concerned, the two are equivalent.

The quantity above is clearly minimized by taking z_1 = S^{-1} c_1. The subvector z_2 is arbitrary, while the minimum value of ||Ax − b||_2^2 is ||c_2||_2^2.
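To make the reduction concrete, consider a toy problem (numbers invented) in which A is already of the form [S; 0], so that U = I and V = I can be taken in the SVD; then z_1 = S^{-1} c_1 directly, and the trailing components of c form the irreducible residual:

```python
import math

def diag_lstsq(sigmas, b):
    """Minimize ||[S; 0] z - b||_2 where S = diag(sigmas): the optimal
    leading block is z1 = S^{-1} c1, and the trailing components of b
    (i.e., c2) form the minimum residual ||c2||_2."""
    r = len(sigmas)
    z1 = [b[i] / sigmas[i] for i in range(r)]
    resid = math.sqrt(sum(bi * bi for bi in b[r:]))
    return z1, resid

# With A = [[3, 0], [0, 2], [0, 0]] (so S = diag(3, 2)) and b = [6, 4, 5],
# the minimizer is x = z1 = [2, 2] and the minimum residual is |5| = 5.
x, resid = diag_lstsq([3.0, 2.0], [6.0, 4.0, 5.0])
```

No choice of x can reduce the third component of the residual, which is exactly the ||c_2||_2 term in the derivation above.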
Now transform back to the original coordinates:

    x = Vz = [V_1 V_2][z_1; z_2]
           = V_1 z_1 + V_2 z_2
           = V_1 S^{-1} c_1 + V_2 z_2
           = V_1 S^{-1} U_1^T b + V_2 z_2.

The last equality follows from

    c = U^T b = [U_1^T b; U_2^T b] = [c_1; c_2].

Note that since z_2 is arbitrary, x has been written in the form x = A^+ b + (I − A^+ A)y, where y ∈ R^n is arbitrary. This follows easily since V_2 z_2 is an arbitrary vector in R(V_2) = N(A).

The minimum value of the least squares residual is ||c_2||_2 = ||U_2^T b||_2, and we clearly have that

    minimum least squares residual is 0
    ⟺ b is orthogonal to all vectors in U_2
    ⟺ b is orthogonal to all vectors in R(A)^⊥
    ⟺ b ∈ R(A).

Another expression for the minimum residual is ||(I − AA^+) b||_2. This follows easily since

    ||(I − AA^+) b||_2^2 = ||U_2 U_2^T b||_2^2 = b^T U_2 U_2^T U_2 U_2^T b = b^T U_2 U_2^T b = ||U_2^T b||_2^2.

Finally, an important special case of the linear least squares problem is the so-called full-rank problem, i.e., A ∈ R_n^{m×n}. In this case the SVD of A is given by A = UΣV^T = [U_1 U_2][S; 0]V_1^T, and there is thus "no V_2 part" to the solution.

8.5 Least Squares and QR Factorization

In this section, we again look at the solution of the linear least squares problem (8.6)

    min_x ||Ax − b||_2,  A ∈ R^{m×n},  b ∈ R^m,

but this time in terms of the QR factorization. This matrix factorization is much cheaper to compute than an SVD and, with appropriate numerical enhancements, can be quite reliable. To simplify the exposition, we add the simplifying assumption that A has full column rank, i.e., A ∈ R_n^{m×n}. It is then possible, via a sequence of so-called Householder or Givens transformations, to reduce A in the following way. A finite sequence of simple orthogonal row transformations (of Householder or Givens type) can be performed on A to reduce it to triangular form. If we label the product of such orthogonal row transformations as the orthogonal matrix Q^T ∈ R^{m×m}, we have

    Q^T A = [R; 0]  (8.7)
where R ∈ R_n^{n×n} is upper triangular. Now write Q = [Q_1 Q_2], where Q_1 ∈ R^{m×n} and Q_2 ∈ R^{m×(m−n)}. Both Q_1 and Q_2 have orthonormal columns. Multiplying through by Q in (8.7), we see that

    A = Q[R; 0]  (8.8)
      = [Q_1 Q_2][R; 0]
      = Q_1 R.  (8.9)

Any of (8.7), (8.8), or (8.9) are variously referred to as QR factorizations of A. Note that (8.9) is essentially what is accomplished by the Gram–Schmidt process, i.e., by writing AR^{-1} = Q_1 we see that a "triangular" linear combination (given by the coefficients of R^{-1}) of the columns of A yields the orthonormal columns of Q_1.

Now note that

    ||Ax − b||_2^2 = ||Q^T Ax − Q^T b||_2^2     since || · ||_2 is unitarily invariant
                  = || [R; 0]x − [c_1; c_2] ||_2^2
                  = ||Rx − c_1||_2^2 + ||c_2||_2^2.

The last quantity above is clearly minimized by taking x = R^{-1} c_1 and the minimum residual is ||c_2||_2. Equivalently, we have x = R^{-1} Q_1^T b = A^+ b and the minimum residual is ||Q_2^T b||_2.

EXERCISES

1. For A ∈ R^{m×n}, b ∈ R^m, and any y ∈ R^n, check directly that (I − A^+ A)y and A^+ b are orthogonal vectors.

2. Consider the following set of measurements (x_i, y_i): (1, …), (2, 2), (3, …).

(a) Find the best (in the 2-norm sense) line of the form y = αx + β that fits this data.

(b) Find the best (in the 2-norm sense) line of the form x = αy + β that fits this data.

3. Suppose q_1 and q_2 are two orthonormal vectors and b is a fixed vector, all in R^n.

(a) Find the optimal linear combination αq_1 + βq_2 that is closest to b (in the 2-norm sense).

(b) Let r denote the "error vector" b − αq_1 − βq_2. Show that r is orthogonal to both q_1 and q_2.
4. Find all solutions of the linear least squares problem

    min_x ||Ax − b||_2

when A = [~ ~] and b = [~].

5. Consider the problem of finding the minimum 2-norm solution of the linear least squares problem

    min_x ||Ax − b||_2

when A = [~ ~] and b = [~]. The solution is x* = A^+ b.

(a) Consider a perturbation E_1 = [~] of A, where δ is a small positive number. Solve the perturbed version of the above problem,

    min_y ||A_1 y − b||_2,

where A_1 = A + E_1. What happens to ||x* − y||_2 as δ approaches 0?

(b) Now consider the perturbation E_2 = [~] of A, where again δ is a small positive number. Solve the perturbed problem

    min_z ||A_2 z − b||_2,

where A_2 = A + E_2. What happens to ||x* − z||_2 as δ approaches 0?

6. Use the four Penrose conditions and the fact that Q_1 has orthonormal columns to verify that if A ∈ R_n^{m×n} can be factored in the form (8.9), then A^+ = R^{-1} Q_1^T.

7. Let A ∈ R^{n×n}, not necessarily nonsingular, and suppose A = QR, where Q is orthogonal. Prove that A^+ = R^+ Q^T.
4. example. The 2nonn is the most common a — \j'. we see immediately that XH is a left eigenvector of A H associated with A.1) Similarly. such that a scalar A E e.2. Thus. (9. we use both forms throughout the text.Chapter 9 Chapter 9 Eigenvalues and Eigenvalues and Eigenvectors Eigenvectors 9.A).1). For any A e Cnxn . Then n(A) A2 + 2A 3. then It can be proved from elementary properties of detenninants that if A e enxn . It can be proved easily from the Jordan canonical form to be discussed in the text to follow (see. the Fundamental Theorem of Algebra says that x 75 . for proved easily from the Jordan canonical fonn to be discussed in the text to follow (see. as a matter of convenience. Theorem 9.3 (CayleyHamilton).31 = 0.1 9.) throughout the text. Let A = [~g ~g].1). The polynomial n (A) = det(A—A.t so that the scaled eigenvector has norm 1. then so is ax [ay] for any nonzero scalar a E C. Let A [~ ~]. Note that if x [y] is a right [left] eigenvector of A. we use both forms results in at most a change of sign and. Note that if x [y] is a right [left] eigenvector of A. A nonzero vector x E en is a right eigenvector of A E e nxn if there exists Definition 9./ — A). Thus. a nonzero vector y e C" is a left eigenvector corresponding to an eigenvalue a if Mif (9. This of A. The polynomialn (A.1 Fundamental Definitions and Properties Fundamental Definitions and Properties Definition 9.2.1. Then n(k) = X2 + 2A. such that Ax = AX.1. for example. [21]) or directly using elementary properties of inverses and determinants (see. n(A) = 0. The 2norm is the most common nonn used for such scaling. e C. This results in at most a change of sign and.4.3 (CayleyHamilton). norm used for such scaling. It is an easy exercise to Example 9. called an eigenvalue. for example. called an eigenvalue.— 3. For any A E e nxn . Example 9.Al) is called the characteristic polynomial Definition 9./) is called the characteristic polynomial of A. then n(A) is a polynomial of degree n. 
then vector of AH associated with I. [3]).} The following classical theorem can be very useful in hand calculation. It can be The following classical theorem can be very useful in hand calculation. (Note that the characteristic polynomial can also be defined as det(A.2) By taking Hennitian transposes in (9. It can be proved from elementary properties of determinants that if A E C" ". (Note that the characteristic polynomial can also be defined as det(Al . we see immediately that x H is a left eigenBy taking Hermitian transposes in (9. A nonzero vector x e C" is a right eigenvector of A e Cnxn if there exists a scalar A. [3]). [21D or directly using elementary properties of inverses and determinants (see.) = det (A .. Definition 9. verify that n(A) = A2 2A . It is an easy exercise to 2 verify that n(A) = A + 2A . the Fundamental Theorem of Algebra says that 7t (X) is a polynomial of degree n. Theorem 9. n(A) = O. for example. as a matter of convenience. One oftenused scaling for an eigenvector is One oftenused scaling for an eigenvector is so is ax [ay] for any nonzero scalar a E a = 1/ IIx II so that the scaled eigenvector has nonn 1.31 O. a nonzero vector y E en is a left eigenvector corresponding to an eigenvalue Similarly.
too. Note. A matrix A e 1Ft x" is said to be defective if it has an eigenvalue whose geometric multiplicity is not equal to (i.e. These roots.5. then I < dimA/"(A — A/) < m.7. then A satisfies (Je — I)2 = 0. A is said to be defective if it does not have n linearly independent (right or left) eigenvectors. possibly repeated. • AM(see and set X = 0 in this identity. such a polynomial is said to be monic and we generally write et (A) as a monic polynomial throughout the text). we know that n(A) = 0. For example. Eigenvalues and Eigenvectors Chapter 9. then there is an easily checked relationship between the left and right If A € R"x". For example. Definition 9.5..AI) :::: m. then y is a right eigenvector of AT corresponding to I € A (A).ft Definition 5. but that A(A) = A(A) only if A e R"x". However.e. must occur in complex conjugate pairs. eigenvalues of A.76 Chapter 9.3) are the eigenvalues of A and imply the singularity of the matrix A . as solutions of the determinant equation 7r(A) has n roots.e.2. we get the interesting fact that del (A) = A] • A2 • • An (see also Theorem 9. then n(X) has real coefficients.4) and set A = 0 in this identity.2aA + aa2+ f322 and A has eigenvalues a f3j (where A has eigenvalues a ± fij (where j = i = R).. Xn. Then n(A) = A22.3) in the n(A) = det(A . Thll minimal polynomial of A G l!if. less than) its algebraic multiplicity. such a polynomial is said to be monic and we of the highest power of A to be +1..6. . However. if eigenvectors of A and AT (take Hermitian transposes of both sides of (9. From the CayleyHamilton Theorem. But it also clearly satisfies the smaller degree polynomial equation isfies (1 .rank(A .AI). if left of A A E A(A). Equivalently. if A = \1Q ®]. then we must have 1 :::: g :::: m. if A = [~ ~]... the n(A) coefficients.:. degree such that a (A) =0.AI) = dimN(A . if we denote the geometric multiplicity of A by g. we get the interesting fact that det(A) = AI . i •>/—!)• If A E 1Ftnxn.. 
checked eigenvectors of A and AT (take Hermitian transposes of both sides of (9. A.8. Moreover.~. if If A € A(A) has algebraic multiplicity m. A is said to be defective if it does not have n linearly independent (right or left) eigenvectors. and hence further guarantee the existence of corresponding nonzero eigenvectors. then A satsible for A to satisfy a lowerorder polynomial. neftnhion ~.25). we denote the geometric multiplicity of A by g. An. Definition 9. say. i. all roots of its characteristic polynomialn(A). it can also be generally write a(A) as a monic polynomial throughout the text). Thus.6.XI. then 1 :::: dimN(A . less than) its algebraic multiplicity. If A E A(A) has algebraic multiplicity m. The spectrum of A is denoted A(A). and hence further are the eigenvalues of A and imply the singularity of the matrix A — AI. A. a . 2aA + 2 + ft and Example 9. The geometric multiplicity of A is the number of associated independent eigenvectors = n — rank(A — A/) = dim J\f(A — XI). i.. it can also be .. But it also clearly satisfies the smaller degree polynomial equation (it...8. Let a. sible for A to satisfy a lowerorder polynomial. eigenvalues of A. then we must have I < g < m. n(A).5. we say that X is an eigenvalue of A of algebraic multiplicity m.A) .. Specifically. . that by elementary properties of the determinant. The spectrum of A is denoted A (A). Specifically. If e Wxn. c form form A e C" " A]. as solutions of the determinant equation n(A) = det(A  AI) = 0. independent eigenvectors = n . say. (An . we say that A is an eigenvalue of A Definition 9.e. Let a. i.A) (9. it is posn(A) = O.7. Thus.2)).. If is a root of multiplicity m of n(A). IfXA is a root of multiplicity m ofjr(X).. we always have A(A) = A(AT).. A matrix A E Wnxn is said to be defective if it has an eigenvalue whose Definition 9.) A. Eigenvalues and Eigenvectors n(A) has n roots. geometric multiplicity is not equal to (i. 
The geometric multiplicity ofX is the number of associated of algebraic multiplicity m.1)2 = O.. Equivalently.e. Definition 9. These roots. Let the eigenvalues of A E en xxn be denoted X\. of we always have A(A) = A(A r ).. Example 9.nxn is the polynomial o/(X) oJ IPll. Then if we write (9. The spectrum of A e nxn is the set of all eigenvalues of A... y of AT y is a left eigenvector of A corresponding to A e A(A). The minimal polynomial Of A l::: K""" ix (hI' polynomilll a(A) of least degree such that a(A) ~ O. E A(A). f3 e R and let A = [~f3 £ ]. Then jr(A. Moreover. (9..2».AI) = (A] . The spectrum of A E C"x" is the set of all eigenvalues of A. but that A(A) = A(A) only if A E 1Ftnxn. possibly repeated. must occur in complex conjugate pairs.n =0o. Theorem If A E 1Ftnxn. ft E 1Ft and let A = [ _^ !].1) . Hence the roots of 7r(A). the set of Definition 9. guarantee the existence of corresponding nonzero eigenvectors. the set of all roots of its characteristic polynomial n(X). It can be shown that or(l) is essentially unique (unique if we force the coefficient It can be shown that a(Je) is essentially unique (unique if we force the coefficient of the highest power of A to be + 1.
A~[~ A~U 2 0 0 I 2 2 ] ha< a(A) (A . be an eigenvalue of A with corresponding right Theorem 9. Bezout algorithm. . a(A) directly (without knowing eigenvalues and asThere is an algorithm to determine or (A..11.e. e l\(A) yj Xi. Fundamental Definitions and Properties 77 77 a(A) f3(A) O.. Let A e C« x " ana [et A.11. the geometric multiplicity by g. 0 0 0 2 0 0 0 2 ] h'" a(A) (A . a(X) n(X).) directly (without knowing eigenvalues and as Unfortunately.9. In particular. Unfortunately. We denote 7r(A) (A .. Fundamental Definitions and Properties 9. Then Xi = 0. Proof' Since AXi AiXi. algorithm. YY Proof: Since Axt = A. shown that a (A.1. The matrix A~U has a(A) I 2 0 0 2 0 0 0 !] (9..2) andg ~ 4.."(A) ~ ~ ~ ~ (A .2)' ""d g 2. i. Then yfx{ = O. sociated eigenvector structure). called the Bezout algorithm. eigenvector is numerically unstable. left Aj E A (A) such that Xj 1= A. A[~  2 0 I 2 0 0 0 0 0 0 !] ~ ~ ~ ha. each 4. 0 g At this point. Example 9.2)' ""d g ~ ~ ~ 1.. Furthermore. every nonzero polynomial f3(A) particular.5) = (A  2)2 and g = 2. one might speculate that g plus the degree of a must always be five. a(A) divides n(A).) divides every nonzero polynomial fi(k} for which ft (A) = 0. of which has an eigenvalue 2 of algebraic multiplicity 4.2)4. g. Unfortunately. such that Aj =£ Ai.1. this algorithm. let Yj be a left eigenvector corresponding to any A.*. 0 0 0 2 A~U 0 0 ] ha<a(A) (A .10. Example 9. Theorem 9. The above definitions are illustrated below for a series of matrices. Let A E cc nxn and let Ai be an eigenvalue of A with corresponding right eigenvector jc. n(A) = (A — 2)4. such is not the case.2)2 ""d g 3.10.
Similarly, since y_j^H A = λ_j y_j^H,

y_j^H A x_i = λ_j y_j^H x_i.   (9.7)

Subtracting (9.6) from (9.7), we find 0 = (λ_j − λ_i) y_j^H x_i. Since λ_j − λ_i ≠ 0, we must have y_j^H x_i = 0.  □

The proof of Theorem 9.11 is very similar to two other fundamental and important results.

Theorem 9.12. Let A ∈ C^(n×n) be Hermitian, i.e., A = A^H. Then all eigenvalues of A must be real.

Proof: Suppose (λ, x) is an arbitrary eigenvalue/eigenvector pair such that Ax = λx. Premultiply the equation Ax = λx by x^H to get x^H A x = λ x^H x. Taking Hermitian transposes in this equation and using the fact that A is Hermitian, we also have x^H A x = λ̄ x^H x. Thus λ x^H x = λ̄ x^H x. However, since x is an eigenvector, we have x^H x ≠ 0, from which we conclude λ̄ = λ, i.e., λ is real.  □

Theorem 9.13. Let A ∈ C^(n×n) be Hermitian and suppose λ and μ are distinct eigenvalues of A with corresponding right eigenvectors x and z, respectively. Then x and z must be orthogonal.

Proof: Premultiply the equation Ax = λx by z^H to get z^H A x = λ z^H x. Take the Hermitian transpose of this equation and use the facts that A is Hermitian and λ is real to get x^H A z = λ x^H z. Premultiply the equation Az = μz by x^H to get x^H A z = μ x^H z = λ x^H z. Since λ ≠ μ, we must have that x^H z = 0, i.e., the two vectors must be orthogonal.  □

Let us now return to the general case.

Theorem 9.14. Let A ∈ C^(n×n) have distinct eigenvalues λ_1, ..., λ_n with corresponding right eigenvectors x_1, ..., x_n. Then {x_1, ..., x_n} is a linearly independent set. The same result holds for the corresponding left eigenvectors.

Proof: For the proof see, for example, [21, p. 118].  □

If A ∈ C^(n×n) has distinct eigenvalues, and if λ_i ∈ Λ(A), then by Theorem 9.11, x_i is orthogonal to all y_j's for which j ≠ i. However, it cannot be the case that y_i^H x_i = 0 as well, or else x_i would be orthogonal to n linearly independent vectors (by Theorem 9.14) and would thus have to be 0, contradicting the fact that it is an eigenvector. Since y_i^H x_i ≠ 0 for each i, we can choose the normalization of the x_i's, or the y_i's, or both, so that y_i^H x_i = 1 for each i = 1, ..., n.
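The biorthogonality of left and right eigenvectors, and the normalization y_i^H x_i = 1, can be observed numerically. The sketch below (our own example matrix, assuming scipy is available) uses `scipy.linalg.eig`, whose `left=True` option returns left eigenvectors in exactly the y^H A = λ y^H sense used here:

```python
import numpy as np
from scipy.linalg import eig

# An arbitrary matrix with three distinct eigenvalues (one real pair of
# complex conjugates and one real eigenvalue).
A = np.array([[1., 2., 0.],
              [0., 3., 1.],
              [1., 0., 2.]])

w, VL, VR = eig(A, left=True, right=True)  # columns are eigenvectors

# Theorem 9.11: Y^H X is diagonal (y_j^H x_i = 0 for i != j).
G = VL.conj().T @ VR
off = G - np.diag(np.diag(G))
print(np.max(np.abs(off)))  # ~ 0

# Rescale the right eigenvectors so that y_i^H x_i = 1, i.e., Y^H X = I.
X = VR / np.diag(G)
print(np.max(np.abs(VL.conj().T @ X - np.eye(3))))  # ~ 0
```

The rescaling in the last step is the normalization freedom described in the text: only the products y_i^H x_i are constrained, not the individual vectors.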
Theorem 9.15. Let A ∈ C^(n×n) have distinct eigenvalues λ_1, ..., λ_n and let the corresponding right eigenvectors form a matrix X = [x_1, ..., x_n]. Similarly, let Y = [y_1, ..., y_n] be the matrix of corresponding left eigenvectors. Furthermore, suppose that the left and right eigenvectors have been normalized so that y_i^H x_i = 1 for each i. Finally, let Λ = diag(λ_1, ..., λ_n) ∈ R^(n×n). Then A x_i = λ_i x_i, i = 1, ..., n, can be written in matrix form as

AX = XΛ,   (9.8)

while y_i^H x_j = δ_ij, i, j = 1, ..., n, is expressed by the equation

Y^H X = I.   (9.9)

These matrix equations can be combined to yield the following matrix factorizations:

X^(−1) A X = Λ = Y^H A X   (9.10)

and

A = X Λ X^(−1) = X Λ Y^H = Σ_{i=1}^n λ_i x_i y_i^H.   (9.11)

Example 9.16. Let

A = [   0    1    0
        0    0    1
      −10   −9   −4 ].

Then π(λ) = det(A − λI) = −(λ^3 + 4λ^2 + 9λ + 10) = −(λ + 2)(λ^2 + 2λ + 5), from which we find Λ(A) = {−2, −1 ± 2j}. We can now find the right and left eigenvectors corresponding to these eigenvalues.

For λ_1 = −2, solve the 3 × 3 linear system (A − (−2)I)x_1 = 0 to get

x_1 = [  1
        −2
         4 ].

Note that one component of x_1 can be set arbitrarily, and this then determines the other two (since dim N(A − (−2)I) = 1). To get the corresponding left eigenvector y_1, solve the linear system y_1^H (A + 2I) = 0 to get

y_1 = (1/5) [ 5
              2
              1 ].

This time we have chosen the arbitrary scale factor for y_1 so that y_1^H x_1 = 1.

For λ_2 = −1 + 2j, solve the linear system (A − (−1 + 2j)I)x_2 = 0 to get

x_2 = [     1
        −1 + 2j
        −3 − 4j ],

and solve y_2^H (A − (−1 + 2j)I) = 0, normalizing y_2 so that y_2^H x_2 = 1. Finally, note that we could have solved directly only for x_1 and x_2 (and x_3 = x̄_2). To see this, use the fact that λ_3 = λ̄_2 and simply conjugate the equation A x_2 = λ_2 x_2 to get A x̄_2 = λ̄_2 x̄_2. A similar argument yields the result for left eigenvectors.

Now define the matrix X of right eigenvectors:

X = [  1       1         1
      −2    −1 + 2j   −1 − 2j
       4    −3 − 4j   −3 + 4j ].

It is then easy to verify that the rows of X^(−1) are precisely the normalized left eigenvectors y_i^H, and that

X^(−1) A X = Λ = [ −2      0         0
                    0   −1 + 2j      0
                    0      0      −1 − 2j ].

Example 9.17. Let

A = [   0    1    0
        0    0    1
      −12  −19   −8 ].

Then π(λ) = det(A − λI) = −(λ^3 + 8λ^2 + 19λ + 12) = −(λ + 1)(λ + 3)(λ + 4), from which we find Λ(A) = {−1, −3, −4}. Proceeding as in the previous example, it is straightforward to compute

X = [  1    1    1
      −1   −3   −4
       1    9   16 ]

and

X^(−1) = (1/6) [  12    7    1
                 −12  −15   −3
                   6    8    2 ].

Then, instead of determining the y_i's directly, we could have found them by computing X^(−1) and reading off its rows. Other results in Theorem 9.15 can also be verified. For example, we have

X^(−1) A X = Λ = diag(−1, −3, −4),

which is equivalent to the dyadic expansion

A = Σ_{i=1}^3 λ_i x_i y_i^H
  = (−1) x_1 y_1^H + (−3) x_2 y_2^H + (−4) x_3 y_3^H,

where y_1^H = (1/6)[12 7 1], y_2^H = (1/6)[−12 −15 −3], and y_3^H = (1/6)[6 8 2].

Theorem 9.18. Eigenvalues (but not eigenvectors) are invariant under a similarity transformation T.

Proof: Suppose (λ, x) is an eigenvalue/eigenvector pair such that Ax = λx. Then, since T is nonsingular, we have the equivalent statement (T^(−1)AT)(T^(−1)x) = λ(T^(−1)x), from which the theorem statement follows. For left eigenvectors we have a similar statement, namely y^H A = λ y^H if and only if (T^H y)^H (T^(−1)AT) = λ (T^H y)^H.  □

Remark 9.19. If f is an analytic function (e.g., f(x) is a polynomial, or e^x, or sin x, or, in general, a function representable by a power series Σ_{n=0}^∞ a_n x^n), then it is easy to show that the eigenvalues of f(A) (defined as Σ_{n=0}^∞ a_n A^n) are f(λ), but f(A) does not necessarily have all the same eigenvectors (unless, say, A is diagonalizable). For example,

A = [ 0  1
      0  0 ]

has only one right eigenvector corresponding to the eigenvalue 0, but

A^2 = [ 0  0
        0  0 ]

has two independent right eigenvectors associated with the eigenvalue 0. What is true is that the eigenvalue/eigenvector pair (λ, x) maps to (f(λ), x), but not conversely.

The following theorem is useful when solving systems of linear differential equations. Details of how the matrix exponential e^(tA) is used to solve the system ẋ = Ax are the subject of Chapter 11.

Theorem 9.20. Let A ∈ R^(n×n) and suppose X^(−1)AX = Λ, where Λ is diagonal. Then

e^(tA) = X e^(tΛ) X^(−1) = Σ_{i=1}^n e^(λ_i t) x_i y_i^H.
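The eigenvalue mapping λ → f(λ) described above is easy to confirm numerically. The following sketch (our own 2 × 2 example, assuming scipy is available) takes f = sin and uses `scipy.linalg.sinm` for the matrix sine:

```python
import numpy as np
from scipy.linalg import sinm

# A symmetric (hence diagonalizable) matrix with eigenvalues 1 and 3.
A = np.array([[2., 1.],
              [1., 2.]])

lam = np.linalg.eigvalsh(A)            # eigenvalues of A: [1., 3.]
mu = np.linalg.eigvals(sinm(A)).real   # eigenvalues of sin(A)

print(np.sort(mu))
print(np.sort(np.sin(lam)))            # same values: sin(1), sin(3)
```

Both lists are sorted before comparison because the two eigenvalue routines need not return them in the same order.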
Proof: Starting from the definition, we have

e^(tA) = Σ_{k=0}^∞ (t^k / k!) A^k
       = Σ_{k=0}^∞ (t^k / k!) (X Λ X^(−1))^k
       = X ( Σ_{k=0}^∞ (t^k / k!) Λ^k ) X^(−1)
       = X e^(tΛ) X^(−1)
       = Σ_{i=1}^n e^(λ_i t) x_i y_i^H.  □

The following corollary is immediate from the theorem upon setting t = 1.

Corollary 9.21. If A ∈ R^(n×n) is diagonalizable with eigenvalues λ_i and right eigenvectors x_i, i = 1, ..., n, then e^A has eigenvalues e^(λ_i), i = 1, ..., n, and the same eigenvectors.

There are extensions to Theorem 9.20 and Corollary 9.21 for any function that is analytic on the spectrum of A, i.e., f(A) = X f(Λ) X^(−1) = X diag(f(λ_1), ..., f(λ_n)) X^(−1).

It is desirable, of course, to have a version of Theorem 9.20 and its corollary in which A is not necessarily diagonalizable. It is necessary first to consider the notion of Jordan canonical form, from which such a result is then available and presented later in this chapter.

9.2 Jordan Canonical Form

Theorem 9.22.

1. Jordan Canonical Form (JCF): For all A ∈ C^(n×n) with eigenvalues λ_1, ..., λ_n ∈ C (not necessarily distinct), there exists a nonsingular X ∈ C^(n×n) such that

X^(−1) A X = J = diag(J_1, ..., J_q),   (9.12)

where each of the Jordan block matrices J_1, ..., J_q is of the form

J_i = [ λ_i   1    0   ...   0
         0   λ_i   1   ...   0
         .         .    .    .
         0   ...       λ_i   1
         0   ...        0   λ_i ]   (9.13)

and Σ_{i=1}^q k_i = n, where k_i denotes the dimension of J_i.
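The factorization e^(tA) = X e^(tΛ) X^(−1) of Theorem 9.20 can be checked directly against a general-purpose matrix exponential. A sketch (our own example matrix, assuming scipy is available):

```python
import numpy as np
from scipy.linalg import expm

# A diagonalizable matrix with distinct eigenvalues -1 and -2.
A = np.array([[0., 1.],
              [-2., -3.]])
t = 0.7

lam, X = np.linalg.eig(A)
E_eig = X @ np.diag(np.exp(t * lam)) @ np.linalg.inv(X)
E_ref = expm(t * A)

print(np.max(np.abs(E_eig - E_ref)))  # ~ 0
```

Here `expm` plays the role of an independent reference; the rows of X^(−1) are the (normalized) left eigenvectors y_i^H, so this is also a check of the dyadic form Σ e^(λ_i t) x_i y_i^H.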
2. Real Jordan Canonical Form: For all A ∈ R^(n×n) with eigenvalues λ_1, ..., λ_n (not necessarily distinct), there exists a nonsingular X ∈ R^(n×n) such that

X^(−1) A X = J = diag(J_1, ..., J_q),   (9.14)

where each of the Jordan block matrices J_1, ..., J_q is of the form (9.13) in the case of real eigenvalues λ_i ∈ Λ(A), and of the form

J_i = [ M_i  I_2   0   ...   0
         0   M_i  I_2  ...   0
         .         .    .    .
         0   ...       M_i  I_2
         0   ...        0   M_i ]

with

M_i = [  α_i  β_i
        −β_i  α_i ]   and   I_2 = [ 1  0
                                    0  1 ]

in the case of complex conjugate eigenvalues α_i ± jβ_i ∈ Λ(A).

Proof: For the proof see, for example, [21, pp. 120–124].  □

Transformations like

T = [ 1  −j
      1   j ]

allow us to go back and forth between a real JCF and its complex counterpart:

T^(−1) [ α + jβ     0
           0     α − jβ ] T = [  α  β
                                −β  α ] = M.

For nontrivial Jordan blocks, the situation is only a bit more complicated. With

T_1 = [ 1  −j  0   0
        0   0  1  −j
        1   j  0   0
        0   0  1   j ],
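The JCF of Theorem 9.22 can be computed exactly for matrices with exact (e.g., rational) entries using a symbolic tool. A sketch using sympy's `jordan_form` (the matrix is our own illustration; sympy returns P and J with A = P J P^(−1)):

```python
from sympy import Matrix, simplify

# A has the single eigenvalue 2 with algebraic multiplicity 3 but
# geometric multiplicity 2, so its JCF consists of a 2x2 Jordan block
# and a 1x1 Jordan block for the eigenvalue 2.
A = Matrix([[3, 1, 0],
            [-1, 1, 0],
            [0, 0, 2]])

P, J = A.jordan_form()   # A = P * J * P**(-1)
print(J)

# Exact verification of the similarity transformation.
assert simplify(P * J * P.inv() - A) == Matrix.zeros(3, 3)
```

Note the contrast with floating-point computation: as discussed later in this chapter, the JCF is not reliably computable in finite-precision arithmetic, which is why a symbolic (exact-arithmetic) tool is used here.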
The minimal polynomial of a matrix is the product of the elementary divisors of divisors. Tr(A) = 2. Then Theorem 9. Then c n 1..1)2.2).. Let A e C" " with eigenvalues AI. The minimal polynomial of a matrix is the product of the elementary divisors of highest degree corresponding to distinct eigenvalues. The characteristic polynomial of a matrix is the product of its elementary Theorem 9. Again.. 9. Suppose A e lR. from Theorem 9.(A. An. 2(A(A. Eigenvalues and Eigenvectors Chapter 9.. i=1 l Proof: Proof: 1.22 we have that A = X JJX ~ l . 1)2(A .22 l Tr(A) = Tr(X J XI) Tr(JX.(A 1).23. I) . J(2) has elementary divisors (A while /( 2) haselementary divisors (A . and (A .. 1). and (A (A . . X XI. x Theorem 9. 2)2. det(A) = det(XJX. Then has two possible JCFs (not counting reorderings of the diagonal blocks): diagonal blocks): 1 J(l) = 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 and f2) 0 0 0 0 0 1 0 0 0 0 0 2 = 0 0 0 0 0 0 I 1 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 1 0 2 0 J(l) has elementary divisors (A Note that 7(1) haselementary divisors (A .26. 1.1)z. . Tr(A) = Tr(XJX~ ) = TrC/X"1 X) = Tr(/) = £"=1 A.. 1 det(A) = det(X J XI) det(J) = n7=1 Ai.I [ "+ jfi 0 0 0 et + jf3 0 0 0 0 et .7x7 is known to have 7r(A) = (A . . Eigenvalues and Eigenvectors T. Thus.)i.22 we have that A X J XI.22 are called the elementary divisors or invariant factors of A. 2 . From Theorem 9.25.) = (A.26.. highest degree corresponding to distinct eigenvalues.jf3 0 ]T~[~ l h M Definition 9.22 are called the elementary divisors or invariant factors of A. from Theorem 9. .I)(A (A2)2. Theorem 9. . Then AAhas two possible JCFs (not counting reorderings of the a (A. Let A E nxn with eigenvalues AI. I) .2).jf3 0 0 et .2)3 3and is known to have :rr(A) Example 9.24.2).1*) = Tr(J) = L7=1 Ai. " Xn.1)4(A . det(A) = nAi..24.84 it is easily checked that it is easily checked that Chapter 9.22 we have that A = X J X ~ .. 
The characteristic polynomials of the Jordan blocks defined in Theorem Definition 9. and 2).. From Theorem 9.2(A(A .1)2. . Thus. — 2) . i=1 n 2. and (A .2)2.23..— I) (A. (A1). D 0 Example 9. Suppose A E E (A.(A. 2. . The characteristic polynomials of the Jordan blocks defined in Theorem 9.) = det(7) = ]~["=l A. 1)4(A 2) and 2 2 et(A) = (A . .2)2. The characteristic polynomial of a matrix is the product of its elementary divisors.2)2. Thus.25. 1)..
it The straightforward case is.e. and rank(A al) vectors. a (A). An analogous definition holds for a left principal vector of degree k.l) independent right — — A.AI)klx i= o. when X. Thus. of course. The more interesting (and difficult) case occurs when Ai multiplicity A. eigenvectors dimN(A — A.A.7) = n . i. suppose suppose A = [3 2 0 o Then Then 3 0 A3I= U2 I] o o 0 0 n has rank 1. 9. a(A). and rank (A A. The matrices A uniquely. A e nxn ]R.— a) and rank(A . both are eigenvectors (and are independent). i. Knowing TT (A. c .(7).28.3 Determination of the JCF Determination of the JCF The first critical item of information in determining the JCF of a matrix A E Wlxn is its A e ]R. three eigenboth have rr(A) = (A .. associated independent right (or left) eigenvectors is given by dim A^(A .A.. it then has precisely one eigenvector.) = (A. and rank(A —Ai l) for distinct Ai is not sufficient to rr(A). Determination of the JCF 9. left k.e. of algebraic multiplicity 1. i. when Ai is simple. a(A.7) for distinct A. Let A E C"xn (or R"x").3. If we let [~l ~2 ~3]T associated If [^i £2 &]T denote a solution to the linear system (A — 3l)~ = 0.27.l). Definition 9.nxn). determine a 0 0 0 0 0 0 0 a 0 0 0 0 0 a 0 0 0 0 Al= 0 0 0 a 0 0 0 0 0 0 a 0 0 0 0 0 0 0 a 0 0 0 0 0 0 1 a A2 = a 0 0 0 0 0 0 0 a 0 0 0 0 0 a 0 0 0 0 0 0 0 a 0 0 0 0 0 0 a 0 0 0 0 0 0 a 0 0 0 0 0 0 0 a 4. 1. is of algebraic multiplicity greater than one. The straightforward case is.— a) . three eigen7r(A.3. so the eigenvalue 3 has two eigenvectors associated with it. Remark 9. of course. i. of algebraic multiplicity 1. To get a third vector X3 such that X = [Xl KJ_ X3] reduces A to JCF.e. we find that 2£2 + ~3 = O.— al) == 4.. is not sufficient to Example 9.a(A) = (A . determine the JCF of A uniquely.9.] are eigenvectors (and are independent).) = (A.rank(A .27.nxn number of eigenvectors. Determination of the JCF 85 &5 Example 9.is simple. Thus.. a)7.ulx = 0 and (A . Remark 9. the associated number of linearly A.. 
For each distinct eigenvalue Ai.e.3/)£ = 0..28. we find that 2~2 + £ 3= 0 . X e A(A) if and only if (A XI)kx = 0 and (A U}k~l x ^ 0. To get a third vector JC3 such that X [x\ X2 XT.29. a)\ ..).. both denote a solution to the linear system (A ...29. Then x is a right principal vector of degree k degree associated with A E A (A) ifand only if(A . X principal Definition 9. we need the notion of principal vector. For example.
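The eigenvector count n − rank(A − λI) in the discussion above is immediate to compute numerically. A sketch (assuming scipy is available; the 3 × 3 matrix follows our reading of the example above):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[3., 2., 1.],
              [0., 3., 0.],
              [0., 0., 3.]])

N = A - 3 * np.eye(3)
print(np.linalg.matrix_rank(N))   # 1, so there are 3 - 1 = 2 eigenvectors

Z = null_space(N)                 # orthonormal basis of N(A - 3I)
print(Z.shape[1])                 # 2 independent eigenvectors
```

The columns of `Z` are an orthonormal basis of the eigenspace; any solution of 2ξ_2 + ξ_3 = 0 (with ξ_1 free) lies in their span.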
The case k = 1 corresponds to the "usual" eigenvector, and the phrase "of grade k" is often used synonymously with "of degree k." Principal vectors are sometimes also called generalized eigenvectors, but the latter term will be assigned a much different meaning in Chapter 12. A right (or left) principal vector of degree k is associated with a Jordan block J_i of dimension k or larger.

9.3.1 Theoretical computation

To motivate the development of a procedure for determining principal vectors, consider a 2 × 2 Jordan block [λ 1; 0 λ]. Denote by x^(1) and x^(2) the two columns of a matrix X ∈ R^(2×2) that reduces a matrix A to this JCF. Then the equation AX = XJ can be written

A [x^(1) x^(2)] = [x^(1) x^(2)] [ λ  1
                                  0  λ ].

The first column yields the equation A x^(1) = λ x^(1), which simply says that x^(1) is a right eigenvector. The second column yields the following equation for x^(2), the principal vector of degree 2:

(A − λI) x^(2) = x^(1).   (9.17)

If we premultiply (9.17) by (A − λI), we find (A − λI)^2 x^(2) = (A − λI) x^(1) = 0. Thus, the definition of principal vector is satisfied.

This suggests a "general" procedure. First, determine all eigenvalues of A ∈ R^(n×n) (or C^(n×n)). Then for each distinct λ ∈ Λ(A) perform the following:

1. Solve (A − λI) x^(1) = 0. This step finds all the eigenvectors (i.e., principal vectors of degree 1) associated with λ. The number of eigenvectors depends on the rank of A − λI. For example, if rank(A − λI) = n − 1, there is only one eigenvector. If the algebraic multiplicity of λ is greater than its geometric multiplicity, principal vectors still need to be computed from succeeding steps.

2. For each independent x^(1), solve (A − λI) x^(2) = x^(1). The number of linearly independent solutions at this step depends on the rank of (A − λI)^2. If, for example, this rank is n − 2, there are two linearly independent solutions to the homogeneous equation (A − λI)^2 x^(2) = 0. One of these solutions is, of course, x^(1) (≠ 0), since (A − λI)^2 x^(1) = (A − λI) 0 = 0. The other solution is the desired principal vector of degree 2. (It may be necessary to take a linear combination of x^(1) vectors to get a right-hand side that is in R(A − λI). See, for example, Exercise 7.)
3. For each independent x^(2) from step 2, solve (A − λI) x^(3) = x^(2).

4. Continue in this way until the total number of independent eigenvectors and principal vectors is equal to the algebraic multiplicity of λ.

Remark 9.30. Unfortunately, this natural-looking procedure can fail to find all Jordan vectors. For more extensive treatments, see, for example, [20] and [21]. Determination of eigenvectors and principal vectors is obviously very tedious for anything beyond simple problems (n = 2 or 3, say). Attempts to do such calculations in finite-precision floating-point arithmetic generally prove unreliable. There are significant numerical difficulties inherent in attempting to compute a JCF, and the interested student is strongly urged to consult the classical and very readable [8] to learn why. Notice that high-quality mathematical software such as MATLAB does not offer a jcf command, although a jordan command is available in MATLAB's Symbolic Toolbox.

Theorem 9.31. Principal vectors associated with different Jordan blocks are linearly independent.

Theorem 9.32. Suppose A ∈ C^(k×k) has an eigenvalue λ of algebraic multiplicity k and suppose further that rank(A − λI) = k − 1. Let X = [x^(1), ..., x^(k)], where the chain of vectors x^(i) is constructed as above. Then {x^(1), ..., x^(k)} is a linearly independent set.

Example 9.33. Let

A = [ 1  2  −1
      0  1   3
      0  0   2 ].

The eigenvalues of A are λ_1 = 1, λ_2 = 1, and λ_3 = 2. First, find the eigenvectors associated with the distinct eigenvalues 1 and 2.

(A − 2I) x_3^(1) = 0 yields

x_3^(1) = [ 5
            3
            1 ],

and (A − 1I) x_1^(1) = 0 yields

x_1^(1) = [ 1
            0
            0 ].

To find a principal vector of degree 2 associated with the multiple eigenvalue 1, solve (A − 1I) x_1^(2) = x_1^(1) to get

x_1^(2) = [  0
            1/2
             0 ].

Now let

X = [x_1^(1)  x_1^(2)  x_3^(1)] = [ 1   0   5
                                    0  1/2  3
                                    0   0   1 ].

Then it is easy to check that

X^(−1) = [ 1  0  −5
           0  2  −6
           0  0   1 ]   and   X^(−1) A X = [ 1  1  0
                                             0  1  0
                                             0  0  2 ].

9.3.2 On the +1's in JCF blocks

In this subsection we show that the nonzero superdiagonal elements of a JCF need not be 1's but can be arbitrary, so long as they are nonzero. For the sake of definiteness, we consider below the case of a single Jordan block, but the result clearly holds for any JCF. Suppose A ∈ R^(n×n) and

X^(−1) A X = J = [ λ  1
                      λ  .
                         .  1
                            λ ].

Let D = diag(d_1, ..., d_n) be a nonsingular "scaling" matrix. Then

D^(−1) (X^(−1) A X) D = D^(−1) J D = Ĵ = [ λ  d_2/d_1
                                               λ   d_3/d_2
                                                   .       .
                                                       λ   d_n/d_(n−1)
                                                           λ ].

Appropriate choice of the d_i's then yields any desired nonzero superdiagonal elements. This result can also be interpreted in terms of the matrix X = [x_1, ..., x_n] of eigenvectors and principal vectors that reduces A to its JCF. Specifically, Ĵ is obtained from A via the similarity transformation XD = [d_1 x_1, ..., d_n x_n].

In a similar fashion, the reverse-order identity matrix (or exchange matrix)

P = P^T = P^(−1) = [ 0  ...  0  1
                     0  ...  1  0
                     .  .
                     1  0  ...  0 ]   (9.18)

can be used to put the superdiagonal elements in the subdiagonal instead, if that is desired:

P^(−1) [ λ  1              P = [ λ
            λ  .                 1  λ
               .  1                 .  .
                  λ ]                  1  λ ].

9.4 Geometric Aspects of the JCF

Definition 9.34. Let V be a vector space over F and suppose A : V → V is a linear transformation. A subspace S ⊆ V is A-invariant if AS ⊆ S, where AS is defined as the set {As : s ∈ S}.

The matrix X that reduces a matrix A ∈ R^(n×n) (or C^(n×n)) to a JCF provides a change of basis with respect to which the matrix is diagonal or block diagonal. It is thus natural to expect an associated direct sum decomposition of R^n. Such a decomposition is given in the following theorem.

Theorem 9.35. Suppose A ∈ R^(n×n) has characteristic polynomial

π(λ) = (λ − λ_1)^(n_1) ⋯ (λ − λ_m)^(n_m)

and minimal polynomial

α(λ) = (λ − λ_1)^(ν_1) ⋯ (λ − λ_m)^(ν_m)

with λ_1, ..., λ_m distinct. Then

R^n = N(A − λ_1 I)^(n_1) ⊕ ⋯ ⊕ N(A − λ_m I)^(n_m)
    = N(A − λ_1 I)^(ν_1) ⊕ ⋯ ⊕ N(A − λ_m I)^(ν_m).

Note that dim N(A − λ_i I)^(ν_i) = n_i.
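For a subspace spanned by the columns of a full-column-rank matrix S, A-invariance amounts to the existence of a small square matrix M with AS = SM (M represents the action of A in the basis given by the columns of S). A numerical sketch with our own 3 × 3 example, assuming numpy is available:

```python
import numpy as np

# A is block upper triangular, so span{e1, e2} is A-invariant.
A = np.array([[1., 1., 2.],
              [0., 2., 3.],
              [0., 0., 5.]])
S = np.array([[1., 0.],
              [0., 1.],
              [0., 0.]])

# Solve S @ M = A @ S for M in the least-squares sense; for a genuinely
# invariant subspace the residual is zero.
M, *_ = np.linalg.lstsq(S, A @ S, rcond=None)

print(M)                              # [[1, 1], [0, 2]]
print(np.linalg.norm(A @ S - S @ M))  # ~ 0
```

If the columns of S did not span an invariant subspace, the residual ||AS − SM|| of the least-squares fit would be nonzero, so this also serves as a practical invariance test.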
If V is taken to be R^n over R, and S ∈ R^(n×k) is a matrix whose columns s_1, ..., s_k span a k-dimensional subspace S, i.e., R(S) = S, then S is A-invariant if and only if there exists M ∈ R^(k×k) such that

AS = SM.   (9.19)

This follows easily by comparing the ith columns of each side of (9.19).

Example 9.36. The equation Ax = λx = xλ defining a right eigenvector x of an eigenvalue λ says that x spans an A-invariant subspace (of dimension one).

Example 9.37. Suppose X block diagonalizes A, i.e.,

X^(−1) A X = [ J_1   0
                0   J_2 ].

Rewriting in the form A[X_1 X_2] = [X_1 X_2] diag(J_1, J_2), we have that A X_i = X_i J_i, i = 1, 2, so by (9.19) the columns of X_i span an A-invariant subspace.

Theorem 9.38. Suppose A ∈ R^(n×n). Let p(A) = α_0 I + α_1 A + ⋯ + α_q A^q be a polynomial in A. Then N(p(A)) and R(p(A)) are A-invariant.

Theorem 9.39. If V is a vector space over F such that V = N_1 ⊕ ⋯ ⊕ N_m, where each N_i is A-invariant, then a basis for V can be chosen with respect to which A has a block diagonal representation.

The Jordan canonical form is a special case of the above theorem. If A has distinct eigenvalues λ_i, we could choose bases for N(A − λ_i I)^(ν_i) by, for example, SVD (note that the power ν_i could be replaced by n_i). We would then get a block diagonal representation for A with full blocks rather than the highly structured Jordan blocks. Other such "canonical" forms are discussed in text that follows, but we restrict our attention here to only the Jordan block case. Specifically, suppose X = [X_1, ..., X_m] ∈ C^(n×n) is such that X^(−1) A X = diag(J_1, ..., J_m), where each J_i = diag(J_(i1), ..., J_(ik_i)) and each J_(ik) is a Jordan block corresponding to λ_i ∈ Λ(A). Then, as above, A X_i = X_i J_i, so the columns of X_i (i.e., the eigenvectors and principal vectors associated with λ_i) span an A-invariant subspace of R^n.

Finally, we return to the problem of developing a formula for e^(tA) in the case that A is not necessarily diagonalizable. Let Y_i ∈ C^(n×n_i) be a Jordan basis for N(A^T − λ̄_i I)^(n_i); equivalently, partition X^(−1) = Y^H = [Y_1, ..., Y_m]^H compatibly.
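The single-block exponential on which the e^(tA) development relies has a simple closed form; for a 2 × 2 Jordan block J with eigenvalue λ it is e^(tJ) = e^(λt) [[1, t], [0, 1]]. A quick check with our own numbers, assuming scipy is available:

```python
import numpy as np
from scipy.linalg import expm

lam, t = -0.5, 1.3
J = np.array([[lam, 1.],
              [0., lam]])

E_formula = np.exp(lam * t) * np.array([[1., t],
                                        [0., 1.]])
print(np.max(np.abs(expm(t * J) - E_formula)))  # ~ 0
```

The same pattern extends to a k × k block, where the (i, j) entry above the diagonal is t^(j−i) e^(λt) / (j − i)!.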
It is a generalization of the sign (or signum) of a scalar. and let e cnxn be a Jordan canonical form for A.5 The Matrix Sign Function The Matrix Sign Function section brief interesting useful In this section we give a very brief introduction to an interesting and useful matrix function function called the matrix sign function.40. denoted sgn(A). It is a generalization of the sign (or signum) of a scalar. Ym]H = LX.9.lt 2 e At 2! 0 exp t 0 0 0 1 A teAt eAt 0 0 0 0 0 block Ji associated A = A.41. is given by eigenvalues in the right halfplane. Then the sign of A. denoted sgn(A).. associated with an eigenvalue A.I = XJy H = [XI.5. m ••• . for a k x k Jordan block 7. E f= 0. The Matrix Sign Function 91 91 compatibly. . Then the sign of A. .= Ai. .5 9. . A survey of the matrix sign function and some of its applications can be found in [15]. The Matrix Sign Function 9. Then the sign of z is defined by Re(z) {+1 sgn(z) = IRe(z) I = 1 ifRe(z) > 0. A called the matrix sign function. of defined Definition 9.41. i=1 which is a useful formula when used in conjunction with the result which is a useful formula when used in conjunction with the result A 0 A A 0 eAt teAt eAt .40. with N containing all Jordan blocks corresponding to the be a Jordan canonical form for with N containing all Jordan blocks corresponding to the eigenvalues of in the left halfplane and P containing all Jordan blocks corresponding to eigenvalues of A in the left halfplane and P containing all Jordan blocks corresponding to eigenvalues in the right halfplane. 9. is given by sgn(A) = X [ / 0] 0 / X I ..S.. Then A = XJX. Definition 9. ifRe(z) < O.. Definition 9. Let z E C with Re(z) ^ O.YiH. i=1 H In a similar fashion we can compute m etA = LXietJ. Xm] diag(JI. . Suppose A E C"x" has no eigenvalues on the imaginary axis.JiYi .. Definition 9.. Then compatibly. Jm) [YI .
There are other equivalent definitions of the matrix sign function, but the one given here is especially useful in deriving many of its key properties. We state some of the more useful properties of the matrix sign function as theorems. Their straightforward proofs are left to the exercises.

Theorem 9.42. Suppose A ∈ C^{n×n} has no eigenvalues on the imaginary axis, and let S = sgn(A). Then the following hold:
1. S is diagonalizable with eigenvalues equal to ±1.
2. S² = I.
3. AS = SA.
4. sgn(A^H) = (sgn(A))^H.
5. sgn(T^{-1}AT) = T^{-1} sgn(A) T for all nonsingular T ∈ C^{n×n}.
6. sgn(cA) = sgn(c) sgn(A) for all nonzero real scalars c.

Theorem 9.43. Suppose A ∈ C^{n×n} has no eigenvalues on the imaginary axis, and let S = sgn(A). Then the following hold:
1. R(S − I) is an A-invariant subspace corresponding to the left half-plane eigenvalues of A (the negative invariant subspace).
2. R(S + I) is an A-invariant subspace corresponding to the right half-plane eigenvalues of A (the positive invariant subspace).
3. negA ≡ (I − S)/2 is a projection onto the negative invariant subspace of A.
4. posA ≡ (I + S)/2 is a projection onto the positive invariant subspace of A.

The JCF definition of the matrix sign function does not generally lend itself to reliable computation on a finite-word-length digital computer. In fact, its reliable numerical calculation is an interesting topic in its own right.

EXERCISES

1. Let A ∈ C^{n×n} have distinct eigenvalues λ_1, ..., λ_n with corresponding right eigenvectors x_1, ..., x_n and left eigenvectors y_1, ..., y_n, respectively. Let v ∈ C^n be an arbitrary vector. Show that v can be expressed (uniquely) as a linear combination of the right eigenvectors. Find the appropriate expression for v as a linear combination of the left eigenvectors as well.
2. Suppose A ∈ C^{n×n} is Hermitian, i.e., A^H = A. Let λ be an eigenvalue of A with corresponding right eigenvector x. Show that x is also a left eigenvector for λ. Prove the same result if A is skew-Hermitian.

3. Suppose A ∈ C^{n×n} is skew-Hermitian, i.e., A^H = −A. Prove that all eigenvalues of a skew-Hermitian matrix must be pure imaginary.

4. Suppose a matrix A ∈ R^{5×5} has eigenvalues {2, 2, 2, 2, 3}. Determine all possible JCFs for A.

5. Determine the eigenvalues, right eigenvectors and right principal vectors if necessary, and (real) JCFs of the following matrices:

    (a) [2 −1; 1 0]    (b) [1 −1; 1 1]

6. Determine the JCF of the matrix

    [2 1; 2 1].

7. Let

    A = [2 −1 0; 1 0 0; −1 1 1].

Find a nonsingular matrix X such that X^{-1}AX = J, where

    J = [1 1 0; 0 1 0; 0 0 1]

is the JCF of A. Hint: Use [1 1 −1]^T as an eigenvector. The vectors [1 1 0]^T and [0 0 1]^T are both eigenvectors, but then the equation (A − I)x^{(2)} = x^{(1)} can't be solved.

8. Show that all right eigenvectors of the Jordan block matrix in Theorem 9.30 must be multiples of e_1 ∈ R^k. Characterize all left eigenvectors.

9. Let A ∈ R^{n×n} be of the form A = xy^T, where x, y ∈ R^n are nonzero vectors with x^T y = 0. Determine the JCF of A.

10. Let A ∈ R^{n×n} be of the form A = I + xy^T, where x, y ∈ R^n are nonzero vectors with x^T y = 0. Determine the JCF of A.

11. Suppose a matrix A ∈ R^{16×16} has 16 eigenvalues at 0 and its JCF consists of a single Jordan block of the form specified in Theorem 9.22. Suppose the small number 10^{−16} is added to the (16,1) element of J. What are the eigenvalues of this slightly perturbed matrix?
12. Prove that every matrix A ∈ R^{n×n} is similar to its transpose and determine a similarity transformation explicitly. Hint: Use the fact that any matrix is similar to its JCF; thus, it suffices to prove the result for the JCF. The transformation P in (9.18) is useful.

13. Show that every matrix A ∈ R^{n×n} can be factored in the form A = S_1 S_2, where S_1 and S_2 are real symmetric matrices and one of them, say S_1, is nonsingular. Hint: Suppose A = XJX^{-1} is a reduction of A to JCF and suppose we can construct the "symmetric factorization" of J. Then A = (X S_1 X^T)(X^{-T} S_2 X^{-1}) would be the required symmetric factorization of A. Thus, it suffices to prove the result for the JCF. Use the factorization in the previous exercise.

14. Consider the block upper triangular matrix

    A = [A_11 A_12; 0 A_22],

where A ∈ R^{n×n} and A_11 ∈ R^{k×k} with 1 ≤ k < n. Suppose A_12 ≠ 0 and that we want to block diagonalize A via the similarity transformation

    T^{-1} A T = [A_11 0; 0 A_22],

where T = [I X; 0 I], X ∈ R^{k×(n−k)}. Find a matrix equation that X must satisfy for this to be possible. If n = 2 and k = 1, what can you say further, in terms of A_11 and A_22, about when the equation for X is solvable?

15. Prove Theorem 9.42.

16. Prove Theorem 9.43.

17. Suppose A ∈ C^{n×n} has all its eigenvalues in the left half-plane. Prove that sgn(A) = −I.
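Exercise 17 can be sanity-checked numerically. The sketch below uses the classical Newton iteration S ← (S + S^{-1})/2 for the matrix sign (a standard method, not one developed in this chapter; the test matrix is our own choice):

```python
import numpy as np

def matrix_sign_newton(A, iters=50):
    """Newton iteration S <- (S + S^{-1})/2; converges to sgn(A) when A
    has no eigenvalues on the imaginary axis."""
    S = np.array(A, dtype=float)
    for _ in range(iters):
        S = 0.5 * (S + np.linalg.inv(S))
    return S

# Exercise 17: all eigenvalues in the left half-plane  =>  sgn(A) = -I.
A = np.array([[-2.0, 1.0, 0.0],
              [0.0, -3.0, 2.0],
              [0.0, 0.0, -1.0]])   # eigenvalues -2, -3, -1
S = matrix_sign_newton(A)
assert np.allclose(S, -np.eye(3))
```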
Chapter 10

Canonical Forms

10.1 Some Basic Canonical Forms

Problem: Let V and W be vector spaces and suppose A : V → W is a linear transformation. Find bases in V and W with respect to which Mat A has a "simple form" or "canonical form." In matrix terms, if A ∈ R^{m×n}, find P ∈ R_m^{m×m} and Q ∈ R_n^{n×n} such that PAQ has a "canonical form." The transformation A ↦ PAQ is called an equivalence; it is called an orthogonal equivalence if P and Q are orthogonal matrices.

Remark 10.1. Two special cases are of interest:
1. If W = V and Q = P^{-1}, the transformation A ↦ PAP^{-1} is called a similarity.
2. If W = V and if Q = P^T is orthogonal, the transformation A ↦ PAP^T is called an orthogonal similarity (or unitary similarity in the complex case).

We can also consider the case A ∈ C^{m×n} and unitary equivalence if P and Q are unitary.

The following results are typical of what can be achieved under a unitary similarity. If A = A^H ∈ C^{n×n} has eigenvalues λ_1, ..., λ_n, then there exists a unitary matrix U such that U^H A U = D, where D = diag(λ_1, ..., λ_n). This is proved in Theorem 10.2. What other matrices are "diagonalizable" under unitary similarity? The answer is given in Theorem 10.9, where it is proved that a general matrix A ∈ C^{n×n} is unitarily similar to a diagonal matrix if and only if it is normal (i.e., AA^H = A^H A). Normal matrices include Hermitian, skew-Hermitian, and unitary matrices (and their "real" counterparts: symmetric, skew-symmetric, and orthogonal, respectively), as well as other matrices that merely satisfy the definition, such as A = [a b; −b a] for real scalars a and b. If a matrix A is not normal, the most "diagonal" we can get is the JCF described in Chapter 9.

Theorem 10.2. Let A = A^H ∈ C^{n×n} have (real) eigenvalues λ_1, ..., λ_n. Then there exists a unitary matrix X such that X^H A X = D = diag(λ_1, ..., λ_n) (the columns of X are orthonormal eigenvectors for A).
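Theorem 10.2 is easy to exercise numerically: `numpy.linalg.eigh` returns exactly such a unitary diagonalization for a Hermitian matrix (the matrix below is an arbitrary illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0 + 1.0j],
              [1.0 - 1.0j, 3.0]])           # Hermitian: A = A^H
evals, X = np.linalg.eigh(A)                 # X^H A X = diag(evals)

assert np.allclose(X.conj().T @ X, np.eye(2))           # X is unitary
assert np.allclose(X.conj().T @ A @ X, np.diag(evals))  # diagonalized
assert np.allclose(np.asarray(evals).imag, 0)           # eigenvalues are real
```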
Proof: Let x_1 be a right eigenvector corresponding to λ_1, and normalize it such that x_1^H x_1 = 1. Then there exist n − 1 additional vectors x_2, ..., x_n such that X = [x_1, ..., x_n] = [x_1 X_2] is unitary. Now

    X^H A X = [x_1^H; X_2^H] A [x_1 X_2] = [x_1^H A x_1   x_1^H A X_2;  X_2^H A x_1   X_2^H A X_2]      (10.1)
            = [λ_1  0;  0  X_2^H A X_2].      (10.2)

In (10.1) we have used the fact that Ax_1 = λ_1 x_1. When combined with the fact that x_1^H x_1 = 1, we get λ_1 remaining in the (1,1)-block. We also get 0 in the (2,1)-block by noting that x_1 is orthogonal to all vectors in X_2. In (10.2), we get 0 in the (1,2)-block by noting that X^H A X is Hermitian. The proof is completed easily by induction upon noting that the (2,2)-block must have eigenvalues λ_2, ..., λ_n. □

Given a unit vector x_1 ∈ R^n, the construction of X_2 ∈ R^{n×(n−1)} such that X = [x_1 X_2] is orthogonal is frequently required. The construction can actually be performed quite easily by means of Householder (or Givens) transformations as in the proof of the following general result.

Theorem 10.3. Let X_1 ∈ C^{n×k} have orthonormal columns and suppose U is a unitary matrix such that U X_1 = [R; 0], where R ∈ C^{k×k} is upper triangular. Write U^H = [U_1 U_2] with U_1 ∈ C^{n×k}. Then [X_1 U_2] is unitary.

Proof: Let X_1 = [x_1, ..., x_k]. Construct a sequence of Householder matrices (also known as elementary reflectors) H_1, ..., H_k in the usual way (see below) such that

    H_k ··· H_1 [x_1, ..., x_k] = [R; 0],

where R is upper triangular (and nonsingular since x_1, ..., x_k are orthonormal). Let U = H_k ··· H_1. Then U^H = H_1 ··· H_k, and x_i^H U_2 = 0 (i ∈ k) means that x_i is orthogonal to each of the n − k columns of U_2. But the latter are orthonormal since they are the last n − k rows of the unitary matrix U. Thus, [X_1 U_2] is unitary. □

The construction called for in Theorem 10.2 is then a special case of Theorem 10.3 for k = 1. We illustrate the construction of the necessary Householder matrix for k = 1. For simplicity, we consider the real case. Let the unit vector x_1 be denoted by [ξ_1, ..., ξ_n]^T. Then the necessary Householder matrix needed for the construction of X_2 is given by

    U = I − (2/(u^T u)) u u^T,   where u = [ξ_1 ± 1, ξ_2, ..., ξ_n]^T.

It can easily be checked that U is symmetric and U^T U = U² = I, so U is orthogonal. To see that U effects the necessary compression of x_1, it is easily verified that u^T u = 2 ± 2ξ_1 and u^T x_1 = 1 ± ξ_1. Thus,

    U x_1 = x_1 − (2 u^T x_1 / u^T u) u = x_1 − u = [∓1, 0, ..., 0]^T.

Further details on Householder matrices, including the choice of sign and the complex case, can be consulted in standard numerical linear algebra texts such as [7], [11], [23], [25].

The real version of Theorem 10.2 is worth stating separately since it is applied frequently in applications.

Theorem 10.4. Let A = A^T ∈ R^{n×n} have eigenvalues λ_1, ..., λ_n. Then there exists an orthogonal matrix X ∈ R^{n×n} (whose columns are orthonormal eigenvectors of A) such that X^T A X = D = diag(λ_1, ..., λ_n).

Note that Theorem 10.4 implies that a symmetric matrix A (with the obvious analogue from Theorem 10.2 for Hermitian matrices) can be written in the form

    A = X D X^T = Σ_{i=1}^n λ_i x_i x_i^T,      (10.3)

which is often called the spectral representation of A. In fact, A in (10.3) is actually a weighted sum of orthogonal projections P_i (onto the one-dimensional eigenspaces corresponding to the λ_i's), i.e.,

    A = Σ_{i=1}^n λ_i P_i,

where P_i = P_{R(x_i)} = x_i x_i^T since x_i^T x_i = 1.
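The spectral representation (10.3) can be reproduced directly in code; the sketch below (with an arbitrary symmetric test matrix) rebuilds A from its rank-one projections:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])     # symmetric example
lam, X = np.linalg.eigh(A)          # orthonormal eigenvectors in columns of X

# P_i = x_i x_i^T are orthogonal projections onto the eigenspaces.
P = [np.outer(X[:, i], X[:, i]) for i in range(3)]
A_rebuilt = sum(l * Pi for l, Pi in zip(lam, P))

assert np.allclose(A_rebuilt, A)                       # A = sum_i lam_i P_i
assert all(np.allclose(Pi @ Pi, Pi) for Pi in P)       # each P_i is a projection
```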
The following pair of theorems form the theoretical foundation of the double-Francis-QR algorithm used to compute matrix eigenvalues in a numerically stable and reliable way.

Theorem 10.5 (Schur). Let A ∈ C^{n×n}. Then there exists a unitary matrix U such that U^H A U = T, where T is upper triangular.

Proof: The proof of this theorem is essentially the same as that of Theorem 10.2 except that in this case (using the notation U rather than X) the (1,2)-block u_1^H A U_2 is not 0. □

In the case of A ∈ R^{n×n}, A is thus unitarily similar to an upper triangular matrix, but if A has a complex conjugate pair of eigenvalues, then complex arithmetic is clearly needed to place such eigenvalues on the diagonal of T. However, the next theorem shows that every A ∈ R^{n×n} is also orthogonally similar (i.e., using only real arithmetic) to a quasi-upper-triangular matrix. A quasi-upper-triangular matrix is block upper triangular with 1 × 1 diagonal blocks corresponding to its real eigenvalues and 2 × 2 diagonal blocks corresponding to its complex conjugate pairs of eigenvalues.

Theorem 10.6 (Murnaghan–Wintner). Let A ∈ R^{n×n}. Then there exists an orthogonal matrix U such that U^T A U = S, where S is quasi-upper-triangular.

Definition 10.7. The triangular matrix T in Theorem 10.5 is called a Schur canonical form or Schur form. The quasi-upper-triangular matrix S in Theorem 10.6 is called a real Schur canonical form or real Schur form (RSF). The columns of a unitary [orthogonal] matrix U that reduces a matrix to [real] Schur form are called Schur vectors.

Example 10.8. The matrix

    S = [0 2 5; −1 2 4; 0 0 1]

is in RSF. Its real JCF is

    J = [1 1 0; −1 1 0; 0 0 1].

Note that only the first Schur vector (and then only if the corresponding first eigenvalue is real if U is orthogonal) is an eigenvector. However, what is true, and sufficient for virtually all applications (see, for example, [17]), is that the first k Schur vectors span the same A-invariant subspace as the eigenvectors corresponding to the first k eigenvalues along the diagonal of T (or S).

While every matrix can be reduced to Schur form (or RSF), it is of interest to know when we can go further and reduce a matrix via unitary similarity to diagonal form. The following theorem answers this question.

Theorem 10.9. A matrix A ∈ C^{n×n} is unitarily similar to a diagonal matrix if and only if A is normal (i.e., A^H A = AA^H).

Proof: Suppose U is a unitary matrix such that U^H A U = D, where D is diagonal. Then

    A A^H = U D U^H U D^H U^H = U D D^H U^H = U D^H D U^H = A^H A,

so A is normal. Conversely, suppose A is normal and let U be a unitary matrix such that U^H A U = T, where T is an upper triangular matrix (Theorem 10.5). It is then a routine exercise to show that T must, in fact, be diagonal. □
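A small numerical illustration of the real Schur form and of normality, using SciPy's `schur` (the matrices are illustrative choices, not from the text):

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, -2.0, 1.0],
              [1.0,  0.0, 3.0],
              [0.0,  0.0, 1.0]])
T, U = schur(A)                     # real Schur form: A = U T U^T, U orthogonal

assert np.allclose(U @ T @ U.T, A)
assert np.allclose(U.T @ U, np.eye(3))

# A normal matrix of the form [a b; -b a] commutes with its transpose
# and is therefore unitarily diagonalizable (Theorem 10.9).
N = np.array([[1.0, -2.0],
              [2.0,  1.0]])
assert np.allclose(N @ N.T, N.T @ N)
```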
10.2 Definite Matrices

Definition 10.11. A symmetric matrix A ∈ R^{n×n} is
1. positive definite if and only if x^T A x > 0 for all nonzero x ∈ R^n. We write A > 0.
2. nonnegative definite (or positive semidefinite) if and only if x^T A x ≥ 0 for all nonzero x ∈ R^n. We write A ≥ 0.
3. negative definite if −A is positive definite. We write A < 0.
4. nonpositive definite (or negative semidefinite) if −A is nonnegative definite. We write A ≤ 0.

Also, if A and B are symmetric matrices, we write A > B if and only if A − B > 0 or B − A < 0. Similarly, we write A ≥ B if and only if A − B ≥ 0 or B − A ≤ 0. If a matrix is neither definite nor semidefinite, it is said to be indefinite.

Remark 10.12. If A = A^H ∈ C^{n×n}, all the above definitions hold except that superscript Hs replace Ts. Indeed, this is generally true for all results in the remainder of this section that may be stated in the real case for simplicity.

Theorem 10.13. Let A = A^H ∈ C^{n×n} with eigenvalues λ_1 ≥ λ_2 ≥ ··· ≥ λ_n. Then for all x ∈ C^n,

    λ_n x^H x ≤ x^H A x ≤ λ_1 x^H x.

Proof: Let U be a unitary matrix that diagonalizes A as in Theorem 10.2. Furthermore, let y = U^H x, where x is an arbitrary vector in C^n, and denote the components of y by η_i, i ∈ n. Then

    x^H A x = (U^H x)^H U^H A U (U^H x) = y^H D y = Σ_{i=1}^n λ_i |η_i|².

But clearly

    Σ_{i=1}^n λ_i |η_i|² ≤ λ_1 y^H y = λ_1 x^H x

and

    Σ_{i=1}^n λ_i |η_i|² ≥ λ_n y^H y = λ_n x^H x,

from which the theorem follows. □
Remark 10.14. The ratio x^H A x / x^H x for A = A^H ∈ C^{n×n} and nonzero x ∈ C^n is called the Rayleigh quotient of x. Theorem 10.13 provides upper (λ_1) and lower (λ_n) bounds for the Rayleigh quotient. If A = A^H ∈ C^{n×n} is positive definite, then x^H A x > 0 for all nonzero x ∈ C^n, so 0 < λ_n ≤ ··· ≤ λ_1.

Theorem 10.15. Let A ∈ C^{n×n}. Then ||A||_2 = λ_max^{1/2}(A^H A).

Proof: For all x ∈ C^n we have

    ||Ax||_2 = (x^H A^H A x)^{1/2} ≤ λ_max^{1/2}(A^H A) ||x||_2

by Theorem 10.13. Now let x be an eigenvector corresponding to λ_max(A^H A); then the bound is attained, whence

    ||A||_2 = max_{x ≠ 0} ||Ax||_2 / ||x||_2 = λ_max^{1/2}(A^H A). □

Definition 10.16. A principal submatrix of an n × n matrix A is the (n − k) × (n − k) matrix that remains by deleting k rows and the corresponding k columns. A leading principal submatrix of order n − k is obtained by deleting the last k rows and columns.

Theorem 10.17. A symmetric matrix A ∈ R^{n×n} is positive definite if and only if any of the following three equivalent conditions hold:
1. The determinants of all leading principal submatrices of A are positive.
2. All eigenvalues of A are positive.
3. A can be written in the form M^T M, where M ∈ R^{n×n} is nonsingular.

Theorem 10.18. A symmetric matrix A ∈ R^{n×n} is nonnegative definite if and only if any of the following three equivalent conditions hold:
1. The determinants of all principal submatrices of A are nonnegative.
2. All eigenvalues of A are nonnegative.
3. A can be written in the form M^T M, where M ∈ R^{k×n} and k ≥ rank(A) = rank(M).

Remark 10.19. Note that the determinants of all principal submatrices must be nonnegative in Theorem 10.18.1, not just those of the leading principal submatrices. For example, consider the matrix

    A = [0 0; 0 −1].

The determinant of the 1 × 1 leading submatrix is 0 and the determinant of the 2 × 2 leading submatrix is also 0 (cf. Theorem 10.17), but the principal submatrix consisting of the (2,2) element is, in fact, negative and A is nonpositive definite.
Remark 10.20. The factor M in Theorem 10.18.3 is not unique. For example, if

    A = [1 0; 0 0],

then M can be taken to be

    [1 0]   or   [1/√2 0; 1/√2 0],

and so on. In general, if A ∈ R^{n×n}, we say that S ∈ R^{n×n} is a square root of A if S² = A. Matrices (both symmetric and nonsymmetric) have, in fact, infinitely many square roots. For example, any matrix of the form

    [cos θ  sin θ;  sin θ  −cos θ]

is a square root of I_2.

The following standard theorem is stated without proof (see, for example, [16, p. 181]). It concerns the notion of the "square root" of a matrix.

Theorem 10.21. Let A ∈ R^{n×n} be nonnegative definite. Then A has a unique nonnegative definite square root S. Moreover, SA = AS and rank S = rank A (and hence S is positive definite if A is positive definite).

Recall that A ≥ B if the matrix A − B is nonnegative definite. The following theorem is useful in "comparing" symmetric matrices. Its proof is straightforward from basic definitions.

Theorem 10.22. Let A, B ∈ R^{n×n} be symmetric.
1. If A ≥ B and M ∈ R^{n×m}, then M^T A M ≥ M^T B M.
2. If A > B and M ∈ R_m^{n×m} (i.e., M has full column rank), then M^T A M > M^T B M.

A stronger form of the third characterization in Theorem 10.17 is available and is known as the Cholesky factorization. It is stated and proved below for the more general Hermitian case.

Theorem 10.23. Let A ∈ C^{n×n} be Hermitian and positive definite. Then there exists a unique nonsingular lower triangular matrix L with positive diagonal elements such that A = L L^H.

Proof: The proof is by induction. The case n = 1 is trivially true. Write the matrix A in the form

    A = [B  b; b^H  a_nn].

By our induction hypothesis, assume the result is true for matrices of order n − 1 so that B may be written as B = L_1 L_1^H, where L_1 ∈ C^{(n−1)×(n−1)} is nonsingular and lower triangular with positive diagonal elements. It remains to prove that we can write the n × n matrix A in the form

    [B  b; b^H  a_nn] = [L_1  0; c^H  α] [L_1^H  c; 0  α],

where α is positive. Performing the indicated matrix multiplication and equating the corresponding submatrices, we see that we must have L_1 c = b and a_nn = c^H c + α². Clearly c is given simply by c = L_1^{−1} b. Substituting in the expression involving α, we find α² = a_nn − b^H L_1^{−H} L_1^{−1} b = a_nn − b^H B^{−1} b (= the Schur complement of B in A). But we know that

    0 < det(A) = det [B  b; b^H  a_nn] = det(B) det(a_nn − b^H B^{−1} b).

Since det(B) > 0, we must have a_nn − b^H B^{−1} b > 0. Choosing α to be the positive square root of a_nn − b^H B^{−1} b completes the proof. □
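The characterizations of Theorem 10.17 and the Cholesky factor of Theorem 10.23 are both directly computable; `numpy.linalg.cholesky` returns exactly the lower triangular L of the proof above (the test matrix is an arbitrary positive definite example):

```python
import numpy as np

A = np.array([[4.0,  2.0, -2.0],
              [2.0,  5.0,  1.0],
              [-2.0, 1.0,  6.0]])   # symmetric positive definite example

# Theorem 10.17: leading principal minors and eigenvalues are all positive.
assert all(np.linalg.det(A[:k, :k]) > 0 for k in (1, 2, 3))
assert np.all(np.linalg.eigvalsh(A) > 0)

# Theorem 10.23: A = L L^H with L lower triangular, positive diagonal.
L = np.linalg.cholesky(A)
assert np.allclose(L @ L.T, A)
assert np.allclose(L, np.tril(L)) and np.all(np.diag(L) > 0)

# The last pivot is the Schur complement a_nn - b^H B^{-1} b from the proof.
B, b, ann = A[:2, :2], A[:2, 2], A[2, 2]
assert np.isclose(L[2, 2] ** 2, ann - b @ np.linalg.solve(B, b))
```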
10.3 Equivalence Transformations and Congruence

Theorem 10.24. Let A ∈ C_r^{m×n}. Then there exist matrices P ∈ C_m^{m×m} and Q ∈ C_n^{n×n} such that

    P A Q = [I  0; 0  0].      (10.4)

Proof: A classical proof can be consulted in, for example, [21, p. 131]. Alternatively, suppose A has an SVD of the form (5.2) in its complex version. Then

    [S^{−1}  0; 0  I] U^H A V = [I  0; 0  0].

Take P = [S^{−1}  0; 0  I] U^H and Q = V to complete the proof. □

Note that the greater freedom afforded by the equivalence transformation of Theorem 10.24, as opposed to the more restrictive situation of a similarity transformation, yields a far "simpler" canonical form (10.4). However, numerical procedures for computing such an equivalence directly via, say, Gaussian or elementary row and column operations, are generally unreliable. The numerically preferred equivalence is, of course, the unitary equivalence known as the SVD. However, the SVD is relatively expensive to compute, and other canonical forms exist that are intermediate between (10.4) and the SVD; see, for example, [7, Ch. 5], [4, Ch. 2]. Two such forms are stated here. They are more stably computable than (10.4) and more efficiently computable than a full SVD.
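The SVD-based argument in the proof of Theorem 10.24 translates directly into code; a sketch (the rank-1 matrix is an arbitrary illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])          # rank-1 example
U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))               # numerical rank

# P = diag(S^{-1}, I) U^H and Q = V, as in the proof of Theorem 10.24.
scale = np.concatenate([1.0 / s[:r], np.ones(A.shape[0] - r)])
P = np.diag(scale) @ U.conj().T
Q = Vh.conj().T
canon = P @ A @ Q

target = np.zeros_like(A)
target[:r, :r] = np.eye(r)
assert np.allclose(canon, target, atol=1e-10)   # PAQ = [I 0; 0 0]
```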
The numerically preferred equivalence is, of course, the unitary equivalence known as the SVD. However, the SVD is relatively expensive to compute, and other canonical forms exist that are intermediate between (10.4) and the SVD; see, for example, [7, Ch. 5], [4, Ch. 2]. Two such forms are stated here. They are more stably computable than (10.4) and more efficiently computable than a full SVD. Many similar results are also available.
Theorem 10.25 (Complete Orthogonal Decomposition). Let A ∈ C_r^{m×n}. Then there exist unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n} such that

    U^H A V = [ R   0
                0   0 ],   (10.5)

where R ∈ C_r^{r×r} is upper (or lower) triangular with positive diagonal elements.

Proof: For the proof, see [4].

Theorem 10.26. Let A ∈ C_r^{m×n}. Then there exist a unitary matrix Q ∈ C^{m×m} and a permutation matrix Π ∈ C^{n×n} such that

    Q A Π = [ R   S
              0   0 ],   (10.6)

where R ∈ C_r^{r×r} is upper triangular and S ∈ C^{r×(n−r)} is arbitrary but in general nonzero.

Proof: For the proof, see [4].

Remark 10.27. When A has full column rank but is "near" a rank-deficient matrix, various rank-revealing QR decompositions are available that can sometimes detect such phenomena at a cost considerably less than that of a full SVD; see [4] for details.
Definition 10.28. Let A ∈ C^{n×n} and let X ∈ C^{n×n} be nonsingular. The transformation A ↦ X^H A X is called a congruence. Note that a congruence is a similarity if and only if X is unitary.

Note that congruence preserves the property of being Hermitian; i.e., if A is Hermitian, then X^H A X is also Hermitian. It is of interest to ask what other properties of a matrix are preserved under congruence. It turns out that the principal property so preserved is the sign of each eigenvalue.

Definition 10.29. Let A = A^H ∈ C^{n×n} and let π, ν, and ζ denote the numbers of positive, negative, and zero eigenvalues, respectively, of A. Then the inertia of A is the triple of numbers In(A) = (π, ν, ζ). The signature of A is given by sig(A) = π − ν.

Example 10.30.

1. In [ 1 0 0 0 ; 0 1 0 0 ; 0 0 −1 0 ; 0 0 0 0 ] = (2, 1, 1).
2. If A = A^H ∈ C^{n×n}, then A > 0 if and only if In(A) = (n, 0, 0).
3. If In(A) = (π, ν, ζ), then rank(A) = π + ν.

Theorem 10.31 (Sylvester's Law of Inertia). Let A = A^H ∈ C^{n×n} and let X ∈ C^{n×n} be nonsingular. Then In(A) = In(X^H A X).

Proof: For the proof, see, for example, [21, p. 134].

Theorem 10.31 guarantees that the rank and signature of a matrix are preserved under congruence. We then have the following.
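Sylvester's Law of Inertia is easy to verify numerically in small cases. The sketch below (an illustration only; it is restricted to real symmetric 2 × 2 matrices so that the eigenvalues come from the quadratic formula, and all names and test values are this example's own choices) computes the inertia before and after a deliberately non-unitary congruence:

```python
import math

def inertia_sym2(A):
    """Inertia (pi, nu, zeta) of a real symmetric 2x2 matrix, using the
    closed-form eigenvalues (t +/- sqrt(t^2 - 4d))/2 with t = trace(A),
    d = det(A)."""
    t = A[0][0] + A[1][1]
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    r = math.sqrt(t * t - 4.0 * d)           # real since A is symmetric
    eigs = [(t + r) / 2.0, (t - r) / 2.0]
    pi = sum(1 for e in eigs if e > 1e-12)
    nu = sum(1 for e in eigs if e < -1e-12)
    return (pi, nu, 2 - pi - nu)

def congruence2(X, A):
    """X^T A X for real 2x2 matrices."""
    XT = [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]
    def mul(P, Q):
        return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]
    return mul(mul(XT, A), X)

A = [[1.0, 2.0], [2.0, -3.0]]   # indefinite: In(A) = (1, 1, 0)
X = [[3.0, 1.0], [0.0, 2.0]]    # nonsingular but not unitary
B = congruence2(X, A)           # eigenvalues change, inertia does not
```

The eigenvalues of B differ from those of A, but the counts of positive and negative eigenvalues agree, exactly as Theorem 10.31 predicts.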
Theorem 10.32. Let A = A^H ∈ C^{n×n} with In(A) = (π, ν, ζ). Then there exists a nonsingular matrix X ∈ C^{n×n} such that X^H A X = diag(1, ..., 1, −1, ..., −1, 0, ..., 0), where the number of 1's is π, the number of −1's is ν, and the number of 0's is ζ.

Proof: Let λ₁, ..., λ_n denote the eigenvalues of A and order them such that the first π are positive, the next ν are negative, and the final ζ are 0. By Theorem 10.2 there exists a unitary matrix U such that U^H A U = diag(λ₁, ..., λ_n). Define the n × n matrix

    W = diag(1/√λ₁, ..., 1/√λ_π, 1/√−λ_{π+1}, ..., 1/√−λ_{π+ν}, 1, ..., 1).

Then it is easy to check that X = U W yields the desired result.

10.3.1 Block matrices and definiteness

Theorem 10.33. Suppose A = A^T and D = D^T. Then

    [ A     B
      B^T   D ] > 0

if and only if either A > 0 and D − B^T A^{−1} B > 0, or D > 0 and A − B D^{−1} B^T > 0.

Proof: The proof follows by considering, for example, the congruence

    [ A     B      ↦   [ I   −A^{−1}B ]^T [ A     B    ] [ I   −A^{−1}B ]
      B^T   D ]        [ 0    I       ]   [ B^T   D    ] [ 0    I       ].

The details are straightforward and are left to the reader.

Remark 10.34. Note the symmetric Schur complements of A (or D) in the theorem.

Theorem 10.35. Suppose A = A^T and D = D^T. Then

    [ A     B
      B^T   D ] ≥ 0

if and only if A ≥ 0, A A⁺ B = B, and D − B^T A⁺ B ≥ 0.

Proof: Consider the congruence with

    [ I   −A⁺B
      0    I    ]

and proceed as in the proof of Theorem 10.33.

10.4 Rational Canonical Form

One final canonical form to be mentioned is the rational canonical form.
Definition 10.36. A matrix A ∈ R^{n×n} is said to be nonderogatory if its minimal polynomial and characteristic polynomial are the same or, equivalently, if its Jordan canonical form has only one block associated with each distinct eigenvalue.

Suppose A ∈ R^{n×n} is a nonderogatory matrix and suppose its characteristic polynomial is π(λ) = λⁿ − (a₀ + a₁λ + ... + a_{n−1}λ^{n−1}). Then it can be shown (see [12]) that A is similar to a matrix of the form

    [ 0    1    0    ...  0
      0    0    1    ...  0
      .               .
      0    0    0    ...  1
      a₀   a₁   a₂   ...  a_{n−1} ].   (10.7)

Definition 10.37. A matrix A ∈ R^{n×n} of the form (10.7) is called a companion matrix or is said to be in companion form.

Companion matrices also appear in the literature in several equivalent forms. To illustrate, consider the companion matrix

    [ 0   0   0   a₀
      1   0   0   a₁
      0   1   0   a₂
      0   0   1   a₃ ].   (10.8)

This matrix is a special case of a matrix in lower Hessenberg form. Using the reverse-order identity similarity P given by (9.18), A is easily seen to be similar to the following matrix in upper Hessenberg form:

    [ a₃   a₂   a₁   a₀
      1    0    0    0
      0    1    0    0
      0    0    1    0 ].   (10.9)
Moreover, since a matrix is similar to its transpose (see Exercise 13 in Chapter 9), the following are also companion matrices similar to the above:

    [ 0    1    0    0                [ a₃   1   0   0
      0    0    1    0                  a₂   0   1   0
      0    0    0    1      (10.10)     a₁   0   0   1      (10.11)
      a₀   a₁   a₂   a₃ ]               a₀   0   0   0 ].

Notice that in all cases a companion matrix is nonsingular if and only if a₀ ≠ 0. In fact, the inverse of a nonsingular companion matrix is again in companion form. For example,

    [ 0   0   0   a₀ ]⁻¹     [ −a₁/a₀   1   0   0
      1   0   0   a₁      =    −a₂/a₀   0   1   0
      0   1   0   a₂           −a₃/a₀   0   0   1
      0   0   1   a₃ ]          1/a₀    0   0   0 ].
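A useful consequence of the form (10.7) is that [1, λ, λ², ..., λ^{n−1}]^T is a right eigenvector of the companion matrix for each root λ of π. This is easy to confirm directly; the sketch below (a minimal pure-Python illustration, with hypothetical helper names and the polynomial λ² − λ − 2 = (λ − 2)(λ + 1) chosen for the check):

```python
def companion(coeffs):
    """Companion matrix in the form (10.7) for
    pi(lam) = lam^n - (a_0 + a_1*lam + ... + a_{n-1}*lam^{n-1}),
    where coeffs = [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1.0          # superdiagonal of ones
    A[n - 1] = list(coeffs)        # last row carries the coefficients
    return A

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

# pi(lam) = lam^2 - (2 + lam), with roots 2 and -1
A = companion([2.0, 1.0])
lam = 2.0
v = [1.0, lam]                     # the Vandermonde-style eigenvector [1, lam]
w = matvec(A, v)                   # equals lam * v
```

Applying A shifts the powers of λ up by one, and the last row evaluates a₀ + a₁λ + ... + a_{n−1}λ^{n−1}, which equals λⁿ precisely when λ is a root of π; hence A v = λ v.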
Companion matrices appear frequently in the control and signal processing literature, but unfortunately they are often very difficult to work with numerically. Algorithms to reduce an arbitrary matrix to companion form are numerically unstable. Moreover, companion matrices are known to possess many undesirable numerical properties. For example, in general and especially as n increases, their eigenstructure is extremely ill conditioned, nonsingular ones are nearly singular, stable ones are nearly unstable, and so forth [14].

Companion matrices have many other interesting properties, among which, and perhaps surprisingly, is the fact that their singular values can be found in closed form; see [14].

Theorem 10.38. Let σ₁ ≥ σ₂ ≥ ... ≥ σ_n be the singular values of the companion matrix A in (10.7). Let a = a₁² + a₂² + ... + a_{n−1}² and γ = 1 + a₀² + a. Then

    σ₁² = (γ + √(γ² − 4a₀²)) / 2,
    σᵢ² = 1  for i = 2, 3, ..., n − 1,
    σ_n² = (γ − √(γ² − 4a₀²)) / 2.

If a₀ ≠ 0, the largest and smallest singular values can also be written in the equivalent form

    σ₁² = 2a₀² / (γ − √(γ² − 4a₀²)),    σ_n² = 2a₀² / (γ + √(γ² − 4a₀²)).

Remark 10.39. Explicit formulas for all the associated right and left singular vectors can also be derived easily.
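The closed-form singular value expressions above can be sanity-checked by hand for n = 2, where the squared singular values are also the eigenvalues of A^T A, computable from the quadratic formula. A small sketch (the coefficient values below are arbitrary choices of this illustration):

```python
import math

# companion matrix (10.7) with n = 2: A = [[0, 1], [a0, a1]]
a0, a1 = 3.0, 2.0
a = a1 * a1                       # a = a_1^2 + ... + a_{n-1}^2
gamma = 1.0 + a0 * a0 + a
disc = math.sqrt(gamma * gamma - 4.0 * a0 * a0)
s1_sq = (gamma + disc) / 2.0      # largest squared singular value
sn_sq = (gamma - disc) / 2.0      # smallest squared singular value

# independent cross-check: eigenvalues of A^T A = [[a0^2, a0*a1],
# [a0*a1, 1 + a1^2]] via trace/determinant
t = a0 * a0 + 1.0 + a1 * a1       # trace of A^T A (equals gamma)
d = a0 * a0                       # det of A^T A (equals a0^2)
r = math.sqrt(t * t - 4.0 * d)
lam_max, lam_min = (t + r) / 2.0, (t - r) / 2.0
```

Note also that σ₁² σ_n² = a₀², consistent with |det A| = |a₀| and the middle singular values being 1.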
If A ∈ R^{n×n} is derogatory, i.e., has more than one Jordan block associated with at least one eigenvalue, then it is not similar to a companion matrix of the form (10.7). However, it can be shown that a derogatory matrix is similar to a block diagonal matrix, each of whose diagonal blocks is a companion matrix. Such matrices are said to be in rational canonical form (or Frobenius canonical form). For details, see, for example, [12].

If a companion matrix of the form (10.7) is singular, i.e., if a₀ = 0, then its pseudoinverse can still be computed. Let a ∈ R^{n−1} denote the vector [a₁, a₂, ..., a_{n−1}]^T and let c = 1/(1 + a^T a). Then it is easily verified that

    [ 0   I_{n−1} ]⁺     [ 0                  0
      0   a^T     ]   =  [ I_{n−1} − c a a^T   c a ],

where the first row of the pseudoinverse is zero. Note that I_{n−1} − c a a^T = (I_{n−1} + a a^T)^{−1}, and hence the pseudoinverse of a singular companion matrix is not a companion matrix unless a = 0.
Companion matrices and rational canonical forms are generally to be avoided in floating-point computation.

Remark 10.40. Theorem 10.38 yields some understanding of why difficult numerical behavior might be expected for companion matrices. For example, when solving linear systems of equations of the form (6.2), one measure of numerical sensitivity is κ_p(A) = ‖A‖_p ‖A^{−1}‖_p, the so-called condition number of A with respect to inversion and with respect to the matrix p-norm. If this number is large, say O(10^k), one may lose up to k digits of precision. In the 2-norm, this condition number is the ratio of largest to smallest singular values, which, by the theorem, can be determined explicitly as

    κ₂(A) = (γ + √(γ² − 4a₀²)) / (2|a₀|).

It is easy to show that γ/(2|a₀|) ≤ κ₂(A) ≤ γ/|a₀|, and when a₀ is small or γ is large (or both), then κ₂(A) ≈ γ/|a₀|. It is not unusual for γ to be large for large n. Note that explicit formulas for κ₁(A) and κ_∞(A) can also be determined easily by using (10.11).

EXERCISES

1. Show that if a triangular matrix is normal, then it must be diagonal.

2. Prove that if A ∈ R^{n×n} is normal, then N(A) = N(A^T).

3. Let A ∈ C^{n×n} and define ρ(A) = max_{λ∈Λ(A)} |λ|. Then ρ(A) is called the spectral radius of A. Show that if A is normal, then ρ(A) = ‖A‖₂. Show that the converse is true if n = 2.

4. Let A ∈ C^{n×n} be normal with eigenvalues λ₁, ..., λ_n and singular values σ₁ ≥ σ₂ ≥ ... ≥ σ_n ≥ 0. Show that σᵢ(A) = |λᵢ(A)| for i ∈ {1, ..., n}.

5. Use the reverse-order identity matrix P introduced in (9.18) and the matrix U in Theorem 10.1 to find a unitary matrix Q that reduces A ∈ C^{n×n} to lower triangular form.
6. Let A = [ ... ] ∈ C^{2×2}. Find a unitary matrix U such that U^H A U = [ ... ].

7. Let R, S ∈ R^{n×n} be symmetric. Show that

    [ R   I
      I   S ] > 0

if and only if S > 0 and R > S^{−1}.

8. Suppose A ∈ R^{n×n} is positive definite. Show that A^{−1} must also be positive definite.

9. Suppose A ∈ R^{n×n} is positive definite. Is

    [ A   I
      I   A^{−1} ] ≥ 0?
10. Find the inertia of the following matrices:

    (a) [ ... ],    (b) [ 2      1 − j
                          1 + j   2     ],

    (c) [ ... ],    (d) [ ...     1 − j
                          1 + j   ...   ].
Chapter 11

Linear Differential and Difference Equations

11.1 Differential Equations

In this section we study solutions of the linear homogeneous system of differential equations

    ẋ(t) = A x(t),    x(t₀) = x₀ ∈ Rⁿ   (11.1)

for t ≥ t₀. This is known as an initial-value problem. We restrict our attention in this chapter only to the so-called time-invariant case, where the matrix A ∈ R^{n×n} is constant and does not depend on t. The solution of (11.1) is then known always to exist and be unique. It can be described conveniently in terms of the matrix exponential.

Definition 11.1. For all A ∈ R^{n×n}, the matrix exponential e^A ∈ R^{n×n} is defined by the power series

    e^A = Σ_{k=0}^{+∞} (1/k!) A^k.   (11.2)

The series (11.2) can be shown to converge for all A (it has radius of convergence equal to +∞). The solution of (11.1) involves the matrix

    e^{tA} = Σ_{k=0}^{+∞} (1/k!) t^k A^k,   (11.3)

which thus also converges for all A and uniformly in t.

11.1.1 Properties of the matrix exponential

1. e⁰ = I.
   Proof: This follows immediately from Definition 11.1 by setting A = 0.

2. For all A ∈ R^{n×n}, (e^A)^T = e^{A^T}.
   Proof: This follows immediately from Definition 11.1 and linearity of the transpose.
3. For all A ∈ R^{n×n} and for all t, τ ∈ R, e^{(t+τ)A} = e^{tA} e^{τA} = e^{τA} e^{tA}.
   Proof: Note that

       e^{(t+τ)A} = I + (t + τ)A + ((t + τ)²/2!) A² + ...

   while

       e^{tA} e^{τA} = (I + tA + (t²/2!)A² + ...)(I + τA + (τ²/2!)A² + ...).

   Compare like powers of A in the above two equations and use the binomial theorem on (t + τ)^k.

4. For all A, B ∈ R^{n×n} and for all t ∈ R, e^{t(A+B)} = e^{tA} e^{tB} = e^{tB} e^{tA} if and only if A and B commute, i.e., AB = BA.
   Proof: Note that

       e^{t(A+B)} = I + t(A + B) + (t²/2!)(A + B)² + ...

   while

       e^{tA} e^{tB} = (I + tA + (t²/2!)A² + ...)(I + tB + (t²/2!)B² + ...).

   Compare like powers of t in the first equation and the second and use the binomial theorem on (A + B)^k together with the commutativity of A and B.

5. For all A ∈ R^{n×n} and for all t ∈ R, (e^{tA})^{−1} = e^{−tA}.
   Proof: Simply take τ = −t in property 3.

6. Let L denote the Laplace transform and L^{−1} the inverse Laplace transform. Then for all A ∈ R^{n×n} and for all t ∈ R,

   (a) L{e^{tA}} = (sI − A)^{−1};
   (b) L^{−1}{(sI − A)^{−1}} = e^{tA}.
Proof: We prove only (a); part (b) follows similarly. Assume for convenience that A is diagonalizable, with dyadic decomposition A = Σ_{i=1}^n λᵢ xᵢ yᵢ^H. Then

    L{e^{tA}} = ∫₀^{+∞} e^{−st} e^{tA} dt
              = ∫₀^{+∞} e^{t(A−sI)} dt    since A and sI commute
              = ∫₀^{+∞} Σ_{i=1}^n e^{(λᵢ−s)t} xᵢ yᵢ^H dt
              = Σ_{i=1}^n [ ∫₀^{+∞} e^{(λᵢ−s)t} dt ] xᵢ yᵢ^H    assuming Re s > Re λᵢ for i ∈ {1, ..., n}
              = Σ_{i=1}^n (1/(s − λᵢ)) xᵢ yᵢ^H
              = (sI − A)^{−1}.

The matrix (sI − A)^{−1} is called the resolvent of A and is defined for all s not in Λ(A). Notice in the proof that we have assumed, for convenience, that A is diagonalizable. If this is not the case, the scalar dyadic decomposition can be replaced by

    e^{t(A−sI)} = Σ_{i=1}^m Xᵢ e^{t(Jᵢ−sI)} Yᵢ^H

using the JCF. All succeeding steps in the proof then follow in a straightforward way.

7. For all A ∈ R^{n×n} and for all t ∈ R, (d/dt)(e^{tA}) = A e^{tA} = e^{tA} A.
   Proof: Since the series (11.3) is uniformly convergent, it can be differentiated term by term, from which the result follows immediately. Alternatively, the formal definition

       (d/dt)(e^{tA}) = lim_{Δt→0} (e^{(t+Δt)A} − e^{tA}) / Δt

   can be employed as follows. For any consistent matrix norm,

       ‖ (e^{(t+Δt)A} − e^{tA})/Δt − A e^{tA} ‖
           = ‖ (1/Δt)(e^{ΔtA} − I) e^{tA} − A e^{tA} ‖
           = ‖ ( (Δt/2!) A² + ((Δt)²/3!) A³ + ... ) e^{tA} ‖
           ≤ Δt ‖A²‖ ( 1/2! + (Δt ‖A‖)/3! + ((Δt ‖A‖)²)/4! + ... ) ‖e^{tA}‖
           ≤ Δt ‖A²‖ ‖e^{tA}‖ e^{Δt‖A‖}.

   For fixed t, the right-hand side above clearly goes to 0 as Δt goes to 0. Thus, the limit exists and equals A e^{tA}. A similar proof yields the limit e^{tA} A, or one can use the fact that A commutes with any polynomial of A of finite degree and hence with e^{tA}.
11.1.2 Homogeneous linear differential equations

Theorem 11.2. Let A ∈ R^{n×n}. The solution of the linear homogeneous initial-value problem

    ẋ(t) = A x(t),    x(t₀) = x₀ ∈ Rⁿ   (11.4)

for t ≥ t₀ is given by

    x(t) = e^{(t−t₀)A} x₀.   (11.5)

Proof: Differentiate (11.5) and use property 7 of the matrix exponential to get ẋ(t) = A e^{(t−t₀)A} x₀ = A x(t). Also, x(t₀) = e^{(t₀−t₀)A} x₀ = x₀, so, by the fundamental existence and uniqueness theorem for ordinary differential equations, (11.5) is the solution of (11.4).

11.1.3 Inhomogeneous linear differential equations

Theorem 11.4. Let A ∈ R^{n×n}, B ∈ R^{n×m} and let the vector-valued function u be given and, say, continuous. Then the solution of the linear inhomogeneous initial-value problem

    ẋ(t) = A x(t) + B u(t),    x(t₀) = x₀ ∈ Rⁿ   (11.6)

for t ≥ t₀ is given by the variation of parameters formula

    x(t) = e^{(t−t₀)A} x₀ + ∫_{t₀}^t e^{(t−s)A} B u(s) ds.   (11.7)

Proof: Differentiate (11.7) and again use property 7 of the matrix exponential. The general formula

    d/dt ∫_{p(t)}^{q(t)} f(x, t) dx = ∫_{p(t)}^{q(t)} (∂f(x, t)/∂t) dx + f(q(t), t) (dq(t)/dt) − f(p(t), t) (dp(t)/dt)

is used to get ẋ(t) = A e^{(t−t₀)A} x₀ + ∫_{t₀}^t A e^{(t−s)A} B u(s) ds + B u(t) = A x(t) + B u(t). Also, x(t₀) = x₀ + 0 = x₀, so, by the fundamental existence and uniqueness theorem for ordinary differential equations, (11.7) is the solution of (11.6).

Remark 11.3. The proof above simply verifies the variation of parameters formula by direct differentiation. The formula can be derived by means of an integrating factor "trick" as follows. Premultiply the equation ẋ − Ax = Bu by e^{−tA} to get

    d/dt ( e^{−tA} x ) = e^{−tA} B u.   (11.8)
E ]R. and C e Rnxm.7. The initialvalue problem (11.6. Then the matrix initialvalue E jRmxm.8) over the interval [to. X((t) is symmetric and (11. Theorem 11. t exponential. C e IR" ".2. The of nrohlcm problem X(t) = AX(t). Theorem 11. etAx(t) . .11) X(t) = etACe = e ratB has the solution X ( t ) — atACe tB . Let A. Let A E Wlxn. t]: 113 1 Thus. punov differential equation.4 11. Theorem 11.nxm. t]: Now integrate (11. The first is an obvious generalization of Theorem 11. differential equation. For convenience. Differential Equations [to.5. Differential Equations 11.12) is known as a LyaX t) punov differential equation. E ]R.7.9) for t ::: to is given by for t > to is given by X(t) = e(tto)Ac. X(O) =C (11. B e R m x m . problem problem X(t) = AX(t) + X(t)B. the Proof: Differentiate etACe tB property Proof: Differentiate etACetB with respect to t and use property 7 of the matrix exponential.sA Bu(s) ds x(t) = e(ttolA xo + lto t e(ts)A Bu(s) ds. Let A E Rnxn. the Theorem 11. The solution of the matrix linear homogeneous initialvalue e jRnxn.1. the When C is symmetric in (11. X t) X 0 D Corollary 11..nxn. and hence t d esAx(s) ds = to ds 1t to eSABu(s) ds. and the proof is essentially the same. we can have coefficient matrices on both the right and left. following to = O.1. (11.11) is known as a Sylvester Sylvester differential equation. and the proof is essentially the same. Then the matrix initialvalue problem X(t) = AX(t) + X(t)AT. X(O) = C (11. Corollary 11.6. the following theorem is stated with initial time to = 0.12) X(t) = etACetAT has the solution X(t} = etACetAT. X(to) =C E jRnxn (11.4 Linear matrix differential equations Linear matrix differential equations Matrixvalued initialvalue problems also occur frequently.1.12).2.10) coefficient In the matrix case.11. e jRnxn.etoAx(to) = lto t e. 11. The fact that X((t) satisfies the initial condition is trivial.1.
11.1.5 Modal decompositions

Let A ∈ R^{n×n} and suppose, for convenience, that it is diagonalizable (if A is not diagonalizable, the rest of this subsection is easily generalized by using the JCF and the decomposition A = Σ Xᵢ Jᵢ Yᵢ^H as discussed in Chapter 9). Then the solution x(t) of (11.4) can be written

    x(t) = e^{(t−t₀)A} x₀
         = ( Σ_{i=1}^n e^{λᵢ(t−t₀)} xᵢ yᵢ^H ) x₀
         = Σ_{i=1}^n ( yᵢ^H x₀ e^{λᵢ(t−t₀)} ) xᵢ.

The λᵢ's are called the modal velocities and the right eigenvectors xᵢ are called the modal directions. The decomposition above expresses the solution x(t) as a weighted sum of its modal velocities and directions.

This modal decomposition can be expressed in a different-looking but identical form if we write the initial condition x₀ as a weighted sum of the right eigenvectors, x₀ = Σ_{i=1}^n αᵢ xᵢ. Then

    x(t) = Σ_{i=1}^n ( αᵢ e^{λᵢ(t−t₀)} ) xᵢ.

In the last equality we have used the fact that yᵢ^H xⱼ = δᵢⱼ.

Similarly, in the inhomogeneous case we can write

    ∫_{t₀}^t e^{(t−s)A} B u(s) ds = Σ_{i=1}^n ( ∫_{t₀}^t e^{λᵢ(t−s)} yᵢ^H B u(s) ds ) xᵢ.

11.1.6 Computation of the matrix exponential

JCF method

Let A ∈ R^{n×n} and suppose X ∈ R^{n×n} is nonsingular and such that X^{−1} A X = J, where J is a JCF for A. Then

    e^{tA} = e^{t X J X^{−1}} = X e^{tJ} X^{−1}
           = Σ_{i=1}^n e^{λᵢt} xᵢ yᵢ^H    if A is diagonalizable,
           = Σ_{i=1}^m Xᵢ e^{tJᵢ} Yᵢ^H    in general.
This modal decomposition can be expressed in a different looking but identical form This modal decomposition can be expressed in a different looking but identical form n if we write the initial condition Xo as a weighted sum of the right eigenvectors if we write the initial condition XQ as a weighted sum of the right eigenvectors Xo = L ai Xi. where J is a JCF for A.
Example 11.9. Let A ∈ R^{2×2} with Λ(A) = {−2, −2} and JCF

    J = X^{−1} A X = [ −2   1
                        0  −2 ].

Then, by the JCF method,

    e^{tA} = X e^{tJ} X^{−1} = X [ e^{−2t}   t e^{−2t}
                                   0         e^{−2t}   ] X^{−1}.
Interpolation method

This method is numerically unstable in finite-precision arithmetic but is quite effective for hand calculation in small-order problems. The method is stated and illustrated for the exponential function but applies equally well to other functions. Given A ∈ R^{n×n} and f(λ) = e^{tλ}, where t is a fixed scalar, compute f(A) = e^{tA}. The motivation for this method is the Cayley–Hamilton Theorem, Theorem 9.3, which says that all powers of A greater than A^{n−1} can be expressed as linear combinations of A^k for k = 0, 1, ..., n − 1. Thus, all the terms of order greater than n − 1 in the power series for e^{tA} can be written in terms of these lower-order powers as well. The polynomial g defined below gives the appropriate linear combination.
the unique OTQ.2t ][ 1 ] Interpolation method Interpolation method This method is numerically unstable in finiteprecision arithmetic but is quite effective for effective hand calculation in smallorder problems.1.1 in the power series for et A can be written in terms of these greater n— e' A lowerorder powers as well.9. .t . .s are distinct. where t is a fixed scalar.Ai t'. which says that all powers of A greater than A n . . Then jr(A. so m = 1 and nl Let g(X) = UQ + alA + a2A2.
2. t ff>\ tk TU^^ _/"i\ Example 11. Let g(A.2t [ ~ o ] + te. There is an extensive literature on approximating certain nonlinear functions by rational functions. Let A _* Example 11. we find Solving for the a.2t .) = «o + ofiA.11.2t . Let A = [ ~4 J] and /(A) = eO.2t ) 2te.2t .11. Then rr(A) = f\ + o\2 so m = and (A i 2) «i nL = 2. Use etA = £~l{(sl . we find Solving for the aiS. we find 117 Thus.2t + 2te. but general nonsymbolic computational effective smallorder techniques numerically problem equivalent techniques are numerically unstable since the problem is theoretically equivalent to knowing precisely a JCE JCF.1. 2.2t I [4 4] I 0 _ [  e.A)^ 1 } and techniques for inverse Laplace transforms. Then the defining equations for the aiS are given by 6] g(2) = f(2) ==> ao ==> al 2al = e. f(A) = etA = g(A) = aoI + al A = (e.s. Then 7r(X) = (A+ 2)22 so m = 11and [::::~ 4i and f(A) = ea. s. Thus.2t aL = + 2te. 2t . te. The matrix analogue yields e A ~ functions rational eA = . 1. Use Pade approximation.2t _ Other methods Other methods 1.. we find ao = e. g'(2) = f'(2) = te Solving for the a. This etA = .s are given by Let g(A) ao + aLA. Differential Equations Solving for the ai s. Then the defining equations for the a.cI{(sI — A)I} is quite effective for smallorder problems.1. Differential Equations 11 .2t te.11.
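The interpolation method is easy to mechanize. The sketch below (Python, an addition to the text) redoes Example 11.11: it forms g(A) = α_0 I + α_1 A from the two defining equations and checks the result against a library matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

t = 0.7
A = np.array([[-4.0, 4.0], [-1.0, 0.0]])   # pi(lambda) = (lambda + 2)^2

# g(lambda) = a0 + a1*lambda with g(-2) = e^{-2t} and g'(-2) = t e^{-2t}
a1 = t * np.exp(-2 * t)
a0 = np.exp(-2 * t) + 2 * t * np.exp(-2 * t)

etA = a0 * np.eye(2) + a1 * A              # f(A) = g(A)
assert np.allclose(etA, expm(t * A))
```

The same two coefficients work for any t, which is why the method is convenient for hand calculation.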
Other methods

1. Use e^{tA} = L^{-1}{(sI - A)^{-1}} and techniques for inverse Laplace transforms. This is quite effective for small-order problems, but general nonsymbolic computational techniques are numerically unstable since the problem is theoretically equivalent to knowing precisely a JCF.

2. Use Padé approximation. There is an extensive literature on approximating certain nonlinear functions by rational functions. The matrix analogue yields

    e^A ≈ D^{-1}(A) N(A),

where D(A) = δ_0 I + δ_1 A + ... + δ_p A^p and N(A) = ν_0 I + ν_1 A + ... + ν_q A^q. Explicit formulas are known for the coefficients of the numerator and denominator polynomials of various orders. Unfortunately, a Padé approximation for the exponential is accurate only in a neighborhood of the origin; in the matrix case this means when ||A|| is sufficiently small. This can be arranged by scaling A, say, by multiplying it by 1/2^k for sufficiently large k and using the fact that e^A = (e^{(1/2^k)A})^{2^k}. Numerical loss of accuracy can occur in this procedure from the successive squarings.

3. Reduce A to (real) Schur form S via the unitary similarity U and use e^A = U e^S U^H together with successive recursions up the superdiagonals of the (quasi) upper triangular matrix e^S.

4. Many methods are outlined in, for example, [19]. Reliable and efficient computation of matrix functions such as e^A and log(A) remains a fertile area for research.
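The scaling-and-squaring idea in item 2 can be sketched as follows (Python, an addition to the text; a plain truncated Taylor series stands in here for the Padé approximant, an assumption made for brevity rather than what a production code would use).

```python
import numpy as np
from scipy.linalg import expm

def expm_scale_square(A, k=10, terms=8):
    """Approximate e^A as (e^{A/2^k})^{2^k}: scale A so its norm is
    small, use a truncated Taylor series near the origin, then undo
    the scaling by k repeated squarings."""
    B = A / 2.0**k
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for j in range(1, terms):
        term = term @ B / j        # B^j / j!
        E = E + term
    for _ in range(k):             # square k times
        E = E @ E
    return E

A = np.array([[-4.0, 4.0], [-1.0, 0.0]])
assert np.allclose(expm_scale_square(A), expm(A), atol=1e-8)
```

As the text notes, each squaring can amplify rounding error, which is why the choice of k and of the approximant order matters in practice.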
11.2 Difference Equations

In this section we outline solutions of discrete-time analogues of the linear differential equations of the previous section. Linear discrete-time systems, modeled by systems of difference equations, exhibit many parallels to the continuous-time differential equation case, and this observation is exploited frequently.

11.2.1 Homogeneous linear difference equations

Theorem 11.12. Let A ∈ R^{n×n}. The solution of the linear homogeneous system of difference equations

    x_{k+1} = A x_k,    x_0 given,                     (11.13)

for k ≥ 0 is given by

    x_k = A^k x_0.                                     (11.14)

Proof: The proof is almost immediate upon substitution of (11.14) into (11.13).

Remark 11.13. Again, we restrict our attention only to the so-called time-invariant case, where the matrix A in (11.13) is constant and does not depend on k. We could also consider an arbitrary "initial time" k_0, but since the system is time-invariant, and since we want to keep the formulas "clean" (i.e., no double subscripts), we have chosen k_0 = 0 for convenience.
11.2.2 Inhomogeneous linear difference equations

Theorem 11.14. Let A ∈ R^{n×n}, B ∈ R^{n×m} and suppose {u_k}_{k=0}^{+∞} is a given sequence of m-vectors. Then the solution of the inhomogeneous initial-value problem

    x_{k+1} = A x_k + B u_k,    x_0 given,             (11.15)

is given by

    x_k = A^k x_0 + Σ_{j=0}^{k-1} A^{k-j-1} B u_j.     (11.16)

Proof: The proof is again almost immediate upon substitution of (11.16) into (11.15).

11.2.3 Computation of matrix powers

It is clear that solution of linear systems of difference equations involves computation of A^k. One solution method, which is numerically unstable but sometimes useful for hand calculation, is to use z-transforms, by analogy with the use of Laplace transforms to compute a matrix exponential. One definition of the z-transform of a sequence {g_k} is

    Z({g_k}_{k=0}^{+∞}) = Σ_{k=0}^{+∞} g_k z^{-k}.

Assuming |z| > max_{λ ∈ Λ(A)} |λ|, the z-transform of the sequence {A^k} is then given by

    Z({A^k}) = Σ_{k=0}^{+∞} A^k z^{-k} = I + (1/z)A + (1/z^2)A^2 + ...
             = (I - (1/z)A)^{-1}
             = z(zI - A)^{-1}.

Methods based on the JCF are also sometimes useful, again mostly for small-order problems. Assume that A ∈ R^{n×n} and let nonsingular X ∈ R^{n×n} be such that X^{-1}AX = J, where J is a JCF for A. Then

    A^k = (X J X^{-1})^k = X J^k X^{-1}
        = Σ_{i=1}^{n} λ_i^k x_i y_i^H  if A is diagonalizable.

If A is diagonalizable, it is then easy to compute A^k via the formula A^k = X J^k X^{-1} since J^k is simply a diagonal matrix.
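For diagonalizable A the formula A^k = X J^k X^{-1} can be tried out directly (Python sketch, an addition to the text; the example matrix is chosen here for illustration).

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])        # symmetric, hence diagonalizable
lam, X = np.linalg.eig(A)                     # A = X diag(lam) X^{-1}

k = 7
Ak = X @ np.diag(lam**k) @ np.linalg.inv(X)   # A^k = X J^k X^{-1}
assert np.allclose(Ak, np.linalg.matrix_power(A, k))
```

Raising the diagonal matrix to the kth power is elementwise, so the whole computation costs one eigendecomposition regardless of k.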
In the general case, the problem again reduces to the computation of the power of a Jordan block. To be specific, let J_i ∈ C^{p×p} be a Jordan block of the form

          [ λ   1         0 ]
    J_i = [     λ   .       ]
          [         .    1  ]
          [ 0            λ  ].

Writing J_i = λI + N and noting that λI and the nilpotent matrix N commute, it is then straightforward to apply the binomial theorem to (λI + N)^k and verify that

            [ λ^k   C(k,1) λ^{k-1}   C(k,2) λ^{k-2}   ...   C(k,p-1) λ^{k-p+1} ]
    J_i^k = [       λ^k              C(k,1) λ^{k-1}         ...                 ]
            [                        .                      C(k,2) λ^{k-2}      ]
            [                             .                 C(k,1) λ^{k-1}      ]
            [ 0                                             λ^k                 ].

The symbol C(k, q), normally written as a binomial coefficient, has the usual definition k!/(q!(k-q)!) and is to be interpreted as 0 if k < q. In the case when λ is complex, a real version of the above can be worked out.

Example 11.15. Let

    A = [ -4   4 ]
        [ -1   0 ].

Then

    A^k = X J^k X^{-1}
        = [ 2  1 ] [ (-2)^k   k(-2)^{k-1} ] [  1  -1 ]
          [ 1  1 ] [ 0        (-2)^k      ] [ -1   2 ]
        = [ (-2)^k - 2k(-2)^{k-1}    k(-2)^{k+1}            ]
          [ -k(-2)^{k-1}             (-2)^k + 2k(-2)^{k-1}  ].

Basic analogues of other methods such as those mentioned in Section 11.1.6 can also be derived for the computation of matrix powers, but again no universally "best" method exists. For an erudite discussion of the state of the art, see [11, Ch. 18].
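The closed form of Example 11.15 can be verified against repeated multiplication (Python sketch, an addition to the text).

```python
import numpy as np

A = np.array([[-4.0, 4.0], [-1.0, 0.0]])   # double eigenvalue -2

for k in range(1, 8):
    closed = np.array([
        [(-2.0)**k - 2 * k * (-2.0)**(k - 1), k * (-2.0)**(k + 1)],
        [-k * (-2.0)**(k - 1), (-2.0)**k + 2 * k * (-2.0)**(k - 1)],
    ])
    assert np.allclose(closed, np.linalg.matrix_power(A, k))
```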
11.3 Higher-Order Equations

It is well known that a higher-order (scalar) linear differential equation can be converted to a first-order linear system. Consider, for example, the initial-value problem

    y^{(n)}(t) + a_{n-1} y^{(n-1)}(t) + ... + a_1 ẏ(t) + a_0 y(t) = φ(t)    (11.17)

with φ(t) a given function and n initial conditions

    y(0) = c_0,  ẏ(0) = c_1,  ...,  y^{(n-1)}(0) = c_{n-1}.                  (11.18)

Here, y^{(m)} denotes the mth derivative of y with respect to t. Define a vector x(t) ∈ R^n with components x_1(t) = y(t), x_2(t) = ẏ(t), ..., x_n(t) = y^{(n-1)}(t). Then

    ẋ_1(t) = x_2(t) = ẏ(t),
    ẋ_2(t) = x_3(t) = ÿ(t),
      ...
    ẋ_{n-1}(t) = x_n(t) = y^{(n-1)}(t),
    ẋ_n(t) = y^{(n)}(t)
           = -a_0 y(t) - a_1 ẏ(t) - ... - a_{n-1} y^{(n-1)}(t) + φ(t)
           = -a_0 x_1(t) - a_1 x_2(t) - ... - a_{n-1} x_n(t) + φ(t).

These equations can then be rewritten as the first-order linear system

            [ 0     1     0    ...    0        ]        [ 0 ]
            [ 0     0     1    ...    0        ]        [ . ]
    ẋ(t) =  [ .                 .              ] x(t) + [ . ] φ(t).          (11.19)
            [ 0     ...         0     1        ]        [ 0 ]
            [ -a_0  -a_1  ...         -a_{n-1} ]        [ 1 ]

The initial conditions take the form x(0) = c = [c_0, c_1, ..., c_{n-1}]^T. Note that det(λI - A) = λ^n + a_{n-1} λ^{n-1} + ... + a_1 λ + a_0. However, as mentioned before, the companion matrix A in (11.19) possesses many nasty numerical properties for even moderately sized n and is often well worth avoiding, at least for computational purposes.

A similar procedure holds for the conversion of a higher-order difference equation with n initial conditions into a linear first-order difference equation with (vector) initial condition.
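The companion form (11.19) is easy to verify numerically: the sketch below (Python, an addition to the text) builds the companion matrix from the coefficients a_0, ..., a_{n-1} and checks that its characteristic polynomial is λ^n + a_{n-1} λ^{n-1} + ... + a_1 λ + a_0.

```python
import numpy as np

def companion(a):
    """Companion matrix of lambda^n + a[n-1]*lambda^{n-1} + ... + a[0],
    laid out as in (11.19): ones on the superdiagonal and the negated
    coefficients a_0, ..., a_{n-1} along the last row."""
    n = len(a)
    A = np.zeros((n, n))
    A[: n - 1, 1:] = np.eye(n - 1)
    A[-1, :] = -np.asarray(a)
    return A

a = [2.0, 3.0, 1.0]                   # lambda^3 + lambda^2 + 3*lambda + 2
A = companion(a)
# np.poly returns det(lambda*I - A) coefficients, highest power first
assert np.allclose(np.poly(A), [1.0, 1.0, 3.0, 2.0])
```

This confirms the determinant identity; the numerical fragility mentioned in the text shows up only for much larger n.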
EXERCISES

1. Let P ∈ R^{n×n} be a projection. Show that e^P ≈ I + 1.718P.

2. Suppose x, y ∈ R^n and let A = xy^T. Further, let α = x^T y. Show that e^{tA} = I + g(t, α) x y^T, where

    g(t, α) = (1/α)(e^{αt} - 1)  if α ≠ 0,
              t                  if α = 0.

3. Let

    A = [ I    X ]
        [ 0   -I ],

where X ∈ R^{m×n} is arbitrary. Show that

    e^A = [ eI   (sinh 1) X ]
          [ 0    e^{-1} I   ].
4. Let K denote the skew-symmetric matrix

    [ 0     I_n ]
    [ -I_n  0   ],

where I_n denotes the n × n identity matrix. A matrix A ∈ R^{2n×2n} is said to be Hamiltonian if K^{-1}A^T K = -A and to be symplectic if K^{-1}A^T K = A^{-1}.

(a) Suppose H is Hamiltonian and let λ be an eigenvalue of H. Show that -λ must also be an eigenvalue of H.

(b) Suppose S is symplectic and let λ be an eigenvalue of S. Show that 1/λ must also be an eigenvalue of S.

(c) Suppose H is Hamiltonian and S is symplectic. Show that S^{-1}HS must be Hamiltonian.

(d) Suppose H is Hamiltonian. Show that e^H must be symplectic.

5. Let α, β ∈ R and

    A = [ α    β ]
        [ -β   α ].

Then show that

    e^{tA} = [ e^{αt} cos βt     e^{αt} sin βt ]
             [ -e^{αt} sin βt    e^{αt} cos βt ].

6. Find a general expression for

7. Find e^{tA} when

    A =

8. Let

(a) Solve the differential equation

    ẋ = Ax,  x(0) =

(b) Solve the differential equation

    ẋ = Ax + b,  x(0) =
9. Consider the initial-value problem

    ẋ(t) = Ax(t),  x(0) = x_0

for t ≥ 0. Suppose that A ∈ R^{n×n} is skew-symmetric and let α = ||x_0||_2. Show that ||x(t)||_2 = α for all t > 0.

10. Consider the n × n matrix initial-value problem

    Ẋ(t) = AX(t) - X(t)A,  X(0) = C.

Show that the eigenvalues of the solution X(t) of this problem are the same as those of C for all t.

11. The year is 2004 and there are three large "free trade zones" in the world: Asia (A), Europe (E), and the Americas (R). Suppose certain multinational companies have total assets of $40 trillion, of which $20 trillion is in E and $20 trillion is in R. Each year half of the Americas' money stays home, a quarter goes to Europe, and a quarter goes to Asia. For Europe and Asia, half stays home and half goes to the Americas.

(a) Find the matrix M that gives

    [ A ]                [ A ]
    [ E ]           = M  [ E ]
    [ R ] year k+1       [ R ] year k

(b) Find the eigenvalues and right eigenvectors of M.

(c) Find the distribution of the companies' assets at year k.

(d) Find the limiting distribution of the $40 trillion as the universe ends, i.e., as k → +∞ (i.e., around the time the Cubs win a World Series).

(Exercise adapted from Problem 5.3.11 in [24].)

12. (a) Find the solution of the initial-value problem

    ÿ(t) + 2ẏ(t) + y(t) = 0;  y(0) = 1,  ẏ(0) = 0.

(b) Consider the difference equation

    z_{k+2} + 2z_{k+1} + z_k = 0.

If z_0 = 1 and z_1 = 2, what is the value of z_{1000}? What is the value of z_k in general?
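Exercise 11 can be checked numerically. One consistent reading of the stated flows gives the matrix M below; treat it as a hypothesis to verify rather than the book's answer key. The sketch (Python, an addition to the text) iterates x_{k+1} = M x_k and compares the limit with the suitably scaled eigenvector for the eigenvalue 1.

```python
import numpy as np

# Columns say where this year's [Asia, Europe, Americas] dollars go
# next year (one reading of the exercise; each column sums to 1).
M = np.array([[0.5, 0.0, 0.25],
              [0.0, 0.5, 0.25],
              [0.5, 0.5, 0.50]])

x = np.array([0.0, 20.0, 20.0])        # year 0: $20T in E, $20T in R
for _ in range(200):                   # x_{k+1} = M x_k
    x = M @ x

# The limit is the eigenvector for eigenvalue 1, scaled to total $40T
lam, V = np.linalg.eig(M)
v = V[:, np.argmax(lam.real)].real
v = 40.0 * v / v.sum()
assert np.allclose(x, v)
```

Because the other eigenvalues of this M have modulus less than 1, the iteration converges geometrically to the fixed distribution.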
Chapter 12

Generalized Eigenvalue Problems

12.1 The Generalized Eigenvalue/Eigenvector Problem

In this chapter we consider the generalized eigenvalue problem

    Ax = λBx,

where A, B ∈ C^{n×n}. The standard eigenvalue problem considered in Chapter 9 obviously corresponds to the special case that B = I.

Definition 12.1. A nonzero vector x ∈ C^n is a right generalized eigenvector of the pair (A, B) with A, B ∈ C^{n×n} if there exists a scalar λ ∈ C, called a generalized eigenvalue, such that

    Ax = λBx.                                    (12.1)

Similarly, a nonzero vector y ∈ C^n is a left generalized eigenvector corresponding to an eigenvalue λ if

    y^H A = λ y^H B.                             (12.2)

When the context is such that no confusion can arise, the adjective "generalized" is usually dropped. As with the standard eigenvalue problem, if x [y] is a right [left] eigenvector, then so is αx [αy] for any nonzero scalar α ∈ C.

Definition 12.2. The matrix A - λB is called a matrix pencil (or pencil of the matrices A and B).

Definition 12.3. The polynomial π(λ) = det(A - λB) is called the characteristic polynomial of the matrix pair (A, B). The roots of π(λ) are the eigenvalues of the associated generalized eigenvalue problem.

Remark 12.4. When A, B ∈ R^{n×n}, the characteristic polynomial is obviously real, and hence nonreal eigenvalues must occur in complex conjugate pairs.
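Definition 12.1 is easy to exercise numerically: SciPy's generalized eigensolver accepts the pair (A, B) directly (Python sketch, an addition to the text; the matrices are chosen here for illustration). The assertions check Ax = λBx column by column.

```python
import numpy as np
from scipy.linalg import eig

A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])   # nonsingular here

lam, X = eig(A, B)                       # solves A x = lam B x
for j in range(2):
    assert np.allclose(A @ X[:, j], lam[j] * (B @ X[:, j]))
```

For this pair, det(A - λB) = (2 - λ)(3 - 2λ), so the two eigenvalues are 2 and 3/2.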
A nonzero vector x e C" is a right generalized eigenvector of the pair generalized eigenvector of (A.'AB) is called the characteristic polyDefinition 12.1. (A. Remark 12. and A.4. When A. B) with A.Chapter 12 Chapter 12 Generalized Eigenvalue Generalized Eigenvalue Problems 12. B) with A. B). Definition 12.2) When the context is such that no confusion can arise.) are the eigenvalues of the associated generalized eigenvalue problem. B E enxn.1) Ax = 'ABx. eigenvector.XB is called a matrix pencil (or pencil of the matrices A and B). The matrix A .
with its reciprocal eigenvalue being 0 in Case 3 of the reciprocal pencil B — /. I). (3 = 0.L)({3 ./.(3A) ±.3) where a and (3 are scalars. Case 2: = 0. Case 1: a =I. Associated with any matrix pencil A . With A and B as in (12. k E !!.AB. I (of multiplicity 1). 1 and 0. det(B . There are two eigenvalues. otherwise.X B is a reciprocal pencil B — n. All A e C are eigenvalues since det(A — AB) =0. pencil . I and ^.126 126 Chapter 12. If del (A . Case 1: ^ 0. Case 3: Case 4: = 0. Note that A and/or B may still be singular.LA and corresponding generalized eigenvalue problem. There are two eigenvalues. eigenvalues associated with the pencil A . There are two eigenvalues. 1 and 0.AB Definition 12. There is only one eigenvalue. All A 6 C are eigenvalues since det(B — uA) = O. it is said to be singular. Clearly the reciprocal pencil has eigenvalues responding generalized /. there is a second eigenvalue "at infinity" for Case 3 of of . {3 =I. when B =I.KB always has pencil — AB . If = of degree n. ^ 0. the associated matrix pencil is singular (as in Case N(A) n N(B) =Isingular 4 above). it is apparent where the "missing" eigenvalues have "missing" gone in Cases 2 and 3. All A E C are eigenvalues since det(A . However. n(X) Remark 12.0. f3 = O. A similar reciprocal symmetry holds for Case 2.O.5. and ~.nA.a/. ft =I.LA) = (1 . {3 ^ 0. However. eigenvalues — AB.AB) and there are several cases to consider. A similar reciprocal symmetry holds for Case 2.0.O. there may be 0.B. There are two eigenvalues. Then the characteristic polynomial is ft det(A . Case Case 3: a = 0. Case 2: a = 0. Note that if AA(A) n J\f(B) ^ 0.5. f3 / 0. is singular. If B is singular. reciprocal Case of reciprocal . {3 =I. Case = ft ^ 0.LA. =I.0. There are two eigenvalues. 1 and O. If B = I (or in general when B is nonsingular).B) == O. There are two eigenvalues.A.6.XB. I1 and .3).XB) is not identically zero. zero. regular. All A E C are eigenvalues since det(B . I and O. f3 = O. That is to say. ft ^ O. 
There is only one eigenvalue. Case 4: a = 0. the pencil A — XB is said to be 12. There are two eigenvalues. Case 4: a = 0. Case 1: a =I.{3 = 0. {3 = 0. 1 and ~. B k e n. f3 = 0.XB. A — A./. Generalized Eigenvalue Problems Remark 12./. Case 1: a ^ 0. While While there are applications in system theory and control where singular pencils appear. (3 = O. or infinitely many B = I.A and corAssociated with any matrix pencil — AB is a reciprocal pencil . (12. then rr(A) is a polynomial nonsingular).O. when B is singular.LA) == 0.L = (JL = £. the pencil A . suppose associated — AB. I multiplicity 1). It is instructive to consider the reciprocal pencil associated with the example in It reciprocal Remark 12.6. and hence there are n eigenvalues associated with the pencil A . the characteristic polynomial is = (I .L) and there are again four cases to consider. {3 =I. only the case of regular pencils is considered in the remainder of this chapter. If det(A — AB) not regular.0.5. Case 4: a = 0. Case 2: a = 0. 1 (of multiplicity 1). (3 = O. There are two eigenvalues.I. Note appear. in particular. Generalized Eigenvalue Problems Chapter 12.AHa . 1 Case 3: a =I. For example. only the case of regular pencils is considered in the remainder of this chapter./. At least for the case of regular pencils.
this turns to the standard eigenvalue problem B~1Ax = Xx (or AB~1w = Xw). B e cnxn .7] [25. then QHy isa left eigenvector ofQAZ AQBZ.2. B. ifx isa right eigenvector of A—XB. Canonical Forms 12.AB and QAZ . Let A. we now deal with equivaa matrices. for example. in fact. 2. fewer than n eigenvalues. f i always has precisely eigenvalues.7. the theoretical foundation for the QZ algorithm. Sec. where Ta and Tp are upper triangular. Then there exist unitary matrices Q. and the first theorem deals with what happens to eigenvalues lencies rather than similarities. Sec.7. D The first canonical form is an analogue of Schur's Theorem and forms. Q.AB). 7. Sec. the pencil A fewer than eigenvalues. 2. of AAB. the result follows. the pencil A AAB always has precisely n . However. see.XB)Z] = detQ det Z det(A .7]. Z e Cnxn such that 12. 6. 7. [7. Numerical methods that work directly on A and are discussed in standard textbooks on numerical linear algebra. The result follows by noting that (A AB)x = 0 if and only if Q(A AB)Z(Zl x) = The result follows by noting that (A –yB)x . QBZ = TfJ . Sec. fl. which is the generally preferred method for solving the generalized eigenvalue problem. that a zero diagonal element of TfJ corresponds to an infinite generalized eigenvalue.AB) o if and only if (QH y ) H Q ( A –_ B ) Z = Q. Sec. since the generalized eigenvalue problem is then easily seen to be equivalent eigenvalues. E nxn with Q and nonsingular. work directly on A and B are discussed in standard textbooks on numerical linear algebra. the result follows easily by noting that yH(A — XB) — 0 if and only if yH (A .AQBZ) = det[Q(A .7]. 6. Again. [7.8. and eigenvectors under equivalence. However. Canonical Forms 127 B is nonsingular. 7. Q. for example. Let A. 3. which is the generally preferred method for theoretical foundation for the QZ algorithm. By Theorem 12. then Z~lx isa right eigenvector of QAZ—XQ B Z. ify isa left eigenvector of A —KB. where Ta and TfJ are upper triangular. Then 1. 
Then 12. Q~H y isa lefteigenvectorofQAZ — XQBZ. to ifx is a Zl x is a righteigenvectorofQAZAQB Z. the eigenvalues ofthe pencil A — XB are then the ratios of the diagonal elements of Ta to the corresponding diagonal elements of TfJ . since the generalized eigenvalue problem is then easily seen to be equivalent to the standard eigenvalue problem B.2 Canonical Forms Canonical Forms Just as for the standard eigenvalue problem. det(QAZXQBZ) = det[0(A . [7.l Ax Ax (or AB. 12. 6. There is also an analogue of the MurnaghanWintner Theorem for real matrices. 6.AB)Z] = det gdet Zdet(A 1. the eigenvalues of the problems A . Let A.AB are then the ratios of the diagBy Theorem 12.12.l W AW). and the first theorem deals with what happens to eigenvalues and eigenvectors under equivalence.7]. canonical forms are available for the generalized eigenvalue problem. in fact.7. for example. Numerical methods that if B is even moderately ill conditioned with respect to inversion. the eigenvalues of the pencil A . ify is a left of AB. see.AQBZ are the same (the two problems problems are said to be equivalent). .7] or [25.8.Oif andonly if Q(AXB)Z(Z~lx) = 0. E c nxn such that QAZ = Ta . the result follows. 7.7] or [25.2 12. Since det 0 and det Z are nonzero. with the understanding onal elements of Ta to the corresponding diagonal elements of Tp. Let A. and det Z are nonzero. solving the generalized eigenvalue problem. this turns out to be a very poor numerical procedure for handling the generalized eigenvalue problem out to be a very poor numerical procedure for handling the generalized eigenvalue problem if is even moderately ill conditioned with respect to inversion.. Theorem 12.7]. Z e Cnxn with Q and Z nonsingular. 0 ( Q ~ H y)H Q(A X AB)Z = O. for example. canonical forms are available for the generalized Just as for the standard eigenvalue problem. with the understanding that a zero diagonal element of Tp corresponds to an infinite generalized eigenvalue. 
B E Cnxn Then there exist unitary matrices Q. c 3. the eigenvalues of the problems A — XB and QAZ — XQBZ are the same (the two 1. Proof: Proof: 1. see. There is also an analogue of the MurnaghanWintner Theorem for real matrices. Sec. o. lencies rather than similarities.7] or [25. Since det Q XB). det(QAZ . Sec. Theorem 12. see. [7. the The first canonical form is an analogue of Schur's Theorem and forms.2. If B is nonsingular.7. Since the latter involves a pair of matrices.
A [~ ~ l of .AB where J is a Jordan canonical form corresponding to the finite eigenvalues of A A. Let A. .AB)Q = [~ ~ ] . The matrix pencil 12. Q € c nxn"such that nonsingular E C" such that peA . form (KCF).12 (Kronecker Canonical Form). The first theorem pertains only to "square" regular pencils. Then there exist 12. When S has a 2 x 2 diagonal block.AB. Let A. .9.9.2)2 with characteristic polynomial (A — 2)2 has a finite eigenvalue 2 of multiplicty 2 and three 2 2 infinite eigenvalues.XB is regular. QBZ = T.12 mxm nxn mxm nxn E C nonsingular nonsingular matrices P e c and Q e c QE C such that peA .fi and canonical form nilpotent matrix of associated and N is a nilpotent matrix of Jordan blocks associated with 0 and corresponding to the infinite infinite eigenvalues of A .)"N). quasiuppertriangular. Let A.128 Chapter 12. while the full KeF in all its generality applies also to "rectangular" and singular KCF "rectangular" pencils. of — XB. the 2 x 2 subpencil formed with the corresponding fonned 2 x diagonal subblock 2x2 2 diagonal subblock of T has a pair of complex conjugate eigenvalues. including analogues of principal vectors and description of of so forth. B e Rnxn. where T is upper triangular and S is quasiuppertriangular.• L..A. real eigenvalues.'.11. In this chapter.I.11..AB)Q = diag(LII' . Generalized Eigenvalue Problems Chapter 12. Then there exist orthogonal matrices Q. is beyond the scope of this book. There is also an analogue of the Jordan canonical form called the Kronecker canonical fonn Kronecker form (KeF). Then there x exist nonsingular matrices P. thnt that QAZ = S. I . T. B E c nxn pencil — AB Theorem 12.. . B e Cnxn and suppose the pencil A . A full description of the KeF.. mxn E C • Theorem 12. J . Z e R"xn such B E jRnxn. B e c mxn . of eigenvalues are given as above by the ratios of diagonal elements of S to corresponding elements of T. Generalized Eigenvalue Problems Theorem 12.10. Example 12. E jRnxn 12. L l" L~. KCF. 
[2o I o o o 0 0 0 0 0 2 0 0 1 0 0 1 0 0 ~ ]> [~ 0 I 0 0 0 0 0 0 0 0 o o 0 I 0] 0 0 0 0 (X . Otherwise. we present only statements of the basic theorems and some examples.
next two correspond to correspond J = 21 0 2 [ o 0 while the nilpotent matrix N in this example is N [ ~6~].— XBif S Rn. LQ. Lo L6 one column. Example 12.12. where each LQ has "zero columns" and one row. (12. are called the right minimal indices. and Lk is the (k + 1) x k bidiagonal pencil bidiagonal pencil A 0 0 A Lk = 0 0 0 0 A I The Ii are called the left minimal indices while the ri are called the right minimal indices. L6. The /( are called the left minimal indices while the r. both N and J are in Jordan canonical form. Specifically. LQ. i.4) eigenvalue characterization Just as in the standard eigenvalue case. both Nand J are in Jordan canonical form.35). B e Wlxn and suppose the pencil A . Consider a 13 x 12 block diagonal matrix whose diagonal blocks are A 0] I o A I . (12. Canonical Forms 129 where N is nilpotent.The next two blocks second block L\ one the block is L\. while each LQ has "zero rows" and L6. LQ . 000 Just as sets of eigenvectors span Ainvariant subspaces in the case of the standard eigenvectors eigenproblem (recall Definition 9. n(S)) = S. there is an analogous geometric concept for the eigenproblem generalized eigenproblem. and L^ is the (k + I) x k where N is nilpotent.2.13. The second block is L\ while the third block is LI. Definition 12.e. Lo. Then is deflating subspace for the pencil A AB if and only if there exists M E Rkxk such that e ~kxk AS = BSM. 0. Then V is a E ~nxn suppose pencil — AB deflating subspace if deflating subspace if dim(AV + BV) = dimV.14. there is a matrix characterization of deflating subspace. Such a matrix is in KCF.e. Lo.. Lo. Then SS is aadeflating subspace for the pencil A . The first block of zeros actually corresponds to LQ. suppose S e Rn* xk is a matrix whose columns span a kdimensional E ~nxk ^dimensional subspace S of ~n.XB is regular. R ( S <S. generalized eigenproblem.2. i. corresponds LQ. Canonical Forms 12.5) . Left Left or right minimal indices can take the value O. Let A..
the (finite) zeros of this system are given by the (finite) complex numbers In general. In the special case p = m. Checking the finite eigenvalues of the pencil (12. where x(= x(t)) is called the state space model is often used in multivariable control theory.4) becomes dim (A V + V) = dim V.15. Generalized Eigenvalue Problems If B = /. Similarly. these values are the generalized eigenvalues of the drops rank. trivial. the (finite) zeros of this system are given by the (finite) complex numbers where the "system pencil" z. vector. E jRnxm. multioutput systems.8. Similarly.3 12. see. This linear timeinvariant statespace model is often used in multivariable control theory. Ac M D "'" 5A + 14. 12. = Cx + Du E jRnxn.6). then (12. however. however. For details. where x(= x(t)) is called the state vector. The method of finding system zeros via a generalized eigenvalue problem also works The method of finding system zeros via a generalized eigenvalue problem also works well for general multiinput.4) becomes dim(AV + V) = dimV. However. we find the characteristic polynomial to be find the characteristic polynomial to be det [ which has a root at 2. D=O. If the pencil is not regular. then (12. which is clearly equivalent to If B = I. see. and y is the vector of outputs or observables. u is the vector of inputs or controls. Then the transfer matrix (see [26]) of this system is Then the transfer matrix (see [26)) of this system is g(5)=C(sIA)'B+D= 5 55 2 + 14 ' + 3s + 2 which clearly has a zero at —2.3 Application to the Computation of System Zeros Application to the Computation of System Zeros i y Consider the linear system Consider the linear svstem = Ax + Bu. multioutput systems. In general. there is a concept analogous to deflating subspace called a reducing subspace. is a concept analogous to deflating subspace called a reducing subspace. Checking the finite eigenvalues of the pencil (12. (12. which is clearly equivalent to AV c V. 
Example 12.15. Consider a single-input, single-output system with C = [1 2] and D = 0 whose transfer matrix (see [26]) is

    g(s) = C(sI − A)^{-1}B + D = (5s + 14)/(s^2 + 3s + 2),

which clearly has a zero at −2.8. Checking the finite eigenvalues of the pencil (12.6), we find the characteristic polynomial to be 5λ + 14, which has a root at −2.8.

The method of finding system zeros via a generalized eigenvalue problem also works well for general multi-input, multi-output systems. Numerically, however, one must be careful first to "deflate out" the infinite zeros (infinite eigenvalues of (12.6)). This is accomplished by computing a certain unitary equivalence on the system pencil that then yields a smaller generalized eigenvalue problem with only finite generalized eigenvalues (the finite zeros).
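As a numerical sketch (Python/SciPy), the finite zero can be computed as a generalized eigenvalue of the system pencil (12.6). The state-space data below are hypothetical choices made for this illustration, not the matrices of Example 12.15, but they realize the same transfer function g(s) = (5s + 14)/(s^2 + 3s + 2), so the finite zero is again −2.8.

```python
import numpy as np
from scipy.linalg import eig

A = np.array([[-1.0,  0.0],
              [ 0.0, -2.0]])
B = np.array([[ 9.0],
              [-2.0]])
C = np.array([[ 1.0,  2.0]])
D = np.array([[ 0.0]])
# g(s) = C (sI - A)^{-1} B + D = 9/(s+1) - 4/(s+2) = (5s + 14)/(s^2 + 3s + 2)

# system pencil (12.6): P - z*N with N = blkdiag(I_n, 0)
P = np.block([[A, B], [C, D]])
N = np.block([[np.eye(2),         np.zeros((2, 1))],
              [np.zeros((1, 2)),  np.zeros((1, 1))]])

w = eig(P, N, right=False, homogeneous_eigvals=True)
alpha, beta = w
mask = np.abs(beta) > 1e-8        # infinite eigenvalues have beta ~ 0
finite = alpha[mask] / beta[mask]
```

The two infinite generalized eigenvalues correspond to the singular part of N and are deflated simply by discarding the pairs with beta near zero.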
The connection between system zeros and the corresponding system pencil is nontrivial. However, we offer some insight below into the special case of a single-input, single-output system. Specifically, let B = b ∈ R^n, C = c^T ∈ R^{1 x n}, and D = d ∈ R. Furthermore, let g(s) = c^T(sI − A)^{-1}b + d denote the system transfer function (matrix), and assume that g(s) can be written in the form

    g(s) = ν(s)/π(s),

where π(s) is the characteristic polynomial of A, and ν(s) and π(s) are relatively prime (i.e., there are no "pole/zero cancellations"). Suppose z ∈ C is such that

    [ A − zI   b ]
    [ c^T      d ]

is singular. Then there exists a nonzero solution to

    [ A − zI   b ] [ x ]   [ 0 ]
    [ c^T      d ] [ y ] = [ 0 ],

or

    (A − zI)x + by = 0,    (12.7)
    c^T x + dy = 0.        (12.8)
Assuming z is not an eigenvalue of A (i.e., there are no pole/zero cancellations), then from (12.7) we get

    x = −(A − zI)^{-1}by.    (12.9)

Substituting this in (12.8), we have

    −c^T(A − zI)^{-1}by + dy = 0,

or g(z)y = 0 by the definition of g. Now y ≠ 0 (else x = 0 from (12.9)). Hence g(z) = 0, i.e., z is a zero of g.

12.4 Symmetric Generalized Eigenvalue Problems

A very important special case of the generalized eigenvalue problem

    Ax = λBx    (12.10)

arises when A = A^T and B = B^T > 0. For example, the second-order system of differential equations

    M ẍ + K x = 0,

where M is a symmetric positive definite "mass matrix" and K is a symmetric "stiffness matrix," is a frequently employed model of structures or vibrating systems and yields a generalized eigenvalue problem of the form (12.10).

Since B is positive definite it is nonsingular. Thus, the problem (12.10) is equivalent to the standard eigenvalue problem B^{-1}Ax = λx. However, B^{-1}A is not necessarily symmetric.
Example 12.16. Let

    A = [ 1  3 ],   B = [ 2  1 ].
        [ 3  2 ]        [ 1  1 ]

Then

    B^{-1}A = [ −2  1 ]
              [  5  1 ],

which is not symmetric. Nevertheless, the eigenvalues of B^{-1}A are real; they are approximately 2.1926 and −3.1926.

Theorem 12.17. Let A, B ∈ R^{n x n} with A = A^T and B = B^T > 0. Then the generalized eigenvalue problem

    Ax = λBx    (12.11)

has n real eigenvalues, and the n corresponding right eigenvectors can be chosen to be orthogonal with respect to the inner product (x, y)_B = x^T By. Moreover, if A = A^T > 0, then the eigenvalues are also all positive.

Proof: Since B > 0, it has a Cholesky factorization B = LL^T, where L is nonsingular (Theorem 10.23). Then the eigenvalue problem

    Ax = λBx = λLL^T x

can be rewritten as the equivalent problem (L^{-1}AL^{-T})(L^T x) = λ(L^T x). Letting C = L^{-1}AL^{-T} and z = L^T x, (12.11) can then be rewritten as

    Cz = λz.    (12.12)
Since C = C^T, the eigenproblem (12.12) has n real eigenvalues, with corresponding eigenvectors z_1, ..., z_n satisfying

    z_i^T z_j = δ_ij.

Then x_i = L^{-T}z_i, i = 1, ..., n, are eigenvectors of the original generalized eigenvalue problem and satisfy

    (x_i, x_j)_B = x_i^T B x_j = (z_i^T L^{-1})(LL^T)(L^{-T}z_j) = δ_ij.

Finally, if A = A^T > 0, then C = C^T > 0, so the eigenvalues are positive.  □

The material of this section can, of course, be generalized easily to the case where A and B are Hermitian, but since real-valued matrices are commonly used in most applications, we have restricted our attention to that case only.

Example 12.18. The Cholesky factor for the matrix B in Example 12.16 is

    L = (1/√2) [ 2  0 ]
               [ 1  1 ].

Then it is easily checked that

    C = L^{-1}AL^{-T} = [ 0.5   2.5 ]
                        [ 2.5  −1.5 ],

whose eigenvalues are approximately 2.1926 and −3.1926, as expected.
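Examples 12.16 and 12.18 are easy to reproduce numerically. The sketch below (Python/SciPy, added here as an illustration) also checks the B-orthogonality of eigenvectors guaranteed by Theorem 12.17; scipy.linalg.eigh normalizes its eigenvectors so that V^T B V = I.

```python
import numpy as np
from scipy.linalg import cholesky, eigh

A = np.array([[1.0, 3.0], [3.0, 2.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])   # symmetric positive definite

L = cholesky(B, lower=True)              # B = L L^T
C = np.linalg.solve(L, np.linalg.solve(L, A).T).T   # C = L^{-1} A L^{-T}

lam = np.linalg.eigvalsh(C)              # real, since C = C^T
lam_gen, V = eigh(A, B)                  # generalized symmetric-definite solver

# eigenvalues of B^{-1}A satisfy lambda^2 + lambda - 7 = 0
exact = np.array([(-1 - np.sqrt(29)) / 2, (-1 + np.sqrt(29)) / 2])
```

The exact eigenvalues (−1 ± √29)/2 ≈ 2.1926 and −3.1926 agree with the values quoted in the examples.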
12.5 Simultaneous Diagonalization

Recall that many matrices can be diagonalized by a similarity. In particular, normal matrices can be diagonalized by a unitary similarity. It turns out that in some cases a pair of matrices (A, B) can be simultaneously diagonalized by the same matrix. There are many such results and we present only a representative (but important and useful) theorem here. Again, we restrict our attention only to the real case, with the complex case following in a straightforward way.

Theorem 12.19 (Simultaneous Reduction to Diagonal Form). Let A, B ∈ R^{n x n} with A = A^T and B = B^T > 0. Then there exists a nonsingular matrix Q such that

    Q^T AQ = D   and   Q^T BQ = I,

where D is diagonal. In fact, the diagonal elements of D are the eigenvalues of B^{-1}A.

Proof: Let B = LL^T be the Cholesky factorization of B and set C = L^{-1}AL^{-T}. Since C is symmetric, there exists an orthogonal matrix P such that P^T CP = D, where D is diagonal. Let Q = L^{-T}P. Then

    Q^T AQ = P^T L^{-1}AL^{-T}P = P^T CP = D

and

    Q^T BQ = P^T L^{-1}(LL^T)L^{-T}P = P^T P = I.

Finally, since QDQ^{-1} = QQ^T AQQ^{-1} = L^{-T}PP^T L^{-1}A = L^{-T}L^{-1}A = B^{-1}A, we have Λ(D) = Λ(B^{-1}A).  □
Note that Q is not in general orthogonal, so it does not preserve the eigenvalues of A and B individually. However, it does preserve the eigenvalues of A − λB. This can be seen directly: if Ã = Q^T AQ and B̃ = Q^T BQ, then B̃^{-1}Ã = Q^{-1}B^{-1}Q^{-T}Q^T AQ = Q^{-1}(B^{-1}A)Q.

Theorem 12.19 is very useful for reducing many statements about pairs of symmetric matrices to "the diagonal case." The following is typical.

Theorem 12.20. Let A, B ∈ R^{n x n} be positive definite. Then A ≥ B if and only if B^{-1} ≥ A^{-1}.

Proof: By Theorem 12.19, there exists a nonsingular Q ∈ R^{n x n} such that Q^T AQ = D and Q^T BQ = I, where D is diagonal. Now D > 0 by Theorem 10.31. Also, since A ≥ B, by Theorem 10.21 we have that Q^T AQ ≥ Q^T BQ, i.e., D ≥ I. But then D^{-1} ≤ I (this is trivially true since the two matrices are diagonal), i.e., QD^{-1}Q^T ≤ QQ^T, i.e., A^{-1} ≤ B^{-1}.  □
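The construction in the proof of Theorem 12.19 translates directly into code; a sketch (illustrative randomly generated symmetric A and positive definite B, added here as an example):

```python
import numpy as np
from scipy.linalg import cholesky, eigh

rng = np.random.default_rng(1)
n = 4
A0 = rng.standard_normal((n, n)); A = A0 + A0.T                  # symmetric
B0 = rng.standard_normal((n, n)); B = B0 @ B0.T + n * np.eye(n)  # symmetric positive definite

L = cholesky(B, lower=True)                               # B = L L^T
Cmat = np.linalg.solve(L, np.linalg.solve(L, A).T).T      # C = L^{-1} A L^{-T}
d, P = eigh(Cmat)                                         # P^T C P = diag(d)
Q = np.linalg.solve(L.T, P)                               # Q = L^{-T} P

D = Q.T @ A @ Q          # should be diagonal
I_check = Q.T @ B @ Q    # should be the identity
```

As the theorem asserts, the diagonal of D reproduces the eigenvalues of B^{-1}A even though Q is not orthogonal.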
12.5.1 Simultaneous diagonalization via SVD

There are situations in which forming C = L^{-1}AL^{-T} as in the proof of Theorem 12.19 is numerically problematic, e.g., when L is highly ill conditioned with respect to inversion. In such cases, the simultaneous diagonalization can also be accomplished via an SVD.
To illustrate, let us assume that both A and B are positive definite. Further, let A = L_A L_A^T and B = L_B L_B^T be Cholesky factorizations of A and B, respectively, and compute the SVD

    L_B^{-1}L_A = UΣV^T,    (12.13)

where Σ ∈ R^{n x n} is diagonal and nonsingular. Then the matrix Q = L_B^{-T}U performs the simultaneous diagonalization. To check this, note that

    Q^T AQ = U^T L_B^{-1}(L_A L_A^T)L_B^{-T}U = U^T(UΣV^T)(VΣU^T)U = Σ^2,

while

    Q^T BQ = U^T L_B^{-1}(L_B L_B^T)L_B^{-T}U = U^T U = I.
Remark 12.21. The SVD in (12.13) can be computed without explicitly forming the indicated matrix product or the inverse by using the so-called generalized singular value decomposition (GSVD). Note that the singular values of L_B^{-1}L_A can be found from the eigenvalue problem

    L_A L_A^T x = λ L_B L_B^T x.    (12.14)

Letting x = L_B^{-T}z, we see that (12.14) can be rewritten in the form

    L_A L_A^T L_B^{-T}z = λ L_B z.    (12.15)

The problem (12.15) is called a generalized singular value problem, and algorithms exist to solve it (and hence equivalently (12.13)) via arithmetic operations performed only on L_A and L_B separately, i.e., without forming the products L_A L_A^T or L_B L_B^T explicitly; see, for example, [7, Sec. 8.7.3]. This is analogous to finding the singular values of a matrix M by operations performed directly on M rather than by forming the matrix M^T M and solving the eigenproblem M^T Mx = λx.

Remark 12.22. Various generalizations of the results in Remark 12.21 are possible, for example, when A = A^T ≥ 0. The case when A is symmetric but indefinite is not so straightforward, at least in real arithmetic. For example, A can be written as A = PDP^T, where D is diagonal and P is orthogonal, but in writing A = PD̃D̃P^T = PD̃(PD̃)^T with D̃ diagonal, D̃ may have pure imaginary elements.
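The SVD-based reduction can be checked the same way. For illustration the sketch below forms L_B^{-1}L_A explicitly, which is precisely what the GSVD of Remark 12.21 would avoid in careful software:

```python
import numpy as np
from scipy.linalg import cholesky, svd

rng = np.random.default_rng(2)
n = 4
A0 = rng.standard_normal((n, n)); A = A0 @ A0.T + n * np.eye(n)   # positive definite
B0 = rng.standard_normal((n, n)); B = B0 @ B0.T + n * np.eye(n)   # positive definite

LA = cholesky(A, lower=True)                    # A = L_A L_A^T
LB = cholesky(B, lower=True)                    # B = L_B L_B^T
U, sigma, Vt = svd(np.linalg.solve(LB, LA))     # L_B^{-1} L_A = U Sigma V^T,  (12.13)
Q = np.linalg.solve(LB.T, U)                    # Q = L_B^{-T} U

QAQ = Q.T @ A @ Q      # should equal Sigma^2
QBQ = Q.T @ B @ Q      # should equal I
```

The diagonal of Q^T A Q is exactly sigma**2, in the order returned by the SVD.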
12.6 Higher-Order Eigenvalue Problems

Consider the second-order system of differential equations

    M q̈ + C q̇ + K q = 0,    (12.16)

where q(t) ∈ R^n and M, C, K ∈ R^{n x n}. Assume for simplicity that M is nonsingular. Suppose, by analogy with the first-order case, that we try to find a solution of (12.16) of the form q(t) = e^{λt}p, where the n-vector p and scalar λ are to be determined. Substituting in (12.16) we get

    λ^2 e^{λt}Mp + λe^{λt}Cp + e^{λt}Kp = 0

or, since e^{λt} ≠ 0,

    (λ^2 M + λC + K)p = 0.

To get a nonzero solution p, we thus seek values of λ for which the matrix λ^2 M + λC + K is singular. Since the determinantal equation

    0 = det(λ^2 M + λC + K) = λ^{2n} + ···

yields a polynomial of degree 2n, there are 2n eigenvalues for the second-order (or quadratic) eigenvalue problem λ^2 M + λC + K.

A special case of (12.16) arises frequently in applications: M = I, C = 0, and K = K^T. Suppose K has eigenvalues

    μ_1 ≥ ··· ≥ μ_r ≥ 0 > μ_{r+1} ≥ ··· ≥ μ_n.
Let ω_k = |μ_k|^{1/2}. Then the 2n eigenvalues of the second-order eigenvalue problem λ^2 I + K are

    ± jω_k,   k = 1, ..., r;
    ± ω_k,    k = r + 1, ..., n.

If r = n (i.e., K = K^T ≥ 0), then all solutions of q̈ + Kq = 0 are oscillatory.

12.6.1 Conversion to first-order form

Let x_1 = q and x_2 = q̇. Then (12.16) can be written as the first-order system (with block companion matrix)

    ẋ = [     0           I     ] x,
        [ −M^{-1}K    −M^{-1}C  ]

where x(t) ∈ R^{2n}. If M is singular, or if it is desired to avoid the calculation of M^{-1} because M is too ill conditioned with respect to inversion, the second-order problem (12.16) can still be converted to the first-order generalized linear system

    [ I  0 ] ẋ = [  0   I ] x.
    [ 0  M ]     [ −K  −C ]
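The block companion conversion is easily verified numerically: if (λ, x) is an eigenpair of the first-order matrix, the top block p = x_1 must satisfy (λ^2 M + λC + K)p = 0. A sketch with illustrative random data (the damping matrix is named Cd here only to avoid clashing with the output matrix C of Section 12.3):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
M0 = rng.standard_normal((n, n)); M = M0 @ M0.T + n * np.eye(n)  # nonsingular (spd) mass matrix
Cd0 = rng.standard_normal((n, n)); Cd = Cd0 + Cd0.T              # damping matrix
K0 = rng.standard_normal((n, n)); K = K0 + K0.T                  # stiffness matrix

# block companion matrix of the conversion x1 = q, x2 = qdot
F = np.block([[np.zeros((n, n)),        np.eye(n)],
              [-np.linalg.solve(M, K), -np.linalg.solve(M, Cd)]])

lam, X = np.linalg.eig(F)
# residual of the quadratic eigenvalue problem for every eigenpair
res = max(np.linalg.norm((l * l * M + l * Cd + K) @ X[:n, i])
          for i, l in enumerate(lam))
```

All 2n eigenvalues of the quadratic problem are recovered at once from the 2n x 2n first-order matrix.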
Many other first-order realizations are possible. Some can be useful when M, C, and/or K have special symmetry or skew-symmetry properties that can be exploited.

Higher-order analogues of (12.16) involving, say, the kth derivative of q, lead naturally to higher-order eigenvalue problems that can be converted to first-order form using a kn x kn block companion matrix analogue of (11.19). Similar procedures hold for the general kth-order difference equation, which can be converted to various first-order systems of dimension kn.

EXERCISES

1. Suppose A ∈ R^{n x n} and D ∈ R^{m x m} is nonsingular. Show that the finite generalized eigenvalues of the pencil

    [ A  B ]     [ I  0 ]
    [ C  D ] − λ [ 0  0 ]

are the eigenvalues of the matrix A − BD^{-1}C. Hint: Consider the equivalence

    [ I  G ] [ A − λI  B ] [ I  0 ]
    [ 0  I ] [ C       D ] [ F  I ]

with F and G chosen suitably.

2. Let F ∈ C^{n x m}, G ∈ C^{m x n}. Show that the nonzero eigenvalues of FG and GF are the same. Hint: An easy "trick proof" is to verify that the matrices

    [ FG  0 ]        [ 0  0  ]
    [ G   0 ]  and   [ G  GF ]

are similar via the similarity transformation

    [ I  F ]
    [ 0  I ].
3. Let F ∈ C^{n x m}, G ∈ C^{m x n}. Are the nonzero singular values of FG and GF the same?

4. Suppose A ∈ R^{n x n}, B ∈ R^{n x m}, and C ∈ R^{m x n}. Show that the generalized eigenvalues of the pencils

    [ A  B ]     [ I  0 ]
    [ C  0 ] − λ [ 0  0 ]

and

    [ A + BF + GC  B ]     [ I  0 ]
    [ C            0 ] − λ [ 0  0 ]

are identical for all F ∈ R^{m x n} and all G ∈ R^{n x m}. (A similar result is also true for "nonsquare" pencils.) In the parlance of control theory, such results show that zeros are invariant under state feedback or output injection.
5. Another family of simultaneous diagonalization problems arises when it is desired that the simultaneous diagonalizing transformation Q operates on matrices A, B ∈ R^{n x n} in such a way that Q^{-1}AQ^{-T} and Q^T BQ are simultaneously diagonal. Such a transformation is called contragredient. Consider the case where both A and B are positive definite with Cholesky factorizations A = L_A L_A^T and B = L_B L_B^T, respectively, and let UΣV^T be an SVD of L_B^T L_A.

(a) Show that Q = L_A V Σ^{-1/2} is a contragredient transformation that reduces both A and B to the same diagonal matrix Σ.

(b) Show that Q^{-1} = Σ^{-1/2}U^T L_B^T.

(c) Show that the eigenvalues of AB are the same as those of Σ^2 and hence are positive.
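The claim of Exercise 2, and the similarity transformation suggested in its hint, can both be checked numerically; a sketch with illustrative random F and G:

```python
import numpy as np

rng = np.random.default_rng(4)
F = rng.standard_normal((5, 2))
G = rng.standard_normal((2, 5))

eig_FG = np.linalg.eigvals(F @ G)   # 5 eigenvalues, at most 2 of them nonzero
eig_GF = np.linalg.eigvals(G @ F)   # 2 eigenvalues

nonzero_FG = np.sort_complex(eig_FG[np.abs(eig_FG) > 1e-10])
nonzero_GF = np.sort_complex(eig_GF[np.abs(eig_GF) > 1e-10])

# the similarity transformation of the hint, checked directly
T  = np.block([[np.eye(5), F], [np.zeros((2, 5)), np.eye(2)]])
M1 = np.block([[F @ G, np.zeros((5, 2))], [G, np.zeros((2, 2))]])
M2 = np.block([[np.zeros((5, 5)), np.zeros((5, 2))], [G, G @ F]])
```

Since T^{-1} M1 T = M2, the two block matrices share their spectrum, which forces the nonzero eigenvalues of FG and GF to coincide.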
This page intentionally left blank
Chapter 13

Kronecker Products

13.1 Definition and Examples

Definition 13.1. Let A ∈ R^{m x n}, B ∈ R^{p x q}. Then the Kronecker product (or tensor product) of A and B is defined as the matrix

    A ⊗ B = [ a_11 B  ···  a_1n B ]
            [   .             .   ]    ∈ R^{mp x nq}.    (13.1)
            [ a_m1 B  ···  a_mn B ]

Obviously, the same definition holds if A and B are complex-valued matrices. We restrict our attention in this chapter primarily to real-valued matrices, pointing out the extension to the complex case only where it is not obvious.

Example 13.2.

1. Let

    A = [ 1  2  3 ]   and   B = [ 2  1 ].
        [ 3  2  1 ]             [ 2  3 ]

Then

    A ⊗ B = [ B   2B  3B ] = [ 2  1  4  2  6  3 ]
            [ 3B  2B  B  ]   [ 2  3  4  6  6  9 ]
                             [ 6  3  4  2  2  1 ]
                             [ 6  9  4  6  2  3 ].

Note that B ⊗ A ≠ A ⊗ B.

2. For any B ∈ R^{p x q},

    I_2 ⊗ B = [ B  0 ]
              [ 0  B ].

Replacing I_2 by I_n yields a block diagonal matrix with n copies of B along the diagonal.

3. Let B be an arbitrary 2 x 2 matrix. Then

    B ⊗ I_2 = [ b_11  0     b_12  0    ]
              [ 0     b_11  0     b_12 ]
              [ b_21  0     b_22  0    ]
              [ 0     b_21  0     b_22 ].
The extension to arbitrary B and I_n is obvious.

4. Let x ∈ R^m, y ∈ R^n. Then

    x ⊗ y = [x_1 y^T, ..., x_m y^T]^T = [x_1y_1, ..., x_1y_n, x_2y_1, ..., x_my_n]^T ∈ R^{mn}.

5. Let x ∈ R^m, y ∈ R^n. Then x ⊗ y^T = xy^T.

13.2 Properties of the Kronecker Product

Theorem 13.3. Let A ∈ R^{m x n}, B ∈ R^{r x s}, C ∈ R^{n x p}, and D ∈ R^{s x t}. Then

    (A ⊗ B)(C ⊗ D) = AC ⊗ BD   (∈ R^{mr x pt}).    (13.2)

Proof: Simply verify that the (i, j) block of (A ⊗ B)(C ⊗ D) is

    Σ_{k=1}^n a_ik B c_kj D = ( Σ_{k=1}^n a_ik c_kj ) BD,

which is the (i, j) block of AC ⊗ BD.  □

Corollary 13.4. For all A and B, (A ⊗ B)^T = A^T ⊗ B^T.

Proof: Simply verify using the definitions of transpose and Kronecker product.  □

Corollary 13.5. If A ∈ R^{n x n} and B ∈ R^{m x m} are symmetric, then A ⊗ B is symmetric.

Theorem 13.6. If A and B are nonsingular, then (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}.

Proof: Using Theorem 13.3, simply note that (A ⊗ B)(A^{-1} ⊗ B^{-1}) = I ⊗ I = I.  □
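These algebraic properties map directly onto numpy.kron; a quick numerical check with illustrative random matrices (added here as an example):

```python
import numpy as np

rng = np.random.default_rng(5)
A  = rng.standard_normal((2, 3)); B = rng.standard_normal((4, 2))
Cm = rng.standard_normal((3, 5)); D = rng.standard_normal((2, 3))

# Theorem 13.3: (A kron B)(C kron D) = (AC) kron (BD)
lhs = np.kron(A, B) @ np.kron(Cm, D)
rhs = np.kron(A @ Cm, B @ D)

# Corollary 13.4: (A kron B)^T = A^T kron B^T
transpose_ok = np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))

# Theorem 13.6: (A kron B)^{-1} = A^{-1} kron B^{-1} for nonsingular A, B
A1 = rng.standard_normal((3, 3)); B1 = rng.standard_normal((2, 2))
inverse_ok = np.allclose(np.linalg.inv(np.kron(A1, B1)),
                         np.kron(np.linalg.inv(A1), np.linalg.inv(B1)))
```

Note the dimension bookkeeping: A ⊗ B is (2·4) x (3·2) and C ⊗ D is (3·2) x (5·3), so the mixed product in (13.2) is well defined.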
Theorem 13.7. If A ∈ R^{n x n} and B ∈ R^{m x m} are normal, then A ⊗ B is normal.

Proof:

    (A ⊗ B)^T(A ⊗ B) = (A^T ⊗ B^T)(A ⊗ B)   by Corollary 13.4
                     = A^T A ⊗ B^T B         by Theorem 13.3
                     = AA^T ⊗ BB^T           since A and B are normal
                     = (A ⊗ B)(A ⊗ B)^T      by Theorem 13.3.  □

Corollary 13.8. If A ∈ R^{n x n} is orthogonal and B ∈ R^{m x m} is orthogonal, then A ⊗ B is orthogonal.

Example 13.9. Let

    A = [  cos θ  sin θ ]   and   B = [  cos φ  sin φ ].
        [ −sin θ  cos θ ]             [ −sin φ  cos φ ]

Then it is easily seen that A is orthogonal with eigenvalues e^{±jθ} and B is orthogonal with eigenvalues e^{±jφ}. The 4 x 4 matrix A ⊗ B is then also orthogonal, with eigenvalues e^{±j(θ+φ)} and e^{±j(θ−φ)}.

Theorem 13.10. Let A ∈ R^{m x n} have a singular value decomposition U_A Σ_A V_A^T and let B ∈ R^{p x q} have a singular value decomposition U_B Σ_B V_B^T. Then

    (U_A ⊗ U_B)(Σ_A ⊗ Σ_B)(V_A ⊗ V_B)^T

yields a singular value decomposition of A ⊗ B (after a simple reordering of the diagonal elements of Σ_A ⊗ Σ_B and the corresponding right and left singular vectors).
Corollary 13.11. Let A ∈ R^{m x n} have rank r and singular values σ_1 ≥ ··· ≥ σ_r > 0, and let B ∈ R^{p x q} have rank s and singular values τ_1 ≥ ··· ≥ τ_s > 0. Then A ⊗ B (or B ⊗ A) has rs singular values σ_1τ_1 ≥ ··· ≥ σ_rτ_s > 0 and

    rank(A ⊗ B) = (rank A)(rank B) = rank(B ⊗ A).

Theorem 13.12. Let A ∈ R^{n x n} have eigenvalues λ_i, i = 1, ..., n, and let B ∈ R^{m x m} have eigenvalues μ_j, j = 1, ..., m. Then the mn eigenvalues of A ⊗ B are

    λ_1μ_1, ..., λ_1μ_m, λ_2μ_1, ..., λ_2μ_m, ..., λ_nμ_m.

Moreover, if x_1, ..., x_p are linearly independent right eigenvectors of A corresponding to λ_1, ..., λ_p (p ≤ n), and z_1, ..., z_q are linearly independent right eigenvectors of B corresponding to μ_1, ..., μ_q (q ≤ m), then x_i ⊗ z_j ∈ R^{mn} are linearly independent right eigenvectors of A ⊗ B corresponding to λ_iμ_j, i = 1, ..., p, j = 1, ..., q.

Proof: The basic idea of the proof is as follows:

    (A ⊗ B)(x ⊗ z) = Ax ⊗ Bz = λx ⊗ μz = λμ(x ⊗ z).  □

If A and B are diagonalizable in Theorem 13.12, we can take p = n and q = m and thus get the complete eigenstructure of A ⊗ B.
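Theorem 13.12 and Corollary 13.11 can be spot-checked numerically; symmetric A and B are used below purely so that all quantities are real and easy to sort (an illustration added here, not from the text):

```python
import numpy as np

rng = np.random.default_rng(6)
A0 = rng.standard_normal((3, 3)); A = A0 + A0.T
B0 = rng.standard_normal((2, 2)); B = B0 + B0.T

# eigenvalues of A kron B are all products lambda_i * mu_j
lam_A = np.linalg.eigvalsh(A)
mu_B = np.linalg.eigvalsh(B)
products = np.sort(np.outer(lam_A, mu_B).ravel())
lam_kron = np.sort(np.linalg.eigvalsh(np.kron(A, B)))

# singular values of A kron B are all products sigma_i * tau_j
sv_A = np.linalg.svd(A, compute_uv=False)
sv_B = np.linalg.svd(B, compute_uv=False)
sv_products = np.sort(np.outer(sv_A, sv_B).ravel())
sv_kron = np.sort(np.linalg.svd(np.kron(A, B), compute_uv=False))
```

The same outer-product pattern governs both spectra, which is the content of the two results.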
13.14. . in of A and B.e.15. while upper triangular. respectively. ~l 2 2 1 3 AfflB = (h®A)+(B®h) = 1 3 0 1 0 4 0 3 0 0 0 0 0 0 0 0 0 0 0 0 2 2 0 0 0 3 4 2 0 0 2 0 0 2 0 0 2 0 0 0 1 0 0 + 0 2 0 0 2 0 0 0 0 3 0 0 0 3 0 0 0 3 The reader is invited to compute B 0 A = (/3 ® B) + (A 0 h) and note the difference The reader is invited to compute B EEl A = (h ® B) (A <g> /2) and note the difference with A © B. of A and B. denoted A © B. respectively. pH AP = TA and QH BQ = TB (and similarly if and are orthogonal similarities PHAP = TA and QHBQ = TB (and similarly if P and Q are orthogonal similarities reducing A and B to real Schur form)..I ® Ql)(A ® B)(P ® Q) = (P. A ® B i= B © A. Tr(A ® B) = (TrA)(TrB) = Tr(B ® A). For example. Note that. For example. A EEl B ^ B EEl A. A Schur form for A ® B can be derived similarly. respectively. Let A e Rn Xn and B e Rm xrn. with A EEl B. general. i. Note that.. to Schur (triangular) form. then we get the decompositions given by P~lI AP = J A and Ql BQ = JB. in general. i.15. 2. Kronecker Products Chapter 13. Let A e Rn xn and B e Rrn xm.AP J B . while upper triangular.142 142 Chapter 13. Then 13. Example 13. Then (P ® Q)H (A ® B)(P ® Q) = (pH ® QH)(A ® B)(P ® Q) = (pH AP) ® (QH BQ) = TA ® TR . 1. suppose P and Schur form for A ® B can be derived similarly. Example 13. E IR nxn E IR mxm. are unitary matrices that reduce A and 5. eigenvalues are zero or nonzero). nxn mxm Definition 13. Let A~U Then Then 2 2 !]andB~[ . respectively.e. Let 1. 1. to Schur (triangular) form. Then reducing A and B to real Schur form). Then the Kronecker sum (or tensor sum) .14. denoted A EEl B. is generally not quite in Jordan form and needs further reduction (to an ultimate Jordan form that also depends on whether or not certain further reduction (to an ultimate Jordan form that also depends on whether or not certain eigenvalues are zero or nonzero).13. 
2. Recall the real JCF

    J = [ M  I           ]
        [    M  I        ]
        [       .  .     ]   ∈ R^{2k x 2k},   where M = [  α  β ].
        [          M  I  ]                              [ −β  α ]
        [             M  ]

Define

    E_k = [ 0  1        ]
          [    0  .     ]
          [       .  1  ]   ∈ R^{k x k}.
          [          0  ]

Then J can be written in the very compact form

    J = (I_k ⊗ M) + (E_k ⊗ I_2) = M ⊕ E_k.

Theorem 13.16. Let A ∈ R^{n x n} have eigenvalues λ_i, i = 1, ..., n, and let B ∈ R^{m x m} have eigenvalues μ_j, j = 1, ..., m. Then the Kronecker sum A ⊕ B = (I_m ⊗ A) + (B ⊗ I_n) has mn eigenvalues

    λ_1 + μ_1, ..., λ_1 + μ_m, λ_2 + μ_1, ..., λ_2 + μ_m, ..., λ_n + μ_m.

Moreover, if x_1, ..., x_p are linearly independent right eigenvectors of A corresponding to λ_1, ..., λ_p (p ≤ n), and z_1, ..., z_q are linearly independent right eigenvectors of B corresponding to μ_1, ..., μ_q (q ≤ m), then z_j ⊗ x_i ∈ R^{mn} are linearly independent right eigenvectors of A ⊕ B corresponding to λ_i + μ_j, i = 1, ..., p, j = 1, ..., q.

Proof: The basic idea of the proof is as follows:

    [(I_m ⊗ A) + (B ⊗ I_n)](z ⊗ x) = (z ⊗ Ax) + (Bz ⊗ x)
                                    = (z ⊗ λx) + (μz ⊗ x)
                                    = (λ + μ)(z ⊗ x).  □

If A and B are diagonalizable in Theorem 13.16, we can take p = n and q = m and thus get the complete eigenstructure of A ⊕ B. In general, if A and B have Jordan form decompositions given by P^{-1}AP = J_A and Q^{-1}BQ = J_B, then

    [(Q ⊗ I_n)(I_m ⊗ P)]^{-1} [(I_m ⊗ A) + (B ⊗ I_n)] [(Q ⊗ I_n)(I_m ⊗ P)]
        = [(I_m ⊗ P^{-1})(Q^{-1} ⊗ I_n)] [(I_m ⊗ A) + (B ⊗ I_n)] [(Q ⊗ I_n)(I_m ⊗ P)]
        = (I_m ⊗ J_A) + (J_B ⊗ I_n)

is a Jordan-like structure for A ⊕ B.
0 If A and Bare diagonalizable in Theorem 13.16.. A2 + fJt.. if A and have Jordan form thus get the complete eigenstructure of A 0 B. Properties of the Kronecker Product 13. if XI. . . + fJj' € p... Zq are linearly independent right eigenvectors of B AI. j e q.... then Zj ® Xi E€ jRmn" are linearly independent right Zj <8> Xi W1 are linearly independent right corresponding f j i .
. it is easily seen by equating the writing (13. suppose P and are unitary A Schur form for A © B can be derived similarly. the solution X E Wnx" is easily shown taking B = AT.3 and Corollary 13.3 Application to Sylvester and Lyapunov Equations Application to Sylvester and Lyapunov Equations In this section we study the linear matrix equation In this section we study the linear matrix equation AX+XB=C. When does a solution exist? By writing the matrices in (13.4) is known as a Lyapunov equation. i.=1 A special case of (13.. Lyapunovequations arise naturally in stability theory.e.3) in tenns of their columns. [(Q ® /„)(/« ® P)] = (<2 ® P) is unitary by Theorem 13. Then to real Schur fonn). an "ordinary" linear system.J.Xj. . respectively. Sylvester who studied general linear matrix equations of the form equation in honor of J. and C e M" xm .8. Again.3 13.5) as (B T 0 /„). j=1 These equations can then be rewritten as the These equations can then be rewritten as the mn x mn linear system x linear system A+blll bl21 A + b 2Z 1 b2ml b 21 1 (13. =C.3 and Corollary 13.4) is known as a Lyapunov equation. Kronecker Products Chapter 13. = C.3) is. . Sylvester who studied general linear matrix equations of the fonn k LA. When C is symmetric. When does a solution exist? The first important question to ask regarding (13. (13. B e Rmxm . Again.5) clearly can be written as the Kronecker sum (1m 0 A) + The coefficient matrix in (13.3) mxm E IRnxn E IR E IRnxm. where [(Q <8>In)(lm ® P)] = (Q ® P) is unitary by Theorem 13.XB.144 Chapter 13..e. When symmetric. Then ((Q ® /„)(/« ® P)]"[(/m <8> A) + (B ® /B)][(e (g) /„)(/„. suppose P and Q are unitary fonn. The following definition is very helpful in completing the writing of (13..1. Kronecker Products A Schur fonn for A EB B can be derived similarly.3) is the symmetric equation AX +XAT = C (13.5) [ blml The coefficient matrix in (13. Lyapunov equations also to be symmetric and (13. i. = AXi + l:~>j. Sylvester where A e R"x". 
pH AP = TA matrices that reduce A and B.4) obtained by taking B = AT.3) in terms of their easily seen z'th columns that ith columns that m AXi + Xb.5) as an "ordinary" linear system. PHAP = TA that reduce to Schur and QH BQ = TB (and similarly if P and Q are orthogonal similarities reducing A and B and QHBQ = TB (and similarly if P and Q are orthogonal similarities reducing A and B to real Schur form).8. to Schur (triangular) form. 13. arise naturally in stability theory. The following definition is very helpful in completing the writing of (13. This equation is now often called a Sylvester equation is now often equation in honor of 1. solution e IR xn also to be symmetric and (13. ® P)] = (/m <8> rA) + (7* (g) /„).5) clearly can be written as the Kronecker sum (Im * A) + (BT ® In).3) is. The first important question to ask regarding (13.
The following definition is very helpful in completing the writing of (13.5) as an "ordinary" linear system.

Definition 13.17. Let c_i ∈ R^n, i = 1, ..., m, denote the columns of C ∈ R^{n×m} so that C = [c_1, ..., c_m]. Then vec(C) is defined to be the mn-vector formed by stacking the columns of C on top of one another, i.e., vec(C) = [c_1^T, c_2^T, ..., c_m^T]^T ∈ R^{mn}.

Theorem 13.18. Let A ∈ R^{n×n}, B ∈ R^{m×m}, and C ∈ R^{n×m}. Then the Sylvester equation

    AX + XB = C                                                   (13.6)

has a unique solution if and only if A and −B have no eigenvalues in common.

Proof: Using Definition 13.17, the linear system (13.5) can be rewritten in the form

    [(I_m ⊗ A) + (B^T ⊗ I_n)] vec(X) = vec(C).                    (13.7)

There exists a unique solution to (13.7) if and only if [(I_m ⊗ A) + (B^T ⊗ I_n)] is nonsingular. But [(I_m ⊗ A) + (B^T ⊗ I_n)] is nonsingular if and only if it has no zero eigenvalues. From Theorem 13.16, the eigenvalues of [(I_m ⊗ A) + (B^T ⊗ I_n)] are λ_i + μ_j, where λ_i ∈ Λ(A), i = 1, ..., n, and μ_j ∈ Λ(B), j = 1, ..., m. Thus, there exists a unique solution to (13.7), and hence to (13.6), if and only if λ_i(A) + λ_j(B) ≠ 0 for all i, j, i.e., if and only if A and −B have no eigenvalues in common. □

Sylvester equations of the form (13.3) (or symmetric Lyapunov equations of the form (13.4)) are generally not solved using the mn × mn "vec" formulation (13.7). The most commonly preferred numerical algorithm is described in [2]. First A and B are reduced to (real) Schur form. An equivalent linear system is then solved in which the triangular form of the reduced A and B can be exploited to solve successively for the columns of a suitably transformed solution matrix X. Assuming that, say, n ≥ m, this algorithm takes only O(n^3) operations rather than the O(n^6) that would be required by solving (13.7) directly with Gaussian elimination. A further enhancement to this algorithm is available in [6], whereby the larger of A or B is initially reduced only to upper Hessenberg rather than triangular Schur form.
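A Schur-based (Bartels-Stewart-type) solver of the kind just described is what one calls in practice; for instance, SciPy exposes one as `scipy.linalg.solve_sylvester`. A minimal usage sketch (assuming SciPy is available; sizes and seed are arbitrary):

```python
import numpy as np
from scipy.linalg import solve_sylvester

# solve_sylvester reduces A and B to Schur form and back-substitutes
# column by column, avoiding the mn x mn Kronecker system entirely.
rng = np.random.default_rng(1)
n, m = 50, 40
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
C = rng.standard_normal((n, m))

X = solve_sylvester(A, B, C)          # solves AX + XB = C
assert np.allclose(A @ X + X @ B, C)
```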
The next few theorems are classical. They culminate in Theorem 13.24, one of many elegant connections between matrix theory and stability theory for differential equations.

Theorem 13.19. Let A ∈ R^{n×n}, B ∈ R^{m×m}, and C ∈ R^{n×m}. Suppose further that A and B are asymptotically stable (a matrix is asymptotically stable if all its eigenvalues have real parts in the open left half-plane). Then the (unique) solution of the Sylvester equation

    AX + XB = C                                                   (13.8)

can be written as

    X = − ∫_0^{+∞} e^{tA} C e^{tB} dt.                            (13.9)

Proof: Since A and B are stable, λ_i(A) + λ_j(B) ≠ 0 for all i, j, so there exists a unique solution to (13.8) by Theorem 13.18. Now integrate the differential equation Ẋ = AX + XB (with X(0) = C) on [0, +∞):

    lim_{t→+∞} X(t) − X(0) = A (∫_0^{+∞} X(t) dt) + (∫_0^{+∞} X(t) dt) B.     (13.10)

Using the results of Section 11.1.6, it can be shown easily that lim_{t→+∞} e^{tA} = lim_{t→+∞} e^{tB} = 0. Hence, using the solution X(t) = e^{tA} C e^{tB} from Theorem 11.6, we have that lim_{t→+∞} X(t) = 0. Substituting in (13.10) we have

    −C = A (∫_0^{+∞} e^{tA} C e^{tB} dt) + (∫_0^{+∞} e^{tA} C e^{tB} dt) B

and so X = − ∫_0^{+∞} e^{tA} C e^{tB} dt satisfies (13.8). □

Remark 13.20. An equivalent condition for the existence of a unique solution to AX + XB = C is that the block matrix [A C; 0 −B] be similar to [A 0; 0 −B] (via the similarity [I X; 0 −I]).

Theorem 13.21. Let A, C ∈ R^{n×n}. Then the Lyapunov equation

    AX + XA^T = C                                                 (13.11)

has a unique solution if and only if A and −A^T have no eigenvalues in common. If C is symmetric and (13.11) has a unique solution, then that solution is symmetric.

Remark 13.22. If the matrix A ∈ R^{n×n} has eigenvalues λ_1, ..., λ_n, then −A^T has eigenvalues −λ_1, ..., −λ_n. Thus, a sufficient condition that guarantees that A and −A^T have no common eigenvalues is that A be asymptotically stable. Many useful results exist concerning the relationship between stability and Lyapunov equations. Two basic results due to Lyapunov are the following, the first of which follows immediately from Theorem 13.19.

Theorem 13.23. Let A, C ∈ R^{n×n} and suppose further that A is asymptotically stable. Then the (unique) solution of the Lyapunov equation

    AX + XA^T = C

can be written as

    X = − ∫_0^{+∞} e^{tA} C e^{tA^T} dt.                          (13.12)
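The integral representation and the definiteness assertions in the Lyapunov results above are easy to check numerically. A sketch (assuming SciPy; `solve_continuous_lyapunov(A, Q)` solves AX + XA^H = Q): for an asymptotically stable A and C = C^T < 0, the computed solution is symmetric positive definite.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n))
# Shift M far enough left that every eigenvalue has negative real part.
A = M - (np.abs(np.linalg.eigvals(M)).max() + 1.0) * np.eye(n)
C = -np.eye(n)                                # C = C^T < 0

X = solve_continuous_lyapunov(A, C)           # solves AX + XA^T = C
assert np.allclose(A @ X + X @ A.T, C)
assert np.allclose(X, X.T)                    # symmetric, since C is symmetric
assert np.linalg.eigvalsh(X).min() > 0        # positive definite, since A is stable
```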
Theorem 13.24. A matrix A ∈ R^{n×n} is asymptotically stable if and only if there exists a positive definite solution to the Lyapunov equation

    AX + XA^T = C,                                                (13.13)

where C = C^T < 0.

Proof: Suppose A is asymptotically stable. By Theorems 13.21 and 13.23 a solution to (13.13) exists and takes the form (13.12). Now let v be an arbitrary nonzero vector in R^n. Then

    v^T X v = ∫_0^{+∞} (v^T e^{tA}) (−C) (e^{tA^T} v) dt.

Since −C > 0 and e^{tA} is nonsingular for all t, the integrand above is positive. Hence v^T X v > 0 and thus X is positive definite.

Conversely, suppose X = X^T > 0 and let λ ∈ Λ(A) with corresponding left eigenvector y. Then

    0 > y^H C y = y^H A X y + y^H X A^T y = (λ + λ̄) y^H X y.

Since y^H X y > 0, we must have λ + λ̄ = 2 Re λ < 0. Since λ was arbitrary, A must be asymptotically stable. □

Remark 13.25. The Lyapunov equation AX + XA^T = C can also be written using the vec notation in the equivalent form

    [(I ⊗ A) + (A ⊗ I)] vec(X) = vec(C).

A subtle point arises when dealing with the "dual" Lyapunov equation A^T X + XA = C. The equivalent "vec form" of this equation is

    [(I ⊗ A^T) + (A^T ⊗ I)] vec(X) = vec(C).

However, the complex-valued equation A^H X + XA = C is equivalent to

    [(I ⊗ A^H) + (A^T ⊗ I)] vec(X) = vec(C).

The vec operator has many useful properties, most of which derive from one key result.

Theorem 13.26. For any three matrices A, B, and C for which the matrix product ABC is defined,

    vec(ABC) = (C^T ⊗ A) vec(B).

Proof: The proof follows in a fairly straightforward fashion either directly from the definitions or from the fact that vec(xy^T) = y ⊗ x. □

An immediate application is to the derivation of existence and uniqueness conditions for the solution of the simple Sylvester-like equation introduced in Theorem 6.11.

Theorem 13.27. Let A ∈ R^{m×n}, B ∈ R^{p×q}, and C ∈ R^{m×q}. Then the equation

    AXB = C                                                       (13.14)

has a solution X ∈ R^{n×p} if and only if AA^+ C B^+ B = C, in which case the general solution is of the form

    X = A^+ C B^+ + Y − A^+ A Y B B^+,                            (13.15)

where Y ∈ R^{n×p} is arbitrary. The solution of (13.14) is unique if BB^+ ⊗ A^+ A = I.

Proof: Write (13.14) as

    (B^T ⊗ A) vec(X) = vec(C)                                     (13.16)
by Theorem 13.26. This "vector equation" has a solution if and only if

    (B^T ⊗ A)(B^T ⊗ A)^+ vec(C) = vec(C).

It is a straightforward exercise to show that (M ⊗ N)^+ = M^+ ⊗ N^+. Thus, (13.16) has a solution if and only if

    vec(C) = (B^T ⊗ A)((B^+)^T ⊗ A^+) vec(C)
           = [(B^+ B)^T ⊗ A A^+] vec(C)
           = vec(A A^+ C B^+ B)

and hence if and only if AA^+ C B^+ B = C. The general solution of (13.16) is then given by

    vec(X) = (B^T ⊗ A)^+ vec(C) + [I − (B^T ⊗ A)^+ (B^T ⊗ A)] vec(Y),

where Y is arbitrary. This equation can then be rewritten in the form

    vec(X) = ((B^+)^T ⊗ A^+) vec(C) + [I − (B B^+)^T ⊗ A^+ A] vec(Y)

or, using Theorem 13.26,

    X = A^+ C B^+ + Y − A^+ A Y B B^+.

The solution is clearly unique if BB^+ ⊗ A^+ A = I. □
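The solvability condition AA^+ C B^+ B = C and the general solution (13.15) are straightforward to exercise numerically with NumPy's Moore-Penrose pseudoinverse. An illustrative sketch (sizes and seed are arbitrary; C is constructed so that a solution exists):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p, q = 4, 6, 5, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, q))
C = A @ rng.standard_normal((n, p)) @ B        # solvable by construction

Ap, Bp = np.linalg.pinv(A), np.linalg.pinv(B)
assert np.allclose(A @ Ap @ C @ Bp @ B, C)     # existence condition holds

Y = rng.standard_normal((n, p))                # arbitrary Y
X = Ap @ C @ Bp + Y - Ap @ A @ Y @ B @ Bp      # general solution (13.15)
assert np.allclose(A @ X @ B, C)
```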
EXERCISES

1. For any two matrices A and B for which the indicated matrix product is defined, show that (vec(A))^T (vec(B)) = Tr(A^T B). In particular, if B ∈ R^{n×n}, then Tr(B) = vec(I_n)^T vec(B).

2. Prove that for all matrices A and B, (A ⊗ B)^+ = A^+ ⊗ B^+.

3. Show that the equation AXB = C has a solution for all C if A has full row rank and B has full column rank. Also, show that a solution, if it exists, is unique if A has full column rank and B has full row rank. What is the solution in this case?

4. Show that the general linear equation
       sum_{i=1}^{k} A_i X B_i = C

   can be written in the form

       [B_1^T ⊗ A_1 + ... + B_k^T ⊗ A_k] vec(X) = vec(C).
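A quick numerical sanity check of this identity (not a proof; sizes and seed are arbitrary, and NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 3, 4, 3
As = [rng.standard_normal((n, n)) for _ in range(k)]
Bs = [rng.standard_normal((m, m)) for _ in range(k)]
X = rng.standard_normal((n, m))

lhs = sum(A @ X @ B for A, B in zip(As, Bs))       # sum_i A_i X B_i
K = sum(np.kron(B.T, A) for A, B in zip(As, Bs))   # sum_i B_i^T kron A_i
assert np.allclose(K @ X.flatten(order="F"), lhs.flatten(order="F"))
```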
5. Let x ∈ R^m and y ∈ R^n. Show that x^T ⊗ y = y x^T.

6. Let A ∈ R^{n×n} and B ∈ R^{m×m}.

   (a) Show that ‖A ⊗ B‖_2 = ‖A‖_2 ‖B‖_2.

   (b) What is ‖A ⊗ B‖_F in terms of the Frobenius norms of A and B? Justify your answer carefully.

   (c) What is the spectral radius of A ⊗ B in terms of the spectral radii of A and B? Justify your answer carefully.

7. Let A, B ∈ R^{n×n}.

   (a) Show that (I ⊗ A)^k = I ⊗ A^k and (B ⊗ I)^k = B^k ⊗ I for all integers k.

   (b) Show that e^{I⊗A} = I ⊗ e^A and e^{B⊗I} = e^B ⊗ I.

   (c) Show that the matrices I ⊗ A and B ⊗ I commute.

   (d) Show that
       e^{A ⊕ B} = e^{(I⊗A)+(B⊗I)} = e^B ⊗ e^A.

   (Note: This result would look a little "nicer" had we defined our Kronecker sum the other way around. However, Definition 13.14 is conventional in the literature.)
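Part (d) can be checked numerically. A sketch (assuming SciPy's `expm`; sizes and seed are arbitrary, and note the order e^B ⊗ e^A on the right-hand side):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))

# Kronecker sum A (+) B = (I_m kron A) + (B kron I_n), per Definition 13.14
ksum = np.kron(np.eye(m), A) + np.kron(B, np.eye(n))
assert np.allclose(expm(ksum), np.kron(expm(B), expm(A)))
```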
8. Consider the Lyapunov matrix equation (13.11) with

       A = [ 1   0 ]
           [ 0  -1 ]

   and C the symmetric matrix

       C = [ 2   0 ]
           [ 0  -2 ].

   Clearly

       X_s = [ 1  0 ]
             [ 0  1 ]

   is a symmetric solution of the equation. Verify that

       X_ns = [  1  1 ]
              [ -1  1 ]

   is also a solution and is nonsymmetric. Explain in light of Theorem 13.21.

9. Block Triangularization: Let
       S = [ A  B ]
           [ C  D ],

   where A ∈ R^{n×n} and D ∈ R^{m×m}. It is desired to find a similarity transformation of the form

       T = [ I  0 ]
           [ X  I ]

   such that T^{-1} S T is block upper triangular.

   (a) Show that S is similar to

           [ A + BX      B     ]
           [   0       D - XB  ]

   if X satisfies the so-called matrix Riccati equation

           C - XA + DX - XBX = 0.

   (b) Formulate a similar result for block lower triangularization of S.
10. Block Diagonalization: Let

        S = [ A  B ]
            [ 0  D ],

    where A ∈ R^{n×n} and D ∈ R^{m×m}. It is desired to find a similarity transformation of the form

        T = [ I  Y ]
            [ 0  I ]

    such that T^{-1} S T is block diagonal.

    (a) Show that S is similar to

            [ A  0 ]
            [ 0  D ]

    if Y satisfies the Sylvester equation

            AY - YD = -B.

    (b) Formulate a similar result for block diagonalization of

            [ A  0 ]
            [ C  D ].
Bibliography

[1] Albert, A., Regression and the Moore-Penrose Pseudoinverse, Academic Press, New York, NY, 1972.

[2] Bartels, R.H., and G.W. Stewart, "Algorithm 432. Solution of the Matrix Equation AX + XB = C," Comm. ACM, 15(1972), 820-826.

[3] Bellman, R., Introduction to Matrix Analysis, Second Edition, McGraw-Hill, New York, NY, 1970.

[4] Bjorck, A., Numerical Methods for Least Squares Problems, SIAM, Philadelphia, PA, 1996.

[5] Cline, R.E., "Note on the Generalized Inverse of the Product of Matrices," SIAM Rev., 6(1964), 57-58.

[6] Golub, G.H., S. Nash, and C. Van Loan, "A Hessenberg-Schur Method for the Problem AX + XB = C," IEEE Trans. Autom. Control, AC-24(1979), 909-913.

[7] Golub, G.H., and C.F. Van Loan, Matrix Computations, Third Edition, Johns Hopkins Univ. Press, Baltimore, MD, 1996.

[8] Golub, G.H., and J.H. Wilkinson, "Ill-Conditioned Eigensystems and the Computation of the Jordan Canonical Form," SIAM Rev., 18(1976), 578-619.

[9] Greville, T.N.E., "Note on the Generalized Inverse of a Matrix Product," SIAM Rev., 8(1966), 518-521 [Erratum, SIAM Rev., 9(1967), 249].

[10] Halmos, P.R., Finite-Dimensional Vector Spaces, Second Edition, Van Nostrand, Princeton, NJ, 1958.

[11] Higham, N.J., Accuracy and Stability of Numerical Algorithms, Second Edition, SIAM, Philadelphia, PA, 2002.

[12] Horn, R.A., and C.R. Johnson, Matrix Analysis, Cambridge Univ. Press, Cambridge, UK, 1985.

[13] Horn, R.A., and C.R. Johnson, Topics in Matrix Analysis, Cambridge Univ. Press, Cambridge, UK, 1991.

[14] Kenney, C.S., and A.J. Laub, "Controllability and Stability Radii for Companion Form Systems," Math. of Control, Signals, and Systems, 1(1988), 361-390.

[15] Kenney, C.S., and A.J. Laub, "The Matrix Sign Function," IEEE Trans. Autom. Control, 40(1995), 1330-1348.

[16] Lancaster, P., and M. Tismenetsky, The Theory of Matrices, Second Edition with Applications, Academic Press, Orlando, FL, 1985.

[17] Laub, A.J., "A Schur Method for Solving Algebraic Riccati Equations," IEEE Trans. Autom. Control, AC-24(1979), 913-921.

[18] Meyer, C.D., Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, PA, 2000.

[19] Moler, C.B., and C.F. Van Loan, "Nineteen Dubious Ways to Compute the Exponential of a Matrix," SIAM Rev., 20(1978), 801-836.

[20] Noble, B., and J.W. Daniel, Applied Linear Algebra, Third Edition, Prentice-Hall, Englewood Cliffs, NJ, 1988.

[21] Ortega, J., Matrix Theory. A Second Course, Plenum, New York, NY, 1987.

[22] Penrose, R., "A Generalized Inverse for Matrices," Proc. Cambridge Philos. Soc., 51(1955), 406-413.

[23] Stewart, G.W., Introduction to Matrix Computations, Academic Press, New York, NY, 1973.

[24] Strang, G., Linear Algebra and Its Applications, Third Edition, Harcourt Brace Jovanovich, San Diego, CA, 1988.

[25] Watkins, D.S., Fundamentals of Matrix Computations, Second Edition, Wiley-Interscience, New York, NY, 2002.

[26] Wonham, W.M., Linear Multivariable Control. A Geometric Approach, Third Edition, Springer-Verlag, New York, NY, 1985.